Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, rather than only its expectation. This provides richer learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation in general. Previous works on distributional RL focused mainly on computing state-action-return distributions; here we ...