Abstract Policy gradient methods have become one of the most popular classes algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many these credit assignment: assessing an agent’s contribution to overall performance, which crucial learning good policies. We propose a novel algorithm called Dr.Reinforce explicitly tackles this combining differenc...