In many existing multi-agent reinforcement learning tasks, each agent observes all the other agents from its own perspective. addition, training process is centralized, namely critic of can access policies agents. This scheme has certain limitations since every single only obtain information neighbor due to communication range in practical applications. Therefore, this paper, a distributed deep...