This article presents a theoretical framework for probably approximately correct (PAC) multi-agent reinforcement learning (MARL) algorithms for Markov games. Using the idea of delayed Q-learning, this framework extends the well-known Nash Q-learning algorithm to build a new PAC MARL algorithm for general-sum Markov games. In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC. Comparative numerical result...