multi agent markov decision processes (mmdps), as the generalization of markov decision processes to the multi agent case, have long been used for modeling multi agent system and are used as a suitable framework for multi agent reinforcement learning. in this paper, a generalized learning automata based algorithm for finding optimal policies in mmdp is proposed. in the proposed algorithm, mmdp ...