Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Authors

  • Behrooz Masoumi, Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
  • Samaneh Assar, Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
Abstract:

Multi-agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems and provide a suitable framework for multi-agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, the MMDP is described as a directed graph in which the nodes are the states of the problem and the directed edges represent the actions that cause a transition from one state to another. Each state of the environment is equipped with a generalized learning automaton whose actions are moves to the states adjacent to that state. Each agent moves from one state to another, trying to reach the goal state; in each state, the agent chooses its next transition with the help of the generalized learning automaton residing in that state. Experimental results show that the proposed algorithm has better learning performance, in terms of the speed of reaching the optimal policy, than existing learning algorithms.
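
The transition-choice mechanism the abstract describes can be made concrete with a small sketch. The Python below is illustrative only, not the paper's algorithm: it uses a plain finite-action automaton with the linear reward-inaction (L_R-I) update in place of the paper's generalized learning automata, and reinforces the automata along a walk whenever the goal state is reached. The step size, episode counts, and toy graph are assumptions.

```python
import random

class LearningAutomaton:
    """One automaton per state; its actions are the state's outgoing edges."""
    def __init__(self, neighbors, alpha=0.1):
        self.neighbors = list(neighbors)   # adjacent states reachable from here
        self.alpha = alpha                 # learning (reward) step size
        self.p = [1.0 / len(self.neighbors)] * len(self.neighbors)

    def choose(self):
        # Sample the next state according to the action-probability vector.
        return random.choices(range(len(self.neighbors)), weights=self.p)[0]

    def reward(self, i):
        # Linear reward-inaction (L_R-I) update: the rewarded action's
        # probability grows, all others shrink; failures change nothing.
        for j in range(len(self.p)):
            if j == i:
                self.p[j] += self.alpha * (1.0 - self.p[j])
            else:
                self.p[j] *= 1.0 - self.alpha

def learn(graph, start, goal, episodes=2000, max_steps=50):
    automata = {s: LearningAutomaton(n) for s, n in graph.items()}
    for _ in range(episodes):
        state, walk = start, []
        for _ in range(max_steps):
            if state == goal:
                for s, i in walk:          # reinforce the successful walk
                    automata[s].reward(i)
                break
            i = automata[state].choose()
            walk.append((state, i))
            state = automata[state].neighbors[i]
    return automata

graph = {'A': ['B', 'C'], 'B': ['A', 'G'], 'C': ['A'], 'G': ['G']}
automata = learn(graph, 'A', 'G')
print([round(x, 2) for x in automata['A'].p])   # should come to favour 'B'
```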


Similar resources


Learning Automata based Algorithms for Finding Optimal Policies in Fully Cooperative Markov Games

Markov games, as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems. In this paper, several learning automata based multi-agent system algorithms for finding optimal policies in fully cooperative Markov games are proposed. In the proposed algorithms, the Markov game is described as a directed graph in which the nodes are ...


Learning Automata Based Multi-agent System Algorithms for Finding Optimal Policies in Markov Games

Markov games, as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems (MAS). In the Markov game view, a MAS plays a sequence of games, each belonging to a different state of the environment. In this paper, several learning automata based multi-agent system algorithms f...


Finding Optimal Refueling Policies in Transportation Networks

We study the combinatorial properties of optimal refueling policies, which specify the transportation paths and the refueling operations along the paths to minimize the total transportation costs between vertices. The insight into the structure of optimal refueling policies leads to an elegant reduction of the problem of finding optimal refueling policies into the classical shortest path proble...
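
The teaser stops before the construction, but the flavour of "refueling as shortest path" can be sketched by running Dijkstra over (vertex, fuel-level) pairs with unit fuel increments. This is a generic illustration under assumed inputs (`adj`, `price`, `capacity`), not the paper's reduction:

```python
import heapq

def cheapest_trip(adj, price, start, goal, capacity):
    """Dijkstra over (vertex, fuel) states, fuel in whole units.
    adj[u] = [(v, fuel_needed), ...]; price[u] = cost of one fuel unit at u."""
    dist = {(start, 0): 0.0}
    pq = [(0.0, start, 0)]

    def relax(cost, v, fuel):
        if cost < dist.get((v, fuel), float('inf')):
            dist[(v, fuel)] = cost
            heapq.heappush(pq, (cost, v, fuel))

    while pq:
        cost, u, fuel = heapq.heappop(pq)
        if u == goal:
            return cost                           # first pop of goal is optimal
        if cost > dist[(u, fuel)]:
            continue                              # stale queue entry
        if fuel < capacity:
            relax(cost + price[u], u, fuel + 1)   # buy one unit of fuel here
        for v, need in adj[u]:
            if fuel >= need:
                relax(cost, v, fuel - need)       # drive without refueling

adj = {'s': [('t', 4), ('m', 2)], 'm': [('t', 2)], 't': []}
price = {'s': 3.0, 'm': 1.0, 't': 0.0}
print(cheapest_trip(adj, price, 's', 't', capacity=4))   # 8.0: top up again at m
```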


A linear-time algorithm for finding optimal vehicle refueling policies

We explore a fixed-route vehicle refueling problem as a special case of the inventory-capacitated lot-sizing problem, and present a linear-time greedy algorithm for finding optimal refueling policies.
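
For orientation, the classic greedy for fixed-route refueling buys just enough fuel to reach the next cheaper station, and fills the tank when no cheaper station is in range. The sketch below is a quadratic, easy-to-read form of that folklore greedy, not the paper's linear-time algorithm; inputs and names are illustrative:

```python
def refuel_cost(stations, capacity):
    """stations: (position, price) pairs sorted by position; the last entry is
    the destination, encoded as a free station (price 0). Start at stations[0]
    with an empty tank; fuel is measured in distance units. Assumes every gap
    between consecutive stations is at most `capacity`."""
    cost, fuel, i = 0.0, 0.0, 0
    while i < len(stations) - 1:
        pos, price = stations[i]
        reach = [j for j in range(i + 1, len(stations))
                 if stations[j][0] - pos <= capacity]
        cheaper = [j for j in reach if stations[j][1] < price]
        if cheaper:
            j = cheaper[0]                     # buy just enough to get there
            need = stations[j][0] - pos
            if fuel < need:
                cost += (need - fuel) * price
                fuel = need
        else:
            j = min(reach, key=lambda k: stations[k][1])
            cost += (capacity - fuel) * price  # no cheaper option: fill up
            fuel = capacity
        fuel -= stations[j][0] - pos           # drive to station j
        i = j
    return cost

print(refuel_cost([(0, 2.0), (3, 1.0), (6, 3.0), (10, 0.0)], capacity=5))  # 17.0
```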


On Finding Optimal Policies for Markovian Decision Processes Using Simulation

A simulation method is developed to find an optimal policy for the expected average reward of a Markovian decision process. It is shown that the method is consistent, in the sense that it produces solutions arbitrarily close to the optimal. Various types of estimation errors are examined, and bounds are developed.
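
The abstract does not spell out the estimator, but its basic ingredient, simulating a fixed policy to estimate its long-run average reward, can be sketched as below; `step` and `policy` are hypothetical stand-ins for the MDP's sampler and a candidate policy, and the toy MDP is invented for the example. A policy search would estimate each candidate this way and keep the best; the paper's contribution is the consistency proof and error bounds for such a procedure.

```python
import random

def average_reward(policy, step, start, horizon=10_000, runs=20):
    """Monte Carlo estimate of a fixed policy's long-run average reward.
    step(state, action) samples (next_state, reward) from the MDP."""
    estimates = []
    for _ in range(runs):
        s, total = start, 0.0
        for _ in range(horizon):
            s, r = step(s, policy[s])
            total += r
        estimates.append(total / horizon)   # per-run average reward
    return sum(estimates) / runs

# Two-state toy MDP: action 0 stays put, action 1 switches states w.p. 0.8;
# reward 1 is earned whenever the process is in state 1.
def step(s, a):
    if a == 1 and random.random() < 0.8:
        s = 1 - s
    return s, (1.0 if s == 1 else 0.0)

print(average_reward({0: 1, 1: 0}, step, start=0))   # close to 1.0
```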




Journal title

Volume 6, Issue 2

Pages 15-22

Publication date: 2013-02-01
