Markov games

A Near-Optimal Poly-Time Algorithm for Learning a class of Stochastic Games

1999

Ronen I. Brafman, Moshe Tennenholtz

We present a new algorithm for polynomial-time learning of near-optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma more complex than in Markov decision processes....

Center for the Study of Rationality

2005

Ziv Gorodeisky

We consider stability properties of equilibria in stochastic evolutionary dynamics. In particular, we study the stability of mixed equilibria in strategic form games. In these games, when the populations are small, all strategies may be stable. We prove that when the populations are large, the unique stable outcome of best-reply dynamics in 2 × 2 games with a unique Nash equilibrium that is com...

Graphical potential games

Journal: J. Economic Theory 2016

Yakov Babichenko, Omer Tamuz

We study the class of potential games that are also graphical games with respect to a given graph G of connections between the players. We show that, up to strategic equivalence, this class of games can be identified with the set of Markov random fields on G. From this characterization, and from the Hammersley-Clifford theorem, it follows that the potentials of such games can be decomposed into l...
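
The potential-game property can be checked by brute force on a small example. The sketch below assumes the simplest kind of graphical potential game, one whose potential is a sum of pairwise terms phi_e over the edges of G (a hypothetical construction for illustration; Hammersley-Clifford factorizations in general involve terms on cliques, not just edges). Each player's payoff is the sum of the terms on edges touching it, so it depends only on its neighbours, and the code checks that every unilateral deviation changes the deviator's payoff by exactly the change in the potential. The graph, `PHI` values, and function names are all made up.

```python
import itertools
import random

# Hypothetical example: 3 players on the path graph 0 - 1 - 2, two actions each.
EDGES = [(0, 1), (1, 2)]
ACTIONS = (0, 1)
N_PLAYERS = 3

rng = random.Random(0)
# One arbitrary pairwise term phi_e(a_i, a_j) per edge (made-up integer values).
PHI = {e: {(x, y): rng.randint(-3, 3) for x in ACTIONS for y in ACTIONS} for e in EDGES}

def potential(profile):
    """Global potential: sum of the edge terms over the whole graph."""
    return sum(PHI[(i, j)][(profile[i], profile[j])] for i, j in EDGES)

def utility(player, profile):
    """Payoff of `player`: only the terms on edges incident to it (its neighbourhood)."""
    return sum(PHI[(i, j)][(profile[i], profile[j])]
               for i, j in EDGES if player in (i, j))

def is_exact_potential():
    """Check u_p(a') - u_p(a) == Phi(a') - Phi(a) for every unilateral deviation."""
    for profile in itertools.product(ACTIONS, repeat=N_PLAYERS):
        for p in range(N_PLAYERS):
            for dev in ACTIONS:
                alt = profile[:p] + (dev,) + profile[p + 1:]
                if utility(p, alt) - utility(p, profile) != potential(alt) - potential(profile):
                    return False
    return True

print(is_exact_potential())  # True for any choice of the PHI values
```

The check succeeds regardless of the random values because a deviation by player p only changes the edge terms incident to p, which are exactly the terms in p's payoff.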

An Algorithm for Computing Stochastically Stable Distributions with Applications to Multiagent Learning in Repeated Games

2005

John R. Wicks, Amy Greenwald

One of the proposed solutions to the equilibrium selection problem for agents learning in repeated games is obtained via the notion of stochastic stability. Learning algorithms are perturbed so that the Markov chain underlying the learning dynamics is necessarily irreducible and yields a unique stable distribution. The stochastically stable distribution is the limit of these stable distribution...
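
The mechanism described above (perturb the dynamics so the chain is irreducible, then let the perturbation vanish) can be illustrated numerically. This sketch is not the algorithm of Wicks and Greenwald; it simply solves for the stationary distribution of a hypothetical two-state perturbed chain at several noise levels eps, assuming transition probability eps² out of state A and eps out of state B. Analytically pi_A = 1/(1 + eps), so the mass concentrates on A as eps → 0, making A the stochastically stable state of this toy chain.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi of an irreducible chain: pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalisation row.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def perturbed_chain(eps):
    # Hypothetical example: both states are absorbing when eps = 0.
    # Leaving A needs two "mistakes" (prob eps**2), leaving B only one (prob eps).
    return np.array([[1 - eps**2, eps**2],
                     [eps, 1 - eps]])

for eps in [0.1, 0.01, 0.001]:
    print(eps, stationary(perturbed_chain(eps)))
```

Solving at a sequence of shrinking eps is only a brute-force way to see the limit; dedicated algorithms exist precisely because this becomes impractical for large chains.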

Discrete-time ratchets, the Fokker-Planck equation and Parrondo’s paradox

2004

P. Amengual, A. Allison, R. Toral, D. Abbott

Parrondo’s games manifest the apparent paradox where losing strategies can be combined to win and have generated significant multidisciplinary interest in the literature. Here we review two recent approaches, based on the Fokker-Planck equation, that rigorously establish the connection between Parrondo’s games and a physical model known as the flashing Brownian ratchet. This gives rise to a new...
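
The paradox can be checked exactly with a small Markov-chain calculation (a standard discrete analysis, not the Fokker-Planck approach reviewed in the paper). The sketch assumes the commonly quoted parameters: game A is a coin with win probability 1/2 - eps; game B uses a coin with 1/10 - eps when the capital is a multiple of 3 and 3/4 - eps otherwise; eps = 0.005. The expected gain per play is the stationary average over the state "capital mod 3". Function names are hypothetical.

```python
import numpy as np

EPS = 0.005  # small bias that makes each game losing when played alone

def win_probs(game):
    """Win probability in each state s = capital mod 3 for 'A', 'B', or 'AB' (50/50 random mix)."""
    p_a = 0.5 - EPS
    p_b = [0.1 - EPS, 0.75 - EPS, 0.75 - EPS]  # "bad" coin when capital % 3 == 0
    if game == "A":
        return [p_a] * 3
    if game == "B":
        return p_b
    return [(p_a + q) / 2 for q in p_b]  # pick A or B with probability 1/2 each play

def expected_gain(game):
    q = win_probs(game)
    # Chain on capital mod 3: a win moves s -> s+1, a loss moves s -> s-1 (mod 3).
    P = np.zeros((3, 3))
    for s in range(3):
        P[s, (s + 1) % 3] = q[s]
        P[s, (s - 1) % 3] = 1 - q[s]
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi /= pi.sum()
    # Average change in capital per play: +1 with prob q_s, -1 otherwise.
    return float(sum(pi[s] * (2 * q[s] - 1) for s in range(3)))

for g in ["A", "B", "AB"]:
    print(g, round(expected_gain(g), 4))  # A and B negative, AB positive
```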

Learning Automata Based Multi-agent System Algorithms for Finding Optimal Policies in Markov Games

2010

Behrooz Masoumi, Mohammad Reza Meybodi

Markov games, as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems (MAS). In the Markov game view, a MAS is modeled as a sequence of games played by multiple players, where each game corresponds to a different state of the environment. In this paper, several learning automata based multiagent system algorithms f...
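
The "one game per state" view can be made concrete with a small data structure. The sketch below is a hypothetical two-player representation (not taken from the paper): each state holds a matrix game giving a reward pair for every joint action, plus a distribution over next states that depends on the joint action. With a single player it reduces to an ordinary MDP.

```python
import random
from dataclasses import dataclass

@dataclass
class MarkovGame:
    """Minimal two-player Markov game (hypothetical representation).

    rewards[s][a1][a2]     -> (r1, r2): the matrix game played in state s
    transitions[s][a1][a2] -> list of probabilities over next states
    """
    rewards: list
    transitions: list

    def step(self, s, a1, a2, rng):
        """Play joint action (a1, a2) in state s; return (next_state, reward_pair)."""
        r = self.rewards[s][a1][a2]
        probs = self.transitions[s][a1][a2]
        s_next = rng.choices(range(len(probs)), weights=probs)[0]
        return s_next, r

# Hypothetical 2-state, 2-action zero-sum example.
game = MarkovGame(
    rewards=[[[(1, -1), (0, 0)], [(0, 0), (1, -1)]],    # state 0
             [[(0, 0), (2, -2)], [(2, -2), (0, 0)]]],   # state 1
    transitions=[[[[0, 1], [1, 0]], [[1, 0], [0, 1]]],  # from 0: matching actions move to 1
                 [[[1, 0], [1, 0]], [[1, 0], [1, 0]]]], # from 1: always return to 0
)
rng = random.Random(0)
s, r = game.step(0, 0, 0, rng)  # matching actions in state 0 lead to state 1
```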

Adversarial Policy Switching with Application to RTS Games

2012

Brian King, Alan Fern, Jesse Hostetler

Complex games such as RTS games are naturally formalized as Markov games. Given a Markov game, it is often possible to hand-code or learn a set of policies that capture the diversity of possible strategies. It is also often possible to hand-code or learn an abstract simulator of the game that can estimate the outcome of playing two strategies against one another from any state. We consider how ...

Reversibility and Mixing Time for Logit Dynamics with Concurrent Updates

Journal: CoRR 2012

Vincenzo Auletta, Diodato Ferraioli, Francesco Pasquale, Paolo Penna, Giuseppe Persiano

Logit dynamics [Blume, Games and Economic Behavior, 1993] is a randomized best-response dynamics in which, at every time step, a player is selected uniformly at random and chooses a new strategy according to the “logit choice function”, i.e., a probability distribution biased towards strategies promising higher payoffs, where the bias level corresponds to the degree of rationality of the agents. ...
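
One step of the sequential logit dynamics described above can be sketched as follows. This covers only the one-player-at-a-time version from the abstract, not the concurrent-update variant the paper studies. The rationality parameter is called `beta` here (a naming assumption): `beta = 0` gives uniform random choice and large `beta` approaches a pure best response. The coordination game is a hypothetical example.

```python
import math
import random

def logit_choice(payoffs, beta):
    """Logit choice function: P(s) proportional to exp(beta * payoff(s))."""
    m = max(payoffs)  # subtract the max payoff for numerical stability
    weights = [math.exp(beta * (u - m)) for u in payoffs]
    z = sum(weights)
    return [w / z for w in weights]

def logit_step(profile, utility, n_strategies, beta, rng):
    """One step: a uniformly random player revises its strategy via the logit rule."""
    i = rng.randrange(len(profile))
    payoffs = [utility(i, profile[:i] + (s,) + profile[i + 1:])
               for s in range(n_strategies)]
    probs = logit_choice(payoffs, beta)
    new_s = rng.choices(range(n_strategies), weights=probs)[0]
    return profile[:i] + (new_s,) + profile[i + 1:]

# Hypothetical 2-player coordination game: payoff 1 if both pick the same strategy.
def coordination(player, profile):
    return 1.0 if profile[0] == profile[1] else 0.0

rng = random.Random(0)
x = (0, 1)
for _ in range(1000):
    x = logit_step(x, coordination, 2, beta=2.0, rng=rng)
```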

The Value of Markov Chain Games with Lack of Information on One Side

Journal: Math. Oper. Res. 2006

Jérôme Renault

We consider a two-player zero-sum game given by a Markov chain over a finite set of states $K$ and a family of zero-sum matrix games $(G_k)_{k \in K}$. The sequence of states follows the Markov chain. At the beginning of each stage, only player 1 is informed of the current state $k$, then the game $G_k$ is played, the actions played are observed by both players and the play proceeds to the next stage. We call su...
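
The information structure of one play can be sketched as a simulation. This does not compute the value studied in the paper; it only shows who sees what: player 1's rule receives the states so far, player 2's rule receives past actions only. All names, the two-state games, and the strategies are hypothetical. In the example, player 1 naively reveals the state by playing `i = k`, and player 2 exploits this against a persistent chain.

```python
import random

def simulate(M, G, sigma1, sigma2, k0, n_stages, rng):
    """Average payoff to player 1 over n_stages of one play.

    M[k][k2]:   transition probability of the state Markov chain
    G[k][i][j]: payoff to player 1 in matrix game G_k (player 2 gets the negative)
    sigma1(history, states): player 1 sees past actions and all states so far
    sigma2(history):         player 2 sees past actions only
    """
    k, states, history, total = k0, [], [], 0.0
    for _ in range(n_stages):
        states.append(k)                  # only player 1 is told the current state
        i = sigma1(history, states)
        j = sigma2(history)
        total += G[k][i][j]
        history.append((i, j))            # both players observe the actions
        k = rng.choices(range(len(M)), weights=M[k])[0]  # next state from the chain
    return total / n_stages

# Hypothetical games: player 1 scores 1 only when i == j == k.
G = [[[1, 0], [0, 0]],   # G_0
     [[0, 0], [0, 1]]]   # G_1
reveal = lambda history, states: states[-1]                        # player 1 plays i = k
counter = lambda history: 1 - history[-1][0] if history else 0   # player 2 avoids last i
M_sticky = [[1.0, 0.0], [0.0, 1.0]]  # state never changes
avg = simulate(M_sticky, G, reveal, counter, k0=0, n_stages=10, rng=random.Random(0))
```

Here player 1 scores only at the first stage, before any action has revealed the state, so `avg` is 1/10.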
