Search results for: markov games

Number of results: 126585

2016
Julien Pérolat Bilal Piot Matthieu Geist Bruno Scherrer Olivier Pietquin

This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it reveals that state-of-the-art algorithms can be derived by the direct application of Newton’s method to different norms of the OBR. More precisely, when applied to the norm of the OBR, Newton’s method results...
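The connection the abstract points to — Newton's method applied to a norm of the Optimal Bellman Residual recovering known algorithms — can be illustrated on a single-agent MDP standing in for one player's problem (the tiny MDP below is invented for the sketch). Linearizing the optimal Bellman operator at the greedy policy turns each Newton step into a policy-evaluation solve, i.e. policy iteration:

```python
gamma = 0.9
# Hypothetical 2-state, 2-action MDP: rewards r[s][a], transitions P[s][a].
r = [[0.0, 1.0], [2.0, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.5, 0.5], [1.0, 0.0]]]

def bellman(V):
    """Optimal Bellman operator T: (TV)(s) = max_a r(s,a) + gamma * E[V(s')]."""
    return [max(r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(2))
                for a in range(2)) for s in range(2)]

def greedy(V):
    """Policy at which T is linearized (the 'derivative' of T at V)."""
    return [max(range(2), key=lambda a: r[s][a] +
                gamma * sum(P[s][a][t] * V[t] for t in range(2)))
            for s in range(2)]

V = [0.0, 0.0]
for _ in range(50):                       # outer Newton iterations
    pi = greedy(V)
    # Newton step: solve (I - gamma * P_pi) V = r_pi, done here by
    # fixed-point iteration on the policy-evaluation operator.
    for _ in range(2000):
        V = [r[s][pi[s]] + gamma * sum(P[s][pi[s]][t] * V[t] for t in range(2))
             for s in range(2)]

residual = max(abs(tv - v) for tv, v in zip(bellman(V), V))
print(residual)    # near zero: V solves the optimal Bellman residual
```

The same linearize-then-solve pattern is what the paper studies for the two-player (zero-sum) operator.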

2004
Raghav Aras Alain Dutech François Charpillet

In this paper, we present a communication-integrated reinforcement-learning algorithm for a general-sum Markov game (MG) played by independent, cooperative agents. The algorithm assumes that agents can communicate but do not know the purpose (the semantics) of doing so. We model agents that have different tasks, some of which may be commonly beneficial. The objective of the agents is to determin...

Journal: :CoRR 2013
Yanling Chang Alan L. Erera Chelsea C. White

The intent of this research is to generate a set of non-dominated policies from which one of two agents (the leader) can select a most preferred policy to control a dynamic system that is also affected by the control decisions of the other agent (the follower). The problem is described by an infinite horizon, partially observed Markov game (POMG). The actions of the agents are selected simultan...

Journal: :Cognitive Systems Research 2001
Michael L. Littman

Markov games are a model of multiagent environments that are convenient for studying multiagent reinforcement learning. This paper describes a set of reinforcement-learning algorithms based on estimating value functions and presents convergence theorems for these algorithms. The main contribution of this paper is that it presents the convergence theorems in a way that makes it easy to reason ab...
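The value-function algorithms this abstract refers to all target the same fixed point; a minimal model-based sketch of it is Shapley's value iteration, where each backup solves a stage matrix game. The 2-state zero-sum game below and its payoffs are invented for illustration; the 2x2 matrix-game solver uses the standard closed form:

```python
gamma = 0.9

def matrix_game_value(M):
    """Value of a 2x2 zero-sum matrix game for the row (max) player."""
    (a, b), (c, d) = M
    lower = max(min(a, b), min(c, d))     # best guaranteed pure row payoff
    upper = min(max(a, c), max(b, d))     # best pure column response
    if lower == upper:                    # pure saddle point
        return lower
    return (a * d - b * c) / (a + d - b - c)   # fully mixed equilibrium value

# Hypothetical 2-state zero-sum Markov game: stage payoffs R[s][a][o] for the
# row player and deterministic transitions Tnext[s][a][o].
R = [[[1.0, -1.0], [-1.0, 1.0]],          # state 0: matching pennies
     [[2.0, 0.0], [0.0, 1.0]]]
Tnext = [[[1, 0], [0, 1]],
         [[0, 0], [1, 1]]]

V = [0.0, 0.0]
for _ in range(500):                      # Shapley's value iteration
    V = [matrix_game_value([[R[s][a][o] + gamma * V[Tnext[s][a][o]]
                             for o in range(2)]
                            for a in range(2)])
         for s in range(2)]
print(V)
```

The sampled, model-free analogue of this backup is exactly the minimax-Q style of update the convergence theorems cover.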

2014
A. S. Nowak

In many real-life situations, the preferences of an economic agent change over time. Rational behaviour of such agents was studied by many authors (Strotz, Pollak, Bernheim and Ray) who considered so-called “consistent plans”. Phelps and Pollak [10] introduced the notion of “quasi-hyperbolic discounting”, which is a modification of the classical discounting proposed in 1937 by Samuelson. Within...
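The quasi-hyperbolic scheme mentioned here replaces Samuelson's exponential weights delta**t with the sequence 1, beta*delta, beta*delta**2, ...; a tiny sketch of the two discount rules (the utility stream and parameter values are arbitrary):

```python
def discounted_value(utils, delta):
    """Classical exponential discounting (Samuelson, 1937)."""
    return sum(delta ** t * u for t, u in enumerate(utils))

def quasi_hyperbolic_value(utils, beta, delta):
    """Phelps-Pollak quasi-hyperbolic (beta-delta) discounting:
    weight 1 on today, beta * delta**t on every period t >= 1."""
    return utils[0] + beta * sum(delta ** t * u
                                 for t, u in enumerate(utils) if t >= 1)

utils = [1.0, 1.0, 1.0]
exp_val = discounted_value(utils, 0.9)            # 1 + 0.9 + 0.81 = 2.71
qh_val = quasi_hyperbolic_value(utils, 0.7, 0.9)  # 1 + 0.7*(0.9 + 0.81) = 2.197
print(exp_val, qh_val)
```

With beta < 1 the immediate period is over-weighted relative to all future ones, which is the source of the time-inconsistent preferences that "consistent plans" address.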

2008
S. Moya

In this paper a regularized version of the "extraproximal method" is suggested for finding a Nash equilibrium in a multi-participant finite game where the dynamics of each player is governed by a finite controllable Markov chain. The suggested iterative technique realizes the application of a two-step procedure at each iteration: at the first (or preliminary) step some "predictive ...

Journal: :Int. J. Game Theory 1997
János Flesch Frank Thuijsman Koos Vrieze

We examine a three-person stochastic game where the only existing equilibria consist of cyclic Markov strategies. Unlike in two-person games of a similar type, stationary ε-equilibria (ε > 0) do not exist for this game. Besides, we characterize the set of feasible equilibrium rewards.

2004
Ville Könönen

The main aim of this paper is to extend the single-agent policy gradient method for multiagent domains where all agents share the same utility function. We formulate these team problems as Markov games endowed with the asymmetric equilibrium concept and based on this formulation, we provide a direct policy gradient learning method. In addition, we test the proposed method with a small example p...
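A hedged sketch of a policy-gradient method for a team Markov game, reduced here to a one-shot shared-payoff game with two independent REINFORCE learners (the payoff matrix, learning rate, and episode count are invented, and this is plain REINFORCE, not the paper's asymmetric-equilibrium formulation):

```python
import math
import random

random.seed(0)
payoff = [[0.0, 0.0], [0.0, 1.0]]   # shared team reward; only joint (1, 1) pays

def softmax(theta):
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [x / s for x in z]

thetas = [[0.0, 0.0], [0.0, 0.0]]   # one logit vector per agent
lr = 0.2
for _ in range(5000):
    probs = [softmax(t) for t in thetas]
    a1 = int(random.random() < probs[0][1])   # each agent samples independently
    a2 = int(random.random() < probs[1][1])
    reward = payoff[a1][a2]                   # same signal for both agents
    for i, a in enumerate((a1, a2)):
        for k in range(2):                    # REINFORCE: grad log pi = 1[a=k] - pi(k)
            thetas[i][k] += lr * reward * ((1.0 if a == k else 0.0) - probs[i][k])

final = [softmax(t) for t in thetas]
print(final)
```

Because both agents ascend the same expected-utility surface, the independent gradient updates coordinate on the rewarding joint action.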

2011
Endre Boros Khaled M. Elbassioni Mahmoud Fouz Vladimir Gurvich Kazuhisa Makino Bodo Manthey

We consider two-person zero-sum stochastic mean payoff games with perfect information modeled by a digraph with black, white, and random vertices. These BWR-games are polynomially equivalent to the classical Gillette games, which include many well-known subclasses, such as cyclic games, simple stochastic games, stochastic parity games, and Markov decision processes. They can also be use...
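For intuition, the mean payoff of the BW special case (no random vertices) can be approximated by finite-horizon value iteration, since v_T / T converges to the mean-payoff value; the two-vertex digraph below is invented for the sketch:

```python
# Hypothetical BW-game: vertex 0 is White (maximizer), vertex 1 is Black
# (minimizer); each edge is (head vertex, local reward).
edges = {0: [(0, 1.0), (1, 3.0)],
         1: [(1, 0.0), (0, 1.0)]}
is_white = {0: True, 1: False}

T = 1000                               # horizon; mean payoff ~ v_T / T
v = {u: 0.0 for u in edges}
for _ in range(T):                     # backward induction on the horizon
    v = {u: (max if is_white[u] else min)(w + v[h] for h, w in outs)
         for u, outs in edges.items()}

means = {u: v[u] / T for u in edges}
print(means)   # White secures ~1 from vertex 0; Black forces ~0 from vertex 1
```

Here White's best option at vertex 0 is the weight-1 self-loop (moving to vertex 1 lets Black loop forever at weight 0), so the long-run averages are 1 and 0 respectively.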

Journal: :SIAM J. Control and Optimization 2008
Erik Ekström Goran Peskir

where the horizon T (the upper bound for τ and σ above) may be either finite or infinite (it is assumed that G1(X_T) = G2(X_T) if T is finite, and lim inf_{t→∞} G2(X_t) ≤ lim sup_{t→∞} G1(X_t) if T is infinite). If X is right-continuous, then the Stackelberg equilibrium holds, in the sense that V^*(x) = V_*(x) for all x, with V := V^* = V_* defining a measurable function. If X is right-continuous and left-co...
