Search results for: markov games
Number of results: 126585
This paper reports theoretical and empirical investigations into the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov games. First, it reveals that state-of-the-art algorithms can be derived by direct application of Newton’s method to different norms of the OBR. More precisely, when applied to the norm of the OBR, Newton’s method results...
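As a rough illustration of the connection this abstract points to, the sketch below applies a Newton step to the Bellman residual B(v) = Tv − v of a small zero-sum Markov game; solving the linearized system exactly recovers a policy-iteration-style update. The names (P, R, gamma) and the pure-strategy saddle-point simplification are assumptions made here for brevity, not the paper's construction.

```python
import numpy as np

# Illustrative sketch only: Newton's method on the Bellman residual
# B(v) = T v - v of a zero-sum Markov game. As a simplification (not made
# in the paper), we assume every stage game has a pure-strategy saddle
# point, so the minimax reduces to a max-min over pure actions.

def newton_obr(P, R, gamma, iters=50):
    """P: transitions, shape (S, A1, A2, S); R: rewards, shape (S, A1, A2)."""
    S = R.shape[0]
    v = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P @ v                       # shape (S, A1, A2)
        # Greedy joint policy: max over player 1, min over player 2.
        a1 = Q.min(axis=2).argmax(axis=1)           # player 1's action per state
        a2 = Q[np.arange(S), a1].argmin(axis=1)     # player 2's best reply
        P_pi = P[np.arange(S), a1, a2]              # (S, S) under the joint policy
        r_pi = R[np.arange(S), a1, a2]
        # Newton step: solve the linearized system (I - gamma * P_pi) v = r_pi.
        v_new = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        if np.allclose(v_new, v):
            break
        v = v_new
    return v
```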
In this paper, we present a communication-integrated reinforcement-learning algorithm for a general-sum Markov game (MG) played by independent, cooperative agents. The algorithm assumes that agents can communicate but do not know the purpose (the semantics) of doing so. We model agents that have different tasks, some of which may be commonly beneficial. The objective of the agents is to determin...
The intent of this research is to generate a set of non-dominated policies from which one of two agents (the leader) can select a most preferred policy to control a dynamic system that is also affected by the control decisions of the other agent (the follower). The problem is described by an infinite-horizon, partially observed Markov game (POMG). The actions of the agents are selected simultan...
Markov games are a model of multiagent environments that are convenient for studying multiagent reinforcement learning. This paper describes a set of reinforcement-learning algorithms based on estimating value functions and presents convergence theorems for these algorithms. The main contribution of this paper is that it presents the convergence theorems in a way that makes it easy to reason ab...
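For concreteness, here is a hedged sketch of one well-known value-function algorithm for zero-sum Markov games, tabular minimax-Q, in which the stage matrix game at the next state is solved with a linear program. All names and shapes are illustrative assumptions, not necessarily the algorithms analyzed in this paper.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (rows maximize) via an LP."""
    n_rows, n_cols = M.shape
    c = np.zeros(n_rows + 1); c[-1] = -1.0            # maximize v
    A_ub = np.hstack([-M.T, np.ones((n_cols, 1))])    # v <= x^T M[:, j] for all j
    b_ub = np.zeros(n_cols)
    A_eq = np.ones((1, n_rows + 1)); A_eq[0, -1] = 0  # mixed strategy sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n_rows + [(None, None)])
    return res.x[-1]

def minimax_q_update(Q, s, a, o, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular minimax-Q step after observing (s, a, o, r, s')."""
    target = r + gamma * matrix_game_value(Q[s_next])
    Q[s, a, o] += alpha * (target - Q[s, a, o])
    return Q
```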
In many real-life situations, the preferences of an economic agent change over time. Rational behaviour of such agents was studied by many authors (Strotz, Pollak, Bernheim and Ray) who considered so-called “consistent plans”. Phelps and Pollak [10] introduced the notion of “quasi-hyperbolic discounting”, which is a modification of the classical discounting proposed in 1937 by Samuelson. Within...
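For reference, the standard (β, δ) quasi-hyperbolic form, as it is usually written in textbooks (the paper's own notation may differ), discounts the entire future by an extra factor β on top of exponential discounting:

$$ U_0 = u(c_0) + \beta \sum_{t=1}^{\infty} \delta^{t}\, u(c_t), \qquad 0 < \beta \le 1,\quad 0 < \delta < 1, $$

with β = 1 recovering Samuelson's classical exponential discounting, and β < 1 producing the present bias that makes naive plans time-inconsistent.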
In this paper a regularized version of the "extraproximal method" is suggested for finding a Nash equilibrium in a finite multiparticipant game where the dynamics of each player are governed by a finite controllable Markov chain. The suggested iterative technique applies a two-step procedure at each iteration: at the first (or preliminary) step some "predictive ...
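The two-step structure this abstract describes (a predictive step followed by a basic step) is the same predict-then-correct pattern used by extragradient methods. The toy sketch below shows that pattern on a bilinear saddle-point problem; the step size, problem, and names are illustrative assumptions, and this is not the paper's regularized scheme.

```python
import numpy as np

def extragradient(A, x, y, step=0.1, iters=200):
    """Find a saddle point of f(x, y) = x^T A y via two-step iterations."""
    for _ in range(iters):
        # Prediction (preliminary) step from the current point.
        x_half = x - step * (A @ y)        # gradient descent in x
        y_half = y + step * (A.T @ x)      # gradient ascent in y
        # Correction (basic) step uses gradients at the predicted point.
        x = x - step * (A @ y_half)
        y = y + step * (A.T @ x_half)
    return x, y
```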
We examine a three-person stochastic game where the only existing equilibria consist of cyclic Markov strategies. Unlike in two-person games of a similar type, stationary ε-equilibria (ε > 0) do not exist for this game. Besides, we characterize the set of feasible equilibrium rewards.
The main aim of this paper is to extend the single-agent policy gradient method to multiagent domains where all agents share the same utility function. We formulate these team problems as Markov games endowed with an asymmetric equilibrium concept and, based on this formulation, provide a direct policy gradient learning method. In addition, we test the proposed method with a small example p...
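A minimal sketch of what a shared-utility (team) policy gradient can look like: every agent keeps its own softmax policy, but all agents reinforce with the same team return. The environment interface (env.reset, env.step) and all names here are assumptions for illustration, not the paper's method.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def team_policy_gradient(env, n_agents, n_states, n_actions,
                         episodes=500, lr=0.01, gamma=0.99):
    """REINFORCE-style updates; one shared reward drives every agent."""
    theta = np.zeros((n_agents, n_states, n_actions))
    for _ in range(episodes):
        s, done, traj = env.reset(), False, []
        while not done:
            probs = np.array([softmax(theta[i, s]) for i in range(n_agents)])
            acts = [np.random.choice(n_actions, p=probs[i])
                    for i in range(n_agents)]
            s_next, r, done = env.step(acts)      # one shared team reward
            traj.append((s, acts, probs, r))
            s = s_next
        G = 0.0
        for s, acts, probs, r in reversed(traj):  # Monte Carlo returns
            G = r + gamma * G
            for i, a in enumerate(acts):
                grad = -probs[i]                  # d log pi_i(a|s) / d theta_i
                grad[a] += 1.0
                theta[i, s] += lr * G * grad      # same return G for all agents
    return theta
```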
We consider two-person zero-sum stochastic mean-payoff games with perfect information, modeled by a digraph with black, white, and random vertices. These BWR-games are polynomially equivalent to the classical Gillette games, which include many well-known subclasses, such as cyclic games, simple stochastic games, stochastic parity games, and Markov decision processes. They can also be use...
where the horizon $T$ (the upper bound for $\tau$ and $\sigma$ above) may be either finite or infinite (it is assumed that $G_1(X_T) = G_2(X_T)$ if $T$ is finite, and $\liminf_{t\to\infty} G_2(X_t) \le \limsup_{t\to\infty} G_1(X_t)$ if $T$ is infinite). If $X$ is right-continuous, then the Stackelberg equilibrium holds, in the sense that $V^*(x) = V_*(x)$ for all $x$, with $V := V^* = V_*$ defining a measurable function. If $X$ is right-continuous and left-co...