Successive Approximation Methods in Undiscounted Stochastic Games

Author

  • Awi Federgruen
Abstract

This paper considers two-person, zero-sum stochastic games with finite state space Ω = {1, ..., N} and, in each state i ∈ Ω, two finite sets K(i) and L(i) of actions available to players 1 and 2, respectively. The state of the system is observed at equidistant epochs. When the system is observed to be in state i, the two players choose an action, or a randomization of actions, out of K(i) and L(i), respectively. When the actions k ∈ K(i) and l ∈ L(i) are chosen in state i, p_{ij}^{kl} ≥ 0 denotes the probability that state j is the next state to be observed (with Σ_j p_{ij}^{kl} = 1), and q_i^{kl} is the one-step expected reward earned by player 1 from player 2. If the payoffs are discounted at the interest rate r > 0, the stochastic game is called the r-discounted game. The existence of a value and of stationary optimal policies in the r-discounted game goes back essentially to Shapley [22]; in addition, it is easily verified that value iteration converges to the value of the game, since the value-iteration operator is a contraction mapping on E^N, the N-dimensional Euclidean space. In the undiscounted version of the game, i.e., when the long-run average return per unit time is the criterion, one or both players may fail to have optimal stationary policies, as an example in Gillette [11] shows. Both for this model and for the case of more general state and action spaces, recurrency conditions on the transition probability matrices (tpm's) associated with the stationary policies have been obtained under which the existence of a stationary pair of equilibrium policies (AEP) is guaranteed (see Federgruen [7], Hoffman and Karp [13], Rogers [18], Sobel [23], and Stern [24]). So far, very little attention has been paid to the actual computation of both the asymptotic average value g* and of a solution v* to the average return optimality equation.
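The contraction argument the abstract mentions for the r-discounted game can be made concrete: each value-iteration step replaces v(i) by the value of the matrix game with payoffs q_i^{kl} + β Σ_j p_{ij}^{kl} v(j), where β = 1/(1+r) < 1 is the discount factor. The sketch below (illustrative only; the function names and toy data are not from the paper) solves each one-shot matrix game by linear programming:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game A (row player maximizes)."""
    m, n = A.shape
    # Variables: mixed strategy x (length m) and the game value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # maximize v  <=>  minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - x . A[:, l] <= 0 for each column l
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                          # x sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=np.ones(1),
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

def shapley_value_iteration(q, p, beta, iters=200):
    """Shapley-style value iteration for the discounted game.

    q[i] is the K(i) x L(i) reward matrix in state i, p[i][k, l] is the
    next-state distribution under actions (k, l), and beta = 1/(1+r) is the
    discount factor.  The update is a contraction with modulus beta, so the
    iterates converge geometrically to the value of the discounted game.
    """
    N = len(q)
    v = np.zeros(N)
    for _ in range(iters):
        v_new = np.empty(N)
        for i in range(N):
            # Auxiliary matrix game: immediate reward plus discounted continuation.
            A = q[i] + beta * np.einsum('klj,j->kl', p[i], v)
            v_new[i] = matrix_game_value(A)
        v = v_new
    return v
```

For a single absorbing state with constant reward 1 the iterates converge to 1/(1-β), and for a matching-pennies reward matrix the one-shot game value is 0, which gives a quick sanity check on both routines.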


Related articles

Discounted approximations of undiscounted stochastic games and Markov decision processes are already poor in the almost deterministic case

It is shown that the discount factor needed to solve an undiscounted mean payoff stochastic game to optimality is exponentially close to 1, even in one-player games with a single random node and polynomially bounded rewards and transition probabilities. On the other hand, for the class of the so-called irreducible games with perfect information and a constant number of random nodes, we obtain a ...


Exact Algorithms for Solving Stochastic Games

Shapley’s discounted stochastic games, Everett’s recursive games and Gillette’s undiscounted stochastic games are classical models of game theory describing two-player zero-sum games of potentially infinite duration. We describe algorithms for exactly solving these games. When the number of positions of the game is constant, our algorithms run in polynomial time.


Markov Decision Processes and Stochastic Games with Total Effective Payoff

We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a sadd...


epsilon-Equilibria for Stochastic Games with Uncountable State Space and Unbounded Costs

We study a class of noncooperative stochastic games with unbounded cost functions and an uncountable state space. It is assumed that the transition law is absolutely continuous with respect to some probability measure on the state space. Undiscounted stochastic games with expected average costs are considered first. It is shown under a uniform geometric ergodicity assumption that there exists a...


A lower bound for discounting algorithms solving two-person zero-sum limit average payoff stochastic games

It is shown that the discount factor needed to solve an undiscounted mean payoff stochastic game to optimality is exponentially close to 1, even in games with a single random node and polynomially bounded rewards and transition probabilities.



Journal:
  • Operations Research

Volume 28, Issue -

Pages -

Publication date: 1980