MS&E 336 Lecture 4: Stochastic games

Author

  • Ramesh Johari
Abstract

[…] ∏i Ai(x), a stage payoff Qi(a;x).
4. For each state x and action vector a ∈ ∏i Ai(x), a transition probability P(x′|x,a) that is a distribution on the state space X.
5. A discount factor δ, 0 < δ < 1.
6. An initial state x.

Play proceeds as follows. The game starts in state x. At each stage t, all players simultaneously choose (possibly mixed) actions ai, with possible pure actions given by the set Ai(x). The stage payoffs Qi(a;x) are realized, and the next state is chosen according to P(·|x,a). All players observe the entire past history of play before choosing their actions at stage t. (This is the simplest assumption; versions with partial monitoring have also been studied.)

As usual, let si denote a strategy for player i in this dynamic game; it is a mapping from histories (including states and actions) to actions. (After any history leading to state x, player i's strategy must choose an action in Ai(x).) Given strategies s1, . . . , sN, the expected discounted payoff of player i starting from state x is:
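The abstract is cut off before the payoff expression, so the following is only a minimal sketch of how such a payoff could be computed, not the formula from the lecture. It assumes the payoff is the unnormalized sum of δ^t times the stage payoff, and it restricts attention to stationary pure strategies (one joint action per state), which is narrower than the history-dependent strategies described above. The two-state game, the action labels, and the function name `discounted_payoffs` are all hypothetical.

```python
def discounted_payoffs(states, strategy, Q, P, delta, sweeps=500):
    """Approximate each player's expected discounted payoff from every state.

    Assumptions (not from the source): stationary pure strategies and an
    unnormalized discounted sum. Repeatedly applies
        V_i(x) = Q_i(a; x) + delta * sum_y P(y | x, a) * V_i(y),
    where a = strategy[x] is the joint action played in state x.
    """
    n_players = len(Q[(states[0], strategy[states[0]])])
    V = {x: [0.0] * n_players for x in states}
    for _ in range(sweeps):
        new_V = {}
        for x in states:
            a = strategy[x]
            new_V[x] = [
                Q[(x, a)][i]
                + delta * sum(p * V[y][i] for y, p in P[(x, a)].items())
                for i in range(n_players)
            ]
        V = new_V
    return V


# Hypothetical two-player, two-state game; only the joint actions that the
# fixed strategies actually play are tabulated.
states = ["A", "B"]
strategy = {"A": ("go", "wait"), "B": ("stay", "wait")}
Q = {  # stage payoffs (player 1, player 2)
    ("A", ("go", "wait")): (1.0, 0.0),
    ("B", ("stay", "wait")): (0.0, 2.0),
}
P = {  # transition probabilities P(next state | state, joint action)
    ("A", ("go", "wait")): {"A": 0.5, "B": 0.5},
    ("B", ("stay", "wait")): {"B": 1.0},
}
delta = 0.5

V = discounted_payoffs(states, strategy, Q, P, delta)
print(V)
```

In this toy game state B is absorbing, so player 2 earns 2/(1 − δ) = 4 from B, and solving the two linear equations for state A gives 4/3 for each player. The iteration converges because δ < 1 makes each sweep a contraction.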


Similar resources

MS&E 336 Lecture 11: The multiplicative weights algorithm

This lecture is based on the corresponding paper of Freund and Schapire [2], though with some differences in notation and analysis. We introduce and study the multiplicative weights (MW) algorithm, which is an external regret minimizing (i.e., Hannan consistent) algorithm for playing a game. The same algorithm has been analyzed in various forms, particularly in the study of online learning; see...


MS&E 336 Lecture 1: Dynamic games

• For each player i, a partition Ii of {h ∈ H : P(h) = i} (i.e., each history occurs in exactly one of the sets in Ii), such that if h, h′ ∈ Ii, then A(h) = A(h′). The collection Ii is the information partition of player i, and a member Ii ∈ Ii is an information set of player i. For any information set I, let P(I) and A(I) denote the player and action set corresponding to the information set. ...


MS&E 336 Lecture 15: Calibration

Calibration is a concept that tries to formalize a notion of quality for forecasters. For example, suppose a weatherman predicts each day whether it will rain or be sunny. Typically forecasters will predict such events in terms of probabilities, i.e., “There is a 30% chance of rain.” Given only the outcome that day, it is impossible to judge the quality of such a forecast. However, if we c...


Institute for Mathematics and Its Applications

We will discuss the regularity theory and the geometry of the free boundary for free boundary problems of obstacle type, but without positivity assumptions, for instance the Pompeiu problem. 4:15–5:00 pm, Wendell Fleming (Brown University): Risk Sensitive Stochastic Control. Abstract: Risk sensitive control provides a link between deterministic and stochastic modelling of disturbances in control syst...


Perturbations of Markov Chains with Applications to Stochastic Games

In this lecture we will review several topics that are extensively used in the study of n-player stochastic games. These tools were used in the proof of several results on non zero-sum stochastic games. Most of the results that are presented here appeared in Vieille (1997a,b), and some appeared in Solan (1998, 1999). The first main issue is Markov chains where the transition rule is a Puiseux p...




Publication date: 2007