MS&E 336 Lecture 4: Stochastic games
Abstract
… a ∈ ∏i Ai(x), a stage payoff Qi(a; x).
4. For each state x and action vector a ∈ ∏i Ai(x), a transition probability P(x′|x, a) that is a distribution on the state space X.
5. A discount factor δ, 0 < δ < 1.
6. An initial state x.

Play proceeds as follows. The game starts in state x. At each stage t, all players simultaneously choose (possibly mixed) actions ai, with possible pure actions given by the set Ai(xt), where xt is the current state. The stage payoffs Qi(a; xt) are realized, and the next state is chosen according to P(·|xt, a). All players observe the entire past history of play before choosing their actions at stage t. (This is the simplest assumption; versions with partial monitoring have also been studied.) As usual, let si denote a strategy for player i in this dynamic game; it is a mapping from histories (including states and actions) to actions. (After any history leading to state x, player i's strategy must choose an action in Ai(x).) Given strategies s1, . . . , sN, the expected discounted payoff of player i starting from state x is:
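The displayed payoff formula is cut off in this excerpt. A standard way to write it, consistent with the notation above, is sketched below; the payoff symbol Πi, the superscript-t notation for stage-t states and actions, and the absence of a (1 − δ) normalization factor are choices made for this sketch rather than details taken from the original notes.

    \Pi_i(s_1, \ldots, s_N; x) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \delta^{t}\, Q_i(a^{t}; x^{t}) \;\Big|\; x^{0} = x \right]

Here the expectation is taken over the players' (possibly mixed) action choices and the state transitions P(·|x^t, a^t).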
Similar resources
MS&E 336 Lecture 11: The multiplicative weights algorithm
This lecture is based on the corresponding paper of Freund and Schapire [2], though with some differences in notation and analysis. We introduce and study the multiplicative weights (MW) algorithm, which is an external regret minimizing (i.e., Hannan consistent) algorithm for playing a game. The same algorithm has been analyzed in various forms, particularly in the study of online learning; see...
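The excerpt above only names the algorithm. As a rough illustration, here is a minimal sketch of a multiplicative-weights (exponential-weights) update over K actions in the spirit of Freund and Schapire; the function name mw_play, the learning-rate parameter eta, and the convention that losses lie in [0, 1] are assumptions made for this sketch, not details from the lecture.

    import numpy as np

    def mw_play(loss_fn, K, T, eta=0.1):
        """Play T rounds of multiplicative weights over K actions.

        loss_fn(t, p) should return a length-K vector of losses in [0, 1]
        for round t, given the current mixed strategy p (it may ignore p).
        Returns the list of mixed strategies played.
        """
        w = np.ones(K)                      # one weight per action
        strategies = []
        for t in range(T):
            p = w / w.sum()                 # mixed strategy: normalized weights
            strategies.append(p)
            losses = np.asarray(loss_fn(t, p), dtype=float)
            w *= np.exp(-eta * losses)      # multiplicative (exponential) update
        return strategies

With an appropriately tuned eta, the average loss of such an update is close to that of the best fixed action in hindsight, which is the external-regret (Hannan-consistency) guarantee mentioned above.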
MS&E 336 Lecture 1: Dynamic games
• For each player i, a partition Ii of {h ∈ H : P(h) = i} (i.e., each such history occurs in exactly one of the sets in Ii), such that if h, h′ belong to the same information set in Ii, then A(h) = A(h′). The collection Ii is the information partition of player i, and a member I ∈ Ii is an information set of player i. For any information set I, let P(I) and A(I) denote the player and action set corresponding to the information set. ...
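To make the definition concrete, here is a small illustrative sketch; the histories, action sets, and partition below are invented for illustration and are not from the lecture.

    # Toy illustration of an information partition for player i.
    # Histories are strings; A maps each history to its available action set.
    A = {
        "h1": {"L", "R"},
        "h2": {"L", "R"},   # h1 and h2 are indistinguishable to player i
        "h3": {"U", "D"},
    }

    # Information partition of player i: each history at which i moves lies in
    # exactly one information set, and histories in the same information set
    # must have identical action sets.
    I_i = [{"h1", "h2"}, {"h3"}]

    for info_set in I_i:
        action_sets = {frozenset(A[h]) for h in info_set}
        assert len(action_sets) == 1, "A(h) must agree within an information set"

This is exactly the consistency requirement above: since player i cannot distinguish histories in the same information set, the same actions must be available at each of them.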
MS&E 336 Lecture 15: Calibration
Calibration is a concept that tries to formalize a notion of quality for forecasters. For example, suppose a weatherman predicts each day whether it will rain or be sunny. Typically forecasters will predict such events in terms of probabilities, i.e., “There is a 30% chance of rain.” Given only the outcome that day, it is impossible to judge the quality of such a forecast. However, if we c...
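As a rough illustration of how one might check calibration empirically, the sketch below groups days by the announced probability and compares it with the empirical frequency of rain on those days; the toy forecasts and outcomes are invented for this sketch.

    from collections import defaultdict

    # forecasts[t] is the announced probability of rain on day t;
    # outcomes[t] is 1 if it actually rained on day t, else 0.  (Toy data.)
    forecasts = [0.3, 0.3, 0.7, 0.3, 0.7, 0.1, 0.3, 0.7]
    outcomes  = [0,   1,   1,   0,   1,   0,   0,   0]

    days_by_forecast = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        days_by_forecast[p].append(y)

    for p, ys in sorted(days_by_forecast.items()):
        empirical = sum(ys) / len(ys)
        print(f"forecast {p:.1f}: empirical rain frequency {empirical:.2f} over {len(ys)} days")

Informally, a forecaster is well calibrated if, on the days a probability p is announced, the empirical frequency of rain is close to p.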
Institute for Mathematics and Its Applications
We will discuss the regularity theory and the geometry of the free boundary for free boundary problems of obstacle type, but without positivity assumptions, for instance the Pompeiu problem. 4:15–5:00 pm: Wendell Fleming (Brown University), “Risk Sensitive Stochastic Control”. Abstract: Risk sensitive control provides a link between deterministic and stochastic modelling of disturbances in control syst...
Perturbations of Markov Chains with Applications to Stochastic Games
In this lecture we will review several topics that are extensively used in the study of n-player stochastic games. These tools were used in the proofs of several results on non-zero-sum stochastic games. Most of the results presented here appeared in Vieille (1997a,b), and some appeared in Solan (1998, 1999). The first main issue is Markov chains where the transition rule is a Puiseux p...