Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games

Authors

Colin McMillen

Manuela M. Veloso

Abstract

In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermediate reward to be the difference between our score and our opponent's score; the "true" reward of a win, loss, or tie is determined at the end of a game by applying a threshold function to the cumulative intermediate reward. We introduce thresholded-rewards problems to capture this dependency of the final reward outcome on the cumulative intermediate reward. Thresholded-rewards problems reflect a variety of real-world stochastic planning domains, especially zero-sum games, in which both time and score must be considered. We investigate the application of thresholded rewards to finite-horizon Markov Decision Processes (MDPs). In general, the optimal policy for a thresholded-rewards MDP is nonstationary: it depends on the number of time steps remaining and on the cumulative intermediate reward. We introduce an efficient value iteration algorithm that solves thresholded-rewards MDPs exactly, but with running time quadratic in the number of states in the MDP and the length of the time horizon. We also investigate several heuristic-based techniques that efficiently find approximate solutions for MDPs with large state spaces or long time horizons.
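To illustrate why the optimal policy depends on both time remaining and cumulative intermediate reward, here is a minimal sketch of finite-horizon value iteration over the augmented state (steps left, MDP state, cumulative score difference). It is not the authors' algorithm or implementation. It assumes a small tabular MDP with integer intermediate rewards, a hypothetical transition table `TRANSITIONS`, and a threshold function that returns 1 for a win and 0 for a tie or loss, so the computed value is the probability of winning. The names `TRANSITIONS`, `win_threshold`, `value`, and `policy`, and the toy "safe"/"risky" actions, are all illustrative.

```python
from functools import lru_cache

# Hypothetical toy MDP: TRANSITIONS[state][action] -> list of
# (probability, next_state, intermediate_reward). The intermediate reward is
# the per-step change in (our score - opponent's score); integers assumed.
TRANSITIONS = {
    "s": {
        "safe": [(0.5, "s", +1), (0.5, "s", -1)],   # expected reward 0
        "risky": [(0.3, "s", +3), (0.7, "s", -3)],  # expected reward -1.2
    }
}


def win_threshold(cumulative: int) -> float:
    """Assumed threshold function: 1 for a win, 0 for a tie or a loss."""
    return 1.0 if cumulative > 0 else 0.0


@lru_cache(maxsize=None)
def value(steps_left: int, state: str, cumulative: int) -> float:
    """Optimal expected thresholded reward from the augmented state."""
    if steps_left == 0:
        # Game over: apply the threshold to the cumulative intermediate reward.
        return win_threshold(cumulative)
    # Bellman backup: best action over expected value one step closer to the end.
    return max(
        sum(p * value(steps_left - 1, nxt, cumulative + r) for p, nxt, r in outcomes)
        for outcomes in TRANSITIONS[state].values()
    )


def policy(steps_left: int, state: str, cumulative: int) -> str:
    """Nonstationary policy: the chosen action depends on time left and score."""
    if steps_left < 1:
        raise ValueError("no actions remain once the horizon is reached")
    actions = TRANSITIONS[state]
    return max(
        actions,
        key=lambda a: sum(
            p * value(steps_left - 1, nxt, cumulative + r) for p, nxt, r in actions[a]
        ),
    )


if __name__ == "__main__":
    print(policy(1, "s", -2))  # behind with one step left
    print(policy(1, "s", 2))   # ahead with one step left
```

With one step left and a score difference of -2, the "safe" action cannot produce a win, so `policy(1, "s", -2)` returns `"risky"` (win probability 0.3) even though "risky" has the lower expected intermediate reward. When ahead by 2, `policy(1, "s", 2)` returns `"safe"`, which wins with certainty. Memoized recursion is used here only for brevity; it makes no claim about the running-time bound stated above.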


Related articles

Thresholded-Rewards Decision Problems: Acting Effectively in Timed Domains

In timed, zero-sum games, winning against the opponent is more important than the final score. A team that is losing near the end of the game may choose to play aggressively to try to even the score before time runs out. In this thesis, we consider the problem of finding optimal policies in stochastic domains with limited time, some notion of score, and in complex environments, such as domains ...

Average Reward Timed Games

We consider real-time games where the goal consists, for each player, in maximizing the average reward he or she receives per time unit. We consider zero-sum rewards, so that a reward of +r to one player corresponds to a reward of −r to the other player. The games are played on discrete-time game structures which can be specified using a two-player version of timed automata whose locations are ...

A TRANSITION FROM TWO-PERSON ZERO-SUM GAMES TO COOPERATIVE GAMES WITH FUZZY PAYOFFS

In this paper, we deal with games with fuzzy payoffs. We prove that players who are playing a zero-sum game with fuzzy payoffs against Nature are able to increase their joint payoff, and hence their individual payoffs, by cooperating. It is shown that a cooperative game with the fuzzy characteristic function can be constructed via the optimal game values of the zero-sum games with fuzzy payoff...

Uniform Equilibrium: More Than Two Players

To date, no counterexample has been found. Furthermore, we have seen that a positive answer was given for several special classes, including recursive games (Everett, 1957), zero-sum games (Mertens and Neyman, 1981), two-player absorbing games (Vrieze and Thuijsman, 1989) and two-player non zero-sum games (Vieille, 1997b). For n-player games, existence of stationary equilibrium profiles was ...

Canonical forms of two-person zero-sum limit average payoff stochastic games

We consider two-person zero-sum stochastic games with perfect information and, for each k ∈ Z+, introduce a new payoff function, called the k-total reward. For k = 0 and 1 they are the so-called mean and total rewards, respectively. For all k, we prove solvability of the considered games in pure stationary strategies, and show that the uniformly optimal strategies for the discounted mean payoff...

Publication date: 2007