Heuristic Search Value Iteration for Zero-Sum Stochastic Games
Abstract
In sequential decision making, heuristic search algorithms allow exploiting both the initial situation and an admissible heuristic to efficiently search for an optimal solution, often for planning purposes. Such algorithms exist for problems with uncertain dynamics, partial observability, multiple criteria, or multiple collaborating agents. In this article, we look at two-player zero-sum stochastic games (zsSGs) with a discounted criterion, with a view to proposing a solution tailored to the fully observable case, while solutions have been proposed for particular, though still more general, partially observable cases. This setting induces reasoning on both a lower and an upper bound of the value function, which leads us to propose zsSG-HSVI, an algorithm based on heuristic search value iteration (HSVI), which thus relies on generating trajectories. We demonstrate that, with each player acting optimistically and employing simple heuristic initializations, HSVI's convergence in finite time to an ε-optimal solution is preserved. An empirical study of the resulting approach is conducted on benchmark problems of various sizes.
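For orientation, the quantity that the lower and upper bounds bracket can be written out as below. This is only a sketch in standard notation for discounted zero-sum stochastic games, not something stated in the abstract: the symbols $S$, $A_1$, $A_2$, $r$, $T$, $\gamma$, $\underline{V}$, $\overline{V}$, and $s_0$ are assumed names, and the stopping test is the usual HSVI-style criterion rather than a detail confirmed by this page.

```latex
% Shapley's optimality equation for a discounted zsSG (assumed notation):
% player 1 maximizes, player 2 minimizes, 0 <= gamma < 1.
V^*(s) = \max_{\mu \in \Delta(A_1)} \min_{a_2 \in A_2}
  \sum_{a_1 \in A_1} \mu(a_1) \Big[ r(s, a_1, a_2)
  + \gamma \sum_{s' \in S} T(s' \mid s, a_1, a_2)\, V^*(s') \Big]

% Bounds maintained during the search, and the stopping test at the
% initial state s_0:
\underline{V}(s) \le V^*(s) \le \overline{V}(s) \quad \forall s \in S,
\qquad
\overline{V}(s_0) - \underline{V}(s_0) \le \epsilon
```

Under these assumptions, once the gap at $s_0$ is at most $\epsilon$, either bound approximates $V^*(s_0)$ to within $\epsilon$, which is the sense in which the returned solution is ε-optimal.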
Similar resources
Definable Zero-Sum Stochastic Games
Definable zero-sum stochastic games involve a finite number of states and action sets, reward and transition functions that are definable in an o-minimal structure. Prominent examples of such games are finite, semi-algebraic or globally subanalytic stochastic games. We prove that the Shapley operator of any definable stochastic game with separable transition and reward functions is definable in...
Heuristic Search Value Iteration for POMDPs
We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI’s soundness an...
Relative Value Iteration for Stochastic Differential Games
We study zero-sum stochastic differential games with player dynamics governed by a nondegenerate controlled diffusion process. Under the assumption of uniform stability, we establish the existence of a solution to the Isaacs equation for the ergodic game and characterize the optimal stationary strategies. The data is not assumed to be bounded, nor do we assume geometric ergodicity. T...
Policy iteration algorithm for zero-sum multichain stochastic games with mean payoff and perfect information
We consider zero-sum stochastic games with finite state and action spaces, perfect information, mean payoff criteria, without any irreducibility assumption on the Markov chains associated to strategies (multichain games). The value of such a game can be characterized by a system of nonlinear equations, involving the mean payoff vector and an auxiliary vector (relative value or bias). We develop...
Symbolic Heuristic Search Value Iteration for Factored POMDPs
We propose Symbolic heuristic search value iteration (Symbolic HSVI) algorithm, which extends the heuristic search value iteration (HSVI) algorithm in order to handle factored partially observable Markov decision processes (factored POMDPs). The idea is to use algebraic decision diagrams (ADDs) for compactly representing the problem itself and all the relevant intermediate computation results i...
Journal
Journal title: IEEE Transactions on Games
Year: 2021
ISSN: ['2475-1502', '2475-1510']
DOI: https://doi.org/10.1109/tg.2020.3005214