Markov Decision Problems Where Means Bound Variances
Authors
Abstract
Similar resources
Markov Decision Problems Where Means Bound Variances
We identify a rich class of finite-horizon Markov decision problems (MDPs) for which the variance of the optimal total reward can be bounded by a simple affine function of its expected value. The class is characterized by three natural properties: reward boundedness, existence of a do-nothing action, and optimal action monotonicity. These properties are commonly present and typically easy to ch...
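One simple instance of an affine mean-variance bound can be illustrated numerically: for any nonnegative total reward bounded by M, the identity Var(R) = E[R²] − E[R]² ≤ M·E[R] always holds, which is the trivial end of the kind of bound the abstract describes. The sketch below checks this on a made-up toy finite-horizon reward process (not the paper's construction; all numbers are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

H = 10        # horizon (toy choice)
p = 0.7       # per-step probability of earning the reward
R_MAX = 1.0   # per-step reward bound, so total reward lies in [0, H * R_MAX]

def rollout():
    # Sum of H bounded, nonnegative per-step rewards.
    return sum(R_MAX if rng.random() < p else 0.0 for _ in range(H))

totals = np.array([rollout() for _ in range(10_000)])
mean, var = totals.mean(), totals.var()  # ddof=0, population variance

# The crude affine bound Var <= (H * R_MAX) * Mean holds for any reward
# process bounded in [0, H * R_MAX]; the paper's contribution is that
# structured MDPs admit much tighter affine bounds of this shape.
print(mean, var, var <= H * R_MAX * mean)
```

With the population-variance convention (`np.var` default), the final inequality is an algebraic identity for bounded nonnegative samples, so the check passes regardless of the seed.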
Markov Decision Problems
Markov Decision Problems (MDPs) are the foundation for many problems that are of interest to researchers in Artificial Intelligence and Operations Research. In this paper, we will review what is known about algorithms for solving MDPs as well as the complexity of solving MDPs in general. We will argue that, even though there are theoretically efficient algorithms for solving MDPs, these algorit...
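The standard algorithms such a survey covers include value iteration, which the following self-contained sketch implements on a two-state, two-action toy MDP (all transition and reward numbers are made up for illustration).

```python
import numpy as np

# P[a] is the |S| x |S| transition matrix of action a;
# R[a, s] is the expected immediate reward of action a in state s.
P = np.array([[[0.9, 0.1],   # action 0
               [0.2, 0.8]],
              [[0.5, 0.5],   # action 1
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [2.0, 0.5]])
gamma = 0.9                  # discount factor

V = np.zeros(2)
for _ in range(500):
    # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
    Q = R + gamma * P @ V
    V_new = Q.max(axis=0)    # greedy Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = (R + gamma * P @ V).argmax(axis=0)
print(V.round(3), policy)
```

Value iteration converges geometrically at rate gamma, which is why the loop terminates well before the iteration cap here; the complexity concerns raised in the abstract are about how this scales with |S|, |A|, and the discount factor.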
Linearly-solvable Markov decision problems
Advances in Neural Information Processing Systems 2006. We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities ...
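In this linearly-solvable formulation, the exponentiated cost-to-go (the "desirability" z = exp(−v)) satisfies a linear backward recursion, so no maximization over actions is needed. A minimal sketch, assuming a finite-horizon variant with state cost q and passive dynamics P (the toy numbers are assumptions, not from the paper):

```python
import numpy as np

q = np.array([1.0, 0.2, 0.0])      # per-state costs; last state is cheap
P = np.array([[0.5, 0.5, 0.0],     # uncontrolled (passive) Markov chain
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
H = 15                             # horizon

z = np.ones(3)                     # terminal desirability exp(-q_final), taken as 1
G = np.exp(-q)[:, None] * P        # G = diag(exp(-q)) @ P
for _ in range(H):
    z = G @ z                      # linear backward recursion: z_t = G z_{t+1}

v = -np.log(z)                     # optimal cost-to-go
u = P * z[None, :]                 # optimal controlled transitions are the
u /= u.sum(axis=1, keepdims=True)  # passive dynamics reweighted by z, renormalized
print(v.round(3))
print(u.round(3))
```

The key point the abstract makes appears directly in the code: solving for the optimal control reduces to repeated matrix-vector products with G, and the optimal controlled chain is just the passive chain tilted toward desirable states.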
Denumerable Constrained Markov Decision Problems and Finite Approximations
The purpose of this paper is twofold. First, to establish the theory of discounted constrained Markov Decision Processes with countable state and action spaces and a general multi-chain structure. Second, to introduce finite approximation methods. We define the occupation measures and obtain properties of the set of all achievable occupation measures under the different admissible policies. We est...
Lower Bound On the Computational Complexity of Discounted Markov Decision Problems
We study the computational complexity of the infinite-horizon discounted-reward Markov Decision Problem (MDP) with a finite state space S and a finite action space A. We show that any randomized algorithm needs a running time at least Ω(|S||A|) to compute an ε-optimal policy with high probability. We consider two variants of the MDP where the input is given in specific data structures, including...
Journal
Journal title: Operations Research
Year: 2014
ISSN: 0030-364X, 1526-5463
DOI: 10.1287/opre.2014.1281