Risk-Averse $\omega$-regular Markov Decision Process Control
Authors
Abstract
Many control problems in environments that can be modeled as Markov decision processes (MDPs) concern infinite-time horizon specifications. The classical aim in this context is to compute a control policy that maximizes the probability of satisfying the specification. In many scenarios, however, there is a non-zero probability of failure in every step of the system's execution. For infinite-time horizon specifications, this implies that the specification is violated with probability 1 in the long run no matter which policy is chosen, which renders previous policy computation methods useless in these scenarios. In this paper, we introduce a new optimization criterion for MDP policies that captures the task of working towards the satisfaction of some infinite-time horizon ω-regular specification. The new criterion is applicable to MDPs in which the violation of the specification cannot be avoided in the long run. We give an algorithm to compute policies that are optimal under this criterion and show that it captures the ideas of optimism and risk aversion in MDP control: the computed policies are optimistic in that an MDP run enters a failure state relatively late, and they are risk-averse in that they always maximize the probability of reaching their respective next goal state. We give results on two robot control scenarios to validate the usability of risk-averse MDP policies.
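The abstract gives no pseudocode, but the risk-averse building block it describes, maximizing the probability of reaching the next goal state, reduces to a standard maximal-reachability computation on a finite MDP (in the paper's setting, typically the product of the MDP with an automaton for the ω-regular specification). The sketch below is our illustration under that reading, not the authors' algorithm; the function name, array layout, and stopping tolerance are assumptions.

```python
import numpy as np

def max_reach_probability(P, goal, max_iter=10_000, tol=1e-8):
    """Value iteration for the maximal probability of reaching `goal`.

    P    : (n_actions, n_states, n_states) array; P[a, s, t] is the
           probability of moving from state s to state t under action a.
    goal : boolean array of length n_states marking the goal states.
    Returns per-state reachability probabilities and a greedy policy.
    """
    P = np.asarray(P, dtype=float)
    goal = np.asarray(goal, dtype=bool)
    v = goal.astype(float)
    for _ in range(max_iter):
        q = P @ v                           # q[a, s]: value of action a in s
        v_new = np.where(goal, 1.0, q.max(axis=0))
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v, (P @ v).argmax(axis=0)
```

A policy built from such values is risk-averse in the stated sense: from every state it plays an action attaining the maximal probability of reaching the next goal state, even though failure is eventually certain over the infinite horizon.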
Similar papers
Risk-averse dynamic programming for Markov decision processes
We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we also develop a risk-averse policy iteration method and we pro...
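The snippet is cut off, but the risk-averse dynamic programming equations it mentions have a well-known shape. The following is a reconstruction in our notation ($c$ a one-step cost, $\alpha \in (0,1)$ the discount factor, $\sigma$ a one-step Markov risk measure), not a quotation from the paper:

$$ v(x) \;=\; \min_{u \in U(x)} \Big\{\, c(x,u) \;+\; \alpha\, \sigma\big(v,\, x,\, u\big) \,\Big\}. $$

Taking $\sigma(v, x, u) = \mathbb{E}[\,v(x') \mid x, u\,]$ recovers the classical discounted Bellman equation, so the risk-neutral problem is the special case; value iteration repeatedly applies the right-hand-side operator until convergence.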
An Approximate Solution Method for Large Risk-Averse Markov Decision Processes
Stochastic domains often involve risk-averse decision makers. While recent work has focused on how to model risk in Markov decision processes using risk measures, it has not addressed the problem of solving large risk-averse formulations. In this paper, we propose and analyze a new method for solving large risk-averse MDPs with hybrid continuous-discrete state spaces and continuous action space...
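The snippet invokes risk measures without defining one; conditional value-at-risk (CVaR), the expected cost in the worst $\alpha$-fraction of outcomes, is the canonical example in this literature. The sketch below is a generic illustration of that measure on a finite cost distribution, not the paper's solution method:

```python
import numpy as np

def cvar(costs, probs, alpha):
    """CVaR of a finite cost distribution: the expected cost conditional
    on being in the worst (largest-cost) alpha-tail, for alpha in (0, 1]."""
    costs, probs = np.asarray(costs, float), np.asarray(probs, float)
    order = np.argsort(costs)[::-1]          # worst outcomes first
    c, p = costs[order], probs[order]
    cum_before = np.cumsum(p) - p            # tail mass above each atom
    # each atom contributes its full mass, except the atom straddling the
    # alpha boundary, which contributes only the remaining fraction
    w = np.minimum(p, np.clip(alpha - cum_before, 0.0, None))
    return float(w @ c) / alpha
```

For instance, `cvar([0.0, 10.0], [0.9, 0.1], alpha=0.2)` returns `5.0`: the worst 20% of the distribution consists of cost 10 with mass 0.1 and cost 0 with mass 0.1.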
Risk-averse control of Markov decision processes with ω-regular objectives
This is the author-archived version of the paper; the original publication is available on IEEE Xplore at http://dx.doi.org/10.1109/CDC.2016.7798306. © 2016 IEEE.
Risk premiums and certainty equivalents of loss-averse newsvendors of bounded utility
Loss-averse behavior makes newsvendors avoid losses more than they seek probable gains, since losses have a greater psychological impact on the newsvendor than gains do. In economics and decision theory, the classical newsvendor models treat losses and gains symmetrically, disregarding expected utility when the newsvendor is loss-averse. Moreover, the use of unbounded utility to m...
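The snippet does not reproduce the utility model, but the standard piecewise-linear loss-averse utility makes the contrast with the classical risk-neutral newsvendor concrete. The symbols below (loss-aversion coefficient $\lambda$, unit price $p$, unit cost $c$, order quantity $q$, random demand $D$) are our illustration, not the paper's notation:

$$ u(w) \;=\; \begin{cases} w, & w \ge 0, \\ \lambda w, & w < 0, \end{cases} \qquad \lambda > 1, \qquad q^{\star} \in \arg\max_{q \ge 0} \; \mathbb{E}\big[\, u\big(p\,\min(q, D) - c\, q\big) \,\big]. $$

Setting $\lambda = 1$ recovers the classical expected-profit newsvendor; for $\lambda > 1$, losses are over-weighted, which typically drives the optimal order quantity below the risk-neutral one.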
Risk-Averse Control of Undiscounted Transient Markov Models
We use Markov risk measures to formulate a risk-averse version of the undiscounted total-cost problem for a transient controlled Markov process. We derive risk-averse dynamic programming equations and show that a randomized policy may be strictly better than any deterministic policy when risk measures are employed. We illustrate the results on an optimal stopping problem and an organ transpla...
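For comparison with the discounted equation sketched above, the undiscounted transient analogue can be reconstructed as follows (again in our notation, with $\mathcal{T}$ the set of transient states, $\sigma$ a one-step Markov risk measure, and $v$ fixed at zero on the absorbing states):

$$ v(x) \;=\; \min_{u \in U(x)} \big\{\, c(x,u) + \sigma\big(v,\, x,\, u\big) \,\big\}, \quad x \in \mathcal{T}, \qquad v(x) = 0 \ \text{on the absorbing set}. $$

No discount factor appears, so a finite total cost rests on transience, i.e., on the process being absorbed with probability 1; this is also the regime in which the abstract's observation applies that randomized policies can strictly outperform deterministic ones.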