Markov Decision Processes with Multiple Long-Run Average Objectives
نویسنده
چکیده
We consider Markov decision processes (MDPs) with multiple long-run average objectives. Such MDPs occur in design problems where one wishes to simultaneously optimize several criteria, for example, latency and power. The possible trade-offs between the different objectives are characterized by the Pareto curve. We show that every Pareto optimal point can be ε-approximated by a memoryless strategy, for all ε > 0. In contrast to the single-objective case, the memoryless strategy may require randomization. We show that the Pareto curve can be approximated (a) in polynomial time in the size of the MDP for irreducible MDPs; and (b) in polynomial space in the size of the MDP for all MDPs. Additionally, we study the problem if a given value vector is realizable by any strategy, and show that it can be decided in polynomial time for irreducible MDPs and in NP for all MDPs. These results provide algorithms for design exploration in MDP models with multiple long-run average objectives.
منابع مشابه
Value Iteration for Long-Run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive exte...
متن کاملMagnifying Lens Abstraction for Stochastic Games with Discounted and Long-run Average Objectives
Turn-based stochastic games and its important subclass Markov decision processes (MDPs) provide models for systems with both probabilistic and nondeterministic behaviors. We consider turnbased stochastic games with two classical quantitative objectives: discounted-sum and long-run average objectives. The game models and the quantitative objectives are widely used in probabilistic verification, ...
متن کاملA Probabilistic Analysis of Bias Optimality in Unichain Markov Decision Processes y
Since the long-run average reward optimality criterion is underselective, a decisionmaker often uses bias to distinguish between multiple average optimal policies. We study bias optimality in unichain, nite state and action space Markov Decision Processes. A probabilistic approach is used to give intuition as to why a bias-based decision-maker prefers a particular policy over another. Using rel...
متن کاملOn the Controller Synthesis for Finite-State Markov Decision Processes
The controller synthesis problem is one of the fundamental research topics in the area of system design. Loosely speaking, the task is to modify or limit some parts of a given system so that a given property is satisfied. In this presentation, we concentrate on a class of probabilistic systems that can be modelled by finitestate Markov decision processes. Intuitively, Markov decision processes ...
متن کاملSemi-markov Decision Processes
Considered are infinite horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. Total expected discounted reward and long-run average expected reward optimality criteria are reviewed. Solution methodology for each criterion is given, constraints and variance sensitivity are also discussed.
متن کامل