Lecture notes for “Analysis of Algorithms”: Markov decision processes
Author
Abstract
We give an introduction to infinite-horizon Markov decision processes (MDPs) with finite sets of states and actions. We focus primarily on discounted MDPs for which we present Shapley’s (1953) value iteration algorithm and Howard’s (1960) policy iteration algorithm. We also give a short introduction to discounted turn-based stochastic games, a 2-player generalization of MDPs. Finally, we give a short introduction to two alternative criteria for optimality: average cost and total cost. The presentation given in these lecture notes is based on [6, 9, 5].

1 Markov decision processes

A Markov decision process (MDP) is composed of a finite set of states, and for each state a finite, non-empty set of actions. In each time unit, the MDP is in exactly one of the states. A controller must choose one of the actions associated with the current state. Using an action a ∈ A incurs an immediate cost, and results in a probabilistic transition to a new state according to a probability distribution that depends on the action. The process goes on indefinitely. The goal of the controller is to minimize the incurred costs according to some criterion. We will later define three optimality criteria: discounted cost, average cost, and total cost.

Formally, a Markov decision process is defined as follows. We use ∆(S) to denote the set of probability distributions over elements of a set S.

Definition 1.1 (Markov decision process) A Markov decision process (MDP) is a tuple M = (S, A, s, c, p), where

∗School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. E-mail: [email protected].
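As an informal illustration of the description of an MDP given above (states, per-state action sets, immediate costs, probabilistic transitions), the following Python sketch may help. It is not the notation of Definition 1.1: the names `Action`, `make_mdp`, `validate`, `step`, and `simulate_discounted_cost`, the two-state example, the discount factor `gamma`, and the truncation of the infinite process to a fixed number of steps are all assumptions made for this example only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One action of a state (hypothetical representation)."""
    name: str
    cost: float          # immediate cost incurred when the action is used
    transitions: dict    # next state -> probability of moving there


def make_mdp():
    """A made-up two-state MDP: each state maps to its non-empty action list."""
    return {
        "s0": [Action("stay", 1.0, {"s0": 1.0}),
               Action("move", 2.0, {"s0": 0.5, "s1": 0.5})],
        "s1": [Action("rest", 0.0, {"s1": 1.0})],
    }


def validate(mdp):
    """Check the structural requirements stated in the text."""
    for state, actions in mdp.items():
        if not actions:
            raise ValueError(f"state {state!r} has no actions")
        for action in actions:
            if any(target not in mdp for target in action.transitions):
                raise ValueError(f"{action.name!r} leads to an unknown state")
            if abs(sum(action.transitions.values()) - 1.0) > 1e-9:
                raise ValueError(f"{action.name!r} is not a distribution")


def step(mdp, state, choose, rng):
    """The controller picks an action; the next state is then sampled."""
    action = choose(state, mdp[state])
    targets = list(action.transitions)
    weights = [action.transitions[t] for t in targets]
    next_state = rng.choices(targets, weights=weights, k=1)[0]
    return action.cost, next_state


def simulate_discounted_cost(mdp, start, choose, gamma, steps, seed=0):
    """Discounted cost of one sampled run, truncated after `steps` steps.

    The process in the text runs indefinitely; the truncation and the
    discount factor `gamma` are assumptions of this sketch.
    """
    rng = random.Random(seed)
    total, state, factor = 0.0, start, 1.0
    for _ in range(steps):
        cost, state = step(mdp, state, choose, rng)
        total += factor * cost
        factor *= gamma
    return total
```

A controller is modelled here as a function `choose(state, actions)` that returns one of the actions available in the current state. For example, `lambda state, actions: actions[0]` always uses the first listed action, and passing it to `simulate_discounted_cost` yields the discounted cost of a single sampled trajectory under that rule.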
Similar resources
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques for solving large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs) that can be organized into levels. At each level, smaller problems called restricted MDPs are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...
Probabilistic Aspects of Computer Science: Markovian Models
These lecture notes present four Markovian models: discrete time Markov chains (DTMC), continuous time Markov chains (CTMC), Markov decision processes (MDP) and probabilistic automata (PA). They are aimed at master's students and try to be as self-contained as possible. However the basics of discrete probability (and additionally the basics of measure and integration for the study of CTMC...
Simulation-Based Graph Similarity
We present symmetric and asymmetric similarity measures for labeled directed rooted graphs that are inspired by the simulation and bisimulation relations on labeled transition systems. Computation of the similarity measures has close connections to discounted Markov decision processes in the asymmetric case and to perfect-information stochastic games in the symmetric case. For the symmetric cas...
Markov Chains Compact Lecture Notes and Exercises
Part of course 6CCM320A; part of course 6CCM380A. 1 Introduction; 2 Definitions and properties of stochastic processes; 2.