Simulation-Based Algorithms for Average Cost Markov Decision Processes
Abstract
In this paper, we summarize recent developments in simulation-based algorithms for average cost MDP problems, which differ from the algorithms used for discounted cost problems or shortest path problems. We introduce both simulation-based policy iteration algorithms and simulation-based value iteration algorithms for average cost problems, and discuss the pros and cons of each algorithm.
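To make the idea of simulation-based value iteration for the average cost criterion concrete, here is a minimal sketch. It is not taken from the paper: it uses textbook relative value iteration in which each expectation over next states is replaced by a Monte Carlo average of samples drawn from a simulator. The two-state machine-repair MDP, the names `COST`, `P_BREAK`, `simulate`, and `simulated_relative_value_iteration`, and all parameter values are assumptions chosen for illustration only.

```python
import random

# Hypothetical two-state machine-repair MDP (an assumption, not from the paper).
# States: 0 = working, 1 = broken. Actions: 0 = operate, 1 = repair.
COST = {(0, 0): 0.0, (0, 1): 5.0, (1, 0): 4.0, (1, 1): 5.0}
P_BREAK = 0.2  # chance that a working machine breaks while operating


def simulate(state, action, rng):
    """Simulator: return one sampled next state for (state, action)."""
    if action == 1:
        return 0  # repair always yields a working machine
    if state == 1:
        return 1  # operating a broken machine leaves it broken
    return 1 if rng.random() < P_BREAK else 0


def simulated_relative_value_iteration(n_iter=30, n_samples=4000, seed=0):
    """Relative value iteration with sampled (not exact) expectations."""
    rng = random.Random(seed)
    states, actions = (0, 1), (0, 1)
    h = {s: 0.0 for s in states}  # relative (differential) cost estimates
    gain, q = 0.0, {}
    for _ in range(n_iter):
        for s in states:
            for a in actions:
                # Monte Carlo estimate of E[h(next state) | s, a]
                total = sum(h[simulate(s, a, rng)] for _ in range(n_samples))
                q[(s, a)] = COST[(s, a)] + total / n_samples
        backup = {s: min(q[(s, a)] for a in actions) for s in states}
        gain = backup[0]  # state 0 serves as the reference state
        h = {s: backup[s] - gain for s in states}
    # Greedy policy with respect to the last sampled Q-estimates
    policy = {s: min(actions, key=lambda a: q[(s, a)]) for s in states}
    return gain, h, policy


gain, h, policy = simulated_relative_value_iteration()
print(f"estimated average cost: {gain:.3f}")
print("greedy policy:", policy)
```

For this toy model the optimal policy is to operate while working and repair when broken, with an exact average cost of 5/6 per step, so the sampled estimate should land close to that value. Subtracting the reference-state value at every iteration keeps the iterates bounded, which is the main difference from discounted value iteration.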
Related resources
Simulation-Based Algorithms for Markov Decision Processes
Title of Dissertation: Simulation-Based Algorithms for Markov Decision Processes. Ying He, Doctor of Philosophy, 2002. Dissertation directed by: Professor Steven I. Marcus, Department of Electrical & Computer Engineering, and Professor Michael C. Fu, Department of Decision & Information Technologies. Problems of sequential decision making under uncertainty are common in manufacturing, computer and commun...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes
This article proposes several two-timescale simulation-based actor-critic algorithms for the solution of infinite horizon Markov Decision Processes with finite state space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting, while the rest are for finite action spaces. On the slower timescale, all the algorithms perform a gradient search over cor...
Lecture notes for “Analysis of Algorithms”: Markov decision processes
We give an introduction to infinite-horizon Markov decision processes (MDPs) with finite sets of states and actions. We focus primarily on discounted MDPs for which we present Shapley’s (1953) value iteration algorithm and Howard’s (1960) policy iteration algorithm. We also give a short introduction to discounted turn-based stochastic games, a 2-player generalization of MDPs. Finally, we give a...
Learning Algorithms for Markov Decision Processes
Title of dissertation: Learning Algorithms for Markov Decision Processes. Abraham Thomas, Doctor of Philosophy, 2009. Dissertation directed by: Professor Steven Marcus, Department of Electrical and Computer Engineering. We propose various computational schemes for solving Partially Observable Markov Decision Processes with the finite stage additive cost and infinite horizon discounted cost criterio...