Policy Iteration for Factored MDPs
Authors
Daphne Koller, Ronald Parr
Abstract
Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has suggested that value functions in factored MDPs can often be approximated well using a factored value function: a linear combination of restricted basis functions, each of which refers only to a small subset of variables. An approximate factored value function for a particular policy can be computed using approximate dynamic programming, but this approach (and others) can only produce an approximation relative to a distance metric weighted by the stationary distribution of the current policy. This type of weighted projection is ill-suited to policy improvement. We present a new approach to value determination that uses a simple closed-form computation to compute a least-squares decomposed approximation to the value function for any weights directly. We then use this value determination algorithm as a subroutine in a policy iteration process. We show that, under reasonable restrictions, the policies induced by a factored value function can be compactly represented as a decision list and can be manipulated efficiently in a policy iteration process. We also present a method for computing error bounds for decomposed value functions using a variable elimination algorithm for function optimization. The complexity of all of our algorithms depends on the factorization of the system dynamics and of the approximate value function.
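To make the value-determination step concrete, here is a minimal dense-state sketch of the least-squares projection the abstract describes, assuming a fixed policy with transition matrix P_pi, reward vector r, and basis matrix Phi. The names (least_squares_value_weights, Phi, P_pi, d) are illustrative, and the sketch enumerates states explicitly; the paper's actual contribution is computing this projection without state enumeration by exploiting the factored structure.

```python
import numpy as np

def least_squares_value_weights(Phi, P_pi, r, gamma, d=None):
    """Fit V ~= Phi @ w for a fixed policy by linear least squares.

    The Bellman fixed point V = r + gamma * P_pi @ V is projected onto
    the span of the basis columns of Phi under an *arbitrary* state
    weighting d (uniform by default), rather than the stationary
    distribution of the current policy.
    """
    n = Phi.shape[0]
    d = np.full(n, 1.0 / n) if d is None else np.asarray(d, float)
    sqrt_d = np.sqrt(d)
    # Bellman residual is (Phi - gamma * P_pi @ Phi) @ w - r; minimise
    # its d-weighted 2-norm, a standard linear least-squares problem.
    A = sqrt_d[:, None] * (Phi - gamma * (P_pi @ Phi))
    b = sqrt_d * r
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Tiny two-state illustration: one indicator basis per state (exact case).
P_pi = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
r = np.array([1.0, 0.0])
Phi = np.eye(2)
w = least_squares_value_weights(Phi, P_pi, r, gamma=0.95)
print(w)  # with a full basis this recovers the exact value function
```

Because d is a free parameter of the projection, the weighting can be chosen independently of the policy being evaluated, which is what makes this form of value determination usable inside a policy iteration loop.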
Similar resources
Policy Iteration for Factored MDPs (Daphne Koller, Ronald Parr)
Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has suggested that value functions in factored MDPs can often be approximated well using a factored value function: a linear combination of restricted basis functions, each of which refers only to a small subset of ...
Full text
Symbolic Opportunistic Policy Iteration for Factored-Action MDPs
This paper addresses the scalability of symbolic planning under uncertainty with factored states and actions. Our first contribution is a symbolic implementation of Modified Policy Iteration (MPI) for factored actions that views policy evaluation as policy-constrained value iteration (VI). Unfortunately, a naïve approach to enforce policy constraints can lead to large memory requirements, somet...
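The snippet's core idea, alternating a greedy improvement step with a bounded number of policy-constrained value-iteration sweeps, can be illustrated on a flat (tabular) MDP. This sketch is a simplification of my own and omits the symbolic decision-diagram machinery that makes the paper's method scale to factored actions.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma, k=5, iters=50):
    """Tabular Modified Policy Iteration.

    P: (A, S, S) transition tensor, R: (A, S) reward matrix.
    Each iteration greedily improves the policy, then performs k
    policy-constrained Bellman backups (partial policy evaluation)
    instead of solving for the policy's value exactly.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R + gamma * (P @ V)            # (A, S) action values
        pi = Q.argmax(axis=0)              # greedy improvement
        for _ in range(k):                 # policy-constrained VI sweeps
            Q = R + gamma * (P @ V)
            V = Q[pi, np.arange(n_states)]
    return pi, V
```

With k = 1 this degenerates to value iteration, and as k grows it approaches exact policy iteration; MPI's appeal is tuning that trade-off.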
Full text
Probabilistic Reachability Analysis for Structured Markov Decision Processes
We present a stochastic planner based on Markov Decision Processes (MDPs) that participates in the probabilistic planning track of the 2004 International Planning Competition. The planner transforms the PDDL problems into factored MDPs that are then solved with a structured policy iteration algorithm. A probabilistic reachability analysis is performed, approximating the MDP solution over the rea...
Full text
Max-norm Projections for Factored MDPs
Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncertainty. However, exact MDP solution algorithms require the manipulation of a value function, which specifies a value for each state in the system. Most real-world MDPs are too large for such a representation to be feasible, preventing the use of exact MDP algorithms. Various approximate solution a...
Full text
Symbolic Stochastic Focused Dynamic Programming with Decision Diagrams
We present a stochastic planner based on Markov Decision Processes (MDPs) that participates in the probabilistic planning track of the 2006 International Planning Competition. The planner transforms the PPDDL problems into factored MDPs that are then solved with a structured modified value iteration algorithm based on the safest stochastic path computation from the initial states to the goal st...
Full text