Policy Iteration for Factored MDPs

Authors

  • Daphne Koller
  • Ronald Parr
Abstract

Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has suggested that value functions in factored MDPs can often be approximated well using a factored value function: a linear combination of restricted basis functions, each of which refers only to a small subset of variables. An approximate factored value function for a particular policy can be computed using approximate dynamic programming, but this approach (and others) can only produce an approximation relative to a distance metric which is weighted by the stationary distribution of the current policy. This type of weighted projection is ill-suited to policy improvement. We present a new approach to value determination, that uses a simple closed-form computation to compute a least-squares decomposed approximation to the value function for any weights directly. We then use this value determination algorithm as a subroutine in a policy iteration process. We show that, under reasonable restrictions, the policies induced by a factored value function can be compactly represented as a decision list, and can be manipulated efficiently in a policy iteration process. We also present a method for computing error bounds for decomposed value functions using a variable-elimination algorithm for function optimization. The complexity of all of our algorithms depends on the factorization of the system dynamics and of the approximate value function.
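
The idea of a weighted least-squares fit of a value function onto restricted basis functions can be illustrated on a toy problem. The sketch below is not the paper's closed-form factored algorithm: it enumerates all states and solves for the exact value function first, which is exactly the step a factored method is meant to avoid. The two-variable MDP, its transition probabilities, the reward, the basis functions, and the two weightings are all hypothetical choices made for illustration.

```python
from itertools import product

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

# Hypothetical toy process: two binary state variables under one fixed policy.
states = list(product([0, 1], repeat=2))
n = len(states)
gamma = 0.9

def transition(s, t):
    # Assumed dynamics: each variable keeps its value with probability 0.8,
    # independently of the other variable.
    p = 1.0
    for a, b in zip(s, t):
        p *= 0.8 if a == b else 0.2
    return p

P = [[transition(s, t) for t in states] for s in states]
# Assumed reward; the x1*x2 term makes the exact value function non-additive.
R = [float(x1 + 2 * x2 + x1 * x2) for x1, x2 in states]

# Exact value of the fixed policy: solve (I - gamma * P) V = R.
A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
     for i in range(n)]
V = solve(A, R)

# Restricted basis functions: a constant plus one function per variable.
basis = [lambda s: 1.0, lambda s: float(s[0]), lambda s: float(s[1])]
Phi = [[h(s) for h in basis] for s in states]

def weighted_least_squares(Phi, V, weights):
    """Minimise sum_s weights[s] * (Phi[s] . w - V[s])**2 via normal equations."""
    k = len(Phi[0])
    G = [[sum(d * row[i] * row[j] for d, row in zip(weights, Phi))
          for j in range(k)] for i in range(k)]
    c = [sum(d * row[i] * v for d, row, v in zip(weights, Phi, V))
         for i in range(k)]
    return solve(G, c)

uniform = [1.0] * n                 # every state counts equally
skewed = [1.0, 1.0, 1.0, 10.0]      # emphasise the state (1, 1)
w_uniform = weighted_least_squares(Phi, V, uniform)
w_skewed = weighted_least_squares(Phi, V, skewed)
```

Because the exact value function here is not a linear combination of the three basis functions, the two weightings give different coefficient vectors. This is the sense in which the choice of weights in the projection matters.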

Similar resources

Policy Iteration for Factored MDPs

Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has suggested that value functions in factored MDPs can often be approximated well using a factored value function: a linear combination of restricted basis functions, each of which refers only to a small subset of ...

Symbolic Opportunistic Policy Iteration for Factored-Action MDPs

This paper addresses the scalability of symbolic planning under uncertainty with factored states and actions. Our first contribution is a symbolic implementation of Modified Policy Iteration (MPI) for factored actions that views policy evaluation as policy-constrained value iteration (VI). Unfortunately, a naive approach to enforce policy constraints can lead to large memory requirements, somet...

Probabilistic Reachability Analysis for Structured Markov Decision Processes

We present a stochastic planner based on Markov Decision Processes (MDPs) that participates in the probabilistic planning track of the 2004 International Planning Competition. The planner transforms the PDDL problems into factored MDPs that are then solved with a structured policy iteration algorithm. A probabilistic reachability analysis is performed, approximating the MDP solution over the rea...

Max-norm Projections for Factored MDPs

Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncertainty. However, exact MDP solution algorithms require the manipulation of a value function, which specifies a value for each state in the system. Most real-world MDPs are too large for such a representation to be feasible, preventing the use of exact MDP algorithms. Various approximate solution a...

Symbolic Stochastic Focused Dynamic Programming with Decision Diagrams

We present a stochastic planner based on Markov Decision Processes (MDPs) that participates in the probabilistic planning track of the 2006 International Planning Competition. The planner transforms the PPDDL problems into factored MDPs that are then solved with a structured modified value iteration algorithm based on the safest stochastic path computation from the initial states to the goal st...

Publication date: 2000