Run-Time Improvement of Point-Based POMDP Policies

نویسندگان

Minlue Wang

Richard Dearden

چکیده

The most successful recent approaches to partially observable Markov decision problem (POMDP) solving have largely been point-based approximation algorithms. These work by selecting a finite number of belief points, computing alpha-vectors for those points, and using the resulting policy everywhere. However, if during execution the belief state is far from the points, there is no guarantee that the policy will be good. This case occurs either when the points are chosen poorly or there are too few points to capture the whole optimal policy, for example in domains where there are many low probability transitions, such as faults or exogenous events. In this paper we explore the use of an on-line plan repair approach to overcome this difficulty. The idea is to split computation between off-line plan creation and, if necessary, on-line plan repair. We evaluate a variety of heuristics used to determine when plan repair might be useful, and then repair the plan by sampling a small number of additional belief points and recomputing the policy. We show in several domains that the approach is more effective than either off-line planning alone even with much more computation time, or a purely on-line planning based on forward search. We also show that the overhead of checking the heuristics is very small when replanning is unnecessary.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems

Maintenance can be the factor of either increasing or decreasing system's availability, so it is valuable work to evaluate a maintenance policy from cost and availability point of view, simultaneously and according to decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...

متن کامل

Monitoring plan execution in partially observable stochastic worlds

This thesis presents two novel algorithms for monitoring plan execution in stochastic partially observable environments. The problems can be naturally formulated as partially-observable Markov decision processes (POMDPs). Exact solutions of POMDP problems are difficult to find due to the computational complexity, so many approximate solutions are proposed instead. These POMDP solvers tend to ge...

متن کامل

Improving Point-Based POMDP Policies at Run-Time

Point-based algorithms have been widely used for computing approximate solutions for POMDPs. While they work well in many cases, they can perform very poorly if the current belief state at run time has not been well sampled. In this paper we proposed several heuristic functions for estimating when offline approximate policies are likely to perform poorly at the current belief point. We show tha...

متن کامل

Exploiting Belief Locality in Run-Time Decision-Theoretic Planners

While Partially-Observable Markov Decision Processes have become a popular means of representing realistic planning problems, exact approaches to finding POMDP policies are extremely computationally complex. An alternative approach for control in POMDP domains is to use run-time optimization over action sequences in a dynamic decision network. While exact algorithms have to generate a policy ov...

متن کامل

Evaluating Effects of Two Alternative Filters for the Incremental Pruning Algorithm on Quality of Pomdp Exact Solutions

Decision making is one of the central problems in artificial intelligence and specifically in robotics. In most cases this problem comes with uncertainty both in data received by the decision maker/agent and in the actions performed in the environment. One effective method to solve this problem is to model the environment and the agent as a Partially Observable Markov Decision Process (POMDP). ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2013

Run-Time Improvement of Point-Based POMDP Policies

نویسندگان

چکیده

منابع مشابه

A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems

Monitoring plan execution in partially observable stochastic worlds

Improving Point-Based POMDP Policies at Run-Time

Exploiting Belief Locality in Run-Time Decision-Theoretic Planners

Evaluating Effects of Two Alternative Filters for the Incremental Pruning Algorithm on Quality of Pomdp Exact Solutions

عنوان ژورنال:

اشتراک گذاری