Value-Directed Sampling Methods for POMDPs
Authors
Abstract
We consider the problem of approximate belief-state monitoring using particle filtering for the purposes of implementing a policy for a partially observable Markov decision process (POMDP). While particle filtering has become a widely used tool in AI for monitoring dynamical systems, rather scant attention has been paid to its use in the context of decision making. Assuming the existence of a value function, we derive error bounds on decision quality associated with filtering using importance sampling. We also describe an adaptive procedure that can be used to dynamically determine the number of samples required to meet specific error bounds. Empirical evidence is offered supporting this technique as a profitable means of directing sampling effort where it is needed to distinguish policies.
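The abstract outlines a loop one can sketch concretely: monitor the belief with an importance-sampling particle filter, and let a value function dictate how many samples are needed. Below is a minimal Python sketch of that idea, assuming a toy discrete model and an alpha-vector value function; the model numbers, the bootstrap spread used as the stopping test, and the doubling schedule are illustrative stand-ins, not the paper's actual importance-sampling bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-state, 3-observation model with a single action (numbers invented).
T = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])      # T[s, s']
O = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8],
              [0.4, 0.3, 0.3]])           # O[s', o]
alphas = np.array([[1.0, 0.0, 0.5, 0.2],  # assumed value function:
                   [0.2, 1.0, 0.0, 0.7]]) # one alpha-vector per policy choice
n_states = T.shape[0]

def pf_step(particles, obs):
    """One importance-sampling update: propose from T, weight by O, resample."""
    nxt = np.array([rng.choice(n_states, p=T[s]) for s in particles])
    w = O[nxt, obs]
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    return rng.choice(nxt, size=len(nxt), p=w)

def monitor_adaptive(obs_seq, n=50, eps=0.05, n_max=5000):
    """Double the particle count until a bootstrap estimate of the spread of
    alpha . b falls below eps -- a crude proxy for the paper's error bounds."""
    while True:
        particles = rng.integers(0, n_states, size=n)   # uniform prior belief
        for obs in obs_seq:
            particles = pf_step(particles, obs)
        vals = []
        for _ in range(20):                             # bootstrap replicates
            boot = rng.choice(particles, size=n)
            vals.append(alphas @ (np.bincount(boot, minlength=n_states) / n))
        if np.ptp(np.array(vals), axis=0).max() < eps or n >= n_max:
            return np.bincount(particles, minlength=n_states) / n, n
        n *= 2                                          # direct more sampling effort

b_hat, n_used = monitor_adaptive([0, 1, 1])
print("estimated belief:", b_hat, "particles used:", n_used)
```

The point of the adaptive test is that sampling effort grows only when the remaining uncertainty is large enough to change which alpha-vector (i.e., which policy choice) looks best.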
Similar resources
Point-Based Value Iteration for Continuous POMDPs
We propose a novel approach to optimize Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems, such as robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value fu...
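For intuition about what "point-based value iteration" does before it is generalised to continuous spaces, here is a small discrete sketch: value backups are performed only at a fixed set of belief points, each producing one alpha-vector. The model arrays are invented for illustration, and the continuous-space machinery of the cited paper is not shown.

```python
import numpy as np

np.random.seed(1)
n_s, n_a, n_o, gamma = 3, 2, 2, 0.95

# Invented toy model: transition T[a,s,s'], observation O[a,s',o], reward R[a,s].
T = np.random.dirichlet(np.ones(n_s), size=(n_a, n_s))
O = np.random.dirichlet(np.ones(n_o), size=(n_a, n_s))
R = np.random.rand(n_a, n_s)

def point_backup(b, Gamma):
    """One point-based backup of the alpha-vector set Gamma at belief b."""
    best_val, best_alpha = -np.inf, None
    for a in range(n_a):
        alpha_a = R[a].copy()
        for o in range(n_o):
            # g[k, s] = sum_{s'} T[a,s,s'] * O[a,s',o] * Gamma[k,s']
            g = Gamma @ (T[a] * O[a][:, o]).T
            alpha_a += gamma * g[np.argmax(g @ b)]   # best back-projected vector
        if alpha_a @ b > best_val:
            best_val, best_alpha = alpha_a @ b, alpha_a
    return best_alpha

B = np.random.dirichlet(np.ones(n_s), size=5)        # fixed belief points
Gamma = np.zeros((1, n_s))                           # initial value function
for _ in range(30):                                  # repeated point-based sweeps
    Gamma = np.array([point_backup(b, Gamma) for b in B])

print("values at the belief points:", np.max(Gamma @ B.T, axis=0))
```

Restricting backups to the point set keeps the alpha-vector set small (one vector per belief point) at the cost of exactness away from those points.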
Monte Carlo POMDPs
We present a Monte Carlo algorithm for learning to act in partially observable Markov decision processes (POMDPs) with real-valued state and action spaces. Our approach uses importance sampling for representing beliefs, and Monte Carlo approximation for belief propagation. A reinforcement learning algorithm, value iteration, is employed to learn value functions over belief states. Finally, a sa...
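The key representational move in this abstract, carrying a belief over real-valued states as a set of samples updated by importance sampling, can be sketched in a few lines. The 1-D additive dynamics, Gaussian noise levels, and multinomial resampling below are assumptions for illustration, not the cited algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(2)

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def belief_update(samples, action, obs, motion_noise=0.2, obs_noise=0.5):
    # proposal: draw x' ~ p(x' | x, a); here a simple additive motion model
    proposed = samples + action + motion_noise * rng.standard_normal(len(samples))
    # importance weights: likelihood of the observation under each sample
    w = gauss_pdf(obs, proposed, obs_noise)
    w /= w.sum()
    # multinomial resampling returns an unweighted sample set
    return rng.choice(proposed, size=len(proposed), p=w)

samples = rng.standard_normal(1000)       # prior belief: N(0, 1)
samples = belief_update(samples, action=0.5, obs=0.6)
print("posterior mean ~", samples.mean())
```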
Vector-space Analysis of Belief-state Approximation for POMDPs
We propose a new approach to value-directed belief state approximation for POMDPs. The value-directed model allows one to choose approximation methods for belief state monitoring that have a small impact on decision quality. Using a vector space analysis of the problem, we devise two new search procedures for selecting an approximation scheme that have much better computational properties than e...
Value-Directed Belief State Approximation for POMDPs
We consider the problem of belief-state monitoring for the purposes of implementing a policy for a partially observable Markov decision process (POMDP), specifically how one might approximate the belief state. Other schemes for belief-state approximation (e.g., based on minimizing a measure such as KL-divergence between the true and estimated belief state) are not necessarily appropriate for POMDPs. Inste...
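A tiny numeric illustration of the point this abstract makes: KL-divergence can rank candidate approximate beliefs differently from a value-directed criterion. With the value function written as alpha-vectors, the worst-case decision loss from substituting an approximation for the true belief b is bounded by the largest |alpha . (b - b_approx)|. All numbers below are invented.

```python
import numpy as np

b     = np.array([0.50, 0.30, 0.15, 0.05])   # true belief
b_kl  = np.array([0.50, 0.30, 0.13, 0.07])   # closer in KL-divergence
b_val = np.array([0.40, 0.40, 0.15, 0.05])   # farther in KL, value-preserving
alphas = np.array([[5.0, 5.0, 10.0, 0.0],    # assumed value function: states 0
                   [5.0, 5.0, 0.0, 10.0]])   # and 1 are worth the same

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def value_loss(b_true, b_approx):
    # worst-case decision-quality gap over the alpha-vectors
    return float(np.max(np.abs(alphas @ (b_true - b_approx))))

for name, bb in [("b_kl", b_kl), ("b_val", b_val)]:
    print(f"{name}: KL={kl(b, bb):.4f}  value loss={value_loss(b, bb):.4f}")
```

Here b_kl is the better approximation by KL yet incurs a real decision loss, while b_val moves probability mass only between states the value function cannot distinguish, so its loss is zero.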