Policy-Gradients for PSRs and POMDPs

Authors

  • Douglas Aberdeen
  • Olivier Buffet
  • Owen Thomas

Abstract

In uncertain and partially observable environments, control policies must be a function of the complete history of actions and observations. Rather than present an ever-growing history to a learner, we instead track sufficient statistics of the history and map those to a control policy. This mapping has typically been done using dynamic programming, which requires large amounts of memory. We present a general approach to mapping sufficient statistics directly to control policies by combining the tracking of sufficient statistics with policy-gradient reinforcement learning. The best-known sufficient statistic is the belief state, computed from a known or estimated partially observable Markov decision process (POMDP) model. More recently, predictive state representations (PSRs) have emerged as a potentially compact model of partially observable systems. Our experiments explore the usefulness of both of these sufficient statistics, exact and estimated, in direct policy search.
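The abstract describes the pipeline (track a fixed-size sufficient statistic, then let policy-gradient RL map it to actions) only in words. As a rough, hedged illustration, the sketch below tracks an exact belief state for a small hypothetical POMDP (all numbers and the names `belief_update`, `policy`, `run_episode` and `train` are our own, not from the paper) and feeds it to a softmax policy trained with a REINFORCE-style likelihood-ratio gradient. REINFORCE is a stand-in: the abstract does not say which gradient estimator the authors use.

```python
import math
import random

# Hypothetical toy POMDP (illustrative numbers, not from the paper): 2 hidden states,
# 3 actions (0 = listen, 1 = declare state 0, 2 = declare state 1), 2 observations.
LISTEN = 0
N_STATES, N_ACTIONS = 2, 3
UNIFORM = [[0.5, 0.5], [0.5, 0.5]]
# T[a][s][s2] = P(s2 | s, a): listening keeps the state, declaring resets it at random.
T = [[[1.0, 0.0], [0.0, 1.0]], UNIFORM, UNIFORM]
# O[a][s2][o] = P(o | s2, a): only listening is informative.
O = [[[0.85, 0.15], [0.15, 0.85]], UNIFORM, UNIFORM]


def belief_update(b, a, o):
    """Bayes filter: b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s)."""
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(N_STATES))
              for s2 in range(N_STATES)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def policy(theta, b):
    """Softmax over actions; the logit of action k is the dot product theta[k] . b."""
    logits = [sum(w * x for w, x in zip(row, b)) for row in theta]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, b, a):
    """Gradient of log pi(a | b): d/d theta[k][i] = (1[k == a] - pi(k | b)) * b[i]."""
    probs = policy(theta, b)
    return [[((1.0 if k == a else 0.0) - probs[k]) * x for x in b]
            for k in range(len(theta))]


def reward(s, a):
    """Small cost for listening, +1 / -1 for a correct / wrong declaration."""
    if a == LISTEN:
        return -0.1
    return 1.0 if a - 1 == s else -1.0


def run_episode(theta, rng, horizon=10):
    """Roll out one episode; the agent only ever sees its belief, never the state s."""
    s = rng.randrange(N_STATES)
    b = [1.0 / N_STATES] * N_STATES
    steps = []
    for _ in range(horizon):
        a = rng.choices(range(N_ACTIONS), weights=policy(theta, b))[0]
        steps.append((b, a, reward(s, a)))
        s = rng.choices(range(N_STATES), weights=T[a][s])[0]
        o = rng.choices(range(2), weights=O[a][s])[0]
        b = belief_update(b, a, o)
    return steps


def train(episodes=3000, lr=0.05, seed=0):
    """REINFORCE with reward-to-go: theta += lr * sum_t G_t * grad log pi(a_t | b_t)."""
    rng = random.Random(seed)
    theta = [[0.0] * N_STATES for _ in range(N_ACTIONS)]
    for _ in range(episodes):
        steps = run_episode(theta, rng)
        grad = [[0.0] * N_STATES for _ in range(N_ACTIONS)]
        ret = 0.0
        for b, a, r in reversed(steps):  # walk backwards so ret is the reward-to-go G_t
            ret += r
            g = grad_log_pi(theta, b, a)
            for k in range(N_ACTIONS):
                for i in range(N_STATES):
                    grad[k][i] += ret * g[k][i]
        # apply the whole-episode gradient estimate at once (theta fixed during the episode)
        for k in range(N_ACTIONS):
            for i in range(N_STATES):
                theta[k][i] += lr * grad[k][i]
    return theta


if __name__ == "__main__":
    theta = train()
    for b0 in (0.5, 0.85, 0.97):
        probs = policy(theta, [b0, 1.0 - b0])
        print(b0, [round(p, 2) for p in probs])
```

Because the belief is a fixed-length vector, the learner never sees the growing history. Replacing `belief_update` with a tracker built from an estimated model, or with a PSR state update, would leave the policy-gradient part unchanged.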

Similar resources

Hilbert Space Embeddings of PSRs

Many problems in machine learning and artificial intelligence involve discrete-time partially observable nonlinear dynamical systems. If the observations are discrete, then Hidden Markov Models (HMMs) (Rabiner, 1989) or, in the control setting, Partially Observable Markov Decision Processes (POMDPs) (Sondik, 1971) can be used to represent belief as a discrete distribution over latent states. Pr...

Planning in Models that Combine Memory with Predictive Representations of State

Models of dynamical systems based on predictive state representations (PSRs) use predictions of future observations as their representation of state. A main departure from traditional models such as partially observable Markov decision processes (POMDPs) is that the PSR-model state is composed entirely of observable quantities. PSRs have recently been extended to a class of models called memory...

Predictive State Representations: A New Theory for Modeling Dynamical Systems

Modeling dynamical systems, both for control purposes and to make predictions about their behavior, is ubiquitous in science and engineering. Predictive state representations (PSRs) are a recently introduced class of models for discrete-time dynamical systems. The key idea behind PSRs and the closely related OOMs (Jaeger’s observable operator models) is to represent the state of the system as a...

A Survey of Predictive State Representations

Predictive State Representations (PSRs) [10] are a model for discrete-time stochastic systems with finite action and observation sets, presented as an alternative to HMMs and POMDPs. A PSR represents the system’s state as a set of predictions of the observable outcomes of tests performed in the system. Unlike hidden-variable models, no latent variables are assumed or required: only observable outcomes ...

Planning in Decentralized POMDPs with Predictive Policy Representations

We discuss the problem of policy representation in stochastic and partially observable systems, and address the case where the policy is a hidden parameter of the planning problem. We propose an adaptation of the Predictive State Representations (PSRs) to this problem by introducing tests (sequences of actions and observations) on policies. The new model, called the Predictive Policy Representa...

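The PSR resources listed above represent state as predictions of tests. As a hedged sketch of how such a state is tracked, the code below applies the standard linear-PSR update, p(q_i | h a o) = p(Q | h) · m_{a o q_i} / p(Q | h) · m_{a o}, where Q is the set of core tests. To keep the example checkable, the weight vectors m_t are derived from a small hypothetical POMDP (one "listen" action, two states that never change); in practice they would be estimated from data. The names `outcome_vector`, `weight_vector`, `psr_update` and the chosen core tests are illustrative only.

```python
# Hypothetical system used only to derive PSR parameters: one action (0 = listen),
# two hidden states that never change, two observations.
N_STATES = 2
T = [[[1.0, 0.0], [0.0, 1.0]]]        # T[a][s][s2] = P(s2 | s, a)
O = [[[0.85, 0.15], [0.15, 0.85]]]    # O[a][s2][o] = P(o | s2, a)

# Core tests Q: "listen, hear 0" and "listen twice, hear 0 both times".
CORE_TESTS = [((0, 0),), ((0, 0), (0, 0))]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def outcome_vector(test):
    """u(t)[s] = P(o_1..o_k | start in s, take a_1..a_k) for t = ((a_1, o_1), ..., (a_k, o_k))."""
    u = [1.0] * N_STATES
    for a, o in reversed(test):
        u = [sum(T[a][s][s2] * O[a][s2][o] * u[s2] for s2 in range(N_STATES))
             for s in range(N_STATES)]
    return u


U_COLS = [outcome_vector(q) for q in CORE_TESTS]   # columns u(q_1), u(q_2) of U


def solve2(cols, v):
    """Solve U x = v for a 2x2 matrix U given by its columns (Cramer's rule)."""
    (a, c), (b, d) = cols               # U = [[a, b], [c, d]]
    det = a * d - b * c
    return [(v[0] * d - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def weight_vector(test):
    """m_t with p(t | h) = p(Q | h) . m_t for every history h (solves U m_t = u(t))."""
    return solve2(U_COLS, outcome_vector(test))


def psr_state_from_belief(b):
    """p(Q | h)_i = b . u(q_i); used here only to initialise the PSR state."""
    return [dot(b, u) for u in U_COLS]


def psr_update(p, a, o):
    """Linear PSR update: p(q_i | h a o) = (p . m_{a o q_i}) / (p . m_{a o})."""
    ao = ((a, o),)
    denom = dot(p, weight_vector(ao))
    return [dot(p, weight_vector(ao + q)) / denom for q in CORE_TESTS]


if __name__ == "__main__":
    p = psr_state_from_belief([0.5, 0.5])   # ≈ [0.5, 0.3725]
    p = psr_update(p, 0, 1)                 # listened and heard "1"; ≈ [0.255, 0.1275]
    print(p)
```

The update uses only predictions of observable tests, yet it agrees with the belief-state route: listening and hearing 1 moves the belief from [0.5, 0.5] to [0.15, 0.85], and the core-test predictions for that belief are [0.255, 0.1275].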
Publication date: 2007