A Model-Checking Approach to Decision-Theoretic Planning with Non-Markovian Rewards

Authors

  • Sylvie Thiébaux
  • Froduald Kabanza
  • John Slaney
Abstract

A popular approach to solving a decision process with non-Markovian rewards (NMRDP) is to exploit a compact representation of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to our favorite MDP solution method. The contribution of this paper is a representation of non-Markovian reward functions and a translation into MDP aimed at making the best possible use of state-based anytime algorithms as the solution method. By explicitly constructing and exploring only parts of the state space, these algorithms are able to trade computation time for policy quality, and have proven quite effective in dealing with large MDPs. Our representation extends future linear temporal logic to express rewards. Our translation has the effect of embedding model-checking in the solution method and results in an MDP of the minimal size achievable without stepping outside the anytime framework.
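The core idea of the translation can be illustrated with a toy example (this is a simplified sketch of the general state-augmentation principle, not the paper's $FLTL-based construction): a reward that depends on history, such as "reward on reaching g only if c was visited earlier", becomes Markovian once each state is paired with a label tracking the relevant history, here collapsed to a single boolean flag standing in for the progressed temporal formula.

```python
def non_markovian_reward(history):
    """Reward 1 when the current state is 'g' and 'c' occurred earlier."""
    return 1.0 if history[-1] == "g" and "c" in history[:-1] else 0.0

def translate_step(aug_state, next_state):
    """Transition of the equivalent MDP over augmented states.

    aug_state = (state, seen_c); seen_c records whether 'c' has been
    visited, playing the role of the progressed temporal formula."""
    state, seen_c = aug_state
    return (next_state, seen_c or state == "c")

def markovian_reward(aug_state):
    """History-free reward over the augmented state space."""
    state, seen_c = aug_state
    return 1.0 if state == "g" and seen_c else 0.0

# Check that both formulations assign the same reward along a trajectory.
history = ["s0", "c", "s1", "g"]
aug = (history[0], False)
total_nm, total_m = 0.0, 0.0
for i in range(1, len(history)):
    total_nm += non_markovian_reward(history[: i + 1])
    aug = translate_step(aug, history[i])
    total_m += markovian_reward(aug)
print(total_nm == total_m)
```

In the paper's setting the boolean flag generalises to a set of temporal-logic formulae progressed along the trajectory, and the translation is built lazily so that only states actually explored by the anytime algorithm are ever labelled.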


Similar articles

Decision-Theoretic Planning with non-Markovian Rewards

A decision process in which rewards depend on history rather than merely on the current state is called a decision process with non-Markovian rewards (NMRDP). In decision-theoretic planning, where many desirable behaviours are more naturally expressed as properties of execution sequences rather than as properties of states, NMRDPs form a more natural model than the commonly adopted fully Markovi...


A More Expressive Behavioral Logic for Decision-Theoretic Planning

We examine the problem of compactly expressing models of non-Markovian reward decision processes (NMRDP). In the field of decision-theoretic planning NMRDPs are used whenever the agent’s reward is determined by the history of visited states. Two different propositional linear temporal logics can be used to describe execution histories that are rewarding. Called PLTL and $FLTL, they are backward...



Rewarding Behaviors

Markov decision processes (MDPs) are a very popular tool for decision-theoretic planning (DTP), partly because of the well-developed, expressive theory that includes effective solution techniques. But the Markov assumption, that dynamics and rewards depend on the current state only and not on history, is often inappropriate. This is especially true of rewards: we frequently wish to associate rew...


Structured Solution Methods for

Markov Decision Processes (MDPs), currently a popular method for modeling and solving decision-theoretic planning problems, are limited by the Markovian assumption: rewards and dynamics depend on the current state only, and not on previous history. Non-Markovian decision processes (NMDPs) can also be defined, but then the more tractable solution techniques developed for MDPs cannot be directly...



Journal:

Volume   Issue 

Pages  -

Publication date: 2002