Search results for: partially observable markov decision process

Number of results: 1,776,231

2003
Hajime Fujita, Yoichiro Matsuno, Shin Ishii

We formulate an automatic strategy acquisition problem for the multi-agent card game “Hearts” as a reinforcement learning (RL) problem. Since there are often many unobservable cards in this game, RL is handled approximately within the framework of a partially observable Markov decision process (POMDP). This article presents a POMDP-RL method based on estimation of unobservable state variable...
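The core state-estimation step that such card-game POMDP methods build on is maintaining a belief over the hidden cards. A minimal Python sketch of that step alone, with hypothetical names (UNSEEN, belief, observe_played) and a uniform prior; this is not the authors' algorithm:

UNSEEN = {"QS", "KH", "AH", "2C"}  # cards not yet observed in play

# Uniform prior: each unseen card is equally likely to be in the opponent's hand.
belief = {card: 1.0 / len(UNSEEN) for card in UNSEEN}

def observe_played(card: str) -> None:
    """Condition the belief on seeing `card` played: it is no longer hidden."""
    belief.pop(card, None)
    total = sum(belief.values())
    for c in belief:  # renormalize the remaining probability mass
        belief[c] /= total

observe_played("QS")
print(belief)  # mass redistributed over {KH, AH, 2C}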

2008
Rakesh Gosangi, Ricardo Gutierrez-Osuna

We present an active-perception strategy to optimize the temperature program of metal-oxide sensors in real time, as the sensor reacts with its environment. We model the problem as a partially observable Markov decision process (POMDP), where actions correspond to measurements at particular temperatures, and the agent is to find a temperature sequence that minimizes the Bayes risk. We validate ...
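The selection criterion the abstract describes can be illustrated with a myopic one-step lookahead: pick the temperature whose measurement minimizes the expected posterior Bayes risk under 0/1 loss. A hedged Python sketch with invented toy numbers (the paper's full POMDP policy is not shown; likelihoods, belief, and the temperatures are all assumptions):

import numpy as np

# Belief over K hypotheses (e.g. analyte classes); row o of likelihoods[T]
# gives P(observation o | class k) when measuring at temperature T.
belief = np.array([0.5, 0.3, 0.2])
likelihoods = {
    200: np.array([[0.7, 0.2, 0.1],
                   [0.3, 0.8, 0.9]]),
    350: np.array([[0.5, 0.5, 0.2],
                   [0.5, 0.5, 0.8]]),
}

def expected_bayes_risk(b, L):
    """Expected misclassification probability after one measurement."""
    risk = 0.0
    for o in range(L.shape[0]):
        p_o = L[o] @ b                         # predictive probability of observation o
        posterior = L[o] * b / p_o             # Bayes update
        risk += p_o * (1.0 - posterior.max())  # 0/1-loss Bayes risk of the posterior
    return risk

best_T = min(likelihoods, key=lambda T: expected_bayes_risk(belief, likelihoods[T]))
print(best_T)  # the myopically optimal next measurement temperature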

1996
Richard Washington

This paper presents an approach to building plans using partially observable Markov decision processes. The approach begins with a base solution that assumes full observability. The partially observable solution is incrementally constructed by considering increasing amounts of information from observations. The base solution directs the expansion of the plan by providing an evaluation function ...
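A well-known approximation in the same spirit, Q_MDP, scores actions under a belief with the Q-values of the fully observable base solution; it is shown here only to illustrate how a full-observability solution can serve as an evaluation function, not as this paper's algorithm (toy numbers throughout):

import numpy as np

# Q(s, a) from solving the fully observable MDP: 3 states x 2 actions.
Q = np.array([[1.0, 0.2],
              [0.1, 0.9],
              [0.5, 0.5]])

def qmdp_action(b: np.ndarray) -> int:
    """Pick the action maximizing Q_MDP(b, a) = sum_s b(s) Q(s, a)."""
    return int(np.argmax(b @ Q))

print(qmdp_action(np.array([0.6, 0.3, 0.1])))  # -> action 0 under this belief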

2002
Samuel W. Hasinoff

In this paper, we describe how techniques from reinforcement learning might be used to approach the problem of acting under uncertainty. We start by introducing the theory of partially observable Markov decision processes (POMDPs) to describe what we call hidden state problems. After a brief review of other POMDP solution techniques, we motivate reinforcement learning by considering an agent wi...
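For reference, the standard construction that such treatments build on turns a hidden-state problem into a Markovian one over beliefs: with transition model $T(s' \mid s, a)$ and observation model $O(o \mid s', a)$, the belief $b$ is updated after action $a$ and observation $o$ by Bayes' rule,

$$b'(s') = \frac{O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)}{\Pr(o \mid b, a)},$$

where the denominator $\Pr(o \mid b, a) = \sum_{s'} O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)$ normalizes the update.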

1994
Anthony R. Cassandra, Leslie Pack Kaelbling, Michael L. Littman

In this paper, we describe the partially observable Markov decision process (POMDP) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The POMDP approach was originally developed in the operations research community and provides a formal basis for planning problems that have been of interest t...
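The structural fact behind exact POMDP value iteration in this line of work is that the finite-horizon optimal value function is piecewise linear and convex over the belief simplex, representable by a finite set of alpha-vectors. A minimal Python sketch of evaluating such a representation (toy vectors, not a solver):

import numpy as np

# Each alpha-vector is the value of one conditional plan, expressed per state;
# the value of a belief is the best plan's expected value: V(b) = max_a (a . b).
alphas = [np.array([1.0, 0.0]),
          np.array([0.0, 1.0]),
          np.array([0.6, 0.6])]

def value(b: np.ndarray) -> float:
    return max(float(a @ b) for a in alphas)

print(value(np.array([0.5, 0.5])))  # -> 0.6: the "middle" vector dominates here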

2010
Jeremiah T. Folsom-Kovarik, Gita Reese Sukthankar, Sae Lynne Schatz, Denise M. Nicholson

A promising application area for proactive assistant agents is automated tutoring and training. Intelligent tutoring systems (ITSs) assist tutors and tutees by automating diagnosis and adaptive tutoring. These tasks are well modeled by a partially observable Markov decision process (POMDP) since it accounts for the uncertainty inherent in diagnosis. However, an important aspect of making POMDP ...

2011
Linus Gisslén, Matthew D. Luciw, Vincent Graziano, Jürgen Schmidhuber

Traditional Reinforcement Learning methods are insufficient for AGIs, which must be able to learn to deal with Partially Observable Markov Decision Processes. We investigate a novel method for dealing with this problem: standard RL techniques using as input the hidden layer output of a Sequential Constant-Size Compressor (SCSC). The SCSC takes the form of a sequential Recurrent Auto-Associative Me...
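The general recipe, independent of the specific SCSC/RAAM architecture, is to fold the observation history into a fixed-size recurrent code and let a standard value function act on that code instead of the raw, non-Markovian observation. A toy Python sketch with random weights (purely illustrative; not the authors' model, and all shapes are invented):

import numpy as np

rng = np.random.default_rng(0)
W_in, W_h = rng.normal(size=(8, 3)), rng.normal(size=(8, 8))
w_q = rng.normal(size=(8, 2))  # linear Q-head over the hidden code

def encode(h, obs):
    """One recurrent step: fold the new observation into the history code."""
    return np.tanh(W_in @ obs + W_h @ h)

h = np.zeros(8)
for obs in [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]:
    h = encode(h, obs)

q_values = h @ w_q  # act greedily on the compressed history
print(int(np.argmax(q_values)))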

Journal: Operations Research, 2006
Shoshana Anily, Abraham Grosfeld-Nir

A batch production process that is initially in the in-control state can shift, with a constant failure rate, to the out-of-control state. The probability that a unit is conforming when produced while the process is in control is constant, and higher than the corresponding constant conformance probability while the process is out of control. When production ends, the units are inspected in the order they ...
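Under the stated assumptions (constant failure rate, constant conformance probabilities), the hidden quantity is the shift point, and inspection outcomes update a geometric prior over it by Bayes' rule. A hedged Python sketch of that inference only, with invented parameters q, p_in, p_out; the paper's optimal inspection policy is not reproduced:

import numpy as np

q, p_in, p_out = 0.1, 0.95, 0.5   # per-unit failure rate; conformance probs in/out of control
results = [1, 1, 0, 0]            # 1 = conforming, 0 = defective (toy batch)
n = len(results)

def likelihood(k):
    """P(results | process shifted after unit k), i.e. units 1..k made in control."""
    probs = [p_in if i < k else p_out for i in range(n)]
    return np.prod([p if r else 1 - p for p, r in zip(probs, results)])

# Geometric prior over the shift point K in {0, ..., n}; K = n means no shift occurred.
prior = np.array([(1 - q) ** k * (q if k < n else 1.0) for k in range(n + 1)])
post = np.array([prior[k] * likelihood(k) for k in range(n + 1)])
post /= post.sum()
print(post)  # posterior over when the process went out of control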

2015
Diederik M. Roijers, Shimon Whiteson, Frans A. Oliehoek

Many sequential decision-making problems require an agent to reason about both multiple objectives and uncertainty regarding the environment’s state. Such problems can be naturally modelled as multi-objective partially observable Markov decision processes (MOPOMDPs). We propose optimistic linear support with alpha reuse (OLSAR), which computes a bounded approximation of the optimal solution set...
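Linear-support methods such as OLSAR repeatedly solve scalarized problems: fixing a weight vector over the objectives collapses the vector-valued reward to a scalar one that a standard solver can handle. A Python sketch of just that scalarization step, with toy rewards (OLSAR's alpha-matrix reuse and weight-selection loop are not shown):

import numpy as np

# Per-action reward vectors over two objectives (invented numbers).
R = {"a1": np.array([1.0, 0.2]),
     "a2": np.array([0.4, 0.9])}

def scalarise(w):
    """Collapse vector rewards to scalars for weight vector w."""
    return {a: float(w @ r) for a, r in R.items()}

print(scalarise(np.array([0.7, 0.3])))  # a1 wins under this weighting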

2007
Matthijs Spaan

For reinforcement learning in environments in which an agent has access to a reliable state signal, methods based on the Markov decision process (MDP) have had many successes. In many problem domains, however, an agent suffers from limited sensing capabilities that preclude it from recovering a Markovian state signal from its perceptions. Extending the MDP framework, partially observable Markov...

[Chart: number of search results per publication year]