Search results for: partially observable markov decision process

Number of results: 1,776,231

2007
Judy Goldsmith Martin Mundhenk

It is known that determining whether a DEC-POMDP, namely, a cooperative partially observable stochastic game (POSG), has a cooperative strategy with positive expected reward is complete for NEXP. It was not known until now how cooperation affected that complexity. We show that, for competitive POSGs, the complexity of determining whether one team has a positive-expected-reward strategy is com...

2009
Camille Besse Brahim Chaib-draa

We study a subclass of POMDPs, called quasi-deterministic POMDPs (QDET-POMDPs), characterized by deterministic actions and stochastic observations. While this framework is less general than the full POMDP model, it still captures a number of interesting and challenging problems and, in some cases, has interesting properties. By studying the observability available in this subclass, w...

2011
Lucas Agussurja Hoong Chuin Lau

We consider a partially observable Markov decision process (POMDP) model for improving a taxi agent's cruising decisions in a congested urban city. Using real-world data provided by a large taxi company in Singapore as a guide, we derive the state transition function of the POMDP. Specifically, we model the cruising behavior of the drivers as continuous-time Markov chains. We then apply dynamic pr...

2009
Michael R. James Satinder P. Singh

Reinforcement learning algorithms that use eligibility traces, such as Sarsa(λ), have been empirically shown to be effective in learning good estimated-state-based policies in partially observable Markov decision processes (POMDPs). Nevertheless, one can construct counterexamples, problems in which Sarsa(λ < 1) fails to find a good policy even though one exists. Despite this, these algorithms ...

2007
Ross Glashan Tomás Lozano-Pérez

We describe a method for planning under uncertainty for robotic manipulation of objects by partitioning the configuration space into a set of regions that are closed under compliant motions. These regions can be treated as states in a partially observable Markov decision process (POMDP), which can be solved to yield optimal control policies under uncertainty. We demonstrate the approa...

2001
Bo Zhang Qingsheng Cai Jianfeng Mao Baining Guo

Uncertainty plays a central role in spoken dialogue systems. Some stochastic models like the Markov decision process (MDP) are used to model the dialogue manager. But the partially observable system state and user intentions hinder the natural representation of the dialogue state. An MDP-based system degrades quickly when uncertainty about a user's intention increases. We propose a novel dialogu...

2012
Zongzhang Zhang Michael L. Littman Xiaoping Chen

Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which is a set of belief points that are reachable fr...

1999
Samuel P. M. Choi Dit-Yan Yeung Nevin Lianwen Zhang

Reinforcement learning in nonstationary environments is generally regarded as an important and yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonstationary environments. The environment model, called hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode basic...

Journal: :Cognitive Systems Research 2017
Gavin Rens Deshendran Moodley

This article presents an agent architecture for controlling an autonomous agent in stochastic environments. The architecture combines the partially observable Markov decision process (POMDP) model with the belief-desire-intention (BDI) framework. The Hybrid POMDP-BDI agent architecture takes the best features from the two approaches, that is, the online generation of reward-maximizing courses o...

2001
Bo Zhang Qingsheng Cai Jianfeng Mao Eric Chang Baining Guo

Some stochastic models like the Markov decision process (MDP) are used to model the dialogue manager. An MDP-based system degrades quickly when uncertainty about the user's intention increases. We propose a novel dialogue model based on the partially observable Markov decision process (POMDP). We use hidden system states and user intentions as the state set, parser results and low-level information as the ob...

Chart of the number of search results per year
