Search results for: markov decision process

Number of results: 1627273

Journal: :Revue d'Intelligence Artificielle 2003
Alain Dutech Manuel Samuelides

We present a new algorithm that extends the Reinforcement Learning framework to Partially Observed Markov Decision Processes (POMDP). The main idea of our method is to build a state extension, called exhaustive observable, which allows us to define a new process that is Markovian. We bring the proof that solving this new process, to which classical RL methods can be applied, brings an optimal...

2008
Blaise Thomson Milica Gasic Simon Keizer François Mairesse Jost Schatzmann Kai Yu Steve J. Young

This paper presents the results of a comparative user evaluation of various approaches to dialogue management. The major contribution is a comparison of traditional systems against a system that uses a Bayesian Update of Dialogue State approach. This approach is based on the Partially Observable Markov Decision Process (POMDP), which has previously been shown to give improved robustness in simu...

2000
Nicholas Roy Joelle Pineau Sebastian Thrun

Spoken dialogue managers have benefited from using stochastic planners such as Markov Decision Processes (MDPs). However, so far, MDPs do not handle well noisy and ambiguous speech utterances. We use a Partially Observable Markov Decision Process (POMDP)-style approach to generate dialogue strategies by inverting the notion of dialogue state; the state represents the user’s intentions, rather t...

2013
Yoichi Matsuyama Iwao Akiba Akihiro Saito Tetsunori Kobayashi

In this paper, we propose a framework for conversational robots that facilitates four-participant groups. In three-participant conversations, the minimum unit of multiparty conversation, social imbalance sometimes occurs, in which a participant is left behind in the current conversation. In such scenarios, a conversational robot has the potential to facilitate the situation as the fourth partici...

Journal: :SIAM J. Control and Optimization 2006
Diego Klabjan Daniel Adelman

Semi-Markov decision processes on Borel spaces with deterministic kernels have many practical applications, particularly in inventory theory. Most of the results from general semi-Markov decision processes do not carry over to a deterministic kernel since such a kernel does not provide “smoothness.” We develop infinite dimensional linear programming theory for a general stochastic semi-Markov d...

2003
H. Brendan McMahan Geoffrey J. Gordon

We describe applications and theoretical results for a new class of two-player planning games. In these games, each player plans in a separate Markov Decision Process (MDP), but the costs associated with a policy in one of the MDPs depend on the policy selected by the other player. These cost-paired MDPs represent an interesting and computationally tractable subset of adversarial planning proble...

2013
Chien-Cheng Huang Kwo-Jean Farn Feng-Yu Lin Frank Yeong-Sung Lin

The frequency of information security incidents has been increasing dramatically. The aim of this study is to analyze state-space reachability problems through the transition of vulnerable states after an information system's vulnerability is exposed. In this research we took the time factor into consideration to analyze the arrival time to reachable states, a problem discussed in stochastic Petri nets....

Journal: :CoRR 2016
Zhongqi Lu Qiang Yang

We report the ‘Recurrent Deterioration’ (RD) phenomenon observed in online recommender systems. The RD phenomenon is reflected by the trend of performance degradation when the recommendation model is repeatedly trained on users’ feedback from the previous recommendations. There are several reasons for recommender systems to encounter the RD phenomenon, including the lack of negative traini...

2000
Haili Song Chen-Ching Liu Robert W. Dahlgren

The bidding decision-making problem is studied from a supplier’s viewpoint in a spot market environment. The decision-making problem is formulated as a Markov Decision Process, a discrete stochastic optimization method. All other suppliers are modeled by their bidding parameters with corresponding probabilities. A systematic method is developed to calculate transition probabilities and rewards. A...

Journal: :Math. Oper. Res. 2009
Vikram Krishnamurthy Bo Wahlberg

This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably simplifying the computational cost. A suboptimal value iteration algorithm based on Lovejoy’s approximation is presented. We then show ...
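Several of the results above compute optimal policies by value iteration over an MDP. As a hedged illustration of that general technique only (not the Gittins-index algorithm of the paper above), a minimal sketch on a tiny, made-up MDP might look like:

```python
# Minimal value iteration sketch on a small hand-made MDP.
# All states, actions, probabilities, and rewards are invented
# purely for illustration.

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 0.5), (2, 0.5)], 1: [(2, 1.0)]},
    2: {0: [(2, 1.0)],           1: [(0, 1.0)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 2.0, 1: 0.0},
    2: {0: 0.0, 1: 5.0},
}

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality update until the value change is < tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Extract the greedy policy with respect to the converged values.
    policy = {
        s: max(
            P[s],
            key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]),
        )
        for s in P
    }
    return V, policy

V, pi = value_iteration(P, R)
print(V, pi)
```

The same fixed-point idea underlies POMDP solvers, except that the update runs over a continuous belief space rather than a finite state set, which is why approximations such as Lovejoy's are needed there.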

Chart of the number of search results per year

Click on the chart to filter results by publication year