Search results for: markov decision process

Number of results: 1627273

2013
Jac Dinnes, Lavinia Ferrante di Ruffano, Alice Sitch, Julie Parkes, Jenny Hewison, Doug Altman, Jon Deeks

Background Given the advantages of the randomised controlled trial (RCT) design for the evaluation of therapeutic interventions, it is tempting to assume that the same approach must be the gold standard for the evaluation of testing strategies. Such trials present considerable challenges, due to the complex nature of the decision-making process. To interpret how changes in testing strategies cr...

2012
Ru-Shuo Sheu, Han-Hsin Chou, Te-Shyang Tan

For a reservoir with periodic states and different cost functions with penalty, the release rules can be modeled as a periodic Markov decision process (PMDP). First, we prove that the policy-iteration algorithm also works for the PMDP. Then, with the policy-iteration algorithm, we obtain the optimal policies for a special aperiodic reservoir model with two cost functions under large penalty and g...
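The policy-iteration algorithm mentioned in this abstract can be sketched for a generic finite MDP. The two-state, two-action model below is purely illustrative (it is not the paper's reservoir model): evaluation solves the linear system for the current policy's value exactly, and improvement acts greedily on a one-step lookahead.

```python
import numpy as np

# Illustrative finite MDP (NOT the paper's reservoir model):
# P[a][s][s'] = transition probability, R[a][s] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.4, 0.6]],
              [[0.2, 0.8], [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9
n_states = 2

policy = np.zeros(n_states, dtype=int)  # start with action 0 everywhere
while True:
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
    P_pi = P[policy, np.arange(n_states)]   # n_states x n_states
    r_pi = R[policy, np.arange(n_states)]
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Policy improvement: act greedily w.r.t. the one-step lookahead.
    q = R + gamma * P @ v                   # n_actions x n_states
    new_policy = q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break                               # greedy policy unchanged => optimal
    policy = new_policy
```

At termination the value of the final policy satisfies the Bellman optimality equation, which is what the convergence proof for periodic MDPs in the abstract extends.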

Journal: :CoRR 2015
Yao Ma, Hao Zhang, Masashi Sugiyama

The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions. In this paper, we propose practical online MDP algorithms with policy iteration and theoretically establish a sublinear regret bound. A notable advantage of the proposed algorithm is that it can be easily combined with function approximation, and thu...

2011
Hélène Soubaras, Christophe Labreuche, Pierre Savéant

This paper proposes a new model, the EMDP (Evidential Markov Decision Process). It is an MDP (Markov Decision Process) for belief functions in which rewards are defined for each state transition, as in a classical MDP, whereas the transitions are modeled as in an EMC (Evidential Markov Chain), i.e. they are transitions between sets of states instead of transitions between single states. The EMDP can fit more applications t...

2012
Siegmund Düll, Lina Weichbrodt, Alexander Hans, Steffen Udluft

This paper presents a state estimation approach for reinforcement learning (RL) of a partially observable Markov decision process. It is based on a special recurrent neural network architecture, the Markov decision process extraction network with shortcuts (MPEN-S). In contrast to previous work regarding this topic, we address the problem of long-term dependencies, which cause major problems in...

Journal: :CoRR 2018
Zhi Chen, Pengqian Yu, William B. Haskell

The distributionally robust Markov Decision Process approach has been proposed in the literature, where the goal is to seek a distributionally robust policy that achieves the maximal expected total reward under the most adversarial joint distribution of uncertain parameters. In this paper, we study distributionally robust MDP where ambiguity sets for uncertain parameters are of a format that ca...
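The worst-case reasoning in a distributionally robust MDP can be illustrated with a deliberately simplified ambiguity set: a finite list of candidate transition models, where one Bellman backup takes the minimum expected value over the set before maximizing over actions. The models and rewards below are made-up examples, and the paper's ambiguity sets are far more general than a finite list.

```python
import numpy as np

# One worst-case Bellman backup over a FINITE ambiguity set of transition
# models (a simplified sketch; real ambiguity sets are richer, e.g. moment-
# or distance-based).
def robust_backup(v, models, R, gamma=0.9):
    """models: list of P[a][s][s'] arrays; R: per-action/state reward matrix."""
    # For each (action, state), take the minimum expected value over the set,
    # then act greedily against that worst case.
    q_worst = np.min([R + gamma * P @ v for P in models], axis=0)
    return q_worst.max(axis=0), q_worst.argmax(axis=0)

# Two candidate transition models that disagree on action 0.
P1 = np.array([[[1.0, 0.0], [0.0, 1.0]],
               [[0.5, 0.5], [0.5, 0.5]]])
P2 = np.array([[[0.0, 1.0], [1.0, 0.0]],
               [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
values, greedy = robust_backup(np.zeros(2), [P1, P2], R)
```

Iterating this backup to a fixed point yields the robust value function for the finite-set case; the paper's contribution concerns formats of ambiguity sets beyond this toy setting.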

Journal: :Math. Oper. Res. 2005
Shie Mannor, John N. Tsitsiklis

We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some policy, in a strong sense. Furthermore, we show that the probability of exceeding a given distan...

2011
András Lörincz

AGI relies on Markov Decision Processes, which assume deterministic states. However, such states must be learned. We propose that states are deterministic spatio-temporal chunks of observations and note that learning of such episodic memory is attributed to the entorhinal-hippocampal complex (EHC) in the brain. The EHC receives information from the neocortex and encodes learned episodes into neocortica...

Journal: :Computers & Industrial Engineering 2015
Shichang Du, Rui Xu, Delin Huang, Xufeng Yao

Modeling and analysis of multi-stage manufacturing systems (MMSs) for product quality propagation have attracted a great deal of attention recently. Due to cost and resources constraints, MMSs do not always have ubiquitous inspection, and MMSs with remote quality information feedback (RQIF, i.e., quality inspection operation is conducted at the end of the production line) are widely applied. Th...

2007
Ronald Ortner

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be caused by working on the aggregated instead of the original MDP are given and compared ...
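State aggregation driven by a pseudometric, as described in this abstract, can be sketched with a greedy clustering rule: merge a state into the first existing block whose representative is within epsilon of it. The distance matrix below is invented for illustration; the paper's "adequate" pseudometrics, which respect the MDP's reward and transition structure, are not reproduced here.

```python
import numpy as np

# Greedy epsilon-aggregation of MDP states under a given pseudometric
# (illustrative; not the paper's construction of adequate pseudometrics).
def aggregate_states(d, eps):
    """d: symmetric n x n pseudometric matrix; returns a block index per state."""
    n = d.shape[0]
    block = -np.ones(n, dtype=int)
    representatives = []  # one representative state per block
    for s in range(n):
        # Join the first existing block whose representative is within eps.
        for b, rep in enumerate(representatives):
            if d[s, rep] <= eps:
                block[s] = b
                break
        else:
            block[s] = len(representatives)
            representatives.append(s)
    return block

# Hypothetical 3-state pseudometric: states 0 and 1 are close, state 2 is far.
d = np.array([[0.0, 0.1, 0.9],
              [0.1, 0.0, 0.8],
              [0.9, 0.8, 0.0]])
blocks = aggregate_states(d, 0.2)  # -> states 0 and 1 merge; state 2 alone
```

The upper bounds discussed in the abstract then control how much value is lost by planning on the blocks instead of the original states, as a function of eps and the pseudometric's adequacy.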

Chart of search-result counts per publication year

Click on the chart to filter the results by publication year