Search results for: markov decision process

Number of results: 1627273

2013
Jac Dinnes, Lavinia Ferrante di Ruffano, Alice Sitch, Julie Parkes, Jenny Hewison, Doug Altman, Jon Deeks

Background Given the advantages of the randomised controlled trial (RCT) design for the evaluation of therapeutic interventions, it is tempting to assume that the same approach must be the gold standard for the evaluation of testing strategies. Such trials present considerable challenges, due to the complex nature of the decision-making process. To interpret how changes in testing strategies cr...

2012
Ru-Shuo Sheu, Han-Hsin Chou, Te-Shyang Tan

For a reservoir with periodic states and different cost functions with penalty, the release rules can be modeled as a periodic Markov decision process (PMDP). First, we prove that the policy-iteration algorithm also works for the PMDP. Then, with the policy-iteration algorithm, we obtain the optimal policies for a special aperiodic reservoir model with two cost functions under large penalty and g...
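The policy-iteration algorithm mentioned in this abstract can be sketched for a generic finite MDP. The two-state, two-action model below is purely illustrative (it is not the paper's reservoir model): evaluation solves the linear system for the current policy's value exactly, and improvement acts greedily on a one-step lookahead.

```python
import numpy as np

# Illustrative finite MDP (NOT the paper's reservoir model):
# P[a][s][s'] = transition probability, R[a][s] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.4, 0.6]],
              [[0.2, 0.8], [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9
n_states = 2

policy = np.zeros(n_states, dtype=int)  # start with action 0 everywhere
while True:
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
    P_pi = P[policy, np.arange(n_states)]   # n_states x n_states
    r_pi = R[policy, np.arange(n_states)]
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Policy improvement: act greedily w.r.t. the one-step lookahead.
    q = R + gamma * P @ v                   # n_actions x n_states
    new_policy = q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break                               # greedy policy unchanged => optimal
    policy = new_policy
```

At termination the value of the final policy satisfies the Bellman optimality equation, which is what the convergence proof for periodic MDPs in the abstract extends.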

Journal: :CoRR 2015
Yao Ma, Hao Zhang, Masashi Sugiyama

The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions. In this paper, we propose practical online MDP algorithms with policy iteration and theoretically establish a sublinear regret bound. A notable advantage of the proposed algorithm is that it can be easily combined with function approximation, and thu...

2011
Hélène Soubaras, Christophe Labreuche, Pierre Savéant

This paper proposes a new model, the EMDP (Evidential Markov Decision Process). It is an MDP (Markov Decision Process) for belief functions in which rewards are defined for each state transition, as in a classical MDP, whereas the transitions are modeled as in an EMC (Evidential Markov Chain), i.e. they are transitions between sets of states instead of transitions between single states. The EMDP can fit more applications t...

2012
Siegmund Düll, Lina Weichbrodt, Alexander Hans, Steffen Udluft

This paper presents a state estimation approach for reinforcement learning (RL) of a partially observable Markov decision process. It is based on a special recurrent neural network architecture, the Markov decision process extraction network with shortcuts (MPEN-S). In contrast to previous work regarding this topic, we address the problem of long-term dependencies, which cause major problems in...

Journal: :CoRR 2018
Zhi Chen, Pengqian Yu, William B. Haskell

The distributionally robust Markov Decision Process approach has been proposed in the literature, where the goal is to seek a distributionally robust policy that achieves the maximal expected total reward under the most adversarial joint distribution of uncertain parameters. In this paper, we study distributionally robust MDP where ambiguity sets for uncertain parameters are of a format that ca...
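The worst-case reasoning in a distributionally robust MDP can be illustrated with a deliberately simplified ambiguity set: a finite list of candidate transition models, where one Bellman backup takes the minimum expected value over the set before maximizing over actions. The models and rewards below are made-up examples, and the paper's ambiguity sets are far more general than a finite list.

```python
import numpy as np

# One worst-case Bellman backup over a FINITE ambiguity set of transition
# models (a simplified sketch; real ambiguity sets are richer, e.g. moment-
# or distance-based).
def robust_backup(v, models, R, gamma=0.9):
    """models: list of P[a][s][s'] arrays; R: per-action/state reward matrix."""
    # For each (action, state), take the minimum expected value over the set,
    # then act greedily against that worst case.
    q_worst = np.min([R + gamma * P @ v for P in models], axis=0)
    return q_worst.max(axis=0), q_worst.argmax(axis=0)

# Two candidate transition models that disagree on action 0.
P1 = np.array([[[1.0, 0.0], [0.0, 1.0]],
               [[0.5, 0.5], [0.5, 0.5]]])
P2 = np.array([[[0.0, 1.0], [1.0, 0.0]],
               [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
values, greedy = robust_backup(np.zeros(2), [P1, P2], R)
```

Iterating this backup to a fixed point yields the robust value function for the finite-set case; the paper's contribution concerns formats of ambiguity sets beyond this toy setting.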

Journal: :Math. Oper. Res. 2005
Shie Mannor, John N. Tsitsiklis

We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and establish that every element of this polytope is the limit of the empirical frequency vector, under some policy, in a strong sense. Furthermore, we show that the probability of exceeding a given distan...

2011
András Lörincz

AGI relies on Markov Decision Processes, which assume deterministic states. However, such states must be learned. We propose that states are deterministic spatio-temporal chunks of observations and note that learning of such episodic memory is attributed to the entorhinal-hippocampal complex (EHC) in the brain. The EHC receives information from the neocortex and encodes learned episodes into neocortica...

Journal: :Computers & Industrial Engineering 2015
Shichang Du, Rui Xu, Delin Huang, Xufeng Yao

Modeling and analysis of multi-stage manufacturing systems (MMSs) for product quality propagation have attracted a great deal of attention recently. Due to cost and resources constraints, MMSs do not always have ubiquitous inspection, and MMSs with remote quality information feedback (RQIF, i.e., quality inspection operation is conducted at the end of the production line) are widely applied. Th...

2007
Ronald Ortner

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be caused by working on the aggregated instead of the original MDP are given and compared ...
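State aggregation driven by a pseudometric, as described in this abstract, can be sketched with a greedy clustering rule: merge a state into the first existing block whose representative is within epsilon of it. The distance matrix below is invented for illustration; the paper's "adequate" pseudometrics, which respect the MDP's reward and transition structure, are not reproduced here.

```python
import numpy as np

# Greedy epsilon-aggregation of MDP states under a given pseudometric
# (illustrative; not the paper's construction of adequate pseudometrics).
def aggregate_states(d, eps):
    """d: symmetric n x n pseudometric matrix; returns a block index per state."""
    n = d.shape[0]
    block = -np.ones(n, dtype=int)
    representatives = []  # one representative state per block
    for s in range(n):
        # Join the first existing block whose representative is within eps.
        for b, rep in enumerate(representatives):
            if d[s, rep] <= eps:
                block[s] = b
                break
        else:
            block[s] = len(representatives)
            representatives.append(s)
    return block

# Hypothetical 3-state pseudometric: states 0 and 1 are close, state 2 is far.
d = np.array([[0.0, 0.1, 0.9],
              [0.1, 0.0, 0.8],
              [0.9, 0.8, 0.0]])
blocks = aggregate_states(d, 0.2)  # -> states 0 and 1 merge; state 2 alone
```

The upper bounds discussed in the abstract then control how much value is lost by planning on the blocks instead of the original states, as a function of eps and the pseudometric's adequacy.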

Chart of search-result counts per publication year

Click on the chart to filter the results by publication year