Search results for: markov decision process

Number of results: 1627273

2015
Takayuki Osogami

We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to ...

2012
Wei Huang Jun Zhang

In the course lectures, we discussed the unconstrained Markov decision process (MDP) at length, including the dynamic programming decomposition and optimal policies for MDPs. In this report, however, we discuss a different MDP model: the constrained MDP. There are many practical reasons to study constrained MDPs. For instance, in wireless sensor networks, each...

2012
Kumer Pial Das

Outline: Objective; Background: Stochastic tools used in healthcare; MDP in healthcare; Preliminaries; Optimality Equations and the Principle of Optimality; Solving MDPs; Examples; References. Objective: To discuss the construction and evaluation of Markov Decision Process (MDP). To investigate the role of MDP in healthcare. To identify the most appropriate solution techniques for finite and infinite-hor...

2016
Houju Hori Yukio Matsumoto

In this paper, we consider decision processes in which one decision is made in each process. We incorporate the utility function concept into the decision process, derive the utility function in fuzzy events, and, by the max-product operation, obtain the utility possibility measure of the fuzzy events. In cases with numerous decision processes, the optimum action can be determined from the re...

2010
Siegmund Düll Alexander Hans Steffen Udluft

This paper presents the Markov decision process extraction network, which is a data-efficient, automatic state estimation approach for discrete-time reinforcement learning (RL) based on recurrent neural networks. The architecture is designed to model the minimal relevant dynamics of an environment, capable of condensing large sets of continuous observables to a compact state representation and ...

Journal: Math. Meth. of OR 2002
Antonio M. Rodríguez-Chía Justo Puerto Francisco R. Fernández

In this paper, we deal with a multicriteria competitive Markov decision process. The decision process has two decision makers with competitive behaviour, so they are usually called players. Their rewards are coupled because they depend on the actions chosen by both players in each state of the process. We propose as a solution to this game the set of Pareto-optimal security strategies for a...

2009
Qi Zhang Guangzhong Sun Yinlong Xu

The Markov decision process (MDP) provides the foundation for a number of problems in areas such as artificial intelligence, automated planning and reinforcement learning. MDPs can be solved efficiently in theory. However, for large scenarios, more investigation is needed to find practical algorithms. Algorithms for solving MDPs have a natural concurrency. In this paper, we present parallel alg...

2004
Eyal Even-Dar Sham M. Kakade Yishay Mansour

We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Similar to the experts setting, we address the question of how well an agent can do when compared to the reward achieved under the best stationary policy over time. We provide efficient algorithms, which have regret bounds...

Journal: Industrial Engineering 0
Mehdi Ahmadi, PhD student, Faculty of Industrial Engineering, Sharif University of Technology; Hassan Shavandi, Associate Professor, Faculty of Industrial Engineering, Sharif University of Technology

In this article, decisions about price and stock allocation for a seller with multiple customer classes are analyzed. With each customer arrival, the seller needs to decide whether to accept or reject the customer’s demand by considering the stock on hand. In the case of acceptance, the seller needs to decide on the selling price. After any change in the inventory level, a decision about continuing ...

A. Sadegheih, M. Abooie M. Fallahnezhad S. Yazdanparast

This study aimed to present a method for formulating optimal production, repair and replacement policies. The system was based on the production rate of defective parts and machine repairs and was then set up to optimize maintenance activities and related costs. The machine is either repaired or replaced. The machine is changed completely in the replacement process, but the productio...
