Successive approximations for Markov decision processes and Markov games with unbounded rewards
Authors
Abstract
Similar resources
Full text
Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards
This paper establishes the existence of a solution to the optimality equations in undiscounted semi-Markov decision models with countable state space, under conditions generalizing the hitherto obtained results. In particular, we merely require the existence of a finite set of states in which every pair of states can reach each other via some stationary policy, instead of the traditional and re...
Full text
Markov Decision Processes with Functional Rewards
Markov decision processes (MDP) have become one of the standard models for decision-theoretic planning problems under uncertainty. In its standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model allowing rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that sev...
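The abstract above describes rewards that are functions rather than additive scalars, with the value of a history computed by recursively composing the reward functions. A minimal hypothetical sketch of that idea (the concrete reward functions below are invented for illustration and are not taken from the paper):

```python
from functools import reduce

def compose_history(reward_fns, initial=0.0):
    """Fold the reward functions of a history over an initial value:
    each transition contributes a function applied to the value so far,
    generalizing the classical sum of scalar rewards."""
    return reduce(lambda acc, f: f(acc), reward_fns, initial)

# A three-step history, each step with its own (made-up) reward function:
history = [
    lambda v: v + 3.0,        # additive reward: the classical special case
    lambda v: max(v, 2.5),    # e.g. tracking a guaranteed minimum level
    lambda v: 0.9 * v + 1.0,  # discounted accumulation
]

value = compose_history(history)  # ((0 + 3) -> max(3, 2.5) -> 0.9*3 + 1) = 3.7
```

When every reward function has the form `lambda v: v + r` for a scalar `r`, the composition reduces to the usual additive total reward.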
Full text
Markov decision processes with fuzzy rewards
In this paper, we consider the model in which the information on the rewards in vector-valued Markov decision processes includes imprecision or ambiguity. The fuzzy reward model is analyzed as follows: the fuzzy reward is represented by a fuzzy set on the multi-dimensional Euclidean space R, and the infinite-horizon fuzzy expected discounted reward (FEDR) from any stationary policy is characterized...
Full text
Interactive Value Iteration for Markov Decision Processes with Unknown Rewards
To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help comparing sequences of rewards. We first show how the set of possible rewa...
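The value iteration scheme referred to above is itself a successive-approximation method: the value function is repeatedly updated by the Bellman operator until it converges. A minimal sketch on a made-up two-state MDP with known scalar rewards (the transition probabilities and rewards below are invented for illustration; the paper's elicitation of unknown rewards is not modeled here):

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Successive approximation of the optimal value function:
    V <- max_a [ R(s, a) + gamma * sum_{s'} P(s'|s, a) * V(s') ]."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: max(
                R[(s, a)] + gamma * sum(P[(s, a, s2)] * V[s2] for s2 in states)
                for a in actions
            )
            for s in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new

states = [0, 1]
actions = ["stay", "move"]
# P[(s, a, s')] transition probabilities, R[(s, a)] rewards (illustrative):
P = {
    (0, "stay", 0): 1.0, (0, "stay", 1): 0.0,
    (0, "move", 0): 0.2, (0, "move", 1): 0.8,
    (1, "stay", 0): 0.0, (1, "stay", 1): 1.0,
    (1, "move", 0): 0.8, (1, "move", 1): 0.2,
}
R = {(0, "stay"): 0.0, (0, "move"): 1.0, (1, "stay"): 2.0, (1, "move"): 0.0}

V = value_iteration(states, actions, P, R)
```

With a discount factor below one and bounded rewards, the Bellman operator is a contraction, so the iterates converge geometrically; the unbounded-reward setting of the main paper requires additional conditions to make such successive approximations work.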
Full text
Journal
Journal title: Mathematische Operationsforschung und Statistik. Series Optimization
Year: 1979
ISSN: 0323-3898
DOI: 10.1080/02331937908842597