Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences

Author

  • Paul Weng
Abstract

In a standard Markov decision process (MDP), rewards are assumed to be precisely known and quantitative. This assumption can be too strong in some situations. Even when rewards can be modeled numerically, specifying the reward function is often difficult because it is a cognitively demanding and/or time-consuming task. Moreover, rewards can sometimes be qualitative in nature, for instance when they represent qualitative risk levels. In those cases, using standard MDPs directly is problematic, and we propose instead to resort to MDPs with ordinal rewards, where only a total order over rewards is assumed to be known. In this setting, we explain how reference points, an alternative way to define expressive and interpretable preferences, can be exploited.

Similar resources

Ordinal Decision Models for Markov Decision Processes

Setting the values of rewards in Markov decision processes (MDPs) may be a difficult task. In this paper, we consider two ordinal decision models for MDPs where only an order is known over rewards. The first one, which has been proposed recently in MDPs [23], defines preferences with respect to a reference point. The second model, which can be viewed as the dual approach of the first one, is b...

Interactive Q-Learning with Ordinal Rewards and Unreliable Tutor

Conventional reinforcement learning (RL) requires the specification of a numeric reward function, which is often a difficult task. In this paper, we extend the Q-learning approach toward the handling of ordinal rewards. The method we propose is interactive in the sense of allowing the agent to query a tutor for comparing sequences of ordinal rewards. More specifically, this method can be seen a...

Markov decision processes with fuzzy rewards

In this paper, we consider a model in which the information on the rewards in vector-valued Markov decision processes includes imprecision or ambiguity. The fuzzy reward model is analyzed as follows: the fuzzy reward is represented by a fuzzy set on the multi-dimensional Euclidean space R and the infinite-horizon fuzzy expected discounted reward (FEDR) from any stationary policy is characterized...

A New Clustering Technic by the Preferences of the Objective in Data Envelopment Analysis

Methods for placing decision making units (DMUs) into clusters are a subject of study in statistics, and these methods are usually heuristic. The clustering approach proposed in this article considers the preferences of DMUs. This study applies Data Envelopment Analysis (DEA). DMUs are clustered by solving a multi-objective linear problem (MOLP) and by considering the preferences of each DMU at production ...

Reference-based preferences aggregation procedures in multi-criteria decision making

This paper aims to introduce and investigate a new family of purely qualitative models for multicriteria decision making. Such models do not require any numerical representation. Within this family, we will focus on decision rules that use reference levels to help compare several alternatives. We will investigate both the descriptive potential of such rules and their axiomatic found...

Publication date: 2011