APRIL: Active Preference-based Reinforcement Learning
Authors
Abstract
This work tackles in-situ robotics: the goal is to learn a policy while the robot operates in the real world, with neither ground truth nor rewards available. The proposed approach builds on preference-based policy learning: iteratively, the robot demonstrates a few policies, is informed of the expert’s preferences among the demonstrated policies, constructs a utility function compatible with all expert preferences, uses it in a self-training phase, and demonstrates a new policy in the next iteration. Whereas in previous work the new policy was one maximizing the current utility function, this paper uses active ranking to select the most informative policy (Viappiani and Boutilier 2010). The challenge is the following: the policy return estimate (the expert’s approximate preference function) learned on the policy parameter space, referred to as the direct representation, fails to give any useful information; indeed, arbitrarily small modifications of the direct policy representation can produce significantly different behaviors, and thus elicit different appreciations from the expert. A behavioral policy space, referred to as the indirect representation and built automatically from the sensorimotor data stream generated by the operating robot, is therefore devised and used to express the policy return estimate. Meanwhile, active ranking criteria are classically expressed w.r.t. the explicit domain representation, here the direct policy representation. A novelty of the paper is to show how active ranking can be achieved through black-box optimization on the indirect policy representation. Two experiments, in single- and two-robot settings, illustrate the approach.
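The iterative loop described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration, not the paper’s implementation: policies are plain parameter vectors, the behavioral (indirect) representation is a fixed nonlinear feature map, the expert is simulated by a hidden target policy, the utility model is a linear ranker updated perceptron-style, and "active ranking via black-box optimization" is stood in for by random search over candidates scored by the current utility. All names (`behavior`, `expert_prefers`, `april`, `TARGET`) are invented for this sketch.

```python
# Hypothetical sketch of a preference-based policy learning loop in the
# spirit of APRIL. Assumptions (not from the paper): linear utility over
# behavioral features, perceptron updates, random search as the black-box
# optimizer, and a simulated expert defined by a hidden target policy.
import random

random.seed(0)
DIM = 3
TARGET = [0.5, -0.2, 0.8]  # hidden policy the simulated expert prefers

def behavior(theta):
    # "Indirect" behavioral representation: a simple nonlinear feature map
    # standing in for features extracted from the sensorimotor data stream.
    return [t * t for t in theta] + list(theta)

def expert_prefers(a, b):
    # Simulated expert: prefers the policy closer to the hidden target.
    da = sum((x - t) ** 2 for x, t in zip(a, TARGET))
    db = sum((x - t) ** 2 for x, t in zip(b, TARGET))
    return da < db

def utility(w, theta):
    # Policy return estimate: linear in the behavioral features.
    return sum(wi * fi for wi, fi in zip(w, behavior(theta)))

def april(iterations=30):
    w = [0.0] * (2 * DIM)                      # utility model weights
    best = [random.uniform(-1, 1) for _ in range(DIM)]
    for _ in range(iterations):
        # Black-box search in parameter space for a candidate the current
        # utility ranks highly (stand-in for the active ranking criterion).
        candidate = max(
            ([random.uniform(-1, 1) for _ in range(DIM)] for _ in range(50)),
            key=lambda th: utility(w, th),
        )
        # The expert states a preference over the demonstrated pair.
        if expert_prefers(candidate, best):
            winner, loser = candidate, best
            best = candidate
        else:
            winner, loser = best, candidate
        # Perceptron-style update so the utility agrees with the preference.
        if utility(w, winner) <= utility(w, loser):
            fw, fl = behavior(winner), behavior(loser)
            w = [wi + 0.1 * (a - b) for wi, a, b in zip(w, fw, fl)]
    return best
```

Under these toy assumptions, the returned policy tends to drift toward the expert’s hidden target as preferences accumulate, which is the qualitative behavior the abstract describes.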
Similar references
APRIL: Active Preference Learning-Based Reinforcement Learning
This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics for instance, the expert can hardly design a reward function or demonstrate the target behavior, forbidding the use of both standard RL and inverse reinforcement learning. Although with a limited expertise, the human expert is still often able to emit preferences and rank the agent de...
Preference-Based Reinforcement Learning: A preliminary survey
Preference-based reinforcement learning has gained significant popularity over the years, but it is still unclear what exactly preference learning is and how it relates to other reinforcement learning tasks. In this paper, we present a general definition of preferences as well as some insight how these approaches compare to reinforcement learning, inverse reinforcement learning and other relate...
Preference-based Reinforcement Learning
This paper investigates the problem of policy search based only on the expert’s preferences. Whereas reinforcement learning classically relies on a reward function, or exploits the expert’s demonstrations, preference-based policy learning (PPL) iteratively builds and optimizes a policy return estimate as follows: The learning agent demonstrates a few policies, is informed of the expert’s prefer...
Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a “preference-based” approach to reinforcement learning is a possible extension of the type of feedback an agent may learn from. In particular, while conventional RL methods are essentially confined to deal with numeri...
Towards Preference-Based Reinforcement Learning
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs o...