Active Reward Learning from Critiques
Abstract
Learning from demonstration algorithms, such as Inverse Reinforcement Learning, aim to provide a natural mechanism for programming robots, but can often require a prohibitive number of demonstrations to capture important subtleties of a task. Rather than requesting additional demonstrations blindly, active learning methods leverage uncertainty to query the user for action labels at states with high expected information gain. However, this approach can still require a large number of labels to adequately reduce uncertainty and may also be unintuitive, as users are not accustomed to determining optimal actions in a single out-of-context state. To address these shortcomings, we propose a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that 1) queries the user for critiques of automatically generated trajectories, rather than asking for demonstrations or action labels, 2) utilizes trajectory segmentation to expedite the critique/labeling process, and 3) predicts the user’s critiques to generate the most highly informative trajectory queries. We evaluated our algorithm in simulated domains, finding it to compare favorably to prior work and a randomized baseline.
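To make the query loop concrete, below is a minimal Python sketch of the critique-based active Bayesian IRL cycle the abstract describes: sample reward hypotheses, score candidate trajectories by how uncertain their predicted segment critiques are, query the most informative one, and update the posterior from the returned labels. The linear feature model, fixed-length segmentation, sigmoid critique likelihood, entropy-based query score, and resampling update are all illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the critique-query loop (all modeling choices assumed).
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 4
N_SAMPLES = 50          # posterior samples over reward weights
SEGMENT_LEN = 5
BETA = 3.0              # assumed rationality constant in the critique model

# Posterior over linear reward weights, represented by samples (e.g., from MCMC).
weight_samples = rng.normal(size=(N_SAMPLES, N_FEATURES))
weight_samples /= np.linalg.norm(weight_samples, axis=1, keepdims=True)

def segment(traj_features):
    """Split a (T, N_FEATURES) trajectory into fixed-length segments."""
    return [traj_features[i:i + SEGMENT_LEN]
            for i in range(0, len(traj_features), SEGMENT_LEN)]

def predicted_critique_prob(seg, w):
    """P(user labels this segment 'good') under one reward hypothesis w.
    Assumed sigmoid model: higher mean reward -> more likely approved."""
    return 1.0 / (1.0 + np.exp(-BETA * seg.dot(w).mean()))

def query_score(traj_features):
    """Informativeness of querying this trajectory: entropy of the predicted
    segment labels under the posterior (a stand-in for information gain)."""
    score = 0.0
    for seg in segment(traj_features):
        p = np.mean([predicted_critique_prob(seg, w) for w in weight_samples])
        p = np.clip(p, 1e-6, 1 - 1e-6)
        score += -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return score

# Candidate trajectories (here: random feature sequences; in the paper they
# are generated automatically, e.g. from policies under sampled rewards).
candidates = [rng.normal(size=(20, N_FEATURES)) for _ in range(10)]
best = max(candidates, key=query_score)

# Simulated user critique: label each segment of the chosen trajectory.
true_w = rng.normal(size=N_FEATURES)
labels = [int(seg.dot(true_w).mean() > 0) for seg in segment(best)]

# Posterior update by likelihood weighting on the labels, then resampling.
log_lik = np.zeros(N_SAMPLES)
for seg, y in zip(segment(best), labels):
    for i, w in enumerate(weight_samples):
        p = predicted_critique_prob(seg, w)
        log_lik[i] += np.log(p if y else 1 - p)
probs = np.exp(log_lik - log_lik.max())
probs /= probs.sum()
weight_samples = weight_samples[rng.choice(N_SAMPLES, size=N_SAMPLES, p=probs)]
```

In the full algorithm, a Bayesian IRL sampler such as MCMC would typically maintain the reward posterior; the importance-resampling step above is only a compact stand-in for that update.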
Similar Papers
Active Learning from Critiques via Bayesian Inverse Reinforcement Learning
Learning from demonstration algorithms, such as Inverse Reinforcement Learning, aim to provide a natural mechanism for programming robots, but can often require a prohibitive number of demonstrations to capture important subtleties of a task. Rather than requesting additional demonstrations blindly, active learning methods leverage uncertainty to query the user for action labels at states with ...
Active Reward Learning
While reward functions are an essential component of many robot learning methods, defining such functions remains a hard problem in many practical applications. For tasks such as grasping, there are no reliable success measures available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. Instead, we propose to learn the reward fu...
Active Preference-Based Learning of Reward Functions
Our goal is to efficiently learn reward functions encoding a human’s preferences for how a dynamical system should act. There are two challenges with this. First, in many problems it is difficult for people to provide demonstrations of the desired system trajectory (like a high-DOF robot arm motion or an aggressive driving maneuver), or to even assign how much numerical reward an action or traj...
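The preference-query mechanism this entry describes can be illustrated with a small sketch: instead of demonstrating or numerically rating a trajectory, the user compares two candidates, and reward hypotheses are reweighted by a softmax (Bradley-Terry-style) choice likelihood. The feature vectors and the likelihood form below are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a single preference query and Bayesian update (assumed model).
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_features = 100, 3

# Hypotheses about the reward weights (posterior samples).
W = rng.normal(size=(n_samples, n_features))

# Cumulative feature counts of two candidate trajectories, phi(A) and phi(B).
phi_a = np.array([1.0, 0.2, -0.5])
phi_b = np.array([0.3, 0.9, 0.1])

# User answers "A preferred over B". Softmax likelihood per hypothesis:
# P(A > B | w) = exp(w.phi_A) / (exp(w.phi_A) + exp(w.phi_B)).
ra, rb = W @ phi_a, W @ phi_b
lik = np.exp(ra) / (np.exp(ra) + np.exp(rb))

# Importance-reweight and resample the hypotheses given the answer.
probs = lik / lik.sum()
W = W[rng.choice(n_samples, size=n_samples, p=probs)]
print("posterior mean weights:", W.mean(axis=0))
```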
Active Learning for Reward Estimation in Inverse Reinforcement Learning
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at “arbitrary” ...
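A toy version of the state-query idea this entry describes: ask the demonstrator for an action at the state where the reward hypotheses disagree most about what is optimal. The tabular setup and the vote-entropy criterion below are illustrative assumptions, not the paper's specific query rule.

```python
# Sketch of active state selection via vote entropy (assumed criterion).
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, n_hyp = 8, 4, 30

# Q-values under each reward hypothesis (in practice: solve the MDP per sample).
Q = rng.normal(size=(n_hyp, n_states, n_actions))

def vote_entropy(state):
    """Entropy of greedy-action votes across hypotheses at one state."""
    votes = np.bincount(Q[:, state, :].argmax(axis=1), minlength=n_actions)
    p = votes / votes.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

query_state = max(range(n_states), key=vote_entropy)
print("query the demonstrator at state", query_state)
```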
Performance monitoring and empathy during active and observational learning in patients with major depression.
Previous literature established a link between major depressive disorder (MDD) and altered reward processing as well as between empathy and (observational) reward learning. The aim of the present study was to assess the effects of MDD on the electrophysiological correlates - the feedback-related negativity (FRN) and the P300 - of active and observational reward processing and to relate them to ...