Active Reward Learning from Critiques
Abstract
Learning from demonstration algorithms, such as Inverse Reinforcement Learning, aim to provide a natural mechanism for programming robots, but can often require a prohibitive number of demonstrations to capture important subtleties of a task. Rather than requesting additional demonstrations blindly, active learning methods leverage uncertainty to query the user for action labels at states with high expected information gain. However, this approach can still require a large number of labels to adequately reduce uncertainty and may also be unintuitive, as users are not accustomed to determining optimal actions in a single out-of-context state. To address these shortcomings, we propose a novel trajectory-based active Bayesian inverse reinforcement learning algorithm that 1) queries the user for critiques of automatically generated trajectories, rather than asking for demonstrations or action labels, 2) utilizes trajectory segmentation to expedite the critique/labeling process, and 3) predicts the user’s critiques to generate the most highly informative trajectory queries. We evaluated our algorithm in simulated domains, finding it to compare favorably to prior work and a randomized baseline.
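To make the query loop concrete, below is a minimal Python sketch of the critique-based active Bayesian IRL cycle the abstract describes: sample reward hypotheses, score candidate trajectories by how uncertain their predicted segment critiques are, query the most informative one, and update the posterior from the returned labels. The linear feature model, fixed-length segmentation, sigmoid critique likelihood, entropy-based query score, and resampling update are all illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the critique-query loop (all modeling choices assumed).
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 4
N_SAMPLES = 50          # posterior samples over reward weights
SEGMENT_LEN = 5
BETA = 3.0              # assumed rationality constant in the critique model

# Posterior over linear reward weights, represented by samples (e.g., from MCMC).
weight_samples = rng.normal(size=(N_SAMPLES, N_FEATURES))
weight_samples /= np.linalg.norm(weight_samples, axis=1, keepdims=True)

def segment(traj_features):
    """Split a (T, N_FEATURES) trajectory into fixed-length segments."""
    return [traj_features[i:i + SEGMENT_LEN]
            for i in range(0, len(traj_features), SEGMENT_LEN)]

def predicted_critique_prob(seg, w):
    """P(user labels this segment 'good') under one reward hypothesis w.
    Assumed sigmoid model: higher mean reward -> more likely approved."""
    return 1.0 / (1.0 + np.exp(-BETA * seg.dot(w).mean()))

def query_score(traj_features):
    """Informativeness of querying this trajectory: entropy of the predicted
    segment labels under the posterior (a stand-in for information gain)."""
    score = 0.0
    for seg in segment(traj_features):
        p = np.mean([predicted_critique_prob(seg, w) for w in weight_samples])
        p = np.clip(p, 1e-6, 1 - 1e-6)
        score += -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return score

# Candidate trajectories (here: random feature sequences; in the paper they
# are generated automatically, e.g. from policies under sampled rewards).
candidates = [rng.normal(size=(20, N_FEATURES)) for _ in range(10)]
best = max(candidates, key=query_score)

# Simulated user critique: label each segment of the chosen trajectory.
true_w = rng.normal(size=N_FEATURES)
labels = [int(seg.dot(true_w).mean() > 0) for seg in segment(best)]

# Posterior update by likelihood weighting on the labels, then resampling.
log_lik = np.zeros(N_SAMPLES)
for seg, y in zip(segment(best), labels):
    for i, w in enumerate(weight_samples):
        p = predicted_critique_prob(seg, w)
        log_lik[i] += np.log(p if y else 1 - p)
probs = np.exp(log_lik - log_lik.max())
probs /= probs.sum()
weight_samples = weight_samples[rng.choice(N_SAMPLES, size=N_SAMPLES, p=probs)]
```

In the full algorithm, a Bayesian IRL sampler such as MCMC would typically maintain the reward posterior; the importance-resampling step above is only a compact stand-in for that update.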
Similar Papers
Active Learning from Critiques via Bayesian Inverse Reinforcement Learning
Learning from demonstration algorithms, such as Inverse Reinforcement Learning, aim to provide a natural mechanism for programming robots, but can often require a prohibitive number of demonstrations to capture important subtleties of a task. Rather than requesting additional demonstrations blindly, active learning methods leverage uncertainty to query the user for action labels at states with ...
Active Reward Learning
While reward functions are an essential component of many robot learning methods, defining such functions remains a hard problem in many practical applications. For tasks such as grasping, there are no reliable success measures available. Defining reward functions by hand requires extensive task knowledge and often leads to undesired emergent behavior. Instead, we propose to learn the reward fu...
Active Preference-Based Learning of Reward Functions
Our goal is to efficiently learn reward functions encoding a human’s preferences for how a dynamical system should act. There are two challenges with this. First, in many problems it is difficult for people to provide demonstrations of the desired system trajectory (like a high-DOF robot arm motion or an aggressive driving maneuver), or to even assign how much numerical reward an action or traj...
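The preference-query mechanism this entry describes can be illustrated with a small sketch: instead of demonstrating or numerically rating a trajectory, the user compares two candidates, and reward hypotheses are reweighted by a softmax (Bradley-Terry-style) choice likelihood. The feature vectors and the likelihood form below are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a single preference query and Bayesian update (assumed model).
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_features = 100, 3

# Hypotheses about the reward weights (posterior samples).
W = rng.normal(size=(n_samples, n_features))

# Cumulative feature counts of two candidate trajectories, phi(A) and phi(B).
phi_a = np.array([1.0, 0.2, -0.5])
phi_b = np.array([0.3, 0.9, 0.1])

# User answers "A preferred over B". Softmax likelihood per hypothesis:
# P(A > B | w) = exp(w.phi_A) / (exp(w.phi_A) + exp(w.phi_B)).
ra, rb = W @ phi_a, W @ phi_b
lik = np.exp(ra) / (np.exp(ra) + np.exp(rb))

# Importance-reweight and resample the hypotheses given the answer.
probs = lik / lik.sum()
W = W[rng.choice(n_samples, size=n_samples, p=probs)]
print("posterior mean weights:", W.mean(axis=0))
```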
Active Learning for Reward Estimation in Inverse Reinforcement Learning
Inverse reinforcement learning addresses the general problem of recovering a reward function from samples of a policy provided by an expert/demonstrator. In this paper, we introduce active learning for inverse reinforcement learning. We propose an algorithm that allows the agent to query the demonstrator for samples at specific states, instead of relying only on samples provided at “arbitrary” ...
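A toy version of the state-query idea this entry describes: ask the demonstrator for an action at the state where the reward hypotheses disagree most about what is optimal. The tabular setup and the vote-entropy criterion below are illustrative assumptions, not the paper's specific query rule.

```python
# Sketch of active state selection via vote entropy (assumed criterion).
import numpy as np

rng = np.random.default_rng(2)
n_states, n_actions, n_hyp = 8, 4, 30

# Q-values under each reward hypothesis (in practice: solve the MDP per sample).
Q = rng.normal(size=(n_hyp, n_states, n_actions))

def vote_entropy(state):
    """Entropy of greedy-action votes across hypotheses at one state."""
    votes = np.bincount(Q[:, state, :].argmax(axis=1), minlength=n_actions)
    p = votes / votes.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

query_state = max(range(n_states), key=vote_entropy)
print("query the demonstrator at state", query_state)
```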
Performance monitoring and empathy during active and observational learning in patients with major depression.
Previous literature established a link between major depressive disorder (MDD) and altered reward processing as well as between empathy and (observational) reward learning. The aim of the present study was to assess the effects of MDD on the electrophysiological correlates - the feedback-related negativity (FRN) and the P300 - of active and observational reward processing and to relate them to ...