APRIL: Active Preference-based Reinforcement Learning
Authors
Abstract
This work tackles in-situ robotics: the goal is to learn a policy while the robot operates in the real world, with neither ground truth nor rewards available. The proposed approach builds on preference-based policy learning: iteratively, the robot demonstrates a few policies, is informed of the expert’s preferences among the demonstrated policies, constructs a utility function compatible with all expert preferences, uses it in a self-training phase, and demonstrates a new policy in the next iteration. Whereas in previous work the new policy was one maximizing the current utility function, this paper uses active ranking to select the most informative policy (Viappiani and Boutilier 2010). The challenge is the following: the policy return estimate (the expert’s approximate preference function) learned on the policy parameter space, referred to as the direct representation, fails to give any useful information; indeed, arbitrarily small modifications of the direct policy representation can produce significantly different behaviors, and thus elicit different appreciations from the expert. A behavioral policy space, referred to as the indirect representation and built automatically from the sensorimotor data stream generated by the operating robot, is therefore devised and used to express the policy return estimate. Meanwhile, active ranking criteria are classically expressed w.r.t. the explicit domain representation, here the direct policy representation. A novelty of the paper is to show how active ranking can be achieved through black-box optimization on the indirect policy representation. Two experiments, in single- and two-robot settings, illustrate the approach.
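The iterative loop described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration, not the paper’s implementation: policies are plain parameter vectors, the behavioral (indirect) representation is a fixed nonlinear feature map, the expert is simulated by a hidden target policy, the utility model is a linear ranker updated perceptron-style, and "active ranking via black-box optimization" is stood in for by random search over candidates scored by the current utility. All names (`behavior`, `expert_prefers`, `april`, `TARGET`) are invented for this sketch.

```python
# Hypothetical sketch of a preference-based policy learning loop in the
# spirit of APRIL. Assumptions (not from the paper): linear utility over
# behavioral features, perceptron updates, random search as the black-box
# optimizer, and a simulated expert defined by a hidden target policy.
import random

random.seed(0)
DIM = 3
TARGET = [0.5, -0.2, 0.8]  # hidden policy the simulated expert prefers

def behavior(theta):
    # "Indirect" behavioral representation: a simple nonlinear feature map
    # standing in for features extracted from the sensorimotor data stream.
    return [t * t for t in theta] + list(theta)

def expert_prefers(a, b):
    # Simulated expert: prefers the policy closer to the hidden target.
    da = sum((x - t) ** 2 for x, t in zip(a, TARGET))
    db = sum((x - t) ** 2 for x, t in zip(b, TARGET))
    return da < db

def utility(w, theta):
    # Policy return estimate: linear in the behavioral features.
    return sum(wi * fi for wi, fi in zip(w, behavior(theta)))

def april(iterations=30):
    w = [0.0] * (2 * DIM)                      # utility model weights
    best = [random.uniform(-1, 1) for _ in range(DIM)]
    for _ in range(iterations):
        # Black-box search in parameter space for a candidate the current
        # utility ranks highly (stand-in for the active ranking criterion).
        candidate = max(
            ([random.uniform(-1, 1) for _ in range(DIM)] for _ in range(50)),
            key=lambda th: utility(w, th),
        )
        # The expert states a preference over the demonstrated pair.
        if expert_prefers(candidate, best):
            winner, loser = candidate, best
            best = candidate
        else:
            winner, loser = best, candidate
        # Perceptron-style update so the utility agrees with the preference.
        if utility(w, winner) <= utility(w, loser):
            fw, fl = behavior(winner), behavior(loser)
            w = [wi + 0.1 * (a - b) for wi, a, b in zip(w, fw, fl)]
    return best
```

Under these toy assumptions, the returned policy tends to drift toward the expert’s hidden target as preferences accumulate, which is the qualitative behavior the abstract describes.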
Similar references
APRIL: Active Preference Learning-Based Reinforcement Learning
This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics for instance, the expert can hardly design a reward function or demonstrate the target behavior, forbidding the use of both standard RL and inverse reinforcement learning. Although with a limited expertise, the human expert is still often able to emit preferences and rank the agent de...
Preference-Based Reinforcement Learning: A preliminary survey
Preference-based reinforcement learning has gained significant popularity over the years, but it is still unclear what exactly preference learning is and how it relates to other reinforcement learning tasks. In this paper, we present a general definition of preferences as well as some insight how these approaches compare to reinforcement learning, inverse reinforcement learning and other relate...
Preference-based Reinforcement Learning
This paper investigates the problem of policy search based only on the expert’s preferences. Whereas reinforcement learning classically relies on a reward function, or exploits the expert’s demonstrations, preference-based policy learning (PPL) iteratively builds and optimizes a policy return estimate as follows: The learning agent demonstrates a few policies, is informed of the expert’s prefer...
Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a “preference-based” approach to reinforcement learning is a possible extension of the type of feedback an agent may learn from. In particular, while conventional RL methods are essentially confined to deal with numeri...
Towards Preference-Based Reinforcement Learning
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs o...