Imitation Learning as f-Divergence Minimization
Abstract
We address the problem of imitation learning with multi-modal demonstrations. Instead of attempting to learn all of the modes, we argue that in many tasks it is sufficient to imitate any one of them. We show that state-of-the-art methods such as GAIL and behavior cloning, due to their choice of loss function, often incorrectly interpolate between such modes. Our key insight is to minimize the right divergence between the learner and expert state-action distributions, namely the reverse KL divergence or I-projection. We propose a general imitation learning framework for estimating and minimizing any f-Divergence. By plugging in different divergences, we are able to recover existing algorithms such as Behavior Cloning (Kullback-Leibler), GAIL (Jensen Shannon) and DAgger (Total Variation). Empirical results show that our approximate I-projection technique is able to imitate multi-modal behaviors more reliably than GAIL and behavior cloning.
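The contrast between interpolating between modes and committing to one of them can be illustrated with a small toy computation. The sketch below is not the paper's algorithm and does not reproduce its estimation procedure. It assumes the convention D_f(p || q) = sum_x q(x) f(p(x) / q(x)) with p the expert and q the learner, uses a discretized one-dimensional "action" grid, and fits the mean of a single fixed-width Gaussian learner by grid search. All names (`GENERATORS`, `f_divergence`, `gaussian_on_grid`, `best_mean`) are hypothetical.

```python
import numpy as np

# Generator functions f for D_f(p || q) = sum_x q(x) * f(p(x) / q(x)).
# Assumed convention: p is the expert distribution, q is the learner.
GENERATORS = {
    "kl": lambda u: u * np.log(u),        # KL(p || q), the M-projection
    "reverse_kl": lambda u: -np.log(u),   # KL(q || p), the I-projection
    "jensen_shannon": lambda u: 0.5 * (u * np.log(u) - (u + 1) * np.log((u + 1) / 2)),
    "total_variation": lambda u: 0.5 * np.abs(u - 1),
}


def f_divergence(p, q, name):
    """f-divergence between strictly positive discrete distributions p and q."""
    return float(np.sum(q * GENERATORS[name](p / q)))


def gaussian_on_grid(x, mean, std):
    """Gaussian evaluated on the grid x and normalized to sum to one."""
    w = np.exp(-0.5 * ((x - mean) / std) ** 2)
    return w / w.sum()


# Toy bimodal "expert": two equally likely modes at -2 and +2.
x = np.linspace(-6.0, 6.0, 241)
expert = 0.5 * gaussian_on_grid(x, -2.0, 0.5) + 0.5 * gaussian_on_grid(x, 2.0, 0.5)

# Unimodal "learner": a Gaussian with fixed width; only its mean is fitted.
candidate_means = np.linspace(-3.0, 3.0, 61)


def best_mean(name):
    """Grid-search the learner mean that minimizes the chosen divergence."""
    scores = [
        f_divergence(expert, gaussian_on_grid(x, m, 0.5), name)
        for m in candidate_means
    ]
    return float(candidate_means[int(np.argmin(scores))])


mean_kl = best_mean("kl")
mean_reverse_kl = best_mean("reverse_kl")
print("KL (M-projection) mean:", mean_kl)
print("reverse KL (I-projection) mean:", mean_reverse_kl)
```

In this toy setting the KL fit places the learner between the two modes, where the expert has almost no mass, while the reverse KL fit settles on one of the modes. The example only contrasts these two divergences; how other divergences behave depends on the setting and is not claimed here.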
Similar resources
Racing F-ZERO with Imitation Learning
With this project we use the Retro Learning Environment for the SNES game F-Zero in order to reproduce work in the Imitation Learning space. We were inspired by some of the recent successes in Reinforcement Learning and Imitation Learning, and hoped to achieve a deeper understanding of what kind of impact both model architecture, human, and hand-refined policies could have on the eventual learn...
f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization
Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generati...
Imitation Learning as Cause-Effect Reasoning
We propose a framework for general-purpose imitation learning centered on cause-effect reasoning. Our approach infers a hierarchical representation of a demonstrator’s intentions, which can explain why they acted as they did. This enables rapid generalization of the observed actions to new situations. We employ a novel causal inference algorithm with formal guarantees and connections to automat...
Neuroprosthetic Decoder Training as Imitation Learning
Neuroprosthetic brain-computer interfaces function via an algorithm which decodes neural activity of the user into movements of an end effector, such as a cursor or robotic arm. In practice, the decoder is often learned by updating its parameters while the user performs a task. When the user's intention is not directly observable, recent methods have demonstrated value in training the decoder a...
Distributionally Robust Games, Part I: f-Divergence and Learning
In this paper we introduce the novel framework of distributionally robust games. These are multi-player games where each player models the state of nature using a worst-case distribution, also called adversarial distribution. Thus each player’s payoff depends on the other players’ decisions and on the decision of a virtual player (nature) who selects an adversarial distribution of scenarios. Th...
Journal
Journal title: Springer Proceedings in Advanced Robotics
Year: 2021
ISSN: 2511-1256, 2511-1264
DOI: https://doi.org/10.1007/978-3-030-66723-8_19