Imitation Learning as f-Divergence Minimization

Abstract

We address the problem of imitation learning with multi-modal demonstrations. Instead of attempting to learn all of the modes, we argue that in many tasks it is sufficient to imitate any one of them. We show that state-of-the-art methods such as GAIL and behavior cloning, due to their choice of loss function, often incorrectly interpolate between such modes. Our key insight is to minimize the right divergence between the learner and expert state-action distributions, namely the reverse KL divergence or I-projection. We propose a general imitation learning framework for estimating and minimizing any f-Divergence. By plugging in different divergences, we are able to recover existing algorithms such as Behavior Cloning (Kullback-Leibler), GAIL (Jensen Shannon) and DAgger (Total Variation). Empirical results show that our approximate I-projection technique is able to imitate multi-modal behaviors more reliably than GAIL and behavior cloning.
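The contrast between the forward KL divergence and the reverse KL divergence (I-projection) can be illustrated with a small numerical sketch. It is not code from the paper. It assumes a toy one-dimensional action space discretized on a grid, a hypothetical two-mode expert, and a learner restricted to a single Gaussian of fixed width; the names `f_divergence`, `expert`, and `candidate_means` are illustrative only. The divergence is written as D_f(p || q) = sum_x q(x) f(p(x)/q(x)), one common convention, so that f(u) = u log u gives the forward KL and f(u) = -log u gives the reverse KL.

```python
import numpy as np

def normalize(weights):
    """Turn non-negative weights into a discrete probability distribution."""
    return weights / weights.sum()

def gaussian(x, mean, std):
    """Unnormalized Gaussian bump evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2)

def f_divergence(p, q, f):
    """D_f(p || q) = sum_x q(x) * f(p(x) / q(x)); assumes p, q > 0 everywhere."""
    return float(np.sum(q * f(p / q)))

# Generator functions f for two members of the f-divergence family.
f_forward_kl = lambda u: u * np.log(u)   # yields KL(p || q)
f_reverse_kl = lambda u: -np.log(u)      # yields KL(q || p), the I-projection

# Toy setup (an assumption for illustration): the expert uses two modes,
# at -2 and +2; the learner is a single Gaussian whose mean we search over.
x = np.linspace(-6.0, 6.0, 241)
expert = normalize(0.5 * gaussian(x, -2.0, 0.5) + 0.5 * gaussian(x, 2.0, 0.5))
candidate_means = np.linspace(-4.0, 4.0, 81)

def best_mean(f):
    """Learner mean that minimizes D_f(expert || learner) over the candidates."""
    scores = [f_divergence(expert, normalize(gaussian(x, m, 0.5)), f)
              for m in candidate_means]
    return float(candidate_means[int(np.argmin(scores))])

mean_forward_kl = best_mean(f_forward_kl)
mean_reverse_kl = best_mean(f_reverse_kl)
print("forward KL picks mean:", mean_forward_kl)
print("reverse KL picks mean:", mean_reverse_kl)
```

In this toy setting the forward KL places the learner between the two modes, where the expert has almost no mass, while the reverse KL settles on one of the modes. Which of the two symmetric modes the reverse KL selects here depends only on floating-point tie-breaking.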

Related resources

Racing F-ZERO with Imitation Learning

With this project we use the Retro Learning Environment for the SNES game F-Zero in order to reproduce work in the Imitation Learning space. We were inspired by some of the recent successes in Reinforcement Learning and Imitation Learning, and hoped to achieve a deeper understanding of what kind of impact both model architecture, human, and hand-refined policies could have on the eventual learn...

f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generati...

Imitation Learning as Cause-Effect Reasoning

We propose a framework for general-purpose imitation learning centered on cause-effect reasoning. Our approach infers a hierarchical representation of a demonstrator’s intentions, which can explain why they acted as they did. This enables rapid generalization of the observed actions to new situations. We employ a novel causal inference algorithm with formal guarantees and connections to automat...

Neuroprosthetic Decoder Training as Imitation Learning

Neuroprosthetic brain-computer interfaces function via an algorithm which decodes neural activity of the user into movements of an end effector, such as a cursor or robotic arm. In practice, the decoder is often learned by updating its parameters while the user performs a task. When the user's intention is not directly observable, recent methods have demonstrated value in training the decoder a...

Distributionally Robust Games, Part I: f-Divergence and Learning

In this paper we introduce the novel framework of distributionally robust games. These are multi-player games where each player models the state of nature using a worst-case distribution, also called adversarial distribution. Thus each player’s payoff depends on the other players’ decisions and on the decision of a virtual player (nature) who selects an adversarial distribution of scenarios. Th...

Journal

Journal title: Springer Proceedings in Advanced Robotics

Year: 2021

ISSN: 2511-1256, 2511-1264

DOI: https://doi.org/10.1007/978-3-030-66723-8_19