RAIL: Risk-Averse Imitation Learning

Authors

  • Anirban Santara
  • Abhishek Naik
  • Balaraman Ravindran
  • Dipankar Das
  • Dheevatsa Mudigere
  • Sasikanth Avancha
  • Bharat Kaul

Abstract

Imitation learning algorithms learn viable policies by imitating an expert’s behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert’s behavior is available as a fixed set of trajectories. Evaluating in terms of the expert’s cost function, we observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than for the expert on a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by GAIL-agents than by the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in safety-critical applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail-risk within the GAIL framework. We quantify tail-risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in safety-critical applications.
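
To make the tail-risk measure concrete, the following is a minimal sketch of an empirical CVaR estimate over a sample of trajectory costs. It is an illustration only, not the RAIL algorithm itself: the function name `var_cvar`, the confidence level `alpha=0.9`, and the sample-quantile estimator are assumptions that do not come from the abstract.

```python
import numpy as np


def var_cvar(costs, alpha=0.9):
    """Empirical Value-at-Risk and Conditional-Value-at-Risk of trajectory costs.

    `alpha` is an assumed confidence level; the abstract does not state one.
    """
    costs = np.asarray(costs, dtype=float)
    # VaR: the alpha-quantile of the cost sample.
    var = np.quantile(costs, alpha)
    # CVaR: the mean cost of the trajectories at or beyond the VaR threshold.
    tail = costs[costs >= var]
    cvar = tail.mean()
    return var, cvar


if __name__ == "__main__":
    # Hypothetical costs: two agents with the same typical cost,
    # one of which occasionally fails catastrophically.
    steady = [1.0] * 100
    heavy_tailed = [1.0] * 95 + [50.0] * 5
    print(var_cvar(steady))
    print(var_cvar(heavy_tailed))
```

Two agents with similar typical cost can differ sharply in CVaR, which is the kind of gap in the tail of the trajectory-cost distribution that the abstract describes between GAIL-agents and the expert.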

Similar resources

Comparison of p300 in risk-seeker and risk-averse people during simple gambling task

Risk preference, the degree of tendency to take risk, has a fundamental role in individual and social health and is divided into risk-seeking and risk-averse. Therefore, the study of the neural correlates of risk preferences is essential in the fields of psychology and psychiatry. The current study aimed to examine and compare an ERP component named P300 between subjects with different risk preferences....

Risk premiums and certainty equivalents of loss-averse newsvendors of bounded utility

Loss-averse behavior makes the newsvendors avoid the losses more than seeking the probable gains as the losses have more psychological impact on the newsvendor than the gains. In economics and decision theory, the classical newsvendor models treat losses and gains equally likely, by disregarding the expected utility when the newsvendor is loss-averse. Moreover, the use of unbounded utility to m...

Understanding safety and production risks in rail engineering planning and protection.

Much of the published human factors work on risk is to do with safety and within this is concerned with prediction and analysis of human error and with human reliability assessment. Less has been published on human factors contributions to understanding and managing project, business, engineering and other forms of risk and still less jointly assessing risk to do with broad issues of 'safety' a...

Optimal Patent Length and Patent Width for an Economy with Creative Destruction and Non-Diversifiable Risk

This study examines optimal public policy in a product cycle model where R&D firms innovate and imitate and households face nondiversifiable risk. The government controls product cycles by two policy instruments: patent length, i.e. the expected time an innovation is imitated, and patent width, i.e. the innovator’s profit after a successful imitation relative to that before. The main results ar...

Learning to Search via Self-Imitation

We study the problem of learning a good search policy. To do so, we propose the self-imitation learning setting, which builds upon imitation learning in two ways. First, self-imitation uses feedback provided by retrospective analysis of demonstrated search traces. Second, the policy can learn from its own decisions and mistakes without requiring repeated feedback from an external expert. Combin...

Journal title:
  • CoRR

Volume abs/1707.06658  Issue 

Pages  -

Publication date 2017