Meta Inverse Reinforcement Learning via Maximum Reward Sharing for Human Motion Analysis

Authors

  • Kun Li
  • Joel W. Burdick
Abstract

This work addresses the inverse reinforcement learning (IRL) problem in which only a small number of demonstrations per high-dimensional task are available from a demonstrator, too few to estimate an accurate reward function. Observing that each demonstrator has an inherent reward for each state and that task-specific behaviors depend mainly on a small number of key states, we propose a meta IRL algorithm that first models the reward function of each task as a distribution conditioned on a baseline reward function shared by all tasks and dependent only on the demonstrator, and then finds the most likely reward function in that distribution to explain the task-specific behaviors. We test the method on simulated path-planning tasks with limited demonstrations and show that the accuracy of the learned reward function improves significantly. We also apply the method to analyze the motion of a patient undergoing rehabilitation.
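The core idea of sharing a baseline reward across tasks can be illustrated with a minimal Gaussian MAP sketch: a task-specific reward estimate shrinks toward the shared baseline when demonstrations are scarce and follows the evidence when they are plentiful. The closed-form Gaussian model and all names below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def map_task_reward(theta_base, demo_estimates, sigma_prior=1.0, sigma_obs=1.0):
    """MAP estimate of a per-task state-reward vector.

    Prior:      theta_task ~ N(theta_base, sigma_prior^2 I)   (shared baseline)
    Likelihood: each demo gives a noisy estimate ~ N(theta_task, sigma_obs^2 I)

    With few demonstrations the posterior mean stays close to the shared
    baseline; with many, it moves toward the task-specific evidence.
    """
    demo_estimates = np.asarray(demo_estimates, dtype=float)
    n = demo_estimates.shape[0]
    prec_prior = 1.0 / sigma_prior**2          # prior precision
    prec_obs = n / sigma_obs**2                # total observation precision
    mean_obs = demo_estimates.mean(axis=0)
    return (prec_prior * theta_base + prec_obs * mean_obs) / (prec_prior + prec_obs)

# Shared baseline says all states are neutral; demonstrations suggest reward 1.
theta_base = np.zeros(3)
few = map_task_reward(theta_base, np.ones((2, 3)))    # shrunk toward baseline
many = map_task_reward(theta_base, np.ones((50, 3)))  # dominated by evidence
```

With two demonstrations the estimate moves only 2/3 of the way toward the evidence; with fifty, 50/51 of the way. The paper's actual reward distribution and inference are more involved; this only conveys the sharing mechanism.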


Similar Articles

Inverse Optimal Control

In Reinforcement Learning, an agent learns a policy that maximizes a given reward function. However, providing a reward function for a given learning task is often non-trivial. Inverse Reinforcement Learning, sometimes also called Inverse Optimal Control, addresses this problem by learning the reward function from expert demonstrations. The aim of this paper is to give a brief introduc...

Full text

Reinforcement Learning from Demonstration and Human Reward

In this paper, we propose a model-based method, IRL-TAMER, for combining learning from demonstration via inverse reinforcement learning (IRL) with learning from human reward via the TAMER framework. We tested our method in the Grid World domain and compared it with the TAMER framework using different discount factors on human reward. Our results suggest that with one demonstration, although an agen...

Full text

The Use of Apprenticeship Learning Via Inverse Reinforcement Learning for Generating Melodies

The research presented in this paper uses apprenticeship learning via inverse reinforcement learning to ascertain a reward function in a musical context. The learning agent then used this reward function to generate new melodies using reinforcement learning. Reinforcement learning is a type of unsupervised machine learning where rewards are used to guide an agent’s learning. These rewards are u...

Full text

Large-Scale Inverse Reinforcement Learning via Function Approximation for Clinical Motion Analysis

This paper introduces a new method for inverse reinforcement learning in large-scale and high-dimensional state spaces. To avoid solving the computationally expensive reinforcement learning problems in reward learning, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function to maximize the likelihood of the observed mo...

Full text

Inverse Reinforcement Learning in Large State Spaces via Function Approximation

This paper introduces a new method for inverse reinforcement learning in large-scale and high-dimensional state spaces. To avoid solving the computationally expensive reinforcement learning problems in reward learning, we propose a function approximation method to ensure that the Bellman Optimality Equation always holds, and then estimate a function to maximize the likelihood of the observed mo...

Full text
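One way to make the Bellman Optimality Equation hold by construction, as the two function-approximation abstracts above suggest, is to parameterize the Q-function directly and define the reward implicitly from it, so no inner reinforcement learning problem needs to be solved during reward learning. The sketch below assumes a deterministic finite MDP; the function and variable names are hypothetical.

```python
import numpy as np

def implied_reward(Q, trans, gamma):
    """Reward implied by a Q-table: r(s,a) = Q(s,a) - gamma * max_a' Q(s',a').

    trans[s, a] is the (deterministic) successor state. Because r is defined
    from Q this way, Q satisfies the Bellman Optimality Equation for r exactly.
    """
    n_states, n_actions = Q.shape
    r = np.empty_like(Q)
    for s in range(n_states):
        for a in range(n_actions):
            r[s, a] = Q[s, a] - gamma * Q[trans[s, a]].max()
    return r

# Tiny 2-state, 2-action deterministic MDP.
gamma = 0.9
Q = np.array([[1.0, 0.5],
              [0.2, 2.0]])
trans = np.array([[0, 1],
                  [1, 0]])  # trans[s, a] = next state
r = implied_reward(Q, trans, gamma)

# Bellman optimality holds by construction: Q(s,a) = r(s,a) + gamma * max_a' Q(s',a')
bellman_rhs = np.array([[r[s, a] + gamma * Q[trans[s, a]].max()
                         for a in range(2)] for s in range(2)])
```

The likelihood of the observed motion would then be maximized over the Q parameterization (e.g., through a softmax policy over Q-values); that estimation step is omitted here.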



Journal:
  • CoRR

Volume: abs/1710.03592  Issue: -

Pages: -

Publication date: 2017