Off-Environment RL with Rare Events

Authors

  • Kamil Ciosek
  • Shimon Whiteson
Abstract

Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables – state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.
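
To illustrate why sampling environment variables from a proposal distribution can help with rare events, here is a minimal Python sketch of importance sampling on a toy problem. It is not the OFFER algorithm from the paper: the probabilities `P_RARE` and `Q_RARE`, the toy `episode_return` function, and the fixed (non-learned) proposal are all assumptions made for illustration.

```python
import random

# Assumed toy numbers, not taken from the paper.
P_RARE = 0.01  # probability of the rare event under the true environment
Q_RARE = 0.5   # probability of the rare event under the proposal distribution


def episode_return(rare: bool) -> float:
    """Toy return: a large penalty when the rare event occurs."""
    return -100.0 if rare else 1.0


def true_expected_return() -> float:
    """Exact expected return under the true environment distribution."""
    return P_RARE * episode_return(True) + (1 - P_RARE) * episode_return(False)


def importance_sampled_estimate(n: int, rng: random.Random) -> float:
    """Sample the environment variable from the proposal and reweight.

    Each sample is multiplied by p(w) / q(w), so the estimate remains
    unbiased for the expected return under the true distribution p.
    """
    total = 0.0
    for _ in range(n):
        rare = rng.random() < Q_RARE
        if rare:
            weight = P_RARE / Q_RARE
        else:
            weight = (1 - P_RARE) / (1 - Q_RARE)
        total += weight * episode_return(rare)
    return total / n


def per_sample_variance(q_rare: float) -> float:
    """Analytic variance of one weighted sample when the rare event is
    drawn with probability q_rare (q_rare == P_RARE is plain sampling)."""
    mean = true_expected_return()
    second_moment = (
        q_rare * (P_RARE / q_rare * episode_return(True)) ** 2
        + (1 - q_rare) * ((1 - P_RARE) / (1 - q_rare) * episode_return(False)) ** 2
    )
    return second_moment - mean ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    print("true expected return:", true_expected_return())
    print("importance-sampled estimate:", importance_sampled_estimate(100_000, rng))
    print("variance, plain sampling:", per_sample_variance(P_RARE))
    print("variance, proposal sampling:", per_sample_variance(Q_RARE))
```

In this toy setting the proposal is fixed by hand. The abstract describes optimising the proposal distribution together with the policy, which this sketch does not attempt.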

Similar resources

Automated Curriculum Learning by Rewarding Temporally Rare Events

Reward shaping allows reinforcement learning (RL) agents to accelerate learning by receiving additional reward signals. However, these signals can be difficult to design manually, especially for complex RL tasks. We propose a simple and general approach that determines the reward of pre-defined events by their rarity alone. Here events become less rewarding as they are experienced more often, w...

Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making problem, e.g., a reinforcement learning (RL) or optimal control problem, and has served as the foundation for developing RL methods. Motivated by integral PI (IPI) schemes in optimal control and RL methods in continuous time and space (CTS), this paper proposes on-policy IPI to solve the gene...

OFFER: Off-Environment Reinforcement Learning

Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables – state features that are randomly determined by the environment in a physical setting but controlla...

Reinforcement Learning of Intelligent Characters in Fighting Action Games

In this paper, we investigate reinforcement learning (RL) of intelligent characters, based on neural network technology, for fighting action games. RL can be either on-policy or off-policy. We apply both schemes to tabula rasa learning and adaptation. The experimental results show that (1) in tabula rasa learning, off-policy RL outperforms on-policy RL, but (2) in adaptation, on-policy...

Development of a mechanical earth model in an Iranian off-shore gas field

Wellbore instability is a quite common event during drilling, and causes many problems such as stuck pipe and lost circulation. It is primarily due to the inadequate understanding of the rock properties, pore pressure, and earth stress environment prior to well construction. This study aims to use the existing relevant logs, drilling, and other data from offset wells to construct a precise mech...

Publication date: 2016