Off-Environment RL with Rare Events
Abstract
Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation traditionally does not exploit the opportunity to improve learning by adjusting certain environment variables: state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off-environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.
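The abstract does not spell out how OFFER corrects for sampling environment variables from a proposal distribution, or which objective the proposal is optimised for. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm: a hypothetical one-variable toy problem (`P_GUST`, `R_SAFE`, `R_RISKY` and every function name are invented here), importance weights p(z)/q(z) that keep a REINFORCE gradient estimate unbiased when z is drawn from the proposal q, and a proposal moved toward q*(z) ∝ p(z)·sqrt(E[h²|z]), the standard variance-minimising choice for an importance-weighted estimator. That proposal rule is a swapped-in adaptive importance-sampling technique; the paper's own update may differ.

```python
import math
import random

# Hypothetical toy problem (illustrative, not from the paper): one binary
# environment variable z, where z = 1 is a significant rare event (a gust).
P_GUST = 0.01                   # p(z = 1) in the physical world
R_SAFE = 1.0                    # return of the cautious action for any z
R_RISKY = {0: 2.0, 1: -200.0}   # return of the risky action given z


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def env_prob(z):
    """Probability of z under the physical distribution p."""
    return P_GUST if z == 1 else 1.0 - P_GUST


def exact_gradient(theta):
    """True dJ/dtheta for pi(risky) = sigmoid(theta), by enumerating z."""
    s = sigmoid(theta)
    risky = sum(env_prob(z) * R_RISKY[z] for z in (0, 1))
    return s * (1.0 - s) * (risky - R_SAFE)


def sample_batch(theta, psi, n, rng):
    """Draw z from the proposal q(z = 1) = psi and act with the policy.

    Returns (z, h, w) tuples where h = R * d/dtheta log pi(a) is the
    REINFORCE term and w = p(z) / q(z) is the importance weight that
    corrects for sampling z off-environment.
    """
    s = sigmoid(theta)
    batch = []
    for _ in range(n):
        z = 1 if rng.random() < psi else 0
        q = psi if z == 1 else 1.0 - psi
        risky = rng.random() < s
        ret = R_RISKY[z] if risky else R_SAFE
        score = (1.0 - s) if risky else -s      # d/dtheta log pi(a)
        batch.append((z, ret * score, env_prob(z) / q))
    return batch


def policy_gradient(batch):
    """Importance-weighted REINFORCE estimate of dJ/dtheta."""
    return sum(w * h for _, h, w in batch) / len(batch)


def proposal_update(batch, psi, lo=0.05, hi=0.95):
    """Move psi toward q*(z) proportional to p(z) * sqrt(E[h^2 | z]).

    q* minimises the second moment of w * h (assumed rule, see lead-in);
    clipping to [lo, hi] keeps the importance weights bounded.
    """
    sq = {0: [], 1: []}
    for z, h, _ in batch:
        sq[z].append(h * h)
    if not sq[0] or not sq[1]:
        return psi                 # no data for one value of z: keep psi
    m = {z: sum(v) / len(v) for z, v in sq.items()}
    a = env_prob(1) * math.sqrt(m[1])
    b = env_prob(0) * math.sqrt(m[0])
    if a + b == 0.0:
        return psi
    target = min(hi, max(lo, a / (a + b)))
    return 0.5 * psi + 0.5 * target   # smoothed step toward the target


def train(steps=300, n=200, lr=0.5, psi=0.05, seed=0):
    """Jointly update the policy parameter theta and the proposal psi."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        batch = sample_batch(theta, psi, n, rng)
        theta += lr * policy_gradient(batch)   # gradient ascent on J
        psi = proposal_update(batch, psi)
    return theta, psi


if __name__ == "__main__":
    theta, psi = train()
    print(f"pi(risky) = {sigmoid(theta):.3f}, proposal p(gust) = {psi:.3f}")
```

In this toy setting the risky action is worse on average (expected return 0.99·2 + 0.01·(−200) = −0.02 versus 1.0 for the safe action), but a batch drawn from p(gust) = 0.01 usually contains few or no gusts, so the on-environment gradient estimate has high variance. Sampling gusts far more often and reweighting them keeps the estimate unbiased with much lower variance. The clipping bounds and the 0.5 smoothing factor are arbitrary choices made for this sketch.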
Similar resources
Automated Curriculum Learning by Rewarding Temporally Rare Events
Reward shaping allows reinforcement learning (RL) agents to accelerate learning by receiving additional reward signals. However, these signals can be difficult to design manually, especially for complex RL tasks. We propose a simple and general approach that determines the reward of pre-defined events by their rarity alone. Here events become less rewarding as they are experienced more often, w...
Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space
Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving optimal decision-making problems, e.g., reinforcement learning (RL) or optimal control problems, and has served as the foundation for developing RL methods. Motivated by integral PI (IPI) schemes in optimal control and RL methods in continuous time and space (CTS), this paper proposes on-policy IPI to solve the gene...
OFFER: Off-Environment Reinforcement Learning
Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables – state features that are randomly determined by the environment in a physical setting but controlla...
Reinforcement Learning of Intelligent Characters in Fighting Action Games
In this paper, we investigate reinforcement learning (RL) of intelligent characters, based on neural network technology, for fighting action games. RL can be either on-policy or off-policy. We apply both schemes to tabula rasa learning and adaptation. The experimental results show that (1) in tabula rasa learning, off-policy RL outperforms on-policy RL, but (2) in adaptation, on-policy...
Development of a mechanical earth model in an Iranian off-shore gas field
Wellbore instability is a quite common event during drilling, and causes many problems such as stuck pipe and lost circulation. It is primarily due to the inadequate understanding of the rock properties, pore pressure, and earth stress environment prior to well construction. This study aims to use the existing relevant logs, drilling, and other data from offset wells to construct a precise mech...
Publication date: 2016