Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax
Authors
Abstract
This paper proposes “Value-Difference Based Exploration combined with Softmax action selection” (VDBE-Softmax) as an adaptive exploration/exploitation policy for temporal-difference learning. The advantage of the proposed approach is that exploration actions are selected only in situations in which the agent's knowledge about the environment is uncertain, as indicated by fluctuating value estimates during learning. The method is evaluated in experiments with deterministic rewards and in experiments with a mixture of deterministic and stochastic rewards. The results show that a VDBE-Softmax policy can outperform ε-greedy, Softmax, and VDBE policies in combination with on- and off-policy learning algorithms such as Q-learning and Sarsa. Furthermore, it is also shown that VDBE-Softmax is more reliable in the case of value-function oscillations.
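To make the selection rule concrete, it can be pictured roughly as in the following minimal Python sketch, assuming a tabular agent that maintains a per-state exploration rate epsilon_s via VDBE; the temperature tau, the default values, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def vdbe_softmax_action(q_values, epsilon_s, tau=1.0, rng=None):
    """Action selection in the spirit of VDBE-Softmax (sketch).

    With probability epsilon_s (a state-dependent exploration rate,
    as maintained by VDBE) an action is drawn from a Softmax/Boltzmann
    distribution over the action values; otherwise the greedy action
    is taken.
    """
    rng = rng or np.random.default_rng()
    q_values = np.asarray(q_values, dtype=float)
    if rng.random() < epsilon_s:
        # Softmax exploration; subtract the maximum for numerical stability
        prefs = (q_values - q_values.max()) / tau
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return int(rng.choice(len(q_values), p=probs))
    # Greedy exploitation
    return int(np.argmax(q_values))
```

Because epsilon_s shrinks in states whose value estimates have stopped fluctuating, exploration concentrates on the parts of the state space the agent is still uncertain about, which is the behaviour the abstract describes.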
Similar Resources
Adaptive ε-greedy Exploration in Reinforcement Learning Based on Value Differences
This paper presents “Value-Difference Based Exploration” (VDBE), a method for balancing the exploration/exploitation dilemma inherent to reinforcement learning. The proposed method adapts the exploration parameter of ε-greedy in dependence on the temporal-difference error observed from value-function backups, which is considered a measure of the agent's uncertainty about the environment. VDB...
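The ε-adaptation this abstract refers to can be sketched as follows, again as a non-authoritative illustration: after each value-function backup the per-state ε is moved towards a Boltzmann-style function of the absolute TD error. The constants sigma and delta used here are placeholder defaults of my own, not values from the paper.

```python
import math

def vdbe_epsilon_update(epsilon_s, td_error, alpha, sigma=1.0, delta=0.1):
    """Sketch of a value-difference based update of epsilon(s).

    epsilon_s: current exploration rate of the visited state
    td_error : temporal-difference error of the last value backup
    alpha    : learning rate of the value-function update
    sigma    : sensitivity constant (smaller -> stronger reaction)
    delta    : step size with which epsilon(s) tracks the new estimate
    """
    x = math.exp(-abs(alpha * td_error) / sigma)
    f = (1.0 - x) / (1.0 + x)  # near 1 for large TD errors, near 0 for small ones
    return delta * f + (1.0 - delta) * epsilon_s
```

Large, fluctuating TD errors thus push ε(s) up and keep the state exploratory, while converged values let ε(s) decay towards exploitation.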
Task Allocation through Vacancy Chains: Action Selection in Multi-Robot Learning
We present an adaptive multi-robot task allocation algorithm based on vacancy chains, a resource distribution process common in animal and human societies. The algorithm uses individual reinforcement learning of task utilities and relies on the specializing abilities of the members of the group to promote dedicated optimal allocation patterns. We demonstrate through experiments in simulation, t...
Dynamic Locomotion Skills for Obstacle Sequences Using Reinforcement Learning
Most locomotion control strategies are developed for flat terrain. We explore the use of reinforcement learning to develop motor skills for the highly dynamic traversal of terrains having sequences of gaps, walls, and steps. Results are demonstrated using simulations of a 21-link planar dog and a 7-link planar biped. Our approach is characterized by: non-parametric representation of the value f...
Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
Multi-armed bandit tasks have been extensively used to model the problem of balancing exploitation and exploration. A most challenging variant of the MABP is the non-stationary bandit problem where the agent is faced with the increased complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite horizon bandit problem with a finite number ...
Task Allocation through Vacancy Chains: Effects of Action Selection in Multi-Robot Learning
We present an adaptive multi-robot task allocation algorithm based on vacancy chains, a resource distribution process common in animal and human societies. The algorithm uses individual reinforcement learning of task utilities and relies on the specializing abilities of the members of the group to promote dedicated optimal allocation patterns. We demonstrate through experiments in simulation, t...