Modelling the dynamics of multiagent Q-learning with ε-greedy exploration

Authors

  • Eduardo Rodrigues Gomes
  • Ryszard Kowalczyk
Abstract

We present a framework to model the dynamics of multiagent Q-learning with ε-greedy exploration. The applicability of the framework is tested through experiments in typical games selected from the literature.
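
For concreteness, the setting being modelled is two independent Q-learners repeatedly playing a matrix game while each selects actions ε-greedily. The following is a minimal sketch of that setup, not the authors' framework itself; the Prisoner's Dilemma payoffs, learning rate, and other constants are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative payoffs for a 2-action repeated game (Prisoner's Dilemma);
    # R1[i, j] is the row player's reward, R2 the column player's.
    R1 = np.array([[3.0, 0.0],
                   [5.0, 1.0]])
    R2 = R1.T

    alpha, gamma, epsilon = 0.1, 0.9, 0.1
    Q = [np.zeros(2), np.zeros(2)]        # one Q-vector per agent (single state)

    def eps_greedy(q):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            return int(rng.integers(2))
        return int(np.argmax(q))

    for t in range(50_000):
        a1, a2 = eps_greedy(Q[0]), eps_greedy(Q[1])
        r1, r2 = R1[a1, a2], R2[a1, a2]
        # Standard Q-learning update; the repeated game has a single state.
        Q[0][a1] += alpha * (r1 + gamma * Q[0].max() - Q[0][a1])
        Q[1][a2] += alpha * (r2 + gamma * Q[1].max() - Q[1][a2])

    print("Agent 1 Q-values:", Q[0], "Agent 2 Q-values:", Q[1])

The coupling between the two updates (each agent's reward depends on the other's current greedy action) is exactly what makes the dynamics nontrivial to analyse.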

Similar articles

Classes of Multiagent Q-learning Dynamics with epsilon-greedy Exploration

Q-learning in single-agent environments is known to converge in the limit given sufficient exploration. The same algorithm has been applied, with some success, in multiagent environments, where traditional analysis techniques break down. Using established dynamical systems methods, we derive and study an idealization of Q-learning in 2-player 2-action repeated general-sum games. In particular, ...
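
As a rough illustration of this style of idealization (a generic sketch, not necessarily the exact system derived in the cited paper), the expected change of a 2-action ε-greedy learner's Q-values can be written as a coupled ODE, here taking the discount γ = 0 for simplicity:

    \frac{dQ_i(a)}{dt} = \alpha \, x_i(a) \left[ r_i(a, x_{-i}) - Q_i(a) \right],
    \qquad
    x_i(a) =
    \begin{cases}
    1 - \frac{\epsilon}{2} & \text{if } a = \arg\max_{a'} Q_i(a'),\\[2pt]
    \frac{\epsilon}{2} & \text{otherwise,}
    \end{cases}

where x_i(a) is agent i's ε-greedy probability of playing a, and r_i(a, x_{-i}) is the expected payoff of a against the opponent's current ε-greedy mixed strategy. The argmax makes the vector field piecewise, which is why dynamical systems methods are needed.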

Exploration Methods for Connectionist Q-learning in Bomberman

In this paper, we investigate which exploration method yields the best performance in the game Bomberman. In Bomberman the controlled agent has to kill opponents by placing bombs. The agent is represented by a multi-layer perceptron that learns to play the game with the use of Q-learning. We introduce two novel exploration strategies: Error-Driven-ε and Interval-Q, which base their explorative ...
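
The two novel strategies are not detailed in this excerpt, so as background only, here is a minimal sketch of the connectionist setup the paper builds on: a multi-layer perceptron approximating Q-values, trained with semi-gradient TD updates. All layer sizes and names are illustrative, not the paper's architecture.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: a feature vector describing the game state,
    # one hidden layer, and one Q-value output per action.
    n_in, n_hid, n_act = 32, 64, 6
    W1 = rng.normal(0, 0.1, (n_hid, n_in)); b1 = np.zeros(n_hid)
    W2 = rng.normal(0, 0.1, (n_act, n_hid)); b2 = np.zeros(n_act)

    def forward(s):
        h = np.tanh(W1 @ s + b1)      # hidden layer activations
        return W2 @ h + b2, h         # Q-values for all actions

    def td_update(s, a, r, s_next, done, alpha=0.01, gamma=0.95):
        # Semi-gradient Q-learning step: the TD target is treated as fixed.
        q, h = forward(s)
        q_next, _ = forward(s_next)
        target = r if done else r + gamma * q_next.max()
        delta = target - q[a]         # TD error for the taken action
        grad_h = delta * W2[a] * (1 - h**2)
        W2[a] += alpha * delta * h
        b2[a] += alpha * delta
        W1 += alpha * np.outer(grad_h, s)
        b1 += alpha * grad_h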

Exploration and Exploitation Tradeoff using Fuzzy Reinforcement Learning

Balancing exploration and exploitation in a multiagent environment is a dilemma with no clear answer, and a variety of methods have been proposed to address it. In this paper, we provide a method based on fuzzy variables for balancing exploration and exploitation in multiagent environments. In this method, an effective agent ...

Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax

This paper proposes “Value-Difference Based Exploration combined with Softmax action selection” (VDBE-Softmax) as an adaptive exploration/exploitation policy for temporal-difference learning. The advantage of the proposed approach is that exploration actions are only selected in situations when the knowledge about the environment is uncertain, which is indicated by fluctuating values during learning ...
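
Roughly, and assuming the formulation usually cited for this line of work, the combined policy explores via softmax only with a state-dependent probability ε(s), and is greedy otherwise:

    a_t =
    \begin{cases}
    a \sim \dfrac{e^{Q(s,a)/\tau}}{\sum_{a'} e^{Q(s,a')/\tau}} & \text{with probability } \epsilon(s),\\[4pt]
    \arg\max_{a} Q(s,a) & \text{with probability } 1 - \epsilon(s),
    \end{cases}

where τ is the softmax temperature and ε(s) is adapted from value differences as in VDBE (see the next entry).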

Adaptive ε-greedy Exploration in Reinforcement Learning Based on Value Differences

This paper presents “Value-Difference Based Exploration” (VDBE), a method for balancing the exploration/exploitation dilemma inherent to reinforcement learning. The proposed method adapts the exploration parameter of ε-greedy as a function of the temporal-difference error observed from value-function backups, which is considered a measure of the agent’s uncertainty about the environment. VDBE ...
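
A sketch of the adaptation rule, assuming the commonly cited formulation (σ is an inverse-sensitivity parameter and δ a mixing weight, e.g. the inverse of the number of actions); the exact constants may differ from the paper:

    f(s, a, \sigma) = \frac{1 - e^{-|\alpha\,\delta_{TD}|/\sigma}}{1 + e^{-|\alpha\,\delta_{TD}|/\sigma}},
    \qquad
    \epsilon_{t+1}(s) = \delta \, f(s, a, \sigma) + (1 - \delta)\,\epsilon_t(s),

where α δ_TD = Q_{t+1}(s,a) − Q_t(s,a) is the size of the value-function backup. Large TD errors push ε(s) up (more exploration in that state); as the values converge, ε(s) decays toward greedy behaviour.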

Publication year: 2009