A Comparison of Exploration/Exploitation Techniques for a Q-Learning Agent in the Wumpus World

Author

  • A. Friesen

Abstract

The Q-Learning algorithm, proposed by Watkins [1], has become one of the most popular reinforcement learning algorithms because it is relatively simple to implement and, as a model-free method, avoids the complexity of learning a model of the environment. However, Q-Learning does not specify how to trade off exploration of the world against exploitation of the developed policy. Many such tradeoffs are possible, and the choice among them should depend mainly on whether fast but less accurate convergence to a policy is desired, or slower convergence to a more accurate policy. This paper presents the results of several exploration vs. exploitation (EE) methods within the contrived environment of the Wumpus World [2].
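To make the EE tradeoff concrete, the sketch below plugs two interchangeable action-selection strategies into the standard tabular Q-Learning update, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The abstract does not name the EE methods the paper compares, so ε-greedy and Boltzmann (softmax) selection are shown only as two common examples, not as the paper's methods. The environment is a hypothetical six-state corridor rather than the Wumpus World, and every function name and parameter value (`step`, `q_learning`, α = 0.5, γ = 0.9, ε = 0.1, temperature 0.5) is an illustrative assumption.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy task (NOT the Wumpus World from the paper): a 1-D corridor
# with states 0..N_STATES-1; the agent starts at 0 and gets +1 on reaching the end.
N_STATES = 6
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Deterministic corridor dynamics; returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False


def epsilon_greedy(q, state, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise exploit argmax Q."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties randomly so an all-zero Q-table does not favour one action.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def boltzmann(q, state, temperature, rng):
    """Softmax exploration: P(a) is proportional to exp(Q(s,a) / temperature)."""
    prefs = [q[(state, a)] / temperature for a in ACTIONS]
    m = max(prefs)  # subtract the max for numerical stability
    weights = [math.exp(p - m) for p in prefs]
    return rng.choices(ACTIONS, weights=weights)[0]


def q_learning(select, param, episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-Learning; `select` is the EE strategy and `param` its knob."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = select(q, state, param, rng)
            nxt, reward, done = step(state, action)
            # Off-policy target: bootstrap from the greedy value of the next state.
            if done:
                target = reward
            else:
                target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """Pure exploitation: the best-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print("epsilon-greedy:", greedy_policy(q_learning(epsilon_greedy, 0.1)))
    print("Boltzmann:     ", greedy_policy(q_learning(boltzmann, 0.5)))
```

Only the `select` argument changes between runs, which mirrors the point of the abstract: Q-Learning itself leaves the exploration policy open. Lowering ε or the temperature shifts the agent toward exploitation; measuring how quickly and how accurately each setting converges is the kind of comparison the paper describes, but no results from the paper are reproduced here.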

Similar resources

Q-learning in Wumpus World

Wumpus world is a toy problem domain used in [3] to introduce the concept of logical agents. Using agents that reason about the world with the tools of formal logic is an intuitive approach that underpinned much of the earlier research in artificial intelligence; however, there are issues associated with using formal logic as a basis for reasoning (such as the frame problem). I propose to implem...

Exploration Methods for Connectionist Q-learning in Bomberman

In this paper, we investigate which exploration method yields the best performance in the game Bomberman. In Bomberman the controlled agent has to kill opponents by placing bombs. The agent is represented by a multi-layer perceptron that learns to play the game with the use of Q-learning. We introduce two novel exploration strategies: Error-Driven-ε and Interval-Q, which base their explorative ...

An Online Q-learning Based Multi-Agent LFC for a Multi-Area Multi-Source Power System Including Distributed Energy Resources

This paper presents an online two-stage Q-learning based multi-agent (MA) controller for load frequency control (LFC) in an interconnected multi-area multi-source power system integrated with distributed energy resources (DERs). The proposed control strategy consists of two stages. The first stage employs a PID controller whose parameters are designed using sine cosine optimization (SCO...

Balancing Exploration and Exploitation in Agent Learning

The issue of controlling the ratio of exploration and exploitation in agent learning in dynamic environments provides a continuing challenge in the application of agent learning techniques. Methods to control this ratio in a manner that mimics human behavior are required for use in the representation of human behavior, which seek to constrain agent learning mechanisms in a manner similar to tha...

Q-Learning with Basic Emotions

Q-learning is a simple and powerful tool for solving dynamic problems where environments are unknown. It uses a balance of exploration and exploitation to find an optimal solution to the problem. In this paper, we propose using four basic emotions: joy, sadness, fear, and anger to influence a Q-learning agent. Simulations show that the proposed affective agent requires fewer steps to f...

Publication date: 2008