A Comparison of Exploration/Exploitation Techniques for a Q-Learning Agent in the Wumpus World
Abstract
The Q-Learning algorithm, proposed by Watkins [1], has become one of the most popular reinforcement learning algorithms because of its relatively simple implementation and the reduction in complexity gained from its model-free approach. However, Q-Learning does not specify how to trade off exploration of the world against exploitation of the developed policy. Multiple such tradeoffs are possible, and the choice among them should depend mainly on whether fast but less accurate convergence to a policy is desired, or slower convergence to a more accurate policy. This paper presents the results of several exploration vs. exploitation (EE) methods within the contrived environment of the Wumpus World [2].
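To make the tradeoff concrete, the following is a minimal Python sketch of where an exploration rule plugs into tabular Q-learning. The abstract does not name the EE methods that were compared, so the two action-selection rules shown here (epsilon-greedy and Boltzmann/softmax) are assumptions chosen because they are common, not a description of the paper's experiments. The function names, the dictionary-of-lists Q-table layout, and the default `alpha` and `gamma` values are likewise illustrative.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def boltzmann(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning update; the update itself ignores how the action was chosen.

    Assumes next_state has an entry in q (terminal states would need an
    all-zero row or a separate branch).
    """
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


# Hypothetical two-state, two-action table, used only for illustration.
rng = random.Random(0)
q = {0: [0.0, 0.0], 1: [1.0, 2.0]}
action = epsilon_greedy(q[1], epsilon=0.1, rng=rng)
q_update(q, state=0, action=1, reward=1.0, next_state=1)
```

A larger `epsilon` or `temperature` spends more steps exploring; a smaller value exploits the current estimates sooner. This is the speed-versus-accuracy choice the abstract describes, expressed as a single tunable parameter per rule.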
Similar resources
Q-learning in Wumpus World
Wumpus world is a toy problem domain used in [3] to introduce the concept of logical agents. Using agents that reason about the world with the tools of formal logic is an intuitive approach that underpinned much of the earlier research in artificial intelligence; however, there are issues associated with using formal logic as a basis for reasoning (such as the frame problem). I propose to implem...
Exploration Methods for Connectionist Q-learning in Bomberman
In this paper, we investigate which exploration method yields the best performance in the game Bomberman. In Bomberman the controlled agent has to kill opponents by placing bombs. The agent is represented by a multi-layer perceptron that learns to play the game with the use of Q-learning. We introduce two novel exploration strategies: Error-Driven-ε and Interval-Q, which base their explorative ...
An Online Q-learning Based Multi-Agent LFC for a Multi-Area Multi-Source Power System Including Distributed Energy Resources
This paper presents an online two-stage Q-learning based multi-agent (MA) controller for load frequency control (LFC) in an interconnected multi-area multi-source power system integrated with distributed energy resources (DERs). The proposed control strategy consists of two stages. The first stage employs a PID controller whose parameters are designed using sine cosine optimization (SCO...
Balancing Exploration and Exploitation in Agent Learning
The issue of controlling the ratio of exploration and exploitation in agent learning in dynamic environments provides a continuing challenge in the application of agent learning techniques. Methods to control this ratio in a manner that mimics human behavior are required for use in the representation of human behavior, which seek to constrain agent learning mechanisms in a manner similar to tha...
Q-Learning with Basic Emotions
Q-learning is a simple and powerful tool for solving dynamic problems where environments are unknown. It uses a balance of exploration and exploitation to find an optimal solution to the problem. In this paper, we propose using four basic emotions (joy, sadness, fear, and anger) to influence a Q-learning agent. Simulations show that the proposed affective agent requires fewer steps to f...
Publication date: 2008