Search results for: q learning
Number of results: 717428
Human experience with interactive games will be enhanced if game-playing software agents learn from their failures and do not make the same mistakes over and over again. Reinforcement learning, e.g., Q-learning, provides one method for learning from failures. Model-based meta-reasoning that uses an agent’s self-model for blame assignment provides another. In this paper, we combine the two m...
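A minimal sketch of the tabular Q-learning update this line of work builds on, assuming an epsilon-greedy learner; the constants are illustrative and the paper's meta-reasoning layer is not shown:

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # illustrative constants
    Q = defaultdict(float)                   # (state, action) -> value

    def choose_action(state, actions):
        # Epsilon-greedy exploration over the current Q estimates.
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state, actions):
        # Watkins-style backup toward the greedy one-step target.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])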
We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learn...
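A hedged sketch of the joint-action update this abstract describes: each player keeps a Q-function over joint actions and backs up the value of an equilibrium of the next stage game. For brevity the sketch searches only for a pure-strategy Nash equilibrium; the paper's protocol uses general (possibly mixed) equilibrium computation, so pure_nash_value and its zero fallback are illustrative simplifications:

    from collections import defaultdict

    Q1 = defaultdict(float)  # player 1's values over (state, a1, a2)
    Q2 = defaultdict(float)  # player 2's values over (state, a1, a2)
    ALPHA, GAMMA = 0.1, 0.9

    def pure_nash_value(state, actions):
        # Brute-force search for a pure-strategy Nash equilibrium of the
        # stage game defined by the current Q-values; the 0.0 fallback
        # when none exists is a stand-in, not the paper's rule.
        for a1 in actions:
            for a2 in actions:
                if (all(Q1[(state, b, a2)] <= Q1[(state, a1, a2)] for b in actions)
                        and all(Q2[(state, a1, b)] <= Q2[(state, a1, a2)] for b in actions)):
                    return Q1[(state, a1, a2)], Q2[(state, a1, a2)]
        return 0.0, 0.0

    def nash_q_update(s, a1, a2, r1, r2, s_next, actions):
        # Back up each player's value through the next state's equilibrium.
        v1, v2 = pure_nash_value(s_next, actions)
        Q1[(s, a1, a2)] += ALPHA * (r1 + GAMMA * v1 - Q1[(s, a1, a2)])
        Q2[(s, a1, a2)] += ALPHA * (r2 + GAMMA * v2 - Q2[(s, a1, a2)])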
Coco (“cooperative/competitive”) values are a solution concept for two-player normal-form games with transferable utility, when binding agreements and side payments between players are possible. In this paper, we show that coco values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent. We provide a set of examples showing ...
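For one-shot games, coco values have a closed form: split the payoff pair into a cooperative team game and a competitive zero-sum game, then add the team maximum to the zero-sum minimax value. A sketch of that decomposition, using SciPy's linprog for the zero-sum value (the Q-learning variant from the abstract is not shown):

    import numpy as np
    from scipy.optimize import linprog

    def zero_sum_value(M):
        # Mixed-strategy minimax value of zero-sum game M (row player maximizes).
        m, n = M.shape
        c = np.zeros(m + 1); c[-1] = -1.0          # variables: strategy x, value v; maximize v
        A_ub = np.hstack([-M.T, np.ones((n, 1))])  # v <= x . M[:, j] for every column j
        b_ub = np.zeros(n)
        A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # x sums to 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * m + [(None, None)])
        return res.x[-1]

    def coco_values(A, B):
        # Decompose (A, B) into a team game and a zero-sum game,
        # per Kalai and Kalai's coco construction.
        coop = ((A + B) / 2.0).max()       # best joint payoff of the team game
        v = zero_sum_value((A - B) / 2.0)  # minimax value of the competitive game
        return coop + v, coop - v          # coco values for players 1 and 2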
In this work we propose an approach for generalization in continuous domain Reinforcement Learning that, instead of using a single function approximator, tries many different function approximators in parallel, each one defined in a different region of the domain. Associated with each approximator is a relevance function that locally quantifies the quality of its approximation, so that, at each...
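A toy sketch of the parallel-approximators idea, assuming one-dimensional inputs, constant local models, and a fixed Gaussian kernel standing in for the relevance function (the paper's relevance functions are learned, so this is only illustrative):

    import math

    class LocalModel:
        # One constant-valued approximator tied to a region of a 1-D domain.
        def __init__(self, center, width):
            self.center, self.width, self.value = center, width, 0.0

        def relevance(self, x):
            # Fixed Gaussian kernel as a stand-in for the learned relevance.
            return math.exp(-((x - self.center) / self.width) ** 2)

    class ParallelApproximators:
        def __init__(self, centers, width=0.5):
            self.models = [LocalModel(c, width) for c in centers]

        def predict(self, x):
            # At each point, trust the locally most relevant approximator.
            return max(self.models, key=lambda m: m.relevance(x)).value

        def update(self, x, target, lr=0.2):
            # Train all approximators in parallel, weighted by local relevance.
            for m in self.models:
                m.value += lr * m.relevance(x) * (target - m.value)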
Efficient propagation of information over vehicular wireless networks has usually been the focus of the research community. However, few contributions have been made in the field of vehicular data collection, and especially in applying learning techniques to such a highly dynamic networking scheme. These smart learning approaches excel at making the collection operation more reactiv...
User-machine interaction is important for spoken content retrieval. For text content retrieval, the user can easily scan through and select from a list of retrieved items. This is impossible for spoken content retrieval, because the retrieved items are difficult to show on screen. Moreover, due to the high degree of uncertainty in speech recognition, the retrieval results can be very noisy. One wa...
Learning can be an effective way for robot systems to deal with dynamic environments and changing task conditions. However, popular single-robot learning algorithms based on discounted rewards, such as Q-learning, do not achieve cooperation (i.e., purposeful division of labor) when applied to task-level multi-robot systems. A task-level system is defined as one performing a mission that is decompo...
We propose an integrated technique of genetic programming (GP) and reinforcement learning (RL) that allows a real robot to execute real-time learning. Our technique does not need a precise simulator because learning is done with a real robot. Moreover, our technique makes it possible to learn optimal actions on real robots. We show the result of an experiment with a real AIBO robot and represen...
A reinforcement architecture is introduced that consists of three complementary learning systems with different generalization abilities. The ACTOR learns state-action associations, the CRITIC learns a goal-gradient, and the PUNISH system learns what actions to avoid. The architecture is compared to the standard actor-critic and Q-learning models on a number of maze learning tasks. The novel a...
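A rough sketch of the three-system division of labor the abstract names, with plain TD updates standing in for the paper's actual learning rules; the punishment bookkeeping here is an assumption:

    from collections import defaultdict

    actor = defaultdict(float)   # state-action preferences
    critic = defaultdict(float)  # state values: the goal-gradient
    punish = defaultdict(float)  # accumulated evidence against actions
    ALPHA, GAMMA = 0.1, 0.95

    def select(state, actions):
        # Punishment vetoes otherwise attractive actions.
        return max(actions, key=lambda a: actor[(state, a)] - punish[(state, a)])

    def learn(s, a, reward, s_next):
        # Plain TD updates; negative reward also feeds the PUNISH table.
        td_error = reward + GAMMA * critic[s_next] - critic[s]
        critic[s] += ALPHA * td_error
        actor[(s, a)] += ALPHA * td_error
        if reward < 0:
            punish[(s, a)] += ALPHA * (-reward)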
This paper studies a multi-goal Q-learning algorithm for cooperative teams. Each member of the cooperative team is simulated by an agent. In the virtual cooperative team, agents adapt their knowledge according to cooperative principles. The multi-goal Q-learning algorithm addresses multiple learning goals. In the virtual team, agents learn what knowledge to adopt and how much to learn (choo...
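One plausible reading of multi-goal Q-learning, sketched with a separate Q-table per goal and a weighted scalarization for action selection; the goal names and weights are hypothetical, not from the paper:

    from collections import defaultdict

    GOALS = {"task": 0.7, "team": 0.3}          # hypothetical goals and weights
    Q = {g: defaultdict(float) for g in GOALS}  # one Q-table per goal
    ALPHA, GAMMA = 0.1, 0.9

    def select(state, actions):
        # Scalarize across goals when choosing an action.
        return max(actions, key=lambda a: sum(w * Q[g][(state, a)]
                                              for g, w in GOALS.items()))

    def update(state, action, rewards, next_state, actions):
        # rewards maps each goal to its own reward signal.
        for g in GOALS:
            best_next = max(Q[g][(next_state, a)] for a in actions)
            Q[g][(state, action)] += ALPHA * (rewards[g] + GAMMA * best_next
                                              - Q[g][(state, action)])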