Search results for: q learning

Number of results: 717,428

Journal: Computación y Sistemas, 2007
Abel Rodríguez, Darien Rosa Paz, Marisela Mainegra Hing, Luisa Manuela González González

Due to the complexity of the data distribution problem in Distributed Database Systems, most of the proposed solutions divide the design process into two parts: the fragmentation, and the allocation of fragments to the locations in the network. Here we consider the allocation problem with the possibility of replicating fragments, minimizing the total cost, which is in general NP-complete, and prop...

2005
Verena Hamburger, Karsten Berns, Fumiya Iida, Rolf Pfeifer

As observed in nature, complex locomotion can be generated by an adequate combination of motor primitives. In this context, this paper focuses on experiments that lead to the development of a quality criterion for the design and analysis of motor primitives. First, the impact of different vocabularies on behavioural diversity, robustness of pre-learned behaviours, and the learning process is...

2012
Ulit Jaidee, Héctor Muñoz-Avila

We present CLASSQ-L (for: class Q-learning), an application of the Q-learning reinforcement learning algorithm to play complete Wargus games. Wargus is a real-time strategy game where players control armies consisting of units of different classes (e.g., archers, knights). CLASSQ-L uses a single table for each class of unit, so that every unit is controlled by, and updates, its class's Q-table. This enab...
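The abstract's core idea of one shared Q-table per unit class, updated by every unit of that class, can be sketched roughly as follows. This is an illustrative assumption of how such an update might look, not the authors' implementation; all names, constants, and state/action encodings are hypothetical.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (not from the paper).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# One table per unit class (e.g. "archer", "knight"),
# mapping (state, action) pairs to estimated values.
q_tables = defaultdict(lambda: defaultdict(float))

def choose_action(unit_class, state, actions):
    """Epsilon-greedy action selection from the shared class-level table."""
    table = q_tables[unit_class]
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: table[(state, a)])

def update(unit_class, state, action, reward, next_state, next_actions):
    """Standard Q-learning update applied to the class's shared table,
    so experience from every unit of the class accumulates in one place."""
    table = q_tables[unit_class]
    best_next = max((table[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + GAMMA * best_next
    table[(state, action)] += ALPHA * (td_target - table[(state, action)])
```

Sharing one table across all units of a class is what lets each unit benefit from the experience of its peers, at the cost of assuming units of the same class face interchangeable situations.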

1998
Thomas G. Dietterich

This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural semantics—as a subroutine hierarchy—and a declarative semantics—as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kae...

2014
James MacGlashan, Michael L. Littman, Fiery Cushman

Existing models of the evolution of social behavior typically involve innate strategies such as tit-for-tat. Yet both behavioral and neural evidence indicates a substantial role for learned social behavior. We explore the evolutionary dynamics of two simple social behaviors among learning agents: theft and punishment. In our simulation, agents employ Q-learning, a common reinforcement learning...

Journal: CoRR, 2017
Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman

We show how an ensemble of Q-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
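A minimal sketch of UCB-style exploration over an ensemble of Q-functions, in the spirit of the abstract above: pick the action whose ensemble mean plus an uncertainty bonus is largest. The function name, the `beta` weight, and the use of the ensemble standard deviation as the bonus are assumptions for illustration, not the authors' exact method.

```python
import math

def ucb_action(ensemble_q, state, actions, beta=1.0):
    """Select the action maximizing mean + beta * std across an
    ensemble of Q-functions; the std term rewards disagreement,
    steering exploration toward uncertain actions."""
    def score(a):
        values = [q(state, a) for q in ensemble_q]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        return mean + beta * math.sqrt(var)
    return max(actions, key=score)
```

Each `q` here is any callable `(state, action) -> float`, e.g. a lookup into one member of an ensemble of learned Q-tables or networks.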

1992
Charles W. Anderson

Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restarting, units continue to learn via back-propagation. The resulting restart algorithm is tested in a Q-learning network that learns to solve an inverted pendulum problem. Solutions are found faster on ave...

2012
Eduardo Alonso, Esther Mondragón, Niclas Kjäll-Ohlsson

We present an approach to solving the reinforcement learning problem in which agents are provided with internal drives against which they evaluate the value of the states according to a similarity function. We extend Q-learning by substituting internally driven values for ad hoc rewards. The resulting algorithm, Internally Driven Q-learning (IDQ-learning), is experimentally proved to convergenc...

2005
Shiau Hong Lim, Gerald DeJong

While direct, model-free reinforcement learning often performs better than model-based approaches in practice, only the latter have yet supported theoretical guarantees for finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning ...

2004
D. Blatt, S. A. Murphy

We consider a new algorithm for reinforcement learning called A-learning. A-learning learns the advantages from a single training set. We compare A-learning with function approximation to Q-learning with function approximation and find that, because A-learning approximates only the advantages, it is less likely to exhibit bias due to the function approximation as compared to Q-learning. W...
