Search results for: q learning

Number of results: 717,428

Journal: Computación y Sistemas, 2007
Abel Rodríguez, Darien Rosa Paz, Marisela Mainegra Hing, Luisa Manuela González González

Due to the complexity of the data distribution problem in Distributed Database Systems, most of the proposed solutions divide the design process into two parts: the fragmentation, and the allocation of fragments to the locations in the network. Here we consider the allocation problem with the possibility of replicating fragments, minimizing the total cost, which is in general NP-complete, and prop...

2005
Verena Hamburger, Karsten Berns, Fumiya Iida, Rolf Pfeifer

As observed in nature, complex locomotion can be generated by an adequate combination of motor primitives. In this context, this paper focuses on experiments that lead to the development of a quality criterion for the design and analysis of motor primitives. First, the impact of different vocabularies on behavioural diversity, robustness of pre-learned behaviours, and the learning process is...

2012
Ulit Jaidee, Héctor Muñoz-Avila

We present CLASSQ-L (for: class Q-learning), an application of the Q-learning reinforcement learning algorithm to play complete Wargus games. Wargus is a real-time strategy game where players control armies consisting of units of different classes (e.g., archers, knights). CLASSQ-L uses a single table for each class of unit, so that every unit is controlled by, and updates, its class's Q-table. This enab...
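The abstract's core idea of one shared Q-table per unit class, updated by every unit of that class, can be sketched roughly as follows. This is an illustrative assumption of how such an update might look, not the authors' implementation; all names, constants, and state/action encodings are hypothetical.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (not from the paper).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# One table per unit class (e.g. "archer", "knight"),
# mapping (state, action) pairs to estimated values.
q_tables = defaultdict(lambda: defaultdict(float))

def choose_action(unit_class, state, actions):
    """Epsilon-greedy action selection from the shared class-level table."""
    table = q_tables[unit_class]
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: table[(state, a)])

def update(unit_class, state, action, reward, next_state, next_actions):
    """Standard Q-learning update applied to the class's shared table,
    so experience from every unit of the class accumulates in one place."""
    table = q_tables[unit_class]
    best_next = max((table[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + GAMMA * best_next
    table[(state, action)] += ALPHA * (td_target - table[(state, action)])
```

Sharing one table across all units of a class is what lets each unit benefit from the experience of its peers, at the cost of assuming units of the same class face interchangeable situations.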

1998
Thomas G. Dietterich

This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural semantics—as a subroutine hierarchy—and a declarative semantics—as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kae...

2014
James MacGlashan, Michael L. Littman, Fiery Cushman

Existing models of the evolution of social behavior typically involve innate strategies such as tit-for-tat. Yet both behavioral and neural evidence indicates a substantial role for learned social behavior. We explore the evolutionary dynamics of two simple social behaviors among learning agents: theft and punishment. In our simulation, agents employ Q-learning, a common reinforcement learning...

Journal: CoRR, 2017
Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman

We show how an ensemble of Q-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
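A minimal sketch of UCB-style exploration over an ensemble of Q-functions, in the spirit of the abstract above: pick the action whose ensemble mean plus an uncertainty bonus is largest. The function name, the `beta` weight, and the use of the ensemble standard deviation as the bonus are assumptions for illustration, not the authors' exact method.

```python
import math

def ucb_action(ensemble_q, state, actions, beta=1.0):
    """Select the action maximizing mean + beta * std across an
    ensemble of Q-functions; the std term rewards disagreement,
    steering exploration toward uncertain actions."""
    def score(a):
        values = [q(state, a) for q in ensemble_q]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        return mean + beta * math.sqrt(var)
    return max(actions, key=score)
```

Each `q` here is any callable `(state, action) -> float`, e.g. a lookup into one member of an ensemble of learned Q-tables or networks.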

1992
Charles W. Anderson

Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than adding new units. After restarting, units continue to learn via back-propagation. The resulting restart algorithm is tested in a Q-learning network that learns to solve an inverted pendulum problem. Solutions are found faster on ave...

2012
Eduardo Alonso, Esther Mondragón, Niclas Kjäll-Ohlsson

We present an approach to solving the reinforcement learning problem in which agents are provided with internal drives against which they evaluate the value of the states according to a similarity function. We extend Q-learning by substituting internally driven values for ad hoc rewards. The resulting algorithm, Internally Driven Q-learning (IDQ-learning), is experimentally proved to convergenc...

2005
Shiau Hong Lim, Gerald DeJong

While direct, model-free reinforcement learning often performs better than model-based approaches in practice, only the latter have yet supported theoretical guarantees for finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning ...

2004
D. Blatt, S. A. Murphy

We consider a new algorithm for reinforcement learning called A-learning. A-learning learns the advantages from a single training set. We compare A-learning with function approximation to Q-learning with function approximation and find that, because A-learning approximates only the advantages, it is less likely to exhibit bias due to the function approximation as compared to Q-learning. W...
