temporal difference learning

A Convergent Online Single Time Scale Actor Critic Algorithm

Journal: Journal of Machine Learning Research 2010

Dotan Di Castro, Ron Meir

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological relevance. In this paper, we introduce an online temporal difference based actor-critic algorithm which is proved to converge to a neighborhood of a local m...


Gradient Algorithms for Exploration/Exploitation Trade-Offs: Global and Local Variants

2012

Michel Tokic, Günther Palm

Gradient-following algorithms are deployed for efficient adaptation of exploration parameters in temporal-difference learning with discrete action spaces. Global and local variants are evaluated in discrete and continuous state spaces. The global variant is memory efficient in terms of requiring exploratory data only for starting states. In contrast, the local variant requires exploratory data ...


Mixing-Time Regularized Policy Gradient

2014

Tetsuro Morimura, Takayuki Osogami, Tomoyuki Shirai

Policy gradient reinforcement learning (PGRL) has been receiving substantial attention as a means of seeking stochastic policies that maximize cumulative reward. However, the learning speed of PGRL is known to decrease substantially when PGRL explores policies that induce Markov chains with long mixing times. We study a new approach of regularizing how the PGRL explores the policies by t...


TD Models: Modeling the World at a Mixture of Time Scales

1995

Richard S. Sutton

Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We present theory and algorithms for intermixing TD models of the world at different levels of temporal abstraction within a single structure. Such multi-scale TD models can be used in model-based rein...


Sample-Efficient Evolutionary Function Approximation for Reinforcement Learning

2006

Shimon Whiteson, Peter Stone

Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent’s optimal value function. In most real-world problems, learning this value function requires a function approximator, which maps state-action pairs to values via a concise, parameterized function. In practice, the success of function approximators depends on the ability of ...


Model-Free Temporal-Difference Learning and Dopamine in Alcohol Dependence: Examining Concepts From Theory and Animals in Human Imaging.

Journal: Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 2016

Quentin J M Huys, Lorenz Deserno, Klaus Obermayer, Florian Schlagenhauf, Andreas Heinz

Dopamine potentially unites two important roles: one in addiction, being involved in most substances of abuse including alcohol, and a second one in a specific type of learning, namely model-free temporal-difference reinforcement learning. Theories of addiction have long suggested that drugs of abuse may usurp dopamine's role in learning. Here, we briefly review the preclinical literature to mo...


Reinforcement Learning of Local Shape in the Game of Go

2007

David Silver, Richard S. Sutton, Martin Müller

We explore an application to the game of Go of a reinforcement learning approach based on a linear evaluation function and large numbers of binary features. This strategy has proved effective in game playing programs and other reinforcement learning applications. We apply this strategy to Go by creating over a million features based on templates for small fragments of the board, and then use te...
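As a rough illustration of the general idea named in this abstract (a linear evaluation function over binary features, trained by temporal-difference learning), here is a minimal TD(0) sketch. It is not the authors' implementation: the function name `td0_linear_binary`, the episode format, and the step-size and discount defaults are all assumptions made for this example, and the toy features have nothing to do with Go board templates.

```python
def td0_linear_binary(episodes, n_features, alpha=0.1, gamma=1.0):
    """Hypothetical helper: TD(0) for a linear value function over binary features.

    Each episode is a list of transitions (active, reward, next_active), where
    `active` lists the indices of features equal to 1 in the current state and
    `next_active` does the same for the next state (None if it is terminal).
    """
    w = [0.0] * n_features  # one weight per binary feature
    for episode in episodes:
        for active, reward, next_active in episode:
            # With binary features, the value is the sum of the active weights.
            v = sum(w[i] for i in active)
            v_next = 0.0 if next_active is None else sum(w[i] for i in next_active)
            # TD error: one-step bootstrapped target minus current estimate.
            delta = reward + gamma * v_next - v
            # The gradient of v w.r.t. each active weight is 1, so only those move.
            for i in active:
                w[i] += alpha * delta
    return w


# Toy usage (assumed data): a two-step episode where feature 0 is active in
# the first state, feature 1 in the second, and a reward of 1 arrives at the end.
toy_episode = [([0], 0.0, [1]), ([1], 1.0, None)]
weights = td0_linear_binary([toy_episode] * 200, n_features=2)
print(weights)
```

Because only the weights of active features are touched on each step, the per-step cost scales with the number of active features rather than the total feature count, which is what makes very large binary feature sets practical in this kind of scheme.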


Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation

2008

Dotan Di Castro, Dmitry Volkinshtein, Ron Meir

Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g., when function approximation is involved). Interestingly, there is growing evidence that actor-critic approaches based on phasic dopamine signals play a key role in biological learning through cortical and basal gangli...


Stability of Stochastic Approximations with 'Controlled Markov' Noise and Temporal Difference Learning

Journal: CoRR 2015

Arunselvan Ramaswamy, Shalabh Bhatnagar

In this paper we present a ‘stability theorem’ for stochastic approximation (SA) algorithms with ‘controlled Markov’ noise. Such algorithms were first studied by Borkar in 2006. Specifically, sufficient conditions are presented which guarantee the stability of the iterates. Further, under these conditions the iterates are shown to track a solution to the differential inclusion defined in terms ...


Transfer Learning via Inter-Task Mappings for Temporal Difference Learning

Journal: Journal of Machine Learning Research 2007

Matthew E. Taylor, Peter Stone, Yaxin Liu

Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but the most basic algorithms have often been found slow in practice. Th...
