Search results for: temporal difference learning
Number of results: 1,222,164
Value prediction is an important subproblem of several reinforcement learning (RL) algorithms. In a previous work, it has been shown that the combination of least-squares temporal-difference learning with ELM (extreme learning machine) networks is a powerful method for value prediction in continuous-state problems. This work proposes the use of ensembles to improve the approximation capabilitie...
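The least-squares temporal-difference (LSTD) value predictor this snippet builds on can be sketched compactly. The chain environment, one-hot feature map, and regularization constant below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def lstd(transitions, phi, gamma=0.9, reg=1e-3):
    """LSTD(0): solve A w = b, where
    A = sum over samples of phi(s) (phi(s) - gamma*phi(s'))^T
    b = sum over samples of r * phi(s).
    A small ridge term keeps A invertible with sparse data."""
    k = phi(transitions[0][0]).shape[0]
    A = reg * np.eye(k)
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)  # value estimate: V(s) ≈ phi(s) @ w

# Toy 3-state chain 0 -> 1 -> 2 with reward 1.0 on reaching state 2
phi = lambda s: np.eye(3)[s]               # one-hot features
transitions = [(0, 0.0, 1), (1, 1.0, 2)] * 50
w = lstd(transitions, phi)                 # ≈ [0.9, 1.0, 0.0]
```

With one-hot features the weights recover the tabular values directly; an ELM network, as in the paper, would simply supply a richer `phi`.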
We present an experimental methodology and results for a machine learning approach to learning opening strategy in the game of Go, a game for which the best computer programs play only at the level of an advanced beginning human player. While the evaluation function in most computer Go programs consists of a carefully crafted combination of pattern matchers, expert rules, and selective search, ...
Evidence indicates that dopaminergic neurons in basal ganglia implement a form of temporal difference (TD) reinforcement learning. Yet, while phasic dopamine levels encode prediction errors of rewarding outcomes, the encoding of punishing outcomes is weaker and less precise. We posit that this asymmetry between reward and punishment reflects functional design. In order to test this hypothesis, ...
Reinforcement learning (RL) has been successfully used to solve many continuous control tasks. Despite its impressive results however, fundamental questions regarding the sample complexity of RL on continuous problems remain open. We study the performance of RL in this setting by considering the behavior of the Least-Squares Temporal Difference (LSTD) estimator on the classic Linear Quadratic R...
An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference (Sutton & Barto, 1998) approach to transfer in reinforcement learning tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies...
A number of computational models have explained the behavior of dopamine neurons in terms of temporal difference learning. However, earlier models cannot account for recent results of conditioning experiments; specifically, the behavior of dopamine neurons in case of variation of the interval between a cue stimulus and a reward has not been satisfyingly accounted for. We address this problem by...
In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that varying the emphasis of linear TD(λ)’s updates in a particular way causes its expected update to become stable under off-policy training. The only prior model-fre...
Probability theory represents and manipulates uncertainties, but cannot tell us how to behave. For that we need utility theory which assigns values to the usefulness of different states, and decision theory which concerns optimal rational decisions. There are many methods for probability modeling, but few for learning utility and decision models. We use reinforcement learning to find the optima...
We show that the λ-return target used in the TD(λ) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n. We introduce the γ-return estimator, an alternative target based on a more accurate model of variance, which defines the TDγ family of complex-backup temporal difference learning algorithms. We derive T...
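The λ-return target this snippet analyzes is the standard geometrically weighted mixture of n-step returns, G_t^λ = (1-λ) Σ_{n≥1} λ^{n-1} G_t^(n). A minimal sketch for a finite episode follows; the function names and the episode layout (rewards and a value array for bootstrapping) are assumptions for illustration:

```python
def n_step_return(rewards, values, t, n, gamma):
    """n-step return G_t^(n): discounted rewards, plus a
    bootstrapped value tail if the episode has not ended."""
    T = len(rewards)
    end = min(t + n, T)
    g = sum(gamma**i * rewards[t + i] for i in range(end - t))
    if t + n < T:
        g += gamma**n * values[t + n]
    return g

def lambda_return(rewards, values, t, gamma, lam):
    """G_t^λ = (1-λ) Σ_{n≥1} λ^{n-1} G_t^(n); in a finite episode
    all remaining weight falls on the full Monte-Carlo return."""
    T = len(rewards)
    g = 0.0
    for n in range(1, T - t):
        g += (1 - lam) * lam**(n - 1) * n_step_return(rewards, values, t, n, gamma)
    g += lam**(T - t - 1) * n_step_return(rewards, values, t, T - t, gamma)
    return g

# λ = 0 recovers the one-step TD target; λ = 1 the Monte-Carlo return
rewards, values = [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]
g_td = lambda_return(rewards, values, 0, 1.0, 0.0)   # one-step target
g_mc = lambda_return(rewards, values, 0, 1.0, 1.0)   # full return
```

The paper's γ-return replaces the geometric weights λ^{n-1} with weights derived from a variance model; this sketch shows only the baseline λ-return they start from.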
Research on reinforcement learning has increasingly focused on the role of neuromodulatory systems implicated in associative learning. Formulations of temporal difference (TD) learning have gained a great deal of attention due to the similarity of the TD prediction error and the observed activity of dopamine neurons in the primate midbrain. Recent work has attempted to integrate additional neur...