Search results for: temporal difference learning

Number of results: 1222164

2006
Xin Xu, Tao Xie, Dewen Hu, Xicheng Lu

Kernel methods have attracted much research interest recently, since by utilizing Mercer kernels, non-linear and non-parametric versions of conventional supervised or unsupervised learning algorithms can be implemented, usually with better generalization ability. However, kernel methods in reinforcement learning have not been widely studied in the literature. In this paper, w...

2005
Omer Ziv, Nahum Shimkin

We introduce a class of Multigrid-based temporal difference algorithms for reinforcement learning with linear function approximation. Multigrid methods are commonly used to accelerate convergence of iterative numerical computation algorithms. The proposed Multigrid-enhanced TD(λ) algorithms accelerate the convergence of the basic TD(λ) algorithm while keeping essentially the same per-...

2003
Clifford Kotnik, Jugal Kalita

Reinforcement learning has been used for training game-playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield ...

2009
Hans Pirnay, Slava Arabagi

Learning to play the game Tetris has been a common challenge in several past machine learning competitions. Most entrants have achieved tremendous success by using a heuristic cost function during the game and optimizing each move based on those heuristics. This approach dates back to the 1990s, when it was shown to successfully eliminate a couple hundred lines. However, we decided to completely depart f...

2017
Esther Mondragón, Jonathan Gray, Eduardo Alonso, Charlotte Bonardi, Dómhnall J. Jennings

This paper presents a novel representational framework for the Temporal Difference (TD) model of learning, which allows the computation of configural stimuli – cumulative compounds of stimuli that generate perceptual emergents known as configural cues. This Simultaneous and Serial Configural-cue Compound Stimuli Temporal Difference model (SSCC TD) can model both simultaneous and serial stimulus...

2016
Dominik Meyer, Hao Shen, Klaus Diepold

In this paper, we study Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. The recent development of Gradient TD (GTD) algorithms has addressed this problem successfully. However, the success of GTD algorithms requires a set of well-chosen features,...

2013
Matthew Howard, Yoshihiko Nakamura

This paper introduces locally weighted temporal difference learning for evaluation of a class of policies whose value function is nonlinear in the state. Least squares temporal difference learning is used for training local models according to a distance metric in state-space. Empirical evaluations are reported demonstrating learning performance on a number of strongly non-linear value function...

Journal: Biological Psychiatry, 2011
Guido K. W. Frank, Jeremy R. Reynolds, Megan E. Shott, Randall C. O'Reilly

BACKGROUND The neurobiology of bulimia nervosa (BN) is poorly understood. Recent animal literature suggests that binge eating is associated with altered brain dopamine (DA) reward function. In this study, we wanted to investigate DA-related brain reward learning in BN. METHODS Ill BN (n = 20, age: mean = 25.2, SD = 5.3 years) and healthy control women (CW) (n = 23, age: mean = 27.2, SD = 6.4 ...

Journal: CoRR, 2016
Timothy Arthur Mann, Hugo Penedones, Shie Mannor, Todd Hester

Temporal Difference learning or TD(λ) is a fundamental algorithm in the field of reinforcement learning. However, setting TD’s λ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the λ selection problem as a bias-variance trade-off where the solution is the value of λ that leads to the smallest Mean Squared Value Error (MSVE). To solve...
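For context on the λ parameter discussed in this abstract, a minimal sketch of standard tabular TD(λ) with accumulating eligibility traces is shown below. This illustrates the general algorithm only, not the λ-selection method proposed in the paper; the function name and transition format are illustrative assumptions.

```python
import numpy as np

def td_lambda(transitions, n_states, lam=0.9, alpha=0.1, gamma=0.99):
    """Tabular TD(lambda) with accumulating eligibility traces.

    transitions: iterable of (state, reward, next_state, done) tuples,
    possibly spanning several episodes.
    """
    V = np.zeros(n_states)            # value estimates
    e = np.zeros(n_states)            # eligibility traces
    for s, r, s_next, done in transitions:
        target = r + (0.0 if done else gamma * V[s_next])
        delta = target - V[s]         # TD error
        e[s] += 1.0                   # accumulate trace for visited state
        V += alpha * delta * e        # trace-weighted update of all states
        e *= gamma * lam              # decay; lambda sets the update timescale
        if done:
            e[:] = 0.0                # reset traces at episode boundaries
    return V
```

Setting λ = 0 recovers one-step TD(0), while λ = 1 approaches Monte Carlo returns; the paper's bias-variance framing concerns choosing a value between these extremes.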

2016
Adam M. White, Martha White

Off-policy reinforcement learning has many applications, including learning from demonstration, learning multiple goal-seeking policies in parallel, and representing predictive knowledge. Recently there has been a proliferation of new policy-evaluation algorithms that fill a longstanding algorithmic void in reinforcement learning: combining robustness to off-policy sampling, function approximati...
