Search results for: temporal difference learning

Number of results: 1222164

2006
Xin Xu, Tao Xie, Dewen Hu, Xicheng Lu

Kernel methods have attracted much research interest recently, since by utilizing Mercer kernels, non-linear and non-parametric versions of conventional supervised or unsupervised learning algorithms can be implemented, usually with better generalization ability. However, kernel methods in reinforcement learning have not been widely studied in the literature. In this paper, w...

2005
Omer Ziv, Nahum Shimkin

We introduce a class of Multigrid-based temporal difference algorithms for reinforcement learning with linear function approximation. Multigrid methods are commonly used to accelerate convergence of iterative numerical computation algorithms. The proposed Multigrid-enhanced TD(λ) algorithms accelerate the convergence of the basic TD(λ) algorithm while keeping essentially the same per-...

2003
Clifford Kotnik, Jugal Kalita

Reinforcement learning has been used for training game-playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield ...

2009
Hans Pirnay, Slava Arabagi

Learning to play the game Tetris has been a common challenge in several past machine learning competitions. Most entrants have achieved tremendous success by using a heuristic cost function during the game and optimizing each move based on those heuristics. This approach dates back to the 1990s, when it was shown to successfully eliminate a couple hundred lines. However, we decided to completely depart f...

2017
Esther Mondragón, Jonathan Gray, Eduardo Alonso, Charlotte Bonardi, Dómhnall J. Jennings

This paper presents a novel representational framework for the Temporal Difference (TD) model of learning, which allows the computation of configural stimuli – cumulative compounds of stimuli that generate perceptual emergents known as configural cues. This Simultaneous and Serial Configural-cue Compound Stimuli Temporal Difference model (SSCC TD) can model both simultaneous and serial stimulus...

2016
Dominik Meyer, Hao Shen, Klaus Diepold

In this paper, we study Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. The recent development of Gradient TD (GTD) algorithms has addressed this problem successfully. However, the success of GTD algorithms requires a set of well-chosen features,...

2013
Matthew Howard, Yoshihiko Nakamura

This paper introduces locally weighted temporal difference learning for evaluation of a class of policies whose value function is nonlinear in the state. Least squares temporal difference learning is used for training local models according to a distance metric in state-space. Empirical evaluations are reported demonstrating learning performance on a number of strongly non-linear value function...

Journal: Biological Psychiatry, 2011
Guido K. W. Frank, Jeremy R. Reynolds, Megan E. Shott, Randall C. O'Reilly

BACKGROUND The neurobiology of bulimia nervosa (BN) is poorly understood. Recent animal literature suggests that binge eating is associated with altered brain dopamine (DA) reward function. In this study, we wanted to investigate DA-related brain reward learning in BN. METHODS Ill BN (n = 20, age: mean = 25.2, SD = 5.3 years) and healthy control women (CW) (n = 23, age: mean = 27.2, SD = 6.4 ...

Journal: CoRR, 2016
Timothy Arthur Mann, Hugo Penedones, Shie Mannor, Todd Hester

Temporal Difference learning or TD(λ) is a fundamental algorithm in the field of reinforcement learning. However, setting TD’s λ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the λ selection problem as a bias-variance trade-off where the solution is the value of λ that leads to the smallest Mean Squared Value Error (MSVE). To solve...
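For context on the λ parameter discussed in this abstract, a minimal sketch of standard tabular TD(λ) with accumulating eligibility traces is shown below. This illustrates the general algorithm only, not the λ-selection method proposed in the paper; the function name and transition format are illustrative assumptions.

```python
import numpy as np

def td_lambda(transitions, n_states, lam=0.9, alpha=0.1, gamma=0.99):
    """Tabular TD(lambda) with accumulating eligibility traces.

    transitions: iterable of (state, reward, next_state, done) tuples,
    possibly spanning several episodes.
    """
    V = np.zeros(n_states)            # value estimates
    e = np.zeros(n_states)            # eligibility traces
    for s, r, s_next, done in transitions:
        target = r + (0.0 if done else gamma * V[s_next])
        delta = target - V[s]         # TD error
        e[s] += 1.0                   # accumulate trace for visited state
        V += alpha * delta * e        # trace-weighted update of all states
        e *= gamma * lam              # decay; lambda sets the update timescale
        if done:
            e[:] = 0.0                # reset traces at episode boundaries
    return V
```

Setting λ = 0 recovers one-step TD(0), while λ = 1 approaches Monte Carlo returns; the paper's bias-variance framing concerns choosing a value between these extremes.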

2016
Adam M. White, Martha White

Off-policy reinforcement learning has many applications, including learning from demonstration, learning multiple goal-seeking policies in parallel, and representing predictive knowledge. Recently there has been a proliferation of new policy-evaluation algorithms that fill a longstanding algorithmic void in reinforcement learning: combining robustness to off-policy sampling, function approximati...
