Search results for: temporal difference learning

Number of results: 1222164

Thesis: 1374

The rationale behind the present study is that particular learning strategies produce more effective results when applied together. The present study tried to investigate the efficiency of the semantic-context strategy along with a technique called the keyword method. To clarify the point, the current study sought to answer the following question: are the keyword and semantic-context metho...

2014
Arash Givchi, Maziar Palhang

Fast-converging and computationally inexpensive policy evaluation is an essential part of reinforcement learning algorithms based on policy iteration. Algorithms such as LSTD, LSPE, FPKF and NTD have faster convergence rates but are computationally slow. On the other hand, there are algorithms that are computationally fast but converge more slowly; among them are TD, RG, GTD2 and ...
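To make the trade-off concrete, below is a minimal sketch of linear TD(0) policy evaluation, the kind of cheap per-step update the incremental methods in this family rely on. The interface (`transitions` iterable, `featurize` feature map) and all parameter values are assumptions of this illustration, not anything taken from the cited paper.

```python
import numpy as np

def td0_policy_evaluation(transitions, featurize, alpha=0.05, gamma=0.99, d=8):
    """Linear TD(0) policy evaluation.

    `transitions` is assumed to be an iterable of (s, r, s_next) tuples collected
    by following the fixed policy being evaluated; `featurize` maps a state to a
    length-d NumPy feature vector.
    """
    theta = np.zeros(d)                          # V(s) is approximated by theta @ featurize(s)
    for s, r, s_next in transitions:
        phi, phi_next = featurize(s), featurize(s_next)
        delta = r + gamma * theta @ phi_next - theta @ phi   # one-step TD error
        theta += alpha * delta * phi                         # O(d) per-step update
    return theta
```

The per-step cost is linear in the number of features, which is why such methods are cheap but sensitive to the step size and slower to converge than least-squares alternatives.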

2017
Yangchen Pan, Adam M. White, Martha White

The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least squares methods. Least squares methods make the best use of available data, directly computing the TD solution, and thus do not require tuning a typically highly sensitive learning rate parameter, but they require quadratic computation and storage. Recent algorith...
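For contrast with the incremental update sketched earlier, here is a hedged sketch of least-squares TD, LSTD(0): it solves for the TD fixed point directly from the data and needs no learning rate, at the cost of maintaining a d x d matrix. The interface mirrors the previous sketch and, along with the regularization term, is an assumption of this illustration.

```python
import numpy as np

def lstd0(transitions, featurize, gamma=0.99, d=8, reg=1e-6):
    """LSTD(0): accumulate A = sum phi (phi - gamma*phi')^T and b = sum r*phi,
    then solve A theta = b for the TD fixed point.  Quadratic storage and a
    cubic solve, but no step-size parameter to tune."""
    A = reg * np.eye(d)                      # small ridge term keeps A invertible
    b = np.zeros(d)
    for s, r, s_next in transitions:
        phi, phi_next = featurize(s), featurize(s_next)
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)
```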

2010
Byron Boots, Geoffrey J. Gordon

We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the...
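As a rough illustration of running TD on learned features of state rather than on a high-dimensional or partially observed state itself, the sketch below builds a low-dimensional feature map from an SVD of logged observations and reuses the TD(0) routine from above. This is only a generic stand-in assumed for illustration, not the subspace-identification procedure of the cited paper.

```python
import numpy as np

def svd_features(observations, k=4):
    """Fit a rank-k linear projection of raw observations (rows of `observations`)
    via SVD; a crude stand-in for a learned low-dimensional state representation.
    Assumes k <= min(num_rows, obs_dim)."""
    mu = observations.mean(axis=0)
    _, _, Vt = np.linalg.svd(observations - mu, full_matrices=False)
    P = Vt[:k]                                   # k x obs_dim projection matrix
    return lambda obs: P @ (obs - mu)

# Usage sketch: featurize = svd_features(logged_observations, k=4), then pass
# `featurize` and d=4 to the TD(0) routine above to learn a value function on the subspace.
```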

Journal: Neural Computation, 2010
William H. Alexander, Joshua W. Brown

Hyperbolic discounting of future outcomes is widely observed to underlie choice behavior in animals. Additionally, recent studies (Kobayashi & Schultz, 2008) have reported that hyperbolic discounting is observed even in neural systems underlying choice. However, the most prevalent models of temporal discounting, such as temporal difference learning, assume that future outcomes are discounted ex...
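To make the contrast concrete, the snippet below compares the exponential discounting assumed by standard temporal difference learning with the hyperbolic form reported in behavioural studies. The parameter values are illustrative assumptions only.

```python
import numpy as np

# Exponential discounting (standard TD): a reward t steps away is weighted by gamma**t.
# Hyperbolic discounting (observed behaviourally): weighted by 1 / (1 + k*t).
gamma, k = 0.95, 0.1                         # illustrative parameters
t = np.arange(0, 50)
exponential = gamma ** t
hyperbolic = 1.0 / (1.0 + k * t)

# Hyperbolic discounting falls steeply at short delays yet keeps a fatter long-run tail,
# a shape no single exponential gamma can reproduce.
print(np.round(exponential[[0, 5, 20, 49]], 3))
print(np.round(hyperbolic[[0, 5, 20, 49]], 3))
```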

2013
Qiangfeng Peter Lau, Mong-Li Lee, Wynne Hsu

Relational representations have great potential for rapidly generalizing learned knowledge in large Markov decision processes such as multi-agent problems. In this work, we introduce relational temporal difference learning for the distributed case where the communication links among agents are dynamic. Thus no critical components of the system should reside in any one agent. Relational generali...

Journal: Journal of Machine Learning Research, 2016
Harm van Seijen, Ashique Rupam Mahmood, Patrick M. Pilarski, Marlos C. Machado, Richard S. Sutton

The temporal-difference methods TD(λ) and Sarsa(λ) form a core part of modern reinforcement learning. Their appeal comes from their good performance, low computational cost, and their simple interpretation, given by their forward view. Recently, new versions of these methods were introduced, called true online TD(λ) and true online Sarsa(λ), respectively (van Seijen and Sutton, 2014). Algorithm...
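For reference, here is the conventional backward-view TD(λ) update with accumulating eligibility traces, the classic algorithm these newer variants are compared against; it is not the true online variant itself. The interface follows the earlier sketches and assumes a single continuing trajectory.

```python
import numpy as np

def td_lambda(transitions, featurize, alpha=0.05, gamma=0.99, lam=0.9, d=8):
    """Classic backward-view TD(lambda) with accumulating traces.
    (True online TD(lambda) modifies the trace and weight updates so the online
    algorithm exactly matches the forward view; this is the plain version.)"""
    theta = np.zeros(d)
    e = np.zeros(d)                              # eligibility trace
    for s, r, s_next in transitions:
        phi, phi_next = featurize(s), featurize(s_next)
        delta = r + gamma * theta @ phi_next - theta @ phi
        e = gamma * lam * e + phi                # accumulate trace
        theta += alpha * delta * e
    return theta
```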

Journal: Automatica, 1999
John N. Tsitsiklis, Benjamin Van Roy

We propose a variant of temporal-difference learning that approximates the average and differential costs of an irreducible aperiodic Markov chain. The approximations are linear combinations of fixed basis functions whose weights are updated incrementally during a single endless trajectory of the Markov chain. We present a proof of convergence (with probability 1), and a characterization o...
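A hedged sketch of the kind of update this describes: alongside the linear weights, a scalar estimate of the average cost is maintained, and the TD error is formed from differential (cost minus average) terms. The step sizes and interface are assumptions of this illustration, not the exact algorithm of the paper.

```python
import numpy as np

def average_cost_td(transitions, featurize, alpha=0.05, beta=0.01, d=8):
    """Average-cost TD sketch: mu tracks the long-run average cost of the chain,
    theta parameterizes an approximate differential cost function."""
    theta = np.zeros(d)
    mu = 0.0                                     # running estimate of the average cost per step
    for s, c, s_next in transitions:             # c is the per-step cost
        phi, phi_next = featurize(s), featurize(s_next)
        delta = c - mu + theta @ phi_next - theta @ phi   # differential TD error
        mu += beta * (c - mu)                             # update the average-cost estimate
        theta += alpha * delta * phi
    return theta, mu
```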

1996
Benjamin Van Roy

We present new results about the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of a Markov chain using linear function approximators. The algorithm we analyze performs on-line updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a ch...

2007
Andreas Wendemuth

In the behavioural sciences, one considers the problem in which a sequence of stimuli is followed by a sequence of rewards r(t). The subject must learn the full sequence of rewards from the stimuli, where the prediction is modelled by the Sutton-Barto rule. Over a sequence of n trials, this prediction rule is learned iteratively by temporal difference learning. We present a closed formula for the predict...
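As an illustration of learning reward predictions over repeated trials with a temporal difference rule, the sketch below applies TD updates to a fixed within-trial stimulus and reward sequence, with one prediction weight per time step. This is a generic TD prediction setup assumed for illustration, not the closed-form analysis of the cited paper.

```python
import numpy as np

def learn_reward_predictions(stimulus, rewards, n_trials=200, alpha=0.1, gamma=1.0):
    """TD prediction over repeated trials.

    `stimulus` is a 0/1 vector over within-trial time steps (1 while the stimulus
    is present) and `rewards` gives r(t) at each step.  Each time step has its own
    prediction weight w[t], updated by the TD error between successive predictions,
    and only while the stimulus is present.
    """
    T = len(rewards)
    w = np.zeros(T)                              # one prediction weight per within-trial time step
    for _ in range(n_trials):
        for t in range(T):
            v_next = w[t + 1] if t + 1 < T else 0.0
            delta = rewards[t] + gamma * v_next - w[t]     # TD error at step t
            w[t] += alpha * delta * stimulus[t]            # update only where the stimulus is on
    return w
```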

Chart: number of search results per year
