Search results for: temporal difference learning

Number of results: 1222164

1999
Justin A. Boyan

Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-98-152.) TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it ...
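As a quick illustration of the incremental update this snippet describes, here is a minimal tabular TD(λ) sketch with accumulating eligibility traces. This is the standard textbook form, not code from Boyan's thesis; the function name, episode format, and default hyperparameters are illustrative assumptions.

```python
import numpy as np

def td_lambda_evaluate(episodes, n_states, alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular TD(lambda) policy evaluation with accumulating traces.

    Each episode is assumed to be a list of (state, reward, next_state, done)
    tuples collected under a fixed policy.
    """
    V = np.zeros(n_states)              # current value estimate per state
    for episode in episodes:
        e = np.zeros(n_states)          # eligibility traces, reset each episode
        for s, r, s_next, done in episode:
            # One-step TD error: bootstrapped target minus current estimate.
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            e[s] += 1.0                 # accumulate trace for the visited state
            V += alpha * delta * e      # credit all recently visited states
            e *= gamma * lam            # geometrically decay the traces
    return V

# Toy usage on a two-state chain: state 0 -> state 1 -> terminal, reward 1 at the end.
episodes = [[(0, 0.0, 1, False), (1, 1.0, 1, True)] for _ in range(200)]
print(td_lambda_evaluate(episodes, n_states=2))  # roughly [0.99, 1.0]
```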

2008
Simon M. Lucas

Evidently, any learning algorithm can only learn on the basis of the information given to it. This paper presents an initial attempt to place an upper bound on the information rates attainable with standard co-evolution and with temporal difference learning (TDL). The upper bound for TDL is shown to be much higher than for evolution. To test how well these bounds correlate with actual learning, a simple two-player game calle...

Journal: Network: Computation in Neural Systems, 2008

Journal: CoRR, 2017
Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

Deep reinforcement learning (RL) has achieved several high-profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-wo...

2010
Carlton Downey, Scott Sanner

Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease-of-implementation and use of “bootstrapped” return estimates to make efficient use of sampled data. In particular, TD(λ) methods comprise a family of reinforcement learning algorithms that often yield fast convergence by averaging multiple estimators of the expected return. However, TD(λ) chooses a v...
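For reference, the averaging this snippet alludes to is the standard λ-return, which blends n-step return estimates with geometrically decaying weights (textbook definition, not notation taken from this paper):

```latex
G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1}\, G_t^{(n)},
\qquad
G_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n} V(s_{t+n}).
```

Setting λ = 0 recovers the one-step TD target, while λ = 1 recovers the Monte Carlo return; the fixed choice of λ in between is exactly what this paper revisits.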

Journal: Journal of Machine Learning Research, 2011
Tsuyoshi Ueno, Shin-ichi Maeda, Motoaki Kawanabe, Shin Ishii

Since the invention of temporal difference (TD) learning (Sutton, 1988), many new algorithms for model-free policy evaluation have been proposed. Although they have brought much progress in practical applications of reinforcement learning (RL), there still remain fundamental problems concerning statistical properties of the value function estimation. To solve these problems, we introduce a new ...

2009
Christopher M. Vigorito

Temporal-difference (TD) networks are a class of predictive state representations that use well-established TD methods to learn models of partially observable dynamical systems. Previous research with TD networks has dealt only with dynamical systems with finite sets of observations and actions. We present an algorithm for learning TD network representations of dynamical systems with continuous...

[Chart: number of search results per year]
