نتایج جستجو برای: temporal difference learning
تعداد نتایج: 1222164 فیلتر نتایج به سال:
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-98-152.) TD( ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD( ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it ...
Evidently, any learning algorithm can only learn on the basis of the information given to it. This paper presents an initial attempt to place an upper bound on the information rates attainable with standard co-evolution and with TDL. The upper bound for TDL is shown to be much higher than for evolution. To test how well these bounds correlate with actual learning, a simple two-player game calle...
Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-wo...
Temporal difference (TD) algorithms are attractive for reinforcement learning due to their ease-of-implementation and use of “bootstrapped” return estimates to make efficient use of sampled data. In particular, TD(λ) methods comprise a family of reinforcement learning algorithms that often yield fast convergence by averaging multiple estimators of the expected return. However, TD(λ) chooses a v...
Since the invention of temporal difference (TD) learning (Sutton, 1988), many new algorithms for model-free policy evaluation have been proposed. Although they have brought much progress in practical applications of reinforcement learning (RL), there still remain fundamental problems concerning statistical properties of the value function estimation. To solve these problems, we introduce a new ...
Temporal-difference (TD) networks are a class of predictive state representations that use well-established TD methods to learn models of partially observable dynamical systems. Previous research with TD networks has dealt only with dynamical systems with finite sets of observations and actions. We present an algorithm for learning TD network representations of dynamical systems with continuous...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید