Search results for: temporal difference learning
Number of results: 1222164
This paper deals with the problem of constructing an intelligent Focused Crawler, i.e. a system that is able to retrieve documents of a specific topic from the Web. The crawler must contain a component which assigns visiting priorities to the links, by estimating the probability of leading to a relevant page in the future. Reinforcement Learning was chosen as a method that fits this task nicely...
Aversive processing plays a central role in human phobic fears and may also be important in some symptoms of psychosis. We developed a temporal-difference model of the conditioned avoidance response, an important experimental model for aversive learning which is also a central pharmacological model of psychosis. In the model, dopamine neurons reported outcomes that were better than the learner ...
Temporal Difference learning is one of the most used approaches for policy evaluation. It is a central part of solving reinforcement learning tasks. For deriving optimal control, policies have to be evaluated. This task requires value function approximation. At this point TD methods find application. The use of eligibility traces for backpropagation of updates as well as the bootstrapping of th...
Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential d...
This paper proposes a novel neural-network method for sequential detection. We first examine the optimal parametric sequential probability ratio test (SPRT) and make a simple equivalent transformation of the SPRT that makes it suitable for neural-network architectures. We then discuss how neural networks can learn the SPRT decision functions from observation data and labels. Conventional superv...
A striking recent finding is that monkeys behave maladaptively in a class of tasks in which they know that reward is going to be systematically delayed. This may be explained by a malign Pavlovian influence arising from states with low predicted values. However, by very carefully analyzing behavioral data from such tasks, La Camera and Richmond (2008) observed the additional important character...
in Temporal Difference Learning Frédérick Garcia and Florent Serre Abstract. TD(λ) is an algorithm that learns the value function associated to a policy in a Markov Decision Process (MDP). We propose in this paper an asymptotic approximation of online TD(λ) with accumulating eligibility trace, called ATD(λ). We then use the Ordinary Differential Equation (ODE) method to analyse ATD(λ) and to op...
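For orientation, here is a minimal sketch of the baseline algorithm that the abstract above starts from: tabular online TD(λ) with accumulating eligibility traces. It is not the ATD(λ) approximation the paper proposes. The function name `td_lambda`, the episode format, the three-state deterministic chain, and all parameter values are assumptions chosen for illustration.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Tabular online TD(lambda) with accumulating eligibility traces.

    `episodes` is an iterable of episodes; each episode is a list of
    (state, reward, next_state) tuples, with next_state=None at termination.
    This episode format is an illustrative assumption.
    """
    values = [0.0] * n_states
    for episode in episodes:
        traces = [0.0] * n_states  # traces are reset at the start of each episode
        for state, reward, next_state in episode:
            next_value = 0.0 if next_state is None else values[next_state]
            td_error = reward + gamma * next_value - values[state]
            traces[state] += 1.0  # accumulating trace: add 1 on every visit
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam  # decay every trace after the update
    return values


# Hypothetical deterministic chain 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
chain_episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
values = td_lambda([chain_episode] * 2000, n_states=3)
print(values)
```

Because the chain is deterministic, the estimates should settle at the discounted returns 0.81, 0.9 and 1.0 for states 0, 1 and 2 when `gamma` is 0.9.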
In this paper, we describe proximal gradient temporal difference learning, which provides a principled way for designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not with respect to their original objective functions as previously attempted, but rather with respect to pri...
We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White (2015) as an improved solution to the problem of divergence of off-policy temporal-difference learning with linear function approximation. We present in this paper the first convergence...
Approximate policy evaluation with linear function approximation is a commonly arising problem in reinforcement learning, usually solved using temporal difference (TD) algorithms. In this paper we introduce a new variant of linear TD learning, called incremental least-squares TD learning, or iLSTD. This method is more data efficient than conventional TD algorithms such as TD(0) and is more comp...
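As a point of comparison, the conventional TD(0) update with linear function approximation that the abstract above mentions can be sketched as follows. This is not iLSTD. The function name `linear_td0`, the transition format, the one-hot feature table, and the parameter values are assumptions made for illustration.

```python
def linear_td0(transitions, features, n_features, alpha=0.1, gamma=0.9, sweeps=2000):
    """TD(0) policy evaluation with a linear value function v(s) = w . phi(s).

    `transitions` is a list of (state, reward, next_state) tuples sampled under
    the policy being evaluated, with next_state=None at termination.
    `features[state]` is the feature vector phi(state).
    Both formats are illustrative assumptions.
    """
    weights = [0.0] * n_features

    def value(state):
        if state is None:  # terminal states have value zero
            return 0.0
        return sum(w * x for w, x in zip(weights, features[state]))

    for _ in range(sweeps):
        for state, reward, next_state in transitions:
            td_error = reward + gamma * value(next_state) - value(state)
            # Semi-gradient step: the gradient of v(s) w.r.t. the weights is phi(s).
            for i, x in enumerate(features[state]):
                weights[i] += alpha * td_error * x
    return weights


# Hypothetical chain 0 -> 1 -> 2 -> terminal with one-hot features,
# in which case the linear method reduces to the tabular one.
transitions = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights = linear_td0(transitions, one_hot, n_features=3)
print(weights)
```

Each update touches only the weight vector and costs time linear in the number of features, which is the per-step cost that least-squares variants trade against data efficiency.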