Efficient Asymptotic Approximation in Temporal Difference Learning
Authors
Frédérick Garcia and Florent Serre
Abstract
TD(λ) is an algorithm that learns the value function associated with a policy in a Markov Decision Process (MDP). In this paper we propose an asymptotic approximation of online TD(λ) with an accumulating eligibility trace, called ATD(λ). We then use the Ordinary Differential Equation (ODE) method to analyse ATD(λ) and to optimize the choice of the parameter λ and of the learning stepsize, and we introduce ATD, a new efficient temporal difference learning algorithm.
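A minimal sketch of the online tabular TD(λ) update with an accumulating eligibility trace, the recursion that ATD(λ) approximates asymptotically; the env/policy interface, the constant stepsize alpha, and the episode loop are illustrative assumptions, not details from the paper.

```python
import numpy as np

def td_lambda(env, policy, n_states, lam=0.9, gamma=0.95,
              alpha=0.05, n_episodes=500):
    """Online tabular TD(lambda) with an accumulating eligibility trace.

    Assumes env.reset() -> state, env.step(action) -> (next_state, reward,
    done), and policy(state) -> action; these interfaces are illustrative.
    """
    V = np.zeros(n_states)
    for _ in range(n_episodes):
        e = np.zeros(n_states)            # eligibility trace, reset per episode
        s = env.reset()
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            delta = r + gamma * V[s_next] * (not done) - V[s]   # TD error
            e[s] += 1.0                   # accumulating (not replacing) trace
            V += alpha * delta * e        # online update of every traced state
            e *= gamma * lam              # decay all traces
            s = s_next
    return V
```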
Similar Papers
Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
We present for the first time an asymptotic convergence analysis of two-timescale stochastic approximation driven by controlled Markov noise. In particular, both the faster and slower recursions have non-additive Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both times...
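A toy sketch of what a two-timescale recursion looks like: the fast iterate tracks a quantity that depends on the slow one, with stepsizes satisfying b_n / a_n → 0. The specific updates and the Gaussian noise below are invented for illustration and carry none of the controlled Markov-noise structure analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-timescale recursion: w moves on the fast clock and tracks a
# quantity depending on theta, while theta drifts on the slow clock.
theta, w = 0.0, 0.0
for n in range(1, 200_001):
    a_n = n ** -0.6                      # fast stepsize
    b_n = n ** -1.0                      # slow stepsize, b_n / a_n -> 0
    w += a_n * ((2.0 * theta - w) + rng.normal(scale=0.1))   # fast: w -> 2*theta
    theta += b_n * ((1.0 - theta) + rng.normal(scale=0.1))   # slow: theta -> 1
print(theta, w)                          # expect theta near 1, w near 2*theta
```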
Full Text
From Q(λ) to Average Q-learning: Efficient Implementation of an Asymptotic Approximation
Q(λ) is a reinforcement learning algorithm that combines Q-learning and TD(λ). Online implementations of Q(λ) that use eligibility traces have been shown to speed up basic Q-learning. In this paper we present an asymptotic analysis of Watkins’ Q(λ) with accumulating eligibility traces. We first introduce an asymptotic approximation of Q(λ) that appears to be a gain matrix variant of basic Q-learnin...
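A minimal sketch of Watkins’ Q(λ) with accumulating eligibility traces, following the standard tabular formulation in which traces are cut after an exploratory action; the env interface and the ε-greedy behaviour policy are assumptions for illustration.

```python
import numpy as np

def watkins_q_lambda(env, n_states, n_actions, lam=0.8, gamma=0.95,
                     alpha=0.1, epsilon=0.1, n_episodes=500, seed=0):
    """Watkins' Q(lambda) with accumulating eligibility traces.

    Assumes env.reset() -> state and env.step(action) -> (next_state,
    reward, done); both are illustrative interfaces, not from the paper.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(n_episodes):
        e = np.zeros_like(Q)
        s = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = eps_greedy(s_next)               # behaviour action
            a_star = int(np.argmax(Q[s_next]))        # greedy (target) action
            delta = r + gamma * Q[s_next, a_star] * (not done) - Q[s, a]
            e[s, a] += 1.0                            # accumulating trace
            Q += alpha * delta * e
            if a_next == a_star:
                e *= gamma * lam                      # greedy: decay traces
            else:
                e[:] = 0.0                            # exploratory: cut traces
            s, a = s_next, a_next
    return Q
```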
Full Text
Preconditioned Temporal Difference Learning
LSTD is numerically unstable for some ergodic Markov chains in which a few states are visited much more often than the others, because the matrix that LSTD accumulates has a large condition number. In this paper, we propose a variant of temporal difference learning with high data efficiency. A class of preconditioned temporal difference learning algorithms is also proposed to speed up the new met...
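A minimal batch LSTD sketch that makes the conditioning issue concrete: the accumulated matrix A can become ill-conditioned under very uneven state visitation, and the subsequent solve then amplifies noise. The transitions/phi interface and the small ridge term are illustrative assumptions, not the preconditioning scheme proposed in the paper.

```python
import numpy as np

def lstd(transitions, phi, n_features, gamma=0.95, reg=1e-6):
    """Batch least-squares TD (LSTD) with linear value-function approximation.

    `transitions` is an iterable of (s, r, s_next, done) tuples and phi(s)
    returns a feature vector; both interfaces are illustrative assumptions.
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for s, r, s_next, done in transitions:
        x, x_next = phi(s), phi(s_next)
        A += np.outer(x, x - gamma * (not done) * x_next)
        b += r * x
    # When a few states dominate the visits, A becomes ill-conditioned and the
    # plain solve is numerically unstable; a small ridge term is one remedy.
    cond_A = np.linalg.cond(A)
    theta = np.linalg.solve(A + reg * np.eye(n_features), b)
    return theta, cond_A
```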
Full Text
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal Difference Learning
The traditional Kalman filter can be viewed as a recursive stochastic algorithm that approximates an unknown function via a linear combination of prespecified basis functions given a sequence of noisy samples. In this paper, we generalize the algorithm to one that approximates the fixed point of an operator that is known to be a Euclidean norm contraction. Instead of noisy samples of the desire...
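A toy sketch of the underlying idea: a Robbins–Monro style recursion that approximates the fixed point of a known Euclidean-norm contraction from noisy evaluations. The contraction F and the noise model below are invented for illustration and are not the generalized Kalman filter itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def F(x):
    """A known Euclidean-norm contraction (modulus 0.5) with fixed point
    x* = (2, -2); chosen purely for illustration."""
    return 0.5 * x + np.array([1.0, -1.0])

# Robbins-Monro recursion on noisy evaluations of F: the iterate converges
# to the fixed point even though every sample of F(x) is corrupted by noise.
x = np.zeros(2)
for n in range(1, 50_001):
    noisy_F = F(x) + rng.normal(scale=0.2, size=2)
    x += (1.0 / n) * (noisy_F - x)
print(x)                                 # approximately [ 2., -2.]
```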
Full Text
Proximal Gradient Temporal Difference Learning Algorithms
In this paper, we describe proximal gradient temporal difference learning, which provides a principled way for designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not with respect to their original objective functions as previously attempted, but rather with respect to pri...
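For context, a sketch of the classic GTD2 recursion with linear features and importance-sampling ratios, the kind of gradient-TD update these proximal methods reinterpret; the transition interface and the constant stepsizes are illustrative assumptions, and this is not the proximal algorithm from the paper.

```python
import numpy as np

def gtd2(transitions, phi, n_features, gamma=0.95, alpha=0.01, beta=0.1):
    """GTD2-style gradient temporal-difference learning with linear features.

    `transitions` yields (s, rho, r, s_next, done), where rho is the
    importance-sampling ratio of the behaviour policy; the interface and
    the stepsizes are illustrative assumptions.
    """
    theta = np.zeros(n_features)          # value-function weights (slow iterate)
    w = np.zeros(n_features)              # auxiliary weights (fast iterate)
    for s, rho, r, s_next, done in transitions:
        x, x_next = phi(s), phi(s_next)
        delta = r + gamma * (not done) * (theta @ x_next) - theta @ x   # TD error
        theta += alpha * rho * (x - gamma * (not done) * x_next) * (w @ x)
        w += beta * rho * (delta - w @ x) * x
    return theta
```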
Full Text