Gradient Temporal Difference with Momentum: Stability and Convergence
Authors
Abstract
Gradient temporal difference (Gradient TD) algorithms are a popular class of stochastic approximation (SA) algorithms used for policy evaluation in reinforcement learning. Here, we consider Gradient TD algorithms with an additional heavy ball momentum term and provide a choice of step size and momentum parameter that ensures almost sure convergence of these algorithms asymptotically. In doing so, we decompose the heavy ball Gradient TD iterates into three separate iterates with different step sizes. We first analyze these iterates under a one-timescale SA setting using results from the current literature. However, the one-timescale case is restrictive, and a more general analysis can be provided by looking at a three-timescale decomposition of the iterates. In the process, we provide conditions for the stability of general three-timescale SA. We then prove that the heavy ball Gradient TD algorithm is convergent using our three-timescale SA analysis. Finally, we evaluate these algorithms on standard RL problems and report improvement in performance over the vanilla algorithms.
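To make the heavy-ball idea concrete, here is a minimal sketch, not the paper's exact Gradient TD algorithm: plain TD(0) with linear function approximation, augmented with a heavy-ball momentum term β(θ_k − θ_{k−1}). The function name, toy problem, and the step-size/momentum values are illustrative assumptions, not choices from the paper.

```python
import numpy as np

def td0_heavy_ball(transitions, features, alpha=0.05, beta=0.5, gamma=0.9):
    """TD(0) with linear function approximation and heavy-ball momentum.

    transitions: iterable of (s, r, s_next) sample triples
    features:    (n_states, d) feature matrix, row i = phi(state i)
    alpha, beta: step size and momentum parameter (hypothetical choices,
                 not the paper's prescribed schedule)
    """
    theta = np.zeros(features.shape[1])
    theta_prev = theta.copy()
    for s, r, s_next in transitions:
        phi, phi_next = features[s], features[s_next]
        delta = r + gamma * phi_next @ theta - phi @ theta  # TD error
        # vanilla TD step plus heavy-ball term beta * (theta_k - theta_{k-1})
        theta_next = theta + alpha * delta * phi + beta * (theta - theta_prev)
        theta_prev, theta = theta, theta_next
    return theta

# Toy 2-state Markov reward process: state 0 -> 1 (reward 1), state 1 -> 0 (reward 0).
features = np.eye(2)  # tabular features
transitions = [(k % 2, 1.0 - (k % 2), (k + 1) % 2) for k in range(5000)]
theta = td0_heavy_ball(transitions, features)
# True values here: V(0) = 1/(1 - gamma^2), V(1) = gamma/(1 - gamma^2)
```

With tabular features and a small constant step size this converges to the true values; the paper's contribution is an almost-sure convergence guarantee under diminishing step sizes, obtained via the three-timescale decomposition of exactly this kind of momentum iterate.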
Similar Articles
Gradient Temporal Difference Networks
Temporal-difference (TD) networks (Sutton and Tanner, 2004) are a predictive representation of state in which each node is an answer to a question about future observations or questions. Unfortunately, existing algorithms for learning TD networks are known to diverge, even in very simple problems. In this paper we present the first sound learning rule for TD networks. Our approach is to develop...
Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization
A. Proof of Theorem 1. We first recall the following lemma. Lemma 1 (Lemma 1, Gong et al., 2013). Under Assumptions 1–3, for any $\eta > 0$ and any $x, y$ such that $x = \mathrm{prox}_{\eta g}(y - \eta \nabla f(y))$, one has $F(x) \le F(y) - \left(\tfrac{1}{2\eta} - \tfrac{L}{2}\right)\|x - y\|^2$. Applying Lemma 1 with $x = x_k$, $y = y_k$, we obtain $F(x_k) \le F(y_k) - \left(\tfrac{1}{2\eta} - \tfrac{L}{2}\right)\|x_k - y_k\|^2$. (12) Since $\eta < \tfrac{1}{L}$, it follows that $F(x_k) \le F(y_k)$. Moreover, ...
Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization
In many modern machine learning applications, structures of underlying mathematical models often yield nonconvex optimization problems. Due to the intractability of nonconvexity, there is a rising need to develop efficient methods for solving general nonconvex problems with certain performance guarantee. In this work, we investigate the accelerated proximal gradient method for nonconvex program...
Accelerated Gradient Temporal Difference Learning
The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least squares methods. Least squares methods make the best use of available data by directly computing the TD solution and thus do not require tuning a typically highly sensitive learning rate parameter, but they require quadratic computation and storage. Recent algorith...
$\ell_1$ Regularized Gradient Temporal-Difference Learning
In this paper, we study Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. The recent development of Gradient TD (GTD) algorithms has addressed this problem successfully. However, the success of GTD algorithms requires a set of well-chosen features,...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i6.20601