Concentration Bounds for Two Timescale Stochastic Approximation with Applications to Reinforcement Learning
Abstract
Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). In such methods, the iterates consist of two parts that are updated using different stepsizes. We develop the first convergence rate result for these algorithms; in particular, we provide a general methodology for analyzing two-timescale linear SA. We apply our methodology to two-timescale RL algorithms such as GTD(0), GTD2, and TDC.
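As a concrete illustration of the two-part iterates, the sketch below shows a single TDC-style update with linear function approximation, where the auxiliary weights w are updated on a faster timescale (larger stepsizes) than the value-function weights theta. This is a minimal sketch for intuition only, not the paper's analysis; the function name tdc_update, the feature vectors phi and phi_next, and the stepsizes alpha and beta are illustrative assumptions.

import numpy as np

# Minimal sketch of one TDC update, a two-timescale linear SA scheme.
# theta: slow iterate (value-function weights), w: fast auxiliary iterate.
# phi, phi_next: feature vectors of the current and next state (1-D arrays);
# alpha, beta: stepsizes, with alpha much smaller than beta (two timescales).
def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    delta = reward + gamma * (phi_next @ theta) - (phi @ theta)  # TD error
    # Slow timescale: gradient-corrected TD update of the value weights.
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next)
    # Fast timescale: w tracks the projection of the expected TD error
    # onto the feature space.
    w = w + beta * (delta - phi @ w) * phi
    return theta, w

A two-timescale stepsize schedule would have alpha_t / beta_t -> 0, for example alpha_t = 1/t together with beta_t = 1/t^(2/3).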
Similar works
Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes
This article proposes several two-timescale simulation-based actor-critic algorithms for solution of infinite horizon Markov Decision Processes with finite state-space under the average cost criterion. Two of the algorithms are for the compact (non-discrete) action setting while the rest are for finite-action spaces. On the slower timescale, all the algorithms perform a gradient search over cor...
A model for the evolution of reinforcement learning in fluctuating games
Many species are able to learn to associate behaviors with rewards as this gives fitness advantages in changing environments. Social interactions between population members may, however, require more cognitive abilities than simple trial-and-error learning, in particular the capacity to make accurate hypotheses about the material payoff consequences of alternative action combinations. It is unc...
Natural actor-critic algorithms
We present four new reinforcement learning algorithms based on actor–critic, natural-gradient and function-approximation ideas, and we provide their convergence proofs. Actor–critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochasti...
Incremental Natural Actor-Critic Algorithms
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods...
Finite Sample Analysis for TD(0) with Linear Function Approximation
TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such a result. Works that managed to obtain concentration bounds for online Temporal Difference (TD) methods analyzed modified versions of them, carefully crafted f...
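For reference, a single TD(0) step with a linear value function V(s) = theta @ phi(s) can be sketched as below; the name td0_update and the symbols phi, phi_next, and alpha are illustrative assumptions rather than the notation of the cited work.

import numpy as np

# Minimal sketch of one TD(0) step with a linear value function.
# phi, phi_next: feature vectors of the current and next state (1-D arrays).
def td0_update(theta, phi, phi_next, reward, gamma, alpha):
    delta = reward + gamma * (phi_next @ theta) - (phi @ theta)  # TD error
    return theta + alpha * delta * phi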
Journal: CoRR
Volume: abs/1703.05376
Pages: -
Publication year: 2017