The Asymptotic Convergence-Rate of Q-learning
Abstract
In this paper we show that for discounted MDPs with discount factor γ > 1/2 the asymptotic rate of convergence of Q-learning is O(1/t^{R(1−γ)}) if R(1−γ) < 1/2 and O(√(log log t / t)) otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here R = p_min/p_max is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to convergent on-line learning provided that p_min > 0, where p_min and p_max now denote the minimum and maximum state-action occupation frequencies corresponding to the stationary distribution.
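To make the quantities in the abstract concrete, here is a minimal sketch of Q-learning under the sampling model the result assumes: state-action pairs drawn i.i.d. from a fixed distribution, with a step size of 1/(visit count) for each pair. The two-state MDP, its rewards, and the sampling probabilities below are illustrative assumptions, not taken from the paper; the sketch only shows how R = p_min/p_max and the predicted rate exponent R(1−γ) arise from the sampling distribution.

```python
import random

gamma = 0.9  # discount factor gamma > 1/2, as the result requires

# Hypothetical deterministic MDP: P[(s, a)] = (next_state, reward).
P = {(0, 0): (0, 1.0), (0, 1): (1, 0.0),
     (1, 0): (0, 0.0), (1, 1): (1, 2.0)}

# Fixed sampling distribution over state-action pairs (an assumption).
probs = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1}
pairs = list(probs)
weights = [probs[p] for p in pairs]

# Quantities appearing in the rate from the abstract.
R = min(weights) / max(weights)   # ratio of occupation frequencies, here 0.25
exponent = R * (1 - gamma)        # predicted polynomial exponent R(1 - gamma)

Q = {p: 0.0 for p in pairs}
visits = {p: 0 for p in pairs}
random.seed(0)
for t in range(200_000):
    s, a = random.choices(pairs, weights)[0]  # sample from fixed distribution
    s2, r = P[(s, a)]
    visits[(s, a)] += 1
    alpha = 1.0 / visits[(s, a)]              # 1/t-style step size per pair
    target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

With these probabilities R(1−γ) = 0.25 × 0.1 = 0.025 < 1/2, so the theorem's slow polynomial regime O(1/t^{0.025}) applies; rare pairs (here (1, 1)) are the bottleneck, which is visible in how slowly Q[(1, 1)] approaches its fixed point 2/(1−γ) = 20.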