Discrete-Time Deterministic $Q$ -Learning: A Novel Convergence Analysis
Authors
Abstract
Similar resources
Evaluating project’s completion time with Q-learning
Nowadays, project management is a key component of introductory operations management. Educators and researchers in these areas advocate representing a project as a network and applying network-model solution approaches to help project managers monitor completion. In this paper, we evaluate a project's completion time using the Q-learning algorithm. So the ...
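The abstract above refers to the standard tabular Q-learning update. A minimal sketch of that update on a tiny project network is shown below; the milestone names, activity durations, and hyperparameters are all hypothetical, chosen only to illustrate how the greedy value at the start state estimates minimum completion time.

```python
# Minimal tabular Q-learning sketch (hypothetical 3-node project network).
# States are project milestones; actions pick the next activity; the reward
# is the negative activity duration, so the greedy policy minimizes
# completion time. All names and numbers here are illustrative.

# transitions[state][action] = (next_state, duration)
transitions = {
    "start": {"a1": ("mid", 3.0), "a2": ("mid", 5.0)},
    "mid":   {"a3": ("done", 2.0)},
    "done":  {},
}

alpha, gamma = 0.5, 1.0  # learning rate; no discounting (episodic task)
Q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}

for _ in range(200):                      # sweep every state-action pair
    for s, acts in transitions.items():
        for a, (s2, dur) in acts.items():
            best_next = max(Q[s2].values(), default=0.0)
            # Q-learning update: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (-dur + gamma * best_next - Q[s][a])

# Negated greedy value at "start" estimates the minimum completion time: 3 + 2 = 5
print(-max(Q["start"].values()))
```

Since this deterministic toy MDP is swept exhaustively, the values converge geometrically; a real project network would sample trajectories instead.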
Fault Detection for Nonlinear Discrete-Time Systems via Deterministic Learning
Recently, an approach for the rapid detection of small oscillation faults based on deterministic learning theory was proposed for continuous-time systems. In this paper, a fault detection scheme is proposed for a class of nonlinear discrete-time systems via deterministic learning. By using a discrete-time extension of the deterministic learning algorithm, the general fault functions (i.e., the inte...
Fastest Convergence for Q-learning
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins’ original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-s...
The Asymptotic Convergence-Rate of Q-learning
In this paper we show that for discounted MDPs with discount factor $\gamma > 1/2$ the asymptotic rate of convergence of Q-learning is $O(1/t^{R(1-\gamma)})$ if $R(1-\gamma) < 1/2$ and $O(\sqrt{\log\log t / t})$ otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here $R = p_{\min}/p_{\max}$ is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to conv...
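The regime condition in this abstract is easy to evaluate numerically. The snippet below computes $R = p_{\min}/p_{\max}$ for a made-up sampling distribution and reports which of the two rate regimes applies; the frequencies and discount factor are illustrative, not taken from the paper.

```python
# Illustrative check of the rate-regime condition from the abstract:
# with sampling frequencies p over state-action pairs, R = p_min / p_max.
# If R*(1 - gamma) < 1/2 the rate is O(1/t^(R*(1-gamma))), otherwise
# O(sqrt(log log t / t)). The distribution below is hypothetical.

p = [0.4, 0.3, 0.2, 0.1]     # hypothetical state-action sampling frequencies
gamma = 0.9                  # discount factor (> 1/2, as the result requires)

R = min(p) / max(p)          # ratio of min to max occupation frequency
exponent = R * (1 - gamma)   # polynomial-rate exponent

if exponent < 0.5:
    rate = f"O(1/t^{exponent:.3f})"
else:
    rate = "O(sqrt(log log t / t))"

print(R, rate)   # R = 0.25, exponent = 0.025 < 0.5 -> polynomial regime
```

Note that a highly uneven sampling distribution (small $R$) drives the exponent toward zero, which is the slow-convergence phenomenon the paper quantifies.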
Journal
Journal title: IEEE Transactions on Cybernetics
Year: 2017
ISSN: 2168-2267, 2168-2275
DOI: 10.1109/tcyb.2016.2542923