Discrete-Time Deterministic $Q$ -Learning: A Novel Convergence Analysis


Similar resources

Evaluating project’s completion time with Q-learning

Nowadays, project management is a key component of introductory operations management. Educators and researchers in these areas advocate representing a project as a network and applying the solution approaches for network models, to help project managers monitor completion. In this paper, we evaluate a project's completion time using the Q-learning algorithm. So the ...
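For orientation, the update rule behind all of the Q-learning work listed here is Watkins' tabular update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,(r + \gamma \max_b Q(s',b) - Q(s,a))$. The sketch below is a minimal illustration of that generic update on a toy two-state example; the state and action names, step size, and discount factor are illustrative assumptions, not details taken from the paper above.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(Q[(s_next, b)] for b in actions)  # greedy bootstrap value
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Toy chain (hypothetical): from "start", action "go" reaches terminal
# "done" with reward 1; Q(done, .) stays 0, so Q(start, go) -> 1.
Q = defaultdict(float)          # unseen (state, action) pairs default to 0
actions = ["go", "wait"]
for _ in range(200):
    q_update(Q, "start", "go", 1.0, "done", actions)
print(round(Q[("start", "go")], 3))  # → 1.0
```

With a fixed step size $\alpha$ the iterate approaches the fixed point geometrically, at rate $(1-\alpha)^t$ on this one-transition example; the stochastic, decaying-step-size setting is exactly where the convergence-rate results cited further down apply.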


Fault Detection for Nonlinear Discrete-Time Systems via Deterministic Learning

Recently, an approach for the rapid detection of small oscillation faults based on deterministic learning theory was proposed for continuous-time systems. In this paper, a fault detection scheme is proposed for a class of nonlinear discrete-time systems via deterministic learning. By using a discrete-time extension of the deterministic learning algorithm, the general fault functions (i.e., the inte...


Fastest Convergence for Q-learning

The Zap Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-s...


The Asymptotic Convergence-Rate of Q-learning

In this paper we show that for discounted MDPs with discount factor $\gamma > 1/2$, the asymptotic rate of convergence of Q-learning is $O(1/t^{R(1-\gamma)})$ if $R(1-\gamma) < 1/2$ and $O(\sqrt{\log\log t / t})$ otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here $R = p_{\min}/p_{\max}$ is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to conv...



Journal

Journal title: IEEE Transactions on Cybernetics

Year: 2017

ISSN: 2168-2267, 2168-2275

DOI: 10.1109/tcyb.2016.2542923