TD(λ) converges with probability 1

Related resources

TD(λ) Converges with Probability 1

The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences, the expected values of the predictions co...

TD(0) Converges Provably Faster than the Residual Gradient Algorithm

In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges more slowly than the TD(0) algorithm. In this paper, we use the concept of asymptotic convergence rate to prove that under certain conditions the synchronous off-policy TD(0) algorithm converges faster than the synchronous off-policy residual gradient algorithm if the value function...

Alzheimer's Disease Neurodevelopment Converges with Neurodegeneration

Alzheimer's disease (AD) is the most common senile dementia in the elderly. The disease is characterized behaviorally by global cognitive decline, and defined histologically by two di...

...dues, perhaps by a CNR-associated Src family protein tyrosine kinase. Phosphorylation of Dab somehow causes the migrating neuron to cease migrating, release from its substratum, the radial glial cell, and begin to...

Factored Value Iteration Converges

In this paper we propose a novel algorithm, factored value iteration (FVI), for the approximate solution of factored Markov decision processes (fMDPs). The traditional approximate value iteration algorithm is modified in two ways. For one, the least-squares projection operator is modified so that it does not increase max-norm, and thus preserves convergence. The other modification is that we un...

Journal

Journal title: Machine Learning

Year: 1994

ISSN: 0885-6125, 1573-0565

DOI: 10.1007/bf00993978