An Analysis of Temporal-Difference Learning with Function Approximation
Authors
Abstract
We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator on-line, during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. Furthermore, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. In addition to proving new and stronger positive results than those previously available, we identify the significance of on-line updating and potential hazards associated with the use of nonlinear function approximators. First, we prove that divergence may occur when updates are not based on trajectories of the Markov chain. This fact reconciles positive and negative results that have been discussed in the literature regarding the soundness of temporal-difference learning. Second, we present an example illustrating the possibility of divergence when temporal-difference learning is used in the presence of a nonlinear function approximator.
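As a concrete illustration of the algorithm being analyzed, the following is a minimal sketch of TD(λ) with a linear function approximator, updated on-line along a single trajectory of a Markov chain. The chain, per-state costs, features, discount factor, λ, and step sizes below are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Sketch of on-line TD(lambda) with a linear function approximator.
# All numerical choices here (chain, costs, features, step sizes) are
# illustrative assumptions, not values from the paper.

rng = np.random.default_rng(0)

n_states = 5
P = rng.dirichlet(np.ones(n_states), size=n_states)   # random transition matrix
g = rng.uniform(0.0, 1.0, size=n_states)              # per-state cost
gamma = 0.9                                           # discount factor
lam = 0.7                                             # TD(lambda) parameter

Phi = rng.normal(size=(n_states, 3))                  # feature vector phi(x) per state
theta = np.zeros(3)                                   # weights: V(x) ~ phi(x)^T theta
z = np.zeros(3)                                       # eligibility trace

x = 0
for t in range(100_000):
    x_next = rng.choice(n_states, p=P[x])
    # Temporal-difference error for the observed transition x -> x_next.
    delta = g[x] + gamma * Phi[x_next] @ theta - Phi[x] @ theta
    # Accumulate the eligibility trace and update the weights on-line.
    z = gamma * lam * z + Phi[x]
    alpha = 1.0 / (t + 1)                             # diminishing step size
    theta = theta + alpha * delta * z
    x = x_next

print("learned weights:", theta)
```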
Similar resources
An Analysis of Temporal-Difference Learning with Function Approximation
We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator on-line, during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with proba...
LIDS-P-2390, May 1997: Average Cost Temporal-Difference Learning
We propose a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations are comprised of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present a proof of convergence (with probability 1), and a characterization of th...
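The average-cost variant summarized above can be sketched in the same style: a running estimate of the average cost and incrementally updated linear weights for the differential cost, along one trajectory. The chain, features, λ, and step sizes below are illustrative assumptions, not taken from that paper.

```python
import numpy as np

# Sketch of average-cost TD(lambda): mu tracks the average cost, and r are
# linear weights for the differential cost. Numerical choices are illustrative
# assumptions, not values from the paper.

rng = np.random.default_rng(1)

n_states = 5
P = rng.dirichlet(np.ones(n_states), size=n_states)   # transition probabilities
g = rng.uniform(0.0, 1.0, size=n_states)              # per-state cost
Phi = rng.normal(size=(n_states, 3))                   # fixed basis functions
lam = 0.5

r = np.zeros(3)        # weights of the differential-cost approximation
mu = 0.0               # running estimate of the average cost
z = np.zeros(3)        # eligibility trace

x = 0
for t in range(200_000):
    x_next = rng.choice(n_states, p=P[x])
    # TD error relative to the current average-cost estimate.
    delta = g[x] - mu + Phi[x_next] @ r - Phi[x] @ r
    z = lam * z + Phi[x]
    step = 1.0 / (t + 1)
    mu = mu + step * (g[x] - mu)       # incremental average-cost update
    r = r + step * delta * z           # incremental weight update
    x = x_next

print("average cost estimate:", mu)
print("differential-cost weights:", r)
```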
Stable Function Approximation in Dynamic Programming
The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal-difference methods such as value iteration. Experiments in this area have produced mixed results; there have been both notable successes and notable disappointments. Theory has been scarce, mostly due to the difficulty of reasoning about function approximators that gen...
Average Cost Temporal-Difference Learning
We propose a variant of temporal-difference learning that approximates average and differential costs of an irreducible aperiodic Markov chain. Approximations are comprised of linear combinations of fixed basis functions whose weights are incrementally updated during a single endless trajectory of the Markov chain. We present a proof of convergence (with probability 1), and a characterization of the limit...
Learning and value function approximation in complex decision processes
In principle, a wide variety of sequential decision problems, ranging from dynamic resource allocation in telecommunication networks to financial risk management, can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortuna...