On Step-Size and Bias in Temporal-Difference Learning

Authors

  • Richard S. Sutton
  • Satinder P. Singh
Abstract

We present results for three new algorithms for setting the step-size parameters, α and λ, of temporal-difference learning methods such as TD(λ). The overall task is that of learning to predict the outcome of an unknown Markov chain based on repeated observations of its state trajectories. The new algorithms select step-size parameters online in such a way as to eliminate the bias normally inherent in temporal-difference methods. We compare our algorithms with conventional Monte Carlo methods. Monte Carlo methods have a natural way of setting the step size: for each state s they use a step size of 1/n_s, where n_s is the number of times state s has been visited. We seek and come close to achieving comparable step-size algorithms for TD(λ). One new algorithm uses a λ = 1/n_s schedule to achieve the same effect as processing a state backwards with TD(0), but remains completely incremental. Another algorithm uses a λ at each time equal to the estimated transition probability of the current transition. We present empirical results showing improvement in convergence rate over Monte Carlo methods and conventional TD(λ). A limitation of our results at present is that they apply only to tasks whose state trajectories do not contain cycles.
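The sketch below is a point of reference for the two step-size ideas the abstract contrasts. It sets Monte Carlo with step size 1/n_s against TD(0) using the same 1/n_s schedule, with each trajectory processed backwards. That backward processing is the effect the first new algorithm is said to reproduce incrementally. This is a minimal illustration, not the authors' algorithm, and it does not implement the λ schedules themselves. It assumes tabular values, acyclic trajectories with no intermediate rewards, and a single outcome observed at termination. The function names and the `(states, outcome)` data format are hypothetical.

```python
from collections import defaultdict

# Hypothetical data format (an assumption, not from the paper): each trajectory is
# (states, outcome), where `states` lists the non-terminal states in visiting order
# and `outcome` is the value observed when the chain terminates. No state repeats
# within a trajectory, matching the acyclic setting described in the abstract.
Trajectory = tuple[list[str], float]


def monte_carlo_estimates(trajectories: list[Trajectory]) -> dict[str, float]:
    """Every-visit Monte Carlo with step size alpha = 1/n_s."""
    V: dict[str, float] = defaultdict(float)
    n: dict[str, int] = defaultdict(int)
    for states, outcome in trajectories:
        for s in states:
            n[s] += 1
            # With alpha = 1/n_s this update keeps V[s] equal to the running mean
            # of all outcomes observed after visiting s.
            V[s] += (outcome - V[s]) / n[s]
    return dict(V)


def backward_td0_estimates(trajectories: list[Trajectory]) -> dict[str, float]:
    """TD(0) with step size alpha = 1/n_s, sweeping each trajectory in reverse."""
    V: dict[str, float] = defaultdict(float)
    n: dict[str, int] = defaultdict(int)
    for states, outcome in trajectories:
        target = outcome  # the terminal outcome is the target for the last state
        for s in reversed(states):
            n[s] += 1
            V[s] += (target - V[s]) / n[s]
            # Because we sweep backwards, the predecessor's TD(0) target is the
            # freshly updated estimate of its successor.
            target = V[s]
    return dict(V)


if __name__ == "__main__":
    data = [(["A", "B"], 1.0), (["B"], 0.0), (["A", "B"], 0.0)]
    print(monte_carlo_estimates(data))   # A: 0.5, B: about 0.333
    print(backward_td0_estimates(data))  # A: about 0.667, B: about 0.333
```

For B, the last state before termination, the two rules coincide. They differ at A. Monte Carlo averages A's raw outcomes (1 and 0). Backward TD(0) instead averages the estimates V(B) held each time A was visited (1, then 1/3), so A's value also draws on the trajectory that started at B.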

Similar Resources

Analytical Mean Squared Error Curves in Temporal Difference Learning

We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its steps...

Orr Sommerfeld Solver Using Mapped Finite Difference Scheme for Plane Wake Flow

Linear stability analysis of the three-dimensional plane wake flow is performed using a mapped finite difference scheme in a domain which is doubly infinite in the cross-stream direction of wake flow. The physical domain in the cross-stream direction is mapped to the computational domain using a cotangent mapping of the form y = ?cot(??). The Squire transformation [2], proposed by Squire, is also us...

Factors influencing ascertainment bias of microsatellite allele sizes: impact on estimates of mutation rates

Microsatellite loci play an important role as markers for identification, disease gene mapping and evolutionary studies. Mutation rate, which is of fundamental importance, can be obtained from interspecies comparisons, which however are subject to ascertainment bias. This bias arises for example when a locus is selected based on its large allele size in one species (cognate species 1), in which ...

An Analysis of Temporal-Difference Learning with Function Approximation

We discuss the temporal-difference learning algorithm as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator online during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence with probabili...

An Analysis of Temporal-Difference Learning with Function Approximation

We discuss the temporal-difference learning algorithm, as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator on-line, during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence (with proba...

Publication date: 1994