Analytical Mean Squared Error Curves in Temporal Di erence Learning
نویسندگان
چکیده
We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal di erence value estimation algorithms change with o ine updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its stepsize and eligibility trace parameters.
منابع مشابه
Analytical Mean Squared Error Curves in Temporal Difference Learning
Peter Dayan Brain and Cognitive Sciences E25-210, MIT Cambridge, MA 02139 [email protected] We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve...
متن کاملAnalytical Mean Squared Error Curves in Temporal Diierence Learning
We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal diierence value estimation algorithms change with ooine updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its step-...
متن کاملAn Analysis of Temporal Di erence Learning with Function Approximation
We discuss the temporal di erence learning algorithm as applied to approximating the cost to go function of an in nite horizon discounted Markov chain The algorithm we analyze updates parameters of a linear function approximator on line during a single endless traject ory of an irreducible aperiodic Markov chain with a nite or in nite state space We present a proof of convergence with probabili...
متن کاملAn Analysis of Temporal - Di erence Learning with Function Approximation 1
We discuss the temporal-di erence learning algorithm, as applied to approximating the cost-to-go function of an in nite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator on{line, during a single endless trajectory of an irreducible aperiodic Markov chain with a nite or in nite state space. We present a proof of convergence (with proba...
متن کاملRegularization Learning of Neural Networks for Generalization
In this paper, we propose a learning method of neural networks based on the regularization method and analyze its generalization capability. In learning from examples, training samples are independently drawn from some unknown probability distribution. The goal of learning is minimizing the expected risk for future test samples, which are also drawn from the same distribution. The problem can b...
متن کامل