Improving Temporal Difference Learning for Deterministic Sequential Decision Problems
Authors
Abstract
We present a new connectionist framework for solving deterministic sequential decision problems based on temporal difference learning and dynamic programming. The framework is compared to Tesauro's approach of learning a strategy for playing backgammon, a probabilistic board game. We show that his approach is not directly applicable to deterministic games, but that simple modifications lead to impressive results. Furthermore, we demonstrate that combining the method with techniques from dynamic programming and introducing a re-learning factor can reduce the necessary training time by several orders of magnitude and improve overall performance.
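The core mechanism shared by this line of work is the temporal difference value update. As a minimal sketch (a hypothetical three-state deterministic chain, not the paper's backgammon or connectionist setup), tabular TD(0) on a deterministic episodic task looks like this:

```python
# Minimal sketch of tabular TD(0) for a deterministic episodic task.
# The chain 0 -> 1 -> 2 (terminal) and all parameters are illustrative
# assumptions, not taken from the paper.

def td0_update(V, s, s_next, reward, alpha=0.1, gamma=1.0):
    """One TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    V[s] = V[s] + alpha * (reward + gamma * V.get(s_next, 0.0) - V[s])
    return V

# Deterministic transitions 0 -> 1 -> 2; reward 1 on reaching the terminal
# state 2, whose value is fixed at 0.
V = {0: 0.0, 1: 0.0, 2: 0.0}
for _ in range(100):
    td0_update(V, 0, 1, 0.0)   # step 0 -> 1, reward 0
    td0_update(V, 1, 2, 1.0)   # step 1 -> 2, reward 1

# With gamma = 1, both V(0) and V(1) approach the true return of 1.0.
```

In a deterministic game the same successor always follows a given state, which is exactly why the stochastic sampling that makes Tesauro's setup work needs the modifications the abstract describes.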
Similar resources
Analytical Mean Squared Error Curves in Temporal Difference Learning
We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its steps...
Full text
An Analysis of Temporal Difference Learning with Function Approximation
We discuss the temporal difference learning algorithm as applied to approximating the cost-to-go function of an infinite-horizon discounted Markov chain. The algorithm we analyze updates parameters of a linear function approximator on-line during a single endless trajectory of an irreducible aperiodic Markov chain with a finite or infinite state space. We present a proof of convergence with probability...
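The linear-approximation variant analyzed above replaces the lookup table with a parameter vector, updating it along the sampled trajectory. A hedged sketch, using hypothetical one-hot features on a small deterministic chain rather than the irreducible-chain setting of the paper:

```python
import numpy as np

# Sketch of TD(0) with a linear function approximator, V(s) ~ theta . phi(s).
# The three-state chain, one-hot features, and parameters are illustrative
# assumptions, not the setting analyzed in the cited work.

def linear_td0_step(theta, phi_s, phi_next, reward, alpha=0.05, gamma=0.9):
    """theta <- theta + alpha * (r + gamma * theta.phi(s') - theta.phi(s)) * phi(s)."""
    td_error = reward + gamma * phi_next @ theta - phi_s @ theta
    return theta + alpha * td_error * phi_s

phi = np.eye(3)          # one-hot features for states 0, 1, 2
terminal = np.zeros(3)   # absorbing successor has value 0
theta = np.zeros(3)
for _ in range(500):
    theta = linear_td0_step(theta, phi[0], phi[1], 0.0)  # 0 -> 1, reward 0
    theta = linear_td0_step(theta, phi[1], phi[2], 0.0)  # 1 -> 2, reward 0
    theta = linear_td0_step(theta, phi[2], terminal, 1.0)  # 2 -> end, reward 1

# With gamma = 0.9 the discounted values are 1.0, 0.9, 0.81 for states 2, 1, 0.
```

With one-hot features this reduces exactly to the tabular case; the convergence question the paper addresses arises when the features generalize across states.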
Full text
Learning and value function approximation in complex decision processes
In principle, a wide variety of sequential decision problems, ranging from dynamic resource allocation in telecommunication networks to financial risk management, can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortunately...
Full text
Evolutionary Algorithms for Reinforcement Learning
There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary...
Full text
Stable Function Approximation in Dynamic Programming
The success of reinforcement learning in practical problems depends on the ability to combine function approximation with temporal difference methods such as value iteration. Experiments in this area have produced mixed results; there have been both notable successes and notable disappointments. Theory has been scarce, mostly due to the difficulty of reasoning about function approximators that gen...
Full text