Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding

1 Reinforcement Learning and Function Approximation
2 Good Convergence on Control Problems
Author
Abstract
On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative results in attempting to apply dynamic programming together with function approximation to simple control problems with continuous state spaces. In this paper, we present positive results for all the control tasks they attempted, and for one that is significantly larger. The most important differences are that we used sparse-coarse-coded function approximators (CMACs) whereas they used mostly global function approximators, and that we learned online whereas they learned offline. Boyan and Moore and others have suggested that the problems they encountered could be solved by using actual outcomes ("rollouts"), as in classical Monte Carlo methods, and as in the TD(λ) algorithm when λ = 1. However, in our experiments this always resulted in substantially poorer performance. We conclude that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.

Reinforcement learning is a broad class of optimal control methods based on estimating value functions. Many of these methods, e.g., dynamic programming and temporal-difference learning, build their estimates in part on the basis of other estimates. This may be worrisome because, in practice, the estimates never become exact; on large problems, parameterized function approximators such as neural networks must be used. Because the estimates are imperfect, and because they in turn are used as the targets for other estimates, it seems possible that the ultimate result might be very poor estimates, or even divergence. Indeed, some such methods have been shown to be unstable in theory. What are the key requirements of a method or task in order to obtain good performance? The experiments in this paper are part of narrowing the answer to this question.

The reinforcement learning methods we use are variations of the Sarsa algorithm (Rummery & Niranjan, 1994). This method is the same as TD(λ), except applied to state-action pairs instead of states, and where the predictions are used as the basis for selecting actions. The learning agent estimates action-values, Qπ(s, a), defined as the expected future reward starting in state s, taking action a, and thereafter following policy π. These are estimated for all states and actions, and …
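To make the sparse-coarse-coding idea concrete, here is a minimal tile-coding (CMAC-style) sketch. It is an illustration under assumptions, not the paper's implementation: the state is taken to lie in [0, 1)^2, and the names tile_indices, num_tilings, and tiles_per_dim are invented for the example. Each tiling is a uniform grid, offset from the others, so every state activates exactly one tile per tiling and nearby states share most of their active tiles.

```python
def tile_indices(state, num_tilings=8, tiles_per_dim=10):
    """Return the active tile index for each tiling (one per tiling).

    Assumes a 2-D state with both components in [0, 1). Each tiling is
    a uniform tiles_per_dim x tiles_per_dim grid, shifted by a different
    fraction of a tile width so the tilings overlap one another.
    """
    x, y = state
    active = []
    for t in range(num_tilings):
        offset = t / num_tilings  # per-tiling shift, in tile widths
        col = int(x * tiles_per_dim + offset) % tiles_per_dim
        row = int(y * tiles_per_dim + offset) % tiles_per_dim
        # Flatten (tiling, row, col) into one index into a weight vector.
        active.append((t * tiles_per_dim + row) * tiles_per_dim + col)
    return active
```

The resulting feature vector is binary and very sparse: of the num_tilings * tiles_per_dim**2 features (800 here), exactly num_tilings are active at any time, which is what keeps per-step updates cheap enough for online learning.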
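Such features might plug into Sarsa with eligibility traces roughly as follows. This is a hedged sketch, not the paper's code: the Gym-style env.reset()/env.step() interface, the helper names, and the step-size and trace parameters are all assumptions for the example, and the paper's own variant handles replacing traces somewhat differently (e.g., clearing the traces of non-selected actions).

```python
import random

def q_value(weights, state, action):
    # With binary tile features, Q(s, a) is a sum of active weights.
    return sum(weights[action][i] for i in tile_indices(state))

def epsilon_greedy(weights, state, actions, epsilon):
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_value(weights, state, a))

def sarsa_lambda_episode(env, weights, actions,
                         alpha=0.05, gamma=1.0, lam=0.9, epsilon=0.1):
    """Run one episode of linear Sarsa(lambda) over tile features.

    weights maps each action to a list of floats, one per tile;
    env.reset() -> state and env.step(a) -> (state, reward, done)
    are assumed, Gym-style, purely for illustration.
    """
    traces = {a: [0.0] * len(weights[a]) for a in actions}
    state = env.reset()
    action = epsilon_greedy(weights, state, actions, epsilon)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        # TD error: delta = r + gamma * Q(s', a') - Q(s, a).
        delta = reward - q_value(weights, state, action)
        next_action = None
        if not done:
            next_action = epsilon_greedy(weights, next_state, actions, epsilon)
            delta += gamma * q_value(weights, next_state, next_action)
        # Replacing traces: set eligibility to 1 on the active features.
        for i in tile_indices(state):
            traces[action][i] = 1.0
        # Update all eligible weights, then decay their traces.
        for a in actions:
            w, tr = weights[a], traces[a]
            for i, e in enumerate(tr):
                if e:
                    w[i] += alpha * delta * e
                    tr[i] = gamma * lam * e
        state, action = next_state, next_action
```

In practice the step size alpha is often divided by the number of tilings, since that many features are active on every step.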
Similar resources
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year’s meeting a series of negative ...
Full text

Advances in Neural Information Processing Systems, MIT Press: Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative re...
Full text

Tree-Based Batch Mode Reinforcement Learning
Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (x_t, u_t, r_t, x_{t+1}), where x_t denotes the system state at time t, u_t the control action taken, r_t the instantaneous reward obtained and x_{t+1} the succe...
Full text

Abstraction and Generalization in Reinforcement Learning: A Summary and Framework
Abstraction and Generalization in Reinforcement Learning: A Summary and Framework. Marc Ponsen, Matthew E. Taylor, and Karl Tuyls. 1 Universiteit Maastricht, Maastricht, The Netherlands, {m.ponsen,k.tuyls}@maastrichtuniversity.nl; 2 The University of Southern California, Los Angeles, CA, [email protected]. Abstract: In this paper we survey the basics of reinforcement learning, generalization and abstraction. W...
Full text

Manifold Representations for Value-Function Approximation in Reinforcement Learning
Reinforcement learning (RL) has shown itself to be a successful paradigm for solving optimal control problems. However, that success has been mostly limited to problems with a finite set of states and actions. The problem of extending reinforcement learning techniques to the continuous state case has received quite a bit of attention in the last few years. One approach to solving reinforcement ...
Full text