A Cross Entropy based Stochastic Approximation Algorithm for Reinforcement Learning with Linear Function Approximation
نویسندگان
چکیده
In this paper, we provide a new algorithm for the problem of prediction in Reinforcement Learning, i.e., estimating the Value Function of a Markov Reward Process (MRP) using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithm is a multi-timescale variant of the very popular Cross Entropy (CE) method which is a model based search method to find the global optimum of a realvalued function. This is the first time a model based search method is used for the prediction problem. The application of CE to a stochastic setting is a completely unexplored domain. A proof of convergence using the ODE method is provided. The theoretical results are supplemented with experimental comparisons. The algorithm achieves good performance fairly consistently on many RL benchmark problems. This demonstrates the competitiveness of our algorithm against least squares and other state-of-the-art algorithms in terms of computational efficiency, accuracy and stability.
منابع مشابه
Basis Function Adaptation in Temporal Difference Reinforcement Learning
We examine methods for on-line optimization of the basis function for temporal difference Reinforcement Learning algorithms. We concentrate on architectures with a linear parameterization of the value function. Our methods optimize the weights of the network while simultaneously adapting the parameters of the basis functions in order to decrease the Bellman approximation error. A gradient-based...
متن کاملVerification of an Evolutionary-based Wavelet Neural Network Model for Nonlinear Function Approximation
Nonlinear function approximation is one of the most important tasks in system analysis and identification. Several models have been presented to achieve an accurate approximation on nonlinear mathematics functions. However, the majority of the models are specific to certain problems and systems. In this paper, an evolutionary-based wavelet neural network model is proposed for structure definiti...
متن کاملCombination of Approximation and Simulation Approaches for Distribution Functions in Stochastic Networks
This paper deals with the fundamental problem of estimating the distribution function (df) of the duration of the longest path in the stochastic activity network such as PERT network. First a technique is introduced to reduce variance in Conditional Monte Carlo Sampling (CMCS). Second, based on this technique a new procedure is developed for CMCS. Third, a combined approach of simulation and ap...
متن کاملKalman Temporal Differences
Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncert...
متن کاملAdaptive Approximation-Based Control for Uncertain Nonlinear Systems With Unknown Dead-Zone Using Minimal Learning Parameter Algorithm
This paper proposes an adaptive approximation-based controller for uncertain strict-feedback nonlinear systems with unknown dead-zone nonlinearity. Dead-zone constraint is represented as a combination of a linear system with a disturbance-like term. This work invokes neural networks (NNs) as a linear-in-parameter approximator to model uncertain nonlinear functions that appear in virtual and act...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1609.09449 شماره
صفحات -
تاریخ انتشار 2016