A Cross Entropy based Stochastic Approximation Algorithm for Reinforcement Learning with Linear Function Approximation

Authors

Ajin George Joseph

Shalabh Bhatnagar

Abstract

In this paper, we provide a new algorithm for the prediction problem in Reinforcement Learning, i.e., estimating the value function of a Markov Reward Process (MRP) using a linear function approximation architecture, with memory and computation costs that scale quadratically in the size of the feature set. The algorithm is a multi-timescale variant of the very popular Cross Entropy (CE) method, a model-based search method for finding the global optimum of a real-valued function. This is the first time a model-based search method has been used for the prediction problem, and the application of CE to a stochastic setting is a completely unexplored domain. A proof of convergence using the ODE method is provided. The theoretical results are supplemented with experimental comparisons: the algorithm achieves good performance fairly consistently on many RL benchmark problems, which demonstrates its competitiveness against least squares and other state-of-the-art algorithms in terms of computational efficiency, accuracy and stability.
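As a rough illustration of the Cross Entropy idea referred to above, the following Python sketch shows the classical batch CE method applied to a hypothetical prediction problem. The CE loop samples parameter vectors from a Gaussian, keeps an elite fraction, and refits the Gaussian with smoothing. It is not the authors' multi-timescale stochastic approximation algorithm. The sketch assumes a known model, namely a small 3-state MRP for which the mean squared projected Bellman error (MSPBE) of a linear value function can be computed exactly. The choice of MSPBE as the objective, the MRP, the features, and the parameters `n_samples`, `elite_frac`, `n_iters` and `alpha` are all assumptions made for this example.

```python
import numpy as np


def mspbe_objective(P, r, gamma, Phi, d):
    """Build f(theta) = -MSPBE(theta) for an MRP with a KNOWN model (assumed setup).

    Uses MSPBE(theta) = (b - A theta)^T C^{-1} (b - A theta) with
    A = Phi^T D (I - gamma P) Phi, b = Phi^T D r, C = Phi^T D Phi.
    """
    D = np.diag(d)
    A = Phi.T @ D @ (np.eye(len(r)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    C_inv = np.linalg.inv(Phi.T @ D @ Phi)

    def f(theta):
        err = b - A @ theta
        return -float(err @ C_inv @ err)  # CE maximizes, so negate the error

    return f, A, b


def cross_entropy_maximize(f, dim, n_samples=200, elite_frac=0.1,
                           n_iters=50, alpha=0.7, seed=0):
    """Classical batch CE search with a diagonal Gaussian sampling distribution."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    sigma = np.full(dim, 10.0)  # wide initial distribution
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # 1. Sample candidate parameter vectors from the current Gaussian model.
        samples = mu + sigma * rng.standard_normal((n_samples, dim))
        # 2. Score every sample and keep the best (elite) fraction.
        scores = np.array([f(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        # 3. Refit the Gaussian to the elites, smoothed with the previous model.
        mu = alpha * elite.mean(axis=0) + (1 - alpha) * mu
        sigma = alpha * elite.std(axis=0) + (1 - alpha) * sigma
    return mu


# Hypothetical 3-state MRP; P is doubly stochastic, so the uniform d is stationary.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])
d = np.full(3, 1.0 / 3.0)

f, A, b = mspbe_objective(P, r, gamma, Phi, d)
theta_ce = cross_entropy_maximize(f, dim=2)
theta_lstd = np.linalg.solve(A, b)  # closed-form zero of the MSPBE (LSTD fixed point)
print(theta_ce, theta_lstd)
```

In this sketch, every CE iteration evaluates the exact objective at each sample. In the stochastic setting described in the abstract, only sampled transitions are available. That is presumably why the authors' variant interleaves CE updates with stochastic approximation on multiple timescales rather than using a batch loop like this one.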


Similar articles

Basis Function Adaptation in Temporal Difference Reinforcement Learning

We examine methods for on-line optimization of the basis function for temporal difference Reinforcement Learning algorithms. We concentrate on architectures with a linear parameterization of the value function. Our methods optimize the weights of the network while simultaneously adapting the parameters of the basis functions in order to decrease the Bellman approximation error. A gradient-based...

Verification of an Evolutionary-based Wavelet Neural Network Model for Nonlinear Function Approximation

Nonlinear function approximation is one of the most important tasks in system analysis and identification. Several models have been presented to achieve an accurate approximation of nonlinear mathematical functions. However, the majority of the models are specific to certain problems and systems. In this paper, an evolutionary-based wavelet neural network model is proposed for structure definiti...

Combination of Approximation and Simulation Approaches for Distribution Functions in Stochastic Networks

This paper deals with the fundamental problem of estimating the distribution function (df) of the duration of the longest path in the stochastic activity network such as PERT network. First a technique is introduced to reduce variance in Conditional Monte Carlo Sampling (CMCS). Second, based on this technique a new procedure is developed for CMCS. Third, a combined approach of simulation and ap...

Kalman Temporal Differences

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncert...

Adaptive Approximation-Based Control for Uncertain Nonlinear Systems With Unknown Dead-Zone Using Minimal Learning Parameter Algorithm

This paper proposes an adaptive approximation-based controller for uncertain strict-feedback nonlinear systems with unknown dead-zone nonlinearity. Dead-zone constraint is represented as a combination of a linear system with a disturbance-like term. This work invokes neural networks (NNs) as a linear-in-parameter approximator to model uncertain nonlinear functions that appear in virtual and act...

Journal: CoRR

Volume: abs/1609.09449

Publication date: 2016
