Online Reinforcement Learning Using a Probability Density Estimation

Authors

  • Alejandro Agostini
  • Enric Celaya
Abstract

Function approximation in online, incremental reinforcement learning must deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also across different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a Gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the update formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the update, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
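To make the two mechanisms concrete, here is a minimal Python sketch, not the authors' algorithm: an online Gaussian mixture density estimator in which each component discounts its sufficient statistics only in proportion to the responsibility it takes for the incoming sample, so components in rarely visited regions are left almost untouched, which is the behavior the abstract describes for biased sampling. The class name, the `base_forget` rate, and the exact update rules are illustrative assumptions.

```python
import numpy as np

class OnlineGMM:
    """Sketch of an online GMM with responsibility-modulated forgetting."""

    def __init__(self, means, cov_scale=1.0, base_forget=0.999):
        k, d = means.shape
        self.mu = means.astype(float)                     # component means
        self.cov = np.stack([np.eye(d) * cov_scale for _ in range(k)])
        self.n = np.ones(k)                               # effective sample counts
        self.base_forget = base_forget                    # forgetting under full responsibility

    def _component_pdfs(self, x):
        d = x.size
        pdfs = np.empty(len(self.mu))
        for i, (m, c) in enumerate(zip(self.mu, self.cov)):
            diff = x - m
            norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(c))
            pdfs[i] = np.exp(-0.5 * diff @ np.linalg.inv(c) @ diff) / norm
        return pdfs

    def update(self, x):
        x = np.asarray(x, dtype=float)
        prior = self.n / self.n.sum()                     # mixing weights
        resp = prior * self._component_pdfs(x)
        resp /= resp.sum() + 1e-12                        # responsibilities

        # Responsibility-modulated forgetting: a component discounts its
        # statistics only to the extent the new sample informs it, so the
        # factor is exactly 1 (no forgetting) where resp is 0.
        forget = 1.0 - (1.0 - self.base_forget) * resp

        self.n = forget * self.n + resp
        lr = resp / self.n                                # per-component step sizes
        for i in range(len(self.mu)):
            diff = x - self.mu[i]
            self.mu[i] += lr[i] * diff
            self.cov[i] += lr[i] * (np.outer(diff, diff) - self.cov[i])

    def pdf(self, x):
        prior = self.n / self.n.sum()
        return float(prior @ self._component_pdfs(np.asarray(x, dtype=float)))

# usage on a stream of 2-D samples:
# gmm = OnlineGMM(means=np.random.randn(5, 2))
# for x in np.random.randn(1000, 2):
#     gmm.update(x)
```

The design point is the `forget` vector: a single global factor would decay every component on every step, eroding the fit in regions the agent rarely visits, whereas tying the factor to the responsibilities confines forgetting to where new evidence actually arrives.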

Similar Articles

Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning

We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of Õ(T^{3/4}) after any T steps has been given by Ortner...

Reinforcement Learning for Robot Control using Probability Density Estimations

The successful application of Reinforcement Learning (RL) techniques to robot control is limited by the fact that, in most robotic tasks, the state and action spaces are continuous, multidimensional, and, in essence, too large for conventional RL algorithms to work. The well-known curse of dimensionality makes it infeasible to use a tabular representation of the value function, which is the classica...

Probability Density Estimation of the Q Function for Reinforcement Learning (IRI Technical Report)

Performing Q-learning in continuous state-action spaces is a problem still unsolved for many complex applications. The Q function may be rather complex and cannot be expected to fit into a predefined parametric model. In addition, the function approximation must be able to cope with the high nonstationarity of the estimated Q values, the online nature of the learning with a strongly biased s...

Online Probability Density Estimation of Nonstationary Random Signal using Dynamic Bayesian Networks

We present two estimators for discrete non-Gaussian and nonstationary probability density estimation based on a dynamic Bayesian network (DBN). The first estimator is for offline computation and consists of a DBN whose transition distribution is represented in terms of kernel functions. The estimator parameters are the weights and shifts of the kernel functions. The parameters are determined th...
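As a hypothetical, much-simplified illustration of the parameterization this abstract mentions (a density built from kernel functions whose weights and shifts are the learned parameters), the sketch below tracks a nonstationary discrete distribution online with exponential forgetting. It is not the paper's DBN construction; all names and constants are invented.

```python
import numpy as np

def kernel(support, shift, width=1.5):
    """Discrete Gaussian-shaped kernel centered at `shift`, normalized over the support."""
    k = np.exp(-0.5 * ((support - shift) / width) ** 2)
    return k / k.sum()

class KernelPMF:
    def __init__(self, support, shifts, forget=0.98, step=0.05):
        self.support = np.asarray(support, dtype=float)
        self.shifts = np.asarray(shifts, dtype=float)      # kernel shifts (learned)
        self.weights = np.ones(len(shifts)) / len(shifts)  # kernel weights (learned)
        self.forget, self.step = forget, step

    def pmf(self):
        table = np.stack([kernel(self.support, s) for s in self.shifts])
        p = self.weights @ table
        return p / p.sum()

    def update(self, x):
        idx = int(np.argmin(np.abs(self.support - x)))     # nearest support point
        table = np.stack([kernel(self.support, s) for s in self.shifts])
        resp = self.weights * table[:, idx]
        resp /= resp.sum() + 1e-12
        # Exponentially forget old evidence, then credit the kernels responsible
        # for x and nudge their shifts toward the new observation.
        self.weights = self.forget * self.weights + (1.0 - self.forget) * resp
        self.shifts += self.step * resp * (x - self.shifts)
```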

Reinforcement Learning Estimation of Distribution Algorithm

This paper proposes an algorithm for combinatorial optimization that uses reinforcement learning and an estimation of the joint probability distribution of promising solutions to generate a new population of solutions. We call it the Reinforcement Learning Estimation of Distribution Algorithm (RELEDA). For the estimation of the joint probability distribution we treat each variable as univariate. Then ...
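Since the blurb only hints at the update rule, the following sketch shows the generic univariate EDA loop it describes, closer to a PBIL-style scheme than to RELEDA itself: treat each binary variable as an independent marginal, estimate those marginals from the promising solutions, and sample the next population from them. The function name and hyperparameters are illustrative.

```python
import numpy as np

def univariate_eda(fitness, n_vars, pop_size=50, elite=10,
                   lr=0.1, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(n_vars, 0.5)                  # independent Bernoulli marginals
    best, best_fit = None, -np.inf
    for _ in range(generations):
        pop = (rng.random((pop_size, n_vars)) < p).astype(int)
        fits = np.array([fitness(ind) for ind in pop])
        order = np.argsort(fits)[::-1]        # best solutions first
        if fits[order[0]] > best_fit:
            best, best_fit = pop[order[0]].copy(), fits[order[0]]
        # Re-estimate each marginal from the elite (promising) solutions.
        p = (1.0 - lr) * p + lr * pop[order[:elite]].mean(axis=0)
        p = p.clip(0.05, 0.95)                # keep some exploration alive
    return best, best_fit

# e.g. one-max: univariate_eda(lambda ind: ind.sum(), n_vars=30)
```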

Journal:
  • Neural Computation

Volume 29, Issue 1

Pages: -

Publication date: 2017