Fitted Q-iteration by Functional Networks for control problems

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Boosted Fitted Q-Iteration

This paper is about the study of B-FQI, an Approximated Value Iteration (AVI) algorithm that exploits a boosting procedure to estimate the action-value function in reinforcement learning problems. B-FQI is an iterative off-line algorithm that, given a dataset of transitions, builds an approximation of the optimal action-value function by summing the approximations of the Bellman residuals acros...

متن کامل

Fitted Q-iteration by Advantage Weighted Regression

Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonl...

متن کامل

CFQI: Fitted Q-Iteration with Complex Returns

Fitted Q-Iteration (FQI) is a popular approximate value iteration (AVI) approach that makes effective use of off-policy data. FQI uses a 1-step return value update which does not exploit the sequential nature of trajectory data. Complex returns (weighted averages of the n-step returns) use trajectory data more effectively, but have not been used in an AVI context because of off-policy bias. In ...

متن کامل

Regularized Fitted Q-iteration: Application to Planning

We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model-complexity. The algorithm is presented in detail for the case when the functio...

متن کامل

Bias Correction and Confidence Intervals for Fitted Q-iteration

We consider finite-horizon fitted Q-iteration with linear function approximation to learn a policy from a training set of trajectories. We show that fitted Q-iteration can give biased estimates and invalid confidence intervals for the parameters that feature in the policy. We propose a regularized estimator called soft-threshold estimator, derive it as an approximate empirical Bayes estimator, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Applied Mathematical Modelling

سال: 2016

ISSN: 0307-904X

DOI: 10.1016/j.apm.2016.05.049