A Generalization Error for Q-Learning
Author
Susan A. Murphy
Abstract
Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
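As a rough illustration of the setting (not the paper's exact algorithm), the sketch below performs backwards-recursive batch Q-learning with linear function approximation on a training set of finite-horizon trajectories; the trajectory format, the feature map phi, and the least-squares solver are illustrative assumptions.

    import numpy as np

    def batch_q_learning(trajectories, phi, n_actions, horizon):
        # trajectories: list of trajectories, each a list of
        # (state, action, reward) tuples of length `horizon`.
        # phi(s, a): feature vector of fixed dimension d (assumed).
        d = phi(trajectories[0][0][0], trajectories[0][0][1]).shape[0]
        weights = [np.zeros(d) for _ in range(horizon)]
        # Work backwards: regress Q_t onto the immediate reward plus
        # the max of the already-fitted Q_{t+1}.
        for t in reversed(range(horizon)):
            X, y = [], []
            for traj in trajectories:
                s, a, r = traj[t]
                target = r
                if t + 1 < horizon:
                    s_next = traj[t + 1][0]
                    target += max(phi(s_next, b) @ weights[t + 1]
                                  for b in range(n_actions))
                X.append(phi(s, a))
                y.append(target)
            # Least-squares fit of the empirical squared error at step t.
            weights[t] = np.linalg.lstsq(np.asarray(X), np.asarray(y),
                                         rcond=None)[0]
        return weights

A policy is then read off greedily, pi_t(s) = argmax over a of phi(s, a) @ weights[t]; the empirical squared errors minimized at each step are, roughly, the "quantities minimized by a Q-learning algorithm" that the bound is stated in terms of.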
Similar Articles
A distinct numerical approach for the solution of some kind of initial value problem involving nonlinear q-fractional differential equations
Fractional calculus generalizes integer-order integration and differentiation to arbitrary (non-integer) order. q-fractional differential equations typically describe physical processes on the time scale set Tq. In this paper, we first propose a difference formula for discretizing the fractional q-derivative of Caputo type with given order and scale index. We es...
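The paper's Caputo-type fractional difference formula is not recoverable from this truncated abstract; as background only, here is the classical first-order Jackson q-derivative that underlies calculus on the geometric time scale Tq.

    def jackson_q_derivative(f, x, q):
        # Classical Jackson q-derivative:
        #   D_q f(x) = (f(q*x) - f(x)) / ((q - 1) * x),  x != 0.
        # As q -> 1 this recovers the ordinary derivative f'(x).
        return (f(q * x) - f(x)) / ((q - 1) * x)

    # Sanity check: for f(x) = x**2, D_q f(x) = (q + 1) * x exactly.
    q, x = 0.9, 2.0
    print(jackson_q_derivative(lambda t: t**2, x, q))  # 3.8
    print((q + 1) * x)                                 # 3.8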
Does generalization performance of lq regularization learning depend on q? A negative example
lq regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) through appropriately shrinking its coefficients. The shape of an lq estimator differs across choices of the regularization order q. In particular, l1 leads to the LASSO estimate, while l2 corres...
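A minimal numerical illustration of how the regularization order q reshapes the penalty (the helper name is ours, not the paper's):

    import numpy as np

    def lq_penalty(w, q):
        # The l_q penalty sum_i |w_i|**q; q = 1 gives the LASSO
        # penalty, q = 2 the ridge penalty.
        return np.sum(np.abs(w) ** q)

    w = np.array([0.5, -2.0, 0.0])
    for q in (0.5, 1.0, 2.0):
        print(q, lq_penalty(w, q))

q = 1 is the smallest order that keeps the penalty convex; orders q < 1 shrink small coefficients toward zero more aggressively but make the optimization non-convex.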
Attentional Mechanisms as a Strategy for Generalization in the Q-Learning Algorithm
In the last few years, reinforcement learning algorithms have been proposed as a more natural way of modelling animal learning. Unlike supervised learning methods, reinforcement learning addresses the basic problem faced by an animal trying to control a discrete stochastic dynamic system: discovering, by trial and error, a policy of actions that maximises some criterion of optimality, usually e...
Learning dynamics of simple perceptrons with non-extensive cost functions.
A Tsallis-statistics-based generalization of the gradient descent dynamics (using non-extensive cost functions), recently introduced by one of us, is proposed as a learning rule in a simple perceptron. The resulting Langevin equations are solved numerically for different values of an index q (q = 1 and q ≠ 1 respectively correspond to the extensive and non-extensive cases) and for different co...
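The paper's Langevin learning dynamics are not recoverable from this truncated abstract; the sketch below shows only the Tsallis q-deformed logarithm and exponential, the standard ingredients of non-extensive (Tsallis-statistics) generalizations, with q = 1 recovering the extensive (ordinary) case.

    import numpy as np

    def q_log(x, q):
        # Tsallis q-logarithm: ln_q(x) = (x**(1-q) - 1) / (1 - q);
        # recovers the natural log as q -> 1.
        if np.isclose(q, 1.0):
            return np.log(x)
        return (x ** (1.0 - q) - 1.0) / (1.0 - q)

    def q_exp(x, q):
        # Tsallis q-exponential, the inverse of ln_q on its domain.
        if np.isclose(q, 1.0):
            return np.exp(x)
        return np.maximum(1.0 + (1.0 - q) * x, 0.0) ** (1.0 / (1.0 - q))

    print(q_log(2.0, 1.0), q_log(2.0, 1.5))  # 0.693..., 0.585...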
Connectionist Q-learning in Robot Control Task
The Q-Learning algorithm suggested by Watkins in 1989 [1] belongs to the family of reinforcement learning algorithms. Reinforcement learning in robot control tasks takes the form of a multi-step procedure of adaptation. The main feature of this technique is that, during learning, the system is never shown how to act in a specific situation. Instead, learning develops by trial and error using re...
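For reference, Watkins' one-step tabular Q-learning update, which is driven by the temporal-difference error alone rather than by being shown the correct action; the array-based interface is an illustrative assumption.

    import numpy as np

    def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        # Watkins' update:
        #   Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
        Q[s, a] += alpha * td_error
        return Q

    # Hypothetical usage on a 5-state, 2-action table:
    Q = np.zeros((5, 2))
    Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)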
Journal: Journal of Machine Learning Research (JMLR)
Volume: 6
Pages: -
Publication date: 2005