Internally Driven Q-learning - Convergence and Generalization Results
Abstract
We present an approach to reinforcement learning in which agents are equipped with internal drives and evaluate states by their similarity to those drives. We extend Q-learning by substituting these internally driven values for ad hoc external rewards. The resulting algorithm, Internally Driven Q-learning (IDQ-learning), is experimentally shown to converge to optimality and to generalize well. These results are preliminary but encouraging: IDQ-learning is more psychologically plausible than Q-learning, and it devolves control, and thus autonomy, to agents that would otherwise be at the mercy of the environment (i.e., of the designer).
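The core idea can be sketched as an ordinary tabular Q-learning loop in which the external reward signal is replaced by an internally generated value: the similarity between the successor state and the agent's drive. The following minimal sketch illustrates this on a 1-D chain of states; the environment, the negative-distance similarity measure, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative setup: a 1-D chain of states 0..N_STATES-1.
# The agent's internal drive is a preferred state; its "reward" for
# reaching a state is the similarity of that state to the drive
# (here, negative distance -- an assumption for this sketch).

N_STATES, N_ACTIONS = 10, 2      # actions: 0 = step left, 1 = step right
DRIVE = 7                        # internally preferred state

def similarity(state, drive=DRIVE):
    # Internal value of a state: replaces the ad hoc external reward.
    return -abs(state - drive)

def step(s, a):
    # Deterministic chain dynamics, clipped at the boundaries.
    return max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))

def idq_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))
        for _ in range(50):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = int(rng.integers(N_ACTIONS))
            else:
                a = int(np.argmax(Q[s]))
            s2 = step(s, a)
            # Standard Q-learning update, with the external reward
            # replaced by the internally driven value of s2.
            Q[s, a] += alpha * (similarity(s2) + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q

Q = idq_learning()
```

Under this sketch, the greedy policy learned from Q moves the agent toward the drive state from either end of the chain, without any externally specified reward function.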