Gradient Temporal Difference Networks

Author

  • David Silver
Abstract

Temporal-difference (TD) networks (Sutton and Tanner, 2004) are a predictive representation of state in which each node is an answer to a question about future observations or questions. Unfortunately, existing algorithms for learning TD networks are known to diverge, even in very simple problems. In this paper we present the first sound learning rule for TD networks. Our approach is to develop a true gradient descent algorithm that takes account of all three roles performed by each node in the network: as state, as an answer, and as a target for other questions. Our algorithm combines gradient temporal-difference learning (Maei et al., 2009) with real-time recurrent learning (Williams and Zipser, 1994). We provide a generalisation of the Bellman equation that corresponds to the semantics of the TD network, and prove that our algorithm converges to a fixed point of this equation.
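As context for the gradient-TD machinery cited above (Maei et al., 2009), the following is a minimal sketch of a linear gradient-TD update in the TDC style, using a secondary weight vector to correct the semi-gradient bias. The features, reward, and step sizes are illustrative assumptions; this is not the TD-network algorithm the paper derives.

```python
def tdc_update(theta, w, phi, phi_next, reward, gamma=0.9, alpha=0.1, beta=0.05):
    """One linear gradient-TD (TDC-style) step.

    theta: primary value weights; w: secondary weights estimating the
    expected TD error's projection onto the features.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # TD error with bootstrapped target reward + gamma * v(s')
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Gradient-correction term: subtract gamma * (w . phi) * phi_next,
    # which removes the bias of the semi-gradient update.
    correction = gamma * dot(w, phi)
    theta = [t + alpha * (delta * p - correction * pn)
             for t, p, pn in zip(theta, phi, phi_next)]
    # Secondary weights track delta via least-mean-squares on the features.
    w = [v + beta * (delta - dot(w, phi)) * p for v, p in zip(w, phi)]
    return theta, w
```

With zero-initialised weights, one transition simply deposits the TD error into the features of the visited state, scaled by the step sizes.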


Related resources

Multi-step Predictions Based on TD-DBP ELMAN Neural Network for Wave Compensating Platform

The gradient-descent momentum and adaptive learning rate TD-DBP algorithm can effectively improve the training speed and stability of an Elman network. The BP algorithm is the typical supervised learning algorithm, so a neural network cannot be trained online with it alone. For this reason, a new algorithm (TD-DBP), composed of the temporal-difference (TD) method and the dynamic BP algorithm (DBP), was propos...

Full text

Modular SRV Reinforcement Learning Architectures for Non-linear Control

This paper demonstrates the advantages of using a hybrid reinforcement/modular neural network architecture for non-linear control. Specifically, ACTION-CRITIC reinforcement learning, modular neural networks, competitive learning, and stochastic updating are combined. This provides an architecture able both to support temporal-difference learning and probabilistic partitioning of th...

Full text

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning, and Sarsa, have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximat...

Full text
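The conventional TD update that this line of work contrasts with can be sketched as semi-gradient TD(0) with a hypothetical one-parameter smooth approximator v(s) = tanh(θs); the state values, reward, and step size below are illustrative. Note that the gradient of the bootstrapped target is ignored, which is precisely why convergence is not guaranteed in general.

```python
import math

def semi_gradient_td0(theta, s, s_next, reward, gamma=0.9, alpha=0.1):
    """One semi-gradient TD(0) step for v(s) = tanh(theta * s)."""
    v = math.tanh(theta * s)
    v_next = math.tanh(theta * s_next)
    delta = reward + gamma * v_next - v     # TD error
    grad = (1.0 - v * v) * s                # d/dtheta tanh(theta * s)
    # Only v(s) is differentiated; the target's dependence on theta is ignored.
    return theta + alpha * delta * grad
```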

Intelligent Optimization of a Mixed Culture Cultivation Process

In the present paper, a neural network approach called "Adaptive Critic Design" (ACD) was applied to the optimal tuning of set-point controllers for the three main substrates (sugar, nitrogen source, and dissolved oxygen) in a PHB production process. For approximation of the critic and the controllers, a special kind of recurrent neural network called Echo State Networks (ESNs) was used. Their structur...

Full text

Handwritten Character Recognition using Modified Gradient Descent Technique of Neural Networks and Representation of Conjugate Descent for Training Patterns

The purpose of this study is to analyze the performance of the backpropagation algorithm with changing training patterns and a second momentum term in feed-forward neural networks. This analysis is conducted on 250 different three-letter lowercase words from the English alphabet. These words are presented to two vertical segmentation programs, designed in MATLAB, based on portions (1...

Full text
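A gradient-descent weight update with a momentum term, of the kind analyzed above, can be sketched as follows; the parameter names and constants are illustrative assumptions.

```python
def momentum_update(weights, grads, velocity, lr=0.1, mu=0.9):
    """One backpropagation-style step with momentum.

    velocity accumulates an exponentially decayed sum of past gradients,
    which damps oscillation and speeds progress along consistent directions.
    """
    velocity = [mu * v - lr * g for v, g in zip(velocity, grads)]
    weights = [w + v for w, v in zip(weights, velocity)]
    return weights, velocity
```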


Journal:

Volume:   Issue:

Pages:  -

Publication year: 2012