Similar resources
Why did TD-Gammon Work?
Although TD-Gammon is one of the major successes in machine learning, it has not led to similar impressive breakthroughs in temporal difference learning for other applications or even other games. We were able to replicate some of the success of TD-Gammon, developing a competitive evaluation function on a 4,000-parameter feed-forward neural network, without using back-propagation, reinforcement ...
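The truncated abstract alludes to training the evaluation network without back-propagation; the full paper attributes the result to a simple self-play hill-climbing search over the network's weights. A loose sketch of that style of search, with every specific (mutation scale, match length, averaging rule) as an illustrative assumption:

```python
import numpy as np

# Rough sketch of weight-perturbation hill-climbing with self-play, a
# gradient-free alternative to back-propagation. The evaluation network,
# mutation scale, match length, and averaging rule are assumptions for
# illustration, not details taken from the abstract above.

def hill_climb(play_match, init_weights, sigma=0.05, generations=1000):
    """play_match(w_a, w_b) -> fraction of games won by the w_a player."""
    champion = np.asarray(init_weights, dtype=float)
    for _ in range(generations):
        # Perturb the champion's weights to create a challenger.
        challenger = champion + sigma * np.random.randn(*champion.shape)
        if play_match(challenger, champion) > 0.5:
            # Move conservatively toward the winning challenger.
            champion = 0.95 * champion + 0.05 * challenger
    return champion
```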
Feature Construction for Reinforcement Learning in Hearts
Temporal difference (TD) learning has been used to learn strong evaluation functions in a variety of two-player games. TD-Gammon illustrated how the combination of game tree search and learning methods can achieve grand-master level play in backgammon. In this work, we develop a player for the game of hearts, a 4-player game, based on stochastic linear regression and TD learning. Using a small ...
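A minimal sketch of the general style named above, TD(0) with a linear evaluation function; the feature map, step size, and episode format are illustrative assumptions rather than details from the paper:

```python
import numpy as np

# Minimal sketch of TD(0) with a linear value estimate V(s) = w . phi(s),
# in the spirit of "stochastic linear regression and TD learning" above.
# Features, rewards, and hyperparameters are illustrative assumptions.

def td0_linear_update(w, phi, phi_next, reward, alpha=0.01, gamma=1.0, terminal=False):
    """One TD(0) step for the value estimate V(s) = w . phi(s)."""
    v_next = 0.0 if terminal else w @ phi_next
    td_error = reward + gamma * v_next - w @ phi   # one-step TD error
    return w + alpha * td_error * phi

# Usage on a toy episode of feature vectors with zero intermediate reward.
rng = np.random.default_rng(0)
w = np.zeros(8)
phis = [rng.random(8) for _ in range(6)]
for t in range(len(phis) - 1):
    w = td0_linear_update(w, phis[t], phis[t + 1], reward=0.0,
                          terminal=(t == len(phis) - 2))
```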
SSCC TD: A Serial and Simultaneous Configural-Cue Compound Stimuli Representation for Temporal Difference Learning
This paper presents a novel representational framework for the Temporal Difference (TD) model of learning, which allows the computation of configural stimuli: cumulative compounds of stimuli that generate perceptual emergents known as configural cues. This Simultaneous and Serial Configural-cue Compound Stimuli Temporal Difference model (SSCC TD) can model both simultaneous and serial stimulus ...
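A generic sketch of a configural-cue stimulus representation, not the SSCC TD model itself: compounds of co-active elemental stimuli contribute extra features that a TD learner can weight independently of the elements. Stimulus names and the pairwise-only compounds are illustrative assumptions:

```python
from itertools import combinations

# Generic configural-cue feature expansion (illustrative only): each pair of
# co-active elemental stimuli adds a "configural" unit, so value assigned to
# the compound can differ from the sum of its elements.

def configural_features(active_stimuli, all_stimuli=("A", "B", "C")):
    """Binary feature vector: one unit per element plus one per pair compound."""
    elements = [1.0 if s in active_stimuli else 0.0 for s in all_stimuli]
    pairs = [1.0 if a in active_stimuli and b in active_stimuli else 0.0
             for a, b in combinations(all_stimuli, 2)]
    return elements + pairs

# Example: the compound AB activates units A, B, and the configural unit AB.
print(configural_features({"A", "B"}))   # [1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```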
TD(λ) Networks: Temporal-Difference Networks with Eligibility Traces
Temporal-difference (TD) networks have been introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(λ) algorithm is often used to do more general multi-s...
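For reference, a minimal sketch of conventional TD(λ) with accumulating eligibility traces for a linear value function; the TD-network generalization to interrelated prediction units is not reproduced here, and the features and step sizes are illustrative assumptions:

```python
import numpy as np

# Conventional TD(lambda) with accumulating eligibility traces for a linear
# value function V(s) = w . phi(s). This is the multi-step backup that the
# abstract contrasts with 1-step TD(0) backups; all inputs are illustrative.

def td_lambda_episode(features, rewards, w, alpha=0.05, gamma=1.0, lam=0.8):
    """features: phi(s_0)..phi(s_T); rewards: r_1..r_T (s_T terminal)."""
    z = np.zeros_like(w)                               # eligibility trace
    for t in range(len(rewards)):
        v_t = w @ features[t]
        v_next = w @ features[t + 1] if t + 1 < len(rewards) else 0.0
        delta = rewards[t] + gamma * v_next - v_t      # one-step TD error
        z = gamma * lam * z + features[t]              # accumulate credit
        w = w + alpha * delta * z                      # multi-step backup
    return w
```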
The Significance of Temporal-Difference Learning in Self-Play Training: TD-Rummy versus EVO-rummy
Reinforcement learning has been used to train game-playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield ...
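A rough sketch of self-play TD(0) training of a linear value function over afterstates, the approach contrasted with coevolution above; the game interface used here is a hypothetical placeholder, not the paper's rummy implementation:

```python
import numpy as np

# Self-play TD(0) over afterstates for a linear value function. The `game`
# interface (initial_state, legal_moves, apply, features, is_terminal,
# outcome, to_move) is a hypothetical placeholder for illustration only.

def self_play_episode(game, w, alpha=0.01):
    state = game.initial_state()
    phi_prev = game.features(state)
    while not game.is_terminal(state):
        moves = game.legal_moves(state)
        values = [w @ game.features(game.apply(state, m)) for m in moves]
        # V estimates the first player's chance of winning: player 0 picks the
        # highest-valued afterstate, player 1 the lowest.
        pick = int(np.argmax(values)) if game.to_move(state) == 0 else int(np.argmin(values))
        state = game.apply(state, moves[pick])
        phi = game.features(state)
        target = game.outcome(state) if game.is_terminal(state) else w @ phi
        w = w + alpha * (target - w @ phi_prev) * phi_prev   # TD(0) update
        phi_prev = phi
    return w
```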
Journal
Journal title: ICGA Journal
Year: 1995
ISSN: 2468-2438, 1389-6911
DOI: 10.3233/icg-1995-18207