temporal difference learning

Learning to Play Board Games using Temporal Difference Methods

2005

Marco A. Wiering Jan Peter Patist Henk Mannen

A promising approach to learn to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: (1) Learning by self-play, (2) Learning by playing against an expert program, and (3) Learning from viewing experts play against themselves. Although the third...

متن کامل

Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms

2017

Markus Dumke

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...

متن کامل

Control of Multivariable Systems Based on Emotional Temporal Difference Learning Controller

Journal: International Journal of Engineering 2004

Ali Khaki-Sedigh, C. Lucas, GholamHassan Famil Khalili, Javad Abdi, Mehrdad Fatourechi,

One of the most important issues that we face in controlling delayed systems and non-minimum phase systems is to fulfill objective orientations simultaneously and in the best way possible. In this paper proposing a new method, an objective orientation is presented for controlling multi-objective systems. The principles of this method is based an emotional temporal difference learning, and has a...

متن کامل

Temporal Difference Model Reproduces Anticipatory Neural Activity

Journal: :Neural computation 2001

Roland E. Suri Wolfram Schultz

Anticipatory neural activity preceding behaviorally important events has been reported in cortex, striatum, and midbrain dopamine neurons. Whereas dopamine neurons are phasically activated by reward-predictive stimuli, anticipatory activity of cortical and striatal neurons is increased during delay periods before important events. Characteristics of dopamine neuron activity resemble those of th...

متن کامل

Effect of look-ahead search depth in learning position evaluation functions for Othello using epsilon-greedy exploration

2007

Thomas Philip Runarsson Egill O. Jonsson

This paper studies the effect of varying the depth of look-ahead for heuristic search in temporal difference (TD) learning and game playing. The acquisition position evaluation functions for the game of Othello is studied. The paper provides important insights into the strengths and weaknesses of using different search depths during learning when 2-greedy exploration is applied. The main findin...

متن کامل

Algorithms for Fast Gradient Temporal Difference Learning

2012

Christoph Dann

Temporal difference learning is one of the oldest and most used techniques in reinforcement learning to estimate value functions. Many modifications and extension of the classical TD methods have been proposed. Recent examples are TDC and GTD(2) ([Sutton et al., 2009b]), the first approaches that are as fast as classical TD and have proven convergence for linear function approximation in onand ...

متن کامل

Temporal Difference Learning in Chinese Chess

1998

Thong B. Trinh Anwer S. Bashi Nikhil Deshpande

Reinforcement learning, in general, has not been totally successful at solving complex realworld problems which can be described by nonlinear functions. However, temporal difference learning is a type of reinforcement learning algorithm that has been researched and applied to various prediction problems with promising results. This paper discusses the application of temporal-difference learning...

متن کامل

L1 Regularized Linear Temporal Difference Learning

2012

Christopher Painter-Wakefield Ronald Parr

Several recent efforts in the field of reinforcement learning have focused attention on the importance of regularization, but the techniques for incorporating regularization into reinforcement learning algorithms, and the effects of these changes upon the convergence of these algorithms, are ongoing areas of research. In particular, little has been written about the use of regularization in onl...

متن کامل

Temporal Difference Learning in Score-Four

2012

Matthew Hlavacek Benedict Lim John Ngo

We have developed a machine-learning taught computer player for the game Score-Four, a three-dimensional variant of the aptly named Connect Four. This project was constructed at Northwestern University as a final project for the EECS 349: machine Learning course taught by Professor Bryan Pardo, where we applied temporal difference learning onto a neural network.

متن کامل

Online Multi-Task Gradient Temporal-Difference Learning

2014

Vishnu Purushothaman Sreenivasan Haitham Bou-Ammar Eric Eaton

Reinforcement learning (RL) is an essential tool in designing autonomous systems, yet RL agents often require extensive experience to achieve optimal behavior. This problem is compounded when an RL agent is required to learn policies for different tasks within the same environment or across multiple environments. In such situations, learning task models jointly rather than independently can sig...

متن کامل