Search results for: temporal difference learning

Number of results: 1222164

2002
Mark H. M. Winands Levente Kocsis Jos W. H. M. Uiterwijk H. Jaap van den Herik

This paper investigates to what extent learning methods are beneficial for the Lines of Action tournament program MIA. We focus on two components of the program: (1) the evaluation function and (2) the move ordering. Using temporal difference learning the evaluation function was improved by tuning the weights. We found substantial improvements for three weights. The move ordering was enhanced b...

Journal: Neural Computation 2014
Samuel Gershman

Temporal difference learning models of dopamine assert that phasic levels of dopamine encode a reward prediction error. However, this hypothesis has been challenged by recent observations of gradually ramping striatal dopamine levels as a goal is approached. This note describes conditions under which temporal difference learning models predict dopamine ramping. The key idea is representational: ...

2003
Byung-Doo Lee Hans Werner Guesgen Jacky Baltes

This paper describes the results of applying Temporal Difference (TD) learning with a network to the opening game problems in Go. The main difference from other research is that this experiment applied TD learning to the full-sized (19×19) game of Go instead of a simple version (e.g., 9×9 game). We discuss and compare TD(λ) learning for predicting an opening game’s winning and for finding the be...

2000
Paul J. Darwen

The most difficult but realistic learning tasks are both noisy and computationally intensive. This paper investigates how, for a given solution representation, coevolutionary learning can achieve the highest ability from the least computation time. Using a population of Backgammon strategies, this paper examines ways to make computational costs reasonable. With the same simple architecture Gera...

1993
Peter Dayan

Estimation of returns over time, the focus of temporal difference (TD) algorithms, imposes particular constraints on good function approximators or representations. Appropriate generalisation between states is determined by how similar their successors are, and representations should follow suit. This paper shows how TD machinery can be used to learn such representations, and illustrates, using a...

Journal: Neural Computation 1993
Peter Dayan

Estimation of returns over time, the focus of temporal difference (TD) algorithms, imposes particular constraints on good function approximators or representations. Appropriate generalization between states is determined by how similar their successors are, and representations should follow suit. This paper shows how TD machinery can be used to learn such representations, and illustrates, using...

2004
Jacek Mandziuk Daniel Osman

In this paper we examine the application of temporal difference methods in learning a linear state value function approximation in a game of give-away checkers. Empirical results show that the TD(λ) algorithm can be successfully used to improve playing policy quality in this domain. Training games with strong and random opponents were considered. Results show that learning only on negative game...

Journal: Network: Computation in Neural Systems 2009

2005
Fletcher Lu

With the advent of Kearns & Singh’s (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method as well as a Monte Carlo matrix inversion policy evaluation error bound. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo m...

2012
Tatpong Katanyukul Edwin K. P. Chong William S. Duff

The common belief is that using Reinforcement Learning methods (RL) with bootstrapping gives better results than without. However, inclusion of bootstrapping increases the complexity of the RL implementation and requires significant effort. This study investigates whether inclusion of bootstrapping is worth the effort when applying RL to inventory problems. Specifically, we investigate bootstra...
