Search results for: temporal difference learning
Number of results: 1,222,164
This paper investigates to what extent learning methods are beneficial for the Lines of Action tournament program MIA. We focus on two components of the program: (1) the evaluation function and (2) the move ordering. Using temporal difference learning, the evaluation function was improved by tuning its weights. We found substantial improvements for three weights. The move ordering was enhanced b...
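The snippet does not show the update MIA used, so here is a minimal sketch of the general technique it names: tuning the weights of a linear evaluation function with a temporal-difference error between successive positions. The feature vectors, learning rate, and function names are illustrative assumptions, not details from the paper.

import numpy as np

def td_tune_weights(weights, features_t, features_t1, alpha=0.01,
                    gamma=1.0, terminal_value=None):
    """One TD(0) step for a linear evaluation function.
    features_t / features_t1 are feature vectors of two successive
    positions; terminal_value is the game result when the later
    position ends the game."""
    v_t = float(weights @ features_t)
    if terminal_value is not None:
        v_next = terminal_value                  # final target is the game outcome
    else:
        v_next = float(weights @ features_t1)
    delta = gamma * v_next - v_t                 # TD error between the two evaluations
    return weights + alpha * delta * features_t  # nudge weights toward the later prediction

Applied over many games, each weight drifts toward values that make the evaluation of a position agree with the evaluation one move later, and ultimately with the game result.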
Temporal difference learning models of dopamine assert that phasic levels of dopamine encode a reward prediction error. However, this hypothesis has been challenged by recent observations of gradually ramping striatal dopamine levels as a goal is approached. This note describes conditions under which temporal difference learning models predict dopamine ramping. The key idea is representational: ...
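For context, the reward prediction error that the TD account identifies with phasic dopamine is the standard TD error, defined with respect to the value function that TD learning estimates; whether it ramps as the goal nears then depends on how the state is represented, which is the point the note develops:

\[
  \delta_t \;=\; r_{t+1} + \gamma V(s_{t+1}) - V(s_t),
  \qquad
  V(s) \;\approx\; \mathbb{E}\!\left[\sum_{k \ge 0} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s\right].
\]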
This paper describes the results of applying Temporal Difference (TD) learning with a network to the opening game problems in Go. The main difference from other research is that this experiment applied TD learning to the full-sized (19×19) game of Go instead of a reduced version (e.g., the 9×9 game). We discuss and compare TD(λ) learning for predicting an opening game’s winning and for finding the be...
The most difficult but realistic learning tasks are both noisy and computationally intensive. This paper investigates how, for a given solution representation, coevolutionary learning can achieve the highest ability from the least computation time. Using a population of Backgammon strategies, this paper examines ways to make computational costs reasonable. With the same simple architecture Gera...
Estimation of returns over time, the focus of temporal difference (TD) algorithms, imposes particular constraints on good function approximators or representations. Appropriate generalization between states is determined by how similar their successors are, and representations should follow suit. This paper shows how TD machinery can be used to learn such representations, and illustrates, using...
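The abstract cuts off before the illustration, but the core idea, using TD updates to learn a representation in which states with similar successors look similar, can be sketched with a tabular successor-representation update. The tabular form and constants below are an illustrative assumption, not the paper's construction.

import numpy as np

def sr_td_update(M, s, s_next, alpha=0.1, gamma=0.95):
    """TD(0) update of a tabular successor representation.
    Row M[s] estimates the expected discounted future occupancy of every
    state when starting from s, so states with similar successors end up
    with similar rows -- a representation well suited to value estimation."""
    one_hot = np.zeros(M.shape[0])
    one_hot[s] = 1.0
    target = one_hot + gamma * M[s_next]   # bootstrap on the successor state's row
    M[s] += alpha * (target - M[s])
    return M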
In this paper we examine the application of temporal difference methods in learning a linear state value function approximation in a game of give-away checkers. Empirical results show that the TD(λ) algorithm can be successfully used to improve playing policy quality in this domain. Training games with strong and random opponents were considered. Results show that learning only on negative game...
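Since the snippet names TD(λ) with a linear state value function but gives no details, here is a minimal sketch of that combination using accumulating eligibility traces; the feature encoding, step size, and the convention that the terminal position carries a zero feature vector are assumptions for illustration.

import numpy as np

def td_lambda_episode(weights, episode, alpha=0.005, gamma=1.0, lam=0.7):
    """TD(lambda) over one game for a linear value function.
    `episode` is a list of (feature_vector, reward) pairs; the last entry
    should use a zero feature vector so its reward (the game outcome)
    becomes the final target."""
    trace = np.zeros_like(weights)
    for t in range(len(episode) - 1):
        x_t, _ = episode[t]
        x_next, r_next = episode[t + 1]
        delta = r_next + gamma * (weights @ x_next) - (weights @ x_t)
        trace = gamma * lam * trace + x_t          # decay old credit, add current features
        weights = weights + alpha * delta * trace  # spread the TD error over recent features
    return weights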
With the advent of Kearns & Singh’s (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method, as well as a Monte Carlo matrix inversion policy evaluation error bound. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo m...
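The snippet is cut off, but the estimators being compared are standard; the maximum likelihood (certainty-equivalence) policy evaluation, for example, solves the Bellman equation under the empirically estimated model,

\[
  \hat V_{\mathrm{ML}} \;=\; \bigl(I - \gamma \hat P_{\pi}\bigr)^{-1}\,\hat r_{\pi},
\]

where \(\hat P_{\pi}\) and \(\hat r_{\pi}\) are the transition matrix and expected rewards estimated from the observed trajectories; the Monte Carlo matrix inversion approach estimates this same inverse by sampling rather than solving it exactly.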
The common belief is that Reinforcement Learning (RL) methods with bootstrapping give better results than those without. However, including bootstrapping increases the complexity of the RL implementation and requires significant effort. This study investigates whether the inclusion of bootstrapping is worth that effort when applying RL to inventory problems. Specifically, we investigate bootstra...
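The bootstrapping question the study raises comes down to which target the value update uses: a non-bootstrapping (Monte Carlo) update regresses toward the full observed return, while the bootstrapped TD(0) update reuses the current estimate of the next state. The state keys and constants below are placeholders, not the study's inventory model.

def monte_carlo_update(V, state, observed_return, alpha=0.1):
    # No bootstrapping: the target is the full return actually observed
    # from this state to the end of the episode.
    V[state] += alpha * (observed_return - V[state])

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    # Bootstrapping: the target reuses the current value estimate of the
    # next state instead of waiting for the episode to finish.
    V[state] += alpha * (reward + gamma * V[next_state] - V[state])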