Search results for: temporal difference learning
Number of results: 1,222,164
This article proposes a general, intuitive, and rigorous framework for designing temporal-difference algorithms to solve optimal control problems in continuous time and space. Within this framework, we derive a version of the classical TD(λ) algorithm, as well as a new TD algorithm that is similar but designed to be more accurate and to converge as fast as TD(λ) at the best values of λ withou...
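As background for the TD(λ) algorithm the abstracts above refer to, here is a minimal tabular sketch with accumulating eligibility traces on a toy random-walk chain. The chain, step size, and trace decay are illustrative assumptions, not taken from any of the papers listed.

```python
import random

# TD(lambda) with accumulating eligibility traces on a 5-state random walk:
# states 0..4, episodes start in the middle, reward 1 for stepping off the
# right end and 0 for the left. All constants here are illustrative.
N_STATES = 5
GAMMA = 1.0   # undiscounted episodic task
ALPHA = 0.1   # step size
LAM = 0.8     # trace-decay parameter lambda

def run_episode(V):
    s = N_STATES // 2
    z = [0.0] * N_STATES            # eligibility traces
    while True:
        s2 = s + random.choice((-1, 1))
        if s2 < 0:
            r, v2, done = 0.0, 0.0, True
        elif s2 >= N_STATES:
            r, v2, done = 1.0, 0.0, True
        else:
            r, v2, done = 0.0, V[s2], False
        delta = r + GAMMA * v2 - V[s]   # TD error
        z[s] += 1.0                     # accumulate trace for current state
        for i in range(N_STATES):
            V[i] += ALPHA * delta * z[i]
            z[i] *= GAMMA * LAM         # decay all traces
        if done:
            return
        s = s2

random.seed(0)
V = [0.0] * N_STATES
for _ in range(2000):
    run_episode(V)
print([round(v, 2) for v in V])  # values should increase from left to right
```

The true values for this chain are 1/6 through 5/6, so the learned estimates should be monotone in the state index.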
Palamedes is an ongoing project for building expert playing bots that can play backgammon variants. As in all successful modern backgammon programs, it is based on neural networks trained using temporal difference learning. This paper improves upon the training method that we used in our previous approach for the two backgammon variants popular in Greece and neighboring countries, Plakoto and F...
Remote sensing technology is very useful for obtaining synoptic estimates of grass bio-parameters, which is important for rangeland monitoring. The retrieval of bio-parameters is usually realized by developing an empirical regression formula. However, it is less convincing if the regression line is built on limited observations from a single growth stage. In this research, three field surveys were carrie...
This paper investigates the use of n-tuple systems as position value functions for the game of Othello. The architecture is described, and then evaluated for use with temporal difference learning. Performance is compared with previously developed weighted piece counters and multi-layer perceptrons. The n-tuple system is able to defeat the best performing of these after just five hundred games o...
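To make the n-tuple idea concrete: an n-tuple value function holds one lookup table per tuple of board squares, indexed by the pieces on those squares, and sums the table entries to value a position. The sketch below uses a 3x3 board and hypothetical tuples for brevity; the tuple choices, board size, and update rule are illustrative assumptions, not the paper's configuration.

```python
# Minimal n-tuple position value function. Each n-tuple is a fixed list of
# board squares; the contents of those squares (empty/black/white) form a
# base-3 index into that tuple's weight table, and the position's value is
# the sum over all tuples. All names and sizes here are illustrative.
EMPTY, BLACK, WHITE = 0, 1, 2

class NTupleValue:
    def __init__(self, tuples):
        self.tuples = tuples
        # one weight table per tuple: 3**len(t) entries
        self.weights = [[0.0] * (3 ** len(t)) for t in tuples]

    def _index(self, board, squares):
        idx = 0
        for sq in squares:
            idx = idx * 3 + board[sq]   # base-3 encoding of the pattern
        return idx

    def value(self, board):
        return sum(w[self._index(board, t)]
                   for t, w in zip(self.tuples, self.weights))

    def update(self, board, delta, alpha=0.01):
        # TD-style update: nudge every active weight by alpha * TD error
        for t, w in zip(self.tuples, self.weights):
            w[self._index(board, t)] += alpha * delta

# two hypothetical 3-tuples on a 3x3 board: the top row and a diagonal
vf = NTupleValue([(0, 1, 2), (0, 4, 8)])
board = [BLACK, WHITE, EMPTY, EMPTY, BLACK, EMPTY, EMPTY, EMPTY, WHITE]
vf.update(board, delta=1.0)
print(vf.value(board))  # both active weights now contribute to the value
```

Because lookup and update touch only one entry per tuple, training is fast, which is consistent with the rapid learning the abstract reports.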
In this paper we present TDLeaf(λ), a variation on the TD(λ) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(λ) and another less radical variant, TD-directed(λ). In particular, our chess program, "KnightCap," used TDLeaf(λ) to learn its evaluation fun...
Temporal difference (TD) learning and its variants, such as multistage TD (MS-TD) and temporal coherence (TC) learning, have been successfully applied to 2048. These methods rely on the stochasticity of the 2048 environment for exploration. In this paper, we propose to employ optimistic initialization (OI) to encourage exploration in 2048, and empirically show that the learning quality is significantly improved. This approach o...
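Optimistic initialization, as mentioned above, simply starts every value estimate at a constant well above any achievable return, so purely greedy selection is drawn toward untried options until their estimates decay toward reality. A minimal sketch on a toy 3-armed bandit; the constant, step size, and arm means are illustrative assumptions, not values from the paper.

```python
import random

# Optimistic initialization (OI): start all estimates far above any
# achievable reward so that greedy selection explores every option
# before settling. All constants here are illustrative.
TRUE_MEANS = [0.2, 0.5, 0.8]
OPTIMISTIC_INIT = 5.0   # far above any achievable reward
ALPHA = 0.1             # step size for the running estimate

random.seed(1)
Q = [OPTIMISTIC_INIT] * 3
pulls = [0] * 3
for _ in range(500):
    a = max(range(3), key=lambda i: Q[i])        # purely greedy choice
    r = TRUE_MEANS[a] + random.gauss(0, 0.1)     # noisy reward
    Q[a] += ALPHA * (r - Q[a])                   # estimate decays toward r
    pulls[a] += 1

print(pulls)  # greedy play still tries every arm before settling
```

Because all estimates start tied at the optimistic constant, each arm is tried early on, after which greedy play concentrates on the best arm.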
In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of each action in the TD-error at each state. Specifically, we use DEV in TD-Q(λ) to more accurately learn the relative value of the actions, rather than their absolute value. We identify conditions that ensure convergen...
We use temporal difference (TD) learning to train neural networks for four nondeterministic board games: backgammon, hypergammon, pachisi, and Parcheesi. We investigate the influence of two variables on the development of these networks: first, the source of training data, either learner-vs.-self or learner-vs.-other game play; second, the choice of attributes used: a simple encoding of the boar...
Recently there has been much interest in modeling the activity of primate midbrain dopamine neurons as signalling reward prediction error. But since the models are based on temporal-difference (TD) learning, they assume an exponential decline with time in the value of delayed reinforcers, an assumption long known to conflict with animal behavior. We show that a variant of TD learning that tracks v...