Search results for: temporal difference learning

Number of results: 1,222,164

2005
Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa

This paper describes a novel approach based on online unsupervised adaptation and clustering using temporal-difference (TD) learning. TD learning is a reinforcement learning technique: a computational approach to learning in which an agent tries to maximize the total reward it receives while interacting with a complex, uncertain environment. The adaptation progres...
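For reference, the TD(0) update the abstract alludes to can be sketched in a few lines of Python. The environment interface (env.reset/env.step returning (next_state, reward, done)), the step size alpha, and the discount gamma are illustrative assumptions, not details from the paper.

```python
# Minimal tabular TD(0) value-estimation sketch (illustrative; not the paper's method).
from collections import defaultdict

def td0_value_estimation(env, policy, episodes=500, alpha=0.1, gamma=0.99):
    """Estimate V(s) for a fixed policy using the TD(0) update:
    V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    V = defaultdict(float)  # state-value table, defaults to 0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])  # TD(0) update
            state = next_state
    return V
```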

2014
Tim Brys, Ann Nowé, Daniel Kudenko, Matthew E. Taylor

Multi-objective problems with correlated objectives are a class that deserves specific attention. In contrast to typical multi-objective problems, they do not require identifying trade-offs between the objectives, since (near-)optimal solutions for any one objective are (near-)optimal for every objective. Intelligently combining the feedback from these objectives, instead of onl...
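The snippet does not say how the feedback is combined; one common baseline for correlated objectives is a linear scalarization of the per-objective rewards fed into a single TD update, sketched below. The weight vector and dictionary-based value table are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch: scalarizing feedback from correlated objectives before a TD(0) update.
import numpy as np

def combined_td_update(V, state, next_state, rewards, weights, alpha=0.1, gamma=0.99):
    """Blend several correlated reward signals into one scalar and apply TD(0).
    `rewards` is a vector of per-objective feedback for this transition."""
    r = float(np.dot(weights, rewards))            # linear scalarization of the objectives
    td_error = r + gamma * V[next_state] - V[state]
    V[state] += alpha * td_error
    return td_error
```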

Journal: Neural Computation, 2009
Christoph Kolodziejski, Bernd Porr, Florentin Wörgötter

In this theoretical contribution, we provide mathematical proof that two of the most important classes of network learning, correlation-based differential Hebbian learning and reward-based temporal difference learning, are asymptotically equivalent when the learning is timed by a modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning ...
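For orientation, the standard textbook forms of the two rules being related are shown below; the paper's exact formulation, including the modulatory timing signal, is given in the source itself.

```latex
% Textbook forms of the two learning rules (the paper's exact formulation,
% including the modulatory timing signal, is in the source).
% Differential Hebbian learning: the weight change follows the correlation of
% presynaptic activity u(t) with the derivative of postsynaptic activity v(t):
\[ \frac{dw}{dt} \;=\; \mu \, u(t)\, \frac{dv(t)}{dt} \]
% Temporal-difference learning: the weight change follows the TD error:
\[ \Delta w \;=\; \alpha \bigl( r_t + \gamma V(s_{t+1}) - V(s_t) \bigr)\, \nabla_{\!w} V(s_t) \]
```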

2014
Wojciech Jaskowski, Marcin Grzegorz Szubert, Pawel Liskowski

We compare Temporal Difference Learning (TDL) with Coevolutionary Learning (CEL) on Othello. Apart from three popular single-criterion performance measures, (i) generalization performance, or expected utility, (ii) the average result against a hand-crafted heuristic, and (iii) the result of a head-to-head match, we compare the algorithms using performance profiles. This multi-criteria performance meas...

Journal: CoRR, 2017
Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz

Temporal-difference learning and Residual Gradient methods are the most widely used temporal-difference-based learning algorithms; however, it has been shown that neither of their objective functions is optimal with respect to approximating the true value function V. Two novel algorithms are proposed to approximate the true value function V. This paper makes the following contributions: • A batch algorit...
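For context, under linear function approximation V(s) = w·x(s), the two methods differ in whether the gradient of the bootstrapped next-state value enters the update. The sketch below uses the standard textbook forms; function names and step sizes are illustrative.

```python
# Illustrative comparison of the TD(0) and Residual Gradient update directions
# under linear function approximation V(s) = w @ x(s) (textbook forms).
import numpy as np

def td_update(w, x, x_next, reward, alpha=0.05, gamma=0.99):
    """Semi-gradient TD(0): treats the bootstrapped target as a constant."""
    delta = reward + gamma * w @ x_next - w @ x   # TD error
    return w + alpha * delta * x                  # gradient flows only through V(s)

def residual_gradient_update(w, x, x_next, reward, alpha=0.05, gamma=0.99):
    """Residual gradient (Baird, 1995): true gradient of the squared Bellman
    error, so the next-state value also contributes to the update direction."""
    delta = reward + gamma * w @ x_next - w @ x
    return w + alpha * delta * (x - gamma * x_next)
```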

2014
Christopher J. Gatti, Mark J. Embrechts

We use a reinforcement learning approach to learn a real-world control problem, the truck backer-upper problem, in which a tractor-trailer truck must be backed into a loading dock from an arbitrary location and orientation. Our approach uses the temporal-difference algorithm with a neural network as the value-function approximator. The novelty of this work is the simplicity of our impl...
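A minimal sketch of this setup, TD(0) with a one-hidden-layer neural network as the value-function approximator, is given below; the network size, activation, and step size are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch: TD(0) with a one-hidden-layer neural network as V(s).
# Sizes and step sizes are illustrative, not the paper's configuration.
import numpy as np

class TDValueNet:
    def __init__(self, n_inputs, n_hidden=16, lr=0.01, gamma=0.99, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.lr, self.gamma = lr, gamma

    def value(self, s):
        """Forward pass: V(s) = w2 . tanh(W1 s); also return the hidden layer."""
        h = np.tanh(self.W1 @ s)
        return self.w2 @ h, h

    def td_step(self, s, reward, s_next, done):
        """Semi-gradient TD(0): regress V(s) toward r + gamma * V(s')."""
        v, h = self.value(s)
        v_next, _ = self.value(s_next)
        target = reward + (0.0 if done else self.gamma * v_next)
        delta = target - v                                  # TD error
        # Compute both gradients before updating (target treated as constant).
        grad_w2 = delta * h
        grad_W1 = delta * np.outer(self.w2 * (1.0 - h**2), s)
        self.w2 += self.lr * grad_w2
        self.W1 += self.lr * grad_W1
        return delta
```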

2012
Michel Tokic, Günther Palm

Stochastic neurons are deployed for efficient adaptation of exploration parameters by gradient-following algorithms. The approach is evaluated in model-free temporal-difference learning with discrete actions. A particular advantage is memory efficiency, because exploratory data need only be memorized for starting states. Hence, if a learning problem consists of only one starting state...
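One reading of this idea: the decision to explore is drawn from a stochastic (Bernoulli) unit whose parameter is adapted by a REINFORCE-style gradient estimate of the received return. The sketch below is a generic illustration under that assumption, not the paper's exact scheme.

```python
# Illustrative sketch: adapting an exploration probability with a stochastic
# Bernoulli neuron and a gradient-following (REINFORCE-style) update.
# The specific parameterization is an assumption, not the paper's exact scheme.
import math, random

class StochasticExplorer:
    def __init__(self, theta=0.0, lr=0.01):
        self.theta = theta          # logit of the exploration probability
        self.lr = lr

    def explore_prob(self):
        return 1.0 / (1.0 + math.exp(-self.theta))   # sigmoid(theta)

    def sample(self):
        """Draw the explore/exploit decision; remember it for the update."""
        self.last_explore = random.random() < self.explore_prob()
        return self.last_explore

    def update(self, episode_return, baseline=0.0):
        """Gradient-following: d log Pr(decision)/d theta times the advantage."""
        p = self.explore_prob()
        grad_logp = (1.0 - p) if self.last_explore else -p
        self.theta += self.lr * (episode_return - baseline) * grad_logp
```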

2006
Adrian K. Agogino, Kagan Tumer

Coordinating multiple agents that must perform a sequence of actions to maximize a system-level reward requires solving two distinct credit assignment problems. First, credit must be assigned for an action taken at time step t that results in a reward at time step t′ > t. Second, credit must be assigned for the contribution of agent i to the overall system performance. The first credit assig...
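A standard construction for the second (agent-wise) problem is the difference reward, which credits each agent with its marginal contribution to the global reward. The sketch below is illustrative; whether the paper uses exactly this form is not stated in the snippet.

```python
# Illustrative sketch of structural credit assignment via a "difference reward":
# agent i is credited with G(z) - G(z with agent i's action replaced by a fixed
# default), i.e. its marginal contribution to the global reward G.
def difference_reward(global_reward, joint_action, agent_index, default_action):
    """D_i = G(z) - G(z_{-i}); `global_reward` is a callable on joint actions."""
    counterfactual = list(joint_action)
    counterfactual[agent_index] = default_action   # remove agent i's influence
    return global_reward(joint_action) - global_reward(counterfactual)
```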

[Chart: number of search results per year; clicking the chart filters results by publication year]