Search results for: temporal difference learning

Number of results: 1222164

2006
László Jeni Zoltán Istenes Péter Korondi Hideki Hashimoto

Finding the safest shortest path in an unknown environment is a fundamental task in mobile robotics. To emulate human adaptability in this field, we can use the Intelligent Space concept. The Intelligent Space is a distributed sensory system, which is the background infrastructure to observe human walking in a limited area. The observation of human beings is applied to create a walkable are...

2013
David Silver Leonard Newnham David Barker Suzanne Weller Jason McFall

In this paper, we explore applications in which a company interacts concurrently with many customers. The company has an objective function, such as maximising revenue, customer satisfaction, or customer loyalty, which depends primarily on the sequence of interactions between company and customer. A key aspect of this setting is that interactions with different customers occur in parallel. As a...

2009
Kunikazu Kobayashi Hiroyuki Mizoue Takashi Kuremoto Masanao Obayashi

In general, meta-parameters in a reinforcement learning system, such as a learning rate and a discount rate, are empirically determined and fixed during learning. Therefore, when the external environment changes, the system cannot adapt itself to the variation. Meanwhile, it is suggested that the biological brain might conduct reinforcement learning and adapt itself to the external environment ...

2012
David Silver

Temporal-difference (TD) networks (Sutton and Tanner, 2004) are a predictive representation of state in which each node is an answer to a question about future observations or questions. Unfortunately, existing algorithms for learning TD networks are known to diverge, even in very simple problems. In this paper we present the first sound learning rule for TD networks. Our approach is to develop...

2000
Robert Levinson Ryan Weber

Over the years, various research projects have attempted to develop a chess program that learns to play well given little prior knowledge beyond the rules of the game. Early on it was recognized that the key would be to adequately represent the relationships between the pieces and to evaluate the strengths or weaknesses of such relationships. As such, representations have developed, including a...

2015
Jacob Rafati David C. Noelle

There is growing support for Temporal Difference (TD) Learning as a formal account of the role of the midbrain dopamine system and the basal ganglia in learning from reinforcement. This account is challenged, however, by the fact that realistic implementations of TD Learning have been shown to fail on some fairly simple learning tasks — tasks well within the capabilities of humans and non-human...

1998
Jun Morimoto Kenji Doya

In this paper, we propose a learning method for implementing human-like sequential movements in robots. As an example of dynamic sequential movement, we consider the “stand-up” task for a two-joint, three-link robot. In contrast to the case of steady walking or standing, the desired trajectory for such a transient behavior is very difficult to derive. The goal of the task is to find a pa...

Journal: KI, 2009
Julien Vitay Jérémy Fix Fred Henrik Hamker Henning Schroll Frederik Beuth

This review focuses on biological issues of reinforcement learning. Since the influential discovery by W. Schultz of an analogy between the reward prediction error signal of the temporal difference algorithm and the firing pattern of some dopaminergic neurons in the midbrain during classical conditioning, biological models have emerged that use computational reinforcement learning concepts to e...

2010
Violeta Mirchevska Boštjan Kaluža

Reinforcement learning is an approach for learning an optimal action policy through experience, i.e. using the rewards observed in environment states. Reinforcement learning algorithms include adaptive dynamic programming, temporal difference learning and Q-learning [1]. Examples of successful applications of reinforcement learning include a controller for sustained inverted flight on an autonomous helicopter [...

2006
Matthew E. Taylor Shimon Whiteson Peter Stone

An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference (Sutton & Barto, 1998) approach to transfer in reinforcement learning (Sutton & Barto, 1998) tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies...
