Search results for: temporal difference learning

Number of results: 1,222,164

2009
Shivaram Kalyanakrishnan, Peter Stone

As machine learning is applied to increasingly complex tasks, it is likely that the diverse challenges encountered can only be addressed by combining the strengths of different learning algorithms. We examine this aspect of learning through a case study grounded in the robot soccer context. The task we consider is Keepaway, a popular benchmark for multiagent reinforcement learning from the simu...

2007
Marcus Hutter, Shane Legg

In the field of reinforcement learning, temporal difference (TD) learning is perhaps the most popular way to estimate the future discounted reward of states. We derive an equation for TD learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting equation is similar...
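The TD update the abstract refers to can be illustrated with a minimal sketch. This is a generic tabular TD(0) value update on a toy three-state chain, not the specific variational derivation from the paper; the states, rewards, and hyperparameters here are illustrative assumptions.

```python
# TD(0) on a tiny chain: s0 -> s1 -> s2 (terminal).
# Reward +1 on reaching the terminal state, 0 otherwise.
# GAMMA and ALPHA are illustrative choices, not taken from the paper.
GAMMA = 0.9   # discount factor
ALPHA = 0.1   # learning rate

V = {0: 0.0, 1: 0.0, 2: 0.0}   # state-value estimates; state 2 is terminal

def td0_update(s, r, s_next):
    """One TD(0) step: move V[s] toward the bootstrapped target r + GAMMA * V[s_next]."""
    target = r + GAMMA * V[s_next]
    V[s] += ALPHA * (target - V[s])

for _ in range(500):            # replay the episode s0 -> s1 -> s2 many times
    td0_update(0, 0.0, 1)       # transition s0 -> s1, reward 0
    td0_update(1, 1.0, 2)       # transition s1 -> s2, reward 1

# V[1] converges toward 1.0 and V[0] toward GAMMA * V[1] = 0.9
```

The key property, which the paper derives from statistical principles, is the bootstrap: the target for V[s] uses the current estimate of the next state's value rather than a full return.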

2014
Joost Broekens, Tim Baarslag

Understanding the affective, cognitive and behavioural processes involved in risk taking is essential for treatment and for setting environmental conditions to limit damage. Using Temporal Difference Reinforcement Learning (TDRL) we computationally investigated the effect of optimism in risk perception in a variety of goal-oriented tasks. Optimism in risk perception was studied by varying the c...

Journal: Eng. Appl. of AI, 2013
Simon Box, Ben Waterson

This paper shows how temporal difference learning can be used to build a signalized junction controller that will learn its own strategies though experience. Simulation tests detailed here show that the learned strategies can have high performance. This work builds upon previous work where a neural network based junction controller that can learn strategies from a human expert was developed (Bo...

2010
Matt Dilts, Hector Muñoz-Avila

In this paper we present an approach for reducing the memory footprint requirement of temporal difference methods in which the set of states is finite. We use case-based generalization to group the states visited during the reinforcement learning process. We follow a lazy learning approach; cases are grouped in the order in which they are visited. Any new state visited is assigned to an existin...

2005
Sattiraju V. Prabhakar

Learners, such as humans and intelligent agents, often require support to perform complex tasks in real-world environments. Inaccessible states and the complexity of tasks add to this requirement. Large state and action spaces associated with the environments also contribute to this need for support. In this paper, we present a design of a learning agent that learns from the environment in order...

Journal: CoRR, 2018
Vitchyr Pong, Shixiang Gu, Murtaza Dalal, Sergey Levine

Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample efficiency is often impractically large for solving challenging real-world problems, even with off-policy algorithms such as Q-learning. A limiting factor in classic model-free RL is that the learning signal consists only of scalar rewards, ignoring much of the rich information...
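The Q-learning baseline this abstract mentions can be sketched in tabular form. This is a generic off-policy Q-learning loop on a toy four-state corridor, offered only as a reference point for the scalar-reward learning signal the abstract discusses; the environment and hyperparameters are illustrative assumptions, not from the paper.

```python
import random

# Tabular Q-learning on a 4-state corridor: 0-1-2-3, goal at state 3.
# Actions: 0 = left, 1 = right. Reward +1 on reaching the goal, 0 otherwise.
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1
N_STATES, GOAL = 4, 3
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    """Deterministic corridor dynamics: move left/right, clipped to [0, GOAL]."""
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
for _ in range(200):                        # episodes
    s = 0
    for _ in range(200):                    # cap steps per episode
        # Epsilon-greedy action selection, breaking ties randomly.
        if random.random() < EPS or Q[s][0] == Q[s][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2, r, done = step(s, a)
        # Off-policy update: bootstrap from the best next-state action value.
        best_next = 0.0 if done else max(Q[s2])
        Q[s][a] += ALPHA * (r + GAMMA * best_next - Q[s][a])
        s = s2
        if done:
            break

# After learning, the greedy policy moves right in every non-goal state.
```

The update uses only the scalar reward plus a bootstrapped max over next-state values, which is exactly the limitation the abstract points to: everything else the transition reveals about the environment is discarded.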

2002
Nathaniel Scott Winstead

Historically, the accepted approach to control problems in physically complicated domains has been machine learning, because knowledge engineering in these domains can be extremely complicated. When the already physically complicated domain is also continuous and dynamical (possibly with composite and/or sequential goals), the learning task becomes even more difficult due t...

2008
Houcine Romdhane, Luc Lamontagne

In the paper, we investigate the use of reinforcement learning in CBR for estimating and managing a legacy case base for playing the game of Tetris. Each case corresponds to a local pattern describing the relative height of a subset of columns where pieces could be placed. We evaluate these patterns through reinforcement learning to determine if significant performance improvement can be observ...

2007
Abdelkarim Souissi, Hacene Rezine

In this article, we are interested in training the reactive navigation behaviours of a mobile robot in an unknown environment. The method we suggest ensures navigation in unknown environments containing obstacles of different shapes, and consists of bringing the robot to a goal position while avoiding obstacles and releasing it from tight corners and deadlock-shaped obstacles. In this fram...

Chart: number of search results per year
