Search results for: temporal difference learning

Number of results: 1,222,164

2016
Teakgyu Hong, Jongmin Lee, Kee-Eung Kim, Pedro A. Ortega, Daniel D. Lee

In the standard reinforcement learning setting, the agent learns an optimal policy solely from state transitions and rewards received from the environment. We consider an extended setting in which a trainer additionally provides feedback on the actions executed by the agent. This requires incorporating the feedback appropriately, even when it is not necessarily accurate. In this paper, we present a ...

2009
Marek Grzes, Daniel Kudenko

Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function used to shape the reward given to the learning agent. In this paper...
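
As a concrete illustration of the technique this abstract refers to, here is a minimal sketch of potential-based shaping added to a tabular Q-learning update: the bonus F(s, s') = γΦ(s') − Φ(s) is added to the environment reward and, in this form, leaves the optimal policy unchanged. The potential function `phi`, the table indexing, and the hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def shaped_q_update(Q, phi, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step with a potential-based shaping bonus.

    Q       : 2-D array of action values, indexed as Q[state, action]
    phi     : user-supplied potential function over states (hypothetical)
    shaping : F(s, s') = gamma * phi(s') - phi(s), added to the raw reward
    """
    shaping = gamma * phi(s_next) - phi(s)
    td_target = (r + shaping) + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```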

Journal: CoRR, 2011
Mitchell Keith Bloch

Q-learning is a reliable but inefficient off-policy temporal-difference method, backing up reward only one step at a time. Replacing traces, using a recency heuristic, are more efficient but less reliable. In this work, we introduce model-free, off-policy temporal difference methods that make better use of experience than Watkins’ Q(λ). We introduce both Optimistic Q(λ) and the temporal second ...
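
For orientation, the sketch below is a minimal tabular version of Watkins' Q(λ) with replacing traces, the baseline the abstract compares against; the paper's Optimistic Q(λ) and other variants are not reproduced. The state/action indices, the trace matrix `E`, and all parameter values are assumptions for illustration.

```python
import numpy as np

def watkins_q_lambda_step(Q, E, s, a, r, s_next, a_next,
                          alpha=0.1, gamma=0.99, lam=0.9):
    """One step of tabular Watkins' Q(lambda) with replacing traces.

    The TD error is computed toward the greedy action in s_next; the
    eligibility traces E spread that error to recently visited pairs and
    are cut whenever the executed action a_next is exploratory.
    """
    greedy_next = np.argmax(Q[s_next])
    delta = r + gamma * Q[s_next, greedy_next] - Q[s, a]

    E[s, :] = 0.0            # replacing trace: clear other actions in s ...
    E[s, a] = 1.0            # ... and set the visited pair to 1

    Q += alpha * delta * E   # back up the one-step error along all traces

    if a_next == greedy_next:
        E *= gamma * lam     # decay traces while acting greedily
    else:
        E[:] = 0.0           # Watkins' cut after an exploratory action
    return Q, E
```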

2007
Matthew E. Taylor, Shimon Whiteson, Peter Stone

Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving difficult RL problems, but few rigorous comparisons have been conducted. Thus, no general guidelines describing the methods’ relative strengths and weakne...

Journal: CoRR, 2011
Mitchell Keith Bloch

In experimenting with off-policy temporal difference (TD) methods in hierarchical reinforcement learning (HRL) systems, we have observed unwanted on-policy learning under reproducible conditions. Here we present modifications to several TD methods that prevent unintentional on-policy learning from occurring. These modifications create a tension between exploration and learning. Traditional TD m...

2014
Tim Brys, Kristof Van Moffaert, Ann Nowé, Matthew E. Taylor

In this paper we introduce a novel scale-invariant and parameterless technique, called adaptive objective selection, that allows a temporal-difference learning agent to exploit the correlation between objectives in a multi-objective problem. In each state, it identifies and follows the objective whose estimates it is most confident about. We propose several variants of the approach and empirical...

2012
Markus Thill, Patrick Koch, Wolfgang Konen

Learning complex game functions is still a difficult task. We apply temporal difference learning (TDL), a well-known variant of the reinforcement learning approach, in combination with n-tuple networks to the game Connect-4. Our agent is trained just by self-play. It is able, for the first time, to consistently beat the optimal-playing Minimax agent (in game situations where a win is possible)....
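
To make the representation concrete, the following hypothetical miniature shows an n-tuple value function with a TD-style update: each n-tuple addresses a lookup table by the contents of a fixed set of board cells, and the position value is the sum of the addressed entries. The tuple layout, cell encoding, and step size are placeholders, not those of the Connect-4 agent described above.

```python
import numpy as np

class NTupleValue:
    """Tiny n-tuple value function: V(board) is a sum of lookup-table entries."""

    def __init__(self, tuples, n_values=3):
        self.tuples = tuples          # each entry: a sequence of board-cell indices
        self.n_values = n_values      # e.g. 0 = empty, 1 = own piece, 2 = opponent
        self.tables = [np.zeros(n_values ** len(t)) for t in tuples]

    def _index(self, board, cells):
        idx = 0
        for c in cells:                              # read the selected cells as
            idx = idx * self.n_values + board[c]     # digits of a base-n number
        return idx

    def value(self, board):
        return sum(tab[self._index(board, t)]
                   for t, tab in zip(self.tuples, self.tables))

    def td_update(self, board, target, alpha=0.01):
        delta = target - self.value(board)           # TD error toward the target
        for t, tab in zip(self.tuples, self.tables):
            tab[self._index(board, t)] += alpha * delta
```

In self-play training, `target` would typically be the (discounted) value of the position reached after both players move, or the final game outcome at a terminal state.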

2003
Clifford Kotnik, Jugal K. Kalita

Reinforcement learning has been used to train game-playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield ...

2008
Wolfgang Konen, Thomas Bartz-Beielstein

We investigate reinforcement learning methods, namely the temporal difference learning algorithm TD(λ), on game-learning tasks. Small modifications in algorithm setup and parameter choice can have a significant impact on success or failure to learn. We demonstrate that small differences in the input features significantly influence the learning process. By selecting the right feature set we found goo...
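
As a reference point for the TD(λ) family the abstract studies, here is a minimal sketch of one TD(λ) step with a linear value function and an accumulating eligibility-trace vector; the feature encoding that produces `x` is deliberately left abstract, since the abstract's point is that this choice largely decides success or failure. Parameter values are placeholders.

```python
import numpy as np

def td_lambda_step(w, e, x, x_next, r, alpha=0.05, gamma=0.95, lam=0.8):
    """One TD(lambda) update of a linear value function V(s) = w . x(s).

    w : weight vector            e : eligibility-trace vector (same shape)
    x, x_next : feature vectors of the current and successor states
    """
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)   # TD error
    e = gamma * lam * e + x                                # accumulating trace
    w = w + alpha * delta * e                              # trace-weighted update
    return w, e
```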

2017
Ariel Rosenfeld, Matthew E. Taylor, Sarit Kraus

One of the most prominent approaches for speeding up reinforcement learning is injecting human prior knowledge into the learning agent. This paper proposes a novel method to speed up temporal difference learning by using state-action similarities. These hand-coded similarities are tested in three well-studied domains of varying complexity, demonstrating our approach’s benefits.

[Chart: number of search results per year]