Search results for: temporal difference learning

Number of results: 1,222,164

Journal: JCS, 2014
Britton Wolfe, James Harpe

Developing general-purpose algorithms for learning an accurate model of a dynamical system from example traces of the system is still a challenging research problem. Predictive State Representation (PSR) models represent the state of a dynamical system as a set of predictions about future events. Our work focuses on improving Temporal Difference Networks (TD Nets), a general class of predictive ...
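
The snippet does not spell out the TD-net mechanism, so here is a minimal single-layer sketch of the idea: each node's prediction is answered by a sigmoid of linear features and is trained toward a target assembled from the next time step. The linear-sigmoid answer function and all names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TDNet:
    def __init__(self, n_nodes, n_features, alpha=0.1):
        self.W = np.zeros((n_nodes, n_features))  # answer-network weights
        self.alpha = alpha

    def predict(self, features):
        # Each node answers its question with a sigmoid of linear features.
        return sigmoid(self.W @ features)

    def update(self, features, targets):
        # TD-style update: each node's answer moves toward a target built
        # from the next time step (observation bits and/or other nodes'
        # next-step predictions, as dictated by the question network).
        y = self.predict(features)
        delta = targets - y  # per-node prediction error
        self.W += self.alpha * (delta * y * (1 - y))[:, None] * features
```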

2003
Timothy Huang, Graeme Connell, Bryan McQuade

In this paper, we present an experimental methodology and results for a machine learning approach to learning opening strategy in the game of Go, a game at which the best computer programs play only at the level of an advanced beginner. While the evaluation function in most computer Go programs consists of a carefully crafted combination of pattern matchers, expert rules, and sel...

1992
David J. Finton, Yu Hen Hu

In reinforcement learning for multi-step problems, the sparse nature of the feedback aggravates the difficulty of learning to perform. This paper explores the use of a reinforcement learning architecture, leading to a discussion of reinforcement learning in terms of feature abstraction, credit-assignment, and temporal-difference learning. Issues discussed include: the conditioning of the reinfo...
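
Since the abstract turns on credit assignment under sparse feedback, a compact tabular TD(lambda) sketch shows the standard answer: eligibility traces spread a sparse, delayed reward backward over recently visited states. The episode format here is an assumption made for illustration.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, gamma=0.99, lam=0.8):
    V = np.zeros(n_states)
    for episode in episodes:                # episode: list of (s, r, s_next)
        e = np.zeros(n_states)              # eligibility traces
        for s, r, s_next in episode:
            delta = r + gamma * V[s_next] - V[s]  # TD error
            e[s] += 1.0                     # mark the current state
            V += alpha * delta * e          # credit all recently visited states
            e *= gamma * lam                # decay the traces
    return V
```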

2016
Richard Klima, Karl Tuyls, Frans Oliehoek

In this paper we present a preliminary investigation of modelling spatial aspects of security games within the context of Markov games. Reinforcement learning is a powerful tool for adaptation in unknown environments; however, the basic single-agent RL algorithms are unfit to be applied in adversarial scenarios. Therefore, we profit from Adversarial Multi-Armed Bandit (AMAB) methods, which are des...
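
The snippet does not name a specific AMAB method; Exp3 is the textbook member of the family, so a minimal sketch of it is given below as an example (not necessarily the authors' exact method). The reward_fn interface, assumed to return rewards in [0, 1], is an illustration-only assumption.

```python
import math
import random

def exp3(n_arms, n_rounds, reward_fn, gamma=0.1):
    weights = [1.0] * n_arms
    for t in range(n_rounds):
        total = sum(weights)
        # Mix exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted estimate keeps the update unbiased even
        # against an adversarially chosen reward sequence.
        weights[arm] *= math.exp(gamma * (reward / probs[arm]) / n_arms)
    return weights
```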

2009
José Antonio Martín H., Javier de Lope Asiaín, Darío Maravall Gómez-Allende

A reinforcement learning algorithm called kNN-TD is introduced. This algorithm has been developed using the classical formulation of temporal difference methods and a k-nearest-neighbors scheme as its expectations memory. By means of this memory, the algorithm is able to generalize properly over continuous state spaces and also benefit from collective action selection and learning ...
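
As a rough illustration of the kNN-TD idea described above, here is a sketch in which the value of a continuous state is a distance-weighted average over its k nearest stored prototype states, and each TD error is shared among those neighbors. The weighting scheme and class layout are assumptions, not the authors' exact formulation.

```python
import numpy as np

class KNNTD:
    def __init__(self, centers, k=4, alpha=0.2, gamma=0.95):
        self.centers = np.asarray(centers, dtype=float)  # prototype states
        self.values = np.zeros(len(self.centers))
        self.k, self.alpha, self.gamma = k, alpha, gamma

    def _neighbors(self, state):
        d = np.linalg.norm(self.centers - state, axis=1)
        idx = np.argsort(d)[: self.k]
        w = 1.0 / (1.0 + d[idx] ** 2)       # closer prototypes weigh more
        return idx, w / w.sum()

    def value(self, state):
        idx, w = self._neighbors(state)
        return float(w @ self.values[idx])

    def update(self, s, r, s_next):
        idx, w = self._neighbors(s)
        delta = r + self.gamma * self.value(s_next) - self.value(s)
        self.values[idx] += self.alpha * delta * w  # spread the TD error
```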

2004
Kristian Kersting, Luc De Raedt

Recent developments in the area of relational reinforcement learning (RRL) have resulted in a number of new algorithms. However, a theory that explains why RRL works seems to be lacking. In this paper, we provide some initial results on a theory of RRL. To realize this, we introduce a novel representation formalism, called logical Markov decision programs (LOMDPs), that integrates Markov Deci...

2004
Pablo L. Durango-Cohen

In the existing approach to maintenance and repair decision making for infrastructure facilities, policy evaluation and policy selection are performed under the assumption that a perfect facility deterioration model is available. The writer formulates the problem of developing maintenance and repair policies as a reinforcement learning problem in order to address this limitation. The writer exp...

2015
Kevin Lloyd, Peter Dayan

Substantial evidence suggests that the phasic activity of dopamine neurons represents reinforcement learning's temporal difference prediction error. However, recent reports of ramp-like increases in dopamine concentration in the striatum when animals are about to act, or are about to reach rewards, appear to pose a challenge to established thinking. This is because the implied activity is persi...
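
For reference, the temporal difference prediction error that phasic dopamine activity is thought to encode is, in standard textbook notation (the general quantity, not the paper's specific ramp model):

```latex
\delta_t = r_{t+1} + \gamma \, V(s_{t+1}) - V(s_t)
```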

Journal: Nous, 2011
Nicholas Shea

Although there has been considerable debate about the existence of metarepresentational capacities in non-human animals and their scope in humans, the well-confirmed temporal difference reinforcement learning models of reward-guided decision making have been largely overlooked. This paper argues that the reward prediction error signals which are postulated by temporal difference models and have ...

Journal: Neural Networks: The Official Journal of the International Neural Network Society, 2010
Marek Grzes, Daniel Kudenko

Potential-based reward shaping has been shown to be a powerful method to improve the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function used to shape the reward given to the learning agent. I...
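
Potential-based shaping itself has a simple standard form, F(s, s') = gamma * Phi(s') - Phi(s), which provably preserves the optimal policy; the open question the abstract raises is where the potential Phi comes from. A minimal sketch, with Phi left as a caller-supplied placeholder standing in for whatever the paper's method computes:

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    # F(s, s') = gamma * phi(s') - phi(s); phi is the potential function
    # whose construction is exactly what the paper investigates.
    return r + gamma * phi(s_next) - phi(s)
```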

[Chart: number of search results per year]
