Search results for: temporal difference learning

Number of results: 1222164

Journal: Journal of Machine Learning Research, 2010
Dotan Di Castro, Ron Meir

Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological relevance. In this paper, we introduce an online temporal difference based actor-critic algorithm which is proved to converge to a neighborhood of a local m...

2012
Michel Tokic, Günther Palm

Gradient-following algorithms are deployed for efficient adaptation of exploration parameters in temporal-difference learning with discrete action spaces. Global and local variants are evaluated in discrete and continuous state spaces. The global variant is memory efficient in terms of requiring exploratory data only for starting states. In contrast, the local variant requires exploratory data ...

2014
Tetsuro Morimura, Takayuki Osogami, Tomoyuki Shirai

Policy gradient reinforcement learning (PGRL) has been receiving substantial attention as a means for seeking stochastic policies that maximize cumulative reward. However, the learning speed of PGRL is known to decrease substantially when PGRL explores policies that induce Markov chains with long mixing times. We study a new approach of regularizing how the PGRL explores the policies by t...

1995
Richard S. Sutton

Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We present theory and algorithms for intermixing TD models of the world at different levels of temporal abstraction within a single structure. Such multi-scale TD models can be used in model-based rein...

2006
Shimon Whiteson, Peter Stone

Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent’s optimal value function. In most real-world problems, learning this value function requires a function approximator, which maps state-action pairs to values via a concise, parameterized function. In practice, the success of function approximators depends on the ability of ...

Journal: Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2016
Quentin J M Huys, Lorenz Deserno, Klaus Obermayer, Florian Schlagenhauf, Andreas Heinz

Dopamine potentially unites two important roles: one in addiction, being involved in most substances of abuse including alcohol, and a second one in a specific type of learning, namely model-free temporal-difference reinforcement learning. Theories of addiction have long suggested that drugs of abuse may usurp dopamine's role in learning. Here, we briefly review the preclinical literature to mo...

2007
David Silver, Richard S. Sutton, Martin Müller

We explore an application to the game of Go of a reinforcement learning approach based on a linear evaluation function and large numbers of binary features. This strategy has proved effective in game playing programs and other reinforcement learning applications. We apply this strategy to Go by creating over a million features based on templates for small fragments of the board, and then use te...

2008
Dotan Di Castro, Dmitry Volkinshtein, Ron Meir

Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g., when function approximation is involved). Interestingly, there is growing evidence that actor-critic approaches based on phasic dopamine signals play a key role in biological learning through cortical and basal gangli...

Journal: CoRR, 2015
Arunselvan Ramaswamy, Shalabh Bhatnagar

In this paper we present a ‘stability theorem’ for stochastic approximation (SA) algorithms with ‘controlled Markov’ noise. Such algorithms were first studied by Borkar in 2006. Specifically, sufficient conditions are presented which guarantee the stability of the iterates. Further, under these conditions the iterates are shown to track a solution to the differential inclusion defined in terms ...

Journal: Journal of Machine Learning Research, 2007
Matthew E. Taylor, Peter Stone, Yaxin Liu

Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but the most basic algorithms have often been found slow in practice. Th...
