Search results for: reward beta model

Number of results: 336113

2012
Koji Toda, Yasuko Sugase-Miyamoto, Takashi Mizuhiki, Kiyonori Inaba, Barry J. Richmond, Munetaka Shidara

BACKGROUND: The value of a predicted reward can be estimated from both the intrinsic reward value and the length of time required to obtain it. The question we addressed is how these two aspects, reward size and proximity to reward, influence the responses of neurons in rostral anterior cingulate cortex (rACC), a brain region thought to play an important role in reward processing. ...
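
The size-versus-delay trade-off described above is commonly formalized with temporal discounting; the hyperbolic form below is a standard textbook illustration, not necessarily the model fitted in this study (the discount parameter k is assumed for illustration).

```latex
% Hyperbolic temporal discounting: one standard way to combine
% reward magnitude r and delay d into a single subjective value V.
% The steepness parameter k is illustrative, not taken from the study.
V(r, d) = \frac{r}{1 + k\,d}
```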

Journal: Journal of Neurophysiology, 2015
Ali A. Nikooyan, Alaa A. Ahmed

Recent findings have demonstrated that reward feedback alone can drive motor learning. However, it is not yet clear whether reward feedback alone can lead to learning when a perturbation is introduced abruptly, or how a reward gradient can modulate learning. In this study, we provide reward feedback that decays continuously with increasing error. We asked whether it is possible to learn an abru...
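
A reward signal that "decays continuously with increasing error," as described above, can be sketched as follows; the Gaussian profile and the width parameter sigma are assumptions for illustration, not the exact gradient used in the study.

```python
import numpy as np

def graded_reward(error, sigma=1.0):
    """Reward that decays continuously as error grows.

    The Gaussian shape and width `sigma` are illustrative assumptions;
    the study's exact reward gradient may differ.
    """
    return np.exp(-(error ** 2) / (2 * sigma ** 2))

# Example: reward shrinks smoothly with movement error.
for e in [0.0, 0.5, 1.0, 2.0]:
    print(f"error={e:.1f} -> reward={graded_reward(e):.3f}")
```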

2014
Victor Costumero, Juan Carlos Bustamante, Noelia Ventura-Campos, Paola Fuentes, César Ávila, Alfonso Barrós-Loscertales

Reward sensitivity, or the tendency to engage in motivated approach behavior in the presence of rewarding stimuli, may be a contributing factor to vulnerability to disinhibitory behaviors. While evidence exists for a reward sensitivity-related increase in the response of reward brain areas (e.g., nucleus accumbens or midbrain) during the processing of reward cues, it is unknown how this trait modulate...

2017
Joe Collenette, Katie Atkinson, Daan Bloembergen, Karl Tuyls

Simulating mood within a decision-making process has been shown to allow cooperation to occur within the Prisoner's Dilemma. In this paper we propose how to integrate a mood model into the classical reinforcement learning algorithm Sarsa, and show how this addition can allow self-interested agents to be successful within a multi-agent environment. The human-inspired moody agent will learn to co...
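
The excerpt does not specify the mood dynamics, so the sketch below shows only one plausible way a scalar mood state could enter a tabular Sarsa update; the mood update rule and all parameters are assumptions, not the paper's model.

```python
import random
from collections import defaultdict

class MoodySarsa:
    """Tabular Sarsa with a scalar mood state (illustrative sketch).

    Mood biases the perceived reward and drifts toward recent TD
    errors; both choices are assumptions, not the paper's model.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 mood_rate=0.05):
        self.q = defaultdict(float)   # (state, action) -> value estimate
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.mood = 0.0
        self.mood_rate = mood_rate

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)   # explore
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s2, a2):
        perceived_r = r + self.mood   # mood colours the reward signal
        td_error = (perceived_r + self.gamma * self.q[(s2, a2)]
                    - self.q[(s, a)])
        self.q[(s, a)] += self.alpha * td_error
        # Mood drifts toward recent TD errors (assumed dynamics).
        self.mood += self.mood_rate * (td_error - self.mood)
```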

Journal: CoRR, 2015
Yusen Zhan, Matthew E. Taylor

This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with ...
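
For reference, the two update rules the convergence analysis concerns differ only in their bootstrap target; a minimal tabular sketch (the transfer framework itself is not reproduced here):

```python
from collections import defaultdict

def q_learning_update(q, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
    """Off-policy target: maximizes over the next state's actions."""
    target = r + gamma * max(q[(s2, a2)] for a2 in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

def sarsa_update(q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    """On-policy target: bootstraps on the action actually taken next."""
    target = r + gamma * q[(s2, a2)]
    q[(s, a)] += alpha * (target - q[(s, a)])

# Shared tabular value store; unseen pairs default to 0.0.
q = defaultdict(float)
```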

2006
Laurissa N. Tokarchuk, John Bigham, Laurie G. Cuthbert

Reinforcement learning (RL) is a machine learning technique for sequential decision making. This approach is well proven in many small-scale domains. The true potential of this technique cannot be fully realised until it can adequately deal with the large domain sizes that typically describe real-world problems. RL with function approximation is one method of dealing with the domain size proble...
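
As background for the approach described, a minimal sketch of semi-gradient Sarsa with a linear approximator; the block one-hot feature map is a placeholder assumption (real systems often use tile coding or similar):

```python
import numpy as np

def features(state, action, n_actions):
    """Block one-hot feature map: the raw state vector is copied into
    the slot for the chosen action (a placeholder; real systems often
    use tile coding or RBFs)."""
    state = np.asarray(state, dtype=float)
    phi = np.zeros(state.size * n_actions)
    phi[action * state.size:(action + 1) * state.size] = state
    return phi

class LinearSarsa:
    """Sarsa with a linear approximator: Q(s, a) = w . phi(s, a)."""

    def __init__(self, state_dim, n_actions, alpha=0.01, gamma=0.99):
        self.n_actions = n_actions
        self.w = np.zeros(state_dim * n_actions)
        self.alpha, self.gamma = alpha, gamma

    def q(self, s, a):
        return self.w @ features(s, a, self.n_actions)

    def update(self, s, a, r, s2, a2):
        # Semi-gradient step: move w along phi(s, a), scaled by TD error.
        td_error = r + self.gamma * self.q(s2, a2) - self.q(s, a)
        self.w += self.alpha * td_error * features(s, a, self.n_actions)
```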

Journal: CoRR, 2018
Kamil Ciosek, Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by Expected Sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussi...
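
For a discrete action space, the "integrate across actions" idea can be contrasted with the sampled-action estimator as below; this sketch assumes a softmax policy, whereas the paper's headline results concern continuous Gaussian policies:

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def spg_estimate(logits, q_values, sampled_action):
    """Stochastic policy gradient: uses only the sampled action."""
    p = softmax(logits)
    grad_logp = -p.copy()
    grad_logp[sampled_action] += 1.0   # d log pi(a|s) / d logits
    return grad_logp * q_values[sampled_action]

def epg_estimate(logits, q_values):
    """Expected-gradient style estimate: sums over all actions,
    weighting each action's score-function term by its probability."""
    p = softmax(logits)
    grad = np.zeros_like(logits, dtype=float)
    for a in range(len(logits)):
        grad_logp = -p.copy()
        grad_logp[a] += 1.0
        grad += p[a] * grad_logp * q_values[a]
    return grad
```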

2005
Peter Vamplew, Robert Ollington

In order to scale to problems with large or continuous state-spaces, reinforcement learning algorithms need to be combined with function approximation techniques. The majority of work on function approximation for reinforcement learning has so far focused either on global function approximation with a static structure (such as multi-layer perceptrons), or on constructive architectures using loc...
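
To illustrate the "local" side of the contrast drawn above, a minimal radial-basis-function feature layer, where each unit responds only near its own centre, unlike a global approximator such as a multi-layer perceptron; centres and width are chosen arbitrarily for illustration:

```python
import numpy as np

def rbf_features(state, centers, width=0.5):
    """Local features: each basis function is active only near its
    centre, so an update changes the value function only locally."""
    state = np.asarray(state, dtype=float)
    dists = np.linalg.norm(centers - state, axis=1)
    return np.exp(-(dists ** 2) / (2 * width ** 2))

# Example: a 1-D state space covered by evenly spaced centres.
centers = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
print(rbf_features([0.3], centers).round(3))
```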

Journal: Theor. Comput. Sci., 2007
Rocco De Nicola, Joost-Pieter Katoen, Diego Latella, Michele Loreti, Mieke Massink

The Temporal Mobile Stochastic Logic (MOSL) has been introduced in previous work by the authors for formulating properties of systems specified in STOKLAIM, a Markovian extension of KLAIM. The main purpose of MOSL is to address key functional aspects of global computing, such as distribution awareness, mobility, and security, and their integration with performance and dependability guarantees. In ...

[Chart: number of search results per year]