reward processes

Eliciting Additive Reward Functions for Markov Decision Processes

2011

Kevin Regan Craig Boutilier

Specifying the reward function of a Markov decision process (MDP) can be demanding, requiring human assessment of the precise quality of, and tradeoffs among, various states and actions. However, reward functions often possess considerable structure which can be leveraged to streamline their specification. We develop new, decisiontheoretically sound heuristics for eliciting rewards for factored...

متن کامل

A transient dopamine signal encodes subjective value and causally influences demand in an economic context

2017

Scott A. Schelp Katherine J. Pultorak Dylan R. Rakowski Devan M. Gomez Gregory Krzystyniak Raibatak Das Erik B. Oleson

The mesolimbic dopamine system is strongly implicated in motivational processes. Currently accepted theories suggest that transient mesolimbic dopamine release events energize reward seeking and encode reward value. During the pursuit of reward, critical associations are formed between the reward and cues that predict its availability. Conditioned by these experiences, dopamine neurons begin to...

متن کامل

Reward associations impact both iconic and visual working memory

Journal: :Vision Research 2015

Elisa Infanti Clayton Hickey Massimo Turatto

Reward plays a fundamental role in human behavior. A growing number of studies have shown that stimuli associated with reward become salient and attract attention. The aim of the present study was to extend these results into the investigation of iconic memory and visual working memory. In two experiments we asked participants to perform a visual-search task where different colors of the target...

متن کامل

Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling.

Journal: :Psychological review 2007

A David Redish Steve Jensen Adam Johnson Zeb Kurth-Nelson

Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL models are based on the hypothesis that dopamine ...

متن کامل

Negative reward expectations in Borderline Personality Disorder patients: neurophysiological evidence.

Journal: :Biological psychology 2013

Daniel Vega Àngel Soto Julià L Amengual Joan Ribas Rafael Torrubia Antoni Rodríguez-Fornells Josep Marco-Pallarés

Borderline Personality Disorder (BPD) patients present profound disturbances in affect regulation and impulse control which could reflect a dysfunction in reward-related processes. The current study investigated these processes in a sample of 18 BPD patients and 18 matched healthy controls, using an event-related brain potentials methodology. Results revealed a reduction in the amplitude of the...

متن کامل

Performability Analysis Us ing Semi-Markov Reward Processes

1990

GIANFRANCO CIARDO KISHOR S. TRIVEDI

With the increasing complexity of multiprocessor and distributed processing systems, the need to develop efficient and accurate modeling methods is evident. Fault tolerance and degradable performance of such systems has given rise to considerable interest in models for the combined evaluation of performance and reliability [l], [2]. Markov or semi-Markov reward models can be used to evaluate th...

متن کامل

Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks

Journal: :CoRR 2015

Reinaldo Uribe Fernando Lozano Charles Anderson

This paper describes a novel method to solve average-reward semi-Markov decision processes, by reducing them to a minimal sequence of cumulative reward problems. The usual solution methods for this type of problems update the gain (optimal average reward) immediately after observing the result of taking an action. The alternative introduced, optimal nudging, relies instead on setting the gain t...

متن کامل

Reward Networks in the Brain as Captured by Connectivity Measures

2009

Estela Camara Antoni Rodriguez-Fornells Zheng Ye Thomas F. Münte

An assortment of human behaviors is thought to be driven by rewards including reinforcement learning, novelty processing, learning, decision making, economic choice, incentive motivation, and addiction. In each case the ventral tegmental area/ventral striatum (nucleus accumbens) (VTA-VS) system has been implicated as a key structure by functional imaging studies, mostly on the basis of standard...

متن کامل

Simulation-based optimization of Markov decision processes: An empirical process theory approach

Journal: :Automatica 2010

Rahul Jain Pravin Varaiya

We generalize and build on the PAC Learning framework for Markov Decision Processes developed in Jain and Varaiya (2006). We consider the reward function to depend on both the state and the action. Both the state and action spaces can potentially be countably infinite. We obtain an estimate for the value function of a Markov decision process, which assigns to each policy its expected discounted...

متن کامل

On the Undecidability of Probabilistic Planning and In nite-Horizon Partially Observable Markov Decision Problems

1999

Omid Madani Steve Hanks Anne Condon

We investigate the computability of problems in probabilistic planning and partially observable innnite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic nite automata is adapted to show that the following problem of plan existence in probabilistic planning is undecidable: given a probabilistic planning problem, determine whether there exist...

متن کامل