reward processes

Search results for: reward processes

Number of results: 554393

Exact Distributions for Reward Functions on Semi-Markov and Markov Additive Processes

2006

Valeri T. Stefanov

The distribution theory for reward functions on semi-Markov processes has been of interest since the early 1960s. The relevant asymptotic distribution theory has been satisfactorily developed. On the other hand, it has been noticed that it is difficult to find exact distribution results which lead to the effective computation of such distributions. Note that there is no satisfactory exact distr...

Striatum processes reward differently in adolescents versus adults.

Journal: Proceedings of the National Academy of Sciences of the United States of America 2012

David A Sturman Bita Moghaddam

Adolescents often respond differently than adults to the same salient motivating contexts, such as peer interactions and pleasurable stimuli. Delineating the neural processing differences of adolescents is critical to understanding this phenomenon, as well as the bases of serious behavioral and psychiatric vulnerabilities, such as drug abuse, mood disorders, and schizophrenia. We believe that a...

On Markov Decision Processes with Pseudo-Boolean Reward Functions

2018

Michael N. Katehakis

Simulation-Based Optimization of Markov Reward Processes: Implementation Issues

1999

Peter Marbach John N. Tsitsiklis

We consider discrete-time, finite-state-space Markov reward processes which depend on a set of parameters. Previously, we proposed a simulation-based methodology to tune the parameters to optimize the average reward. The resulting algorithms converge with probability 1, but may have a high variance. Here we propose two approaches to reduce the variance, which however introduce a new bias into t...

Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes

Journal: Math. Oper. Res. 2012

Eugene A. Feinberg Uriel G. Rothblum

This paper studies a discrete-time total-reward Markov decision process (MDP) with a given initial state distribution. A (randomized) stationary policy can be split on a given set of states if the occupancy measure of this policy can be expressed as a convex combination of the occupancy measures of stationary policies, each selecting deterministic actions on the given set and coinciding with th...

Pseudometrics for State Aggregation in Average Reward Markov Decision Processes

2007

Ronald Ortner

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be caused by working on the aggregated instead of the original MDP are given and compared ...

Sensitivity Analysis in Markov Decision Processes with Uncertain Reward Parameters

2011

Chin Hon Tan Joseph C. Hartman

Sequential decision problems can often be modeled as Markov decision processes. Classical solution approaches assume that the parameters of the model are known. However, model parameters are usually estimated and uncertain in practice. As a result, managers are often interested in how estimation errors affect the optimal solution. In this paper we illustrate how sensitivity analysis can be perf...

Geometric convergence in average reward Markov decision processes

2017

W. H. M.

On mean reward variance in semi-Markov processes

Journal: Math. Meth. of OR 2005

Karel Sladký

As an extension of the discrete-time case, this note investigates the variance of the total cumulative reward for the embedded Markov chain of semi-Markov processes. Under the assumption that the chain is aperiodic and contains a single class of recurrent states, recursive formulae for the variance are obtained which show that the variance growth rate is asymptotically linear in time. Expression...

Cannabinoid conditioned reward and aversion: behavioral and neural processes.

Journal: ACS chemical neuroscience 2010

Jennifer E Murray Rick A Bevins

The discovery that delta-9-tetrahydrocannabinol (Δ9-THC) is the primary psychoactive ingredient in marijuana prompted research that helped elucidate the endogenous cannabinoid system of the brain. Δ9-THC and other cannabinoid ligands with agonist action (CP 55,940, HU210, and WIN 55,212-2) increase firing of dopamine neurons and increase synaptic dopamine in brain regions associated with re...
