reward

Online Transfer Learning in Reinforcement Learning Domains

Journal: CoRR 2015

Yusen Zhan Matthew E. Taylor

This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with ...


Fuzzy and Tile Coding Function Approximation in Agent Coevolution

2006

Laurissa N. Tokarchuk John Bigham Laurie G. Cuthbert

Reinforcement learning (RL) is a machine learning technique for sequential decision making. This approach is well proven in many small-scale domains. The true potential of this technique cannot be fully realised until it can adequately deal with the large domain sizes that typically describe real world problems. RL with function approximation is one method of dealing with the domain size proble...


Expected Policy Gradients for Reinforcement Learning

Journal: CoRR 2018

Kamil Ciosek Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussi...


Global Versus Local Constructive Function Approximation for On-Line Reinforcement Learning

2005

Peter Vamplew Robert Ollington

In order to scale to problems with large or continuous state-spaces, reinforcement learning algorithms need to be combined with function approximation techniques. The majority of work on function approximation for reinforcement learning has so far focused either on global function approximation with a static structure (such as multi-layer perceptrons), or on constructive architectures using loc...


Model checking mobile stochastic logic

Journal: Theor. Comput. Sci. 2007

Rocco De Nicola Joost-Pieter Katoen Diego Latella Michele Loreti Mieke Massink

The Temporal Mobile Stochastic Logic (MOSL) has been introduced in previous work by the authors for formulating properties of systems specified in STOKLAIM, a Markovian extension of KLAIM. The main purpose of MOSL is to address key functional aspects of global computing such as distribution awareness, mobility, and security and their integration with performance and dependability guarantees. In ...


Slothrop: Knuth-Bendix Completion with a Modern Termination Checker

2006

Ian Wehrman Aaron Stump Edwin M. Westbrook

A Knuth-Bendix completion procedure is parametrized by a reduction ordering used to ensure termination of intermediate and resulting rewriting systems. While in principle any reduction ordering can be used, modern completion tools typically implement only Knuth-Bendix and path orderings. Consequently, the theories for which completion can possibly yield a decision procedure are limited to those...


The Effectiveness of a Reward-Driven Task on the Affective Levels of Depressed Individuals

Journal: Iranian Journal of Psychiatry and Clinical Psychology 2018

Niloufar Etemadi Chahardeh, Abbas Bakhshipour, Alireza Karimpour Vazifehkhorani, Hossein Kamali Ghasemabadi

Objectives: The present study examined the effects of a reward-driven task on improving affective levels in individuals with depressive symptoms. Methods: The present study is an experimental study with a pretest-posttest and follow-up design and a control group. The research population was the students of Tabriz University in the 2016-2017 semester. The sample comprised 40 students who had visited t...


A Fully Syntactic AC-RPO

Journal: Inf. Comput. 1999

Albert Rubio

We present the first fully syntactic (i.e., non-interpretation-based) AC-compatible recursive path ordering (RPO). It is simple, and hence easy to implement, and its behaviour is intuitive as in the standard RPO. The ordering is AC-total and defined uniformly for both ground and nonground terms, as well as for partial precedences. More important, it is the first one that can deal incrementally ...


Delayed Reward Discounting and Alcohol Misuse: The Roles of Response Consistency and Reward Magnitude.

Journal: Journal of Experimental Psychopathology 2011

Michael Amlung James MacKillop

Delayed reward discounting (DRD) is a common index of impulsivity that refers to an individual's devaluation of rewards based on delay of receipt and has been linked to alcohol misuse and other maladaptive behaviors. The current study investigated response consistency and reward magnitude effects in two measures of DRD in a sample of 111 undergraduates who consumed an average of 10.7 drinks/wee...


Online Control With Least-Squares Methods

2007

Least-squares policy evaluation techniques (such as LSTD and iLSTD) have been shown to estimate the value of a policy with far less data than traditional TD techniques. Unfortunately, they rely on policy-dependent statistics that have to be discarded when the policy changes, which makes them difficult to use for online control problems. In this paper, we explore the effect...
