Search results for: reward
Number of results: 29,303
This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with ...
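The snippet above refers to the convergence of tabular Q-learning. As a point of reference, here is a minimal sketch of the standard Q-learning update rule; the table layout and state/action names are illustrative, not taken from the paper:

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# Tiny two-state example with actions "L" and "R"
Q = {0: {"L": 0.0, "R": 0.0}, 1: {"L": 0.0, "R": 0.0}}
q_learning_update(Q, 0, "R", 1.0, 1)
print(Q[0]["R"])  # 0.1
```

Sarsa differs only in backing up the value of the action actually taken next rather than the greedy maximum, which is what makes the two algorithms' convergence analyses diverge.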
Reinforcement learning (RL) is a machine learning technique for sequential decision making. This approach is well proven in many small-scale domains. The true potential of this technique cannot be fully realised until it can adequately deal with the large domain sizes that typically describe real-world problems. RL with function approximation is one method of dealing with the domain size proble...
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected Sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussi...
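The snippet credits Expected Sarsa as the inspiration for integrating across actions. A minimal tabular Expected Sarsa update (a sketch with illustrative names, not the EPG algorithm itself) shows the idea of backing up the expectation over next actions under the policy rather than the single sampled next action:

```python
def expected_sarsa_update(Q, pi, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Expected Sarsa step: the target uses the policy-weighted average
    sum_a' pi(a'|s') * Q(s',a') instead of a sampled Q(s',a')."""
    expected_q = sum(pi[s_next][a2] * Q[s_next][a2] for a2 in Q[s_next])
    Q[s][a] += alpha * (r + gamma * expected_q - Q[s][a])
    return Q[s][a]

# Demo: under a uniform policy over two next actions, the target averages them.
Q = {"s0": {"a": 0.0}, "s1": {"a": 2.0, "b": 4.0}}
pi = {"s1": {"a": 0.5, "b": 0.5}}
expected_sarsa_update(Q, pi, "s0", "a", 1.0, "s1")
```

EPG applies the same averaging idea at the level of the policy-gradient estimator, integrating over the action distribution rather than over a discrete table.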
In order to scale to problems with large or continuous state-spaces, reinforcement learning algorithms need to be combined with function approximation techniques. The majority of work on function approximation for reinforcement learning has so far focused either on global function approximation with a static structure (such as multi-layer perceptrons), or on constructive architectures using loc...
The Temporal Mobile Stochastic Logic (MOSL) has been introduced in previous work by the authors for formulating properties of systems specified in STOKLAIM, a Markovian extension of KLAIM. The main purpose of MOSL is to address key functional aspects of global computing such as distribution awareness, mobility, and security and their integration with performance and dependability guarantees. In ...
A Knuth-Bendix completion procedure is parametrized by a reduction ordering used to ensure termination of intermediate and resulting rewriting systems. While in principle any reduction ordering can be used, modern completion tools typically implement only Knuth-Bendix and path orderings. Consequently, the theories for which completion can possibly yield a decision procedure are limited to those...
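Completion needs a reduction ordering to orient rewrite rules. One of the path orderings the snippet mentions is the lexicographic path ordering (LPO); a minimal sketch follows, where the term encoding (variables as strings, applications as tuples) and names are my own, not any particular tool's:

```python
def occurs(x, s):
    """True if variable x occurs anywhere in term s."""
    if isinstance(s, str):
        return s == x
    return any(occurs(x, si) for si in s[1:])

def lpo_gt(s, t, prec):
    """True if s > t in the lexicographic path ordering, given a precedence
    prec mapping function symbols to integers (higher = greater)."""
    if isinstance(t, str):              # t is a variable: s > t iff t occurs properly in s
        return s != t and occurs(t, s)
    if isinstance(s, str):              # a variable never dominates a non-variable
        return False
    f, *ss = s
    g, *ts = t
    # Case 1: some immediate subterm of s equals or dominates t
    if any(si == t or lpo_gt(si, t, prec) for si in ss):
        return True
    # Remaining cases require s to dominate every argument of t
    if not all(lpo_gt(s, tj, prec) for tj in ts):
        return False
    if prec[f] > prec[g]:               # Case 2: head symbol of s has higher precedence
        return True
    if f == g:                          # Case 3: lexicographic comparison of arguments
        for si, ti in zip(ss, ts):
            if si != ti:
                return lpo_gt(si, ti, prec)
    return False

# With precedence * > +, LPO orients distributivity x*(y+z) -> x*y + x*z
prec = {"*": 2, "+": 1}
lhs = ("*", "x", ("+", "y", "z"))
rhs = ("+", ("*", "x", "y"), ("*", "x", "z"))
print(lpo_gt(lhs, rhs, prec))  # True
```

An AC-compatible RPO, as discussed in the result below, additionally has to treat arguments of associative-commutative symbols as multisets rather than sequences, which this plain LPO does not attempt.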
Objectives The present study examined the effects of a reward-driven task on improving affective levels in individuals with depressive symptoms. Methods The present study is an experimental study with a pretest-posttest and follow-up design with a control group. The population of this research was students at Tabriz University in the 2016-2017 academic year. The sample size was 40 students who had visited t...
We present the first fully syntactic (i.e., non-interpretation-based) AC-compatible recursive path ordering (RPO). It is simple, and hence easy to implement, and its behaviour is intuitive as in the standard RPO. The ordering is AC-total and defined uniformly for both ground and nonground terms, as well as for partial precedences. More importantly, it is the first one that can deal incrementally ...
Delayed reward discounting (DRD) is a common index of impulsivity that refers to an individual's devaluation of rewards based on delay of receipt and has been linked to alcohol misuse and other maladaptive behaviors. The current study investigated response consistency and reward magnitude effects in two measures of DRD in a sample of 111 undergraduates who consumed an average of 10.7 drinks/wee...
Policy evaluation using least-squares techniques (such as LSTD and iLSTD) has been shown to estimate the value of a policy with far less data than traditional TD techniques. Unfortunately, these techniques make use of policy-dependent statistics that have to be discarded when the policy changes. This makes it difficult to use them for online control problems. In this paper, we explore the effect...
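The policy-dependent statistics the snippet refers to are the matrix A and vector b that LSTD accumulates from observed transitions before solving A·theta = b once. A minimal sketch for a 2-dimensional feature map, with a hand-rolled 2x2 solve so the example stays self-contained (feature encoding and names are illustrative):

```python
def lstd(transitions, phi, gamma=0.9):
    """Least-squares TD: accumulate A += phi(s) (phi(s) - gamma*phi(s'))^T
    and b += phi(s) * r over all (s, r, s') transitions, then solve for theta."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for s, r, s_next in transitions:
        x, x2 = phi(s), phi(s_next)
        diff = [x[i] - gamma * x2[i] for i in range(2)]
        for i in range(2):
            b[i] += x[i] * r
            for j in range(2):
                A[i][j] += x[i] * diff[j]
    # Solve the 2x2 system A theta = b by Cramer's rule
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    theta0 = (b[0] * A[1][1] - b[1] * A[0][1]) / det
    theta1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return [theta0, theta1]

# Two-state chain with one-hot features: state 0 pays reward 1 then moves to
# absorbing state 1, which pays 0. So V(1) = 0 and V(0) = 1.
phi = lambda s: [1.0, 0.0] if s == 0 else [0.0, 1.0]
theta = lstd([(0, 1.0, 1), (1, 0.0, 1)], phi)
print(theta)  # approximately [1.0, 0.0]
```

Because A and b are built from transitions generated by one policy, a policy change invalidates them, which is exactly the obstacle to online control that the paper investigates.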