Search results for: passive critic

Number of results: 73280

2001
Alec M. Rogers

In the context of fuzzy control, antecedent parameters are used to provide a segmentation of the state space so that different regions can be modeled appropriately. In Adaptive Critic methodologies, two modules (the critic and the controller) must properly segment the state space to ensure good performance. In this paper, we explore the effects of tuning antecedent parameters that are shared be...
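The role of antecedent parameters described above can be illustrated with a minimal sketch: each Gaussian membership function covers one region of the state space, and tuning its center and width changes which region a fuzzy rule is responsible for. The specific centers and widths below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def memberships(x, centers, widths):
    """Degree to which state x belongs to each antecedent region
    (Gaussian membership functions; parameters are illustrative)."""
    return np.exp(-((x - centers) ** 2) / (2 * widths ** 2))

centers = np.array([-1.0, 0.0, 1.0])   # one rule per region of the state axis
widths = np.array([0.5, 0.5, 0.5])     # region widths (antecedent parameters)

mu = memberships(0.0, centers, widths)  # firing strength of each rule at x = 0
weights = mu / mu.sum()                 # normalized rule weights
```

Tuning `centers` and `widths` (e.g. by gradient descent on a critic signal) shifts the segmentation of the state space among the rules.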

Journal: Journal of the American Medical Association, 1908

Journal: The Old Testament Student, 1885

2007
K. Wendy Tang

Heuristic Dynamic Programming (HDP) is the simplest kind of Adaptive Critic, which is a powerful form of reinforcement control [1]. It can be used to maximize or minimize any utility function of a system over time, such as total energy or trajectory error, in a noisy environment. Unlike supervised learning, adaptive critic design does not require the desired control signals to be known. Instead, fee...
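The core of HDP is a critic trained to satisfy a Bellman-style consistency: the cost-to-go estimate J(s) is pulled toward U(s) + γJ(s′), with no desired control signal required. A minimal sketch with a linear critic (the features, utility value, and step sizes are illustrative assumptions):

```python
import numpy as np

def hdp_critic_update(w, s, s_next, utility, gamma=0.95, lr=0.1):
    """One TD-style update of a linear critic J(s) = w . s
    toward the Bellman target U(s) + gamma * J(s')."""
    j = w @ s                                 # current cost-to-go estimate
    target = utility + gamma * (w @ s_next)   # bootstrapped target
    return w + lr * (target - j) * s          # gradient step on squared TD error

w = np.zeros(2)                               # critic weights
s, s_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w = hdp_critic_update(w, s, s_next, utility=1.0)
```

In a full HDP design the critic would be a neural network and the controller would be trained against the critic's estimate of future utility.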

2007
Jan Peters Stefan Schaal

In this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists of actor updates, which are achieved using natural stochastic policy gradients, while the critic obtains the natural policy gradient by linear regression. We show that this architecture can be used to learn the “building blocks of movement generation”, called motor ...
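The key property exploited by the Natural Actor-Critic is that regressing advantages onto the compatible features ∇θ log π yields a weight vector w that is itself the natural policy gradient, so the actor can simply step θ ← θ + αw. A toy sketch under that assumption (the synthetic features and advantages are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
grad_log_pi = rng.normal(size=(100, 3))   # compatible features, one row per sample
true_w = np.array([0.5, -1.0, 2.0])
advantages = grad_log_pi @ true_w         # noiseless advantages for illustration

# Critic: least-squares regression of advantages on compatible features.
# The solution w is the natural policy gradient direction.
w, *_ = np.linalg.lstsq(grad_log_pi, advantages, rcond=None)

# Actor: natural-gradient step on the policy parameters.
theta = np.zeros(3)
theta += 0.1 * w
```

With noiseless data the regression recovers the generating weights exactly; in practice the critic uses sampled returns, e.g. via LSTD-Q(λ).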

Journal: ادبیات پایداری (Resistance Literature)

Journal: CoRR, 2018
Hamid Reza Maei

We present the first class of policy-gradient algorithms that work with both state-value and policy function approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning where the action representation adds to the curse of dimensionality; that is, with continuous or large action sets, thus making it infeasible to estimate state-...

2014
Kimberly L. Stachenfeld Matthew M. Botvinick Samuel J. Gershman

The SR-based critic learns an estimate of the value function, using the SR as its feature representation. Unlike standard actor-critic methods, the critic does not use reward-based temporal difference errors to update its value estimate; instead, it relies on the fact that the value function is given by V(s) = Σ_{s′} M(s, s′) R(s′), where M is the successor representation and R is the expected re...
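The identity V(s) = Σ_{s′} M(s, s′) R(s′) means the value function is just a matrix-vector product once the successor representation M and the reward vector R are known. A toy check on a three-state chain (the particular M and R values are illustrative, chosen as if the agent moves deterministically right with discount 0.9):

```python
import numpy as np

# M(s, s') = expected discounted future occupancy of s' starting from s
M = np.array([[1.0, 0.9, 0.81],
              [0.0, 1.0, 0.90],
              [0.0, 0.0, 1.00]])
R = np.array([0.0, 0.0, 1.0])   # expected reward in each state

V = M @ R                        # value function, no TD errors needed
```

Reward revaluation comes for free: if R changes, recomputing `M @ R` updates V immediately without relearning M.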

2013
A. Shafiekhani M. J. Mahjoob M. Roozegar

In this work, an adaptive critic-based neuro-fuzzy controller is presented for an unmanned bicycle. The only information available to the critic agent is the system feedback, which is interpreted as the last action the controller performed in the previous state. The signal produced by the critic agent is used alongside the back-propagation-of-error algorithm to tune online the conclusion parts of the fuzz...
