Search results for: critic

Number of results: 2831

Journal: CoRR 2013
Keyvan Yahya

This paper aims to find an algorithmic structure that can predict and explain economic choice behaviour, particularly under uncertainty (random policies), by adapting the prevalent Actor-Critic learning method to meet the requirements that have emerged since the field of neuroeconomics arose. Whilst skimming some basics of neuroeconomics that might be relev...

2016
Ngo Anh Vien Peter Englert Marc Toussaint

Modeling policies in reproducing kernel Hilbert space (RKHS) renders policy gradient reinforcement learning algorithms non-parametric. As a result, the policies become very flexible and have a rich representational potential without a predefined set of features. However, their performance might be either non-covariant under reparameterization of the chosen kernel, or very sensitive to step-siz...

2016
Huitian Lei

An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention, by Huitian Lei. Chair: Professor Susan A. Murphy, Assistant Professor Ambuj Tewari. Increasing technological sophistication and widespread use of smartphones and wearable devices provide opportunities for innovative health interventions. An Adaptive Intervention (AI) personalizes the type, mode and...

2002
Matti Aksela Ramunas Girdziusas Jorma Laaksonen Erkki Oja Jari Kangas

This paper discusses a combination of two techniques for improving the recognition accuracy of on-line handwritten character recognition: committee classification and adaptation to the user. A novel adaptive committee structure, namely the Class-Confidence Critic Combination (CCCC) scheme, is presented and evaluated. It is shown to be able to improve significantly on its member classifiers. Als...

1987
Gerhard Fischer

Our goal is to establish the conceptual foundations for using the computational power that is or will be available on computer systems. Much of the available computing power is wasted, however, if users have difficulty understanding and using the full potential of these systems. Too much attention in the past has been given to the technology of computer systems and not enough to the effects of...

Journal: CMAJ: Canadian Medical Association journal = journal de l'Association medicale canadienne 2009
Julie Strong

Journal: CoRR 2018
Mikolaj Binkowski Dougal J. Sutherland Michael Arbel Arthur Gretton

We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning...

Journal: CoRR 2018
Xiaoqin Zhang Huimin Ma

Pretraining with expert demonstrations has been found useful in speeding up the training process of deep reinforcement learning algorithms, since less online simulation data is required. Some people use supervised learning to speed up the process of feature learning; others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-c...

2012
M. Sedighizadeh A. Rezazadeh

A self-tuning PID control strategy using reinforcement learning is proposed in this paper to deal with the control of wind energy conversion systems (WECS). Actor-Critic learning is used to tune the PID parameters adaptively, taking advantage of the model-free and on-line learning properties of reinforcement learning. In order to reduce the demand of storage space and to impro...

Journal: Adaptive Behaviour 2005
Mehdi Khamassi Loïc Lachèze Benoît Girard Alain Berthoz Agnès Guillot

Since 1995, numerous Actor-Critic architectures for reinforcement learning have been proposed as models of dopamine-like reinforcement learning mechanisms in the rat’s basal ganglia. However, these models were usually tested on different tasks, which makes it difficult to compare their efficiency for an autonomous animat. We present here the comparison of four architectures in an animat as it p...
