Search results for: critic
Number of results: 2831
This paper aims to find an algorithmic structure that can predict and explain economic choice behaviour, particularly under uncertainty (random policies), by adapting the prevalent Actor-Critic learning method to comply with the requirements that have been placed on us ever since the field of neuroeconomics emerged. Whilst skimming some basics of neuroeconomics that might be relev...
Modeling policies in reproducing kernel Hilbert space (RKHS) renders policy gradient reinforcement learning algorithms non-parametric. As a result, the policies become very flexible and have a rich representational potential without a predefined set of features. However, their performances might be either non-covariant under reparameterization of the chosen kernel, or very sensitive to step-siz...
An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention
By Huitian Lei. Chair: Professor Susan A. Murphy, Assistant Professor Ambuj Tewari. Increasing technological sophistication and widespread use of smartphones and wearable devices provide opportunities for innovative health interventions. An Adaptive Intervention (AI) personalizes the type, mode and...
This paper discusses a combination of two techniques for improving the recognition accuracy of on-line handwritten character recognition: committee classification and adaptation to the user. A novel adaptive committee structure, namely the Class-Confidence Critic Combination (CCCC) scheme, is presented and evaluated. It is shown to be able to improve significantly on its member classifiers. Als...
Our goal is to establish the conceptual foundations for using the computational power that is or will be available on computer systems. Much of the available computing power is wasted, however, if users have difficulty understanding and using the full potential of these systems. Too much attention in the past has been given to the technology of computer systems and not enough to the effects of...
We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning...
Pretraining with expert demonstrations has been found useful in speeding up the training process of deep reinforcement learning algorithms, since less online simulation data is required. Some approaches use supervised learning to speed up feature learning; others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-c...
A self-tuning PID control strategy using reinforcement learning is proposed in this paper for the control of wind energy conversion systems (WECS). Actor-Critic learning is used to tune the PID parameters adaptively, taking advantage of the model-free and on-line learning properties of reinforcement learning. In order to reduce the demand for storage space and to impro...
Since 1995, numerous Actor-Critic architectures for reinforcement learning have been proposed as models of dopamine-like reinforcement learning mechanisms in the rat’s basal ganglia. However, these models were usually tested in different tasks, and it is then difficult to compare their efficiency for an autonomous animat. We present here the comparison of four architectures in an animat as it p...