critic

Stochastic Control Strategies and Adaptive Critic Methods

2008

Randa Herzallah David Lowe

Adaptive critic methods have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Since they approximate the dynamic programming solutions, they are potentially suitable for learning in noisy, nonlinear and nonstationary environments. In this study, a novel probabilistic dual heuristic programming (DHP) based adaptive critic controller is proposed...


Robust Contextual Bandit via the Capped-$\ell_{2}$ norm

Journal: CoRR 2017

Feiyun Zhu Xinliang Zhu Sheng Wang Jiawen Yao Junzhou Huang

This paper considers the actor-critic contextual bandit for mobile health (mHealth) interventions. State-of-the-art decision-making methods in mHealth generally assume that the noise in the dynamic system follows a Gaussian distribution. Those methods use least-squares-based algorithms to estimate the expected reward, which makes them sensitive to outliers. To deal with the issue...
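The abstract above is truncated before it describes the proposed method, so the following is only a generic illustration of why capping a squared error limits the influence of outliers, as the title's "capped" norm suggests. It is not the paper's algorithm; the function name `capped_squared_loss` and the threshold `cap` are hypothetical.

```python
import numpy as np

def capped_squared_loss(residuals, cap):
    """Per-sample squared error, truncated at `cap` (a hypothetical threshold)."""
    residuals = np.asarray(residuals, dtype=float)
    return np.minimum(residuals ** 2, cap)

# Two small residuals and one large one standing in for an outlier.
residuals = np.array([0.1, -0.2, 8.0])
plain = residuals ** 2                        # the outlier dominates the total
capped = capped_squared_loss(residuals, 1.0)  # the outlier contributes at most `cap`
```

Under plain least squares the large residual dominates the total loss, whereas under the capped loss its contribution is bounded by `cap`.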


An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function

1998

Hajime Kimura Shigenobu Kobayashi

We present an analysis of actor/critic algorithms in which the actor updates its policy using eligibility traces of the policy parameters. Most of the theoretical results for eligibility traces have been only for the critic's value iteration algorithms. This paper investigates what the actor's eligibility trace does. The results show that the algorithm is an extension of Williams' REINFORCE algori...


Learning to Play Donkey Kong Using Neural Networks and Reinforcement Learning

2017

Paul Ozkohen Jelle Visser Martijn van Otterlo Marco Wiering

Neural networks and reinforcement learning have successfully been applied to various games, such as Ms. Pacman and Go. We combine multilayer perceptrons and a class of reinforcement learning algorithms known as actor-critic to learn to play the arcade classic Donkey Kong. Two neural networks are used in this study: the actor and the critic. The actor learns to select the best action given the g...


Variance Adjusted Actor Critic Algorithms

Journal: CoRR 2013

Aviv Tamar Shie Mannor

We present an actor-critic framework for MDPs where the objective is the variance-adjusted expected return. Our critic uses linear function approximation, and we extend the concept of compatible features to the variance-adjusted setting. We present an episodic actor-critic algorithm and show that it converges almost surely to a locally optimal point of the objective function. Index Terms Reinfo...


A Divergence Critic

1996

Adel Bouhoula Michael Rusinowitch Pierre Lescanne David Basin Alan Bundy Miki Hermann Andrew Ireland

Inductive theorem provers often diverge. This paper describes a simple critic, a computer program which monitors the construction of inductive proofs attempting to identify diverging proof attempts. Divergence is recognized by means of a "difference matching" procedure. The critic then proposes lemmas and generalizations which "ripple" these differences away so that the proof can go through with...


Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

Journal: CoRR 2017

Baolin Peng Xiujun Li Jianfeng Gao Jingjing Liu Yun-Nung Chen Kam-Fai Wong

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the ...


A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems

Journal: Automatica 2013

Shubhendu Bhasin R. Kamalapurkar Marcus Johnson Kyriakos G. Vamvoudakis Frank L. Lewis Warren E. Dixon

An online adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem for continuous-time uncertain nonlinear systems. A novel actor–critic–identifier (ACI) is proposed to approximate the Hamilton–Jacobi–Bellman equation using three neural network (NN) structures—actor and critic NNs approximate the optimal control and the optimal value function,...


Towards Feature Selection In Actor-Critic Algorithms

2007

Khashayar Rohanimanesh Nicholas Roy Russ Tedrake

Choosing features for the critic in actor-critic algorithms with function approximation is known to be a challenge. Too few critic features can lead to degeneracy of the actor gradient, and too many features may lead to slower convergence of the learner. In this paper, we show that a well-studied class of actor policies satisfy the known requirements for convergence when the actor features are ...


Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning

2017

Vivek Veeriah Harm van Seijen Richard S. Sutton

Multi-step methods are important in reinforcement learning (RL). Eligibility traces, the usual way of handling them, work well with linear function approximators. Recently, van Seijen (2016) introduced a delayed learning approach, without eligibility traces, for handling the multi-step λ-return with nonlinear function approximators. However, this was limited to action-value methods. In thi...
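For readers unfamiliar with the λ-return mentioned in this abstract, here is a minimal sketch of the standard recursive definition, G_t = r_{t+1} + γ[(1 − λ)V(s_{t+1}) + λG_{t+1}], computed backwards over one finished episode. It illustrates the quantity only, not the forward actor-critic method of the paper; the function name `lambda_returns` and its argument layout are assumptions made for this example.

```python
def lambda_returns(rewards, next_values, gamma, lam):
    """Compute λ-returns for one trajectory.

    rewards[t]     is r_{t+1}, the reward received after step t.
    next_values[t] is the estimate V(s_{t+1}); use 0.0 for a terminal state.
    """
    returns = [0.0] * len(rewards)
    # Bootstrap from the value of the last observed state.
    g = next_values[-1] if rewards else 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

rewards = [1.0, 1.0, 1.0]
next_values = [5.0, 5.0, 0.0]  # the episode ends after the third reward
monte_carlo = lambda_returns(rewards, next_values, gamma=0.5, lam=1.0)
one_step_td = lambda_returns(rewards, next_values, gamma=0.5, lam=0.0)
```

With λ = 1 the result is the discounted Monte Carlo return, and with λ = 0 it is the one-step TD target; intermediate values of λ interpolate between the two.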
