On using discretized Cohen-Grossberg node dynamics for model-free actor-critic neural learning in non-Markovian domains
Authors
Abstract
We describe how multi-stage non-Markovian decision problems can be solved using actor-critic reinforcement learning by assuming that a discrete version of Cohen-Grossberg node dynamics describes the node-activation computations of a neural network (NN). Our NN (i.e., the agent) renders the process Markovian implicitly and automatically in a totally model-free fashion, without learning by how much the state space must be augmented for the Markov property to hold. This serves as an alternative to Elman- or Jordan-type recurrent neural networks, whose context units function as a history memory in order to develop sensitivity to non-Markovian dependencies. We demonstrate the concept on a small-scale non-Markovian deterministic path problem, in which our actor-critic NN finds an optimal sequence of actions (learning neither the transition dynamics nor the associated rewards), although it needs many iterations owing to the nature of neural model-free learning. This is, in spirit, a neurodynamic programming approach.
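The abstract does not give the discretized update itself, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes the additive special case of Cohen-Grossberg dynamics (amplification function equal to 1, linear self-decay), a tanh activation, and a forward-Euler step of size h; the names cohen_grossberg_step, run, W, I, and h are hypothetical. It illustrates why such node dynamics can carry history: each node's activation depends on its own previous value, so two input sequences that end with the same input can still leave the network in different states.

```python
import numpy as np

def cohen_grossberg_step(x, W, I, h=0.1):
    """One forward-Euler step of dx/dt = -x + W @ tanh(x) + I.

    Assumed additive special case of Cohen-Grossberg dynamics;
    the abstract does not specify the discretization actually used.
    """
    return x + h * (-x + W @ np.tanh(x) + I)

def run(x0, W, inputs, h=0.1):
    """Apply the step once per external input and record every state."""
    x = np.asarray(x0, dtype=float)
    trajectory = []
    for I in inputs:
        x = cohen_grossberg_step(x, W, np.asarray(I, dtype=float), h)
        trajectory.append(x.copy())
    return np.array(trajectory)

if __name__ == "__main__":
    W = np.zeros((2, 2))  # no lateral coupling, to isolate the self-decay memory
    seq_a = [[1.0, 0.0], [0.0, 1.0]]
    seq_b = [[0.0, 0.0], [0.0, 1.0]]  # same final input, different history
    print(run([0.0, 0.0], W, seq_a, h=0.5)[-1])
    print(run([0.0, 0.0], W, seq_b, h=0.5)[-1])
```

In this sketch the final states differ even though the last input is identical, which is the kind of implicit sensitivity to past inputs that context units provide explicitly in Elman- or Jordan-type networks.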
Similar resources
Totally Model-Free Reinforcement Learning by Actor-Critic Elman Networks in Non-Markovian Domains
In this paper, we describe how an actor-critic reinforcement learning agent in a non-Markovian domain finds an optimal sequence of actions in a totally model-free fashion; that is, the agent neither learns transitional probabilities and associated rewards nor by how much the state space should be augmented so that the Markov property holds. In particular, we employ an Elman-type recurrent neural ne...
Robust stability of fuzzy Markov type Cohen-Grossberg neural networks by delay decomposition approach
In this paper, we investigate the delay-dependent robust stability of fuzzy Cohen-Grossberg neural networks with Markovian jumping parameters and mixed time-varying delays by the delay decomposition method. A new Lyapunov-Krasovskii functional (LKF) is constructed by nonuniformly dividing the discrete delay interval into multiple subintervals and choosing proper functionals with different weighting matr...
Learning to Play Donkey Kong Using Neural Networks and Reinforcement Learning
Neural networks and reinforcement learning have successfully been applied to various games, such as Ms. Pacman and Go. We combine multilayer perceptrons and a class of reinforcement learning algorithms known as actor-critic to learn to play the arcade classic Donkey Kong. Two neural networks are used in this study: the actor and the critic. The actor learns to select the best action given the g...
Online Learning of Optimal Control Solutions Using Integral Reinforcement Learning and Neural Networks
In this paper, we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time optimal control solution for nonlinear systems with infinite-horizon costs and partial knowledge of the system dynamics. This algorithm is a data-based approach to the solution of the Hamilton-Jacobi-Bellman equation, and it does not require explicit knowledge on the system’...
A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems
An online adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem for continuous-time uncertain nonlinear systems. A novel actor–critic–identifier (ACI) is proposed to approximate the Hamilton–Jacobi–Bellman equation using three neural network (NN) structures—actor and critic NNs approximate the optimal control and the optimal value function,...