Sequential Constant Size Compressors for Reinforcement Learning
Authors
Abstract
Traditional Reinforcement Learning methods are insufficient for AGIs, which must be able to learn to deal with Partially Observable Markov Decision Processes. We investigate a novel method for dealing with this problem: standard RL techniques using as input the hidden layer output of a Sequential Constant-Size Compressor (SCSC). The SCSC takes the form of a sequential Recurrent Auto-Associative Memory, trained through standard back-propagation. Results illustrate the feasibility of this approach: the system learns to deal with high-dimensional visual observations (up to 640 pixels) in partially observable environments where there are long time lags (up to 12 steps) between relevant sensory information and necessary action.
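The core idea of the abstract, folding a variable-length observation history into one fixed-size code that a standard RL method can treat as its state, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, weight initialisation, and the untrained forward pass are all assumptions, and the back-propagation training of the RAAM is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 640-pixel observations (matching the abstract)
# compressed into a 32-unit hidden code (an assumed size).
OBS_DIM, CODE_DIM = 640, 32

# Randomly initialised encoder weights; in the paper these would be
# trained via standard back-propagation on a reconstruction objective.
W_in = rng.normal(0, 0.1, (CODE_DIM, OBS_DIM))    # input -> hidden
W_rec = rng.normal(0, 0.1, (CODE_DIM, CODE_DIM))  # hidden -> hidden

def compress(observations):
    """Fold an observation sequence into one constant-size hidden code."""
    h = np.zeros(CODE_DIM)
    for obs in observations:
        h = np.tanh(W_in @ obs + W_rec @ h)  # recurrent update
    return h  # same shape regardless of how long the sequence was

# Histories of different lengths (e.g. across a 12-step time lag) yield
# codes of identical shape, so a memoryless RL method can consume them.
short_history = [rng.random(OBS_DIM) for _ in range(3)]
long_history = [rng.random(OBS_DIM) for _ in range(12)]
code_a, code_b = compress(short_history), compress(long_history)
```

The constant output size is the point: whatever the time lag between a relevant observation and the required action, the policy always sees a vector of the same shape.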
Similar Papers
Reinforcement Learning and Design of Nonparametric Sequential Decision Networks
In this paper we discuss the design of sequential detection networks for nonparametric sequential analysis. We present a general probabilistic model for sequential detection problems where the sample size as well as the statistics of the sample can be varied. A general sequential detection network handles three decisions. First, the network decides whether to continue sampling or stop and make ...
A novel genetic reinforcement learning for nonlinear fuzzy control problems
Unlike supervised learning, a reinforcement learning problem has only very simple "evaluative" or "critic" information available for learning, rather than "instructive" information. A novel genetic reinforcement learning, called reinforcement sequential-search-based genetic algorithm (R-SSGA), is proposed for solving the nonlinear fuzzy control problems in this paper. Unlike the traditio...
Efficient Approximate Policy Iteration Methods for Sequential Decision Making in Reinforcement Learning
(Computer Science, Machine Learning) Efficient Approximate Policy Iteration Methods for Sequential Decision Making in Reinforcement Learning
Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging
We consider d-dimensional linear stochastic approximation algorithms (LSAs) with a constant step-size and the so-called Polyak-Ruppert (PR) averaging of iterates. LSAs are widely applied in machine learning and reinforcement learning (RL), where the aim is to compute an appropriate θ∗ ∈ R^d (that is, an optimum or a fixed point) using noisy data and O(d) updates per iteration. In this paper, we ar...
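The scheme this abstract describes can be sketched concretely: iterate a noisy linear fixed-point update with a constant step-size, and keep a running (Polyak-Ruppert) average of the iterates. The system below is hypothetical (a small diagonal A, chosen noise level, and step-size are all assumptions for illustration), not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-dimensional linear system: the fixed point theta_star
# solves A @ theta = b.
d = 3
A = np.diag([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0])
theta_star = np.linalg.solve(A, b)

step = 0.05              # constant step-size (not decayed)
theta = np.zeros(d)      # raw LSA iterate
avg = np.zeros(d)        # Polyak-Ruppert running average of iterates
n_iters = 20_000
for n in range(1, n_iters + 1):
    noise = rng.normal(0.0, 0.1, d)       # additive observation noise
    theta = theta + step * (b - A @ theta + noise)  # O(d)-style update
    avg += (theta - avg) / n              # incremental mean of iterates

err_raw = np.linalg.norm(theta - theta_star)  # bounces in a noise ball
err_avg = np.linalg.norm(avg - theta_star)    # much closer to theta_star
```

With a constant step-size the raw iterate never converges, it fluctuates around θ∗, but the PR average smooths out that fluctuation, which is the trade-off the paper analyzes.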
Efficient Bayesian Nonparametric Methods for Model-Free Reinforcement Learning in Centralized and Decentralized Sequential Environments
Efficient Bayesian Nonparametric Methods for Model-Free Reinforcement Learning in Centralized and Decentralized Sequential Environments, by Miao Liu, Department of Electrical and Computer Engineering, Duke University