Forward-Backward Reinforcement Learning
Authors
Abstract
Goals for reinforcement learning problems are typically defined through hand-specified rewards. To design such problems, developers of learning algorithms must inherently be aware of what the task goals are, yet we often require agents to discover them on their own without any supervision beyond these sparse rewards. While much of the power of reinforcement learning derives from the concept that agents can learn with little guidance, this requirement greatly burdens the training process. If we relax this one restriction and endow the agent with knowledge of the reward function, and in particular of the goal, we can leverage backwards induction to accelerate training. To achieve this, we propose training a model to learn to take imagined reversal steps from known goal states. Rather than training an agent exclusively to determine how to reach a goal while moving forwards in time, our approach travels backwards to jointly predict how we got there. We evaluate our work in Gridworld and Towers of Hanoi and empirically demonstrate that it yields better performance than standard DDQN.
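The idea of taking imagined reversal steps from a known goal can be illustrated with a small sketch. It is not the authors' implementation: the abstract describes a learned backward model combined with DDQN, whereas this example assumes a hypothetical 5x5 Gridworld whose moves can be inverted exactly, and it swaps in a tabular Q-update for brevity. All names (`SIZE`, `GOAL`, `backward_transitions`, `q_update`) are illustrative assumptions.

```python
from collections import deque

SIZE = 5                      # hypothetical 5x5 grid (assumption)
GOAL = (4, 4)                 # goal state assumed known to the agent
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right


def step(state, action):
    """Forward dynamics: move if the target cell is inside the grid."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else state


def reward(next_state):
    """Sparse reward: 1 on reaching the goal, 0 otherwise."""
    return 1.0 if next_state == GOAL else 0.0


def backward_transitions(goal, depth):
    """Walk backwards from the goal for `depth` steps.

    Returns imagined (s, a, r, s_next, done) tuples. The exact inverse of
    the grid moves stands in for the learned backward model.
    """
    frontier, seen, out = deque([(goal, 0)]), {goal}, []
    while frontier:
        s_next, d = frontier.popleft()
        if d == depth:
            continue
        for a, (dr, dc) in ACTIONS.items():
            s = (s_next[0] - dr, s_next[1] - dc)   # undo the move
            if not (0 <= s[0] < SIZE and 0 <= s[1] < SIZE):
                continue
            if s == goal:                          # goal is terminal
                continue
            out.append((s, a, reward(s_next), s_next, s_next == goal))
            if s not in seen:
                seen.add(s)
                frontier.append((s, d + 1))
    return out


def q_update(q, batch, alpha=0.5, gamma=0.9):
    """One tabular Q-learning sweep over a batch of transitions."""
    for s, a, r, s_next, done in batch:
        best_next = max(q.get((s_next, b), 0.0) for b in ACTIONS)
        target = r if done else r + gamma * best_next
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (target - old)


# Usage: seed the value estimates with imagined backward experience.
imagined = backward_transitions(GOAL, depth=2)
q = {}
for _ in range(50):
    q_update(q, imagined)
```

In a fuller setup these imagined transitions would be mixed into the replay buffer alongside ordinary forward experience, so that reward information near the goal is available before the agent ever reaches it by exploration.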
Similar resources
Planning with neural networks and reinforcement learning
planning with neural networks, time limits of discounted reinforcement learning Planning, taskability, Dyna-PI architectures Dyna-PI architectures: focussing, forward and backward planning, acting and (re)planning. Tested with... Ideas from problem solving and
Forward propagating reinforcement learning--biologically plausible learning method for multi-layer networks.
We introduce a biologically plausible method of implementing reinforcement learning in multi-layer neural networks. The key idea is to spatially localize the synaptic modulation induced by reinforcement signals, proceeding downstream from the initial layer to the final layer. Since reinforcement signals are known to be broadcast signals in the actual brain, we need two key assumptions, inhibito...
Real-world reinforcement learning for autonomous humanoid robot docking
Reinforcement learning (RL) is a biologically supported learning paradigm, which allows an agent to learn through experience acquired by interaction with its environment. Its potential to learn complex action sequences has been proven for a variety of problems, such as navigation tasks. However, the interactive randomized exploration of the state space, common in reinforcement learning, makes i...
Where Forward-Looking and Backward-Looking Models Meet
The present paper begins by deriving an instantaneous formulation for the backward-looking (reinforcement-based learning) satisfaction balance model of Gray and Tallman (1984). This model is then used to generate interactional data from four simulated agents in a network interaction experiment. Because this initial model does not generate stable interaction structures in the network experiment, ...
Backward vs. Forward-Oriented Decision Making in the Iterated Prisoner's Dilemma: A Comparison Between Two Connectionist Models
We compare the performance of two connectionist models developed to model specific aspects of the decision making process in the Iterated Prisoner's Dilemma Game. Both models are based on a common recurrent network architecture. The first of them uses a backward-oriented reinforcement learning algorithm for learning to play the game while the second one makes its move decisions based on generated...