A Comparison of Human and Agent Reinforcement Learning in Partially Observable Domains
Authors
Abstract
It is commonly claimed that reinforcement learning (RL) algorithms require far more samples to learn than humans do. In this work, we investigate this claim using two standard problems from the RL literature, comparing the performance of human subjects against that of RL techniques. We find that context (the meaningfulness of the observations) plays a significant role in the rate of human RL, and that without contextual information, humans often fare much worse than classic algorithms. Comparing the detailed responses of humans and RL algorithms, we also find that humans appear to employ rather different strategies from standard algorithms, even in cases where their performance is indistinguishable.
Similar papers
Efficient Exploration in Reinforcement Learning with Hidden State
Undoubtedly, efficient exploration is crucial for the success of a learning agent. Previous approaches to exploration in reinforcement learning exclusively address exploration in Markovian domains, i.e. domains in which the state of the environment is fully observable. If the environment is only partially observable, they cease to work because exploration statistics are confounded between alias...
Robot Navigation in Partially Observable Domains using Hierarchical Memory-Based Reinforcement Learning
In this paper, we attempt to find a solution to the problem of robot navigation in a domain with partial observability. The domain is a grid-world with intersecting corridors, where the agent learns an optimal policy for navigation by making use of a hierarchical memory-based learning algorithm. We define a hierarchy of levels over which the agent abstracts the learning process, as well as it...
Nonparametric Bayesian Policy Priors for Reinforcement Learning
We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and s...
Markov Games of Incomplete Information for Multi-Agent Reinforcement Learning
Partially observable stochastic games (POSGs) are an attractive model for many multi-agent domains, but are computationally extremely difficult to solve. We present a new model, Markov games of incomplete information (MGII), which imposes a mild restriction on POSGs while overcoming their primary computational bottleneck. Finally, we show how to convert an MGII into a continuous but bounded fully ...
Locking in Returns: Speeding Up Q-Learning by Scaling
One problem common to many reinforcement learning algorithms is their need for large amounts of training, resulting in a variety of methods for speeding up these algorithms. We propose a novel method that is remarkable both for its simplicity and its utility in speeding up Q-learning. It operates by scaling the values in the Q-table after limited, typically small, amounts of learning. Empirical...
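The scaling idea described above can be sketched in a few lines of Python. The abstract does not specify the scaling rule, so the scale factor (`scale=2.0`), the point at which it is applied (`scale_after=10` episodes), and the toy chain MDP are illustrative assumptions, not the paper's actual method:

```python
import random

# Toy deterministic chain MDP: states 0..3, actions 0 (left) / 1 (right);
# reaching state 3 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == GOAL), s2 == GOAL

def q_learning(episodes, alpha=0.5, gamma=0.9, eps=0.1,
               scale=None, scale_after=None, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for ep in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # standard Q-learning update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
        # Hypothetical scaling step: after a small amount of learning,
        # multiply all Q-values by a constant to amplify early estimates
        # (the factor and the timing here are assumptions for illustration).
        if scale is not None and ep == scale_after:
            Q = [[scale * q for q in row] for row in Q]
    return Q

Q = q_learning(200, scale=2.0, scale_after=10)
# Greedy policy in the non-goal states: 1 means "move right" toward the goal.
best = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(best)
```

Because the scaled values are still driven toward the same bootstrapped targets by later updates, the scaling here only perturbs the early learning dynamics; it does not change the fixed point of the Q-learning update.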