A Comparison of Human and Agent Reinforcement Learning in Partially Observable Domains
Authors
Abstract
It is commonly claimed that reinforcement learning (RL) algorithms require far more samples to learn than humans do. In this work, we investigate this claim using two standard problems from the RL literature, comparing the performance of human subjects against that of RL techniques. We find that context (the meaningfulness of the observations) plays a significant role in the rate of human RL, and that without contextual information, humans often fare much worse than classic algorithms. Comparing the detailed responses of humans and RL algorithms, we also find that humans appear to employ rather different strategies from standard algorithms, even in cases where their performance is indistinguishable.
Similar papers
Efficient Exploration in Reinforcement Learning with Hidden State
Undoubtedly, efficient exploration is crucial for the success of a learning agent. Previous approaches to exploration in reinforcement learning exclusively address exploration in Markovian domains, i.e. domains in which the state of the environment is fully observable. If the environment is only partially observable, they cease to work because exploration statistics are confounded between alias...
Robot Navigation in Partially Observable Domains using Hierarchical Memory-Based Reinforcement Learning
In this paper, we attempt to find a solution to the problem of robot navigation in a domain with partial observability. The domain is a grid-world with intersecting corridors, where the agent learns an optimal policy for navigation by making use of a hierarchical memory-based learning algorithm. We define a hierarchy of levels over which the agent abstracts the learning process, as well as it...
Nonparametric Bayesian Policy Priors for Reinforcement Learning
We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and s...
Markov Games of Incomplete Information for Multi-Agent Reinforcement Learning
Partially observable stochastic games (POSGs) are an attractive model for many multi-agent domains, but are computationally extremely difficult to solve. We present a new model, Markov games of incomplete information (MGII), which imposes a mild restriction on POSGs while overcoming their primary computational bottleneck. Finally, we show how to convert an MGII into a continuous but bounded fully ...
Locking in Returns: Speeding Up Q-Learning by Scaling
One problem common to many reinforcement learning algorithms is their need for large amounts of training, resulting in a variety of methods for speeding up these algorithms. We propose a novel method that is remarkable both for its simplicity and its utility in speeding up Q-learning. It operates by scaling the values in the Q-table after limited, typically small, amounts of learning. Empirical...
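The scaling idea described above can be sketched in a few lines of Python. The abstract does not specify the scaling rule, so the scale factor (`scale=2.0`), the point at which it is applied (`scale_after=10` episodes), and the toy chain MDP are illustrative assumptions, not the paper's actual method:

```python
import random

# Toy deterministic chain MDP: states 0..3, actions 0 (left) / 1 (right);
# reaching state 3 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 4, 2, 3

def step(s, a):
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == GOAL), s2 == GOAL

def q_learning(episodes, alpha=0.5, gamma=0.9, eps=0.1,
               scale=None, scale_after=None, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for ep in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # standard Q-learning update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
        # Hypothetical scaling step: after a small amount of learning,
        # multiply all Q-values by a constant to amplify early estimates
        # (the factor and the timing here are assumptions for illustration).
        if scale is not None and ep == scale_after:
            Q = [[scale * q for q in row] for row in Q]
    return Q

Q = q_learning(200, scale=2.0, scale_after=10)
# Greedy policy in the non-goal states: 1 means "move right" toward the goal.
best = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(best)
```

Because the scaled values are still driven toward the same bootstrapped targets by later updates, the scaling here only perturbs the early learning dynamics; it does not change the fixed point of the Q-learning update.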