Scaling Up Reinforcement Learning through Targeted Exploration
نویسندگان
چکیده
Recent Reinforcement Learning (RL) algorithms, such as RMAX, make (with high probability) only a small number of poor decisions. In practice, these algorithms do not scale well as the number of states grows because the algorithms spend too much effort exploring. We introduce an RL algorithm State TArgeted R-MAX (STAR-MAX) that explores a subset of the state space, called the exploration envelope ξ. When ξ equals the total state space, STAR-MAX behaves identically to R-MAX. When ξ is a subset of the state space, to keep exploration within ξ, a recovery rule β is needed. We compared existing algorithms with our algorithm employing various exploration envelopes. With an appropriate choice of ξ, STAR-MAX scales far better than existing RL algorithms as the number of states increases. A possible drawback of our algorithm is its dependence on a good choice of ξ and β. However, we show that an effective recovery rule β can be learned on-line and ξ can be learned from demonstrations. We also find that even randomly sampled exploration envelopes can improve cumulative rewards compared to R-MAX. We expect these results to lead to more efficient methods for RL in large-scale problems.
منابع مشابه
Algorithm-Directed Exploration for Model-Based Reinforcement Learning in Factored MDPs
One of the central challenges in reinforcement learning is to balance the exploration/exploitation tradeoff while scaling up to large problems. Although model-based reinforcement learning has been less prominent than value-based methods in addressing these challenges, recent progress has generated renewed interest in pursuing modelbased approaches: Theoretical work on the exploration/exploitati...
متن کاملA Study of Qualitative Knowledge-Based Exploration for Continuous Deep Reinforcement Learning
As an important method to solve sequential decisionmaking problems, reinforcement learning learns the policy of tasks through the interaction with environment. But it has difficulties scaling to largescale problems. One of the reasons is the exploration and exploitation dilemma which may lead to inefficient learning. We present an approach that addresses this shortcoming by introducing qualitat...
متن کاملReinforcement Algorithms Using Functional Approximation for Generalization and their Application to Cart Centering and Fractal Compression
We address the conflict between identification and control or alternatively, the conflict between exploration and exploitation, within the framework of reinforcement learning. Qlearning has recently become a popular offpolicy reinforcement learning method. The conflict between exploration and exploitation slows down Q-learning algorithms; their performance does not scale up and degrades rapidly...
متن کاملActive Reinforcement Learning with Monte-Carlo Tree Search
Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to BayesAdaptive MDPs. We provide an ARL algorithm using Monte-C...
متن کاملEfficient Exploration through Bayesian Deep Q-Networks
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) mode...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011