First return, then explore
Abstract
The promise of reinforcement learning is to solve complex sequential decision problems autonomously by specifying a high-level reward function only. However, reinforcement learning algorithms struggle when, as is often the case, simple and intuitive rewards provide sparse and deceptive feedback. Avoiding these pitfalls requires thoroughly exploring the environment, but creating algorithms that can do so remains one of the central challenges of the field. We hypothesise that the main impediment to effective exploration originates from algorithms forgetting how to reach previously visited states ("detachment") and from failing to first return to a state before exploring from it ("derailment"). We introduce Go-Explore, a family of algorithms that addresses these two challenges directly through the simple principles of explicitly remembering promising states and first returning to such states before intentionally exploring. Go-Explore solves all heretofore unsolved Atari games and surpasses the state of the art on all hard-exploration games, with orders of magnitude improvements on the grand challenges Montezuma's Revenge and Pitfall. We also demonstrate the practical potential of Go-Explore on a sparse-reward pick-and-place robotics task. Additionally, we show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training. The substantial performance gains from Go-Explore suggest that the simple principles of remembering states, returning to them, and exploring from them are a powerful and general approach to exploration, an insight that may prove critical to the creation of truly intelligent learning agents.
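The remember, return, explore loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several stated assumptions:

- The toy environment is deterministic and its state can be saved and restored. "Returning" is therefore done by restoring a saved state rather than by a learned policy.
- `GridEnv`, `Cell`, `to_cell` and `go_explore` are hypothetical names introduced only for this illustration.
- The grid position stands in for whatever cell representation a real system would use.

For each cell, the archive remembers the best route found so far: higher score first, then shorter trajectory.

```python
import random
from dataclasses import dataclass


class GridEnv:
    """Hypothetical toy environment: a deterministic N x N grid with a single
    sparse reward of 1 at the far corner (size - 1, size - 1)."""

    ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

    def __init__(self, size=20):
        self.size = size
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def get_state(self):
        return self.pos

    def set_state(self, state):
        # Determinism lets us jump straight back to any saved state.
        self.pos = state

    def step(self, action):
        dx, dy = self.ACTIONS[action]
        # Moves that would leave the grid are clamped to the border.
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        done = self.pos == (self.size - 1, self.size - 1)
        return self.pos, (1.0 if done else 0.0), done


@dataclass
class Cell:
    state: tuple            # saved state used to return to this cell
    score: float            # reward collected on the best route found so far
    traj_len: int           # length of that route
    visits: int = 0         # how often this cell was selected for exploration
    terminal: bool = False  # episode ended here, so never explore from it


def to_cell(obs):
    # Hypothetical cell mapping: the grid position itself.
    return obs


def go_explore(env, iterations=2000, explore_steps=50, seed=0):
    rng = random.Random(seed)
    start = env.reset()
    archive = {to_cell(start): Cell(env.get_state(), 0.0, 0)}
    for _ in range(iterations):
        # 1. Select a non-terminal cell, favouring rarely selected ones.
        keys = [k for k, c in archive.items() if not c.terminal]
        weights = [1.0 / (archive[k].visits + 1) ** 0.5 for k in keys]
        cell = archive[rng.choices(keys, weights=weights)[0]]
        cell.visits += 1
        # 2. Return: restore the saved state instead of acting from the start.
        env.set_state(cell.state)
        score, traj_len = cell.score, cell.traj_len
        # 3. Explore from that cell with random actions.
        for _ in range(explore_steps):
            obs, reward, done = env.step(rng.randrange(len(env.ACTIONS)))
            score += reward
            traj_len += 1
            key = to_cell(obs)
            old = archive.get(key)
            # 4. Remember new cells, or better (higher score, then shorter) routes.
            if old is None or (score, -traj_len) > (old.score, -old.traj_len):
                visits = old.visits if old is not None else 0
                archive[key] = Cell(env.get_state(), score, traj_len, visits, done)
            if done:
                break
    return archive
```

Calling `go_explore(GridEnv(size=10))` should yield an archive containing the far corner `(9, 9)` with a score of 1.0, together with the saved state and trajectory length of the best route found to it. Weighting cell selection by `1 / sqrt(visits + 1)` is a simple heuristic assumed here to favour rarely chosen cells; it is not presented as the authors' exact selection scheme. In stochastic environments, where restoring a saved state is unavailable or unreliable, the goal-conditioned policy mentioned in the abstract would take over the return step.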
Journal
Journal title: Nature
Year: 2021
ISSN: 1476-4687, 0028-0836
DOI: https://doi.org/10.1038/s41586-020-03157-9