The promise of reinforcement learning is to solve complex sequential decision problems autonomously by specifying a high-level reward function only. However, algorithms struggle when, as often the case, simple and intuitive rewards provide sparse deceptive feedback. Avoiding these pitfalls requires thoroughly exploring environment, but creating that can do so remains one central challenges fiel...