Bayesian Distributional Policy Gradients
Authors
Abstract
Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation, and for policy learning in general. Previous works in distributional RL focused mainly on computing state-action-return distributions; here we model state-return distributions. This enables us to translate successful conventional RL algorithms that are based on state values into distributional RL. We formulate the Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between target and model return distributions. The proposed algorithm, BDPG (Bayesian Distributional Policy Gradients), uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns. Moreover, we can now interpret the return prediction uncertainty as an information gain, which allows us to obtain a new curiosity measure that helps BDPG steer exploration actively and efficiently. We demonstrate in a suite of Atari 2600 games and MuJoCo tasks, including well-known hard-exploration challenges, how BDPG learns generally faster and with higher asymptotic performance than reference algorithms.
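As a rough illustration of the Wasserstein-matching idea in the abstract (not the paper's full auto-encoding or adversarial machinery), the sketch below matches bootstrapped target return samples against model samples with the empirical 1-D Wasserstein distance. The `return_model` network, sample counts, and tensor shapes are hypothetical.

```python
import torch

def wasserstein_1d(model_samples: torch.Tensor,
                   target_samples: torch.Tensor) -> torch.Tensor:
    """Empirical 1-Wasserstein distance between two equal-size sample sets.

    For one-dimensional distributions, W1 reduces to the mean absolute
    difference between sorted samples (the quantile coupling).
    """
    m, _ = torch.sort(model_samples, dim=-1)
    t, _ = torch.sort(target_samples, dim=-1)
    return (m - t).abs().mean()

def bellman_target_samples(reward, gamma, next_return_samples, done):
    """Distributional Bellman backup applied sample-wise:
    T Z(s) = r + gamma * Z(s') for non-terminal transitions."""
    return reward + gamma * (1.0 - done) * next_return_samples

# --- hypothetical usage -------------------------------------------------
# `return_model(s)` is assumed to emit n samples from the state-return
# distribution Z(s); r and d have shape (batch, 1), samples (batch, n).
# z_s  = return_model(s)                   # model samples
# z_s2 = return_model(s_next).detach()     # bootstrap samples, no gradient
# loss = wasserstein_1d(z_s, bellman_target_samples(r, 0.99, z_s2, d))
# loss.backward()
```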
Similar resources
Distributed Distributional Deterministic Policy Gradients
This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple impr...
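One ingredient of D4PG is a categorical critic whose Bellman target must be projected back onto a fixed atom grid. Below is a hedged sketch of that projection step, assuming a C51-style categorical support; the support range, atom count, and variable names are illustrative.

```python
import torch

def project_categorical(rewards, dones, next_probs, gamma,
                        v_min=-10.0, v_max=10.0, n_atoms=51):
    """Project the Bellman-shifted support r + gamma*z back onto the
    fixed atom grid, as in categorical distributional critics."""
    delta = (v_max - v_min) / (n_atoms - 1)
    z = torch.linspace(v_min, v_max, n_atoms)               # (n_atoms,)
    tz = rewards.unsqueeze(1) + gamma * (1 - dones).unsqueeze(1) * z
    tz = tz.clamp(v_min, v_max)
    b = (tz - v_min) / delta                                # fractional index
    lo, hi = b.floor().long(), b.ceil().long()
    proj = torch.zeros_like(next_probs)
    # split each atom's probability mass between its two neighbours
    proj.scatter_add_(1, lo, next_probs * (hi.float() - b))
    proj.scatter_add_(1, hi, next_probs * (b - lo.float()))
    # when b lands exactly on an atom, lo == hi and both weights above
    # are zero, so re-add the full mass at that atom
    proj.scatter_add_(1, lo, next_probs * (lo == hi).float())
    return proj  # (batch, n_atoms) target distribution
```

Training would then minimise the cross-entropy between this projected target and the critic's predicted categorical distribution.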
Bayesian Policy Gradients via Alpha Divergence Dropout Inference
Policy gradient methods have had great success in solving continuous control tasks, yet the stochastic nature of such problems makes deterministic value estimation difficult. We propose an approach which instead estimates a distribution by fitting the value function with a Bayesian Neural Network. We optimize an α-divergence objective with Bayesian dropout approximation to learn and estimate th...
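A minimal sketch of the value-distribution idea, using plain Monte Carlo dropout rather than the paper's α-divergence objective: keeping dropout active at evaluation time turns repeated forward passes into approximate posterior samples over V(s). Layer sizes and the dropout rate are assumptions.

```python
import torch
import torch.nn as nn

class DropoutValueNet(nn.Module):
    """Value network whose dropout layers stay active at evaluation time,
    so repeated stochastic passes approximate a posterior over V(s)."""
    def __init__(self, obs_dim: int, hidden: int = 128, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

@torch.no_grad()
def value_posterior(net, obs, n_samples: int = 30):
    """Monte Carlo dropout: mean and std of V(s) over stochastic passes."""
    net.train()  # deliberately keep dropout stochastic at inference
    vs = torch.stack([net(obs) for _ in range(n_samples)])
    return vs.mean(0), vs.std(0)
```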
Bayesian Optimization with Gradients
Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to find good solutions with fewer objective function evaluations. In particular, ...
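The core mechanism is conditioning the surrogate model on gradient observations as well as function values. A minimal one-dimensional sketch, assuming an RBF kernel and jointly Gaussian (f, f') under the GP prior; the kernel hyperparameters and helper names are illustrative, and a full method would add an acquisition function on top.

```python
import numpy as np

def rbf(a, b, ls=1.0, var=1.0):
    """RBF kernel matrix and the pairwise differences used for its derivatives."""
    d = a[:, None] - b[None, :]
    k = var * np.exp(-0.5 * d**2 / ls**2)
    return k, d

def gp_mean_with_gradients(x, y, dy, xs, ls=1.0, var=1.0, noise=1e-6):
    """1-D GP posterior mean at xs, conditioned on both function values y
    and derivative observations dy at the training points x."""
    k_ff, d = rbf(x, x, ls, var)
    k_fd = k_ff * d / ls**2                       # cov(f(x_i), f'(x_j))
    k_dd = k_ff * (1.0 / ls**2 - d**2 / ls**4)    # cov(f'(x_i), f'(x_j))
    K = np.block([[k_ff, k_fd], [k_fd.T, k_dd]])
    K += noise * np.eye(K.shape[0])
    ks_f, ds = rbf(xs, x, ls, var)                # cov(f(x*), f(x_j))
    ks_d = ks_f * ds / ls**2                      # cov(f(x*), f'(x_j))
    k_star = np.hstack([ks_f, ks_d])
    alpha = np.linalg.solve(K, np.concatenate([y, dy]))
    return k_star @ alpha
```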
Bayesian structured additive distributional regression
In this paper, we propose a generic Bayesian framework for inference in distributional regression models in which each parameter of a potentially complex response distribution and not only the mean is related to a structured additive predictor. The latter is composed additively of a variety of different functional effect types such as nonlinear effects, spatial effects, random coefficients, int...
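To make "each parameter gets its own predictor" concrete, here is a heavily simplified sketch: a Gaussian response where both the mean and the log standard deviation are linear in the covariates, fitted by maximum likelihood. The paper instead uses structured additive predictors (splines, spatial effects, random coefficients) with full Bayesian inference; step count and learning rate below assume standardized inputs.

```python
import numpy as np

def fit_gaussian_distreg(X, y, steps=2000, lr=0.01):
    """Distributional regression with a Gaussian response: the mean and
    the log standard deviation each get their own linear predictor,
    fitted by gradient ascent on the log-likelihood."""
    n, p = X.shape
    beta_mu = np.zeros(p)
    beta_ls = np.zeros(p)                 # predictor for log sigma
    for _ in range(steps):
        mu = X @ beta_mu
        sigma = np.exp(X @ beta_ls)
        r = (y - mu) / sigma              # standardized residuals
        # gradients of the average Gaussian log-likelihood
        beta_mu += lr * X.T @ (r / sigma) / n
        beta_ls += lr * X.T @ (r**2 - 1.0) / n
    return beta_mu, beta_ls
```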
Hindsight policy gradients
Goal-conditional policies allow reinforcement learning agents to pursue specific goals during different episodes. In addition to their potential to generalize desired behavior to unseen goals, such policies may also help in defining options for arbitrary subgoals, enabling higher-level planning. While trying to achieve a specific goal, an agent may also be able to exploit information about the ...
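A sketch of the hindsight idea in its simplest form: relabel transitions with goals the agent actually achieved later in the episode, so a failed trajectory still provides learning signal for some goal. Hindsight policy gradients incorporate this into on-policy gradient estimators with importance weighting, which this HER-style sketch omits; the dictionary keys and the assumption of discrete, comparable goals are illustrative.

```python
import random

def hindsight_relabel(episode, k=4):
    """Relabel transitions with goals achieved later in the same episode.

    `episode` is a list of dicts with keys: obs, action, achieved_goal
    (goals assumed discrete and comparable with ==).
    """
    relabeled = []
    for t, step in enumerate(episode):
        future = episode[t:]
        for _ in range(k):
            new_goal = random.choice(future)["achieved_goal"]
            relabeled.append({
                "obs": step["obs"],
                "action": step["action"],
                "goal": new_goal,
                # sparse reward: 1 if this step already achieves the goal
                "reward": float(step["achieved_goal"] == new_goal),
            })
    return relabeled
```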
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i10.17024