Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization
Authors
Abstract
In this article, we address the issues of stability and data efficiency in reinforcement learning (RL). A novel RL approach, Kullback-Leibler divergence-regularized distributional RL (KL-C51), is proposed to integrate the advantages of both distributional RL and Kullback-Leibler (KL) divergence-regularized RL in one framework. KL-C51 derives the Bellman equation and the TD errors regularized by KL divergence from a distributional perspective, and explores approximation strategies for properly mapping the corresponding Boltzmann softmax term into distributions. Evaluated not only on several benchmark tasks of different complexity from OpenAI Gym but also on six Atari 2600 games from the Arcade Learning Environment, the proposed method clearly illustrates the positive effect of KL divergence regularization on distributional RL, including exclusive exploration behaviors and smooth value function updates, and demonstrates an improvement compared with other related baseline approaches.
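The abstract mentions a Boltzmann softmax term and KL divergence regularization over categorical return distributions. As a rough illustration only, the sketch below shows the generic textbook relationship between these pieces: a policy proportional to `prior(a) * exp(beta * Q(a))` maximizes the expected value minus `(1 / beta)` times the KL divergence from the prior. This is not the paper's KL-C51 algorithm; the function names, the toy atoms, the uniform prior, and the value of `beta` are all assumptions made for this example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def expected_values(atom_probs, atoms):
    """Mean return per action from categorical (C51-style) return distributions."""
    return [sum(p * z for p, z in zip(probs, atoms)) for probs in atom_probs]


def boltzmann_policy(q_values, prior, beta):
    """Return pi(a) proportional to prior(a) * exp(beta * Q(a)).

    This is the closed-form maximizer of E_pi[Q] - (1 / beta) * KL(pi || prior).
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [pr * math.exp(beta * (q - m)) for pr, q in zip(prior, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy problem: 2 actions, 3 return atoms (values chosen for illustration).
atoms = [0.0, 1.0, 2.0]
atom_probs = [
    [0.5, 0.5, 0.0],  # action 0: mass on the lower returns
    [0.0, 0.5, 0.5],  # action 1: mass on the higher returns
]
prior = [0.5, 0.5]  # assumed uniform prior policy

q_values = expected_values(atom_probs, atoms)
policy = boltzmann_policy(q_values, prior, beta=1.0)
penalty = kl_divergence(policy, prior)  # the quantity the regularizer penalizes
```

With a small `beta` the policy stays close to the prior (strong regularization); with a large `beta` it approaches the greedy action. How the paper maps the softmax term back into return distributions is not described in this abstract and is not attempted here.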
Similar resources
Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence
We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. Optimism is usually implemented by carrying out extended value iterations, under a constraint of consistency with the estimated model transition probabilities. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this pur...
Rényi Divergence and Kullback-Leibler Divergence
Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon’s entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibl...
Use of Kullback–Leibler divergence for forgetting
Non-symmetric Kullback–Leibler divergence (KLD) measures proximity of probability density functions (pdfs). Bernardo (Ann. Stat. 1979; 7(3):686–690) had shown its unique role in approximation of pdfs. The order of the KLD arguments is also implied by his methodological result. Functional approximation of estimation and stabilized forgetting, serving for tracking of slowly varying parameters, us...
Vector Quantization by Minimizing Kullback-Leibler Divergence
This paper proposes a new method for vector quantization by minimizing the Kullback-Leibler Divergence between the class label distributions over the quantization inputs, which are original vectors, and the output, which is the quantization subsets of the vector set. In this way, the vector quantization output can keep as much information of the class label as possible. An objective function is...
Kullback-Leibler Divergence for Nonnegative Matrix Factorization
The I-divergence or unnormalized generalization of Kullback-Leibler (KL) divergence is commonly used in Nonnegative Matrix Factorization (NMF). This divergence has the drawback that its gradients with respect to the factorizing matrices depend heavily on the scales of the matrices, and learning the scales in gradient-descent optimization may require many iterations. This is often handled by expl...
Journal
Journal title: Applied Intelligence
Year: 2023
ISSN: 0924-669X, 1573-7497
DOI: https://doi.org/10.1007/s10489-023-04867-z