Efficient distributional reinforcement learning with Kullback-Leibler divergence regularization

Abstract

In this article, we address the issues of stability and data efficiency in reinforcement learning (RL). A novel RL approach, Kullback-Leibler divergence-regularized distributional RL (KL-C51), is proposed to integrate the advantages of both distributional RL and Kullback-Leibler (KL) divergence-regularized RL in one framework. KL-C51 derives the Bellman equation and the TD errors regularized by KL divergence from a distributional perspective, and explores approximation strategies for properly mapping the corresponding Boltzmann softmax term into distributions. Evaluated not only on several benchmark tasks of differing complexity from OpenAI Gym but also on six Atari 2600 games from the Arcade Learning Environment, the method clearly illustrates the positive effects of KL divergence regularization, including exclusive exploration behaviors and smooth value function updates, and demonstrates an improvement compared with other related baseline approaches.
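To make the Boltzmann softmax term mentioned in the abstract concrete, the following minimal sketch computes a KL-regularized policy from categorical return distributions. It is not the paper's KL-C51 algorithm, and it does not reproduce the paper's mapping of the softmax term into distributions. It only illustrates the standard result that a KL penalty toward a prior policy yields a policy proportional to prior(a) * exp(Q(a) / temperature). The 51-atom support follows the usual C51 convention; the value range (V_MIN, V_MAX), the helper names, and the example distributions are hypothetical.

```python
import math

# Assumed C51-style fixed support: 51 atoms on a hypothetical range [-10, 10].
N_ATOMS = 51
V_MIN, V_MAX = -10.0, 10.0
atoms = [V_MIN + i * (V_MAX - V_MIN) / (N_ATOMS - 1) for i in range(N_ATOMS)]


def expected_value(probs, support):
    """Mean of a categorical return distribution over the given support."""
    return sum(p * z for p, z in zip(probs, support))


def kl_regularized_policy(q_values, prior, temperature):
    """Policy proportional to prior(a) * exp(q(a) / temperature).

    This is the closed-form maximizer of expected Q minus
    temperature * KL(policy || prior). All prior entries must be positive.
    """
    logits = [math.log(p) + q / temperature for p, q in zip(prior, q_values)]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def point_mass(index):
    """Hypothetical return distribution with all mass on one atom."""
    return [1.0 if i == index else 0.0 for i in range(N_ATOMS)]


# Two actions: returns concentrated at atom 25 (0.0) and atom 30 (2.0).
distributions = [point_mass(25), point_mass(30)]
q = [expected_value(d, atoms) for d in distributions]

prior = [0.5, 0.5]
policy = kl_regularized_policy(q, prior, temperature=1.0)
nearly_prior = kl_regularized_policy(q, prior, temperature=1e6)
```

With a small temperature the policy concentrates on the higher-valued action; with a very large temperature the KL term dominates and the policy stays close to the prior, which is one way to read the "smooth value function update" effect described above.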

Similar resources

Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. Optimism is usually implemented by carrying out extended value iterations, under a constraint of consistency with the estimated model transition probabilities. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this pur...

Rényi Divergence and Kullback-Leibler Divergence

Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon’s entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibl...

Use of Kullback–Leibler divergence for forgetting

Non-symmetric Kullback–Leibler divergence (KLD) measures proximity of probability density functions (pdfs). Bernardo (Ann. Stat. 1979; 7(3):686–690) had shown its unique role in approximation of pdfs. The order of the KLD arguments is also implied by his methodological result. Functional approximation of estimation and stabilized forgetting, serving for tracking of slowly varying parameters, us...

Vector Quantization by Minimizing Kullback-Leibler Divergence

This paper proposes a new method for vector quantization by minimizing the Kullback-Leibler Divergence between the class label distributions over the quantization inputs, which are original vectors, and the output, which is the quantization subsets of the vector set. In this way, the vector quantization output can keep as much information of the class label as possible. An objective function is...

Kullback-Leibler Divergence for Nonnegative Matrix Factorization

The I-divergence or unnormalized generalization of Kullback-Leibler (KL) divergence is commonly used in Nonnegative Matrix Factorization (NMF). This divergence has the drawback that its gradients with respect to the factorizing matrices depend heavily on the scales of the matrices, and learning the scales in gradient-descent optimization may require many iterations. This is often handled by expl...

Journal

Journal title: Applied Intelligence

Year: 2023

ISSN: 0924-669X, 1573-7497

DOI: https://doi.org/10.1007/s10489-023-04867-z