Minimax Estimation of KL Divergence between Discrete Distributions

Authors

  • Yanjun Han
  • Jiantao Jiao
  • Tsachy Weissman
Abstract

We refine the general methodology in [1] for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions with support size S comparable with the number of observations n. Specifically, we determine the “smooth” and “non-smooth” regimes based on the confidence set and the smoothness of the functional. In the “non-smooth” regime, we apply an unbiased estimator for a suitable polynomial approximation of the functional. In the “smooth” regime, we construct a general version of the bias-corrected Maximum Likelihood Estimator (MLE) based on Taylor expansion. We apply the general methodology to the problem of estimating the KL divergence between two discrete probability measures P and Q from empirical data in a non-asymptotic and possibly large alphabet setting. We construct minimax rate-optimal estimators for D(P‖Q) when the likelihood ratio is upper bounded by a constant which may depend on the support size, and show that the performance of the optimal estimator with n samples is essentially that of the MLE with n ln n samples. Our estimator is adaptive in the sense that it does not require knowledge of the support size nor the upper bound on the likelihood ratio. We show that the general methodology results in minimax rate-optimal estimators for other divergences as well, such as the Hellinger distance and the χ²-divergence. Our approach refines the Approximation methodology recently developed for the construction of near-minimax estimators of functionals of high-dimensional parameters, such as entropy, Rényi entropy, mutual information and ℓ1 distance in large alphabet settings, and shows that the effective sample size enlargement phenomenon holds significantly more widely than previously established.

Index Terms: Divergence estimation, KL divergence, multivariate approximation theory, Taylor expansion, functional estimation, maximum likelihood estimator, high dimensional statistics, minimax lower bound
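For orientation, the following minimal Python sketch computes the naive plug-in (MLE) estimate of D(P‖Q) from empirical counts. This is the baseline whose n ln n-sample performance the paper's optimal estimator achieves with only n samples; it is not the paper's minimax construction (which adds polynomial approximation in the "non-smooth" regime and Taylor-expansion bias correction in the "smooth" regime), and the names plugin_kl, counts_p, counts_q and eps are hypothetical.

import numpy as np

# Illustrative plug-in (MLE) estimate of D(P||Q) from empirical counts.
# Not the paper's estimator; the eps guard below is an ad hoc fix for
# symbols unseen under Q, exactly the situation in which the plug-in
# estimator is badly biased and a bias-corrected construction is needed.
def plugin_kl(counts_p, counts_q, eps=1e-12):
    p_hat = counts_p / counts_p.sum()
    q_hat = counts_q / counts_q.sum()
    mask = p_hat > 0
    return float(np.sum(p_hat[mask] * np.log(p_hat[mask] / (q_hat[mask] + eps))))

# Example: estimate D(P||Q) for a small alphabet from n multinomial samples.
rng = np.random.default_rng(0)
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])
n = 1000
print(plugin_kl(rng.multinomial(n, P), rng.multinomial(n, Q)))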

Similar references

Minimax Estimator of a Lower Bounded Parameter of a Discrete Distribution under a Squared Log Error Loss Function

The problem of estimating the parameter θ, when it is restricted to a lower-bounded interval, in a class of discrete distributions including the Binomial, Negative Binomial, and discrete Weibull, is considered. We give necessary and sufficient conditions under which the Bayes estimator of θ with respect to a two-point boundary-supported prior is minimax under the squared log error loss function...


Minimax Estimation of Discrete Distributions under $\ell_1$ Loss

We consider the problem of discrete distribution estimation under ℓ1 loss. We provide tight upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and the minimax risk, in regimes where the support size S may grow with the number of observations n. We show that among distributions with bounded entropy H, the asymptotic maximum risk for the e...
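As a hedged illustration of the quantity studied there (not taken from the cited paper), the following Python snippet Monte Carlo-estimates the expected ℓ1 risk E||p_hat - p||_1 of the empirical distribution for a given p; the function name l1_risk_empirical and its parameters are hypothetical.

import numpy as np

# Hedged illustration: Monte Carlo estimate of the expected l1 risk of the
# empirical distribution (MLE), the quantity whose maximum over p the cited
# work bounds. The uniform distribution below is a natural stress case when
# the support size S is comparable to the sample size n.
def l1_risk_empirical(p, n, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    counts = rng.multinomial(n, p, size=trials)   # trials x S matrix of counts
    p_hat = counts / n
    return float(np.mean(np.abs(p_hat - p).sum(axis=1)))

S, n = 500, 1000
print(l1_risk_empirical(np.full(S, 1.0 / S), n))  # roughly of order sqrt(S/n)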


Minimax Estimation of the Scale Parameter in a Family of Transformed Chi-Square Distributions under Asymmetric Squared Log Error and MLINEX Loss Functions

This paper is concerned with the problem of finding the minimax estimators of the scale parameter θ in a family of transformed chi-square distributions, under asymmetric squared log error (SLE) and modified linear exponential (MLINEX) loss functions, using the Lehmann Theorem [2]. We also show that the results of Podder et al. [4] for the Pareto distribution are a special case of our results for th...


Tight Bounds on Profile Redundancy and Distinguishability

The minimax KL-divergence of any distribution from all distributions in a collection P has several practical implications. In compression, it is called redundancy and represents the least additional number of bits over the entropy needed to encode the output of any distribution in P. In online estimation and learning, it is the lowest expected log-loss regret when guessing a sequence of random...
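In symbols (stated here for orientation, not quoted from the truncated abstract), the quantity in question is the standard minimax redundancy of the collection P:

    \bar{R}(\mathcal{P}) = \inf_{Q} \sup_{P \in \mathcal{P}} D(P \,\|\, Q),

where the infimum runs over all distributions Q on the same alphabet (or over coding distributions for sequences, in the compression reading).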


Variational Minimax Estimation of Discrete Distributions under KL Loss

We develop a family of upper and lower bounds on the worst-case expected KL loss for estimating a discrete distribution on a finite number m of points, given N i.i.d. samples. Our upper bounds are approximation-theoretic, similar to recent bounds for estimating discrete entropy; the lower bounds are Bayesian, based on averages of the KL loss under Dirichlet distributions. The upper bounds are con...
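As a hedged sketch of the objects involved (not the cited paper's bounds), the snippet below Monte Carlo-estimates the expected KL loss E[D(p ‖ p_tilde)] of an add-beta estimator, i.e. the posterior mean under a symmetric Dirichlet(beta) prior; the name expected_kl_loss and its defaults are hypothetical.

import numpy as np

# Hedged sketch: expected KL loss E[D(p || p_tilde)] of the add-beta estimator
# p_tilde = (counts + beta) / (N + m*beta), the posterior mean under a symmetric
# Dirichlet(beta) prior, estimated by Monte Carlo over the multinomial sampling.
def expected_kl_loss(p, N, beta=0.5, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    m = len(p)
    counts = rng.multinomial(N, p, size=trials)
    p_tilde = (counts + beta) / (N + m * beta)    # smoothing keeps every mass positive
    mask = p > 0
    losses = np.sum(p[mask] * np.log(p[mask] / p_tilde[:, mask]), axis=1)
    return float(np.mean(losses))

print(expected_kl_loss(np.full(100, 0.01), N=200))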



Journal:
  • CoRR

Volume: abs/1605.09124

Pages: –

Publication date: 2016