Information Divergence Measures and Surrogate Loss Functions
Abstract
In this extended abstract, we provide an overview of our recent work on the connection between information divergence measures and convex surrogate loss functions used in statistical machine learning. Further details can be found in the technical report [7] and the conference paper [6].

The class of f-divergences, introduced independently by Csiszar [4] and Ali and Silvey [1], arises in many areas of information theory and statistics. For instance, f-divergences frequently play the role of error exponents in asymptotic settings [3]. These connections have motivated various researchers from the 1960s onward, studying problems such as signal selection or quantizer design in hypothesis testing, to advocate maximizing various types of f-divergence measures as a computationally feasible alternative to the intractable problem of directly minimizing the probability of error [5], [8].

Convex surrogates, on the other hand, play an important role in the binary classification problem studied in statistical machine learning. Here the goal is to design a hypothesis testing procedure when the underlying distributions are unknown, but the learner has access to labeled samples from both classes. Any such procedure for learning a decision rule is said to be consistent if it achieves the Bayes-optimal misclassification error as the number of samples grows. A unifying theme in the recent literature on statistical learning is the notion of a surrogate loss function [2], [10], meaning a convex upper bound on the 0-1 loss. Many practical and widely used algorithms for learning classifiers can be formulated as minimizing empirical averages of such surrogate loss functions.

Our work [6], [7] establishes a general correspondence between the class of f-divergences and the family of surrogate loss functions (see Fig. 1). This correspondence has a number of interesting consequences.
First, it partitions the set of surrogate loss functions into equivalence classes, defined by the relation of inducing the same f-divergence measure. Second, it allows various well-known inequalities between different f-divergences [9] to be leveraged in analyzing surrogate losses.
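To make the f-divergence family concrete, the following is a minimal sketch (not taken from [6], [7]) that evaluates D_f(P‖Q) = Σ_x q(x) f(p(x)/q(x)) for discrete distributions, using the standard convex generators for the Kullback-Leibler divergence, the variational (total variation) distance, and the squared Hellinger distance:

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete P, Q with q(x) > 0."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

# Standard generators; each f is convex and satisfies f(1) = 0,
# so D_f(P || P) = 0 for every distribution P.
kl   = lambda t: t * math.log(t) if t > 0 else 0.0   # Kullback-Leibler
tv   = lambda t: 0.5 * abs(t - 1)                    # variational distance
hell = lambda t: (math.sqrt(t) - 1) ** 2             # squared Hellinger

p = [0.5, 0.5]
q = [0.9, 0.1]
print(f_divergence(p, q, tv))   # 0.5 * (|0.5 - 0.9| + |0.5 - 0.1|) = 0.4
print(f_divergence(p, q, kl))   # sum_x p(x) * log(p(x) / q(x))
```

Different generators f give different divergences, but the definition above is the single template that the correspondence with surrogate losses operates on.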
Similar Articles
Information Measures via Copula Functions
In applications of differential geometry to problems of parametric inference, the notion of divergence is often used to measure the separation between two parametric densities. In this paper, we examine measures such as Kullback-Leibler information, J-divergence, Hellinger distance, -Divergence, and so on. Properties and results related to distance between probability d...
On Information Divergence Measures, Surrogate Loss Functions and Decentralized Hypothesis Testing
We establish a general correspondence between two classes of statistical functions: Ali-Silvey distances (also known as f-divergences) and surrogate loss functions. Ali-Silvey distances play an important role in signal processing and information theory, for instance as error exponents in hypothesis testing problems. Surrogate loss functions (e.g., hinge loss, exponential loss) are the basis of ...
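The surrogate losses named in this abstract are convex upper bounds on the 0-1 loss, viewed as functions of the margin m = y f(x) with labels y in {-1, +1}. A minimal sketch (illustrative only, not code from the cited work) that checks this bounding property at a few margin values:

```python
import math

# 0-1 loss and two common convex surrogates, as functions of the margin m.
zero_one    = lambda m: 1.0 if m <= 0 else 0.0
hinge       = lambda m: max(0.0, 1.0 - m)   # used by support vector machines
exponential = lambda m: math.exp(-m)        # used by AdaBoost

# Each surrogate upper-bounds the 0-1 loss at every margin value,
# so minimizing its empirical average controls the misclassification rate.
for m in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    assert hinge(m) >= zero_one(m)
    assert exponential(m) >= zero_one(m)
```

Under the correspondence studied in this line of work, each such surrogate induces a particular f-divergence; the hinge loss, for instance, is associated with the variational distance and the exponential loss with the Hellinger distance.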
A surrogate method for density-based global sensitivity analysis
This paper describes an accurate and computationally efficient surrogate method, known as the polynomial dimensional decomposition (PDD) method, for estimating a general class of density-based f-sensitivity indices. Unlike the variance-based Sobol index, the f-sensitivity index is applicable to random inputs following dependent as well as independent probability distributions. The proposed method...
Divergences, surrogate loss functions and experimental design
In this paper, we provide a general theorem that establishes a correspondence between surrogate loss functions in classification and the family of f-divergences. Moreover, we provide constructive procedures for determining the f-divergence induced by a given surrogate loss, and conversely for finding all surrogate loss functions that realize a given f-divergence. Next we introduce the notion...
On distance measures, surrogate loss functions, and distributed detection
In this paper, we show the correspondence between distance measures and surrogate loss functions in the context of decentralized binary hypothesis testing. This correspondence helps explicate the use of various distance measures in signal processing and quantization theory, as well as explain the behavior of surrogate loss functions often used in machine learning and statistics. We then develop...