Theory of Optimizing Pseudolinear Performance Measures: Application to F-measure

Authors

  • Shameem Puthiya Parambath
  • Nicolas Usunier
  • Yves Grandvalet
Abstract

State-of-the-art classification algorithms are designed to minimize the misclassification error of the system, which is a linear function of the per-class false negatives and false positives. Nonetheless, non-linear performance measures are widely used to evaluate learning algorithms. For example, the F-measure is a commonly used non-linear performance measure in classification problems. We study the theoretical properties of a subset of non-linear performance measures called pseudo-linear performance measures, which includes the F-measure and the Jaccard index, among many others. We establish that many notions of F-measure and Jaccard index are pseudo-linear functions of the per-class false negatives and false positives for binary, multiclass and multilabel classification. Based on this observation, we present a general reduction of such performance measure optimization problems to cost-sensitive classification with unknown costs. We then propose an algorithm with provable guarantees to obtain an approximately optimal classifier for the F-measure by solving a series of cost-sensitive classification problems. The strength of our analysis is that it is valid for any dataset and any class of classifiers, extending the existing theoretical results on the binary F-score, which are asymptotic in nature. Our analysis shows that thresholding cost-insensitive scores, a common technique employed to optimize the F-measure, yields sub-optimal results. We also establish the multi-objective nature of the F-measure maximization problem by linking the algorithm with the weighted-sum approach used in multi-objective optimization. We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various F-measure optimization tasks.
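The reduction described in the abstract can be illustrated with a small sketch for the binary case. It is not the authors' implementation. It assumes the standard definition of F_beta, rewritten as a ratio of two linear functions of the false negatives (FN) and false positives (FP), from which the level set F_beta = t corresponds to a cost-sensitive problem with cost 1 + beta^2 - t on false negatives and t on false positives. The finite candidate list stands in for "any class of classifiers", and the names `f_beta`, `costs_for_level`, `best_by_cost_sweep` and `CANDIDATES` are hypothetical.

```python
def f_beta(fn, fp, n_pos, beta=1.0):
    """F_beta as a ratio of two linear functions of (FN, FP).

    With TP = n_pos - FN, the usual definition
    (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP) becomes
    (1 + b^2)(n_pos - FN) / ((1 + b^2) n_pos - FN + FP).
    """
    b2 = beta ** 2
    num = (1 + b2) * (n_pos - fn)
    den = (1 + b2) * n_pos - fn + fp
    return num / den if den > 0 else 0.0


def costs_for_level(t, beta=1.0):
    """Costs (on FN, on FP) obtained by setting F_beta = t and rearranging."""
    return (1 + beta ** 2 - t, t)


def best_by_cost_sweep(candidates, n_pos, beta=1.0, grid_size=10):
    """Sweep t over a grid, solve each cost-sensitive problem, keep the best F.

    `candidates` is a list of (FN, FP) error profiles; picking the cheapest
    one plays the role of a cost-sensitive learner (an assumption made only
    to keep the sketch self-contained).
    """
    best, best_f = None, -1.0
    for i in range(grid_size + 1):
        t = i / grid_size
        c_fn, c_fp = costs_for_level(t, beta)
        chosen = min(candidates, key=lambda c: c_fn * c[0] + c_fp * c[1])
        score = f_beta(chosen[0], chosen[1], n_pos, beta)
        if score > best_f:
            best, best_f = chosen, score
    return best, best_f


# Hypothetical error profiles (FN, FP) on a dataset with 10 positives.
CANDIDATES = [(0, 20), (2, 4), (5, 1), (10, 0)]

if __name__ == "__main__":
    profile, score = best_by_cost_sweep(CANDIDATES, n_pos=10)
    print(profile, round(score, 3))  # (2, 4) 0.727
```

In this toy setting the sweep recovers the same profile as a brute-force search over the candidates. With a real learner, each grid value of t would instead require training one cost-sensitive classifier, and the best one would be selected by its F-measure on held-out data.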

Similar resources

Application of measures of noncompactness to infinite system of linear equations in sequence spaces

G. Darbo [Rend. Sem. Math. Univ. Padova, 24 (1955) 84--92] used the measure of noncompactness to investigate operators whose properties can be characterized as being intermediate between those of contraction and compact operators. In this paper, we apply Darbo's fixed point theorem to solve infinite systems of linear equations in some sequence spaces.

Simplex-type algorithm for optimizing a pseudolinear quadratic fractional function over a polytope

Recently Cambini and Carosi described a characterization of pseudolinearity of quadratic fractional functions. A reformulation of their result was given by Rapcsák. Using this reformulation, in this paper we describe an alternative proof of the Cambini–Carosi Theorem. Our proof is shorter than the proof given by Cambini–Carosi and less involved than the proof given by Rapcsák. As an application...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

Optimizing Non-decomposable Performance Measures: A Tale of Two Classes

Modern classification problems frequently present mild to severe label imbalance as well as specific requirements on classification characteristics, and require optimizing performance measures that are non-decomposable over the dataset, such as F-measure. Such measures have spurred much interest and pose specific challenges to learning algorithms since their non-additive nature precludes a dire...

Fuzzy relations, Possibility theory, Measures of uncertainty, Mathematical modeling.

A central aim of educational research in the area of mathematical modeling and applications is to recognize the attainment level of students at defined states of the modeling process. In this paper, we introduce principles of fuzzy sets theory and possibility theory to describe the process of mathematical modeling in the classroom. The main stages of the modeling process are represented as fuzz...

Journal title:
  • CoRR

Volume: abs/1505.00199


Publication date: 2015