Theoretical Foundations of Equitability and the Maximal Information Coefficient

Authors

  • Yakir Reshef
  • David N. Reshef
  • Pardis Sabeti
  • Michael Mitzenmacher
Abstract

The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables [1]. MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets. Here we formalize the theory behind both equitability and MIC in the language of estimation theory. This formalization has a number of advantages. First, it allows us to show that equitability is a generalization of power against statistical independence. Second, it allows us to compute and discuss the population value of MIC, which we call MIC∗. In doing so we generalize and strengthen the mathematical results proven in [1] and clarify the relationship between MIC and mutual information. Introducing MIC∗ also enables us to reason about the properties of MIC more abstractly: for instance, we show that MIC∗ is continuous and that there is a sense in which it is a canonical “smoothing” of mutual information. We also prove an alternate, equivalent characterization of MIC∗ that we use to state new estimators of it as well as an algorithm for explicitly computing it when the joint probability density function of a pair of random variables is known. Our hope is that this paper provides a richer theoretical foundation for MIC and equitability going forward. This paper will be accompanied by a forthcoming companion paper that performs extensive empirical analysis and comparison to other methods and discusses the practical aspects of both equitability and the use of MIC and its related statistics.
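As a rough illustration of the sample statistic discussed in the abstract: in [1], MIC is the largest mutual information achievable on a k-by-l grid, normalized by log min(k, l), over grids with a bounded number of cells. The sketch below is a simplification and not the authors' algorithm: it only tries equal-frequency grids instead of optimizing grid boundaries, so it can only underestimate that maximum. The function names are hypothetical, and the cell bound n^0.6 is assumed here as the default from [1].

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, bins):
    """Assign each value to one of `bins` bins holding roughly equal counts."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    assignment = [0] * n
    for rank, idx in enumerate(order):
        assignment[idx] = rank * bins // n
    return assignment


def grid_mutual_information(bx, by):
    """Empirical mutual information (in nats) of two bin-label sequences."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def naive_mic(xs, ys, exponent=0.6):
    """Largest normalized grid mutual information over equal-frequency
    k-by-l grids with k * l <= n ** exponent (assumed cell bound)."""
    n = len(xs)
    max_cells = n ** exponent
    best = 0.0
    k = 2
    while 2 * k <= max_cells:
        bx = equal_frequency_bins(xs, k)
        l = 2
        while k * l <= max_cells:
            by = equal_frequency_bins(ys, l)
            score = grid_mutual_information(bx, by) / math.log(min(k, l))
            best = max(best, score)
            l += 1
        k += 1
    return best


# Compare a noiseless relationship with an independent pair.
random.seed(0)
xs = [random.random() for _ in range(200)]
noise = [random.random() for _ in range(200)]
print("noiseless linear:", naive_mic(xs, xs))
print("independent:     ", naive_mic(xs, noise))
```

The noiseless pair should score close to 1 and the independent pair much lower, though a finite sample keeps the independent score slightly above 0.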


Similar resources

Equitability and MIC: an FAQ

The original paper on equitability and the maximal information coefficient (MIC) [Reshef et al., 2011] has generated much discussion and interest, and so far MIC has enjoyed use in a variety of disciplines. This document serves to provide some basic background and understanding of MIC as well as to address some of the questions raised about MIC in the literature, and to provide pointers to rele...


Equitability Analysis of the Maximal Information Coefficient, with Comparisons

A measure of dependence is said to be equitable if it gives similar scores to equally noisy relationships of different types. Equitability is important in data exploration when the goal is to identify a relatively small set of strongest associations within a dataset as opposed to finding as many non-zero associations as possible, which often are too many to sift through. Thus an equitable stati...


Cleaning up the record on the maximal information coefficient and equitability.

Although we appreciate Kinney and Atwal’s interest in equitability and maximal information coefficient (MIC), we believe they misrepresent our work. We highlight a few of our main objections below. Regarding our original paper (1), Kinney and Atwal (2) state “MIC is said to satisfy not just the heuristic notion of equitability, but also the mathematical criterion of R² equitability,” the latter ...


Equitability, mutual information, and the maximal information coefficient.

How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical "equitability" has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequa...


Reply to Murrell et al.: Noise matters.

The concept of statistical "equitability" plays a central role in the 2011 paper by Reshef et al. (1). Formalizing equitability first requires formalizing the notion of a "noisy functional relationship," that is, a relationship between two real variables, X and Y, having the form Y = f(X) + η, where f is a function and η is a noise term. Whether a dependence measure satisfies equitability...



Journal:
  • CoRR

Volume abs/1408.4908

Publication date: 2014