Generalized Adjusted Rand Indices for cluster ensembles

Authors

  • Shaohong Zhang
  • Hau-San Wong
  • Ying Shen
Abstract

In this paper, the Adjusted Rand Index (ARI) is generalized to two new measures based on matrix comparison: (i) Adjusted Rand Index between a similarity matrix and a cluster partition (ARImp), to evaluate the consistency of a set of clustering solutions with their corresponding consensus matrix in a cluster ensemble, and (ii) Adjusted Rand Index between similarity matrices (ARImm), to evaluate the consistency between two similarity matrices. Desirable properties of ARI are preserved in the two new measures, and new properties are discussed. These properties include: (i) detection of uncorrelatedness; (ii) computation of ARImp/ARImm in a distributed environment; and (iii) characterization of the degree of uncertainty of a consensus matrix. All of these properties are investigated from the perspectives of both theoretical analysis and experimental validation. We have also performed a number of experiments to show the usefulness and effectiveness of the two proposed measures in practical applications. © 2011 Elsevier Ltd. All rights reserved.
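For orientation, the two ingredients the abstract builds on can be sketched in a few lines. The code below is a minimal illustration, not the paper's method: it computes the standard pair-counting ARI between two partitions (the Hubert-Arabie form) and a consensus matrix, assumed here to be the usual co-association matrix (the fraction of ensemble members that place two objects in the same cluster). The abstract does not give the definitions of ARImp or ARImm, so they are not implemented; the function names `adjusted_rand_index` and `consensus_matrix` are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard pair-counting ARI between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        # No object pairs to compare; treated as perfect agreement (a convention).
        return 1.0
    # Contingency counts n_ij and the marginal cluster sizes a_i, b_j.
    joint = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    # Expected index under random labelling with fixed marginals.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions trivial); a convention, not a theorem.
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


def consensus_matrix(partitions):
    """Co-association matrix: entry (i, j) is the fraction of partitions
    that put objects i and j in the same cluster (assumed definition)."""
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    ensemble = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
    print(adjusted_rand_index(ensemble[0], ensemble[1]))
    print(adjusted_rand_index(ensemble[0], ensemble[2]))
    for row in consensus_matrix(ensemble):
        print(row)
```

ARI is invariant to relabelling of clusters, which is why the first two ensemble members above count as identical partitions. A measure such as ARImp would compare a partition against the real-valued matrix directly rather than against another partition.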


Similar resources

Moderate diversity for better cluster ensembles

Adjusted Rand index is used to measure diversity in cluster ensembles and a diversity measure is subsequently proposed. Although the measure was found to be related to the quality of the ensemble, this relationship appeared to be non-monotonic. In some cases, ensembles which exhibited a moderate level of diversity gave a more accurate clustering. Based on this, a procedure for building a cluste...


Development of An External Cluster Validity Index using Probabilistic Approach and Min-max Distance

Validating a given clustering result is a very challenging task in the real world. For this purpose, several cluster validity indices have been developed in the literature. Cluster validity indices are divided into two main categories: external and internal. External cluster validity indices rely on some supervised information available and internal validity indices utilize the intrinsic structu...


Adjusting for Chance Clustering Comparison Measures

Adjusted for chance measures are widely used to compare partitions/clusterings of the same data set. In particular, the Adjusted Rand Index (ARI) based on pair-counting, and the Adjusted Mutual Information (AMI) based on Shannon information theory are very popular in the clustering community. Nonetheless it is an open problem as to what are the best application scenarios for each measure and gu...


Performance of an Ensemble Clustering Algorithm on Biological Data Sets

Ensemble clustering is a promising approach that combines the results of multiple clustering algorithms to obtain a consensus partition by merging different partitions based upon well-defined rules. In this study, we use an ensemble clustering approach for merging the results of five different clustering algorithms that are sometimes used in bioinformatics applications. The ensemble clustering ...


Unsupervised Feature Selection by Means of External Validity Indices

Feature selection for unsupervised data is a difficult task because a reference partition is not available to evaluate the relevance of the features. Recently, different proposals of methods for consensus clustering have used external validity indices to assess the agreement among partitions obtained by clustering algorithms with different parameter values. These indices are independent of the...



Journal:
  • Pattern Recognition

Volume 45

Publication date: 2012