Hierarchical Clustering With Prototypes via Minimax Linkage.
نویسندگان
چکیده
Agglomerative hierarchical clustering is a popular class of methods for understanding the structure of a dataset. The nature of the clustering depends on the choice of linkage-that is, on how one measures the distance between clusters. In this article we investigate minimax linkage, a recently introduced but little-studied linkage. Minimax linkage is unique in naturally associating a prototype chosen from the original dataset with every interior node of the dendrogram. These prototypes can be used to greatly enhance the interpretability of a hierarchical clustering. Furthermore, we prove that minimax linkage has a number of desirable theoretical properties; for example, minimax-linkage dendrograms cannot have inversions (unlike centroid linkage) and is robust against certain perturbations of a dataset. We provide an efficient implementation and illustrate minimax linkage's strengths as a data analysis and visualization tool on a study of words from encyclopedia articles and on a dataset of images of human faces.
منابع مشابه
Cluster merging and splitting in hierarchical clustering algorithms
Hierarchical clustering constructs a hierarchy of clusters by either repeatedly merging two smaller clusters into a larger one or splitting a larger cluster into smaller ones. The crucial step is how to best select the next cluster(s) to split or merge. Here we provide a comprehensive analysis of selection methods and propose several new methods. We perform extensive clustering experiments to t...
متن کاملChoosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation
1- INTRODUCTION The assessment of watershed sediment load is necessary for controling soil erosion and reducing the potential of sediment production. Different estimates of sediment amounts along with the lack of long-term measurements limits the accessibility to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...
متن کاملTwo tree - formation methods and fast pattern search using nearestneighbor and nearest
4 This paper describes tree-based classiication of character images, comparing two methods of tree formation and two methods of matching: nearest neighbor and nearest centroid. The rst method, Preprocess Using Relative Distances (PURD) is a tree-based reorganization of a at list of patterns, designed to speed up nearest-neighbor matching. The second method is a variant of agglomerative hierarch...
متن کاملOn the properties of $\alpha$-unchaining single linkage hierarchical clustering
In the election of a hierarchical clustering method, theoretic properties may give some insight to determine which method is the most suitable to treat a clustering problem. Herein, we study some basic properties of two hierarchical clustering methods: α-unchaining single linkage or SL(α) and a modified version of this one, SL∗(α). We compare the results with the properties satisfied by the cla...
متن کاملNoise Thresholds for Spectral Clustering
Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfec...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of the American Statistical Association
دوره 106 495 شماره
صفحات -
تاریخ انتشار 2011