Fast, Linear Time Hierarchical Clustering using the Baire Metric
Abstract
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, in contrast with the standard quadratic-time agglomerative hierarchical clustering algorithm. In this work we empirically evaluate this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
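As a rough illustration of why a Baire-type distance yields a hierarchy in linear time, the sketch below treats each value as a string of decimal digits and measures distance by the length of the longest common prefix. It is not taken from the paper: the function names, the base of 10, the fixed precision of 4 digits, the restriction to the fractional part of values in [0, 1), and the choice to return 0 for values that agree at every digit are all assumptions made for this example.

```python
from collections import defaultdict


def fractional_digits(value: str, precision: int) -> str:
    """Return the first `precision` digits after the decimal point (zero-padded)."""
    _, _, frac = value.partition(".")
    return frac.ljust(precision, "0")[:precision]


def baire_distance(x: str, y: str, precision: int = 4, base: float = 10.0) -> float:
    """Baire-type distance: base**(-k), with k the longest common prefix length.

    Assumption for this sketch: values that agree on all `precision` digits
    are treated as identical and get distance 0.
    """
    dx = fractional_digits(x, precision)
    dy = fractional_digits(y, precision)
    if dx == dy:
        return 0.0
    k = 0
    for a, b in zip(dx, dy):
        if a != b:
            break
        k += 1
    return base ** -k


def baire_clusters(values, level: int, precision: int = 4) -> dict:
    """Group values by their first `level` digits in one pass over the data.

    Each level of the hierarchy costs a single scan, so for a fixed
    precision the whole hierarchy is built in time linear in len(values).
    """
    clusters = defaultdict(list)
    for v in values:
        clusters[fractional_digits(v, precision)[:level]].append(v)
    return dict(clusters)


if __name__ == "__main__":
    sample = ["0.1234", "0.1256", "0.1911", "0.2345"]  # made-up values
    for level in range(1, 4):
        print(level, baire_clusters(sample, level))
```

Because two values share a prefix of length k exactly when they fall in the same bucket at level k, the buckets at successive levels are nested, which is the tree structure that an ultrametric implies. No pairwise distance matrix is ever formed.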
Similar resources
Fast redshift clustering with the Baire (ultra) metric
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more cost...
A novel hybrid clustering method using artificial immune systems and hierarchical clustering
Artificial immune system (AIS) is one of the most meta-heuristic algorithms to solve complex problems. With a large number of data, creating a rapid decision and stable results are the most challenging tasks due to the rapid variation in real world. Clustering technique is a possible solution for overcoming these problems. The goal of clustering analysis is to group similar objects. AIS algor...
Cone normed spaces
In this paper, we introduce the cone normed spaces and cone bounded linear mappings. Among other things, we prove the Baire category theorem and the Banach--Steinhaus theorem in cone normed spaces.
Hierarchical linear subspace indexing method
Traditional multimedia indexing methods are based on the principle of hierarchical clustering of the data space where metric properties are used to build a tree that can then be used to prune branches while processing the queries. However, the performance of these methods will deteriorate rapidly when the dimensionality of the data space is increased. We describe a new hierarchical linear subsp...
Fast Deterministic Single-Linkage 2D-Spatial Cluster Analysis
Cluster analysis is a common task in data mining, machine learning and related fields. There exist a plethora of clustering algorithms designed for this purpose, but many are prohibitively inefficient (e.g. quality-threshold clustering), non-deterministic (k-means) or utilise inherently lossy partitioning models (k-d tree clustering). Single-linkage hierarchical clustering is a form of cluster ...
Journal title:
- J. Classification
Volume 29
Publication date: 2012