Fast, Linear Time Hierarchical Clustering using the Baire Metric

Authors

  • Pedro Contreras
  • Fionn Murtagh
Abstract

The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, in contrast with the standard quadratic-time agglomerative hierarchical clustering algorithm. In this work we empirically evaluate this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
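The idea behind the linear-time claim can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes values in [0, 1), a base-10 digit expansion, and the common convention that two values sharing their first k digits are at distance 10^-k (distance 1 when the first digits already differ). The function names `decimal_digits`, `baire_distance`, and `baire_clusters` are hypothetical, chosen for this illustration only.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_DOWN


def decimal_digits(x, precision):
    """Return the first `precision` digits after the decimal point of x.

    Assumption: 0 <= x < 1. Digits are truncated, not rounded.
    """
    d = Decimal(str(x))
    if not (0 <= d < 1):
        raise ValueError("this sketch assumes values in [0, 1)")
    q = d.quantize(Decimal(10) ** -precision, rounding=ROUND_DOWN)
    return f"{q:f}".split(".")[1]


def baire_distance(x, y, precision=4, base=10):
    """Baire-style distance: base ** -k, where k is the shared prefix length.

    Values that agree on all `precision` digits get base ** -precision,
    i.e. they are indistinguishable at this precision.
    """
    k = 0
    for a, b in zip(decimal_digits(x, precision), decimal_digits(y, precision)):
        if a != b:
            break
        k += 1
    return float(base) ** -k


def baire_clusters(values, k):
    """Group values by their first k digits in a single pass over the data."""
    buckets = defaultdict(list)
    for v in values:
        buckets[decimal_digits(v, k)].append(v)
    return dict(buckets)


values = [0.1234, 0.1239, 0.1871, 0.5402]
level1 = baire_clusters(values, 1)  # coarse partition: one shared digit
level2 = baire_clusters(values, 2)  # finer partition, nested inside level1
```

Each call to `baire_clusters` visits every value once, so building one level of the hierarchy costs time linear in the number of objects. Increasing `k` refines the partition, and the nested partitions form a hierarchy without any pairwise distance computation.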

Similar resources

Fast redshift clustering with the Baire (ultra) metric

The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more cost...

A novel hybrid clustering method using artificial immune systems and hierarchical clustering

The artificial immune system (AIS) is one of the meta-heuristic algorithms used to solve complex problems. With large volumes of data, reaching rapid decisions and stable results is among the most challenging tasks because of rapid variation in the real world. Clustering is a possible solution to these problems. The goal of cluster analysis is to group similar objects. AIS algor...

Cone normed spaces

In this paper, we introduce the cone normed spaces and cone bounded linear mappings. Among other things, we prove the Baire category theorem and the Banach-Steinhaus theorem in cone normed spaces.

Hierarchical linear subspace indexing method

Traditional multimedia indexing methods are based on the principle of hierarchical clustering of the data space where metric properties are used to build a tree that can then be used to prune branches while processing the queries. However, the performance of these methods will deteriorate rapidly when the dimensionality of the data space is increased. We describe a new hierarchical linear subsp...

Fast Deterministic Single-Linkage 2D-Spatial Cluster Analysis

Cluster analysis is a common task in data mining, machine learning and related fields. There exist a plethora of clustering algorithms designed for this purpose, but many are prohibitively inefficient (e.g. quality-threshold clustering), non-deterministic (k-means) or utilise inherently lossy partitioning models (k-d tree clustering). Single-linkage hierarchical clustering is a form of cluster ...

Journal:
  • J. Classification

Volume 29

Publication date: 2012