Iterative random projections for high-dimensional data clustering

نویسندگان

  • Ângelo Cardoso
  • Andreas Wichert
چکیده

In this text we propose a method which efficiently performs clustering of high-dimensional data. The method builds on random projection and the Kmeans algorithm. The idea is to apply K-means several times, increasing the dimensionality of the data after each convergence of K-means. We compare the proposed algorithm on four high-dimensional datasets, image, text and two synthetic, with K-means clustering using a single random projection and K-means clustering of the original high-dimensional data. Regarding time we observe that the algorithm reduces drastically the time when compared to K-means on the original high-dimensional data. Regarding mean squared error the proposed method reaches a better solution than clustering using a single random projection. More notably in the experiments performed it also reaches a better solution than clustering on the original high-dimensional data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Method for Clustering High-Dimensional Data Using 1D Random Projections

Han, Sangchun PhD, Purdue University, December 2014. A Method for Clustering High-Dimensional Data Using 1D Random Projections. Major Professor: Mireille Boutin. Clustering high-dimensional data is more difficult than clustering low-dimensional data. The problem is twofold. First, there is an efficiency problem related to the data size, which increases with the dimensionality. Second, there is ...

متن کامل

Fast and Robust Subspace Clustering Using Random Projections

Over the past several decades, subspace clustering has been receiving increasing interest and continuous progress. However, due to the lack of scalability and/or robustness, existing methods still have difficulty in dealing with the data that possesses simultaneously three characteristics: high-dimensional, massive and grossly corrupted. To tackle the scalability and robustness issues simultane...

متن کامل

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

Ensemble Fuzzy Clustering using Cumulative Aggregation on Random Projections

Random projection is a popular method for dimensionality reduction due to its simplicity and efficiency. In the past few years, random projection and fuzzy c-means based cluster ensemble approaches have been developed for high dimensional data clustering. However, they require large amounts of space for storing a big affinity matrix, and incur large computation time while clustering in this aff...

متن کامل

Dimensionality Reduction for Distance Based Video Clustering

Clustering of video sequences is essential in order to perform video summarization. Because of the high spatial and temporal dimensions of the video data, dimensionality reduction becomes imperative before performing Euclidean distance based clustering. In this paper, we present non-adaptive dimensionality reduction approaches using random projections on the video data. Assuming the data to be ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Pattern Recognition Letters

دوره 33  شماره 

صفحات  -

تاریخ انتشار 2012