Refining Initial Points for K-Means Clustering

Authors

  • Paul S. Bradley
  • Usama M. Fayyad
Abstract

Practical approaches to clustering use an iterative procedure (e.g. K-Means, EM) which converges to one of numerous local minima. It is known that these iterative techniques are especially sensitive to initial starting conditions. We present a procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution. The refined initial starting condition allows the iterative algorithm to converge to a “better” local minimum. The procedure is applicable to a wide class of clustering algorithms for both discrete and continuous data. We demonstrate the application of this method to the popular K-Means clustering algorithm and show that refined initial starting points indeed lead to improved solutions. Refinement run time is considerably lower than the time required to cluster the full database. The method is scalable and can be coupled with a scalable clustering algorithm to address the large-scale clustering problems in data mining.
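The abstract does not spell out the refinement steps, so the following is only a minimal sketch under stated assumptions: K-Means is run from the given starting points on several small random subsamples, the resulting centroids are pooled, and the pool is then clustered starting from each subsample solution, keeping the candidate with the lowest distortion over the pool. The function names (`kmeans`, `distortion`, `refine_initial_points`) and the parameters `n_subsamples` and `subsample_size` are hypothetical and chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init, iters=50):
    """Plain Lloyd iterations starting from the given centroids."""
    centers = [tuple(c) for c in init]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def distortion(points, centers):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def refine_initial_points(data, init, n_subsamples=10, subsample_size=50, seed=0):
    """Assumed refinement scheme: cluster subsamples, then cluster the pooled centroids."""
    rng = random.Random(seed)
    size = min(subsample_size, len(data))
    # One K-Means solution per small random subsample, all from the same start.
    solutions = [kmeans(rng.sample(data, size), init) for _ in range(n_subsamples)]
    pooled = [c for sol in solutions for c in sol]
    # Cluster the pooled centroids once per subsample solution, keep the best fit.
    candidates = [kmeans(pooled, sol) for sol in solutions]
    return min(candidates, key=lambda c: distortion(pooled, c))


# Usage: refine a poor start, then cluster the full data set from the refined points.
gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
data += [(gen.gauss(10, 1), gen.gauss(10, 1)) for _ in range(200)]
refined = refine_initial_points(data, [(0.0, 0.0), (0.5, 0.5)])
final_centers = kmeans(data, refined)
```

Because each subsample is much smaller than the full data set, the refinement pass in this sketch touches far fewer points than a full clustering run, which is consistent with the run-time claim in the abstract.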

Similar resources

Fast Convergence Clustering Ensemble

A clustering ensemble combines several clustering outputs to obtain better results. High robustness, accuracy and stability are the most important characteristics of clustering ensembles. Previous clustering ensembles usually use k-means to generate ensemble members. The main problem of k-means is its choice of initial samples, which strongly affects the final results. Refining initial samples of k-means increases t...

Refining membership degrees obtained from fuzzy C-means by re-fuzzification

Fuzzy C-means (FCM) is the best-known and most widely used fuzzy clustering algorithm. However, one weakness of FCM is that it assigns membership degrees to data based only on the distance to the cluster centers. As a result, the membership degrees are determined without considering the shape and density of the clusters. In this paper, we propose an algorithm which takes th...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Comparing Model-based Versus K-means Clustering for the Planar Shapes

In some fields, there is an interest in distinguishing different geometrical objects from each other. A field of research that studies the objects from a statistical point of view, provided they are invariant under translation, rotation and scaling effects, is known as the statistical shape analysis. Having some objects that are registered using key points on the outline...

Automatic Clustering Approaches Based On Initial Seed Points

Since clustering is applied in many fields, a number of clustering techniques and algorithms have been proposed and are available in the literature. This paper proposes a novel approach to address two major problems of partitional clustering algorithms: choosing an appropriate value of K and selecting the K initial seed points. The performance of any partitional clustering algorithms d...

Publication date: 1998