Refining A Divisive Partitioning Algorithm for Unsupervised Clustering
Authors
Abstract
The Principal Direction Divisive Partitioning (PDDP) algorithm is a fast and scalable clustering algorithm [3]. The basic idea is to recursively split the data set into sub-clusters based on principal direction vectors. However, the PDDP algorithm can yield poor results, especially when clusters are not well separated from one another. In addition, its stopping criterion is based on a heuristic that tends to overestimate the number of clusters. In this paper, we propose simple and efficient solutions to these two problems: refining the results of the splitting process, and applying the Bayesian Information Criterion (BIC) to estimate the true number of clusters. This motivates a novel algorithm for unsupervised clustering, whose experimental results on different data sets are very encouraging.
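The recursive splitting idea described in the abstract can be sketched as follows. This is a minimal illustration of basic PDDP-style splitting only, not the refined algorithm proposed in the paper. The function names `pddp_split` and `pddp`, the row-per-point data layout, the largest-scatter rule for choosing which cluster to split, and the fixed `n_clusters` stopping rule are all assumptions made for illustration; the paper instead uses BIC to decide when to stop.

```python
import numpy as np


def pddp_split(X):
    """Split the rows of X into two groups by the sign of their projection
    onto the principal direction of the mean-centered data."""
    centered = X - X.mean(axis=0)
    # The leading right singular vector of the centered data is the
    # principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[0]
    left = np.flatnonzero(projections <= 0)
    right = np.flatnonzero(projections > 0)
    return left, right


def pddp(X, n_clusters):
    """Repeatedly split the cluster with the largest scatter until
    n_clusters clusters exist (assumed stopping rule, for illustration)."""
    clusters = [np.arange(X.shape[0])]
    while len(clusters) < n_clusters:
        # Scatter: sum of squared distances of the points to their centroid.
        scatters = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum()
                    for idx in clusters]
        idx = clusters.pop(int(np.argmax(scatters)))
        left, right = pddp_split(X[idx])
        if len(left) == 0 or len(right) == 0:
            # Degenerate case (e.g. identical points): cannot split further.
            clusters.append(idx)
            break
        clusters.extend([idx[left], idx[right]])
    return clusters
```

Each element of the returned list is an array of row indices into `X`. Because every split is a hard cut at the centroid along a single direction, points near the cut can be misassigned when clusters overlap, which is the weakness the abstract says the refinement step addresses.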
Similar resources
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
A Multi-level Approach for Document Clustering
The divisive MinMaxCut algorithm of Ding et al. [3] produces more accurate clustering results than existing document clustering methods. Multilevel algorithms [4, 1, 5, 7] have been used to boost the speed of graph partitioning algorithms. We combine these two approaches to construct a faster and more accurate algorithm. In this new algorithm, the original graph is coarsened, partitioned by the divi...
Document Clustering using Sequential Information Bottleneck Method
Document clustering is a subset of the larger field of data clustering, which borrows concepts from the fields of information retrieval (IR), natural language processing (NLP), and machine learning (ML). It is a more specific technique for unsupervised document organization, automatic topic extraction and fast information retrieval or filtering. There exist a wide variety of unsupervised cluste...
Web Document Clustering Using Threshold Selection Partitioning
Clustering techniques have been applied to categorize documents on the World Wide Web. In previous research, PDDP (Principal Direction Divisive Partitioning) is a well-known clustering algorithm. The PDDP algorithm performs top-down, unsupervised clustering based on principal component analysis and splits documents into two sets using a plane perpendicular to the maximum principal direction passi...
On the performance of bisecting K-means and PDDP
The problem this paper focuses on is the unsupervised clustering of a data-set. The dataset is given by the matrix $M = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{p \times N}$, where each column of $M$, $x_i \in \mathbb{R}^p$, is a single data-point. This is one of the more basic and common problems in fields like pattern analysis, data mining, document retrieval, image segmentation, decision making, etc. ([12, 13]). The specific...