Size Regularized Cut for Data Clustering
Abstract
We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut.
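The abstract does not state the cost function explicitly, so the following is only a minimal sketch under a stated assumption: the regularization term is taken to be -alpha * |A| * |B|, where A and B are the two clusters and alpha is a user-chosen weight. Under that assumption, and up to constants, minimizing the cost over +1/-1 indicator vectors x is the same as maximizing x^T (W - alpha * 1 1^T) x, and dropping the integrality constraint leaves a largest-eigenvector problem. The function names, the toy similarity matrix, and the value of alpha are all hypothetical and are not taken from the paper.

```python
import numpy as np


def srcut_value(W, labels, alpha):
    """Cut weight between the two groups minus alpha * |A| * |B|.

    The form of the size term is an assumption made for illustration.
    """
    labels = np.asarray(labels, dtype=bool)
    cut = W[np.ix_(labels, ~labels)].sum()  # total similarity across the split
    return cut - alpha * labels.sum() * (~labels).sum()


def relaxed_bipartition(W, alpha):
    """Split by the sign of the top eigenvector of W - alpha * 1 1^T."""
    n = W.shape[0]
    M = W - alpha * np.ones((n, n))
    _, vecs = np.linalg.eigh(M)  # eigh returns eigenvalues in ascending order
    v = vecs[:, -1]              # eigenvector of the largest eigenvalue
    return v >= 0                # boolean cluster membership


# Toy similarity matrix: two groups of four points, strong similarity (1.0)
# inside each group and weak similarity (0.1) between the groups.
W = np.full((8, 8), 0.1)
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)

labels = relaxed_bipartition(W, alpha=0.5)
print(labels.astype(int), srcut_value(W, labels, alpha=0.5))
```

With alpha = 0 the top eigenvector of a nonnegative similarity matrix has entries of a single sign, so the sign split places every point in one cluster; in this sketch a positive alpha is what pushes the solution toward a split with two non-empty sides.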
Similar resources
Clustering of Defect Reports Using Graph Partitioning Algorithms
We present in this paper several solutions to the challenging task of clustering software defect reports. Clustering defect reports can be very useful for prioritizing the testing effort and to better understand the nature of software defects. Despite some challenges with the language used and semi-structured nature of defect reports, our experiments on data collected from the open source proje...
On Trivial Solution and Scale Transfer Problems in Graph Regularized NMF
Combining graph regularization with nonnegative matrix (tri-)factorization (NMF) has shown great performance improvement compared with traditional nonnegative matrix (tri-)factorization models due to its ability to utilize the geometric structure of the documents and words. In this paper, we show that these models are not well-defined and suffering from trivial solution and scale transfer problem...
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
Subspace Clustering via Graph Regularized Sparse Coding
Sparse coding has gained popularity and interest due to the benefits of dealing with sparse data, mainly space and time efficiencies. It presents itself as an optimization problem with penalties to ensure sparsity. While this approach has been studied in the literature, it has rarely been explored within the confines of clustering data. It is our belief that graph-regularized sparse coding can ...
Laplacian regularized low rank subspace clustering
The problem of fitting a union of subspaces to a collection of data points drawn from multiple subspaces is considered in this paper. In the traditional low rank representation model, the dictionary used to represent the data points is chosen as the data points themselves and thus the dictionary is corrupted with noise. This problem is solved in the low rank subspace clustering model which deco...