Hierarchical Document Clustering Using Correlation Preserving Indexing
Abstract
This paper presents a spectral clustering method called correlation preserving indexing (CPI), which operates in the correlation similarity measure space. CPI explicitly considers the manifold structure embedded in the similarities between documents. Its aim is to find an optimal semantic subspace by simultaneously maximizing the correlation between documents within local patches and minimizing the correlation between documents outside these patches. Correlation is a similarity measure that can capture the intrinsic structure of high-dimensional data. To reduce the computational cost of CPI, we propose applying the bi-iterative least squares method for dimensionality reduction. When the effectiveness of CPI is compared with other clustering methods, using the bi-iterative least squares method yields a considerable reduction in computation time.
Key Terms: Document clustering; Correlation preserving indexing; Singular value decomposition; Dimensionality reduction; Correlation measure; QR decomposition
Full Text: http://www.ijcsmc.com/docs/papers/August2013/V2I8201352.pdf
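The abstract names two computational ingredients: the correlation similarity measure and an iterative, QR-based reduction of the term-document matrix. The dependency-free Python sketch below illustrates both under stated assumptions; it is not the paper's implementation. Correlation is assumed to be the Pearson coefficient (cosine similarity of mean-centred vectors), and the reduction is the basic bi-iteration scheme, which alternates thin QR decompositions of X Q_B and X^T Q_A to approximate the dominant rank-k singular subspaces. The authors' bi-iterative least squares procedure may differ in its details, and the CPI objective itself (patch-based correlation maximization) is not implemented here. The helper names (`correlation`, `orthonormalize`, `bi_iteration`) and the toy matrix are hypothetical.

```python
import math
import random

Matrix = list[list[float]]  # row-major: Matrix[i][j] is row i, column j


def correlation(x: list[float], y: list[float]) -> float:
    """Pearson correlation (assumed definition): cosine similarity after mean-centring."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    num = sum(a * b for a, b in zip(cx, cy))
    den = math.sqrt(sum(a * a for a in cx)) * math.sqrt(sum(b * b for b in cy))
    return num / den if den else 0.0  # constant vectors: define correlation as 0


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def orthonormalize(a: Matrix) -> Matrix:
    """Q factor of a thin QR decomposition of `a`, via modified Gram-Schmidt.

    Assumes `a` has full column rank, so no zero-norm column appears.
    """
    basis: list[list[float]] = []
    for v in transpose(a):          # iterate over the columns of `a`
        w = list(v)
        for u in basis:             # remove the component along each earlier basis vector
            proj = sum(wi * ui for wi, ui in zip(w, u))
            w = [wi - proj * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return transpose(basis)         # back to row-major, one column per basis vector


def bi_iteration(x: Matrix, k: int, iters: int = 50, seed: int = 0) -> tuple[Matrix, Matrix]:
    """Approximate the dominant rank-k left/right singular subspaces of `x`
    (terms x documents) by alternating X Q_B = Q_A R_A and X^T Q_A = Q_B R_B."""
    if iters < 1:
        raise ValueError("iters must be at least 1")
    rng = random.Random(seed)
    n_docs = len(x[0])
    q_b = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_docs)])
    xt = transpose(x)
    for _ in range(iters):
        q_a = orthonormalize(matmul(x, q_b))   # term-space step
        q_b = orthonormalize(matmul(xt, q_a))  # document-space step
    return q_a, q_b


# Hypothetical toy term-document matrix (4 terms x 4 documents) with two "topics":
# documents 0 and 1 use terms 0-1, documents 2 and 3 use terms 2-3.
X = [[2.0, 4.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 6.0],
     [0.0, 0.0, 1.0, 2.0]]
docs = transpose(X)                    # one row per document
q_a, _ = bi_iteration(X, k=2)
reduced = matmul(transpose(q_a), X)    # each column: one document in k = 2 dimensions
print(correlation(docs[0], docs[1]), correlation(docs[0], docs[2]))
```

Each iteration costs on the order of m·n·k operations for the two matrix products plus two thin QR steps, rather than a full SVD of the whole matrix, which is why iterations of this kind are attractive when the target dimension k is small.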
Similar resources
Correlation Preserved Indexing Based Approach For Document Clustering
Document clustering is the act of collecting similar documents into clusters, where similarity is some function on a document. A document clustering method 1) achieves high accuracy for documents, 2) calculates document frequency, and 3) calculates term weights from the term frequency vector. Document clustering is closely related to the concept of data clustering. Document clustering is a m...
Grouping and Categorization of Documents in Relativity Measure
This paper presents a spectral clustering method called correlation preserving indexing (CPI), which operates in the correlation similarity measure space. The documents are projected into a low-dimensional semantic space in which the correlations between documents in the local patches are maximized and the correlations between documents outside these patches are minimized. The intrins...
Non-hierarchical Clustering with Rival Penalized Competitive Learning for Information Retrieval
In large content-based image database applications, efficient information retrieval depends heavily on good indexing structures of the extracted features. While indexing techniques for text retrieval are well understood, efficient and robust indexing methodology for image retrieval is still in its infancy. In this paper, we present a non-hierarchical clustering scheme for index generation using the...
Context Based Indexing On Synonym System Using Hierarchical Clustering In Web Mining
Nowadays, the World Wide Web is a collection of a large amount of information that is increasing day by day. To handle this growing amount of information, an efficient and effective indexing structure is needed. Indexing has become a major issue for improving the performance of Web search engines, so that the most relevant web documents are retrieved in minimum possib...
Effects of Visual Concept-based Post-retrieval Clustering in ImageCLEFphoto 2008
We examined the effectiveness of post-retrieval clustering that was based on the visual similarities among images to enhance the instance recall in the photo retrieval task of ImageCLEF 2008. The visual similarities are defined by the example visual concepts that were provided for the automatic photo indexing task. We tested two types of visual concepts and two kinds of clustering methods, hier...