Keyword-based document clustering
Abstract
Document clustering aggregates related documents into clusters by evaluating the similarity between each document and the representatives of the clusters. Terms and their discriminating features are the clues for clustering, and these features are derived from term and document frequencies. Feature selection based on frequency statistics limits how far a clustering algorithm can be improved, because it does not consider the contents of the objects being clustered. In this paper, we adopt a content-based analytic approach to refine the similarity computation and propose a keyword-based clustering algorithm. Experimental results show that content-based keyword weighting outperforms the frequency-based weighting method.
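To make the frequency-based baseline concrete, the following is a minimal sketch, not the paper's algorithm: it weights terms by tf-idf (term frequency times the log of inverse document frequency) and assigns each document to the cluster representative with the highest cosine similarity. The function names, the toy documents, and the choice of the first and third documents as representatives are all assumptions made for illustration. The content-based keyword weighting proposed in the paper is not reproduced here, because the abstract does not specify it.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = term frequency * log(N / document frequency).
    This is the plain frequency-based weighting, used here as an assumption.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign(vectors, representatives):
    """Assign each document vector to its most similar representative."""
    return [
        max(range(len(representatives)),
            key=lambda k: cosine(v, representatives[k]))
        for v in vectors
    ]


# Hypothetical toy corpus: two documents about clustering, two about retrieval.
docs = [
    ["keyword", "cluster", "document"],
    ["cluster", "similarity", "document"],
    ["image", "retrieval", "wavelet"],
    ["image", "retrieval", "query"],
]
vectors = tfidf_vectors(docs)

# Assumption: documents 0 and 2 act as the cluster representatives.
labels = assign(vectors, [vectors[0], vectors[2]])
```

In this sketch the weights depend only on how often a term occurs, which is the limitation the abstract points to: two documents that share frequent but uninformative terms would still look similar.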
Similar resources
Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. The proposed method presents a framework that uses relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...
Log based Keyword Extraction and Spread based Clustering for an Efficient Information Searching
Today, efficient information search is very important for extracting and analyzing user requirements in the vast amount of web information. For this reason, this paper proposes a log based keyword extraction method which finds the associated keywords in a certain domain. This paper also proposes a spread based clustering method for clustering the keywords with high association among the keyword-...
Simultaneous Categorization of Text Documents and Identification of Cluster-dependent Keywords
In this paper, we propose a new approach to unsupervised text document categorization based on a coupled process of clustering and cluster-dependent keyword weighting. The proposed algorithm is based on the K-Means clustering algorithm. Hence it is computationally and implementationally simple. Moreover, it learns a different set of keyword weights for each cluster. This means that, as a by-pro...
Content Based Document Image Retrieval with Support Vectors Clustering
The goal of this paper is to present a suitable approach to content based document image retrieval. In the proposed algorithm, a feature vector is extracted with the wavelet transform for sub-words. Then, based on these features, sub-words are clustered with the support vector clustering (SVC) algorithm, and this approach is used for keyword-based searching in the content based document retrieval problem. T...
Experiments in Clustering Documents for Automatic Acquisition of Lexical Semantic Networks for Polish
The aim of this work is to explore document clustering techniques for the needs of semi–automatic construction of a lexical semantic network for Polish. Although the majority of research in this area is based on measures of distributional similarity calculated from co-occurrences of words in large collections of documents, we wanted to approach a difficult problem of meaning ambiguity resolutio...