Automatic generation of initial value k to apply k-means method for text documents clustering
Authors
Abstract
Retrieving the text documents relevant to a topic from a large document collection is a challenging task, and a range of clustering algorithms have been developed for it. Hierarchical clustering has quadratic time complexity, O(n^2), for n text documents. The K-means algorithm has a time complexity of O(n), but it is sensitive to the randomly selected initial cluster centers and can converge to a local optimum. Global K-means employs K-means as a local search procedure to produce a globally optimal solution, but has polynomial time complexity of O(nk) to produce k clusters. In this chapter, a new approach to clustering text documents is proposed that overcomes these drawbacks of K-means and Global K-means and gives a globally optimal solution with a time complexity of O(lk) to obtain k clusters from an initial set of l starting clusters. Experimental evaluation on the Reuters newsfeeds collection (Reuters-21578) shows that the clustering results of the proposed method, measured by entropy, purity, and F-measure, are comparable with those of K-means and Global K-means.
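The abstract names entropy and purity as evaluation measures but does not give their formulas. As a minimal sketch, the following uses the commonly cited definitions: purity is the fraction of documents that belong to the majority class of their cluster, and entropy is the size-weighted average of each cluster's class entropy (in bits, without normalization). The function name `purity_and_entropy` and the toy labels are illustrative assumptions, not taken from the chapter.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against known class labels.

    Assumed definitions (not stated in the abstract):
      purity  = sum over clusters of (majority class count) / n
      entropy = sum over clusters of (cluster size / n) * H(classes in cluster)
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    n = len(cluster_ids)
    if n == 0:
        raise ValueError("at least one document is required")

    # Count how many documents of each class fall into each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster][label] += 1

    purity = 0.0
    entropy = 0.0
    for counts in members.values():
        size = sum(counts.values())
        purity += max(counts.values()) / n
        # Shannon entropy of the class distribution inside this cluster.
        h = -sum((m / size) * math.log2(m / size) for m in counts.values())
        entropy += (size / n) * h
    return purity, entropy


# Toy data: six documents, two clusters, two made-up topic labels.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["grain", "grain", "trade", "trade", "trade", "trade"]
purity, entropy = purity_and_entropy(clusters, labels)
print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference classes; some authors additionally divide entropy by the logarithm of the number of classes to scale it to [0, 1].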
Similar resources
Comparing k-means clusters on parallel Persian-English corpus
This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research on clustering English documents has been carried out. This raises the question of whether those results extend to other languages. Since the goal of document clustering is grouping of docum...
Graph based Text Document Clustering by Detecting Initial Centroids for k-Means
Document clustering is used in information retrieval to organize a large collection of text documents into meaningful clusters. The k-means clustering algorithm, which belongs to the partitional category, performs well on document clustering. k-means organizes a large collection of items into k clusters so that a criterion function is optimized. As it is sensitive to the initial values of cluster centroids, this...
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, the use of clustering algorithms for data fusion at the decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
Designing a semi-automatic ontology construction system using word co-occurrence analysis and the C-value method (case study: the field of scientometrics in Iran)
An ontology is a formal description of the concepts and relations in a specific domain. Recently, there have been attempts to design automatic methods for ontology learning. Since an ontology contains concepts and relations, this involves extracting the concepts and the semantic relations among them. Building ontologies for various domains and different applications is an expensive process, which is why it is being automated. The lack of main knowledg...
Optimization of Initial Centroids for K-Means Algorithm Based on Small World Network
The K-means algorithm is a relatively simple and fast clustering algorithm. However, the initial cluster centers of the traditional k-means algorithm are generated randomly from the dataset, so the clustering result is unstable. In this paper, we propose a novel method to optimize the selection of initial centroids for the k-means algorithm based on the small world network. This paper firstl...
Journal title:
- IJDMMM
Volume 3, Issue
Pages -
Publication date 2011