Validation of Document Clustering based on Purity and Entropy measures

نویسندگان

  • M. Deepa
  • P. Revathy
چکیده

Document clustering aims to automatically group related document into clusters. If two documents are close to each other in the original document space, they are grouped into the same cluster. If the two documents are far away from each other in the original document space, they tend to be grouped into different cluster. The classical clustering algorithms assign each data to exactly one cluster, but fuzzy c-means allow data belong to different clusters. Fuzzy clustering is a powerful unsupervised method for the analysis of data. Cluster validity measure is useful in estimating the optimal number of clusters. Purity and Entropy are the validity measures used in this

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

متن کامل

Simulated Annealing Clustering for Optimum GPS Satellite Selection

This paper utilizes a clustering approach based on Simulated Annealing (SA) method to select optimum satellite subsets from the visible satellites. Geometric Dilution of Precision (GDOP) is used as criteria of optimality. The lower the values of the GDOP number, the better the geometric strength, and vice versa. Not needing to calculate the inverse matrix, which is time-consuming process, is a ...

متن کامل

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

متن کامل

Text Document Clustering based on Phrase

Affinity propagation (AP) was recently introduced as an unsupervised learning algorithm for exemplar based clustering. In this paper novel text document clustering algorithm has been developed based on vector space model, phrases and affinity propagation clustering algorithm. Proposed algorithm can be called Phrase affinity clustering (PAC). PAC first finds the phrase by ukkonen suffix tree con...

متن کامل

An Analysis of Ministry of Education’s Strategic Plans Based on Favorable Components of English Language Teaching Using Shannon’s Entropy

The present research aims to analyze the content of Ministry of Education’s strategic plans (the Fundamental Reform Document of Education, the Comprehensive National Scientific Plan and the National Curriculum Document) based on Shannon's entropy regarding the favorable components of teaching English. The contents of the Fundamental Reform Document of Education, the Comprehensive National Scien...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012