Hierarchical and Partitioning Algorithm for Document Clustering: a Survey

Authors

  • JITENDRA AGRAWAL
  • SHIKHA AGRAWAL
  • SAUMYA AGRAWAL
  • SANJEEV SHARMA
Abstract

Document clustering is a widely researched area because large amounts of rich and dynamic information are available on the World Wide Web. It is the application of cluster analysis to textual documents. Applications of document clustering include automatic document organization, data mining, topic extraction, and filtering or fast information retrieval. The purpose of this survey is to provide a review of the different partitioning and hierarchical techniques used in document clustering.

Keywords: Document clustering, Hierarchical clustering, Partitioning clustering, k-means.
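As a rough illustration of the partitioning family named in the keywords, the sketch below clusters a few toy documents with k-means over TF-IDF vectors, using cosine similarity. It is not taken from the survey: the function names (`tfidf_vectors`, `normalise`, `cosine`, `kmeans_cosine`), the smoothed IDF formula, the fixed seed indices, and the sample documents are all assumptions made for this example.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Turn whitespace-tokenised documents into unit-length TF-IDF dicts."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch, not the survey's formula).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, k, seeds, max_iter=20):
    """k-means with cosine similarity; `seeds` are indices of the initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the normalised sum of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if not members:
                continue  # keep the old centroid for an empty cluster
            total = Counter()
            for v in members:
                total.update(v)  # adds weights term by term
            centroids[c] = normalise(total)
    return labels


docs = [
    "clustering groups similar documents",
    "document clustering uses k-means",
    "football match ended in a draw",
    "the football team won the match",
]
labels = kmeans_cosine(tfidf_vectors(docs), k=2, seeds=[0, 2])
print(labels)
```

With these seeds the two clustering-related documents and the two football-related documents should end up in separate clusters. A hierarchical variant could apply the same partitioning step recursively to each resulting cluster.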

Similar resources

A Comparative Study on Storage & Retrival in Data Mining Using Clustring

The purpose of data mining is to extract useful information from a bulky data set. Clustering analysis is an important technique in the field of data mining. It is the process of grouping similar vectors of a document into a number of clusters. The basic procedure for clustering is to divide a document into a set of terms, assign weights to these terms, and classify them according to their feature...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have drawn many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Algorithm for Hierarchical Multi-way Divisive Clustering of Document Collections

This paper proposes a novel hierarchical divisive clustering algorithm that generates a multi-branch tree, not a binary one, as its output. In order to use the algorithm for clustering large document sets, a spherical k-means clustering algorithm based on a cosine measure is adopted to recursively partition the document set from top to bottom. Also, by selecting automatically the nu...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering is one of the important techniques of text mining; it is the unsupervised classification of similar documents into different groups. The most important step...

Publication date: 2015