Fuzzy Clustering of Documents

Authors

  • Matjaž Juršič
  • Nada Lavrač
Abstract

This paper presents a short overview of methods for fuzzy clustering and states the desired properties of an optimal fuzzy document clustering algorithm. Based on these criteria we chose one of the most prominent fuzzy clustering methods, c-means, or more precisely probabilistic c-means. This algorithm is presented in more detail along with some empirical results from clustering 2-dimensional points and documents. For the needs of document clustering we implemented fuzzy c-means in the TextGarden environment. We describe a few difficulties with the implementation and their possible solutions. In conclusion, we also propose further work that would be needed in order to fully exploit the power of fuzzy document clustering in TextGarden.
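
As a rough illustration of the probabilistic fuzzy c-means method named in the abstract, the following is a minimal sketch of the textbook algorithm applied to 2-dimensional points. It is not the TextGarden implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the random initialisation, the stopping tolerance, and the toy data are all assumptions made for this example.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook probabilistic fuzzy c-means (illustrative sketch, requires m > 1).

    Returns (centers, u), where u[i][j] is the degree to which points[i]
    belongs to cluster j; every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])

        # Membership update from squared Euclidean distances to the centers.
        new_u = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, ctr)) for ctr in centers]
            if min(d2) == 0.0:
                # The point coincides with a center: share membership among
                # the coinciding centers and give zero to the rest.
                hits = [1.0 if d == 0.0 else 0.0 for d in d2]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                new_u.append([v / total for v in inv])

        # Stop once no membership value moved by more than tol.
        change = max(
            abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new)
        )
        u = new_u
        if change < tol:
            break

    return centers, u


# Toy data (assumed): two square groups and one point halfway between them.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (5.5, 5.5),
]
centers, u = fuzzy_c_means(points, c=2)
for p, row in zip(points, u):
    print(p, [round(v, 3) for v in row])
```

In this sketch the points inside each group receive a membership close to 1 for their own cluster, while the halfway point is shared between both clusters. That graded assignment is the difference from a hard partition, and it is why the method suits documents that touch several topics.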

Similar resources

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

Fuzzy C-Means Clustering for Biomedical Documents Using Ontology Based Indexing and Semantic Annotation

Search is the most obvious application of information retrieval. The variety of widely obtainable biomedical data is enormous and is expanding fast. As a result of this expansion, existing techniques are not enough to extract the most interesting patterns from the collection as per the user requirement. Recent research concentrates more on semantic based searching than the traditional term bas...

Retrieval of Web Documents Using a Fuzzy Hierarchical Clustering

The World Wide Web holds a huge amount of information that is retrieved using information retrieval tools such as search engines. The page repository of a search engine contains the web documents downloaded by the crawler. This repository contains a variety of web documents from different domains. In this paper, a technique called “Retrieval of Web documents using a fuzzy hierarchical clustering” is being propo...

Web Document Clustering Using Fuzzy Equivalence Relations

Conventional clustering means classifying the given data objects into exclusive subsets (clusters). That means we can discriminate clearly whether an object belongs to a cluster or not. However, such a partition is insufficient to represent many real situations. Therefore a fuzzy clustering method is offered to construct clusters with uncertain boundaries and allows that one object belongs to overl...

Document Clustering Based On Semi-Supervised Term Clustering

The study proposes a multi-step feature (term) selection process and, in semi-supervised fashion, provides initial centers for term clusters. It then uses the fuzzy c-means (FCM) clustering algorithm to cluster the terms. Finally, each document is assigned to its closest associated term clusters. While most text clustering algorithms directly use documents for clustering, we propose to f...

Publication date: 2008