A Comparison of Two Document Clustering Approaches for Clustering Medical Documents

نویسندگان

  • Fathi H. Saad
  • Beatriz de la Iglesia
  • Duncan G. Bell
چکیده

form of medical reports. Such documents contain important information about patients, disease progression and management, but are difficult to analyse with conventional data mining techniques due to their unstructured nature. Clustering the medical documents into small number of meaningful clusters may facilitate discovering patterns by allowing us to extract a number of relevant features from each cluster, thus introducing structure into the data and facilitating the application of conventional data mining techniques. For this approach to work, it is essential to produce high-quality clustering. Thus, the main goals of this paper are (1) to experimentally evaluate the performance of six criterion functions in the context of partitional clustering approach, (2) to compare the clustering results of agglomerative approach and partitional approach for each of the criterion functions using real-world medical documents, and (3) to establish the right clustering algorithm to produce high quality clustering of real-world medical documents in order to discover hidden knowledge by analyzing the produced clusters. Our experimental results show that the clustering solutions produced by the agglomerative approach are consistently better than those produced by the partitional approach for all the criterion functions. Moreover, the results show that different criterion functions lead to substantially different results. In addition, we examine the quality of the features produced for each cluster for a classification task. The task involves discriminating between successful and unsuccessful procedures. The features extracted are used to produce an accurate classification of the data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

خوشه‌بندی اسناد مبتنی بر آنتولوژی و رویکرد فازی

Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...

متن کامل

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...

متن کامل

خوشه‌بندی فراابتکاری اسناد فارسی اِکس‌اِم‌اِل مبتنی بر شباهت ساختاری و محتوایی

Due to the increasing number of documents, XML, effectively organize these documents in order to retrieve useful information from them is essential. A possible solution is performed on the clustering of XML documents in order to discover knowledge. Clustering XML documents is a key issue of how to measure the similarity between XML documents. Conventional clustering of text documents using a do...

متن کامل

Using Clustering Techniques for on-segmented Language Document Management: A Comparison of K-mean and Self Organizing Map Techniques

Since the number of electronics non-segmented language documents is growing very fast, efficient document clustering techniques for non-segmented languages are needed as a tool in today’s world where a lot of documents are stored and retrieved electronically. It enables one to group the similar documents using keywords or terms of the clusters. Thus document clustering can be used to group and ...

متن کامل

Comparing k-means clusters on parallel Persian-English corpus

This paper compares clusters of aligned Persian and English texts obtained from k-means method. Text clustering has many applications in various fields of natural language processing. So far, much English documents clustering research has been accomplished. Now this question arises, are the results of them extendable to other languages? Since the goal of document clustering is grouping of docum...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006