Fuzzy semi-supervised co-clustering for text documents
Authors
Abstract
In this paper we propose a new heuristic semi-supervised fuzzy co-clustering algorithm (SS-HFCR) for categorization of large web documents. In this approach, the clustering process is carried out by incorporating prior knowledge, in the form of pair-wise constraints provided by users, into the fuzzy co-clustering framework. Each constraint specifies whether a pair of documents “must” or “cannot” be clustered together. Moreover, we formulate a competitive agglomeration cost function that is also able to make use of prior knowledge in the clustering process. Experimental studies on a number of large benchmark datasets demonstrate the strength and potential of SS-HFCR in terms of accuracy, stability and efficiency, compared with some recent popular semi-supervised clustering approaches. © 2012 Elsevier B.V. All rights reserved.
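As a rough illustration of how "must" and "cannot" constraints can interact with fuzzy memberships, the sketch below scores how strongly a membership matrix violates a set of pairwise constraints. It is not the SS-HFCR cost function, which the abstract does not spell out. It is one common penalty form, and the names `constraint_penalty`, `memberships`, `must_link` and `cannot_link` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def constraint_penalty(
    memberships: Sequence[Sequence[float]],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> float:
    """Illustrative penalty for violated pairwise constraints (hypothetical helper).

    memberships[i][c] is the fuzzy membership of document i in cluster c.
    """

    def overlap(i: int, j: int) -> float:
        # Membership mass that documents i and j place on the same clusters.
        return sum(a * b for a, b in zip(memberships[i], memberships[j]))

    penalty = 0.0
    for i, j in must_link:
        # Mass placed on *different* clusters: sum over c != d of u_ic * u_jd.
        total = sum(memberships[i]) * sum(memberships[j])
        penalty += total - overlap(i, j)
    for i, j in cannot_link:
        # Mass placed on the *same* cluster: sum over c of u_ic * u_jc.
        penalty += overlap(i, j)
    return penalty


# Three documents, two clusters; the membership values are made up.
U: List[List[float]] = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
print(constraint_penalty(U, must_link=[(0, 1)], cannot_link=[(0, 2)]))
```

A semi-supervised objective would typically add a weighted term of this kind to the unsupervised clustering cost, so that memberships agreeing with the user-provided constraints are preferred. The weighting is a further design choice not covered here.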
Similar resources
Text Categorization using the Semi-Supervised Fuzzy c-Means Algorithm
Text Categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. Over the past few years, TC has become very important, especially in the Information Retrieval area, where information needs have increased tremendously with the rapid growth of textual information sources such as the Internet. In this paper, we compare, for text categoriz...
Document Clustering Based On Semi-Supervised Term Clustering
This study proposes a multi-step feature (term) selection process and, in a semi-supervised fashion, provides initial centers for term clusters. The fuzzy c-means (FCM) clustering algorithm is then used to cluster the terms. Finally, each document is assigned to the closest associated term clusters. While most text clustering algorithms directly use documents for clustering, we propose to f...
A Genetic Semi-supervised Fuzzy Clustering Approach to Text Classification
A genetic semi-supervised fuzzy clustering algorithm is proposed, which can learn a text classifier from labeled and unlabeled documents. Labeled documents are used to guide the evolution process of each chromosome, which is a fuzzy partition on unlabeled documents. The fitness of each chromosome is evaluated with a combination of fuzzy within-cluster variance of unlabeled documents and misclassifi...
Semi-Supervised Learning for Web Text Clustering
Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, for many text classification tasks, providing labeled training documents is expensive, while unlabeled documents are readily available in large quantities. Learning from both labeled and unlabeled documents in a semi-supervised framework is a promising approach to reduc...
A Semi-supervised Text Clustering Algorithm Based on Pairwise Constraints
In this paper, an active learning method that can effectively select pairwise constraints during the clustering procedure is presented. A novel semi-supervised text clustering algorithm is proposed, which employs an effective pairwise constraint selection method. As the samples on the fuzzy boundary are far away from the cluster center in the clustering procedure, they can be easily divided in...
Journal title:
- Fuzzy Sets and Systems
Volume 215 Issue
Pages -
Publication date 2013