An effective refinement strategy for KNN text classifier

نویسنده

  • Songbo Tan
چکیده

Due to the exponential growth of documents on the Internet and the emergent need to organize them, the automated categorization of documents into predefined labels has received an ever-increased attention in the recent years. A wide range of supervised learning algorithms has been introduced to deal with text classification. Among all these classifiers, K-Nearest Neighbors (KNN) is a widely used classifier in text categorization community because of its simplicity and efficiency. However, KNN still suffers from inductive biases or model misfits that result from its assumptions, such as the presumption that training data are evenly distributed among all categories. In this paper, we propose a new refinement strategy, which we called as DragPushing, for the KNN Classifier. The experiments on three benchmark evaluation collections show that DragPushing achieved a significant improvement on the performance of the KNN Classifier. q 2005 Elsevier Ltd. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An kNN Model-Based Approach and Its Application in Text Categorization

An investigation has been conducted on two well known similarity-based learning approaches to text categorization. This includes the k-nearest neighbor (kNN) classifier and the Rocchio classifier. After identifying the weakness and strength of each technique, we propose a new classifier called the kNN model-based classifier by unifying the strengths of k-NN and Rocchio classifier and adapting t...

متن کامل

Rough set based hybrid algorithm for text classification

Automatic classification of text documents, one of essential techniques for Web mining, has always been a hot topic due to the explosive growth of digital documents available on-line. In text classification community, k-nearest neighbor (kNN) is a simple and yet effective classifier. However, as being a lazy learning method without premodelling, kNN has a high cost to classify new documents whe...

متن کامل

Multiclass Boosting with Adaptive Group-Based kNN and Its Application in Text Categorization

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization face the challenges from methods based on support vector machine SVM , neural networks NN , naı̈ve Bayes, and k-nearest neighbor kNN . This paper uses a novel multi-class AdaBoost algorithm to avoid reducing the multi-class classification problem to multiple tw...

متن کامل

Deep Classifier for Large Scale Hierarchical Text Classification

In this competition, we refined a novel algorithm for classification on large scale documents with deep category structure based on a two-stage strategy known as the deep-classifier [1]. The basic idea of deep-classifier is to take advantage of the better performance of kNN relevance search on large scale categories and the higher precision of the Naive Bayes for multi-class classification. As ...

متن کامل

Using kNN Model-based Approach for Automatic Text Categorization

An investigation has been conducted on two well known similarity-based learning approaches to text categorization: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. After identifying the weakness and strength of each technique, a new classifier called the kNN model-based classifier (kNNModel) has been proposed. It combines the strength of both k-NN and Rocchio. A text categor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Expert Syst. Appl.

دوره 30  شماره 

صفحات  -

تاریخ انتشار 2006