A New Supervised Term Ranking Method for Text Categorization

Authors

  • Musa A. Mammadov
  • John Yearwood
  • Lei Zhao
Abstract

In text categorization, supervised term weighting methods such as Information Gain, the χ² statistic, and Odds Ratio have been applied to improve classification performance by weighting terms with respect to each category. For multi-class text categorization, the literature offers three term ranking methods that summarize a term's per-category weights into a single score: Summation, Average, and Maximum. In this paper we present a new term ranking method for summarizing term weights, Maximum Gap. Using two term weighting methods, Information Gain and the χ² statistic, we set up controlled experiments comparing the term ranking methods. The Reuters-21578 text corpus is used as the dataset. Two popular classification algorithms, SVM and BoosTexter, are adopted to evaluate the performance of the different term ranking methods. Experimental results show that the new term ranking method performs better.
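The three existing ranking methods named in the abstract can be illustrated with a minimal sketch. It assumes each term already has one weight per category (for example from Information Gain or χ²); the function name `rank_terms`, the toy weights, and the use of an unweighted mean for Average are illustrative assumptions, not taken from the paper. Maximum Gap is left out because the abstract does not define it.

```python
from typing import Callable, Dict, List

# Each reducer collapses a term's per-category weights into one score.
# "average" is an unweighted mean here (an assumption; the paper may
# weight categories differently).
REDUCERS: Dict[str, Callable[[List[float]], float]] = {
    "summation": sum,
    "average": lambda ws: sum(ws) / len(ws),
    "maximum": max,
}


def rank_terms(weights: Dict[str, List[float]], method: str) -> List[str]:
    """Return terms sorted from highest to lowest summarized score."""
    combine = REDUCERS[method]
    scores = {term: combine(ws) for term, ws in weights.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical weights for three terms over three categories.
weights = {
    "wheat": [0.90, 0.05, 0.05],  # strongly tied to one category
    "price": [0.40, 0.40, 0.40],  # moderately tied to every category
    "the": [0.01, 0.01, 0.01],    # uninformative everywhere
}

for method in REDUCERS:
    print(method, rank_terms(weights, method))
```

On this toy data Summation and Average place "price" first, while Maximum places "wheat" first, which shows how the choice of summary can change which terms are kept for classification.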


Similar resources

Learning with Unlabeled Data for Text Categorization Using a Bootstrapping and a Feature Projection Technique

A wide range of supervised learning algorithms has been applied to Text Categorization. However, the supervised learning approaches have some problems. One of them is that they require a large, often prohibitive, number of labeled training documents for accurate learning. Generally, acquiring class labels for training data is costly, while gathering a large quantity of unlabeled data is cheap. ...


Proposing a New Term Weighting Scheme for Text Categorization

In text categorization, term weighting methods assign appropriate weights to terms to improve classification performance. In this study, we propose an effective term weighting scheme, tf.rf, and investigate several widely used unsupervised and supervised term weighting methods on two popular data collections in combination with SVM and kNN algorithms. From our controlled experimen...


Does a New Simple Gaussian Weighting Approach Perform Well in Text Categorization?

A new approach to the Text Categorization problem is presented here. It is called Gaussian Weighting, and it is a supervised learning algorithm that, during the training phase, estimates two very simple and easily computable statistics: the Presence P, how much a term t is present in a category c; and the Expressiveness E, how much t is present outside c in the rest of the domain. Once the...


Probabilistic Supervised Term Weighting for Binary Text Categorization

In text categorization, the class-agnostic (unsupervised) tf×idf term weighting scheme has seen widespread usage. Recently proposed supervised term weighting methods, including tf×rf and tf×δidf, make use of term class distribution to improve classification accuracy. However, they only account for the presence of terms in classes, ignoring the absence of key categorical terms, which may giv...


Research on Text Categorization Based on a Weakly-Supervised Transfer Learning Method

This paper presents a text categorization method based on weakly supervised transfer learning, which does not need new training documents to be tagged when facing classification tasks in a new area. Instead, we can make use of already tagged documents from other domains to accomplish the automatic categorization task. By extracting linguistic information such as part-of-speech, semantic, co-occurrence o...



Publication date: 2010