A Novel Term Weighting Scheme MIDF for Text Categorization

Author

  • C. DEISY
Abstract

Text categorization is the task of automatically assigning documents to a set of predefined categories. It usually involves a document representation method and a term weighting scheme. This paper proposes a new term weighting scheme, Modified Inverse Document Frequency (MIDF), to improve the performance of text categorization. Documents represented with MIDF weights are used to train a support vector machine classifier with a radial basis function kernel. The experiments are carried out on the Reuters-21578 corpus. Performance is measured with the F1-measure and a cost measure. In these experiments, the proposed scheme performs better than the existing term weighting schemes.
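The abstract does not give the MIDF formula, so it is not reproduced here. As a point of comparison, the sketch below shows the classic tf×idf weighting that such schemes are usually measured against, together with the F1-measure for a single category. The function names `tf_idf_weights` and `f1_measure` and the toy documents are assumptions made for illustration; nothing here is taken from the paper itself, and the SVM training step is omitted.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the classic weight tf(t, d) * log(N / df(t)). This is the
    baseline scheme, NOT the MIDF formula, which the abstract omits.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weighted


def f1_measure(true_labels, predicted_labels, positive=1):
    """F1 = 2PR / (P + R), treating one category as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus, already tokenized.
toy_docs = [
    ["wheat", "price", "rise"],
    ["oil", "price", "fall"],
    ["wheat", "harvest", "wheat"],
]
weights = tf_idf_weights(toy_docs)
```

In this toy corpus a term that occurs in only one document, such as "oil", gets a higher idf factor than "price", which occurs in two. The resulting weight dictionaries could be converted to vectors and passed to any SVM implementation with an RBF kernel.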


Similar resources

Probabilistic Supervised Term Weighting for Binary Text Categorization

In text categorization, the class-agnostic (unsupervised) tf×idf term weighting scheme has seen widespread usage. Recently proposed supervised term weighting methods, including tf×rf and tf×δidf, make use of term class distribution to improve the classification accuracy. However, they only account for the presence of terms in classes, ignoring the absence of key categorical terms, which may giv...


Inverse Category Frequency based supervised term weighting scheme for text categorization

Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifiers, and SVMs. The widely used term weighting scheme in text categorization, tf.idf, originated in the information retrieval (IR) field. The intuition behind idf seems less reasonable for text categorization than for IR. In this paper, we introduce inverse category frequency (icf) int...


Inverse-Category-Frequency based Supervised Term Weighting Schemes for Text Categorization

Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifiers, and SVMs. The widely used term weighting scheme in text categorization, tf.idf, originated in the information retrieval (IR) field. The intuition behind idf seems less reasonable for text categorization than for IR. In this paper, we introduce inverse category frequency (icf) int...


The Role of Rare Terms in Enhancing the Performance of Polynomial Networks Based Text Categorization

In this paper, the role of rare or infrequent terms in enhancing the accuracy of English text categorization using Polynomial Networks (PNs) is investigated. To study the impact of rare terms on the accuracy of PN-based text categorization, different term reduction criteria and different term weighting schemes were tested on the Reuters corpus using PNs. Each term weight...


Proposing a New Term Weighting Scheme for Text Categorization

In text categorization, term weighting methods assign appropriate weights to terms to improve classification performance. In this study, we propose an effective term weighting scheme, tf.rf, and investigate several widely used unsupervised and supervised term weighting methods on two popular data collections in combination with SVM and kNN algorithms. From our controlled experimen...




Publication date: 2010