Simple-Random-Sampling-Based Multiclass Text Classification Algorithm

Authors

  • Wuying Liu
  • Lin Wang
  • Mianzhu Yi
Abstract

Multiclass text classification (MTC) is a challenging problem, and MTC algorithms are used in many applications. In the era of big data, the space-time overhead of these algorithms is a major concern. By investigating the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm achieves state-of-the-art performance with greatly reduced space-time requirements.
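
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' SRSMTC algorithm. It assumes the token-level memory can be modeled as an inverted index from tokens to per-label counts, that simple random sampling is applied to the distinct tokens of each document, and that classification is a retrieval-style vote over the sampled tokens. The names `TokenMemory`, `sample_rate`, and `seed` are hypothetical.

```python
import random
from collections import Counter, defaultdict


class TokenMemory:
    """Hypothetical token-level memory: token -> per-label document counts."""

    def __init__(self, sample_rate=0.5, seed=0):
        self.sample_rate = sample_rate      # assumed fraction of distinct tokens kept
        self.rng = random.Random(seed)      # seeded for reproducible sampling
        self.index = defaultdict(Counter)   # inverted index over sampled tokens

    def _sample(self, tokens):
        # Simple random sampling without replacement over the distinct tokens.
        unique = sorted(set(tokens))
        k = max(1, round(len(unique) * self.sample_rate)) if unique else 0
        return self.rng.sample(unique, k)

    def add(self, tokens, label):
        # Store only the sampled tokens of a labeled document.
        for tok in self._sample(tokens):
            self.index[tok][label] += 1

    def classify(self, tokens):
        # Retrieval-style scoring: each sampled query token votes for the
        # labels it was stored under; the label with the most votes wins.
        votes = Counter()
        for tok in self._sample(tokens):
            votes.update(self.index.get(tok, {}))
        return votes.most_common(1)[0][0] if votes else None
```

Lowering `sample_rate` shrinks both the stored index and the number of lookups per query, which is the kind of space-time trade-off the abstract refers to; how the paper actually chooses the sample and scores candidates is not stated here.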

Similar resources

Imbalanced Multiclass Data Classification Using Ant Colony Optimization Algorithm

Class imbalance problems have drawn increasing interest lately because of the classification difficulties caused by imbalanced class distributions and the poor prediction performance for minority classes. This problem is particularly common in practice and arises in various disciplines, including fraud detection, anomaly detection, oil spillage detection, medical diagnosis, and facial recognition. Man...

Boostexter: a System for Multiclass Multi-label Text Categorization

This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. We first show how to extend the standard notion of classification by allowing each instance to be associated with multiple labels. We then discuss our approach for multiclass multi-label text categorization which is based on a new and improved family of boosting algorithms. We desc...

Multiclass Boosting with Adaptive Group-Based kNN and Its Application in Text Categorization

AdaBoost is an excellent committee-based tool for classification. However, its effectiveness and efficiency in multiclass categorization face challenges from methods based on support vector machines (SVM), neural networks (NN), naïve Bayes, and k-nearest neighbor (kNN). This paper uses a novel multi-class AdaBoost algorithm to avoid reducing the multi-class classification problem to multiple tw...

Using Multidimensional ADTPE and SVM for Optical Modulation Real-Time Recognition

Based on feature extraction with multidimensional asynchronous delay-tap plot entropy (ADTPE) and multiclass classification with a support vector machine (SVM), we propose a method for recognizing multiple optical modulation formats and various data rates. We first present the multidimensional ADTPE algorithm, which is extracted from asynchronous delay sampling pairs of modulated optical ...

A Novel Multiclass Text Classification Algorithm Based on Multiconlitron

A novel multiclass text classification algorithm based on multiconlitron is proposed. A multiconlitron is constructed for each possible pair of classes in the sample space, each of which is used to separate two classes. For the sample to be classified, every multiconlitron is used to judge its class and vote for the corresponding class. The final class of the sample is determined by the number of ...

Journal title:

Volume 2014, Issue 

Pages  -

Publication date: 2014