A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

نویسندگان

  • Saman Khalandi Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmai, Iran.
چکیده مقاله:

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, forming feature vectors, and final classification. In the presented model, the authors formed a feature vector for each document by means of weighting features use for IWO. Then, documents are trained with NB classifier; then using the test, similar documents are classified together. FS do increase accuracy and decrease the calculation time. IWO-NB was performed on the datasets Reuters-21578, WebKb, and Cade 12. In order to demonstrate the superiority of the proposed model in the FS, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) have been used as comparison models. Results show that in FS the proposed model has a higher accuracy than NB and other models. In addition, comparing the proposed model with and without FS suggests that error rate has decreased.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Transferring Naive Bayes Classifiers for Text Classification

A basic assumption in traditional machine learning is that the training and test data distributions should be identical. This assumption may not hold in many situations in practice, but we may be forced to rely on a different-distribution data to learn a prediction model. For example, this may be the case when it is expensive to label the data in a domain of interest, although in a related but ...

متن کامل

Naïve Bayes Classifier with Various Smoothing Techniques for Text Documents

Due to huge amount of increase in text data, its classification has become an important issue, now days. There are many good classification techniques discussed in this paper. Each classification method has its own assumptions, advantages and limitations. One of the most widely used classifier is Naïve Bayes which performs well with different data sets. Various Smoothing techniques are applied ...

متن کامل

Text Classification using Association Rule with a Hybrid Concept of Naive Bayes Classifier and Genetic Algorithm

Text classification is the automated assignment of natural language texts to predefined categories based on their content. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Now a day the de...

متن کامل

Poisson naive Bayes for text classification with feature weighting

In this paper, we investigate the use of multivariate Poisson model and feature weighting to learn naive Bayes text classifier. Our new naive Bayes text classification model assumes that a document is generated by a multivariate Poisson model while the previous works consider a document as a vector of binary term features based on the presence or absence of each term. We also explore the use of...

متن کامل

Naive Bayes for Text Classification with Unbalanced Classes

Abstract. Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that predictive performance can be improved further by appropriate data transformations [1, 2]. In this paper we present another transformation that is designed to combat a potential problem with ...

متن کامل

منابع من

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}


عنوان ژورنال

دوره 4  شماره 3

صفحات  167- 184

تاریخ انتشار 2018-08-01

با دنبال کردن یک ژورنال هنگامی که شماره جدید این ژورنال منتشر می شود به شما از طریق ایمیل اطلاع داده می شود.

میزبانی شده توسط پلتفرم ابری doprax.com

copyright © 2015-2023