A Comparative Impact Study of Attribute Selection Techniques on Naïve Bayes Spam Filters
Abstract
The main problem of the Internet e-mail service is the massive delivery of spam messages. Every day, Internet users receive hundreds of unwanted and unhelpful messages that flood their mailboxes. Fortunately, different kinds of filters are now able to identify and automatically delete most of these messages. To reduce the dimensionality of the problem, feature selection techniques are used to select only the representative attributes of each e-mail. This work compares five well-known feature selection strategies applied in conjunction with four different types of Naïve Bayes classifiers. The experimental results show that choosing an appropriate feature selection technique is important for obtaining accurate results.
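The abstract does not name the five feature selection strategies or the four Naïve Bayes variants, so the following is only a minimal sketch of the general pipeline under stated assumptions: information gain is assumed as the attribute selector and a multinomial Naïve Bayes with Laplace smoothing as the classifier. The toy corpus, the value of `k`, and all function and class names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, term):
    """Information gain of the binary attribute 'term occurs in the e-mail'."""
    labels = [y for _, y in docs]
    with_term = [y for toks, y in docs if term in toks]
    without_term = [y for toks, y in docs if term not in toks]
    n = len(docs)
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def select_features(docs, k):
    """Keep the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = sorted({t for toks, _ in docs for t in toks})
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, t), t))
    return set(ranked[:k])


class MultinomialNaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one smoothing over a fixed vocabulary."""

    def fit(self, docs, vocab):
        self.vocab = vocab
        self.classes = sorted({y for _, y in docs})
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            class_docs = [toks for toks, y in docs if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Count only the selected attributes; all other tokens are ignored.
            counts = Counter(t for toks in class_docs for t in toks if t in vocab)
            total = sum(counts.values())
            self.log_likelihood[c] = {
                t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab
            }
        return self

    def predict(self, tokens):
        def score(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in tokens if t in self.vocab
            )
        return max(self.classes, key=score)


# Hypothetical toy corpus: label 1 = spam, 0 = legitimate (ham).
train = [
    ("win free money now".split(), 1),
    ("free offer click now".split(), 1),
    ("meeting agenda for monday".split(), 0),
    ("lunch on monday with the team".split(), 0),
]
vocab = select_features(train, k=3)  # {'free', 'monday', 'now'}
model = MultinomialNaiveBayes().fit(train, vocab)
print(model.predict("free money offer".split()))     # → 1
print(model.predict("team meeting monday".split()))  # → 0
```

Swapping `select_features` for another ranking function while keeping the classifier fixed (or vice versa) is the kind of pairing the study compares; the sketch only illustrates the mechanics, not the paper's experimental setup.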
Similar Resources
Spam Detection System Combining Cellular Automata and Naïve Bayes Classifier
In this study, we focus on the problem of spam detection. Based on a cellular automaton approach and the naïve Bayes technique, which are built as individual classifiers, we evaluate a novel method that combines multiple classifiers, diversified both by feature selection and by the use of different classifiers, to determine whether spam can be detected more accurately. This approach combines decisions from three cellular ...
Naive Bayes Spam Filtering Using Word Position Attributes
This paper explores the use of the naive Bayes classifier as the basis for personalized spam filters. Various machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word position based attribute vectors gives very good results when tested on several publicly available corpora. The effect of various forms ...
Naive Bayes Spam Filtering Using Word-Position-Based Attributes
This paper explores the use of the naive Bayes classifier as the basis for personalised spam filters. Several machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word-position-based attribute vectors gave very good results when tested on several publicly available corpora. The effects of various forms o...
Naive Bayes spam filtering using word-position-based attributes and length-sensitive classification thresholds
This paper explores the use of the naive Bayes classifier as the basis for personalised spam filters. Several machine learning algorithms, including variants of naive Bayes, have previously been used for this purpose, but the author’s implementation using word-position-based attribute vectors gave very good results when tested on several publicly available corpora. The effects of various forms ...
Web Spam Detection Using Machine Learning in Specific Domain Features
In the last few years, as Internet usage has become the main artery of daily activities, the problem of spam has become very serious for the Internet community. Spam pages pose a real threat to all types of users, and this threat has evolved continuously with no sign of abating. Different forms of spam have seen a dramatic increase in both size and negative impact. A large amount of E-mail...