AN EVALUATION OF FILTERING TECHNIQUES IN A NAÏVE BAYESIAN ANTI-SPAM FILTER by

نویسندگان

  • Vikas P. Deshpande
  • Robert F. Erbacher
  • Nicholas Flann
  • Hugo de Garis
چکیده

An efficient anti-spam filter that would block all unsolicited messages i.e. spam, without blocking any legitimate messages is a growing need. To address this problem, this report takes a statistically-based approach, employing a Bayesian anti-spam filter, because it is content-based and self-learning (adaptive) in nature. We train the filter, using a large corpus of legitimate messages and spam, and we test the filter using new incoming personal messages. We evaluate four effective filtering techniques available for a Bayesian filter for our purposes. We look at the effectiveness of the technique, and we evaluate its different configurations for different threshold values in order to find an optimal anti-spam filter configuration. Based on cost-sensitive measures, we conclude that additional safety precautions are needed for a Bayesian anti-spam filter to be put into practice.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Variable Thresholding In Naïve Bayesian Spam Filters

Email has become an essential means of communication for both business and personal use. However, the proliferation of unwanted email advertising or spam has cost organizations millions of dollars and has reduced the effectiveness of email as a communications medium. Recently, spam filters have been widely adopted as a means of combating these unwanted messages. This paper presents a method for...

متن کامل

An evaluation of Naive Bayesian anti-spam filtering

It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail (“spam”). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatization, and stop-lists on the filter’s performan...

متن کامل

Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

We investigate the performance of two machine learning algorithms in the context of antispam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an ...

متن کامل

BUPT at TREC 2006: Spam Track

This report summarizes our participation in the TREC 2006 spam track, in which we consider the use of Bayesian models for the spam filtering task. Firstly, our anti-spam filter, Kidult, is briefly introduced. And then we try to use weighted adjustment of separating hyperplane and selective classifiers ensemble to improve the filtering performance. Finally, we summarize the relevant results from...

متن کامل

Introduction of Fingerprint Vector based Bayesian Method for Spam Filtering

With the development of the diversification of spam, it raises the difficulties and challenges to content-based spam filtering. To address this problem, this paper firstly introduced the statistical features of Email headers, and then proposed a method to use these features to improve Bayesian anti-spam filter. The selected Email-header features are presented as the fingerprint vectors, and the...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004