Combined Heuristic Technique for Optimization of Bloom Filter in Spam Filtering

نویسنده

  • Arulanand Natarajan
چکیده

Problem statement: Spam is an irrelevant or inappropriate message sent on the internet to a large number of newsgroups or users. A spam word is a list of well-known words that often appear in spam mails. Bloom Filter (BF) is used for identification of spam word. Approach: BF is a simple but powerful data structure that can check membership to a static set. The trade-off to use BF is a certain configurable risk of false positives. The odds of a false positive can be made very low if the hash bitmap is sufficiently large. Bin Bloom Filter (BBF) has number of BFs which assign group of words into bins with different false positive rates based on weight of the spam words. Genetic Algorithm (GA) was employed to minimize the total membership invalidation cost of BBF. GA had premature convergence problem. Simulated Annealing (SA) was incorporated with GA to prevent the premature convergence effectively. Results: The experimental results of total membership invalidation cost are analyzed for various sizes of bins. The results showed that the combined GA-SA model outperforms SA and GA model. Conclusion: GA has premature convergence due to its genetic operators that are not able to generate offsprings which are superior to the parents. So more number of similar chromosomes presented on the population. When GA is incorporated with SA new genes were introduced which causes diversity in the population and prevents premature convergence. The combined GA-SA outperforms GA and SA.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Wirebrush4SPAM: a novel framework for improving efficiency on spam filtering services

This paper introduces Wirebrush4SPAM, a plug-in-based C framework specifically designed for the development of fast spam filters by assembling different antispam schemes and techniques. Wirebrush4SPAM can be used to (i) build, execute and deploy simple spam filters and (ii) develop new techniques that can be easily combined and tested to achieve more accurate antispam models. To construct custo...

متن کامل

Artificial Immune System for Bloom filter Optimization

Bloom filter is a probabilistic and space efficient data structure designed to check the membership of an element in a set. The trade-off to use Bloom filter may have configurable risk of false positives. The percentages of a false positive can be made low if the hash bit map is sufficiently massive. Spam is an unsolicited or irrelevant message sent on the internet to an outsized range of users...

متن کامل

Enhancing Signature-based Collaborative Spam Detection with Bloom Filters

Signature-based collaborative spam detection(SCSD) systems provide a promising solution addressing many problems facing statistical spam filters, the most widely adopted technology for detecting junk emails. In particular, some SCSD systems can identify previously unseen spam messages as such, although intuitively this would appear to be impossible. However, the SCSD approach usually relies on ...

متن کامل

Active Power Filter Design by a Novel Approach of Multi-Objective Optimization

This paper presents an innovative active power filter design method to simultaneously compensate the current harmonics and reactive power of a nonlinear load. The power filter integrates a passive power filter which is a RL low-pass filter placed in series with the load, and an active power filter which comprises an RL in series with an IGBT based voltage source converter. The filter is assumed...

متن کامل

Payload Inspection Using Parallel Bloom Filter in Dual Core Processor

This paper presents payload inspection for identification of spam files using bloom filter in dual core processor. Spam files flood the Internet in an attempt to dump the messages on recipients who do not intend to receive it. Spam costs the sender very little to send and most of the costs are levied to the recipients or the carriers. The proposed system identifies and filters the incoming spam...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011