Semi Supervised Image Spam Hunter: A Regularized Discriminant EM Approach
Abstract
Image spam is a new trend in email spam. New image spam employs a variety of image processing techniques to add random noise. In this paper, we propose a semi-supervised approach, the regularized discriminant EM algorithm (RDEM), to detect image spam emails. It leverages a small amount of labeled data and a large amount of unlabeled data to identify spam and train a classification model simultaneously. Compared with fully supervised learning algorithms, the semi-supervised learning algorithm is better suited to adversarial classification problems, because spammers actively protect their work by constantly making changes to circumvent spam detection. This makes it too costly for fully supervised learning to frequently collect sufficient labeled data for training. Experimental results demonstrate that our approach achieves a high detection rate of 91.66% with a false positive rate below 2.96%, while significantly reducing the labeling cost.
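The general idea of learning from a few labeled and many unlabeled examples can be sketched with a plain semi-supervised EM loop. The sketch below is not RDEM itself: it assumes two classes modeled as diagonal Gaussians and omits the discriminant projection and regularization that the abstract names. The function names `semi_supervised_em` and `predict` and the parameter `var_floor` are hypothetical, chosen only for illustration.

```python
import numpy as np

def _log_joint(X, pi, mu, var):
    # log p(x, class) under diagonal Gaussians; result has shape (n, 2).
    diff = X[:, None, :] - mu[None, :, :]
    ll = -0.5 * (np.log(2 * np.pi * var)[None] + diff ** 2 / var[None]).sum(axis=2)
    return ll + np.log(pi)[None]

def semi_supervised_em(X_lab, y_lab, X_unl, n_iter=50, var_floor=1e-3):
    """Generic two-class semi-supervised EM (an illustrative stand-in, not RDEM)."""
    X = np.vstack([X_lab, X_unl])
    n_lab = len(X_lab)
    # Responsibilities: fixed one-hot rows for labeled data, zero for unlabeled
    # data so that the first M-step is driven by the labeled examples only.
    R = np.zeros((len(X), 2))
    R[:n_lab] = np.eye(2)[y_lab]
    for _ in range(n_iter):
        # M-step: update class priors, means, and variances from responsibilities.
        Nk = R.sum(axis=0)
        pi = Nk / Nk.sum()
        mu = (R.T @ X) / Nk[:, None]
        var = (R.T @ X ** 2) / Nk[:, None] - mu ** 2 + var_floor
        # E-step: re-estimate soft labels for the unlabeled rows only.
        log_p = _log_joint(X[n_lab:], pi, mu, var)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(log_p)
        R[n_lab:] = P / P.sum(axis=1, keepdims=True)
    return pi, mu, var

def predict(X, pi, mu, var):
    # Assign each row to the class with the larger joint log-probability.
    return _log_joint(X, pi, mu, var).argmax(axis=1)
```

In this sketch the labeled rows keep their known labels throughout, while the unlabeled rows receive soft labels that are refined at each iteration, so the classifier is trained and the unlabeled data are classified at the same time.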
Similar resources
Towards Self-Exploring Discriminating Features
Many visual learning tasks are usually confronted by some common difficulties. One of them is the lack of supervised information, due to the fact that labeling could be tedious, expensive or even impossible. Such scenario makes it challenging to learn object concepts from images. This problem could be alleviated by taking a hybrid of labeled and unlabeled training data for learning. Since the unl...
Towards Self-Exploring Discriminating Features for Visual Learning
Many visual learning tasks are usually confronted by some common difficulties. One of them is the lack of supervised information, due to the fact that labeling could be tedious, expensive or even impossible. Another difficulty is the high dimensionality of the visual data. Fortunately, these difficulties could be alleviated by using a hybrid of labeled and unlabeled training data for learning. ...
Semi-supervised logistic discrimination via regularized Gaussian basis expansions
The problem of constructing classification methods based on both classified and unclassified data sets is considered for analyzing data with complex structures. We introduce a semi-supervised logistic discriminant model with Gaussian basis expansions. Unknown parameters included in the logistic model are estimated by regularization method along with the technique of EM algorithm. For selection ...
Self-Supervised Learning for Object Recognition based on Kernel Discriminant-EM Algorithm
In Proc. of IEEE Int’l Conf. on Computer Vision, Vancouver, Canada, 2001. It is often tedious and expensive to label large training data sets for learning-based object recognition systems. This problem could be alleviated by self-supervised learning techniques, which take a hybrid of labeled and unlabeled training data to learn classifiers. Discriminant-EM (D-EM) proposed a framework for such tas...
Training SpamAssassin with Active Semi-supervised Learning
Most spam filters include some automatic pattern classifiers based on machine learning and pattern recognition techniques. Such classifiers often require a large training set of labeled emails to attain a good discriminant capability between spam and legitimate emails. In addition, they must be frequently updated because of the changes introduced by spammers to their emails to evade spam filter...