Spam Campaign Cluster Detection Using Redirected URLs and Randomized Sub-Domains
نویسندگان
چکیده
A substantial majority of the email sent everyday is spam. Spam emails cause many problems if someone acts or clicks on the link provided in the email body. The problems may include infecting users personal machine with malware, stealing personal information, capturing credit card information, etc. Since spam emails are generated as a part of a very limited numbers of spam campaigns, it is useful to cluster spam messages into campaigns, so as to identify which campaigns are the largest. This enables investigation to focus this attention on the largest as the most significant clusters. In this paper, we present a method to cluster spam emails into spam campaigns. In our approach, the redirected URL has been chosen as the primary field for cluster formation. Our study shows that, a huge number of URLs arriving in spam email eventually points to a much smaller set of redirected URLs. Our multilevel clustering method grouped 90% of our half million spam emails into 4 spam campaigns. In addition to redirected URLs, we also use randomized sub domains, which come as a given URL in email body, for campaign identification. We believe that our model can be applied in real time to quickly detect major campaign.
منابع مشابه
Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification
Nowadays, malicious URLs are the common threat to the businesses, social networks, net-banking etc. Existing approaches have focused on binary detection i.e. either the URL is malicious or benign. Very few literature is found which focused on the detection of malicious URLs and their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...
متن کاملRecognizing Spam Domains by Extracting Features from Spam Emails using Data Mining
This paper attempts to develop an algorithm to recognize spam domains using data mining techniques with the focus on law enforcement forensic analysis. Spam filtering has been the major weapon against spam, but failed to reduce the number of spam emails sent to an indiscriminate set of recipients. The proposed algorithm accepts as input, spam mails of personal account and extracts features such...
متن کاملCUD: Crowdsourcing for URL Spam Detection
The prevalence of spam URLs in Internet services, such as email, social networks, blogs and online forums has become a serious problem. These spam URLs host spam advertisements, phishing attempts, and malwares, which are harmful for normal users. Existing URL blacklist approaches offer limited protection. Although recent machine learning based URL classification approaches demonstrate good accu...
متن کاملA link graph-based approach to identify forum spam
Web spammers have taken note of the popularity of public forums such as blogs, wikis, webboards, and guestbooks. They are now exploiting them with the purpose of driving traffic to their malicious or fraudulent websites, such as those used for phishing, distributing malware, or selling counterfeit pharmaceuticals. A popular technique they use is to spam these forums with URLs to their spam webs...
متن کاملWarningBird: Detecting Suspicious URLs in Twitter Stream
Twitter can suffer from malicious tweets containing suspicious URLs for spam, phishing, and malware distribution. Previous Twitter spam detection schemes have used account features such as the ratio of tweets containing URLs and the account creation date, or relation features in the Twitter graph. Malicious users, however, can easily fabricate account features. Moreover, extracting relation fea...
متن کامل