Spam Source Clustering by Constructing Spammer Network with Correlation Measure
نویسندگان
چکیده
Spam filtering is one of the most challenging problems in electric message systems. In general, recent studies on specifying real spam source are based on content filtering because spammers usually falsify their origin. We propose a method to specify spam source based on structural analysis with complex network. We assume that each spam sources either has the same victim list or uses the same spam-hosting program. We treat spam source target relationship as a bipartite network and construct weighted spam source network by network projection using correlation measure. We find that community clustering methods are inappropriate with spammer network. We group spammers with gradient-based grouping, which uses correlations between nodes as gradient between nodes. We convert them into local minima, which helps to cluster spammers into a few spam source groups. We investigate the weblog spam data with the proposed method and validate it. The method that we propose can be applied to diverse categorization problems, such as multiple text categorization and network subunit clustering.
منابع مشابه
Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining
This paper attempts to develop an algorithm to recognize spam domains using data mining techniques with the focus on law enforcement forensic analysis. Spam filtering has been the major weapon against spam, but failed to reduce the number of spam emails sent to an indiscriminate set of recipients. The proposed algorithm accepts as input, spam mails of personal account and extracts features such...
متن کاملIntroducing Social Trust to Collaborative Spam Mitigation
We propose SocialFilter, a trust-aware collaborative spam mitigation system. SocialFilter enables nodes with no email classification functionality to query the network on whether a host is a spammer. It employs Sybil-resilient trust inference to weigh the reports concerning spamming hosts that collaborating spam-detecting nodes (reporters) submit to the system. It weighs the spam reports accord...
متن کاملMachine Learning Approaches for Modeling Spammer Behavior
Spam is commonly known as unsolicited or unwanted email messages in the Internet causing potential threat to Internet Security. Users spend a valuable amount of time deleting spam emails. More importantly, ever increasing spam emails occupy server storage space and consume network bandwidth. Keyword-based spam email filtering strategies will eventually be less successful to model spammer behavi...
متن کاملSocialSpamGuard: A Data Mining-Based Spam Detection System for Social Media Networks
We have entered the era of social media networks represented by Facebook, Twitter, YouTube and Flickr. Internet users now spend more time on social networks than search engines. Business entities or public figures set up social networking pages to enhance direct interactions with online users. Social media systems heavily depend on users for content contribution and sharing. Information is spre...
متن کاملClustering Spam Domains and Destination Websites: Digital Forensics with Data Mining
Spam related cyber crimes have become a serious threat to society. Current spam research mainly aims to detect spam more effectively. We believe the identification and disruption of the supporting infrastructure used by spammers is a more effective way of stopping spam than filtering. The termination of spam hosts will greatly reduce the profit a spammer can generate and thwart his ability to s...
متن کامل