Fighting Web Spam
نویسندگان
چکیده
High ranking of a Web site in search engines can be directly correlated to high revenues. This amplifies the phenomenon of Web spamming which can be defined as preparing or manipulating any features of Web documents or hosts to mislead search engines’ ranking algorithms to gain an undeservedly high position in search results. Web spam remarkably deteriorates the information quality available on the Web and thus affects the whole Web community including search engines. The struggle between search engines and spammers is ongoing: both sides apply increasingly sophisticated techniques and counter-techniques against each other. In this paper, we first present a general background concerning the Web spam phenomenon. We then explain why the machine learning approach is so attractive for Web spam combating. Finally, we provide results of our experiments aiming at verification of certain open questions. We investigate the quality of data provided as the Web Spam Reference Corpus, widely used by the research community as a benchmark, and propose some improvements. We also try to address the question concerning parameter tuning for cost-sensitive classifiers and we delve into the possibility of using linguistic features for distinguishing spam from non-spam.
منابع مشابه
Using Spam Farm to Boost PageRank
Today people have become more and more dependent on search engines such as Google, Yahoo, and MSN, etc., for their information needs. Web spamming has emerged to take the economic advantage of high search rankings and threatened the accuracy and fairness of those rankings. Understanding spamming techniques is essential for evaluating the strength and weakness of a ranking algorithm, and for fig...
متن کاملLink-Based Similarity Search to Fight Web Spam
We investigate the usability of similarity search in fighting Web spam based on the assumption that an unknown spam page is more similar to certain known spam pages than to honest pages. In order to be successful, search engine spam never appears in isolation: we observe link farms and alliances for the sole purpose of search engine ranking manipulation. The artificial nature and strong inside ...
متن کاملUniversity of California Riverside Fighting Spam, Phishing and Email Fraud
OF THE THESIS Fighting Spam, Phishing and Email Fraud
متن کاملAnti-Trust Rank: Fighting Web Spam
The Web is both an excellent medium for sharing information as well as an attractive platform for delivering products and services. This platform is, to some extent, mediated by search engines in order to meet the needs of users seeking information. Search engines are the “dragons” that keep a valuable treasure: information [8]. Given the vast amount of information available on the Web, it is c...
متن کاملFighting WebSpam: Detecting Spam on the Graph Via Content and Link Features
We address a novel semi-supervised learning strategy for Web Spam issue. The proposed approach explores graph construction which is the key of representing data semantical relationship, and emphasizes on label propagation from multi views under consistency criterion. Furthermore, we infer labels for the rest of the unlabeled nodes in fusing spectral space. Experiments on the Webspam Challenging...
متن کامل