Detecting Singleton Review Spammers Using Semantic Similarity

Authors

  • Vlad Sandulescu
  • Martin Ester
Abstract

Online reviews have become an increasingly important resource for consumers making purchases. However, it is becoming more and more difficult for people to make well-informed buying decisions without being deceived by fake reviews. Prior work on the opinion spam problem has mostly classified fake reviews using behavioral user patterns. These approaches focused on prolific users who write more than a couple of reviews, discarding one-time reviewers. The number of singleton reviewers, however, is expected to be high for many review websites. While behavioral patterns are effective for elite users, for one-time reviewers the review text itself needs to be exploited. In this paper we tackle the problem of detecting fake reviews written by the same person under multiple names, with each review posted under a different name. We propose two methods to detect similar reviews and show that they generally outperform the vectorial similarity measures used in prior work. The first method extends the semantic similarity between words to the review level. The second method is based on topic modeling and exploits the similarity of the reviews' topic distributions using two models: bag-of-words and bag-of-opinion-phrases. The experiments were conducted on reviews from three different datasets: Yelp (57K reviews), Trustpilot (9K reviews) and the Ott dataset (800 reviews).
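The three kinds of similarity the abstract contrasts can be illustrated with a minimal sketch, assuming tokenized reviews and precomputed topic distributions. The toy word vectors in `WORD_VECS`, the function names (`bow_cosine`, `review_sim`, `js_similarity`), the best-match averaging used to lift word similarity to the review level, and the use of Jensen-Shannon divergence to compare topic distributions are all illustrative assumptions, not necessarily the authors' exact formulation. The topic distributions are assumed to come from a separately trained topic model, which is not shown.

```python
import math

# Hypothetical toy word vectors (assumption): a stand-in for whatever
# word-level semantic similarity resource is actually used.
WORD_VECS = {
    "great": (0.9, 0.1, 0.0),
    "excellent": (0.85, 0.15, 0.05),
    "food": (0.1, 0.9, 0.0),
    "meal": (0.15, 0.85, 0.1),
    "service": (0.0, 0.2, 0.9),
    "staff": (0.05, 0.25, 0.85),
}


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bow_cosine(tokens_a, tokens_b):
    """Vectorial baseline: cosine similarity over raw term counts."""
    vocab = sorted(set(tokens_a) | set(tokens_b))
    return cosine([tokens_a.count(w) for w in vocab],
                  [tokens_b.count(w) for w in vocab])


def word_sim(w1, w2):
    """Word-level similarity: 1.0 for identical words, else vector cosine (0.0 if unknown)."""
    if w1 == w2:
        return 1.0
    if w1 in WORD_VECS and w2 in WORD_VECS:
        return cosine(WORD_VECS[w1], WORD_VECS[w2])
    return 0.0


def directional_sim(src, tgt):
    """Mean over words in src of each word's best-matching word in tgt."""
    if not src or not tgt:
        return 0.0
    return sum(max(word_sim(w, t) for t in tgt) for w in src) / len(src)


def review_sim(tokens_a, tokens_b):
    """Review-level semantic similarity: average of both matching directions."""
    return 0.5 * (directional_sim(tokens_a, tokens_b) +
                  directional_sim(tokens_b, tokens_a))


def js_similarity(p, q):
    """1 minus the base-2 Jensen-Shannon divergence of two topic distributions (in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute 0 by convention; y_i > 0 whenever x_i > 0 here.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))


if __name__ == "__main__":
    r1 = "great food great service".split()
    r2 = "excellent meal friendly staff".split()
    print("bag-of-words cosine:", round(bow_cosine(r1, r2), 3))
    print("review-level semantic:", round(review_sim(r1, r2), 3))
    # Topic distributions assumed to come from a separately trained topic model.
    print("topic similarity:", round(js_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]), 3))
```

With these toy inputs the two reviews share no tokens, so the bag-of-words cosine is 0.0, while the review-level semantic score is positive because pairs such as "great"/"excellent" and "food"/"meal" are close in the hypothetical vector space. This is the kind of paraphrased near-duplicate that a purely vectorial measure misses. A fuller version might weight each word's best match by its inverse document frequency; that refinement is omitted here.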

Similar resources

A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...

Spam Review Detection through Lexical Chain Based Semantic Similarity Algorithm (LCBSS) for Negative Reviews

Negative spam reviews are more harmful to hotel services because negative information spreads faster and has a greater impact than positive information. Sentiment analysis is not effective for detecting spam reviews in today's scenario because, to create new spam reviews, spammers do not copy exact texts; instead, they combine two or more texts that contain the same sentiment. T...

Modeling Review Spam Using Temporal Patterns and Co-bursting Behaviors

Online reviews play a crucial role in helping consumers evaluate and compare products and services. However, review hosting sites are often targeted by opinion spamming. In recent years, many such sites have put a great deal of effort in building effective review filtering systems to detect fake reviews and to block malicious accounts. Thus, fraudsters or spammers now turn to compromise, purcha...

Detecting Compositionality of English Verb-Particle Constructions using Semantic Similarity

We present a novel method for detecting the compositionality of English verb-particle constructions (VPCs), based on the assumption that compositionality can be modelled with semantic similarity between VPCs and their base verb in isolation. We also evaluate the contribution of components in compositional VPCs using semantic overlap.

Spammers Are Becoming "Smarter" on Twitter

Twitter has become one of the most commonly used communication tools in daily life. With 500 million users, Twitter now generates more than 500 million tweets per day. However, its popularity has also attracted spamming. Spammers spread many intensive tweets, which can lure legitimate users to commercial or malicious sites containing malware downloads, phishing, drug sales, scams, and more. S...

Publication date: 2015