Improved Micro-blog Classification for Detecting Abusive Arabic Twitter Accounts
Authors
Abstract
The increased use of social media in Arab regions has attracted spammers seeking new victims. Spammers use accounts on Twitter to distribute adult content in Arabic-language tweets, yet this content is prohibited in these countries due to Arabic cultural norms. These spammers succeed in sending targeted spam by exploiting vulnerabilities in content-filtering and internet censorship systems, primarily by using misspelled words to bypass content filters. In this paper we propose an Arabic word correction method to address this vulnerability. Using our approach, we achieve a predictive accuracy of 96.5% for detecting abusive accounts with Arabic tweets.
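The abstract does not describe the word correction method itself, so the following is only a minimal sketch of one plausible approach, not the authors' algorithm. It assumes that obfuscated spellings rely on elongation marks, diacritics, alef variants, and repeated letters, and that a reference vocabulary is available. The function names `normalize` and `correct`, the vocabulary, and the similarity cutoff are all hypothetical choices made for illustration.

```python
import difflib
import re

# Assumption: these character classes cover common obfuscation tricks.
# U+0640 is the tatweel (elongation mark); U+064B-U+0652 are diacritics.
_MARKS = re.compile("[\u0640\u064B-\u0652]")
_REPEATS = re.compile(r"(.)\1+")
# Map hamza/madda alef variants to the bare alef.
_ALEF = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا"})


def normalize(word: str) -> str:
    """Strip elongation and diacritics, unify alef forms, collapse repeats."""
    word = _MARKS.sub("", word)
    word = word.translate(_ALEF)
    # Crude: this also collapses legitimately doubled letters.
    return _REPEATS.sub(r"\1", word)


def correct(word: str, vocabulary: list[str], cutoff: float = 0.75) -> str:
    """Return the closest vocabulary entry, or the normalized word if none is close."""
    candidate = normalize(word)
    if candidate in vocabulary:
        return candidate
    # difflib ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(candidate, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else candidate


if __name__ == "__main__":
    # Neutral placeholder vocabulary (hypothetical), not a real filter list.
    vocab = ["مرحبا", "كتاب"]
    for raw in ["مـــرحبــاا", "كتتتاب", "مرحيا"]:
        print(raw, "->", correct(raw, vocab))
```

In a filtering pipeline, a step like this would run on each token before it is matched against a content filter or passed to a classifier, so that deliberately misspelled variants map back to a single canonical form.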
Similar resources
Abusive Language Detection on Arabic Social Media
In this paper, we present our work on detecting abusive language on Arabic social media. We extract a list of obscene words and hashtags using common patterns used in offensive and rude communications. We also classify Twitter users according to whether they use any of these words or not in their tweets. We expand the list of obscene words using this classification, and we report results on a n...
Applying geographical clustering methods to analyze geo-located open micro-blog posts
In this paper we conduct an exploratory geographical analysis of a sample of post data from the popular micro-blogging service Twitter for the period 22nd June to 12th October 2011 in the city of Leeds. For some user accounts clear patterns of daily activity are observed, and spatiotemporal concentrations of Twitter posts (tweets) are thought likely to represent, among other things, the residen...
One-step and Two-step Classification for Abusive Language Detection on Twitter
Automatic abusive language detection is a difficult but important task for online social media. Our research explores a two-step approach of first classifying language as abusive and then classifying it into specific types, and compares it with a one-step approach of doing a single multi-class classification for detecting sexist and racist language. With a public English Twitter corpus of 20 thous...
Towards Analyzing Micro-Blogs for Detection and Classification of Real-Time Intentions
Micro-blog forums, such as Twitter, constitute a powerful medium today that people use to express their thoughts and intentions on a daily, and in many cases, hourly, basis. Extracting ‘Real-Time Intention’ (RTI) of a user from such short text updates is a huge opportunity towards web personalization and social networking around dynamic user context. In this paper, we explore the novel problem ...
Online Forums Hotspot Detection and Analysis Using Aging Theory
The exponential growth of social media has drawn considerable attention to public opinion information. Online forums, blogs, and micro-blogs are proving to be extremely valuable resources and hold a large volume of information. However, most social media data is in unstructured or semi-structured form, which makes it difficult to decipher automatically. Therefore, it is essential to...