Using KNN and SVM Based One-Class Classifier for Detecting Online Radicalization on Twitter
Abstract
Twitter is the largest and most popular micro-blogging website on the Internet. Because of its low publication barrier, anonymity and wide penetration, Twitter has become an easy target, or platform, for extremists to disseminate their ideologies and opinions by posting tweets that promote hate and extremism. Millions of tweets are posted on Twitter every day, and it is practically impossible for Twitter moderators or intelligence and security analysts to identify such tweets, users and communities manually. However, automatically classifying tweets into predefined categories is a non-trivial problem because tweets are short (a tweet can be at most 140 characters long) and their content is noisy (incorrect grammar, spelling mistakes, and standard and non-standard abbreviations and slang). We frame the detection of hate and extremism promoting tweets as a one-class (unary-class) categorization problem, learning a statistical model from a training set that contains objects of only one class. We propose several linguistic features, such as the presence of war-related, religious, negative-emotion and offensive terms, to discriminate hate and extremism promoting tweets from other tweets. We employ one-class SVM and KNN algorithms for the one-class classification task. We conduct a case study on Jihad, perform a characterization study of the tweets, and measure the precision and recall of the machine-learning based classifiers. Experimental results on a large, real-world dataset demonstrate that the proposed approach is effective, with F-scores of 0.60 and 0.83 for the KNN and SVM classifiers, respectively.
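The abstract names the ingredients of the approach (lexicon-based linguistic features, a one-class learner trained only on target-class tweets, and precision/recall/F-score evaluation) without fixing their details. The following is a minimal sketch under explicit assumptions: the toy lexicons in `LEXICONS` and the helper names `extract_features`, `OneClassKNN` and `precision_recall_f1`, with parameters `k` and `accept_rate`, are all hypothetical. The scoring rule (mean distance to the k nearest training tweets, thresholded at a leave-one-out quantile) is one common one-class KNN variant and not necessarily the rule used in the paper.

```python
import math
import re

# HYPOTHETICAL toy lexicons; the paper's actual term lists are not given here.
LEXICONS = {
    "war": {"attack", "battle", "war", "weapon", "weapons"},
    "religious": {"jihad", "holy", "faith"},
    "negative_emotion": {"hate", "anger", "revenge"},
    "offensive": {"scum", "vermin", "traitor"},
}
FEATURE_NAMES = sorted(LEXICONS)  # fixed feature order for the vectors


def extract_features(tweet):
    """Binary presence feature per lexicon: 1.0 if any token of the tweet is in it."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return [float(any(t in LEXICONS[name] for t in tokens)) for name in FEATURE_NAMES]


class OneClassKNN:
    """Assumed distance-threshold one-class KNN (not necessarily the paper's rule).

    A point's score is its mean Euclidean distance to its k nearest training
    points. A point is accepted as target class when its score is at most
    `threshold`, the leave-one-out score at or below which at least
    `accept_rate` of the training points fall.
    """

    def __init__(self, k=3, accept_rate=0.9):
        self.k = k
        self.accept_rate = accept_rate

    def _score(self, x, exclude=None):
        dists = sorted(math.dist(x, t) for j, t in enumerate(self.train) if j != exclude)
        k = min(self.k, len(dists))
        return sum(dists[:k]) / k

    def fit(self, train):
        # Training data holds ONLY target-class examples (needs at least 2 points).
        self.train = [list(x) for x in train]
        # Leave-one-out scores, so a training point is not its own neighbour.
        loo = sorted(self._score(x, exclude=i) for i, x in enumerate(self.train))
        idx = min(len(loo) - 1, math.ceil(self.accept_rate * len(loo)) - 1)
        self.threshold = loo[idx]
        return self

    def predict(self, x):
        return self._score(x) <= self.threshold


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-score (harmonic mean of precision and recall)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum(p and not t for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def demo():
    # HYPOTHETICAL toy tweets; the training set contains target-class tweets only.
    train = ["join the jihad and attack them", "holy war is coming",
             "the battle for jihad continues", "revenge attack for the holy cause",
             "we hate them, jihad and war now"]
    test = [("jihad and attack", True), ("lovely weather for a picnic", False),
            ("i hate rainy mondays", False), ("holy scum must face the battle", True)]
    model = OneClassKNN(k=2, accept_rate=0.9).fit([extract_features(t) for t in train])
    y_pred = [model.predict(extract_features(text)) for text, _ in test]
    return precision_recall_f1([label for _, label in test], y_pred)


if __name__ == "__main__":
    print(demo())
```

On this toy data the sketch accepts the first test tweet and rejects the last one, so precision is 1.0 and recall is 0.5. For the SVM side, scikit-learn's `sklearn.svm.OneClassSVM` (with parameters such as `kernel`, `nu` and `gamma`; `predict` returns +1 for inliers and -1 for outliers) could be fitted on the same feature vectors; it is left out here so the sketch needs only the standard library.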
Similar resources
Sentiment Analysis on Twitter Data using KNN and SVM
Millions of users share opinions on various topics through micro-blogging every day. Twitter is a very popular micro-blogging site where posts are limited to 140 characters; this restriction forces users to be concise as well as expressive. For that reason, it has become a rich source for sentiment analysis and belief mining. The aim of this paper is to develop such a...
Fault diagnosis in a distillation column using a support vector machine based classifier
Fault diagnosis has always been an essential aspect of control system design. It is necessary because of the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...
Epileptic Seizure Detection in EEG signals Using TQWT and SVM-GOA Classifier
Background: Epilepsy is a brain disorder that affects people's quality of life. If it is diagnosed at an early stage, it will not spread. Electroencephalography (EEG) signals are used to diagnose epileptic seizures. However, this screening system cannot diagnose epileptic seizure states precisely. Nevertheless, with the help of computer-aided diagnosis systems (CADS), neurologists ca...
Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method
Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...
Tracking Model of Moving Target Based on KNN - SVM
To address the shortcomings of the KNN (k-nearest neighbor) and SVM (support vector machine) algorithms in tracking a moving target, such as large resource consumption and low tracking accuracy, a moving-target tracking model is proposed that combines the KNN and SVM algorithms with minimum-distance optimization. First, categories are divided according to the princip...