Machine Learning for Classifying Authors of Anonymous Tweets, Blogs and Reviews
Authors
Abstract
In this paper, we focus on detecting author profiles (age and gender) from their discussions. The PAN@CLEF 2014 corpus consists of four sub-corpora: tweets, blogs, social media, and reviews. The proposed method is based on automatic classification using data extracted statistically from a source corpus. We present a hybrid method that combines analysis of the data in the texts with a machine learning method. To manage these data better, we rely on the decision table algorithm.
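The abstract does not say which statistics are extracted or how the decision table is configured, so the following is only a minimal sketch of the general idea: discretized text statistics are used as a lookup key, each key maps to the majority class seen in training, and unseen keys fall back to the overall majority class. The feature choices, thresholds, class labels, and the names `extract_features` and `DecisionTable` are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


def extract_features(text):
    """Hypothetical statistical features, discretized into coarse bins."""
    words = text.split()
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return (
        "long" if len(words) > 20 else "short",  # text length bin (assumed threshold)
        "high" if avg_len > 4.5 else "low",      # average word length bin (assumed threshold)
        "yes" if "!" in text else "no",          # exclamation mark present
    )


class DecisionTable:
    """Majority-class lookup table keyed by the discretized feature tuple."""

    def fit(self, texts, labels):
        # Fallback class for feature combinations never seen in training.
        self.default = Counter(labels).most_common(1)[0][0]
        cells = defaultdict(Counter)
        for text, label in zip(texts, labels):
            cells[extract_features(text)][label] += 1
        # Each table cell stores the most frequent label among its training texts.
        self.table = {key: counts.most_common(1)[0][0] for key, counts in cells.items()}
        return self

    def predict(self, text):
        return self.table.get(extract_features(text), self.default)


# Toy training data with made-up age-group labels, for illustration only.
train_texts = [
    "omg so cool !!!",
    "lol yes !",
    "haha nice one !",
    "The quarterly financial statement remains unchanged.",
    "Regarding yesterday's committee discussion, nothing follows.",
]
train_labels = ["18-24", "18-24", "18-24", "35-49", "35-49"]

model = DecisionTable().fit(train_texts, train_labels)
print(model.predict("wow great !"))
print(model.predict("Quarterly performance indicators deteriorated significantly."))
```

A real system would select the feature subset by cross-validation and use far richer statistics; the sketch only shows the table lookup and the majority-class fallback.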
Similar resources
Automatic Adaptation of Author's Stylometric Features to Document Types
Many Internet users face the problem of anonymous documents and texts with counterfeit authorship. The number of questionable documents exceeds the capacity of human experts; therefore, a universal automated authorship identification system supporting all types of documents is needed. In this paper, five predominant document types are analysed in the context of authorship verification: boo...
Examination of Authors' Stylistic Elements of Electronic Messages based on Researched Studies
Identifying the author is an important issue in natural language processing and text classification. It reveals the author's characteristics across various texts. With the rapid development of the Internet, Web-based tools that allow anonymous identities, such as email and blogs, have become a popular method of communication for perpetrators. Moreover, this creates specific security issues. In this paper, we ...
Sentence Boundary Detection for Social Media Text
The paper presents a study on automatic sentence boundary detection in social media texts such as Facebook messages and Twitter micro-blogs (tweets). We explore the limitations of using existing rule-based sentence boundary detection systems on social media text, and as an alternative investigate applying three machine learning algorithms (Conditional Random Fields, Naïve Bayes, and Sequential ...
Comparison of classic regression methods with neural network and support vector machine in classifying groundwater resources
Today, data classification is one of the most important issues across the sciences for detecting and predicting events. In statistics, the traditional approach to classification is based on classic methods and statistical models such as logistic regression. In the present era, known as the era of the information explosion, in most cases, we are faced with data that c...
Gender Classification with Deep Learning
For our project, we consider the task of classifying the gender of the author of a blog, novel, tweet, post or comment. Previous attempts have used traditional NLP models such as bag of words and n-grams to capture gender differences in authorship, and applied them to a specific medium (e.g. formal writing, books, tweets, or blogs). Our project takes a novel approach by applying deep learning m...