Machine Learning for Classifying Authors of Anonymous Tweets, Blogs and Reviews

Authors

  • Seifeddine Mechti
  • Maher Jaoua
  • Lamia Hadrich Belguith
Abstract

In this paper, we focus on detecting the profile of authors (age and gender) from the texts they write. The PAN@CLEF 2014 corpus consists of four sub-corpora: tweets, blogs, social media and reviews. The proposed method is based on automatic classification using features extracted statistically from a source corpus. We present a hybrid method that combines analysis of the textual data with a machine learning method. To handle these data more effectively, we rely on the decision table algorithm.
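
As a rough illustration of the kind of classifier the abstract names, the following is a minimal sketch of a decision table majority classifier in Python. It is not the authors' implementation: the three stylistic features, their bin thresholds, the age labels and the toy texts are all hypothetical, since the abstract does not list the features used. The sketch also omits the feature-subset search that full decision table learners perform.

```python
from collections import Counter, defaultdict


def extract_features(text):
    """Map a text to a tuple of coarse bins (hypothetical features)."""
    words = text.split()
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return (
        "long" if avg_len >= 5 else "short",   # average word length bin
        "emph" if "!" in text else "plain",    # contains an exclamation mark
        "many" if len(words) >= 8 else "few",  # number of tokens bin
    )


class DecisionTable:
    """Each distinct feature tuple is a table cell holding class counts."""

    def fit(self, texts, labels):
        self.table = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.table[extract_features(text)][label] += 1
        # Fallback for cells never seen in training: overall majority class.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, text):
        counts = self.table.get(extract_features(text))
        if not counts:
            return self.default
        return counts.most_common(1)[0][0]


# Toy training data (invented for illustration only).
train_texts = [
    "omg so fun !!",
    "lol yes !",
    "so cool !",
    "The quarterly financial statement requires considerable additional "
    "documentation before submission",
    "Professional reviewers generally appreciate comprehensive technical "
    "documentation throughout development",
]
train_labels = ["18-24", "18-24", "18-24", "35-49", "35-49"]

model = DecisionTable().fit(train_texts, train_labels)
print(model.predict("haha nice !"))
```

A table lookup like this is easy to inspect, since every prediction can be traced to one cell and its class counts, but it only generalizes as well as the chosen bins allow.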

Similar resources

Automatic Adaptation of Author's Stylometric Features to Document Types

Many Internet users face the problem of anonymous documents and texts with a counterfeit authorship. The number of questionable documents exceeds the capacity of human experts, therefore a universal automated authorship identification system supporting all types of documents is needed. In this paper, five predominant document types are analysed in the context of the authorship verification: boo...

Examination of Authors' Stylistic Elements of Electronic Messages based on Researched Studies

Identifying the author of a text is an important issue in natural language processing and text classification. It reveals the author's characteristics across various texts. With the rapid development of the Internet, Web-based tools such as email and blogs, which allow anonymous identities, have become a popular method of communication for perpetrators. Moreover, this creates some specific security issues. In this paper, we ...

Sentence Boundary Detection for Social Media Text

The paper presents a study on automatic sentence boundary detection in social media texts such as Facebook messages and Twitter micro-blogs (tweets). We explore the limitations of using existing rule-based sentence boundary detection systems on social media text, and as an alternative investigate applying three machine learning algorithms (Conditional Random Fields, Naïve Bayes, and Sequential ...

Comparison of classic regression methods with neural network and support vector machine in classifying groundwater resources

In the present era, classification of data is one of the most important issues in various sciences in order to detect and predict events. In statistics, the traditional view of these classifications will be based on classic methods and statistical models such as logistic regression. In the present era, known as the era of explosion of information, in most cases, we are faced with data that c...

Gender Classification with Deep Learning

For our project, we consider the task of classifying the gender of an author of a blog, novel, tweet, post or comment. Previous attempts have considered traditional NLP models such as bag of words and n-grams to capture gender differences in authorship, and applied them to a specific medium (e.g. formal writing, books, tweets, or blogs). Our project takes a novel approach by applying deep learning m...

Publication date: 2014