Topic Models and n-gram Language Models for Author Profiling - Notebook for PAN at CLEF 2015
Abstract
Author profiling is the task of determining the attributes of a set of authors. This paper presents the design, approach, and results of our submission to the PAN 2015 Author Profiling Shared Task. Four corpora, each in a different language, were provided. Each corpus consisted of collections of tweets from a number of Twitter users whose gender, age, and personality scores are known. The task was to construct a system capable of inferring the same attributes for as-yet-unseen authors. Our system uses two sets of text-based features, n-grams and topic models, in conjunction with Support Vector Machines to predict gender, age, and personality scores. We ran our system on each dataset, and the results indicate that n-grams and topic models are effective features across a number of languages.
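As a rough illustration of the n-gram half of the feature set, the sketch below aggregates word n-gram frequencies over all tweets of a single author. The abstract does not say whether the n-grams are word- or character-level, how the text is tokenized, or how counts are weighted, so the whitespace tokenization, the choice of unigrams plus bigrams, the relative-frequency normalization, and the function names `word_ngrams` and `author_ngram_features` are all assumptions made for illustration. The topic-model features and the Support Vector Machine stage are not shown.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def author_ngram_features(tweets, n_values=(1, 2)):
    """Map each n-gram to its relative frequency across all tweets of one author.

    Assumption: one feature dictionary per author, built from every tweet
    by that author; the paper's actual feature construction may differ.
    """
    counts = Counter()
    for tweet in tweets:
        for n in n_values:
            counts.update(word_ngrams(tweet, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical tweets for one author (not taken from the PAN corpus).
tweets = ["so happy today", "happy today again"]
features = author_ngram_features(tweets)
```

A dictionary like `features` could then be converted to a fixed-length vector over a shared vocabulary and passed, together with topic proportions, to a classifier for gender and age or a regressor for personality scores.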
Similar resources
Segmenting Target Audiences: Automatic Author Profiling using Tweets: Notebook for PAN at CLEF 2015
This paper describes a methodology proposed for author profiling using natural language processing and machine learning techniques. We used lexical information in the learning process. For those languages without lexicons, we automatically translated them, in order to be able to use this information. Finally, we will discuss how we applied this methodology to the 3rd Author Profiling Task at PA...
Syntactic N-grams as Features for the Author Profiling Task: Notebook for PAN at CLEF 2015
This paper describes our approach to the Author Profiling task at PAN 2015. Our method relies on syntactic features, such as syntactic n-grams of various types, to predict the age, gender, and personality traits of the author of a given text. In this paper, we describe the features used, the classification algorithm employed, and other general ideas concerning the expe...
XRCE Personal Language Analytics Engine for Multilingual Author Profiling: Notebook for PAN at CLEF 2015
This technical notebook describes the methodology used – and results achieved – for the PAN 2015 Author Profiling Challenge by the team from Xerox Research Centre Europe (XRCE). This year, personality traits are introduced alongside age and gender in a corpus of tweets in four languages – English, Spanish, Italian and Dutch. We describe a largely language agnostic methodology for classification...
Author Profiling using LDA and Maximum Entropy: Notebook for PAN at CLEF 2013
This paper describes the traditional authorship attribution subtask of the PAN/CLEF 2013 workshop. In our attempt to classify documents by the gender and age of their author, we applied a traditional topic modeling approach using Latent Dirichlet Allocation (LDA). We used content-based features such as topics and style-based features such as preposition frequencies, which act as the eff...
Statistical Learning Methods for Profiling Analysis: Notebook for PAN at CLEF 2015
Author profiling is the task of inferring information about an author by analyzing her or his writing style. Its applications in forensics, business intelligence, and psychology make this topic interesting for research. In this notebook, we present our baseline approach using SVM and Linear Discriminant Analysis (LDA) classifiers. We analyze features obtained from LIWC dictionaries, these are ...