Using Intra-Profile Information for Author Profiling

Authors

  • Adrián Pastor López-Monroy
  • Manuel Montes-y-Gómez
  • Hugo Jair Escalante
  • Luis Villaseñor Pineda
Abstract

In this paper we describe the participation of the Laboratory of Language Technologies of INAOE at PAN 2014. We address the Author Profiling (AP) task by finding and exploiting relationships among terms, documents, profiles and subprofiles. Our approach builds on the idea of second order attributes (a low-dimensional and dense document representation) [4], but goes further by incorporating information from within each target profile. The proposed representation deepens the analysis by incorporating information among texts in the same profile; that is, we focus on subprofiles. To this end, we automatically find subprofiles and build document vectors that represent more detailed relationships between documents and subprofiles. We compare the proposed representation with the standard Bag-of-Terms and the best method in PAN 2013, using the PAN 2014 corpora for the AP task. Results show evidence of the usefulness of intra-profile information for determining gender and age profiles. According to the PAN 2014 official results, the proposed method was one of the three best approaches for most social media domains. In particular, it achieved the best performance in predicting age and gender profiles for blogs and tweets in English.
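The idea of representing documents by their relationships to subprofiles can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the weighting scheme, so the log-scaled relative-frequency weighting below is an assumption, the function names `term_vectors` and `document_vector` are hypothetical, and the subprofile labels are assumed to come from a clustering step that is not shown.

```python
import math
from collections import Counter, defaultdict


def term_vectors(docs, groups):
    """Map each term to a normalized vector of affinities, one entry per group.

    `docs` is a list of token lists; `groups` gives one (sub)profile label
    per document. The weighting is an assumed one, not taken from the paper.
    """
    group_ids = sorted(set(groups))
    index = {g: i for i, g in enumerate(group_ids)}
    vectors = defaultdict(lambda: [0.0] * len(group_ids))
    for tokens, g in zip(docs, groups):
        for term, tf in Counter(tokens).items():
            # Accumulate log-scaled relative frequency in the document's group.
            vectors[term][index[g]] += math.log2(1 + tf / len(tokens))
    # Normalize each term vector so its entries sum to 1.
    normalized = {}
    for term, vec in vectors.items():
        total = sum(vec)
        normalized[term] = [v / total for v in vec]
    return normalized, group_ids


def document_vector(tokens, vectors, n_groups):
    """Build a dense document vector as a frequency-weighted sum of term vectors."""
    doc = [0.0] * n_groups
    for term, tf in Counter(tokens).items():
        vec = vectors.get(term)
        if vec is None:
            continue  # skip terms never seen in training
        weight = tf / len(tokens)
        for i, v in enumerate(vec):
            doc[i] += weight * v
    return doc


# Toy data with hypothetical subprofile labels (two per gender profile).
docs = [
    ["love", "shopping", "love"],
    ["football", "beer"],
    ["love", "football"],
    ["beer", "beer", "football"],
]
groups = ["female_a", "male_a", "female_b", "male_b"]

vectors, ids = term_vectors(docs, groups)
new_doc = document_vector(["love", "shopping"], vectors, len(ids))
```

The resulting vector has one dimension per subprofile rather than one per vocabulary term, which is what makes the representation low-dimensional and dense; it could then be passed to any standard classifier.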

Similar resources

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches for Author Profiling suffer from problems like high dimensionality of features and fail to capture th...

Author Profiling using Complementary Second Order Attributes and Stylometric Features

In this paper we present an approach for the task of author profiling. We propose a modular framework that extracts two main groups of features, combined with appropriate preprocessing, and uses Support Vector Machines for classification. The two main groups we used were stylometric and discriminative, featuring trigrams on one hand and complementary-weighted Second Order Attributes on the oth...

Glow Discharge Depth Profiling a Powerful Analytical Technique in Surface Engineering (TECHNICAL NOTE)

A variety of analytical techniques have been developed and employed to characterize the surfaces, subsurfaces and interfaces of surface engineering systems. They provide important information for quality control, process optimization and further development. Since the mid-1980s, glow discharge spectrometry (GDS) has emerged as an important and versatile technique for rapid depth profiling anal...

Style-based Distance Features for Author Verification Notebook for PAN at CLEF 2013

In this paper we present the approach we took in our participation in the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group categories) can be analyzed in the same way as an author’s style.

Exploring the Effects of Cross-Genre Machine Learning for Author Profiling in PAN 2016

Author profiling deals with the study of various profile dimensions of an author such as age and gender. This work describes our methodology proposed for the task of cross-genre author profiling at PAN 2016. We address gender and age prediction as a classification task and approach this problem by extracting stylistic and lexical features for training a logistic regression model. Furthermore, w...

Publication date: 2014