Automatic Author Profiling Based on Linguistic and Stylistic Features Notebook for PAN at CLEF 2013
Authors
Abstract
The rapid growth of blogs and other electronic text in Web 2.0 makes it increasingly important to identify authors' profiles. Automatic identification of an author's gender and age from linguistic and stylistic patterns has attracted increasing research interest in recent years. The same methods are also useful in other applications such as criminal detection, security, and author detection. We use lexical, syntactic, and structural features to identify the gender and age group of authors, and a decision tree classifier to classify the author profile. We achieve accuracies of 56.83% and 28.95% for gender and age group classification, respectively.
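As a rough illustration of the three feature families named in the abstract, the sketch below computes a few lexical, syntactic, and structural measurements from raw text. The abstract does not list the actual features, so every feature here, the `FUNCTION_WORDS` list, and the `extract_features` name are assumptions chosen for illustration, not the authors' feature set.

```python
import re
import string

# Hypothetical function-word list; the abstract does not name the features used.
FUNCTION_WORDS = {"the", "a", "an", "and", "but", "or", "of", "in", "to", "i", "you", "we"}


def extract_features(text):
    """Return a dict of illustrative lexical, syntactic and structural features."""
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    # Sentences are approximated by splitting on runs of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Paragraphs are approximated by blank-line separation
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    n_words = len(words) or 1  # avoid division by zero on empty input
    return {
        # Lexical: word-level statistics
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(set(lowered)) / n_words,
        # Syntactic (crudely approximated without a parser or tagger)
        "function_word_ratio": sum(w in FUNCTION_WORDS for w in lowered) / n_words,
        "punctuation_per_word": sum(c in string.punctuation for c in text) / n_words,
        # Structural: layout of the document
        "sentence_count": len(sentences),
        "paragraph_count": len(paragraphs),
        "avg_sentence_length": len(words) / (len(sentences) or 1),
    }


if __name__ == "__main__":
    sample = "I love the beach. We went there!\n\nIt was fun."
    for name, value in extract_features(sample).items():
        print(f"{name}: {value}")
```

Feature dictionaries of this kind could then be turned into numeric vectors and passed to any decision tree learner (for example, scikit-learn's `DecisionTreeClassifier`), with gender or age group as the label. Which learner and settings the authors used is not stated in the abstract.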
Similar resources
Author Profiling for English and Spanish Text Notebook for PAN at CLEF 2013
This paper describes an approach for the author profiling task of the PAN 2013 challenge. This work is based on the idea of linguistic modality that has been successfully used in other classification tasks such as authorship attribution. We consider three different modalities: syntactic, stylistic, and semantic, each representing a different aspect of text. For each modality, we extract informa...
Ensemble-based Classification for Author Profiling Using Various Features Notebook for PAN at CLEF 2013
This paper summarizes our approach to the author profiling task, a part of the PAN'13 evaluation lab. We used ensemble-based classification on a large feature set. All the features are roughly described, and the experimental section provides an evaluation of different methods and classification approaches.
Author Profiling Using Corpus Statistics, Lexicons and Stylistic Features Notebook for PAN at CLEF-2013
This paper describes our participation in the 9th PAN evaluation lab in the author profiling task. The proposed approach relies on the extraction of stylistic, lexicon and corpus-based features, which were combined with a logistic classifier. These three sets of features contain pairwise intersections and even some features that belong to all categories. A comprehensive comparison of the contri...
Readability for Author Profiling? Notebook for PAN at CLEF 2013
This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used, and the origins in thinking around text readability as a mechanism for identification, and the predictive model used which may have beneficially omitted classes, as well as offering commentary on the results obtained.
Style-based Distance Features for Author Verification Notebook for PAN at CLEF 2013
In this paper we present the approach we took in our participation in the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group categories) can be analyzed in the same way as an author's style.