Overview of the 3rd Author Profiling Task at PAN 2015
Abstract
In this paper, we describe and evaluate the corpora submitted to the PAN 2015 shared task on plagiarism detection for text alignment. We received mono- and cross-language corpora in the following languages and language pairs: English, Persian, Chinese, Urdu-English, and English-Persian. We present an independent section for each submitted corpus, including statistics, a discussion of the obfuscation techniques employed, and an assessment of the corpus quality.
Similar resources
Segmenting Target Audiences: Automatic Author Profiling using Tweets: Notebook for PAN at CLEF 2015
This paper describes a methodology proposed for author profiling using natural language processing and machine learning techniques. We used lexical information in the learning process. For those languages without lexicons, we automatically translated them, in order to be able to use this information. Finally, we will discuss how we applied this methodology to the 3rd Author Profiling Task at PA...
Overview of the PAN/CLEF 2015 Evaluation Lab
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification and author profiling studying important variations of these problem...
Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter
This overview presents the framework and the results of the Author Profiling task at PAN 2017. The objective of this year is to address gender and language variety identification. For this purpose a corpus from Twitter has been provided for four different languages: Arabic, English, Portuguese, and Spanish. Altogether, the approaches of 22 participants are evaluated.
Overview of the Author Profiling Task at PAN 2013
This overview presents the framework and results for the Author Profiling task at PAN 2013. We describe in detail the corpus and its characteristics, and the evaluation framework we used to measure the participants' performance on the problem of identifying age and gender from anonymous texts. Finally, the approaches of the 21 participants and their results are described.
Overview of the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations
This overview presents the framework and the results of the Author Profiling task at PAN 2016. The objective was to predict age and gender from a cross-genre perspective. For this purpose a corpus from Twitter has been provided for training, and different corpora from social media, blogs, essays, and reviews have been provided for evaluation. Altogether, the approaches of 22 participants were e...