A Textual Modus Operandi: Surrey's Simple System for Author Identification Notebook for PAN at CLEF 2013
Abstract
Detecting deception of various kinds may be possible to varying degrees, but it has little value if the deceiver cannot be identified. In this paper, we discuss our approach to Authorship Attribution, which uses vector similarity with a frequency-mean-variance framework for patterns of stopwords (no more than ten). The high-frequency individual occurrences of these stopwords, and their patterns of co-occurrence, can be used as identifiers of an author’s style, and the approach operates similarly across certain languages without prior linguistic knowledge. This simple system achieved F1 values of 0.66, 0.74, and 0.78 for the Early Bird, Final, and Post-submission assessments of the Train Corpus, respectively. We cannot yet offer further explanation, as the Test Corpus is not available at the time of writing.
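The abstract does not spell out the frequency-mean-variance framework, so the following is only a minimal sketch of the general idea of comparing stopword frequency vectors by cosine similarity. It is not the authors' system: the stopword list, the tokenizer (English-only here), the averaging of known documents, and the `threshold` value are all assumptions made for illustration.

```python
import math
import re

# Hypothetical list of ten stopwords; the abstract does not name the words used.
STOPWORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "was", "it"]


def stopword_profile(text):
    """Return the relative frequency of each stopword among all tokens."""
    # Assumed tokenizer: lowercase runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [tokens.count(word) / total for word in STOPWORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_author(known_texts, unknown_text, threshold=0.9):
    """Compare the mean profile of the known texts with the unknown text.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    profiles = [stopword_profile(t) for t in known_texts]
    mean_profile = [sum(col) / len(profiles) for col in zip(*profiles)]
    score = cosine_similarity(mean_profile, stopword_profile(unknown_text))
    return score, score >= threshold
```

A call such as `same_author(known_docs, questioned_doc)` returns the similarity score together with a yes/no decision. Restricting the features to a handful of function words keeps the comparison independent of topic, which is in line with the abstract's claim that no prior linguistic knowledge is needed.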
Similar resources
Readability for Author Profiling? Notebook for PAN at CLEF 2013
This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used, and the origins in thinking around text readability as a mechanism for identification, and the predictive model used which may have beneficially omitted classes, as well as offering commentary on the results obtained.
Semantic-based Features for Author Profiling Identification: First insights Notebook for PAN at CLEF 2013
In this article we present a semantic-based approach concerning the identification of particular author’s traits, such as age and gender, from social media texts. The model here described is intended to provide information on different levels of analysis: from textual markers to semantics. Different classifiers were used to assess the performance and scope of the model.
Vector Space Model and Overlap Metric for Author Identification Notebook for PAN at CLEF 2013
This paper describes our entry for the Author Identification task at PAN 2013. The Author Identification task was performed using a combination of Vector Space Model [1] (VSM) and Similarity Overlap Metric [3] (SOM) on the character n-grams extracted from the documents related to an author and the document in question. A combination of the VSM and SOM provided an overall F-measure, precision an...
Style-based Distance Features for Author Verification Notebook for PAN at CLEF 2013
In this paper we present the approach we took in our participation in the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group categories) can be analyzed in the same way as an author’s style.
A Graph Based Authorship Identification Approach: Notebook for PAN at CLEF 2015
The paper describes our approach for the Authorship Identification task at the PAN CLEF 2015. We extract textual patterns based on features obtained from shortest path walks over Integrated Syntactic Graphs (ISG). Then we calculate a similarity between the unknown document and the known document with these patterns. The approach uses a predefined threshold in order to decide if the unknown docu...