Digital Authorship Attribution in Russian-Language Fanfiction and Classical Literature
Abstract
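This article is the third paper in a series aimed at establishing the authorship of Russian-language texts. It considers methods for determining the authorship of classical Russian literary texts, as well as fanfiction texts. The process of determining the author was first considered in the classical version of classification experiments using a closed set of authors, and experiments were also completed for a more complicated modification of the problem using an open set of authors. The use of these methods to identify the author of a text is justified by the conclusions about the effectiveness of fastText and Support Vector Machine (SVM) with the selection of informative features discussed in our past studies. In the case of open attribution, the proposed methods are based on the author's combination of fastText and One-Class SVM, as well as statistical estimates of a vector's similarity measures. The feature selection algorithm for a closed set of authors is chosen based on a comparison of five different selection methods, including the previously considered genetic algorithm as a baseline. The regularization-based algorithm (RbFS) was found to be the most efficient method, while methods based on complete enumeration (FFS and SFS) were found to be ineffective for any set of authors. The accuracy of RbFS and SVM in the case of classical literary texts averaged 83%, which outperforms other selection methods by 3 to 10% for an identical number of features, and the average accuracy of fastText was 84%. For open attribution in cross-topic classification, the average accuracy of the method based on the combination of One-Class SVM with fastText was 85%, and for in-group classification it was 75 to 78%, depending on the group, which is the best result among the open attribution methods considered.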
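The two settings named in the abstract, closed-set attribution with regularization-based feature selection and open-set attribution with a One-Class SVM, can be illustrated with a minimal sketch. It assumes scikit-learn and uses TF-IDF character n-grams as a stand-in for the fastText representation. The toy corpus, variable names and hyperparameters are hypothetical. An L1-penalised linear SVM is one common form of regularization-based selection and is not confirmed to be the RbFS algorithm of the article.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC, OneClassSVM

# Hypothetical toy corpus: two "authors" with deliberately different wording.
texts = [
    "the storm rolled over the silent harbour at dusk",
    "silent waves and the storm kept the harbour awake",
    "at dusk the harbour lights trembled in the storm",
    "the storm passed and the silent harbour slept",
    "quick jokes, bright cafes, quirky friends laughing",
    "bright friends trade quick jokes in quirky cafes",
    "quirky cafes buzz with quick, bright laughing",
    "friends laughing at quick jokes, bright and quirky",
]
authors = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Character n-gram TF-IDF vectors (assumption: a stand-in for fastText vectors).
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))
X = vectorizer.fit_transform(texts).toarray()

# Closed set: an L1-penalised linear SVM zeroes the weights of uninformative
# features; SelectFromModel keeps only the features with non-zero weight.
selector = SelectFromModel(
    LinearSVC(penalty="l1", dual=False, C=10.0, max_iter=5000)
)
X_sel = selector.fit_transform(X, authors)
clf = LinearSVC(dual=False).fit(X_sel, authors)
closed_pred = clf.predict(X_sel)

# Open set: a One-Class SVM is fitted on a single author's texts and then
# asked whether unseen texts are consistent with that author.
one_class = OneClassSVM(kernel="linear", nu=0.5).fit(X_sel[authors == "A"])
unseen = [
    "the harbour was silent after the storm",
    "quick bright jokes in quirky cafes",
]
unseen_sel = selector.transform(vectorizer.transform(unseen).toarray())
open_pred = one_class.predict(unseen_sel)  # +1 = accepted as author A, -1 = rejected

print(X.shape[1], "->", X_sel.shape[1], "features")
print("closed-set predictions:", closed_pred)
print("open-set decisions for author A:", open_pred)
```

On a corpus this small the predictions carry no evidential weight. The sketch only shows how the selection step feeds both the multi-class classifier and the one-class model.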
Similar resources
Cross-Language Authorship Attribution
This paper presents a novel task of cross-language authorship attribution (CLAA), an extension of authorship attribution task to multilingual settings: given data labelled with authors in language X, the objective is to determine the author of a document written in language Y, where X ≠ Y. We propose a number of cross-language stylometric features for the task of CLAA, such as those based o...
Authorship Attribution in Bengali Language
We describe Authorship Attribution of Bengali literary text. Our contributions include a new corpus of 3,000 passages written by three Bengali authors, an end-to-end system for authorship classification based on character n-grams, feature selection for authorship attribution, feature ranking and analysis, and learning curve to assess the relationship between amount of training data and test accu...
Language Independent Authorship Attribution using Character Level Language Models
We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information theoretic principles, and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present exp...
Who's At The Keyboard? Authorship Attribution in Digital Evidence Investigations
In some investigations of digital crime, the question of who was at the keyboard when incriminating documents were produced can be legitimately raised. Authorship attribution can then contribute to the investigation. Authorship methods which focus on linguistic characteristics currently have accuracy rates ranging from 72% to 89%, within the computational paradigm. This article presents a compu...
Authorship Attribution
Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of application. Recent work in “non-traditional” authorship attribution demonstrates the practicality of automatically analyzing documents based on authorial style, but the state of the art is confusing. An...
Journal
Journal title: Algorithms
Year: 2022
ISSN: 1999-4893
DOI: https://doi.org/10.3390/a16010013