Digital Authorship Attribution in Russian-Language Fanfiction and Classical Literature

Abstract

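This article is the third paper in a series aimed at the establishment of the authorship of Russian-language texts. It considers methods for determining the authorship of classical Russian literary texts, as well as fanfiction texts. The process of determining the author was first considered in the classical version of classification experiments using a closed set of authors, and experiments were also completed for a complicated modification of the problem using an open set of authors. The use of these methods to identify the author of a text is justified by the conclusions about the effectiveness of fastText and Support Vector Machine (SVM) methods with the selection of informative features discussed in our past studies. In the case of open set attribution, the proposed methods are based on the authors' own combination of fastText and One-Class SVM, as well as on statistical estimates of a vector's similarity measures. The feature selection algorithm for a closed set of authors was chosen based on a comparison of five different selection methods, including the previously considered genetic algorithm as a baseline. The regularization-based algorithm (RbFS) was found to be the most efficient method, while methods based on complete enumeration (FFS and SFS) were found to be ineffective for any set of authors. The accuracy of the RbFS and SVM methods in the case of classical literary texts averaged 83%, which outperforms other selection methods by 3 to 10% for an identical number of features, and the average accuracy of fastText was 84%. For open set attribution in cross-topic classification, the average accuracy of the method based on the combination of One-Class SVM with fastText was 85%, and for in-group classification it was 75 to 78%, depending on the group, which is the best result among the open set attribution methods considered.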
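The open set idea mentioned in the abstract (accept a text for a known author only if a similarity measure clears a statistically estimated threshold, otherwise report "unknown") can be illustrated with a minimal sketch. This is not the method of the paper: it swaps fastText and One-Class SVM for plain character trigram counts and cosine similarity, and the function names (`char_ngrams`, `cosine`, `fit_author`, `attribute`), the trigram size, and the `mean - k * std` threshold rule are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def char_ngrams(text, n=3):
    """Return a Counter of character n-grams (a simple stylometric profile)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_author(texts, k=2.0):
    """Build a centroid profile and an acceptance threshold for one author.

    Assumption: the threshold is mean - k * std of leave-one-out
    similarities; this rule is illustrative, not taken from the paper.
    Needs at least two training texts.
    """
    profiles = [char_ngrams(t) for t in texts]
    centroid = Counter()
    for p in profiles:
        centroid.update(p)
    # Compare each text with the centroid of the remaining texts.
    sims = [cosine(p, centroid - p) for p in profiles]
    threshold = mean(sims) - k * pstdev(sims)
    return centroid, threshold


def attribute(text, models):
    """Return the best-matching known author, or None for 'unknown author'."""
    profile = char_ngrams(text)
    best_author, best_sim = None, 0.0
    for author, (centroid, threshold) in models.items():
        sim = cosine(profile, centroid)
        if sim >= threshold and sim > best_sim:
            best_author, best_sim = author, sim
    return best_author
```

In use, `models` would map each known author to the pair returned by `fit_author`, and a return value of `None` from `attribute` corresponds to rejecting the text as written by someone outside the known set.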

Similar resources

Cross-Language Authorship Attribution

This paper presents a novel task of cross-language authorship attribution (CLAA), an extension of authorship attribution task to multilingual settings: given data labelled with authors in language X, the objective is to determine the author of a document written in language Y, where X ≠ Y. We propose a number of cross-language stylometric features for the task of CLAA, such as those based o...

Authorship Attribution in Bengali Language

We describe Authorship Attribution of Bengali literary text. Our contributions include a new corpus of 3,000 passages written by three Bengali authors, an end-to-end system for authorship classification based on character n-grams, feature selection for authorship attribution, feature ranking and analysis, and learning curve to assess the relationship between amount of training data and test accu...

Language Independent Authorship Attribution using Character Level Language Models

We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information theoretic principles, and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection. To demonstrate the effectiveness and language independence of our approach, we present exp...

Who's At The Keyboard? Authorship Attribution in Digital Evidence Investigations

In some investigations of digital crime, the question of who was at the keyboard when incriminating documents were produced can be legitimately raised. Authorship attribution can then contribute to the investigation. Authorship methods which focus on linguistic characteristics currently have accuracy rates ranging from 72% to 89%, within the computational paradigm. This article presents a compu...

Authorship Attribution

Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of application. Recent work in “non-traditional” authorship attribution demonstrates the practicality of automatically analyzing documents based on authorial style, but the state of the art is confusing. An...

Journal

Journal title: Algorithms

Year: 2022

ISSN: 1999-4893

DOI: https://doi.org/10.3390/a16010013