Pastiche Detection Based on Stopword Rankings. Exposing Impersonators of a Romanian Writer

Authors

  • Liviu P. Dinu
  • Vlad Niculae
  • Maria-Octavia Sulea
Abstract

We applied hierarchical clustering using Rank distance, previously used in computational stylometry, to literary texts written by Mateiu Caragiale and by a number of different authors who attempted to impersonate Caragiale after his death, or simply to mimic his style. Their pastiches were consistently clustered apart from the original work, which confirms the performance of the method and suggests extending it from simple authorship attribution to the more complicated problem of pastiche detection. The novelty of our work is the use of frequency rankings of stopwords as features; we show that this idea yields good results for pastiche detection.
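As a rough illustration of the features described in the abstract, the following minimal sketch ranks a fixed list of stopwords by frequency in each text and compares two texts by summing the absolute differences between the ranks. It rests on several assumptions that are not taken from the paper: the stopword list `STOPWORDS` is a made-up toy list, the helper names are hypothetical, tied frequencies share the average of the positions they occupy, and the distance is the simplified form that applies when every text is ranked over the same word list. It is not necessarily the authors' exact formulation of Rank distance.

```python
import re
from collections import Counter

# Hypothetical toy stopword list (assumption: the paper's actual list is not given here).
STOPWORDS = ["si", "de", "la", "in"]


def stopword_ranks(text, stopwords=STOPWORDS):
    """Map each stopword to its frequency rank in `text` (1 = most frequent).

    Tied frequencies share the average of the positions they occupy.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if t in stopwords)
    # Sort by descending frequency; absent stopwords count as 0.
    ordered = sorted(stopwords, key=lambda w: -counts[w])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of words with the same frequency.
        while j < len(ordered) and counts[ordered[j]] == counts[ordered[i]]:
            j += 1
        average_position = (i + 1 + j) / 2  # mean of positions i+1 .. j
        for word in ordered[i:j]:
            ranks[word] = average_position
        i = j
    return ranks


def rank_distance(ranks_a, ranks_b):
    """Sum of absolute rank differences over a shared stopword list."""
    return sum(abs(ranks_a[w] - ranks_b[w]) for w in ranks_a)


def distance_matrix(texts, stopwords=STOPWORDS):
    """Pairwise distances between all texts, as a list of lists."""
    all_ranks = [stopword_ranks(t, stopwords) for t in texts]
    return [[rank_distance(a, b) for b in all_ranks] for a in all_ranks]


if __name__ == "__main__":
    original = "si si si de de la"
    pastiche = "la la la de de si"
    for row in distance_matrix([original, pastiche]):
        print(row)
```

A pairwise distance matrix of this kind is the usual input to an agglomerative clustering routine, which would then group texts whose stopword rankings are close to each other.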

Similar resources

Parameters for Topic Boundary Detection in Multi-Party Dialogues

We present a topic boundary detection method that searches for connections between sequences of utterances in multi-party dialogues. The connections are established based on word identity. We compare our method to a state-of-the-art automatic topic boundary detection method that was also used on multi-party dialogues. We evaluated various methods of preprocessing the data, including stemming, ...

The Tash her Father Wore: World Literature,

This article studies the Turkish-German writer Kemal Kurt’s Ja, sagt Molly (1998) [‘Yes, says Molly’], an ironic meta-fiction to which little critical attention has been paid. Kurt questions the representation of Turks as untutored aspirants to Western culture and challenges the traditional images of exclusion and discrimination. Through a study of his use of pastiche and references to World Li...

Authorship Identification of Romanian Texts with Controversial Paternity

In this work we propose a new strategy for the authorship identification problem and we test it on an example from Romanian literature: did Radu Albala find the continuation of Mateiu Caragiale’s novel “Sub pecetea tainei”, or did he write the continuation himself? The proposed strategy is based on the similarity of rankings of function words; we compare the obtained results with th...

Chaperones and Impersonators: Run-time Support for Contracts on Higher-Order, Stateful Values

Racket’s chaperone and impersonator language constructs provide run-time support for implementing higher-order contracts on mutable structures and abstract datatypes. Using chaperones and impersonators, contracts on mutable data can be enforced without changing the API to that data; contracts on large data structures can be checked lazily on only the accessed parts of the structure; contracts o...

Automatically Building a Stopword List for an Information Retrieval System

Words in a document that are frequently occurring but meaningless in terms of Information Retrieval (IR) are called stopwords. It is repeatedly claimed that stopwords do not contribute towards the context or information of the documents and they should be removed during indexing as well as before querying by an IR system. However, the use of a single fixed stopword list across different documen...

Publication date: 2012