Stylogenetics: Clustering-based stylistic analysis of literary corpora

Authors

  • Kim Luyckx
  • Walter Daelemans
  • Edward Vanhoutte
Abstract

Current advances in shallow parsing allow us to use results from this field in stylogenetic research, so that a new methodology for the automatic analysis of literary texts can be developed. The main pillars of this methodology, which is borrowed from topic detection research, are (i) using more complex features than the simple lexical features suggested by traditional approaches, (ii) using authors or groups of authors as a prediction class, and (iii) using clustering methods to indicate the differences and similarities between authors (i.e. stylogenetics). On the basis of the stylistic genome of authors, we try to cluster them into closely related and meaningful groups. We report on experiments with a literary corpus of five million words consisting of representative samples of female and male authors. Combinations of syntactic, token-based and lexical features constitute a profile that characterizes the style of an author. The stylogenetics methodology opens up new perspectives for literary analysis, enabling and necessitating close cooperation between literary scholars and computational linguists.
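The general idea of profiling authors and then clustering the profiles can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature set, distance measure, or clustering algorithm, so the sketch assumes relative function-word frequencies as a stand-in for the syntactic, token-based and lexical features, cosine distance, and average-linkage agglomerative clustering. The names `FUNCTION_WORDS`, `profile`, `cosine_distance`, `cluster` and the toy `samples` are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

# Assumption: a small list of function words stands in for the paper's
# richer syntactic, token-based and lexical features.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "was", "her", "his"]

def profile(text):
    """Return the relative frequency of each function word as a vector."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def cluster(profiles, k):
    """Average-linkage agglomerative clustering of author profiles into k groups."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(cosine_distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        # Merge the closest pair of clusters (i < j, so deleting j is safe).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy, invented text samples (not from the paper's corpus).
samples = {
    "author_a": "the house of the king and the garden of the queen",
    "author_b": "the end of the road to the sea",
    "author_c": "it was her letter that it was in",
    "author_d": "her voice was low and it was his",
}
profiles = {name: profile(text) for name, text in samples.items()}
groups = cluster(profiles, k=2)
print(groups)
```

On these toy samples, the two authors who favour "the ... of the" constructions end up in one group and the two who favour "it was" and "her" in the other. A realistic experiment would use far longer samples and features derived from a shallow parser.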

Similar resources

Invited talk: Text Analysis and Machine Learning for Stylometrics and Stylogenetics

Automatic Text Categorization, learning to assign documents to specific categories (e.g. in topic assignment or spam filtering), has been an influential application in Natural Language Processing. These systems consist of two components: a first one that constructs representations of documents (mostly bags of words represented as binary or numeric vectors), and a second one that uses standard m...

Linguistic Issues in Language Technology – LiLT

T. S. Eliot’s poem The Waste Land is a notoriously challenging example of modernist poetry, mixing the independent viewpoints of over ten distinct characters without any clear demarcation of which voice is speaking when. In this work, we apply unsupervised techniques in computational stylistics to distinguish the particular styles of these voices, offering a computer’s perspective on longstandi...

The historical composition of the lexicon as a stylistic factor in a text-oriented culture: a case-study from Modern Hebrew

This article studies the relevance of an historical lexical analysis to the stylistic description of Modern Hebrew texts. The examination of the lexical make-up of two distinct genres – administrative language and folksong – reveals a correlation between the social functions of the corpora and their formal characteristics. The administrative corpus reflects the lexical structure of standard Mod...

Authorship identification from unstructured texts

Authorship identification is a task of identifying authors of anonymous texts given examples of the writing of authors. The increasingly large volumes of anonymous texts on the Internet enhance the great yet urgent necessity for authorship identification. It has been applied to more and more practical applications including literary works, intelligence, criminal law, civil law, and computer for...

Stylistic Analysis of a Poetic Text: A Case from Persian

Poetic analysis involves the explication of a poem by focusing on the process of semiosis in it. Through semiosis linguistic meaning is transformed into stylistic meaning. An examination of semiosis brings us to look at the hypersemanticized poetic structures which are none other than the style features of a poem. Since style functions in a literary text by conveying meanings other than literal...

Publication date: 2006