Classification of Author and/or Genre? The Impact of Word Length
Authors
Abstract
190 Russian texts (letters and poems by three different authors) are analyzed with respect to word length. The basic question is whether these texts can be quantitatively classified by authorship or by text sort. Multivariate analyses show that word length is a characteristic of genre rather than of authorship.

1 Word Length and the Quantitative Description of Text(s) and Author(s)

This study focuses on word length. Word length is a central characteristic in the framework of quantitatively oriented linguistics. In fact, the study of word length goes back to a tradition about a hundred years long (for a historical and methodological survey of these studies, cf. Grzybek 2004). Against this historical background, it is evident that word length, as it is studied today, is not an isolated characteristic.

The basic question of the present study is to what degree word length may contribute to the discrimination of authors and genres. An answer to this question will not only shed light on specific factors influencing word length; it will also provide an argument as to whether word length is an appropriate variable for describing an author's individual style or the stylistic traits of specific genres.

The discussion of these questions has a history of its own: as opposed to the field of quantitative typology of texts (cf. Alekseev 1988, Pieper 1979), approaches in the realm of stylometry (cf. Martynenko 1988) assume that the individual style of texts and/or authors can be quantitatively described. Part of this research has concentrated on the question of authorship attribution, particularly applying quantitative methods to decide doubtful cases of authorship (cf. Marusenko 1990). In a way, these approaches have paved the

1 This study has been conducted in the context of research project # 15485 («Word Length Frequencies in Slavic Texts»), financially supported by the Austrian Research Fund (FWF); cf.: http://www-gewi.uni-graz.at/quanta.
2 Within a synergetic approach, word length is closely interrelated with other linguistic levels and units. It is well known that word length interacts, e.g., with the number of phonemes (in a given inventory), with lexicon size (cf. Köhler 1986), with polysemy (cf. Altmann et al. 1982), and with word frequency (Strauss et al. 2004, with a survey of the Zipfian tradition).
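As an illustration of the variable under study, the following is a minimal sketch of how a word length profile could be computed for a single text. It is not the procedure used in the study: the unit of measurement is not stated in this excerpt, so counting letters per word is an assumption made here for simplicity, and the names `word_lengths` and `length_profile` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "word" is a maximal run of Unicode letters, so Cyrillic text
# is handled and punctuation, digits, and underscores act as separators.
# Hyphenated forms are therefore split into separate words.
WORD_RE = re.compile(r"[^\W\d_]+", re.UNICODE)


def word_lengths(text):
    """Return the length in letters of every word in the text, in order."""
    return [len(word) for word in WORD_RE.findall(text)]


def length_profile(text):
    """Summarize a text by word count, mean word length, and length frequencies."""
    lengths = word_lengths(text)
    if not lengths:
        return {"n_words": 0, "mean_length": 0.0, "frequencies": {}}
    frequencies = Counter(lengths)
    return {
        "n_words": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "frequencies": dict(sorted(frequencies.items())),
    }


if __name__ == "__main__":
    sample = "Я помню чудное мгновенье"
    print(length_profile(sample))
```

Profiles of this kind, computed per text, are the sort of input a multivariate classification by author or by text sort would start from; which measures and which classifier are appropriate is a separate question that this sketch does not address.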
Related Resources
The Effect of “Narrow Reading” on Learning Mid-Frequency Vocabulary: The Role of Genre and Author
This study investigated the effect of Narrow Reading (NR) on learning mid-frequency words. The Vocabulary Size Test (VST) designed by Nation and Beglar (2007) was administered as the first pre-test to 196 students, from among whom 91 students whose vocabulary size ranged between 2,100 and 3,500 word families became the target of this study and were randomly c...
Author gender identification from text using Bayesian Random Forest
Nowadays, the heavy use of virtual environments and the connection of users via social networks like Facebook, Instagram, and Twitter make it more necessary than ever to find out shared subjects in this environment. There are several applications that benefit from reliable methods for inferring the age and gender of users in social media. Such applications exist across a wide area of fields,...
Automatic Identification of Music Genre
Nowadays, automatic analysis of music signals has gained a considerable importance due to the growing amount of music data found on the Web. Music genre classification is one of the interesting research areas in music information retrieval systems. In this paper several techniques were implemented and evaluated for music genre classification including feature extraction, feature selection and m...
A Genre Analysis of Persian Research Article Abstracts: Communicative Moves and Author Identity
Most studies within the area of genre analysis, particularly those conducted in Iran, have exclusively used text analysis. While such investigations have led to important understandings of generic features of texts, it can be argued that incorporating interview data for triangulation can lead to better understanding of generic features of texts. Along this line, this paper reports the results o...
Authors, Genre, and Linguistic Convention
Authorship, Language, and Individual Choice
The basic premise underlying authorship attribution studies is that while the form of expression in language is in some respects strictly bound by linguistic rule systems and in others somewhat constrained by topic and genre, it is in some other respects freely available for configuration or preferential choice by author or speaker. This individual va...