Character Distributions of Classical Chinese Literary Texts: Zipf's Law, Genres, and Epochs
Authors
Abstract
In this study, we collect 14 representative corpora covering major periods in Chinese history. These corpora include poetic works produced in several dynasties, novels of the Ming and Qing dynasties, and essays and news reports written in modern Chinese. Together they span the period from 1046 BCE to 2007 CE. We analyze their character and word distributions from the viewpoint of Zipf's law, and look for factors that explain the deviations and similarities between their Zipfian curves. Both genre and epoch show an influence in our analyses. Specifically, the character distributions of poetic works written between 618 CE and 1644 CE exhibit striking similarity. In addition, although texts of the same dynasty may tend to use the same set of characters, their character distributions still deviate from each other.
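As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch of a character-level rank-frequency computation with a log-log slope estimate. The function names `rank_frequency` and `zipf_slope` are hypothetical, and the ordinary least-squares fit is an assumption made for this example; the abstract does not say how the authors fitted or compared their Zipfian curves.

```python
import math
import unicodedata
from collections import Counter


def rank_frequency(text):
    """Return character counts sorted from most to least frequent.

    Only characters in a Unicode letter category are counted (CJK
    ideographs are "Lo"), so whitespace and punctuation are ignored.
    This filter is an assumption of the sketch, not the paper's rule.
    """
    counts = Counter(
        ch for ch in text if unicodedata.category(ch).startswith("L")
    )
    return sorted(counts.values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    An ideal Zipfian distribution, f(r) = C / r, gives a slope of -1.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy input only; a real corpus would be far larger.
    sample = "春眠不覺曉，處處聞啼鳥。"
    freqs = rank_frequency(sample)
    print(freqs, zipf_slope(freqs))
```

Comparing the slopes, or the full rank-frequency curves, obtained from corpora of different genres and periods is one simple way to look for the similarities and deviations the abstract mentions.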
Similar resources
Zipf's Law in Literature
We present in this paper a numerical investigation of literary texts by various well-known English writers, covering the first half of the twentieth century, based upon the results obtained through corpus analysis of the texts. A fractal power law is obtained for the lexical wealth defined as the ratio between the number of different words and the total number of words of a given text. By consi...
Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution
Background: Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, ...) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random...
Character Networks and Book Genre Classification
We compare the social character networks of biographical, legendary and fictional texts, in search of statistical marks of historical information. We examine the frequency of character appearance and find a Zipf Law that does not depend on the literary genera and historical content. We also examine global and local complex networks indexes, in particular, correlation plots between the recently ...
Zipf's law against the text size: a half-rational model
In this article, we consider Zipf-Mandelbrot law as applied to texts in natural languages. We present a simple model of dependence of the law on the text size, which is featured by variable power-law tail and constant ratio of the most frequent words. As a result we derive several closed formulas, which accord with empirical data qualitatively and partially quantitatively. For example, there ap...
Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts
Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with dif...
Journal title:
- CoRR
Volume: abs/1709.05587; Issue:
Pages: -
Publication date: 2017