Character Distributions of Classical Chinese Literary Texts: Zipf's Law, Genres, and Epochs

Authors

  • Chao-Lin Liu
  • Shuhua Zhang
  • Yuanli Geng
  • Huei-ling Lai
  • Hongsu Wang
Abstract

We collect 14 representative corpora covering major periods in Chinese history. These corpora include poetic works produced in several dynasties, novels of the Ming and Qing dynasties, and essays and news reports written in modern Chinese. The corpora span the period from 1046 BCE to 2007 CE. We analyze their character and word distributions from the viewpoint of Zipf's law and look for factors that affect the deviations and similarities between their Zipfian curves. Both genre and epoch show an influence in our analyses. Specifically, the character distributions of poetic works written between 618 CE and 1644 CE exhibit striking similarity. In addition, although texts of the same dynasty tend to use the same set of characters, their character distributions still deviate from each other.
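As a rough illustration of the kind of analysis the abstract describes, the sketch below counts character frequencies in a text, ranks them, and estimates the slope of the rank-frequency curve on a log-log scale. It is not the authors' method: the function names `rank_frequency` and `zipf_slope` are hypothetical, punctuation filtering by Unicode category is an assumption, and an ordinary least-squares fit stands in for whatever fitting procedure the paper uses.

```python
import math
import unicodedata
from collections import Counter


def rank_frequency(text):
    """Return character counts in descending order (rank 1 first).

    Whitespace and punctuation are skipped; this filtering rule is an
    assumption made for the sketch, not something taken from the paper.
    """
    counts = Counter(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )
    return sorted(counts.values(), reverse=True)


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    A slope close to -1 corresponds to the classic form of Zipf's law.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two distinct characters to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Calling `zipf_slope(rank_frequency(corpus_text))` on each corpus would give one number per corpus to compare across genres and epochs, although a single slope hides much of the curve's shape, which is why such studies usually also compare the full curves.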


Similar resources

Zipf's Law in Literature

We present in this paper a numerical investigation of literary texts by various well-known English writers, covering the first half of the twentieth century, based upon the results obtained through corpus analysis of the texts. A fractal power law is obtained for the lexical wealth defined as the ratio between the number of different words and the total number of words of a given text. By consi...


Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution

BACKGROUND: Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, ...) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random...


Character Networks and Book Genre Classification

We compare the social character networks of biographical, legendary and fictional texts, in search of statistical marks of historical information. We examine the frequency of character appearance and find a Zipf Law that does not depend on the literary genera and historical content. We also examine global and local complex networks indexes, in particular, correlation plots between the recently ...


Zipf's law against the text size: a half-rational model

In this article, we consider Zipf-Mandelbrot law as applied to texts in natural languages. We present a simple model of dependence of the law on the text size, which is featured by variable power-law tail and constant ratio of the most frequent words. As a result we derive several closed formulas, which accord with empirical data qualitatively and partially quantitatively. For example, there ap...


Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts

Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with dif...



Journal:
  • CoRR

Volume abs/1709.05587

Publication year 2017