Zipf’s law outside the middle range
Abstract
Zipf (1949) already noted that the linear relationship he observed between log frequency and log rank is strongest in the middle range: both very high and very low frequency items tend to deviate from the log-log regression line. This paper investigates the causes of such deviations and offers a more detailed statistical model. The subgeometric mean property of frequency counts is introduced and used to prove that the size of the vocabulary tends to infinity as the sample size increases without bound.
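As a rough illustration of the log-log regression line mentioned in the abstract, the following Python sketch ranks token frequencies, fits an ordinary least-squares line to log frequency against log rank over a chosen rank window, and reports each rank's deviation from that line. It is not the statistical model of the paper: the function names (`rank_frequency`, `loglog_fit`, `residuals`) and the `lo`/`hi` window parameters are hypothetical and introduced here only for illustration.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return frequency counts in descending order; index i holds rank i + 1."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_fit(freqs, lo=1, hi=None):
    """Least-squares fit of log f = a + b * log r over ranks lo..hi (inclusive).

    Assumes the window contains at least two ranks and all frequencies are positive.
    """
    hi = len(freqs) if hi is None else hi
    xs = [math.log(r) for r in range(lo, hi + 1)]
    ys = [math.log(freqs[r - 1]) for r in range(lo, hi + 1)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope of the regression line
    a = mean_y - b * mean_x  # intercept
    return a, b


def residuals(freqs, a, b):
    """Deviation of each rank's log frequency from the fitted line."""
    return [math.log(f) - (a + b * math.log(r))
            for r, f in enumerate(freqs, start=1)]
```

Fitting on a middle window only (for example `loglog_fit(freqs, lo=10, hi=1000)`, with the bounds chosen by the user) and then calling `residuals` on the full list is one way to look at how the highest and lowest ranks depart from the line.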
Related resources
Can simple models explain Zipf's law for all exponents?
H. Simon proposed a simple stochastic process for explaining Zipf’s law for word frequencies. Here we introduce two similar generalizations of Simon’s model that cover the same range of exponents as the standard Simon model. The mathematical approach followed minimizes the amount of mathematical background needed for deriving the exponent, compared to previous approaches to the standard Simon’s...
Approximation of the truncated Zeta distribution and Zipf's law
Zipf’s law appears in many application areas but does not have a closed form expression, which may make its use cumbersome. Since it coincides with the truncated version of the Zeta distribution, in this paper we propose three approximate closed form expressions for the truncated Zeta distribution, which may be employed for Zipf’s law as well. The three approximations are based on the replaceme...
A Comparative Analysis of Gibrat’s and Zipf’s Law on Urban Population
The regional economics and geography literature on urban population size has in recent years shown interesting conceptual and methodological contributions on the validity of Gibrat’s Law and Zipf’s Law. Despite distinct modeling features, they express similar fundamental characteristics in an equilibrium situation. Zipf’s law is formalized in a static form, while its associated dynamic process ...
The span of correlations in dolphin whistle sequences
Long-range correlations are found in symbolic sequences from human language, music and DNA. Determining the span of correlations in dolphin whistle sequences is crucial for shedding light on their communicative complexity. Dolphin whistles share various statistical properties with human words, i.e. Zipf’s law for word frequencies (namely that the probability of the ith most frequent word of a t...
Universality of Zipf’s Law
We introduce a simple and generic model that reproduces Zipf’s law. By regarding the time evolution of the model as a random walk in the logarithmic scale, we explain theoretically why this model reproduces Zipf’s law. The explanation shows that the behavior of the model is very robust and universal.