Frequency of occurrence of numbers in the World Wide Web

Authors

  • Sergey N. Dorogovtsev
  • Jose F. F. Mendes
  • J. G. Oliveira
Abstract

The distribution of numbers in human documents is determined by a variety of diverse natural and human factors, whose relative significance can be evaluated by studying the numbers’ frequency of occurrence. Although it has been studied since the 1880s, this subject remains poorly understood. Here, we obtain the detailed statistics of numbers in the World Wide Web, finding that their distribution is a heavy-tailed dependence which splits into a set of power-law ones. In particular, we find that the frequency of numbers associated with western calendar years shows an uneven behavior: 2004 represents a ‘singular critical’ point, appearing with a strikingly high frequency; as we move away from it, the decreasing frequency allows us to compare the amounts of existing information on the past and on the future. Moreover, while powers of ten occur extremely often, allowing us to obtain statistics up to the huge 10 , ‘non-round’ numbers occur in a much more limited range, the variations of their frequencies being dramatically different from standard statistical fluctuations. These findings provide a view of the array of numbers used by humans as a highly non-equilibrium and inhomogeneous system, and shed new light on an issue that, once fully investigated, could lead to a better understanding of many sociological and psychological phenomena.
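As a rough illustration of what a power-law dependence of frequency on number means, the sketch below fits a straight line to log-frequency versus log-number by ordinary least squares. It is not the authors' procedure: the function name `fit_power_law`, the synthetic data, and the exponent 1.3 and prefactor 5.0e6 are all assumptions chosen only for demonstration. A log-log regression is also only a quick diagnostic and can be biased on noisy real counts.

```python
import math


def fit_power_law(numbers, frequencies):
    """Fit frequency ~ c * number**(-beta) by least squares in log-log space.

    Hypothetical helper for illustration; returns (beta, c).
    """
    xs = [math.log(n) for n in numbers]
    ys = [math.log(f) for f in frequencies]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope and intercept of the regression line log f = intercept + slope * log n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic, noise-free counts (assumed values, not data from the paper)
numbers = list(range(1, 1001))
frequencies = [5.0e6 * n ** -1.3 for n in numbers]

beta, c = fit_power_law(numbers, frequencies)
print(f"estimated exponent: {beta:.3f}, prefactor: {c:.3g}")
```

On real page counts one would expect separate fits for different classes of numbers (for example round versus non-round), since the abstract reports that the distribution splits into several power laws.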


Similar resources


Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...


The shape of the past in the World Wide Web: Scale-free patterns and dynamics

Human societies accumulate a great deal of information about past events. People make reference to things that happened in time in different ways and record them in multiple media. We have studied the current use of this information by analysing the frequency of occurrence of numbers associated with years in the World Wide Web (WWW). We found a consistent scale-free reduction in the number of w...


Fraudulent Support Telephone Number Identification Based on Co-Occurrence Information on the Web

“Fraudulent support phones” refers to the misleading telephone numbers placed on Web pages or other media that claim to provide services with which they are not associated. Most fraudulent support phone information is found on search engine result pages (SERPs), and such information substantially degrades the search engine user experience. In this paper, we propose an approach to identify fraud...


A Technique for Improving Web Mining using Enhanced Genetic Algorithm

The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines have used conventional methods to retrieve information on the Web; however, their search results can still be refined and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms, which search according to the user interests...



Journal:
  • CoRR

Volume: abs/physics/0504185

Publication date: 2005