Mercator as a web crawler
Author
Abstract
This paper describes Mercator, a scalable, extensible web crawler written entirely in Java. Scalable web crawlers are an important component of many web services, but their design is not well documented in the literature. In this paper, we enumerate the major components of any scalable web crawler, comment on alternatives and tradeoffs in their design, and describe the particular components used in Mercator. We also describe Mercator’s support for extensibility and customizability. Finally, we comment on Mercator’s performance, which we have found to be efficient and comparable to that of other craw-
Similar resources
A Novel Architecture of Mercator: A Scalable, Extensible Web Crawler with Focused Web Crawler
This paper describes a novel architecture of Mercator: a scalable, extensible web crawler combined with a focused web crawler. We enumerate the major components of any scalable and focused web crawler and describe the particular components used in this novel architecture. We also describe the architecture's support for extensibility and for downloaded user support information. We also describe how the ...
Using High Performance Systems to Build Collections for a Digital Library
Nothing is more distributed than the Web, with its content spread across thousands of servers. High-performance hardware and software are essential for effective download, analysis, and organization of this content. We describe our experience with a highly parallel Web crawling system (Mercator) to construct – automatically – collections of scientific resources for the National Science Digita...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task, and an unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...
Implications of Web Mercator and Its Use in Online Mapping
Online interactive maps have become a popular means of communicating with spatial data. In most online mapping systems, Web Mercator has become the dominant projection. While the Mercator projection has a long history of discussion about its inappropriateness for general-purpose mapping, particularly at the global scale, and seems to have been virtually phased out for general-purpose global-sca...
Survey on – Self Adaptive Focused Crawler
A focused crawler may be described as a crawler that returns relevant web pages on a given topic while traversing the web. Web crawlers are one of the most crucial parts of search engines, collecting pages from the Web. Building a web crawler that downloads the most relevant web pages from such a large web is still a major challenge in the field of information retrieval systems. Most Web ...