A Novel Architecture of Mercator: A Scalable, Extensible Web Crawler with Focused Web Crawler

Authors

  • Sarnam Singh
  • Nidhi Tyagi
Abstract

This paper describes a novel architecture that combines Mercator, a scalable, extensible web crawler, with a focused web crawler. We enumerate the major components of any scalable, focused web crawler and describe the particular components used in this architecture. We also describe the architecture's support for extensibility and for downloading the information that users need. We explain how the focused web crawler component integrates with Mercator, what each component does, and how the components work together. Finally, we describe how the architecture downloads the maximum number of pages from the web in the minimum time while selectively extracting the web pages that users need. Full Text: http://www.ijcsmc.com/docs/papers/June2013/V2I6201372.pdf

Similar resources

Mercator as a web crawler

Mercator is described as a scalable, extensible web crawler written entirely in Java. Web crawlers must be scalable, and they are an important component of many web services, but their design is not well documented in the literature. In this paper, we enumerate the major components of any scalable web crawler, comment on alternatives and tradeoffs in their design, and describe t...

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. An unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...

Tarantula - A Scalable and Extensible Web Spider

Web crawlers today suffer from poor navigation techniques, which reduce their scalability while crawling the World Wide Web (WWW). In this paper, we present a web crawler named Tarantula that is scalable and fully configurable. Work on the Tarantula project started with the aim of making a simple, elegant, and yet efficient web crawler offering better crawling strategies while walking throu...

The Architecture and Implementation of an Extensible Web Crawler

Many Web services operate their own Web crawlers to discover data of interest, despite the fact that large-scale, timely crawling is complex, operationally intensive, and expensive. In this paper, we introduce the extensible crawler, a service that crawls the Web on behalf of its many client applications. Clients inject filters into the extensible crawler; the crawler evaluates all received filt...

Building a Peer-to-Peer, domain specific web crawler

The introduction of a crawler in the mid-90s opened the floodgates for research in various application domains. Many attempts to create an ideal crawler failed due to the explosive nature of the web. In this paper, we describe the building blocks of PeerCrawl, a Peer-to-Peer web crawler. This crawler can be used for generic crawling, is easily scalable, and can be implemented on a grid of day-to-day ...

Publication date: 2013