Optimization of Distributed Crawler under Hadoop

Authors
Abstract

Similar resources

DHT-Based Distributed Crawler

A search engine, like Google, is built using two pieces of infrastructure: a crawler that indexes the web and a searcher that uses the index to answer user queries. While Google's crawler has worked well, there is the issue of timeliness and the lack of control given to end-users to direct the crawl according to their interests. The interface presented by such search engines is hence very limite...

A Scalable, Distributed Web-Crawler*

In this paper we present a design and implementation of a scalable, distributed web-crawler. The motivation for the design of such a system is to effectively distribute crawling tasks to different machines in a peer-to-peer distributed network. Such an architecture will lead to scalability and help tame the exponential growth of the crawl space in the World Wide Web. With experiments on the implementation of th...

FoCUS – Forum Crawler Under Supervision

Forum Crawler Under Supervision (FoCUS) is a supervised web-scale forum crawler. The web contains large amounts of data and innumerable websites, which are monitored by a tool or program known as a crawler. The goal is to crawl relevant forum content from the web with minimal overhead. Forums have different layouts or styles and are powered by different forum software packages. They have similar implicit navi...

Distributed Optimization Under Adversarial Nodes

We investigate the vulnerabilities of consensus-based distributed optimization protocols to nodes that deviate from the prescribed update rule (e.g., due to failures or adversarial attacks). We first characterize certain fundamental limitations on the performance of any distributed optimization algorithm in the presence of adversaries. We then propose a resilient distributed optimization algori...

UbiCrawler: a scalable fully distributed Web crawler

We present the design and implementation of UbiCrawler, a scalable distributed web crawler, and we analyze its performance. The main features of UbiCrawler are platform independence, fault tolerance, a very effective assignment function for partitioning the domain to crawl and, more generally, the complete decentralization of every task.

Journal

Journal title: MATEC Web of Conferences

Year: 2015

ISSN: 2261-236X

DOI: 10.1051/matecconf/20152202029