crawler

PDD Crawler: A focused web crawler using link and content analysis for relevance prediction

Journal: CoRR 2014

Prashant Dahiwale, Mukesh M. Raghuwanshi, Latesh G. Malik

The majority of computer and mobile phone enthusiasts use the web for searching. Web search engines handle these searches; the results they return are supplied by a software module known as the web crawler. The size of the web is increasing round the clock. The principal problem is to search this huge database for specific information. To state whethe...

Discovering Land Cover Web Map Services from the Deep Web with JavaScript Invocation Rules

Journal: ISPRS Int. J. Geo-Information 2016

Dongyang Hou, Jun Chen, Hao Wu

Automatic discovery of isolated land cover web map services (LCWMSs) can potentially help in sharing land cover data. Currently, various search engine-based and crawler-based approaches have been developed for finding services dispersed throughout the surface web. In fact, with the prevalence of geospatial web applications, a considerable number of LCWMSs are hidden in JavaScript code, which be...

PDD Crawler: A Focused Web Crawler Using Link and Content Analysis for Relevance Prediction

2014

David C. Wyld, Prashant Dahiwale, M M Raghuwanshi, Latesh Malik

The majority of computer and mobile phone enthusiasts use the web for searching. Web search engines handle these searches; the results they return are supplied by a software module known as the web crawler. The size of the web is increasing round the clock. The principal problem is to search this huge database for specific information. To state whethe...

A Metadata Focused Crawler for Linked Data

2014

Raphael do Vale Amaral Gomes, Marco A. Casanova, Giseli Rabello Lopes, Luiz André P. Paes Leme

The Linked Data best practices recommend publishers of triplesets to use well-known ontologies in the triplication process and to link their triplesets with other triplesets. However, despite the fact that extensive lists of open ontologies and triplesets are available, most publishers typically do not adopt those ontologies and link their triplesets only with popular ones, such as DBpedia and ...

Search Engine-Crawler Symbiosis: Adapting to Community Interests

2003

Gautam Pant, Shannon Bradshaw, Filippo Menczer

Web crawlers have been used for nearly a decade as a search engine component to create and update large collections of documents. Typically the crawler and the rest of the search engine are not closely integrated. If the purpose of a search engine is to have as large a collection as possible to serve the general Web community, a close integration may not be necessary. However, if the search eng...

Collaborative Web Crawler over High-speed Research Network

2006

Shisanu Tongchim, Canasai Kruengkrai, Virach Sornlertlamvanich, Hitoshi Isahara

This paper proposes an idea for constructing a distributed web crawler by utilizing existing high-speed research networks. This is an initial effort of the Web Language Engineering (WLE) project which investigates techniques in processing the languages found in published web documents. In this paper, we focus on designing a geographically distributed web crawler. Multiple crawlers work collabor...

Minimizing the Network Distance in Distributed Web Crawling

2004

Odysseas Papapetrou, George Samaras

Distributed crawling has shown that it can overcome important limitations of the centralized crawling paradigm. However, the distributed nature of current crawlers is not yet fully utilized. The optimal benefits of this approach are usually limited to the sites hosting the crawler. In this work we describe IPMicra, a distributed location-aware web crawler that utilizes an IP a...

The Viúva Negra crawler: an experience report

Journal: Softw., Pract. Exper. 2008

Daniel Gomes, Mário J. Silva

This paper documents hazardous situations on the Web that crawlers must address. This knowledge was accumulated while developing and operating the Viúva Negra (VN) crawler to feed a search engine and a Web archive for the Portuguese Web for four years. The design, implementation and evaluation of the VN crawler are also presented as a case study of a Web crawler design. The case study tested pr...

An Ontology-Based Focused Crawler

2008

Lefteris Kozanidis

In this paper we present a novel approach for building a focused crawler. The goal of our crawler is to effectively identify web pages that relate to a set of predefined topics and download them regardless of their web topology or connectivity with other popular pages on the web. The main challenges that we address in our study concern the following. First we need to be able to effectively iden...

Enhance Crawler: A Dual-Stage Crawler for Efficiently Harvesting Deep Web Interfaces

Journal: International Journal of Computer Applications 2017
