نتایج جستجو برای: crawler

تعداد نتایج: 1856  

Journal: :American Entomologist 2007

2013
Huynh Thi Thanh Binh Ha Minh Long Tran Duc Khanh

A focused crawler traverses the web selecting out relevant pages according to a predefined topic. While browsing the internet it is difficult to identify relevant pages and predict which links lead to high quality pages. This paper proposes a topical crawler for Vietnamese web pages using greedy heuristic and genetic algorithms. Our crawler based on genetic algorithms uses different recombinati...

Journal: :Computer Networks 1998
Junghoo Cho Hector Garcia-Molina Lawrence Page

In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evalu...

2010
Nidhi Tyagi Deepti Gupta

The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Due to the growing and dynamic nature of the web, it has become a challenge to traverse all URLs in the web documents and handle these URLs, so it has become imperative to parallelize a crawling process. The crawler process is further being parallelized in the form ecology of crawler workers that para...

2015
Swapnil V. Patil Sharmila M. Shinde

Focused crawling is aimed at specifically searching out pages that are relevant to a predefined set of topics. Since ontology is an all around framed information representation, ontology based focused crawling methodologies have come into exploration. Crawling is one of the essential systems for building information stockpiles. The reason for semantic focused crawler is naturally finding, comme...

Journal: :Engineering Letters 2006
Manuel Álvarez Juan Raposo Fidel Cacheda Alberto Pan

There is a great amount of valuable information on the web that cannot be accessed by conventional crawler engines. This portion of the web is usually known as the Deep Web or the Hidden Web. Most probably, the information of highest value contained in the deep web, is that behind web forms. In this paper, we describe a prototype hidden-web crawler able to access such content. Our approach is b...

2002
Niran Angkawattanawit Arnon Rungsawang

The rapid growth of the Internet has put us into trouble when we need to find information in such a large network of databases. At present, using topic-specific web crawler becomes a way to seek the needed information. The main characteristic of a topic-specific web crawler is to select and retrieve only relevant web pages in each crawling process. There are many previous researches focusing on...

2016
Steffen Remus Christian Biemann

This work presents a straightforward method for extending or creating in-domain web corpora by focused webcrawling. The focused webcrawler uses statistical N-gram language models to estimate the relatedness of documents and weblinks and needs as input only N-grams or plain texts of a predefined domain and seed URLs as starting points. Two experiments demonstrate that our focused crawler is able...

Journal: :SN Computer Science 2021

Journal: :International Journal of Computer Applications 2013

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید