Search results for: crawler
Number of results: 1856
This paper describes and gives access to a database of the link structures of 109 UK university and higher education college websites, as created by a specialist information science web crawler in June and July of 2001. With the increasing interest in web links by information and computer scientists, this is an attempt to make available raw data for research that is not reliant upon the opaque t...
The pervasiveness of the Internet makes it an ideal medium for sharing scholarly information. Nowadays, many authors post their publications online so that others can easily access them, increasing the author's impact in his/her research area. In this project, we develop a focused crawler to find publication pages: web pages that link to online, freely available scholarly publications. In c...
In South Africa, electricity is supplied through thousands of kilometers of overhead power cables, which are owned by Eskom, the national energy supplier. Currently, monitoring of these overhead power cables is done by means of helicopter inspection flights and foot patrols, which are infrequent and expensive. In this paper, the authors present the design of a prototype power line crawler (inspec...
A focused crawler traverses the web, selecting pages relevant to a predefined topic and ignoring those that are not. While surfing the internet, it is difficult to deal with irrelevant pages and to predict which links lead to quality pages. In this paper, a technique for effective focused crawling is implemented to improve the quality of web navigation. To check the similarity of web pages...
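The abstract breaks off before naming its similarity measure, so the following is only a minimal sketch of one common choice. It assumes cosine similarity over bag-of-words term frequencies, used to decide whether a page is relevant to a topic description. The function names and the `threshold` value of 0.2 are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens (a simple bag of words)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(page_text: str, topic: str, threshold: float = 0.2) -> bool:
    """Hypothetical relevance test: keep the page if its similarity to the topic meets the threshold."""
    return cosine_similarity(term_vector(page_text), term_vector(topic)) >= threshold


topic = "focused web crawler topic relevance"
print(is_relevant("A focused crawler downloads pages relevant to a topic.", topic))  # True
print(is_relevant("Recipes for baking sourdough bread at home.", topic))             # False
```

Exact token matching is deliberately crude: "relevant" and "relevance" do not match here. A real system would likely add stemming, stop-word removal, or TF-IDF weighting.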
High-performance web crawlers are an important component of many web services. For example, search services use web crawlers to populate their indices, comparison shopping engines use them to collect product and pricing information from online vendors, and the Internet Archive uses them to record a history of the Internet. The design of a high-performance crawler poses many challenges, both tec...
This paper discusses a modular and open-source focused crawler (ILSP-FC) for the automatic acquisition of domain-specific monolingual and bilingual corpora from the Web. Besides describing the main modules integrated in the crawler (dealing with page fetching, normalization, cleaning, text classification, de-duplication and document pair detection), we evaluate several of the system functionalit...
The Web provides us with a huge and endless resource of information. But the rapidly growing size of the Web poses a great challenge for general-purpose crawlers and search engines. It is impossible for any search engine to index the whole Web. A focused crawler collects domain-relevant pages from the Web by avoiding the irrelevant portion of the Web. A focused crawler can help the search engine to...
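As a hedged illustration of how a focused crawler "avoids the irrelevant portion of the Web", the sketch below runs a best-first crawl over a small in-memory link graph instead of the live Web. Several choices here are assumptions, not taken from the abstract:

- A link inherits the relevance score of the page it was found on.
- Links out of pages that score below `threshold` are never followed.

The names `demo_web` and `keyword_score` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
Web = Dict[str, Tuple[str, List[str]]]


def focused_crawl(web: Web, seeds: List[str], score: Callable[[str], float],
                  threshold: float, max_pages: int) -> List[str]:
    """Best-first crawl: always expand the frontier URL with the highest priority.

    A link's priority is the relevance score of the page it was found on
    (an assumed heuristic). Pages scoring below `threshold` are fetched but
    not collected, and their links are not followed.
    """
    frontier: List[Tuple[float, str]] = [(-1.0, url) for url in seeds]  # max-heap via negated priority
    heapq.heapify(frontier)
    seen = set(seeds)
    collected: List[str] = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # dead link in the toy graph
        text, links = web[url]
        fetched += 1
        relevance = score(text)
        if relevance < threshold:
            continue  # prune: do not follow links out of off-topic pages
        collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance, link))
    return collected


def keyword_score(text: str) -> float:
    """Hypothetical scorer: fraction of words that are topic keywords."""
    words = text.split()
    return sum(w in {"crawler", "topic"} for w in words) / len(words) if words else 0.0


demo_web: Web = {
    "a": ("crawler topic", ["b", "c"]),
    "b": ("crawler", ["d"]),
    "c": ("cooking", ["e"]),  # off-topic: its link to "e" is never followed
    "d": ("crawler topic", []),
    "e": ("crawler topic", []),
}
print(focused_crawl(demo_web, ["a"], keyword_score, threshold=0.5, max_pages=10))  # ['a', 'b', 'd']
```

Page "e" is on-topic but is only reachable through the off-topic page "c", so this pruning rule misses it. That is the usual trade-off when a focused crawler cuts off irrelevant regions of the graph.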
The World Wide Web is a large, global repository of text documents, images, multimedia and much other information, referred to as information resources. A large amount of new information is posted on the Web every day. A web crawler is a program that fetches information from the World Wide Web in an automated manner. The crawler keeps visiting pages after the collection reaches its target size,...
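The abstract is truncated, so the sketch below only illustrates, under stated assumptions, the two behaviours it names. It assumes the first (automated fetching) is a breadth-first crawl that stops adding pages at a target size. It assumes the second (continued visits once the collection is full) is a refresh pass that re-fetches stored pages to pick up changes. The `fetch` callable is a hypothetical stand-in for an HTTP client. `crawl_to_target` and `refresh` are illustrative names, not the paper's design.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an HTTP client: URL -> (page content, outgoing links).
Fetch = Callable[[str], Tuple[str, List[str]]]


def crawl_to_target(seeds: List[str], fetch: Fetch, target_size: int) -> Dict[str, str]:
    """Breadth-first crawl that stops adding new pages once `target_size` is reached."""
    queue = deque(seeds)
    seen = set(seeds)
    collection: Dict[str, str] = {}
    while queue and len(collection) < target_size:
        url = queue.popleft()
        content, links = fetch(url)
        collection[url] = content
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collection


def refresh(collection: Dict[str, str], fetch: Fetch) -> int:
    """One maintenance pass: re-fetch every stored page and return how many changed."""
    changed = 0
    for url in list(collection):
        content, _ = fetch(url)
        if content != collection[url]:
            collection[url] = content
            changed += 1
    return changed


# Toy usage with an in-memory site instead of real HTTP requests.
pages = {"s": ("v1", ["x", "y"]), "x": ("v1", ["z"]), "y": ("v1", []), "z": ("v1", [])}
store = crawl_to_target(["s"], lambda u: pages[u], target_size=3)
print(sorted(store))                      # ['s', 'x', 'y']
pages["x"] = ("v2", ["z"])                # the page changes after the initial crawl
print(refresh(store, lambda u: pages[u]))  # 1
```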
The World Wide Web consists of more than 50 billion pages online. It is highly dynamic [6], i.e. the web continuously introduces new capabilities and attracts many people. Due to this explosion in size, an effective information retrieval system or search engine can be used to access the information. In this paper, we propose the EPOW (Effective Performance of WebCrawler) architecture. It is a ...
Forum Crawler Under Supervision (FoCUS) is a supervised web-scale forum crawler. The web contains a large amount of data and innumerable websites that are monitored by a tool or program known as a crawler. The goal is to crawl relevant forum content from the web with minimal overhead. Forums have different layouts or styles and are powered by different forum software packages. They have similar implicit navi...