CoBWeb - A Crawler for the Brazilian Web
Abstract
One of the key components of current Web search engines is the document collector. This paper describes CoBWeb, an automatic document collector with a distributed and highly scalable architecture. CoBWeb aims to collect a large number of documents per unit of time while observing operational and ethical limits in the crawling process. CoBWeb is part of the SIAM (Information Systems in Mobile Computing Environments) search engine, which is being implemented to support the Brazilian Web. Thus, several results related to the Brazilian Web are presented.
Similar resources
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among which a key technique is focused crawling, which is able to crawl particular topical...
Analyzing new features of infected web content in detection of malicious web pages
Recent improvements in web standards and technologies enable attackers to hide and obfuscate infectious code with new methods and thus escape security filters. In this paper, we study the application of machine learning techniques in detecting malicious web pages. In order to detect malicious web pages, we propose and analyze a novel set of features including HTML, JavaScript (jQuery...
CobWeb: Tailorable, Analyzable Rules for Collaborative Web Use
CobWeb is a collaborative web browsing system that allows the rules governing the interactions of multiple users (the collaboration protocol) to be externally specified and dynamically changed. We explain the architecture of the CobWeb implementation and conclude by showing how the Java classes defining collaboration protocols are generated from visual formal specifications. We note also that though...
CobWeb: Visual Design of Collaboration Protocols for Dynamic Group Web Browsing
CobWeb is a collaborative web browsing system that allows the rules governing the interactions of multiple users (the collaboration protocol) to be externally specified and dynamically changed. We explain the architecture of the CobWeb implementation, and conclude by showing how the Java classes defining collaboration protocols are generated from visual formal specifications. We note also that,...
Envia garciai, a New Genus and Species of Mygalomorph Spiders (Araneae, Microstigmatidae) from Brazilian Amazonia
The genus Envia, comprising only the new species Envia garciai, is proposed. These small mygalomorph spiders were abundantly collected in soil cores and litter samples in primary rain forests near Manaus, Amazonas, Brazil.