Slug: A Semantic Web Crawler
نویسنده
چکیده
This paper introduces “Slug” a web crawler (or “Scutter”) designed for harvesting semantic web content. Implemented in Java using the Jena API, Slug provides a configurable, modular framework that allows a great degree of flexibility in configuring the retrieval, processing and storage of harvested content. The framework provides an RDF vocabulary for describing crawler configurations and collects metadata concerning crawling activity. Crawler metadata allows for reporting and analysis of crawling progress, as well as more efficient retrieval through the storage of HTTP caching data.
منابع مشابه
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...
متن کاملPriority based Semantic Web Crawler
The Internet has billions of web pages and these web pages are attached to each other using URL(Uniform Resource Allocation). Web crawler is a main module of Search engine that gathers these documents from WWW. Most of the web pages present on Internet are active and changes periodically. Thus, Crawler is required to update these web pages to update database of search engine. In this paper, pri...
متن کاملOntology Based Approach for Services Information Discovery using Hybrid Self Adaptive Semantic Focused Crawler
Focused crawling is aimed at specifically searching out pages that are relevant to a predefined set of topics. Since ontology is an all around framed information representation, ontology based focused crawling methodologies have come into exploration. Crawling is one of the essential systems for building information stockpiles. The reason for semantic focused crawler is naturally finding, comme...
متن کاملSearch Optimization using Context based Search
Finding meaningful information among the billions of information resources on the web is a tedious task as the popularity of Internet is growing rapidly. The future of web is a structured semantic web in place of unstructured information present in the web nowadays. On semantic web, ontology is used to assign meaning to the content of the web. The main concern of focused crawling is to retrieve...
متن کاملSemantic Web in the Automotive Industry: a case study
This paper describes an ongoing case study on management of engineering data to evaluate the use of Semantic Web technologies in automotive industry. This case study explores specifically the theme of data modeling, navigation and retrieval using Semantic Web, data presentation in html+svg and data analysis using neural networks. It implements a web crawler to automatically collect data of inte...
متن کامل