Slug: A Semantic Web Crawler

Author

Leigh Dodds

Abstract

This paper introduces “Slug”, a web crawler (or “Scutter”) designed for harvesting semantic web content. Implemented in Java using the Jena API, Slug provides a configurable, modular framework that allows a great degree of flexibility in configuring the retrieval, processing and storage of harvested content. The framework provides an RDF vocabulary for describing crawler configurations and collects metadata concerning crawling activity. This crawler metadata supports reporting and analysis of crawling progress, and the HTTP caching data stored with it makes retrieval more efficient.
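
The last point, reusing stored HTTP caching data, can be illustrated with a conditional request. The sketch below is not Slug's code: the class and method names are hypothetical, and it assumes the crawler kept the ETag and Last-Modified values from an earlier fetch of the same URL. It builds the standard If-None-Match and If-Modified-Since request headers and treats a 304 Not Modified response as a signal that the stored copy need not be processed again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Slug: shows how stored HTTP caching
// data can turn a plain GET into a conditional GET.
class ConditionalFetchSketch {

    // Builds validator headers from values remembered after a previous fetch.
    // Either argument may be null when the server did not send that header.
    static Map<String, String> conditionalHeaders(String etag, String lastModified) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (etag != null) {
            headers.put("If-None-Match", etag);
        }
        if (lastModified != null) {
            headers.put("If-Modified-Since", lastModified);
        }
        return headers;
    }

    // 304 Not Modified means the stored copy is still current,
    // so the crawler can skip downloading and re-parsing the document.
    static boolean needsReprocessing(int statusCode) {
        return statusCode != 304;
    }

    public static void main(String[] args) {
        // Example validators; the values are made up for illustration.
        Map<String, String> headers =
                conditionalHeaders("\"v1\"", "Tue, 03 Jan 2006 10:00:00 GMT");
        System.out.println(headers);
        System.out.println(needsReprocessing(304));
    }
}
```

The returned map could be attached to a request with any HTTP client; the sketch deliberately leaves the network call out so that the caching logic stays visible.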


Related resources

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...


Priority based Semantic Web Crawler

The Internet has billions of web pages, and these pages are linked to each other using URLs (Uniform Resource Locators). A web crawler is a main module of a search engine that gathers these documents from the WWW. Most of the web pages on the Internet are active and change periodically. Thus, the crawler must revisit these pages to keep the search engine's database up to date. In this paper, pri...


Ontology Based Approach for Services Information Discovery using Hybrid Self Adaptive Semantic Focused Crawler

Focused crawling aims specifically to search out pages that are relevant to a predefined set of topics. Since an ontology is a well-structured knowledge representation, ontology-based focused crawling approaches have been explored. Crawling is one of the essential techniques for building information repositories. The purpose of a semantic focused crawler is automatically finding, comme...


Search Optimization using Context based Search

Finding meaningful information among the billions of information resources on the web is a tedious task as the popularity of the Internet grows rapidly. The future of the web is a structured semantic web in place of the unstructured information present on the web today. On the semantic web, ontologies are used to assign meaning to the content of the web. The main concern of focused crawling is to retrieve...


Semantic Web in the Automotive Industry: a case study

This paper describes an ongoing case study on the management of engineering data to evaluate the use of Semantic Web technologies in the automotive industry. The case study specifically explores data modeling, navigation and retrieval using the Semantic Web, data presentation in HTML+SVG, and data analysis using neural networks. It implements a web crawler to automatically collect data of inte...



Publication date: 2006
