A Novel Architecture of Agent based Crawling for OAI Resources

Author

  • Shruti Sharma
Abstract

Most search engines today compete to index as much of the Surface Web as possible while neglecting OAI content (PDF documents), which holds far more information than the Surface Web. This paper proposes a novel framework for an OAI-PMH based crawler that uses agents to extract metadata about OAI resources and store it in a repository. The repository is later queried through the OAI-PMH layer to generate XML pages containing the metadata. These pages are then added to the search engine's repository for indexing, which in turn increases the relevancy of the search engine. Agents parallelize the whole process so that metadata can be extracted from multiple resources simultaneously.

Keywords: OAI-PMH; Agents; Surface Web; Hidden Web
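The pipeline described in the abstract (parallel agents, a metadata repository, and an OAI-PMH layer that emits XML) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the resource list, the function names `extract_metadata`, `harvest`, and `list_records_xml`, and the use of a thread pool to stand in for agents are all assumptions made for illustration, and the XML is a simplified ListRecords-style page rather than a complete OAI-PMH response.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical resources; a real agent would extract these fields from documents.
RESOURCES = [
    {"id": "oai:example.org:1", "title": "First document", "creator": "A. Author"},
    {"id": "oai:example.org:2", "title": "Second document", "creator": "B. Author"},
]


def extract_metadata(resource):
    """Stand-in for one agent: return (identifier, metadata) for a resource."""
    return resource["id"], {"title": resource["title"], "creator": resource["creator"]}


def harvest(resources, max_agents=4):
    """Run the agents concurrently and collect their results in a repository dict."""
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        return dict(pool.map(extract_metadata, resources))


def list_records_xml(repository):
    """Serialize the repository as a simplified ListRecords-style XML page."""
    root = ET.Element("OAI-PMH", {"xmlns": OAI_NS, "xmlns:dc": DC_NS})
    records = ET.SubElement(root, "ListRecords")
    for identifier, fields in sorted(repository.items()):
        record = ET.SubElement(records, "record")
        header = ET.SubElement(record, "header")
        ET.SubElement(header, "identifier").text = identifier
        metadata = ET.SubElement(record, "metadata")
        for name, value in fields.items():
            # Dublin Core element names (dc:title, dc:creator) are assumed here.
            ET.SubElement(metadata, f"dc:{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(list_records_xml(harvest(RESOURCES)))
```

A thread pool is used only because it is the simplest way to show several extractions running at once; a design with autonomous agents, as the abstract suggests, would likely distribute this work differently.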

Similar resources

Metadata harvesting for content-based distributed information retrieval

We propose an approach to content-based Distributed Information Retrieval based on the periodic and incremental centralization of full-content indices of widely dispersed and autonomously managed document sources. Inspired by the success of the Open Archives Initiative's (OAI) Protocol for Metadata Harvesting, the approach occupies middle ground between content crawling and distributed retrieval...

OAI-P2P: A Peer-to-Peer Network for Open Archives

OAI is designed with a low-barrier technology approach, thus allowing institutions to provide content metadata with little effort. On the other hand, search capabilities are very limited on OAI data providers, and have to be provided by separate service providers. We propose that data providers form a peer-to-peer network which supports distributed search over all connected metadata repositorie...

A novel vedic divider based crypto-hardware for nanocomputing paradigm: An extended perspective

Restoring and non-restoring dividers have become widely applicable in digital computing applications due to their computation speed. In this paper, we propose divider designs of different architectures for computation based on Vedic sutras. The design of the divider in the Vedic mode results in high computation throughput due to its replica architecture, where latency is mini...

Adding eScience Assets to the Data Web

Aggregations of Web resources are increasingly important in scholarship as it adopts new methods that are data-centric, collaborative, and network-based. The same notion of aggregations of resources is common to the mashed-up, socially networked information environment of Web 2.0. We present a mechanism to identify and describe aggregations of Web resources that has resulted from the Open Arc...

Publication date: 2010