A Novel Architecture of Agent based Crawling for OAI Resources
نویسنده
چکیده
Nowadays, most of the search engines are competing to index as much of the Surface Web as possible with leaving a lurch at the OAI content (pdf documents), which holds a huge amount of information than surface web. In this paper, a novel framework for OAI-PMH based Crawler is being proposed that uses agents to extract the metadata about the OAI resources and store them in a repository which is later on queried through the OAI-PMH layer to generate the XML pages containing the metadata. These pages are further added to the search engines repository for indexing that makes in turn increases the relevancy of Search Engine. Agents are being used to parallelize the whole process so that metadata extraction from multiple resources can be carried out simultaneously. Keywords-OAI-PMH; Agents; Surface web;Hidden Web
منابع مشابه
Metadata harvesting for content-based distributed information retrieval
We propose an approach to content-based Distributed Information Retrieval based on the periodic and incremental centralization of full-content indices of widely dispersed and autonomously managed document sources. Inspired by the success of the Open Archive Initiative’s (OAI) Protocol for metadata harvesting, the approach occupies middle ground between content crawling and distributed retrieval...
متن کاملOAI-P2P: A Peer-to-Peer Network for Open Archives
OAI is designed with a low-barrier technology approach, thus allowing institutions to provide content metadata with little effort. On the other hand, search capabilities are very limited on OAI data providers, and have to be provided by separate service providers. We propose that data providers form a peer-to-peer network which supports distributed search over all connected metadata repositorie...
متن کاملA novel vedic divider based crypto-hardware for nanocomputing paradigm: An extended perspective
Restoring and non-restoring divider has become widely applicability in the era of digital computing application due to its computation speed. In this paper, we have proposed the design of divider of different architecture for the computation of Vedic sutra based. The design of divider in the Vedic mode results in high computation throughput due to its replica architecture, where latency is mini...
متن کاملAdding eScience Assets to the Data Web
Aggregations of Web resources are increasingly important in scholarship as it adopts new methods that are data-centric, collaborative, and networked-based. The same notion of aggregations of resources is common to the mashed-up, socially networked information environment of Web 2.0. We present a mechanism to identify and describe aggregations of Web resources that has resulted from the Open Arc...
متن کاملA novel vedic divider based crypto-hardware for nanocomputing paradigm: An extended perspective
Restoring and non-restoring divider has become widely applicability in the era of digital computing application due to its computation speed. In this paper, we have proposed the design of divider of different architecture for the computation of Vedic sutra based. The design of divider in the Vedic mode results in high computation throughput due to its replica architecture, where latency is mini...
متن کامل