Adaptive Query-Based Sampling of Distributed Collections
Authors
Abstract
As part of a Distributed Information Retrieval system, a description of each remote information resource, archive or repository is usually stored centrally in order to facilitate resource selection. The acquisition of precise resource descriptions is therefore an important phase in Distributed Information Retrieval, as the quality of such representations will affect selection accuracy and, ultimately, retrieval performance. While Query-Based Sampling is currently used for content discovery of uncooperative resources, the application of this technique depends on heuristic guidelines to determine when a sufficiently accurate representation of each remote resource has been obtained. In this paper we address this shortcoming by using the Predictive Likelihood to indicate both the quality of an acquired resource description estimate and the point at which a sufficiently good representation of a resource has been obtained during Query-Based Sampling.
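The stopping idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the add-mu smoothed unigram model, the `reference_terms` list standing in for whatever text the Predictive Likelihood is measured against, the `epsilon` and `patience` thresholds, and the local `search` function that plays the role of a remote search interface are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def search(collection, term, k=4):
    """Hypothetical stand-in for a remote search interface:
    return the ids of the first k documents that contain `term`."""
    return [i for i, doc in enumerate(collection) if term in doc][:k]


def predictive_likelihood(counts, reference_terms, vocab_size, mu=1.0):
    """Mean log-probability of the reference terms under an add-mu
    smoothed unigram model built from the sampled term counts.
    (Assumed form; the abstract does not specify the estimator.)"""
    denom = sum(counts.values()) + mu * vocab_size
    log_probs = [math.log((counts[t] + mu) / denom) for t in reference_terms]
    return sum(log_probs) / len(log_probs)


def adaptive_qbs(collection, seed_term, reference_terms, vocab_size,
                 epsilon=0.01, patience=3, max_queries=200, rng=None):
    """Query-based sampling that stops once the predictive likelihood
    has changed by less than `epsilon` for `patience` consecutive queries."""
    rng = rng or random.Random(0)
    counts = Counter()   # resource description estimate (term frequencies)
    seen = set()         # ids of documents already sampled
    history = []         # predictive likelihood after each query
    term = seed_term
    for _ in range(max_queries):
        for doc_id in search(collection, term):
            if doc_id not in seen:
                seen.add(doc_id)
                counts.update(collection[doc_id])
        history.append(predictive_likelihood(counts, reference_terms, vocab_size))
        if len(history) > patience and all(
            abs(history[-i] - history[-i - 1]) < epsilon
            for i in range(1, patience + 1)
        ):
            break
        # Next query term is drawn from the vocabulary learned so far.
        term = rng.choice(sorted(counts)) if counts else seed_term
    return counts, seen, history


if __name__ == "__main__":
    docs = [["apple", "banana"], ["banana", "cherry"],
            ["cherry", "date"], ["date", "apple", "fig"]]
    counts, seen, history = adaptive_qbs(
        docs, "apple", ["banana", "cherry", "fig"], vocab_size=10)
    print(len(seen), "documents sampled after", len(history), "queries")
```

The `patience` window is a design choice in this sketch: a single query that happens to return no new documents leaves the estimate unchanged, so requiring several consecutive small changes avoids stopping on one unlucky query.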
Similar resources
Obtaining Language Models of Web Collections Using Query-Based Sampling Techniques
In the context of information retrieval, traditional collection selection algorithms have been widely studied. These algorithms utilize language models, a representation of the contents of each text collection over which selection is to be performed, but these language models cannot always be easily acquired. Query-based sampling is a technique by which these language models are discovered by i...
Sample Sizes for Query Probing in Uncooperative Distributed Information Retrieval
The goal of distributed information retrieval is to support effective searching over multiple document collections. For efficiency, queries should be routed to only those collections that are likely to contain relevant documents, so it is necessary to first obtain information about the content of the target collections. In an uncooperative environment, query probing — where randomly-chosen quer...
The Effects of Query-Based Sampling on Automatic Database Selection Algorithms. Keywords: Distributed Collections, Merging Search Results/Information Synthesis, Database Selection
Database selection algorithms need to know the subject areas covered by each text database, but this metadata can be difficult to acquire in multi-party environments, such as the Internet, where each party has different interests and capabilities. Query-based sampling is a relatively new technique in which metadata is inferred by interacting with each text database and observing the outcomes. Quer...
Query-driven Adaptive Term Set Search in Large Peer-to-peer Textual Collections
Most search mechanisms in Distributed Hash Table based peer-to-peer systems depend on multiple single-keyword search operations. This increases the traffic cost and gives poor accuracy. Pre-computing a term-set-based index can reduce the cost but requires an exponentially growing index size. Based on the observations made, queries are usually short and the users have limi...
A Clustered Index Approach to Distributed XPath
Supporting top-k queries over distributed collections of schemaless XML data poses two challenges. While XML supports expressive query languages such as XPath and XQuery, these languages require schema knowledge so as to write an appropriate query which may not be available in distributed systems with autonomous and dynamic sources. Thus, there is a need for approximate query processing. Furthe...