Performance of Inverted Indices in Shared-Nothing
Authors
Abstract
The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which variables most strongly influence response time and throughput. This leads to a set of design trade-offs over a range of hardware configurations and new parallel query processing strategies.
Similar resources
Effect of Inverted Index Partitioning Schemes on Performance of Query Processing in Parallel Text Retrieval Systems
Shared-nothing, parallel text retrieval systems require an inverted index, representing a document collection, to be partitioned among a number of processors. In general, the index can be partitioned based on either the terms or documents in the collection, and the way the partitioning is done greatly affects the query processing performance of the parallel system. In this work, we investigate ...
Efficient Query Processing on Term-Based-Partitioned Inverted Indexes
In a shared-nothing, parallel text retrieval system, queries are processed over an inverted index that is partitioned among a number of index servers. In practice, the inverted index is either document-based or term-based partitioned, depending on properties of the underlying hardware infrastructure, query traffic, and some performance and availability constraints. In query processing on term-b...
On the Parallel Implementation of Sparse Matrix Information Retrieval Engine
We demonstrate a parallel implementation of a sparse matrix information retrieval engine. We use a shared nothing PC cluster. We perform our experiments with TREC disk 4 and 5 data, a NIST 2 Gigabytes standard benchmark text collection on 2, 4, 6, 8, 10, 12 and 14 processing nodes with different queries. We compare the results with the results of sequential inverted index, a conventional and co...
Caching and Database Scaling in Distributed Shared-Nothing Information Retrieval Systems
A common class of existing information retrieval system provides access to abstracts. For example Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper this database is studied by using a trace-driven simulation. We focus on physical index design, inverted inde...
Scaling in Distributed Shared-Nothing
A common class of existing information retrieval system provides access to abstracts. For example Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper this database is studied by using a trace-driven simulation. We focus on physical index design, inverted ind...