A Space-Efficient Indexing Algorithm for Boolean Query Processing
Abstract
Boolean search is the most common information retrieval technique; it requires users to enter the exact words they are looking for into a text field. Existing algorithms answer Boolean search queries with inverted-list indexing schemes. These approaches produce large indexes because the inverted lists are highly overlapping and redundant. In this paper, we propose a novel approach that reduces the size of the inverted lists while retaining time efficiency. Our solution is based on merging inverted lists that overlap heavily with one another. A new algorithm is designed to discover overlapping inverted lists and construct a condensed index for a given dataset. We conduct extensive experiments on several publicly available datasets. The proposed algorithm delivers a considerable improvement in space usage while incurring tolerable time-performance penalties.
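The idea of merging overlapping inverted lists can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not specify; it is a minimal illustration under stated assumptions: overlap is measured with Jaccard similarity, lists are merged greedily against a hypothetical `threshold`, and AND queries verify candidates against the document text to filter out false positives introduced by merging. All function names here are hypothetical.

```python
def build_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index.setdefault(term, []).append(doc_id)
    return index


def jaccard(a, b):
    """Jaccard similarity of two posting lists (assumed overlap measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def condense(index, threshold=0.5):
    """Greedily merge each term's list into the first group it overlaps with.

    Returns (term_to_group, group_lists). Several terms may share one
    merged list, which is where the space saving comes from.
    """
    term_to_group = {}
    group_lists = []
    for term in sorted(index):
        postings = index[term]
        for gid, merged in enumerate(group_lists):
            if jaccard(postings, merged) >= threshold:
                group_lists[gid] = sorted(set(merged) | set(postings))
                term_to_group[term] = gid
                break
        else:
            term_to_group[term] = len(group_lists)
            group_lists.append(list(postings))
    return term_to_group, group_lists


def and_query(terms, term_to_group, group_lists, docs):
    """Intersect the merged lists, then verify each candidate document."""
    candidates = None
    for term in terms:
        gid = term_to_group.get(term)
        if gid is None:
            return []
        postings = set(group_lists[gid])
        candidates = postings if candidates is None else candidates & postings
    if candidates is None:
        return []
    # Merged lists are supersets, so candidates must be checked against the text.
    return sorted(
        d for d in candidates
        if all(t in docs[d].lower().split() for t in terms)
    )


docs = [
    "boolean query processing",
    "boolean query index",
    "spatial index",
    "boolean query",
]
term_to_group, group_lists = condense(build_index(docs))
print(len(group_lists), "merged lists for", len(term_to_group), "terms")
print(and_query(["boolean", "index"], term_to_group, group_lists, docs))
```

The verification step in `and_query` is the time cost paid for the smaller index in this sketch: a merged list may contain documents that lack a given term, so each candidate is rechecked before being returned.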
Similar resources
Efficient processing of top-k spatial Boolean queries for distributed system
The widening use of spatial data imposes a heavy load on databases and increases data sizes. Nowadays spatial data is a part of smart computing, because location is largely responsible for user activities. In this paper we focus on reviewing a more efficient system for processing spatial data in a distributed system by implementing a combined approach to geospatial indexing and query formatting. Previous re...
Metric Indexing for the Vector Model in Text Retrieval
In the area of text retrieval, processing a query in the vector model has been verified to be qualitatively more effective than searching in the Boolean model. However, in the classic vector model the current methods of processing many-term queries are inefficient, and in the LSI model there is no efficient method for processing even few-term queries. In this paper we pr...
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
Querying Mobile Objects in Spatio-Temporal Databases
In dynamic spatio-temporal environments where objects may continuously move in space, maintaining consistent information about the location of objects and processing motion-specific queries is a challenging problem. In this paper, we focus on indexing and query processing techniques for mobile objects. Specifically, we develop a classification of different types of selection queries that arise ...