Compact Set Representation for Information Retrieval
Abstract
Conjunctive Boolean queries are a fundamental operation in web search engines. These queries can be reduced to the problem of intersecting ordered sets of integers, where each set represents the documents containing one of the query terms. But there is tension between the desire to store the lists effectively, in a compressed form, and the desire to carry out intersection operations efficiently, using non-sequential processing modes. In this paper we evaluate intersection algorithms on compressed sets, comparing them to the best non-sequential array-based intersection algorithms. By adding a simple, low-cost, auxiliary index, we show that compressed storage need not hinder efficient and high-speed intersec-
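To illustrate what a non-sequential, array-based intersection of ordered integer sets can look like, here is a minimal Python sketch. It is not the paper's algorithm: it works on plain uncompressed lists, uses galloping (exponential) search followed by binary search, and does not model the auxiliary index mentioned in the abstract. The function names `gallop_to` and `intersect` are hypothetical.

```python
from bisect import bisect_left


def gallop_to(values, target, start):
    """Return the smallest index i >= start with values[i] >= target.

    Returns len(values) if no such element exists. `values` is assumed
    to be sorted in ascending order.
    """
    n = len(values)
    if start >= n or values[start] >= target:
        return start
    # Invariant: values[lo] < target. Double the step until we overshoot.
    lo = start
    step = 1
    while lo + step < n and values[lo + step] < target:
        lo += step
        step *= 2
    hi = min(lo + step, n)
    # The answer lies in (lo, hi]; finish with a binary search.
    return bisect_left(values, target, lo + 1, hi)


def intersect(short, long):
    """Intersect two ascending lists of distinct document identifiers.

    Each element of the shorter list is searched for in the longer one,
    skipping over runs of the longer list instead of scanning them.
    """
    result = []
    pos = 0
    for doc_id in short:
        pos = gallop_to(long, doc_id, pos)
        if pos == len(long):
            break  # the longer list is exhausted
        if long[pos] == doc_id:
            result.append(doc_id)
            pos += 1
    return result


if __name__ == "__main__":
    postings_a = [4, 13, 40]
    postings_b = [1, 2, 3, 5, 8, 13, 21, 34]
    print(intersect(postings_a, postings_b))
```

The skipping step is what makes random access into the longer list valuable, and it is also what a purely sequential compressed encoding makes awkward. That is the tension the abstract describes.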
Related resources
Compact Features for Detection of Near-Duplicates in Distributed Retrieval
In distributed information retrieval, answers from separate collections are combined into a single result set. However, the collections may overlap. The fact that the collections are distributed means that it is not in general feasible to prune duplicate and near-duplicate documents at index time. In this paper we introduce and analyze the grainy hash vector, a compact document representation t...
Binary Partition Tree as an Efficient Representation for Filtering Segmentation
This paper discusses the interest of Binary Partition Trees as shape-oriented image representations. Binary Partition Trees concentrate in a compact and structured representation a set of meaningful regions that can be extracted from an image. This representation can be used for a large number of processing goals such as filtering, segmentation, information retrieval and visual browsing. Furthermore t...
A new shape retrieval method using the Group delay of the Fourier descriptors
In this paper, we introduce a new way to analyze shape using a new Fourier-based descriptor, the smoothed derivative of the phase of the Fourier descriptors. It is extracted from the complex boundary of the shape and is called the smoothed group delay (SGD). Applying SGD to the Fourier phase descriptors allows a compact representation of the shape boundaries which is robust ...
Discovering Linguistic Dependencies with Graphical Models
Graphical models provide a compact approach to analysing and modeling the interaction between attributes. By exploiting marginal and conditional independence relations, high-dimensional distributions are factorized into a set of distributions over lower dimensional subdomains, allowing for a compact representation and efficient reasoning. In this paper, we motivate the choice of linguistic para...
A Compact FP-Tree for Fast Frequent Pattern Retrieval
Frequent patterns are useful in many data mining problems, including query suggestion. Frequent patterns can be mined through the frequent pattern tree (FP-tree) data structure, which is used to store a compact (or compressed) representation of a transaction database (Han et al., 2000). In this paper, we propose an algorithm to compress a frequent pattern set into a smaller one, and store the set in a...
Binary partition tree as an efficient representation for image processing, segmentation, and information retrieval
This paper discusses the interest of binary partition trees as a region-oriented image representation. Binary partition trees concentrate in a compact and structured representation a set of meaningful regions that can be extracted from an image. They offer a multiscale representation of the image and define a translation invariant 2-connectivity rule among regions. As shown in this paper, this ...
Publication date: 2007