Practical Top-K Document Retrieval in Reduced Space

Authors

  • Gonzalo Navarro
  • Daniel Valenzuela

Abstract

Supporting top-k document retrieval queries on general text databases, that is, finding the k documents where a given pattern occurs most frequently, has become a topic of interest with practical applications. While the problem has been solved in optimal time and linear space, the actual space usage is a serious concern. In this paper we study various reduced-space structures that support top-k retrieval and propose new alternatives. Our experimental results show that our novel algorithms and data structures dominate almost the entire space/time tradeoff.

∗Partially funded by the Millennium Institute for Cell Dynamics and Biotechnology (ICDB), Grant ICM P05-001F, Mideplan, Chile; and by Fondecyt Grant 1-110066, Chile.
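To make the query concrete, here is a minimal brute-force baseline, not one of the reduced-space structures studied in the paper. It assumes the documents are concatenated with a separator character (`SEP`, assumed absent from every document), builds a generalized suffix array naively, keeps a document array mapping each suffix-array entry to its document, finds the suffix-array range of the pattern by binary search, and counts document identifiers in that range. The names `build_index`, `sa_range`, and `top_k` are hypothetical.

```python
from collections import Counter

SEP = "\x00"  # assumption: this separator never occurs inside a document


def build_index(docs):
    """Naive generalized suffix array plus document array (illustration only)."""
    parts, doc_of_pos = [], []
    for d, doc in enumerate(docs):
        parts.append(doc + SEP)
        doc_of_pos.extend([d] * (len(doc) + 1))  # +1 covers the separator
    text = "".join(parts)
    # O(n^2 log n) construction: fine for a toy example, not for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    da = [doc_of_pos[i] for i in sa]  # document array: DA[i] = doc of SA[i]
    return text, sa, da


def sa_range(text, sa, p):
    """Half-open range [sp, ep) of suffixes that start with pattern p."""
    m = len(p)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    sp, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > p
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return sp, lo


def top_k(index, p, k):
    """Return up to k (doc_id, frequency) pairs, most frequent first."""
    text, sa, da = index
    sp, ep = sa_range(text, sa, p)
    counts = Counter(da[sp:ep])
    # Ties are broken by document id so the output is deterministic.
    return sorted(counts.items(), key=lambda t: (-t[1], t[0]))[:k]


if __name__ == "__main__":
    idx = build_index(["abracadabra", "cabana", "banana bread"])
    print(top_k(idx, "a", 2))
```

Counting over `da[sp:ep]` costs time proportional to the number of occurrences of the pattern rather than to k; the optimal-time solutions mentioned in the abstract avoid that dependence, and the space they need is the concern this paper addresses.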


Similar resources

Space-Efficient Top-k Document Retrieval

Supporting top-k document retrieval queries on general text databases, that is, finding the k documents where a given pattern occurs most frequently, has become a topic of interest with practical applications. While the problem has been solved in optimal time and linear space, the actual space usage is a serious concern. In this paper we study various reduced-space structures that support top-k...


Practical Compressed Document Retrieval

Recent research on document retrieval for general texts has established the virtues of explicitly representing the so-called document array, which stores the document each pointer of the suffix array belongs to. While it makes document retrieval faster, this array occupies a significant amount of redundant space and is not easily compressible. In this paper we present the first practical prop...


Top-k Document Retrieval in External Memory

Let D be a given set of (string) documents of total length n. The top-k document retrieval problem is to index D such that when a pattern P of length p, and a parameter k come as a query, the index returns those k documents which are most relevant to P. Hon et al. [22] proposed a linear space framework to solve this problem in O(p+k log k) time. This query time was improved to O(p+k) by Navarr...


Colored Range Queries and Document Retrieval

Colored range queries are a well-studied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important one-dimensional colored range queries — colored range listing, colored range top-k queries and colored range counting — and, thus, new bounds fo...


Top-k document retrieval in optimal space

We present an index for top-k most frequent document retrieval whose space is |CSA| + o(n) + D log(n/D) + O(D) bits, and its query time is O(log k · log^{2+ε} n) per reported document, where D is the number of documents, n is the sum of lengths of the documents, and |CSA| is the space of the compressed suffix array for the documents. This improves over previous results for this problem, whose space complex...



Journal title:
  • CoRR

Volume: abs/1111.4395

Publication date: 2011