Memory Efficient Ranking

Authors

Alistair Moffat

Justin Zobel

Ron Sacks-Davis

Abstract

Fast and effective ranking of a collection of documents with respect to a query requires several structures, including a vocabulary, inverted file entries, arrays of term weights and document lengths, an array of partial similarity accumulators, and address tables for inverted file entries and documents. Of all of these structures, the array of document lengths and the array of accumulators are the components accessed most frequently in a ranked query, and it is crucial to acceptable performance that they be held in main memory. Here we describe an approximate ranking process that makes use of a compact array of in-memory low precision approximations for the lengths. Combined with another simple rule for reducing the memory required by the partial similarity accumulators, the approximation heuristic allows the ranking of large document collections using less than one byte of memory per document, an eight-fold reduction compared with the space required by conventional techniques. Moreover, in our experiments retrieval effectiveness was unaffected by the use of these heuristics.
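As a rough illustration of the idea of keeping only low-precision approximations of document lengths in memory, the sketch below maps each length to a small integer code on a geometric scale and stores one code per document. The abstract does not specify the encoding, so the geometric bucketing, the function name `make_quantizer`, and the parameters `min_len`, `max_len`, and `bits` are all assumptions made for this example, not the authors' method.

```python
import math


def make_quantizer(min_len, max_len, bits):
    """Build (encode, decode) for lengths in [min_len, max_len].

    Hypothetical helper: lengths are assigned to 2**bits geometric buckets,
    so the relative error of a decoded length is bounded by the bucket width.
    """
    if not (0 < min_len < max_len):
        raise ValueError("need 0 < min_len < max_len")
    levels = 1 << bits
    # Bucket ratio chosen so that `levels` buckets span [min_len, max_len].
    base = (max_len / min_len) ** (1.0 / levels)

    def encode(length):
        code = int(math.log(length / min_len) / math.log(base))
        # Clamp so that max_len (and rounding noise) stays in range.
        return min(max(code, 0), levels - 1)

    def decode(code):
        # Geometric midpoint of the bucket.
        return min_len * base ** (code + 0.5)

    return encode, decode


def compress_lengths(lengths, bits=6):
    """Store one small code per document (one byte each here, for simplicity)."""
    encode, decode = make_quantizer(min(lengths), max(lengths), bits)
    codes = bytearray(encode(x) for x in lengths)
    return codes, decode


if __name__ == "__main__":
    doc_lengths = [12.0, 48.5, 7.3, 310.0, 95.2]
    codes, decode = compress_lengths(doc_lengths, bits=6)
    for exact, code in zip(doc_lengths, codes):
        print(exact, code, round(decode(code), 2))
```

With 6-bit codes this sketch still spends a full byte per document; packing several codes into each byte would be needed to go below that, and is left out here to keep the example short.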


Related papers

Ranking efficient DMUs using the infinity norm and virtual inefficient DMU in DEA

In many applications, ranking decision making units (DMUs) is a difficult technical task for decision makers in data envelopment analysis (DEA), especially when there are extremely efficient DMUs. In such cases, many DEA models may assign the same efficiency score to different DMUs. Hence, there is growing interest in ranking techniques. The purpose of this paper is ra...


Ranking efficient DMUs using minimizing distance in DEA


Ranking Efficient DMUs Using the Ideal point and Norms

This paper presents two simple methods for ranking efficient DMUs in DEA models; both add one virtual DMU as an ideal DMU and use the additive model. Note that we use an ideal point only for comparison with efficient DMUs. Although these methods are simple, they can rank all efficient DMUs, extreme points and others alike; they are also capable of ranking t...


Ranking Efficient Decision Making Units in Data Envelopment Analysis based on Changing Reference Set

One of the drawbacks of Data Envelopment Analysis (DEA) is the lack of discrimination among efficient Decision Making Units (DMUs). One method for removing this difficulty, called changing the reference set, was proposed by Jahanshahloo et al. (2007). The method has some drawbacks. In this paper, a modified method and a new method to overcome these problems are suggested. The main advantage of...


Efficient Algorithms and Data Structures for Massive Data Sets

For many algorithmic problems, traditional algorithms that optimise on the number of instructions executed prove expensive on I/Os. Novel and very different design techniques, when applied to these problems, can produce algorithms that are I/O efficient. This thesis adds to the growing chorus of such results. The computational models we use are the external memory model and the W-Stream model. ...



Journal:

Inf. Process. Manage.

Volume 30

Publication date: 1994
