Static Pruning of Terms in Inverted Files
نویسندگان
چکیده
This paper addresses the problem of identifying collection dependent stop-words in order to reduce the size of inverted files. We present four methods to automatically recognise stop-words, analyse the tradeoff between efficiency and effectiveness, and compare them with a previous pruning approach. The experiments allow us to conclude that in some situations stop-words pruning is competitive with respect to other inverted file reduction techniques.
منابع مشابه
Static Index Pruning for Information Retrieval Systems: A Posting-Based Approach
Static index pruning methods have been proposed to reduce size of the inverted index of information retrieval systems. The goal is to increase efficiency (in terms of query response time) while preserving effectiveness (in terms of ranking quality). Current state-of-the-art approaches include the term-centric pruning approach and the document-centric pruning approach. While the term-centric pru...
متن کاملA Practitioner's Guide for Static Index Pruning
We compare the termand document-centric static index pruning approaches as described in the literature and investigate their sensitivity to the scoring functions employed during the pruning and actual retrieval stages. 1 Static Inverted Index Pruning Static index pruning permanently removes some information from the index, for the purposes of utilizing the disk space and improving query process...
متن کاملDiversification Based Static Index Pruning - Application to Temporal Collections
Nowadays, web archives preserve the history of large portions of the web. As medias are shifting from printed to digital editions, accessing these huge information sources is drawing increasingly more attention from national and international institutions, as well as from the research community. These collections are intrinsically big, leading to index files that do not fit into the memory and ...
متن کاملWithin-Document Term-Based Index Pruning with Statistical Hypothesis Testing
Document-centric static index pruning methods provide smaller indexes and faster query times by dropping some within-document term information from inverted lists. We present a method of pruning inverted lists derived from the formulation of unigram language models for retrieval. Our method is based on the statistical significance of term frequency ratios: using the two-sample two-proportion (2...
متن کاملFriction Compensation for Dynamic and Static Models Using Nonlinear Adaptive Optimal Technique
Friction is a nonlinear phenomenon which has destructive effects on performance of control systems. To obviate these effects, friction compensation is an effectual solution. In this paper, an adaptive technique is proposed in order to eliminate limit cycles as one of the undesired behaviors due to presence of friction in control systems which happen frequently. The proposed approach works for n...
متن کامل