Ranked Document Selection
Abstract
Let D be a collection of string documents of n characters in total. The top-k document retrieval problem is to preprocess D into a data structure that, given a query (P, k), returns the k documents of D most relevant to the pattern P. The relevance of a document d for a pattern P is given by a predefined ranking function w(P, d). Linear-space solutions with optimal query time already exist for this problem. In this paper we consider a novel problem, document selection queries, which report only the kth most relevant document for P (instead of reporting all top-k documents). We present a data structure using O(n log^ε n) space, for any constant ε > 0, that answers selection queries in time O(log k / log log n), and a linear-space data structure that answers queries in time O(log k), given the locus node of P in a (generalized) suffix tree of D. We also prove that a succinct-space solution for this problem with poly-logarithmic query time is unlikely to exist.
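To make the query semantics concrete, here is a minimal brute-force sketch in Python. It is not the data structure described in the abstract: it scans every document per query and only illustrates what a selection query (P, k) returns compared with a top-k query. The ranking function is assumed to be term frequency (the number of occurrences of P in d), only documents containing P are ranked, and ties are broken by document index. All three choices, and the names `term_frequency`, `top_k` and `select_kth`, are illustrative assumptions rather than details taken from the paper.

```python
from typing import Callable, List, Optional, Tuple


def term_frequency(pattern: str, doc: str) -> int:
    """Assumed ranking function w(P, d): overlapping occurrences of P in d."""
    if not pattern:
        return 0
    count = 0
    start = doc.find(pattern)
    while start != -1:
        count += 1
        start = doc.find(pattern, start + 1)  # step by 1 to allow overlaps
    return count


def _ranked(docs: List[str], pattern: str,
            w: Callable[[str, str], int]) -> List[Tuple[int, int]]:
    """Score every document, keep those containing P, sort by relevance."""
    scored = [(w(pattern, d), i) for i, d in enumerate(docs)]
    scored = [(s, i) for s, i in scored if s > 0]
    # Highest score first; ties broken by smaller index (an assumption).
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored


def top_k(docs: List[str], pattern: str, k: int,
          w: Callable[[str, str], int] = term_frequency) -> List[int]:
    """Top-k retrieval: indices of the k most relevant documents."""
    return [i for _, i in _ranked(docs, pattern, w)[:max(k, 0)]]


def select_kth(docs: List[str], pattern: str, k: int,
               w: Callable[[str, str], int] = term_frequency) -> Optional[int]:
    """Selection query: index of the k-th most relevant document (k is 1-based)."""
    ranked = _ranked(docs, pattern, w)
    if k < 1 or k > len(ranked):
        return None
    return ranked[k - 1][1]


if __name__ == "__main__":
    D = ["abab ab", "ab", "xyz", "ababab ab"]
    print(top_k(D, "ab", 2))       # indices of the two best documents
    print(select_kth(D, "ab", 2))  # index of the second-best document only
```

This baseline costs time proportional to the total length of the collection for each query, whereas the abstract's structures answer a selection query in O(log k) time or better once the locus of P is known.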
Similar resources
A Multistage Feature Selection Model for Document Classification Using Information Gain and Rough Set
The number of documents is increasing rapidly, so organizing them in digitized form makes text categorization a challenging issue. A major issue for text categorization is its large number of features. Most of the features are noisy, irrelevant and redundant, which may mislead the classifier. Hence, it is most important to reduce dimensionality of data to get smaller subset and prov...
RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
The principal aim of a search engine is to provide sorted results according to the user's requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user query. The novelty of this paper is a user-feedback-based ranking algorithm that uses reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
Enhancing Query Formulation for Spoken Document Retrieval
The popularity and ubiquity of multimedia associated with spoken documents has spurred a lot of research interest in spoken document retrieval (SDR) in the recent past. Beyond much effort devoted to developing robust indexing and modeling techniques for representing spoken documents, a recent line of thought targets the improvement of query modeling for better reflecting the user’s informati...
Comparing scientific production of prioritized health areas of Iran's comprehensive scientific map with outlook horizon 1404 countries, a scientometric study: brief report
Background: Studying the evolution of medical sciences is important for drawing up the future path with a view to per capita science production. The purpose of this study was to clarify the status and position of Iran in science production and compare it with four competitor countries of the region for 2025. Methods: This research is conducted using the scientometric method our ranked citati...
Robust Query-Specific Pseudo Feedback Document Selection for Query Expansion
In document retrieval using pseudo relevance feedback, after initial ranking, a fixed number of top-ranked documents are selected as feedback to build a new expansion query model. However, very little attention has been paid to an intuitive but critical fact that the retrieval performance for different queries is sensitive to the selection of different numbers of feedback documents. In this pap...