An Interactive Search Technique for String Databases
Abstract
The explosive growth of string databases makes similarity search a challenging problem. Current search tools are non-interactive in the sense that the user has to wait a long time until the entire database is inspected. We consider the problem of interactive searching and propose a set of innovative k-NN (k-Nearest Neighbor) search algorithms. We propose a new model for the distance distribution of a query to a set of strings. Using this distribution, our first technique orders the MBRs (Minimum Bounding Rectangles) of an index structure based on their order statistics. We also propose an early pruning strategy to reduce the total search time for this technique. Our second technique exploits existing statistical models to define an order on the index structure MBRs. We also propose a method to compute the confidence levels for the partial results. Our experiments show that our technique can achieve 75% accuracy within the first 2.5-35% of the iterations and 90% accuracy within the first 12-45% of the iterations. Furthermore, the reported confidence levels reflect the quality of the partial results accurately.
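The general shape of an interactive k-NN search, where partial results are reported after each iteration instead of only at the end, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the distance-distribution model or the order-statistics ordering, so the sketch substitutes a simple length-difference lower bound on edit distance to order and prune groups of strings. The groups stand in for index MBRs, and the names `edit_distance`, `length_lower_bound`, and `interactive_knn` are hypothetical.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def length_lower_bound(query: str, group: Sequence[str]) -> int:
    """Assumed stand-in for an MBR bound: the edit distance is at least
    the length difference, so a group's length range bounds it from below.
    Groups are assumed to be non-empty."""
    lo = min(len(s) for s in group)
    hi = max(len(s) for s in group)
    q = len(query)
    return max(0, lo - q, q - hi)


def interactive_knn(
    query: str, groups: Sequence[Sequence[str]], k: int
) -> Iterator[Tuple[List[Tuple[int, str]], float]]:
    """Yield (partial top-k, fraction of groups processed) per iteration."""
    # Visit the most promising groups first (smallest lower bound).
    ordered = sorted(groups, key=lambda g: length_lower_bound(query, g))
    best: List[Tuple[int, str]] = []  # max-heap on distance via negation
    for seen, group in enumerate(ordered, start=1):
        bound = length_lower_bound(query, group)
        # Early pruning: skip a group that cannot improve the current top-k.
        if not (len(best) == k and bound >= -best[0][0]):
            for s in group:
                d = edit_distance(query, s)
                if len(best) < k:
                    heapq.heappush(best, (-d, s))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, s))
        partial = sorted((-nd, s) for nd, s in best)
        yield partial, seen / len(ordered)


if __name__ == "__main__":
    groups = [["kitten", "sitten"], ["sitting", "mitten"], ["a", "bb"]]
    for partial, progress in interactive_knn("kitten", groups, k=2):
        print(f"{progress:.0%} of groups seen: {partial}")
```

Because the function is a generator, a caller can stop consuming results as soon as the partial answer looks good enough. The confidence levels mentioned in the abstract are not modeled here; the sketch only reports the fraction of groups processed.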
Similar resources
Accelerating Substring Searching: Breaking the I/O Barrier
The exponential increase in the size of string databases makes substring search a challenging problem. Current techniques suffer from both disk I/O and computational cost because of extensive memory requirements and large candidate sets. We accelerate string search tools and reduce their memory requirements by precomputing the associations between the database strings and the query string. Our ...
Generation of a Common Reference String, Secure against Quantum Adversaries, and Applications (arXiv:0903.3118v1 [quant-ph], 18 Mar 2009)
In this paper, we present the generation of a common reference string “from scratch” via coin-flipping in the presence of a quantum adversary. First, we present how we achieve quantum-secure coin-flipping using Watrous’ quantum rewinding technique [Wat06]. Then, by combining this coin-flipping with any non-interactive zero-knowledge protocol we get an easy transformation from non-interactive zer...
Fast Approximate String Matching in a Dictionary
A successful technique to search large textual databases allowing errors relies on an online search in the vocabulary of the text. To reduce the time of that online search, we index the vocabulary as a metric space. We show that with reasonable space overhead we can improve by a factor of two over the fastest online algorithms, when the tolerated error level is low (which is reasonable in tex...
The Impact of the Objective Complexity and Product of Work Task on Interactive Information Searching Behavior
Background and Aim: This study aimed to explore the impact of the objective complexity and product of work tasks on users' interactive information searching behavior. Method: The research population consisted of MSc students of Ferdowsi University of Mashhad enrolled in the 2012-13 academic year. In three stages of sampling (random stratified, quota, and voluntary sampling), 30 cases were selected. Each of ...
An Index based Pattern Matching using Multithreading
Pattern matching, the problem of finding subsequences within a long sequence, is essential for many applications such as information retrieval, disease analysis, structural and functional analysis, logic programming, theorem proving, term rewriting and DNA computing. In computational biology, exact string matching algorithms are an essential component of DNA applications. Many databases li...