Trading Quality for Time with Nearest Neighbor Search
Authors
Abstract
In many situations, users would readily accept an approximate query result if the query is evaluated faster. This holds in particular for Nearest-Neighbor Search (NN-search), a typical implementation of similarity search. In this article, we investigate approximate NN-query evaluation techniques based on the VA-File, a data structure that efficiently supports NN-query evaluation in high dimensions. The VA-File contains an approximation of each point. VA-File based NN-query evaluation first computes bounds on the distance between each point and the query in order to filter out the vast majority of points. A second phase then identifies the NN by computing the exact distances of all remaining points. To develop approximate query-evaluation techniques, we proceed in two steps. First, we derive an analytic model for VA-File based NN-search in order to investigate the relationship between approximation granularity, effectiveness of the filtering step, and search performance. More specifically, we develop formulae for the distribution of the error of the bounds and for the duration of the different phases of query evaluation. Second, based on these results, we develop different approximate query-evaluation techniques: the first adapts the bounds to obtain stricter filtering, and the second skips the computation of the exact distances. Experiments show that these techniques have the desired effect: for instance, when allowing for a small but specific reduction of result quality, we observed a speedup by a factor of 7 in 50-NN search.
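The two-phase search described in the abstract, and the two approximate variants, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a uniform grid with a fixed number of bits per dimension, Euclidean distance, and data in the unit cube. The names `build_va_file`, `cell_bounds`, and `va_knn` and the parameters `shrink` and `skip_refine` are hypothetical. In particular, `shrink` tightens both bounds by a fixed fraction of their gap, which only stands in for the model-based bound adaptation that the abstract mentions but does not spell out.

```python
import math
import random


def build_va_file(points, bits=4, lo=0.0, hi=1.0):
    """Quantize every coordinate into one of 2**bits equal-width cells."""
    cells = 2 ** bits
    width = (hi - lo) / cells
    approx = [
        tuple(min(int((x - lo) / width), cells - 1) for x in p)
        for p in points
    ]
    return approx, width, lo


def cell_bounds(cell, query, width, lo):
    """Lower and upper bound on the Euclidean distance between the query
    and any point that lies inside the given grid cell."""
    lb2 = ub2 = 0.0
    for c, q in zip(cell, query):
        left = lo + c * width
        right = left + width
        if q < left:
            d_min = left - q
        elif q > right:
            d_min = q - right
        else:
            d_min = 0.0
        d_max = max(abs(q - left), abs(q - right))
        lb2 += d_min * d_min
        ub2 += d_max * d_max
    return math.sqrt(lb2), math.sqrt(ub2)


def va_knn(points, approx, width, lo, query, k, shrink=0.0, skip_refine=False):
    """Return (indices of the k reported neighbors, number of candidates).

    shrink=0.0 and skip_refine=False give the exact two-phase search.
    shrink in (0, 0.5] tightens both bounds (assumed heuristic), which
    filters more aggressively but may drop true neighbors.
    skip_refine=True ranks candidates by their lower bound only and never
    reads the exact vectors (assumed ranking rule).
    """
    # Phase 1: scan the approximations and compute (possibly tightened) bounds.
    bounds = []
    for cell in approx:
        lb, ub = cell_bounds(cell, query, width, lo)
        gap = ub - lb
        bounds.append((lb + shrink * gap, ub - shrink * gap))

    # At least k points lie within the k-th smallest upper bound, so any
    # point whose lower bound exceeds it cannot be among the k nearest.
    threshold = sorted(ub for _, ub in bounds)[k - 1]
    candidates = [i for i, (lb, _) in enumerate(bounds) if lb <= threshold]

    if skip_refine:
        ranked = sorted(candidates, key=lambda i: (bounds[i][0], i))
    else:
        # Phase 2: exact distances for the surviving candidates only.
        ranked = sorted(candidates, key=lambda i: (math.dist(points[i], query), i))
    return ranked[:k], len(candidates)
```

With `shrink=0.0` the filter is safe, so the result matches a brute-force scan. Raising `shrink` or setting `skip_refine=True` trades result quality for fewer (or no) exact distance computations, which is the kind of trade-off the abstract describes.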
Similar Resources
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
Accelerating Fractal Image Compression by Multi-Dimensional Nearest Neighbor Search
In fractal image compression the encoding step is computationally expensive. A large number of sequential searches through a list of domains (portions of the image) are carried out while trying to find a best match for another image portion. Our theory developed here shows that this basic procedure of fractal image compression is equivalent to multi-dimensional nearest neighbor search. This res...
What Is the Nearest Neighbor in High Dimensional Spaces?
Nearest neighbor search in high dimensional spaces is an interesting and important problem which is relevant for a wide variety of novel database applications. As recent results show, however, the problem is a very difficult one, not only with regards to the performance issue but also to the quality issue. In this paper, we discuss the quality issue and identify a new generalized notion of neares...
Fractal Image Compression via Nearest Neighbor Search
In fractal image compression the encoding step is computationally expensive. A large number of sequential searches through a list of domains (portions of the image) are carried out while trying to find best matches for other image portions called ranges. Our theory developed here shows that this basic procedure of fractal image compression is equivalent to multi-dimensional nearest neighbor sea...