DiVA: Indexing high-dimensional data by "diving" into vector approximations
Authors
Abstract
Contemporary multimedia, scientific, and medical applications use indexing structures to access their high-dimensional data. Yet, in sufficiently high-dimensional spaces, conventional tree-based access methods are eventually outperformed by simple serial scans. Vector quantization has been used effectively to index data that are mostly uniformly distributed. However, in real-world applications, clustered data and skewed query distributions are the norm. In this paper, we propose DiVA, an approach that selectively adapts the quantization step to accommodate varying indexing needs. This adaptation mechanism triggers the restructuring and possible expansion of DiVA so as to provide finer indexing granularity and enhanced access performance in certain “hot” areas of the search space. User-supplied policies help both identify such “hot” areas and satisfy versatile application requirements. Experimentation with our detailed prototype shows that, on a real-world data set, DiVA yields up to 64% less I/O than competing methods such as the VA-file and the A-tree.
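The abstract does not spell out DiVA's mechanics, so the following is only a minimal sketch of the general vector-approximation filter that the VA-file family builds on, not DiVA's adaptive scheme. It assumes data normalized to [0, 1] per dimension and a fixed, uniform number of bits per dimension; the function names (`quantize`, `lower_bound`, `nearest`) are hypothetical.

```python
import math

def quantize(vec, bits, lo=0.0, hi=1.0):
    """Map each coordinate to the index of one of 2**bits equal-width cells."""
    cells = 1 << bits
    width = (hi - lo) / cells
    # min() clamps the upper boundary value `hi` into the last cell.
    return [min(int((x - lo) / width), cells - 1) for x in vec]

def lower_bound(query, approx, bits, lo=0.0, hi=1.0):
    """Smallest possible Euclidean distance from `query` to any point in the cell."""
    cells = 1 << bits
    width = (hi - lo) / cells
    total = 0.0
    for q, c in zip(query, approx):
        cell_lo = lo + c * width
        cell_hi = cell_lo + width
        if q < cell_lo:
            total += (cell_lo - q) ** 2
        elif q > cell_hi:
            total += (q - cell_hi) ** 2
        # If q lies inside the cell, this dimension contributes nothing.
    return math.sqrt(total)

def nearest(query, data, bits):
    """1-NN search: scan the compact approximations and fetch a full vector
    only when its lower bound could still beat the best distance so far."""
    approximations = [quantize(v, bits) for v in data]
    best_dist, best_idx, fetched = math.inf, -1, 0
    for i, approx in enumerate(approximations):
        if lower_bound(query, approx, bits) >= best_dist:
            continue  # pruned without touching the full vector
        fetched += 1  # stands in for one I/O against the full data
        d = math.dist(query, data[i])
        if d < best_dist:
            best_dist, best_idx = d, i
    return best_idx, fetched

data = [[0.1, 0.1], [0.9, 0.9], [0.12, 0.11], [0.5, 0.5]]
best, fetched = nearest([0.1, 0.12], data, bits=2)
print(best, fetched)
```

With a uniform grid, points that share a cell (here the first and third vectors) cannot be told apart by their approximations, so both full vectors must be fetched. This is the situation in clustered data that motivates a finer quantization step in "hot" regions, as the abstract describes.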
Similar resources
DiVA: Using Application-Specific Policies to 'Dive' into Vector Approximations
In high-dimensional data domains, conventional tree-based access structures are occasionally outperformed by simple sequential scans. To this end, the introduction of approximation-based methods helped speed up queries by providing compact representations of stored data. Approximation methods exploit vector quantization to index data mainly presumed to follow a uniform distrib...
A Method Based on Divisive Hierarchical Clustering for Indexing Image Data
It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Much effort has gone into developing multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...
Utilization of Principle Axis Analysis for Fast Nearest Neighbor Searches in High-Dimensional Image Databases
This paper presents an efficient indexing method for similarity searches in high-dimensional image databases by principal axis analysis. Image databases often represent the image objects as high-dimensional feature vectors and access them via the feature vectors and a similarity measure. However, the performance of the existing nearest neighbor search methods is far from satisfactory for feature ve...
Vector Approximation based Indexing for High-Dimensional Multimedia Databases
With the proliferation of multimedia data, there is an increasing need to support the indexing and searching of high-dimensional data. In this paper, we propose an efficient indexing method for high-dimensional multimedia databases using the filtering approach, also known as the vector approximation approach, which supports nearest neighbor search efficiently. Our technique called RA +-Blocks (Region...
The Nondeterministic Divide
The nondeterministic divide partitions a vector into two nonempty slices by allowing the point of division to be chosen nondeterministically. Support for high-level divide-and-conquer programming provided by the nondeterministic divide is investigated. A diva algorithm is a recursive divide-and-conquer sequential algorithm on one or more vectors of the same range, whose division point for a new ...