Efficiently Indexing High-Dimensional Data Spaces
Abstract
Indexing high-dimensional data spaces is an emerging research domain. Its importance is growing with the need to support modern applications with powerful search tools. In the so-called non-standard applications of database systems, such as multimedia, CAD, molecular biology, medical imaging, time series processing and many others, similarity search in large data sets is required as basic functionality. A widely applied technique for similarity search is the so-called feature transformation, in which important properties of the database objects are mapped to points of a multidimensional vector space, the so-called feature vectors. Similarity queries are thus naturally translated into neighborhood queries in the feature space. To achieve high query-processing performance, multidimensional index structures are used to manage the feature vectors. Unfortunately, the performance of multidimensional index structures deteriorates as the dimension of the data space increases, because they are primarily designed for low-dimensional data spaces and because of a set of effects usually called the ‘curse of dimensionality’. The general goal of this thesis is therefore to improve the efficiency of index-based query processing in high-dimensional data spaces. For this purpose, a cost model for index-based query processing in high-dimensional data spaces was developed. It is applicable to a variety of index structures and query processing techniques and can be used both to evaluate techniques and for optimization. Based on this cost model, a variety of improvement and optimization techniques for multidimensional index structures were developed. The first, called the DABS-tree, involves a cost-model-based split algorithm that supports dynamic and local adaptation of the block size of the index structure. Dynamic block size adaptation is especially useful because, as we can show, conventional index structures often access data in portions that are too small.
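As a minimal sketch of how a similarity query becomes a neighborhood query in feature space, the following Python snippet answers a k-nearest-neighbor query by linear scan, the baseline that a multidimensional index structure is meant to outperform. The sample feature vectors, the choice of the Euclidean metric and the function name `knn_linear_scan` are illustrative assumptions and are not taken from the thesis.

```python
import heapq
import math


def knn_linear_scan(features, query, k):
    """Return the k (distance, index) pairs closest to `query`.

    Assumption: similarity is measured by Euclidean distance between
    feature vectors; every vector is compared against the query, so no
    index structure is involved.
    """
    scored = ((math.dist(vec, query), idx) for idx, vec in enumerate(features))
    return heapq.nsmallest(k, scored)


# Hypothetical 4-dimensional feature vectors, one per database object.
features = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 3.0, 4.0, 0.0),
    (2.0, 2.0, 2.0, 2.0),
]
query = (0.9, 0.0, 0.0, 0.0)

nearest = knn_linear_scan(features, query, k=2)
nearest_ids = [idx for _, idx in nearest]
print(nearest_ids)
```

The scan touches every vector, so its cost grows linearly with the size of the data set. An index structure pays off only when it can prune most of the data, which is what becomes harder as the dimension grows.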
Related resources
A Divisive Hierarchical Clustering-Based Method for Indexing Image Data
Multi-dimensional indexing structures are conventionally used to accelerate search operations in content-based image retrieval systems. Much effort has gone into developing such structures. Most practical applications of image retrieval require high-dimensional feature vectors, but current multi-dimensional indexing structures lose their effici...
SIMP: Accurate and Efficient Near Neighbor Search in Very High Dimensional Spaces
Near neighbor search in very high dimensional spaces is useful in many applications. Existing techniques solve this problem efficiently only for the approximate case. These solutions are designed to solve r-near neighbor queries only for a fixed query range or a set of query ranges with probabilistic guarantees, and are then extended to nearest neighbor queries. Solutions supporting a set of query...
Improving the Performance of High-Dimensional kNN Retrieval through Localized Dataspace Segmentation and Hybrid Indexing
Efficient data indexing and nearest neighbor retrieval are challenging tasks in high-dimensional spaces. This work builds upon our previous analyses of iDistance partitioning strategies to develop the backbone of a new indexing method using a heuristic-guided hybrid index that further segments congested areas of the dataspace to improve overall performance for exact k-nearest neighbor (kNN) que...
Retrieval of Optimal Subspace Clusters Set for an Effective Similarity Search in a High-Dimensional Spaces
High dimensional data is often analysed by resorting to its distribution properties in subspaces. Subspace clustering is a powerful method for eliciting features of high dimensional data. The result of subspace clustering can be an essential basis for building indexing structures and for subsequent data search. However, a high number of subspaces and data instances can conceal a high number of subspace...
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces
Many emerging application domains require database systems to support efficient access over highly multidimensional datasets. The current state-of-the-art technique for indexing high dimensional data is to first reduce the dimensionality of the data using Principal Component Analysis and then index the reduced-dimensionality space using a multidimensional index structure. The above technique, ...