Similarity Search in Metric Spaces
Abstract
Similarity search refers to any search problem that retrieves, from a set of objects, those that are close to a given query object as reflected by some similarity criterion. It has a vast number of applications across computer science, from pattern recognition to textual and multimedia information retrieval. In this thesis, we examine algorithms designed for similarity search over arbitrary metric spaces rather than restricting ourselves to vector spaces. The contributions of this thesis are as follows. First, after defining pivot sharing and pivot localization, we prove probabilistically that the pivot sharing level should be increased for scattered data, while the pivot localization level should be increased for clustered data. This conclusion is supported by extensive experiments. Moreover, we propose two new algorithms, RLAESA and NGH-tree. RLAESA uses a high pivot sharing level and a low pivot localization level, and it outperforms MVP-tree, the fastest algorithm in the same category. NGH-tree serves as a framework to show the effect of increasing the pivot sharing level on search efficiency, and it provides a way to improve search efficiency in almost all algorithms. The experiments with RLAESA and NGH-tree not only demonstrate their performance but also support the first conclusion above. Second, we analyze the issue of disk I/O in similarity search and propose a new algorithm, SLAESA, which improves search efficiency by replacing random I/O access with sequential I/O access.
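The abstract does not spell out how RLAESA, NGH-tree, or SLAESA work. The sketch below is therefore not any of the thesis's algorithms. It only illustrates the generic pivot-based filtering used by LAESA-style methods. Every object's distance to a fixed set of pivots is precomputed. At query time, the triangle-inequality lower bound |d(q,p) - d(x,p)| <= d(q,x) is used to discard objects without computing d(q,x). The following choices are assumptions made for illustration: Euclidean distance on 2-D points as a stand-in metric, randomly chosen pivots, and the hypothetical function names `build_pivot_table` and `range_search`.

```python
import math
import random


def euclidean(a, b):
    # Stand-in metric (assumption); any function satisfying the metric axioms works.
    return math.dist(a, b)


def build_pivot_table(data, pivots, dist):
    # Hypothetical helper: table[i][j] = d(data[i], pivots[j]), computed once at index time.
    return [[dist(x, p) for p in pivots] for x in data]


def range_search(query, radius, data, pivots, table, dist):
    """Return (indices of objects within `radius` of `query`, distance computations used).

    An object x is pruned when max_j |d(q, p_j) - d(x, p_j)| > radius,
    because that maximum is a lower bound on d(q, x) by the triangle inequality.
    """
    q_to_p = [dist(query, p) for p in pivots]  # one distance per pivot
    computations = len(pivots)
    results = []
    for i, x in enumerate(data):
        lower = max((abs(qp - xp) for qp, xp in zip(q_to_p, table[i])), default=0.0)
        if lower > radius:
            continue  # safely discarded without computing d(query, x)
        computations += 1
        if dist(query, x) <= radius:
            results.append(i)
    return results, computations


if __name__ == "__main__":
    random.seed(0)
    points = [(random.random(), random.random()) for _ in range(1000)]
    chosen = random.sample(points, 8)  # random pivot selection (assumption)
    tbl = build_pivot_table(points, chosen, euclidean)
    hits, cost = range_search((0.5, 0.5), 0.1, points, chosen, tbl, euclidean)
    print(len(hits), "results using", cost, "distance computations")
```

Here every object is filtered against the same pivots. In the abstract's terms, this flat table is therefore presumably at the high-sharing, low-localization end. A tree index that picks pivots per subtree would localize them instead. The thesis's precise definitions of these levels may differ.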
Similar resources
A Content-Addressable Network for Similarity Search in Metric Spaces
Because of the ongoing digital data explosion, more advanced search paradigms than the traditional exact match are needed for content-based retrieval in huge and ever growing collections of data produced in application areas such as multimedia, molecular biology, marketing, computer-aided design and purchasing assistance. As the variety of data types is fast going towards creating a database uti...
Access Structures for Advanced Similarity Search in Metric Spaces
Similarity retrieval is an important paradigm for searching in environments where exact match has little meaning. Moreover, in order to enlarge the set of data types for which the similarity search can efficiently be performed, the notion of mathematical metric space provides a useful abstraction for similarity. In this paper we consider the problem of organizing and searching large data-sets f...
New Approaches to Similarity Searching in Metric Spaces
Title of dissertation: New Approaches to Similarity Searching in Metric Spaces. Cengiz Celik, Doctor of Philosophy, 2006. Dissertation directed by: Professor David Mount, Department of Computer Science. The complex and unstructured nature of many types of data, such as multimedia objects, text documents, and protein sequences, requires the use of similarity search techniques for retrieval of informatio...
Aspects of Metric Spaces in Computation
Metric spaces, which generalise the properties of commonly-encountered physical and abstract spaces into a mathematical framework, frequently occur in computer science applications. Three major kinds of questions about metric spaces are considered here: the intrinsic dimensionality of a distribution, the maximum number of distance permutations, and the difficulty of reverse similarity search. I...
Spatial Selection of Sparse Pivots for Similarity Search in Metric Spaces
Similarity search is a fundamental operation for applications that deal with unstructured data sources. In this paper we propose a new pivot-based method for similarity search, called Sparse Spatial Selection (SSS). The main characteristic of this method is that it guarantees a good pivot selection more efficiently than other methods previously proposed. In addition, SSS adapts itself to the di...
Similarity Measures for Relational Databases
We enrich sets with an integrated notion of similarity, measured in a (complete) lattice, special cases of which are reflexive sets and bounded metric spaces. Relations and basic relational operations of traditional relational algebra are interpreted in such richer structured environments. A canonical similarity measure between relations is introduced. In the special case of reflexive sets it ...