Access Structures for Advanced Similarity Search in Metric Spaces

نویسندگان

  • Vlastislav Dohnal
  • Claudio Gennaro
  • Pasquale Savino
  • Pavel Zezula
چکیده

Similarity retrieval is an important paradigm for searching in environments where exact match has little meaning. Moreover, in order to enlarge the set of data types for which the similarity search can efficiently be performed, the notion of mathematical metric space provides a useful abstraction for similarity. In this paper we consider the problem of organizing and searching large data-sets from arbitrary metric spaces, and a novel access structure for similarity search in metric data, called D-Index, is discussed. D-Index combines a novel clustering technique and the pivot-based distance searching strategy to speed up execution of similarity range and nearest neighbor queries for large files with objects stored in disk memories. Moreover, we propose an extension of this access structure (eD-Index) which is able to deal with the problem of similarity self join. Though this approach is not able to eliminate the intrinsic quadratic complexity of similarity joins, significant performance improvements are confirmed by experiments.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Content-Addressable Network for Similarity Search in Metric Spaces

Because of the ongoing digital data explosion, more advanced search paradigms than the traditional exact match are needed for contentbased retrieval in huge and ever growing collections of data produced in application areas such as multimedia, molecular biology, marketing, computer-aided design and purchasing assistance. As the variety of data types is fast going towards creating a database uti...

متن کامل

A Hashed Schema for Similarity Search in Metric Spaces (invited talk)

A novel access structure for similarity search in metric data, called Similarity Hashing (SH), is proposed. Its multi-level hash structure of separable buckets on each level supports easy insertion and bounded search costs, because at most one bucket needs to be accessed at each level for range queries up to a pre-de ned value of search radius. At the same time, the number of distance computati...

متن کامل

New Approaches to Similarity Searching in Metric Spaces

Title of dissertation: NEW APPROACHES TO SIMILARITY SEARCHING IN METRIC SPACES Cengiz Celik, Doctor of Philosophy, 2006 Dissertation directed by: Professor David Mount Department of Computer Science The complex and unstructured nature of many types of data, such as multimedia objects, text documents, protein sequences, requires the use of similarity search techniques for retrieval of informatio...

متن کامل

Scalable and Distributed Similarity Search in Metric Spaces

In this paper we propose a new access structure, called GHT*, based on generalized hyperplane tree (GHT) and distributed dynamic hashing (DDH) techniques. GHT* is a distributed structure which allows to perform range search in a metric space according to a distance function d. The structure does not require a central directory and it is able to gracefully scale through splits of one bucket at a...

متن کامل

Similarity Search in Metric Spaces

Similarity search refers to any searching problem which retrieves objects from a set that are close to a given query object as re ected by some similarity criterion. It has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. In this thesis, we examine algorithms designed for similarity search over arbitrar...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003