Proximity measures for rank join

Author

  • DAVIDE MARTINENGHI
Abstract

We introduce the proximity rank join problem, where we are given a set of relations whose tuples are equipped with a score and a real-valued feature vector. Given a target feature vector, the goal is to return the K combinations of tuples with high scores that are as close as possible to the target and to each other, according to some notion of distance or dissimilarity. The setting closely resembles that of traditional rank join, but the geometry of the vector space plays a distinctive role in the computation of the overall score of a combination. Also, the input relations typically return their results either by distance from the target or by score. Because of these aspects, traditional rank join algorithms, such as the well-known HRJN, have shortcomings in solving the proximity rank join problem, as they may read more input than needed. To overcome this weakness, we define a tight bound (used as a stopping criterion) that guarantees instance optimality, i.e., the I/O cost is always within a constant factor of the optimal one. The tight bound can also be used to drive an adaptive pulling strategy, deciding at each step which relation to access next. For practically relevant classes of problems, we show how to compute the tight bound efficiently. An extensive experimental study validates our results and demonstrates significant gains over existing solutions.
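To make the problem statement concrete, here is a minimal brute-force sketch. It assumes a hypothetical aggregation function (not necessarily the one defined in the paper) that rewards the sum of tuple scores and penalizes the squared Euclidean distance of each feature vector both to the target and to the centroid of the combination; the weights `w_s`, `w_q`, `w_mu` and the helpers `aggregate` and `brute_force_top_k` are illustrative names. Because it enumerates every combination, it reads all of the input, so it only serves as a reference for what the top-K answer is, not as an instance-optimal algorithm.

```python
import heapq
import itertools
import math
from typing import List, Sequence, Tuple

# An input tuple is modeled as (score, feature_vector).
Item = Tuple[float, Tuple[float, ...]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return math.fsum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(vectors: List[Sequence[float]]) -> Tuple[float, ...]:
    """Component-wise mean of the feature vectors in a combination."""
    n = len(vectors)
    return tuple(math.fsum(c) / n for c in zip(*vectors))


def aggregate(combo: Sequence[Item], target: Sequence[float],
              w_s: float = 1.0, w_q: float = 1.0, w_mu: float = 1.0) -> float:
    """Hypothetical proximity-aware score: high tuple scores are rewarded,
    distance to the target and mutual spread (distance to centroid) are penalized."""
    vectors = [x for _, x in combo]
    mu = centroid(vectors)
    return (w_s * math.fsum(s for s, _ in combo)
            - w_q * math.fsum(sq_dist(x, target) for x in vectors)
            - w_mu * math.fsum(sq_dist(x, mu) for x in vectors))


def brute_force_top_k(relations: Sequence[Sequence[Item]], target: Sequence[float],
                      k: int, **weights: float) -> List[Tuple[Item, ...]]:
    """Score every combination (one tuple per relation) and keep the K best.
    Reads the whole input, unlike a rank join with a stopping criterion."""
    combos = itertools.product(*relations)
    return heapq.nlargest(k, combos, key=lambda c: aggregate(c, target, **weights))


# Two small relations with 2-D feature vectors.
R1 = [(0.9, (0.0, 0.0)), (0.5, (3.0, 4.0))]
R2 = [(0.8, (1.0, 0.0)), (1.0, (5.0, 5.0))]
best = brute_force_top_k([R1, R2], target=(0.0, 0.0), k=1)
# The top combination pairs (0.0, 0.0) with (1.0, 0.0): the higher-scored tuple
# at (5.0, 5.0) loses because it is far from the target and from its partner.
print(best)
```

The example shows why the geometry matters: the tuple with the best individual score does not appear in the best combination. A rank join algorithm avoids the full enumeration by reading the relations incrementally (by score or by distance from the target) and stopping once a bound on unseen combinations guarantees the current top K.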


Similar resources

Proximity Rank Join Based on Cosine Similarity

Proximity rank join is the problem of finding the top-K combinations with the highest aggregate score in which the best combinations of objects coming from different services are sought, and each object is equipped with both a score and a real-valued feature vector. The proximity of the objects, i.e., the geometry of the feature space, plays a distinctive role in the computation of the overall sco...


Proximity Rank Join

We introduce the proximity rank join problem, where we are given a set of relations whose tuples are equipped with a score and a real-valued feature vector. Given a target feature vector, the goal is to return the K combinations of tuples with high scores that are as close as possible to the target and to each other, according to some notion of distance. The setting closely ...


RANK-AWARE QUERY PROCESSING AND OPTIMIZATION A Thesis

Ilyas, Ihab F. Ph.D., Purdue University, August, 2004. Rank-aware Query Processing and Optimization. Major Professors: Ahmed K. Elmagarmid and Walid G. Aref. This dissertation focuses on supporting ranking in relational database systems through a rank-aware query processing and optimization framework. We introduce ranking algorithms and operators to be adopted by current relational query engine...


JTop Algorithms for Top-k Join Queries

Top-k join queries have become very important in many important areas of computing. One of the most efficient algorithms for top-k join queries is the Rank-Join algorithm [17] [18]. However, there are many cases where Rank-Join does much unnecessary access to the input data sources. In this report, we first show that there are many cases where Rank-Join's stopping mechanism is not efficient, an...


Proximity Search in Databases

An information retrieval (IR) engine can rank documents based on textual proximity of keywords within each document. In this paper we apply this notion to search across an entire database for objects that are "near" other relevant objects. Proximity search enables simple "focusing" queries based on general relationships among objects, helpful for interactive query sessions. We view the database...



Publication date: 2011