Using Optimized Multi-Attribute Hash Indexes for Hash Joins

Authors

Evan P. Harris

Kotagiri Ramamohanarao

Abstract

The join operation is one of the most frequently used and expensive query processing operations in relational database systems. One method of joining two relations is to use a hash-based join algorithm. Hash-based join algorithms typically have two phases: a partitioning phase and a partition joining phase. We describe how an optimal multi-attribute hash (MAH) indexing scheme can be used to reduce the average cost of the partitioning phase of any hash-based join algorithm, by eliminating the partitioning phase entirely for many of the most common join queries. We demonstrate that the technique can be extended to include multiple copies of the data file, each with a different organization of the MAH indexing scheme, and that this further reduces the average cost of performing the partitioning phase of the hash join algorithm. We describe a relatively inexpensive method for determining a good MAH indexing scheme. Our experiments show that the schemes found using this method are usually optimal and perform the partitioning phase of the hash join algorithm at least three times faster than the standard approach. We show that a significant change in the query pattern is required for a reorganization of the data file to be necessary, and show that reorganizing the data file is an inexpensive operation.
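The idea of skipping the partitioning phase can be illustrated with a small sketch. This is not the authors' implementation: the toy hash function, the bit allocations, the relation contents, and every function name below are assumptions made for illustration. The sketch assumes both relations are stored in MAH files that allocate the same number of address bits to the join attribute and use the same hash function for it. Under that assumption, buckets that share the join attribute's bits already form the partitions a hash join needs, so no record has to be rehashed.

```python
from collections import defaultdict


def attr_hash(value, bits):
    """Toy hash (an assumption): the low `bits` bits of a non-negative integer."""
    return value & ((1 << bits) - 1)


def mah_address(record, bit_alloc):
    """Build a bucket address by concatenating per-attribute hash bits.

    bit_alloc is a list of (attribute_name, bits); the first attribute
    occupies the highest-order bits of the address.
    """
    addr = 0
    for attr, bits in bit_alloc:
        addr = (addr << bits) | attr_hash(record[attr], bits)
    return addr


def build_mah_file(records, bit_alloc):
    """Simulate a MAH-organized file as a mapping from address to bucket."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[mah_address(rec, bit_alloc)].append(rec)
    return buckets


def partitions_on(buckets, bit_alloc, join_attr):
    """Group existing buckets by the join attribute's address bits.

    Only bucket addresses are inspected; record values are never rehashed.
    """
    shift, width = 0, None
    for attr, bits in reversed(bit_alloc):
        if attr == join_attr:
            width = bits
            break
        shift += bits
    if width is None:
        raise ValueError(f"{join_attr!r} contributes no bits to the index")
    parts = defaultdict(list)
    for addr, recs in buckets.items():
        parts[(addr >> shift) & ((1 << width) - 1)].extend(recs)
    return parts


def join_partitions(parts_r, parts_s, attr_r, attr_s):
    """Partition joining phase: build a table on R, probe it with S."""
    out = []
    for pid, recs_r in parts_r.items():
        table = defaultdict(list)
        for r in recs_r:
            table[r[attr_r]].append(r)
        for s in parts_s.get(pid, []):
            for r in table.get(s[attr_s], []):
                out.append((r, s))
    return out


# Hypothetical relations and bit allocations.
R = [{"id": i, "dept": i % 5} for i in range(20)]
S = [{"dept": d, "budget": 100 * d} for d in range(5)]
R_ALLOC = [("dept", 2), ("id", 2)]
S_ALLOC = [("dept", 2), ("budget", 1)]

r_file = build_mah_file(R, R_ALLOC)
s_file = build_mah_file(S, S_ALLOC)
pairs = join_partitions(
    partitions_on(r_file, R_ALLOC, "dept"),
    partitions_on(s_file, S_ALLOC, "dept"),
    "dept",
    "dept",
)
```

If the join attribute contributed no bits to one of the indexes, `partitions_on` would raise an error, which corresponds to the case where a conventional partitioning pass is still required.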

Related resources

A Truncating Hash Algorithm for Processing Band-Join Queries

A non-equijoin of relations R and S is a band join if the join predicate requires values in the join attribute of R to fall within a specified band about the values in the join attribute of S. This paper describes a new algorithm, termed a truncating-hash band join, for evaluating band joins. This algorithm is based on the idea of truncating join attribute values in order to execute band joins ...

Analyzing In-Memory Hash Joins: Granularity Matters

Predicting the performance of join algorithms on modern hardware is challenging. In this work, we focus on main-memory no-partitioning and partitioning hash join algorithms executing on multi-core platforms. We discuss the main parameters impacting performance, and present an effective performance model. This model can be used to select the most appropriate algorithm for different input data-set...

Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited

In this paper we experimentally study the performance of main-memory, parallel, multi-core join algorithms, focusing on sort-merge and (radix-)hash join. The relative performance of these two join approaches has been a topic of discussion for a long time. With the advent of modern multi-core architectures, it has been argued that sort-merge join is now a better choice than radix-hash join. This...

Optimal Clustering of Relations to Improve Sorting and Partitioning for Joins

The sorting or partitioning of relations is very common in relational database systems. Implementations of the join operation include the sort-merge join algorithm, which sorts both relations, and the hash join algorithm, which usually partitions both relations. We describe how clustering records using an optimal multi-attribute hash (MAH) file, taking the query pattern and distribution into acc...

An Improved Hash Function Based on the Tillich-Zémor Hash Function

Using the idea behind the Tillich-Zémor hash function, we propose a new hash function. Our hash function is parallelizable and its collision resistance is implied by a hardness assumption on a mathematical problem. Also, it is secure against the known attacks. It is the most secure variant of the Tillich-Zémor hash function until now.

Publication date: 1994
