Using Optimized Multi-Attribute Hash Indexes for Hash Joins
Authors
Abstract
The join operation is one of the most frequently used and most expensive query processing operations in relational database systems. One method of joining two relations is to use a hash-based join algorithm. Hash-based join algorithms typically have two phases: a partitioning phase and a partition joining phase. We describe how an optimal multi-attribute hash (MAH) indexing scheme can be used to reduce the average cost of the partitioning phase of any hash-based join algorithm by eliminating the partitioning phase entirely for many of the most common join queries. We demonstrate that the technique can be extended to include multiple copies of the data file, each with a different organization of the MAH indexing scheme, and that this further reduces the average cost of performing the partitioning phase of the hash join algorithm. We describe a relatively inexpensive method for determining a good MAH indexing scheme. Our experiments show that the schemes found using this method are usually optimal and perform the partitioning phase of the hash join algorithm at least three times faster than the standard approach. We show that a significant change in the query pattern is required before a reorganization of the data file becomes necessary, and that reorganizing the data file is an inexpensive operation.
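The idea that an MAH-organized file can make the partitioning phase unnecessary can be illustrated with a minimal sketch. It assumes that a bucket address is formed by concatenating a fixed number of hash bits from each indexed attribute, so that the bits contributed by a join attribute already define a partition of the file. The function names, the use of `crc32` as a stand-in hash function, and the bit allocation are all hypothetical choices for illustration; in particular, the allocation below is picked by hand and is not the optimized scheme the abstract refers to.

```python
from collections import defaultdict
from zlib import crc32


def attr_bits(value, nbits):
    """Return the low `nbits` bits of a deterministic hash of `value`."""
    if nbits == 0:
        return 0
    return crc32(repr(value).encode()) & ((1 << nbits) - 1)


def mah_address(record, allocation):
    """Concatenate per-attribute hash bits into a single bucket address.

    `allocation` is an ordered list of (attribute_name, nbits) pairs.
    The last attribute in the list ends up in the low-order bits.
    """
    address = 0
    for attr, nbits in allocation:
        address = (address << nbits) | attr_bits(record[attr], nbits)
    return address


def build_mah_file(records, allocation):
    """Place each record in the bucket selected by its MAH address."""
    buckets = defaultdict(list)
    for record in records:
        buckets[mah_address(record, allocation)].append(record)
    return buckets


def partitions_on(buckets, allocation, join_attr):
    """Group existing buckets by the address bits that `join_attr` contributed.

    No record is rehashed or moved: the partition number is read directly
    from the bucket address. If `join_attr` has no bits allocated, every
    bucket falls into partition 0 and an ordinary partitioning pass would
    still be needed.
    """
    shift, width = 0, 0
    for attr, nbits in reversed(allocation):
        if attr == join_attr:
            width = nbits
            break
        shift += nbits
    parts = defaultdict(list)
    for address, recs in buckets.items():
        parts[(address >> shift) & ((1 << width) - 1)].extend(recs)
    return parts


# Hypothetical relation with attributes "a" and "b"; 2 bits for "a", 3 for "b".
records = [{"a": i % 7, "b": i % 11} for i in range(200)]
allocation = [("a", 2), ("b", 3)]
buckets = build_mah_file(records, allocation)
parts_b = partitions_on(buckets, allocation, "b")
```

If a second relation is stored with the same hash function and the same number of bits for the join attribute, partition `p` of one file only needs to be joined with partition `p` of the other, which is the situation in which a separate partitioning pass can be skipped under these assumptions.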
Similar resources
A Truncating Hash Algorithm for Processing Band-Join Queries
A non-equijoin of relations R and S is a band join if the join predicate requires values in the join attribute of R to fall within a specified band about the values in the join attribute of S. This paper describes a new algorithm, termed a truncating-hash band join, for evaluating band joins. This algorithm is based on the idea of truncating join attribute values in order to execute band joins ...
Analyzing In-Memory Hash Joins: Granularity Matters
Predicting the performance of join algorithms on modern hardware is challenging. In this work, we focus on main-memory no-partitioning and partitioning hash join algorithms executing on multi-core platforms. We discuss the main parameters impacting performance, and present an effective performance model. This model can be used to select the most appropriate algorithm for different input data-set...
Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited
In this paper we experimentally study the performance of main-memory, parallel, multi-core join algorithms, focusing on sort-merge and (radix-)hash join. The relative performance of these two join approaches has been a topic of discussion for a long time. With the advent of modern multicore architectures, it has been argued that sort-merge join is now a better choice than radix-hash join. This...
Optimal Clustering of Relations to Improve Sorting and Partitioning for Joins
The sorting or partitioning of relations is very common in relational database systems. Implementations of the join operation include the sort-merge join algorithm, which sorts both relations, and the hash join algorithm, which usually partitions both relations. We describe how clustering records using an optimal multi-attribute hash (MAH) file, taking the query pattern and distribution into acc...
An Improved Hash Function Based on the Tillich-Zémor Hash Function
Using the idea behind the Tillich-Zémor hash function, we propose a new hash function. Our hash function is parallelizable and its collision resistance is implied by a hardness assumption on a mathematical problem. Also, it is secure against the known attacks. It is the most secure variant of the Tillich-Zémor hash function until now.