A Generalized Join Algorithm
نویسنده
چکیده
Database query processing traditionally relies on three alternative join algorithms: index nested loops join exploits an index on its inner input, merge join exploits sorted inputs, and hash join exploits differences in the sizes of the join inputs. Cost-based query optimization chooses the most appropriate algorithm for each query and for each operation. Unfortunately , mistaken algorithm choices during compile-time query optimization are common yet expensive to investigate and to resolve. Our goal is to end mistaken choices among join algorithms by replacing the three traditional join algorithms with a single one. Like merge join, this new join algorithm exploits sorted inputs. Like hash join, it exploits different input sizes for unsorted inputs. In fact, for unsorted inputs, the cost functions for recursive hash join and for hybrid hash join have guided our search for the new join algorithm. In consequence, the new join algorithm can replace both merge join and hash join in a database management system. The in-memory components of the new join algorithm employ indexes. If the database contains indexes for one (or both) of the inputs, the new join can exploit persistent indexes instead of temporary in-memory indexes. Using database indexes to match input records, the new join algorithm can also replace index nested loops join. Results from an implementation of the core algorithm are reported.
منابع مشابه
GYM: A Multiround Join Algorithm In MapReduce And Its Analysis
We study the problem of computing the join of n relations in multiple rounds of MapReduce. We introduce a distributed and generalized version of Yannakakis’s algorithm, called GYM. GYM takes as input any generalized hypertree decomposition (GHD) of a query of width w and depth d, and computes the query in O(d+log(n)) rounds andO(n (IN +OUT) M ) communication cost, where M is the memory availabl...
متن کاملبهبود بهروزرسانی پایگاه داده تحلیلی نیمهآنی
Near-real time data warehouse gives the end users the essential information to achieve appropriate decisions. Whatever the data are fresher in it, the decision would have a better result either. To achieve a fresh and up-to-date data, the changes happened in the side of source must be added to the data warehouse with little delay. For this reason, they should be transformed in to the data wareh...
متن کاملChromatic and flow polynomials of generalized vertex join graphs and outerplanar graphs
A generalized vertex join of a graph is obtained by joining an arbitrary multiset of its vertices to a new vertex. We present a low-order polynomial time algorithm for finding the chromatic polynomials of generalized vertex joins of trees, and by duality we find the flow polynomials of arbitrary outerplanar graphs. We also present closed formulas for the chromatic and flow polynomials of vertex...
متن کاملGeneralized Parallel Join Algorithms and Designing Cost Models
Applications for large-scale data analysis use such techniques as parallel DBMS, MapReduce (MR) paradigm, and columnar storage. In this paper we focus in a MapReduce environment. The aim of this work is to compare the different join algorithms and designing cost models for further use in the query optimizer.
متن کاملGeneralized Hash Teams for Join and Group-by
We propose a new class of algorithms that can be used to speed up the execution of multi-way join queries or of queries that involve one or more joins and a group-by. These new evaluation techniques allow to perform several hash-based operations (join and grouping) in one pass without repartitioning intermediate results. These techniques work particularly well for joining hierarchical structure...
متن کامل