Selectivity Estimation for Joins Using Systematic Sampling
Abstract
We propose a new approach to estimating join selectivity. The technique, which we call “systematic sampling”, is a novel variant of the sampling-based approach. It works as follows: given a relation R of N tuples whose join attribute can be accessed in ascending or descending order via an index, and a target of n tuples to sample from R, select one tuple at random from the first k = ⌈N/n⌉ tuples of R and then every kth tuple thereafter. We first develop a theoretical foundation for systematic sampling, which suggests that the method gives a more representative sample than traditional simple random sampling. Subsequent experimental analysis on a range of synthetic relations confirms that the sample relations (participating in a join) yielded by systematic sampling are of higher quality than those produced by simple random sampling. To verify that the sample relations produced by systematic sampling do lead to more accurate join selectivity estimates, we compare systematic sampling with the most efficient simple random sampling method, called t_cross, using a variety of star joins and relation configurations. The results demonstrate that, for the same amount of sampling, systematic sampling provides considerably more accurate join selectivities than t_cross sampling.
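The selection rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `systematic_sample`, the use of an in-memory list standing in for index-ordered access to R, and the `rng` parameter are all assumptions made for the example.

```python
import math
import random


def systematic_sample(ordered_tuples, n, rng=None):
    """Return a systematic sample of roughly n items.

    `ordered_tuples` is assumed to be already sorted on the join
    attribute (a stand-in for reading R through an index).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = rng or random.Random()
    total = len(ordered_tuples)
    if total == 0:
        return []
    # Sampling interval k = ceil(N / n), as stated in the abstract.
    k = math.ceil(total / n)
    # Pick a random starting tuple among the first k tuples ...
    start = rng.randrange(k)
    # ... then take every kth tuple thereafter.
    return ordered_tuples[start::k]


if __name__ == "__main__":
    relation = sorted(random.Random(7).choices(range(1000), k=100))
    sample = systematic_sample(relation, 10, random.Random(42))
    print(len(sample), sample)
```

Because k is rounded up, the sample can contain slightly fewer than n tuples when N is not a multiple of n, depending on the random starting position.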
Similar resources
Selectivity Estimation for Spatial Joins
Spatial Joins are important and time consuming operations in spatial database management systems. It is crucial to be able to accurately estimate the performance of these operations so that one can derive efficient query execution plans, and even develop/refine data structures to improve their performance. While estimation techniques for analyzing the performance of other operations, such as ra...
A Study on the Accuracy and Precision of Estimation of the Number, Basal Area and Standing Trees Volume per Hectare Using of some Sampling Methods in Forests of NavAsalem
The present study aimed to investigate the accuracy and precision estimation of the number, basal area and volume of the standing trees by methods of random and systematic random sampling in the forests of West Guilan. The cost or inventory time was determined using the criteria (E%2 × T). Inventory was carried out by complete sampling (census) in an area of 52 hectares. The study area (sect...
Selectivity Estimation for Spatial Joins with Geometric Selections
Spatial join is an expensive operation that is commonly used in spatial database systems. In order to generate efficient query plans for the queries involving spatial join operations, it is crucial to obtain accurate selectivity estimates for these operations. In this paper we introduce a framework for estimating the selectivity of spatial joins constrained by geometric selections. The center p...
Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models
Accurately predicting the cardinality of intermediate plan operations is an essential part of any modern relational query optimizer. The accuracy of said estimates has a strong and direct impact on the quality of the generated plans, and incorrect estimates can have a negative impact on query performance. One of the biggest challenges in this field is to predict the result size of join operatio...
Multi-way spatial join selectivity for the ring join graph
Efficient spatial query processing is very important since the applications of the spatial DBMS (e.g. GIS, CAD/CAM, LBS) handle massive amount of data and consume much time. Many spatial queries contain the multi-way spatial join due to the fact that they compute the relationships (e.g. intersect) among the spatial data. Thus, accurate estimation of the spatial join selectivity is essential to ...