A Cost Model for Estimating the Performance of Spatial Joins Using R-trees
Abstract
Developing a cost model that predicts the performance of spatial joins has been identified in the literature as an important and difficult problem. In this paper, we present the first cost model that can predict the performance of spatial joins using R-trees. Given two existing R-trees (the join targets), our model first estimates the expected number of I/Os for the join process under the assumption of a zero-size buffer. This estimate extends the cost model for R-tree window queries (developed by Kamel and Faloutsos and by Pagel et al.) to the more complex case of spatial joins. In spatial join processing, however, the zero-buffer I/O count alone is not a practical predictor of performance in a buffered environment. To model the impact of the buffer, we use an exponential distribution function to measure the probability that an I/O counted in the bufferless case would cause a page fault in a buffered environment. From this probability and the zero-buffer expected I/O cost, the estimated number of I/Os for an R-tree join can then be computed. Comparisons between the predictions of our cost model and the results of our experiments on real GIS maps show an average relative error of about 10%, with a maximum of about 20%, over a wide range of buffer sizes. Our model is therefore a useful tool for the query optimization of spatial join queries.

This work was supported in part by the University of Michigan ITS Research Center of Excellence grant (DTFH61-93-X-00017-Sub) sponsored by the U.S. Dept. of Transportation and by the Michigan Dept. of Transportation. N. Jing was supported in part by the State Education Commission of P.R. China.
† This work was performed while the author was at the University of Michigan.
‡ This work was performed while the author was a visitor at the University of Michigan.
§ This work was performed while the author was a faculty member of the University of Michigan.
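The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. The first function uses the well-known Kamel and Faloutsos estimate for window queries, in which the expected number of node accesses is the sum of (w_i + q_x)(h_i + q_y) over all node rectangles in a unit-square workspace. The abstract does not give the join formula or the exact exponential function, so `page_fault_probability`, its `rate` parameter, and `buffered_io_estimate` are hypothetical placeholders that only show how a zero-buffer count could be scaled by a page-fault probability; they are not the paper's formulas.

```python
import math
from typing import Iterable, Tuple

# (width, height) of a node's bounding rectangle, normalized to the unit square.
Extent = Tuple[float, float]


def window_query_accesses(node_extents: Iterable[Extent], qx: float, qy: float) -> float:
    """Expected node accesses for a qx-by-qy window query (Kamel-Faloutsos form).

    Each node of extent (w, h) is hit with probability (w + qx) * (h + qy),
    ignoring boundary effects of the unit-square workspace.
    """
    return sum((w + qx) * (h + qy) for w, h in node_extents)


def page_fault_probability(buffer_pages: int, tree_pages: int, rate: float = 1.0) -> float:
    """ASSUMED exponential form: probability that a bufferless I/O still faults.

    Equals 1.0 with no buffer and decays as the buffer grows relative to the
    tree size. The functional form and `rate` are illustrative only.
    """
    if tree_pages <= 0:
        raise ValueError("tree_pages must be positive")
    return math.exp(-rate * buffer_pages / tree_pages)


def buffered_io_estimate(zero_buffer_io: float, buffer_pages: int,
                         tree_pages: int, rate: float = 1.0) -> float:
    """Scale the zero-buffer I/O count by the assumed page-fault probability."""
    return zero_buffer_io * page_fault_probability(buffer_pages, tree_pages, rate)


if __name__ == "__main__":
    extents = [(0.1, 0.2), (0.3, 0.1)]
    print(window_query_accesses(extents, 0.1, 0.1))
    print(buffered_io_estimate(100.0, buffer_pages=50, tree_pages=100))
```

With a zero-size buffer the assumed probability is 1, so the buffered estimate reduces to the zero-buffer count, which matches the role the abstract assigns to the bufferless estimate as the starting point.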
Similar resources
A Fast Algorithm for high-dimensional Similarity Joins
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding...
High-dimensional Proximity Joins
Many emerging data mining applications require a proximity (similarity) join between points in a high-dimensional domain. We present a new algorithm that utilizes a new data structure, called the ε-kd tree, for fast spatial proximity joins on high-dimensional points. This data structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal c...
Cost models for distance joins queries using R-trees
The K-Closest-Pairs Query (K-CPQ), a type of distance join in spatial databases, discovers the K pairs of objects formed from two different datasets with the K smallest distances. Recently, branch-and-bound algorithms based on R-trees have been developed in order to answer K-CPQs efficiently. For query optimization purposes, analytical models are needed to estimate the processing cost of a spec...
An Efficient Cost Model for Spatial Joins Using R-trees
Spatial join is one of the fundamental operations in a Spatial Data Base Management System. Recently, the family of R-tree-based data structures has been adopted to support the execution of spatial joins. This paper introduces an analytical model that efficiently estimates the cost (in terms of disk accesses) of a spatial join query between two spatial datasets. The proposed model is based on a...
The efficiency of sampling indices in estimating the spatial pattern of wooden species in central zagros forests (Kalkhani forest in Kouhdasht, Lorestan province, Iran)
Applying suitable methods is important for obtaining a reliable estimate of the spatial distribution of trees. This research aimed to determine and evaluate the spatial pattern of five species (Quercus brantii, Acer monspessulanum, Crataegus aronia, Pistacia atlantica, and Amygdalus lycioides) using distance- and density-based indices in the Kalkhani Forest in Kouhdasht, Lorestan province, Iran. ...