A Probabilistic Analysis of Trie-Based Sorting of Large Collections of Line Segments in Spatial Databases

Authors

  • Michael Lindenbaum
  • Hanan Samet
  • Gísli R. Hjaltason
Abstract

The size of five trie-based methods of sorting large collections of line segments in a spatial database is investigated analytically using a random lines image model and geometric probability techniques. The methods are based on sorting the line segments with respect to the space that they occupy. Since the space is two-dimensional, the trie is formed by interleaving the bits corresponding to the binary representation of the x and y coordinates of the underlying space and then testing two bits at each iteration. The result of this formulation yields a class of representations that are referred to as quadtrie variants, although they have been traditionally referred to as quadtree variants. The analysis differs from prior work in that it uses a detailed explicit model of the image instead of relying on modeling the branching process represented by the tree and leaving the underlying image unspecified. The analysis provides analytic expressions and bounds on the expected size of these quadtree variants. This enables the prediction of storage required by the representations and of the associated performance of algorithms that rely on them. The results are useful in the following two ways:

1. They reveal the properties of the various representations and permit their comparison using analytic, nonexperimental criteria. Some of the results confirm previous analyses (e.g., that the storage requirement of the MX quadtree is proportional to the total lengths of the line segments). An important new result is that for a PMR and Bucket PMR quadtree with sufficiently high values of the splitting threshold (i.e., ≥ 4) the number of nodes is proportional to the number of line segments and is independent of the maximum depth of the tree. This provides a theoretical justification for the good behavior and use of the PMR quadtree, which so far has been only of an empirical nature.

2. The random lines model was found to be general enough to approximate real data in the sense that the properties of the trie representations, when used to store real data (e.g., maps), are similar to their properties when storing random lines data. Therefore, by specifying an equivalent random lines model for a real map, the proposed analytical expressions can be applied to predict the storage required for real data. Specifying the equivalent random lines model requires only an estimate of the effective number of random lines in it. Several such estimates are derived for real images, and the accuracy of the implied predictions is demonstrated on a real collection of maps. The agreement between the predictions and real data suggests that they could serve as the basis of a cost model that can be used by a query optimizer to generate an appropriate query evaluation plan.
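The bit-interleaving step described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `interleave_bits` and `quadrant_path`, the `depth` parameter, and the choice to place the y bit above the x bit in each pair are assumptions made for illustration only. The sketch builds a key by interleaving the bits of integer x and y coordinates, then reads the key back two bits at a time, which is the sequence of quadrant choices a quadtrie would test from the root downward.

```python
def interleave_bits(x: int, y: int, depth: int) -> int:
    """Interleave the low `depth` bits of x and y into one key.

    Assumption for this sketch: within each 2-bit pair the y bit is the
    high bit and the x bit is the low bit. The most significant
    coordinate bits end up in the most significant pair of the key.
    """
    key = 0
    for level in range(depth - 1, -1, -1):
        x_bit = (x >> level) & 1
        y_bit = (y >> level) & 1
        key = (key << 2) | (y_bit << 1) | x_bit
    return key


def quadrant_path(x: int, y: int, depth: int) -> list[int]:
    """Return the quadrant index (0..3) tested at each level, root first."""
    key = interleave_bits(x, y, depth)
    return [(key >> (2 * level)) & 0b11 for level in range(depth - 1, -1, -1)]


if __name__ == "__main__":
    # A point in an 8 x 8 grid (3 bits per coordinate).
    print(interleave_bits(5, 3, 3))
    print(quadrant_path(5, 3, 3))
```

Testing two bits per iteration means each trie level corresponds to one four-way split of the space, which is why the resulting structures are described as quadtrie (quadtree) variants.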

Similar resources

An Adaptive Algorithm for Splitting Large Sets of Strings and Its Application to Efficient External Sorting

In this paper, we study the problem of sorting a large collection of strings in external memory. Based on adaptive construction of a summary data structure, called adaptive synopsis trie, we present a practical string sorting algorithm DistStrSort, which is suitable for sorting string collections of large size in external memory, and also suitable for more complex string processing problems in t...

Periodic Oscillations in the Analysis of Algorithms and Their Cancellations

A large number of results in analysis of algorithms contain fluctuations. A typical result might read “The expected number of . . . for large n behaves like log2 n + constant + delta(log2 n), where delta(x) is a periodic function of period one and mean zero.” Examples include various trie parameters, approximate counting, probabilistic counting, radix exchange sort, leader election, skip lists,...

Layout Analysis based on Text Line Segment Hypotheses

This paper describes on-going work in the development of a document layout analysis system based on text line segments. It describes two novel algorithms: gapped text line finding, which can identify text line segments, taking into account per-text line font information for the determination of where text line segments break, and reading order recovery for text line segments using topological s...

Modeling of a Probabilistic Re-Entrant Line Bounded by Limited Operation Utilization Time

This paper presents an analytical model based on mean value analysis (MVA) technique for a probabilistic re-entrant line. The objective is to develop a solution method to determine the total cycle time of a Reflow Screening (RS) operation in a semiconductor assembly plant. The uniqueness of this operation is that it has to be borrowed from another department in order to perform the production s...

Probabilistic View of Occurrence of Large Earthquakes in Iran

In this research seismicity parameters, repeat times and occurrence probability of large earthquakes are estimated for 35 seismic lineaments in Persian plateau and the surrounding area. 628 earthquakes of historical time and present century with MW>5.5 were used for further data analysis. A probabilistic model is used for forecasting future large earthquake occurrences in each chosen lineament....

Journal:
  • SIAM J. Comput.

Volume 35  Issue 

Pages  -

Publication date 2005