Using Difficulty of Prediction to Decrease Computation: Fast Sort, Priority Queue and Convex Hull on Entropy Bounded Inputs
Authors
Abstract
There has recently been an upsurge of interest in the theoretical computer science community in the Markov model and in more general stationary ergodic stochastic distributions (see, e.g., [Vitter,Krishnan91], [Karlin,Philips,Raghavan92], [Raghavan92] for the use of Markov models in on-line algorithms such as caching and prefetching). Their results used the fact that compressible sources are predictable (and vice versa), and showed that on-line algorithms can improve their performance by prediction. Actual page access sequences are in fact somewhat compressible, so their predictive methods can be of benefit. This paper investigates the interesting idea of decreasing computation by using learning in the opposite way, namely to determine the difficulty of prediction. That is, we approximately learn the input distribution and then improve the performance of the computation when the input is not too predictable, rather than the reverse. To our knowledge, this is the first case of a computational problem where we do not assume any particular fixed input distribution and yet computation is decreased when the input is less predictable. We concentrate our investigation on a basic computational problem, sorting, and a basic data structure problem, maintaining a priority queue. We present the first known sorting and priority queue algorithms whose complexity depends on the binary entropy H ≤ 1 of the input keys, where we assume that the input keys are generated from an unknown but arbitrary stationary ergodic source. That is, we assume that each of the input keys can be arbitrarily long but has entropy H. Note that H can be estimated in practice, since the compression ratio ρ achieved by optimal Ziv-Lempel compression tends to 1/H for large inputs.
Although sets of keys found in practice cannot be expected to satisfy any fixed particular distribution such as the uniform distribution, there is a large, well-documented body of empirical evidence showing that this compression ratio ρ, and thus 1/H, is a constant for realistic inputs encountered in practice [1, 31], typically around 3 and at most 20. Our algorithm runs in O(n log(log n / H)) sequential expected time to sort n keys on a unit-cost sequential RAM machine. This is O(n log log n) with the very reasonable assumption that the compression ...
*Email addresses: reif@cs.duke.edu and chen@cs.duke.edu. Supported by DARPA/ISTO Contracts N00014-88-K-0458, DARPA N00014-91-J-1985, N00014-91-C-0114, NASA subcontract 550-63 of prime contract NAS5-30428, US-Israel Binational NSF Grant 8800282/2, and NSF Grant NSF-IRI-91-00681. 0272-5428/93 $03.00
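The abstract's remark that H can be estimated from a compression ratio can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes Python's standard `zlib` (DEFLATE, an LZ77-style compressor rather than an optimal Ziv-Lempel coder) as a stand-in, so the result is only a rough upper estimate of H. The function names `estimate_entropy_per_bit` and `compression_ratio` are hypothetical and chosen for illustration.

```python
import zlib


def estimate_entropy_per_bit(data: bytes, level: int = 9) -> float:
    """Rough estimate of the entropy H per input bit (0 < H <= 1).

    Assumption: zlib stands in for an optimal Ziv-Lempel compressor, so
    compressed_size / original_size only approximates H from above.
    """
    if not data:
        raise ValueError("need a non-empty input to estimate entropy")
    compressed = zlib.compress(data, level)
    # Container overhead can push the quotient above 1 on incompressible
    # input, so clamp to the valid range for a per-bit entropy.
    return min(1.0, len(compressed) / len(data))


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Compression ratio rho, which approximates 1/H for large inputs."""
    return 1.0 / estimate_entropy_per_bit(data, level)
```

On a concatenation of realistic keys, `compression_ratio` would be expected to land near the constant range the abstract mentions, although short inputs are dominated by compressor overhead and give unreliable estimates.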
Similar resources
Using Difficulty of Prediction to Decrease Computation: Fast Sort, Priority Queue and Convex Hull on Entropy Bounded Inputs
There is an upsurge in interest in the Markov model and also more general stationary ergodic stochastic distributions in the theoretical computer science community recently (e.g. see [Vitter,Krishnan91], [Karlin,Philips,Raghavan92], [Raghavan92] for use of Markov models for on-line algorithms, e.g., caching and prefetching). Their results used the fact that compressible sources are predictable (and vi...
Using Learning and Difficulty of Prediction to Decrease Computation: A Fast Sort and Priority Queue on Entropy Bounded Inputs
There is an upsurge in interest in the Markov model and also more general stationary ergodic stochastic distributions in the theoretical computer science community recently (e.g. see [Vitter,Krishnan,FOCS91], [Karlin,Philips,Raghavan,FOCS92], [Raghavan92] for use of Markov models for on-line algorithms, e.g., caching and prefetching). Their results used the fact that compressible sources are predic...
Memory-Adjustable Navigation Piles with Applications to Sorting and Convex Hulls
We consider space-bounded computations on a random-access machine (RAM) where the input is given on a read-only random-access medium, the output is to be produced to a write-only sequential-access medium, and the available workspace allows random reads and writes but is of limited capacity. The length of the input is N elements, the length of the output is limited by the computation, and the cap...
Sweep Line Algorithm for Convex Hull Revisited
The convex hull of a set of given points is the intersection of all convex sets containing them. It is used as a primary structure in many other problems in computational geometry and in other areas such as image processing, model identification, geographical data systems, and triangular computation of a set of points. Computing the convex hull of a set of points is one of the most fundamental and imp...
Active Data Structures and Applications to Dynamic and Kinetic Algorithms
We propose and study a novel data-structuring paradigm, called active data structures. Like a time machine, active data structures allow changes to occur not only in the present but at any point in time—including the past. Unlike most time machines, where changes to the past are incorporated and propagated automatically by magic, active data structures systematically communicate with the affect...