Stream Quantiles via Maximal Entropy Histograms
Authors
Abstract
We address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited. We (i) highlight the limitations of approaches previously described in the literature which make them unsuitable for non-stationary streams, (ii) describe a novel principle for the utilization of the available storage space, and (iii) introduce two novel algorithms which exploit the proposed principle. Experiments on three large real-world data sets demonstrate that the proposed methods vastly outperform the existing alternatives.
Similar resources
CR-precis: A Deterministic Summary Structure for Update Data Streams
We present deterministic sub-linear space algorithms for problems over update data streams, including estimating frequencies of items and ranges, finding approximate frequent items and approximate φ-quantiles, estimating inner products, constructing near-optimal B-bucket histograms, and estimating entropy. We also present improved lower bounds for several problems over update data streams.
Measures of maximal entropy
We extend the results of Walters on the uniqueness of invariant measures with maximal entropy on compact groups to an arbitrary locally compact group. We show that the maximal entropy is attained at the left Haar measure and the measure of maximal entropy is unique.
Deterministically Estimating Data Stream Frequencies
We consider updates to an n-dimensional frequency vector of a data stream; that is, the vector f is updated coordinate-wise by insertions or deletions in arbitrary order. A fundamental problem in this model is to recall the vector approximately, that is, to return an estimate f̂ of f such that |f̂_i − f_i| < ε‖f‖_p for every i = 1, 2, ..., n, where ε is an accuracy parameter and p is th...
arXiv:cs/0609032v1 [cs.DS] 7 Sep 2006 CR-precis: A deterministic summary structure for update data streams
We present the CR-precis structure, a general-purpose, deterministic, sub-linear data structure for summarizing update data streams. The CR-precis structure yields the first deterministic sub-linear space/time algorithms for update streams for answering a variety of fundamental stream queries, such as (a) point queries, (b) range queries, (c) finding approximate frequent items, (d) ...