Efficient Approximation of Correlated Sums on Data Streams

نویسندگان

  • Rohit Ananthakrishna
  • Abhinandan Das
  • Johannes Gehrke
  • Flip Korn
  • S. Muthukrishnan
  • Divesh Srivastava
چکیده

In many applications such as IP network management, data arrives in streams, and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries formed by composing basic aggregates on (x, y) pairs, and are of the form SUM{g(y) : x ≤ f(AGG(x))}, where AGG(x) can be any basic aggregate and f(), g() are user-specified functions. CSaggregates cannot be computed exactly in one pass through a data stream using limited storage; hence, we study the problem of computing approximate CS-aggregates. We guarantee a priori error bounds when AGG(x) can be computed in limited space (e.g., MIN, MAX, AVG), using two variants of Greenwald and Khanna’s summary structure for the approximate computation of quantiles. Using real data sets, we experimentally demonstrate that an adaptation of the quantile summary structure uses much lesser space, and is significantly faster, than a more direct use of the quantile summary structure, for the same a posteriori error bounds. Finally, we prove that, when AGG(x) is a quantile (which cannot be computed over a data stream in limited space), the error of a CS-aggregate can be arbitrarily large. Index: Correlated aggregates, data streams, approximation, summary structures, a priori error bounds, IP network management.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Time-Decayed Correlated Aggregates over Data Streams

Data stream analysis frequently relies on identifying correlations and posing conditional queries on the data after it has been seen. Correlated aggregates form an important example of such queries, which ask for an aggregation over one dimension of stream elements which satisfy a predicate on another dimension. Since recent events are typically more important than older ones, time decay should...

متن کامل

Influence of Stream channel morphology and in-stream habitats on fish community in Golestan province Streams

Four streams with different sizes were selected for studying the effects of environmental factors on fish assemblages using indirect (Detrended Correspondence Analysis, DCA) and direct (Redundancy Analysis, RDA) gradient analysis in Golestan province. DCA of presence-absence and relative abundance data showed well gradient and linear model of species variability. In the within-site RDA, environ...

متن کامل

Efficient simulation of tail probabilities of sums of correlated lognormals

We consider the problem of efficient estimation of tail probabilities of sums of correlated lognormals via simulation. This problem is motivated by the tail analysis of portfolios of assets driven by correlated Black-Scholes models. We propose two estimators that can be rigorously shown to be efficient as the tail probability of interest decreases to zero. The first estimator, based on importan...

متن کامل

On the Variance of Subset Sum Estimation

For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries to arbitrary subset sums. With unit weights, we can compute subset sizes which together with the previous sums provide the subset averages. The question addre...

متن کامل

Some results of 2-periodic functions by Fourier sums in the space Lp(2)

In this paper, using the Steklov function, we introduce the generalized continuity modulus and denethe class of functions Wr;kp;' in the space Lp. For this class, we prove an analog of the estimates in [1]in the space Lp.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IEEE Trans. Knowl. Data Eng.

دوره 15  شماره 

صفحات  -

تاریخ انتشار 2003