An Approximate Lp-Difference Algorithm for Massive Data Streams
Authors
Abstract
Several recent papers have shown how to approximate the difference $\sum_i |a_i - b_i|$ or $\sum_i |a_i - b_i|^2$ between two functions, when the function values $a_i$ and $b_i$ are given in a data stream, and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream, and they approximate with small relative error. Using different techniques, we show how to approximate the $L^p$-difference $\sum_i |a_i - b_i|^p$ for any rational-valued $p \in (0, 2]$, with comparable efficiency and error. We also show how to approximate $\sum_i |a_i - b_i|^p$ for larger values of $p$ but with a worse error guarantee. Our results fill in gaps left by recent work, by providing an algorithm that is precisely tunable for the application at hand. These results can be used to assess the difference between two chronologically or physically separated massive data sets, making one quick pass over each data set, without buffering the data or requiring the data source to pause. For example, one can use our techniques to judge whether the traffic on two remote network routers is similar without requiring either router to transmit a copy of its traffic. A web search engine could use such algorithms to construct a library of small "sketches," one for each distinct page on the web; one can approximate the extent to which new web pages duplicate old ones by comparing the sketches of the web pages. Such techniques will become increasingly important as the enormous scale, distributional nature, and one-pass processing requirements of data sets become more commonplace.
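To make the idea of comparing two streams through small sketches concrete, the following is a minimal Python illustration of the simplest case, $p = 2$. It is not the algorithm of this paper (which covers general $p \in (0, 2]$ with different techniques); it is a sketch of the well-known random-sign projection approach, under stated assumptions. The names `sign`, `L2Sketch`, and `estimate_l2_squared` are hypothetical, and a cryptographic hash stands in for the limited-independence random variables that a space-efficient implementation would use.

```python
import hashlib
import statistics


def sign(seed: int, row: int, index: int) -> int:
    """Pseudo-random +1/-1 for (row, index).

    Assumption: a hash replaces the limited-independence sign
    families used in the streaming literature.
    """
    key = f"{seed}:{row}:{index}".encode()
    digest = hashlib.blake2b(key, digest_size=1).digest()
    return 1 if digest[0] & 1 else -1


class L2Sketch:
    """One-pass summary of a stream of (index, value) items."""

    def __init__(self, rows: int, seed: int = 0):
        self.seed = seed
        self.counters = [0] * rows

    def update(self, index: int, value: int) -> None:
        # Items may arrive in any order; each one touches every counter.
        for r in range(len(self.counters)):
            self.counters[r] += value * sign(self.seed, r, index)


def estimate_l2_squared(sa: L2Sketch, sb: L2Sketch, groups: int = 5) -> float:
    """Estimate sum_i |a_i - b_i|^2 from two sketches built with the same seed.

    Squared counter differences are averaged within groups, and the
    median of the group means is returned (median-of-means).
    """
    if sa.seed != sb.seed or len(sa.counters) != len(sb.counters):
        raise ValueError("sketches must share seed and size")
    sq = [(x - y) ** 2 for x, y in zip(sa.counters, sb.counters)]
    size = len(sq) // groups
    means = [sum(sq[g * size:(g + 1) * size]) / size for g in range(groups)]
    return statistics.median(means)
```

Each data source only needs to keep and transmit its counters, whose number is independent of the stream length; the two sketches can be built at different times or places as long as they share the seed. More rows tighten the estimate at the cost of more space and more work per item.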
Similar resources
An Approximate Lp-Difference Algorithm for Massive Data Streams
Several recent papers have shown how to approximate the difference $\sum_i |a_i - b_i|$ or $\sum_i |a_i - b_i|^2$ between two functions, when the function values $a_i$ and $b_i$ are given in a data stream, and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream. They approximate with small rel...
Orr Sommerfeld Solver Using Mapped Finite Difference Scheme for Plane Wake Flow
Linear stability analysis of the three-dimensional plane wake flow is performed using a mapped finite difference scheme in a domain which is doubly infinite in the cross-stream direction of the wake flow. The physical domain in the cross-stream direction is mapped to the computational domain using a cotangent mapping of the form y = ?cot(??). The Squire transformation [2] is also us...
Fixed-structure discrete-time H2/H-infinity controller synthesis using the delta operator
This paper considers the fixed-structure, discrete-time mixed H2/H-infinity controller synthesis problem in the delta operator (difference operator) framework. The differential operator and shift operator versions of the problem are reviewed for comparison, and necessary conditions are derived for all three formulations. A quasi-Newton/continuation algorithm is then used to obtain approximate solutions ...
An Approximate L1-Difference Algorithm for Massive Data Streams
We give a space-efficient, one-pass algorithm for approximating the $L^1$ difference $\sum_i |a_i - b_i|$ between two functions, when the function values $a_i$ and $b_i$ are given as data streams, and their order is chosen by an adversary. Our main technical innovation is a method of constructing families $\{V_j\}$ of limited-independence random variables that are range-summable, by which we mean that $\sum_{j=0}^{c-1} V_j(s)$ i...
An Approximate L-Difference Algorithm for Massive Data Streams
Massive data sets are increasingly important in a wide range of applications, including observational sciences, product marketing, and monitoring and operations of large systems. In network operations, raw data typically arrive in streams, and decisions must be made by algorithms that make one pass over each stream, throw much of the raw data away, and produce “synopses” or “sketches” for furth...