Counting Distinct Elements in a Data Stream
Authors
Abstract
We present three algorithms to count the number of distinct elements in a data stream to within a factor of 1 ± ε. Our algorithms improve upon known algorithms for this problem, and offer a spectrum of time/space tradeoffs.
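To make the (1 ± ε) guarantee concrete, here is a minimal sketch of a generic k-minimum-values estimator. It is a swapped-in illustration and is not claimed to be any of the three algorithms from the paper. The names `estimate_distinct` and `_hash01`, the use of SHA-256 as the hash, and the default `k` are all assumptions made for this example. The idea is to hash every element to a point in [0, 1), keep only the k smallest distinct hash values, and estimate the count as (k - 1) divided by the k-th smallest value; larger k typically gives a smaller relative error at the cost of more space.

```python
import hashlib
import heapq


def _hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1) (hypothetical helper)."""
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(stream, k=256):
    """Estimate the number of distinct items in one pass using O(k) space."""
    heap = []        # max-heap (values negated) of the k smallest hashes seen
    members = set()  # the same hashes, for duplicate detection
    for item in stream:
        h = _hash01(item)
        if h in members:
            continue  # this hash is already among the k smallest
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # h displaces the current k-th smallest hash value
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes: the count is exact
    return int((k - 1) / -heap[0])
```

For example, `estimate_distinct([1, 2, 2, 3, 1], k=16)` sees fewer than `k` distinct items and so returns the exact count, while on a long stream the result is only an approximation whose accuracy depends on `k`.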
Related resources
An Evaluation of Streaming Algorithms for Distinct Counting Over a Sliding Window
Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggregation task in database query processing, query optimization, and network monitoring. On a stream of elements, it is commonly needed to compute an aggregate over only the most recent elements, leading to the problem of distinct counting over a “sliding window” of the stream. We present a detailed...
Sketching and streaming —
Distinct elements (F0). In this note we will consider the distinct elements problem, also known as the F0 problem, defined as follows. We are given a stream of integers i_1, ..., i_m ∈ [n], where [n] denotes the set {1, 2, ..., n}. We would like to output the number of distinct elements seen in the stream. As with Morris’ approximate counting algorithm, our goal will be to minimize our space...
Range-Efficient Counting of Distinct Elements in a Massive Data
Efficient one-pass estimation of F0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of F0: estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer, but an interval of integers. We present a randomized alg...
Data Streams as Random Permutations: the Distinct Element Problem
We illustrate this by introducing RECORDINALITY, an algorithm which estimates the number of distinct elements in a stream by counting the number of k-records occurring in it. The algorithm has a score of interesting properties, such as providing a random sample of the set underlying the stream. To the best of our knowledge, a modified version of RECORDINALITY is the first cardinality estimation...
Range-Efficient Counting of Distinct Elements in a Massive Data Stream
Efficient one-pass estimation of F0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of F0: estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer but an interval of integers. We present a randomized algor...
Journal title:
Volume, issue
Pages -
Publication date: 2002