Adaptive sampling for geometric problems over data streams
نویسندگان
چکیده
منابع مشابه
KSample: Dynamic Sampling Over Unbounded Data Streams
Data sampling over data streams is common practice to allow the analysis of data in real-time. However, sampling over data streams becomes complex when the stream does not fit in memory, and worse yet, when the length of the stream is unknown. A well-known technique for sampling data streams is the Reservoir Sampling. It requires a fixed-size reservoir that corresponds to the resulting sample s...
متن کاملWeighted Random Sampling over Data Streams
In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2,4]), discuss sampling with and without replacement and show adaptations of the algorithms for several WRS problems and evolving data streams.
متن کاملStratified Reservoir Sampling over Heterogeneous Data Streams
Reservoir sampling is a well-known technique for random sampling over data streams. In many streaming applications, however, an input stream may be naturally heterogeneous, i.e., composed of substreams whose statistical properties may also vary considerably. For this class of applications, the conventional reservoir sampling technique does not guarantee a statistically sufficient number of tupl...
متن کاملSensitivity Sampling Over Dynamic Geometric Data Streams with Applications to k-Clustering
Sensitivity based sampling is crucial for constructing nearly-optimal coreset for k-means / median clustering. In this paper, we provide a novel data structure that enables sensitivity sampling over a dynamic data stream, where points from a high dimensional discrete Euclidean space can be either inserted or deleted. Based on this data structure, we provide a one-pass coreset construction for k...
متن کاملClustering Geometric Data Streams
Using recent knowledge in data stream clustering we present a modified approach to the facility location problem in the context of geometric data streams. We give insight to the existing algorithm from a less mathematical point of view, focusing on understanding and practical use, namely by computer graphics experts. We propose a modification of the original data stream k-median clustering to s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Computational Geometry
سال: 2008
ISSN: 0925-7721
DOI: 10.1016/j.comgeo.2006.10.004