Estimating Dominance Norms of Multiple Data Streams
Authors
Abstract
There is much focus in the algorithms and database communities on designing tools to manage and mine data streams. Typically, data streams consist of multiple signals. Formally, a stream of multiple signals is $(i, a_{i,j})$, where the $i$'s correspond to the domain, the $j$'s index the different signals, and $a_{i,j} \ge 0$ gives the value of the $j$th signal at point $i$. We study the problem of finding cumulative norms over the multiple signals in the data stream. For example, consider the max-dominance norm, defined as $\sum_i \max_j \{a_{i,j}\}$. It may be thought of as estimating the norm of the "upper envelope" of the multiple signals, or alternatively, as estimating the norm of the "marginal" distribution of tabular data streams. It is used in applications to estimate the "worst case influence" of multiple processes, for example in IP traffic analysis, electrical grid monitoring, and the financial domain. In addition, it is a natural measure, generalizing the union of data streams or counting distinct elements in data streams. We present the first known data stream algorithms for estimating max-dominance of multiple signals. In particular, we use workspace and time-per-item that are both sublinear (in fact, poly-logarithmic) in the input size. In contrast, other notions of dominance on streams $a$, $b$, namely min-dominance ($\sum_i \min_j \{a_{i,j}\}$), count-dominance ($|\{i \mid a_i > b_i\}|$), and relative-dominance ($\sum_i a_i / \max\{1, b_i\}$), are all impossible to estimate accurately with sublinear space.
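To make the definition of the max-dominance norm concrete, the following is a minimal sketch of an exact baseline in Python. It is not the sublinear-space algorithm the abstract refers to: it stores one value per distinct index, so its space is linear in the domain size. The function name `max_dominance` and the `(i, j, value)` tuple layout of stream items are assumptions made for illustration, as is the premise that each `(i, j)` pair arrives at most once.

```python
def max_dominance(stream):
    """Exact max-dominance norm: sum over i of max over j of a[i][j].

    Assumption (not from the source): each stream item is a tuple
    (i, j, value) with value >= 0, and each (i, j) pair appears at
    most once. This baseline uses space linear in the number of
    distinct indices i, unlike a sublinear-space streaming estimator.
    """
    upper_envelope = {}  # index i -> largest value seen so far at i
    for i, _signal, value in stream:
        if value > upper_envelope.get(i, 0):
            upper_envelope[i] = value
    return sum(upper_envelope.values())


# Two signals (j = 0 and j = 1) over the domain {0, 1, 2}.
stream = [(0, 0, 3), (0, 1, 5), (1, 0, 2), (1, 1, 1), (2, 1, 4)]
total = max_dominance(stream)  # max values are 5, 2, 4

# With 0/1 values the norm reduces to counting distinct indices.
indicators = [(7, 0, 1), (7, 1, 1), (9, 0, 1)]
distinct = max_dominance(indicators)
```

The second call illustrates the remark that max-dominance generalizes counting distinct elements: when every value is 1, each index contributes exactly 1 to the sum no matter how many signals report it.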
Similar resources
Solving the Paradox of Multiple IRR's in Engineering Economic Problems by Choosing an Optimal α-cut
Until now, single values of IRR have traditionally been used to estimate the time value of cash flows. Since uncertainty exists in estimating cost data, the resulting decision may not be reliable. The most commonly cited drawback to using the internal rate of return in the evaluation of deterministic cash flow streams is the possibility of multiple conflicting internal rates of return. In this paper we p...
Comparing Data Streams Using Hamming Norms (How to Zero In)
Massive data streams are now fundamental to many data processing applications. For example, Internet routers produce large scale diagnostic data streams. Such streams are rarely stored in traditional databases, and instead must be processed “on the fly” as they are produced. Similarly, sensor networks produce multiple data streams of observations from their sensors. There is growing focus on ma...
Max-stable sketches: estimation of Lp-norms, dominance norms and point queries for non-negative signals
Let f : {1, 2, ..., N} → [0, ∞) be a non-negative signal, defined over a very large domain, and suppose that we want to be able to address approximate aggregate queries or point queries about f. To answer queries about f, we introduce a new type of random sketches called max-stable sketches. The (ideal precision) max-stable sketch of f, E_j(f), 1 ≤ j ≤ K, is defined as: E_j(f) := max_{1≤i≤N} f(i...
Priority Setting Meets Multiple Streams: A Match to Be Further Examined?; Comment on “Introducing New Priority Setting and Resource Allocation Processes in a Canadian Healthcare Organization: A Case Study Analysis Informed by Multiple Streams Theory”
With demand for health services continuing to grow as populations age and new technologies emerge to meet health needs, healthcare policy-makers are under constant pressure to set priorities, i.e., to make choices about the health services that can and cannot be funded within available resources. In a recent paper, Smith et al. apply an influential policy studies framework – Kingdon’s multiple str...
Selectivity Estimation over Multiple Data Streams using Micro-clustering
Selectivity estimation is an important task for query optimization. We propose a technique to perform range query estimation over multiple data streams using micro-clustering. The technique maintains cluster statistics in terms of micro-clusters and cosine series for all streams. These micro-clusters maintain data distribution information about the stream values using cosine coefficients. These ...