Adaptive Load Diffusion for Stream Joins
Authors
Abstract
Data stream processing has become increasingly important as many emerging applications call for sophisticated real-time processing over data streams, such as stock trading surveillance, network traffic monitoring, and sensor data analysis. Stream joins, which can be used to detect linkages and correlations between different data streams, are among the most important stream processing operations. One major challenge in processing stream joins is to handle continuous, high-volume, and time-varying data streams under resource constraints. In this paper, we present a novel load diffusion system to enable scalable execution of resource-intensive stream joins using an ensemble of server hosts. The load diffusion is achieved by a simple correlation-aware stream partition algorithm. Different from previous work, the load diffusion system can (1) achieve fine-grained load sharing in the distributed stream processing system; and (2) produce exact query answers without missing any join results or generating duplicate join results. Our experimental results show that the load diffusion scheme can greatly improve the system throughput and achieve more balanced load distribution.
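The abstract does not spell out the partition algorithm, so the following is only a minimal sketch of the exactness requirement it states, not the paper's correlation-aware method. It assumes a simple equi-join with integer keys and no sliding window, and it swaps in plain key-based routing; the names `route`, `centralized_join`, and `diffused_join` are hypothetical. The point it illustrates is that if every pair of joinable tuples is routed to exactly one common host, the union of the per-host results equals the single-host result, with nothing missing and nothing duplicated.

```python
from collections import defaultdict


def route(key, num_hosts):
    """Hypothetical routing rule: pick a host from the join key alone."""
    return key % num_hosts


def centralized_join(stream_a, stream_b):
    """Reference result: all (a, b) pairs with equal keys, computed on one host."""
    return sorted((a, b) for a in stream_a for b in stream_b if a[0] == b[0])


def diffused_join(stream_a, stream_b, num_hosts):
    """Partition both streams by key, then join independently on each host."""
    hosts_a = defaultdict(list)
    hosts_b = defaultdict(list)
    for a in stream_a:
        hosts_a[route(a[0], num_hosts)].append(a)
    for b in stream_b:
        hosts_b[route(b[0], num_hosts)].append(b)

    results = []
    for host in range(num_hosts):
        # Each host only sees the tuples routed to it.
        results.extend(centralized_join(hosts_a[host], hosts_b[host]))
    return sorted(results)


# Tuples are (join_key, payload); the data is made up for illustration.
stream_a = [(1, "a1"), (2, "a2"), (2, "a3"), (5, "a4")]
stream_b = [(2, "b1"), (5, "b2"), (5, "b3"), (7, "b4")]

expected = centralized_join(stream_a, stream_b)
actual = diffused_join(stream_a, stream_b, num_hosts=3)
print(actual == expected)
```

Key-based routing is the simplest way to satisfy the exactness condition, but it balances load poorly when a few keys dominate, which is the kind of problem a fine-grained load-sharing scheme would need to address.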
Similar resources
Modeling of streamflow-suspended sediment load relationship by adaptive neuro-fuzzy and artificial neural network approaches (Case study: Dalaki River, Iran)
Modeling of stream flow–suspended sediment relationship is one of the most studied topics in hydrology due to its essential application to water resources management. Recently, artificial intelligence has gained much popularity owing to its application in calibrating the nonlinear relationships inherent in the stream flow–suspended sediment relationship. This study made use of adaptive neuro-fuzzy ...
GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams
We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the...
Runtime Optimization of Join Location in Parallel Data Management Systems
Applications running on parallel systems often need to join a streaming relation or a stored relation with data indexed in a parallel data storage system. Some applications also compute UDFs on the joined tuples. The join can be done at the data storage nodes, corresponding to reduce side joins, or by fetching data from the storage system to compute nodes, corresponding to map side join. Both m...
Adaptive Fault-Tolerance for Dynamic Resource Provisioning in Distributed Stream Processing Systems
A growing number of applications require continuous processing of high-throughput data streams, e.g., financial analysis, network traffic monitoring, or Big Data analytics for smart cities. Stream processing applications typically require specific quality-of-service levels to achieve their goals; yet, due to the high time-variability of stream characteristics, it is often inefficient to statica...
Adaptive Batching of Streams to Enhance Throughput and to Support Dynamic Load Balancing
As data permeates all disciplines, the role of big data becomes increasingly important. Sensors, IoT devices, social networks, and online transactions are all generating data that can be monitored constantly to enable a business to identify opportunity to enhance customer service and increase revenue. This need for real-time processing of big data has led to the development of frameworks for di...