Overload Management in Data Stream Processing Systems with Latency Guarantees
Abstract
Stream processing systems are becoming increasingly important for analysing the real-time data generated by modern applications such as online social networks. Their defining characteristic is that they produce a continuous stream of fresh results as new data are generated in real time. Provisioning resources for stream processing systems is difficult because time-varying workloads induce resource demands that are not known in advance. Despite the development of scalable stream processing systems, which aim to provision for workload variations, such systems can still face transient resource shortages. During overload, there are not enough resources to process all incoming data in real time: data accumulate in memory and their processing latency grows uncontrollably, compromising the freshness of stream processing results. In this paper, we present a feedback control approach to design a nonlinear discrete-time controller that requires no knowledge of the system to be controlled or of the data workload, yet is still able to control the average tuple end-to-end latency in a single-node stream processing system. The results of our evaluation on a prototype stream processing system show that our method controls the average tuple end-to-end latency despite time-varying workload demands and an increasing number of queries.
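The abstract does not spell out the control law, the actuator, or how latency is measured, so the following is only a rough illustration of the general idea of closing a feedback loop around latency, not the authors' controller. It assumes, hypothetically, that the actuator is the fraction of arriving tuples admitted for processing (the rest are shed), that latency is approximated per time step as one step of service plus the time needed to drain the queued backlog, and that the controller applies a saturated multiplicative update driven only by the measured latency error. `AdmissionController`, `simulate`, the gain, the target and the workload numbers are all invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class AdmissionController:
    """Hypothetical model-free latency controller (illustration only).

    Scales the admitted fraction of arriving tuples by (1 - gain * e), where e
    is the relative latency error. The product with the current state and the
    clamp to [min_admit, 1] make this discrete-time control law nonlinear.
    """
    target_latency: float          # desired latency, in time steps (assumed)
    gain: float = 0.5              # hypothetical controller gain
    min_admit: float = 0.05        # never shed more than 95% of the input
    admit_fraction: float = 1.0    # current actuator value

    def update(self, measured_latency: float) -> float:
        # Positive error means tuples wait longer than the target.
        error = (measured_latency - self.target_latency) / self.target_latency
        proposed = self.admit_fraction * (1.0 - self.gain * error)
        self.admit_fraction = min(1.0, max(self.min_admit, proposed))
        return self.admit_fraction


def simulate(arrivals, capacity, controller):
    """Single-node queue in discrete time; returns the latency proxy per step."""
    backlog = 0.0
    latencies = []
    for arriving in arrivals:
        backlog += arriving * controller.admit_fraction  # tuples that are not shed
        backlog -= min(backlog, capacity)                # serve up to capacity
        # Assumed latency proxy: one step of service plus time to drain the queue.
        latency = 1.0 + backlog / capacity
        latencies.append(latency)
        controller.update(latency)  # feedback uses only the measured output
    return latencies


# Transient overload: 200 tuples/step arrive for 50 steps on a node serving 100/step.
workload = [50] * 20 + [200] * 50 + [50] * 20
controlled = simulate(workload, 100, AdmissionController(target_latency=2.0))
# gain=0.0 turns the loop off, so every tuple is admitted.
uncontrolled = simulate(workload, 100, AdmissionController(target_latency=2.0, gain=0.0))
print(max(controlled), max(uncontrolled))
```

Without feedback, the backlog grows by 100 tuples per step during the overload and the latency proxy climbs to 51 steps. With the loop closed it stays below 5 steps, at the cost of shedding input. The multiplicative form keeps the admitted fraction positive and scales each correction with the current operating point. A real controller would also have to cope with measurement noise and delay, which this sketch ignores.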
Similar resources
UpStream: Storage-centric Load Management for Data Stream Processing Systems
Processing fast updating data streams in real-time must reflect the most recent data. A number of technologies including Data Stream Management Systems have emerged to respond to this challenge. While running their queries in a continuous fashion on high-volume push-based data streams (e.g. sensor data, GPS coordinates, stock quotes), one of the most important optimization problems that these s...
How to Screen a Data Stream - Quality-Driven Load Shedding in Sensor Data Streams
As most data stream sources exhibit bursty data rates, data stream management systems must recurrently cope with load spikes that exceed the average workload to a considerable degree. To guarantee low-latency processing results, load has to be shed from the stream, when data rates overstress system resources. There exist numerous load shedding strategies to delete excess data. However, the cons...
Content-based Load Shedding in Multimedia Data Stream Management System
Overload management has become very important in public safety systems that analyse high performance multimedia data streams, especially in the case of detection of terrorist and criminal dangers. Efficient overload management improves the accuracy of automatic identification of persons suspected of terrorist or criminal activity without requiring interaction with them. We argue that in order t...
Latency-aware Elastic Scaling for Distributed Data Stream Processing
Elastic scaling allows a data stream processing system to react to a dynamically changing query or event workload by automatically scaling in or out. Thereby, both unpredictable load peaks as well as underload situations can be handled. However, each scaling decision comes with a latency penalty due to the required operator movements. Therefore, in practice an elastic system might be able to im...
State Management in Apache Flink®: Consistent Stateful Distributed Stream Processing
Stream processors are emerging in industry as an apparatus that drives analytical but also mission critical services handling the core of persistent application logic. Thus, apart from scalability and low-latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial...