Load Shedding in XML Streams
Authors
Abstract
Because of the high volume and unpredictable arrival of data streams, stream processing systems may not always be able to keep up with the input, resulting in buffer overflow and uncontrolled loss of data. Load shedding, the prevalent strategy for solving this overflow problem, has to date been considered for relational stream engines. XML stream engines, on the other hand, face additional challenges and opportunities for "structural shedding", due to the complex nested XML input and result structures. We now tackle this open XML shedding problem with a three-pronged solution. First, we develop a preference model for XQuery that enables users to specify the relative importance of preserving different subpatterns in the complex XML result structure. This transforms shedding into the problem of rewriting the user query into possibly several shedding queries that return approximate query answers, yet with the highest possible utility as measured by the given user preference model. Second, we develop a cost model to compare both the performance and the utility of alternate shedding queries. Third, we propose two solutions: OptShed and FastShed. OptShed is guaranteed to find an optimal solution, but at the cost of exponential complexity. FastShed, as confirmed by our experiments, efficiently achieves a close-to-optimal result in a wide range of cases. Lastly, we describe the in-automaton shedding mechanism for the Raindrop system. The experimental results show that our proposed preference-driven shedding solutions consistently achieve higher-utility results than the existing "relational" shedding techniques.
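The trade-off between an exhaustive search and a fast heuristic can be illustrated with a toy model. The sketch below is not the paper's OptShed or FastShed, whose details the abstract does not give. It assumes, purely for illustration, that each result subpattern has a user-assigned utility and a processing cost, and that shedding means keeping only the subpatterns that fit a cost budget. All names and numbers (`SUBPATTERNS`, `exhaustive_choice`, `greedy_choice`) are hypothetical.

```python
from itertools import combinations

# Hypothetical subpatterns of a query result: (name, utility, processing cost).
# The names and numbers are invented for illustration only.
SUBPATTERNS = [
    ("title", 5.0, 2.0),
    ("author", 3.0, 1.0),
    ("review", 4.0, 4.0),
    ("price", 2.0, 1.0),
]


def exhaustive_choice(patterns, budget):
    """Try every subset (2^n of them) and keep the highest-utility one that fits."""
    best, best_utility = (), 0.0
    for size in range(len(patterns) + 1):
        for subset in combinations(patterns, size):
            cost = sum(p[2] for p in subset)
            utility = sum(p[1] for p in subset)
            if cost <= budget and utility > best_utility:
                best, best_utility = subset, utility
    return sorted(p[0] for p in best), best_utility


def greedy_choice(patterns, budget):
    """Keep subpatterns in decreasing utility-per-cost order while they still fit."""
    kept, utility, remaining = [], 0.0, budget
    for name, u, c in sorted(patterns, key=lambda p: p[1] / p[2], reverse=True):
        if c <= remaining:
            kept.append(name)
            utility += u
            remaining -= c
    return sorted(kept), utility


if __name__ == "__main__":
    print(exhaustive_choice(SUBPATTERNS, budget=4.0))
    print(greedy_choice(SUBPATTERNS, budget=4.0))
```

In this toy setting the exhaustive search is optimal but examines a number of subsets that is exponential in the number of subpatterns, while the greedy pass needs only a sort and can miss the optimum on some inputs.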
Similar resources
Delivering QoS in XML Data Stream Processing Using Load Shedding
In recent years, we have witnessed the emergence of new types of systems that deal with large volumes of streaming data. Examples include financial data analysis on feeds of stock tickers, sensor-based environmental monitoring, network traffic monitoring, and click stream analysis to push customized advertisements or detect intrusions. Traditional database management systems (DBMS), which are ver...
Continuously Providing Approximate Results under Limited Resources: Load Shedding and Spilling in XML Streams
Because of the high volume and unpredictable arrival rates, stream processing systems may not always be able to keep up with the input data streams, resulting in buffer overflow and uncontrolled loss of data. To continuously supply online results, two alternate solutions to tackle this problem of unpredictable failures of such overloaded systems can be identified. One technique, called load she...
Loadstar: A Load Shedding Scheme for Classifying Data Streams
We consider the problem of resource allocation in mining multiple data streams. Due to the large volume and the high speed of streaming data, mining algorithms must cope with the effects of system overload. How to realize maximum mining benefits under resource constraints becomes a challenging task. In this paper, we propose a load shedding scheme for classifying multiple data streams. We focus...
Loadstar: Load Shedding in Data Stream Mining
In this demo, we show that intelligent load shedding is essential in achieving optimum results in mining data streams under various resource constraints. The Loadstar system introduces load shedding techniques to classifying multiple data streams of large volume and high speed. Loadstar uses a novel metric known as the quality of decision (QoD) to measure the level of uncertainty in classificat...
Load Shedding in Data Stream Systems
Systems for processing continuous monitoring queries over data streams must be adaptive because data streams are often bursty and data characteristics may vary over time. In this chapter, we focus on one particular type of adaptivity: the ability to gracefully degrade performance via "load shedding" (dropping unprocessed tuples to reduce system load) when the demands placed on the system cannot...