PhD Proposal: Functional monitoring problem for distributed large-scale data streams

Authors

  • Emmanuelle Anceaume
  • Yann Busnel
  • Bruno Sericola
Abstract

In this PhD proposal, we consider large-scale distributed systems in which each node must quickly process a huge amount of data received as a stream that may have been tampered with by an adversary. In this setting, several fundamental problems have been raised recently, concerning many domains including machine learning, data mining, databases, information retrieval, and network monitoring. All of these applications need to process a huge amount of data quickly and precisely. We propose to combine sampling techniques and information-theoretic methods to extract pertinent information from such streams (metrics, summaries, pattern matching, etc.). Unfortunately, computing information-theoretic measures in the data stream model is challenging, essentially because one needs to process a huge amount of data sequentially, on the fly, using very little storage relative to the size of the stream. In addition, the analysis must be robust over time in order to detect any sudden change in the observed streams (which may be the manifestation of a denial-of-service attack on routers or of worm propagation). On the other hand, very few works have tackled the distributed streaming model, also called the functional monitoring problem [12], which combines features of both the streaming model and the communication complexity model. As in the streaming model, the input data is read on the fly and processed with minimal workspace and time. As in the communication complexity model, each node receives an input data stream, performs some local computation, and communicates only with a coordinator who wishes to continuously compute or estimate a given function of the union of all the input streams. The challenge in this model is for the coordinator to compute the given function while minimizing the number of communicated bits [12, 6, 15].
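
To make the coordinator model concrete, the sketch below simulates one simple, well-known strategy for continuously tracking a total item count across several nodes. It is an illustration only and not a method taken from the proposal: the class names `Site` and `Coordinator`, the function `simulate`, and the parameters `k`, `n`, and `eps` are all hypothetical. Each node reports its local count only when it has grown by more than a factor of (1 + eps) since its last report, so the coordinator's estimate stays within that factor of the true total while far fewer messages than items are sent. The sketch counts messages rather than bits; each message here would carry one integer.

```python
import random


class Site:
    """A node that counts its local stream and reports only occasionally."""

    def __init__(self, eps):
        self.eps = eps
        self.count = 0      # exact local count
        self.last_sent = 0  # value most recently reported to the coordinator

    def observe(self):
        """Process one item; return a value to report, or None if silent."""
        self.count += 1
        # Report only when the local count has drifted too far
        # from what the coordinator currently believes.
        if self.count > (1 + self.eps) * self.last_sent:
            self.last_sent = self.count
            return self.count
        return None


class Coordinator:
    """Keeps the last reported count of every site."""

    def __init__(self, k):
        self.reported = [0] * k
        self.messages = 0

    def receive(self, site_id, value):
        self.reported[site_id] = value
        self.messages += 1

    def estimate(self):
        # Underestimates the true total by at most a factor (1 + eps).
        return sum(self.reported)


def simulate(k=10, n=100_000, eps=0.1, seed=0):
    """Feed n items to k sites chosen at random (an assumed workload)."""
    rng = random.Random(seed)
    sites = [Site(eps) for _ in range(k)]
    coord = Coordinator(k)
    for _ in range(n):
        i = rng.randrange(k)
        report = sites[i].observe()
        if report is not None:
            coord.receive(i, report)
    return coord.estimate(), coord.messages


if __name__ == "__main__":
    est, msgs = simulate()
    print("estimate:", est, "messages:", msgs)
```

After every item, each site satisfies `last_sent <= count <= (1 + eps) * last_sent`, so summing over sites gives `estimate <= n <= (1 + eps) * estimate`. The number of messages per site grows roughly logarithmically in its local count, which is the kind of communication saving the functional monitoring model aims for.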

Similar resources

PhD Research Proposal: Fault Tolerance and Quality of Service in Large-Scale Networked Virtual Environments

The Research Proposal is a part of the project: Middleware Services for Management of Shared State in Large-Scale Distributed Interactive Applications (MiSMoSS). MiSMoSS is funded by the Research Council of Norway and is Project No. 15992/431. The project is expected to lead to three PhD theses supervised by faculty members Carsten Griwodz, Paal Halvorsen and Ellen Munthe-Kaas in the Networks a...

Scalable Sum-shrinkage Schemes for Distributed Monitoring Large-scale Data Streams

In this article, we investigate the problem of monitoring independent large-scale data streams where an undesired event may occur at some unknown time and affect only a few unknown data streams. Motivated by parallel and distributed computing, we propose to develop scalable global monitoring schemes by parallel running local detection procedures and by using the sum of the shrinkage transformat...

Dynamic Querying of Streaming Data with the dQUOB System

Data streaming has established itself as a viable communication abstraction in data-intensive parallel and distributed computations, occurring in applications such as scientific visualization, performance monitoring, and large-scale data transfer. A known problem in large-scale event communication is tailoring the data received at the consumer. It is the general problem of extracting data of in...

Sketch-based Geometric Monitoring of Distributed Stream Queries

Emerging large-scale monitoring applications rely on continuous tracking of complex data-analysis queries over collections of massive, physically-distributed data streams. Thus, in addition to the space- and time-efficiency requirements of conventional stream processing (at each remote monitor site), effective solutions also need to guarantee communication efficiency (over the underlying communic...

Approximate Geometric Query Tracking over Distributed Streams

Effective Big Data analytics pose several difficult challenges for modern data management architectures. One key such challenge arises from the naturally streaming nature of big data, which mandates efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams ...

Publication date: 2013