A Survey of Change Diagnosis Algorithms in Evolving Data Streams

نویسنده

  • Charu C. Aggarwal
چکیده

An important problem in the field of data stream analysis is change detection and monitoring. In many cases, the data stream can show changes over time which can be used for understanding the nature of several applications. We discuss the concept of velocity density estimation, a technique used to understand, visualize and determine trends in the evolution of fast data streams. We show how to use velocity density estimation in order to create both temporal velocity proJiles and spatial velocity profiles at periodic instants in time. These profiles are then used in order to predict three kinds of data evolution. Methods are proposed to visualize the changing data trends in a single online scan of the data stream, and a computational requirement which is linear in the number of data points. In addition, batch processing techniques are proposed in order to identify combinations of dimensions which show the greatest amount of global evolution. We also discuss the problem of change detection in the context of graph data, and illustrate that it may often be useful to determine communities of evolution in graph environments. The presence of evolution in data streams may also change the underlying data to the extent that the underlying data mining models may need to be modified to account for the change in data distribution. We discuss a number of methods for micro-clustering which are used to study the effect of evolution on problems such as clustering and classification. 86 DATA STREAMS: MODELS AND ALGORITHMS

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Single-Pass Algorithms for Mining Frequency Change Patterns with Limited Space in Evolving Append-Only and Dynamic Transaction Data Streams

In this paper, we propose an online single-pass algorithm MFC-append (Mining Frequency Change patterns in append-only data streams) for online mining frequent frequency change items in continuous append-only data streams. An online space-efficient data structure called ChangeSketch is developed for providing fast response time to compute dynamic frequency changes between data streams. A modifie...

متن کامل

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...

متن کامل

An adaptive algorithm for anomaly and novelty detection in evolving data streams

In the era of big data, considerable research focus is being put on designing efficient algorithms capable of learning and extracting high-level knowledge from ubiquitous data streams in an online fashion. While, most existing algorithms assume that data samples are drown from a stationary distribution, several complex environments deal with data streams that are subject to change over time. In...

متن کامل

Classifying Evolving Data Streams for Intrusion Detection

Stream data classification is a challenging problem because of two important properties: its infinite length and evolving nature. Traditional learning algorithms that require several passes on the training data are not directly applicable to stream classification problem because of the infinite length of the data stream. Data streams may evolve in several ways: the prior probability distributio...

متن کامل

Streaming Multi-label Classification

This paper presents a new experimental framework for studying multi-label evolving stream classification, with efficient methods that combine the best practices in streaming scenarios with the best practices in multi-label classification. Many real world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non strea...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007