Characterizing Drifts for Proactive Drift Detection in Data Streams
Abstract
The evolution of data, such as changes in the underlying model known as concept drift, presents many challenges for data stream research. Currently, most drift detection methods are able to locate the point of change, but are unable to provide meaningful information on the characteristics of the change or to utilize historical trends. In this thesis, we investigate two streams of research: (1) the magnitude of change, which we refer to as drift severity, and (2) the rate of change, which we refer to as stream volatility [7]. In the first part, we propose a drift detector, MAGSEED, for tracking the drift severity of a stream. Monitoring drift severity provides crucial information to users, allowing them to formulate a more adaptive response. We show that our technique is capable of tracking drift severity with a high rate of true positives and a low rate of false positives, and compare it to the state-of-the-art drift detectors ADWIN2 and DDM. In the second part, we explore ways to learn historical drift rate trends and develop a proactive drift detection system. The main motivation for our work comes from the observation of volatility trends resulting from the application of current drift detection methods to real data streams. We observe that these patterns of change vary across different data streams. We use the term “volatility pattern” to describe change rates with a distinct distribution. We propose a novel drift prediction method, DPM, to predict the location of future drift points based on historical drift trends, which we model as transitions between stream volatility patterns. Our method uses a probabilistic network to learn drift trends and is independent of the drift detection technique. We demonstrate that our method is able to learn and predict drift trends in streams with recurring volatility patterns. This allows future changes to be anticipated, which enables users and detection methods to be more proactive.
We then apply our drift prediction algorithm by incorporating the drift estimates into a drift detector, PROSEED, to improve its performance by decreasing the false positive rate.
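As a rough illustration only, the idea of modelling drift trends as transitions between volatility patterns can be sketched in a few lines of Python. This is not the DPM algorithm from the thesis: the two pattern labels ("fast" and "slow"), the fixed interval threshold, the first-order transition table, and all function names are assumptions made for the example.

```python
from collections import defaultdict


def drift_intervals(drift_points):
    """Gaps between consecutive detected drift points (a simple volatility measure)."""
    return [b - a for a, b in zip(drift_points, drift_points[1:])]


def label_pattern(interval, threshold=100):
    """Hypothetical two-way discretisation of an interval into a volatility pattern."""
    return "fast" if interval < threshold else "slow"


def transition_probabilities(labels):
    """Estimate first-order transition probabilities between pattern labels."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: c / total for nxt, c in followers.items()}
    return probs


def most_likely_next(probs, current):
    """Most probable next pattern, or None if `current` was never followed by anything."""
    followers = probs.get(current)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Drift positions as they might be reported by any drift detector (made-up values).
drift_points = [0, 50, 100, 400, 450, 500, 800, 850, 900, 1200]
labels = [label_pattern(i) for i in drift_intervals(drift_points)]
probs = transition_probabilities(labels)
print(most_likely_next(probs, "slow"))
```

Because the sketch consumes only the drift positions, it does not depend on which detector produced them, which mirrors the detector independence claimed in the abstract.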
Similar resources
A novel concept drift detection method in data streams using ensemble classifiers
Concept drift, a change in the underlying distribution that data points come from, is an inevitable phenomenon in data streams. Due to the increase in the number of data stream applications, such as network intrusion detection, weather forecasting, and detection of unconventional behavior in financial transactions, numerous studies have recently been conducted in the area of concept drift detecti...
Categorizing Concepts for Detecting Drifts in Stream
Mining evolving data streams for concept drifts has gained importance in applications like customer behavior analysis, network intrusion detection, credit card fraud detection. Several approaches have been proposed for detection of concept drifts in the context of supervised learning in data streams. Recently, researchers have been looking into the problem of identifying concept drifts in unlab...
Concept drift detection in event logs using statistical information of variants
In recent years, business process management (BPM) has been highly regarded as an improvement in the efficiency and effectiveness of organizations. Extracting and analyzing information on business processes is an important part of this structure. But these processes are not sustainable over time and may change for a variety of reasons, such as the environment and human resources. These changes ...
On the Reliable Detection of Concept Drift from Streaming Unlabeled Data
Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to drop over time, thereby making it obsolete. To be of any real use, these classifiers need to detect drifts and be able to adapt to them, over time. Detecting drifts has tr...
Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams
The last decade has seen a surge of interest in adaptive learning algorithms for data stream classification, with applications ranging from predicting ozone level peaks, learning stock market indicators, to detecting computer security violations. In addition, a number of methods have been developed to detect concept drifts in these streams. Consider a scenario where we have a number of classifi...