Adaptive Methods for Classification in Arbitrarily Imbalanced and Drifting Data Streams
نویسندگان
چکیده
Streaming data is pervasive in a multitude of data mining applications. One fundamental problem in the task of mining streaming data is distributional drift over time. Streams may also exhibit high and varying degrees of class imbalance, which can further complicate the task. In scenarios like these, class imbalance is particularly difficult to overcome and has not been as thoroughly studied. In this paper, we comprehensively consider the issues of changing distributions in conjunction with high degrees of class imbalance in streaming data. We propose new approaches based on distributional divergence and meta-classification that improve several performance metrics often applied in the study of imbalanced classification. We also propose a new distance measure for detecting distributional drift and examine its utility in weighting ensemble base classifiers. We employ a sequential validation framework, which we believe is the most meaningful option in the context of streaming imbalanced data.
منابع مشابه
Cost-Sensitive Perceptron Decision Trees for Imbalanced Drifting Data Streams
Mining streaming and drifting data is among the most popular contemporary applications of machine learning methods. Due to the potentially unbounded number of instances arriving rapidly, evolving concepts and limitations imposed on utilized computational resources, there is a need to develop efficient and adaptive algorithms that can handle such problems. These learning difficulties can be furt...
متن کاملAn adaptive ensemble classifier for mining concept drifting data streams
Traditional data mining techniques cannot be directly applied to the real-time data streaming environment. Existing mining classifiers therefore need to be updated frequently to adopt the changes in data streams. In this paper, we address this issue and propose an adaptive ensemble approach for classification and novel class detection in concept-drifting data streams. The proposed approach uses...
متن کاملRecursive least square perceptron model for non-stationary and imbalanced data stream classification
Classifying non-stationary and imbalanced data streams encompasses two important challenges, namely concept drift and class imbalance. ‘‘Concept drift’’ (or nonstationarity) is changes in the underlying function being learnt, and class imbalance is vast difference between the numbers of instances in different classes of data. Class imbalance is an obstacle for the efficiency of most classifiers...
متن کاملImproving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering
Classification is an one of the important parts of data mining and knowledge discovery. In most cases, the data that is utilized to used to training the clusters is not well distributed. This inappropriate distribution occurs when one class has a large number of samples but while the number of other class samples is naturally inherently low. In general, the methods of solving this kind of prob...
متن کاملOn Mining Fuzzy Classification Rules for Imbalanced Data
Fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it on imbalanced data sets is its biased to the majority class, such that, it performs poorly in respect to the minority class. However many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
متن کامل