Online Ensemble Learning

Author

Nikunj C. Oza

Abstract

Ensemble learning methods train combinations of base models, which may be decision trees, neural networks, or other models traditionally used in supervised learning. Ensemble methods have gained popularity because many researchers have demonstrated their superior prediction performance relative to single models on a variety of problems, especially when the correlations of the errors made by the base models are low (e.g., Freund & Schapire 1996; Tumer & Oza 1999). However, these learning methods have largely operated in batch mode; that is, they repeatedly process the entire set of training examples as a whole. These methods typically require at least one pass through the data for each base model in the ensemble. We would instead prefer to learn the entire ensemble in an online fashion, i.e., using only one pass through the entire dataset. This would make ensemble methods practical when data is being generated continuously so that storing data for batch learning is impractical, or in data mining tasks where the datasets are large enough that multiple passes would require a prohibitively large training time. We have so far developed online versions of the popular bagging (Breiman 1994) and boosting (Freund & Schapire 1996) algorithms. We have shown empirically that both online algorithms converge to the same prediction performance as the batch versions and proved this convergence for online bagging (Oza 2000). However, significant empirical and theoretical work remains to be done. There are several traditional ensemble learning issues that remain in our online ensemble learning framework, such as the number and types of base models to use, the combining method to use, and how to maintain diversity among the base models. When learning from large datasets, we may hope to avoid using all of the training examples and/or input features. We have developed input decimation (Tumer & Oza 1999), a technique that uses different subsets of the input features in different base models. We have shown that this method performs better than combinations of base models that use all the input features because of two characteristics of our base models: they overfit less by using only a small number of highly relevant input features, and they have lower correlations in their errors because they use different input feature subsets. However, our method of selecting input features
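
The abstract does not spell out how the online version of bagging works. A minimal sketch follows, assuming the commonly described formulation in which each arriving example is presented to each base model k times, with k drawn from a Poisson(1) distribution, as a one-pass stand-in for bootstrap resampling. The names `poisson1`, `OnlineCentroid`, and `OnlineBagging` are hypothetical, and the running-mean base model is only an illustrative choice of incremental learner, not one taken from the paper.

```python
import math
import random
from collections import Counter


def poisson1(rng):
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


class OnlineCentroid:
    """Hypothetical incremental base model: one running mean per class."""

    def __init__(self):
        self.sums = {}
        self.counts = Counter()

    def update(self, x, y):
        # Accumulate the feature sums and the example count for class y.
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None  # the model has not seen any example yet

        def squared_distance(label):
            n = self.counts[label]
            return sum((s / n - v) ** 2 for s, v in zip(self.sums[label], x))

        return min(self.counts, key=squared_distance)


class OnlineBagging:
    """Assumed scheme: show each example to each model Poisson(1) times."""

    def __init__(self, n_models=10, seed=0):
        self.rng = random.Random(seed)
        self.models = [OnlineCentroid() for _ in range(n_models)]

    def update(self, x, y):
        # One pass: the example is consumed here and never stored.
        for model in self.models:
            for _ in range(poisson1(self.rng)):
                model.update(x, y)

    def predict(self, x):
        # Unweighted majority vote over the models that can predict.
        votes = Counter(model.predict(x) for model in self.models)
        votes.pop(None, None)
        return votes.most_common(1)[0][0] if votes else None


# Synthetic stream (an assumption for the demo): two Gaussian clusters.
stream_rng = random.Random(1)
ensemble = OnlineBagging(n_models=15, seed=0)
for _ in range(500):
    label = stream_rng.choice([0, 1])
    center = 0.0 if label == 0 else 5.0
    x = [stream_rng.gauss(center, 1.0), stream_rng.gauss(center, 1.0)]
    ensemble.update(x, label)
```

Each example is used as it arrives and then discarded, so the whole ensemble is trained in a single pass. Because the Poisson draws differ per model, the base models see differently weighted versions of the stream, which is what gives them some diversity.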

Similar resources

Cascading randomized weighted majority: A new online ensemble learning algorithm

With the increasing volume of data in the world, the best approach for learning from this data is to exploit an online learning algorithm. Online ensemble methods are online algorithms which take advantage of an ensemble of classifiers to predict labels of data. Prediction with expert advice is a well-studied problem in the online ensemble learning literature. The Weighted Majority algorithm an...

An Online Ensemble of Classifiers

Along with the explosive increase of data and information, incremental learning ability has become more and more important for machine learning approaches. The online algorithms try to forget irrelevant information instead of synthesizing all available information (as opposed to classic batch learning algorithms). Nowadays, combining classifiers is proposed as a new direction for the improvemen...

Fault Detection of Anti-friction Bearing using Ensemble Machine Learning Methods

An anti-friction bearing (AFB) is a very important machine component, and its unscheduled failure causes malfunction in a wide range of rotating machinery, resulting in unexpected downtime and economic loss. In this paper, ensemble machine learning techniques are demonstrated for the detection of different AFB faults. Initially, statistical features were extracted from temporal vibratio...

DELA: A Dynamic Online Ensemble Learning Algorithm

The present paper investigates the problem of prediction in a dynamically changing environment, where data arrive over time. A Dynamic online Ensemble Learning Algorithm (DELA) is introduced. The adaptivity concerns three levels: structural adaptivity, combination adaptivity and model adaptivity. In particular, the structure of the ensemble is sought to evolve in order to be able t...

Application of ensemble learning techniques to model the atmospheric concentration of SO2

For pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
