AveBoost2: Boosting for Noisy Data
Author
Abstract
AdaBoost [4] is a well-known ensemble learning algorithm that constructs its constituent, or base, models in sequence. A key step in AdaBoost is constructing a distribution over the training examples to create each base model. This distribution, represented as a vector, is constructed to be orthogonal to the vector of mistakes made by the previous base model in the sequence [6]. The idea is to make the next base model's errors uncorrelated with those of the previous model. In previous work [8], we developed an algorithm, AveBoost, that constructed distributions orthogonal to the mistake vectors of all the previous models, and then averaged them to create the next base model's distribution. Our experiments demonstrated the superior accuracy of our approach. In this paper, we slightly revise our algorithm to allow us to obtain non-trivial theoretical results: bounds on the training error and generalization error (the difference between training and test error). Our averaging process has a regularizing effect which, as expected, leads to a worse training error bound for our algorithm than for AdaBoost, but a superior generalization error bound. For this paper, we experimented with the data that we used in [8], both as originally supplied and with added label noise: a small fraction of the data has its original label changed. Noisy data are notoriously difficult for AdaBoost to learn. Our algorithm's performance improvement over AdaBoost is even greater on the noisy data than on the original data.
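The distribution update described above can be made concrete with a short sketch. The Python snippet below is a minimal illustration, not the authors' reference implementation: it assumes the standard formulation in which AdaBoost reweights examples so the previous model's weighted error becomes exactly 1/2 (the "orthogonality" the abstract refers to), and then applies the averaging step AveBoost2 adds on top. The function and variable names are ours, chosen for illustration.

```python
import numpy as np

def aveboost2_update(d_t, mistakes, t):
    """One distribution update in the style of AveBoost2 (a sketch).

    d_t      : current distribution over the m training examples (sums to 1)
    mistakes : boolean array, True where base model t misclassified
    t        : 1-based index of the base model just trained
    """
    eps = float(np.dot(d_t, mistakes))       # weighted error of model t
    eps = min(max(eps, 1e-12), 1.0 - 1e-12)  # guard against division by zero
    # AdaBoost-style reweighting: give half the total weight to the
    # misclassified examples and half to the correct ones, so model t's
    # weighted error under the new vector is exactly 1/2, i.e. the new
    # vector is "orthogonal" to model t's mistake vector.
    c = np.where(mistakes, d_t / (2.0 * eps), d_t / (2.0 * (1.0 - eps)))
    # AveBoost2's extra step: average this orthogonal distribution with
    # the running average of all earlier distributions.
    d_next = (t * d_t + c) / (t + 1.0)
    return d_next / d_next.sum()             # renormalize for safety
```

Note how the averaging step slows the rate at which weight can pile up on persistently misclassified (possibly mislabeled) examples; this is the regularizing effect the abstract credits for the better generalization bound and for the improved behavior on noisy data.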
Related papers
On Boosting and Noisy Labels
Boosting is a machine learning technique widely used across many disciplines. Boosting enables one to learn from labeled data in order to predict the labels of unlabeled data. A central property of boosting, instrumental to its popularity, is its resistance to overfitting. Previous experiments provide a margin-based explanation for this resistance to overfitting. In this thesis, the main finding ...
Some Theoretical Aspects of Boosting in the Presence of Noisy Data
This is a survey of some theoretical results on boosting obtained from an analogous treatment of some regression and classification boosting algorithms. Some related papers include [J99] and [J00a,b,c,d], which form a set of (mutually overlapping) papers concerning the assumption of weak hypotheses, behavior of generalization error in the large time limit and during the process of boosting, compar...
Combining Bagging and Boosting
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base classifiers. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, i...
Network Game and Boosting
We propose an ensemble learning method called Network Boosting which combines weak learners together based on a random graph (network). A theoretical analysis based on game theory shows that the algorithm can learn the target hypothesis asymptotically. The comparison results using several datasets from the UCI machine learning repository and synthetic data are promising and show that Network Bo...
A novel boosting algorithm for improved i-vector based speaker verification in noisy environments
This paper explores the significance of an ensemble of boosted Support Vector Machine (SVM) classifiers in the i-vector framework for speaker verification (SV) in noisy environments. Prior work in this field has established the significance of supervector-based approaches and more specifically the i-vector extraction paradigm for robust SV. However, in highly degraded environments, SVMs traine...