The Convergence Rate of AdaBoost
Abstract
We pose the problem of determining the rate of convergence at which AdaBoost minimizes exponential loss. Boosting is the problem of combining many “weak,” high-error hypotheses to generate a single “strong” hypothesis with very low error. The AdaBoost algorithm of Freund and Schapire (1997) is shown in Figure 1. Here we are given $m$ labeled training examples $(x_1, y_1), \ldots, (x_m, y_m)$, where the $x_i$'s are in some domain $\mathcal{X}$ and the labels $y_i \in \{-1, +1\}$. On each round $t$, a distribution $D_t$ over the $m$ training examples is computed as in the figure, and a weak hypothesis $h_t : \mathcal{X} \to \{-1, +1\}$ is found, where our aim is to find a weak hypothesis with low weighted error $\epsilon_t$ relative to $D_t$. In particular, for simplicity, we assume that $h_t$ minimizes the weighted error over all hypotheses belonging to some finite class of weak hypotheses $\mathcal{H} = \{\hbar_1, \ldots, \hbar_N\}$. The final hypothesis $H$ computes the sign of a weighted combination of weak hypotheses $F(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$. Since each $h_t$ is equal to $\hbar_{j_t}$ for some $j_t$, this can also be rewritten as $F(x) = \sum_{j=1}^{N} \lambda_j \hbar_j(x)$ for some set of values $\lambda = \langle \lambda_1, \ldots, \lambda_N \rangle$. It was observed by Breiman (1999) and others (Frean & Downs, 1998; Friedman et al., 2000; Mason et al., 1999; Onoda et al., 1998; Rätsch et al., 2001; Schapire & Singer, 1999) that AdaBoost behaves so as to minimize the exponential loss $L(\lambda) = \frac{1}{m} \sum_{i=1}^{m} \exp(-y_i F(x_i))$.
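To make this loss-minimization view concrete, here is a minimal sketch (not the paper's code) of AdaBoost run over a finite hypothesis class, tracking the exponential loss $L(\lambda)$ after each round. The data, the stump-style hypotheses, and the stopping rule are illustrative assumptions; only the weight update, the step size $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, and the loss being tracked come from the standard algorithm.

```python
import numpy as np

def adaboost_exp_loss(X, y, hypotheses, T):
    """Run AdaBoost for T rounds over a finite class of weak hypotheses.

    X : (m, d) array of examples; y : (m,) array of labels in {-1, +1}.
    hypotheses : list of callables h(X) -> (m,) array of {-1, +1} predictions.
    Returns the weight vector lambda and the exponential loss after each round.
    """
    m = len(y)
    H = np.array([h(X) for h in hypotheses])   # H[j, i] = h_j(x_i), shape (N, m)
    lam = np.zeros(len(hypotheses))            # lambda_1, ..., lambda_N
    D = np.full(m, 1.0 / m)                    # D_1 is uniform
    losses = []
    for _ in range(T):
        # Weighted error of every hypothesis under D_t; pick the minimizer h_t.
        errs = ((H != y) * D).sum(axis=1)
        j = int(np.argmin(errs))
        eps = errs[j]
        if eps <= 0.0 or eps >= 0.5:           # perfect or useless weak hypothesis
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        lam[j] += alpha
        # Multiplicative update of the example weights, then renormalize.
        D *= np.exp(-alpha * y * H[j])
        D /= D.sum()
        # Exponential loss of the current combination F(x) = sum_j lambda_j h_j(x).
        F = lam @ H
        losses.append(np.mean(np.exp(-y * F)))
    return lam, losses

# Illustrative usage with simple threshold "stumps" (hypothetical data):
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1, -1)
stumps = [lambda X, k=k, c=c, s=s: s * np.where(X[:, k] > c, 1, -1)
          for k in (0, 1) for c in np.linspace(-2, 2, 9) for s in (-1, 1)]
lam, losses = adaboost_exp_loss(X, y, stumps, T=50)
```

Each round multiplies the exponential loss by $2\sqrt{\epsilon_t(1-\epsilon_t)} < 1$, so `losses` decreases monotonically; the question posed above is how quickly it approaches the infimum of $L$.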
Similar papers
The Rate of Convergence of AdaBoost
The AdaBoost algorithm of Freund and Schapire (1997) was designed to combine many “weak” hypotheses that perform slightly better than a random guess into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss,” and show that it does so at a fast rate. Our proofs do not require a weak-learning assumption, nor do the...
Precise Statements of Convergence for AdaBoost and arc-gv
We present two main results, the first concerning Freund and Schapire’s AdaBoost algorithm, and the second concerning Breiman’s arc-gv algorithm. Our discussion of AdaBoost revolves around a circumstance called the case of “bounded edges”, in which AdaBoost’s convergence properties can be completely understood. Specifically, our first main result is that if AdaBoost’s “edge” values fall into a ...
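For reference, the “edge” of the weak hypothesis chosen at round $t$ is $\gamma_t = \sum_i D_t(i)\, y_i\, h_t(x_i) = 1 - 2\epsilon_t$, i.e. its advantage over random guessing under the current example weights. The short sketch below (hypothetical inputs, not the paper's code) simply computes edges and checks whether a run stays inside an assumed interval, which is the “bounded edges” circumstance referred to above.

```python
import numpy as np

def edge(D, y, preds):
    """Edge of a weak hypothesis under distribution D:
    gamma = sum_i D[i] * y[i] * h(x_i), equivalently 1 - 2 * (weighted error)."""
    return float(np.sum(D * y * preds))

def edges_bounded(edges, lo, hi):
    """True if every per-round edge lies in the assumed interval [lo, hi]."""
    g = np.asarray(edges)
    return bool(np.all((g >= lo) & (g <= hi)))

# Hypothetical per-round edges from some AdaBoost run:
gammas = [0.24, 0.18, 0.30, 0.20]
print(edges_bounded(gammas, 0.1, 0.4))   # the "bounded edges" case for [0.1, 0.4]
```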
Boosting Based on a Smooth Margin
We study two boosting algorithms, Coordinate Ascent Boosting and Approximate Coordinate Ascent Boosting, which are explicitly designed to produce maximum margins. To derive these algorithms, we introduce a smooth approximation of the margin that one can maximize in order to produce a maximum margin classifier. Our first algorithm is simply coordinate ascent on this function, involving a line se...
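As a rough illustration of the idea, one standard way to smooth the minimum margin is a soft-min of the unnormalized margins divided by $\|\lambda\|_1$, i.e. $-\ln\bigl(\sum_i e^{-y_i F(x_i)}\bigr) / \sum_j \lambda_j$, which lower-bounds the minimum normalized margin. The sketch below computes such a surrogate; the exact function maximized in the paper may differ in its normalization, so treat this as an assumed form.

```python
import numpy as np

def smooth_margin(lam, M):
    """Soft-min surrogate for the minimum normalized margin.

    lam : (N,) nonnegative weights on the weak hypotheses.
    M   : (m, N) matrix with M[i, j] = y_i * h_j(x_i).
    Returns -log(sum_i exp(-(M @ lam)_i)) / sum_j lam_j, which is at most
    min_i (M @ lam)_i / ||lam||_1, the minimum normalized margin.
    """
    u = M @ lam                                  # y_i * F(x_i) for each example
    return float(-np.log(np.sum(np.exp(-u))) / np.sum(lam))
```

Coordinate ascent on such a function repeatedly picks one coordinate $\lambda_j$ and increases it (e.g. by a line search) so as to raise the surrogate, which is the structure of the two algorithms described above.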
SelfieBoost: A Boosting Algorithm for Deep Learning
We describe and analyze a new boosting algorithm for deep learning called SelfieBoost. Unlike other boosting algorithms, such as AdaBoost, which construct ensembles of classifiers, SelfieBoost boosts the accuracy of a single network. We prove a log(1/ε) convergence rate for SelfieBoost under some “SGD success” assumption which seems to hold in practice.
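For intuition about the claimed rate: a $\log(1/\epsilon)$ convergence rate means the number of rounds needed to drive the objective below $\epsilon$ grows only logarithmically in $1/\epsilon$. A minimal worked version, under the illustrative assumption that the suboptimality $e_t$ contracts by a fixed factor $\rho < 1$ per round:

```latex
e_T \le \rho^{\,T} e_0
\quad\Longrightarrow\quad
e_T \le \epsilon \ \text{ once }\ T \ge \frac{\ln(e_0/\epsilon)}{\ln(1/\rho)}
  = O\!\left(\log\tfrac{1}{\epsilon}\right).
```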
Some Open Problems in Optimal AdaBoost and Decision Stumps
The significance of the study of the theoretical and practical properties of AdaBoost is unquestionable, given its simplicity, wide practical use, and effectiveness on real-world datasets. Here we present a few open problems regarding the behavior of “Optimal AdaBoost,” a term coined by Rudin, Daubechies, and Schapire in 2004 to label the simple version of the standard AdaBoost algorithm in whi...
Publication date: 2010