Distribution-Specific Agnostic Boosting
Abstract
We consider the problem of boosting the accuracy of weak learning algorithms in the agnostic learning framework of Haussler (1992) and Kearns et al. (1992). Known algorithms for this problem (Ben-David et al., 2001; Gavinsky, 2002; Kalai et al., 2008) follow the same strategy as boosting algorithms in the PAC model: the weak learner is executed on the same target function but over different distributions on the domain. Applying such boosting algorithms usually requires a distribution-independent weak agnostic learner. Here we demonstrate boosting algorithms for the agnostic learning framework that only modify the distribution on the labels of the points (or, equivalently, modify the target function). This allows boosting a distribution-specific weak agnostic learner to a strong agnostic learner with respect to the same distribution. Our algorithm achieves the same guarantees on the final error as the boosting algorithms of Kalai et al. (2008) but is substantially simpler and more efficient. When applied to the weak agnostic parity learning algorithm of Goldreich and Levin (1989), our algorithm yields a simple PAC learning algorithm for DNF and an agnostic learning algorithm for decision trees over the uniform distribution using membership queries. These results substantially simplify Jackson’s famous DNF learning algorithm (1994) and the recent result of Gopalan et al. (2008). We also strengthen the connection to hard-core set constructions discovered by Klivans and Servedio (1999) by demonstrating that hard-core set constructions that achieve the optimal hard-core set size (given by Holenstein (2005) and Barak et al. (2009)) imply distribution-specific agnostic boosting algorithms. Conversely, our boosting algorithm gives a simple hard-core set construction with an (almost) optimal hard-core set size.
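As a rough illustration of what "modifying the distribution on the labels" can mean, the sketch below keeps every point x exactly as drawn and only randomizes its ±1 label: a label y is kept with probability (1 + w)/2 and flipped otherwise, so the expected new label is w·y. This is a minimal sketch under that assumption, not the algorithm from the paper; the names `relabel`, `relabeled_sample`, and `weight_fn` are hypothetical and chosen only for this example.

```python
import random


def relabel(y, w, rng):
    """Keep label y (+1 or -1) with probability (1 + w) / 2, else flip it.

    The expected returned label is w * y, so a weight w in [0, 1] scales
    the label's influence without touching the point it belongs to.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return y if rng.random() < (1.0 + w) / 2.0 else -y


def relabeled_sample(examples, weight_fn, rng):
    """Return the same points with randomized labels.

    `examples` is a list of (x, y) pairs; `weight_fn(x, y)` is a
    hypothetical per-example weight in [0, 1]. The marginal distribution
    over x is unchanged because no point is dropped or duplicated.
    """
    return [(x, relabel(y, weight_fn(x, y), rng)) for x, y in examples]


# Empirical check: with y = +1 and w = 0.3 the mean label is close to 0.3.
rng = random.Random(0)
n = 200_000
mean_label = sum(relabel(+1, 0.3, rng) for _ in range(n)) / n
```

With this kind of relabeling, a weak learner that works only for one fixed distribution over points can still be run repeatedly, since only the labels it sees change between rounds.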
Similar resources
Agnostic Boosting
We extend the boosting paradigm to the realistic setting of agnostic learning, that is, to a setting where the training sample is generated by an arbitrary (unknown) probability distribution over examples and labels. We define a β-weak agnostic learner with respect to a hypothesis class F as follows: given a distribution P it outputs some hypothesis h ∈ F whose error is at most er_P(F) + β, where er...
Optimally-Smooth Adaptive Boosting and Application to Agnostic Learning
We describe a new boosting algorithm that is the first such algorithm to be both smooth and adaptive. These two features make possible performance improvements for many learning tasks whose solutions use a boosting technique. The boosting approach was originally suggested for the standard PAC model; we analyze possible applications of boosting in the context of agnostic learning, which is more ...
A Boosting Framework on Grounds of Online Learning
By exploiting the duality between boosting and online learning, we present a boosting framework which proves to be extremely powerful thanks to employing the vast knowledge available in the online learning area. Using this framework, we develop various algorithms to address multiple practically and theoretically interesting questions including sparse boosting, smooth-distribution boosting, agno...
Communication Efficient Distributed Agnostic Boosting
We consider the problem of learning from distributed data in the agnostic setting, i.e., in the presence of arbitrary forms of noise. Our main contribution is a general distributed boosting-based procedure for learning an arbitrary concept space, that is simultaneously noise tolerant, communication efficient, and computationally efficient. This improves significantly over prior works that were ...
Potential-Based Agnostic Boosting
We prove strong noise-tolerance properties of a potential-based boosting algorithm, similar to MadaBoost (Domingo and Watanabe, 2000) and SmoothBoost (Servedio, 2003). Our analysis is in the agnostic framework of Kearns, Schapire and Sellie (1994), giving polynomial-time guarantees in presence of arbitrary noise. A remarkable feature of our algorithm is that it can be implemented without reweig...