Algorithmic Stability and Ensemble-based Learning. A dissertation submitted to the Faculty of the Division of the Physical Sciences in candidacy for the degree of Doctor of Philosophy, Department of Computer Science, The University of Chicago. By Samuel Kutin.
Abstract
We explore two themes in formal learning theory. We begin with a detailed, general study of the relationship between the generalization error and the stability of learning algorithms. We then examine ensemble-based learning from the points of view of stability, decorrelation, and threshold complexity.

A central problem of learning theory is bounding generalization error. Most such bounds have been obtained through uniform convergence, via some variant of VC dimension. These analyses focus on the space of classifiers available to a learning algorithm rather than on the algorithm itself. Bousquet and Elisseeff (2002) have shown, using McDiarmid's method of independent bounded differences (1989), that algorithmic stability implies good bounds on generalization error. However, their definition of stability is too restrictive to be widely applicable. We introduce the more general notion of training stability. We show that training stability implies good bounds on generalization error even when the learner has infinite VC dimension. Our proof requires a result of independent interest: we generalize McDiarmid's theorem to the case in which differences are bounded only with high probability. In the Probably Approximately Correct setting of Valiant (1984), training stability is necessary and sufficient for good error performance, and it serves as a distribution-dependent analog of VC dimension. This enables us to emphasize the learning algorithm instead of the classifier space.

We next consider algorithms (e.g., boosting) which construct ensembles of weak classifiers. We show that AdaBoost (Freund and Schapire, 1997), a widely used boosting algorithm, is stability-preserving. We demonstrate that some boosting algorithms implicitly construct classifiers which are decorrelated (i.e., rarely jointly wrong). We present two new
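As a reading aid, the following is a minimal LaTeX sketch of the two classical tools the abstract names. These are the standard statements from McDiarmid (1989) and Freund and Schapire (1997), not the dissertation's generalized versions, and the closing gloss on the high-probability extension is an informal paraphrase.

% McDiarmid's bounded-differences inequality: let X_1, ..., X_n be
% independent, and let f satisfy, for each coordinate i,
\[
  \sup_{x_1,\ldots,x_n,\;x_i'}
    \bigl| f(x_1,\ldots,x_i,\ldots,x_n) - f(x_1,\ldots,x_i',\ldots,x_n) \bigr|
  \le c_i .
\]
% Then, for every t > 0,
\[
  \Pr\bigl[ f(X_1,\ldots,X_n) - \mathbb{E}\, f(X_1,\ldots,X_n) \ge t \bigr]
  \le \exp\!\left( \frac{-2t^2}{\sum_{i=1}^n c_i^2} \right).
\]
% The generalization proved in the dissertation relaxes the hypothesis so
% that the differences need only be bounded by c_i outside a
% low-probability "bad" event, at the cost of an extra additive term in
% the tail bound (informal gloss; see the dissertation for the exact form).

% The AdaBoost reweighting step, whose stability-preservation the abstract
% asserts: given weak classifier h_t with weighted error eps_t under the
% current distribution D_t over examples (x_i, y_i),
\[
  \alpha_t = \tfrac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t},
  \qquad
  D_{t+1}(i) = \frac{D_t(i)\, \exp\bigl(-\alpha_t \, y_i \, h_t(x_i)\bigr)}{Z_t},
\]
% where Z_t normalizes D_{t+1} to a probability distribution; the final
% ensemble predicts sign( sum_t alpha_t h_t(x) ).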