Estimating Generalization Error Using Out-of-Bag Estimates
Authors
Abstract
We provide a method for estimating the generalization error of a bag using out-of-bag estimates. In bagging, each predictor (a single hypothesis) is learned from a bootstrap sample of the training examples; the output of a bag (a set of predictors) on an example is determined by voting. The out-of-bag estimate is based on recording the votes of each predictor on those training examples omitted from its bootstrap sample. Because no additional predictors are generated, the out-of-bag estimate requires considerably less time than 10-fold cross-validation. We address the question of how to use the out-of-bag estimate to estimate generalization error. Our experiments on several datasets show that the out-of-bag estimate and 10-fold cross-validation have very inaccurate (much too optimistic) confidence levels. We can improve the out-of-bag estimate by incorporating a correction.
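The out-of-bag mechanism described above can be sketched as follows: train each predictor on a bootstrap sample, record its votes on the examples omitted from that sample, and take the majority of accumulated votes as the out-of-bag prediction. This is a minimal illustration, not the authors' implementation; the synthetic dataset, decision-tree base learner, and ensemble size of 50 are assumptions for the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, random_state=0)  # hypothetical data
n, B = len(X), 50  # n training examples, B predictors in the bag

# votes[i, c] counts out-of-bag votes for class c on training example i
votes = np.zeros((n, 2), dtype=int)
for _ in range(B):
    idx = rng.integers(0, n, size=n)           # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(n), idx)      # examples omitted from this sample
    tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    pred = tree.predict(X[oob])
    votes[oob, pred] += 1                      # record this predictor's OOB votes

covered = votes.sum(axis=1) > 0                # examples with at least one OOB vote
oob_pred = votes[covered].argmax(axis=1)       # majority vote over OOB predictors
oob_error = np.mean(oob_pred != y[covered])
print(f"out-of-bag error estimate: {oob_error:.3f}")
```

Because every predictor's OOB votes are collected as a by-product of training, no extra predictors are fit, which is the source of the speed advantage over 10-fold cross-validation noted in the abstract.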
Similar References
Cost-Complexity Pruning of Random Forests
Random forests perform bootstrap aggregation by sampling the training samples with replacement. This enables the evaluation of out-of-bag error, which serves as an internal cross-validation mechanism. Our motivation lies in using the unsampled training samples to improve each decision tree in the ensemble. We study the effect of using the out-of-bag samples to improve the generalization error first...
Out-of-bag estimation of the optimal sample size in bagging
The performance of m-out-of-n bagging with and without replacement is analyzed in terms of the sampling ratio (m/n). Standard bagging uses resampling with replacement to generate bootstrap samples of the same size as the original training set (m_wr = n). Without-replacement methods typically use half samples (m_wor = n/2). These choices of sampling size are arbitrary and need not be optimal in terms of...
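The two conventional sampling choices mentioned above can be compared directly via their out-of-bag errors. This is a hedged sketch under assumed settings (synthetic data, 30 trees, decision-tree base learner), not the evaluation protocol of the cited paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=300, random_state=1)  # hypothetical data
n, B = len(X), 30

def oob_error(ratio, replace):
    """OOB error of a bag whose predictors each train on m = ratio*n samples."""
    m = int(ratio * n)
    votes = np.zeros((n, 2), dtype=int)
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=replace)   # m-out-of-n sample
        oob = np.setdiff1d(np.arange(n), idx)          # omitted examples
        tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
        votes[oob, tree.predict(X[oob])] += 1
    covered = votes.sum(axis=1) > 0
    return np.mean(votes[covered].argmax(axis=1) != y[covered])

# the two conventional choices: m_wr = n (with replacement), m_wor = n/2 (without)
print("with replacement,    m = n:  ", round(oob_error(1.0, True), 3))
print("without replacement, m = n/2:", round(oob_error(0.5, False), 3))
```

Sweeping `ratio` over a grid instead of these two fixed values is the kind of search the snippet's question about optimality suggests.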
On the overestimation of random forest's out-of-bag error
Background: The ensemble method random forests has become a popular classification tool in bioinformatics and related fields. The out-of-bag error is an error estimation technique which is often used to evaluate the accuracy of a random forest, as well as for selecting appropriate values for tuning parameters such as the number of candidate predictors that are randomly drawn for a split, referre...
An Efficient Method to Estimate Bagging's Generalization Error
Bagging [1] is a technique that tries to improve a learning algorithm's performance by using bootstrap replicates of the training set [5, 4]. The computational requirements for estimating the resulting generalization error on a test set by means of cross-validation are often prohibitive: for leave-one-out cross-validation, one needs to train the underlying algorithm on the order of m times, where...
Out of Bootstrap Estimation of Generalization Error Curves in Bagging Ensembles
The dependence of the classification error on the size of a bagging ensemble can be modeled within the framework of Monte Carlo theory for ensemble learning. These error curves are parametrized in terms of the probability that a given instance is misclassified by one of the predictors in the ensemble. Out of bootstrap estimates of these probabilities can be used to model generalization error cu...
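The Monte Carlo view sketched above can be made concrete: if p_i is the probability that a single predictor misclassifies instance i, a majority vote of T independent predictors errs on i when more than T/2 of them err, i.e. with probability P(Binomial(T, p_i) > T/2). The per-instance probabilities below are hypothetical stand-ins for the out-of-bootstrap estimates the snippet describes:

```python
import numpy as np
from scipy.stats import binom

# p[i]: probability that a single predictor misclassifies instance i
# (hypothetical values; in practice these come from out-of-bootstrap estimates)
rng = np.random.default_rng(2)
p = rng.beta(2, 5, size=200)

def ensemble_error(T):
    """Expected majority-vote error of T independent predictors (T odd)."""
    k = T // 2  # the vote is wrong when strictly more than half the predictors err
    return np.mean(binom.sf(k, T, p))

# error curve as a function of ensemble size
for T in (1, 11, 101):
    print(T, round(ensemble_error(T), 3))
```

For instances with p_i < 1/2 the error decays as T grows, while instances with p_i > 1/2 remain misclassified; the curve therefore flattens toward the fraction of such hard instances, which is the qualitative shape these generalization error curves exhibit.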