Bagging Does Not Always Decrease Mean Squared Error
Authors
Abstract
Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. Heuristically, the averaging process should reduce the variance component of the prediction error. This is supported by empirical evidence suggesting that bagging can indeed reduce prediction error and appears to be most effective for CART trees, which are highly unstable functions of the data. We study the effects of bagging for the simple class of U-statistics. While these do not describe CART trees, U-statistics have the advantage of admitting a complete and rigorous analysis. We find that bagging always increases bias, but the effects on variance and mean squared error depend on the specifics of the U-statistic and its distribution. We also find a correspondence, to order 1/N^2, between bagging based on resampling with replacement and bagging based on resampling without replacement.

AT&T Labs–Research, 180 Park Ave, Florham Park, NJ 07932-0971; [email protected]
Department of Statistics, University of Washington, Seattle, WA 98195-4322; [email protected]. Research partially supported by NSF grant DMS 9803226. This work was performed while the second author was on sabbatical leave at AT&T Labs.
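As a concrete illustration, the procedure described in the abstract (draw bootstrap samples, refit the learner on each, average the predictions) can be sketched in a few lines of Python; the learner interface below (a fit(X, y) callable returning a prediction function) and all names are illustrative assumptions, not from the paper:

import numpy as np

def bag(X, y, x_new, fit, n_boot=100, seed=0):
    # Bagging sketch: refit on bootstrap resamples and average predictions.
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n indices with replacement
        predict = fit(X[idx], y[idx])     # apply the learner to the resample
        preds.append(predict(x_new))
    return np.mean(preds, axis=0)         # the bagged prediction rule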
Similar Articles
On Bagging and Estimation in Multivariate Mixtures
Two bagging approaches, namely (1/2)n-out-of-n sampling without replacement (subagging) and n-out-of-n sampling with replacement (bagging), have been applied to the problem of estimating the parameters of a multivariate mixture model. It has been observed, in Monte Carlo simulations and a real data example, that both bagging methods reduce the standard deviation of the maximum likelihood estimator of the mix...
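The two schemes differ only in the resampling step, as the following hedged Python sketch makes explicit (the estimator, data, and replication count are illustrative assumptions, not the paper's setup):

import numpy as np

def resampled_estimate(data, estimator, n_rep=200, replace=True, frac=1.0, seed=0):
    # Average the estimator over repeated resamples of size frac * n.
    rng = np.random.default_rng(seed)
    n = len(data)
    m = int(frac * n)
    reps = [estimator(data[rng.choice(n, size=m, replace=replace)])
            for _ in range(n_rep)]
    return np.mean(reps, axis=0)

data = np.random.default_rng(1).normal(size=100)
theta_bag = resampled_estimate(data, np.median, replace=True, frac=1.0)   # bagging
theta_sub = resampled_estimate(data, np.median, replace=False, frac=0.5)  # subagging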
Application of ensemble learning techniques to model the atmospheric concentration of SO2
For pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machines, multilayer perceptrons, linear regression, and re...
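A hedged sketch of the homogeneous versus heterogeneous setup described above, using scikit-learn analogues (the study's actual tooling and data are not given here; GradientBoostingRegressor stands in for additive regression, and the data below is a random placeholder):

import numpy as np
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = rng.random((200, 5)), rng.random(200)  # placeholder SO2 features/levels

homogeneous = {
    "random_forest": RandomForestRegressor(n_estimators=100),
    "bagging": BaggingRegressor(n_estimators=100),
    "additive": GradientBoostingRegressor(),
}
voting = VotingRegressor(list(homogeneous.items()))  # heterogeneous ensemble

for name, model in {**homogeneous, "voting": voting}.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())  # R^2 by default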
Analyzing Bagging
Bagging is one of the most effective computationally intensive procedures for improving unstable estimators or classifiers, and is especially useful for high-dimensional data problems. Here we formalize the notion of instability and derive theoretical results analyzing the variance reduction effect of bagging (and variants thereof), mainly in hard decision problems, which include estimation after t...
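The canonical hard-decision example is an indicator that jumps at a threshold; a small simulation in that spirit (the threshold, sample sizes, and replication counts below are illustrative assumptions) shows bagging smoothing the rule and reducing its variance:

import numpy as np

rng = np.random.default_rng(0)
d, n, n_boot = 0.0, 50, 200

def hard(x):
    # Unstable plug-in decision: 1 if the sample mean is below the threshold d.
    return float(np.mean(x) <= d)

def bagged(x):
    # Bootstrap-averaged version of the same rule (a smoothed decision).
    return np.mean([hard(rng.choice(x, size=len(x), replace=True))
                    for _ in range(n_boot)])

samples = rng.normal(0.0, 1.0, size=(500, n))
print("var(hard):  ", np.var([hard(s) for s in samples]))
print("var(bagged):", np.var([bagged(s) for s in samples]))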
Pricing and hedging derivative securities with neural networks: Bayesian regularization, early stopping, and bagging
We study the effectiveness of cross-validation, Bayesian regularization, early stopping, and bagging in mitigating overfitting and improving generalization for pricing and hedging derivative securities, using daily S&P 500 index call options from January 1988 to December 1993. Our results indicate that Bayesian regularization can generate significantly smaller pricing and delta-hedging errors...
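Two of the devices compared above, early stopping and bagging, can be sketched together with scikit-learn stand-ins for the paper's networks (the architecture, parameters, and random placeholder data are assumptions for illustration only):

import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X, y = rng.random((500, 4)), rng.random(500)  # placeholder option features/prices

net = MLPRegressor(hidden_layer_sizes=(16,), early_stopping=True, max_iter=2000)
bagged_nets = BaggingRegressor(net, n_estimators=10)  # bootstrap-averaged networks
bagged_nets.fit(X, y)
print(bagged_nets.predict(X[:5]))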
A Case Study on Bagging, Boosting, and Basic Ensembles of Neural Networks for OCR
We study the effectiveness of three neural network ensembles in improving OCR performance: (i) basic, (ii) bagging, and (iii) boosting. Three random character degradation models are introduced in training individual networks in order to reduce the error correlation between individual networks and to improve the generalization ability of neural networks. We compare the recognition accuracies of t...
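The decorrelation idea, training each ensemble member on data perturbed by a different degradation model, can be sketched as follows (the three degradations and the learner interface are illustrative stand-ins, not the paper's models):

import numpy as np

def degrade_noise(x, rng):
    return x + rng.normal(0.0, 0.1, x.shape)   # additive pixel noise

def degrade_blur(x, rng):
    return (x + np.roll(x, 1, axis=-1)) / 2.0  # crude horizontal blur

def degrade_erode(x, rng):
    return x * (rng.random(x.shape) > 0.05)    # randomly drop pixels

def train_ensemble(X, y, fit, seed=0):
    # One degradation model per member, to reduce error correlation.
    rng = np.random.default_rng(seed)
    return [fit(degrade(X, rng), y)
            for degrade in (degrade_noise, degrade_blur, degrade_erode)]

def ensemble_predict(members, x):
    # The "basic" combination: average the member outputs.
    return np.mean([m(x) for m in members], axis=0)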