Cross-Validated Bagged Learning.

Authors

  • Maya L Petersen
  • Annette M Molinaro
  • Sandra E Sinisi
  • Mark J van der Laan
Abstract

Many applications aim to learn a high-dimensional parameter of a data-generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, bootstrap aggregating (bagging) has been introduced as a method to reduce the variance of a given estimator at little cost to bias. Bagging involves applying an estimator to multiple bootstrap samples and averaging the result across bootstrap samples. In order to address the curse of dimensionality, a common practice has been to apply bagging to estimators which themselves use cross-validation, thereby using cross-validation within a bootstrap sample to select fine-tuning parameters trading off bias and variance of the bootstrap sample-specific candidate estimators. In this article we point out that in order to achieve the correct bias-variance trade-off for the parameter of interest, one should apply the cross-validation selector externally to candidate bagged estimators indexed by these fine-tuning parameters. We use three simulations to compare the new cross-validated bagging method with bagging of cross-validated estimators and bagging of non-cross-validated estimators.
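To make the distinction concrete, here is a minimal sketch, not the authors' implementation: the learner (depth-limited regression trees), the candidate depths, the number of bootstrap replicates, and the toy data are all assumptions chosen for illustration. The cross-validation selector is applied externally to candidate bagged estimators indexed by the fine-tuning parameter, rather than inside each bootstrap sample.

```python
# Cross-validated bagging sketch: external V-fold CV over candidate *bagged*
# estimators indexed by a fine-tuning parameter (here, tree depth).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor


def bagged_predict(X_tr, y_tr, X_te, depth, B=50, seed=0):
    """Average the predictions of B depth-limited trees fit to bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    preds = np.zeros((B, len(X_te)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)               # bootstrap sample
        tree = DecisionTreeRegressor(max_depth=depth, random_state=b)
        preds[b] = tree.fit(X_tr[idx], y_tr[idx]).predict(X_te)
    return preds.mean(axis=0)


def cv_select_bagged(X, y, depths=(1, 2, 4, 8), B=50, V=5):
    """Pick the depth whose *bagged* estimator has the smallest V-fold CV risk."""
    risks = {d: 0.0 for d in depths}
    for train, val in KFold(n_splits=V, shuffle=True, random_state=0).split(X):
        for d in depths:
            pred = bagged_predict(X[train], y[train], X[val], depth=d, B=B)
            risks[d] += np.mean((y[val] - pred) ** 2) / V
    best = min(risks, key=risks.get)
    return best, risks


# Toy data, assumed only for illustration.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(400, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=400)
best_depth, cv_risks = cv_select_bagged(X, y)
print("selected depth:", best_depth)
```

The alternative the article argues against would instead run cross-validation within each bootstrap sample to pick the depth before averaging; selecting the parameter outside the bagging step targets the risk, and hence the bias-variance trade-off, of the bagged estimator itself.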


Similar articles

Cross-validated bagged prediction of survival.

In this article, we show how to apply our previously proposed Deletion/Substitution/Addition algorithm in the context of right-censoring for the prediction of survival. Furthermore, we introduce how to incorporate bagging into the algorithm to obtain a cross-validated bagged estimator. The method is used for predicting the survival time of patients with diffuse large B-cell lymphoma based on ge...


An Efficient Method to Estimate Bagging's Generalization Error

Bagging [1] is a technique that tries to improve a learning algorithm's performance by using bootstrap replicates of the training set [5, 4]. The computational requirements for estimating the resultant generalization error on a test set by means of cross-validation are often prohibitive: for leave-one-out cross-validation one needs to train the underlying algorithm on the order of m times, where...
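The truncated abstract above does not spell out the proposed estimator, so the sketch below is only an illustration of the standard out-of-bag device for this problem (an assumption, not necessarily that paper's exact method): each observation is scored by the bootstrap members that did not contain it, so the error estimate comes essentially for free with the B fits that bagging already requires, instead of the roughly m additional bagged fits that leave-one-out cross-validation would need.

```python
# Out-of-bag (OOB) estimate of a bagged regressor's generalization error.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def oob_mse(X, y, B=200, depth=4, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    pred_sum = np.zeros(n)
    pred_cnt = np.zeros(n)
    for b in range(B):
        idx = rng.integers(0, n, size=n)             # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)        # observations left out
        if oob.size == 0:
            continue
        tree = DecisionTreeRegressor(max_depth=depth, random_state=b)
        tree.fit(X[idx], y[idx])
        pred_sum[oob] += tree.predict(X[oob])
        pred_cnt[oob] += 1
    mask = pred_cnt > 0                              # ignore never-OOB points
    oob_pred = pred_sum[mask] / pred_cnt[mask]
    return float(np.mean((y[mask] - oob_pred) ** 2))


# Toy data, assumed only for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)
print("OOB mean squared error:", round(oob_mse(X, y), 3))
```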


An Empirical Comparison of Supervised Learning Algorithms Using Different Performance Metrics

We present results from a large-scale empirical comparison between ten learning methods: SVMs, neural nets, logistic regression, naive Bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We evaluate the methods on binary classification problems using nine performance criteria: accuracy, squared error, cross-entropy, ROC Area, F-score, p...


The NeuralBAG algorithm: optimizing generalization performance in bagged neural networks

In this paper we propose an algorithm we call "NeuralBAG" that estimates the set of weights and number of hidden units each network in a bagged ensemble should have so that the generalization performance of the ensemble is optimized. Experiments performed on noisy synthetic data demonstrate the potential of the algorithm. On average, ensembles trained using NeuralBAG outperform bagged networks...


Bagging tree classifiers for laser scanning images: a data- and simulation-based strategy

Diagnosis based on medical image data is common in medical decision making and clinical routine. We discuss a strategy to derive a classifier with good performance on clinical image data and to justify the properties of the classifier by an adapted simulation model of image data. We focus on the problem of classifying eyes as normal or glaucomatous based on 62 routine explanatory variables deri...
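As a minimal illustration of bagged tree classifiers in this setting (the clinical laser-scanning data are not available here, so a synthetic stand-in with 62 explanatory variables is assumed, and scikit-learn's generic BaggingClassifier is used rather than the strategy developed in the paper):

```python
# Bagged classification trees on a synthetic binary problem with 62 features.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 62 routine explanatory variables.
X, y = make_classification(n_samples=300, n_features=62, n_informative=10,
                           random_state=0)

# Bagging of decision trees (the default base learner for BaggingClassifier).
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 random_state=0)
acc = cross_val_score(bagged_trees, X, y, cv=5, scoring="accuracy")
print(f"5-fold CV accuracy: {acc.mean():.3f} +/- {acc.std():.3f}")
```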



Journal title:
  • Journal of Multivariate Analysis

Volume 25, Issue 2

Pages: -

Publication date: 2008