Bootstrapping the Out-of-sample Predictions for Efficient and Accurate Cross-Validation

Authors

  • Ioannis Tsamardinos
  • Elissavet Greasidou
  • Michalis Tsagris
  • Giorgos Borboudakis
Abstract

Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and hyper-parameter values (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for this bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV's main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely nested cross-validation [31] and a method by Tibshirani and Tibshirani [29], BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any performance metric (accuracy, AUC, concordance index, mean squared error). We then reuse the idea of bootstrapping the out-of-sample predictions to speed up the CV process itself. Specifically, using a bootstrap-based hypothesis test, we stop training models on new folds for configurations that are statistically significantly inferior. We call the resulting method Bootstrap Corrected with Early Dropping CV (BCED-CV); it is both efficient and provides accurate performance estimates.
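
The bias-correction idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes one plausible reading of "bootstrap the whole process of selecting": in each bootstrap round, the best configuration is chosen on the resampled (in-bag) rows of the pooled out-of-sample prediction matrix and then scored on the rows left out of that resample. The names `bbc_cv`, `accuracy`, and `oos_pred` are hypothetical, and the metric is assumed to be "higher is better".

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correct predictions (higher is better)."""
    return float(np.mean(y_true == y_pred))

def bbc_cv(oos_pred, y, metric, n_boot=500, seed=0):
    """Bootstrap-corrected estimate for the selected configuration.

    oos_pred: (n_samples, n_configs) array; column j holds the pooled
              out-of-sample CV predictions of configuration j.
    y:        (n_samples,) array of true outcomes.
    No model is retrained: only stored predictions are resampled.
    """
    oos_pred = np.asarray(oos_pred)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    n, n_configs = oos_pred.shape
    estimates = []
    for _ in range(n_boot):
        in_bag = rng.integers(0, n, size=n)   # sample rows with replacement
        out_mask = np.ones(n, dtype=bool)
        out_mask[in_bag] = False              # rows never drawn this round
        if not out_mask.any():
            continue                          # no held-out rows; skip round
        # Assumed selection step: pick the best configuration in-bag.
        scores = [metric(y[in_bag], oos_pred[in_bag, j])
                  for j in range(n_configs)]
        best = int(np.argmax(scores))
        # Score the selected configuration on rows it was not selected on.
        estimates.append(metric(y[out_mask], oos_pred[out_mask, best]))
    return float(np.mean(estimates))

# Synthetic illustration: every configuration guesses at random, so the
# true accuracy of any selected configuration is 0.5.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=100)
oos_pred = rng.integers(0, 2, size=(100, 50))
naive = max(accuracy(y, oos_pred[:, j]) for j in range(50))
corrected = bbc_cv(oos_pred, y, accuracy)
print(f"naive best-of-50 CV accuracy: {naive:.3f}")
print(f"bootstrap-corrected estimate: {corrected:.3f}")
```

In this synthetic setting the naive best-of-50 score is inflated purely by selection, while the corrected estimate should fall below it. Because only the stored predictions are resampled, the extra cost is independent of how expensive the models are to train.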

Similar resources

Estimating misclassification error with small samples via bootstrap cross-validation

MOTIVATION: Estimation of misclassification error has received increasing attention in clinical diagnosis and bioinformatics studies, especially in small sample studies with microarray data. Current error estimation methods are not satisfactory because they either have large variability (such as leave-one-out cross-validation) or large bias (such as resubstitution and leave-one-out bootstrap). W...

Pii: S0895-4356(01)00341-9

The performance of a predictive model is overestimated when simply determined on the sample of subjects that was used to construct the model. Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects. We evaluated several variants of split-sample, cross-validation and bootstrapping methods with a logistic regression model...

Judgmental Bootstrapping: Inferring Experts' Rules for Forecasting

Judgmental bootstrapping is a type of expert system. It translates an expert's rules into a quantitative model by regressing the expert's forecasts against the information that the expert used. Bootstrapping models apply an expert's rules consistently, and many studies have shown that decisions and predictions from bootstrapping models are similar to those from the experts. Three studies showed that bo...

Fingerprint resampling: A generic method for efficient resampling

In resampling methods, such as bootstrapping or cross validation, a very similar computational problem (usually an optimization procedure) is solved over and over again for a set of very similar data sets. If it is computationally burdensome to solve this computational problem once, the whole resampling method can become unfeasible. However, because the computational problems and data sets are ...

An Evolutionary Bootstrap Method for Selecting Dynamic Trading Strategies

This paper combines techniques drawn from the literature on evolutionary optimization algorithms along with bootstrap based statistical tests. Bootstrapping and cross validation are used as a general framework for estimating objectives out of sample by redrawing subsets from a training sample. Evolution is used to search the large space of potential network architectures. The combination of the...

Journal title:
  • CoRR

Volume abs/1708.07180  Issue 

Pages  -

Publication date 2017