A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection

Author

  • Ron Kohavi
Abstract

We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment, over half a million runs of C4.5 and a Naive-Bayes algorithm, to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
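The two estimators compared in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's experimental code (which ran C4.5 and Naive-Bayes): the `(feature, label)` example format, the learner interface, the toy 1-nearest-neighbour learner `nearest_neighbor`, and all function names are hypothetical. Stratification is done by dealing each class's shuffled examples round-robin into the folds. For bootstrap, the sketch assumes Efron's .632 estimator, which weights the accuracy on examples left out of a bootstrap sample by 0.632 and the resubstitution accuracy by 0.368; other bootstrap variants exist.

```python
import random
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: an example is (feature, label); a learner maps
# training examples to a predict function. Not taken from the paper.
Example = Tuple[float, int]
Predictor = Callable[[float], int]
Learner = Callable[[Sequence[Example]], Predictor]


def nearest_neighbor(train: Sequence[Example]) -> Predictor:
    """Toy 1-nearest-neighbour learner on a single numeric feature."""
    def predict(x: float) -> int:
        return min(train, key=lambda ex: abs(ex[0] - x))[1]
    return predict


def accuracy(predict: Predictor, test: Sequence[Example]) -> float:
    """Fraction of test examples the predictor labels correctly."""
    return sum(predict(x) == y for x, y in test) / len(test)


def stratified_folds(data: Sequence[Example], k: int,
                     rng: random.Random) -> List[List[Example]]:
    """Deal shuffled examples of each class round-robin into k folds,
    so every fold has roughly the class proportions of the full set."""
    by_label = defaultdict(list)
    for ex in data:
        by_label[ex[1]].append(ex)
    folds: List[List[Example]] = [[] for _ in range(k)]
    i = 0
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        for ex in group:
            folds[i % k].append(ex)
            i += 1
    return folds


def cross_validation_accuracy(learner: Learner, data: Sequence[Example],
                              k: int = 10, seed: int = 0) -> float:
    """Stratified k-fold CV (k >= 2): each example is tested exactly once,
    by a model trained on the other k-1 folds; returns overall fraction correct."""
    folds = stratified_folds(data, k, random.Random(seed))
    correct = 0
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = learner(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(data)


def bootstrap_632_accuracy(learner: Learner, data: Sequence[Example],
                           b: int = 50, seed: int = 0) -> float:
    """.632 bootstrap: average over b samples of
    0.632 * (accuracy on examples not drawn) + 0.368 * resubstitution accuracy."""
    rng = random.Random(seed)
    n = len(data)
    acc_resub = accuracy(learner(data), data)  # train and test on all data
    estimates = []
    for _ in range(b):
        drawn = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        drawn_set = set(drawn)
        test = [data[i] for i in range(n) if i not in drawn_set]
        if not test:  # every example was drawn: no out-of-sample points
            continue
        acc_out = accuracy(learner([data[i] for i in drawn]), test)
        estimates.append(0.632 * acc_out + 0.368 * acc_resub)
    if not estimates:
        raise ValueError("no bootstrap sample left any example out")
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Hypothetical, perfectly separable toy data: class 0 near 0, class 1 near 2.
    data = [(i / 20, 0) for i in range(20)] + [(2 + i / 20, 1) for i in range(20)]
    print(cross_validation_accuracy(nearest_neighbor, data, k=10))  # 1.0
    print(bootstrap_632_accuracy(nearest_neighbor, data, b=50))
```

Setting `k=len(data)` turns the cross-validation function into leave-one-out (each fold then holds a single example, so stratification has no effect), and `b` sets the number of bootstrap samples; these are the parameters the abstract says were varied.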


Similar resources


Bootstrapping the Out-of-sample Predictions for Efficient and Accurate Cross-Validation

Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration ...


Improvement of effort estimation accuracy in software projects using a feature selection approach

In recent years, utilization of feature selection techniques has become an essential requirement for processing and model construction in different scientific areas. In the field of software project effort estimation, the need to apply dimensionality reduction and feature selection methods has become an inevitable demand. The high volumes of data, costs, and time necessary for gathering data , ...


Estimating and Reducing the Error of a Classifier or Predictor

Methods for error estimation, such as holdout, random subsampling, k-fold cross-validation, and bootstrap, are discussed. Also considered are general techniques, such as bagging and boosting, for increasing model accuracy.


Estimation of genotype imputation accuracy using reference populations with varying degrees of relationship and marker density panel

Genotype imputation from low-density to high-density (SNP) chips is an important step before applying genomic selection, because denser chips can provide more reliable genomic predictions. In the current research, the accuracy of genotype imputation from low and moderate-density panels (5K and 50K) to high-density panels in the purebred and crossbred populations was assessed. The simulated popu...


Publication date: 1995