V-fold Cross-validation and V-fold Penalization in Least-squares Density Estimation

Authors

  • SYLVAIN ARLOT
  • MATTHIEU LERASLE
Abstract

This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares risk of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization), with an upper bound decreasing as a function of V. In particular, this result implies that V-fold penalization is asymptotically optimal. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model-selection performance. We show that these variances depend on V like 1 + 1/(V − 1) (at least in some particular cases), suggesting that the performance improves markedly from V = 2 to V = 5 or 10 and is then almost constant. Overall, this explains the common advice to take V = 10, at least in our setting and when computational power is limited, as confirmed by simulation experiments.
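The procedure the abstract describes can be illustrated with a minimal sketch (this is not the authors' code; the histogram estimator, the [0, 1) support, the Beta-distributed sample, and the candidate bin counts are all illustrative assumptions). For each fold, the estimator is fit on the remaining V − 1 folds and scored on the held-out fold with the empirical least-squares contrast, and the model minimizing the averaged criterion is selected:

```python
import numpy as np

def histogram_density(train, n_bins):
    """Least-squares histogram density estimator on [0, 1)."""
    counts, edges = np.histogram(train, bins=n_bins, range=(0.0, 1.0))
    heights = counts / (len(train) * np.diff(edges))
    return edges, heights

def ls_contrast(edges, heights, test):
    """Empirical least-squares contrast: ||f_hat||^2 - (2/m) sum_i f_hat(X_i)."""
    widths = np.diff(edges)
    norm_sq = np.sum(heights ** 2 * widths)
    # locate the bin of each held-out point
    idx = np.clip(np.searchsorted(edges, test, side="right") - 1,
                  0, len(heights) - 1)
    return norm_sq - 2.0 * np.mean(heights[idx])

def vfold_cv(data, n_bins, V, rng):
    """V-fold cross-validation estimate of the least-squares risk."""
    folds = np.array_split(rng.permutation(data), V)
    crit = 0.0
    for j in range(V):
        train = np.concatenate([folds[k] for k in range(V) if k != j])
        edges, heights = histogram_density(train, n_bins)
        crit += ls_contrast(edges, heights, folds[j])
    return crit / V

rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, size=500)      # sample from an unknown density on [0, 1]
candidates = (2, 5, 10, 20, 40)          # histogram models, indexed by bin count
scores = {m: vfold_cv(data, m, V=10, rng=rng) for m in candidates}
best = min(scores, key=scores.get)       # model minimizing the V-fold criterion
```

Taking V = 10 here follows the advice supported by the paper; the variance analysis suggests little would be gained by a larger V at a higher computational cost.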


Similar resources

Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation

This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implie...


Appendix to the Article “Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation”

This appendix is organized as follows. The first section (called Section B, for consistency with the numbering of the article) gives complementary computations of variances. Then, results concerning hold-out penalization are detailed in Section D, with the proof of the oracle inequality stated in Section 8.2 (Theorem 12) and an exact computation of the variance. Section E provides complements o...


Model selection by resampling penalization

We present a new family of model selection algorithms based on the resampling heuristics. It can be used in several frameworks, does not require any knowledge of the unknown law of the data, and may be seen as a generalization of local Rademacher complexities and V-fold cross-validation. In the case example of least-squares regression on histograms, we prove oracle inequalities, and that these ...


Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases

We investigate the optimality for model selection of the so-called slope heuristics, V-fold cross-validation and V-fold penalization in a heteroscedastic regression context with random design. We consider a new class of linear models that we call strongly localized bases and that generalize histograms, piecewise polynomials and compactly supported wavelets. We derive sharp oracle inequalities ...


V-fold cross-validation improved: V-fold penalization

We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call “V-fold penalization”. Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it “overpenalizes” all the more as V is large. Hence, asymptotic opti...



Publication date: 2012