Insights into Cross-validation

Authors

  • AMIT DHURANDHAR
  • ALIN DOBRA
Abstract

Cross-validation is one of the most widely used techniques for estimating the generalization error of classification algorithms. Though several empirical studies have examined the behavior of this method, none clearly elucidates the reasons behind the observed behavior. In this paper we study the behavior of the moments (i.e. expected value and variance) of the cross-validation error and explain the observed behavior in detail. In particular, we provide insights into the behavior of the covariance between the individual runs of cross-validation, which has significant effects on the overall variance. We study this behavior on three classification models, a mix of parametric and non-parametric: the Naive Bayes classifier (parametric), and decision trees and the K-Nearest Neighbor classifier (non-parametric). The moments are computed using closed-form expressions rather than directly by Monte Carlo, since the former is shown to be a more viable alternative. The work in this paper complements prior experimental work on this method, since it explains in detail the reasons for the trends observed in the experiments rather than simply reporting them.
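
The role of the covariance between runs can be illustrated with a small simulation. The sketch below is not the paper's method: the paper computes the moments with closed-form expressions, whereas this example uses plain Monte Carlo on a hypothetical toy problem (two 1-D Gaussian classes and a nearest-class-mean classifier, both chosen only for illustration). It estimates the mean and variance of the k-fold cross-validation error and splits the variance into the part due to the variances of the individual folds and the part due to the covariances between folds. All function names and parameters are assumptions introduced here.

```python
import random
from itertools import combinations


def make_dataset(n, rng):
    """Draw n labeled points from a hypothetical toy distribution:
    class y in {0, 1}, feature x ~ Normal(1.5 * y, 1)."""
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        data.append((rng.gauss(1.5 * y, 1.0), y))
    return data


def fit(train):
    """Toy classifier (an assumption, not a model from the paper):
    store the mean feature value of each class."""
    means = []
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        means.append(sum(xs) / len(xs) if xs else 0.0)
    return means


def predict(means, x):
    """Assign x to the class whose mean is nearest."""
    return 0 if abs(x - means[0]) <= abs(x - means[1]) else 1


def fold_errors(data, k):
    """Return the k per-fold error rates of one k-fold cross-validation."""
    n = len(data)
    errs = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        test, train = data[lo:hi], data[:lo] + data[hi:]
        means = fit(train)
        wrong = sum(predict(means, x) != y for x, y in test)
        errs.append(wrong / len(test))
    return errs


def cov(a, b):
    """Covariance of two equal-length samples (1/T normalization)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def cv_moments(n, k, trials, seed=0):
    """Monte Carlo estimate of the mean and variance of the k-fold CV error,
    with the variance split into fold-variance and fold-covariance parts."""
    rng = random.Random(seed)
    runs = [fold_errors(make_dataset(n, rng), k) for _ in range(trials)]
    cv = [sum(r) / k for r in runs]   # CV error of each simulated dataset
    cols = list(zip(*runs))           # cols[i] = errors of fold i across datasets
    var_part = sum(cov(c, c) for c in cols) / k ** 2
    cov_part = 2 * sum(cov(cols[i], cols[j])
                       for i, j in combinations(range(k), 2)) / k ** 2
    return {
        "mean": sum(cv) / len(cv),
        "variance": cov(cv, cv),
        "variance_part": var_part,
        "covariance_part": cov_part,
    }


if __name__ == "__main__":
    for name, value in cv_moments(n=40, k=5, trials=500).items():
        print(f"{name}: {value:.6f}")
```

Because the cross-validation error is the average of the k fold errors, its variance equals the sum of all entries of the fold covariance matrix divided by k squared, so `variance` equals `variance_part + covariance_part` up to floating-point rounding. The sign and size of `covariance_part` show how much the dependence between folds contributes to the overall variance in this toy setting.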

Similar resources

Geometric Insights into Support Vector Machine Behavior using the KKT Conditions

The Support Vector Machine (SVM) is a powerful and widely used classification algorithm. Its performance is well known to be impacted by a tuning parameter which is frequently selected by cross-validation. This paper uses the Karush-Kuhn-Tucker conditions to provide rigorous mathematical proof for new insights into the behavior of SVM in the large and small tuning parameter regimes. These insig...

Cross-validating Image Description Datasets and Evaluation Metrics

The task of automatically generating sentential descriptions of image content has become increasingly popular in recent years, resulting in the development of large-scale image description datasets and the proposal of various metrics for evaluating image description generation systems. However, not much work has been done to analyse and understand both datasets and the metrics. In this paper, w...

Revealing the missing heritability via cross-validated genome-wide association studies

Presented here is a simple method for cross-validated genome-wide association studies (cvGWAS). Focusing on phenotype prediction, the method is able to reveal a significant amount of missing heritability by properly selecting a small number of loci with implicit predictive ability. The results provide new insights into the missing heritability problem and the underlying genetic architecture of ...

Analysis of Unstandardized Contributions in Cross Connected Networks

Understanding knowledge representations in neural nets has been a difficult problem. Principal components analysis (PCA) of contributions (products of sending activations and connection weights) has yielded valuable insights into knowledge representations, but much of this work has focused on the correlation matrix of contributions. The present work shows that analyzing the variance-covariance ...

Comparing the Data Sets

cross-national data sets on democracy. Whereas the main body of this article is organized around the four traditions, this appendix is structured around the resulting insights into the six data sets that have been examined. A substantial literature has assessed democracy measures from diverse perspectives (Munck & Verkuilen, 2002; Munck, 2009), and the discussion below considers only issues of ...

Publication date: 2008