Off-training Set Error and a Priori Distinctions between Learning Algorithms
Author
David H. Wolpert
Abstract
This paper uses off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. It is shown, loosely speaking, that for any two algorithms A and B, there are as many targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is “anti-cross-validation” (choose the generalizer with the largest cross-validation error). On the other hand, for loss functions other than zero-one (e.g., quadratic loss), there are a priori distinctions between algorithms. However, even for such loss functions, any algorithm is equivalent on average to its “randomized” version, and in this sense still has no first-principles justification in terms of average error. Nonetheless, it may be that (for example) cross-validation has better minimax properties than anti-cross-validation, even for zero-one loss. This paper also analyzes averages over hypotheses rather than targets. Such analyses hold for all possible priors. Accordingly they prove, as a particular example, that cross-validation cannot be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-cross-validation rather than cross-validation (!). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if the empirical misclassification rate is low, the VC dimension of your generalizer is small, and the training set is large, then with high probability your OTS error is small. Other implications for “membership queries” algorithms and “punting” algorithms are also discussed.

“Even after the observation of the frequent conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience.” David Hume, in A Treatise of Human Nature, Book I, Part 3, Section 12.
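To make the zero-one-loss claim concrete, the following is a minimal Monte Carlo sketch, not taken from the paper: targets are drawn uniformly at random over a small finite input space, cross-validation and anti-cross-validation each pick between two simple generalizers via leave-one-out error, and their average off-training-set errors come out the same. All function and variable names are illustrative.

```python
# Monte Carlo sketch: averaged uniformly over targets, cross-validation and
# anti-cross-validation have the same expected off-training-set (OTS) error.
# All names and parameter choices here are illustrative, not from the paper.
import random

X = list(range(8))   # finite input space
M = 4                # number of distinct training inputs
random.seed(0)

def majority_learner(train):
    """Predict the most common training label everywhere."""
    ones = sum(y for _, y in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess

def constant_zero_learner(train):
    """Ignore the data and always predict 0."""
    return lambda x: 0

learners = [majority_learner, constant_zero_learner]

def loo_error(learner, train):
    """Leave-one-out zero-one error of a learner on the training set."""
    errs = 0
    for i, (xi, yi) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        errs += learner(rest)(xi) != yi
    return errs / len(train)

def ots_error(h, target, train_inputs):
    """Zero-one error on the inputs *not* in the training set."""
    off = [x for x in X if x not in train_inputs]
    return sum(h(x) != target[x] for x in off) / len(off)

def average_ots(select, trials=20000):
    """Average OTS error of a model-selection rule over random targets."""
    total = 0.0
    for _ in range(trials):
        target = {x: random.randint(0, 1) for x in X}   # uniform prior over targets
        train_inputs = random.sample(X, M)
        train = [(x, target[x]) for x in train_inputs]
        scores = [loo_error(L, train) for L in learners]
        chosen = learners[select(scores)]
        total += ots_error(chosen(train), target, train_inputs)
    return total / trials

cv  = average_ots(lambda s: min(range(len(s)), key=s.__getitem__))  # pick lowest LOO error
acv = average_ots(lambda s: max(range(len(s)), key=s.__getitem__))  # pick highest LOO error
print(f"mean OTS error, cross-validation:      {cv:.3f}")
print(f"mean OTS error, anti-cross-validation: {acv:.3f}")          # both come out ~0.5
```

Under any particular non-uniform prior over targets the two procedures can of course differ; the point of the sketch is only the uniform average.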
Similar Papers
The Lack of A Priori Distinctions Between Learning Algorithms
This is the first of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for a...
Machine Learning Models for Housing Prices Forecasting using Registration Data
This article aims to identify the best model for housing price forecasting using machine learning methods, with maximum accuracy and minimum error. Five important machine learning algorithms are used to predict housing prices, including Nearest Neighbor Regression Algorithm (KNNR), Support Vector Regression Algorithm (SVR), Random Forest Regression Algorithm (RFR), Extreme Gradient B...
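As a rough illustration of the comparison the snippet describes, here is a minimal scikit-learn sketch; the synthetic data, model settings, and the use of GradientBoostingRegressor as a stand-in for XGBoost (which lives in the separate xgboost package) are all assumptions, not the article's actual pipeline.

```python
# Sketch of comparing the regression algorithms named in the snippet on a
# housing-style dataset.  Synthetic data stands in for the registration data,
# and scikit-learn's GradientBoostingRegressor stands in for XGBoost (which
# would require the separate `xgboost` package).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the housing registration data.
X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "nearest neighbor regression (KNNR)": KNeighborsRegressor(n_neighbors=5),
    "support vector regression (SVR)": make_pipeline(StandardScaler(), SVR(C=100.0)),
    "random forest regression (RFR)": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient boosting (XGBoost stand-in)": GradientBoostingRegressor(random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name:38s}  MAE={mean_absolute_error(y_te, pred):8.2f}"
          f"  R2={r2_score(y_te, pred):6.3f}")
```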
Two Novel Learning Algorithms for CMAC Neural Network Based on Changeable Learning Rate
The Cerebellar Model Articulation Controller (CMAC) Neural Network is a computational model of the cerebellum that acts as a lookup table. The advantages of CMAC are fast learning convergence and the capability of mapping nonlinear functions, owing to its local generalization of weight updating, simple structure, and easy processing. In the training phase, the disadvantage of some CMAC models is an unstable phenomenon...
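The following is a minimal sketch of a one-dimensional CMAC with a fixed learning rate, illustrating the lookup-table structure and local weight updating the snippet describes; the paper's changeable-learning-rate schemes are not reproduced here, and all parameter values are illustrative.

```python
# Sketch of a 1-D CMAC (tile-coding) network with a fixed learning rate.
# The weights form a lookup table over several overlapping tilings, and only
# the tiles activated by an input are updated -- the "local generalization"
# the snippet mentions.  All parameter values are illustrative.
import numpy as np

class CMAC:
    def __init__(self, n_tilings=8, n_tiles=32, x_min=0.0, x_max=1.0, lr=0.2):
        self.n_tilings = n_tilings
        self.n_tiles = n_tiles
        self.x_min, self.x_max = x_min, x_max
        self.lr = lr
        self.w = np.zeros((n_tilings, n_tiles + 1))   # one weight row per tiling

    def _active(self, x):
        """Index of the single active tile in each (offset) tiling."""
        frac = (x - self.x_min) / (self.x_max - self.x_min)
        for t in range(self.n_tilings):
            offset = t / self.n_tilings               # shift each tiling slightly
            yield t, int(frac * self.n_tiles + offset)

    def predict(self, x):
        return sum(self.w[t, i] for t, i in self._active(x))

    def update(self, x, target):
        err = target - self.predict(x)
        for t, i in self._active(x):                  # delta rule on active tiles only
            self.w[t, i] += self.lr * err / self.n_tilings

# Train the table to approximate a nonlinear function.
rng = np.random.default_rng(0)
net = CMAC()
for _ in range(5000):
    x = rng.uniform(0.0, 1.0)
    net.update(x, np.sin(2 * np.pi * x))

for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"x={x:.1f}  cmac={net.predict(x):+.3f}  sin={np.sin(2 * np.pi * x):+.3f}")
```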
Some results concerning off-training-set and IID error for the Gibbs and the Bayes optimal generalizers
In this paper we analyze the average behavior of the Bayes-optimal and Gibbs learning algorithms. We do this both for off-training-set error and conventional IID error (for which test sets overlap with training sets). For the IID case we provide a major extension to one of the better known results of [7]. We also show that expected IID test set error is a non-increasing function of training set s...
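To illustrate the distinction between the two error measures, here is a small simulation, not taken from the paper: a Gibbs-style learner draws a hypothesis uniformly from those consistent with the training data, and its conventional IID test error (test inputs may repeat training inputs) is compared with its OTS error (test inputs may not). The input space, learner, and uniform target prior are all assumptions made for the sketch.

```python
# Small simulation contrasting conventional IID test error (test inputs may
# repeat training inputs) with off-training-set (OTS) error (they may not),
# for a Gibbs-style learner that guesses uniformly among hypotheses
# consistent with the training data.  Everything here is illustrative.
import random

random.seed(1)
X = list(range(10))    # finite input space
M = 6                  # number of distinct training inputs
TRIALS = 20000

iid_err = ots_err = 0.0
for _ in range(TRIALS):
    target = {x: random.randint(0, 1) for x in X}        # target drawn from a uniform prior
    train_inputs = random.sample(X, M)
    # Gibbs-style hypothesis: agree with the data, guess uniformly elsewhere.
    h = {x: target[x] if x in train_inputs else random.randint(0, 1) for x in X}

    x_iid = random.choice(X)                             # may land inside the training set
    x_ots = random.choice([x for x in X if x not in train_inputs])

    iid_err += h[x_iid] != target[x_iid]
    ots_err += h[x_ots] != target[x_ots]

print(f"IID test error: {iid_err / TRIALS:.3f}")   # ~0.2 = P(off-training input) * 1/2
print(f"OTS test error: {ots_err / TRIALS:.3f}")   # ~0.5
```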
Comparison of the Numerical Stability of Two Algorithms for the Calculation of Variance
In descriptive statistics, there are two computational algorithms for determining the variance $S^2$ of a set of observations $x_1, \dots, x_n$: Algorithm 1: $S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$; Algorithm 2: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. It is interesting to discuss which of the above formulas is numerically more trustworthy in machine-number arithmetic. In this paper, based on the total effect of rounding error, we prove that the second algorithm is better...
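A small floating-point experiment makes the comparison concrete; it is a sketch under the reconstruction of the two formulas above, not the article's own code.

```python
# Floating-point comparison of the two variance formulas on data with a
# large mean and small spread, stored in single precision.  Algorithm 1
# (one-pass sum of squares) suffers catastrophic cancellation; Algorithm 2
# (two-pass) stays close to the true variance of about 1.
import numpy as np

rng = np.random.default_rng(0)
x = (1.0e6 + rng.standard_normal(10000)).astype(np.float32)
n = x.size
xbar = np.float32(x.sum() / n)

# Algorithm 1: S^2 = (sum(x_i^2) - n * xbar^2) / (n - 1)
s2_alg1 = (np.sum(x * x) - n * xbar * xbar) / np.float32(n - 1)

# Algorithm 2: S^2 = sum((x_i - xbar)^2) / (n - 1)
s2_alg2 = np.sum((x - xbar) ** 2) / np.float32(n - 1)

print(f"Algorithm 1 (one-pass):  {s2_alg1:.6f}")   # wildly wrong, may even be negative
print(f"Algorithm 2 (two-pass):  {s2_alg2:.6f}")   # close to 1
print(f"float64 reference:       {np.var(x.astype(np.float64), ddof=1):.6f}")
```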