General Bias/Variance Decomposition with Target Independent Variance of Error Functions Derived from the Exponential Family of Distributions
Abstract
An important theoretical tool in machine learning is the bias/variance decomposition of the generalization error. It was introduced for the mean square error in [3]. The bias/variance decomposition includes the concept of the average predictor. The bias is the error of the average predictor and the systematic part of the generalization error, while the variability around the average predictor is the variance. We present a large group of error functions with the same desirable properties as the bias/variance decomposition in [3]. The error functions are derived from the exponential family of distributions via the statistical deviance measure. We prove that this family of error functions contains all error functions decomposable in that manner. We state the connection between the bias/variance decomposition and the ambiguity decomposition [7] and present a useful approximation of ambiguity that is quadratic in the ensemble coefficients.

1 Notation and problem domain

The problem domain of this paper is finding the functional relationship between output and input based on an example set of target-input pairs $\{(t_n, x_n)\}$. To make this a relevant problem it is assumed that the set is generated with noise from a function $f$. We wish to find a predictor $\hat{f}$ that is as close as possible to $f$. The vector $w$ refers to the parameters that describe the predictor, e.g. the weights in a neural network. Furthermore, we are interested in the situation where we have an ensemble of predictors characterized by a distribution $p(w)$, which is independent of the noise distribution. The mean operator with respect to $p(w)$ is denoted $E[\cdot]$. The set of predictors can be finite or infinite. We will generally look at only one input point, so for notational convenience we omit the dependency of functions on the input; we also omit the parameters of the predictors. The inaccuracy or error of a predictor is measured with an error function $e(t, \hat{f})$.

2 Bias/variance decomposition

If the example set is noisy, it is not guaranteed that $t_n = f(x_n)$ for all $n$. It is therefore not optimal to find a predictor with $\hat{f}(x_n) = t_n$ for all $n$. If the example set is noise-free and we find a predictor with $\hat{f}(x_n) = t_n$ for all $n$, the predictor can still differ from the function $f$ on all other points. In both cases, following the principle of Occam's Razor, the class of possible predictors should be restricted, e.g. by limiting the number of weights in a neural network. This raises an important question: just how large a class of predictors should be used? If the class is too small, the predictors are too simple and cannot fit the target function. If, on the other hand, the class is too large, the predictors can become too complex and overfit. The two cases correspond to two different kinds of error: bias and variance.

To fully understand the difference between the errors, we look at an ensemble of predictors. The mean of the predictors is the average predictor $\bar{f}$. The error of the average predictor expresses the systematic error of the predictors, i.e. the bias, while the mean of the error between the predictors and the average predictor expresses the stochastic error, i.e. the variance. Generally, both kinds of error will be made by an ensemble of predictors. We would like to be able to split the mean of the generalization error into a bias term and a variance term:

$E[e(t, \hat{f})] = e(t, \bar{f}) + E[e(\bar{f}, \hat{f})]$.

This is the bias/variance decomposition. It was introduced in [3] for the mean square error (MSE), $e(t, \hat{f}) = (t - \hat{f})^2$. The average predictor is $\bar{f} = E[\hat{f}]$, and the mean generalization error is $E[\mathrm{MSE}] = E[(t - \hat{f})^2]$. We have

$E[(t - \hat{f})^2] = (t - \bar{f})^2 + E[(\hat{f} - \bar{f})^2]$,

i.e. the mean generalization error is the sum of the bias $(t - \bar{f})^2$ and the variance $E[(\hat{f} - \bar{f})^2]$.
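To make the decomposition concrete, here is a minimal numerical sketch in Python. It is not code from the paper; the names t, f_w, and f_bar are our own illustrative choices. At a single input point it draws a finite ensemble of predictions and checks that the empirical bias and variance sum to the mean square error.

import numpy as np

# Numerical check of the MSE bias/variance decomposition at a single
# input point (Section 2). Assumption: a scalar target and a finite
# ensemble of predictor outputs sampled from some distribution p(w).

rng = np.random.default_rng(0)

t = 1.5                                            # target at this input point
f_w = rng.normal(loc=1.0, scale=0.3, size=10_000)  # ensemble predictions

f_bar = f_w.mean()                                 # average predictor E[f^]

mse = np.mean((t - f_w) ** 2)                      # mean generalization error
bias = (t - f_bar) ** 2                            # error of the average predictor
variance = np.mean((f_w - f_bar) ** 2)             # spread around the average

print(f"MSE             = {mse:.6f}")
print(f"bias            = {bias:.6f}")
print(f"variance        = {variance:.6f}")
print(f"bias + variance = {bias + variance:.6f}")  # matches MSE

Because $\bar{f}$ is taken as the empirical mean of the ensemble, the cross term $2(t - \bar{f})\,E[\bar{f} - \hat{f}]$ vanishes exactly, so the printed sum agrees with the MSE up to floating-point error.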
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000