On Error-rate Estimation in Nonparametric Classification
Abstract
There is a substantial literature on the estimation of error rate, or risk, for nonparametric classifiers. Error-rate estimation has at least two purposes: accurately describing the error rate, and estimating the tuning parameters that permit the error rate to be minimised. In the light of work on related problems in nonparametric statistics, it is attractive to argue that both problems admit the same solution. Indeed, methods for optimising the point-estimation performance of nonparametric curve estimators often start from an accurate estimator of error. However, we argue in this paper that accurate estimators of error rate in classification tend to give poor results when used to choose tuning parameters, and vice versa. Concise theory is used to illustrate this point in the case of cross-validation (which gives very accurate estimators of error rate, but poor estimators of tuning parameters) and the smoothed bootstrap (where error-rate estimation is poor but tuning-parameter approximations are particularly good). The theory is readily extended to other methods, for example to the 0.632+ bootstrap approach, which gives good estimators of error rate but poor estimators of tuning parameters. Reasons for the apparent contradiction are given, and numerical results are used to point to the practical implications of the theory.
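As a rough illustration of the two uses of an error-rate estimator described above, the following sketch computes a leave-one-out cross-validation estimate of the misclassification rate for several bandwidths and then picks the bandwidth that minimises that estimate. It is not the paper's method or experiment: the one-dimensional Gaussian-kernel classifier, the two simulated normal classes, the bandwidth grid, and all function names (`kernel_classify`, `loo_cv_error`, `simulate`) are assumptions made here for illustration only.

```python
import math
import random


def kernel_classify(x, train, h):
    """Classify x as 0 or 1 by comparing Gaussian-kernel-weighted class votes.

    Hypothetical classifier chosen for illustration; h is the bandwidth.
    """
    score = 0.0
    for xi, yi in train:
        w = math.exp(-0.5 * ((x - xi) / h) ** 2)
        score += w if yi == 1 else -w
    return 1 if score > 0 else 0


def loo_cv_error(data, h):
    """Leave-one-out cross-validation estimate of the misclassification rate."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # training set without point i
        if kernel_classify(x, rest, h) != y:
            errors += 1
    return errors / len(data)


def simulate(n, rng):
    """Draw n labelled points: class 0 ~ N(0, 1), class 1 ~ N(1.5, 1).

    The distributions and equal class priors are assumptions of this sketch.
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(1.5 * y, 1.0), y))
    return data


rng = random.Random(0)
data = simulate(100, rng)
bandwidths = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]

# Use 1: describe the error rate at each candidate bandwidth.
cv_errors = {h: loo_cv_error(data, h) for h in bandwidths}

# Use 2: choose the tuning parameter that minimises the estimated error rate.
h_cv = min(cv_errors, key=cv_errors.get)

for h in bandwidths:
    print(f"h={h:<4}  LOO-CV error={cv_errors[h]:.3f}")
print("bandwidth minimising the CV estimate:", h_cv)
```

The sketch only shows how the same estimator serves both purposes. Whether the bandwidth selected in the last step is close to the one that truly minimises the error rate is the question the abstract addresses, and a single simulated sample like this one does not settle it.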
Similar resources
Exact Rate of Convergence of Kernel-Based Classification Rule
A binary classification problem is considered, where the posteriori probability is estimated by the nonparametric kernel regression estimate with naive kernel. The excess error probability of the corresponding plug-in decision classification rule according to the error probability of the Bayes decision is studied such that the excess error probability is decomposed into approximation and estima...
Statistical Topology Using the Nonparametric Density Estimation and Bootstrap Algorithm
This paper presents approximate confidence intervals for each function of parameters in a Banach space based on a bootstrap algorithm. We apply kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality distribution function estimator of random variables using integrated mean square error (IMSE). The results of simulation studies show a significant impro...
THE COMPARISON OF TWO METHOD NONPARAMETRIC APPROACH ON SMALL AREA ESTIMATION (CASE: APPROACH WITH KERNEL METHODS AND LOCAL POLYNOMIAL REGRESSION)
Small Area estimation is a technique used to estimate parameters of subpopulations with small sample sizes. Small area estimation is needed in obtaining information on a small area, such as sub-district or village. Generally, in some cases, small area estimation uses parametric modeling. But in fact, a lot of models have no linear relationship between the small area average and the covariat...
On a Theory of Nonparametric Pairwise Similarity for Clustering: Connecting Clustering to Classification
Pairwise clustering methods partition the data space into clusters by the pairwise similarity between data points. The success of pairwise clustering largely depends on the pairwise similarity function defined over the data points, where kernel similarity is broadly used. In this paper, we present a novel pairwise clustering framework by bridging the gap between clustering and multi-class class...
Estimating linear functionals of the error distribution in nonparametric regression
This paper addresses estimation of linear functionals of the error distribution in nonparametric regression models. It derives an i.i.d. representation for the empirical estimator based on residuals, using undersmoothed estimators for the regression curve. Asymptotic efficiency of the estimator is proved. Estimation of the error variance is discussed in detail. In this case, undersmoothing is n...