The Binomial Block Bootstrap Estimator for Evaluating Loss on Dependent Clusters
Authors
Abstract
In this paper, we study the non-IID learning setting in which samples exhibit dependency within latent clusters. Our goal is to estimate a learner’s loss on new clusters, an extension of the out-of-bag error. Previously developed cross-validation estimators are well suited to the case where the clustering of the observed data is known a priori. In real-world problems, however, we are often given only a noisy approximation of this clustering, likely the result of some clustering algorithm. This subtle yet potentially significant issue afflicts domains ranging from image classification to medical diagnostics, where naive cross-validation is an optimistically biased estimator. We present a novel bootstrap technique and a corresponding cross-validation method that, somewhat counterintuitively, injects additional dependency to asymptotically recover the loss in the independent setting.
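To make the target quantity concrete, the following is a minimal sketch of an out-of-bag loss estimate computed at the cluster level. It is not the binomial block bootstrap proposed in the paper, whose details the abstract does not give; it assumes the easier case in which cluster labels are known exactly. The names `cluster_oob_loss`, `fit`, and `loss` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def cluster_oob_loss(xs, ys, cluster_ids, fit, loss, n_rounds=200, seed=0):
    """Estimate loss on unseen clusters by resampling whole clusters.

    Assumption: cluster_ids are the true cluster labels (no clustering noise).
    Each round draws clusters with replacement, fits on the drawn clusters,
    and scores on the clusters that were never drawn (out-of-bag clusters).
    """
    rng = random.Random(seed)

    # Group sample indices by cluster so clusters are resampled as units.
    by_cluster = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        by_cluster[c].append(i)
    clusters = sorted(by_cluster)

    round_losses = []
    for _ in range(n_rounds):
        drawn = [rng.choice(clusters) for _ in clusters]
        in_bag = set(drawn)
        oob = [c for c in clusters if c not in in_bag]
        if not oob:
            continue  # every cluster was drawn; nothing to score this round

        # A cluster drawn twice contributes its samples twice to training.
        train_idx = [i for c in drawn for i in by_cluster[c]]
        test_idx = [i for c in oob for i in by_cluster[c]]

        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        losses = [loss(model, xs[i], ys[i]) for i in test_idx]
        round_losses.append(sum(losses) / len(losses))

    if not round_losses:
        raise ValueError("no round produced out-of-bag clusters")
    return sum(round_losses) / len(round_losses)


# Toy usage: a mean predictor scored with squared error.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def squared_error(model, x, y):
    return (model - y) ** 2


xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
cluster_ids = ["a", "a", "b", "b", "c", "c"]
estimate = cluster_oob_loss(xs, ys, cluster_ids, fit_mean, squared_error)
```

Because whole clusters are held out, no test sample shares a cluster with a training sample, which is what removes the optimistic bias of sample-level cross-validation when the labels are correct. With noisy labels, the situation the abstract addresses, this guarantee no longer holds.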
Similar resources
Minimax Estimator of a Lower Bounded Parameter of a Discrete Distribution under a Squared Log Error Loss Function
The problem of estimating the parameter ?, when it is restricted to an interval of the form , in a class of discrete distributions, including the binomial, negative binomial, and discrete Weibull distributions, among others, is considered. We give necessary and sufficient conditions under which the Bayes estimator of with respect to a two-point boundary-supported prior is minimax under the squared log error loss function....
Optimum Block Size in Separate Block Bootstrap to Estimate the Variance of Sample Mean for Lattice Data
The statistical analysis of spatial data is usually done under a Gaussian assumption for the underlying random field model. When this assumption is not satisfied, block bootstrap methods can be used to analyze spatial data. One of the crucial problems in this setting is specifying the block sizes. In this paper, we present the asymptotically optimal block size for the separate block bootstrap to estimate the...
Admissible and Minimax Estimator of the Parameter $\theta$ in a Binomial $\mathrm{Bin}(n, \theta)$ Distribution under Squared Log Error Loss Function in a Lower Bounded Parameter Space
Extended Abstract. The study of truncated parameter spaces in general is of interest for the following reasons: 1. They often occur in practice. In many cases certain parameter values can be excluded from the parameter space. Nearly all problems in practice have a truncated parameter space, and it is almost impossible to argue in practice that a parameter is not bounded. In truncated parameter...
Fast Block Variance Estimation Procedures for Inhomogeneous Spatial Point Processes
We introduce two new variance estimation procedures by using non-overlapping and overlapping blocks, respectively. The non-overlapping block (NOB) estimator can be viewed as the limit of the thinned block bootstrap (TBB) estimator recently proposed in Guan and Loh (2007), by letting the number of thinned processes and bootstrap samples therein both increase to infinity. Compared to the latter, ...
Bias Correction with Jackknife, Bootstrap, and Taylor Series
We analyze the bias correction methods using jackknife, bootstrap, and Taylor series. We focus on the binomial model, and consider the problem of bias correction for estimating f(p), where f ∈ C[0, 1] is arbitrary. We characterize the supremum norm of the bias of general jackknife and bootstrap estimators for any continuous functions, and demonstrate the in deleted jackknife, different values o...