Supplement to “Ensemble Subsampling for Imbalanced Multivariate Two-Sample Tests”
Abstract
In this supplemental article, we provide detailed proofs for the propositions and theorems in the main paper. We write the indicator function of an event $A$ as $1_A$. Let $X_1, \dots, X_n$ and $Y_1, \dots, Y_{\tilde{n}}$ be independent random samples in $\mathbb{R}^d$ from unknown distributions $F$ and $G$, respectively, with corresponding densities $f$ and $g$ with respect to Lebesgue measure. The densities are assumed to be continuous on their supports. The two-sample testing problem can be stated as $H_0: F = G$ versus $H_1: F \neq G$. Denote the two sets of indices by $\Omega_x = \{1, \dots, n\}$ and $\Omega_y = \{n+1, \dots, m\}$, with $m = n + \tilde{n}$. We ...
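The notation above can be sketched in code. The abstract is cut off before it defines any statistic, so the following is only an illustration under stated assumptions: it builds the pooled sample and the two index sets (0-based here, rather than 1-based), then computes the fraction of k nearest neighbors that come from the same sample, which is the kind of nearest-neighbor quantity the main paper's abstract attributes to Schilling's method. The function names `pooled_sample` and `same_sample_nn_fraction` are hypothetical, NumPy is assumed, and this is not the ensemble subsampling procedure itself.

```python
import numpy as np


def pooled_sample(x, y):
    """Stack the two samples into one pooled array and return the index sets.

    x has shape (n, d) and y has shape (n_tilde, d). The returned index
    arrays are 0-based: omega_x = {0, ..., n-1}, omega_y = {n, ..., m-1}.
    """
    n, n_tilde = len(x), len(y)
    m = n + n_tilde
    z = np.vstack([x, y])
    omega_x = np.arange(0, n)
    omega_y = np.arange(n, m)
    return z, omega_x, omega_y


def same_sample_nn_fraction(x, y, k=3):
    """Fraction of k-nearest-neighbor pairs in which both points share a sample.

    Brute-force O(m^2) distances; assumes k is smaller than the pooled
    sample size so every point has k distinct neighbors.
    """
    z, omega_x, omega_y = pooled_sample(x, y)
    m = len(z)

    # Label 0 for points indexed by omega_x, 1 for points indexed by omega_y.
    labels = np.zeros(m, dtype=int)
    labels[omega_y] = 1

    # Pairwise squared Euclidean distances; a point is not its own neighbor.
    diff = z[:, None, :] - z[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)

    # Indices of the k nearest neighbors of each pooled point.
    nn = np.argsort(dist, axis=1)[:, :k]

    # Indicator that a neighbor carries the same label as the point itself.
    same = labels[nn] == labels[:, None]
    return same.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 2))           # larger sample from F
    y = rng.normal(loc=0.5, size=(20, 2))   # smaller sample from G
    print(same_sample_nn_fraction(x, y, k=3))
```

With strongly unbalanced sample sizes, as in the usage lines, most neighbors of any point come from the larger sample regardless of whether $F = G$, which gives some intuition for why imbalance is a concern for nearest-neighbor tests.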
Similar resources
Ensemble Subsampling for Imbalanced Multivariate Two-Sample Tests
Some existing nonparametric two-sample tests for equality of multivariate distributions perform unsatisfactorily when the two sample sizes are unbalanced. In particular, the power of these tests tends to diminish with increasingly unbalanced sample sizes. In this article, we propose a new testing procedure to solve this problem. The proposed test, based on the nearest neighbor method by Schilli...
K-sample subsampling in general spaces: The case of independent time series
The problem of subsampling in two-sample and K-sample settings is addressed where both the data and the statistics of interest take values in general spaces. We focus on the case where each sample is a stationary time series, and construct subsampling confidence intervals and hypothesis tests with asymptotic validity. Some examples are also given, and the problem of optimal block size choice is...
K-sample Subsampling
The problem of subsampling in two-sample and K-sample settings is addressed where both the data and the statistics of interest take values in general spaces. We show the asymptotic validity of subsampling confidence intervals and hypothesis tests in the case of independent samples, and give a comparison to the bootstrap in the K-sample setting.
Tempering by Subsampling
In this paper we demonstrate that tempering Markov chain Monte Carlo samplers for Bayesian models by recursively subsampling observations without replacement can improve the performance of baseline samplers in terms of effective sample size per computation. We present two tempering by subsampling algorithms, subsampled parallel tempering and subsampled tempered transitions. We provide an asympt...
Sample Subset Optimization for Classifying Imbalanced Biological Data
Data in many biological problems are often compounded by imbalanced class distribution. That is, the positive examples may be largely outnumbered by the negative examples. Many classification algorithms, such as the support vector machine (SVM), are sensitive to data with imbalanced class distribution and produce suboptimal classification as a result. It is desirable to compensate the imbalance effect in model...