Fast Two-Sample Permutation Tests, Even When One Sample is Large, that Efficiently Maximize Power Under Crude Monte Carlo Sampling
Abstract
I present a method for quickly performing multiple nonparametric two-sample permutation tests on continuous data in SAS, even when one sample is large. I maximize statistical power (within the context of a crude Monte Carlo approach) by “oversampling”: drawing more permutation samples than desired, deleting duplicates, and then selecting the desired number of samples from the remainder. I determine the optimal number of samples to “oversample” based on sampling probability and the runtime of a sampling procedure (PROC PLAN). Implementing “oversampling” with nearly optimal numbers of samples typically increases start-to-finish runtime by only 5%, and always by less than 10%. Using telecommunications performance measurement data from multiple sources with a wide range of sample size pairs, I benchmark start-to-finish runtime against a) another SAS procedure (PROC MULTTEST), b) another SAS program written for the same purpose, and c) Cytel’s PROC TWOSAMPL, with very favorable results. The relative benchmark speeds would be identical if applied to data from randomized controlled studies.

INTRODUCTION

Permutation tests are as old as modern statistics, and their statistical properties are well understood and thoroughly documented in the statistical literature. Though not always as powerful as their parametric counterparts that rely on asymptotic theory, they sometimes have equal or even greater power. Often they can be used when asymptotic theory falls short (e.g. small samples and the Central Limit Theorem), and when fully enumerated, they provide gratifyingly exact results (as opposed to approximations based on asymptotic theory). Most important, however, is their reliance on very few distributional assumptions, giving permutation tests a much broader range of application. Until recently the major drawback of permutation tests has been their high computational demands.
Fully enumerating a permutation test requires calculating the test statistic appropriate for the hypotheses being tested for every possible two-sample combination of the data points. The value of the test statistic based on the original two samples must then be compared to those based on all the “permutation” samples to obtain a p-value, which is the result of the test. Drawing only a sample of all possible samples, as is typically done, has still been associated with prohibitive computer runtimes. Recent advances in computing capacity and speed, however, have increasingly relaxed this constraint. But efficient statistical code is still needed to exploit these advances most effectively and to ensure that the choice of method is driven as much by statistical theory, and as little by technological constraint, as possible. The goal of the methods described below is to contribute to this effort.

1 Permutation tests were advocated by one of the fathers of modern statistics, Sir R.A. Fisher, as early as the 1920s.
2 Pesarin (2001) and Mielke and Berry (2001) contain extensive bibliographies.
3 For just one example, see Andersen and Legendre (1999).
4 In fact, it was Fisher who correctly characterized parametric tests relying on asymptotic theory as mere approximations to the exact results of fully enumerated permutation tests (Good (1994), p. 4, citing Fisher).
5 Exchangeability of the data under both the null and alternate hypotheses is the major requirement of permutation tests. Good (1994) states that a permutation test requires only “that the underlying distributions are symmetric, and/or the alternatives are simples [sic] shifts in value”, though there are cases satisfying these criteria where the basic, nonparametric permutation test discussed here is not the most appropriate method and should not be relied upon (e.g. the Behrens-Fisher problem; see Pesarin (2001), Ch. 10).
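The procedure described above can be illustrated with a minimal Python sketch. It is not the paper's SAS code. The test statistic (a difference in sample means), the one-sided "at least as large" comparison, and the names `mean_diff`, `exact_pvalue`, and `monte_carlo_pvalue` are assumptions chosen for illustration. The first function enumerates every two-sample combination; the second draws only T uniformly random ones, so the same combination may be drawn more than once.

```python
import itertools
import random
from statistics import mean


def mean_diff(group1, group2):
    # Illustrative test statistic (an assumption): difference in sample means.
    return mean(group1) - mean(group2)


def exact_pvalue(x, y):
    """Fully enumerate every split of the pooled data into sizes len(x), len(y)."""
    pooled = list(x) + list(y)
    n, n1 = len(pooled), len(x)
    observed = mean_diff(x, y)
    hits = total = 0
    for idx in itertools.combinations(range(n), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(n) if i not in chosen]
        # Count splits whose statistic is at least as large as the observed one.
        hits += mean_diff(g1, g2) >= observed
        total += 1
    return hits / total


def monte_carlo_pvalue(x, y, T=1901, seed=0):
    """Estimate the p-value from T uniformly random splits (duplicates possible)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n1 = len(x)
    observed = mean_diff(x, y)
    hits = 0
    for _ in range(T):
        rng.shuffle(pooled)  # a random relabelling of the pooled data
        hits += mean_diff(pooled[:n1], pooled[n1:]) >= observed
    return hits / T


print(exact_pvalue([4, 5, 6], [1, 2, 3]))  # 1 of 20 splits -> 0.05
```

With three observations per sample there are only 20 splits, so full enumeration is trivial; the Monte Carlo version is the one that remains practical as sample sizes grow.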
IMPLEMENTING PERMUTATION TESTS IN SAS

Two procedures in SAS can be used to perform two-sample nonparametric permutation tests: PROC MULTTEST and PROC PLAN. The former directly samples the input dataset itself, while the latter generates a record-by-record list identifying those records on the input dataset to include in the samples. This list must subsequently be merged with the original data to obtain the corresponding data points. In addition, PROC MULTTEST actually conducts the permutation test and provides a p-value (assuming that, for continuous data, a pooled-variance t-test is the appropriate test statistic). The results of PROC PLAN, on the other hand, must be manipulated “manually” to calculate the value of the test statistic associated with the original sample pair, and then to compare it to those associated with all the permutation samples to obtain a p-value. Even so, this entire process using PROC PLAN is much faster than PROC MULTTEST under most conditions, as shown in the benchmark section below, and it also provides more flexibility in the definition of the test statistic.

PROC PLAN does have a sample size constraint: the product of the sum of the two sample sizes (n1 + n2) and the number of samples being drawn (T) cannot exceed 2 or the procedure terminates. This can be circumvented by placing the calls to PROC PLAN in a loop that cycles roundup((n1 + n2)* T / 2) times, with each pass drawing T / [roundup((n1 + n2)* T / 2)] samples until T samples have been drawn (see code in Appendix A). This looping in and of itself does not slow the total runtime of the procedure. When samples are large enough to require such looping, PROC PLAN is at least 20 times faster than PROC MULTTEST.
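The looping workaround can be sketched in Python as follows. This is an illustration of the batching idea only, not the SAS code of Appendix A. The numeric value of the size limit is not recoverable from the text as it stands, so the sketch takes it as a hypothetical parameter `limit`; `draw_in_chunks` is likewise a made-up name, and `random.sample` stands in for the list of selected record numbers that PROC PLAN would produce. The batch size is computed with a floor so that no single call exceeds the limit, which means the sketch never uses fewer loops than the count given in the text and can use slightly more.

```python
import math
import random


def draw_in_chunks(n1, n2, T, limit, seed=0):
    """Draw T permutation samples in batches with (n1 + n2) * batch <= limit."""
    rng = random.Random(seed)
    n = n1 + n2
    max_batch = limit // n  # largest batch a single call may draw
    if max_batch < 1:
        raise ValueError("limit is too small to draw even one sample")
    n_loops = math.ceil(T / max_batch)
    per_loop = math.ceil(T / n_loops)  # spread the T samples evenly over the loops

    samples, batch_sizes = [], []
    while len(samples) < T:
        batch = min(per_loop, T - len(samples))
        for _ in range(batch):
            # Record numbers assigned to the first sample; the rest form the second.
            samples.append(rng.sample(range(n), n1))
        batch_sizes.append(batch)
    return samples, batch_sizes


samples, batch_sizes = draw_in_chunks(n1=3, n2=4, T=10, limit=20)
print(len(samples), batch_sizes)  # 10 [2, 2, 2, 2, 2]
```

Each element of `samples` is a list of record indices, which mirrors the way the PROC PLAN output must be merged back onto the original data before the test statistic can be computed.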
Another important issue regarding the implementation of these procedures is the sampling method: both PROC MULTTEST and PROC PLAN can perform crude Monte Carlo sampling without replacement within a sample, as required of a permutation test, but neither can avoid the possibility of drawing the same sample more than once. In other words, when drawing a sample of samples, both procedures can only sample the entire samples with replacement. This problem of drawing duplicate samples, its effect on the statistical power of the permutation test, and a proposed solution that maximizes power under crude Monte Carlo sampling are discussed below.

6 Of course, the sample sizes of all these possible combinations are the same as the original two samples.
7 The p-value is simply a proportion: the percentage of “permutation” sample test statistic values at least as large as that based on the original data.
8 This paper addresses only the two-sample permutation test, though its methods readily can be applied to permutation tests with more complex study designs.
9 PROC MULTTEST is a versatile procedure with many functions; I address only its specific application to two-sample nonparametric permutation tests on continuous data.

DETERMINING THE NUMBER OF PERMUTATION SAMPLES

Full enumeration of permutation tests quickly becomes infeasible as sample sizes increase because the number of possible sample combinations becomes very large, even for relatively small sample sizes. However, full enumeration is unnecessary, as the permutation test can be based on only a sample of all possible samples. Recognizing that the resulting p-value is simply an estimated proportion distributed binomially, we can determine its coefficient of variation (cv) and confidence interval as functions of the number of samples (T) drawn. For example, to achieve cv < 0.10 with a p-value equal to the critical value of the test (p-value = α = 0.05), T = 1,901 samples are required.
This yields a 95% confidence interval within 0.01 of the estimated p-value, which for most practical purposes is a sufficiently precise estimate of the fully enumerated p-value. Larger values of T will yield greater precision, but the marginal increases in precision decrease rapidly and nonlinearly (see Graph 1 below).

10 These are simple, uniform random draws as opposed to more complicated sampling algorithms, such as importance sampling (see Mehta et al. (1988)).
11 The number of possible sample combinations is given by $\frac{(n_1 + n_2)!}{n_1!\, n_2!}$, which for $n_1 = 3$ and $n_2 = 3$ is just 20, but for the slightly larger sample pair of $n_1 = 29$ and $n_2 = 30$ is a sizeable 59,132,290,782,430,700 (approximately).
12 The coefficient of variation is a unitless measure of relative spread. It is simply the standard error of a statistic divided by its mean (see Zar (1999), p. 40).
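The sample-size arithmetic in this section can be checked with a short Python sketch. It assumes the usual binomial standard error sqrt(p(1 - p)/T) for the estimated p-value and a normal-approximation interval with z = 1.96 (the text does not say which interval was used). The function names are illustrative. Exact fractions are used for the threshold so that the boundary case T = 1,900, where cv equals 0.10 exactly, is not blurred by floating-point rounding.

```python
import math
from fractions import Fraction


def cv_of_pvalue(p, T):
    """Coefficient of variation of a binomial proportion estimate: se / mean."""
    return math.sqrt(p * (1 - p) / T) / p


def min_T_for_cv(p, target_cv):
    """Smallest T whose cv is strictly below target_cv (exact arithmetic)."""
    p, target_cv = Fraction(p), Fraction(target_cv)
    bound = (1 - p) / (p * target_cv ** 2)  # cv < target  <=>  T > bound
    return math.floor(bound) + 1


def ci_half_width(p, T, z=1.96):
    """Half-width of a normal-approximation confidence interval (assumed form)."""
    return z * math.sqrt(p * (1 - p) / T)


def n_combinations(n1, n2):
    """Number of distinct two-sample combinations: (n1 + n2)! / (n1! n2!)."""
    return math.comb(n1 + n2, n1)


T = min_T_for_cv("0.05", "0.10")
print(T)  # 1901
print(ci_half_width(0.05, T) < 0.01)  # True
print(n_combinations(3, 3))  # 20
```

Because cv falls only with the square root of T, halving it requires roughly four times as many samples, which is the diminishing return in precision noted above.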