Conditions for validity of re-sampling based multiple testing with applications in genomics

Jason C. Hsu, Department of Statistics, The Ohio State University, Columbus, OH 43210, USA

Violeta Calian, Science Institute, University of Iceland, Dunhaga 3, 107 Reykjavik, Iceland
Abstract. When testing multiple hypotheses in genomics studies, we are usually confronted with the problem of unknown data and test-statistic distributions. The most popular solution to such problems is based on re-sampling, with or without replacement, of raw data or of residuals in a modeling approach. In this paper we find conditions which guarantee that these re-sampling distributions provide strong control of the maximum Type I error rate when a step-down partitioning multiple testing method is used. For this purpose, the true, theoretical distribution of the test statistics is compared with the analytically derived re-sampling distributions. These distributions are expressed as functionals of a generic multivariate distribution underlying the data.

KEY WORDS: Re-sampling, Multiple testing, Family-wise error rate, Two-sample problem.
1 Motivation and formulation of the problem
The most challenging problems in statistical hypothesis testing arise nowadays from the advance of genetic studies. In this paper we evaluate how well re-sampling based multiple testing methods work in this setting.
Our problem is precisely defined. We choose a type of genetic data, namely microarray gene expression measurements, and a type of multiple hypotheses to be tested, namely whether there is a difference between the mean expression levels of two groups of individuals, for the whole set of genes. The results are applicable to a larger category of problems. The evaluation of the re-sampling results is purely theoretical. We derive explicit expressions for the true distribution of the test statistic and for the re-sampling ones, as functionals of the underlying but unknown data distributions. In Section 2 we discuss the control of error rates in multiple testing (MT), and in Section 3 we explain the general MT method, the partitioning principle, its shortcutting step-down version, and the conditions for validity, in terms of error control, of such an approach.

The following remarks on the importance and practical applications of this problem also show its relevance for classification algorithms. The levels at which different genes are functional in various tissue types, that is, their expression levels, can be screened simultaneously using microarrays. This is done at the transcription stage, based on the abundance of mRNA in samples. Microarrays and microarray experiments have the following clinical uses.

• Disease prognosis: Gene expression levels measured by microarrays may be used to predict an individual's prognosis in a certain disease. For example, MammaPrint by Agendia provides prognoses for breast cancer patients based on expression levels in tumor samples measured by microarrays. Medical devices for diagnostics and prognostics are subject to regulatory approval by the Center for Devices and Radiological Health (CDRH) of the U.S. Food and Drug Administration (FDA).

• Pharmacogenomics: Pharmacogenomics is the co-development of a drug that targets a subpopulation of the patient population, together with a device that can be used to predict whether a patient is in this subpopulation of responders to the drug. An example of such a drug is Herceptin for HER-2/neu positive breast cancer patients. Pharmacogenomics is subject to approval by the Center for Drug Evaluation and Research (CDER) and by the Center for Devices and Radiological Health of the FDA.
Developing a clinical device for prognosis or pharmacogenomics is a two-stage process. The first stage is to find marker genes with which to train a prognostic algorithm, based on data with known disease outcomes. The second stage is a separate clinical trial validating the prediction algorithm in terms of sensitivity and specificity. In the first stage, genes are first screened for differential expression, as having too many genes can overwhelm the training of a classification algorithm. At this stage, expression levels are measured using microarrays that probe a large number of genes, perhaps the whole genome. For example, the first stage in the training of MammaPrint used microarrays that probe approximately 24,500 genes and ESTs. For the second stage, a clinical diagnostic/prognostic chip for gene profiling is built, containing the genes in the signature for disease prognosis. For example, the second stage in the validation of MammaPrint used microarrays probing 70 genes only. Multiple testing is used in the first stage, in selecting the genes with which to train a classification algorithm.
2 Error rates and error control in MT
In testing the multiple null hypotheses $H_{01}, \dots, H_{0k}$, the following types of error rate control are most commonly used:

1. The comparison-wise error rate is said to be controlled at $\alpha$ if each $H_{0i}$ is tested so that $P(\text{reject } H_{0i}) \le \alpha$ when $H_{0i}$ is true.

2. The experiment-wise error rate is said to be controlled at $\alpha$ if, when all $H_{0i}$ are true, $P(\text{reject at least one } H_{0i}) \le \alpha$. Control of the experiment-wise error rate is also called weak control of the family-wise error rate.

3. The family-wise error rate (FWER) is said to be controlled strongly at $\alpha$ if, regardless of which $\{H_{0i}, i \in I\}$, $I \subseteq \{1, \dots, k\}$, are true, $P(\text{reject at least one } H_{0i}, i \in I) \le \alpha$. Strong control of the family-wise error rate is also called control of the maximum Type I error rate.

4. The False Discovery Rate (FDR) is said to be controlled at $\alpha$ if the expected proportion of incorrectly rejected (true) null hypotheses is no more than $\alpha$:
$$E\left[\frac{\text{no. of true } H_{0i} \text{ rejected}}{\text{total no. of } H_{0i} \text{ rejected}}\right] \le \alpha.$$

A cautionary remark is necessary here: if a decision is going to be made from the results of all the tests, then one should test $H_{01}, \dots, H_{0k}$ in a way that guarantees
$$P(\text{an incorrect decision}) \le \alpha. \quad (1)$$
After formulating a family of null hypotheses $H_{01}, \dots, H_{0k}$, one should always perform a sanity check to make sure that control of whichever error rate is chosen at $\alpha$ translates to (1). For instance, if a correct decision depends on correct inference from all the tests, then controlling the comparison-wise or the experiment-wise error rate does not guarantee (1). Note also that, even if a multiple testing method controls the family-wise error rate or the false discovery rate, when the family of null hypotheses is not properly chosen one may not be able to guarantee control of the probability of an incorrect decision. For some clinical trial problems, it turns out that controlling the comparison-wise error rate controls the probability of an incorrect decision. One example of such a problem is average AUC and Cmax bioequivalence. Another example is finding the minimum effective dose using a partitioning method. However, for some other types of clinical trial problems, controlling the family-wise error rate strongly still does not control the probability of an incorrect decision if the null hypotheses are not properly formulated. An example of such a problem is finding the minimum effective dose using a closed testing method with equalities as the null hypotheses.
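To make the distinction between comparison-wise and family-wise control concrete, the following minimal simulation (ours, not part of the original development) tests $k$ true null hypotheses with independent uniform p-values: unadjusted level-$\alpha$ testing keeps each comparison-wise error at $\alpha$, yet rejects at least one true null with probability close to $1-(1-\alpha)^k$, while a Bonferroni adjustment keeps the family-wise rate at $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(0)
k, alpha, n_sim = 20, 0.05, 10000   # k true null hypotheses (complete null)

fwer_unadj, fwer_bonf = 0, 0
for _ in range(n_sim):
    p = rng.uniform(size=k)                  # p-values are U(0,1) under true nulls
    fwer_unadj += (p < alpha).any()          # comparison-wise control only
    fwer_bonf += (p < alpha / k).any()       # Bonferroni: family-wise control

print("unadjusted :", fwer_unadj / n_sim)    # ~ 1 - (1 - alpha)^k ~ 0.64
print("Bonferroni :", fwer_bonf / n_sim)     # <= alpha = 0.05
```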
3 Description of the MT method
Consider the problem of testing the statistical null hypotheses $H_{0i}: \theta \in \Theta_i$, $i \in I$, where the complements $\Theta_i^c$ of $\Theta_i$, $i \in I$, and their intersections correspond to useful scientific hypotheses to be proven.
3.1 Partitioning
The method we will use is known as the partitioning principle (Stefansson et al. (1988), Finner and Strassburger (2002)) and consists of the following:

1. Partition $\bigcup_{i \in I} \Theta_i$ into disjoint $\Theta_J^*$, $J \subseteq I$, as follows. For each $J \subseteq I$, let $\Theta_J^* = \bigcap_{i \in J} \Theta_i \cap \left(\bigcap_{j \notin J} \Theta_j^c\right)$. Then $\{\Theta_J^*, J \subseteq I\}$, including $\Theta_\emptyset^* = \bigcap_{j \in I} \Theta_j^c$, partition the parameter space. Note that $\Theta_J^*$ can be interpreted as the part of the parameter space in which exactly $H_{0i}$, $i \in J$, are true and $H_{0j}$, $j \notin J$, are false. There is thus an essential difference between Partitioning null hypotheses and Closed Testing null hypotheses.

2. Test each $H_{0J}^P: \theta \in \Theta_J^*$ at level $\alpha$. Since the null hypotheses are disjoint, at most one null hypothesis is true. Therefore, even though no multiplicity adjustment to the levels of the tests is made, the probability of rejecting at least one true null hypothesis is at most $\alpha$.

3. For all $J \subseteq I$, infer $\theta \notin \Theta_J^*$ if all $H_{0J'}^P$ such that $J \subseteq J'$ are rejected. That is, infer that the intersection null hypothesis $H_{0J}$ is false if all null hypotheses $H_{0J'}^P$ implying it are rejected. In terms of useful scientific inference of rejecting the original null hypotheses $H_{0i}$, $i \in I$: since $\Theta_i = \bigcup_{J \ni i} \Theta_J^*$, Partitioning rejects $H_{0i}$ if all $H_{0J}^P$ such that $J \ni i$ are rejected.
Suppose the desired inferences are $\theta_1 > 0$, or $\theta_2 > 0$, or both. With the alternative hypotheses being $H_{a1}: \theta_1 > 0$ and $H_{a2}: \theta_2 > 0$, the proper null hypotheses are $H_{01}: \theta_1 \le 0$ and $H_{02}: \theta_2 \le 0$. Partitioning forms the hypotheses
$$H_0^\cap: \theta_1 \le 0 \ \text{ and } \ \theta_2 \le 0$$
$$H_{01}^P: \theta_1 \le 0 \ \text{ and } \ \theta_2 > 0$$
$$H_{02}^P: \theta_1 > 0 \ \text{ and } \ \theta_2 \le 0$$
and tests each at level $\alpha$. Since the three hypotheses are disjoint, at most one null hypothesis is true. Therefore, even though no multiplicity adjustment to the levels of the tests is made, the probability of rejecting at least one true null hypothesis is at most $\alpha$.
Applying the Partitioning Principle, statistical inference proceeds as follows:

If $H_0^\cap: \theta_1 \le 0$ and $\theta_2 \le 0$ is accepted, no inference is given.

If only $H_0^\cap$ is rejected, but neither $H_{01}^P: \theta_1 \le 0$ and $\theta_2 > 0$ nor $H_{02}^P: \theta_1 > 0$ and $\theta_2 \le 0$, the inference is that at least one of $H_{a1}: \theta_1 > 0$ or $H_{a2}: \theta_2 > 0$ is true, but which one cannot be specified.

If $H_0^\cap$ and $H_{01}^P$ are rejected but not $H_{02}^P$, the inference is $\theta_1 > 0$.

If $H_0^\cap$ and $H_{02}^P$ are rejected but not $H_{01}^P$, the inference is $\theta_2 > 0$.

If $H_0^\cap$, $H_{01}^P$ and $H_{02}^P$ are all rejected, the inference is that both $\theta_1 > 0$ and $\theta_2 > 0$.

Applying the Partitioning Principle to non-equivariant families of tests, Hsu (1981, 1984) derived multiple comparisons with the best confidence sets, which imply Bechhofer's (1954) Indifference Zone and Gupta's (1956) Subset Selection inference. Stefansson, Kim, and Hsu (1988) and Hsu and Berger (1999) used the facts that Partitioning generates confidence sets and that Partitioning can always achieve the same testing inference as Closed Testing to derive confidence sets for some closed tests.
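As an illustration, the two-parameter example above can be coded directly. The sketch below is ours and assumes independent standard normal test statistics for $\theta_1$ and $\theta_2$, so the constants `c1` and `c2` are specific to that assumption.

```python
import numpy as np
from scipy.stats import norm

def partition_test(t1, t2, alpha=0.05):
    """Partition testing of H01: theta1 <= 0 and H02: theta2 <= 0,
    assuming independent standard normal test statistics t1, t2.

    The three disjoint partition hypotheses are each tested at level alpha:
      H_cap (theta1 <= 0 and theta2 <= 0): reject if max(t1, t2) > c2
      H01^P (theta1 <= 0 and theta2  > 0): reject if t1 > c1
      H02^P (theta1  > 0 and theta2 <= 0): reject if t2 > c1
    """
    c1 = norm.ppf(1 - alpha)             # level-alpha test of a single constraint
    c2 = norm.ppf(np.sqrt(1 - alpha))    # P(max(Z1, Z2) > c2) = alpha at the origin
    rej_cap = max(t1, t2) > c2
    inferences = []
    # theta_i > 0 is inferred iff every partition hypothesis whose region
    # intersects {theta_i <= 0} is rejected.
    if rej_cap and t1 > c1:
        inferences.append("theta1 > 0")
    if rej_cap and t2 > c1:
        inferences.append("theta2 > 0")
    return inferences or ["no inference"]

print(partition_test(2.8, 1.0))          # -> ['theta1 > 0']
```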
3.2 Step-down testing
The disadvantage of partition testing is the large number, $2^k - 1$, of disjoint hypotheses to be tested. A powerful solution to this problem is a testing method which "steps down" from the most statistically significant to the least statistically significant. It provides a shortcut for partition testing and can be described in a simple algorithmic way. Let $[1], \dots, [g]$ be the random indices such that $T_{[1]} < \dots < T_{[g]}$.

Step 1: If $T_{[g]} > c_{\{[1],\dots,[g]\}}$, then infer $\theta_{[g]} \ne 0$ and go to Step 2; else stop.

Step 2: If $T_{[g-1]} > c_{\{[1],\dots,[g-1]\}}$, then infer $\theta_{[g-1]} \ne 0$ and go to Step 3; else stop.

...
Step $g$: If $T_{[1]} > c_{\{[1]\}}$, then infer $\theta_{[1]} \ne 0$ and stop; else stop.

For example, suppose $g = 3$ and $T_1 < T_3 < T_2$. If $T_2 > c_{\{1,2,3\}}$ so that $H^*_{\{1,2,3\}}$ is rejected, then one does not need to test $H^*_{\{1,2\}}$ and $H^*_{\{2,3\}}$, because (by conditions S1-S3 below) the results would be rejections. A precise set of sufficient conditions for such shortcutting to be valid is as follows.

S0: A level-$\alpha$ test for $H_I^*$ should satisfy $\sup_{\theta \in \Theta_I} P_\theta\{\max_{i \in I} T_i > c_I\} \le \alpha$, where $\Theta_I = \{\theta: \theta_i = 0 \text{ for } i \in I \text{ and } \theta_j \ne 0 \text{ for } j \notin I\}$ (note that the supremum of this rejection probability may or may not occur at $\theta_1 = \dots = \theta_g = 0$).

S1: Tests for all hypotheses are based on statistics $T_i$, $i = 1, \dots, g$, whose values do not vary with $H_{0I}^*$;

S2: The level-$\alpha$ test for $H_{0I}^*$ is of the form of rejecting $H_{0I}^*$ if $\max_{i \in I} T_i > c_I$;
S3: Critical values $c_I$ have the property that if $J \subset I$ then $c_J \le c_I$.

Subtleties in conditions S1-S3 for shortcutting have not always been appreciated. For example, suppose the critical values $c_I$ are computed so that the probability of $\max_{i \in I} T_i > c_I$ is less than or equal to $\alpha$ when $H^*_{\{1,\dots,g\}}: \theta_1 = \dots = \theta_g = 0$ is true; then the test for $H_I^*$ may or may not be a level-$\alpha$ test. This is because the distribution of $T_i$ for $i \in I$ may depend on the values of $\theta_j$, $j \notin I$.

Proposition 1 Conditions S0-S3 guarantee that this undesirable phenomenon does not occur, thereby ensuring strong control of the family-wise error rate.

One can easily prove this by noting that short-cutting partition testing is possible because, in this paragraph writing $[1], [2], \dots$ for the indices ordered so that $T_{[1]}$ is the largest statistic, if $T_{[1]} > c_I$ for $[1] \in I$, then $\max_{i \in J} T_i > c_J$ for all $J \ni [1]$ with $|J| \le |I|$. For example, if $T_{[1]} < c_k$, then $H^*_{0\{1,\dots,k\}}$ cannot be rejected, so no individual $H_{0i}$ will be rejected; one should thus accept $H_{0[1]}$ and stop. If the largest max-statistic $T_{\max,[1]}$ is bigger than the largest critical value $c_k$, then every $H^*_{0J}$ with $J \ni [1]$ will be rejected, and thus $H_{0[1]}$ can be rejected. So move on to compare the second largest max-statistic $T_{\max,[2]}$ with the second largest critical value $c_{k-1}$. If $T_{\max,[2]} > c_{k-1}$, then all $H^*_{0J}$ with $J \ni [2]$ will be rejected: either $[1] \in J$, in which case $\max_{i \in J} T_i > c_k \ge c_J$, or $[1] \notin J$, in which case $|J| \le k-1$ and $\max_{i \in J} T_i \ge T_{\max,[2]} > c_{k-1} \ge c_J$. So $H_{0[2]}$ can be rejected, and we may continue.
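The shortcut itself is short in code. Below is a sketch (ours) of the step-down procedure; it assumes the critical values $c_I$ depend on $I$ only through $|I|$ and are supplied externally, e.g. from a re-sampling approximation of the $T_{\max}$ distribution as in Section 4. Monotonicity of the `crit` array encodes condition S3.

```python
import numpy as np

def step_down_maxT(T, crit):
    """Step-down shortcut of partition testing (valid under S0-S3).

    T    : length-g array of test statistics
    crit : length-g array with crit[j] = c_I for index sets I of size j + 1,
           nondecreasing in j (condition S3)
    Returns the indices i for which theta_i != 0 is inferred.
    """
    g = len(T)
    order = np.argsort(T)[::-1]          # indices of T[g], T[g-1], ... (decreasing)
    rejected = []
    for step, i in enumerate(order):
        c = crit[g - 1 - step]           # c_I for the g - step hypotheses in play
        if T[i] > c:
            rejected.append(i)           # infer theta_i != 0 and step down
        else:
            break                        # accept all remaining hypotheses, stop
    return rejected

# usage with critical values from, e.g., a re-sampled T_max distribution:
print(step_down_maxT(np.array([1.2, 3.9, 2.7]), crit=np.array([1.9, 2.2, 2.4])))
# -> [1, 2]
```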
4 Theoretical expressions of re-sampling distributions
We address the multiple hypothesis testing problem of comparing the means $\mu_X^\alpha$ and $\mu_Y^\alpha$ of two distributions ($F_X$ and $F_Y$) by using as test statistic the difference in sample means, $T^\alpha = \bar X^\alpha - \bar Y^\alpha$, with $\alpha = 1, 2, \dots, k$. Note that $F_X$ and $F_Y$ are multivariate distributions; let $k$ be their dimension. The parameters denoted $\theta_i$ in the first sections of this paper are thus $\mu_X^i - \mu_Y^i$, and $H_{0i}: \theta_i = 0$. We will use step-down partition testing, which guarantees FWER control provided the conditions for the shortcut to be valid are satisfied.

In most practical applications, the multivariate distributions $F_X$, $F_Y$, and thus $F_T$ and the distribution of $T_{\max}$, are not known. Therefore, the most popular way to solve the problem is to generate the latter by re-sampling. In order to evaluate the performance of several re-sampling methods, we first calculate the theoretical, true distributions as functionals of the underlying data distributions. Then we derive theoretical expressions for the distributions obtained by re-sampling and obtain the conditions for these methods to give valid inference. The results of the following methods, each implemented in the sketch below, will be compared with the ones given by the true distribution $F_T$ via the distribution of $T_{\max}$:

a1) re-sampling with replacement from sample X and sample Y ($P_{Bboot}$);

a2) re-sampling with replacement from the residuals $e_X = X - \bar X$ and $e_Y = Y - \bar Y$ (bootstrap of centered data, $P_{resBoot}$);

b1) permutation of the raw data ($P_{permut}$);

b2) re-sampling with replacement from the pooled sample of X and Y ($P_{pboot}$);

b3) re-sampling with replacement from the pooled residuals ($P_{respool}$).
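For concreteness, the following sketch (ours) generates the five re-sampling null distributions for a single coordinate of $T$; vectorizing over genes is straightforward.

```python
import numpy as np

rng = np.random.default_rng(1)

def resample_T(X, Y, scheme, B=2000):
    """Re-sampling distributions of T = Xbar - Ybar (one gene) under the
    five schemes a1, a2, b1, b2, b3 of Section 4; a sketch, not the
    paper's own code."""
    m, n = len(X), len(Y)
    eX, eY = X - X.mean(), Y - Y.mean()          # residuals
    Z, eZ = np.r_[X, Y], np.r_[eX, eY]           # pooled data / pooled residuals
    out = np.empty(B)
    for b in range(B):
        if scheme == "a1":      # bootstrap each sample (P_Bboot)
            xs, ys = rng.choice(X, m), rng.choice(Y, n)
        elif scheme == "a2":    # bootstrap each sample's residuals (P_resBoot)
            xs, ys = rng.choice(eX, m), rng.choice(eY, n)
        elif scheme == "b1":    # permute pooled raw data (P_permut)
            p = rng.permutation(Z)
            xs, ys = p[:m], p[m:]
        elif scheme == "b2":    # bootstrap pooled sample (P_pboot)
            xs, ys = rng.choice(Z, m), rng.choice(Z, n)
        elif scheme == "b3":    # bootstrap pooled residuals (P_respool)
            xs, ys = rng.choice(eZ, m), rng.choice(eZ, n)
        out[b] = xs.mean() - ys.mean()
    return out
```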
4.1 Joint test statistic distributions

4.1.1 Distributions in terms of cumulants
We first derive the cumulants of the theoretical and re-sampling distributions. Note that, for multivariate distributions, the first cumulant $k_1$ is a vector (equal to the mean vector), the second cumulant $k_2$ is a matrix (the variance-covariance matrix), and the higher ones $k_a$ are tensors of rank $a$.

Proposition 2 Let $X \sim F_X$, $Y \sim F_Y$, and assume that the cumulants of the general multivariate (dimension $k$) distributions $F_X$, $F_Y$ exist and are finite, at least up to some order $q$. Let $T = \bar X - \bar Y$. Then the cumulants of the true test statistic distribution $P_{theor}$ are:
$$k_a(T^{theor}) = m^{1-a} k_a(F_X) + (-1)^a n^{1-a} k_a(F_Y) \quad (2)$$
The proof has been given in our previous paper (Huang et al. (2006)).

For calculating the re-sampling distributions, one must take into account that only the empirical, discrete sample distributions are available in practice. The cumulants of a sample distribution are equal to the sample estimates of the cumulants of the true distribution. Consequently, they are distributed according to probability density functions which have the true cumulants as means. We introduce here a simplifying assumption: the sample sizes $m$ and $n$ of the samples of vectors X and Y are large. The order of our approximations is $O(m^{-1})$, $O(n^{-1})$, with $m$ and $n$ having the same convergence rate. For consistency, this will also be the order of validity of our asymptotic expansions for $F_T$, any higher negative powers of the sample sizes giving irrelevant correction terms. In this approximation, the cumulants of the re-sampling distributions are given by the following:

Theorem 1 Let $X \sim F_X$, $Y \sim F_Y$, and assume that the cumulants of the general multivariate (dimension $k$) distributions $F_X$, $F_Y$ exist and are finite, at least up to some order $q$. Let $T = \bar X - \bar Y$ and assume large sample sizes. Then: $k_a(P_{Bboot}) \approx k_a(P_{resBoot}) \approx k_a(P_{theor})$ and $k_a(P_{permut}) = k_a(P_{pboot}) \approx k_a(P_{respool})$, for any $a > 1$, and $k_1$ is the zero vector for all methods except a1) and the theoretical distribution. The cumulants for all methods are explicitly given as functionals of the original distributions as follows:
a1) re-sampling with replacement from sample X and sample Y ($P_{Bboot}$):
$$k_a(T^{Bboot}) = m^{1-a} k_a(F_X) + (-1)^a n^{1-a} k_a(F_Y) \quad (3)$$

a2) re-sampling with replacement, separately, from each group of residuals ($P_{resBoot}$):
$$k_a(T^{resBoot}) = \frac{k_a(F_X)}{m^{a-1}}\left[(1 - 1/m)^a + (-1)^a \frac{m-1}{m^a}\right] + (-1)^a \frac{k_a(F_Y)}{n^{a-1}}\left[(1 - 1/n)^a + (-1)^a \frac{n-1}{n^a}\right] \quad (4)$$

b1) permutations ($P_{permut}$):
$$k_a(T^{permut}) = \frac{k_a(F_X)}{m^{a-1}} + (-1)^a \frac{k_a(F_Y)}{n^{a-1}} - \left(k_a(F_X) - k_a(F_Y)\right)\frac{1/m^a - (-1)^a/n^a}{1/m + 1/n} \quad (5)$$

b2) re-sampling with replacement from the pooled sample ($P_{pboot}$):
$$k_a(T^{pboot}) = \left(\frac{1}{m^{a-1}} + (-1)^a \frac{1}{n^{a-1}}\right)\frac{m\,k_a(F_X) + n\,k_a(F_Y)}{m+n} \quad (6)$$

b3) re-sampling with replacement from the pooled residuals ($P_{respool}$):
$$k_a(T^{respool}) = \frac{1/m^{a-1} + (-1)^a/n^{a-1}}{m+n}\left[\frac{k_a(F_X)}{m^{a-1}}(m-1)\left((m-1)^{a-1} + (-1)^a\right) + \frac{k_a(F_Y)}{n^{a-1}}(n-1)\left((n-1)^{a-1} + (-1)^a\right)\right] \quad (7)$$
The proof is given in the Appendix. A few useful examples are listed in what follows:

(i) For $a = 1$:
$$k_1(T^{theor}) = \mu_1(T^{theor}) = k_1(F_X) - k_1(F_Y) \quad (8)$$
(equal to zero under the null hypotheses, e.g. for the components under test at a given step of a step-down procedure), and
$$k_1(T^{Bboot}) \approx k_1(F_X) - k_1(F_Y) \quad (9)$$
while
$$k_1(T^{pboot}) = k_1(T^{permut}) = k_1(T^{respool}) = k_1(T^{resBoot}) = 0 \quad (10)$$
for any $F_X$, $F_Y$.

(ii) For $a = 2$:
$$k_2(T^{Bboot}) \approx k_2(T^{theor}) = \frac{1}{m} k_2(F_X) + \frac{1}{n} k_2(F_Y) \quad (11)$$
$$k_2(T^{pboot}) = k_2(T^{permut}) = \frac{1}{n} k_2(F_X) + \frac{1}{m} k_2(F_Y) \quad (12)$$
$$k_2(T^{respool}) = k_2(T^{permut}) - \frac{k_2(F_X) + k_2(F_Y)}{mn} \quad (13)$$
$$k_2(T^{resBoot}) = k_2(T^{Bboot}) - \frac{k_2(F_X)}{m^2} - \frac{k_2(F_Y)}{n^2} \quad (14)$$

(iii) For $a = 3$:
$$k_3(T^{Bboot}) \approx k_3(T^{theor}) = \frac{1}{m^2} k_3(F_X) - \frac{1}{n^2} k_3(F_Y) \quad (15)$$
$$k_3(T^{pboot}) = k_3(T^{permut}) = \left(\frac{1}{m} - \frac{1}{n}\right)\left(\frac{k_3(F_X)}{n} + \frac{k_3(F_Y)}{m}\right) \quad (16)$$
$$k_3(T^{respool}) = \left(\frac{1}{m} - \frac{1}{n}\right)\left(\frac{k_3(F_X)}{n}\,\frac{(m-1)(m-2)}{m^2} + \frac{k_3(F_Y)}{m}\,\frac{(n-1)(n-2)}{n^2}\right) \quad (17)$$
$$k_3(T^{resBoot}) = \frac{k_3(F_X)}{m^2}\,\frac{(m-1)(m-2)}{m^2} - \frac{k_3(F_Y)}{n^2}\,\frac{(n-1)(n-2)}{n^2} \quad (18)$$
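The variance swap in (12) is worth seeing numerically. The sketch below (ours) permutes one simulated data set with unequal sample sizes and unequal variances: the permutation variance tracks $k_2(F_X)/n + k_2(F_Y)/m$ of (12), not the true variance (11) of $T$, agreement being up to the sampling error of the single observed sample.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, B = 50, 200, 20000
k2X, k2Y = 4.0, 1.0                              # true variances of F_X, F_Y
X = rng.normal(0.0, np.sqrt(k2X), m)             # one observed data set, H0 true
Y = rng.normal(0.0, np.sqrt(k2Y), n)
Z = np.r_[X, Y]

Tperm = np.empty(B)
for b in range(B):
    p = rng.permutation(Z)
    Tperm[b] = p[:m].mean() - p[m:].mean()

print("true var of T, eq. (11):", k2X / m + k2Y / n)   # 0.085
print("eq. (12) prediction    :", k2X / n + k2Y / m)   # 0.040
print("simulated permutation  :", Tperm.var())         # ~ 0.04
```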
4.1.2 Asymptotic expansions of the joint $F_T$
The pdf of the vector of test statistics $\mathbf T$ can be obtained by inverse Fourier transform from the characteristic function, which in turn is written in terms of cumulants:
$$P(\mathbf T) = \frac{1}{(2\pi)^g}\int e^{i\rho\mathbf T}\, e^{-i k_1\rho - \frac12 \rho k_2 \rho + \sum_a \frac{1}{a!}\sum{}' k_a^{n_1 n_2 \dots n_k}\rho_1^{n_1}\rho_2^{n_2}\cdots\rho_k^{n_k}}\, d^g\rho \quad (19)$$

where the notation $\sum'$ means the sum over all indices with $n_1 + n_2 + \dots + n_k = a$. By the change of variables $\mathbf T = k_1(T) + z\sqrt{k_2(T)}$ (here $\mathbf T$, $z$, $\rho$ are vectors; a fractional power of the matrix $k_2$ is defined in the usual way), we obtain:
$$P(\mathbf T) = \frac{1}{(2\pi)^g\sqrt{|k_2|}}\int e^{iz\rho}\, e^{-\rho^2/2}\, e^{\sum_a \frac{1}{a!}\sum{}' \tilde k_a^{n_1 n_2 \dots n_k}\rho_1^{n_1}\rho_2^{n_2}\cdots\rho_k^{n_k}}\, d^g\rho \quad (20)$$

where $\tilde k_a = k_a/(\sqrt{k_2})^a$, formally. Note that $\tilde k_a$ is of order $m^{-a/2+1}$, since $k_a$ is of order $m^{-a+1}$ for any $a$. By using Laplace's method, we write an asymptotic expansion of the integrand in (20), which contains terms of orders $m^b$ with $b = 0, -1/2, -1, -3/2, \dots$ (corresponding to $a = 2, 3, \dots$). We are interested in terms up to $m^{-1}$, in order to be consistent with the large sample approximation defined in the previous section. The lowest order ($b = 0$, from $a = 2$) approximation gives the usual multivariate normal distribution:
$$P(\mathbf T) = \frac{1}{\sqrt{|k_2|}}\,\phi(z)\,(1 + \text{higher order terms}) \quad (21)$$

where $\phi(z)$ is the $k$-dimensional Gaussian density with zero mean vector and identity variance-covariance matrix; i.e., the vector $\mathbf T$ is, to this order, multivariate normal with mean vector $k_1$ and variance-covariance matrix $k_2$. The next correction term, of order $O(m^{-1/2})$ ($b = -1/2$), can be written as:
$$\frac{1}{\sqrt{|k_2|}}\,\frac{e^{-z^2/2}}{\sqrt{(2\pi)^k}}\,\frac{1}{3!}\sum_{n_1+n_2+n_3=3}{}' \tilde k_3^{n_1 n_2 n_3}\, H^{I(3)}_{n_1 n_2 n_3}(z) \quad (22)$$

Here, a multidimensional Hermite polynomial $H^{\gamma(a)}_{n_1 n_2 \dots n_k}$ of order $a$, with general covariance matrix $\gamma$, is defined by:
$$H^{\gamma(a)}_{n_1 n_2 \dots n_k}(x_1,\dots,x_k) = (-1)^{\sum_{i=1}^k n_i}\, e^{\frac12 x\gamma x}\,\frac{\partial^{n_1}}{\partial x_1^{n_1}}\cdots\frac{\partial^{n_k}}{\partial x_k^{n_k}}\, e^{-\frac12 x\gamma x} \quad (23)$$

In general, higher order terms are of the type:
$$\sum_{n_1+n_2+\dots+n_k=a}{}'\left(\prod_{i=1}^k \frac{1}{n_i!}\right)\tilde k_a^{n_1,\dots,n_k}\, H^{(a)}_{n_1,\dots,n_k}(z) \quad (24)$$
The next term would be of order $b = -1$, and we do not include it at this stage. Note that the integration needed in the next step, for obtaining the distribution of the maximum of $T$, requires accurate approximations for the tails (large $z$) of the joint pdf. Thus, in general, when the original distributions are not normal, one should consider at least the correction term (22).
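For intuition, the sketch below (ours) evaluates a one-dimensional analogue of (21)-(22), in which the Hermite polynomial (23) reduces to $He_3(z) = z^3 - 3z$; the $O(m^{-1/2})$ skewness term visibly lifts the upper tail, which is exactly where the critical values of Section 4.2 are read off.

```python
import numpy as np
from scipy.stats import norm

def edgeworth_pdf(z, k3_tilde):
    """Normal density plus the O(m^{-1/2}) correction, a 1-d analogue of
    (21)-(22); He3 is the third (probabilists') Hermite polynomial,
    the one-dimensional case of definition (23)."""
    He3 = z**3 - 3.0 * z
    return norm.pdf(z) * (1.0 + k3_tilde / 6.0 * He3)

# skewed F_X, symmetric F_Y; cumulants of T from eqs. (11) and (15)
m = n = 25
k2T = 1.0 / m + 1.0 / n                  # with k2(F_X) = k2(F_Y) = 1
k3T = 2.0 / m**2 - 0.0 / n**2            # with k3(F_X) = 2, k3(F_Y) = 0
z = 2.5
print(norm.pdf(z), edgeworth_pdf(z, k3T / k2T**1.5))
# 0.0175 vs ~0.0209: the skewness term inflates the upper tail.
```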
4.2 $T_{\max}$ distributions and critical values
The distribution of $\max(T^i)$ for $i$ in a given set $I = \{1, \dots, K\}$ is obtained from the joint distribution of the components $T^i$, $i \in I$ (of dimension card$(I)$), which in turn is obtained by marginalizing the joint distribution of all components of the $T$-vector (dim $= k$). Let then a generic test statistic be denoted $\mathbf T = (T^1, T^2, \dots, T^K)$. We need to solve (for $c_I$) equations of the type:
$$P_{theoretical}(T_{\max} > c_I) = \alpha \quad (25)$$
or
$$P_{permutations}(T_{\max} > c_I') = \alpha \quad (26)$$
where the distribution of the maximum test statistic is given by:
$$P(T_{\max} = a) = \sum_{i=1}^{K} P\left(T'^{(i)}_l < a,\ l \ne i \,\middle|\, T^i = a\right) = \sum_{i=1}^{K}\int_{-\infty}^{a}\prod_{l \ne i} dT'_l\, P\left(\mathbf T'^{(i)} \,\middle|\, T^i = a\right) \quad (27)$$
where $P(T_{\max}) = P(\max_{i \in I} T^i)$ and we denote by $\mathbf T'^{(i)} = (T^1, \dots, T^{i-1}, T^{i+1}, \dots, T^K)$ the vector obtained from $\mathbf T$ by removing the component $i$. In order to obtain the critical values, we have to solve for $c_I$:
$$\alpha = \int_{c_I}^{\infty} P(T_{\max} = a)\, da \quad (28)$$
As a consequence of the asymptotic expansions in the previous section, in general the critical values depend on cumulants up to third order, provided they exist and are finite. If the sample sizes are equal, the second cumulants coincide for the two groups of methods, $k_2^{(a1,a2)} = k_2^{(b1,b2,b3)}$, and in the normal approximation the critical values for step-down testing are the same. This feature does not persist when looking at higher order expansions. The fourth order cumulants will also be the same, but the third differ drastically, even for equal sample sizes ($k_3^{permut} = 0$, while $k_3$ of the other methods is $\ne 0$). In conclusion, the critical values will not be the same when using one group of methods or the other, for general data distributions.
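In practice, (25)-(26) are solved by taking empirical quantiles of the re-sampled $T_{\max}$ distribution. The sketch below (ours) does this for the permutation scheme b1 and the nested index sets used by step-down testing; the other schemes change only the re-sampling line.

```python
import numpy as np

rng = np.random.default_rng(3)

def step_down_critical_values(X, Y, alpha=0.05, B=5000):
    """Empirical solution of (26): c_I is the (1 - alpha) quantile of the
    re-sampled max_{i in I} T_i, computed for the nested sets used by
    step-down testing.  X: (m, k) and Y: (n, k) expression matrices."""
    m, k = X.shape
    Z = np.vstack([X, Y])
    Tstar = np.empty((B, k))
    for b in range(B):
        p = rng.permutation(Z)                       # permute pooled raw data (b1)
        Tstar[b] = p[:m].mean(axis=0) - p[m:].mean(axis=0)
    T = X.mean(axis=0) - Y.mean(axis=0)              # observed statistics
    order = np.argsort(T)[::-1]                      # decreasing observed T
    crit = np.empty(k)
    for step in range(k):
        I = order[step:]                             # hypotheses still in play
        crit[step] = np.quantile(Tstar[:, I].max(axis=1), 1 - alpha)
    # crit is nonincreasing in `step` (a max over fewer columns), so S3 holds.
    return T, order, crit
```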
5 Discussion and conclusions
All conditions S0-S3 are satisfied, for normal distributions $F_X$, $F_Y$ or in the normal approximation, by almost all of the methods discussed. The only exception is the simple $P_{Bboot}$ (bootstrap without post-centering), which does not have the right mean vector $k_1(T^{Bboot})$. This is why a post-centering solution is recommended in Pollard and van der Laan (2005, 2003). The differences in higher cumulants induced by this process are small, i.e., of the same order as the approximation errors, for large sample sizes. For normal distributions, the pdf of $\max_{i \in I} T_i$ does not depend on the components $j \notin I$ of $k_1(F_X) - k_1(F_Y)$. However, the critical values generated by all these methods will not coincide unless $F_X = F_Y$ or $m = n$, due to the differences between the variance-covariance matrices of the test statistic under the different methods of re-sampling.

For the higher order approximations which are necessary when the original distributions $F_X$, $F_Y$ are not multivariate normal, a new effect occurs: when integrating the joint distribution to obtain the distribution over some subset $I$, the resulting pdf depends on the components of the mean vector which do not belong to $I$, due to the higher order terms in the pdf expansion. This no longer allows condition S0 to be satisfied. In addition, the odd order cumulants ($k_3$, for example) are not equal for the categories a) and b) of methods (in Theorem 1), even for equal sample sizes.
Appendix B: Proof of Theorem 1

We give a short proof of Theorem 1. We have already shown in Huang et al. (2006) that the cumulants of the permutation distribution satisfy equation (5). The expressions for $k_a(T^{Bboot})$ are obtained exactly as in deriving the cumulants of the theoretical distribution, and they are the same up to small-sample corrections. We use again two elementary properties of cumulants, namely:

1. the cumulant (of any order) of a sum of independently distributed random vectors is the sum of the individual cumulants;

2. $k_a(cX) = c^a k_a(X)$, for any constant $c \in \mathbb R$.

When re-sampling with replacement from the pooled sample and calculating the cumulants of the difference in sample mean vectors, we obtain:
$$k_a(T^{pboot}) = \left(\frac{1}{m^{a-1}} + (-1)^a \frac{1}{n^{a-1}}\right) k_a(Z) \quad (29)$$
where $k_a(Z)$ are the cumulants of the distribution of the pooled sample $Z = \{X_1, \dots, X_m, Y_1, \dots, Y_n\}$:
$$k_a(Z) = \frac{m\,k_a(F_X) + n\,k_a(F_Y)}{m+n} \quad (30)$$
which implies (6).

We also verify here that the "pooled-bootstrap" distribution of the test statistic is equal to the permutation distribution:
$$k_a(T^{pboot}) - k_a(T^{permut})$$
$$= \left(\frac1n + \frac1m\right)^{-1}\left[\frac{1}{n m^{a-1}} + \frac{(-1)^a}{n^a} + \frac{1}{m^a} - \frac{(-1)^a}{n^a} - \frac{1}{m^a} - \frac{1}{n m^{a-1}}\right] k_a(F_X)$$
$$+ \left(\frac1n + \frac1m\right)^{-1}\left[\frac{(-1)^a}{m n^{a-1}} + \frac{1}{m^a} + \frac{(-1)^a}{n^a} - \frac{1}{m^a} - \frac{(-1)^a}{n^a} - \frac{(-1)^a}{m n^{a-1}}\right] k_a(F_Y) = 0 \quad (31)$$

As for the residuals' re-sampling methods, one needs first to write the cumulants of the residuals:
$$k_a(e_X) = \left[(1 - 1/m)^a + (-1)^a \frac{m-1}{m^a}\right] k_a(F_X) \quad (32)$$
and
$$k_a(e_Y) = \left[(1 - 1/n)^a + (-1)^a \frac{n-1}{n^a}\right] k_a(F_Y) \quad (33)$$
Applying then calculations similar to the previous cases (bootstrap, permutations), the cumulants of the residuals' re-sampling distributions are calculated for the pooled version (with a similar $e_Z$) and for the separate version, with or without replacement.
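As a quick numerical sanity check (ours, not part of the proof), the $a = 2$ case of (32) can be verified by simulation: the variance of a residual $X_i - \bar X$ equals $[(1 - 1/m)^2 + (m-1)/m^2]\,k_2(F_X) = (1 - 1/m)\,k_2(F_X)$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, k2FX, reps = 8, 9.0, 200000

X = rng.normal(0.0, np.sqrt(k2FX), size=(reps, m))  # `reps` independent samples
e1 = X[:, 0] - X.mean(axis=1)                       # residual of the first point

lhs = e1.var()                                      # simulated k2(e_X)
rhs = ((1 - 1/m)**2 + (m - 1) / m**2) * k2FX        # eq. (32) with a = 2
print(lhs, rhs, (1 - 1/m) * k2FX)                   # all approximately 7.875
```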
References

Finner, H. and Strassburger, K. (2002). The partitioning principle: a powerful tool in multiple decision theory. Annals of Statistics, 30: 1194-1213.

Hsu, J. C. (1996). Multiple Comparisons: Theory and Methods. Chapman & Hall.

Huang, Y., Xu, H., Calian, V., and Hsu, J. C. (2006). To permute or not to permute. Bioinformatics, 22(18): 2244-2248.

Marcus, R., Peritz, E., and Gabriel, K. R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63: 655-660.

Pollard, K. S. and van der Laan, M. (2003). Multiple testing for gene expression data: an investigation of null distributions with consequences for the permutation test. In Proceedings of the 2003 International MultiConference in Computer Science and Engineering, pages 3-9. METMBS'03 Conference.

Pollard, K. S. and van der Laan, M. (2005). Re-sampling-based multiple testing: asymptotic control of Type I error and applications to gene expression data. Journal of Statistical Planning and Inference, 125: 85-100.

Stefansson, G., Kim, W., and Hsu, J. C. (1988). On confidence sets in multiple comparisons. In Gupta, S. S. and Berger, J. O. (eds), Statistical Decision Theory and Related Topics IV, 2: 89-104. Springer-Verlag, New York.

van 't Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415: 530-536.