Stat Comput DOI 10.1007/s11222-013-9412-6
Permutation tests for between-unit fixed effects in multivariate generalized linear mixed models Livio Finos · Dario Basso
Received: 26 August 2012 / Accepted: 18 June 2013 © Springer Science+Business Media New York 2013
Abstract A permutation testing approach in multivariate mixed models is presented. The solutions proposed allow for testing between-unit effect; they are exact under some assumptions, while approximated in the more general case. The classes of models comprised by this approach include generalized linear models, vector generalized additive models and other nonparametric models based on smoothing. Moreover it does not assume observations of different units to have the same distribution. The extensions to a multivariate framework are presented and discussed. The proposed multivariate tests exploit the dependence among variables, hence increasing the power with respect to other standard solutions (e.g. Bonferroni correction) which combine many univariate tests in an overall one. Examples are given of two applications to real data from psychological and ecological studies; a simulation study provides some insight into the unbiasedness of the tests and their power. The methods were implemented in the R package flip, freely available on CRAN. Keywords Permutation tests · Mixed models · Multivariate inference
L. Finos (B) · D. Basso, Department of Statistical Sciences, University of Padua, 35100 Padova, Italy

1 Introduction
The Linear Mixed Models (LMM) and Generalized LMM (GLMM) extend the linear and generalized linear models respectively, since they account for within-unit dependence
(see Breslow and Clayton 1993 for a comprehensive review of this literature). Such models assume that, conditional on a realized vector of unit-specific parameters, the responses within a given unit follow a generalized linear model. The models further assume that the unit-specific parameters are random, typically multivariate normal with an unknown mean vector and covariance matrix. Such an approach is very useful in many experimental fields, for example when units are measured many times, possibly under different conditions.
The most widely used inferential approach is maximum likelihood (ML). Despite its flexibility, several issues remain open. Due to the presence of nuisance parameters (describing the dependence within trials of the same unit), the inference cannot be exact and only approximate tests can be defined. The quality of this approximation for small sample sizes is not always assessed and can be very poor (especially for binomial responses). Although relevant results have been reached on estimation methods (e.g. Neuhaus and Segal 1997; Zhu et al. 2005), the problem remains open.
Furthermore, the main theoretical corpus has been developed under a univariate framework. When the perspective is multivariate, separate models can be tied together into a multivariate mixed model by specifying a joint distribution for their random effects. A number of papers have appeared using this approach (see Shah et al. 1997; Zhu et al. 2005 among others). But in large multivariate problems, as in omics and neuroimaging experiments, a computational problem arises when the dependence among responses has to be accounted for. The pairwise modeling approach proposed by Fieuws et al. (2007) offers an elegant solution; however, the most favorable solution remains a matter of debate.
A limitation of this multivariate approach relates to the restriction to homogeneous multivariate responses—indeed responses on the same scale are usually expected and longitudinal observations are the most typical example. Anderson (2001) proposes a solution for responses having different scales. The test is based on observation distances—based on the (possibly standardized) coordinates of the multivariate problem—with a permutation strategy used for inferences. This method covers a very broad range of experimental designs. However, it is restricted to multivariate LMM and does not apply to multivariate GLMM. In this work, we propose an inferential approach that overcomes many of the issues discussed. A solution for testing within-unit effects in a multivariate setting has been previously described in Basso and Finos (2012). Here we solve the more methodologically appealing problem by restricting our attention to between-unit effects. The present method enables us to deal with random effects when GLM models are used. We also discuss the multivariate extension to vector GLM (VGLM)—which comprises multinomial distributions, vector generalized additive models (VGAM; Yee and Wild 1996) and other nonparametric smoothing models—and the conditions under which the inference is possible. We make use of two datasets to lead the explanation of the method. The first dataset was published by Matozzo et al. (2012). In this work, we limit our interest to the formal aspects of the model rather than to the ecological interpretation of the results. The experiment compares Mediterranean Mussel (Mytilus galloprovincialis) under different living conditions, and measures three biomarkers, namely: total haemocyte count (THC), capability of haemocytes to take up the vital dye Neutral Red (NRU) and LYSozyme-like activity in cell-free haemolymph (LYS)—i.e. a three-variate response—to assess their health. The mussels are grown in different tanks and five mussels are measured in each tank. 
The experimental design considers three levels of pH (7.4, 7.7 and 8.1) and two temperatures (22 and 28 °C) with three tanks for each crossed level (i.e. it is a balanced 3 × 2 full factorial design with three replications). (With respect to the original data, we only consider tanks with salinities of 34 PSU.) In order to account for correlation among trials within the same tank (i.e. unit), one could perform an LMM for each of the three biomarkers. A valid model for a generic biomarker is:
$$\mathrm{BioMarker} = 1 + \mathrm{Temperature} + \mathrm{pH} + \mathrm{Temperature{:}pH} + (1 \mid \mathrm{Tank}) + \varepsilon \tag{1}$$
meaning that a common intercept, two main effects and an interaction are considered. There are two independent random terms: a tank-specific intercept and a mussel-specific error term (i.e. ε). The former is the tank-specific random effect and is assumed to be drawn from a random variable with zero mean and unknown (but fixed) variance, while the latter is the usual i.i.d. error term. The tank-specific effect means that the observations within the same tank are correlated. Therefore a standard linear model that does not consider this dependence among errors is not a valid model.
The most widely used approach here is ML estimation followed by a likelihood ratio test (separately for each biomarker). The unknown variance of the random effect is estimated, typically under the assumption of normality for all error terms, and used in the model. Conditional on the realization of the random effects, the observations turn out to be independent of each other and the ML estimates (and the residual variance) can be obtained without bias.
An alternative and "naive" approach, which is very common in ecology, neuroscience, biomedicine and other applied fields, is the two-level analysis (Moscatelli et al. 2012). It summarizes the observations within each unit with their means and then performs a standard two-way ANOVA with three observations in each crossed level. The rationale behind this is that, conditional on and within each tank, the observations are independent. The mean of the observations within the same unit will estimate the true realized value of the tank random effect with a precision which is a function of the number of observations within the unit and the variance of ε. We will give more details in the next section; for now we just note that the approach provides valid inferences as long as one can assume: (i) an equal number of mussels within each tank; and (ii) equality of distribution of responses between the mussels from different tanks. In any other case, the results cannot be trusted and a different approach is needed. In this work we formalize and extend this naive approach, relaxing assumptions (i) and (ii).
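The first step of the two-level analysis described above, collapsing each tank to the mean of its mussels, can be sketched as follows. This is only an illustration in Python: the tank identifiers and measurements are made-up numbers, and the function name `unit_means` is hypothetical, not part of the flip package.

```python
from statistics import mean

# Hypothetical data: tank id -> (temperature, pH, biomarker values of its mussels).
# The numbers are invented for illustration only.
tanks = {
    "T1": (22, 7.4, [1.2, 1.5, 1.1, 1.4, 1.3]),
    "T2": (22, 7.4, [1.0, 1.1, 0.9, 1.2, 0.8]),
    "T3": (28, 7.4, [2.0, 2.2, 1.9, 2.1, 1.8]),
}

def unit_means(tanks):
    """Collapse each tank (unit) to its between-unit covariates and mean response."""
    return {
        tank: (temperature, ph, mean(values))
        for tank, (temperature, ph, values) in tanks.items()
    }

# One summary value per tank; these become the responses of the second-level analysis.
summary = unit_means(tanks)
```

The second level (a two-way ANOVA or a permutation test on `summary`) then treats each tank as a single observation.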
This approach does not have an immediate application when within-unit effects are considered in the design in addition to the intercept. To explain how to extend this approach, we use the data collected by Di Giorgio et al. (2012) in the experiment Visual Search for Faces among Objects in Complex Visual Displays. Twelve adults and nineteen three-month-old infants were recorded under two possible viewing conditions (one face among four/six objects). Each participant was presented with randomly selected images, approximately half for each viewing condition. The response variable FirstFix is a dichotomous variable that records whether the face is fixated at first sight or not. While the adults had 32 observations each (16 for each condition), the infants had a number of trials varying from 6 to 15 because they grew tired of the experiment. The aim of the study was to compare the two samples (i.e. Age, a between-subject factor) in the two viewing conditions (i.e. Display, a within-subject factor). Therefore a valid GLMM describes the logit of the probability of FirstFix as a function of the between-subject factor Age, the within-subject factor Display and the interaction Age:Display as follows:
$$\operatorname{logit}\Pr(\mathrm{FirstFix}) = 1 + \mathrm{Age} + \mathrm{Display} + \mathrm{Age{:}Display} + (1 + \mathrm{Display} \mid \mathrm{Subject}) \tag{2}$$
In Sect. 2 we give a more formal definition of these models, extend the two-level approach to account for within-unit effects (as in the example above) and formalize it. In Sect. 3 we discuss the testing strategy and make further considerations, while in Sect. 4 we deal with the multivariate extension. Section 5 presents some simulations and Sect. 6 gives final remarks.
2 The model
The heuristic of this methodological work is to move from the observation level to the unit level: a model for each unit is estimated and the coefficients are taken as the new multivariate output. This set of new responses is tested for between-unit effects. To formalize this idea, let us first consider the case of a univariate response. For each unit $i$ (Tank in our ecological example, Subject in the psychological one) we have $n_i$ independent observations conditional on unit, for which we have an $n_i$-vector response $y_i$ and an $n_i \times p$ matrix of covariates $X_i$. Assuming a GLM we write:
$$E(y_i) = g^{-1}(X_i \beta_i), \qquad i = 1, \dots, N \tag{3}$$
($N$ being the number of units) for a $p$-vector $\beta_i$, a canonical link function $g$ and dispersion parameter $\phi_i$. When $g$ is a canonical link function, the observed Fisher information of $\beta_i$ is $I = -\partial^2 \ell_i / (\partial\beta_i \, \partial\beta_i^T)$ ($\ell_i$ being the log likelihood function), which is a function of the dispersion parameter $\phi_i$ and $\beta_i$ (see McCullagh and Nelder 1989 for more details). Therefore $\hat\beta_i$ is a r.v. with finite:
$$E(\hat\beta_i \mid \beta_i) = \beta_i \quad \text{and} \quad \operatorname{Var}(\hat\beta_i \mid \beta_i) = I^{-1} = \phi_i \left(X_i^T W_i X_i\right)^{-1} = \Sigma_i.$$
Matrix $X_i$ codes within-unit covariates. With respect to the ecological example above, $X_i = 1_{n_i}$ since no within-unit covariate is considered, and the link function $g$ is the identity. In the psychological example, a within-subject factor is considered, so $X_i$ can be modeled with a constant (i.e. the intercept with coefficient $\beta_{intercept}$) and a two-level factor indicating which display is shown (with coefficient $\beta_{slope}$). The link function $g$ is the logit. Note that the conditional estimators of the fixed effects in each unit are unbiased and are assumed to be independent across units. We will exploit this result later in the inference phase.
It is worth noting that the coefficients of within-unit covariates are estimated conditional on the values of between-unit covariates. In the psychological example, the true $\beta_{intercept}$ and $\beta_{slope}$ change depending on whether the unit is an adult or an infant. We then model these vectors of within-unit coefficients $\beta_i$ through vectors of between-unit coefficients $\Gamma$. Consider the $N \times p$ matrix ($N$ being the number of units) of the true coefficients of all units, $B = [\beta_1^T, \beta_2^T, \dots, \beta_N^T]$. Let us model it with between-unit covariates:
$$B = Z\Gamma + u$$
where $Z$ is an $N \times q$ matrix of predictors and $\Gamma$ is the $q \times p$ matrix of the between-unit effect parameters, $u = [u_1^T; \dots; u_N^T]$ is an $N \times p$ random matrix with independent rows (i.e. among units) satisfying $E[u_i] = 0_p$ and $\operatorname{Var}(u_i) = \Sigma_u$, representing the variance of the random effect.
Let us consider the ecological example and restrict the attention to a given biomarker (i.e. a univariate response). The matrix $Z$ can be set as a matrix of contrasts including the intercept:
$$Z = \begin{bmatrix}
1_3 & 1_3 & 1_3 & 0_3 & 1_3 & 0_3 \\
1_3 & 1_3 & -1_3 & 1_3 & -1_3 & 1_3 \\
1_3 & 1_3 & 0_3 & -1_3 & 0_3 & -1_3 \\
1_3 & -1_3 & 1_3 & 0_3 & -1_3 & 0_3 \\
1_3 & -1_3 & -1_3 & 1_3 & 1_3 & -1_3 \\
1_3 & -1_3 & 0_3 & -1_3 & 0_3 & 1_3
\end{bmatrix} \tag{4}$$
and $\Gamma$ is a $6 \times 1$ matrix of coefficients. The first column of $Z$ represents the intercept, the second column models the Temperature, the third and fourth model the three levels of pH and the fifth and sixth model the interaction. For the psychological example $Z$ is:
$$Z = \begin{bmatrix} 1_{12} & 0_{12} \\ 1_{19} & 1_{19} \end{bmatrix} \tag{5}$$
(12 and 19 being the number of adults and infants in the sample) and $\Gamma$ is a $2 \times 2$ matrix, the first column connecting $Z$ with the average effect within subjects, the second linking $Z$ with the slope between the two levels of Display (within each subject).
Furthermore, the random effect component is assumed to be independent of the intra-unit error component, therefore we may write the whole model for the estimates of $B$ as:
$$\hat{B} = Z\Gamma + (u + E) = Z\Gamma + E_u, \tag{6}$$
where $E = [\varepsilon_1^T, \varepsilon_2^T, \dots, \varepsilon_N^T]$ is a random matrix with independent rows satisfying $E(\varepsilon_i) = 0_p$ and $\operatorname{Var}(\varepsilon_i) = \Sigma_i$. Roughly speaking, the matrix $u$ models the random effects, and $E$ models the variability of the estimators that are obtained conditionally and therefore reflects the variability within each unit. Then, unconditionally, the rows of $\hat{B}$ satisfy
$$E(\hat\beta_i) = \beta_i \quad \text{and} \quad \operatorname{Var}(\hat\beta_i) = \Sigma_u + \Sigma_i = \Sigma_{u+i}, \qquad i = 1, \dots, N.$$
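To make the move from the observation level to the unit level concrete, the sketch below computes the within-subject coefficients for the psychological example and builds the between-unit design of (5). It is an illustration under stated assumptions, not the authors' implementation: with a single two-level factor the within-unit logistic fit is saturated, so the intercept and slope have a closed form; the 0.5 added to every cell count is a continuity correction introduced here only to keep the logits finite, and the function names and trial data are hypothetical.

```python
import math

def unit_coefficients(trials):
    """Within-subject logistic coefficients for one two-level factor.

    trials: list of (display, first_fix) pairs, both coded 0/1.
    Returns (intercept, slope) on the logit scale: the intercept is the logit
    of the success proportion at display 0, the slope the difference of logits.
    ASSUMPTION: 0.5 is added to each cell count so the logits stay finite.
    """
    counts = {0: [0.5, 0.5], 1: [0.5, 0.5]}  # display -> [failures, successes]
    for display, first_fix in trials:
        counts[display][first_fix] += 1
    logits = {d: math.log(c[1] / c[0]) for d, c in counts.items()}
    return logits[0], logits[1] - logits[0]

def between_design(is_infant):
    """Rows of Z as in (5): an intercept and an infant indicator."""
    return [[1, 1 if flag else 0] for flag in is_infant]

# Hypothetical trials of one subject: (display, first_fix).
subject_trials = [(0, 1), (0, 1), (0, 0), (1, 1), (1, 0), (1, 0)]
b_intercept, b_slope = unit_coefficients(subject_trials)

# 12 adults followed by 19 infants, as in the psychological example.
Z = between_design([False] * 12 + [True] * 19)
```

Stacking the pairs `(b_intercept, b_slope)` of all subjects row by row gives the matrix of estimates that plays the role of the response in (6).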
The model described here is very general and comprises a wide range of possible specifications. First of all it allows
predictors $X_i$ and $Z$ to be of any nature (e.g. continuous or categorical) as in any other generalized linear model. It is also easy to extend the univariate ecological model (one biomarker per model) to a multivariate one considering all $m$ biomarkers ($m = 3$ in this case) at once, with possibly many within-unit covariates. $B$ will be an $N \times pm$ matrix, $\Gamma$ a $q \times pm$ matrix and $\Sigma_i = \Phi_i \otimes (X_i^T X_i)^{-1}$ is the covariance matrix ($\otimes$ represents the Kronecker product and $\Phi_i$ is the multivariate dispersion parameter, usually the covariance matrix of the residuals; see also Mardia et al. 1992). To a certain extent, it is also possible to jointly model responses of different natures (e.g. suppose one biomarker has a binomial outcome). In particular, when no within-unit covariates are considered, as in the ecological data, the matrix $W_i$ is a diagonal matrix with constant elements within each unit, therefore $\Sigma_i$ is the covariance of the observed data within each unit $i$ divided by $n_i$. Finally, this approach also makes it simple to extend VGAM to Vector Generalized Additive Mixed Models (VGAMM), VGLM to Vector Generalized Linear Mixed Models (VGLMM) and any other smoothing model. The only requirement is to provide estimates of the parameters conditional on each unit.
3 Hypothesis testing
Now we discuss how to make inference on the elements of the matrix $\Gamma$. Let us suppose we want to test the interaction term in the psychological model (2), i.e. we want to test whether the difference between the two levels of Display (i.e. $\beta_{slope}$) is higher or lower in infants than in adults (i.e. interaction Age:Display). $\gamma_{12}$, the first element of the second column in $\Gamma$, models the slope in adults, while $\gamma_{22}$, the second row, second column of $\Gamma$, describes how much larger or smaller the slope is in infants. Therefore, only the latter is of interest when the interaction is under test. Notice that $\gamma_{11}$ and $\gamma_{21}$ (i.e. the first column of $\Gamma$) describe the average level in adults and the difference between infants and adults, respectively. In this case we can restrict our inference to the second column of $\Gamma$, still accounting for the nuisance parameters (i.e. parameters not under test).
More formally, the hypothesis of interest here is whether $q_1$ coefficients in $\Gamma$ are null (let us collect them in $\Gamma_1$), allowing the other $q_0$ coefficients (collected in $\Gamma_0$) to be non-null. Therefore we have $\Gamma = [\Gamma_0; \Gamma_1]$ (row-wise bound matrices) and a null hypothesis:
$$H_0 : \Gamma_1 = 0. \tag{7}$$
The alternative is $H_1 : \Gamma_1 \neq 0, \ \forall\, \Gamma_0$, with some coefficients possibly restricted to a directional alternative. When testing for interaction Age:Display, we restrict our attention to the second column of $\hat{B}$ and set $\Gamma_0 = \gamma_{12}$ and $\Gamma_1 = \gamma_{22}$. Here, $\Gamma_0 = \gamma_{12}$ plays the role of nuisance parameter. However it is worth noting that it could be used to test the within-unit effect of Display, hence testing whether $\gamma_{12} := E(\beta_{slope}) = 0$. As mentioned in Sect. 1, that problem is solved, within this framework, in Basso and Finos (2012), while here we deal with the between-unit testing problem.
Let us suppose now that in model (2) we are interested in testing whether the two groups are different in any of the two levels of the factor Display (i.e. Age or Age:Display are not null). With reference to matrix (5), $\gamma_{21}$ models the difference in $\beta_{intercept}$ between the two groups (i.e. the effect of Age), while $\gamma_{22}$ models the difference in $\beta_{slope}$ (i.e. the effect of interaction Age:Display). Therefore, $\Gamma_0 = [\gamma_{11}, \gamma_{12}]$ and $\Gamma_1 = [\gamma_{21}, \gamma_{22}]$ are now $1 \times 2$ vectors (the first and the second row of $\Gamma$, respectively). In this case, the null hypothesis becomes multivariate since $\Gamma_1$ is a matrix (i.e. has more than one column).
A second explanatory example of a multivariate test comes from the ecological model. When testing for an overall effect of factors on the three biomarkers, a MANOVA-like test is usually required. Although the extension of the meaning of the null hypothesis to the multivariate framework is straightforward, the inferential procedure deserves some caution, as will be discussed in Sect. 4. The rest of this section is devoted to the univariate (i.e. $\hat{B}$ and $\Gamma$ are vectors) inferential approach.

3.1 Conditions for an exact solution
As already stated, the observed $\hat\beta_i$ have unit-specific variances $\Sigma_{u+i}$ (which are scalars under the hypotheses tested in this section). This is the main reason why, usually, the naive permutation test cannot be used. However, this becomes possible when some assumptions hold. The permutation test does not require the observations to be i.i.d. (and hence $\Sigma_{u+i}$ need not be the same for each $\hat\beta_i$), although it requires them to be exchangeable.
Therefore if (under the null hypothesis) the process that determines the variability $\Sigma_{u+i}$ (and, more generally, the whole distribution) is the same for each $\hat\beta_i$, we obtain an exact permutation test (see Pesarin 2001). This is a considerably lighter condition. For example, in the ecological example it can happen that not all the mussels provide valid measurements, therefore the number of valid observations differs within each tank (i.e. unit) and is itself a random variable. Therefore, the estimated $\hat\beta_i$ have different precisions, i.e. different $\Sigma_{u+i}$. However, under $H_0$, the observed coefficients are generated by the same random process in each tank and this ensures the exactness of the test. In general, we reach this condition if in each unit (i.e. tank): (1) the design matrix, or the process that generates it, is the same for each tank; (2) the observations (e.g. of the mussels) have possibly different distributions but these distributions are determined by the same process in each tank; (3) missing observations occur with the same probability within each unit.
Let us note that this approach does not assume any parametric family of distributions for the errors. Moreover, it does not require the error $u_i$ of unit $i$ to be independent of the errors ε. For example, it is often realistic to assume that a high value of the tank-related random effect will also affect the behaviour (e.g. the variability) of the mussels in that tank. This possibility is automatically taken into account in this approach. More formally, the experimental setting enables us to assume exchangeability among units under the null hypothesis. Since the ecological data are collected under an experimental setting, there are no strong reasons to reject these assumptions.
Let us suppose we want to test the null hypothesis that none of the covariates is related to the response. $Z_1$ includes all the columns of matrix (4) but the first (which defines the one-column matrix $Z_0$). Under $H_0$, all pools of mussels in each tank have the same distribution, hence they are exchangeable. To perform a permutation test one needs (1) to compute a test statistic (e.g. the F statistic or any other quantity related to the estimates of the size of the effects) for the observed data; (2) to randomly shuffle the rows of $Z_1$ and compute the same test statistic again; (3) to repeat step 2 a large number of times (e.g. 1,000), which gives the conditional and exact null distribution of the test statistic under the null hypothesis. The p-value is simply the proportion of permutations for which the statistic is larger than or equal to the statistic computed on the observed data.
When covariates are involved (i.e. $Z_0$ is neither null nor a constant term), the observations are not exchangeable any more, even when the null hypothesis is true. This is because the rows of $Z_0\Gamma_0$, which model the mean of $\hat{B}$, are not constant.
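The three permutation steps described above (for the case in which $Z_0$ is just the intercept) can be sketched as follows. This is a minimal Python illustration, not the flip implementation: $Z_1$ is reduced to a single 0/1 group label per unit, the statistic is an absolute difference of group means chosen here for simplicity, and the observed arrangement is counted among the permutations so that the p-value is never zero.

```python
import random

def mean_difference(values, labels):
    """Absolute difference between the group means (labels coded 0/1)."""
    group1 = [v for v, l in zip(values, labels) if l == 1]
    group0 = [v for v, l in zip(values, labels) if l == 0]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))

def permutation_p_value(values, labels, statistic, n_perm=1000, seed=0):
    """Steps (1)-(3): observed statistic, shuffled labels, proportion >= observed."""
    rng = random.Random(seed)
    observed = statistic(values, labels)      # step (1)
    shuffled = list(labels)
    count = 1                                 # the observed arrangement itself
    for _ in range(n_perm):                   # step (3): repeat step (2)
        rng.shuffle(shuffled)                 # step (2): permute the rows of Z1
        if statistic(values, shuffled) >= observed:
            count += 1
    return count / (n_perm + 1)

# Hypothetical unit-level estimates, one value per unit, and their group labels.
beta_hat = [5.1, 4.9, 5.3, 5.0, 1.0, 1.2, 0.9, 1.1]
group = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = permutation_p_value(beta_hat, group, mean_difference)
```

Any other statistic with the signature `statistic(values, labels)` can be plugged in, since the validity of the test comes from the exchangeability of the units rather than from the form of the statistic.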
Classical linear models assume that $Z_0\Gamma_0$ only affects the mean and not the variability of the within-unit trials. Under the same assumption, we can make use of the proposal of Commenges (2003), which considers the model after residualization and orthogonalization with respect to the covariates not under test $Z_0$. The model is
$$\Pi^T \hat{B} = \Pi^T Z_1 \Gamma_1 + \Pi^T E_u \tag{8}$$
with $\Pi$ such that $\Pi\Pi^T = I_N - Z_0 (Z_0^T Z_0)^{-1} Z_0^T$. More details will be given in Sect. 3.2, where the weighted extension will be presented. The derived test has approximate control of the Type I error and ends up being exact under the assumption of normality of $\hat\beta$. When $Z_0$ has replicated values (e.g. categorical data), it is also possible to retain the exactness of the test by independent permutations within strata (Pesarin 2001). The strata in this case are given by the crossed levels of the columns of $Z_0$.
3.2 The general solution
The assumption of unit-exchangeability does not hold in general, and particularly in non-experimental settings. As an example, consider the psychological data cited above. Here, the design matrix is the same in each unit (i.e. subject); however, the process that generates missing data depends on the level of the factor of interest (Age), since infants often get tired. This makes the variability of the estimated $\hat{B}$ higher in infants than in adults. Therefore, it is a typical condition that makes the naive solution inapplicable.
More formally, under model (6) the $N$ realized observations are independent but with different variances $\Sigma_{u+i}$. Here we present the general method to make observations weakly exchangeable (i.e. up to the second moment) and to get a permutation test from them. To get weak exchangeability, we require the observations (i) to have an equal mean, and (ii) to have the same $\Sigma_{u+i}$. Let us remark that in this section $\hat{B}$ is univariate, therefore $V = \operatorname{Var}(\hat{B})$ is a positive definite diagonal matrix and $V^{-1/2}$ always exists. Now, following the lines of weighted regression, let
$$V^{-1/2}\hat{B} = V^{-1/2} Z_0 \Gamma_0 + V^{-1/2} Z_1 \Gamma_1 + V^{-1/2} E_u, \tag{9}$$
with $Z_0$ and $Z_1$ being the columns of $Z$ associated with $\Gamma_0$ and $\Gamma_1$ respectively. The premultiplication by the matrix $V^{-1/2}$ makes the rescaled estimators homoscedastic, although they differ in their expected values. Indeed, under the null hypothesis,
$$E\left(V^{-1/2}\hat{B}\right) = V^{-1/2} Z_0 \Gamma_0 \quad \text{and} \quad \operatorname{Var}\left(V^{-1/2}\hat{B}\right) = V^{-1/2} \operatorname{Var}(\hat{B}) V^{-1/2} = I_N.$$
We can get rid of the part of the design matrix which is not of interest for the test by projecting the response onto the space orthogonal to $V^{-1/2} Z_0$, through the projection matrix
$$Q = I_N - V^{-1/2} Z_0 \left(Z_0^T V^{-1} Z_0\right)^{-1} Z_0^T V^{-1/2}.$$
The model can be residualized with respect to the covariates not under test $Z_0$ through premultiplication by $Q$:
$$Q V^{-1/2} \hat{B} = Q V^{-1/2} Z_1 \Gamma_1 + Q V^{-1/2} E_u. \tag{10}$$
This makes the residuals zero-centered but not uncorrelated with each other. In order to obtain uncorrelated responses, write $Q$ as $\Pi\Pi^T$ according to its spectral decomposition, where its eigenvalues can only be zero or one (since $Q$ is idempotent) and $\Pi$ is an $N \times (N - q_0)$ matrix whose columns are the eigenvectors corresponding to the non-zero eigenvalues. We get the final model by premultiplying both sides of (10) by $\Pi^T$, obtaining
$$\tilde{B} = \tilde{Z}_1 \Gamma_1 + \tilde{E}_u, \tag{11}$$
where the tilde denotes the transformed matrix, e.g. $\tilde{Y} = \Pi^T Q V^{-1/2} Y = \Pi^T V^{-1/2} Y$.
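For intuition, the weighting and residualization of (9) and (10) take a simple form when $Z_0$ is a single intercept column and $\hat{B}$ is univariate: the projected response reduces to the deviation of each estimate from the precision-weighted mean, divided by its standard deviation. The sketch below covers only this special case and treats the variances as known; it stops at (10) and does not perform the final rotation by $\Pi^T$. The function name and the numbers are hypothetical.

```python
import math

def weighted_residuals(beta_hat, variances):
    """Q V^{-1/2} beta_hat when Z0 is an intercept and V = diag(variances).

    ASSUMPTION: the unit-specific variances are treated as known.
    """
    weights = [1.0 / v for v in variances]
    # (Z0' V^{-1} Z0)^{-1} Z0' V^{-1} beta_hat is the precision-weighted mean.
    weighted_mean = sum(w * b for w, b in zip(weights, beta_hat)) / sum(weights)
    return [(b - weighted_mean) / math.sqrt(v) for b, v in zip(beta_hat, variances)]

# Hypothetical unit-level estimates and their variances.
beta_hat = [1.0, 2.0, 4.0]
variances = [1.0, 4.0, 0.25]
residuals = weighted_residuals(beta_hat, variances)

# The residuals are orthogonal to the weighted intercept column V^{-1/2} 1.
check = sum(r / math.sqrt(v) for r, v in zip(residuals, variances))
```

Here `check` is zero up to rounding, which is the orthogonality that the projection by $Q$ is meant to enforce.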
Hence $\tilde{B}$ has $N - q_0$ exchangeable elements (see also Commenges 2003) and the permutation test applies. The permutation framework allows the user to choose among a wide variety of test statistics. We suggest the use of the correlation between $\tilde{B}$ and $\tilde{Z}_1$ as a test statistic to test whether $\Gamma_1$ is null or not. This enables us to deal with directional alternatives, an extra feature with respect to the LRT approach, which is usually restricted to two-sided alternatives. Alternatively, Basso and Finos (2011) suggest a test statistic that adapts the ideas behind the F statistic to the weighted case. Here we are allowed to combine the effect of many predictors; as an example, we can get a single p-value for the interaction between pH and Temperature by combining the effect of the last two coefficients in the model.
When the covariance matrices $\Sigma_{u+i}$ are known, tests are exact if errors are normal. In all other cases the test is exact up to the second moment, hence asymptotically exact for an increasing number of units (see Commenges 2003). Of course, in real cases, the $\Sigma_{u+i}$ are hardly ever known. As for ML approaches, these must be estimated, resulting in an approximate test in any case. The estimates of $\Sigma_{u+i}$ can be obtained in different ways. The estimates of $\Sigma_i$ are easily obtained from the residuals of the models within unit $i$. As in the most classical approach for ANOVA with random effects, the estimate of $\Sigma_u$ can be obtained by subtracting the $\Sigma_i$ from the total variance (see Montgomery 2004). When $Z_0$ is not a single intercept, the estimates can be obtained through an iterative procedure as described in Basso (2011). It should also be noted that in the model (and in the estimation procedure) we assume that unbiased estimators of the $\Sigma_i$ are available within each group, thus we may also allow random effects to behave differently among groups and units. Therefore, applying unbiased and consistent estimators of $\Sigma_{u+i}$ to the tests proposed above, we obtain asymptotically exact tests. Note that in this case, asymptotic behavior is valid for an increasing number of observations within unit (to provide more reliable estimates of $\Sigma_i$) and an increasing number of units (to provide more reliable estimates of $\Sigma_u$).
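The estimation by subtraction mentioned above can be sketched for a scalar coefficient and an intercept-only between-unit model. This is one simple moment estimator written for illustration, not necessarily the estimator used by the authors: the random-effect variance is taken as the sample variance of the unit estimates minus the average within-unit variance, and the truncation at zero is an assumption added here to avoid negative variances.

```python
from statistics import mean, variance

def total_variances(beta_hat, within_variances):
    """Moment estimates of sigma2_u and of sigma2_{u+i} = sigma2_u + sigma2_i.

    ASSUMPTIONS: scalar coefficient, intercept-only between-unit model,
    within-unit variances already estimated, truncation of sigma2_u at zero.
    """
    sigma2_u = max(0.0, variance(beta_hat) - mean(within_variances))
    return sigma2_u, [sigma2_u + s2 for s2 in within_variances]

# Hypothetical unit-level estimates and within-unit variances.
beta_hat = [1.0, 2.0, 3.0, 4.0, 5.0]
within = [0.5, 0.5, 0.5, 0.5, 0.5]
sigma2_u, sigma2_total = total_variances(beta_hat, within)
```

The values in `sigma2_total` would then fill the diagonal of $V$ in the weighted model of Sect. 3.2.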
4 Multivariate tests
Within the multivariate framework, we can make inference on the null hypothesis (7) when $B$ and $\Gamma_1$ are matrices and not vectors. We consider now the ecological example, where all three biomarkers are taken into account in the same inference and an overall p-value is provided. It should also be clear that within this approach, the within-unit covariates are dealt with as multivariate responses. For example, in the psychological example, the $\hat{B}_i$ are bivariate (i.e. overall effect and display effect) and $\Gamma_1$ will be a $1 \times 2$ vector, the first element testing the effect of Age, the second testing the interaction between Age and Display.
Finally, it is worth highlighting that even multinomial responses for $y_i$ fall under this framework, $\hat{B}_i$ being the vector of estimated coefficients within the $i$th unit. As mentioned in Sect. 2, the class of models covered is very broad.

4.1 Assuming independence between $\Sigma_{u+i}$ and $Z$
As mentioned in Sect. 3.1, the naive approach (i.e. without use of $V$) can be used only when the $\Sigma_{u+i}$ are independent of the values of the covariates in $Z$. In these cases, we can apply any of the standard permutation methods. That is, we can perform a separate test for each column in $\hat{B}$ and combine these tests with standard methods to get an overall test. Several methods can be used to combine univariate tests; a wide class of combining functions is proposed by Pesarin (2001). Within this approach, it is simple to combine test statistics of different natures (e.g. continuous and counts/binomial data) and it is also possible to combine tests with different alternative directions (i.e. a mix of one- and two-sided alternatives). Among the most used functions, we recall the Fisher one, which mimics the very well-known combination of p-values but does not assume independence among test statistics. This setting also allows for powerful methods of control of the Familywise Error Rate (FWER) to be applied. For example, it is simple to perform the min-p procedure proposed by Westfall and Young (1993).
We argued that this assumption is reasonable in the ecological example, therefore we can apply this approach to it. When we test for interaction between pH and Temperature, we get p = 0.0036, 0.0098 and 0.3651 for THC, NRU and LYS (respectively). The combination of the three tests in a global p-value through the Fisher combining function (Pesarin 2001) yields p = 0.0016. (Note that the global p-value is smaller than the minimum among the three original p-values in this case.) Detailed results are reported in Table 1.
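The permutation-based Fisher combination described above can be sketched as follows. This is an illustrative Python outline of the nonparametric combination idea, not the code behind Table 1: it assumes that a matrix of statistics is already available, with one row per response and one column per permutation, the first column holding the observed statistics and the same permutation of units used in every row. The function names and the toy numbers are hypothetical.

```python
import math

def permutation_p_values(stats):
    """p-value of every entry of stats within its own permutation distribution.

    stats[0] is the observed statistic; larger values are more extreme.
    """
    n = len(stats)
    return [sum(1 for s in stats if s >= t) / n for t in stats]

def fisher_combined_p_value(stat_matrix):
    """Global p-value from the Fisher combining function.

    stat_matrix[k][b]: statistic of response k under permutation b (b = 0 observed).
    Using the same permutation b for all responses preserves their dependence.
    """
    p_matrix = [permutation_p_values(row) for row in stat_matrix]
    n_perm = len(stat_matrix[0])
    combined = [
        -2.0 * sum(math.log(p_row[b]) for p_row in p_matrix)
        for b in range(n_perm)
    ]
    return sum(1 for c in combined if c >= combined[0]) / n_perm

# Toy example: two responses, observed statistics plus three permutations.
toy = [[5.0, 1.0, 2.0, 3.0],
       [4.0, 2.0, 1.0, 3.0]]
global_p = fisher_combined_p_value(toy)
```

Because the combined statistic is itself referred to its permutation distribution, no independence among the partial tests is required, and any monotone version of the Fisher function gives the same global p-value.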
If a min-p procedure is performed, it rejects the null hypothesis of no effect for THC and NRU at level α = 0.05. Similar results have been reached using the LRT for each biomarker and the Holm procedure for the multiplicity control. Essentially, we get the same inferential results without assuming normality of either the trial-related or the subject-related errors, but just a fixed effects model. Moreover, we do not require the within-subject variability to be the same among units (tanks). The only requirement is that subjects be randomly assigned to each tank.
For the sake of simplicity, we used a multivariate dataset with a limited number of responses. However, the example should highlight how the method can exploit all standard (and exact) solutions of permutation tests. Since permutation tests can be successfully used in datasets with huge numbers of variables (e.g. Nichols and Holmes 2002), this approach extends mixed models to massively multivariate data applications. The exact control of Type I error is a prominent issue in these cases. The quality of the approximation is usually very poor for low levels of α, and this makes combined tests (e.g. Bonferroni or Fisher) unreliable. On the contrary, even in a multivariate setting, the exactness of control of Type I error in a permutation test is not lost.

Table 1 Permutation tests of interaction Temperature:pH for the ecological example. t statistics and F statistics are reported for each biomarker. The last row reports the combination of the three F tests, which is performed through Fisher combination. 10,000 random permutations are performed

Biomarker   Interaction            Estimate          Statistic        p-value
THC         Temperature28:pH7.7    −1.008 × 10^5     t = −0.154       0.2743
THC         Temperature28:pH8.1    −2.425 × 10^6     t = −3.707       0.0037
THC         Combined                                 F = 8.795        0.0036
NRU         Temperature28:pH7.7    −0.008            t = −1.170       0.4522
NRU         Temperature28:pH8.1    −0.022            t = −3.477       0.0014
NRU         Combined                                 F = 6.258        0.0098
LYS         Temperature28:pH7.7    −0.219            t = −1.363       0.2072
LYS         Temperature28:pH8.1    −0.029            t = −0.179       0.6773
LYS         Combined                                 F = 1.096        0.3651
Combined    Combined                                 Fisher = 11.05   0.0016

4.2 The general multivariate solution
In the general (univariate) solution discussed in Sect. 3.2, the tests are performed separately and the data in each column are standardized through their specific $V$ (which is different for each column of $\hat{B}$). Therefore, the $\hat\beta_i$ are standardized only in a marginal sense, having a covariance matrix $\rho_{u+i} = \operatorname{diag}(\Sigma_{u+i})^{-1/2}\, \Sigma_{u+i}\, \operatorname{diag}(\Sigma_{u+i})^{-1/2}$. As a consequence, the test proposed in Sect. 3.2 is applicable to univariate tests but does not make the observations exchangeable in a multivariate sense, and the multivariate approach used in the previous section cannot be immediately applied here without invoking the independence of the $\rho_{u+i}$ from the predictors $Z$. This would relax the assumption of independence between $\Sigma_{u+i}$ and $Z$, but only in a vague sense. From a practical perspective this assumption is hardly ever fulfilled. There are at least two alternative solutions that lead to a multivariate test overcoming this problem of (multivariate) homoscedasticity.
As a first general multivariate approach, we suggest the direct combination of the $\hat\beta_i$ through a vector $c$:
$$\hat{B}_c = \hat{B}c = Z\Gamma c + E_u c \tag{12}$$
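As a small numerical illustration of the combination in (12), the sketch below collapses each row of a matrix of unit-level estimates into a scalar through a vector $c$ and computes the scalar variance $c^T \Sigma_{u+i} c$ of the combined error of one unit. The function names and all numbers are hypothetical, and the choice of $c$ (equal weights) is only an example.

```python
def combine_columns(B_hat, c):
    """Row-wise linear combination B_hat c: one scalar per unit."""
    return [sum(b * w for b, w in zip(row, c)) for row in B_hat]

def combined_variance(sigma, c):
    """Scalar variance c' Sigma c of the combined error for one unit."""
    size = len(c)
    return sum(c[j] * sigma[j][k] * c[k] for j in range(size) for k in range(size))

# Hypothetical estimates for two units and three responses, with equal weights.
B_hat = [[1.0, 2.0, 3.0],
         [0.0, -1.0, 1.0]]
b_c = combine_columns(B_hat, [1.0, 1.0, 1.0])

# Hypothetical 2 x 2 covariance matrix of one unit, combined with c = (1, 1).
var_c = combined_variance([[1.0, 0.5], [0.5, 2.0]], [1.0, 1.0])
```

The vector `b_c` can then be handled exactly like a univariate response, with `var_c` playing the role of the unit-specific variance.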
This combination makes the random variables in $E_u c$ independent with (scalar) variance $c^T \Sigma_{u+i}\, c$, and the solution for the univariate test proposed in Sect. 3.2 can be immediately applied. We define it as a direct test statistic. The linear combination makes the problem essentially univariate, therefore the estimate of the scalar $c^T \Sigma_{u+i}\, c$ can be performed on the transformed data using the same estimation method as for univariate tests (Basso 2011). This makes the test very suitable for high-dimensional problems.
The degree of control of the probability of Type I error is not an issue, since the test gains all the properties of the univariate tests. Consistency is also ensured if one can assume that for all $\Gamma \neq 0$ (i.e. at least one coefficient is non-zero and the test is under $H_1$) we have $\Gamma c \neq 0$. This is generally true if there are no columns of $B$ which can be obtained as a linear combination of the others. Of course, power is a serious issue here, since a bad choice of $c$ can lead to a test with very low power. Therefore, it becomes very important to exploit a priori information to get a combination $c$ with reasonably high power. It can be very fruitful to have an a priori expectation of the sign of the effects in each column of $\hat{B}$. As an example, in the ecological study, we could expect the effects on the three biomarkers to have the same signs when Temperature is active (i.e. under $H_1$). Note that this does not mean that we are presuming anything as to the direction of the alternative; this simply means that if there is a non-null effect of Temperature in the first biomarker, the effects in the other biomarkers have the same sign or are null. The direct test statistic allows for one- and two-sided alternatives, of course.
In general it is not possible to use more than one linear combination without falling into the pitfall of generating matrices $\hat{B}_c$ with a different covariance matrix in each row (i.e. unit). Therefore, if more than one linear combination is considered (i.e. $c$ is a matrix), the test is marginally unbiased, but the joint distribution of multivariate tests does not ensure control of Type I error. In this case, a Bonferroni correction is the "safe" way to obtain an overall test.
The case of bivariate output is a partial exception to this rule. Let us define $c_+ = \operatorname{diag}(\Sigma_{u+i})^{-1/2}$, $c_- = \operatorname{diag}(1, -1) \times \operatorname{diag}(\Sigma_{u+i})^{-1/2}$ and $C = [c_+, c_-]$, and apply:
$$\hat{B}_C = \hat{B}C = Z\Gamma C + E_u C \tag{13}$$
Now each covariance matrix $C^T \Sigma_{u+i}\, C$ of rows of $E_u C$ is an identity matrix. Therefore, the two tests derived from the
Table 2 Permutation tests of Age and interaction Age:Display for the psychological data. Results of the separate tests and their combination are reported. 10,000 random permutations are performed
Factor
Estimate
Age
0.423 −0.041
Age:Display
p-value
t = 6.127