Module 1: General introduction to mixed models

DTU Informatics

02429 / Mixed Linear Models. Prepared by the Statistics Groups at IMM, DTU and KU-LIFE

1.1  Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1
1.2  Introductory example: NIR predictions of HPLC measurements . . . . .  3
     1.2.1  Simple analysis . . . . . . . . . . . . . . . . . . . . . . .  3
     1.2.2  Simple analysis by an ANOVA approach . . . . . . . . . . . . .  4
     1.2.3  The problem of the ANOVA approach . . . . . . . . . . . . . .  5
     1.2.4  The mixed model . . . . . . . . . . . . . . . . . . . . . . .  6
     1.2.5  Comparison of fixed and mixed model . . . . . . . . . . . . .  6
     1.2.6  Analysis by mixed model . . . . . . . . . . . . . . . . . . .  8
1.3  Example with missing values . . . . . . . . . . . . . . . . . . . . .  9
     1.3.1  Analysis by fixed effects ANOVA . . . . . . . . . . . . . . .  9
     1.3.2  Analysis by mixed model . . . . . . . . . . . . . . . . . . . 10
1.4  Why use mixed models? . . . . . . . . . . . . . . . . . . . . . . . . 11

1.1 Preface

Analysis of variance and regression analysis are at the heart of applied statistics in research and industry, and have been so for many years. The basic methodology is taught in introductory statistics courses within almost any field at universities around the world. Depending on their number and substance, such courses usually provide statistical tools for only a rather limited pool of setups, which will often be too simplistic for the more complex features of real-life situations. Moreover, the classical approaches rest on rather strict assumptions about the data at hand: the structure must be described by a linear model; the observations, or rather the residual or error terms, must follow a normal distribution; they must be independent; and the variability must be homogeneous.

02429/Mixed Linear Models

http://www.imm.dtu.dk/courses/02429/

Last modified August 23, 2011


This course aims at providing participants with the knowledge and tools to handle more complex setups without having to force them into a limited set of predefined settings. The theory behind, and the toolbox given by, mixed linear models encompasses methodology to relax the assumptions of independence and variance homogeneity. The great versatility of mixed linear models has only relatively recently become generally accessible to users in commercial software packages, such as SAS, which is used in this course. Even today, many statistical packages offer only a limited version of the possibilities of mixed linear models. Knowledge of and experience with mixed linear models also form the basis for relaxing the assumptions of model linearity and normality. The final Module 13 of the course indicates how the entire versatility of linear mixed models (still using the normal distribution) can be embedded into non-linear and/or non-normal/non-quantitative data modelling and analysis. Software able to handle situations of this complexity is still very limited, and the details of these issues are still a matter of active research within statistics.

We begin in this module by introducing the concept of a random effect. Module 2 introduces the factor structure diagram as an aid for handling general complex experimental structures, together with the model notation used throughout. Module 3 is a case study illustrating how a data analysis project is commonly approached, with or without random effects. In Module 4 the basic statistical theory of mixed linear models is presented. Modules 5 and 6 treat two specific and commonly occurring practical settings: the hierarchical data setting and the split-plot setting. In Module 7, ideas and methods for model diagnostics are presented, together with a completion of the case study started in Module 3.
Modules 8 and 9 cover two related settings: mixed model versions of analysis of covariance (ANCOVA) and random coefficient (regression) models. In Module 10 the final theory is presented, and in Modules 11 and 12 the important topic of repeated measures/longitudinal data is covered by a module on simple analysis methods and a module on more advanced modelling approaches.

Readers are assumed to possess some basic knowledge of statistics. A course including topics on regression and/or analysis of variance following an introductory statistics course will usually be sufficient. The focus is on applications and interpretation of results obtained from software, but some insight into the underlying theory, and in particular some feeling for the modelling concepts, is emphasized. Throughout the material, an effort is made to describe and make available all SAS code used and needed to obtain the presented results. Most examples are taken from the daily work of the Statistics Group, Department of Mathematics and Physics, The Royal Veterinary and Agricultural University, Copenhagen, Denmark; as such, biologically oriented examples in a broad sense constitute the majority of cases. The material draws on contributions from several staff members over the years: Henrik Stryhn, Ib Skovgaard, Bo Martin Bibby, Torben Martinussen.


1.2 Introductory example: NIR predictions of HPLC measurements

In a pharmaceutical company, the use of NIR (near infrared reflectance) spectroscopy was investigated as an alternative to the more cumbersome (and expensive) HPLC method for determining the content of active substance in tablets. Below, the measurements on 10 tablets are shown together with the differences (in mg):

            HPLC    NIR   Difference
Tablet 1    10.4   10.1      0.3
Tablet 2    10.6   10.8     -0.2
Tablet 3    10.2   10.2      0.0
Tablet 4    10.1    9.9      0.2
Tablet 5    10.3   11.0     -0.7
Tablet 6    10.7   10.5      0.2
Tablet 7    10.3   10.2      0.1
Tablet 8    10.9   10.9      0.0
Tablet 9    10.1   10.4     -0.3
Tablet 10    9.8    9.9     -0.1

One of the main interests lies in the average difference between the two methods, also called the method bias in this context.

1.2.1 Simple analysis

The most straightforward approach to an analysis of these data is to consider the setup as a paired two-sample setup and carry out the corresponding paired t-test. The paired t-test approach corresponds to a one-sample analysis of the differences, i.e. calculating the average and the standard deviation of the 10 differences given in the table above:

    d̄ = −0.05,   s_d = 0.2953

The uncertainty of the estimated difference d̄ = −0.05 is given by the standard error

    SE_d = s_d / √n = 0.2953 / √10 = 0.0934

A t-test for the hypothesis of no difference (no method bias) is then given by

    t = d̄ / SE_d = −0.05 / 0.0934 = −0.535

with a P-value of 0.61, so there is no significant method bias. A 95% confidence interval for the difference is given by d̄ ± t₀.₉₇₅(9) · SE_d. Since the critical t-value with 9 degrees of freedom is 2.262, the 95% confidence interval for the method bias is −0.05 ± 0.21, showing that even though we cannot claim a significant method difference, the difference could very well be as small as −0.26 or as large as 0.16.

The simple analysis just carried out is based on a statistical model. Formally, the statistical model is expressed by

    d_i = µ + ε_i,   ε_i ∼ N(0, σ²),

where µ is the true average difference between the two methods (the true method bias), and σ is the true standard deviation. "True" refers here to the value in the "population" from which the "random sample" of 10 tablets was taken (in this case tablets with a nominal level of 10 mg active content). The differences d_i are defined by d_i = y_i2 − y_i1, where y_ij is the measurement by method j (j = 1 for NIR, j = 2 for HPLC) on tablet i, i = 1, ..., 10. Formally, the average and standard deviation of the differences are "estimates" of the population values, and a "hat" notation is usually employed:

    µ̂ = d̄,   σ̂ = s_d
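The paired analysis above involves nothing more than one-sample arithmetic on the differences. The course itself uses SAS; as an illustrative cross-check only, the numbers can be reproduced with a few lines of Python:

```python
import math

# Differences d_i = HPLC - NIR for the 10 tablets in the table above
d = [0.3, -0.2, 0.0, 0.2, -0.7, 0.2, 0.1, 0.0, -0.3, -0.1]

n = len(d)
d_bar = sum(d) / n                                            # estimate of mu (method bias)
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))   # sample sd of the differences
se_d = s_d / math.sqrt(n)                                     # standard error of d_bar
t = d_bar / se_d                                              # paired t-statistic

# 95% confidence interval, using the quoted critical value t_0.975(9) = 2.262
ci = (d_bar - 2.262 * se_d, d_bar + 2.262 * se_d)

print(round(d_bar, 2), round(s_d, 4), round(se_d, 4), round(t, 3))
# -0.05 0.2953 0.0934 -0.535
```

The interval endpoints come out as roughly −0.26 and 0.16, matching the text.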

1.2.2 Simple analysis by an ANOVA approach

The paired t-test setup can also be regarded as a "randomized blocks" setup with 10 "blocks" (the tablets) and two "treatments" (the methods). The model for this situation becomes:

    y_ij = µ + α_i + β_j + ε_ij,   ε_ij ∼ N(0, σ²),        (1.1)

where µ now represents the overall mean of the measurements, α_i is the effect of the ith tablet and β_j is the effect of the jth method. An analysis of variance (ANOVA) results in the following ANOVA table:

Source of   Degrees of   Sums of   Mean       F       P
variation   freedom      squares   squares
Tablets          9       2.0005    0.2223    5.10   0.0118
Methods          1       0.0125    0.0125    0.29   0.6054
Residual         9       0.3925    0.0436

Note that the P-value for the method effect is the same as for the paired t-test above. This is a consequence of the fact that the F-statistic is exactly equal to the square of the paired t-statistic:

    F_Methods = t² = (−0.535)² = 0.29

The estimate of the residual standard deviation σ is given by

    σ̂ = √MS_Residual = √0.0436 = 0.209

Note that this equals the standard deviation of the differences divided by √2:

    σ̂ = s_d / √2

The uncertainty of the average difference is given by

    SE(ȳ₂ − ȳ₁) = σ̂ √(1/10 + 1/10) = 0.0934

exactly as above, and hence the 95% confidence interval will also be the same.
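The sums of squares in the ANOVA table follow from the usual block/treatment decomposition. As a sketch (again in Python rather than the SAS used in the course), the decomposition can be verified directly from the raw measurements:

```python
# Raw measurements for the 10 tablets (randomized-blocks view of the paired data)
hplc = [10.4, 10.6, 10.2, 10.1, 10.3, 10.7, 10.3, 10.9, 10.1, 9.8]
nir  = [10.1, 10.8, 10.2,  9.9, 11.0, 10.5, 10.2, 10.9, 10.4,  9.9]

n = len(hplc)
grand = (sum(hplc) + sum(nir)) / (2 * n)                 # grand mean of all 20 values

# Block (tablet) sum of squares: 2 observations per tablet mean
tablet_means = [(h + v) / 2 for h, v in zip(hplc, nir)]
ss_tablets = 2 * sum((m - grand) ** 2 for m in tablet_means)

# Treatment (method) sum of squares: 10 observations per method mean
ss_methods = n * ((sum(hplc) / n - grand) ** 2 + (sum(nir) / n - grand) ** 2)

# Residual by subtraction from the total sum of squares
ss_total = sum((y - grand) ** 2 for y in hplc + nir)
ss_residual = ss_total - ss_tablets - ss_methods

ms_residual = ss_residual / 9        # sigma-hat squared, 0.0436
f_methods = (ss_methods / 1) / ms_residual
```

Printing the rounded quantities reproduces the 2.0005, 0.0125, 0.3925 and F = 0.29 of the table.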

1.2.3 The problem of the ANOVA approach

The analysis just carried out is what most statistical textbooks would present as the analysis of randomized complete blocks data. And when it comes to the test (and possible post hoc analysis) of treatment differences, this is perfectly all right; any statistical software will do this analysis for you (in SAS: PROC GLM). The problem arises if you also ask the program for an estimate of the uncertainty of the individual treatment averages, rather than a treatment difference. For instance, what is the uncertainty of the average value of the NIR method? A standard use of the model (1.1) leads to:

    SE(ȳ₁) = σ̂ / √10 = 0.066

and this is again what any ordinary ANOVA software procedure would tell you. This is NOT correct, however! Assume for a moment that we had only observed the NIR method. Then we would use these 10 values as a random sample, obtain the standard deviation s₁ = 0.4012, and hence the uncertainty

    SE(ȳ₁) = s₁ / √10 = 0.127

Note how the model (1.1) dramatically under-estimates what appears to be the real uncertainty in the NIR values. This is so because the variance σ² in the model (1.1)

measures the residual variability after possible tablet differences have been corrected for (the effects of tablets in the model). In the subsequent, and in this respect better, analysis it is the variability between tablets that is used.

The conceptual difference between the two approaches is whether the 10 tablets are considered a random sample or not. In the ANOVA they are not: each tablet is modelled individually with its own effect, and the results of the analysis are valid only for these 10 specific tablets. However, these 10 specific tablets are not of particular interest in themselves; we are interested in tablets in general, and these 10 tablets should be considered as representing this population of tablets, i.e. a random sample. The drawback of analyzing the NIR (resp. HPLC) data in a separate analysis is that the information in the complete data material about the variability within each tablet is not used, leading to inefficient data analysis. The solution is to combine the two in a model for the complete data that considers the tablets as a random sample, or in other words: where the tablet effect is considered a "random effect".
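The under-estimation is easy to exhibit numerically: the standard deviation of the NIR values on their own includes tablet-to-tablet variation, while the residual standard deviation from the ANOVA does not. A small Python sketch (for illustration; the course uses SAS):

```python
import math

# NIR measurements on the 10 tablets
nir = [10.1, 10.8, 10.2, 9.9, 11.0, 10.5, 10.2, 10.9, 10.4, 9.9]
n = len(nir)
mean_nir = sum(nir) / n

# Treating the NIR values alone as a random sample (tablet variation included)
s1 = math.sqrt(sum((y - mean_nir) ** 2 for y in nir) / (n - 1))
se_separate = s1 / math.sqrt(n)        # ~0.127: the realistic uncertainty

# What the fixed-effects ANOVA reports instead: residual sd over sqrt(10)
sigma_hat = math.sqrt(0.0436)          # residual mean square from the ANOVA table
se_anova = sigma_hat / math.sqrt(n)    # ~0.066: roughly half the realistic value
```

The fixed model's 0.066 is barely half of the 0.127 obtained from the NIR sample alone.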

1.2.4 The mixed model

The model with tablet as a random effect is expressed as

    y_ij = µ + a_i + β_j + ε_ij,   ε_ij ∼ N(0, σ²),        (1.2)

where µ as before represents the overall mean of the measurements, β_j is the effect of the jth method, and a_i is the random effect of the ith tablet, assumed to be independent and normally distributed, a_i ∼ N(0, σ_T²). The random effects are also assumed to be independent of the error terms ε_ij. Since the model contains both random effects (tablet) and fixed (non-random) effects (method), models of this kind are called mixed models. Each random effect in the model gives rise to a variance component. The residual error term ε_ij can itself be seen as a random effect, and hence the residual variance σ² is a variance component.

1.2.5 Comparison of fixed and mixed model

To understand the conceptual differences between the models (1.1) and (1.2), we will study three theoretical features of the models:

1. The expected value of the ijth observation y_ij
2. The variance of the ijth observation y_ij
3. The relation between two different observations (covariance/correlation)

The results are summarized in the following table:

                                     Fixed model (1.1)   Mixed model (1.2)
1. E(y_ij)                           µ + α_i + β_j       µ + β_j
2. var(y_ij)                         σ²                  σ_T² + σ²
3. cov(y_ij, y_i'j'), j ≠ j'         0                   σ_T²  (if i = i')
                                                         0     (if i ≠ i')

The table is obtained by applying the basic rules of calculus for expected values, variances and covariances to the model expressions in (1.1) and (1.2). For instance, the variance of y_ij in the mixed model:

    var(y_ij) = var(µ + a_i + β_j + ε_ij)
              = var(a_i + ε_ij)
              = var(a_i) + var(ε_ij)
              = σ_T² + σ²

The expected values and the variances show how the effects of the tablets in the mixed model enter the variance (random) part of the model (as a variance component) rather than the expected (fixed/systematic) part. It also emphasizes that expectations under the mixed model do not depend on the individual tablet, but are expectations over the population of tablets.

To understand the covariance part of the table, recall that the covariance between two random variables, e.g. two different observations in the data set, expresses a relation between those two variables. Independent observations have zero covariance, which is the case for the ordinary fixed effects model, where all observations are assumed independent. The result for the mixed model is obtained as follows:

    cov(y_ij, y_i'j') = cov(µ + a_i + β_j + ε_ij, µ + a_i' + β_j' + ε_i'j')                      using (1.2)
                      = cov(a_i + ε_ij, a_i' + ε_i'j')                                            only the random effects
                      = cov(a_i, a_i') + cov(a_i, ε_i'j') + cov(ε_ij, a_i') + cov(ε_ij, ε_i'j')   (each possible pair)

If observations on two different tablets are considered, i ≠ i', the independence assumptions of the mixed model give that all these covariances ("relations") are zero. However, if two different observations on the same tablet are considered, i = i' and j ≠ j', then only the latter three terms in the expression are zero, and the covariance is given by the first term:

    cov(y_ij, y_i'j') = cov(a_i, a_i) = var(a_i) = σ_T²

Thus, observations on the same tablet are no longer assumed to be independent: some correlation is allowed between observations on the same tablet. This illustrates an essential feature of going from standard regression/ANOVA models to mixed models.
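The variance and covariance entries of the table can also be seen by simulation. The sketch below (Python, for illustration only; the variance components σ_T² = 0.09 and σ² = 0.044 are assumed values, chosen close to the estimates found later in this module) generates many pairs of observations sharing a tablet effect and checks the empirical moments:

```python
import random

random.seed(1)
sigma_T2, sigma2 = 0.09, 0.044     # assumed variance components for the demo
N = 200_000                        # number of simulated tablets

y1, y2 = [], []
for _ in range(N):
    a = random.gauss(0.0, sigma_T2 ** 0.5)           # tablet effect, shared by both methods
    y1.append(a + random.gauss(0.0, sigma2 ** 0.5))  # "NIR" observation
    y2.append(a + random.gauss(0.0, sigma2 ** 0.5))  # "HPLC" observation

m1, m2 = sum(y1) / N, sum(y2) / N
var_y1 = sum((v - m1) ** 2 for v in y1) / (N - 1)                        # ~ sigma_T2 + sigma2
cov_12 = sum((u - m1) * (v - m2) for u, v in zip(y1, y2)) / (N - 1)      # ~ sigma_T2
```

With these settings var_y1 comes out near 0.134 (= σ_T² + σ²) and cov_12 near 0.09 (= σ_T²), exactly the pattern in the table.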


1.2.6 Analysis by mixed model

As mentioned above, the analysis of the data based on the mixed model is in this case to a large extent an exact copy of the ordinary analysis: the same decomposition of variability as given by the ANOVA table is used, and the F-tests for the method effect (and/or tablet effect) and subsequent method comparisons are the same. The uncertainty of the average NIR value in the mixed model is:

    SE(ȳ₁) = √(σ̂_T² + σ̂²) / √10

Thus, to calculate this we need to estimate the two variance components. One way of doing so is by using the so-called expected mean squares, which give the theoretical expectation of the three mean squares of the ANOVA table (we do not go through the theoretical calculation of these expectations here):

Source of   Degrees of   Sums of   Mean      E(MS)
variation   freedom      squares   squares
Tablets          9       2.0005    0.2223    2σ_T² + σ²
Methods          1       0.0125    0.0125    σ² + 10 Σ_j β_j²
Residual         9       0.3925    0.0436    σ²

The expectations show which features of the data enter each mean square. For instance, only the residual error variance component enters the residual mean square, and thus this is a natural estimate of that variance component, σ̂² = 0.0436 (exactly as in the fixed model). And since the expectation for the tablet mean square shows that 0.2223 is an estimate of 2σ_T² + σ², we can use the value for σ̂² to obtain:

    σ̂_T² = (0.2223 − 0.0436) / 2 = 0.0894

showing that the tablet-to-tablet variation seems to be around twice the size of the residual variation. The uncertainty of the average NIR value now becomes

    SE(ȳ₁) = √(0.0894 + 0.0436) / √10 = 0.115

now much closer to the figure found in the separate NIR data analysis above. And due to the complete model specification we were now able to decompose the total variability into its two variance components. This gives additional information AND has an impact on the degrees of freedom to be used when the standard errors are used for hypothesis testing and/or confidence interval calculations: something we will return to in more detail later. The example has shown how the mixed model comes up as the more proper way of expressing a statistical model that fits the situation at hand. The main message was


that, to avoid making mistakes in the direction of under-estimating uncertainty, we needed the mixed model. As such, it came up as a necessary evil. The next example will illustrate how the mixed model may, in a direct way, give information about the key issues in a data set that a straightforward fixed effects ANOVA does not.
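The expected-mean-squares arithmetic of Section 1.2.6 is only a couple of lines. As an illustrative cross-check (the course obtains these numbers via SAS):

```python
import math

# Mean squares from the ANOVA table in Section 1.2.6
ms_tablets, ms_residual = 0.2223, 0.0436

# Expected mean squares: E(MS_Tablets) = 2*sigma_T^2 + sigma^2, E(MS_Residual) = sigma^2
sigma2_hat = ms_residual                       # residual variance component
sigmaT2_hat = (ms_tablets - ms_residual) / 2   # tablet variance component, ~0.0894

# Standard error of the average NIR value under the mixed model
se_nir = math.sqrt((sigmaT2_hat + sigma2_hat) / 10)   # ~0.115
```

This reproduces σ̂_T² = 0.0894 and SE(ȳ₁) = 0.115 from the text, much closer to the 0.127 of the separate NIR analysis than the fixed model's 0.066.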

1.3 Example with missing values

Imagine that, in addition to the 10 tablets of the previous example, another 10 tablets were observed, but each of them with only one of the two methods, giving the following data table:

            HPLC    NIR   Difference
Tablet 1    10.4   10.1      0.3
Tablet 2    10.6   10.8     -0.2
Tablet 3    10.2   10.2      0.0
Tablet 4    10.1    9.9      0.2
Tablet 5    10.3   11.0     -0.7
Tablet 6    10.7   10.5      0.2
Tablet 7    10.3   10.2      0.1
Tablet 8    10.9   10.9      0.0
Tablet 9    10.1   10.4     -0.3
Tablet 10    9.8    9.9     -0.1
Tablet 11          10.8
Tablet 12           9.8
Tablet 13          10.5
Tablet 14          10.3
Tablet 15           9.7
Tablet 16   10.3
Tablet 17    9.6
Tablet 18   10.0
Tablet 19   10.2
Tablet 20    9.9

1.3.1 Analysis by fixed effects ANOVA

An ordinary fixed model analysis results in the following ANOVA table:

Source of   Degrees of   Sums of   Mean       F       P
variation   freedom      squares   squares
Tablets         19       3.7230    0.1959    4.49   0.0129
Methods          1       0.0125    0.0125    0.29   0.6054
Residual         9       0.3925    0.0436

Note that only the Tablets row of the table has changed compared to the previous analysis. Similarly, the estimate of the average method difference is, as before,

    β̂₂ − β̂₁ = −0.05

using only the 10 tablets for which both observations are present. And the uncertainty is also given by these 10 tablets alone, as before:

    SE(β̂₂ − β̂₁) = σ̂ √(1/10 + 1/10) = 0.0934

So in summary, the fixed effects analysis only uses the information in the first 10 tablets.

1.3.2 Analysis by mixed model

Consider for a moment how an analysis of the 10 tablets for which only one of the methods was observed could be carried out. This data set can be regarded as two independent samples, one of size 5 within each method, and a classical two-sample t-test setting is at hand:

    ȳ₁ = 10.22, s₁ = 0.4658
    ȳ₂ = 10.00, s₂ = 0.2739

The difference is estimated as ȳ₂ − ȳ₁ = −0.22, and the (pooled) standard error as:

    SE(ȳ₂ − ȳ₁) = √2 · s / √5 = 0.24,   s² = (s₁² + s₂²) / 2 = 0.146

The results from the two separate analyses can be summarized as:

                  Tablets 1-10   Tablets 11-20
    Difference       -0.05          -0.22
    SE²               0.00872        0.0584

The fixed effects ANOVA only uses the first column of information; it would be preferable to use all of it. Since the two estimates of the method difference have (very) different uncertainties, a weighted average of the two, using the inverse squared standard errors as weights, can be calculated:

    β̂₂ − β̂₁ = [ (1/0.00872)·(−0.05) + (1/0.0584)·(−0.22) ] / [ 1/0.00872 + 1/0.0584 ]
             = (0.87)·(−0.05) + (0.13)·(−0.22)
             = −0.072

Using basic rules of variance calculus gives the squared standard error of this weighted average:

    SE²(β̂₂ − β̂₁) = 1 / (1/0.00872 + 1/0.0584) = 0.00759

and hence SE(β̂₂ − β̂₁) = 0.0871. Note that, apart from giving a slightly different value, this estimator is also more precise than the one based only on tablets 1-10. This is the kind of analysis that the mixed model for this situation leads to. And by combining the data in one analysis (rather than two separate ones), the information about the two variance components is used in an optimal way. In this case the variance components are not easily derived from the ANOVA table; for now we simply state the results as they are given by PROC MIXED in SAS:

    σ̂² = 0.0435,   σ̂_T² = 0.1019
    β̂₂ − β̂₁ = −0.07211,   SE(β̂₂ − β̂₁) = 0.0870

We see how the mixed model automatically incorporates all the information in the analysis of the method difference, and is thus superior to a pure fixed effects ANOVA. This is an example of how analysis by the mixed model automatically "recovers the inter-block information" in an incomplete blocks design; see Cochran and Cox (1957).
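The inverse-variance weighting itself is straightforward to verify. A short Python sketch (for illustration; PROC MIXED arrives at essentially the same combination via likelihood methods):

```python
import math

# Per-subset estimates of the method difference and their squared standard errors
d_paired, se2_paired = -0.05, 0.00872      # tablets 1-10 (paired data)
d_unpaired, se2_unpaired = -0.22, 0.0584   # tablets 11-20 (two independent samples)

# Inverse-variance weights: more precise estimates count for more
w1, w2 = 1 / se2_paired, 1 / se2_unpaired
d_combined = (w1 * d_paired + w2 * d_unpaired) / (w1 + w2)   # ~ -0.072
se_combined = math.sqrt(1 / (w1 + w2))                       # ~ 0.0871
```

The combined SE of 0.0871 is smaller than the paired-only 0.0934, reflecting the recovered information, and both numbers agree with the PROC MIXED output quoted above to rounding.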

1.4 Why use mixed models?

We saw above how the use of mixed linear models saved us from "making a mistake" regarding the uncertainty of the expected NIR level. More to the point, the mixed linear model made it possible to broaden the statistical inference made about the average NIR level. Statistical inference is the process of using data to say something about the population(s)/real world from which the data came in the first place. Parameter estimates, uncertainties, confidence intervals and hypothesis testing are all examples of statistical inference. The inference induced by the fixed effects model in the introductory example is only valid for the 10 specific tablets in the experiment: the low uncertainty is valid for the estimation of the average of the 10 unknown true NIR values. The inference induced by the mixed model is valid for the estimation of the tablet population average NIR value.


For the randomized complete block setting, the inference about treatment differences was not affected by the broadening of the inference space. In other situations, when the data have a hierarchical structure, doing the proper inference matters also for the tests of treatment differences. If 20 patients are allocated to two treatment groups and then subsequently measured 10 times each, the essential variability when comparing the two treatments will most likely lie in the patient-to-patient differences. And clearly it would not be valid to analyse the data as if 100 independent observations were available in each group. A mixed model with patients as a random effect handles the situation properly, inducing the inference most likely to be the relevant one in this case.

We also saw above how a mixed model can recover information in the data not found by a fixed effects model when incomplete and/or unbalanced data are at hand. This is an important benefit of mixed models.

The mixed model approach offers a flexible way of modelling covariance/correlation in the data. This is particularly relevant for longitudinal data or other types of repeated measures data, e.g. spatial data as in geostatistics. In this way the proper inference about fixed effects is obtained, and the covariance structure itself provides additional insight into the problem at hand. The handling of inhomogeneous variances in fixed and mixed models is also included in the toolbox.

Hence, there are many advantages to mixed models, and in many cases a mixed model is really the only reasonable model for the given data. It is only fair to admit that there is also a potential disadvantage: more distributional assumptions are made, and approximations are used in the methodology, leading to potentially biased results. Also, the high complexity of (some of) the models makes the data handling and the communication of the results a challenge.
However, after this course you should be ready to meet this challenge!
