UNIVERSITY OF WISCONSIN — MADISON DEPARTMENT OF STATISTICS
Model Building for Nonlinear Mixed Effects Models Jose´ C. Pinheiro Douglas M. Bates Department of Statistics Mary J. Lindstrom Department of Biostatistics Technical Report #931 August, 1994
Department of Statistics University of Wisconsin–Madison 1210 West Dayton Street Madison WI 53706-1685
Phone: 608/262-2598 Fax: 608/262-0032 Internet:
[email protected]
Model Building in Nonlinear Mixed Effects Models Jos´e C. Pinheiro, Douglas M. Bates, and Mary J. Lindstrom Department of Statistics Dept. of Biostatistics University of Wisconsin — Madison ABSTRACT Nonlinear mixed effects models involve both fixed effects and random effects. Model building for nonlinear mixed effects is the process of determining the characteristics of both the fixed and the random effects so as to give an adequate but parsimonious model. We describe procedures based on information criterion statistics for comparing different structures of the random effects component. These include procedures for determining which parameters in the model should be mixed effects and which should be purely fixed effects, as well as procedures for modeling the dependence of parameters on cluster-specific covariates. These methods are illustrated using the nonlinear mixed effects methods and classes for S-plus and using data sets from a laboratory pharmacokinetic study and from a pharmacokinetics clinical study. Keywords: Mixed effects, random effects, nonlinear models, maximum likelihood.
1. INTRODUCTION Nonlinear mixed effects models have received a great deal of attention in the statistical literature in recent years because of the flexibility they offer in handling the unbalanced repeated measures data that can arise in different areas of investigation, such as pharmacokinetics, economics, and ecology. Model building in mixed effects models involves questions that do not have a parallel in (fixed effects) linear and nonlinear models. Some of these questions are:
determining which effects should have an associated random component and which should be purely fixed; using covariates to explain cluster-to-cluster parameter variability; using structured random effects variance-covariance matrices (e.g. diagonal matrices) to reduce the number of parameters in the model.
We consider here strategies for addressing these questions in the context of nonlinear mixed effects models, though most of the techniques described are also applicable to linear mixed effects models. Any model building strategy is by nature iterative: a tentative model is initially fitted and modified to generate possibly better models (according to some goodness-of-fit criterion) and the process is repeated until no further improvements are possible. In comparing alternative models one must also analyze the residuals from the fit, checking for departures from the model’s assumptions. It is also highly recommended that any model
building analysis be done in conjunction with experts in the field of application of the model, to ensure the practical usefulness of the chosen model. 2. THE NONLINEAR MIXED EFFECTS MODEL Several different nonlinear mixed effects models have been proposed in recent years (Sheiner and Beal, 1980; Mallet, Mentre, Steimer and Lokiek, 1988; Lindstrom and Bates, 1990; Vonesh and Carter, 1992; Davidian and Gallant, 1992; Wakefield, Smith, Racine-Poon and Gelfand, 1994). We consider here a slightly modified version of the model proposed in Lindstrom and Bates (1990). This model can be viewed as a hierarchical model that in some ways generalizes both the linear mixed effects model of Laird and Ware (1982) and the usual nonlinear model for independent data (Bates and Watts, 1988). In the first stage the j th observation on the ith cluster is modeled as
yij = f (ij ; xij )+ ij ; i = 1; : : : ; M; j = 1; : : : ; ni
(2.1)
where f is a nonlinear function of a cluster-specific parameter vector ij and the predictor vector ij , ij is a normally distributed noise term, M is the total number of clusters, and ni is the number of observations in the ith cluster. In the second stage the cluster-specific parameter vector is modeled as
x
ij = Aij + Bij bi; bi N (0; D); (2.2) where is a p-dimensional vector of fixed population parameters, bi is a q -dimensional random effects vector 2
–2–
A
B
associated with the ith cluster (not varying with j ), ij and ij are design matrices for the fixed and random effects respectively, is a (general) variance-covariance matrix. It is and 2 further assumed that observations made on different clusters are independent and that the i follow a ; 2 i distribution and are independent of the i . We will restrict ourselves here to . the case where i The parameters in the model are estimated by either maximum likelihood, or restricted maximum likelihood, based on the marginal density of . In general this density does not have a closed-form expression when the model function f is nonlinear in , so different approximations have been proposed for estimating it. We adopt here the approximation suggested in Lindstrom and Bates (1990). Pinheiro and Bates (1994) analyze several approximations to the loglikelihood of nonlinear mixed effects models and conclude that Lindstrom and Bates’ approximation usually gives very accurate results.
e
D
=I
y
b
j l
al e
N (0 )
j
i d bc b kl ik h i c a jf
8 Concentration (mg/L)
b
a 10
6
4
e j g k d c l
bd cj h k g fi
ea
l a j e
dc g b h
d g bc h i k f
k if
a
j
l d bc g h k i f
l e
a j
d c b gh i
l e d gc
kf
2
a
ib h kf
hhf aj g e d b
j e hg cl id fkb
fl ag 0
3. EXAMPLES
gj e cfli b d kh 0
We make extensive use of real data examples to illustrate the model building techniques discussed here. We now introduce the data sets that will be used throughout this paper.
a e
5
10
15
20
25
Time (hrs)
Figure 1: Theophylline concentrations (in mg/L) of twelve patients over time.
3.1. Theophylline This data set is courtesy of Dr. Robert A. Upton of the University of California, San Francisco. Theophylline was administered orally to 12 subjects whose serum concentrations were measured at 11 times over the next 25 hours. The data, presented in figure 1, is an example of a laboratory pharmacokinetic study characterized by many observations on a moderate number of individuals (clusters). A common model for such data is a first order compartment model with absorption in a peripheral compartment
Ct =
DKka [exp(?Kt) ? exp(?kat)] Cl(ka ? K )
(3.1)
where Ct is the observed concentration at time t (mg/L), t is the time (hr), D is the dose (mg/kg), Cl is the clearance (L/kg), K is the elimination rate constant (1/hr), and ka is the absorption rate constant (1/hr). In order to ensure positivity of the rate constants and the clearance, the logarithms of these quantities can be used in (3.1), giving the reparametrized model
Ct = where lCl
D exp(lka + lK ? lCl) (3.2) exp(lka ) ? exp(lK ) fexp[? exp(lK ) t] ? exp[? exp(lka ) t]g
= log(Cl); lka = log(ka), and lK = log(K ). 3.2. Quinidine
The second data set comes from a pharmacokinetics clinical study of the antiarrhytimic drug Quinidine. A total of
361 Quinidine concentration measurements were made on 136 hospitalized patients under varying dosage regimens. Additional data were collected on a set of nine covariates: age, height, weight, race, smoking status, ethanol abuse, congestive heart failure, creatinine clearance, and -1-acid glycoprotein concentration. Some of these covariates varied for the same patient during the course of the study, while others remained constant. One of the main objectives of the study was to investigate relationships between the individual pharmacokinetic parameters and the covariates. A full description of the data can be found in Verme, Ludden, Clementi and Harris (1992). Statistical analyses of these data using alternative modeling approaches are given in Davidian and Gallant (1993) and Wakefield (1993). The model that has been suggested for the Quinidine data is the one-compartment open model with first-order absorption. This model can be defined in a recursive way as follows. Suppose that, at time t, the patient receives at dose dt and prior to that time the last dose was given at time t0 . The expected concentration, Ct , and the apparent concentration in the absorption compartment, Cat , are given by
Ct Cat
exp[?K (t ? t0 )] + kCaa ?t kKa (3.3) fexp[?K (t ? t0 )] ? exp[?ka (t ? t0)]g = Cat exp[?ka (t ? t0 )] + dVt
=
Ct
0
0
0
where V represents the apparent volume in distribution and ka
–3–
and K are respectively the absorption and the elimination rate constants. When a patient receives the same dose d at regular time intervals , model (3.3) converges to a steady state model, where the expected concentrations are given by
Ct
=
d ka V (ka ? K )
(3.4)
1 ? exp(1 ?K ) ? 1 ? exp(1 ?k ) a
Cat
=
V
d
D
[1 ? exp(?ka)]
Patients considered to be in steady state conditions have concentrations modeled as above. Finally, for a between-dosages time t, the model for the expected concentration Ct , given that the last dose was received at time t0 , is identical to (3.3). Using the fact that the elimination rate constant K is equal to the ratio between the clearance (Cl) and the volume in distribution (V ), we can reparametrize models (3.3) and (3.4) in terms of V , ka , and Cl. In order to ensure that the estimates of V , ka , and Cl are positive, we can rewrite models (3.3) and (3.4) in terms of lV Cl . V ; lka ka and lCl and The initial conditions for the recursive model are C0 Ca0 d0=V , with d0 denoting the first dose received by the patient. It has been assumed throughout the model’s definition that the bioavailability of the drug, i.e. the percentage of the administered dose that reaches the measurement compartment, is equal to one.
log( ) =
= log( )
= log( )
smaller to larger approach is an alternative in these cases, but has the disadvantage of the large number of models that may have to be fitted before the desired one is found. There is yet another important aspect that is overlooked by the model nesting approach: sometimes it is a linear combination of random effects being treated as fixed that gives the best model reduction. The strategy we suggest for choosing the random effects to be included in the model is to start with all parameters as mixed effects, whenever no prior information about the random effects variance-covariance structure is available and convergence is possible. Then we examine the eigenvalues of the estimated matrix, checking if one, or more, are close to zero. The associated eigenvector(s) would then give an estimate of the linear combination of the parameters that could be taken as fixed. We used the Akaike information criterion to decide between alternative models, choosing the one with the smaller AIC. Small eigenvalues may arise when the relative magnitude of the scales of the parameters in the model are quite different, without necessarily implying overparametrization. Therefore we suggest using a normalized, scale invariant version of the variance-covariance matrix. There are different ways of normalizing , the most common being the correlation matrix. This is not a particularly good choice in the present context, since all random effects would then have normalized variance equal to one and we would not be able to identify those with relatively small dispersion (which would be natural candidates to be dropped from the model). Whenever the and matrices in (2.2) are incidence-like matrices (i.e. with just one nonzero entry per row), a more convenient choice of normalization is the coefficient of variation (CV) matrix CV with
=
=0
D
A
D [DCV ]ij = [D ]ij
4. VARIANCE-COVARIANCE MODELING In this section we consider the questions of determining which parameters in the model should have a random component and whether the scaled variance-covariance matrix of the random effects ( ) can be structured in a simpler form (i.e. with fewer parameters than the unstructured form). The first question that should be addressed in the analysis is choosing which parameters should be random effects and which purely fixed effects. Our approach is to fit different prospective models and compare nested models using likelihood ratio tests or some information criterion statistics, e.g. AIC (Sakamoto, Ishiguro and Kitagawa, 1986). One of the problems with this approach is deciding which way to construct the nesting; from smaller to larger models, or the other way around. Starting with a model where all parameters have associated random effects and then removing unnecessary terms is probably the best strategy, but may not be possible to implement if the model is badly overparametrized. In these cases the variancecovariance matrix of the random effects may become seriously ill-conditioned, making convergence difficult or impossible. The
D
B
(4.1)
k(i) k(j )
() ( )
where k represents the k th fixed effect and k i ; k j represent the indices of the fixed effects associated with the ith and j th random effects. When the nonzero elements of the ith row of and are equal to one, the ith diagonal element of CV is equal to the square of the coefficient of variation of i . We note that, in the vast majority of real life applications of model (2.1), and will be incidence-like matrices. To illustrate the use of this method we consider the examples described in section 3. We do not include any analyses of residuals here, but in all cases they did not indicate violations of the model’s assumptions. All maximum likelihood calculations in the examples were done using the nlme function in S-plus (Pinheiro, Bates and Lindstrom, 1993).
A
B
A
B
D
4.1. Theophylline The Theophylline data give an example where convergence is attained for the model in which all parameters are mixed effects. We refer to this model as model I.
–4–
124 03 4 324 0 019
D
: and the MLE of the CV matrix The AIC of model I is ?7 indicating has eigenvalues : ; : ; and : that the model is probably overparametrized. The eigenvector corresponding to the smallest eigenvalue, converted back to ; : ; : T, the original scale and normalized is : suggesting that the lCl random effect is approximately equal to twice the lK random effect. Recalling that the volume in distribution (V) is equal to the ratio between Cl and K , we see that lCl lK implies that the ratio between V and K , that we will denote by R, is a fixed effect. The recommendation at this point would be to contact a pharmacologist and check the plausability of this finding, as well as the interpretability of the parameter R. We will proceed the analysis here for the purpose of illustrating the use of the proposed model building techniques. We reparametrized model (3.2) in terms of lka , lK , and lR R , letting only the first two parameters be mixed effects. : , The AIC of this reduced model, called model II, is considerably smaller than the AIC of model I. The eigenvalues of the estimated CV matrix are : and : indicating that no further linear combinations of random effects can be eliminated from the model. In fact, if we remove either the lka or the lK random effect we get AIC values of respectively : and : , both substantially worse than model II. It is interesting to compare model II with the models obtained by considering each parameter at a time in model I as a fixed effect, to check if a more easily understood model could be used. The AIC of the models considering each of lCl, lka , and lK at a time as fixed effects are respectively : , : , and : , all considerably larger than the AIC of model II. Note however that the elimination of the lK random effect from model I has a much smaller impact on the AIC value, than the elimination of either the lCl or the lka random effects. This suggests that if one is willing to correct the overparametrization problem by dropping one of the random effects from the model (and not a linear combination of them, as was done in model II), lK would be the natural choice. The estimated correlation between the lK and the lka random : , suggesting that the two random effects in model II was effects could be regarded as independent and a diagonal used. The AIC of this model (III) is : indicating that it should be preferred over the previous models. No further reduction could be obtained and we in the number of parameters in concluded that model III was the most adequate.
2 031 10
(0 464 0 020 ?0 886)
numerical problem, with the optimizing algorithm alternating between equivalent solutions (in terms of the value of the loglikelihood) without ever converging. Different strategies can be used to try to circumvent the nonconvergence problem:
=2
= 118 20
log( )
D
203 865
0 356
0 158
200 135
163 224 194 189
125 446
?0 132
116 388 D
D
4.2. Quinidine The Quinidine data provide an example where convergence cannot be attained for the model with all parameters as mixed effects. The data are characterized by few observations on many patients: for patients there is only one observation patients only two. As of Quinidine concentration and for a consequence, the optimization of the loglikelihood for the saturated model (called model I) becomes a very ill-conditioned
46
32
D
try to achieve convergenceusing a diagonal and examine the relative variability of the random effects, investigating the possibility of eliminating one, or more, of them from the model; force convergence (e.g. letting the algorithm run until a preestablished maximum number of iterations) and examine the corresponding b CV matrix for rank deficiency;
D
try to achieve convergence for models with a smaller number of random effects.
D
Convergence could not be achieved even for a diagonal and so the second strategy was used here. We forced convergence after ten iterations of Lindstrom and Bates’ alternating algorithm. The AIC of this forced convergence fit was : . The eigenvalues ; : ; and : of the b CV matrix were equal to : ?9, suggesting that the model was overparametrized. The eigenvector corresponding to the smallest eigenvalue, converted to the original scale of the random effects and T , indicating that the ; : ; : normalized, was : lka random effect was about twice the lV random effect. In terms of the original parameters, that is equivalent to assume ka=V 2 is a fixed effect. As in the Theophylline that R example, the recommendation at this point would be to consult a pharmacologist about the physical meaning, if any, of the parameter R. In this example, though, it seems that the sparse nature of the data is responsible for the convergence problems in b general, and the rank deficiency observed for D CV in particular. Therefore, by using a reparametrization of model I in which R was incorporated as a fixed effect, we would run the risk of overfitting low quality data. We decided to follow a more conservative approach trying to solve the overparametrization problem by removing each random effect at a time from model I. Convergence was attained for the models in which each of lCl, lka , and lV at a time were treated as fixed. The : , : , corresponding AIC values were respectively and : , indicating that the model in which lka is the only fixed effect, called model II, should be preferred. For the sake of comparison, we also fitted the reparametrized model in which R was incorporated as the only fixed effect. The : and though this suggests that corresponding AIC was the reparametrized model fits the data better than model II, we will keep the latter for the reasons discussed previously. The estimated standard deviations of the random effects in and : and the estimated correlation model II were : coefficient was : . This suggested that the random effects
10
344 74 74 526 0 032
D
1 363
(0 097 0 415 ?0 905)
=
501 925 341 782
365 409
338 620
0 323 0 05
0 310
–5–
D D
5.1. Quinidine Figure 2 presents the scatter plots of the conditional modes of the lCl random effect, based on model III of subsection 2.3,
-0.8 40
o 50
0.4
o
60
70
80
0.0
lCl
-0.4
-0.4
o
o o oo o o oo o o o oo o o o o o o o ooo oo oo o o o o oo oo o oo ooo o oo ooooo ooo ooo o o ooo o o o ooo oo o o o o oo oooo oo o oo o o o o o oo oo o o oo o o oo o o o o o ooo oo o o o o o o o
-0.8
0.4
o o
lCl 0.0
o
90
0.4 lCl
-0.4 -0.8
o o o ooo o oooo o o o o oo ooo oo oo oo o o o o o ooo oo o o oo o o oo o o o ooo oo o o o o o ooo o oo o o oo o o oo o oo o o ooo oo o o ooo oo o o o o o o o o o o o
o o oo o o o o o o ooo ooo o o ooo o oo o o oo o o o o oo o o o oooo o ooo o o o o oo o oo oo o o o ooo o oo o o oooo o ooooo oo ooo o o o o o o o o o o o oo oo o o o oo o o oo o o o o oo ooo o o o o o o o o o o
o
60
o
o
65
70
75
40
60
100
120
0.4 lCl o o
o
Non Smoker Smoker Smoking status
-0.8
-0.4
0.0
0.4 lCl
-0.4 -0.8 Black
80 Weight
o
0.0
0.4 lCl 0.0 -0.4 -0.8
o Latin Race
o
Height o
Caucasian
o No/Mild Moderate Severe Congestive heart failure
Age
o o
o
50 Creatinine clearance
300
Glycoprotein concentration
In this section we consider the use of covariates to model random effects variability. This variability can either be related to natural cluster-to-cluster variation, or caused by differences in covariate values between and/or within clusters. The first questions to be addressed in the covariate modeling process are the determination of which variables are potentially useful in explaining random effects variation and which random effects may have their variability explained by covariates. This is probably best achieved by analyzing plots of the random effects estimates versus the covariates, looking for trends and patterns. The conditional modes of the random effects (Lindstrom and Bates, 1990) were used here for this purpose. After the candidate covariates have been chosen, a decision has to be made on how to test for their inclusion in the model. The number of extra parameters to be estimated tends to grow considerablywith the inclusion of covariates and their associated random effects in the model. If the number of covariates/random effects combinations is large, we suggest using a forward stepwise type of approach in which covariatesare included one at a time and the potential importance of the remaining covariates is (graphically) assessed at each step. The decision on whether or not to include a covariate can be based on the AIC of the fits with and without it. Another question that has to be addressed when including a covariate in the model, is which of the new parameters should be random or purely fixed. We suggest using an approach similar to the one described in section 4, for modeling the variance-covariance structure: whenever no prior information is available and convergence is possible, start with a saturated model (in which all new parameters are random) and, (or CV ) by examining the eigenstructure of the estimated matrix, search for plausible structures with fewer parameters. We use the Quinidine data to illustrate this model building approach. We reiterate that any model building strategy is not complete without a careful analysis of residuals and expert advice. In all examples considered here the residual analyses did not indicate departures from the model’s assumptions.
0.0
0.4 0.0
lCl
-0.4
250
0.4
200
0.0
150
lCl
5. COVARIATE MODELING
100
o o o
-0.4
50
o o o
-0.8
o
-0.8
0.4
339 799
o o oo o o o ooo o o ooo o o oo oo oooo o o oo ooo ooo oo oo o oo o oooo ooooooo o o oooooooooooooooooo ooo o ooo oooooooooo o o o o o oo o o o oooo oo o o o o o ooooo o o oo o o o
o
-0.8
497 968
versus the available covariates (when the covariate value changed over time, the mode was used). A loess smoother (Cleveland, Grosse and Shyu, 1992) is included in the continuous covariates’ plots to help the visualization of possible trends.
lCl 0.0
338 205
D
-0.4
had approximately the same variance and were not correlated. A multiple of the identity matrix was used to model and : , considerably the AIC of this reduced model (III) was smaller than those of both models I and II. No further reductions were possible, since, if we removed either the lCl random effect or the lV random effect from model III, we obtained AIC values : and : respectively. of In the next section we explore the use of covariates to explain the cluster-to-cluster variability observed for the lCl and lV random effects.
o
o o No
Abuse Previously Ethanol abuse
Figure 2: Conditional modes of the lCl random effect in model III versus available covariates. Clearance appears to decrease with -1-acid glycoprotein concentration and age with and to increase with weight and height. There is also some evidence that clearance decreases with severity of congestive heart failure and is smaller in Blacks than in both Caucasians and Latins. Clearly the -1-acid glycoprotein concentration is the most important covariate for explaining the lCl cluster-to-cluster variation. A straight line seems adequate to model the observed relationship. Figure 3 presents the scatter plots of the conditional modes of the lV random effect versus the available covariates. None of the covariates seems helpful in explaining the variability of this random effect and we did not pursue the modeling of its variability any further. Initially only the -1-acid glycoprotein concentration was included in the model to explain the lCl random effect variation according to a linear model. In the notation of (2.2) this modification of models (3.3) and (3.4) is accomplished by writing
lClij = ( 1 + b1i) + ( 2 + b2i) glycoproteinij
(5.1)
All parameters were treated as random in this first attempt, called model IV, but the random effects associated with lCl were assumed independent of the lV random effect, preserving
Caucasian
o
o
o o
o
Non Smoker Smoker Smoking status
No
50
60
o o o
o Abuse Previously Ethanol abuse
o
70
90
o o
o o o
65
o
o o o
o Latin Race
0.15 0.05 -0.05
lCl intercept
0.15
70
0.05 -0.05
75
Black
o o o o
o
o
o
o oo oo oo o o o o oo ooooo o ooo o o o ooo o oo oo ooo oooooooo o o o o o o o o o o o o o o o o oo ooo ooo oo ooo oo o oo ooooo oo o o o o oo o oooo o o o o o oo o o o o o 40
60
Height o o
o o
oo
-0.15
o
60
o o
o
Caucasian
lCl intercept
0.15 0.05 -0.05
lCl intercept
-0.15 80
-0.15
0.15 0.05 -0.15
lCl intercept
-0.05
0.15 40
o o
o
Age
0.0 -0.4
0.2 lV
lV
o o
o
o
o
o o oo oo oo o oo oo o ooo o o o o o oo o o oo oo o oo o oo o o o ooo oo o o o o oo o o o oo oo oooo o o o oo oo o o oo o o o o o o o o o o o o o o o o
80
100
120
Weight 0.15
o o o o
o o o o
Black
120
o
o
o
o
o
0.05
o
o o
o o o
0.2
o
o
Latin Race
100 o
o o o o o
o o o o o
o
o
80
o
o o o o o o o
o
o o o o o o o Non Smoker Smoker Smoking status
-0.05
0.4 o
60
o o o
o o
o
No/Mild Moderate Severe Congestive heart failure
lCl intercept
40
o
o
o
Weight
0.0
o o
o
-0.4
o o o o o
-0.15
0.2 lV -0.4
75
0.4
0.6
0.6 0.4 0.2 lV 0.0
70 Height
o
o
o o
o o
o o o o o o
-0.15
65
o o
o o
o
50 Creatinine clearance
0.15
60
o o o o o oo oo o oo o o o o o o o o o oo o o o o ooo o o o ooo o o ooooo o o o oo oo o ooo ooo oooo oo o ooooo o oo o o oo oo ooo oo o o oooo o oo o o o oo oo o oo o o o o oo o o o
o o
o o
300
0.05
90
o
250
-0.05
80
200
lCl intercept
70 Age
o
150
-0.15
60
o o o o o o ooo o o oo o ooo o oo oo ooo o o o o o o o o o o o oo oo ooooo o ooo ooo ooo ooo oooooo oo oooo o oo ooooo ooooo oooo o o o oo oo oo o oo o o o o oo o
o
100
o o
o o o o
Glycoprotein concentration
o
0.6
50
o
-0.4
-0.4 40
0.0
oo o o o o o o o o o o o oo ooo ooo o o o ooo oo ooo o o o ooo o o o oo o oo oo o ooo oo oo o oooo o o o o oo
o
o 50
0.4
o
o o
o
o
o
No/Mild Moderate Severe Congestive heart failure
0.4 0.2 lV 0.0
0.0
lV
0.2
0.4
o
o
-0.4
o
0.15
0.6
0.6
Glycoprotein concentration
o o o oo o o oooo o o oo o o o o o o o o o o o o o o o o o oo ooo o oo oo ooo ooo o ooo oo ooo oo o oo o oo ooo oo o o o o oooo o oooo o o oo o o oo o o o oo o oo o o
o
-0.4
o
50 Creatinine clearance
300
0.15
250
lCl intercept -0.05 0.05
200
o o o
0.6
150
-0.4
-0.4
o
o 100
o o
o
o
50
o o
o o o o o o o o o o o o o oooooooooo oo o ooo oo oo ooo ooooooooooo o oooooooo oo o ooooooo o oooooooooooo o o ooooooo o o oo o o oo o o o oo o o o o o o o o o
o
-0.15
o o o
o o
lCl intercept -0.05 0.05
o o
o
-0.15
o
o o o o o
lCl intercept -0.05 0.05
0.4 o o o o o
lV
lV
o
0.0
o
0.2
oo o o o oo oo o o oooo o o ooooooooooooooo oooooo o ooo oo o oooooooo o oo ooooo o oooo o ooo oooooo oo ooooooooooooo o o o o oo o oo o o o
o
0.2
0.4
o
0.0
0.0
lV
0.2
0.4
o
0.6
0.6
0.6
–6–
o
o
o o
o No
Abuse Previously Ethanol abuse
Figure 3: Conditional modes of the lV random effects in model III versus available covariates.
Figure 4: Conditional modes of the lCl intercept random effect in model V versus available covariates.
the covariance structure of model III. The AIC of this model was : indicating a considerable gain in goodness-of-fit : ). Using the when compared to model III (AIC of same strategy as in section 3 to model the variance-covariance matrix of the random effects, we selected a model in which the lCl random effect was independent of the lV random effect, the variance-covariance matrix of the lCl random effects was unstructured, but the variances of the intercept term in the lCl random effect, b1i in (5.1), and the lV random effect were the : . same. The AIC of this model (V) was Figures 4 and 5 present the scatter plots of the conditional modes of the lCl random effects in model V versus the available covariates. These plots indicate that the intercept random effect does not vary systematically with any of the covariates, but the slope random effect tends to increase with weight and height and is smaller among Blacks and patients with previous history of congestive heart failure. This suggests an interaction between the effects of these covariates and the -1-acid glycoprotein on the Quinidine clearance. At this point expert advice would be needed to clarify the plausability of this hypothesis. Since this was not possible here, we proceeded the model building analysis just for the purpose of illustrating the use of the proposed methodology. Using the forward stepwise approach we included the interactions between -1-acid glycoprotein and weight (as a linear predictor), race (as an indicator variable of black/not black status), and congestive heart failure (as an indicator variable of previous/no previous history of congestive heart failure) in
the model, in this order. The same random effect variancecovariance structure as in model V was used in all cases. The : , : , corresponding AIC values were respectively : . In all three cases substantial reductions in the AIC and values were observed. The random effects plots of the last model (with all three interactions with -1-acid glycoprotein included) did not indicate any other candidate covariates to be included in the model and we concluded that the model was adequate.
215 796
338 205
210 117 204 556
199 893
213 788
6. CONCLUSIONS The analysis of the eigenstructure of the estimated variancecovariance matrix ( ) of the random effects is a useful tool to determine which terms in the model should be random and which matrix also provides should be purely fixed. The estimated useful information to identify structured variance-covariance patterns. Information criterion statistics, such as the AIC, can be used as guidelines to model selection, but analysis of residuals and consultation with experts in the field of application of the model should also be used. The goodness-of-fit and interpretability of a mixed effects model can be substantially enhanced through the inclusion of covariates to explain random effects variability. Information criterion statistics can again be used for model selection, together with analysis of residuals and expert advice. We restricted ourselves here to mixed models in which the cluster errors, in model (2.1), were i:i:d:, but other covariance structures (e.g. autoregressive processes) can easily
D
D
–7–
60
70
80
90
o 65
75
o 60
Black
80
100
0.004
o o
o o Non Smoker Smoker Smoking status
Chi, E. M. and Reinsel, G. C. (1989). Models for longitudinal data with random effects and AR(1) errors, Journal of the American Statistical Association 84: 452–459. Cleveland, W. S., Grosse, E. and Shyu, W. M. (1992). Local regression models, Statistical Models in S, Wadsworth, Belmont, CA, chapter 8.
o o o o o
o o o o No
Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications, Wiley, New York.
120
Weight
lCl slope
0.004 -0.004
o
o o
o
o
0.0
o
Latin Race
o o oo o o o o oo oo o o o o o o oo o o o oo o o oo oooo ooooooooo o o o o oooo oooo ooo o ooo ooo o oo oo oooo oooo oo o o o o o o oo o o ooo o ooo oo ooo o o oo o oo o o o o o o o oo o o oo o o
40
o
lCl slope
0.004
70
o
o
Height
lCl slope 0.0 -0.004
0.0
lCl slope o ooo o o o o oo o o o oo oo oo o o o o oo o o o oo oo oo oo oo oo oo oo oo oo oo oo oo o ooooo ooo o oo ooo o oo oooo o oo oooo o oo o o oo o o o
60
o
Caucasian
o
o
Age o o o o
o 0.004
o o
0.0
0.004 0.0
lCl slope
o 50
REFERENCES
o No/Mild Moderate Severe Congestive heart failure
0.0
40
-0.004
o
o o
o oo oo o o oo o o ooo o o o o o o oo o oo o o o o o o o ooo o oo o ooo o ooo o oooo o ooo oo oooo ooo o oo o ooo oo o o o oo ooo o o o o o oo oo o o oo o o o o oooo o o o oo o oo o oo o
-0.004
0.004 lCl slope 0.0 -0.004
o o o
regions. More research is needed to determine which methods provide the most reliable statistical results and to compare their relative computational performance.
o o o
o
50 Creatinine clearance
300
Glycoprotein concentration
o
0.004
0.004
250
lCl slope
200
o
-0.004
150
0.0
o
lCl slope
o
-0.004
100
o o
o
o 50
o
o o
-0.004
-0.004
lCl slope 0.0
0.004
o o oo o o o o oo o o o o oo o oo ooo ooooooo o oo ooo o oo o o o ooo o ooo oo o o o o o ooooooooooooo o o ooooooooooooooooooo oo o oo oooooooo o o o o ooooooo o o o o oo o o
Abuse Previously Ethanol abuse
Davidian, M. and Gallant, A. R. (1992). Smooth nonparametric maximum likelihood estimation for population pharmacokinetics, with application to quinidine, Journal of Pharmacokinetics and Biopharmaceutics 20: 529–556.
Figure 5: Conditional modes of lCl slope random effect in model V versus available covariates.
Davidian, M. and Gallant, A. R. (1993). The nonlinear mixed effects model with a smooth random effects density, Biometrika 80: 475–488.
be incorporated into mixed models (Chi and Reinsel, 1989; Lindstrom and Bates, 1990). There is usually a trade-off between the number of random effects incorporated in the model and the complexity of the cluster errors covariance structure (Jones, 1990). Further research is needed in that area, especially under a model building perspective. Once a model has been chosen to represent the data, measures of variability for the estimates and confidence regions on the model’s parameters are usually needed for inferential purposes. Maximum likelihood asymptotic theory results can certainly be used at a preliminary stage. These results have not yet been proven for the nonlinear mixed effect model, but have been established for the linear mixed effects model (Pinheiro, 1994). Hence, at least as a first order approximation, asymptotic standard errors and confidence regions based on the normal distribution can be used. More refined methods, such as likelihood profile traces and contours (Bates and Watts, 1988), can also be used to assess the variability in the estimates, but these will generally be computer intensive. A compromise between the asymptotic and profiling methodologies is to use a linear approximation to the loglikelihood, as in Lindstrom and Bates (1990), to calculate the profile traces and contours. This constitutes a considerably less intensive computational problem than profiling the loglikelihood directly. Another (computationally intensive) alternative to assess the variability in the estimates is to use bootstrap methods (Efron and Tibshirani, 1993) to estimate standard errors and confidence
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap, Chapman & Hall, New York. Jones, R. H. (1990). Serial correlation or random subject effects, Communications in Stat., Part B–Simulation and Comp. 19: 1105–1123. Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data, Biometrics 38: 963–974. Lindstrom, M. J. and Bates, D. M. (1990). Nonlinear mixed effects models for repeated measures data, Biometrics 46: 673–687. Mallet, A., Mentre, F., Steimer, J.-L. and Lokiek, F. (1988). Nonparametric maximum likelihood estimation for population pharmacokinetics, with applications to Cyclosporine, J. Pharmacokin. Biopharm. 16: 311–327. Pinheiro, J. C. (1994). Topics in Mixed Effects Models, PhD thesis, University of Wisconsin–Madison. Pinheiro, J. C. and Bates, D. M. (1994). Approximations to the loglikelihood function in the nonlinear mixed effects model, Technical Report 922, Department of Statistics, University of Wisconsin – Madison. Pinheiro, J. C., Bates, D. M. and Lindstrom, M. J. (1993). Nonlinear mixed effects classes and methods for s, Technical Report 906, Department of Statistics, University of Wisconsin – Madison.
–8–
Sakamoto, Y., Ishiguro, M. and Kitagawa, G. (1986). Akaike Information Criterion Statistics, D. Reidel Publishing Company, Holland. Sheiner, L. B. and Beal, S. L. (1980). Evaluation of methods for estimating population pharmacokinetic parameters. I. michaelis-menten model: Routine clinical pharmacokinetic data, Journal of Pharmacokinetics and Biopharmaceutics 8(6): 553–571. Verme, C. N., Ludden, T. M., Clementi, W. A. and Harris, S. C. (1992). Pharmacokinetics of quinidine in male patients: A population analysis, Clin. Pharmacokin. 22: 468–480. Vonesh, E. F. and Carter, R. L. (1992). Mixed-effects nonlinear regression for unbalanced repeated measures, Biometrics 48: 1–18. Wakefield, J. C. (1993). The Bayesian analysis of population pharmacokinetic models, Technical Report TR-93-11, Statistics Section, Imperial College. Wakefield, J. C., Smith, A. F. M., Racine-Poon, A. and Gelfand, A. E. (1994). Bayesian analysis of linear and nonlinear population models using the Gibbs sampler, Applied Statistics. Accepted for publication.