Modelling Bivariate Longitudinal Data with Serial Correlation

Gilbert MacKenzie and John Reeves

Centre for Medical Statistics, Keele University, Keele, Staffordshire ST5 5BG, UK. E-mail: [email protected]
Abstract: In longitudinal data analysis (LDA) one may be confronted with a vector response whose evolution is of interest. This situation may arise in observational or randomised controlled studies where the outcome is a set of continuous measurements and the scientific interest lies in evaluating the influence of the fixed effects, including the intervention. In this paper we formulate a bivariate regression model with random effects and analyse data measured on a continuous time scale. The model is recast in state-space form and estimation is performed using a modified Kalman filter. In the classical approach the intra-individual errors are assumed to be statistically independent given the random effects, but we test this rather strong assumption by introducing a VAR(1) process which also enables us, inter alia, to estimate the cross-talk between the bivariate outcomes.

Keywords: Longitudinal Data Analysis, Serial Correlation, Kalman Filter, Embedded Models, Goodness of Fit, Likelihood, GEEs, GLMMs, Double Kalman Filter
1 Introduction
In longitudinal studies, repeated measurements are made on subjects over time and responses within a subject are likely to be correlated. Classical methods take this into account by assuming that, conditionally on a random effect, the repeated measures are independently and identically distributed. In simple cases, these assumptions give rise to models in which, unconditionally, the intra-individual correlation matrix between repeated measures has the property of compound symmetry: typically the off-diagonal terms are given by $\rho = \sigma_b^2/(\sigma_b^2 + \sigma_w^2)$, where $\sigma_b^2$ is the between-subject variance and $\sigma_w^2$ is the within-subject variance.
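The compound-symmetry structure implied by these assumptions is easy to verify numerically. The following sketch (the variance values are purely illustrative) builds the marginal correlation matrix for $m$ repeated measures:

```python
import numpy as np

def compound_symmetry_corr(sigma2_b, sigma2_w, m):
    """Marginal correlation matrix of m repeated measures under the
    classical random-intercept (Laird-Ware) model: every off-diagonal
    entry equals rho = sigma2_b / (sigma2_b + sigma2_w)."""
    rho = sigma2_b / (sigma2_b + sigma2_w)
    R = np.full((m, m), rho)
    np.fill_diagonal(R, 1.0)
    return R

# illustrative variance components: between = 2, within = 6
R = compound_symmetry_corr(sigma2_b=2.0, sigma2_w=6.0, m=3)
# every off-diagonal entry is 2 / (2 + 6) = 0.25
```

Note that the implied correlation is the same for every pair of time points, however far apart; this is exactly the feature criticised below.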
These Laird-Ware models are often physically unrealistic in medical and biological examples since it is the between-subject variation which is generating the intra-individual correlation structure in the marginal model, viz. $\rho \to 0$ as $\sigma_b^2 \to 0$. The rationale for such a generating mechanism is opaque in the context of longitudinal studies where the intra-individual correlation (over time) may have a strong medical or biological basis. For example, increased severity of a condition may limit the scope for subsequent improvement, thereby inducing correlation between the resulting repeated response measurements. In these circumstances we should expect more intra-individual correlation when the individuals have the same severity status, contrary to the classical Laird-Ware assumptions.

Despite these difficulties, somewhat perversely, the use of the Laird-Ware model remains deeply ingrained in the LDA culture and consequently mis-specification of correlation structures is common. Accordingly, we consider a more general class of models in which we allow serially correlated errors in the link functions of the models considered. This amounts to 'mopping up' after fitting a classical Laird-Ware model in which the correlation is possibly mis-specified. Moreover, we model the serial correlation directly on the real-time axis, an important choice when, as is usual, the observations for each individual are irregularly spaced in time.

We illustrate the basic ideas with a generalisation of the models proposed by Jones and Boadi-Boateng (1991) and Jones (1993) for serially correlated data. These authors modified the univariate Laird-Ware (1982) model, with individual random effects, to include a stationary autoregressive AR(1) process in order to model residual serial correlation not accounted for by the random effects.
We extend their modelling scheme to bivariate form by including a VAR(1) error process which connects the bivariate errors, and show how the Kalman filter can be modified to relax the rather strict assumption of stationarity. The model is used to analyse bivariate data from an observational study of anaemia in pregnancy. Finally, we note that the modified Kalman filter may be adapted to fit GLMMs with serial correlation in the link.
2 Model Formulation
For the $i$th individual, $i = 1, \ldots, n$, consider a regression model of the form:

$$Y_i = X_i'\beta + Z_i'\gamma_i + \epsilon_i \qquad (1)$$

in which there are $m$ repeated measurements on each subject. Then the bivariate response, $Y_i$, is a stacked vector of order $(2m \times 1)$; $X_i'$ is a partitioned block-diagonal matrix of order $(2m \times 2p)$ with zero off-diagonal sub-matrices, where $p$ is the number of regression parameters, whence $\beta$ is of order $(2p \times 1)$. The block-diagonal design matrix, $Z_i'$, is of order $(2m \times 2r)$, where $r$ is the number of random effects per dependent variable, whence $\gamma_i$ is of order $(2r \times 1)$. Finally, to complete the model, the bivariate error $\epsilon_i$ is of order $(2m \times 1)$. In this bivariate model we assume:

$$\gamma_i \sim N(0, B), \qquad \epsilon_i \sim N(0, W_i) \qquad (2)$$
and when $W_i = I$ the repeated measures are statistically independent given the random effects. However, we complete the generalisation of the bivariate Laird-Ware model by adopting a first-order vector autoregressive process (VAR(1)) to describe the within-subject error:

$$d\epsilon_i(t) = A\epsilon_i(t)\,dt + G\,d\eta_i(t) \qquad (3)$$

where $\eta_i(t) \sim N(0, I)$ is white noise, the infinitesimal change $d\epsilon_i(t)$ is modelled by the $2 \times 2$ AR matrix $A$ to be estimated, and the matrix $G$ determines the covariance of the VAR(1) process. It follows that the bivariate AR(1) process can be decomposed as:

$$\frac{d}{dt}\epsilon_i^{(1)}(t) = a_{11}\epsilon_i^{(1)}(t) + a_{12}\epsilon_i^{(2)}(t) + G_{11}\eta_i^{(1)}(t)$$
$$\frac{d}{dt}\epsilon_i^{(2)}(t) = a_{21}\epsilon_i^{(1)}(t) + a_{22}\epsilon_i^{(2)}(t) + G_{22}\eta_i^{(2)}(t) \qquad (4)$$

thus allowing the cross-talk between the bivariate responses to be investigated via the autoregressive coefficients $a_{ij}$, $i, j = 1, 2$. We shall refer to this model as the generalised Laird-Ware model (GL-W).
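The cross-talk structure in (4) can be illustrated by simulating the error process. Below is a minimal sketch using an Euler-Maruyama discretisation of the stochastic differential equation (3); the values of $A$ and $G$ are invented for illustration, not fitted, and $a_{21} = 0$ mimics a uni-directional effect from outcome 2 to outcome 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative (not fitted) autoregressive and noise-scale matrices;
# a21 = 0 means outcome 1 does not feed back into outcome 2
A = np.array([[-0.5,  0.3],
              [ 0.0, -0.7]])
G = np.diag([1.0, 1.0])

dt, n_steps = 0.01, 5000
eps = np.zeros(2)
path = np.empty((n_steps, 2))
for t in range(n_steps):
    # Euler-Maruyama step for d eps = A eps dt + G d eta
    eps = eps + A @ eps * dt + G @ rng.normal(size=2) * np.sqrt(dt)
    path[t] = eps

# stationarity requires the eigenvalues of A to have negative real parts
# (the condition discussed in Section 3.3)
assert np.all(np.linalg.eigvals(A).real < 0)
```

With $a_{12} \neq 0$ the simulated second component visibly drives the first, which is precisely the directional effect estimated for Epo and Hb in Section 4.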
3 Estimation
We rewrite the model in state-space form and estimate the unknown parameters by Maximum Likelihood using a modification of the Kalman filter known as the Information Square Root Filter (Dyer and McReynolds, 1969). Without this modification, iterations of the usual Kalman Filter regularly failed to converge.

3.1 Kalman filtering
If we have independent errors and fixed-effect regression parameters, then the model is a straightforward linear model and computation is relatively easy. However, the random variation in our model is not independent over all observations, and at time $t$ the total error (random effects and residual error) takes the form $Z_i\gamma_i + \varepsilon_i(t)$, which we call $\xi_i(t)$. We want to use
the Kalman Filter to effectively transform $X_i$ and $y_i$ in equation (1) (the Laird-Ware model) so that we have the general linear model

$$\tilde{y}_i = \tilde{X}_i\beta + \tilde{\epsilon}_i \qquad (5)$$

where $\tilde{\epsilon}_i' = (\tilde{\epsilon}_{i1} \ldots \tilde{\epsilon}_{iT})$ and $\tilde{\epsilon}_i \sim N(0, \sigma^2 I)$. Thus, after transformation, we obtain the ordinary linear model with independent errors and estimation of the regression coefficients is easy, for now

$$\hat{\beta} = \Big(\sum_i \tilde{X}_i'\tilde{X}_i\Big)^{-1} \sum_i \tilde{X}_i'\tilde{y}_i$$
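Once the transformed quantities are available, the pooled estimator is a one-line computation. A sketch with synthetic, already-transformed data (the subject count, design values and true coefficients are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
beta_true = np.array([2.0, -1.0])

# synthetic "already whitened" data for 50 subjects, 6 measurements each
Xs, ys = [], []
for _ in range(50):
    X = rng.normal(size=(6, 2))
    Xs.append(X)
    ys.append(X @ beta_true + rng.normal(size=6))  # independent N(0,1) errors

# beta_hat = (sum_i X_i' X_i)^(-1) sum_i X_i' y_i
XtX = sum(X.T @ X for X in Xs)
Xty = sum(X.T @ y for X, y in zip(Xs, ys))
beta_hat = np.linalg.solve(XtX, Xty)
```

Solving the normal equations directly, rather than inverting `XtX`, is the numerically preferable route and mirrors the paper's emphasis on avoiding large matrix manipulations.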
The transformation is done via the Kalman filter, which essentially has two stages, a prediction step and an updating step. The prediction is carried out in accordance with the model, in our case the VAR(1) model, and the updating step accommodates new information from the actual observations in the model, $X_i$ and $y_i$.

3.2 State-space form
First we need to write the total random error, defined above, of the model in state-space form with state equation

$$\frac{d}{dt}\varepsilon_i(t) = A\varepsilon_i(t) + G\eta_i(t) \qquad (6)$$

and measurement equation, which is simply the total random error

$$\xi_i(t) = \begin{pmatrix} I & Z_i \end{pmatrix}\begin{pmatrix} \varepsilon_i(t) \\ \gamma_i \end{pmatrix} \qquad (7)$$

The state equation is a first-order differential equation and is just the VAR(1) error process. The random effects are not updated over time because they are not subject to an AR process. However, the random effects of an individual can only be estimated from a subject's past and present information, and when we come to update the state equation after incorporating the observed values of $X_{it}$ and $y_{it}$ at time $t$, we have additional information on $\gamma_i$.

The measurement equation is a function of the updated state equation and gives the structural part of the error process. After subtracting this structural error, $\xi_i(t)$, from the actual error, $\varepsilon_i^{obs}(t)$, we obtain an independent error component:

$$\varepsilon_i^{obs}(t) - \xi_i(t) \equiv \tilde{\epsilon}_i(t), \qquad t = 1, \ldots, T$$

where $\tilde{\epsilon}_i$ is as in equation (5) above. From these we easily derive the regression coefficients and the likelihood function. The total random error vector, $\xi_i(t)$, is unobservable as $\xi_i(t) = y_{it} - X_{it}\beta$. To obtain transformed independent errors, we proceed by transforming the columns of the design matrix, $X_i$, and the response vectors, $y_i$, in a similar fashion by using the apparatus of the state and measurement equations.
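The predict/update cycle described in Section 3.1 can be sketched for a generic discrete-time linear state-space model. This is an illustration of the recursion only, not the authors' exact continuous-time, square-root implementation; all parameter values are invented:

```python
import numpy as np

def kalman_filter(y, Phi, Q, H, R, x0, P0):
    """Generic linear Kalman filter: predict under the state model,
    then update with each new observation."""
    x, P = x0, P0
    filtered = []
    for yt in y:
        # prediction step (in accordance with the state model)
        x = Phi @ x
        P = Phi @ P @ Phi.T + Q
        # updating step (accommodates the new observation)
        S = H @ P @ H.T + R          # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
        x = x + K @ (yt - H @ x)
        P = P - K @ H @ P
        filtered.append(x.copy())
    return np.array(filtered)

# small illustration: a bivariate AR state observed directly with noise
rng = np.random.default_rng(2)
Phi = np.array([[0.8, 0.1],
                [0.0, 0.7]])
Q, H, R = 0.1 * np.eye(2), np.eye(2), 0.5 * np.eye(2)
x, ys = np.zeros(2), []
for _ in range(200):
    x = Phi @ x + rng.multivariate_normal(np.zeros(2), Q)
    ys.append(x + rng.multivariate_normal(np.zeros(2), R))
xf = kalman_filter(np.array(ys), Phi, Q, H, R, np.zeros(2), np.eye(2))
```

In the paper's setting $\Phi$ would be the matrix exponential of $A$ over the (irregular) gap between visits, which is how the filter handles real-time spacing dynamically.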
3.3 Square-root Filtering
In order to initiate the Kalman Filter an estimate of the covariance of the state vector is required at time 0. It may be shown that the VAR(1) process is stationary when the eigenvalues, $\lambda_i$, of the autoregressive matrix, $A$, have negative real parts. Otherwise, the initial state covariance matrix is theoretically infinite and the Kalman recursion fails. Even when the VAR(1) process is stationary, the constraint on the eigenvalues may not be satisfied at every iteration of the filter. This problem manifested routinely in our early work and we sought a solution based on a modified Kalman filter known as the Information Square-Root Filter.

The information square-root filter avoids this computational difficulty by allowing us to update the inverse of the square root of the state covariance matrix, and to transform back in order to calculate the weighted cross-products of $X_i$ and $y_i$. When the state covariance matrix is infinite, its inverse square root is the zero matrix. The information square-root filter adds small quantities to the diagonal of the inverse square-root matrix, a strategy which enables updating to continue smoothly.

Let us define $x(t+1)$ as the state vector that we wish to predict at time $t+1$, and $\bar{x}(t)$ as the current value of the state vector at time $t$, with covariance $\Sigma(t)$. The Kalman Filter works with $\bar{x}(t)$ and $\Sigma(t)$ in making a one-step prediction for the state vector, whereas the information square-root filter utilises:

$$R(t) = \Sigma^{-1/2}(t), \qquad d(t) = \Sigma^{-1/2}(t)\bar{x}(t).$$

Full details of the information square-root filter and of the method of computing the one-step prediction for $d(t)$ are rather complex, but they are set out in Dyer and McReynolds (1969) and further information is given in Reeves and MacKenzie (1998). Kitagawa (1981) used a square-root filter to analyse a univariate non-stationary time series model, whereas we are analysing a short bivariate time series within a regression framework.
Accordingly, the use of the Information Square Root Filter may be viewed as a computational device which generalises the Kalman filter to non-stationary error processes. The modified Kalman recursion is computationally efficient and handles irregularly spaced data dynamically.
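The quantities $R(t)$ and $d(t)$ can be formed from any positive-definite covariance. A sketch follows, in which the `ridge` term is an illustrative stand-in for the "small quantities added to the diagonal" described above (it is not the filter's exact mechanism):

```python
import numpy as np

def information_sqrt(Sigma, xbar, ridge=1e-8):
    """Information square-root quantities R(t) = Sigma^{-1/2}(t) and
    d(t) = Sigma^{-1/2}(t) xbar(t). A diffuse (infinite) covariance
    corresponds to R = 0; the small ridge on the diagonal keeps
    updating well defined, as a computational device."""
    # symmetric inverse square root via an eigendecomposition
    vals, vecs = np.linalg.eigh(Sigma)
    R = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    R = R + ridge * np.eye(len(xbar))
    return R, R @ xbar

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
xbar = np.array([1.0, -1.0])
R, d = information_sqrt(Sigma, xbar)

# R recovers the precision matrix: R @ R is Sigma^{-1} up to the ridge
```

Working with $R(t)$ rather than $\Sigma(t)$ is what lets a diffuse initial state (infinite covariance) be represented by an ordinary finite object, the zero matrix.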
4 Results: A Study of Anaemia in Pregnancy
We illustrate our model using data from an observational study carried out at the Mater Infirmorum Hospital in Belfast to investigate changes occurring in two blood measurements, Erythropoietin (Epo) and Haemoglobin (Hb), throughout pregnancy. It is well known that, during pregnancy, the total red blood cell mass initially falls, but then recovers its level later on. It is thought that the dynamics of Haemoglobin (g/dl) may be influenced, at least in part, by changes in Epo (mU/ml, where mU are international units of Epo activity). Some 264 patients attending an antenatal clinic in Belfast were recruited, and three blood samples of Hb and Epo were taken at booking, at 28-32 weeks and at 38 weeks, though these times were variable and there were missing visits.

We fitted a standard parametric model for longitudinal data, the Laird-Ware model, to both variables simultaneously. Covariates included in the model were gestation (weeks), age of patient (years) and parity (number of previous children), and an interaction term. Table 1 gives the fixed regression estimates for the two dependent variables, conditional on the patient's individual random effects. The two models are Model 1, which is the standard Laird-Ware model with independent error, and Model 4, which was the final model chosen. Model 4 accounts for serial correlation in Epo and Hb, and a uni-directional effect from Epo to Hb.

By comparing the fixed effects in the classical Laird-Ware model and the model extended to include serial correlation, we see there is little difference between either the coefficients or the standard errors. Thus, primary inference based on regression coefficients is not changed (in this case) by making the Laird-Ware model more complex. However, the fit is significantly improved by including AR(1) serial correlation. Thus our extended model provides evidence of serial correlation in addition to that already accounted for by the random effects, and further insight into the relationship between the two variables studied.
When we included a VAR(1) error process with all four elements of the autoregressive matrix $A$ present (Model 2), the AIC statistic was reduced by 110.01, showing a significantly better fit than Model 1, the Laird-Ware model. Table 2 shows the AIC for the Laird-Ware model and the differences in AIC obtained for the reduced models proposed. Models 3 and 4 are reached by deleting the elements $a_{12}$ and $a_{21}$ from the full model, singly. The elements $a_{11}$ and $a_{12}$ define the effect of the previous values of Hb and Epo respectively on the change in the present level of Hb. The autoregressive term, $a_{12}$, appears to be significant, as the AIC increases when it is omitted, but no comparable increase was detected when $a_{21}$ was omitted. Thus, we may infer that there exists a slight directional effect in the model, from Epo to Hb. To summarize, the change in Hb is affected by the previous values of Hb and Epo, but the change in Epo depends only on the previous value of Epo.
TABLE 1. Comparison of models: MLEs (standard errors)

Model 1: Laird-Ware model with independent error

Covariate             Hb                     Epo
Constant              12.60   (0.352)       -0.6262 (7.037)
Gestation (weeks)     -0.0361 (0.00333)      0.933  (0.0630)
Parity                -0.206  (0.216)        9.884  (4.307)
Age                    0.0116 (0.0134)       0.312  (0.267)
Age*Parity             0.0044 (0.0072)      -0.258  (0.144)

Model 4: Laird-Ware model with VAR(1) error

Covariate             Hb                     Epo
Constant              12.55   (0.355)        0.644  (6.859)
Gestation (weeks)     -0.0351 (0.00337)      0.901  (0.0687)
Parity                -0.209  (0.217)        9.662  (4.175)
Age                    0.0125 (0.0134)       0.305  (0.259)
Age*Parity             0.00468 (0.00724)    -0.257  (0.139)

TABLE 2. AIC values for the models

Model 1 : Laird-Ware model (ind. error)   6979.47
Model 2 : Full A matrix                   -110.01
Model 3 : a12 omitted                      -20.52
Model 4 : a21 omitted                     -110.73
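The comparison in Table 2 can be reproduced directly from the reported values (recall AIC = $-2\log L + 2k$, smaller is better; the Table 2 entries for Models 2-4 are differences from Model 1):

```python
# AIC = -2 * log-likelihood + 2 * (number of parameters); lower is better.
# Entries reproduce Table 2: Models 2-4 are reported as differences
# from the Model 1 baseline.
aic = {
    "Model 1 (independent error)": 6979.47,
    "Model 2 (full A matrix)":     6979.47 - 110.01,
    "Model 3 (a12 omitted)":       6979.47 - 20.52,
    "Model 4 (a21 omitted)":       6979.47 - 110.73,
}
best = min(aic, key=aic.get)
# Model 4 attains the smallest AIC: dropping a21 costs essentially
# nothing, while dropping a12 (Model 3) worsens the fit markedly.
```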
5 Discussion
This short paper highlights the use of the bivariate generalised Laird-Ware model in the analysis of data arising in an observational study in which a vector response was measured irregularly in time. We have also successfully applied the model to bivariate data arising in randomised controlled clinical trials, especially in ophthalmological trials where interest is focused on bivariate outcomes in one (affected) eye or the same measurement made in both eyes.

Key features of the model are the extension of the Laird-Ware model to handle bivariate responses and the inclusion of a bivariate autoregressive process to measure residual serial correlation. This more general model showed a significant reduction in the AIC over the standard Laird-Ware model in the Epo study. Thus, the Laird-Ware model, with individual random effects and (assumed) independent error, was not optimal. This is a consistent finding in other data sets analysed, and perhaps one which is not too surprising given the unrealistic assumptions underpinning the mechanism generating the correlation in the Laird-Ware model. However, in this example, primary inference on the fixed effects was invariant to model choice.

Initially, we were attracted to Kalman recursion and state-space modelling because of their potential computational convenience: the method avoids the large data matrices which usually arise in longitudinal data analysis and handles irregularly spaced data conveniently. The convergence problems encountered with the standard Kalman filter were unanticipated, but were resolved satisfactorily by the use of the Information Square Root Filter, which proved indispensable in the analysis of our data.

The bivariate model enabled us to study the effect of covariates (and, in the clinical trial setting, intervention) on both responses simultaneously.
It provides a framework in which the complexity of the inter-connection between the response variables may be studied, for example, via correlated random effects and noise. These features shed light on the mechanism of operation of any treatment effect and should reduce the over-optimistic reporting which can result when the correlation between multiple responses is ignored. Accordingly, it provides a check on the interpretation of the corresponding univariate models. Moreover, while it may be argued (especially from a GEE standpoint) that the correlation between repeated measures is merely a nuisance parameter, we do not share this perspective in general. Often, secondary aspects of the process, e.g. the time-structure per se, will be of direct scientific interest.

In this paper the bivariate model enabled us to study the direction of hypothesised biological effects between the correlated responses allowing for
baseline covariates and random effects, and to confirm a clinical conjecture. The availability of ML estimation and the AIC criterion facilitated the choice of model (L-W or GL-W) when this might have been difficult otherwise, for example, with a GEE approach. Often, the better fit provided by the GL-W model was only evidenced in the residuals. Somewhat remarkably, the modified Kalman filter algorithm can be extended by means of a double filter to deal with GLMMs with serial correlation in the link, thus extending the class of models proposed by Breslow and Clayton (1993).
6 Acknowledgements
We thank Dr. Mary McMullin of the Mater Hospital, Belfast, for providing the dataset and Professor R.H. Jones of the University of Colorado School of Medicine for some Fortran 77 code.
References

Jones, R.H. and Boadi-Boateng, F. (1991). Unequally spaced longitudinal data with AR(1) serial correlation. Biometrics, 47: 161-175.

Jones, R.H. (1993). Longitudinal Data with Serial Correlation: A State-space Approach. Chapman and Hall.

Laird, N.M. and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics, 38: 963-974.

Dyer, P. and McReynolds, S. (1969). Extension of square-root filtering to include process noise. Journal of Optimization Theory and Applications, 3(6): 444-458.

Reeves, J. and MacKenzie, G. (1998). A bivariate regression model with serial correlation. JRSS, Series D, 47(4): 607-615.

Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. JASA, 88: 9-25.