An Introduction to Generalized Estimating Equations and an ...

Journal of Educational and Behavioral Statistics Winter 2004, Vol. 29, No. 4, pp. 421–437

An Introduction to Generalized Estimating Equations and an Application to Assess Selectivity Effects in a Longitudinal Study on Very Old Individuals

Paolo Ghisletta, University of Geneva
Dario Spini, University of Lausanne

Correlated data are very common in the social sciences. Most common applications include longitudinal and hierarchically organized (or clustered) data. Generalized estimating equations (GEE) are a convenient and general approach to the analysis of several kinds of correlated data. The main advantage of GEE resides in the unbiased estimation of population-averaged regression coefficients despite possible misspecification of the correlation structure. This article aims to provide a concise, nonstatistical introduction to GEE. To illustrate the method, an analysis of selectivity effects in the Swiss Interdisciplinary Longitudinal Study on the Oldest Old is presented.

Keywords: clustered data, generalized estimating equations, population-averaged method, selectivity effects

Correlated data are very common in educational and, more generally, in social science research. Longitudinal and hierarchically organized or clustered data represent two frequent analytical situations in which data within clusters are correlated. The classical example in education describes children clustered within classrooms within schools. This violates the assumption of independent observations made by traditional regression methods. Children within the same classroom will likely display similar, correlated values on several variables, as a function, among others, of the teachers’ influence and of regular interactions. If the within- (or intra-) cluster correlation is not accounted for in the analyses, parameters’ standard errors may be biased. More precisely, the modeling of time-invariant covariates (e.g., participants’ gender in a longitudinal study) usually results in underestimated standard errors, hence inflated Type I error; time-varying covariates (e.g., age assessed at each occasion in a longitudinal study) usually result in overestimated standard errors, causing Type II error inflation (Hu, Goldberg, Hedeker, Flay, & Pentz, 1998). Moreover, traditional regression models assume the errors to be normally distributed, and the variance of the outcome variable to be constant (i.e., homoscedasticity). However, normally distributed errors and homoscedasticity are assumptions often not adequately met in empirical analytic situations.

Parts of these analyses were presented at the 52nd Annual Scientific Meeting of the Gerontological Society of America, San Francisco by J. F. Riand, B. Vascotto, C. Cordonier, and C. J. Lalive d’Epinay. This research was supported by the Swiss National Science Foundation Priority Program “Switzerland: Towards the Future” (5004-047750/047752, Principal Investigator, C. Lalive d’Epinay), the “Département de la Santé et de l’Action Sociale de la République et Canton de Genève,” and the “Département de la Santé du Canton du Valais.” We thank Joanna M. Flemming for her helpful comments on a previous draft.

Overview of GEEs

While multilevel, mixed, or random effects models for the analyses of Gaussian correlated data have become popular in the social sciences (e.g., Bryk & Raudenbush, 1987; Goldstein, 1995; Laird & Ware, 1982; McArdle & Hamagami, 1996; Ware, 1985), there is less familiarity with methods for analyses of nonnormal correlated data (however, for binary data, see Pendergast et al., 1996; Prentice, 1988; Ware, Lipsitz, & Speizer, 1988). Generalized estimating equations (GEEs), first introduced by Liang and Zeger (1986; Zeger & Liang, 1986), have become very popular in the biological, epidemiological, and related disciplines, yet remain less known in the educational and social sciences (Cheong, Fotiu, & Raudenbush, 2001; Miller & Guo, 2000; but cf. e.g., Norton, Bieler, Ennett, & Zarkin, 1996). GEEs provide a general framework for the analyses of continuous, ordinal, polychotomous, dichotomous, and count-dependent data, and relax several assumptions of traditional regression models. We aim to introduce the basics of GEEs and to demonstrate their utility in an analysis of selectivity effects in a longitudinal study of very old people.

GEEs represent an extension of the generalized linear model (GLM) (Nelder & Wedderburn, 1972) to accommodate correlated data.
This broader class of techniques includes, among others, traditional linear models with normally distributed errors (e.g., ordinary least squares regression, MANOVA, linear random coefficient models, etc.), logit or probit models with binary error distribution, and logarithmic models with Poisson error distribution. GLMs assume that the dependent variable of analyses can be expressed as a linear function of the independent variables (Lipsitz, Fitzmaurice, Orav, & Laird, 1994). A monotonic differentiable link function (technically its inverse) describes the linear relationship between the expected value of the outcome variable and its predictors. Examples of link functions are the identity for Gaussian data, the logit for dichotomous data, and the logarithm for count and positive, continuous data (e.g., Horton & Lipsitz, 1999). It is further assumed that the variance of the dependent variable is a known function of its expectation (thus allowing relaxation of the homoscedasticity assumption). Hence, GLMs do not require the specification of the form of the distribution, but only of the relationship between the outcome mean and the predictors and between the mean and the variance (i.e., the first two moments, Zeger & Liang, 1986). Exponential functions serve this purpose well (Dunlop, 1994). Examples of response probability distributions are normal for Gaussian data, binomial for dichotomous and more generally proportion data, Poisson for count data, and gamma for positive, continuous data. This specification corresponds to quasilikelihood models introduced by Wedderburn (1974), which allow relating linear models to non-Gaussian data (Zeger & Liang, 1986).

GEE is a marginal (or population-averaged) as opposed to a cluster-specific (or subject-specific, conditional) method. Hence, GEEs model the marginal expectations of the outcome and do not specify the joint distribution of a subject’s observations. The method is most sensible when the chief interest lies in the regression equation for the marginal expectations and not in the intracluster correlation structure. Longitudinal research often aims at describing the marginal expectations of the outcome as a function of the predictors. For example, when comparing a group of students subject to a novel teaching technique with a control group on a scholastic performance indicator over an extended period of time, the focus rests not on individuals’ odds ratios but on the average odds ratios of the two groups. Population-averaged methods model the “average response over the subpopulation that shares a common value of” the predictors as a function of such predictors (Diggle, Heagerty, Liang, & Zeger, 2002, p. 126). Hence, they are appropriate when we have reasons to believe that few values of the predictive variables are shared by many observations. In this context, few subpopulations exist, and they are identified by their common values on the predictors. The desired relationship between the outcome variable and the independent variables maximizes the predictability of the subpopulations and not of the individuals comprising the subpopulations.
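The difference between individuals’ odds ratios and the population-averaged odds ratio can be made concrete with a small numerical sketch. It assumes a hypothetical random-intercept logistic model, logit P(Y = 1 | b, x) = beta0 + beta1 * x + b with b normally distributed with standard deviation sigma; the parameter values and function names are illustrative assumptions, not quantities taken from the study. The marginal probability is obtained by averaging over the random intercept on a grid, and the resulting population-averaged log odds ratio is compared with the cluster-specific coefficient beta1.

```python
import math


def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Logit: maps a probability to log odds."""
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n_steps=4001, width=8.0):
    """Average expit(eta + b) over b ~ N(0, sigma^2) on a symmetric grid."""
    if sigma == 0.0:
        return expit(eta)  # no heterogeneity: marginal equals conditional
    lo = -width * sigma
    step = 2.0 * width * sigma / (n_steps - 1)
    total = 0.0
    norm = 0.0
    for k in range(n_steps):
        b = lo + k * step
        w = math.exp(-0.5 * (b / sigma) ** 2)  # unnormalized normal density
        total += w * expit(eta + b)
        norm += w
    return total / norm


# Illustrative (assumed) values: binary predictor x, strong heterogeneity.
beta0, beta1, sigma = 0.0, 1.0, 2.0

cluster_specific_log_or = beta1
population_averaged_log_or = (
    log_odds(marginal_prob(beta0 + beta1, sigma))
    - log_odds(marginal_prob(beta0, sigma))
)

print("cluster-specific log odds ratio:   ", cluster_specific_log_or)
print("population-averaged log odds ratio:", round(population_averaged_log_or, 3))
```

Under these assumed values the population-averaged log odds ratio is closer to zero than the cluster-specific one, and the two coincide when sigma is zero, that is, when there is no heterogeneity across clusters.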
Cluster-specific approaches, on the other hand, assume that the coefficients relating the predictors to the outcome variable stem from a distribution because “there is natural heterogeneity across individuals in their regression coefficients” (Diggle et al., 2002, p. 129). Here the focus lies on the numerous individuals, each with different values on the predictors. Discussing subpopulations in this latter context is not appropriate. The distinction between the two methods is crucial, especially because of different parameter interpretations. In particular, cluster-specific parameters represent the effect of a unit change in the predictors for all observations sharing equal values on any unmodeled variable (hence with equal Level 1 random component); population-averaged parameters represent the averaged effect of a unit change in the predictors for the whole population. The two sets of parameters are mathematically related (Hu et al., 1998) and identical when the cluster-specific model obtains zero Level 1 variance (i.e., no Level 1 random effects) (Diggle et al., 2002; Zorn, 2001). Generally speaking, the absolute values of the regression coefficients quantifying covariates’ effects are smaller in population-averaged models than in cluster-specific models (Neuhaus, Kalbfleisch, & Hauck, 1991). This distinction is apparent for analytical situations requiring nonlinear link functions. For example, in the logistic model for the analyses of dichotomous outcomes, the ratio of the population odds is not the average of the ratio of individuals’ odds. For more details on this distinction, see Hu et al. (1998), Neuhaus et al. (1991), Pendergast et al. (1996), and Zorn (2001).

Heagerty (1999) and Heagerty and Zeger (2000) have shown that a model analogous to GEE can be specified by combining a marginal regression with a random effects model. This “marginalized random effects model” (cf. Diggle et al., 2002) has several advantages. First, because this approach is likelihood-based (as opposed to quasi-likelihood-based), attractive features such as unbiased estimates when data are missing at random (MAR; as opposed to missing completely at random, MCAR) are retained, even if the intracluster correlation structure is misspecified. Second, it is possible to reinterpret the marginal parameters as cluster-specific parameters, permitting us to address simultaneously both population-level and individual-level conclusions (Heagerty, 1999; see also, Diggle et al., 2002).

The main advantage of GEEs lies in the consistent and unbiased estimation of parameters’ standard errors even when the correlation structure is misspecified. Because of this, the intracluster dependency correlation matrix is referred to as the working correlation matrix (Liang & Zeger, 1986; Zeger & Liang, 1986). The dependency structure is assumed invariant over all observations, hence representing average time dependency (Hu et al., 1998). Correct specification of the correlation structure augments efficiency (Y.-G. Wang & Carey, 2003) and several specifications are commonly adopted: (1) independent, which assumes the nonexistence of time dependency (i.e., zero intracluster correlation), so that all off-diagonal elements of the working correlation matrix are zero; (2) exchangeable, which assumes constant time dependency, so that all the off-diagonal elements of the correlation matrix are equal; (3) autoregressive, which assumes the correlations to be an exponential function of the time lag; (4) stationary M, which assumes constant correlations within equal time intervals only; (5) M-dependent or nonstationary, which assumes a diagonal of ones, elements outside a specified band (i.e., M) equal to zero, and no constraints for the remaining elements; (6) unstructured, which assumes a saturated, free specification, hence no equality constraints; and (7) specified or fixed, a user-specified matrix.

The choice among the several specifications should be based on substantive reasons whenever possible, and sensitivity analyses of the different specifications of the correlation structure are recommended (Y.-G. Wang & Carey, 2003; Zorn, 2001). However, some practical guidelines may aid in choosing a correlation structure. When there are many clusters, each with few observations, and the data are balanced and complete, an unstructured matrix is recommended; when the observations are collected at different time points, a matrix accounting for time information is recommended (e.g., stationary M, M-dependent, or autoregressive); when there is no logical ordering to the observations, the exchangeable matrix is recommended; when there are few clusters, the independence matrix is recommended; when the purpose is to cross-validate a correlational structure, the specified or fixed specification is the obvious choice (Horton & Lipsitz, 1999). For more details see Diggle et al. (2002).
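Three of the working correlation specifications listed above can be written out as small matrix builders. This is a minimal sketch: the function names, the five-occasion design, and the correlation values 0.50 and 0.73 are illustrative assumptions, and in an actual GEE analysis the correlation parameters are estimated from the data by the software.

```python
def independent(n_waves):
    """Independence: ones on the diagonal, zeros elsewhere."""
    return [[1.0 if i == j else 0.0 for j in range(n_waves)]
            for i in range(n_waves)]


def exchangeable(n_waves, rho):
    """Exchangeable: the same correlation rho for every pair of occasions."""
    return [[1.0 if i == j else rho for j in range(n_waves)]
            for i in range(n_waves)]


def autoregressive(n_waves, rho):
    """AR(1): correlation rho ** lag, decaying with the time lag."""
    return [[rho ** abs(i - j) for j in range(n_waves)]
            for i in range(n_waves)]


def n_unstructured_parameters(n_waves):
    """Number of freely estimated off-diagonal correlations."""
    return n_waves * (n_waves - 1) // 2


if __name__ == "__main__":
    for row in autoregressive(5, 0.73):
        print(["%.2f" % r for r in row])
    print("free correlations, unstructured, 5 waves:",
          n_unstructured_parameters(5))
```

The builders make the difference in parsimony visible: the exchangeable and autoregressive structures each depend on a single parameter, whereas an unstructured matrix for five occasions has ten free correlations.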
To augment the efficiency of GEEs, Prentice (1988; see also, Zhao & Prentice, 1990) introduced a variation called GEE2, which requires the correct specification of both the mean model and the correlation structure. The gain in efficiency, however, seems to be minor (Liang, Zeger, & Qaqish, 1992). Moreover, when the correlation structure is misspecified, the GEE2 estimated parameters are inconsistent. Given that in empirical situations we often ignore the correct specification of the working correlation structure, use of GEE2 is not advised unless the correct dependency specification is known (Zorn, 2001). Qu, Lindsay, and Li (2000) proposed a different method to improve efficiency, based on quadratic inference functions. The authors show that their approach, with appropriate choice of scores for the quadratic inference functions, is more efficient than GEEs when the working correlation matrix is misspecified. However, this approach is not implemented in standard statistical software.

Practical Considerations of GEEs

The assumptions maintained by the GEE method are that: (1) the dependent variable be linearly related to the predictors (when the dependent variable is nonnormally distributed a nonidentity link function is to be selected); (2) the number of clusters be relatively high (a rule of thumb is no fewer than 10, possibly more than 30; Norton et al., 1996); (3) the observations in different clusters be independent (although within-cluster observations may correlate).

GEEs offer several advantages. First, GEEs may provide consistent, asymptotically normal, unbiased standard errors, even with incorrect specification of the intracluster dependence structure, assuming the mean model is correctly specified and with complete or missing completely at random data (following the classification of Rubin, 1976). More specifically, GEEs offer two variance estimator algorithms. One algorithm is commonly referred to as model-based or naïve and it is the only one available in the more popular multilevel models. The model-based estimator requires that the correct working correlation matrix be specified. The second estimator is commonly referred to as robust (or empirical, Huber/White sandwich, model-free, agnostic), meaning that it is robust to misspecification of the working correlation matrix. Moreover, Cheong et al.
(2001) showed, via simulation studies, that even when data are naturally organized within clusters, and the analyses do not account for such clusters, in large sample sizes the robust estimation yields correct standard errors (see also Raudenbush & Bryk, 2001). Nevertheless, correct specification of the correlation structure augments efficiency (especially with smaller total sample sizes) and is desirable with incomplete data. Furthermore, the model-based estimator may be appropriate with small total sample sizes and/or few clusters. Second, because GEEs accommodate the GLM to correlated data, this method can be applied to a broad range of outcome variables often encountered in empirical applications (e.g., continuous, ordinal, polychotomous, dichotomous, and count). Third, GEEs imply no strict distribution assumptions. Instead, they assume the variance of the outcome variable to be expressed as a known function of the expectation. Fourth, GEEs can be applied to incomplete data, given that individual observations are MCAR. Fifth, GEEs can be implemented with several commonly available software packages (e.g., SAS PROC GENMOD, Stata, S-Plus and R, SUDAAN, HLM) (see Horton & Lipsitz, 1999 for a review of software). Finally, in longitudinal applications, GEEs can account for unequal interwave spacing, so that unbalanced data can easily be analyzed.
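The contrast between the model-based (naïve) and the robust (sandwich) standard errors can be illustrated in the simplest special case: an identity link with an independence working correlation, for which the GEE point estimates coincide with ordinary least squares. The sketch below is written under those assumptions, uses a tiny made-up data set in which each cluster repeats the same observation five times, and applies no small-sample correction (statistical packages may do so). All names are illustrative.

```python
import math


def fit_independence_gee(clusters):
    """Fit y = b0 + b1 * x with an independence working correlation.

    clusters: list of (x_values, y_values) pairs, one pair per cluster.
    Returns (b0, b1, naive_se_b1, robust_se_b1).
    """
    # Pooled sums for the normal equations (X'X) b = X'y.
    n = sx = sxx = sy = sxy = 0.0
    for xs, ys in clusters:
        for x, y in zip(xs, ys):
            n += 1.0
            sx += x
            sxx += x * x
            sy += y
            sxy += x * y
    det = n * sxx - sx * sx
    # "Bread": inverse of the 2 x 2 matrix X'X.
    bread = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = bread[0][0] * sy + bread[0][1] * sxy
    b1 = bread[1][0] * sy + bread[1][1] * sxy

    # Naive variance: residual variance times the bread.
    rss = sum((y - b0 - b1 * x) ** 2
              for xs, ys in clusters for x, y in zip(xs, ys))
    sigma2 = rss / (n - 2.0)
    naive_var_b1 = sigma2 * bread[1][1]

    # "Meat": outer products of the per-cluster score sums, so that
    # residuals within the same cluster are allowed to correlate.
    m00 = m01 = m11 = 0.0
    for xs, ys in clusters:
        s0 = sum(y - b0 - b1 * x for x, y in zip(xs, ys))
        s1 = sum(x * (y - b0 - b1 * x) for x, y in zip(xs, ys))
        m00 += s0 * s0
        m01 += s0 * s1
        m11 += s1 * s1
    # Slope element of bread * meat * bread.
    v0, v1 = bread[1][0], bread[1][1]
    robust_var_b1 = v0 * v0 * m00 + 2.0 * v0 * v1 * m01 + v1 * v1 * m11

    return b0, b1, math.sqrt(naive_var_b1), math.sqrt(robust_var_b1)


# Made-up data: 4 clusters, a cluster-level (time-invariant) covariate,
# and 5 perfectly correlated repeated observations per cluster.
toy = [([x] * 5, [y] * 5) for x, y in [(0, 0), (1, 2), (2, 1), (3, 3)]]
b0, b1, naive_se, robust_se = fit_independence_gee(toy)
print("slope:", round(b1, 3))
print("naive SE:", round(naive_se, 4), "robust SE:", round(robust_se, 4))
```

In this toy example the naïve standard error of the slope is smaller than the robust one, because the naïve estimator counts the 20 repeated rows as if they were independent, whereas the robust estimator aggregates the residuals within each of the 4 clusters.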

Like any other statistical methodology, GEEs are bound by some limitations. First, the technique is asymptotic, hence requiring large total sample sizes for unbiased and consistent estimation (as usual, what represents a large sample size is not unanimously agreed upon). Second, in applications to empirical data, sensitivity analyses of different specifications of the intracluster correlation matrix are advised. As mentioned, the form of the dependency correlation matrix does not affect parameter estimates as long as the mean model is correctly specified. Last, the technique assumes missing completely at random data, because GEEs do not specify the full conditional likelihood. However, GEEs do not yield a great deal of bias with MAR data (Fitzmaurice, Laird, & Rotnitzky, 1993).

Much work is currently underway to assess GEEs’ behavior in the presence of different kinds of missing data (Diggle et al., 2002; Zorn, 2001). In particular, Robins, Rotnitzky, and Zhao (1995) propose an extension of GEEs for data that are MAR (several approaches to identify the type of dropout have been suggested, e.g., Diggle et al., 2002; Qu & Song, 2002). This consists in creating weights for each observation, at each time point, on the basis of previous observations and informative covariates. The weights represent the inverse of the probability of still being under observation, that is, of not having dropped out (see also Heyting, Tolboom, & Essers, 1992; Robins & Rotnitzky, 1992), and are then incorporated into the GEE model to create weighted estimating equations, which aim at providing consistent and unbiased estimators. This approach assumes that the parametric model for dropout is correctly specified. For example, Carlin, Wolfe, Coffey, and Patton (1999) applied a logistic regression model to compute the dropout weights on the basis of the most relevant covariates.
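The weighting step can be sketched as follows, assuming the usual inverse probability weighting convention in which an observation at a given wave is weighted by the inverse of the cumulative probability of still being observed at that wave. The per-wave probabilities are hypothetical inputs here; in practice they would be predicted by a dropout model such as a logistic regression on previous observations and covariates. The function name is illustrative.

```python
def ipw_weights(stay_probs):
    """Inverse probability weights for one participant under monotone dropout.

    stay_probs[t] is the modelled probability of being observed at wave t
    given that the participant was observed at all previous waves.
    Returns one weight per wave: 1 / (cumulative probability of remaining).
    """
    weights = []
    cumulative = 1.0
    for p in stay_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
        cumulative *= p
        weights.append(1.0 / cumulative)
    return weights


# Hypothetical participant: certain to be seen at wave 1, then less likely
# to remain in the study at each later wave.
print(ipw_weights([1.0, 0.8, 0.5]))
```

Observations from participants who were unlikely to remain in the study receive larger weights, so that they also stand in for similar participants who dropped out; the validity of the weights rests on the dropout model being correctly specified.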
This approach was then generalized by Rotnitzky, Robins, and Scharfstein (1998) to allow for nonignorable nonresponse, where the dropout mechanism is related to the value of the response at the time and after dropout (MAR assumes the dropout mechanism to be related only to responses collected before dropout). Scharfstein, Rotnitzky, and Robins (1999a, 1999b) further expanded this approach by allowing the dropout model to be semiparametric. This extension is particularly meaningful when time to dropout is a known, continuous random variable, dependent on the outcome. In this situation, time to dropout and informative covariates can be used to estimate the dropout weights with a semiparametric proportional hazard model. Another extension, for binary outcomes, was proposed by Fitzmaurice et al. (1993), who discussed a likelihood-based approach using a log-linear representation. This likelihood-based “mixed parameter” model yields likelihood equations for the regression parameters that are analogous to those of GEEs. Assuming that the mean structure is correctly specified, consistent estimates of the regression parameters are obtained with MAR data, even if the intracluster correlation matrix is misspecified.

Time-invariant covariates may require some special care when applying GEEs. Pepe and Anderson (1994) concluded that using longitudinal data to estimate a cross-sectional relationship requires that the “full covariate conditional model” (FCCM) hold. The FCCM assumption states that the expected outcome mean at time t conditional on the covariates assessed at all time points up to t be equal to the expected outcome mean at time t conditional on the covariates assessed at time t only. This assumption holds for time-invariant covariates, but may not hold for time-varying covariates. Moreover, if a feedback process were in play, such that the outcome at a certain time point may influence future covariates’ values, the FCCM assumption would require the inclusion of future covariate values. This issue is not limited to GEEs but important to all likelihood-based methods (Diggle et al., 2002). A solution proposed by Pepe and Anderson (1994) consists of applying a GEE model with a working independence correlation specification.

An Application: Selectivity Effects in a Longitudinal Study on Very Old Individuals

It is well documented that selectivity effects in longitudinal studies of old participants represent a potential threat to internal and external validity (Cook & Campbell, 1979). Survivors who continue to participate in adult longitudinal (especially gerontological) studies are often younger, healthier, better educated, of advantaged socioeconomic background, and exhibiting higher cognitive performance (Krauss, 1980; Lindenberger, Singer, & Baltes, 2002; McArdle, 1994; Nesselroade, 1988; Pearson, 1903; Rubin, 1976; Schaie, 1973). Hence, in longitudinal analyses of older samples, generalizability of results hinges heavily upon the limitations of selectivity effects. Several methods have been advanced to assess covariates’ effects on the probability of dropout (e.g., Goodman & Blum, 1996; Guttman & Olkin, 1989; Lindenberger et al., 2002; Riegel, Riegel, & Meyer, 1968; Shadish, Hu, Glaser, Kownacki, & Wong, 1998).
Because (1) being a participant at one wave is usually contingent upon participation at previous waves (if we assume monotonic study attrition, that is, missing values are dropouts rather than intermittent values); (2) the specification of the resulting intraclass correlation matrix of the participation variable is most likely unknown to the analyst; (3) we are mainly concerned with the concurrent, cross-sectional relationships between study participation and antecedents, so that the longitudinal information is not regarded, except for adjustment to the intraclass correlation; and (4) being a participant at a given wave is most easily represented by a dichotomous variable (e.g., 0 = participant, 1 = not participant), the selectivity effects of covariates in a longitudinal study can be assessed with a GEE model using a logit link function (Little & Schenker, 1995). Often, in longitudinal studies, researchers want to reach general conclusions about the dropout process at work across all waves of measurement. Instead of analyzing covariates’ effects on each wave separately, we might thus opt for a more efficient strategy by pooling information across all time periods and treating it as cross-sectional, nevertheless taking into account the correlation among repeated measurements for the same unit of observation.

In the illustration, we compare a traditional logistic regression model (Walker & Duncan, 1967) to a GEE model with logit link function and binomial variance function. The two approaches, which differ in their consideration of the intracluster correlation, are identical if the working correlation matrix of the GEE model is specified as independent and if the nonrobust (model-based) standard error estimator is chosen. Moreover, we also compare the more popular random effects logistic regression model to the GEE models. Again, the former model does not account for the time dependency of the outcome, but allows for the estimation of the effects of (latent) unmeasured covariates via random effects of the intercept. Finally, we compare GEEs with logit link functions and binomial variance function with different specifications of the intraclass correlation matrix. In all models, the dependent variable is coded as 1 for dropout, 0 for participation.

Participants

The data come from the Swiss Interdisciplinary Longitudinal Study on the Oldest Old (SWILSO-O) (Lalive d’Epinay, Pin, & Spini, 2001). SWILSO-O is a multicohort, longitudinal study on the psychological, health, social, and sociological situation as well as trajectory of a sample of octogenarians. For illustration purposes, data from the first five waves of the first cohort are analyzed here (at Wave 1, N = 340, mean age = 81.9, age range = 79.2 to 84.4; at Wave 5, N = 172, or 50.6% of the Wave 1 sample, mean age = 86.6, age range = 84.3 to 89.2). At Wave 1, 168 participants were female (sex = 1; males sex = 0). One hundred seventy-three participants resided in the urban context of Geneva (canton = 1), while the remaining 167 lived in the more rural context of the canton of Valais (canton = 0; also in the French part of Switzerland).

Variables

Participants were assessed on a rich battery of psychological, health, social, and sociological variables administered individually by trained testers during face-to-face interviews. The analyses presented here focused on usual correlates of selectivity. We considered age, sex, living context (urban vs. semirural), living arrangement, depressive symptoms, physical troubles, and socioeconomic status (assessed as a dichotomous composite of education, income while professionally active, retirement funds, and other revenues, for example, private investments).
The living arrangement predictor was coded 1 for living alone (132 at Time 1) versus 0 for living with others (203 at Time 1; 5 participants did not answer). Depressive symptoms were assessed with the Wang Self-Assessing Depression Scale (SADS) (R. Wang, Treul, & Alverno, 1975), which ranges from 0 to 10 symptoms. At Time 1, the average number of depressive symptoms was 2.0 (SD = 2.1). The physical troubles composite represents the total of items assessing the presence of pain in six bodily parts (lower limbs, upper limbs, head, back, stomach, genitals, chest), difficulties in three bodily functions (urinary, cardiac, respiratory), and presence of general, recurrent fever. The average number of reported physical troubles at Time 1 was 0.9 (SD = 1.3). At Time 1, 122 were of average or superior SES level (SES = 1), while 218 were classified at inferior SES status (SES = 0).

Results

Table 1 contains the parameter estimates and standard errors (SEs) of a traditional and a random effects logistic regression model (both of which disregard the dependence structure of the repeated measurements) and five GEE models, specified with a common logit link function with binomial variance distribution, but with different working correlation matrices. Given that the traditional and random effects logistic regression models typically do not provide robust estimation of the SEs, only naïve SEs are presented for these models. On the other hand, both robust and naïve SE estimations are available for GEEs. Because one of the main advantages of GEEs is the computation of robust SEs, and to simplify the presentation, we opted to display only the robust estimates of SEs for the GEE models.

TABLE 1
Parameter Estimates and Standard Errors for Logistic Regression Model, Random Effects Logistic Regression Model, and for GEE Models

Entries are estimate (SE). Log Regr and RE Log Regr: naïve SE. GEE (logit link with binomial variance): robust SE, by working correlation specification (Indep, Exch, Unstr, Nonstat, AR-1).

Predictor  Log Regr          RE Log Regr       Indep             Exch              Unstr             Nonstat           AR-1
Age        0.088 (0.046)     0.197 (0.077)     0.088 (0.79)      0.232 (0.076)     0.155 (0.073)     0.071 (0.082)     0.387 (0.070)
Sex        −0.366 (0.146)    −0.333 (0.259)    −0.366 (0.237)    −0.280 (0.264)    −0.317 (0.210)    −0.390 (0.242)    −0.245 (0.257)
SES        −0.833 (0.149)    −0.843 (0.259)    −0.833 (0.260)    −0.852 (0.284)    −0.575 (0.213)    −0.648 (0.260)    −0.833 (0.275)
Canton     0.712 (0.134)     0.732 (0.237)     0.712 (0.230)     0.732 (0.258)     0.507 (0.194)     0.713 (0.231)     0.767 (0.255)
Alone      −0.187 (0.146)    −0.165 (0.261)    −0.187 (0.246)    −0.260 (0.275)    −0.286 (0.216)    −0.252 (0.250)    −0.196 (0.270)
SADS       0.076 (0.034)     0.088 (0.061)     0.076 (0.061)     0.084 (0.068)     0.074 (0.050)     0.076 (0.061)     0.069 (0.067)
Phys       0.174 (0.054)     0.176 (0.100)     0.174 (0.096)     0.172 (0.111)     0.158 (0.076)     0.210 (0.093)     0.162 (0.108)
Wave       0.678 (0.052)     0.651 (0.054)     0.678 (0.038)     0.625 (0.038)     0.577 (0.046)     0.800 (0.044)     0.564 (0.040)
Constant   −10.786 (3.794)   −19.789 (6.407)   −10.786 (6.534)   −22.440 (6.255)   −16.239 (6.043)   −10.026 (6.763)   −35.260 (5.816)
RE Cons    —                 2.771 (0.332)     —                 —                 —                 —                 —

Notes. Bold parameter estimates refer to significant predictors (alpha = 5%). Log Regr = logistic regression; RE Log Regr = random effects logistic regression; SE = standard error; Indep = independent; Exch = exchangeable; Unstr = unstructured; Nonstat = Nonstationary with band 1; AR-1 = Autoregressive with lag 1 wave; Sex (1 = female, 0 = male); SES (1 = high, 0 = average or low); Canton (1 = Geneva, 0 = Valais); Alone (1 = alone, 0 = not alone); SADS = Wang Self-Assessing Depression Scale; Phys = Number of physical troubles; RE Cons = random effect of constant (i.e., intercept).

Theoretically, and asymptotically, the specification of the dependence structure should not affect parameter estimates, but in research situations empirical factors (e.g., nature of dependence, number of clusters, patterns of data incompleteness, etc.) are often
differentially influential across different working correlations. Hence, the different estimated working correlation matrices are presented in Table 2.

The traditional logistic regression model found that all time-invariant covariates except for living arrangements were significant predictors of dropout (at a 5% decision criterion). Age, which was entered as a time-invariant covariate, was not significant in this first model. The random effects logistic regression identified age, SES, canton, and wave as predictors of dropout. Moreover, both fixed and random effects of the intercept were significant. The independence GEE model identified the same significant predictors of dropout as the random effects logistic regression model, except for age. In fact, these three models posit the intracluster dependence to be zero. The remaining GEE models, as expected, found that the time-varying covariate age was significant. However, the unstructured and the nonstationary GEE models also identified physical troubles as a meaningful predictor of study participation. The strong effects of low or median SES, urban context (i.e., canton), and wave appeared in all models.

Although the GEE2 model is better suited if the interest lies in the substantive interpretation of the correlation structure, examination of the estimated intracluster dependency matrix is informative about the feasibility of its specification, and hence on the validity of the GEEs parameter estimates (Johnston, 1996; Y.-G. Wang & Carey, 2003). The unstructured correlation matrix represents the most
general specification. Indeed, all its coefficients (here (5 × 4) / 2 = 10) are freely estimated, without any constraints. Because of its generality, the unstructured specification should recover any other specification. The remaining correlation specifications, in fact, impose more parameter constraints. The most parsimonious specification is the exchangeable, which estimates one correlation coefficient for all possible intervals between any two waves. Hence, the exchangeable specification, although popular, is generally not adequate for longitudinal designs. In our application, the marginal correlation between any two waves is estimated, according to the exchangeable specification, at 0.50.

The unstructured matrix shows the presence of a highly unusual process in place between Wave 1 and any other wave of the study. The marginal correlations between Wave 1 and the subsequent waves are close to zero (between −0.02 and −0.06). From Wave 2 on, however, the dropout process seems to be more homogeneous and follows a sensible simplex pattern (whereby, as the interwave period is larger, the marginal correlation between dropouts is weaker). A similar pattern is visible in the nonstationary 1 correlation matrix. There, the marginal correlation between Waves 1 and 2 is only −0.07, whereas the remaining indices are much stronger and positive. The autoregressive 1 and the exchangeable specifications must accommodate the unusual dropout process occurring immediately after Wave 1 with many fewer parameters. Hence, the distinctive dropout process observed before and after the first wave is “washed out” in the autoregressive and the exchangeable specifications. This wave effect is confirmed by its significant regression weight in all models.

TABLE 2
Working Correlation Estimates for GEE with Logit Link Function and Binomial Variance Distribution, by Specification of Working Correlation Matrix

Exchangeable: r_ts (t ≠ s) = 0.50

Unstructured
Wave    1       2       3       4       5
1       1.00
2       −0.06   1.00
3       −0.06   1.00    1.00
4       −0.04   0.81    1.00    1.00
5       −0.02   0.60    0.75    0.85    1.00

Nonstationary 1
Wave    1       2       3       4       5
1       1.00
2       −0.07   1.00
3       —       1.00    1.00
4       —       —       0.91    1.00
5       —       —       —       0.76    1.00

Autoregressive 1
Wave    1       2       3       4       5
1       1.00
2       0.73    1.00
3       0.53    0.73    1.00
4       0.39    0.53    0.73    1.00
5       0.29    0.39    0.53    0.73    1.00

Note. r_ts (t ≠ s) denotes the correlation between any two distinct waves t and s, that is, any off-diagonal element.

Discussion

The goal of this article is to discuss some properties, practical considerations, and an application of GEEs, a method for the analysis of clustered data.
Although GEEs are widely applied in biological, pharmacological, and closely related disciplines, their application in educational and social sciences remains relatively scarce. However, data naturally organized within hierarchies or from longitudinal and panel studies are very frequent in educational and social sciences. For such data, the application of traditional regression models is not adequate; in particular, the statistical dependence arising from the similarity of observations organized within the same cluster, or stemming from the same participant assessed repeatedly, necessitates analyses that do not assume such dependence to be zero. Whereas results from traditional regression models are based on the assumption that the units of analysis are independent, GEEs represent an extension of (generalized) linear models to correlated data (i.e., where the units of analysis need not be independent). Hence, GEEs can potentially disentangle effects caused by the covariates of interest from the effects external to the unit of analysis but inherent to the dependency process (e.g., the effect of teacher’s assignment practice vs. children’s intellectual abilities on children’s school performance). Although an array of methodological approaches account for the intraclass correlation, especially when the dependent variable is normally distributed, GEEs offer the additional advantage of not requiring the correct specification of the correlation matrix in order to reach unbiased statistical conclusions about the covariates’ effects, provided that the robust, model-free estimation of standard errors is applied.

The example centered on assessing predictors of dropout in a longitudinal study of a very old Swiss sample assessed five times. The application of a traditional logistic regression model yielded findings in disagreement with all logit GEE models considered. As expected, the standard errors of the logistic model for time-invariant covariates, when dependency is not accounted for, are underestimated, augmenting the likelihood of rejecting the null hypothesis (that a predictor’s coefficient is equal to zero). This occurred for all time-invariant predictors and changed the statistical conclusion for sex and presence of depressive symptoms. On the other hand, standard errors of time-varying covariates were overestimated by the traditional logistic regression model. This was the case for age and wave and resulted in a different statistical outcome for age. In sum, of the models considered, the logistic regression model concluded with the highest number of statistically significant predictors of dropout, given that most predictors were time-invariant.

In this application, we favor the logit GEE model with unstructured dependence specification for three reasons. First, the unstructured GEE model reflected that the dropout process between the first and all subsequent waves was different from the successive dropout processes. After Wave 1, 12.35% of living participants refused to participate again (compared to rates ranging from 2.99% to 5.74% in subsequent waves). This difference was also partially confirmed by the GEE model with nonstationary 1 dependence working specification.
The remaining GEE models (with independence, autoregressive 1, and exchangeable dependence specification) were not receptive to the unusual dropout process between Wave 1 and all subsequent waves. Second, the unstructured GEE model confirmed in our sample well-known dropout antecedents in aging studies. Older ages, lower physical health status, and lower SES are usually the strongest predictors of dropout in gerontological research (Krauss, 1980; Schaie, 1973). Third, the unstructured GEE model agrees with the conclusions of the independence GEE model, which is most robust to misspecification of the correlation structure in the presence of time-varying covariates (Pepe & Anderson, 1994). The only disagreement between the two models concerns the time-varying predictor age and the time-invariant predictor physical health. However, on substantive grounds we have strong reasons for considering these two predictors as important in predicting study participation, even in an initially relatively healthy and generally age-homogeneous sample such as ours.

Two kinds of reasons may be invoked to explain why the attrition process between Waves 1 and 2 was different from attrition processes in place between any two subsequent waves. A main explanation may be related to a general attrition process in longitudinal studies. In a meta-analysis covering 85 longitudinal studies in the field of substance abuse research, Hansen, Tobler, and Graham (1990) showed that the percentage of participants’ retention quickly drops in the beginning of the studies and then declines slowly after the participants agree to take part in the longitudinal process (i.e., from Wave 2 on). Among the many causes responsible for higher Wave 1–Wave 2 dropout rates in longitudinal studies is the presence of transient participants who, after having answered the first questionnaire or undergone the first assessment protocol, may consider the measurements too intrusive. In SWILSO-O some questions that might be perceived as too intrusive concern sensitive health-related issues and participants’ financial situation. Participants may also be disappointed by the Wave 1 measurement instruments in terms of content or perceived impact on participant’s life.

A subsidiary reason for the different dropout processes recovered from the analyses may also be invoked. Because at the inception SWILSO-O was financed and planned as a cross-sectional study (as were many other longitudinal studies), participants were not told that they would be contacted again. The fact that contact was made with all living participants about 12 months after the first wave probably also caused some participants to distrust the researchers who initially reassured them of data confidentiality. Of course, confidentiality was not broken, but receiving a letter asking if participants wanted to participate in subsequent interviews and announcing a future telephone contact was likely perceived as paradoxical by some participants. Even if participants’ reactions were generally positive, we cannot totally exclude this possibility a posteriori.

In sum, a general effect present in most longitudinal studies and a related, subsidiary effect linked to the history of SWILSO-O may explain why the attrition process between Waves 1 and 2 was different from attrition processes in place between any two subsequent interwave periods. Utilizing logit GEE models in this context most likely represents progress. Nevertheless, the choice of the correlation matrix, as we have demonstrated, is not superfluous.
Global, synthetic conclusions about the validity of longitudinal results concerning this very old population would have been biased had we adopted a dropout analysis that discounted, or badly accounted for, participation carryover effects. More precisely, logit GEE models with badly chosen specifications of the correlation matrix (here the exchangeable and autoregressive), given the substantive nature of the problem at hand, do not yield the same results as logit GEE models with more realistic specifications of the dependence structure (here the unstructured and nonstationary 1). Further studies concerning the choice of the dependency structure under common conditions (e.g., a small number of clusters and/or few observations per cluster, data not missing completely at random) will be invaluable to applied researchers. Also, other dependence specifications might be considered in the future. For instance, in our example it seemed that the dependence structure between Wave 1 and all successive waves was quite different from the dependence structures in place between any other time periods. Perhaps a more parsimonious but still precise description of the global correlational structure would be a composite of an exchangeable specification (between Wave 1 and all subsequent waves) and an autoregressive 1 or 2 for the remaining correlations.

Conclusions

Researchers applying GEEs have a number of questions to answer before plunging into the analytic process. The link function with its corresponding variance distribution, the mean model, and the correlation structure must be chosen.
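The three choices just named can be made concrete in a few lines. The sketch below is not the authors' analysis. It assumes a logit link with the Bernoulli variance function μ(1 − μ), a hypothetical mean model with made-up coefficients, and a composite working correlation of the kind suggested in the Discussion (one common correlation between Wave 1 and every later wave, autoregressive 1 among the later waves). It then forms the working covariance V = φ A^(1/2) R A^(1/2) for a single participant, following the usual GEE formulation. All function names and numeric values are illustrative assumptions.

```python
import math


def inverse_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def composite_correlation(n_waves, rho_first, rho_ar):
    """Hypothetical composite working correlation: one common correlation
    between Wave 1 and every later wave, AR(1) among the later waves."""
    R = []
    for t in range(n_waves):
        row = []
        for s in range(n_waves):
            if t == s:
                row.append(1.0)
            elif t == 0 or s == 0:
                row.append(rho_first)
            else:
                row.append(rho_ar ** abs(t - s))
        R.append(row)
    return R


def working_covariance(eta, R, phi=1.0):
    """V = phi * A^(1/2) R A^(1/2), with A = diag(mu * (1 - mu))."""
    mu = [inverse_logit(e) for e in eta]
    sd = [math.sqrt(m * (1.0 - m)) for m in mu]
    n = len(eta)
    return [[phi * sd[t] * R[t][s] * sd[s] for s in range(n)]
            for t in range(n)]


# Hypothetical mean model: logit P(dropout) = b0 + b1 * wave (made-up values).
b0, b1 = -2.0, -0.3
eta = [b0 + b1 * wave for wave in range(1, 6)]

# Illustrative correlation values, not estimates from the study.
R = composite_correlation(5, rho_first=-0.04, rho_ar=0.6)
V = working_covariance(eta, R)
for row in V:
    print(["%.4f" % v for v in row])
```

A composite matrix of this kind uses only two correlation parameters. It is not guaranteed to be positive definite for arbitrary values, so that would need to be checked before it is used as a working correlation.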

Even though several guidelines about these choices are now available in the literature, we join Johnston (1996), Zorn (2001), and Wang and Carey (2003) in advising sensitivity analyses of the correlational structure. From a practical viewpoint, many software packages implement GEE analyses with a wide variety of options. Hence, the task of analyzing data with GEEs has been greatly facilitated.

In sum, GEEs are a widely available, relatively easy class of statistical models accounting for correlated data. The main advantage of GEEs resides in the robust estimation of parameters’ standard errors, even when the correlation structure is misspecified. Two very common research settings in educational and social sciences where GEEs can be advantageously applied are longitudinal designs (Liang & Zeger, 1986) and cross-sectional, hierarchical settings (Raudenbush, 1995). Furthermore, GEEs are applicable to count, dichotomous, categorical, ordinal, and normal data. We hope that educational and social scientists will consider GEEs in empirical situations where more traditional regression-type models are not adequate.

Appendix: Suggested Literature

For readers interested in deepening their understanding of GEEs, we recommend the original articles by Liang and Zeger (1986) and Zeger and Liang (1986), and the text by Diggle et al. (2002). For an excellent, rather comprehensive introduction to GEEs, Zorn (2001) provides the main features of the model, both from a statistical and an applied perspective. A comprehensive book on GEEs is that of Hardin and Hilbe (2002). Dunlop (1994) elucidates, in a rather statistically oriented language, the extension of traditional regression methods to longitudinal data via GLMs and GEEs. For a comparison of population-averaged and cluster-specific approaches to the analyses of correlated data see Hu et al. (1998), Pendergast et al. (1996), and Neuhaus et al. (1991).
Some applications of GEEs to educational and social science data appear in Raudenbush (1995), Cheong et al. (2001), Norton et al. (1996), and Miller and Guo (2000). Horton and Lipsitz (1999) review several software packages that implement GEE analyses.

References

Bryk, A. S., & Raudenbush, S. W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147–158.
Carlin, J. B., Wolfe, R., Coffey, C., & Patton, G. C. (1999). Analysis of binary outcomes in longitudinal studies using weighted estimating equations and discrete-time survival methods: Prevalence and incidence of smoking in an adolescent cohort. Statistics in Medicine, 18, 2655–2679.
Cheong, Y. F., Fotiu, R. P., & Raudenbush, S. W. (2001). Efficiency and robustness of alternative estimators for two- and three-level models: The case of NAEP. Journal of Educational and Behavioral Statistics, 26, 411–429.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston: Houghton Mifflin.
Diggle, P. J., Heagerty, P., Liang, K.-Y., & Zeger, S. L. (2002). Longitudinal data analysis (2nd ed.). Oxford, UK: Oxford University Press.
Dunlop, D. D. (1994). Regression for longitudinal data: A bridge from least squares regression. The American Statistician, 48, 299–303.
Fitzmaurice, G. M., Laird, N. M., & Rotnitzky, A. G. (1993). Regression models for discrete longitudinal responses. Statistical Science, 8, 284–309.
Goldstein, H. (1995). Multilevel statistical models (2nd ed.). London, UK: Edward Arnold.
Goodman, J. S., & Blum, T. C. (1996). Assessing the non-random sampling effects of subject attrition in longitudinal research. Journal of Management, 22, 627–652.
Guttman, I., & Olkin, I. (1989). Retention or attrition models. Journal of Educational Statistics, 14, 1–20.
Hansen, W. B., Tobler, N. C., & Graham, J. W. (1990). Attrition in substance abuse prevention research: A meta-analysis of 85 longitudinally followed cohorts. Evaluation Review, 14, 677–685.
Hardin, J. W., & Hilbe, J. M. (2002). Generalized estimating equations. London, UK: Chapman & Hall/CRC Press.
Heagerty, P. J. (1999). Marginally specified logistic-normal models for longitudinal binary data. Biometrics, 55, 688–698.
Heagerty, P. J., & Zeger, S. L. (2000). Marginalized multilevel models and likelihood inference [with discussion]. Statistical Science, 15, 1–26.
Heyting, A., Tolboom, J. T. B. M., & Essers, J. G. A. (1992). Statistical handling of dropouts in longitudinal clinical trials. Statistics in Medicine, 11, 2043–2062.
Horton, N. J., & Lipsitz, S. R. (1999). Review of software to fit Generalized Estimating Equation regression models. The American Statistician, 53, 160–169.
Hu, F. B., Goldberg, J., Hedeker, D., Flay, B. R., & Pentz, M. A. (1998). Comparison of population-averaged and subject-specific approaches for analyzing repeated binary outcomes. American Journal of Epidemiology, 147, 694–703.
Johnston, G. (1996). Repeated measures analysis with discrete data using the SAS system. Cary, NC: SAS Institute.
Krauss, I. K. (1980). Between- and within-group comparisons in aging research. In L. W. Poon (Ed.), Aging in the 1980s (pp. 542–551). Washington, DC: American Psychological Association.
Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963–974.
Lalive d’Epinay, C., Pin, S., & Spini, D. (2001). Présentation de SWILSO-O, une étude longitudinale suisse sur le grand âge: L’exemple de la dynamique de la santé fonctionnelle. L’Année Gérontologique, 15, 78–96.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using Generalized Linear Models. Biometrika, 73, 13–22.
Liang, K.-Y., Zeger, S. L., & Qaqish, B. (1992). Multivariate regression analyses for categorical data. Journal of the Royal Statistical Society, Series B, 54, 3–40.
Lindenberger, U., Singer, T., & Baltes, P. B. (2002). Longitudinal selectivity in aging populations: Separating mortality-associated versus experimental components in the Berlin Aging Study. Journals of Gerontology: Psychological Sciences, 57B, P474–P482.
Lipsitz, S. H., Fitzmaurice, G. M., Orav, E. J., & Laird, N. M. (1994). Performance of Generalized Estimating Equations in practical situations. Biometrics, 50, 270–278.
Little, R. J. A., & Schenker, N. (1995). Missing data. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling for the social and behavioral sciences. New York: Plenum.
McArdle, J. J. (1994). Structural factor analysis experiments with incomplete data. Multivariate Behavioral Research, 29, 409–454.
McArdle, J. J., & Hamagami, F. (1996). Multilevel models from a multiple group structural equation perspective. In G. A. Marcoulides & R. E. Schumaker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 89–124). Mahwah, NJ: Erlbaum.
Miller, B., & Guo, S. (2000). Social support for spouse caregivers of persons with dementia. Journals of Gerontology: Psychological Sciences and Social Sciences, 55B, S163–S172.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135, 370–384.
Nesselroade, J. R. (1988). Sampling and generalizability: Adult development and aging research issues examined within the general methodological framework of selection. In K. W. Schaie, R. T. Campbell, W. Meredith, & S. C. Rawlings (Eds.), Methodological issues in aging research. New York: Springer.
Neuhaus, J. M., Kalbfleisch, J. D., & Hauck, W. W. (1991). A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. International Statistical Review, 59, 25–35.
Norton, E. C., Bieler, G. S., Ennett, S. T., & Zarkin, G. A. (1996). Analysis of prevention program effectiveness with clustered data using Generalized Estimating Equations. Journal of Consulting and Clinical Psychology, 64, 919–926.
Pearson, K. (1903). On the influence of natural selection on the variability and correlation of organs. Philosophical Transactions of the Royal Society of London, Series A, 200, 1–66.
Pendergast, J. F., Gange, S. J., Newton, M. A., Lindstrom, M. J., Palta, M., & Fisher, M. R. (1996). A survey of methods for analyzing clustered binary response data. International Statistical Review, 64, 89–118.
Pepe, M. S., & Anderson, G. L. (1994). A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Communications in Statistics, 23, 939–951.
Prentice, R. L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033–1048.
Qu, A., Lindsay, B. G., & Li, B. (2000). Improving generalised estimating equations using quadratic inference functions. Biometrika, 87, 823–836.
Qu, A., & Song, P. X.-K. (2002). Testing ignorable missingness in estimating equation approaches for longitudinal data. Biometrika, 89, 841–850.
Raudenbush, S. W. (1995). Reexamining, reaffirming, and improving application of hierarchical models. Journal of Educational and Behavioral Statistics, 20, 210–220.
Raudenbush, S. W., & Bryk, A. S. (2001). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks, CA: Sage.
Riegel, K. F., Riegel, R. M., & Meyer, G. (1968). A study of the dropout rates in longitudinal research on aging and the prediction of death. In B. L. Neugarten (Ed.), Middle age and aging (pp. 563–570). Chicago, IL: The University of Chicago Press.
Robins, J. M., & Rotnitzky, A. G. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. In N. Jewell, K. Dietz, & V. Farewell (Eds.), AIDS epidemiology: Methodological issues (pp. 297–331). Boston: Birkhäuser.
Robins, J. M., Rotnitzky, A. G., & Zhao, L. P. (1995). Analysis of semi-parametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106–121.
Rotnitzky, A. G., Robins, J. M., & Scharfstein, D. O. (1998). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association, 93, 1321–1339.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Schaie, K. W. (1973). Methodological problems in descriptive developmental research on adulthood and aging. In J. R. Nesselroade & H. W. Reese (Eds.), Life-span developmental psychology: Methodological issues (pp. 253–280). New York: Academic Press.
Scharfstein, D. O., Rotnitzky, A. G., & Robins, J. M. (1999a). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94, 1096–1120.
Scharfstein, D. O., Rotnitzky, A. G., & Robins, J. M. (1999b). Rejoinder. Journal of the American Statistical Association, 94, 1135–1146.
Shadish, W. R., Hu, X., Glaser, R. R., Kownacki, R., & Wong, S. (1998). A method for exploring the effects of attrition in randomized experiments with dichotomous outcomes. Psychological Methods, 3, 3–22.
Walker, S. H., & Duncan, D. B. (1967). Estimation of the probability of an event as a function of several independent variables. Biometrika, 54, 167–179.
Wang, R., Treul, S., & Alverno, L. (1975). A brief self-assessing scale. Journal of Clinical Pharmacology, 15, 163–167.
Wang, Y.-G., & Carey, V. (2003). Working correlation structure misspecification, estimation and covariate design: Implications for generalised estimating equations performance. Biometrika, 90, 29–41.
Ware, J. H. (1985). Linear models for the analysis of several measurements in longitudinal studies. The American Statistician, 39, 95–101.
Ware, J. H., Lipsitz, S. H., & Speizer, F. E. (1988). Issues in the analysis of repeated categorical outcomes. Statistics in Medicine, 7, 95–108.
Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method. Biometrika, 61, 439–447.
Zeger, S. L., & Liang, K.-Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42, 121–130.
Zhao, L. P., & Prentice, R. L. (1990). Correlated binary regression using a quadratic exponential model. Biometrika, 77, 642–648.
Zorn, C. J. W. (2001). Generalized Estimating Equation models for correlated data: A review with applications. American Journal of Political Science, 45, 470–490.

Authors

PAOLO GHISLETTA is Assistant Research Professor at the Center for Interdisciplinary Gerontology and at the Faculty of Psychology and Educational Sciences of the University of Geneva, Switzerland, Route de Mon-Idée 59, 1226 Thônex, Geneva, Switzerland. His main research interests are statistical applications for the analysis of change (especially structural equation and multilevel modeling), lifespan development, cognitive and sensory aging, and resource management in elderly individuals.

DARIO SPINI is Associate Professor at the Faculty of Social and Political Sciences and at the Center for Life Course and Life Style Studies of the University of Lausanne, Bâtiment Provence, University of Lausanne, 1015 Lausanne, Switzerland. His main research interests are developmental social psychology, political psychology, individual and collective vulnerability, social psychological regulations, and applications of multivariate and multilevel statistical techniques to longitudinal and cross-cultural data.

Manuscript received May 2, 2003
Revision received October 15, 2003
Accepted November 13, 2003
