Weighted Generalized Estimating Functions for Longitudinal Response and Covariate Data which are Missing at Random

Baojiang Chen 1, Grace Y. Yi ∗ 2 and Richard J. Cook 2

1 Department of Biostatistics, University of Washington, Seattle, Washington, US 98195

2 Department of Statistics and Actuarial Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1



∗ Corresponding author: Grace Y. Yi, phone: 519-888-4567 x35110; fax: 519-746-1875; email: [email protected]

Weighted Generalized Estimating Functions for Longitudinal Response and Covariate Data which are Missing at Random

Abstract

Longitudinal studies often feature incomplete response and covariate data. It is well known that biases can arise from naive analyses of available data, but the precise impact of incomplete data depends on the frequency of missing data and the strength of the association between the response variables and covariates and the missing data indicators. Different factors may influence the availability of response and covariate data at scheduled assessment times, and at any given assessment time the response may be missing, covariate data may be missing, or both response and covariate data may be missing. Here we show that it is important to take the association between the missing data indicators for these two processes into account through joint models. Inverse probability weighted generalized estimating equations offer an appealing approach for doing this and we develop these here for a particular model generating intermittently missing at random data. Empirical studies demonstrate that the consistent estimators arising from the proposed methods have very small empirical biases in moderate samples.

KEYWORDS: generalized estimating equations; inverse probability weights; joint models; longitudinal data; missing covariate; missing response.

1. INTRODUCTION

Incomplete longitudinal data often arise in comparative studies because of difficulties in ascertaining responses at scheduled assessment times, partially completed forms or questionnaires, patients' refusal to undergo complete examinations, or study subjects failing to attend a scheduled clinic visit. Problems ensue if the mechanism leading to the missing data depends on the response or covariates; analyses based only on individuals with complete data can then lead to invalid inferences. Under a missing completely at random (MCAR) mechanism (Little and Rubin 2002), analyses based on generalized estimating equations (GEE) (Liang and Zeger 1986) yield consistent estimates of the regression parameters. However, when the data are missing at random (MAR) or missing not at random (MNAR) (Little and Rubin 2002), analyses based on GEE generally give inconsistent estimates. Robins and Rotnitzky (1995) and Robins, Rotnitzky, and Zhao (1995) developed a class of inverse probability weighted generalized estimating equations (IPWGEE) which can yield consistent estimates when data are MAR. The weights are obtained from models for the missing data process, and these models must be correctly specified for the resulting estimators to be consistent. The literature on methods for missing data has primarily addressed either missing response or missing covariate data (see, e.g., Fitzmaurice, Lipsitz, Molenberghs, and Ibrahim 2001; Horton and Laird 1998; Ibrahim, Lipsitz, and Horton 2001; Lipsitz, Ibrahim, and Zhao 1999; Zhao, Lipsitz, and Lew 1996), but relatively little work has been done when both can be missing. In practice, of course, data are often unavailable for both responses and covariates, and sometimes there is an association between the missingness of the response and that of the covariates. Valid analysis of this type of data therefore requires consideration of this association, since misspecification of the missing data models can yield inconsistent estimates. Chen, Ibrahim, Chen, and Senchaudhuri (2008) provide a careful investigation of likelihood methods for missing response and covariate data via the EM algorithm.

Shardell and Miller (2008) propose a marginal modeling approach to estimate the association between a time-dependent covariate and an outcome in longitudinal studies with missing responses and missing covariates, but they focus on methods that assume the responses are independent. The purpose of this manuscript is to describe a general approach to the construction of estimating equations for parameters of marginal models for longitudinal data with incomplete response and covariate data under a MAR scheme. The approach involves inverse probability weighted estimating equations in which the association between the missingness of the response and the missingness of the covariate is addressed. We also highlight the poor properties of estimators obtained when the correlation between the missingness of the responses and that of the covariates is ignored.

The remainder of this paper is organized as follows. In Section 2, we introduce notation and models. In Section 3, we give the forms of the estimating equations and provide details on estimation and inference. A more efficient approach to estimation is discussed in Section 4 using augmented inverse probability weighted estimating equations. Numerical studies concerning asymptotic bias and relative efficiency of the proposed estimators are given in Section 5, where data arising from the Waterloo Smoking Prevention Project (Cameron et al. 1999) are also analyzed. In Section 6 we explore the asymptotic bias of estimators obtained from inverse probability weighted estimating equations with misspecified missing data models and discuss model diagnostics for the missing data model. Concluding remarks are made in Section 7.

2. NOTATION AND MODEL FORMULATION

Consider a trial involving $n$ individuals in which each individual is to be examined at $J$ assessment times. Let $Y_{ij}$ denote the response for subject $i$ at the $j$th assessment and $X_{ij}$ denote a scalar time-dependent covariate. We consider the case where $Y_{ij}$, $X_{ij}$ or both $Y_{ij}$ and $X_{ij}$ may be missing for one or more assessment times. A covariate vector $Z_{ij}$ contains other time-dependent covariates which are fully observed, $j = 1, \ldots, J$. For convenience we let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{iJ})'$, $X_i = (X_{i1}, X_{i2}, \ldots, X_{iJ})'$ and $Z_i = (Z_{i1}', Z_{i2}', \ldots, Z_{iJ}')'$.

While we restrict attention here to the case in which a scalar time-dependent covariate may be missing, we provide a more general presentation involving incomplete data on multiple covariates in the supplementary material. The conditional mean of $Y_{ij}$ is denoted $\mu_{ij} = E(Y_{ij} \mid X_i, Z_i)$, and we let $\mu_i = (\mu_{i1}, \ldots, \mu_{iJ})'$ denote the full vector of means. We suppose the mean of $Y_{ij}$ depends on the covariate vector for subject $i$ at time $j$ through a model of the form

$$ g(\mu_{ij}) = X_{ij}\,\beta_x + Z_{ij}'\,\beta_z \qquad (1) $$

for $j = 1, \ldots, J$, $i = 1, \ldots, n$, where $g(\cdot)$ is a monotone differentiable link function, and $\beta = (\beta_x, \beta_z')'$ is a $p \times 1$ vector of regression coefficients of interest. The variance is expressed as $v_{ij} = \mathrm{var}(Y_{ij} \mid X_i, Z_i) = \kappa\, h(\mu_{ij})$, where $h(\cdot)$ is the variance function and $\kappa$ is the dispersion parameter. We also make the implicit assumption that $E(Y_{ij} \mid X_i, Z_i) = E(Y_{ij} \mid X_{ij}, Z_{ij})$ and $\mathrm{var}(Y_{ij} \mid X_i, Z_i) = \mathrm{var}(Y_{ij} \mid X_{ij}, Z_{ij})$, as is commonly done (e.g., Pepe and Anderson 1994).

To indicate the availability of data we let $R^y_{ij} = 1$ if $Y_{ij}$ is observed and $R^y_{ij} = 0$ otherwise, and let $R^x_{ij} = 1$ if $X_{ij}$ is observed and $R^x_{ij} = 0$ otherwise. The vectors $R^y_i = (R^y_{i1}, \ldots, R^y_{iJ})'$ and $R^x_i = (R^x_{i1}, \ldots, R^x_{iJ})'$ therefore contain information on the completeness of the response and covariate data over all time points for individual $i$, $i = 1, \ldots, n$. It is helpful to define the histories of the response, covariate and missing data processes, and so we let $\bar{Y}_{ij} = \{Y_{i1}, \ldots, Y_{i,j-1}\}$, $\bar{X}_{ij} = \{X_{i1}, \ldots, X_{i,j-1}\}$, $\bar{Z}_{ij} = \{Z_{i1}, \ldots, Z_{i,j-1}\}$, $\bar{R}^y_{ij} = \{R^y_{i1}, \ldots, R^y_{i,j-1}\}$, and $\bar{R}^x_{ij} = \{R^x_{i1}, \ldots, R^x_{i,j-1}\}$.

Instead of modeling the joint probability $P(R^y_i = r^y_i, R^x_i = r^x_i \mid Y_i, X_i, Z_i)$ for $R^y_i$ and $R^x_i$ directly, since we are focusing on the longitudinal setting we restrict attention to conditional models of the form $P(R^y_{ij} = r^y_{ij}, R^x_{ij} = r^x_{ij} \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i)$, which reflect the dynamic nature of the observation process over time; we can then obtain $P(R^y_i = r^y_i, R^x_i = r^x_i \mid Y_i, X_i, Z_i)$ through

$$ \prod_{j=2}^{J} P(R^y_{ij} = r^y_{ij}, R^x_{ij} = r^x_{ij} \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i) \cdot P(R^y_{i1} = r^y_{i1}, R^x_{i1} = r^x_{i1} \mid Y_i, X_i, Z_i)\,. $$

This joint conditional probability model may be formulated based on marginal probabilities and association parameters as follows. First we let $\lambda^y_{ij} = P(R^y_{ij} = 1 \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i)$ denote the (marginal) conditional probability that $Y_{ij}$ is observed and $\lambda^x_{ij} = P(R^x_{ij} = 1 \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i)$ denote the (marginal) conditional probability that $X_{ij}$ is observed. We write these probabilities as conditional on the previous missing data indicators for the response and covariate, as well as the full vector of responses and covariates. Particular types of missing data mechanisms are obtained by explicitly indicating the nature of the dependencies on $Y_i$ and $X_i$, but we defer this discussion for now.

At each time point $j$, the observation status of the response and covariate may be associated within subjects because of common factors affecting the two observation processes. To model this association we treat the response and covariate observation processes symmetrically and define the conditional odds ratio

$$ \psi_{ij} = \frac{P(R^y_{ij} = 1, R^x_{ij} = 1 \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i) \cdot P(R^y_{ij} = 0, R^x_{ij} = 0 \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i)}{P(R^y_{ij} = 1, R^x_{ij} = 0 \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i) \cdot P(R^y_{ij} = 0, R^x_{ij} = 1 \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i)}\,. $$

The parameter $\psi_{ij}$ is the relative odds that $Y_{ij}$ is observed (i.e., $R^y_{ij} = 1$) when $X_{ij}$ is observed versus when $X_{ij}$ is unobserved, given the previous missing data indicators and the full vectors of responses and covariates. If $\psi_{ij} = 1$ then the missing data indicators for the response and covariate at the $j$th assessment are conditionally independent. We next let $\lambda^{xy}_{ij} = P(R^y_{ij} = 1, R^x_{ij} = 1 \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i)$ denote the joint conditional probability for the pair $R_{ij} = (R^y_{ij}, R^x_{ij})'$, conditional on the histories of the indicator variables and the entire vector of responses and covariates. Note that

$$ \lambda^{xy}_{ij} = \begin{cases} \dfrac{a_{ij} - [a_{ij}^2 - 4\psi_{ij}(\psi_{ij}-1)\lambda^x_{ij}\lambda^y_{ij}]^{1/2}}{2(\psi_{ij}-1)}, & \text{if } \psi_{ij} \neq 1, \\[1ex] \lambda^x_{ij}\cdot\lambda^y_{ij}, & \text{if } \psi_{ij} = 1, \end{cases} \qquad (2) $$

where $a_{ij} = 1 - (1-\psi_{ij})(\lambda^x_{ij} + \lambda^y_{ij})$ (e.g., Lipsitz, Laird, and Harrington 1991).
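As a small numerical illustration of the mapping in (2), the following Python sketch (our own, not code from the paper) computes the joint observation probability $\lambda^{xy}_{ij}$ from the two marginal probabilities and the odds ratio, treating $\psi_{ij} = 1$ as the conditional-independence case.

```python
import numpy as np

def joint_obs_prob(lam_x, lam_y, psi):
    """Joint probability that X_ij and Y_ij are both observed, given marginal
    observation probabilities lam_x, lam_y and odds ratio psi, as in (2)."""
    lam_x, lam_y, psi = map(np.asarray, (lam_x, lam_y, psi))
    a = 1.0 - (1.0 - psi) * (lam_x + lam_y)
    with np.errstate(invalid="ignore", divide="ignore"):
        root = np.sqrt(a ** 2 - 4.0 * psi * (psi - 1.0) * lam_x * lam_y)
        general = (a - root) / (2.0 * (psi - 1.0))
    # psi = 1 corresponds to conditional independence of the two indicators
    return np.where(np.isclose(psi, 1.0), lam_x * lam_y, general)

# For marginals 0.8 and 0.7 with odds ratio 2 this gives about 0.585, which
# reproduces an odds ratio of 2 in the implied 2 x 2 table of indicators.
print(joint_obs_prob(0.8, 0.7, 2.0))
```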

The formulation thus far encompasses MCAR, MAR and MNAR mechanisms since we have written the missing data model at assessment $j$ as depending on the full vector of responses $Y_i$ and covariates $X_i$. For missing at random mechanisms we require

$$ P(R^y_i = r^y_i, R^x_i = r^x_i \mid Y_i, X_i, Z_i) = P(R^y_i = r^y_i, R^x_i = r^x_i \mid Y^{(o)}_i, X^{(o)}_i, Z_i)\,, \qquad (3) $$

where $Y^{(o)}_i$ and $X^{(o)}_i$ represent the observed components of $Y_i$ and $X_i$, respectively. However, in the longitudinal setting with our conditional formulation it is very natural to make the further assumption that

$$ P(R^y_{ij} = r^y_{ij}, R^x_{ij} = r^x_{ij} \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y_i, X_i, Z_i) = P(R^y_{ij} = r^y_{ij}, R^x_{ij} = r^x_{ij} \mid \bar{R}^y_{ij}, \bar{R}^x_{ij}, Y^{(o)}_i, X^{(o)}_i, Z_i) \qquad (4) $$

for each time point $j$. It can be seen that (4) implies (3), but not vice versa. Moreover, while mechanism (3) covers a larger class of MAR models than (4), models under (4) are easier to formulate and interpret. Finally, many useful models can be embedded into the class characterized by (4), and this approach has been commonly used to model missing data processes with a MAR mechanism (e.g., Robins, Rotnitzky, and Zhao 1995). For intermittently MAR data, it is often convenient to adopt the further assumption that the missing data indicators at time $j$ depend only on the previously observed outcomes and covariates, and we do this in Section 5.

To model (2), we employ marginal logistic regression models for $\lambda^y_{ij}$ and $\lambda^x_{ij}$ at each assessment time. In particular, we specify

$$ \mathrm{logit}(\lambda^y_{ij}) = u_{ij}'\,\alpha_y \quad \text{and} \quad \mathrm{logit}(\lambda^x_{ij}) = v_{ij}'\,\alpha_x\,, \qquad (5) $$

where $u_{ij}$ and $v_{ij}$ contain functions of $\{\bar{R}^y_{ij}, \bar{R}^x_{ij}, Y^{(o)}_i, X^{(o)}_i, Z_i\}$, $j = 2, 3, \ldots, J$, and $\alpha_y$ and $\alpha_x$ are regression parameters; let $\alpha_{xy} = (\alpha_y', \alpha_x')'$. Regression models may be used to allow the odds ratio $\psi_{ij}$ to vary as a function of time-varying covariates. We may specify, for example,

$$ \log(\psi_{ij}) = u^{*\prime}_{ij}\,\phi_j\,, \qquad (6) $$

where $u^*_{ij}$ is a function of $(Y^{(o)}_i, X^{(o)}_i, Z_i)$ and $(\bar{R}^y_{ij}, \bar{R}^x_{ij})$, and $\phi_j$ is a vector of regression coefficients. Let $\phi = (\phi_2', \phi_3', \ldots, \phi_J')'$ and $\alpha = (\alpha_{xy}', \phi')'$ be of dimension $q$, say.
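To fix ideas, here is a minimal Python sketch (our own illustration, with hypothetical design vectors and parameter names) that evaluates the component models (5)-(6) for a single subject and assessment, and converts them into the joint observation probability via formula (2), reusing joint_obs_prob from the earlier sketch.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def missingness_components(u_ij, v_ij, u_star_ij, alpha_y, alpha_x, phi_j):
    """Evaluate (5)-(6): marginal observation probabilities for Y_ij and X_ij,
    the conditional odds ratio, and the implied joint probability lambda^xy_ij."""
    lam_y = expit(u_ij @ alpha_y)          # logit(lambda^y_ij) = u_ij' alpha_y
    lam_x = expit(v_ij @ alpha_x)          # logit(lambda^x_ij) = v_ij' alpha_x
    psi = np.exp(u_star_ij @ phi_j)        # log(psi_ij) = u*_ij' phi_j
    lam_xy = joint_obs_prob(lam_x, lam_y, psi)
    return lam_y, lam_x, psi, lam_xy
```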

Given these component models for subject $i$, we let $\pi^{xy}_{ij} = P(R^y_{ij} = 1, R^x_{ij} = 1 \mid Y_i, X_i, Z_i)$ be the conditional probability of complete data for subject $i$ at time $j$ given the response vector $Y_i$ and covariates $Z_i$ and $X_i$, $j \geq 2$; here $\pi^{xy}_{i1} = 1$ is assumed. The joint probability $\pi^{xy}_{ij}$ can then be written as

$$ \pi^{xy}_{ij} = \sum_{\bar{R}^y_{ij}} \sum_{\bar{R}^x_{ij}} \Big\{ \lambda^{xy}_{ij} \cdot \prod_{l=2}^{j-1} \big[ (\lambda^{xy}_{il})^{r^y_{il} r^x_{il}} (\lambda^x_{il} - \lambda^{xy}_{il})^{(1-r^y_{il}) r^x_{il}} (\lambda^y_{il} - \lambda^{xy}_{il})^{r^y_{il}(1-r^x_{il})} (1 - \lambda^x_{il} - \lambda^y_{il} + \lambda^{xy}_{il})^{(1-r^y_{il})(1-r^x_{il})} \big] \Big\} \qquad (7) $$

for $j \geq 2$, where the summation is taken over all the possible values of the histories $\bar{R}^y_{ij}$ and $\bar{R}^x_{ij}$.
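The sum in (7) can be evaluated by brute-force enumeration of the indicator histories. The following Python sketch is our own illustration of that computation; for simplicity it assumes the component probabilities are supplied as functions of the assessment time and the indicator history, with any dependence on observed responses and covariates absorbed into those functions.

```python
import itertools

def pi_xy(j, lam_y, lam_x, lam_xy):
    """P(R^y_ij = 1, R^x_ij = 1 | Y_i, X_i, Z_i) computed by summing over all
    indicator histories as in (7).  Each lam_* argument is a function
    (time l, history) -> probability, where history is a tuple of (r^y, r^x)
    pairs for times 2, ..., l-1; time 1 is taken to be fully observed."""
    total = 0.0
    for hist in itertools.product([(1, 1), (0, 1), (1, 0), (0, 0)], repeat=j - 2):
        prob_hist = 1.0
        for idx, (ry, rx) in enumerate(hist):
            l, past = idx + 2, hist[:idx]
            lxy, lx, ly = lam_xy(l, past), lam_x(l, past), lam_y(l, past)
            prob_hist *= (lxy ** (ry * rx)
                          * (lx - lxy) ** ((1 - ry) * rx)
                          * (ly - lxy) ** (ry * (1 - rx))
                          * (1 - lx - ly + lxy) ** ((1 - ry) * (1 - rx)))
        total += prob_hist * lam_xy(j, hist)
    return total
```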

3. ESTIMATION AND INFERENCE

3.1 Estimating Equations for Response Parameters

Following the spirit of the IPWGEE approach of Robins, Rotnitzky, and Zhao (1995), we introduce a weight matrix $\Delta^*_i(\alpha)$ into the usual GEE to adjust for the effects of incomplete responses and covariates. That is, if we let $\Delta^*_i(\alpha) = \mathrm{diag}\{I(R^y_{ij} = 1, R^x_{ij} = 1)/\pi^{xy}_{ij},\ 1 \leq j \leq J\}$, then the product $\Delta^*_i(\alpha)(Y_i - \mu_i)$ yields an adjusted contribution from subject $i$ which involves the observed data alone. Moreover, this element has expectation zero, and hence unbiased estimating equations for $\beta$ can be obtained as

$$ U^*(\beta, \alpha) = \sum_{i=1}^{n} U^*_i(\beta, \alpha) = 0\,, \qquad (8) $$

where $U^*_i(\beta, \alpha) = D_i V_i^{-1} \Delta^*_i(\alpha)(Y_i - \mu_i)$, with $D_i = \partial \mu_i'/\partial \beta$ a $p \times J$ derivative matrix and $V_i$ the working covariance matrix for the response $Y_i$.
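The subject-level contribution to (8) is simple to assemble once the weights are available. The short Python sketch below is our own illustration (not the authors' code); missing entries of the residual are zero-filled, which is harmless because their rows of $\Delta^*_i(\alpha)$ carry weight zero.

```python
import numpy as np

def ipwgee_contribution(y, mu, D, V, r_y, r_x, pi_xy):
    """U*_i = D_i V_i^{-1} Delta*_i(alpha) (y_i - mu_i) as in (8).
    y may contain NaN where the response is unobserved."""
    delta_star = np.diag((r_y * r_x) / pi_xy)     # diagonal IPW weight matrix
    resid = np.nan_to_num(y - mu)                 # unobserved entries get weight 0
    return D @ np.linalg.solve(V, delta_star @ resid)
```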

In practice, the covariance matrix $V_i$ is often expressed as $V_i = \kappa F_i^{1/2} C_i F_i^{1/2}$, where $C_i$ is a working correlation matrix and $F_i = \mathrm{diag}\{h(\mu_{ij}),\ j = 1, \ldots, J\}$. When the working correlation matrix $C_i$ is the identity matrix, (8) is computable. However, when a working independence assumption is not adopted, (8) may not be computable since elements of $D_i V_i^{-1}$ associated with the observed pairs $(Y_{ij}, X_{ij})$ may be unknown because they involve other missing covariates $X_{ik}$ ($k \neq j$). Here we modify (8) to incorporate general working correlation matrices. We define $\Delta_i(\alpha) = [w_{ijk}]_{J \times J}$, where $w_{ijk} = I(R^x_{ij} = 1, R^y_{ik} = 1, R^x_{ik} = 1)/\pi^{xy}_{ijk}$ and $\pi^{xy}_{ijk} = P(R^x_{ij} = 1, R^y_{ik} = 1, R^x_{ik} = 1 \mid Y_i, X_i, Z_i)$. Let $M_i = \kappa^{-1} F_i^{-1/2}\,[C_i^{-1} \bullet \Delta_i(\alpha)]\,F_i^{-1/2}$, where $A \bullet B = [a_{ij} \cdot b_{ij}]$ denotes the Hadamard product of $J \times J$ matrices $A = [a_{ij}]$ and $B = [b_{ij}]$. By introducing the condition that $X_{ij}$ must be observed for elements in row $j$ of $\Delta_i(\alpha)$, we ensure that all required elements of $D_i[V_i^{-1} \bullet \Delta_i(\alpha)](Y_i - \mu_i)$ can be computed. The generalized estimating functions for $\beta$ are given by

$$ U(\beta, \alpha) = \sum_{i=1}^{n} U_i(\beta, \alpha) = 0\,, \qquad (9) $$

where $U_i(\beta, \alpha) = D_i M_i (Y_i - \mu_i)$, and this yields consistent estimators since $E_{(R^y_i, R^x_i) \mid (Y_i, X_i, Z_i)}[C_i^{-1} \bullet \Delta_i(\alpha)] = C_i^{-1}$. It is easy to see that estimating function (9) depends on the observed data and the parameters only, and hence is computable. To employ (9) to estimate $\beta$, one needs to evaluate the joint probability $\pi^{xy}_{ijk}$, which can be written as

$$ \pi^{xy}_{ijk} = \sum_{r^y_{i,k-1}, r^x_{i,k-1}} \cdots \sum_{r^y_{i,j+1}, r^x_{i,j+1}} \sum_{r^y_{ij}} \Big\{ \lambda^{xy}_{ik} \cdot \Big[ \prod_{\ell=j+1}^{k-1} (\lambda^{xy}_{i\ell})^{r^y_{i\ell} r^x_{i\ell}} (\lambda^x_{i\ell} - \lambda^{xy}_{i\ell})^{(1-r^y_{i\ell}) r^x_{i\ell}} (\lambda^y_{i\ell} - \lambda^{xy}_{i\ell})^{(1-r^x_{i\ell}) r^y_{i\ell}} (1 - \lambda^x_{i\ell} - \lambda^y_{i\ell} + \lambda^{xy}_{i\ell})^{(1-r^y_{i\ell})(1-r^x_{i\ell})} \Big] \cdot (\pi^{xy}_{ij})^{r^y_{ij}} (\pi^x_{ij} - \pi^{xy}_{ij})^{1-r^y_{ij}} \Big\} $$

for $j < k$, where $\pi^x_{ij} = P(R^x_{ij} = 1 \mid Y_i, X_i, Z_i)$ can be expressed in terms of $\lambda^x_{ij}$, $\lambda^y_{ij}$ and $\lambda^{xy}_{ij}$; similar steps are involved in obtaining $\pi^y_{ij}$.
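Given the weight matrix, each subject's contribution to (9) amounts to a few matrix operations. The Python sketch below is our own illustration under the working-covariance factorization above; entries of the inputs that involve missing data can be arbitrary finite placeholders because the corresponding weights in $\Delta_i(\alpha)$ are zero.

```python
import numpy as np

def weighted_gee_contribution(y, mu, D, C_inv, kappa, h_mu, w):
    """U_i = D_i M_i (y_i - mu_i), with
    M_i = kappa^{-1} F_i^{-1/2} [C_i^{-1} o Delta_i(alpha)] F_i^{-1/2},
    where 'o' is the Hadamard product and w holds the weights w_ijk
    (zero whenever the data needed for the (j, k) term are unobserved)."""
    f = 1.0 / np.sqrt(kappa * h_mu)               # diagonal of kappa^{-1/2} F_i^{-1/2}
    M = f[:, None] * (C_inv * w) * f[None, :]     # elementwise C_i^{-1} o Delta_i(alpha)
    return D @ (M @ (y - mu))
```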

In practice the parameters $\alpha$ of the missing data model are unknown, and one must replace $\alpha$ in (9) with a consistent estimate. We describe how to obtain an estimate in the next subsection.

3.2 Estimation of Parameters for the Missing Data Processes

Let $\Lambda_{ij} = (\lambda^y_{ij}, \lambda^x_{ij})'$, $R_i = (R_{i2}', R_{i3}', \ldots, R_{iJ}')'$, $\Lambda_i = (\Lambda_{i2}', \Lambda_{i3}', \ldots, \Lambda_{iJ}')'$, and let $V^*_i = \mathrm{diag}(V^*_{i2}, V^*_{i3}, \ldots, V^*_{iJ})$ be the covariance matrix of $R_i$, where $V^*_{ij}$ is the $2 \times 2$ covariance matrix of $R_{ij}$ with $(1,1)$ element $\lambda^y_{ij}(1-\lambda^y_{ij})$, $(2,2)$ element $\lambda^x_{ij}(1-\lambda^x_{ij})$, and $(1,2)$ and $(2,1)$ element $\lambda^{xy}_{ij} - \lambda^y_{ij}\lambda^x_{ij}$. If $D^*_i = \partial \Lambda_i'/\partial \alpha_{xy}$, then the estimating functions for $\alpha_{xy}$ are given by $\sum_{i=1}^n S_{1i}(\alpha)$, where $S_{1i}(\alpha) = D^*_i [V^*_i]^{-1}(R_i - \Lambda_i)$.

We use second order estimating equations for estimation of the association parameter $\phi$. To construct these we define the pairwise product $R^*_{ij} = R^y_{ij} R^x_{ij}$ and the vector $R^*_i = (R^*_{i2}, R^*_{i3}, \ldots, R^*_{iJ})'$, and let $\Lambda^{xy}_i = (\lambda^{xy}_{i2}, \lambda^{xy}_{i3}, \ldots, \lambda^{xy}_{iJ})'$, $C_i = \partial [\Lambda^{xy}_i]'/\partial \phi$, and $W_i = \mathrm{diag}(\lambda^{xy}_{ij}\,(1-\lambda^{xy}_{ij}),\ j = 2, 3, \ldots, J)$. The estimating functions for $\phi$ are then given by $\sum_{i=1}^n S_{2i}(\alpha)$, where $S_{2i}(\alpha) = C_i [W_i]^{-1}\,(R^*_i - \Lambda^{xy}_i)$. Then the estimating equations for $\alpha$ are $\sum_{i=1}^n S_i(\alpha) = 0$, where $S_i(\alpha) = (S_{1i}'(\alpha), S_{2i}'(\alpha))'$.
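As a concrete illustration of these two building blocks, the Python sketch below (our own, not from the paper) assembles $S_{1i}$ and $S_{2i}$ from the fitted probabilities for one subject; the derivative matrices $D^*_i$ and $C_i$ are assumed to be supplied by the caller.

```python
import numpy as np

def score_alpha_xy(R_i, Lam_i, lam_xy_i, D_star):
    """S_1i = D*_i [V*_i]^{-1} (R_i - Lambda_i) for the marginal parameters
    alpha_xy.  R_i and Lam_i stack the pairs (R^y_ij, R^x_ij) and
    (lambda^y_ij, lambda^x_ij) for j = 2, ..., J; lam_xy_i holds lambda^xy_ij."""
    m = len(lam_xy_i)                                  # number of follow-up visits, J - 1
    V = np.zeros((2 * m, 2 * m))
    for j in range(m):
        ly, lx = Lam_i[2 * j], Lam_i[2 * j + 1]
        cov = lam_xy_i[j] - ly * lx
        V[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[ly * (1 - ly), cov],
                                               [cov, lx * (1 - lx)]]
    return D_star @ np.linalg.solve(V, R_i - Lam_i)

def score_phi(R_star_i, lam_xy_i, C_i):
    """S_2i = C_i W_i^{-1} (R*_i - Lambda^xy_i) for the association parameters
    phi, with R*_ij = R^y_ij R^x_ij."""
    w_inv = 1.0 / (lam_xy_i * (1.0 - lam_xy_i))
    return C_i @ (w_inv * (R_star_i - lam_xy_i))
```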

3.3 Estimation and Inference

We may employ a Fisher-scoring algorithm for estimation of $\theta = (\alpha', \beta')'$. To do this we let $H_i(\theta) = (S_i'(\alpha), U_i'(\beta, \alpha))'$, $M^*(\alpha) = \sum_{i=1}^n \partial S_i(\alpha)/\partial \alpha'$, and $M(\theta) = -\sum_{i=1}^n D_i M_i D_i'$. As the estimating functions $S_i(\alpha)$ are free of the $\beta$ parameters, the derivative matrix $\partial H_i(\theta)/\partial \theta'$ is lower triangular, and therefore, given an initial value $\theta^{(0)}$, updated estimates are obtained via the iterative equation

$$ \theta^{(t+1)} = \theta^{(t)} - \begin{pmatrix} M^*(\alpha^{(t)}) & 0 \\ \sum_{i=1}^n [\partial U_i(\theta)/\partial \alpha']\big|_{\theta^{(t)}} & M(\theta^{(t)}) \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^n S_i(\alpha^{(t)}) \\ \sum_{i=1}^n U_i(\theta^{(t)}) \end{pmatrix}, \qquad (10) $$

$t = 0, 1, \ldots$, until $\theta^{(t+1)}$ converges to the solution $\hat{\theta}$.

Alternatively, one can invoke a two-stage estimation procedure. Under this scheme an estimate of $\alpha$ is obtained as the solution to $\sum_{i=1}^n S_i(\alpha) = 0$ by Fisher-scoring, and then a Fisher-scoring algorithm is employed to solve $\sum_{i=1}^n U_i(\beta, \hat{\alpha}) = 0$, where $\hat{\alpha}$ is used in place of $\alpha$ in (9). This two-stage procedure employs the iterative equation

$$ \alpha^{(s+1)} = \alpha^{(s)} - [M^*(\alpha^{(s)})]^{-1} \cdot \sum_{i=1}^n S_i(\alpha^{(s)}), \quad s = 0, 1, \ldots, \qquad (11) $$

to obtain the estimate $\hat{\alpha}$. This in turn is used to provide estimated weights in

$$ \beta^{(t+1)} = \beta^{(t)} - [M(\beta^{(t)}, \hat{\alpha})]^{-1} \cdot \sum_{i=1}^n U_i(\beta^{(t)}, \hat{\alpha}), \quad t = 0, 1, \ldots\,. \qquad (12) $$
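A generic implementation of the two-stage scheme (11)-(12) only needs the summed estimating functions and their scoring matrices. The following Python sketch is our own illustration of that loop, with the four ingredients passed in as callables.

```python
import numpy as np

def two_stage_fit(S, M_star, U, M, alpha0, beta0, tol=1e-8, max_iter=100):
    """Two-stage Fisher-scoring sketch for (11)-(12): S and U return the summed
    estimating functions, M_star and M the corresponding scoring matrices."""
    alpha = np.asarray(alpha0, dtype=float)
    for _ in range(max_iter):                      # stage 1: solve sum_i S_i(alpha) = 0
        step = np.linalg.solve(M_star(alpha), S(alpha))
        alpha = alpha - step
        if np.max(np.abs(step)) < tol:
            break
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):                      # stage 2: solve sum_i U_i(beta, alpha_hat) = 0
        step = np.linalg.solve(M(beta, alpha), U(beta, alpha))
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return alpha, beta
```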

The two-stage iterative equations (11) and (12) differ from the joint iterative equation (10). Even under the special situation in which the components of the lower left corner of the inverse matrix in (10) are zero, (10) does not necessarily yield the same updated values as those from (11) and (12). However, the updated values from these two procedures converge to the same limit under mild regularity conditions.

While the two-stage procedure based on (11) and (12) is much easier to use for estimation of $\hat{\theta}$, the joint formulation based on $H_i(\theta)$ is more useful for developing the asymptotic distribution for $\hat{\theta}$. Since $E[H_i(\theta)] = 0$ when the response and missing data models are correctly specified, then by Theorem 3.4 of Newey and McFadden (1993), under standard regularity conditions there is a unique solution $\hat{\theta}$ to the equation $\sum_{i=1}^n H_i(\theta) = 0$, with probability approaching 1, that satisfies

$$ n^{1/2}(\hat{\theta} - \theta) = -\{E[\partial H_i(\theta)/\partial \theta']\}^{-1} \cdot n^{-1/2} \sum_{i=1}^n H_i(\theta) + o_p(1)\,. $$

For the estimator $\hat{\beta}$ of central interest, we have

$$ n^{1/2}(\hat{\beta} - \beta) = -\Gamma^{-1} n^{-1/2} \cdot \sum_{i=1}^n Q_i(\beta, \alpha) + o_p(1)\,, $$

where $\Gamma = E[\partial U_i(\beta, \alpha)/\partial \beta']$ and

$$ Q_i(\beta, \alpha) = U_i(\beta, \alpha) - E[\partial U_i(\beta, \alpha)/\partial \alpha'] \cdot [E(\partial S_i(\alpha)/\partial \alpha')]^{-1} \cdot S_i(\alpha)\,. $$

The central limit theorem then leads to the asymptotic distribution for $n^{1/2}(\hat{\beta} - \beta)$, which is normal with mean 0 and asymptotic variance $\Gamma^{-1}\Sigma[\Gamma^{-1}]'$, where $\Sigma = E[Q_i(\beta, \alpha)Q_i'(\beta, \alpha)]$.

Finally, we comment that the preceding development treats the working correlation matrix $C_i$ as known. In applications, $C_i$ may contain an unknown parameter vector, say $\rho$, that is functionally independent of the $\beta$ and $\alpha$ parameters but must be estimated. Following the spirit of Liang and Zeger (1986), we suggest an iterative fitting procedure by which the current value of $\hat{\beta}$ is used to compute a weighted moment-type estimator obtained as a function of the weighted Pearson residual $e_{ij} = \delta_{ij}(y_{ij} - \mu_{ij})/\sqrt{h(\mu_{ij})}$, where $\delta_{ij} = r^y_{ij} r^x_{ij}/\pi^{xy}_{ij}$. The expression for the estimator of $\rho$ depends on the correlation structure. For example, for an unstructured correlation matrix where $\mathrm{Corr}(Y_{ij}, Y_{ik}) = \rho_{jk}$ for $j \neq k$, $\rho_{jk}$ is estimated by $\hat{\rho}_{jk} = [n\kappa]^{-1}\sum_{i=1}^n e_{ij} e_{ik}\,(\pi^{xy}_{ij}\pi^{xy}_{ik}/\pi^*_{ijk})$, and the dispersion parameter $\kappa$ is estimated by $\hat{\kappa} = [nJ]^{-1}\sum_{i=1}^n\sum_{j=1}^J e_{ij}^2\,\pi^{xy}_{ij}$. Here $\pi^*_{ijk} = P(R^y_{ij} = 1, R^x_{ij} = 1, R^y_{ik} = 1, R^x_{ik} = 1 \mid Y_i, X_i, Z_i)$ is calculated in the same way $\pi^{xy}_{ijk}$ is calculated above. Both $\hat{\kappa}$ and $\hat{\rho}_{jk}$ are unbiased estimators.

Note that the estimating functions in (9) only include the measurements collected at those time points j when both Yij and Xij are observed, together with an observed covariate Xik . There may therefore be some information loss relative to methods that make use of all the available measurements. Under a missing at random mechanism, Robins, Rotnitzky, and Zhao (1994, 1995), Robins and Rotnitzky (1995) and Scharfstein, Rotnitzky, and Robins (1999) proposed methods to improve the efficiency of the inverse probability weighted estimates. The idea is to modify these inverse weighted equations by adding a function with zero expectation yielding an augmented estimating function which remains unbiased. With suitable choice of the appended function, efficiency may be improved. This approach has, to our knowledge, only been investigated to address either incomplete response or

10

covariate processes, but not both. In this section, we describe ways to improve efficiency when either the response, covariates, or both, may be missing at any assessment time. Corresponding to each missing data pattern, we consider a vector Air (r = 1, 2, 3) that picks up available measurements that may not be included in (9). For example, we take µ·

Ai1 Ai2 Ai3

¸ ¶0 y x I(Rij = 1, Rij = 0) y y = · πij − 1 · Rij Yij , j = 1, 2, . . . , J , πijy − πijxy µ· ¸ ¶0 y x I(Rij = 0, Rij = 1) x x = · πij − 1 · Rij Xij , j = 1, 2, . . . , J , πijx − πijxy ¶0 µ· ¸ y x I(Rij = 0, Rij = 0) 0 = − 1 · Zij , j = 1, 2, . . . , J , 1 − πijx − πijy + πijxy

(13) (14) (15)

and $A_i = (A_{i1}', A_{i2}', A_{i3}')'$. The key point here is to make $A_i$ have zero mean and be expressed in terms of the observed data. For ease of implementation, $A_i$ is often chosen to be free of the unknown $\beta$ parameter, but it may depend on the $\alpha$ parameter. We now explicitly denote it by $A_i(\alpha)$. Let $\mathrm{Res}\{A, B\} = A - E[AB']\{E[BB']\}^{-1}B$ denote the residual obtained by regressing $A$ on $B$. Let $\eta = E[\mathrm{Res}\{U_i(\beta, \alpha), S_i(\alpha)\}\,\mathrm{Res}\{A_i(\alpha), S_i(\alpha)\}']\,[\mathrm{var}(\mathrm{Res}\{A_i(\alpha), S_i(\alpha)\})]^{-1}$, and $U^{\dagger}_i(\beta, \alpha) = U_i(\beta, \alpha) - \eta A_i(\alpha)$. Then, if $\alpha$ is known, the estimator $\tilde{\beta}^{\dagger}$ obtained from solving

$$ \sum_{i=1}^{n} U^{\dagger}_i(\beta, \alpha) = 0\,, \qquad (16) $$

is consistent for $\beta$ since $U^{\dagger}_i(\beta, \alpha)$ is unbiased. Below we establish the asymptotic properties of $\tilde{\beta}^{\dagger}$; proofs are given in Appendix B.

Theorem 1: Under the regularity conditions listed in Appendix A, we have

1. $n^{1/2}(\tilde{\beta}^{\dagger} - \beta)$ has an asymptotic distribution $N(0, \Gamma^{-1}\Sigma^{\dagger}[\Gamma^{-1}]')$ with $\Sigma^{\dagger} = \mathrm{var}\{\mathrm{Res}(U_i(\beta, \alpha), H^*_i)\}$, where $H^*_i = (A_i'(\alpha), S_i'(\alpha))'$.

2. If $\eta \neq 0$, then $\tilde{\beta}^{\dagger}$ is asymptotically more efficient than $\hat{\beta}$.

We note that the efficiency of $\tilde{\beta}^{\dagger}$ relies on the choice of the function $A_i(\alpha)$, and there is no universal way to specify an optimal $A_i(\alpha)$ to produce the most efficient estimator $\tilde{\beta}^{\dagger}$. However, as long as $A_i(\alpha)$ is correlated with $U_i(\beta, \alpha)$, some improvement in efficiency will be realized. In practice it is usually not possible to solve (16) since $\eta$ will typically be unknown. A modified version of (16) may be solvable, however, by replacing $\eta$ with an $n^{1/2}$-consistent estimate $\hat{\eta} = \hat{\eta}_1 \hat{\eta}_2^{-1}$, where $\hat{\eta}_1 = n^{-1}\sum_{i=1}^n \widehat{\mathrm{Res}}[U_i(\hat{\beta}, \hat{\alpha}), S_i(\hat{\alpha})]\,\widehat{\mathrm{Res}}[A_i(\hat{\alpha}), S_i(\hat{\alpha})]'$, $\hat{\eta}_2 = n^{-1}\sum_{i=1}^n \widehat{\mathrm{Res}}[A_i(\hat{\alpha}), S_i(\hat{\alpha})]\,\widehat{\mathrm{Res}}[A_i(\hat{\alpha}), S_i(\hat{\alpha})]'$, and $\widehat{\mathrm{Res}}(A_i, B_i) = A_i - [\sum_{i=1}^n A_i B_i'][\sum_{i=1}^n B_i B_i']^{-1} B_i$. Under the regularity conditions in Appendix A, the resultant estimator has the same asymptotic distribution as $\tilde{\beta}^{\dagger}$, and the variance matrix $\Gamma^{-1}\Sigma^{\dagger}[\Gamma^{-1}]'$ can be consistently estimated by $\hat{\Gamma}^{-1}\hat{\Sigma}^{\dagger}[\hat{\Gamma}^{-1}]'$ with

$$ \hat{\Sigma}^{\dagger} = n^{-1}\sum_{i=1}^{n} \widehat{\mathrm{Res}}[U_i(\hat{\beta}, \hat{\alpha}), \{A_i(\hat{\alpha}), S_i(\hat{\alpha})\}]\; \widehat{\mathrm{Res}}[U_i(\hat{\beta}, \hat{\alpha}), \{A_i(\hat{\alpha}), S_i(\hat{\alpha})\}]'\,. $$

EMPIRICAL STUDIES AND APPLICATION

5.1 Assessment of the Proposed Methods In this section we assess the empirical performance of the proposed methods through simulation studies, and contrast their performance with that of other methods one might consider in practice. We consider a setting with J = 3 and n = 500, and simulate longitudinal binary responses from a model with logit(µij ) = β0 + β1 xij where xij is a time-dependent binary covariate generated independently from Bin(1,0.5) which may be missing at some time points. We set expit(β0 ) = 0.6 and exp(β1 ) = 0.5, where expit(t) = exp(t)/(1 + exp(t)). The association between the responses is specified as exchangeable with correlation coefficient ρ, which is specified as 0, 0.3 and 0.6. The data generation procedures follow Preisser, Lohman, and Rathouz (2002). Two thousand simulations are run for each parameter configuration.

12

For the missing response and covariate processes, we respectively take y y logit(λyij ) = αy0 + αy1 ri,j−1 + αy2 ri,j−1 yi,j−1 ,

(17)

x x xi,j−1 , + αx2 ri,j−1 logit(λxij ) = αx0 + αx1 ri,j−1

(18)

for j = 2, 3, which states that the missing data indicator for the response at assessment j depends marginally on the previous missing data indicator for the response and the corresponding response variable if it was observed; the marginal missing data model for the covariate has a similar interpretation. We assume the response and covariates are availy x able at the first assessment time, so ri1 = ri1 = 1. The true values for the regression

parameters of the missing data processes are set to expit(αy0 ) = expit(αx0 ) = 0.5, exp(αy1 ) = exp(αx1 ) = 1.5, and exp(αy2 ) = exp(αx2 ) = 0.1, 0.5 or 2.0. For the joint y x and Rij , we consider a flexible model with time varying association probability of Rij

reflected by log(ψij ) = φ0 + φ1 · I(j = 3),

j = 2, 3 .

(19)

To feature different strengths of association, we take φ0 = 0 and log(2), and φ1 = 0 and log(2), and hence we consider the configurations (ψi2 , ψi3 ) = (1, 1), (1, 2), (2, 2) and (2, 4). Here we assess the performance of the proposed method along with other methods which might be used in practice using different models for the weights. The first method, labeled “GEE” in the tables, is based on generalized estimating equations obtained by setxy ting πijxy and πijk to be 1 in (9), for j = 1, 2, . . . , J. The second and third methods, labeled

“IPWGEE-M1” and “IPWGEE-M2” respectively, use marginal weights in the generalized ∗ ∗ = 1 if both where Rij estimating equation (9) based on a single missing data model for Rij ∗ Yij and Xij are observed and Rij = 0 otherwise. The probability of interest here is then (o)

(o)

∗ ∗ λ∗ij = P (Rij = 1|Ri,j−1 , Yi , Xi ) and a logistic model is specified as 0 logit λ∗ij = wij α,

13

j = 2, 3 .

(20)

∗ ∗ ∗ ∗ The second and third methods employ {1, ri,j−1 , ri,j−1 yi,j−1 , ri,j−1 xi,j−1 } and {1, ri,j−1 , y x yi,j−1 , ri,j−1 xi,j−1 } for wij in (20) respectively, accommodating different covariate ri,j−1

dependencies of the marginal missing data processes for the responses and covariates. The ∗ weight matrix in the second and third methods is ∆i (α) with (j, j) elements I(Rij = ∗ ∗ ∗ 1)/πijxy , and (j, k) elements I(Rij = 1, Rik = 1)/πijk for j 6= k, where the probabilities (o)

(o)

∗ ∗ ∗ ∗ πijxy in (9) are P (Rij = 1|Yi , Xi ), and πijk = P (Rij = 1, Rik = 1|Yi , Xi , Zi ) which y x can be expressed in terms of λ∗ij . Instead of modeling Rij and Rij with a single indicator y ∗ x Rij = Rij Rij , in the fourth and fifth methods we use separate models described in Section y x 2 to characterize Rij and Rij . The fourth method, labeled “IPWGEE-I”, constrains ψij to

be 1, while the fifth method, labeled “IPWGEE-J”, accommodates the association structure through ψij . The sixth method, labeled “AIPWGEE-J”, is the augmented IPWGEE accommodating the association structure through ψij , where we specify Ai (α) = (A0i1 , A0i2 )0 with Ai1 and Ai2 the form of (13) and (14). The parameter ρ in the working correlation matrix P P xy ∗ /πijk with N ∗ = 1/2 · nJ(J − 1) Ri (ρ) is estimated by (N ∗ )−1 ni=1 j