Supplement to “Efficient estimation in semivarying coefficient models for longitudinal/clustered data” by Ming-Yen Cheng, Toshio Honda, and Jialiang Li
arXiv:1501.00538v3 [stat.ME] 13 Sep 2015
S.1. Additional simulation results.

S.1.1. Nonparametric component estimates. In Step 7 of our estimation procedure we give both local linear and spline approaches to estimating the nonparametric component after the efficient estimator $\hat\beta_{\hat\Sigma}$ is obtained. In this section we examine their finite sample performance via simulations. For comparison, we also computed the respective initial estimates, that is, the versions using $\hat\beta_I$ instead of $\hat\beta_{\hat\Sigma}$. We considered the same settings as in Section 4, and we used cross-validation to choose the bandwidth in the local linear estimation. We computed the mean integrated squared error (MISE) for all the function estimates and took their average. The results are given in Table S.1. The figures in Table S.1 indicate that it is clearly advantageous to update the nonparametric component after efficient estimation of the parametric component. In addition, we observe that the refined local linear and spline estimators perform roughly the same in terms of MISE.

Table S.1
MISE for simulation studies.

                 Local linear estimate      Spline estimate
                 Initial     Refined        Initial     Refined
n=100  rho=.4    .0449       .0354          .0492       .0376
n=100  rho=.8    .0691       .0597          .0639       .0593
n=200  rho=.4    .0390       .0315          .0415       .0355
n=200  rho=.8    .0595       .0589          .0584       .0576
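As a rough illustration of how figures like those in Table S.1 can be produced, the sketch below approximates each replication's integrated squared error on an equally spaced grid and averages over Monte Carlo replications. This is a minimal stand-in, not the authors' code; the helper `mise`, the grid, and the toy truth `g0` are all illustrative assumptions.

```python
import numpy as np

def mise(estimates, truth, grid):
    """Average over replications of the integrated squared error
    int (g_hat(t) - g(t))^2 dt, approximated on an equally spaced grid."""
    sq_err = (estimates - truth(grid)) ** 2             # (replications, grid points)
    ise = sq_err.mean(axis=1) * (grid[-1] - grid[0])    # Riemann approximation of the integral
    return ise.mean()                                   # average over replications

# Toy check: estimates identical to the truth give MISE exactly 0.
grid = np.linspace(0.0, 1.0, 201)
g0 = lambda t: np.sin(2.0 * np.pi * t)
perfect = np.tile(g0(grid), (50, 1))
print(mise(perfect, g0, grid))  # 0.0
```

In a real study each row of `estimates` would hold one replication's fitted coefficient function evaluated on the grid.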
S.1.2. Parametric component estimates. Recall that we adjusted the covariance function estimate $\hat\sigma(s,t)$ by setting all negative eigenvalues to zero. We also considered a strictly positive threshold $\lambda_L = 0.05$ and set all eigenvalues lower than $\lambda_L$ to zero. The estimator using this covariance estimate is denoted by "Positive" in Table S.2. The "Positive" estimator thus zeroes all eigenvalues below a positive cut-off when estimating the covariance function, while the efficient estimator adjusts only the negative eigenvalues. Therefore it is slightly more biased than the efficient estimator. In all the cases considered, the crude and positive estimators are still more efficient than the working independence estimator.
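The eigenvalue adjustment described above is a truncation of the spectral decomposition of the estimated covariance matrix. A minimal sketch, assuming a symmetric matrix estimate; `threshold_covariance` is a hypothetical helper, not code from the paper:

```python
import numpy as np

def threshold_covariance(sigma_hat, cutoff=0.0):
    """Zero out eigenvalues of an estimated covariance matrix below `cutoff`.

    cutoff = 0.0  removes only negative eigenvalues (the efficient estimator's
                  adjustment); cutoff = 0.05 corresponds to the stricter
                  "Positive" variant with lambda_L = 0.05.
    """
    sigma_hat = (sigma_hat + sigma_hat.T) / 2.0      # symmetrize first
    vals, vecs = np.linalg.eigh(sigma_hat)
    vals[vals < cutoff] = 0.0                        # truncate small/negative eigenvalues
    return (vecs * vals) @ vecs.T                    # rebuild the matrix

# Example: a symmetric matrix with one negative eigenvalue.
m = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.9],
              [0.0, 0.9, 1.0]])
adj = threshold_covariance(m)
print(np.linalg.eigvalsh(adj).min() >= -1e-12)  # True: result is positive semidefinite
```

The result is positive semidefinite by construction, which is what allows the adjusted $\hat\Sigma_i$ to be inverted (after the truncation) in the weighting step.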
Recall that in all the numerical analysis reported in the paper, $h_1$ and $h_2$ were selected via the commonly used leave-one-subject-out cross-validation, and the bandwidth $h_3$ used in the estimation of the covariance structure was set to $h_3 = 2h_1$. To examine the effects of the bandwidth choice, we considered various choices of $h_3$ in the numerical studies and obtained quite similar results. Under the column "Different $h_3$", we report the results for another case, $h_3 = 1.5h_1$, which are similar to those obtained with $h_3 = 2h_1$. Our procedure does not require any iteration. In practice it may be interesting to refine the estimation of the coefficients and covariances iteratively and obtain a final estimate upon convergence. We report the corresponding numerical results under the "Iterative" column. The bias and SE are very close to those obtained without iteration.

Table S.2
Estimation results of 200 simulations. "Positive" means we set a positive threshold for the covariance eigenvalues; "Different $h_3$" means using a different choice of $h_3$ in our efficient estimation; "Iterative" indicates an iterative estimation approach.

                      Positive           Different h3        Iterative
  n    rho          bias      SE        bias      SE        bias      SE
 100   0.4   b1    .0173    .0411    -.0152    .0375     -.0146    .0361
             b2    .0176    .0423    -.0098    .0375     -.0095    .0352
             b3    .0205    .0425    -.0122    .0369     -.0099    .0360
             b4   -.0096    .0425     .0098    .0373     -.0086    .0362
 200   0.4   b1   -.0113    .0329     .0056    .0274      .0045    .0228
             b2   -.0164    .0334    -.0099    .0274     -.0066    .0219
             b3    .0120    .0323     .0072    .0273      .0034    .0259
             b4   -.0095    .0329    -.0043    .0276     -.0035    .0274
 100   0.8   b1    .0202    .0366     .0082    .0336      .0065    .0325
             b2    .0163    .0378    -.0075    .0335     -.0034    .0323
             b3    .0197    .0372     .0166    .0337      .0121    .0328
             b4   -.0168    .0354    -.0182    .0338      .0157    .0325
 200   0.8   b1   -.0044    .0214    -.0124    .0202      .0056    .0199
             b2    .0036    .0215     .0138    .0200     -.0049    .0199
             b3    .0042    .0215     .0165    .0204      .0052    .0178
             b4   -.0038    .0214    -.0148    .0200     -.0050    .0179
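Leave-one-subject-out cross-validation, as used above for $h_1$ and $h_2$, removes all observations of one subject at a time so that within-subject correlation does not bias the bandwidth choice. The sketch below illustrates the idea with a simple Nadaraya-Watson smoother standing in for the paper's local linear step; the smoother, the Gaussian kernel, and the toy data are all assumptions.

```python
import numpy as np

def nw_fit(t_train, y_train, t_eval, h):
    """Nadaraya-Watson estimate with a Gaussian kernel (stand-in smoother)."""
    w = np.exp(-0.5 * ((t_eval[:, None] - t_train[None, :]) / h) ** 2)
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-12)

def loso_cv(subjects, bandwidths):
    """Leave-one-subject-out CV: drop one subject's observations at a time,
    predict them from the remaining subjects, and average the squared
    prediction errors.  `subjects` is a list of (t, y) arrays."""
    scores = []
    for h in bandwidths:
        err, count = 0.0, 0
        for i in range(len(subjects)):
            t_out, y_out = subjects[i]
            t_in = np.concatenate([s[0] for j, s in enumerate(subjects) if j != i])
            y_in = np.concatenate([s[1] for j, s in enumerate(subjects) if j != i])
            err += ((nw_fit(t_in, y_in, t_out, h) - y_out) ** 2).sum()
            count += len(y_out)
        scores.append(err / count)
    return bandwidths[int(np.argmin(scores))]

rng = np.random.default_rng(0)
subjects = []
for _ in range(20):
    t = rng.uniform(0.0, 1.0, 5)
    subjects.append((t, np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(5)))
h_best = loso_cv(subjects, np.array([0.01, 0.05, 0.1, 0.5]))
print(h_best)
```

Dropping whole subjects, rather than single observations, is the design choice that keeps the CV criterion honest under within-subject dependence.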
S.2. Proofs of Propositions 1-3 and Lemma 1. In this section, we outline the proofs of Propositions 1-3 and present the proof of Lemma 1. When $m_i$ is uniformly bounded, we have the same results for general link functions by closely following the arguments of [3]; we state those results at the end of this supplement. Note that the sub-Gaussian error assumption is necessary in that case. We outline the proofs of Propositions 1-3 because we allow some of the $m_i$'s to diverge, as in Assumptions A1 and A2.
Proof of Proposition 1. First we consider the properties of $\Gamma_V$. The $(k,l)$th element of $n^{-1}H_{11\cdot2}$ is given by
$$\langle X_k - Z^T\hat\varphi_{Vk},\ X_l - Z^T\hat\varphi_{Vl}\rangle_{V_n}.$$
From Lemma 1 (v)-(vii), we have
$$\langle X_k - Z^T\hat\varphi_{Vk},\ X_l - Z^T\hat\varphi_{Vl}\rangle_{V_n} = \langle X_k - Z^T\varphi^*_{Vk},\ X_l - Z^T\varphi^*_{Vl}\rangle_{V_n} + o_p(1) = \langle X_k - Z^T\varphi^*_{Vk},\ X_l - Z^T\varphi^*_{Vl}\rangle_{V} + o_p(1).$$
This and (2.5) imply that for some positive constants $C_1$ and $C_2$, we have $C_1 \le \lambda_{\min}(n^{-1}H_{11\cdot2}) \le \lambda_{\max}(n^{-1}H_{11\cdot2}) \le C_2$ and hence
$$\frac{1}{nC_2} \le \lambda_{\min}(H^{11}) \le \lambda_{\max}(H^{11}) \le \frac{1}{nC_1} \tag{S.1}$$
with probability tending to 1. Note that
$$\operatorname{Var}(\hat\beta_V \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}) = \Gamma_V,$$
and Theorem 1 of [13] implies that $\Gamma_V - H^{11}$ is nonnegative definite when $H^{11}$ is defined with $V_i = \Sigma_i$. Hence for some positive constant $C_3$, we have
$$\lambda_{\min}(\Gamma_V) \ge \frac{C_3}{n}$$
with probability tending to 1. Now we prove the asymptotic normality of
$$\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}\} = H^{11}\Big(\sum_{i=1}^n X_i^TV_i^{-1}\epsilon_i - H_{12}H_{22}^{-1}\sum_{i=1}^n W_i^TV_i^{-1}\epsilon_i\Big).$$
As in the proof of Theorem 2 of [13], we take $c\in\mathbb R^p$ such that $|c| = 1$ and write
$$c^T(\hat\beta_V - E\{\hat\beta_V \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}\}) = \sum_{i=1}^n a_i\eta_i \quad(\text{say}),$$
where
$$a_i^2 = c^TH^{11}(X_i - W_iH_{22}^{-1}H_{21})^TV_i^{-1}\Sigma_iV_i^{-1}(X_i - W_iH_{22}^{-1}H_{21})H^{11}c$$
and $\{\eta_i\}$ is a sequence of conditionally independent random variables with $E\{\eta_i \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}\} = 0$ and $\operatorname{Var}(\eta_i \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}) = 1$. We have from (S.1) and Lemma 1 (vii) that
$$\max_{1\le i\le n}a_i^2 = O_p\Big(\frac{m_{\max}^2}{n^2}\sum_{k=1}^p\|X_k - Z^T\hat\varphi_{Vk}\|_\infty^2\Big) = O_p\Big(\frac{m_{\max}^2}{n^2}\Big).$$
On the other hand, we have for some positive constant $C_4$,
$$\sum_{i=1}^n a_i^2 = c^T\Gamma_Vc \ge \frac{C_4}{n}$$
with probability tending to 1. Hence we have established
$$\frac{\max_{1\le i\le n}a_i^2}{\sum_{i=1}^n a_i^2} = O_p(n^{-1}m_{\max}^2) = o_p(1),$$
and it follows from the standard argument that
$$\Big(\sum_{i=1}^n a_i^2\Big)^{-1/2}\sum_{i=1}^n a_i\eta_i \xrightarrow{d} N(0,1). \tag{S.2}$$
Finally we evaluate the conditional bias
$$\operatorname{Bias}_\beta = E\{\hat\beta_V \mid \{X_{ij}\},\{Z_{ij}\},\{T_{ij}\}\} - \beta_0.$$
Take $\tilde g\in G_B$ such that $\|g_0 - \tilde g\|_{G,\infty} = O(K_n^{-2})$ and set $\boldsymbol\delta_0 = g_0 - \tilde g$ and $\delta_0 = Z^T\boldsymbol\delta_0$. Note that
$$\|\boldsymbol\delta_0\|_\infty = O(K_n^{-2}) \quad\text{and}\quad \|\delta_0\|_V = O(K_n^{-2}).$$
We also take $\tilde\varphi_{Vk}\in G_B$ such that $\|\varphi^*_{Vk} - \tilde\varphi_{Vk}\|_{G,\infty} = O(K_n^{-2})$. Then we have the following expression for the conditional bias:
$$\operatorname{Bias}_\beta = nH^{11}(S_1,\ldots,S_p)^T,$$
where
$$S_k = \langle X_k,\ \delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_{V_n} = \langle X_k - Z^T\tilde\varphi_{Vk},\ \delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_{V_n}$$
$$= \langle X_k - Z^T\varphi^*_{Vk},\ \delta_0 - Z^T\Pi_{Vn}\delta_0\rangle_{V_n} + \langle X_k - Z^T\varphi^*_{Vk},\ Z^T\Pi_{Vn}\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_{V_n} + \langle Z^T\varphi^*_{Vk} - Z^T\tilde\varphi_{Vk},\ \delta_0 - Z^T\hat\Pi_{Vn}\delta_0\rangle_{V_n}$$
$$= S_{1k} + S_{2k} + S_{3k} \quad(\text{say}).$$
Note that $E\{S_{1k}\} = 0$ and
$$E\{S_{1k}^2\} = O\Big(\frac{(\|X_k - Z^T\varphi^*_{Vk}\|_V)^2}{K_n^3 n}\Big)$$
since $S_{1k}$ is a sum of independent random variables, $\varphi^*_{Vk} = \Pi_VX_k$, $\delta_0 = Z^T\boldsymbol\delta_0$, and
$$\|\delta_0 - Z^T\Pi_{Vn}\delta_0\|_\infty \le \|\delta_0\|_\infty + CK_n^{1/2}\|Z^T\Pi_{Vn}\delta_0\|_V \le \|\delta_0\|_\infty + CK_n^{1/2}\|\delta_0\|_V = O(K_n^{-3/2}).$$
Hence we have $S_{1k} = O_p(1/(nK_n^3)^{1/2}) = o_p(n^{-1/2})$. Now we deal with $S_{2k}$. From Lemma 1 (vi) and the fact that $\|\delta_0 - Z^T\Pi_{Vn}\delta_0\|_\infty = O(K_n^{-3/2})$, we have
$$\|Z^T\Pi_{Vn}\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\|_{V_n} = \sup_{g\in G_B}\frac{|\langle\delta_0 - Z^T\Pi_{Vn}\delta_0, Z^Tg\rangle_{V_n} - \langle\delta_0 - Z^T\Pi_{Vn}\delta_0, Z^Tg\rangle_V|}{\|Z^Tg\|_{V_n}} = O_p\Big(K_n^{-3/2}\sqrt{\frac{K_n}{n}}\Big) = O_p(K_n^{-1}n^{-1/2}).$$
Thus we have $|S_{2k}| = o_p(n^{-1/2})$. We also have
$$|S_{3k}| \le \|\delta_0\|_{V_n}\,\|Z^T(\varphi^*_{Vk} - \tilde\varphi_{Vk})\|_{V_n} = O_p(K_n^{-4}) = o_p(n^{-1/2})$$
since $\|\delta_0 - Z^T\hat\Pi_{Vn}\delta_0\|_{V_n} \le \|\delta_0\|_{V_n}$. Hence we have
$$\operatorname{Bias}_\beta = o_p(n^{-1/2}).$$
The desired result follows from (S.2) and the above equality.

As for Proposition 2, there is almost no change in the calculation of the score functions in [13] and [4], and we omit the outline. This is because $m_i$ is bounded for any fixed $n$.

Proof of Proposition 3. When $V_i = \Sigma_i$, we have
$$\Gamma_V = H^{11} = (H_{11\cdot2})^{-1} \quad\text{and}\quad \varphi^*_{\Sigma k} = \varphi^*_{eff,k}.$$
Lemma 1 (vii) implies that
$$\frac{1}{n}\Gamma_V^{-1} = \frac{1}{n}H_{11\cdot2} = \frac{1}{n}E\{l_\beta^*(l_\beta^*)^T\} + o_p(1) = \Omega_\Sigma + o_p(1).$$
The desired result follows from the above result and Proposition 1.

Proof of Lemma 1. The proof consists of seven parts.
(i) Recall that
$$(\|Z^Tg\|_V)^2 = \frac{1}{n}\sum_{i=1}^n E\{(Z^Tg)_i^TV_i^{-1}(Z^Tg)_i\}.$$
We have from Assumptions A4 and A5 that
$$\frac{C_1}{n}\sum_{i=1}^n E\Big\{\frac{1}{m_i}\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le (\|Z^Tg\|_V)^2 \le \frac{C_2}{n}\sum_{i=1}^n E\Big\{\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \tag{S.3}$$
for some positive constants $C_1$ and $C_2$. Assumptions A2 and A3 imply that for some positive constants $C_3$ and $C_4$,
$$C_3\sum_{l=1}^q\int g_l^2(t)\,dt \le \frac{1}{n}\sum_{i=1}^n E\Big\{\frac{1}{m_i}\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le \frac{1}{n}\sum_{i=1}^n E\Big\{\sum_{j=1}^{m_i} g^T(T_{ij})Z_{ij}Z_{ij}^Tg(T_{ij})\Big\} \le C_4\sum_{l=1}^q\int g_l^2(t)\,dt. \tag{S.4}$$
The desired result follows from (S.3) and (S.4).
(ii) This is a well-known result in the literature on spline regression. See for example A.2 of [12].
(iii) The result in (ii) implies
$$\|X^T\beta + Z^Tg\|_\infty^2 \le CK_n(|\beta|^2 + \|g\|_{G,2}^2)$$
for some positive constant $C$. Recall that $p$ and $q$ are fixed in this paper. On the other hand, we have from Assumptions A1-A3 and A5 that for some
positive constants $C_1$, $C_2$, and $C_3$,
$$(\|X^T\beta + Z^Tg\|_V)^2 \ge \frac{C_1}{n}\sum_{i=1}^n E\Big\{\frac{1}{m_i}\sum_{j=1}^{m_i}(\beta^T\ g^T(T_{ij}))\begin{pmatrix}X_{ij}X_{ij}^T & X_{ij}Z_{ij}^T\\ Z_{ij}X_{ij}^T & Z_{ij}Z_{ij}^T\end{pmatrix}\begin{pmatrix}\beta\\ g(T_{ij})\end{pmatrix}\Big\} \ge \frac{C_2}{n}\sum_{i=1}^n E\Big\{\frac{1}{m_i}\sum_{j=1}^{m_i}(\beta^T\ g^T(T_{ij}))\begin{pmatrix}\beta\\ g(T_{ij})\end{pmatrix}\Big\} \ge C_3(|\beta|^2 + \|g\|_{G,2}^2).$$
Besides, we have for some positive constants $C_1$ and $C_2$,
$$(\|v\|_V)^2 \le \frac{C_1}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}|v_{ij}|^2 \le C_2\|v\|_\infty^2.$$
Hence the desired results are established.
(iv) For $g_1\in G_B$ and $g_2\in G_B$, we have
$$\langle Z^Tg_1, Z^Tg_2\rangle_{V_n} = \gamma_1^T\Big\{\frac{1}{n}\sum_{i=1}^n W_i^TV_i^{-1}W_i\Big\}\gamma_2 = \gamma_1^T\Delta_{Vn}\gamma_2 \quad(\text{say}),$$
where $\Delta_{Vn}$ is a $qK_n\times qK_n$ matrix and $\gamma_1$ and $\gamma_2$ correspond to $g_1$ and $g_2$, respectively. The elements of $n^{-1}\sum_{i=1}^n W_i^TV_i^{-1}W_i$ are written as
$$\frac{1}{n}\sum_{i=1}^n\sum_{j_1,j_2} v_i^{j_1j_2}B_{k_1}(T_{ij_1})B_{k_2}(T_{ij_2})Z_{ij_1l_1}Z_{ij_2l_2} = \Delta_{Vn}^{(k_1,l_1,k_2,l_2)} \quad(\text{say}), \tag{S.5}$$
where $v_i^{j_1j_2}$ is defined in (??), $1\le k_1,k_2\le K_n$, and $1\le l_1,l_2\le q$. By evaluating the variance of (S.5) and using the Bernstein inequality for independent bounded random variables together with Assumptions A1 and A2, we have uniformly in $k_1$, $k_2$, $l_1$, and $l_2$,
$$\Delta_{Vn}^{(k_1,l_1,k_2,l_2)} - E(\Delta_{Vn}^{(k_1,l_1,k_2,l_2)}) = O_p\Big(\sqrt{\frac{\log n}{nK_n^2}}\Big) \quad\text{if } B_{k_1}(t)B_{k_2}(t)\equiv0 \tag{S.6}$$
and
$$\Delta_{Vn}^{(k_1,l_1,k_2,l_2)} - E(\Delta_{Vn}^{(k_1,l_1,k_2,l_2)}) = O_p\Big(\sqrt{\frac{\log n}{nK_n}}\Big) \quad\text{if } B_{k_1}(t)B_{k_2}(t)\not\equiv0. \tag{S.7}$$
By exploiting (S.6), (S.7), and the local property of the B-spline basis, we obtain
$$\max\{|\lambda_{\min}(\Delta_{Vn} - E(\Delta_{Vn}))|,\ |\lambda_{\max}(\Delta_{Vn} - E(\Delta_{Vn}))|\} = O_p\Big(\sqrt{\frac{\log n}{n}}\Big). \tag{S.8}$$
We also have
$$\frac{C_1}{K_n} \le \lambda_{\min}(E(\Delta_{Vn})) \le \lambda_{\max}(E(\Delta_{Vn})) \le \frac{C_2}{K_n} \tag{S.9}$$
since Assumptions A2 and A3 yield
$$\frac{C_3}{n}\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i}(Z_{ij}\otimes B(T_{ij}))^T(Z_{ij}\otimes B(T_{ij})) \le \Delta_{Vn} \le \frac{C_4}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}(Z_{ij}\otimes B(T_{ij}))^T(Z_{ij}\otimes B(T_{ij}))$$
for some positive constants $C_3$ and $C_4$. See the proof of Lemma A.3 of [12]. Hence the desired result follows from (S.8) and (S.9).
(v) This follows from (iv) and (vi).
(vi) Using Assumptions A1 and A2 we have
$$\langle\delta_n, Z_lB_k\rangle_{V_n} = \frac{1}{n}\sum_{i=1}^n\sum_{j_1,j_2}\delta_{n,ij_1}v_i^{j_1j_2}Z_{ij_2l}B_k(T_{ij_2})$$
and
$$\operatorname{Var}(\langle\delta_n, Z_lB_k\rangle_{V_n}) \le \frac{C_1\|\delta_n\|_\infty^2}{n^2}\sum_{i=1}^n m_i^2\sum_{j_1,j_2}E\{B_k^2(T_{ij_1})B_k^2(T_{ij_2})\} \le \frac{C_2\|\delta_n\|_\infty^2}{nK_n}$$
for some positive constants $C_1$ and $C_2$. Hence we have
$$\sum_{l=1}^q\sum_{k=1}^{K_n}\operatorname{Var}(\langle\delta_n, Z_lB_k\rangle_{V_n}) \le \frac{C\|\delta_n\|_\infty^2}{n}$$
for some positive constant $C$, and the desired result follows from (S.9).
(vii) Take $\tilde\varphi_{Vk}\in G_B$ such that $\|\tilde\varphi_{Vk} - \varphi^*_{Vk}\|_{G,\infty} = O(K_n^{-2})$. Then we have for some positive $C$,
$$\|Z^T(\varphi_{Vk} - \varphi^*_{Vk})\|_\infty \le \|Z^T(\varphi_{Vk} - \tilde\varphi_{Vk})\|_\infty + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty$$
$$\le C\sqrt{K_n}\,\|Z^T(\varphi_{Vk} - \tilde\varphi_{Vk})\|_V + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty$$
$$\le C\sqrt{K_n}\,\|Z^T(\varphi^*_{Vk} - \tilde\varphi_{Vk})\|_V + \|Z^T(\tilde\varphi_{Vk} - \varphi^*_{Vk})\|_\infty = O(K_n^{-3/2}). \tag{S.10}$$
Here we used the fact that $\varphi_{Vk} = \Pi_{Vn}X_k\in G_B$ and $\varphi^*_{Vk} = \Pi_VX_k$. Inequality (S.10) implies $\|Z^T\varphi_{Vk}\|_\infty = O(1)$, and we only have to evaluate $Z^T(\varphi_{Vk} - \hat\varphi_{Vk})$. We can just follow the arguments on p.16 of [3], replacing $\varphi^*_{k,n}$ and $\hat\varphi_{k,n}$ there with $Z^T\varphi_{Vk}$ and $Z^T\hat\varphi_{Vk}$, since those arguments employ (iv) and (vi) and do not depend on $m_i$. Then we have
$$\|Z^T(\varphi_{Vk} - \hat\varphi_{Vk})\|_\infty = o_p(1),\quad \|Z^T(\varphi_{Vk} - \hat\varphi_{Vk})\|_{V_n} = O_p(\sqrt{K_n/n}),\quad \|Z^T(\varphi_{Vk} - \hat\varphi_{Vk})\|_V = O_p(\sqrt{K_n/n}).$$
The desired results follow from the above equations and (S.10).
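Lemma 1 (iv) says the eigenvalues of $E(\Delta_{Vn})$ are of exact order $K_n^{-1}$, a consequence of the local support of the B-spline basis. The toy check below illustrates the same scaling with degree-one B-splines (hat functions), which share that local support; the basis, sample size, and bounds checked are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def hat_basis(t, K):
    """K piecewise-linear hat functions (degree-1 B-splines on a uniform
    grid over [0, 1]); returns the (len(t), K) design matrix."""
    centers = np.linspace(0.0, 1.0, K)
    width = centers[1] - centers[0]
    return np.maximum(1.0 - np.abs(t[:, None] - centers[None, :]) / width, 0.0)

rng = np.random.default_rng(1)
t = rng.uniform(0.0, 1.0, 100_000)
for K in (10, 20, 40):
    B = hat_basis(t, K)
    gram = B.T @ B / len(t)          # Monte Carlo estimate of E{B(T)B(T)^T}
    vals = np.linalg.eigvalsh(gram)
    # K * eigenvalues stay within fixed positive bounds as K grows
    print(K, round(K * vals.min(), 3), round(K * vals.max(), 3))
```

The printed products stay bounded away from zero and infinity as K grows, mirroring the $C_1/K_n \le \lambda \le C_2/K_n$ sandwich in (S.9).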
S.3. Proof of Proposition 4. In the proof, we repeatedly use arguments based on exponential inequalities, truncation, and division of regions into small rectangles to prove uniform convergence results as in [S3]. We do not give the details of these arguments since they are standard in nonparametric kernel methods. Since we impose Assumption A2 and do not use $\Sigma_i$ or $V_i$ in the construction of $\hat g(t)$, $\hat\sigma^2(t)$, and $\hat\sigma(s,t)$, we see the effects of diverging $m_i$ explicitly only when applying the exponential inequality for generalized U-statistics. Recall that we assume three times continuous differentiability of the relevant functions in this proposition.

The proof consists of four parts: (i) representation of $\hat g(t)$, (ii) representation of $\hat\epsilon_{ij}$, (iii) representation of $\hat\sigma^2(t)$, and (iv) representation of $\hat\sigma(s,t)$.

(i) Representation of $\hat g(t)$. Applying the third order Taylor series expansion to $g_0(t)$, we have
$$Z_{ij}^Tg_0(T_{ij}) = Z_{ij}^T\Big\{g_0(t) + h_1\frac{T_{ij}-t}{h_1}g_0'(t) + \frac{h_1^2}{2}\Big(\frac{T_{ij}-t}{h_1}\Big)^2g_0''(t)\Big\} + O(h_1^3), \tag{S.11}$$
where $g_0'(t) = (g_{01}'(t),\ldots,g_{0q}'(t))^T$ and $g_0''(t) = (g_{01}''(t),\ldots,g_{0q}''(t))^T$. By plugging (S.11) into (3.2), we have uniformly in $t$,
$$\hat g(t) = g_0(t) + D_q(\hat L_1(t))^{-1}\hat L_2(t)(\beta_0 - \hat\beta_I) + \frac{h_1^2}{2}D_q(\hat L_1(t))^{-1}\hat L_3(t)g_0''(t) + D_q(\hat L_1(t))^{-1}E_0(t) + O_p(h_1^3), \tag{S.12}$$
where $\hat L_1(t) = A_{1n}(t)$ defined after (3.2),
$$\hat L_2(t) = \frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}\Big\{Z_{ij}\otimes\begin{pmatrix}1\\ \frac{T_{ij}-t}{h_1}\end{pmatrix}\Big\}K\Big(\frac{T_{ij}-t}{h_1}\Big)X_{ij}^T,$$
$$\hat L_3(t) = \frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}(Z_{ij}Z_{ij}^T)\otimes\begin{pmatrix}\big(\frac{T_{ij}-t}{h_1}\big)^2\\ \big(\frac{T_{ij}-t}{h_1}\big)^3\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_1}\Big),$$
$$E_0(t) = \frac{1}{N_1h_1}\sum_{i=1}^n\sum_{j=1}^{m_i}\Big\{Z_{ij}\otimes\begin{pmatrix}1\\ \frac{T_{ij}-t}{h_1}\end{pmatrix}\Big\}K\Big(\frac{T_{ij}-t}{h_1}\Big)\epsilon_{ij}.$$
By following standard arguments such as those in [S3], we obtain for $j = 1,2,3$,
$$\hat L_j(t) = L_j(t) + O_p\Big(\sqrt{\frac{\log n}{nh_1}}\Big) \quad\text{uniformly in }t, \tag{S.13}$$
where $L_j(t) = E\{\hat L_j(t)\}$, and
$$E_0(t) = O_p\Big(\sqrt{\frac{\log n}{nh_1}}\Big) \quad\text{uniformly in }t. \tag{S.14}$$
Assumption A2 implies that
$$C_1I_{2q} \le L_1(t) \le C_2I_{2q} \tag{S.15}$$
for some positive constants $C_1$ and $C_2$. From (S.12)-(S.15), we have uniformly in $t$,
$$\hat g(t) = g_0(t) + D_q(L_1(t))^{-1}L_2(t)(\beta_0 - \hat\beta_I) + \frac{h_1^2}{2}D_q(L_1(t))^{-1}L_3(t)g_0''(t) + D_q(L_1(t))^{-1}E_0(t) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big)$$
$$= g_0(t) + L_4(t)(\beta_0 - \hat\beta_I) + h_1^2L_5(t)g_0''(t) + L_6(t)E_0(t) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \quad(\text{say}). \tag{S.16}$$
Note that all the elements of $L_j(t)$, $j = 4,5,6$, are bounded functions of $t$.

(ii) Representation of $\hat\epsilon_{ij}$. We have
$$\hat\epsilon_{ij} = \epsilon_{ij} + X_{ij}^T(\beta_0 - \hat\beta_I) + Z_{ij}^T(g_0(T_{ij}) - \hat g(T_{ij})).$$
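The expansion behind (S.12) is the standard local linear one: at each point $t$ the estimator solves a small weighted least squares problem whose design contains the intercept and the scaled deviation $(T_{ij}-t)/h_1$. A minimal scalar sketch (pooled data, $q = 1$, $Z \equiv 1$, so this reduces to ordinary local linear regression rather than the paper's multivariate version; the Epanechnikov kernel and toy data are assumptions):

```python
import numpy as np

def local_linear(t0, t, y, h):
    """Pooled local linear estimate of g(t0) from scattered (t, y) pairs,
    solving the 2x2 weighted least squares system at the point t0."""
    u = (t - t0) / h
    w = np.maximum(1.0 - u**2, 0.0) * 0.75        # Epanechnikov kernel weights
    s0, s1, s2 = w.sum(), (w * u).sum(), (w * u**2).sum()
    r0, r1 = (w * y).sum(), (w * u * y).sum()
    # Solve [s0 s1; s1 s2][a; b] = [r0; r1]; `a` is the level estimate g(t0).
    return (s2 * r0 - s1 * r1) / (s0 * s2 - s1**2)

rng = np.random.default_rng(2)
t = rng.uniform(0.0, 1.0, 5000)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(5000)
est = local_linear(0.5, t, y, h=0.08)
print(abs(est - np.sin(np.pi)))   # close to 0
```

The residuals $\hat\epsilon_{ij} = y_{ij} - \hat g(t_{ij})$ obtained from such fits are what enter the variance and covariance smoothing steps below.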
By plugging (S.16) into the above equality, we obtain uniformly in $i$ and $j$,
$$\hat\epsilon_{ij} = \epsilon_{ij} + (X_{ij}^T - Z_{ij}^TL_4(T_{ij}))(\beta_0 - \hat\beta_I) - h_1^2Z_{ij}^TL_5(T_{ij})g_0''(T_{ij}) - Z_{ij}^TL_6(T_{ij})E_0(T_{ij}) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big)$$
$$= \epsilon_{ij} + M_{ij}^{(1)}(\beta_0 - \hat\beta_I) + h_1^2M_{ij}^{(2)}g_0''(T_{ij}) + M_{ij}^{(3)}E_0(T_{ij}) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big) \quad(\text{say}). \tag{S.17}$$
Note that all the elements of $M_{ij}^{(1)}$, $M_{ij}^{(2)}$, and $M_{ij}^{(3)}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$.

(iii) Representation of $\hat\sigma^2(t)$. We have uniformly in $i$ and $j$,
$$(\hat\epsilon_{ij})^2 = \epsilon_{ij}^2 - \sigma^2(T_{ij}) + \sigma^2(T_{ij}) + 2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij}) + 2\epsilon_{ij}M_{ij}^{(1)}(\beta_0 - \hat\beta_I) + 2\epsilon_{ij}h_1^2M_{ij}^{(2)}g_0''(T_{ij}) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big). \tag{S.18}$$
Recall that $M_{ij}^{(l)}$, $l = 1,2,3$, are defined in (S.17). It is easy to see that the contributions of $2\epsilon_{ij}M_{ij}^{(1)}(\beta_0 - \hat\beta_I)$ and $2\epsilon_{ij}h_1^2M_{ij}^{(2)}g_0''(T_{ij})$ to $\hat\sigma^2(t)$ are
$$O_p\Big(\frac{1}{\sqrt n}\sqrt{\frac{\log n}{nh_2}}\Big) \quad\text{and}\quad O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_2}}\Big)$$
uniformly in $t$, respectively. Thus we have only to consider $\epsilon_{ij}^2 - \sigma^2(T_{ij})$, $\sigma^2(T_{ij})$, and $2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij})$ in (S.18).
Setting $\hat L_7(t) = A_{2n}(t)$, which is defined after (3.3), we have for some positive constants $C_1$ and $C_2$,
$$\hat L_7(t) = L_7(t) + O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big) \quad\text{and}\quad C_1I_2 \le L_7(t) \le C_2I_2 \tag{S.19}$$
uniformly in $t$, where $L_7(t) = E\{\hat L_7(t)\}$. Now we have uniformly in $t$,
$$\hat\sigma^2(t) = (1\ 0)(\hat L_7(t))^{-1}(E_1(t) + \operatorname{Bias}_1(t) + R_1(t)) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big), \tag{S.20}$$
where $E_1(t)$ is defined in Proposition 4, $\operatorname{Bias}_1(t)$ is the term coming from $\sigma^2(T_{ij})$, and $R_1(t)$ is the term coming from $2\epsilon_{ij}M_{ij}^{(3)}E_0(T_{ij})$. It is easy to see that uniformly in $t$,
$$E_1(t) = O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big). \tag{S.21}$$
By applying the Taylor series expansion, we have
$$\sigma^2(T_{ij}) = \sigma^2(t) + h_2(\sigma^2)'(t)\frac{T_{ij}-t}{h_2} + \frac{h_2^2}{2}(\sigma^2)''(t)\Big(\frac{T_{ij}-t}{h_2}\Big)^2 + O(h_2^3).$$
Therefore $\operatorname{Bias}_1(t)$ can be represented as
$$\operatorname{Bias}_1(t) = \hat L_7(t)\begin{pmatrix}\sigma^2(t)\\ h_2(\sigma^2)'(t)\end{pmatrix} + \frac{h_2^2(\sigma^2)''(t)}{2N_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}\big(\frac{T_{ij}-t}{h_2}\big)^2\\ \big(\frac{T_{ij}-t}{h_2}\big)^3\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_2}\Big) + O_p(h_2^3)$$
uniformly in $t$. Setting
$$\hat L_8(t) = \frac{1}{N_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\begin{pmatrix}\big(\frac{T_{ij}-t}{h_2}\big)^2\\ \big(\frac{T_{ij}-t}{h_2}\big)^3\end{pmatrix}K\Big(\frac{T_{ij}-t}{h_2}\Big),$$
we have uniformly in $t$,
$$\hat L_8(t) = L_8(t) + O_p\Big(\sqrt{\frac{\log n}{nh_2}}\Big),$$
where $L_8(t) = E\{\hat L_8(t)\}$ is a bounded vector function of $t$. Hence we have uniformly in $t$,
$$\operatorname{Bias}_1(t) = \hat L_7(t)\begin{pmatrix}\sigma^2(t)\\ h_2(\sigma^2)'(t)\end{pmatrix} + \frac{h_2^2(\sigma^2)''(t)}{2}L_8(t) + O_p(h_2^3) + O_p\Big(h_2^2\sqrt{\frac{\log n}{nh_2}}\Big). \tag{S.22}$$
Next we deal with $R_1(t)$, which can be written as
$$\sum_{a,b}\frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\sum_{i'=1}^n\sum_{j'=1}^{m_{i'}}\epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{i'j'}-T_{ij}}{h_1}\Big), \tag{S.23}$$
where $K_l(t) = t^lK(t)$, $a = 0,1$, and $b = 0,1$. Note that $A_{ab,ij}$ and $B_{ab,ij}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$. We evaluate
$$\frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\sum_{i'=1}^n\sum_{j'=1}^{m_{i'}}\epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{i'j'}-T_{ij}}{h_1}\Big) \tag{S.24}$$
$$= \frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j=1}^{m_i}\epsilon_{ij}^2A_{ab,ij}B_{ab,ij}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b(0)$$
$$\quad+ \frac{1}{N_1^2h_1h_2}\sum_{i=1}^n\sum_{j\ne j'}\epsilon_{ij}\epsilon_{ij'}A_{ab,ij}B_{ab,ij'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{ij'}-T_{ij}}{h_1}\Big)$$
$$\quad+ \frac{1}{N_1^2h_1h_2}\sum_{i\ne i'}\sum_{j,j'}\epsilon_{ij}\epsilon_{i'j'}A_{ab,ij}B_{ab,i'j'}K_a\Big(\frac{T_{ij}-t}{h_2}\Big)K_b\Big(\frac{T_{i'j'}-T_{ij}}{h_1}\Big)$$
$$= R_{1ab}^{(1)}(t) + R_{1ab}^{(2)}(t) + R_{1ab}^{(3)}(t) \quad(\text{say}).$$
Note that we cannot apply classical exponential inequalities for U-statistics here since the kernel functions depend on $i$ and $i'$ and the observations are not identically distributed. It is easy to see that uniformly in $t$,
$$R_{1ab}^{(1)}(t) = O_p((nh_1)^{-1}) \quad\text{and}\quad R_{1ab}^{(2)}(t) = O_p(n^{-1}). \tag{S.25}$$
We evaluate $R_{1ab}^{(3)}(t)$ by using an exponential inequality such as the one given in (3.5) of [S1] with
$$A = C_1\frac{(\log n)^km_{\max}^2}{n^2h_1h_2},\quad B^2 = C_2\frac{(\log n)^{2k}m_{\max}^2}{n^3h_1h_2}(h_1^{-1} + h_2^{-1}),\quad C = \frac{C_3}{nh_1^{1/2}h_2^{1/2}},\quad\text{and}\quad x = \frac{M\log n}{nh_1^{1/2}h_2^{1/2}}$$
in the inequality, together with standard arguments in nonparametric regression as in [S3]. Note that we used a kind of truncation technique to handle $\epsilon_{ij}$ and that we have to take sufficiently large $k$ and $M$ here. Hence we have
$$R_{1ab}^{(3)}(t) = O_p\Big(\frac{\log n}{n(h_1h_2)^{1/2}}\Big).$$
The above equation and (S.23)-(S.25) imply that
$$R_1(t) = O_p\Big(\frac{1}{nh_1}\Big) + O_p\Big(\frac{\log n}{n(h_1h_2)^{1/2}}\Big) \tag{S.26}$$
uniformly in $t$. It follows from (S.19)-(S.22) and (S.26) that
$$\hat\sigma^2(t) = \sigma^2(t) + (1\ 0)(L_7(t))^{-1}E_1(t) + \frac{h_2^2}{2}(1\ 0)(L_7(t))^{-1}L_8(t)(\sigma^2)''(t) + O_p(h_1^3) + O_p(h_2^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(\frac{\log n}{nh_2}\Big).$$
The expression for $\hat\sigma^2(t)$ in Proposition 4 also follows from the above expression.

(iv) Representation of $\hat\sigma(s,t)$. We can proceed almost in the same way as when we dealt with $\hat\sigma^2(t)$. First we have uniformly in $i$, $j$, and $j'$,
$$\hat\epsilon_{ij}\hat\epsilon_{ij'} = \epsilon_{ij}\epsilon_{ij'} - \sigma(T_{ij},T_{ij'}) + \sigma(T_{ij},T_{ij'}) + \epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})$$
$$\quad+ \epsilon_{ij}M_{ij'}^{(1)}(\beta_0 - \hat\beta_I) + \epsilon_{ij'}M_{ij}^{(1)}(\beta_0 - \hat\beta_I) \tag{S.27}$$
$$\quad+ \epsilon_{ij}h_1^2M_{ij'}^{(2)}g_0''(T_{ij'}) + \epsilon_{ij'}h_1^2M_{ij}^{(2)}g_0''(T_{ij}) \tag{S.28}$$
$$\quad+ O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big).$$
It is easy to see that the contributions of (S.27) and (S.28) to $\hat\sigma(s,t)$ are
$$O_p\Big(\frac{1}{\sqrt n}\sqrt{\frac{\log n}{nh_3^2}}\Big) \quad\text{and}\quad O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_3^2}}\Big)$$
uniformly in $s$ and $t$, respectively. Therefore we have only to consider $\epsilon_{ij}\epsilon_{ij'} - \sigma(T_{ij},T_{ij'})$, $\sigma(T_{ij},T_{ij'})$, and $\epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})$ in $\hat\epsilon_{ij}\hat\epsilon_{ij'}$.
Setting $\hat L_9(s,t) = A_{3n}(s,t)$, which is defined after (3.4), we have for some positive constants $C_1$ and $C_2$,
$$\hat L_9(s,t) = L_9(s,t) + O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big) \quad\text{and}\quad C_1I_3 \le L_9(s,t) \le C_2I_3 \tag{S.29}$$
uniformly in $s$ and $t$, where $L_9(s,t) = E\{\hat L_9(s,t)\}$. Now we have uniformly in $s$ and $t$,
$$\hat\sigma(s,t) = (1\ 0\ 0)(\hat L_9(s,t))^{-1}(E_2(s,t) + \operatorname{Bias}_2(s,t) + R_2(s,t)) + O_p(h_1^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(h_1^2\sqrt{\frac{\log n}{nh_1}}\Big), \tag{S.30}$$
where $E_2(s,t)$ is defined in Proposition 4, $\operatorname{Bias}_2(s,t)$ is the term coming from $\sigma(T_{ij},T_{ij'})$, and $R_2(s,t)$ is the term coming from $\epsilon_{ij}M_{ij'}^{(3)}E_0(T_{ij'}) + \epsilon_{ij'}M_{ij}^{(3)}E_0(T_{ij})$. It is easy to see that uniformly in $s$ and $t$,
$$E_2(s,t) = O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big). \tag{S.31}$$
Setting
$$\hat L_{10}(s,t) = \frac{1}{N_2h_3^2}\sum_{i=1}^n\sum_{j\ne j'}\begin{pmatrix}\big(\frac{T_{ij}-s}{h_3}\big)^2\\ \frac{2(T_{ij}-s)(T_{ij'}-t)}{h_3^2}\\ \big(\frac{T_{ij'}-t}{h_3}\big)^2\end{pmatrix}K\Big(\frac{T_{ij}-s}{h_3}\Big)K\Big(\frac{T_{ij'}-t}{h_3}\Big),$$
we have uniformly in $s$ and $t$,
$$\hat L_{10}(s,t) = L_{10}(s,t) + O_p\Big(\sqrt{\frac{\log n}{nh_3^2}}\Big),$$
where $L_{10}(s,t) = E\{\hat L_{10}(s,t)\}$ is a bounded matrix function of $(s,t)$. Then we have, as in the proof of the representation of $\hat\sigma^2(t)$, uniformly in $s$ and $t$,
$$\operatorname{Bias}_2(s,t) = \hat L_9(s,t)\begin{pmatrix}\sigma(s,t)\\ h_3\frac{\partial\sigma}{\partial s}(s,t)\\ h_3\frac{\partial\sigma}{\partial t}(s,t)\end{pmatrix} + \frac{h_3^2}{2}L_{10}(s,t)\begin{pmatrix}\frac{\partial^2\sigma}{\partial s^2}(s,t)\\ \frac{\partial^2\sigma}{\partial s\partial t}(s,t)\\ \frac{\partial^2\sigma}{\partial t^2}(s,t)\end{pmatrix} + O_p(h_3^3) + O_p\Big(h_3^2\sqrt{\frac{\log n}{nh_3^2}}\Big). \tag{S.32}$$
Finally we deal with $R_2(s,t)$ in the same way as in the proof of the representation of $\hat\sigma^2(t)$, using the same exponential inequality for U-statistics. We should consider
$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1=1}^n\sum_{i_2=1}^n\sum_{j_1\ne j_2}\sum_{j_3}\epsilon_{i_1j_1}\epsilon_{i_2j_3}A_{abc,i_1j_2}B_{abc,i_2j_3}K_a\Big(\frac{T_{i_1j_2}-s}{h_3}\Big)K_b\Big(\frac{T_{i_1j_1}-t}{h_3}\Big)K_c\Big(\frac{T_{i_2j_3}-T_{i_1j_2}}{h_1}\Big), \tag{S.33}$$
where $K_l(t) = t^lK(t)$, $a = 0,1$, $b = 0,1$, and $c = 0,1$. Note that $A_{abc,ij}$ and $B_{abc,ij}$ are uniformly bounded functions of $X_{ij}$, $Z_{ij}$, and $T_{ij}$. This is a generalized U-statistic when we remove the summands with $i_1 = i_2$, and we recall (1.1) when we evaluate (S.33). It is easy to see that uniformly in $s$ and $t$,
$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1=1}^n\sum_{j_1\ne j_2}\sum_{j_3}\epsilon_{i_1j_1}\epsilon_{i_1j_3}A_{abc,i_1j_2}B_{abc,i_1j_3}K_a\Big(\frac{T_{i_1j_2}-s}{h_3}\Big)K_b\Big(\frac{T_{i_1j_1}-t}{h_3}\Big)K_c\Big(\frac{T_{i_1j_3}-T_{i_1j_2}}{h_1}\Big) = O_p\Big(\frac{1}{nh_1}\Big). \tag{S.34}$$
In the same way as when dealing with $R_{1ab}^{(3)}(t)$, we obtain
$$\frac{1}{N_1N_2h_1h_3^2}\sum_{i_1\ne i_2}\sum_{j_1\ne j_2}\sum_{j_3}\epsilon_{i_1j_1}\epsilon_{i_2j_3}A_{abc,i_1j_2}B_{abc,i_2j_3}K_a\Big(\frac{T_{i_1j_2}-s}{h_3}\Big)K_b\Big(\frac{T_{i_1j_1}-t}{h_3}\Big)K_c\Big(\frac{T_{i_2j_3}-T_{i_1j_2}}{h_1}\Big) = O_p\Big(\frac{\log n}{nh_1^{1/2}h_3}\Big) \tag{S.35}$$
with $A = C_1(\log n)^km_{\max}^3/(n^2h_1h_3^2)$, $B = C_2(\log n)^km_{\max}^2/(n^{3/2}h_1^{1/2}h_3^2)$, $C = C_3/(nh_1^{1/2}h_3)$, and $x = M\log n/(nh_1^{1/2}h_3)$ in the exponential inequality. Note that we should choose sufficiently large $k$ and $M$. It follows from (S.34) and (S.35) that uniformly in $s$ and $t$,
$$R_2(s,t) = O_p\Big(\frac{\log n}{nh_1^{1/2}h_3}\Big). \tag{S.36}$$
Note that we cannot relax the assumption $m_{\max} = O(n^{1/8})$ in Assumption A1 when we derive (S.36). It follows from (S.29)-(S.32) and (S.36) that uniformly in $s$ and $t$,
$$\hat\sigma(s,t) - \sigma(s,t) = (1\ 0\ 0)(L_9(s,t))^{-1}E_2(s,t) + \frac{h_3^2}{2}(1\ 0\ 0)(L_9(s,t))^{-1}L_{10}(s,t)\begin{pmatrix}\frac{\partial^2\sigma}{\partial s^2}(s,t)\\ \frac{\partial^2\sigma}{\partial s\partial t}(s,t)\\ \frac{\partial^2\sigma}{\partial t^2}(s,t)\end{pmatrix} + O_p(h_1^3) + O_p(h_3^3) + O_p\Big(\frac{\log n}{nh_1}\Big) + O_p\Big(\frac{\log n}{nh_3^2}\Big).$$
The expression given in Proposition 4 follows from the above expression.
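The covariance smoother studied in part (iv) pools within-subject residual cross products over off-diagonal time pairs. The sketch below is a local constant (Nadaraya-Watson) version of that idea, simpler than the paper's local linear smoother in (3.4); the Gaussian kernel and the random-intercept toy model are assumptions.

```python
import numpy as np

def cov_surface(s, t, subjects, h3):
    """Local constant estimate of sigma(s, t) from within-subject residual
    cross products; only pairs j != j' enter, since diagonal pairs carry
    the variance sigma^2(t) rather than the covariance."""
    num = den = 0.0
    for times, resid in subjects:
        for j in range(len(times)):
            for k in range(len(times)):
                if j == k:
                    continue
                w = np.exp(-0.5 * (((times[j] - s) / h3) ** 2
                                   + ((times[k] - t) / h3) ** 2))
                num += w * resid[j] * resid[k]
                den += w
    return num / den

# Random-intercept toy model: sigma(s, t) = Var(a_i) = 1 for s != t.
rng = np.random.default_rng(4)
subjects = []
for _ in range(500):
    times = rng.uniform(0.0, 1.0, 4)
    a = rng.standard_normal()
    subjects.append((times, a + 0.3 * rng.standard_normal(4)))
print(cov_surface(0.3, 0.7, subjects, h3=0.3))  # roughly 1
```

Excluding the diagonal pairs is the same design choice as in the paper: it keeps the measurement-error variance out of the estimated covariance surface.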
S.4. Proofs of Lemmas 2-8. First we state some results on $\hat\Sigma_i$. Set
$$\delta_n = h_2^2 + h_3^2 + \sqrt{\frac{\log n}{nh_2}} + \sqrt{\frac{\log n}{nh_3^2}}. \tag{S.37}$$
Then we have from Proposition 4 that uniformly in $i$,
$$\max\{|\lambda_{\min}(\Sigma_i - \hat\Sigma_i)|,\ |\lambda_{\max}(\Sigma_i - \hat\Sigma_i)|\} = O_p(m_i\delta_n).$$
Recall that
$$\hat\Sigma_i^{-1} - \Sigma_i^{-1} = \hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1} = \Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1} + \hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}.$$
We have from Assumption A4 and Proposition 4 that uniformly in $i$,
$$|\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}|_{\max} = O_p(m_i\delta_n), \tag{S.38}$$
$$|\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}|_{\max} = O_p(m_i^2\delta_n^2), \tag{S.39}$$
where $|A|_{\max} = \max_{i,j}|a_{ij}|$ for any matrix $A = (a_{ij})$. Besides, it follows from Assumption A4 that we have uniformly in $i$,
$$\max\{|\lambda_{\min}(\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1})|,\ |\lambda_{\max}(\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1})|\} = O_p(m_i\delta_n). \tag{S.40}$$
We also have the same result as (S.40) for $\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}$ with $m_i\delta_n$ replaced by $(m_i\delta_n)^2$. Proposition 4 also implies that each element of $\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}$ has the form
$$D_i^{(1)}(T_i)h_2^2 + D_i^{(2)}(T_i)h_3^2 + \sum_{j=1}^{m_i}D_{ij}^{(3)}(T_i)E_1(T_{ij}) + \sum_{j\ne j'}D_{ijj'}^{(4)}(T_i)E_2(T_{ij},T_{ij'}) + D_i^{(5)}, \tag{S.41}$$
where
$$D_i^{(5)} = m_iO_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big)$$
uniformly in $i$. We state the following two useful facts before we start proving Lemmas 2-8; both hold uniformly in $l$:
$$\frac{1}{n}\sum_{i=1}^n m_i^3\sum_{j=1}^{m_i}|W_{ijl}| = O_p(K_n^{-1}) \tag{S.42}$$
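The expansion of $\hat\Sigma_i^{-1} - \Sigma_i^{-1}$ recalled at the start of this section is an exact algebraic identity, not an approximation; (S.38) and (S.39) simply bound its two terms. A quick numerical check, with arbitrary well-conditioned symmetric matrices as assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
sigma = A @ A.T + 5.0 * np.eye(5)                 # "true" covariance, well-conditioned
sigma_hat = sigma + 0.1 * rng.standard_normal((5, 5))
sigma_hat = (sigma_hat + sigma_hat.T) / 2.0       # keep the estimate symmetric

si, shi = np.linalg.inv(sigma), np.linalg.inv(sigma_hat)
diff = sigma - sigma_hat
lhs = shi - si
# First term is (S.38)'s matrix, second is (S.39)'s matrix.
rhs = si @ diff @ si + shi @ diff @ si @ diff @ si
print(np.abs(lhs - rhs).max())  # numerically zero: the identity is exact
```

Because the identity is exact, the only stochastic work in (S.38)-(S.39) is controlling the size of $\Sigma_i - \hat\Sigma_i$, which Proposition 4 supplies.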
and
$$\frac{1}{n}\sum_{i=1}^n m_i^2\sum_{j_1=1}^{m_i}\sum_{j_2=1}^{m_i}|W_{ij_1l}||\epsilon_{ij_2}| = O_p(K_n^{-1}), \tag{S.43}$$
where $W_{ijl}$ denotes the $l$th element of $W_{ij}$. We can prove them in the same way, except that we need a kind of truncation argument when showing (S.43), and we outline the proof of (S.42) in the following.
To prove (S.42), we evaluate the expectation and variance and apply the Bernstein inequality. First note that we have uniformly in $l$,
$$E\Big\{\frac{1}{n}\sum_{i=1}^n m_i^3\sum_{j=1}^{m_i}|W_{ijl}|\Big\} = O(K_n^{-1}).$$
This follows from the local property of the B-spline basis and Assumption A2. In addition, since we have from Assumption A2 that
$$\frac{m_{\max}^2}{n^2}\sum_{i=1}^n m_i^4E\Big\{\sum_{j=1}^{m_i}|W_{ijl}|^2\Big\} + \frac{m_{\max}^3}{n^2}\sum_{i=1}^n m_i^3E\Big\{\sum_{j_1\ne j_2}|W_{ij_1l}||W_{ij_2l}|\Big\} = O\Big(\frac{m_{\max}^2}{nK_n} + \frac{m_{\max}^3}{nK_n^2}\Big),$$
the variance is bounded from above by $C_1n^{-19/20}$ uniformly in $l$. Each summand is bounded from above by $C_2m_{\max}^4/n = O(n^{-1/2})$. Hence (S.42) and the uniformity in $l$ follow from the Bernstein inequality.

Proof of Lemma 2. We can verify the result on $n^{-1}h_{12,kl}$ by using the local property of the B-spline basis and the Bernstein inequality for independent bounded random variables. Since
$$\frac{1}{n}(\hat H_{12} - H_{12}) = \frac{1}{n}\sum_{i=1}^n X_i^T\{\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i + \frac{1}{n}\sum_{i=1}^n X_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i,$$
the desired result on $n^{-1}(\hat h_{12,kl} - h_{12,kl})$ follows from (S.38), (S.39), and (S.42). The results on the Euclidean norm follow from those on the elements. Hence the proof is complete.

Proof of Lemma 3. We have from Assumption A4 that
$$\frac{C_1}{n}\sum_{i=1}^n\frac{1}{m_i}W_i^TW_i \le \frac{1}{n}H_{22} \le \frac{C_2}{n}\sum_{i=1}^n W_i^TW_i \tag{S.44}$$
for some positive constants $C_1$ and $C_2$, and for $k = 0,1$,
$$\frac{1}{n}\sum_{i=1}^n\frac{1}{m_i^k}W_i^TW_i = \frac{1}{n}\sum_{i=1}^n\frac{1}{m_i^k}\sum_{j=1}^{m_i}(Z_{ij}Z_{ij}^T)\otimes(B(T_{ij})B^T(T_{ij})).$$
Thus the first result follows from Assumptions A2 and A3 and the standard arguments on B-spline bases as in the proofs of Lemmas A.1 and A.2 of [12]. Since we have
$$\frac{1}{n}(\hat H_{22} - H_{22}) = \frac{1}{n}\sum_{i=1}^n W_i^T\{\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i + \frac{1}{n}\sum_{i=1}^n W_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}W_i,$$
the second result follows from (S.40), inequalities similar to (S.44), and Assumptions A2 and A3. The third result follows from the first and second results. Finally we deal with the fourth result. Note that
$$(n^{-1}\hat H_{22})^{-1} - (n^{-1}H_{22})^{-1} = (n^{-1}H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1} + (n^{-1}\hat H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1}(n^{-1}H_{22} - n^{-1}\hat H_{22})(n^{-1}H_{22})^{-1}. \tag{S.45}$$
By using the first, second, and third results and (S.45), we obtain the fourth one. Hence the proof is complete.

Proof of Lemma 4. The first result follows from (S.40). The second one follows from Lemmas 2 and 3. The last one follows from the first two.

Proof of Lemma 5. The first result follows from the fact that
$$\frac{C_1}{n}\sum_{i=1}^n\frac{1}{m_i}W_i^TW_i \le \frac{1}{n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}W_i \le \frac{C_2}{n}\sum_{i=1}^n W_i^TW_i$$
for some positive constants $C_1$ and $C_2$. Next note that
$$\frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\{\hat\Sigma_i^{-1} - \Sigma_i^{-1}\}\epsilon_i = \frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\{\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^n W_i^T\{\hat\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}(\Sigma_i - \hat\Sigma_i)\Sigma_i^{-1}\}\epsilon_i. \tag{S.46}$$
By employing (S.39) and (S.43), we can prove that the stochastic order of the elements of the second term on the right-hand side is uniformly $O_p(\sqrt nK_n^{-1}(h_2^4 + h_3^4 + \log n/(nh_2) + \log n/(nh_3^2)))$. Thus the norm of this $qK_n$-dimensional vector has the stochastic order
$$O_p\Big(\sqrt{\frac{n}{K_n}}\Big(h_2^4 + h_3^4 + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big)\Big). \tag{S.47}$$
According to Proposition 4, the first term on the right-hand side of (S.46) can be decomposed into
$$\frac{1}{\sqrt n}\sum_{i=1}^n W_i^TQ_{1i}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^n W_i^TQ_{2i}\epsilon_i + \frac{1}{\sqrt n}\sum_{i=1}^n W_i^TQ_{3i}\epsilon_i, \tag{S.48}$$
where $Q_{1i}$ corresponds to the first and second terms in (S.41), $Q_{2i}$ corresponds to the third and fourth terms in (S.41), and $Q_{3i}$ corresponds to the fifth term in (S.41). Proposition 4 implies
$$Q_{1i} = Q_{1i}^{(2)}h_2^2 + Q_{1i}^{(3)}h_3^2,$$
where we have for $s = 2,3$,
$$\max\{|\lambda_{\min}(Q_{1i}^{(s)})|,\ |\lambda_{\max}(Q_{1i}^{(s)})|\} = O(m_i)$$
uniformly in $i$. Besides, $Q_{1i}^{(s)}$ depends only on $T_i$ for $s = 2,3$. The $(k,l)$ element of $Q_{2i}$ has the form
$$\sum_{j=1}^{m_i}\sigma_i^{kj}\sigma_i^{lj}E_1(T_{ij}) + \sum_{j\ne j'}\sigma_i^{kj}\sigma_i^{lj'}E_2(T_{ij},T_{ij'}),$$
where $\Sigma_i^{-1} = (\sigma_i^{kl})$. Note that uniformly in $l$ and $i$,
$$\sum_{k=1}^{m_i}(\sigma_i^{kl})^2 = O(1).$$
Uniformly in $i$, the elements of $Q_{3i}$, i.e. $D_i^{(5)}$ in (S.41), have the order
$$m_iO_p\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big).$$
We can prove as in the proof of Lemma 3 that for $s = 2,3$,
$$\frac{C_1}{K_n}I_{qK_n} \le \operatorname{Cov}\Big(n^{-1/2}\sum_{i=1}^n W_i^TQ_{1i}^{(s)}\epsilon_i\Big) \le \frac{C_2}{K_n}I_{qK_n}$$
for some positive constants $C_1$ and $C_2$. Hence we have
$$\Big|n^{-1/2}\sum_{i=1}^n W_i^TQ_{1i}\epsilon_i\Big| = O_p(h_2^2 + h_3^2). \tag{S.49}$$
Similarly to the second term on the right-hand side of (S.46), we can demonstrate by using (S.43) that
$$\Big|n^{-1/2}\sum_{i=1}^n W_i^TQ_{3i}\epsilon_i\Big| = O_p\Big(\sqrt{\frac{n}{K_n}}\Big(h_1^3 + h_2^3 + h_3^3 + \frac{\log n}{nh_1} + \frac{\log n}{nh_2} + \frac{\log n}{nh_3^2}\Big)\Big). \tag{S.50}$$
Finally we evaluate the second term of (S.48), which has the structure of a V-statistic. By exploiting this structure, we evaluate the expectations and the variances of its elements using Assumption A2. Then we have
$$\Big|n^{-1/2}\sum_{i=1}^n W_i^TQ_{2i}\epsilon_i\Big| = O_p\Big(\frac{1}{\sqrt{nh_2}} + \frac{1}{\sqrt{nK_nh_2^2}} + \frac{1}{\sqrt{nK_nh_3^2}} + \frac{1}{\sqrt{nh_3}}\Big).$$
The second result follows from (S.47), (S.49), (S.50), and the above equality.

Proof of Lemma 6. This lemma can be proved in the same way as Lemma 5 and the details are omitted.

Proof of Lemma 7.
From the definition of $\gamma^*$ given after (5.5), we have
$$\max_{1\le j\le m_i}|W_{ij}^T\gamma^* - Z_{ij}^Tg_0(T_{ij})| = O_p(K_n^{-2})$$
uniformly in $i$. The above equality and (S.42) imply that the elements of
$$\frac{1}{n}\sum_{i=1}^n W_i^T\Sigma_i^{-1}(W_i\gamma^* - (Z^Tg_0)_i)$$
are uniformly $O_p(K_n^{-3})$, and the first result follows from this. As for the second result, first note that
$$|\hat\Sigma_i^{-1} - \Sigma_i^{-1}|_{\max} = O_p(m_i\delta_n)$$
uniformly in $i$ from (S.38) and (S.39). Recall that $\delta_n$ is defined in (S.37). Thus the elements of $W_i^T(\hat\Sigma_i^{-1} - \Sigma_i^{-1})(W_i\gamma^* - (Z^Tg_0)_i)$ are bounded uniformly in $l$ by
$$CK_n^{-2}\delta_nm_i^2\sum_{j=1}^{m_i}|W_{ijl}|$$
with probability tending to 1 for some positive constant $C$. Hence the second result follows from (S.42).

Proof of Lemma 8. This lemma can be proved in the same way as Lemma 7 and the details are omitted.
S.5. Theoretical results for general link functions. We state the results of Section 2 for general link functions when $m_i$ is uniformly bounded and $\epsilon_i$ satisfies the sub-Gaussian assumption, Assumption A6′ here. Note that we have no counterpart of Theorem 1 for general link functions even when $m_i$ is uniformly bounded.
Let $v_1$ and $v_2$ be two processes, each taking a scalar stochastic value at $T_{ij}$, $i = 1,\ldots,n$, $j = 1,\ldots,m_i$. Then we define two inner products of $v_1$ and $v_2$ by
$$\langle v_1,v_2\rangle_n^\Delta = \frac{1}{n}\sum_{i=1}^n v_{1i}^T\Delta_{0i}V_i^{-1}\Delta_{0i}v_{2i} \quad\text{and}\quad \langle v_1,v_2\rangle^\Delta = E\{\langle v_1,v_2\rangle_n^\Delta\},$$
where $v_{1i}$ and $v_{2i}$ are defined in the same way as $T_i$ and
$$\Delta_{0i} = \operatorname{diag}\big(\mu'(X_{i1}^T\beta_0 + Z_{i1}^Tg_0(T_{i1})),\ldots,\mu'(X_{im_i}^T\beta_0 + Z_{im_i}^Tg_0(T_{im_i}))\big).$$
The associated norms are then defined by
$$\|v\|_n^\Delta = (\langle v,v\rangle_n^\Delta)^{1/2} \quad\text{and}\quad \|v\|^\Delta = (\langle v,v\rangle^\Delta)^{1/2}.$$
We now define the projections, with respect to $\|\cdot\|^\Delta$, of the $k$th element of $X$ onto $Z^TG$ and $Z^TG_B$ by
$$\Pi_\Delta X_k = \operatorname*{argmin}_{g\in G}\|X_k - Z^Tg\|^\Delta \quad\text{and}\quad \Pi_{\Delta n}X_k = \operatorname*{argmin}_{g\in G_B}\|X_k - Z^Tg\|^\Delta,$$
where
$$(\|X_k - Z^Tg\|^\Delta)^2 = E\Big\{\frac{1}{n}\sum_{i=1}^n(X_{ik} - (Z^Tg)_i)^T\Delta_{0i}V_i^{-1}\Delta_{0i}(X_{ik} - (Z^Tg)_i)\Big\},$$
with $X_{ik} = (X_{i1k},\ldots,X_{im_ik})^T$ and $(Z^Tg)_i = (Z_{i1}^Tg(T_{i1}),\ldots,Z_{im_i}^Tg(T_{im_i}))^T$. We denote these projections by $\varphi^*_{\Delta k} = \Pi_\Delta X_k$ and $\varphi_{\Delta k} = \Pi_{\Delta n}X_k$, and define another one by
$$\hat\varphi_{\Delta k} = \hat\Pi_{\Delta n}X_k,$$
where
b ∆n Xk = argmin kXk − Z T gk∆ Π n. g∈GB
The arguments in Section 5.2 also apply to this ϕ∗∆k . Some matrices are necessary to present Proposition S.1 and we define them here. Let Pn Pn −1 −1 T T f = Pni=1 X i ∆0i Vi −1∆0i X i Pni=1 X i ∆0i Vi −1∆0i W i H T T i=1 W i ∆0i Vi ∆0i X i i=1 W i ∆0i Vi ∆0i W i ! f11 H f12 H = (say), f f H21 H22 f11·2 = H f11 − H f12 H f−1 H f21 , H 22
and
f11 = (H f11·2 )−1 . H
e V n be a p × p matrix whose (k, l)th element is Let Ω
n o 1X n E (X ik − (Z T ϕ∗∆k ) )T ∆0i Vi−1 ∆0i (X il − (Z T ϕ∗∆l ) ) . i i n i=1
f11·2 is an estimate of Ω e V n . We assume that there exists a Note that n−1 H e p × p positive definite matrix ΩV such that eV n = Ω eV . lim Ω
(S.51)
n→∞
We present Propositions S.1–S.3 before stating the assumptions for these propositions. By using Lemma S.1 we can prove Proposition S.1 based on the same arguments as those in [4].

Proposition S.1. (Asymptotic normality of $\widehat{\beta}_V$) Under Assumption S in Section 2 for the norm here, (S.51), and Assumptions A1′, A2′, A3, A4′, A5′, and A6′, we have
$$\widehat{\beta}_V=\beta_0+\widetilde{H}^{11}\sum_{i=1}^n(X_i-W_i\widetilde{H}_{22}^{-1}\widetilde{H}_{21})^T\Delta_{0i}V_i^{-1}\epsilon_i+o_p\Big(\frac{1}{\sqrt{n}}\Big).$$
We also have
$$\widetilde{\Gamma}_V^{-1/2}(\widehat{\beta}_V-\beta_0)\xrightarrow{d}N(0,I_p),$$
where $\widetilde{\Gamma}_V$ is
$$\widetilde{H}^{11}\Big\{\sum_{i=1}^n(X_i-W_i\widetilde{H}_{22}^{-1}\widetilde{H}_{21})^T\Delta_{0i}V_i^{-1}\Sigma_iV_i^{-1}\Delta_{0i}(X_i-W_i\widetilde{H}_{22}^{-1}\widetilde{H}_{21})\Big\}\widetilde{H}^{11}.$$
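The sandwich form of $\widetilde{\Gamma}_V$ above can be assembled directly from the cluster-level design matrices. The following sketch is our own illustration (the function and variable names are hypothetical, not from the authors' code); it builds the blocks of $\widetilde{H}$, forms $\widetilde{H}^{11}$, and returns the sandwich matrix:

```python
import numpy as np

def sandwich_gamma_v(X_list, W_list, D_list, V_list, S_list):
    """Illustrative computation of the sandwich matrix Gamma_V in
    Proposition S.1.  X_list[i]: (m_i, p) parametric design, W_list[i]:
    (m_i, qK_n) spline design, D_list[i]: Delta_{0i}, V_list[i]: working
    covariance V_i, S_list[i]: true covariance Sigma_i."""
    p = X_list[0].shape[1]
    q = W_list[0].shape[1]
    H11, H12 = np.zeros((p, p)), np.zeros((p, q))
    H21, H22 = np.zeros((q, p)), np.zeros((q, q))
    for X, W, D, V in zip(X_list, W_list, D_list, V_list):
        A = D @ np.linalg.inv(V) @ D          # Delta_0i V_i^{-1} Delta_0i
        H11 += X.T @ A @ X
        H12 += X.T @ A @ W
        H21 += W.T @ A @ X
        H22 += W.T @ A @ W
    B = np.linalg.solve(H22, H21)             # H22^{-1} H21
    H11_up = np.linalg.inv(H11 - H12 @ B)     # tilde H^{11}
    M = np.zeros((p, p))
    for X, W, D, V, S in zip(X_list, W_list, D_list, V_list, S_list):
        R = X - W @ B                          # X_i - W_i H22^{-1} H21
        Vi = np.linalg.inv(V)
        M += R.T @ D @ Vi @ S @ Vi @ D @ R
    return H11_up @ M @ H11_up                 # tilde Gamma_V
```

When `V_list[i]` equals `S_list[i]` (the efficient case $V_i=\Sigma_i$), the middle sum reduces to $\widetilde{H}_{11\cdot2}$ and the function returns $\widetilde{H}^{11}$ itself.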
We give in Proposition S.2 the semiparametric efficiency bound for estimation of $\beta_0$. It can be proved in the same way as Lemma 1 of [4] and the proof is omitted. We denote the semiparametric efficient score function of $\beta$ by $\widetilde{l}^*_\beta=(\widetilde{l}^*_{\beta 1},\ldots,\widetilde{l}^*_{\beta p})^T$. Its expression is given in Proposition S.2. When $V_i=\Sigma_i$, we denote $\varphi^*_{\Delta k}(t)$ by $\widetilde{\varphi}^*_{eff,k}(t)$.

Proposition S.2. (Semiparametric efficiency bound) Under the same assumptions as in Proposition S.1, we have
$$\widetilde{l}^*_{\beta k}=\sum_{i=1}^n(X_{ik}-(Z^T\widetilde{\varphi}^*_{eff,k})_i)^T\Delta_{0i}\Sigma_i^{-1}\{Y_i-\mu(X_i^T\beta_0+(Z^Tg_0)_i)\},$$
and the semiparametric efficient information matrix for $\beta$ is given by
$$\lim_{n\to\infty}\frac{1}{n}E\{\widetilde{l}^*_\beta(\widetilde{l}^*_\beta)^T\}=\widetilde{\Omega}_\Sigma \quad\text{with } V_i=\Sigma_i \text{ in (S.51)}.$$

Proposition S.3 is parallel to Proposition 3. It can be proved in the same way as Corollary 1 of [4], and it also follows from Proposition S.1 and Lemma S.1 (vii). Thus the proof is omitted.

Proposition S.3. (Oracle efficient estimator) Under the same assumptions as in Proposition S.1, we have, with $V_i=\Sigma_i$ in (2.2),
$$\sqrt{n}\,\widetilde{\Omega}_\Sigma^{1/2}(\widehat{\beta}_\Sigma-\beta_0)\xrightarrow{d}N(0,I_p).$$
Now we describe the assumptions for the above propositions. Here we need Assumption A6′ since we need some results from empirical process theory in dealing with general link functions.

Assumption A1′. (i) $\mu(x)$ is twice continuously differentiable and $\inf_{x\in\mathbb{R}}\mu'(x)>0$. (ii) For some positive constant $C_{B9}$, we have $\limsup_{|x|\to\infty}|\mu(x)|/|x|^{C_{B9}}<\infty$.

Assumption A2′. The joint density functions $f_{ij}(t)$ and $f_{ijj'}(s,t)$ are uniformly bounded and we have, for some positive constants $C_{B1}$ and $C_{B2}$,
$$C_{B1}<\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}f_{ij}(t)<C_{B2} \ \text{on } [0,1]$$
and $C_{B1}$
1.96 from the efficient estimation, leading to a significant treatment difference. Other than these, because the sample size in this study was rather large, the two types
of estimates for all the constant and varying coefficients appear to be very similar. For the sake of comparison, we also present the estimation results for these regression coefficients from the estimating equation approach based on the QIF method [18]. The conclusions on significance and effect direction remain the same as for the efficient estimation, while the magnitudes of the estimated coefficients differ slightly. For this particular dataset, the QIF estimator sometimes seems to have a smaller standard error than the efficient estimator. An explanation is that it chooses a covariance structure such as compound symmetry in the matrix basis, so it can be more efficient than our estimator when this structure is plausible (which is possibly the case here). Otherwise, it is generally not as good when the covariance structure is misspecified.

Fig 2. Estimated treatment effects for the four treatment groups. The panels in the top, middle and bottom rows are respectively the proposed efficient estimates, the estimates assuming independence and the estimates based on the QIF method. The panels in the left and right columns are respectively for the females and the males. Red, green, blue and yellow curves are for treatment groups 1, 2, 3 and 4, respectively.
In general, the CD4 count tends to increase with age in the fitted model. Our estimation results suggest that there exist interaction effects between treatment and sex. Specifically, for the females (sex=0), subjects receiving treatments 2, 3 and 4 tend to have increasingly higher CD4 counts than those under treatment 1. The effect of treatment 2 (as compared with treatment 1) is estimated as a constant and is significant, while those of the other two treatment groups are time-varying (the upper right and the lower left panels in Figure 1), with even greater positive differences from treatment 1. For the males (sex=1), subjects receiving treatments 2, 3 and 4 also tend to have higher mean CD4 counts than those receiving treatment 1. The interaction between treatment 2 and sex varies over time (the lower right panel in Figure 1), while those for treatments 3 and 4 are constant. The effects of treatments 3 and 4 are significantly different from that of treatment 1, judging from Table 3. Also, we notice that the differences between treatments seem to be greater among the females than among the males.

The estimated effects of the four treatment groups are plotted in Figure 2 for the efficient estimator, the working independence estimator and the QIF estimator. Note that the treatment effects given by the efficient estimator rarely cross each other, giving a clear interpretation and ordering of the different treatments, whereas this is not the case for those given by the QIF or the working independence estimator. Previous authors identified a similar pattern in the ordering of the time-varying treatment effects [14]. However, they ignored the interactions between the treatments and sex. Our findings suggest the treatment effect curves might be rather different between the males and the females.

5. Proofs of the main results.

5.1. Additional assumptions and definitions. We denote the Euclidean norm of a vector $a$ by $|a|$.
Let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ stand for the minimum and maximum eigenvalues of a symmetric matrix $A$, respectively. Besides, $C, C_1, C_2,\ldots$ are generic positive constants whose values may vary from line to line. Recall that the density function of $T_{ij}$ is denoted by $f_{ij}(t)$, $i=1,\ldots,n$ and $j=1,\ldots,m_i$. Also, we denote the joint density function of $T_{ij}$ and $T_{ij'}$ ($j\ne j'$) by $f_{ijj'}(s,t)$. In Assumptions A1 and A2, we consider sparse and irregular observation times. Note that we carry out two-dimensional smoothing in Step 5 and there are three bandwidths involved in our method. Therefore we impose these restrictive assumptions to avoid complicated assumptions involving $m_i$, $m_{\max}$, and the bandwidths simultaneously. Roughly speaking, these assumptions imply we should have $\sum_{i=1}^n m_i^5=O(n)$.
Assumption A1. For some positive constant $C_{A1}$, we have $m_{\max}\equiv\max_{1\le i\le n}m_i=O(n^{1/8})$ and $\sum_{i=1}^n m_i<C_{A1}n$.

Assumption A2. The joint density functions $f_{ij}(t)$ and $f_{ijj'}(s,t)$ are uniformly bounded and we have, for some positive constant $C_{A2}$,
$$\frac{1}{C_{A2}}<\frac{1}{n}\sum_{i=1}^n\frac{1}{m_i}\sum_{j=1}^{m_i}f_{ij}(t)\le\frac{1}{n}\sum_{i=1}^n m_i^4\sum_{j=1}^{m_i}f_{ij}(t)<C_{A2} \ \text{on } [0,1],$$
and
$$\frac{1}{n}\sum_{i=1}^n\sum_{j\ne j'}f_{ijj'}(s,t)\le\frac{1}{n}\sum_{i=1}^n m_i^3\sum_{j\ne j'}f_{ijj'}(s,t)<C_{A2} \ \text{on } [0,1]^2.$$
Assumption A3. For some positive constants $C_{A3}$ and $C_{A4}$, we have, uniformly in $i$ and $j$,
$$C_{A3}I_{p+q}\le E\left\{\begin{pmatrix}X_{ij}X_{ij}^T & X_{ij}Z_{ij}^T\\ Z_{ij}X_{ij}^T & Z_{ij}Z_{ij}^T\end{pmatrix}\,\Big|\,T_{ij}\right\}\le C_{A4}I_{p+q}.$$

Assumption A4. For some positive constants $C_{A5}$ and $C_{A6}$, we have $C_{A5}\le\lambda_{\min}(\Sigma_i)\le\lambda_{\max}(\Sigma_i)\le C_{A6}m_i$, uniformly in $i$.

Assumption A5. For some positive constants $C_{A7}$ and $C_{A8}$, we have $C_{A7}\le\lambda_{\min}(V_i)\le\lambda_{\max}(V_i)\le C_{A8}m_i$, uniformly in $i$.

Assumption A6. For some positive constants $C_{A9}$ and $C_{A10}$, we have $E\{\exp(C_{A9}|\epsilon_{ij}|)\mid X_i,Z_i,T_i\}<C_{A10}$, uniformly in $i$ and $j$.

Assumption A3 is a standard one and is necessary for identification of the constant coefficients and the varying coefficient functions. When $\epsilon_i$ consists of some stochastic process and i.i.d. errors, we have $\Sigma_i=\Xi(T_i)+\eta^2I_{m_i}$, where $\Xi(T_i)$ is positive definite. Hence we impose Assumptions A4 and A5 on $\Sigma_i$ and $V_i$, respectively. In [4], it is assumed that $\epsilon_i$ has the sub-Gaussian property in order to deal with general link functions. The sub-Gaussian assumption prevents $m_i$ from tending to infinity. Assumption A6, which is less restrictive, is enough for the identity link function since we do not need to employ any results from empirical process theory in this case.

For $g=(g_1,\ldots,g_q)^T\in G$, we define the sup and $L_2$ norms by $\|g\|_{G,\infty}=\sum_{j=1}^q\sup_{t\in[0,1]}|g_j(t)|$ and $\|g\|_{G,2}^2=\sum_{j=1}^q\int_0^1g_j^2(t)\,dt$. Assumptions A2 and A3 imply there are positive constants $C_1$ and $C_2$ such that
(5.1) $$C_1\|g\|_{G,2}\le\|Z^Tg\|_V\le C_2\|g\|_{G,2}$$
for any $g\in G$. The details are given in Lemma 1. In (2.3), we define two kinds of projections of $X_k$. We define another one here:
(5.2) $$\widehat{\varphi}_{Vk}=\widehat{\Pi}_{Vn}X_k=\mathop{\mathrm{argmin}}_{g\in G_B}\|X_k-Z^Tg\|_{V_n}.$$
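For a single varying coefficient ($q=1$), the minimization in (5.2) is a weighted least-squares fit of $X_k$ on the design whose rows are $Z_{ij}B(T_{ij})^T$. A minimal sketch follows (our own illustration; a polynomial basis stands in for the B-spline basis $B(t)$, and all names are hypothetical):

```python
import numpy as np

def project_xk(Xk_list, Z_list, T_list, V_list, basis):
    """Sketch of the empirical projection in (5.2) for q = 1: weighted
    least squares of X_k on the design W_i whose rows are
    Z_ij * basis(T_ij).  `basis` maps t to a K_n-vector; a polynomial
    basis is used in the example as a stand-in for a B-spline basis."""
    K = basis(0.0).shape[0]
    G = np.zeros((K, K))
    r = np.zeros(K)
    for xk, Z, T, V in zip(Xk_list, Z_list, T_list, V_list):
        W = Z[:, None] * np.stack([basis(t) for t in T])  # (m_i, K_n)
        Vinv = np.linalg.inv(V)
        G += W.T @ Vinv @ W                               # normal equations
        r += W.T @ Vinv @ xk
    gamma = np.linalg.solve(G, r)                         # spline coefficients
    return lambda t: float(basis(t) @ gamma)              # projected coefficient function
```

If $X_{ijk}=Z_{ij}f(T_{ij})$ with $f$ in the span of the basis, the fit recovers $f$ exactly regardless of the choice of the $V_i$.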
5.2. Spline approximation and projections. Recall we assume all the relevant functions are at least twice continuously differentiable and that they and their second-order derivatives are uniformly bounded. Hence the sup norm of the approximation errors by spline functions is bounded from above by $C_{approx}K_n^{-2}$, where $C_{approx}$ depends on the relevant functions. See Corollary 6.26 of [19].

Note that $\langle\cdot,\cdot\rangle_V$ and $\|\cdot\|_V$ are defined on $\{v\mid\sum_{i,j}E(v_{ij}^2)<\infty\}$ and that $\{Z^Tg\}$ is a closed linear subspace due to (5.1). Therefore the projections $\varphi^*_{Vk}=(\varphi^*_{Vk1},\ldots,\varphi^*_{Vkq})^T$, $k=1,\ldots,p$, exist uniquely. Next, we set $V_i^{-1}=(v_{ij_1j_2})$. Note that $\varphi^*_{Vk}=\Pi_VX_k$ defined in (2.3) satisfies
$$\langle X_k-Z^T\Pi_VX_k,\,Z^Tg\rangle_V=0 \quad\text{for all } g\in G.$$
By representing the above equality explicitly, we can derive the following integral equations for $\varphi^*_{Vk}(t)$. For $d_1=1,\ldots,q$,
(5.3) $$\sum_{d_2=1}^q a_{d_2}^{(d_1)}(t)\varphi^*_{Vkd_2}(t)=b^{(d_1)}(t)+\int_0^1\sum_{d_2=1}^q c_{d_2}^{(d_1)}(s,t)\varphi^*_{Vkd_2}(s)\,ds,$$
where
$$a_{d_2}^{(d_1)}(t)=\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^{m_i}E\{Z_{ijd_2}v_{ijj}Z_{ijd_1}\mid T_{ij}=t\}f_{ij}(t),$$
$$b^{(d_1)}(t)=\frac{1}{n}\sum_{i=1}^n\sum_{1\le j_1,j_2\le m_i}E\{X_{ij_1k}v_{ij_1j_2}Z_{ij_2d_1}\mid T_{ij_2}=t\}f_{ij_2}(t),$$
$$c_{d_2}^{(d_1)}(s,t)=-\frac{1}{n}\sum_{i=1}^n\sum_{j_1\ne j_2}E\{Z_{ij_1d_2}v_{ij_1j_2}Z_{ij_2d_1}\mid T_{ij_1}=s,\,T_{ij_2}=t\}f_{ij_1j_2}(s,t).$$
Let $A(t)$ be the $q\times q$ matrix whose $(d_1,d_2)$th element is $a_{d_2}^{(d_1)}(t)$. Assumptions A2 and A3 imply that $|A(t)|\ne0$ on $[0,1]$, and we set $\psi^*_{Vkd_1}(t)=\sum_{d_2=1}^q a_{d_2}^{(d_1)}(t)\varphi^*_{Vkd_2}(t)$. Then (5.3) reduces to (S.2) of [3] and the same argument there applies. Therefore $\varphi^*_{Vk}(t)$ has the required smoothness properties under similar regularity conditions.

5.3. Remarks on the proofs of Propositions 1–3. We can proceed as in [13] (and [3]) by replacing $Z_{ij}$, $Z_i$, and $\varphi^*_k(t)$ in [13] (and $Z_{ij}$, $Z_i$, and $\varphi^*_k(t)$ in [3]) with $W_{ij}$, $W_i$, and $Z^T\varphi^*_{Vk}(t)$, respectively. They used several lemmas in their proofs. We reorganize the corresponding lemmas in our setup into Lemma 1 given in the following. Its proof and outlines of the proofs of Propositions 1–3 are given in the supplement [5].

Lemma 1.
Assume that Assumptions A1-5 hold.
(i) There are positive constants $C_1$ and $C_2$ such that for any $g\in G$, $C_1\|g\|_{G,2}\le\|Z^Tg\|_V\le C_2\|g\|_{G,2}$.
(ii) There are positive constants $C_3$ and $C_4$ such that for any $g\in G_B$, $\|g\|_{G,\infty}^2\le C_3K_n\|g\|_{G,2}^2\le C_4K_n(\|Z^Tg\|_V)^2$.
(iii) There is a positive constant $C_5$ such that for any $\beta\in\mathbb{R}^p$ and $g\in G_B$, $\|X^T\beta+Z^Tg\|_\infty\le C_5K_n^{1/2}\|X^T\beta+Z^Tg\|_V$, where $\|v\|_\infty=\max_{i,j}|v_{ij}|$. Besides, for some positive constant $C_6$, $\|v\|_V\le C_6\|v\|_\infty$.
(iv) $$\sup_{g_1,g_2\in G_B}\frac{|\langle Z^Tg_1,Z^Tg_2\rangle_{V_n}-\langle Z^Tg_1,Z^Tg_2\rangle_V|}{\|Z^Tg_1\|_V\|Z^Tg_2\|_V}=O_p\big(K_n\sqrt{\log n/n}\big).$$
(v) For any positive constant $M$, we have $\langle X_j-Z^Tg_j,X_k-Z^Tg_k\rangle_{V_n}-\langle X_j-Z^Tg_j,X_k-Z^Tg_k\rangle_V=o_p(1)$ uniformly in $g_j\in G_B$ and $g_k\in G_B$ satisfying $\|g_j\|_{G,2}\le M$ and $\|g_k\|_{G,2}\le M$.
(vi) For any process $\delta_n$ taking scalar values at $T_{ij}$ such that $\|\delta_n\|_\infty$ is uniformly bounded in $n$ and $\{\delta_{n,ij}\}_{j=1}^{m_i}$ are mutually independent in $i$,
$$\sup_{g\in G_B}\frac{|\langle\delta_n,Z^Tg\rangle_{V_n}-\langle\delta_n,Z^Tg\rangle_V|}{\|Z^Tg\|_V}=O_p\big(\sqrt{K_n/n}\big)\|\delta_n\|_\infty.$$
(vii) We also suppose Assumption S holds. Then for $k=1,\ldots,p$, $\|\widehat{\varphi}_{Vk}\|_\infty=O_p(1)$, $\|Z^T(\varphi^*_{Vk}-\widehat{\varphi}_{Vk})\|_{V_n}=o_p(1)$, and $\|Z^T(\varphi^*_{Vk}-\widehat{\varphi}_{Vk})\|_V=o_p(1)$.
5.4. Proof of Theorem 1. Since we consider the identity link function, we have explicit expressions of $\widehat{\beta}_\Sigma-\beta_0$ and $\widehat{\beta}_{\widehat{\Sigma}}-\beta_0$:
(5.4) $$\widehat{\beta}_\Sigma-\beta_0=H^{11}\sum_{i=1}^n(X_i-W_iH_{22}^{-1}H_{21})^T\Sigma_i^{-1}\epsilon_i-H^{11}\sum_{i=1}^n(X_i-W_iH_{22}^{-1}H_{21})^T\Sigma_i^{-1}(W_i\gamma^*-(Z^Tg_0)_i)=I_1-I_2 \ (\text{say}),$$
(5.5) $$\widehat{\beta}_{\widehat{\Sigma}}-\beta_0=\widehat{H}^{11}\sum_{i=1}^n(X_i-W_i\widehat{H}_{22}^{-1}\widehat{H}_{21})^T\widehat{\Sigma}_i^{-1}\epsilon_i-\widehat{H}^{11}\sum_{i=1}^n(X_i-W_i\widehat{H}_{22}^{-1}\widehat{H}_{21})^T\widehat{\Sigma}_i^{-1}(W_i\gamma^*-(Z^Tg_0)_i)=\widehat{I}_1-\widehat{I}_2 \ (\text{say}),$$
where $\widehat{H}^{11}$, $\widehat{H}_{22}$ and $\widehat{H}_{21}$ are defined as in (2.4) with $V_i=\widehat{\Sigma}_i$, $i=1,\ldots,n$, and $\gamma^*=(\gamma_1^{*T},\ldots,\gamma_q^{*T})^T$ satisfies $|B^T(t)\gamma_j^*-g_{0j}(t)|\le C_gK_n^{-2}$, $j=1,\ldots,q$, for some positive constant $C_g$ depending on $g_0(t)$. Proposition 4 and Assumption A4 imply that, with probability tending to 1, $C_1I_{m_i}\le\widehat{\Sigma}_i\le C_2m_iI_{m_i}$ uniformly in $i$ for some positive constants $C_1$ and $C_2$. As for $\widehat{\Sigma}_i^{-1}$,
$$\widehat{\Sigma}_i^{-1}-\Sigma_i^{-1}=\widehat{\Sigma}_i^{-1}(\Sigma_i-\widehat{\Sigma}_i)\Sigma_i^{-1}=\Sigma_i^{-1}(\Sigma_i-\widehat{\Sigma}_i)\Sigma_i^{-1}+\widehat{\Sigma}_i^{-1}(\Sigma_i-\widehat{\Sigma}_i)\Sigma_i^{-1}(\Sigma_i-\widehat{\Sigma}_i)\Sigma_i^{-1}.$$
It follows from Proposition 4, Assumption A4, and the above identity that
(5.6) $$\widehat{\Sigma}_i^{-1}-\Sigma_i^{-1}=\Sigma_i^{-1}(\Sigma_i-\widehat{\Sigma}_i)\Sigma_i^{-1}+m_i^2O_p\Big(h_2^4+h_3^4+\frac{\log n}{nh_2}+\frac{\log n}{nh_3^2}\Big).$$
The last term on the right-hand side of (5.6) is in the sense of eigenvalue evaluation. By using Assumption A4 and Proposition 4, we get an expression of each element of $\Sigma_i^{-1}(\Sigma_i-\widehat{\Sigma}_i)\Sigma_i^{-1}$. This expression, along with the assumptions for Theorem 1 and the local property of the B-spline basis, will be employed in the proofs of the following lemmas. These lemmas, under the same assumptions as in Theorem 1, are needed in order to evaluate $\widehat{I}_1-I_1$, and their proofs are given in the supplement [5].
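The matrix identity above is exact, and the remainder beyond the first-order term $\Sigma_i^{-1}(\Sigma_i-\widehat{\Sigma}_i)\Sigma_i^{-1}$ is quadratic in $\Sigma_i-\widehat{\Sigma}_i$, which is what (5.6) exploits. A quick numeric check (our own illustration, not from the paper) with NumPy:

```python
import numpy as np

# Illustrative numeric check of the exact identity behind (5.6):
#   Shat^{-1} - S^{-1} = Shat^{-1}(S - Shat) S^{-1}
#                      = S^{-1}(S - Shat) S^{-1}
#                        + Shat^{-1}(S - Shat) S^{-1}(S - Shat) S^{-1},
# so the error of the first-order term is quadratic in S - Shat.
rng = np.random.default_rng(2)
m = 4
A = rng.standard_normal((m, m))
S = A @ A.T + m * np.eye(m)          # plays the role of Sigma_i
E = 0.05 * rng.standard_normal((m, m))
E = (E + E.T) / 2.0                  # small symmetric perturbation
Shat = S + E                         # plays the role of the estimate
Si = np.linalg.inv(S)
Shi = np.linalg.inv(Shat)
first = Si @ (S - Shat) @ Si
second = Shi @ (S - Shat) @ Si @ (S - Shat) @ Si
assert np.allclose(Shi - Si, first + second)
```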
Lemma 2. Let $h_{12,kl}$ and $\widehat{h}_{12,kl}$ be the $(k,l)$ elements of $H_{12}$ and $\widehat{H}_{12}$, respectively. Then we have, uniformly in $k$ and $l$,
$$\frac{1}{n}h_{12,kl}=O_p(K_n^{-1}), \qquad \frac{1}{n}(h_{12,kl}-\widehat{h}_{12,kl})=K_n^{-1}O_p\Big(h_2^2+h_3^2+\sqrt{\frac{\log n}{nh_2}}+\sqrt{\frac{\log n}{nh_3^2}}\Big),$$
$$\Big\{\sum_{l=1}^{qK_n}(n^{-1}h_{12,kl})^2\Big\}^{1/2}=O_p(K_n^{-1/2}), \quad\text{and}$$
$$\Big[\sum_{l=1}^{qK_n}\{n^{-1}(h_{12,kl}-\widehat{h}_{12,kl})\}^2\Big]^{1/2}=K_n^{-1/2}O_p\Big(h_2^2+h_3^2+\sqrt{\frac{\log n}{nh_2}}+\sqrt{\frac{\log n}{nh_3^2}}\Big).$$
Lemma 3. With probability tending to 1, $C_1K_n^{-1}\le\lambda_{\min}(n^{-1}H_{22})\le\lambda_{\max}(n^{-1}H_{22})\le C_2K_n^{-1}$ for some positive constants $C_1$ and $C_2$. We also have
$$\max\big\{|\lambda_{\min}(n^{-1}(\widehat{H}_{22}-H_{22}))|,\,|\lambda_{\max}(n^{-1}(\widehat{H}_{22}-H_{22}))|\big\}=K_n^{-1}O_p\Big(h_2^2+h_3^2+\sqrt{\log n/(nh_2)}+\sqrt{\log n/(nh_3^2)}\Big).$$
Hence we have $\max\big\{|\lambda_{\min}(n^{-1}\widehat{H}_{22})|,\,|\lambda_{\max}(n^{-1}\widehat{H}_{22})|\big\}=O_p(K_n^{-1})$, and
$$\max\big\{|\lambda_{\min}((n^{-1}\widehat{H}_{22})^{-1}-(n^{-1}H_{22})^{-1})|,\,|\lambda_{\max}((n^{-1}\widehat{H}_{22})^{-1}-(n^{-1}H_{22})^{-1})|\big\}$$
is also bounded from above by $K_nO_p\big(h_2^2+h_3^2+\sqrt{\log n/(nh_2)}+\sqrt{\log n/(nh_3^2)}\big)$.
Lemma 4. We have $\frac{1}{n}\widehat{H}_{11}=\frac{1}{n}H_{11}+o_p(1)$ and $\frac{1}{n}\widehat{H}_{12}\big(\frac{1}{n}\widehat{H}_{22}\big)^{-1}\frac{1}{n}\widehat{H}_{21}=\frac{1}{n}H_{12}\big(\frac{1}{n}H_{22}\big)^{-1}\frac{1}{n}H_{21}+o_p(1)$, where $o_p(1)$ is meant both componentwise and in the sense of eigenvalue evaluation. Hence we have $n\widehat{H}^{11}=nH^{11}+o_p(1)$.
Lemma 5. We have, for some positive constants $C_1$ and $C_2$,
$$\frac{C_1}{K_n}I_{qK_n}\le\mathrm{cov}\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T\Sigma_i^{-1}\epsilon_i\Big)\le\frac{C_2}{K_n}I_{qK_n}.$$
In addition we have
$$\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T(\widehat{\Sigma}_i^{-1}-\Sigma_i^{-1})\epsilon_i=\sqrt{\frac{n}{K_n}}O_p\Big(\frac{\log n}{nh_1}+\frac{\log n}{nh_2}+\frac{\log n}{nh_3^2}\Big)+\sqrt{\frac{n}{K_n}}O_p(h_1^3+h_2^3+h_3^3)$$
$$+O_p(h_2^2+h_3^2)+O_p\Big(\frac{1}{\sqrt{nK_n}\,h_2}+\frac{1}{\sqrt{nK_n}\,h_3^2}+\frac{1}{\sqrt{n}\,h_2}+\frac{1}{\sqrt{n}\,h_3^2}\Big).$$

Lemma 6. We have, for some positive constants $C_1$ and $C_2$, $C_1I_p\le\mathrm{cov}\big(\frac{1}{\sqrt{n}}\sum_{i=1}^nX_i^T\Sigma_i^{-1}\epsilon_i\big)\le C_2I_p$. In addition we have
$$\frac{1}{\sqrt{n}}\sum_{i=1}^nX_i^T(\widehat{\Sigma}_i^{-1}-\Sigma_i^{-1})\epsilon_i=\sqrt{n}O_p\Big(\frac{\log n}{nh_1}+\frac{\log n}{nh_2}+\frac{\log n}{nh_3^2}\Big)+\sqrt{n}O_p(h_1^3+h_2^3+h_3^3)$$
$$+O_p(h_2^2+h_3^2)+O_p\Big(\frac{1}{\sqrt{n}\,h_2}+\frac{1}{\sqrt{n}\,h_3^2}\Big).$$
Now we prove that $\widehat{I}_1-I_1=o_p(n^{-1/2})$. Write
$$I_1=H^{11}\sum_{i=1}^nX_i^T\Sigma_i^{-1}\epsilon_i-H^{11}H_{12}H_{22}^{-1}\sum_{i=1}^nW_i^T\Sigma_i^{-1}\epsilon_i=H^{11}(I_{11}-I_{12}) \ (\text{say}).$$
We define $\widehat{I}_{11}$ and $\widehat{I}_{12}$ similarly. From Proposition 1 and Lemma 4, we have only to prove
(5.7) $$\frac{1}{\sqrt{n}}(\widehat{I}_{11}-I_{11})=o_p(1) \quad\text{and}\quad \frac{1}{\sqrt{n}}(\widehat{I}_{12}-I_{12})=o_p(1).$$
The former result in (5.7) can be handled in the same way as the latter and we consider only the latter. Write
$$\frac{1}{\sqrt{n}}(\widehat{I}_{12}-I_{12})=\frac{1}{n}\widehat{H}_{12}\Big(\frac{1}{n}\widehat{H}_{22}\Big)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T(\widehat{\Sigma}_i^{-1}-\Sigma_i^{-1})\epsilon_i$$
$$+\frac{1}{n}\widehat{H}_{12}\Big\{\Big(\frac{1}{n}\widehat{H}_{22}\Big)^{-1}-\Big(\frac{1}{n}H_{22}\Big)^{-1}\Big\}\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T\Sigma_i^{-1}\epsilon_i$$
$$+\Big(\frac{1}{n}\widehat{H}_{12}-\frac{1}{n}H_{12}\Big)\Big(\frac{1}{n}H_{22}\Big)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T\Sigma_i^{-1}\epsilon_i$$
$$=D_{I12}^{(1)}+D_{I12}^{(2)}+D_{I12}^{(3)} \ (\text{say}).$$
Lemmas 2, 3, and 5 imply
$$D_{I12}^{(1)}=\sqrt{n}O_p\Big(\frac{\log n}{nh_1}+\frac{\log n}{nh_2}+\frac{\log n}{nh_3^2}\Big)+\sqrt{n}O_p(h_1^3+h_2^3+h_3^3)$$
$$+\sqrt{K_n}O_p\Big(\frac{1}{\sqrt{nK_n}\,h_2}+\frac{1}{\sqrt{nK_n}\,h_3^2}+\frac{1}{\sqrt{n}\,h_2}+\frac{1}{\sqrt{n}\,h_3^2}\Big)+\sqrt{K_n}O_p(h_2^2+h_3^2)=o_p(1),$$
$$D_{I12}^{(j)}=\sqrt{K_n}O_p\Big(h_2^2+h_3^2+\sqrt{\log n/(nh_2)}+\sqrt{\log n/(nh_3^2)}\Big)=o_p(1), \quad j=2,3.$$
Hence we have established
(5.8) $$\widehat{I}_1-I_1=o_p(n^{-1/2}).$$
Next we deal with $\widehat{I}_2-I_2$, for which two more lemmas are necessary.

Lemma 7. We have
$$\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T\Sigma_i^{-1}(W_i\gamma^*-(Z^Tg_0)_i)=O_p(\sqrt{n}K_n^{-5/2}), \quad\text{and}$$
$$\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T(\widehat{\Sigma}_i^{-1}-\Sigma_i^{-1})(W_i\gamma^*-(Z^Tg_0)_i)=\sqrt{n}K_n^{-5/2}O_p\Big(h_2^2+h_3^2+\sqrt{\log n/(nh_2)}+\sqrt{\log n/(nh_3^2)}\Big).$$
Lemma 8. We have
$$\frac{1}{\sqrt{n}}\sum_{i=1}^nX_i^T\Sigma_i^{-1}(W_i\gamma^*-(Z^Tg_0)_i)=O_p(\sqrt{n}K_n^{-2}), \quad\text{and}$$
$$\frac{1}{\sqrt{n}}\sum_{i=1}^nX_i^T(\widehat{\Sigma}_i^{-1}-\Sigma_i^{-1})(W_i\gamma^*-(Z^Tg_0)_i)=\sqrt{n}K_n^{-2}O_p\Big(h_2^2+h_3^2+\sqrt{\log n/(nh_2)}+\sqrt{\log n/(nh_3^2)}\Big).$$

Now we can show that $\widehat{I}_2-I_2=o_p(n^{-1/2})$. Write
$$I_2=H^{11}\sum_{i=1}^nX_i^T\Sigma_i^{-1}(W_i\gamma^*-(Z^Tg_0)_i)-H^{11}H_{12}H_{22}^{-1}\sum_{i=1}^nW_i^T\Sigma_i^{-1}(W_i\gamma^*-(Z^Tg_0)_i)=H^{11}(I_{21}-I_{22}) \ (\text{say}).$$
We define $\widehat{I}_{21}$ and $\widehat{I}_{22}$ similarly and write $\widehat{I}_2=\widehat{H}^{11}(\widehat{I}_{21}-\widehat{I}_{22})$. From Proposition 1 and Lemma 4, we have only to prove $\frac{1}{\sqrt{n}}(\widehat{I}_{21}-I_{21})=o_p(1)$ and $\frac{1}{\sqrt{n}}(\widehat{I}_{22}-I_{22})=o_p(1)$. The former result can be handled in the same way as the latter and we consider only the latter. Write
$$\frac{1}{\sqrt{n}}(\widehat{I}_{22}-I_{22})=\frac{1}{n}\widehat{H}_{12}\Big(\frac{1}{n}\widehat{H}_{22}\Big)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T(\widehat{\Sigma}_i^{-1}-\Sigma_i^{-1})(W_i\gamma^*-(Z^Tg_0)_i)$$
$$+\frac{1}{n}\widehat{H}_{12}\Big\{\Big(\frac{1}{n}\widehat{H}_{22}\Big)^{-1}-\Big(\frac{1}{n}H_{22}\Big)^{-1}\Big\}\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T\Sigma_i^{-1}(W_i\gamma^*-(Z^Tg_0)_i)$$
$$+\Big(\frac{1}{n}\widehat{H}_{12}-\frac{1}{n}H_{12}\Big)\Big(\frac{1}{n}H_{22}\Big)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^nW_i^T\Sigma_i^{-1}(W_i\gamma^*-(Z^Tg_0)_i)$$
$$=D_{I22}^{(1)}+D_{I22}^{(2)}+D_{I22}^{(3)} \ (\text{say}).$$
Lemmas 2, 3, and 7 imply, for $j=1,2,3$,
$$D_{I22}^{(j)}=\sqrt{n}K_n^{-2}O_p\Big(h_2^2+h_3^2+\sqrt{\log n/(nh_2)}+\sqrt{\log n/(nh_3^2)}\Big)=o_p(1).$$
Hence we have established $\widehat{I}_2-I_2=o_p(n^{-1/2})$. The desired result follows from (5.4), (5.5), (5.8) and the above result.
Acknowledgements. The authors thank the associate editor and three referees for their thoughtful and constructive comments on a previous submission, which led to significant improvement of this paper.
SUPPLEMENTARY MATERIAL

Supplement A: Additional simulation results and technical material (doi: xx.xxxx/xx-AOSxxxxSUPP). Additional simulation results, proofs of the propositions and lemmas, and theory for the case of uniformly bounded cluster size and general link function.

References.

[1] Cheng, G. and Wang, X. (2011). Semiparametric additive transformation model under current status data. Electronic J. Statist. 5 1735–1764.
[2] Cheng, G., Yu, Z. and Huang, J. Z. (2013). The cluster bootstrap consistency in generalized estimating equations. J. Multivariate Anal. 115 33–47.
[3] Cheng, G., Zhou, L. and Huang, J. Z. (2014). Supplement to "Efficient semiparametric estimation in generalized partially linear additive models for longitudinal/clustered data". doi: 10.3150/12-BEJ479SUPP.
[4] Cheng, G., Zhou, L. and Huang, J. Z. (2014). Efficient semiparametric estimation in generalized partially linear additive models for longitudinal/clustered data. Bernoulli 20 141–163.
[5] Cheng, M. Y., Honda, T. and Li, J. (2015). Supplement to "Efficient estimation in semivarying coefficient models for longitudinal/clustered data". doi: xx.xxxx/xx-AOSxxxxSUPP.
[6] Cheng, M. Y., Honda, T., Li, J. and Peng, H. (2014). Nonparametric independence screening and structure identification for ultra-high dimensional longitudinal data. Ann. Statist. 42 1819–1849.
[7] Fan, J., Huang, T. and Li, R. (2007). Analysis of longitudinal data with semiparametric estimation of covariance function. J. Amer. Statist. Assoc. 102 632–641.
[8] Fan, J. and Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Amer. Statist. Assoc. 99 710–723.
[9] Fan, J., Ma, Y. and Dai, W. (2014). Nonparametric independence screening in sparse ultra-high dimensional varying coefficient models. J. Amer. Statist. Assoc. 109 1270–1284.
[10] Fan, J. and Wu, Y. (2008). Semiparametric estimation of covariance matrixes for longitudinal data. J. Amer. Statist. Assoc. 103 1520–1533.
[11] Henry, K., Erice, A., Tierney, C., Balfour, H. H. J., Fischl, M. A., Kmack, A., Liou, S. H., Kenton, A., Hirsch, M. S., Phair, J., Martinez, A., Kahn, J. O. and for the AIDS Clinical Trial Group 193A Study Team (1998). A randomized, controlled, double-blind study comparing the survival benefit of four different reverse transcriptase inhibitor therapies (three-drug, two-drug, and alternating drug) for the treatment of advanced AIDS. Journal of Acquired Immune Deficiency Syndromes and Human Retrovirology 19 339–349.
[12] Huang, J. Z., Wu, C. O. and Zhou, L. (2004). Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statist. Sinica 14 763–788.
[13] Huang, J. Z., Zhang, L. and Zhou, L. (2007). Efficient estimation in marginal partially linear models for longitudinal/clustered data using splines. Scand. J. Statist. 34 451–477.
[14] Li, Y. (2011). Efficient semiparametric regression for longitudinal data with nonparametric covariance estimation. Biometrika 98 355–370.
[15] Lin, X. and Carroll, R. J. (2006). Semiparametric estimation in general repeated measures problems. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 69–88.
[16] Lin, X., Wang, N., Welsh, A. H. and Carroll, R. J. (2004). Equivalent kernels of smoothing splines in nonparametric regression for clustered/longitudinal data. Biometrika 91 177–193.
[17] Ma, S. (2012). Two-step spline estimating equations for generalized additive partially linear models with large cluster sizes. Ann. Statist. 40 2943–2972.
[18] Qu, A. and Li, R. (2006). Quadratic inference functions for varying-coefficient models with longitudinal data. Biometrics 62 379–391.
[19] Schumaker, L. L. (2007). Spline Functions: Basic Theory, 3rd ed. Cambridge University Press, Cambridge.
[20] Shen, S. L., Cui, J. L., Mei, C. L. and Wang, C. W. (2014). Estimation and inference of semi-varying coefficient models with heteroscedastic errors. J. Multivariate Anal. 124 70–93.
[21] Tian, R., Xue, L. and Liu, C. (2014). Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data. J. Multivariate Anal. 132 94–110.
[22] Wang, L. and Qu, A. (2009). Consistent model selection and data-driven smooth tests for longitudinal data in the estimating equations approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 177–190.
[23] Wang, N., Carroll, R. J. and Lin, X. (2005). Efficient semiparametric marginal estimation for longitudinal/clustered data. J. Amer. Statist. Assoc. 100 147–157.
[24] Wu, H. and Zhang, J. T. (2006). Nonparametric Regression Methods for Longitudinal Data: Mixed-Effects Modeling Approaches. Wiley, New York.
[25] Xia, Y., Zhang, W. and Tong, H. (2004). Efficient estimation for semivarying-coefficient models. Biometrika 91 661–681.
[26] Yao, W. and Li, R. (2013). New local estimation procedure for a non-parametric regression function for longitudinal data. J. R. Stat. Soc. Ser. B Stat. Methodol. 75 123–138.
[27] Zhang, W., Fan, J. and Sun, Y. (2009). A semiparametric model for cluster data. Ann. Statist. 37 2377–2408.
[28] Zhou, J. and Qu, A. (2012). Informative estimation and selection of correlation structure for longitudinal data. J. Amer. Statist. Assoc. 107 701–710.

M.-Y. Cheng
Department of Mathematics
National Taiwan University
Taipei 106, Taiwan
E-mail:
[email protected]
T. Honda Graduate School of Economics Hitotsubashi University Kunitachi, Tokyo 186-8601, Japan E-mail:
[email protected]
J. Li Department of Statistics & Applied Probability National University of Singapore Singapore 117546 E-mail:
[email protected]