Estimation with Univariate "Mixed Case" Interval Censored Data
by Shuguang Song The Boeing Company
TECHNICAL REPORT No. 414 August, 2002
Department of Statistics Box 354322
University of Washington, Seattle, Washington, 98195 USA
Abstract: In this paper, we study the Nonparametric Maximum Likelihood Estimator (NPMLE) for univariate "Mixed Case" interval-censored data, in which the number of observation times and the observation times themselves are random variables. We provide a characterization of the NPMLE, then formulate the ICM algorithm for computing the NPMLE. We also study the asymptotic properties of the NPMLE: consistency, global rates of convergence with and without a separation condition, an asymptotic minimax lower bound, and the pointwise asymptotic distribution of the "toy estimator" proposed by GROENEBOOM AND WELLNER (1992).
Key words and phrases: convex minorant algorithm, maximum likelihood, empirical processes, interval censored data, consistency, rate of convergence, asymptotic distribution.
1 Introduction and Formulation of the Problem
The univariate interval censoring problem arises when a random variable, such as a failure time or the onset time of a disease, cannot be observed directly, but is only known to lie in an interval determined by several observation times. Interval censored data and the corresponding statistical methods appear in animal carcinogenicity, epidemiology, HIV and AIDS studies; see FINKELSTEIN AND WOLFE (1985), FINKELSTEIN (1986), SELF AND GROSSMAN (1986), BECKER AND MELBYE (1991) and ARAGON AND EBERLY (1992). There are three types of univariate interval censoring models: "Case 1" interval censoring (or current status interval censoring), "Case 2" interval censoring, and "Case k" interval censoring. Suppose that $Y$ ... the data observed are: ...

Nonparametric maximum likelihood estimation for "Case 1" and "Case 2" interval censored data is well summarized in GROENEBOOM AND WELLNER (1992), GROENEBOOM (1996), and HUANG AND WELLNER (1997). WELLNER (1995) studied the NPMLE for "Case k" interval censoring, in which each subject has exactly $k$ examination times. To compute the NPMLE, TURNBULL (1976) derived self-consistency equations for a very general censoring scheme and proposed solving the equations by the EM algorithm. GROENEBOOM (1991) developed an iterative convex minorant (ICM) algorithm. The ICM algorithm is considerably faster than the EM algorithm, especially when the sample size is large; see GROENEBOOM AND WELLNER (1992). ARAGON AND EBERLY (1992) and JONGBLOED (1995, 1998) proposed modifications of the ICM algorithm, and JONGBLOED (1995, 1998) showed that his modified algorithm always converges.

Very often in clinical trials, each patient has several follow-ups and the number of follow-ups differs from patient to patient. This motivates the study of the following model. Let $\underline{T} = \{T_{k,j},\ j = 0, 1, \dots, k, k+1,\ k = 1, 2, \dots\}$ be a triangular array of "potential observation times" with $T_{k,0} \equiv 0$ and $T_{k,k+1} \equiv +\infty$, and let $K$, the number of observation times, be an integer-valued random variable such that $Y$ and $(K, \underline{T})$ are independent. What we can observe is a vector $X = (\Delta_K, T_K, K)$, with possible value $x = (\delta_k, t_k, k)$, where $T_k$ is the $k$th row of the triangular array $\underline{T}$, and $\Delta_k = (\Delta_{k,1}, \dots, \Delta_{k,k+1})$ with $\Delta_{k,j} = 1_{(T_{k,j-1}, T_{k,j}]}(Y)$, $j = 1, 2, \dots, k+1$. Suppose we observe $n$ i.i.d. copies of $X$: $X_1, X_2, \dots, X_n$, where $X_i = (\Delta^{(i)}_{K^{(i)}}, T^{(i)}_{K^{(i)}}, K^{(i)})$, $i = 1, 2, \dots, n$. Here $(Y^{(i)}, \underline{T}^{(i)}, K^{(i)})$, $i = 1, 2, \dots$, are the underlying i.i.d. copies of $(Y, \underline{T}, K)$.

SCHICK AND YU (2000) referred to this model as the "Mixed Case" interval censoring model. They proved strong consistency of the NPMLE in the $L_1(\mu)$-topology, for a measure $\mu$ which is derived from the distribution of the observation times. VAN DER VAART AND WELLNER (2000) gave a different formulation of "Mixed Case" interval censoring. They noted that, conditional on $K$ and $T_K$, the vector $\Delta_K$ has a multinomial distribution:
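$$(\Delta_K \mid K, T_K) \sim \mathrm{Multinomial}_{K+1}(1, \Delta F_K),$$

where

$$\Delta F_K \equiv \big(F(T_{K,1}),\ F(T_{K,2}) - F(T_{K,1}),\ \dots,\ 1 - F(T_{K,K})\big).$$

Suppose for the moment that the distribution $G_k$ of $(T_K \mid K = k)$ has density $g_k$, and $p_k \equiv P(K = k)$. Then the density of $X$ is

$$p_F(x) \equiv p_F(\delta_k, t_k, k) = \prod_{j=1}^{k+1} \big(F(t_{k,j}) - F(t_{k,j-1})\big)^{\delta_{k,j}} \tag{1.1}$$

with respect to the dominating measure $\nu$ which is determined by the joint distribution of $(K, \underline{T})$. Thus the log-likelihood function for $F$ of $X_1, \dots, X_n$ is given by

$$l_n(F \mid \underline{X}) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{K^{(i)}+1} \Delta^{(i)}_{K^{(i)},j} \log\big(F(T^{(i)}_{K^{(i)},j}) - F(T^{(i)}_{K^{(i)},j-1})\big) \equiv \mathbb{P}_n m_F(X), \tag{1.2}$$

where

$$m_F(X) = \sum_{j=1}^{K+1} \Delta_{K,j} \log\big(F(T_{K,j}) - F(T_{K,j-1})\big) \equiv \sum_{j=1}^{K+1} \Delta_{K,j} \log\big(\Delta F_{K,j}\big).$$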
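As a hedged illustration of the log-likelihood (1.2), the following Python sketch simulates "Mixed Case" data and evaluates the average log-likelihood at a candidate distribution function. The exponential law for Y, the uniform law of K on {1, ..., max_k}, the uniform observation times on (0, 3), and the function names simulate_mixed_case and log_likelihood are illustrative assumptions and do not come from the paper.

```python
import math
import random


def simulate_mixed_case(n, rng, max_k=4, rate=1.0):
    """Simulate n observations (delta, times).

    Hypothetical design: Y ~ Exponential(rate), K uniform on {1..max_k},
    observation times are sorted uniforms on (0, 3), independent of Y.
    """
    data = []
    for _ in range(n):
        y = rng.expovariate(rate)
        k = rng.randint(1, max_k)
        times = sorted(rng.uniform(0.0, 3.0) for _ in range(k))
        # Grid T_{k,0} = 0 < T_{k,1} < ... < T_{k,k} < T_{k,k+1} = +infinity.
        grid = [0.0] + times + [math.inf]
        # Delta_{k,j} = 1 if Y falls in (T_{k,j-1}, T_{k,j}], j = 1..k+1.
        delta = [1 if grid[j - 1] < y <= grid[j] else 0 for j in range(1, k + 2)]
        data.append((delta, times))
    return data


def log_likelihood(F, data):
    """Average log-likelihood as in (1.2) for a candidate distribution function F."""
    total = 0.0
    for delta, times in data:
        # F at the grid points, with F(0) = 0 and F(+infinity) = 1.
        cdf = [0.0] + [F(t) for t in times] + [1.0]
        for j, d in enumerate(delta, start=1):
            if d:
                total += math.log(cdf[j] - cdf[j - 1])
    return total / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = simulate_mixed_case(500, rng)
    print(log_likelihood(lambda t: 1.0 - math.exp(-t), sample))
```

Only the cell that contains Y contributes to each subject's term, which is why the inner loop skips cells with Delta equal to 0.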
VAN DER VAART AND WELLNER (2000) proved the consistency of the NPMLE for "Mixed Case" interval censoring in Hellinger distance, and also recovered the results of SCHICK AND YU (2000) by using preservation theorems for Glivenko-Cantelli classes. Here, we will use the above formulation of the "Mixed Case" interval censoring problem. In Section 2, we give a characterization of the NPMLE. We then formulate the ICM algorithm for computing the NPMLE. In Section 3, we state the main asymptotic properties of the NPMLE: consistency, global rates of convergence with and without a separation condition, an asymptotic minimax lower bound, and pointwise asymptotic distribution. The results in Section 3 are proved in Section 5 by using modern empirical process theory. There are two other estimators for univariate "Mixed Case" interval censored data, both based on the non-homogeneous Poisson process model: the nonparametric maximum likelihood estimator and the nonparametric maximum pseudo-likelihood estimator (NPMPLE); see WELLNER AND ZHANG (2000). We denote these two estimators by $\mathrm{NPMLE}_{WZ}$ and $\mathrm{NPMPLE}_{WZ}$. In Section 4, we present simulation studies to compare the asymptotic relative efficiency of the above three estimators.
2 Characterization and Computation of the NPMLE
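Let $t_1 \le t_2 \le \dots \le t_m$ denote the ordered distinct observation time points in the set of all observation time points $\{T_{K_i,j},\ j = 1, \dots, K_i,\ i = 1, \dots, n\}$. Define the rank function $R: \{T_{K_i,j}\} \to \{1, 2, \dots, m\}$ such that

$$R(T_{K_i,j}) = s \quad \text{if } T_{K_i,j} = t_s, \qquad j = 1, \dots, K_i,\ i = 1, \dots, n,\ s = 1, \dots, m.$$

Then the log-likelihood function can be rewritten as: ...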
For the same reason as explained for "Case 2" interval censoring in GROENEBOOM AND WELLNER (1992), we can take $t_1$ to be an observation point $T_{K_i,j}$ such that $\Delta_{K_i,j} = 1$, and $t_m$ to be an observation point $T_{K_i,j}$ such that $1_{(T_{K_i,j}, \infty)}(Y_i) = 1$. Note that the parameter set is convex and $l_n$ is a concave function, since it is a linear combination of the concave function "log". Thus, the characterization of the NPMLE $\hat{F}_n$ is given by the following slightly modified version of the Fenchel duality theorem in Lemma 3.1 of WELLNER AND ZHAN (1997):

Theorem 2.1 Let $\phi: R^n \to R \cup \{-\infty\}$ be a continuous concave function. Let $K \subset R^n$ be a convex cone, and let $K_0 = K \cap \phi^{-1}(R)$. Suppose $K_0$ is nonempty and $\phi$ is differentiable on $K_0$. Then $\hat{x} \in K_0$ satisfies

$$\phi(\hat{x}) = \max_{x \in K} \phi(x)$$

if and only if

$$\langle \hat{x}, \nabla\phi(\hat{x}) \rangle = 0, \tag{2.1}$$

and

$$\langle x, \nabla\phi(\hat{x}) \rangle \le 0 \quad \text{for all } x \in K. \tag{2.2}$$
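Note that if $x = \hat{x} + \epsilon 1_j$, where $1_j = (\underbrace{0, \dots, 0}_{m-j}, \underbrace{1, \dots, 1}_{j})$ and $\epsilon > 0$, then

$$\langle x, \nabla\phi(\hat{x}) \rangle = \langle \hat{x}, \nabla\phi(\hat{x}) \rangle + \epsilon \sum_{l=m-j+1}^{m} \frac{\partial \phi}{\partial x_l}(\hat{x}) = 0 + \epsilon \sum_{l=m-j+1}^{m} \frac{\partial \phi}{\partial x_l}(\hat{x}).$$

Thus, (2.2) becomes

$$\sum_{l=m-j+1}^{m} \frac{\partial \phi}{\partial x_l}(\hat{x}) \le 0, \qquad j = 1, 2, \dots, m. \tag{2.3}$$

In this case, ... can be written as ..., where ... and

$$\delta^{(k)}_{K_i,j} = \begin{cases} 1, & \text{if } R(T_{K_i,j}) = k, \\ 0, & \text{otherwise.} \end{cases}$$

Then, by applying (2.1) and (2.3), we have the following characterization of the NPMLE $\hat{F}_n$: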
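Theorem 2.2 Let $T_{(1)}$ correspond to an observation time $T_{K_i,j}$ such that $\Delta_{K_i,j} = 1$, and let $T_{(m)}$ be the largest ordered observation time such that the corresponding $\Delta_{K_i,j} = 0$. Then $\hat{F}_n$ is the unique NPMLE that maximizes $l_n(F \mid \underline{X})$ over all $F \in \mathcal{F}$ if and only if

$$\sum_{k=l}^{m} \sum_{i=1}^{n} l_{i,k}(\hat{F}_n \mid \underline{X}) \le 0, \qquad l = 1, 2, \dots, m, \tag{2.5}$$

and

$$\sum_{k=1}^{m} \sum_{i=1}^{n} \hat{F}_n(t_k)\, l_{i,k}(\hat{F}_n \mid \underline{X}) = 0. \tag{2.6}$$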
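Denote ..., where ..., for $l = 1, 2, \dots, m$. Define the processes $G(F, \cdot)$ and $V(F, \cdot)$ by $G(F, 0 \mid \underline{X}) = 0$, $V(F, 0 \mid \underline{X}) = 0$, and

$$G(F, p \mid \underline{X}) = \sum_{k=1}^{p} \big(-l_{kk}(F \mid \underline{X})\big) = \sum_{k=1}^{p} \sum_{i=1}^{n} \big(-l_{i,kk}(F \mid \underline{X})\big),$$

$$V(F, p \mid \underline{X}) = \sum_{k=1}^{p} \big(l_k(F \mid \underline{X}) + F(t_k)(-l_{kk}(F \mid \underline{X}))\big) = \sum_{k=1}^{p} \sum_{i=1}^{n} \big[l_{i,k}(F \mid \underline{X}) - F(t_k)\, l_{i,kk}(F \mid \underline{X})\big],$$

where $p = 1, 2, \dots, m$. Since ..., by using Theorem 4.3 of WELLNER AND ZHAN (1997), we can also characterize the NPMLE $\hat{F}_n$ as the slope of the convex minorant of a self-induced cumulative sum diagram: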
Theorem 2.3 Let $T_{(1)}$ correspond to an observation time $T_{K_i,j}$ such that $\Delta_{K_i,j} = 1$, and let $T_{(m)}$ be the largest ordered observation time such that the corresponding $\Delta_{K_i,j} = 0$. Then $\hat{F}_n$ is the NPMLE of $F_0$ if and only if $\hat{F}_n$ is the left derivative of the convex minorant of the cumulative sum diagram consisting of the following points:

$$P_0 = (0, 0), \qquad P_p = \big(G(\hat{F}_n, p \mid \underline{X}),\ V(\hat{F}_n, p \mid \underline{X})\big), \quad p = 1, \dots, m.$$

The convex minorant of the cumulative sum diagram can be ... and Proposition 1.4 of GROENEBOOM AND WELLNER (1992). Thus, the computation of the NPMLE for "Mixed Case" interval censoring can be reduced to that for "Case 2" interval censoring, as noted by HUANG AND WELLNER and VAN DER VAART AND WELLNER. The ICM algorithm has been applied in many problems: see GROENEBOOM AND WELLNER (1992), JONGBLOED (1995, 1998), WELLNER AND ZHAN (1997), and WELLNER AND ZHANG (2000). To prove global convergence, JONGBLOED (1995) designed a modified iterative convex minorant algorithm (MICM) by inserting a binary line search procedure. The NPMLE for "Mixed Case" interval censoring can therefore be calculated by the MICM algorithm in the same way as for "Case 2" interval censoring.
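The left derivative of the convex minorant of a cumulative sum diagram can be computed by pooling adjacent violators. The following Python sketch illustrates this for a generic diagram with points (0, 0), (x_1, y_1), ..., (x_m, y_m); it does not compute the processes G and V themselves. The function name gcm_left_slopes and the requirement that the x-coordinates be strictly increasing and positive are assumptions of this example.

```python
def gcm_left_slopes(x, y):
    """Left derivatives of the greatest convex minorant of the diagram
    (0, 0), (x[0], y[0]), ..., (x[-1], y[-1]), evaluated at x[0], ..., x[-1].

    Assumes 0 < x[0] < x[1] < ... (strictly increasing abscissae).
    """
    # Each block stores [sum of x-increments, sum of y-increments, point count].
    blocks = []
    prev_x, prev_y = 0.0, 0.0
    for xi, yi in zip(x, y):
        blocks.append([xi - prev_x, yi - prev_y, 1])
        prev_x, prev_y = xi, yi
        # Pool while the previous slope is not smaller than the last slope,
        # i.e. while convexity (nondecreasing slopes) is violated.
        while (len(blocks) > 1
               and blocks[-2][1] * blocks[-1][0] >= blocks[-1][1] * blocks[-2][0]):
            dx, dy, count = blocks.pop()
            blocks[-1][0] += dx
            blocks[-1][1] += dy
            blocks[-1][2] += count
    slopes = []
    for dx, dy, count in blocks:
        slopes.extend([dy / dx] * count)
    return slopes


if __name__ == "__main__":
    print(gcm_left_slopes([1, 2, 3, 4], [1, 1, 3, 3.5]))
```

In an ICM-type iteration the diagram is "self-induced": its points depend on the current iterate, so the diagram would be rebuilt and the slopes recomputed at every step until the iterate stabilizes.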
3 Asymptotic Properties of the NPMLE: Results

In this section, we study the asymptotic properties of the NPMLE: consistency, global rate of convergence in Hellinger distance, a local asymptotic minimax lower bound, and the pointwise asymptotic distribution of the "toy estimator". The "toy estimator", introduced by GROENEBOOM AND WELLNER (1992), is obtained by taking one step of the iterative convex minorant algorithm starting from the true underlying distribution function $F_0$. It is conjectured that this toy estimator has the same asymptotic distribution as the NPMLE. The following regularity conditions are needed:
A. $EK$ ...

... $> 0$ such that ... $< t_{k,k} < t_{k,k+1} \equiv \infty$, $P(T_{K,j} - T_{K,j-1} > \delta) = 1$, for every ...

... there exists an $\eta > 0$ such that the underlying ... satisfies ... for all $j = 1, \dots, k+1$.

D. Let ... be the density of the joint distribution $Q(t_{k,j}, t_{k,j-1} \mid K = k)$, and let ... be the density of ... with respect to Lebesgue measure on $R$. Suppose that there exist $c_0 > 0$, $c_1 > 0$ such that ... for ... $j = 1, 2, \dots, k$, $k = 1, 2, \dots$.

F. Fix $t_0 \in$ ... . Then there is a neighborhood of $t_0$ such that the distribution $Q_{k,j}$ of $T_{k,j}$ is differentiable, and its density is continuous, positive and uniformly bounded for all $j = 1, 2, \dots, k$, $k = 1, 2, \dots$.

G. In a neighborhood of $t_0$, $F_0$ is differentiable, and its density $f_0$ is continuous and strictly positive.

H. In a neighborhood of $t_0$, $a(t)$ is continuous and strictly positive, where $a(t)$ is defined as ... (3.7)

3.1 Consistency
VAN DER VAART AND WELLNER (2000) have proved the consistency of the NPMLE $\hat{F}_n$ by using preservation theorems for Glivenko-Cantelli classes. Here, we give another approach to the proof. Then, following this approach, we continue to study the rate of convergence of the NPMLE. Consider the probability space $(\Omega, \mathcal{A}, P_F)$, where $\Omega$ is $\{0, 1\}^{k+1} \times R_+^k \times \{k\}$; $\mathcal{A}$ is a $\sigma$-field generated by $\mathcal{A}_{k1} \times \mathcal{B}_k \times \mathcal{A}_{k3}$; $\mathcal{A}_{k1}$ is the class of all subsets of $\{e_j = (0, \dots, 1, 0, \dots, 0) : j = 1, \dots, k+1\}$, with the 1 in the $j$-th position; $\mathcal{B}_k$ is a $\sigma$-field generated by the set of all $k$-dimensional rectangles on $R_+^k$; $\mathcal{A}_{k3} = \sigma\{\emptyset, \{k\}\}$; and $P_F$ is a probability measure with density defined in (1.1) and dominating measure $\nu$ defined as

$$\nu(\{e_j\} \times B_k \times \{k\}) = \text{counting measure on } \{e_j\} \times P(K = k) \times P(\underline{T} \in B_k \mid K = k). \tag{3.8}$$
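Let $Q(t_{k,1}, \dots, t_{k,k} \mid K = k)$ be the conditional distribution function of $T_k$ conditional on $K = k$. Let $Q_{k,j}$ denote the marginal distribution of $T_{k,j}$, for $j = 1, 2, \dots, k$, $k = 1, 2, \dots$, conditional on $K = k$. Then the Hellinger distance is ... Note that

$$\int p_F \, d\nu = \sum_{k=1}^{\infty} P(K = k) \int \Big\{ \sum_{j=1}^{k+1} \big[F(t_{k,j}) - F(t_{k,j-1})\big] \Big\} \, dQ(t_k \mid K = k) = \sum_{k=1}^{\infty} P(K = k) \int \big[F(t_{k,k+1}) - F(t_{k,0})\big] \, dQ(t_k \mid K = k) = \sum_{k=1}^{\infty} P(K = k) = 1.$$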
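A small numerical sketch of the two facts used above: conditional on K = k and the observation times, the k + 1 cell probabilities telescope to 1, and a Hellinger distance between two such conditional laws can be evaluated directly. The normalization h^2 = (1/2) * sum (sqrt(p) - sqrt(q))^2, the exponential distribution functions, and the names cell_probs and hellinger are assumptions of this example rather than definitions taken from the paper.

```python
import math


def cell_probs(F, times):
    """Cell probabilities F(t_j) - F(t_{j-1}), j = 1..k+1, for sorted times,
    using the conventions F(t_0) = 0 and F(t_{k+1}) = 1."""
    cdf = [0.0] + [F(t) for t in times] + [1.0]
    return [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


def hellinger(p, q):
    """Hellinger distance between two discrete laws on the same cells,
    with the (assumed) normalization h^2 = 0.5 * sum (sqrt p - sqrt q)^2."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


if __name__ == "__main__":
    times = [0.5, 1.0, 2.0]
    p = cell_probs(lambda t: 1.0 - math.exp(-t), times)
    q = cell_probs(lambda t: 1.0 - math.exp(-2.0 * t), times)
    print(sum(p), hellinger(p, q))
```

With this normalization the distance lies between 0 and 1, and it is 0 exactly when the two vectors of cell probabilities coincide.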
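For fixed $F_0 \in \mathcal{F}$ and $p_F$ as defined in (1.1), let

$$\mathcal{P} = \{p_F : F \in \mathcal{F}\}, \tag{3.9}$$

$$m_F = \frac{2 p_F}{p_F + p_{F_0}} - 1 = \frac{p_F - p_{F_0}}{p_F + p_{F_0}}, \tag{3.10}$$

$$\mathcal{M} = \{m_F : F \in \mathcal{F}\}, \tag{3.11}$$

$$\mathcal{M}_\delta = \{m_F - m_{F_0} : h(p_F, p_{F_0}) < \delta,\ F \in \mathcal{F}\}. \tag{3.12}$$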
Then $\mathcal{P}$ is convex: i.e. for all $p_F, p_{F_0} \in \mathcal{P}$, and for $\alpha \in [0, 1]$, $\alpha p_F + (1 - \alpha) p_{F_0} = p_{\alpha F + (1 - \alpha) F_0} \in \mathcal{P}$. As shown by VAN DER VAART AND WELLNER (2000), page 123, the squared Hellinger distance $h^2(p_{\hat{F}_n}, p_{F_0})$ is less than or equal to $\|\mathbb{P}_n - P\|_{\mathcal{M}}$ for a density $p_F$ with respect to a dominating measure $\nu$. To show that $h(p_{F_0}, p_{\hat{F}_n}) \to_{a.s.} 0$, we only need to prove that the class $\mathcal{M}$ is a Glivenko-Cantelli class by showing that its bracketing numbers are finite for each $\epsilon > 0$; see Theorem 2.4.1 in VAN DER VAART AND WELLNER (1996). It follows from the following Lemma 3.1 that the bracketing number $N_{[\,]}(\epsilon, \mathcal{M}, L_r(P))$ is bounded by $N_{[\,]}(\epsilon, \mathcal{P}, L_r(Q_\sigma))$, where $dQ_\sigma$ ...

Lemma 3.1 For any integer $r \ge 1$, let ... $\sigma = \sup\{\sigma \ge 0 : \dots \in \mathcal{P}\}$ ..., $\sigma > 0$ ...

For the univariate "Mixed Case" interval censoring model, the following two lemmas show that the class $\mathcal{P}$ defined in (3.9) and the class $\mathcal{M}$ defined in (3.11) are both Glivenko-Cantelli classes under $EK < \infty$.
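Lemma 3.2 If $EK < \infty$ ...

... Let $I = N_{[\,]}(\epsilon, \mathcal{F}, L_1(Q))$. Then for the probability measure $P_{K,j}(B) = P(T_{K,j} \in B \mid K = k)$, where $B \in \mathcal{B}$, there exists a set of brackets

$$[F_1^L, F_1^U], \dots, [F_i^L, F_i^U], \dots, \qquad i = 1, \dots, I,$$

such that for any $F \in \mathcal{F}$, $F_i^L(t) \le F(t) \le F_i^U(t)$ for all $t$ and some $i$, and $E_{P_{K,j}} |F_i^U(t_{k,j}) - F_i^L(t_{k,j})| \le \epsilon$. Since ... and ..., it follows that ...

Let the corresponding brackets for $\mathcal{P}$ be defined by

$$p_i^L = \prod_{j=1}^{K+1} \big(F_i^L(t_{K,j}) - F_i^U(t_{K,j-1})\big)^{\delta_{K,j}}, \tag{5.20}$$

$$p_i^U = \prod_{j=1}^{K+1} \big(F_i^U(t_{K,j}) - F_i^L(t_{K,j-1})\big)^{\delta_{K,j}}. \tag{5.21}$$

Then

$$\int (p_i^U - p_i^L) \, d\nu \le \dots \le 2\epsilon \Big( \sum_{k=1}^{\infty} P(K = k)(k + 1) \Big) = 2 E(K + 1)\,\epsilon < \infty,$$

if $EK < \infty$. This implies that $\log N_{[\,]}(2\epsilon E(K+1), \mathcal{P}, L_1(P))$ ... Thus, by Theorem 2.4.1 in VAN DER VAART AND WELLNER (1996), $\mathcal{P}$ is a Glivenko-Cantelli class under $EK < \infty$. $\Box$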
Remark: From the above proof, we also have that ... (5.22) if $EK < \infty$. Here $\nu$ is defined in (3.8).
Proof of Lemma 3.3 Let ... Let $[p^L, p^U]$ be a pair of brackets for the class $\mathcal{P}$ as defined in (5.20), (5.21). Note that ... Define ... Then ... Thus, ... This implies that ...

Now for fixed $k \ge 1$ and $1 \le j \le k$, ..., where $c_0$, $c$ do not depend on $j$ and $k$ by assumption D. Hence we have ... Then, if $EK$ ... Take ...
$$\dots + \int_{t_0}^{t_0+h} \big(F_0(s) - F_0(t_0)\big) \, dG_n^{(0)}(s)$$

and

$$M(h) = E\Big[\int_{t_0}^{t_0+h} \big(F_0(s) - F_0(t_0)\big) \, dG_n^{(0)}(s)\Big].$$

It follows that proving (5.28) is equivalent to proving

$$n^{1/3} \hat{h}_n = n^{1/3} \arg\min_h M_n(h) = O_p(1).$$

Note that $h_0 = 0$ is the only location of the minimum of $M(h)$. As shown before, we have, using conditions F, G and H,

$$M(h) - M(h_0) = E\Big[\int_{t_0}^{t_0+h} \big(F_0(s) - F_0(t_0)\big) \, dG_n^{(0)}(s)\Big] = \int_{t_0}^{t_0+h} \big(F_0(s) - F_0(t_0)\big)\, a(s) \, ds = \tfrac{1}{2} f_0(t_0)\, a(t_0)\, h^2 + o(h^2),$$

as $h \to 0$. The difference between the process $M_n(h)$ and the process $M(h)$ is
...

Combining this with the fact that $M(h_0) = 0$ yields ...

Define a class $\mathcal{F}_\delta$ as ... The envelope function of the class $\mathcal{F}_\delta$ can be chosen as ... Following the same calculation of ..., we ...

$$\dots < 2C \sum_{k=1}^{\infty} P(K = k)\, k \sum_{j=1}^{k} \int 1_{[t_0 - \delta \le t_{k,j} \le t_0 + \delta]} \, dQ_{k,j} = 2C \sum_{k=1}^{\infty} P(K = k)\, k \Big[ \sum_{j=1}^{k} \int_{t_0 - \delta}^{t_0 + \delta} Q'_{k,j}(s) \, ds \Big] < 4 C C_1 \delta \, (EK^2) < \infty,$$

by conditions B, F and G. Similarly, following the same argument that gave ..., we obtain ... It follows that ... $d\epsilon = O(1)$. Thus we ... using Theorem 2.14.2 in VAN DER VAART AND WELLNER
VAN DER VAART AND WELLNER