Every Missing Not at Random Model Has Got a Missing at Random Counterpart With Equal Fit
Geert Molenberghs
Center for Statistics, Universiteit Hasselt, Belgium
Biostatistical Centre, Katholieke Universiteit Leuven, Belgium
[email protected]
[email protected]
www.censtat.uhasselt.be
www.kuleuven.ac.be/biostat/
Graybill Conference, June 12, 2008
Toenail Data
De Backer, De Keyser, De Vroey, Lesaffre (British Journal of Dermatology 1996)
• Toenail Dermatophyte Onychomycosis (TDO): common toenail infection, difficult to treat, affecting more than 2% of the population
• Classical treatments with antifungal compounds need to be administered until the whole nail has grown out healthy
• New compounds have been developed that reduce treatment to 3 months
• Randomized, double-blind, parallel group, multicenter study for the comparison of two such new compounds (A and B) for oral treatment
• Design:
  . 2 × 189 patients randomized, 36 centers
  . 48 weeks of total follow-up (12 months)
  . 12 weeks of treatment (3 months)
  . Measurements at months 0, 1, 2, 3, 6, 9, 12
• General research question:
Are both treatments equally effective for the treatment of TDO?
Unaffected nail length (mm)?
Severity relative to treatment of TDO?
• As response is related to toe size, we restrict to patients with big toenail as target nail ⇒ 150 and 148 subjects
• 30 randomly selected profiles, in each group:
Complication: Dropout (24%)
The Slovenian Plebiscite Rubin, Stern, and Vehovar (1995)
• Slovenian Public Opinion (SPO) Survey
• Four weeks prior to decisive plebiscite
• Three questions:
  1. Are you in favor of Slovenian independence?
  2. Are you in favor of Slovenia's secession from Yugoslavia?
  3. Will you attend the plebiscite?
• Political decision: ABSENCE ≡ NO
• Primary estimand: θ, the proportion in favor of independence
• Slovenian Public Opinion Survey Data:

                             Independence
Secession   Attendance     Yes     No      ∗
Yes         Yes           1191      8     21
Yes         No               8      0      4
Yes         ∗              107      3      9
No          Yes            158     68     29
No          No               7     14      3
No          ∗               18     43     31
∗           Yes             90      2    109
∗           No               1      2     25
∗           ∗               19      8     96
Slovenian Plebiscite ←→ Slovenian Public Opinion Survey
θ = 0.885

Estimator            θ̂
Pessimistic bound    0.694
Optimistic bound     0.904
Complete cases       0.928 ?
Available cases      0.929 ?
MAR (2 questions)    0.892
MAR (3 questions)    0.883
MNAR                 0.782
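The simplest estimators in this table can be checked directly from the survey counts. The sketch below is only an illustration, under the assumption that θ counts a respondent as YES when both the independence and the attendance answers are "yes" (consistent with ABSENCE ≡ NO), and that ∗ marks a missing answer. The variable names are not from the slides.

```python
# Survey counts keyed by (secession, attendance); each value holds the
# counts for independence = (yes, no, *). "*" is read as a missing answer.
table = {
    ("yes", "yes"): (1191, 8, 21),
    ("yes", "no"): (8, 0, 4),
    ("yes", "*"): (107, 3, 9),
    ("no", "yes"): (158, 68, 29),
    ("no", "no"): (7, 14, 3),
    ("no", "*"): (18, 43, 31),
    ("*", "yes"): (90, 2, 109),
    ("*", "no"): (1, 2, 25),
    ("*", "*"): (19, 8, 96),
}

n_total = sum(sum(row) for row in table.values())

# Respondents saying yes to independence and yes to attendance.
yes_yes = sum(row[0] for (sec, att), row in table.items() if att == "yes")

# Pessimistic bound: every respondent with an incomplete answer counts as NO.
pessimistic = yes_yes / n_total

# Complete cases: only respondents who answered all three questions.
cc_n = sum(row[0] + row[1] for (sec, att), row in table.items()
           if sec != "*" and att != "*")
cc_yes = sum(row[0] for (sec, att), row in table.items()
             if sec != "*" and att == "yes")
complete_cases = cc_yes / cc_n

# Available cases: respondents who answered independence and attendance.
ac_n = sum(row[0] + row[1] for (sec, att), row in table.items() if att != "*")
available_cases = yes_yes / ac_n

print(round(pessimistic, 3), round(complete_cases, 3), round(available_cases, 3))
```

Under these assumptions the three printed values agree with the pessimistic bound, complete-case and available-case entries above.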
Modeling Frameworks & Missing Data Mechanisms
$$f(y_i, r_i \mid X_i, \theta, \psi)$$

Selection Models: $f(y_i \mid X_i, \theta)\, f(r_i \mid X_i, y_i^o, y_i^m, \psi)$
  MCAR: $f(r_i \mid X_i, \psi)$
  −→ MAR: $f(r_i \mid X_i, y_i^o, \psi)$
  −→ MNAR: $f(r_i \mid X_i, y_i^o, y_i^m, \psi)$

Pattern-mixture Models: $f(y_i \mid X_i, r_i, \theta)\, f(r_i \mid X_i, \psi)$

Shared-parameter Models: $f(y_i \mid X_i, b_i, \theta)\, f(r_i \mid X_i, b_i, \psi)$
MAR in 3 Frameworks
Selection models:          $f(r_i \mid y_i, \psi) = f(r_i \mid y_i^o, \psi)$
Pattern-mixture models:    $f(y_i^m \mid y_i^o, r_i, \theta) = f(y_i^m \mid y_i^o, \theta)$
Shared-parameter models:   ?
MAR in Selection Models Diggle and Kenward (ApStat 1994)
$$f(r_i \mid y_i, \psi) = f(r_i \mid y_i^o, \psi)$$
• Longitudinal data:
$$\mathrm{logit}\,[P(D_i = j \mid D_i \ge j, y_{ij}, y_{i,j-1})] = \psi_0 + \psi_1 y_{i,j-1} + \psi_2 y_{ij}$$
  $\psi_2 \neq 0$ ←→ MNAR
  $\psi_2 = 0$ ←→ MAR
  $\psi_1 = \psi_2 = 0$ ←→ MCAR
• No dependence on the future (NFD): built in
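To make the role of ψ1 and ψ2 concrete, here is a minimal sketch of the logistic dropout model. The function names and all parameter values are hypothetical and only serve to show how the mechanism follows from which coefficients are zero.

```python
import math


def dropout_probability(y_prev, y_curr, psi0, psi1, psi2):
    """P(D_i = j | D_i >= j, y_ij, y_{i,j-1}) under the logistic dropout model."""
    eta = psi0 + psi1 * y_prev + psi2 * y_curr
    return 1.0 / (1.0 + math.exp(-eta))


def mechanism(psi1, psi2):
    """Classify the dropout mechanism implied by the coefficients."""
    if psi2 != 0:
        return "MNAR"  # depends on the current, possibly unobserved, outcome
    if psi1 != 0:
        return "MAR"   # depends on the previous, observed, outcome only
    return "MCAR"      # depends on no outcome at all


# Hypothetical parameter values, for illustration only.
print(mechanism(psi1=0.0, psi2=0.0))
print(mechanism(psi1=0.4, psi2=0.0))
print(mechanism(psi1=0.4, psi2=-0.2))

# Under MAR the current outcome does not change the dropout probability.
p_a = dropout_probability(y_prev=2.0, y_curr=1.0, psi0=-1.0, psi1=0.4, psi2=0.0)
p_b = dropout_probability(y_prev=2.0, y_curr=5.0, psi0=-1.0, psi1=0.4, psi2=0.0)
print(p_a == p_b)
```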
MAR in Pattern-mixture Models Molenberghs, Michiels, Kenward, and Diggle (Statistica Neerlandica 1998) Thijs, Molenberghs, Michiels, Verbeke, and Curran (Biostatistics 2002)
$$f(y_i^m \mid y_i^o, r_i, \theta) = f(y_i^m \mid y_i^o, \theta)$$
• For longitudinal data: ACMV, available case missing value restrictions:
$$\forall t \ge 2,\ \forall s < t:\quad f(y_{it} \mid y_{i1}, \dots, y_{i,t-1}, d_i = s) = f(y_{it} \mid y_{i1}, \dots, y_{i,t-1}, d_i \ge t)$$
• Practical implementation: doable!
$$f(y_{it} \mid y_{i1}, \dots, y_{i,t-1}, d_i = s) = \sum_{d=t}^{n} \frac{\alpha_d\, f_d(y_{i1}, \dots, y_{i,t-1})}{\sum_{d'=t}^{n} \alpha_{d'}\, f_{d'}(y_{i1}, \dots, y_{i,t-1})}\; f_d(y_{it} \mid y_{i1}, \dots, y_{i,t-1})$$
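The ACMV identification is a mixture over the patterns in which the measurement at time t is observed, with weights proportional to the pattern probability times the pattern-specific density of the past. The sketch below assumes that d counts the number of observed measurements, uses univariate normal densities, and uses made-up numbers throughout. It is an illustration of the weighting, not the authors' implementation.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def acmv_weights(alpha, past_density):
    """Weights alpha_d * f_d(past) / sum over d' of alpha_d' * f_d'(past)."""
    raw = [a * f for a, f in zip(alpha, past_density)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical setting: identify f(y_i2 | y_i1, d_i = 1) from the patterns
# d = 2 and d = 3, in which the second measurement is observed.
alpha = [0.3, 0.5]                      # pattern probabilities for d = 2, 3
y1 = 1.0                                # observed first measurement
past = [normal_pdf(y1, 0.0, 1.0),       # f_2(y_i1)
        normal_pdf(y1, 0.5, 1.0)]       # f_3(y_i1)
weights = acmv_weights(alpha, past)

# Pattern-specific conditional means of y_i2 given y_i1 (made up).
cond_means = [0.8 * y1, 0.2 + 0.9 * y1]
mixture_mean = sum(w * m for w, m in zip(weights, cond_means))
print(weights, mixture_mean)
```

When the past has the same density in every pattern, the weights reduce to the normalized pattern probabilities.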
Non-future Dependence in Pattern-Mixture Models Kenward, Molenberghs, and Thijs (Biometrika 2003)
• Within every pattern:
  . Past: build a model for the observed data
  . Present, given past: for the first unobserved time, given the past: free choice!
  . Future, given past and present: use ACMV-type restrictions
• Named NFMV: non-future missing values
• Equivalence: SeM: NFD ⇐⇒ PMM: NFMV
MAR in Shared-parameter Models Creemers, Hens, Aerts, Molenberghs, Verbeke, and Kenward (2008)
Conventional: $f(y_i \mid X_i, b_i, \theta)\, f(r_i \mid X_i, b_i, \psi)$
∩
Extended: $f(y_i^o \mid g_i, h_i, j_i, \ell_i)\, f(y_i^m \mid y_i^o, g_i, h_i, k_i, m_i)\, f(r_i \mid g_i, j_i, k_i, n_i)$
∪
MAR:
$$\frac{\int f(y_i^o \mid g_i, h_i, j_i)\, f(y_i^m \mid y_i^o, g_i, h_i, k_i)\, f(r_i \mid g_i, j_i, k_i)\, f(b_i)\, db_i}{\int f(y_i^o \mid g_i, j_i)\, f(r_i \mid g_i, j_i)\, f(b_i)\, db_i} = \frac{\int f(y_i^o \mid g_i, h_i)\, f(y_i^m \mid y_i^o, g_i, h_i)\, f(b_i)\, db_i}{f(y_i^o)}$$
MAR in Shared-parameter Models Creemers, Hens, Aerts, Molenberghs, Verbeke, and Kenward (2008)
Extended: $f(y_i^o \mid g_i, h_i, j_i, \ell_i)\, f(y_i^m \mid y_i^o, g_i, h_i, k_i, m_i)\, f(r_i \mid g_i, j_i, k_i, n_i)$
∪
MAR:
$$\frac{\int f(y_i^o \mid g_i, h_i, j_i)\, f(y_i^m \mid y_i^o, g_i, h_i, k_i)\, f(r_i \mid g_i, j_i, k_i)\, f(b_i)\, db_i}{\int f(y_i^o \mid g_i, j_i)\, f(r_i \mid g_i, j_i)\, f(b_i)\, db_i} = \frac{\int f(y_i^o \mid g_i, h_i)\, f(y_i^m \mid y_i^o, g_i, h_i)\, f(b_i)\, db_i}{f(y_i^o)}$$
∪
Sub-class MAR: $f(y_i^o \mid j_i, \ell_i)\, f(y_i^m \mid y_i^o, m_i)\, f(r_i \mid j_i, n_i)$
Slovenian Public Opinion Survey: An MNAR Model Family Baker, Rosenberger, and DerSimonian (1992)
• Counts: $Y_{r_1 r_2 jk}$
• Questions: $j, k = 1, 2$
• Non-response: $r_1, r_2 = 0, 1$

$$E(Y_{11jk}) = m_{jk}, \qquad E(Y_{10jk}) = m_{jk}\,\beta_{jk}, \qquad E(Y_{01jk}) = m_{jk}\,\alpha_{jk}, \qquad E(Y_{00jk}) = m_{jk}\,\alpha_{jk}\,\beta_{jk}\,\gamma_{jk}$$

  . $\alpha_{jk}$: non-response on independence question
  . $\beta_{jk}$: non-response on attendance question
  . $\gamma_{jk}$: interaction between both non-response indicators
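The four expectations translate directly into a table of expected counts per response pattern. The following sketch uses hypothetical values for m, α, β and γ (constant α and β, as in a BRD1-type structure). The function name and dictionary layout are illustrative choices, not part of the original model specification.

```python
def brd_expected_counts(m, alpha, beta, gamma):
    """Expected counts E(Y_{r1 r2 jk}) for all four response patterns.

    All arguments are dicts keyed by the answer pair (j, k).
    """
    expected = {}
    for (j, k), m_jk in m.items():
        a, b, g = alpha[(j, k)], beta[(j, k)], gamma[(j, k)]
        expected[(1, 1, j, k)] = m_jk              # both answered
        expected[(1, 0, j, k)] = m_jk * b          # attendance missing
        expected[(0, 1, j, k)] = m_jk * a          # independence missing
        expected[(0, 0, j, k)] = m_jk * a * b * g  # both missing
    return expected


# Hypothetical parameter values, for illustration only.
cells = [(1, 1), (1, 2), (2, 1), (2, 2)]
m = dict(zip(cells, [100.0, 20.0, 10.0, 5.0]))
alpha = {c: 0.1 for c in cells}   # constant alpha
beta = {c: 0.2 for c in cells}    # constant beta
gamma = {c: 2.0 for c in cells}

expected = brd_expected_counts(m, alpha, beta, gamma)
print(expected[(0, 0, 1, 1)])
```

Letting `alpha` or `beta` vary with j or k gives the other structures in the family.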
Slovenian Public Opinion Survey: Identifiable Models
Model   Structure     d.f.   loglik      θ̂       C.I.
BRD1    (α, β)         6     -2495.29   0.892   [0.878;0.906]
BRD2    (α, βj)        7     -2467.43   0.884   [0.869;0.900]
BRD3    (αk, β)        7     -2463.10   0.881   [0.866;0.897]
BRD4    (α, βk)        7     -2467.43   0.765   [0.674;0.856]
BRD5    (αj, β)        7     -2463.10   0.844   [0.806;0.882]
BRD6    (αj, βj)       8     -2431.06   0.819   [0.788;0.849]
BRD7    (αk, βk)       8     -2431.06   0.764   [0.697;0.832]
BRD8    (αj, βk)       8     -2431.06   0.741   [0.657;0.826]
BRD9    (αk, βj)       8     -2431.06   0.867   [0.851;0.884]
Slovenian Public Opinion Survey: An MNAR “Interval”
θ = 0.885

Estimator                    θ̂
[Pessimistic; optimistic]    [0.694;0.904]
Complete cases               0.928
Available cases              0.929
MAR (2 questions)            0.892
MAR (3 questions)            0.883
MNAR                         0.782
MNAR “interval”              [0.741;0.892]
Slovenian Public Opinion Survey: Interval of Ignorance

Model      Structure      d.f.   loglik      θ̂               C.I.
BRD1       (α, β)          6     -2495.29   0.892           [0.878;0.906]
BRD2       (α, βj)         7     -2467.43   0.884           [0.869;0.900]
BRD3       (αk, β)         7     -2463.10   0.881           [0.866;0.897]
BRD4       (α, βk)         7     -2467.43   0.765           [0.674;0.856]
BRD5       (αj, β)         7     -2463.10   0.844           [0.806;0.882]
BRD6       (αj, βj)        8     -2431.06   0.819           [0.788;0.849]
BRD7       (αk, βk)        8     -2431.06   0.764           [0.697;0.832]
BRD8       (αj, βk)        8     -2431.06   0.741           [0.657;0.826]
BRD9       (αk, βj)        8     -2431.06   0.867           [0.851;0.884]
Model 10   (αk, βjk)       9     -2431.06   [0.762;0.893]   [0.744;0.907]
Model 11   (αjk, βj)       9     -2431.06   [0.766;0.883]   [0.715;0.920]
Model 12   (αjk, βjk)     10     -2431.06   [0.694;0.904]
Every MNAR Model Has Got an MAR Counterpart Molenberghs, Beunckens, Sotto, and Kenward (JRSSB 2008) Creemers, Hens, Aerts, Molenberghs, Verbeke, and Kenward (2008)
• Fit an MNAR model to a set of incomplete data
• Change the conditional distribution of the unobserved outcomes, given the observed ones, to comply with MAR
• Resulting new model has exactly the same fit as the original MNAR model
• The missing data mechanism has changed
• This implies that definitively testing for MAR versus MNAR is not possible
MAR Counterpart to Pattern-mixture Models
$$f(y_i^o, y_i^m, r_i \mid \hat\theta, \hat\psi) = f(y_i^o \mid r_i, \hat\theta)\, f(r_i \mid \hat\psi)\, f(y_i^m \mid y_i^o, r_i, \hat\theta)$$
↓
$$h(y_i^o, y_i^m, r_i \mid \hat\theta, \hat\psi) = f(y_i^o \mid r_i, \hat\theta)\, f(r_i \mid \hat\psi)\, f(y_i^m \mid y_i^o, \hat\theta, \hat\psi)$$
• Starting from PMM is “natural”: clear separation into:
  . fully observable components
  . entirely unobserved component
MAR Counterpart to Selection Models
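$$f(y_i^o, y_i^m, r_i \mid \hat\theta, \hat\psi) = f(y_i^o, y_i^m \mid \hat\theta)\, f(r_i \mid y_i^o, y_i^m, \hat\psi)$$
↓
$$f(y_i^o, y_i^m, r_i \mid \hat\theta, \hat\psi) = f(y_i^o \mid r_i, \hat\theta, \hat\psi)\, f(r_i \mid \hat\theta, \hat\psi)\, f(y_i^m \mid y_i^o, r_i, \hat\theta, \hat\psi)$$
↓
$$h(y_i^o, y_i^m, r_i \mid \hat\theta, \hat\psi) = f(y_i^o \mid r_i, \hat\theta, \hat\psi)\, f(r_i \mid \hat\theta, \hat\psi)\, f(y_i^m \mid y_i^o, \hat\theta, \hat\psi)$$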
MAR Counterpart to Shared-parameter Models
$$f(y_i^o, y_i^m, r_i \mid b_i) = f(y_i^o \mid g_i, h_i, j_i, \ell_i)\, f(y_i^m \mid y_i^o, g_i, h_i, k_i, m_i)\, f(r_i \mid g_i, j_i, k_i, n_i)$$
↓
$$h(y_i^o, y_i^m, r_i \mid b_i) = f(y_i^o \mid g_i, h_i, j_i, \ell_i)\, h(y_i^m \mid y_i^o, m_i)\, f(r_i \mid g_i, j_i, k_i, n_i)$$
with
$$h(y_i^m \mid y_i^o, m_i) = \int_{g_i} \int_{h_i} \int_{k_i} f(y_i^m \mid y_i^o, g_i, h_i, k_i, m_i)\, dg_i\, dh_i\, dk_i$$
Slovenian Public Opinion Survey: Counterpart Added

Model      Structure      d.f.   loglik      θ̂               C.I.            θ̂ (MAR)
BRD1       (α, β)          6     -2495.29   0.892           [0.878;0.906]   0.8920
BRD2       (α, βj)         7     -2467.43   0.884           [0.869;0.900]   0.8915
BRD3       (αk, β)         7     -2463.10   0.881           [0.866;0.897]   0.8915
BRD4       (α, βk)         7     -2467.43   0.765           [0.674;0.856]   0.8915
BRD5       (αj, β)         7     -2463.10   0.844           [0.806;0.882]   0.8915
BRD6       (αj, βj)        8     -2431.06   0.819           [0.788;0.849]   0.8919
BRD7       (αk, βk)        8     -2431.06   0.764           [0.697;0.832]   0.8919
BRD8       (αj, βk)        8     -2431.06   0.741           [0.657;0.826]   0.8919
BRD9       (αk, βj)        8     -2431.06   0.867           [0.851;0.884]   0.8919
Model 10   (αk, βjk)       9     -2431.06   [0.762;0.893]   [0.744;0.907]   0.8919
Model 11   (αjk, βj)       9     -2431.06   [0.766;0.883]   [0.715;0.920]   0.8919
Model 12   (αjk, βjk)     10     -2431.06   [0.694;0.904]                   0.8919
Slovenian Public Opinion Survey: Incomplete Data
Rows: attendance (Yes, No, ∗); columns: independence (Yes, No, ∗).

Observed ≡ BRD7 ≡ BRD7(MAR) ≡ BRD9 ≡ BRD9(MAR):
  1439      78      159
    16      16       32
   144      54      136

BRD1 ≡ BRD1(MAR):
  1381.6   101.7   182.9
    24.2    41.4     8.1
   179.7    18.3   136.0

BRD2 ≡ BRD2(MAR):
  1402.2   108.9   159.0
    15.6    22.3    32.0
   181.2    16.8   136.0
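Slovenian Public Opinion Survey: Complete-data Prediction

Each row gives the predicted counts for the cells (independence, attendance) = (Yes, Yes), (No, Yes), (Yes, No), (No, No), within one response pattern.

BRD1 ≡ BRD1(MAR):
  Both answered           1381.6   101.7    24.2    41.4
  Independence missing     170.4    12.5     3.0     5.1
  Attendance missing       176.6    13.0     3.1     5.3
  Both missing             121.3     9.0     2.1     3.6

BRD2:
  Both answered           1402.2   108.9    15.6    22.3
  Independence missing     147.5    11.5    13.2    18.8
  Attendance missing       179.2    13.9     2.0     2.9
  Both missing             105.0     8.2     9.4    13.4

BRD2(MAR):
  Both answered           1402.2   108.9    15.6    22.3
  Independence missing     147.7    11.3    13.3    18.7
  Attendance missing       177.9    12.5     3.3     4.3
  Both missing             121.2     9.3     2.3     3.2

BRD7:
  Both answered           1439      78      16      16
  Independence missing       3.2   155.8     0.0    32.0
  Attendance missing       142.4    44.8     1.6     9.2
  Both missing               0.4   112.5     0.0    23.1

BRD9:
  Both answered           1439      78      16      16
  Independence missing     150.8     8.2    16.0    16.0
  Attendance missing       142.4    44.8     1.6     9.2
  Both missing              66.8    21.0     7.1    41.1

BRD7(MAR) ≡ BRD9(MAR):
  Both answered           1439      78      16      18
  Independence missing     148.1    10.9    11.8    20.2
  Attendance missing       141.5    38.4     2.5    15.6
  Both missing             121.3     9.0     2.1     3.6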
Slovenian Public Opinion Survey: Collapsed (Marginalized) Predictions

BRD1 ≡ BRD1(MAR):          1849.9   136.2   32.4   55.4   ⇒   θ̂ = 89.2%
BRD2:                      1833.9   142.5   40.2   57.5   ⇒   θ̂ = 88.4%
BRD2(MAR):                 1849.0   142.0   34.5   48.5   ⇒   θ̂ = 89.2%
BRD7:                      1585.0   391.1   17.6   80.3   ⇒   θ̂ = 76.4%
BRD9:                      1799.7   152.0   40.7   82.3   ⇒   θ̂ = 86.7%
BRD7(MAR) ≡ BRD9(MAR):     1849.9   136.3   30.4   57.4   ⇒   θ̂ = 89.2%
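Because BRD7 and BRD9 reproduce the observed incomplete table exactly, their MAR counterpart can be approximated by fitting a saturated multinomial model to the observed counts under ignorable MAR. The sketch below does this with a plain EM algorithm on the table collapsed over the secession question. It is an illustrative reconstruction under that assumption, not the authors' code, and the function and variable names are hypothetical.

```python
# Observed counts, collapsed over the secession question.
# Cell order: (independence, attendance) = (yes,yes), (no,yes), (yes,no), (no,no).
complete = [1439.0, 78.0, 16.0, 16.0]    # both questions answered
att_only = {"yes": 159.0, "no": 32.0}    # attendance answered, independence missing
ind_only = {"yes": 144.0, "no": 54.0}    # independence answered, attendance missing
both_missing = 136.0


def em_mar(n_iter=500):
    """EM for the saturated 2x2 multinomial under ignorable (MAR) missingness."""
    n = (sum(complete) + sum(att_only.values())
         + sum(ind_only.values()) + both_missing)
    p = [c / sum(complete) for c in complete]   # start from the complete cases
    for _ in range(n_iter):
        yy, ny, yn, nn = p
        filled = list(complete)
        # E-step: split partially observed counts over the cells that remain
        # possible, in proportion to the current cell probabilities.
        filled[0] += att_only["yes"] * yy / (yy + ny)
        filled[1] += att_only["yes"] * ny / (yy + ny)
        filled[2] += att_only["no"] * yn / (yn + nn)
        filled[3] += att_only["no"] * nn / (yn + nn)
        filled[0] += ind_only["yes"] * yy / (yy + yn)
        filled[2] += ind_only["yes"] * yn / (yy + yn)
        filled[1] += ind_only["no"] * ny / (ny + nn)
        filled[3] += ind_only["no"] * nn / (ny + nn)
        filled = [f + both_missing * q for f, q in zip(filled, p)]
        # M-step: re-estimate the cell probabilities from the filled-in table.
        p = [f / n for f in filled]
    return p, [q * n for q in p]


p_hat, predicted = em_mar()
theta_mar = p_hat[0]
print(round(theta_mar, 3), [round(x, 1) for x in predicted])
```

The resulting estimate of θ should land close to the 89.2% reported for the MAR counterparts, and the first predicted count close to 1849.9.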
Toenail Data: Unaffected Nail Length
• We opt for the following SPM:
$$E(Y_{ij} \mid g_i, T_i, t_j, \beta) = \beta_0 + g_i + \beta_1 T_i + \beta_2 t_j + \beta_3 T_i t_j$$
$$\mathrm{logit}\,[P(R_{ij} = 1 \mid R_{i,j-1} = 0, g_i, T_i, t_j, \gamma)] = \gamma_0 + \gamma_{01} g_i + \gamma_1 T_i + \gamma_2 t_j + \gamma_3 T_i t_j$$
• with
  . $Y_{ij}$: unaffected nail length for subject i at occasion j
  . $t_j$: time at which the jth measurement is made
  . $T_i$: treatment indicator for subject i
  . $g_i$: normal random effect
• Parameter estimates (standard errors):

                         Unaffected nail length           Dropout
Effect                   Parameter   Estimate (s.e.)      Parameter   Estimate (s.e.)
Mean structure parameters
Intercept                β0          2.510 (0.247)        γ0          -3.127 (0.282)
Treatment                β1          0.255 (0.347)        γ1          -0.538 (0.436)
Time                     β2          0.558 (0.023)        γ2           0.035 (0.041)
Treatment-by-time        β3          0.048 (0.031)        γ3           0.040 (0.061)
Variance-covariance structure parameters
Residual variance        σ²          6.937 (0.248)
Scale factor                                              γ01         -0.076 (0.057)
Rand. int. variance      τ²          6.507 (0.630)        γ01² τ²      0.038 (0.056)
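As a quick reading aid, the point estimates can be plugged into the two model equations. The sketch below assumes time is measured in months and that T = 1 denotes one of the two arms (the slides do not say which). The function names are illustrative, and standard errors are ignored.

```python
import math

# Point estimates taken from the table of parameter estimates.
beta = {"intercept": 2.510, "treatment": 0.255,
        "time": 0.558, "treatment_time": 0.048}
gamma = {"intercept": -3.127, "scale": -0.076, "treatment": -0.538,
         "time": 0.035, "treatment_time": 0.040}


def mean_nail_length(treatment, month, g=0.0):
    """E(Y_ij | g_i, T_i, t_j): conditional mean of unaffected nail length."""
    return (beta["intercept"] + g + beta["treatment"] * treatment
            + beta["time"] * month
            + beta["treatment_time"] * treatment * month)


def dropout_probability(treatment, month, g=0.0):
    """P(R_ij = 1 | R_{i,j-1} = 0, g_i, T_i, t_j) from the logistic model."""
    eta = (gamma["intercept"] + gamma["scale"] * g
           + gamma["treatment"] * treatment + gamma["time"] * month
           + gamma["treatment_time"] * treatment * month)
    return 1.0 / (1.0 + math.exp(-eta))


# Conditional means at month 12 for a subject with random effect g = 0.
print(mean_nail_length(0, 12), mean_nail_length(1, 12))
# Dropout probability at baseline for T = 0 and g = 0.
print(dropout_probability(0, 0))
```

A negative scale factor γ01 means that subjects with a larger random intercept have a slightly lower dropout probability, although the estimate is small relative to its standard error.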
• Graphical representation of predictions for incomplete portions:
  . MNAR model (dashed lines): $Y_i^m \mid y_i^o, g_i \sim N(X_i^m \beta + Z_i g_i,\ \sigma^2 I_i)$
  . MAR counterpart (solid lines): $Y_i^m \mid y_i^o \sim N(X_i \beta,\ d J_i + \sigma^2 I_i)$
Conclusion: Correspondence Between Model Families Molenberghs, Michiels, Kenward, and Diggle (Statistica Neerlandica 1998) Kenward, Molenberghs, and Thijs (Biometrika 2003) Creemers, Hens, Aerts, Molenberghs, Verbeke, and Kenward (2008)
SeM :   MCAR   ⊂   MAR           ⊂   NFD                  ⊂   general MNAR
         ↕          ↕                 ↕                        ↕
PMM :   MCAR   ⊂   ACMV          ⊂   NFMV (≠ interior)    ⊂   general MNAR
         ↕          ↕                 ↕                        ↕
SPM :   MCAR   ⊂   Theorem 1     ⊂   Theorem 2            ⊂   general MNAR
                    ∪                 ∪
                   Subfamily 1       Subfamily 2
Conclusion: Counterparts to Models Molenberghs, Beunckens, Sotto, and Kenward (JRSSB 2008) Creemers, Hens, Aerts, Molenberghs, Verbeke, and Kenward (2008) Verbeke and Molenberghs (2008)
• MNAR model ⇒ MAR model:
  . Observed data: same fit
  . Unobserved data given observed data: MAR prediction
• Holds more generally:
Conclusion: Counterparts to Models
Enriched data          Coarse data
Augmented data         Incomplete data
Random effects         Censored data
Latent classes         Grouped data
Latent variables
Mixtures
Additional Material
Toenail Data: Severity of Infection
$$\pi_{g i_1 i_2 r t} = \pi_g \cdot \pi_{i_1 \mid g} \cdot \pi_{i_2 \mid i_1 g t} \cdot \pi_{r \mid g}$$

Variable                       Index   0             1
Complete first measurement     i1      non-severe    severe
Incomplete last measurement    i2      non-severe    severe
Dropout indicator              r       dropout       completer
Treatment arm                  t       standard      experimental
Latent class                   g       class 0       class 1
Toenail Data: Severity of Infection
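$$\pi_g = \frac{e^{\alpha g}}{1 + e^{\alpha}}, \qquad \pi_{i_1 \mid g} = \frac{e^{(\beta_0 + \beta_1 g)\, i_1}}{1 + e^{\beta_0 + \beta_1 g}}$$
$$\pi_{i_2 \mid i_1 g t} = \frac{e^{(\gamma_0 + \gamma_1 i_1 + \gamma_2 g + \gamma_3 i_1 g + \gamma_4 t)\, i_2}}{1 + e^{\gamma_0 + \gamma_1 i_1 + \gamma_2 g + \gamma_3 i_1 g + \gamma_4 t}}, \qquad \pi_{r \mid g} = \frac{e^{(\delta_0 + \delta_1 g)\, r}}{1 + e^{\delta_0 + \delta_1 g}}$$

Model   Restriction      Mechanism   Implication
Bin1    β1 = 0           MNAR        Bin1 ≠ Bin1(MAR)
Bin2    γ2 = γ3 = 0      MAR         Bin2 = Bin2(MAR)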
For the fitted models, the three dropout columns per arm are the two predicted cells followed by their total.

                 Standard treatment                        Experimental treatment
                 Completers        Dropouts                Completers        Dropouts
Observed data
                 77       5        10                      79       3        11
                 42       9         3                      42       3         6
Fit of Model ‘Bin1’
                 76.85    5.66     9.04   0.34    9.38     81.21    2.43     9.36   0.15    9.51
                 40.60    7.99     4.62   0.90    5.52     45.62    3.63     5.19   0.41    5.60
Fit of Model ‘Bin1(MAR)’
                 77.12    5.39     8.77   0.61    9.38     81.32    2.32     9.24   0.26    9.51
                 40.61    7.98     4.62   0.91    5.52     45.63    3.63     5.18   0.41    5.59
Fit of Model ‘Bin2’ ≡ ‘Bin2(MAR)’
                 75.86    5.58     9.72   0.72   10.44     80.16    2.40    10.27   0.31   10.58
                 41.50    8.15     3.74   0.73    4.47     46.61    3.72     4.20   0.34    4.53