Strategy for modelling non-random missing data mechanisms in longitudinal studies: application to income data from the Millennium Cohort Study
Alexina Mason, Nicky Best, Sylvia Richardson (Imperial College London) and Ian Plewis (University of Manchester)
Technical report available from www.bias-project.org.uk
3. Add a Response Model of Missingness (RMoM)
We propose a strategy for using Bayesian methods for a ‘statistically principled’ investigation of data that contain missing covariates and missing responses, where the missingness is likely to be non-random. The first part of this strategy entails constructing a ‘base model’ by selecting a model of interest, then adding a sub-model to impute the missing covariates, followed by a sub-model to allow informative missingness in the response. The second part involves running a series of sensitivity analyses to check the robustness of the conclusions. We apply our strategy to a question about the prediction of income, using data from the first two sweeps of the Millennium Cohort Study.
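This sub-model allows informative missingness in the response by modelling $m_i$, a binary missing value indicator for $y_{i2}$, such that

$$m_i = \begin{cases} 1 & y_{i2} \text{ observed} \\ 0 & y_{i2} \text{ missing} \end{cases}$$

$$m_i \sim \text{Bernoulli}(p_i)$$

$$\text{logit}(p_i) = \theta_0 + \text{Piecewise}(level_i) + \text{Piecewise}(change_i) + \sum_k \theta_k w_{ki}$$

where $level_i = hpay_{i1}$, $change_i = hpay_{i2} - hpay_{i1}$, and the covariates $w_{ki}$ are eth (ethnic group), sc (social class) and ctry (country).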
1. select MoI using complete cases
2. add CMoM
Piecewise($level_i$) is a piecewise linear function of $level_i$ with a knot at $level_i = 10$.
5. PARAMETER SENSITIVITY: run the model with the RMoM parameters associated with informative missingness (δ) fixed to a range of plausible values
Figure 1 shows the change in hourly pay for an individual with a partner against their hourly pay without a partner (all other characteristics remain unchanged) for a range of models.
[Figure 1 legend: BASE, BASE 95% CI, MoI, MoI 95% CI, MAR, PS (δ1 = δ2 = 0.5), AS1, AS2, AS3, AS4]
BASE provides some evidence that gaining a partner between sweeps is associated with lower pay. There is clearly some sensitivity to our model assumptions, and AS4 (linear functional form of RMoM) provides stronger evidence that gaining a partner is associated with lower pay. By comparison, the complete case analysis (MoI) underestimates the decrease and fails to fully capture the uncertainty in the estimates.
[Figure 1 x-axis: hourly pay without a partner (£)]
5. Parameter Sensitivity
The values of δ1 and δ2 control the degree of departure from MAR missingness. For the parameter sensitivity analysis, a series of models is run with δ fixed to different values.
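The grid of fixed δ values can be sketched as follows. This is a minimal illustration, not the analysis code: the grid values and the helper names `delta_grid` and `label_run` are assumptions, and the labelling assumes that δ1 = δ2 = 0 corresponds to the MAR case, since δ controls the departure from MAR.

```python
from itertools import product


def delta_grid(values):
    """Return every (delta1, delta2) pair to hold fixed in one model run."""
    return list(product(values, repeat=2))


def label_run(delta1, delta2):
    """Label a run; both deltas at zero is assumed to be the MAR special case."""
    return "MAR" if delta1 == 0 and delta2 == 0 else "MNAR"


# Hypothetical grid: the exact values used in the analysis are not listed here.
values = [-1.0, -0.5, 0.0, 0.5, 1.0]
runs = {pair: label_run(*pair) for pair in delta_grid(values)}
```

Each key of `runs` would correspond to one model fit with δ held at that pair of values.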
[Figure 2 plot area: contours of the posterior mean over δ1 and δ2, with points labelled BASE, MAR and AS4]
determine region of high plausibility
If all the PS variants are plausible, then we cannot even be sure about the direction of the effect of change in partnership status on income, as the models suggest a range of conclusions from strong evidence of a positive effect to strong evidence of a negative effect.
NO
YES
Figure 2: Posterior mean of proportional change in pay associated with gaining a partner between sweeps
Sensitivity of the proportional change in pay associated with gaining a partner between sweeps to the different assumptions can be displayed graphically, and two possibilities are shown (Figures 2 and 3).
Are conclusions robust?
report robustness
4. Assumption Sensitivity
elicit expert knowledge
BASE MODEL
$$\text{Piecewise}(level_i) = \begin{cases} \theta_{level[1]} \times (level_i - 10) & level_i < 10 \\ \theta_{level[2]} \times (level_i - 10) & level_i \ge 10 \end{cases}$$
Figure 1: Presentation of results
[Figure 1 y-axis: change in hourly pay with a partner (£)]
seek additional data
run alternative models with key assumptions changed, including
• MoI error distribution
• MoI response transform
• RMoM functional form
$$\text{Piecewise}(change_i) = \begin{cases} \delta_1 \times change_i & change_i < 0 \\ \delta_2 \times change_i & change_i \ge 0 \end{cases}$$
& vague priors
3. add RMoM
4. ASSUMPTION SENSITIVITY
The remaining term of the RMoM linear predictor is $\sum_k \theta_k w_{ki}$.
The choice of functional form and the position of the knots are based on expert knowledge.
Alternative (AS4): linear functional form
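The RMoM linear predictor can be sketched in a few lines. This is an illustration under stated assumptions rather than the fitted model: the function names and all parameter values are hypothetical, and in the actual model the θ parameters have vague priors while δ is either estimated or fixed.

```python
import math


def piecewise_level(level, theta_level, knot=10.0):
    """Piecewise linear term in level, with separate slopes either side of the knot."""
    slope = theta_level[0] if level < knot else theta_level[1]
    return slope * (level - knot)


def piecewise_change(change, delta1, delta2):
    """Piecewise linear term in change: delta1 for decreases, delta2 otherwise."""
    return (delta1 if change < 0 else delta2) * change


def response_prob(level, change, theta0, theta_level, delta1, delta2,
                  theta_w=(), w=()):
    """Return p_i, the probability that m_i = 1 (sweep 2 pay observed)."""
    eta = (theta0
           + piecewise_level(level, theta_level)
           + piecewise_change(change, delta1, delta2)
           + sum(t * x for t, x in zip(theta_w, w)))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical parameter values, chosen only to exercise the functions.
p = response_prob(level=8.0, change=-2.0, theta0=1.0,
                  theta_level=(0.1, 0.2), delta1=0.5, delta2=0.5)
```

Setting the two slopes equal within each term makes the function linear in level and change, which is one possible reading of the AS4 alternative.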
note plausible alternatives
6. assess fit
Figure 3: Proportional change in pay associated with gaining a partner between sweeps
[Panels: delta2 = −1, −0.5, 0, 0.5, 1; y-axis: $e^{\beta_{sing}}$; legend: posterior mean, 95% interval]
recognise uncertainty
Our proposed model takes account of the design of the survey and the correlation between the two data points for each individual.
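$$y_{it} \sim t_4(\mu_{it}, \sigma^2)$$

$$\mu_{it} = \alpha_i + \gamma_{s(i)} + \sum_{k=1}^{p} \beta_k x_{kit}$$

& vague priors

where
• $y_{it}$ is the log of hourly pay (hpay)
• the $t_4$ error distribution gives robustness to outliers
• $\alpha_i$ are individual random effects
• $\gamma_{s(i)}$ are stratum specific intercepts
• the covariates $x_{kit}$ are age (main respondent’s age), reg (London/other) and sing (single/partner)

Alternative (AS1): cube root transform
Alternative (AS2): Normal errors
Alternative (AS3): include age²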
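As a rough illustration of the MoI, the sketch below evaluates the linear predictor for one individual at one sweep and draws a response with $t_4$ errors. All numbers, the covariate coding and the function names are hypothetical. The $t_4$ draw uses the standard construction of a Student t variate as a standard normal divided by the square root of a scaled chi-square.

```python
import math
import random


def linear_predictor(alpha_i, gamma_s, beta, x):
    """mu_it = alpha_i + gamma_s(i) + sum over k of beta_k * x_kit."""
    return alpha_i + gamma_s + sum(b * v for b, v in zip(beta, x))


def draw_t4(mu, sigma, rng):
    """Draw from a t distribution with 4 degrees of freedom, location mu, scale sigma."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(2.0, 2.0)  # chi-square with 4 degrees of freedom
    return mu + sigma * z / math.sqrt(v / 4.0)


# Hypothetical coefficients for (age, reg, sing) and one covariate row.
beta = [0.01, 0.1, -0.05]
x = [30.0, 1.0, 0.0]
mu = linear_predictor(alpha_i=1.0, gamma_s=0.2, beta=beta, x=x)

rng = random.Random(1)
log_pay = draw_t4(mu, sigma=0.3, rng=rng)
hourly_pay = math.exp(log_pay)  # back-transform from the log scale
```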
2. Add a Covariate Model of Missingness (CMoM)
The MoI will not run with missing covariates, so we must add a CMoM to incorporate incomplete cases. Missing sweep 2 values for sing are imputed using the equations

$$sing_{i2} \sim \text{Bernoulli}(q)$$

$$q \sim \text{Uniform}(0, 1)$$

Missing sweep 2 values for reg are set to their sweep 1 values. Missing sweep 2 values for age are set to their sweep 1 values plus the mean difference between sweeps 1 and 2 among individuals observed at both sweeps.
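The three fill-in rules can be sketched as below. This is a simplified stand-alone illustration with hypothetical data and function names. For sing, it draws q from the Beta distribution implied by a Uniform(0, 1) prior and the observed sweep 2 values alone; in the full Bayesian model q is estimated jointly with everything else, so this is only an approximation to what the model does.

```python
import random
from statistics import mean


def impute_age(age1, age2):
    """Fill missing sweep 2 ages with sweep 1 age plus the mean observed difference."""
    diffs = [a2 - a1 for a1, a2 in zip(age1, age2) if a2 is not None]
    shift = mean(diffs)
    return [a2 if a2 is not None else a1 + shift for a1, a2 in zip(age1, age2)]


def impute_reg(reg1, reg2):
    """Fill missing sweep 2 regions with the sweep 1 value."""
    return [r2 if r2 is not None else r1 for r1, r2 in zip(reg1, reg2)]


def impute_sing(sing2, rng):
    """Fill missing sing values with Bernoulli(q) draws (0/1 coding is assumed)."""
    observed = [s for s in sing2 if s is not None]
    ones = sum(observed)
    zeros = len(observed) - ones
    # Uniform(0, 1) prior plus Bernoulli data gives a Beta(1 + ones, 1 + zeros) posterior.
    q = rng.betavariate(1 + ones, 1 + zeros)
    return [s if s is not None else int(rng.random() < q) for s in sing2]


# Hypothetical records; None marks a missing sweep 2 value.
ages = impute_age([30, 40, 25], [32, None, 27])
regs = impute_reg(["London", "other"], [None, "London"])
sings = impute_sing([1, None, 0], random.Random(0))
```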
[Figure 3 x-axis in each panel: δ1]
6. Assess fit of validation sample
Some sweep 2 data were collected from 7 individuals who were originally non-contacts or refusals in sweep 2, after they were re-issued by the fieldwork agency. We set these data to missing before fitting our models, so they can now be used for model checking. We calculate the mean square error (MSE) of the fit of hourly pay for these 7 individuals, for use as a summary measure of the performance of our models.

For the assumption sensitivities, Table 1 suggests that the models with the linear functional form for the RMoM (AS4) and with the cube root transform (AS1) fit the 7 re-issued individuals best. For the parameter sensitivity, Figure 4 shows that this measure provides greatest support for the models in the upper right quadrant. The results from two such models are shown in Figure 1.

Table 1: MSE of imputed hourly pay for 7 re-issued individuals

Model   MSE for re-issues
        median   95% interval
BASE    21.4     (3.6, 347.8)
AS1      8.0     (2.1, 55.2)
AS2     29.1     (4.2, 154.1)
AS3     16.4     (3.1, 296.2)
AS4      9.5     (3.2, 26.1)
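The MSE summary can be sketched as follows. It assumes that the MSE is computed separately for each posterior draw of the imputed values and then summarised by its median and 2.5% and 97.5% quantiles, which is one natural reading of the median and 95% interval reported in Table 1. The data and function names are hypothetical.

```python
from statistics import median, quantiles


def mse(imputed, observed):
    """Mean square error between imputed and observed hourly pay."""
    return sum((a - b) ** 2 for a, b in zip(imputed, observed)) / len(observed)


def summarise_mse(draws, observed):
    """Return (median, (lower, upper)) of the per-draw MSE values."""
    values = [mse(draw, observed) for draw in draws]
    # n=40 gives cut points at 2.5% steps; the inclusive method stays within the data range.
    cuts = quantiles(values, n=40, method="inclusive")
    return median(values), (cuts[0], cuts[-1])


# Hypothetical example with 2 individuals and 3 posterior draws.
observed = [10.0, 10.0]
draws = [[10.0, 10.0], [12.0, 8.0], [10.0, 14.0]]
med, (lower, upper) = summarise_mse(draws, observed)
```

With real output, `draws` would hold one list of 7 imputed values per retained MCMC iteration.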
Figure 4: MSE of imputed hourly pay
1. Select a Model of Interest (MoI) based on complete cases
QUESTION OF INTEREST: does change in partnership status affect income?
[Figure 4 plot area: contours of MSE over δ1 and δ2, with points labelled AS4, MAR and BASE]
We model mothers who are single in sweep 1, in paid work and not self-employed. The missing covariates are assumed to be missing at random (MAR), but the missing responses are allowed to be missing not at random (MNAR). We model only the sweep 2 missingness.
Description of Data
Conclusion
There is weak evidence that gaining a partner is associated with lower pay, and the reduction is likely to be between £0.66 (£1.32, −£0.03) (MAR) and £1.71 (£2.32, £1.07) (PS) an hour for an individual earning £10 an hour. Some models run as part of the parameter sensitivity analysis suggest that change in partnership status is associated with an increase in pay, but these models do not fall in the region of high plausibility.
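The link between these reductions in pounds and the proportional scale used in Figures 2 and 3 can be checked with a little arithmetic. The sketch assumes the log-pay MoI, under which gaining a partner multiplies pay by $e^{\beta_{sing}}$; the function names are hypothetical.

```python
import math


def ratio_from_change(base_pay, change):
    """Proportional change in pay implied by an absolute change at a given base pay."""
    return (base_pay + change) / base_pay


def beta_from_ratio(ratio):
    """Coefficient on the log scale that corresponds to a proportional change."""
    return math.log(ratio)


# Point estimates quoted in the conclusion, for an individual earning 10 pounds an hour.
mar_ratio = ratio_from_change(10.0, -0.66)
ps_ratio = ratio_from_change(10.0, -1.71)
mar_beta = beta_from_ratio(mar_ratio)
ps_beta = beta_from_ratio(ps_ratio)
```

Under this assumption the MAR and PS estimates correspond to proportional changes of about 0.934 and 0.829, both below 1 and therefore both reductions.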