Original Paper
Rodrigue S. Allodji, Boris Schwartz, Diallo Ibrahima, Césaire Agbovon, Dominique Laurier, and Florent de Vathaire
Simulation-extrapolation method to address errors in atomic bomb survivor dosimetry on solid cancer and leukaemia mortality risk estimates, 1950–2003
Rodrigue S. Allodji, Boris Schwartz, Diallo Ibrahima, Florent de Vathaire
Radiation Epidemiology Group / CESP - Unit 1018 INSERM, Gustave Roussy, B2M, 114, rue Edouard Vaillant, 94805 Villejuif Cedex, France
Tel 01 42 11 54 98, Fax 01 42 11 53 15, E-mail: [email protected]

Rodrigue S. Allodji, Boris Schwartz, Diallo Ibrahima, Florent de Vathaire
Gustave Roussy, Villejuif, F-94805, France

Rodrigue S. Allodji, Boris Schwartz, Diallo Ibrahima, Florent de Vathaire
Univ. Paris-Sud, Villejuif, F-94800, France

Césaire Agbovon
Pierre & Vacances – Center Parcs Group, L’artois - Espace Pont de Flandre, 11 rue de Cambrai, 75947 Paris Cedex 19, France

Dominique Laurier
Institut de Radioprotection et de Sûreté Nucléaire (IRSN), DRPH, SRBE, Laboratoire d’épidémiologie, BP17, 92262 Fontenay-aux-Roses Cedex, France
Abstract
Analyses of the Life Span Study (LSS) of Japanese atomic bomb survivors have routinely incorporated corrections for additive classical measurement errors using regression calibration. Recently, several studies have reported that the simulation-extrapolation method (SIMEX) is slightly more accurate than the simple regression calibration method (RCAL). In the present paper, the SIMEX and RCAL methods were used to address the effect of errors in atomic bomb survivor dosimetry on solid cancer and leukaemia mortality risk estimates. For instance, for all solid cancer deaths using a linear model, the ERR/Gy obtained with the SIMEX method is about 29% higher than that obtained with the RCAL method. For leukaemia deaths based on a linear-quadratic model, the corrected linear term (EAR per $10^4$ person-years at 1 Gy) is about 8% lower, whilst the corrected quadratic term (EAR per $10^4$ person-years per Gy$^2$) is about 65% higher, than with the RCAL method. The results obtained with the SIMEX method are slightly higher than published values. The observed differences were probably due to the fact that with the RCAL method the dosimetric data were only partially corrected, while all doses were considered with the SIMEX method. Therefore, one should be careful when comparing the estimated risks, and it may be useful to apply several correction techniques in order to obtain a range of corrected estimates, rather than to rely on a single technique. This work should help to improve the risk estimates derived from LSS data and to make the development of radiation protection standards more reliable.
Key words: Atomic bomb survivor dosimetry, measurement error, SIMEX method, solid cancer and leukaemia mortality risk estimates.
Introduction
Epidemiological studies have shown that exposure to ionizing radiation (IR) increases the risk of cancer in humans. The precision of epidemiological risk estimates depends, in part, on the magnitude of random errors, which include dosimetry errors. Indeed, the reliability of dosimetry data directly affects the reliability of the risk estimates derived from epidemiological studies (Muirhead 2008). The Life Span Study (LSS) of atomic bomb survivors in Hiroshima and Nagasaki is the most complete single study of an exposed population followed over an extended period of time (Preston et al. 2013). The LSS has provided the basis for the majority of international estimates of coefficients of radiation risk per unit dose, including those published by the U.S. National Research Council (BEIR VII 2006), the International Commission on Radiological Protection (ICRP 2007) and the United Nations Scientific Committee on the Effects of Atomic Radiation (UNSCEAR 2006). Analyses of the LSS have routinely incorporated corrections for additive classical measurement errors using regression calibration (Pierce et al. 1990, 1992, 2008; Little et al. 2008). Using this correction method, Pierce et al. (1990) concluded that excess relative risk (ERR) estimates would be increased by 6% to 17%. In a later analysis, Pierce et al. (2008) also found that adjustment for classical errors increased the evidence for curvature in the solid cancer dose–response relationship, such that the ERR per dose unit increased with increasing dose. Little et al. (2008) also fitted ERR and excess absolute risk (EAR) models to the latest LSS mortality data using regression calibration and Bayesian methods. Their results were similar to those obtained by other researchers (Pierce et al. 1990, 1992, 2008). Therefore, to account for classical dose errors in the LSS, most researchers (Kaiser and Walsh 2013; Schöllnberger et al.
2012; Walsh and Kaiser 2011) used the adjusted doses {E(true dose|observed dose)} derived from regression calibration (RCAL) as implemented by Pierce et al. (1990). In a simulation-based comparison of techniques to correct for measurement errors in matched case-control studies evaluating the effects of childhood exposure to extremely low-frequency electromagnetic fields on the risk of disease occurrence, Guolo and Brazzale (2008) reported that the simulation-extrapolation method (denoted as SIMEX) was a viable alternative to regression calibration methods. Kukush et al. (2011) also examined the bias and uncertainty in thyroid cancer estimates of both baseline rate and excess absolute risk (EAR) per dose unit (131I activity) for several
approaches to error adjustment. These authors reported that the efficient SIMEX method was slightly more accurate than simple regression calibration (Kukush et al. 2011). More recently, Allodji et al. (2012) have compared the performance of various approaches to estimating ERR of lung cancer mortality in relation to radon exposure among French uranium miners. The methods used were the substitution method (also called regression calibration), another regression calibration variant called the estimation calibration method (denoted as RC-ECM), and the SIMEX method. They concluded that all three error-correction methods allowed a noticeable but partial reduction of the attenuation bias, with a slight advantage for the SIMEX method. However, the SIMEX method has received less attention in literature with respect to regression calibration, although all of these error-correction methods share the same simplicity of application. Indeed, they have the advantage that risk estimates can be carried out using standard risk estimation models. One important reason that the SIMEX method has received less attention may be that this method requires a longer computational and simulation time. Another reason may be that the choice of the optimal internal parameters for the SIMEX method has been little explored, and their impacts are not well characterized. The main objective of the present article is to investigate the SIMEX method to address the impact of errors in LSS dosimetry on solid cancer and leukaemia mortality risk estimates. Consequently, this work aims to further improve the risk estimates derived from LSS data, and in this way to provide more reliable radiation protection standards. A sensitivity analysis is performed to emphasize the impact of internal parameters for the SIMEX method on its error-correction performance.
Materials and methods
Study population, follow-up, dosimetry and dose uncertainty
The present study is closely related to the study of Ozasa et al. (2012), which used LSS mortality data from 1950 to 2003. The cohort for analysis includes 86,611 survivors. As in previous LSS studies (Preston et al. 2004, 2007; Little 2008; Ozasa et al. 2012; Kaiser and Walsh 2013), the basic summary tables used in this work involve cross-classification of person-years and cases by city, sex, age at exposure, dose, attained age, calendar time, and distance from the hypocentre.
Mortality follow-up was facilitated by the family registry system (koseki), which covers the whole of Japan and is > 99% complete. As in Ozasa et al. (2012), follow-up data until December 31, 2003 were analysed in the current paper. Cause of death was classified by trained staff at the ABCC/RERF according to the International Classification of Diseases (ICD), 7th to 10th editions (Ozasa et al. 2012). In the present paper, all solid cancer deaths were grouped together, as were all leukaemia deaths. Individual data were not available, so all analyses in this work used grouped data. The LSS cohort data in file DS02can.dat are available for download from the website of the Radiation Effects Research Foundation (RERF) in Japan (http://www.rerf.or.jp). Several dosimetry systems have been developed by RERF over several decades to estimate radiation doses for the LSS cohort. The currently used values are those of DS02 (Young and Kerr 2005), which replaced the previous dosimetry system (DS86). DS02 includes calculated doses for 15 organ sites. In this paper, the colon dose is taken as representative of the dose to all organs when analysing solid cancer deaths, while the bone marrow dose is taken as representative for leukaemia death analyses. For both colon and bone marrow, the DS02 weighted dose variable used here is the gamma survivor absorbed dose estimate plus 10 times the neutron survivor absorbed dose estimate. DS02 weighted colon or bone marrow doses are expressed in gray (Gy). An assessment of uncertainty has been part of the development of each element of the DS86 and, subsequently, the DS02 dosimetry system, including source, free-field radiation, house-shielding and body-shielding elements (Preston et al. 2004). Previous studies reported a coefficient of variation (CV) of about 35% for dose uncertainties due to random measurement error in individual DS02 dose estimates (Pierce et al. 1990, 1992; Preston et al. 2007).
These authors assumed only classical measurement error in these doses. Pierce et al. (2008) re-evaluated the assumptions used by Preston et al. (2007), and concluded that the total random measurement error associated with each individual survivor dose estimate ought to be more appropriately expanded to 44%. Of this, a CV of 40% is considered as classical measurement error, and a CV of 20% as Berkson measurement error (Pierce et al. 2008). In the present paper only the classical model is considered, assuming 35% measurement error in individual dose estimates. To correct for dose uncertainties due to random measurement error, the unadjusted dose estimates in the file DS02can.dat were replaced by adjusted doses {E(true dose|observed dose)} obtained from the method of Pierce et al. (1990), assuming 35% measurement error. In order to reconstruct the unadjusted doses, we inverted the equation used by Pierce et al. (1990):

\[ \mathrm{Avg}[x \mid z] / z = \beta_0 + \beta_1 \ln[z] + \beta_2 \ln[z]^2 \]

In other words, we solved, for the unadjusted dose $z$,

\[ D = \beta_0 z + \beta_1 z \ln[z] + \beta_2 z \ln[z]^2 \]

where the adjusted dose is $D = \mathrm{Avg}[x \mid z]$ and the coefficients $\beta_0$, $\beta_1$, $\beta_2$ for each city, Hiroshima or Nagasaki, are from Pierce et al. (1990). The equation was solved by numerical optimization using the Solver tool in Microsoft Excel, and the procedure was automated over all doses with a short Visual Basic macro.
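The inversion described above can be sketched in a few lines. The example below uses plain bisection in Python rather than the Excel Solver used in the study, and the coefficient values in `BETA` are hypothetical placeholders chosen only so that the curve is increasing on the search interval; the actual city-specific coefficients are those of Pierce et al. (1990) and are not reproduced here. The function names are likewise illustrative.

```python
import math

# Hypothetical coefficients (b0, b1, b2), for illustration only; the real
# city-specific values come from Pierce et al. (1990).
BETA = (0.95, -0.04, -0.005)

def adjusted_dose(z, beta=BETA):
    """Adjusted dose D = z * (b0 + b1*ln z + b2*(ln z)^2) for unadjusted z > 0."""
    b0, b1, b2 = beta
    log_z = math.log(z)
    return z * (b0 + b1 * log_z + b2 * log_z ** 2)

def unadjusted_dose(d, beta=BETA, lo=1e-6, hi=10.0, tol=1e-10):
    """Solve adjusted_dose(z) = d for z by bisection on [lo, hi].

    Assumes adjusted_dose is increasing on [lo, hi], which holds for the
    placeholder coefficients above.
    """
    f_lo = adjusted_dose(lo, beta) - d
    f_hi = adjusted_dose(hi, beta) - d
    if f_lo * f_hi > 0:
        raise ValueError("target dose is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = adjusted_dose(mid, beta) - d
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # root lies in [mid, hi]
    return 0.5 * (lo + hi)

# Round trip: adjust a dose of 2 Gy, then recover the unadjusted value.
recovered = unadjusted_dose(adjusted_dose(2.0))
print(recovered)
```

Bisection is used here because it needs no derivative and cannot diverge once the root is bracketed; any one-dimensional root finder would serve equally well.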
Empirical statistical methods
Empirical models use simple mathematical terms to describe baseline morbidity or mortality rates and excess risks associated with radiation exposure. Baseline rates can be modelled by allowing for different baseline rates in a number of strata (groups) defined by, for example, sex and ranges of birth year and age. The risk models applied here are similar to those already considered and explained in detail elsewhere (Ozasa et al. 2012; Little 2008). Excess risks are expressed as excess relative risk (ERR) or excess absolute risk (EAR), depending on dose and possibly other variables (UNSCEAR 2006; Breslow and Day 1987). Various dose–response relationships have been implemented in the LSS; these include linear, linear-quadratic, quadratic, linear-quadratic-exponential models, and other more general forms (Ozasa et al. 2012; Little 2008). The main analysis in the present work focused on linear, linear-quadratic, and pure quadratic dose–response relationships. The ERR model used was as follows:
\[ \lambda(c,s,e,a,t,d) = \lambda_0(c,s,e,a,t)\,\bigl[1 + \mathrm{ERR}(D_{(c,s,e,a,t)})\bigr] \tag{1} \]

where $D_{(c,s,e,a,t)}$ denotes weighted organ (here colon or bone marrow) doses based on detailed tabulations of the data cross-classified by city (c), sex (s), age at exposure (e), attained age (a) and calendar time (t); $\lambda_0(c,s,e,a,t)$ is the baseline mortality rate at zero dose, depending on the same cross-classification variables. The EAR model used has the following form:

\[ \lambda(c,s,e,a,t,d) = \lambda_0(c,s,e,a,t) + \mathrm{EAR}(D_{(c,s,e,a,t)}) \tag{2} \]
The effect modification by sex (s), age at exposure (e), and attained age (a) was taken into account in all analyses. Thus, the effect modification models of dose response, averaged over sex with equal weights, for e=30, a=70 were parameterized in all analyses, as has previously been done (Preston et al. 2004; Ozasa et al. 2012). The linear ERR models with effect modification by sex, age at exposure and attained age used were of the following form:
\[ \lambda(c,s,e,a,t,d) = \lambda_0(c,s,e,a,t)\,\bigl[1 + \beta D_{(c,s,e,a,t)} \exp(\theta s + \gamma e + \omega \ln(a))\bigr] \tag{3} \]

where $\beta$ is a parameter representing ERR estimates per Gy for all solid cancers or for leukaemia, and $\gamma$, $\omega$ and $\theta$ are the coefficients for effect modification by age at exposure, attained age, and sex, respectively. The linear EAR model used has the following form, with $\beta$ now denoting the EAR per Gy:

\[ \lambda(c,s,e,a,t,d) = \lambda_0(c,s,e,a,t) + \beta D_{(c,s,e,a,t)} \exp(\theta s + \gamma e + \omega \ln(a)) \tag{4} \]

For linear-quadratic models, the following expressions were used: for the ERR model

\[ \lambda(c,s,e,a,t,d) = \lambda_0(c,s,e,a,t)\,\bigl[1 + (\beta_1 D_{(c,s,e,a,t)} + \beta_2 D^2_{(c,s,e,a,t)}) \exp(\theta s + \gamma e + \omega \ln(a))\bigr] \tag{5} \]

and for the EAR model

\[ \lambda(c,s,e,a,t,d) = \lambda_0(c,s,e,a,t) + (\beta_1 D_{(c,s,e,a,t)} + \beta_2 D^2_{(c,s,e,a,t)}) \exp(\theta s + \gamma e + \omega \ln(a)) \tag{6} \]

where $\beta_1$ and $\beta_2$ are dose-related parameters to be estimated from the data. The case where $\beta_1 = 0$ was also considered; these are pure quadratic models. Significance tests and confidence intervals (CI) were based on likelihood ratio statistics. The results were considered statistically significant when the two-sided P-value was lower than 0.05. The present work focused on risk estimates; thus, the coefficients for the effect modification terms $\gamma$, $\omega$, $\theta$ will not be provided.
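As a minimal sketch of how the rate models in Equations 3–6 can be evaluated for a single cell, the Python functions below compute the ERR and EAR rates from a baseline rate, a dose and an effect-modification factor. The function and argument names, and all numeric values, are hypothetical and chosen for illustration. The sketch also assumes that the effect-modification factor multiplies the dose-response term only, which is one reading of the equations rather than something the source states explicitly.

```python
import math

def modifier(sex, age_exp, att_age, theta, gamma, omega):
    """Effect-modification factor exp(theta*s + gamma*e + omega*ln(a))."""
    return math.exp(theta * sex + gamma * age_exp + omega * math.log(att_age))

def err_rate(lambda0, dose, beta1, beta2, mod):
    """ERR form: baseline * [1 + (beta1*D + beta2*D^2) * modifier]."""
    return lambda0 * (1.0 + (beta1 * dose + beta2 * dose ** 2) * mod)

def ear_rate(lambda0, dose, beta1, beta2, mod):
    """EAR form: baseline + (beta1*D + beta2*D^2) * modifier."""
    return lambda0 + (beta1 * dose + beta2 * dose ** 2) * mod

# Illustrative values only (not estimates from the study).
mod = modifier(sex=0, age_exp=0.0, att_age=1.0, theta=0.1, gamma=0.2, omega=0.3)
linear_err = err_rate(lambda0=2.0, dose=1.0, beta1=0.4, beta2=0.0, mod=mod)
lq_err = err_rate(lambda0=2.0, dose=1.0, beta1=0.4, beta2=0.1, mod=mod)
quadratic_ear = ear_rate(lambda0=2.0, dose=1.0, beta1=0.0, beta2=0.1, mod=mod)
print(linear_err, lq_err, quadratic_ear)
```

Setting `beta2=0` gives the linear models and `beta1=0` the pure quadratic models, so one pair of functions covers all three dose-response shapes considered in the main analysis.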
All calculations for empirical statistical methods were performed with the SAS statistical software package (SAS Institute Inc. 2003), using PROC NLMIXED of SAS to fit both ERR and EAR models (Equations 3–6) as described previously (Richardson 2008). These statistical methods were used for unadjusted dose estimates (naïve analyses) and adjusted dose estimates (RCAL). Furthermore, the Akaike Information Criterion (AIC) was used as the method of model choice. The naïve analyses models (Equations 1–2) were cross-checked by independent calculations with the EPICURE package (Preston et al. 1993); these results are not shown.
SIMEX method for adjustment of the effect of errors
The SIMEX method was initially proposed by Cook and Stefanski (1994) to address additive measurement error in generalized linear regressions, and was further extended and applied to handle other measurement error models (Allodji et al. 2012; Carroll et al. 2006; Küchenhoff and Carroll 1997). The SIMEX method exploits the relationship between the size of the measurement error, described by the measurement error variance $\sigma_U^2$, and the estimator of the model parameter obtained when the measurement error is ignored. Given that the doses in the LSS cohort data (http://www.rerf.or.jp) are grouped rather than individual, a set of fictitious individual survivor doses ($D_{ij}$, where $i$ is a survivor index, $j$ is a cell index and $N_j$ is the number of subjects in the $j$th cell) was generated, and the random additional error was added to each such individual survivor dose for use with the SIMEX method. The mean dose in the $j$th cell ($D_j$) is the sum of the cumulative individual doses ($D_{ij}$) divided by the number of person-years ($pyr_{ij}$) of observation (Wood et al. 1997):

\[ D_j = \frac{\sum_{i=1}^{N_j} D_{ij}}{\sum_{i=1}^{N_j} pyr_{ij}} \;\Rightarrow\; \sum_{i=1}^{N_j} D_{ij} = D_j \sum_{i=1}^{N_j} pyr_{ij} \tag{7} \]

Assuming the same fictitious survivor dose ($D'_{ij}$) for each subject:

\[ \sum_{i=1}^{N_j} D_{ij} = N_j D'_{ij} \;\Rightarrow\; N_j D'_{ij} = D_j \sum_{i=1}^{N_j} pyr_{ij} \tag{8} \]

\[ D'_{ij} = \frac{D_j \sum_{i=1}^{N_j} pyr_{ij}}{N_j} \tag{9} \]
In order to generate the set of fictitious individual survivor doses, we needed to obtain from RERF the number of subjects at risk in each cell of the person-year table. After acceptance of our request and our research protocol, we received these data from RERF, which allowed us to produce reasonable approximations for the SIMEX method based on grouped data. The SIMEX algorithm has two main steps, a simulation step and an extrapolation step, which may be described succinctly as follows:
i) In the simulation step, $M$ datasets with additional measurement error are generated, with $\sigma_U^2$ gradually increased by adding $\zeta_m \sigma_U^2$ to it, where $\zeta_m$ is the factor of increase in the measurement error variance, which takes strictly positive values between 0 and $\zeta_M$, the maximum value of this factor. The resulting measurement error variance is then $(1 + \zeta_m)\sigma_U^2$. Following recommendations by Carroll et al. (2006), $\zeta_M$ was set equal to 2 in the present study and, accordingly, eight values of $\zeta_m$ from 0.25 to 2 were used, in steps of 0.25 (i.e., 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, and 2.00). For each value of $\zeta_m$, weighted organ doses (here colon or bone marrow) were generated according to the following classical measurement error model:

\[ D'_{ij}(\zeta_m) = D'_{ij} \times U_m \]

where $D'_{ij}$ is the observed weighted organ dose and $U_m$ is the generated additional measurement error, assumed to follow a log-normal distribution at the $m$th factor of increase, i.e., $U_m \sim LN\{-\zeta_m \sigma_U^2 / 2;\ \zeta_m \sigma_U^2\}$. For each value of $\zeta_m$, new "observed" weighted organ dose values $D'_{ij}(\zeta_m)$ were generated. As previously, this was repeated B = 20 times for each value of $\zeta_m$ (Allodji et al. 2012). For each data set, the naïve estimates of ERR or EAR ($\hat{\beta}$) and their standard errors (SE) were calculated, and their average values were computed over the B sets as follows:

\[ \hat{\beta}(\zeta_m) = \frac{1}{B} \sum_{b=1}^{B} \hat{\beta}_b(\zeta_m) \quad \text{and} \quad SE\{\hat{\beta}(\zeta_m)\} = \frac{1}{B} \sum_{b=1}^{B} SE_b\{\hat{\beta}(\zeta_m)\} . \]

ii) In the extrapolation step, the relationship between these average estimates and the measurement error size, measured by $\zeta_m$, was then fitted in order to extrapolate back to the case of no measurement error, i.e., $\zeta = -1$. Extrapolation was performed using a linear-quadratic function ($g(\zeta) = a + b\zeta + c\zeta^2$) as suggested by Cook and Stefanski (1994) and Carroll et al. (2006).
In order to emphasize the impact of the choice of internal parameters for the SIMEX method on its error-correction performance, three sensitivity analyses were performed:
i) In the first analysis, the behaviour of the SIMEX method was explored when the step between successive values of $\zeta_m$ was 0.1, 0.5 or 1 instead of 0.25 as in the main analyses. Therefore, 20, 4 or 2 values of $\zeta_m$ were used instead of the eight values of the main analyses.
ii) In the second analysis, the error-correction performance of the SIMEX method was examined when the number of repetitions B of the simulation step was 10, 100 or 200 instead of 20 as in the main analyses.
iii) In the third analysis, the behaviour of the SIMEX method was investigated when the extrapolation phase was performed using other functions $g(\zeta)$, i.e., linear ($g(\zeta) = a + b\zeta$), cubic ($g(\zeta) = a + b\zeta + c\zeta^2 + d\zeta^3$) and rational ($g(\zeta) = a + \frac{b}{c + d\zeta}$), instead of the linear-quadratic function only, as suggested in the literature (Küchenhoff and Carroll 1997; Stefanski and Cook 1995).
For the sake of simplicity, these three sensitivity analyses were performed only on the linear ERR and EAR models.
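The two SIMEX steps can be illustrated on simulated data. The sketch below is not the analysis performed in the study: it replaces the Poisson ERR/EAR models by an ordinary least-squares slope, uses made-up doses and outcomes, and assumes that a 35% CV corresponds to a log-scale variance of ln(1 + 0.35²). The symbol `zeta` stands for the factor of increase in the measurement error variance, and all function names are illustrative.

```python
import math
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (stand-in for the naive risk estimate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_polynomial(zs, vals, degree=2):
    """Least-squares polynomial coefficients [a, b, c, ...] via normal equations."""
    p = degree + 1
    A = [[sum(z ** (i + j) for z in zs) for j in range(p)] for i in range(p)]
    b = [sum(v * z ** i for z, v in zip(zs, vals)) for i in range(p)]
    return solve_linear(A, b)

def simex(w, y, sigma2_u, zetas, B, rng):
    """Return (naive estimate, SIMEX estimate) with a quadratic extrapolant."""
    naive = ols_slope(w, y)
    grid, means = [0.0], [naive]
    for zeta in zetas:
        mu, sd = -zeta * sigma2_u / 2.0, math.sqrt(zeta * sigma2_u)
        # Simulation step: add extra multiplicative log-normal error B times.
        ests = [ols_slope([wi * rng.lognormvariate(mu, sd) for wi in w], y)
                for _ in range(B)]
        grid.append(zeta)
        means.append(sum(ests) / B)
    a, b, c = fit_polynomial(grid, means, degree=2)
    return naive, a - b + c  # extrapolation step: g(-1) = a - b + c

rng = random.Random(12345)
sigma2_u = math.log(1 + 0.35 ** 2)                  # assumed log-scale variance
true_dose = [rng.uniform(0.0, 3.0) for _ in range(2000)]
observed = [d * rng.lognormvariate(-sigma2_u / 2, math.sqrt(sigma2_u))
            for d in true_dose]
outcome = [1.0 + 0.5 * d + rng.gauss(0.0, 0.5) for d in true_dose]
zetas = [0.25 * k for k in range(1, 9)]             # 0.25, 0.50, ..., 2.00
naive_est, simex_est = simex(observed, outcome, sigma2_u, zetas, B=20, rng=rng)
print(naive_est, simex_est)
```

In this toy setting the true slope is 0.5, the naïve slope is attenuated, and the extrapolated value moves back towards the truth without fully recovering it. Changing `zetas`, `B` or the `degree` passed to `fit_polynomial` mimics the three sensitivity analyses in spirit.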
Results
Main analyses
Table 1 reports naïve (ignoring error) and corrected ERR estimates using the RCAL and SIMEX methods from linear, linear-quadratic and quadratic models of solid cancer and leukaemia deaths in the LSS. When measurement error in weighted organ doses (here colon or bone marrow) was ignored, the ERR per Gy from linear models was 0.432 (95% CI: 0.352, 0.512) for all solid cancer deaths, and was about ten times as high for leukaemia deaths. The same factor of 10 was observed with the fits of quadratic models. Based on a linear-quadratic model, for leukaemia $\beta_1$ (ERR per Gy) was 1.905 (0.312, 3.498) and $\beta_2$ (ERR per Gy$^2$) was 1.098 (0.269, 1.927), and for all solid cancer $\beta_1$ was 0.397 (0.234, 0.560) and $\beta_2$ was 0.043 (-0.063, 0.105). As expected, over the whole dose range up to 4 Gy, the linear model showed the lowest AIC value for all solid cancer deaths. Neither the addition of a quadratic term to the linear model nor the pure quadratic dose model improved the statistical fit, as judged by the AIC, for solid cancers. In contrast, the model with the lowest AIC value from naïve analyses for leukaemia deaths was the linear-quadratic form of dose response. For that reason, Table 1 focuses on linear models only for all solid cancer deaths and on linear-quadratic models only for leukaemia deaths. When error-correction methods (RCAL and SIMEX) were implemented under the assumption of 35% errors in DS02 dose estimates, the ERR estimates for solid cancer and leukaemia deaths generally increased. For example, an increase of about 7% was found for all solid cancer deaths using the RCAL method with adjusted dose estimates in linear models. In contrast, the corrected ERR per Gy was 0.598 (95% CI: 0.502, 0.694) for all solid cancer deaths using the SIMEX method, which corresponds to an increase of about 38% compared to the naïve estimate.
With linear-quadratic ERR models, using the two error-correction methods, only the corrected quadratic terms were significant for leukaemia deaths. For leukaemia deaths, using the RCAL method, the corrected $\beta_1$ (ERR per Gy) was 1.302 (95% CI: -0.360, 2.964) and $\beta_2$ (ERR per Gy$^2$) was 1.653 (0.704, 2.602) in the linear-quadratic model, which reflects a decrease of 31% in the linear term $\beta_1$ and an increase of about 50% in the quadratic term $\beta_2$. Using the SIMEX method, the corrected ERR of leukaemia deaths decreased by about 78% for the linear term ($\beta_1$) and increased by about 147% for the quadratic term ($\beta_2$) in the linear-quadratic model. As expected, there was some loss of precision from taking errors into account, which resulted in higher SE with the SIMEX and RCAL error-correction methods than with the naïve method.
Table 2 reports naïve and corrected EAR estimates using the SIMEX method from linear, linear-quadratic and quadratic models for solid cancer and leukaemia deaths in the Life Span Study (LSS) of atomic-bomb survivors. When measurement error in weighted organ dose values (here colon or bone marrow) was ignored, the EAR per $10^4$ person-years per Gy from linear models was 20.810 (95% CI: 12.362, 29.258) for all solid cancer deaths, and 2.322 (95% CI: 1.332, 3.312) for leukaemia deaths. The fits of quadratic models led to values of $\beta_2$ (EAR per $10^4$ person-years per Gy$^2$) that were lower than the estimates from linear models. Based on a linear-quadratic model, for leukaemia, $\beta_1$ (EAR per $10^4$ person-years per Gy) was 1.485 (95% CI: 0.458, 2.512) and $\beta_2$ was 0.495 (95% CI: -0.003, 0.993), and for all solid cancer $\beta_1$ was 19.479 (95% CI: 9.022, 29.936) and $\beta_2$ was 0.741 (-3.048, 4.530). From naïve analyses, the linear EAR model had the lowest AIC value for all solid cancer deaths and the linear-quadratic EAR model showed the lowest AIC value for leukaemia deaths. When the RCAL and SIMEX methods were implemented, a general increase in the EAR estimates was also observed. With linear EAR models for all solid cancer deaths, for example, the corrected EAR per $10^4$ person-years per Gy increased by about 8% (with the RCAL method) and by about 47% (with the SIMEX method) compared to the naïve estimate. With linear-quadratic EAR models, using the SIMEX method, only the corrected quadratic term $\beta_2$ was significant for leukaemia deaths. The corrected linear term $\beta_1$ decreased, whilst the corrected quadratic term $\beta_2$ increased by about 51% (with the RCAL method) and 150% (with the SIMEX method), for leukaemia deaths compared to the naïve estimate.
Again, the correction of measurement error was obtained at the cost of a loss of precision, with an increased SE and increased width of the 95% CI of $\beta_1$ and $\beta_2$ for both methods compared to the naïve method, as expected.
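The percentage changes quoted above can be checked from the point estimates given in the text. The short Python sketch below does this for the solid cancer ERR and for the RCAL-corrected leukaemia terms; the helper name `pct_change` is illustrative. Because the RCAL point estimate for solid cancers is not quoted, the SIMEX-versus-RCAL comparison is derived here from the stated increase of about 7%, which is an approximation.

```python
def pct_change(reference, corrected):
    """Relative change of `corrected` with respect to `reference`, in percent."""
    return 100.0 * (corrected - reference) / reference

naive_err = 0.432   # naive ERR/Gy, all solid cancers (value from the text)
simex_err = 0.598   # SIMEX-corrected ERR/Gy (value from the text)
simex_vs_naive = pct_change(naive_err, simex_err)

# The RCAL point estimate is not quoted; only its increase of about 7% is,
# so the SIMEX/RCAL ratio is rebuilt from the two relative increases.
rcal_increase = 7.0
simex_vs_rcal = 100.0 * ((1 + simex_vs_naive / 100.0)
                         / (1 + rcal_increase / 100.0) - 1)

# Leukaemia, linear-quadratic ERR model, naive versus RCAL (values from the text).
leuk_linear = pct_change(1.905, 1.302)      # change in the linear term
leuk_quadratic = pct_change(1.098, 1.653)   # change in the quadratic term

print(simex_vs_naive, simex_vs_rcal, leuk_linear, leuk_quadratic)
```

The results agree with the rounded figures in the text (about 38%, 29% and 50%). The computed decrease of the linear leukaemia term is slightly above the reported 31%, which presumably reflects rounding of the published estimates.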
Impact of the choice of internal parameters for the SIMEX method on error-correction performance
The results of the sensitivity analyses are given in Tables 3, 4 and 5. Table 3 shows the behaviour of the SIMEX method when the step between successive values of the factor of increase in the measurement error variance ($\zeta_m$) was 0.1, 0.5 or 1 instead of 0.25 as in the main analyses. This analysis showed that the corrected estimates of ERR or EAR depended markedly on the choice of the values of $\zeta_m$. In other words, the error-correction performance of the SIMEX method is strongly influenced by the number of values chosen between 0 and the maximum $\zeta_M = 2$. As expected, when there were only 2 or 4 values (steps of 1 or 0.5, respectively), with the extrapolation step using a linear-quadratic function, the SIMEX method led to a serious overestimation of ERR and EAR values. Table 4 presents the performance of the SIMEX method when the number of repetitions of the simulation step (B) was 10, 100 or 200 instead of 20 as in the main analyses. One of the criticisms made of the SIMEX method is that it is computationally intensive and can therefore be very time consuming (Carroll et al. 2006). Indeed, to control simulation variability, it is recommended to perform a large number of repetitions of the simulation and re-estimation steps. In contrast to this criticism, the present analysis indicated that the corrected estimates of ERR or EAR were little influenced by the number of repetitions of the simulation step (B). In other words, the error-correction performance of the SIMEX method was almost the same whatever the number of repetitions B. Therefore, performing the SIMEX method with B equal to 20, as in the main analyses, was suitable. Table 5 presents the error-correction performance of the SIMEX method when the extrapolation phase was performed using other extrapolation functions, i.e., linear, cubic and rational, instead of the linear-quadratic function used in the main analyses. Although the SIMEX method was quite stable when the extrapolation phase was performed with a rational function, it performed poorly with the cubic function. In fact, using the cubic function, the SIMEX method led to a very serious overestimation, compared to the naïve estimate and to the corrected estimates obtained using the RCAL and SIMEX methods in the main analyses.
Conversely, when the linear function was used in the extrapolation step, the SIMEX method was not able to provide any meaningful correction of the measurement error.
Discussion
In the present paper, the SIMEX method has been used to address the impact of dosimetric errors on solid cancer and leukaemia mortality risk estimates for atomic bomb survivors. For instance, for all solid cancer deaths based on a linear dose model, the corrected ERR/Gy was about 29% higher with the SIMEX method than with the RCAL method. For leukaemia deaths, from a linear-quadratic model, the corrected linear dose term $\beta_1$ (EAR per $10^4$ person-years per Gy) was about 8% lower, and the corrected quadratic term $\beta_2$ (EAR per $10^4$ person-years per Gy$^2$) about 65% higher, with the SIMEX method than with the RCAL method. Moreover, sensitivity analyses were performed to investigate the impact of internal parameters used by the SIMEX method on its error-correction performance. Indeed, the error-correction performance of the SIMEX method is strongly influenced by the set of values chosen for the factor of increase in the measurement error variance ($\zeta_m$), and by the choice of the extrapolation function. However, in terms of effectiveness relative to implementation time, the SIMEX method remains efficient with a relatively small number of repetitions of the simulation step.
Error-correction methods in LSS studies and other studies in radiation epidemiology
As already noted in the introduction, LSS studies that have explicitly dealt with radiation dose uncertainties have generally used some version of a regression calibration method (Pierce et al. 2008) or Bayesian approaches (Little et al. 2008). Using the RCAL method, Pierce et al. (1990) concluded that risk estimates would be increased by 6% to 17% based on linear models. In the present paper, the increase in ERR per Gy was 7% for all solid cancer deaths using the RCAL method. Little et al. (2008) found that analyses using a regression calibration method yield central estimates of risk very similar to those obtained using the Bayesian approach. These authors also found results similar to those obtained by other researchers (Pierce et al. 1990, 1992, 2008). The results obtained in the present work show that the risk estimates obtained with the SIMEX method are broadly in line with those obtained using the two error-correction methods previously used in LSS studies. However, the present values appear somewhat high compared to the estimates of other researchers. The main reason for this difference might be that, while all doses were considered with the SIMEX method, with the RCAL method only about 30% of the dosimetric data are corrected and the remaining data are left unadjusted. Indeed, LSS radiation doses below approximately 0.51 Gy are not adjusted by RERF for Hiroshima, and those below about 0.8 Gy are not adjusted for Nagasaki. Therefore, one should be careful when comparing the estimated risks.
Schafer and Gilbert (2006) reported that taking exposure/dose error into account increased the estimated risk coefficients by about 50–100% for the residential radon studies, 60% for the Colorado miners, 30% for the Utah nuclear fallout leukaemia study, and 100% for the Utah nuclear fallout thyroid study. Kopecky et al. (2006) used SIMEX to adjust for the effect of dose uncertainty in an analysis of childhood thyroid cancer risks in a Russian population exposed as a consequence of the Chernobyl accident. The dose uncertainty adjustment increased the point estimate of the ERR per unit dose by a factor of three. However, as noted by these authors, the adjusted estimate is almost certainly too large, since it is unlikely that all of the dose uncertainty arose from measurement error. Recently, in a study of the French Uranium Miners’ Cohort, Allodji et al. (2012) showed that the error-corrected ERR estimate for lung cancer death associated with radon and its decay products was increased by about 25–119%. In this study, the authors also noted the possibility of an overcorrection of the attenuation bias due to dose uncertainty with the SIMEX method, compared to the RCAL method.
Internal parameters for the SIMEX method

To the best of our knowledge, results of a comprehensive study of all internal parameters of the SIMEX method have not been reported previously. In addition, the present investigation answers questions that were raised previously (Allodji et al. 2012) in an effort to better define the conditions for optimal use of the SIMEX method. In the present work, the factor of increase in the measurement error variance (λ_m) was varied in steps of 0.1, 0.25, 0.5 or 1, corresponding to 20, 8, 4 or 2 additional risk estimates, respectively, which, together with the naïve risk estimate, were used to fit the extrapolation step. As expected, the aforementioned sensitivity analyses indicated that the error-correction performance of the SIMEX method is strongly influenced by the range of λ_m. Thus, the quality of the estimation depends on the choice of the λ_m range. The present work did not consider the impact of the choice of the maximum value of the factor of increase in the measurement error variance (i.e., λ_M = 2), because this issue has already been addressed by Stefanski and Cook (1995), who found the results to be generally inconclusive.

The present work also indicated that the choice of the extrapolation function has a substantial influence on the results obtained with the SIMEX method. More specifically, the cubic extrapolation function performed worst, while the quadratic extrapolation function performed better than the other functions, as expected. This is consistent with results reported by Bonate (2013) and with the findings of numerous other authors (Küchenhoff and Carroll 1997; Stefanski and Cook 1995; Allodji et al. 2012). Bonate (2013) also reported that the rational extrapolation function performed worst among the linear, quadratic and rational extrapolation functions considered. The present results from the sensitivity analyses are consistent with this finding.

In the present study, the best effectiveness/implementation time ratio was obtained for 20 repetitions of the simulation step. This result matters in practice because a small number of repetitions keeps computing time low, which could make the SIMEX method even more attractive.
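The roles of the three internal parameters discussed above (the grid of variance inflation factors, the number B of repetitions, and the extrapolation function) can be seen in a minimal sketch. It assumes a simple linear regression with classical additive error and simulated toy data, which is a stand-in for, not a reproduction of, the Poisson regression models used in this paper; the function name `simex_slope` and all data values are hypothetical.

```python
import numpy as np

def simex_slope(w, y, sigma_u, lambdas, B=20, degree=2, seed=0):
    """Return (naive, SIMEX-corrected) slopes for y ~ a + b*x when only w = x + u is seen."""
    rng = np.random.default_rng(seed)
    naive = np.polyfit(w, y, 1)[0]            # slope at lambda = 0 (no extra error)
    lam_grid, slopes = [0.0], [naive]
    for lam in lambdas:
        # Simulation step: add extra error with variance lam * sigma_u**2, B times
        sims = [
            np.polyfit(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y, 1)[0]
            for _ in range(B)
        ]
        lam_grid.append(lam)
        slopes.append(np.mean(sims))
    # Extrapolation step: polynomial in lambda, evaluated at lambda = -1 (no error)
    coefs = np.polyfit(lam_grid, slopes, degree)
    return naive, np.polyval(coefs, -1.0)

# Toy data (assumed for illustration; not LSS data)
rng = np.random.default_rng(42)
n, sigma_u = 5000, 0.6
x = rng.normal(1.0, 1.0, n)                   # true covariate, true slope is 2.0
w = x + rng.normal(0.0, sigma_u, n)           # error-prone measurement
y = 0.5 + 2.0 * x + rng.normal(0.0, 0.5, n)

lambdas = np.arange(0.25, 2.0 + 1e-9, 0.25)   # step 0.25 up to 2: eight extra estimates
naive, corrected = simex_slope(w, y, sigma_u, lambdas, B=20, degree=2)
print(f"naive slope: {naive:.3f}, SIMEX (quadratic) slope: {corrected:.3f}")
```

Changing `degree` to 1 or 3 switches to a linear or cubic extrapolant, and changing the step in `np.arange` reproduces the 20, 8, 4 or 2 additional estimates mentioned above. In this toy setting the quadratic extrapolant recovers most, but not all, of the attenuation.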
Strengths, limitations and perspectives

In this paper, the SIMEX method was implemented to address errors in atomic bomb survivor dosimetry for solid cancer and leukaemia mortality risk estimates. Consequently, this work makes it possible to improve the risk estimates derived from LSS data and to support the development of radiation protection standards. The SIMEX method is simple, attractive, and a possible alternative to the regression calibration method or the Bayesian approach for dealing with the effect of dose errors on risk estimates. The method is easy to apply, as it relies only on the naïve estimator from the original model, fitted as if the doses were error-free. However, it requires the measurement error variance σ_U² to be known or estimated (Carroll et al. 2006). It is also computationally intensive, and it gives a consistent risk estimate only if the correct extrapolation function is used. The SIMEX approach can accommodate linear or nonlinear regression models with multiplicative measurement error, because it does not depend on the functional form of the model (Kukush et al. 2011).

In the present paper, the SIMEX method has been used assuming only classical measurement error, as was also assumed by Pierce et al. (2008) in their study of atomic bomb survivor dosimetry. It is possible, however, to apply the SIMEX method to situations involving a mixture of classical measurement error and Berkson error, if the magnitude of the different types of error is known. The lack of individual dose data, as already stated, is a limitation. Nevertheless, in the present work, a set of fictitious individual survivor doses was generated to overcome this problem and to produce reasonable approximations for the SIMEX method based on grouped data. It cannot be entirely ruled out, however, that this procedure could lead to some changes in the study outcomes, which would be interesting to explore in the future through a collaboration with RERF using individual dose data.

A central question arising from the present analysis is whether the SIMEX method should be routinely applied to dosimetry data from epidemiological studies. Most epidemiological studies include some degree of dosimetry measurement error, which may be significant; therefore, it may be useful to apply several correction techniques in order to obtain a range of corrected estimates, rather than relying on a single correction technique. Since error-correction methods provide a natural way of looking at the impact of complex dose uncertainties on risk estimates, further research and development of these methods are important.
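Why the type of error matters for the correction can be shown with a small simulation. The sketch below contrasts classical error (recorded dose = true dose + error) with Berkson error (true dose = assigned dose + error) in a simple linear regression; it is an illustration under assumed toy values, and the behaviour in the nonlinear risk models used in this paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma_u = 20000, 2.0, 0.6            # hypothetical sample size, slope, error SD

# Classical error: the recorded value scatters around the true value
x_true = rng.normal(1.0, 1.0, n)
w_classical = x_true + rng.normal(0.0, sigma_u, n)
y_classical = beta * x_true + rng.normal(0.0, 0.5, n)
slope_classical = np.polyfit(w_classical, y_classical, 1)[0]

# Berkson error: the true value scatters around the assigned (e.g. group) value
w_assigned = rng.normal(1.0, 1.0, n)
x_berkson = w_assigned + rng.normal(0.0, sigma_u, n)
y_berkson = beta * x_berkson + rng.normal(0.0, 0.5, n)
slope_berkson = np.polyfit(w_assigned, y_berkson, 1)[0]

print(f"classical: {slope_classical:.3f}  Berkson: {slope_berkson:.3f}  true: {beta}")
```

In this linear setting the classical-error slope is attenuated, while the Berkson-error slope stays close to the true value. Treating Berkson error as if it were classical would therefore inflate a corrected estimate, which is why the magnitude of each error type needs to be known before a mixture is handled.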
Acknowledgements

One of the authors is the recipient of a Postdoctoral Fellowship from the “Association pour la Recherche sur le Cancer (ARC)”, France. This report makes use of data obtained from the Radiation Effects Research Foundation (RERF), Hiroshima and Nagasaki, Japan. RERF is a private, non-profit foundation funded by the Japanese Ministry of Health, Labour and Welfare and the U.S. Department of Energy, the latter through the National Academy of Sciences. The conclusions in this report are those of the authors and do not necessarily reflect the scientific judgment of RERF or its funding agencies. The authors warmly thank F Dayet from the Radiation Epidemiology Group, INSERM U1018 (France), and Drs H Cullings, K Ozasa, S Funamoto, and K Iyota from the Department of Statistics and Epidemiology, RERF (Japan), for their support.
References

Allodji RS, Thiébaut AC, Leuraud K, Rage E, Henry S, Laurier D, Bénichou J (2012) The performance of functional methods for correcting non-Gaussian measurement error within Poisson regression: corrected excess risk of lung cancer mortality in relation to radon exposure among French uranium miners. Stat Med 31:4428-43

Bonate PL (2013) Effect of assay measurement error on parameter estimation in concentration–QTc interval modeling. Pharmaceut Statist 12:156-164

Breslow NE, Day NE (1987) Statistical methods in cancer research. Volume II: The design and analysis of cohort studies. IARC Sci Publ 82:1-406. Lyon

Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement Error in Nonlinear Models: A Modern Perspective. Chapman & Hall, CRC Press: Boca Raton, FL

Cook JR, Stefanski LA (1994) Simulation–Extrapolation Estimation in Parametric Measurement Error Models. J Am Stat Assoc 89:1314-1328

Guolo A, Brazzale AR (2008) A simulation-based comparison of techniques to correct for measurement error in matched case–control studies. Stat Med 27:3755-3775

Kaiser JC, Walsh L (2011) Independent analysis of the radiation risk for leukaemia in children and adults with mortality data (1950-2003) of Japanese A-bomb survivors. Radiat Environ Biophys 50:21-35

Kodama K, Mabuchi K, Shigematsu I (1996) A long-term cohort study of the atomic-bomb survivors. J Epidemiol 6:95-105

Kopecky KJ, Stepanenko V, Rivkind N, Voillequé P, Onstad L, Shakhtarin V, Parshkov E, Kulikov S, Lushnikov E, Abrosimov A, Troshin V, Romanova G, Doroschenko V, Proshin A, Tsyb A, Davis S (2006) Childhood thyroid cancer, radiation dose from Chernobyl, and dose uncertainties in Bryansk Oblast, Russia: a population-based case-control study. Radiat Res 166:367-374

Küchenhoff H, Carroll RJ (1997) Segmented regression with errors in predictors: semiparametric and parametric methods. Stat Med 16:169-88

Kukush A, Shklyar S, Masiuk S, Likhtarov I, Kovgan L, Carroll RJ, Bouville A (2011) Methods for estimation of radiation risk in epidemiological studies accounting for classical and Berkson errors in doses. Int J Biostat 7:15

Little MP, Hoel DG, Molitor J, Boice JD, Wakeford R, Muirhead CR (2008) New models for evaluation of radiation-induced lifetime cancer risk and its uncertainty employed in the UNSCEAR 2006 report. Radiat Res 169:660-676

Muirhead CR (2008) Exposure assessment: implications for epidemiological studies of ionizing radiation. Radiat Prot Dosim 132:134-138

National Research Council (2006) Health Risks from Exposure to Low Levels of Ionizing Radiation: BEIR VII Phase 2. National Academy Press, Washington, D.C.

International Commission on Radiological Protection (2008) The 2007 Recommendations of the International Commission on Radiological Protection. Annals of the ICRP 37(2-4). ICRP Publication 103. Elsevier, Oxford

Ozasa K, Shimizu Y, Suyama A, Kasagi F, Soda M, Grant EJ, Sakata R, Sugiyama H, Kodama K (2012) Studies of the mortality of atomic bomb survivors, Report 14, 1950-2003: an overview of cancer and noncancer diseases. Radiat Res 177:229-243

Pierce DA, Stram DO, Vaeth M (1990) Allowing for random errors in radiation dose estimates for the atomic bomb survivor data. Radiat Res 123:275-284

Pierce DA, Stram DO, Vaeth M, Schafer DW (1992) The errors-in-variables problem: considerations provided by radiation dose–response analyses of the A-bomb survivor data. J Am Stat Assoc 87:351-359

Pierce DA, Vaeth M, Cologne JB (2008) Allowance for random dose estimation errors in atomic bomb survivor studies: a revision. Radiat Res 170:118-126

Preston DL, Lubin JH, Pierce DA, McConney ME (1993) Epicure Users Guide. Hirosoft International Corporation, Seattle, WA

Preston DL, Pierce DA, Shimizu Y, Cullings HM, Fujita S, Funamoto S, Kodama K (2004) Effect of recent changes in atomic bomb survivor dosimetry on cancer mortality risk estimates. Radiat Res 162:377-389

Preston DL, Ron E, Tokuoka S, Funamoto S, Nishi N, Soda M, Mabuchi K, Kodama K (2007) Solid cancer incidence in atomic bomb survivors: 1958-1998. Radiat Res 168:1-64

Preston RJ, Boice JD Jr, Brill AB, Chakraborty R, Conolly R, Hoffman FO, Hornung RW, Kocher DC, Land CE, Shore RE, Woloschak GE (2013) Uncertainties in estimating health risks associated with exposure to ionising radiation. J Radiol Prot 33:573-88

Richardson DB (2008) A simple approach for fitting linear relative rate models in SAS. American J Epidem 168:1333-1338

SAS Institute Inc. (2003) SAS OnlineDoc 9.1.2. Cary, NC: SAS Institute Inc

Schafer DW, Gilbert ES (2006) Some statistical implications of dose uncertainty in radiation dose–response analyses. Radiat Res 166:303-312

Schöllnberger H, Kaiser JC, Jacob P, Walsh L (2012) Dose-responses from multi-model inference for the non-cancer disease mortality of atomic bomb survivors. Radiat Environ Biophys 51:165-78

Stefanski LA, Cook JR (1995) Simulation-extrapolation: The measurement error jackknife. J American Stat Assoc 90:1247-1256

United Nations Scientific Committee on Effects of Ionizing Radiation (2008 and 2009) Volume I: Report to the General Assembly, Scientific Annexes A and B; Volume II: Scientific Annexes C, D and E. United Nations Scientific Committee on the Effects of Atomic Radiation, UNSCEAR 2006 Report. United Nations, New York

Walsh L, Kaiser JC (2011) Multi-model inference of adult and childhood leukaemia excess relative risks based on the Japanese A-bomb survivors mortality data (1950-2000). Radiat Environ Biophys 50:21-35

Wood J, Richardson DB, Wing S (1997) A simple program to create exact person-time data in cohort analyses. Internat J Epidemiol 26:395-399

Young RW, Kerr GD (2005) Reassessment of the atomic bomb radiation dosimetry for Hiroshima and Nagasaki: Dosimetry system 2002 (DS02). Radiation Effects Research Foundation, Hiroshima

Table 1: Naïve (ignoring error) and corrected excess relative risk (ERR) estimates for attained age of 70 years after exposure at age 30 using the RCAL and SIMEX methods for solid cancer and leukaemia deaths in the Life Span Study (LSS) of atomic-bomb survivors

| Parameter | Solid cancer: Naïve | Solid cancer: RCAL (corrected) | Solid cancer: SIMEX* (corrected) | Leukaemia: Naïve | Leukaemia: RCAL (corrected) | Leukaemia: SIMEX* (corrected) |
|---|---|---|---|---|---|---|
| Linear models | | | | | | |
| ERR | 0.432 | 0.463 | 0.598 | 3.858 | 4.142 | 4.616 |
| SE (ERR) | 0.041 | 0.044 | 0.049 | 0.593 | 0.636 | 0.670 |
| 95% CI | 0.352 – 0.512 | 0.377 – 0.549 | 0.502 – 0.694 | 2.696 – 5.020 | 2.895 – 5.389 | 3.303 – 5.929 |
| AIC | 33540.0 | 33541.0 | 33472.9 | 3396.1 | 3398.7 | 3354.7 |
| Linear-quadratic models | | | | | | |
| β_1 | 0.397 | 0.369 | -0.202 | 1.905 | 1.302 | 0.410 |
| SE (β_1) | 0.083 | 0.091 | 0.090 | 0.813 | 0.848 | 1.133 |
| 95% CI of β_1 | 0.234 – 0.560 | 0.191 – 0.547 | -0.378 – -0.026 | 0.312 – 3.498 | -0.360 – 2.964 | -1.811 – 2.631 |
| β_2 | 0.021 | 0.056 | 0.088 | 1.098 | 1.653 | 2.714 |
| SE (β_2) | 0.043 | 0.049 | 0.067 | 0.423 | 0.484 | 0.834 |
| 95% CI of β_2 | -0.063 – 0.105 | -0.040 – 0.152 | -0.043 – 0.219 | 0.269 – 1.927 | 0.704 – 2.602 | 1.079 – 4.349 |
| AIC | 33541.0 | 33542.0 | 31476.9 | 3390.7 | 3387.7 | 3312.4 |
| Quadratic models | | | | | | |
| β_2 | 0.211 | 0.272 | 0.439 | 1.924 | 2.507 | 3.591 |
| SE (β_2) | 0.023 | 0.029 | 0.041 | 0.300 | 0.383 | 0.524 |
| 95% CI of β_2 | 0.166 – 0.256 | 0.215 – 0.329 | 0.359 – 0.519 | 1.336 – 2.512 | 1.756 – 3.258 | 2.564 – 4.618 |
| AIC | 33564.0 | 33557.0 | 33427.4 | 3395.7 | 3388.5 | 3279.2 |

*SIMEX method implemented with eight additional naïve risk estimates, 20 repetitions of the simulation step and the linear-quadratic function for the extrapolation step. ERR: excess relative risk per Gy; β_1: ERR per Gy; β_2: ERR per Gy²; SE: standard error; CI: confidence interval; AIC: Akaike Information Criterion.
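
As a quick consistency check on the linear-model rows of Table 1, the sketch below assumes that the reported 95% CIs are Wald intervals (estimate ± 1.96 × SE). This is an assumption made here for illustration; the helper names `wald_ci` and `pct_change` are hypothetical. The same values give the relative difference between the SIMEX and RCAL estimates for solid cancer.

```python
# (estimate, SE, reported 95% CI) copied from the linear-model rows of Table 1
table1_linear = {
    ("solid", "naive"): (0.432, 0.041, (0.352, 0.512)),
    ("solid", "RCAL"): (0.463, 0.044, (0.377, 0.549)),
    ("solid", "SIMEX"): (0.598, 0.049, (0.502, 0.694)),
    ("leukaemia", "naive"): (3.858, 0.593, (2.696, 5.020)),
    ("leukaemia", "RCAL"): (4.142, 0.636, (2.895, 5.389)),
    ("leukaemia", "SIMEX"): (4.616, 0.670, (3.303, 5.929)),
}

def wald_ci(estimate, se, z=1.96):
    """Wald interval rounded to the three decimals used in the table (assumed CI type)."""
    return (round(estimate - z * se, 3), round(estimate + z * se, 3))

def pct_change(new, reference):
    """Relative difference of `new` with respect to `reference`, in percent."""
    return 100.0 * (new - reference) / reference

# Do all reported intervals match estimate +/- 1.96 * SE?
all_match = all(
    max(abs(a - b) for a, b in zip(wald_ci(est, se), ci)) < 1e-9
    for est, se, ci in table1_linear.values()
)

# SIMEX versus RCAL for all solid cancer deaths, linear model
simex_vs_rcal = pct_change(
    table1_linear[("solid", "SIMEX")][0], table1_linear[("solid", "RCAL")][0]
)
print(all_match, round(simex_vs_rcal, 1))
```

Under this assumption the six intervals are reproduced, and the relative difference for solid cancer agrees with the increase of about 29% quoted in the abstract.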

Table 2: Naïve (ignoring error) and corrected excess absolute risk (EAR) estimates for attained age of 70 years after exposure at age 30 using the RCAL and SIMEX methods for solid cancer and leukaemia deaths in the Life Span Study (LSS) of atomic-bomb survivors

| Parameter | Solid cancer: Naïve | Solid cancer: RCAL (corrected) | Solid cancer: SIMEX* (corrected) | Leukaemia: Naïve | Leukaemia: RCAL (corrected) | Leukaemia: SIMEX* (corrected) |
|---|---|---|---|---|---|---|
| Linear models | | | | | | |
| EAR | 20.810 | 22.469 | 30.673 | 2.322 | 2.482 | 2.790 |
| SE (EAR) | 4.310 | 4.649 | 5.320 | 0.505 | 0.539 | 0.544 |
| 95% CI | 12.362 – 29.258 | 13.357 – 31.581 | 20.246 – 41.100 | 1.332 – 3.312 | 1.426 – 3.538 | 1.724 – 3.856 |
| AIC | 33483.0 | 33485.0 | 33411.8 | 3376.4 | 3379.1 | 3333.4 |
| Linear-quadratic models | | | | | | |
| β_1 | 19.479 | 18.258 | 44.823 | 1.485 | 1.251 | 1.152 |
| SE (β_1) | 5.335 | 5.414 | 12.426 | 0.524 | 0.517 | 0.647 |
| 95% CI of β_1 | 9.022 – 29.936 | 7.647 – 28.83 | 20.468 – 69.178 | 0.458 – 2.512 | 0.238 – 2.264 | -0.116 – 2.420 |
| β_2 | 0.741 | 2.426 | 1.796 | 0.495 | 0.750 | 1.236 |
| SE (β_2) | 1.933 | 2.201 | 4.864 | 0.254 | 0.297 | 0.505 |
| 95% CI of β_2 | -3.048 – 4.530 | -1.888 – 6.740 | -7.737 – 11.329 | -0.003 – 0.993 | 0.168 – 1.332 | 0.246 – 2.226 |
| AIC | 33485.0 | 33486.0 | 33306.1 | 3373.9 | 3373.1 | 3274.0 |
| Quadratic models | | | | | | |
| β_2 | 9.827 | 12.826 | 21.209 | 1.227 | 1.601 | 2.341 |
| SE (β_2) | 2.263 | 2.854 | 4.428 | 0.300 | 0.381 | 0.518 |
| 95% CI of β_2 | 5.392 – 14.262 | 7.232 – 18.420 | 12.530 – 29.888 | 0.639 – 1.815 | 0.854 – 2.348 | 1.326 – 3.356 |
| AIC | 33515.0 | 33507.0 | 33361.9 | 3383.3 | 3376.6 | 3266.5 |

*SIMEX method implemented with eight additional naïve risk estimates, 20 repetitions of the simulation step and the linear-quadratic function for the extrapolation step. EAR: excess absolute risk per Gy; SE: standard error; β_1: EAR per 10⁴ person-years per Gy; β_2: EAR per 10⁴ person-years per Gy²; CI: confidence interval; AIC: Akaike Information Criterion.

Table 3: Performance of the SIMEX method according to the choice of internal parameters: impact of the range of the factor of increase in the measurement error variance. Results for a linear dose model for attained age of 70 years after exposure at age 30.

| Parameter | Solid cancer: 0.1 | Solid cancer: 0.25 ¥ | Solid cancer: 0.5 | Solid cancer: 1 | Leukaemia: 0.1 | Leukaemia: 0.25 ¥ | Leukaemia: 0.5 | Leukaemia: 1 |
|---|---|---|---|---|---|---|---|---|
| ERR estimates | | | | | | | | |
| ERR | 0.442 | 0.598 | 0.790 | 0.785 | 3.939 | 4.616 | 5.828 | 5.875 |
| SE | 0.043 | 0.049 | 0.062 | 0.062 | 0.607 | 0.670 | 0.837 | 0.843 |
| 95% CI | 0.358 – 0.526 | 0.502 – 0.694 | 0.668 – 0.912 | 0.663 – 0.907 | 2.749 – 5.129 | 3.303 – 5.929 | 4.187 – 7.469 | 4.223 – 7.527 |
| EAR estimates | | | | | | | | |
| EAR | 21.485 | 30.673 | 41.857 | 41.081 | 2.430 | 2.790 | 2.991 | 2.998 |
| SE | 4.607 | 5.320 | 6.984 | 6.846 | 0.519 | 0.544 | 0.565 | 0.568 |
| 95% CI | 12.455 – 30.515 | 20.246 – 41.100 | 28.168 – 55.546 | 27.663 – 54.499 | 1.413 – 3.447 | 1.724 – 3.856 | 1.874 – 4.088 | 1.885 – 4.111 |

SIMEX method implemented with twenty (0.1), four (0.5) or two (1) additional naïve risk estimates, 20 repetitions of the simulation step and the linear-quadratic function for the extrapolation step. ¥: standard value used in the main analyses; ERR: excess relative risk per Gy; EAR: excess absolute risk per 10⁴ person-years per Gy; SE: standard error; CI: confidence interval; AIC: Akaike Information Criterion.

Table 4: Performance of the SIMEX method according to the choice of internal parameters: impact of the number B of simulation steps. Results for a linear dose model for attained age of 70 years after exposure at age 30.

| Parameter | Solid cancer: B=10 | Solid cancer: B=20 ¥ | Solid cancer: B=100 | Solid cancer: B=200 | Leukaemia: B=10 | Leukaemia: B=20 ¥ | Leukaemia: B=100 | Leukaemia: B=200 |
|---|---|---|---|---|---|---|---|---|
| ERR estimates | | | | | | | | |
| ERR | 0.602 | 0.598 | 0.599 | 0.597 | 4.480 | 4.616 | 4.614 | 4.610 |
| SE | 0.049 | 0.049 | 0.049 | 0.049 | 0.651 | 0.670 | 0.667 | 0.670 |
| 95% CI | 0.506 – 0.698 | 0.502 – 0.694 | 0.503 – 0.695 | 0.501 – 0.693 | 3.204 – 5.756 | 3.303 – 5.929 | 3.307 – 5.921 | 3.297 – 5.923 |
| EAR estimates | | | | | | | | |
| EAR | 32.448 | 30.673 | 30.880 | 30.554 | 2.788 | 2.790 | 2.766 | 2.760 |
| SE | 5.324 | 5.320 | 5.321 | 5.324 | 0.537 | 0.544 | 0.540 | 0.538 |
| 95% CI | 22.013 – 42.883 | 20.246 – 41.100 | 20.451 – 41.309 | 20.119 – 40.989 | 1.735 – 3.841 | 1.724 – 3.856 | 1.708 – 3.824 | 1.706 – 3.814 |

SIMEX method implemented with eight (0.25) additional naïve risk estimates, 10, 100 or 200 repetitions of the simulation step and the linear-quadratic function for the extrapolation step. ¥: standard value used in the main analyses; ERR: excess relative risk per Gy; EAR: excess absolute risk per 10⁴ person-years per Gy; SE: standard error; CI: confidence interval; AIC: Akaike Information Criterion.

Table 5: Performance of the SIMEX method according to the choice of internal parameters: impact of the choice of extrapolation function. Results for a linear dose model for attained age of 70 years after exposure at age 30.

| Parameter | Solid cancer: Linear | Solid cancer: Linear-quadratic ¥ | Solid cancer: Cubic | Solid cancer: Rational linear | Leukaemia: Linear | Leukaemia: Linear-quadratic ¥ | Leukaemia: Cubic | Leukaemia: Rational linear |
|---|---|---|---|---|---|---|---|---|
| ERR estimates | | | | | | | | |
| ERR | 0.405 | 0.598 | 1.561 | 0.676 | 3.844 | 4.616 | 10.124 | 5.163 |
| SE | 0.041 | 0.049 | 0.103 | 0.051 | 0.601 | 0.670 | 1.350 | 0.713 |
| 95% CI | 0.325 – 0.485 | 0.502 – 0.694 | 1.359 – 1.763 | 0.576 – 0.776 | 2.666 – 5.022 | 3.303 – 5.929 | 7.478 – 12.770 | 3.766 – 6.560 |
| EAR estimates | | | | | | | | |
| EAR | 19.175 | 30.673 | 82.104 | 22.860 | 2.203 | 2.790 | 4.465 | 2.084 |
| SE | 4.227 | 5.320 | 12.007 | 6.135 | 0.499 | 0.544 | 0.658 | 0.555 |
| 95% CI | 10.890 – 27.460 | 20.246 – 41.100 | 58.570 – 105.638 | 10.835 – 34.885 | 1.225 – 3.181 | 1.724 – 3.856 | 3.175 – 5.755 | 0.996 – 3.172 |

SIMEX method implemented with eight additional naïve risk estimates, 20 repetitions of the simulation step and the linear, cubic or rational function for the extrapolation step. ¥: standard function used in the main analyses; ERR: excess relative risk per Gy; EAR: excess absolute risk per 10⁴ person-years per Gy; SE: standard error; CI: confidence interval; AIC: Akaike Information Criterion.