A Bayesian Normal Mixture Accelerated Failure Time Spatial Model and its Application to Prostate Cancer

Songfeng Wang and Jiajia Zhang
Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, USA

and

Andrew B. Lawson
Division of Biostatistics and Epidemiology, College of Medicine, Medical University of South Carolina, Charleston, SC, USA
In the United States, prostate cancer is the third most common cause of cancer death in males of all ages, and the most common cause of cancer death in males over age 75. The incidence of prostate cancer is known to be high among African Americans, and its occurrence and progression may be influenced by geographical factors. To investigate spatial effects and racial disparities in prostate cancer in Louisiana, we propose a normal mixture accelerated failure time spatial model, which does not require the proportional hazards assumption and allows multimodal distributions to be modeled. The proposed model is estimated with a Bayesian approach and can be easily implemented in WinBUGS. Extensive simulations show that the proposed model is flexible enough to handle a variety of parametric error distributions. The proposed method is applied to the 2000-2007 Louisiana prostate cancer data set from the Surveillance, Epidemiology, and End Results Program. The results reveal a possible spatial pattern and racial disparities for prostate cancer in Louisiana.
Address for correspondence: Songfeng Wang, Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC 29208, USA. Email: [email protected]

1 Introduction

Prostate cancer (PrCA) is a major public health threat and is the third most common cause of cancer death in the United States (U.S.) for males of all ages. Data from the Surveillance, Epidemiology, and End Results (SEER) program (2005-2007) show that PrCA affects an estimated one in six American males over a lifetime 1. Understanding the occurrence and progression of PrCA and its possible risk factors is therefore of particular importance to researchers, clinicians and policy makers. One important characteristic of PrCA is its high prevalence and mortality rate among African Americans. Owing to regional differences in socioeconomic status, environment and even access to health facilities, PrCA often exhibits spatial patterns in some geographic areas.

To investigate the large geographic variation and racial disparity in PrCA survival, a cancer registry with a relatively large African American population is preferable. We extracted PrCA data from the SEER cancer incidence public-use database 1. After reviewing all registries, we chose the state of Louisiana, where African Americans make up 32% of the population. Because of the impact of Hurricane Katrina, Louisiana cases diagnosed from July to December 2005 are excluded from the SEER database 1. Given the limited observation period, the PrCA data cannot represent the entire population, but they do reflect PrCA incidence in Louisiana over the eight-year period 2000-2007. The PrCA mortality rate in Louisiana counties is mapped in Figure 1, which shows a clear geographic pattern: the mortality rate is higher in counties in central and northeast Louisiana, consistent with the findings of a previous study 2.

Capturing spatial survival patterns while adjusting for possible risk factors has attracted much attention in recent years. Most spatial survival analysis and modeling focuses on the proportional hazards (PH) model and its extensions. For example, Osnes and Aalen 3 used a fully hierarchical Bayesian approach that incorporated geographical proximity to study the hazard ratios of breast cancer and malignant melanoma in Norway.
Henderson et al. 4 used a PH model with a multivariate gamma frailty to incorporate spatial correlation when studying the survival of adult acute myeloid leukemia patients in northwest England. Li and Ryan 5 extended ordinary frailty models and applied the extended model to the East Boston Asthma Study. Banerjee et al. 6 applied a spatial frailty model with a parametric Weibull baseline hazard to infant mortality in Minnesota counties, modeling the spatial association with both geostatistical and lattice approaches. Banerjee et al. 7 also applied the PH model to Iowa breast cancer data, with a frailty that accounts for spatial clustering. Hennerfeind et al. 8 extended the PH model to a flexible continuous-time geoadditive model and applied it to waiting times for coronary artery bypass grafts. Darmofal 9 examined the effect of spatial dependency on the timing of U.S. House members’ position announcements using both individual and hierarchical frailties, in both the PH model and the Weibull accelerated failure time (AFT) model.

Figure 1: Prostate cancer mortality rate by Louisiana county from the SEER 2000-2007 data (mortality rate calculated as cancer deaths / population). [Map; legend classes range from 0.0588 to 0.3333; Madison and Red River counties are labeled.]

However, the PH assumption is often violated for covariates in practice. For the Louisiana data set, we checked the PH assumption for all 64 counties and illustrate the results for nine counties selected from different geographic areas of Louisiana: Caddo, Bossier and Webster in the northwest; Sabine, Avoyelles and Grant in the west-central region; and Calcasieu, Vermilion and Acadia in the southwest. For each county, Kaplan-Meier (KM) survival curves are fitted separately for white and black males (Figure 2). In some counties, such as Webster, Avoyelles and Vermilion, the KM curves cross, which makes the PH assumption questionable. We also plotted the logarithm of the cumulative hazard against the logarithm of survival time for these nine counties (Figure 3). Under PH these curves should be roughly parallel; besides the three counties mentioned above, they are clearly not parallel for counties such as Bossier and Grant, indicating violations of the PH assumption. Finally, we fitted a PH model comparing white and black males in each county separately and examined the Schoenfeld residuals 10. The PH assumption is not supported for counties such as Vermilion (p=0.0214) and Washington (p=0.0116), and the p-value from the global test also indicates a violation (p=0.0007). All of this evidence suggests that a PH-based spatial model would be inappropriate for the Louisiana data set.

Figure 2: Kaplan-Meier survival curves of PrCA for black and white males in nine counties of Louisiana. [Panels: Caddo, Bossier, Webster, Sabine, Avoyelles, Grant, Calcasieu, Vermilion, Acadia; x-axis: time in months; y-axis: survival probability; curves: white, black.]

Figure 3: Logarithm of the cumulative hazard curves of PrCA for black and white males in nine counties of Louisiana. [Same nine panels; x-axis: log(survival time); y-axis: log(cumulative hazard); curves: white, black.]

The parametric AFT spatial model, a linear regression of the logarithm of survival time with additive spatial random effects, is often used as an important alternative to the PH spatial model in practice. Common error distributions in the AFT spatial model include the extreme value, normal and logistic distributions. However, for the Louisiana PrCA data set the logarithm of survival time in months has two modes (Figure 4), which suggests that commonly used unimodal choices such as the normal or extreme value distributions may not be appropriate here. Komárek et al. 11 proposed using a set of penalized Gaussian mixture densities to model the error term in AFT models, and later extended this work to a mixture of bivariate normal components 12. More recently, Komárek et al. 13 modeled the error density with normal mixture distributions in a Bayesian mixed-effects AFT model. They stated that such a method “offers a rich family of distributions of various shapes suitable for modeling practically any survival data”, and their simulations show that the normal mixture provides good approximations to different hazard and survivor functions.

Figure 4: Density curve and histogram of the logarithm of survival time in months. [x-axis: log of survival time; y-axis: density.]

In this paper, we therefore propose a normal mixture AFT spatial model that can capture the multimodality of the survival times while allowing us to investigate the spatial pattern and racial disparities of PrCA in Louisiana. The normal mixture error distribution gives the proposed model its flexibility, and its closed-form density ensures computational efficiency and easy implementation of the proposed model and method.

The individual-level information used in this study includes: age (age at diagnosis in completed years), race (white or black), county (county of residence at the time of diagnosis), stage at diagnosis (SEER summary stage: localized/regional or distant), marital status at diagnosis (single, married or other), and survival time after diagnosis (including censoring time). Observations with missing values on these covariates at diagnosis are excluded from this study. In the patients’ medical records, race is recorded as white, black, or other; since the main interest of this study lies in racial disparities between white and black males, cases of other races are excluded. Of the four summary stage categories, we exclude unstaged cases, since “unstaged” means there is insufficient information to assign a stage to the cancer. It is worth pointing out that clinically localized tumors are frequently upstaged to regional stage after surgery, so the SEER data contain an extra category (localized/regional) for PrCA only. Observations from the localized, regional and localized/regional categories are combined into a single localized/regional category for this analysis. The final data set includes 16743 patients, of whom 2713 died from PrCA.

The remainder of this paper is organized as follows. Section 2 describes the proposed normal mixture AFT spatial model. Section 3 presents the simulation results and findings, and Section 4 applies the proposed model and method to the Louisiana PrCA data from the SEER program. Finally, Section 5 concludes with a discussion of possible future work.
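The graphical PH checks behind Figures 2 and 3 can be reproduced with a few lines of code. The following is a minimal Python sketch, not the authors' code: it computes a Kaplan-Meier curve for one group and the (log t, log(-log S(t))) points used in a log cumulative hazard plot, assuming the cumulative hazard is estimated as -log of the Kaplan-Meier estimate (the paper does not state which estimator was used). The function names are illustrative.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    times  : follow-up times (e.g. months since diagnosis)
    events : 1 if the death was observed, 0 if the time is censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Subjects censored at t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def log_cumhaz_points(curve):
    """(log t, log(-log S(t))) pairs for the graphical PH check.

    -log S(t) is used as the cumulative hazard estimate; points where S(t)
    is 0 or 1 (or t <= 0) are skipped because the transform is undefined.
    """
    return [(math.log(t), math.log(-math.log(s)))
            for t, s in curve if t > 0 and 0.0 < s < 1.0]
```

Under PH, the white and black point sets for a county differ by a constant vertical shift equal to the log hazard ratio, so crossing or clearly non-parallel curves, as seen for Bossier and Grant, point to a violation.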
2 The Proposed Model
Let $T_{ij}$ denote the survival time after diagnosis for the jth patient in the ith county, and let $x_{ij}$ denote the vector of the m possible risk factors corresponding to $T_{ij}$, where $i = 1, \ldots, n$ and $j = 1, \ldots, n_i$. The spatial random effect $W_i$ for the ith county can be modeled as a frailty term in the AFT model:

$$\log(T_{ij}) = \beta^{\top} x_{ij} + W_i + \varepsilon_{ij}, \tag{1}$$

where $\beta = (\beta_1, \ldots, \beta_m)^{\top}$ is the unknown coefficient vector of length m and the $\varepsilon_{ij}$ are independent random errors. In this model, the coefficients and spatial random effects act directly on the logarithm of survival time. The density of $\varepsilon$ is assumed to follow a normal mixture distribution:

$$f(\varepsilon) = \sum_{j=1}^{k} \omega_j \, \phi(\varepsilon \mid \mu_j, \sigma_j^2), \tag{2}$$
where $\phi(\cdot \mid \mu_j, \sigma_j^2)$ is the density of the jth normal component $N(\mu_j, \sigma_j^2)$, $\omega_j$ is the corresponding weight, and k is the number of normal mixture components. Normal mixture distributions with a relatively large but fixed number of components provide flexibility in modeling baseline curves. The model defined by (1) and (2) is referred to as the normal mixture AFT spatial model, and the number of components k can be chosen with the log pseudo marginal likelihood (LPML) criterion.

The spatial random effects can be correlated within and among counties. Spatially uncorrelated heterogeneity (an uncorrelated county-specific effect) can be modeled with independent normal distributions, $W_i \sim N(0, \sigma_s^2)$, where $\sigma_s^2$ denotes the variance of the spatial random effect. This is similar to an AFT frailty model with normal random effects at the county level. For correlated spatial heterogeneity, the conditional autoregressive (CAR) model introduced by Besag et al. 14 can be used. This formulation permits correlation among the random effects according to their spatial adjacency structure:

$$W_i \mid W_f, f \neq i, \sigma_{si}^2 \sim N(\overline{W}_i, \sigma_{si}^2), \qquad i = 1, \ldots, n,$$

where

$$\overline{W}_i = \frac{\sum_f W_f \, g_{if}}{\sum_f g_{if}}, \qquad \sigma_{si}^2 = \frac{v_s^2}{\sum_f g_{if}}, \qquad g_{if} = \begin{cases} 1 & \text{if regions } i \text{ and } f \text{ are adjacent and } i \neq f, \\ 0 & \text{otherwise,} \end{cases}$$

and $v_s^2$ is the variance of the random effects. It is often assumed that $\sum_i W_i = 0$ to ensure the model's identifiability. The specification of the CAR model shows that in the ith region, $W_i$ depends on both the number of neighbors and the values in the neighboring regions, which allows spatial correlation to be modeled. Both spatially correlated and uncorrelated random effects can be included in the same model to permit a trade-off between independence and dependence of the spatial random effects 14, i.e., $W_i = W_{i1} + W_{i2}$, where $W_{i1}$ follows the CAR model and $W_{i2} \sim N(0, \sigma_s^2)$. As pointed out in a previous study 2, this combination of spatially correlated and uncorrelated heterogeneity maintains the correlation between adjacent counties but weakens the correlation with non-adjacent counties.

A Bayesian approach is adopted for inference. To carry out the estimation, we introduce a latent label variable $r_{ij}$ indicating the normal component from which $\varepsilon_{ij}$ is drawn. The conditional distributions of $\varepsilon_{ij}$ and $r_{ij}$ are then given as:
$$\varepsilon_{ij} \mid \mu, \sigma, r_{ij} \sim N(\mu_{r_{ij}}, \sigma_{r_{ij}}^2), \qquad \Pr(r_{ij} = h \mid k, \omega) = \omega_h,$$
$$i = 1, \ldots, n, \quad j = 1, \ldots, n_i, \quad h = 1, \ldots, k.$$
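The CAR full conditional stated earlier in this section depends only on the neighboring effects and the number of neighbors. The following minimal Python sketch illustrates it under two assumptions of our own: the adjacency indicators $g_{if}$ are stored as a list of neighbor indices per region (a hypothetical representation), and the sum-to-zero constraint is imposed by re-centring after each sweep, a common practical device that is not necessarily what the authors or WinBUGS do internally. It draws from the prior full conditional only; in the full model, each draw of $W_i$ would also involve the likelihood of the patients in region i.

```python
import random


def car_conditional(w, neighbors, i, v2):
    """Mean and variance of W_i given all other W_f under the CAR prior.

    w         : current spatial effects, one per region
    neighbors : neighbors[i] lists the regions f with g_if = 1
    v2        : the variance parameter v_s^2
    Returns (mean of neighboring W_f, v_s^2 / number of neighbors).
    """
    nbrs = neighbors[i]
    if not nbrs:
        raise ValueError("region %d has no neighbors" % i)
    mean = sum(w[f] for f in nbrs) / len(nbrs)
    return mean, v2 / len(nbrs)


def car_prior_sweep(w, neighbors, v2, rng=random):
    """One Gibbs sweep over the CAR prior full conditionals, then re-centring.

    Re-centring enforces sum_i W_i = 0 after the sweep.
    """
    w = list(w)
    for i in range(len(w)):
        mean, var = car_conditional(w, neighbors, i, v2)
        w[i] = rng.gauss(mean, var ** 0.5)
    centre = sum(w) / len(w)
    return [x - centre for x in w]
```

A region with more neighbors gets a smaller conditional variance, which is how the CAR prior borrows more strength in densely connected areas.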
For a finite mixture with a pre-specified number of components, the prior for the mixture weights $(\omega_1, \ldots, \omega_k)$ is a k-dimensional Dirichlet distribution with one prior “observation count” per component (k in total), that is, $(\omega_1, \ldots, \omega_k) \sim \text{Dirichlet}(1, 1, \ldots, 1)$. The prior for the mean $\mu_h$ of the hth normal component is an independent normal distribution with mean $\theta_h$ and variance $\sigma_0^2$, i.e., $\mu_h \mid \theta_h, \sigma_0^2 \sim N(\theta_h, \sigma_0^2)$ ($h = 1, \ldots, k$), and the prior for the precision (inverse variance) $\sigma_h^{-2}$ of the hth component is an independent gamma distribution with shape parameter $\alpha_0$ and scale parameter $\lambda_0$, i.e., $\sigma_h^{-2} \mid \alpha_0, \lambda_0 \sim G(\alpha_0, \lambda_0)$. The values of $\theta_h$, $\sigma_0^2$, $\alpha_0$ and $\lambda_0$ are pre-specified and can be chosen to be non-informative when no prior information is available. As is common practice, the prior means of the normal mixture components are ordered, $\theta_1 < \cdots < \theta_k$. For the unknown coefficient vector $\beta$, the prior for the lth coefficient is a normal distribution, $\beta_l \sim N(\mu_\beta, \sigma_\beta^2)$ ($l = 1, \ldots, m$). Let $\pi(W \mid \sigma_s^2)$ denote the prior for the spatial random effects W. When the number of mixture components k is pre-specified, a Gibbs sampler can be used to draw samples from the posterior distributions. This MCMC scheme is straightforward either to program directly or to implement in software such as WinBUGS. An MCMC estimation procedure and sample WinBUGS code for the proposed model with a two-component normal mixture are given in the supplementary material.
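Given the latent labels, the priors above are conjugate, so each Gibbs step for the mixture part has a closed form. The Python sketch below performs one sweep over the labels $r_{ij}$, the weights $\omega$, the means $\mu_h$ and the precisions $\sigma_h^{-2}$ for fixed residuals $\varepsilon_{ij} = \log(T_{ij}) - \beta^{\top} x_{ij} - W_i$. It is an illustration under stated assumptions, not the authors' sampler: censored log-times are assumed to have been imputed already, $\lambda_0$ is treated as a rate (the text calls it a scale parameter, and the two conventions differ), and the ordering of the $\theta_h$ is left to the caller.

```python
import math
import random


def gibbs_mixture_sweep(e, w, mu, var, theta, s02, a0, lam0, rng=random):
    """One Gibbs sweep for the k-component normal mixture error model.

    e          : residuals log(T_ij) - beta'x_ij - W_i (censored log-times
                 assumed already imputed by data augmentation)
    w, mu, var : current weights (all > 0), component means, component variances
    theta, s02 : prior means theta_h and prior variance sigma_0^2 of mu_h
    a0, lam0   : gamma prior on each precision; lam0 treated as a RATE (assumption)
    Returns (labels, w, mu, var) after updating r_ij, omega, mu_h, sigma_h^2.
    """
    k = len(w)
    # 1. Labels: Pr(r_ij = h | rest) is proportional to w_h * phi(e_ij | mu_h, var_h).
    labels = []
    for x in e:
        logp = [math.log(w[h]) - 0.5 * math.log(2 * math.pi * var[h])
                - 0.5 * (x - mu[h]) ** 2 / var[h] for h in range(k)]
        top = max(logp)
        probs = [math.exp(lp - top) for lp in logp]  # numerically stabilised
        labels.append(rng.choices(range(k), weights=probs)[0])
    counts = [labels.count(h) for h in range(k)]
    # 2. Weights: Dirichlet(1 + n_1, ..., 1 + n_k) via normalised gamma draws.
    g = [rng.gammavariate(1.0 + counts[h], 1.0) for h in range(k)]
    total = sum(g)
    w = [gh / total for gh in g]
    new_mu, new_var = [], []
    for h in range(k):
        xs = [x for x, r in zip(e, labels) if r == h]
        # 3. Mean: conjugate normal update combining N(theta_h, s02) with the data.
        post_v = 1.0 / (1.0 / s02 + len(xs) / var[h])
        post_m = post_v * (theta[h] / s02 + sum(xs) / var[h])
        m_h = rng.gauss(post_m, math.sqrt(post_v))
        # 4. Precision: Gamma(a0 + n_h / 2, rate = lam0 + SS_h / 2);
        #    random.gammavariate takes a scale, hence 1 / rate.
        ss = sum((x - m_h) ** 2 for x in xs)
        prec = rng.gammavariate(a0 + len(xs) / 2.0, 1.0 / (lam0 + ss / 2.0))
        new_mu.append(m_h)
        new_var.append(1.0 / prec)
    return labels, w, new_mu, new_var
```

In a full sampler, this sweep would alternate with updates of $\beta$, W and the censored log-times.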
3 Simulation study
The performance of the proposed method is investigated under a series of simulation scenarios. Each simulated dataset is generated from an AFT spatial model of the form

$$\log(T_{ij}) = \beta_1 x_{1ij} + \beta_2 x_{2ij} + W_i + \varepsilon_{ij}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, n_i,$$
where $\beta_1 = 2$ and $\beta_2 = 3$. To mirror the Louisiana data, the number of counties n is set to 64, and the spatial random effects $W_i$ are generated from the CAR model based on the adjacency matrix of the 64 counties in Louisiana. The covariates $x_{1ij}$ and $x_{2ij}$ are generated from a standard normal distribution and a binomial distribution with success probability 0.5, respectively. To check the flexibility of the normal mixture distribution, the error terms are generated from: (1) a standard normal distribution, (2) a standardized extreme value distribution, (3) the normal mixture 0.3N(1, 1) + 0.7N(10, 2), and (4) the gamma mixture 0.4G(2, 1) + 0.6G(3, 1). To examine the effect of sample size, we consider (1) a small sample size, $n_i = 100$, and (2) a large sample size, $n_i = 200$, for $i = 1, \ldots, 64$. Equal sample sizes across counties are used for convenience only; the proposed model allows different sample sizes in different counties. The censoring time is generated from a uniform distribution U(0, a), where a is chosen to achieve a particular censoring rate. Two censoring scenarios are considered: a low censoring rate of 10% and a moderate censoring rate of 40%. Simulations by Roeder and Wasserman 15 suggest that normal mixtures with 10 components are sufficient even for highly complex densities. Therefore, for each dataset, the parameters are estimated with the proposed Bayesian approach using a 10-component normal mixture AFT spatial model. We include both spatially correlated and uncorrelated effects by assuming $W_i = W_{i1} + W_{i2}$, where $W_{i1}$ is the spatially correlated and $W_{i2}$ the spatially uncorrelated random effect. Because of the computational cost, each scenario is replicated 100 times.
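One simulation replicate of the design above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code, and several pieces are our own assumptions: the 64-county Louisiana adjacency is not reproduced here, so a hypothetical ring of counties stands in for it; the intrinsic CAR field is drawn approximately by Gibbs sampling with re-centring; N(10, 2) is read as mean 10 and variance 2; and the censoring bound a is found by bisection, which is one way to "choose a to reach a particular censoring rate". Only the normal mixture error scenario (3) is shown.

```python
import math
import random


def simulate_icar(neighbors, tau, sweeps, rng):
    """Approximate draw from an intrinsic CAR prior by Gibbs sampling.

    W_i | rest ~ N(mean of neighboring W_f, 1 / (tau * #neighbors));
    the field is re-centred after every sweep so that sum_i W_i = 0.
    """
    n = len(neighbors)
    w = [0.0] * n
    for _ in range(sweeps):
        for i, nbrs in enumerate(neighbors):
            m = sum(w[f] for f in nbrs) / len(nbrs)
            w[i] = rng.gauss(m, math.sqrt(1.0 / (tau * len(nbrs))))
        centre = sum(w) / n
        w = [x - centre for x in w]
    return w


def mixture_error(rng):
    """Scenario (3): 0.3 N(1, 1) + 0.7 N(10, 2), reading N(mean, variance)."""
    if rng.random() < 0.3:
        return rng.gauss(1.0, 1.0)
    return rng.gauss(10.0, math.sqrt(2.0))


def censoring_bound(t, u, target):
    """Find a so that the share of subjects with C = a*u_i < T_i is about target."""
    def rate(a):
        return sum(a * ui < ti for ti, ui in zip(t, u)) / len(t)
    lo = 0.5 * min(t) / max(u)   # everyone censored: rate(lo) = 1
    hi = max(t) / min(u)         # nobody censored:   rate(hi) = 0
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisection on the log scale
        if rate(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi


def simulate_dataset(neighbors, n_i, cens_rate, beta=(2.0, 3.0), tau=1.0, seed=1):
    """Rows (county, x1, x2, observed time, event indicator) for one replicate."""
    rng = random.Random(seed)
    w = simulate_icar(neighbors, tau, sweeps=200, rng=rng)
    rows, t_all = [], []
    for i in range(len(neighbors)):
        for _ in range(n_i):
            x1 = rng.gauss(0.0, 1.0)               # standard normal covariate
            x2 = 1 if rng.random() < 0.5 else 0    # Bernoulli(0.5) covariate
            log_t = beta[0] * x1 + beta[1] * x2 + w[i] + mixture_error(rng)
            rows.append((i, x1, x2))
            t_all.append(math.exp(log_t))
    u = [1.0 - rng.random() for _ in t_all]        # uniform on (0, 1]
    a = censoring_bound(t_all, u, cens_rate)
    out = []
    for (i, x1, x2), t, ui in zip(rows, t_all, u):
        c = a * ui                                 # C ~ U(0, a)
        out.append((i, x1, x2, min(t, c), int(t <= c)))
    return out
```

Replacing the ring with the 64-county Louisiana adjacency list, and using n_i = 100 or 200 with cens_rate = 0.1 or 0.4, would correspond to the scenarios described above.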
The estimation bias, standard deviation (sd) and 95% credible interval width (CIW) are computed as the means of the per-replicate biases, standard deviations and interval widths over the 100 replicates. The credible interval coverage probability (CICP) is the proportion of the 100 credible intervals that contain the true value. The results are reported in Table 1. The normal mixture AFT spatial model performs well under all the scenarios.

Table 1: Estimation bias, standard deviation (sd), 95% credible interval width (CIW) and coverage probability (CICP) for each parameter, with both correlated and uncorrelated spatial effects in the model

                                  10% censoring                   |  40% censoring
n_i  Error distribution  Param.   bias    sd     CIW    CICP(%)   |  bias    sd     CIW    CICP(%)
100  Normal              β1        0.002  0.013  0.061  95        |   0.001  0.018  0.069  95
100  Normal              β2        0.000  0.026  0.102  98        |   0.004  0.032  0.127  98
100  Extreme value       β1        0.002  0.014  0.054  98        |   0.002  0.020  0.080  94
100  Extreme value       β2       -0.001  0.027  0.106  93        |   0.006  0.038  0.148  93
100  Normal mixture      β1       -0.002  0.016  0.064  96        |  -0.001  0.019  0.075  94
100  Normal mixture      β2        0.000  0.033  0.128  99        |   0.009  0.038  0.149  96
100  Gamma mixture       β1       -0.000  0.027  0.107  98        |   0.002  0.031  0.122  98
100  Gamma mixture       β2        0.003  0.027  0.108  97        |   0.005  0.032  0.126  96
200  Normal              β1        0.001  0.009  0.037  94        |   0.004  0.012  0.048  97
200  Normal              β2       -0.002  0.018  0.072  99        |   0.005  0.023  0.089  92
200  Extreme value       β1        0.000  0.010  0.038  95        |   0.003  0.014  0.055  96
200  Extreme value       β2        0.003  0.019  0.074  97        |   0.005  0.026  0.103  96
200  Normal mixture      β1        0.000  0.012  0.046  98        |   0.002  0.014  0.053  98
200  Normal mixture      β2       -0.002  0.023  0.091  96        |  -0.002  0.027  0.105  96
200  Gamma mixture       β1       -0.001  0.019  0.073  97        |  -0.002  0.021  0.082  96
200  Gamma mixture       β2       -0.002  0.019  0.073  98        |   0.002  0.022  0.085  98

For all four error distributions, the estimates have consistently small bias and standard deviations, and the 95% credible intervals have reasonable width and coverage probabilities close to 95%. Compared with the small sample size ($n_i = 100$), the large sample size ($n_i = 200$) yields smaller standard deviations and narrower 95% credible intervals. Similar conclusions hold when the low censoring rate (10%) is compared with the moderate censoring rate (40%). Overall, the proposed model provides robust estimates regardless of the true distribution of the error term, and thus offers a flexible way to fit AFT models without having to verify assumptions about the true underlying error distribution.
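The summary measures in Table 1 follow directly from the posterior draws of each replicate. Below is a minimal Python sketch, assuming the bias is computed from the posterior mean and the 95% credible interval from the 2.5% and 97.5% sample quantiles of the draws; the paper does not specify the point estimate or the quantile rule, and the function name is illustrative.

```python
import statistics


def summarize_replicates(true_value, draws_per_replicate):
    """Average bias, sd, 95% CI width and coverage (%) over replicates.

    draws_per_replicate : one list of posterior draws per simulated data set
    Bias uses the posterior mean; the 95% interval uses the 2.5% and 97.5%
    sample quantiles of the draws.
    """
    biases, sds, widths, covered = [], [], [], 0
    for draws in draws_per_replicate:
        cuts = statistics.quantiles(draws, n=40, method="inclusive")
        lo, hi = cuts[0], cuts[-1]          # 2.5% and 97.5% cut points
        biases.append(statistics.mean(draws) - true_value)
        sds.append(statistics.stdev(draws))
        widths.append(hi - lo)
        covered += lo <= true_value <= hi
    r = len(draws_per_replicate)
    return {"bias": sum(biases) / r, "sd": sum(sds) / r,
            "CIW": sum(widths) / r, "CICP": 100.0 * covered / r}
```

With 100 replicates, CICP moves in steps of 1%, so values between about 92% and 99% as in Table 1 are within ordinary Monte Carlo variability around a nominal 95%.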
4 Analysis of Louisiana SEER data
The impact of geographic patterns and racial disparities on the survival time of PrCA is investigated with the Louisiana PrCA data from the SEER program. The covariates in the normal mixture AFT spatial model are age, race, marital status and stage. To capture both correlated and uncorrelated spatial patterns, the spatial random effects are modeled as a CAR effect plus a normal random effect. Normal mixture distributions with a relatively large but fixed number of components provide enough flexibility to model different densities. Although a previous study indicates that 10 normal components are sufficient 15, a mixture with too many components adds unnecessary complexity to both model specification and model fitting. Therefore, an important issue when using a normal mixture distribution is choosing the number of components. In frequentist statistics, this can be done with information criteria such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). In a Bayesian setting, an analog of AIC is the Deviance Information Criterion (DIC), introduced by Spiegelhalter et al. 16 and defined as DIC = $\bar{D}$ + pD, where $\bar{D}$ is the posterior mean of the deviance and pD is the effective number of parameters. However, the DIC requires a plug-in estimate of each stochastic parent node, which may be a discrete node in a mixture distribution. Spiegelhalter's DIC is therefore not recommended in this situation, and the DIC option is disabled in WinBUGS when mixture distributions are involved 17. Celeux et al. 18 explored different options for constructing an appropriate DIC for mixture models, but deciding which option is most appropriate and implementing it in WinBUGS is not straightforward. When mixture models are involved, previous work shows that the log pseudo marginal likelihood (LPML) works well as a model comparison tool 19;20. LPML was originally proposed by Geisser and Eddy 21 and later extended to survival data by Gelfand and Mallick 22 and Sinha and Dey 23. LPML is based on the conditional predictive ordinate (CPO). For the ith observation, the CPO is defined as
$$\mathrm{CPO}_i = \int f(y_i \mid \theta, x_i) \, \pi(\theta \mid D_{(-i)}) \, d\theta,$$
where $\theta$ is the unknown parameter of interest; $y_i$ and $x_i$ are the response and covariate vector of the ith observation, respectively; $D_{(-i)}$ is the dataset without the ith observation; and $\pi(\theta \mid D_{(-i)})$ is the posterior density of $\theta$ based on $D_{(-i)}$. LPML is defined as

$$\mathrm{LPML} = \sum_{i=1}^{n} \log(\mathrm{CPO}_i).$$
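In practice $\mathrm{CPO}_i$ is rarely computed by refitting the model without the ith observation. A standard Monte Carlo identity gives $\mathrm{CPO}_i^{-1} = E[1/f(y_i \mid \theta, x_i) \mid D]$, which can be estimated by the harmonic mean of the likelihood over posterior draws. The Python sketch below assumes that pointwise log-likelihood values (log density for deaths, log survival function for censored times) have been saved from the MCMC output; the paper does not say how its CPOs were computed.

```python
import math


def log_mean_exp(values):
    """log((1/M) * sum_m exp(v_m)), computed stably."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def lpml(loglik):
    """LPML from pointwise log-likelihoods evaluated at posterior draws.

    loglik[m][i] = log f(y_i | theta^(m), x_i) for posterior draw m and
    observation i (for a censored time this would be log S(y_i | ...)).
    Uses CPO_i = 1 / E[1 / f(y_i | theta) | D], estimated by a harmonic mean.
    """
    n_obs = len(loglik[0])
    total = 0.0
    for i in range(n_obs):
        neg = [-row[i] for row in loglik]
        total += -log_mean_exp(neg)   # this term is log CPO_i
    return total
```

Working on the log scale avoids overflow when a few posterior draws give very small likelihood values, which is common with around 16000 observations.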
A model with a larger LPML value has better predictive ability and is therefore preferred. To decide the most appropriate number of components, we fit the data with the proposed model using different numbers of components and then compare the models by LPML. For each model, two MCMC chains are run for at least 75000 iterations, with the first 70000 iterations discarded as burn-in. Convergence is assessed with trace plots and the Brooks-Gelman-Rubin (BGR) statistic 24. The computation is performed on a PC with an Intel i7-870 2.93 GHz CPU and 8 GB RAM. Given the large sample size (16743 patients), fitting the two-component model takes around 20 hours, and the computation time increases quickly with the number of normal components. The LPML for each model is reported in Table 2.

Table 2: LPML for the normal mixture AFT spatial model with different numbers of normal components

Number of components   LPML        Number of components   LPML
1                      -16715.90   6                      -19809.71
2                      -16713.00   7                      -19818.12
3                      -16713.08   8                      -19832.91
4                      -16713.09   9                      -19892.06
5                      -19769.97   10                     -21181.50
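For reference, the following Python sketch computes the original variance-based Gelman-Rubin potential scale reduction factor for one scalar parameter from several chains. The BGR diagnostic reported by WinBUGS is the Brooks-Gelman interval-based variant, so this sketch illustrates the idea rather than reproducing the WinBUGS statistic; the function name is illustrative.

```python
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains : m >= 2 equal-length lists of post-burn-in draws
    Values near 1 are consistent with convergence; values well above 1 are not.
    """
    m, n = len(chains), len(chains[0])
    means = [statistics.mean(c) for c in chains]
    grand = statistics.mean(means)
    b = n * sum((x - grand) ** 2 for x in means) / (m - 1)        # between-chain
    w = statistics.mean([statistics.variance(c) for c in chains])   # within-chain
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5
```

With only two chains, as in this analysis, the between-chain variance is estimated from a single degree of freedom, which is one reason trace plots are examined alongside the statistic.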
Table 3: The best-fitting Bayesian two-component normal mixture AFT spatial model: posterior means, standard deviations (sd) and quantiles of the parameter estimates

Variable          mean        sd        2.5%      50%        97.5%
age               -0.04333*   0.00144   -0.0462   -0.0433    -0.04054
marital-single    -0.06064    0.04301   -0.1453   -0.06022    0.02381
marital-married    0.2177*    0.02986    0.1594    0.2178     0.2766
race              -0.2066*    0.025     -0.2554   -0.2066    -0.157
stage             -1.082*     0.04591   -1.171    -1.081     -0.9923

* significant (95% credible interval excludes zero)
Based on Table 2, the AFT model with two normal components has the largest LPML, compared with both the single-component normal AFT model and the normal mixture AFT models with more components. For the mixture models, LPML decreases monotonically as the number of normal components increases, and the drop in LPML grows with the difference in the number of components. The LPML values of the three- and four-component models are very close to that of the two-component model, which suggests that these two models fit the data nearly as well. In addition, the parameter estimates from the two-, three- and four-component models (not listed) differ only slightly. We choose the two-component normal mixture AFT spatial model as the best-fitting model, since it has the simplest specification and the largest LPML.

The parameter estimates for the variables of interest are summarized in Table 3. Age, race and stage are significant risk factors for the survival time of PrCA (indicated by * in the table). For marital status, “other” is the reference level; there is a significant difference between “married” and “other”, but not between “single” and “other”. Since more than 75% of patients are married, there may not be enough cases to detect the effects of all marital status levels. The coefficients have a direct interpretation. For example, holding all other covariates fixed, a one-year increase in age multiplies the survival time by $e^{-0.04333} = 0.958$ (95% CI (0.955, 0.960)); the ratio of median survival time for married patients to patients whose marital status is neither “single” nor “married” is $e^{0.2177} = 1.243$ (95% CI (1.173, 1.319)); the ratio of median survival time for black to white males is $e^{-0.2066} = 0.813$ (95% CI (0.775, 0.855)); and the ratio of median survival time for patients at the “distant” stage to those at the “localized/regional” stage is $e^{-1.082} = 0.339$ (95% CI (0.310, 0.371)).

To investigate spatial effects, the posterior medians of the spatial random effects for each county are mapped in Figure 5. The map shows that the spatial random effects are much higher in the southeast and northeast regions of Louisiana, and several counties in the lower middle of Louisiana also exhibit high spatial effects. Since the spatial effect is additive on the logarithm of survival time in the AFT spatial model, survival times are expected to be longer in areas with higher spatial random effects. Comparing Figure 1 with Figure 5, counties with higher spatial random effects generally have lower mortality rates, which supports the spatial effect of PrCA estimated by our AFT spatial model. However, two counties do not follow this pattern: Red River county has both a small spatial frailty and a low mortality rate, while Madison county has both a large spatial frailty and a high mortality rate. Further examination shows that these two counties have 27 and 36 patients, with 2 and 12 events, respectively. Given such small sample sizes, their mortality rates are highly sensitive to the number of events, so the two counties are better treated as outliers.

Figure 5: Posterior median of the spatial frailties by Louisiana county. [Map legend, number of counties in parentheses: Region1 (7): -0.0617 to -0.0411; Region2 (8): -0.0411 to -0.0185; Region3 (26): -0.0185 to 0.0032; Region4 (14): 0.0032 to 0.0227; Region5 (6): 0.0227 to 0.0510; Region6 (3): 0.0510 to 0.1067. Madison and Red River counties are labeled.]

To better understand the geographic pattern and racial disparity, we stratify patients by race and geographic region. Figure 6 compares the fitted survival curves of a 67-year-old married patient with localized/regional stage disease, for black and white males in regions 1 and 6 (the regions with the smallest and largest spatial frailties, respectively). These values of age, marital status and stage are based on the medians of the covariates, and the spatial random effects of the two regions are also set to their medians. Survival probabilities of PrCA are higher for white than for black males, and higher in region 6 than in region 1. The predicted median survival time for white males is 65.8 months in region 1 and 74.7 months in region 6, whereas for black males it is only 50.7 months in region 1 and 58.5 months in region 6.

Figure 6: Fitted survival curves for black and white males in regions 1 and 6. [Curves: Region1 White, Region6 White, Region1 Black, Region6 Black; x-axis: survival time in months (0-80); y-axis: survival probability.]
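Fitted survival curves and median survival times like those in Figure 6 follow from the mixture error distribution: with linear predictor $\eta = \beta^{\top} x + W_i$, $S(t) = \sum_h \omega_h [1 - \Phi((\log t - \eta - \mu_h)/\sigma_h)]$. The Python sketch below evaluates this plug-in survival function and finds the median by bisection on log t. The fitted mixture parameters are not reported in the paper, so any values passed in are hypothetical, and the reported medians may have been summarized over posterior draws rather than computed at a single plug-in value.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def survival(t, eta, weights, means, sds):
    """S(t) when log T = eta + eps and eps ~ sum_h w_h N(mu_h, sd_h^2).

    eta is the linear predictor beta'x + W_i on the log-time scale.
    """
    z = math.log(t) - eta
    return sum(w * (1.0 - norm_cdf((z - m) / s))
               for w, m, s in zip(weights, means, sds))


def median_survival(eta, weights, means, sds):
    """Median survival time: solve S(t) = 0.5 by bisection on log t."""
    lo = eta + min(means) - 10.0 * max(sds)   # S is essentially 1 here
    hi = eta + max(means) + 10.0 * max(sds)   # S is essentially 0 here
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if survival(math.exp(mid), eta, weights, means, sds) > 0.5:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))
```

Because $\eta$ only shifts the distribution of log T, for fixed parameter values the ratio of median survival times between two covariate profiles equals $\exp(\eta_1 - \eta_2)$; this is the time-ratio reading of the coefficients in Table 3, e.g. $e^{-0.2066}$ for race.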
5 Discussion and Conclusion
In this paper we propose a Bayesian AFT spatial model whose error term follows a finite normal mixture distribution. Both correlated and uncorrelated spatial heterogeneity are incorporated in the frailty term. Simulations show that the proposed model is robust to a variety of error distributions. The model can be used as a flexible alternative parametric AFT model for capturing complex densities, and model fitting can be performed with general-purpose software such as WinBUGS using the code provided in the supplementary material. We apply the proposed model to the 2000-2007 Louisiana PrCA data from the SEER program. LPML is used to decide the number of normal components, and the two-component normal mixture model is chosen as the best-fitting model. The results indicate that age, race and stage all have a significant impact on the survival probabilities of PrCA, and that there is a significant difference between the marital statuses “married” and “other”. More importantly, the model reveals the impact of geographic characteristics and captures the spatial patterns of PrCA in Louisiana.

The proposed model can be extended in several ways. First, the number of normal mixture components can be made flexible by assigning it a hyperprior distribution. However, the standard MCMC methods used by WinBUGS cannot sample from posterior distributions on spaces of varying dimension, so reversible jump MCMC has to be used 25. An extension for reversible jump MCMC has recently been under testing in WinBUGS 26, so selecting the number of components could be implemented in WinBUGS in the future. Alternatively, the issue may be addressed with infinite mixture models, which let the number of mixture components approach infinity while only a finite number of them are “active” or “represented” at any time 27. Rasmussen et al. 28 proposed an MCMC method for a hierarchical infinite Gaussian mixture model. Compared with reversible jump MCMC 25, this method extends easily to multivariate observations and appears easier to fit than a finite mixture with an unknown number of components.
The infinite normal mixture model belongs to the more general family of Dirichlet process mixtures (DPM) 29;30;31, which links our approach to semiparametric AFT models 32;33 and suggests possible directions for extending the current method.
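The idea that an infinite mixture has only finitely many “active” components can be seen from the Pólya urn representation of the Dirichlet process given by Blackwell and MacQueen 29. The Python sketch below simulates component labels for n observations under that urn with a user-chosen concentration parameter α; the function names are hypothetical, and the code only draws from the prior, so it illustrates the DPM structure rather than providing a fitting algorithm for the spatial AFT model.

```python
import random
from typing import List


def polya_urn_labels(n: int, alpha: float, rng: random.Random) -> List[int]:
    """Draw component labels for n observations from the Blackwell-MacQueen
    Polya urn (Chinese restaurant process) with concentration alpha > 0.

    Observation i (0-based) opens a new component with probability
    alpha / (alpha + i) and joins existing component k with probability
    n_k / (alpha + i), where n_k is the current size of component k.
    """
    labels: List[int] = []
    counts: List[int] = []  # sizes of the currently active components
    for i in range(n):
        u = rng.random() * (alpha + i)
        if i == 0 or u < alpha:
            counts.append(1)                 # open a new component
            labels.append(len(counts) - 1)
        else:
            u -= alpha
            k = 0
            # walk through components in proportion to their sizes;
            # the bound on k guards against floating-point round-off
            while k < len(counts) - 1 and u >= counts[k]:
                u -= counts[k]
                k += 1
            counts[k] += 1
            labels.append(k)
    return labels


def expected_active_components(n: int, alpha: float) -> float:
    """Prior mean number of active components among n observations:
    sum_{i=0}^{n-1} alpha / (alpha + i), which grows roughly like alpha * log(n)."""
    return sum(alpha / (alpha + i) for i in range(n))
```

For n = 200 and α = 1, `expected_active_components` returns the harmonic number H_200, roughly 5.9, so although the prior allows unboundedly many components, a few hundred observations typically occupy only a handful of them; larger α favors more components.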
6
Acknowledgement
We sincerely thank Dr. Tim Hanson from the Department of Statistics and Dr. Jim Hussey from the Department of Epidemiology and Biostatistics at the University of South Carolina for their help. The project is supported by National Cancer Institute grant CA139538.
References
1 National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2008, based on the November 2007 submission. Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) Limited-Use Data (1973-2007).
2 Zhang, J. and Lawson, A. B. (2011). Bayesian parametric accelerated failure time spatial model and its application to prostate cancer. Journal of Applied Statistics 38(3): 591–603.
3 Osnes, K. and Aalen, O. O. (1999). Spatial smoothing of cancer survival: a Bayesian approach. Statistics in Medicine 18(16): 2087–2099.
4 Henderson, R., Shimakura, S. and Gorst, D. (2002). Modeling spatial variation in leukemia survival data. Journal of the American Statistical Association 97(460): 965–972.
5 Li, Y. and Ryan, L. (2002). Modeling spatial survival data using semiparametric frailty models. Biometrics 58(2): 287–297.
6 Banerjee, S., Wall, M. M. and Carlin, B. P. (2003). Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics 4(1): 123–142.
7 Banerjee, S. and Dey, D. K. (2005). Semiparametric proportional odds models for spatially correlated survival data. Lifetime Data Analysis 11(2): 175–191.
8 Hennerfeind, A., Brezger, A. and Fahrmeir, L. (2006). Geoadditive survival models. Journal of the American Statistical Association 101(475): 1065–1075.
9 Darmofal, D. (2009). Bayesian spatial survival models for political event processes. American Journal of Political Science 53(1): 241–257.
10 Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model. Biometrika 69(1): 239–241.
11 Komárek, A., Lesaffre, E. and Hilton, J. F. (2005). Accelerated failure time model for arbitrarily censored data with smoothed error distribution. Journal of Computational and Graphical Statistics 14(3): 726–745.
12 Komárek, A. and Lesaffre, E. (2006). Bayesian semi-parametric accelerated failure time model for paired doubly interval-censored data. Statistical Modelling 6(1): 3–22.
13 Komárek, A. and Lesaffre, E. (2007). Bayesian accelerated failure time model for correlated interval-censored data with a normal mixture as error distribution. Statistica Sinica 17(2): 549–569.
14 Besag, J., York, J. and Mollié, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics 43(1): 1–59.
15 Roeder, K. and Wasserman, L. (1997). Practical Bayesian density estimation using mixtures of normals. Journal of the American Statistical Association 92(439): 894–902.
16 Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B 64(4): 583–639.
17 Spiegelhalter, D. J., Thomas, A. and Best, N. G. (2003). WinBUGS User Manual. Cambridge: Medical Research Council Biostatistics Unit.
18 Celeux, G., Forbes, F., Robert, C. P. and Titterington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis 1(4): 651–673.
19 Zhao, L., Hanson, T. E. and Carlin, B. P. (2009). Mixtures of Polya trees for flexible spatial frailty survival modelling. Biometrika 96(2): 263–276.
20 Xu, L., Hanson, T., Bedrick, E. and Restrepo, C. (2010). Hypothesis tests on mixture model components with applications in ecology and agriculture. Journal of Agricultural, Biological, and Environmental Statistics 15: 308–326.
21 Geisser, S. and Eddy, W. F. (1979). A predictive approach to model selection. Journal of the American Statistical Association 74(365): 153–160.
22 Gelfand, A. E. and Mallick, B. K. (1995). Bayesian analysis of proportional hazards models built from monotone functions. Biometrics 51(3): 843–852.
23 Sinha, D. and Dey, D. K. (1997). Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association 92(439): 1195–1212.
24 Brooks, S. P. and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7(4): 434–455.
25 Richardson, S. and Green, P. J. (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B 59(4): 731–792.
26 Lunn, D. J., Best, N. and Whittaker, J. C. (2009). Generic reversible jump MCMC using graphical models. Statistics and Computing 19(4): 395–408.
27 Chen, T., Morris, J. and Martin, E. (2006). Probability density estimation via an infinite Gaussian mixture model: application to statistical process monitoring. Journal of the Royal Statistical Society, Series C 55(5): 699–715.
28 Rasmussen, C. E. (2000). The infinite Gaussian mixture model. In Advances in Neural Information Processing Systems 12, Solla, S. A., Leen, T. K. and Müller, K.-R. (eds.), pp. 554–560. Cambridge, MA: MIT Press.
29 Blackwell, D. and MacQueen, J. B. (1973). Ferguson distributions via Pólya urn schemes. The Annals of Statistics 1: 353–355.
30 Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1: 209–230.
31 Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics 2: 1152–1174.
32 Christensen, R. and Johnson, W. (1988). Modelling accelerated failure time with a Dirichlet process. Biometrika 75(4): 693–704.
33 Kuo, L. and Mallick, B. (1997). Bayesian semiparametric inference for the accelerated failure-time model. The Canadian Journal of Statistics 25(4): 457–472.