Hindawi Publishing Corporation Computational and Mathematical Methods in Medicine Volume 2016, Article ID 9425629, 8 pages http://dx.doi.org/10.1155/2016/9425629

Research Article

A Comparison between Cure Model and Recursive Partitioning: A Retrospective Cohort Study of Iranian Female with Breast Cancer

Mozhgan Safe,1 Javad Faradmal,1,2 and Hossein Mahjub1,3

1 Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
2 Modeling of Noncommunicable Disease Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
3 Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran

Correspondence should be addressed to Hossein Mahjub; h [email protected]

Received 9 March 2016; Revised 2 August 2016; Accepted 8 August 2016

Academic Editor: Chung-Min Liao

Copyright © 2016 Mozhgan Safe et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background. Breast cancer, the most common cause of cancer death among women, has increasing incidence and mortality rates in Iran. Proper modeling would correctly detect the effects of prognostic factors on breast cancer, which may form the basis of health care planning. Therefore, this study aimed to apply two recently introduced statistical models and to compare them as survival prediction tools for breast cancer patients.

Materials and Methods. For this retrospective cohort study, the 18-year follow-up information of 539 breast cancer patients was analyzed by “Parametric Mixture Cure Model” and “Model-Based Recursive Partitioning.” Furthermore, a simulation study was carried out to compare the performance of these models in different situations.

Results. “Model-Based Recursive Partitioning” gave a better description of the dataset and provided a fine separation of individuals with different risk levels. Additionally, the results of the simulation study confirmed the superiority of this recursive partitioning for nonlinear model structures.

Conclusion. “Model-Based Recursive Partitioning” seems to be a potential instrument for processing complex mixture cure models. Therefore, applying this model is recommended for long-term survival patients.

1. Introduction

Breast cancer, which is the second most prevalent cancer among Iranian females [1], is the most common cause of cancer death among women worldwide [2]. The Iranian Ministry of Health has reported an age-standardized incidence rate of 33.21 per 100,000 female population [3]. Iranian patients with breast cancer are younger than patients in Western countries; this earlier onset of the disease may lead to a heavier burden [1]. Furthermore, earlier detection of breast cancer would improve life expectancy [4], which is further evidence of the need for valid modeling to predict patients’ hazard precisely. Proper modeling would correctly detect the effects of prognostic factors on breast cancer, which may form the basis of health care planning [5].

Cox Proportional Hazard and Weibull Models are the two most widely used techniques to model the survival of breast cancer patients [6–9]. However, given today’s medical progress, there is a high probability of being cured [10]. Because of this achievement, the cure model is becoming a more appropriate method, especially when curability of a disease can be considered a reality [10, 11].

Just as the mixture cure model probabilistically allocates individuals in the population to either the cured or the uncured group, there are various statistical learning algorithms which divide the population into homogeneous subsets. Referring to their higher accuracy and lower error rates, several articles claim that these recently introduced algorithms outperform their traditional counterparts [12–15]. “Model-Based Recursive Partitioning” (MoBRP) is one of the most interpretable members of this family and provides good predictive power in nonlinear regression relationships [16]. This model is a hybrid tree which combines traditional model fitting with the tree machine learning algorithm.


Furthermore, MoBRP inherits the benefits of regression trees, such as the ability to detect complex unknown model structures and interactions [16]. To the best of our knowledge, there is no study modeling the survival time of Iranian breast cancer patients by using “Parametric Mixture Cure Models” (PMCM), and, as far as we are aware, the only application of “Model-Based Recursive Partitioning” in survival analysis was made by Zeileis et al. to analyze the German Breast Cancer dataset [17]. So the goal of this study is to compare the fit of these two statistical methods on simulated and real breast cancer datasets.

2. Materials and Methods

2.1. Participants. For this retrospective cohort study, the information of 539 breast cancer patients was obtained. Approximately 37% of patients died of breast cancer and the remainder were censored. These patients had been referred to the Diagnostic Center of Hamedan Mahdieh Darolaytam during 1995–2013. The study entrance criteria were as follows:

(i) Patients who had undergone one of the following surgeries: lumpectomy, quadrantectomy, simple or total mastectomy, or modified radical mastectomy.
(ii) Female breast cancer patients who underwent chemotherapy and radiotherapy before or after surgery.

The event of interest was death from breast cancer, and survival time was measured in days from the date of diagnosis to the date of the participant’s death. Additionally, several prognostic factors and baseline characteristics were gathered, for example, “Human Epidermal growth factor Receptor 2” (HER2), “Progesterone Receptor Status” (PR), “Estrogen Receptor Status” (ER), and “number of involved lymph nodes.”

2.2. Mixture Cure Model. A basic assumption of almost all survival models is that, after sufficiently long follow-up, every individual in the population would eventually experience the event of interest. This assumption is violated in some practical situations. The mixture cure model is a flexible model that can overcome this restrictive assumption. It considers a subset of the population as nonsusceptible. Nonsusceptible individuals are cured and would never experience the event of interest [18]. Clearly, a patient who is cured of breast cancer is nonsusceptible to death from it. Cured individuals appear as censored observations during the course of follow-up. Empirical evidence for the presence of nonsusceptible individuals is a long, stable plateau, usually containing heavy censoring, at the end of the Kaplan-Meier survival curve [19, 20]. Provided that follow-up is sufficient, the level at which the Kaplan-Meier curve stabilizes at its right extreme is a consistent estimator of the proportion of nonsusceptible (cured) patients [20].

Let $U$ be the indicator variable that shows the status of being susceptible; $U = 1$ stands for susceptible or uncured patients, while $U = 0$ stands for cured individuals. Therefore, the cure model is defined as follows:

\[ S(t \mid x, z) = \pi(z)\, S(t \mid U = 1, x) + (1 - \pi(z)), \tag{1} \]

where $S(t \mid U = 1, x)$ is the conditional survival of susceptible individuals given the vector of covariates $x$. This probability can be modeled by one of the usual survival models such as Weibull, which is considered the most appropriate in this context [10, 11, 18, 21–25]. $S(t \mid U = 0, x)$ is the survival function of nonsusceptible individuals; it is identically one and therefore does not appear explicitly in the formula. $\pi(z)$ defines the probability of being susceptible and can be modeled by one of the binary regressions, most commonly logistic [11, 18, 23–26], where $z$ is a vector of covariates that may be the same as $x$. Finally, $S(t \mid x, z)$ is called the marginal survival and shows the survival of the entire population.

2.3. Model-Based Recursive Partitioning. If a global model fits all observations poorly, the total population can be split in such a way that a proper fit is obtained for each subset; this idea is the main motivation of the MoBRP technique. The partitioning is a tree in which each node is associated with a specific parametric model. The partitioning takes place in such a way that a stable model fit is obtained for each subset [16, 27]. More precisely, the algorithm for growing the tree is as follows:

(1) Fit a parametric model to a dataset.

(2) Statistically assess the stability of the estimated parameters over some partitioning variables.
(3) If there is overall instability in the estimated parameters, split the population along the partitioning variable that is responsible for the most instability. Splitting points are chosen in such a way that the residual sum of squares or the negative log-likelihood is minimized.
(4) Repeat the algorithm in each terminal node.

To avoid overfitting, this kind of tree is controlled by pre- and postpruning; prepruning is implemented via Bonferroni $p$ value correction for partitioning variable selection, and postpruning can be done via “Akaike Information Criterion” or “Bayesian Information Criterion” [16].

2.4. Simulation Study. A simulation study was planned in order to compare the performance of PMCM and MoBRP. Data were generated from a Logistic-Weibull mixture cure model [18, 28], where

\[ \pi(z_1, z_2) = P(U = 1 \mid z_1, z_2) = \frac{\exp(b_0 + b_1 z_1 + b_2 z_2)}{1 + \exp(b_0 + b_1 z_1 + b_2 z_2)}, \]
\[ S(t \mid U = 1, x) = \exp\left[-\exp(c_0 + c_1 x)\, t^{\rho}\right]. \tag{2} \]
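As an illustration only, the data-generating model in (2) can be sketched in Python using inverse-transform sampling for the Weibull part. The function names, the parameter values in the usage line, the standard normal covariates, and the uniform censoring mechanism are all assumptions made for this sketch; the text above does not specify them, and this is not necessarily how the simulation in this study was implemented.

```python
import math
import random


def susceptible_prob(z1, z2, b0, b1, b2):
    """Logistic part of (2): pi(z1, z2) = P(U = 1 | z1, z2)."""
    eta = b0 + b1 * z1 + b2 * z2
    return 1.0 / (1.0 + math.exp(-eta))


def weibull_survival(t, x, c0, c1, rho):
    """Weibull part of (2): S(t | U = 1, x) = exp(-exp(c0 + c1*x) * t**rho)."""
    return math.exp(-math.exp(c0 + c1 * x) * t ** rho)


def marginal_survival(t, x, z1, z2, b, c, rho):
    """Equation (1): pi * S(t | U = 1, x) + (1 - pi)."""
    pi = susceptible_prob(z1, z2, *b)
    return pi * weibull_survival(t, x, c[0], c[1], rho) + (1.0 - pi)


def simulate(n, b, c, rho, censor_max, seed=0):
    """Draw n subjects from the Logistic-Weibull mixture cure model.

    Assumptions (not from the source): covariates are standard normal and
    censoring times are uniform on [0, censor_max].
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        susceptible = rng.random() < susceptible_prob(z1, z2, *b)
        if susceptible:
            # Solve S(t | U = 1, x) = v for t, with v uniform on (0, 1].
            v = 1.0 - rng.random()
            rate = math.exp(c[0] + c[1] * x)
            event_time = (-math.log(v) / rate) ** (1.0 / rho)
        else:
            # Cured subjects never experience the event.
            event_time = math.inf
        censor_time = rng.uniform(0.0, censor_max)
        rows.append({
            "time": min(event_time, censor_time),
            "event": int(event_time <= censor_time),
            "cured": not susceptible,
            "x": x, "z1": z1, "z2": z2,
        })
    return rows


# Hypothetical parameter values, chosen only to make the sketch runnable.
data = simulate(n=1000, b=(0.5, 1.0, -1.0), c=(-1.0, 0.5), rho=1.5, censor_max=10.0)
```

In this sketch cured subjects always appear as censored observations, which matches the description of nonsusceptible individuals in Section 2.2, and `marginal_survival` levels off at $1 - \pi(z)$ for large $t$.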


Table 1: Results of Logistic-Weibull mixture cure model fitting on breast cancer patients’ database.

                                                          95% confidence interval
                              Estimate   Standard error   Lower      Upper      𝑝 value
Logistic part of cure model
  Intercept                    −0.08      0.58            −1.22       1.05       0.89
  Tumor size                    0.39      0.19             0.02       0.77       0.04
  Number of involved nodes      0.05      0.09            −0.11       0.22       0.53
Weibull part of cure model
  Scale                         7.99      0.20             7.60       8.38
  Shape                         0.62      0.04             0.54       0.70
  PR+                           0.99      0.26             0.49       1.50
  ER+                          −0.46      0.24            −0.93       0.05
  HER2+                         0.44      0.22             0.01       0.87
  Radiotherapy                 −0.43      0.21            −0.85      −0.01
AIC of cure model            3759.0