Japanese Journal of Biometrics Vol. 34, No. 1, 21–34 (2013)

Preliminary Report

A Stepwise Variable Selection for a Cox Proportional Hazards Cure Model with Application to Breast Cancer Data

Junichi Asano∗1, Akihiro Hirakawa∗2 and Chikuma Hamada∗3

∗1 Biostatistics Group, Center for Product Evaluation, Pharmaceuticals and Medical Devices Agency, Chiyoda-ku, Tokyo 100-0013, Japan
∗2 Center for Advanced Medicine and Clinical Research, Nagoya University Graduate School of Medicine, Showa-ku, Nagoya 466-8560, Japan
∗3 Department of Management Science, Graduate School of Engineering, Tokyo University of Science, Shinjuku-ku, Tokyo 162-8601, Japan
e-mail: [email protected]

A cure rate model is a survival model incorporating the cure rate on the assumption that a population contains both uncured and cured individuals. It is a powerful statistical tool for cancer prognostic studies. To predict long-term outcomes accurately, the proportional hazards (PH) cure model requires a variable selection method. However, no specific variable selection method for the PH cure model has been established in practice. In this study, we present a stepwise variable selection method for the PH cure model with a logistic regression for the cure rate and a Cox regression for the hazard for uncured patients. We conducted simulation studies to evaluate the operating characteristics of the stepwise method in comparison to those of the best subset selection method based on the Akaike information criterion and of the convenient variable selection method that puts all variables in the PH cure model and selects the significant ones. The results demonstrated that in many cases the stepwise method outperformed the other methods with respect to false positive determinations and estimation bias for the survival curve. In addition, we demonstrated the usefulness of the stepwise method for the PH cure model by applying it to clinical data on breast cancer patients.

Key words: cancer prognosis; Cox proportional hazards regression; cure model; logistic regression; stepwise variable selection.

Received November 2012. Revised June 2013. Accepted June 2013.

1. Introduction

In many cancer studies, some patients with long-term censored relapse-free periods may be considered cured, while others may eventually have a relapse. In such cases, the problems of interest include estimation of the proportion of patients who may be cured (the cure rate), evaluation of treatment methods and other clinical factors that may influence the cure rate, and the relapse-free survival time of uncured patients. To accommodate these requirements, mixture models (i.e., cure models) have been used in clinical trials comprising potentially cured patients, to estimate the proportion of cured patients and the survival chances for the uncured population. For example, disease-free survival (DFS) data on breast cancer patients who received neoadjuvant chemotherapy (NAC) should be analyzed by using the cure model because some patients remain disease-free 10 years after receiving NAC (Rastogi et al., 2008). Likewise, cure models to describe progression-free survival trends for multiple myeloma patients were introduced by Othus et al. (2012).

In cancer prognostic studies involving patients with long-term censored survival, we often need to identify the clinical variables that affect the cure rate and long-term outcomes in order to develop a predictive prognostic model. In such cases, the proportional hazards (PH) cure model can be useful. However, it is difficult to select explanatory variables for the development of the predictive model, as there is a vast number of variable combinations that can be included in the regression for the cure rate and the PH regression for a long-term outcome. For example, with $U$ variables, there are $2^U \times 2^U$ possible cure models, depending on whether or not each variable is included in each regression.
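To give a sense of scale, the number of candidate models can be computed directly: each variable carries two independent inclusion decisions, one for the cure-rate regression and one for the hazard regression. The following is a minimal sketch; the function name is illustrative and does not come from the paper.

```python
def n_candidate_models(n_vars: int) -> int:
    """Count PH cure models when each of n_vars variables may enter the
    logistic part and, independently, the Cox part."""
    per_regression = 2 ** n_vars            # subsets available to one regression
    return per_regression * per_regression  # logistic choice times Cox choice


# Illustrative variable counts only; the simulations in this paper use 4.
for u in (4, 8, 10, 20):
    print(u, n_candidate_models(u))
```

Even a moderate number of variables makes an exhaustive search over all candidate models costly, because every candidate requires its own model fit.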

The best subset selection based on an information criterion (e.g., the Akaike Information Criterion (AIC) (Akaike, 1974)) is often abandoned in favor of other variable selection methods, because it is not possible or practical to search all possible PH cure models (given their overwhelming number). Liu et al. (2012) recently applied a variable selection method based on penalized regression to the PH cure model. Variable selection based on penalized regression is stable (Breiman, 1996; Liu et al., 2012), but its practical application raises some issues, such as the choice of an estimation algorithm and of a strategy for determining the value of the tuning parameter. In this study, we propose a stepwise variable selection method for the PH cure model with a logistic regression for the cure rate and a Cox regression for the hazard for uncured patients, and discuss the operating characteristics of the stepwise method for the PH cure model using simulated and actual data.

The remainder of this paper is organized as follows. In Section 2, we introduce the PH cure model with a logistic regression for the cure rate and a Cox regression for the hazard for uncured patients, and the stepwise variable selection method. In Section 3, we perform simulation studies to compare the operating characteristics of the stepwise variable selection method to those of the convenient and the best subset selection methods. In Section 4, we apply the stepwise variable selection method to data on a breast cancer population. Finally, in Section 5, we discuss the performance of the stepwise method in further detail.

Jpn J Biomet Vol. 34, No. 1, 2013

Variable Selection for Cure Model

2. Methods

2.1 Cure Model Using Logistic and Cox Hazard Models

Let $Y$ be an indicator that an individual will eventually ($Y = 1$) or never ($Y = 0$) experience the event, with probability $p = \Pr(Y = 1)$. Let $T$ denote the time to event, defined only when $Y = 1$, with probability density function $f(t \mid Y = 1)$ and survival function $S(t \mid Y = 1)$. For a censored individual, $Y$ is not observed. We assume an independent, noninformative, random censoring model, and that censoring is statistically independent of $Y$. The semiparametric Cox PH mixture cure model has been examined previously (Kuk and Chen, 1992; Peng and Dear, 2000; Sy and Taylor, 2000). In the PH cure model, the probability density function of $T$ can be written as follows:

$$f(t) = p f_1(t \mid Y = 1) + (1 - p) f_2(t \mid Y = 0) = p f_1(t \mid Y = 1). \tag{1}$$

The cumulative distribution function is defined as $F(t) = p \int_0^t f_1(u \mid Y = 1)\,du$, and therefore the survival function $S(t) = 1 - F(t)$ can be expressed as follows:

$$S(t) = (1 - p) + p S(t \mid Y = 1). \tag{2}$$

The survival functions for the overall patient population and uncured patients for $t \to \infty$ are $S(t) \to 1 - p$ and $S(t \mid Y = 1) \to 0$, respectively. For the probability of cure $1 - p$, numerous studies (Farewell, 1982; Kuk and Chen, 1992; Peng and Dear, 2000; Sy and Taylor, 2000) assume a logistic regression:

$$\Pr(Y = 0 \mid \boldsymbol{x}) = 1 - p(\boldsymbol{x}) = \frac{\exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x})}{1 + \exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x})}, \tag{3}$$

where $\beta_0$ is the intercept, $\boldsymbol{\beta} = (\beta_1, \cdots, \beta_U)^T$ is the vector of regression coefficients, and $\boldsymbol{x} = (x_1, \cdots, x_U)^T$ is the vector of explanatory variables. Next, the Cox regression model is assumed for time $t$ (Cox, 1972):

$$\lambda(t \mid Y = 1, \boldsymbol{x}) = \lambda_0(t \mid Y = 1) \exp(\boldsymbol{\gamma}^T \boldsymbol{x}), \tag{4}$$

where $\lambda_0(t \mid Y = 1)$ is the baseline hazard function for uncured patients, and $\boldsymbol{\gamma} = (\gamma_1, \cdots, \gamma_U)^T$ is the vector of regression coefficients. The cumulative hazard function for uncured patients is defined using equation (4) as $\Lambda(t \mid Y = 1, \boldsymbol{x}) = \Lambda_0(t \mid Y = 1) \exp(\boldsymbol{\gamma}^T \boldsymbol{x})$; therefore, the cumulative baseline hazard function can be written as $\Lambda_0(t \mid Y = 1) = \int_0^t \lambda_0(u \mid Y = 1)\,du$. The survival function for uncured patients is $S(t \mid Y = 1, \boldsymbol{x}) = S_0(t \mid Y = 1)^{\exp(\boldsymbol{\gamma}^T \boldsymbol{x})}$.
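The pieces above (the logistic cure probability, the proportional hazards survival for uncured patients, and the mixture in equation (2)) can be combined numerically. The following is a minimal sketch, not the authors' implementation. The function names, the single covariate, and the unit-rate exponential baseline are assumptions made only for illustration; the semiparametric model itself leaves the baseline unspecified.

```python
import math


def cure_probability(beta0, beta, x):
    """Equation (3): Pr(Y = 0 | x), logistic in beta0 + beta'x."""
    eta = beta0 + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


def uncured_survival(t, gamma, x, baseline_survival):
    """S(t | Y = 1, x) = S0(t | Y = 1) raised to the power exp(gamma'x)."""
    risk = math.exp(sum(g * v for g, v in zip(gamma, x)))
    return baseline_survival(t) ** risk


def population_survival(t, beta0, beta, gamma, x, baseline_survival):
    """Equation (2): S(t | x) = (1 - p) + p * S(t | Y = 1, x)."""
    cured = cure_probability(beta0, beta, x)
    return cured + (1.0 - cured) * uncured_survival(t, gamma, x, baseline_survival)


def unit_exponential(t):
    """Assumed baseline survival S0(t) = exp(-t); illustrative only."""
    return math.exp(-t)


# One illustrative covariate with beta0 = -0.7, beta = 0.7, gamma = -0.7.
for t in (0.0, 1.0, 5.0, 50.0):
    s = population_survival(t, -0.7, [0.7], [-0.7], [1.0], unit_exponential)
    print(t, round(s, 4))
```

As $t$ grows, the printed survival levels off at the cure probability instead of decaying to zero, which is the plateau that distinguishes a cure model from a standard Cox model.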

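We denote the observed data for a patient $i$ ($i = 1, \cdots, n$) by $(t_i, \delta_i, \boldsymbol{x}_i)$, where $t_i$ is the observed event or censoring time, $\delta_i = 1$ if $t_i$ is uncensored, and $\delta_i = 0$ otherwise. For simplicity, we suppose that the same $U$ variables are available to both the logistic and Cox PH models before the variable selection is performed (in practice, different variables can be included in each model).

Denote the $k$ distinct event times by $t_{(1)} < t_{(2)} < \cdots < t_{(k)}$. It follows that if $\delta_i = 1$, then $y_i = 1$, and if $\delta_i = 0$, then $y_i$ is not observed. The likelihood function of the PH cure model is:

$$
\begin{aligned}
L(\beta_0, \boldsymbol{\beta}, \boldsymbol{\gamma}, \Lambda_0; \boldsymbol{y}, \boldsymbol{x})
&= \prod_{i=1}^{n} p^{y_i}(\boldsymbol{x}_i) \{1 - p(\boldsymbol{x}_i)\}^{1 - y_i} \\
&\quad \cdot \prod_{i=1}^{n} \left\{ \lambda_0(t_i \mid Y = 1) \exp(\boldsymbol{\gamma}^T \boldsymbol{x}_i) \right\}^{\delta_i y_i} \exp\left\{ -y_i \Lambda_0(t_i \mid Y = 1) \exp(\boldsymbol{\gamma}^T \boldsymbol{x}_i) \right\} \\
&= L_1(\beta_0, \boldsymbol{\beta}; \boldsymbol{y}, \boldsymbol{x}) \, L_2(\boldsymbol{\gamma}, \Lambda_0; \boldsymbol{y}, \boldsymbol{x}),
\end{aligned}
\tag{5}
$$

where $\boldsymbol{y} = (y_1, \cdots, y_n)^T$. Here we use the EM algorithm to estimate the parameters $(\beta_0, \boldsymbol{\beta}, \boldsymbol{\gamma}, \Lambda_0)$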

with the method developed by Sy and Taylor (2000).

2.2 Stepwise Variable Selection Method for the PH Cure Model

In this section, we propose a stepwise variable selection method for the PH cure model. The method selects the variables to be included in each regression in a stepwise manner by using the algorithm we have developed. In this algorithm, the inclusion and exclusion of the variables are determined based on the two-sided p-value of the Wald test for each regression coefficient $\hat{\theta}_u$, $u = 1, \cdots, 2U$, where $(\theta_1, \theta_2, \cdots, \theta_U, \theta_{U+1}, \cdots, \theta_{2U})^T = (\beta_1, \cdots, \beta_U, \gamma_1, \cdots, \gamma_U)^T$. In addition, the Wald test for the global null hypothesis is performed for a variable with more than two categories in the algorithm.

The algorithm of the stepwise variable selection method is as follows. For an arbitrary step in the algorithm below, the variable vectors included in and excluded from the logistic regression are defined as $\boldsymbol{x}_{in,L}$ (a $(U - V) \times 1$ vector) and $\boldsymbol{x}_{out,L}$ (a $V \times 1$ vector), respectively, where $\boldsymbol{x} = (\boldsymbol{x}_{in,L}, \boldsymbol{x}_{out,L})^T$, and $V$ is the number of variables not included in the logistic regression model. Similarly, the variable vectors included in and excluded from the Cox regression model are defined as $\boldsymbol{x}_{in,C}$ (a $(U - W) \times 1$ vector) and $\boldsymbol{x}_{out,C}$ (a $W \times 1$ vector), respectively, where $\boldsymbol{x} = (\boldsymbol{x}_{in,C}, \boldsymbol{x}_{out,C})^T$, and $W$ is the number of variables not included in the Cox regression model.

Stepwise Algorithm

Step 1

Step 1a: We start with the intercept-only logistic and Cox regressions in the PH cure model.

We assume that $1 - p = \exp(\beta_0) / \{1 + \exp(\beta_0)\}$ for the cure rate, and $\lambda(t \mid Y = 1) = \lambda_0(t \mid Y = 1)$ for the time-to-event.

Step 1b: For each variable of $\boldsymbol{x}$, we calculate the corresponding p-value for the coefficient in the logistic regression and obtain $U$ p-values. Similarly, for each variable of $\boldsymbol{x}$, we calculate the corresponding p-value for the coefficient in the Cox regression and obtain another $U$ p-values.

Step 1c: Among the $2U$ p-values calculated in Step 1b, the smallest p-value is denoted as $p_{\min}$. If $p_{\min}$ is smaller than the prespecified significance level $\alpha_{in}$, the variable $x_u$ that corresponds to $p_{\min}$ is included in the regression; otherwise, the variable selection is complete. For instance, in this step, if the variable $x_u$ is included in the logistic regression, then $\boldsymbol{x}_{in,L} = x_u$, $\boldsymbol{x}_{out,L} = (x_1, \cdots, x_{u-1}, x_{u+1}, \cdots, x_U)^T$, and $\boldsymbol{x}_{out,C} = (x_1, \cdots, x_U)^T$. On the other hand, if the variable $x_u$ is included in the Cox regression, then $\boldsymbol{x}_{in,C} = x_u$, $\boldsymbol{x}_{out,C} = (x_1, \cdots, x_{u-1}, x_{u+1}, \cdots, x_U)^T$, and $\boldsymbol{x}_{out,L} = (x_1, \cdots, x_U)^T$.

Step 2

Step 2a: We assume the PH cure model including $\boldsymbol{x}_{in,L}$ and $\boldsymbol{x}_{in,C}$ selected in Step 1 (or Step 3).

Step 2b: For each variable of $\boldsymbol{x}_{out,L}$, we calculate the corresponding p-value for the coefficient when we include the variable in the logistic regression assumed in Step 2a. Similarly, for each variable of $\boldsymbol{x}_{out,C}$, we calculate the corresponding p-value for the coefficient when we include the variable in the Cox regression assumed in Step 2a.

Step 2c: If the smallest p-value among the $(V + W)$ p-values calculated in Step 2b, $p_{\min}$, is smaller than $\alpha_{in}$, then the variable $x_u$ that corresponds to $p_{\min}$ is included in the regression. Otherwise, the variable selection is complete.

Step 3

Step 3a: We assume the PH cure model including $\boldsymbol{x}_{in,L}$ and $\boldsymbol{x}_{in,C}$ selected in Step 2 (or Step 3c).

Step 3b: We include all variables of $\boldsymbol{x}_{in,L}$ in the logistic regression assumed in Step 3a and calculate the p-values for the coefficients. A similar procedure is carried out for the Cox regression.

Step 3c: Among the $(2U - V - W)$ p-values, the maximum p-value is denoted as $p_{\max}$. If $p_{\max}$ is larger than or equal to the prespecified significance level $\alpha_{out}$, then the variable

$x_u$ that corresponds to $p_{\max}$ is excluded from the regression.

Step 3d: Steps 3a-3c are repeated until the condition $p_{\max} < \alpha_{out}$ is satisfied.

Steps 2 and 3 are repeated until the criterion for completing the variable selection in Step 1 or Step 2 is satisfied, or until all the variables are included in the logistic and Cox regressions, i.e., $V = W = 0$.

3. Simulation Study

3.1 Simulation Setting

We evaluated the operating characteristics of the PH cure model with the stepwise variable selection method through simulation studies using various scenarios. To evaluate the performance of the stepwise variable selection method, we also employed two conventional variable selection methods: the convenient variable selection method that puts all variables in the PH cure model and selects the significant ones (Convenient method), and the best subset selection method based on the Akaike Information Criterion (AIC). In practice, a Cox regression model with variable selection is frequently applied without considering cured patients. We therefore implemented the Cox regression model with the stepwise variable selection method, and compared its operating characteristics with those of the PH cure model with the stepwise variable selection method in simulation studies.

In the simulation studies, we assumed the PH cure model for the time-to-event $t_e$, as $f(t_e) = p f_1(t_e \mid Y = 1)$. In this model, we also assumed the logistic regression for the cure rate, $1 - p = \exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x}) / \{1 + \exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x})\}$, and the Cox regression for the hazard for uncured patients,

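$\lambda(t_e \mid Y = 1) = \lambda_0(t_e \mid Y = 1) \exp(\boldsymbol{\gamma}^T \boldsymbol{x})$. Thus, the probability density function of $t_e$ is given by:

$$f(t_e) = \left[ 1 - \frac{\exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x})}{1 + \exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x})} \right] \exp(\boldsymbol{\gamma}^T \boldsymbol{x}) \, \{\exp(-t_e)\}^{\exp(\boldsymbol{\gamma}^T \boldsymbol{x})}. \tag{6}$$

Utilizing $f(t_e)$, $\beta_0$, $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$ and $\boldsymbol{x}$, we obtained the $t_e$ for patient $i$ as follows:

$$t_{e,i} = \frac{1}{-\exp(\boldsymbol{\gamma}^T \boldsymbol{x}_i)} \log\left[ \frac{S_{e,i} - \exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x}_i) / \{1 + \exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x}_i)\}}{1 - \exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x}_i) / \{1 + \exp(\beta_0 + \boldsymbol{\beta}^T \boldsymbol{x}_i)\}} \right], \tag{7}$$

where $S_{e,i}$ is a uniform random number between 0 and 1. The probability density function of time-to-censor $t_c$ is $f_c(t_c) = \lambda_c \exp(-\lambda_c t_c)$. Given $\lambda_c$, we obtained $t_c$ for a patient $i$:

$$t_{c,i} = -\frac{1}{\lambda_c} \log S_{c,i}, \tag{8}$$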

where $S_{c,i}$ is a uniform random number between 0 and 1, and $\lambda_c$ is set to 0.1 and 0.4 for the slight and heavy proportion of censoring, respectively. Both time-to-event $t_{e,i}$ and time-to-censor $t_{c,i}$ were generated for each patient; if $t_{e,i} \le t_{c,i}$, the time $t_i$ was set to $t_{e,i}$; otherwise, $t_i$ was set to $t_{c,i}$. We considered four variables $x_1, \cdots, x_4$: $x_1$ and $x_2$ were generated from independent Bernoulli distributions with probability 0.5, while $x_3$ and $x_4$ were generated from the bivariate normal distribution with mean vector 0, variances 1, and correlation $\mathrm{Corr}(x_3, x_4) = 0.25$.

We performed simulation studies with the eight scenarios shown in Table 1. In Scenario 1, variable $x_1$ influenced the hazard for uncured patients, while variable $x_2$ affected the cure rate. Variable $x_4$ did not have an impact on either the cure rate or the hazard for uncured patients, whereas $x_3$ had an impact on both. In Scenario 2, the impacts of variables $x_1$-$x_3$ were small compared with those in Scenario 1. The impact of $x_4$ was the same for both scenarios (i.e., no impact). In Scenario 3, all variables influenced both the cure rate and the hazard for uncured patients. In Scenario 4, every variable influenced only the hazard for uncured patients.

Table 1. Eight scenarios for regression coefficients of four variables in the logistic and Cox regressions (β0 = −0.7 for all scenarios).

Scenario   (β1, γ1)       (β2, γ2)       (β3, γ3)       (β4, γ4)
1          (0.0, −0.7)    (−0.7, 0.0)    (0.7, −0.7)    (0.0, 0.0)
2          (0.0, −0.2)    (−0.2, 0.0)    (0.2, −0.2)    (0.0, 0.0)
3          (0.7, −0.7)    (−0.7, 0.7)    (0.7, −0.7)    (−0.7, 0.7)
4          (0.0, −0.7)    (0.0, 0.7)     (0.0, −0.7)    (0.0, 0.7)
5          (0.7, 0.0)     (−0.7, 0.0)    (0.7, 0.0)     (−0.7, 0.0)
6          (0.0, −0.7)    (0.0, 0.7)     (0.7, 0.0)     (−0.7, 0.0)
7          (0.7, 0.0)     (−0.7, 0.0)    (0.0, −0.7)    (0.0, 0.7)
8          (0.0, 0.0)     (0.0, 0.0)     (0.0, 0.0)     (0.0, 0.0)
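The data-generating steps described above (the covariate distributions, the event time from equation (7), the censoring time from equation (8), and the Scenario 1 coefficients of Table 1) can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code. The function name and the seed are hypothetical, and a patient whose uniform draw falls at or below the cure probability is treated here as cured (infinite event time), which is how equation (7) is interpreted when its logarithm is undefined.

```python
import math
import random


def simulate_patient(beta0, beta, gamma, lam_c, rng):
    """Draw one (observed time, event indicator, covariates) triple."""
    # x1, x2 ~ Bernoulli(0.5); (x3, x4) standard bivariate normal with corr 0.25.
    x1 = 1.0 if rng.random() < 0.5 else 0.0
    x2 = 1.0 if rng.random() < 0.5 else 0.0
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    rho = 0.25
    x = [x1, x2, z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2]

    eta = beta0 + sum(b * v for b, v in zip(beta, x))
    cure = 1.0 / (1.0 + math.exp(-eta))           # 1 - p(x), equation (3)
    risk = math.exp(sum(g * v for g, v in zip(gamma, x)))

    s_e = rng.random()
    if s_e <= cure:
        t_event = math.inf                        # assumption: patient is cured
    else:
        t_event = -math.log((s_e - cure) / (1.0 - cure)) / risk  # equation (7)

    s_c = 1.0 - rng.random()                      # in (0, 1], avoids log(0)
    t_censor = -math.log(s_c) / lam_c             # equation (8)

    if t_event <= t_censor:
        return t_event, 1, x
    return t_censor, 0, x


# Scenario 1 of Table 1 with light censoring (lambda_c = 0.1) and n = 100.
rng = random.Random(2013)                         # arbitrary seed
beta = [0.0, -0.7, 0.7, 0.0]
gamma = [-0.7, 0.0, -0.7, 0.0]
data = [simulate_patient(-0.7, beta, gamma, 0.1, rng) for _ in range(100)]
print(sum(d for _, d, _ in data), "events out of", len(data))
```

Cured patients can only ever appear as censored observations, so the data contain a mixture of censored cured and censored uncured patients, exactly the situation the likelihood in equation (5) is built for.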


Conversely, in Scenario 5 every variable influenced only the cure rate. In Scenario 6, variables $x_1$ and $x_2$ affected the hazard for uncured patients, whereas the remaining variables $x_3$ and $x_4$ influenced the cure rate. In Scenario 7, variables $x_3$ and $x_4$ affected the hazard for uncured patients, while the remaining variables $x_1$ and $x_2$ had impacts on the cure rate. Lastly, in Scenario 8 none of the variables had any impact on either the cure rate or the hazard for uncured patients.

Table 2. Result of variable selection for each variable in 1,000 simulations across all scenarios.

                                True regression coefficient
Result of variable selection    ≠ 0       = 0
Selected                        TP        FP
Not selected                    FN        TN
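To make the bookkeeping of Table 2 concrete, the following sketch runs a simplified version of the stepwise procedure of Section 2.2 and then counts FP and FN separately for the logistic and Cox parts. It is not the authors' implementation. The callable `pvalues` is a hypothetical stand-in: in the paper's setting it would fit the PH cure model by the EM algorithm and return two-sided Wald p-values, whereas here it returns made-up constants. The forward and backward steps are merged into one loop, the global Wald test for multi-category variables is omitted, and `max_steps` is an added safeguard that does not appear in the paper.

```python
def stepwise_select(candidates, pvalues, alpha_in=0.15, alpha_out=0.15, max_steps=100):
    """candidates: (part, name) terms with part in {"logistic", "cox"}.
    pvalues(terms): hypothetical callable returning {term: p-value} for the
    model that contains exactly `terms`."""
    selected = []
    for _ in range(max_steps):                 # safeguard against cycling
        remaining = [c for c in candidates if c not in selected]
        if not remaining:                      # all variables included (V = W = 0)
            break
        # Forward step: p-value of each excluded term when added to the model.
        entry_p = {c: pvalues(selected + [c])[c] for c in remaining}
        best = min(entry_p, key=entry_p.get)
        if entry_p[best] >= alpha_in:          # nothing qualifies: selection complete
            break
        selected.append(best)
        # Backward step: drop terms until the largest p-value is below alpha_out.
        while selected:
            p = pvalues(selected)
            worst = max(selected, key=p.get)
            if p[worst] < alpha_out:
                break
            selected.remove(worst)
    return selected


def count_fp_fn(selected, true_coef):
    """Table 2 bookkeeping: (FP, FN) counts for each regression."""
    counts = {}
    for part in ("logistic", "cox"):
        fp = sum(1 for t in selected if t[0] == part and true_coef[t] == 0)
        fn = sum(1 for t, c in true_coef.items()
                 if t[0] == part and c != 0 and t not in selected)
        counts[part] = (fp, fn)
    return counts


# True coefficients of Scenario 1 (Table 1).
TRUE_COEF = {
    ("logistic", "x1"): 0.0, ("logistic", "x2"): -0.7,
    ("logistic", "x3"): 0.7, ("logistic", "x4"): 0.0,
    ("cox", "x1"): -0.7, ("cox", "x2"): 0.0,
    ("cox", "x3"): -0.7, ("cox", "x4"): 0.0,
}

# Made-up p-values for illustration only; they are not simulation results.
TOY_P = {("logistic", "x2"): 0.01, ("cox", "x1"): 0.02, ("logistic", "x3"): 0.03,
         ("logistic", "x4"): 0.10, ("cox", "x3"): 0.20}


def toy_pvalues(terms):
    """Stand-in for Wald tests: fixed p-values that ignore the rest of the model."""
    return {t: TOY_P.get(t, 0.50) for t in terms}


chosen = stepwise_select(list(TRUE_COEF), toy_pvalues)
print(chosen)
print(count_fp_fn(chosen, TRUE_COEF))
```

With real Wald tests the p-value of a term depends on which other terms are in the model, which is why the procedure re-evaluates every excluded term after each inclusion and re-checks every included term in the backward step.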

The number of patients, $n$, was set to 100 or 300. In the stepwise and convenient methods, the significance levels for inclusion and exclusion of a variable, i.e., $\alpha_{in}$ and $\alpha_{out}$, were both set to 0.15. Each setting was simulated 1,000 times. For each simulation, we calculated the number of variables classified as true positive (TP), true negative (TN), false positive (FP), and false negative (FN), as defined in Table 2. The average numbers of variables for FP and FN for the two regressions in 1,000 simulations were reported. In each simulation, we estimated the conditional survival proportions at each patient's time by using the PH cure model including the selected variables, and these estimates were compared with those obtained from (6) with the true coefficient values. For each selection method, the average difference of the estimated conditional survival proportions at each patient's time in 1,000 simulations was reported.

3.2 Simulation Results

Table 3 shows the average number of variables for FP and FN in 1,000 simulations obtained using Scenarios 1-8. Among the PH cure models with variable selection, the numbers of cases with the smallest average number of variables for FP were 44 for the stepwise method, 1 for the convenient method, and 0 for the best subset selection method, while those for FN were 3 for the stepwise method, 20 for the convenient method, and 12 for the best subset selection method. Table 4 shows the average of the estimated conditional survival differences in 1,000 simulations using Scenarios 1-8. The numbers of cases with the smallest average difference in the estimated conditional survival proportion were 20 for the stepwise method, 8 for the convenient method, and 4 for the best subset selection method.

The numbers of cases with the smallest average number of variables for FP were 22 for the PH cure model with the stepwise method and 2 for the Cox regression model with the stepwise method, while those for FN were 18 for the PH cure model with the stepwise method and 4 for the Cox regression model with the stepwise method (Table 3). The numbers of cases

n

Method

Cox

Cox

Stepwise

0.85 0.41

0.78 0.93 0.68 0.79 0.84 0.92 1.08 0.85 0.98

Best Subset

0.11 0.71

0.27 0.84 0.03 0.56 1.03 0.80 0.75 0.71 0.54

Cox regression model

0.27 0.79 0.03 0.47 1.09 0.68 0.83 0.51 0.59

1.08 0.49

Convenient

0.44 0.51

Stepwise

300 PH cure model

Stepwise

Cox regression model

0.93 1.04 0.80 0.83 0.87 1.05 1.05 0.97 1.56

0.63 0.94 0.33 0.64 1.05 0.89 1.08 0.78 1.37

Convenient

Best Subset

0.71 0.86 0.37 0.49 1.15 0.73 1.24 0.52 1.44

Stepwise

0.4 100 PH cure model

Stepwise 0.87 0.43

0.16 0.53 0.01 0.46 1.06 0.42 0.78 0.45 0.16

Best Subset

0.22 0.92

0.17 0.46 0.00 0.40 1.18 0.40 0.68 0.41 0.26

Convenient

Cox regression model

0.17 0.46 0.00 0.39 1.18 0.38 0.69 0.37 0.26

Stepwise

300 PH cure model

Stepwise 1.08 0.48

0.63 0.57 0.28 0.49 1.35 0.47 1.19 0.49 0.94

Best Subset

0.54 0.62

0.57 0.52 0.17 0.44 1.40 0.46 1.14 0.47 1.11

Cox regression model

Cox

Logistic

Cox

Scenario4 Logistic

Cox

Scenario5

0.01

1.61

0.04

0.04

0.36

1.62

0.57

0.68

0.02

0.05

0.01

0.01

0.41

0.71

0.36

0.41

0.60

1.82 0.44

1.38 0.08

1.35 0.06

1.57

2.08 1.25

1.45 0.75

1.50 0.85

1.08

1.05 0.02

0.94 0.01

0.90 0.01

1.94

1.02 0.47

0.91 0.35

0.86 0.38

0.44

0.45

0.49

1.28

1.20

1.32

0.37

0.27

0.27

1.02

1.08

1.14

Cox

FN FP FN FP

Logistic

Scenario6 Cox

0.71 1.05

0.74 2.00

0.13 1.66

1.02 1.45

0.37 0.85

3.25

0.39 1.97

0.01 1.46

1.58 0.04 0.87 0.32 0.77 0.34 0.88 0.09 0.73

1.32 0.02 0.74 0.05 0.58 0.36 0.69 0.02 0.55

1.21 0.01 0.72 0.06 0.54 0.36 0.67 0.01 0.55

2.16

1.80 0.41 1.01 0.72 0.84 0.90 0.98 0.69 0.78

1.51 0.38 0.76 0.52 0.70 0.86 0.72 0.15 0.62

1.18 0.40 0.72 0.64 0.60 0.88 0.71 0.16 0.56

3.54

1.08 0.10 0.49 0.12 0.65 0.31 0.58 0.00 0.50

0.82 0.00 0.38 0.01 0.40 0.24 0.47 0.00 0.40

0.78 0.00 0.37 0.01 0.36 0.24 0.44 0.00 0.39

1.24 1.70

1.02 0.20 0.46 0.36 0.49 0.83 0.53 0.03 0.44

0.96 0.19 0.41 0.30 0.47 0.85 0.47 0.01 0.42

2.51

Logistic

Cox

Scenario8

1.79

1.58

1.25

2.05

1.76

1.34

0.92

0.76

0.74

0.86

0.82

0.74

1.30

1.70

1.46

1.06

1.25

1.92

1.56

1.04

1.26

1.12

0.85

0.76

1.22

0.96

0.99

0.85

FN FP FN FP FN FP FN FP

Logistic

Scenario7

0.84 0.23 0.41 0.32 0.43 0.87 0.43 0.01 0.40

FN FP FN FP FN FP FN FP FN FP FN FP

Logistic

Scenario3

0.59 0.50 0.17 0.41 1.44 0.41 1.19 0.42 1.17

FN FP FN FP

Logistic

Scenario2

Convenient

FN FP FN FP

Logistic

Scenario1

Average number of variables for FP and FN for logistic and Cox regression models in all scenarios.

Stepwise

0.1 100 PH cure model

λc

Table 3.


Table 4. Average conditional survival differences in all scenarios.

λc   n    Model                 Method        Scenario
                                              1        2        3        4        5        6        7        8
0.1  100  PH cure model         Stepwise      6.55%    6.29%    7.69%    6.44%    7.46%    7.01%    7.03%    5.84%
                                Convenient    6.58%    6.38%    7.59%    6.43%    7.47%    6.97%    7.04%    5.89%
                                Best Subset   6.93%    6.37%    7.83%    6.66%    7.47%    7.11%    7.06%    5.88%
          Cox regression model  Stepwise      8.82%    6.44%    12.36%   8.74%    11.83%   10.76%   11.00%   6.64%
     300  PH cure model         Stepwise      3.55%    3.70%    3.99%    3.54%    4.04%    3.62%    3.87%    3.25%
                                Convenient    3.55%    3.71%    3.99%    3.55%    4.05%    3.63%    3.88%    3.26%
                                Best Subset   3.52%    3.74%    3.92%    3.50%    4.22%    4.05%    4.03%    3.27%
          Cox regression model  Stepwise      7.55%    4.01%    10.84%   6.60%    10.86%   9.42%    9.60%    3.76%
0.4  100  PH cure model         Stepwise      7.69%    7.81%    9.53%    8.68%    8.74%    9.04%    8.28%    7.57%
                                Convenient    7.79%    8.29%    9.17%    8.36%    9.05%    8.96%    8.23%    8.29%
                                Best Subset   9.79%    9.09%    11.91%   10.23%   9.58%    9.94%    9.82%    9.03%
          Cox regression model  Stepwise      7.15%    6.26%    10.64%   8.37%    10.43%   9.78%    9.69%    6.92%
     300  PH cure model         Stepwise      3.99%    4.37%    4.81%    4.27%    4.87%    4.58%    4.45%    4.15%
                                Convenient    4.01%    4.50%    4.76%    4.28%    4.88%    4.58%    4.46%    4.41%
                                Best Subset   6.79%    4.90%    8.54%    5.33%    4.83%    5.25%    4.73%    4.55%
          Cox regression model  Stepwise      5.60%    3.79%    8.87%    5.83%    9.22%    7.96%    8.11%    3.91%

with the smallest average difference in the estimated conditional survival proportion were 26 for the PH cure model with the stepwise method and 6 for the Cox regression model with the stepwise method (Table 4).

4. Application to Breast Cancer Data

In this section, we demonstrate the application of the stepwise method to real data, comprising 368 breast cancer patients who received neoadjuvant chemotherapy (NAC) at National Cancer Center Hospital between May 1995 and July 2007 (Hirata et al., 2009; Asano et al., under review). NAC was first introduced in the early 1980s to improve tumor operability in patients with locally advanced breast cancers (Kaufmann et al., 2003). A subset of NAC-treated primary breast cancer patients was reported to achieve long-term disease-free survival (DFS) (Rastogi et al., 2008). These patients did not experience recurrence or metastasis and did not die during the study period, and were subsequently considered clinically "cured." In our example, DFS was defined as the time from surgery to the date of disease relapse, death by any cause, or the date of the last clinical visit for patients without complications. The median DFS was 3.3 years. Figure 1 shows the Kaplan-Meier curves for DFS.


Fig. 1. Kaplan-Meier curve of disease-free survival for the patients (N = 368). Short vertical lines indicate censored data points.

The 9-year DFS rate was 58.8%. Patients with long-term censored times were considered cured. The data thus include patients who may have been cured. In line with the configurations used by Asano et al. (under review), the following 8 variables were assessed by the Cox PH cure model: hormone status (positive vs. negative), essential thrombocythemia (ET) (yes vs. no), age (4

CI: confidence interval

1 1.02(0.56,1.87)

1 1.06 (0.42, 2.67)

Clinical stage IIA/IIB/IIIA IIIB/IIIC

≥35