Modified profile likelihood estimation for the Weibull regression models in survival analysis

Communications in Statistics - Theory and Methods

ISSN: 0361-0926 (Print) 1532-415X (Online) Journal homepage: http://www.tandfonline.com/loi/lsta20

Modified profile likelihood estimation for the Weibull regression models in survival analysis
Md. Mazharul Islam, Md. Hasinur Rahaman Khan & Tamanna Hawlader
To cite this article: Md. Mazharul Islam, Md. Hasinur Rahaman Khan & Tamanna Hawlader (2018): Modified profile likelihood estimation for the Weibull regression models in survival analysis, Communications in Statistics - Theory and Methods
To link to this article: https://doi.org/10.1080/03610926.2018.1472784

Published online: 29 Oct 2018.


COMMUNICATIONS IN STATISTICS—THEORY AND METHODS 2018, VOL. 00, NO. 0, 1–15 https://doi.org/10.1080/03610926.2018.1472784

Modified profile likelihood estimation for the Weibull regression models in survival analysis

Md. Mazharul Islam, Md. Hasinur Rahaman Khan, and Tamanna Hawlader
Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka, Bangladesh

ABSTRACT
In this study, we investigate the adjustment of the profile likelihood function for a parameter of interest in the presence of many nuisance parameters in survival regression models. Our objective is to extend Barndorff-Nielsen's technique to Weibull regression models for estimating the shape parameter in the presence of many nuisance and regression parameters. We conducted Monte Carlo simulation studies and a real data analysis, all of which suggest that the modified profile likelihood estimators outperform the profile likelihood estimators in terms of three comparison criteria: mean squared error, bias, and standard error.

ARTICLE HISTORY
Received 26 June 2015; Accepted 24 February 2018

KEYWORDS
Modified profile likelihood; profile likelihood; Weibull regression model

MATHEMATICS SUBJECT CLASSIFICATION
62N02

1. Introduction

One of the most common and widely accepted approaches to deriving estimators is the method of maximum likelihood. In parametric inference, the maximum likelihood estimate (MLE) of a parameter is obtained by maximizing the corresponding likelihood function, either analytically by differential calculus or by numerical optimization on a computer (Casella and Berger 2002). The likelihood function often contains several parameters of which only a few are of interest. These are termed parameters of interest, and the remaining ones are called nuisance parameters. The most computationally feasible and widely used approach to inference about a parameter of interest in the presence of nuisance parameters is the profile likelihood approach. Under this approach, the nuisance parameters in the likelihood function are replaced by their MLEs obtained for given values of the parameters of interest, so that the resulting profile likelihood is a function of the parameters of interest only. This procedure may yield inconsistent estimates when the likelihood function contains a large number of nuisance parameters, and it need not be optimal even when the number of nuisance parameters is small (Cox and Reid 1987; Fraser and Reid 1989). Moreover, in small samples, profile likelihood estimates can be considerably biased (Da Silva, Ferrari, and Cribari-Neto 2008). Modifications of the profile likelihood have been introduced to overcome these problems, and several adjustments have been proposed. In a paper published in Biometrika, Barndorff-Nielsen (1980) described the construction of ancillary statistics and gave expressions for the conditional distribution of the MLEs for exponential models and transformation models.
CONTACT Md. Hasinur Rahaman Khan, Institute of Statistical Research and Training (ISRT), University of Dhaka, Dhaka 1000, Bangladesh. © 2018 Taylor & Francis Group, LLC

In 1983, Barndorff-Nielsen provided a formula that synthesized and extended results found in several research studies, including those of Fisher (1934), Fraser and Fraser (1968), Daniels (1954), Barndorff-Nielsen and Cox (1979), Cox (1980), Hinkley (1980), and Barndorff-Nielsen (1983). This formula leads to a modification of the traditional profile likelihood approach when the parameter is multidimensional. When dealing with the incidental parameter problem, which may arise in several other contexts such as multilevel data, a general approach for producing estimators with reduced bias is the modified profile likelihood approach, first proposed by Barndorff-Nielsen in 1983. Later, Cox and Reid (1987) proposed a related modification based on the conditional likelihood given the MLEs of orthogonalized nuisance parameters. In this paper, however, we consider only the modified profile likelihood estimation scheme. The modified profile likelihood is typically obtained as an approximation to a marginal or conditional likelihood, when either exists. The approach uses a formula that approximates the probability density function of the maximum likelihood estimator conditional on an ancillary statistic (Da Silva, Ferrari, and Cribari-Neto 2008; Islam and Khan 2016). The modified profile likelihood estimate is straightforward to compute when the required quantities are available in closed form. For example, the modified profile likelihood estimate of the variance σ² of a normal distribution can be obtained easily (Barndorff-Nielsen 1983). In more complex settings, closed-form expressions may not exist or may be too difficult to derive; for such cases, Severini (1998) proposed an approximation to the modified profile likelihood function. In the study of lifetime data, the Weibull distribution has played an important role since its introduction. It has been applied successfully to problems arising in both the physical and the biological sciences.
For example, the Weibull distribution has been used to model the durability of manufactured items, such as automobile components, electronic items, and ball bearings, and to study the lifetimes of patients suffering from various diseases. Studies of factors associated with a lifetime of interest often involve regression models with certain distributional assumptions on the lifetime. The Weibull regression model, a popular parametric regression model, is used to identify the relationship between a Weibull lifetime and suspected covariates (Lawless 2011). Let t be a random variable that represents the time to the occurrence of an event of interest, and assume that t has the Weibull distribution with density

$$f(t;\alpha,\beta)=\frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1}\exp\left\{-\left(\frac{t}{\alpha}\right)^{\beta}\right\},\qquad t>0,$$

where α is the scale parameter and β is the shape parameter. Suppose the Weibull regression model is used to model the dependence of the scale parameter on the covariates of interest x = (x1, x2, ..., xp). Then we may write t ∼ W(α(x), β). The log-transformed lifetime y = log t then follows the extreme value distribution, denoted y ∼ EV(η(x), κ), with location parameter η(x) and scale parameter κ. Its density is

$$f(y;\eta(x),\kappa)=\frac{1}{\kappa}\exp\left\{\frac{y-\eta(x)}{\kappa}-\exp\left(\frac{y-\eta(x)}{\kappa}\right)\right\},\qquad -\infty<y<\infty.$$

The parameters of the extreme value distribution are related to those of the Weibull distribution by κ = β⁻¹ and η(x) = log α(x). The corresponding survival and distribution functions are

$$S(y;\eta(x),\kappa)=\exp\left\{-\exp\left(\frac{y-\eta(x)}{\kappa}\right)\right\}\quad\text{and}\quad F(y;\eta(x),\kappa)=1-\exp\left\{-\exp\left(\frac{y-\eta(x)}{\kappa}\right)\right\},$$

respectively. Suppose that η(x) = xψ, where ψ = (ψ1, ψ2, ..., ψp)ᵀ is the vector of regression parameters. The parameters can then be estimated by the method of maximum likelihood (Da Silva, Ferrari, and Cribari-Neto 2008). The main goal of this study is to adjust the profile likelihood function when the parameters of interest are, first, the shape parameter β (or, equivalently, κ) and, second, the regression parameters ψ when the covariates in the Weibull regression model are collinear. Correlated covariates arise in many areas of biostatistics, including microarray experiments, genetics, and medical statistics. Collinearity can make estimation and inference more difficult; for instance, the model as a whole can be statistically significant although the individual regression coefficients are not (Chimka and Wang 2009). Several remedial measures have been used to address collinearity among covariates in both complete and incomplete data sets. When the sample size is very small, the typical remedy is to delete certain variables from the model. In high-dimensional settings, and even in some low-dimensional ones, penalized regression techniques become necessary, e.g., the lasso (Tibshirani 1996) and the elastic net (Zou and Hastie 2005) for complete data, and the methods of Khan and Shaw (2016a, 2016b, 2017) for incomplete data. The paper is organized as follows. Section 2 presents the basic formulas and features of the profile and modified profile likelihood estimation techniques, together with their adaptation to Weibull regression models. In Section 3, several simulation studies are conducted to compare the performance of the profile and the modified profile likelihood estimates.
Section 4 applies the profile and modified profile likelihood methods to a real leukemia data set. The final section presents a discussion and a summary of the results.
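The Weibull and extreme value relationship described in Section 1 can be checked numerically. The sketch below is illustrative only: it assumes NumPy and arbitrary values α = 3 and β = 2 (not taken from the paper), and confirms on a grid that the change of variables y = log t maps the Weibull density and survival function onto their extreme value counterparts with η = log α and κ = 1/β.

```python
import numpy as np

def weibull_pdf(t, alpha, beta):
    """Weibull density with scale alpha and shape beta (t > 0)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1) * np.exp(-(t / alpha) ** beta)

def ev_pdf(y, eta, kappa):
    """Extreme value density of y = log t with location eta and scale kappa."""
    w = (y - eta) / kappa
    return np.exp(w - np.exp(w)) / kappa

def ev_survival(y, eta, kappa):
    """Extreme value survival function S(y) = exp{-exp((y - eta)/kappa)}."""
    return np.exp(-np.exp((y - eta) / kappa))

# Illustrative (hypothetical) Weibull parameters and the implied EV parameters.
alpha, beta = 3.0, 2.0
eta, kappa = np.log(alpha), 1.0 / beta      # eta = log(alpha), kappa = 1/beta

t = np.linspace(0.1, 10.0, 50)
y = np.log(t)

# Change of variables y = log t: f_Y(y) = f_T(e^y) * e^y = f_T(t) * t.
pdf_check = np.allclose(weibull_pdf(t, alpha, beta) * t, ev_pdf(y, eta, kappa))
# The Weibull survival function exp{-(t/alpha)^beta} equals S(log t).
surv_check = np.allclose(np.exp(-(t / alpha) ** beta), ev_survival(y, eta, kappa))
print(pdf_check, surv_check)   # True True
```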

2. Methods

2.1. Modified profile likelihood function

Suppose y = (y1, y2, ..., yn) is a vector of n observations from a density f(·; θ), where θ is partitioned as θ = (σ, ξ). Let σ be the parameter of interest and let ξ = (ξ1, ξ2, ..., ξp) be a set of p nuisance parameters. Further, let ξ̂σ = (ξ̂1σ, ξ̂2σ, ..., ξ̂pσ) be the maximum likelihood estimate of ξ for a given value of σ. Then the profile likelihood function of σ is defined by

$$L_p(\sigma)=L(\sigma,\hat\xi_\sigma).$$

The modified profile likelihood function L_mp(σ) for the parameter of interest σ in the presence of the nuisance parameter ξ is defined by

$$L_{mp}(\sigma)=M(\sigma)\,L_p(\sigma),\tag{1}$$

where M is a modifying factor given by

$$M(\sigma)=\left|\frac{\partial\hat\xi}{\partial\hat\xi_\sigma}\right|\,\left|\hat j_\sigma\right|^{-1/2}.$$

Here ∂ξ̂/∂ξ̂σ is the (p × p) matrix of partial derivatives, and ĵσ = jξξ(σ, ξ̂σ) is the (p × p) observed information matrix for ξ when σ is assumed known (Young and Smith 2005). The notation |·| denotes the absolute value of a matrix determinant.
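As a minimal illustration of Equation (1), consider the normal example mentioned in the Introduction, taking the variance σ² as the parameter of interest and the mean μ as the nuisance parameter ξ. Because μ̂σ = ȳ does not depend on σ², the derivative ∂μ̂/∂μ̂σ equals 1 and ĵσ = n/σ², so M(σ²) is proportional to σ and the modified profile likelihood is maximized at S/(n − 1) instead of S/n, where S = Σ(yi − ȳ)². The sketch below assumes NumPy and SciPy and uses a simulated sample; it simply confirms both maximizers numerically.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=10)   # illustrative sample, n = 10
n = y.size
S = np.sum((y - y.mean()) ** 2)               # residual sum of squares

def log_lp(s2):
    """Profile log-likelihood of sigma^2 (mu replaced by ybar), constants dropped."""
    return -0.5 * n * np.log(s2) - S / (2.0 * s2)

def log_lmp(s2):
    """Modified profile log-likelihood: log L_p + log M, where
    M = |d mu_hat / d mu_hat_s2| * |j_mumu|^(-1/2) = 1 * (n / s2)^(-1/2)."""
    return log_lp(s2) - 0.5 * np.log(n / s2)

def argmax(fun):
    """Maximize a scalar log-likelihood over sigma^2 on a wide bounded interval."""
    res = minimize_scalar(lambda s2: -fun(s2), bounds=(1e-8, 10.0 * S),
                          method="bounded", options={"xatol": 1e-12})
    return res.x

s2_p, s2_mp = argmax(log_lp), argmax(log_lmp)
print(s2_p, S / n)          # profile estimate equals S/n
print(s2_mp, S / (n - 1))   # modified profile estimate equals S/(n - 1)
```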

2.2. Approximation to modified profile likelihood function

To obtain the modified profile likelihood function, the (p × p) matrix of partial derivatives ∂ξ̂/∂ξ̂σ must be derived. In many cases, this partial derivative does not have a closed form. An alternative expression for L_mp(σ) that does not involve ∂ξ̂/∂ξ̂σ is available; however, it involves a sample-space derivative of the log-likelihood function and the specification of an ancillary statistic a such that (σ̂, ξ̂, a) is a minimal sufficient statistic (Severini 2001). The alternative expression for ∂ξ̂/∂ξ̂σ is

$$\frac{\partial\hat\xi}{\partial\hat\xi_\sigma}=j_{\xi\xi}(\hat\xi_\sigma,\sigma;\hat\xi,\hat\sigma,a)\,\ell_{\xi;\hat\xi}(\hat\xi_\sigma,\sigma;\hat\xi,\hat\sigma,a)^{-1},\tag{2}$$

where

$$\ell_{\xi;\hat\xi}(\hat\xi_\sigma,\sigma;\hat\xi,\hat\sigma,a)=\frac{\partial}{\partial\hat\xi}\,\frac{\partial\ell(\hat\xi_\sigma,\sigma;\hat\xi,\hat\sigma,a)}{\partial\xi}.\tag{3}$$

Here ℓ(ξ̂σ, σ; ξ̂, σ̂, a) and jξξ(ξ̂σ, σ; ξ̂, σ̂, a) are the profile log-likelihood function and the observed information for ξ, respectively. They depend on the data only through the minimal sufficient statistic (Da Silva, Ferrari, and Cribari-Neto 2008). An alternative to Equation (3), obtained through an approximately ancillary statistic, gives the (s, t)th element of the matrix, for s, t = 1, 2, ..., p, as

$$\ell_{\xi_s;\hat\xi_t}(\hat\xi_\sigma,\sigma;\hat\xi,\hat\sigma,a)=\ell_{\xi_s;y}(\hat\xi_\sigma,\sigma)\,\frac{\partial y}{\partial\hat\xi_t}=\ell_{\xi_s;y}(\hat\xi_\sigma,\sigma)\,v_{\hat\xi_t}=\ell_{\xi_s;y}(\xi,\sigma)\,v_{\hat\xi_t}\Big|_{\xi=\hat\xi_\sigma},\tag{4}$$

where the (1 × n) vector

$$\ell_{\xi_s;y}(\xi,\sigma)=\left(\frac{\partial}{\partial y_1}\ell_{\xi_s}(\xi,\sigma;y_1),\ \frac{\partial}{\partial y_2}\ell_{\xi_s}(\xi,\sigma;y_2),\ \ldots,\ \frac{\partial}{\partial y_n}\ell_{\xi_s}(\xi,\sigma;y_n)\right),\qquad \ell_{\xi_s}(\xi,\sigma;y_j)=\frac{\partial}{\partial\xi_s}\ell(\xi,\sigma;y_j),$$

and the (n × 1) vector

$$v_{\hat\xi_t}=\left(-\frac{\partial F(y_1;\xi,\sigma)/\partial\xi_t}{f(y_1;\xi,\sigma)},\ -\frac{\partial F(y_2;\xi,\sigma)/\partial\xi_t}{f(y_2;\xi,\sigma)},\ \ldots,\ -\frac{\partial F(y_n;\xi,\sigma)/\partial\xi_t}{f(y_n;\xi,\sigma)}\right)^{\top}\Bigg|_{\xi=\hat\xi_\sigma}.$$

Here ℓ(ξ, σ; yj) is the log-likelihood contribution, f(yj; ξ, σ) the density, and F(yj; ξ, σ) the distribution function of the jth observation. Hence, using Equations (1) and (2), the modified profile likelihood function becomes

$$L_{mp}(\sigma)=L_p(\sigma)\,\left|j_{\xi\xi}(\hat\xi_\sigma,\sigma)\right|^{1/2}\,\left|\ell_{\xi;\hat\xi}(\hat\xi_\sigma,\sigma)\right|^{-1}.$$

2.3. Modified profile likelihood of Weibull shape parameter β

Da Silva, Ferrari, and Cribari-Neto (2008) presented Barndorff-Nielsen's modified profile likelihood for the Weibull shape parameter. Suppose y1, ..., yn are independent random variables drawn from the extreme value distribution, yj ∼ EV(η(xj), κ), j = 1, 2, ..., n. Let C and C̄ denote the sets of censored and uncensored observations, respectively. Suppose η(xj) = xjψ and δj = I(Tj ≤ Cj), where Tj and Cj are the failure and censoring times, respectively. Let r = Σ_{j=1}^{n} δj be the observed number of failures and xj = (xj1, xj2, ..., xjp), where xj1 = 1 for all j. The resulting profile log-likelihood for κ is

$$\ell_p(\kappa)=-r\log\kappa+\sum_{j\in\bar C}\frac{y_j-x_j\hat\psi_\kappa}{\kappa}-\sum_{j=1}^{n}\exp\left(\frac{y_j-x_j\hat\psi_\kappa}{\kappa}\right),\tag{5}$$

where ψ̂κ is the MLE of ψ for fixed κ.

In matrix notation, let V̂ψ = (v1ᵀ, ..., vnᵀ)ᵀ, where vj = xj if j ∈ C̄ and vj = 0_{1×p} if j ∈ C, with 0 denoting the zero matrix. For uncensored data, V̂ψ = X, where X is the (n × p) covariate matrix. Z is a diagonal matrix whose (j, j)th element is exp(zj), j = 1, 2, ..., n, with zj = (yj − xjψ̂κ)/κ. According to Da Silva, Ferrari, and Cribari-Neto (2008), the modified version ℓmp(κ) of the profile log-likelihood ℓp(κ) is

$$\ell_{mp}(\kappa)=\ell_p(\kappa)+p\log\kappa+\frac{1}{2}\log\left|X^{\top}ZX\right|-\log\left|X^{\top}Z\hat V_\psi\right|.\tag{6}$$

2.4. Modified profile likelihood of Weibull regression parameter ψ

We focus on the regression parameter ψk of the Weibull regression model, k = 1, ..., p. We model the location parameter of the extreme value distribution (the logarithm of the Weibull scale) as η(x) = ψ1x1 + ψ2x2 + ... + ψpxp. Let zj = (yj − η(xj))/κ, so that the resulting profile log-likelihood for ψk is

$$\ell_p(\psi_k)=-r\log\kappa+\sum_{j\in\bar C}z_j-\sum_{j=1}^{n}\exp(z_j),$$

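where κ is replaced by its MLE κ̂ and the remaining regression parameters ψs (s ≠ k) by their MLEs for the given ψk. The first derivatives with respect to ψs are

$$\begin{aligned}\frac{\partial\ell(\psi,\kappa)}{\partial\psi_s}&=-\sum_{j\in\bar C}\frac{x_{js}}{\kappa}-\sum_{j=1}^{n}\exp(z_j)\left(-\frac{x_{js}}{\kappa}\right)=\sum_{j=1}^{n}\frac{x_{js}}{\kappa}\exp(z_j)-\sum_{j\in\bar C}\frac{x_{js}}{\kappa}\\&=\frac{1}{\kappa}\left[\sum_{j=1}^{n}x_{js}\exp(z_j)-\sum_{j\in\bar C}x_{js}\right]\end{aligned}\tag{7}$$

and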

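$$\begin{aligned}\frac{\partial^2\ell(\psi,\kappa)}{\partial\psi_s\,\partial\psi_t}&=\frac{\partial}{\partial\psi_t}\,\frac{1}{\kappa}\left[\sum_{j=1}^{n}x_{js}\exp(z_j)-\sum_{j\in\bar C}x_{js}\right]=\frac{1}{\kappa}\sum_{j=1}^{n}x_{js}\exp(z_j)\left(-\frac{x_{jt}}{\kappa}\right)\\&=-\frac{1}{\kappa^2}\sum_{j=1}^{n}x_{js}x_{jt}\exp(z_j).\end{aligned}\tag{8}$$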

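The observed information on σ = (ψ1, ..., ψk−1, ψk+1, ..., ψp, κ) is obtained using Equation (7) and

$$\begin{aligned}\frac{\partial^2\ell(\psi,\kappa)}{\partial\psi_s\,\partial\kappa}&=\frac{\partial}{\partial\kappa}\left\{\frac{1}{\kappa}\left[\sum_{j=1}^{n}x_{js}\exp(z_j)-\sum_{j\in\bar C}x_{js}\right]\right\}\\&=-\frac{1}{\kappa^2}\left[\sum_{j=1}^{n}x_{js}\exp(z_j)-\sum_{j\in\bar C}x_{js}\right]+\frac{1}{\kappa}\sum_{j=1}^{n}x_{js}\exp(z_j)\left(-\frac{y_j-x_j\psi}{\kappa^2}\right)\\&=-\frac{1}{\kappa^2}\sum_{j=1}^{n}x_{js}\exp(z_j)z_j-\frac{1}{\kappa^2}\left[\sum_{j=1}^{n}x_{js}\exp(z_j)-\sum_{j\in\bar C}x_{js}\right]\\&=-\frac{1}{\kappa^2}\left[\sum_{j=1}^{n}x_{js}\exp(z_j)(z_j+1)-\sum_{j\in\bar C}x_{js}\right],\end{aligned}\tag{9}$$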

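$$\begin{aligned}\frac{\partial^2\ell(\psi,\kappa)}{\partial\kappa^2}&=\frac{r}{\kappa^2}+2\sum_{j\in\bar C}\frac{y_j-x_j\psi}{\kappa^3}+\sum_{j=1}^{n}\left[\exp(z_j)\left(-\frac{y_j-x_j\psi}{\kappa^2}\right)\frac{y_j-x_j\psi}{\kappa^2}-2\exp(z_j)\frac{y_j-x_j\psi}{\kappa^3}\right]\\&=-\left[\sum_{j=1}^{n}2\exp(z_j)\frac{y_j-x_j\psi}{\kappa^3}+\exp(z_j)\left(\frac{y_j-x_j\psi}{\kappa^2}\right)^{2}\right]-\left[-\frac{r}{\kappa^2}-2\sum_{j\in\bar C}\frac{y_j-x_j\psi}{\kappa^3}\right].\end{aligned}\tag{10}$$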
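The derivatives in Equations (7) to (10) can be checked against central finite differences. The following is a hypothetical verification sketch, not part of the original analysis: it assumes NumPy, simulated data with an intercept and two uniform covariates, randomly assigned censoring indicators δj, and arbitrary values ψ = (1, 0.5, −0.5) and κ = 0.5.

```python
import numpy as np

def loglik(psi, kappa, y, X, delta):
    """Censored EV log-likelihood: -r log(kappa) + sum_{uncensored} z_j - sum_j exp(z_j)."""
    z = (y - X @ psi) / kappa
    return -delta.sum() * np.log(kappa) + np.sum(delta * z) - np.sum(np.exp(z))

def score(psi, kappa, y, X, delta):
    """Analytic first derivatives: Eq. (7) for psi, plus the kappa-derivative."""
    res = y - X @ psi
    ez = np.exp(res / kappa)
    g_psi = X.T @ (ez - delta) / kappa                                   # Eq. (7)
    g_kappa = (-delta.sum() / kappa - np.sum(delta * res) / kappa**2
               + np.sum(ez * res) / kappa**2)
    return np.append(g_psi, g_kappa)

def hessian(psi, kappa, y, X, delta):
    """Analytic second derivatives assembled from Eqs. (8), (9) and (10)."""
    res = y - X @ psi
    z = res / kappa
    ez = np.exp(z)
    p = X.shape[1]
    H = np.empty((p + 1, p + 1))
    H[:p, :p] = -(X.T * ez) @ X / kappa**2                               # Eq. (8)
    H[:p, p] = H[p, :p] = -X.T @ (ez * (z + 1) - delta) / kappa**2       # Eq. (9)
    H[p, p] = (delta.sum() / kappa**2 + 2 * np.sum(delta * res) / kappa**3
               - np.sum(ez * (2 * res / kappa**3 + res**2 / kappa**4)))  # Eq. (10)
    return H

# Illustrative data: intercept plus two uniform covariates, EV errors, random censoring flags.
rng = np.random.default_rng(2018)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.uniform(size=(n, p - 1))])
psi, kappa = np.array([1.0, 0.5, -0.5]), 0.5
y = X @ psi + kappa * np.log(rng.exponential(size=n))   # log of Exp(1) is standard EV
delta = (rng.uniform(size=n) < 0.75).astype(float)

theta, h, E = np.append(psi, kappa), 1e-5, np.eye(p + 1)
f = lambda th: loglik(th[:-1], th[-1], y, X, delta)
g = lambda th: score(th[:-1], th[-1], y, X, delta)
num_grad = np.array([(f(theta + h * e) - f(theta - h * e)) / (2 * h) for e in E])
num_hess = np.column_stack([(g(theta + h * e) - g(theta - h * e)) / (2 * h) for e in E])

grad_ok = np.allclose(num_grad, g(theta), rtol=1e-6, atol=1e-6)
hess_ok = np.allclose(num_hess, hessian(psi, kappa, y, X, delta), rtol=1e-5, atol=1e-5)
print(grad_ok, hess_ok)   # True True
```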


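Here, ĵσ can be obtained easily using Equations (8), (9), and (10). The approximated component ℓ_{σ;σ̂}(ψk) is obtained as follows. First,

$$\ell_\kappa(\psi,\kappa)=\frac{\partial\ell(\psi,\kappa)}{\partial\kappa}=-\frac{r}{\kappa}-\sum_{j\in\bar C}\frac{y_j-x_j\psi}{\kappa^2}+\sum_{j=1}^{n}\exp(z_j)\frac{y_j-x_j\psi}{\kappa^2}=\sum_{j=1}^{n}\left\{-\frac{\delta_j}{\kappa}-\delta_j\frac{y_j-x_j\psi}{\kappa^2}+\exp(z_j)\frac{y_j-x_j\psi}{\kappa^2}\right\}.$$

So, for the jth observation,

$$\ell_\kappa(\psi,\kappa;y_j)=-\frac{\delta_j}{\kappa}-\delta_j\frac{y_j-x_j\psi}{\kappa^2}+\exp(z_j)\frac{y_j-x_j\psi}{\kappa^2}$$

and

$$\frac{\partial}{\partial y_j}\ell_\kappa(\psi,\kappa;y_j)=-\frac{\delta_j}{\kappa^2}+\frac{1}{\kappa}\exp(z_j)\frac{z_j}{\kappa}+\frac{1}{\kappa^2}\exp(z_j)=-\frac{\delta_j}{\kappa^2}+\frac{1}{\kappa^2}\exp(z_j)(z_j+1).$$

For an uncensored observation, the definition of v_{ξ̂t} in Section 2.2 gives ∂yj/∂κ̂ = −[∂F(yj; ψ, κ)/∂κ]/f(yj; ψ, κ), evaluated at κ = κ̂, where

$$\frac{\partial F(y_j;\psi,\kappa)/\partial\kappa}{f(y_j;\psi,\kappa)}=\frac{\frac{\partial}{\partial\kappa}\left[1-\exp\{-\exp(z_j)\}\right]}{f(y_j;\psi,\kappa)}=\frac{-\exp\{-\exp(z_j)\}\{-\exp(z_j)\}\left(-\frac{y_j-x_j\psi}{\kappa^2}\right)}{f(y_j;\psi,\kappa)}=\frac{-\exp\{-\exp(z_j)\}\exp(z_j)\frac{z_j}{\kappa}}{\frac{1}{\kappa}\exp(z_j)\exp\{-\exp(z_j)\}}=-z_j.\tag{11}$$

Similarly, for the regression parameters,

$$\frac{\partial}{\partial y_j}\ell_{\psi_s}(\psi,\kappa;y_j)=\frac{\partial}{\partial y_j}\,\frac{1}{\kappa}\left\{x_{js}\exp(z_j)-\delta_jx_{js}\right\}=\frac{1}{\kappa}x_{js}\exp(z_j)\frac{1}{\kappa}=\frac{1}{\kappa^2}x_{js}\exp(z_j),\tag{12}$$

and ∂yj/∂ψ̂t = −[∂F(yj; ψ, κ)/∂ψt]/f(yj; ψ, κ), evaluated at ψ = ψ̂, where

$$\frac{\partial F(y_j;\psi,\kappa)/\partial\psi_t}{f(y_j;\psi,\kappa)}=\frac{\frac{\partial}{\partial\psi_t}\left[1-\exp\left\{-\exp\left(\frac{y_j-x_j\psi}{\kappa}\right)\right\}\right]}{f(y_j;\psi,\kappa)}=\frac{-\frac{x_{jt}}{\kappa}\exp\{-\exp(z_j)\}\exp(z_j)}{\frac{1}{\kappa}\exp(z_j)\exp\{-\exp(z_j)\}}=-x_{jt}.\tag{13}$$

From Equations (11) and (13), we get ∂yj/∂κ̂ = zj and ∂yj/∂ψ̂t = xjt for j ∈ C̄, and 0 for j ∈ C. At this stage, all the components can be evaluated to obtain the modified profile log-likelihood function

$$\ell_{mp}(\psi_k)=\ell_p(\psi_k)+\frac{1}{2}\log\left|\hat j_\sigma\right|-\log\left|\ell_{\sigma;\hat\sigma}(\psi_k)\right|.\tag{14}$$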
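Equations (5) and (6) are straightforward to evaluate numerically. The following is a minimal sketch, assuming NumPy and SciPy, and it is not the authors' R code: ψ̂κ is computed by Newton's method using Equations (7) and (8) with step halving, ℓp(κ) and ℓmp(κ) are evaluated, and each is maximized over κ. V̂ψ is formed with rows xj for uncensored and zero rows for censored observations, as in Section 2.3. The simulated data (n = 20, p = 4, κ = 0.5, all ψs = 1, log-censoring times shifted by 0.55 so that roughly 25% of observations are censored) and the search interval for κ are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def loglik(psi, kappa, y, X, delta):
    """Censored EV log-likelihood for (psi, kappa)."""
    z = (y - X @ psi) / kappa
    return -delta.sum() * np.log(kappa) + np.sum(delta * z) - np.sum(np.exp(z))

def psi_hat(kappa, y, X, delta, tol=1e-10, max_iter=200):
    """MLE of psi for fixed kappa: Newton steps (Eqs. 7-8) with step halving."""
    psi = np.linalg.lstsq(X, y, rcond=None)[0]           # least-squares start
    for _ in range(max_iter):
        ez = np.exp((y - X @ psi) / kappa)
        grad = X.T @ (ez - delta) / kappa                 # Eq. (7)
        info = (X.T * ez) @ X / kappa**2                  # minus Eq. (8)
        step = np.linalg.solve(info, grad)
        t, f0 = 1.0, loglik(psi, kappa, y, X, delta)
        while loglik(psi + t * step, kappa, y, X, delta) < f0 and t > 1e-10:
            t /= 2.0
        psi = psi + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return psi

def lp_lmp(kappa, y, X, delta):
    """Return (l_p(kappa), l_mp(kappa)) following Eqs. (5) and (6)."""
    psi = psi_hat(kappa, y, X, delta)
    lp = loglik(psi, kappa, y, X, delta)                  # Eq. (5)
    ez = np.exp((y - X @ psi) / kappa)                    # diagonal of Z
    V = X * delta[:, None]                                # rows x_j if uncensored, 0 if censored
    _, logdet_xzx = np.linalg.slogdet((X.T * ez) @ X)
    _, logdet_xzv = np.linalg.slogdet((X.T * ez) @ V)
    lmp = lp + X.shape[1] * np.log(kappa) + 0.5 * logdet_xzx - logdet_xzv   # Eq. (6)
    return lp, lmp

# Illustrative simulated data: n = 20, p = 4, beta = 2 (kappa = 0.5), about 25% censoring.
rng = np.random.default_rng(7)
n, p, kappa0 = 20, 4, 0.5
X = np.column_stack([np.ones(n), rng.uniform(size=(n, p - 1))])
psi0 = np.ones(p)                                         # hypothetical true coefficients
log_T = X @ psi0 + kappa0 * np.log(rng.exponential(size=n))
log_C = X @ psi0 + 0.55 + kappa0 * np.log(rng.exponential(size=n))
y = np.minimum(log_T, log_C)
delta = (log_T <= log_C).astype(float)

bounds = (0.1, 5.0)
k_p = minimize_scalar(lambda k: -lp_lmp(k, y, X, delta)[0], bounds=bounds, method="bounded").x
k_mp = minimize_scalar(lambda k: -lp_lmp(k, y, X, delta)[1], bounds=bounds, method="bounded").x
print(f"beta_p = {1 / k_p:.3f}, beta_mp = {1 / k_mp:.3f}")
```

For uncensored data V̂ψ = X, so Equation (6) reduces to ℓp(κ) + p log κ − ½ log|XᵀZX|, which gives a simple consistency check on any implementation.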

3. Monte-Carlo simulation

In this section, we conduct Monte Carlo simulation studies to compare the performance of the profile and modified profile likelihood estimators. All simulations are performed in the statistical software R and are based on 1000 runs. We consider randomly right-censored observations with a censoring rate of 25%; to achieve this, lifetimes and censoring times are generated from two different distributions chosen so that the 25% censoring rate is maintained. The two estimators are evaluated using several summary statistics: mean, variance, bias, mean squared error (MSE), and relative bias (RB). The RB expresses the bias as a percentage of the true parameter value, RB = 100 × bias/(true value). In addition, we construct the following 95% confidence intervals for a parameter θ:

$$\mathrm{CI}_{p}=\hat\theta_{p}\pm 1.96\times\mathrm{SE}(\hat\theta_{p}),\qquad \mathrm{CI}_{mp}=\hat\theta_{mp}\pm 1.96\times\mathrm{SE}(\hat\theta_{mp}),$$

where θ̂p and θ̂mp are the profile and modified profile likelihood estimates, respectively, and SE(θ̂p) and SE(θ̂mp) are the bootstrap standard errors of the corresponding estimators. For each type of confidence interval, the coverage probability is estimated as the percentage of intervals that contain the true value of the parameter.

3.1. Estimation of shape parameter β of Weibull regression

Recall that the density of y is the extreme value density

$$f(y;\eta(x),\kappa)=\frac{1}{\kappa}\exp\left\{\frac{y-\eta(x)}{\kappa}-\exp\left(\frac{y-\eta(x)}{\kappa}\right)\right\},\qquad -\infty<y<\infty.$$

We apply the profile and modified profile likelihood estimation techniques discussed in Section 2. We consider three models and three sample sizes, n = 20, 50, and 75. For Model-1, Model-2, and Model-3, the number of nuisance parameters is fixed at 2, 4, and 8, respectively. The models are defined as

$$\text{Model-1}: y\sim\mathrm{EV}[\eta_1(x),\kappa],\qquad \text{Model-2}: y\sim\mathrm{EV}[\eta_2(x),\kappa],\qquad \text{Model-3}: y\sim\mathrm{EV}[\eta_3(x),\kappa],$$

where

$$\eta_1(x)=x_1\psi_1+x_2\psi_2,\qquad \eta_2(x)=x_1\psi_1+x_2\psi_2+x_3\psi_3+x_4\psi_4,\qquad \eta_3(x)=x_1\psi_1+x_2\psi_2+\cdots+x_8\psi_8.$$

We set β = 2, which gives κ = β⁻¹ = 0.5 (Section 1), and x1 = 1 in all three models. The remaining covariate values are generated from the standard uniform distribution. The simulation results are presented in Table 1. The modified estimators yield average estimates closer to the true value, as reflected in their smaller absolute bias. In general, the MSE and variance of the estimators decrease with increasing sample size, as expected, since both estimators are consistent. The modified estimators perform best with respect to all the statistics considered, under all parameter settings. In particular, as the number of nuisance parameters increases, the modified estimators outperform the profile estimators substantially, which provides even stronger evidence in favor of the accuracy of the modified estimators. The modified estimators consistently have the smaller absolute relative bias, and the bias of both estimators becomes negative as the number of nuisance parameters increases. The coverage probabilities, estimated as the percentage of confidence intervals containing the true parameter value, are presented in the last two columns of Table 1; they show better coverage under the modified profile likelihood scheme. Hence, the modified estimators generally perform better than the profile likelihood estimators when the sample size is small and as the number of nuisance parameters increases.

3.2. Estimation of regression parameter ψ of Weibull regression

We are interested in how the modified profile likelihood estimates differ from the profile likelihood estimates when collinearity is present among the covariates.
Table 1. Simulation results of shape parameter β of Weibull regression model.

| Sample size | Nuisance parameters | Mean (p) | Mean (mp) | Variance (p) | Variance (mp) | Bias (p) | Bias (mp) | MSE (p) | MSE (mp) | RB % (p) | RB % (mp) | % of CI (p) | % of CI (mp) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 2 | 2.238 | 2.065 | 0.281 | 0.218 | 0.238 | 0.065 | 0.338 | 0.223 | 11.919 | 3.251 | 91.90 | 94.40 |
| 20 | 4 | 1.669 | 1.829 | 0.925 | 0.371 | −0.331 | −0.171 | 1.034 | 0.400 | −16.552 | −8.532 | 97.50 | 93.60 |
| 50 | 2 | 2.104 | 2.036 | 0.080 | 0.075 | 0.104 | 0.036 | 0.091 | 0.076 | 5.218 | 1.782 | 93.10 | 94.60 |
| 50 | 4 | 1.455 | 1.526 | 0.519 | 0.356 | −0.545 | −0.474 | 0.816 | 0.580 | −27.246 | −23.689 | 78.10 | 81.80 |
| 50 | 8 | 1.584 | 1.694 | 0.189 | 0.101 | −0.416 | −0.306 | 0.362 | 0.195 | −20.792 | −15.323 | 82.70 | 84.90 |
| 75 | 2 | 2.050 | 2.005 | 0.052 | 0.050 | 0.050 | 0.005 | 0.054 | 0.050 | 2.490 | 0.261 | 93.60 | 94.60 |
| 75 | 4 | 1.732 | 1.812 | 0.382 | 0.178 | −0.268 | −0.188 | 0.454 | 0.214 | −13.419 | −9.407 | 90.10 | 91.30 |
| 75 | 8 | 1.462 | 1.537 | 0.117 | 0.101 | −0.538 | −0.463 | 0.407 | 0.315 | −26.900 | −23.163 | 60.60 | 68.00 |

Here p and mp stand for profile and modified profile likelihood technique, respectively.

We consider three different models, defined as

$$\text{Model-1}: y\sim\mathrm{EV}[\eta_1(x),\kappa],\qquad \text{Model-2}: y\sim\mathrm{EV}[\eta_2(x),\kappa],\qquad \text{Model-3}: y\sim\mathrm{EV}[\eta_3(x),\kappa],$$

where ηi(x) = ψ0 + xi1ψ1 + xi2ψ2 for i = 1, 2, 3. We consider three sample sizes, n = 15, 20, and 40, with 1000 simulation runs. We set ψ2 = 1, which is the parameter of interest. The nuisance parameters are set to β = 2 and ψ0 = ψ1 = 1. The covariates for Models 1, 2, and 3 are generated from a bivariate normal distribution with mean μ = (2, 6) and variance-covariance matrices

$$\Sigma_1=\begin{pmatrix}1.2&0\\0&4.3\end{pmatrix},\qquad \Sigma_2=\begin{pmatrix}1.2&0.92\\0.92&4.3\end{pmatrix},\qquad \Sigma_3=\begin{pmatrix}1.2&1.9\\1.9&4.3\end{pmatrix},$$

respectively. These variance-covariance matrices lead to the correlation matrices

$$\rho_1=\begin{pmatrix}1&0\\0&1\end{pmatrix},\qquad \rho_2=\begin{pmatrix}1&0.4\\0.4&1\end{pmatrix},\qquad \rho_3=\begin{pmatrix}1&0.84\\0.84&1\end{pmatrix}.$$

In Table 2, we present the empirical mean, variance, bias, MSE, and relative bias of the estimators, as well as the percentage of 95% confidence intervals containing the true value of the parameter of interest. Table 2 shows that if the covariates are uncorrelated, the performance of the profile and modified profile likelihood estimators does not differ much. In the presence of collinearity, the modified profile likelihood estimate of the regression parameter ψ2 differs from the true value of one, whereas the profile likelihood estimate does not differ much. The results indicate that, in the presence of collinearity among covariates, the parameter estimates are more accurate if the modified version of the profile likelihood is used. The smaller p-values corresponding to the modified profile likelihood estimates provide enough evidence to support the claim that ψ2 differs from 0 when the null hypothesis is actually false. In contrast, ordinary profile likelihood inference fails to provide enough support for this claim. As in the previous example, the modified estimators provide more accurate and precise estimates than the estimators based on the profile likelihood, and they also yield confidence intervals with higher coverage probabilities.

Table 2. Simulation results of Weibull regression parameter ψ2.

| Covariance matrix | Sample size | Mean (p) | Mean (mp) | Variance (p) | Variance (mp) | Bias (p) | Bias (mp) | MSE (p) | MSE (mp) | RB % (p) | RB % (mp) | % of CI (p) | % of CI (mp) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Σ1 | 15 | 0.9999 | 0.9998 | 0.0093 | 0.0096 | −0.0001 | −0.0002 | 0.0094 | 0.0096 | −0.0044 | −0.0166 | 95.20 | 95.20 |
| Σ1 | 20 | 1.0008 | 1.0005 | 0.0058 | 0.0058 | 0.0008 | 0.0005 | 0.0058 | 0.0058 | 0.0838 | 0.0580 | 94.60 | 94.30 |
| Σ1 | 40 | 0.9981 | 0.9983 | 0.0022 | 0.0023 | −0.0019 | −0.0017 | 0.0022 | 0.0023 | −0.1899 | −0.1680 | 94.80 | 95.10 |
| Σ2 | 15 | 0.9993 | 0.9997 | 0.0120 | 0.0117 | −0.0007 | −0.0003 | 0.0120 | 0.0117 | −0.0705 | −0.0318 | 95.20 | 95.40 |
| Σ2 | 20 | 1.0017 | 1.0006 | 0.0068 | 0.0068 | 0.0017 | 0.0006 | 0.0068 | 0.0068 | 0.1744 | 0.0620 | 95.10 | 94.50 |
| Σ2 | 40 | 0.9990 | 0.9989 | 0.0028 | 0.0028 | −0.0010 | −0.0011 | 0.0028 | 0.0028 | −0.0985 | −0.1124 | 95.00 | 94.70 |
| Σ3 | 15 | 1.0168 | 1.0188 | 0.0352 | 0.0352 | 0.0168 | 0.0188 | 0.0355 | 0.0355 | 1.6840 | 1.8830 | 94.00 | 94.40 |
| Σ3 | 20 | 0.9943 | 0.9944 | 0.0211 | 0.0207 | −0.0057 | −0.0056 | 0.0211 | 0.0207 | −0.5750 | −0.5579 | 94.60 | 94.90 |
| Σ3 | 40 | 1.0023 | 1.0023 | 0.0074 | 0.0073 | 0.0023 | 0.0023 | 0.0074 | 0.0073 | 0.2233 | 0.2277 | 94.70 | 94.90 |

Here p and mp stand for profile and modified profile likelihood techniques, respectively.
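The summary measures reported in Tables 1 and 2 can be computed from the simulated estimates as sketched below, assuming NumPy. The sketch takes the standard errors as given (in the paper they are bootstrap standard errors), uses the divisor R for the variance so that MSE = variance + bias², and uses made-up numbers in the worked example. It also converts the covariance matrices of Section 3.2 into correlations, which come out at about 0.405 and 0.836 (reported as 0.4 and 0.84).

```python
import numpy as np

def summarize(estimates, std_errors, true_value, z=1.96):
    """Monte Carlo summaries (as in Tables 1 and 2) for one estimator over R replicates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    mean = est.mean()
    variance = est.var()                        # divisor R, so MSE = variance + bias**2
    bias = mean - true_value
    mse = np.mean((est - true_value) ** 2)
    rb = 100.0 * bias / true_value              # relative bias in percent
    covered = (est - z * se <= true_value) & (true_value <= est + z * se)
    return {"mean": mean, "variance": variance, "bias": bias,
            "mse": mse, "rb": rb, "coverage": 100.0 * covered.mean()}

def cov_to_corr(sigma):
    """Correlation matrix implied by a covariance matrix."""
    sd = np.sqrt(np.diag(sigma))
    return sigma / np.outer(sd, sd)

# Tiny worked example with three replicates (illustrative numbers, not from the paper).
print(summarize([1.9, 2.1, 2.3], [0.1, 0.1, 0.1], true_value=2.0))

# Covariance matrices of Section 3.2 and the implied correlation matrices.
for s12 in (0.0, 0.92, 1.9):
    print(np.round(cov_to_corr(np.array([[1.2, s12], [s12, 4.3]])), 3))
```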


4. Application to leukemia survival data


In a study of treatments for acute leukemia, Klein and Moeschberger (2005) provided data on 137 bone marrow transplant patients. Bone marrow transplantation is considered a standard treatment for acute leukemia. Several potential risk factors were measured, and the patients were grouped into three risk categories based on their status at the time of transplantation: acute lymphoblastic leukemia (38 patients), acute myelocytic leukemia (AML) low-risk first remission (54 patients), and AML high-risk second remission or untreated first relapse (45 patients). In our study, we consider only the AML low-risk first remission patients. Acute myelocytic leukemia, also known as acute myeloid leukemia, is a cancer of the myeloid line of blood cells and the bone marrow. The disease progresses rapidly and affects approximately 2–3 adults per 100,000 each year in Western countries (Thiede et al. 2002). The dataset presented by Klein and Moeschberger (2005) contains the survival times (in months) of the 54 patients and several risk factors: acute graft-versus-host disease indicator (developed or never developed), chronic graft-versus-host disease indicator (developed or never developed), platelet recovery indicator (returned or never returned to normal levels), patient age (years), donor age (years), gender of patient (male or female), gender of donor (male or female), patient cytomegalovirus (CMV) immune status (positive or negative), donor CMV immune status (positive or negative), waiting time to transplant (days), and French–American–British (FAB) classification (FAB grade 4 or 5 and AML, or otherwise). See Klein and Moeschberger (2005) and Copelan et al. (1991) for details.
The original dataset also contains indicators of hospital and of methotrexate use as a graft-versus-host prophylactic, but we omit them from our study because of non-convergence issues. The data are presented in Table 3, and the labels of the variables shown in Table 3 are given in the Appendix. We consider the partial dataset of AML low-risk first remission patients because it has a considerably large parameter space. Diagnostic plots are used to examine the appropriateness of the Weibull parametric model for this dataset. As shown in Figure 1, we report only two diagnostic plots for this model: the probability–probability plot and a plot comparing the parametric (Weibull) survival probability with the Kaplan–Meier survival probability. The plots suggest that the Weibull model fits this dataset adequately.

Figure 1. Diagnostic plots for Weibull model. (Panels: probability–probability plot of estimated survival (Weibull) against estimated survival (KM); estimated survival against time (months) for the Kaplan–Meier (KM) and MLE Weibull fits.)

The Weibull shape parameter determines the behavior of the underlying distribution; for certain values of the shape parameter, the Weibull distribution reduces to other distributions (for example, β = 1 gives the exponential distribution). The shape parameter also governs the failure rate of Weibull-distributed failure times: the failure rate decreases with time if β < 1 and increases if β > 1. The models were applied to the leukemia data, and the results are reported in Table 4. The results show that β̂mp < 1 and β̂p > 1, where β̂p and β̂mp are the profile and modified profile likelihood estimates of the Weibull shape parameter, respectively. The two schemes therefore imply opposite behavior of the underlying distribution: since β̂mp < 1, the failure rate of AML low-risk first remission patients decreases over time, whereas the profile likelihood estimate β̂p > 1 implies that the failure rate increases over time. The standard error of β̂mp is also smaller than that of β̂p, indicating more precise estimation under the modified profile likelihood scheme. The corresponding estimates of the regression parameters do not differ much between the two likelihood estimation schemes; however, in most cases, the modified profile likelihood estimates have smaller standard errors than the profile likelihood estimates.

Table 3. AML low-risk first remission data (subset on 54 patients and selected covariates).
T1 85.63 83.53 80.30 73.93 61.90 60.97 52.07 49.00 45.43 34.33 28.67 41.93 74.87 62.33 59.97 56.97 55.80 52.27 50.90 44.13 31.90 31.07 28.23 28.27 61.67 61.43 51.17
D1 DA DC DP
Z1
Z2
Z3 Z4 Z5 Z6
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 31 35 16 29 19 26 27 13 25 25 30 45 33 32 23 37 15 22 46 18 27 28 23 37 34 35
13 34 31 16 35 18 30 34 24 29 31 16 39 30 23 28 34 19 12 31 17 30 29 26 36 32 32
1 1 1 1 0 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0
0 0 0 0 1 0 0 1 1 1 0 1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 0 1 1 1 1 0 1 1 0 0 0 1 1 0 1 1 1 0 1 1 0 0 1
1 0 1 1 1 1 1 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0
0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 1 1 0
Z7
Z8
T1
270 60 120 60 90 210 90 240 90 210 180 180 105 225 120 90 60 90 450 75 90 60 75 180 180 270 180
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 1
48.23 46.13 13.80 73.47 35.43 16.03 3.50 21.37 13.00 9.60 17.40 2.63 38.53 19.43 1.60 14.37 35.80 13.10 0.33 1.77 2.67 1.17 49.97 23.47 21.77 7.40 45.20
D1 DA DC DP
Z1
Z2
Z3 Z4 Z5 Z6
0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0
33 21 21 25 50 35 37 26 50 45 28 43 14 17 32 30 30 33 34 33 30 23 35 29 23 28 33
28 18 15 19 38 36 34 24 48 43 30 43 19 14 33 23 32 28 54 41 35 25 18 21 16 30 22
0 0 1 0 1 1 1 1 1 1 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 1 1
0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0
1 1 0 0 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0
1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1
1 0 1 0 0 0 0 1 1 1 1 0 0 1 1 1 1 0 0 1 0 1 1 1 0 1 1
1 0 0 0 1 1 1 0 0 1 0 0 0 0 1 1 1 0 1 1 0 1 0 1 0 1 1
1 0 1 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0
Z7
Z8
150 120 120 60 270 90 120 90 120 90 90 90 60 120 150 120 150 120 240 180 150 150 30 105 90 120 210
0 0 1 0 1 1 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1
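The link between β and the direction of the hazard can be illustrated with the shape estimates in Table 4. The sketch below assumes NumPy and a hypothetical scale α = 1 (in the regression model the scale depends on the covariates, but it does not affect whether h(t) = (β/α)(t/α)^(β−1) increases or decreases); the time grid roughly spans the follow-up range in Table 3.

```python
import numpy as np

def weibull_hazard(t, alpha, beta):
    """Weibull hazard h(t) = (beta/alpha) * (t/alpha)**(beta - 1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1)

def hazard_trend(beta, alpha=1.0, t=np.linspace(1.0, 85.0, 200)):
    """Classify the hazard on a time grid (months) as increasing or decreasing."""
    d = np.diff(weibull_hazard(t, alpha, beta))
    if np.all(d > 0):
        return "increasing"
    if np.all(d < 0):
        return "decreasing"
    return "constant or non-monotone"

# Shape estimates from Table 4; alpha is a hypothetical scale (it does not change the trend).
for label, beta in [("profile", 1.106), ("modified profile", 0.764)]:
    print(f"{label}: beta = {beta} -> hazard {hazard_trend(beta)}")
```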

Table 4. Estimation of Weibull shape and regression parameters: leukemia survival data.

| Parameter | Profile: Estimate | Profile: S.E. | Modified profile: Estimate | Modified profile: S.E. |
|---|---|---|---|---|
| Intercept | 1.703 | 18.981 | 1.784 | 1.766 |
| Acute graft-versus-host disease indicator: yes | 2.476 | 4.104 | 2.416 | 0.098 |
| Chronic graft-versus-host disease indicator: yes | 0.904 | 1.458 | 0.926 | 0.062 |
| Platelet recovery indicator: return to normal level | 74.54 | 39.15 | 74.67 | 14.28 |
| Patient age (years) | 1.027 | 0.084 | 1.063 | 0.092 |
| Donor age (years) | 0.96 | 0.084 | 0.983 | 0.051 |
| Gender of patient: male | 1.04 | 1.446 | 1.04 | 0.058 |
| Gender of donor: male | 2.21 | 5.341 | 2.11 | 0.134 |
| Patient CMV status: positive | 1.05 | 0.974 | 1.056 | 0.054 |
| Donor CMV status: positive | 0.504 | 1.777 | 0.512 | 0.105 |
| Waiting time to transplant (days) | 1.005 | 0.387 | 1.01 | 2.77 |
| FAB: grade 4 or 5 and AML | 0.159 | 0.312 | 0.163 | 0.4 |
| Shape parameter | 1.106 | 0.505 | 0.764 | 0.349 |

5. Discussion and conclusion

Quantities derived from likelihood functions with many nuisance parameters form the basis of model-based statistical inference. Parametric inference is known to depend in part on whether parameter orthogonality can be achieved. In the presence of a large number of nuisance parameters, profile likelihood inference can be questionable and sometimes unreliable. The maximum likelihood estimator can also be considerably biased in small samples. We have derived modified profile likelihood estimators for the Weibull shape parameter and for the regression parameters using the modified profile likelihood technique of Barndorff-Nielsen (1983). The methods are illustrated in detail with simulated examples and with a real leukemia data set (Klein and Moeschberger 2005; Copelan et al. 1991). The Weibull regression model is widely used in survival analysis, where censoring plays an important role. The numerical evidence from fitting the Weibull regression model to both the simulated and the real data supports the conclusion that modified profile likelihood estimation yields considerably improved inference. The modification to the profile likelihood arises from higher-order asymptotic approximations. The method assumes that the derivatives of the likelihood components and the information matrix are computationally convenient, or at least numerically obtainable. If the model does not belong to the exponential family, the exact or approximate expression for the conditional distribution of the maximum likelihood estimator is accurate to order $O(n^{-1})$, and often to order $O(n^{-3/2})$. Modified profile likelihood methods maximize a log-likelihood obtained by adding a suitable correction term to the profile log-likelihood function. The resulting modified profile likelihood adequately approximates a conditional or marginal likelihood for the parameter of interest. A conditional likelihood eliminates the nuisance parameters by conditioning on a suitable set of sufficient statistics for these parameters.
A marginal likelihood, in contrast, is based on a statistic whose distribution depends only on the parameter of interest. We have found that the modified profile likelihood estimator leads to a strong reduction in bias relative to the standard maximum likelihood estimator. The modified profile likelihood can be maximized by standard numerical algorithms, and related inferential procedures, such as the computation of standard errors, can be carried out in the conventional way. Maximizing a modified profile likelihood is often more straightforward than solving an estimating equation, especially when the parameter of interest is multidimensional. Furthermore, when the covariates in the model are correlated, as is often the case in high-dimensional datasets such as microarray data, the modified profile technique leads to less biased and more efficient estimates. The main drawback of the modified profile likelihood is that it requires sample space derivatives, which are not straightforward to compute for some models. Moreover, direct implementation of the modified profile likelihood for high-dimensional datasets remains difficult. Even when the covariates are not correlated, the modified profile likelihood technique may reveal findings that the profile likelihood misses (here, a decreasing rather than an increasing failure rate), as the leukemia example (Klein and Moeschberger 2005; Copelan et al. 1991) suggests.
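For the Weibull model the correction term involves sample space derivatives, so a complete Weibull implementation is beyond a short sketch. The bias-reduction idea can still be illustrated on a textbook case where Barndorff-Nielsen's adjustment is explicit; this is a swapped-in example, not the authors' Weibull code. In generic notation (interest parameter $\psi$, nuisance parameter $\lambda$; this may differ from the notation of earlier sections), the modified profile likelihood is $L_{mp}(\psi) = \left|\partial\hat{\lambda}/\partial\hat{\lambda}_{\psi}\right| \, |j_{\lambda\lambda}(\psi, \hat{\lambda}_{\psi})|^{-1/2} L_{p}(\psi)$. In the normal linear model $y = X\gamma + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$ and $p$ columns in $X$, take $\psi = \sigma^2$ and $\lambda = \gamma$. Because $\hat{\gamma}_{\sigma^2} = \hat{\gamma}$ does not depend on $\sigma^2$, the Jacobian term equals 1, and $j_{\gamma\gamma} = X^{\top}X/\sigma^2$, so the adjustment adds $(p/2)\log\sigma^2$ (up to a constant) to the profile log-likelihood. The profile estimate is then $\mathrm{RSS}/n$, with bias $-p\sigma^2/n$, while the modified profile estimate is the unbiased $\mathrm{RSS}/(n-p)$. The sketch maximizes both functions numerically by golden-section search; the helpers `golden_max` and `fit_sigma2`, the seed, and the settings $n = 20$, $p = 5$, $\sigma^2 = 2$ are arbitrary illustrative choices.

```python
import math
import numpy as np


def golden_max(f, a, b, tol=1e-9):
    """Maximize a unimodal scalar function f on [a, b] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if f(c) > f(d):  # maximum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:  # maximum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return 0.5 * (a + b)


def fit_sigma2(X, y):
    """Profile and modified profile estimates of sigma^2 in y = X gamma + N(0, sigma^2 I)."""
    n, p = X.shape
    coef_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef_hat) ** 2))
    _, logdet_xtx = np.linalg.slogdet(X.T @ X)

    # Profile log-likelihood in s = log(sigma^2), additive constants dropped.
    def loglik_p(s):
        return -0.5 * n * s - rss / (2.0 * math.exp(s))

    # Adjustment -0.5 * log|j_gg| with j_gg = X'X / sigma^2, so log|j_gg| = logdet(X'X) - p*s.
    # The Jacobian term is 1 because coef_hat does not depend on sigma^2.
    def loglik_mp(s):
        return loglik_p(s) - 0.5 * (logdet_xtx - p * s)

    centre = math.log(rss / n)
    s_p = golden_max(loglik_p, centre - 5.0, centre + 5.0)
    s_mp = golden_max(loglik_mp, centre - 5.0, centre + 5.0)
    return math.exp(s_p), math.exp(s_mp), rss


# Monte Carlo comparison of the two estimators under a fixed design.
rng = np.random.default_rng(2018)
n, p, sigma2_true, reps = 20, 5, 2.0, 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
gamma_true = np.ones(p)

est_p, est_mp = [], []
for _ in range(reps):
    y = X @ gamma_true + rng.normal(scale=math.sqrt(sigma2_true), size=n)
    s2_p, s2_mp, rss = fit_sigma2(X, y)
    est_p.append(s2_p)
    est_mp.append(s2_mp)

mean_p, mean_mp = float(np.mean(est_p)), float(np.mean(est_mp))
print(f"profile mean {mean_p:.3f} vs theory {sigma2_true * (n - p) / n:.3f}")
print(f"modified profile mean {mean_mp:.3f} vs true sigma^2 {sigma2_true:.3f}")
```

The profile (maximum likelihood) estimates should average close to $\sigma^2(n-p)/n = 1.5$, and the modified profile estimates close to the true value 2, mirroring on a small scale the bias reduction described above.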

References

Barndorff-Nielsen, O. 1980. Conditionality resolutions. Biometrika 67 (2):293–310.

Barndorff-Nielsen, O. 1983. On a formula for the distribution of the maximum likelihood estimator. Biometrika 70 (2):343–65.

Barndorff-Nielsen, O., and D. R. Cox. 1979. Edgeworth and saddle-point approximations with statistical applications. Journal of the Royal Statistical Society. Series B (Methodology) 41 (3):279–312.

Casella, G., and R. L. Berger. 2002. Statistical inference. Vol. 2. Pacific Grove, CA: Duxbury Press.

Chimka, J. R., and Q. Wang. 2009. Accelerated failure-time models of graduation. Educational Research and Reviews 4 (5):267–71.

Copelan, E. A., J. C. Biggs, J. M. Thompson, P. Crilley, J. Szer, J. P. Klein, N. Kapoor, B. R. Avalos, I. Cunningham, and K. Atkinson. 1991. Treatment for acute myelocytic leukemia with allogeneic bone marrow transplantation following preparation with BuCy2. Blood 78 (3):838–43.

Cox, D. R. 1980. Local ancillarity. Biometrika 67 (2):279–86.

Cox, D. R., and N. Reid. 1987. Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society. Series B (Methodology) 49 (1):1–39.

Da Silva, M. F., S. L. Ferrari, and F. Cribari-Neto. 2008. Improved likelihood inference for the shape parameter in Weibull regression. Journal of Statistical Computation and Simulation 78 (9):789–811.

Daniels, H. E. 1954. Saddlepoint approximations in statistics. The Annals of Mathematical Statistics 25 (4):631–50.

Fisher, R. A. 1934. Two new properties of mathematical likelihood. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character 144 (852):285–307.

Fraser, D. A. S., and D. A. Fraser. 1968. The structure of inference. Vol. 23. New York: Wiley.

Fraser, D. A. S., and N. Reid. 1989. Adjustments to profile likelihood. Biometrika 76 (3):477–88.

Hinkley, D. 1980. Likelihood as approximate pivotal distribution. Biometrika 67 (2):287–92.

Islam, M. M., and M. H. R. Khan. 2016. Improved likelihood estimation for the generalized extreme value and the inverse Gaussian lifetime distributions. arXiv:1603.08388.

Khan, M. H. R., and J. E. H. Shaw. 2016a. On dealing with censored largest observations under weighted least squares. Journal of Statistical Computation and Simulation 88 (18):3758–76.

Khan, M. H. R., and J. E. H. Shaw. 2016b. Variable selection for survival data with a class of adaptive elastic net techniques. Statistics and Computing 26 (3):725–41.

Khan, M. H. R., and J. E. H. Shaw. 2017. Variable selection for accelerated lifetime models with synthesized estimation techniques. Statistical Methods in Medical Research. doi:10.1177/0962280217739522.

Klein, J. P., and M. L. Moeschberger. 2005. Survival analysis: Techniques for censored and truncated data. Berlin: Springer Science & Business Media.

Lawless, J. F. 2011. Statistical models and methods for lifetime data. Vol. 362. Hoboken, NJ: John Wiley & Sons.

Severini, T. A. 1998. An approximation to the modified profile likelihood function. Biometrika 85 (2):403–11.

Severini, T. A. 2001. Approximation of sample space derivatives. In: Data analysis from statistical foundations: A Festschrift in honour of the 75th birthday of DAS Fraser, ed. D. A. Fraser, 35. Hauppauge, NY: Nova Publishers.

Thiede, C., C. Steudel, B. Mohr, M. Schaich, U. Schäkel, U. Platzbecker, M. Wermke, M. Bornhäuser, M. Ritter, A. Neubauer, et al. 2002. Analysis of FLT3-activating mutations in 979 patients with acute myelogenous leukemia: Association with FAB subtypes and identification of subgroups with poor prognosis. Blood 99 (12):4326–35.

Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodology) 58 (1):267–88.

Young, G. A., and R. L. Smith. 2005. Essentials of statistical inference. Vol. 16. Cambridge: Cambridge University Press.

Zou, H., and T. Hastie. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67 (2):301–20.

Appendix

T1: Time (in months) to death or on-study time
D1: Death indicator (1 = dead, 0 = alive)
DA: Acute graft-versus-host disease indicator (1 = developed, 0 = never developed)
DC: Chronic graft-versus-host disease indicator (1 = developed, 0 = never developed)
DP: Platelet recovery indicator (1 = platelets returned to normal levels, 0 = platelets never returned to normal levels)
Z1: Patient age (years)
Z2: Donor age (years)
Z3: Patient sex (1 = male, 0 = female)
Z4: Donor sex (1 = male, 0 = female)
Z5: Patient CMV status (1 = CMV positive, 0 = CMV negative)
Z6: Donor CMV status (1 = CMV positive, 0 = CMV negative)
Z7: Waiting time to transplant (days)
Z8: FAB status (1 = FAB grade 4–5 and AML, 0 = otherwise)