282

IEEE TRANSACTIONS ON RELIABILITY, VOL. 54, NO. 2, JUNE 2005

Parameter Estimation of Incomplete Data in Competing Risks Using the EM Algorithm

Chanseok Park

Abstract—Consider a system which is made up of multiple components connected in series. In this case, the failure of the whole system is caused by the earliest failure of any of the components, which is commonly referred to as competing risks. In certain situations, determining the cause of failure may be expensive, or may be very difficult due to the lack of appropriate diagnostics. Therefore, it might be the case that the failure time is observed, but its corresponding cause of failure is not fully investigated. This is known as masking. Moreover, this competing risks problem is further complicated by possible censoring. In practice, censoring is very common because of time and cost considerations on experiments. In this paper, we deal with parameter estimation of incomplete lifetime data in competing risks using the EM algorithm, where incompleteness arises due to censoring and masking. Several studies have been carried out, but parameter estimation for incomplete data has mainly focused on exponential models. We provide the general likelihood method, and the parameter estimation of a variety of models including exponential, s-normal, and lognormal models. This method can be easily implemented to find the MLE of other models. Exponential and lognormal examples are illustrated with parameter estimation, and a graphical technique for checking model validity.

Index Terms—Censoring, competing risks, EM algorithm, likelihood function, masking, missing data, MLE.

ACRONYMS¹
s-  implies: statistical(ly)
CIF  cumulative incidence function
EM  expectation-maximization
MLE  maximum-likelihood estimate
Cdf  cumulative distribution function
pdf  probability density function

NOTATION
$M_i$  set of labels of failure component
$X_{ij}$  lifetime of the $i$th subject due to the $j$th cause
$\delta_i$  censoring indicator variable
$\boldsymbol\theta_j$  vector parameters of the distribution of $X_{ij}$
$\boldsymbol\theta^{(k)}$  estimate of $\boldsymbol\theta$ at the $k$th EM sequence
$\hat{\boldsymbol\theta}$  MLE of $\boldsymbol\theta$
$p_{ij}^{(k)}$  estimate of $p_{ij}$ at the $k$th EM sequence
$f_j$  pdf of $X_{ij}$
$F_j$  Cdf of $X_{ij}$
$S_j$  survival function of $X_{ij}$
$h_j$  hazard function of $X_{ij}$
$\mathbf{y}$  observed data
$\mathbf{z}$  missing data
$L$  likelihood function with no masking
$L^{*}$  likelihood function with masking
$L^{c}$  complete-data likelihood function
$I(\cdot)$  indicator function
$\phi$  pdf of the standard s-normal distribution
$\Phi$  Cdf of the standard s-normal distribution

Manuscript received March 23, 2004; revised September 30, 2004. This work was supported in part by Clemson RGC Award. Associate Editor: M. Xie. The author is with the Department of Mathematical Sciences, Clemson University, Clemson, SC 29634, USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TR.2005.846360

¹The singular and plural of an acronym are always spelled the same.

I. INTRODUCTION

The analysis of lifetime or failure time data has been of considerable interest in many branches of statistical applications, such as reliability engineering, electrical engineering, industrial engineering, and the biological sciences. In an industrial application, a system is made up of multiple components connected in series. In this case, the failure of the whole system is caused by the earliest failure of any of the components, which is commonly referred to as competing risks. In certain situations, determining the cause of failure may be expensive, or may be very difficult due to the lack of appropriate diagnostics. Therefore, it might be the case that the failure time of an individual is observed, but its corresponding cause of failure is not fully investigated. This is known as masking. We consider that the cause of the $i$th system failure may or may not be exactly identified, so the cause-of-failure information leads to a nonempty subset $M_i \subseteq \{1, \dots, m\}$ of labels defining the components in the module. For example, if the $i$th system with $m$ components fails due to the $j$th component, then the set of labels is $M_i = \{j\}$ (no masking). If its cause of failure is completely unknown, then $M_i = \{1, \dots, m\}$ (complete masking). If its failure is attributed to a set of causes containing more than one, but not all, causes, then $M_i \subsetneq \{1, \dots, m\}$ with $|M_i| \ge 2$ (partial masking). Moreover, this competing risks problem is further complicated by possible censoring. In practice, censoring is very common because of time and cost considerations on experiments. The data are said to be censored when, for certain observations, only a lower or upper bound on lifetime is available. The traditional approach when dealing with competing risks is to consider the hypothetical latent lifetimes corresponding to each cause in the absence of the others [1].

We formulate the problem formally using the following notation. A subject is exposed to several potential causes of failure. Let there be a finite number of s-independent causes of failure indexed by $j \in \mathcal{J} = \{1, \dots, m\}$. Let $X_{ij}$ denote the continuous lifetime of the $i$th subject due to the $j$th cause, where $i = 1, \dots, n$ and $j \in \mathcal{J}$. It is assumed that the $X_{ij}$ are s-independent for all $i$ and $j$, and that $X_{1j}, \dots, X_{nj}$ are identically distributed for each given $j$. The corresponding Cdf, pdf, survival function, and hazard function of $X_{ij}$ are denoted in general by $F_j(x;\boldsymbol\theta_j)$, $f_j(x;\boldsymbol\theta_j)$, $S_j(x;\boldsymbol\theta_j)$, and $h_j(x;\boldsymbol\theta_j)$, respectively, where $\boldsymbol\theta_j$ is a vector of real-valued parameters for each $j$. Then the observed lifetime of the $i$th subject is given by the random variable
$$X_i = \min(X_{i1}, X_{i2}, \dots, X_{im}).$$

0018-9529/$20.00 © 2005 IEEE

PARK: PARAMETER ESTIMATION OF INCOMPLETE DATA IN COMPETING RISKS USING THE EM ALGORITHM

Typically, in reliability analysis problems, complete observation of $X_i$ may not be possible due to various censoring schemes that can be inherent in data collection. It is further assumed that each $X_i$ can be randomly right-censored by a censoring time $C_i$, which is s-independent of the lifetimes $X_{ij}$ for all $i$ and $j$. Thus, one observes triplets $(T_i, M_i, \delta_i)$, where $T_i = \min(X_i, C_i)$, $M_i$ is the set of labels defining the components that failed, and $\delta_i$ is a censoring indicator variable defined as
$$\delta_i = \begin{cases} \ast & \text{if masked},\\ j & \text{if failed with the } j\text{th cause},\\ 0 & \text{if censored}. \end{cases} \tag{1}$$
We denote a realization of the random variable $T_i$ by $t_i$.

The analysis of exponential data with two causes was studied by Cox [2], and was extended to multiple causes by Herman & Patell [3]. The parametric estimation problem for two causes with possibly missing causes has been discussed by Miyakawa [4], without censored data. Usher & Hodgson [5], Usher & Guess [6], Guess, Usher & Hodgson [7], and Reiser et al. [8] have considered the masking problem, but they mainly focused on exponential models, and provided closed-form solutions under very restrictive assumptions. Although some authors provided the likelihood function with censored data, no explicit estimates were given. Kundu & Basu [9] and Kundu [10] also extended Miyakawa's work. Their methods are, however, limited to the exponential and Weibull models with only two causes, and they did not consider partial masking. Although they stated that their solutions extend to the multiple-cause case, no explicit expressions were provided. Recently, Park & Kulasekera [11] extended this work and provided the closed-form MLE for the exponential model with multiple causes, censored data, and completely-masked causes together; but they only considered exponential and Weibull lifetime distributions. For the Weibull distribution, the closed-form MLE is available only when the common shape parameter is estimated by the likelihood function. Jiang & Murthy [12] studied a characterization of the Weibull probability plot of a general n-fold Weibull competing risks model. They proposed a graphical method, and the use of multiple regression to obtain parameter estimates. Ishioka & Nonaka [13] presented a technique to stably estimate the common Weibull shape parameter with two causes using a quasi-Newton method when the data consist only of the system lifetime (the concomitant indicator is unknown). Here, the unknown concomitant indicator is equivalent to the masking problem in our context. Thus, their method can be used for the masking problem, but it is limited to two causes and a common shape parameter.

Another approach using the EM algorithm was considered by Albert & Baxter [14]. They found the EM sequences for the exponential model with multiple causes, censoring, and general masking. However, unless one assumes an exponential distribution for the lifetimes, it is very difficult to apply their idea, because it requires that the hazard and survival functions have nice closed forms. An alternative to the traditional latent lifetime framework is the mixture-model approach. Early work is referenced by Larson & Dinse [15], while a more recent result is given by Maller & Zhou [16]. An attractive feature of this approach is that it relaxes the s-independence assumption of the latent lifetime approach. The drawback is that finding the MLE in the parametric mixture model is quite difficult, and requires intensive and often unstable numerical calculations. When it is inappropriate or undesirable to assume a specific parametric form and s-independence in the competing risks problem, one can use distribution-free methods. There is an extensive literature on nonparametric estimation and testing. Early works include Gray [17], Aly et al. [18], Sun & Tiwari [19], Lindkvist & Belyaev [20], Lam [21], and Luo & Turnbull [22], while more recent studies include Kulathinal & Gasbarra [23], El-Nouty & Lancar [24], and Tiwari, Kulasekera & Park [25].

In this paper, based on the traditional hypothetical latent lifetimes approach, a new method for finding the EM-type MLE is provided. This method can handle complex data such as fully-observed, censored, and partially-masked data in competing risks. It can be easily implemented to find the EM sequences, even when the hazard and survival functions do not have closed forms. We provide the general likelihood method in Section II.
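Parameter estimation using the EM algorithm is handled in Sections III and IV, followed by exponential and lognormal examples in Section V. The derivation of the EM sequences of the exponential and s-normal (lognormal) distributions is detailed in the Appendix.

II. MAXIMUM LIKELIHOOD
In this section, we provide the general maximum likelihood method. Let $I(\cdot)$ be the indicator function of an event. For convenience, denote $h_j(t) = h_j(t;\boldsymbol\theta_j)$, $S_j(t) = S_j(t;\boldsymbol\theta_j)$, and $\boldsymbol\theta = (\boldsymbol\theta_1, \dots, \boldsymbol\theta_m)$. Without masking, a failure due to cause $j$ at $t_i$ contributes $f_j(t_i)\prod_{l\ne j} S_l(t_i) = h_j(t_i)\prod_{l=1}^m S_l(t_i)$, and a censored observation contributes $\prod_{l=1}^m S_l(t_i)$. Hence the likelihood function of the censored sample is
$$L(\boldsymbol\theta) = \prod_{i=1}^{n}\prod_{j=1}^{m} h_j(t_i)^{I(\delta_i = j)}\, S_j(t_i) = \prod_{j=1}^{m} L_j(\boldsymbol\theta_j), \tag{2}$$
where
$$L_j(\boldsymbol\theta_j) = \prod_{i=1}^{n} h_j(t_i)^{I(\delta_i = j)}\, S_j(t_i). \tag{3}$$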
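The factorization in (3) can be made concrete with a short Python sketch. It assumes exponential lifetimes, so that $h_j(t) = \lambda_j$ and $S_j(t) = e^{-\lambda_j t}$. Then $\log L_j(\lambda_j) = n_j \log\lambda_j - \lambda_j \sum_i t_i$, where $n_j = \sum_i I(\delta_i = j)$, and each factor is maximized on its own by $\hat\lambda_j = n_j / \sum_i t_i$. The function name, the encoding $\delta_i = 0$ for a censored subject, and the data are illustrative assumptions only.

```python
from collections import Counter


def exponential_mle_no_masking(times, deltas, m):
    """Per-cause MLE lambda_j = n_j / sum_i t_i for exponential lifetimes (no masking).

    times  : observed times t_i = min(x_i, c_i)
    deltas : delta_i, encoded here as 0 for censored and j (1..m) for a cause-j failure
    m      : number of causes
    """
    total_time = sum(times)      # every subject enters every factor L_j through S_j(t_i)
    n_j = Counter(deltas)        # number of failures attributed to each cause
    return [n_j.get(j, 0) / total_time for j in range(1, m + 1)]


# Hypothetical data: five subjects, two causes, one censored observation.
times = [1.2, 0.7, 2.5, 3.0, 0.4]
deltas = [1, 2, 0, 1, 1]
print(exponential_mle_no_masking(times, deltas, m=2))
```

Every subject contributes $S_j(t_i)$ to every factor, so the denominator is the same total time on test for all causes. Only the numerators $n_j$ differ from cause to cause.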

Maximizing $L(\boldsymbol\theta)$ with respect to $\boldsymbol\theta$ is equivalent to individually maximizing $L_j(\boldsymbol\theta_j)$ for each cause $j$. Thus, we have reduced the joint maximum likelihood problem for the set of parameters $\boldsymbol\theta$ to separate estimation problems, each for the single parameter $\boldsymbol\theta_j$. This simplifies the numerical work considerably.

Next, we consider the lifetime of a subject whose cause of failure is unknown (masking), but is known to be one in a set $M_i$. We need to find the pdf of $X_i$ in this case, and add it to the likelihood function. The CIF for each $j$th cause is
$$F_j^{*}(t) = P(X_i \le t,\ X_i = X_{ij}) = \int_0^t f_j^{*}(x)\,dx,$$
with its corresponding sub-density function
$$f_j^{*}(t) = f_j(t)\prod_{l \ne j} S_l(t) = h_j(t)\prod_{l=1}^{m} S_l(t). \tag{4}$$
The pdf of $X_i$ with the cause of failure in $M_i$ is given by
$$f_{M_i}(t) = \sum_{j \in M_i} f_j^{*}(t) = \Big[\sum_{j\in M_i} h_j(t)\Big]\prod_{l=1}^{m} S_l(t), \tag{5}$$
where $f_j^{*}$ is the sub-density in (4). Denote $\delta_i = \ast$ if the cause of failure is unknown. Then the overall likelihood of the censored and masked data is given by
$$L^{*}(\boldsymbol\theta) = \prod_{i=1}^{n}\Big[\sum_{j\in M_i} h_j(t_i)\Big]^{I(\delta_i \ne 0)} \prod_{l=1}^{m} S_l(t_i). \tag{6}$$
When $\delta_i = j$, we have $M_i = \{j\}$, so (6) reduces to (2) in the absence of masking.

In general, a closed-form MLE from the likelihood function above is not available, and numerical methods are required to maximize $L^{*}(\boldsymbol\theta)$. One popular method is the Newton-Raphson method. Its drawback is that it can be very sensitive to the choice of starting values, and can therefore fail to converge to a solution. Also, for the likelihood function (6), if the number of causes is large, the likelihood can become over-parameterized, and the Newton-Raphson method becomes ineffective. The difficulty with directly maximizing (6) is overcome through the use of the EM algorithm discussed in the following section.

III. THE EM ALGORITHM, AND LIKELIHOOD CONSTRUCTION
In this section, we introduce the EM algorithm, and develop likelihood functions which can be conveniently used as inputs in the E-step of the EM algorithm.

The EM algorithm is a general iterative approach for computing the MLE of parametric models when there are no closed-form ML estimates, or the data are incomplete. It was introduced by Dempster, Laird, & Rubin [26] to overcome these difficulties. The main references for the EM algorithm are Schafer [27], Little & Rubin [28], and Tanner [29]. The EM algorithm consists of an s-expectation step (E-step) and a maximization step (M-step). Its advantage is that it solves a difficult incomplete-data problem by constructing two easy steps. The E-step only needs to compute the conditional s-expectation of the complete-data log-likelihood with respect to the missing data, given the observed data. The M-step needs to find the maximizer of this expected log-likelihood. An additional advantage of this method compared with other optimization techniques is that it is very simple, and it converges reliably, because no iteration decreases the likelihood. In general, if it converges, it converges to a local maximum. Hence, when the likelihood function is unimodal and concave, the EM algorithm converges to the global maximizer from any starting value. Below, we provide a short summary of the EM algorithm when it is applied in the missing-data framework.

Let $\boldsymbol\theta$ be the vector of unknown parameters, and let $\mathbf{x}$ denote the complete data. Then the complete-data likelihood is $L^{c}(\boldsymbol\theta \mid \mathbf{x})$. Denote the observed part of $\mathbf{x}$ by $\mathbf{y}$, and the missing part by $\mathbf{z}$. Denote the estimate of $\boldsymbol\theta$ at the $k$th EM sequence by $\boldsymbol\theta^{(k)}$. The EM algorithm consists of two distinct steps:
• E-step: Compute $Q(\boldsymbol\theta \mid \boldsymbol\theta^{(k)}) = E\big[\log L^{c}(\boldsymbol\theta \mid \mathbf{x}) \,\big|\, \mathbf{y}, \boldsymbol\theta^{(k)}\big]$.
• M-step: Find $\boldsymbol\theta^{(k+1)}$ which maximizes $Q(\boldsymbol\theta \mid \boldsymbol\theta^{(k)})$ over $\boldsymbol\theta$.

The question is whether we can apply the EM algorithm to the competing risks problem. When the data are masked, this is equivalent to the cause of failure being missing, so we can construct the complete-data likelihood $L^{c}$ by treating the cause of failure as missing data. Constructing the complete-data likelihood is not difficult once we introduce an indicator variable. Define $\psi_{ij} = I(\text{the } i\text{th subject failed due to the } j\text{th cause})$ for $j \in M_i$. Then, given $T_i = t_i$ and a failure with cause in $M_i$, $\psi_{ij}$ has a Bernoulli distribution with
$$p_{ij} = P(\psi_{ij} = 1) = \frac{h_j(t_i)}{\sum_{l \in M_i} h_l(t_i)},$$
which is the ratio $f_j^{*}(t_i)/f_{M_i}(t_i)$ obtained from (4) and (5). It follows that $\sum_{j\in M_i} \psi_{ij} = 1$ if $\delta_i \ne 0$, and $\psi_{ij} = 0$ if $\delta_i = 0$. Replacing $\sum_{j\in M_i} h_j(t_i)$ in (6) with $\prod_{j\in M_i} h_j(t_i)^{\psi_{ij}}$,

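we have the complete-data likelihood of the censored and masked data as
$$L^{c}(\boldsymbol\theta) = \prod_{i=1}^{n}\Big[\prod_{j\in M_i} h_j(t_i)^{\psi_{ij}}\Big]^{I(\delta_i \ne 0)} \prod_{l=1}^{m} S_l(t_i), \tag{7}$$
where $\psi_{ij}$ is the indicator defined above. If $\delta_i = j$, then clearly $M_i = \{j\}$ and $\psi_{ij} = 1$. For convenience, set $\psi_{ij} = 0$ whenever $j \notin M_i$ or $\delta_i = 0$, so that $h_j(t_i)^{\psi_{ij}} = 1$ for those pairs. Using this, we can simplify (7) as follows:
$$L^{c}(\boldsymbol\theta) = \prod_{j=1}^{m}\prod_{i=1}^{n} h_j(t_i)^{\psi_{ij}}\, S_j(t_i) = \prod_{j=1}^{m} L_j^{c}(\boldsymbol\theta_j). \tag{8}$$
Now, because the likelihood is fully factorized by $\boldsymbol\theta_j$, the estimation problem can be solved individually for each single parameter $\boldsymbol\theta_j$. So, just as we did in (3), by using this factorized complete-data likelihood instead of (6), we have reduced the joint maximum likelihood problem for a set of parameters $\boldsymbol\theta$ to individual estimation problems, each with a single parameter $\boldsymbol\theta_j$. Although the likelihood in (6) is not easy to maximize because of numerical difficulties, considering the masked data as missing data and applying an EM framework allows one to obtain a likelihood which is made up of the individual likelihoods for each parameter $\boldsymbol\theta_j$. Therefore, transforming the problem into a missing-data problem reduces the numerical difficulties considerably. Nevertheless, it still may not be obvious how the EM algorithm is implemented in the missing-data case, and this is discussed in the next section.

IV. EM IMPLEMENTATION ISSUES
When the distribution for the lifetimes is assumed to be exponential, and the data are censored and masked, we can easily implement an EM algorithm using (8), because the hazard and survival functions are in closed forms. On the other hand, suppose one wants to consider lifetimes with the s-normal distribution, and data consisting of both censored and masked observations. The application of (8) is then not straightforward, because the hazard and survival functions do not have closed forms, and the maximizer of each factor in (8) cannot be found in closed form in the M-step. Yet, by treating the censored observations as missing data, it is possible to write the complete-data likelihood in (8) in terms of closed-form pdf. This "trick" of treating the censored data as missing data can be thought of as a general approach that allows one to find closed forms independently of the distribution assumed for the lifetimes. It should be noted that this method may also be applied to more general forms of censoring, including grouping, rounding, and truncation. For more details, the reader is referred to Heitjan [30], and Heitjan & Rubin [31]. Using this approach, numerical optimization is no longer needed, and the EM algorithm can be easily implemented. The approach can be applied to a variety of distributions, including the exponential, s-normal, lognormal, and Laplace distributions. Below, we show how to rewrite (8) in terms of closed-form pdf by treating the censored data as missing data.

Let $Z_{ij}$ be a truncation of $X_{ij}$ at $t_i$, with $Z_{ij} > t_i$. Then we have the complete-data likelihood corresponding to (8):
$$L^{c}(\boldsymbol\theta) = \prod_{j=1}^{m}\prod_{i=1}^{n} f_j(t_i)^{\psi_{ij}}\, f_j(z_{ij})^{1-\psi_{ij}}, \tag{9}$$
where the pdf of $Z_{ij}$ is given by
$$f_{Z_{ij}}(z) = \frac{f_j(z)}{S_j(t_i)}, \qquad z > t_i.$$
Integrating $f_j(z_{ij})$ over $z_{ij} > t_i$ gives $S_j(t_i)$, and $f_j = h_j S_j$, so integrating out the $z_{ij}$ in (9) recovers (8).

In the E-step, we can compute $Q(\boldsymbol\theta \mid \boldsymbol\theta^{(k)})$ as follows. For convenience, let $Q(\boldsymbol\theta \mid \boldsymbol\theta^{(k)}) = \sum_{j=1}^{m} Q_j(\boldsymbol\theta_j \mid \boldsymbol\theta^{(k)})$, where $Q_j(\boldsymbol\theta_j \mid \boldsymbol\theta^{(k)}) = E\big[\log L_j^{c}(\boldsymbol\theta_j) \mid \mathbf{y}, \boldsymbol\theta^{(k)}\big]$. Using (9), we have
$$Q_j(\boldsymbol\theta_j \mid \boldsymbol\theta^{(k)}) = \sum_{i=1}^{n} E[\psi_{ij}]\log f_j(t_i;\boldsymbol\theta_j) + \sum_{i=1}^{n} E\big[(1-\psi_{ij})\log f_j(Z_{ij};\boldsymbol\theta_j)\big].$$
Now, conditioning on the dichotomous random variable $\psi_{ij}$, the second term on the right side can be written as
$$\sum_{i=1}^{n} P(\psi_{ij} = 0)\, E\big[\log f_j(Z_{ij};\boldsymbol\theta_j)\big],$$
where the expectation is taken with respect to the truncated pdf of $Z_{ij}$ evaluated at $\boldsymbol\theta^{(k)}$. Because $\psi_{ij}$ has a Bernoulli distribution, we have $E[\psi_{ij} \mid \mathbf{y}, \boldsymbol\theta^{(k)}] = p_{ij}^{(k)}$ and $P(\psi_{ij} = 0) = 1 - p_{ij}^{(k)}$. Here $p_{ij}^{(k)}$ is $p_{ij}$ evaluated at $\boldsymbol\theta^{(k)}$ when $j \in M_i$ and $\delta_i \ne 0$, and $p_{ij}^{(k)} = 0$ otherwise. It follows that
$$Q_j(\boldsymbol\theta_j \mid \boldsymbol\theta^{(k)}) = \sum_{i=1}^{n}\Big\{p_{ij}^{(k)}\log f_j(t_i;\boldsymbol\theta_j) + \big(1 - p_{ij}^{(k)}\big)\,E_{\boldsymbol\theta^{(k)}}\big[\log f_j(Z_{ij};\boldsymbol\theta_j)\big]\Big\}.$$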
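The hazards enter the E-step only through the weights $p_{ij}^{(k)}$. The sketch below computes these weights for arbitrary hazard functions. The names `e_step_weights` and `lognormal_hazard`, and the convention that an empty set marks a censored subject, are hypothetical choices made for illustration. They are not part of the published method.

```python
import math
from statistics import NormalDist


def e_step_weights(times, masks, hazards):
    """Return p_ij^(k) for every subject i and cause j (causes labelled 1..m).

    times   : observed times t_i
    masks   : candidate-cause sets M_i; an empty set marks a censored subject
    hazards : hazards[j - 1](t) evaluates h_j(t; theta_j^(k)) at the current estimate
    """
    m = len(hazards)
    weights = []
    for t, mask in zip(times, masks):
        row = [0.0] * m                      # p_ij = 0 if censored or j not in M_i
        if mask:
            h = {j: hazards[j - 1](t) for j in mask}
            total = sum(h.values())
            for j, h_j in h.items():
                row[j - 1] = h_j / total     # p_ij = h_j / sum_{l in M_i} h_l
        weights.append(row)
    return weights


def lognormal_hazard(mu, sigma):
    """Hazard f(t) / S(t) of a lognormal lifetime with log T ~ N(mu, sigma^2)."""
    nd = NormalDist(mu, sigma)

    def h(t):
        y = math.log(t)
        return nd.pdf(y) / (t * (1.0 - nd.cdf(y)))
    return h


# Exponential hazards are constant, so these weights do not depend on t_i.
exp_hazards = [lambda t, lam=lam: lam for lam in (0.5, 1.0, 1.5)]
print(e_step_weights([0.8, 1.1, 2.0, 3.0], [{1}, {1, 2}, {1, 2, 3}, set()], exp_hazards))
```

The lognormal helper shows the general case, in which the weights of a masked subject change with $t_i$ because the hazard ratios change over time.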


TABLE I PARAMETER ESTIMATES UNDER CONSIDERATION

In concluding this section, we should stress that for the exponential and Weibull distributions the hazard and survival functions are in closed forms. Applying the EM algorithm using (8) is therefore straightforward, and there is no need to treat the censored data as missing data. On the other hand, for the s-normal and lognormal distributions it is either impossible or very difficult to obtain closed forms for the hazard and survival functions, so applying the EM algorithm through (8) does not always work. However, (9) involves only the corresponding pdf $f_j$, so applying the EM algorithm using (9) can be thought of as a straightforward, general approach to the competing risks problem. In the Appendix, we provide the EM sequences for several lifetime distributions, and show how this approach yields simple closed forms in the M-step of the EM algorithm.
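In practice, this remark means that an E-step based on (9) needs only $f_j$. The normalizing constant $S_j(t_i)$ of the truncated pdf can be computed from the same pdf. The sketch below approximates $E_{\boldsymbol\theta^{(k)}}[g(Z_{ij})]$ by Simpson's rule. It is a rough numerical stand-in, not the closed forms derived in the Appendix. The function name, the cut-off `span`, and the grid size are assumptions.

```python
import math
from statistics import NormalDist


def truncated_expectation(pdf, t, g, span=50.0, n=4000):
    """Approximate E[g(Z)] where Z has density pdf(z) / S(t) on z > t.

    Only the pdf is evaluated: the normalizing constant S(t) is obtained by
    integrating the same pdf, so no closed-form survival function is needed.
    The integral is cut off at t + span (assumes the tail beyond it is negligible)
    and evaluated with the composite Simpson's rule.
    """
    if n % 2:
        n += 1                               # Simpson's rule needs an even number of panels
    step = span / n
    num = den = 0.0
    for k in range(n + 1):
        z = t + k * step
        w = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        fz = pdf(z)
        num += w * g(z) * fz
        den += w * fz                        # proportional to S(t)
    return num / den                         # the common factor step / 3 cancels


# Memoryless check: for an exponential with rate 2, E[Z | Z > 1] = 1 + 1/2.
lam = 2.0
print(truncated_expectation(lambda x: lam * math.exp(-lam * x), 1.0, lambda z: z))
# Standard s-normal truncated at 0.5: the result is the inverse Mills ratio at 0.5.
print(truncated_expectation(NormalDist().pdf, 0.5, lambda z: z, span=20.0))
```

Passing $g(z) = \log f_j(z;\boldsymbol\theta_j)$ gives the second term of $Q_j$. For the s-normal model, the Appendix replaces this quadrature with exact truncated moments.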

TABLE II SIMULATED LIFETIME DATA FOR A THREE-COMPONENT SYSTEM WITH COMPLETE AND MASKED CAUSES

TABLE III PARAMETER ESTIMATES WITH COMPLETE AND MASKED CAUSES

V. EXAMPLES
A. Exponential Distribution Model
To compare our results, we consider the data presented in Table I of Usher & Hodgson [5]. In their paper, the data consist of failure times with different information on the causes: complete observation, general masking, complete masking (Case 1), and two other cases (Case 2 and Case 3). To examine the convergence of the estimates, we compare the estimates of Usher & Hodgson (denoted by UH) with the estimates from the EM sequence based on (8) (denoted by AB), and from the EM sequence based on (9) (denoted by P). The EM algorithms AB and P were run for only 10 iterations. After rounding at the third decimal place, the AB estimates agree with the UH estimates, which shows that the EM algorithm converges reasonably fast. The P estimates agree with the UH estimates after 20 iterations, which indicates that the AB sequence converges faster than the P sequence. The results are summarized in Table I.

B. Lognormal Distribution Model
The lognormal distribution, like the Weibull and exponential distributions, has been widely used in lifetime modeling. The lognormal distribution is defined as the distribution of a random variable whose logarithm is s-normally distributed. We consider a series system of three components. Using the results for the s-normal model, we can estimate the parameters of the lognormal model. The data presented in Table II were simulated assuming lognormally-distributed component lifetimes. Let $X_{ij}$ denote the lifetime of the $i$th subject due to the $j$th cause. We generated the data $X_{ij}$ for $j = 1, 2, 3$ from lognormal distributions with parameters $(\mu_j, \sigma_j)$, using fixed values of the sample size $n$ and of the parameters. Then $(T_i, \delta_i)$ were given by $T_i = \min(X_{i1}, X_{i2}, X_{i3}, C_i)$, where the $\delta_i$ are censoring indicator variables, and the $C_i$ are censoring times uniformly distributed between 5 and 6. To investigate the effects of masking, approximately 36% of the observed failures were randomly masked. We computed the MLE of the parameters using the EM algorithm both with the complete data and with the masked data. The results are summarized in Table III. The estimates obtained from the masked data are reasonably close to those obtained from the complete data.
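The Python sketch below generates data in the spirit of this simulation. The censoring times follow the Uniform(5, 6) law stated above, and roughly 36% of the failures are masked. Several details are assumptions rather than the settings behind Table II: how a masked set is formed (the true cause plus a random nonempty subset of the other causes), the encoding of $\delta_i$, and the parameter values in the usage line.

```python
import math
import random


def simulate_masked_lognormal(n, mus, sigmas, mask_prob=0.36, seed=1):
    """Simulate (t_i, M_i, delta_i) for a series system with lognormal components.

    Censoring times are Uniform(5, 6) as in the text. A masked set is the true
    cause plus a random nonempty subset of the other causes (an assumption).
    delta_i uses a hypothetical encoding: 0 = censored, j = cause j, "*" = masked.
    """
    rng = random.Random(seed)
    m = len(mus)
    data = []
    for _ in range(n):
        x = [math.exp(rng.gauss(mu, s)) for mu, s in zip(mus, sigmas)]
        c = rng.uniform(5.0, 6.0)
        j = min(range(m), key=lambda k: x[k]) + 1        # failing component, 1-based
        if x[j - 1] > c:
            data.append((c, set(), 0))                   # right-censored at c
        elif m > 1 and rng.random() < mask_prob:
            others = [k for k in range(1, m + 1) if k != j]
            extra = rng.sample(others, rng.randint(1, len(others)))
            data.append((x[j - 1], {j, *extra}, "*"))    # masked: cause known only up to a set
        else:
            data.append((x[j - 1], {j}, j))              # cause identified exactly
    return data


# Hypothetical log-scale parameters (not the values used for Table II).
sample = simulate_masked_lognormal(200, mus=(1.0, 1.2, 1.4), sigmas=(0.5, 0.5, 0.5))
```

To obtain the complete-cause version of the same data set, record the true cause $j$ before masking. The complete-data and masked-data fits can then be compared as in Table III.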


Fig. 1. The parametric CIF with masked causes (solid line), and the complete causes (dashed line). The empirical CIF with the complete causes (dashed step functions).

Fig. 2. The conservative empirical CIF bound after merging (solid step functions). The parametric CIF with the masked causes (solid line). The parametric and empirical CIF with the complete causes (dashed lines).

Another way to test how well the EM estimation algorithm performs on masked data uses the CIF introduced in Section II. In general, the CIF $F_j^{*}(t)$ can be obtained by numerical integration of the sub-density function in (4). In Fig. 1, we draw the parametric CIF using the masked-data estimates (solid line). For comparison with the complete data, we superimpose the CIF using the complete-data estimates (dashed line). The two curves are close, which indicates that the masked-data estimation procedure is reasonably efficient. Finally, one procedure for checking the lognormality of the data is to compare the complete-data empirical CIF, proposed by Aalen [32], with the complete-data parametric CIF. This is also done in Fig. 1, where the dashed step functions represent the empirical CIF. Fig. 1 shows that the empirical CIF lies reasonably close to the complete-data parametric CIF, which indicates that the assumption of a lognormal distribution for the lifetimes is reasonable. Of course, because the data were generated from the lognormal distribution, this result is expected. The same procedure, however, applies in the usual case where the data are not generated artificially and the underlying distribution is unknown.

One problem with Aalen's empirical CIF approach is that the exact empirical CIF cannot be obtained with masked data. One way around this difficulty is to find a conservative band for the empirical CIF with masked data by grouping the masked causes into one specific cause. For example, if $M_i = \{1, 2, 3\}$, one can define a grouped masked cause (either 1, 2, or 3), and assign these masked observations to this "grouped" cause. However, the empirical CIF then depends on which grouped cause was chosen. For example, consider the case where there are three causes, and both partial and complete masking. Then, from the assignments
$\{1,2\} \to 1$ or $2$;  $\{1,3\} \to 1$ or $3$;  $\{2,3\} \to 2$ or $3$;  $\{1,2,3\} \to 1$ or $2$ or $3$,
it is obvious that there will be more than 9 different empirical CIF, depending on which cause is selected as the grouped cause. Clearly, as the number of causes increases, this procedure becomes infeasible.

Rather than grouping the causes in the manner described above, one can use a different construction that reduces the number of resulting empirical CIF. Again, consider the case where there are three causes 1, 2, 3. One can pick a specific single cause (say, cause 1), define cause 1 as cause $a$, and define the rest of the causes (causes 2 and 3) as cause $b$. Then every masked set that contains cause 1 becomes $\{a, b\}$ (a masked set without cause 1 becomes simply $\{b\}$), and only two possible assignments exist, that is, assignment of $\{a, b\}$ to either $a$ or $b$. Assigning the masked causes to $a$ gives one extreme case of the empirical CIF of cause 1, and assigning them to $b$ gives the other extreme case. These two empirical CIF form the conservative band shown in Fig. 2. The two empirical CIF are represented by the solid step functions, and the parametric CIF of cause $a$ (cause 1) is superimposed on the plot (solid line). The solid line lies reasonably well within the conservative band. As a reference, the parametric and empirical CIF with the complete causes (dashed) are also copied from Fig. 1. Similarly, by repeating the above method with causes 1 and 3 merged into one cause (say, cause $b$), we can check the lognormality of cause 2.
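The two graphical checks can be sketched as follows. The parametric CIF integrates the sub-density (4) with the trapezoidal rule. The empirical CIF uses the standard Aalen-Johansen form $\hat F_j(t) = \sum_{u \le t} \hat S(u^-)\, d_j(u)/r(u)$, where $\hat S$ is the Kaplan-Meier estimate of system survival, $d_j(u)$ is the number of cause-$j$ failures at $u$, and $r(u)$ is the number at risk. The band follows the merging rule above: masked sets containing cause 1 are sent either to cause 1 or to the merged cause. All function names are hypothetical, and cause indices in `parametric_cif` are 0-based.

```python
import math
from statistics import NormalDist


def parametric_cif(pdfs, survs, j, t, n=2000):
    """F_j^*(t) = integral_0^t f_j(x) prod_{l != j} S_l(x) dx (trapezoidal rule, j 0-based)."""
    def sub_density(x):
        value = pdfs[j](x)
        for l, s in enumerate(survs):
            if l != j:
                value *= s(x)
        return value
    h = t / n
    inner = sum(sub_density(k * h) for k in range(1, n))
    return h * (0.5 * sub_density(0.0) + inner + 0.5 * sub_density(t))


def lognormal(mu, sigma):
    """pdf and survival function of a lognormal lifetime with log T ~ N(mu, sigma^2)."""
    nd = NormalDist(mu, sigma)
    pdf = lambda x: nd.pdf(math.log(x)) / x if x > 0 else 0.0
    surv = lambda x: 1.0 - nd.cdf(math.log(x)) if x > 0 else 1.0
    return pdf, surv


def empirical_cif(times, events, cause):
    """Aalen-Johansen empirical CIF; events[i] is 0 if censored, else a cause label."""
    steps, surv, cif = [], 1.0, 0.0
    for u in sorted({t for t, e in zip(times, events) if e != 0}):
        at_risk = sum(1 for t in times if t >= u)
        d_all = sum(1 for t, e in zip(times, events) if t == u and e != 0)
        d_cause = sum(1 for t, e in zip(times, events) if t == u and e == cause)
        cif += surv * d_cause / at_risk          # KM survival just before u times the jump
        surv *= 1.0 - d_all / at_risk
        steps.append((u, cif))
    return steps


def cif_band(times, masks, cause=1):
    """Lower and upper empirical CIF of `cause`, with all other causes merged into one.

    masks[i] is the set M_i (empty if censored). Masked sets containing `cause` go to
    the merged cause for the lower curve and to `cause` for the upper curve.
    """
    def events(assign_to_cause):
        out = []
        for mask in masks:
            if not mask:
                out.append(0)
            elif cause in mask and (len(mask) == 1 or assign_to_cause):
                out.append(cause)
            else:
                out.append("merged")
        return out
    return empirical_cif(times, events(False), cause), empirical_cif(times, events(True), cause)
```

For exponential causes, `parametric_cif` can be checked against the closed form $F_j^{*}(t) = (\lambda_j/\Lambda)(1 - e^{-\Lambda t})$, where $\Lambda = \sum_l \lambda_l$. For any model, the CIFs of all causes must add up to the Cdf of $X_i$.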

VI. CONCLUDING REMARKS
This paper developed a new method for finding the EM-type MLE in the competing risks model. The method can handle complex data, such as fully-observed, censored, and partially-masked data, in competing risks. It can also be easily implemented to find the EM sequences, even when the hazard and survival functions do not have closed forms. It should also be emphasized that the proposed method is quite general, and its broad applicability is an advantage. For other interesting engineering applications, the reader is referred to Park & Padgett [33].

To assess the accuracy of the estimator, one can use the following asymptotic distribution, which follows from the well-known properties of the MLE:
$$\hat{\boldsymbol\theta} \ \dot\sim\ N\big(\boldsymbol\theta,\ \mathcal{I}^{-1}(\boldsymbol\theta)\big),$$
where $\mathcal{I}(\boldsymbol\theta) = -E\big[\partial^2 \log L^{*}(\boldsymbol\theta)/\partial\boldsymbol\theta\,\partial\boldsymbol\theta^{\mathsf T}\big]$ is the expected Fisher information matrix. Here, it is difficult to obtain the expected Fisher information matrix. However, it is easily estimated by the observed Fisher information matrix, defined as
$$\mathcal{J}(\hat{\boldsymbol\theta}) = -\left.\frac{\partial^2 \log L^{*}(\boldsymbol\theta)}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta^{\mathsf T}}\right|_{\boldsymbol\theta = \hat{\boldsymbol\theta}}.$$
It is noteworthy that ancillarity arguments suggest that the observed Fisher information is preferable to the expected Fisher information [34]. Thus, by substituting the EM-type MLE $\hat{\boldsymbol\theta}$, we can easily obtain the asymptotic distribution of the MLE. Using this, we can assess the accuracy of the proposed estimator (e.g., asymptotic confidence regions for $\boldsymbol\theta$).

The MLE has long been known to be asymptotically efficient at the correctly specified model under certain regularity conditions [35]. On the other hand, it is now also widely understood that the MLE is often a very poor estimator under contamination and departures from the correct model. One possible remedy for these deficiencies of the MLE is the use of robust estimators, such as Huber's M-estimator [36] and Beran's minimum Hellinger distance estimator [37]. However, these estimators seem problematic to implement, especially when the number of parameters is large, which is often the case in competing risks problems. Developing a robust estimator for the over-parameterized competing risks model remains a challenging topic for future work.

APPENDIX
THE DERIVATION OF THE EM SEQUENCES
A. Exponential Distribution Model
In the exponential case, the EM sequences can be obtained from either (8) or (9) without numerical optimization in the M-step, because both the hazard and survival functions are in closed forms. We assume that $X_{ij}$ is an exponential random variable with rate parameter $\lambda_j$. Thus, the pdf of $X_{ij}$ is
$$f_j(x) = \lambda_j e^{-\lambda_j x}, \qquad x > 0.$$
First, we obtain an EM sequence using (8). Using $h_j(t) = \lambda_j$ and $S_j(t) = e^{-\lambda_j t}$, we have the complete-data log-likelihood of $\lambda_j$ as
$$\log L_j^{c}(\lambda_j) = \sum_{i=1}^{n}\big(\psi_{ij}\log\lambda_j - \lambda_j t_i\big).$$
Let $\boldsymbol\lambda = (\lambda_1, \dots, \lambda_m)$, and denote the estimate of $\lambda_j$ at the $k$th EM sequence by $\lambda_j^{(k)}$.
• E-step: It follows from $E[\psi_{ij} \mid \mathbf{y}, \boldsymbol\lambda^{(k)}] = p_{ij}^{(k)}$ that
$$Q_j(\lambda_j \mid \boldsymbol\lambda^{(k)}) = \sum_{i=1}^{n}\big(p_{ij}^{(k)}\log\lambda_j - \lambda_j t_i\big),$$
where
$$p_{ij}^{(k)} = \begin{cases} \lambda_j^{(k)} \Big/ \sum_{l\in M_i}\lambda_l^{(k)} & \text{if } j \in M_i \text{ and } \delta_i \ne 0,\\ 0 & \text{if } j \notin M_i \text{ or } \delta_i = 0. \end{cases}$$



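It is worth mentioning that the above $Q_j$ based on (8) coincides with (3.1) of Albert & Baxter [14].
• M-step: Differentiating $Q_j$ with respect to $\lambda_j$ and setting the result to zero, we obtain
$$\frac{1}{\lambda_j}\sum_{i=1}^{n} p_{ij}^{(k)} - \sum_{i=1}^{n} t_i = 0.$$
Solving for $\lambda_j$, we obtain the $(k+1)$st EM sequence in the M-step:
$$\lambda_j^{(k+1)} = \frac{\sum_{i=1}^{n} p_{ij}^{(k)}}{\sum_{i=1}^{n} t_i}. \tag{10}$$

Next, we can obtain a different EM sequence using (9) instead of (8). We have the complete-data log-likelihood of $\lambda_j$ as
$$\log L_j^{c}(\lambda_j) = \sum_{i=1}^{n}\Big\{\log\lambda_j - \lambda_j\big[\psi_{ij} t_i + (1-\psi_{ij})\, z_{ij}\big]\Big\}.$$
• E-step: It follows from $E_{\boldsymbol\lambda^{(k)}}[Z_{ij}] = t_i + 1/\lambda_j^{(k)}$ (the memoryless property) that
$$Q_j(\lambda_j \mid \boldsymbol\lambda^{(k)}) = n\log\lambda_j - \lambda_j\sum_{i=1}^{n}\Big[t_i + \frac{1 - p_{ij}^{(k)}}{\lambda_j^{(k)}}\Big].$$
• M-step: Differentiating $Q_j$ with respect to $\lambda_j$ and setting the result to zero, we obtain
$$\frac{n}{\lambda_j} - \sum_{i=1}^{n}\Big[t_i + \frac{1 - p_{ij}^{(k)}}{\lambda_j^{(k)}}\Big] = 0.$$
Solving for $\lambda_j$, we obtain the $(k+1)$st EM sequence in the M-step:
$$\lambda_j^{(k+1)} = \frac{n}{\sum_{i=1}^{n}\big[t_i + \big(1 - p_{ij}^{(k)}\big)/\lambda_j^{(k)}\big]}. \tag{11}$$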
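Both sequences are easy to iterate. The sketch below implements the exponential E-step weights together with the updates (10) and (11). The function name, the starting values $\lambda_j^{(0)} = 1$, the stopping rule, and the data are assumptions made for illustration.

```python
def em_exponential(times, masks, m, sequence="AB", tol=1e-10, max_iter=10_000):
    """EM for exponential competing risks with censoring and masking.

    times    : observed times t_i
    masks    : candidate-cause sets M_i (empty set if censored)
    sequence : "AB" iterates (10), based on (8); "P" iterates (11), based on (9)
    Returns (estimates, number_of_iterations).
    """
    n, total_time = len(times), sum(times)
    lam = [1.0] * m                                   # arbitrary positive starting values
    for it in range(1, max_iter + 1):
        # E-step: p_sum[j] = sum_i p_ij^(k), with p_ij = lambda_j / sum_{l in M_i} lambda_l
        p_sum = [0.0] * m
        for mask in masks:
            if mask:
                denom = sum(lam[l - 1] for l in mask)
                for j in mask:
                    p_sum[j - 1] += lam[j - 1] / denom
        # M-step
        if sequence == "AB":
            new = [p_sum[j] / total_time for j in range(m)]                        # (10)
        else:
            new = [n / (total_time + (n - p_sum[j]) / lam[j]) for j in range(m)]  # (11)
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new, it
        lam = new
    return lam, max_iter


# Hypothetical data: m = 2, two completely masked failures and one censored subject.
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
masks = [{1}, {1}, {2}, {1, 2}, {1, 2}, set()]
lam_ab, it_ab = em_exponential(times, masks, 2, "AB")
lam_p, it_p = em_exponential(times, masks, 2, "P")
```

For these hypothetical data, the common limit is $\hat{\boldsymbol\lambda} = (10/31.5,\ 5/31.5) \approx (0.3175,\ 0.1587)$, which can be checked by substituting it into (10). The run based on (10) should need noticeably fewer iterations than the run based on (11), consistent with the comparison in Section V-A. A plausible reason is that (11) also treats the censored part as missing, which slows the EM sequence.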

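Note that in the limit as $k \to \infty$, with $\lambda_j^{(k)} \to \hat\lambda_j$ and $p_{ij}^{(k)} \to \hat p_{ij}$, (11) becomes
$$\hat\lambda_j = \frac{n}{\sum_{i=1}^{n} t_i + \big(n - \sum_{i=1}^{n}\hat p_{ij}\big)/\hat\lambda_j}.$$
Solving for $\hat\lambda_j$, we have
$$\hat\lambda_j = \frac{\sum_{i=1}^{n}\hat p_{ij}}{\sum_{i=1}^{n} t_i},$$
which is also the limit of (10). Therefore, although the EM sequence (11) is different from (10), they give the same limiting estimates. It is also worth noting that if we solve the stationary-point equations (10) and (11) with only complete masking (i.e., $M_i = \mathcal{J}$ for every masked failure), then both solutions give
$$\hat\lambda_j = \frac{n_j}{\sum_{i=1}^{n} t_i}\Big(1 + \frac{n_{\ast}}{\sum_{l=1}^{m} n_l}\Big),$$
where $n_j = \sum_{i=1}^{n} I(\delta_i = j)$ for $j \in \mathcal{J}$, and $n_{\ast} = \sum_{i=1}^{n} I(\delta_i = \ast)$ is the number of masked failures. As expected, this result is the same as that of Park & Kulasekera [11] with a single group.

B. s-Normal (Lognormal) Distribution Model
For the s-normal distribution, it is extremely difficult or impossible to obtain the EM sequences using (8), because finding the closed-form maximizer is not feasible in the M-step. Using (9), we can avoid these difficulties and obtain the EM sequences. This idea is easily extended to the lognormal case, because the logarithm of a lognormally distributed random variable has an s-normal distribution. We assume that $X_{ij}$ is an s-normal random variable with mean and variance parameters $\mu_j$ and $\sigma_j^2$. The pdf of $X_{ij}$ is
$$f_j(x) = \frac{1}{\sigma_j}\,\phi\Big(\frac{x - \mu_j}{\sigma_j}\Big).$$
We have the complete-data log-likelihood of $(\mu_j, \sigma_j)$ as
$$\log L_j^{c}(\mu_j, \sigma_j) = -n\log\sigma_j - \frac{1}{2\sigma_j^2}\sum_{i=1}^{n}\Big[\psi_{ij}(t_i - \mu_j)^2 + (1-\psi_{ij})(z_{ij} - \mu_j)^2\Big] + \text{const},$$
where $Z_{ij}$ is the truncated s-normal random variable with the pdf given by
$$f_{Z_{ij}}(z) = \frac{\phi\big((z - \mu_j)/\sigma_j\big)}{\sigma_j\big[1 - \Phi\big((t_i - \mu_j)/\sigma_j\big)\big]}, \qquad z > t_i.$$
We denote the estimates of $\mu_j$ and $\sigma_j$ at the $k$th EM sequence by $\mu_j^{(k)}$ and $\sigma_j^{(k)}$, respectively.
• E-step: We have
$$Q_j = -n\log\sigma_j - \frac{1}{2\sigma_j^2}\sum_{i=1}^{n}\Big\{p_{ij}^{(k)}(t_i - \mu_j)^2 + \big(1 - p_{ij}^{(k)}\big)E^{(k)}\big[(Z_{ij} - \mu_j)^2\big]\Big\} + \text{const},$$
where $E^{(k)}$ denotes s-expectation under $(\mu_j^{(k)}, \sigma_j^{(k)})$, and
$$p_{ij}^{(k)} = \begin{cases} h_j\big(t_i; \mu_j^{(k)}, \sigma_j^{(k)}\big) \Big/ \sum_{l\in M_i} h_l\big(t_i; \mu_l^{(k)}, \sigma_l^{(k)}\big) & \text{if } j \in M_i \text{ and } \delta_i \ne 0,\\ 0 & \text{if } j \notin M_i \text{ or } \delta_i = 0. \end{cases}$$
With $\xi_{ij}^{(k)} = (t_i - \mu_j^{(k)})/\sigma_j^{(k)}$ and $r_{ij}^{(k)} = \phi(\xi_{ij}^{(k)})/\big[1 - \Phi(\xi_{ij}^{(k)})\big]$, the truncated moments are
$$E^{(k)}[Z_{ij}] = \mu_j^{(k)} + \sigma_j^{(k)} r_{ij}^{(k)},$$
and
$$E^{(k)}[Z_{ij}^2] = \big(\mu_j^{(k)}\big)^2 + \big(\sigma_j^{(k)}\big)^2 + \sigma_j^{(k)}\big(\mu_j^{(k)} + t_i\big) r_{ij}^{(k)}.$$
• M-step: Differentiating $Q_j$ with respect to $\mu_j$ and setting the result to zero, we obtain
$$\sum_{i=1}^{n}\Big\{p_{ij}^{(k)}(t_i - \mu_j) + \big(1 - p_{ij}^{(k)}\big)\big(E^{(k)}[Z_{ij}] - \mu_j\big)\Big\} = 0.$$
Differentiating $Q_j$ with respect to $\sigma_j$, we obtain
$$-\frac{n}{\sigma_j} + \frac{1}{\sigma_j^{3}}\sum_{i=1}^{n}\Big\{p_{ij}^{(k)}(t_i - \mu_j)^2 + \big(1 - p_{ij}^{(k)}\big)E^{(k)}\big[(Z_{ij} - \mu_j)^2\big]\Big\} = 0.$$
Solving these equations for $\mu_j$ and $\sigma_j$, we obtain the $(k+1)$st EM sequence in the M-step as
$$\mu_j^{(k+1)} = \frac{1}{n}\sum_{i=1}^{n}\Big\{p_{ij}^{(k)} t_i + \big(1 - p_{ij}^{(k)}\big)E^{(k)}[Z_{ij}]\Big\},$$
$$\big(\sigma_j^{(k+1)}\big)^2 = \frac{1}{n}\sum_{i=1}^{n}\Big\{p_{ij}^{(k)} t_i^2 + \big(1 - p_{ij}^{(k)}\big)E^{(k)}[Z_{ij}^2]\Big\} - \big(\mu_j^{(k+1)}\big)^2.$$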
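The sketch below performs one iteration of these updates on data already on the s-normal scale. For lognormal lifetimes, pass $y_i = \log t_i$; the Jacobian of the log transform does not change the maximizer. The sketch computes $p_{ij}^{(k)}$ from the s-normal hazards $h_j = r_{ij}/\sigma_j$, obtains the truncated moments through the inverse Mills ratio, and applies the closed-form M-step. It also includes a log-likelihood based on (6), so that the monotone increase expected of EM can be checked. The function names and the erfc-based tail evaluation are implementation choices, not taken from the paper.

```python
import math


def _phi(a):
    """Standard s-normal pdf."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def _tail(a):
    """1 - Phi(a), computed with erfc to stay accurate in the upper tail."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def loglik(y, masks, mu, sigma):
    """Observed-data log-likelihood (6) for s-normal lifetimes."""
    total = 0.0
    for yi, mask in zip(y, masks):
        a = [(yi - mj) / sj for mj, sj in zip(mu, sigma)]
        total += sum(math.log(_tail(aj)) for aj in a)       # log prod_l S_l(y_i)
        if mask:                                             # failure: log sum_{j in M_i} h_j(y_i)
            total += math.log(sum(_phi(a[j - 1]) / (sigma[j - 1] * _tail(a[j - 1])) for j in mask))
    return total


def em_step(y, masks, mu, sigma):
    """One EM update of (mu_j, sigma_j) for all causes, using the closed forms above."""
    n, m = len(y), len(mu)
    # r[i][j]: inverse Mills ratio phi(a) / (1 - Phi(a)) at a = (y_i - mu_j) / sigma_j
    r = [[_phi((yi - mu[j]) / sigma[j]) / _tail((yi - mu[j]) / sigma[j]) for j in range(m)]
         for yi in y]
    new_mu, new_sigma = [], []
    for j in range(m):
        s1 = s2 = 0.0
        for i, (yi, mask) in enumerate(zip(y, masks)):
            p = 0.0                                          # p_ij^(k); stays 0 if censored or j not in M_i
            if (j + 1) in mask:                              # hazard h_l = r_il / sigma_l
                p = (r[i][j] / sigma[j]) / sum(r[i][l - 1] / sigma[l - 1] for l in mask)
            ez = mu[j] + sigma[j] * r[i][j]                  # E[Z_ij]
            ez2 = mu[j] ** 2 + sigma[j] ** 2 + sigma[j] * (mu[j] + yi) * r[i][j]  # E[Z_ij^2]
            s1 += p * yi + (1.0 - p) * ez
            s2 += p * yi * yi + (1.0 - p) * ez2
        new_mu.append(s1 / n)
        new_sigma.append(math.sqrt(s2 / n - (s1 / n) ** 2))
    return new_mu, new_sigma
```

Calling `em_step` repeatedly until the estimates stabilize gives the EM-type MLE. For very large standardized values $\xi_{ij}$, the tail probability underflows, so a more robust version would evaluate the inverse Mills ratio on the log scale.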

Note that if the data are fully observed, then the so that the EM sequences become simply the MLE of and . It is of interest to look at the role of and when an observation is incomplete. If an observation is right-censored, , which results in the full weight ( i.e., then ) toward and , the -expectations of the respective random variables and having the pdf truncated at . has a value between 0 If an observation is masked, then and 1, of which the value is determined by the extent of masking. That is, as the number of indices in the set gets larger, the value becomes smaller, which results in and . more weight on ACKNOWLEDGMENT This paper is dedicated to the memory and honor of Prof. B. H. Lee of Nuclear Engineering at Seoul National University and KAIST. The author’s interests in mathematics and engineering were formed under the strong influence of him. Prof. Lee passed away peacefully in July 2001. REFERENCES [1] M. L. Moeshberger and H. A. David, “Life tests under competing causes of failure and the theory of competing risks,” Biometrics, vol. 27, pp. 909–933, 1971. [2] D. R. Cox, “The analysis of exponentially distributed lifetimes with two types of failures,” J. Royal Statist. Soc. B, vol. 21, pp. 411–421, 1959. [3] R. J. Herman and R. K. N. Patell, “Maximum likelihood estimation for multi-risk model,” Technometrics, vol. 13, pp. 385–396, 1971. [4] M. Miyakawa, “Analysis of incomplete data in competing risks model,” IEEE Trans. Reliab., vol. 33, pp. 293–296, 1984. [5] J. S. Usher and T. J. Hodgson, “Maximum likelihood analysis of component reliability using masked system life-test data,” IEEE Trans. Reliab., vol. 37, pp. 550–555, 1988. [6] J. S. Usher and F. M. Guess, “An iterative approach for estimating component reliability from masked system life data,” Qual. Reliab. Eng. Int., vol. 5, pp. 257–261, 1989. [7] F. M. Guess, J. S. Usher, and T. J. Hodgson, “Estimating system and component reliabilities under partial information of the cause of failure,” J. 
Statist. Planning and Inference, vol. 29, pp. 75–85, 1991. [8] B. Reiser, I. Guttman, D. K. J. Lin, F. M. Guess, and H. S. Usher, “Bayesian inference for masked system lifetime data,” Appl. Statist., vol. 44, pp. 79–90, 1995. [9] D. Kundu and S. Basu, “Analysis of incomplete data in presence of competing risks,” J. Statist. Planning and Inference, vol. 87, pp. 221–239, 2000. [10] D. Kundu, “Parameter estimation for partially complete time and type of failure data,” Biometrical J., vol. 46, pp. 165–179, 2004. [11] C. Park and K. B. Kulasekera, “Parametric inference of incomplete data with competing risks among several groups,” IEEE Trans. Reliab., vol. 53, pp. 11–21, 2004. [12] R. Jiang and D. N. P. Murthy, “Study of n-fold Weibull competing risk model,” Mathematical and Computer Modeling, vol. 38, pp. 1259–1273, 2003.

[13] T. Ishioka and Y. Nonaka, “Maximum likelihood estimation of Weibull parameters for two independent competing risks,” IEEE Trans. Reliab., vol. 40, pp. 71–74, 1991.
[14] J. R. G. Albert and L. A. Baxter, “Applications of the EM algorithm to the analysis of life length data,” Appl. Statist., vol. 44, pp. 323–341, 1995.
[15] M. G. Larson and G. E. Dinse, “A mixture model for the regression analysis of competing risks data,” Appl. Statist., vol. 34, pp. 201–211, 1985.
[16] R. A. Maller and X. Zhou, “Analysis of parametric models for competing risks,” Statistica Sinica, vol. 12, pp. 725–750, 2002.
[17] R. J. Gray, “A class of k-sample tests for comparing the cumulative incidence of a competing risk,” Ann. Statist., vol. 16, pp. 1140–1154, 1988.
[18] E.-E. A. A. Aly, S. C. Kochar, and I. W. McKeague, “Some tests for comparing cumulative incidence functions and cause-specific hazard rates,” J. Amer. Statist. Assoc., vol. 89, pp. 994–999, 1994.
[19] Y. Sun and R. C. Tiwari, “Comparing cumulative incidence functions of a competing-risks model,” IEEE Trans. Reliab., vol. 46, pp. 247–253, 1997.
[20] H. Lindkvist and Y. Belyaev, “A class of nonparametric tests in the competing risks model for comparing two samples,” Scandinavian J. Statist., vol. 25, pp. 143–150, 1998.
[21] K. F. Lam, “A class of tests for the equality of k cause-specific hazard rates in a competing risks model,” Biometrika, vol. 85, pp. 179–188, 1998.
[22] X. Luo and B. W. Turnbull, “Comparing two treatments with multiple competing risks endpoints,” Statistica Sinica, vol. 9, pp. 986–998, 1999.
[23] S. B. Kulathinal and D. Gasbarra, “Testing equality of cause-specific hazard rates corresponding to m competing risks among k groups,” Lifetime Data Analysis, vol. 8, pp. 147–161, 2002.
[24] C. El-Nouty and R. Lancar, “The presmoothed Nelson-Aalen estimator in the competing risk model,” Communications in Statistics: Theory and Methods, vol. 33, pp. 135–151, 2004.
[25] R. Tiwari, K. B. Kulasekera, and C. Park, “Nonparametric tests for cause specific hazard rates with censored data in competing risks among several groups,” J. Statist. Planning and Inference, 2005, to be published.
[26] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Statist. Soc. B, vol. 39, pp. 1–22, 1977.
[27] J. L. Schafer, Analysis of Incomplete Multivariate Data. Chapman & Hall, 1997.
[28] R. J. A. Little and D. B. Rubin, Statistical Analysis With Missing Data, 2nd ed. New York: John Wiley & Sons, 2002.
[29] M. A. Tanner, Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer-Verlag, 1996.
[30] D. F. Heitjan, “Inference from grouped continuous data: a review (with discussion),” Statist. Sci., vol. 4, pp. 164–183, 1989.
[31] D. F. Heitjan and D. B. Rubin, “Inference from coarse data via multiple imputation with application to age heaping,” J. Amer. Statist. Assoc., vol. 85, pp. 304–314, 1990.
[32] O. O. Aalen, “Nonparametric estimation of partial transition probabilities in multiple decrement models,” Ann. Statist., vol. 6, pp. 534–545, 1978.
[33] C. Park and W. J. Padgett, “Analysis of strength distributions of multimodal failures using the EM algorithm,” Department of Statistics, University of South Carolina, Tech. Rep. 220, 2004.
[34] B. Efron and D. V. Hinkley, “Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information,” Biometrika, vol. 65, pp. 457–487, 1978.
[35] E. L. Lehmann, Theory of Point Estimation. New York: Springer-Verlag, 1998.
[36] P. J. Huber, “Robust estimation of a location parameter,” Ann. Math. Statist., vol. 35, pp. 73–101, 1964.
[37] R. J. Beran, “Minimum Hellinger distance estimates for parametric models,” Ann. Statist., vol. 5, pp. 445–463, 1977.
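The closing remark of the body text on the role of the weights for censored and masked observations can be made concrete with a small sketch. It is not the author's implementation: it assumes, hypothetically, independent normal latent lifetimes (one per cause), a masking mechanism that does not depend on the true cause, and that each weight is the conditional probability that cause j produced an uncensored failure given that the cause lies in the masking set. All function names (`cause_weight`, `truncated_moments`, `expected_sufficient_stats`, `em_step`) are illustrative. In this sketch an unmasked failure gets weight 1, a censored observation gets weight 0 (so only the truncated expectations contribute), and a masked failure gets a weight between 0 and 1 that shrinks as the masking set grows.

```python
import math

# Hypothetical sketch: independent normal latent lifetimes X_j ~ N(mu_j, sigma_j^2),
# one per cause j, with masking assumed not to depend on the true cause.


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_sf(z):
    """Standard normal survival function, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def hazard(t, mu, sigma):
    """Hazard rate of N(mu, sigma^2) evaluated at time t."""
    z = (t - mu) / sigma
    return norm_pdf(z) / (sigma * norm_sf(z))


def cause_weight(t, delta, mask, j, params):
    """Assumed weight: P(cause j | failure at t, cause in mask).

    Censored observations (delta == 0) and causes outside the mask get 0.
    Under independence the survival factors cancel, leaving a hazard ratio.
    """
    if delta == 0 or j not in mask:
        return 0.0
    total = sum(hazard(t, *params[l]) for l in mask)
    return hazard(t, *params[j]) / total


def truncated_moments(t, mu, sigma):
    """Return E[X | X > t] and E[X^2 | X > t] for X ~ N(mu, sigma^2)."""
    z = (t - mu) / sigma
    lam = norm_pdf(z) / norm_sf(z)  # inverse Mills ratio
    m1 = mu + sigma * lam
    m2 = mu * mu + sigma * sigma + sigma * (mu + t) * lam
    return m1, m2


def expected_sufficient_stats(t, delta, mask, j, params):
    """E-step for one observation: blend the observed time with truncated moments."""
    w = cause_weight(t, delta, mask, j, params)
    m1, m2 = truncated_moments(t, *params[j])
    return w * t + (1.0 - w) * m1, w * t * t + (1.0 - w) * m2, w


def em_step(data, params):
    """One EM iteration for every cause; data holds (t, delta, mask) triples."""
    n = len(data)
    new_params = {}
    for j in params:
        # All E-step quantities use the previous estimates in `params`.
        stats = [expected_sufficient_stats(t, d, m, j, params) for t, d, m in data]
        mu = sum(s[0] for s in stats) / n
        var = sum(s[1] for s in stats) / n - mu * mu
        new_params[j] = (mu, math.sqrt(var))
    return new_params
```

A typical use starts from rough guesses such as `params = {0: (10.0, 2.0), 1: (12.0, 3.0)}` and calls `em_step(data, params)` repeatedly until the estimates stabilize. Writing the weight as a hazard ratio is a design choice of this sketch: it avoids evaluating the survival functions of causes outside the masking set, which cancel anyway under the independence assumption. With a single cause and fully observed data, one step returns the sample mean and the (biased) sample standard deviation, which are the normal MLEs.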

Chanseok Park is an Assistant Professor of Mathematical Sciences at Clemson University, Clemson, SC. He received the B.S. degree in mechanical engineering from Seoul National University; the M.A. degree in mathematics from the University of Texas at Austin; and the Ph.D. degree in statistics in 2000 from the Pennsylvania State University. His research interests include minimum distance estimation, survival analysis, statistical computing and simulation, acoustics, and solid mechanics.
