Lifetime Data Analysis, 10, 293–308, 2004. © 2004 Kluwer Academic Publishers. Printed in The Netherlands.

Likelihood-Based Inference for the Shape Parameter of a Two Parameter Weibull Distribution

SHAUL K. BAR-LEV
Department of Statistics, University of Haifa, Haifa 31905, Israel


Received August 22, 2002; Revised September 8, 2003; Accepted December 22, 2003

Abstract. Based on a Type 2 censored sample, we use the likelihood-based approach to draw likelihood inference on the shape parameter c of a two-parameter Weibull distribution. In particular, we derive the profile, conditional and marginal likelihoods of c. Numerical results along with some concluding remarks regarding the use of likelihood-based methods for inference are provided.

Keywords: conditional likelihood, likelihood interval, marginal likelihood, profile likelihood, Type 2 censored sample, Weibull distribution

1. Introduction

Based on a Type 2 censored sample, we use the likelihood-based approach to draw likelihood inference on the shape parameter of a two parameter Weibull distribution. The scale and shape Weibull distribution is given by a probability density function (p.d.f.) and a cumulative distribution function (c.d.f.) of the form

$$f(x : c, h) = (c/h)(x/h)^{c-1} \exp\{-(x/h)^{c}\}\, I_{(0,\infty)}(x), \qquad h > 0, \quad c > 0, \tag{1}$$

and

$$F(x : c, h) = [1 - \exp\{-(x/h)^{c}\}]\, I_{(0,\infty)}(x), \tag{2}$$

where $I_A(x)$ designates the indicator function of a set A and $-\infty < x < \infty$. The Weibull distribution plays an important role in life testing and survival analysis. The special case where c = 1 makes (2) the exponential distribution with scale parameter h. Various classical procedures, such as tests of hypotheses and confidence intervals, as well as Bayesian methods have been proposed for inference on the shape parameter c (see Lawless 1982, 2003; Nelson 1982, 1990; Meeker and Escobar, 1998). In this study we propose a likelihood-based approach for likelihood inference on c, which also provides plausibility statements regarding the particular value c = 1.

The likelihood approach was first suggested by Fisher (cf., Fisher, 1956) and later developed, among others, by Birnbaum (1962), Barnard (1966), Barnard, Jenkins and Winsten (1962), Sprott and Kalbfleisch (1969), Kalbfleisch and Sprott (1970a, 1970b, 1973), Edwards (1972), Lindsey (1973), Dawid (1975), Sprott (1975), Pace and Salvan (1997, Chapter 4) and Royall (1997). Applications of likelihood methods to some problems in life testing can be found in Whitney (1974) and Reiser


and Bar-Lev (1979). Some likelihood methods, employed in various directions and contexts, are discussed and developed in Severini (2000), which contains a good survey of likelihood-based methods. The likelihood approach has been applied to various areas, such as time series, life testing, linear models and psychological stochastic learning.

Basically, this approach embraces the likelihood principle, which states that the likelihood function contains all available information on the unknown parameters that can be extracted from the sample. Those parameter values for which there is a relatively large probability of obtaining the observed sample are considered as being supported by the data and are therefore regarded as more plausible, and vice versa. The most plausible value of an unknown parameter is the related maximum likelihood estimate (MLE). If $L(f)$ is the likelihood function of f (possibly a vector) based on given sample data, and $\hat{f}$ is the MLE of f, then the relative likelihood function of f is the ratio $R(f) := L(f)/L(\hat{f})$, which ranges between 0 and 1. Values of f for which $R(f)$ is “small” are viewed as implausible, whereas values making $R(f)$ “large” are considered plausible. The set $\{f : R(f) \ge a\}$ is called a 100a% likelihood region for f. Accordingly, one might consider values of f within a 50% likelihood region as highly plausible, whereas values of f outside a 5% or a 10% likelihood region are regarded as implausible.

Whether to use the likelihood principle-based approach for inference or the more commonly used methods of classical, frequency-based, statistical inference has no definitive simple answer. We have found some summarizing statements in Severini (2000, pp. 79–80) that provide at least a partial answer to this question: “The likelihood principle is in conflict with another important principle of statistical inference, the repeated sampling principle”, which states “that statistical procedures should be evaluated on the basis of their behavior in hypothetical repetitions of the experiment that generated the original data”, although “there is considerable arbitrariness how this principle is interpreted, in particular, in how the hypothetical repetitions are defined”. Finally, Severini states that “the main reason that the likelihood principle is not routinely used is most likely due to the fact that many users of statistical methods find other principles, such as repeated sampling principle, more compelling, rather than the fact that the likelihood function gives misleading inferences in some rare cases.”

In this paper we apply the likelihood approach to the Weibull distribution for the situation where the shape and scale parameters c and h are regarded as the structural and nuisance parameters, respectively. General likelihood-based methods have been proposed for the elimination of a nuisance parameter while focusing on the structural parameter only. Some of these methods are the profile likelihood, marginal likelihood, conditional likelihood and integrated likelihood (for a brief description, see Pace and Salvan, 1997; Severini, 2000). The first three of these likelihoods are used in the sequel for likelihood inference on the shape parameter c of the Weibull distribution. For the sake of completeness, these three methods will be briefly reviewed in Section 2. In Section 3, we derive the profile, marginal and conditional likelihoods of c. A numerical example, based on a Type 2 censored sample from (1), is provided in Section 4. Likelihood inferences resulting from these three likelihoods


are then presented and compared. Some concluding remarks regarding the use of likelihood-based methods for inference are presented in Section 5.

2. Profile, Marginal and Conditional Likelihoods

In this section we review the notions of profile, marginal and conditional likelihoods that are employed in the sequel. This review is primarily based on Sprott and Kalbfleisch (1969), Kalbfleisch and Sprott (1970a, 1970b, 1973) and Sprott (1975). Additional references are Barndorff-Nielsen (1978), Bhapkar (1991), Jorgensen (1993), Zhu and Reid (1994) and Pace and Salvan (1997, Chapter 4).

Consider a random sample $x = (x_1, \ldots, x_n)$ drawn from a distribution depending on a parameter f and let (c, h) be a partition of f, where c is the parameter of interest (the structural parameter) and h is a nuisance parameter (both c and h may be vectors). Assume, without loss of generality, that the joint p.d.f. $f(x : c, h)$ of x is continuous. The likelihood function of (c, h) is proportional to $f(x : c, h)$ up to a constant not depending on (c, h) and we may write $L(c, h) = f(x : c, h)$. Inferences on the structural parameter c can be obtained by eliminating the nuisance parameter h from the model and constructing a likelihood function depending on c only.

The profile likelihood of c provides such an elimination by simply replacing the nuisance parameter h with its MLE $\hat{h}(c)$, while keeping c fixed. The profile likelihood is therefore defined by $P(c) = \sup_h L(c, h) = L(c, \hat{h}(c))$, so that the relative profile likelihood is

$$RP(c) := \frac{P(c)}{\sup_c P(c)} = \frac{P(c)}{L(\hat{c}, \hat{h})}, \tag{3}$$

where $\hat{c}$ and $\hat{h}$ are the respective MLEs of c and h. $RP(c)$ serves as an upper bound on the relative joint likelihood of (c, h). The main drawback to the use of $RP(c)$ is that it assumes that for any fixed c, the nuisance parameter h attains its most likely value. In small samples or when the dimension of h is large, replacing h by an estimator may lead to a loss of some measure of accuracy concerning inferential conclusions on c.

Two other methods for eliminating the nuisance parameter from the model are marginal and conditional likelihoods of the structural parameter. For their presentation we employ the definitions given in Sprott and Kalbfleisch (1969). We begin with a marginal likelihood of c, which is defined as follows. Assume that the p.d.f. $f(x : c, h) \prod_{i=1}^{n} dx_i$ satisfies the two following conditions:

(M1) There exists a nonsingular transformation $(x_1, \ldots, x_n) \to (y_1, \ldots, y_r, y_{r+1}, \ldots, y_n)$ such that $f(x : c, h) \prod_{i=1}^{n} dx_i$ can be decomposed into two p.d.f.'s as

$$f(x : c, h) \prod_{i=1}^{n} dx_i = \left[ g(y_1, \ldots, y_r : c) \prod_{i=1}^{r} dy_i \right] \left[ h(y_{r+1}, \ldots, y_n : c, h \mid y_1, \ldots, y_r) \prod_{i=r+1}^{n} dy_i \right]. \tag{4}$$


(M2) The submodel defined by the conditional p.d.f. $h(y_{r+1}, \ldots, y_n : c, h \mid y_1, \ldots, y_r) \prod_{i=r+1}^{n} dy_i$ contains no information on c in the absence of knowledge on h.

Condition M1 implies that $(y_1, \ldots, y_r)$ is jointly ancillary for h in the presence of c. This joint ancillary may depend on c. The marginal likelihood and the relative marginal likelihood of c are defined, respectively, by

$$M(c) := g(y_1, \ldots, y_r : c) \prod_{i=1}^{r} dy_i, \tag{5}$$

$$RM(c) := \frac{M(c)}{\sup_c M(c)}. \tag{6}$$

By this definition of a marginal likelihood, the information on c that may be contained in the conditional submodel $h(y_{r+1}, \ldots, y_n : c, h \mid y_1, \ldots, y_r) \prod_{i=r+1}^{n} dy_i$ is ignored. Condition M2 assumes that no loss of information on c is caused by focusing on the marginal submodel $g(y_1, \ldots, y_r : c) \prod_{i=1}^{r} dy_i$, making the conditional submodel nonformative with respect to c in the absence of knowledge on h. A natural question then arises of how one can define, both intuitively and formally, the notion of a nonformative submodel. Indeed, many attempts have been made in the literature to define this notion and various definitions have been proposed, implying that the definition of a marginal likelihood is not unique. All such definitions, stemming from different perspectives, are intuitively appealing. One would expect, however, that the resulting marginal likelihoods, even if they are analytically different, would yield similar shapes. (Indeed, in this respect, see Figure 1 in Section 4 relating to a comparison between the marginal and conditional likelihoods for the shape parameter of the Weibull distribution.) For specific models, some of the definitions of nonformative submodels may result in the same marginal likelihood (as is the case with the marginal likelihood for the shape parameter in the Weibull model; see Section 3.2). It is beyond the scope of this paper to review the various definitions and the reader is referred to the aforementioned references and in particular to Sprott (1975), Barndorff-Nielsen (1978), Godambe (1980), Remon (1984), Bhapkar (1991), Zhu and Reid (1994) and Pace and Salvan (1997, Chapter 4). A good description of this problem, i.e., whether there is a universal definition for a submodel to be nonformative for a structural parameter in the presence of a nuisance parameter, can be found in Jorgensen (1993).

The Weibull case, treated in this paper, will be shown to satisfy two of these definitions: one proposed by Sprott and Kalbfleisch (1969) and the other by Barnard and Sprott (1983). Note that if the vector $(y_1, \ldots, y_r)$ is functionally independent of c then the volume element $\prod_{i=1}^{r} dy_i$ can be omitted from $M(c)$ in (5). Otherwise, $\prod_{i=1}^{r} dy_i = \prod_{i=1}^{r} dy_i(c)$ cannot be ignored. Sprott and Kalbfleisch (1969) computed, under some Euclidean assumptions, the volume element $\prod_{i=1}^{r} dy_i(c)$ and showed that up to a constant not depending on c it is equal to


$$|K^{T} K|^{1/2} / |L|, \tag{7}$$

where $|L|$ is the determinant of the Jacobian

$$L = \frac{\partial(x_1, \ldots, x_n)}{\partial(y_1, \ldots, y_n)} \tag{8}$$

associated with the transformation $(x_1, \ldots, x_n) \to (y_1, \ldots, y_r, y_{r+1}, \ldots, y_n)$ and K is the $n \times (n - r)$ matrix defined by

$$K = \left( \frac{\partial x_i}{\partial y_j} \right)_{i = 1, \ldots, n;\; j = r+1, \ldots, n}. \tag{9}$$

Consequently, substituting (7) in (5) gives

$$M(c) = g(y_1, \ldots, y_r : c)\, |K^{T} K|^{1/2} / |L|. \tag{10}$$

Similarly, following Sprott and Kalbfleisch (1969), a conditional likelihood of c requires the two following conditions to hold:

(C1) There exists a nonsingular transformation $(x_1, \ldots, x_n) \to (t_1, \ldots, t_k, y_{k+1}, \ldots, y_n)$ such that $f(x : c, h) \prod_{i=1}^{n} dx_i$ can be decomposed into two p.d.f.'s as

$$f(x : c, h) \prod_{i=1}^{n} dx_i = \left[ f_1(x_1, \ldots, x_n : c \mid t_1, \ldots, t_k) \prod_{i=1}^{n} dx_i \Big/ \prod_{i=1}^{k} dt_i \right] \left[ f_2(t_1, \ldots, t_k : c, h) \prod_{i=1}^{k} dt_i \right]. \tag{11}$$

(C2) The marginal p.d.f. $f_2(t_1, \ldots, t_k : c, h) \prod_{i=1}^{k} dt_i$ contains no available information on c in the absence of knowledge on h.

In this case the conditional and relative conditional likelihoods of c are defined by

$$C(c) = f_1(x_1, \ldots, x_n : c \mid t_1, \ldots, t_k) \prod_{i=1}^{n} dx_i \Big/ \prod_{i=1}^{k} dt_i, \tag{12}$$

$$RC(c) := \frac{C(c)}{\sup_c C(c)}. \tag{13}$$

Note that condition C1 states that $(t_1, \ldots, t_k)$ is jointly sufficient for h for any fixed c. As for the marginal likelihood, the statement in C2 that the marginal submodel $f_2(t_1, \ldots, t_k : c, h) \prod_{i=1}^{k} dt_i$ contains no available information on c in the absence of knowledge on h has also been given various definitions, implying again that the definition of a conditional likelihood is not unique. If $\prod_{i=1}^{n} dx_i / \prod_{i=1}^{k} dt_i$ does not depend on c, it can be deleted from (12). Otherwise, Sprott and Kalbfleisch (1969) showed, under some Euclidean assumptions, that up to a constant not depending on c it equals

$$1 / |J J^{T}|^{1/2}, \tag{14}$$

where J is the $k \times n$ matrix given by

$$J = \left( \frac{\partial t_i}{\partial x_j} \right)_{i = 1, \ldots, k;\; j = 1, \ldots, n}. \tag{15}$$

Hence, substituting (14) in (12) yields

$$C(c) = f_1(x_1, \ldots, x_n : c \mid t_1, \ldots, t_k) / |J J^{T}|^{1/2}. \tag{16}$$

3. An Application for the Weibull Distribution

In this section we derive the profile, marginal and conditional likelihoods of the shape parameter c of the Weibull distribution (1) under a Type 2 censored sample. Assume that n independent items with survival distribution (1) are placed on a test. The test stops once the predetermined r-th failure time is observed, where $1 \le r \le n$. Let $x_1 \le \cdots \le x_r$ denote the corresponding r failure times; then their joint p.d.f. is

$$f(x_1, \ldots, x_r : c, h) = C \left( \frac{c}{h^{c}} \right)^{r} \prod_{i=1}^{r} x_i^{c-1} \exp\{-T(c)/h^{c}\}\, I_{(0,\infty)}(x_1), \tag{17}$$

where $C = n!/(n-r)!$ and

$$T(c) := \sum_{i=1}^{r} x_i^{c} + (n-r)\, x_r^{c} = \sum_{i=1}^{r} (n-i+1)(x_i^{c} - x_{i-1}^{c}), \qquad x_0 \equiv 0. \tag{18}$$
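As a quick numerical check of (17) and (18), the following sketch evaluates both forms of T(c) and the logarithm of the joint likelihood. It is only an illustration under stated assumptions: the function names are hypothetical, the additive constant log C is dropped, and the failure times are the ones reported in (39) of Section 4.

```python
import math

# Failure times and sample size taken from the numerical example in (39).
X = [1.9, 5.6, 9.9, 19.3, 24.5]
N = 10


def t_stat(c, x=X, n=N):
    """T(c) from (18), first form: sum of x_i^c plus (n - r) * x_r^c."""
    r = len(x)
    return sum(xi ** c for xi in x) + (n - r) * x[-1] ** c


def t_stat_spacings(c, x=X, n=N):
    """T(c) from (18), second form: weighted spacings, with x_0 = 0."""
    total, previous = 0.0, 0.0
    for i, xi in enumerate(x, start=1):
        current = xi ** c
        total += (n - i + 1) * (current - previous)
        previous = current
    return total


def log_joint_likelihood(c, h, x=X, n=N):
    """Logarithm of (17), omitting the additive constant log(n!/(n-r)!)."""
    r = len(x)
    return (r * math.log(c) - r * c * math.log(h)
            + (c - 1.0) * sum(math.log(xi) for xi in x)
            - t_stat(c, x, n) / h ** c)


if __name__ == "__main__":
    # Both forms of T(c) should agree up to rounding for any c > 0.
    for c in (0.5, 1.0, 2.0):
        print(c, t_stat(c), t_stat_spacings(c))
```

The second form of T(c) is the one used in the proof of Lemma 1, where the weighted spacings of the transformed observations play the central role.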

The joint likelihood $L(c, h)$, based on $x_1 \le \cdots \le x_r$, equals $f(x_1, \ldots, x_r : c, h)$ in (17). Whitney (1974) used the relative joint likelihood of (c, h) to plot contours for likelihood inference on (c, h).

3.1. The Profile Likelihood of c

For fixed c, the MLE $\hat{h}(c)$ for h is defined by the relation $L(c, \hat{h}(c)) = \sup_h L(c, h)$. Hence, by a straightforward differentiation of $\ln L(c, h)$, we obtain that $\hat{h}(c) = (T(c)/r)^{1/c}$, whereas the MLE $\hat{c}$ for c is the unique solution of the likelihood equation

$$\frac{r}{\hat{c}} - r\, \frac{\sum_{i=1}^{r} x_i^{\hat{c}} \ln x_i + (n-r)\, x_r^{\hat{c}} \ln x_r}{T(\hat{c})} + \sum_{i=1}^{r} \ln x_i = 0. \tag{19}$$

Thus, the relative profile likelihood in (3) is given by

$$RP(c) = \left( \frac{c\, T(\hat{c})}{\hat{c}\, T(c)} \right)^{r} \prod_{i=1}^{r} x_i^{(c - \hat{c})}. \tag{20}$$
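The steps of this subsection can be sketched in a few lines: solve the likelihood equation (19) for the MLE of c, plug it into the closed form for the scale estimate, and evaluate the relative profile likelihood (20). The sketch below is an illustration only. It uses the failure times reported in (39), plain bisection as the root finder (an assumption, since the paper does not say how (19) was solved), and hypothetical function names.

```python
import math

X = [1.9, 5.6, 9.9, 19.3, 24.5]   # failure times from (39)
N = 10                            # number of items on test
R = len(X)                        # number of observed failures
SUM_LOG_X = sum(math.log(xi) for xi in X)


def t_stat(c):
    """T(c) as defined in (18)."""
    return sum(xi ** c for xi in X) + (N - R) * X[-1] ** c


def h_hat(c):
    """MLE of the scale parameter h for fixed c: (T(c)/r)^(1/c)."""
    return (t_stat(c) / R) ** (1.0 / c)


def profile_score(c):
    """Left-hand side of the likelihood equation (19)."""
    weighted = (sum(xi ** c * math.log(xi) for xi in X)
                + (N - R) * X[-1] ** c * math.log(X[-1]))
    return R / c - R * weighted / t_stat(c) + SUM_LOG_X


def bisect_root(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_profile(c, c_hat):
    """RP(c) from (20), computed on the log scale for stability."""
    log_rp = (R * (math.log(c) - math.log(c_hat)
                   + math.log(t_stat(c_hat)) - math.log(t_stat(c)))
              + (c - c_hat) * SUM_LOG_X)
    return math.exp(log_rp)


# The bracket [0.1, 5.0] is an assumption that suits this data set.
C_HAT = bisect_root(profile_score, 0.1, 5.0)

if __name__ == "__main__":
    print("MLE of c:", round(C_HAT, 3))
    print("h_hat at the MLE:", round(h_hat(C_HAT), 2))
    print("RP(1.0):", round(relative_profile(1.0, C_HAT), 3))
```

Working on the log scale avoids forming the very small and very large factors of (20) separately; the root found this way should lie close to the profile estimate reported in (40).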

3.2. A Marginal Likelihood for c

We first derive the marginal likelihood of c following criteria M1 and M2 of Section 2. We then follow a referee's suggestion to use the equivalence between the Weibull distribution and the extreme value distribution (which belongs to a location-scale family) to provide an alternative approach, which in our case results in the same solution. Consider the transformation $(x_1, \ldots, x_r) \to (y_1, \ldots, y_r, T(c))$, where the $y_i$'s are defined by

$$y_i = \frac{x_i^{c}}{T(c)}, \quad i = 1, \ldots, r-1; \qquad y_r = \frac{(n-r+1)\, x_r^{c}}{T(c)}; \qquad 0 \le y_1 \le \cdots \le y_r; \qquad \sum_{i=1}^{r} y_i = 1. \tag{21}$$

This transformation satisfies both conditions M1 and M2, as is shown in the next lemma.

Lemma 1. Let $(y_1, \ldots, y_r)$ be defined by (21). Then

(i) $T(c) \sim$ gamma$(r, h^{c})$, where gamma$(r, b)$ designates the gamma distribution with shape and scale parameters r and b, respectively.

(ii) For any given c, the random vector $(y_1, \ldots, y_r)$ is jointly ancillary for h.

(iii) The conditional distribution of $T(c)$ given $(y_1, \ldots, y_r)$ satisfies condition M2, i.e., it contains no available information on c in the absence of knowledge on h.

Proof: (i) Let $z_i = x_i^{c}$, $i = 1, \ldots, r$; then it is easy to show that $z_1 \le \cdots \le z_r$ are distributed as the first r order statistics in a random sample of size n stemming from an exponential distribution with scale parameter $h^{c}$. Note that by (18), $T(c) =_d \sum_{i=1}^{r} (n-i+1)(z_i - z_{i-1})$, where the symbol $=_d$ designates an equality in distribution. By using well known results on the exponential distribution, we get that $(n-i+1)(z_i - z_{i-1})$, $i = 1, \ldots, r$, are i.i.d. r.v.'s with common gamma$(1, h^{c})$ distribution. This implies (i).

(ii) Since $\sum_{i=1}^{r} y_i = 1$, the joint distribution of $(y_1, \ldots, y_r)$, defined in (21), is supported on a hyperplane. Accordingly, we consider the transformation $(x_1, \ldots, x_r) \to (y_1, \ldots, y_{r-1}, T(c))$. The Jacobian of such a transformation is

$$|J| = \left| \frac{\partial(x_1, \ldots, x_r)}{\partial(y_1, \ldots, y_{r-1}, T(c))} \right| = \frac{(T(c))^{r/c - 1}}{c^{r} (n-r+1)^{1/c}} \left( 1 - \sum_{i=1}^{r-1} y_i \right)^{1/c - 1} \prod_{i=1}^{r-1} y_i^{1/c - 1}. \tag{22}$$


Thus, the joint p.d.f. of $(y_1, \ldots, y_{r-1}, T(c))$ and the marginal p.d.f. of $(y_1, \ldots, y_{r-1})$ are given, respectively, by

$$f_3(y_1, \ldots, y_{r-1}, T(c)) = \frac{n!}{(n-r)!\,(n-r+1)}\, \frac{(T(c))^{r-1}}{h^{cr}} \exp\left( -\frac{T(c)}{h^{c}} \right) I_{(0,\infty)}(T(c))\, I_{(0,1)}\left( \sum_{i=1}^{r-1} y_i \right) \tag{23}$$

and

$$g(y_1, \ldots, y_{r-1}) = \frac{n!\,(r-1)!}{(n-r)!\,(n-r+1)}\, I_{(0,1)}\left( \sum_{i=1}^{r-1} y_i \right) I_{(0,\infty)}(y_1). \tag{24}$$

Since $T(c) \sim$ gamma$(r, h^{c})$, we obtain from (23) and (24) that $(y_1, \ldots, y_{r-1})$ and $T(c)$ are stochastically independent and that for any given c, $(y_1, \ldots, y_{r-1})$ is jointly ancillary for h. Note also that the marginal p.d.f. of $(y_1, \ldots, y_{r-1})$ in (24) does not depend on c and is constant in $(y_1, \ldots, y_{r-1})$ (which by (21) are functions of c). Nevertheless, the marginal likelihood of c is not constant in c, since by (5) and (7), the factor $\prod_{i=1}^{r-1} dy_i(c)$ of $M(c)$ does depend on c.

(iii) We have to show that the conditional distribution of $T(c)$ given $(y_1, \ldots, y_{r-1})$ satisfies condition M2. However, in part (ii) we showed that $T(c)$ and $(y_1, \ldots, y_{r-1})$ are stochastically independent. Accordingly, we have to show that the marginal distribution of $T(c)$ does not contain any available information on c in the absence of knowledge of h. We show this by employing two definitions of the notion. One appears in condition (b) of Sprott (1975); the other has been proposed by Kalbfleisch and Sprott (1970a). Indeed, by part (i), $T(c) \sim$ gamma$(r, h^{c})$. Thus, $U = T(c)/h^{c} \sim$ gamma$(r, 1)$ is a pivotal quantity for h. This satisfies the premises of condition (b) in Sprott (1975). Furthermore, denote $b = h^{c}$; then for any given c, b is a one-to-one function of h. Consequently, for a given b, c is not identifiable in the absence of knowledge of h. This conclusion fulfills the appropriate condition in one of the definitions given by Kalbfleisch and Sprott (1970a), concluding that the distribution of $T(c)$ does not contain any available information on c in the absence of knowledge on h. □

The general expression for $M(c)$ is given by (10), where general expressions for L and K are given by (8) and (9), respectively. For the present situation, L = J, and hence by (22), we obtain

$$|L| = |J| = c^{-r} (T(c))^{r-1} \prod_{i=1}^{r} x_i^{1-c}, \tag{25}$$

whereas $K^{T}$ is given by

$$K^{T} = \left( \frac{\partial x_1}{\partial T(c)}, \ldots, \frac{\partial x_r}{\partial T(c)} \right) = \left( \frac{1}{c} [T(c) y_1]^{1/c - 1} y_1, \; \ldots, \; \frac{1}{c} [T(c) y_{r-1}]^{1/c - 1} y_{r-1}, \; \frac{1}{c} [T(c) y_r]^{1/c - 1} \frac{y_r}{(n-r+1)^{1/c}} \right),$$

which implies that

$$|K^{T} K|^{1/2} = \frac{1}{c} \left[ \sum_{i=1}^{r-1} (T(c) y_i)^{2/c - 2} y_i^{2} + \left( \frac{T(c) y_r}{n-r+1} \right)^{2/c - 2} \frac{y_r^{2}}{(n-r+1)^{2}} \right]^{1/2} = \frac{1}{c\, T(c)} \left( \sum_{i=1}^{r} x_i^{2} \right)^{1/2}. \tag{26}$$

By substituting (22), (25) and (26) in (10), the marginal likelihood is obtained as

$$M(c) = C c^{r-1} (T(c))^{-r} \prod_{i=1}^{r} x_i^{c}, \tag{27}$$

where C is a constant not depending on c. Finally, the relative marginal likelihood is given by

$$RM(c) = \left( \frac{c}{\hat{c}_M} \right)^{r-1} \left( \frac{T(\hat{c}_M)}{T(c)} \right)^{r} \prod_{i=1}^{r} x_i^{c - \hat{c}_M}, \tag{28}$$

where $\hat{c}_M$, the maximum marginal likelihood estimate, is the unique solution of the marginal likelihood equation

$$\frac{r-1}{\hat{c}_M} - r\, (T(\hat{c}_M))^{-1} \left[ \sum_{i=1}^{r} x_i^{\hat{c}_M} \ln x_i + (n-r)\, x_r^{\hat{c}_M} \ln x_r \right] + \sum_{i=1}^{r} \ln x_i = 0. \tag{29}$$

A referee has suggested an alternative approach for deriving a marginal likelihood for c by using the equivalence existing between the Weibull distribution (1) and the extreme value $(\mu, \sigma)$ distribution under the one-to-one transformation $w = \log x$, $\mu = \log h$ and $\sigma = c^{-1}$. Such a transformation results in

$$f_2(w : \mu, \sigma) = \sigma^{-1} \exp[(w - \mu)/\sigma - \exp((w - \mu)/\sigma)], \qquad -\infty < w < \infty, \tag{30}$$

for the p.d.f. of w. This is a location-scale family allowing an exact conditional analysis by conditioning on the maximal ancillary in the form proposed by Fisher (1934), which yields a marginal likelihood for $\sigma$ (and therefore for c) not depending on a Euclidean assumption. More specifically, consider a Type 2 censored sample $w_1 \le \cdots \le w_r$ from (30), where $w_i = \log x_i$, $i = 1, \ldots, r$. For this situation the MLEs $\hat{\mu}$ and $\hat{\sigma}$ for $\mu$ and $\sigma$, respectively, are equivariant estimators, whereas the maximal ancillary is $a = (a_1, \ldots, a_r)$, $a_i = (w_i - \hat{\mu})/\hat{\sigma}$, $i = 1, \ldots, r$. Here, the $a_i$'s are invariant under a group of location-scale transformations so that the distribution of a does not depend on $(\mu, \sigma)$. Moreover, $z_1 = (\hat{\mu} - \mu)/\hat{\sigma}$ and $z_2 = \hat{\sigma}/\sigma$ are pivotal quantities for $\mu$ and $\sigma$, respectively (for further details see Barnard and Sprott, 1983; Lawless, 1982, Section 4). Following Fisher (1934), Barnard and Sprott suggested basing inferences for $\mu$ and $\sigma$ on the conditional p.d.f. of $z_1$ and $z_2$ given a, where integration with respect to $z_1$ gives the marginal p.d.f. of $z_2$ (given a) on which inference for $\sigma$ can be based. Fortunately, for Type 2 censoring, this marginal p.d.f. has been derived by Lawless (1982, Theorem 4.1.3, p. 150) in the form


$$f_3(z_2) = k(a, r, n) \exp\left( -\sum_{i=1}^{r} a_i \right) \left( r^{-1} \left( \sum_{i=1}^{r} \exp(a_i z_2) + (n-r) \exp(a_r z_2) \right) \right)^{-r} \exp\left( z_2 \sum_{i=1}^{r} a_i \right), \tag{31}$$

where $z_2 > 0$ and $k(a, r, n)$ is a function depending on a, r, and n only. To obtain from (31) a marginal likelihood for c, we employ the transformation $z_2 \to \sigma z_2 = c^{-1} z_2$ and then in the resulting expression substitute $a_i = (w_i - \hat{\mu})/\hat{\sigma} = \hat{c}(\log x_i - \log \hat{h}) = \hat{c} \log(x_i/\hat{h})$, $i = 1, \ldots, r$. This yields the following expression:

$$C c^{r-1} (T(c))^{-r} \prod_{i=1}^{r} x_i^{c}, \tag{32}$$

as a marginal likelihood for c. Note that (32) coincides with the marginal likelihood for c in (27), which was obtained using Sprott and Kalbfleisch's (1969, 1970a, 1970b, 1973) approach. This coincidence does not seem to be too surprising. Both approaches have attempted to exploit all available information contained in the sample about c in terms of marginal likelihoods. The marginal likelihood of c in (27) was obtained following the Sprott and Kalbfleisch approach using a maximal ancillary for h which depends on c and a Euclidean assumption. Alternatively, the Barnard and Sprott approach for a location-scale family is based on Fisher's idea of conditioning on a maximal ancillary and extracting the information on the scale parameter.

3.3. The Relative Conditional Likelihood of c

Using Lemma 1, it can be readily seen that $T(c)$ is sufficient for h, for any fixed c. Moreover, it has been shown in the proof of part (iii) of Lemma 1 that the distribution of $T(c)$ does not contain any available information on c in the absence of knowledge on h. Hence, by utilizing (12) and the notation in (11), the conditional likelihood of c can be written as

$$C(c) = f_1(x_1, \ldots, x_r : c \mid T(c)) \prod_{i=1}^{r} dx_i / dT(c), \tag{33}$$

where $\prod_{i=1}^{r} dx_i / dT(c) = 1/|J J^{T}|^{1/2}$ and

$$J = \left( \partial T(c)/\partial x_1, \ldots, \partial T(c)/\partial x_r \right) = \left( c x_1^{c-1}, \ldots, c x_{r-1}^{c-1}, (n-r+1)\, c x_r^{c-1} \right),$$

implying that

$$|J J^{T}|^{1/2} = c \left[ \sum_{i=1}^{r-1} x_i^{2c-2} + (n-r+1)^{2} x_r^{2c-2} \right]^{1/2}. \tag{34}$$
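The closed form (34) can be checked numerically by differentiating T(c) with respect to each failure time and taking the Euclidean norm of the resulting row vector J. The sketch below is such a check under stated assumptions: it uses central finite differences (a swapped-in numerical device, not part of the derivation), the failure times reported in (39), and hypothetical function names.

```python
import math

X = [1.9, 5.6, 9.9, 19.3, 24.5]   # failure times from (39)
N = 10


def t_stat(x, c, n=N):
    """T(c) from (18) for an ordered sample x of r failure times."""
    r = len(x)
    return sum(xi ** c for xi in x) + (n - r) * x[-1] ** c


def numeric_gradient(x, c, step=1e-6):
    """Central-difference approximation of J = (dT/dx_1, ..., dT/dx_r)."""
    grad = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += step
        down[j] -= step
        grad.append((t_stat(up, c) - t_stat(down, c)) / (2.0 * step))
    return grad


def jjt_sqrt_numeric(x, c):
    """|J J^T|^(1/2) as the Euclidean norm of the numerical gradient."""
    return math.sqrt(sum(g * g for g in numeric_gradient(x, c)))


def jjt_sqrt_closed_form(x, c, n=N):
    """Right-hand side of (34)."""
    r = len(x)
    inner = (sum(xi ** (2 * c - 2) for xi in x[:-1])
             + (n - r + 1) ** 2 * x[-1] ** (2 * c - 2))
    return c * math.sqrt(inner)


if __name__ == "__main__":
    for c in (0.6, 1.0, 1.7):
        print(c, jjt_sqrt_numeric(X, c), jjt_sqrt_closed_form(X, c))
```

The weight $(n-r+1)$ on the last component arises because $x_r$ enters T(c) both through the sum and through the censoring term.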

Employing (17), (33) and (34), we obtain that the conditional likelihood of c is

$$C(c) = C_1 \left( \frac{c}{T(c)} \right)^{r-1} \left( \prod_{i=1}^{r} x_i^{c-1} \right) \left[ \sum_{i=1}^{r-1} x_i^{2c-2} + (n-r+1)^{2} x_r^{2c-2} \right]^{-1/2}, \tag{35}$$

where $C_1$ is a constant not depending on c. The relative conditional likelihood is therefore

$$RC(c) = \left( \frac{c\, T(\hat{c}_C)}{\hat{c}_C\, T(c)} \right)^{r-1} \prod_{i=1}^{r} x_i^{c - \hat{c}_C} \left[ \frac{\sum_{i=1}^{r-1} x_i^{2\hat{c}_C - 2} + (n-r+1)^{2} x_r^{2\hat{c}_C - 2}}{\sum_{i=1}^{r-1} x_i^{2c-2} + (n-r+1)^{2} x_r^{2c-2}} \right]^{1/2}, \tag{36}$$

where $\hat{c}_C$ is the maximum conditional likelihood estimate, which solves the conditional likelihood equation

$$\frac{r-1}{\hat{c}_C} - (r-1)\, (T(\hat{c}_C))^{-1} \left[ \sum_{i=1}^{r} x_i^{\hat{c}_C} \ln x_i + (n-r)\, x_r^{\hat{c}_C} \ln x_r \right] + \sum_{i=1}^{r} \ln x_i - R(c) = 0, \tag{37}$$

where $R(c)$ is defined by

$$R(c) = \frac{\sum_{i=1}^{r-1} x_i^{2\hat{c}_C - 2} \ln x_i + (n-r+1)^{2} x_r^{2\hat{c}_C - 2} \ln x_r}{\sum_{i=1}^{r-1} x_i^{2\hat{c}_C - 2} + (n-r+1)^{2} x_r^{2\hat{c}_C - 2}}. \tag{38}$$

It can be seen that the expression for the conditional likelihood (36) is analytically more cumbersome than the corresponding expression for the marginal likelihood in (28).

4. A Numerical Example

Numerous studies on the Weibull distribution are available in the literature. This reflects the important role of this distribution in data modelling and survival analysis, as well as the fact that the MLEs for the parameters involved do not have explicit expressions, but rather are given in terms of implicit likelihood equations (cf., (19)). Several alternative estimation procedures for the shape parameter c, as well as for the other parameters, have been proposed. These procedures include, among others, method of moments, best linear unbiased, median unbiased and minimum quantile estimation. For brevity, we do not review these procedures, but refer the reader to Lawless (1982), Johnson, Kotz and Balakrishnan (1994, Chapter 21) and Meeker and Escobar (1998) for an appropriate thorough survey.

We now implement the above procedure by employing the following Type 2 censored data taken from Mann, Schafer and Singpurwalla (1974): n = 10 items are placed on a test, while waiting for the fifth (r = 5) failure time to occur. The failure times have been taken, in fact, from an exponential distribution (i.e., the Weibull distribution with shape parameter c = 1). The five observed failure times are

$$x_1 = 1.9, \quad x_2 = 5.6, \quad x_3 = 9.9, \quad x_4 = 19.3, \quad x_5 = 24.5, \quad r = 5, \quad n = 10. \tag{39}$$


The three estimates, the MLE $\hat{c}$, the maximum marginal likelihood estimate (MMLE) $\hat{c}_M$ and the maximum conditional likelihood estimate (MCLE) $\hat{c}_C$, are obtained as the unique solutions of the likelihood equations (19), (29) and (37), respectively. Accordingly, for the given data in (39), one obtains

$$\hat{c} = 1.108, \qquad \hat{c}_M = 0.915, \qquad \hat{c}_C = 0.917. \tag{40}$$
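The estimates in (40) can be approximated by maximizing the logarithms of the profile likelihood, the marginal likelihood (27) and the conditional likelihood (35) directly. The following sketch does this with a golden-section search, which is a substitute for solving the likelihood equations (19), (29) and (37) and assumes each log-likelihood has a single maximum on the search interval. Function names and the search interval are illustrative assumptions, and small differences in the last reported digit are to be expected.

```python
import math

X = [1.9, 5.6, 9.9, 19.3, 24.5]   # failure times from (39)
N = 10
R = len(X)
SUM_LOG_X = sum(math.log(xi) for xi in X)


def t_stat(c):
    """T(c) as defined in (18)."""
    return sum(xi ** c for xi in X) + (N - R) * X[-1] ** c


def d_stat(c):
    """Bracketed sum appearing in (34) and (35)."""
    return (sum(xi ** (2 * c - 2) for xi in X[:-1])
            + (N - R + 1) ** 2 * X[-1] ** (2 * c - 2))


def log_profile(c):
    """Log of L(c, h_hat(c)) up to an additive constant."""
    return R * (math.log(c) - math.log(t_stat(c))) + c * SUM_LOG_X


def log_marginal(c):
    """Log of (27) up to an additive constant."""
    return (R - 1) * math.log(c) - R * math.log(t_stat(c)) + c * SUM_LOG_X


def log_conditional(c):
    """Log of (35) up to an additive constant."""
    return ((R - 1) * (math.log(c) - math.log(t_stat(c)))
            + (c - 1.0) * SUM_LOG_X - 0.5 * math.log(d_stat(c)))


def golden_section_max(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                      # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
        else:                            # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)


ESTIMATES = {
    "profile": golden_section_max(log_profile, 0.05, 5.0),
    "marginal": golden_section_max(log_marginal, 0.05, 5.0),
    "conditional": golden_section_max(log_conditional, 0.05, 5.0),
}

if __name__ == "__main__":
    for name, value in ESTIMATES.items():
        print(f"{name:12s} {value:.3f}")
```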

We observe that the MMLE and MCLE are almost equal. Using (20), (28) and (36), respectively, we obtain that the relative profile, marginal and conditional likelihoods are

$$RP(c) = \frac{10^{7}}{2.4476} \left( \frac{c}{T(c)} \right)^{5} a^{c}, \tag{41}$$

$$RM(c) = \frac{10^{7}}{2.4289} \left( \frac{c}{T(c)} \right)^{5} \frac{1}{c}\, a^{c}, \tag{42}$$

and

$$RC(c) = \frac{10^{6}}{7.056} \left( \frac{c}{T(c)} \right)^{4} \left( \frac{1}{D(c)} \right)^{1/2} a^{c}, \tag{43}$$

where

$$a = 4.9808 \times 10^{4}, \qquad T(c) = 1.9^{c} + 5.6^{c} + 9.9^{c} + 19.3^{c} + 6(24.5^{c}), \qquad D(c) = 1.9^{2c-2} + 5.6^{2c-2} + 9.9^{2c-2} + 19.3^{2c-2} + 36(24.5^{2c-2}). \tag{44}$$

Figure 1. The relative likelihoods in (41), (42) and (43).
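The closed forms (41) to (43) are easy to evaluate directly, which gives a simple way to redraw the curves of Figure 1 or to read off relative likelihoods at chosen values of c. The sketch below transcribes them with the constants in (44); the function names are hypothetical, and because the constants are rounded, each curve should be expected to equal 1 at its estimate in (40) only approximately.

```python
# Constants transcribed from (44).
A = 4.9808e4
TIMES = (1.9, 5.6, 9.9, 19.3)
LAST = 24.5


def t_of(c):
    """T(c) in (44): the last failure time carries weight n - r + 1 = 6."""
    return sum(x ** c for x in TIMES) + 6.0 * LAST ** c


def d_of(c):
    """D(c) in (44): the last failure time carries weight (n - r + 1)^2 = 36."""
    return sum(x ** (2 * c - 2) for x in TIMES) + 36.0 * LAST ** (2 * c - 2)


def rp(c):
    """Relative profile likelihood (41)."""
    return 1e7 / 2.4476 * (c / t_of(c)) ** 5 * A ** c


def rm(c):
    """Relative marginal likelihood (42)."""
    return 1e7 / 2.4289 * (c / t_of(c)) ** 5 / c * A ** c


def rc(c):
    """Relative conditional likelihood (43)."""
    return 1e6 / 7.056 * (c / t_of(c)) ** 4 * d_of(c) ** -0.5 * A ** c


if __name__ == "__main__":
    # Tabulate the three curves on a coarse grid spanning the range of Figure 1.
    for k in range(1, 13):
        c = 0.25 * k
        print(f"{c:4.2f}  {rp(c):.3f}  {rm(c):.3f}  {rc(c):.3f}")
```

Evaluating the three functions at c = 1 gives the plausibility of the exponential special case under each likelihood.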


Table 1. Relative Profile, Marginal and Conditional Likelihood Intervals.

                          Profile          Marginal         Conditional
5% likelihood interval    (0.318, 2.619)   (0.219, 2.334)   (0.239, 2.315)
10% likelihood interval   (0.382, 2.386)   (0.273, 2.113)   (0.293, 2.095)
50% likelihood interval   (0.647, 1.729)   (0.502, 1.494)   (0.519, 1.483)
90% likelihood interval   (0.907, 1.326)   (0.735, 1.119)   (0.743, 1.115)
Likelihood estimates      1.108            0.915            0.917
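Likelihood intervals such as those in Table 1 can be obtained by locating, on each side of the maximizer, the value of c at which the relative likelihood drops to the chosen level. The sketch below is one way to do this and is not necessarily how the table was produced: it assumes unimodal log-likelihoods, uses a golden-section search for the maximizer and bisection for the two endpoints, and relies on hypothetical function names and search bounds.

```python
import math

X = [1.9, 5.6, 9.9, 19.3, 24.5]   # failure times from (39)
N = 10
R = len(X)
SUM_LOG_X = sum(math.log(xi) for xi in X)


def t_stat(c):
    return sum(xi ** c for xi in X) + (N - R) * X[-1] ** c


def d_stat(c):
    return (sum(xi ** (2 * c - 2) for xi in X[:-1])
            + (N - R + 1) ** 2 * X[-1] ** (2 * c - 2))


def log_profile(c):
    return R * (math.log(c) - math.log(t_stat(c))) + c * SUM_LOG_X


def log_marginal(c):
    return (R - 1) * math.log(c) - R * math.log(t_stat(c)) + c * SUM_LOG_X


def log_conditional(c):
    return ((R - 1) * (math.log(c) - math.log(t_stat(c)))
            + (c - 1.0) * SUM_LOG_X - 0.5 * math.log(d_stat(c)))


def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - inv_phi * (b - a), a + inv_phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = f(x2)
        else:
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = f(x1)
    return 0.5 * (a + b)


def bisect_root(f, lo, hi, tol=1e-9):
    """Bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def likelihood_interval(log_lik, level, lo=0.01, hi=10.0):
    """Endpoints of {c : relative likelihood >= level} within [lo, hi]."""
    c_max = golden_section_max(log_lik, lo, hi)
    cutoff = log_lik(c_max) + math.log(level)

    def gap(c):
        return log_lik(c) - cutoff

    return bisect_root(gap, lo, c_max), bisect_root(gap, c_max, hi)


if __name__ == "__main__":
    curves = [("profile", log_profile), ("marginal", log_marginal),
              ("conditional", log_conditional)]
    for name, curve in curves:
        for level in (0.05, 0.10, 0.50, 0.90):
            lower, upper = likelihood_interval(curve, level)
            print(f"{name:12s} {level:4.0%}  ({lower:.3f}, {upper:.3f})")
```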

Figure 1 plots the three relative likelihoods in (41)–(43): $RP(c)$ (solid line), $RM(c)$ (dashed line) and $RC(c)$ (dotted line). We can see from Figure 1 that the relative marginal and conditional likelihoods almost coincide. (It is not feasible to distinguish between the dashed and dotted lines.) Both are more concentrated around c = 1 than the relative profile likelihood. The latter also tails off much more slowly for larger values of c. Recall that the main drawback of $RP(c)$ is that it assumes that, for any fixed c, the nuisance parameter h attains its most likely value. As opposed to $RP(c)$, the two other relative likelihoods, marginal and conditional, have been shown in Lemma 1 to contain, when h is unknown, all of the sampling information available on c. This fact is perhaps reflected in Figure 1 by the near coincidence of the two curves, marginal and conditional.

Table 1 displays 5%, 10%, 50% and 90% likelihood intervals for the three types: relative profile, marginal and conditional likelihoods. For comparison purposes, it also displays the corresponding likelihood estimates presented in (40). It can be seen from Table 1 and Figure 1 that the relative profile likelihood intervals are skewed to the right. At any given likelihood level, the relative profile likelihood interval is longer than each of the two other likelihood intervals (marginal and conditional). This is not too surprising given the definition of the relative profile likelihood, in which the nuisance parameter h is replaced by its partial maximum likelihood estimate. In large samples one may expect that such a replacement will have a relatively minor effect on inferences concerning c. In small samples (as in the present numerical example where n = 10), however, replacing h by an estimate may have a large impact on inferences on c.

In contrast, both marginal and conditional likelihoods have been constructed using criteria by which the resulting marginal and conditional submodels contain all available information on c that can be extracted from the sample in the absence of knowledge on the nuisance parameter h. Hence, one would expect that such “optimal” criteria would naturally lead to likelihoods with similar shapes. It is therefore reassuring to find in the present example that although the marginal and conditional likelihoods are analytically different, they are almost indistinguishable. For the sake of completeness, we also present two-sided confidence intervals for c based on and McCool’s (1970) Monte Carlo simulations of the distribution of the


MLE of c. At confidence levels 80% and 90%, these intervals are (0.75, 2.59) and (0.65, 3.36), respectively.

5. Some Concluding Remarks

In this paper we have attempted to stimulate the use of the likelihood principle by applying it to the shape parameter of the two parameter Weibull distribution. Numerous references in the statistical literature advocate the use of the likelihood approach (or the law of likelihood). Some of these references were indicated earlier. Additional references containing convincing arguments for embracing the likelihood principle are Basu (1975, 1977), Kalbfleisch (1987), Ghosh (1988), Tsou and Royall (1995) and Royall (1997). The latter monograph includes a rich list of additional relevant references. Arguments against the use of the likelihood principle can be found in Berger and Wolpert (1988).

Quite surprisingly, however, the intensified study and development of likelihood-based approaches for inference have not produced as many applications by practitioners as could have been expected. The main reason for this phenomenon is linked, in my opinion, to the different perception of the classical frequency-based approach, which opposes the likelihood principle, and to sluggishness in digesting new concepts by practitioners, both statisticians and non-statisticians. Even a well established approach such as the Bayesian one is often ignored, for example, by research oriented physicians in their published works in medical journals. Nevertheless, I believe that persistent attempts to develop and apply the likelihood approach may lead eventually to its use in various aspects and areas, making a 5% or 10% likelihood interval as conventional a notion as a 95% confidence interval. In general, although it is not quite correct to say that values of c outside of the 5% or 10% likelihood intervals are implausible at the 5% or 10% level of the respective relative likelihood (see Kalbfleisch and Sprott, 1970a), such a statement does give a rough idea of the range of implausible values for c.
A 10% likelihood interval is not comparable with a two-sided confidence interval at a 90% confidence level or at any other level. These two intervals have different meanings. Confidence intervals are based on hypothetically many repetitions of the same experiment, whereas likelihood regions are based on the likelihood principle by which parameter values are ranked by how likely they make the observed sample. Accordingly, confidence intervals are to be interpreted with respect to many repetitions of a particular experiment. If these intervals are to be used as a routine device in situations where there will actually be many repetitions of a particular experiment, then the confidence approach is appropriate; however, if the situation is one of scientific inference in which there is a particular set of data to be analyzed, in which repetitions are not planned or in which repetitions may not be of the same form, then to this end at least, the likelihood approach seems more reasonable. Moreover, likelihood methods do not depend on asymptotic properties; thus, these methods are exact for small as well as large sample sizes.
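To make the notion of a 5% or 10% likelihood interval concrete, the following minimal sketch computes such intervals for c from a Type 2 censored sample by thresholding the relative profile likelihood. It rests on several assumptions: the profile log-likelihood used is the standard textbook form under the parametrization in (1), obtained by substituting the scale MLE for fixed c, and may be written differently from the expressions derived earlier in the paper; the data, the grid, and the function names `profile_loglik` and `likelihood_interval` are hypothetical and serve only as illustration.

```python
import math

def profile_loglik(c, failures, n):
    """Profile log-likelihood of the Weibull shape c (scale maximized out).

    failures: the r smallest lifetimes observed out of n units on test
    (Type 2 censoring: the test stops at the r-th failure).
    """
    r = len(failures)
    x_r = max(failures)
    # T(c): sum of x_i^c over failures plus (n - r) censored units at x_(r)
    t = sum(x ** c for x in failures) + (n - r) * x_r ** c
    sum_log = sum(math.log(x) for x in failures)
    # For fixed c the scale MLE satisfies h^c = T(c) / r; substituting gives:
    return r * math.log(c) + (c - 1.0) * sum_log - r * math.log(t / r) - r

def likelihood_interval(failures, n, level, grid):
    """Return (c_hat, lower, upper): grid values of c whose relative
    profile likelihood is at least `level` (e.g. 0.10 for a 10% interval)."""
    ll = [profile_loglik(c, failures, n) for c in grid]
    ll_max = max(ll)
    c_hat = grid[ll.index(ll_max)]
    inside = [c for c, v in zip(grid, ll) if v - ll_max >= math.log(level)]
    return c_hat, min(inside), max(inside)

# Hypothetical data (not from the paper): 6 failures out of n = 10 units.
failures = [0.8, 1.3, 1.9, 2.4, 3.1, 3.6]
grid = [0.05 + 0.01 * k for k in range(1000)]  # c from 0.05 to about 10

c_hat, lo10, hi10 = likelihood_interval(failures, 10, 0.10, grid)
_, lo05, hi05 = likelihood_interval(failures, 10, 0.05, grid)
print("grid MLE of c:", c_hat)
print("10% likelihood interval:", (lo10, hi10))
print("5% likelihood interval:", (lo05, hi05))
```

The interval is read directly off the observed likelihood: every value of c inside the 10% interval makes the observed sample at least one tenth as likely as the best-supported value does. No repeated-sampling coverage statement is attached to it, which is why it should not be compared with a confidence interval at any level.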

Acknowledgments

I am grateful to an Associate Editor and a referee for providing detailed and constructive comments which helped improve the quality of the paper. I also thank Benjamin Reiser for his valuable comments while revising the manuscript.

References

G. A. Barnard, "The use of the likelihood function in statistical practice," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, pp. 27–48, 1966.
G. A. Barnard, G. M. Jenkins, and C. B. Winsten, "Likelihood inference and time series," Journal of the Royal Statistical Society Series A vol. 125 pp. 321–372, 1962.
G. A. Barnard and D. A. Sprott, "The generalized problem of the Nile: Robust confidence sets for parametric functions," Annals of Statistics vol. 11 pp. 104–113, 1983.
O. Barndorff-Nielsen, Information and Exponential Families, Wiley: Chichester, 1978.
D. Basu, "Statistical information and likelihood" (with discussion), Sankhya A vol. 37 pp. 1–71, 1975.
D. Basu, "On the elimination of nuisance parameters," Journal of the American Statistical Association vol. 72 pp. 355–366, 1977.
J. O. Berger and R. L. Wolpert, The Likelihood Principle, 2nd edition, IMS: Hayward, 1988.
V. P. Bhapkar, "Loss of information in the presence of nuisance parameters and partial sufficiency," Journal of Statistical Planning and Inference vol. 28 pp. 139–160, 1991.
A. Birnbaum, "On the foundations of statistical inference" (with discussion), Journal of the American Statistical Association vol. 57 pp. 269–326, 1962.
A. P. Dawid, "On the concepts of sufficiency and ancillarity in the presence of nuisance parameters," Journal of the Royal Statistical Society Series B vol. 37 pp. 248–258, 1975.
A. W. F. Edwards, Likelihood: An Account of the Statistical Concept of Likelihood and its Application to Scientific Inference, Cambridge University Press: Cambridge, 1972.
R. A. Fisher, "Two new properties of mathematical likelihood," Proceedings of the Royal Society of London A vol. 144 pp. 285–307, 1934.
R. A. Fisher, Statistical Methods and Scientific Inference, Oliver and Boyd: London, 1956.
J. K. Ghosh (ed.), Statistical Information and Likelihood, Lecture Notes in Statistics, 45, Springer-Verlag: Heidelberg, 1988.
V. P. Godambe, "On sufficiency and ancillarity in the presence of a nuisance parameter," Biometrika vol. 67 pp. 155–162, 1980.
N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, Vol. 2, John Wiley: New York, 1994.
B. Jorgensen, "A review of conditional inference: is there a universal definition of nonformation?," Bulletin of the International Statistical Institute vol. 55 pp. 323–340, 1993.
J. D. Kalbfleisch and D. A. Sprott, "Application of likelihood methods to models involving large number of parameters" (with discussion), Journal of the Royal Statistical Society Series B vol. 32 pp. 175–208, 1970a.
J. D. Kalbfleisch and D. A. Sprott, "Applications of likelihood and fiducial probability to sampling finite population," New Developments in Survey Sampling, John Wiley: New York, 1970b.
J. D. Kalbfleisch and D. A. Sprott, "Marginal and conditional likelihoods," Sankhya A vol. 35 pp. 311–328, 1973.
J. F. Lawless, Statistical Models and Methods for Lifetime Data, Wiley: New York, 1982.
J. F. Lawless, Statistical Models and Methods for Lifetime Data (Second Edition), Wiley: New Jersey, 2003.
J. K. Lindsey, Inferences from Sociological Survey Data: A Unified Approach, Elsevier: Amsterdam, 1973.

N. R. Mann, R. E. Schafer, and N. D. Singpurwalla, Methods for Statistical Analysis of Reliability and Life Data, Wiley: New York, 1974.
J. I. McCool, "Inferences on the Weibull percentiles and shape parameter from maximum likelihood estimates," IEEE Transactions on Reliability, R-19, pp. 2–9, 1970.
W. Q. Meeker and L. A. Escobar, Statistical Methods for Reliability Data, Wiley: New York, 1998.
W. Nelson, Applied Life Data Analysis, Wiley: New York, 1982.
W. Nelson, Accelerated Testing: Statistical Methods, Test Plans, Data Analysis, Wiley: New York, 1990.
L. Pace and A. Salvan, Principles of Statistical Inference, World Scientific: Singapore, 1997.
M. Remon, "On a concept of partial sufficiency: L-sufficiency," International Statistical Review vol. 52 pp. 127–136, 1984.
R. Royall, Statistical Evidence, Chapman and Hall: London, 1997.
B. Reiser and S. K. Bar-Lev, "Likelihood inference for life test data," IEEE Transactions on Reliability, R-28, pp. 38–43, 1979.
T. A. Severini, Likelihood Methods in Statistics, Oxford University Press: New York, 2000.
D. A. Sprott, "Marginal and conditional sufficiency," Biometrika vol. 62 pp. 599–605, 1975.
D. A. Sprott and J. D. Kalbfleisch, "Examples of likelihoods and comparisons with point estimates and large sample approximations," Journal of the American Statistical Association vol. 64 pp. 468–484, 1969.
T-S. Tsou and R. M. Royall, "Robust likelihoods," Journal of the American Statistical Association vol. 90 pp. 316–320, 1995.
J. B. Whitney, "A likelihood analysis of some common distributions," Journal of Quality Technology vol. 6 pp. 182–187, 1974.
Y. Zhu and N. Reid, "Information, ancillarity, and sufficiency in the presence of nuisance parameters," Canadian Journal of Statistics vol. 22 pp. 111–123, 1994.