Methodol Comput Appl Probab (2010) 12:199–212 DOI 10.1007/s11009-008-9112-4
Two New Mixture Models Related to the Inverse Gaussian Distribution Samuel Kotz · Víctor Leiva · Antonio Sanhueza
Received: 21 July 2007 / Revised: 3 November 2008 / Accepted: 3 November 2008 / Published online: 2 December 2008 © Springer Science + Business Media, LLC 2008
Abstract This article presents a new family of logarithmic distributions, called the sinh mixture inverse Gaussian model, and its associated life distribution, referred to as the extended mixture inverse Gaussian model. Specifically, the density, distribution function, and moments are developed for the sinh mixture inverse Gaussian distribution. Next, the extended mixture inverse Gaussian distribution is characterized. A graphical analysis of the densities of the new models is also provided. In addition, a lifetime analysis is presented for the extended mixture inverse Gaussian distribution. Finally, an example with a real data set is given to illustrate the methodology, which indicates that the new models result in a better fit to the data than some other well-known distributions.
Keywords Birnbaum-Saunders distribution · Goodness-of-fit · Likelihood methods · Moments · Sinh-normal distribution
AMS 2000 Subject Classifications Primary 60E05; Secondary 62N05
S. Kotz, Department of Engineering Management and Systems Engineering, The George Washington University, Washington, DC, USA
V. Leiva (corresponding author), Departamento de Estadística, Universidad de Valparaíso, Casilla 5030, Valparaíso, Chile
A. Sanhueza, Departamento de Matemática y Estadística, Universidad de La Frontera, Casilla 54-D, Temuco, Chile
1 Introduction
Many distributions with support in (−∞, +∞), such as the Cauchy, Gumbel, Laplace, log-gamma, logistic, normal, and Student-t (or simply t) models, correspond to random variables (r.v.) which can be viewed as logarithmic transformations of certain non-negative variates. In this paper, these models are called logarithmic distributions and they can be symmetric or asymmetric (with a non-zero coefficient of skewness). In the case of symmetric logarithmic distributions (e.g., the Cauchy, Laplace, logistic, normal, and t models), the associated life distribution is called log-symmetric; see Fang et al. (1990, pp. 55–64). However, this terminology is somewhat misleading since the transformation applied here is exponential. For asymmetric logarithmic distributions, however, the transformation is actually logarithmic, as is the case of the Gumbel (or log-Weibull or extreme value) and log-gamma models, whose associated distributions are the Weibull and gamma models, respectively. For some well-known logarithmic distributions, such as the Cauchy, Gumbel, Laplace, log-gamma, logistic, normal, and t models, the relationships between these distributions and their associated life distributions, i.e., the log-Cauchy, Weibull, log-Laplace, gamma, log-logistic, lognormal, and log-t models, respectively, are available and known; see Marshall and Olkin (2007, pp. 427–445) for details. For lesser-known probability models, there are several reasons for studying the relationship between logarithmic distributions and their associated life distributions; see Sanhueza et al. (2008a). For instance, in log-linear models, the logarithmic distribution of the error term is required. In addition, it is often more efficient to estimate the parameters of a life distribution through its associated logarithmic distribution. Similarly, the moments of all orders of an r.v. following a life distribution can be obtained via its associated logarithmic distribution. Moreover, random numbers from a life distribution can be generated through its associated logarithmic distribution, for which more efficient computational methods may be available. Furthermore, on many occasions the hazard rate (an important characteristic of life distributions) can admit ∩-shaped and increasing forms when the life distribution is generated through its logarithmic distribution.
In this paper, two new models are presented: a logarithmic distribution and its associated life distribution. The logarithmic distribution is named the sinh (where sinh denotes the hyperbolic sine) mixture inverse Gaussian (SMIG) model and has been developed by using a procedure similar to the one proposed by Rieck and Nedelman (1991) for obtaining the so-called sinh-normal (SN) distribution. The associated life distribution is named the extended mixture inverse Gaussian (EMIG) model and has been obtained by using an exponential transformation of an r.v. with SMIG distribution. The main motivation for these new models is based on two factors. The first is that various well-known models are particular cases of the new distributions and therefore some of their properties can be extended to the case of the SMIG and EMIG models. The second is that these new models provide greater flexibility, which can result in a better fit for different types of data.
Next, we describe the ingredients that lead to the two new models developed in this paper. In our opinion, these ingredients present a vivid and constructive picture of the development of the two models, which aim to provide a meaningful representation of lifetime (and related) data, and shed light on the routes taken by several independent researchers along this path.
The first ingredient in the structure of the SMIG and EMIG distributions is the inverse Gaussian (IG) model. At least two books by prominent statisticians devoted solely to this subject have appeared in the last 20 years: Chhikara and Folks (1989) and Seshadri (1999). Formally, the two-parameter IG distribution, denoted by T ∼ IG(μ, λ), has probability density function (pdf) and characteristic function given by
$$ f_T(t) = \frac{\sqrt{\lambda}}{\sqrt{2\pi t^3}} \exp\left(-\frac{1}{2}\left[\frac{\sqrt{\lambda}}{\sqrt{\mu}}\left\{\frac{\sqrt{t}}{\sqrt{\mu}} - \frac{\sqrt{\mu}}{\sqrt{t}}\right\}\right]^2\right); \quad t > 0, \tag{1} $$
and
$$ \varphi_T(s) = \exp\left(\frac{\lambda}{\mu}\left[1 - \sqrt{1 - \frac{2\mu^2}{\lambda}\, i s}\right]\right); \quad i = \sqrt{-1}, \tag{2} $$
respectively, where both μ (the mean) and λ (scale) are positive. This unimodal pdf is a member of the exponential family and is skewed to the right; see Seshadri (1993). Tweedie (1957) named this distribution, but the model was originally derived by Schrödinger (1915). The most direct genesis of the IG distribution is via a Wiener process with positive drift ν, i.e., a stochastic process X ≡ {X(t), t ∈ T} with independent increments, where X(0) = 0 and X(t) ∼ N(νt, σ²t). The first time at which X(t) reaches a value a > 0 has the pdf given by Eq. 1, with μ = a/ν and λ = a²/σ². The IG distribution has been used in fields as diverse as demography, ecology, hydrology, the internet, and meteorology. An alternative parametrization suggested in this paper (to avoid some difficulties related to the parameter μ) is given by α = √(μ/λ) and β = μ, so that T ∼ IG(α, β), where α > 0 and β > 0 are the shape and scale parameters, respectively. (Of course, β is the mean of T.) In this parametrization, the pdf and cumulative distribution function (cdf) of the r.v. T are, respectively, given by
$$ f_T(t) = \phi\left(\frac{1}{\alpha}\left[\frac{\sqrt{t}}{\sqrt{\beta}} - \frac{\sqrt{\beta}}{\sqrt{t}}\right]\right) \frac{\sqrt{\beta}}{\alpha \sqrt{t^3}}; \quad t > 0,\ \alpha > 0,\ \beta > 0, \tag{3} $$
and
$$ F_T(t) = \Phi\left(\frac{1}{\alpha}\left[\frac{\sqrt{t}}{\sqrt{\beta}} - \frac{\sqrt{\beta}}{\sqrt{t}}\right]\right) + \exp\left(\frac{2}{\alpha^2}\right) \Phi\left(-\frac{1}{\alpha}\left[\frac{\sqrt{t}}{\sqrt{\beta}} + \frac{\sqrt{\beta}}{\sqrt{t}}\right]\right); \quad t > 0, \tag{4} $$
where φ(·) and Φ(·) denote the standard normal pdf and cdf, respectively. If T ∼ IG(α, β), then its length-biased version, called the length-biased inverse Gaussian (LBIG) distribution, is given by means of the r.v. L = β²/T (also called the complementary reciprocal of T) and is denoted by L ∼ LBIG(α, β). The corresponding pdf of the r.v. L is given by f_L(l) = l f_T(l)/β, with l > 0 and β > 0, which can be expressed as
$$ f_L(l) = \phi\left(\frac{1}{\alpha}\left[\frac{\sqrt{l}}{\sqrt{\beta}} - \frac{\sqrt{\beta}}{\sqrt{l}}\right]\right) \frac{1}{\alpha \sqrt{\beta} \sqrt{l}}; \quad l > 0,\ \alpha > 0,\ \beta > 0. \tag{5} $$
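As a numerical companion to Eqs. 3 to 5, the following minimal Python sketch (standard library only) evaluates the IG pdf and cdf in the (α, β) parametrization and the LBIG pdf, and checks with a crude midpoint rule that the cdf in Eq. 4 accumulates the pdf in Eq. 3. The function names `ig_pdf`, `ig_cdf`, `lbig_pdf`, and `midpoint`, as well as the parameter values, are illustrative assumptions and are not part of the authors' software.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def ig_pdf(t, alpha, beta):
    """IG pdf in the (alpha, beta) parametrization, Eq. 3."""
    a = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    return phi(a) * math.sqrt(beta) / (alpha * t ** 1.5)

def ig_cdf(t, alpha, beta):
    """IG cdf, Eq. 4 (note: exp(2/alpha^2) overflows for very small alpha)."""
    a = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    b = (math.sqrt(t / beta) + math.sqrt(beta / t)) / alpha
    return Phi(a) + math.exp(2.0 / alpha ** 2) * Phi(-b)

def lbig_pdf(l, alpha, beta):
    """LBIG pdf, Eq. 5, which equals l * ig_pdf(l) / beta."""
    a = (math.sqrt(l / beta) - math.sqrt(beta / l)) / alpha
    return phi(a) / (alpha * math.sqrt(beta) * math.sqrt(l))

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule; crude but adequate for a sanity check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

alpha, beta = 0.5, 2.0  # hypothetical parameter values
area = midpoint(lambda t: ig_pdf(t, alpha, beta), 0.0, 3.0)
mean = midpoint(lambda t: t * ig_pdf(t, alpha, beta), 0.0, 60.0)
print(area, ig_cdf(3.0, alpha, beta))  # the two numbers should agree closely
print(mean)                            # should be close to beta, the mean of T
```

The upper limit 60 for the mean is an arbitrary truncation point chosen so that the neglected tail is negligible for these parameter values; other values of α and β may need a different range.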
Note that the pdf of the r.v. L does not involve additional parameters with respect to the r.v. T. These results are due to Jörgensen et al. (1991) and Gupta and Akman (1995). The LBIG distribution is the second ingredient for generating the new models. A third ingredient is based on mixture distributions. It is well known that mixture models are very powerful and popular tools for generating flexible distributions with good properties; see McLachlan and Peel (2000). For our purposes, the mixture inverse Gaussian (MIG) distribution, represented by the mixture of the IG and LBIG models, is of interest. Thus, if T ∼ IG(α, β) and L ∼ LBIG(α, β), the corresponding pdf of the r.v. M with MIG distribution is (in the obvious notation)
$$ f_M(m) = [1 - p]\, f_T(m) + p\, f_L(m); \quad m > 0,\ 0 < p < 1, $$
which can be written as
$$ f_M(m) = \phi\left(\frac{1}{\alpha}\left[\frac{\sqrt{m}}{\sqrt{\beta}} - \frac{\sqrt{\beta}}{\sqrt{m}}\right]\right) \left\{[1 - p]\, \frac{\sqrt{\beta}}{\alpha \sqrt{m^3}} + p\, \frac{1}{\alpha \sqrt{\beta} \sqrt{m}}\right\}; \quad m > 0, \tag{6} $$
with α > 0, β > 0, and 0 < p < 1. This distribution is denoted by M ∼ MIG(α, β, p). A fourth ingredient is based on a model developed by Birnbaum and Saunders (1969) that describes the lifetime to failure by fatigue of a material specimen subject to a cyclically repeated stress pattern, which is called the Birnbaum-Saunders (BS) distribution. This model is defined in terms of the r.v. T = β[αZ/2 + √([αZ/2]² + 1)]², where Z is a standard normal r.v. and α > 0 and β > 0 are the shape and scale parameters, respectively. In this case, the notation T ∼ BS(α, β) is used. Note that the median of the r.v. T is β. The pdf and cdf of T ∼ BS(α, β) are, respectively, given by
$$ f_T(t) = \phi\left(\frac{1}{\alpha}\left[\frac{\sqrt{t}}{\sqrt{\beta}} - \frac{\sqrt{\beta}}{\sqrt{t}}\right]\right) \frac{[t + \beta]}{2 \alpha \sqrt{\beta} \sqrt{t^3}}; \quad t > 0,\ \alpha > 0,\ \beta > 0, \tag{7} $$
and
$$ F_T(t) = \Phi\left(\frac{1}{\alpha}\left[\frac{\sqrt{t}}{\sqrt{\beta}} - \frac{\sqrt{\beta}}{\sqrt{t}}\right]\right); \quad t > 0. \tag{8} $$
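The link between Eqs. 6 and 7 can be checked numerically. The sketch below (standard-library Python; all function names and the parameter values are illustrative assumptions) codes the MIG pdf of Eq. 6 and the BS pdf of Eq. 7, compares them at p = 1/2, and also maps a standard normal value to a BS value through the defining transformation of T.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def a_fn(t, alpha, beta):
    """Common argument [sqrt(t/beta) - sqrt(beta/t)] / alpha of Eqs. 3-8."""
    return (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha

def mig_pdf(m, alpha, beta, p):
    """MIG pdf, Eq. 6: IG component with weight 1 - p, LBIG component with weight p."""
    ig_part = math.sqrt(beta) / (alpha * m ** 1.5)
    lbig_part = 1.0 / (alpha * math.sqrt(beta) * math.sqrt(m))
    return phi(a_fn(m, alpha, beta)) * ((1.0 - p) * ig_part + p * lbig_part)

def bs_pdf(t, alpha, beta):
    """BS pdf, Eq. 7."""
    return phi(a_fn(t, alpha, beta)) * (t + beta) / (2.0 * alpha * math.sqrt(beta) * t ** 1.5)

def bs_cdf(t, alpha, beta):
    """BS cdf, Eq. 8."""
    return 0.5 * math.erfc(-a_fn(t, alpha, beta) / math.sqrt(2.0))

def bs_from_normal(z, alpha, beta):
    """T = beta [alpha z / 2 + sqrt((alpha z / 2)^2 + 1)]^2 for a standard normal value z."""
    w = 0.5 * alpha * z
    return beta * (w + math.sqrt(w * w + 1.0)) ** 2

alpha, beta = 0.7, 1.5  # hypothetical parameter values
for t in (0.2, 1.0, 1.5, 4.0):
    print(t, bs_pdf(t, alpha, beta), mig_pdf(t, alpha, beta, 0.5))  # columns should coincide
```

Feeding `bs_from_normal` with draws from `random.gauss(0.0, 1.0)` would give a simple way to simulate BS variates, since the transformation is the inverse of the argument of Φ in Eq. 8.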
It is easy to verify that the BS pdf given in Eq. 7 is the mixture pdf described in Eq. 6 when p = 1/2. In other words, the BS distribution is an equally weighted mixture of the IG and LBIG models. This relationship between the BS and IG distributions has been observed by several authors; see Desmond (1986), Jörgensen et al. (1991), Marshall and Olkin (2007, pp. 466–468), and Saunders (2007, pp. 164–165). In principle, both distributions (BS and IG) deal with lifetime until the occurrence of a certain event, but the specific geneses of these two models are different. However, both distributions are related to fatigue failure models; see Bhattacharyya and Fries (1982). Our final ingredient for understanding and analyzing the structure of the SMIG and EMIG distributions leads us to the mentioned concept of logarithmic distributions (of which the sinh-normal one is a particular case). In spite of its obvious limitations, the logarithmic transformation of an r.v. has been the most popular tool in both theoretical and applied distribution theory for over 100 years. In two consecutive papers, Galton (1879) and McAlister (1879) originated the study of logarithmic transformations of the normal distribution, which was continued
rigorously by Kapteyn and van Uven (1916), Wicksell (1917), and Gibrat (1930), to mention only a few investigators, including the most prominent A.N. Kolmogorov and D.R. Cox. This article consists of four parts. In the second part, the new logarithmic distribution is developed including its pdf, cdf, and moments. In the third part, the life distribution associated with the new logarithmic distribution is characterized. Some graphical plots and a brief analysis of these are also given. In addition, a lifetime analysis and moments for this new life distribution are presented. Finally, in the fourth part, for the purposes of illustration, an example with real data is carried out. Maximum likelihood (ML) and goodness-of-fit methods are used to estimate the parameters and study the fit of the new model to the data, respectively.
2 The New Logarithmic Distribution
The new model is developed by using the following arguments. Over 50 years ago a leading statistician of the twentieth century, N.L. Johnson (1917–2004), proposed the now famous transformation (see Johnson 1949) of a standard normal r.v. Z defined by
$$ Z = \nu + \delta\, g\left(\frac{Y - \gamma}{\sigma}\right), $$
where g(·) is a monotone function (which can be assumed, without loss of generality, to be non-decreasing), δ > 0, and σ > 0. For our purposes, we use the case ν = 0, δ = 2/α, and g(·) = sinh(·), such that
$$ Z = \frac{2}{\alpha} \sinh\left(\frac{Y - \gamma}{\sigma}\right). \tag{9} $$
(We recall that sinh is defined by sinh(u) = [exp(u) − exp(−u)]/2, where u = a + b i is a complex number. If u is a real number, i.e., with b = 0, then sinh(u) is given by a concave increasing curve, passing through the origin, for u < 0 and by a convex increasing curve for u > 0. Similarly, cosh(u) = [exp(u) + exp(−u)]/2 is represented by a ∪-shaped symmetric curve from the value 1 up to ∞, with cosh(u) = cosh(−u).) From Eq. 9, we have
$$ Y = \gamma + \sigma \sinh^{-1}\left(\frac{\alpha Z}{2}\right), \tag{10} $$
where sinh⁻¹(·) = arcsinh(·) is the inverse sinh function, which is defined by arcsinh(u) = log(u + √(u² + 1)). The distribution of the r.v. Y given in Eq. 10 corresponds to the SN model with pdf
$$ f_Y(y) = \phi\left(\frac{2}{\alpha} \sinh\left(\frac{y - \gamma}{\sigma}\right)\right) \frac{2}{\alpha \sigma} \cosh\left(\frac{y - \gamma}{\sigma}\right); \quad y \in \mathbb{R},\ \alpha > 0,\ \gamma \in \mathbb{R},\ \sigma > 0. \tag{11} $$
Note that α, γ, and σ are shape, location, and scale parameters, respectively. For more details on the SN distribution, see Leiva et al. (2007). It is important to highlight that the SN distribution may serve as an alternative model to the normal and t
distributions since it is symmetric and its kurtosis can be smaller or greater than that of the normal model depending on the value of α. The value σ = 2 in Eq. 11 yields the log-BS distribution (logarithmic version of the BS distribution). Thus, the BS distribution is obtained by the transformation T = exp(Y) from Eq. 10, with β = exp(γ).
2.1 Probability Density Function
An r.v. Y follows the SMIG distribution with shape (α), location (γ), scale (σ), and mixing (p) parameters, denoted by Y ∼ SMIG(α, γ, σ, p), if its pdf is given by
$$ f_Y(y) = \phi\left(\frac{2}{\alpha} \sinh\left(\frac{y - \gamma}{\sigma}\right)\right) \frac{2}{\alpha \sigma} \left\{[1 - p] \exp\left(-\frac{y - \gamma}{\sigma}\right) + p \exp\left(\frac{y - \gamma}{\sigma}\right)\right\}, \tag{12} $$
with y ∈ R, α > 0, γ ∈ R, σ > 0, and 0 < p < 1. It is clear that the SMIG pdf given in Eq. 12 is an extension of Eq. 11, where the two factors 1/2 appearing in cosh([y − γ]/σ) = [1/2] exp([y − γ]/σ) + [1/2] exp(−[y − γ]/σ) are replaced by p and 1 − p, respectively. Evidently, for p = 0, we get the sinh-IG (SIG) model as the logarithmic distribution associated with the IG one; see Kanefuji and Iwase (1996) and Leiva et al. (2008a). For p = 1, we arrive at the sinh-LBIG (SLBIG) model as the logarithmic distribution associated with the LBIG one. Of course, for p = 1/2, we have the SN distribution. The pdf graphs of SMIG distributions presented in Fig. 1 (right side) clearly indicate the flexibility of the new model.
2.2 Cumulative Distribution Function
The cdf of an r.v. can be employed to compute the percentiles of its distribution by using the quantile function (q.f.), to generate random numbers, and to produce goodness-of-fit methods. Let Y ∼ SMIG(α, γ, σ, p). Then, the cdf of Y is given by
$$ F_Y(y) = \Phi\left(\frac{2}{\alpha} \sinh\left(\frac{y - \gamma}{\sigma}\right)\right) + [1 - 2p] \exp\left(\frac{2}{\alpha^2}\right) \Phi\left(-\frac{2}{\alpha} \sqrt{\sinh^2\left(\frac{y - \gamma}{\sigma}\right) + 1}\right); \quad y \in \mathbb{R}. \tag{13} $$
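Equations 12 and 13 translate directly into code. The following sketch (standard-library Python; the names `smig_pdf`, `smig_cdf`, `sn_pdf`, `midpoint`, and the parameter values are illustrative assumptions, not the interface of the authors' emig package) evaluates both functions and checks numerically that the cdf accumulates the pdf and that p = 1/2 recovers the SN pdf of Eq. 11.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf via erfc."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def smig_pdf(y, alpha, gamma, sigma, p):
    """SMIG pdf, Eq. 12."""
    u = (y - gamma) / sigma
    weight = (1.0 - p) * math.exp(-u) + p * math.exp(u)
    return phi(2.0 * math.sinh(u) / alpha) * 2.0 / (alpha * sigma) * weight

def smig_cdf(y, alpha, gamma, sigma, p):
    """SMIG cdf, Eq. 13; sqrt(sinh(u)^2 + 1) is coded as cosh(u)."""
    u = (y - gamma) / sigma
    tail = Phi(-2.0 * math.cosh(u) / alpha)
    return Phi(2.0 * math.sinh(u) / alpha) + (1.0 - 2.0 * p) * math.exp(2.0 / alpha ** 2) * tail

def sn_pdf(y, alpha, gamma, sigma):
    """Sinh-normal pdf, Eq. 11."""
    u = (y - gamma) / sigma
    return phi(2.0 * math.sinh(u) / alpha) * 2.0 / (alpha * sigma) * math.cosh(u)

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule used only as a sanity check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

pars = (1.0, 0.0, 2.0, 0.25)  # alpha, gamma, sigma, p (one of the settings of Fig. 1)
area = midpoint(lambda y: smig_pdf(y, *pars), -15.0, 1.0)
print(area, smig_cdf(1.0, *pars))  # should agree closely
```

The lower limit −15 is an arbitrary truncation point below which the pdf is numerically zero for these parameter values.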
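2.3 Moments
The moment generating function (mgf) of Y ∼ SMIG(α, γ, σ, p) is given by
$$ M_Y(s) = \frac{2 \exp(1/\alpha^2)}{\alpha \sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} \exp(s y) \exp\left(-\frac{1}{\alpha^2} \cosh\left(\frac{2[y - \gamma]}{\sigma}\right)\right) \left\{[1 - p] \exp\left(-\frac{y - \gamma}{\sigma}\right) + p \exp\left(\frac{y - \gamma}{\sigma}\right)\right\} dy. $$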
[Figure 1: four rows of paired panels. Left panels: EMIG pdfs f(t) plotted against t. Right panels: SMIG pdfs f(y) plotted against y = log(t).
Row 1 (p = 0.25): EMIG(α = 1, β = 1, σ, p = 0.25) with σ = 0.2, 0.5, 1, 4; SMIG(α = 1, γ = 0, σ, p = 0.25) with σ = 0.5, 1, 2, 4.
Rows 2 to 4 (p = 0.5, p = 0, and p = 1, respectively, all with σ = 2): EMIG with (α, β) = (0.25, 0.25), (0.5, 0.25), (1, 1), (2, 1); SMIG with (α, γ) = (1, −1.4), (2, 1.4), (2, 0), (4, 0).]
Fig. 1 Pdf graphs of the EMIG and SMIG distributions for the indicated values
Carrying out the change of variable u = 2[y − γ]/σ, we obtain
$$ M_Y(s) = \exp(\gamma s)\, \frac{[1 - p]\, K_{\frac{s\sigma - 1}{2}}\left(1/\alpha^2\right) + p\, K_{\frac{s\sigma + 1}{2}}\left(1/\alpha^2\right)}{K_{\frac{1}{2}}\left(1/\alpha^2\right)}, \tag{14} $$
where K_ν(·) is the modified Bessel function of the third kind; see Gradshteyn and Ryzhik (2000, p. 918). Indeed, noting that (see Whittaker and Watson 1927, Chapter 17, Example 40, whose K_ν differs from the one used here by a factor cos(νπ))
$$ K_\nu(z) = \int_0^\infty \exp(-z \cosh(t)) \cosh(\nu t)\, dt = \frac{z^\nu\, \Gamma(1/2)}{2^\nu\, \Gamma(\nu + 1/2)} \int_0^\infty \exp(-z \cosh(t)) \sinh^{2\nu}(t)\, dt = \frac{2^\nu z^\nu}{\sqrt{\pi}}\, \Gamma(\nu + 1/2) \int_0^\infty \left[u^2 + z^2\right]^{-\nu - 1/2} \cos(u)\, du $$
and using the well-known formula K_{1/2}(z) = K_{−1/2}(z) = √π exp(−z)/√(2z), yielding K_{1/2}(1/α²) = α√(2π) exp(−1/α²)/2, we arrive at Eq. 14. Details of the straightforward but tedious calculations are available from the authors. Note that Gradshteyn and Ryzhik (2000) provide expressions for integer-order Bessel functions. Alternatively, the R software (http://www.R-project.org) can be used by means of the function besselK(). The R language (R Development Core Team 2008) is a non-commercial, open-source environment for statistical computing and graphics that can be obtained at no cost from CRAN (http://CRAN.R-project.org).
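Equation 14 can be cross-checked without special-function libraries. The sketch below (standard-library Python) approximates K_ν(z) from its first integral representation with a midpoint rule and compares the resulting mgf with a direct numerical integration of exp(sy) f_Y(y). The helper names (`bessel_k`, `smig_mgf`, `midpoint`), the truncation constants, and the parameter values are assumptions made for this illustration; in R one would simply call besselK().

```python
import math

def bessel_k(nu, z, n=4000):
    """K_nu(z) from int_0^inf exp(-z cosh t) cosh(nu t) dt, midpoint rule on [0, upper]."""
    upper = math.acosh(1.0 + 800.0 / z)  # exp(-z cosh t) underflows to 0.0 beyond this point
    h = upper / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.exp(-z * math.cosh(t)) * math.cosh(nu * t)
    return h * total

def smig_pdf(y, alpha, gamma, sigma, p):
    """SMIG pdf, Eq. 12."""
    u = (y - gamma) / sigma
    x = 2.0 * math.sinh(u) / alpha
    dens = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return dens * 2.0 / (alpha * sigma) * ((1.0 - p) * math.exp(-u) + p * math.exp(u))

def smig_mgf(s, alpha, gamma, sigma, p):
    """Mgf of the SMIG distribution, Eq. 14."""
    z = 1.0 / alpha ** 2
    num = (1.0 - p) * bessel_k(0.5 * (s * sigma - 1.0), z) + p * bessel_k(0.5 * (s * sigma + 1.0), z)
    return math.exp(gamma * s) * num / bessel_k(0.5, z)

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

alpha, gamma, sigma, p, s = 1.0, 0.3, 2.0, 0.25, 1.0  # hypothetical values
direct = midpoint(lambda y: math.exp(s * y) * smig_pdf(y, alpha, gamma, sigma, p),
                  gamma - 16.0, gamma + 16.0)
print(direct, smig_mgf(s, alpha, gamma, sigma, p))  # should agree closely
```

For σ = 2 and s = 1 the orders are 1/2 and 3/2, and K_{3/2}(z)/K_{1/2}(z) = 1 + 1/z, so the value printed should also match exp(γ)[1 + pα²].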
3 The Life Distribution Associated with the SMIG Model
As mentioned, the EMIG distribution is associated with a logarithmic distribution that we call the SMIG model. Thus, if Y ∼ SMIG(α, γ, σ, p), then T = exp(Y) has the EMIG distribution. The notation T ∼ EMIG(α, β, σ, p) is used here for this new life distribution, where β = exp(γ).
3.1 Shape Analysis and Properties
Let T ∼ EMIG(α, β, σ, p). Then, the pdf of T is given by
$$ f_T(t) = \phi(a_t) \left\{[1 - p]\, \frac{2 \beta^{1/\sigma}}{\sigma \alpha\, t^{1/\sigma + 1}} + p\, \frac{2\, t^{1/\sigma - 1}}{\sigma \alpha \beta^{1/\sigma}}\right\}; \quad t > 0, \tag{15} $$
where a_t = [[t/β]^{1/σ} − [β/t]^{1/σ}]/α, with α > 0, β > 0, σ > 0, and 0 < p < 1. Some particular cases of the EMIG distribution are the BS, IG, MIG, and LBIG distributions, so that they are nested within the EMIG distribution. Also, the extended inverse Gaussian (EIG) and extended Birnbaum-Saunders (EBS) distributions are special cases of the EMIG model, obtained with p = 0 and p = 1/2; see Leiva et al. (2008a) and Sanhueza et al. (2008b), respectively.
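The nesting described above can be verified numerically. This sketch (standard-library Python; `emig_pdf` is an illustrative name and should not be confused with the demig() function of the authors' package, whose interface is not documented here) codes Eq. 15 and compares it at σ = 2 with the IG, LBIG, and BS pdfs of Eqs. 3, 5, and 7.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def emig_pdf(t, alpha, beta, sigma, p):
    """EMIG pdf, Eq. 15, written in terms of r = (t / beta)^(1 / sigma)."""
    r = (t / beta) ** (1.0 / sigma)
    a = (r - 1.0 / r) / alpha
    # beta^(1/sigma) t^(-1/sigma - 1) = 1 / (r t)  and  t^(1/sigma - 1) beta^(-1/sigma) = r / t
    return phi(a) * 2.0 / (alpha * sigma) * ((1.0 - p) / (r * t) + p * r / t)

def a_fn(t, alpha, beta):
    return (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha

def ig_pdf(t, alpha, beta):    # Eq. 3
    return phi(a_fn(t, alpha, beta)) * math.sqrt(beta) / (alpha * t ** 1.5)

def lbig_pdf(t, alpha, beta):  # Eq. 5
    return phi(a_fn(t, alpha, beta)) / (alpha * math.sqrt(beta) * math.sqrt(t))

def bs_pdf(t, alpha, beta):    # Eq. 7
    return phi(a_fn(t, alpha, beta)) * (t + beta) / (2.0 * alpha * math.sqrt(beta) * t ** 1.5)

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

alpha, beta = 0.5, 0.25  # hypothetical values
for t in (0.1, 0.25, 0.6):
    print(t,
          emig_pdf(t, alpha, beta, 2.0, 0.0) - ig_pdf(t, alpha, beta),
          emig_pdf(t, alpha, beta, 2.0, 1.0) - lbig_pdf(t, alpha, beta),
          emig_pdf(t, alpha, beta, 2.0, 0.5) - bs_pdf(t, alpha, beta))  # differences near zero

# Total mass for a case with sigma different from 2 (integration range chosen by hand).
mass = midpoint(lambda t: emig_pdf(t, 1.0, 1.0, 0.5, 0.25), 0.0, 4.0)
print(mass)  # should be close to 1
```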
The mode of the r.v. T ∼ EMIG(α, β, σ, p), which is denoted by t_m, may be computed as the solution of the equation
$$ [1 - p] \left[a_{t_m} a'_{t_m}\, \beta^{1/\sigma}\, t_m + \frac{\beta^{1/\sigma} [1 + \sigma]}{\sigma}\right] + p \left[\frac{a_{t_m} a'_{t_m}\, t_m^{2/\sigma + 1}}{\beta^{1/\sigma}} - \frac{[1 - \sigma]\, t_m^{2/\sigma}}{\sigma \beta^{1/\sigma}}\right] = 0, \tag{16} $$
where a'_t = da_t/dt = [[t/β]^{1/σ} + [β/t]^{1/σ}]/[ασt]. A graphical representation of the EMIG distribution for selected values of its four parameters is presented in the left panels of Fig. 1. By comparing the graphs corresponding to β = 1 with those having β = 0.25, we observe that β in the EMIG distribution is a scale parameter, which also behaves as a location parameter. (For the BS distribution, β is the median, while for the IG distribution, it is the mean.) In both models, we observe that changes in p do not result in substantial alterations in the shape of the densities. In addition, the parameter σ in the EMIG distribution modifies the symmetry, but for the SMIG distribution σ is strictly a scale parameter. Moreover, α affects the kurtosis of the EMIG distribution. The cdf of the r.v. T ∼ EMIG(α, β, σ, p) is given by
$$ F_T(t) = \Phi(a_t) + [1 - 2p] \exp\left(\frac{2}{\alpha^2}\right) \Phi\left(-\sqrt{a_t^2 + \frac{4}{\alpha^2}}\right); \quad t > 0. \tag{17} $$
Analogously to the BS distribution, the EMIG distribution is closed under reciprocation, i.e., 1/T ∼ EMIG(α, 1/β, σ, 1 − p); see Saunders (1974). Furthermore, the EMIG distribution satisfies the following properties: c T ∼ EMIG(α, c β, σ, p), with c > 0, and U = [T/β + β/T − 2]/α² ∼ χ²(1).
3.2 Hazard Rate
The hazard rate (h.r.) is defined by h_T(t) = f_T(t)/[1 − F_T(t)], with t > 0 and 0 < F_T(t) < 1, where f_T(t) and F_T(t) are the pdf and cdf of T, respectively. For T ∼ EMIG(α, β, σ, p), its h.r. is given by
$$ h_T(t) = \frac{2 \phi(a_t) \left\{[1 - p]\, \beta^{1/\sigma}\, t^{-1/\sigma - 1} + p\, t^{1/\sigma - 1}\, \beta^{-1/\sigma}\right\}}{\alpha \sigma \left\{\Phi(-a_t) - [1 - 2p] \exp(2/\alpha^2)\, \Phi\left(-\sqrt{a_t^2 + 4/\alpha^2}\right)\right\}}, \quad t > 0. \tag{18} $$
The change point (denoted by t_c) of the h.r. is of importance in a lifetime analysis, especially when the distribution has a non-monotone hazard rate. If t_c is known, one can arrive at the inflexion point of the hazard. In general, the change point of the h.r. of an r.v. T is just the solution of the equation dh_T(t)/dt = 0. In the case of the EMIG distribution, the calculations are somewhat intricate but straightforward. The limiting behavior of h_T(t) for t → ∞ is expressed as
$$ h_T(t) \to \begin{cases} 0, & \sigma > 2, \\ \dfrac{1}{2 \alpha^2 \beta}, & \sigma = 2; \\ \infty, & \sigma < 2. \end{cases} \tag{19} $$
This result can be derived using L'Hôpital's rule. Consequently, the EMIG distribution admits ∩-shaped and increasing hazard rates depending on the value of σ.
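The cdf in Eq. 17, the reciprocation property, and the hazard rate in Eq. 18 can be explored with a few lines of code. The sketch below (standard-library Python) uses illustrative function names and parameter values that are assumptions of this example; the survival function is coded from the denominator of Eq. 18 rather than as 1 − F_T(t), which avoids cancellation in the right tail.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def a_t(t, alpha, beta, sigma):
    r = (t / beta) ** (1.0 / sigma)
    return (r - 1.0 / r) / alpha

def emig_pdf(t, alpha, beta, sigma, p):
    """EMIG pdf, Eq. 15."""
    r = (t / beta) ** (1.0 / sigma)
    return phi(a_t(t, alpha, beta, sigma)) * 2.0 / (alpha * sigma) * ((1.0 - p) / (r * t) + p * r / t)

def _tail(t, alpha, beta, sigma, p):
    """Second term of Eq. 17."""
    a = a_t(t, alpha, beta, sigma)
    return (1.0 - 2.0 * p) * math.exp(2.0 / alpha ** 2) * Phi(-math.sqrt(a * a + 4.0 / alpha ** 2))

def emig_cdf(t, alpha, beta, sigma, p):
    """EMIG cdf, Eq. 17."""
    return Phi(a_t(t, alpha, beta, sigma)) + _tail(t, alpha, beta, sigma, p)

def emig_sf(t, alpha, beta, sigma, p):
    """Survival function 1 - F_T(t), as in the denominator of Eq. 18."""
    return Phi(-a_t(t, alpha, beta, sigma)) - _tail(t, alpha, beta, sigma, p)

def emig_hazard(t, alpha, beta, sigma, p):
    """Hazard rate, Eq. 18 (the survival function underflows when a_t is very large)."""
    return emig_pdf(t, alpha, beta, sigma, p) / emig_sf(t, alpha, beta, sigma, p)

for sigma in (1.0, 2.0, 4.0):
    print(sigma, [emig_hazard(t, 1.0, 1.0, sigma, 0.25) for t in (0.5, 2.0, 10.0)])
```

With α = β = 1 and σ = 2, the hazard evaluated at a large t should be close to the limit 1/(2α²β) = 0.5 given in Eq. 19.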
Table 1 Values of E[T^k], for k = 1, 2, 3, 4, in the indicated models

| Model | E[T] | E[T²] | E[T³] | E[T⁴] |
|---|---|---|---|---|
| BS | (β/2)[2 + α²] | (β²/2)[2 + 4α² + 3α⁴] | (β³/2)[2 + 9α² + 18α⁴ + 15α⁶] | (β⁴/2)[2 + 16α² + 60α⁴ + 120α⁶ + 105α⁸] |
| IG | β | β²[1 + α²] | β³[1 + 3α² + 3α⁴] | β⁴[1 + 6α² + 15α⁴ + 15α⁶] |
| LBIG | β[1 + α²] | β²[1 + 3α² + 3α⁴] | β³[1 + 6α² + 15α⁴ + 15α⁶] | β⁴[1 + 10α² + 45α⁴ + 105α⁶ + 105α⁸] |
3.3 Moments
The moments of a positive continuous r.v. T can be obtained by means of its logarithmic transformation, Y = log(T), as
$$ M_Y(s) = E[\exp(s Y)] = E[T^s]. \tag{20} $$
Now, from the mgf of the SMIG distribution given in Eq. 14 and from Eq. 20, we easily compute the moments of the EMIG distribution for any real number s as
$$ E[T^s] = \beta^s\, \frac{[1 - p]\, K_{\frac{s\sigma - 1}{2}}\left(1/\alpha^2\right) + p\, K_{\frac{s\sigma + 1}{2}}\left(1/\alpha^2\right)}{K_{\frac{1}{2}}\left(1/\alpha^2\right)}. \tag{21} $$
By using Eq. 21, we can calculate, for example, the first four moments of T ∼ EMIG(α, β, σ, p). For p = 0, 1/2, 1, σ = 2, and s = 1, 2, 3, 4, some particular cases can be determined, which coincide with the moments of the IG distribution (see Johnson et al. 1994, pp. 262–264), the BS distribution (see Johnson et al. 1995, p. 653 and Kotz et al. 2005, pp. 2247–2250), and the LBIG distribution (see Balakrishnan et al. 2009). Table 1 presents a summary of these particular cases. Based on this table, it is possible to compute the standard deviation (SD) and the coefficients of variation (CV), skewness (CS), and kurtosis (CK) of the EMIG distribution.
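As an illustration of how Eq. 21 reproduces Table 1, the sketch below (standard-library Python) evaluates the Bessel functions by a midpoint rule on their integral representation and compares the resulting moments with the polynomial expressions of the table for σ = 2 and p = 0, 1/2, 1. The helper names, the coefficient dictionary, and the parameter values are assumptions of this example; the coefficients are copied from the IG and LBIG rows of Table 1, and the BS row is coded as the average of the other two.

```python
import math

def bessel_k(nu, z, n=4000):
    """K_nu(z) from int_0^inf exp(-z cosh t) cosh(nu t) dt, midpoint rule."""
    upper = math.acosh(1.0 + 800.0 / z)  # integrand is numerically zero beyond this point
    h = upper / n
    return h * sum(math.exp(-z * math.cosh((i + 0.5) * h)) * math.cosh(nu * (i + 0.5) * h)
                   for i in range(n))

def emig_moment(s, alpha, beta, sigma, p):
    """E[T^s] for T ~ EMIG(alpha, beta, sigma, p), Eq. 21."""
    z = 1.0 / alpha ** 2
    num = (1.0 - p) * bessel_k(0.5 * (s * sigma - 1.0), z) + p * bessel_k(0.5 * (s * sigma + 1.0), z)
    return beta ** s * num / bessel_k(0.5, z)

# Coefficients of 1, alpha^2, alpha^4, ... in Table 1; the LBIG entry for k uses the list k + 1.
COEF = {1: [1], 2: [1, 1], 3: [1, 3, 3], 4: [1, 6, 15, 15], 5: [1, 10, 45, 105, 105]}

def poly(coef, alpha):
    return sum(c * alpha ** (2 * j) for j, c in enumerate(coef))

def table1_moment(model, k, alpha, beta):
    """E[T^k] as tabulated in Table 1."""
    ig = beta ** k * poly(COEF[k], alpha)
    lbig = beta ** k * poly(COEF[k + 1], alpha)
    return {"IG": ig, "LBIG": lbig, "BS": 0.5 * (ig + lbig)}[model]

alpha, beta = 0.5, 2.0  # hypothetical values
for model, p in (("IG", 0.0), ("BS", 0.5), ("LBIG", 1.0)):
    for k in (1, 2, 3, 4):
        print(model, k, emig_moment(k, alpha, beta, 2.0, p), table1_moment(model, k, alpha, beta))

# SD and CV of the BS case from the first two moments.
m1, m2 = emig_moment(1, alpha, beta, 2.0, 0.5), emig_moment(2, alpha, beta, 2.0, 0.5)
sd = math.sqrt(m2 - m1 ** 2)
print(sd, sd / m1)
```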
4 Numerical Illustration and Implementation
In this section, some of the obtained results for the new model are illustrated. Firstly, by using the ML method and based on real data, the parameters of the EMIG distribution are estimated. Next, goodness-of-fit for this model and a comparison with other models are established. Finally, the ML estimates are used for obtaining an estimate of the pdf of the EMIG model. R packages named bs and ig are available from CRAN for the BS and IG distributions; see Leiva et al. (2006, 2008b). A new R package named emig has been developed by the authors in order to analyze data from the EMIG distribution, which is available upon request. The emig package contains probabilistic and lifetime indicators of the EMIG and SMIG models. For example, the graphical plots of EMIG and SMIG densities shown in Fig. 1 were made with the functions demig() and dsmig(), respectively. Furthermore, the functions pemig(), qemig(), and remig() allow us to compute the cdf and q.f. of the EMIG distribution and to generate random numbers from this model, respectively. ML estimates of the parameters of the EMIG distribution can be computed through the function mleemig().
Table 2 Descriptive statistics for precipitations (in inches)

| Mean | Median | SD | CV | CS | CK | Range | Min. | Max. | n |
|---|---|---|---|---|---|---|---|---|---|
| 2.16 | 1.64 | 1.25 | 57.8 | 1.34 | 0.97 | 4.61 | 1.01 | 5.62 | 25 |
The likelihood ratio (LR) test is implemented in the emig package by the function rlemig(), which is useful to check the suitability of the EMIG distribution. Also, we implement three criteria of model selection based on loss of information for the EMIG distribution. These methods are the Schwarz (SIC), Akaike (AIC), and Hannan-Quinn (HQC) information criteria and may be computed by the functions sicemig(), aicemig(), and hqcemig(), respectively.
4.1 Data Exploratory Analysis
For the purposes of illustration, we analyze a real data set corresponding to precipitations (inches) from Jug Bridge, Maryland. These data were analyzed by Folks and Chhikara (1987) using the MIG and IG distributions and they were implemented in the ig package. The data are: 1.01, 1.11, 1.13, 1.15, 1.16, 1.17, 1.17, 1.20, 1.52, 1.54, 1.54, 1.57, 1.64, 1.73, 1.79, 2.09, 2.09, 2.57, 2.75, 2.93, 3.19, 3.54, 3.57, 5.11, 5.62. Table 2 shows a descriptive summary, including the sample SD, CV, CS, and CK, while Fig. 2 (left side) shows a histogram of the data. The results presented in Table 2 were obtained by the function descriptiveSummary() of the ig package. A look at Table 2 and Fig. 2 (left side) reveals a positively skewed distribution (CS = 1.34) with moderate kurtosis (CK = 0.97). We propose the EMIG distribution for modeling these precipitations in Jug Bridge, Maryland.
4.2 Estimation and Model Checking
Firstly, in order to estimate the parameters α, β, σ, and p of the EMIG distribution, we consider the ML estimation method, which requires a numerical iterative
[Figure 2: two histograms with the density on the vertical axis. Left panel: precipitations (inches). Right panel: log-precipitations.]
Fig. 2 Estimated pdf curves of the EMIG and SMIG distributions based on precipitations and log-precipitations, respectively, and the histograms of these data
Table 3 ML estimates and log-likelihood (ℓ) for the indicated models and LR statistics (LRS) with their respective p-values for testing H0: indicated model vs. H1: EMIG model, based on the data of precipitations from Jug Bridge, Maryland

| Model | α̂ | β̂ | σ̂ | p̂ | −ℓ | LRS | p-value |
|---|---|---|---|---|---|---|---|
| EMIG | 6.29 | 2.43 | 32.65 | 0.34 | 29.2 | − | − |
| BS | 0.50 | 1.92 | − | − | 33.4 | 8.45 | 0.0146 |
| IG | 0.52 | 2.16 | − | − | 33.3 | 8.17 | 0.0168 |
| LBIG | 0.52 | 1.70 | − | − | 33.6 | 8.79 | 0.0124 |
| MIG | 0.52 | 2.16 | − | − | 33.3 | 8.17 | 0.0043 |
procedure. As starting values for the parameters, we use the ML estimates of α and β of the BS distribution and pre-fix σ = 2 and p = 0.5. Secondly, as mentioned, since the BS, IG, MIG, and LBIG distributions are nested within the EMIG distribution, we use the standard LR test to contrast the hypothesis of a competing distribution (BS, IG, MIG, or LBIG) against the alternative of the EMIG distribution. The LR statistic is −2 log(Λ), where Λ denotes the likelihood ratio, and follows approximately the χ² distribution with degrees of freedom equal to the difference between the numbers of parameters of the two competing models. For example, if we want to contrast the BS distribution (H0: σ = 2, p = 0.5) against the EMIG distribution, then Λ = L_BS(α, β)/L_EMIG(α, β, σ, p), where L_BS(α, β) and L_EMIG(α, β, σ, p) denote the likelihood functions of the BS and EMIG distributions, respectively, evaluated at the corresponding ML estimates, and −2 log(Λ) is approximately distributed as χ²(2). Table 3 summarizes the parameter estimates, log-likelihood functions, and LR statistics with their respective approximate p-values for the BS, IG, LBIG, MIG, and EMIG distributions based on the analyzed data. From this table, we conclude that the EMIG distribution fits the data much better than the BS, IG, LBIG, and MIG distributions (p-values < 0.02). In order to verify the results presented in Table 3, we have considered the criteria of model selection SIC, AIC, and HQC for the EMIG distribution, which are implemented in the emig package. The results for the considered criteria based on the precipitations data are presented in Table 4. All of these criteria confirm that the EMIG distribution presents a better fit than the BS, IG, LBIG, and MIG distributions. In order to show graphically how the EMIG distribution fits the precipitations data, we use the invariance property of the ML estimators for obtaining the estimated EMIG pdf. This fit is shown on the left side of Fig. 2 superimposed on the histogram of the data. In addition, a fit of the SMIG model to the logarithm of the data is also shown and superimposed on the histogram of the log-precipitations in Fig. 2 (right side). The results presented here show the excellent agreement between the EMIG and SMIG models and the data of precipitations and their logarithms, respectively.
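The p-values in Table 3 can be reproduced from the reported LR statistics, since the χ² upper-tail probability has a closed form for one and two degrees of freedom. The sketch below (standard-library Python) is only an illustration: the dictionaries restate the rounded values of Table 3, the degrees of freedom are taken as the difference in the number of parameters (an assumption consistent with the reported p-values), and `chi2_sf` is a hypothetical helper name.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square(df) distribution for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))  # P(|Z| > sqrt(x)) for standard normal Z
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

# Values restated from Table 3 (rounded as published).
NEG_LOGLIK = {"EMIG": 29.2, "BS": 33.4, "IG": 33.3, "LBIG": 33.6, "MIG": 33.3}
LRS = {"BS": 8.45, "IG": 8.17, "LBIG": 8.79, "MIG": 8.17}
DF = {"BS": 2, "IG": 2, "LBIG": 2, "MIG": 1}  # 4 minus the number of parameters under H0

def lr_from_loglik(model):
    """LR statistic recomputed from the rounded -log-likelihoods (differs slightly from LRS)."""
    return 2.0 * (NEG_LOGLIK[model] - NEG_LOGLIK["EMIG"])

p_values = {m: chi2_sf(LRS[m], DF[m]) for m in LRS}
for m in LRS:
    print(m, round(lr_from_loglik(m), 2), LRS[m], round(p_values[m], 4))
```

Small discrepancies in the last digit are to be expected because the published statistics are rounded.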
Table 4 SIC, AIC, and HQC for the indicated distributions based on the precipitations data

| Distribution | SIC | AIC | HQC |
|---|---|---|---|
| BS | 1.46 | 1.42 | 1.43 |
| IG | 1.46 | 1.41 | 1.42 |
| LBIG | 1.47 | 1.42 | 1.44 |
| MIG | 1.52 | 1.45 | 1.47 |
| EMIG | 1.42 | 1.33 | 1.35 |
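The paper does not state the exact formulas behind Table 4. The values are, however, consistent (up to rounding) with criteria normalized by the sample size, namely SIC = [−ℓ + k log(n)/2]/n, AIC = [−ℓ + k]/n, and HQC = [−ℓ + k log(log(n))]/n, where k is the number of parameters and n = 25. The following standard-library Python sketch applies these assumed formulas to the rounded −ℓ values of Table 3; both the formulas and the function name `criteria` are assumptions of this illustration, not a description of the emig package.

```python
import math

N = 25  # sample size of the precipitation data

# -log-likelihood (Table 3, rounded) and number of parameters of each model.
MODELS = {
    "BS": (33.4, 2),
    "IG": (33.3, 2),
    "LBIG": (33.6, 2),
    "MIG": (33.3, 3),
    "EMIG": (29.2, 4),
}

def criteria(neg_loglik, k, n):
    """Per-observation SIC, AIC, and HQC under the normalization assumed above."""
    sic = (neg_loglik + 0.5 * k * math.log(n)) / n
    aic = (neg_loglik + k) / n
    hqc = (neg_loglik + k * math.log(math.log(n))) / n
    return sic, aic, hqc

results = {name: criteria(nl, k, N) for name, (nl, k) in MODELS.items()}
for name, (sic, aic, hqc) in results.items():
    print(f"{name:5s} SIC={sic:.3f} AIC={aic:.3f} HQC={hqc:.3f}")

best = min(results, key=lambda name: results[name][1])  # model with the smallest AIC
print(best)
```

Because the inputs are rounded to one decimal place, the recomputed criteria may differ from Table 4 by about one unit in the second decimal.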
5 Conclusions
In this article, a new logarithmic distribution and its associated life distribution have been developed, which turn out to be quite flexible. Their densities and several plots of them have been given, showing how the parameters influence the shape of these densities. Also, some properties of the new models have been presented. An example with real data has shown that these new models are a flexible alternative to (and a better fit than) other models. With these extensions, new families of probability models have been developed, which might be of use for practitioners in, for example, actuarial, engineering, environmental, and medical sciences. In addition, the computational implementation of these new models has been provided and discussed, and is available from the authors upon request. Possible future work includes: (i) an extension of the results obtained in this paper to distributions with heavier-than-normal tails, which can lead to more general families and qualitatively robust parameter estimates; (ii) regression models related to these new models; and (iii) diagnostic techniques to detect potentially influential observations and to check the suitability of the new models.
Acknowledgements The authors would like to thank the editor and referees for their helpful comments, which aided in improving this article. This study was supported by FONDECYT 1080326, DIPUV 29/2006 and DIUFRO DI08-1002 grants, Chile.
References
Balakrishnan N, Leiva V, Sanhueza A, Cabrera E (2009) Mixture inverse Gaussian distribution and its transformations, moments and applications. Statistics (in press). http://dx.doi.org/10.1080/02331880701829948
Bhattacharyya GK, Fries A (1982) Fatigue failure models: Birnbaum-Saunders versus inverse Gaussian. IEEE Trans Reliab 31:439–440
Birnbaum ZW, Saunders SC (1969) A new family of life distributions. J Appl Probab 6:319–327
Chhikara RS, Folks JL (1989) The inverse Gaussian distribution: theory, methodology, and applications. Marcel Dekker, New York
Desmond AF (1986) On the relationship between two fatigue-life models. IEEE Trans Reliab 35:167–169
Fang KT, Kotz S, Ng KW (1990) Symmetric multivariate and related distributions. Chapman, London
Folks JL, Chhikara RS (1987) The inverse Gaussian distribution and its statistical application - a review. J R Stat Soc Ser B 40:263–289
Galton F (1879) The geometric mean, in vital and social statistics. Proc R Soc 29:365–367
Gibrat R (1930) Les inégalités économiques. Sirey, Paris
Gradshteyn IS, Ryzhik IM (2000) Table of integrals, series and products. Academic, New York
Gupta RC, Akman HO (1995) On the reliability studies of the weighted inverse Gaussian model. J Stat Plan Inference 48:69–83
Jörgensen B, Seshadri V, Whitmore G (1991) On the mixture of the inverse Gaussian distribution with its complementary reciprocal. Scand J Statist 18:77–89
Johnson NL (1949) Systems of frequency curves generated by methods of translation. Biometrika 36:149–176
Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions, vol 1. Wiley, New York
Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions, vol 2. Wiley, New York
Kapteyn J, van Uven MJ (1916) Skew frequency curves in biology and statistics. Hoitsema Brothers, Groningen
Kanefuji K, Iwase K (1996) Exponential inverse Gaussian distribution. Comput Stat 11:315–326
Kotz S, Campbell BR, Balakrishnan N, Vidakovic B (eds) (2005) Encyclopedia of statistics sciences, vol 4. Wiley, New York
Leiva V, Hernández H, Riquelme M (2006) A new package for the Birnbaum-Saunders distribution. R News 6:35–40. http://www.R-project.org/doc/Rnews/Rnews_2006-4.pdf
Leiva V, Barros M, Paula GA, Galea M (2007) Influence diagnostics in log-Birnbaum-Saunders regression models with censored data. Comput Stat Data Anal 51:5694–5707
Leiva V, Sanhueza A, Silva A, Galea M (2008a) A new three-parameter extension to the inverse Gaussian distribution. Stat Probab Lett 78:1266–1273
Leiva V, Hernández H, Sanhueza A (2008b) An R package for a general class of inverse Gaussian distributions. J Stat Softw 26(4). http://www.jstatsoft.org/v26/i04
Marshall AW, Olkin I (2007) Life distributions. Springer, New York
McAlister D (1879) The law of the geometric mean. Proc R Soc 29:367–376
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
R Development Core Team (2008) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org
Rieck JR, Nedelman JR (1991) A log-linear model for the Birnbaum-Saunders distribution. Technometrics 33:51–60
Sanhueza A, Leiva V, Balakrishnan N (2008a) A new class of inverse Gaussian type distributions. Metrika 68:31–49
Sanhueza A, Leiva V, Balakrishnan N (2008b) The generalized Birnbaum-Saunders distribution and its theory, methodology and application. Commun Stat Theory Methods 37:645–670
Saunders SC (1974) A family of random variables closed under reciprocation. J Am Stat Assoc 69:533–539
Saunders SC (2007) Reliability, life testing and prediction of services lives. Springer, New York
Schrödinger E (1915) Zur Theorie der Fall- und Steigversuche an Teilchen mit Brownscher Bewegung. Phys Z 16:289–295
Seshadri V (1993) The inverse Gaussian distribution: a case study in exponential families. Clarendon, New York
Seshadri V (1999) The inverse Gaussian distribution: statistical theory and applications. Springer, New York
Tweedie MCK (1957) Statistical properties of the inverse Gaussian distribution I. Ann Math Stat 28:362–377
Wicksell SD (1917) On the genetic theory of frequency. Arkiv för Matematik, Astronomi och Fysik 12:1–56
Whittaker ET, Watson GN (1927) A course of modern analysis. Cambridge University Press, Cambridge