On a bivariate Kumaraswamy type exponential distribution

S. M. Mirhosseini
Department of Statistics, Faculty of Mathematical Science, Ferdowsi University of Mashhad, P.O. Box 91775-1159, Mashhad, Iran
Department of Statistics, Yazd University, Yazd, Iran

A. Dolati
Department of Statistics, Yazd University, Yazd, 89195-741, Iran

M. Amini
Department of Statistics, Ordered and Spatial Data Center of Excellence, Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, P.O. Box 91775-1159, Mashhad, Iran
Abstract

This paper considers a class of absolutely continuous bivariate exponential distributions whose univariate margins are the ordinary exponential distributions. We study various mathematical properties of the proposed model and discuss maximum likelihood estimation of its parameters. An application to a real data example illustrates the flexibility of the proposed distribution for data analysis.

Keywords: Bivariate exponential distribution; bivariate Kumaraswamy distribution; copula; dependence; measure of association
2000 MSC: Primary 62E15; Secondary 62H10

Email address: [email protected] (M. Amini, corresponding author)
Preprint submitted to Communications in Statistics - Theory and Methods, June 22, 2014

1. Introduction

In many reliability situations bivariate lifetime data arise, and in these cases it is important to consider different distributions that could be used to model such data. According to Joe [10], Section 4.1, a parametric family of bivariate distributions should satisfy four desirable properties: (a) there should exist an interpretation such as a mixture or other stochastic representation; (b) the margins, at least the univariate ones, should belong to the same parametric family; (c) the bivariate dependence between
the margins should be described by a parameter and cover a wide range of dependence; (d) the joint distribution and the density should preferably have a closed-form representation; at least numerical evaluation should be possible. In general, these desirable properties cannot all be fulfilled simultaneously. There is no known bivariate family of distributions that has all of the properties, though the family of bivariate normal distributions may be the closest; see, e.g., [10], Section 4.1. It remains an open problem to find parametric families of distributions that satisfy all of the desirable properties. The exponential distributions are perhaps the most widely applied statistical distributions in reliability. A large number of bivariate exponential distributions have been proposed in the literature; see, e.g., [1, 2, 5, 12]. A wide survey of bivariate exponential and related distributions can be found in Chapter 10 of the recent book by Balakrishnan and Lai [3]. Motivated by Joe's comment above, the aim of this paper is to propose a bivariate exponential distribution satisfying properties (a), (b), (d) and a range of dependence suitable for applications. Starting from two univariate cumulative distribution functions (c.d.f.s) F1 and F2, we consider a bivariate distribution of the form

H_α(x, y) = 1 − (1 − F1(x)F2(y))^α,  x, y ∈ R.   (1)

The univariate marginal survival functions of (1) belong to the well-known proportional hazard rate (PHR) model, given by H̄_i(x) = F̄_i(x)^α, i = 1, 2. Unlike the univariate case, the function defined by (1) is not necessarily a bivariate c.d.f. for all α > 0. We will show that H is a bivariate distribution function when 0 < α ≤ 1 and discuss the dependence properties of this model when the baseline univariate c.d.f.s F1 and F2 are ordinary exponential distribution functions. The paper is organized as follows: the derivation of the model is discussed in Section 2. Various properties of the new model are discussed in Section 3. Section 4 is devoted to inference about the parameters. Some simulation results and a real data analysis for illustrative purposes are provided in Section 5. The paper ends with some concluding remarks in Section 6.
2. Derivation of the model

2.1. The general form

Following the idea given in [6], let {X1, X2, ...} and {Y1, Y2, ...} be two sequences of mutually independent and identically distributed (i.i.d.) random variables. It is assumed that for k ∈ {1, 2, 3, ...}, Xk ∼ Exp(λ1) and Yk ∼ Exp(λ2). Consider a sequence of independent Bernoulli trials in which the kth trial has probability of success α/k, 0 < α ≤ 1, k ∈ {1, 2, 3, ...}, and let N be the trial number on which the first success occurs. The discrete random variable N has the probability mass function

P(N = n) = (α/n) (1 − α) (1 − α/2) ⋯ (1 − α/(n − 1)),  n = 1, 2, ...,   (2)

and its probability generating function is given by

g(t) = E(t^N) = 1 − (1 − t)^α,  t ∈ [0, 1].   (3)

Remark 1. A random variable N with the probability mass function (2) is known as a discrete Mittag-Leffler random variable and was studied in [15].

Let

U = max(X1, ..., XN)  and  V = max(Y1, ..., YN).   (4)
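As a quick numerical check of (2) and (3) (our sketch, not part of the original paper), the pmf can be generated by the recurrence P(N = n+1) = P(N = n)(n − α)/(n + 1) and summed against the closed-form pgf:

```python
# Sanity check of the discrete Mittag-Leffler pmf (2) against the pgf (3).
alpha = 0.4

def pmf(n, alpha):
    # P(N = n) = (alpha/n) * (1 - alpha) * (1 - alpha/2) * ... * (1 - alpha/(n-1))
    p = alpha / n
    for k in range(1, n):
        p *= 1 - alpha / k
    return p

# build P(N = 1), ..., P(N = 4000) via the ratio P(n+1)/P(n) = (n - alpha)/(n + 1)
probs, p = [], alpha
for n in range(1, 4001):
    probs.append(p)
    p *= (n - alpha) / (n + 1)

total = sum(probs)                       # slightly below 1: the tail decays like n^(-alpha)
t = 0.7
g_series = sum(q * t**n for n, q in enumerate(probs, start=1))
g_closed = 1 - (1 - t) ** alpha          # pgf (3)
```

The geometric factor t^n makes the truncated series agree with the closed form to near machine precision even though the pmf itself is heavy-tailed.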
The cumulative distribution function of (U, V) is then

H(x, y) = P{U ≤ x, V ≤ y} = Σ_{n=1}^∞ [F1(x)F2(y)]^n P(N = n) = g{F1(x)F2(y)} = 1 − {1 − F1(x)F2(y)}^α,  x, y ∈ R,   (5)

with the univariate marginal distributions

H_i(x) = 1 − F̄_i(x)^α,  i = 1, 2.   (6)

Note that for the case α = 1 we have H(x, y) = F1(x)F2(y). These observations lead to the following result.

Proposition 1. For any univariate distribution functions F1 and F2 and 0 < α ≤ 1, the function defined by (1) is a bivariate distribution function.
Remark 2. Notice that the function defined by (1) may fail to be a bivariate distribution function when α > 1. For example, for i = 1, 2, let F_i(x) = x, 0 ≤ x ≤ 1, and let x1 = y1 = 1/2 and x2 = y2 = 1. Then for H(x, y) = 1 − (1 − xy)^α we have

H(x2, y2) − H(x1, y2) − H(x2, y1) + H(x1, y1) = 2 (1/2)^α − (3/4)^α < 0

for all α ≥ 2, and hence H is not a bivariate distribution function.
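Proposition 1 and Remark 2 can be illustrated numerically (a sketch with uniform baselines; the function names are ours): the H-volume of every rectangle must be nonnegative for a genuine distribution function.

```python
# Rectangle (H-volume) check for H(x, y) = 1 - (1 - xy)^alpha with uniform
# baselines F_i(x) = x, illustrating Proposition 1 and Remark 2.
def H(x, y, alpha):
    return 1 - (1 - x * y) ** alpha

def volume(x1, x2, y1, y2, alpha):
    return (H(x2, y2, alpha) - H(x1, y2, alpha)
            - H(x2, y1, alpha) + H(x1, y1, alpha))

grid = [i / 10 for i in range(11)]
# 0 < alpha <= 1: every rectangle has nonnegative H-volume
ok = all(volume(a, b, c, d, 0.5) >= -1e-12
         for a in grid for b in grid if a < b
         for c in grid for d in grid if c < d)
# alpha = 3: the rectangle of Remark 2 has negative H-volume
bad = volume(0.5, 1.0, 0.5, 1.0, 3.0)   # 2*(1/2)^3 - (3/4)^3 = -0.171875
```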
Since the marginal survival functions of (5) satisfy H̄_i(x)^{1/α} = 1 − F_i(x), i = 1, 2, another representation of (5) is given by

H(x, y) = 1 − H̄1(x) H̄2(y) ( H̄1(x)^{−1/α} + H̄2(y)^{−1/α} − 1 )^α.

Note also that α is the association parameter of the model. For the case α = 1 it reduces to H(x, y) = H1(x)H2(y), and in the limit α → 0 we obtain H(x, y) = min{H1(x), H2(y)}.

Remark 3. Note that the bivariate distribution defined by (1) is a Kumaraswamy type distribution. A random variable with a Kumaraswamy distribution has a univariate cumulative distribution function (c.d.f.) of the following form [13]:

F(x) = 1 − (1 − x^β)^α,  x ∈ (0, 1), α, β > 0.   (7)
The background genesis of the Kumaraswamy distribution can be seen in [11]. The proposed model (1) is a bivariate version of (7) with β = 1 and α ∈ (0, 1].

2.2. A bivariate exponential distribution

Using (1) with the baseline distributions F_i(x) = 1 − e^{−λ_i x}, i = 1, 2, we obtain a bivariate exponential distribution of the form

H(x, y) = 1 − {1 − (1 − e^{−λ1 x})(1 − e^{−λ2 y})}^α,  x, y ≥ 0, λ1, λ2 > 0, 0 < α ≤ 1,   (8)

whose univariate c.d.f.s are ordinary exponential distributions, H_i(x) = 1 − e^{−λ_i α x}, i = 1, 2. The joint density function of (8) is given by

h(x, y) = λ1 λ2 α e^{−(λ1 x + λ2 y)} {1 − α(1 − e^{−λ1 x})(1 − e^{−λ2 y})} {1 − (1 − e^{−λ1 x})(1 − e^{−λ2 y})}^{α−2}.   (9)
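The density (9) can be cross-checked against a mixed central finite difference of the c.d.f. (8) (a numerical sketch; all names are ours):

```python
import math

# Check that the density (9) matches the mixed partial derivative of (8),
# approximated by a central finite difference at one interior point.
def H(x, y, a, l1, l2):
    return 1 - (1 - (1 - math.exp(-l1 * x)) * (1 - math.exp(-l2 * y))) ** a

def h(x, y, a, l1, l2):
    A = (1 - math.exp(-l1 * x)) * (1 - math.exp(-l2 * y))
    return l1 * l2 * a * math.exp(-(l1 * x + l2 * y)) * (1 - a * A) * (1 - A) ** (a - 2)

a, l1, l2, x, y, d = 0.6, 1.0, 2.0, 0.8, 0.5, 1e-4
num = (H(x + d, y + d, a, l1, l2) - H(x + d, y - d, a, l1, l2)
       - H(x - d, y + d, a, l1, l2) + H(x - d, y - d, a, l1, l2)) / (4 * d * d)
```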
Since 0 < (1 − e^{−λ1 x})(1 − e^{−λ2 y}) < 1, by using the binomial expansion

(1 − x)^t = Σ_{j=0}^∞ C(t, j) (−1)^j x^j,  |x| < 1, t ∈ R,

with C(t, j) = t(t − 1)(t − 2) ⋯ (t − j + 1)/j!, after some simplification one may rewrite (9) as

h(x, y) = λ1 λ2 e^{−(λ1 x + λ2 y)} Σ_{j=1}^∞ C(α, j) (−1)^{j+1} j² {(1 − e^{−λ1 x})(1 − e^{−λ2 y})}^{j−1}
        = Σ_{j=1}^∞ C(α, j) (−1)^{j+1} f_{j:j}(x; λ1) f_{j:j}(y; λ2),   (10)
where f_{j:j}(·; λ_i) denotes the density function of the maximum order statistic of a sample of size j from the exponential distribution with parameter λ_i, i = 1, 2. If (Z1, Z2) is a random vector with distribution function (8), we say that (Z1, Z2) has the bivariate Kumaraswamy exponential distribution (BKE, in short) with parameters λ1, λ2, α, denoted by (Z1, Z2) ∼ BKE(λ1, λ2, α). By the stochastic representation (4), a random vector (Z1, Z2) ∼ BKE(λ1, λ2, α) has the same distribution as the vector (max(X1, ..., XN), max(Y1, ..., YN)), where X1, ..., XN and Y1, ..., YN are two sets of independent exponential random variables with the respective scale parameters λ1 and λ2, and the counting random variable N has the discrete Mittag-Leffler distribution with parameter α defined by (2). Thus it is easy to generate random variates from the BKE distribution. We present the following simple algorithm to generate (Z1, Z2) from BKE(λ1, λ2, α).

Step 1. Generate N from the discrete Mittag-Leffler distribution with parameter α, using the algorithm given in Appendix C.
Step 2. Generate U1, ..., UN and V1, ..., VN from U(0, 1).
Step 3. For i = 1, ..., N, compute X_i = −(1/λ1) ln(1 − U_i) and Y_i = −(1/λ2) ln(1 − V_i).
Step 4. Obtain Z1 = max(X1, ..., XN) and Z2 = max(Y1, ..., YN).

3. Properties

In this section we discuss different properties of the proposed model.
3.1. Moments

First we provide expressions for the joint moment generating function (m.g.f.) and the Pearson correlation of the BKE model.

Proposition 2. If the random vector (X, Y) ∼ BKE(λ1, λ2, α), then:
(a) the joint m.g.f. of (X, Y), for |t| < λ1 and |s| < λ2, is given by

M_{X,Y}(t, s) = αλ1λ2 / ((λ1 − t)(λ2 − s)) + Σ_{j=2}^∞ C(α, j) (−1)^{j+1} j² B(1 − t/λ1, j) B(1 − s/λ2, j),

where B(a, b) = ∫_0^1 x^{a−1}(1 − x)^{b−1} dx is the beta function;
(b) the Pearson correlation coefficient of (X, Y) is given by

Corr(X, Y) = α² Σ_{j=1}^∞ C(α, j) (−1)^{j+1} ( Σ_{k=1}^j 1/k )² − 1.

Proof. For part (a), from (10) we obtain

M_{X,Y}(t, s) = E(e^{tX + sY}) = Σ_{j=1}^∞ C(α, j) (−1)^{j+1} j² M1(t; λ1) M2(s; λ2),

where, for i = 1, 2,

M_i(t; λ_i) = ∫_0^∞ λ_i e^{tx} e^{−λ_i x} (1 − e^{−λ_i x})^{j−1} dx = ∫_0^1 (1 − u)^{−t/λ_i} u^{j−1} du = B(1 − t/λ_i, j).

For part (b), using (10), the product moment can be obtained as

E(XY) = Σ_{j=1}^∞ C(α, j) (−1)^{j+1} (1/(λ1λ2)) ( ∫_0^∞ j y e^{−y} (1 − e^{−y})^{j−1} dy )².   (11)

By using the binomial expansion (1 − e^{−y})^{j−1} = Σ_{k=0}^{j−1} C(j−1, k) (−1)^k e^{−ky}, we have

∫_0^∞ j y e^{−y} (1 − e^{−y})^{j−1} dy = j Σ_{k=0}^{j−1} C(j−1, k) (−1)^k ∫_0^∞ y e^{−(k+1)y} dy = j Σ_{k=0}^{j−1} C(j−1, k) (−1)^k / (k+1)² = Σ_{k=1}^j 1/k.   (12)
Since the univariate margins of the BKE distribution are ordinary exponential distributions with expectations and variances equal to 1/(αλ_i) and 1/(αλ_i)², i = 1, 2, respectively, the expression for the Pearson correlation coefficient is immediate.

The following result gives the conditional moments of the BKE model.

Proposition 3. If the random vector (X, Y) ∼ BKE(λ1, λ2, α), then

E(Y | X = t) = Σ_{j=1}^∞ [C(α, j) (−1)^{j+1} j / (αλ2)] e^{−λ1(1−α)t} (1 − e^{−λ1 t})^{j−1} Σ_{k=1}^j 1/k,

and

E(Y² | X = t) = Σ_{j=1}^∞ [C(α, j) (−1)^{j+1} j / (αλ2²)] e^{−λ1(1−α)t} (1 − e^{−λ1 t})^{j−1} ( (Σ_{k=1}^j 1/k)² + Σ_{k=1}^j 1/k² ).

Proof. From (10), the conditional density function of (Y | X = t) is given by

h_{Y|X}(y | t) = (λ2/α) e^{−λ1(1−α)t − λ2 y} Σ_{j=1}^∞ C(α, j) (−1)^{j+1} j² {(1 − e^{−λ1 t})(1 − e^{−λ2 y})}^{j−1}.

Thus we have

E(Y | X = t) = ∫_0^∞ y h_{Y|X}(y | t) dy
  = Σ_{j=1}^∞ [C(α, j) (−1)^{j+1} j / α] e^{−λ1(1−α)t} (1 − e^{−λ1 t})^{j−1} ( ∫_0^∞ j λ2 y e^{−λ2 y} (1 − e^{−λ2 y})^{j−1} dy ) / j
  = Σ_{j=1}^∞ [C(α, j) (−1)^{j+1} j / (αλ2)] e^{−λ1(1−α)t} (1 − e^{−λ1 t})^{j−1} Σ_{k=1}^j 1/k,

where the last identity follows from (12). For E(Y² | X = t), a similar argument together with the identity (see, e.g., [8])

2j Σ_{k=0}^{j−1} C(j−1, k) (−1)^k / (k+1)³ = ( Σ_{k=1}^j 1/k )² + Σ_{k=1}^j 1/k²

gives the required result.
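Identity (12), which says that the mean of the maximum of j i.i.d. unit exponentials equals the harmonic number Σ_{k=1}^j 1/k, can be verified by direct quadrature (a sketch; a crude midpoint rule suffices here):

```python
import math

# Numerical check of identity (12): the mean of the maximum of j i.i.d.
# unit exponentials equals the harmonic number H_j = sum_{k=1}^j 1/k.
def max_exp_mean(j, n=200000, upper=40.0):
    # midpoint-rule evaluation of the integral in (12) over [0, upper]
    step = upper / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * step
        total += j * y * math.exp(-y) * (1 - math.exp(-y)) ** (j - 1) * step
    return total

H5 = sum(1.0 / k for k in range(1, 6))   # harmonic number H_5
```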
3.2. Stress-strength parameter

The stress-strength parameter R = P(X < Y) is useful for data analysis purposes [4]. The following proposition gives a convenient form for the stress-strength parameter of the BKE model.

Proposition 4. Suppose that (X, Y) ∼ BKE(λ1, λ2, α). Then

P(X < Y) = Σ_{j=1}^∞ Σ_{k=0}^j (−1)^{j+k+1} j C(α, j) C(j, k) B(j, kλ1/λ2 + 1)  if λ1 ≠ λ2,

and P(X < Y) = 1/2 if λ1 = λ2 (in the latter case H in (8) is symmetric in x and y, so the pair is exchangeable), where B(·, ·) denotes the beta function.

Proof. From (10) we have

P(X < Y) = ∫_0^∞ ∫_0^y h(x, y) dx dy
  = Σ_{j=1}^∞ C(α, j) (−1)^{j+1} j² ∫_0^∞ ∫_0^y λ1 λ2 e^{−(λ1 x + λ2 y)} {(1 − e^{−λ1 x})(1 − e^{−λ2 y})}^{j−1} dx dy
  = Σ_{j=1}^∞ C(α, j) (−1)^{j+1} j² Ψ(j, λ1, λ2),

where, substituting u = e^{−λ2 y},

Ψ(j, λ1, λ2) = (1/j) ∫_0^∞ λ2 e^{−λ2 y} (1 − e^{−λ2 y})^{j−1} (1 − e^{−λ1 y})^j dy
  = (1/j) ∫_0^1 (1 − u)^{j−1} (1 − u^{λ1/λ2})^j du
  = (1/j) Σ_{k=0}^j (−1)^k C(j, k) ∫_0^1 (1 − u)^{j−1} u^{kλ1/λ2} du
  = (1/j) Σ_{k=0}^j (−1)^k C(j, k) B(j, kλ1/λ2 + 1),

which gives the required result.
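The double series of Proposition 4 can be checked numerically (our sketch, not from the paper): written with the exponent kλ1/λ2 it reproduces the independence value P(X < Y) = λ1/(λ1 + λ2) at α = 1, and converges to the exchangeable value 1/2 when λ1 = λ2.

```python
import math

# Numerical check of the series in Proposition 4.
def beta_fn(a, b):
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def p_x_less_y(alpha, l1, l2, J=500):
    total, c = 0.0, alpha            # c = C(alpha, j), updated by recurrence
    for j in range(1, J + 1):
        if abs(c) < 1e-18:           # series terminates exactly at alpha = 1
            break
        inner = sum((-1) ** k * math.comb(j, k) * beta_fn(j, k * l1 / l2 + 1)
                    for k in range(j + 1))
        total += (-1) ** (j + 1) * j * c * inner
        c *= (alpha - j) / (j + 1)
    return total
```

The tolerance for the λ1 = λ2 case is loose because the series converges only at a polynomial rate in the truncation point J.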
3.3. Dependence properties

In what follows we discuss the dependence properties of the BKE distribution through its associated copula. In view of Sklar's Theorem [14], solving the equation

C_α{H1(x), H2(y)} = H(x, y)   (13)

for the function C_α : [0, 1] × [0, 1] → [0, 1] yields the underlying copula associated with (8), as

C_α(u, v) = 1 − {1 − (1 − (1 − u)^{1/α})(1 − (1 − v)^{1/α})}^α,   (14)

for all u, v ∈ (0, 1) and 0 < α ≤ 1. The theory and applications of copulas are well documented in [14]. This copula turns out to be a member of the Archimedean family of copulas with the strict generator φ(t) = −ln(1 − (1 − t)^{1/α}) (see [6] for details). Note that the density function of the BKE distribution defined by (9) can be rewritten as

h(x, y) = h1(x) h2(y) c_α(H1(x), H2(y)),   (15)

where H_i(x) = 1 − e^{−λ_i α x} and h_i(x) = λ_i α e^{−λ_i α x}, i = 1, 2, are the marginal distribution functions and the marginal density functions, respectively, and c_α(u, v) = ∂²C_α(u, v)/∂u∂v is the density function of the copula (13), given by

c_α(u, v) = (1/α) ((1 − u)(1 − v))^{1/α − 1} {1 − α(1 − (1 − u)^{1/α})(1 − (1 − v)^{1/α})} {1 − (1 − (1 − u)^{1/α})(1 − (1 − v)^{1/α})}^{α−2}.   (16)

Recall that for two copulas C1 and C2, we say that C2 is more concordant than C1 (written C1 ≺c C2) if C1(u, v) ≤ C2(u, v) for all u, v ∈ (0, 1). A pair (X, Y) with associated copula C is positively quadrant dependent (written PQD) if Π ≺c C, where Π(u, v) = uv is the product copula [14]. The dependence properties of the BKE distribution depend only on the parameter α. The copula C_α defined by (14) is negatively ordered with respect to α; that is, for α1, α2 ∈ (0, 1] such that α1 ≤ α2, we have C_{α2} ≺c C_{α1} (see [6] for details). As a consequence, for α ≤ 1 we have Π(u, v) = C1(u, v) ≤ C_α(u, v) for all u, v ∈ (0, 1); that is, a pair (X, Y) distributed as BKE(λ1, λ2, α) is PQD.

Remark 4. Since the BKE distribution defined by (8) has the PQD property, it is suitable for describing positive dependence of a random pair (X, Y). However, it is simple to obtain a related model describing negative dependence: it suffices to consider the copula C*_α given by C*_α(u, v) = u − C_α(u, 1 − v). The properties of the copula C*_α can be obtained in a simple way from the corresponding properties of the copula C_α; see [14] for details.

Let (X, Y) and (X′, Y′) be two continuous random vectors with the same univariate marginals and respective joint density functions f and g. The pair (X, Y) is said to be more positive likelihood ratio dependent (PLRD) than the pair (X′, Y′), denoted (X′, Y′) ≺PLRD (X, Y), if

f(x1, y1) f(x2, y2) g(x1, y2) g(x2, y1) ≥ f(x1, y2) f(x2, y1) g(x1, y1) g(x2, y2)   (17)

whenever x1 ≤ x2 and y1 ≤ y2 [16]. When X′ and Y′ are independent, the pair (X, Y) is said to be PLRD, and condition (17) reduces to f(x1, y1)f(x2, y2) ≥ f(x1, y2)f(x2, y1). Holland and Wang [9] showed that a sufficient condition for PLRD in the case of continuous random variables is that ∂² ln f(x, y)/∂x∂y ≥ 0.

Proposition 5. Suppose that (X, Y) ∼ BKE(λ1, λ2, α). Then (X, Y) is PLRD.

Proof. Let (U, V) be a random vector with the joint distribution function H^u(u, v) = 1 − (1 − uv)^α, u, v ∈ [0, 1], 0 < α ≤ 1, and let h^u be the density function associated with H^u. Since ∂² ln h^u(u, v)/∂u∂v ≥ 0 for all u, v ∈ [0, 1] and 0 < α ≤ 1, by the result of Holland and Wang [9] the pair (U, V) is PLRD; that is, (U′, V′) ≺PLRD (U, V), where U′ and V′ are independent with the same univariate marginal distributions as U and V. For i = 1, 2, let φ_i(t) be the increasing mapping t → −(1/λ_i) ln(1 − t). Since the pair (X, Y) has the same joint distribution as the pair (φ1(U), φ2(V)), in view of Theorem 9.D.2 in [16] we have the required result.

3.4. Kendall's tau and Spearman's rho

The population versions of two of the most common nonparametric measures of association between the components of a continuous random pair (X, Y) are Kendall's tau (τ) and Spearman's rho (ρ), which depend only on the copula C of the pair (X, Y) and are given by

τ(X, Y) = 4 ∫_0^1 ∫_0^1 C(u, v) dC(u, v) − 1,   (18)

and

ρ(X, Y) = 12 ∫_0^1 ∫_0^1 C(u, v) du dv − 3.   (19)
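The copula (14) and its density (16) can be cross-checked numerically, and the PQD property verified on a grid (a sketch; all names are ours):

```python
# Numerical cross-checks for the copula (14) and its density (16).
def C(u, v, a):
    return 1 - (1 - (1 - (1 - u) ** (1 / a)) * (1 - (1 - v) ** (1 / a))) ** a

def c_density(u, v, a):
    A = 1 - (1 - u) ** (1 / a)
    B = 1 - (1 - v) ** (1 / a)
    pref = ((1 - u) * (1 - v)) ** (1 / a - 1) / a
    return pref * (1 - a * A * B) * (1 - A * B) ** (a - 2)

# mixed central finite difference of C should match c_density
a, u, v, d = 0.6, 0.4, 0.7, 1e-4
num = (C(u + d, v + d, a) - C(u + d, v - d, a)
       - C(u - d, v + d, a) + C(u - d, v - d, a)) / (4 * d * d)

# PQD: C_alpha(u, v) >= u*v on an interior grid for 0 < alpha <= 1
grid = [i / 20 for i in range(1, 20)]
pqd = all(C(x, y, 0.5) >= x * y - 1e-12 for x in grid for y in grid)
```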
See [14] for details. The following result provides expressions for these measures associated with a vector (X, Y) having the BKE distribution.

Proposition 6. Suppose that (X, Y) ∼ BKE(λ1, λ2, α). Then

τ(X, Y) = 1 + 4α² B(2, 2α − 1) (Ψ(2) − Ψ(2α + 1)),   (20)

and

ρ(X, Y) = 9 − 12α² Σ_{j=0}^∞ (−1)^j C(α, j) [B(j + 1, α)]²,   (21)

where B denotes the beta function and Ψ is the digamma function.

Proof. The result follows from Proposition 9 in [6].

Remark 5. Note that as a consequence of the PQD property of the BKE distribution, for (X, Y) ∼ BKE(λ1, λ2, α) we have Corr(X, Y) ≥ 0, τ(X, Y) ≥ 0 and ρ(X, Y) ≥ 0.
α        ρ(α)          τ(α)           ρ(α)/τ(α)
0.0001   0.999999935   0.9998000258   1.000200
0.1001   0.952423077   0.8218848820   1.158828
0.2001   0.854503656   0.6770884892   1.262027
0.3001   0.739507866   0.5547812933   1.332972
0.4001   0.620781827   0.4487290130   1.383423
0.5001   0.504091758   0.3549780000   1.420065
0.6001   0.391946075   0.2708654200   1.447014
0.7001   0.285344232   0.1945181220   1.466929
0.8001   0.184556871   0.1245645610   1.481616
0.9001   0.089498325   0.0599718330   1.492339

Table 1: Kendall's tau and Spearman's rho associated with the family of distributions (8) for different values of α ∈ (0, 1].
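The closed forms (20) and (21) can be recomputed and compared with Table 1 (our sketch; the digamma function is implemented by a standard asymptotic expansion, and B(2, 2α − 1) = 1/((2α − 1)·2α) is used directly):

```python
import math

def digamma(x):
    # asymptotic expansion after shifting the argument above 8
    r = 0.0
    while x < 8.0:
        r -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def tau(a):
    b = 2 * a - 1                    # B(2, 2a - 1) = 1 / (b * (b + 1))
    return 1 + 4 * a * a * (digamma(2.0) - digamma(2 * a + 1)) / (b * (b + 1))

def rho(a, J=4000):
    s, c = 0.0, 1.0                  # c = C(a, j), starting at j = 0
    b = 1.0 / a                      # B(j + 1, a), starting at j = 0
    for j in range(J):
        s += (-1) ** j * c * b * b
        c *= (a - j) / (j + 1)
        b *= (j + 1) / (j + 1 + a)
    return 9 - 12 * a * a * s
```

Both functions vanish at α = 1, consistent with independence, and reproduce the tabulated values to within the truncation error of the ρ series.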
The values of Kendall's tau and Spearman's rho for the BKE distribution are given in Table 1. It follows that ρ(α)/τ(α) → 3/2 as α → 1.
4. Maximum likelihood estimation

In this section we describe how to obtain the maximum likelihood estimators of the unknown parameters based on a random sample of size n from the BKE model. Let (X1, Y1), ..., (Xn, Yn) be a random sample from (X, Y) ∼ BKE(λ1, λ2, α). Then the log-likelihood function based on the observations (x_i, y_i), i = 1, ..., n, becomes

l(θ) = n ln(αλ1λ2) − λ1 Σ_{i=1}^n x_i − λ2 Σ_{i=1}^n y_i
     + (α − 2) Σ_{i=1}^n ln(1 − (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))
     + Σ_{i=1}^n ln(1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i})),   (22)
where θ = (α, λ1, λ2). The maximum likelihood estimates can be obtained by maximizing (22) with respect to the unknown parameters. As expected, they cannot be obtained in explicit form; one needs to solve three non-linear equations to compute the MLEs (see Appendix B for details). The Newton–Raphson method or another optimization routine may be used to maximize (22), but any optimization process requires initial guesses of the parameters. Since BKE is a regular family, the usual asymptotic normality result holds in this case, i.e. √n(θ̂ − θ) →d N3(0, I^{−1}), where I is the Fisher information matrix. The Fisher information matrix cannot be obtained in explicit form; most of its elements can be obtained only in terms of double summations. We have provided (in Appendix B) the observed Fisher information matrix, which can be used to compute asymptotic confidence intervals for the unknown parameters.

Now we discuss how to obtain the initial guesses based on the observed sample. Since α is the association parameter, an initial value for this parameter can be calculated by inversion of Kendall's τ or Spearman's ρ. Specifically, the theoretical value of these measures is computed using (20) and (21), and one of the equations τ(α) = τn or ρ(α) = ρn is solved to obtain an estimate of α, say α̃, where τn and ρn are the sample estimates of Kendall's tau and Spearman's rho. Since these equations cannot be inverted analytically from (20) and (21), numerical interpolation can be used. To find initial estimates for the parameters λ1 and λ2, we make the re-parameterization β1 = λ1α and β2 = λ2α. Since X ∼ Exp(β1) and Y ∼ Exp(β2), in the first step we calculate the MLEs of β1 and β2 based on the respective marginals, namely β̃1 = 1/x̄ and β̃2 = 1/ȳ. In the next step, we put λ̃1 = β̃1/α̃ and λ̃2 = β̃2/α̃ as the initial estimates of λ1 and λ2.

Remark 6. As one of the reviewers mentioned, the existence and uniqueness of the MLEs are important issues that should be considered. For real data analysis one can use the plot of the profile likelihood for α, though this is not enough to justify existence and uniqueness for all three estimates considered here. For a theoretical justification, one needs to show that the negative of the Hessian matrix (whose components are the expected values of the second partial derivatives of the log-likelihood function with respect to the parameters) is positive definite. Because of the complexity of the Hessian matrix, at this point the authors could not provide a rigorous proof of this property, and it remains for future work.

5. Simulation results and data analysis

5.1. Simulation results

In this subsection we present some simulation results to see how the maximum likelihood estimates behave for different sample sizes and parameter values. Data are simulated from the BKE distribution with the two sets of parameters (α, λ1, λ2) = (0.3, 1, 1) and (α, λ1, λ2) = (0.8, 2, 3). The sample sizes considered are 30, 50 and 100 observations. In each case we computed the maximum likelihood estimates of the unknown parameters by maximizing the log-likelihood function (22). We computed the average estimates (AE), bias, standard errors (SE) and root mean squared errors (RMSE) over 1000 replications; the results are reported in Table 2. In all cases the performance of the maximum likelihood estimates is quite satisfactory. As expected, the standard errors, root mean squared errors and absolute biases decrease for all parameters as the sample size increases. The average estimates of λ1 and λ2 decrease toward the true values in all cases, while the AE of α increases in some cases.
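The data-generation scheme of Section 2.2 can be reproduced in outline as follows (a sketch, not the authors' code; the maximum of N i.i.d. exponentials is drawn in one step by inverting its c.d.f. (1 − e^{−λx})^N, which avoids generating N separate variates):

```python
import numpy as np

rng = np.random.default_rng(12345)

def draw_N(alpha):
    # inversion sampling of the discrete Mittag-Leffler pmf (2)
    u, n, p = rng.random(), 1, alpha
    cdf = p
    while u > cdf:
        p *= (n - alpha) / (n + 1)   # ratio P(N = n+1) / P(N = n)
        n += 1
        cdf += p
    return n

def draw_bke(alpha, l1, l2, size):
    out = np.empty((size, 2))
    for i in range(size):
        n = draw_N(alpha)            # the shared N induces the dependence
        out[i, 0] = -np.log(1 - rng.random() ** (1.0 / n)) / l1
        out[i, 1] = -np.log(1 - rng.random() ** (1.0 / n)) / l2
    return out

z = draw_bke(0.8, 1.0, 2.0, 10000)   # margins are Exp(alpha * lambda_i)
```

With these parameters the marginal means should be close to 1/(0.8·1) = 1.25 and 1/(0.8·2) = 0.625, and the sample correlation should be positive, in line with the PQD property.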
Table 2: Average estimates (AE), bias, SE and RMSE based on 1000 simulations of the bivariate Kum-exponential distribution for n = 30, 50 and 100.

                  (α, λ1, λ2) = (0.3, 1, 1)                (α, λ1, λ2) = (0.8, 2, 3)
 n         AE       Bias     SE       RMSE       AE       Bias      SE       RMSE
 30   α̂   0.30005  0.00005  0.07021  0.07021    0.78445  -0.01555  0.14797  0.14879
      λ̂1  1.06333  0.06333  0.19631  0.20627    2.17969   0.17969  0.49326  0.52497
      λ̂2  1.06196  0.06196  0.19471  0.20434    3.25784   0.25784  0.78935  0.83040
 50   α̂   0.3003   0.0003   0.0543   0.0543     0.79497  -0.00503  0.11759  0.11769
      λ̂1  1.0387   0.0387   0.1462   0.1513     2.09070   0.09070  0.37997  0.39064
      λ̂2  1.0405   0.0405   0.1489   0.1543     3.15097   0.15097  0.56409  0.58394
 100  α̂   0.30005  0.00005  0.03782  0.03782    0.80147   0.00147  0.08722  0.08723
      λ̂1  1.01451  0.01451  0.09928  0.10033    2.04301   0.04301  0.25926  0.26281
      λ̂2  1.01570  0.01570  0.10028  0.10150    3.05122   0.05122  0.39052  0.39387
Statistics           Gestation   Lifespan
Minimum              12          2
1st Quartile         30.750      4.925
Median               65.50       10.100
Mean                 114.16      16.155
3rd Quartile         172.50      24.750
Maximum              392         50
Standard deviation
Pearson's corr.      0.570
Kendall's tau        0.451
Spearman's rho       0.611
rho/tau              1.355

Table 3: Summary statistics for the data set.
Figure 1: Scatter plot of data set.
5.2. Data analysis

In this section we fit the BKE model to a real data set. The data set, given in Table 4 in Appendix A, shows the gestation period in days (the X variable) and lifespan in years (the Y variable) for 40 animals. The data can be found at www.stat.auckland.ac.nz/∼lee/330/datasets.dir/mammalsleep.txt. The scatter plot of the data set in Figure 1 and the sample estimates of Pearson's correlation, Spearman's rho and Kendall's tau given in Table 3 indicate a clear positive dependence between the variables involved. We first examine the marginal distributions. For this purpose, we computed the Kolmogorov–Smirnov (KS) distances between the empirical marginals and the fitted marginals. The estimated parameters for the margins are β̂1 = 0.0087 and β̂2 = 0.0619. The KS distances between the empirical marginal distributions and the fitted margins, with the corresponding p-values, are 0.1003 (p-value = 0.8160) for Gestation and 0.1167 (p-value = 0.647) for Lifespan; for both marginal variables the exponential distribution is a good fit. Using the sample estimate τn = 0.451 given in Table 3, we obtain an initial guess for α of 0.4. From the marginal estimates β̂1 and β̂2 we also obtain the initial values of λ1 and λ2 as 0.022 and 0.155, respectively. Using the above initial estimates, we obtain the MLEs and their standard errors (s.e.) as α̂ = 0.470, s.e.(α̂) = 0.8727, λ̂1 = 0.018, s.e.(λ̂1) = 0.0013, and λ̂2 = 0.133, s.e.(λ̂2) = 0.0095. We plot the profile log-likelihood function of α in Figure 2; it is clearly a unimodal function.

Figure 2: The profile log-likelihood of α (α̂ = 0.46843870).

5.3. A copula goodness-of-fit test

The natural question that arises here is whether the BKE model fits these bivariate data. Although several goodness-of-fit tests are available for an arbitrary univariate distribution function, for general bivariate distribution functions no fully satisfactory goodness-of-fit test is available. Since the copula of the BKE model given by (14) has a closed and simple form, we perform a copula goodness-of-fit test. A review and comparison of goodness-of-fit procedures is given in [7]. These authors favour 'blanket' tests, that is, rank-based procedures requiring no strategic choice such as a kernel or bandwidth. From their simulations, a good combination of power and conceptual simplicity is provided by the Cramér-von Mises statistic

S_n = n Σ_{i=1}^n (C_α̂(u_i, v_i) − C_n(u_i, v_i))²,
where C_α̂ and C_n are the fitted copula and the empirical copula (see, e.g., [14]) of the data at hand, respectively, and

u_i = (1/(n+1)) × rank of X_i among X1, ..., Xn,  and  v_i = (1/(n+1)) × rank of Y_i among Y1, ..., Yn.
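The pseudo-observations and the statistic S_n can be computed as follows (a sketch on a small artificial subsample; all names are ours):

```python
import numpy as np

# Pseudo-observations and the Cramer-von Mises statistic S_n for the
# copula (14), evaluated on a small artificial sample.
def C(u, v, a):
    return 1 - (1 - (1 - (1 - u) ** (1 / a)) * (1 - (1 - v) ** (1 / a))) ** a

def pseudo_obs(x):
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1          # rank of each observation
    return ranks / (n + 1)

def cvm_statistic(x, y, a):
    n = len(x)
    u, v = pseudo_obs(x), pseudo_obs(y)
    # empirical copula C_n evaluated at the sample points
    Cn = np.array([np.mean((u <= u[i]) & (v <= v[i])) for i in range(n)])
    return n * np.sum((C(u, v, a) - Cn) ** 2)

x = np.array([42.0, 180, 35, 392, 63, 230, 112, 281])
y = np.array([4.5, 27, 19, 30.4, 28, 50, 7, 30])
s = cvm_statistic(x, y, 0.47)
```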
This statistic measures how close the fitted copula is to the empirical copula of the data. The p-value of the test is computed using the parametric bootstrap procedure described in Appendix A of [7]. To this end, we applied this procedure to the gestation period and lifespan data. The bootstrap values S*_1, ..., S*_1000 of the Cramér-von Mises test statistic were generated, and the proportion of these values larger than S_n = 0.04 gives p-value ≈ 0.11. Thus we may conclude that the copula C_α defined by (14) with association parameter α = 0.470 provides a good fit for this data set. In summary, as a result of Sklar's Theorem [14], an adequate bivariate distribution for these data is given by

H(x, y) = 1 − {1 − (1 − e^{−0.018x})(1 − e^{−0.133y})}^{0.47},   (23)

with the univariate marginal distribution functions

F(x) = 1 − e^{−0.00846x},  G(y) = 1 − e^{−0.06251y}.

Note that from Table 3 the Kendall's tau and Spearman's rho of the data set are τn = 0.451 and ρn = 0.611, respectively, which are close to the theoretical values τ = 0.3824 and ρ = 0.5388 calculated for the fitted model (23) by using (18) and (19).

6. Concluding remarks

We have introduced a new bivariate distribution, called the BKE distribution, whose univariate marginal distributions are ordinary exponential distributions. The proposed model is an absolutely continuous distribution. Since the joint distribution function and the density function of the BKE distribution have simple forms, it can easily be used in practice for modelling bivariate data. By giving a stochastic interpretation of the BKE distribution, we presented an algorithm for generating data from it. We have presented various statistical properties of this model and discussed maximum likelihood estimation of the parameters. Monte Carlo simulations indicate that the performance of the MLEs is quite satisfactory. We analysed a real data
set and showed, through a copula goodness-of-fit test, that the BKE distribution can be used for modelling these data. Although the development here is for a bivariate distribution with exponential marginals, other bivariate distributions could be introduced along the same lines. By considering univariate distribution functions F_i, i = 1, ..., p, a multivariate generalization of the BKE distribution could be defined as

H(x1, ..., xp) = 1 − (1 − ∏_{i=1}^p F_i(x_i))^α.

Another possible extension is to start with a p-variate distribution function F and construct new distributions via H(x1, ..., xp) = 1 − (1 − F(x1, ..., xp))^α. These structures also define p-variate distribution functions when α ∈ (0, 1].

Acknowledgements

We thank the Editor for helpful comments that led to several improvements in this paper, and for the suggestion concerning goodness-of-fit tests of copulas.
References

[1] Arnold, B. C. and Strauss, D. (1988). Bivariate distributions with exponential conditionals. Journal of the American Statistical Association, 83, 522–527.
[2] Awad, A. M., Azzam, M. M. and Hamdan, M. A. (1981). Some inference results on P(X < Y) in the bivariate exponential model. Commun. Statist. Theory and Methods, 10, 2515–2525.
[3] Balakrishnan, N. and Lai, C. D. (2009). Continuous Bivariate Distributions. 2nd Edition, Springer, New York.
[4] Church, J. D. and Harris, B. (1970). The estimation of reliability from stress-strength relationships. Technometrics, 12, 49–54.
[5] Diawara, N. and Carpenter, M. (2010). Mixture of bivariate exponential distributions. Commun. Statist. Theory and Methods, 39, 2711–2720.
[6] Dolati, A., Amini, M. and Mirhosseini, S. M. (2014). Dependence properties of bivariate distributions with proportional (reversed) hazards marginals. Metrika, 77, 333–347.
[7] Genest, C., Rémillard, B. and Beaudoin, D. (2008). Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and Economics, 42, 199–213.
[8] Gupta, R. D. and Kundu, D. (1999). Generalized exponential distributions. Austral. New Zealand J. Statist., 41, 173–188.
[9] Holland, P. W. and Wang, Y. J. (1987). Dependence functions for continuous bivariate densities. Commun. Statist. Theory and Methods, 16, 863–876.
[10] Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London.
[11] Jones, M. C. (2009). Kumaraswamy's distribution: A beta-type distribution with some tractability advantages. Stat. Methodol., 6, 70–91.
[12] Kozubowski, T. J. and Panorska, A. K. (2008). A mixed bivariate distribution connected with geometric maxima of exponential variables. Commun. Statist. Theory and Methods, 37, 2903–2923.
[13] Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. J. Hydrol., 46, 79–88.
[14] Nelsen, R. B. (2006). An Introduction to Copulas. Second Edition, Springer, New York.
[15] Pillai, R. N. and Jayakumar, K. (1995). Discrete Mittag–Leffler distributions. Statistics and Probability Letters, 23, 271–274.
[16] Shaked, M. and Shanthikumar, J. G. (2007). Stochastic Orders. Springer, New York.
Appendix A:

No  Animal                      X      Y      No  Animal                      X     Y
1   African Giant Pouched Rat   42     4.5    21  Mouse                       19    3.2
2   Baboon                      180    27     22  Musk Shrew                  30    2
3   Big Brown Bat               35     19     23  N. American Opossum         12    5
4   Brazilian Tapir             392    30.4   24  Nine-banded Armadillo       120   6.5
5   Cat                         63     28     25  Owl Monkey                  140   12
6   Chimpanzee                  230    50     26  Patas Monkey                170   20.2
7   Chinchilla                  112    7      27  Phanlanger                  17    13
8   Cow                         281    30     28  Pig                         115   27
9   Eastern American Mole       42     3.5    29  Rabbit                      31    18
10  Echidna                     28     50     30  Rat                         21    4.7
11  European Hedgehog           42     6      31  Red Fox                     52    9.8
12  Galago                      120    10.4   32  Rhesus Monkey               164   29
13  Goat                        148    20     33  Rockhyrax (Heterob)         225   7
14  Golden Hamster              16     3.9    34  Rockhyrax (Procaviahab)     225   6
15  Gray Seal                   310    41     35  Sheep                       151   20
16  Ground Squirrel             28     9      36  Tenrec                      60    4.5
17  Guineapig                   68     7.6    37  Tree Hyrax                  200   7.5
18  Horse                       336    46     38  Tree Shrew                  46    2.3
19  Lesser Short-tailed Shrew   21.5   2.6    39  Vervet                      210   24
20  Little Brown Bat            50     24     40  Water Opossum               14    3

Table 4: Gestation (days) and Lifespan (years) for 40 various animals.
Appendix B: Normal equations and observed Fisher information matrix

We now present the normal equations:

∂l(θ)/∂α = n/α + Σ_{i=1}^n ln(1 − (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))
         − Σ_{i=1}^n (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}) / (1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i})) = 0,

∂l(θ)/∂λ1 = n/λ1 − Σ_{i=1}^n x_i − (α − 2) Σ_{i=1}^n x_i e^{−λ1 x_i}(1 − e^{−λ2 y_i}) / (1 − (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))
          − α Σ_{i=1}^n x_i e^{−λ1 x_i}(1 − e^{−λ2 y_i}) / (1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i})) = 0,

∂l(θ)/∂λ2 = n/λ2 − Σ_{i=1}^n y_i − (α − 2) Σ_{i=1}^n y_i e^{−λ2 y_i}(1 − e^{−λ1 x_i}) / (1 − (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))
          − α Σ_{i=1}^n y_i e^{−λ2 y_i}(1 − e^{−λ1 x_i}) / (1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i})) = 0.
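The normal equations can be checked against numerical derivatives of the log-likelihood (22) (a sketch on artificial data, shown here for ∂l/∂α; all names are ours):

```python
import math

# Check the alpha-score against a central difference of the log-likelihood (22).
def loglik(a, l1, l2, xs, ys):
    s = len(xs) * math.log(a * l1 * l2)
    for x, y in zip(xs, ys):
        A = (1 - math.exp(-l1 * x)) * (1 - math.exp(-l2 * y))
        s += -l1 * x - l2 * y + (a - 2) * math.log(1 - A) + math.log(1 - a * A)
    return s

def score_alpha(a, l1, l2, xs, ys):
    # the left-hand side of the first normal equation
    s = len(xs) / a
    for x, y in zip(xs, ys):
        A = (1 - math.exp(-l1 * x)) * (1 - math.exp(-l2 * y))
        s += math.log(1 - A) - A / (1 - a * A)
    return s

xs, ys = [0.5, 1.2, 2.0, 0.8], [0.3, 1.5, 0.9, 2.2]
a, l1, l2, d = 0.6, 1.0, 1.5, 1e-6
num = (loglik(a + d, l1, l2, xs, ys) - loglik(a - d, l1, l2, xs, ys)) / (2 * d)
```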
Fisher information matrix

From (22) we have

∂²l(θ)/∂α² = −n/α² − Σ_{i=1}^n [ (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}) / (1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i})) ]²,

∂²l(θ)/∂λ1² = −n/λ1² + (α − 2) Σ_{i=1}^n x_i² e^{−(λ1 x_i + λ2 y_i)}(1 − e^{−λ2 y_i}) / (1 − (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))²
            + α Σ_{i=1}^n x_i² e^{−λ1 x_i}(1 − e^{−λ2 y_i})(1 − α(1 − e^{−λ2 y_i})) / (1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))²,

∂²l(θ)/∂λ2² = −n/λ2² + (α − 2) Σ_{i=1}^n y_i² e^{−(λ1 x_i + λ2 y_i)}(1 − e^{−λ1 x_i}) / (1 − (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))²
            + α Σ_{i=1}^n y_i² e^{−λ2 y_i}(1 − e^{−λ1 x_i})(1 − α(1 − e^{−λ1 x_i})) / (1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))²,

∂²l(θ)/∂α∂λ1 = −Σ_{i=1}^n x_i e^{−λ1 x_i}(1 − e^{−λ2 y_i}) / (1 − (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))
             − Σ_{i=1}^n x_i e^{−λ1 x_i}(1 − e^{−λ2 y_i}) / (1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))²,

∂²l(θ)/∂α∂λ2 = −Σ_{i=1}^n y_i e^{−λ2 y_i}(1 − e^{−λ1 x_i}) / (1 − (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))
             − Σ_{i=1}^n y_i e^{−λ2 y_i}(1 − e^{−λ1 x_i}) / (1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))²,

∂²l(θ)/∂λ1∂λ2 = −(α − 2) Σ_{i=1}^n x_i y_i e^{−(λ1 x_i + λ2 y_i)} / (1 − (1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))²
              − α Σ_{i=1}^n x_i y_i e^{−(λ1 x_i + λ2 y_i)} / (1 − α(1 − e^{−λ1 x_i})(1 − e^{−λ2 y_i}))².
∂ 2 l(θ) ∂α2
)
n = 2 α
∂ l(θ) = ∂λ21 ) ( 2 ∂ l(θ) = = −E ∂λ22
I22 = −E I33
2
)
( 1+
∞ ∑ ∞ ∑ (−1)k αj+3 j=0 k=0
(α−2) ) k
(j + k + 3)2
,
n (1 − α(α − 2)A1 − αA2 ) , λ21 n (1 − α(α − 2)A1 − αA2 ) λ22
(
I12 I13 I23
) ∂ 2 l(θ) nα = −E = (B2 − B1 ), ∂α∂λ1 λ1 ( 2 ) ∂ l(θ) nα = −E = (B2 − B1 ), ∂α∂λ2 λ2 ) ( 2 ∂ l(θ) nα = ((α − 2)C2 + αC1 ), = −E ∂λ1 ∂λ2 λ1 λ2
where A1 =
∞ ∑
( j
(−1)
j=0
(
) ( ) α−4 B(2, j + 1) B(2, j + 2) − αB(2, j + 3) j )
× Ψ′ (2) − Ψ′ (j + 3) − (Ψ(2) − Ψ(j + 3))2 , A2 =
∞ ∑ ∞ ∑ j=0 k=0
(
( j
(−1)
) ( ) α−2 k 1 α α B(2, j + k + 3) − ) j k+j+2 k+j+3 )
× Ψ′ (2) − Ψ′ (k + j + 5) − (Ψ(2) − Ψ(k + j + 5))2 ,
22
B1 = − B2 = C1 = − C2 =
( )( ∞ ∑ α − 3 1 (−1)j+1 (Ψ(2) − Ψ(j + 3)) j (j + 2)2 (j + 1) j=0 ) 1 (Ψ(2) − Ψ(j + 4)) , (j + 3)2 (j + 2) ( ) k( ) ∞ ∑ ∞ ∑ (−1)j+1 α−2 α j Ψ(2) − Ψ(k + j + 3) , j + k + 2 j=0 k=0 ( )( ∞ ∑ 1 j α−4 (Ψ(2) − Ψ(j + 3))2 (−1) 2 (j + 2) (j + 1) j j=0 ) α (Ψ(2) − Ψ(j + 4))2 , (j + 3)2 (j + 2) ( ) ( ) ∞ ∑ ∞ ∑ j α−2 k (−1) α { Ψ(2) − Ψ(k + j + 3) B(2, j + k + 3)}2 , j j=0 k=0
where Ψ(.) is the digamma function and B(a, b) =
∫1 0
xa−1 (1 − x)b−1 dx is the beta
function. Appendix C: R codes for generating random variate form Mittag-Leffler probability mass function with parameter of α. f.mittagleffler=function(k,alpha) { y=1:k prod(alpha-y+1)*(-1)^(k+1)/factorial(k)} F.mittagleffler=function(k,alpha) {f=c() for(t in 1:k){ y=1:t f[t]=prod(alpha-y+1)*(-1)^(t+1)/factorial(t)} y=1:t sum(f) } sim.mittagleffler=function(alpha) {k=1 u=runif(1) 23
while(F.mittagleffler(k,alpha)