ULHAS J. DIXIT – PARVIZ F. NASIRI (*)
Estimation of parameters of the exponential distribution in the presence of outliers generated from uniform distribution
Contents: 1. Introduction. — 2. Joint distribution of (X_1, X_2, ..., X_n) with k outliers. — 3. Method of moments. — 4. Maximum likelihood estimate. — 5. Mixture of method of moments and maximum likelihood. — 6. Numerical study. Acknowledgment. References. Summary. Riassunto. Key words.
1. Introduction

Consider spread from a point source, for example a small plot of plants. During favourable weather conditions, the plants release their pollen, which disperses according to an exponential distribution with distance from the source. In less favourable conditions (light rain or mist), not only are the plants less likely to release pollen, but the pollen that is released still falls with an exponential distribution, with a different scale parameter. Dixit, Moore and Barnett (1996) consider the above example in the context of the spread of viral disease amongst plants, such as barley yellow mosaic dwarf virus (BYMDV). Using the methodology stated in Dixit, Moore and Barnett (1996), it is possible to estimate the average distance (and hence area) of disease spread in a field from a small patch of infected plants, in the presence of some spread caused by insects. Following Dixit, Moore and Barnett (1996), we assume that a set of random variables (X_1, X_2, ..., X_n) represents the distances of infected sampled plants from a plot of plants inoculated with a virus.

(*) Department of Statistics, University of Mumbai, Vidyanagari, Mumbai 400 098, India. E-mail:
[email protected]
Some of the observations are derived from the airborne dispersal of the spores and are distributed according to the exponential distribution. The other observations out of the n random variables (say k of them) are present because aphids, which are known to be carriers of BYMDV, have passed the virus into the plants while feeding on the sap. The observations arising from these k (known) aphids are taken to be uniformly distributed. Thus, we assume that the random variables (X_1, X_2, ..., X_n) are such that k of them are distributed with pdf g(x, θ, α),

    g(x, θ, α) = 1/(αθ),   0 < x < αθ,  α, θ > 0,                    (1.1)

and the remaining (n − k) random variables are distributed with pdf f(x, θ),

    f(x, θ) = (1/θ) exp(−x/θ),   x > 0,  θ > 0.                      (1.2)

The present paper considers the estimation of θ and α in the model described above. In Section 2, we obtain the joint distribution of (X_1, X_2, ..., X_n) in the presence of k outliers. In Sections 3, 4 and 5, we deal with the method of moments, maximum likelihood, and a mixture of the method of moments and maximum likelihood for estimating α and θ. In Section 6, we compare the bias and determinant of the estimates empirically.

2. Joint distribution of (X_1, X_2, ..., X_n) with k outliers

The joint distribution of (X_1, X_2, ..., X_n) in the presence of k outliers is given by

    f(x_1, x_2, ..., x_n, α, θ) = h_k e^{−Σ_{i=1}^{n} x_i/θ} G(x, α, θ),    (2.1)

where

    h_k = [C(n, k) θ^n α^k]^{−1},   C(n, k) = n!/[(n − k)! k!],             (2.2)

    G(x, α, θ) = Σ_{A_1=1}^{n−k+1} Σ_{A_2=A_1+1}^{n−k+2} ··· Σ_{A_k=A_{k−1}+1}^{n} e^{Σ_{i=1}^{k} x_{A_i}/θ} Π_{i=1}^{k} I(αθ − x_{A_i}),    (2.3)

and

    I(u) = 0 if u < 0,
         = 1 if u ≥ 0.                                                      (2.4)
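As a quick illustration, the model (1.1)-(1.2) can be simulated directly: k observations are drawn from the uniform density on (0, αθ) and the remaining n − k from the exponential with mean θ. A minimal sketch in Python (the function name and defaults are ours, purely illustrative, not part of the paper):

```python
import random

def simulate_sample(n, k, theta, alpha, rng=random):
    """Draw n observations from the outlier model: k uniform outliers on
    (0, alpha*theta) as in (1.1), and n - k exponential(theta) values as
    in (1.2), returned in random order."""
    outliers = [rng.uniform(0.0, alpha * theta) for _ in range(k)]
    regular = [rng.expovariate(1.0 / theta) for _ in range(n - k)]
    sample = outliers + regular
    rng.shuffle(sample)  # the labels of the outliers are not observed
    return sample

sample = simulate_sample(n=10, k=2, theta=0.2, alpha=0.3)
```

Because the observed order carries no information about which observations are the outliers, the joint density must sum over all C(n, k) possible outlier subsets; that is exactly the role of G(x, α, θ) in (2.3).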
3. Method of moments

Let D = m_2/m_1^2 and D_1 = m_2/m_1, where

    m_i = (1/n) Σ_{j=1}^{n} x_j^i,   i = 1, 2,

and b = k/n, b̄ = (n − k)/n.

Equating the sample moments to their expectations under (1.1) and (1.2),

    m_1 = θ(αb/2 + b̄),                                               (3.1)
    m_2 = θ^2(α^2 b/3 + 2b̄).                                         (3.2)

Hence

    D = m_2/m_1^2                                                    (3.3)
      = (α^2 b/3 + 2b̄) / (αb/2 + b̄)^2,                              (3.4)

so that

    D(αb/2 + b̄)^2 = α^2 b/3 + 2b̄.                                   (3.5)
Solving (3.5),

    A_1 α^2 + A_2 α + A_3 = 0,                                       (3.6)

where

    A_1 = −D b^2/4 + b/3,   A_2 = −D b b̄,   A_3 = −D b̄^2 + 2b̄.

If (A_2^2 − 4A_1A_3) is non-negative, the roots are real. Therefore

    α̂ = [−A_2 + (A_2^2 − 4A_1A_3)^{1/2}] / (2A_1).                  (3.7)

Next,

    D_1 = m_2/m_1 = θ(α^2 b/3 + 2b̄)/(αb/2 + b̄).                     (3.8)

Hence

    θ̂ = D_1 (α̂b/2 + b̄)/(α̂^2 b/3 + 2b̄),                            (3.9)
where α̂ is given by (3.7).

Now, we shall show that α̂ and θ̂ are asymptotically unbiased estimators. Let W_1 = Σ_{i=1}^{n} X_i and W_2 = Σ_{i=1}^{n} X_i^2. Then D = nW_2/W_1^2 and

    A_1 = −nW_2 b^2/(4W_1^2) + b/3,   A_2 = −nW_2 b b̄/W_1^2,   A_3 = −nW_2 b̄^2/W_1^2 + 2b̄,

so we can write α̂ as a function of W_1 and W_2:

    α̂ = f(W_1, W_2).                                                 (3.10)

Let E(W_1) = µ and E(W_2) = ν. Expanding f(W_1, W_2) around (µ, ν) in a Taylor series,

    f(W_1, W_2) = f(µ, ν) + (W_1 − µ) ∂f/∂W_1 |_{W_1=µ, W_2=ν}
                          + (W_2 − ν) ∂f/∂W_2 |_{W_1=µ, W_2=ν} + ...  (3.11)

Hence, from (3.7), (3.10) and (3.11), to first order

    E(α̂) = f(µ, ν) = [nνbb̄/µ^2 + Δ^{1/2}] × [2b/3 − 2nνb^2/(4µ^2)]^{−1},   (3.12)

where Δ denotes A_2^2 − 4A_1A_3 evaluated at (µ, ν), and

    µ = nθ(bα/2 + b̄),   ν = nθ^2(bα^2/3 + 2b̄).

Now

    Δ = [nνbb̄/µ^2]^2 − 4[b/3 − nνb^2/(4µ^2)][2b̄ − nνb̄^2/µ^2]       (3.13)
      = −8bb̄/3 + 4nνbb̄^2/(3µ^2) + 2nνb^2 b̄/µ^2
      = −8bb̄/3 + [2bb̄nν/µ^2](2b̄/3 + b)
      = −8bb̄/3 + [2bb̄nν/(3µ^2)](b + 2)
      = −8bb̄/3 + (2bb̄/3) · 4(bα^2 + 6b̄)(b + 2)/[3(bα + 2b̄)^2]
      = 16(bb̄)^2 (α^2 − 6α + 9)/[9(bα + 2b̄)^2],

using nν/µ^2 = 4(bα^2 + 6b̄)/[3(bα + 2b̄)^2] and b + b̄ = 1. Hence

    Δ^{1/2} = 4bb̄(α − 3)/[3(bα + 2b̄)].                              (3.14)

From (3.12), the numerator of E(α̂) is

    4bb̄(bα^2 + 6b̄)/[3(bα + 2b̄)^2] + 4bb̄(α − 3)/[3(bα + 2b̄)]
      = 4bb̄(2bα^2 + 2αb̄ − 3bα)/[3(bα + 2b̄)^2]
      = 4bb̄α(2bα + 2b̄ − 3b)/[3(bα + 2b̄)^2].                        (3.15)

Similarly, the denominator is

    2b/3 − 2nνb^2/(4µ^2) = (2b/3)[1 − (bα^2 + 6b̄)b/(bα + 2b̄)^2]
      = 4bb̄(2bα + 2b̄ − 3b)/[3(bα + 2b̄)^2].                         (3.16)

Hence, from (3.12), (3.15) and (3.16),

    E(α̂) = α.                                                        (3.17)

By using (3.11) again, now with W_1 = Σ X_i, W_2 = Σ X_i^2 and W_3 = α̂, where E(W_1) = µ, E(W_2) = ν and E(W_3) = α, we expand

    θ̂ = f(W_1, W_2, W_3) = f(µ, ν, α)
         + (W_1 − µ) ∂f/∂W_1 |_{(µ,ν,α)}
         + (W_2 − ν) ∂f/∂W_2 |_{(µ,ν,α)}
         + (W_3 − α) ∂f/∂W_3 |_{(µ,ν,α)} + ...

Following the previous procedure,

    E(θ̂) = lim_{n→∞} (ν/µ) · (αb/2 + b̄)/(α^2 b/3 + 2b̄)
          = lim_{n→∞} (α^2 b/3 + 2b̄)θ^2 (αb/2 + b̄) / [(αb/2 + b̄)θ (α^2 b/3 + 2b̄)]
          = θ.                                                        (3.18)
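The moment estimators (3.7) and (3.9) are straightforward to compute. A minimal sketch in Python (the function name is ours, a hypothetical illustration, not the authors' program):

```python
import math

def moment_estimates(x, k):
    """Method-of-moments estimates (3.7) and (3.9); x is the sample,
    k is the (known) number of uniform outliers."""
    n = len(x)
    b = k / n                        # b = k/n
    bb = (n - k) / n                 # b-bar = (n - k)/n
    m1 = sum(x) / n                  # first sample moment
    m2 = sum(v * v for v in x) / n   # second sample moment
    D, D1 = m2 / m1**2, m2 / m1
    # quadratic A1*alpha^2 + A2*alpha + A3 = 0, eq. (3.6)
    A1 = -D * b * b / 4 + b / 3
    A2 = -D * b * bb
    A3 = -D * bb * bb + 2 * bb
    disc = A2 * A2 - 4 * A1 * A3
    if disc < 0:
        raise ValueError("complex roots: no real moment estimate of alpha")
    alpha_hat = (-A2 + math.sqrt(disc)) / (2 * A1)          # eq. (3.7)
    theta_hat = (D1 * (alpha_hat * b / 2 + bb)
                 / (alpha_hat**2 * b / 3 + 2 * bb))         # eq. (3.9)
    return alpha_hat, theta_hat
```

Note that the discriminant simplifies to 2Dbb̄(2b̄/3 + b) − 8bb̄/3, so real roots require D ≥ 4/(2 + b); samples whose second moment is too small relative to the squared first moment yield no real estimate, hence the guard above.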
Hence α̂ and θ̂ are asymptotically unbiased.

4. Maximum likelihood estimate

From (2.1), the likelihood of (x_1, x_2, ..., x_n) is

    L(x_1, x_2, ..., x_n, θ, α) = e^{−Σ_{i=1}^{n} x_i/θ} G(x, α, θ) / [C(n, k) θ^{n−k} (αθ)^k].   (4.1)

This likelihood is maximised if we take the estimate of (αθ) to be x_{(n)} = max(x_1, x_2, ..., x_n). Hence

    L(x_1, x_2, ..., x_n, θ, α) = e^{−Σ_{i=1}^{n} x_i/θ} G(x, θ) / [C(n, k) θ^{n−k} x_{(n)}^k],    (4.2)

where

    G(x, θ) = Σ_{A_1=1}^{n−k+1} Σ_{A_2=A_1+1}^{n−k+2} ··· Σ_{A_k=A_{k−1}+1}^{n} e^{Σ_{i=1}^{k} x_{A_i}/θ}.

The likelihood equation corresponding to (4.2) is

    −(n − k)/θ + Σ_{i=1}^{n} x_i/θ^2 + G'(x, θ)/G(x, θ) = 0,          (4.3)

where

    G'(x, θ) = ∂G/∂θ = −(1/θ^2) Σ_{A_1} Σ_{A_2} ··· Σ_{A_k} (Σ_{i=1}^{k} x_{A_i}) e^{Σ_{i=1}^{k} x_{A_i}/θ}.
We can solve (4.3) by the Newton–Raphson method. The iteration is

    θ_{i+1} = θ_i − g(θ_i)/g'(θ_i),   i = 1, 2, ...,                  (4.4)

where

    g(θ_i) = (n − k)θ_i
             + [Σ_{A_1} Σ_{A_2} ··· Σ_{A_k} (Σ_{i=1}^{k} x_{A_i}) e^{Σ_{i=1}^{k} x_{A_i}/θ_i}]
               / [Σ_{A_1} Σ_{A_2} ··· Σ_{A_k} e^{Σ_{i=1}^{k} x_{A_i}/θ_i}]
             − Σ_{i=1}^{n} x_i,                                       (4.5)

    g'(θ_i) = (n − k)
              + (1/θ_i^2) {Σ_{A_1} ··· Σ_{A_k} (Σ_{i=1}^{k} x_{A_i}) e^{Σ x_{A_i}/θ_i}}^2
                / {Σ_{A_1} ··· Σ_{A_k} e^{Σ x_{A_i}/θ_i}}^2
              − (1/θ_i^2) {Σ_{A_1} ··· Σ_{A_k} (Σ_{i=1}^{k} x_{A_i})^2 e^{Σ x_{A_i}/θ_i}}
                / {Σ_{A_1} ··· Σ_{A_k} e^{Σ x_{A_i}/θ_i}}.            (4.6)

Here, the initial solution θ_0 should be taken from (3.9).

Note (estimation of k): if k is unknown, it can be selected by evaluating the likelihood for different values of k and choosing the one that maximises the likelihood.
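The iteration (4.4)-(4.6) can be implemented by enumerating the subsets A_1 < A_2 < ... < A_k directly, which is feasible for the small k considered here. A minimal sketch in Python (names and defaults ours, a hypothetical illustration):

```python
import math
from itertools import combinations

def mle_theta(x, k, theta0, tol=1e-10, max_iter=100):
    """Newton-Raphson solution of the likelihood equation (4.3), using
    g and g' from (4.5)-(4.6); theta0 should come from the moment
    estimate (3.9)."""
    n, total = len(x), sum(x)
    # subset sums over all choices A_1 < ... < A_k of k observations
    subsets = [sum(c) for c in combinations(x, k)]
    theta = theta0
    for _ in range(max_iter):
        w = [math.exp(s / theta) for s in subsets]
        den = sum(w)
        num1 = sum(s * wi for s, wi in zip(subsets, w))
        num2 = sum(s * s * wi for s, wi in zip(subsets, w))
        g = (n - k) * theta + num1 / den - total                      # (4.5)
        gp = (n - k) + (num1**2 / den**2 - num2 / den) / theta**2     # (4.6)
        step = g / gp
        theta -= step
        if abs(step) < tol:
            break
    return theta
```

The cost grows with C(n, k), so this brute-force enumeration is only practical for small k; the simulation settings of Section 6 (k = 1, 2, 3) are well within reach.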
5. Mixture of method of moments and maximum likelihood

Read (1981) proposed methods which avoid the difficulty of complicated likelihood equations: replacing some, but not all, of the equations in the likelihood system may make it more manageable. From (3.9),

    θ̂ = D_1 (α̂b/2 + b̄)/(α̂^2 b/3 + 2b̄),                            (5.1)

and, from the maximum likelihood step,

    α̂ = x_{(n)}/θ̂.                                                   (5.2)

Substituting (5.2) into (5.1) gives

    e_1 θ^2 + e_2 θ + e_3 = 0,                                        (5.3)

where

    e_1 = 2b̄,   e_2 = −b̄ D_1,   e_3 = b x_{(n)}^2/3 − b x_{(n)} D_1/2,

so that

    θ̂ = [−e_2 + (e_2^2 − 4e_1 e_3)^{1/2}] / (2e_1).                  (5.4)
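The mixed estimate is available in closed form: compute D_1, solve the quadratic (5.3) for θ̂ via (5.4), and recover α̂ from (5.2). A minimal Python sketch (function name ours, purely illustrative):

```python
import math

def mixture_estimates(x, k):
    """Mixed moment/ML estimates: theta from the quadratic (5.3)-(5.4),
    alpha from alpha = x_(n)/theta, eq. (5.2)."""
    n = len(x)
    b, bb = k / n, (n - k) / n
    m1 = sum(x) / n
    m2 = sum(v * v for v in x) / n
    D1 = m2 / m1
    xmax = max(x)                       # x_(n)
    e1 = 2 * bb
    e2 = -bb * D1
    e3 = b * xmax**2 / 3 - b * xmax * D1 / 2
    disc = e2 * e2 - 4 * e1 * e3
    if disc < 0:
        raise ValueError("no real solution to (5.3)")
    theta_hat = (-e2 + math.sqrt(disc)) / (2 * e1)
    alpha_hat = xmax / theta_hat
    return alpha_hat, theta_hat
```

By construction α̂θ̂ = x_{(n)}, so the fitted uniform support just covers the largest observation.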
6. Numerical study

To get some idea of the bias and determinant, we performed sampling experiments on a Pentium II, using a simulation program written in the GAUSS language. Since our model has two parameters, we calculated the variance–covariance matrix of the estimates of θ and α; the determinant of the covariance matrix is V(θ̂)V(α̂) − (Cov(α̂, θ̂))^2 (see Jayade and Prasad (1990)). The simulation study was carried out for θ = 0.2 and α = 0.3, for k = 1, 2 and 3, with sample sizes n = 10(10)80. The bias and determinant are presented in Tables 1, 2 and 3 for k = 1, 2 and 3 respectively; each table summarises the results of one thousand independent replications of each experiment.

For all values of k, we can conjecture that E(α̂) > α and E(θ̂) < θ. The bias of θ̂ may be negative in some cases, but the bias of α̂ is always positive. In all cases the determinant for the method of moments is larger than for the maximum likelihood and mixture estimates, and the determinant of the mixture estimate is almost always smaller than that of the maximum likelihood estimate. A further point in favour of the mixture estimate is that it is easy to calculate. We have also shown that the moment estimates are asymptotically unbiased. It is difficult to show analytically that the mixture estimate of θ is asymptotically unbiased, but the simulation study suggests that it is. We therefore conclude that the mixture estimate should be preferred.

Table 1. k = 1, α = 0.3, θ = 0.2.

 n   Method    Bias of α     Bias of θ     Determinant
10   M.E.      7.38731808   -0.05331814   0.01979398
     M.L.E.    5.12072069   -0.06286613   0.00835185
     Mixture   3.00951205    0.01365834   0.00070145
20   M.E.      7.84778122   -0.03286697   0.01397838
     M.L.E.    5.14417451   -0.03644049   0.00416079
     Mixture   3.74679474    0.01530065   0.00100062
30   M.E.      7.92240028   -0.02505852   0.00811478
     M.L.E.    5.18184348   -0.02687431   0.00191553
     Mixture   4.18288864    0.00900593   0.00084228
40   M.E.      8.25847094   -0.01620150   0.00687830
     M.L.E.    5.26360618   -0.01674681   0.00140131
     Mixture   4.41760458    0.01400932   0.00076403
50   M.E.      8.51560399   -0.00559773   0.00644276
     M.L.E.    5.43828664   -0.00594334   0.00118987
     Mixture   4.66801041    0.02080644   0.00076871
60   M.E.      8.90858503   -0.00911911   0.00577397
     M.L.E.    5.62479521   -0.00918627   0.00108637
     Mixture   4.88264927    0.01583692   0.00073600
70   M.E.      9.06762312   -0.00869197   0.00479204
     M.L.E.    5.69544960   -0.00856969   0.00089419
     Mixture   5.02224445    0.01353792   0.00065995
80   M.E.      9.00646515    0.05383654   0.00456885
     M.L.E.    5.74746825    0.05394044   0.00082780
     Mixture   5.15931679    0.07278600   0.00059468
Table 2. k = 2, α = 0.3, θ = 0.2.

 n   Method    Bias of α     Bias of θ     Determinant
10   M.E.      7.98482685   -0.09111024   0.02626965
     M.L.E.    6.61620435   -0.10088316   0.01273989
     Mixture   3.03453892   -0.01071868   0.00095851
20   M.E.      7.10749977   -0.05272716   0.00848294
     M.L.E.    5.64018325   -0.05786594   0.00396482
     Mixture   3.87119422   -0.00298474   0.00126673
30   M.E.      6.98417833   -0.03789580   0.00497922
     M.L.E.    5.54712418   -0.04116731   0.00217214
     Mixture   4.36410389   -0.00338317   0.00123695
40   M.E.      7.13909432   -0.02871923   0.00406495
     M.L.E.    5.58113523   -0.03085552   0.00159671
     Mixture   4.60502057    0.00045035   0.00095604
50   M.E.      7.22106660   -0.00797101   0.00343226
     M.L.E.    5.66678115   -0.00958333   0.00129517
     Mixture   4.85651345    0.01605138   0.00106978
60   M.E.      7.42244948    0.01322037   0.00321760
     M.L.E.    5.72572432    0.01211992   0.00105766
     Mixture   4.97787223    0.03577106   0.00079902
70   M.E.      7.49450164    0.00079222   0.00275141
     M.L.E.    5.80100751   -0.00015421   0.00085825
     Mixture   5.13840131    0.02079754   0.00079142
80   M.E.      7.60329922    0.00339940   0.00213713
     M.L.E.    5.91201162    0.00259977   0.00064844
     Mixture   5.30633966    0.02147294   0.00069384
Table 3. k = 3, α = 0.3, θ = 0.2.

 n   Method    Bias of α     Bias of θ     Determinant
10   M.E.      8.29338361   -0.12383856   3.85040158
     M.L.E.    8.87425039   -0.13029435   0.01774386
     Mixture   3.03813540   -0.02745238   0.00192943
20   M.E.      7.10313677   -0.07136309   0.00817121
     M.L.E.    6.08381782   -0.07561746   0.00493521
     Mixture   4.00015401   -0.02068533   0.00220720
30   M.E.      6.68740186   -0.05150660   0.00474096
     M.L.E.    5.71622909   -0.05471088   0.00260615
     Mixture   4.47492945   -0.01823845   0.00212593
40   M.E.      6.84952472   -0.04238368   0.00318208
     M.L.E.    5.84575148   -0.04485696   0.00176061
     Mixture   4.83173491   -0.01456683   0.00191147
50   M.E.      6.72358907   -0.03453472   0.00253141
     M.L.E.    5.78036278   -0.03665961   0.00132284
     Mixture   5.00093074   -0.01302715   0.00142085
60   M.E.      6.95025722   -0.02940705   0.00220722
     M.L.E.    5.99670817   -0.03114272   0.00112138
     Mixture   5.28285408   -0.00989217   0.00134394
70   M.E.      6.84372439   -0.02626065   0.00212089
     M.L.E.    5.91059189   -0.02786905   0.00092050
     Mixture   5.33366393   -0.00997041   0.00123339
80   M.E.      6.89711413   -0.02333821   0.00195889
     M.L.E.    6.00322535   -0.02491451   0.00082891
     Mixture   5.51741524   -0.00930442   0.00110200
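The kind of experiment behind Tables 1-3 can be reproduced in outline. The sketch below (ours, in Python rather than the GAUSS used in the paper; all helper names are hypothetical) simulates the model, applies the mixture estimator, and reports the empirical bias of both estimates together with the determinant V(θ̂)V(α̂) − (Cov(α̂, θ̂))^2:

```python
import math
import random

def simulate(n, k, theta, alpha, rng):
    """k uniform outliers on (0, alpha*theta) plus n - k exponential(theta)."""
    xs = [rng.uniform(0.0, alpha * theta) for _ in range(k)]
    xs += [rng.expovariate(1.0 / theta) for _ in range(n - k)]
    return xs

def mixture(x, k):
    """Mixed estimate (5.2)-(5.4); returns None if (5.3) has complex roots."""
    n = len(x)
    b, bb = k / n, (n - k) / n
    m1 = sum(x) / n
    m2 = sum(v * v for v in x) / n
    D1 = m2 / m1
    xmax = max(x)
    e1, e2 = 2 * bb, -bb * D1
    e3 = b * xmax * (xmax / 3 - D1 / 2)
    disc = e2 * e2 - 4 * e1 * e3
    if disc < 0:
        return None
    th = (-e2 + math.sqrt(disc)) / (2 * e1)
    return xmax / th, th

def study(n=40, k=2, theta=0.2, alpha=0.3, reps=1000, seed=1):
    """Empirical bias of both estimates and determinant of their covariance."""
    rng = random.Random(seed)
    a_vals, t_vals = [], []
    while len(a_vals) < reps:
        est = mixture(simulate(n, k, theta, alpha, rng), k)
        if est is None:
            continue  # discard the rare replication without a real root
        a_vals.append(est[0])
        t_vals.append(est[1])
    ma, mt = sum(a_vals) / reps, sum(t_vals) / reps
    va = sum((a - ma) ** 2 for a in a_vals) / reps
    vt = sum((t - mt) ** 2 for t in t_vals) / reps
    ct = sum((a - ma) * (t - mt) for a, t in zip(a_vals, t_vals)) / reps
    return ma - alpha, mt - theta, va * vt - ct * ct
```

With θ = 0.2 and α = 0.3 the bias of α̂ is large and positive, which appears to be of the same order as the "Bias of α" columns above: α̂ = x_{(n)}/θ̂ is dominated by the maximum of the exponential observations, which far exceeds αθ = 0.06.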
Acknowledgment

This research was funded by the Council of Scientific and Industrial Research, New Delhi. The authors are grateful to the two referees for their valuable comments and suggestions, and for the Italian translation of the summary.
REFERENCES

Dixit, U.J., Moore, K.L., and Barnett, V. (1996) On the estimation of the power of the scale parameter of the exponential distribution in the presence of outliers generated from uniform distribution, Metron, 54, 201-211.

Jayade, V.D. and Prasad, M.S. (1990) Estimation of parameters of mixed failure time distribution, Commun. Statist. Theory Meth., 19(12), 4667-4678.

Read, R.R. (1981) Representation of certain covariance matrices with application to asymptotic efficiency, J. Amer. Statist. Assoc., 76, 148-154.
Estimation of parameters of the exponential distribution in the presence of outliers generated from uniform distribution

Summary

The maximum likelihood, moment and mixture estimators are derived for samples from the exponential distribution in the presence of outliers generated from a uniform distribution. These estimators are compared empirically when all the parameters are unknown; their bias and determinants are investigated numerically. We show that the moment estimators are asymptotically unbiased. We conclude that the mixture estimators are better than the maximum likelihood and moment estimators.
Riassunto (Italian summary, in translation)

The maximum likelihood estimator is compared with the method-of-moments estimator and with a mixed estimator obtained as a mixture of the maximum likelihood and moment methods. The case in which all the parameters are unknown is considered here. The behaviour of these estimators is analysed by numerical methods. They turn out to be asymptotically unbiased; among them, the mixed estimation method is superior to the maximum likelihood and moment estimators.
Key words Outliers; Exponential and uniform distribution; Estimation.
[Manuscript received March 1999; final version received November 2000.]