Applied Mathematical Sciences, Vol. 8, 2014, no. 95, 4703-4712
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ams.2014.46470

Goodness of Fit Test for Gumbel Distribution Based on Kullback-Leibler Information Using Several Different Estimators

S. A. Al-Subh

Department of Mathematics and Statistics, Mutah University, Karak, Jordan

Copyright © 2014 S. A. Al-Subh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract

In this paper, our objective is to test the statistical hypothesis $H_0: F(x) = F_0(x)$ for all $x$ against $H_1: F(x) \neq F_0(x)$ for some $x$, where $F_0(x)$ is a known distribution function. In this study, a goodness of fit test statistic for the Gumbel distribution based on Kullback-Leibler information is studied. The performance of the test under simple random sampling is investigated using Monte Carlo simulation. The Gumbel parameters are estimated by several methods of estimation, namely maximum likelihood, order statistics, moments and L-moments. Ten different distributions are considered under the alternative hypothesis. For all the distributions considered, the test statistics based on the moment and order-statistics estimators are found to have the highest power, except for the Weibull and Lognormal distributions.

Keywords: Goodness of fit test; Kullback-Leibler information; Entropy; Gumbel distribution; Order statistics
1. INTRODUCTION

There are many areas of application of the Gumbel distribution, such as environmental sciences, system reliability and hydrology. In hydrology, for example, the Gumbel distribution may be used to represent the distribution of the minimum level of a river in a particular year based on
minimum values for the past few years. It is useful for predicting the chance that an extreme earthquake, flood or other natural disaster will occur. The potential applicability of the Gumbel distribution to represent the distribution of minima relates to extreme value theory, which indicates that it is likely to be useful if the distribution of the underlying sample data is of the normal or exponential type. Many studies of goodness of fit tests based on Kullback-Leibler information have been carried out. Kinnison (1989) tested the Gumbel distribution using a correlation coefficient type statistic. Arizono and Ohta (1989) proposed a test of normality based on an estimate of the Kullback-Leibler information. Song (2002) presented a general methodology for developing asymptotically distribution-free goodness of fit tests based on the Kullback-Leibler information; he also showed that the tests are omnibus within an extremely large class of nonparametric global alternatives and have good local power. Ibrahim et al. (2011) found that the goodness of fit test based on Kullback-Leibler information supports the conclusion that the chi-square test is more powerful under ranked set sampling (RSS) than under simple random sampling (SRS) for some selected order statistics. In this paper, we introduce a goodness of fit test for the Gumbel distribution based on the Kullback-Leibler information. We estimate the Gumbel parameters by several methods of estimation: maximum likelihood, order statistics, moments and L-moments. According to Hosking (1990), L-moments have theoretical advantages over conventional moments in being able to characterize a wider range of distributions and, when estimated from a sample, being more robust to the presence of outliers in the data. Also, the parameter estimates obtained from L-moments are sometimes more accurate in small samples than the maximum likelihood estimates. We compute the percentage points and the power of the statistic based on the Kullback-Leibler information using Monte Carlo simulation. This paper is organized as follows. First, we define the test statistic and the estimators of the Gumbel parameters. Then, we describe the procedures used to calculate the percentage points and the power function of the test statistic under an alternative distribution. In addition, a simulation study is conducted to study the power of the test statistic, and we state our conclusions.
2. PRELIMINARY NOTES AND METHODS

Test Statistic

Let $X_1, X_2, \ldots, X_n$ be a random sample from the distribution function $F(x)$ with quantile function $Q(u) = F^{-1}(u)$, and let $X_{1:n} \leq X_{2:n} \leq \cdots \leq X_{n:n}$ denote the corresponding order statistics. We are interested in testing the hypothesis $H_0: F(x) = F_0(x)$ for all $x$ vs. $H_1: F(x) \neq F_0(x)$ for some $x$, where $F_0(x)$ is the Gumbel distribution function

$$F_0(x; \alpha, \beta) = \exp\left(-\exp\left(-\frac{x-\alpha}{\beta}\right)\right), \qquad (1)$$

and its density function is

$$f_0(x; \alpha, \beta) = \frac{1}{\beta}\exp\left(-\frac{x-\alpha}{\beta}\right)\exp\left(-\exp\left(-\frac{x-\alpha}{\beta}\right)\right), \qquad (2)$$
where $\alpha$ is a location parameter, $\beta$ is a scale parameter, $x, \alpha \in (-\infty, \infty)$, and $\beta > 0$. We employ the Kullback-Leibler information, which is given by

$$I(f, f_0) = \int_{-\infty}^{\infty} f(x)\log\frac{f(x)}{f_0(x; \alpha, \beta)}\,dx. \qquad (3)$$

The quantity $I(f, f_0)$ describes the amount of information lost when $f_0(x)$ is used to approximate $f(x)$: the larger the value of $I(f, f_0)$, the greater the disparity between $f(x)$ and $f_0(x)$. It is known that $I(f, f_0) = 0$ if and only if $f(x) \equiv f_0(x)$ for all $x$. Hence the test can be designed as follows: reject $H_0$ in favour of $H_1$ if $I(f, f_0)$ is large. Following Vasicek (1976) and Song (2002),

$$I(f, f_0) = \int_{-\infty}^{\infty} f(x)\log f(x)\,dx - \int_{-\infty}^{\infty} f(x)\log f_0(x; \alpha, \beta)\,dx$$

$$\approx -\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{n}{2m}\left(x_{i+m:n} - x_{i-m:n}\right)\right) - \frac{1}{n}\sum_{i=1}^{n}\log f_0(x_i; \hat{\alpha}, \hat{\beta}), \qquad (4)$$

where $m$, called the window size, is a positive integer $(m \leq n/2)$, $x_{i:n} = x_{1:n}$ for $i < 1$ and $x_{i:n} = x_{n:n}$ for $i > n$. The first sum in (4) is the negative of Vasicek's sample entropy estimator $H_{mn} = \frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{n}{2m}(X_{i+m:n} - X_{i-m:n})\right)$. For the Gumbel distribution, the resulting estimate of $I(f, f_0)$, denoted $I_{mn}$, is given by

$$I_{mn} = -H_{mn} - \frac{1}{n}\sum_{i=1}^{n}\log f_0(x_i; \hat{\alpha}, \hat{\beta})$$

$$= -\frac{1}{n}\sum_{i=1}^{n}\log\left[\frac{n}{2m}\left(X_{i+m:n} - X_{i-m:n}\right)\right] + \log\hat{\beta} + \frac{\bar{x}}{\hat{\beta}} - \frac{\hat{\alpha}}{\hat{\beta}} + \frac{1}{n}\sum_{i=1}^{n}\exp\left(-\frac{x_i - \hat{\alpha}}{\hat{\beta}}\right). \qquad (5)$$
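As a computational illustration, the statistic $I_{mn}$ in (5) can be evaluated directly from a sample once $\hat{\alpha}$, $\hat{\beta}$ and the window size $m$ are chosen. The following is a minimal sketch in Python with NumPy (not the paper's code; the function and argument names are ours):

```python
import numpy as np

def kl_statistic(x, alpha_hat, beta_hat, m):
    """Evaluate I_mn of (5): minus Vasicek's entropy estimate plus the
    average negative Gumbel log-density at the estimated parameters."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # Spacing term: indices i+m and i-m outside 1..n are clipped to the
    # sample extremes, as stated below (4).
    hi = np.clip(np.arange(n) + m, 0, n - 1)
    lo = np.clip(np.arange(n) - m, 0, n - 1)
    h_mn = np.mean(np.log(n / (2.0 * m) * (x[hi] - x[lo])))
    # Average of -log f_0(x_i; alpha_hat, beta_hat) for the Gumbel density (2).
    z = (x - alpha_hat) / beta_hat
    avg_neg_logf0 = np.log(beta_hat) + np.mean(z) + np.mean(np.exp(-z))
    return -h_mn + avg_neg_logf0
```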
Several estimators of $\alpha$ and $\beta$ are considered: maximum likelihood, order statistics, moments and L-moments. The purpose is to find the estimator that gives the best power.

Estimators of $\alpha, \beta$

We introduce four different types of estimators for $\alpha$ and $\beta$: maximum likelihood (mle), moments (me), order statistics (os) and L-moments (lm).

i) Maximum Likelihood Estimator (mle): We denote the mle of $\alpha, \beta$ by $\hat{\alpha}_{mle}, \hat{\beta}_{mle}$, respectively. Let $X_1, X_2, \ldots, X_n$ be a random sample from (2). The log-likelihood function is given by
$$\ell(\alpha, \beta) = -n\log\beta - \sum_{i=1}^{n}\frac{x_i - \alpha}{\beta} - \sum_{i=1}^{n}\exp\left(-\frac{x_i - \alpha}{\beta}\right). \qquad (6)$$
After taking the derivatives with respect to $\alpha$ and $\beta$ and setting them equal to zero, we obtain

$$\hat{\beta}_{mle} = \bar{x} - \sum_{i=1}^{n} x_i w_i \quad \text{and} \quad \hat{\alpha}_{mle} = -\hat{\beta}_{mle}\log(\bar{z}), \qquad (7)$$

where $z_i = \exp\left(-x_i/\hat{\beta}_{mle}\right)$, $\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i$ and $w_i = \frac{z_i}{n\bar{z}}$. Since $\hat{\beta}_{mle}$ appears on both sides of the first equation in (7) through the $z_i$, it is obtained numerically.
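Because $\hat{\beta}_{mle}$ enters (7) implicitly, a minimal fixed-point sketch in Python with NumPy is given below (our code, not the paper's; the moment-based starting value, tolerance and simple iteration are choices, and a general root-finder could be used instead):

```python
import numpy as np

def gumbel_mle(x, tol=1e-8, max_iter=500):
    """Solve the implicit equations (7) for (alpha_hat, beta_hat)."""
    x = np.asarray(x, dtype=float)
    beta = np.sqrt(6.0) * np.std(x, ddof=1) / np.pi   # moment estimate as starting value
    for _ in range(max_iter):
        z = np.exp(-x / beta)
        w = z / z.sum()                    # w_i = z_i / (n * z_bar)
        beta_new = x.mean() - np.sum(w * x)
        if abs(beta_new - beta) < tol:
            beta = beta_new
            break
        beta = beta_new
    alpha = -beta * np.log(np.mean(np.exp(-x / beta)))  # alpha_hat = -beta_hat * log(z_bar)
    return alpha, beta
```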
ii) Method of Moments Estimator (me): The mean and variance of the Gumbel distribution are given by

$$\mu = \alpha + \gamma\beta \quad \text{and} \quad \sigma^2 = \frac{\pi^2}{6}\beta^2. \qquad (8)$$

The moment estimators of the two parameters are

$$\hat{\beta}_{me} = \frac{\sqrt{6}}{\pi}\, s, \qquad (9)$$

$$\hat{\alpha}_{me} = \bar{x} - \gamma\hat{\beta}_{me}, \qquad (10)$$

where $s$ and $\bar{x}$ are the sample standard deviation and mean, respectively, and $\gamma = 0.57721566$ is Euler's constant.
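Equations (9) and (10) translate directly into code; a short sketch in Python with NumPy (our naming; using the ddof=1 sample standard deviation is a choice):

```python
import numpy as np

EULER_GAMMA = 0.57721566  # Euler's constant, as in (10)

def gumbel_moments(x):
    """Method-of-moments estimates (9)-(10)."""
    x = np.asarray(x, dtype=float)
    beta = np.sqrt(6.0) * np.std(x, ddof=1) / np.pi  # beta_hat = sqrt(6) * s / pi
    alpha = np.mean(x) - EULER_GAMMA * beta          # alpha_hat = x_bar - gamma * beta_hat
    return alpha, beta
```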
iii) Order Statistics Estimator (os): The $p$th quantile of the Gumbel distribution is

$$Q(p; \alpha, \beta) = F^{-1}(p) = \alpha - \beta\log(-\log p), \qquad 0 < p < 1.$$

The power of each test is approximated based on a Monte Carlo simulation of 40,000 iterations according to the algorithm above,

$$T(H) \approx \frac{1}{40{,}000}\sum_{t=1}^{40{,}000} I\left(I_{mn}^{(t)} > \delta_{\alpha^*}\right),$$

where $I(\cdot)$ stands for the indicator function and $\delta_{\alpha^*}$ denotes the corresponding percentage point at level $\alpha^*$. We compare the efficiency of the tests for different sample sizes $n = 12, 18, 24, 36$, different window sizes $m = 1, 2, 3, 4$ and different alternative distributions: Normal(0, 1), Logistic(0, .7), Laplace(0, 1), StudentT(12), StudentT(4), Cauchy(0, 1), Exponential(1), Weibull($\Gamma(1.5)$, 2), Weibull(2, .5) and Lognormal($-0.2$, .4). The simulation results are presented in Tables 1 and 2.
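To make the simulation scheme concrete, the sketch below (Python with NumPy, reusing the kl_statistic, gumbel_moments and gumbel_mle sketches above) approximates a percentage point of $I_{mn}$ under $H_0$ and then the power under a chosen alternative; the 40,000 replications follow the paper, while the estimator, the alternative and the values of $n$ and $m$ in the usage comment are illustrative choices:

```python
import numpy as np

def critical_value(estimator, n, m, alpha_star=0.05, reps=40_000, rng=None):
    """Approximate the upper alpha* percentage point of I_mn under H0 (Gumbel data).
    For location-scale equivariant estimators the null distribution of I_mn does not
    depend on (alpha, beta), so standard Gumbel(0, 1) samples are used."""
    rng = np.random.default_rng() if rng is None else rng
    stats = np.empty(reps)
    for t in range(reps):
        x = rng.gumbel(loc=0.0, scale=1.0, size=n)
        a_hat, b_hat = estimator(x)
        stats[t] = kl_statistic(x, a_hat, b_hat, m)
    return np.quantile(stats, 1.0 - alpha_star)

def power_estimate(sample_alt, estimator, n, m, delta, reps=40_000, rng=None):
    """Approximate the power P(I_mn > delta) under an alternative distribution."""
    rng = np.random.default_rng() if rng is None else rng
    rejections = 0
    for _ in range(reps):
        x = sample_alt(n, rng)              # draw a sample from the alternative
        a_hat, b_hat = estimator(x)
        if kl_statistic(x, a_hat, b_hat, m) > delta:
            rejections += 1
    return rejections / reps

# Illustrative usage: power against a Normal(0, 1) alternative with n = 24, m = 3,
# using the method-of-moments estimator.
# delta = critical_value(gumbel_moments, n=24, m=3)
# power = power_estimate(lambda size, rng: rng.normal(0.0, 1.0, size),
#                        gumbel_moments, n=24, m=3, delta=delta)
```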
TABLE 1. Percentage points of the test statistic $I_{mn}$ for sample sizes n = 12, 18, 24, 36, window sizes m = 1, 2, 3, 4 and $\alpha^* = 0.05$.

 n   m    mle     me      os     lm
 12  1   .831   .820   1.091   .810
 12  2   .614   .629    .889   .615
 12  3   .561   .613    .882   .447
 12  4   .566   .623    .881   .596
 18  1   .655   .679    .850   .659
 18  2   .460   .495    .692   .475
 18  3   .422   .467    .660   .449
 18  4   .430   .472    .669   .451
 24  1   .571   .672    .706   .580
 24  2   .389   .421    .531   .400
 24  3   .352   .385    .496   .368
 24  4   .346   .386    .585   .364
 36  1   .487   .512    .567   .497
 36  2   .318   .339    .403   .327
 36  3   .269   .295    .362   .283
 36  4   .259   .284    .347   .270
TABLE 2. Power estimates of the $I_{mn}$ statistic under different alternative distributions for sample sizes n = 12, 18, 24, 36, window sizes m = 1, 2, 3, 4 and $\alpha^* = 0.05$.
Normal (0, 1) n
m mle
12 1 2 3 4 18 1 2 3 4 24 1 2 3 4 36 1 2 3 4
.098 .120 .135 .164 .120 .169 .173 .183 .134 .189 .232 .237 .182 .254 .319 .348
me
os
lm
.113 .163 .183 .218 .188 .237 .268 .289 .255 .317 .354 .364 .355 .446 .509 .512
.114 .125 .132 .133 .177 .179 .188 .198 .183 .202 .211 .224 .225 .288 .301 .316
.114 .142 .164 .201 .161 .208 .229 .255 .211 .265 .308 .325 .296 .391 .425 .459
StudentT (12) n
m mle
12 1 2 3 4 18 1 2 3 4 24 1 2 3 4 36 1 2 3 4
.095 .123 .145 .181 .109 .160 .197 .210 .146 .213 .238 .254 .220 .311 .355 .372
Logistic (0, .7) mle .083 .125 .149 .190 .126 .180 .194 .225 .157 .231 .248 .260 .247 .341 .376 .408
me
os
Laplace (0, 1)
lm
mle
me
.057 .174 .146 .122 .218 .196 .179 .174 .156 .260 .214 .184 .205 .175 .277 .245 .196 .230 .212 .304 .058 .273 .218 .196 .333 .293 .279 .271 .250 .389 .314 .296 .284 .251 .397 .333 .296 .302 .247 .400 .063 .309 .291 .284 .436 .515 .331 .345 .357 .484 .407 .343 .371 .361 .500 .416 .350 .388 .346 .492 .070 .427 .406 .431 .595 .533 .453 .482 .525 .657 .547 .467 .520 .565 .665 .563 .485 .535 .548 .669
StudentT (4)
os
lm
.282 .304 .561 .313 .449 .472 .474 .477 .531 .557 .567 .565 .702 .723 .729 .732
.197 .244 .381 .283 .330 .365 .385 .378 .432 .485 .490 .490 .586 .651 .673 .663
Cauchy (0, 1)
me
os
lm
mle
me
os
lm
mle
me
os
.139 .181 .207 .135 .220 .269 .295 .320 .291 .357 .386 .401 .409 .494 .535 .552
.147 .157 .154 .171 .232 .230 .245 .251 .260 .274 .287 .300 .356 .392 .406 .404
.125 .161 .189 .218 .192 .248 .265 .285 .260 .314 .336 .385 .357 .434 .483 .492
.102 .128 .161 .205 .150 .208 .251 .256 .201 .281 .325 .325 .310 .398 .470 .482
.197 .241 .263 .281 .310 .351 .367 .368 .389 .435 .463 .464 .523 .591 .605 .602
.241 .249 .262 .252 .363 .376 .381 .378 .429 .452 .456 .463 .574 .601 .605 .603
.184 .226 .248 .272 .287 .337 .343 .352 .368 .419 .434 .436 .519 .576 .589 .589
.467 .498 .468 .452 .679 .711 .690 .616 .808 .849 .836 .793 .929 .955 .959 .794
.580 .671 .584 .687 .579 .684 .549 .683 .755 .842 .776 .854 .751 .850 .709 .849 .854 .924 .875 .931 .863 .923 .841 .925 .949 .982 .965 .984 .962 .985 .955 .984
lm .563 .575 .559 .537 .742 .785 .733 .686 .849 .870 .856 .686 .949 .961 .960 .953
TABLE 2. (Cont.)
n
Exponential (1) m mle me os
12 1 2 3 4 18 1 2 3 4 24 1 2 3 4 36 1 2 3 4
.133 .168 .198 .141 .246 .322 .354 .310 .324 .471 .485 .470 .468 .659 .710 .698
.177 .219 .190 .137 .270 .327 .353 .309 .365 .463 .481 .475 .538 .673 .721 .736
.073 .065 .061 .039 .098 .103 .097 .082 .198 .217 .218 .216 .364 .452 .487 .489
Weibull (Γ(1.5), 2) lm mle me os lm .171 .218 .200 .151 .264 .359 .371 .327 .363 .483 .503 .488 .520 .692 .737 .735
.223 .318 .292 .186 .382 .500 .519 .445 .491 .649 .675 .638 .700 .845 .877 .885
.278 .323 .301 .223 .424 .518 .528 .482 .557 .675 .697 .674 .756 .871 .897 .901
.120 .124 .155 .064 .166 .189 .390 .136 .333 .412 .475 .394 .584 .672 .717 .701
.263 .332 .308 .243 .410 .530 .530 .494 .553 .686 .719 .690 .733 .866 .905 .903
Weibull (2, .5) mle me os
lm
.120 .062 .065 .065 .167 .069 .071 .077 .198 .077 .087 .087 .312 .089 .102 .104
.058 .062 .062 .066 .063 .066 .071 .069 .066 .073 .080 .085 .068 .085 .091 .101
.071 .061 .058 .061 .071 .063 .068 .070 .065 .073 .075 .078 .070 .081 .089 .103
.042 .044 .041 .042 .045 .047 .044 .050 .049 .052 .053 .047 .051 .053 .064 .059
TABLE 2. (Cont.)  Lognormal (−0.2, .4)
n
m mle
12 1 2 3 4 18 1 2 3 4 24 1 2 3 4 36 1 2 3 4
.085 .101 .096 .084 .099 .128 .127 .106 .108 .164 .170 .162 .144 .210 .226 .215
me
os
lm
.091 .100 .085 .067 .125 .146 .189 .118 .157 .187 .189 .179 .210 .265 .269 .278
.051 .056 .047 .039 .045 .045 .041 .038 .070 .070 .066 .068 .098 .100 .106 .093
.089 .089 .091 .067 .104 .137 .135 .117 .125 .173 .181 .162 .167 .227 .258 .256
From the tables above, the percentage points of the test decrease as the sample size $n$ increases, and the power increases as the sample size $n$ and the window size $m$ increase. In the case of the moment estimator, the test has the highest power for the Normal, Logistic, StudentT(12) and Exponential distributions, while in the case of the L-moment estimator, the test has the highest power for the Weibull($\Gamma(1.5)$, 2) and Lognormal distributions. In addition, in the case of the maximum likelihood estimator, the test has the highest power for the Weibull(2, .5) distribution and the lowest power for the Cauchy distribution.

4. CONCLUSION
Accurate estimation of the parameters of the Gumbel distribution is important in statistical analysis. In this paper, we have introduced a goodness of fit test statistic for the Gumbel distribution based on the Kullback-Leibler information measure. We considered ten different distributions under the alternative hypothesis. It is found that the test statistics based on the moment and order-statistics estimators have the highest power, except for the Weibull and Lognormal distributions. In the case of the Cauchy alternative, the test has the highest power for all estimators, whereas it has the lowest power for the Weibull(2, .5) alternative. The theory developed could easily be extended to other distributions. The test statistic could also be applied under RSS, extreme RSS and median RSS.
REFERENCES
[1] E.J. Gumbel, Statistics of Extremes, Columbia University Press, New York, 1958.

[2] H.A. David and H.N. Nagaraja, Order Statistics, Third Edition, John Wiley & Sons, Hoboken, New Jersey, 2003.

[3] I. Arizono and H. Ohta, A test for normality based on Kullback-Leibler information, The American Statistician, 43 (1989), 20-22.

[4] J.R.M. Hosking, L-moments: analysis and estimation of distributions using linear combinations of order statistics, Journal of the Royal Statistical Society: Series B, 52 (1990), 105-124.

[5] K. Ibrahim, M.T. Alodat, A.A. Jemain and S.A. Al-Subh, Chi-square test for goodness of fit for logistic distribution using ranked set sampling and simple random sampling, Statistica & Applicazioni, IX(2) (2011), 111-128.

[6] K.S. Song, Goodness-of-fit tests based on Kullback-Leibler discrimination information, IEEE Transactions on Information Theory, 48 (2002), 1103-1117.

[7] O. Vasicek, A test for normality based on sample entropy, Journal of the Royal Statistical Society: Series B, 38 (1976), 54-59.
[8] P. Perez-Rodriguez, H. Vaquera-Huerta and J. Villasenor-Alva, A goodness-of-fit test for the Gumbel distribution based on Kullback-Leibler information, Communications in Statistics - Theory and Methods, 38 (2009), 842-855.

[9] R. D'Agostino and M. Stephens, Goodness of Fit Techniques, Marcel Dekker Inc., New York, 1986.

[10] R. Kinnison, Correlation coefficient goodness of fit test for the extreme value distribution, The American Statistician, 43 (1989), 98-100.

[11] S. Kullback, Information Theory and Statistics, Wiley, New York, 1959.

[12] S. Kullback and R.A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics, 22 (1951), 79-86.
Received: June 23, 2014