Goodness of Fit Test for Gumbel Distribution ... - Semantic Scholar

14 downloads 0 Views 102KB Size Report
In this study, a goodness of fit test statistics for Gumbel distribution based on .... employ the Kullback-Leibler information which is given by. 0. 0. ( ). ( , ). ( )log . ( ; , ). f x. I f f. f x ..... distributions. In the case of Cauchy, the test is found to have the.
Applied Mathematical Sciences, Vol. 8, 2014, no. 95, 4703 - 4712 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.46470        Goodness

of Fit Test for Gumbel Distribution

Based on Kullback-Leibler Information Using Several Different Estimators S. A. Al-Subh Department of Mathematics and Statistics Mutah University, Karak, Jordan Copyright © 2014 S. A. Al-Subh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract In

this

H

: F (x ) = F

o

paper, o

(x )

our

objective

is

for all x against H : F ( x ) ≠ F ( x ) 1 o

to test the statistical hypothesis for some x , where F o ( x ) is a known

distribution function. In this study, a goodness of fit test statistics for Gumbel distribution based on Kullback-Leibler information is studied. The performance of the test under simple random sampling is investigated using Monte Carlo simulation. The Gumbel parameters are estimated by using several methods of estimation such as maximum likelihood, order statistics, moments, and L-moments. Ten different distributions are considered under the alternative hypothesis. For all the distributions considered, it is found that the test statistics based on estimators found by moment and order statistic methods have the highest power, except for weibull and Lognormal distributions. Keywords: Goodness of fit test; Kullback-Leibler Information; Entropy; Gumbel distribution; order statistics

1. INTRODUCTION    There are many areas of application of Gumbel distribution such as environmental sciences, system reliability and hydrology. In hydrology, for example, the Gumbel distribution may be used to represent the distribution of the minimum level of a river in a particular year based on

 

4704

S. A. Al-Subh

minimum values for the past few years. It is useful for predicting the occurrence of that an extreme earthquake, flood or other natural disaster. The potential applicability of the Gumbel distribution to represent the distribution of minima relates to extreme value theory which indicates that it is likely to be useful if the distribution of the underlying sample data is of the normal or exponential type. Many studies have been carried out to study goodness of fit tests by using of Kullback-Leibler information. Kinnison (1989) tested the Gumbel distribution using a correlation coefficient type statistic. Arizono and Ohata (1989) proposed a test of normality based on an estimate of the Kullback-Leibler information. Song (2002) presented a general methodology for developing asymptotically distribution-free goodness of fit tests based on the Kullback-Leibler information. Also, he has shown that the tests are omnibus within an extremely large class of nonparametric global alternatives and to have good local power. Ibrahim et al. (2011) found that the goodness of fit test based on Kullback-Leibler information supports the results which indicate that the chi-square test is most powerful under RSS than SRS for some selected order statistics. In this paper, we introduce a goodness of fit test for Gumbel distribution which is based on the Kullback-Leibler information. We estimate the Gumbel parameters by using several methods of estimation such as maximum likelihood, order statistics, moments and Lmoments. According to Hosking (1990), L-moments have the theoretical advantages over the conventional moments of being able to characterize a wider range of distributions and, when estimated from a sample, the method is more robust to the presence of outliers in the data. Also, the parameter estimates obtained from L-moments are sometimes more accurate in small samples than the maximum likelihood estimates. We compute the percentage values and the power based on the statistics which involves the Kullback-Leibler information using Monte Carlo simulations. This paper is organized as follows. First, we define the test statistic and the estimators of Gumbel distribution. Then, we write the procedures to calculate the percentage points and the power function of the test statistic under an alternative distribution. In addition, a simulation study is conducted to study the power of the test statistic and we state our conclusions.

2. PRELIMINARY NOTES AND METHODS Test Statistics Let X 1 , X 2 ,..., X n be a random sample from the distribution function F (x ) with quantile function Q (u ) = F −1 (u ) and let X 1:n ≤ X 2:n ≤ ... ≤ X n :n denote the corresponding order statistics. We are interested in testing the hypothesis H o : F (x ) = Fo (x ) for all x vs. H 1 : F (x ) ≠ Fo (x )   for some x , where Fo (x ) is a Gumbel distribution function of the following form ⎛ ⎛ x −α ⎞⎞ Fo (x ; α , β ) = exp ⎜ − exp ⎜ − ⎟, β ⎟⎠ ⎠ ⎝ ⎝ and its density function is

(1)

 

Goodness of fit test for Gumbel distribution

f o ( x; α , β ) =

⎛ x −α ⎛ x −α − exp ⎜ − exp ⎜ − β β β ⎝ ⎝ 1

4705 ⎞⎞ ⎟⎟, ⎠⎠

(2)

where α is a location parameter, β is a scale parameter, x, α ∈ (−∞, ∞), and β > 0. We employ the Kullback-Leibler information which is given by ∞ f (x ) I (f , f 0 ) = ∫ f (x )log dx . (3) f 0 (x ; α , β ) −∞ The quantity I (f , f 0 ) describes the amount of ‘Information’ lost for approximating f (x ) using f 0 (x ). The larger value of I (f , f 0 ) indicates the greater disparity between f (x ) and f 0 (x ). It known that I (f , f 0 ) = 0 if and only if f (x ) ≡ f 0 (x ) for all x > 0. Hence the test can be designed as follows. Reject H o vs H 1 if I (f , f 0 ) is large. Vasicek (1976) and Song (2002) find that ∞

I ( f , f0 ) =





f ( x )log f ( x ) dx −

−∞



f ( x )log f 0 ( x ; α , β ) dx

−∞

1 n 1 n ⎛ n (4) = ∑ log ⎜ ( xi + m:n − xi − m:n ) ⎞⎟ − ∑ log f 0 ( x ; αˆ , βˆ ) , n i =1 ⎝ 2m ⎠ n i =1 where m , called window size, is a positive integer (m ≤ n / 2), xi:n = x1:n for i < 1 and x i :n = x n :n for i > n . For the Gumbel distribution, I (f , f 0 ) can be estimated, denoted as I mn , and be given by I mn = -H (f ) −

1 n logf 0 (x ; αˆ , βˆ ) ∑ n i =1

=−

⎛ x − αˆ ⎞ x αˆ 1 n 1 n ⎡ n log ⎢ ( X i + m :n − X i −m :n )⎤⎥ + log βˆ + ˆ − ˆ + ∑ exp ⎜ − i ˆ ⎟ ∑ n i =1 β β n i =1 β ⎠ ⎣ 2m ⎦ ⎝

=−

n 1 n ⎛ n ⎞ ˆ + 1 exp ⎜⎛ − x i − αˆ ⎟⎞. log log X X β − + ( ) ∑ ⎜⎝ 2m i +m :n i −m :n ⎟⎠ ∑ n i =1 n i =1 βˆ ⎠ ⎝

(5)

Many estimators for α and β are considered. The estimators derived are maximum likelihood, order statistics, moments and L-moments. The purpose is to find the estimator which is able to give better power. Estimators of α , β We will introduce four different types of estimators for α and β , which are mle, me, os and lm. i) Maximum Likelihood Estimator (mle ) : We denote the mle of α , β as αˆ mle , βˆmle , respectively. Let X 1 , X 2 ,..., X n be a random sample from (2). The log-likelihood function is given by

 

4706

S. A. Al-Subh n ⎛ x −α ⎞ n ⎛ x i −α ⎞ l (α , β ) = − n log ( β ) − ∑ ⎜ i . ⎟ − ∑ exp ⎜ − β ⎠ i =1 β ⎟⎠ i =1 ⎝ ⎝

(6)

After taking the derivatives with respect to α and β and equating to 0, we obtain the equations given as n

βˆmle = x − ∑ x iw i and αˆ mle = − βˆ log ( z ) .

(7)

i =1

⎛ x where z i = exp ⎜⎜ − i ˆ ⎝ β mle

⎞ zi 1 n .  ⎟⎟ , z = ∑ z i and w i = n i =1 nz ⎠

ii) Method of Moment Estimator ( me ) : The mean and variance for Gumbel distribution are given by

μ = α + γβ and σ 2 =

π2 6

β 2 .                                                                                              (8)

The moment estimators of the two parameters are 6 βˆme = s

(9)

π

αˆ me = x − γβˆme where s, x are the sample standard deviation and mean, respectively,  and γ = 0.57721566 is Euler's constant.  

(10)

iii) Order Statistics Estimator (os ) : The p th quantile for the Gumbel distribution is Q ( p ; α , β ) = F −1 ( p ) = α − β log ( − log p ) ,

0 < p d α * ) , where I (.)  stands for indicator function. 40, 000 t =1 The power of each test is approximated based on a Monte Carlo simulation of 40,000 iterations according to the algorithm above. We compare the efficiency of the tests for different sample sizes: n = 12, 18, 24, 36, different window sizes: m = 1, 2,3, 4 and different alternative distributions: Normal (0, 1),   Logistic (0, .7),   Laplace (0, 1),   StudentT (12), StudentT (4),   Cauchy (0, 1),   Exponential (1),   Weibull (Γ(1.5), 2),  

T (H ) ≈

W eibull (2, .5),  and Lognormal (−0.2, Tables (1) and (2).

.4).           The Simulation results are presented in the

TABLE 1. Percentage points for the test statistics I mn for different sample sizes

n = 12, 18, 24, 36, window sizes m = 1, 2,3, 4 and α * = 0.05. n

m mle

12 1 2 3 4 18 1 2 3 4

.831 .614 .561 .566 .655 .460 .422 .430

me

os

lm

n

mle

me

os

lm

.820 .629 .613 .623 .679 .495 .467 .472

1.091 .889 .882 .881 .850 .692 .660 .669

.810 .615 .447 .596 .659 .475 .449 .451

24

.571 .389 .352 .346 .487 .318 .269 .259

.672 .421 .385 .386 .512 .339 .295 .284

.706 .531 .496 .585 .567 .403 .362 .347

.580 .400 .368 .364 .497 .327 .283 .270

36

 

Goodness of fit test for Gumbel distribution

4709

TABLE 2. Power estimates of the I mn statistics under different alternative distributions with different sample sizes n = 12, 18, 24, 36, window sizes

m = 1, 2,3, 4 and α * = 0.05.

Normal (0, 1) n

m mle

12 1 2 3 4 18 1 2 3 4 24 1 2 3 4 36 1 2 3 4

.098 .120 .135 .164 .120 .169 .173 .183 .134 .189 .232 .237 .182 .254 .319 .348

me

os

lm

.113 .163 .183 .218 .188 .237 .268 .289 .255 .317 .354 .364 .355 .446 .509 .512

.114 .125 .132 .133 .177 .179 .188 .198 .183 .202 .211 .224 .225 .288 .301 .316

.114 .142 .164 .201 .161 .208 .229 .255 .211 .265 .308 .325 .296 .391 .425 .459

StudentT (12) n

m mle

12 1 2 3 4 18 1 2 3 4 24 1 2 3 4 36 1 2 3 4

.095 .123 .145 .181 .109 .160 .197 .210 .146 .213 .238 .254 .220 .311 .355 .372

Logistic (0, .7) mle .083 .125 .149 .190 .126 .180 .194 .225 .157 .231 .248 .260 .247 .341 .376 .408

me

os

Laplace (0, 1)

lm

mle

me

.057 .174 .146 .122 .218 .196 .179 .174 .156 .260 .214 .184 .205 .175 .277 .245 .196 .230 .212 .304 .058 .273 .218 .196 .333 .293 .279 .271 .250 .389 .314 .296 .284 .251 .397 .333 .296 .302 .247 .400 .063 .309 .291 .284 .436 .515 .331 .345 .357 .484 .407 .343 .371 .361 .500 .416 .350 .388 .346 .492 .070 .427 .406 .431 .595 .533 .453 .482 .525 .657 .547 .467 .520 .565 .665 .563 .485 .535 .548 .669

StudentT (4)

os

lm

.282 .304 .561 .313 .449 .472 .474 .477 .531 .557 .567 .565 .702 .723 .729 .732

.197 .244 .381 .283 .330 .365 .385 .378 .432 .485 .490 .490 .586 .651 .673 .663

Cauchy (0, 1)

me

os

lm

mle

me

os

lm

mle

me

os

.139 .181 .207 .135 .220 .269 .295 .320 .291 .357 .386 .401 .409 .494 .535 .552

.147 .157 .154 .171 .232 .230 .245 .251 .260 .274 .287 .300 .356 .392 .406 .404

.125 .161 .189 .218 .192 .248 .265 .285 .260 .314 .336 .385 .357 .434 .483 .492

.102 .128 .161 .205 .150 .208 .251 .256 .201 .281 .325 .325 .310 .398 .470 .482

.197 .241 .263 .281 .310 .351 .367 .368 .389 .435 .463 .464 .523 .591 .605 .602

.241 .249 .262 .252 .363 .376 .381 .378 .429 .452 .456 .463 .574 .601 .605 .603

.184 .226 .248 .272 .287 .337 .343 .352 .368 .419 .434 .436 .519 .576 .589 .589

.467 .498 .468 .452 .679 .711 .690 .616 .808 .849 .836 .793 .929 .955 .959 .794

.580 .671 .584 .687 .579 .684 .549 .683 .755 .842 .776 .854 .751 .850 .709 .849 .854 .924 .875 .931 .863 .923 .841 .925 .949 .982 .965 .984 .962 .985 .955 .984

lm .563 .575 .559 .537 .742 .785 .733 .686 .849 .870 .856 .686 .949 .961 .960 .953

 

4710

S. A. Al-Subh

TABLE 2. (Cont.)  

n

Exponential (1) m mle me os

12 1 2 3 4 18 1 2 3 4 24 1 2 3 4 36 1 2 3 4

.133 .168 .198 .141 .246 .322 .354 .310 .324 .471 .485 .470 .468 .659 .710 .698

.177 .219 .190 .137 .270 .327 .353 .309 .365 .463 .481 .475 .538 .673 .721 .736

.073 .065 .061 .039 .098 .103 .097 .082 .198 .217 .218 .216 .364 .452 .487 .489

W eibull (Γ(1.5), 2) lm mle me os lm .171 .218 .200 .151 .264 .359 .371 .327 .363 .483 .503 .488 .520 .692 .737 .735

.223 .318 .292 .186 .382 .500 .519 .445 .491 .649 .675 .638 .700 .845 .877 .885

.278 .323 .301 .223 .424 .518 .528 .482 .557 .675 .697 .674 .756 .871 .897 .901

.120 .124 .155 .064 .166 .189 .390 .136 .333 .412 .475 .394 .584 .672 .717 .701

.263 .332 .308 .243 .410 .530 .530 .494 .553 .686 .719 .690 .733 .866 .905 .903

W eibull (2, .5)   mle me os

lm

.120 .062 .065 .065 .167 .069 .071 .077 .198 .077 .087 .087 .312 .089 .102 .104

.058 .062 .062 .066 .063 .066 .071 .069 .066 .073 .080 .085 .068 .085 .091 .101

.071 .061 .058 .061 .071 .063 .068 .070 .065 .073 .075 .078 .070 .081 .089 .103

.042 .044 .041 .042 .045 .047 .044 .050 .049 .052 .053 .047 .051 .053 .064 .059

TABLE 2. (Cont.)   Lognormal (−0.2,

n

m mle

12 1 2 3 4 18 1 2 3 4 24 1 2 3 4 36 1 2 3 4

.085 .101 .096 .084 .099 .128 .127 .106 .108 .164 .170 .162 .144 .210 .226 .215

.4)  

me

os

lm

.091 .100 .085 .067 .125 .146 .189 .118 .157 .187 .189 .179 .210 .265 .269 .278

.051 .056 .047 .039 .045 .045 .041 .038 .070 .070 .066 .068 .098 .100 .106 .093

.089 .089 .091 .067 .104 .137 .135 .117 .125 .173 .181 .162 .167 .227 .258 .256

 

Goodness of fit test for Gumbel distribution

4711

From the above tables, the percentage points for the test decrease as the sample size n increases and the power increases as the sample size n increases and the window size m increases. Also, in the case of moment estimator, the test has the highest power for Normal, Logistic, Student T (12) ,    and Exponential distributions but in the case of LMoment estimator, the test has the highest power for W eibull (Γ(1.5), 2) and lognormal distributions. In addition, in the case of Maximum Likelihood estimator, the test has the highest power for W eibull (2, .5) distribution and the lowest power for Cauchy. 4. CONCLUSION

An accurate estimation of parameters of the Gumbel distribution in statistical analysis is importance. In this paper, we have introduced a test statistic of goodness of fit for Gumbel distribution based on Kullback-Leibler information measure. We considered ten different distributions under the alternative hypothesis. It is found that the test statistics based on estimators found by moment and order statistic methods have the highest power, except for weibull and Lognormal distributions. In the case of Cauchy, the test is found to have the highest power for all estimators but the test has the lowest power for Weibull (2, .5) . The theory developed could be extended easily to other distributions. Also, we can apply RSS, extreme RSS and median RSS on this test statistics.

REFERENCES

[1] E.J. Gumbel, Statistics of Extremes. Columbia University Press, New York, 1958. [2] H.A. David & H.N. Nagaraja, Order Statistics. Third Edition. Hoboken, New Jersey: John Wiley & Sons, Inc, 2003. [3] I. Arizono & H. Ohta, A test for Normailty based on Kullback-Leibler Information. American Statistician 43(1989), 20-22. [4] J.R.M. Hosking, L-moments: Analysis and estimation of distribution using linear combinations of order statistics. Journal of the Royal Statistical Society: Series B 52(1990): 105-124. [5] K. Ibrahim, M.T. Alodat, A.A. Jemain & S.A. Al-Subh, Chi-square test for goodness of fit for logistic distribution using ranked set sampling and simple random sampling. Statistica & Applicazioni IX(2) (2011): 111-128. [6] K.S. Song, Goodness-of-fit tests based on Kullback-Leibler discrimination information. IEEE Transactions on Information Theory 48(2002): 1103-1117. [7] O. Vasicek, A test for normality based on sample entropy. Journal of the Royal Statistical Society: Series B 38(1976): 54-59.

 

4712

S. A. Al-Subh

[8] P. Perez-Rodriguez, H. Vaquera-Huerta & J. Villasenor-Alva, A goodness-of-fit test for the gumbel distribution based on Kullback-Leibler Information. Communications in Statistics-Theory and Methods 38(2009):842-855. [9] R. D'Agostine & M. Stephens, Goodness of fit Techniques. Marcel Dekker Inc., New York, 1986. R. Kinnison, Correlation coefficient goodness of fit test for the extreme value [10] distribution. American Statistician 43(1989): 98-100. [11]

S. Kullback, Information Theory and Statistics. New York, Wiley, 1959.

[12] S. Kullback & R.A. Libler, On information and sufficiency. The Annals of Mathematical Statistics 4(1951): 49-70.

Received: June 23, 2014