World Environmental and Water Resources Congress 2010: Challenges of Change. © 2010 ASCE

Comparison of the Probability Plot Correlation Coefficient Test Statistics for the General Extreme Value Distribution

Sooyoung Kim1, Jun-Haeng Heo2

1 Ph.D. student, School of Civil and Environmental Engineering, Yonsei University, 134 Shinchon-Dong, Seoul, 120-749, Korea; Tel (+82-2) 393-1597; Fax (+82-2) 393-1597; E-mail: [email protected]
2 School of Civil and Environmental Engineering, Yonsei University, 134 Shinchon-Dong, Seoul, 120-749, Korea; Tel (+82-2) 2123-2805; Fax (+82-2) 364-5300; E-mail: [email protected]

Abstract

In frequency analysis, the probability distribution used to estimate a quantile is selected by goodness-of-fit tests. Among these, the probability plot correlation coefficient (PPCC) test is known to be powerful and easy to apply. In general, the PPCC test statistics depend on the significance level, the sample size, the plotting position formula, and, when the distribution has one, the shape parameter. It is therefore important to select an accurate plotting position formula when deriving the PPCC test statistics for a given probability distribution. Since Cunnane (1978) defined a plotting position related to the mean of the data, many studies have developed plotting position formulas that account for the coefficient of skewness, which is related to the shape parameter. In this study, the PPCC test statistics for the general extreme value (GEV) distribution are derived using a plotting position formula that was developed from the theoretical reduced variates and includes a coefficient-of-skewness term. The test statistics are estimated for various sample sizes, significance levels, and shape parameters of the GEV distribution. The performance of the derived PPCC test statistics is evaluated by estimating rejection rates in Monte Carlo simulation.

1. INTRODUCTION

Selecting an appropriate probability distribution is essential in frequency analysis for estimating design quantiles. An appropriate probability distribution is


generally selected on the basis of a goodness-of-fit test, which examines how well the sample data agree with the population under an assumed probability distribution. Various goodness-of-fit tests, such as the Kolmogorov-Smirnov test, the Cramer-von Mises test, and the chi-square test, have been developed in the literature. Among them, the probability plot correlation coefficient (PPCC) test, developed by Filliben (1975) as a test for normality, is known to be powerful. Since then, the PPCC test has been applied to various probability distributions. Vogel (1986) derived the PPCC test statistics for the Gumbel distribution, and Vogel and Kroll (1989) applied the PPCC test to the 2-parameter Weibull and uniform distributions for low-flow frequency analysis. Vogel and McMartin (1991) computed the PPCC test statistics at the 5% significance level for the gamma distribution, and Chowdhury et al. (1991) calculated the PPCC test statistics for the GEV distribution. Heo et al. (2007) proposed regression equations for estimating the test statistics of several probability distributions.

In this study, 100,000 samples from the general extreme value (GEV) distribution were generated to derive the PPCC test statistics for various sample sizes n (from 10 to 500), significance levels (from 0.005 to 0.995), and shape parameters. The test statistics are derived using a plotting position formula developed from the theoretical reduced variates with a coefficient-of-skewness term for the GEV distribution, as well as existing plotting position formulas such as Blom (1958), Gringorten (1963), Filliben (1975), and Cunnane (1978). In addition, Monte Carlo simulation is performed to select an appropriate plotting position formula for the assumed probability distributions.

2. THE DERIVATION OF THE PPCC TEST STATISTICS

2.1 The GEV distribution

The GEV distribution is defined by Eqs. (1) and (2) (Jenkinson, 1955).

$$F(x) = \exp\left[-\left\{1 - k(x-u)/\alpha\right\}^{1/k}\right], \quad k \neq 0 \tag{1}$$

$$F(x) = \exp\left[-\exp\left\{-(x-u)/\alpha\right\}\right], \quad k = 0 \tag{2}$$

where u is a location parameter, α is a scale parameter, and k is a shape parameter.

2.2 The derivation of plotting position formula regarding the reduced variates

After Cunnane (1978) defined the plotting position related to the mean of the data and proposed a general formula that can be applied to various probability distributions,


many studies have derived plotting position formulas that account for the influence of the coefficient of skewness, which is related to the shape parameter. The mean of the rth smallest value in a random sample of size n is defined as follows (Arnell et al., 1986).

$$E[x_r] = \frac{n!}{(r-1)!\,(n-r)!} \int_{-\infty}^{\infty} x_r\, F(x_r)^{r-1}\, [1 - F(x_r)]^{n-r}\, f(x_r)\, dx_r \tag{3}$$
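As an illustrative check of Eq. (3), the expectation can be evaluated numerically after the change of variable F = F(x_r), which turns the integral into one over (0, 1). The sketch below is not the authors' implementation: the function name `expected_order_statistic`, the midpoint rule, and the number of nodes are assumptions made for illustration only.

```python
import math


def expected_order_statistic(quantile, r, n, nodes=20000):
    """Approximate E[x_r] of Eq. (3) for the r-th smallest of n values.

    `quantile` maps a non-exceedance probability F in (0, 1) to x, so the
    integral over x becomes an integral over F on (0, 1).
    """
    # n! / ((r - 1)! (n - r)!) written as r * C(n, r)
    coeff = r * math.comb(n, r)
    h = 1.0 / nodes
    total = 0.0
    for j in range(nodes):
        F = (j + 0.5) * h  # midpoint nodes never touch 0 or 1
        total += quantile(F) * F ** (r - 1) * (1.0 - F) ** (n - r)
    return coeff * total * h


# Uniform(0, 1) variates: the exact mean of the r-th smallest is r / (n + 1).
uniform_mean = expected_order_statistic(lambda F: F, r=3, n=10)
```

Passing a function of the form `lambda F: (-math.log(F)) ** k` as `quantile` yields an integral containing the term (-ln F)^k. The midpoint rule converges slowly when that integrand is singular at an endpoint, so a dedicated quadrature routine would likely be preferable in practice.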

Substituting the reduced variates of the GEV distribution into Eq. (3) gives the theoretical reduced variates based on the mean concept:

$$E[y_{2r}] = \frac{n!}{(r-1)!\,(n-r)!} \int_{0}^{1} (-\ln F)^{k}\, F^{r-1}\, [1-F]^{n-r}\, dF \tag{4}$$

$$E[y_{3r}] = -\frac{n!}{(r-1)!\,(n-r)!} \int_{0}^{1} (-\ln F)^{k}\, F^{r-1}\, [1-F]^{n-r}\, dF \tag{5}$$

where Eqs. (4) and (5) are the reduced variates for the EV2 and EV3 distributions, respectively. In addition, this study adopted a genetic algorithm to estimate the parameters of the plotting formula. The objective function of the real-coded genetic algorithm (RGA), which is one type of genetic algorithm, is the root mean square error (RMSE). The RGA used a population size of 1,000, 2,000 generations, a crossover probability of 0.8, and a mutation probability of 0.01.

Table 1. The RMSEs from various plotting position formulas

Shape      Coeff. of  Sample  Derived  Gringorten  Cunnane  Arnell  In-na and  Goel
parameter  skewness   size                                  et al.  Nguyen     and De
-0.20      3.53507    10      0.0101   0.0144      0.0183   -       0.0156     0.0213
                      20      0.0080   0.0104      0.0140   0.3087  0.0121     0.0166
                      30      0.0071   0.0086      0.0119   0.1765  0.0106     0.0143
                      40      0.0064   0.0077      0.0108   0.1402  0.0096     0.0129
                      50      0.0059   0.0070      0.0099   0.1162  0.0088     0.0119
                      100     0.0047   0.0054      0.0077   0.0701  0.0070     0.0093
0.10       0.6376     10      0.0017   0.0036      0.0026   0.0813  0.0022     0.0032
                      20      0.0011   0.0021      0.0015   0.0358  0.0016     0.0022
                      30      0.0009   0.0016      0.0011   0.0249  0.0013     0.0018
                      40      0.0008   0.0015      0.0010   0.0199  0.0011     0.0015
                      50      0.0007   0.0013      0.0010   0.0165  0.0011     0.0015
                      100     0.0007   0.0011      0.0008   0.0095  0.0008     0.0010


The RMSEs between the theoretical reduced variates and those calculated by the derived plotting formula and by other plotting formulas are compared in Table 1 to assess the accuracy of the plotting positions. The derived plotting formula, which is applicable to the GEV distribution, is given by Eq. (6).

$$P_i = \frac{i - 0.3155}{n + 0.0050\gamma^{2} - 0.0902\gamma + 0.3074} \tag{6}$$

where γ is the coefficient of skewness.

2.3 The basic concept of the PPCC test

The PPCC test was developed by Filliben (1975) as a test for normality. It assesses goodness of fit using the correlation coefficient r between the ordered observations $X_i$ and the corresponding fitted quantiles $M_i$, which are determined from the plotting position $p_i$ of each $X_i$. The correlation coefficient r is defined by Eq. (7).

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(M_i - \bar{M})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^{2} \sum_{i=1}^{n} (M_i - \bar{M})^{2}}} \tag{7}$$

where $\bar{X}$ and $\bar{M}$ are the mean values of the observations $X_i$ and the fitted quantiles $M_i$, respectively, and n is the sample size. If r is close to 1.0, the observations could have been drawn from the fitted distribution. The estimate of the order statistic median for $M_i$ proposed by Filliben (1975) is as follows.

$$M_i = \phi^{-1}(m_i) \tag{8}$$

where $\phi^{-1}(\cdot)$ is the inverse of the cumulative distribution function of the standard normal distribution and the median value $m_i$ is given by

$$m_i = 1 - (0.5)^{1/n}, \quad i = 1 \tag{9a}$$

$$m_i = (i - 0.3175)/(n + 0.365), \quad i = 2, 3, \ldots, n-1 \tag{9b}$$

$$m_i = (0.5)^{1/n}, \quad i = n \tag{9c}$$
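Eqs. (7) to (9) can be illustrated with a short sketch for the normal case. The function names (`filliben_medians`, `ppcc`, `normal_ppcc`) are hypothetical, and the code is a minimal reading of the equations rather than the authors' implementation; it relies only on the Python standard library.

```python
from math import sqrt
from statistics import NormalDist, mean


def filliben_medians(n):
    """Order statistic medians m_i of Eqs. (9a)-(9c) for i = 1, ..., n."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]  # Eq. (9b)
    m[-1] = 0.5 ** (1.0 / n)                                   # Eq. (9c)
    m[0] = 1.0 - m[-1]                                         # Eq. (9a)
    return m


def ppcc(sample, quantiles):
    """Correlation coefficient r of Eq. (7); the sample is ordered first."""
    x = sorted(sample)
    xb, mb = mean(x), mean(quantiles)
    sxy = sum((a - xb) * (b - mb) for a, b in zip(x, quantiles))
    sxx = sum((a - xb) ** 2 for a in x)
    smm = sum((b - mb) ** 2 for b in quantiles)
    return sxy / sqrt(sxx * smm)


def normal_ppcc(sample):
    """PPCC for the normality test: M_i = inverse normal CDF of m_i, Eq. (8)."""
    fitted = [NormalDist().inv_cdf(m) for m in filliben_medians(len(sample))]
    return ppcc(sample, fitted)
```

A sample that is an exact linear function of the fitted quantiles gives r = 1, while a sample with a pronounced outlier gives a visibly smaller r.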

The null hypothesis that the sample is drawn from the assumed distribution cannot be rejected at significance level q if the following condition holds.

$$r > r_q(n) \tag{10}$$


where $r_q(n)$ is the test statistic of the PPCC test for a given sample size and significance level.

2.4 The derivation of the PPCC test statistics

Vogel and McMartin (1991) provided the following procedure for deriving the PPCC test statistics:
(a) Generate $X_i$ ($i = 1, \ldots, n$) of sample size n from an assumed parent distribution with given shape parameters.
(b) Calculate $M_i$ using the inverse of the cumulative distribution function and the plotting position.
(c) Estimate the correlation coefficient r between the generated sample $X_i$ and the calculated plotting position values $M_i$.
(d) Repeat steps (a) to (c) 100,000 times to obtain 100,000 correlation coefficients r.
(e) Select the (100,000 × q)th smallest r as $r_q$.

This procedure was applied to derive the PPCC test statistics for the GEV distribution under the following conditions.
· Sample sizes (n): 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, 200, 300, 500
· Significance levels: 0.005, 0.01, 0.05, 0.1, 0.5, 0.9, 0.95, 0.99, and 0.995
· Applied plotting position formula: derived plotting formula
· Shape parameters: -0.3, -0.2, -0.1, -0.05, 0.05, 0.1, 0.2, 0.3

2.5 Results of PPCC test statistics

The estimated test statistics are plotted in Figure 1. Figure 1 shows that the test statistics increase with sample size. In particular, the test statistics rise steeply up to a sample size of 50 regardless of the shape parameter. The test statistics for negative shape parameters are spread widely between 0.8 and 1.0, whereas those for positive shape parameters lie between 0.88 and 1.0. In addition, the range of the test statistics is wider for negative shape parameters than for positive ones, and the test statistics for positive shape parameters converge as the sample size increases. The test statistics derived with various plotting formulas, namely the derived plotting formula, Blom (1958), Gringorten (1963), Filliben (1975), and


Cunnane (1978), are shown in Figure 2. The plotting formulas other than the derived formula and Filliben's, which are given in Eqs. (6) and (9), are listed in Table 2. Broadly, the test statistics for negative shape parameters differ among plotting formulas as the sample size changes, whereas those for positive shape parameters differ little at all sample sizes. In addition, the test statistics from the derived plotting formula are greater than the others when the sample size exceeds 60 to 70; in the other cases, the test statistics from Gringorten (1963) are the greatest.
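The Monte Carlo procedure of Section 2.4 (steps (a) to (e)) can be sketched as follows. This is a simplified illustration, not the authors' code: the function names are hypothetical, the parent is a standardized GEV with u = 0 and α = 1 (r is unchanged by location and scale), the default plotting position is Gringorten (1963) rather than Eq. (6), and the default number of replications is 2,000 instead of 100,000 to keep the run short.

```python
import math
import random


def gev_quantile(p, k, u=0.0, alpha=1.0):
    """Invert Eqs. (1)-(2): return x such that F(x) = p, for 0 < p < 1."""
    y = -math.log(p)
    if k == 0.0:
        return u - alpha * math.log(y)
    return u + alpha * (1.0 - y ** k) / k


def gringorten(i, n):
    """Gringorten (1963) plotting position, used here only as a default."""
    return (i - 0.44) / (n + 0.12)


def correlation(x, m):
    """Pearson correlation of Eq. (7) between two equal-length sequences."""
    xb = sum(x) / len(x)
    mb = sum(m) / len(m)
    sxy = sum((a - xb) * (b - mb) for a, b in zip(x, m))
    sxx = sum((a - xb) ** 2 for a in x)
    smm = sum((b - mb) ** 2 for b in m)
    return sxy / math.sqrt(sxx * smm)


def open_uniform(rng):
    """Draw from (0, 1), rejecting an exact 0.0 so the logarithm is defined."""
    p = rng.random()
    while p == 0.0:
        p = rng.random()
    return p


def ppcc_critical_value(n, k, q, reps=2000, position=gringorten, seed=0):
    """Steps (a)-(e): return the (reps * q)-th smallest simulated r."""
    rng = random.Random(seed)
    # Step (b): fitted quantiles depend only on n, k and the plotting position.
    m = [gev_quantile(position(i, n), k) for i in range(1, n + 1)]
    rs = []
    for _ in range(reps):  # step (d)
        x = sorted(gev_quantile(open_uniform(rng), k) for _ in range(n))  # (a)
        rs.append(correlation(x, m))  # step (c)
    rs.sort()
    rank = max(1, round(q * reps))  # step (e)
    return rs[rank - 1]
```

For example, `ppcc_critical_value(20, -0.2, 0.05)` returns an estimate of the 5% test statistic for n = 20 and k = -0.2 under these assumptions. Any other plotting position with the signature `position(i, n)` can be passed in, including Eq. (6) with a fixed skewness value.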

[Figure 1: PPCC test statistics versus sample size n (0 to 200) at significance levels 0.01, 0.05, and 0.1. Panels: (a) shape parameter is -0.2; (b) shape parameter is 0.2.]

Figure 1. The results of PPCC test statistics by derived plotting formula

[Figure 2: PPCC test statistics versus sample size n (0 to 200) for the Derived, Gringorten, Cunnane, Filliben, and Blom plotting formulas. Panels: (a) shape parameter is -0.2; (b) shape parameter is -0.1; (c) shape parameter is 0.1; (d) shape parameter is 0.2.]

Figure 2. The results of PPCC test statistics by various plotting formulas


Table 2. The recommended plotting position formulas

Name               Plotting position formula
Blom (1958)        $p_i = (i - 3/8)/(n + 1/4)$
Gringorten (1963)  $p_i = (i - 0.44)/(n + 0.12)$
Cunnane (1978)     $p_i = (i - 0.4)/(n + 0.2)$
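The formulas of Table 2 and the derived formula of Eq. (6) can be written as small functions for side-by-side comparison. The function names and the helper `positions` are hypothetical; the skewness `gamma` must be supplied by the caller.

```python
def blom(i, n):
    """Blom (1958): (i - 3/8) / (n + 1/4)."""
    return (i - 3.0 / 8.0) / (n + 1.0 / 4.0)


def gringorten(i, n):
    """Gringorten (1963): (i - 0.44) / (n + 0.12)."""
    return (i - 0.44) / (n + 0.12)


def cunnane(i, n):
    """Cunnane (1978): (i - 0.4) / (n + 0.2)."""
    return (i - 0.4) / (n + 0.2)


def derived(i, n, gamma):
    """Eq. (6); gamma is the coefficient of skewness."""
    return (i - 0.3155) / (n + 0.0050 * gamma ** 2 - 0.0902 * gamma + 0.3074)


def positions(formula, n, **kwargs):
    """Plotting positions for ranks i = 1, ..., n under the given formula."""
    return [formula(i, n, **kwargs) for i in range(1, n + 1)]
```

For instance, `positions(cunnane, 10)` and `positions(derived, 10, gamma=0.6376)` return the ten plotting positions under each formula, which can then be passed through the inverse distribution function to obtain the fitted quantiles.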

3. Power test

3.1 Procedure of power test

In this study, a power test was performed using Monte Carlo simulation to select an appropriate plotting position formula for the PPCC test statistics of the GEV distribution. The procedure of the power test for a given parent distribution with various shape parameters, sample sizes, probability distributions, and plotting position formulas is as follows:
(a) Assume the GEV distribution as the parent distribution.
(b) Generate data sets with the given shape parameters for various sample sizes and plotting position formulas. Here, the power test was performed for sample sizes n = 10, 25, 50, 100, and 200 and for the derived plotting formula, Blom (1958), Gringorten (1963), Filliben (1975), and Cunnane (1978). The assumed shape parameters are -0.3, -0.2, -0.1, 0.1, 0.2, and 0.3.
(c) Apply frequency analysis to the generated data set. The method of probability weighted moments is used for parameter estimation. The PPCC test with each plotting position formula is then applied to the generated data set at the 5% significance level.
(d) Repeat steps (a) to (c) 10,000 times to obtain the rejection ratio (%).
(e) Estimate the rejection ability from the rejection ratio (%).

3.2 Results of power test

The results of the power tests are shown in Fig. 3. For a sample size of 25, the rejection ratios for negative shape parameters exceed the 5% and 10% significance levels, respectively, regardless of the plotting position formula. However, the rejection ratios for positive shape parameters at the same sample size are


below the 5% and 10% significance levels regardless of the plotting position formula. For a sample size of 100, the rejection ratios for shape parameters of -0.3 and -0.2 exceed both significance levels, whereas those for the other shape parameters fall below them. Accordingly, the rejection ratios are sensitive to the shape parameter when the sample size is relatively small. The rejection ratios at a significance level of 0.05 are shown in Table 3. The rejection ratio of the derived plotting position formula is higher than the others for large sample sizes (100 and 200) and shape parameters from -0.3 to 0.1, whereas the rejection ratio of Gringorten (1963) is higher than the others in the remaining cases. These results are related to the values of the PPCC test statistics estimated with the derived plotting formula and Gringorten (1963).
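Step (c) of the power test procedure in Section 3.1 estimates the GEV parameters by the method of probability weighted moments, but the paper does not state the estimator. The sketch below assumes the commonly used approximation of Hosking, Wallis, and Wood (1985) with the sign convention for k of Eq. (1); the function name `gev_pwm_fit` is hypothetical and the sample must contain at least three values.

```python
import math


def gev_pwm_fit(sample):
    """Return (u, alpha, k) estimated by probability weighted moments."""
    x = sorted(sample)
    n = len(x)
    # Unbiased sample PWMs b0, b1, b2 (j is the 1-based rank).
    b0 = sum(x) / n
    b1 = sum((j - 1) * xj for j, xj in enumerate(x, start=1)) / (n * (n - 1))
    b2 = sum((j - 1) * (j - 2) * xj for j, xj in enumerate(x, start=1)) / (
        n * (n - 1) * (n - 2)
    )
    # Approximate solution for the shape parameter (assumed estimator).
    c = (2.0 * b1 - b0) / (3.0 * b2 - b0) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-9:
        # Gumbel limit (k -> 0), using Euler's constant 0.5772...
        alpha = (2.0 * b1 - b0) / math.log(2.0)
        return b0 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1.0 + k)
    alpha = (2.0 * b1 - b0) * k / (g * (1.0 - 2.0 ** (-k)))
    u = b0 + alpha * (g - 1.0) / k
    return u, alpha, k
```

The fitted (u, alpha, k) would then define the inverse distribution function used to compute the fitted quantiles for the PPCC test in each replication.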

[Figure 3: the % of rejection for each plotting position formula (Derived, Gringorten, Blom, Filliben, Cunnane) and shape parameters -0.3, -0.2, -0.1, 0.1, 0.2, and 0.3. Panels: (a) sample size = 25, significance level = 0.05; (b) sample size = 100, significance level = 0.05; (c) sample size = 25, significance level = 0.10; (d) sample size = 100, significance level = 0.10.]

Figure 3. The results of power test

Table 3. The comparison of rejection ratio (%) in case of significance level 5%

Sample  Plotting     Shape parameters
size    formulas     -0.3    -0.2    -0.1    0.1     0.2     0.3
10      Derived      81      74.81   67.3    47.49   37.3    29.79
        Gringorten   81.25   75.13   67.73   48.08   38.22   30.92
        Blom         81.13   74.92   67.37   47.49   37.36   29.81
        Filliben     80.93   74.72   67.08   46.88   36.68   28.91
        Cunnane      81.15   74.98   67.51   47.69   37.65   30.2
25      Derived      63.5    46.57   26.85   2.79    1.06    1.79
        Gringorten   63.52   46.6    26.87   2.85    1.16    1.91
        Blom         63.48   46.54   26.8    2.78    1.01    1.71
        Filliben     63.44   46.5    26.78   2.72    0.98    1.58
        Cunnane      63.5    46.55   26.83   2.8     1.06    1.75
50      Derived      56.78   30.95   9.37    0.51    0.86    1.86
        Gringorten   56.79   30.95   9.4     0.53    0.92    1.97
        Blom         56.78   30.94   9.38    0.48    0.78    1.83
        Filliben     56.78   30.94   9.39    0.49    0.77    1.78
        Cunnane      56.78   30.94   9.39    0.48    0.8     1.88
100     Derived      53.89   18.41   2.47    0.95    0.88    1.71
        Gringorten   53.88   18.34   2.37    0.93    0.91    1.74
        Blom         53.88   18.37   2.36    0.93    0.9     1.76
        Filliben     53.88   18.36   2.4     0.94    0.89    1.77
        Cunnane      53.88   18.35   2.36    0.94    0.9     1.72
200     Derived      50.52   9.67    2.2     1.44    1.25    2.02
        Gringorten   50.43   9.31    1.93    1.44    1.25    2.1
        Blom         50.43   9.31    1.95    1.48    1.18    2.08
        Filliben     50.43   9.32    1.98    1.56    1.2     2.07
        Cunnane      50.43   9.31    1.93    1.47    1.26    2.14

4. CONCLUSIONS

In this study, an accurate plotting position formula was derived using a genetic algorithm. In addition, the PPCC test statistics for the GEV distribution were derived for various sample sizes, significance levels, shape parameters, and plotting formulas. A power test based on Monte Carlo simulation was performed to select an appropriate plotting position formula. As a result, the rejection capability of the derived plotting position formula is higher than the others for large sample sizes and shape parameters from -0.3 to 0.1, whereas the rejection ability of the Gringorten (1963) plotting position formula is higher than the others in the remaining cases.

5. ACKNOWLEDGEMENT

This study was financially supported by the Construction Technology Innovation Program (08-Tech-Inovation-F01) through the Research Center of Flood Defence


Technology for Next Generation in Korea Institute of Construction & Transportation Technology Evaluation and Planning (KICTEP) of Ministry of Land, Transport and Maritime Affairs (MLTM).

6. REFERENCES

Arnell, N. W., Beran, M., and Hosking, J. R. M. (1986). "Unbiased plotting positions for the general extreme value distribution." Journal of Hydrology, Vol. 86, pp. 59-69.

Blom, G. (1958). Statistical estimates and transformed beta variables. John Wiley and Sons, New York.

Chowdhury, J. D., Stedinger, J. R., and Lu, L. H. (1991). "Goodness-of-fit tests for regional generalized extreme value flood distributions." Water Resources Research, Vol. 27, No. 7, pp. 1765-1776.

Cunnane, C. (1978). "Unbiased plotting positions - A review." Journal of Hydrology, Vol. 37, No. 3/4, pp. 205-222.

Filliben, J. J. (1975). "The Probability Plot Correlation Coefficient Test for Normality." Technometrics, Vol. 17, No. 1, pp. 111-117.

Goel, N. K. and De, M. (1993). "Development of unbiased plotting position formula for General Extreme Value distribution." Stochastic Environmental Research and Risk Assessment, Vol. 7, pp. 1-13.

Gringorten, I. I. (1963). "A plotting rule for extreme probability paper." Journal of Geophysical Research, Vol. 68, No. 3, pp. 813-814.

Heo, J., Kho, Y., Shin, H., Kim, S., and Kim, T. (2007). "Regression Equations of Probability Plot Correlation Coefficient Test Statistics from Several Probability Distributions." Journal of Hydrology, Vol. 355, No. 1-4, pp. 1-15.

In-na, N. and Nguyen, V-T-V. (1989). "An unbiased plotting position formula for the generalized extreme value distribution." Journal of Hydrology, Vol. 106, pp. 193-209.

Jenkinson, A. F. (1955). "The frequency distribution of the annual maximum (or minimum) values of meteorological elements." Quarterly Journal of the Royal Meteorological Society, Vol. 81, pp. 158-171.

Vogel, R. M. (1986). "The probability plot correlation coefficient test for the normal, lognormal, and Gumbel distributional hypotheses." Water Resources Research, Vol. 22, No. 4, pp. 587-590.

Vogel, R. M. and Kroll, C. N. (1989). "Low-flow frequency analysis using probability plot correlation coefficients." Journal of Water Resources Planning and Management, Vol. 115, No. 3, pp. 338-357.


Vogel, R. M. and McMartin, D. E. (1991). "Probability plot goodness-of-fit and skewness estimation procedures for the Pearson type 3 distribution." Water Resources Research, Vol. 27, No. 12, pp. 3149-3158.
