Some Convergence Problems On Heavy Tail Estimation Using Upper Order Statistics For Generalized Pareto and Lognormal Distributions

Raul Hernandez-Molinar, Tec de Monterrey, Campus San Luis Potosí
John Lefante, Tulane University

In some applications, the population characteristics of main interest are found in the tails of the distribution function. The study of the risk of extreme events leads to the use of probability distributions and of the scenarios that correspond to their tails. Considering two approaches, parametric and nonparametric, the research emphasizes the assessment of distribution tails, assuming that the underlying distributions are heavy tailed. Two heavy-tailed distributions are considered: Generalized Pareto and Lognormal. The maximum likelihood estimation method, applied to the complete sample and to the upper order statistics only, provides estimators of the parameters. Measures of bias and mean squared error of the parameter estimators, and the conditional mean exceedance functions of the distributions, are generated. The methodology for estimating population parameters has potential applications in financial markets, quality control, insurance portfolios, monitoring of residual discharges, medical applications, design of environmental policies, and calibration and adjustment of processes and equipment. The main idea is to present and analyze the methods used for the estimation, and some convergence problems that arise when these two distribution functions are used in generating scenarios.

I. Introduction
The determination of distribution functions based on censored samples of extreme values is very useful, for instance, in determining tolerance limits or in formulating policy. When the problem involves the risk of extreme events, or situations with a low probability of occurrence, the analysis can draw on techniques derived from extreme value theory. This work compares two estimation methods, the classical maximum likelihood (CML) estimation method and an asymptotic extreme value maximum likelihood (AEVML) estimation method, when a censored sample of the k upper order statistics is used. Two distribution functions are employed as underlying distributions:
Lognormal and Generalized Pareto. Both are treated here as belonging to the domain of attraction of the Gumbel limiting distribution $\exp(-\exp(-x))$. Using the joint density function of the set of upper order statistics, normalizing constants that depend on the parameters of the underlying distribution, and the normalizing constants proposed by I. Weissman (1978), the parameters of the underlying distribution are estimated under either the parametric or the nonparametric approach. The estimators are obtained in closed form, and estimates of the parameters, of large quantiles, and of confidence intervals for these quantiles are determined. Differences between the parameter estimates obtained from the entire sample and from the upper order statistics alone are analyzed using Monte Carlo simulation.

II. Maximum Likelihood Asymptotic Method
This method was proposed by I. Weissman. It applies when we have a random sample of size n and use only the k upper order statistics. The method is called "asymptotic" because the limiting distribution function is derived from the asymptotic properties of the extreme values. The main assumption is that the distribution function belongs to the domain of attraction of the Gumbel limiting distribution. The normalizing constants $a_n$, $b_n$ need to be known. The likelihood function of the joint distribution of the k upper order statistics, evaluated at $(y_{n-k+1;n}, \ldots, y_{n;n})$, can then be approximated. The method maximizes this asymptotic likelihood function to obtain estimators of the distribution function for any population sample under study, based on the determination of the normalizing constants $a_n$, $b_n$, which standardize the order statistics as $u_i = (y_{n-i+1;n} - b_n)/a_n$ in the domain of attraction of the limiting distribution. The asymptotic likelihood function for the k upper order statistics, using the Gumbel distribution, is:
$$L_k(u_i) = \exp\left\{-\exp(-u_k)\right\} \prod_{i=1}^{k} \frac{1}{a_n} \exp(-u_i).$$
The corresponding log-likelihood function is:

$$\ln L_k(u_i) = -\exp(-u_k) - \sum_{i=1}^{k} u_i - k \ln(a_n).$$

III. Estimating the parameters for the Lognormal Distribution Function
The proposed normalizing constants are:

$$a_n^* = \sigma\left(2\ln\frac{n}{2\pi}\right)^{-1/2}, \tag{1}$$

and

$$b_n^* = \sigma\left\{\left(2\ln\frac{n}{2\pi}\right)^{1/2} - \frac{\ln\ln\left(\frac{n}{2\pi}\right)}{\left(\frac{4}{2}\right)\left(\frac{\ln(n)}{2\pi}\right)^{1/2}}\right\} + \mu. \tag{2}$$

Note that:

$$L_1 = \left(2\ln\frac{n}{2\pi}\right)^{1/2} \tag{3}$$

and

$$L_2 = \frac{\ln\ln\left(\frac{n}{2\pi}\right)}{\left(\frac{4}{2}\right)\left(\frac{\ln(n)}{2\pi}\right)^{1/2}}. \tag{4}$$

This means the following:

$$a_n = \left(\frac{\sigma}{L_1}\right)\exp(\sigma L_1 - \sigma L_2 + \mu), \tag{5}$$

and

$$b_n = \exp(\sigma L_1 - \sigma L_2 + \mu). \tag{6}$$
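The step from the starred constants to equations (5) and (6), and the inversion used in the next equations, can be made explicit. The sketch below assumes, as is standard but not stated in the text, that $a_n^*$ and $b_n^*$ normalize the maximum of the underlying normal variable $X = \ln Y$, and that the constants for $Y = e^X$ follow from a first-order expansion of the exponential around $b_n^*$.

```latex
\begin{align*}
b_n^* &= \sigma (L_1 - L_2) + \mu, \qquad a_n^* = \frac{\sigma}{L_1}, \\
b_n &= \exp(b_n^*) = \exp(\sigma L_1 - \sigma L_2 + \mu), \\
a_n &= a_n^* \exp(b_n^*) = \frac{\sigma}{L_1}\,\exp(\sigma L_1 - \sigma L_2 + \mu), \\
\frac{a_n}{b_n} &= \frac{\sigma}{L_1}
  \;\Longrightarrow\; \sigma = \frac{a_n}{b_n}\,L_1, \qquad
  \mu = \ln(b_n) + \sigma L_2 - \sigma L_1 .
\end{align*}
```

Replacing $a_n$, $b_n$ by their estimates in the last line gives the closed-form estimators of $\sigma$ and $\mu$.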
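The parameters of the original distribution are estimated using the normalizing constants generated from equations (1) and (2), with $a_n$ and $b_n$ replaced by Weissman's estimators, and solving for the parameters:

$$\hat{\sigma} = \left(\frac{\hat{a}_n}{\hat{b}_n}\right) L_1, \tag{7}$$

$$\hat{\sigma} = \left[\frac{\frac{1}{k}\sum_{i=1}^{k} y_{n-i+1;n} - y_{n-k+1;n}}{y_{n-k+1;n} + \hat{a}_n \ln(k)}\right]\left(2\ln\frac{n}{2\pi}\right)^{1/2}, \tag{8}$$

and

$$\hat{\mu} = \ln(\hat{b}_n) + \hat{\sigma} L_2 - \hat{\sigma} L_1; \tag{9}$$

then,

$$\hat{\mu} = \ln\left(y_{n-k+1;n} + \hat{a}_n \ln(k)\right) + \hat{\sigma}\left\{\frac{\ln\ln\left(\frac{n}{2\pi}\right)}{\left(\frac{4}{2}\right)\left(\frac{\ln(n)}{2\pi}\right)^{1/2}}\right\} - \hat{\sigma}\left(2\ln\frac{n}{2\pi}\right)^{1/2}. \tag{10}$$

Here $\hat{a}_n = \frac{1}{k}\sum_{i=1}^{k} y_{n-i+1;n} - y_{n-k+1;n}$ and $\hat{b}_n = y_{n-k+1;n} + \hat{a}_n \ln(k)$ are the estimators of the normalizing constants obtained from the k upper order statistics, as can be seen by comparing (7) with (8).

IV. Estimating the parameters for the Generalized Pareto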
The normalizing constants can be written as functions of the parameters α and β of the Generalized Pareto distribution, as follows:

$$a_n = \frac{1}{\alpha} \tag{11}$$

$$b_n = \ln\left(\alpha\beta n^{1/\alpha}\right) \tag{12}$$

We can estimate the parameter α using equation (11):

$$\hat{\alpha} = \frac{1}{\hat{a}_n}. \tag{13}$$
If we employ the sample with the k upper order statistics, and the normalizing constants proposed by Weissman:

$$\hat{\alpha} = \left[\frac{1}{k}\sum_{i=1}^{k} y_{n-i+1;n} - y_{n-k+1;n}\right]^{-1}. \tag{14}$$
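and with equation (12),

$$\exp(b_n) = \alpha\beta n^{1/\alpha} \tag{15}$$

and

$$\beta = \frac{\exp(b_n)}{\alpha n^{1/\alpha}}. \tag{16}$$

Using the sample with the k upper order statistics, we have:

$$\hat{\beta} = \frac{\exp\left(y_{n-k+1;n} + \hat{a}_n \ln(k)\right)}{\hat{\alpha}\, n^{1/\hat{\alpha}}} \tag{17}$$

$$\hat{\beta} = \frac{\exp\left[y_{n-k+1;n} + \frac{1}{\hat{\alpha}} \ln(k)\right]}{\hat{\alpha}\, n^{1/\hat{\alpha}}}. \tag{18}$$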
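Equations (13) to (18) reduce to a few arithmetic steps on the k largest observations. A minimal Python sketch follows; the function names are hypothetical, and the code illustrates the formulas under the paper's parametrization rather than reproducing the authors' S-Plus implementation. Evaluating $\hat{\beta}$ on the log scale is a choice made here to avoid overflow in $\exp(\hat{b}_n)$.

```python
import math


def weissman_constants(sample, k):
    """Estimate (a_n, b_n) from the k largest observations of `sample`."""
    if not 2 <= k <= len(sample):
        raise ValueError("k must satisfy 2 <= k <= len(sample)")
    top = sorted(sample)[-k:]            # y_{n-k+1;n} <= ... <= y_{n;n}
    threshold = top[0]                   # y_{n-k+1;n}
    a_hat = sum(top) / k - threshold     # mean of the top k minus the threshold
    if a_hat <= 0:
        raise ValueError("the k largest observations must not all be equal")
    b_hat = threshold + a_hat * math.log(k)
    return a_hat, b_hat


def gp_aevml(sample, k):
    """Return (alpha_hat, beta_hat) following equations (14) and (18)."""
    n = len(sample)
    a_hat, b_hat = weissman_constants(sample, k)
    alpha_hat = 1.0 / a_hat                                   # equations (13)-(14)
    # log of equation (18): b_hat - ln(alpha_hat) - (1/alpha_hat) * ln(n)
    log_beta = b_hat - math.log(alpha_hat) - math.log(n) / alpha_hat
    return alpha_hat, math.exp(log_beta)


# Toy data, chosen only so the arithmetic is easy to follow by hand.
alpha_hat, beta_hat = gp_aevml(list(range(1, 11)), k=4)
```

With the toy sample 1, ..., 10 and k = 4, the top four values are 7, 8, 9, 10, so $\hat{a}_n = 8.5 - 7 = 1.5$ and $\hat{\alpha} = 2/3$.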
V. Summary of the equations for estimating the distribution function parameters based on the upper order statistics

Lognormal

$$\hat{\sigma} = \left[\frac{\frac{1}{k}\sum_{i=1}^{k} y_{n-i+1;n} - y_{n-k+1;n}}{y_{n-k+1;n} + \hat{a}_n \ln(k)}\right]\left(2\ln\frac{n}{2\pi}\right)^{1/2}$$

$$\hat{\mu} = \ln\left(y_{n-k+1;n} + \hat{a}_n \ln(k)\right) + \hat{\sigma}\left\{\frac{\ln\ln\left(\frac{n}{2\pi}\right)}{\left(\frac{4}{2}\right)\left(\frac{\ln(n)}{2\pi}\right)^{1/2}}\right\} - \hat{\sigma}\left(2\ln\frac{n}{2\pi}\right)^{1/2}$$

Generalized Pareto

$$\hat{\alpha} = \left[\frac{1}{k}\sum_{i=1}^{k} y_{n-i+1;n} - y_{n-k+1;n}\right]^{-1}$$

$$\hat{\beta} = \frac{\exp\left[y_{n-k+1;n} + \frac{1}{\hat{\alpha}} \ln(k)\right]}{\hat{\alpha}\, n^{1/\hat{\alpha}}}$$
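The Lognormal summary equations can be collected into a short routine. The following is a minimal Python sketch, not the authors' S-Plus code: the function names are hypothetical, the expressions for $L_1$ and $L_2$ follow equations (3) and (4) as they are read here, and $\hat{\mu}$ is taken from equation (9).

```python
import math


def l1(n):
    """L_1 as read from equation (3); requires n > 2*pi."""
    return math.sqrt(2.0 * math.log(n / (2.0 * math.pi)))


def l2(n):
    """L_2 as read from equation (4); requires n > 2*pi."""
    numerator = math.log(math.log(n / (2.0 * math.pi)))
    denominator = (4.0 / 2.0) * math.sqrt(math.log(n) / (2.0 * math.pi))
    return numerator / denominator


def lognormal_aevml(sample, k):
    """Return (sigma_hat, mu_hat) from the k largest observations."""
    n = len(sample)
    if n <= 2.0 * math.pi or not 2 <= k <= n:
        raise ValueError("need n > 2*pi and 2 <= k <= n")
    top = sorted(sample)[-k:]              # y_{n-k+1;n} <= ... <= y_{n;n}
    threshold = top[0]                     # y_{n-k+1;n}
    a_hat = sum(top) / k - threshold       # estimator of a_n
    b_hat = threshold + a_hat * math.log(k)  # estimator of b_n
    if a_hat <= 0 or b_hat <= 0:
        raise ValueError("estimated constants must be positive")
    sigma_hat = (a_hat / b_hat) * l1(n)                            # equation (7)
    mu_hat = math.log(b_hat) + sigma_hat * l2(n) - sigma_hat * l1(n)  # equation (9)
    return sigma_hat, mu_hat


# Toy data (1, ..., 50), used only to exercise the arithmetic.
sigma_hat, mu_hat = lognormal_aevml(list(range(1, 51)), k=10)
```

For the toy sample the top ten values are 41, ..., 50, so $\hat{a}_n = 4.5$ and $\hat{b}_n = 41 + 4.5\ln(10)$; substituting the returned estimates back into equations (5) and (6) recovers these two constants.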
VI. Simulating the Process in order to compare two methods (Classical versus Asymptotic)

Monte Carlo simulations were carried out in S-Plus. We determine the sample size, the critical values, the confidence intervals, and the number of upper order statistics required. An important condition was that the values used in the simulations correspond to those greater than the 95th percentile. The simulations were run with 5000 and 10000 repetitions. It was observed that the convergence of the joint distribution of the k upper order statistics is affected when the ratio k/n tends to increase.
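The original simulations were run in S-Plus and their code is not shown. As an illustration only, the following Python sketch sets up a comparable experiment for σ under assumed settings (n = 500, k = 10, 1000 repetitions, a fixed seed), with $L_1$ taken from equation (3) as read here. All function names are hypothetical.

```python
import heapq
import math
import random
import statistics


def cml_sigma(sample):
    """Classical ML estimate of sigma: standard deviation of the logs."""
    return statistics.pstdev([math.log(y) for y in sample])


def aevml_sigma(sample, k):
    """AEVML estimate of sigma from the k largest values, equation (7)."""
    n = len(sample)
    top = heapq.nlargest(k, sample)        # descending; top[-1] is y_{n-k+1;n}
    a_hat = sum(top) / k - top[-1]
    b_hat = top[-1] + a_hat * math.log(k)
    l1 = math.sqrt(2.0 * math.log(n / (2.0 * math.pi)))
    return (a_hat / b_hat) * l1


def monte_carlo(n, k, reps, mu=1.0, sigma=2.0, seed=12345):
    """Return ((bias, mse) for CML, (bias, mse) for AEVML) of sigma.hat."""
    rng = random.Random(seed)
    cml, aevml = [], []
    for _ in range(reps):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        cml.append(cml_sigma(sample))
        aevml.append(aevml_sigma(sample, k))

    def bias_and_mse(estimates):
        bias = statistics.fmean(estimates) - sigma
        mse = statistics.fmean([(e - sigma) ** 2 for e in estimates])
        return bias, mse

    return bias_and_mse(cml), bias_and_mse(aevml)


(cml_bias, cml_mse), (aevml_bias, aevml_mse) = monte_carlo(n=500, k=10, reps=1000)
print(f"CML:   bias={cml_bias:.4f}  mse={cml_mse:.4f}")
print(f"AEVML: bias={aevml_bias:.4f}  mse={aevml_mse:.4f}")
```

Because $\hat{a}_n/\hat{b}_n < 1/\ln(k)$ whenever the observations are positive, the AEVML estimate of σ in this sketch can never exceed $L_1/\ln(k)$, which is about 1.28 for n = 500 and k = 10. This is consistent with the strong downward bias in σ reported in the tables.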
VII. Results

The parameters of the distribution function were defined in closed form. They were used to generate estimates of the parameters, of the upper quantiles, and of the confidence intervals. We estimate the parameters using the two methods (classical maximum likelihood, and the proposed method based on the k upper order statistics). The comparison showed significant differences between the parameter estimates of the two methods. A linear regression model has been employed to adjust the estimates and reduce the bias. Some results are presented in the following tables and figures.
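The text does not specify the regression model beyond the covariates named in the figure captions. As one possible reading, the sketch below regresses the CML estimate of µ on the AEVML estimate of µ, using the values reported in Table 1 (the single-predictor case of Figure 1), and uses the fitted line as the adjustment. Taking the CML estimate as the response is an assumption made here, and the function name is hypothetical.

```python
# AEVML mu.hat values from Table 1, ordered by n = 50, 100, 500, 1000, 2000, 5000
# and, within each n, by k = 10, 15, 17, 20.
mu_aevml = [
    3.104203, 3.148323, 3.150791, 3.145783,
    3.309579, 3.388895, 3.403894, 3.41607,
    3.67144, 3.814709, 3.851366, 3.893518,
    3.78984, 3.956688, 3.999175, 4.051995,
    3.894652, 4.08059, 4.131756, 4.194388,
    4.034428, 4.235752, 4.296521, 4.367966,
]

# CML mu.hat for each n (Table 1), repeated for the four k values.
# Using it as the regression response is an assumption of this sketch.
mu_cml_by_n = [0.999383, 1.000702, 1.000366, 1.000022, 1.000657, 1.000455]
mu_cml = [m for m in mu_cml_by_n for _ in range(4)]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(mu_aevml, mu_cml)

# Adjusted AEVML estimates: the fitted values of the regression.
mu_adjusted = [intercept + slope * m for m in mu_aevml]
max_abs_residual = max(abs(a - c) for a, c in zip(mu_adjusted, mu_cml))
```

The adjusted values lie close to 1.0, in line with the vertical range shown in Figure 1, whereas the unadjusted AEVML estimates of µ lie between about 3.1 and 4.4.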
Table 1. Lognormal Distribution Function: µ=1.0, σ=2.0. Estimation of µ, σ, and 95% Confidence Intervals for the 97.5th Percentile. 97.5th Percentile = 136.99

n     Method  k   x.975     µ.hat     σ.hat     LL(x.975)  UL(x.975)
50    CML     -   151.1002  0.999383  1.990654  135.6191   168.4317
50    AEVML   10  155.858   3.104203  0.85973   144.6066   168.1029
50    AEVML   15  135.1858  3.148323  0.779081  125.1908   146.0811
50    AEVML   17  128.3995  3.150791  0.755762  118.7995   138.873
50    AEVML   20  119.5155  3.145783  0.726985  110.4244   129.4466
100   CML     -   143.279   1.000702  1.992253  130.365    157.5019
100   AEVML   10  205.9494  3.309579  0.916164  194.486    218.1611
100   AEVML   15  186.8276  3.388895  0.837506  176.3729   197.9664
100   AEVML   17  179.917   3.403894  0.814389  169.7932   190.7061
100   AEVML   20  170.5077  3.41607   0.78553   160.8188   180.8382
500   CML     -   138.9284  1.000366  2.000714  130.2017   148.2427
500   AEVML   10  315.2707  3.67144   1.013223  304.8402   326.0713
500   AEVML   15  311.6269  3.814709  0.935529  301.6637   321.9316
500   AEVML   17  308.304   3.851366  0.91245   298.5165   318.4247
500   AEVML   20  302.546   3.893518  0.882987  293.0059   312.4085
1000  CML     -   137.3952  1.000022  1.998569  130.0562   145.1493
1000  AEVML   10  369.8256  3.78984   1.046469  360.0961   379.8254
1000  AEVML   15  375.5244  3.956688  0.968441  366.0909   385.208
1000  AEVML   17  374.7261  3.999175  0.946086  365.4121   384.2844
1000  AEVML   20  371.9592  4.051995  0.916252  362.8262   381.3287
2000  CML     -   137.2012  1.000657  1.998956  131.0037   143.6921
2000  AEVML   10  426.0407  3.894652  1.076785  417.0787   435.1993
2000  AEVML   15  442.1998  4.08059   0.999353  433.393    451.189
2000  AEVML   17  444.7263  4.131756  0.97607   435.9977   453.6331
2000  AEVML   20  446.0707  4.194388  0.945884  437.4646   454.8496
5000  CML     -   137.1661  1.000455  1.99979   132.2124   142.3056
5000  AEVML   10  508.2382  4.034428  1.105586  500.3182   516.2858
5000  AEVML   15  538.3942  4.235752  1.031588  530.4895   546.418
5000  AEVML   17  546.3654  4.296521  1.007768  538.4927   554.3544
5000  AEVML   20  554.2807  4.367966  0.978381  546.4664   562.2079

CML: classical maximum likelihood estimation method. AEVML: asymptotic extreme value maximum likelihood estimation method. x.975: 97.5th percentile. LL(x.975): lower limit of the 95% confidence interval of the 97.5th percentile. UL(x.975): upper limit of the 95% confidence interval of the 97.5th percentile.
Table 2. Lognormal Distribution Function: µ=1.0, σ=2.0. Estimation of µ, σ, and 95% Confidence Intervals for the Percentile Estimation considering 2% of the sample. 97.5th Percentile = 136.99

Method: CML

n     x.975     µ.hat     σ.hat     LL(x.975)  UL(x.975)
500   138.4873  1.001506  1.998716  129.7782   147.7835
1000  137.989   0.999796  2.000731  130.6301   145.7633
2000  137.1454  0.999307  1.999385  130.9487   143.6357
3000  137.321   1.000727  1.999909  131.7036   143.178
4000  136.8556  0.998262  1.999664  131.6295   142.2893
5000  137.1894  1.000325  1.999989  132.2349   142.3296
6000  137.0754  1.000817  1.999389  132.3404   141.9799
7000  137.3935  1.000432  2.000839  132.8274   142.1167
8000  137.0333  0.999743  1.999914  132.6214   141.592
9000  136.9886  0.998732  2.00031   132.7029   141.4127

Method: AEVML

n     k    x.975     µ.hat     σ.hat     LL(x.975)  UL(x.975)
500   10   330.4393  3.684161  1.012436  319.8662   341.3822
1000  20   372.3839  4.056589  0.915406  363.2409   381.7633
2000  40   425.2246  4.384709  0.828921  417.3486   433.252
3000  60   453.0681  4.547747  0.784901  445.8897   460.3635
4000  80   474.4482  4.65574   0.757353  467.7159   481.2783
5000  100  500.655   4.746268  0.737836  494.1984   507.1968
6000  120  512.0802  4.802929  0.722165  505.9022   518.3343
7000  140  518.9844  4.847886  0.708789  513.0456   524.9923
8000  160  528.3794  4.886662  0.698838  522.6217   534.201
9000  180  537.1612  4.921627  0.689864  531.5583   542.8234

CML: classical maximum likelihood estimation method. AEVML: asymptotic extreme value maximum likelihood estimation method. x.975: 97.5th percentile. LL(x.975): lower limit of the 95% confidence interval of the 97.5th percentile. UL(x.975): upper limit of the 95% confidence interval of the 97.5th percentile.
Table 3. Lognormal Distribution Function: µ=1.0, σ=2.0. Estimation of µ, σ, and 95% Confidence Intervals for the Percentile Estimation considering n=10,000 with k increasing. 97.5th Percentile = 136.99

Method: CML (n = 10,000)

x.975     µ.hat     σ.hat     LL(x.975)  UL(x.975)
137.1993  1.000573  2.000187  133.0201   141.5097
137.0134  0.999097  2.000246  132.8371   141.3211
137.1566  1.000678  1.99995   132.9783   141.4661
137.0425  0.999938  1.999906  132.8661   141.3502
137.1893  0.999633  2.000645  133.0098   141.5001
137.1877  1.000457  2.000192  133.0087   141.4979
137.2782  1.000442  2.000535  133.0975   141.5903
137.1205  1.000955  1.999679  132.9431   141.4292
136.93    0.999622  1.999662  132.7556   141.2357
136.9478  0.999823  1.999646  132.7731   141.2538

Method: AEVML (n = 10,000)

k    x.975     µ.hat     σ.hat     LL(x.975)  UL(x.975)
10   579.7007  4.130346  1.125745  572.5354   586.958
20   644.0885  4.478735  1.003854  636.8888   651.3702
40   677.9958  4.760767  0.885444  671.0577   685.0062
60   657.0752  4.854463  0.823368  650.4871   663.7306
80   639.6383  4.905287  0.784977  633.2879   646.0528
100  628.8564  4.940787  0.757947  622.6709   635.1037
120  607.5412  4.950729  0.736388  601.5459   613.5966
140  593.6539  4.960985  0.719725  587.7946   599.5719
160  574.4377  4.956677  0.704763  568.7358   580.1972
180  559.9716  4.957157  0.693063  554.3852   565.6145

CML: classical maximum likelihood estimation method. AEVML: asymptotic extreme value maximum likelihood estimation method. x.975: 97.5th percentile. LL(x.975): lower limit of the 95% confidence interval of the 97.5th percentile. UL(x.975): upper limit of the 95% confidence interval of the 97.5th percentile.
Table 4. Generalized Pareto Distribution: α=0.5, β=1.0. Estimation of α, β, and 95% Confidence Interval for the Percentile. 97.5th Percentile = 10.64

n     Method  k   x.975     α.hat     β.hat     LL(x.975)  UL(x.975)
100   MM      -   10.21411  0.448253  1.049305  7.574498   12.85373
100   AEVML   10  9.072792  1.927104  0.67707   1.395652   16.74993
100   AEVML   15  9.641101  1.665762  0.614059  2.492206   16.79
100   AEVML   17  9.882994  1.595344  0.595928  2.835489   16.9305
100   AEVML   20  10.281    1.509057  0.572483  3.288849   17.27315
500   MM      -   10.44153  0.481475  1.014706  9.227232   11.65583
500   AEVML   10  9.146177  2.271939  0.900852  3.148405   15.14395
500   AEVML   15  9.07275   2.042759  0.814372  3.789824   14.35568
500   AEVML   17  9.060889  1.995187  0.793973  4.002802   14.11898
500   AEVML   20  9.065163  1.926766  0.767022  4.274302   13.85602
1000  MM      -   10.56122  0.49109   1.006824  9.690577   11.43185
1000  AEVML   10  9.673477  2.31842   1.002401  4.059459   15.28749
1000  AEVML   15  9.406827  2.113953  0.895521  4.370872   14.44278
1000  AEVML   17  9.352905  2.05958   0.867201  4.510687   14.19512
1000  AEVML   20  9.293226  2.003406  0.836414  4.71387    13.87258
2000  MM      -   10.58521  0.493648  1.005997  9.968819   11.20161
2000  AEVML   10  10.31456  2.399917  1.137711  5.319209   15.30991
2000  AEVML   15  9.823995  2.177263  0.989529  5.206302   14.44169
2000  AEVML   17  9.719389  2.129426  0.956601  5.244032   14.19475
2000  AEVML   20  9.612262  2.072786  0.919219  5.345247   13.87928
5000  MM      -   10.63911  0.498254  1.001534  10.24651   11.03171
5000  AEVML   10  11.12103  2.416307  1.276549  5.956      16.28605
5000  AEVML   15  10.43398  2.220278  1.099526  6.155651   14.7123
5000  AEVML   17  10.25518  2.169934  1.053808  6.138093   14.37227
5000  AEVML   20  10.06631  2.115599  1.005845  6.132447   14.00018

MM: moments estimation method. AEVML: asymptotic extreme value maximum likelihood estimation method. x.975: 97.5th percentile. LL(x.975): lower limit of the 95% confidence interval of the 97.5th percentile. UL(x.975): upper limit of the 95% confidence interval of the 97.5th percentile.
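The reference percentiles quoted in the table headings (136.99 for the Lognormal case and 10.64 for the Generalized Pareto case) can be checked directly from the true parameter values. The sketch below is a small Python illustration with hypothetical function names. The Generalized Pareto parametrization $F(x) = 1 - (1 + \alpha x/\beta)^{-1/\alpha}$ is an assumption, since the text does not state the distribution function; it is used here because it reproduces the quoted percentile.

```python
import math
from statistics import NormalDist


def lognormal_quantile(p, mu, sigma):
    """Quantile of a Lognormal(mu, sigma) distribution at probability p."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def gp_quantile(p, alpha, beta):
    """Quantile under the ASSUMED form F(x) = 1 - (1 + alpha*x/beta)**(-1/alpha)."""
    return (beta / alpha) * ((1.0 - p) ** (-alpha) - 1.0)


x975_lognormal = lognormal_quantile(0.975, mu=1.0, sigma=2.0)
x975_gp = gp_quantile(0.975, alpha=0.5, beta=1.0)
print(x975_lognormal, x975_gp)
```

Under these assumptions the Lognormal value is approximately 136.99 and the Generalized Pareto value is approximately 10.649, which agrees with the 10.64 quoted in Table 4 up to truncation.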
Figure 1. Lognormal data from random generator, µ=1.0, σ=2.0. Correcting the bias on µ, using only µ.aevml as predictor (n = sample size, k = number of upper order statistics). [Plot not reproduced. Vertical axis: mu.hat, from 0.995 to 1.004. Horizontal axis: n (k), from 50 (10) to 5000 (20). Series: Adjusted AEVML, MLE.]
Figure 2. Lognormal data from random generator, µ=1.0, σ=2.0. Correcting the bias on σ, using n, k, and σ.aevml as covariates (n = sample size, k = number of upper order statistics). [Plot not reproduced. Vertical axis: sigma.hat, from 1.97 to 2.02. Horizontal axis: n (k), from 50 (10) to 5000 (20). Series: Adjusted AEVML, MLE.]
Figure 3. Generalized Pareto data from random generator, α=0.5, β=1.0. Correcting the bias on α, using n, k, and α.aevml as covariates (n = sample size, k = number of upper order statistics). [Plot not reproduced. Vertical axis: a.hat, from 0.43 to 0.53. Horizontal axis: n (k), from 100 (10) to 5000 (20). Series: Adjusted AEVML, Moments.]
Figure 4. Generalized Pareto data from random generator, α=0.5, β=1.0. Correcting the bias on β, using n, k, and β.aevml as covariates (n = sample size, k = number of upper order statistics). [Plot not reproduced. Vertical axis: β.hat, from 0.9 to 1.15. Horizontal axis: n (k), from 100 (10) to 5000 (20). Series: Adjusted AEVML, Moments.]