A non-parametric double-bootstrap method for an adaptive MOP EVI-estimation
M. Fátima Brilhante, M. Ivette Gomes, and Dinis D. Pestana
Citation: AIP Conf. Proc. 1479, 1708 (2012); doi: 10.1063/1.4756501. Published by the American Institute of Physics.



A Non-Parametric Double-Bootstrap Method for an Adaptive MOP EVI-Estimation

M. Fátima Brilhante∗, M. Ivette Gomes† and Dinis D. Pestana†

∗CEAUL and Departamento de Matemática, Universidade dos Açores, Campus Ponta Delgada, 9501-801 Ponta Delgada, Portugal
†CEAUL and DEIO, Faculdade de Ciências de Lisboa, Campo Grande, Bloco C6, Piso 4, 1749-016 Lisboa, Portugal

Abstract. The Hill estimator, the average of the k excesses of ordered log-observations, can be regarded as the logarithm of the mean of order p = 0 of a set of adequate statistics. The mean of order p (MOP), now with p ≥ 0, of the same statistics leads to the so-called MOP extreme value index (EVI)-estimator, a simple generalisation of the classical Hill estimator of a positive EVI, recently introduced in the literature. This class of MOP EVI-estimators depends on the extra tuning parameter p ≥ 0, which makes it very flexible and even able to outperform most of the 'classical' and even reduced-bias EVI-estimators. Apart from a simulation study supporting such an assertion, we put forward a fully non-parametric double-bootstrap algorithm for the choice of p and k. We further provide applications of the algorithm to simulated and real data in the field of biostatistics.

Keywords: Bootstrap methodology, heavy tails, semi-parametric estimation, statistics of extremes.
AMS 2010 subject classification: 62G32.

INTRODUCTION AND PRELIMINARIES

Given a sample of size n of independent, identically distributed (i.i.d.) random variables (r.v.'s), (X1, X2, ..., Xn), with a common distribution function (d.f.) F, let us denote by (X1:n ≤ X2:n ≤ ... ≤ Xn:n) the sample of associated ascending order statistics. Let us assume that there exist sequences of real constants {an > 0} and {bn ∈ R} such that the maximum, linearly normalized, i.e. (Xn:n − bn)/an, converges in distribution to a non-degenerate r.v. Then the limit distribution is necessarily of the type of the general extreme value (EV) d.f., given by

  EVγ(x) = exp(−(1 + γx)^{−1/γ}), 1 + γx > 0, if γ ≠ 0;   EVγ(x) = exp(−exp(−x)), x ∈ R, if γ = 0.   (1)

The d.f. F is said to belong to the max-domain of attraction of EVγ, and we write F ∈ D_M(EVγ). The parameter γ, in (1), is the extreme value index (EVI), the primary parameter of extreme events. Let us denote by RV_a the class of regularly varying functions at infinity with an index of regular variation equal to a, i.e. positive measurable functions g such that g(tx)/g(t) → x^a, as t → ∞, for all x > 0 (see Bingham et al., 1987). The EVI measures the heaviness of the right-tail function F̄(x) := 1 − F(x): the heavier the right tail, the larger γ is. In this paper we shall work with Pareto-type underlying d.f.'s, with a positive EVI, or equivalently, models such that F̄(x) = x^{−1/γ} L(x), γ > 0, with L ∈ RV_0, i.e. F̄ ∈ RV_{−1/γ}. These heavy-tailed models are quite common in many areas of application, like computer science, telecommunications, insurance, finance, bibliometrics and biostatistics, among others. Equivalently, with F←(x) := inf{y : F(y) ≥ x} denoting the generalized inverse function of F, the (reciprocal) quantile function U(t) := F←(1 − 1/t), t ≥ 1, is of regular variation with index γ, i.e.

  F ∈ D⁺_M := ⋃_{γ>0} D_M(EVγ)  ⟺  F̄ ∈ RV_{−1/γ}  ⟺  U ∈ RV_γ   (2)

(Gnedenko, 1943; de Haan, 1984).
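As a quick numerical illustration of (1), not part of the original paper, the sketch below simulates block maxima from the strict Pareto model F(x) = 1 − x^{−1/γ}, x ≥ 1, for which U(t) = t^γ, so that a_n = γ n^γ and b_n = n^γ are admissible normalizing constants, and compares their empirical d.f. with EVγ:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, n, B = 0.5, 1000, 5000

# Strict Pareto parent: F(x) = 1 - x**(-1/gamma), x >= 1, so U(t) = t**gamma.
X = rng.uniform(size=(B, n)) ** (-gamma)

# Linearly normalized block maxima, with a_n = gamma*n**gamma and b_n = n**gamma.
Z = (X.max(axis=1) - n ** gamma) / (gamma * n ** gamma)

def EV(x, g):
    """General extreme value d.f. EV_gamma(x), here for g != 0 and 1 + g*x > 0."""
    return float(np.exp(-(1.0 + g * x) ** (-1.0 / g)))

for x in (0.0, 1.0, 3.0):
    print(f"x = {x}: empirical {np.mean(Z <= x):.3f}  vs  EV_gamma {EV(x, gamma):.3f}")
```

With B = 5000 blocks the empirical and limiting values should agree to roughly two decimal places, illustrating F ∈ D⁺_M with γ = 0.5.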
The second-order parameter ρ (≤ 0) rules the rate of convergence in the first-order condition, in (2). It is the non-positive parameter appearing in the limiting relation

  lim_{t→∞} (ln U(tx) − ln U(t) − γ ln x)/A(t) = (x^ρ − 1)/ρ, if ρ < 0;   = ln x, if ρ = 0,

which is assumed to hold for every x > 0 in order to obtain the non-degenerate asymptotic behaviour of any EVI-estimator, and where |A| must then be of regular variation with index ρ (Geluk and de Haan, 1987).

Numerical Analysis and Applied Mathematics ICNAAM 2012, AIP Conf. Proc. 1479, 1708–1711 (2012); © 2012 American Institute of Physics.


For heavy-tailed models in D⁺_M, the classical EVI-estimators are the Hill estimators (Hill, 1975), the averages of the log-excesses V_ik := ln(X_{n−i+1:n}/X_{n−k:n}), 1 ≤ i ≤ k < n. We thus have

  H(k) := (1/k) Σ_{i=1}^{k} V_ik = (1/k) Σ_{i=1}^{k} ln(X_{n−i+1:n}/X_{n−k:n}) = ln( ∏_{i=1}^{k} X_{n−i+1:n}/X_{n−k:n} )^{1/k},  1 ≤ k < n,   (3)

the logarithm of the geometric mean of the statistics U_ik := X_{n−i+1:n}/X_{n−k:n}. More generally, Brilhante et al. (2012) considered as basic statistics for the EVI estimation the mean of order p (MOP) of the U_ik, denoted A_p(k), and the class of MOP EVI-estimators

  H_p(k) := (1 − A_p^{−p}(k))/p, if p > 0;   H_p(k) := ln A_0(k) = H(k), if p = 0,
  with A_p(k) = ( (1/k) Σ_{i=1}^{k} U_ik^p )^{1/p}, if p > 0;   A_p(k) = ( ∏_{i=1}^{k} U_ik )^{1/k}, if p = 0,   (4)

dependent now on this tuning parameter p ≥ 0, and with H0 (k) ≡ H(k), given in (3). We develop a Monte-Carlo simulation study of the comparative behaviour of different classes of ‘classical’ and reduced-bias EVI-estimators, including the MOP EVI-estimators, in (4). We further suggest a fully non-parametric double-bootstrap method for the adaptive choice of the tuning parameters k and p, and applications to simulated random samples, as well as to sets of real data in the field of biostatistics.
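The estimators in (3) and (4) are straightforward to implement. The following sketch is not from the paper (the function name and interface are ours); it computes H_p(k) for p ≥ 0, recovering Hill's estimator at p = 0:

```python
import numpy as np

def mop_evi(sample, k, p=0.0):
    """MOP EVI-estimator H_p(k) of (4); p = 0 gives the Hill estimator H(k) of (3)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    U = x[n - k:] / x[n - k - 1]          # U_ik = X_{n-i+1:n} / X_{n-k:n}, i = 1..k
    if p == 0.0:
        return float(np.mean(np.log(U)))  # log of the geometric mean of the U_ik
    A_p = np.mean(U ** p) ** (1.0 / p)    # mean of order p of the U_ik
    return float((1.0 - A_p ** (-p)) / p)

# Strict Pareto sample with gamma = 0.5: both H_0(k) and H_1(k) should be near 0.5.
rng = np.random.default_rng(1)
x = rng.uniform(size=5000) ** (-0.5)
print(mop_evi(x, 500, p=0.0), mop_evi(x, 500, p=1.0))
```

Note that H_p is Fisher-consistent under a strict Pareto model for any 0 ≤ p < 1/γ, the validity range stated later in the paper.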

A COMPARATIVE SIMULATION STUDY

We have implemented multi-sample Monte Carlo simulation experiments of size 5000 × 20 for the class of MOP EVI-estimators, in (4), comparatively with the moment (M) EVI-estimator, introduced in Dekkers et al. (1989), the mixed moment (MM) EVI-estimator, studied in Fraga Alves et al. (2009), and the minimum-variance reduced-bias (MVRB) EVI-estimator in Caeiro et al. (2005), denoted CH, with CH standing for corrected-Hill, for sample sizes n = 100, 200, 500, 1000, 2000 and 5000, from the following underlying models, with γ > 0: (1) the Fréchet model, with d.f. F(x) = exp(−x^{−1/γ}), x ≥ 0; (2) the Burr model, with d.f. F(x) = 1 − (1 + x^{−ρ/γ})^{1/ρ}, x ≥ 0, ρ < 0; (3) the extreme value model, with d.f. F(x) = EVγ(x), given in (1); (4) the Student-tν model, with ν = 1, 2, 4, i.e. for values γ = 1, 0.5, 0.25 (γ = 1/ν). For details on multi-sample simulation, see Gomes and Oliveira (2001). The chosen values of p were p = j/(10γ), with j = 0(1)9. As an illustration, we merely present the simulated mean values (E) and root mean square errors (RMSE) of some of the EVI-estimators under study, as functions of the number k of top order statistics involved in the estimation, for a Student t4 underlying parent. These values are based on the first replicate, with size 5000.
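A single-sample sketch of such an experiment can be set up as follows. This is far smaller than the paper's 5000 × 20 multi-sample design; the sample size, number of replicates and grid of k below are our own choices, and only a Fréchet parent is used:

```python
import numpy as np

rng = np.random.default_rng(42)
gamma, n, R = 0.5, 500, 500          # Fréchet parent, R Monte Carlo replicates
ks = np.arange(10, 400, 10)          # grid of numbers k of top order statistics

def H_p(xs, k, p):
    """H_p(k) of (4) on an ascending-sorted sample xs; p = 0 is Hill."""
    U = xs[-k:] / xs[-k - 1]
    # A_p^{-p}(k) = 1/mean(U**p), hence the simplified p > 0 branch below.
    return np.mean(np.log(U)) if p == 0 else (1 - 1 / np.mean(U ** p)) / p

# p = j/(10*gamma) for j = 0 (Hill) and j = 4, two points of the paper's p-grid.
est = {0.0: np.empty((R, ks.size)), 0.8: np.empty((R, ks.size))}
for r in range(R):
    xs = np.sort((-np.log(rng.uniform(size=n))) ** (-gamma))  # Fréchet(gamma) sample
    for p in est:
        est[p][r] = [H_p(xs, k, p) for k in ks]

for p, e in est.items():
    rmse = np.sqrt(np.mean((e - gamma) ** 2, axis=0))
    print(f"p = {p}: minimal RMSE {rmse.min():.4f} at k = {ks[rmse.argmin()]}")
```

The printed minima are the (simulated) RMSE at optimal levels used throughout the comparison below.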

[Figure 1 shows, side by side, the simulated mean values E[·] (left) and RMSE[·] (right) of H, H_p for j = 4 and j = 9, CH, M and MM, plotted against k in the range 0–500.]

FIGURE 1. Mean values (left) and RMSE (right) of the EVI-estimators under consideration for a Student tν d.f. with ν = 4 (γ = 0.25, ρ = −0.5).

Table 1 is again related to the Student t4 model (γ = 1/ν = 0.25, ρ = −2/ν = −0.5). We there present, for n = 200, 500, 1000, 2000 and 5000, the simulated mean values at optimal levels (levels where the RMSE is minimal, as a function of k) of the EVI-estimators under consideration. We have further computed the Hill estimator, i.e. (4) with p = 0, at the simulated value of k_{0|0} := arg min_k MSE(H_0(k)), the simulated optimal k in the sense of minimum RMSE, not relevant in practice, but providing an indication of the best possible performance of Hill's estimator. Such an estimator is denoted H_{00}. We have also computed all other estimators, generically denoted W(k), at the simulated value of k_{0|W} := arg min_k MSE(W(k)), and use the notation W_0 := W(k_{0|W}). The simulated indicators are

  REFF_{W|H} := RMSE(H_{00})/RMSE(W_0) = sqrt( MSE(H_{00}) / MSE(W_0) ).   (5)

Information on 95% confidence intervals, computed on the basis of the 20 replicates with 5000 runs each, is also provided. Among the estimators considered, the one providing the smallest squared bias and the highest REFF is underlined and in bold.

TABLE 1. Simulated mean values, E[W_0], and REFF_{W|H} indicators, at optimal levels, of H(k) ≡ H_0(k), H_p(k) for p = j/(10γ), j = 4, 8 and 9, CH(k), M(k) and MM(k), for Student t4 underlying parents, together with 95% confidence intervals.

Mean values
n     H (j = 0)         j = 4             j = 8             j = 9             CH                M                 MM
200   0.3392 ± 0.0026   0.2914 ± 0.0018   0.2646 ± 0.0005   0.2583 ± 0.0001   0.3104 ± 0.0009   0.1273 ± 0.0428   0.2596 ± 0.0033
500   0.3167 ± 0.0016   0.2881 ± 0.0009   0.2616 ± 0.0003   0.2565 ± 0.0001   0.3005 ± 0.0013   0.1513 ± 0.0079   0.2546 ± 0.0008
1000  0.3055 ± 0.0013   0.2844 ± 0.0007   0.2604 ± 0.0002   0.2554 ± 0.0001   0.2939 ± 0.0008   0.1678 ± 0.0022   0.2534 ± 0.0006
2000  0.2959 ± 0.0009   0.2810 ± 0.0006   0.2589 ± 0.0002   0.2546 ± 0.0001   0.2879 ± 0.0006   0.1857 ± 0.0012   0.2513 ± 0.0003
5000  0.2862 ± 0.0007   0.2765 ± 0.0004   0.2575 ± 0.0002   0.2539 ± 0.0000   0.2805 ± 0.0004   0.2019 ± 0.0007   0.2507 ± 0.0002

REFF indicators
n     j = 4             j = 8             j = 9             CH                M                 MM
200   1.5845 ± 0.0049   4.7308 ± 0.0251   9.2156 ± 0.0518   1.3982 ± 0.0084   0.6222 ± 0.0645   1.5866 ± 0.0240
500   1.4200 ± 0.0054   4.0389 ± 0.0172   7.9104 ± 0.0351   1.3615 ± 0.0053   0.6266 ± 0.0145   1.8738 ± 0.0123
1000  1.3285 ± 0.0050   3.5993 ± 0.0120   7.0227 ± 0.0207   1.3223 ± 0.0057   0.6695 ± 0.0037   2.1368 ± 0.0100
2000  1.2554 ± 0.0050   3.2243 ± 0.0156   6.2623 ± 0.0303   1.2834 ± 0.0057   0.7102 ± 0.0043   2.4656 ± 0.0188
5000  1.1819 ± 0.0045   2.7827 ± 0.0144   5.3442 ± 0.0294   1.2358 ± 0.0048   0.7451 ± 0.0055   2.9994 ± 0.0160

Regarding REFF-indicators, the MOP EVI-estimator associated with p = 9/(10γ), and computed at optimal levels, outperforms all other alternatives.
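To make the indicator in (5) concrete, the following sketch, our own illustration rather than the paper's design, estimates REFF_{W|H} for W = H_p with p = 9/(10γ), under a Burr parent with the same (γ, ρ) = (0.25, −0.5) as Table 1:

```python
import numpy as np

rng = np.random.default_rng(7)
gamma, rho, n, R = 0.25, -0.5, 500, 400
ks = np.arange(5, n - 1, 5)

def H_p(xs, k, p):
    """H_p(k) of (4) on an ascending-sorted sample xs; p = 0 is Hill."""
    U = xs[-k:] / xs[-k - 1]
    return np.mean(np.log(U)) if p == 0 else (1 - 1 / np.mean(U ** p)) / p

p9 = 9 / (10 * gamma)                       # j = 9 in the paper's p-grid
errs = {0.0: np.empty((R, ks.size)), p9: np.empty((R, ks.size))}
for r in range(R):
    u = rng.uniform(size=n)
    # Burr inverse d.f.: x = ((1-u)**rho - 1)**(-gamma/rho)
    xs = np.sort(((1 - u) ** rho - 1) ** (-gamma / rho))
    for p in errs:
        errs[p][r] = [H_p(xs, k, p) - gamma for k in ks]

# Simulated optimal levels: minimal RMSE over k, separately for each estimator.
rmse = {p: np.sqrt(np.mean(e ** 2, axis=0)).min() for p, e in errs.items()}
reff = rmse[0.0] / rmse[p9]                 # REFF_{W|H} of (5), with W = H_{p9}
print(f"REFF = {reff:.2f}")
```

A REFF value above 1 means W beats the Hill estimator at their respective optimal levels, in line with the table above.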

FOUNDATIONS FOR THE DOUBLE-BOOTSTRAP ALGORITHM AND APPLICATIONS

For the new class of MOP EVI-estimators H_p(k), in (4), valid for p < 1/γ, k_{0|p}(n) := arg min_k MSE(H_p(k)) = k_{A|p}(n)(1 + o(1)), with k_{A|p}(n) := arg min_k AMSE(H_p(k)), and AMSE standing for asymptotic mean square error. The bootstrap methodology can thus enable us to consistently estimate the optimal sample fraction (OSF), k_{0|p}(n)/n, on the basis of a consistent estimator of k_{A|p}(n), in a way similar to the one used in Gomes and Oliveira (2001), among others, for the classical adaptive Hill EVI-estimation, performed through H(k) ≡ H_0(k), in (3), and in Brilhante et al. (2012) and Gomes et al. (2012), among others, for MOP and second-order reduced-bias estimation, respectively. With the notation [x] for the integer part of x, we use again the auxiliary statistics

  T_{k,n} ≡ T(k|H_p) ≡ T_{k,n|p} := H_p([k/2]) − H_p(k),  k = 2, ..., n − 1,

which converge in probability to zero, for any intermediate k, and have an asymptotic behaviour strongly related to that of H_p(k). Denoting k_{0|T}(n) := arg min_k MSE(T_{k,n}), we have

  k_{0|p}(n) = k_{0|T}(n) (1 − 2^ρ)^{2/(1−2ρ)} (1 + o(1)).   (6)

Given the random sample X_n = (X_1, ..., X_n) from any unknown model F, and the functional T_{k,n} =: φ_k(X_n), 1 < k < n, let us consider, for any n_1 = O(n^{1−ε}), 0 < ε < 1, the bootstrap sample X*_{n_1} = (X*_1, ..., X*_{n_1}), generated from the model F*_n(x) = (1/n) Σ_{i=1}^{n} I_{[X_i ≤ x]}, the empirical d.f. associated with the available sample X_n. Next, associate with the bootstrap sample the corresponding bootstrap auxiliary statistic, T*_{k_1,n_1} := φ_{k_1}(X*_{n_1}), 1 < k_1 < n_1. Then, on the basis of B bootstrap replicates t*_{k_1,n_1,l}, 1 ≤ l ≤ B, we have the MSE-estimate

  MSE*(n_1, k_1) = (1/B) Σ_{l=1}^{B} (t*_{k_1,n_1,l})²,  k_1 = 2, ..., n_1 − 1.

With k*_{0|T}(n_1) := arg min_{k_1} MSE*(n_1, k_1), we have k*_{0|T}(n_1)/k_{0|T}(n) = (n_1/n)^{−2ρ/(1−2ρ)} (1 + o(1)). Consequently, for another sample size, n_2 = [n_1²/n] + 1,

  (k*_{0|T}(n_1))² / k*_{0|T}(n_2) = k_{0|T}(n)(1 + o(1)), as n → ∞.   (7)

On the basis of (7), we are now able to first consistently estimate k_{0|T}, and next k_{0|p}, on the basis of (6) and any estimate ρ̂ of the second-order parameter ρ. With k̂*_{0|T} denoting the sample counterpart of k*_{0|T}, ρ̂ an adequate ρ-estimate, and c_ρ := (1 − 2^ρ)^{2/(1−2ρ)}, we thus have the k_0-estimate

  k̂*_{0|p} ≡ k̂*_{0|p}(n; n_1) := min( n − 1, [ c_{ρ̂} (k̂*_{0|T}(n_1))² / k̂*_{0|T}([n_1²/n] + 1) ] ).

The adaptive estimate of γ is then given by H*_p ≡ H*_{p,n,n_1|T} := H_p(k̂*_{0|p}(n; n_1)). The MSE*_p of H*_p can be obtained through a non-parametric bootstrap estimate, leading us to the p-choice p_0 := arg inf_p MSE*_p, and to the k-choice k̂*_{0|p_0}(n; n_1). Applications to simulated Student samples and to data sets in Gomes and Neves (2011) show the interesting performance of this fully non-parametric double-bootstrap algorithm.
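The steps above, for a fixed p, can be sketched as follows. This is our own minimal reading of the algorithm: the choices n_1 = [n^0.955], B = 250, and an externally supplied ρ-estimate ρ̂ are assumptions (the paper leaves these to the cited references), and the p-selection step is omitted.

```python
import numpy as np

def hp_all(x, p):
    """H_p(k) of (4) for all k = 1..n-1 at once (entry k-1 of the result)."""
    y = np.sort(np.asarray(x, dtype=float))[::-1]   # descending order statistics
    k = np.arange(1, y.size)
    if p == 0.0:
        return np.cumsum(np.log(y))[:-1] / k - np.log(y[1:])
    m = np.cumsum(y ** p)[:-1] / (k * y[1:] ** p)   # A_p^p(k) = mean of U_ik^p
    return (1.0 - 1.0 / m) / p

def k_star(x, n1, p, B, rng):
    """Bootstrap minimizer k*_{0|T}(n1) of MSE*(n1, k1), T_k = H_p([k/2]) - H_p(k)."""
    mse = np.zeros(n1 - 2)
    for _ in range(B):
        H = hp_all(rng.choice(x, size=n1, replace=True), p)
        kk = np.arange(2, n1)
        mse += (H[kk // 2 - 1] - H[kk - 1]) ** 2
    return int(np.argmin(mse)) + 2

def adaptive_mop(x, p, rho_hat=-1.0, B=250, seed=0):
    """Adaptive MOP EVI-estimate via (6)-(7); rho_hat is an external rho-estimate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    n1 = int(n ** 0.955)                            # n1 = O(n^(1-eps)), our choice
    n2 = n1 * n1 // n + 1
    k1, k2 = k_star(x, n1, p, B, rng), k_star(x, n2, p, B, rng)
    c = (1.0 - 2.0 ** rho_hat) ** (2.0 / (1.0 - 2.0 * rho_hat))
    k0 = min(n - 1, int(c * k1 * k1 / k2) + 1)
    return hp_all(x, p)[k0 - 1], k0
```

On a strict Pareto sample with γ = 0.5, for instance, `adaptive_mop(x, p=1.0)` returns an estimate near 0.5 together with the data-driven level k̂*_{0|p}.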

ACKNOWLEDGMENTS Research partially supported by National Funds through FCT — Fundação para a Ciência e a Tecnologia, project PEst-OE/MAT/UI0006/2011 and project EXTREMA, PTDC/FEDER.

REFERENCES

1. N. Bingham, C.M. Goldie, and J.L. Teugels (1987). Regular Variation. Cambridge Univ. Press, Cambridge.
2. M.F. Brilhante, M.I. Gomes, and D.D. Pestana (2012). A simple generalization of the Hill estimator. Notas e Comunicações CEAUL 03/2012.
3. F. Caeiro, M.I. Gomes, and D.D. Pestana (2005). Direct reduction of bias of the classical Hill estimator. Revstat 3:2, 111–136.
4. A. Dekkers, J. Einmahl, and L. de Haan (1989). A moment estimator for the index of an extreme-value distribution. Annals of Statistics 17, 1833–1855.
5. M.I. Fraga Alves, M.I. Gomes, L. de Haan, and C. Neves (2009). The mixed moment estimator and location invariant alternatives. Extremes 12, 149–185.
6. J. Geluk and L. de Haan (1987). Regular Variation, Extensions and Tauberian Theorems. CWI Tract 40, Center for Mathematics and Computer Science, Amsterdam, Netherlands.
7. B.V. Gnedenko (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Annals of Mathematics 44:6, 423–453.
8. M.I. Gomes and M.M. Neves (2011). Estimation of the extreme value index for randomly censored data. Biometrical Letters 48:1, 1–22.
9. M.I. Gomes and O. Oliveira (2001). The bootstrap methodology in Statistics of Extremes: choice of the optimal sample fraction. Extremes 4:4, 331–358.
10. M.I. Gomes, F. Figueiredo, and M.M. Neves (2012). Adaptive estimation of heavy right tails: the bootstrap methodology in action. Extremes, in press. DOI: 10.1007/s10687-011-0146-6.
11. L. de Haan (1984). Slow variation and characterization of domains of attraction. In J. Tiago de Oliveira, ed., Statistical Extremes and Applications. D. Reidel, Dordrecht, 31–48.
12. B.M. Hill (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics 3, 1163–1174.

