Estimation of the extreme value index and high quantiles under random censoring

Jan Beirlant (1), Emmanuel Delafosse (2) and Armelle Guillou (2)

(1) Katholieke Universiteit Leuven, Department of Mathematics, Celestijnenlaan 200B, 3001 Leuven, Belgium

(2) Université Paris VI, L.S.T.A., Boîte 158, 175 rue du Chevaleret, 75013 Paris

Key words and phrases: Pareto index, extreme quantile, censoring, Kaplan-Meier estimator.

Abstract. In this paper, we consider the estimation of the extreme value index and of extreme quantiles in the presence of censoring. Since our main motivation is application in insurance, we focus on the Fréchet and Gumbel domains of attraction. In the absence of censoring, the best-known estimator of the Pareto index is the classical Hill (1975) estimator. Some adaptations of this estimator to the censored case are proposed and used to build extreme quantile estimators. A theoretical study of the asymptotic properties of such estimators is started. The finite sample behaviour is illustrated in a small simulation study and also in a practical insurance example.

Résumé. In this article, we consider the problem of estimating an extreme value index and extreme quantiles in the presence of random censoring. Given that our main motivation concerns applications in insurance, we concentrate on the Fréchet and Gumbel domains of attraction. In the uncensored case, the best-known estimator of the index is the Hill (1975) estimator. We propose adaptations of this index estimator to the censored case, which we then use to estimate an extreme quantile. A theoretical study of the asymptotic properties of these new estimators is proposed. In addition, their behaviour is illustrated on simulations and on a real data example.

Mots-clés: Pareto index, extreme quantile, censored data, Kaplan-Meier estimator.

1. Introduction.

A data set is called censored when some observations are measured exactly only within a restricted range of values and are otherwise not measured.
Statistical techniques for analyzing censored data sets are well studied, especially in survival analysis and biostatistics in general, where censoring mechanisms are common. Right censoring, where some results are only known to be at least as large as the reported value, has received particular attention; here we can for instance refer to Cox and Oakes (1984). This work, however, concerns central characteristics of the underlying distribution. The literature on tail or extreme value analysis for censored data is almost nonexistent. In Reiss and Thomas (1997, section 6.1), Beirlant et al. (1996, section 2.7) and, in the case of truncated data, Beirlant and Guillou (2001), some estimators of tail indices were proposed without any deeper study of their behaviour. Important problems such as the estimation of extreme quantiles have apparently not been considered before in general.

Data sets with censored extreme data often occur in insurance when reported payments cannot be larger than the maximum payment value of the contract. When the reported payment equals the maximum payment, the real payment can indeed be equal to the maximum or can be censored. The situation where all data above a fixed value are censored is referred to as truncation or type I censoring. This case was considered in Beirlant and Guillou (2001). It can occur when the observations are not the real payments but the payments as a fraction of the sum insured, in which case the truncation level equals 100%.

Here we consider random right censoring. The claim sizes $X$ are possibly censored by the maximum payment $Y$. The maximum payment of a given contract is then considered as a realization of the random variable $Y$. Different situations can now occur, depending on whether the censoring values (or maximum payment values) are observed or not.

To be more specific, let $X_i$, $i \in \mathbb{N}$, be independent and identically distributed (i.i.d.) random variables with common distribution function (df) $F$ and let $Y_i$, $i \in \mathbb{N}$, be a second i.i.d. sequence with df $G$. We only observe
$$Z_i = X_i \wedge Y_i, \qquad \delta_i = \mathbb{1}_{\{X_i \le Y_i\}}, \qquad i \in \mathbb{N}.$$
We denote by $H$ the df of $Z_1$ and let $\tau_H = \inf\{x : H(x) = 1\}$, the supremum of the support of $H$. We define $H^1(z) = \mathbb{P}(Z > z, \delta = 1) = \mathbb{P}(z < X \le Y)$.
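The observation scheme above can be sketched in a few lines of Python. This is only an illustration: the choice of strict Pareto laws for both $X$ and $Y$, the parameter values and the function name `simulate_censored_sample` are assumptions made here, not part of the paper.

```python
import random


def simulate_censored_sample(n, alpha, beta, seed=0):
    """Draw n pairs (Z_i, delta_i) with Z_i = min(X_i, Y_i), delta_i = 1{X_i <= Y_i}.

    Assumption for illustration: X and Y are independent strict Pareto
    variables on [1, inf) with P(X > x) = x**(-alpha), P(Y > x) = x**(-beta).
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.paretovariate(alpha)  # claim size
        y = rng.paretovariate(beta)   # maximum payment (censoring variable)
        sample.append((min(x, y), 1 if x <= y else 0))
    return sample


sample = simulate_censored_sample(1000, alpha=2.0, beta=1.0)
# For these two strict Pareto laws, P(X <= Y) = alpha / (alpha + beta) = 2/3.
prop_uncensored = sum(d for _, d in sample) / len(sample)
```

Only the pairs in `sample` would be available to the statistician; the underlying `x` and `y` are discarded on purpose.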
Being motivated by actuarial applications, we confine ourselves to the case where sample maxima from X samples are in the domain of attraction of the Fréchet or Gumbel law. This typically means that we consider polynomially decreasing tails or exponentially decreasing tails with infinite right endpoint. We will consequently consider the following cases:
• Observing (Z, δ), X independent of Y, and both X and Y are in the domain of attraction of the Fréchet law;
• Observing (Z, δ), X independent of Y, X is in the domain of attraction of the Fréchet or the Gumbel law, and Y in the domain of attraction of the Fréchet law.
In order to illustrate the methods presented in this paper, we use a liability insurance example from Frees and Valdez (1998).

2. Estimation techniques.

2.1. Observing (Z, δ), X independent of Y, and both X and Y are in the domain of attraction of the Fréchet law


Suppose that $F$ is of Pareto type, that is, there exists a positive constant $\alpha$ for which
$$1 - F(x) = x^{-\alpha}\,\ell_1(x), \tag{1}$$
where $\ell_1$ is a slowly varying function at infinity, satisfying
$$\frac{\ell_1(\lambda x)}{\ell_1(x)} \to 1 \quad \text{when } x \to \infty, \text{ for all } \lambda > 0.$$
In order for the censoring to be not too heavy, it appears natural to assume that the censoring distribution is also heavy tailed,
$$1 - G(x) = x^{-\beta}\,\ell_2(x), \tag{2}$$
for some $\beta > 0$ and slowly varying $\ell_2$. Assuming that $X$ and $Y$ are independent, so that $1 - H(x) = (1 - F(x))(1 - G(x))$, it now follows that
$$1 - H(x) = x^{-(\alpha+\beta)}\,\tilde{\ell}(x), \tag{3}$$
with $\tilde{\ell}$ also a slowly varying function at infinity. These conditions can be restated in terms of the tail quantile functions as
$$U_F(x) = x^{1/\alpha}\,\ell_{1,U}(x), \qquad U_G(x) = x^{1/\beta}\,\ell_{2,U}(x), \qquad U_H(x) = x^{1/(\alpha+\beta)}\,\tilde{\ell}_U(x),$$
with $U_F(x) = \inf\{y : F(y) \ge 1 - 1/x\}$, $x > 1$, and $\ell_{1,U}(x)$, $\ell_{2,U}(x)$ and $\tilde{\ell}_U(x)$ again slowly varying functions at infinity.

Our goal is to discuss the estimation problem of $\gamma_1 := \alpha^{-1}$ and of extreme quantiles $x_{F,p} := U_F(1/p)$ with $p < 1/n$. This problem has received a lot of attention in the case of no censoring, i.e. when $X_i \le Y_i$ for all $i = 1, ..., n$. The most famous estimator of $\gamma_1$ is Hill's (1975) estimator, given by

$$H_{X,k,n} = \frac{1}{k}\sum_{i=1}^{k} \log X_{n-i+1,n} - \log X_{n-k,n}. \tag{4}$$
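As an illustration of (4), the Hill estimator can be computed from the sorted sample as below. This is a minimal sketch in plain Python; the function name `hill_estimator` and the toy data are chosen here for illustration only.

```python
import math


def hill_estimator(data, k):
    """Hill estimator H_{X,k,n}: average log-excess of the k largest
    observations over the threshold X_{n-k,n}."""
    x = sorted(data)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_threshold = math.log(x[n - k - 1])  # X_{n-k,n} in 0-based indexing
    # x[n - i] is the order statistic X_{n-i+1,n} for i = 1, ..., k
    return sum(math.log(x[n - i]) - log_threshold for i in range(1, k + 1)) / k


# Toy data: the two largest points have log-excesses 3 and 1 over X_{2,4} = e,
# so the estimate equals 2 up to rounding.
data = [math.exp(v) for v in (0.0, 1.0, 2.0, 4.0)]
estimate = hill_estimator(data, k=2)
```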

Turning to the estimation of high quantiles, the estimator proposed by Weissman (1978) serves as a reference under Pareto-type models without censoring:
$$\hat{x}_{p,k} = X_{n-k,n}\left(\frac{k+1}{(n+1)p}\right)^{H_{X,k,n}}.$$
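The Weissman estimator extrapolates from the threshold $X_{n-k,n}$ using the Hill estimate as exponent. A minimal sketch, assuming uncensored data and using the hypothetical function name `weissman_quantile`:

```python
import math


def weissman_quantile(data, k, p):
    """Weissman (1978) estimator of the quantile with exceedance probability p,
    based on the k largest observations of an uncensored sample."""
    x = sorted(data)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = x[n - k - 1]  # X_{n-k,n}
    # Hill estimator H_{X,k,n} of the extreme value index
    hill = sum(math.log(x[n - i] / threshold) for i in range(1, k + 1)) / k
    return threshold * ((k + 1) / ((n + 1) * p)) ** hill


# Toy data with Hill estimate 2 for k = 2; here (k+1)/((n+1)p) = 3/0.3 = 10,
# so the estimated quantile is e * 10**2.
data = [math.exp(v) for v in (0.0, 1.0, 2.0, 4.0)]
q = weissman_quantile(data, k=2, p=0.06)
```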

In case of random right censoring, the likelihood based on $E_{j,t} = Z_j/t$, $Z_j > t$, is changed into
$$\prod_{j=1}^{N_t} \left(\alpha E_{j,t}^{-\alpha-1}\right)^{\delta_j}\left(E_{j,t}^{-\alpha}\right)^{1-\delta_j}, \tag{5}$$

leading to the estimator
$$H^{(c)}_{Z,t} = \frac{\sum_{i=1}^{n} \log(Z_i/t)\,\mathbb{1}_{\{Z_i > t\}}}{\sum_{i=1}^{n} \delta_i\,\mathbb{1}_{\{Z_i > t\}}}, \tag{6}$$

while for the extreme quantile estimator we propose to use
$$\hat{x}^{(c)}_{p,t} = t\left(\frac{1 - \hat{F}_n(t)}{p}\right)^{H^{(c)}_{Z,t}}, \tag{7}$$

where $\hat{F}_n(x)$, $-\infty < x < \tau_H$, denotes the Kaplan-Meier (1958) product limit estimator of $F(x)$, defined as
$$1 - \hat{F}_n(x) = \prod_{j=1}^{n}\left(1 - \frac{\delta_{j,n}}{n-j+1}\right)^{\mathbb{1}_{\{Z_{j,n} \le x\}}},$$
where $Z_{j,n}$ denote the order statistics associated to $Z_1, ..., Z_n$ and $\delta_{j,n} := \delta_k$ if and only if $Z_{j,n} = Z_k$. The corresponding tail probability estimator is now of course given by
$$\hat{\mathbb{P}}^{(c)}(X > x) = \left(1 - \hat{F}_n(t)\right)\left(\frac{x}{t}\right)^{-1/H^{(c)}_{Z,t}}. \tag{8}$$
When choosing $t = Z_{n-k,n}$, we obtain the estimator
$$H^{(c)}_{Z,k,n} = \frac{\sum_{j=1}^{k}\left(\log Z_{n-j+1,n} - \log Z_{n-k,n}\right)}{\sum_{j=1}^{k} \delta_{n-j+1,n}}, \tag{9}$$

which is the original Hill estimator adapted for right censoring. We will also give another interpretation of this estimator, based on a novel QQ-plot.

2.2. Observing (Z, δ), X independent of Y, X in the domain of attraction of the Fréchet or Gumbel law, and Y in the domain of attraction of the Fréchet law

When considering the extension to the case where $\gamma_1 \ge 0$, again as in the no-censoring case there are mainly two sets of solutions, which originate from two different formulations of the model. First, the maximum likelihood approach based on POT's (Peaks over Threshold) relies on the results given by Balkema and de Haan (1974) and Pickands (1975), stating that the limit distribution of the absolute exceedances over a threshold $t$ when $t \to \infty$ is given by a generalized Pareto distribution (GPD). In the case of censoring, we can easily adapt the likelihood to
$$\prod_{j=1}^{k}\left[f_{GPD}(\tilde{E}_j)\right]^{\delta_j}\left[1 - F_{GPD}(\tilde{E}_j)\right]^{1-\delta_j}$$
where $\tilde{E}_j = Z_j - t$ if $Z_j > t$ and
$$1 - F_{GPD}(x) = \left(1 + \frac{\gamma_1}{\sigma}x\right)^{-1/\gamma_1}.$$

Then, the maximization of this expression leads to a POT estimator for $\gamma_1$ which we further denote by $\hat{\gamma}^{(c)}_{t,ML}$.

Secondly, we can construct a new estimator based on $k$ upper order statistics, for instance within the framework of the QQ-plot regression technique. For example, in the case of no censoring, Beirlant et al. (1996) proposed an estimator of a real-valued index based on a generalized quantile plot, which takes over the role of the Pareto quantile plot in this more general setting. More precisely they proposed to look at the graph with coordinates
$$\left(\log\frac{n+1}{j},\ \log UH_{j,n}\right), \qquad j = 1, ..., n-1,$$
with $UH_{j,n} = X_{n-j,n}\,H_{X,j,n}$. Again this plot becomes ultimately linear for small $j$ with slope approximating $\gamma_1$. Then, one can construct several regression based estimators, such as
$$\hat{\gamma}_{k,UH} = \frac{1}{k}\sum_{j=1}^{k} \log UH_{j,n} - \log UH_{k+1,n}.$$
From the above it appears natural to define a generalization of $\hat{\gamma}_{k,UH}$ to the censoring case as a slope estimator of the generalized quantile plot adapted for censoring







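$$\left(-\log\left(1 - \hat{F}_n(Z_{n-j+1,n})\right),\ \log UH^{(c)}_{j,n}\right), \tag{10}$$
$(j = 1, ..., n-1)$ where $UH^{(c)}_{j,n} = Z_{n-j,n}\,H^{(c)}_{Z,j,n}$:
$$\hat{\gamma}^{(c)}_{k,UH} = \frac{\frac{1}{k}\sum_{j=1}^{k} \log UH^{(c)}_{j,n} - \log UH^{(c)}_{k+1,n}}{\frac{1}{k}\sum_{j=1}^{k} \delta_{n-j+1,n}}. \tag{11}$$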
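The estimator (11) can be sketched as follows. The code assumes that there are no ties among the $Z_i$ and that the largest observation is uncensored, so that every $H^{(c)}_{Z,j,n}$ has a nonzero denominator; the function names are hypothetical and only serve the illustration.

```python
import math


def censored_hill(pairs, j):
    """H^{(c)}_{Z,j,n} for pairs (Z, delta) sorted in increasing order of Z."""
    n = len(pairs)
    log_threshold = math.log(pairs[n - j - 1][0])  # log Z_{n-j,n}
    numerator = sum(math.log(pairs[n - i][0]) - log_threshold
                    for i in range(1, j + 1))
    uncensored = sum(pairs[n - i][1] for i in range(1, j + 1))
    if uncensored == 0:
        raise ValueError("the top j observations are all censored")
    return numerator / uncensored


def censored_uh_estimator(z, delta, k):
    """Slope estimator (11) built on UH^{(c)}_{j,n} = Z_{n-j,n} * H^{(c)}_{Z,j,n}."""
    pairs = sorted(zip(z, delta))
    n = len(pairs)
    if not 1 <= k <= n - 2:
        raise ValueError("k must satisfy 1 <= k <= n - 2")
    # log UH^{(c)}_{j,n} for j = 1, ..., k + 1 (list index j - 1)
    log_uh = [math.log(pairs[n - j - 1][0] * censored_hill(pairs, j))
              for j in range(1, k + 2)]
    numerator = sum(log_uh[j] - log_uh[k] for j in range(k)) / k
    denominator = sum(pairs[n - j][1] for j in range(1, k + 1)) / k
    return numerator / denominator


z = [math.exp(v) for v in (0.0, 1.0, 2.0, 4.0)]
gamma_uncensored = censored_uh_estimator(z, [1, 1, 1, 1], k=1)
gamma_censored = censored_uh_estimator(z, [1, 1, 0, 1], k=1)
```

When all $\delta_i$ equal 1, both denominators reduce to 1 and the function returns the uncensored estimator $\hat{\gamma}_{k,UH}$.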

Using one of the abovementioned estimators $\hat{\gamma}^{(c)}_{.,.}$ of $\gamma_1 \ge 0$ we can now propose new estimators for the quantile $x_{F,p}$, in the spirit of the one proposed by Dekkers et al. (1989) in the case of no-censoring:
$$\hat{x}^{(c)}_{p,t,.} = t + \hat{\gamma}^{(c)}_{.,.}\,t\,\frac{\left(\dfrac{1 - \hat{F}_n(t)}{p}\right)^{\hat{\gamma}^{(c)}_{.,.}} - 1}{\hat{\gamma}^{(c)}_{.,.}}. \tag{12}$$
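One of the index estimators that can be plugged into this quantile estimator is the POT estimator of section 2.2, defined as the maximizer of the censored GPD likelihood. The sketch below only evaluates the corresponding log-likelihood. The density used for uncensored exceedances is obtained by differentiating the GPD survival function, the case $\gamma_1 = 0$ is treated as the exponential limit, and the function name is hypothetical. The maximization itself is left to a numerical optimizer of the reader's choice.

```python
import math


def censored_gpd_loglik(z, delta, t, gamma, sigma):
    """Log-likelihood of the exceedances Z_j - t (Z_j > t) under a GPD with
    index gamma >= 0 and scale sigma > 0, adapted for right censoring:
    uncensored points contribute the log-density, censored points the
    log-survival function."""
    if sigma <= 0 or gamma < 0:
        raise ValueError("need sigma > 0 and gamma >= 0")
    loglik = 0.0
    for zj, dj in zip(z, delta):
        if zj <= t:
            continue  # only exceedances over the threshold enter the likelihood
        excess = zj - t
        if gamma == 0:
            # exponential limit of the GPD
            log_surv = -excess / sigma
            log_dens = -math.log(sigma) - excess / sigma
        else:
            log_term = math.log1p(gamma * excess / sigma)
            log_surv = -log_term / gamma
            log_dens = -math.log(sigma) - (1.0 / gamma + 1.0) * log_term
        loglik += dj * log_dens + (1 - dj) * log_surv
    return loglik


ll_uncensored = censored_gpd_loglik([2.0, 0.5], [1, 1], t=1.0, gamma=1.0, sigma=1.0)
ll_censored = censored_gpd_loglik([2.0], [0], t=1.0, gamma=1.0, sigma=1.0)
```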

Under suitable assumptions, we establish the asymptotic properties of our estimators. We illustrate their behaviour in a small simulation study, but also in a practical insurance example.
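To make the Fréchet-case estimators of section 2.1 concrete, the following sketch combines the Kaplan-Meier estimator, the censored Hill estimator (9) and the quantile estimator (7) with $t = Z_{n-k,n}$. It assumes no ties among the $Z_i$; the function names and the toy data are illustrative and not taken from the paper.

```python
import math


def km_survival(pairs, x):
    """Kaplan-Meier estimate of 1 - F(x); pairs (Z, delta) sorted by Z."""
    n = len(pairs)
    surv = 1.0
    for j, (zj, dj) in enumerate(pairs, start=1):
        if zj <= x:
            surv *= 1.0 - dj / (n - j + 1)
    return surv


def censored_hill(pairs, k):
    """Censored Hill estimator H^{(c)}_{Z,k,n} of equation (9)."""
    n = len(pairs)
    log_threshold = math.log(pairs[n - k - 1][0])  # log Z_{n-k,n}
    numerator = sum(math.log(pairs[n - j][0]) - log_threshold
                    for j in range(1, k + 1))
    uncensored = sum(pairs[n - j][1] for j in range(1, k + 1))
    if uncensored == 0:
        raise ValueError("the top k observations are all censored")
    return numerator / uncensored


def censored_quantile(z, delta, k, p):
    """Quantile estimator (7) with threshold t = Z_{n-k,n}."""
    pairs = sorted(zip(z, delta))
    n = len(pairs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    t = pairs[n - k - 1][0]
    return t * (km_survival(pairs, t) / p) ** censored_hill(pairs, k)


# Toy data: the third smallest observation is censored.
z = [math.exp(v) for v in (0.0, 1.0, 2.0, 4.0)]
delta = [1, 1, 0, 1]
pairs = sorted(zip(z, delta))
h = censored_hill(pairs, 2)        # (3 + 1) / 1 = 4
s = km_survival(pairs, pairs[1][0])  # (1 - 1/4) * (1 - 1/3) = 1/2
q = censored_quantile(z, delta, k=2, p=0.05)
```

In the toy example only one of the two largest observations is uncensored, so the censored Hill estimate is twice the value the uncensored formula (4) would give on the same $Z$ values.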


Bibliography

[1] Balkema, A. and de Haan, L. (1974). Residual life time at great age, Ann. Probab., 2, 792-804.
[2] Beirlant, J. and Guillou, A. (2001). Pareto index estimation under moderate right censoring, Scand. Actuarial J., 2, 111-125.
[3] Beirlant, J., Teugels, J.L. and Vynckier, P. (1996). Practical Analysis of Extreme Values, Leuven University Press, Leuven.
[4] Beirlant, J., Vynckier, P. and Teugels, J.L. (1996). Excess functions and estimation of the extreme value index, Bernoulli, 2, 293-318.
[5] Cox, D.R. and Oakes, D. (1984). Analysis of Survival Data, Chapman and Hall, New York.
[6] Dekkers, A.L.M., Einmahl, J.H.J. and de Haan, L. (1989). A moment estimator for the index of an extreme-value distribution, Ann. Statist., 17, 1833-1855.
[7] Frees, E. and Valdez, E. (1998). Understanding relationships using copulas, North American Actuarial Journal, 2, 1-15.
[8] Hill, B.M. (1975). A simple general approach to inference about the tail of a distribution, Ann. Statist., 3, 1163-1174.
[9] Kaplan, E.L. and Meier, P. (1958). Non-parametric estimation from incomplete observations, J. Amer. Statist. Assoc., 53, 457-481.
[10] Pickands III, J. (1975). Statistical inference using extreme order statistics, Ann. Statist., 3, 119-131.
[11] Reiss, R.D. and Thomas, M. (1997). Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields, Birkhäuser Verlag, Basel.
[12] Weissman, I. (1978). Estimation of parameters and large quantiles based on the k largest observations, J. Amer. Statist. Assoc., 73, 812-815.
