expected shortfall

19 downloads 0 Views 649KB Size Report
Dec 18, 2017 - Value-at-risk (VaR) and Expected Shortfall. (ES) are popular measures of market risk associated with an asset or a portfolio of assets. VaR is an ...
Communications in Statistics - Simulation and Computation

ISSN: 0361-0918 (Print) 1532-4141 (Online) Journal homepage: http://www.tandfonline.com/loi/lssp20

Nonparametric estimation of 100(1 − p)% expected shortfall: p → 0 as sample size is increased Santanu Dutta & Suparna Biswas To cite this article: Santanu Dutta & Suparna Biswas (2016): Nonparametric estimation of 100(1 − p)% expected shortfall: p → 0 as sample size is increased, Communications in Statistics Simulation and Computation, DOI: 10.1080/03610918.2016.1152370 To link to this article: https://doi.org/10.1080/03610918.2016.1152370

Accepted author version posted online: 10 Aug 2016. Published online: 18 Dec 2017. Submit your article to this journal

Article views: 48

View related articles

View Crossmark data

Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=lssp20

®

COMMUNICATIONS IN STATISTICS—SIMULATION AND COMPUTATION , VOL. , NO. , – https://doi.org/./..

Nonparametric estimation of ( − p)% expected shortfall: p →  as sample size is increased Santanu Dutta and Suparna Biswas Mathematical Science Department, Tezpur University, Napaam, Tezpur, Assam, India

ABSTRACT

ARTICLE HISTORY

Expected shortfall (ES) is a well-known measure of extreme loss associated with a risky asset or portfolio. For any 0 < p < 1, the 100(1 − p) percent ES is defined as the mean of the conditional loss distribution, given the event that the loss exceeds (1 − p)th quantile of the marginal loss distribution. Estimation of ES based on asset return data is an important problem in finance. Several nonparametric estimators of the expected shortfall are available in the literature. Using Monte Carlo simulations, we compare the accuracy of these estimators under the condition that p → 0 as n → ∞ for several asset return time series models, where n is the sample size. Not much seems to be known regarding the properties of the ES estimators under this condition. For p close to zero, the ES measures an extreme loss in the right tail of the loss distribution of the asset or portfolio. Our simulations and real-data analysis provide insight into the effect of varying p with n on the performance of nonparametric ES estimators.

Received  November  Accepted  January  KEYWORDS

Expected shortfall; Nonparametric estimation MATHEMATICS SUBJECT CLASSIFICATION

G

1. Introduction Risk measures have become important tools in finance and actuarial science. A risk measure is a mapping which assigns real numbers to the possible outcomes of a random financial quantity, such as an insurance claim or market risk of a portfolio (see Brazauskas et al., 2008). Market risk of a portfolio refers to the loss incurred due to random fluctuations in the value of risky assets in the portfolio over a period of time. Value-at-risk (VaR) and Expected Shortfall (ES) are popular measures of market risk associated with an asset or a portfolio of assets. VaR is an extreme quantile of the marginal loss distribution. Its use was recommended by the Basel Committee on Banking Supervision in 1996. However, there are several demerits of the VaR. For instance, Artzner et al. (1997, 1999) have shown that VaR does not provide any information about the size of the potential loss when it exceeds the VaR level. More importantly, VaR is not able to distinguish portfolios which bear different levels of risk (see Acerbi et al., 2001; Scaillet 2003). Also VaR is not a coherent risk measure, as it is not sub-additive. Yamai and Yoshiba (2002) stated two more disadvantages of the VaR. To address these issues, the ES was introduced. It is the mean of the conditional loss distribution, given the event that the loss exceeds the VaR. See Artzner et al. (1997), Artzner et al. (1999), Acerbi et al. (2001), Acerbi and Tasche (2002), etc. for detailed discussion on various properties of the ES. It is CONTACT Santanu Dutta , Tezpur, Assam, India. ©  Taylor & Francis Group, LLC

tezpur@gmail.com

Mathematical Science Department, Tezpur University, Napaam

2

S. DUTTA AND S. BISWAS

closely linked to VaR, and is regarded as a good supplement to the VaR (see Acerbi et al., 2001; Acerbi and Tasche, 2002). n Let {Yt }t=0 be the price or market value of a portfolio over n consecutive periods of a time unit (say daily closing values of an index or stock). Then, Xt = − log(Yt /Yt−1 ), t = 1, . . . , n, are the negative log returns or losses of the portfolio over these n consecutive time units (say days). There are some well-known stylized facts exhibited by {Xt }t=1,...,n . For instance, (linear) autocorrelations of asset returns are often insignificant, but squared returns exhibit autocorrelation, distribution of Xt seems to display a power-law or Pareto-like tail, the returns exhibit volatility clustering, the conditional distribution of Xt (after correcting for volatility) is also heavy tailed (see Cont (2001) for a detailed discussion). No one particular parametric model is known to capture all or most of these stylized properties of {Xt }t=1,...,n . So we assume that {Xt }t=1,2,... is a stationary α-mixing process with marginal distribution function (df) F and marginal density f . A wide variety of common econometric models satisfy these assumptions (see Chen and Tang 2005). The estimators obtained under such general assumptions are nonparametric in nature, i.e., they are not dependent on the exact specification of a model for the stochastic process {Xt }t=1,2,... . The advantage of the nonparametric approach is that it does not require exact specification of the data generating process and hence it is robust against mis-specification of the marginal distribution (see Chen 2008). In this paper, we review some of the nonparametric estimators of expected shortfall and compare their finite sample performance with that of the empirical estimator. The 100(1 − p) percent VaR, denoted by VaR p , is the (1 − p)th quantile of the marginal distribution (see Drees 2003) and the 100(1 − p) percent ES is the mean of the conditional loss distribution, given that loss exceeds VaR p . The asymptotic properties of such ES estimators are obtained under the assumption that p is fixed as n → ∞. Not much seems to be known about the behavior of the ES estimators under the condition p → 0 as n → ∞. This condition, viz., p → 0 as n → ∞, implies that even for large sample size only a small proportion of the sample values is likely to be above the VaR p level. Hence under this condition the estimation of ES seems to be a challenging problem even for large sample size, without any extra model assumptions. In this article, we compare the performance of nonparametric estimators of ES for varying p, via Monte Carlo simulations. The article is divided into five sections. Section 2 contains some important definitions and assumptions. In Section 3, we describe six nonparametric ES estimators and review their known asymptotic properties. Using Monte Carlo simulations, we compare the mean squared errors (MSE) of some of the well-known ES estimators for varying p and fixed n. The comparisons are repeated for different values of n and ten different stationary time series models. The findings are reported in Section 4. No estimator can be claimed uniformly the best. Although kernel smoothing is popular in several nonparametric estimation problems, but in the context of ES estimation the kernel based estimator seems to perform quite poorly. But we are able to identify three estimators which perform very well whenever np > 1. In Section 5, we estimate the ES of S & P Nifty index, which is an important stock market index in India, based on the daily return data for the period 1 April 2012 to 31 March 2015.

2. Definitions For 0 < p < 1, the pth quantile of the distribution with df F is defined as Q p = inf{x : F (x) > 1 − p}.

®

COMMUNICATIONS IN STATISTICS—SIMULATION AND COMPUTATION

3

the 100(1 − p) percent VaR, denoted by VaR p , is defined as the pth quantile of the marginal distribution of Xt (see Drees 2003), i.e., VaR p = Q p . The conditional loss distribution, given that the loss exceeds the VaR, is called the shortfall distribution. Its df  p is defined as follows:  p (x) = P{Xt ≤ x|Xt > VaR p }. The mean of this distribution is called the expected shortfall and is denoted by ES p . Under the assumption that E|X1 | < ∞, the 100(1 − p) expected shortfall is defined as  ES p = E{Xt |Xt > VaR p } =

1 xd p (x) = p



1 xdF (x) = p x>VaR p



1

Q(u)du. 1−p

Let γ (k) = Cov{(X1 − Q p )I(X1 ≥ Q p ), (Xk+1 − Q p )I(Xk+1 ≥ Q p )} for any positive integer k and σ02 (1 − p, n) = {Var{(X1 − Q p )I(X1 ≥ Q p )} +

n−1 

γ (k)}.

k=1

For any event A, IA (w) equals 1 for w ∈ A and equals 0 otherwise. For any random variable X, let X + = max{X, 0}.

3. Nonparametric methods for estimating expected shortfall In this section, we review some known nonparametric estimators of expected shortfall. In the following, Fkl denote the σ -algebra of events generated by {Xt , k ≤ t ≤ l} for l > k. The α-mixing coefficient introduced by Rosenblatt (1956) is defined as α(k) =

sup

∞ A∈F1i ,B∈Fi+k

|P(AB) − P(A)P(B)|.

The series {Xt }t∈N is said to be α-mixing if limk→∞ α(k) = 0. In the following, we assume that 1. Assumption 1. ∃ρ ∈ (0, 1) such that α(k) ≤ Cρ k for all k ≥ 1 and a positive constant C. 2. Assumption 2. The distribution function F of Xt is absolutely continuous with probability density f which has continuous second derivatives in B(Q p ), a neighborhood of Q p . Fk , the joint distribution function of (X1 , Xk+1 ), have all its second derivatives bounded in B(Q p ) for k ≥ 1 and E(|Xt |2+δ ) ≤ C for some δ > 0 and a positive C. 3.1. Empirical estimator Let Fˆ denote the empirical distribution of the observed losses X1 , X2 , . . . , Xn , i.e., Fˆ (x) =

1 I(Xi ≤ x), n i=1 n

4

S. DUTTA AND S. BISWAS

where I(·) is the indicator function and Xi is i.i.d with distribution F. By standard results on empirical distribution (see Van Der Vaart, 1998), the pth quantile can be estimated by   i−1 i −1 ˆ , F (1 − p) = X(i) , 1 − p ∈ , n n where X(1) ≤ X(2) ≤ · · · ≤ X(n) are the order statistics. The empirical estimator of expected shortfall is defined as n i=[n(1−p)]+1 X(i) , Emp p = n − [n(1 − p)] where [x] denotes the largest integer not greater than x. The empirical estimator can be rewritten as n Xt I(Xt ≥ qˆ p ) , Emp p = t=1 [np] + 1 where qˆ p = X([n(1−p)]+1) . Under Assumptions 1 and 2, Chen (2008) obtained the following asymptotic property of the empirical estimator. Chen (2008) showed that under Assumptions 1 and 2 and for arbitrary positive κ, as n → ∞  n  3

1 1 (Xt − Q p )I(Xt ≥ Q p ) − p(ES p − Q p ) + o p n− 4 +κ . Emp p − ES p = p n i=1 The above equation is a Bahadur type expansion which leads to the following theorem regarding the asymptotic normality of the empirical estimator Emp p . Theorem 3.1. (Chen (2008)) Under Assumptions 1 and 2, as n → ∞ √ d npσ0−1 (1 − p; n)(Emp p − ES p ) → N(0, 1). The above theorem indicates that the asymptotic standard deviation of Emp p equals n 1 which is the standard deviation of np i=1 (Xt − Q p )I(Xt ≥ Q p ). Chen (2008)

σ0 (1−p;n) √ , np

argued that the asymptotic variance of Emp p equals of {(Xt − Q p )I(Xt ≥ Q p )}t .

2πφ(0) , np2

where φ is the spectral density

3.2. Kernel-based estimator Scaillet (2004) proposed an estimator which employs kernel smoothing in both the initial VaR estimation and the final averaging of the excessive losses. The kernel estimator proposed by Scaillet (2004) is defined as follows. ∞ Let K be a kernel function, which is a symmetric probability density function, G(t ) = t K(u)du and Gh (t ) = G(t/h), where h is a positive smoothing bandwidth. The kernel estimator of the survival function S(x) = 1 − F (x) is 1 Gh (z − Xt ). n t=1 n

Sh (z) =

A kernel estimator of Q p , denoted as qˆ p,h , is the solution of Sh (z) = 1 − p. Then, the kernel estimator of ES is given as 1  Xt Gh (qˆ p,h − Xt ). np t=1 n

Chen p,h =

®

COMMUNICATIONS IN STATISTICS—SIMULATION AND COMPUTATION

5

In the kernel-based method the main problem lies with the selection of bandwidth. Azzalini (1981), Bowman et al. (1998), Scaillet (2004), and Chen and Tang (2005) provide some choice of the bandwidth parameter. Chen and Tang (2005) have obtained the asymptotic bias, variance, and the rate of almost sure convergence of their version of qˆ p , under the assumption that {Xt } is a stationary geometric α-mixing process. The authors suggested the following choice for the optimal value of h,

1/3 2 f 3 (Q p )bk hopt = n−1/3 , σk4 ( f (1) (Q p ))2 where bk = uw(u)H(u)du, and σk2 = u2 w(u)du. H(·) is the distribution function of the distribution with density w. h involves unknown constants Q p , f and its derivative f (1) at Q p . Chen and Tang (2005) suggested to approximate Q p in h by the corresponding sample quantile. The authors suggested to approximate f and f (1) by the density and the first derivative of the generalized Pareto distribution. Under 2 and assuming that K is 1 Assumptions 1 and 1 a symmetric probability density satisfying −1 uK(u)du = 0, −1 u2 K(u)du = σK2 and K has bounded and Lipschitz continuous derivative, Chen (2008) proved that as n → ∞. Theorem 3.2. (Chen 2008) √ d npσ0−1 (1 − p; n)(Chen p,h − ES p ) → N(0, 1) 1 2 2 and furthermore, Bias(Chen p,h ) = − 2p σK h f (Q p ) + o(h2 ) and

Var(Chen p,h ) =

1 2 σ (1 − p; n) + o(n−1 h), np 0

where h = o(1), nh3−β → ∞ for some β > 0, and nh4 log2 (n) = o(1) as n → ∞ (see Chen 2008). The term σ02 (1 − p; n) is the same as in Theorem 1. By comparing Theorems 1 and 2, it is observed that the kernel estimator has the same asymptotic distribution as the unsmoothed √ sample estimator Emp p . Both Emp p and Chen p,h converges to ES p at the rate of np. The second part of Theorem 2 implies that the kernel estimator does not offer a variance reduction, due to the presence of the second order term o(n−1 h) in Var(Chen p,h ). Also smoothing brings in a bias. Clearly the asymptotic MSE of the Chen p,h is greater than the same for the empirical estimator. Therefore, for the purpose of estimating the ES, the kernel smoothing does not seem to yield an asymptotically efficient estimator.

3.3. Brazauskas et al.’s estimator Let us recall that expected shortfall is defined as 1 ES p = p



1

Q(u)du. 1−p

Let Fˆ denote the empirical cumulative distribution function of X1 , . . . , Xn and Fˆ −1 be its quantile function. Brazauskas et al. (2008) defined an empirical estimator of ES p as follows: p = 1 ES p



1

1−p

Fˆ −1 (u)du.

6

S. DUTTA AND S. BISWAS

 p converges to ES p almost Under the assumption that X1 , . . . , Xn are i.i.d. with E|X1 | < ∞, ES surely as n is increased (see Brazauskas et al. (2008)). To construct pointwise and simultaneous confidence intervals for ES p, Brazauskas et al. (2008) obtained the following asymptotic result. Theorem 3.3. (Brazauskas et al. (2008)) Let X1 , . . . , Xn be i.i.d. random variables with finite second moment E[X12 ]. Let the distribution function F be continuous at the point F −1 (1 − p). Then as n → ∞ √  p − ES p ) →d N(0, σ 2 (1 − p)), n(ES where N(0, σ 2 (1 − p)) denotes a centered normal random variable with the variance   ∞ 1 ∞ (F (x ∧ y) − F (x)F (y))dxdy. σ 2 (1 − p) = 2 p F −1 (1−p) F −1 (1−p) 3.4. Tail-trimmed estimator by Hill (−) (−) (−) Let Xt(−) = Xt I(Xt < 0) and X(1) ≤ X(2) · · · ≤ X(n) ≤ 0 denote the negative numbers in the data ordered in increasing order. Let {kn } be an intermediate sequence, where kn → ∞ and kn → 0. Hill (2013) defined the following tail trimmed estimator of ES: n

1  ˆ . Xt I X(k(−) ≤ X ≤ q t n,p n) np t=1 n

Hill p = −

where qˆn,p = X([pn]) . kn is the number of trimmed (omitted) tail extremes, representing an asymptotically vanishing and therefore negligible sample tail proportion knn . Under a geometric α-mixing condition on {Xt } and few additional regularity conditions (see Hill 2013), Hill p is a consistent and asymptotically normal estimator, viz. under the assumptions that {Xt } is strongly mixing with geometric mixing rate, E[Xt2 ] < ∞, kn → ∞ at a slowly varying rate (e.g., kn ∼ ln(n)) and some of other assumptions in Hill (2013),   S2 d 1/2 n (Hill p − ES p ) → N 0, 2 , p  n ∗ ∗ ∗ where S2 = limn→∞ S2n , S2n = n1 E( t=1 {Xn,t − E[Xn,t ]})2 and Xn,t = Xt I(−ln ≤ Xt ≤ q1−p ). kn {ln } is a positive sequence such that P(Xt < −ln ) = n . The advantage of the above tail trimmed ES estimator is that it leads to asymptotically standard inference even for heavy tailed time series with infinite variance (see Hill 2013). Tail-trimming is used to dampen the effect of extremes in a sample, ensuring standard Gaussian inference, and a higher rate of convergence than without trimming when the variance is infinite (see Hill 2013). Hill (2013) suggested to use kn = max{1, [0.25n2/3 /(ln(n))2ι ]} and ι = 10−10 in his simulation study. 3.5. Yamai and Yoshiba’s estimator Yamai and Yoshiba (2002) defined the following estimator of ES p ES p,β =

nβ  1 X(i) , n(β − 1 + p) i=[n(1−p)]

®

COMMUNICATIONS IN STATISTICS—SIMULATION AND COMPUTATION

7

where β is a positive constant such that X(1) < X(2) < · · · < X[n(1−p)] < · · · < X(nβ ) < · · · < X(n) . The empirical estimator Emp p is similar to the above estimator for β = 1. The estimator is asymptotically normal, with asymptotic variance equal to   Q1−β 1 2 2 + βQ + x2 f (x)dx (1 − p)Q p 1−β n(β − 1 + p)2 Qp  2   − βQ1−β + (1 − p)Q p +

Q1−β

x f (x)dx

.

Qp

The optimal choice β is not specified by Yamai and Yoshiba. The above estimator may also be looked at as a tail trimmed estimator, where we omit the extreme sample quantiles in the right tail beyond X(nβ ) . n − nβ is the number of right tail extremes in the sample that are trimmed to construct ES p,β . Remark 3.1. 1. The asymptotic results obtained in Yamai and Yoshiba (2002), Hill (2013), Brazauskas et al. (2008), and Chen (2008) are obtained under the assumption that p is fixed. Not much seems to be known about any of these estimators under the condition that p → 0 as n → ∞. 2. If p → 0 as n → ∞ tail trimming in Hill’s (2013) estimator seems to be challenging to implement, as for even a large sample size only a few values are below qˆn,p (e.g., if p = 0.001 and n ≤ 1000 there is at most one observation below qˆn,p ). 3. If p → 0 as n → ∞, we may use β = 1 − rn in the Yamai and Yoshiba’s estimator ES p,β , where rn converges to zero at a faster rate than p as n → ∞. In this article, we use nrn = max{1, 0.25(np)2/3 /(ln(np + 1))2ι }. This choice is motivated by the choice of kn in Hill’s estimator (2013). The difference is that we use np (or np + 1) instead of n in the formula for kn proposed by Hill. The resulting rn represents the proportion of omitted right tail extremes beyond X[n(1−p)] . With this choice of β = 1 − rn , ES p,β is always defined for any combination of n and p.

3.6. Filtered historical method In this method, a suitable time series model, such as an ARMA or a GARCH, is fitted to the asset return data. Let eˆi , i = 1, 2, . . . , n, denote the residuals of the fitted model. Then, the filtered historical estimator of expected shortfall (Magadia, 2011) is given by  ηt >q

FHp =  ηt >q

ηt

I(ηt > q)

,

n where ηt = eˆt − n1 t=1 eˆt and q = η([(1−p)n]+1) is the ([(1 − p)n] + 1)th order statistic of {η1 , . . . , ηn }. The advantages of the filtered historical method are that it maintains the correlation structure in the return data without relying on the exact specification of the conditional distribution of asset returns and it takes into account the changing market volatility conditions (see Yuan 2011).

8

S. DUTTA AND S. BISWAS

4. Simulation study We compare the mean squared error (MSE) of six quantile estimators,viz. the empirical quan p , Yamai and Yoshiba’s estimator ES p,β , tile estimator Emp p , the Brazauskas et al.’s estimator ES Filtered historical FHp , Hill’s estimator Hill p and the kernel estimator Chen p,h with h = hopt . It is difficult to compute the exact value of the MSE of these estimators even if the datagenerating process is completely specified. The Monte-Carlo (MC) estimate of the MSE of  any estimator Tn of a parameter θ is defined as B1 Bj=1 (Tn j − θ )2 , where B is the number of MC samples each of size n drawn from a given process and Tn j is the estimate based on the jth MC sample, j = 1, . . . , B. We consider nine time series models. The first three of these models are as follows: (i) {Xi }i=1,2,... is an i.i.d. process, marginal distribution GPD with ξ = 1/3. (ii) {Xi }i=1,2,... is an i.i.d. process, marginal distribution Student’s-t with 4 df. (iii) {Xi }i=1,2,... is an i.i.d. process, marginal distribution N(0, 1). The first two models are motivated by empirical observations by Cont (2001) regarding the extent of tail heaviness of the marginal asset return distributions. Cont (2001) mentioned that when sample moments based on asset return data are plotted against sample size, the sample variance seems to stabilize with increase in sample size. But the behavior of the fourth-order sample moment seems to be erratic as n is increased. This feature is also exhibited by the sample moments based on i.i.d. draws from the Student’s t distribution with four degrees of freedom, which displays a tail behavior similar to many asset return distributions. Cont also mentioned that the daily return distributions of stocks, market indices, and exchange rates seem to exhibit power law tail with exponent α satisfying, ξ = 1/α varying between 0.2 and 0.4. These observations motivate the choice of the marginal distributions in (i) and (ii). The third model represents the classical Black–Schole’s assumption on log return time series. To study the effect of dependence on the above-mentioned quantile estimators we consider the following ARMA(1,1) models in Drees (2003): Xi − φXi−1 = Zi + θZi−1 , (iv ) φ = 0.95, θ = −0.6, (v ) φ = 0.95, θ = −0.9, (vi) φ = 0.3, θ = 0.9. In addition, the following GARCH(1,1) models are also considered Xt = σt Zt , 2 , (vii) σt2 = 0.0001 + 0.9Xt−1 2 2 (viii) σt2 = 0.0001 + 0.4Xt−1 + 0.5σt−1 , 2 2 + 0.9424σt−1 . (ix) σt2 = 0.0386Xt−1

The GARCH(1,1) time series is known to model the volatility clustering observed in financial time series data. The first two GARCH models are used in the simulation study in Drees (2003). The model (ix) is the GARCH model fitted to CNX NIFTY daily loss data for the duration 1 April 2012 to 31 March 2015 (details are mentioned in the next section on data analysis).

®

COMMUNICATIONS IN STATISTICS—SIMULATION AND COMPUTATION

9

Table . Ratios estimated for Student t distribution and GARCH model with varying p. n

1− p

MSE2 MSE1

MSE3 MSE1

MSE4 MSE1

MSE5 MSE1

Student t



Garch(α = 0.0386, β = 0.9424)



. . . .

. . . .

. . . .

. . . .

. . . .

Model Coeff.

The esteemed reviewer suggested to consider a small-scale experiment to compare performance of the estimators of expected shortfall under netting agreements. The term netting is used to describe the process of offsetting mutual positions or obligations between two parties (see Fermanian and Scaillet 2005). To study the effect of netting on ES estimation, let us consider a simple portfolio made of a long position in one asset and a short position in another one with the same counter party.1 Let E1t , E2t denote the gains in the long and short positions, respectively. The vector (E1t , E2t ) is assumed to be Gaussian. mi and σi are mean and standard deviation of Eit , i = 1, 2 and ρ is the correlation coefficient. Since E1t and E2t are long and short position gains, we assume that ρ is negative. Let Dt be a Bernoulli random variable, independent of (E1t , E2t ), such that Dt = 1 represents a credit event that causes default at time t (and hence initiation of a netting agreement). In case of default, without any netting arrangement, the loss at time t equals E1t+ + E2t+ . However under netting arrangement, the loss due to default at time t equals (E1t + E2t )+ (see Fermanian and Scaillet (2005), page 937). Therefore under this netting arrangement, the loss at time t equals I(Dt = 1)(E1t + E2t )+ − I(Dt = 0)(E1t + E2t ). This motivates model (x) in our simulation study (x) Xt = I(Dt = 1)(E1t + E2t )+ − I(Dt = 0)(E1t + E2t ), where {(E1t , E2t )} is an i.i.d. Gaussian process, with m1 = 10, m2 = −1, ρ = 0.89 and σi = 1, i = 1, 2. And we take P(Dt = 1) = 0.20, i.e., the chance of default is assumed to be 20%. From each of the above models (i) − (x) and for each combination of n and p, we draw 1000 MC samples of size n. From each of these samples compute the values of the six estimators of ES p for various values of p. From these values, we compute the MC estimate of the MSE of that estimator for different choices of n, p and the underlying time series model. In  p , ES p,β , FHp , Hill p and each case, let the MC estimates of the MSE of the estimators Emp p , ES kernel-based estimator Chen p,h be denoted by MSE1 − MSE6, respectively. In Tables 3–5, we , . . . , MSE6 for p varying from 0.05 to 0.001 for the i.i.d, ARMA, and the report the ratios MSE2 MSE1 MSE1 GARCH models, respectively. The values of the ratio of the MSEs in Tables 3–5 are reported for n equal to 100, 250, 500, and 1000. In Table 6, we report these ratios for the process generated by model (x). From Tables 1–6, we have the following observations: 1. No estimator uniformly out performs the other estimators. However, we can identify some conditions under which some these estimators performs well. 2. Large n and small p. For np > 1, the Yamai and Yoshiba’s ES p,β (with our choice of β) clearly outperforms the empirical estimator for the GARCH(1,1) time series model 

Suppose a trader borrows money from a broker, takes a long position on a certain equity and also buys a put option (short position) of the market index future to hedge against any random fall in the stock market. The trader can adopt two strategies. In the event of any unforseen downward movement in the market, he may cover the gains in the put option and take delivery of the stocks by paying remaining dues to the broker in cash. Otherwise the trader can exit both the long and short positions at market price, and return the dues to the broker. In this example a sudden downward market movement is the event that causes default. The first strategy is not netted, as only positions with positive gains are used to meet the default obligation. The second strategy involves netting, where overall portfolio gain is used to meet the traders obligation to the broker. Our model (x) represents the loss in the second strategy at time t.

10

S. DUTTA AND S. BISWAS

Table . Estimating expected shortfall of nifty returns at % and .%. Index

1− p

Emp p

 ES p

ES p,β

FH p

Hill p

CNX Nifty

. .

− . − .

− . − .

− . − .

− . − .

− . − .

and i.i.d. processes with heavy tailed marginal densities (see the values of the ratio MSE3 MSE1 in Tables 3 and 5).  p and FHp seem to be more accuIf n ≥ 500 and p = 0.01, 0.001 the estimators ES rate than the empirical estimator Emp p for data generated by the i.i.d process with Student-t marginal density and ARMA models (iv ) and (v ) (see Tables 3 and 4). For  p and FHp seem ARMA model (vi) if n = 1000 and p = 0.01, 0.001 the estimators ES to be more accurate than the empirical estimator Emp p (see Table 4). For the GARCH  p and FHp seem to outperform the empirical models (vii) and (viii), the estimators ES estimator Emp p for p = 0.001 and n = 1000 (see Table 5). The FHp estimator seems to outperform Emp p for n ≥ 250 and for all p for ARMA models (iv ), (v ), and (vi) (see Table 4). In general, the estimators ES p,β , ES p , and FHp seem to be suitable for estimation of ES p especially for small p and large sample size, such that np > 1.  p and ES p,β seem to out3. n ≤ 500. If the marginal density is GPD, the estimators ES perform Emp p for n ≤ 500 and p ≥ 0.01 (see Table 3). For the ARMA models, the FHp estimator seems to perform well for n = 250 and p = 0.001 (see Table 4). For the GARCH model (ix) and n ≤ 250, the FHp seems to outperform the empirical estimator Emp p for all values of p (see Table 5). 4. Effect of decreasing p. The effect of decreasing p keeping n fixed (i.e., as we attempt to estimate more extreme quantiles based on the same sample size), on the relative accuracy of the nonparametric estimators seems to be model specific. For ARMA models,  p and FHp in comparison to the empirical estimator the accuracy of the estimators ES seems to increase as p → 0, keeping n fixed (see Table 4). This implies that for stationary time series data where ARMA models are a good fit, extreme quantiles can be  p and FHp than the sample quantile. For the GARCH estimated more accurately by ES model (ix) and n = 1000, the accuracy of the Yamai and Yoshiba’s ES p,β , in compare to Emp p , seems to improve as p → 0.  p proposed by 5. Effect of netting. Under model (x), we see that the estimator ES Brazauskas et al. (2008) outperforms the empirical estimator for all choices of n and p. See Table 6. For np < 1, the other nonparametric estimators perform poorly in compare to the empirical estimator. For np > 1, Yamai and Yoshiba’s estimator ES p,β and  p outperform the empirical estimator for data generated by model (x). The accuES  p seems to increase as p → 0, keeping n fixed, in presence of netting racy of only ES arrangement. Remark 4.1.  p and FHp are preferable 1. The above observations suggest that estimators ES p,β , ES choices for estimation of expected shortfall for large n and small p, such that np > 1. However for np < 1, the gain in accuracy using these estimators in compare to Emp p varies widely with the process generating the data. If the data are generated by a GARCH(1, 1) model, the filtered historical estimator FHp seems to perform well.

®

COMMUNICATIONS IN STATISTICS—SIMULATION AND COMPUTATION

11

Table . Ratios estimated using different estimators of Expected Shortfall for iid cases with varying p. Dist

n

1− p

MSE2 MSE1

MSE3 MSE1

MSE4 MSE1

MSE5 MSE1

MSE6 MSE1

GPD



. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . . . .







Student t









N(,)









2. Kernel Smoothing for ES estimation. The values of the ratio MSE6 in Tables 1–3 indicate MSE1 that the kernel-based estimator performs poorly in compare to the empirical ES estimator, and that there is no reason to use kernel smoothing for ES estimation. Earlier we observed that kernel smoothing does not yield an asymptotically efficient estimator. So, kernel smoothing is not recommended for ES estimation. 3. ES estimation in presence of netting. In the presence of netting, the ES estimator proposed by Brazauskas et al. (2008) is recommended. Other nonparametric estimators may perform poorly in compare to the empirical estimator in presence of netting, especially for np < 1.

12

S. DUTTA AND S. BISWAS

Table . Ratios estimated for ARMA model with varying p. Model Coeff.

n

1− p

MSE2 MSE1

MSE3 MSE1

MSE4 MSE1

MSE5 MSE1

MSE6 MSE1

(.,−.)



. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . .  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .







(.,−.)









(.,.)









5. Data analysis The S & P Nifty is a well diversified 50 stock index accounting for 22 sectors of the Indian economy. For investors in the Indian equity market, it is of natural importance to assess the market risk of the Nifty index for a number of purposes, such as benchmarking performance of mutual funds (see for instance, Biswas and Dutta, 2015). We apply the above nonparametric estimators to estimate the expected shortfall of the S & P CNX Nifty index based on the daily closing values of the index from 1 April 2012 to 31 March 2015. These data are collected from national stock exchange(NSE) website (see http://www.nseindia.com/products/content/equities/indices/indices.htm). There are 742

®

COMMUNICATIONS IN STATISTICS—SIMULATION AND COMPUTATION

13

Table . Ratios estimated for GARCH model with varying p. Model Coeff.

n

1− p

MSE2 MSE1

α = 0.9



. . . . . . . . . . . . . . . .

. . . . . . . .  . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

 .  . . . . .  . . .  .  .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .







α = 0.4, β = 0.5









α = 0.039, β = 0.942









MSE3 MSE1

MSE4 MSE1

MSE5 MSE1

MSE6 MSE1

.   . . . . .   . . .  . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

  . . . . . .  . .    . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

daily log return values in our data. The mean and standard deviation of these data are 0.0003 and 0.0041, respectively (indicating that the Nifty daily returns during 1 April 2012 to 31 March 2015 were closely scattered around zero). However, there were 235 trading days during this period where the Nifty daily return exceeded one percent. In Table 2, we report five nonparametric estimates of the 99 and 99.9 percent ES of the Nifty index during the period under study. We do not use the kernel based estimator, as it is found perform poorly in our simulations. In Table 2, we observe discrepancy among the ES estimates for p = 0.001. For real data, the underlying data-generating process is unknown. However, the GARCH model (ix) fits

14

S. DUTTA AND S. BISWAS

Table . Ratios estimated for netted with varying p. Cond.

n

1− p

MSE2 MSE1

MSE3 MSE1

MSE4 MSE1

MSE5 MSE1

MSE6 MSE1

Netted



. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . 

. . . . . . . . . . . . . . . .







well to this data. Also, the asset return data are known to exhibit similar features as an i.i.d. process with Student’s-t, with 4 degree of freedom, as the marginal density (see Cont 2001).  p , ES p,β , In Table 1, we report the ratio of the MC estimates of the MSEs of the estimators ES FHp and Hill p to the same for the empirical estimator Emp p based on samples of size n = 742 generated by the GARCH (ix) model (fitted to the Nifty data) and Student’s-t distribution, with 4 degrees of freedom. From Table 1, we see that, the 99.9% ES estimates obtained by Yamai and Yoshiba’s estimator ES p,β and Hill’s estimator Hill p perform poorly for sample of size n = 742 generated by the GARCH (ix) model.  p , FHp estimates are almost equal for both the 99 and 99.9% ES estimates. The 99 and The ES 99.9% ES estimates based on the Nifty return data are equal to −1.4 and −1.8%, respectively. These values serve as important benchmark of (daily) market risk in National Stock exchange, India, during 1 April 2012 to 31 March 2015.

Acknowledgment We are thankful to the esteemed reviewer for the suggestions and important references which lead to significant improvement of the article.

References Acerbi, C., Nordio, C., Sirtori, C., Abaxbank, Monforte Corso. (2001). Expected Shortfall as a Tool for Financial Risk Management, arXiv:cond-mat/0102304v1. Acerbi, C., Tasche, D. (2002). On the coherence of expected shortfall. Journal of Banking and Finance 26:1487–1503. Artzner, P., Delbaen, F., Eber, J. M., Heath, D. (1997). Thinking coherently. Risk 10:68–71. Artzner, P., Delbaen, F., Eber, J. M., Heath, D. (1999). Coherent measures of risk. Mathematical Finance 9(3):203–228. Azzalini, A. (1981). A Note on the estimation of a distribution function and quantiles by a Kernel method. Biometrika 68(1):326–328. Biswas, S., Dutta, S. (2015). Assessing market risk of Indian index funds. Global Business Review 16(3):511–523. Bowman, A., Hall, P., Prvan, T. (1998). Bandwidth selection for the smoothing of distribution functions. Biometrika 85:799–808.

®

COMMUNICATIONS IN STATISTICS—SIMULATION AND COMPUTATION

15

Brazauskas, V., Jones, B., Madan, L., Zitikis, R. (2008). Estimating conditional tail expectation with acturial application in view. Journal of Statistical Planning and Inference. 138:3590–3604. Chen, S.X. (2008). Nonparametric estimation of expected shortfall. Journal of Financial Econometrics 6:87–107. Chen, S.X., Tang, C.Y. (2005). Nonparametric inference of value-at-risk for dependent financial returns. Journal of Financial Economics 3(2):227–255. Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance 1:223–236. Drees, H. (2003). Extreme quantile Estimation for dependent data, with application to finance. Bernoulli 9(1):617–657. Fermanian, J.D., Scaillet, O. (2005). Sensitivity analysis of VaR and expected shortfall for portfolios under Netting agreements. Journal of Banking and Finance 29:927–958. Hill, J. B. (2013). Expected Shortfall Estimation and Gaussian Inference for Infinite Variance Time Series. Unpublished monograph. Available http://www.unc.edu/∼jbhill/expected_short_ robust_JBHILL.pdf Magadia, J. (2011). Confidence Interval for Expected Shortfall using Bootstrap Methods, 4th Annual BSP-UP Professional Chair Lectures, 21-23 February, Bangko Sentral ng Pilipinas, Malate, Manila. Nadarajah, S., Zhang, B., Chan, S. (2013). Estimation methods for expected shortfall. Quantitative Finance. 271–291. Scaillet, O. (2003). “The origin and development of VaR”, in Modern Risk Management: A History. 15th anniversary of Risk Magazine, Risk Publications, 151–158. Scaillet, O. (2004). Nonparametric estimation and sensitivity analysis of expected shortfall. Mathematical Finance 14(1):115–129. Yamai, Y., Yoshiba, T. (2002). Comparative analysis of expected shortfall and value-at-risk: Their estimation error, decomposition, and optimization. Monetary and Economic Studies 87–122. Yuan, H. (2011). Calculation of Expected Shortfall via Filtered Historical Simulation. Project report. Uppsala University.