Backtesting Expected Shortfall - Cass Business School

12 downloads 3619 Views 4MB Size Report
Backtesting Expected Shortfall Carlo Acerbi Balazs Szekely March 18, 2015. Outline The VaR vs ES Dilemma Elicitability Three Tests for ES Numerical Results
Backtesting Expected Shortfall Carlo Acerbi Balazs Szekely March 18, 2015

©2015 MSCI Inc. All rights reserved.

Outline       

The VaR vs ES Dilemma Elicitability Three Tests for ES Numerical Results Testing ES in Practice Back to Elicitability Conclusions

2

The VaR or ES Dilemma The Importance of Backtest

VaR and ES  VaR: the best of worst x% losses; losses  ES: the average of worst x% losses

→ threshold of x% → expected x% loss

Profit and Loss Distribution

ES VaR

 ES multiple advantages: tail sensitivity, subadditivity (→ coherent), mathematical tractability, uniqueness, uses same risk models …  Last roadblock for ES toward Basel: backtesting  As of Oct13, no general consensus ES backtest

4

Basel: VaR or ES?  1994: RiskMetrics Technical Document popularizes “Value at Risk” (VaR)  1996: Basel Committee internal-based approach to capital adequacy, based on VaR  1997: Artzner et al. “Coherent Measures of Risk”: axioms for sensible risk measures. VaR criticized for not complying  2001: Rockafellar and Uryasev, Acerbi and Tasche, define “Expected Shortfall” (ES, aka CVaR), a coherent measure of risk  2000s VaR and ES are widely adopted by financial institutions as complementary tools  2013: Basel Committee replaces VaR1% with ES2.5%  VaR is maintained for model backtesting

5

Backtesting in a Nutshell predicted

predicted

predicted

predicted

realized realized

realized

t

t+1 realized

 Backtesting means checking whether realizations were in line with model forecasts  However, distributions (and statistics) do not materialize  Only one scenario at a time does

 Not all risk measures can be backtested  And it is not easy to say which ones can

 Opposite examples  VaR: just test whether losses exceed VaR x% of the times (model independent!)  Mode: probably impossible to backtest 6

Basel VaR Backtest: Null and Alternative Hypotheses 𝐻0

𝐻1

 “Fundamental review of the trading book, a revised market risk framework”, Basel Committee 2013 7

Basel VaR Backtest: Traffic Light System

 Same mechanism since 1996

8

Time Independence

 Conditional coverage tests (Christoffersen ‘98)  However, in practice, independence of arrival of VaR exceptions is normally tested through visual inspection 9

Why so Difficult with ES?  Backtesting ES is less immediate  “How can I backtest ES, if it’s the average of the x% worst cases, but life is a single scenario only?”  It sounds like we need multiple scenarios to compute a realized ES

 As a matter of fact, literature on backtesting ES is scarce and fragmentary  Only one proposed model-independent ES backtest (Kerkhof and Melenberg 2004)

 Legitimate doubt: is ES backtestable at all?  A recent result (Gneiting 2011) seemed to have proven that it is not

10

Elicitability

Eliciwhat?  A statistic 𝑌 is said to be elicitable if it solves 𝑌 = arg min 𝔼 𝑆(𝑦, 𝑋) 𝑦

for some scoring function 𝑆 𝑦, 𝑥  Popular examples:  Mean:

𝑦 = arg min 𝔼 (𝑦 − 𝑋)2

 Median:

𝑦 = arg min 𝔼 𝑦 − 𝑋

 𝛼-quantile:

𝑞𝛼 = arg min 𝔼 (𝑋 − 𝑞)(𝛼 − (𝑋 − 𝑞 < 0))

𝑦 𝑦

𝑞

→ VaR !

 If a statistic 𝑌 is elicitable, given forecasts 𝑦𝑡 and realizations 𝑥𝑡 the mean score 1 𝑆= 𝑇

𝑇

𝑆 𝑦𝑡 , 𝑥𝑡 𝑡=1

ranks predictive models: the lower the better  Natural test: think of mean squared errors, for instance 12

ES is not Elicitable  Gneiting 2011 proves that there is no 𝑆 𝑦, 𝑥 that elicits ES  Many risk experts concluded that ES cannot be backtested  ES cannot be backtested because it fails to satisfy elicitability ... If you held a gun to my head and said: ‘We have to decide by the end of the day if Basel 3.5 should move to ES, with everything we know now, or do we stick with VaR’, I would say: ‘Stick with VaR’ (Paul Embrechts, March 2013, RISK.net)

 Criticism followed the announced choice of the Basel Committee to adopt ES

 Are there risk measures which are both coherent and elicitable?  Yes, expectiles (Ziegel, 2013, Bellini et al. 2014)  Explosion of interest for these measures

13

Something is not Quite Right  If non-elicitable means non-backtestable  Why has VaR never been backtested by exploiting its elicitability?  The exceptions test has nothing to do with scoring functions

 How about variance? It’s not elicitable either! Should we drop it as well?  And what about Kerkhof and Melenberg 2004? “…contrary to common belief, ES is not harder to backtest than VaR... Furthermore, the power of the test for ES is considerably higher”

 We will answer all of these questions

14

Three Tests for Expected Shortfall

Model Independent Setting  Basel choices:  1 year lookback window: 𝑡 = 1, … 𝑇 = 250  𝑉𝑎𝑅1% and 𝐸𝑆2.5%

 Independent (not iid) Profit-loss 𝑋𝑡  𝐹𝑡 real (unknown) distribution  𝑃𝑡 forecast (model) distribution

 Only assumption: continuous distributions  → we can write:

𝐸𝑆𝛼,𝑡 = −𝔼 𝑋𝑡 𝑋𝑡 + 𝑉𝑎𝑅𝛼,𝑡 < 0]

 Standard hypothesis testing framework (the below is generic)  𝐻0 :

𝐹𝑡 = 𝑃𝑡

null: model is perfect

 𝐻1 :

𝐹 𝑃 𝐸𝑆𝛼,𝑡 > 𝐸𝑆𝛼,𝑡

alternative: ES underestimated (one sided test)

 Proposed tests must be model-independent  No distribution families, no location-scale, no asymptotic convergence, …

16

Test 1: Testing ES After VaR 𝔼

 From

𝑋𝑡 𝐸𝑆𝛼,𝑡

𝑋𝑡 + 𝑉𝑎𝑅𝛼,𝑡 < 0 + 1 = 0

𝑍1 𝑋 =

 Define

𝑇 𝑋𝑡 𝐼𝑡 𝑡=1𝐸𝑆 𝛼,𝑡 𝑇 𝐼 𝑡=1 𝑡

+1

where 𝐼𝑡 = 𝟏{𝑋𝑡+𝑉𝑎𝑅𝛼,𝑡 1

24

𝐻0 : Student-t, 𝜈 = 100; 𝐻1: Scaled Distributions

25

𝐻0 : Student-t, 𝜈 = 100; 𝐻1: Scaled Distributions

26

𝐻0 : Student-t, 𝜈 = 5; 𝐻1: Scaled Distributions

27

𝐻0 : Student-t; 𝐻1: ES Coverage 95%, 90%  𝐻0 : 𝐹𝑡 = 𝑃𝑡 , Student-t distributions  𝐻1 : 𝐹𝑡 (𝑥) = 𝑃𝑡 (𝑥/ 𝛾), scaled distribution, 𝛾 > 1, but labeled in terms of ES coverage  𝐸𝑆𝛼𝑃 = 𝐸𝑆𝛼𝐹′ with 𝛼 = 2,5% and 𝛼 ′ = 5%, 10%  Analogous to the Basel VaR coverage tables

28

𝐻0 : Student-t , 𝜈 = 100; 𝐻1: ES Coverage 95%, 90%

29

𝐻0 : Student-t , 𝜈 = 5; 𝐻1: ES Coverage 95%, 90%

30

𝐻0 : Student-t , 𝜈 = 100; 𝐻1: 𝜈 = 10, 5, 3  𝐻0 : 𝐹𝑡 = 𝑃𝑡 , Student-t distributions  𝐻1 : Student-t distributions with lower 𝜈  Notice that standard deviation is also larger 𝜎 =

31

𝜈/(𝜈 − 2)

𝐻0 : Student-t , 𝜈 = 100; 𝐻1: 𝜈 = 10, 5, 3

32

𝐻0 : Student-t , 𝜈 = 100; 𝐻1: 𝜈 = 10, 5, 3

33

𝐻0 : Student-t , 𝜈 = 10; 𝐻1: 𝜈 = 5, 3

34

𝐻0 : Normalized Student-t , 𝜈 = 100; 𝐻1: 𝜈 = 10, 5, 3  𝐻0 : 𝐹𝑡 = 𝑃𝑡 , Normalized Student-t distributions with 𝜎 = 1  𝐻1 : Normalized Student-t distributions with lower 𝜈 and 𝜎 = 1

35

𝐻0 : Normalized Student-t , 𝜈 = 100; 𝐻1: 𝜈 = 10, 5, 3

36

𝐻0 : Normalized Student-t , 𝜈 = 100; 𝐻1: 𝜈 = 10, 5, 3

37

𝐻0 : Normalized Student-t , 𝜈 = 10; 𝐻1: 𝜈 = 5, 3

38

𝐻0 : Normalized Student-t ; 𝐻1: fixed VaR2.5%    

𝐻0 : 𝐹𝑡 = 𝑃𝑡 , Normalized Student-t distributions with 𝜎 = 1 𝐻1 : Normalized Student-t distributions with lower 𝜈 and 𝜎 = 1 Distributions offset so that they have equal VaR2.5% Alternative hypotheses built to analyze Test 1

39

𝐻0 : Norm. Student-t 𝜈 = 100; 𝐻1: fixed VaR2.5%

40

𝐻0 : Norm. Student-t 𝜈 = 100; 𝐻1: fixed VaR2.5%

41

𝐻0 : Norm. Student-t 𝜈 = 10; 𝐻1: fixed VaR2.5%

42

Summary of Results  All tests for ES 2.5% generally display more power than the Basel test for VaR 1% in identical conditions  Test 1 is subordinated to testing VaR first, but has strong power for model misspecifications in the tail index  Test 2 and test 3 excel in different cases: test 2 is more powerful on scaled distributions; test 3 is more powerful on distributions with different tail index

43

Testing ES in Practice Implementing Test 2

Implementing Test 2  Test 2 can be adopted without storing forecast distributions  95% and 99.99% significance level thresholds are fixed values 𝑍2 = −0.70 and 𝑍2 = −1.8

 Every day, it is sufficient to record the quantities  𝑋𝑡 𝐼𝑡 :

magnitude of exceptions, or zero

 𝐸𝑆𝑡 :

predicted ES

 The graph

𝑠↦

𝑋𝑡 𝐼𝑡 𝑠 𝑡=1 𝑇 𝛼 𝐸𝑆

𝛼,𝑡

+

𝑠 𝑇

for 𝑠 = 1, … , 𝑇 allows us to

visualize the time evolution of the contributions to the final 𝑍2 and check time independence

45

𝑍2 Past History (of a Poor Risk Model) Z stat (ES) Z stat (VaR) Critical values (ES) Critical values (VaR) jpmembig/hist1y - confidence level: 97.5% 1

Backtest stats (rolling window)

0.5

0

-0.5

-1

-1.5

-2

-2.5

Oct13

Jan14

Apr14

46

Jul14

𝑍2 Is Not Rejected (But Some Exceptions Cluster) P&L - jpmembig/hist1y - confidence level: 97.5%

Return [%]

2 VaR bands ES bands

0 -2 -4

Jul13

Oct13

Jan14

Apr14

Contribution to ZES statistic 0.2 0 -0.2 -0.4 -0.6

Jul13

Oct13

Jan14

Apr14

Cumulative contribution to ZES statistic 1 0 -1 -2

Jul13

Oct13

Jan14

47

Apr14

𝑍2 Too High – Risk Overestimation P&L - jpmembig/hist1y - confidence level: 97.5%

Return [%]

2 VaR bands ES bands

1 0 -1 -2

Oct13

Jan14

Apr14

Jul14

Contribution to ZES statistic 0.05 0 -0.05 -0.1 -0.15

Oct13

Jan14

Apr14

Jul14

Cumulative contribution to ZES statistic 1 0 -1 -2

Oct13

Jan14

Apr14

48

Jul14

𝑍2 Far Too Low – Model failure P&L - jpmembig/hist1y - confidence level: 97.5%

Return [%]

2 VaR bands ES bands

0 -2 -4

Feb13

May13

Aug13

Nov13

Contribution to ZES statistic 0.2 0 -0.2 -0.4 -0.6

Feb13

May13

Aug13

Nov13

Cumulative contribution to ZES statistic 1 0 -1 -2 -3

Feb13

May13

Aug13

49

Nov13

Back to Elicitability Where Is The Catch?

Elicitable ≠ Backtestable  We have shown that ES can be backtested without being elicitable  Therefore backtestable ⇒ elicitable  We are tempted to conclude that elicitability is not the only way to backtest

backtestable elicitable

VaR

 Actually, there is even more…

51

ES

Elicitability: Model Selection, Not Model Testing  If a measure is elicitable, we can rank models by their mean score  However, this is a relative, not an absolute scale  A mean score alone doesn’t tell us anything about the validity of a single model

 A mean score allows to choose the best model among several ones which forecast the same random process  Ex: Bank A has three VaR forecast models and runs a contest to select the best one  → Model selection

 Statistical test instead provides a validation with absolute significance  Ex: Bank A wants to validate the model  Ex: Regulators want to compare models of Banks A, B, C, …Z against the same scale  → Model testing (a.k.a. validation)

 This key observation has been completely overlooked so far in the public debate

52

Elicitability Has Nothing To Do with Backtestability  So, in reality the situation is as follows:  Elicitability is irrelevant for model testing!

Mode

backtestable elicitable

ES

VaR

 As a matter of fact, the Mode is elicitable  It is not by chance that VaR is not backtested via elicitability 53

By the Way, ES is Elicitable (at Least in Practice)  A way to see why ES is not elicitable is to realize that there is no expression like 𝔼 𝐿 𝑋, 𝐸𝑆 = 0  If that were the case, we could interpret 𝜕𝑆 𝑋, 𝑒 𝐿 𝑋, 𝐸𝑆 = 𝜕𝑒 𝑒=𝐸𝑆 and build a scoring function 𝑆  However, there do exist null expectations involving both ES and VaR 𝔼 (𝑋 + 𝐸𝑆)(𝑋 + 𝑉𝑎𝑅 < 0) = 0 𝔼 𝑋 𝑋 + 𝑉𝑎𝑅 < 0 + 𝛼𝐸𝑆 = 0

54

By the Way, ES is Elicitable (at Least in Practice)  It is (tedious but) straightforward to build a scoring function 𝑆𝑊

𝛼𝑒 2 𝑊𝛼𝑣 2 𝑊 𝑥2 − 𝑣2 𝑣, 𝑒, 𝑥 = + − 𝛼𝑒𝑣 + 𝑒 𝑣 + 𝑥 + 2 2 2

(𝑥 + 𝑣 < 0)

which jointly elicits ES and VaR if 𝐸𝑆 < 𝑊 𝑉𝑎𝑅, for a chosen 𝑊 ∈ ℝ 𝑉𝑎𝑅, 𝐸𝑆 = arg min 𝔼 𝑆 𝑊 (𝑣, 𝑒, 𝑋) 𝑣,𝑒

 Some 𝑊 ∼ 2 will do for any concrete application  Perfectly analogous to the variance, which is elicitable only jointly with the mean (Lambert et al. 2008)  … we do not know any 𝔼 𝐿 𝑋, 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒

55

=0

Model Selection via ES-VaR Elicitability  Example of elicitability contest for different Riskmetrics models on different test portfolios  Data: RiskMetrics Year in Review 2014

56

Comments to Some Recent Papers Consistency P-va

On Davis’ Consistency  Davis (2014): a risk measure 𝑋 is consistent if we can write 1 𝑛→∞ 𝑏𝑛

lim

𝑛 𝑖=1

𝑙(𝑋𝑖 , 𝑌𝑖 ) = 0 a.s.

 where 𝑋𝑖 and 𝑌𝑖 are model predictions for the price process and for the risk measure respectively and 𝑏𝑛 is a suitable divergent sequence  𝑙 is called a calibration function

 Also in this case, if we extend the notion of consistency to vector statistics , we can prove it for 𝑌 = (𝐸𝑆, 𝑉𝑎𝑅) choosing  𝑏𝑛 = 𝑛  𝑙 𝑋, 𝐸𝑆, 𝑉𝑎𝑅 =

𝑋(𝑋+𝑉𝑎𝑅