A Probabilistic Measure of Risk H. Zmarrou1 Under supervision of Dr. A.A. Balkema Faculteit der Natuurwetenschappen, Wiskunde en Informatica , Korteweg-de Vries Institute for Mathematics, Amsterdam, The Netherlands
A thesis submitted for candidacy for the degree of Master of Mathematics Amsterdam, 17 March 2003
1
[email protected]
Acknowledgement First and foremost thanks go to A.A. Balkema for his extraordinary support, his patience, and his eye for details as my supervisor in writing this thesis. I am also very grateful to R. de Vilder for setting me on the right track to find my subject and for his clear view of the practical that allowed me to get an initial grip on the subject. Finally I would like to thank N. Schoonderwoerd for his support during the stage period.
A probabilistic measure of risk
3
Abstract This thesis develops a probabilistic methods for estimating the distribution of a portfolio when the underlying risk factors have a heavy-tailed distribution. The most work is concerned to develop a statistical model for the change in the risk factors. The perspective to the problem is a semi-parametric approach for risk factors distribution estimation. This is the mixture of two approaches, when we combine non-parametric estimation with parametric estimation of the tails of the distribution of the change in the risk factors. These methods build upon recent research in extreme value theory, which enable us to estimate the tails of a distribution. McNeil see [22], Danielsson and de Vries see [10] and others propose an efficient semi-parametric method with a solid mathematical background for estimating financial returns, and this method is expanded here to efficient estimation procedure for the joint density of the financial returns and the change in the implied volatility.
Contents 1 Introduction and problem statement 1.1 Introduction . . . . . . . . . . . . . . . 1.1.1 Objectives . . . . . . . . . . . . 1.2 Some notations and definitions . . . . 1.2.1 Problem statement . . . . . . . 1.2.2 Generalities and notation . . . 1.3 Modelling the marginal distributions . 1.3.1 Parametric approach . . . . . . 1.3.2 Historical simulation approach 1.4 Modelling the tails of the marginal . . 1.4.1 Methods . . . . . . . . . . . . . 1.4.2 Risk measures . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
2 Data analysis 2.1 Fitting the GPD . . . . . . . . . . . . . . 2.1.1 Estimation of the GPD parameters 2.1.2 Bootstrap Confidence Intervals . . 2.1.3 Remarks About GPD Parameters .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
3 Modelling the portfolio probability distribution 3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Modelling the tail of the bivarirate density . . . . . . . . . . . . . . . . . . 3.2.1 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.2 Remarks and consequences . . . . . . . . . . . . . . . . . . . . . . 3.3 Modelling the interior of the bivariate density . . . . . . . . . . . . . . . . 3.3.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4 Applications and concluding remarks . . . . . . . . . . . . . . . . . . . . . 3.4.1 Associate a probability measure to the lattice representation of the 3.4.2 Deriving the distribution of a portfolio (P&L distribution) . . . . .
4
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . .
. . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . risk matrix . . . . . . .
. . . . . . . . . . .
. . . .
. . . . . . . . .
. . . . . . . . . . .
1 1 2 3 3 3 4 4 5 6 6 10
. . . .
12 12 14 15 17
. . . . . . . . .
20 20 20 21 21 25 26 26 26 27
Chapter 1
Introduction and problem statement ”if the population of price changes is strictly normal, on the average of any stock . . . an observation of more than five standard deviations from the mean should be observed about every 7000 years. In fact such observations seem to occur about once every three to four years.” Eugene Fama
1.1
Introduction
A central problem in market risk management is estimating the profit-and-loss distribution of a portfolio over a specified horizon (typically one day). Given this distribution, the calculation of specific risk measures is relatively straightforward. Value-at-Risk (VAR), for example, is the quantile of this distribution. The expected loss and the expected loss beyond some threshold are integrals with respect to this distribution. The difficulty in estimating these types of risk lies primarily in estimating the profit-and-loss distribution itself, especially the tail of this distribution associated with large losses. All methods for estimating or approximating the distribution of changes in portfolio value rely (at least implicitly) on two types of modelling considerations: modelling the changes in the underlying risk factors to which a portfolio is exposed, and a mechanism for translating these changes in risk factors to changes in portfolio value. Examples of risk factors are market prices, exchange rates, levels of volatility. For portfolios consisting of equities (stocks: linear portfolios), mapping changes in the risk factors to the portfolio value is straightforward. But for portfolios containing complex derivatives (e.g. options) this mapping relies on a pricing model. The simplest approach to changes in portfolio value is the ”the variance-covariance” method used by RiskMetrics. This approach is based on two fundamental assumptions (i) Changes in risk factors have a multivariate normal distribution with mean zero and covariance matrix Σ, (ii) portfolios change linearly with changes in the risk factors. Under these assumptions, the profit-and-loss distribution is normal; its standard deviation can be calculated from the covariance matrix of the underlying risk factors. The attraction of this approach lies in its simplicity, but each of this assumptions is open to criticism. Another method motivated by a Swiss bank and which is popular under financial trading company is the following lattice representation This representation is applied to option portfolios with K different stocks, this results in the evaluation of K risk matrices, where each risk matrix (see table) consists of change in value of the underlying portfolio 1
2
H. Zmarrou
R Volatility V0 − 2σv V0 − 1σv V0 V0 + 1σv V0 + 2σv
S0 − 3σs P L11 P L21 P L31 P L41 P L51
S0 − 2σs P L12 P L22 P L32 P L42 P L52
Stock price S0 − 1σs S0 P L13 P L14 P L23 P L24 P L33 P L34 P L43 P L44 P L53 P L54
S0 + σs P L15 P L25 P L35 P L45 P L55
S0 + 2σs P L16 P L26 P L36 P L46 P L56
S0 + 3σs P L17 P L27 P L37 P L47 P L57
Table 1.1: Lattice representation of risk profile with respect to one of the risk factors. In case of an options portfolio, the risk factors are the stock price and the implied volatility of the options. This representation may also be applicable to other financial instruments. The entries, P Lij , of the risk matrix R represent the change in value of the underlying portfolio with respect to different changes in the stock price kσs and in implied volatility kσv of the options, where σs and σv are respectively the standard deviation of the change in the stock price and the standard deviation of the change in the implied volatility. Six stock movements and four volatility movements are considered here. The current price and volatility are given by S0 and V0 respectively. By construction P L34 =0 in the risk matrix. Each day a simulation procedure is performed and the expected matrix is calculated. The most negative entries, called the haircut. It is as the worst case loss in the precalculated P &L matrix. This matrix representation has some shortcomings. First the value of the portfolio is simulated only for finitely many points (discrete evaluation of the portfolio) hence the haircut will also be discrete, and therefore either underpredicted or overpredicted. Secondly the haircut which is considered as theoretical maximum loss of a portfolio is calculated without any consideration of the probability of the different possible outcomes of the changes in the risk factors at the end of the holding period (one night). This means that the haircut may be either overestimated (e.g. in case both the stock price and the volatility decline rapidly) or underestimated in case when the outcome of the changes in the risk factors are outside the risk matrix.
1.1.1
Objectives
This thesis contributes to the development of a more realistic measure of the risk. The main objective is to develop, implement and evaluate a procedure to estimate the risk measure when financial risk factors are heavy-tailed. (heavy-tailed means that more probability mass is concentrated in the tails of the probability distribution than for a normal distribution). This work develops of a statistical model for the change in the risk factors. The perspective to the problem is a semi-parametric approach for risk factors distribution estimation. This is the mixture of two approaches, when we combine non-parametric estimation with parametric estimation of the tails of the distribution of the change in the risk factors. These methods build upon recent research in extreme value theory, which enable us to estimate the tails of a distribution. McNeil (1996), Danielsson & De Vries (1997) and others propose an efficient semi-parametric method with a solid mathematical background for estimating financial returns, and this method is expanded here to an efficient estimation procedure for a portfolio distribution. In this work we will not care on a study about the dependency structure of the different simulated matrices (for different stocks). We will concentrate on the bivariate distribution which govern the risk matrix for one particular stock.
A probabilistic measure of risk
1.2 1.2.1
3
Some notations and definitions Problem statement
Let X = (R, V ) ∈ R2 the random vector which governs the changes of the two risk factors (stock price and the implied volatility) such that R=0 and V=0 corresponds to the actual value of the risk factors. This vector define the value g(X) of the underlying portfolio. g(.) represent a real value function which depend on the underlying portfolio, it’s defined from a proper domain Ω ⊂ R2 to R and is called the value function. The outcome of X at the end of the holding period is given by its probability measure P on the Borel space (R2 , B). The profit-and-loss distribution is then given by the induced probability measure Pg of the transformed random variable g(X). This probability measure is defined on R. The associated distribution function Fg is given by: Fg (x) = P (X : g(X) ≤ x).
(1.1)
Example 1. Suppose that we have a portfolio which contain an European call option on non-dividend paying stock which follows a geometric Brouwnian motion. The option is valued according to BlackScholes model. The risk factors are given by X := (R, V ) (we suppose that the interest rate is constant during the holding period), where R represent the value of the stock at time t, V is the implied volatility of the option observed at time t. Let K denote the exercise price and N (.) the probability distribution function for a standardized normally distributed variable. Then, the value of the European call is given by gcall (X, t) = RN (d1 ) − K exp(−rτ )N (d2 ), where d1 =
(1.2)
ln(R/K) + (r + 12 V 2 )τ √ V τ √ d2 = d1 − V τ
τ = T − t is time to maturity, T is the exercice time and r is the interest rate. The value function of the portfolio in this example is given by equation(1.2). The stochastic vector X is the risk factor. The main problem is to find the distribution of such a value function in equation(1.2). To do this it’s necessarily to model the joint distributin of the change in the value of the random variable X, and then translate this joint distribution to our portfolio according to the change in a such value function given in equation(1.2). The modelling of the bivariate distribution may be achieved in two steps. Modelling the marginal distributions in the first step; secondly we give a model of the joint distribution based on the marginal findings. We will begin by modelling the marginal distribution function for the stock return and the change in the volatility level.
1.2.2
Generalities and notation
Portfolio risk factors consist of movements in asset prices and the movements in the volatility of the underlying. Price movements are measured relative to some initial price. Specifically, price changes is referred to as return. The return of an asset over period t is given by: Rt =
St − St−1 . St
4
H. Zmarrou
We will define in the same way the change in the volatility as: Vt − Vt−1 Wt = . Vt Where St en Vt are respectively the asset price and the volatility level at the end of the holding period and St−1 , Vt−1 the asset price and the volatility level in the previous period . In the haircut calculation daily return and the daily change in the volatility level are used. Daily returns and daily changes in the volatility level are calculated using the closing price of the asset and the closing volatility level of a given option, see [11]. As customary in the financial literature we replace percentage return for log-returns. Log-returns are given by: St rt = ln( ) St−1 The use of log-returns is especially practical in risk calculation. Equivalently to the log-return we define the log-change in the volatility level by: Vt vt = ln( ) Vt−1 From on we will work with the log-returns and we denote it by logret and with the log-changes in volatility which we denote by logvol. In this work we will analyse the daily returns of the German index DAX and the daily changes in the level of volatility obtained from the German volatilty index VDAX. For a description of this indexes see [11].
1.3 1.3.1
Modelling the marginal distributions Parametric approach
The distribution function (df ) of a risk factor is defined as: F (u) = P (rt ≤ u) 0
When the probability density function exist it is then defined as its derivative f = F . Many empirical results on the financial series indicates early in the 1960s the insufficiency of the normal distribution for modelling the marginal distribution of asset returns and their heavy-tailed character. Since then, the non-Gaussian character of the distribution has been observed in various market data. One may quantify the deviation from the normal distribution by using the kurtosis and/or the skewness of the distribution F defined as: F4 F3 k(kurtosis) = 2 , s(skewness) = q 3 F2 F2 where F is the mean and Fm is the centered moment of order m given by: Z Fm (u) = (F (u) − F (u))m dF (u). So in case the skewness and/or kurtosis of the distribution of the returns differ significantly from 0 and 3, its not adequate to approximate the return distribution with a Gaussian one. The empirical facts showed that the normal distribution is not a realistic model for the returns distribution (table 1). The empirical facts leave a considerable margin for the choice of the distribution. There are many parametric models proposed in the literature, starting with the normal, Student distribution, stable distribution, etc.
A probabilistic measure of risk
Data DAX logret VDAX logvol
Skewness -0.557497 0.904315
Kurtosis 6.999765 10.76457
5
Jarque-Bera test of Normality 1801.000 6639.347
Table 1.2: Statistical measurements of the skewness and the kurtosis for a sample of 2507 daily observations.
1.3.2
Historical simulation approach
The empirical distribution function is defined as: Fn (u) =
#{i : 1 ≤ i ≤ n, n
ri ≤ u}
r1 ,...,rn are n independent identically distributed (i.i.d.) data points with distribution function F . From the Glivenko-Cantelli theorem we know that with probability one, sup |Fn (x) − F (x)| → 0,
n → ∞.
x∈R
Thus an obvious estimator for the returns is the empirical distribution. Instead of making distributional assumptions about returns, past returns are used to predict future returns. The advantage of historical estimation is that few assumptions are required. The primary assumption is that the distribution of the returns is constant over the sample period (stationarity). The historical simulation has been shown to be a reasonable method, see Mahoney (1996). However past extreme returns can be a poor predictors of extreme events, and as a result, historical estimation (HE) should be used with care. The reason for this is that by it’s nature HE has nothing to say about the probability outcomes which are worse than the sample minimum return. Hence the choice of sample size can have a large impact on the estimation procedure. An other problem is the discreetness of the extreme returns, the empirical function is not one to one but constant between two realization, that is we may not have observations corresponding to certain quantile, see §(1.4). We propose to combine the empirical estimation for the body of the distribution function df and a parametric fitting of the tails based on extreme value theory. We will give an estimate for the density function as the derivative of the distribution function after we have constructed a continuous sample cn in (2.1) over bins (tj , tj+1 ], where the tj < tj+1 version of the df by lineary interpolating the sample df F constitute a grid on the real line. One get the continuous sample df for tj < x ≤ tj+1 cn (tj ) + x − tj [F cn (tj+1 ) − F cn (tj )]. Fn (x) = F tj+1 − tj if we denote νj the frequency of data X1 ,. . . , Xn in the bin (tj , tj+1 ], we may write the continuous sample df as cn (tj ) + (x − tj )νj (1.3) Fn (x) = F tj+1 − tj Thus, the sample df Fn only depends on the data in a grouped form. Taking the derivative of the preceding sample df Fn , one get the probability density function for tj < x ≤ tj+1 νj (1.4) fn (x) = n(tj+1 − tj ) It is very natural to visualize frequencies by means of such a histogram. It’s apparent that this histogram is an appropriate estimate of the density f of F . The histogram may also be considered as sample density.
6
1.4
H. Zmarrou
Modelling the tails of the marginal
From the practitioner’s point of view, one of the most interesting questions that tail studies can answer is what are the extreme movements that can be expected in financial markets? Have we already seen the largest ones or are we going to experience even larger movements? Are there theoretical processes that can model the type of fat tails that come out of our empirical analysis? Answers to such question are essential for risk management of exposures. Most authors in the area of kernel smoothing claim that the kernel estimate is not a consistent estimator for densities with heavy tails see [28], [31]. This is essentially because the empirical estimation has nothing to say about future observations. It turns out that we can answer these questions within the framework of the extreme value theory (EVT). Once we know the tail index, we can extend the analysis outside the sample to consider possible extreme movements that have not yet been observed historically. Extreme value theory is a powerful framework to study the tail behavior of a distribution, it has its roots in hydrology. One needed to compute how high a sea dyke had to be to guard against e.g. a storm of 100 year. EVT has recently found it way into the financial world. For a comprehensive source of this theory see [12]and [25]. In the next section we will present the parametric framework of this thesis. Within the EVT context there are two approaches to study the extremal events. One of them is the direct modelling of the distribution of minimum or maximum realizations. The other one is the modelling the exceedances over a given threshold.
1.4.1
Methods
Theoretical background In this section we summarize the results from EVT which underlie our modelling. General texts on the subject of extreme values include [12] and [25]. Suppose we have a sequence of i.i.d observation X1 , · · · , Xn from an unknown distribution function F . We are interested in estimating the distribution function Fu of values of x above a certain threshold u. The distribution function Fu is called the conditional excess distribution function (cedf) and is formally defined as Fu (y) = P (X − u ≤ y|X > u), 0 ≤ y ≤ xF − u (1.5) where X is a random variable, u is a given threshold, y = x − u are the excesses and xF ≤ ∞ is the right endpoint of F . We verify that Fu can be written in terms of F , i.e Fu (y) =
F (u + y) − F (u) F (x) − F (u) = . 1 − F (u) 1 − F (u)
(1.6)
If X has exceeded the high level u, Fu (y) measures the probability that it did not exceed it by more than y. Another useful function is the mean excess function of X defined as eX (u) = E(X − u|X > u). The distribution which comes to the fore in the modelling of excesses is the generalized Pareto distribution (GPD) defined as: Definition 1. (The generalized Pareto distribution) define the df Gξ by ½ 1 − (1 + ξx)−1/ξ Gξ (x) = 1 − exp(−x)
if ξ 6= 0 if ξ = 0
7
A probabilistic measure of risk
where x ≥ 0 if ξ ≥ 0 and 0 ≤ x ≤ −1/ξ if ξ < 0 Gξ is called a Standard generalized Pareto distribution (GPD). One can introduce the related location-scale family Gξνβ by replacing the argument above by (x−µ)/β for µ ∈ R, β > 0. In this case the support has to be adjusted. At this point EVT can prove very helpful as its provides us with a powerful result about the cedf which is stated in the following theorem due to (Balkema and de Haan [3] and (Pickands, 1975): Theorem 1. (Balkema and de Haan(1974), Pickands(1975)): Let X ∼ F Then for every ξ ∈ R, X ∈ M DA(Hξ ) if and only if lim
sup |Fu (x) − Gξβ(u) (x)| = 0
u↑xF 0 0, the distribution function F ∈ M DA(Hξ ) if and only if F¯ (x) = 1 − F (x) = x−1/ξ L(x) for some slowly varying function L. Theorem (1) says that the excess distribution Fu may be replaced by the GPD distribution Gξ when u is large. To see how it can be used, note that by equations (1.5) and (1.6) above we may write F¯ (x) = F¯ (u)F¯u (x − u)
f or
x≥u
Assuming that u is sufficiently large, we may then approximate Fu by Gξβ(u) (x) and use the empirical estimator for F¯ (u), Nu Fˆ¯u (u) = n where n X Nu = 1{Xi >u} i=1
and where n is the total number of observations. The upper tail of F (x) may be then estimated by Nu x − u −1/ξˆ Fˆ (x) = 1 − Fˆ¯ (x) = 1 − (1 + ξˆ ) for all x ≥ u. n βˆ
8
H. Zmarrou
The estimate distribution Fˆ (x) is defined conditionally on a large chosen threshold. The problem of the optimal choice of u be discussed later. One may derive the density of the GPD distribution with parameter ξ, β just by taking the derivative of Fu , we get then 1 f (x) = β
µ ¶− ξ1 −1 x 1+ξ β
The parameters ξ and β of the GPD Gξβ may be estimated by using, for example, maximum likehood estimation (MLE) once the threshold u has be chosen. Hosking and Wallis showed that for ξ > −1/2 the ˆ β) ˆ of the two parameters (ξ, β) of the GPD are asymptotically Normally distributed ML estimates (ξ, ˆ β) ˆ and covarince matrix Σ, where with mean vector (ξ, µ Σ = (1 + ξ)
(1 + ξ) β β 2β 2
¶
However we will see that this ML estimate suffers from some shortcoming. This suggest to use some extra tools to get the good estimates for the parameters. The data that are used in the maximum likehood estimation are Xi1 − u · · · Xik − u where Xi1 , · · · , Xik are the observations that exceed u. Again there is a bias-variance trade-off in the choice of u (for a intensive study of this problem, see [10]. A graphical tool known as the mean excess plot (u, eX (u)) is often used. The mean excess plot relies on the following result for generalized Pareto distribution Proposition 1. Suppose X has GPD distribution with ξ < 1 and β > 0, then, for u < xF eX (u) =
β + ξu , 1−ξ
β + ξu > 0
The restriction ξ < 1 implies that the heavy-tailed distribution must have at least a finite mean. If the threshold is large enough so that Fu is approximately Gξ,β . By proposition (1) the plot (u, eX (u)) is lineaire in u . How then is one to pick u? The mean excess plot is a graphical tool for examining the relation between the possible threshold u and the mean excess function eX (u) and checking the values of u where there is linearity. In practice it is not eX (u) but its sample version called sample mean excess function (SMEF) Pn (Xi − u)+ eˆX (u) = Pi=1 n i=1 1{Xi >u} which is plotted against u. If the SMEF is a positively sloped straight line a bove a certain threshold u, it is an indication that the data above u follows the GPD with a positive shape parameter ξ. Another tool in threshold determination is the Hill-plot. See [12] for a detailed discussion and several examples of the Hill plot. Hill (1975) proposed an estimator of ξ when ξ > 0. By ordering the data to their values as X(1,n) , X(2,n) , · · · , X(n,n) where X(1,n) ≥ X(2,n) ≥ · · · ≥ X(n,n) , the Hill’s estimator of the tail index ξ is obtained under the assumption that the distribution of large losses is of Pareto type P (X > x) = cx−1/ξ ,
ξ>0
Here also is a MLE procedure for ξ done and one get the estimator k 1X ξˆkn = ln X(i,n) − ln X(k+1,n) k i=1
9
A probabilistic measure of risk
where X(k+1) → ∞ as defined before is upper order statistics (the number of exceedance), n is the sample size, and α = 1/ξ the tail index. With this estimator we get the following estimator of the tail distribution k x Fˆ (x) = 1 − ( )−1/ξ n X(k+1,n)
for
x > X(k+1)
A Hill plot is constructed such that the estimated ξ is plotted as a function of k upper order statistics. A threshold is selected from the plot when the shape parameter become approximately stable( constant). Mason(1982) claims that the Hill estimator is a consistent estimator of ξ for fat-tailed distributions. 1/2 ˆ Hall(1982) and Goddie and smith (1987) claim that (ξ−ξ)k is asymptotical is Normally distributed with 2 zero mean and variance ξ . The same trade off between bias and variance arise in the choice of the number of the upper order statistics. if one choose a low threshold, the number of observations(exceedances) increases and the estimation becomes more precise. However, choosing a low threshold introduces some observations from the center of the distribution and the estimation becomes biased. Therefore a careful combination of several technics, such as the QQ-plot, the Hill-plot and the SMEF should be conspired in threshold determination. Until now we have just fitted the GPD to the conditional distribution of the excesses above a high threshold, we may also fit it to the tail of the original distribution above the high threshold [25]. For x ≥ u, i.e points in the tail of the distribution, F (x) = P {X ≤ x} = (1 − P (X ≤ u)Fu (x − u) + P (X ≤ u) We now know from the theorem (1) above that we can estimate Fu (x − u) by Gξβ (x − u) for u large. We can also estimate P (X ≤ u) from the data by Fn (u). The empirical distribution function evaluated at u. This means that for x ≥ u we can use the tail estimate Fb(x) = (1 − Fn (u))Gξβ (x − u) + Fn (u) This is equivalent to Nu x − u −1/ξˆ Fb(x) = 1 − (1 + ξˆ ) , n βˆ
f or
all
x≥u
To approximate the distribution function F (x). It can be shown [25] that Fb(x) is also a GP, with ˆ but with scale parameter β˜ = β(1 ˆ − Fn (u))ξˆ and location parameter the same shape parameter ξ, ˜ − Fn (u)) − 1)/ξ. ˆ µ ˜ = u − β((1 We gave a estimate of the upper tail of the marginal distribution the down tail is given then by using the same theory after inverting the sign of the data and using the equality for distribution function F (x) + F (−x) = 1 if we denote ud the positive right threshold and ug the negative left threshold we get for the left tail of the distribution the following estimator: Nug ug − x −1/ξ ˆ g , for all x ≤ u (1 + ξˆg ) Fb(x) = g ˆ n βg for some other parameters estimate of the left tail of the of the Pareto distribution. Here Nug denote the number of point below ug . When the estimates of the two tails are known, we estimate of the body of the distribution is given by the continuous sample df in equation (1.3.2), we can give an estimate of the marginal distributions. Let ud and ug be respectively the right and the left chosen threshold, and
10
H. Zmarrou
βˆd , ξˆd , βˆg , ξˆg the parameters estimate of the right and the left tails, and let ug = t1 < t2 < · · · < tk = ug be a equal partition of the segment [ug , ud ], the estimate of the marginal distribution is then given by: ! Ã ¶ k µ X N (x − tj )νj u − x ˆ u g g −1/ξg ˆ c b ) 1{x≤ug } + Fn (tj ) + 1{tj u, it’s giving by inverting the Pareto distribution estimated above βˆ n ˆ V\ aRp (X) = u + ( (1 − p)−ξ − 1). ξˆ Nu Expected Shortfall Another informative measure of risk is the expected shortfall (ES) which estimates the potential size of a loss that exceeds V aR, it’s calculate the expected loss given that this last exceeds V aR ESp = E[X|X > V aRp ]. Under the assumption that, above u, the (P&L) is a Pareto df , (ES) can be written as: ESp = V aRp + This is a direct consequence of the proposition (1).
ˆ aRp − u) βˆ + ξ(V 1 − ξˆ
Chapter 2
Data analysis We analyze the daily returns of the German stock index DAX and the daily change in the German implied volatility index VDAX (for a description of the implied volatility index VDAX, see [11] for the period from 02/01/1992 to 28/12/2001. The data set is coming from the Institut f¨ ur Mathematische Stochastik der Universit¨at Freiburg, Germany. As first information about the behavior of the underlying data can be obtained by producing the histograms of the data and we observe that in the tails the data are far from Gaussian figure (2.1). In this section plots in the left panel correspond to the logret and plots in the right panel correspond to the logvol.
Figure 2.1: Tails of the data ( lower part of the histograms), and the MLE fitted Gaussian density. Another useful tool is the quantile plot (QQ-plot), one plots the sample quantiles against the quantiles of a given distribution. Formally the plot is defined by the point cn (F
−1
(qi ), F −1 (qi )), i = 1, · · · , n
where F −1 (qi ) is the quantile function of a given distribution. If the sample data come from the family of distribution F , the plot will be close to a straight line.
2.1
Fitting the GPD
We first model the right part of both distributions, then we changed the sign of the logret and the logvol to use the EVT theory for the left tails, recall the EVT is defined in the right part of the plan. First we 12
A probabilistic measure of risk
13
will model the tails of both distributions which are more interesting for the risk management. The body of the distributions are as mentioned before non-parameterically estimated. Despite the appealing theoretical framework EVT provides, small sample issues pose problems when it comes to statistical inferences. The main problem is the selection of the high order statistic X(k+1) in the Hill estimator or the selection of the threshold in the GPD theory tells us that X(k+1) ; u should be high in order to satisfy theorem (1), but the higher the order statistics (the threshold) the less observations are left for the estimation of the parameter of the tails distributions functions. So far no automatic algorithm with satisfactory performance for the selection of the high order statistic of the threshold is available. Tools from exploratory data analysis prove helpful in approaching this problem and we will present them with our application. A graphical tool which is helpful tool for the selection of the threshold, which define the exceedance in our data set, is the sample mean excess function, this function is the sample version of the mean excess function as defined in the previous chapter, we have seen in proposition (1) that eX (u) is a linear function in u when X has a GPD df, in practice one plot the sample mean function given by Pn (Xi − u)+ eˆX (u) = Pni=1 i=1 1( Xi > u) which is plotted against u, and we pick u from the region which the (sedf) becomes more or less linear but not constant, see figure (2.2).
Figure 2.2: Sample mean execess plot of the logret, logvol. After using this plot to pick the upper threshold u or equivalently the high order statistic X(k+1) one obtains an estimators of the tails of the distribution using the equation (1.6). For the right tails of the distribution of the DAX en the VDAX. For the left tails of the distributions we use the same techniques by inverting the sign of the data. Figure (2.2) shows the mean excess plot corresponding to right part of our set of the data. From an inspection of the plots, we suggest trying the values ud = 0.0217, vd = 0.0896 or equivalently the (n − 107)th and the (n − 83)th upper orders statistics for de logret and the logvol respectively, and ug = 0.0218, vg = 0.0667 or equivalently the 125th and the 131th lower orders statistics for de logret and the logvol respectively with ud , ug correspond respectively to the right and left threshold for the logret data, vd , vg correspond respectively to the right and left threshold for the volatility shift. Another tool (Hill) to pick the number of order statistics is to plot (k, α ˆ kn ) and pick k of equivalently u from the region where the plot and therfore the estimator becomes stable.
14
H. Zmarrou
Figure(2.5) shows the Hill plot corresponding to the right tails of our data, and the plots indicates that the choice of the threshold with the help of the smedf was satisfactory.
Figure 2.3: Hill plot corresponding to right part of our data.
2.1.1
Estimation of the GPD parameters
Different methods can be used to estimate the parameters, the most uses one is the maximum likelihood estimator. The GPD parameters, for the associated excesses x ∈ GP Dξ,β , are estimated via maximum likelihood from the density: 1 1 ξ f (x) = (1 + x)−( ξ +1) (2.1) β β The pdf (2.1) has an associated log-likelihood maximization problem à ! Nu X ξ −1 max L((ξ, β) = −Nu ln β − (ξ + 1) ln(1 + Xi ) ξ,β β i=1
(2.2)
In (2.2), Nu defines the numbers of excesses,or exceedable, and Xi are the individual excesses above the estimated threshold. The parameters ξ and β are respectively the shape and the scaling factors of the distribution. The shape of the log-likelihood function is shown in Figure 5. The problem of the estimation with the MLE is that one has to maximize a function in two parameter that gives a hole region where the likelihood function is maximal and further more a comparison between the behavior of the MLE and the Hill estimator (Figure (2.5)) of the shape parameter gives us a clear signal which estimator we take. We estimate the shape parameter with help of the Hill estimator (Hill estimator is more stable). When an estimate of the shape is given we estimate the scale parameter with the MLE. In the following figure (2.4) we show the log-liklihood, in 3-D view (left) and the level curves corresponding to the upper area (where the the log-liklihood is maximal), generated on a much finer scale. The plot shows clearly the problem of estimating the two parameters when we use a MLE estimator.
The following figure (2.5) we plot the Hill estimator (solid line) corresponding to the 250 upper order statistics versus MLE (dashed) for the same number of order statistics. The Hill estimator of the exponent parameter ξ is clearly more stable then the MLE estimator of ξ.
A probabilistic measure of risk
15
Figure 2.4: The shape of the log-likelihood of Generalized Pareto Distribution.
Figure 2.5: The behavior of the MLE estimator (dashed) versus the Hill estimator (solid). We may uses a QQ-plot ( the sample quantiles above the thresholds against Gξˆβˆ quantiles) to visually check whether the data point satisfy the GPD assumptions (figure (2.6)), given this plot we may conclude that the fit was satisfactory. Figure (2.7) we give a look of the sample distributions functions above the thresholds and the fitted GPD distribution above this thresholds, the parameters of the GPD are estimated as described above. Given the point estimate of the parameters in the GPD for the logret, we can namely give a point estimate of the V aR for the losses at a given small probability say α = 0.9725 or α = 0.99 βˆ n ˆ V\ aRα (X) = u + ( (1 − α)−ξ − 1) ξˆ Nu
2.1.2
Bootstrap Confidence Intervals
If we admit that large-sample theory holds for our estimates, we can construct confidence intervals for ˆ For a confidence level 1 − 2p, percentile bootstrap confidence intervals the parameters estimates ξˆ and β. are defined by the de p and the 1 − p percentiles of the cumulative distribution function of the bootstrap ˆ bootstrap replications gives the replications of the statistics of the interest. For the shape estimate ξ, interval [ξˆlo , ξˆup ] = [ξˆ∗(p) , ξˆ∗(1−p) ]
16
H. Zmarrou
Figure 2.6: QQ-plot of the sample qunatile (logret and logvol ) above the thresholds versus the Gξˆβˆ quantiles.
Figure 2.7: The sample distributions functions above the thresholds (dotted) and the fitted GPD distribution above this thresholds (solid).
where ξˆ∗(p) is the 100p percentile of the bootstrap distribution of ξˆ∗ . We applied the bootstrap method ˆ Figure 2.8 shows the to generate 1000 samples. For each sample, we estimate the parameters ξˆ and β. ˆ The two plots show that the estimators parameters marginal densities of the bootstrap values for ξˆ and β. of the GPD are asymptotically Gaussian distributed. Table (2.1) we summarize some numerical results obtained from the DAX index and the VDAX index. Based on these results above we can also give an estimate to the risk measures introduced in the previous section for a given probability p.We compute for p = 0.99. The results in table (2.2) indicate that with probability 0.01 the tomorrow’s loss will exceed the value 3.69% and the corresponding loss, that is the averge loss in situation when the losses exceed 3.69%, is 4.77%. These point estimates are completed with a 95% confidence intervals. Thus with probability 0.01 dp measure appears the expected loss will, in 95 out of 100 cases, lies between 4.07% and 5.23%.The ES to be too expensive for financial institutions. It gives, however, a good description of the very extreme risk embedded in a financial market. It is well interesting to note that the upper bound of the confidence interval for the parameter ξug is such that the first and the second order moments are finite. This guarantees that the estimated expected
17
A probabilistic measure of risk
ˆ Figure 2.8: Empirical density and the fitted MLE Gaussian ξˆ (left) and β(right).
Parameters ξˆud βˆud ξˆug βˆu
Lower bound
Point estimate
Upper bound
g
0.216 0.008 0.244 0.007
0.265 0.010 0.313 0.008
0.338 0.011 0.366 0.010
ξˆvd βˆvd ξˆvg βˆv
0.261 0.023 0.234 0.021
0.325 0.030 0.298 0.027
0.405 0.038 0.361 0.031
g
Table 2.1: Point estimates and 95% bootsrap confidence intervals for eight parameters of the our two marginal distributions.(logret and logvol) . shortfall, which is a conditional first moment, exists.
2.1.3
Remarks About GPD Parameters
From the table (2.2) we can see from an examination of the different confidence intervals that from a statistical point of view the hypothesis that the shapes parameters (the ξ’s) are equal can be not rejected. We will need this remark when modelling the joint distribution of the two risk factors. Another important remark concerns the influence of the choice of the sample size on the parameter estimates, table (2.3) reproduces the point estimates and the bootstrap confidence intervals obtained from 10647 observation of the DAX index, i.e the entire sample from 04/01/1960 until 28/06/2002. We observe that the estimated values differ very little from values reported in table (2.2) which correspond to the estimates obtained from a subsample of 2507 observation. This indicate that this method Risk measures V\ aRp dp ES
Lower bound 3.21 4.07
Point estimate 3.69 4.77
Upper bound 3.91 5.23
Table 2.2: Point estimate of the two risk measures and a 95% bootstrap confidence interval .
18
H. Zmarrou
Parameters ξˆud βˆud ξˆug βˆu g
Lower bound 0.256 0.0060 0.326 0.0064
Point estimate 0.291 0.0070 0.356 0.0074
Upper bound 0.327 0.0080 0.386 0.0086
Table 2.3: Parameters estimates for the GPD when we consider de entire sample of the DAX index . V\ aR95% \ ES 95% V\ aR99% \ ES 99%
HE 2.18 3.17 3.29 3.97
Gaussian 2.08 2.78 2.61 3.05
GPD 2.20 3.14 3.49 4.27
Table 2.4: Comparaison of Risk measures given with help of three different methods. may be more accurate than for example the HE which depend strongly on the available data. In this paragraph we collect some risk measurement estimate: V aRp and ESp for different percentiles with the methods of historical simulation (HS), Gaussian (N ) and the generalized Parieto distribution(GPD) The Gaussian approximation provides, as expected, a poor approximation of the loss distribution. The empirical estimation approach is very simailar to the GPD methode at the 95%. At level 99% the things becomes to be different and the V \ aR99% using the HE gives an example of the relative inefficiency of this measue of risk. We also compared V aR estimated by the GPD method with the V aR proposed by the haircut methode. If we consider only seven stock movements, the haircut in this case is 3σdax where σdax is the standard deviation of the DAX index, which gives a loss of 3.99% which not differ from the expected loss with the \ V\ aR99% calculated with the GPD method, but it’s far from the ES 99% the expected loss given that the \ loss is execeded over the V aR99% . Sometimes the haircut is computed with the consideration of more than 3σdax , it’s may achieve until 10σdax movements of the stock, in this case the haircut is calculeted to 13.05%. If we consider the HE, this value will be never achieved, with the GPD method this value is achieved with a probability approximatelly equal to 0.00018 = 1, 8.10−4 , this means that with probability 1, 8.10−4 the tomorrow’s loss in the DAX index will exceed the 13.05%, if we assume the one trading year contains 250 trading days, the loss will exceed 13.05% ones every 72 years. That seems a very long period so the haircut computed with the consideration of 10σdax movements seems to be exaggerated. To close this section we give the looks of the marginal distribution and the density functions of the logret and the logvol. In this figure we have plotted the empirical distribution (points) function of the augmented by the estimated GPD tails (solid lines). In this figure we have plotted the histogram estimator of the body of the logret and logvol the augmented by the estimated GPD tails .
A probabilistic measure of risk
Figure 2.9: The distribution function of the logret and the distribution of the logvol.
Figure 2.10: The density function of the logret and the density of the logvol.
19
Chapter 3
Modelling the portfolio probability distribution Given the findings from our marginal estimation procedure, we will treat the problem similarly, i.e. a semi-parametric approach. The problem with the multivariate data and the extreme value theory is given, first, by the lack of a natural ordering of observation, i.e. the lack of a natural extrema classification, and second the extreme value theory required large samples to be sufficiently accurate.
3.1
Methodology
We suggest to estimate the bivariate density of the (X, Y ) = (logret, logvol) by, first, a data transformation, by means of a linear transformation of the data to extract the correlation structure between the two variables, next, we will make again a polar transformation to work with the polar coordinate figures (3.1) and (3.2). This generate two new random variables denoted (ϑ,ρ). Focussing on polar coordinate has the advantage that they can be analyzed in the half plan, defined by [−π, π[×[0, ∞). We will divide the transformed data into two parts, data below some threshold say r0 and data above this threshold. We will use the same approach as in the previous section as mentioned above. The density of the data below the threshold will be estimated using a bivariate histogram, the density of the data above the threshold which are more interesting for the risk management will be treated more accurately using a parametric model based on the EVT. As mentioned before the lack of the a natural ordering of the observation will be solved when we use a dimension reduction technique by projecting data above r0 on the ace [r0 , ∞).
3.2
Modelling the tail of the bivarirate density
To study the behavior of the tails of the bivariate density and their dependency on the angle we consider, (k+1)π we divide the plan on six part [ϕ + kπ [, for k = 1, · · · , 6, and ϕ ∈ [−π, π[, a smaller division of 3 ,ϕ+ 3 the plan will give a more precise result but this will decrease the number of observations on each part we consider, this will have certainly an influence on the estimation procedure. Consider the bivariate vector (X, Y ) = (logret, logvol), and put U = aX and V = bX + cY , we choose a, b and c such that the two variables U and V are not correlated and such that the variance of the two variables is equal to one and 20
21
A probabilistic measure of risk
Figure 3.1: Scatter plot of the original data (logret, logvol) (left), scatter plot of the data after a linear transformation(right).
we apply a polar transformation ρ=
p
U 2 + V 2,
ϑ = arctan
V U
where ϑ is defined on [−π, π[+2kπ and r on [0, ∞) see figures (3.1) and (3.2). For each partition [ϕ + (k+1)π kπ [, we use also the notation arctan(u, v) which gives the arc tangent of uv , taking account 3 ,ϕ + 3 which quadrant the point (u, v) is in. We project the data points on the r-axis, one get a univariate data set with coordinate on r, for r ≥ r0 we use the EVT to give an estimate of the tail of the distribution of the data points on this partition. This estimate is completely determinate by the estimators of the two parameters ξ and β, i.e. the shape and the scale parameters.
3.2.1
Data analysis
We analyze the daily returns of the German index DAX and the correspond daily volatility shift in the VDAX for the period 02/01/1992 to 28/12/2001, a scatter plot (figures (3.1) and (3.2)) are helpful graphical tools to have a first impression about the dependency structure of the data. We transform the data as describes above, we plot the data and one get the two transformed plots figures (3.1) and (3.2) (k+1)π . We choose a ϕ ∈ [−π, π[, and we divide the plan on six part (see figure 13) [ϕ + kπ [ for 3 ,ϕ + 3 k = 1, · · · , 6 ,we project on the [0, ∞) axis and for each data part we choose a threshold based on the thecnics introduced before and we use the EVT to estimate the tail parameters, for ϕ = −π and an equal threshold r0 = 2.5 one get the parameters estimates in table (3.1).
For another choice of ϕ ∈ [−π, π[, and a smaller threshold r1 = 2.3, one gets the following estimates for ϕ = −π/6 :
3.2.2
Remarks and consequences
Through the analysis of the data in the previous sections and from above we have seen that the estimations of the parameters of the GPD are not a easy task, it’s dependent on several considerations, threshold
22
H. Zmarrou
data points -π ≤ θ < −2π/3 -2π/3ϑ < −π/3 -π/3 ≤ ϑ < 0 0≤ ϑ < π/3 π/3 ≤ ϑ < 2π/3 2π/3 ≤ ϑ < π
sample size 381 426 446 486 365 403
nr.exc 25 18 16 37 27 39
ξ = 1/α 0.29 0.34 0.18 0.25 0.35 0.25
C.I [0.19,0.41] [0.20,0.51] [0.12,0.31] [0.18,0.34] [0.22,0.46] [0.18,0.33]
β 0.74 0.87 0.49 0.63 0.91 0.63
C.I [0.47,1.09] [0.50,1.33] [0.31,0.79] [0.44,0.89] [0.60,1.28] [0.44,0.85]
Table 3.1: Point estimates of the shape and scale parameters for each part of the data and the 95% bootstrap confidence intervals based on 1000 generated samples.
Figure 3.2: plot of the transformed data in the polar coordinate .
choice, sample size, etc. As consequence of the analysis done two tables above we conclude that the hypothesis of the independence of the exponent (shape) ξ and the scale parameter β on the angle we consider can not be rejected, we make this assumption in the rest of this section and we will choose a common shape and scale parameter of the tail of the density function of our data. We will make a more conservative choice of the two parameters ξ and β, taking ξˆ = 0.27 and βˆ = 0.70 as common values of the GPD parameters. This choice of ξ for our data is relatively big, that gives a not optimistic model about the tail of the distribution (heavy tailed distribution) The assumption above (independence of the exponent and the scale parameter on the angle ϑ) is equivalent to the independence of the two random variables ϑ and ρ. The joint density of the vector (ϑ,ρ) above the threshold r0 may be written as the product of the two marginal distributions corresponding to ϑ and . Formally if we denote fa the joint density of the vector (ϑ,ρ) above the threshold r0 = 2.5, g the density
data points -π/6 ≤ ϑ < π/6 π/6 ≤ ϑ < π/2 π/2 ≤ ϑ < 5π/6 5π/6 ≤ ϑ < −5π/6 -5π/6 ≤ ϑ < −π/2 -π/2 ≤ ϑ < −π/6
sample size 491 428 361 402 381 444
nr.exc 36 44 45 38 26 25
ξ = 1/α 0.20 0.33 0.31 0.31 0.22 0.26
C.I [0.14,0.32] [0.23,0.43] [0.21,0.38] [0.22,0.42] [0.15,0.31] [0.17,0.39]
β 0.45 0.76 0.72 0.72 0.51 0.61
C.I [0.31,0.74] [0.56,0.99] [0.54,0.96] [0.52,0.98] [0.33,0.77] [0.37,0.96]
Table 3.2: Point estimates of the shape and scale parameters for each part of the data and the 95% bootstrap confidence intervals based on 1000 generated samples for another partition of the plan.
23
A probabilistic measure of risk
of the random variable ϑ, hξ,β the density of the random variable ρ, we may write fa :
[−π, π[×[r0 , ∞) → [0, ∞) fa (θ, r) = g(θ)hξ,β (r)
Modelling the density of ϑ may be achieved using a kernel estimator or a orthogonal series estimator, for the other marginal density we will use the EVT theory. The marginal distribution of the random variable ρ may be fitted above r0 using the GPD µ ¶− 1ˆ −1 ξ 1 r − r0 ˆ hξ, 1+ξ ˆ βˆ (r) = ˆ ˆ β β We use a kernel estimator in estimating the density of ϑ, the kernel estimator with kernel K is defined by gˆ(θ) = (nh)−1
n X i=1
µ K
θ − ϑi h
¶
where h is the window width, also called the smothing parameter, ϑi are the realizations of the random variable ϑ. The kernel estimator is a sum of ’bumps’ placed at the observations. The kernel function determine the shape of the bumps while the window width determine their width. The choice of h and K for our data set will be done according to criterias treated in [28] as the Integrated Squared Error(ISE), Z ISE(h) = {ˆ g (x) − g(x)}2 dx The (ISE) is appropriate if we are only concerned with the data set at the hand, but it does not take into account other possible data set from the density f . Therefore it’s will be more appropriate to take the expected value of the random quantity (ISE) the Mean Integrated Squared Error(MISE), Z M ISE(h) = Eb {ˆ g (x) − g(x)}2 dxc The problem which arise using this type estimator is that the domain of definition of our density is a bounded interval namely [−π, π[, while the kernel gives weight to the numbers outside this interval. One possible way of ensuring that gˆ(θ) is zero for θ outside [−π, π[ is simply to calculate the estimate for θ ∈ [−π, π[ ignoring the boundary conditions. Then, we set gˆ(θ) to zero for θ outside [−π, π[. A drawback of this approach is that if we use a the kernel method, which produces estimates which are probability densities, the estimates obtained will no longer integrate to unity. R To make matters worse, the contribution to gˆ(θ)dθ of points near −π and π will be much less than that of points well away from the boundary, and so, even if the estimate is rescaled to make it a probability density. the weight of the distribution near zero will be underestimated. A possible approach is to wrap the kernel round the circle. Computationally it may be simpler to augment the data set by replicating shifted copies of it on the intervals [−3π, −π] and [π, 3π], to obtain the set ϑ1 − 2π, ϑ2 − 2π, · · · , ϑn − 2π, ϑ1 , ϑ2 , · · · , ϑn , ϑ1 + 2π, ϑ2 + 2π, · · · , ϑn + 2π which we denote again ϑ1 , ϑ2 , · · · , ϑn , ϑn+1 , · · · , ϑ2n , ϑ2n+1 , · · · , ϑ3n
24
H. Zmarrou
in principle we should continue to replicate on intervals further away from [−π, π[, but that is not necessary. Applying the kernel method to the augmented data set will give an estimate on [−π, π[ which has the required boundary property; the window width should be based on the sample size n and not 3n. If a kernel estimate g ? is constructed from this data set of size 3n, then an estimate based on the original data can be given by putting ½ gˆ(θ) =
3g ? 0
if θ ∈ [−π, π[ . otherwise
for a detailed analysis of this estimator see [28]. Following the analysis of Silverman we choose the 2 Gaussian kernel K(x) = √12π exp − x2 , and for the window width we choose the value h = 1. This choice of the two parameters K and h minimize the MISE. We may thus write the density of the tails as fa (θ, r) = n
−1
µ ¶− 1ˆ −1 ξ ¢ ¡ 1 r − r0 2 ˆ 0.75 1 − (θ − ϑi ) 1{|θ−ϑi | 2.5 the density estimate µ ¶ 3n X 1 arctan(u, v) − ϑi √ fa (u, v) = K × ˆ h u2 + v 2 i=1 βnh 3
Ã
√
u2 + v 2 − r0 1 + ξˆ βˆ
!− 1ˆ −1 ξ
ˆ ξˆ as given above , √ 1 where n, h, K, β, is the determinant of the Jacobi-matrix of φ−1 and arctan(u, v) = u2 +v 2 v arctan u . To get an estimate of the joint density of the vector (X, Y ) = (logret, logvol), recall the vector (U, V ) is a linear transformation of the vector (X, Y ), U = aX and V = bX + cY , using the same theorem as above, this density estimate may for a2 x2 + (bx + cy)2 ≥ r02 be written as ¶ µ 3n X 1 (ac) arctan(ax, bx + cy) − ϑi p fa (x, y) = K ˆ h a2 x2 + (bx + cy)2 βnh i=1
µ ¶− 1ˆ −1 √ 2 2 ξ a x +(bx+cy)2 −r0 ˆ × 1+ξ ˆ β
25
A probabilistic measure of risk
and zero if a2 x2 + (bx + cy)2 < r02 . As before, here is ac − b is the the determinant of the transformation (x, y) 7→ (u, v) = (ax, bx + cy) we have calculated the scalars a, b and c and we have got the values :a ≈ 76.57, b ≈ 54.67 and c ≈ 26.70. We have plotted this function in figure (3.10 ).
3.3
Modelling the interior of the bivariate density
As mentioned above we will use a bivariate histogram, we work with the polar condensates, the histogram is determined by a partition of the plan [0, r0 ] × [−π, π[. Consider a partition of the plan as follow : The r ace is partitioned on five regular partitions (intervals) [r1 , r2 [, [r2 , r3 [, · · · , [r5 , r6 [, recall that r6 = r0 = 2.5, for each partition on the r ace we consider another partition of the θ ace, we will take a different partitions corresponding to different intervals [ri , ri+1 [, i = 1, 2, · · · , 6, we do that because the distribution of the data becomes more depending on the angle θ when r becomes bigger and bigger. For the first interval i.e [r1 , r2 [ we consider the hole interval i.e [−π, π[, for the second intrval [r2 , r3 [ we consider a regular partition of length 2π/3, for the third interval we consider a regular partition of length π/2, a partition of length π/3 for the fourth interval and finally a partition of length π/4 for the last interval. This produces 22 no regular rectangles denoted Bk k = 0, 1, · · · , 21. Each Bk contains νk points and Pk i=0 νk = m where m is the sample size below the threshold r0 . Then the density is given by fˆb (r, θ) =
νk mArea(Bk )
f or
(r, θ) ∈ Bk
(3.2)
the density of the data on [0, r0 ] × [−π, π[ may be written as fˆb (r, θ) =
k X 0
νk 1B (r, θ) mArea(Bk ) k
(3.3)
where Area(Bk ) is the area of the rectangle Bk . From the polar coordinates we pass back to the Cartesian coordinates using the transformation theorem. Denote each Bk the rectangle [ri , ri+1 [×[θk , θk+1 [ where ri , θk as defined above, an estimate of the density of the vector (U, V ) on the disk of radius r0 = 2.5 may be written as fb (u, v) =
k X 0
νk 1C (u, v) mArea(Ck ) k
where Ck is a part of a ring defined as 2 Ck = {(u, v) : ri2 ≤ u2 + v 2 < ri+1
and
tan θk ≤
v < tan θk+1 } u
We can now give a joint density estimate of the original data i.e (X, Y ) = (logret, logvol) if a2 x2 + (bx + cy)2 < r02 : k X νk 1Ek (x, y) (3.4) fb (x, y) = mArea(E k) 0 where the Ek are defined as 2 Ek = {(x, y) : ri2 ≤ a2 x2 + (bx + cy)2 < ri+1
and
tan θk ≤
bx + cy < tan θk+1 } ax
26
3.3.1
H. Zmarrou
Conclusion
We have now arrived at a stage where we can give an estimate of the bivariate joint density of the two random variables (logret, logvol). If we denote by N the total sample size, i.e. N = n + m, an estimate of the joint density of (X, Y) for (x, y) ∈ R² may be written as
\[
f_{X,Y}(x, y) = \frac{m}{N}\, f_b(x, y) + \frac{n}{N}\, f_a(x, y)
\]
with f_b as given by equation (3.4) and f_a as given in Section 3.2. Plots of this density function, and of the density in polar coordinates, are shown in the figures at the end of this work.
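For completeness, a one-function sketch of this combination (the names are ours; f_a and f_b stand for the two estimates constructed above):

```python
def f_joint(x, y, n, m, f_a, f_b):
    """Semi-parametric joint density: interior histogram below the threshold,
    kernel-times-GPD estimate in the tail, weighted by the sample proportions."""
    N = n + m
    return (m / N) * f_b(x, y) + (n / N) * f_a(x, y)
```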
3.4 Applications and concluding remarks
We have presented in this work a semi-parametric analysis of the historical co-movement of the implied volatility on the VDAX index and the DAX index. Based on extreme value theory, the parametric approach used to model the tails of the joint behaviour of the underlying and its implied volatility allows us to extend the analysis beyond the available sample. With the joint density of (logret, logvol) in our hands, one can deduce several interesting implications and potential applications, some of which we enumerate below.
3.4.1 Associate a probability measure to the lattice representation of the risk matrix
We have presented in the introduction the lattice representation of the risk matrix used to estimate the maximum expected loss of a portfolio. We have also mentioned that the haircut is calculated without any consideration of the different possible outcomes of the changes in the risk factors (logret, logvol) at the end of the holding period (one night). With the estimated joint density of the vector (logret, logvol), one may associate a probability measure to the possible outcomes. We can thus exclude from the simulated risk matrix those outcomes (cells) which occur with a very small probability (when the implied volatility and the underlying move strongly in the same direction), and include more cells, namely those which occur with higher probability and which the original matrix representation does not take into account. This gives another representation of the risk matrix: instead of calculating an (n × m) matrix, where n is the number of stock movements we consider and m the number of volatility movements, one only has to simulate a restricted number of entries of the matrix and may ignore the other entries, which carry a negligible probability. One obtains a multidiagonal matrix; see the representation in Table 8 below.
        −5σs   −4σs   −3σs   −2σs   −σs    S0     +1σs   +2σs   +3σs   +4σs    +5σs
+4σv    PL11   PL12   PL13   PL14   PL15   PL16   PL17   ×      ×      ×       ×
+3σv    PL21   PL22   PL23   PL24   PL25   PL26   PL27   PL28   ×      ×       ×
+2σv    PL31   PL32   PL33   PL34   PL35   PL36   PL37   PL38   PL39   ×       ×
+σv     PL41   PL42   PL43   PL44   PL45   PL46   PL47   PL48   PL49   PL4,10  ×
V0      PL51   PL52   PL53   PL54   PL55   PL56   PL57   PL58   PL59   PL5,10  PL5,11
−σv     ×      PL62   PL63   PL64   PL65   PL66   PL67   PL68   PL69   PL6,10  PL6,11
−2σv    ×      ×      PL73   PL74   PL75   PL76   PL77   PL78   PL79   PL7,10  PL7,11
−3σv    ×      ×      ×      PL84   PL85   PL86   PL87   PL88   PL89   PL8,10  PL8,11
−4σv    ×      ×      ×      ×      PL95   PL96   PL97   PL98   PL99   PL9,10  PL9,11

Table 8: Risk matrix representation corrected. (3.5)
In this risk matrix representation, the sign (×) means that the calculation at that point is not necessary. The new representation is motivated by the following scatter plot, generated with 100.000 data points simulated from the estimated joint density.
Figure 3.3: Scatter plot of 100.000 simulated data points from the estimated density function.
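One possible way to attach probabilities to the matrix cells is to bin such simulated draws from the estimated joint density on the σ-grid of the matrix; the sketch below does this with a nearest-cell assignment. The function name, the grid ranges and the tolerance 1e-4 are illustrative choices of ours, not prescriptions of the thesis.

```python
import numpy as np

def cell_probabilities(logret, logvol, sigma_s, sigma_v,
                       stock_steps=range(-5, 6), vol_steps=range(-4, 5)):
    """Estimate the probability of each risk-matrix cell from simulated
    (logret, logvol) draws: cell (j, i) collects the draws whose stock move
    rounds to i*sigma_s and whose volatility move rounds to j*sigma_v."""
    logret = np.asarray(logret)
    logvol = np.asarray(logvol)
    i_stock = np.clip(np.rint(logret / sigma_s), min(stock_steps), max(stock_steps))
    j_vol = np.clip(np.rint(logvol / sigma_v), min(vol_steps), max(vol_steps))
    probs = np.zeros((len(vol_steps), len(stock_steps)))
    for row, j in enumerate(vol_steps):
        for col, i in enumerate(stock_steps):
            probs[row, col] = np.mean((i_stock == i) & (j_vol == j))
    return probs

# Cells whose estimated probability falls below a chosen tolerance can be
# marked with an "x" and skipped when the haircut is calculated, e.g.:
# skip = cell_probabilities(sim_ret, sim_vol, sigma_s, sigma_v) < 1e-4
```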
3.4.2 Deriving the distribution of a portfolio (P&L distribution)
Given a portfolio consisting of diverse put and call options and a number of stocks, one may simulate a sufficiently large number of possible outcomes of (logret, logvol) from the joint density estimated above, and simulate possible stock prices from the distribution of logret estimated in Chapter 1. Given the relation between the implied volatility, the stock price and the option prices, we can then simulate a large number of possible values of the underlying portfolio, which can be used to give a good approximation of the distribution of the return of the portfolio.
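A minimal simulation sketch follows, assuming a toy portfolio of one stock plus one European call repriced with the Black-Scholes formula (the thesis does not commit to a particular pricing model), and taking logvol to be the log-change of the implied volatility; all names and parameter choices are illustrative assumptions of ours.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def simulate_pnl(logret, logvol, S0, sigma0, K, T, r, n_stocks, n_calls):
    """One-night P&L of a toy portfolio (n_stocks shares + n_calls calls)
    under simulated risk-factor moves (logret, logvol)."""
    S1 = S0 * np.exp(logret)            # new stock price
    sigma1 = sigma0 * np.exp(logvol)    # new implied volatility (our convention)
    v0 = n_stocks * S0 + n_calls * bs_call(S0, K, T, r, sigma0)
    # maturity shortened by one trading day over the holding period
    v1 = n_stocks * S1 + n_calls * bs_call(S1, K, T - 1 / 252, r, sigma1)
    return v1 - v0
```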
Figure 3.4: The estimated density of the random variable R (left) and of the random variable Θ (right).
Once we have approximated the distribution of the return of the portfolio, we may use the risk measures defined in the first part of this work (VaR and ES) to give a good approximation of the loss of a financial position that would be exceeded only with a small probability 1 − p (to be chosen in discussion with FORTIS CLEARING). Given p, we may write VaR_p(X) = F_X^{-1}(p), where X represents the random return of the underlying portfolio and F_X its distribution function.
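Empirically, these risk measures can be read off the simulated return sample; the sketch below uses the lower-tail quantile convention of the formula above, and the function name is ours.

```python
import numpy as np

def var_es(returns, p=0.01):
    """Empirical Value-at-Risk and Expected Shortfall at level p from a
    sample of simulated portfolio returns: VaR is the p-quantile of the
    return distribution, ES the mean of the returns beyond it."""
    returns = np.asarray(returns, dtype=float)
    var = np.quantile(returns, p)           # F_X^{-1}(p)
    es = returns[returns <= var].mean()     # average of the worst outcomes
    return var, es
```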
Figure 3.5: The estimated joint density of the interior of the data in polar coordinates.
Figure 3.6: The estimated joint density of the random vector (R, Θ) (tails of the joint density in polar coordinates).
Figure 3.7: The estimated joint density of the interior of the original data set (logret, logvol).
Figure 3.8: The estimated joint density of the data in polar coordinates.
Figure 3.9: The estimated tails of the joint density of the original data set (logret, logvol).
Figure 3.10: The estimated joint density of the original data set (logret, logvol). N.B.: the plot in figure 3.10 is truncated from above.
Bibliography

[1] T.W. Anderson, An Introduction to Multivariate Statistical Analysis, second edition. Wiley, New York 1984.
[2] A. Azzalini, Statistical Inference Based on the Likelihood. Chapman and Hall, London 1995.
[3] A.A. Balkema, L. de Haan, Residual life time at great age. Ann. Prob. 2 (1974), 792-804.
[4] A.A. Balkema, S.I. Resnick, Max-infinite divisibility. J. Appl. Prob. 14 (1977), 309-319.
[5] J. Beirlant, J. Teugels, P. Vynckier, Practical Analysis of Extreme Values. Leuven University Press, Leuven 1996.
[6] J.S. Butler, Estimating Value-at-Risk with a precision measure by combining kernel estimation with historical simulation. Working paper (1997), Vanderbilt University, Nashville, TN, USA.
[7] G. Consigli, G. Frascelle, Financial modelling based on extreme value analysis. Working paper (2001), EWGFM 29th Meeting, Haarlem, 15-17 November 2001.
[8] R. Cont, Empirical properties of asset returns: stylized facts and statistical issues. Working paper (2000), Centre de Mathématiques Appliquées, Ecole Polytechnique, France.
[9] R. Cont, Stochastic models of implied volatility surfaces. Working paper, Centre de Mathématiques Appliquées, Ecole Polytechnique, France.
[10] J. Danielsson, C.G. de Vries, Value at risk and extreme returns. Journal of Empirical Finance 4 (1997), 241-257.
[11] Deutsche Börse, http://deutsche-boerse.com/dbag/dispatch/de/kir/gdbnavigation/home.
[12] P. Embrechts, C. Klüppelberg, T. Mikosch, Modelling Extremal Events for Insurance and Finance. Springer-Verlag, Berlin 1999.
[13] P. Embrechts, A. McNeil, D. Straumann, Correlation: pitfalls and alternatives. Risk Magazine (1999), May, 69-71.
[14] P. Embrechts, F. Lindskog, A. McNeil, Modelling Dependence with Copulas and Applications to Risk Management. Working paper (2001), Department of Mathematics, ETHZ, Zürich, Switzerland.
[15] S. Emmer, C. Klüppelberg, C. Trüstedt, VaR: a measure for the extreme risk. Working paper (2001), Institut für Angewandte Mathematik der Universität Würzburg.
[16] R.A. Fisher, L.H.C. Tippett, Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Cambridge Philos. Soc. 24 (1928), 180-190.
[17] B.V. Gnedenko, Sur la distribution limite du terme maximum d'une série aléatoire. Ann. Math. 44 (1943), 423-453.
[18] X. Huang, Statistics of Bivariate Extreme Values. PhD thesis, Erasmus University Rotterdam, The Netherlands.
[19] E. Jondeau, M. Rockinger, The tail behavior of stock returns: emerging versus mature markets. HEC and Banque de France (1999).
[20] P. Kofman, Catastrophic risk measurement for asset portfolios. Working paper (1997), Department of Econometrics, Monash University, Melbourne, Australia.
[21] C. Kooperberg, Bivariate density estimation with an application to survival analysis. Journal of Computational and Graphical Statistics 7 (1998), 322-341.
[22] A.J. McNeil, Calculating quantile risk measures for financial time series using extreme value theory. Department of Mathematics, ETHZ, Zürich, Switzerland.
[23] A.J. McNeil, T. Saladin, Developing scenarios for future extreme losses using the POT method. In: Extremes and Integrated Risk Management (2000), edited by P. Embrechts, RISK Books, London.
[24] R.B. Nelsen, An Introduction to Copulas. Springer, New York 1999.
[25] R.D. Reiss, M. Thomas, Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields. Birkhäuser Verlag, Basel 2001.
[26] RiskMetrics Group, RiskMetrics Technical Document. www.riskmetrics.com/research/techdocs.
[27] D.W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, New York 1992.
[28] B.W. Silverman, Density Estimation for Statistics and Data Analysis. Chapman and Hall, London 1986.
[29] S. Straetmans, Extreme financial returns and their comovements. PhD thesis, Erasmus University Rotterdam, The Netherlands 1998.
[30] Y.L. Tong, The Multivariate Normal Distribution. Springer, New York 1990.
[31] M.P. Wand, M.C. Jones, Kernel Smoothing. Chapman and Hall, London 1995.