Volatility Modelling with a Generalized t Distribution

Andrew Harvey and Rutger-Jan Lange
Faculty of Economics, Cambridge University and Econometric Institute, Erasmus University, Rotterdam.
September 29, 2016

Abstract

Exponential generalized autoregressive conditional heteroscedasticity (EGARCH) models in which the dynamics of the logarithm of scale are driven by the conditional score are known to exhibit attractive theoretical properties for the t distribution and general error distribution (GED). A model based on the generalized t includes both as special cases. We derive the information matrix for the generalized t and show that, when parameterized with the inverse of the tail index, it remains positive definite in the limit as the distribution goes to a GED. We generalize further by allowing the distribution of the observations to be skewed and asymmetric. Our method for introducing asymmetry ensures that the information matrix reverts to the usual case under symmetry. We are able to derive analytic expressions for the conditional moments of our EGARCH model as well as the information matrix of the dynamic parameters. The practical value of the model is illustrated with commodity and stock return data. Overall the approach offers a unified, flexible, robust and effective treatment of volatility.

Keywords: Asymmetry; dynamic conditional score (DCS) model; partially adaptive estimation; robustness; tail index.

1 Introduction

The generalized Student t distribution contains the general error distribution (GED), also known as the exponential power distribution (EPD), and the Student t distribution as special cases. It was introduced by McDonald and Newey (1988), who proposed using it for static regression models, and it was subsequently employed by Bollerslev et al (1994) for volatility models. The additional flexibility of the generalized t enables it to capture a variety of shapes at the peak of the distribution as well as in the tails. This flexibility goes a long way towards meeting the objection that parametric models are too restrictive and hence vulnerable to misspecification. Indeed McDonald and Newey argued that the flexibility of the generalized t model made it ‘partially adaptive’. They highlighted the robustness of the generalized t for models of location, observing that, as with the t distribution, the score or influence function is redescending (except in the limiting GED case) and so is resistant to outliers. Unlike researchers such as Theodossiou and Savva (2014), who have used GARCH models with the generalized t distribution, we have the dynamics driven by the conditional score: hence the robustness inherent in the generalized t extends to all parts of the model.

Models in which the dynamics are driven by the conditional score have been developed by Creal, Koopman and Lucas (2011, 2013), where they are called Generalized Autoregressive Score (GAS) models, and Harvey (2013), where they are called Dynamic Conditional Score (DCS) models. The DCS EGARCH model is set up as

$$y_t = \mu + \varepsilon_t \exp(\lambda_{t|t-1}), \qquad t = 1, \ldots, T, \tag{1}$$

where the $\varepsilon_t$'s are independently and identically distributed with location zero and unit scale. The stationary first-order dynamic model for $\lambda_{t|t-1}$, the logarithm of the scale, is

$$\lambda_{t+1|t} = \omega(1-\phi) + \phi\lambda_{t|t-1} + \kappa u_t, \qquad |\phi| < 1, \tag{2}$$

where $u_t$ is the score of the distribution of $y_t$ conditional on past observations, $\lambda_{1|0} = \omega$, and $\phi$ and $\kappa$ are parameters, with $\omega$ denoting the unconditional mean.

The conditional score is an influence function which downweights extreme observations when the conditional distribution is fat tailed. Letting the conditional distribution be Student's t leads to a model known as Beta-t-EGARCH. This model has now been widely applied and shown to be more attractive than the standard GARCH t model both from the practical and theoretical points of view. The EGARCH formulation has the obvious attraction that scale remains positive and that stationarity conditions are straightforward - in (2) we simply require that $|\phi| < 1$. When combined with conditional score dynamics, the asymptotic theory for the ML estimators is relatively straightforward and other results, such as formulae for moments and autocorrelations, may also be derived. Letting the conditional distribution be GED leads to the Gamma-GED-EGARCH model. The use of the generalized Student t distribution gives what we will call Beta-Gen-t-EGARCH. This model has all the theoretical advantages of Beta-t-EGARCH, but it includes Gamma-GED-EGARCH as a limiting case.

The probability density function (PDF) for a standardized generalized t variable, symmetric around zero with unit scale, is

$$f(\varepsilon_t) = K(\eta,\upsilon)\left(1 + |\varepsilon_t|^{\upsilon}/\eta\right)^{-(\eta+1)/\upsilon}, \qquad -\infty < \varepsilon_t < \infty, \tag{3}$$

where $\eta$ and $\upsilon$ are positive shape parameters and

$$K(\eta,\upsilon) = \frac{\upsilon}{2\eta^{1/\upsilon}}\,\frac{1}{B(1/\upsilon, \eta/\upsilon)}, \tag{4}$$

with $B(\cdot,\cdot)$ denoting the beta function. Setting $\upsilon = 2$ gives a t distribution with $\eta$ degrees of freedom, whereas GED($\upsilon$) is obtained when $\eta \to \infty$. The Laplace or double exponential distribution is GED(1), while GED(2) is the normal. The parameter $\eta$

is the tail index, with a range 0
0 is the tail index. All fat tailed distributions are heavy-tailed, but not the converse. Thin (thick) tailed distributions have smaller (larger) kurtosis than a normal distribution.
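The density in (3) and (4) is straightforward to evaluate numerically. The following is a minimal sketch rather than the authors' code; the function names `log_beta`, `gen_t_pdf` and `student_t_pdf` are hypothetical, and the arguments `eta` and `upsilon` stand for the tail index and the second shape parameter. It works on the log scale to avoid overflow in the beta function and compares the case `upsilon = 2` with the textbook Student t density.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def gen_t_pdf(eps, eta, upsilon):
    """Density (3) of a standardized generalized t variable.

    eta is the tail index and upsilon the second shape parameter.
    """
    # Normalizing constant K(eta, upsilon) from (4), on the log scale.
    log_k = (math.log(upsilon) - math.log(2.0)
             - math.log(eta) / upsilon
             - log_beta(1.0 / upsilon, eta / upsilon))
    log_kernel = -(eta + 1.0) / upsilon * math.log1p(abs(eps) ** upsilon / eta)
    return math.exp(log_k + log_kernel)


def student_t_pdf(x, nu):
    """Textbook Student t density with nu degrees of freedom (for comparison)."""
    c = math.gamma((nu + 1.0) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)


# With upsilon = 2 the two densities should coincide.
for x in (0.0, 0.7, 3.0):
    print(x, gen_t_pdf(x, 5.0, 2.0), student_t_pdf(x, 5.0))
```

The printed pairs agree up to rounding error, which illustrates the statement that setting the second shape parameter to two gives a t distribution.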

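In the Beta-Gen-t-EGARCH model, $y_t$ in (1) has a conditional generalized t distribution with mean $\mu$ and scale $\exp(\lambda_{t|t-1})$. The conditional score in (2) is

$$u_t = \partial \ln f_t/\partial\lambda_{t|t-1} = (\eta+1)b_t - 1, \tag{5}$$

where $f_t = f(y_t)$ and

$$b_t = \frac{(|y_t-\mu|e^{-\lambda_{t|t-1}})^{\upsilon}/\eta}{(|y_t-\mu|e^{-\lambda_{t|t-1}})^{\upsilon}/\eta + 1} = \frac{|\varepsilon_t|^{\upsilon}}{|\varepsilon_t|^{\upsilon}+\eta}. \tag{6}$$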
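The way the score in (5) and (6) drives the recursion in (2) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `score_u` and `filter_log_scale` are hypothetical, the filter is started at $\lambda_{1|0} = \omega$ as in the text, and any parameter values passed in are for illustration only.

```python
import math


def score_u(y, mu, lam, eta, upsilon):
    """Conditional score (5): u_t = (eta + 1) * b_t - 1, with b_t as in (6)."""
    z = (abs(y - mu) * math.exp(-lam)) ** upsilon / eta
    b = z / (z + 1.0)
    return (eta + 1.0) * b - 1.0


def filter_log_scale(ys, mu, omega, phi, kappa, eta, upsilon):
    """Run recursion (2), started at lambda_{1|0} = omega.

    Returns the list of lambda_{t|t-1} for t = 1, ..., T.
    """
    lam = omega
    path = []
    for y in ys:
        path.append(lam)
        u = score_u(y, mu, lam, eta, upsilon)
        lam = omega * (1.0 - phi) + phi * lam + kappa * u
    return path


# The score lies between -1 (at y = mu) and eta (as |y - mu| grows).
print(score_u(0.0, 0.0, 0.0, 5.0, 2.0))
print(score_u(1e12, 0.0, 0.0, 5.0, 2.0))
```

Because $b_t$ lies between zero and one, a single extreme observation can move the log-scale by at most $\kappa\eta$ in one step, which is the robustness property the text describes.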

As $|y_t| \to \infty$, $u_t \to \eta$, so the score is bounded for finite $\eta$. As a result it yields an influence function - or, as it is sometimes called in the volatility literature, news impact curve - that is relatively insensitive to outliers. An exact expression for the information matrix of the dynamic parameters can be constructed in much the same way as was done for the Beta-t-EGARCH and GB2 location/scale models in Harvey (2013, sub-sections 4.6 and 5.3).

Further flexibility in the generalized t can be achieved by introducing skewness and/or creating asymmetry where the shape parameters are allowed to differ on either side of the location parameter. Asymmetric GED and Student t distributions were studied by Zhu and Zinde-Walsh (2009) and Zhu and Galbraith (2010) respectively. Our approach is slightly different in that we obtain a full information matrix that reverts directly to the usual information matrix under symmetry.

The additional flexibility of the distribution has a potential cost in that it may become more difficult to estimate unknown parameters, particularly shape parameters. The extent to which this is a problem for maximum likelihood (ML) estimation can be partially determined by examining asymptotic standard errors and the correlations between estimates. Having an explicit expression for the information matrix can give considerable insight. For example, it turns out that estimating a skew parameter has a relatively small effect on other shape parameters.

The plan of the paper is as follows. Section 2 gives the scores and information matrix for the generalized t distribution. We then derive the limiting score and information matrix as the inverse tail index goes to zero. Readers primarily interested in the dynamic Beta-Gen-t-EGARCH model may wish to move directly to Section 3 where its properties are discussed and applications to commodity and stock returns are given. The generalized t is extended to include skewness and/or asymmetry in Section 4. The Appendices can be found on-line as Supporting Information; the link is given at the end of the paper.

2 Properties of the generalized t distribution

The generalized t distribution has fat tails for finite $\eta$ and moments exist up to, but not including, $\eta$. The extra shape parameter, $\upsilon$, introduces more flexibility into the distribution, particularly at the peak. (Note that although $f'(0) = 0$ for $\upsilon > 1$, the derivative is not continuous at $\upsilon = 1$ and for $\upsilon < 1$, it becomes infinite.) Many of the properties of the distribution, including the asymptotic distribution of the ML estimators, depend on the fact that $b_t$ in (6) is distributed as beta$(1/\upsilon, \eta/\upsilon)$ at the true parameter values.

The CDF of the generalized t, like that of $b_t$, is a regularized incomplete beta function. The corresponding quantile function is readily available. Hence Value at Risk (VaR) may be computed. A corresponding expression for Expected Shortfall (ES) may also be derived; see Zhu and Galbraith (2011, pp 768-70) and Zhu (2012).
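To make the VaR remark concrete, the sketch below computes the CDF and a quantile of the standardized generalized t. It is an assumption-laden illustration: instead of the regularized incomplete beta function mentioned in the text, it integrates the density with Simpson's rule and inverts the CDF by bisection, so that only the Python standard library is needed. The names `gen_t_cdf`, `gen_t_quantile` and `value_at_risk` are hypothetical.

```python
import math


def gen_t_pdf(eps, eta, upsilon):
    """Standardized generalized t density, as in (3) and (4)."""
    log_k = (math.log(upsilon) - math.log(2.0) - math.log(eta) / upsilon
             - (math.lgamma(1.0 / upsilon) + math.lgamma(eta / upsilon)
                - math.lgamma((1.0 + eta) / upsilon)))
    return math.exp(log_k - (eta + 1.0) / upsilon * math.log1p(abs(eps) ** upsilon / eta))


def gen_t_cdf(x, eta, upsilon, n=2000):
    """CDF via Simpson's rule on [0, |x|] plus symmetry about zero (n must be even)."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / n
    total = gen_t_pdf(0.0, eta, upsilon) + gen_t_pdf(a, eta, upsilon)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * gen_t_pdf(i * h, eta, upsilon)
    half = total * h / 3.0
    return 0.5 + half if x > 0 else 0.5 - half


def gen_t_quantile(p, eta, upsilon, tol=1e-10):
    """Quantile for 0 < p < 1 by bracketing and bisection."""
    lo, hi = -1.0, 1.0
    while gen_t_cdf(lo, eta, upsilon) > p:
        lo *= 2.0
    while gen_t_cdf(hi, eta, upsilon) < p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gen_t_cdf(mid, eta, upsilon) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def value_at_risk(p, mu, lam, eta, upsilon):
    """p-quantile of y = mu + eps * exp(lam)."""
    return mu + math.exp(lam) * gen_t_quantile(p, eta, upsilon)


print(value_at_risk(0.01, 0.0, 0.0, 5.0, 2.0))
```

In practice a library routine for the incomplete beta function would be faster and more accurate; the numerical integration here is only meant to show the structure of the calculation.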

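2.1 Score functions

When $y_t = \mu + \varepsilon_t\exp(\lambda)$, the score functions for location and (the logarithm of) scale are

$$\frac{\partial\ln f_t}{\partial\mu} = \frac{\eta+1}{\eta}\,e^{-\lambda}(1-b_t)\,|\varepsilon_t|^{\upsilon-1}\,\mathrm{sgn}(y_t-\mu) \quad\text{and}\quad \frac{\partial\ln f_t}{\partial\lambda} = (\eta+1)b_t - 1. \tag{7}$$

Provided $\eta$ is finite, the score (influence) function of location is redescending in that it approaches zero as $y$ moves away from zero. The score function for scale has corresponding robustness features in that it is bounded. (This behaviour accords with the general relationship between the location score and the score for the logarithm of scale, namely $\partial\ln f/\partial\lambda = (y-\mu)\,\partial\ln f/\partial\mu - 1$.)

For the purposes of taking limits as the tail index goes to infinity, it is better to work with its reciprocal, $\bar\eta = 1/\eta$. The score is then

$$\frac{\partial\ln f_t}{\partial\bar\eta} = \frac{\psi(1/\bar\eta\upsilon) - \psi(1/\upsilon + 1/\bar\eta\upsilon) + \bar\eta - \ln(1-b_t) - (\bar\eta+1)b_t}{\upsilon\bar\eta^{2}},$$

where $\psi(\cdot)$ is the digamma function. As regards $\upsilon$,

$$\frac{\partial\ln f_t}{\partial\upsilon} = \frac{-\ln[1-b_t] - (\bar\eta+1)\,b_t\ln b_t + \{b_t - \bar\eta(1-b_t)\}\ln[\bar\eta(1-b_t)]}{\bar\eta\upsilon^{2}} + \frac{\bar\eta\upsilon + \bar\eta\,\psi(1/\upsilon) + \psi(1/\bar\eta\upsilon) - (\bar\eta+1)\,\psi(1/\upsilon + 1/\bar\eta\upsilon)}{\bar\eta\upsilon^{2}}.$$

The score function for $\upsilon$, like that of $\lambda$ but unlike that of $\bar\eta$ (or $\eta$), is bounded: as $y_t \to \pm\infty$, $b_t \to 1$ and $-\ln(1-b_t) + b_t\ln(\bar\eta(1-b_t)) \to \ln\bar\eta$, so

$$\lim_{y_t\to\pm\infty}\Big[-\ln[1-b_t] - (\bar\eta+1)\,b_t\ln b_t + \{b_t - \bar\eta(1-b_t)\}\ln[\bar\eta(1-b_t)]\Big] = \ln\bar\eta.$$

When $y_t = \mu$, so $b_t = 0$, the term in square brackets is equal to $-\bar\eta\ln\bar\eta$.

It can be verified that the above scores all have zero expectation by using the properties of a beta variable given in Appendix A.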
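The zero-expectation property of the location and scale scores can also be checked by simulation. The sketch below is illustrative only: the names `draw_eps` and `scores` are hypothetical, the shape values 6.0 and 1.5 are arbitrary, and the draws are generated by inverting (6), that is by setting $|\varepsilon_t|^{\upsilon} = \eta b_t/(1-b_t)$ with $b_t$ drawn from a beta$(1/\upsilon, \eta/\upsilon)$ distribution and attaching a random sign.

```python
import math
import random


def draw_eps(rng, eta, upsilon):
    """Draw a standardized generalized t variate from b ~ beta(1/upsilon, eta/upsilon)."""
    b = rng.betavariate(1.0 / upsilon, eta / upsilon)
    magnitude = (eta * b / (1.0 - b)) ** (1.0 / upsilon)
    return magnitude if rng.random() < 0.5 else -magnitude


def scores(eps, eta, upsilon, lam=0.0):
    """Location and log-scale scores from (7), evaluated at mu = 0."""
    x = abs(eps) ** upsilon
    b = x / (x + eta)
    sign = (eps > 0) - (eps < 0)
    s_mu = ((eta + 1.0) / eta * math.exp(-lam) * (1.0 - b)
            * abs(eps) ** (upsilon - 1.0) * sign)
    s_lam = (eta + 1.0) * b - 1.0
    return s_mu, s_lam


rng = random.Random(12345)          # fixed seed so the run is reproducible
eta, upsilon, n = 6.0, 1.5, 200000  # illustrative values only
sum_mu = sum_lam = 0.0
for _ in range(n):
    s_mu, s_lam = scores(draw_eps(rng, eta, upsilon), eta, upsilon)
    sum_mu += s_mu
    sum_lam += s_lam
mean_mu, mean_lam = sum_mu / n, sum_lam / n
print(mean_mu, mean_lam)
```

Both sample means should be close to zero, up to Monte Carlo error. The sketch uses a value of the second shape parameter above one so that the location score is finite at the origin.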

2.2 Information matrix

The static information matrix for the GB2 distribution with an exponential link function for scale is given in Harvey (2013, p 164). The information matrix for the generalized t is more complicated because of the re-parameterization$^2$. The derivation draws on formulae in Appendix A.

Proposition 1 The information matrix for the parameters of the generalized t distribution, (3), with finite tail index, that is $\bar\eta > 0$, takes the form

$$I\begin{pmatrix}\mu\\ \lambda\\ \bar\eta\\ \upsilon\end{pmatrix} = \begin{pmatrix} I_{\mu\mu} & 0 & 0 & 0\\ 0 & I_{\lambda\lambda} & I_{\lambda\bar\eta} & I_{\lambda\upsilon}\\ 0 & I_{\lambda\bar\eta} & I_{\bar\eta\bar\eta} & I_{\bar\eta\upsilon}\\ 0 & I_{\lambda\upsilon} & I_{\bar\eta\upsilon} & I_{\upsilon\upsilon} \end{pmatrix} \tag{8}$$

2 The information matrix in Zhu (2012) is slightly simpler because he re-parameterizes the scale. But this necessitates another transformation to get the asymptotic covariance matrix for the ML estimators of the original set of parameters; see also Zhu and Galbraith (2010, p 300) and our Appendix D. Furthermore Zhu does not parameterize in terms of $\bar\eta$ and this is crucial for obtaining the limiting information matrix as the tail index goes to infinity.

with diagonal elements

I

=

I

=

( + 1) 3 2= + + 1) exp(2 )

(

+ +1

1

1) +

= 2

0

0 1

0

+

( (1 + )2

1 2

0

=

where

2

1

(1=

0

) 4

(1=

0

+

3

+ I

1= ) (2= + 1= (1= ) (1= )

)

;

> 1=2;

;

ln + ( I

(2

1

+

0 1

(1 +

2

)

2

+ + 1) +1

0

1 2

4

+ 1= ) 2

2

;

2 + +1 ; ( + 1) ( + + 1)

(:) is the trigamma function. The o¤-diagonal elements are

I

I When

ln +

=

1+ (

=

ln

1

1+

1

;

+ + 1) 1+

( + 1) (

1

1+

I 1

+

+ + 1)

= 0

( + 1) ( 1

+ + 1)

(1 + ) 3

0

;

+1

3

When $\upsilon$ is known to be two, the information matrix for the t distribution is obtained. The double generalized Pareto has $\upsilon = 1$ and the information matrix is then given by the expression at the end of Appendix C.

The ML estimators of the shape parameters are highly correlated. For example, if $\upsilon = 2$ and $\bar\eta = 1/7$, inverting the information matrix shows the (large sample) correlation between the estimators of $\bar\eta$ and $\upsilon$ to be 0.842. This strong positive correlation means that a higher value of $\bar\eta$, corresponding to a heavier tail, will induce a higher value of $\upsilon$, meaning a lighter tail. The correlations with scale are smaller: between the estimates of

and

0:266 : 9

it is 0:124 and between

and

it is

Remark 1 The asymptotic distribution theory for ML estimation is not standard for $\upsilon < 2$ because of the singularities in the derivatives of the log-density at $\mu$. In particular the second derivative with respect to $\mu$ does not exist at $y = \mu$ and its expectation cannot be found; $I_{\mu\mu}$ in (8) is obtained from the expectation of the square of the first derivative. Nevertheless McDonald and Newey (1988) are able to show consistency and asymptotic normality provided $\upsilon > 1$. The problems are essentially the same as those that arise for the GED; see Zhu and Zinde-Walsh (2009, p 91). If $\mu$ is known, the usual asymptotics hold for the scale and shape parameter of the GED and this will remain true for the generalized t. The ML estimators of these parameters are consistent and asymptotically normal when $\mu$ is estimated (consistently) by the median. If $\mu$ is estimated by ML, the block diagonality of the information matrix suggests that inference for the scale and shape parameters will still be valid for any $\upsilon > 0$. This is borne out by a Monte Carlo study for the GED reported in Bottazzi and Secchi (2011, p 1002-6). They also demonstrate that the ML estimator of $\mu$ will be asymptotically efficient for $\upsilon > 1/2$, the condition required to ensure that $I_{\mu\mu}$ exists.

2.3 General error distribution as a limiting case

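The PDF of a standardized GED variable, in the form used by Zhu and Zinde-Walsh (2009), is

$$f(\varepsilon_t) = \frac{\upsilon^{1-1/\upsilon}}{2\Gamma(1/\upsilon)}\exp\left(-\frac{1}{\upsilon}|\varepsilon_t|^{\upsilon}\right). \tag{9}$$

The GED is a special case of generalized t because, as $\eta \to \infty$, that is $\bar\eta \to 0$, the use of a result in Davis (1964, p. 257) gives

$$K(\eta,\upsilon) = \frac{\upsilon}{2\eta^{1/\upsilon}}\,\frac{1}{B(1/\upsilon,\eta/\upsilon)} = \frac{\upsilon}{2\eta^{1/\upsilon}}\,\frac{\Gamma(1/\upsilon+\eta/\upsilon)}{\Gamma(1/\upsilon)\,\Gamma(\eta/\upsilon)} \to \frac{\upsilon^{1-1/\upsilon}}{2\Gamma(1/\upsilon)},$$

while

$$\left(1 + \frac{1}{\eta}|\varepsilon_t|^{\upsilon}\right)^{-(\eta+1)/\upsilon} \to \exp\left(-\frac{1}{\upsilon}|\varepsilon_t|^{\upsilon}\right).$$

The limiting scores may be similarly obtained from those in sub-section 2.1. Because $b_t \to 0$ as $\bar\eta \to 0$, whereas $b_t/\bar\eta \to |\varepsilon_t|^{\upsilon}$, it is easy to see that

$$\lim_{\bar\eta\to 0}\frac{\partial\ln f_t}{\partial\mu} = e^{-\lambda}|\varepsilon_t|^{\upsilon-1}\,\mathrm{sgn}(\varepsilon_t) \quad\text{and}\quad \lim_{\bar\eta\to 0}\frac{\partial\ln f_t}{\partial\lambda} = |\varepsilon_t|^{\upsilon} - 1. \tag{10}$$

The limiting scores for the shape parameters are derived in Appendix B by using a known expansion for the digamma function, together with Taylor series expansions. As a result

$$\lim_{\bar\eta\to 0}\frac{\partial\ln f_t}{\partial\bar\eta} = \frac{(|\varepsilon_t|^{\upsilon}-1)^{2}}{2\upsilon} - \frac{1}{2} \tag{11}$$

and

$$\lim_{\bar\eta\to 0}\frac{\partial\ln f_t}{\partial\upsilon} = \frac{|\varepsilon_t|^{\upsilon} - |\varepsilon_t|^{\upsilon}\ln|\varepsilon_t|^{\upsilon} + \upsilon + \ln(\upsilon) + \psi(1/\upsilon) - 1}{\upsilon^{2}}. \tag{12}$$

At the true parameter values, $|\varepsilon_t|^{\upsilon}$ is distributed as gamma$(\upsilon, 1/\upsilon)$ and so it is possible to check that all the scores have zero expectation. The main point to note concerns $E[|\varepsilon_t|^{\upsilon}\ln|\varepsilon_t|^{\upsilon}]$, which is evaluated by taking the expectation of $\ln|\varepsilon_t|^{\upsilon}$ with respect to a gamma$(\upsilon, 1+1/\upsilon)$ variable, giving $\psi(1+1/\upsilon) + \ln\upsilon$.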
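The convergence of the generalized t density to the GED density in (9) can be illustrated numerically. The sketch below is a hedged illustration, not part of the paper: the function names are hypothetical, the generalized t density is written in terms of the inverse tail index (called `eta_bar` here), and the evaluation point and shape value are arbitrary.

```python
import math


def gen_t_pdf_inverse_tail(eps, eta_bar, upsilon):
    """Generalized t density (3), parameterized by the inverse tail index eta_bar > 0."""
    eta = 1.0 / eta_bar
    log_k = (math.log(upsilon) - math.log(2.0) - math.log(eta) / upsilon
             - (math.lgamma(1.0 / upsilon) + math.lgamma(eta / upsilon)
                - math.lgamma((1.0 + eta) / upsilon)))
    log_kernel = -(eta + 1.0) / upsilon * math.log1p(eta_bar * abs(eps) ** upsilon)
    return math.exp(log_k + log_kernel)


def ged_pdf(eps, upsilon):
    """GED density in the form of (9)."""
    log_c = ((1.0 - 1.0 / upsilon) * math.log(upsilon)
             - math.log(2.0) - math.lgamma(1.0 / upsilon))
    return math.exp(log_c - abs(eps) ** upsilon / upsilon)


# The gap between the two densities shrinks as eta_bar goes to zero.
for eta_bar in (1e-1, 1e-3, 1e-5, 1e-7):
    gap = abs(gen_t_pdf_inverse_tail(1.3, eta_bar, 1.5) - ged_pdf(1.3, 1.5))
    print(eta_bar, gap)
```

With the second shape parameter equal to two, `ged_pdf` reduces to the standard normal density, which gives a quick sanity check on the constant in (9).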

The information matrix for ; 1

0

0

and I

[1+ 1 ]

ln +

(

+ln +

ln +

+1)

ln +

[1+ 1 ]+ 21

C [1+ 1 ]+ 21 C C: C A

3 +1 2

in (8) are immediately seen to be ; but …nding the limit of I

limits of I

1

2 submatrix is the information matrix of the GED. The elements

= 1 and we have

is shown in Appendix C that lim I

=

2+

! 0:

(13)

straightforward, except when

lim I

2

[ 1 ]) +( 1 +1) 0 [ 1 ] ( 3

The top-left 2 I

[1+ 1 ]

ln +

C B C B C=B C B A @

B B IB B @

and , goes to the following limit as

and I 2 2=

0

= (3 + 1)=2 as

(1= )

0

(1 + 1= ) =

is less 2

: It

! 0: The derivation of the

is similar. The limiting information quantity for $\mu$, that is $\upsilon^{2-2/\upsilon}\exp(-2\lambda)\,\Gamma(2-1/\upsilon)/\Gamma(1/\upsilon)$, is found by using (6.1.46) in Davis (1964, p. 257).

If the generalized t distribution is defined in terms of $\eta$ rather than $\bar\eta$, all elements $I(\eta,\cdot) \to 0$ as $\eta \to \infty$ and so the information matrix is singular. The $\bar\eta$ parameterization has the practical value of ensuring that ML estimation is stable when the tail index is large.
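As a cross-check on the limit in (13), the matrix can be rederived from the limiting scores (10) to (12), using the fact that $|\varepsilon_t|^{\upsilon}$ is gamma distributed with scale $\upsilon$ and shape $1/\upsilon$. The display below is a sketch of that calculation and not a transcription of the authors' expression: it assumes the parameter ordering $(\lambda, \upsilon, \bar\eta)$, writes $\psi'$ for the trigamma function, and introduces the shorthand $g = \ln\upsilon + \psi(1+1/\upsilon)$, which is not notation from the paper.

```latex
\lim_{\bar\eta \to 0}
I\begin{pmatrix} \lambda \\ \upsilon \\ \bar\eta \end{pmatrix}
=
\begin{pmatrix}
\upsilon & -\dfrac{g}{\upsilon} & \upsilon \\[2mm]
-\dfrac{g}{\upsilon} &
\dfrac{g^{2} + (1 + 1/\upsilon)\,\psi'(1 + 1/\upsilon) - 1}{\upsilon^{3}} &
-\dfrac{g + 1/2}{\upsilon} \\[2mm]
\upsilon & -\dfrac{g + 1/2}{\upsilon} & \dfrac{3\upsilon + 1}{2}
\end{pmatrix},
\qquad g = \ln\upsilon + \psi(1 + 1/\upsilon).
```

When the second shape parameter is treated as known, deleting the middle row and column and inverting the remaining $2 \times 2$ block gives $2/(\upsilon+1)$ for the element corresponding to the inverse tail index, which agrees with the value quoted for the LM test in sub-section 2.4.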

2.4 Testing against fat tails

When $\eta$ is finite, the tails of the generalized t are fat, but this is no longer the case for the limiting GED distribution; see footnote 1. Thus, within the generalized t framework, a test of the null hypothesis that $\eta = \infty$, or equivalently $\bar\eta = 0$, against the alternative of $\eta < \infty$, that is $\bar\eta > 0$, is a test against fat tails. As has been shown, the regularity condition that the information matrix be positive definite under the null is satisfied when the inverse tail parameterization is used.

A likelihood ratio (LR) test of $\bar\eta = 0$ is straightforward to implement but, because the alternative, $\bar\eta > 0$, is one-sided, the asymptotic distribution of the LR statistic is a mixture of a $\chi^2_1$ and a degenerate distribution with its mass at the origin. Hence the 5% critical value is the usual 10% one, that is 2.71.

A Lagrange multiplier (LM), or score, test can be implemented using (13) and the limiting score for $\bar\eta$ given in (11). The test statistic is

$$LM = \frac{(I^{-1})_{\bar\eta\bar\eta}}{4T\upsilon^{2}}\left(\sum|\varepsilon_t|^{2\upsilon} - 2\sum|\varepsilon_t|^{\upsilon} + T - \upsilon T\right)^{2}, \tag{14}$$

where $(I^{-1})_{\bar\eta\bar\eta}$ is the diagonal element in the inverse information matrix corresponding to $\bar\eta$ and the parameters $\mu$, $\lambda$ and $\upsilon$ are replaced by their ML estimators under the null hypothesis that $\bar\eta = 0$. If $\upsilon$ is known, $(I^{-1})_{\bar\eta\bar\eta} = 2/(\upsilon+1)$. The LM statistic is asymptotically $\chi^2_1$-distributed under the null. For $\upsilon = 2$, the test is simply a reformulation of the standard excess kurtosis test. This is no longer the case for $\upsilon \neq 2$ because then the contrast is between the moments of $|\varepsilon_t|^{2\upsilon}$ and $|\varepsilon_t|^{\upsilon}$.

Tests of the null hypothesis that $\upsilon$ takes a specific value, $\upsilon_0$, may also be carried out. The LR statistic of $\upsilon = \upsilon_0$ against $\upsilon \neq \upsilon_0$ is asymptotically $\chi^2_1$, as is the LM statistic. A joint test of $\upsilon = \upsilon_0$ and $\bar\eta = 0$ against $\upsilon \neq \upsilon_0$ and $\bar\eta > 0$ is also possible. The LM test statistic, which is asymptotically distributed as $\chi^2_2$ under the null, requires the limiting score for $\upsilon$, as given in (12).
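The two tests against fat tails can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the LM statistic treats the second shape parameter as known so that the inverse-information element is $2/(\upsilon+1)$ as stated in the text, the input `eps` is taken to be a list of standardized residuals evaluated under the null, and the mixture is assumed to put equal weight on the $\chi^2_1$ component and the point mass at zero, which is what the "usual 10% one" remark implies.

```python
import math
from statistics import NormalDist


def lm_fat_tails(eps, upsilon):
    """LM statistic (14) with upsilon known, so (I^{-1}) element = 2 / (upsilon + 1)."""
    T = len(eps)
    s2 = sum(abs(e) ** (2.0 * upsilon) for e in eps)
    s1 = sum(abs(e) ** upsilon for e in eps)
    contrast = s2 - 2.0 * s1 + T - upsilon * T
    return (2.0 / (upsilon + 1.0)) * contrast ** 2 / (4.0 * T * upsilon ** 2)


def one_sided_critical_value(alpha=0.05):
    """Critical value for an equal-weight mixture of chi-square(1) and a point mass at 0.

    P(mixture > c) = 0.5 * P(chi2_1 > c) = alpha, and chi2_1 is the square of a
    standard normal, so sqrt(c) is the (1 - alpha) normal quantile.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z * z


print(one_sided_critical_value(0.05))
```

The printed value is the usual 10% point of a $\chi^2_1$ distribution, in line with the critical value quoted in the text.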

3 Dynamic Scale

The Beta-Gen-t-EGARCH model has $\varepsilon_t$ in equation (1) distributed as a standardized generalized t. In a survey paper on GARCH, Bollerslev, Engle and Nelson (1994, pp 3017-23) proposed a related model in which the $\eta$ and $\upsilon$ parameters in an impact curve expression similar in form to (5) are different from the $\eta$ and $\upsilon$ shape parameters in the conditional distribution. They fitted the model to US stock returns, but did not go on to study its properties or develop it further. However, there are good theoretical reasons for having the same $\eta$ and $\upsilon$ in the impact curve and the conditional distribution, that is making the impact curve the conditional score. Indeed the properties set out in the next two sub-sections can only be obtained for a model in which the impact curve is the score because only then is it a linear function of a beta variable.

3.1 Moments and forecasts

Because $b_t$ in the score variable, $u_t$, is distributed as beta$(1/\upsilon, \eta/\upsilon)$ for finite $\eta$, Theorem 7 in Harvey (2013, ch 4) generalizes immediately to give exact expressions for the even unconditional moments in the stationary Beta-Gen-t-EGARCH model. We have

c=

c

E (jyt j ) = E (j"t j ) (c) = where (c) = ec! ut

j

when

tpt 1

1 j=1 e

jc

;

(

j c);

(c + 1) ( c + ) (c); (1) ( ) with

j; j

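and a first-order stationary dynamic model, (2), for volatility, the single-observation information matrix for $\psi = (\kappa, \phi, \omega)'$ is $I(\psi) = I_{\lambda\lambda}D(\psi)$, where $I_{\lambda\lambda}$ is as in (8) and

$$D(\psi) = D\begin{pmatrix}\kappa\\ \phi\\ \omega\end{pmatrix} = \frac{1}{1-b}\begin{bmatrix} A & D & E\\ D & B & F\\ E & F & C \end{bmatrix}, \qquad \kappa \neq 0,\ |\phi| < 1, \tag{15}$$

with

$$A = I_{\lambda\lambda}, \qquad B = \frac{\kappa^{2}I_{\lambda\lambda}(1+a\phi)}{(1-\phi^{2})(1-a\phi)}, \qquad C = \frac{(1-\phi)^{2}(1+a)}{1-a},$$

$$D = \frac{a\kappa I_{\lambda\lambda}}{1-a\phi}, \qquad E = \frac{c(1-\phi)}{1-a} \qquad\text{and}\qquad F = \frac{ac\kappa(1-\phi)}{(1-a)(1-a\phi)},$$

and the quantities $a = \phi + \kappa E(\partial u_t/\partial\lambda)$, $b = \phi^{2} + 2\phi\kappa E(\partial u_t/\partial\lambda) + \kappa^{2}E[(\partial u_t/\partial\lambda)^{2}]$ and $c = \kappa E(u_t\,\partial u_t/\partial\lambda)$, evaluated from

$$E\left[\frac{\partial u_t}{\partial\lambda}\right] = -\frac{\upsilon}{\bar\eta+1+\upsilon\bar\eta}, \qquad E\left[u_t\frac{\partial u_t}{\partial\lambda}\right] = -\frac{\upsilon^{2}(1-\bar\eta)}{(\bar\eta+1+2\upsilon\bar\eta)(\bar\eta+1+\upsilon\bar\eta)} \tag{16}$$

and

$$E\left[\left(\frac{\partial u_t}{\partial\lambda}\right)^{2}\right] = \frac{\upsilon^{2}(\upsilon+1)(\bar\eta+1)(\upsilon\bar\eta+1)}{(3\upsilon\bar\eta+\bar\eta+1)(2\upsilon\bar\eta+\bar\eta+1)(\upsilon\bar\eta+\bar\eta+1)}.$$

A necessary condition for $I(\psi)$ to be positive definite is $b < 1$. (Note that $b$ is not the same as the beta variate $b_t$.)

Proposition 3 Let $\tilde\psi$ denote the ML estimator of $\psi$ and let $\psi_0$ denote the true values. A sufficient condition for the limiting distribution of $\sqrt{T}(\tilde\psi - \psi_0)$ to be multivariate normal with mean vector zero and a covariance matrix given by the inverse of $I(\psi_0)$ is

$$|\phi - \kappa\upsilon(\eta+1)/4| < 1. \tag{17}$$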
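The quantities $a$, $b$ and $c$ and the sufficient condition (17) are easy to evaluate for given parameter values. The sketch below is illustrative and rests on assumptions: the function names are hypothetical, the three expectations are coded from the moments of a beta$(1/\upsilon, \eta/\upsilon)$ variable applied to $\partial u_t/\partial\lambda = -\upsilon(\eta+1)b_t(1-b_t)$ and are written in terms of the inverse tail index (`eta_bar`), and the parameter values in the usage lines are made up for illustration.

```python
def score_derivative_moments(eta_bar, upsilon):
    """Return E[du/dlam], E[(du/dlam)^2] and E[u * du/dlam] as functions of eta_bar."""
    d1 = upsilon * eta_bar + eta_bar + 1.0
    d2 = 2.0 * upsilon * eta_bar + eta_bar + 1.0
    d3 = 3.0 * upsilon * eta_bar + eta_bar + 1.0
    e1 = -upsilon / d1
    e2 = (upsilon ** 2 * (upsilon + 1.0) * (eta_bar + 1.0)
          * (upsilon * eta_bar + 1.0) / (d3 * d2 * d1))
    e3 = -upsilon ** 2 * (1.0 - eta_bar) / (d2 * d1)
    return e1, e2, e3


def dynamic_quantities(phi, kappa, eta_bar, upsilon):
    """Quantities a, b, c that enter the matrix D in (15)."""
    e1, e2, e3 = score_derivative_moments(eta_bar, upsilon)
    a = phi + kappa * e1
    b = phi ** 2 + 2.0 * phi * kappa * e1 + kappa ** 2 * e2
    c = kappa * e3
    return a, b, c


def condition_17(phi, kappa, eta_bar, upsilon):
    """Sufficient condition (17); needs a finite tail index, i.e. eta_bar > 0."""
    eta = 1.0 / eta_bar
    return abs(phi - kappa * upsilon * (eta + 1.0) / 4.0) < 1.0


# Illustrative (made-up) parameter values.
print(dynamic_quantities(0.98, 0.05, 0.1, 1.5))
print(condition_17(0.98, 0.05, 0.1, 1.5))
```

Setting `eta_bar` to zero in `score_derivative_moments` reproduces the Gamma-GED-EGARCH values, which provides a simple consistency check against the limiting case discussed later in the text.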

The proof of the above proposition can be constructed by generalizing the result$^3$ for Beta-t-EGARCH in Harvey (2013, pp 37-45, 116-7). The conditions which need to be verified are those in Jensen and Rahbek (2004) and in the present context the main point is that the boundedness of third derivatives of the log-likelihood is guaranteed by the fact that the terms are functions of bounded variables of the form $b_t^{h}(1-b_t)^{k}$, $h, k = 1, 2, \ldots$ The required derivatives of the log-likelihood are stochastic recurrence equations which depend on $x_t = \phi + \kappa(\partial u_t/\partial\lambda_{t|t-1})$ and which are stationary and ergodic at $\psi = \psi_0$ because $b < 1$ implies $E(\ln|x_t|) < 0$. These derivatives can be bounded for any set of parameter values for which $|x_t| < 1$ for all $t$. Expression (17) is obtained by noting that $\partial u_t/\partial\lambda_{t|t-1} = -\upsilon(\eta+1)b_t(1-b_t) < 0$ and that the maximum value that can be taken by $b_t(1-b_t)$ is $1/4$.

3 The condition $|a| < 1$ in Lemmas 9 and 10 of Harvey (2013) is not sufficient. The error comes from incorrectly writing $|a| = |E(x_t)|$ as $|a| = E(|x_t|)$, following on from an earlier typographical error on p 37. The condition $|a| < 1$ should be replaced by $b < 1$. Since $b = E(x_t^{2}) = E(|x_t|^{2}) \geq [E(|x_t|)]^{2}$, it follows that $b < 1$ is sufficient to ensure that $E(|x_t|) < 1$ and this in turn ensures that $E(\ln|x_t|) < 0$ by Jensen's inequality.

The condition in (17) is more stringent than b < 1 and is probably much stronger than necessary4 . However, so long as

is not too large, it seems to

comfortably include parameter values that arise in practice. We conjecture that the ML estimator of the full parameter vector, ( ; will be consistent and asymptotically normal for

0

; ; )0 ;

> 1=2; see Remark 2. An

analytic expression for the information matrix can be derived, but is somewhat intricate. However, it can again be shown that most of the terms in the third derivatives of the log-likelihood are functions of bounded variables.

3.3 Gamma-GED-EGARCH

Letting $\eta \to \infty$ gives the Gamma-GED-EGARCH model, in which $\lambda_{t|t-1}$ evolves as a linear function of the conditional score variable$^5$, obtained from (10) as $u_t = (|y_t - \mu|\exp(-\lambda_{t|t-1}))^{\upsilon} - 1$, $t = 1, \ldots, T$, with $|\varepsilon_t|^{\upsilon}$ distributed as gamma$(\upsilon, 1/\upsilon)$ at the true parameter values. When $\upsilon < 2$, the response is less sensitive to outliers than it is for a normal distribution, but it does not have the robustness of Beta-t-EGARCH because $u_t$ is not bounded. There may therefore be a case for approximating Gamma-GED-EGARCH by a Beta-Gen-t-EGARCH model in which $\eta$ is large but finite; similar sentiments are expressed by McDonald and Newey (1988).

4 Note that condition (17) also allows us to relax the assumption that the starting value, $\lambda_{1|0}$, is known, because it ensures that the volatility process is invertible; see Straumann & Mikosch (2006).

5 The parameterization of the GED in Harvey (2013, section 4.4) is slightly different.

The information matrix is given by setting $\bar\eta = 0$ in (16) to yield $E[\partial u_t/\partial\lambda] = -\upsilon$, $E[(\partial u_t/\partial\lambda)^{2}] = \upsilon^{2}(\upsilon+1)$ and $E[u_t(\partial u_t/\partial\lambda)] = -\upsilon^{2}$. When a Gamma-GED-EGARCH model has been estimated, the LM statistic against fat tails takes the form of (14) with $(I^{-1})_{\bar\eta\bar\eta}$ now obtained from the full information matrix.

3.4 Asymmetric impact curve (leverage)

Returns may have a different effect on volatility depending on whether they are positive or negative. This effect, sometimes known as leverage, may be modeled by adding another variable to the dynamic equation. Specifically

$$\lambda_{t+1|t} = \omega(1-\phi) + \phi\lambda_{t|t-1} + \kappa u_t + \kappa^{*}u_t^{*}, \tag{18}$$

where $u_t^{*} = \mathrm{sgn}(\mu - y_t)(u_t + 1)$ and $\kappa^{*}$ is a parameter.

3.5 Example

Beta-Gen-t-EGARCH models were fitted to daily observations on returns from the iShares Silver Trust from April 28th 2006 to February 11th 2015. The source is http://finance.yahoo.com/q?s=SLV. Table 1 shows the results for the general model together with those for Beta-t-EGARCH, that is $\upsilon = 2$, and Gamma-GED-EGARCH, that is $\bar\eta = 0$. As is clear from the estimates and their (numerical) SEs - and as is apparent from the analytic information matrix - the ML estimators of the two shape parameters are strongly correlated. When $\upsilon$, estimated as 1.34 in the general model, is set to two, the estimate of $\bar\eta$ increases from 0.076 to 0.220 and correspondingly $\eta$ goes from 13.16 to 4.55. The SE of $\bar\eta$ is reduced from 0.045 to 0.023. Similarly the estimate of $\upsilon$ decreases when $\bar\eta$ is set to zero and its SE falls from 0.138 to 0.046. Leverage effects are estimated but appear to be insignificant.

All three models fit well according to the Box-Ljung statistics$^6$, Q(20) and Q(20), constructed from the first 20 autocorrelations of scores for location and scale respectively. However Beta-t-EGARCH is clearly worse than the other two models on the AIC and BIC goodness of fit criteria. Indeed the hypothesis that $\upsilon = 2$ is convincingly rejected by Wald and LR tests. The situation with regard to $\bar\eta$ is less clear cut. The LR statistic (that is minus 2 times the logarithm of the likelihood ratio) of the null hypothesis that $\bar\eta = 0$ is 3.60, which is less than 3.84, the 5% significance value for a $\chi^2_1$ distribution. However, as noted in sub-section 2.4, the correct 5% significance value is only 2.71 because of the one-sided alternative. Thus there may be a small gain from using Beta-Gen-t-EGARCH rather than Gamma-GED-EGARCH.

Following a suggestion of a referee, we also estimated a simple version of the impact curve in Bollerslev et al (1994, p 3019, expression 9.9) in which the shape parameters in the impact curve are not constrained to be the same as those in the conditional distribution. (Unlike Bollerslev et al (1994) we do not divide by an exponential term in $\lambda_{t|t-1}$ whose coefficient is yet another parameter to be estimated.) In view of the findings in Table 1 the leverage term was not included. The results are shown in Table 2, together with the estimates for the generalized t model of Table 1 but with leverage excluded (the estimates are virtually unchanged). As can be seen, estimating the additional parameters,

y

and

y

; gives only a small improvement

in …t and both AIC and BIC increase. There is an increase in the SE of is smaller than the

and

y

estimated in the constrained model so the downweighting of

outliers is less pronounced. From the practical point of view, it is worth noting that the model was harder to estimate.

We also estimated our Beta-Gen-t-EGARCH model and the Bollerslev et al (1994, p 3019) model, both with leverage, for daily returns on SP500 from 2 January 2004 to 31 December 2013. The leverage term in the impact curve of the Bollerslev model has its inverse tail index unconstrained but the other shape parameter, corresponding to $\upsilon$, is set to one. The results are shown in Table 3. Again the simpler Beta-Gen-t-EGARCH model is better according to AIC and BIC. Even more significant is the fact that the estimated inverse tail indices for both terms in the impact curve are close to zero and so, with an estimate of 3.65 for $\upsilon^{\dagger}$, the weight given to an outlier could be excessive. These estimates of the inverse tail indices are not inconsistent with those reported in Bollerslev et al (1994, p 3021).

Overall we see no gain from the generality of the Bollerslev et al (1994) formulation: indeed it increases AIC and BIC. Furthermore it seems that the price paid for the model's generality is that, when estimated, its ability to deal with outliers may be compromised.

6 Prob. values, computed from $\chi^2_{20}$ and $\chi^2_{18}$ distributions for Q(20) and Q(20) respectively, are shown; two degrees of freedom are lost for scale because two dynamic parameters are estimated.

4 Asymmetry and skewness

Skewness can be introduced into the generalized t distribution in the same way as was done by Harvey and Sucarrat (2014) for the t distribution. An equivalent formulation was proposed by Zhu and Galbraith (2010) and they made a further extension to asymmetry by allowing different degrees of freedom above and below $\mu$. Extending the generalized t distribution to handle skewness and asymmetry is, in principle, straightforward: we now have $\eta_1$ and $\eta_2$ as well as $\upsilon_1$ and $\upsilon_2$. We write the PDF of the asymmetric skew generalized t distribution as

$$f(y) = \frac{K_{12}}{\exp(\lambda)}\begin{cases} \left(1 + \dfrac{1}{\eta_1}\left|\dfrac{y-\mu}{2\alpha e^{\lambda}}\right|^{\upsilon_1}\right)^{-(\eta_1+1)/\upsilon_1}, & y \leq \mu,\\[4mm] \left(1 + \dfrac{1}{\eta_2}\left|\dfrac{y-\mu}{2(1-\alpha)e^{\lambda}}\right|^{\upsilon_2}\right)^{-(\eta_2+1)/\upsilon_2}, & y > \mu, \end{cases} \tag{19}$$

where $0 < \alpha < 1$ and $K_{12} = 1/[\alpha/K_1 + (1-\alpha)/K_2]$, with $K_i = K(\eta_i, \upsilon_i)$, $i = 1, 2$, defined as in (4). This distribution is similar to, but not quite the same as, the skew generalized t distribution used in Hansen et al (2010). The derivation of (19) and the way in which the approach differs from that in Zhu (2012) and Zhu and Galbraith (2010) is set out in Appendix D.

The distribution is constructed so that $f(y)$ is continuous through $y = \mu$. Furthermore, the derivative $f'(y)$ is zero on both sides of $y = \mu$, and thus continuous, when both $\upsilon_1$ and $\upsilon_2$ are greater than one. The second derivative of $f(y)$ is generally discontinuous. Under symmetry, $K_{12} = K_1 = K_2 = K(\eta, \upsilon)$, irrespective of $\alpha$. When $\alpha = 1/2$, the distribution is not skewed, even though it may be asymmetric. The probability that y


The logarithm of scale could be written as or y < :

+ ln 2 or

21

+ ln 2(1

); depending on whether

y

y

: Not only does

have the convenient property of being equal to Pr(y < ),

but, more importantly, a single scale parameter allows a straightforward extension to dynamic volatility. The EGARCH dynamics are driven by the scores $u_{it} = (\eta_i + 1)b_{it} - 1$, $i = 1, 2$, where

$$b_{1t} = \frac{(|y_t-\mu|/2\alpha e^{\lambda_{t|t-1}})^{\upsilon_1}/\eta_1}{(|y_t-\mu|/2\alpha e^{\lambda_{t|t-1}})^{\upsilon_1}/\eta_1 + 1} \quad\text{and}\quad b_{2t} = \frac{(|y_t-\mu|/2(1-\alpha)e^{\lambda_{t|t-1}})^{\upsilon_2}/\eta_2}{(|y_t-\mu|/2(1-\alpha)e^{\lambda_{t|t-1}})^{\upsilon_2}/\eta_2 + 1},$$

with $i = 1$ for $y_t \leq \mu$ and $i = 2$ for $y_t > \mu$. At the true parameter values, the $b_{it}$'s are (independently) distributed as beta$(1/\upsilon_i, \eta_i/\upsilon_i)$, $i = 1, 2$. The asymmetry in the score should not be confused with the asymmetric effects introduced into the impact curve of (18) by leverage.
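The construction of the density in (19) can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the parameter values are arbitrary, and integration is done with a plain Simpson rule over a wide but finite range. It also compares the mass to the left of $\mu$ with $\alpha K_{12}/K_1$, a value obtained here by integrating the left branch of (19) rather than quoted from the paper.

```python
import math


def k_const(eta, upsilon):
    """Normalizing constant K(eta, upsilon) from (4)."""
    log_b = (math.lgamma(1.0 / upsilon) + math.lgamma(eta / upsilon)
             - math.lgamma((1.0 + eta) / upsilon))
    return math.exp(math.log(upsilon) - math.log(2.0)
                    - math.log(eta) / upsilon - log_b)


def skew_asym_pdf(y, mu, lam, alpha, eta1, ups1, eta2, ups2):
    """Density (19): branch 1 applies for y <= mu, branch 2 for y > mu."""
    k1, k2 = k_const(eta1, ups1), k_const(eta2, ups2)
    k12 = 1.0 / (alpha / k1 + (1.0 - alpha) / k2)
    if y <= mu:
        z = abs(y - mu) / (2.0 * alpha * math.exp(lam))
        kernel = (1.0 + z ** ups1 / eta1) ** (-(eta1 + 1.0) / ups1)
    else:
        z = abs(y - mu) / (2.0 * (1.0 - alpha) * math.exp(lam))
        kernel = (1.0 + z ** ups2 / eta2) ** (-(eta2 + 1.0) / ups2)
    return k12 / math.exp(lam) * kernel


def simpson(f, a, b, n=40000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


# Illustrative (made-up) parameter values.
mu, lam, alpha = 0.3, 0.2, 0.6
eta1, ups1, eta2, ups2 = 5.0, 1.5, 8.0, 2.0


def pdf(y):
    return skew_asym_pdf(y, mu, lam, alpha, eta1, ups1, eta2, ups2)


left_mass = simpson(pdf, mu - 400.0, mu)
right_mass = simpson(pdf, mu, mu + 400.0)
k1, k2 = k_const(eta1, ups1), k_const(eta2, ups2)
k12 = 1.0 / (alpha / k1 + (1.0 - alpha) / k2)
print(left_mass + right_mass, left_mass, alpha * k12 / k1)
```

Splitting the integral at $\mu$ keeps the kink in the density at a grid point, which helps the quadrature. When the two pairs of shape parameters coincide, the left-hand mass reduces to $\alpha$.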

4.1 Skewness

When there is skewness but no asymmetry in the shape parameters, $\alpha = \alpha^{\dagger}$. When $\alpha > 1/2$, the left-hand side scale, that is for $y < \mu$, will be amplified, whereas the right-hand side scale will be diminished. The opposite is true for $\alpha < 1/2$.

The score for $\alpha$ is uncorrelated with the score for $\lambda$ and the two shape parameters, $\bar\eta$ and $\upsilon$. However, it is correlated with $\mu$. The sub-matrix for $\mu$ and $\alpha$ in the information matrix, which is obtained from the more general result in the next sub-section, is

$$I\begin{pmatrix}\mu\\ \alpha\end{pmatrix} = \begin{pmatrix} \dfrac{I_{\mu\mu}}{4\alpha(1-\alpha)} & \dfrac{I^{+}_{\mu\lambda}}{2\alpha(1-\alpha)}\\[4mm] \dfrac{I^{+}_{\mu\lambda}}{2\alpha(1-\alpha)} & \dfrac{I_{\lambda\lambda}+1}{\alpha(1-\alpha)} \end{pmatrix}, \tag{21}$$

where $I_{\mu\mu}$ and $I_{\lambda\lambda}$ are as in (8) and

$$I^{+}_{\mu\lambda} = \frac{\bar\eta^{1/\upsilon}e^{-\lambda}}{B(1/\upsilon, 1/\bar\eta\upsilon)}\,\frac{\upsilon^{2}(\bar\eta+1)}{\bar\eta\upsilon+\bar\eta+1}.$$

Letting $\bar\eta \to 0$ gives $I^{+}_{\mu\lambda} = e^{-\lambda}\upsilon^{2-1/\upsilon}/\Gamma(1/\upsilon)$, which is the expression for the GED.

The information sub-matrix for $\lambda$, $\bar\eta$ and $\upsilon$ is unchanged. Thus the introduction of skewness has no effect on the asymptotic distribution of the ML estimators of $\lambda$, $\bar\eta$ and $\upsilon$. For the EGARCH model, the fact that $b_{1t}$ and $b_{2t}$ have the same beta distribution means that $D(\psi)$ is as in (15).

If the standard generalized t of the previous section is estimated, an LM test against skewness may be carried out. The score for $\alpha$ at $\alpha = 0.5$ is $-2(\eta+1)\,b_t\,\mathrm{sgn}(y_t-\mu)$. This is similar to the score for $\lambda$, but it is an odd function because of the sign. That this is the case is perhaps not surprising because skewness is formulated as asymmetry in the scale. (In fact it is 2 times the leverage term, $u_t^{*}$, in (18).) The LM statistic,

$$LM = \frac{(I^{-1})_{\alpha\alpha}}{T}\left[2(\eta+1)\sum b_t\,\mathrm{sgn}(y_t-\mu)\right]^{2},$$

is asymptotically $\chi^2_1$ under the null hypothesis that $\alpha = 0.5$.

4.2 General information matrix

The following expression for the information matrix of a skew, asymmetric distribution with shape parameters $\theta_i$, $i = 1, 2$, is derived in Appendix D. The information matrix for a skew, asymmetric generalized t distribution is a special case with $\theta_i = (\tilde\eta_i, \upsilon_i)$, $i = 1, 2$. An advantage of our approach over that of Zhu and co-authors is that the information matrices for either skewness or asymmetry emerge directly.

Proposition 4 Let $I_1(\mu, \lambda, \theta_1)$ and $I_2(\mu, \lambda, \theta_2)$ denote the information matrices for the distributions with shape parameters $\theta_1$ and $\theta_2$ respectively. Further, let $I_1^{-}(\mu, \lambda, \theta_1)$ and $I_2^{+}(\mu, \lambda, \theta_2)$ denote the information matrices for the same two distributions, but conditioned on the observation being below $\mu$, for $\theta_1$, or above $\mu$, for $\theta_2$. Then the information matrix of a skewed and asymmetric random variable constructed in the same way as (19) is

0

B B B B B IB B B B B @

0

1

1 2

C C C C C C = C C C C A

y

B B B B B B B B B B B B B B B @

I1; 4

2

I1; 2

2

I1; 2

+1

I1;

I1;

I1;

I1;

I1;

1

I1;

1

2

0

0

0

1

=

2

I1;

1

I1;

1

0

I1;

1

+ I2; 2(1

I2; (1

0

)

0

2

)

I2; (1

+1 )2

I2; 1

)

0

2(1

)

0

0 2

C C C C C C C C C C C C C C C A

2(1 I2; (1

0 0

I2;

1

+ I2;

0

I2;

0

2

0

)2

I2; (1

)

2(1

@ 2 ln K1 @ 1 @ 01

+ I2;

2(1

0 + I2;

+

0

0

)2

+ I2; 2(1 )2

0

1 1

+ I2;

B B B 0 B B B 0 B B B B 0 @ 0 Corollary 1 When

0

2

0

I2; 4(1

B B B B B B B B B y B )B B B B B B B B @

+(1

1

2

I1; 2 I1;

I1;

2

I1; 2

I2;

2

) 2

)

2

0

0 I2;

0

0

0

@ 2 ln K12 @ 2

0

@ 2 ln K12 @ @ 01

@ 2 ln K12 @ @ 02

0

0

0

0

@ 2 ln K12 @ 1@

0

@ 2 ln K12 @ 1 @ 01

@ 2 ln K12 @ 1 @ 02

@ 2 ln K12 @ 2@

0

@ 2 ln K12 @ 2 @ 01

@ 2 ln K12 @ 2 @ 02

2 2

1

+

@ 2 ln K2 @ 2 @ 02

1 C C C C C C C C C C C C C C C C C C A

C C C C C C: C C C C A

there is skewness but no asymmetry and the information matrix of (21) is obtained.

Corollary 2 When there is asymmetry but no skewness, the row and column corresponding to $\alpha$ disappear and $\alpha^{\dagger} = 1/(1 + K_1/K_2)$.
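As a numerical illustration of Corollary 2, the following Python sketch evaluates $\alpha^{\dagger} = 1/(1 + K_1/K_2)$. Since the constant is not restated in this section, the sketch assumes that $K_i$ is the normalizing constant of the symmetric generalized t density, $K(\eta, \upsilon) = \upsilon / (2\eta^{1/\upsilon} B(1/\upsilon, \eta/\upsilon))$. The function names and the shape values are hypothetical and chosen only for illustration.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def gen_t_constant(eta, upsilon):
    """Assumed normalizing constant K(eta, upsilon) of the generalized t density."""
    log_k = (math.log(upsilon) - math.log(2.0)
             - math.log(eta) / upsilon
             - log_beta(1.0 / upsilon, eta / upsilon))
    return math.exp(log_k)


def alpha_dagger_no_skew(k1, k2):
    """Corollary 2: alpha-dagger = 1 / (1 + K1 / K2) when there is no skewness."""
    return 1.0 / (1.0 + k1 / k2)


# Illustrative (not estimated) shapes: tail index 2 on the left, 5 on the right.
k_left = gen_t_constant(eta=2.0, upsilon=2.0)
k_right = gen_t_constant(eta=5.0, upsilon=2.0)
print(alpha_dagger_no_skew(k_left, k_right))
```

With equal shape parameters on both sides the two constants coincide and the function returns 0.5.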

Corollary 3 For the generalized t distribution, 0

1

0

C B I C B + C B I C B C=B C B I+ C B A @ I+

B B B B I+ B B B @

I

+

I

+

I

I

I

I

I

I

I

I

I

I

+

1

0

C B C B C B C B C; I B C B C B A @

1

0

C B I C B C B I C B C=B C B I C B A @ I

I

I

I

I

I

I

I

I

I

I

I

I

1

C C C C C; C C A

where the elements of I are as in Proposition 1, but I+ =

I

I+ =

I

I+ =

I

with

+

[1 + 1= + 1=

1=

2 1=

= (1= ), I + = e

1=

(

]) ;

;

0:577 denoting the Euler-Mascheroni constant. Letting

I+ = e 1)

exp( ) ( + 1) 1= 2 ; B(1= ; 1= ) + +1 exp( ) ( + 1) 1= = (log( ) + B(1= ; 1= ) + +1 1= 2 exp( ) = B(1= ; 1= ) + +1 +1 =

! 0 gives

log )= (1= ), and I + = e

(

= (1 + 1= ). Thus I+ and I remain positive de…nite in the limit.

Remark 2 As with the symmetric distribution, the usual asymptotics hold for the scale and shape parameters in a static asymmetric generalized t model when $\mu$ is known. The ML estimators of these parameters are consistent and asymptotically normal when $\mu$ is estimated by the mode, but they may not be efficient; see Bickel (2002). A Monte Carlo study for the asymmetric GED reported in Bottazzi and Secchi (2011, pp. 1002-6) suggests that, when $\mu$ is estimated by ML, inference for the scale and shape parameters remains valid for any $\upsilon > 0$, despite the fact that the information matrix is no longer block diagonal. They also demonstrate that the ML estimator of $\mu$ will be asymptotically efficient for $\upsilon > 1/2$, the condition required to ensure that $I_{\mu\mu}$ exists.

In the EGARCH model, the information matrix takes on board the different shape parameters in the evaluation of $u_t$ and its derivatives. Thus the block corresponding to the dynamic parameters in the volatility equation is $I(\psi) = \alpha^{\dagger} I_{1,\lambda\lambda} D_1 + (1-\alpha^{\dagger}) I_{2,\lambda\lambda} D_2$, with $I_{i,\lambda\lambda} = \upsilon_i/(\tilde\eta_i + 1 + \tilde\eta_i\upsilon_i)$ for $i = 1, 2$.

LM tests against asymmetry can be carried out if a symmetric model has been fitted. The test statistic will be based on a contrast between the shape parameter score above and below $\mu$. For the tail index, the difference in the two scores near the tails will carry a good deal of influence. A test of whether just one of the tails is light may also be relevant.
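The scalar weights in the dynamic-parameter block can be checked numerically. The Python sketch below evaluates $I_{i,\lambda\lambda} = \upsilon_i/(\tilde\eta_i\upsilon_i + \tilde\eta_i + 1)$ and compares it with the variance of $(\eta+1)b - 1$ for $b \sim \text{beta}(1/\upsilon, \eta/\upsilon)$. This comparison assumes that the score for the log-scale has the form $u_t = (\eta+1)b_t - 1$. The function names are hypothetical and the inputs are illustrative values taken from Table 4.

```python
def info_log_scale(eta_bar, upsilon):
    """I_{lambda lambda} = upsilon / (eta_bar * upsilon + eta_bar + 1)."""
    return upsilon / (eta_bar * upsilon + eta_bar + 1.0)


def score_variance(eta_bar, upsilon):
    """Variance of (eta + 1) * b - 1 for b ~ beta(1/upsilon, eta/upsilon).

    Assumes a finite tail index, eta = 1 / eta_bar with eta_bar > 0.
    """
    eta = 1.0 / eta_bar
    a, b = 1.0 / upsilon, eta / upsilon
    var_b = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return (eta + 1.0) ** 2 * var_b


left = info_log_scale(0.136, 1.349)   # fat left tail
right = info_log_scale(0.0, 1.349)    # light right tail: reduces to upsilon
print(left, score_variance(0.136, 1.349), right)
```

Setting the inverse tail index to zero returns $\upsilon$ itself, so the light-tailed side carries the larger weight for a given $D_i$.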

4.3 Example: Silver

Table 4 shows the results from fitting a Beta-Asymmetric-Gen-t-EGARCH model to Silver returns. A model with skewness was estimated as well but the parameter $\alpha$ was not significantly different from 0.5. Asymmetry is present in the tail index but not in $\upsilon$. The left-hand tail is fat, with a tail index of $1/0.136 = 7.35$, whereas the right tail is light because $\tilde\eta_2 = 0$. The likelihood ratio test statistic that they are the same, which is asymptotically distributed as $\chi^2_1$ under the null hypothesis, is 23.4. Hence the value of 0.076 reported for the symmetric model in Table 1 is spread between $\tilde\eta_1$ and $\tilde\eta_2$. The SEs for the separate estimates are not much bigger than the SEs obtained when it is assumed that the parameters are the same. The estimate of $\upsilon$ is virtually unchanged, as is its SE. The asymmetry and the sharp peak induced by a value of $\upsilon$ well below two are evident in the histogram of raw returns. Leverage effects are again insignificant, but the asymmetry in the distribution imparts an asymmetry to the score function.
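The arithmetic behind these figures can be reproduced in a few lines. The Python sketch below recomputes the likelihood ratio statistic from the log-likelihoods reported in Tables 1 and 4 (-4640.1 for the symmetric generalized t and -4628.4 for the asymmetric model) and the left tail index from the estimate 0.136. The p-value uses the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$; it is not a figure reported in the text, and the function name is hypothetical.

```python
import math


def lr_test_one_restriction(loglik_restricted, loglik_unrestricted):
    """Return the LR statistic and its chi-square(1) p-value."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


lr, p_value = lr_test_one_restriction(-4640.1, -4628.4)
tail_index_left = 1.0 / 0.136
print(round(lr, 1), p_value, round(tail_index_left, 2))
```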

4.4 Martingale difference formulation

There is arguably a problem with imposing skewness and asymmetry in the way we have done, in that it can prevent $y_t$ being a martingale difference. This is because its conditional expectation, $\mu + \mu_{\varepsilon}\exp(\lambda_{t|t-1})$, where $\mu_{\varepsilon} = E(\varepsilon_t)$, is time-varying.

This issue is well-known for skewed distributions; see the discussion in Harvey (2013, pp 145-7). With asymmetry the problem again arises. The general solution is to set up the model as yt = 1

2 "

=

y

(2 )

+ ("t

")

tjt 1 );

where

1

1

2

1 1

1= 1

exp(

+ (1

1

y

) 2(1

)

1

2 2

1= 2

1

1 2

The score is now

ut =

8 > > < > > :

1+

1

1

1

1+

1

1

2

j"t j j"t j + 1= j"+ t j + j"t j + 1=

"

"t "

"+ t

where "t = "t =2 and "+ t = "t =2(1

1;

"t

0

1;

"t > 0

1

2

): The score is still a martingale di¤erence

as is

ut =

8 > > < > > :

1+

1 1

1+

1 2

j"t j j"t j + 1= j"+ t j + j"t j + 1=

=K1 (1 =K1 + (1 =K1 (1 =K1 + (1

1

2

)=K2 ; )=K2 )=K2 ; )=K2

the additional variable in the dynamic leverage equation, (18).


"t

0 ;

"t > 0

4.5 Example: SP500

The martingale difference formulation was used to estimate EGARCH models for daily returns on SP500 from 2 January 2004 to 31 December 2013. The results are shown in Table 5. The skew parameter is significantly different from 0.5. On the other hand, asymmetry in the tail index, while telling the story of a fatter left-hand tail, is not statistically significant when allowance is made for skewness: the LR test statistic is only 1.0. When the tail indices are constrained to be the same they take the value 32.26, that is 1/0.031. Without skewness, the LR test statistic for different tail indices is 28.0 and the lower tail index is 8.26. Finally, $\upsilon$ is significantly below two, just as it is for silver. On the other hand, when a model for weekly SP500 returns was estimated, $\upsilon$ was around 2.4.

5 Conclusion

An EGARCH model in which the dynamic equation for the logarithm of scale is driven by the conditional score can be set up with a generalized t distribution. For a finite tail index, the influence function is bounded, so mitigating the effect of outliers. Properties such as unconditional moments may be derived and the asymptotic distribution of the maximum likelihood estimator worked out. The dynamics can be extended to include leverage effects and explanatory variables. Empirical evidence shows the practical value of the model for commodity and stock returns.

By working with the inverse of the tail index, it is shown that the information matrix remains positive definite in the limit as the generalized t goes to a GED. The proof uses Taylor series and expansions for the trigamma function. Similarly, digamma function expansions are used to obtain limiting expressions for the scores. These results can be used to construct Lagrange multiplier tests of light tails against fat tails.

The generalized t distribution can be extended to accommodate skewness and/or asymmetry and we derive an expression for the information matrix in this case. The full model is very flexible but will often simplify. Parameter restrictions may be tested by likelihood ratio, Wald or Lagrange multiplier tests. For Lagrange multiplier tests, the form of the score in the cases examined indicates their plausibility.

The multivariate generalized t distribution, described in Arslan (2004), can be used to generalize the model for dynamic volatilities and changing correlations proposed by Creal et al (2011). For positive variables, the GB2 distribution may be parameterized in a similar way to the generalized t so as to include generalized gamma as a special case. The null hypothesis of light tails against the alternative of fat tails may be similarly tested within this framework. The overall theory parallels that of the generalized t, thereby providing a fully integrated approach to modeling volatility.

Acknowledgements

The work was carried out while Rutger-Jan Lange was a Post-Doctoral Research Associate on the project Dynamic Models for Volatility and Heavy Tails at the Economics Faculty, Cambridge. We are grateful to the Keynes Fund for financial support. Earlier versions of the paper were presented at Aix-Marseille School of Economics (GREQAM), at the Cambridge INET Workshop on Developments in Time Series in March 2015, at the EUI in Florence and at the Department of Statistics in Bologna. We are grateful to Chico Blasques, Christian Brownlees, Umberto Cherubini, Guiseppe Cavaliere, Russell Davidson, Simone Giannerini, Peter Hansen, Ryoko Ito, Sebastien Laurent, Alessandra Luati and Stephen Thiele for helpful comments. We would also like to thank the Associate Editor and two referees.

References

Arslan, O. (2004). Family of multivariate generalized t distributions. Journal of Multivariate Analysis 89, 329-37.

Bickel, D.R. (2002). Robust estimators of the mode and skewness of continuous data. Computational Statistics and Data Analysis 39, 153-63.

Bollerslev, T., Engle, R.F. and Nelson, D.B. (1994). ARCH models, in Handbook of Econometrics, Volume 4, 2959-3038. Engle, R.F. and McFadden, D.L. (eds). Amsterdam: North-Holland.

Bottazzi, G. and Secchi, A. (2011). A new class of asymmetric exponential power densities with applications to economics and finance. Industrial and Corporate Change 20, 991-1030.

Creal, D., Koopman, S.J. and Lucas, A. (2011). A dynamic multivariate heavy-tailed model for time-varying volatilities and correlations. Journal of Business and Economic Statistics 29, 552-63.

Creal, D., Koopman, S.J. and Lucas, A. (2013). Generalized autoregressive score models with applications. Journal of Applied Econometrics 28, 777-95.

Davis, P.J. (1964). Gamma function and related functions, in Abramowitz, M. and I.A. Stegun (eds.). Handbook of Mathematical Functions, 253-66, New York: Dover Publications Inc.

Embrechts, P., Kluppelberg, C. and T. Mikosch (1997). Modelling Extremal Events. Berlin: Springer Verlag.

Hansen, J.V., McDonald, J.B., Theodossiou, P. and Larsen, B.J. (2010). Partially Adaptive Econometric Methods for Regression and Classification. Computational Economics 36, 153-169.

Harvey, A.C. (2013). Dynamic Models for Volatility and Heavy Tails: with applications to financial and economic time series. Econometric Society Monograph, Cambridge University Press.

Harvey, A.C. and Sucarrat, G. (2014). EGARCH Models with Fat Tails, Skewness and Leverage. Computational Statistics and Data Analysis 26, 320-338.

Jensen, S.T. and Rahbek, A. (2004). Asymptotic inference for nonstationary GARCH. Econometric Theory 20, 1203-26.

McDonald, J.B. and Newey, W.K. (1988). Partially Adaptive Estimation of Regression Models via the Generalized t Distribution. Econometric Theory 4, 428-457.

Straumann, D. and T. Mikosch (2006). Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: a stochastic recurrence equations approach. Annals of Statistics 34, 2449-2495.

Theodossiou, P. and Savva, C.S. (2016). Skewness and the relation between risk and return. Management Science 62, 1598-1609.

Zhu, D. (2012). Asymmetric Parametric Distributions and a New Class of Asymmetric Generalized t Distribution. Working paper, School of Economics, Shanghai University of Finance and Economics. http://papers.ssrn.com/sol3/papers.cfm?abstract_i

Zhu, D. and Galbraith, J.W. (2010). A generalized asymmetric Student-t distribution with application to financial econometrics. Journal of Econometrics 157, 297-305.

Zhu, D. and Galbraith, J.W. (2011). Modeling and forecasting expected shortfall with the generalized asymmetric Student-t and asymmetric exponential power distributions. Journal of Finance 18, 765-78.

Zhu, D. and Zinde-Walsh, V. (2009). Properties and estimation of asymmetric exponential power distribution. Journal of Econometrics 148, 86-99.


Model

!

ln L AIC BIC Q (20) Q (20)

Student t 0.000 0.038 0.991 0.519 0.070 0.004 0.006 0.002 0.112

0.036

GED 0.000 0.045 0.987 0.502 0.086 ( 0.004 0.007 0.003 0.098

0.035

2) -4648.2 4.210 4.225

0.023 0)

0.034

Gen t 0.000 0.042 0.989 0.493 0.081 0.004 0.007 0.002 0.106

0.220 (

1.148 -4642.0 4.204 4.220 0.046

0.076 0.045

1.342 -4640.1 4.204 4.222 0.138

22.3

14.6

0.33

0.80

23.4

19.9

0.27

0.46

24.0

17.7

0.24

0.61

Table 1 Beta-Gen-t-EGARCH fitted to daily Silver returns from April 28th 2006 to February 11th 2015 - Standard errors (SEs) in small typeface


Model Constrained

! 0.042 0.989 0.491 0.081 0.076 1.342 0.007 0.003 0.103

0.035

y

y

-

- -4640.1 4.203 4.218

ln L AIC BIC

0.045 0.138

Uncon- 0.026 0.991 0.494 0.079 0.093 1.380 0.046 1.695 -4639.7 4.204 4.225 strained

0.026 0.004 0.115

0.034

0.049 0.147

0.040 1.118

Table 2 Beta-Gen-t-EGARCH and unconstrained EGARCH models fitted to Silver returns


Model Con strained

! 0.045 0.026 0.988 -0.535 0.040 0.041 1.497 0.005 0.005 0.001

0.102

0.014

y

y

z

-

-

- -3361.5 2.6766 2.6928

ln L

AIC

BIC

0.024 0.114

Uncon- 0.057 0.003 0.989 -0.439 0.040 0.038 1.479 0.007 3.653 0.002 -3360.6 2.6783 2.7014 strained

0.007 0.007 0.003

0.128

0.014

0.040 0.142

0.014 1.798

0.059

Table 3 Beta-Gen-t-EGARCH and unconstrained EGARCH models fitted to SP500 daily returns. The parameter

z


is in the leverage term (and

z

= 1).

!

1

ln L AIC BIC Q (20) Q (20)

2

Estimate 0.004 0.040 0.990 0.499 0.098 0.136 0.000 1.349 -4628.4 4.194 4.215 SE

0.004 0.007 0.002 0.104

0.034

0.052 0.048

27.1

20.1

0.13

0.45

0.144

Table 4 Beta-Asymmetric-Gen-t-EGARCH fitted to Silver returns


!

1

0.045 0.026 0.988 -0.535 0.040 0.005 0.005 0.001

0.102

0.014

0.049 0.036 0.986 -0.376 0.023 0.005 0.005 0.001

0.100

0.013

0.041 0.024

0.115

0.014

AIC

BIC

1.497 -3361.5 2.6766 2.6928 0.114

0.121 0.004 1.617 -3347.5 2.6663 2.6848 0.039 0.044 0.149

0.050 0.030 0.986 -0.233 0.015 0.573 0.031 0.005 0.005 0.001

ln L

2

0.011 0.029

1.478 -3342.0 2.6619 2.6805 0.122

0.047 0.032 0.987 -0.323 0.021 0.542 0.068 0.004 1.561 -3341.5 2.6623 2.6832 0.004 0.005 0.001

0.116

0.014

0.023 0.032 0.056 0.153

Table 5 Beta-Skew-Asymmetric-Gen-t-EGARCH fitted to SP500 daily excess returns from 2 January 2004 to 31 December 2013
