STATISTICAL AND ALGORITHM ASPECTS OF OPTIMAL PORTFOLIOS

DISSERTATION SUBMITTED TO THE INSTITUTE OF COMPUTATIONAL AND MATHEMATICAL ENGINEERING OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Howard Howan Stephen Shek March 2011

2011 by Howard Howan Stephen Shek. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/zv848cg8605


I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Walter Murray, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Tze Lai, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Peter Hansen

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.


Abstract

We address three key aspects of optimal portfolio construction: expected return modeling, variance-covariance modeling, and optimization in the presence of cardinality constraints.

On expected return modeling, we extend the self-excited point process framework to model conditional arrival intensities of bid and ask side market orders of listed stocks. The cross-excitation of market orders is modeled explicitly, such that the ask side market order size and the bid side probability weighted order book cumulative volume can affect the ask side order intensity, and vice versa. Different variations of the framework are estimated by maximum likelihood, based on a recursive application of the log-likelihood functions derived in this thesis. Results indicate that the self-excited point process framework is able to capture a significant amount of the underlying trading dynamics of market orders, both in-sample and out-of-sample.

A new framework, Realized GARCH, is introduced for the joint modeling of returns and realized measures of volatility. A key feature is a measurement equation that relates the realized measure to the conditional variance of returns. The measurement equation facilitates a simple modeling of the dependence between returns and future volatility. Realized GARCH models with a linear or log-linear specification have many attractive features. They are parsimonious, simple to estimate, and imply an ARMA structure for the conditional variance and the realized measure. An empirical application with DJIA stocks and an exchange traded index fund shows that a simple Realized GARCH structure leads to substantial improvements in the empirical fit over standard GARCH models.

Finally, we describe a novel algorithm to obtain the solution of the optimal portfolio problem with NP-hard cardinality constraints. The algorithm is based on a local relaxation that exploits the inherent structure of the objective function. It solves a sequence of small, local quadratic programs by first projecting asset returns onto a reduced metric space, followed by clustering in this space to identify sub-groups of assets that best accentuate a suitable measure of similarity amongst different assets. The algorithm can either be cold started using the centroids of initial clusters or warm started based on the output of a previous result. Empirical results, using baskets of up to 3,000 stocks and with different cardinality constraints, indicate that the algorithm is able to achieve significant performance gains over a sophisticated branch-and-cut method. One key application of this local relaxation algorithm is in dealing with large scale cardinality constrained portfolio optimization under tight time constraints, such as for the purpose of index tracking or index arbitrage at high frequency.


Contents

1 Introduction  1

2 Covariance Matrix Estimation and Prediction  6
   2.1 Literature Review  6
       2.1.1 High Frequency Volatility Estimators in Absence of Noise  6
       2.1.2 High Frequency Volatility Estimators in Presence of Market Microstructure Noise  7
       2.1.3 Volatility Prediction  18
       2.1.4 Covariance Prediction  23
   2.2 Complete Framework: Realized GARCH  26
       2.2.1 Framework  26
       2.2.2 Empirical Analysis  37

3 Asset Return Estimation and Prediction  45
   3.1 Outline of Some Commonly Used Temporal Models  45
   3.2 Self Excited Counting Process and its Extensions  49
       3.2.1 Univariate Case  50
       3.2.2 Bivariate Case  52
       3.2.3 Taking volume and orderbook imbalance into account  55
       3.2.4 Empirical Analysis  60

4 Solving Cardinality Constrained Mean-Variance Optimal Portfolio  69
   4.1 Literature Review  69
       4.1.1 Branch-and-bound  72
       4.1.2 Heuristic methods  72
       4.1.3 Linearization of the objective function  73
   4.2 Sequential Primal Barrier QP Method  73
       4.2.1 Framework  73
       4.2.2 Empirical Analysis  77
   4.3 Solving CCQP with a Global Smoothing Algorithm  78
       4.3.1 Framework  81
       4.3.2 Empirical Analysis  82
   4.4 Solving CCQP with Local Relaxation Approach  86
       4.4.1 Clustering  86
       4.4.2 Algorithms  90
       4.4.3 Computational Result  94
       4.4.4 Algorithm Benchmarking  95

5 Conclusion  108

A Appendix of Proofs  110

List of Tables

1  Key model features at a glance. The realized measures $R_t$, $RV_t$, and $x_t$ denote the intraday range, the realized variance, and the realized kernel, respectively. In the Realized GARCH model, the dependence between returns and innovations to the volatility (leverage effect) is modeled with $\tau(z_t)$, such as $\tau(z) = \tau_1 z + \tau_2(z^2 - 1)$, so that $E\tau(z_t) = 0$ when $z_t \sim (0,1)$. The MEM specification listed here is that selected by Engle & Gallo (2006) using BIC (see their Table 4). The distributional assumptions listed here are those used to specify the quasi log-likelihood function. (Gaussian innovations are not essential for any of the models.)  19

2  Results for the logarithmic specification: G(1,1) denotes the LGARCH(1,1) model that does not utilize a realized measure of volatility. RG(2,2) denotes the Realized GARCH(2,2) model without the $\tau(z)$ function that captures the dependence between returns and innovations in volatility. RG(2,2)* is the RG(2,2) extended to include the ARCH term $\alpha \log r_{t-1}^2$, the latter being insignificant.  39

3  Estimates for the RealGARCH(1,2) model.  40

4  In-Sample and Out-of-Sample Likelihood Ratio Statistics.  42

5  Fitted result for DCC(1,1)-RGARCH(1,1).  44

6  MLE fitted parameters for the proposed models; standard errors are given in parentheses. Sample date is 25 June 2010. * indicates that the value is not significant at the 95% level.  64

7  Nullspace CG algorithm output for the first 21 iterations. ngrad.r: norm of reduced gradient; ngrad: norm of gradient; obj: objective function value; cond: condition number of Hessian; cond.r: condition number of reduced Hessian; mu: barrier parameter; step: step length; res.pre: norm of residual pre CG; res.post: norm of residual post CG; maximum of outer iterations: 20; maximum of CG iterations: 30.  81

8  Successive arithmetic truncation method result for 500 of the most liquid US stocks, with cardinality $K = 15$.  97

9  Successive arithmetic truncation method result for 3,000 of the most liquid US stocks, with cardinality $K = 15$.  97

10 Successive geometric truncation method result for 3,000 of the most liquid US stocks, with cardinality constraint $K = 15$.  97

11 Parameter setup for the local relaxation method. (500, 15) means solving a 500 asset universe problem with cardinality equal to 15.  99

12 CPLEX parameter settings (with the rest of the parameters set to their default values).  99

13 Computational result for the 500 asset case with cardinality $K = 15$. For CPLEX and the local relaxation algorithm, we have imposed a maximum run time cutoff at 1,200 seconds.  99

14 Computational result for the 3,000 asset case, with cardinality $K = 15$. Successive truncation is done over 10 iterations.  100

List of Figures

2.1  Monthly variance using daily close-to-close vs using hourly open-to-close. Sample period: 2008-02-02 to 2009-01-01. Slope is from simple OLS.  24

2.2  MLE fitted EWMA parameter as a function of sampling period; Gaussian density.  25

2.3  News impact curve for IBM and SPY.  43

3.1  Output of three variations of the ARMA time series framework. Black circles are actual data points and red dashed lines indicate the next period predictions. Top panel: simple ARMA; center panel: ARMA with wavelet smoothing; bottom panel: ARMA-GARCH. Dataset used is the continuously rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5 minute intervals. Sample period shown here is between 2008-05-06 13:40:00 and 2008-05-07 15:15:00. At each point, we fit a model using the most recent history of 1,000 data points, then make a 1-period ahead forecast using the fitted parameters.  47

3.2  Output based on the AIC general state space model following Hyndman et al. (2000) and the Holt-Winters Additive models. Black circles are actual data points and red dashed lines indicate the next period predictions. Dataset used is the continuously rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5 minute intervals. Sample period shown here is between 2008-05-06 13:40:00 and 2008-05-07 15:15:00. At each point, we fit a model using the most recent history of 1,000 data points, then make a 1-period ahead forecast using the fitted parameters.  49

3.3  Diagrammatic illustration of a feed-forward (4,3,1) neural network.  50

3.4  Output based on a feed-forward (15,7,1) neural network. Black circles are actual data points and red dashed lines indicate the next period predictions. Dataset used is the continuously rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5 minute intervals. Sample period shown here is between 2008-05-06 13:40:00 and 2008-05-07 15:15:00. At each point, we fit a model using the most recent history of 1,000 data points, then make a 1-period ahead forecast using the fitted parameters.  51

3.5  Simulated intensity of a univariate Hawkes process, with $\mu = 0.3$, $\alpha = 0.6$ and $\beta = 1.2$.  53

3.6  Conditional intensity of bid and ask side market orders following an order submitted on the bid side of the market, estimated with bin sizes ranging from 30 to 500 milliseconds, using BP tick data on 25 June 2010.  54

3.7  Simulated intensity of a bivariate Hawkes process. Path in blue (top) is a realization of the $\lambda^1$ process; path in red (bottom) is that of the $\lambda^2$ process, inverted to aid visualization. Parameters: $\mu_1 = \mu_2 = 0.5$, $\alpha_{11} = \alpha_{22} = 0.8$, $\alpha_{12} = \alpha_{21} = 0.5$, $\beta_{11} = \beta_{12} = 1.5$ and $\beta_{22} = \beta_{21} = 1.5$.  55

3.8  Snapshots showing the evolution of a ten layer deep limit order book just before a trade has taken place (gray lines) and just after (black lines) for BP. Dotted lines are the best bid and ask prices. Solid line is the average or mid price. Bars are scaled by maximum queue size across the whole book and represented in two color tones to help identify changes in the order book just before and after a trade has taken place.  58

3.9  Time to order completion as a function of order submission distance from the best prevailing bid and ask prices. Distance is defined as the number of order book median price increments from the best top layer prices at the opposite side of the market. For example, a distance of 1 corresponds to the top bid and ask prices and a distance of 2 corresponds to the second layer of the bid and ask sides.  59

3.10 Probability of order completion within 5 seconds from submission. Dots are estimated based on empirical data and the solid curve is based on the fitted power law function $0.935\,x^{-0.155}$.  60

3.11 Time series for the difference in bid and ask side LOB probability weighted cumulative volume, $\bar v(t,\tau,L;1) - \bar v(t,\tau,L;2)$, for BP, on 25 June 2010.  61

3.12 Top panel: time series for best prevailing bid and ask prices of the limit order book. Bottom panel: bid-ask spread. Both are for BP, on 25 June 2010.  62

3.13 Unconditional arrival intensity of market and limit orders on bid and ask sides of the order book, estimated using overlapping windows of one minute, for BP on 25 June 2010.  63

3.14 KS-plot for empirical inter-arrival times for market orders on the bid and ask sides of the market. Dashed lines indicate the two sided 99% error bounds based on the distribution of the Kolmogorov-Smirnov statistic. Also shown is the value of the Kolmogorov-Smirnov test statistic for the two order types.  65

3.15 KS-plot based on four variations of the framework discussed: bivariate, bivariate with trade size mark, bivariate with order book imbalance mark, and bivariate with trade size and order book imbalance marks. Fitted to sample data on 25 June 2010. Dashed lines indicate the two sided 99% error bounds based on the distribution of the Kolmogorov-Smirnov statistic. Also shown is the value of the Kolmogorov-Smirnov test statistic for the two order types.  66

3.16 In sample KS-plot for empirical inter-arrival times for market orders on the bid and ask sides of the LOB. Model parameters are fitted with data from the in sample period on 06 July 2009 and applied to the in sample period on 06 July 2009. Dashed lines indicate the two sided 99% error bounds based on the distribution of the Kolmogorov-Smirnov statistic. Also shown is the value of the Kolmogorov-Smirnov test statistic for the two order types.  67

3.17 Out of sample KS-plot for empirical inter-arrival times for market orders on the bid and ask sides of the LOB. Model parameters are fitted with data from the in sample period on 06 July 2009 and applied to the out of sample period on 07 July 2009. Dashed lines indicate the two sided 99% error bounds based on the distribution of the Kolmogorov-Smirnov statistic. Also shown is the value of the Kolmogorov-Smirnov test statistic for the two order types.  68

4.1  Time complexity of simple QP with only the budget constraint. Problem size is the number of names in the portfolio. Both axes are in log scale. Result produced using the R built-in QP solver based on a dual algorithm proposed by Goldfarb and Idnani (1982) that relies on Cholesky and QR factorization, both of which have cubic complexity.  70

4.2  Null space line-search algorithm result. Top panel: value of barrier parameter, $\mu$, as a function of iteration. Bottom panel: 1-norm of error. $\sigma$ is set to 0.8 and the initial feasible value is $x_0 = [0.3\;\; 0.5\;\; 0.2\;\; 0.7\;\; 0.3]^\top$. The trajectory of the 1-norm error in the bottom panel illustrates that the algorithm stayed within the feasible region of the problem.  77

4.3  Histogram of mean daily returns for the 500 most actively traded stocks between 2008-01-22 and 2010-01-22.  78

4.4  Optimization output using the built-in R routine that is based on the dual method of Goldfarb and Idnani (1982, 1983). Top panel: with equality constraint only. Bottom panel: with equality and inequality constraints. min x and max x are the minimum and maximum of the solution vector $x$, respectively.  79

4.5  Optimization output using Algorithm 1. Top panel: with equality constraint only. Bottom panel: with equality and inequality constraints. min x and max x are the minimum and maximum of the solution vector $x$, respectively.  79

4.6  Algorithm 1 convergence diagnostics. Top panel: value of barrier parameter $\mu$; center panel: 1-norm of difference between the objective value at each iteration and the true value (as defined to be the value calculated by the R function); bottom panel: the number of CG iterations at each outer iteration.  80

4.7  Additional Algorithm 1 convergence diagnostics. Top panel: step-length $\alpha$; center panel: objective function value; bottom panel: 2-norm of the gradient.  80

4.8  Converged output for $K = 100$. Panel 1: first 500 variables are $x_i$, second 500 are $y_i$, and the last 500 are $s_i$. Panel 2: histogram of $x_i$; panel 3: histogram of $y_i$; panel 4: histogram of $s_i$.  85

4.9  Converged output for $K = 250$. Panel 1: first 500 variables are $x_i$, second 500 are $y_i$, and the last 500 are $s_i$. Panel 2: histogram of $x_i$; panel 3: histogram of $y_i$; panel 4: histogram of $s_i$.  85

4.10 Heatmap of the covariance matrix for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01. Darker colors indicate correlation closer to 1 and lighter colors closer to −1.  87

4.11 A minimum spanning tree for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01, using the distance metric defined in (4.18). Each color identifies a different sector.  88

4.12 k-means clustering in a space spanned by the first three principal components based on log returns for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01.  89

4.13 Fully relaxed (i.e. without cardinality constraint) QP solution for 500 actively traded US stocks (not all tickers are shown) sampled over the period from 2002-01-02 to 2010-01-22, with dollar neutral constraint.  96

4.14 Projection onto a 4-dimensional space spanned by the first four dominant PCA factors, based on daily returns for the 500 most liquid US traded stocks, sampled over the period from 2002-01-02 to 2010-01-22. Symbols with different colors correspond to different cluster groups.  98

4.15 CPLEX versus local relaxation method performance comparison for the 500 asset case, with cardinality $K = 15$; both are cold started.  100

4.16 Projection onto a four dimensional space spanned by the first four dominant PCA factors, based on monthly returns for the 3,000 most liquid US traded stocks, sampled over the period from 2005-05-31 to 2010-04-30. Symbols with different colors correspond to different cluster groups.  101

4.17 CPLEX versus local relaxation method performance comparison for the 3,000 asset case, with cardinality constraint $K = 15$; both cold started.  102

4.18 CPLEX versus local relaxation method performance comparison for the 500 asset universe, with cardinality constraint $K = 15$; both methods are warm started using the solution of successive truncation over 20 iterations. Maximum run-time is set to one hour.  103

4.19 CPLEX versus local relaxation method performance comparison for the 3,000 asset universe, with cardinality constraint $K = 100$; both methods are warm started using the solution of arithmetic successive truncation over 20 iterations. Maximum run-time is set to seven hours.  104

4.20 CPLEX MIQP applied to the union of the cluster groups identified by the local relaxation algorithm upon convergence (the beginning of the flat line in Figure 4.19). Maximum run-time is set to one hour.  105

4.21 Mean-variance efficient frontiers for a 3,000 asset universe. QP is the frontier without the cardinality constraint. CCQP is the frontier in presence of cardinality. To produce the frontier for CCQP, we warm-start the local relaxation algorithm based on the successive truncation solution, and set a maximum run time limit of 252,000 seconds for the case where $K = 100$ and 3,600 seconds for the case where $K = 15$.  106

1 Introduction

Portfolio optimization in the classical mean-variance optimal sense is a well studied topic (see Markowitz, 1952, 1987; Sharpe, 1964). It is relevant at all trading frequencies, whenever the agent has an exponential utility function, and whenever multiple return forecasts are aggregated into a portfolio. In high frequency finance, where transaction cost plays a key part in the eventual return of a strategy and where the speed of transaction depends on the number of stocks in the portfolio, we need a framework to form a mean-variance optimal portfolio that bounds the total number of names to execute. Ultimately, the problem we aim to solve is a cardinality-constrained quadratic program (CCQP) with linear constraints, which can be expressed as

$$\min_{x, \tilde{x}} \; f(x) = c^\top x + \tfrac{1}{2} x^\top H x \qquad (1.1)$$
$$\text{s.t.} \quad A x = b \qquad (1.2)$$
$$\qquad\quad e^\top \tilde{x} = K \qquad (1.3)$$

where $c, x, e, \tilde{x} \in \mathbb{R}^{n \times 1}$, $H \in \mathbb{R}^{n \times n}$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^{m \times 1}$, $m \leq n$, and $e$ is a vector of 1's. Let $x^*$ be our solution set; the indicator variable in (1.3) is given by

$$\tilde{x}_i = \begin{cases} 1 & x_i \in x^* \\ 0 & \text{otherwise.} \end{cases}$$

The three key components in the objective function are the prediction of the expected return, $c$, and the variance-covariance matrix, $H$, and the specific algorithm for solving the optimization problem.
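To make the combinatorial structure of (1.1)-(1.3) concrete, the following minimal sketch (not from the thesis; written in Python/NumPy, with all toy parameter values invented for illustration) solves a small CCQP exactly by enumerating every size-$K$ support and solving the equality-constrained QP restricted to that support through its KKT system. The $\binom{n}{K}$ cost of this enumeration is precisely what motivates the algorithms developed in Chapter 4.

```python
# Brute-force CCQP sketch: enumerate supports of size K, solve each
# equality-constrained QP via its KKT system, keep the best objective.
import itertools
import numpy as np

def solve_ccqp_bruteforce(c, H, A, b, K):
    n = len(c)
    best_val, best_x = np.inf, None
    for support in itertools.combinations(range(n), K):
        S = list(support)
        Hs, cs, As = H[np.ix_(S, S)], c[S], A[:, S]
        m = As.shape[0]
        # KKT system for: min c_S'x + 0.5 x'H_S x  s.t.  A_S x = b
        kkt = np.block([[Hs, As.T], [As, np.zeros((m, m))]])
        rhs = np.concatenate([-cs, b])
        try:
            sol = np.linalg.solve(kkt, rhs)
        except np.linalg.LinAlgError:
            continue  # singular KKT system: skip this support
        xs = sol[:K]
        val = cs @ xs + 0.5 * xs @ Hs @ xs
        if val < best_val:
            best_val, best_x = val, np.zeros(n)
            best_x[S] = xs
    return best_x, best_val

# Toy example: 6 assets, budget constraint e'x = 1, cardinality K = 2.
rng = np.random.default_rng(0)
G = rng.normal(size=(6, 6))
H = G @ G.T + 6 * np.eye(6)         # positive definite covariance proxy
c = -rng.uniform(0.0, 0.1, size=6)  # negated expected returns
A, b = np.ones((1, 6)), np.array([1.0])
x, val = solve_ccqp_bruteforce(c, H, A, b, K=2)
print(np.round(x, 4), round(val, 6))
```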

This thesis addresses these three key aspects of optimal portfolio construction. Based on Shek (2010), we investigate a self-excited point process approach to forecasting expected return. Until recently, the majority of time series analyses related to financial data have been carried out using regularly spaced price data, with the goal of modeling and forecasting key distributional characteristics of future returns, such as the expected mean and variance. These time series data mainly consist of daily closing prices, for which comprehensive data are widely available for a large set of asset classes. With the recent rapid development of high-frequency finance, the focus has shifted to intraday tick data, which record every transaction during market hours, and come with irregularly spaced time-stamps. We could resample the dataset and apply the same analyses as before, or we could try to explore the additional information that the inter-arrival times may convey in terms of likely future trade direction. In order to properly take into account these irregular occurrences of transactions, we can adopt the framework of point processes. In a doubly stochastic framework (see Bartlett, 1963), both the counting process and the driving intensity are stochastic. A point process is called self-excited if the current intensity of the events is determined by events in the past; see Hawkes (1971). It is widely accepted and observed that the volatility of price returns tends to cluster: a period of elevated volatility is likely to be followed by periods of similar levels of volatility. Trade arrivals also exhibit such a clustering effect; for example, a buy order is likely to be followed closely by another buy order. These orders tend to cluster in time. There has been a growing body of literature on the application of point processes to model inter-arrival trade durations; see Bowsher (2003) for a comprehensive survey of the latest modeling frameworks.
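Before turning to the proposed framework, a minimal simulation sketch may help fix intuition for self-excitation (illustrative only, not thesis code; the parameters are borrowed from Figure 3.5). It implements Ogata's thinning method for a univariate Hawkes process with exponential kernel, $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta(t - t_i)}$.

```python
# Ogata thinning for a univariate Hawkes process (sketch).
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while t < T:
        # Upper bound on the intensity: it only decays until the next event.
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        t += rng.exponential(1.0 / lam_bar)           # candidate arrival
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.uniform() <= lam_t / lam_bar and t < T:
            events.append(t)                           # accept: intensity jumps by alpha
    return np.array(events)

events = simulate_hawkes(mu=0.3, alpha=0.6, beta=1.2, T=100.0)
# Branching ratio alpha/beta = 0.5, so the stationary rate is mu/(1 - alpha/beta).
print(len(events), "events; stationary rate =", 0.3 / (1 - 0.6 / 1.2))
```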

The proposed framework extends previous work on the application of self-excited processes to model high frequency financial data. The extension comes in the form of a marked version of the process, in order to take into account the influence of trade size on the underlying arrival intensity. In addition, by incorporating information from the limit order book (LOB), the proposed framework takes into account a measure of supply-demand imbalance of the market, by parametrizing the underlying base intensity as a function of this imbalance measure. The intensities $\lambda^1_t$ and $\lambda^2_t$ for bid and ask side market orders, respectively, are given by

$$\lambda^1_t = \mu^1 \bar v^2_t + \frac{1}{\bar w^1}\sum_{t_i < t} w_{t_i}\,\alpha_{11}\, e^{-\beta_{11}(t - t_i)} + \frac{1}{\bar w^2}\sum_{t_j < t} w_{t_j}\,\alpha_{12}\, e^{-\beta_{12}(t - t_j)},$$
$$\lambda^2_t = \mu^2 \bar v^1_t + \frac{1}{\bar w^1}\sum_{t_i < t} w_{t_i}\,\alpha_{21}\, e^{-\beta_{21}(t - t_i)} + \frac{1}{\bar w^2}\sum_{t_j < t} w_{t_j}\,\alpha_{22}\, e^{-\beta_{22}(t - t_j)},$$

where $t_i$ and $t_j$ are the $\mathcal{F}_t$-adapted arrival times of bid and ask side market orders, $w_{t_i}$, $w_{t_j}$ are the trade size marks with means $\bar w^1$, $\bar w^2$, and $\bar v^1_t$, $\bar v^2_t$ are the probability weighted cumulative LOB volumes on the two sides of the market.

with

$$Z^{(h)} = I + \sum_{j=1}^{h}\left(P^j - \Pi\right)$$

and

$$y(x_r, x_s) = \lim_{h \to \infty} y^{(h)}(x_r, x_s) = e_r^\top (I - Z) f + e_s^\top Z f,$$

where $Z$ is the fundamental matrix of the underlying Markov chain. Focusing on the $h = \infty$ case and defining $y_{(r,s)} = y(x_r, x_s)$, the filtered realized variance (the infeasible estimator) is given by

$$RV_F = \sum_{r,s} n_{r,s}\, y_{(r,s)}^2.$$

The empirical transition matrix is given by $\hat P_{r,s} = n_{r,s}/n_{r,\bullet}$, where $n_{r,\bullet} = \sum_s n_{r,s}$. Then we have the feasible estimator

$$RV_{\hat F} = \sum_{r,s} n_{r,s}\, \hat y_{(r,s)}^2,$$

where

$$\hat y_{(r,s)} = e_r^\top (I - \hat Z) f + e_s^\top \hat Z f \qquad\text{and}\qquad \hat Z = \left(I - \hat P + \hat\Pi\right)^{-1}.$$

Definition 12. The Markov chain estimator in price levels is defined as

$$MC^{\#} := n \left\langle f, \left(2\hat Z - I\right) f \right\rangle_{\hat\pi},$$

where the inner product is defined to be $\langle a, b \rangle_\pi = a^\top \Lambda_\pi b$ with $\Lambda_\pi = \mathrm{diag}(\pi_1, \pi_2, \ldots, \pi_{S^k})$.

It can be shown (Theorem 4 in the paper) that $RV_{\hat F} - MC^{\#} = O_p(n^{-1})$, and if the first observed state coincides with the last observed state then $RV_{\hat F} = MC^{\#}$.

The Markov chain estimator in log-prices is, to a good approximation,

$$MC = \frac{n^2 \left\langle f, \left(2\hat Z - I\right) f \right\rangle_{\hat\pi}}{\left(\sum_{i=1}^{n} X_{T_i}\right)^2}.$$

This approximation is good as long as $X_{T_i}$ does not fluctuate dramatically. $MC$ allows for faster computation and it preserves the asymptotic theory derived for $MC^{\#}$. Robustness to inhomogeneity (for both $MC^{\#}$ and its standard deviation) is dealt with by artificially increasing the order of the Markov chain. For example, a simulation study shows that, for a data generating process driven by a Markov chain of order one, estimation with a Markov chain of order two still yields a consistent estimate of $MC^{\#}$, although the asymptotic variance of the estimator increases with $k$. Note that although misspecifying the order of the Markov chain leads to poor estimates of many population quantities, it will accurately estimate the quantity we seek.

One special feature of the $MC$ estimator is that it is a generalization of the ALT estimator proposed by Large (2007) (see Section 2.1.2). It can be shown that the ALT estimator is identical to $MC^{\#}$ with $k = 1$ and $S = 2$, with $f = (+\kappa, -\kappa)^\top$.
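The following minimal sketch (illustration only; the toy two-state chain and the mapping $f$ are invented, with $f = (+\kappa, -\kappa)^\top$ chosen to mirror the ALT special case above) computes the feasible $MC^{\#}$ estimator for a first-order chain, following the construction just described.

```python
# Feasible Markov chain estimator MC# for a first-order chain on S states.
import numpy as np

def mc_sharp(states, f):
    states = np.asarray(states)
    S, n = len(f), len(states) - 1
    counts = np.zeros((S, S))
    for r, s in zip(states[:-1], states[1:]):
        counts[r, s] += 1.0
    P = counts / counts.sum(axis=1, keepdims=True)   # empirical transition matrix
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    Pi = np.tile(pi, (S, 1))                          # every row equals pi
    Z = np.linalg.inv(np.eye(S) - P + Pi)             # fundamental matrix
    Lam = np.diag(pi)
    return n * f @ Lam @ (2.0 * Z - np.eye(S)) @ f    # n * <f, (2Z - I)f>_pi

# Toy usage: k = 1, S = 2, i.e. the ALT case with f = (+kappa, -kappa).
rng = np.random.default_rng(1)
chain = [0]
for _ in range(5000):
    chain.append(rng.choice(2, p=[0.7, 0.3] if chain[-1] == 0 else [0.4, 0.6]))
print(mc_sharp(chain, f=np.array([+0.01, -0.01])))
```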

2.1.3 Volatility Prediction

Volatility is more predictable than the mean of the underlying return process. Stylized effects, such as the diurnal intraday pattern and clustering at different time durations, are prevalent in many assets and are well known and studied. In the subsections that follow, some key features of a number of forecasting frameworks are introduced. Table 1 on page 19 summarizes their key features.

GARCH (Bollerslev, 1986; Engle, 1982)  The latent volatility process of asset returns is relevant to a wide variety of applications, such as option pricing and risk management, and GARCH models are widely used to model the dynamic features of volatility. This has sparked the development of a large number of ARCH and GARCH models since the seminal paper by Engle (1982). Within the GARCH framework, the key element is the specification of the conditional variance. GARCH models utilize daily returns (typically squared returns) to extract information about the current level of volatility, and this information is used to form expectations about the next period's volatility. A single return is unable to offer more than a weak signal about the current level of volatility. The implication is that GARCH models are poorly suited for situations where volatility changes rapidly to a new level, because the GARCH model is slow at catching up, and it will take many periods for the conditional variance (implied by the GARCH model) to reach its new level. The Partial GARCH and MEM discussed in this section, and the Realized GARCH in Section 2.2, are second generation GARCH models that address these shortcomings, but still retain the essential features of the original framework.
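The slow catch-up can be made concrete with a small filtering experiment (a sketch, not thesis code; all parameter values are invented): running the GARCH(1,1) recursion $h_t = \omega + \alpha r_{t-1}^2 + \beta h_{t-1}$ through a simulated jump in the true variance shows how many periods the filtered conditional variance needs to adjust.

```python
# GARCH(1,1) variance filter run through a jump in true volatility.
import numpy as np

def garch_filter(returns, omega, alpha, beta):
    h = np.empty(len(returns) + 1)
    h[0] = omega / (1.0 - alpha - beta)       # start at the unconditional variance
    for t, r in enumerate(returns):
        h[t + 1] = omega + alpha * r**2 + beta * h[t]
    return h

rng = np.random.default_rng(0)
true_var = np.r_[np.full(200, 1.0), np.full(200, 4.0)]   # variance quadruples at t = 200
r = rng.standard_normal(400) * np.sqrt(true_var)
h = garch_filter(r, omega=0.05, alpha=0.08, beta=0.90)
print("filtered variance just after the jump:", h[201].round(2),
      "vs 50 periods later:", h[251].round(2))
```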

UHF-GARCH (Engle, 2000)  The proposed procedure is to model the associated variables, such as observed (i.e. latent value plus any measurement noise) returns (or marks), conditional on the arrival times, and then to separately model the arrival times. Below, the notation used in the original paper is retained for ease of comparison. Define the inter-trade duration

$$x_i = t_i - t_{i-1}.$$

Table 1: Key model features at a glance (the flattened layout is reconstructed below; rows follow the observable, specification and distributional assumption for each model).

GARCH-X: observable $r_t$; $r_t = \sqrt{h_t}\,z_t$, $h_t = \omega + \alpha r_{t-1}^2 + \beta h_{t-1} + \gamma x_{t-1}$; $z_t \sim$ iid $N(0,1)$.

UHF-GARCH: observables $r_i/\sqrt{x_i}$ and $x_i$; $r_i/\sqrt{x_i} = \rho\, r_{i-1}/\sqrt{x_{i-1}} + e_i + \phi e_{i-1}$, $x_i = \psi_i \varepsilon_i$, $\sigma_i^2 = V_{i-1}\!\left(r_i/\sqrt{x_i} \mid x_i\right)$; $e_i \sim$ iid $N(0,1)$, $\varepsilon_i \sim$ iid Exp(1).

MEM: observables $r_t^2$, $R_t^2$, $RV_t$; $r_t^2 = h_t z_t^2$, $R_t^2 = h_{R,t} z_{R,t}^2$, $RV_t = h_{RV,t} z_{RV,t}^2$, with $h_t = \omega + \alpha r_{t-1}^2 + \beta h_{t-1} + \delta r_{t-1}^2 \mathbf{1}(r_{t-1} < 0) + \varphi R_{t-1}^2$, $h_{R,t} = \omega_R + \alpha_R R_{t-1}^2 + \beta_R h_{R,t-1} + \delta_R r_{t-1}^2$, $h_{RV,t} = \omega_{RV} + \alpha_{RV} RV_{t-1} + \beta_{RV} h_{RV,t-1} + \delta_{RV} r_{t-1}^2 + \vartheta_{RV} RV_{t-1}\mathbf{1}(r_{t-1} < 0)$; $(z_t, z_{R,t}, z_{RV,t})^\top \sim$ iid $N(0, I)$.

HEAVY: observables $r_t$ and $x_t$ (realized kernel); $r_t = \sqrt{h_t}\,z_t$, $x_t = \mu_t z_{RK,t}^2$, $h_t = \exp\{\omega + \beta \log h_{t-1} + \gamma \log x_{t-1}\}$, $\mu_t = \omega_R + \alpha_R x_{t-1} + \beta_R \mu_{t-1}$; $(z_t, z_{RK,t})^\top \sim$ iid $N(0, I)$.

Realized GARCH: observables $r_t$ and $x_t$; $r_t = \sqrt{h_t}\,z_t$, $\log x_t = \xi + \varphi \log h_t + \tau(z_t) + u_t$; $z_t \sim$ iid $N(0,1)$, $u_t \sim$ iid $N(0, \sigma_u^2)$.

The realized measures $R_t$, $RV_t$ and $x_t$ denote the intraday range, the realized variance, and the realized kernel, respectively. In the Realized GARCH model, the dependence between returns and innovations to the volatility (leverage effect) is modeled with $\tau(z_t)$, such as $\tau(z) = \tau_1 z + \tau_2(z^2 - 1)$, so that $E\tau(z_t) = 0$ when $z_t \sim (0,1)$. The MEM specification listed here is that selected by Engle & Gallo (2006) using BIC (see their Table 4). The distributional assumptions listed here are those used to specify the quasi log-likelihood function (Gaussian innovations are not essential for any of the models).

The conditional intensity of the counting process $N(t)$, given the joint density $f$ of durations and marks, is

$$\lambda_i(t, \tilde x_{i-1}, \tilde y_{i-1}) = \lim_{\Delta t \to 0} \frac{\Pr\!\left(N(t + \Delta t) > N(t) \mid \tilde x_{i-1}, \tilde y_{i-1}\right)}{\Delta t} = \frac{\int_{u \in \Xi} f(t - t_{i-1}, u \mid \tilde x_{i-1}, \tilde y_{i-1}; \theta_i)\, du}{\int_{s \geq t} \int_{u \in \Xi} f(s - t_{i-1}, u \mid \tilde x_{i-1}, \tilde y_{i-1}; \theta_i)\, du\, ds}.$$

Define the component conditional densities

$$f(x_i, y_i \mid \tilde x_{i-1}, \tilde y_{i-1}; \theta_i) = g(x_i \mid \tilde x_{i-1}, \tilde y_{i-1}; \theta_{1i})\, q(y_i \mid x_i, \tilde x_{i-1}, \tilde y_{i-1}; \theta_{2i}).$$

Then we obtain

$$\lambda_i(t, \tilde x_{i-1}, \tilde y_{i-1}) = \frac{g(t - t_{i-1} \mid \tilde x_{i-1}, \tilde y_{i-1}; \theta_{1i})}{\int_{s \geq t} g(s - t_{i-1} \mid \tilde x_{i-1}, \tilde y_{i-1}; \theta_{1i})\, ds}. \qquad (2.12)$$

Model for the inter-arrival time, $x_i$.  To model the inter-arrival time, $x_i$, Engle and Russell (1998) adopted the following framework.

Definition 13. The Autoregressive Conditional Duration (ACD) model (Engle and Russell, 1998) is given by

$$x_i = \psi_i \varepsilon_i, \quad \varepsilon_i \sim \text{iid}(1,1), \qquad (2.13)$$
$$\psi_i = \psi(\tilde x_{i-1}, \tilde y_{i-1}; \theta_i) = E\left[x_i \mid \tilde x_{i-1}, \tilde y_{i-1}; \theta_{1i}\right] = \int_\Omega x_i\, g(x_i \mid \tilde x_{i-1}, \tilde y_{i-1}; \theta_{1i})\, dx_i. \qquad (2.14)$$

Note that (2.13) is powerful because it nests

- the log linear model: $\log x_i = z_i \beta + w_i$, and
- the Cox proportional hazard model (Cox, 1972): $\lambda(t, z) = \lambda_0(t) e^{-z\beta}$; so if $\lambda_0 \equiv$ constant, then we have (2.13).

Example. Consider the conditional mean specification

$$x_i = \psi_i \varepsilon_i, \quad \varepsilon_i \sim \text{Exp}(1),$$
$$\psi_i = \omega + \alpha x_{i-1} + \beta \psi_{i-1}.$$

From (2.13) we have

$$g(x_i \mid \tilde x_{i-1}, \tilde y_{i-1}; \theta_{1i}) = g(x_i = \varepsilon_i \psi_i \mid \psi_i; \theta_{1i}) = g\!\left(\varepsilon_i = \frac{x_i}{\psi_i}; \theta_{1i}\right) = p_0\!\left(\varepsilon_i = \frac{x_i}{\psi_i}; \theta_{11}\right)\frac{1}{\psi_i}.$$

The density and survivor function of $\varepsilon$, and the base intensity, can be expressed as

$$\varepsilon \sim p_0(\bullet; \theta_{11}), \qquad S_0(t; \theta_{11}) = \int_{s > t} p_0(s; \theta_{11})\, ds, \qquad \lambda_0(t; \theta_{11}) = \frac{p_0(t; \theta_{11})}{S_0(t; \theta_{11})}.$$

Then (2.12) can be expressed as

$$\lambda_i(t, \tilde x_{i-1}, \tilde y_{i-1}) = \frac{p_0\!\left(\frac{t - t_{i-1}}{\psi_i}; \theta_{11}\right)}{\psi_i \int_{\tilde s \geq \frac{t - t_{i-1}}{\psi_i}} p_0(\tilde s; \theta_{11})\, d\tilde s} = \frac{1}{\psi_i}\, \lambda_0\!\left(\frac{t - t_{i-1}}{\psi_i}; \theta_{11}\right).$$

Note that once we have specified the density of $\varepsilon$ and the functional form for $\psi$, the conditional intensity is fully specified. For example, if we have $\Pr(\varepsilon = x) = e^{-x}$, i.e. the standard exponential density, then the intensity reduces to $\lambda_i = 1/\psi_i$, a piecewise constant intensity.

Model for price volatility, $\sigma_i$.  Recall that the UHF-GARCH framework is a two step process of modeling the arrival time, $x_i$, and the price volatility, $\sigma_i$, separately. With $x_i$ specified by (2.13) and (2.14), we are now in a position to model $\sigma_i$. The return process, $r_i$, in terms of the observed price process, $p_i$, is given by

$$r_i = p_i - p_{i-1}.$$

The observed price process is related to the latent process, $m_i$, via

$$p_i = m_i + \zeta_i,$$

where $\zeta_i$ is the error from truncation. Assume the latent price, $m_i$, to be a Martingale with respect to public information, with innovations that can be written, without loss of generality, as proportional to the square root of time. We have a duration adjusted return given by

$$\frac{r_i}{\sqrt{x_i}} = \frac{\Delta m_i}{\sqrt{x_i}} + \frac{\Delta \zeta_i}{\sqrt{x_i}} = \nu_i + \eta_i,$$

such that all relevant quantities are expressed per unit of time. Consider a serially correlated specification for the second term,(5)

$$\eta_i = \rho \eta_{i-1} + \xi_i + \chi \xi_{i-1};$$

then we arrive at an ARMA(1,1) model for our duration adjusted return given by

$$\frac{r_i}{\sqrt{x_i}} = \rho \frac{r_{i-1}}{\sqrt{x_{i-1}}} + e_i + \phi e_{i-1}, \qquad (2.15)$$

where $e_i = \nu_i + \xi_i$.

(5) Autocorrelated, since the truncation at one point in time is likely to be the same as the truncation several seconds later.

Definition 14. Define the conditional variance per unit of time

$$V_{i-1}\!\left(\left.\frac{r_i}{\sqrt{x_i}}\,\right|\, x_i\right) = \sigma_i^2, \qquad (2.16)$$

where the filtration for $V_{i-1}(\bullet)$ is $\sigma((x_j, r_j) : j = 0, 1, \ldots, i-1)$. Comparing (2.16) to the definition in classic GARCH for the return, $V_{i-1}(r_i \mid x_i) = h_i$, we see that $h_i = x_i \sigma_i^2$; hence $\sigma_i^2$ is the variance per unit of time.

Empirical observations  There are a number of ways to specify $\sigma_i^2$; one such specification (Eqn (40) in their paper) is given below:

$$\sigma_i^2 = \omega + \alpha e_{i-1}^2 + \beta \sigma_{i-1}^2 + \gamma_1 x_i^{-1} + \gamma_2 \frac{x_i}{\psi_i} + \gamma_3 \xi_{i-1} + \gamma_4 \psi_{i-1}.$$

With the above fitted to data for IBM, a number of interesting observations are made:

- the mean of (2.15) is negative, i.e. no news is bad news (Diamond and Verrecchia, 1987);
- $\gamma_1$ is positive, i.e. no trade means no news, hence low volatility (Easley and O'Hara, 1992);
- $\gamma_2$ is negative, i.e. mean reversion after a surprise in trade;
- $\gamma_4$ is positive, i.e. high transaction rates mean high volatility.

Partial GARCH (Engle, 2002b)  High-frequency financial data are now readily available, and the literature has recently introduced a number of realized measures of volatility, including the realized variance, the bipower variation, the realized kernel, and many related quantities; see Sections 2.1.1 and 2.1.2, and Barndorff-Nielsen and Shephard (2002), Barndorff-Nielsen and Shephard (2004), Barndorff-Nielsen et al. (2008b), Hansen and Horel (2010), and references therein. Any of these measures is far more informative about the current level of volatility than is the squared return. This makes realized measures useful for modeling and forecasting future volatility. Let $x_t$ be some observable time series that is referred to as the realized measure. For instance, $x_t$ could be the realized variance computed from high-frequency data on day $t$. The realized variance is, like $r_t^2$, related to $h_t$, and this holds true for many high-frequency based measures. So it is natural to augment the standard GARCH model with $x_t$. A simple extension is to add the realized measure to the dynamic equation for the conditional variance,

$$h_t = \omega + \alpha r_{t-1}^2 + \beta h_{t-1} + \gamma x_{t-1}. \qquad (2.17)$$

This is known as the GARCH-X model, where the label X is due to $x_t$ being treated as an exogenous variable. Engle (2002b) was the first to estimate this model with the realized variance as the exogenous variable. This model was also estimated by Barndorff-Nielsen and Shephard (2007), who used both the realized variance and the bi-power variation as the exogenous variable. The GARCH-X model is referred to as a partial model because it treats $x_t$ as an exogenous variable. Within the GARCH-X framework no effort is paid to explaining the variation in the realized measures, so these are partial (incomplete) models that have nothing to say about returns and volatility beyond a single period into the future. In order to complete the model we need a specification for $x_t$, which is given in Section 2.2. The conditional variance, $h_t$, is, by definition, adapted to $\mathcal{F}_{t-1}$. Thus, if $\gamma \neq 0$, then (2.17) implies that $x_t$ is adapted to $\mathcal{F}_t$. A filtration that would satisfy this requirement is $\mathcal{F}_t = \sigma(r_t, x_t, r_{t-1}, x_{t-1}, \ldots)$, but $\mathcal{F}_t$ could in principle be an even richer $\sigma$-field.

MEM (Engle, 2002b)  The Multiplicative Error Model (MEM) by Engle (2002b) was the first complete model in this context; see also Engle and Gallo (2006). This model specifies a GARCH structure for each of the realized measures, so that an additional latent volatility process is introduced for each realized measure in the model. Another complete model is the HEAVY model by Shephard and Sheppard (2010) that, in terms of its mathematical structure, is nested in the MEM framework. Unlike the traditional GARCH models, these models operate with multiple latent volatility processes. For instance, the MEM by Engle and Gallo (2006) has a total of three latent volatility processes and the HEAVY model by Shephard and Sheppard (2010) has two (or more) latent volatility processes.

2.1.4 Covariance Prediction

Model-free predictor  A simple estimator would be to use the trailing 255 daily close-to-close return history to estimate the variance-covariance (VCV) matrix, with a monthly update, i.e. the estimated VCV is kept constant throughout the month, until the next update. An alternative would be to incorporate intraday information, using hourly open-to-close returns, where market-microstructure noise is less prevalent. For this alternative, we need to address the contribution of variance from activities outside of market hours. One obvious way to account for this systematic bias is to scale the open-to-close VCV. In Figure 2.1 we plot the monthly variance of daily close-to-close returns against the monthly variance of hourly open-to-close returns. Using simple ordinary least squares (OLS) regression, together with the fact that trading spans 6.5 hours during the day, the estimated average per-hour ratio of open-to-close to overnight variance is in the range of 4.4 to 16.1.(6) This is in line with the estimated value of 12.78 by Oldfield and Rogalski (1980) and 13.2 by French and Roll (1986). We see that the value of this bias can vary significantly depending on the stock and, not shown here, it can also be a function of the sample period. This can be explained by considering the following:

- at one end of the spectrum, for an American depositary receipt (ADR) that is linked to a stock mainly traded on an Asian exchange, most of the information content would be captured in the close-to-open returns; hence, we expect a smaller (likely less than 1) hourly open-to-close to overnight variance ratio;
- at the other end of the spectrum, for the stock of a company that generates its income mainly in the US, we would expect most of the price movement to come from trading during market hours, hence a more volatile open-to-close return variance, which leads to a higher variance ratio.

(6) Note that the higher the ratio, the more volatile the open-to-close return relative to close-to-open returns.

Figure 2.1: Monthly variance using daily close-to-close vs using hourly open-to-close, shown in four panels for IBM, DELL, WFT and XOM (OLS slopes 8.62, 7.12, 10.11 and 7.79 across the panels). Sample period: 2008-02-02 to 2009-01-01. Slope is from simple OLS.

The simplest form of model-free estimator incorporating higher frequency data is as follows:

- calculate hourly open-to-close returns (based on tick or higher frequency data);
- calculate the VCV matrix over a suitably chosen number of observations as the look-back period;
- scale the VCV by a stock specific scaling parameter, calculated over the same look-back period, to give an unbiased estimator of the daily close-to-close VCV.

Moving average predictor  The RiskMetrics conditional exponentially weighted moving average (EWMA) covariance estimator, based on daily observations, is given by

$$H_t = \alpha\, r_{t-1} r_{t-1}^\top + (1 - \alpha) H_{t-1}, \qquad (2.18)$$

where the initial recursion value, $H_0$, can be taken to be the unconditional sample covariance matrix. RiskMetrics (1996) suggests a value for the smoothing parameter of $\alpha = 0.06$. The primary advantage of this estimator is clear: it has no parameters to estimate and is simple to implement. The obvious drawback is that it forces all assets to have the same smoothing coefficient, irrespective of the asset's volatility dynamics. This will be addressed in the following section, where, with added complexity, we aim to address some of these asset idiosyncratic behaviors. When applied to high-frequency data, it has often been observed that (2.18) gives a noisy estimator of the VCV matrix. Therefore, the RiskMetrics EWMA estimator is adapted to high frequency observations as follows. Let the daily latent covariance estimator, $H_\tau$, be given by

$$H_\tau = \alpha\, r_\tau r_\tau^\top + (1 - \alpha) H_{\tau-1},$$

where $r_\tau r_\tau^\top$ is the variance-covariance of the returns over the period $\tau \in [t_{1,\tau}, t_{n,\tau}]$; $r_{1,\tau}$ is the first and $r_{n,\tau}$ the last observed return for day $\tau$. In other words, $r_\tau r_\tau^\top$ is simply the high-frequency realized VCV for day $\tau$. So we have one latent VCV estimator, $H_\tau$, per day. To estimate the EWMA parameters explicitly, we use MLE to maximize the joint Gaussian density. The fitted result gives $\alpha = 0.30$, i.e. significantly more weight on the high-frequency estimator, $r_{\tau-1} r_{\tau-1}^\top$, than that suggested by RiskMetrics.(7) To quantify the robustness of the EWMA estimator to the sampling frequency, the dataset is down-sampled to frequencies corresponding to sampling every 10min, 15min, 20min, 25min and 30min. Figure 2.2 shows the MLE fitted parameter as a function of sampling period. It can be seen that as the sampling frequency decreases, there is a corresponding decrease in the persistency parameter. This is attributable to the decreasing information content in less frequently sampled return data.

(7) The MLE fitted parameters assuming a t(5)-distribution are (0.28, 0.72).

Figure 2.2: MLE fitted EWMA parameter as a function of sampling period (in multiples of 5 minutes); Gaussian density.

DCC(1,1)-GARCH predictor (Engle, 2002c)  The DCC-GARCH framework relies on two latent processes: the conditional variance, $D_t^2$, and the conditional correlation, $R_t$, where both $D_t, R_t \in \mathbb{R}^{n \times n}$ and $D_t$ is diagonal, so that $H_t = D_t R_t D_t$. The conditional variance is given by

$$D_t^2 = \mathrm{diag}(\omega_i) + \mathrm{diag}(\kappa_i) \otimes r_{t-1} r_{t-1}^\top + \mathrm{diag}(\lambda_i) \otimes D_{t-1}^2,$$

where $r_t \in \mathbb{R}^n$ and $\otimes$ is the Hadamard product. The conditional correlation is given by

$$R_t = \mathrm{diag}(Q_t)^{-1/2}\, Q_t\, \mathrm{diag}(Q_t)^{-1/2},$$
$$Q_t = (1 - \alpha - \beta)\bar Q + \alpha\, \epsilon_{t-1}\epsilon_{t-1}^\top + \beta Q_{t-1}, \qquad (2.19)$$

where $\bar Q = E\left[\epsilon_{t-1}\epsilon_{t-1}^\top\right]$ is the unconditional correlation matrix of the standardized residuals. The log-likelihood for Gaussian innovations can be expressed as

$$L = -\frac{1}{2}\sum_t \left( n \log 2\pi + \log |D_t|^2 + r_t^\top D_t^{-2} r_t - \epsilon_t^\top \epsilon_t + \log |R_t| + \epsilon_t^\top R_t^{-1} \epsilon_t \right).$$

Parameter estimation is done via a two stage optimization procedure.

Stage One  In the first stage, we maximize the log-likelihood of the conditional variance process,

$$\hat\theta = \arg\max_{\omega,\kappa,\lambda} \left\{ -\frac{1}{2}\sum_t \left( n \log 2\pi + \log |D_t|^2 + r_t^\top D_t^{-2} r_t \right) \right\}.$$

Stage Two  In the second stage, we maximize the conditional correlation process, given the stage one result,

$$\max_{\alpha,\beta} \left\{ -\frac{1}{2}\sum_t \left( \log |R_t| + \epsilon_t^\top R_t^{-1} \epsilon_t - \epsilon_t^\top \epsilon_t \right) \right\}.$$

In the next section, we will extend this framework to take into account high frequency observations, based on a newly proposed univariate model for the conditional variance.

2.2 Complete Framework: Realized GARCH

2.2.1 Framework

We propose a new framework that combines a GARCH structure for returns with a model for realized measures of volatility. Models within this framework are called Realized GARCH models, a name that transpires both the objective of these models (similar to GARCH) and the means by which these models operate (using realized measures). A Realized GARCH model maintains the single volatility-factor structure of the traditional GARCH framework. Instead of introducing additional latent factors, we take advantage of the natural relationship between the realized measure and the conditional variance, and we will argue that there is no need for additional factors in many cases. The general structure of the RealGARCH(p,q) model is given by

$$r_t = \sqrt{h_t}\, z_t, \qquad (2.20)$$
$$h_t = v(h_{t-1}, \ldots, h_{t-p}, x_{t-1}, \ldots, x_{t-q}), \qquad (2.21)$$
$$x_t = m(h_t, z_t, u_t), \qquad (2.22)$$

where $z_t \sim$ iid$(0,1)$ and $u_t \sim$ iid$(0, \sigma_u^2)$, with $z_t$ and $u_t$ being mutually independent. We refer to the first two equations as the return equation and the GARCH equation, and these define a class of GARCH-X models, including those that were estimated by Engle (2002a) and Barndorff-Nielsen and Shephard (2007). Recall that the GARCH-X acronym refers to the fact that $x_t$ is treated as an exogenous variable. We shall refer to (2.22) as the measurement equation, because the realized measure, $x_t$, can often be interpreted as a measurement of $h_t$. The simplest example of a measurement equation is $x_t = h_t + u_t$. The measurement equation is an important component because it completes the model. Moreover, the measurement equation provides a simple way to model the joint dependence between $r_t$ and $x_t$, which is known to be empirically important. This dependence is modeled through the presence of $z_t$ in the measurement equation, which we find to be highly significant in our empirical analysis. It is worth noting that most (if not all) variants of ARCH and GARCH models are nested in the Realized GARCH framework. See Bollerslev (2009) for a comprehensive list of such models. The nesting can be achieved by setting $x_t = r_t$ or $x_t = r_t^2$, and the measurement equation is redundant for such models, because it is reduced to a simple identity.

The Realized GARCH model with a simple log-linear specification is characterized by the following GARCH and measurement equations:

$$\log h_t = \omega + \sum_{i=1}^{p} \beta_i \log h_{t-i} + \sum_{j=1}^{q} \gamma_j \log x_{t-j}, \qquad (2.23)$$
$$\log x_t = \xi + \varphi \log h_t + \tau(z_t) + u_t, \qquad (2.24)$$

where $z_t = r_t/\sqrt{h_t} \sim$ iid$(0,1)$, $u_t \sim$ iid$(0, \sigma_u^2)$, and $\tau(z)$ is called the leverage function.

Remark 15. A logarithmic specification for the measurement equation seems natural in this context. The reason is that (2.20) implies that

$$\log r_t^2 = \log h_t + \log z_t^2, \qquad (2.25)$$

and a realized measure is in many ways similar to the squared return, $r_t^2$, albeit a more accurate measure of $h_t$. It is therefore natural to explore specifications where $\log x_t$ is expressed as a function of $\log h_t$ and $z_t$, such as (2.24). A logarithmic form for the measurement equation makes it convenient to specify the GARCH equation with a logarithmic form, because this induces a convenient ARMA structure, as we shall see below.

Remark 16. In our empirical application we adopt a quadratic specification for the leverage function, $\tau(z_t)$. The identity (2.25) motivated us to explore expressions that involve $\log z_t^2$, but these were inferior to the quadratic expression, and resulted in numerical issues because zero returns are occasionally observed in practice.

Remark 17. The conditional variance, $h_t$, is, by definition, adapted to $\mathcal{F}_{t-1}$. Therefore, if $\gamma \neq 0$, then $x_t$ must also be adapted to $\mathcal{F}_t$. A filtration that would satisfy this requirement is $\mathcal{F}_t = \sigma(r_t, x_t, r_{t-1}, x_{t-1}, \ldots)$, but $\mathcal{F}_t$ could in principle be an even richer $\sigma$-field.

Remark 18. Note that the measurement equation does not require $x_t$ to be an unbiased measure of $h_t$. For instance, $x_t$ could be a realized measure that is computed with high-frequency data from a period that only spans a fraction of the period that $r_t$ is computed over. E.g. $x_t$ could be the realized variance for a 6.5 hour long period whereas the return, $r_t$, is a close-to-close return that spans 24 hours. When $x_t$ is roughly proportional to $h_t$, then we should expect $\varphi \approx 1$, and that is indeed what we find empirically, both when we use open-to-close returns and close-to-close returns.

An attractive feature of the log-linear Realized GARCH model is that it preserves the ARMA structure that characterizes some of the standard GARCH models. This shows that the ARCH nomenclature is appropriate for the Realized GARCH model. For the sake of generality we derive the result for the case where the GARCH equation includes lagged squared returns. Thus consider the following GARCH equation,

$$\log h_t = \omega + \sum_{i=1}^{p} \beta_i \log h_{t-i} + \sum_{j=1}^{q} \gamma_j \log x_{t-j} + \sum_{j=1}^{q} \alpha_j \log r_{t-j}^2, \qquad (2.26)$$

where $q = \max_i \{(\alpha_i, \gamma_i) \neq (0,0)\}$.

Proposition 19. Define $w_t = \tau(z_t) + u_t$ and $v_t = \log z_t^2 - \kappa$, where $\kappa = E \log z_t^2$. The Realized GARCH model defined by (2.24) and (2.26) implies

$$\log h_t = \mu_h + \sum_{i=1}^{p \vee q} (\alpha_i + \beta_i + \varphi\gamma_i) \log h_{t-i} + \sum_{j=1}^{q} (\gamma_j w_{t-j} + \alpha_j v_{t-j}),$$
$$\log x_t = \mu_x + \sum_{i=1}^{p \vee q} (\alpha_i + \beta_i + \varphi\gamma_i) \log x_{t-i} + w_t + \sum_{j=1}^{p \vee q} \left\{ -(\alpha_j + \beta_j) w_{t-j} + \varphi\alpha_j v_{t-j} \right\},$$
$$\log r_t^2 = \mu_r + \sum_{i=1}^{p \vee q} (\alpha_i + \beta_i + \varphi\gamma_i) \log r_{t-i}^2 + v_t + \sum_{j=1}^{p \vee q} \left\{ \gamma_j (w_{t-j} - \varphi v_{t-j}) - \beta_j v_{t-j} \right\},$$

where $\mu_h = \omega + \gamma_\bullet \xi + \alpha_\bullet \kappa$, $\mu_x = \varphi(\omega + \alpha_\bullet \kappa) + (1 - \alpha_\bullet - \beta_\bullet)\xi$, and $\mu_r = \omega + \gamma_\bullet \xi + (1 - \beta_\bullet - \varphi\gamma_\bullet)\kappa$, with

$$\alpha_\bullet = \sum_{j=1}^{q} \alpha_j, \qquad \beta_\bullet = \sum_{i=1}^{p} \beta_i, \qquad \gamma_\bullet = \sum_{j=1}^{q} \gamma_j,$$

using the conventions $\beta_i = \gamma_j = \alpha_j = 0$ for $i > p$ and $j > q$.

So the log-linear Realized GARCH model implies that $\log h_t$ is ARMA$(p \vee q,\, q - 1)$, whereas $\log r_t^2$ and $\log x_t$ are ARMA$(p \vee q,\, p \vee q)$. If $\alpha_1 = \cdots = \alpha_q = 0$, then $\log x_t$ is ARMA$(p \vee q,\, p)$. From Proposition 19 we see that the persistence of volatility is summarized by a persistence parameter

$$\pi = \sum_{i=1}^{p \vee q} (\alpha_i + \beta_i + \varphi\gamma_i) = \alpha_\bullet + \beta_\bullet + \varphi\gamma_\bullet.$$

Example 20. By setting $x_t = r_t^2$, it is easy to verify that the RealGARCH(p,q) nests the GARCH(p,q) model. For instance, with $p = q = 1$ we obtain the GARCH(1,1) structure with

$$v(h_{t-1}, r_{t-1}^2) = \omega + \alpha r_{t-1}^2 + \beta h_{t-1}, \qquad m(h_t, z_t, u_t) = h_t z_t^2.$$

The measurement equation is simply an identity in this case, i.e. we can take $u_t = 0$ for all $t$.

Example 21. If we set $x_t = r_t$, then we obtain the EGARCH(1,1) model by Nelson (1991) with

$$v(h_{t-1}, r_{t-1}) = \exp\left\{\omega + \alpha|z_{t-1}| + \theta z_{t-1} + \beta \log h_{t-1}\right\}, \qquad m(h_t, z_t, u_t) = \sqrt{h_t}\, z_t,$$

since $z_{t-1} = r_{t-1}/\sqrt{h_{t-1}}$.

Naturally, the interesting case is when $x_t$ is a high-frequency based realized measure, or a vector containing several realized measures. Next we consider some particular variants of the Realized GARCH model.

Consider the case where the realized measure, $x_t$, is a consistent estimator of the integrated variance. Now write the integrated variance as a linear combination of the conditional variance and a random innovation, and we obtain the relation $x_t = \xi + \varphi h_t + \epsilon_t$. We do not impose $\varphi = 1$, so that this approach also applies when the realized measure is computed from a shorter period (e.g. 6.5 hours) than the interval that the conditional variance refers to (e.g. 24 hours). Having a measurement equation that ties $x_t$ to $h_t$ has several advantages. First, it induces a simple and tractable structure that is similar to that of the classical GARCH framework. For instance, the conditional variance, the realized measure, and the squared return all have ARMA representations. Second, the measurement equation makes it simple to model the dependence between shocks to returns and shocks to volatility, which is commonly referred to as a leverage effect. Third, the measurement equation induces a structure that is convenient for prediction. Once the model is estimated, it is simple to compute distributional predictions for the future path of volatilities and returns, and these predictions do not require us to introduce auxiliary future values for the realized measure.

To illustrate the framework and fix ideas, consider a canonical version of the Realized GARCH model that will be referred to as the RealGARCH(1,1) model with a log-linear specification. This model is given by the three equations

$$r_t = \sqrt{h_t}\, z_t,$$
$$\log h_t = \omega + \beta \log h_{t-1} + \gamma \log x_{t-1},$$
$$\log x_t = \xi + \varphi \log h_t + \tau(z_t) + u_t,$$

where $z_t \sim$ iid$(0,1)$, $u_t \sim$ iid$(0, \sigma_u^2)$, and $h_t = \mathrm{var}(r_t \mid r_{t-1}, x_{t-1}, r_{t-2}, x_{t-2}, \ldots)$. The last equation relates the observed realized measure to the latent volatility, and is therefore called the measurement equation. It is easy to verify that $\log h_t$ is an autoregressive process of order one,

$$\log h_t = \mu + \pi \log h_{t-1} + w_{t-1},$$

where $\mu = \omega + \gamma\xi$, $\pi = \beta + \varphi\gamma$, and $w_t = \gamma\tau(z_t) + \gamma u_t$. So it is natural to adopt the nomenclature of GARCH models. The inclusion of the realized measure in the model and the fact that $\log x_t$ has an ARMA representation motivate the name Realized GARCH. A simple, yet potent specification of the leverage function is $\tau(z) = \tau_1 z + \tau_2(z^2 - 1)$, which can generate an asymmetric response in volatility to return shocks. The simple structure of the model makes the model easy to estimate and interpret, and leads to a tractable analysis of the quasi maximum likelihood estimator.

The MEM by Engle and Gallo (2006) utilizes two realized measures in addition to the squared returns. These are the intraday range (high minus low) and the realized variance, whereas the HEAVY model by Shephard and Sheppard (2010) uses the realized kernel (RK) by Barndorff-Nielsen et al. (2008b). These models introduce an additional latent volatility process for each of the realized measures. So the MEM and the HEAVY digress from the traditional GARCH models that only have a single latent volatility factor. Unlike the MEM by Engle and Gallo (2006) and the HEAVY model by Shephard and Sheppard (2010), the Realized GARCH has the following characteristics:

- it maintains the single factor structure of latent volatility;
- it ties the realized measure directly to the conditional variance;
- it models the return-volatility dependence (leverage effect) explicitly.

Key model features are given in Table 1. Brownless and Gallo (2010) estimate a restricted MEM model that is closely related to the Realized GARCH with the linear specification. They utilize a single realized measure, the realized kernel by Barndorff-Nielsen et al. (2008b), so that they have two latent volatility processes, $h_t = E(r_t^2 \mid \mathcal{F}_{t-1})$ and $\mu_t = E(x_t \mid \mathcal{F}_{t-1})$. However, their model is effectively reduced to a single factor model as they introduce the constraint $h_t = c + d\mu_t$; see Brownless and Gallo (2010, Eqns. (6)-(7)). This structure is also implied by the linear version of our measurement equation. However, they do not formulate a measurement equation or relate $x_t - \mu_t$ to a leverage function. Instead, for the purpose of simplifying the prediction problem, they adopt a simple time-varying ARCH structure, $\mu_t = a_t + b_t x_{t-1}$, where $a_t$ and $b_t$ are defined by spline methods. Spline methods were introduced in this context by Engle and Rangel (2008) to capture the low-frequency variation in the volatility.

One of the main advantages of the Realized GARCH framework is the simplicity by which the dependence between return shocks and volatility shocks is modeled with the leverage function. The MEM is formulated with a general dependence structure for the innovations that drive the latent volatility processes. The usual MEM formulation is based on a vector of non-negative random innovations, $\eta_t$, that are required to have mean $E(\eta_t) = (1, \ldots, 1)'$. The literature has explored distributions with this property, such as certain multivariate Gamma distributions, and Cipollini, Engle, and Gallo (2009) use copula methods that entail a very flexible class of distributions with the required structure. Some drawbacks of this approach are that estimation is complex and that a rigorous analysis of the asymptotic properties of these estimators seems intractable. A simpler way to achieve the required structure in the multiplicative error distribution is by setting $\eta_t = Z_t \odot Z_t$ and working with the vector of random variables, $Z_t$, instead. The required structure can be obtained with a more traditional error structure, where each element of $Z_t$ is required to have zero mean and unit variance. This alternative formulation can be adopted without any loss of generality, since the dependence between the elements of $Z_t$ is allowed to take any form. The estimates in Engle and Gallo (2006) and Shephard and Sheppard (2010) are based on a likelihood where the elements of

ηt

are independent

χ2 -distributed

random variables with one degree of freedom. We have used the

alternative formulation in Table 1 where

2 2 (zt2 , zR,t , zRV,t )0

corresponds to

ηt

in the MEM by Engle

and Gallo (2006).

Leverage function and news impact

The function

τ (z)

is called the leverage function because

it captures the dependence between returns and future volatility, a phenomenon that is referred to as the leverage eect. We normalized such functions by

Eτ (zt ) = 0, and we focus on those that have

the form

τ (zt ) = τ1 a1 (zt ) + · · · + τk ak (zt ),

where

Eak (zt ) = 0,

for all

k,

so that the function is linear in the unknown parameters. We shall see that the leverage function induces an EGARCH type structure in the GARCH equation, and we note that the functional form used in Nelson (1991),

τ (zt ) = τ1 z + τ+ (|zt | − E|zt |)

30

is within the class of leverage functions we

consider. We shall mainly consider leverage functions that are constructed from Hermite polynomials

τ (z) = τ1 z + τ2 (z 2 − 1) + τ3 (z 3 − 3z) + τ4 (z 4 − 6z 2 + 3) + · · · , and our baseline choice for the leverage function is a simple quadratic form This choice is convenient because it ensures that

Ezt = 0

and

var(zt ) = 1.

Eτ (zt ) = 0,

τ (zt ) = τ1 zt + τ2 (zt2 − 1).

for any distribution of

zt ,

so long as

The polynomial form is also convenient in our quasi likelihood analysis,

and in our derivations of the kurtosis of returns generated by this model. The leverage function

τ (z)

is closely related to the

news impact curve, see Engle and Ng (1993),

that maps out how positive and negative shocks to the price aect future volatility. We can dene the news impact curve by

ν(z) = E(log ht+1 |zt = z) − E(log ht+1 ), so that

100ν(z) measures the percentage impact on volatility as a function of the studentized return.

From the ARMA representation it follows that

ν(z) = γ1 τ (z).

Quasi-Maximum Likelihood Estimation (QMLE) Analysis

In this section we discuss the

asymptotic properties of the quasi-maximum likelihood estimator within the RealGARCH(p, q) model. The structure of the QMLE analysis is very similar to that of the standard GARCH model, see Bollerslev and Wooldridge (1992), Lee and Hansen (1994), Lumsdaine (1996), and Jensen and Rahbek (2004b,a).

Both Engle and Gallo (2006) and Shephard and Sheppard (2010) justify the

standard errors they report, by referencing existing QMLE results for GARCH models. This argument hinges on the fact that the joint log-likelihood in Engle and Gallo (2006) and Shephard and Sheppard (2010) is decomposed into a sum of univariate GARCH-X models, whose likelihood can be maximized separately. The factorization of the likelihood is achieved by two facets of these models: One is that all observables (i.e. squared return and each of the realized measures) are being tied to their individual latent volatility process. The other is that the primitive innovations in these models are taken to be independent in the formulation of the likelihood function. The latter inhibits a direct modeling of leverage eect with a function such as

τ (zt ),

which is one of the traits of the Realized

GARCH model. In this section we will derive the underlying QMLE structure for the log-linear model. The structure of the linear

Realized GARCH

Realized GARCH

model is similar. We provide detailed expres-

sions for the rst and second derivatives of the log-likelihood function. These expressions facilitate direct computation of robust standard errors, and provide insight about regularity conditions that would justify QMLE inference. For instance, the rst derivative will unearth regularity conditions that enables a central limit theorem be applied to the score function. For the purpose of estimation, we adopt a Gaussian specication, so that the log likelihood function is given by

n

`(r, x; θ) = −

We write the leverage function as

1X [log(ht ) + rt2 /ht + log(σu2 ) + u2t /σu2 ]. 2 t=1

τ 0 at = τ1 a1 (zt ) + · · · + τk0 ak (zt ),

31

and denote the parameters in the

model by

θ = (λ0 , ψ 0 , σu2 )0 ,

where λ = (ω, β1 , . . . , βp , γ1 , . . . , γq )0 ,

To simplify the notation we write

˜ t = log ht h

and

x ˜t = log xt ,

ψ = (ξ, ϕ, τ 0 )0 .

and dene

˜ t−1 , . . . , h ˜ t−p , x gt = (1, h ˜t−1 , . . . , x ˜t−q )0 ,

˜ t , a0 )0 . mt = (1, h t

So the GARCH and measurement equations can be expresses as

˜ t = λ0 gt h

x ˜t = ψ 0 mt + ut .

and

ht

The dynamics that underlies the score and Hessian are driven by to

λ.

and its derivatives with respect

The properties of these derivatives are stated next.

Lemma 22. Dene h˙ t = h˙ t =

p X

˜t ∂h ∂λ

and h¨ t =

˜t ∂2h ∂λ∂λ0

. Then h˙ s = 0 and h¨ s = 0 for s ≤ 0, and

and

βi h˙ t−i + gt

¨t = h

i=1

p X

¨ t−i + (H˙ t−1 + H˙ 0 ), βi h t−1

i=1

where H˙ t−1 = 01+p+q×1 , h˙ t−1 , . . . , h˙ t−p , 01+p+q×q is an p + q + 1 × p + q + 1 matrix. (ii) When p = q = 1 we have with β = β1 that 



h˙ t =

t−1 X

and

β j gt−j

¨t = h

j=0

t−1 X

kβ k−1 (Gt−k + G0t−k ),

k=1

where Gt = (03×1 , gt , 03×1 ). Proposition 23. (i) The score,

∂` ∂θ

=

n P t=1

∂`t ∂θ

, is given by



(1 − zt2 +

1 ∂`t =−  ∂θ 2

2ut ˙ t )h˙ t 2 u σu

t − 2u σ 2 mt

  , 

u

2 σu −u2t 4 σu

where u˙ t = ∂ut /∂ log ht = −ϕ + 12 zt τ 0 a˙ t with a˙ t = ∂a(zt )/∂zt . n P ∂ 2 `t ∂2` (ii) The second derivative, ∂θ∂θ 0 = ∂θ∂θ 0 , is given by t=1

  ∂ 2 `t =  0 ∂θ∂θ

where

n − 12 zt2 +

bt = (0, 1, − 21 zt a˙ 0t )0

2(u˙ 2t +ut u ¨t ) h˙ t h˙ 0t − 12 1 − zt2 2 σu u˙ t ut ˙0 ˙0 2 mt ht + σ 2 bt ht σu u ut u˙ t ˙ 0 4 ht σu

o

and

n

 u ¨t = − 14 τ 0 zt a˙ t + zt2 a¨t

+

2ut u˙ t 2 σu

o







− σ12 mt m0t



ut 0 4 mt σu

2 2 1 σu −2ut 6 2 σu

 , 

¨t h u

with a¨t = ∂ 2 a(zt )/∂zt2 .

An advantage of our framework is that we can draw upon results for generalized hidden Markov models. that

˜t h

Consider the case

p = q = 1:

From Carrasco and Chen (2002, Proposition 2) it follows

has a stationary representation provided that

32

π = β + ϕγ ∈ (−1, 1),

and if we assign

˜0 h

its invariant distribution, then

˜ t |s < ∞ E|h

if

˜t h

is strictly stationary and

β -mixing

with exponential decay, and

s

E|τ (zt ) + ut | < ∞. Moreover, {(rt , xt ), t ≥ 0} is a generalized hidden Markov model, ˜ t , t ≥ 0}, and so by Carrasco and Chen (2002, Proposition 4) it follows that {h

with hidden chain also

{(rt , xt )}

is stationary

β -mixing

with exponentially decay rate.

The robustness of the QMLE as dened by the Gaussian likelihood is, in part, reected by the weak assumptions that make the score a martingale dierence sequence.

These are stated in the

following Proposition.

Proposition 24. (i) Suppose that E(ut |zt , Ft−1 ) = 0, E(zt2 |Ft−1 ) = 1, and E(u2t |Ft−1 ) = σu2 . Then

is a martingale dierence sequence. (ii) Suppose, in addition, that {(rt , xt , h˜ t )} is stationary and ergodic. Then

st (θ) =

∂`t (θ) ∂θ

n

1 X ∂`t d √ → N (0, Jθ ) n t=1 ∂θ

and

n



1 X ∂ 2 `t p → Iθ , n t=1 ∂θ∂θ0

provided that 

1 4 E(1



2ut ˙ t )2 E 2 u  σu

h˙ t h˙ 0t



− σ12 E u˙ t mt h˙ 0t

 Jθ =  

and

− zt2 + u

−E(u3t )E(u˙ t ) E(h˙ 0t ) 6 2σu

o E(u˙ 2 ) + σ2t E(h˙ t h˙ 0t ) n u o  1 ˙0 Iθ =  − E ( u ˙ m + u b ) h 2 t t t t t  σu 0 n









1 0 2 E(mt mt ) σu 3 E(ut ) 0 6 E(mt ) 2σu



 , 

2 E(u2t /σu −1)2 4 4σu



0



1 0 2 E(mt mt ) σu

0

0

1 4 2σu

 , 

1 2

are nite. t ∂`t Note that in the stationary case we have Jθ = E ∂` ∂θ ∂θ 0 , so a necessary condition for |Jθ | < ∞ is that zt and ut have nite forth moments. Additional moments may be required for zt , depending on the complexity of the leverage function τ (z), because u˙ t depends on τ (zt ).



Theorem 25. Under suitable regularity conditions, we have the asymptotic result.   √  n θˆn − θ → N 0, Iθ−1 Jθ Iθ−1 . It is worth noting that the estimator of the parameters in the GARCH equation, the measurement equation,

ψ,

λ,

and those of

are not asymptotically independent. This asymptotic correlation is

induced by the leverage function in our model, and the fact that we link the realized measure, to

ht

xt ,

with a measurement equation.

In the context of ARCH and GARCH models, it has been shown that the QMLE estimator is consistent with an Gaussian limit distribution regardless of the process being stationary or nonstationary.

The latter was established in Jensen and Rahbek (2004b,a).

So unlike the case for

autoregressive processes, we do not have a discontinuity of the limit distribution at the knife-edge in the parameter space that separates stationary and non-stationary processes. This is an important result for empirical applications, because the point estimates are typically found to be very close to the boundary.

33

Notice that the martingale dierence result for the score, Proposition 24(i), does not rely on stationarity. So it is reasonable to conjecture that the central limit theorem for martingale dierence processes is applicable to the score even if the process is non-stationary. While standard errors for

θˆ

may be compute from numerical derivatives, these can also be

computed directly using the following expressions

n



1X 0 Jˆ = sˆt sˆt , n t=1

sˆt =

where

1 2 (1



zˆt2

2ˆ ut ˆ ˆ˙ 0 ut 0 σ ˆ2 − u ˆ2 ˆ t, u 4 t ) + 2u ˙ t )ht , − 2 m σ ˆu σu 2ˆ σu

0 ,

and

n ˆ ˆ h˙ t h˙ 0t + 21 1 − zˆt2 +   ˆ ˆ˙t m −ˆ σu−2 u ˆt +u ˆtˆbt h˙ 0t ˆ ˆ − uˆσˆt u4˙ t h˙ 0t u n o n ˆ ˆ˙2 +ˆ ¨t ) ˆ 2(u ˆ˙ 0 1 1 2 t ut u ˙ th z ˆ + h 1 − zˆt2 + + 2 t 2 σ ˆu  t 2  ˆ ˆ˙t m ˆt +u ˆtˆbt h˙ 0t −ˆ σu−2 u ˆ ˆ − uˆσˆt u4˙ t h˙ 0t

 Iˆ

=

1 2

n 1 X   n t=1 

 n

=

1 X   n t=1 

n zˆt2 +

ˆ˙2 +ˆ ˆ 2(u ¨t ) t ut u 2 σ ˆu

o

ˆ˙t 2ˆ ut u 2 σ ˆu

o

ˆ¨ h t



1 ˆ tm ˆ 0t 2 m σ ˆu



− σˆuˆ4t m ˆ 0t

2 u2t −ˆ σu 1 2ˆ 6 2 σ ˆu

u

ˆ˙t 2ˆ ut u 2 σ ˆu

o

ˆ¨ h t

• − σˆ12 m ˆ tm ˆ 0t u

0

u

Pn

where the zero follows from the rst order condition: conditions for

λ

ˆ˙ ˆ u ˆ u ˙0 implies that − t 4 t h σ ˆu

=

For our baseline leverage function,

t=1

1−ˆ zt2 ˆ ˙ 2 ht . 2ˆ σu

τ1 zt + τ2 (zt2 − 1),

   1 0      log ht   1     mt =   , bt =  , 1     zt    − 2 zt  zt2 − 1 −zt2





u ˆt m ˆ 0t = 0.



   



  • , 

1 4 σ ˆu

Moreover, the rst-order

we have



1 u˙ t = −ϕ + τ1 zt + τ2 zt2 , 2

Decomposition of likelihood function σ({rt , xt , ht }, t ≤ 0))

1 u ¨t = − τ1 zt − τ2 zt2 . 4

The log-likelihood function is (conditionally on

F0 =

given by

log L({rt , xt }nt=1 ; θ) =

n X

log f (rt , xt |Ft−1 ).

t=1 Standard GARCH models do not model be compared to those of the density for

(rt , xt )

xt ,

so the log-likelihood we obtain for these models cannot

Realized GARCH

model. However, we can factorize the joint conditional

by

f (rt , xt |Ft−1 ) = f (rt |Ft−1 )f (xt |rt , Ft−1 ), and compare the partial log-likelihood,

`(r) :=

n P

log f (rt |Ft−1 ),

t=1 model. Specically for the Gaussian specication for

34

zt

and

with that of a standard GARCH

ut , we split the joint likelihood,

into the

sum

n

`(r, x)

n

1X 1X [log(2π) + log(ht ) + rt2 /ht ] + − [log(2π) + log(σu2 ) + u2t /σu2 ]. 2 t=1 2 t=1 {z } | {z } | −

=

=`(r)

=`(x|r)

Asymmetries in the leverage function are summarized by the following two statistics,

ρ− = corr{τ (zt ) + ut , zt |zt < 0}

ρ+ = corr{τ (zt ) + ut , zt |zt > 0}.

and

These capture the slope of a piecewise linear news impact curve for negative and positive returns, such as that implied by the EGARCH model.

Multiperiod forecasts

One of the main advantages of having a complete specication, i.e., a

xt

model that fully describes the dynamic properties of

is that it multi-period ahead forecasting is

feasible. In contrast, the GARCH-X model can only be used to make one-step ahead predictions. Multi-period ahead predictions are not possible without a model for dictions with the

Realized GARCH

model is straightforward. We let

xt . Multi-period ahead pre˜ ht denote either ht or log ht ,

such that the results presented in this section apply to both the linear and log-linear variants of the

Realized GARCH

model.

Consider rst the case where

p = q = 1. By substituting the GARCH equation into measurement

equation we obtain the VARMA(1,1) structure

"

˜t h

#

x ˜t

" =

β

γ

ϕβ

ϕγ

#"

˜ t−1 h x ˜t−1

#

" +

#

ω ξ + ϕω

" +

τ (zt ) + ut

that can be used to generate the predictive distribution of future values of

rt , "

#

0

˜ t, x h ˜t ,

,

as well as returns

using

˜ t+h h x ˜t+h

#

" =

β ϕβ

γ ϕγ

#h "

˜t h x ˜t

# +

h−1 X j=0

"

β ϕβ

This is easily extended to the general case (p, q

γ ϕγ ≥ 1)

#j ("

#

ω ξ + ϕω

" +

0 τ (zt+h−j ) + ut+h−j

where we have

Yt = AYt−1 + b + t , with the conventions



˜t h

 .  . .   ˜  ht−p+1 Yt =   x ˜t   .  . .  x ˜t−q+1

      ,     



(β1 , . . . , βp )

(γ1 , . . . , γq )

  (Ip−1×p−1 , 0p−1×1 ) 0p−1×q A=  ϕ(β1 , . . . , βp ) ϕ(γ1 , . . . , γq )  0q−1×p (Iq−1×q−1 , 0q−1×1 )

35

   ,  

#) .





ω

   ,  

  0p−1×1 b=  ξ + ϕω  0q−1×1

0p×1



  t =  τ (zt ) + ut  , 0q×1

so that

= Ah Yt +

Yt+h

h−1 X

Aj (b + t+h−j ).

j=0

The predictive distribution for

˜ t+h h

x ˜t+h ,

and/or

is given from the distribution of

which also enable us to compute a predictive distribution for

rt+h ,

Ph−1 i=0

Ai t+h−i ,

and cumulative returns

rt+1 +

· · · + rt+h .

Induced properties of cumulative returns if and only if the studentized return, identity

rt =



ht zt ,

zt ,

The skewness for single period returns is non-zero,

has non-zero skewness.

and the assumption that

zt ⊥⊥ht ,

This follows directly from the

that shows that,

n o d/2 d/2 d/2 E(rtd ) = E(ht ztd ) = E E(ht ztd |Ft−1 ) = E(ht )E(ztd ), and in particular that

3/2

E(rt3 ) = E(ht )E(zt3 ).

So a symmetric distribution for

zt

implies that

zero skewness, and this is property that is shared by standard GARCH model and

rt

has

Realized GARCH

model alike. For the skewness and kurtosis of cumulative returns,

rt +· · ·+rt+k , the situation is very dierent,

because the leverage function induces skewness.

Proposition 26. Consider RealGARCH(1,1) model and dene π = β + ϕγ and µ = ω + ϕξ, so that

where

log ht = π log ht−1 + µ + γwt−1 ,

wt = τ1 zt + τ2 (zt2 − 1) + ut , √

with zt ∼ iid N (0, 1) and ut ∼ iid N (0, σu2 ). The kurtosis of the return rt = ht zt is given by ∞ Y E(rt4 ) 1 − 2π i γτ2 p = 3 E(rt2 )2 1 − 4π i γτ2 i=0

!

( exp

∞ X i=0

π 2i γ 2 τ12 1 − 6π i γτ2 + 8π 2i γ 2 τ22

)

 exp

γ 2 σu2 1 − π2

 .

(2.27)

There does not appear to be a way to further simplify the expression (2.27), however when

γτ2

is small, as we found it to be empirically, we have the approximation (see the appendix for details)

E(rt4 ) ' 3 exp E(rt2 )2

Simple extension to multivariate



γ 2 τ22 γ 2 (τ12 + σu2 ) + − log π 1 − π2

 .

(2.28)

The simplest extension to multivariate case is via the DCC(1,1)-

GARCH framework, by replacing the conditional variance equation (2.19) by

2 Dt2 = diag (ωi ) + diag (βi ) ⊗ Dt−1 + diag (γi ) ⊗ Xt−1

36

Dt2

consists of log h1,t , . . . , log hN,t , and 2 is related to Dt by the usual measurement equation (cf. (2.24))

where the diagonal matrix

Xt

of

log x1,t , . . . , log xN,t ,

and

log xi,t = ξi + ϕi log hi,t + τ (zi,t ) + ui,t . Essentially, the proposed extension replaces the rst stage of DCC-GARCH by RealGarch(1,1), the output of which is then fed into the second stage to estimate the conditional correlation given by

where

  Q = E t−1 > t−1

Rt

=

Qt

=

−1/2

diag (Qt )



−1/2

Qt diag (Qt )

 e et−1 > 1−α e − βe Q + α t−1 + βQt−1 ,

is the unconditional correlation matrix of the standardized residual. The

key assumption for this extension is that we conjecture that correlation varies at a much slower time scale than does the volatility for each asset. estimations based on

Realized GARCH,

This enables us to simply plug-in the variance

normalize our return series, than estimate the correlation

component exactly as before.

2.2.2 Empirical Analysis Data

Our sample spans the period from 2002-01-01 to 2008-08-31, which we divide into an in-

sample period: 2002-01-01 to 2007-12-31; leaving the eight months, 2008-01-02 and 2008-08-31, for out-of-sample analysis.

We adopt the realized kernel as the realized measure,

xt ,

using a Parzen

kernel function. This estimator is similar to the well known realized variance, but is robust to market microstructure noise and is a more accurate estimator of the quadratic variation. Our implementation of the realized kernel follows Barndor-Nielsen, Hansen, Lunde, and Shephard (2008a) that guarantees a positive estimate, which is important for our log-linear specication. The exact computation is explained in great details in Barndor-Nielsen, Hansen, Lunde, and Shephard (2009). When we estimate a

Realized GARCH

model using open-to-close returns we should expect

whereas with close-to-close returns we should expect

xt

to be smaller than

ht

xt ≈ ht ,

on average.

To avoid outliers that would result from half trading days, we removed days where high-frequency data spanned less than 90% of the ocial 6.5 hours between 9:30am and 4:00pm.

This removes

about three daily observation per year, such as the day after Thanksgiving and days around Christmas.

When we estimate a model that involves

min{s0} log rt2

Result

for

log rt2 ,

we deal with zero returns by substituting

log 0.

In this section we present empirical results using returns and realized measures for 28 stocks

and and exchange-traded index fund, SPY, that tracks the S&P 500 index. We adopt the realized kernel, introduced by Barndor-Nielsen et al. (2008b), as the realized measure,

xt .

We estimate the

realized GARCH models using both open-to-close returns and close-to-close returns. High-frequency prices are only available between open and close, so the population quantity that is estimated by the realized kernel is directly related to the volatility of open-to-close returns, but only captures some of the volatility of close-to-close returns. Next we consider

Realized GARCH

measurement equations.

models with a log-linear specication of the GARCH and

For the purpose of comparison we estimate an LGARCH(1,1) model in

37

Realized GARCH

addition to the six

models. Again we report results for both open-to-close returns

and close-to-close returns for SPY. The results are presented in Table 2. The main results in Table 2 can be summarized by:

.

ARCH parameter,

α,

.

Consistent with the log-linearity we nd

is insignicant.

ϕ ' 1 for both open-to-close and close-to-close returns.

We report empirical results for all 29 assets in Table 3 and nd the point estimates to be remarkable similar across the many time series.

In-sample and out-of-sample likelihood ratio statistics are

computed in Table 4. Table 4 shows the likelihood ratios for both in-sample and out-of-sample period. The statistics are based on our analysis with open-to-close returns. The statistics in Panel A are the conventional likelihood ratio statistics, where each of the ve smaller models are benchmarked against the largest model. The largest model is labeled (2,2). This is the log-linear RealGARCH(2,2) model that has the squared return

rt2

in the GARCH equation in addition to the realized measure. Thus the likelihood

ratio statistics are in Panel A are dened by

 LRi = 2 `RG(2,2)∗ (r, x) − `i (r, x) , where

i represents one of the ve other Realized

distribution of likelihood ratio statistic, Thus comparing the

LRi

LRi ,

GARCH

models. In the QMLE framework the limit

is usually given as a weighted sum of

to the usual critical value of a

χ

2

χ2 -distributions.

-distribution is only indicative of signi-

cance.



Comparing the RealGARCH(2,2) So

to RealGARCH(2,2) leads to small LR statistics in most cases.

α tends to be insignicant in our sample.

This is consistent with the existing literature that nds

that squared returns adds little to the model, once a more accurate realized measure is used in the GARCH equation. The leverage function, the hypothesis that

τ (zt ),

is highly signicant in all cases. The LR statistics associated with

τ1 = τ2 = 0

are well over 100 in all cases. These statistics can be computed



by subtracting the statistics in the column labeled (2,2) from those in the column labeled (2,2) . The joint hypothesis,

β2 = γ 2 = 0

is rejected in most cases, and so the empirical evidence does not

support a simplication of the model to the RealGARCH(1,1). The results for the two hypotheses

β2 = 0

and

γ2 = 0

are, on average,

5.7,

are less conclusive.

β2 = 0

which would be borderline signicant when compared to conventional critical

2 values from a χ(1) -distribution. and are on average

The likelihood ratio statistics for the hypothesis,

16.6.

The LR statistics for the hypothesis,

γ2 = 0,

tend to be larger

So the empirical evidence favors the RealGARCH(1,2) model over the

RealGARCH(2,1) model. Consider next the out-of-sample statistics in Panel B. These

likelihood ratio

(LR) statistics are

computed as

r

where

n

and

m

n {`RG(2,2) (r, x) − `j (r, x)}, m

denote the sample sizes, in-sample and out-of-sample, respectively. The in-sample

parameter estimates are simply plugged into the out-of-sample log-likelihood, and the asymptotic distribution of these statistics are non-standard because the in-sample estimates do not solve the rst-order conditions out-of-sample, see Hansen (2009). The RealGARCH(2,2) model nests, or is

38

39

-1752.7

.988

-

-0.18 1.04 0.38 -0.07 0.07

-0.18

1.04

0.38

-0.07

0.07

-0.32 0.12

-0.33

0.12 -1710.3

-0.18

-0.18

-1712.0

.986

.975

-2388.8

-0.18

-

-2395.6

0.45

-1711.4

0.13

-0.32

-0.16

.976

-2391.9

0.07

-0.07

0.38

1.04

-0.18

-

0.43

0.13

0.40

-

0.06

0.70

-

0.96

0.04

RG(2,1)

RG(2,2)†

RG(2,2)∗

G(1,1)

RG(1,1)

-1712.3

0.13

-0.35

-0.19

.999

-2385.1

0.07

-0.07

0.38

0.96

-0.23

-0.44

0.46

-0.44

1.43

-

0.00

-2382.9

0.07

-0.07

0.38

1.03

-0.18

-0.41

0.42

-0.46

1.45

0.00

0.00

0.96

0.03

0.05

-2576.86

0.04

-0.11

0.39

0.99

-0.42

-

0.43

-

0.54

-

0.18

-1708.9

-

-

-

.999

-1709.6

0.14

-0.35

-0.16

.999

-1938.24

.988

-1876.51

-0.01

-0.31

-0.27

.974

Panel B: Auxiliary Statistics (in-sample)

-2495.7

-

-

0.41

1.07

-0.16

-0.38

0.40

-0.44

1.42

-

0.00

-1875.46

-0.03

-0.29

-0.25

.987

-2567.15

0.04

-0.11

0.38

1.00

-0.42

-0.21

0.48

-

0.72

-

0.11

-1876.12

-0.03

-0.28

-0.25

.975

-2571.67

0.04

-0.11

0.39

0.99

-0.42

-

0.46

0.15

0.37

-

0.19

RG(2,1)

-1875.39

0.01

-0.29

-0.25

.999

-2564.16

0.04

-0.11

0.38

1.02

-0.42

-0.42

0.45

-0.41

1.38

-

0.01

RG(2,2)

Close-to-Close Returns

RG(1,2)

Panel A: Point Estimates and Log-Likelihood (in-sample)

RG(2,2)

Open-to-Close Returns

RG(1,2)

0.41

-

0.55

0.03

0.06

RG(1,1)

-1874.88

-

-

-

.999

-2661.73

-

-

0.41

1.03

-0.41

-0.40

0.42

-0.43

1.40

-

0.01

RG(2,2)†

RG(2,2)

RG(2,2)





is the (2,2) extended to include the ARCH-term

denotes the RealGARCH(2,2) model without the

τ (z) function that captures the dependence 2 α log rt−1 . The latter being insignicant.

between returns and innovations in volatility.

Table 2: Results for the logarithmic specication: G(1,1) denotes the LGARCH(1,1) model that does not utilize a realized measure of volatility.

π ρ ρ− ρ+ ` (r)

α β1 β2 γ1 γ2 ξ ϕ σu τ1 τ2 ` (r, x)

0.04

ω

(0.00)

G(1,1)

Model

-1876.13

-0.05

-0.33

-0.28

.999

-2563.53

0.04

-0.11

0.38

0.99

-0.42

-0.42

0.46

-0.39

1.35

0.00

0.02

RG(2,2)∗

AA AIG AXP BA BAC C CAT CVX DD DIS GE GM HD IBM INTC JNJ JPM KO MCD MMM MRK MSFT PG T UTX VZ WMT XOM SPY Average

os

` (r)

os

` (r, x)

-

-493.8 -535.8 -477.8 -440.2 -517.4 -516.0 -394.0 -377.7 -394.1 -364.8 -401.7 -499.5 -452.0 -360.8 -424.4 -277.6 -496.7 -360.3 -386.3 -331.4 -437.1 -402.9 -315.2 -397.3 -372.0 -404.9 -353.7 -364.9 -338.4

ρ+

-

-409.1 -438.0 -390.8 -335.8 -416.9 -427.6 -315.6 -306.1 -307.9 -288.7 -300.8 -384.7 -378.2 -284.7 -345.2 -192.9 -402.7 -265.0 -290.7 -261.7 -314.0 -317.7 -239.8 -313.1 -289.5 -320.8 -275.9 -299.0 -260.0

ρ−

0.22

0.24 0.08 0.25 0.26 0.21 0.24 0.27 0.14 0.20 0.22 0.25 0.31 0.20 0.24 0.22 0.30 0.22 0.19 0.26 0.21 0.18 0.24 0.14 0.25 0.29 0.23 0.30 0.15 0.13

ρ

-0.30

-0.32 -0.17 -0.30 -0.36 -0.31 -0.31 -0.32 -0.35 -0.35 -0.35 -0.26 -0.33 -0.37 -0.32 -0.24 -0.25 -0.30 -0.28 -0.35 -0.23 -0.13 -0.31 -0.32 -0.32 -0.34 -0.31 -0.29 -0.37 -0.32

π

-0.08

-0.08 -0.06 -0.05 -0.09 -0.09 -0.07 -0.08 -0.19 -0.13 -0.09 -0.02 -0.01 -0.13 -0.09 -0.05 0.04 -0.10 -0.06 -0.09 -0.04 0.04 -0.08 -0.14 -0.07 -0.05 -0.08 -0.02 -0.20 -0.17

` (r, x)

0.99

0.98 0.98 0.99 0.99 0.99 0.99 0.99 0.97 0.98 1.00 0.99 0.99 0.99 0.98 1.00 0.99 0.99 0.99 0.99 0.97 0.98 0.99 0.98 0.99 0.99 0.99 0.99 0.98 0.99 -2994.1

-2893.6

-3196.7

-3059.2

-3512.7

-2646.6

-3021.1

-3478.5

-2944.8

-3371.9

-2573.6

-3276.8

-2777.3

-3481.1

-2896.7

-3318.4

-3967.3

-2988.7

-3289.6

-3067.3

-3021.8

-3279.4

-2974.0

-2823.4

-3260.0

-3217.9

-3317.2

-3519.9

-

-2388.8

` (r)

-

-2776.4 -2403.1 -2371.1 -2536.0 -2016.9 -2260.5 -2621.1 -2319.1 -2301.2 -2518.5 -2197.8 -2987.9 -2538.4 -2192.6 -2869.1 -1874.8 -2463.0 -1886.7 -2461.8 -2140.3 -2479.2 -2330.7 -1850.7 -2560.4 -2302.4 -2343.4 -2164.9 -2334.7 -1710.3

τ2

0.09

0.09 0.04 0.10 0.09 0.08 0.09 0.09 0.08 0.08 0.09 0.08 0.12 0.09 0.08 0.07 0.10 0.09 0.07 0.11 0.07 0.07 0.08 0.08 0.10 0.10 0.09 0.09 0.08 0.07

τ1

-0.03

-0.04 -0.02 -0.02 -0.03 -0.04 -0.03 -0.03 -0.08 -0.05 -0.04 -0.01 -0.01 -0.05 -0.04 -0.02 0.02 -0.04 -0.02 -0.05 -0.02 0.01 -0.03 -0.05 -0.03 -0.01 -0.03 -0.01 -0.08 -0.07

σu

0.41

0.40 0.45 0.43 0.39 0.42 0.39 0.38 0.39 0.40 0.41 0.41 0.47 0.41 0.39 0.36 0.44 0.42 0.38 0.45 0.41 0.47 0.38 0.41 0.46 0.40 0.43 0.39 0.38 0.38

ϕ

1.04

1.15 1.02 1.08 1.22 0.99 0.99 1.07 1.32 1.08 1.10 0.98 1.02 1.01 0.94 1.03 1.04 0.98 0.93 0.98 0.98 1.23 0.92 1.04 0.86 0.88 1.01 1.04 1.26 1.04

ξ

-0.02

-0.07 -0.06 -0.16 -0.13 0.00 0.09 -0.14 -0.09 0.08 -0.05 0.01 -0.32 0.00 0.01 -0.11 0.13 -0.02 0.19 -0.01 0.02 -0.19 0.08 0.18 0.01 0.06 0.07 0.12 -0.10 -0.18

γ2

-0.21

-0.14 -0.21 -0.12 -0.17 -0.29 -0.19 -0.22 -0.14 -0.17 -0.25 -0.19 -0.24 -0.20 -0.15 -0.33 -0.19 -0.30 -0.21 -0.25 -0.23 -0.21 -0.22 -0.25 -0.38 -0.24 -0.20 -0.19 -0.12 -0.18

γ1

0.41

0.33 0.45 0.38 0.31 0.51 0.45 0.37 0.33 0.37 0.39 0.38 0.39 0.39 0.41 0.46 0.38 0.49 0.45 0.37 0.43 0.33 0.44 0.43 0.53 0.45 0.40 0.37 0.34 0.45

β

0.79

0.77 0.74 0.70 0.82 0.78 0.74 0.82 0.71 0.77 0.85 0.81 0.84 0.79 0.74 0.87 0.80 0.81 0.76 0.88 0.77 0.84 0.79 0.78 0.86 0.80 0.79 0.80 0.71 0.70

ω

0.01

0.03 0.02 0.05 0.02 0.00 -0.02 0.03 0.03 -0.01 0.01 0.00 0.06 0.01 0.00 0.02 -0.03 0.01 -0.05 0.00 0.00 0.03 -0.01 -0.04 0.00 -0.01 -0.01 -0.02 0.03 0.04

Table 3: Estimates for the RealGARCH(1, 2) model.

40

nested in, all other models. For nested and correctly specied models where the larger model has

k

additional parameters that are all zero (under the null hypothesis) the out-of-sample likelihood

ratio statistic is asymptotically distributed as

r

where

Z1

and

n d {`i (r, x) − `j (r, x)} → Z10 Z2 , m

Z2

are independent

as m, n → ∞ with m/n → 0,

Zi ∼ Nk (0, I).

This follows from, for instance, Hansen (2009,

corollary 2), and the (two-sided) critical values can be inferred from the distribution of

k=1

|Z10 Z2 |.

the 5% and 1% critical values are 2.25 and 3.67, respectively, and for two degrees of freedom

(k = 2),

these are 3.05 and 4.83, respectively.

When compared to these critical values we nd,

on average, signicant evidence in favor of a model with more lags than RealGARCH(1,1). statistical evidence in favor of a leverage function is very strong.

α,

For

The

Adding the ARCH parameter,

will (on average) result in a worse out-of-sample log-likelihood. As for the choice between the

RealGARCH(1,2), RealGARCH(2,1), and RealGARCH(2,2) the evidence is mixed. In Panel C, we report partial likelihood ratio statistics, that are dened by

`j (r|x)},

2{maxi `i (r|x) −

so each model is compared with the model that had the best out-of-sample t in term of

Realized GARCH models with the standard GARCH(1,1) model, and we see that the Realized GARCH models also dominate the standard GARCH model in this metric. This is made more impressive by the fact that Realized GARCH models are maximizing the joint likelihood, and not the partial likelihood that is used in the partial likelihood. These statistics facilitate a comparison of the

these comparisons.

8

The leverage function,

τ (z)

is closely related to the

news impact curve

that was introduced by

Engle and Ng (1993). High frequency data enable are more more detailed study of the news impact curve than is possible with daily returns. A detailed study of the news impact curve that utilizes high frequency data is Ghysels and Chen (2010). Their approach is very dierent from ours, yet the shape of the news impact curve they estimate is very similar to ours. The news impact curve shows how volatility is impacted by a shock to the price, and our Hermite specication for the leverage function presents a very exible framework for estimating the news impact curve. In the log-linear specication we dene the new impact curve by

ν(z) = E(log ht+1 |zt = z) − E(log ht+1 ), so that

100ν(z)

measures the percentage impact on volatility as a function of return-shock mea-

sures in units of standard deviations. As shown in Section 2.2.1 we have

ν(z) = γ1 τ (z).

We have

estimated the log-linear RealGARCH(1,2) model for both IBM and SPY using a exible leverage function based on the rst four Hermite polynomials.

(−0.036, 0.090, 0.001, −0.003)

for IBM and

The point estimates were

(ˆ τ1 , τˆ2 , τˆ3 , τˆ4 ) =

(ˆ τ1 , τˆ2 , τˆ3 , τˆ4 ) = (−0.068, 0.081, 0.014, 0.002)

for SPY.

Note that the Hermite polynomials of orders three and four add little beyond the rst two polynomials. The news impact curves implied by these estimates are presented in Figure 2.3. The fact that

ν(z)

is smaller than zero for some (small) values of

z

is an implication of its denition that implies,

E[ν(z)] = 0. 8 There is not a well developed theory for the asymptotic distribution of these statistics, in part because we are comparing a model that maximizes the partial likelihood (the GARCH(1,1) model) with models that maximizes the joint likelihood (the Realized GARCH models).

41

Panel A: In-Sample Likelihood Ratio

Panel B: Out-of-Sample Likelihood Ratio

Panel C: Out-of-Sample Partial Likelihood Ratio

0.7 7.2 1.4 0.0 0.9 0.4 1.5 0.3 0.0 0.8 0.7 0.9 0.4 0.0 1.8 0.0 1.6 0.5 1.7 1.4 1.8 0.1 1.7 0.0 0.1 0.1 2.6 0.5 1.3

(2,2)∗

0.6 0.0 1.7 0.3 19.7 0.0 1.6 0.0 1.4 1.1 0.3 0.3 2.1 0.3 0.1 0.1 2.0 0.9 2.0 1.0 2.4 0.2 1.5 1.7 0.0 0.0 1.4 0.5 2.5

1.0

(2,2)†

0.0 6.0 1.3 0.0 1.5 1.1 1.0 0.3 1.2 0.8 0.0 0.0 0.2 0.1 0.0 0.1 1.6 0.5 1.8 1.4 1.9 0.0 1.3 0.0 0.2 0.3 1.3 0.5 0.0

1.6

(2,2)

2.6 7.4 0.1 1.7 0.0 0.5 0.0 0.2 1.5 1.4 0.6 0.2 0.1 0.7 2.0 0.1 0.1 0.0 0.2 0.9 1.7 0.4 0.6 2.6 0.9 2.8 0.0 0.3 0.7

0.8

(2,1)

1.4 5.9 0.3 1.8 0.6 0.9 0.8 0.3 1.5 2.0 0.0 0.0 0.0 0.5 0.5 0.0 1.3 0.6 1.5 1.9 1.6 0.5 1.4 0.2 0.2 0.7 0.6 0.5 0.6

1.0

(1,2)

3.3 9.3 0.0 0.6 4.1 0.3 0.1 0.6 1.2 0.0 0.9 1.4 0.9 0.4 1.7 0.4 0.0 0.4 0.0 0.0 0.0 0.1 0.0 2.8 1.4 3.5 0.0 0.0 0.8

1.0

(1,1)

4.6 56.1 24.0 1.1 147.9 26.9 47.3 30.1 19.2 35.4 41.6 57.5 45.5 12.0 133.1 21.8 34.9 5.6 6.8 14.5 11.8 37.1 24.8 19.2 12.3 8.7 30.5 21.6 40.8

1.2

G11

0.1 12.5 0.2 0.0 -2.8 -7.2 2.2 -0.1 6.9 0.0 0.6 14.6 0.5 -0.2 19.1 0.2 -0.6 0.0 1.4 -0.1 0.4 4.1 1.0 -0.2 -0.4 -1.3 9.4 0.6 1.4

33.5

(2,2)∗

21.9 25.5 13.3 24.5 56.5 -0.1 9.0 20.8 -7.1 24.3 9.3 7.2 28.9 25.0 36.8 30.7 -3.6 22.6 47.4 24.5 32.4 17.0 35.0 35.1 33.1 16.6 28.7 27.9 24.4

2.1

(2,2)†

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23.0

(2,2)

6.4 6.0 1.5 -2.6 -0.3 -2.5 -4.2 2.2 3.0 3.7 5.9 6.8 -1.0 1.0 -3.4 2.0 -5.4 -1.0 -9.3 -2.8 -1.2 7.1 -0.4 13.5 1.2 11.1 -7.9 1.0 1.9 0

(2,1)

4.5 -0.2 2.7 0.4 -0.7 -2.7 -1.0 0.0 4.1 4.8 0.6 -0.2 -1.0 -0.1 0.1 -0.8 -2.5 -2.8 -0.6 -0.2 -2.5 4.2 0.3 -0.3 -2.0 3.2 -3.8 -0.1 -1.2 1.1

(1,2)

6.9 15.7 1.2 -1.4 8.0 1.3 1.7 5.3 4.6 4.8 14.1 17.0 2.2 3.3 0.9 7.1 0.3 2.1 -13.0 -2.4 -3.4 8.3 -1.3 19.7 5.8 15.0 -7.2 5.7 6.3 0.1

(1,1)

0

4.4

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

187.4 201.2 197.0 197.1 198.8 228.4 227.3 194.6 167.0 213.7 200.3 319.9 201.3 176.7 130.8 235.4 213.5 186.1 278.6 183.5 309.0 186.1 160.4 198.3 223.7 188.7 190.0 234.0 225.6 0

(2,2)∗

3.9 14.2 0.0 0.0 5.1 6.3 1.6 0.0 6.1 0.0 12.2 18.4 2.0 1.0 9.7 0.1 1.9 0.0 0.8 0.1 2.9 10.2 2.9 0.0 0.0 0.5 8.3 0.9 4.2 208.8

(2,2)†

11.7 20.6 26.0 18.8 20.5 16.0 13.8 3.5 15.9 33.1 20.2 39.9 14.4 20.2 47.8 4.8 23.6 28.3 26.1 20.4 10.6 20.5 12.3 39.0 29.0 13.2 23.0 4.3 17.9 3.9

(2,2)

8.3 15.2 25.0 6.0 5.9 9.5 2.3 0.1 10.8 12.6 12.6 18.8 4.0 16.2 13.9 1.3 3.5 22.0 2.9 6.7 12.0 11.5 3.0 4.8 21.6 4.3 12.0 1.1 11.6 20.5

(2,1)

24.0 43.8 30.5 34.8 46.3 26.8 39.8 17.7 31.1 57.7 37.7 62.7 29.8 28.5 75.9 34.6 50.7 39.7 54.8 32.0 64.6 37.3 36.5 69.8 39.6 31.5 36.2 14.7 25.3 9.6

(1,2)

AA AIG AXP BA BAC C CAT CVX DD DIS GE GM HD IBM INTC JNJ JPM KO MCD MMM MRK MSFT PG T UTX VZ WMT XOM SPY

39.8

(1,1)

Average

Table 4: In-Sample and Out-of-Sample Likelihood Ratio Statistics

42

Figure 2.3: News impact curve for IBM and SPY

The estimated news impact curve for IBM is more symmetric about zero than that of SPY, and this empirical result is fully consistent with the existing literature. The most common approach to model the news impact curve is to adopt a specication with a discontinuity at zero, such as that used in the EGARCH model by Nelson (1991),

τ (z) = τ1 z + τ+ (|z| − E|z|). We also estimated the leverage

functions with the piecewise linear function that leads to similar empirical results. Specically, the implied news impact curves have the most pronounced asymmetry for the index fund, SPY, and the two oil related stocks, CVX and XOM. However, the likelihood function tends to be larger with the polynomial leverage function,

τ (z) = τ1 z + τ2 (z 2 − 1),

and the polynomial specication simplies

aspects of the likelihood analysis. For the multivariate case, an application of the proposed DCC(1,1)-RealGARCH(1,1) extension to the univariate

Realized GARCH

model is illustrated by using four stocks, IBM, XOM, SPY

and WMT, with tted result shown in Table 5.

Note that this simple extension is a proof of

concept of how the RealGARCH framework can be extended to deal with covariance estimation, as a straightforward plug-in to an existing establish framework. The assumption that volatility changes more frequently than the underlying correlation of stocks is justiable for typical portfolio holding horizon measured in days or longer. Here we can draw on the result of empirical analysis for the univariate case and expect similar predictive gain for this proposed extension. More sophisticated frameworks would be benecial when we need to deal with cases where both volatility and correlation are highly non-stationary, and change on a comparable time scale. This is an area for future research.

43

h0 ω β γ ξ ϕ τ1 τ2 σu

IBM

XOM

SPY

0.76

1.17

0.28

WMT 2.39

-0.00

0.04

0.06

-0.03

0.64

0.59

0.55

0.66

0.36

0.30

0.41

0.30

0.01

-0.10

-0.18

0.12

0.94

1.27

1.04

1.04

-0.04

-0.08

-0.07

-0.01

0.08

0.08

0.07

0.09

0.39

0.38

0.38

0.40

α e

0.018

βe 0.952

Table 5: Fitted result for DCC(1,1)-RGARCH(1,1)

44

3

Asset Return Estimation and Prediction

mean

In the mean-variance portfolio optimization framework, future asset return.

often refers to some prediction of

Essentially, the framework requires a forecast for the rst two moments of

the joint distribution of an universe of assets under consideration. Of course, in a jointly Gaussian setting, these are the only two moments needed for deriving some objective value based on a suitably chosen utility function. As it is now commonly acknowledged, asset returns are far some Gaussian and often exhibit characteristic tail and asymmetric behavior that are distinctly dierent to a simple Gaussian distribution. Whether it is essential to take these added complexity into account depends on the problem at hand and on the reliability of whether these stylized facts can be adequately estimated, and important signals extracted, from the underlying inherently noisy observations. Given the signicant amount of improvement in protability from being able to increase just a small fraction of the investor's edge over a random estimation, a lot of eort have been spent on modeling this rst moment of the asset return dynamics.

3.1

Outline of Some Commonly Used Temporal Models

Random Walk

If we assume that the underlying return process is a symmetric random walk,

then the one step ahead prediction is simply equal to the current return. That is,

rt+1 | Ft = rt + t+1 , where

t

is a

rt ∈ RN

is the vector of returns often dened as the dierence in the logarithm of prices, and

Ft -adapted,

zero-mean,

iid

random variable. This is not a model per se, but a benchmark

that are often used to compare the predictive gain of dierent modeling frameworks.

Auto-Regressive-Moving-Average (ARMA) models

auto-regressive

Jenkins (1976)) combines the ideas of

ARMA model (see for example Box and

(AR) and

moving average

(MA) models into

a compact and more parsimonious form so that the number of parameters used is kept small. Note that GARCH, mentioned in Section (2), can be regarded as an ARMA model. There are a number of variations of this versatile framework.

Example 27.

Simple ARMA

The classic ARMA(p,q) framework is given by

where

Φ (B) = 1 −

Pp

i=0

Φ (B) rt

=

Θ (B) t





N (0, 1) ,

φi B i , Θ (B) = 1 −

Gaussian innovation series. The series,

rt ,

Pq

i i=0 θi B ,

is said to be

B

is the lag operator, and

unit-root stationary

{t }

if the zeros of

is

an iid

Φ (z)

are

outside the unit circle. For non-stationary series, appropriate dierencing needs to be applied rst before we can estimate the parameters of the model (Box and Jenkins, 1976). The lag order (i.e. the value for

p

and

q ) selection can be done iteratively via the Akaike Information Criterion

value of the tted result. Recall that

AIC = 2k − 2 log L, 45

(AIC)

where

k

is the number of parameters in the model and

L

is the likelihood function.

The model

tting algorithm iteratively increase the lag order and select the nal complexity with the lowest AIC.

Example 28.

ARMA with Wavelet Smoothing

Market microstructure noises, such as bid-ask bounce and price quantization, usually have the maximum impact for sampling frequency in the region of 5 to 20 minutes (Andersen et al., 2001c). To eectively lter out these noises, we can rst decompose the raw return series,

rt ,

into dierent

multiresolution analysis (MRA) (Mallat, 1989), followed by reconstitution of the time series signal by using resolutions, or more precisely dierent levels of wavelet details, by an application of wavelet

a subset of wavelet details so that details below a threshold are ltered out. Recall by denition of

discrete wavelet transform

(DWT)

W = Wrt , where

W ∈ RN ×N

tion of the vector vector with

N/2

j

is the orthogonal real-valued matrix dening the DWT. Consider the decomposi-

W

into

J +1

elements

9

sub-vectors, so that

and

rt = W > W =

J

2 = N.

N −1 X

J X

Wn Wn =

Dj , j = 1, . . . , J , Wn

partitioning the rows of discarding the rst

r˜t =

PJ

is the

W

Wj> Wj + VJ> VJ :=

j=1

which denes a MRA of our return series

at a certain scale.

, where

Wj

is a column

So we have

n=0

wavelet detail

>

W = (W1 , . . . , WJ , VJ )

rt ,

J X

Dj + SJ ,

(3.1)

j=1

consists of a constant vector

SJ

and

J

levels of

rt

each of which contains a time series related to variations in

n-th

row of the DWT matrix.

Wj

commensurate with the partitioning of

and

W

Vj

into

matrices are dened by

W1 , . . . , W J

and

VJ .

By

K − 1 levels of detail, we obtain a series that is a locally ltered, smoothed series,

j=K Dj + SJ ,

which retains features with time scales larger than a predetermined threshold.

Finally, a simple ARMA model is tted to this ltered series,

Φ (B) r˜t

=

Φ (B) t





N (0, 1) ,

where again, AIC can be used to select the optimal lag parameters.

Example 29.

ARMA-GARCH

To account for heteroskedasticity of the return series, we can jointly t an ARMA and GARCH model, for example an ARMA(1,1)-GARCH(1,1) framework on the raw return series,

Φ (B) rt

=

Θ (B) t

ht

=

2 α0 + α1 rt−1 + β1 ht−1

t



N (0, ht ) .

rt ,

so that

9 For cases where N is not an integral power of 2, we can simply truncate and discard some part of the data series, or, refer to Mallat (1989) if it is necessary to treat it as a special case.

46

As in Example (27), the algorithm rst uses AIC to determine the optimal lag parameters (p*,q*) for the ARMA framework then, holding the parameter values constant, it ts an ARMA(p*,q*)GARCH(1,1) model by maximizing the likelihood a Gaussian density for the innovation,

0.000

0.002

Simple ARMA

-0.002

Return

{t }.

101640

101650

101660

101670

101680

Index

Return

-0.002

0.000

0.002

Smoothed ARMA

101640

101650

101660

101670

101680

Index

Return

-0.002

0.000

0.002

Smoothed ARMA-GARCH

101640

101650

101660

101670

101680

Index

Figure 3.1: Output of three variations of the ARMA time series framework. Black circles are actual data-points and red dash-lines indicate the next period predictions.

Top panel:

simple ARMA;

Center panel: ARMA with wavelet smoothing; Bottom panel: ARMA-GARCH. Dataset used is the continuously rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5 minute intervals. 15:15:00.

Sample period shown here is between 2008-05-06 13:40:00 and 2008-05-07

At each point, we t a model using the most recent history of 1,000 data points, then

make a 1-period ahead forecast using the tting parameters.

See Figure 3.1 for predictions versus actual output for models given in the three examples above, all based on some variations of the ARMA framework.

Innovations State Space Model

The innovations state space model can be written in terms of

the underlying measurement and transition equations

where

yt

yt

= w (xt−1 ) xt−1 + r (xt−1 ) t

xt

= f (xt−1 ) xt−1 + g (xt−1 ) ⊗ t

denotes the observation at time

t,

and

xt

denotes the state vector containing unobserved

components that describe the level, trend and seasonality of the underlying time series; noise series with zero mean and constant variance

σ2 .

{t } is a white

Hyndman et al. (2000) gives a comprehensive

treatment of frameworks that belong to this class of models and propose a methodology of automatic

47

forecasting that takes into account trend, seasonality and other features of the data. They cover 24 state space models in their framework that include additive and multiplicative specications for the seasonal component and an additional damped specication for the trend component. Model selection is done using the AIC of the tted models.

additive Holt-Winters (see Holt-Winters Additive Model has

The general formulation of innovative state space model also include Holt, 1957 and Winters, 1969) as a specic instance. Recall the the following specication

   l = α (yt − st−m ) + (1 − α) (lt−1 + bt−1 )  t bt = β (lt − lt−1 ) + (1 − β) bt−1    s = γ (y − l − b ) + (1 − γ) s . t

The

t

t−1

t−1

t−m

Level Growth Seasonal

h -step ahead forecast is then given by yˆ t+h|t = lt + bt h + st−m+h+ m

where

h+ m = (h + 1) mod m + 1.

If we dene

t , yˆ t|t−1 − yt ,

then the

Holt-Winters Additive Model

can be expressed in the linear innovation framework as

yt

h

=

where the state vector

xt =



=

lt

xt−1 + t  

1

1

1

1

0

  0 0

1

   0  xt−1 +  αβ  ⊗ t , 1 γ

 xt

i

1

bt

0

st−m

>

α



. This is an example of a linear innovations state

space model. Figure 3.2 compares the 1-period predicted return with the actual realized return for the sample period between 2008-05-06 13:40:00 and 2008-05-07 15:15:00. At each point, we t a model using the most recent history of 1,000 data points, then make a 1-period ahead forecast using the tting parameters.

Neural Network

Neural network is a popular subclass of nonlinear statistical models. See Hastie

et al. (2003) for a comprehensive treatment of this class of models.

feed forward - back propagation

network, also known as

The structural model for a

multilayer percetrons

in the neural network

literature, is given by

F (x) =

M X

 bm S a0m +

m=1

where

S : R → (0, 1)

specifying

F

m X

 ajm xj  + b0

j=1

is known as the activation function, and

n oM M bm {ajm }0

are the parameters 0 (x). Figure 3.3 illustrates a simple feed-forward neural network with four inputs, three

hidden activation units and one output. For illustration purpose, a (15,7,1) neural net is tted to the same dataset, with tan-sigmoid,

S (x) = e

x

,

−e−x/ex +ex as the activation function for the hidden

layer. The model inputs include lagged returns, lagged trading direction (+1 for positive return and -1 for negative return) indicators and logarithm of the ratio of market high to market low prices,

48

0.000 -0.002

Return

0.002

AIC innovations state space

101640

101650

101660

101670

101680

Index

0.000 -0.002

Return

0.002

Additive Holt-Winters

101640

101650

101660

101670

101680

Index Figure 3.2: Output based on AIC general state space model following Hyndman et al. (2000) and the Holt-Winters Additive models. Black circles are actual data-points and red dash-lines indicate the next period predictions. Dataset used is the continuously rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5 minute intervals. Sample period shown here is between 2008-05-06 13:40:00 and 2008-05-07 15:15:00. At each point, we t a model using the most recent history of 1,000 data points, then make a 1-period ahead forecast using the tting parameters.

dened over a look-back period of 1,000 observations, as a proxy for volatility. Figure 3.4 shows the model output.

3.2

Self Excited Counting Process and its Extensions

Until recently, the majority of time series analyses related to nancial data has been carried out using regularly spaced price data (see Section 3.1 for the outline of some commonly used models based on equally spaced data), with the goal of modeling and forecasting key distributional characteristics of future returns, such as expected mean and variance.

These time series data mainly consist of

daily closing prices, where comprehensive data are widely available for a large set of asset classes. With the recent rapid development of high-frequency nance, the focus has shifted to intra-day tick data, which record every transaction during market hours, and come with irregularly spaced time-stamps. We could resample the dataset and apply the same analyses as before, or we could try to explore additional information that the inter-arrival times may convey in terms of likely future trade direction. In order to take into account of these irregular occurrences of transactions properly, we can adopt the framework of a point process. In a doubly stochastic framework (see Bartlett, 1963), both the counting process and the driving intensity are stochastic. A point process is called self-excited if the

49

1

a01

a11

x1

S1

a12 1

x2

b1

a02 b2

S2 x3

1 a 03

a42

x4

1 b 0

+

F(x)

b3

S3

a43

Input layer

hidden layer

output layer

Figure 3.3: Diagrammatic illustration of a feed-forward (4,3,1) neural network.

current intensity of the events is determined by events in the past, see Hawkes (1971). It is widely accepted and observed that volatility of price returns tends to cluster. That is, a period of elevated volatility is likely to be followed by periods of similar levels of volatility. Trade arrivals also exhibit such clustering eect (Engle and Russell, 1995), for example a buy order is likely to be followed closely by another buy order. These orders tend to cluster in time. There has been a growing amount of literature on the application of point process to model inter-arrival trade durations, see for example Engle and Russell (1997), and Bowsher (2003) for a comprehensive survey of the latest modeling frameworks. This section of the thesis extends previous work on the application of self-excited process to model high frequency nancial data. The extension comes in the form of a marked version of the process in order to take into account trade size inuence on the underlying arrival intensity. In addition, by incorporating information from the

limit order book

(LOB), the proposed framework takes into

account a measure of supply-demand imbalance of the market by parametrize the underlying base intensity as a function of this imbalance measure. Given that the main purpose of this section is to outline the methods to incorporate trade size and order-book information in a self-excited point process framework, empirical comparison of the proposed framework to other similar and related models are reserved for future research. For a comprehensive introduction to the theory of point process, see Daley and Vere-Jones (2003) and Bremaud (1980). The following sub-sections give a brief outline of the self-excited point process framework in order to motivate the extensions that follow in Section 3.2.3.

3.2.1 Univariate Case Consider a simple counting process for the number of trade events, of the trades, space

{ti }i∈{0,1,2,...T } ,

(Ω, F, P),

such that

Nt ,

characterized by arrival time

a sequence of strictly position random variable on the probability

t0 = 0

and

0 < ti ≤ ti+1

for

i ≥ 1.

We allow the intensity of the process

be itself stochastic, characterized by the following Stieltjes integral

ˆ h (t − u) dNu

λt = µ + u

A

0

#"

x∗ −λ∗

73

#

" =

−c b

# (4.5)

Note that we have assumed

A

has full row rank, since

m ≤ n,

independent linear equality constraints on a problem with dimensionality of the optimization to

Z ∈ N (A),

i.e.

Z

is in the

null space

n − m. of

A,

Let

then the imposition of

linearly

n variables can be viewed as reducing the  > , i.e. Y is in the range of A , and

Y ∈ R A>

then (4.4) can be simplied to give

X 1 > > > > ∗ x Z HZx log xi min x Z (c + HY x ) + Z −µ Z Z Y 2 xZ ∈Rn−m x ∈x i

where the unique decomposition of the

m

n-vector, x

(4.6)

Z

is given by

x = Y xY + ZxZ . XZ−2 =

Let

diag

0, . . . , 0, x−2 Z



 Z > H + µXZ∗−2 Z  0,

. If

then the unique solution to (4.6),

x∗Z ,

must satisfy the equations

Z > HZx∗Z

µXZ−1 e | {z }

=

−Z > (c + HY x∗Y ) ,

(4.7)

term due to barrier Note that due to the barrier term on the right hand side, the system in (4.7) is clearly nonlinear in

xz ,

so ruling out the possibility of using a simple

conjugate gradient

(CG) method.

Base on a similar analysis as the one above, an IQP can be posed as solving a sequence of unconstrained QP in a reduced space in to



x

from

x,



p = x − x.

so that

Rn−m ,

e ge + Hp Ap where

−v = Ax − b.

Observe that

p

min

s.t.

12

corresponding partition for some

pB ∈ R

m×1

,

,

x=

h

A=

pS ∈ R

h

p

p

denote the step

to give

= A> λ ∗

(4.8)

= −v

(4.9)

itself is the solution of an equality-constrained QP,

p∈Rn

Consider the partition

using a line research method. Let

We can write (4.5) in terms of

xB B

(n−m)×1

e ge> p + 21 p> Hp Ap = −v.

i xS in m basic and n − m superbasic variables, together with i h i S for some B ∈ Rm×m , S ∈ Rm×(n−m) , and p = pB pS , then (4.9) can be written as

BpB = v − SpS 12 e.g.

B.

we could use the pivoting permutation, as part of the QP decomposition, to nd the full rank square matrix

74

and we have

"

pB

#

" =

pS

"

"

B −1

W = Clearly we have the

AQ =

h

I

0

i

13

# ,Z=

n × (n − m)

"

" ,Q

I

pS .

I

#

−B −1 S

#

−B −1 S

v+

0

"

0

pS #

B −1

= − Let

#

−B −1 v − B −1 SpS

−1

=

#

B

S

0

In−m

.

reduced-gradient basis, Z ∈ N (A)

matrix of

and the product

. We have

p = W pB + ZpS , where the values of

pB

and

pS

can be readily determined by the following methods,

. pB :

by (4.9) and (4.10), we have

. pS :

multiply (4.8) by

ZT

(4.10)

Ap = AW pB = −v

which simplies to

pB = −v ;

and using (4.10), we obtain

e S = −Z > ge − Z > HY e pB , Z > HZp which has a unique solution for If

x

is feasible (so that

v = 0),

then

pS

since

(4.11)

e  0. Z > HZ

pB = 0, p ∈ N (A)

and (4.11) becomes

e S = −Z > ge. Z > HZp In general, the reduced-gradient form of

Z

is not formed or stored explicitly. Instead, the search

direction is computed using vectors of the form

"

where

Buξ = −Sξ ,

Similarly,

Z >ζ ,

for some

n-vector ζ =

>

Z ξ=

where

Buζ1 = −Sζ1 ,

h

for some

#

ζ1

−B

−1

ζ2

S

and can be solved using

"

ξ ∈ R(n−m)×1 ,



ξ=

I

and can be solved using

h

Zξ ,

−B −1 S

Zξ =

(4.12)

#

ξ

LU-decomposition

i>

I

such that

B,

of

uξ = −U −1 L−1 Sξ .

i.e.

, can be expressed as

i

"

ζ1

#

ζ2

" =

uζ1

#

ζ2

LU-decomposition

of

B,

i.e.

uζ1 = −U −1 L−1 Sζ1 .

Since these vectors are obtained by solving systems of equations that involve may be represented using only a factorization (such as the

B.

that Ye ∈ N (A) only if Y > Z = 0, e.g. when Q is orthogonal.

75

and

B>,

thus

Z

LU-decomposition ) of the m × m matrix

The proposed algorithm is given in Algorithm 1.

13 Note

B

Algorithm 1 Null Space QP iP ←

[form permutation matrix]

h

via pivoting scheme of the QR decomposition of matrix

Aˇ,

such

ˇ =A e= B e ∈ Rm×m is simply given by the e Se , where the full rank square matrix B that AP e. partition of A partition of x ˇ into m basic and n − m superbasic variables Given: µ0 , σ , max-outer-iteration, max-CG-iteration (0) start with feasible value x0 , which implies pB = 0 and remain zero for all subsequent iterations k←1 while k ≤ max-outer-iteration do e k ← H + µ(k) X −2 H k gek ← c + Hx − µ(k) Xk−1 e (k) > e > based on (4.12), where Z H ek . We solve this linear system with calculate pS k ZpS = −Z g Tuncated CG Algorithm 3 (setting maximum iteration to max-CG-iteration). Note that rather

than forming Z 's explicitly, we use stored LU-decomposition ZpS h i p(k) ← 0 p(k) S

of

B

for the matrix-vector product

[calculate step-length]

if x(k) + p(k) > 0 then α(k) = 1

else

α(k) = 0.9

end if

[new update]

x(k+1) ← x(k) + α(k) p(k) k ←k+1 if kegk+1 k − kegk k > − then µ(k+1) ← σµ(k)

else

µ(k+1) ← µ(k) ;

end if end while

nal result is given by



x(k ) P > ,

where

k∗

is the terminating iteration

76

1.0 0.8 0.6 0.4 0.0

0.2

mu

0

10

20

30

40

50

40

50

0.0 0.2 0.4 0.6 0.8 1.0 1.2

| Obj.iterative - Obj.true |

Iteration

0

10

20

30 Iteration

Figure 4.2: Null space line-search algorithm result. a function of iteration.

x0 =



0.3

0.5

0.2

Top panel: value of barrier parameter,

Bottom panel: 1-norm of error.

0.7

0.3

>

σ

is set to

0.8

µ,

as

and initial feasible value,

. Trajectory of the 1-norm error in the bottom panel illustrates

that the algorithm stayed within the feasible region of the problem.

Small Scale Simulated Result 

Consider the case where



6.97

  −0.34 8.04  H=  −0.14 −0.11   −1.52 −0.44 1.70 0.17

















4.89



    2.64        • •   , c =  3.40     8.24 •   4.62  0.07 5.11 3.93

7.28 −0.41 0.30

and







1

1

0

0

 A= 1

0

0

1

0

0

1

0

   1  , b =  1.0  . 0 0.2

By using the built-in QP solver in

R,

which implements the dual method of Goldfarb and Idnani

(1982, 1983), we get the optimal solution responding objective function value of

1.0



1

x∗ =

7.61.

h

0.688

0.11

0.20

0.31

−0.00

i>

and the cor-

Note the slight violation of the non-zero constraint in

the last variable. Figure 4.2 shows the result of our line search algorithm, which converges to the same result strictly in the interior of the constrains.

4.2.2 Empirical Analysis Data

Empirical results in this section is based on the daily closing prices for 500 of the most

liquid stocks, as measured by the median 21-day daily transacted volume, traded on the main stock exchanges in the US, spanning the period between 2008-01-22 to 2009-01-22. mean-variance optimization problem is as follows,

77

The setup of the

40 30 20

Frequency

10 0

-0.6

-0.4

-0.2

0.0

0.2

Daily return, %

Figure 4.3: Histogram for mean daily return for 500 most actively traded stocks between 2008-01-22 to 2010-01-22

. H

is the variance-covariance matrix for the log returns, based on open-to-close dierence in

log-prices.

. c .

.

is set to equal to the mean of log returns over the sample period, see Figure 4.3;

a total of two

equality constraints

are set such that



weights for stocks with A in the rst alphabet in their ticker name sum to 1/2;



weights for stocks with B in the rst alphabet in their ticker name sum to 1/2;

inequality constraints

are set such that the vector of 500 weights are larger than -10, i.e.

xi ≥ −10 ∀i.

Result

Figure 4.4 shows the result using the built-in

R routine that is based on the dual method of

Goldfarb and Idnani (1982, 1983), for the cases with and without the inequality constraint. For the inequality constrained case, the number of iterations until convergence is set to 30, and the number of active constraints at solution is set to 29. Compare this with the result based on Algorithm 1, shown in Figure 4.5, we see that the two results are clearly not identical. This is an artifact of the iterative routines used in the underlying algorithms used by the two methods. In terms of processing time, the built-in

R function is faster than our algorithm.

Figure 4.6 and

Figure 4.7 shows the convergence diagnostics. Table 1 shows diagnostic output from Algorithm 1 at each outer iteration.

4.3

Solving CCQP with a Global Smoothing Algorithm

Murray and Ng (2008) proposes an algorithm for nonlinear optimization problems with discrete variables. This section extends that framework, by casting the problem in the portfolio optimization

78

50 0

Weights

100

min x: -41.3 max x: 120 obj value: -0.339

0

100

200

300

400

500

400

500

30

Stock index

10 -10

0

Weights

20

min x: -10 max x: 30.3 obj value: -0.306

0

100

200

300

Stock index

Figure 4.4:

Optimization output using built-in

R

routine that is based on the dual method of

Goldfarb and Idnani (1982, 1983). Top Panel: with equality constraint only. Bottom Panel: with equality and inequality constraints.

x, respectively.

and

max x

are the minimum and maximum of the solution

20 0 -40

-20

Weights

mix x

min x: -39.2 max x: 50.2 obj value: -0.333

40

vector

0

100

200

300

400

500

400

500

10 20 30 40 50

min x: -9.98 max x: 56.5 obj value: -0.309

-10

0

Weights

Stock index

0

100

200

300

Stock index

Figure 4.5:

Optimization output using Algorithm 1.

Top Panel:

Bottom Panel: with equality and inequality constraints. maximum of the solution vector

x, respectively.

79

mix x

with equality constraint only.

and

max x

are the minimum and

8e-04 mu

4e-04 0e+00

5

10

15

20

15

20

15

20

0.04 0.02 0.00

| Obj.iterative - Obj.true |

Iteration

5

10

20 15 10 5

CG iterations

25

30

Iteration

5

10 Iteration

Figure 4.6: Algorithm 1 convergence diagnostics. Top panel: value of barrier parameter

µ;

Center

panel: 1-norm of dierence between objective value at each iteration and the true value (as dened to be the value calculated by the

R

function); Bottom panel: the number of CG iterations at each

40 0

20

Step length

60

outer iteration.

5

10

15

20

15

20

15

20

-0.27 -0.29 -0.31

Obj.iterative

Iteration

5

10

|| Grad ||

0.00 0.01 0.02 0.03 0.04 0.05

Iteration

5

10 Iteration

Figure 4.7: Additional Algorithm 1 convergence diagnostics. Top panel: step-length objective function value; Bottom panel: 2-norm of the gradient.

80

α; Center panel:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

ngrad.r 2.12e-02 4.75e-02 1.26e-03 1.15e-03 8.32e-04 7.82e-04 2.59e-03 1.49e-03 7.28e-04 6.71e-04 5.42e-04 4.03e-04 6.17e-04 2.45e-03 1.17e-03 5.85e-04 6.72e-04 6.72e-04 6.72e-04 6.72e-04 6.72e-04

Table 7:

ngrad 2.07e-02 4.98e-02 1.29e-03 1.21e-03 8.86e-04 8.42e-04 2.64e-03 1.59e-03 8.51e-04 8.59e-04 8.46e-04 7.67e-04 8.67e-04 2.57e-03 1.39e-03 8.77e-04 9.40e-04 9.40e-04 9.40e-04 9.40e-04 9.40e-04

obj -0.253 -0.272 -0.274 -0.286 -0.294 -0.300 -0.299 -0.300 -0.304 -0.306 -0.306 -0.306 -0.308 -0.308 -0.308 -0.309 -0.309 -0.309 -0.309 -0.309 -0.309

cond 7.45e+03 7.07e+03 7.64e+03 1.38e+04 2.21e+04 3.16e+04 7.30e+04 3.74e+04 4.66e+04 5.08e+04 4.61e+04 3.99e+04 4.80e+04 1.13e+05 3.42e+04 3.74e+04 9.73e+04 9.73e+04 9.73e+04 9.73e+04 9.73e+04

cond.r 7.81e+03 9.92e+03 1.45e+04 2.40e+04 7.06e+04 1.61e+05 3.29e+05 1.92e+05 2.96e+05 3.49e+05 3.92e+05 3.99e+05 4.75e+05 1.62e+06 5.16e+05 5.22e+05 1.48e+06 1.48e+06 1.48e+06 1.48e+06 1.48e+06

mu 1.00e-03 1.00e-03 1.00e-03 6.00e-04 3.60e-04 2.16e-04 2.16e-04 2.16e-04 1.30e-04 7.78e-05 7.78e-05 7.78e-05 4.67e-05 4.67e-05 4.67e-05 2.80e-05 1.68e-05 1.68e-05 1.68e-05 1.68e-05 1.68e-05

step 74.90 20.22 6.95 13.96 13.40 11.46 6.30 7.31 7.78 9.08 5.23 6.23 4.65 2.79 5.05 3.25 0.00 0.00 0.00 0.00 0.00

Nullspace CG algorithm output for the rst 21 iterations.

res.pre 6.21e-16 1.79e-15 5.56e-15 2.81e-15 2.84e-15 5.56e-15 6.66e-15 1.19e-14 1.27e-14 1.64e-14 2.16e-14 9.33e-15 7.99e-15 5.35e-15 1.38e-14 1.61e-14 1.61e-14 1.61e-14 1.61e-14 1.61e-14 1.61e-14

ngrad.r:

res.post 1.79e-15 5.56e-15 2.81e-15 2.84e-15 5.56e-15 6.66e-15 1.19e-14 1.27e-14 1.64e-14 2.16e-14 9.33e-15 7.99e-15 5.35e-15 1.38e-14 1.61e-14 1.61e-14 1.61e-14 1.61e-14 1.61e-14 1.61e-14 1.61e-14

norm of reduced

gradient; ngrad: norm of gradient; obj: objective function value; cond: condition number of hessian; cond.r: condition number of reduced hessian; mu: barrier parameter; step: step length; res.pre: norm of residual pre CG; res.post:

norm of residual post CG; maximum of outer iteration:

20;

maximum of CG iteration: 30.

setting. There is an equivalence of the CCQP problem and that of nding the global minimizer of a problem with continuous variables (see for example Ge and Huang, 1989). However, it is insucient to introduce only penalty function to enforce integrability as it risks introducing large numbers of local minimums which can signicantly increase the complexity of the original problem. The idea of the global smoothing algorithm is to add a strictly convex function to the original objective, together with a suitable penalty function. The algorithm then iteratively increases the penalty on non-integrability and decreases the amount of smoothing introduced.

4.3.1 Framework Here we add the cardinality constraint (4.3), and solve the following mixed integer optimization problem,

min f (x) = c> x + 12 x> Hx

(4.13)

x

s.t.

Ax = b yi ≥ xi ≥ 0 P i yi = K

81

(4.14)

i = 1, . . . , n

(4.15)

yi ∈ {0, 1}

(4.16)

where

c, x ∈ Rn×1 , H ∈ Rn×n , A ∈ Rm×n , b ∈ Rm×1

and

m ≤ n.

The transformed problem

becomes:

minn

x∈R

F (x, y)

= f (x) +

X γX (log si + log xi ) yi (1 − yi ) − µ 2 i i {z } | {z } | penalty term

 s.t.

where

e ∈ Rn×1

A

0

0

  0 −I

e>

x

smoothing term





b



    0  y  =  K , 0 s −I

I

is a column vector of one's,



(4.17)

s

is the slack variable,

γ

is the parameter that governs

µ is the parameter that  governs  the  amount of  global  A 0 0 x b       Aˇ =  0 e> 0 , x ˇ =  y  and ˇb =  K ,

the degree of penalty on non-integrability, and

smoothing introduced into the problem. Let

−I

I

−I

s

0

then the gradient and Hessian can be expressed as,

          gˇ = ∇F (x, y) =          

g1 − . . .

µ x1



    gn − xµn    γ  ˇ 11 H 2 (1 − 2y1 )   . ˇ = ∇2 F (x, y) =  , H .  0 .   γ  0 2 (1 − 2yn )   µ − s1   .  .  .

0 ˇ H22

0



 0  ˇ 33 H

0

− sµn where

ˇ 11 = H + diag g = c + Hx, H



µ x2i



ˇ 22 = diag (−γi ) , H ˇ 33 = diag ,H



µ s2i

 .

An outline of the global smoothing CCQP algorithm is given in Algorithm 2. Algorithm 3 is an adaptation of the conjugate gradient method that produces a descent search direction and guarantees that the fast convergence rate of the pure Newton method is preserved, provided that the step length is used whenever it satises the acceptance criteria. The CG iteration is terminated once a direction of negative curvature is obtained.

4.3.2 Empirical Analysis Empirical results to illustrate an application of the

global smoothing algorithm

is based on the daily

closing prices for 500 of the most liquid stocks, as measured by median 21-day daily transacted volume, traded on the main stock exchanges in the US, spanning the period between 2008-01-22 and 2009-01-22. Algorithm 2 illustrates an implementation of the framework to solve the following

82

Algorithm 2 Global Smoothing CCQP iP ←

[form permutation matrix]

h

via pivoting scheme of the QR decomposition of matrix

ˇ =A e= B e ∈ Rm×m e Se , where the full rank square matrix B AP e. partition of A partition of x ˇ into m basic and n − m superbasic variables Given µ0 , σ , ν and γ0 start with feasible value, x ˇ0 , which implies pB ≡ 0 and remain zero for all k←1 that

while not-converged do e k ← P > HP ˇ H is the second

and

n

gek ← P > gˇ,

where

xk

is simply the rst

n

Aˇ,

such

is simply given by the

subsequent iterations

elements of the vector

Px ˇk

and

yk

elements

pS based on (4.12), where we solve the potentially indenite Newton system e k ZpS = −Z > gek based on either the Tuncated CG Algorithm 3, or the Modied Cholesky Z >H

calculate

Factorization Algorithm 4

pk ←



0



pS

calculate the maximum feasible step along

αM

using the backtracking line search method in

Algorithm 5

x ˇk+1 ← x ˇk + αk pk k ←k+1 if kegk+1 k − kegk k > − then µk ← σµk γk+1 ← γk /σ ν

else

µk+1 ← µk

end if end while

[nal result]

and

x ˇk ∗ P > ,

γk+1 ← γk

where

k∗

is the terminating iteration

Algorithm 3 Truncated-CG while not-reached-maximum-iteration do e if d> j Hk dj ≤ ξ then if if j = 0 then pk ← −e gk

else

pk ← zj

end if end if >

e αj ← rj rj/d> j Hk dj zj+1 ← zj + αj dj e k dj rj+1 ← rj + αj H if krj+1 k < k then pk ← zj+1

end if

>

βj+1 ← rj+1 rj+1/rj> rj dj+1 ← −rj+1 + βj+1 dj

end while

83

Algorithm 4 Modied Cholesky Factorization γ ← max (|aii | : i = 1, 2, . . . , n), ξ ← min (aii : i = 1, 2, . . . , n), M aij is the i, j element of the matrix A √ β ← max (γ, ξ/ n2 −1, M ),  > 0 ek A←H for l ← 1, 2, . . . , n do µl ← max {|ˆ | : j = l +o 1, l + 2, . . . , n} naljp all |, µl/β rll ← max , |ˆ Given:

the relative machine

precision, where

ell ← rll2 − a ˆll for j ← 1, 2, . . . , n do rlj ← aˆlj/rll

end for for i ← 1, 2, . . . , n do a ˆij ← a ˆij − rlj rli

end for end for

Algorithm 5 Backtracking Line Search (specialized to solve 1 ≤ x ≤ 0 Given:

α ¯ > 0, ρ ∈ (0, 1), c ∈ (0, 1), α ← α ¯

while F (ˇxk + αk pk ) > F (ˇxk ) + cαegk> pk do α ← ρα

end while αk ← α

specic problem,

min f (x) = c> x + 21 x> Hx x,y

yi − xi = si P i yi = K

s.t.

i = 1, . . . , n yi ∈ {0, 1}

si ≥ 0 x≥0

minn

x∈R

F (x, y)

= f (x) + −µ

X

γX yi (1 − yi ) 2 i

(log yi + log (1 − yi ) + log si + log xi )

i

" s.t.

>

0

e

−I

I

Figure 4.8 shows the converged output for

0 −I

K = 250

#



x



   y = s

"

K

#

0

and Figure 4.9 for

.

K = 100.

While the proposed algorithm is able to solve small to medium scale problems, convergence is often sensitive to how

γ

and

µ

are modied at each iteration. Further research is needed to control

these parameters to ensure consistent convergence performance and to generalize the algorithm to deal with more general constraints. Therefore, we treat this proposed algorithm as a prototype and a proof of concept that such a framework can potentially be adapted to solve a CCQP portfolio

84

0.8 out

0.4 0.0

0

500

1000

x

1500

150 0 50

Frequency

250

Index

0.0

0.1

0.2

0.4

250 150 0

50

Frequency

0.3

y

out[1:500]

0.0

0.2

0.4

0.8

1.0

250 150 0

50

Frequency

0.6

s

out[501:1000]

Figure 4.8: Converged output for

si ;

K = 250.

Panel-2: histogram of

Panel-1: rst 500 variables are

xi ;

Panel-3: histogram of

yi ;

xi ,

second 500 are

Panel-4: histogram of

yi si .

out

0.0

0.4

0.8

and the last 500 are

500

1000

x

1500

100 200 300 400

Index

0

Frequency

0

0.05

0.10

0.15

y

0.20

0.25

0.30

0.35

100 200 300 400

out[1:500]

0

Frequency

0.00

0.4

s

0.6

0.8

1.0

out[501:1000]

0

Frequency

0.2

100 200 300 400

0.0

Figure 4.9: Converged output for and the last 500 are

si ;

K = 100.

Panel-2: histogram of

Panel-1: rst 500 variables are

xi ;

Panel-3: histogram of

85

yi ;

xi ,

second 500 are

Panel-4: histogram of

yi si .

problem.

In the next section, we proposed another approach to solve the CCQP, which is both

robust and scales well to the size of the underlying problem.

4.4

Solving CCQP with Local Relaxation Approach

Without loss of generality, we replace the generic constraint

P

i

xi = 0,

so that it mimics the case of a

dollar neutral

Ax ≥ b in (4.2) by an equality constraint

portfolio. The resulting CCQP problem is

given by

min f (x) = −c> x + λx> Hx x P s.t. i xi = 0 P i 1{xi 6=0} = K. Murray and Shek (2011) proposes an algorithm that explores the inherent similarity of asset returns, and relies on solving a series of relaxed problems in a reduced dimensional space by rst projecting and clustering returns in this space. This approach mimics line-search and trust-region methods for solving continuous problems, and it uses local approximation to the problem to determine an improved estimate of the solution at each iteration. Murray and Shanbhag (2007) and Murray and Shanbhag (2006) have used one variation of local relaxation approach to solve a gridbased electrical substation siting problem, and have demonstrated that the growth in eort with the number of integers is slow with their proposed algorithm vis-à-vis an exponential growth for a number of commercial solvers. The algorithm proposed here is dierent to their algorithm in a number of key areas. One, instead of a predened 2-dimensional grid we project asset returns onto a multi-dimensional space based on exploring statistical properties of historic returns. Second, instead of using physical distance, we dene the distance metric as a function of factor loading, risk aversion and expected return. Third, given the typical cardinality relative to the size of the search space and cluster group size, we cannot assume that the clusters do not overlap, hence we have proposed a necessary probabilistic procedure to assign cluster members to each centroid at each iteration.

4.4.1 Clustering Recall that the goal of minimizing portfolio risk is closely related to asset variances and their correlation - our objective is to maximize expected return of the portfolio, while penalizing its variance. It is intuitive to think that assets from dierent sectors exhibit dierent correlation dynamics. However, empirical evidence seems to suggest this dynamic is weak relative to idiosyncratic dynamic of stocks both within and across sectors. Figure 4.10 shows the ordered covariance matrix for 50 assets, 5 from each of the 10 sectors dened by Bloomberg, such that the rst group of stocks (JCP, CBS,

Consumer Discretionary, the second group of stocks (NHZ, KO, ..., GIS) belong to Consumer Staples, etc. We see that the covariance matrix does not seem to show any noticeable ..., COH) belong to

block structure that would suggest collective sectoral behavior. Covariance matrices calculated using dierent sample periods also conrm this observation. Instead of relying on sector denition to cluster assets, we could use historic correlation of returns as a guide to help us identify assets that behave similarly. First we dene a distance metric,

86

dij

for

JCP CBS ANF M COH HNZ KO CAG PG GIS VLO APA CHK RRC WMB STI PFG SCHW HCN PLD GILD ABT UNH AET BMY FLR ETN NSC WM EMR TXN MFE CSCO NTAP AMD MON PX OI FCX PTV VZ T AMT Q CTL EIX AES FE D PCG

JCP CBS ANF M COH HNZ KO CAG PG GIS VLO APA CHK RRC WMB STI PFG SCHW HCN PLD GILD ABT UNH AET BMY FLR ETN NSC WM EMR TXN MFE CSCO NTAP AMD MON PX OI FCX PTV VZ T AMT Q CTL EIX AES FE D PCG

Figure 4.10: Heatmap of covariance matrix for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01. Darker colors indicate correlation closer to 1 and light colors closer to -1.

a pair of assets

i

and

j, dij =

where

ρij

q 2 (1 − ρij ),

(4.18)

is the Pearson's correlation coecient based on the pairwise price return time series. This

denition penalizes negatively correlated stocks more than stocks that are not correlated, which makes sense as our goal is to group stocks with similar characteristics as measured by the co-

minimum

movement of their returns. Using this denition of distance metric, we can construct a

spanning tree

(MST) (see for example Russell and Norvig (1995)) that spans our asset pairs with

edge lengths given by (4.18). From Figure 4.11, we see that this method has identied some cluster structures, and veried that the dominant dimension is not along sector classication. The MST is a two dimensional projection of the covariance matrix.

An alternative, higher

dimensional, method is to cluster the assets in a suitably chosen orthogonal factor space spanned by a more comprehensive set of spanning basis. eectively project our universe of often

k  N.

N

Given a return matrix,

asset returns onto an orthogonal

R ∈ RT ×N ,

k -dimensional

we aim to

space, where

We can write

R = Pk Vk> + U, where

Pk ∈ RT ×k

is the return of our

k

factors,

Vk ∈ RN ×k

is the factor loadings and

the matrix of specic returns. This reduces the dimension of our problem from the

i-th

column of the return matrix

R, ri ∈ RT ,

to

k,

ri = v1,i p1 + v2,i p2 + · · · + vk,i pk + ui ,

is

such that

can be written as a linear combination of the

factors

87

N

U ∈ RT ×N

k

Figure 4.11: A

Minimum spanning tree

for 50 actively traded stocks, over the sample period 2008-

01-10 to 2009-01-01. Using distance metric dened in (4.18). Each color identies a dierent sector.

where and

U

vi,j ∈ R is the (i, j) component of the matrix Vk , pi respectively. One method to identify the

k

and

ui

factors is by

are the i-th column of the matrix

principal component analysis

P

(PCA).

An outline of the procedure can be expressed as follows,

.

obtain spectral-decomposition

14

:

1 T

R> R = V ΛV > ,

of the eigenvalues along the diagonal of

.

form principal component matrix:

.

nally take rst

k

columns of

P

V

V

and

Λ

are ordered by magnitude

Λ,

P = RV ,

and

where

and

to give

Pk

and

Vk

, known as the principle component

and factor loading matrix, respectively. Once we have identied the

k -dimensional

space, then the proposed clustering method works as

follows:

.

dene a cluster distance metric and

.

j -th

d (ri , rj ) = kvi − vj k2 ,

row of the factor loading matrix

where

v i ∈ Rk

and

vj ∈ R k

are the

i-th

Vk .

lter outliers in this space by dropping assets with distance signicantly greater than the median distance from the

center-of-gravity 15

14 Here we assume columns of R have 15 The center-of-gravity is dened in

of the

k -dimensional

space;

zero mean. the usual sense by assigning a weight of one to each asset in the projected space, then the center is dened to be the position where a point mass, equal to the sum of all weights, that balances the levered weight of the assets.

88

GIS

AET ABT PG

AET AES T CAG

FCX COH HCN PCG UNH D M RRC EIX MFE NSC MON ETN WM BMY TXN HNZ EMR JCP OIPFG PX ANF FLR WMB GILD APA Q STI

T

pca.3

pca.2

SCHW

WMB

MFE D

STI UNH PX FLR APA Q ANF PFG AES ETNNSC TXN HNZ EMR OI WM FCX BMY MON JCP GILD PCG EIX

COH M HCN

CHK

RRC

KO

SCHW

VZ KO CHK PLD

VZ

NTAP AMT VLO

PG

PLD

CAG

AMT

CSCO

ABT

CSCO

VLO NTAP

AMD AMD

GIS

pca.1

pca.2

AET

AES T

SCHW

CHK VZ

KO

pca.2

pca.3

PCG UNH D RRC MFE NSC EIX MON ETN WM BMY TXN HNZ EMR PFG JCP OI PX ANF FLR WMB GILD APA Q STI

HCN M

pca.3

FCX

COH

PLD NTAP

AMT

VLO

PG CAG ABT

CSCO AMD GIS

pca.1

Figure 4.12:

k-means clustering

pca.1

in a space spanned by the rst three principal components based

on log returns for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01.

.

identify clusters (using for example

k -means

clustering;

see Hastie et al. (2003) for details) in

this metric space. Figure 4.12 illustrates the proposed clustering method using the returns of 50 actively traded US stocks. Here we have set the number of clusters to three and have dropped four outliers stocks (FE, CBS, CTL and PTV) based on the criteria listed above, and then projected the remaining 46 returns onto a three-dimensional space spanned by the three dominant principal components.

Distance Measure

s,

in the projected space

dm (s; ao ) = σd kvs − a0 k2 + (1 − σd ) (kck∞ − |cs |) ,

(4.19)

We dene the distance measures for an assets,

relative to an arbitrary reference point,

where

cs

is the

s-th

a0 ,

by

element of the vector of expected return,

we use the tuning parameter

σd

c.

The economic interpretation is that

to control the trade o between picking stocks highly correlated

and stocks that have higher expected return. Note that

σd = 1

corresponds to the simple Euclidean

distance measure.

Identify Cluster Member

Once the initial starting

K centroid assets have been identied, the center-of-gravity method, dened for the

proposed algorithm identies the member assets by the

89

i-th

cluster as

∗(i)

Vk> xr

gi =

∗(i) ,

xr where the vector of assets in the centroid

i,

∗(i)

xr

i-th

corresponds to optimal weights (see

(4.20)

relaxed-neighborhood QP

in Section 4.4.2)

cluster. The member assets are picked according to the degree of closeness to

dened by the distance measure

dm (s; gi )

for asset

s.

Once the corresponding member assets have been identied for each cluster, the algorithm will propose a new set of

K

centroids at the end of each iteration. For the next iteration, new member

assets needed to be calculated for each cluster. The composition of each new cluster clearly depends on the order that the centroids are picked.

For example, clusters

A

and

B

could be suciently

close in the projected space, such that there exists a group of assets which are equally close to both clusters. Since assets cannot belong to more than one group, so once cluster the asset drops out of the feasible set of assets for cluster Let

pi

be the probability of centroid asset

i

B.

A has claimed an asset,

being picked for cluster assignment, we propose the

following logistic transform to parametrize this probability:

pi log = αs 1 − pi where

x∗o,i

! ∗ x 1 o,i P ∗ − , K i xo,i

corresponds to the optimal weight (see

centroid-asset QP

(4.21)

in Section 4.4.2) of the

i-th

centroid asset, such that the higher the centroid weight the greater the probability that it will have a higher priority over the other centroids in claiming its member assets.

4.4.2 Algorithms Algorithm for identifying an initial set of starting centroids, So

There are a number of

possible ways to prescribe the initial set of centroid assets.

.

k-means 

(Algorithm 6)

form

K

clusters, where

spanned by

So ⊂ S ,

R (Pk ).

K

is the cardinality of the problem, in the projected factor space

Then in the projected space, identify the set of

K

centroid-assets,

which are stocks closest to the cluster centers (note a cluster center is simply a

point in the projected space, and does not necessarily coincide with any projected asset returns in this space);

.

Successive truncation (Algorithm 7)



at each iteration we essentially discards a portion of assets which have small weights, until we are left with the appropriate number of assets in our portfolio equal to the desired cardinality value.

90

Algorithm 6 K -means in factor space R, K 1 > TR R [Singluar Value Decomposition] Given:

Ω←

Vk ←

rst

k

columns of

[Random initialization]

Ω = V ΛV > V (0) (0) (0) m1 , m2 , ..., mK

while not-converged do for i = 1 to K don

E-Step:

M-Step:



o

(n) (n) = vj : vj − mi ≤ vj − m˜i , ∀˜i = 1, . . . , K P (n+1) 1 ˛ mi = ˛˛ (n) (n) vj ˛ vj ∈C (n)

Ci

˛Ci

end for end while

i

˛

Algorithm 7 Successive truncation N, I j←0 S0 ← S; n0 ← N ; φ0 ← 0

Given:

repeat

[Solve QP] for

x∗

where

x∗j = arg min x

−c> x + 21 x> Hx

s.t.

e> x = 0 x ∈ Sj

if

arithmetic truncation then return φj ← max (nj − bN/I c , K) else return φj ← max (b(1 − K/N ) nj c , K) end if 

Sj+1 ← s(n) , s(n−1) , . . . , s(φj ) nj+1 = |Sj+1 | − φj until φj = K

91

Algorithm for iterative local relaxation

In the main Algorithm 8, at each iteration, we solve

a number of small QPs for subsets of the original set of assets, assets belonging to one of the

K

So ⊂ Sr ⊆ S ,

where

Sr

is the set of

clusters identied. The QPs that we solve in the main algorithm

include the following:

.

Globally-relaxed QP min −c> x + 21 x> Hx x

e> x = 0

s.t.

x ∈ S, where

c

and

H

are the expected return and covariance matrix for the set of assets,

column vector of

.

S; e

is a

1's.

Centroid-asset QP 1 > min −c> o xo + 2 xo Ho xo x

e> xo = 0

s.t.

x ∈ So , where

So ,

co

and

only.

Ho

This gives optimal weight vector

corresponding to

.

are expected return and covariance matrix for the set of centroid assets,

i-th

x∗o ,

where

x∗o,i

is the

i-th

element of this vector

so,i .

centroid asset,

Relaxed-neighborhood QP 1 > min −c> r xr + 2 xr Hr xr w

∗ e> i xr = xo,i

s.t.

i = 1, . . . , K,

xr ∈ Sr , where

ei

is a column vector with

1's

assets and are zero everywhere else;

cr

is the set of

neighbor

centroid cluster

Hr is the correK ∪i=1 N (so,i ), where

is the vector of expected returns and

sponding variance-covariance matrix of our sub-universe of assets,

N (so,i )

i-th

at positions that correspond to the

assets belonging to the cluster with

Sr = so,i

impose the constraint that the sum of weights of the neighbors of the

as the centroid.

i-th

We

centroid add up to

∗ the centroid weight, xo,i .

Algorithm for identifying new centroids of the

K

clusters. We generate

cluster to give

kmk∞

Φ2

{mk }k=1,...,K

be the number of members in each

number of centroid proposals based on assets within the same

Φ1 , . . . , Φkmk∞ . Φ1 ∈ RK

to the centroid;

Let

is a vector of the set of closest distances, dened by (4.19),

is the vector of distances that are second closest, etc. For cluster group

mk ≤ kmk∞ , we pad the last kmk∞ −mk away from the centroid in cluster

k.

k

with

entries with distance of the cluster member that is farthest

Once we have generated the centroid proposals

{Φi }i=1,...,kmk



,

Algorithm 9 then loops over each of the proposals. During each loop, the algorithm replaces the

92

Algorithm 8 Local relaxation in projected orthogonal space αs , σd , Mmin , Mmax , K , η¯, o , S x∗ ← solution of globally-relaxed QP Ξ ← ranked vector x∗ So ← initial centroids by Algorithm 7 or Given:

while not-converged do

by Algorithm 6

(x∗o , fo∗ ) ← solution of centroid-asset QP ` ←Ξ Ξ  while Iε = i : x∗o,i < o 6= Ø do for each i ∈ Iε do  s ← s`1 , where s = so,i ∈ So : x∗o,i < o , ` ←Ξ ` \ {` Ξ s}

end for x∗o ←

solution of

end while [Generate

centroid-asset

and

` \ So . s` ∈ Ξ

QP

random-centroid-asset-order, Io˜] Sampling, without replacement, from the index set

Io = {1, 2, . . . , K}

with probability in (4.21)

pi =

1 + exp αs

!!!−1 ∗ x 1 o,i P ∗ − K i xo,i

S` ← S \ So for {so,i : so,i ∈ So , i ∈ Io˜} do mi ← max (bMmax pi c , Mmin ) `], such that [Identify mi neighbors of so,i , N (so,i ) ⊂ S n o

 s ∈ S` : σd vs − vso,i 2 + (1 − σd ) (kck∞ − |cs |) < r ⊂ |N (so,i )| = S` ← S` \ N (so,i ) ,

end for S Sr ← x∗r ← S0 ←

if

K i=1 N (so,i ) solution of

relaxed-neighborhood

QP

output of Algorithm 9

[max-iteration-reached OR max-time-limit-reached]

break end if end while

93

then

N (so,i ) mi − 1

Algorithm 9 Identify new centroids Given:

Υ, {gi }i=1,...,K

[Generate centroid proposal list]

for i = 1 to kmk∞ do Φi ←

end for

proposal of

K

centroid assets with distance ranked

i-th

closeset based on (4.19)

j ← 1; η ← 1; exit-loop←false while exit-loop is false do for i = 1 to K do so,i ← Φj,i f ← objective function if f < fo∗ then exit-loop←true

value of

centroid-asset

QP

break end if end for

j ←j+1 if j > kmk∞ then if η ≥ η¯ then goto Algorithm 10

else

Mmax ← bΥη Mmax c η ←η+1

end if end if end while

current centroids with progressively less optimal proposals, in the sense of larger distance measure from the cluster centroid assets, and breaks out as soon as there has been an improvement to the objective function. If none of the proposals gives a lower objective function value, the algorithm then attempts to swap centroids with assets from other clusters. The choice of substitution is governed by the expected returns of the assets, weighted by the magnitude of the relaxed solution from QP, to give a vector

$∈R

P

i mi

relaxed-neighborhood

,

>  $ = x∗o,1 cs∈N (so,1 ) , . . . , x∗o,2 cs∈N (so,2 ) , . . . , . . . , x∗o,K cs∈N (so,K ) , where

cs∈N (so,k )

is a row vector consists of the expected returns for assets belonging to the

k -th

cluster.

4.4.3 Computational Result Preparation of Dataset

Empirical results in this section are based on two key datasets. The rst

dataset consists of daily closing prices for 500 of the most liquid stocks, as measure by mean daily transacted volume, traded on the main stock exchanges in the US, spanning the period between 2002-01-02 to 2010-01-22. The second dataset consists of 3,000 monthly closing prices for actively traded stocks in the US, spanning the period between 2005-05-31 to 2010-04-30. These two datasets test the performance of our algorithm under dierent market environments and for dierent scales

94

Algorithm 10 Swap centroids k ← 1; q ← kSr k

while exit-loop

is

false do

so,k ← $q f ← objective function if f < fo∗ then η←0 exit-loop←true

value of

centroid-asset

QP

end if

k ←k+1

if k = K then q ←q−1 k←1 if q ≤ 0 then exit-loop←true

end if end if end while

of the problem. The results based on the proposed algorithm are compared with those obtained by the successive truncation algorithm in Algorithm 7 and by the branch-and-cut MIQP solver used in CPLEX, a leading commercial optimization software.

Cleaning and Filtering

We follow a three step procedure to clean, lter and prepare the datasets

for analysis:

.

Recorded price validation.

We discard assets that do not have a complete price history over

the whole sample period;

.

NAICS sector matching. We purge any asset whose ticker cannot be matched to the latest North American Industry Classication System (NAICS) sector denition. This is in place to facilitate potentially imposing sectoral constraints.

.

PCA outliers detection.

We project the returns, dened as the dierence in log-prices, onto a

4-dimensional space spanned by the dominant PCA factors. Any assets with projected return more than two standard deviations away from the projected median is considered an outlier asset and dropped.

Factorization of Covariance Matrix

For the dataset with 3,000 assets, since

N T

so the

covariance matrix from raw returns is rank decient. We take the necessary step of factorizing the raw covariance matrix by using the rst four dominant PCA factors. This ensures that the resulting covariance matrix is both full rank and well conditioned.

The rst four factors together capture

35.07% of the total variance for the 500 stock case, and 38.15% for the 3,000 stocks case.

4.4.4 Algorithm Benchmarking Successive Truncation

Figure 4.13 shows the result for the fully relaxed problem (i.e. without

the cardinality constraint), ordered by value of the optimal weights. Positive weights mean that we

95

4

3

Weight

2

1

0

−1

HCC TRK DD TD CRY DGX ACO IBOC WTT VAR SPLS USTR DOW ANAT KEI MDU DNR MSM STFC STC WDR CTB TRST PEG HLX HAL DRAM WWW WTNY PKE GNTX ETM IDCC JOE EMR RHI BVF SBP LZ STBA VII CTT AMLN PFG SHS TMK SR CHD MTX CCF

−2

Tickers

Figure 4.13: Fully relaxed (i.e. without cardinality constraint) QP solution for 500 actively traded US stocks (not all tickers are shown) sampled over the period from 2002-01-02 to 2010-01-22, with dollar neutral constraint.

long those assets and negative weights correspond to shorting those assets. As we impose the conP straint, i xi = 0, so the net dollar exposure of the result portfolio is zero. This is one interpretation of a market neutral portfolio. An alternative denition of market neutrality requires neutralizing go

specic factors of the net portfolio exposure, by eectively taking into account correlation between dierent assets. We apply the successive arithmetic truncation algorithm given in Algorithm 7 to CCQP. Table 8 gives the result for the 500-choose-15 (i.e. solving a 500 asset problem with cardinality equal to 15) case. It can be seen that if the number of iterations is set to one-tenth of the size of the universe, i.e. by discarding a large portion of stocks from the feasible set at each iteration, the method is able to obtain a result quickly. However, as we will show later, if we need a higher degree of optimality, this simple method often fails to deliver. Also, note that it is not guaranteed that the more iterations we use, the better the solution. Table 9 shows the result for the 3,000-choose-15 case, we see that the algorithm does not give monotone improvement of the objective function. For comparison, Table 10 shows the result for geometric truncation for the 3,000-choose-15 case.

Again, we notice that

increasing the number of iterations does not necessarily increase optimality. Both of these truncation methods are widely used in practice.

Although there is clearly no

dispute on the speed of this heuristic algorithm, we later show that it can be signicantly inferior than algorithms that are able to explore the solution space more eectively, at only a small cost of execution time. Throughout the rest of this paper, we refer to successive arithmetic truncation as simple succes-

96

Number of

Objective

Time

Optimal

Iterations

Value

(sec)

Assets

AGII BIG CPO ETH FBP FICO HCC IGT LPNT PFG RYN SRT TAP UIS WPP AGII BWA CDE CPO DYN ETH FICO HCC NWBI PFG RYN SRT TAP UIS WPP AGII BWA CDE CPO EMC ETH FICO HCC IGT PFG RYN SRT TAP UIS WPP AGII BWA CDE CPO DYN EMC ETH FICO HCC PFG RYN SRT TAP UIS WPP AGII BWA CDE CPO ETH FICO HCC IGT PFG RYN SRT TAP TRK UIS WPP

10

-0.00615

0.246

20

-0.00637

0.455

50

-0.00629

0.832

100

-0.00635

1.677

500

-0.00638

8.073

Table 8: Successive arithmetic truncation method result for 500 of the most liquid US stocks, with cardinality,

K

= 15.

Number of

Objective

Time

Optimal

Iterations

Value

(sec)

Assets

10

-12.08450

33

100

-12.13756

210

300

-12.23406

561

500

-12.23406

929

1,000

-11.72131

1,839

3,000

-11.72131

5,536

AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE EDMC MJN SEM TLCR VRSK AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE MJN SEM TLCR TMH VRSK AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK SEM TLCR VRSK AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK SEM TLCR VRSK AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK

Table 9: Successive arithmetic truncation method result for 3,000 of the most liquid US stocks, with cardinality,

K

= 15.

Number of

Objective

Time

Optimal

Iterations

Value

(sec)

Assets

10

-12.37960

19

100

-11.72131

63

300

-11.72131

161

500

-11.72131

290

1,000

-11.72131

507

3,000

-11.72131

1259

AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE LOPE MJN SEM TLCR VRSK AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK

Table 10: Successive geometric truncation method result for 3,000 of the most liquid US stocks with cardinality constraint,

K

= 15.

97

● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●●● ● ● ●● ● ● ●● ●● ● ●● ● ● ●● ●● ● ● ●● ●●● ●●●● ● ● ● ● ●●● ●● ●● ●●● ● ●● ●● ●● ● ● ● ● ● ●●● ● ● ● ●● ● ● ● ● ● ● ●●● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ●● ● ●● ● ● ● ● ●● ● ● ●●● ●● ● ● ●● ●









● ● ●● ● ●● ● ●● ● ● ● ● ● ●●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●●● ●●●● ● ● ● ● ●● ● ●● ● ●●● ● ● ● ●● ● ●● ●● ●●●●●● ● ●● ● ● ●● ● ●● ● ● ●● ● ●● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ● ● ●● ●● ● ●● ● ● ●● ●● ● ●● ●●● ● ● ● ● ● ● ● ● ● ●●● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ●● ● ●● ● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●●●● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ●● ● ●● ● ●● ●●● ●●●●●● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ●● ●●● ●●● ● ●● ●● ●●● ● ●● ● ● ● ●●● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ●● ●● ● ●● ● ● ●●●● ●● ● ●● ●● ●● ● ●● ●●● ● ●● ● ● ●● ●●● ● ● ● ●●● ● ● ● ● ● ●

pca.4

pca.3

pca.2

● ● ● ● ● ●●

● ● ● ● ● ●

● ●

● ● ● ●



● ●

pca.1



●● ●

pca.1

pca.1





● ● ●●●● ● ● ●●● ● ● ● ● ● ●●● ●● ●●●●● ●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ●●●●● ● ● ● ●● ● ● ●●●● ●● ● ●● ● ● ● ● ● ●●● ●●●● ●● ● ●● ● ● ●●● ●●● ● ●● ● ● ● ● ● ●● ● ●●● ●● ●● ●● ● ●● ● ●● ●● ●● ● ● ● ● ●● ● ● ● ●●● ● ●● ● ●●● ●● ● ● ● ●● ● ●● ● ●● ● ●● ● ● ● ● ● ● ●







● ●



● ●

● ● ● ● ● ● ● ● ●● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●

pca.2

pca.2



●● ● ● ● ●● ● ●● ● ● ●●

● ●● ● ●● ●● ● ● ●● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ● ●● ●● ●● ● ●●●● ●● ● ●● ● ●● ● ● ●● ●● ● ●●●● ●● ●●● ● ● ●● ● ● ●●●●● ● ● ●● ●● ● ● ●● ● ● ●● ● ● ●● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ●●●● ●● ● ● ●● ● ●● ●● ● ● ● ● ●● ●●●● ●● ● ● ● ● ●● ● ● ● ● ● ●●●● ●● ●●● ● ● ● ● ●●●● ●● ● ●

pca.4

pca.3



● ●● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ●●● ●● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ●● ●● ● ●● ●● ● ● ● ●● ● ●● ● ● ● ●● ● ● ●● ● ● ●● ● ● ●● ●● ●●● ● ●● ● ● ● ●● ●●●●● ● ● ● ●● ● ● ● ● ●● ●●● ● ●●●●● ● ●● ●● ● ●● ● ● ●●●●● ● ●

pca.4



● ●



● ●●●● ● ● ● ●● ● ● ● ● ● ●



●● ●

pca.3

Figure 4.14: Projection onto a 4-dimensional space spanned by the rst four dominant PCA factors, based on daily returns for the most liquid 500 US traded stocks, sampled over the period from 2002-01-02 to 2010-01-22. Symbols with dierent colors correspond to dierent cluster groups.

sive truncation, unless otherwise stated.

Local Relaxation and CPLEX MIQP

Each of the six panels in Figure 4.14 represents a 2-

dimensional view of the 4-dimensional PCA projection of the returns of our 500 assets. In the same gure, we have also shown the initial

K

clusters, indicated by dierent colors and symbols.

The

local relaxation method that solves a CCQP with cardinality equal to 15 may be initialized with the cluster centroids assets, iteratively solve for relaxed but smaller QPs by including a subset of the cluster members, then followed by moving to a new set of centroids with strictly increasing optimality. A prototype of the local relaxation algorithm has been implemented in

R

running in a 64-Bit

Linux kernel. The solution of the relaxed QP's required by the new algorithm and the truncation method are found using the QP solver in

R,

which is based on Goldfarb and Idnani (1982).

To

help assess the performance of our proposed algorithm, we present three solved problems, each with dierent universe sizes and cardinality values. The sets of parameters used in the algorithm for these three problems are given in Table 11. For comparison, we have implemented CPLEX v12 in a 64-Bit Linux kernel using the CPLEX C++ API to access the built-in MIQP solver, which is based on the dynamic search branch-and-cut method (IBM, 2010). The code is set to run in parallel using up to eight threads on a Quad Intel Core i7 2.8GhZ CPUs with access to a total of 16GB of memory. Table 12 shows the parameter settings used. Table 13 shows the result, for the 500-choose-15 case, with a run-time cut-o set to

98

(500, 15)

Mmin Mmax σd αs 0 η¯ Υ Table 11:

Parameter setup for

(3000, 15)

(3000, 100)

5

5

3

30

30

10

0.2

0.2

0.2

80

80

80

1 × 10−5

1 × 10−5

1 × 10−5

10

10

10

1.1

1.2

1.8

local relaxation

method.

(500, 15) means solving a 500 universe

problem with cardinality equal to 15. Parameter

Setting

Preprocessing Heuristics Algorithm MIP emphasis MIP search method Branch Variable Selection Parallel mode

true RINS at root node only branch-and-cut balance optimality and feasibility dynamic search automatic deterministic, using up to 8 threads

Table 12: CPLEX parameter settings (with the rest of the parameters set to their default values).

1,200 sec. Figure 4.15 shows the performance comparison between CPLEX and the new algorithm as a function of run-time. We see that the local relaxation method is able to obtain a better solution than CPLEX right from the beginning, and it strictly dominates CPLEX throughout the testing period. Next, we assess the scalability of the new algorithm, by switching to the larger dataset which consists of 3,000 actively traded stocks. As before, we rst project the factored covariance matrix onto a 4-dimensional PCA space and then identify the initial

K

clusters based on

k-means,

as

shown in Figure 4.16. Table 14 shows the computational result. We see that the local relaxation method is able to reach a signicantly better result than CPLEX quickly. Figure 4.17 shows the performance comparison of the local relaxation method verses CPLEX for the 3,000 choose 15 case, which illustrates dominance of the proposed algorithm over CPLEX for a prolonged period of time.

Local Relaxation with warm start

One additional feature of the local relaxation algorithm

is that it can be warm started using the result from another algorithm or from a similar problem

Method

Objective

Time

Optimal

Value

(sec)

Assets

CPLEX

-0.00796

1200

Local

-0.00870

1200

AOI AYI CCF CPO DD DST DY FBP HCC LRCX MRCL NST TMK UIS WPP AOI AVT AYI BIG BPOP CCF CPO CSCO DST HCC HMA IBM MCRL TAP ZNT

Relaxation

Table 13: Computational result for the 500 asset case with cardinality, the

local relaxation

K

= 15. For CPLEX and

algorithm, we have imposed a maximum run time cuto at 1,200 seconds.

99

0.000 −0.004 −0.006

Value obtained by successive truncation

−0.008

Objective function value

−0.002

CPLEX Local Relaxation

0

100

200

300

400

500

600

Time, sec

Figure 4.15: CPLEX versus with cardinality,

K

Method

local relaxation

method performance comparison for the 500 asset case,

= 15, both are cold started.

Objective

Time

Optimal

Value

(sec)

Assets

-12.08425

36

AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE EDMC MJN SEM TLCR VRSK

CPLEX

-12.30410

25200

Local

-12.76492

7200

AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE MJN SEM TLCR VECO VRSK AOL AONE ARST ART BPI CFL CFN CIE CIT CVE DGI DOLE MJN SEM TLCR

Successive Truncation

Relaxation Table 14:

Computational result for the 3,000 asset case, with cardinality,

truncation is done over 10 iterations.

100

K

= 15.

Successive

● ●





●●

● ● ●

●●



● ●

●●



● ●

● ● ● ● ●

●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ●● ● ● ● ●●●● ● ● ● ●● ● ● ●● ●● ●● ●● ● ● ● ●● ● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ●● ● ●● ● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ●●● ● ● ●● ● ● ●● ● ● ●●● ●● ● ● ● ● ●● ●●●● ●●● ● ●● ● ● ●● ●●● ●● ●● ● ●● ● ● ● ●●● ● ● ●●● ● ● ● ●● ●●● ●● ●●●● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ●●● ● ● ●●● ●● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ●●● ●● ● ● ●● ● ● ●● ● ●● ●● ● ● ●●●● ● ● ● ● ● ●● ● ● ● ●● ●● ●●● ● ●● ● ● ●● ●● ● ● ● ● ●● ● ● ●● ● ● ● ●● ●●● ● ● ● ●● ● ● ● ●●● ●● ● ● ● ● ● ●● ●● ● ●● ●● ● ● ● ● ● ● ●●● ●● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ●● ●● ● ●● ● ●● ●● ● ● ● ● ●● ● ● ●● ●● ● ●● ● ● ● ●● ●● ● ● ● ● ●● ● ●●● ●● ● ● ●●● ● ●●● ● ●● ●● ● ● ●●● ● ●● ●●●● ● ● ● ●●● ●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ●

pca.4

● ● ● ● ● ●● ● ●● ●● ● ● ●● ● ● ●● ● ●●● ●● ● ●●● ●● ●●● ● ●● ●● ● ● ●● ●● ● ● ● ●● ●● ● ● ● ●● ● ● ● ● ●●● ● ●● ● ●● ●● ● ● ●● ● ● ●●● ● ●● ●● ● ● ● ● ●● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ●●● ● ●●● ● ●●● ● ● ● ●● ● ● ● ● ●● ●● ●●●● ●● ●● ● ● ● ● ● ●●● ● ●● ● ● ● ●● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ●●● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ●●● ● ● ● ● ●● ● ● ●● ●● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ●● ● ● ● ●● ● ● ● ●●●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ●● ● ● ●●●● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●●● ●●● ●●● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ●● ●●●● ● ● ● ● ●● ●● ●● ●● ●● ●● ●● ● ● ●● ● ● ●● ●● ●● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ●●●● ●● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ● ●●●● ●● ● ●●●●● ● ●● ●● ● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ●● ● ● ● ●

● ●

pca.3

pca.2

● ● ● ●●





●●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ●● ●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●●●● ●●● ●● ●● ●●● ●● ● ● ● ● ● ● ● ● ● ●● ●● ●● ●● ● ● ● ●● ● ● ● ● ●●●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ●● ●● ● ● ●●● ● ● ● ●●●● ● ●● ● ●● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●●●●● ● ● ●●●● ● ● ● ● ●● ●● ● ●●● ●● ● ● ●●● ●● ●● ● ●● ●● ● ●● ● ●● ● ● ● ●● ● ●●● ● ●● ● ● ● ● ● ● ●● ●● ●●● ● ●● ●● ● ●●● ●● ●● ●● ●● ● ● ●● ●● ●● ●● ●● ●● ● ● ●● ● ●●●● ● ● ●●● ●● ●● ●● ● ● ● ●● ● ● ●● ●● ● ● ●●● ●● ● ● ●●● ● ● ●●●● ● ● ● ●● ● ● ● ●●●● ●●● ●● ● ● ● ● ●● ●● ● ●● ●● ●● ●● ●● ●● ● ● ● ● ● ● ●● ● ● ● ●●●● ● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ●●●● ●● ●●●● ● ● ●●●● ●●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ●●●●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ●●●●●● ●●●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ●●● ● ●● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ●



● ●

● ●

● ●

● ●



pca.1

pca.1

pca.1

● ● ●

pca.2

Figure 4.16:



●●

●● ● ● ●● ● ● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ●●● ● ● ●●● ● ● ●● ● ● ●●●● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●●● ● ● ● ●● ●● ●● ● ●●● ● ●●● ● ● ●● ● ● ●● ●● ● ●● ●● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ●● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ●●● ● ● ● ● ●●●● ● ●●● ● ● ● ● ● ● ● ● ● ● ●●●● ● ● ●● ● ● ●● ● ● ●●● ● ● ● ●●●● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ●●● ● ● ● ● ●●● ●● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ●● ● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ●●●● ● ● ●● ● ● ● ●● ● ● ●●● ●● ● ● ●● ● ●●●●● ● ●● ●● ●● ● ● ● ● ● ●● ● ●●●● ● ● ●● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ●●● ● ●



pca.2

pca.4



pca.4

pca.3

● ●

● ● ● ● ●●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ●●●● ● ● ● ●● ● ● ● ●● ●●● ● ● ● ●● ●● ●● ● ● ● ● ●● ● ●● ●● ●● ●● ● ●● ● ●●● ● ●●● ● ●● ● ● ●● ●●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ● ● ●●● ● ● ● ●●●●●● ● ● ● ● ● ● ● ● ●●● ●● ● ● ● ● ● ●●●●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●● ● ●● ● ●●●● ● ● ●● ● ●●●●● ● ●● ●● ● ●●● ● ● ●●●●● ● ●● ● ●● ● ● ● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ● ●●● ● ● ● ● ● ●●● ● ●● ● ● ●●● ● ● ●





●● ● ● ● ● ●● ● ●●● ● ● ● ● ●●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ●● ● ● ● ● ●●● ● ●● ● ●●● ● ● ● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●●● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ● ●● ●● ●● ● ● ●●● ● ● ● ● ● ● ● ●● ● ● ●● ●●● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ●●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ● ●● ● ●● ●● ● ● ● ● ● ●●● ● ● ● ●● ●● ●●● ● ●● ● ●●● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ●●● ● ●● ● ● ● ●● ●● ●● ●● ● ●●●●● ●● ●● ●● ● ● ● ● ●● ● ● ●● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ●● ●● ●● ● ● ● ● ● ●●● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ●●●●● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ●● ● ●● ● ● ● ● ●●● ● ●●● ● ● ● ● ● ● ● ● ● ●● ● ●● ●● ● ● ●● ● ●● ●● ● ● ● ● ● ● ●

pca.3

Projection onto a four dimensional space spanned by the rst four dominant PCA

factors, based on monthly returns for the most liquid 3,000 US traded stocks, sampled over the period from 2005-05-31 to 2010-04-30. Symbols with dierent colors correspond to dierent cluster groups.

101

−8 −10

Objective function value

−6

CPLEX Local Relaxation

−12

Value obtained by successive truncation

0

1000

2000

3000

4000

5000

6000

7000

Time, sec

Figure 4.17: CPLEX versus

local relaxation method performance K = 15, both cold started.

comparison for the 3,000 asset

case, with cardinality constraint

directly. We have seen that the simple successive truncation algorithm could often lead to a good local minimum with little computational cost. To take advantage of this, we propose warm starting the local relaxation algorithm using the result from successive truncation as the initial centroid assets. Since local relaxation method is strictly feasible and monotone improving, we can set a time limit and take the result at the end of the period. This way, we are guaranteed a solution no worse than the one obtained by successive truncation. This hybrid approach gives the best combination and can be shown to be signicantly more ecient than a warmed started CPLEX solve. Figure 4.18 shows the result, solving a 500 asset CCQP with

K = 15,

by warm starting the local

relaxation method with output from successive truncation, based on 20 iterative truncations. It can be seen that the new algorithm dominates the CPLEX result. Figure 4.19 shows the result, solving a 3,000 asset CCQP with

K = 100,

by warm starting the

local relaxation method with output from successive truncation. Note that at least for this instance, CPLEX has never quite managed to nd a minimum that comes close to the one obtained by the proposed algorithm, even after it has been left running for 35 hours.

Warm starting CPLEX with Local Relaxation strength of the

local relaxation

It is reasonable to conjecture that the

algorithm is in its ability to explore structure of the problem, and

hence in identifying the sub-group of assets within which the optimal solution set lies. To see whether the branch-and-cut method in CPLEX is able to improve on the solution from the new algorithm once the algorithm has identied a sub-asset group, we propose the following test:

102

−0.0085

−0.0075

−0.0065

Successive truncation

−0.0095

Objective function value

CPLEX Local Relaxation

0

500

1000

1500

2000

2500

3000

3500

Time, sec

Figure 4.18:

CPLEX versus

local relaxation

method performance comparison for the 500 assets

K = 15;

Both methods are warm started using the solution of

universe, with cardinality constraint,

successive truncation over 20 iterations. Maximum run-time is set to one hour.

103

−18.2

−18.0

−17.8

CPLEX Local Relaxation

−18.4

Objective function value

−17.6

Value obtained by successive truncation

0

5000

10000

15000

20000

25000

Time, sec

Figure 4.19: CPLEX versus

local relaxation

universe, with cardinality constraint,

method performance comparison for the 3,000 assets

K = 100;

Both methods are warm started using the solution

of arithmetic successive truncation over 20 iterations. Maximum run-time is set to seven hours.

104

−10 −12 −14 −16 −18

Objective function value

Value obtained by Local Relaxation

0

500

1000

1500

2000

2500

3000

3500

Time, sec

Figure 4.20: Apply CPLEX MIQP to the union of the cluster groups identied by

local relaxation

algorithm upon convergence (the beginning of the at line in Figure 4.19). Maximum run-time is set to one hour.

local relaxation

.

solve the CCQP with

.

once the algorithm has reached a stationary group of assets (here stationary refers to the

algorithm;

observation that the cluster groups do not change over certain number of iterations), stop the algorithm and record the union of assets over all identied cluster groups;

.

solve the CCQP to optimality only for this group of assets using CPLEX.

To illustrate the idea, we apply the above procedure to the result for the 3,000-choose-100 case. In Figure 4.19 we see that the algorithm appears to have converged after 15,916 seconds. We take the union of the assets in the 100 clusters from the local relaxation algorithm, then feed these assets into CPLEX. Figure 4.20 shows the output from CPLEX at each iteration. It can be seen that the CPLEX branch-and-cut algorithm takes almost 20 minutes to reach a level close to the result given by the local relaxation method, and never quite reaches it within the run-time limit of one hour.

Mean-variance efficient frontier
The set of optimal portfolios formed based on different risk tolerance parameters is known as the frontier portfolios, in the sense that all portfolios on the frontier are optimal given the specific risk tolerance of an investor. Therefore, a mean-variance optimizing agent will only choose frontier portfolios. Figure 4.21 shows this efficient frontier for our universe of 3,000 US stocks; a sketch of how such a frontier is traced is given after the figure. We observe that, in-sample, the CCQP solutions are strictly dominated by those based on QP, a consequence of the cardinality constraint which cuts off parts of the feasible space. We observe that this domination increases as we decrease the cardinality of the problem, as larger parts of the feasible space become inaccessible to the CCQP.


Figure 4.21: Mean-variance efficient frontiers for a 3,000 asset universe (mean, %, against variance, %). QP is the frontier without the cardinality constraint; CCQP is the frontier in the presence of cardinality (K = 100 and K = 15). To produce the frontier for CCQP, we warm-start the local relaxation algorithm based on the successive truncation solution, and set a maximum run time limit of 252,000 seconds for the case where K = 100 and 3,600 seconds for the case where K = 15.

However, in practice, the robustness of forming a portfolio with a smaller subset of assets with higher expected returns can often lead to better out-of-sample performance.
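For concreteness, the sketch below traces a frontier by sweeping the risk-tolerance parameter q in min_w w'Σw - q μ'w; solve_qp is a stand-in for any QP solver and is not part of the thesis code. For the CCQP frontiers, the same sweep would call the warm-started local relaxation solve with its own run time limit, as in the caption of Figure 4.21.

```python
import numpy as np

def trace_frontier(mu, Sigma, solve_qp, risk_tolerances):
    """Trace a mean-variance frontier by sweeping the risk tolerance q in
        min_w  w' Sigma w - q * mu' w,
    subject to the usual budget (and, for CCQP, cardinality) constraints.
    `solve_qp` is a hypothetical solver returning optimal weights."""
    points = []
    for q in risk_tolerances:
        w = solve_qp(Sigma, q * mu)               # optimal weights at q
        points.append((w @ Sigma @ w, mu @ w))    # (variance, mean) pair
    return np.array(points)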


5 Conclusion

In this thesis, new approaches to model the three key statistical and algorithmic aspects of optimal portfolio construction have been presented. For the first aspect, we have proposed a complete model for returns and realized measures of volatility, $x_t$, where the latter is tied directly to the conditional volatility, $h_t$. The motivation is to include high frequency data in a rigorous way in a classic GARCH framework that has so far been used predominantly with less frequently sampled data, hence without having to be concerned with market microstructure noise.
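To make the structure concrete, the following is a minimal simulation sketch of the log-linear Realized GARCH(1,1) with leverage function $\tau(z) = \tau_1 z + \tau_2(z^2 - 1)$; the parameter values are illustrative placeholders of a plausible order of magnitude, not estimates from the thesis.

```python
import numpy as np

def simulate_realized_garch(T, omega=0.06, beta=0.55, gamma=0.41,
                            xi=-0.18, phi=1.0, tau1=-0.07, tau2=0.07,
                            sigma_u=0.38, seed=0):
    """Simulate the log-linear Realized GARCH(1,1):
        log h_t = omega + beta log h_{t-1} + gamma log x_{t-1}
        r_t     = sqrt(h_t) z_t,                     z_t ~ N(0,1)
        log x_t = xi + phi log h_t + tau1 z_t + tau2 (z_t^2 - 1) + u_t,
    with u_t ~ N(0, sigma_u^2).  Parameter values are illustrative only."""
    rng = np.random.default_rng(seed)
    logh, r, logx = np.empty(T), np.empty(T), np.empty(T)
    # Start at the unconditional mean of log h_t (needs beta + phi*gamma < 1).
    logh[0] = (omega + gamma * xi) / (1.0 - beta - gamma * phi)
    for t in range(T):
        if t > 0:
            logh[t] = omega + beta * logh[t-1] + gamma * logx[t-1]
        z = rng.standard_normal()
        r[t] = np.exp(0.5 * logh[t]) * z
        logx[t] = (xi + phi * logh[t] + tau1 * z + tau2 * (z * z - 1.0)
                   + sigma_u * rng.standard_normal())
    return r, np.exp(logx), np.exp(logh)
```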

We have demonstrated that the model is straight-

forward to estimate and oers a substantial improvement in the empirical t, relative to standard GARCH models. The model is informative about realized measurement, such as its accuracy. The proposed framework induces an interesting reduced-form model for

{rt , ht },

that is similar to that

of a stochastic volatility model with leverage eect. Our empirical analysis can be extended in a number of ways. For instance, including a jump robust realized measure of volatility would be an interesting extension, because Bollerslev, Kretschmer, Pigorsch, and Tauchen (2009) found that the leverage eect primarily acts through the continuous volatility component. Another possible extension is to introduce a bivariate model of open-to-close and close-to-open returns, as an alternative to modeling close-to-close returns, see Andersen et al. (2011). The

m

naturally extended to a multi-factor structure. Say ables. The

Realized GARCH

Realized GARCH

realized measures and

k

latent volatility vari-

model discussed in this thesis corresponds to the case

the MEM framework corresponds to the case

m = k.

conduct inference about the number of latent factors,

framework is

k = 1,

whereas

Such a hybrid framework would enable us to

k.

We could, in particular, test the one-factor

structure, conjectured to be sucient for the realized measure used in this paper, against the twofactor structure implied by MEM. For the extension of

Realized GARCH

to multivariate settings, we

could explore more general frameworks by loosening the assumption that volatility and correlation operate at dierent time scales. The increasing role that high frequency nance plays in the market has also motivated the part of the thesis on expected return forecast. The proposed model is based on self-excited point process augmented by trade size and limit order book information. The motivation for incorporate these two potential

marks

is based on underlying economic intuition.

First, the trade size mark

acts as a mechanism to dierentiate large orders that is more likely to induce price movement than small noise trading of insignicant impact.

Second, information from the limit order book

adds an extra dimension to gauge supply-demand imbalance of the market. Game theoretic utility balancing argument aside, a more sell order loaded book will signal that more investors are looking for an opportunity to liquidate or sell short a position, and

vice versa

for a buy order loaded book.
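As an illustration of how such marks enter the intensity, here is a minimal univariate sketch of a marked, exponentially decaying excitation term; the parameter values are hypothetical, and the thesis's bivariate bid/ask cross-excitation and order book terms are deliberately omitted for brevity.

```python
import numpy as np

def marked_intensity(t, event_times, sizes, mu, alpha, beta, g):
    """Conditional intensity of a marked self-excited process,
        lambda(t) = mu + sum_{t_i < t} alpha g(size_i) exp(-beta (t - t_i)),
    where g(.) maps the trade-size mark into an excitation weight.
    A univariate sketch only, not the thesis's full bivariate model."""
    past = event_times < t
    dt = t - event_times[past]
    return mu + alpha * np.sum(g(sizes[past]) * np.exp(-beta * dt))

# Example with hypothetical values; sizes are normalized by their median,
# reflecting the clustering of sliced orders around the median size.
times = np.array([0.1, 0.4, 0.9, 1.7])
sizes = np.array([200.0, 100.0, 100.0, 500.0])
lam = marked_intensity(2.0, times, sizes, mu=0.2, alpha=0.8, beta=1.5,
                       g=lambda s: s / np.median(sizes))
```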

Nowadays, sophisticated execution algorithms contribute over 60% of total trading volume on the NYSE, with similar dominance in other markets. These algorithms aim to minimize the market impact of large orders by slicing them into smaller units and then submitting them to different layers of the limit order book at optimal times. The proposed frameworks in this thesis are tested using LSE order book data, and the results suggest that the inclusion of limit order book information can lead to a more robust estimation when compared with the basic bivariate model, as measured by the goodness-of-fit of the model implied distribution to the empirical distribution of the inter-arrival times of the bid and ask side market orders. As for trade size, the results suggest that it could have an adverse effect on model performance. This could be attributable to the fact that most orders, before they reach the exchange, have been sliced by execution algorithms so that the size is close to the median daily average, so as to minimize the signaling effect. Hence a large portion of the fluctuation around the median volume is noise, which lowers the overall signal to noise ratio. The final key aspect of the thesis is portfolio optimization with cardinality constraints.

The NP-hard nature of cardinality constrained mean-variance portfolio optimization problems has led to a variety of different algorithms with varying degrees of success in reaching optimality, given limited computational resources and the presence of strict time constraints in practice. The computational effort needed to be assured of solving problems of the size considered here is truly astronomical and well beyond the performance of any machines envisaged. However, solving a problem exactly is of questionable value over that of obtaining a "near" solution. The proposed local relaxation algorithm exploits the inherent structure of the objective function. It solves a sequence of small, local quadratic programs by first projecting asset returns onto a reduced metric space, followed by clustering in this space to identify sub-groups of assets that best accentuate a suitable measure of similarity amongst different assets. The algorithm can either be cold started using the centroids of initial clusters or be warm started based on the outputs of a previous result. Empirical results, using baskets of up to 3,000 stocks and with different cardinality constraints, indicate that the proposed algorithm is able to achieve significant performance gains over a sophisticated branch-and-cut method.


A Appendix of Proofs

Proof. [Proposition 19] The first result follows by substituting $\log r_t^2 = \log h_t + \kappa + v_t$ into the GARCH equation and rearranging. Next, we substitute $\log h_t = (\log x_t - \xi - w_t)/\varphi$ and $\log r_t^2 = (\log x_t - \xi - w_t)/\varphi + \kappa + v_t$, multiply by $\varphi$, and find
$$\log x_t - \xi - w_t = \varphi\omega + \sum_{i=1}^{p\vee q}(\beta_i + \alpha_i)(\log x_{t-i} - \xi - w_{t-i}) + \varphi\sum_{j=1}^{q}\gamma_j\log x_{t-j} + \varphi\sum_{j=1}^{q}\alpha_j(\kappa + v_{t-j}),$$
so with $\pi_i = \alpha_i + \beta_i + \gamma_i\varphi$ we have
$$\log x_t = \xi(1 - \beta_\bullet - \alpha_\bullet) + \varphi\kappa\alpha_\bullet + \varphi\omega + \sum_{i=1}^{p\vee q}\pi_i\log x_{t-i} + w_t - \sum_{i=1}^{p}(\alpha_i + \beta_i)w_{t-i} + \varphi\sum_{j=1}^{q}\alpha_j v_{t-j}.$$
When $\varphi = 0$, the measurement equation shows that $\log x_t$ is an iid process.
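The substitution can be verified symbolically. The following sympy sketch checks the $p = q = 1$ case of the ARMA representation derived above; the variable names are ours, not from the thesis.

```python
import sympy as sp

# Pieces of the log-linear Realized GARCH(1,1) used in Proposition 19;
# kappa = E log z_t^2 and v_t = log z_t^2 - kappa.
lx, lx1, w, w1, v1 = sp.symbols('logx_t logx_t1 w_t w_t1 v_t1')
om, al, be, ga, xi, phi, kap = sp.symbols('omega alpha beta gamma xi varphi kappa')

lh  = (lx  - xi - w ) / phi            # measurement equation inverted at t
lh1 = (lx1 - xi - w1) / phi            # ... and at t-1
lr2_1 = lh1 + kap + v1                 # log r_{t-1}^2 = log h_{t-1} + kappa + v_{t-1}

# GARCH equation: log h_t = omega + beta log h_{t-1}
#                          + alpha log r_{t-1}^2 + gamma log x_{t-1}
garch = sp.Eq(lh, om + be * lh1 + al * lr2_1 + ga * lx1)

# Solve for log x_t and compare with the stated ARMA(1,1) form.
sol = sp.solve(garch, lx)[0]
pi = al + be + ga * phi
stated = (xi * (1 - be - al) + phi * kap * al + phi * om
          + pi * lx1 + w - (al + be) * w1 + phi * al * v1)
print(sp.simplify(sol - stated))       # -> 0
```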

Proof. [Lemma 22] First note that
$$\frac{\partial g_t'}{\partial\lambda} = \left(0, \dot h_{t-1}, \ldots, \dot h_{t-p}, 0_{p+q+1\times q}\right) =: \dot H_{t-1}.$$
Thus from the GARCH equation, $\tilde h_t = \lambda' g_t$, we have that
$$\dot h_t = \frac{\partial g_t'}{\partial\lambda}\lambda + g_t = \dot H_{t-1}\lambda + g_t = \sum_{i=1}^{p}\beta_i\dot h_{t-i} + g_t.$$
Similarly, the second order derivative is given by
$$\ddot h_t = \frac{\partial\left(g_t + \dot H_{t-1}\lambda\right)}{\partial\lambda'} = \frac{\partial g_t}{\partial\lambda'} + \dot H_{t-1} + \frac{\partial\dot H_{t-1}\lambda}{\partial\lambda'} = \dot H_{t-1}' + \dot H_{t-1} + \sum_{i=1}^{p}\beta_i\frac{\partial\dot h_{t-i}}{\partial\lambda'} = \sum_{i=1}^{p}\beta_i\ddot h_{t-i} + \dot H_{t-1}' + \dot H_{t-1}.$$
For the starting values we observe that, regardless of $(h_0, \ldots, h_{p-1})$ being treated as fixed or as a vector of unknown parameters, we have $\dot h_s = \ddot h_s = 0$. Given the structure of $\ddot h_t$ this implies $\ddot h_1 = 0$. When $p = q = 1$ it follows immediately that $\dot h_t = \sum_{j=0}^{t-1}\beta^j g_{t-j}$. Similarly we have
$$\ddot h_t = \sum_{j=0}^{t-1}\beta^j\left(\dot H_{t-1-j} + \dot H_{t-1-j}'\right) = \sum_{j=0}^{t-2}\beta^j\left(\dot H_{t-1-j} + \dot H_{t-1-j}'\right),$$
where $\dot H_t = (0_{3\times 1}, \dot h_t, 0_{3\times 1})$ and where the second equality follows from $\dot H_0 = 0$. The result now follows by
$$\sum_{i=0}^{t-2}\beta^i\dot h_{t-1-i} = \sum_{i=0}^{t-2}\beta^i\sum_{j=0}^{t-1-i-1}\beta^j g_{t-1-i-j} = \sum_{i=0}^{t-2}\beta^i\sum_{k-i-1=0}^{t-i-2}\beta^{k-i-1}g_{t-k} = \sum_{i=0}^{t-2}\sum_{k=i+1}^{t-1}\beta^{k-1}g_{t-k} = \sum_{k=1}^{t-1}k\beta^{k-1}g_{t-k}.$$
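The index manipulation in the last display can be checked numerically; the small sketch below is our own construction, for verification only, with arbitrary test values.

```python
import numpy as np

def check_lemma22(t=8, beta=0.7, seed=1):
    """Numerical check of the final step of Lemma 22 (p = q = 1 case):
        sum_{i=0}^{t-2} beta^i hdot_{t-1-i} = sum_{k=1}^{t-1} k beta^{k-1} g_{t-k},
    with hdot_s = sum_{j=0}^{s-1} beta^j g_{s-j} and arbitrary g_1..g_t."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(t + 1)            # g[1..t]; g[0] unused

    def hdot(s):
        return sum(beta**j * g[s - j] for j in range(s))

    lhs = sum(beta**i * hdot(t - 1 - i) for i in range(t - 1))
    rhs = sum(k * beta**(k - 1) * g[t - k] for k in range(1, t))
    return np.isclose(lhs, rhs)

print(check_lemma22())                        # -> True
```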

Proof. [Proposition 23] Recall that $u_t = \tilde x_t - \psi' m_t$ and $\tilde h_t = g_t'\lambda$. So derivatives with respect to $\tilde h_t$ are given by
$$\frac{\partial z_t}{\partial\tilde h_t} = \frac{\partial r_t\exp(-\frac12\tilde h_t)}{\partial\tilde h_t} = -\frac12 z_t, \qquad\text{so that}\qquad \frac{\partial z_t^2}{\partial\tilde h_t} = -z_t^2,$$
$$\dot u_t = \frac{\partial u_t}{\partial\tilde h_t} = -\varphi + \frac12 z_t\tau'\dot a_t, \qquad \ddot u_t = \frac{\partial\dot u_t}{\partial\tilde h_t} = \frac{\partial\left(-\varphi + \frac12 z_t\dot a(z_t)'\tau\right)}{\partial\tilde h_t} = -\frac14\tau'\left(z_t\dot a_t + z_t^2\ddot a_t\right).$$
So with $\ell_t = -\frac12\left\{\tilde h_t + z_t^2 + \log(\sigma_u^2) + u_t^2/\sigma_u^2\right\}$ we have
$$\frac{\partial\ell_t}{\partial u_t} = -\frac{u_t}{\sigma_u^2} \qquad\text{and}\qquad \frac{\partial\ell_t}{\partial\tilde h_t} = -\frac12\left\{1 + \frac{\partial z_t^2}{\partial\tilde h_t} + \frac{\partial u_t^2/\partial\tilde h_t}{\sigma_u^2}\right\} = -\frac12\left(1 - z_t^2 + \frac{2u_t\dot u_t}{\sigma_u^2}\right).$$
Derivatives with respect to $\lambda$ are
$$\frac{\partial z_t}{\partial\lambda} = \frac{\partial z_t}{\partial\tilde h_t}\frac{\partial\tilde h_t}{\partial\lambda} = -\frac12 z_t\dot h_t, \qquad \frac{\partial u_t}{\partial\lambda} = \frac{\partial u_t}{\partial\tilde h_t}\frac{\partial\tilde h_t}{\partial\lambda} = \dot u_t\dot h_t, \qquad \frac{\partial\dot u_t}{\partial\lambda'} = \ddot u_t\dot h_t',$$
and
$$\frac{\partial\ell_t}{\partial\lambda} = \frac{\partial\ell_t}{\partial\tilde h_t}\dot h_t = -\frac12\left(1 - z_t^2 + \frac{2u_t\dot u_t}{\sigma_u^2}\right)\dot h_t.$$
Derivatives with respect to $\psi$ are
$$\frac{\partial u_t}{\partial\xi} = -1, \qquad \frac{\partial u_t}{\partial\varphi} = -\tilde h_t, \qquad \frac{\partial u_t}{\partial\tau} = -a_t,$$
$$\frac{\partial\dot u_t}{\partial\xi} = 0, \qquad \frac{\partial\dot u_t}{\partial\varphi} = -1, \qquad \frac{\partial\dot u_t}{\partial\tau} = \frac12 z_t\dot a_t,$$
and
$$\frac{\partial\ell_t}{\partial\xi} = \frac{\partial\ell_t}{\partial u_t}\frac{\partial u_t}{\partial\xi} = \frac{u_t}{\sigma_u^2}, \qquad \frac{\partial\ell_t}{\partial\varphi} = \frac{\partial\ell_t}{\partial u_t}\frac{\partial u_t}{\partial\varphi} = \frac{u_t}{\sigma_u^2}\tilde h_t, \qquad \frac{\partial\ell_t}{\partial\tau} = \frac{\partial\ell_t}{\partial u_t}\frac{\partial u_t}{\partial\tau} = \frac{u_t}{\sigma_u^2}a_t.$$
Similarly, $\frac{\partial\ell_t}{\partial\sigma_u^2} = -\frac12\left(\sigma_u^{-2} - u_t^2\sigma_u^{-4}\right)$.

Now we turn to the second order derivatives:
$$-2\frac{\partial^2\ell_t}{\partial\lambda\partial\lambda'} = \dot h_t\left\{-\frac{\partial z_t^2}{\partial\lambda'} + \frac{2}{\sigma_u^2}\left(\dot u_t\frac{\partial u_t}{\partial\lambda'} + u_t\frac{\partial\dot u_t}{\partial\lambda'}\right)\right\} + \left(1 - z_t^2 + \frac{2u_t}{\sigma_u^2}\dot u_t\right)\frac{\partial\dot h_t}{\partial\lambda'} = \dot h_t\left\{z_t^2 + \frac{2}{\sigma_u^2}\left(\dot u_t^2 + u_t\ddot u_t\right)\right\}\dot h_t' + \left(1 - z_t^2 + \frac{2u_t}{\sigma_u^2}\dot u_t\right)\ddot h_t.$$
Similarly, since $\partial z_t/\partial\psi = 0$, we have
$$-2\frac{\partial^2\ell_t}{\partial\lambda\partial\xi} = \frac{\partial\left(1 - z_t^2 + \frac{2u_t}{\sigma_u^2}\dot u_t\right)\dot h_t}{\partial\xi} = 2\dot h_t\left(-\frac{\dot u_t}{\sigma_u^2} + 0\right),$$
$$-2\frac{\partial^2\ell_t}{\partial\lambda\partial\varphi} = 2\dot h_t\left(\frac{\partial u_t}{\partial\varphi}\frac{\dot u_t}{\sigma_u^2} + \frac{u_t}{\sigma_u^2}\frac{\partial\dot u_t}{\partial\varphi}\right) = 2\dot h_t\left(-\tilde h_t\frac{\dot u_t}{\sigma_u^2} - \frac{u_t}{\sigma_u^2}\right),$$
$$-2\frac{\partial^2\ell_t}{\partial\lambda\partial\tau'} = 2\dot h_t\left(\frac{\partial u_t}{\partial\tau'}\frac{\dot u_t}{\sigma_u^2} + \frac{u_t}{\sigma_u^2}\frac{\partial\dot u_t}{\partial\tau'}\right) = 2\dot h_t\left(-a_t'\frac{\dot u_t}{\sigma_u^2} + \frac{u_t}{\sigma_u^2}\frac12 z_t\dot a_t'\right),$$
so that
$$\frac{\partial^2\ell_t}{\partial\lambda\partial\psi'} = \frac{\dot u_t}{\sigma_u^2}\dot h_t m_t' + \frac{u_t}{\sigma_u^2}\dot h_t b_t', \qquad\text{with}\qquad b_t = \left(0, 1, -\frac12 z_t\dot a_t'\right)'.$$
Furthermore,
$$\frac{\partial^2\ell_t}{\partial\lambda\partial\sigma_u^2} = -\frac12\frac{\partial\left(1 - z_t^2 + \frac{2u_t}{\sigma_u^2}\dot u_t\right)\dot h_t}{\partial\sigma_u^2} = \frac{u_t\dot u_t}{\sigma_u^4}\dot h_t,$$
and
$$\frac{\partial^2\ell_t}{\partial\psi\partial\psi'} = -\frac{1}{\sigma_u^2}m_t m_t', \qquad \frac{\partial^2\ell_t}{\partial\psi\partial\sigma_u^2} = -\frac{u_t}{\sigma_u^4}m_t, \qquad \frac{\partial^2\ell_t}{\partial\sigma_u^2\partial\sigma_u^2} = -\frac12\left(-\frac{1}{\sigma_u^4} + \frac{2u_t^2}{\sigma_u^6}\right) = \frac12\frac{\sigma_u^2 - 2u_t^2}{\sigma_u^6}.$$
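The $\psi$-derivatives above can be verified with finite differences. The following sketch is our own construction with arbitrary test values; it checks $\partial\ell_t/\partial\psi = (u_t/\sigma_u^2)m_t$ holding $\tilde h_t$ fixed, with $\tau(z) = \tau_1 z + \tau_2(z^2-1)$.

```python
import numpy as np

def loglik_t(r, x, htilde, xi, phi, tau1, tau2, sigma2u):
    """One-observation quasi log-likelihood ell_t from Proposition 23,
    with leverage function tau(z) = tau1*z + tau2*(z^2 - 1)."""
    z = r * np.exp(-0.5 * htilde)
    u = np.log(x) - (xi + phi * htilde + tau1 * z + tau2 * (z**2 - 1))
    return -0.5 * (htilde + z**2 + np.log(sigma2u) + u**2 / sigma2u)

def check_score_psi(eps=1e-7):
    """Finite-difference check of the analytic score with respect to
    psi = (xi, phi, tau')', i.e. d ell_t / d psi = (u_t/sigma_u^2) m_t."""
    r, x, htilde = 0.8, 1.3, 0.2                 # arbitrary test point
    xi, phi, tau1, tau2, s2u = -0.1, 1.0, -0.05, 0.08, 0.15
    z = r * np.exp(-0.5 * htilde)
    u = np.log(x) - (xi + phi * htilde + tau1 * z + tau2 * (z**2 - 1))
    m = np.array([1.0, htilde, z, z**2 - 1])     # m_t = (1, htilde, a_t')'
    analytic = (u / s2u) * m
    theta = [xi, phi, tau1, tau2]
    numeric = []
    for j in range(4):
        tp = theta.copy(); tp[j] += eps
        tm = theta.copy(); tm[j] -= eps
        numeric.append((loglik_t(r, x, htilde, *tp, s2u)
                        - loglik_t(r, x, htilde, *tm, s2u)) / (2 * eps))
    return np.allclose(analytic, numeric)

print(check_score_psi())                         # -> True
```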

Proof. [Proposition 26] We note that
$$h_t = \exp\left(\sum_{i=0}^{\infty}\pi^i(\mu + \gamma w_{t-1-i})\right), \qquad h_t^2 = \exp\left(2\sum_{i=0}^{\infty}\pi^i(\mu + \gamma w_{t-1-i})\right),$$
so that
$$E h_t = e^{\frac{\mu}{1-\pi}}\prod_{i=0}^{\infty}E\left[\exp\left(\gamma\pi^i\tau(z_{t-1-i})\right)\right]E\left[\exp\left(\gamma\pi^i u_{t-1-i}\right)\right], \qquad E h_t^2 = e^{\frac{2\mu}{1-\pi}}\prod_{i=0}^{\infty}E\left[\exp\left(2\gamma\pi^i\tau(z_{t-1-i})\right)\right]E\left[\exp\left(2\gamma\pi^i u_{t-1-i}\right)\right],$$
and using results such as
$$E\prod_{i=0}^{\infty}\exp\left(\pi^i\gamma u_{t-i}\right) = \prod_{i=0}^{\infty}E\exp\left(\pi^i\gamma u_{t-i}\right) = \prod_{i=0}^{\infty}e^{\frac{\pi^{2i}\gamma^2\sigma_u^2}{2}} = e^{\sum_{i=0}^{\infty}\frac{\pi^{2i}\gamma^2\sigma_u^2}{2}} = e^{\frac{\gamma^2\sigma_u^2/2}{1-\pi^2}},$$
we find that
$$\frac{Eh_t^2}{(Eh_t)^2} = \frac{e^{\frac{2\mu}{1-\pi}}\prod_{i=0}^{\infty}E\exp\left(2\gamma\pi^i w_{t-1-i}\right)}{e^{\frac{2\mu}{1-\pi}}\prod_{i=0}^{\infty}\left\{E\exp\left(\gamma\pi^i w_{t-1-i}\right)\right\}^2} = \prod_{i=0}^{\infty}\frac{1-2\pi^i\gamma\tau_2}{\sqrt{1-4\pi^i\gamma\tau_2}}\exp\left(\sum_{i=0}^{\infty}\left[\frac{2\pi^{2i}\gamma^2\tau_1^2}{1-4\pi^i\gamma\tau_2} - \frac{\pi^{2i}\gamma^2\tau_1^2}{1-2\pi^i\gamma\tau_2}\right]\right)\exp\left(\frac{\gamma^2\sigma_u^2}{1-\pi^2}\right)$$
$$= \prod_{i=0}^{\infty}\frac{1-2\pi^i\gamma\tau_2}{\sqrt{1-4\pi^i\gamma\tau_2}}\exp\left(\sum_{i=0}^{\infty}\frac{\pi^{2i}\gamma^2\tau_1^2}{1-6\pi^i\gamma\tau_2+8\pi^{2i}\gamma^2\tau_2^2}\right)\exp\left(\frac{\gamma^2\sigma_u^2}{1-\pi^2}\right),$$
where the last equality uses
$$\frac{2\pi^{2i}\gamma^2\tau_1^2}{1-4\pi^i\gamma\tau_2} - \frac{\pi^{2i}\gamma^2\tau_1^2}{1-2\pi^i\gamma\tau_2} = \frac{2\pi^{2i}\gamma^2\tau_1^2(1-2\pi^i\gamma\tau_2) - \pi^{2i}\gamma^2\tau_1^2(1-4\pi^i\gamma\tau_2)}{(1-4\pi^i\gamma\tau_2)(1-2\pi^i\gamma\tau_2)} = \frac{\pi^{2i}\gamma^2\tau_1^2}{(1-4\pi^i\gamma\tau_2)(1-2\pi^i\gamma\tau_2)} = \frac{\pi^{2i}\gamma^2\tau_1^2}{1-6\pi^i\gamma\tau_2+8\pi^{2i}\gamma^2\tau_2^2}.$$
Recall that
$$\frac{E(r_t^4)}{E(r_t^2)^2} = 3\prod_{i=0}^{\infty}\frac{1-2\pi^i\gamma\tau_2}{\sqrt{1-4\pi^i\gamma\tau_2}}\exp\left\{\sum_{i=0}^{\infty}\frac{\pi^{2i}\gamma^2\tau_1^2}{1-6\pi^i\gamma\tau_2+8\pi^{2i}\gamma^2\tau_2^2}\right\}\exp\left(\frac{\gamma^2\sigma_u^2}{1-\pi^2}\right).$$
For the first term on the right hand side, we have
$$\log\prod_{i=0}^{\infty}\frac{1-2\pi^i\gamma\tau_2}{\sqrt{1-4\pi^i\gamma\tau_2}} \simeq \int_0^{\infty}\log\frac{1-2\pi^x\gamma\tau_2}{\sqrt{1-4\pi^x\gamma\tau_2}}\,dx = \frac{1}{\log\pi}\sum_{k=1}^{\infty}\left\{\frac{(2\gamma\tau_2)^k}{k^2} - \frac12\frac{(4\gamma\tau_2)^k}{k^2}\right\} = \frac{1}{\log\pi}\sum_{k=1}^{\infty}\frac{(2\gamma\tau_2)^k}{k^2}\left(1 - 2^{k-1}\right)$$
$$= \frac{\gamma^2\tau_2^2}{-\log\pi}\left(1 + \tfrac{8}{3}\gamma\tau_2 + 7(\gamma\tau_2)^2 + \tfrac{96}{5}(\gamma\tau_2)^3 + \tfrac{496}{9}(\gamma\tau_2)^4 + \cdots\right).$$
The second term can be bounded by
$$\frac{\gamma^2\tau_1^2}{1-\pi^2} \le \sum_{i=0}^{\infty}\frac{\pi^{2i}\gamma^2\tau_1^2}{1-6\pi^i\gamma\tau_2+8\pi^{2i}\gamma^2\tau_2^2} \le \frac{\gamma^2\tau_1^2}{1-6\gamma\tau_2}\frac{1}{1-\pi^2}.$$
So the approximation error is small when $\gamma\tau_2$ is small.
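Both the partial-fraction identity and the series coefficients above can be confirmed symbolically; the short sympy check below is our own, for verification only.

```python
import sympy as sp

# x stands for pi^i gamma tau2, A for pi^{2i} gamma^2 tau1^2.
x, A = sp.symbols('x A', positive=True)

# Identity used in "where the last equality uses":
lhs = 2*A/(1 - 4*x) - A/(1 - 2*x)
rhs = A/(1 - 6*x + 8*x**2)
print(sp.simplify(lhs - rhs))          # -> 0

# Series coefficients of sum_k (2y)^k/k^2 (1 - 2^{k-1}), y = gamma*tau2:
y = sp.symbols('y')
s = sum((2*y)**k / k**2 * (1 - 2**(k - 1)) for k in range(1, 7))
print(sp.expand(s))
# -> -y**2 - 8*y**3/3 - 7*y**4 - 96*y**5/5 - 496*y**6/9
```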

Proof. [Proposition 30] Starting from the assumption that the following SDE is well defined,
$$d\lambda_t = \beta\left(\rho(t) - \lambda_t\right)dt + \alpha\,dN_t,$$
the solution for $\lambda_t$ takes the form
$$\lambda_t = c(t) + \int_0^t \alpha e^{-\beta(t-u)}\,dN_u,$$
where
$$c(t) = c(0)e^{-\beta t} + \beta\int_0^t e^{-\beta(t-u)}\rho(u)\,du.$$
Verify by Ito's lemma on $e^{\beta t}\lambda_t$:
$$e^{\beta t}\lambda_t = c(0) + \beta\int_0^t e^{\beta u}\rho(u)\,du + \int_0^t \alpha e^{\beta u}\,dN_u,$$
so that
$$d\left(e^{\beta t}\lambda_t\right) = \beta e^{\beta t}\lambda_t\,dt + e^{\beta t}\,d\lambda_t = \beta e^{\beta t}\rho(t)\,dt + \alpha e^{\beta t}\,dN_t,$$
which gives
$$d\lambda_t = \beta\left(\rho(t) - \lambda_t\right)dt + \alpha\,dN_t.$$
Taking the limit we obtain
$$\lim_{t\to\infty}c(t) = \lim_{t\to\infty}\left(c(0)e^{-\beta t} + \beta\int_0^t e^{-\beta(t-u)}\rho(u)\,du\right) = \lim_{t\to\infty}\beta\frac{\int_0^t e^{\beta u}\rho(u)\,du}{e^{\beta t}} = \lim_{t\to\infty}\rho(t).$$
Treating $\rho(t)$ as a constant, $\rho(t) \equiv \mu$, we then have
$$c(t) = c(0)e^{-\beta t} + \mu e^{-\beta t}\left(e^{\beta t} - 1\right) = \mu + e^{-\beta t}\left(c(0) - \mu\right).$$
Note that if we set $c(0) \equiv \mu$ then the process is simply
$$\lambda_t = \mu + \alpha\int_0^t e^{-\beta(t-u)}\,dN_u.$$
Therefore we can think of $\mu$ as the long run base intensity, i.e. the intensity if there has been no past arrival.
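The closed form with $\rho(t) \equiv \mu$ admits an O(1)-per-event recursive evaluation, which is what makes the exponential-kernel log-likelihood recursive in practice; the sketch below is a minimal illustration with hypothetical parameter values.

```python
import numpy as np

def intensity_at_events(event_times, mu, alpha, beta):
    """Evaluate lambda(t_k-) = mu + alpha sum_{t_i < t_k} exp(-beta (t_k - t_i)),
    the closed form above with rho(t) = c(0) = mu, using the recursion
        R_k = exp(-beta (t_k - t_{k-1})) (1 + R_{k-1}),  R_1 = 0,
    so that lambda(t_k-) = mu + alpha R_k."""
    lam = np.empty(len(event_times))
    R = 0.0
    for k, t in enumerate(event_times):
        if k > 0:
            R = np.exp(-beta * (t - event_times[k - 1])) * (1.0 + R)
        lam[k] = mu + alpha * R
    return lam

# Example (stationarity requires alpha/beta < 1 for these values):
print(intensity_at_events(np.array([0.1, 0.4, 0.5, 1.7]),
                          mu=0.2, alpha=0.8, beta=1.5))
```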


Proof. [Proposition 32] Since the parameters are bounded, we have
$$L_T^{(1)}(\mu_1, \beta_{11}, \beta_{12}, \alpha_{11}, \alpha_{12}) = -\left(\int_0^T \mu_1\,dt + \int_{t_i}^T \alpha_{11}e^{-\beta_{11}(t-t_i)}\,dt + \cdots\right)$$

ti

Suggest Documents