This signature page was generated electronically upon submission of this ... indicate that the self-excited point proces
STATISTICAL AND ALGORITHM ASPECTS OF OPTIMAL PORTFOLIOS
DISSERTATION SUBMITTED TO THE INSTITUTE OF COMPUTATIONAL AND MATHEMATICAL ENGINEERING OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
Howard Howan Stephen Shek March 2011
2011 by Howard Howan Stephen Shek. All Rights Reserved. Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons AttributionNoncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/zv848cg8605
ii
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Walter Murray, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Tze Lai, Co-Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. Peter Hansen
Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
iii
Abstract We address three key aspects of optimal portfolio construction: expected return, variancecovariance modeling and optimization in presence of cardinality constraints. On expected return modeling, we extend the self-excited point process framework to model conditional arrival intensities of bid and ask side market orders of listed stocks. The cross-excitation of market orders is modeled explicitly such that the ask side market order size and bid side probability weighted order book cumulative volume can aect the ask side order intensity, and vice versa. Dierent variations of the framework are estimated by using method of maximum likelihood estimation, based on a recursive application of the log-likelihood functions derived in this thesis. Results indicate that the self-excited point process framework is able to capture a signicant amount of the underlying trading dynamics of market orders, both in-sample and out-of-sample. A new framework is introduced, Realized GARCH, for the joint modeling of returns and realized measures of volatility. A key feature is a measurement equation that relates the realized measure to the conditional variance of returns. The measurement equation facilitates a simple modeling of the dependence between returns and future volatility. Realized GARCH models with a linear or log-linear specication have many attractive features. They are parsimonious, simple to estimate, and imply an ARMA structure for the conditional variance and the realized measure. An empirical application with DJIA stocks and an exchange traded index fund shows that a simple Realized GARCH structure leads to substantial improvements in the empirical t over standard GARCH models. Finally we describe a novel algorithm to obtain the solution of the optimal portfolio problem with NP -hard cardinality constraints. The algorithm is based on a local relaxation that exploits the inherent structure of the objective function. It solves a sequence of small, local, quadratic-programs by rst projecting asset returns onto a reduced metric space, followed by clustering in this space to identify sub-groups of assets that best accentuate a suitable measure of similarity amongst dierent assets. The algorithm can either be cold started using the centroids of initial clusters or be warm started based on the output of a previous result. Empirical result, using baskets of up to 3,000 stocks and with dierent cardinality constraints, indicates that the algorithm is able to achieve signicant performance gain over a sophisticated branchand-cut method. One key application of this local relaxation algorithm is in dealing with large scale cardinality constrained portfolio optimization under tight time constraint, such as for the purpose of index tracking or index arbitrage at high frequency.
iv
Contents
1 Introduction
1
2 Covariance Matrix Estimation and Prediction
6
2.1
Literature Review 2.1.1 2.1.2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
High Frequency Volatility Estimators in Absence of Noise . . . . . . . . . . .
6
High Frequency Volatility Estimators in Presence of Market Microstructure Noise
2.2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.3
Volatility Prediction
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.1.4
Covariance Prediction
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Complete Framework: Realized GARCH
7 18 23
. . . . . . . . . . . . . . . . . . . . . . . .
26
2.2.1
Framework
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
26
2.2.2
Empirical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
37
3 Asset Return Estimation and Prediction
45
3.1
Outline of Some Commonly Used Temporal Models . . . . . . . . . . . . . . . . . . .
45
3.2
Self Excited Counting Process and its Extensions . . . . . . . . . . . . . . . . . . . .
49
3.2.1
Univariate Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
50
3.2.2
Bivariate Case
52
3.2.3
Taking volume and orderbook imbalance into account
. . . . . . . . . . . . .
55
3.2.4
Empirical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
60
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 Solving Cardinally Constrained Mean-Variance Optimal Portfolio 4.1
4.2
4.3
4.4
Literature Review
69
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
69
4.1.1
Branch-and-bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
72
4.1.2
Heuristic methods
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
72
4.1.3
Linearization of the objective function . . . . . . . . . . . . . . . . . . . . . .
73
. . . . . . . . . . . . . . . . . . . . . . . . . .
73
4.2.1
Sequential Primal Barrier QP Method Framework
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
73
4.2.2
Empirical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
77
Solving CCQP with a Global Smoothing Algorithm . . . . . . . . . . . . . . . . . . .
78
4.3.1
Framework
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
81
4.3.2
Empirical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
82
Solving CCQP with Local Relaxation Approach . . . . . . . . . . . . . . . . . . . . .
86
4.4.1
Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
86
4.4.2
Algorithms
90
4.4.3
Computational Result
4.4.4
Algorithm Benchmarking
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
94 95
5 Conclusion
108
A Appendix of Proofs
110
v
List of Tables 1
Key model features at a glance: The realized measures,
Rt , RVt ,
and
xt
denote the
intraday range, the realized variance, and the realized kernel, respectively.
In the
Realized GARCH model, the dependence between returns and innovations to the volatility (leverage eect) is modeled with so that
Eτ (zt ) = 0,
when
zt ∼ (0, 1).
†
τ (zt ),
such as
τ (z) = τ1 z + τ2 (z 2 − 1),
The MEM specication listed here is that
‡
selected by Engle & Gallo (2006) using BIC (see their Table 4).
The distributional
assumptions listed here are those used to specify the quasi log-likelihood function. (Gaussian innovations are not essential for any of the models). 2
. . . . . . . . . . . .
19
Results for the logarithmic specication: G(1,1) denotes the LGARCH(1,1) model that does not utilize a realized measure of volatility. GARCH(2,2) model without the
†
RG(2,2)
denotes the Real-
τ (z) function that captures the dependence between ∗
returns and innovations in volatility. RG(2,2)
is the (2,2) extended to include the
2 ARCH-term α log rt−1 . The latter being insignicant.
. . . . . . . . . . . . . . . . .
39
3
Estimates for the RealGARCH(1, 2) model.
4
In-Sample and Out-of-Sample Likelihood Ratio Statistics
. . . . . . . . . . . . . . .
42
5
Fitted result for DCC(1,1)-RGARCH(1,1) . . . . . . . . . . . . . . . . . . . . . . . .
44
6
MLE tted parameters for the proposed models; standard errors are given in parenthesis. Sample date is 25 June 2010.
†
. . . . . . . . . . . . . . . . . . . . . . .
indicates that the value is not signicant at
95% level. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
40
64
Nullspace CG algorithm output for the rst 21 iterations. ngrad.r: norm of reduced gradient; ngrad:
norm of gradient; obj:
objective function value; cond:
condition
number of hessian; cond.r: condition number of reduced hessian; mu: barrier parameter; step: step length; res.pre: norm of residual pre CG; res.post: norm of residual post CG; maximum of outer iteration: 20; maximum of CG iteration: 30. 8
K
= 15.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
K
= 15.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
97
Successive geometric truncation method result for 3,000 of the most liquid US stocks
K = 15. . . . . . . . . . . . . . . . . . . . . . . . . . . . local relaxation method. (500, 15) means solving a 500 universe
with cardinality constraint, 11
97
Successive arithmetic truncation method result for 3,000 of the most liquid US stocks, with cardinality,
10
81
Successive arithmetic truncation method result for 500 of the most liquid US stocks, with cardinality,
9
. . . . . .
Parameter setup for
problem with cardinality equal to 15. . . . . . . . . . . . . . . . . . . . . . . . . . . .
97
99
12
CPLEX parameter settings (with the rest of the parameters set to their default values). 99
13
Computational result for the 500 asset case with cardinality, and the
local relaxation
K
= 15. For CPLEX
algorithm, we have imposed a maximum run time cuto at
1,200 seconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Computational result for the 3,000 asset case, with cardinality,
K
= 15. Successive
truncation is done over 10 iterations. . . . . . . . . . . . . . . . . . . . . . . . . . . .
vi
99
100
List of Figures 2.1
Monthly variance using daily close-to-close vs using hourly open-to-close.
Sample
period: 2008-02-02 to 2009-01-01. Slope is from simple OLS. . . . . . . . . . . . . . . 2.2
MLE tted EMWA parameter as a function of sampling period; Gaussian density.
2.3
News impact curve for IBM and SPY
3.1
Output of three variations of the ARMA time series framework.
24
.
25
. . . . . . . . . . . . . . . . . . . . . . . . . .
43
Black circles are
actual data-points and red dash-lines indicate the next period predictions. Top panel: simple ARMA; Center panel: ARMA with wavelet smoothing; Bottom panel: ARMAGARCH. Dataset used is the continuously rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5 minute intervals. Sample period shown here is between 2008-05-06 13:40:00 and 2008-05-07 15:15:00. At each point, we t a model using the most recent history of 1,000 data points, then make a 1-period ahead forecast using the tting parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2
47
Output based on AIC general state space model following Hyndman et al. (2000) and the Holt-Winters Additive models. Black circles are actual data-points and red dash-lines indicate the next period predictions.
Dataset used is the continuously
rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5 minute intervals.
Sample period shown here is between 2008-05-06 13:40:00 and
2008-05-07 15:15:00. At each point, we t a model using the most recent history of 1,000 data points, then make a 1-period ahead forecast using the tting parameters.
49
3.3
Diagrammatic illustration of a feed-forward (4,3,1) neural network. . . . . . . . . . .
50
3.4
Output based on a feed-forward (15,7,1) neural network.
Black circles are actual
data-points and red dash-lines indicate the next period predictions. Dataset used is the continuously rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5 minute intervals. Sample period shown here is between 2008-0506 13:40:00 and 2008-05-07 15:15:00. At each point, we t a model using the most recent history of 1,000 data points, then make a 1-period ahead forecast using the tting parameters.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
µ = 0.3, α = 0.6 and β = 1.2.
3.5
Simulated intensity of an univariate Hawkes process, with
3.6
Conditional intensity of bid and ask side market orders following an order submit-
51 53
ted on the bid side of the market, estimated with bin size ranging from 30 to 500 milliseconds, using BP tick data on 25 June 2010. . . . . . . . . . . . . . . . . . . . . 3.7
Simulated intensity of a bivariate Hawkes process. Path in blue (top) is a realization of the
λ1
process; path in red (bottom) is that of the
visualization.
Parameters:
β11 = β12 = 1.5 3.8
54
and
λ2
process, inverted to aid
µ1 = µ2 = 0.5, α11 = α22 = 0.8, α12 = α21 = 0.5,
β22 = β21 = 1.5.
. . . . . . . . . . . . . . . . . . . . . . . . . . .
55
Snapshots showing the evolution of a ten layer deep limit order book just before a trade has taken place (gray lines) and just after (black lines) for BP. Dotted lines are for the best bid and ask prices. Solid line is the average or mid price. Bars are scaled by maximum queue size across the whole book and represented in two color tones to help identify changes in the order book just before and after a trade has taken place.
vii
58
3.9
Time to order completion as a function of order submission distance from the best prevailing bid and ask prices. Distance is dened as the number of order book median price increments from the best top layer prices at the oppose side of the market. For example a distance of 1 corresponds to the top bid and ask prices and a distance of 2 corresponds to the second layer of the bid and ask sides. . . . . . . . . . . . . . . .
59
3.10 Probability of order completion within 5 seconds from submission. Dots are estimated based on empirical data and solid curve is based on the tted power law function
0.935x−0.155 .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
60
3.11 Time series for the dierence in bid and ask side LOB probability weighted cumulative volume,
v¯ (t, τ, L; 1) − v¯ (t, τ, L; 2),
for BP, on 25 June 2010.
. . . . . . . . . . . . . .
61
3.12 Top panel: time series for best prevailing bid and ask prices of the limit order book. Bottom panel: bid-ask spread. Both are for BP, on 25 June 2010. . . . . . . . . . . .
62
3.13 Unconditional arrival intensity of market and limit order on bid and ask sides of the order book, estimated using overlapping windows of one minute period, for BP on 25 June 2010. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
63
3.14 KS-plot for empirical inter-arrival times for market orders on the bid and ask sides of the market.
Dash lines indicate the two sided 99% error bounds based on the
Kolmogorov-Smirnov statistic. Also shown is the value of the Kolmogorov-Smirnov test statistic for the two order types. . . . . . . . . . . . . . . . distribution of the
3.15 KS-plot based on four variations of the framework discussed:
65
bivariate, bivariate
with trade size mark, bivariate with order book imbalance mark and bivariate with trade size and order book imbalance marks. Fitted to sample data on 25 June 2010. Dash lines indicate the two sided 99% error bounds based on the distribution of the
Kolmogorov-Smirnov
statistic. Also shown is the value of the
Kolmogorov-Smirnov
test statistic for the two order types. . . . . . . . . . . . . . . . . . . . . . . . . . . .
66
3.16 In sample KS-plot for empirical inter-arrival times for market order on the bid and ask sides of the LOB. Model parameters are tted with data from in sample period on 06 July 2009 and applied to in sample period on 06 July 2009. Dash lines indicate the two sided 99% error bounds based on the distribution of the statistic.
Also shown is the value of the
two order types.
Kolmogorov-Smirnov
Kolmogorov-Smirnov test statistic for the
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
67
3.17 Out of sample KS-plot for empirical inter-arrival times for market order on the bid and ask sides of the LOB. Model parameters are tted with data from in sample period on 06 July 2009 and applied to out of sample period on 07 July 2009. Dash lines indicate the two sided 99% error bounds based on the distribution of the statistic.
Also shown is the value of the
two order types. 4.1
Kolmogorov-Smirnov
Kolmogorov-Smirnov test statistic for the
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Time complexity of simple QP with only the budget constraint.
68
Problem size is
the number of names in the portfolio. Both axes are in log scale. Result produced using the
R
built-in QP solver based on a dual algorithm proposed by Goldfarb and
Idnani (1982) that relies on Cholesky and QR factorization, both of which have cubic complexity.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
viii
70
4.2
Null space line-search algorithm result. Top panel: value of barrier parameter, a function of iteration. feasible value,
x0 =
h
Bottom panel: 1-norm of error.
0.3
0.5
0.2
0.7
0.3
i>
σ
is set to
0.8
µ,
as
and initial
. Trajectory of the 1-norm error
in the bottom panel illustrates that the algorithm stayed within the feasible region of the problem. 4.3
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Histogram for mean daily return for 500 most actively traded stocks between 200801-22 to 2010-01-22 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.4
77
Optimization output using built-in
R
78
routine that is based on the dual method of
Goldfarb and Idnani (1982, 1983). Top Panel: with equality constraint only. Bottom Panel: with equality and inequality constraints. and maximum of the solution vector 4.5
mix x
x, respectively. .
Optimization output using Algorithm 1.
and
are the minimum
. . . . . . . . . . . . . . . . . .
79
Top Panel: with equality constraint only.
Bottom Panel: with equality and inequality constraints. minimum and maximum of the solution vector 4.6
max x
mix x
x, respectively.
and
max x
are the
. . . . . . . . . . . . .
Algorithm 1 convergence diagnostics. Top panel: value of barrier parameter
79
µ; Center
panel: 1-norm of dierence between objective value at each iteration and the true value (as dened to be the value calculated by the of CG iterations at each outer iteration. 4.7
R function); Bottom panel:
. . . . . . . . . . . . . . . . . . . . . . . . .
Additional Algorithm 1 convergence diagnostics.
Top panel: step-length
panel: objective function value; Bottom panel: 2-norm of the gradient. 4.8
Converged output for and the last 500 are histogram of
4.9
si .
si ;
and the last 500 are
si .
Panel-1: rst 500 variables are
Panel-2: histogram of
xi ;
xi ,
α;
second 500 are
Panel-3: histogram of
yi ;
K = 100.
si ;
Panel-1: rst 500 variables are
Panel-2: histogram of
xi ;
xi ,
yi ;
80
yi
Panel-4:
second 500 are
Panel-3: histogram of
80
Center
. . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Converged output for
histogram of
K = 250.
the number
85
yi
Panel-4:
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
85
4.10 Heatmap of covariance matrix for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01. colors closer to -1. 4.11 A
Darker colors indicate correlation closer to 1 and light
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Minimum spanning tree
87
for 50 actively traded stocks, over the sample period 2008-
01-10 to 2009-01-01. Using distance metric dened in (4.18). Each color identies a dierent sector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.12
k-means clustering
88
in a space spanned by the rst three principal components based
on log returns for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
89
4.13 Fully relaxed (i.e. without cardinality constraint) QP solution for 500 actively traded US stocks (not all tickers are shown) sampled over the period from 2002-01-02 to 2010-01-22, with dollar neutral constraint. . . . . . . . . . . . . . . . . . . . . . . . .
96
4.14 Projection onto a 4-dimensional space spanned by the rst four dominant PCA factors, based on daily returns for the most liquid 500 US traded stocks, sampled over the period from 2002-01-02 to 2010-01-22. dierent cluster groups.
Symbols with dierent colors correspond to
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ix
98
local relaxation method performance comparison for the 500 asset case, with cardinality, K = 15, both are cold started. . . . . . . . . . . . . . . . . . . . . .
4.15 CPLEX versus
100
4.16 Projection onto a four dimensional space spanned by the rst four dominant PCA factors, based on monthly returns for the most liquid 3,000 US traded stocks, sampled over the period from 2005-05-31 to 2010-04-30. correspond to dierent cluster groups. 4.17 CPLEX versus
local relaxation local relaxation
. . . . . . . . . . . . . . . . . . . . . . . . . .
101
method performance comparison for the 3,000 asset
case, with cardinality constraint 4.18 CPLEX versus
Symbols with dierent colors
K
= 15, both cold started. . . . . . . . . . . . . . .
102
method performance comparison for the 500 assets
universe, with cardinality constraint,
K = 15;
Both methods are warm started using
the solution of successive truncation over 20 iterations. Maximum run-time is set to one hour.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.19 CPLEX versus
local relaxation
103
method performance comparison for the 3,000 assets
universe, with cardinality constraint,
K = 100;
Both methods are warm started using
the solution of arithmetic successive truncation over 20 iterations. Maximum run-time is set to seven hours. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.20 Apply CPLEX MIQP to the union of the cluster groups identied by
104
local relaxation
algorithm upon convergence (the beginning of the at line in Figure 4.19). Maximum run-time is set to one hour.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
105
4.21 Mean-variance ecient frontiers for a 3,000 asset universe. QP is the frontier without the cardinality constraint. CCQP is the frontier in presence of cardinality. To produce the frontier for CCQP, we warm-start the local relaxation algorithm, based on the successive truncation solution, and set a maximum run time limit of 252,000 seconds for the case where
K = 100
and 3,600 second for the case where
x
K = 15.
. . . . . . .
106
1
Introduction
Portfolio optimization in the classical mean-variance optimal sense is a well studied topic (see Markowitz, 1952, 1987; Sharpe, 1964). It is relevant at all trading frequencies, whenever the agent has an exponential utility function, and whenever multiple return forecasts are aggregated into a portfolio. In high frequency nance, when transaction cost plays a key part in the eventual return of a strategy and when speed of transaction depends on the number of stocks in the portfolio, we need a framework to form a mean-variance optimal portfolio that bounds the total number of names to execute. Ultimately, the problem we aim to solve is a
cardinality-constrained quadratic program
(CCQP) with linear constraints, which can be expressed as
min f (x) = c> x + 12 x> Hx
(1.1)
x,˜ x
s.t.
where
Ax = b
(1.2)
e> x e=K
(1.3)
c, x, e, x e ∈ Rn×1 , H ∈ Rn×n , A ∈ Rm×n , b ∈ Rm×1 , m ≤ n
and
e
is a vector of
1's.
Let
x∗
be
our solution set, the indicator function in (1.3) is given by
1 x ∈ x∗ i x ei = 0 o.w.. The three key components in the objective function are the prediction of expected return, variance-covariance matrix,
c,
H , and the specic algorithm for solving the optimization problem.
and This
thesis addresses these three key aspects of optimal portfolio construction. Based on Shek (2010), we investigate a self-excited point process approach to forecast expected return. Until recently, the majority of time series analyses related to nancial data has been carried out using regularly spaced price data, with the goal of modeling and forecasting key distributional characteristics of future returns, such as expected mean and variance. These time series data mainly consist of daily closing prices, where comprehensive data are widely available for a large set of asset classes. With the recent rapid development of high-frequency nance, the focus has shifted to intraday tick data, which record every transaction during market hours, and come with irregularly spaced time-stamps. We could resample the dataset and apply the same analyses as before, or we could try to explore additional information that the inter-arrival times may convey in terms of likely future trade direction. In order to properly take into account these irregular occurrences of transactions, we can adopt the framework of point process. In a doubly stochastic framework (see Bartlett, 1963), both the counting process and the driving intensity are stochastic. A point process is called selfexcited if the current intensity of the events is determined by events in the past, see Hawkes (1971). It is widely accepted and observed that volatility of price returns tends to cluster. That is, a period of elevated volatility is likely to be followed by periods of similar levels of volatility. Trade arrivals also exhibit such clustering eect, for example a buy order is likely to be followed closely by another buy order. These orders tend to cluster in time. There has been a growing amount of literature on the application of point process to model inter-arrival trade durations, see Bowsher (2003) for a
1
comprehensive survey of the latest modeling frameworks. The proposed framework extends previous work on the application of self-excited process to model high frequency nancial data. The extension comes in the form of a marked version of the process in order to take into account trade size inuence on the underlying arrival intensity. In addition, by incorporating information from the
book
limit order
(LOB), the proposed framework takes into account a measure of supply-demand imbalance of
the market by parametrize the underlying based intensity as a function of this imbalance measure. The intensities,
where
ti
λ1t , λ2t
for market bid and ask side orders, respectively, are given by the following,
λ1t = µ1 v¯2t +
1 w ¯1
P
1 w ¯2
P
and
λ2t = µ2 v¯1t + tj
are
Ft -adapted
ti f r s Z
with Z (h) = I +
Ph
j=1
Pj − I
and
> y (xr , xs ) = lim y (h) (xr , xs ) = e> r (I − Z) f + es Zf h→∞
where Z is the fundamental matrix of the underlying Markov chain. Focusing on the
h = ∞ case and dene y(r,s) = y (xr , xs ), then the ltered realized variance (the
infeasible estimator) is given by
RVF =
X
2 nr,s y(r,s) .
r,s The empirical transition matrix
Pˆ
Pˆr,s =
is given by
nr,s/nr,• , where
nr,• =
P
s
nr,s .
Then we have
the feasible estimator
RVFˆ =
X
2 nr,s yˆ(r,s)
r,s where
2 ˆ yˆ(r,s) = e> I − Zˆ f + e> r r Zf
Denition 12.
and
ˆ Zˆ = I − Pˆ − Π
−1
.
Markov Chain estimator in price level is dened as
D E M C # := n f, 2Zˆ − I f π ˆ
ha, biπ = a> Λπ b with Λπ = diag (π1 , π2 , . . . , πS k ). RFFˆ − M C # = Op n−1 , and if the rst observe state
where the inner product is dened to be
It can be
shown (Theorem 4 in paper) that
coincides
with the last observed state then
RVFˆ = M C # . 17
Markov Chain estimator in log-price is to a good approximation
D E n2 f, 2Zˆ − I f π ˆ Pn . MC = 2 X i=1 Ti This approximation is good as long as
XTi
does not uctuate dramatically.
computation and it preserves the asymptotic theory derived for Robustness to inhomogeneity (for both
MC
#
MC
#
MC
allows for faster
.
and its standard deviation) is dealt by articially
increasing the order of the Markov chain. For example, a simulation study shows that estimation for a data generating process with a Markov chain of order one, a Markov chain of order two yields a consistent estimate for
M C #,
but the asymptotic variance of estimator increases with
k.
Note,
although misspecication of the order two Markov chain leads to poor estimates of many population quantities, but it will accurately estimate the quantity we seek. One special feature of the
MC
estimator is that it is a generalization of the ALT estimator
proposed by Large (2007) (see Section 2.1.2). It can be shown that this is identical to
k = 1,
and
S=2
with
>
f = (+κ, −κ)
M C #,
with
.
2.1.3 Volatility Prediction Volatility is more predictable then the mean of the underlying return process. Stylized eects such as diurnal intraday pattern, and clustering at dierent time duration are prevalent in many assets and are well known and studied. In the subsections that follows, some key features of a number of forecasting frameworks are introduced. Table 1 on page 19 summaries their key features.
GARCH (Bollerslev, 1986; Engle, 1982)
The latent volatility process of asset returns are
relevant to a wide variety of applications, such as option pricing and risk management, and GARCH models are widely used to model the dynamic features of volatility. This has sparked the development of a large number of ARCH and GARCH models since the seminal paper by Engle (1982). Within the GARCH framework, the key element is the specication for the conditional variance. GARCH models utilize daily returns (typically squared returns) to extract information about the current level of volatility, and this information is used to form expectations about the next period's volatility.
A single return is unable to oer more than a weak signal about the current level of
volatility. The implication is that GARCH models are poorly suited for situations where volatility changes rapidly to a new level, because the GARCH model is slow at catching up and it will take many periods for the conditional variance (implied by the GARCH model) to reach its new level. Partial GARCH and MEM discussed in this Section, and the
Realized GARCH
in Section 2.2 are
second generation of GARCH models that address these shortcomings, but still retain the essential features of the original framework.
UHF-GARCH (Engle, 2000)
The proposed procedure is to model the associated variables,
such as observed (i.e. latent value plus any measurement noise) returns (or the arrival times, and then to
separately
marks ) conditional on
model the arrival times. Below, the notations used in the
original paper are retained for ease of comparison. Dene inter-trade duration
xi = ti − ti−1 18
19 and
xt
iid N (0, 1) iid Exp (1)
zt
ut σu
zt
∼ iid N (0, I)
∼ iid N (0, I)
zRK,t
denote the intraday range, the realized variance, and the realized
ht zt ξ + ϕ log ht + τ (zt ) + ut
√
∼ ∼
zt zR,t ∼ iid N (0, I) zRV,t
ei εi
τ (zt ),
such as
τ (z) = τ1 z + τ2 (z 2 − 1),
† so that Eτ (zt ) = 0, when zt ∼ (0, 1). The MEM specication listed here is that selected by Engle & ‡ The distributional assumptions listed here are those used to specify the quasi log-likelihood function.
(Gaussian innovations are not essential for any of the models).
Gallo (2006) using BIC (see their Table 4).
with
‡
zt ∼ iid N (0, 1)
Distribution
kernel, respectively. In the Realized GARCH model, the dependence between returns and innovations to the volatility (leverage eect) is modeled
Rt , RVt ,
ht = exp {ω + β log ht−1 + γ log xt−1 }
= =
ht zt 2 µt zRK,t
√
= ht zt2 2 = hR,t zR,t 2 = hRV,t zRV,t
= ρ √rxi−1 + ei + φei−1 i−1 = ψt εt = Vi−1 √rxi i xi
ht zt
= =
rt log xt
rt xt
2 ht = ω + αrt−1 + βht−1 + γxt−1 µt = ωR + αR xt−1 + βR µt−1
√
rt2 Rt2 RVt
σi2
xt
√ri xi
rt =
Observable
2 2 ht = ω + αrt−1 + βht−1 + δrt−1 + ϕRt−1 2 hR,t = ωR + αR Rt−1 + βR hR,t−1 + δR rt−1 hRV,t = ωRV + αRV RVt−1 + βRV hRV,t−1 2 +δRV rt−1 + ϑRV RVt−1 1(rt−1 N (t)| x ei−1 , yei−1 ) ∆t ´ f ( t − ti−1 , u| x ei−1 , yei−1 ; θi ) du ´ ´ u∈Ξ . f ( s − ti−1 , u| x ei−1 , yei−1 ; θi ) duds s≥t, u∈Ξ lim
∆t→0
Dene the component conditional densities
f ( xi , yi | x ei−1 , yei−1 ; θi ) = g ( xi | x ei−1 , yei−1 ; θ1i ) q ( yi | xi , x ei−1 , yei−1 ; θ2i ) . Then we obtain
λi (t, x ei−1 , yei−1 )
Model for inter-arrival time, xi
´
=
g ( t − ti−1 | x ei−1 , yei−1 ; θ1i ) . g ( s − ti−1 | x ei−1 , yei−1 ; θ1i ) ds s≥t
To model the inter-arrival time,
xi ,
(2.12)
Engle and Russell (1998)
adopted the following framework:
Denition 13. Autoregressive Conditional Duration (ACD) εi ∼ iid (1, 1)
xi
=
ψi εi
ψi
=
ψ (e xi−1 , yei−1 ; θi ) = E [ xi | x ei−1 , yei−1 ; θ1i ] =
(Engle and Russell, 1998) is given by
(2.13)
ˆ Ω
xi g ( xi | x ei−1 , yei−1 ; θ1i ) dxi .
(2.14)
Note that (2.13) is powerful because it nests
.
log linear model:
log xi = zi β + wi ,
.
Cox proportional hazard model (Cox, 1972):
and
λ (t, z) = λ0 (t) e−zβ ;
we have (2.13).
Example.
Conditional mean specication
xi
=
ψi εi , εi ∼ Exp (1)
ψi
=
ω + αxi−1 + βψi−1 .
From (2.13) we have
g ( xi | x ei−1 , yei−1 ; θ1i )
= g ( xi = εi ψi | ψi ; θ1i ) xi = g εi = ψ ; θ i 1i ψi xi = p 0 εi = θ11 . ψi 20
so if
λ0 ≡
constant, then
The density and survivor function
ε,
and the base intensity can be expressed
ε ∼ p0 (•; θ11 ) ˆ S0 (t; θ11 ) = p0 (s; θ11 ) ds s>t
λ0 (t; θ11 )
p0 (t; θ11 ) . S0 (t; θ11 )
=
Then (2.12) can be expressed by
p0 λi (t, x ei−1 , yei−1 )
´
=
s≥t
=
ψi
´
=
λ0
Note that once we have specied the density of
t−ti−1 ψi ; θ11
p0
p0
s−ti−1 ψi ; θ11 t−ti−1 ψi ; θ11
ds
p0 (e s; θ11 ) de s t − ti−1 1 ; θ11 . ψi ψi
s e≥
t−ti−1 ψi
ε and the functional form for ψ , then the conditional
intensity is fully specied. For example, if we have
P r (ε = x) = e−x
, i.e. standard exponential
density, then we have
1 λi = 2 ψi
Model for price volatility, σi modeling arrival time,
xi ,
The return process,
ri ,
e
t−ti−1 ψi
−1
.
Recall that the UHF-GARCH framework is a two step process of
and price volatility,
we are now in a position to model
!
1
σi ,
xi
separately. With
specied by (2.13) and (2.14),
σi .
in terms of the observed price process,
pi ,
is given by
ri = pi − pi−1 . The observed price process is related to the latent process,
mi ,
via
pi = mi + ζi , where
ζi
is error from truncation. Assume that the latent price,
pi ,
to be a Martingale with respect
to public information with innovation that can be written, without loss of generality, proportional to the square root of the time. We have a duration adjusted return given by
∆mi ∆ζi ri √ = √ + √ = νi + ηi , xi xi xi 5
such that all relevant quantities are expressed as per unit of time. Consider a serially correlated specication for the second term
ηi = ρηi−1 + ξi + χξi−1 , 5 Autocorrelated since the truncation at one point in time is likely to be the same as the truncation several seconds later
21
then we arrive at an ARMA(1,1) model for our duration adjusted return given by
ri ri−1 + ei + φei−1 , √ = ρ√ xi xi−1 where
(2.15)
ei = νi + ξi .
Denition 14.
Dene the conditional variance per unit of time
Vi−1 where the ltration for
Vi−1 (•)
is
ri √ xi = σi2 , xi
(2.16)
σ ((xi , ri ) : i = 0, 1, . . . , i − 1).
Compare (2.16) to the denition in classic GARCH for return
hi =
xi σi2 , hence
Vi−1 ( ri | xi ) = hi ,
we see that
σi2 is the variance per unit time.
Empirical observations
There are a number of ways to specify
σi2 ,
once such specication (Eqn
(40) in their paper) is given below
2 σi2 = ω + αe2i−1 + βσi−1 + γi x−1 i + γ2
xi + γ3 ξi−1 + γ4 ψi−1 . ψi
With the above tted to data for IBM, a number of interesting observations is observed:
.
mean of
(2.15)
is negative ⇒ no news is bad news,
(Diamond and Verrecchia, 1987);
. γ1
is positive ⇒ no trade means no news,
. γ2
is negative ⇒ mean reversion after surprise of trade;
. γ4
is positive ⇒ high transaction means high volatility.
Partial GARCH (Engle, 2002b)
hence low volatility (Easley and O'Hara, 1992);
High-frequency nancial data are now readily available and
the literature has recently introduce a number of realized measures of volatility, including the realized variance, the bipower variation, the realized kernel, and many related quantities, see Sections 2.1.1 and 2.1.2, and Barndor-Nielsen and Shephard (2002), Barndor-Nielsen and Shephard (2004), Barndor-Nielsen et al. (2008b), Hansen and Horel (2010), and references therein.
Any of these
measures is far more informative about the current level of volatility than is the squared return. This makes realized measures useful for modeling and forecasting future volatility. Let
xt
be some observable time series that is referred to as the realized measure. For instance,
could be the realized variance computed from high-frequency data on day
2 like rt , related to
ht ,
xt
t. The realized variance is,
and this holds true for many high-frequency based measures. So it is natural
to augment the standard GARCH model with
xt .
A simple extension is to add the realized measure
to the dynamic equation for the conditional variance,
2 ht = ω + αrt−1 + βht−1 + γxt−1 . This is known as the GARCH-X model, where the label X is due to
(2.17)
xt
being treated as an
exogenous variable. Engle (2002b) was the rst to estimate this model with the realized variance
22
as the exogenous variable treats
xt
xt .
The GARCH-X model is referred to as a
partial-model
because it
as an exogenous variable. Within the GARCH-X framework no eort is paid to explain
the variation in the realized measures, so these are partial (incomplete) models that have nothing to say about returns and volatility beyond a single period into the future. In order to complete the model we need a specication for
xt ,
which is given in Section 2.2. This model was also estimated
by Barndor-Nielsen and Shephard (2007) who used both the realized variance and the bi-power variation as the exogenous variable. The condition variance, Thus, if
γ 6= 0
requirement is
then (2.17) implies that
xt
Ft = σ(rt , xt , rt−1 , xt−1 , . . .),
MEM Engle (2002b)
is adapted to but
Ft
ht ,
Ft .
is, by denition, adapted to
Ft−1 .
A ltration that would satisfy this
could in principle be an even richer
σ -eld.
The Multiplicative Error Model (MEM) by Engle (2002b) was the rst
complete model in this context, see also Engle and Gallo (2006). This model species a GARCH structure for each of the realized measures, so that an additional latent volatility process is introduced for each realized measure in the model. Another complete model is the HEAVY model by Shephard and Sheppard (2010) that, in terms of its mathematical structure, is nested in the MEM framework. Unlike the traditional GARCH models, these models operate with multiple latent volatility processes. For instance, the MEM by Engle and Gallo (2006) has a total of three latent volatility processes and the HEAVY model by Shephard and Sheppard (2010) has two (or more) latent volatility processes.
2.1.4 Covariance Prediction Model-free predictor
A simple estimator would be to use the trailing 255 daily close-to-close
return history to estimate the
variance-covariance
(VCV) matrix, with monthly update, i.e.
the
estimated VCV is then kept constant throughout the month, until the next update. An alternative would be to incorporate intraday information, using hourly open-to-close returns, where marketmicrostructure noise is less prevalent. For this alternative, we need to address the contribution of variance from activities outside of market hours. One obvious way to account for this systemic bias is to scale the open-to-close VCV. In Figure 2.1 we plot the monthly variance of daily close-to-close return against the monthly variance of hourly open-to-close return.
squares
Using simple
ordinary least
(OLS) regression, together with the fact that trading spans 6.5 hours during the day, the
estimated the average per-hour ratios for open-to-close to overnight variance is in the range of 4.4 6
to 16.1 . This is in-line with estimated value of 12.78 by Oldeld and Rogalski (1980) and 13.2 by French and Roll (1986). We see that the value of this bias could vary signicantly depending on the stock and, not shown here, it could also be a function of the sample period. This can be explained by considering the following:
.
on one end of the spectrum, for an
American depositary receipt
(ADR) that is linked to a
stock mainly traded on an Asian exchange, most of the information content would be captured in the close-to-open returns.
Hence, we expect a smaller (likely to be less than 1) hourly
open-to-close to overnight variance ratio;
.
on the other end of the spectrum, for the stock of a company that generates its income mainly in the US, we would expect most of the price movement coming from trading during market
6 Note
that the higher the ratio, the more volatile the open-to-close return relative to close-to-open returns.
23
0.00005
0.00010
0.00015
0.00020
0.0025 0.0015
slope = 7.12
0.0005
0.0015 0.0010 0.0000
0.0005
slope = 8.62
0.00000
DELL Monthly variance on daily c2c return
IBM
0.00000
Monthly variance on hourly o2c return
0.00010
0.00020
0.00030
Monthly variance on hourly o2c return
0.003 0.002
slope = 7.79
0.000
0.001
0.015 0.005
0.010
slope = 10.11
0.000 0.0000
XOM Monthly variance on daily c2c return
0.020
WFT
0.0004
0.0008
0.0012
0e+00
1e-04
2e-04
3e-04
4e-04
5e-04
Figure 2.1: Monthly variance using daily close-to-close vs using hourly open-to-close. Sample period: 2008-02-02 to 2009-01-01. Slope is from simple OLS.
hours, hence a more volatile open-to-close return variance which leads to a higher variance ratio. The simplest form of model free estimator incorporating higher frequency data is as follows
.
calculate hourly open-to-close returns (based on tick or higher frequency data);
.
calculate VCV matrix over a suitably chosen number of observations as the
.
scale VCV by stock specic scaling parameter, calculated over the same look-back period, to
look-back period ;
give an unbiased estimator of daily close-to-close VCV.
Moving average predictor
The RiskMetrics conditional
exponentially weighted moving average
(EWMA) covariance estimator, based on daily observations, is given by
> Ht = αrt−1 rt−1 + (1 − α) Ht−1 where the initial recursion value,
(2.18)
H0 , can be taken to be the unconditional sample covariance matrix.
RiskMetrics (1996) suggests a value for the smoothing parameter of
α = 0.06.
The primary advantage
of this estimator is clear: it has no parameters to estimate and is simple to implement. The obvious drawback is that it forces all assets to have the same smoothing coecient irrespective of the asset's volatility dynamics. This will be addressed by the following section, where, with added complexity, we aim to address some of these asset idiosyncratic behaviors. When applied to high-frequency data, it has often been observed that (2.18) gives a noisy estimator of the VCV matrix. Therefore, the RiskMetric EWMA estimator is adapted to high frequency observations as follows. Let the
daily
latent covariance estimator,
Hτ = αrτ rτ> + (1 − α) Hτ −1 24
Hτ ,
be given by
0.30 0.25 0.20
Value of parameter
0.15
2
4
6
8
10
12
Sampling period, *5min
Figure 2.2: MLE tted EMWA parameter as a function of sampling period; Gaussian density.
where
rτ rτ>
is the variance-covariance of the return over the period,
observed return for day
rτ rτ> Hτ ,
τ
and
rn,τ
is the
n-th
τ ∈ [t1,τ , tn,τ ], r1,τ
and last observed return for day
is simply the high-frequency realized VCV for day
τ.
τ.
is the rst
In other words,
So we have one latent VCV estimator,
per day.
To estimate the EWMA parameters explicitly, we use MLE to maximize the joint Gaussian density. The tted result gives
α = 0.30,
i.e. signicantly more weight on the high-frequency estimator,
rτ −1 rτ>−1 , than that suggested by RiskMetric7 . To quantify robustness of the EWMA estimator to the sampling frequency, the dataset is down-sampled to frequencies corresponding to sampling every 10min, 15min, 20min, 25min and 30min. Figure 2.2 shows the MLE tted parameter as a function of sampling period. It can be seen that as sampling frequency decreases, there is a corresponding decrease in the persistency parameter. This is attributable to the decreasing information content in less frequently sampled return data.
DCC(1,1)-GARCH predictor (Engle, 2002c)
The DCC-GARCH framework relies on two
2 latent processes: conditional variance Dt , and conditional correlation and
Dt
is diagonal, so that
Ht = Dt Rt Dt .
Rt , where both Dt , Rt ∈ Rn×n
The conditional variance is given by
> 2 Dt2 = diag (ωi ) + diag (κi ) ⊗ rt−1 rt−1 + diag (λi ) ⊗ Dt−1 , where
rt ∈ Rn
7 The
and
⊗
is the Hadamard product. The conditional correlation is given by
−1/2
−1/2
Rt
=
diag (Qt )
Qt diag (Qt )
Qt
=
(1 − α − β) Q + αt−1 > t−1 + βQt−1
MLE tted parameters assuming a t(5)-distribution are (0.28,
25
0.72).
(2.19)
where
Q = E t−1 > t−1
is the unconditional correlation matrix of the standardized residual. The
log-likelihood for Gaussian innovation can be expressed as
L = −
1X 2 n log 2π + log |Dt | + rt> Dt−2 rt − > t t 2 t
−1 + log |Rt | + > t R t t . Parameter estimation is done via a two stage optimization procedure.
Stage One
In the rst stage, we maximize the log-likelihood of the conditional variance process,
( θb = arg max
ω,κ,λ
Stage Two
) 1 X 2 − n log 2π + log |Dt | + rt> Dt−2 rt . 2 t
In the second stage, we maximize the conditional correlation process, given the stage
one result,
) 1X > −1 > log |Rt | + t Rt t − t t . max − α,β 2 t (
In the next Section, we will extend this framework to take into account high frequency observations based on a new proposed univariate model for the conditional variance.
2.2
Complete Framework: Realized GARCH
2.2.1 Framework We propose a new framework that combines a GARCH structure for returns with a model for realized measures of volatility. Models within this framework are called
Realized GARCH
models, a name
that transpires both the objective of these models (similar to GARCH) and the means by which these models operate (using realized measures).
A
Realized GARCH
model maintains the single
volatility-factor structure of the traditional GARCH framework. Instead of introducing additional latent factors, we take advantage of the natural relationship between the realized measure and the conditional variance, and we will argue that there is no need for additional factors in many cases. The general structure of the RealGARCH(p,q) model is given by
rt =
where
zt ∼ iid (0, 1)
and
p
ht zt ,
(2.20)
ht = v(ht−1 , . . . , ht−p , xt−1 , . . . , xt−q ),
(2.21)
xt = m(ht , zt , ut ),
(2.22)
ut ∼ iid (0, σu2 ),
with
We refer to the rst two equations as the
zt
and
ut
being mutually independent.
return equation
and the
GARCH equation, and these
dene a class of GARCH-X models, including those that were estimated by Engle (2002a) and Barndor-Nielsen and Shephard (2007). Recall that the GARCH-X acronym refers to the the fact that
xt
is treated as an exogenous variable.
26
measurement equation,
We shall refer to (2.22) as the
often be interpreted as a measurement of
xt = ht + ut .
ht .
because the realized measure,
xt ,
can
The simplest example of a measurement equation is:
The measurement equation is an important component because it completes the
model. Moreover, the measurement equation provides a simple way to model the joint dependence between
rt
and
the presence of
xt , which is known to be empirically important.
This dependence is modeled though
zt in the measurement equation, which we nd to be highly signicant in our empirical
analysis. It is worth noting that most (if not all) variants of ARCH and GARCH models are nested in the
Realized GARCH
framework. See Bollerslev (2009) for a comprehensive list of such models. The
nesting can be achieved by setting
xt = rt
or
xt = rt2 ,
and the measurement equation is redundant
for such models, because it is reduced to a simple identity. The
Realized GARCH
model with a simple log-linear specication is characterized by the follow-
ing GARCH and measurement equations.
log ht = ω +
Xp i=1
βi log ht−i +
Xq j=1
γj log xt−j ,
(2.23)
log xt = ξ + ϕ log ht + τ (zt ) + ut , where
√ zt = rt / ht ∼ iid (0, 1), ut ∼ iid (0, σu2 ),
Remark
.
15
and
τ (z)
(2.24)
is called the
leverage function.
A logarithmic specication for the measurement equation seems natural in this context.
The reason is that (2.20) implies that
log rt2 = log ht + log zt2 ,
(2.25)
and a realized measure is in many ways similar to the squared return,
ht .
measure of of
It is therefore natural to explore specications where
log ht and zt , such as (2.24).
rt2 ,
log xt
albeit a more accurate
is expressed as a function
A logarithmic form for the measurement equation makes it convenient
to specify the GARCH equation with a logarithmic form, because this induces a convenient ARMA structure, as we shall see below.
Remark tion,
.
16
τ (zt ).
In our empirical application we adopt a quadratic specication for the leverage funcThe identity (2.25) motivated us to explore expressions that involves
log zt2 ,
but these
were inferior to the quadratic expression, and resulted in numerical issues because zero returns are occasionally observed in practice.
Remark 0
then
.
17
xt
The conditional variance,
must also be adapted to
σ(rt , xt , rt−1 , xt−1 , . . .),
Remark ht .
.
18
but
Ft
Ft .
ht
is, by denition, adapted to
xt
Therefore, if
A ltration that would satisfy this requirement is
could in principle be an even richer
Note that the measurement equation does not require
For instance,
Ft−1 .
γ 6= Ft =
σ -eld. xt
to be an unbiased measure of
could be a realized measure that is computed with high-frequency data from
a period that only spans a fraction of the period that
rt
is computed over. E.g.
realized variance for a 6.5 hour long period whereas the return, spans 24 hours. When
xt
is roughly proportional to
ht ,
rt ,
xt
could be the
is a close-to-close return that
then we should expect
ϕ ≈ 1,
and that is
indeed what we nd empirically. Both when we use open-to-close returns and close-to-close returns.
27
An attractive feature of the log-linear
Realized GARCH
model is that it preserves the ARMA
structure that characterizes some of the standard GARCH models. This shows that the ARCH nomenclature is appropriate for the
Realized GARCH
model. For the sake of generality we derive
the result for the case where the GARCH equation includes lagged squared returns. Thus consider the following GARCH equation,
log ht where
= ω+
Xp i=1
βi log ht−i +
Xq j=1
γj log xt−j +
Xq j=1
2 αj log rt−j ,
(2.26)
q = maxi {(αi , γi ) 6= (0, 0)}.
Proposition 19. Dene wt = τ (zt ) + ut and vt = log zt2 − κ, where κ = E log zt2 . The Realized
GARCH model dened by (2.24) and (2.26) implies log ht = µh +
q p∨q X X (γj wt−j + αj vt−j ), (αi + βi + ϕγi ) log ht−i + j=1
i=1
log xt = µx +
p∨q X
(αi + βi + ϕγi ) log xt−i + wt +
i=1
log rt2 = µr +
p∨q X
{−(αj + βj )wt−j + ϕαj vt−j } ,
j=1
p∨q p∨q X X 2 (αi + βi + ϕγi ) log rt−i + vt + {γi (wt−j − ϕvt−j ) − βj vt−j } , i=1
j=1
where µh = ω + γ• ξ + α• κ, µx = ϕ(ω + α• κ) + (1 − α• − β• )ξ , and µr = ω + γ• ξ + (1 − β• − ϕγ• )κ, with X X X q
α• =
j=1
αj ,
β• =
p
i=1
βi ,
and
γ• =
q
j=1
γj ,
using the conventions βi = γj = αj = 0 for i > p and j > q. log ht is ARMA(p ∨ q, q − 1), whereas α1 = · · · = αq = 0, then log xt is ARMA(p ∨ q, p).
So the log-linear Realized GARCH model implies that
log rt2 and
log xt
are ARMA(p
∨ q, p ∨ q ).
If
From Proposition 19 we see that the persistence of volatility is summarized by a
parameter π=
p∨q X
persistence
(αi + βi + ϕγi ) = α• + β• + ϕγ• .
i=1
Example 20.
By setting
model. For instance with
xt = rt2 , it is easy to verify that the RealGARCH(p,q) nests the GARCH(p,q) p=q=1
we obtain the GARCH(1,1) structure with
2 2 v(ht−1 , rt−1 ) = ω + αrt−1 + βht−1 ,
m(ht , zt , ut ) = ht zt2 . The measurement equation is simply an identity in this case, i.e. we can take
Example 21.
If we set
xt = rt ,
ut = 0,
for all
t.
then we obtain the EGARCH(1,1) model by Nelson (1991) with
v(ht−1 , rt−1 ) = exp {ω + α|zt−1 | + θzt−1 + β log ht−1 } , p m(ht , zt , ut ) = ht zt .
28
since
p zt−1 = rt−1 / ht−1 ,
Naturally, the interesting case is when containing several realized measures.
GARCH
xt
is a high-frequency based realized measure, or a vector
Next we consider some particular variants of the
Realized
model.
Consider the case where the realized measure, variance.
xt ,
is a consistent estimator of the integrated
Now write the integrated variance as a linear combination of the conditional variance
and a random innovation, and we obtain the relation
xt = ξ + ϕht + t .
We do not impose
ϕ=1
so that this approach also applies when the realized measure is computed from a shorter period (e.g. 6.5 hours) than the interval that the conditional variance refers to (e.g. 24 hours). Having a measurement equation that ties
xt
to
ht
has several advantages. First, it induces a simple and
tractable structure that is similar to that of the classical GARCH framework.
For instance, the
conditional variance, the realized measure, and the squared return, all have ARMA representations. Second, the measurement equation makes it simple to model the dependence between shocks to returns and shocks to volatility, that is commonly referred to as a leverage eect. measurement equation induces a structure that is convenient for prediction.
Third, the
Once the model is
estimated it is simple to compute distributional predictions for the future path of volatilities and returns, and these predictions do not require us to introduce auxiliary future values for the realized measure. To illustrate the framework and x ideas, consider a canonical version of the
Realized GARCH
model that will be referred to as the RealGARCH(1,1) model with a log-linear specication. This model is given by the three equations
rt
where
rt
is the return,
=
p
ht zt ,
log ht
= ω + β log ht−1 + γ log xt−1,
log xt
= ξ + ϕ log ht + τ (zt ) + ut ,
zt ∼ iid(0, 1)
and
ut ∼ iid(0, σu2 ),
and
ht = var(rt |rt−1 , xt−1 , rt−2 , xt−2 , . . .).
The last equation relates the observed realized measure to the latent volatility, and is therefore called the measurement equation. It is easy to verify that one,
log ht = µ + π log ht−1 + wt−1 ,
where
log ht
is an autoregressive process of order
µ = ω + γξ, π = β + ϕγ,
and
wt = γτ (zt ) + γut .
So
it is natural to adopt the nomenclature of GARCH models. The inclusion of the realized measure in the model and the fact that
log xt
has an ARMA representation motivate the name
Realized
GARCH. A simple, yet potent specication of the leverage function is τ (z) = τ1 z + τ2 (z − 1), which 2
can generate an asymmetric response in volatility to return shocks.
The simple structure of the
model makes the model easy to estimate and interpret, and leads to a tractable analysis of the quasi maximum likelihood estimator. The MEM by Engle and Gallo (2006) utilizes two realized measures in addition to the squared returns.
These are the intraday range (high minus low) and the realized variance, whereas the
HEAVY model by Shephard and Sheppard (2010) uses the realized kernel (RK) by Barndor-Nielsen et al. (2008b). These models introduce an additional latent volatility process for each of the realized measures.
So the MEM and the HEAVY digress from the traditional GARCH models that only
have a single latent volatility factor. Unlike the MEM by Engle and Gallo (2006) and the HEAVY model by Shephard and Sheppard (2010), the Realized GARCH has the following characteristics.
29
.
Maintains the single factor structure of latent volatility.
.
Ties the realized measure directly to the conditional variance.
.
Explicit modeling of the return-volatility dependence (leverage eect).
Key model features are given in Table 1. Brownlees and Gallo (2010) estimate a restricted MEM model that is closely related to the Realized GARCH with the linear specification. They utilize a single realized measure, the realized kernel by Barndorff-Nielsen et al. (2008b), so that they have two latent volatility processes, h_t = E(r_t² | F_{t−1}) and µ_t = E(x_t | F_{t−1}). However, their model is effectively reduced to a single-factor model as they introduce the constraint h_t = c + dµ_t, see Brownlees and Gallo (2010, Eqns. (6)-(7)). This structure is also implied by the linear version of our measurement equation. However, they do not formulate a measurement equation or relate x_t − µ_t to a leverage function. Instead, for the purpose of simplifying the prediction problem, they adopt a simple time-varying ARCH structure, µ_t = a_t + b_t x_{t−1}, where a_t and b_t are defined by spline methods. Spline methods were introduced in this context by Engle and Rangel (2008) to capture the low-frequency variation in volatility. One of the main advantages of the Realized GARCH framework is the simplicity with which the dependence between return shocks and volatility shocks is modeled with the leverage function.
The MEM is formulated with a general dependence structure for the innovations that drive the latent volatility processes. The usual MEM formulation is based on a vector of non-negative random innovations, η_t, that are required to have mean E(η_t) = (1, . . . , 1)′. The literature has explored distributions with this property, such as certain multivariate Gamma distributions, and Cipollini, Engle, and Gallo (2009) use copula methods that entail a very flexible class of distributions with the required structure. Some drawbacks of this approach are that estimation is complex, and that a rigorous analysis of the asymptotic properties of these estimators seems intractable. A simpler way to achieve the required structure in the multiplicative error distribution is by setting η_t = Z_t ⊙ Z_t (the element-by-element product) and working with the vector of random variables Z_t instead. The required structure can be obtained with a more traditional error structure, where each element of Z_t is required to have zero mean and unit variance. This alternative formulation can be adopted without any loss of generality, since the dependence between the elements of Z_t is allowed to take any form. The estimates in Engle and Gallo (2006) and Shephard and Sheppard (2010) are based on a likelihood where the elements of η_t are independent χ²-distributed random variables with one degree of freedom. We have used the alternative formulation in Table 1, where (z_t², z_{R,t}², z_{RV,t}²)′ corresponds to η_t in the MEM by Engle and Gallo (2006).
Leverage function and news impact

The function τ(z) is called the leverage function because it captures the dependence between returns and future volatility, a phenomenon that is referred to as the leverage effect. We normalize such functions so that Eτ(z_t) = 0, and we focus on those that have the form

    τ(z_t) = τ1 a1(z_t) + · · · + τk ak(z_t),  where  E ak(z_t) = 0 for all k,

so that the function is linear in the unknown parameters. We shall see that the leverage function induces an EGARCH-type structure in the GARCH equation, and we note that the functional form used in Nelson (1991), τ(z_t) = τ1 z_t + τ+(|z_t| − E|z_t|), is within the class of leverage functions we consider. We shall mainly consider leverage functions that are constructed from Hermite polynomials,

    τ(z) = τ1 z + τ2(z² − 1) + τ3(z³ − 3z) + τ4(z⁴ − 6z² + 3) + · · · ,

and our baseline choice for the leverage function is the simple quadratic form τ(z_t) = τ1 z_t + τ2(z_t² − 1). This choice is convenient because it ensures that Eτ(z_t) = 0 for any distribution of z_t, so long as E z_t = 0 and var(z_t) = 1. The polynomial form is also convenient in our quasi-likelihood analysis, and in our derivations of the kurtosis of returns generated by this model. The leverage function τ(z) is closely related to the news impact curve, see Engle and Ng (1993), which maps out how positive and negative shocks to the price affect future volatility. We can define the news impact curve by

    ν(z) = E(log h_{t+1} | z_t = z) − E(log h_{t+1}),

so that 100ν(z) measures the percentage impact on volatility as a function of the studentized return. From the ARMA representation it follows that ν(z) = γ1 τ(z).
Quasi-Maximum Likelihood Estimation (QMLE) Analysis
In this section we discuss the asymptotic properties of the quasi-maximum likelihood estimator within the RealGARCH(p, q) model. The structure of the QMLE analysis is very similar to that of the standard GARCH model, see Bollerslev and Wooldridge (1992), Lee and Hansen (1994), Lumsdaine (1996), and Jensen and Rahbek (2004b,a). Both Engle and Gallo (2006) and Shephard and Sheppard (2010) justify the standard errors they report by referencing existing QMLE results for GARCH models. This argument hinges on the fact that the joint log-likelihood in Engle and Gallo (2006) and Shephard and Sheppard (2010) decomposes into a sum of univariate GARCH-X models, whose likelihoods can be maximized separately. The factorization of the likelihood is achieved by two facets of these models: one is that all observables (i.e. the squared return and each of the realized measures) are tied to their individual latent volatility processes; the other is that the primitive innovations in these models are taken to be independent in the formulation of the likelihood function. The latter inhibits a direct modeling of the leverage effect with a function such as τ(z_t), which is one of the traits of the Realized GARCH model. In this section we derive the underlying QMLE structure for the log-linear Realized GARCH model; the structure of the linear Realized GARCH model is similar. We provide detailed expressions for the first and second derivatives of the log-likelihood function. These expressions facilitate direct computation of robust standard errors, and provide insight into the regularity conditions that would justify QMLE inference. For instance, the first derivative reveals regularity conditions that enable a central limit theorem to be applied to the score function. For the purpose of estimation we adopt a Gaussian specification, so that the log-likelihood function is given by
    ℓ(r, x; θ) = −(1/2) Σ_{t=1}^n [ log(h_t) + r_t²/h_t + log(σ_u²) + u_t²/σ_u² ].

We write the leverage function as τ′a_t = τ1 a1(z_t) + · · · + τk ak(z_t), and denote the parameters in the model by θ = (λ′, ψ′, σ_u²)′, where λ = (ω, β1, . . . , βp, γ1, . . . , γq)′ and ψ = (ξ, φ, τ′)′. To simplify the notation we write h̃_t = log h_t and x̃_t = log x_t, and define

    g_t = (1, h̃_{t−1}, . . . , h̃_{t−p}, x̃_{t−1}, . . . , x̃_{t−q})′    and    m_t = (1, h̃_t, a_t′)′,

so that the GARCH and measurement equations can be expressed as

    h̃_t = λ′g_t    and    x̃_t = ψ′m_t + u_t.

The dynamics that underlie the score and Hessian are driven by h̃_t and its derivatives with respect to λ. The properties of these derivatives are stated next.
Lemma 22. Define ḣ_t = ∂h̃_t/∂λ and ḧ_t = ∂²h̃_t/∂λ∂λ′. Then ḣ_s = 0 and ḧ_s = 0 for s ≤ 0, and

    ḣ_t = Σ_{i=1}^p βi ḣ_{t−i} + g_t    and    ḧ_t = Σ_{i=1}^p βi ḧ_{t−i} + (Ḣ_{t−1} + Ḣ_{t−1}′),

where Ḣ_{t−1} = (0_{(1+p+q)×1}, ḣ_{t−1}, . . . , ḣ_{t−p}, 0_{(1+p+q)×q}) is a (p+q+1) × (p+q+1) matrix. (ii) When p = q = 1 we have, with β = β1, that

    ḣ_t = Σ_{j=0}^{t−1} β^j g_{t−j}    and    ḧ_t = Σ_{k=1}^{t−1} k β^{k−1} (G_{t−k} + G_{t−k}′),

where G_t = (0_{3×1}, g_t, 0_{3×1}).

Proposition 23. (i) The score, ∂ℓ/∂θ = Σ_{t=1}^n ∂ℓ_t/∂θ, is given by

    ∂ℓ_t/∂θ = −(1/2) ( (1 − z_t² + (2u_t/σ_u²) u̇_t) ḣ_t′ ,  −(2u_t/σ_u²) m_t′ ,  (σ_u² − u_t²)/σ_u⁴ )′,

where u̇_t = ∂u_t/∂ log h_t = −φ + (1/2) z_t τ′ȧ_t, with ȧ_t = ∂a(z_t)/∂z_t.
(ii) The second derivative, ∂²ℓ/∂θ∂θ′ = Σ_{t=1}^n ∂²ℓ_t/∂θ∂θ′, is given by

    ∂²ℓ_t/∂θ∂θ′ =
    ⎡ −{z_t²/2 + (u̇_t² + u_t ü_t)/σ_u²} ḣ_t ḣ_t′ − (1/2){1 − z_t² + 2u_t u̇_t/σ_u²} ḧ_t      •      • ⎤
    ⎢ σ_u^{−2} (u̇_t m_t + u_t b_t) ḣ_t′      −σ_u^{−2} m_t m_t′      • ⎥
    ⎣ (u_t u̇_t/σ_u⁴) ḣ_t′      −(u_t/σ_u⁴) m_t′      (σ_u² − 2u_t²)/(2σ_u⁶) ⎦

where b_t = (0, 1, −(1/2) z_t ȧ_t′)′ and ü_t = −(1/4) τ′(z_t ȧ_t + z_t² ä_t), with ä_t = ∂²a(z_t)/∂z_t²; the entries marked • are implied by symmetry.
An advantage of our framework is that we can draw upon results for generalized hidden Markov models. Consider the case p = q = 1. From Carrasco and Chen (2002, Proposition 2) it follows that h̃_t has a stationary representation provided that π = β + φγ ∈ (−1, 1), and if we assign h̃_0 its invariant distribution, then h̃_t is strictly stationary and β-mixing with exponential decay, and E|h̃_t|^s < ∞ if E|τ(z_t) + u_t|^s < ∞. Moreover, {(r_t, x_t), t ≥ 0} is a generalized hidden Markov model with hidden chain {h̃_t, t ≥ 0}, and so by Carrasco and Chen (2002, Proposition 4) it follows that {(r_t, x_t)} is also stationary β-mixing with exponential decay rate.
The robustness of the QMLE, as defined by the Gaussian likelihood, is in part reflected by the weak assumptions under which the score is a martingale difference sequence. These are stated in the following proposition.
Proposition 24. (i) Suppose that E(u_t | z_t, F_{t−1}) = 0, E(z_t² | F_{t−1}) = 1, and E(u_t² | F_{t−1}) = σ_u². Then s_t(θ) = ∂ℓ_t(θ)/∂θ is a martingale difference sequence. (ii) Suppose, in addition, that {(r_t, x_t, h̃_t)} is stationary and ergodic. Then

    (1/√n) Σ_{t=1}^n ∂ℓ_t/∂θ →d N(0, J_θ)    and    −(1/n) Σ_{t=1}^n ∂²ℓ_t/∂θ∂θ′ →p I_θ,

provided that

    J_θ =
    ⎡ (1/4) E{(1 − z_t² + (2u_t/σ_u²)u̇_t)²} E(ḣ_t ḣ_t′)      •      • ⎤
    ⎢ −σ_u^{−2} E(u̇_t m_t ḣ_t′)      σ_u^{−2} E(m_t m_t′)      • ⎥
    ⎣ −E(u_t³)E(u̇_t)E(ḣ_t′)/(2σ_u⁶)      E(u_t³)E(m_t′)/(2σ_u⁶)      E{(u_t²/σ_u² − 1)²}/(4σ_u⁴) ⎦

and

    I_θ =
    ⎡ {1/2 + E(u̇_t²)/σ_u²} E(ḣ_t ḣ_t′)      •      • ⎤
    ⎢ −σ_u^{−2} E{(u̇_t m_t + u_t b_t) ḣ_t′}      σ_u^{−2} E(m_t m_t′)      • ⎥
    ⎣ 0      0      1/(2σ_u⁴) ⎦

are finite. Note that in the stationary case we have J_θ = E(∂ℓ_t/∂θ · ∂ℓ_t/∂θ′), so a necessary condition for |J_θ| < ∞ is that z_t and u_t have finite fourth moments. Additional moments may be required for z_t, depending on the complexity of the leverage function τ(z), because u̇_t depends on τ(z_t).
Theorem 25. Under suitable regularity conditions, we have the asymptotic result

    √n (θ̂_n − θ) →d N(0, I_θ^{−1} J_θ I_θ^{−1}).

It is worth noting that the estimator of the parameters in the GARCH equation, λ, and that of the parameters in the measurement equation, ψ, are not asymptotically independent. This asymptotic correlation is induced by the leverage function in our model, and by the fact that we link the realized measure, x_t, to h_t with a measurement equation.

In the context of ARCH and GARCH models, it has been shown that the QMLE is consistent with a Gaussian limit distribution regardless of whether the process is stationary or non-stationary; the latter was established in Jensen and Rahbek (2004b,a). So, unlike the case for autoregressive processes, we do not have a discontinuity of the limit distribution at the knife-edge in the parameter space that separates stationary and non-stationary processes. This is an important result for empirical applications, because the point estimates are typically found to be very close to the boundary.
Notice that the martingale difference result for the score, Proposition 24(i), does not rely on stationarity. So it is reasonable to conjecture that the central limit theorem for martingale difference processes is applicable to the score even if the process is non-stationary. While standard errors for θ̂ may be computed from numerical derivatives, they can also be computed directly using the following expressions:
    Ĵ = (1/n) Σ_{t=1}^n ŝ_t ŝ_t′,  where  ŝ_t = −(1/2) ( (1 − ẑ_t² + (2û_t/σ̂_u²) u̇_t) ḣ_t′ , −(2û_t/σ̂_u²) m̂_t′ , (σ̂_u² − û_t²)/σ̂_u⁴ )′

is the score of Proposition 23(i), and

    Î = −(1/n) Σ_{t=1}^n ∂²ℓ_t/∂θ∂θ′

is minus the sample average of the second derivative given in Proposition 23(ii); here hats denote evaluation at θ = θ̂. The first-order conditions simplify Î: the condition for ψ, Σ_{t=1}^n û_t m̂_t′ = 0, zeroes the block involving (û_t/σ̂_u⁴) m̂_t′ and reduces the lower-right entry to 1/(2σ̂_u⁴), and the first-order condition for λ implies that

    −Σ_{t=1}^n (û_t u̇_t/σ̂_u⁴) ḣ_t = Σ_{t=1}^n ((1 − ẑ_t²)/(2σ̂_u²)) ḣ_t.

For our baseline leverage function, τ(z_t) = τ1 z_t + τ2(z_t² − 1), we have

    m_t = (1, log h_t, z_t, z_t² − 1)′,    b_t = (0, 1, −z_t/2, −z_t²)′,
    u̇_t = −φ + (1/2) τ1 z_t + τ2 z_t²,    ü_t = −(1/4) τ1 z_t − τ2 z_t².
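As a practical complement, the following R sketch (not the thesis code) computes the robust sandwich estimator Î⁻¹ĴÎ⁻¹ from numerical derivatives. The only assumption is a user-supplied function `ll_t(theta)` returning the n-vector of per-period quasi log-likelihood contributions ℓ_t(θ), with `thetahat` the maximizer of their sum.

```r
## Minimal sketch: robust (sandwich) standard errors via finite differences.
num_grad <- function(f, th, eps = 1e-5)
  sapply(seq_along(th), function(j) {
    ep <- th; em <- th; ep[j] <- ep[j] + eps; em[j] <- em[j] - eps
    (f(ep) - f(em)) / (2 * eps)
  })

robust_se <- function(thetahat, ll_t, eps = 1e-5) {
  n <- length(ll_t(thetahat)); k <- length(thetahat)
  ## J-hat: average outer product of the per-period numerical scores
  S <- matrix(NA_real_, n, k)
  for (j in 1:k) {
    ep <- thetahat; em <- thetahat
    ep[j] <- ep[j] + eps; em[j] <- em[j] - eps
    S[, j] <- (ll_t(ep) - ll_t(em)) / (2 * eps)
  }
  Jhat <- crossprod(S) / n
  ## I-hat: minus the numerical Hessian of the average log-likelihood
  g <- function(th) num_grad(function(x) mean(ll_t(x)), th, eps)
  Hess <- sapply(seq_len(k), function(j) {
    ep <- thetahat; em <- thetahat
    ep[j] <- ep[j] + eps; em[j] <- em[j] - eps
    (g(ep) - g(em)) / (2 * eps)
  })
  Ihat <- -(Hess + t(Hess)) / 2                  # symmetrize
  V <- solve(Ihat, Jhat) %*% solve(Ihat)         # I^-1 J I^-1
  sqrt(diag(V) / n)                              # asymptotic standard errors
}
```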
Decomposition of likelihood function

The log-likelihood function is (conditionally on F_0 = σ({r_t, x_t, h_t}, t ≤ 0)) given by

    log L({r_t, x_t}_{t=1}^n; θ) = Σ_{t=1}^n log f(r_t, x_t | F_{t−1}).

Standard GARCH models do not model x_t, so the log-likelihood we obtain for these models cannot be compared to that of the Realized GARCH model. However, we can factorize the joint conditional density for (r_t, x_t) by

    f(r_t, x_t | F_{t−1}) = f(r_t | F_{t−1}) f(x_t | r_t, F_{t−1}),

and compare the partial log-likelihood, ℓ(r) := Σ_{t=1}^n log f(r_t | F_{t−1}), with that of a standard GARCH model. Specifically, for the Gaussian specifications for z_t and u_t, we split the joint likelihood into the sum

    ℓ(r, x) = −(1/2) Σ_{t=1}^n [log(2π) + log(h_t) + r_t²/h_t] − (1/2) Σ_{t=1}^n [log(2π) + log(σ_u²) + u_t²/σ_u²],

where the first term is ℓ(r) and the second is ℓ(x|r).
Asymmetries in the leverage function are summarized by the following two statistics,

    ρ− = corr{τ(z_t) + u_t, z_t | z_t < 0}    and    ρ+ = corr{τ(z_t) + u_t, z_t | z_t > 0}.

These capture the slopes of a piecewise linear news impact curve for negative and positive returns, such as that implied by the EGARCH model.
Multiperiod forecasts

One of the main advantages of having a complete specification, i.e., a model that fully describes the dynamic properties of x_t, is that multi-period-ahead forecasting is feasible. In contrast, the GARCH-X model can only be used to make one-step-ahead predictions; multi-period-ahead predictions are not possible without a model for x_t. Multi-period-ahead prediction with the Realized GARCH model is straightforward. We let h̃_t denote either h_t or log h_t, so that the results presented in this section apply to both the linear and log-linear variants of the Realized GARCH model.
Consider first the case where p = q = 1. By substituting the GARCH equation into the measurement equation we obtain the VARMA(1,1) structure

    [h̃_t; x̃_t] = [β γ; φβ φγ] [h̃_{t−1}; x̃_{t−1}] + [ω; ξ + φω] + [0; τ(z_t) + u_t],

which can be used to generate the predictive distribution of future values of (h̃_t, x̃_t), as well as of returns r_t, using

    [h̃_{t+h}; x̃_{t+h}] = [β γ; φβ φγ]^h [h̃_t; x̃_t] + Σ_{j=0}^{h−1} [β γ; φβ φγ]^j { [ω; ξ + φω] + [0; τ(z_{t+h−j}) + u_{t+h−j}] }.

This is easily extended to the general case (p, q ≥ 1), where we have Y_t = A Y_{t−1} + b + ε_t, with the conventions

    Y_t = (h̃_t, . . . , h̃_{t−p+1}, x̃_t, . . . , x̃_{t−q+1})′,

    A =
    ⎡ (β1, . . . , βp)            (γ1, . . . , γq) ⎤
    ⎢ (I_{p−1×p−1}, 0_{p−1×1})   0_{p−1×q} ⎥
    ⎢ φ(β1, . . . , βp)           φ(γ1, . . . , γq) ⎥
    ⎣ 0_{q−1×p}                   (I_{q−1×q−1}, 0_{q−1×1}) ⎦,

    b = (ω, 0_{1×(p−1)}, ξ + φω, 0_{1×(q−1)})′,    ε_t = (0_{1×p}, τ(z_t) + u_t, 0_{1×(q−1)})′,

so that

    Y_{t+h} = A^h Y_t + Σ_{j=0}^{h−1} A^j (b + ε_{t+h−j}).

The predictive distribution for h̃_{t+h} and/or x̃_{t+h} is given by the distribution of Σ_{i=0}^{h−1} A^i ε_{t+h−i}, which also enables us to compute a predictive distribution for r_{t+h} and for cumulative returns r_{t+1} + · · · + r_{t+h}.
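For illustration, the following R sketch simulates the h-step predictive distribution under the p = q = 1 recursion above. It reuses the illustrative parameter values from the earlier simulation sketch; the current state is an arbitrary assumption.

```r
## Sketch: simulation-based multi-period prediction in RealGARCH(1,1).
## Assumes omega, beta, gamma, xi, phi, tau1, tau2, sigu from the earlier sketch.
A1 <- matrix(c(beta, phi * beta, gamma, phi * gamma), 2, 2)  # [beta gamma; phi*beta phi*gamma]
b1 <- c(omega, xi + phi * omega)
state <- c(-0.5, -0.6)                       # (log h_t, log x_t), illustrative
h <- 10; nsim <- 10000
sim <- replicate(nsim, {
  y <- state
  for (j in 1:h) {
    z <- rnorm(1); u <- rnorm(1, sd = sigu)
    y <- A1 %*% y + b1 + c(0, tau1 * z + tau2 * (z^2 - 1) + u)
  }
  y[1]                                       # log h_{t+h}
})
quantile(exp(sim), c(0.05, 0.50, 0.95))      # predictive quantiles of h_{t+h}
```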
Induced properties of cumulative returns

The skewness of single-period returns is non-zero if and only if the studentized return, z_t, has non-zero skewness. This follows directly from the identity r_t = √h_t z_t and the assumption that z_t ⊥⊥ h_t, which show that

    E(r_t^d) = E(h_t^{d/2} z_t^d) = E{E(h_t^{d/2} z_t^d | F_{t−1})} = E(h_t^{d/2}) E(z_t^d),

and in particular that E(r_t³) = E(h_t^{3/2}) E(z_t³). So a symmetric distribution for z_t implies that r_t has zero skewness, and this property is shared by the standard GARCH model and the Realized GARCH model alike. For the skewness and kurtosis of cumulative returns, r_t + · · · + r_{t+k}, the situation is very different, because the leverage function induces skewness.
Proposition 26. Consider the RealGARCH(1,1) model and define π = β + φγ and µ = ω + γξ, so that

    log h_t = π log h_{t−1} + µ + γ w_{t−1},  where  w_t = τ1 z_t + τ2(z_t² − 1) + u_t,

with z_t ∼ iid N(0, 1) and u_t ∼ iid N(0, σ_u²). The kurtosis of the return r_t = √h_t z_t is given by

    E(r_t⁴)/E(r_t²)² = 3 ∏_{i=0}^∞ (1 − 2π^i γτ2)/√(1 − 4π^i γτ2) · exp{ Σ_{i=0}^∞ π^{2i} γ²τ1² / (1 − 6π^i γτ2 + 8π^{2i} γ²τ2²) } · exp{ γ²σ_u²/(1 − π²) }.    (2.27)

There does not appear to be a way to further simplify expression (2.27); however, when γτ2 is small, as we found it to be empirically, we have the approximation (see the appendix for details)

    E(r_t⁴)/E(r_t²)² ≈ 3 exp{ γ²τ2²/(−log π) + γ²(τ1² + σ_u²)/(1 − π²) }.    (2.28)
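A small R sketch that evaluates (2.27) by truncating the infinite product and sum, and compares it with the approximation (2.28); the parameter values are illustrative assumptions.

```r
## Sketch: evaluating the kurtosis expression (2.27) and approximation (2.28).
kurt_exact <- function(ppi, gam, tau1, tau2, sigu2, K = 400) {
  a <- gam * ppi^(0:K)                       # a_i = gamma * pi^i
  3 * prod((1 - 2 * a * tau2) / sqrt(1 - 4 * a * tau2)) *
    exp(sum(a^2 * tau1^2 / (1 - 6 * a * tau2 + 8 * a^2 * tau2^2))) *
    exp(gam^2 * sigu2 / (1 - ppi^2))
}
kurt_approx <- function(ppi, gam, tau1, tau2, sigu2)
  3 * exp(gam^2 * tau2^2 / (-log(ppi)) + gam^2 * (tau1^2 + sigu2) / (1 - ppi^2))

kurt_exact (0.97, 0.40, -0.07, 0.07, 0.38^2)   # truncated (2.27)
kurt_approx(0.97, 0.40, -0.07, 0.07, 0.38^2)   # approximation (2.28)
```

For small γτ2 the two values are close, consistent with the derivation of (2.28).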
Simple extension to multivariate

The simplest extension to the multivariate case is via the DCC(1,1)-GARCH framework, by replacing the conditional variance equation (2.19) by

    D_t² = diag(ω_i) + diag(β_i) ⊗ D_{t−1}² + diag(γ_i) ⊗ X_{t−1},

where the diagonal matrix D_t² consists of log h_{1,t}, . . . , log h_{N,t}, and X_t consists of log x_{1,t}, . . . , log x_{N,t} and is related to D_t² by the usual measurement equation (cf. (2.24)),

    log x_{i,t} = ξ_i + φ_i log h_{i,t} + τ(z_{i,t}) + u_{i,t}.

Essentially, the proposed extension replaces the first stage of DCC-GARCH by RealGARCH(1,1), the output of which is then fed into the second stage to estimate the conditional correlation, given by

    R_t = diag(Q_t)^{−1/2} Q_t diag(Q_t)^{−1/2},
    Q_t = (1 − α̃ − β̃) Q̄ + α̃ ε_{t−1} ε_{t−1}′ + β̃ Q_{t−1},

where Q̄ = E(ε_{t−1} ε_{t−1}′) is the unconditional correlation matrix of the standardized residuals. The key assumption for this extension is the conjecture that correlation varies on a much slower time scale than does the volatility of each asset. This enables us to simply plug in the variance estimates based on Realized GARCH, normalize our return series, and then estimate the correlation component exactly as before.
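A minimal R sketch of this two-stage plug-in (not the thesis code). It assumes a T × N matrix `h` of fitted RealGARCH variances and a T × N return matrix `r` are available; the default α̃ and β̃ are the fitted values reported later in Table 5.

```r
## Sketch of the proposed DCC plug-in: standardize by RealGARCH variances,
## then run the DCC(1,1) correlation recursion.
dcc_correlation <- function(r, h, alpha = 0.018, beta = 0.952) {
  stopifnot(alpha + beta < 1)              # stationarity of the Q recursion
  eps  <- r / sqrt(h)                      # stage 1: standardized residuals
  Qbar <- cor(eps)                         # unconditional correlation
  Tn <- nrow(eps); Q <- Qbar; R <- vector("list", Tn)
  for (t in 1:Tn) {
    d <- diag(1 / sqrt(diag(Q)))
    R[[t]] <- d %*% Q %*% d                # conditional correlation R_t
    e <- eps[t, ]
    Q <- (1 - alpha - beta) * Qbar + alpha * tcrossprod(e) + beta * Q
  }
  R
}
```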
2.2.2 Empirical Analysis

Data

Our sample spans the period from 2002-01-01 to 2008-08-31, which we divide into an in-sample period, 2002-01-01 to 2007-12-31, leaving the eight months between 2008-01-02 and 2008-08-31 for out-of-sample analysis. We adopt the realized kernel as the realized measure, x_t, using a Parzen kernel function. This estimator is similar to the well-known realized variance, but is robust to market microstructure noise and is a more accurate estimator of the quadratic variation. Our implementation of the realized kernel follows Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008a), which guarantees a positive estimate; this is important for our log-linear specification. The exact computation is explained in great detail in Barndorff-Nielsen, Hansen, Lunde, and Shephard (2009). When we estimate a Realized GARCH model using open-to-close returns we should expect x_t ≈ h_t, whereas with close-to-close returns we should expect x_t to be smaller than h_t on average. To avoid outliers that would result from half trading days, we removed days where high-frequency data spanned less than 90% of the official 6.5 hours between 9:30am and 4:00pm. This removes about three daily observations per year, such as the day after Thanksgiving and days around Christmas. When we estimate a model that involves log r_t², we deal with zero returns by substituting the smallest observed value, min_{s: r_s ≠ 0} log r_s², for log 0.
Result

In this section we present empirical results using returns and realized measures for 28 stocks and an exchange-traded index fund, SPY, that tracks the S&P 500 index. We adopt the realized kernel, introduced by Barndorff-Nielsen et al. (2008b), as the realized measure, x_t. We estimate the Realized GARCH models using both open-to-close returns and close-to-close returns. High-frequency prices are only available between open and close, so the population quantity that is estimated by the realized kernel is directly related to the volatility of open-to-close returns, but only captures some of the volatility of close-to-close returns. Next we consider Realized GARCH models with a log-linear specification of the GARCH and measurement equations. For the purpose of comparison we estimate an LGARCH(1,1) model in addition to the six Realized GARCH models. Again we report results for both open-to-close and close-to-close returns for SPY. The results are presented in Table 2. The main results in Table 2 can be summarized as follows:
• The ARCH parameter, α, is insignificant.
• Consistent with the log-linearity, we find φ ≈ 1 for both open-to-close and close-to-close returns.
We report empirical results for all 29 assets in Table 3 and find the point estimates to be remarkably similar across the many time series. In-sample and out-of-sample likelihood ratio statistics are computed in Table 4. Table 4 shows the likelihood ratios for both the in-sample and out-of-sample periods. The statistics are based on our analysis with open-to-close returns. The statistics in Panel A are the conventional likelihood ratio statistics, where each of the five smaller models is benchmarked against the largest model. The largest model is labeled (2,2)∗. This is the log-linear RealGARCH(2,2) model that has the squared return r_t² in the GARCH equation in addition to the realized measure. Thus the likelihood ratio statistics in Panel A are defined by

    LR_i = 2{ℓ_{RG(2,2)∗}(r, x) − ℓ_i(r, x)},

where i represents one of the five other Realized GARCH models. In the QMLE framework the limit distribution of the likelihood ratio statistic, LR_i, is usually given as a weighted sum of χ²-distributions, so comparing LR_i to the usual critical value of a χ²-distribution is only indicative of significance. Comparing the RealGARCH(2,2)∗ to the RealGARCH(2,2) leads to small LR statistics in most cases, so α tends to be insignificant in our sample. This is consistent with the existing literature, which finds that squared returns add little to the model once a more accurate realized measure is used in the GARCH equation. The leverage function, τ(z_t), is highly significant in all cases: the LR statistics associated with the hypothesis that τ1 = τ2 = 0 are well over 100 in all cases. These statistics can be computed by subtracting the statistics in the column labeled (2,2) from those in the column labeled (2,2)†. The joint hypothesis β2 = γ2 = 0 is rejected in most cases, so the empirical evidence does not support a simplification of the model to the RealGARCH(1,1). The results for the two hypotheses β2 = 0 and γ2 = 0 are less conclusive. The likelihood ratio statistics for the hypothesis β2 = 0 are, on average, 5.7, which would be borderline significant when compared to conventional critical values from a χ²(1)-distribution. The LR statistics for the hypothesis γ2 = 0 tend to be larger, and are on average 16.6. So the empirical evidence favors the RealGARCH(1,2) model over the RealGARCH(2,1) model.
Consider next the out-of-sample statistics in Panel B. These likelihood ratio (LR) statistics are computed as

    √(n/m) {ℓ_{RG(2,2)}(r, x) − ℓ_j(r, x)},

where n and m denote the sample sizes, in-sample and out-of-sample, respectively. The in-sample parameter estimates are simply plugged into the out-of-sample log-likelihood, and the asymptotic distribution of these statistics is non-standard, because the in-sample estimates do not solve the first-order conditions out-of-sample, see Hansen (2009).
Table 2: Results for the logarithmic specification for SPY, for both open-to-close and close-to-close returns. Panel A reports in-sample point estimates (ω, α, β1, β2, γ1, γ2, ξ, φ, σu, τ1, τ2) and log-likelihoods ℓ(r, x) and ℓ(r); Panel B reports auxiliary statistics (π, ρ, ρ−, ρ+), for the models G(1,1), RG(1,1), RG(1,2), RG(2,1), RG(2,2), RG(2,2)† and RG(2,2)∗. G(1,1) denotes the LGARCH(1,1) model that does not utilize a realized measure of volatility; RG(2,2)∗ is the (2,2) model extended to include the ARCH term α log r²_{t−1}, the latter being insignificant; RG(2,2)† denotes the RealGARCH(2,2) model without the τ(z) function that captures the dependence between returns and innovations in volatility.
Table 3: Estimates for the RealGARCH(1,2) model, for each of the 29 assets (AA, AIG, AXP, BA, BAC, C, CAT, CVX, DD, DIS, GE, GM, HD, IBM, INTC, JNJ, JPM, KO, MCD, MMM, MRK, MSFT, PG, T, UTX, VZ, WMT, XOM and SPY) together with cross-sectional averages: point estimates (ω, β, γ1, γ2, ξ, φ, σu, τ1, τ2), in-sample log-likelihoods ℓ(r, x) and ℓ(r), out-of-sample log-likelihoods ℓᵒˢ(r, x) and ℓᵒˢ(r), and auxiliary statistics (π, ρ, ρ−, ρ+).
The RealGARCH(2,2) model nests, or is nested in, all of the other models. For nested and correctly specified models, where the larger model has k additional parameters that are all zero under the null hypothesis, the out-of-sample likelihood ratio statistic is asymptotically distributed as

    √(n/m) {ℓ_i(r, x) − ℓ_j(r, x)} →d Z1′Z2,  as m, n → ∞ with m/n → 0,

where Z1 and Z2 are independent, Z_i ∼ N_k(0, I). This follows from, for instance, Hansen (2009, Corollary 2), and the (two-sided) critical values can be inferred from the distribution of |Z1′Z2|. For k = 1 the 5% and 1% critical values are 2.25 and 3.67, respectively, and for two degrees of freedom (k = 2) they are 3.05 and 4.83. When compared to these critical values we find, on average, significant evidence in favor of a model with more lags than the RealGARCH(1,1). The statistical evidence in favor of a leverage function is very strong. Adding the ARCH parameter, α, will (on average) result in a worse out-of-sample log-likelihood. As for the choice between the RealGARCH(1,2), RealGARCH(2,1) and RealGARCH(2,2), the evidence is mixed. In Panel C we report partial likelihood ratio statistics, defined by 2{max_i ℓ_i(r) − ℓ_j(r)}, so that each model is compared with the model that had the best out-of-sample fit in terms of the partial likelihood.⁸ These statistics facilitate a comparison of the Realized GARCH models with the standard GARCH(1,1) model, and we see that the Realized GARCH models also dominate the standard GARCH model in this metric. This is made more impressive by the fact that the Realized GARCH models maximize the joint likelihood, not the partial likelihood used in these comparisons.

⁸There is not a well-developed theory for the asymptotic distribution of these statistics, in part because we are comparing a model that maximizes the partial likelihood (the GARCH(1,1) model) with models that maximize the joint likelihood (the Realized GARCH models).
The leverage function, τ(z), is closely related to the news impact curve that was introduced by Engle and Ng (1993). High-frequency data enable a more detailed study of the news impact curve than is possible with daily returns. A detailed study of the news impact curve that utilizes high-frequency data is Ghysels and Chen (2010). Their approach is very different from ours, yet the shape of the news impact curve they estimate is very similar to ours. The news impact curve shows how volatility is impacted by a shock to the price, and our Hermite specification for the leverage function presents a very flexible framework for estimating the news impact curve. In the log-linear specification we define the news impact curve by

    ν(z) = E(log h_{t+1} | z_t = z) − E(log h_{t+1}),

so that 100ν(z) measures the percentage impact on volatility as a function of a return shock measured in units of standard deviations. As shown in Section 2.2.1, we have ν(z) = γ1 τ(z). We have estimated the log-linear RealGARCH(1,2) model for both IBM and SPY using a flexible leverage function based on the first four Hermite polynomials. The point estimates were (τ̂1, τ̂2, τ̂3, τ̂4) = (−0.036, 0.090, 0.001, −0.003) for IBM and (τ̂1, τ̂2, τ̂3, τ̂4) = (−0.068, 0.081, 0.014, 0.002) for SPY. Note that the Hermite polynomials of orders three and four add little beyond the first two polynomials. The news impact curves implied by these estimates are presented in Figure 2.3. The fact that ν(z) is smaller than zero for some (small) values of z is an implication of its definition, which implies E[ν(z_t)] = 0.
Table 4: In-Sample and Out-of-Sample Likelihood Ratio Statistics for the 29 assets, for the models G11, (1,1), (1,2), (2,1), (2,2), (2,2)† and (2,2)∗. Panel A: in-sample likelihood ratios against the largest model; Panel B: out-of-sample likelihood ratios; Panel C: out-of-sample partial likelihood ratios.
Figure 2.3: News impact curve for IBM and SPY
The estimated news impact curve for IBM is more symmetric about zero than that of SPY, and this empirical result is fully consistent with the existing literature. The most common approach to modeling the news impact curve is to adopt a specification with a discontinuity at zero, such as that used in the EGARCH model by Nelson (1991), τ(z) = τ1 z + τ+(|z| − E|z|). We also estimated the leverage functions with this piecewise linear function, which leads to similar empirical results. Specifically, the implied news impact curves have the most pronounced asymmetry for the index fund, SPY, and the two oil-related stocks, CVX and XOM. However, the likelihood function tends to be larger with the polynomial leverage function, τ(z) = τ1 z + τ2(z² − 1), and the polynomial specification simplifies aspects of the likelihood analysis.

For the multivariate case, an application of the proposed DCC(1,1)-RealGARCH(1,1) extension to the univariate Realized GARCH model is illustrated using four assets, IBM, XOM, SPY and WMT, with fitted results shown in Table 5. Note that this simple extension is a proof of concept of how the RealGARCH framework can be extended to deal with covariance estimation, as a straightforward plug-in to an existing, established framework. The assumption that volatility changes more frequently than the underlying correlation of stocks is justifiable for typical portfolio holding horizons measured in days or longer. Here we can draw on the results of the empirical analysis for the univariate case and expect similar predictive gains for this proposed extension. More sophisticated frameworks would be beneficial when we need to deal with cases where both volatility and correlation are highly non-stationary and change on comparable time scales. This is an area for future research.
          IBM      XOM      SPY      WMT
  h0      0.76     1.17     0.28     2.39
  ω      -0.00     0.04     0.06    -0.03
  β       0.64     0.59     0.55     0.66
  γ       0.36     0.30     0.41     0.30
  ξ       0.01    -0.10    -0.18     0.12
  φ       0.94     1.27     1.04     1.04
  τ1     -0.04    -0.08    -0.07    -0.01
  τ2      0.08     0.08     0.07     0.09
  σu      0.39     0.38     0.38     0.40

  Correlation stage: α̃ = 0.018, β̃ = 0.952.

Table 5: Fitted result for DCC(1,1)-RGARCH(1,1)
3 Asset Return Estimation and Prediction

In the mean-variance portfolio optimization framework, mean often refers to some prediction of future asset returns. Essentially, the framework requires a forecast of the first two moments of the joint distribution of the universe of assets under consideration. Of course, in a jointly Gaussian setting, these are the only two moments needed for deriving an objective value based on a suitably chosen utility function. As is now commonly acknowledged, asset returns are far from Gaussian, and often exhibit characteristic tail and asymmetry behavior that is distinctly different from a simple Gaussian distribution. Whether it is essential to take these added complexities into account depends on the problem at hand, and on whether these stylized facts can be reliably estimated, and important signals extracted, from the underlying, inherently noisy observations. Given the significant improvement in profitability from increasing the investor's edge over a random estimate by even a small fraction, a lot of effort has been spent on modeling this first moment of the asset return dynamics.
3.1 Outline of Some Commonly Used Temporal Models

Random Walk    If we assume that the underlying return process is a symmetric random walk, then the one-step-ahead prediction is simply equal to the current return. That is,

    r_{t+1} | F_t = r_t + ε_{t+1},

where r_t ∈ R^N is the vector of returns, often defined as the difference in the logarithm of prices, and ε_t is an F_t-adapted, zero-mean, iid random variable. This is not a model per se, but a benchmark that is often used to assess the predictive gain of different modeling frameworks.
Auto-Regressive Moving-Average (ARMA) models    The ARMA model (see for example Box and Jenkins (1976)) combines the ideas of auto-regressive (AR) and moving-average (MA) models into a compact and more parsimonious form, so that the number of parameters used is kept small. Note that GARCH, mentioned in Section (2), can be regarded as an ARMA model. There are a number of variations of this versatile framework.

Example 27. Simple ARMA

The classic ARMA(p,q) framework is given by

    Φ(B) r_t = Θ(B) ε_t,    ε_t ∼ N(0, 1),

where Φ(B) = 1 − Σ_{i=1}^p φ_i B^i, Θ(B) = 1 − Σ_{i=1}^q θ_i B^i, B is the lag operator, and {ε_t} is an iid Gaussian innovation series. The series r_t is said to be unit-root stationary if the zeros of Φ(z) are outside the unit circle. For non-stationary series, appropriate differencing needs to be applied before we can estimate the parameters of the model (Box and Jenkins, 1976). The lag-order selection (i.e. the values of p and q) can be done iteratively via the Akaike Information Criterion (AIC) of the fitted result. Recall that

    AIC = 2k − 2 log L,

where k is the number of parameters in the model and L is the likelihood function. The model-fitting algorithm iteratively increases the lag order and selects the final complexity with the lowest AIC, as in the sketch below.
Example 28. ARMA with Wavelet Smoothing

Market microstructure noises, such as bid-ask bounce and price quantization, usually have their maximum impact at sampling frequencies in the region of 5 to 20 minutes (Andersen et al., 2001c). To effectively filter out these noises, we can first decompose the raw return series, r_t, into different resolutions, or more precisely different levels of wavelet details, by an application of wavelet multiresolution analysis (MRA) (Mallat, 1989), followed by a reconstitution of the time series signal using a subset of the wavelet details, so that details below a threshold are filtered out. Recall the definition of the discrete wavelet transform (DWT),

    W = W r_t,

where W ∈ R^{N×N} is the orthogonal real-valued matrix defining the DWT. Consider the decomposition of the vector W into J + 1 sub-vectors, W = (W_1, . . . , W_J, V_J)′, where W_j is a column vector with N/2^j elements and 2^J = N.⁹ So we have

    r_t = W′W = Σ_{n=0}^{N−1} W_n W_n = Σ_{j=1}^J W_j′W_j + V_J′V_J := Σ_{j=1}^J D_j + S_J,    (3.1)

which defines an MRA of our return series r_t: the series is decomposed into a constant vector S_J and J levels of wavelet details D_j, j = 1, . . . , J, each of which contains a time series related to variations in r_t at a certain scale. Here W_n is the n-th row of the DWT matrix W, and the W_j and V_J matrices are defined by partitioning the rows of W commensurately with the partitioning of W into W_1, . . . , W_J and V_J. By discarding the first K − 1 levels of detail, we obtain a locally filtered, smoothed series, r̃_t = Σ_{j=K}^J D_j + S_J, which retains features with time scales larger than a predetermined threshold; a sketch of this smoothing step is given below. Finally, a simple ARMA model is fitted to this filtered series,

    Φ(B) r̃_t = Θ(B) ε_t,    ε_t ∼ N(0, 1),

where, again, AIC can be used to select the optimal lag parameters.

⁹For cases where N is not an integral power of 2, we can simply truncate and discard some part of the data series, or refer to Mallat (1989) if it is necessary to treat it as a special case.
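The following R sketch implements the smoothing step with a Haar DWT written out explicitly. The thesis does not prescribe a particular wavelet family; the Haar filter is used here only to keep the example self-contained, and N must be a power of two (cf. footnote 9).

```r
## Sketch: Haar MRA smoothing -- zero out the finest K-1 detail levels.
haar_smooth <- function(x, K) {
  N <- length(x); J <- log2(N)
  stopifnot(J == round(J))                        # N must be a power of two
  details <- list(); s <- x
  for (j in 1:J) {                                # analysis (pyramid algorithm)
    odd  <- s[seq(1, length(s), 2)]
    even <- s[seq(2, length(s), 2)]
    details[[j]] <- (even - odd) / sqrt(2)        # wavelet coefficients W_j
    s <- (even + odd) / sqrt(2)                   # scaling coefficients V_j
  }
  for (j in J:1) {                                # synthesis, filtering details
    d <- if (j < K) 0 * details[[j]] else details[[j]]
    up <- numeric(2 * length(s))
    up[seq(1, length(up), 2)] <- (s - d) / sqrt(2)
    up[seq(2, length(up), 2)] <- (s + d) / sqrt(2)
    s <- up
  }
  s                                               # smoothed series r~_t
}

## example: with K = 1 nothing is discarded and the input is reconstructed
r <- rnorm(1024); max(abs(haar_smooth(r, K = 1) - r))   # ~ 1e-15
```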
Example 29. ARMA-GARCH

To account for heteroskedasticity of the return series, we can jointly fit an ARMA and a GARCH model, for example an ARMA(1,1)-GARCH(1,1) framework on the raw return series, r_t, so that

    Φ(B) r_t = Θ(B) ε_t,
    h_t = α0 + α1 r²_{t−1} + β1 h_{t−1},
    ε_t ∼ N(0, h_t).

As in Example (27), the algorithm first uses AIC to determine the optimal lag parameters (p*, q*) for the ARMA framework; then, holding these parameter values constant, it fits an ARMA(p*,q*)-GARCH(1,1) model by maximizing the likelihood under a Gaussian density for the innovations, {ε_t}.
Figure 3.1: Output of three variations of the ARMA time series framework. Top panel: simple ARMA; center panel: ARMA with wavelet smoothing; bottom panel: ARMA-GARCH. Black circles are actual data points and red dashed lines indicate the next-period predictions. The dataset is the continuously rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5-minute intervals; the sample period shown is between 2008-05-06 13:40:00 and 2008-05-07 15:15:00. At each point, a model is fitted using the most recent history of 1,000 data points, and a 1-period-ahead forecast is made using the fitted parameters.

See Figure 3.1 for predictions versus actual output for the models given in the three examples above, all based on variations of the ARMA framework.
Innovations State Space Model    The innovations state space model can be written in terms of the underlying measurement and transition equations

    y_t = w(x_{t−1}) x_{t−1} + r(x_{t−1}) ε_t,
    x_t = f(x_{t−1}) x_{t−1} + g(x_{t−1}) ⊗ ε_t,

where y_t denotes the observation at time t, and x_t denotes the state vector containing unobserved components that describe the level, trend and seasonality of the underlying time series; {ε_t} is a white-noise series with zero mean and constant variance σ². Hyndman et al. (2000) give a comprehensive treatment of the frameworks that belong to this class of models and propose a methodology for automatic forecasting that takes into account trend, seasonality and other features of the data. They cover 24 state space models in their framework, including additive and multiplicative specifications for the seasonal component and an additional damped specification for the trend component. Model selection is done using the AIC of the fitted models. The general formulation of the innovations state space model also includes the additive Holt-Winters model (see Holt, 1957 and Winters, 1969) as a specific instance. Recall that the Holt-Winters Additive Model has the following specification:

    l_t = α(y_t − s_{t−m}) + (1 − α)(l_{t−1} + b_{t−1})        (Level)
    b_t = β(l_t − l_{t−1}) + (1 − β) b_{t−1}                   (Growth)
    s_t = γ(y_t − l_{t−1} − b_{t−1}) + (1 − γ) s_{t−m}.        (Seasonal)

The h-step-ahead forecast is then given by

    ŷ_{t+h|t} = l_t + b_t h + s_{t−m+h_m⁺},  where  h_m⁺ = (h − 1) mod m + 1.

If we define ε_t := y_t − ŷ_{t|t−1}, then the Holt-Winters Additive Model can be expressed in the linear innovations framework as

    y_t = [1 1 1] x_{t−1} + ε_t,
    x_t = [1 1 0; 0 1 0; 0 0 1] x_{t−1} + (α, αβ, γ)′ ε_t,

where the state vector is x_t = (l_t, b_t, s_{t−m})′. This is an example of a linear innovations state space model. Figure 3.2 compares the 1-period predicted return with the actual realized return for the sample period between 2008-05-06 13:40:00 and 2008-05-07 15:15:00. At each point, we fit a model using the most recent history of 1,000 data points, then make a 1-period-ahead forecast using the fitted parameters.
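A direct R transcription of the additive Holt-Winters recursions is sketched below (not the thesis code; the smoothing constants, seasonal period and crude initialization are illustrative assumptions).

```r
## Sketch: additive Holt-Winters recursions with a one-step forecast.
hw_additive <- function(y, alpha = 0.2, beta = 0.1, gamma = 0.1, m = 12) {
  n <- length(y)
  l <- numeric(n); b <- numeric(n); s <- numeric(n + m)
  l[1] <- y[1]; b[1] <- 0; s[1:m] <- 0        # crude initialization
  for (t in 2:n) {
    sm   <- s[t]                              # s_{t-m}, stored with offset m
    l[t] <- alpha * (y[t] - sm) + (1 - alpha) * (l[t - 1] + b[t - 1])  # level
    b[t] <- beta  * (l[t] - l[t - 1]) + (1 - beta) * b[t - 1]          # growth
    s[t + m] <- gamma * (y[t] - l[t - 1] - b[t - 1]) + (1 - gamma) * sm # seasonal
  }
  l[n] + b[n] + s[n + 1]                      # 1-step forecast: l + b + seasonal
}

hw_additive(as.numeric(AirPassengers), m = 12)   # illustrative call
```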
Figure 3.2: Output based on the AIC general state space model following Hyndman et al. (2000) and the Holt-Winters Additive model. Black circles are actual data points and red dashed lines indicate the next-period predictions. The dataset is the continuously rolled near-expiry E-mini S&P 500 futures contract traded on the CME, sampled at 5-minute intervals; the sample period shown is between 2008-05-06 13:40:00 and 2008-05-07 15:15:00. At each point, a model is fitted using the most recent history of 1,000 data points, and a 1-period-ahead forecast is made using the fitted parameters.

Neural Network    The neural network is a popular subclass of nonlinear statistical models; see Hastie et al. (2003) for a comprehensive treatment of this class of models. The structural model for a feed-forward back-propagation network, also known as a multilayer perceptron in the neural network literature, is given by

    F(x) = Σ_{m=1}^M b_m S( a_{0m} + Σ_j a_{jm} x_j ) + b_0,

where S: R → (0, 1) is known as the activation function, and {b_m, {a_{jm}}}_{m=0}^M are the parameters specifying F(x). Figure 3.3 illustrates a simple feed-forward neural network with four inputs, three hidden activation units and one output.

Figure 3.3: Diagrammatic illustration of a feed-forward (4,3,1) neural network, with inputs x1, . . . , x4, hidden activation units S1, S2, S3, bias terms, and a single output F(x).

For illustration purposes, a (15,7,1) neural net is fitted to the same dataset, with the tan-sigmoid, S(x) = (e^x − e^{−x})/(e^x + e^{−x}), as the activation function for the hidden layer. The model inputs include lagged returns, lagged trading-direction indicators (+1 for a positive return and −1 for a negative return) and the logarithm of the ratio of market high to market low prices, defined over a look-back period of 1,000 observations, as a proxy for volatility. Figure 3.4 shows the model output.
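A model of this type can be fitted with the nnet package in R, as in the sketch below (not the thesis implementation; nnet uses a logistic hidden-layer activation rather than the tan-sigmoid used above, and the inputs here are synthetic stand-ins for the lagged-return features described in the text).

```r
## Sketch: fitting a (15,7,1) feed-forward network with nnet.
library(nnet)
set.seed(3)
X <- matrix(rnorm(1000 * 15), 1000, 15)          # synthetic stand-in inputs
y <- 0.01 * X[, 1] + rnorm(1000, sd = 0.001)     # synthetic target returns

fit  <- nnet(X, y, size = 7, linout = TRUE,      # 7 hidden units, linear output
             maxit = 500, decay = 1e-4)          # weight decay regularization
yhat <- predict(fit, X)                          # in-sample predictions
```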
3.2 Self-Excited Counting Process and its Extensions
Until recently, the majority of time series analyses related to financial data have been carried out using regularly spaced price data (see Section 3.1 for the outline of some commonly used models based on equally spaced data), with the goal of modeling and forecasting key distributional characteristics of future returns, such as the expected mean and variance. These time series data mainly consist of daily closing prices, for which comprehensive data are widely available for a large set of asset classes. With the recent rapid development of high-frequency finance, the focus has shifted to intra-day tick data, which record every transaction during market hours and come with irregularly spaced time-stamps. We could resample the dataset and apply the same analyses as before, or we could try to exploit the additional information that the inter-arrival times may convey about the likely future trade direction. In order to properly take these irregular occurrences of transactions into account, we can adopt the framework of a point process. In a doubly stochastic framework (see Bartlett, 1963), both the counting process and the driving intensity are stochastic. A point process is called self-excited if the current intensity of the events is determined by events in the past, see Hawkes (1971). It is widely accepted and observed that the volatility of price returns tends to cluster: a period of elevated volatility is likely to be followed by periods of similar levels of volatility. Trade arrivals also exhibit such a clustering effect (Engle and Russell, 1995); for example, a buy order is likely to be followed closely by another buy order. These orders tend to cluster in time. There has been a growing literature on the application of point processes to model inter-arrival trade durations; see for example Engle and Russell (1997), and Bowsher (2003) for a comprehensive survey of the latest modeling frameworks. This section of the thesis extends previous work on the application of self-excited processes to model high-frequency financial data. The extension comes in the form of a marked version of the process, in order to take into account the influence of trade size on the underlying arrival intensity. In addition, by incorporating information from the limit order book (LOB), the proposed framework takes into account a measure of the supply-demand imbalance of the market by parametrizing the underlying base intensity as a function of this imbalance measure. Given that the main purpose of this section is to outline the methods to incorporate trade size and order-book information in a self-excited point process framework, empirical comparison of the proposed framework to other similar and related models is reserved for future research. For a comprehensive introduction to the theory of point processes, see Daley and Vere-Jones (2003) and Bremaud (1980). The following sub-sections give a brief outline of the self-excited point process framework in order to motivate the extensions that follow in Section 3.2.3.
3.2.1 Univariate Case

Consider a simple counting process for the number of trade events, N_t, characterized by the arrival times of the trades, {t_i}_{i ∈ {0,1,2,...,T}}, a sequence of strictly positive random variables on the probability space (Ω, F, P), such that t_0 = 0 and 0 < t_i ≤ t_{i+1} for i ≥ 1. We allow the intensity of the process, λ_t, to be itself stochastic, characterized by the following Stieltjes integral:
    λ_t = µ + ∫₀ᵗ h(t − u) dN_u.
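For the common special case of an exponential excitation kernel, h(s) = α e^{−βs} (an assumption made here only for illustration; the framework does not require it), the intensity at the event times can be computed with the standard Markovian recursion, as in the following R sketch.

```r
## Sketch: Hawkes intensity with exponential kernel h(s) = alpha*exp(-beta*s).
## Uses the recursion A_i = (1 + A_{i-1}) * exp(-beta * (t_i - t_{i-1})),
## where A_i = sum_{j < i} exp(-beta * (t_i - t_j)).
hawkes_intensity <- function(times, mu, alpha, beta) {
  n <- length(times); A <- numeric(n)
  for (i in 2:n)
    A[i] <- (1 + A[i - 1]) * exp(-beta * (times[i] - times[i - 1]))
  mu + alpha * A                       # intensity just before each event
}

## clustered events produce elevated intensities at the cluster
hawkes_intensity(c(0, 0.10, 0.15, 3.00, 3.05), mu = 0.5, alpha = 0.8, beta = 2)
```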
#"
x∗ −λ∗
73
#
" =
−c b
# (4.5)
Note that we have assumed that A has full row rank. Since m ≤ n, the imposition of m linearly independent equality constraints on a problem with n variables can be viewed as reducing the dimensionality of the optimization to n − m. Let Z ∈ N(A), i.e. Z is in the null space of A, and Y ∈ R(A′), i.e. Y is in the range of A′; then (4.4) can be simplified to give

    min_{x_Z ∈ R^{n−m}}  x_Z′ Z′(c + H Y x_Y*) + (1/2) x_Z′ Z′H Z x_Z − µ Σ_{x_i ∈ x} log x_i,    (4.6)

where the unique decomposition of the n-vector x is given by x = Y x_Y + Z x_Z.
Let X_Z^{−2} = diag(0, . . . , 0, x_Z^{−2}). If Z′(H + µX_Z^{−2})Z ≻ 0, then the unique solution to (4.6), x_Z*, must satisfy the equations

    Z′H Z x_Z* = µ X_Z^{−1}e − Z′(c + H Y x_Y*),    (4.7)

where the term µX_Z^{−1}e on the right-hand side is due to the barrier. Because of this barrier term, the system in (4.7) is clearly nonlinear in x_Z, ruling out the direct use of a simple conjugate gradient (CG) method.
Based on a similar analysis to the one above, an IQP can be posed as solving a sequence of unconstrained QPs in a reduced space in R^{n−m}, using a line-search method. Let p denote the step from x to x*, so that p = x* − x. We can write (4.5) in terms of p, to give

    g̃ + H̃p = A′λ*,    (4.8)
    Ap = −v,    (4.9)

where v = Ax − b. Observe that p itself is the solution of an equality-constrained QP,

    min_{p ∈ R^n}  g̃′p + (1/2) p′H̃p    s.t.  Ap = −v.

Consider the partition x = (x_B, x_S) into m basic and n − m superbasic variables, together with the corresponding partition A = (B, S), for some full-rank square matrix B ∈ R^{m×m},¹² S ∈ R^{m×(n−m)}, and p = (p_B, p_S) with p_B ∈ R^{m×1} and p_S ∈ R^{(n−m)×1}; then (4.9) can be written as

    B p_B = −v − S p_S

¹²E.g. we could use the pivoting permutation, as part of the QP decomposition, to find the full-rank square matrix B.
74
and we have
"
pB
#
" =
pS
"
"
B −1
W = Clearly we have the
AQ =
h
I
0
i
13
# ,Z=
n × (n − m)
"
" ,Q
I
pS .
I
#
−B −1 S
#
−B −1 S
v+
0
"
0
pS #
B −1
= − Let
#
−B −1 v − B −1 SpS
−1
=
#
B
S
0
In−m
.
reduced-gradient basis, Z ∈ N (A)
matrix of
and the product
. We have
p = W pB + ZpS , where the values of
pB
and
pS
can be readily determined by the following methods,
. pB :
by (4.9) and (4.10), we have
. pS :
multiply (4.8) by
ZT
(4.10)
Ap = AW pB = −v
which simplies to
pB = −v ;
and using (4.10), we obtain
e S = −Z > ge − Z > HY e pB , Z > HZp which has a unique solution for If
x
is feasible (so that
v = 0),
then
pS
since
(4.11)
e 0. Z > HZ
pB = 0, p ∈ N (A)
and (4.11) becomes
e S = −Z > ge. Z > HZp In general, the reduced-gradient form of
Z
is not formed or stored explicitly. Instead, the search
direction is computed using vectors of the form
"
where
Buξ = −Sξ ,
Similarly,
Z >ζ ,
for some
n-vector ζ =
>
Z ξ=
where
Buζ1 = −Sζ1 ,
h
for some
#
ζ1
−B
−1
ζ2
S
and can be solved using
"
ξ ∈ R(n−m)×1 ,
uξ
ξ=
I
and can be solved using
h
Zξ ,
−B −1 S
Zξ =
(4.12)
#
ξ
LU-decomposition
i>
I
such that
B,
of
uξ = −U −1 L−1 Sξ .
i.e.
, can be expressed as
i
"
ζ1
#
ζ2
" =
uζ1
#
ζ2
LU-decomposition
of
B,
i.e.
uζ1 = −U −1 L−1 Sζ1 .
Since these vectors are obtained by solving systems of equations that involve may be represented using only a factorization (such as the
B.
that Ye ∈ N (A) only if Y > Z = 0, e.g. when Q is orthogonal.
75
and
B>,
thus
Z
LU-decomposition ) of the m × m matrix
The proposed algorithm is given in Algorithm 1.
13 Note
B
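The implicit representation is illustrated by the R sketch below (not the thesis code; solve() stands in for reusing stored LU factors of B, which a production implementation would compute once).

```r
## Sketch: products with Z = [ -B^{-1} S ; I ] without forming Z explicitly.
Zx  <- function(B, S, xi)     c(-solve(B, S %*% xi), xi)           # Z %*% xi
Ztz <- function(B, S, z1, z2) c(z2 - t(S) %*% solve(t(B), z1))     # Z' %*% zeta

## quick check that Z*xi lies in the null space of A = [B S]
set.seed(2); m <- 3; n <- 7
B  <- matrix(rnorm(m * m), m) + 3 * diag(m)       # nonsingular basic block
S  <- matrix(rnorm(m * (n - m)), m)
xi <- rnorm(n - m)
max(abs(cbind(B, S) %*% Zx(B, S, xi)))            # ~ 1e-15: A (Z xi) = 0
```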
Algorithm 1 Null Space QP

P ← [form permutation matrix] via the pivoting scheme of the QR decomposition of the matrix Ǎ, such that ǍP = Ã = (B̃, S̃), where the full-rank square matrix B̃ ∈ R^{m×m} is given by the first m columns of Ã; this induces the partition of x̌ into m basic and n − m superbasic variables.
Given: µ⁽⁰⁾, σ, max-outer-iteration, max-CG-iteration.
Start with a feasible value x⁽⁰⁾, which implies p_B = 0; p_B remains zero for all subsequent iterations.
k ← 1
while k ≤ max-outer-iteration do
    H̃_k ← H + µ⁽ᵏ⁾ X_k^{−2}
    g̃_k ← c + Hx⁽ᵏ⁾ − µ⁽ᵏ⁾ X_k^{−1}e
    calculate p_S⁽ᵏ⁾ based on (4.12), where Z′H̃_k Z p_S = −Z′g̃_k. We solve this linear system with the truncated CG Algorithm 3 (setting the maximum iteration count to max-CG-iteration). Note that rather than forming Z explicitly, we use the stored LU-decomposition of B for the matrix-vector product Z p_S.
    p⁽ᵏ⁾ ← (0, p_S⁽ᵏ⁾)
    [calculate step-length]
    if x⁽ᵏ⁾ + p⁽ᵏ⁾ > 0 then α⁽ᵏ⁾ = 1 else α⁽ᵏ⁾ = 0.9 end if
    [new update]
    x⁽ᵏ⁺¹⁾ ← x⁽ᵏ⁾ + α⁽ᵏ⁾ p⁽ᵏ⁾
    k ← k + 1
    if ‖g̃_{k+1}‖ − ‖g̃_k‖ > −ε then µ⁽ᵏ⁺¹⁾ ← σµ⁽ᵏ⁾ else µ⁽ᵏ⁺¹⁾ ← µ⁽ᵏ⁾ end if
end while
The final result is given by x^{(k*)} P′, where k* is the terminating iteration.
Figure 4.2: Null-space line-search algorithm result. Top panel: value of the barrier parameter, µ, as a function of the iteration. Bottom panel: 1-norm of the error. σ is set to 0.8 and the initial feasible value is x0 = (0.3, 0.5, 0.2, 0.7, 0.3)′. The trajectory of the 1-norm error in the bottom panel illustrates that the algorithm stayed within the feasible region of the problem.
Small Scale Simulated Result    Consider the case where

    H =
    ⎡  6.97  −0.34  −0.14  −1.52   1.70 ⎤
    ⎢ −0.34   8.04  −0.11  −0.44   0.17 ⎥
    ⎢ −0.14  −0.11   8.24   0.07   5.11 ⎥
    ⎢ −1.52  −0.44   0.07   7.28  −0.41 ⎥
    ⎣  1.70   0.17   5.11  −0.41   3.93 ⎦,

    c = (4.89, 2.64, 3.40, 4.62, 0.30)′,

and

    A = [ 1 1 0 0 1 ; 0 0 1 0 0 ],    b = (1.0, 0.2)′.

By using the built-in QP solver in R, which implements the dual method of Goldfarb and Idnani (1982, 1983), we get the optimal solution x* = (0.68, 0.11, 0.20, 0.31, −0.00)′ and the corresponding objective function value of 7.61. Note the slight violation of the non-negativity constraint in the last variable. Figure 4.2 shows the result of our line-search algorithm, which converges to the same result while staying strictly in the interior of the constraints.
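A sketch of the corresponding quadprog call is given below. solve.QP minimizes −d′x + (1/2)x′Dx subject to A′x ≥ b₀, with the first meq constraints treated as equalities, so we pass d = −c and stack the equality rows with the x ≥ 0 bounds. The data are as reconstructed above, and solve.QP requires a positive-definite H.

```r
## Sketch: the small example via quadprog (dual method of Goldfarb-Idnani).
library(quadprog)
H <- matrix(c( 6.97, -0.34, -0.14, -1.52,  1.70,
              -0.34,  8.04, -0.11, -0.44,  0.17,
              -0.14, -0.11,  8.24,  0.07,  5.11,
              -1.52, -0.44,  0.07,  7.28, -0.41,
               1.70,  0.17,  5.11, -0.41,  3.93), 5, 5)
cvec <- c(4.89, 2.64, 3.40, 4.62, 0.30)
A    <- rbind(c(1, 1, 0, 0, 1),
              c(0, 0, 1, 0, 0))
b    <- c(1.0, 0.2)

sol <- solve.QP(Dmat = H, dvec = -cvec,
                Amat = cbind(t(A), diag(5)),   # equality rows, then x >= 0
                bvec = c(b, rep(0, 5)), meq = 2)
sol$solution   # optimal weights
sol$value      # objective value c'x + (1/2) x'Hx
```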
4.2.2 Empirical Analysis

Data    Empirical results in this section are based on the daily closing prices for 500 of the most liquid stocks, as measured by the median 21-day daily transacted volume, traded on the main stock exchanges in the US, spanning the period between 2008-01-22 and 2009-01-22.

Figure 4.3: Histogram of the mean daily return for the 500 most actively traded stocks between 2008-01-22 and 2010-01-22.

The setup of the mean-variance optimization problem is as follows:
• H is the variance-covariance matrix of the log returns, based on the open-to-close difference in log-prices;
• c is set equal to the mean of the log returns over the sample period, see Figure 4.3;
• a total of two equality constraints are set such that
    – weights for stocks whose ticker begins with the letter A sum to 1/2;
    – weights for stocks whose ticker begins with the letter B sum to 1/2;
• inequality constraints are set such that each of the 500 weights is larger than −10, i.e. x_i ≥ −10 ∀i.
Result. Figure 4.4 shows the result using the built-in R routine based on the dual method of Goldfarb and Idnani (1982, 1983), for the cases with and without the inequality constraints. For the inequality constrained case, the number of iterations until convergence is 30, and the number of active constraints at the solution is 29. Comparing this with the result based on Algorithm 1, shown in Figure 4.5, we see that the two results are clearly not identical. This is an artifact of the iterative routines used by the two methods. In terms of processing time, the built-in R function is faster than our algorithm. Figures 4.6 and 4.7 show the convergence diagnostics. Table 7 shows the diagnostic output from Algorithm 1 at each outer iteration.
4.3 Solving CCQP with a Global Smoothing Algorithm

Murray and Ng (2008) propose an algorithm for nonlinear optimization problems with discrete variables. This section extends that framework by casting the problem in the portfolio optimization setting.
Figure 4.4: Optimization output using the built-in R routine based on the dual method of Goldfarb and Idnani (1982, 1983). Top panel: with equality constraints only (min x: −41.3, max x: 120, objective value: −0.339). Bottom panel: with equality and inequality constraints (min x: −10, max x: 30.3, objective value: −0.306). min x and max x are the minimum and maximum of the solution vector x, respectively.

Figure 4.5: Optimization output using Algorithm 1. Top panel: with equality constraints only (min x: −39.2, max x: 50.2, objective value: −0.333). Bottom panel: with equality and inequality constraints (min x: −9.98, max x: 56.5, objective value: −0.309). min x and max x are the minimum and maximum of the solution vector x, respectively.
Figure 4.6: Algorithm 1 convergence diagnostics. Top panel: value of the barrier parameter µ; center panel: 1-norm of the difference between the objective value at each iteration and the true value (defined as the value calculated by the R function); bottom panel: the number of CG iterations at each outer iteration.

Figure 4.7: Additional Algorithm 1 convergence diagnostics. Top panel: step-length α; center panel: objective function value; bottom panel: 2-norm of the gradient.
Table 7: Null-space CG algorithm output for the first 21 iterations. ngrad.r: norm of reduced gradient; ngrad: norm of gradient; obj: objective function value; cond: condition number of the Hessian; cond.r: condition number of the reduced Hessian; mu: barrier parameter; step: step length; res.pre: norm of residual before CG; res.post: norm of residual after CG. Maximum outer iterations: 20; maximum CG iterations: 30.

    iter  ngrad.r   ngrad     obj     cond      cond.r    mu        step   res.pre   res.post
    1     2.12e-02  2.07e-02  -0.253  7.45e+03  7.81e+03  1.00e-03  74.90  6.21e-16  1.79e-15
    2     4.75e-02  4.98e-02  -0.272  7.07e+03  9.92e+03  1.00e-03  20.22  1.79e-15  5.56e-15
    3     1.26e-03  1.29e-03  -0.274  7.64e+03  1.45e+04  1.00e-03   6.95  5.56e-15  2.81e-15
    4     1.15e-03  1.21e-03  -0.286  1.38e+04  2.40e+04  6.00e-04  13.96  2.81e-15  2.84e-15
    5     8.32e-04  8.86e-04  -0.294  2.21e+04  7.06e+04  3.60e-04  13.40  2.84e-15  5.56e-15
    6     7.82e-04  8.42e-04  -0.300  3.16e+04  1.61e+05  2.16e-04  11.46  5.56e-15  6.66e-15
    7     2.59e-03  2.64e-03  -0.299  7.30e+04  3.29e+05  2.16e-04   6.30  6.66e-15  1.19e-14
    8     1.49e-03  1.59e-03  -0.300  3.74e+04  1.92e+05  2.16e-04   7.31  1.19e-14  1.27e-14
    9     7.28e-04  8.51e-04  -0.304  4.66e+04  2.96e+05  1.30e-04   7.78  1.27e-14  1.64e-14
    10    6.71e-04  8.59e-04  -0.306  5.08e+04  3.49e+05  7.78e-05   9.08  1.64e-14  2.16e-14
    11    5.42e-04  8.46e-04  -0.306  4.61e+04  3.92e+05  7.78e-05   5.23  2.16e-14  9.33e-15
    12    4.03e-04  7.67e-04  -0.306  3.99e+04  3.99e+05  7.78e-05   6.23  9.33e-15  7.99e-15
    13    6.17e-04  8.67e-04  -0.308  4.80e+04  4.75e+05  4.67e-05   4.65  7.99e-15  5.35e-15
    14    2.45e-03  2.57e-03  -0.308  1.13e+05  1.62e+06  4.67e-05   2.79  5.35e-15  1.38e-14
    15    1.17e-03  1.39e-03  -0.308  3.42e+04  5.16e+05  4.67e-05   5.05  1.38e-14  1.61e-14
    16    5.85e-04  8.77e-04  -0.309  3.74e+04  5.22e+05  2.80e-05   3.25  1.61e-14  1.61e-14
    17    6.72e-04  9.40e-04  -0.309  9.73e+04  1.48e+06  1.68e-05   0.00  1.61e-14  1.61e-14
    18    6.72e-04  9.40e-04  -0.309  9.73e+04  1.48e+06  1.68e-05   0.00  1.61e-14  1.61e-14
    19    6.72e-04  9.40e-04  -0.309  9.73e+04  1.48e+06  1.68e-05   0.00  1.61e-14  1.61e-14
    20    6.72e-04  9.40e-04  -0.309  9.73e+04  1.48e+06  1.68e-05   0.00  1.61e-14  1.61e-14
    21    6.72e-04  9.40e-04  -0.309  9.73e+04  1.48e+06  1.68e-05   0.00  1.61e-14  1.61e-14
There is an equivalence between the CCQP problem and that of finding the global minimizer of a problem with continuous variables (see for example Ge and Huang, 1989). However, it is insufficient to introduce only a penalty function to enforce integrality, as this risks introducing a large number of local minima, which can significantly increase the complexity of the original problem. The idea of the global smoothing algorithm is to add a strictly convex function to the original objective, together with a suitable penalty function. The algorithm then iteratively increases the penalty on non-integrality and decreases the amount of smoothing introduced.
4.3.1 Framework

Here we add the cardinality constraint (4.3), and solve the following mixed integer optimization problem:

    min_x  f(x) = c^⊤x + ½ x^⊤Hx                      (4.13)
    s.t.   Ax = b                                      (4.14)
           y_i ≥ x_i ≥ 0,   i = 1, …, n                (4.15)
           Σ_i y_i = K,   y_i ∈ {0, 1}                 (4.16)

where c, x ∈ R^{n×1}, H ∈ R^{n×n}, A ∈ R^{m×n}, b ∈ R^{m×1} and m ≤ n. The transformed problem becomes:
    min_{x∈R^n}  F(x, y) = f(x) + (γ/2) Σ_i y_i(1 − y_i) − µ Σ_i (log s_i + log x_i)      (4.17)

where the first sum is the penalty term and the second is the smoothing term, subject to

    [ A   0   0  ] [x]   [b]
    [ 0   e^⊤ 0  ] [y] = [K]
    [−I   I  −I  ] [s]   [0],

where e ∈ R^{n×1} is a column vector of ones, s is the slack variable, γ is the parameter that governs the degree of penalty on non-integrality, and µ is the parameter that governs the amount of global smoothing introduced into the problem. Let

    Ǎ = [ A   0   0  ]        [x]         [b]
        [ 0   e^⊤ 0  ],  x̌ = [y],   b̌ = [K],
        [−I   I  −I  ]        [s]         [0]

then the gradient and Hessian can be expressed as

    ǧ = ∇F(x, y) = ( g_1 − µ/x_1, …, g_n − µ/x_n,  (γ/2)(1 − 2y_1), …, (γ/2)(1 − 2y_n),  −µ/s_1, …, −µ/s_n )^⊤,

    Ȟ = ∇²F(x, y) = [ Ȟ_11   0     0
                       0     Ȟ_22   0
                       0     0     Ȟ_33 ],

where g = c + Hx, Ȟ_11 = H + diag(µ/x_i²), Ȟ_22 = diag(−γ), and Ȟ_33 = diag(µ/s_i²).
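As a concrete check of these expressions, the sketch below evaluates F(x, y, s) and the stacked gradient exactly as written above (a minimal illustration; H, cvec, gamma and mu are assumed given, and g = c + Hx):

    # F(x, y, s) = f(x) + (gamma/2) * sum(y * (1 - y)) - mu * sum(log(s) + log(x))
    F_smooth <- function(x, y, s, H, cvec, gamma, mu) {
      f <- sum(cvec * x) + 0.5 * drop(t(x) %*% H %*% x)
      f + (gamma / 2) * sum(y * (1 - y)) - mu * sum(log(s) + log(x))
    }

    # gradient blocks: g - mu/x, (gamma/2) * (1 - 2y), -mu/s, with g = c + Hx
    grad_smooth <- function(x, y, s, H, cvec, gamma, mu) {
      g <- cvec + drop(H %*% x)
      c(g - mu / x, (gamma / 2) * (1 - 2 * y), -mu / s)
    }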
An outline of the global smoothing CCQP algorithm is given in Algorithm 2. Algorithm 3 is an adaptation of the conjugate gradient method that produces a descent search direction and guarantees that the fast convergence rate of the pure Newton method is preserved, provided that the Newton step length is used whenever it satisfies the acceptance criteria. The CG iteration is terminated once a direction of negative curvature is obtained.
4.3.2 Empirical Analysis

Empirical results to illustrate an application of the global smoothing algorithm are based on the daily closing prices for 500 of the most liquid stocks, as measured by the median 21-day daily transacted volume, traded on the main stock exchanges in the US, spanning the period from 2008-01-22 to 2009-01-22. Algorithm 2 illustrates an implementation of the framework to solve the specific problem given below.
Algorithm 2 Global Smoothing CCQP
[form permutation matrix] P ← via the pivoting scheme of the QR decomposition of the matrix Ǎ, such that ǍP = [B̃ S̃], where the full rank square matrix B̃ ∈ R^{m×m} is simply given by the partition of Ã; partition x̌ into m basic and n − m superbasic variables
Given: µ_0, σ, ν and γ_0
start with a feasible value x̌_0, which implies p_B ≡ 0 and remains zero for all subsequent iterations
k ← 1
while not-converged do
    H̃_k ← P^⊤ȞP and g̃_k ← P^⊤ǧ, where x_k is simply the first n elements of the vector P x̌_k and y_k the second n elements
    calculate p_S based on (4.12), where we solve the potentially indefinite Newton system Z^⊤H̃_k Z p_S = −Z^⊤g̃_k with either the Truncated CG Algorithm 3 or the Modified Cholesky Factorization Algorithm 4
    p_k ← [0; p_S]
    calculate the maximum feasible step α_M along p_k using the backtracking line search method in Algorithm 5
    x̌_{k+1} ← x̌_k + α_k p_k
    k ← k + 1
    if ‖g̃_{k+1}‖ − ‖g̃_k‖ > −ε then µ_{k+1} ← σµ_k and γ_{k+1} ← γ_k/σ^ν else µ_{k+1} ← µ_k and γ_{k+1} ← γ_k end if
end while
[final result] x̌_{k*} P^⊤, where k* is the terminating iteration
Algorithm 3 Truncated CG
while not-reached-maximum-iteration do
    if d_j^⊤H̃_k d_j ≤ ξ then
        if j = 0 then p_k ← −g̃_k else p_k ← z_j end if
    end if
    α_j ← r_j^⊤r_j / d_j^⊤H̃_k d_j
    z_{j+1} ← z_j + α_j d_j
    r_{j+1} ← r_j + α_j H̃_k d_j
    if ‖r_{j+1}‖ < ε_k then p_k ← z_{j+1} end if
    β_{j+1} ← r_{j+1}^⊤r_{j+1} / r_j^⊤r_j
    d_{j+1} ← −r_{j+1} + β_{j+1} d_j
end while
Algorithm 4 Modified Cholesky Factorization
Given: ε_M > 0, the relative machine precision, where a_ij is the (i, j) element of the matrix A
γ ← max(|a_ii| : i = 1, 2, …, n), ξ ← max(|a_ij| : i ≠ j)
β ← max(γ, ξ/√(n² − 1), ε_M)
Â ← H̃_k
for l ← 1, 2, …, n do
    µ_l ← max{|â_lj| : j = l + 1, l + 2, …, n}
    r_ll ← max{√|â_ll|, µ_l/β}
    e_ll ← r_ll² − â_ll
    for j ← l + 1, …, n do r_lj ← â_lj / r_ll end for
    for i, j ← l + 1, …, n do â_ij ← â_ij − r_li r_lj end for
end for
Algorithm 5 Backtracking Line Search (specialized to enforce 0 ≤ x ≤ 1)
Given: ᾱ > 0, ρ ∈ (0, 1), c ∈ (0, 1); α ← ᾱ
while F(x̌_k + α p_k) > F(x̌_k) + c α g̃_k^⊤p_k do α ← ρα end while
α_k ← α
    min_{x,y}  f(x) = c^⊤x + ½ x^⊤Hx
    s.t.  y_i − x_i = s_i,   i = 1, …, n
          Σ_i y_i = K,   y_i ∈ {0, 1}
          s_i ≥ 0,   x ≥ 0.

The transformed problem is

    min_{x∈R^n}  F(x, y) = f(x) + (γ/2) Σ_i y_i(1 − y_i) − µ Σ_i (log y_i + log(1 − y_i) + log s_i + log x_i)
    s.t.  [ 0   e^⊤  0 ] [x]   [K]
          [−I   I   −I ] [y] = [0].
                         [s]

Figure 4.8 shows the converged output for K = 250, and Figure 4.9 for K = 100.
While the proposed algorithm is able to solve small to medium scale problems, convergence is often sensitive to how γ and µ are modified at each iteration. Further research is needed to control these parameters to ensure consistent convergence performance and to generalize the algorithm to deal with more general constraints. Therefore, we treat this proposed algorithm as a prototype and a proof of concept that such a framework can potentially be adapted to solve a CCQP portfolio problem.
Figure 4.8: Converged output for K = 250. Panel 1: the first 500 variables are x_i, the second 500 are y_i, and the last 500 are s_i; Panel 2: histogram of x_i; Panel 3: histogram of y_i; Panel 4: histogram of s_i.

Figure 4.9: Converged output for K = 100. Panel 1: the first 500 variables are x_i, the second 500 are y_i, and the last 500 are s_i; Panel 2: histogram of x_i; Panel 3: histogram of y_i; Panel 4: histogram of s_i.
In the next section, we propose another approach to solving the CCQP, one that is both robust and scales well with the size of the underlying problem.
4.4 Solving CCQP with a Local Relaxation Approach

Without loss of generality, we replace the generic constraint Ax ≥ b in (4.2) by the equality constraint Σ_i x_i = 0, so that it mimics the case of a dollar neutral portfolio. The resulting CCQP problem is given by

    min_x  f(x) = −c^⊤x + λ x^⊤Hx
    s.t.   Σ_i x_i = 0
           Σ_i 1_{x_i ≠ 0} = K.

Murray and Shek (2011) propose an algorithm that explores the inherent similarity of asset returns, and relies on solving a series of relaxed problems in a reduced dimensional space by first projecting and then clustering returns in this space. This approach mimics line-search and trust-region methods for solving continuous problems, and it uses a local approximation to the problem to determine an improved estimate of the solution at each iteration. Murray and Shanbhag (2007) and Murray and Shanbhag (2006) have used a variation of the local relaxation approach to solve a grid-based electrical substation siting problem, and have demonstrated that the growth in effort with the number of integers is slow with their proposed algorithm vis-à-vis an exponential growth for a number of commercial solvers. The algorithm proposed here differs from theirs in a number of key areas. First, instead of a predefined 2-dimensional grid, we project asset returns onto a multi-dimensional space based on the statistical properties of historic returns. Second, instead of using physical distance, we define the distance metric as a function of factor loadings, risk aversion and expected return. Third, given the typical cardinality relative to the size of the search space and cluster group size, we cannot assume that the clusters do not overlap; hence we propose a necessary probabilistic procedure to assign cluster members to each centroid at each iteration.
4.4.1 Clustering

Recall that the goal of minimizing portfolio risk is closely related to asset variances and their correlations: our objective is to maximize the expected return of the portfolio while penalizing its variance. It is intuitive to think that assets from different sectors exhibit different correlation dynamics. However, empirical evidence seems to suggest that this dynamic is weak relative to the idiosyncratic dynamics of stocks both within and across sectors. Figure 4.10 shows the ordered covariance matrix for 50 assets, 5 from each of the 10 sectors defined by Bloomberg, such that the first group of stocks (JCP, CBS, …, COH) belongs to Consumer Discretionary, the second group of stocks (HNZ, KO, …, GIS) belongs to Consumer Staples, etc. We see that the covariance matrix does not seem to show any noticeable block structure that would suggest collective sectoral behavior. Covariance matrices calculated using different sample periods also confirm this observation.

Figure 4.10: Heatmap of the covariance matrix for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01. Darker colors indicate correlation closer to 1 and lighter colors closer to −1.

Instead of relying on sector definitions to cluster assets, we can use the historic correlation of returns as a guide to identify assets that behave similarly. First we define a distance metric, d_ij, for a pair of assets i and j,

    d_ij = √(2(1 − ρ_ij)),                       (4.18)

where ρ_ij is the Pearson correlation coefficient based on the pairwise price-return time series. This definition penalizes negatively correlated stocks more than stocks that are not correlated, which makes sense as our goal is to group stocks with similar characteristics as measured by the co-movement of their returns. Using this distance metric, we can construct a minimum spanning tree (MST) (see for example Russell and Norvig (1995)) that spans our asset pairs with edge lengths given by (4.18). From Figure 4.11, we see that this method has identified some cluster structures, and verified that the dominant dimension is not along sector classification. The MST is a two-dimensional projection of the covariance matrix.
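A minimal R sketch of this construction (illustrative only; ret is an assumed T × N matrix of returns) computes the distance matrix of (4.18) and grows the MST with Prim's algorithm:

    # pairwise distances d_ij = sqrt(2 * (1 - rho_ij)) from (4.18), followed by
    # Prim's algorithm for the minimum spanning tree
    mst_edges <- function(ret) {
      D <- sqrt(2 * (1 - cor(ret)))            # cor() gives the Pearson coefficients
      n <- ncol(D)
      in_tree <- c(TRUE, rep(FALSE, n - 1))    # seed the tree at asset 1
      edges <- matrix(NA_integer_, n - 1, 2)
      for (e in seq_len(n - 1)) {
        Dsub <- D[in_tree, !in_tree, drop = FALSE]
        k <- which(Dsub == min(Dsub), arr.ind = TRUE)[1, ]
        edges[e, ] <- c(which(in_tree)[k[1]], which(!in_tree)[k[2]])
        in_tree[edges[e, 2]] <- TRUE
      }
      edges                                    # the n - 1 edges of the MST
    }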
An alternative, higher-dimensional method is to cluster the assets in a suitably chosen orthogonal factor space spanned by a more comprehensive set of basis vectors. Given a return matrix, R ∈ R^{T×N}, we aim to project our universe of N asset returns onto an orthogonal k-dimensional space, where often k ≪ N. We can write

    R = P_k V_k^⊤ + U,

where P_k ∈ R^{T×k} contains the returns of our k factors, V_k ∈ R^{N×k} is the matrix of factor loadings, and U ∈ R^{T×N} is the matrix of specific returns. This reduces the dimension of our problem from N to k, such that the i-th column of the return matrix R, r_i ∈ R^T, can be written as a linear combination of the k factors,

    r_i = v_{1,i} p_1 + v_{2,i} p_2 + ⋯ + v_{k,i} p_k + u_i,

where v_{i,j} ∈ R is the (i, j) component of the matrix V_k, and p_i and u_i are the i-th columns of the matrices P and U, respectively.
Figure 4.11: A minimum spanning tree for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01, using the distance metric defined in (4.18). Each color identifies a different sector.
One method to identify the k factors is principal component analysis (PCA).
An outline of the procedure can be expressed as follows:

• obtain the spectral decomposition¹⁴ (1/T) R^⊤R = V ΛV^⊤, where V and Λ are ordered by the magnitude of the eigenvalues along the diagonal of Λ;
• form the principal component matrix: P = RV;
• finally, take the first k columns of P and V to give P_k and V_k, known as the principal component and factor loading matrices, respectively.

Once we have identified the k-dimensional space, the proposed clustering method works as follows:

• define a cluster distance metric d(r_i, r_j) = ‖v_i − v_j‖_2, where v_i ∈ R^k and v_j ∈ R^k are the i-th and j-th rows of the factor loading matrix V_k;
• filter outliers in this space by dropping assets with distance significantly greater than the median distance from the center-of-gravity¹⁵ of the k-dimensional space;
• identify clusters in this metric space (using for example k-means clustering; see Hastie et al. (2003) for details).

¹⁴ Here we assume the columns of R have zero mean.
¹⁵ The center-of-gravity is defined in the usual sense by assigning a weight of one to each asset in the projected space; the center is then the position of a point mass, equal to the sum of all weights, that balances the levered weight of the assets.
Figure 4.12: k-means clustering in a space spanned by the first three principal components, based on log returns for 50 actively traded stocks, over the sample period 2008-01-10 to 2009-01-01.
Figure 4.12 illustrates the proposed clustering method using the returns of 50 actively traded US stocks. Here we have set the number of clusters to three, dropped four outlier stocks (FE, CBS, CTL and PTV) based on the criteria listed above, and then projected the remaining 46 returns onto a three-dimensional space spanned by the three dominant principal components.
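The whole projection-and-cluster pipeline can be sketched in base R (illustrative only; assumptions are flagged in the comments):

    # project assets onto the k leading principal components, filter outliers,
    # and cluster with k-means (ret is an assumed T x N matrix of demeaned log
    # returns; the 2x-median cutoff is an assumed stand-in for "significantly
    # greater than the median")
    cluster_assets <- function(ret, k = 3, n_clusters = 3) {
      Omega <- crossprod(ret) / nrow(ret)    # (1/T) R'R
      ev <- eigen(Omega, symmetric = TRUE)   # eigenvalues come back ordered
      Vk <- ev$vectors[, 1:k]                # rows are the assets' factor loadings
      ctr <- colMeans(Vk)                    # center of gravity with unit weights
      d <- sqrt(rowSums(sweep(Vk, 2, ctr)^2))
      keep <- d <= 2 * median(d)             # drop outliers far from the center
      km <- kmeans(Vk[keep, ], centers = n_clusters)
      list(loadings = Vk, keep = keep, cluster = km$cluster)
    }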
Distance Measure

We define the distance measure for an asset, s, in the projected space relative to an arbitrary reference point, a_0, by

    d_m(s; a_0) = σ_d ‖v_s − a_0‖_2 + (1 − σ_d)(‖c‖_∞ − |c_s|),        (4.19)

where c_s is the s-th element of the vector of expected returns, c. The economic interpretation is that we use the tuning parameter σ_d to control the trade-off between picking stocks that are highly correlated and stocks that have higher expected returns. Note that σ_d = 1 corresponds to the simple Euclidean distance measure.
Identify Cluster Members

Once the initial starting K centroid assets have been identified, the proposed algorithm identifies the member assets by the center-of-gravity method, defined for the i-th cluster as

    g_i = V_k^⊤ x_r^{*(i)} / e^⊤ x_r^{*(i)},                          (4.20)

where the vector x_r^{*(i)} corresponds to the optimal weights (see the relaxed-neighborhood QP in Section 4.4.2) of the assets in the i-th cluster. The member assets are picked according to their degree of closeness to centroid i, defined by the distance measure d_m(s; g_i) for asset s.

Once the corresponding member assets have been identified for each cluster, the algorithm proposes a new set of K centroids at the end of each iteration. For the next iteration, new member assets need to be calculated for each cluster. The composition of each new cluster clearly depends on the order in which the centroids are picked. For example, clusters A and B could be sufficiently close in the projected space that there exists a group of assets equally close to both. Since assets cannot belong to more than one group, once cluster A has claimed an asset, the asset drops out of the feasible set of assets for cluster B. Let p_i be the probability of centroid asset i being picked for cluster assignment; we propose the following logistic transform to parametrize this probability:

    log(p_i / (1 − p_i)) = α_s (x*_{o,i} / Σ_i x*_{o,i} − 1/K),        (4.21)

where x*_{o,i} corresponds to the optimal weight (see the centroid-asset QP in Section 4.4.2) of the i-th centroid asset, such that the higher the centroid weight, the greater the probability that it will have a higher priority over the other centroids in claiming its member assets.
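In R, the transform in (4.21) and the induced random claiming order take only a few lines (a sketch; alpha_s = 80 follows the parameter choice reported later in Table 11):

    # p_i = 1 / (1 + exp(-alpha_s * (x_i / sum(x) - 1/K))), from (4.21):
    # centroids with above-average weight claim their members earlier
    centroid_pick_prob <- function(x_o_star, alpha_s = 80) {
      K <- length(x_o_star)
      1 / (1 + exp(-alpha_s * (x_o_star / sum(x_o_star) - 1 / K)))
    }

    # a random centroid order, sampled without replacement with these probabilities
    order_centroids <- function(x_o_star, alpha_s = 80) {
      p <- centroid_pick_prob(x_o_star, alpha_s)
      sample(seq_along(x_o_star), prob = p)
    }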
4.4.2 Algorithms

Algorithm for identifying an initial set of starting centroids, S_o. There are a number of possible ways to prescribe the initial set of centroid assets:

• k-means (Algorithm 6): form K clusters, where K is the cardinality of the problem, in the projected factor space spanned by R(P_k). Then, in the projected space, identify the set of K centroid assets, S_o ⊂ S, which are the stocks closest to the cluster centers (note that a cluster center is simply a point in the projected space, and does not necessarily coincide with any projected asset return in this space);
• Successive truncation (Algorithm 7): at each iteration we discard a portion of the assets with small weights, until we are left with a number of assets in our portfolio equal to the desired cardinality value.
Algorithm 6 K-means in factor space
Given: R, K
Ω ← (1/T) R^⊤R
[Spectral decomposition] Ω = V ΛV^⊤
V_k ← first k columns of V
[Random initialization] m_1^{(0)}, m_2^{(0)}, …, m_K^{(0)}
while not-converged do
    for i = 1 to K do
        E-step: C_i^{(n)} = {v_j : ‖v_j − m_i^{(n)}‖ ≤ ‖v_j − m_ĩ^{(n)}‖, ∀ĩ = 1, …, K}
        M-step: m_i^{(n+1)} = (1/|C_i^{(n)}|) Σ_{v_j ∈ C_i^{(n)}} v_j
    end for
end while
Algorithm 7 Successive truncation
Given: N, I
j ← 0; S_0 ← S; n_0 ← N; φ_0 ← 0
repeat
    [Solve QP] for x*, where x_j* = argmin_x −c^⊤x + ½ x^⊤Hx s.t. e^⊤x = 0, x ∈ S_j
    if arithmetic truncation then φ_j ← max(n_j − ⌊N/I⌋, K)
    else φ_j ← max(⌊(1 − K/N) n_j⌋, K) end if
    S_{j+1} ← {s_{(n)}, s_{(n−1)}, …, s_{(φ_j)}}
    n_{j+1} ← |S_{j+1}| − φ_j
until φ_j = K
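In R, the arithmetic variant of Algorithm 7 amounts to repeatedly solving the relaxed QP and discarding the smallest-weight names (a sketch; solve_relaxed_qp is a hypothetical stand-in for the solve.QP call shown in Section 4.2, and truncation is by |weight| here):

    # arithmetic successive truncation: drop roughly N/n_iter assets per round
    successive_truncation <- function(H, cvec, K, n_iter, solve_relaxed_qp) {
      active <- seq_along(cvec)                     # start from the full universe
      step <- max(1, floor(length(cvec) / n_iter))  # assets discarded per iteration
      repeat {
        x <- solve_relaxed_qp(H[active, active], cvec[active])
        if (length(active) <= K) break
        keep_n <- max(K, length(active) - step)
        keep <- order(abs(x), decreasing = TRUE)[seq_len(keep_n)]
        active <- active[sort(keep)]                # keep the largest-|weight| assets
      }
      list(assets = active, weights = x)
    }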
Algorithm for iterative local relaxation. In the main Algorithm 8, at each iteration we solve a number of small QPs for subsets of the original set of assets, S_o ⊂ S_r ⊆ S, where S_r is the set of assets belonging to one of the K clusters identified. The QPs that we solve in the main algorithm include the following:
• Globally-relaxed QP:

    min_x  −c^⊤x + ½ x^⊤Hx
    s.t.   e^⊤x = 0,  x ∈ S,

where c and H are the expected return vector and covariance matrix for the set of assets S, and e is a column vector of ones.
• Centroid-asset QP:

    min_{x_o}  −c_o^⊤x_o + ½ x_o^⊤H_o x_o
    s.t.       e^⊤x_o = 0,  x_o ∈ S_o,

where c_o and H_o are the expected return vector and covariance matrix for the set of centroid assets, S_o, only. This gives the optimal weight vector x_o*, where x*_{o,i} is the i-th element of this vector, corresponding to the i-th centroid asset, s_{o,i}.
• Relaxed-neighborhood QP:

    min_{x_r}  −c_r^⊤x_r + ½ x_r^⊤H_r x_r
    s.t.       e_i^⊤x_r = x*_{o,i},  i = 1, …, K,  x_r ∈ S_r,

where e_i is a column vector with ones at the positions corresponding to the assets belonging to the cluster with the i-th centroid and zeros everywhere else; c_r is the vector of expected returns and H_r the corresponding variance-covariance matrix of our sub-universe of assets, S_r = ∪_{i=1}^K N(s_{o,i}), where N(s_{o,i}) is the set of neighbor assets of the i-th centroid. We impose the constraint that the sum of weights of the neighbors of the i-th centroid add up to the centroid weight, x*_{o,i}. A solver sketch for this QP follows the list.
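The relaxed-neighborhood QP maps directly onto solve.QP: the K equality rows are the cluster-membership indicators e_i. The sketch below assumes cluster is a vector assigning each asset in S_r to a centroid and x_o_star holds the centroid weights:

    library(quadprog)

    # relaxed-neighborhood QP: min -c'x + (1/2) x'Hx  s.t.  E x = x_o_star,
    # where row i of E indicates the members of cluster i
    relaxed_neighborhood_qp <- function(H_r, c_r, cluster, x_o_star) {
      K <- length(x_o_star)
      E <- t(sapply(seq_len(K), function(i) as.numeric(cluster == i)))
      sol <- solve.QP(Dmat = H_r, dvec = c_r,     # dvec = c gives the -c'x term
                      Amat = t(E), bvec = x_o_star, meq = K)
      sol$solution                                # cluster weights sum to x*_{o,i}
    }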
Algorithm for identifying new centroids. Let {m_k}_{k=1,…,K} be the number of members in each of the K clusters. We generate ‖m‖_∞ centroid proposals based on assets within the same cluster, to give Φ_1, …, Φ_{‖m‖_∞}. Φ_1 ∈ R^K is the vector of the closest distances, defined by (4.19), to the centroid; Φ_2 is the vector of distances that are second closest, etc. For a cluster group k with m_k ≤ ‖m‖_∞, we pad the last ‖m‖_∞ − m_k entries with the distance of the cluster member that is farthest away from the centroid in cluster k. Once we have generated the centroid proposals {Φ_i}_{i=1,…,‖m‖_∞}, Algorithm 9 then loops over each of the proposals.
Algorithm 8 Local relaxation in projected orthogonal space
Given: α_s, σ_d, M_min, M_max, K, η̄, ε_o, S
x* ← solution of the globally-relaxed QP
Ξ ← ranked vector x*
S_o ← initial centroids by Algorithm 7 or by Algorithm 6
while not-converged do
    (x_o*, f_o*) ← solution of the centroid-asset QP
    Ξ̃ ← Ξ
    while I_ε = {i : x*_{o,i} < ε_o} ≠ ∅ do
        for each i ∈ I_ε do
            replace s_{o,i} = {s ∈ S_o : x*_{o,i} < ε_o} with s̃ ← s̃_1, where s̃ ∈ Ξ̃ \ S_o; Ξ̃ ← Ξ̃ \ {s̃}
        end for
        x_o* ← solution of the centroid-asset QP
    end while
    [Generate random-centroid-asset-order I_õ] sample, without replacement, from the index set I_o = {1, 2, …, K} with the probability in (4.21), p_i = (1 + exp(−α_s(x*_{o,i}/Σ_i x*_{o,i} − 1/K)))^{−1}
    S̃ ← S \ S_o
    for {s_{o,i} : s_{o,i} ∈ S_o, i ∈ I_õ} do
        m_i ← max(⌊M_max p_i⌋, M_min)
        [Identify the m_i neighbors of s_{o,i}, N(s_{o,i}) ⊂ S̃] such that {s ∈ S̃ : σ_d‖v_s − v_{s_{o,i}}‖_2 + (1 − σ_d)(‖c‖_∞ − |c_s|) < r} ⊂ N(s_{o,i}), with |N(s_{o,i})| = m_i − 1
        S̃ ← S̃ \ N(s_{o,i})
    end for
    S_r ← ∪_{i=1}^K N(s_{o,i})
    x_r* ← solution of the relaxed-neighborhood QP
    S_o ← output of Algorithm 9
    if [max-iteration-reached OR max-time-limit-reached] then break end if
end while
Algorithm 9 Identify new centroids
Given: Υ, {g_i}_{i=1,…,K}
[Generate centroid proposal list]
for i = 1 to ‖m‖_∞ do
    Φ_i ← proposal of K centroid assets with distance ranked i-th closest based on (4.19)
end for
j ← 1; η ← 1; exit-loop ← false
while exit-loop is false do
    for i = 1 to K do
        s_{o,i} ← Φ_{j,i}
        f ← objective function value of the centroid-asset QP
        if f < f_o* then exit-loop ← true; break end if
    end for
    j ← j + 1
    if j > ‖m‖_∞ then
        if η ≥ η̄ then go to Algorithm 10
        else M_max ← ⌊Υ^η M_max⌋; η ← η + 1 end if
    end if
end while
During each loop, the algorithm replaces the current centroids with progressively less optimal proposals, in the sense of a larger distance measure from the cluster centroid assets, and breaks out as soon as there has been an improvement in the objective function. If none of the proposals gives a lower objective function value, the algorithm then attempts to swap centroids with assets from other clusters. The choice of substitution is governed by the expected returns of the assets, weighted by the magnitude of the relaxed solution from the relaxed-neighborhood QP, to give a vector ϖ ∈ R^{Σ_i m_i},

    ϖ = ( x*_{o,1} c_{s∈N(s_{o,1})},  x*_{o,2} c_{s∈N(s_{o,2})},  …,  x*_{o,K} c_{s∈N(s_{o,K})} )^⊤,

where c_{s∈N(s_{o,k})} is a row vector consisting of the expected returns of the assets belonging to the k-th cluster.
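As a small illustration (hypothetical inputs: x_o_star holding the K centroid weights and nbr_returns, a list with the expected-return vector of each cluster's members), the vector ϖ stacks each cluster's expected returns scaled by its centroid weight:

    # varpi = (x*_{o,1} c_{N(s_{o,1})}, ..., x*_{o,K} c_{N(s_{o,K})})
    varpi <- unlist(Map(function(w, ck) w * ck, x_o_star, nbr_returns))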
4.4.3 Computational Result

Preparation of Dataset. Empirical results in this section are based on two key datasets. The first dataset consists of daily closing prices for 500 of the most liquid stocks, as measured by mean daily transacted volume, traded on the main stock exchanges in the US, spanning the period from 2002-01-02 to 2010-01-22. The second dataset consists of monthly closing prices for 3,000 actively traded stocks in the US, spanning the period from 2005-05-31 to 2010-04-30. These two datasets test the performance of our algorithm under different market environments and for different scales of the problem.
Algorithm 10 Swap centroids
k ← 1; q ← |S_r|
while exit-loop is false do
    s_{o,k} ← ϖ_q
    f ← objective function value of the centroid-asset QP
    if f < f_o* then η ← 0; exit-loop ← true end if
    k ← k + 1
    if k = K then
        q ← q − 1; k ← 1
        if q ≤ 0 then exit-loop ← true end if
    end if
end while
The results based on the proposed algorithm are compared with those obtained by the successive truncation algorithm in Algorithm 7 and by the branch-and-cut MIQP solver in CPLEX, a leading commercial optimization package.
Cleaning and Filtering. We follow a three-step procedure to clean, filter and prepare the datasets for analysis:

• Recorded price validation. We discard assets that do not have a complete price history over the whole sample period;
• NAICS sector matching. We purge any asset whose ticker cannot be matched to the latest North American Industry Classification System (NAICS) sector definition. This is in place to facilitate potentially imposing sectoral constraints;
• PCA outlier detection. We project the returns, defined as the difference in log-prices, onto a 4-dimensional space spanned by the dominant PCA factors. Any asset with projected return more than two standard deviations away from the projected median is considered an outlier and dropped.
Factorization of Covariance Matrix. For the dataset with 3,000 assets, N ≫ T, so the covariance matrix from raw returns is rank deficient. We take the necessary step of factorizing the raw covariance matrix using the first four dominant PCA factors. This ensures that the resulting covariance matrix is both full rank and well conditioned. The first four factors together capture 35.07% of the total variance in the 500 stock case, and 38.15% in the 3,000 stock case.
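One common way to implement such a factorization, not necessarily the exact procedure used here, is to keep the rank-4 PCA part and complete it with the diagonal of residual (specific) variances (a sketch; ret is an assumed T × N return matrix):

    # rank-k PCA factor covariance completed with specific variances
    factor_covariance <- function(ret, k = 4) {
      ret   <- scale(ret, center = TRUE, scale = FALSE)  # demean columns
      Omega <- crossprod(ret) / nrow(ret)                # raw (rank-deficient) covariance
      ev    <- eigen(Omega, symmetric = TRUE)
      Vk    <- ev$vectors[, 1:k]
      Sigma_f <- Vk %*% (ev$values[1:k] * t(Vk))         # rank-k factor part
      spec  <- pmax(diag(Omega) - diag(Sigma_f), 1e-8)   # residual specific variances
      Sigma_f + diag(spec)                               # full rank, well conditioned
    }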
4.4.4 Algorithm Benchmarking

Successive Truncation. Figure 4.13 shows the result for the fully relaxed problem (i.e. without the cardinality constraint), ordered by the value of the optimal weights. Positive weights mean that we go long those assets, and negative weights correspond to shorting those assets.
Figure 4.13: Fully relaxed (i.e. without the cardinality constraint) QP solution for 500 actively traded US stocks (not all tickers are shown), sampled over the period from 2002-01-02 to 2010-01-22, with the dollar neutral constraint.
As we impose the constraint Σ_i x_i = 0, the net dollar exposure of the resulting portfolio is zero. This is one interpretation of a market neutral portfolio. An alternative definition of market neutrality requires neutralizing specific factors of the net portfolio exposure, by effectively taking into account the correlation between different assets. We apply the successive arithmetic truncation algorithm given in Algorithm 7 to the CCQP. Table 8 gives the result for the 500-choose-15 case (i.e. solving a 500 asset problem with cardinality equal to 15). It can be seen that if the number of iterations is set to one-tenth of the size of the universe, i.e. by discarding a large portion of stocks from the feasible set at each iteration, the method is able to obtain a result quickly. However, as we will show later, if we need a higher degree of optimality, this simple method often fails to deliver. Also, note that it is not guaranteed that more iterations yield a better solution. Table 9 shows the result for the 3,000-choose-15 case; we see that the algorithm does not give a monotone improvement of the objective function. For comparison, Table 10 shows the result of geometric truncation for the 3,000-choose-15 case. Again, we notice that increasing the number of iterations does not necessarily increase optimality. Both of these truncation methods are widely used in practice. Although there is clearly no dispute about the speed of this heuristic algorithm, we later show that it can be significantly inferior to algorithms that are able to explore the solution space more effectively, at only a small cost in execution time. Throughout the rest of this paper, we refer to successive arithmetic truncation as simple successive truncation, unless otherwise stated.
Table 8: Successive arithmetic truncation method result for 500 of the most liquid US stocks, with cardinality K = 15.

    Iterations  Objective Value  Time (sec)  Optimal Assets
    10          -0.00615         0.246       AGII BIG CPO ETH FBP FICO HCC IGT LPNT PFG RYN SRT TAP UIS WPP
    20          -0.00637         0.455       AGII BWA CDE CPO DYN ETH FICO HCC NWBI PFG RYN SRT TAP UIS WPP
    50          -0.00629         0.832       AGII BWA CDE CPO EMC ETH FICO HCC IGT PFG RYN SRT TAP UIS WPP
    100         -0.00635         1.677       AGII BWA CDE CPO DYN EMC ETH FICO HCC PFG RYN SRT TAP UIS WPP
    500         -0.00638         8.073       AGII BWA CDE CPO ETH FICO HCC IGT PFG RYN SRT TAP TRK UIS WPP
Table 9: Successive arithmetic truncation method result for 3,000 of the most liquid US stocks, with cardinality K = 15.

    Iterations  Objective Value  Time (sec)  Optimal Assets
    10          -12.08450        33          AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE EDMC MJN SEM TLCR VRSK
    100         -12.13756        210         AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE MJN SEM TLCR TMH VRSK
    300         -12.23406        561         AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK SEM TLCR VRSK
    500         -12.23406        929         AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK SEM TLCR VRSK
    1,000       -11.72131        1,839       AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK
    3,000       -11.72131        5,536       AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK
Table 10: Successive geometric truncation method result for 3,000 of the most liquid US stocks, with cardinality constraint K = 15.

    Iterations  Objective Value  Time (sec)  Optimal Assets
    10          -12.37960        19          AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE LOPE MJN SEM TLCR VRSK
    100         -11.72131        63          AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK
    300         -11.72131        161         AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK
    500         -11.72131        290         AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK
    1,000       -11.72131        507         AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK
    3,000       -11.72131        1,259       AOL ART CFL CFN CIE CIT CVE DGI DOLE MJN NFBK RA SEM TLCR VRSK
Figure 4.14: Projection onto a 4-dimensional space spanned by the first four dominant PCA factors, based on daily returns for the most liquid 500 US traded stocks, sampled over the period from 2002-01-02 to 2010-01-22. Symbols with different colors correspond to different cluster groups.
Local Relaxation and CPLEX MIQP. Each of the six panels in Figure 4.14 represents a 2-dimensional view of the 4-dimensional PCA projection of the returns of our 500 assets. In the same figure, we have also shown the initial K clusters, indicated by different colors and symbols. The local relaxation method that solves a CCQP with cardinality equal to 15 may be initialized with the cluster centroid assets, iteratively solving relaxed but smaller QPs that include a subset of the cluster members, followed by moving to a new set of centroids with strictly increasing optimality. A prototype of the local relaxation algorithm has been implemented in R running on a 64-bit Linux kernel. The solutions of the relaxed QPs required by the new algorithm and the truncation method are found using the QP solver in R, which is based on Goldfarb and Idnani (1982). To help assess the performance of our proposed algorithm, we present three solved problems, each with different universe sizes and cardinality values. The sets of parameters used in the algorithm for these three problems are given in Table 11. For comparison, we have implemented CPLEX v12 on a 64-bit Linux kernel using the CPLEX C++ API to access the built-in MIQP solver, which is based on the dynamic search branch-and-cut method (IBM, 2010). The code is set to run in parallel using up to eight threads on quad Intel Core i7 2.8 GHz CPUs with access to a total of 16 GB of memory. Table 12 shows the parameter settings used. Table 13 shows the result for the 500-choose-15 case, with a run-time cut-off set to 1,200 seconds.
Table 11: Parameter setup for the local relaxation method. (500, 15) means solving a 500 asset universe with cardinality equal to 15.

              (500, 15)  (3000, 15)  (3000, 100)
    M_min     5          5           3
    M_max     30         30          10
    σ_d       0.2        0.2         0.2
    α_s       80         80          80
    ε_0       1e-05      1e-05       1e-05
    η̄        10         10          10
    Υ         1.1        1.2         1.8

Table 12: CPLEX parameter settings (with the rest of the parameters set to their default values).

    Parameter                  Setting
    Preprocessing              true
    Heuristics                 RINS at root node only
    Algorithm                  branch-and-cut
    MIP emphasis               balance optimality and feasibility
    MIP search method          dynamic search
    Branch variable selection  automatic
    Parallel mode              deterministic, using up to 8 threads
Figure 4.15 shows the performance comparison between CPLEX and the new algorithm as a function of run-time. We see that the local relaxation method obtains a better solution than CPLEX right from the beginning, and it strictly dominates CPLEX throughout the testing period. Next, we assess the scalability of the new algorithm by switching to the larger dataset, which consists of 3,000 actively traded stocks. As before, we first project the factored covariance matrix onto a 4-dimensional PCA space and then identify the initial K clusters based on k-means, as shown in Figure 4.16. Table 14 shows the computational result. We see that the local relaxation method quickly reaches a significantly better result than CPLEX. Figure 4.17 shows the performance comparison of the local relaxation method versus CPLEX for the 3,000-choose-15 case, which illustrates the dominance of the proposed algorithm over CPLEX for a prolonged period of time.
Local Relaxation with warm start. One additional feature of the local relaxation algorithm is that it can be warm started directly using the result from another algorithm or from a similar problem.
Table 13: Computational result for the 500 asset case with cardinality K = 15. For CPLEX and the local relaxation algorithm, we have imposed a maximum run time cutoff of 1,200 seconds.

    Method            Objective Value  Time (sec)  Optimal Assets
    CPLEX             -0.00796         1200        AOI AYI CCF CPO DD DST DY FBP HCC LRCX MRCL NST TMK UIS WPP
    Local Relaxation  -0.00870         1200        AOI AVT AYI BIG BPOP CCF CPO CSCO DST HCC HMA IBM MCRL TAP ZNT
Figure 4.15: CPLEX versus local relaxation method performance comparison for the 500 asset case, with cardinality K = 15; both are cold started. The horizontal line marks the value obtained by successive truncation.
Objective
Time
Optimal
Value
(sec)
Assets
-12.08425
36
AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE EDMC MJN SEM TLCR VRSK
CPLEX
-12.30410
25200
Local
-12.76492
7200
AOL AONE ART CFL CFN CIE CIT CVE DGI DOLE MJN SEM TLCR VECO VRSK AOL AONE ARST ART BPI CFL CFN CIE CIT CVE DGI DOLE MJN SEM TLCR
Successive Truncation
Relaxation Table 14:
Computational result for the 3,000 asset case, with cardinality,
truncation is done over 10 iterations.
100
K
= 15.
Successive
Figure 4.16: Projection onto a four dimensional space spanned by the first four dominant PCA factors, based on monthly returns for the most liquid 3,000 US traded stocks, sampled over the period from 2005-05-31 to 2010-04-30. Symbols with different colors correspond to different cluster groups.
Figure 4.17: CPLEX versus local relaxation method performance comparison for the 3,000 asset case, with cardinality constraint K = 15; both cold started. The horizontal line marks the value obtained by successive truncation.
We have seen that the simple successive truncation algorithm can often lead to a good local minimum at little computational cost. To take advantage of this, we propose warm starting the local relaxation algorithm using the result from successive truncation as the initial centroid assets. Since the local relaxation method is strictly feasible and monotone improving, we can set a time limit and take the result at the end of the period. This way, we are guaranteed a solution no worse than the one obtained by successive truncation. This hybrid approach gives the best combination and can be shown to be significantly more efficient than a warm-started CPLEX solve. Figure 4.18 shows the result of solving a 500 asset CCQP with K = 15 by warm starting the local relaxation method with the output from successive truncation, based on 20 iterative truncations. It can be seen that the new algorithm dominates the CPLEX result. Figure 4.19 shows the result of solving a 3,000 asset CCQP with K = 100 by warm starting the local relaxation method with the output from successive truncation. Note that, at least for this instance, CPLEX never quite managed to find a minimum that comes close to the one obtained by the proposed algorithm, even after it was left running for 35 hours.
Warm starting CPLEX with Local Relaxation. It is reasonable to conjecture that the strength of the local relaxation algorithm is in its ability to explore the structure of the problem, and hence in identifying the sub-group of assets within which the optimal solution set lies. To see whether the branch-and-cut method in CPLEX is able to improve on the solution from the new algorithm once the algorithm has identified a sub-asset group, we propose the following test:
Figure 4.18: CPLEX versus local relaxation method performance comparison for the 500 asset universe, with cardinality constraint K = 15; both methods are warm started using the solution of successive truncation over 20 iterations. Maximum run-time is set to one hour. The horizontal line marks the successive truncation value.
Figure 4.19: CPLEX versus local relaxation method performance comparison for the 3,000 asset universe, with cardinality constraint K = 100; both methods are warm started using the solution of arithmetic successive truncation over 20 iterations. Maximum run-time is set to seven hours. The horizontal line marks the value obtained by successive truncation.
Figure 4.20: CPLEX MIQP applied to the union of the cluster groups identified by the local relaxation algorithm upon convergence (the beginning of the flat line in Figure 4.19). The horizontal line marks the value obtained by local relaxation. Maximum run-time is set to one hour.
• solve the CCQP with the local relaxation algorithm;
• once the algorithm has reached a stationary group of assets (here stationary refers to the observation that the cluster groups do not change over a certain number of iterations), stop the algorithm and record the union of the assets over all identified cluster groups;
• solve the CCQP to optimality only for this group of assets using CPLEX.

To illustrate the idea, we apply the above procedure to the result for the 3,000-choose-100 case. In Figure 4.19, we see that the algorithm appears to have converged after 15,916 seconds. We take the union of the assets in the 100 clusters from the local relaxation algorithm, then feed these assets into CPLEX. Figure 4.20 shows the output from CPLEX at each iteration. It can be seen that the CPLEX branch-and-cut algorithm takes almost 20 minutes to reach a level close to the result given by the local relaxation method, and never quite reaches it within the run-time limit of one hour.
Mean-variance efficient frontier. The set of optimal portfolios formed using different risk tolerance parameters is known as the set of frontier portfolios, in the sense that all portfolios on the frontier are optimal given the specific risk tolerance of an investor. Therefore, a mean-variance optimizing agent will only choose frontier portfolios. Figure 4.21 shows this efficient frontier for our universe of 3,000 US stocks. We observe that, in-sample, the CCQP solutions are strictly dominated by those based on the QP, a consequence of the cardinality constraint, which cuts off parts of the feasible space. We observe that this domination increases as we decrease the cardinality of the problem, as larger parts of the feasible space become inaccessible to the CCQP. However, in practice, the robustness of forming a portfolio from a smaller subset of assets with higher expected returns can often lead to better out-of-sample performance.
Figure 4.21: Mean-variance efficient frontiers (mean, %, versus variance, %) for a 3,000 asset universe. QP is the frontier without the cardinality constraint; CCQP is the frontier in the presence of cardinality (K = 100 and K = 15). To produce the CCQP frontiers, we warm-start the local relaxation algorithm with the successive truncation solution, and set a maximum run time limit of 252,000 seconds for the case where K = 100 and 3,600 seconds for the case where K = 15.
5
Conclusion
In this thesis, new approaches to model the three key statistical and algorithm aspects of optimal portfolio construction have been presented. For the rst aspect, we have proposed a complete model for returns and realized measures of volatility,
xt ,
where the latter is tied directly to the conditional volatility
ht .
The motivation is
to include high frequency data in a rigorous way in a classic GARCH framework that has so far been used predominately in dealing with less frequently sampled data, hence without having to be concerned with market microstructure noise.
We have demonstrated that the model is straight-
forward to estimate and oers a substantial improvement in the empirical t, relative to standard GARCH models. The model is informative about realized measurement, such as its accuracy. The proposed framework induces an interesting reduced-form model for
{rt , ht },
that is similar to that
of a stochastic volatility model with leverage eect. Our empirical analysis can be extended in a number of ways. For instance, including a jump robust realized measure of volatility would be an interesting extension, because Bollerslev, Kretschmer, Pigorsch, and Tauchen (2009) found that the leverage eect primarily acts through the continuous volatility component. Another possible extension is to introduce a bivariate model of open-to-close and close-to-open returns, as an alternative to modeling close-to-close returns, see Andersen et al. (2011). The
m
naturally extended to a multi-factor structure. Say ables. The
Realized GARCH
Realized GARCH
realized measures and
k
latent volatility vari-
model discussed in this thesis corresponds to the case
the MEM framework corresponds to the case
m = k.
conduct inference about the number of latent factors,
framework is
k = 1,
whereas
Such a hybrid framework would enable us to
k.
We could, in particular, test the one-factor
structure, conjectured to be sucient for the realized measure used in this paper, against the twofactor structure implied by MEM. For the extension of
Realized GARCH
to multivariate settings, we
could explore more general frameworks by loosening the assumption that volatility and correlation operate at dierent time scales. The increasing role that high frequency nance plays in the market has also motivated the part of the thesis on expected return forecast. The proposed model is based on self-excited point process augmented by trade size and limit order book information. The motivation for incorporate these two potential
marks
is based on underlying economic intuition.
First, the trade size mark
acts as a mechanism to dierentiate large orders that is more likely to induce price movement than small noise trading of insignicant impact.
Second, information from the limit order book
adds an extra dimension to gauge supply-demand imbalance of the market. Game theoretic utility balancing argument aside, a more sell order loaded book will signal that more investors are looking for an opportunity to liquidate or sell short a position, and
vice versa
for a buy order loaded book.
Nowadays, sophisticated execution algorithms account for over 60% of total trading volume on the NYSE, with similar dominance in other markets. These algorithms aim to minimize the market impact of large orders by slicing them into smaller units and submitting them to different layers of the limit order book at optimal times. The proposed frameworks in this thesis are tested using LSE order book data, and the results suggest that the inclusion of limit order book information leads to more robust estimation than the basic bivariate model, as measured by the goodness-of-fit of the model-implied distribution to the empirical distribution of the inter-arrival times of bid and ask side market orders. As for trade size, the results suggest that it can have an adverse effect on model performance. This could be attributable to the fact that most orders, before they reach the exchange, have been sliced by execution algorithms so that their size is close to the median daily average, so as to minimize signaling effects. Hence a large portion of the fluctuation around the median volume is noise, which lowers the overall signal-to-noise ratio.
The final key aspect of the thesis is portfolio optimization with cardinality constraints.
The NP-hard nature of cardinality constrained mean-variance portfolio optimization has led to a variety of algorithms with varying degrees of success in reaching optimality given limited computational resources and strict time constraints in practice. The computational effort needed to be assured of solving problems of the size considered here exactly is truly astronomical, and well beyond the capability of any machine envisaged; moreover, solving a problem exactly is of questionable value over obtaining a "near" solution. The proposed local relaxation algorithm exploits the inherent structure of the objective function. It solves a sequence of small, local quadratic programs by first projecting asset returns onto a reduced metric space, followed by clustering in this space to identify sub-groups of assets that best accentuate a suitable measure of similarity amongst different assets. The algorithm can either be cold started using the centroids of initial clusters or warm started from the output of a previous run. Empirical results, using baskets of up to 3,000 stocks and with different cardinality constraints, indicate that the proposed algorithm achieves significant performance gains over a sophisticated branch-and-cut method.
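A rough sketch of the projection-and-clustering step is given below. PCA and k-means serve here as illustrative stand-ins for the reduced metric space and the similarity measure; they are not necessarily the exact choices made by the algorithm in the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_assets(returns, n_factors=5, n_clusters=50, seed=0):
    """Project a T x N matrix of asset returns onto a low-dimensional factor
    space, then cluster the N assets so that similar assets share a sub-group.
    One small local QP would then be solved per cluster."""
    # each column (asset) becomes a point in an n_factors-dimensional space
    loadings = PCA(n_components=n_factors).fit_transform(returns.T)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(loadings)

rng = np.random.default_rng(0)
labels = cluster_assets(rng.standard_normal((500, 300)), n_clusters=20)
print(np.bincount(labels))   # cluster sizes
```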
A. Appendix of Proofs

Proof. [Proposition 19] The first result follows by substituting $\log h_t = (\log x_t - \xi - w_t)/\varphi$ into the GARCH equation and rearranging. Next, we substitute $\log r_t^2 = \log h_t + \kappa + v_t$ and find

\[ \log r_t^2 = (\log x_t - \xi - w_t)/\varphi + \kappa + v_t . \]

Substituting both expressions into the GARCH equation and multiplying by $\varphi$, we have

\[ \log x_t - \xi - w_t = \varphi\omega + \sum_{i=1}^{p\vee q}(\beta_i + \alpha_i)(\log x_{t-i} - \xi - w_{t-i}) + \varphi\sum_{j=1}^{q}\gamma_j \log x_{t-j} + \varphi\sum_{j=1}^{q}\alpha_j(\kappa + v_{t-j}), \]

so with $\pi_i = \alpha_i + \beta_i + \gamma_i\varphi$ we find

\[ \log x_t = \xi(1 - \beta_\bullet - \alpha_\bullet) + \varphi\kappa\alpha_\bullet + \varphi\omega + \sum_{i=1}^{p\vee q}\pi_i \log x_{t-i} + w_t - \sum_{i=1}^{p}(\alpha_i + \beta_i)w_{t-i} + \varphi\sum_{j=1}^{q}\alpha_j v_{t-j} . \]

When $\varphi = 0$, the measurement equation $\log x_t = \varphi\log h_t + \xi + w_t$ shows that $\log x_t = \xi + w_t$ is an iid process.
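For instance, in the one-lag case $p = q = 1$ the display above specializes to an ARMA(1,1)-type recursion for $\log x_t$:

\[ \log x_t = \{\xi(1-\beta-\alpha) + \varphi\kappa\alpha + \varphi\omega\} + \pi\log x_{t-1} + w_t - (\alpha+\beta)w_{t-1} + \varphi\alpha v_{t-1}, \qquad \pi = \alpha + \beta + \gamma\varphi . \]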
Proof. [Lemma 22] First note that

\[ \frac{\partial g_t'}{\partial \lambda} = \left(0, \dot h_{t-1}, \ldots, \dot h_{t-p}, 0_{p+q+1\times q}\right) =: \dot H_{t-1} . \]

Thus from the GARCH equation, $\tilde h_t = \lambda' g_t$, we have that

\[ \dot h_t = \frac{\partial g_t'}{\partial \lambda}\lambda + g_t = \dot H_{t-1}\lambda + g_t = \sum_{i=1}^{p}\beta_i \dot h_{t-i} + g_t . \]

Similarly, the second order derivative is given by

\[ \ddot h_t = \frac{\partial (g_t + \dot H_{t-1}\lambda)}{\partial \lambda'} = \dot H_{t-1}' + \dot H_{t-1} + \frac{\partial \dot H_{t-1}\lambda}{\partial \lambda'} = \dot H_{t-1}' + \dot H_{t-1} + \sum_{i=1}^{p}\beta_i \frac{\partial \dot h_{t-i}}{\partial \lambda'} = \sum_{i=1}^{p}\beta_i \ddot h_{t-i} + \dot H_{t-1}' + \dot H_{t-1} . \]

For the starting values we observe that, regardless of whether $(h_0, \ldots, h_{p-1})$ is treated as fixed or as a vector of unknown parameters, we have $\dot h_s = \ddot h_s = 0$ for $s \le 0$. Given the structure of $\ddot h_t$ this implies $\ddot h_1 = 0$. When $p = q = 1$ it follows immediately that $\dot h_t = \sum_{j=0}^{t-1}\beta^j g_{t-j}$. Similarly we have

\[ \ddot h_t = \sum_{j=0}^{t-1}\beta^j(\dot H_{t-1-j}' + \dot H_{t-1-j}) = \sum_{j=0}^{t-2}\beta^j(\dot H_{t-1-j}' + \dot H_{t-1-j}), \]

where $\dot H_t = (0_{3\times 1}, \dot h_t, 0_{3\times 1})$ and where the second equality follows from $\dot H_0 = 0$. The result now follows from

\[ \sum_{i=0}^{t-2}\beta^i \dot h_{t-1-i} = \sum_{i=0}^{t-2}\beta^i \sum_{j=0}^{t-i-2}\beta^j g_{t-1-i-j} = \sum_{i=0}^{t-2}\beta^i \sum_{k=i+1}^{t-1}\beta^{k-i-1} g_{t-k} = \sum_{i=0}^{t-2}\sum_{k=i+1}^{t-1}\beta^{k-1} g_{t-k} = \sum_{k=1}^{t-1}k\beta^{k-1} g_{t-k} . \]
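The change in the order of summation in the last display is easy to verify numerically; the sketch below checks the scalar case with randomly drawn values standing in for $g_s$:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, t = 0.7, 8
g = rng.standard_normal(t + 1)           # g[s] stands in for g_s (scalar for simplicity)

def hdot(s):                              # hdot_s = sum_{j=0}^{s-1} beta^j g_{s-j}
    return sum(beta**j * g[s - j] for j in range(s))

lhs = sum(beta**i * hdot(t - 1 - i) for i in range(t - 1))
rhs = sum(k * beta**(k - 1) * g[t - k] for k in range(1, t))
print(np.isclose(lhs, rhs))              # True: the re-indexing checks out
```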
Proof. [Proposition 23] Recall that $u_t = \tilde x_t - \psi' m_t$ and $\tilde h_t = g_t'\lambda$. The derivatives with respect to $\tilde h_t$ are given by

\[ \frac{\partial z_t}{\partial \tilde h_t} = \frac{\partial\, r_t\exp(-\frac{1}{2}\tilde h_t)}{\partial \tilde h_t} = -\frac{1}{2}z_t, \qquad \dot u_t = \frac{\partial u_t}{\partial \tilde h_t} = -\varphi + \frac{1}{2}z_t\tau'\dot a_t, \]
\[ \ddot u_t = \frac{\partial \dot u_t}{\partial \tilde h_t} = \frac{\partial\{-\varphi + \frac{1}{2}z_t\dot a(z_t)'\tau\}}{\partial \tilde h_t} = -\frac{1}{4}\tau'(z_t\dot a_t + z_t^2\ddot a_t). \]

So with $\ell_t = -\frac{1}{2}\{\tilde h_t + z_t^2 + \log(\sigma_u^2) + u_t^2/\sigma_u^2\}$ we have $\partial z_t^2/\partial\tilde h_t = -z_t^2$ and $\partial\ell_t/\partial u_t = -u_t/\sigma_u^2$, so that

\[ \frac{\partial \ell_t}{\partial \tilde h_t} = -\frac{1}{2}\left\{1 + \frac{\partial z_t^2}{\partial \tilde h_t} + \frac{\partial u_t^2/\partial \tilde h_t}{\sigma_u^2}\right\} = -\frac{1}{2}\left(1 - z_t^2 + \frac{2u_t\dot u_t}{\sigma_u^2}\right). \]

The derivatives with respect to $\lambda$ are

\[ \frac{\partial z_t}{\partial \lambda} = \frac{\partial z_t}{\partial \tilde h_t}\frac{\partial \tilde h_t}{\partial \lambda} = -\frac{1}{2}z_t\dot h_t, \qquad \frac{\partial u_t}{\partial \lambda} = \frac{\partial u_t}{\partial \tilde h_t}\frac{\partial \tilde h_t}{\partial \lambda} = \dot u_t\dot h_t, \qquad \frac{\partial \dot u_t}{\partial \lambda'} = \ddot u_t\dot h_t', \]
\[ \frac{\partial \ell_t}{\partial \lambda} = \frac{\partial \ell_t}{\partial \tilde h_t}\dot h_t = -\frac{1}{2}\left(1 - z_t^2 + \frac{2u_t\dot u_t}{\sigma_u^2}\right)\dot h_t . \]

The derivatives with respect to $\psi = (\xi, \varphi, \tau')'$ are

\[ \frac{\partial u_t}{\partial \xi} = -1, \qquad \frac{\partial \dot u_t}{\partial \xi} = 0, \qquad \frac{\partial \ell_t}{\partial \xi} = \frac{\partial \ell_t}{\partial u_t}\frac{\partial u_t}{\partial \xi} = \frac{u_t}{\sigma_u^2}, \]
\[ \frac{\partial u_t}{\partial \varphi} = -\tilde h_t, \qquad \frac{\partial \dot u_t}{\partial \varphi} = -1, \qquad \frac{\partial \ell_t}{\partial \varphi} = \frac{\partial \ell_t}{\partial u_t}\frac{\partial u_t}{\partial \varphi} = \frac{u_t\tilde h_t}{\sigma_u^2}, \]
\[ \frac{\partial u_t}{\partial \tau} = -a_t, \qquad \frac{\partial \dot u_t}{\partial \tau} = \frac{1}{2}z_t\dot a_t, \qquad \frac{\partial \ell_t}{\partial \tau} = \frac{\partial \ell_t}{\partial u_t}\frac{\partial u_t}{\partial \tau} = \frac{u_t a_t}{\sigma_u^2}. \]

Similarly, $\partial\ell_t/\partial\sigma_u^2 = -\frac{1}{2}(\sigma_u^{-2} - u_t^2\sigma_u^{-4})$.

Now we turn to the second order derivatives. From the expression for $\partial\ell_t/\partial\lambda$,

\[ -2\frac{\partial^2 \ell_t}{\partial\lambda\partial\lambda'} = \dot h_t\left\{-\frac{\partial z_t^2}{\partial\lambda'} + \frac{2}{\sigma_u^2}\left(\dot u_t\frac{\partial u_t}{\partial\lambda'} + u_t\frac{\partial \dot u_t}{\partial\lambda'}\right)\right\} + \left(1 - z_t^2 + \frac{2u_t}{\sigma_u^2}\dot u_t\right)\frac{\partial \dot h_t}{\partial\lambda'} = \left\{z_t^2 + \frac{2}{\sigma_u^2}(\dot u_t^2 + u_t\ddot u_t)\right\}\dot h_t\dot h_t' + \left(1 - z_t^2 + \frac{2u_t}{\sigma_u^2}\dot u_t\right)\ddot h_t . \]

Similarly, since $\partial z_t/\partial\psi = 0$, we have

\[ -2\frac{\partial^2 \ell_t}{\partial\lambda\partial\xi} = \frac{2}{\sigma_u^2}\left(\dot u_t\frac{\partial u_t}{\partial\xi} + u_t\frac{\partial \dot u_t}{\partial\xi}\right)\dot h_t = -\frac{2\dot u_t}{\sigma_u^2}\dot h_t, \]
\[ -2\frac{\partial^2 \ell_t}{\partial\lambda\partial\varphi} = \frac{2}{\sigma_u^2}\left(\dot u_t\frac{\partial u_t}{\partial\varphi} + u_t\frac{\partial \dot u_t}{\partial\varphi}\right)\dot h_t = -\frac{2(\tilde h_t\dot u_t + u_t)}{\sigma_u^2}\dot h_t, \]
\[ -2\frac{\partial^2 \ell_t}{\partial\lambda\partial\tau'} = \frac{2}{\sigma_u^2}\left(\dot u_t\frac{\partial u_t}{\partial\tau'} + u_t\frac{\partial \dot u_t}{\partial\tau'}\right)\dot h_t = 2\dot h_t\left(-\frac{\dot u_t}{\sigma_u^2}a_t' + \frac{u_t}{2\sigma_u^2}z_t\dot a_t'\right), \]

so that, collecting terms,

\[ \frac{\partial^2 \ell_t}{\partial\lambda\partial\psi'} = \frac{\dot u_t}{\sigma_u^2}\dot h_t m_t' + \frac{u_t}{\sigma_u^2}\dot h_t b_t', \qquad \text{with } b_t = \left(0,\ 1,\ -\tfrac{1}{2}z_t\dot a_t'\right)'. \]

Moreover,

\[ \frac{\partial^2 \ell_t}{\partial\lambda\partial\sigma_u^2} = -\frac{1}{2}\frac{\partial\left(1 - z_t^2 + \frac{2u_t}{\sigma_u^2}\dot u_t\right)}{\partial\sigma_u^2}\dot h_t = \frac{u_t\dot u_t}{\sigma_u^4}\dot h_t, \]

and finally

\[ \frac{\partial^2 \ell_t}{\partial\psi\partial\psi'} = -\frac{1}{\sigma_u^2}m_t m_t', \qquad \frac{\partial^2 \ell_t}{\partial\psi\partial\sigma_u^2} = -\frac{u_t}{\sigma_u^4}m_t, \qquad \frac{\partial^2 \ell_t}{\partial\sigma_u^2\partial\sigma_u^2} = -\frac{1}{2}\left(-\frac{1}{\sigma_u^4} + \frac{2u_t^2}{\sigma_u^6}\right) = \frac{1}{2}\,\frac{\sigma_u^2 - 2u_t^2}{\sigma_u^6}. \]
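The expression for $\partial\ell_t/\partial\tilde h_t$ can be checked against a finite-difference approximation. The sketch below does so for the quadratic leverage function $\tau(z) = \tau_1 z + \tau_2(z^2 - 1)$; all parameter values are arbitrary illustrations.

```python
import numpy as np

# toy values; tau(z) = tau1*z + tau2*(z^2 - 1) as in the leverage function above
r, x, xi, phi, tau1, tau2, s2 = 0.8, -0.3, -0.2, 1.1, -0.07, 0.04, 0.15

def loglik(ht):                                   # per-observation quasi log-likelihood
    z = r * np.exp(-ht / 2)
    u = x - xi - phi * ht - tau1 * z - tau2 * (z**2 - 1)
    return -0.5 * (ht + z**2 + np.log(s2) + u**2 / s2)

ht = 0.5
z = r * np.exp(-ht / 2)
u = x - xi - phi * ht - tau1 * z - tau2 * (z**2 - 1)
udot = -phi + 0.5 * z * (tau1 + 2 * tau2 * z)     # analytic du/dht from the proof
analytic = -0.5 * (1 - z**2 + 2 * u * udot / s2)
numeric = (loglik(ht + 1e-6) - loglik(ht - 1e-6)) / 2e-6
print(np.isclose(analytic, numeric))              # True
```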
Proof. [Proposition 26] We note that

\[ h_t = \exp\left(\sum_{i=0}^{\infty}\pi^i(\mu + \gamma w_{t-1-i})\right), \]

so that, by independence across the $\{z_t\}$ and $\{u_t\}$ sequences,

\[ E h_t = e^{\frac{\mu}{1-\pi}}\prod_{i=0}^{\infty}E\exp\left(\gamma\pi^i\tau(z_{t-1-i})\right)E\exp\left(\gamma\pi^i u_{t-1-i}\right), \qquad E h_t^2 = e^{\frac{2\mu}{1-\pi}}\prod_{i=0}^{\infty}E\exp\left(2\gamma\pi^i\tau(z_{t-1-i})\right)E\exp\left(2\gamma\pi^i u_{t-1-i}\right). \]

Using results such as

\[ E\prod_{i=0}^{\infty}\exp\left(\pi^i\gamma u_{t-i}\right) = \prod_{i=0}^{\infty}E\exp\left(\pi^i\gamma u_{t-i}\right) = \prod_{i=0}^{\infty}e^{\frac{\pi^{2i}\gamma^2\sigma_u^2}{2}} = e^{\sum_{i=0}^{\infty}\frac{\pi^{2i}\gamma^2\sigma_u^2}{2}} = e^{\frac{\gamma^2\sigma_u^2/2}{1-\pi^2}}, \]

and $E\exp(a\tau(z_t)) = e^{-a\tau_2}(1-2a\tau_2)^{-1/2}\exp\{a^2\tau_1^2/(2(1-2a\tau_2))\}$ for the quadratic leverage function $\tau(z) = \tau_1 z + \tau_2(z^2-1)$, we find that (the factors $e^{-2\pi^i\gamma\tau_2}$ cancel between numerator and denominator)

\[ \frac{E h_t^2}{(E h_t)^2} = \frac{e^{\frac{2\mu}{1-\pi}}\prod_{i=0}^{\infty}E\exp(2\gamma\pi^i w_{t-1-i})}{e^{\frac{2\mu}{1-\pi}}\prod_{i=0}^{\infty}\{E\exp(\gamma\pi^i w_{t-1-i})\}^2} = \left(\prod_{i=0}^{\infty}\frac{1-2\pi^i\gamma\tau_2}{\sqrt{1-4\pi^i\gamma\tau_2}}\right)\exp\left\{\sum_{i=0}^{\infty}\left(\frac{2\pi^{2i}\gamma^2\tau_1^2}{1-4\pi^i\gamma\tau_2} - \frac{\pi^{2i}\gamma^2\tau_1^2}{1-2\pi^i\gamma\tau_2}\right)\right\}e^{\frac{\gamma^2\sigma_u^2}{1-\pi^2}} = \left(\prod_{i=0}^{\infty}\frac{1-2\pi^i\gamma\tau_2}{\sqrt{1-4\pi^i\gamma\tau_2}}\right)\exp\left\{\sum_{i=0}^{\infty}\frac{\pi^{2i}\gamma^2\tau_1^2}{1-6\pi^i\gamma\tau_2+8\pi^{2i}\gamma^2\tau_2^2}\right\}e^{\frac{\gamma^2\sigma_u^2}{1-\pi^2}}, \]

where the last equality uses

\[ \frac{2\pi^{2i}\gamma^2\tau_1^2}{1-4\pi^i\gamma\tau_2} - \frac{\pi^{2i}\gamma^2\tau_1^2}{1-2\pi^i\gamma\tau_2} = \frac{2\pi^{2i}\gamma^2\tau_1^2(1-2\pi^i\gamma\tau_2) - \pi^{2i}\gamma^2\tau_1^2(1-4\pi^i\gamma\tau_2)}{(1-4\pi^i\gamma\tau_2)(1-2\pi^i\gamma\tau_2)} = \frac{\pi^{2i}\gamma^2\tau_1^2}{1-6\pi^i\gamma\tau_2+8\pi^{2i}\gamma^2\tau_2^2}. \]

Recall that

\[ \frac{E(r_t^4)}{E(r_t^2)^2} = 3\left(\prod_{i=0}^{\infty}\frac{1-2\pi^i\gamma\tau_2}{\sqrt{1-4\pi^i\gamma\tau_2}}\right)\exp\left\{\sum_{i=0}^{\infty}\frac{\pi^{2i}\gamma^2\tau_1^2}{1-6\pi^i\gamma\tau_2+8\pi^{2i}\gamma^2\tau_2^2}\right\}\exp\left\{\frac{\gamma^2\sigma_u^2}{1-\pi^2}\right\}. \]

For the first term on the right hand side, we have

\[ \log\prod_{i=0}^{\infty}\frac{1-2\pi^i\gamma\tau_2}{\sqrt{1-4\pi^i\gamma\tau_2}} \simeq \int_0^{\infty}\log\frac{1-2\pi^x\gamma\tau_2}{\sqrt{1-4\pi^x\gamma\tau_2}}\,dx = \frac{1}{\log\pi}\sum_{k=1}^{\infty}\left\{\frac{(2\gamma\tau_2)^k}{k^2} - \frac{1}{2}\frac{(4\gamma\tau_2)^k}{k^2}\right\} = \frac{1}{\log\pi}\sum_{k=1}^{\infty}\frac{(2\gamma\tau_2)^k}{k^2}(1-2^{k-1}) = -\frac{\gamma^2\tau_2^2}{\log\pi}\left(1 + \frac{8}{3}\gamma\tau_2 + 7(\gamma\tau_2)^2 + \frac{96}{5}(\gamma\tau_2)^3 + \frac{496}{9}(\gamma\tau_2)^4 + \cdots\right). \]

The second term can be bounded by

\[ \frac{\gamma^2\tau_1^2}{1-\pi^2} \le \sum_{i=0}^{\infty}\frac{\pi^{2i}\gamma^2\tau_1^2}{1-6\pi^i\gamma\tau_2 + 8\pi^{2i}\gamma^2\tau_2^2} \le \frac{\gamma^2\tau_1^2}{1-6\gamma\tau_2}\,\frac{1}{1-\pi^2}. \]

So the approximation error is small when $\gamma\tau_2$ is small.
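The coefficients in the last expansion are easy to verify numerically; writing $c$ for $\gamma\tau_2$, the sketch below compares a direct evaluation of the series with the truncated polynomial:

```python
import numpy as np

c = 0.04                                          # plays the role of gamma * tau_2 (small)
exact = sum((2 * c)**k * (2**(k - 1) - 1) / k**2 for k in range(1, 200))
poly = c**2 * (1 + 8/3 * c + 7 * c**2 + 96/5 * c**3 + 496/9 * c**4)
print(exact, poly)                                # agree closely for small c
```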
Proof. [Proposition 30] Starting from the assumption that the following SDE is well defined,

\[ d\lambda_t = \beta(\rho(t) - \lambda_t)\,dt + \alpha\,dN_t, \]

the solution for $\lambda_t$ takes the form

\[ \lambda_t = c(t) + \int_0^t \alpha e^{-\beta(t-u)}\,dN_u, \qquad \text{where } c(t) = c(0)e^{-\beta t} + \beta\int_0^t e^{-\beta(t-u)}\rho(u)\,du. \]

This can be verified by applying Ito's lemma to $e^{\beta t}\lambda_t$:

\[ e^{\beta t}\lambda_t = c(0) + \beta\int_0^t e^{\beta u}\rho(u)\,du + \int_0^t \alpha e^{\beta u}\,dN_u, \]
\[ d(e^{\beta t}\lambda_t) = \beta e^{\beta t}\lambda_t\,dt + e^{\beta t}\,d\lambda_t = \beta e^{\beta t}\rho(t)\,dt + \alpha e^{\beta t}\,dN_t, \]
\[ d\lambda_t = \beta(\rho(t) - \lambda_t)\,dt + \alpha\,dN_t. \]

Taking the limit we obtain

\[ \lim_{t\to\infty}c(t) = \lim_{t\to\infty}\left\{c(0)e^{-\beta t} + \beta\int_0^t e^{-\beta(t-u)}\rho(u)\,du\right\} = \lim_{t\to\infty}\frac{\beta\int_0^t e^{\beta u}\rho(u)\,du}{e^{\beta t}} = \lim_{t\to\infty}\rho(t), \]

where the last step follows from L'Hopital's rule. Treating $\rho(t)$ as a constant, $\rho(t)\equiv\mu$, we then have

\[ c(t) = c(0)e^{-\beta t} + \mu\beta\int_0^t e^{-\beta(t-u)}\,du = c(0)e^{-\beta t} + \mu e^{-\beta t}(e^{\beta t}-1) = \mu + e^{-\beta t}(c(0)-\mu). \]

Note that if we set $c(0)\equiv\mu$ then the process is simply

\[ \lambda_t = \mu + \alpha\int_0^t e^{-\beta(t-u)}\,dN_u. \]

Therefore we can think of $\mu$ as the long run base intensity, i.e. the intensity if there has been no past arrival.
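As a numerical illustration of the constant-$\rho$ case, the sketch below simulates $\lambda_t = \mu + \alpha\int_0^t e^{-\beta(t-u)}dN_u$ by Ogata-style thinning. The parameter values are arbitrary and the simulation is not part of the thesis's estimation procedure; under stationarity the average intensity should be close to $\mu/(1-\alpha/\beta)$, which follows from $E\lambda = \mu + (\alpha/\beta)E\lambda$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, alpha, beta = 0.5, 0.8, 1.5                  # alpha/beta < 1 keeps the process stable

def simulate(T):
    """Ogata thinning for lambda_t = mu + alpha * sum_i exp(-beta * (t - t_i))."""
    t, events = 0.0, []
    while t < T:
        # the intensity only decays between events, so lambda(t) is a valid upper bound
        lam_bar = mu + alpha * np.sum(np.exp(-beta * (t - np.array(events)))) if events else mu
        t += rng.exponential(1.0 / lam_bar)      # candidate arrival time
        lam_t = mu + (alpha * np.sum(np.exp(-beta * (t - np.array(events)))) if events else 0.0)
        if rng.uniform() <= lam_t / lam_bar:     # accept with probability lambda(t)/lam_bar
            events.append(t)
    return np.array(events)

events = simulate(200.0)
print(len(events) / 200.0, mu / (1 - alpha / beta))   # empirical vs theoretical mean intensity
```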
Proof. [Proposition 32] Since the parameters are bounded, we have

\[ L_T^{(1)}(\mu_1, \beta_{11}, \beta_{12}, \alpha_{11}, \alpha_{12}) = -\int_0^T \mu_1\,dt + \sum_{t_i}\int_{t_i}^{T}\alpha_{11}e^{-\beta_{11}(t-t_i)}\,dt + \cdots \]