Computational Economics 21: 3–22, 2003. © 2003 Kluwer Academic Publishers. Printed in the Netherlands.
Estimation of VAR Models: Computational Aspects

PAOLO FOSCHI and ERRICOS J. KONTOGHIORGHES
Institut d'informatique, Université de Neuchâtel, Rue Emile Argand 11, CH-2007 Neuchâtel, Switzerland; E-mail: [email protected] and [email protected]
Abstract. The Vector Autoregressive (VAR) model with zero coefficient restrictions can be formulated as a Seemingly Unrelated Regression Equations (SURE) model. Both the response vectors and the coefficient matrix of the regression equations comprise columns from a Toeplitz matrix. Efficient numerical and computational methods which exploit the Toeplitz and Kronecker product structure of the matrices are proposed. The methods are also adapted to provide numerically stable algorithms for the estimation of VAR(p) models with Granger-caused variables.

Key words: vector autoregressive processes, SURE model, matrix factorizations
1. Introduction

The vector time series $z_t \in \mathbb{R}^n$ is a Vector Autoregressive (VAR) process of order p when its data generation process has the form
$$z_t = \Phi_1 z_{t-1} + \Phi_2 z_{t-2} + \cdots + \Phi_p z_{t-p} + u_t, \qquad (1)$$
where $\Phi_i \in \mathbb{R}^{n\times n}$ are the coefficient matrices and the vectors $u_t \in \mathbb{R}^n$ are serially uncorrelated and identically distributed with zero mean and variance-covariance matrix $\Sigma$. That is, $E(u_t) = 0$, $E(u_t u_t^T) = \Sigma$ and $E(u_t u_\tau^T) = 0$ for $t \neq \tau$. Given a sample $z_1,\dots,z_M$ and a presample $z_{1-p},\dots,z_0$, the VAR model (1) is efficiently estimated by Ordinary Least Squares (OLS) estimation of the model
$$\begin{pmatrix} z_M^T \\ z_{M-1}^T \\ \vdots \\ z_1^T \end{pmatrix} = \begin{pmatrix} z_{M-p}^T & z_{M+1-p}^T & \cdots & z_{M-1}^T \\ z_{M-1-p}^T & z_{M-p}^T & \cdots & z_{M-2}^T \\ \vdots & \vdots & & \vdots \\ z_{1-p}^T & z_{2-p}^T & \cdots & z_0^T \end{pmatrix} \begin{pmatrix} \Phi_p^T \\ \Phi_{p-1}^T \\ \vdots \\ \Phi_1^T \end{pmatrix} + \begin{pmatrix} u_M^T \\ u_{M-1}^T \\ \vdots \\ u_1^T \end{pmatrix}, \qquad (2)$$
which in compact form can be written as
$$Y = XB + U, \qquad (3)$$
This work is in part supported by the Swiss National Foundation Grants 21-54109.98 and 1214-056900.99/1.
where Y, X, B and U are defined by the context. The variance-covariance matrix of vec(U) is $\Sigma \otimes I_M$, where vec(·) denotes the column stacking operator and ⊗ is the Kronecker product operator. The OLS and Generalized Least Squares (GLS) estimators of (3) are the same (Lütkepohl, 1993; Zellner, 1962).

Let $T = (X\ Y)$ and its QR decomposition (QRD) be given by
$$T = Q\begin{pmatrix} R \\ 0 \end{pmatrix} = (Q_T\ \ Q_Y\ \ Q_N)\begin{pmatrix} R_T & R_{TY} \\ 0 & R_Y \\ 0 & 0 \end{pmatrix}, \qquad (4)$$
where $Q = (Q_T\ Q_Y\ Q_N) \in \mathbb{R}^{M\times M}$ is orthogonal, with $Q_T$, $Q_Y$ and $Q_N$ having $np$, $n$ and $M-(p+1)n$ columns, respectively, and $R \in \mathbb{R}^{(p+1)n\times(p+1)n}$ is upper triangular, with $R_T \in \mathbb{R}^{np\times np}$ and $R_Y \in \mathbb{R}^{n\times n}$ upper triangular. The OLS estimator of B in (3) is computed by
$$\hat{B} = R_T^{-1} R_{TY}. \qquad (5)$$
The residuals are given by
$$\hat{U} = Q_Y R_Y, \qquad (6)$$
and the covariance matrix is estimated by
$$\hat{\Sigma} = \alpha \hat{U}^T \hat{U} = \alpha R_Y^T R_Y, \qquad (7)$$
where $\alpha = 1/M$ or $\alpha = 1/(M - np)$.

Alternatively, R in (4) may be computed using the Cholesky factorization of $T^TT$, but this is neither computationally efficient nor numerically stable, due to the poor numerical properties of the matrix $T^TT$. Efficient methods avoid this problem by computing the Cholesky factorization without forming that matrix explicitly. The matrix T is block Toeplitz with blocks of size $1 \times n$ that are constant along the diagonals. Exploiting the structure of T, a fast algorithm is possible for computing the upper triangular matrices in (4) and, thus, a fast estimation algorithm can be designed.
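As an illustration, the estimation steps (2)-(7) can be sketched in a few lines of NumPy. The sketch below is ours, not the paper's implementation: the name `var_ols` is illustrative, a dense QR factorization stands in for the fast structured factorization developed in Sections 2 and 3, and the data are placeholders.

```python
import numpy as np

def var_ols(z, p):
    """OLS estimation of a VAR(p) via the QRD (4) of T = (X Y); a sketch.

    z holds the presample and sample, z_{1-p}, ..., z_M, one row per period.
    """
    n = z.shape[1]
    M = z.shape[0] - p
    # Row for time t of (2): regressors (z_{t-p}^T ... z_{t-1}^T), response z_t^T;
    # rows are ordered t = M, ..., 1 as in (2).
    X = np.array([z[t - 1:t + p - 1].reshape(-1) for t in range(1, M + 1)])[::-1]
    Y = z[p:][::-1]
    R = np.linalg.qr(np.hstack([X, Y]), mode='r')   # only R of (4) is needed here
    k = n * p
    RT, RTY, RY = R[:k, :k], R[:k, k:], R[k:, k:]
    B = np.linalg.solve(RT, RTY)                    # (5): B = R_T^{-1} R_TY
    Sigma = (RY.T @ RY) / M                         # (7) with alpha = 1/M
    return B, Sigma                                 # B stacks (Phi_p^T ... Phi_1^T)

rng = np.random.default_rng(0)
B, Sigma = var_ols(rng.standard_normal((205, 3)), p=2)   # placeholder data
```

Only R is needed for $\hat{B}$ and $\hat{\Sigma}$; the residuals (6) would additionally require $Q_Y$.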
When other, possibly exogenous, factors such as deterministic trends are added, (1) becomes
$$z_t = \Phi w_t + \Phi_1 z_{t-1} + \Phi_2 z_{t-2} + \cdots + \Phi_p z_{t-p} + u_t, \qquad (8)$$
where $w_t \in \mathbb{R}^\eta$ and $\Phi \in \mathbb{R}^{n\times\eta}$. The OLS estimates of (8) can be computed the same way as those of (1). However, care must be taken in arranging the matrix of regressors (endogenous and exogenous) in (4) to minimize the loss of structure caused by the introduction of the exogenous variables $w_t$. A good choice is $\mathcal{M} = (W\ X\ Y)$, where $W^T = (w_M\ w_{M-1}\ \cdots\ w_1)$.

The complexity of the algorithms will be given in flops (floating point operations), where a flop denotes a single scalar multiplication or addition. Throughout the paper, the following notation is used: the vector $e_i$ denotes the ith
column of the $n\times n$ identity matrix $I_n$, and the $n\times n$ shift matrix is denoted by $Z = (e_2\ e_3\ \cdots\ e_n\ 0)$, that is,
$$Z = \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}. \qquad (9)$$
A set of vectors $v_1, v_2, \dots, v_n$ is denoted by $\{v_i\}_n$, and the direct sums of two or more matrices, $A \oplus B$ and $\oplus_{i=1}^n A_i$, are equivalent to the block diagonal matrices
$$\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_n \end{pmatrix}, \qquad (10)$$
respectively (Andrews and Kane, 1970; Graham, 1986; Regalia and Mitra, 1989). For notational convenience the subscript n in the set operator $\{\cdot\}_n$ is dropped and $\oplus_{i=1}^n$ will be abbreviated to $\oplus_i$.

The purpose of this work is twofold. First, to exploit the structure of the Toeplitz matrix in (3) and provide a fast algorithm to compute the upper triangular matrix R and, consequently, an efficient estimation procedure for the VAR(p) models. Second, to design computationally efficient methods to estimate VAR models with coefficient constraints by exploiting the Kronecker product structure of the Seemingly Unrelated Regression Equations (SURE) models. In Section 2 the Generalized Schur Algorithm (GSA) and its block version are presented. In Section 3 the adaptation of the algorithm to estimate the VAR models (1) and (8) is considered. In Section 4 the estimation of the VAR model with Zero Coefficient Constraints (VAR-ZCC) and the resulting SURE model are investigated. Finally, in Section 5 the estimation of the VAR model with Granger-causality restrictions is presented. This model is considered as a SURE model with Proper Subset Regressors (SURE-PSS).

2. Structured Matrices and the Generalized Schur Algorithm

Let $A \in \mathbb{R}^{n\times n}$ be a positive definite matrix and $F \in \mathbb{R}^{n\times n}$ be strictly lower triangular; that is, $F = [f_{ij}]$ with $f_{ij} = 0$ for $i \leq j$. The displacement of A with respect to (w.r.t.) F is
$$\nabla_F A = A - F A F^T \qquad (11)$$
and its rank $d = \operatorname{rank}(\nabla_F A)$ is called the displacement rank of A. The matrix A is said to have a displacement structure or, more simply, to be structured w.r.t. F if it has a small displacement rank. In this case
$$\nabla_F A = G^T J G, \qquad (12)$$
where $G \in \mathbb{R}^{d\times n}$, $J = I_k \oplus (-I_l)$ and $k + l = d$. The rows of G are called the generators of A (Kailath et al., 1979; Kailath and Sayed, 1995). Given F, the matrix A is uniquely defined by k, l and G. In fact, since F is strictly lower triangular, $F^n = 0$, and from (11) and (12) it follows that
$$A = \sum_{i=0}^{n-1} F^i (A - FAF^T)(F^i)^T = \sum_{i=0}^{n-1} F^i G^T J G (F^i)^T, \qquad (13)$$
where it has been assumed that $F^0 = I_n$. Consider for example the symmetric Toeplitz matrix
$$T = \begin{pmatrix} t_1 & t_2 & t_3 & \cdots & t_n \\ t_2 & t_1 & t_2 & \cdots & t_{n-1} \\ t_3 & t_2 & t_1 & \cdots & t_{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_n & t_{n-1} & t_{n-2} & \cdots & t_1 \end{pmatrix}. \qquad (14)$$
This is a structured matrix: its displacement rank w.r.t. the shift matrix $F = (e_2\ e_3\ \cdots\ e_n\ 0)$ – the matrix with ones on the first sub-diagonal and zeros elsewhere – is 2, with $k = l = 1$, and the generators are
$$G = \frac{1}{\sqrt{t_1}}\begin{pmatrix} t_1 & t_2 & t_3 & \cdots & t_n \\ 0 & t_2 & t_3 & \cdots & t_n \end{pmatrix}. \qquad (15)$$
A Generalized Schur Algorithm (GSA) can be used to compute the Cholesky factorization $A = R^T R$ when A has the displacement structure (12). At each iteration a row of R is computed. Since the first row of $F^i$ is zero for $i \geq 1$, it follows from (13) that the first row of A is given by
$$(a_{11}\ a_{12}\ \cdots\ a_{1n}) = r_{11}(r_{11}\ r_{12}\ \cdots\ r_{1n}) = (g_{11}\ g_{21}\ \cdots\ g_{d1})\,JG. \qquad (16)$$
If Q is a J-invariant matrix (a hyperbolic transformation), that is, $QJQ^T = J$, then the rows of $\tilde{G} = Q^TG$ are again generators of A. If Q is chosen to annihilate the first column of G except for its (1,1) element, then (16) is given by
$$\tilde{g}_{11}(\tilde{g}_{11}\ \tilde{g}_{12}\ \cdots\ \tilde{g}_{1n}); \qquad (17)$$
that is, $\tilde{g}_1^T = (\tilde{g}_{11}\ \tilde{g}_{12}\ \cdots\ \tilde{g}_{1n})$ is the first row of R. Consider now the partitioning
$$R = \begin{pmatrix} R_1 & R_{12} \\ 0 & R_2 \end{pmatrix}, \qquad A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad (18)$$
where the leading blocks $R_1$ and $A_{11}$ are of order $i-1$ and the trailing blocks $R_2$ and $A_{22}$ of order $n-i+1$,
and define
$$A^{(i)} = A - \begin{pmatrix} R_1^T \\ R_{12}^T \end{pmatrix}(R_1\ \ R_{12}) = \begin{pmatrix} 0 & 0 \\ 0 & S \end{pmatrix}, \qquad (19)$$
where $S = R_2^TR_2$ is the Schur complement of $A_{11}$. If $A^{(i)}$ has the displacement structure
$$\nabla_F A^{(i)} = G_i^T J G_i, \qquad (20)$$
with
$$G_i = \begin{pmatrix} 0 & u_1 & u_2^T \\ 0 & v_1 & V_2 \end{pmatrix}, \qquad (21)$$
where the column blocks have $i-1$, $1$ and $n-i$ columns and the row blocks $1$ and $d-1$ rows, and $Q_i$ is a J-invariant matrix that annihilates the elements of $v_1$, that is,
$$\tilde{G}_i = Q_i^T G_i = \begin{pmatrix} 0 & \rho & \tilde{u}^T \\ 0 & 0 & \tilde{V} \end{pmatrix}, \qquad (22)$$
then the first row of $R_2$ is given by $(\rho\ \ \tilde{u}^T)$. Now, if $r_i^T = (0^T\ \rho\ \tilde{u}^T)$, then $A^{(i+1)} = A^{(i)} - r_ir_i^T$, which has displacement given by
$$\nabla_F A^{(i+1)} = G_i^TJG_i - r_ir_i^T + Fr_ir_i^TF^T = \tilde{G}_i^TJ\tilde{G}_i - r_ir_i^T + Fr_ir_i^TF^T = G_{i+1}^TJG_{i+1}, \qquad (23)$$
where
$$G_{i+1} = \begin{pmatrix} r_i^TF^T \\ 0_{(d-1)\times i}\ \ \tilde{V} \end{pmatrix} \qquad (24)$$
has the same structure as (21); i.e., the first i elements of $Fr_i$ are zero. Thus, given the generators of A in (12), the rows of the Cholesky factor R can be computed by iterating Equations (22) and (24). The algorithm may break down if
$$(u_1\ \ v_1^T)\,J\begin{pmatrix} u_1 \\ v_1 \end{pmatrix} < 0. \qquad (25)$$
At each step of the algorithm, a J-invariant matrix $Q_i$ must be computed to annihilate the elements of $v_1$ in (21). The computation of this matrix plays a key role in the numerical stability of the whole algorithm (Stewart and Van Dooren, 1997). In particular, the number of hyperbolic transformations should be minimized. Here only one hyperbolic Givens rotation is used, and it is applied in factored form (Stewart and Van Dooren, 1997).
Let $(x^T\ y^T) = (u_1\ v_1^T)$, where $x \in \mathbb{R}^k$ and $y \in \mathbb{R}^l$. If $Q_x \in \mathbb{R}^{k\times k}$ and $Q_y \in \mathbb{R}^{l\times l}$ are two orthogonal matrices, then $Q_x \oplus Q_y$ is J-invariant. If $Q_x$ and $Q_y$ are the Householder transformations such that $Q_x^Tx = \alpha e_1$ and $Q_y^Ty = \beta e_1$, and $H \in \mathbb{R}^{2\times 2}$ is such that
$$H\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \rho \\ 0 \end{pmatrix} \qquad (26)$$
and
$$H^T\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}H = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad (27)$$
then the matrix
$$Q = \begin{pmatrix} Q_x & 0 \\ 0 & Q_y \end{pmatrix}\begin{pmatrix} h_{11} & 0 & h_{12} & 0 \\ 0 & I_{k-1} & 0 & 0 \\ h_{21} & 0 & h_{22} & 0 \\ 0 & 0 & 0 & I_{l-1} \end{pmatrix} \qquad (28)$$
is J-invariant and satisfies
$$Q^T\begin{pmatrix} x \\ y \end{pmatrix} = \rho e_1. \qquad (29)$$
Notice that, if no breakdown occurs, the matrix H can always be computed as the hyperbolic Givens rotation
$$H = \frac{1}{c}\begin{pmatrix} 1 & s \\ s & 1 \end{pmatrix}, \qquad (30)$$
where $s = -\beta/\alpha$ and $c^2 = 1 - s^2$. In factored form this is
$$H = \begin{pmatrix} 1 & 0 \\ s & c \end{pmatrix}\begin{pmatrix} 1/c & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}, \qquad (31)$$
which represents a stable implementation of the transformation H (Stewart and Van Dooren, 1997).

Algorithm 1 summarizes the Generalized Schur Algorithm. It needs to store only the generators $G_i$ and the matrix R; the matrix $A^{(i)}$ is never computed explicitly. Assuming that the matrix-vector multiplications involving F are negligible (when F is a shift matrix) and using Householder matrices for $Q_x$ and $Q_y$, the computational cost of the algorithm is dominated by Steps 5 and 7, which require $4d(n-i)$ and $6(n-i)$ flops, respectively (Golub and Van Loan, 1996). Therefore, the complexity of the algorithm is approximately $(2d+3)n^2$.
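The following sketch (our code; names are illustrative) applies the rotation in exactly the factored form (31) to a pair of generator rows, guarding against the breakdown condition (25), which in this $2\times 2$ setting reduces to $\alpha^2 \leq \beta^2$:

```python
import numpy as np

def apply_hyperbolic(alpha, beta, xrow, yrow):
    """Apply the hyperbolic Givens rotation (30) in factored form (31).

    Maps (alpha, beta) to (rho, 0) and transforms the rows accordingly.
    """
    if abs(alpha) <= abs(beta):
        raise ValueError('breakdown: alpha^2 <= beta^2, cf. (25)')
    s = -beta / alpha
    c = np.sqrt(1.0 - s * s)
    x1 = (xrow + s * yrow) / c     # the [[1,s],[0,1]] and [[1/c,0],[0,1]] factors
    y1 = s * x1 + c * yrow         # the [[1,0],[s,c]] factor
    rho = c * alpha                # equals sign(alpha) * sqrt(alpha^2 - beta^2)
    return rho, x1, y1

x, y = np.ones(5), np.linspace(0.0, 0.8, 5)
rho, x1, y1 = apply_hyperbolic(2.0, 1.0, x, y)
assert np.allclose(x1**2 - y1**2, x**2 - y**2)   # J-invariance is preserved
```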
ALGORITHM 1. Generalized Schur Algorithm
1: Set $G^{(1)} = G$
2: for $i = 1, 2, \dots, n-1$ do
3:   Let
$$G^{(i)} = \begin{pmatrix} 0 & x & X \\ 0 & y & Y \end{pmatrix}, \qquad (32)$$
     with column blocks of $i-1$, $1$ and $n-i$ columns and row blocks of $k$ and $l$ rows.
4:   Compute the Householder reflections $Q_x$ and $Q_y$ such that $Q_x^Tx = \alpha e_1$ and $Q_y^Ty = \beta e_1$, respectively.
5:   Apply $Q_x$ and $Q_y$ to X and Y, respectively, such that
$$\tilde{X} = Q_x^TX, \qquad \tilde{Y} = Q_y^TY. \qquad (33)$$
6:   Compute a hyperbolic Givens transformation H such that
$$H^T\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \rho e_1. \qquad (34)$$
7:   Apply H to the first rows of $\tilde{X}$ and $\tilde{Y}$ to obtain $\hat{X}$, $\hat{Y}$ and
$$\hat{G}^{(i)} = \begin{pmatrix} 0 & \rho e_1 & \hat{X} \\ 0 & 0 & \hat{Y} \end{pmatrix}. \qquad (35)$$
8:   Store $(0\ \cdots\ 0\ \rho\ e_1^T\hat{X})$ in the ith row of R.
9:   $G^{(i+1)}$ is defined by $\hat{G}^{(i)}$ after its first row has been multiplied by F.
10: end for

For some applications a block version of the algorithm is more appropriate. That is, $A, F \in \mathbb{R}^{Nn\times Nn}$ are matrices with blocks of size $n \times n$, where F is block strictly lower triangular and A has displacement rank d w.r.t. F. A possible implementation of the Block Generalized Schur Algorithm (BGSA) is outlined in Algorithm 2. Steps 4–7 compute a J-invariant transformation $Q_i$ such that
$$Q_i^T\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix} = \begin{pmatrix} R_i \\ 0 \end{pmatrix}. \qquad (36)$$
Notice that other implementations exist for computing (36), each of which has different numerical properties. In Algorithm 2 the only critical part, concerning numerical stability, is the computation of the hyperbolic transformation H in Step 6. The breakdown of the block algorithm (Gallivan et al., 1996) is related to the non-existence or the singularity of $R_i$ in (36).
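As a concrete instance of Algorithm 1, the sketch below (ours, with illustrative names) runs the GSA on a symmetric positive definite Toeplitz matrix using the generators (15); since $d = 2$ ($k = l = 1$), the Householder Steps 4–5 are vacuous and only the hyperbolic rotation remains.

```python
import numpy as np
from scipy.linalg import toeplitz

def gsa_toeplitz_cholesky(t):
    """Return an upper triangular R with toeplitz(t) = R^T R; a sketch of
    Algorithm 1 for the generators (15), rotation applied in product form."""
    n = len(t)
    g = np.vstack([t, t]) / np.sqrt(t[0])    # generators (15)
    g[1, 0] = 0.0
    R = np.zeros((n, n))
    for i in range(n):
        a, b = g[0, i], g[1, i]              # column i of the generators
        s, c = -b / a, np.sqrt(1.0 - (b / a) ** 2)
        g0 = (g[0] + s * g[1]) / c           # hyperbolic rotation, cf. (30)
        g[1] = s * g0 + c * g[1]
        R[i] = g0                            # Step 8: i-th row of R
        g[0] = np.roll(g0, 1)                # Step 9: multiply first row by F
        g[0, 0] = 0.0
    return R

t = np.array([4.0, 1.0, 0.5, 0.25])
R = gsa_toeplitz_cholesky(t)
assert np.allclose(R.T @ R, toeplitz(t))
```

The loop touches only the two generator rows, which is the source of the $O(n^2)$ complexity quoted above.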
ALGORITHM 2. Generalized Schur Algorithm – Block Version
1: Set $G^{(1)} = G$ and $J = I_k \oplus (-I_l)$.
2: for $i = 1, 2, \dots, N$ do
3:   Let
$$G_i = \begin{pmatrix} 0 & X_1 & X_2 \\ 0 & Y_1 & Y_2 \end{pmatrix}, \qquad (37)$$
     with column blocks of $(i-1)n$, $n$ and $(N-i)n$ columns and row blocks of $k$ and $l$ rows.
4:   Compute the QRDs
$$X_1 = Q_X\begin{pmatrix} R_X \\ 0 \end{pmatrix} \qquad\text{and}\qquad Y_1 = Q_Y\begin{pmatrix} R_Y \\ 0 \end{pmatrix}. \qquad (38)$$
5:   Compute
$$\begin{pmatrix} \hat{X}_{2A} \\ \hat{X}_{2B} \end{pmatrix} = Q_X^TX_2 \qquad\text{and}\qquad \begin{pmatrix} \hat{Y}_{2A} \\ \hat{Y}_{2B} \end{pmatrix} = Q_Y^TY_2, \qquad (39)$$
     with row blocks of $n$ and $k-n$, and $n$ and $l-n$ rows, respectively.
6:   Compute a J-invariant matrix H such that
$$H^T\begin{pmatrix} R_X \\ R_Y \end{pmatrix} = \begin{pmatrix} R_i \\ 0 \end{pmatrix}. \qquad (40)$$
7:   Compute
$$\begin{pmatrix} \breve{X}_{2A} \\ \breve{Y}_{2A} \end{pmatrix} = H^T\begin{pmatrix} \hat{X}_{2A} \\ \hat{Y}_{2A} \end{pmatrix}. \qquad (41)$$
8:   Store $(0\ \ R_i\ \ \breve{X}_{2A})$ in the ith block-row of R.
9:   Compute
$$(0\ \ \hat{X}_A) = (0\ \ R_i\ \ \breve{X}_{2A})\,F^T, \qquad (42)$$
     where $\hat{X}_A$ occupies the trailing $(N-i)n$ columns, and form $G_{i+1}$ as
$$G_{i+1} = \begin{pmatrix} 0 & 0 & \hat{X}_A \\ 0 & 0 & \hat{X}_{2B} \\ 0 & 0 & \breve{Y}_{2A} \\ 0 & 0 & \hat{Y}_{2B} \end{pmatrix}, \qquad (43)$$
     with column blocks of $(i-1)n$, $n$ and $(N-i)n$ columns and row blocks of $n$, $k-n$, $n$ and $l-n$ rows.
10: end for
If Householder transformations are used, then the computational complexity of Steps 4 and 5 is $4n^2(N-i)(d-n)$ flops. The number of flops for Steps 6 and 7, using hyperbolic Givens rotations, is given by
$$\sum_{j=1}^{n}\sum_{h=1}^{j} 6\big((N-i)n + h\big) \approx \sum_{j=1}^{n} 6\Big(j(N-i)n + \tfrac{1}{2}j^2\Big) \approx 4(N-i)n^3. \qquad (44)$$
Thus, the computational complexity of the whole procedure is given by
$$\sum_{i=1}^{N}(N-i)n^2\big(4n + 4(d-n)\big) \approx \sum_{i=1}^{N} 4d(N-i)n^2 \approx 2dN^2n^2. \qquad (45)$$

3. A Fast Algorithm for the OLS Estimation of the VAR Model

The BGSA shown by Algorithm 2 is designed to compute the Cholesky factorization of structured matrices. To compute the matrix R in (4), the algorithm should be applied to the matrix $T^TT$. Consider the block Toeplitz matrix $T \in \mathbb{R}^{Mm\times Nn}$ defined by
$$T = [T_{i-j}]_{i=1,\dots,M}^{j=1,\dots,N}, \qquad (46)$$
that is,
$$T = \begin{pmatrix} T_0 & T_{-1} & \cdots & T_{1-N} \\ T_1 & T_0 & \cdots & T_{2-N} \\ \vdots & \vdots & & \vdots \\ T_{M-1} & T_{M-2} & \cdots & T_{M-N} \end{pmatrix}, \qquad (47)$$
where $T_k \in \mathbb{R}^{m\times n}$. Let $A = T^TT$, where the (i,j)th block $A_{ij} \in \mathbb{R}^{n\times n}$ is given by
$$A_{ij} = \sum_{k=1}^{M} T_{k-i}^TT_{k-j} = Y_{i-1}^TY_{j-1}, \qquad (48)$$
and $Y_i^T = (T_{-i}^T\ T_{1-i}^T\ \cdots\ T_{M-i-1}^T)$ is the (i+1)th block column of T. If $Y_0$ has full column rank, then the displacement rank of A w.r.t. $Z_n = Z \otimes I_n$, where Z is here the $N\times N$ shift matrix of the form (9), is $2(m+n)$, and the generators (Appendix A) are given by $J = I_{m+n} \oplus (-I_{m+n})$ and
$$G = \begin{pmatrix} R_0 & Q_0^TY_1 & \cdots & Q_0^TY_{N-1} \\ 0 & T_{-1} & \cdots & T_{1-N} \\ 0 & Q_0^TY_1 & \cdots & Q_0^TY_{N-1} \\ 0 & T_{M-1} & \cdots & T_{M+1-N} \end{pmatrix}, \qquad (49)$$
with column blocks of n columns each and row blocks of $n$, $m$, $n$ and $m$ rows,
where $Y_0 = Q_0R_0$, $R_0 \in \mathbb{R}^{n\times n}$ is upper triangular and $Q_0 \in \mathbb{R}^{Mm\times n}$ is orthogonal.

If $\mathcal{M} = (W\ T)$, where $W \in \mathbb{R}^{Mm\times\eta}$, then the displacement equation for $\bar{A} = \mathcal{M}^T\mathcal{M}$ w.r.t. the shift matrix $\bar{Z} = 0_{\eta\times\eta} \oplus Z_n$ is
$$\nabla_{\bar{Z}}\bar{A} = \begin{pmatrix} W^TW & W^TT \\ T^TW & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \nabla_{Z_n}T^TT \end{pmatrix}, \qquad (50)$$
so that the generators are given by $J = I_{\eta+m+n} \oplus (-I_{\eta+m+n})$ and
$$G = \begin{pmatrix} R_W & Q_W^TY_0 & Q_W^TY_1 & \cdots & Q_W^TY_{N-1} \\ 0 & R_0 & Q_0^TY_1 & \cdots & Q_0^TY_{N-1} \\ 0 & 0 & T_{-1} & \cdots & T_{1-N} \\ 0 & Q_W^TY_0 & Q_W^TY_1 & \cdots & Q_W^TY_{N-1} \\ 0 & 0 & Q_0^TY_1 & \cdots & Q_0^TY_{N-1} \\ 0 & 0 & T_{M-1} & \cdots & T_{M+1-N} \end{pmatrix}, \qquad (51)$$
with column blocks of $\eta, n, n, \dots, n$ columns and row blocks of $\eta$, $n$, $m$, $\eta$, $n$ and $m$ rows. The computation of the generators (51) requires approximately $4n^2((n + mM/2)N + \eta)$ flops. Notice that, if the generators are given by (49) or by (51), then Steps 4–7 of Algorithm 2 are not needed for the first iteration.

In the specific case of the estimation of VAR models, $m = 1$, $N = p+1$, $T_k = z_{M-p-k}^T$, $Y_i^T = (z_{M+i-p}\ \cdots\ z_{2+i-p}\ z_{1+i-p})$, and the displacement ranks of A and $\bar{A}$ are $2(n+1)$ and $2(n+\eta+1)$, respectively. Thus, the computation of the Cholesky factorization of $\bar{A}$ using Algorithm 2 requires approximately $2(n+\eta)n^2p^2$ flops, and the computation of the generators requires a further $4n^2((n + M/2)p + \eta)$ flops.
4. VAR Models with Zero Coefficient Constraints

The VAR model (3) can be written as the SURE model
$$\operatorname{vec}(Y) = (I_n \otimes X)\operatorname{vec}(B) + \operatorname{vec}(U), \qquad (52)$$
where vec(U) has zero mean and variance-covariance matrix given by $\Sigma \otimes I_M$. Often zero coefficient constraints (ZCC) are imposed on VAR models, hereafter called the VAR-ZCC model (Lütkepohl, 1993). Thus, some elements of B are zero. Let $\beta_i \in \mathbb{R}^{k_i}$ be the vector of nonzero elements in the ith column of B. The VAR-ZCC model can be written as the SURE model
$$\operatorname{vec}(Y) = (\oplus_{i=1}^n X_i)\operatorname{vec}(\{\beta_i\}_n) + \operatorname{vec}(U), \qquad (53)$$
where $X_i = XS_i$ and $S_i \in \mathbb{R}^{np\times k_i}$ is a selection matrix. Notice that the SURE model has common regressors. The Best Linear Unbiased Estimator (BLUE) of $\operatorname{vec}(\{\beta_i\})$ is obtained from the solution of the Generalized Least Squares (GLS) problem
$$\underset{\beta_1,\dots,\beta_n}{\operatorname{argmin}}\ \big\|\operatorname{vec}(Y) - \operatorname{vec}(\{X_i\beta_i\})\big\|_{\Sigma^{-1}\otimes I_M}, \qquad (54)$$
which is given by
$$\operatorname{vec}(\{\hat{\beta}_i\}) = \big((\oplus_iX_i^T)(\Sigma^{-1}\otimes I_M)(\oplus_iX_i)\big)^{-1}(\oplus_iX_i^T)(\Sigma^{-1}\otimes I_M)\operatorname{vec}(Y). \qquad (55)$$
The computation of (55) has poor numerical properties and does not allow for a singular dispersion matrix $\Sigma$. Therefore, it is preferable to formulate (54) as the Generalized Linear Least Squares Problem (GLLSP)
$$\underset{V,\{\beta_i\}}{\operatorname{argmin}}\ \|V\|_F \quad\text{subject to}\quad \operatorname{vec}(Y) = (\oplus_iX_i)\operatorname{vec}(\{\beta_i\}) + \operatorname{vec}(VC^T), \qquad (56)$$
where $\|\cdot\|_F$ denotes the Frobenius norm, $\Sigma = CC^T$, the upper triangular $C \in \mathbb{R}^{n\times n}$ has full rank and the random matrix V is defined by $(C\otimes I_M)\operatorname{vec}(V) = \operatorname{vec}(U)$. That is, $VC^T = U$, which implies that vec(V) has zero mean and variance-covariance matrix $I_{nM}$ (Kontoghiorghes and Clarke, 1995; Kourouklis and Paige, 1981; Paige, 1978; Paige, 1979). Without loss of generality it will be assumed that $\Sigma$ is non-singular. Consider the Generalized QR decomposition (GQRD)
$$Q^T(\oplus_iX_i) = \begin{pmatrix} \oplus_iR_i \\ 0 \end{pmatrix} \qquad (57a)$$
and
$$Q^T(C\otimes I_M)P = \begin{pmatrix} W_{11} & W_{12} \\ 0 & W_{22} \end{pmatrix}, \qquad (57b)$$
with row and column blocks of $K$ and $nM-K$ rows and columns,
where $K = \sum_{i=1}^n k_i$, $R_i \in \mathbb{R}^{k_i\times k_i}$ and $W_{22}$ are upper triangular, and $Q, P \in \mathbb{R}^{nM\times nM}$ are orthogonal (Anderson et al., 1992; Paige, 1990). Using (57), the GLLSP (56) can be written as
$$\underset{\{v_{iA}\},\{v_{iB}\},\{\beta_i\}}{\operatorname{argmin}}\ \sum_{i=1}^{n}\big(\|v_{iA}\|^2 + \|v_{iB}\|^2\big) \quad\text{subject to}$$
$$\begin{pmatrix} \operatorname{vec}(\{y_{iA}\}) \\ \operatorname{vec}(\{y_{iB}\}) \end{pmatrix} = \begin{pmatrix} \oplus_iR_i \\ 0 \end{pmatrix}\operatorname{vec}(\{\beta_i\}) + \begin{pmatrix} W_{11} & W_{12} \\ 0 & W_{22} \end{pmatrix}\begin{pmatrix} \operatorname{vec}(\{v_{iA}\}) \\ \operatorname{vec}(\{v_{iB}\}) \end{pmatrix}, \qquad (58)$$
where
$$\begin{pmatrix} \operatorname{vec}(\{y_{iA}\}) \\ \operatorname{vec}(\{y_{iB}\}) \end{pmatrix} = Q^T\operatorname{vec}(Y) \qquad\text{and}\qquad \begin{pmatrix} \operatorname{vec}(\{v_{iA}\}) \\ \operatorname{vec}(\{v_{iB}\}) \end{pmatrix} = P^T\operatorname{vec}(V),$$
with top and bottom blocks of $K$ and $nM-K$ elements, respectively.
From (58) it follows that $\operatorname{vec}(\{v_{iB}\}) = W_{22}^{-1}\operatorname{vec}(\{y_{iB}\})$ and $\operatorname{vec}(\{v_{iA}\}) = 0$. Thus, the solution of the SURE model comes from solving the triangular system
$$\begin{pmatrix} \oplus_iR_i & W_{12} \\ 0 & W_{22} \end{pmatrix}\begin{pmatrix} \operatorname{vec}(\{\beta_i\}) \\ \operatorname{vec}(\{v_{iB}\}) \end{pmatrix} = \begin{pmatrix} \operatorname{vec}(\{y_{iA}\}) \\ \operatorname{vec}(\{y_{iB}\}) \end{pmatrix}. \qquad (59)$$
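The GQRD-based solution of a GLLSP can be sketched compactly for a single dense block. The code below (ours; illustrative names, with `scipy.linalg.rq` supplying the second-stage RQ decomposition) mirrors (57)-(59) for one equation, i.e. it solves $\operatorname{argmin}\|v\|$ subject to $y = X\beta + Cv$ without ever inverting $CC^T$:

```python
import numpy as np
from scipy.linalg import rq, solve_triangular

def gllsp(X, y, C):
    M, k = X.shape
    Q, R = np.linalg.qr(X, mode='complete')          # (57a): Q^T X = [R; 0]
    Ct, yt = Q.T @ C, Q.T @ y
    W2, P = rq(Ct[k:, :])                            # (57b): Ct[k:] = (0 W22) P
    W22 = W2[:, k:]                                  # upper triangular
    W12 = (Ct[:k, :] @ P.T)[:, k:]
    uB = solve_triangular(W22, yt[k:])               # bottom block of (59)
    beta = solve_triangular(R[:k], yt[:k] - W12 @ uB)  # back-substitution in (59)
    v = P.T @ np.concatenate([np.zeros(k), uB])      # v_A = 0 minimizes ||v||
    return beta, v

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 4))
C = np.triu(rng.standard_normal((20, 20))) + 5 * np.eye(20)
y = X @ rng.standard_normal(4) + C @ rng.standard_normal(20) * 0.1
beta, v = gllsp(X, y, C)
assert np.allclose(X @ beta + C @ v, y)              # the constraint of (56)
```

For the SURE model the same steps are applied to $\oplus_iX_i$ and $C\otimes I_M$, where the common-regressor structure discussed next reduces the cost substantially.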
The matrix Q in (57) is defined as $Q = (\oplus_iQ_{iA}\ \ \oplus_iQ_{iB})$, where
$$Q_i^TX_i = \begin{pmatrix} R_i \\ 0 \end{pmatrix}, \qquad\text{with}\quad Q_i = (Q_{iA}\ Q_{iB}),$$
is the QRD of $X_i$ ($i = 1,\dots,n$) and $Q_{iA}$ has $k_i$ columns. The computation of (57b) occurs in two stages. The first stage computes
$$Q^T(C\otimes I_M)Q = \begin{pmatrix} \tilde{W}_{11} & \tilde{W}_{12} \\ \tilde{W}_{21} & \tilde{W}_{22} \end{pmatrix}, \qquad (60)$$
with row and column blocks of $K$ and $nM-K$ rows and columns, where $\tilde{W}_{ij}$ ($i,j = 1,2$) is block upper triangular. Furthermore, the main block-diagonals of $\tilde{W}_{12}$ and $\tilde{W}_{21}$ are zero, and the ith ($i = 1,\dots,n$) blocks of the main diagonals of $\tilde{W}_{11}$ and $\tilde{W}_{22}$ are given by $c_{ii}I_{k_i}$ and $c_{ii}I_{M-k_i}$, respectively. The second stage computes the RQD
$$(\tilde{W}_{21}\ \ \tilde{W}_{22})\tilde{P} = (0\ \ W_{22}) \qquad (61a)$$
and
$$(\tilde{W}_{11}\ \ \tilde{W}_{12})\tilde{P} = (W_{11}\ \ W_{12}). \qquad (61b)$$
Thus, in (57b), $P = Q\tilde{P}$. Sequential and parallel strategies for computing the RQD (61) have been described in Kontoghiorghes (1999, 2000).

The computations of (60) and (61) are the most expensive operations – their computational cost is in the order of $n^3M^3$ – and they become quite important when computing the feasible GLS (FGLS) estimators, that is, when they are computed at each iteration for a different C. However, the common regressors that exist in the SURE model can be used to reduce the computational complexity of the iterative estimation procedure (Kontoghiorghes et al., 2000). Consider the QRD given in (4). Premultiplying (53) from the left by $\bar{Q}^T = (I_n\otimes Q_T\ \ I_n\otimes Q_Y\ \ I_n\otimes Q_N)^T$ gives
$$\begin{pmatrix} \operatorname{vec}(R_{TY}) \\ \operatorname{vec}(R_Y) \\ 0 \end{pmatrix} = \begin{pmatrix} \oplus_iR_TS_i \\ 0 \\ 0 \end{pmatrix}\operatorname{vec}(\{\beta_i\}) + \begin{pmatrix} \operatorname{vec}(U_T) \\ \operatorname{vec}(U_Y) \\ \operatorname{vec}(U_N) \end{pmatrix}, \qquad (62)$$
where
$$Q^TU = \begin{pmatrix} U_T \\ U_Y \\ U_N \end{pmatrix}, \qquad (63)$$
with row blocks of $np$, $n$ and $M-(p+1)n$ rows. The covariance matrix of $(\operatorname{vec}(U_T)^T\ \operatorname{vec}(U_Y)^T\ \operatorname{vec}(U_N)^T)^T$ is given by
$$\begin{pmatrix} \Sigma\otimes I_{np} & 0 & 0 \\ 0 & \Sigma\otimes I_n & 0 \\ 0 & 0 & \Sigma\otimes I_{M-n(p+1)} \end{pmatrix}. \qquad (64)$$
Thus, the estimator of the SURE model (53) arises from the solution of the reduced-size model
$$\operatorname{vec}(R_{TY}) = (\oplus_iR_TS_i)\operatorname{vec}(\{\beta_i\}) + \operatorname{vec}(U_T), \qquad (65)$$
where the covariance matrix of $\operatorname{vec}(U_T)$ is given by $\Sigma\otimes I_{np}$. The estimator of $\Sigma$ is now given by
$$\hat{\Sigma} = \frac{1}{M}\big(\hat{U}_T^T\hat{U}_T + R_Y^TR_Y\big), \qquad (66)$$
where $\hat{U}_T$ is the residual matrix estimated from (65). Notice that $R_Y^TR_Y$ does not depend on the covariance matrix $\Sigma$.
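The economy of (62)-(66) is easy to see in code: one QR decomposition of the $M\times(np+n)$ data matrix reduces every subsequent computation to $np$ rows. In the sketch below (ours; illustrative names) each $\beta_i$ is obtained by equation-wise OLS on the reduced model, i.e. only the first step of the FGLS iteration, not the full GLLSP solution:

```python
import numpy as np

def reduced_var_zcc(X, Y, S):
    """X: M x np, Y: M x n, S: list of np x k_i selection matrices; a sketch."""
    M, n = Y.shape
    k = X.shape[1]
    R = np.linalg.qr(np.hstack([X, Y]), mode='r')   # the QRD (4)
    RT, RTY, RY = R[:k, :k], R[:k, k:], R[k:, k:]
    UT = RTY.copy()
    beta = []
    for i in range(n):                              # reduced model (65)
        bi, *_ = np.linalg.lstsq(RT @ S[i], RTY[:, i], rcond=None)
        beta.append(bi)
        UT[:, i] -= RT @ S[i] @ bi
    Sigma = (UT.T @ UT + RY.T @ RY) / M             # the estimator (66)
    return beta, Sigma

rng = np.random.default_rng(6)
X, Y = rng.standard_normal((50, 6)), rng.standard_normal((50, 3))
S = [np.eye(6)[:, :k] for k in (6, 4, 2)]
beta, Sigma = reduced_var_zcc(X, Y, S)
```

Only $U_T$ depends on the $\beta_i$; the term $R_Y^TR_Y$ in (66) is computed once, as noted above.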
5. VAR Models with Granger Causality Restrictions

Consider partitioning the time series $z_t$ and the coefficient matrices $\Phi_l$ in (1) as
$$z_t = \begin{pmatrix} z_{At} \\ z_{Bt} \end{pmatrix} \qquad\text{and}\qquad \Phi_l = \begin{pmatrix} \Phi_{AAl} & \Phi_{ABl} \\ \Phi_{BAl} & \Phi_{BBl} \end{pmatrix}. \qquad (67)$$
If $\Phi_{BAl} = 0$ for $l = 1, 2, \dots, p$, then the series $z_{At}$ does not Granger-cause $z_{Bt}$; that is, $z_{At}$ is not linearly informative about the future of the time series $z_{Bt}$ (Hamilton, 1994; Lütkepohl, 1993).

This concept can be generalized. Define the permutation $(\pi_1,\dots,\pi_n)$ of the indices $(1,\dots,n)$, and let $\Pi = (e_{\pi_1}\ e_{\pi_2}\ \cdots\ e_{\pi_n})$ be the associated permutation matrix. Now, if there exists a permutation $(\pi_1,\dots,\pi_n)$ such that
$$\Pi^T\Phi_l\Pi \sim \begin{pmatrix} \times & \times & \cdots & \times \\ 0 & \times & \cdots & \times \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \times \end{pmatrix}, \qquad (68)$$
with row blocks of $k_1, k_2, \dots, k_s$ rows and column blocks of $h_1, h_2, \dots, h_s$ columns,
Figure 1. Proper Subset Structure derived from Granger causality restrictions.
for $l = 1, 2, \dots, p$, then the series $z_{\pi_j}$ does not Granger-cause $z_{\pi_i}$ when the (i,j)th element of the matrix in (68) belongs to the block lower triangular (zero) part. When the Granger causality restrictions are known a priori, the variables can be ordered so that the matrices $\Phi_l$ have the structure given in (68). In this case the model is equivalent to a triangular SURE model (Srivastava and Giles, 1987; Kontoghiorghes and Dinenis, 1996). Conversely, if different models with different causality restrictions have to be estimated, then a reordering is not convenient, since the Toeplitz structure of X would be destroyed. In this case the VAR model is equivalent to a SURE model with proper subset regressors (Srivastava and Giles, 1987).

Multiplying (3) on the right by $\Pi$ gives
$$\tilde{Y} = X\tilde{B} + \tilde{U}, \qquad (69)$$
where $\tilde{Y} = Y\Pi$, $\tilde{U} = U\Pi$ and $\tilde{B} = B\Pi$. Figure 1 shows an example of the structure of $\tilde{B}^T$. The matrix $\tilde{B}$ is characterized by the property that, if its (j,i)th element is zero, then the (j,i+1)th element is also zero. That is, $\tilde{B} = (S_1\beta_1\ S_2\beta_2\ \cdots\ S_n\beta_n)$, where $S_i = \prod_{k=1}^{i}\hat{S}_k = S_{i-1}\hat{S}_i$ and the $\hat{S}_k$ are selection matrices. The regressor matrices of the SURE model are defined as $X_i = XS_i = X_{i-1}\hat{S}_i$. Consider the QRDs
$$X_i = (Q_{iA}\ \ Q_{iB})\begin{pmatrix} R_i \\ 0 \end{pmatrix} \qquad (70)$$
and
$$\begin{pmatrix} R_i \\ 0 \end{pmatrix}\hat{S}_{i+1} = (\bar{Q}_{i+1,A}\ \ \bar{Q}_{i+1,B})\begin{pmatrix} R_{i+1} \\ 0 \end{pmatrix}, \qquad (71)$$
where $Q_i = (Q_{iA}\ Q_{iB})$ and $\bar{Q}_{i+1} = (\bar{Q}_{i+1,A}\ \bar{Q}_{i+1,B})$ are orthogonal matrices. The QRDs (70) and (71) imply
$$X_{i+1} = Q_i\begin{pmatrix} R_i \\ 0 \end{pmatrix}\hat{S}_{i+1} = Q_i\bar{Q}_{i+1}\begin{pmatrix} R_{i+1} \\ 0 \end{pmatrix} \qquad (72)$$
and $Q_{i+1} = Q_i\bar{Q}_{i+1}$. Notice that
$$\bar{Q}_{i+1} = \begin{pmatrix} \hat{Q}_{i+1} & 0 \\ 0 & I_{M-k_i} \end{pmatrix}. \qquad (73)$$
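The updating formulae (70)-(73) say that, once $X_1$ has been factorized, each subsequent QRD costs only the factorization of the small matrix $R_i\hat{S}_{i+1}$. A minimal numerical confirmation (ours; selection of the leading columns stands in for a general $\hat{S}_{i+1}$):

```python
import numpy as np

rng = np.random.default_rng(3)
M, k1, k2 = 30, 6, 4
X1 = rng.standard_normal((M, k1))
S2 = np.eye(k1)[:, :k2]            # selection matrix: keep the first k2 columns

Q1, R1 = np.linalg.qr(X1)          # the QRD (70), thin form
Qhat, R2 = np.linalg.qr(R1 @ S2)   # k1 x k2 problem, cf. (71) and (73)
Q2 = Q1 @ Qhat                     # (72): an orthonormal basis of range(X1 @ S2)
assert np.allclose(Q2 @ R2, X1 @ S2)
```

Because $\bar{Q}_{i+1}$ has the form (73), applying $\hat{Q}_{i+1}^T$ touches only the first $k_i$ rows of any matrix it updates – the fact exploited in (80) and in Algorithm 3 below.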
In (60), for $x, y \in \{A, B\}$,
$$\tilde{W}_{xy} = \begin{pmatrix} c_{11}Q_{1x}^TQ_{1y} & c_{12}Q_{1x}^TQ_{2y} & \cdots & c_{1n}Q_{1x}^TQ_{ny} \\ 0 & c_{22}Q_{2x}^TQ_{2y} & \cdots & c_{2n}Q_{2x}^TQ_{ny} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_{nn}Q_{nx}^TQ_{ny} \end{pmatrix}. \qquad (74)$$
Thus, for $i < j$,
$$Q_{ix}^TQ_{jy} = \bar{Q}_{ix}^T\bar{Q}_{i-1}^T\cdots\bar{Q}_1^T\,\bar{Q}_1\cdots\bar{Q}_{i-1}\bar{Q}_i\cdots\bar{Q}_{j-1}\bar{Q}_{jy} = \bar{Q}_{ix}^T\bar{Q}_i\cdots\bar{Q}_{j-1}\bar{Q}_{jy} = \begin{cases} (I\ \ 0)\,\bar{Q}_{i+1}\cdots\bar{Q}_{j-1}\bar{Q}_{jy}, & \text{if } x = A, \\ (0\ \ I)\,\bar{Q}_{i+1}\cdots\bar{Q}_{j-1}\bar{Q}_{jy}, & \text{if } x = B, \end{cases} \qquad (75)$$
and for $i = j$,
$$Q_{ix}^TQ_{jy} = \begin{cases} I_{k_i} & \text{if } x = y = A, \\ I_{M-k_i} & \text{if } x = y = B, \\ 0 & \text{if } x \neq y. \end{cases} \qquad (76)$$
From (75) and (76) it follows that $\tilde{W}_{BA} = 0$, since
$$Q_{iB}^TQ_{jA} = (0\ \ I_{M-k_i})(\bar{Q}_{i+1}\cdots\bar{Q}_{j-1}\bar{Q}_{jA}) \sim (0\ \ \times)\begin{pmatrix} \times \\ 0 \end{pmatrix} = 0, \qquad (77)$$
where the leading zero block of the row factor has $k_{j-1}$ columns and the trailing zero block of the column factor has $M-k_{j-1}$ rows, and $k_i \geq k_{j-1}$. The matrix in (60) therefore has a block upper triangular form, and thus the RQD (61) is not needed; i.e., $W = \tilde{W}$ and $P = Q$.

Let $y_i$ denote the ith column of Y and note that
$$\begin{pmatrix} y_{iA} \\ y_{iB} \end{pmatrix} = Q_i^Ty_i = \bar{Q}_i^T\bar{Q}_{i-1}^T\cdots\bar{Q}_1^Ty_i. \qquad (78)$$
The vectors $y_{iA}$ and $y_{iB}$ can be computed from the recursion
$$\left(\begin{pmatrix} y_{jA} \\ y_{jB} \end{pmatrix}\ y_{j+1}^{(j)}\ \cdots\ y_n^{(j)}\right) = \bar{Q}_j^T\big(y_j^{(j-1)}\ y_{j+1}^{(j-1)}\ \cdots\ y_n^{(j-1)}\big), \qquad (79)$$
or, in a more compact form,
$$\left(\begin{pmatrix} y_{jA} \\ y_{jB} \end{pmatrix}\ \ Y^{(j)}\right) = \bar{Q}_j^TY^{(j-1)}, \qquad (80)$$
where $Y^{(0)} = (y_1^{(0)}\ y_2^{(0)}\ \cdots\ y_n^{(0)}) = Y$. Notice that the multiplication by $\bar{Q}_j^T$ in (80) affects only the first $k_{j-1}$ rows of $Y^{(j-1)}$.
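The structural claim (77) – and hence the redundancy of the RQD (61) for proper subset regressors – can be verified directly. The check below (ours; arbitrary small sizes) assembles $Q = (\oplus_iQ_{iA}\ \ \oplus_iQ_{iB})$ and confirms that the block $\tilde{W}_{BA}$ of $Q^T(C\otimes I_M)Q$ vanishes:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(4)
M, ks = 12, [5, 3, 2]                      # k_1 >= k_2 >= k_3
X = rng.standard_normal((M, ks[0]))
QA, QB = [], []
for k in ks:                               # X_i = leading k_i columns of X
    Q, _ = np.linalg.qr(X[:, :k], mode='complete')
    QA.append(Q[:, :k]); QB.append(Q[:, k:])

C = np.triu(rng.standard_normal((len(ks), len(ks))))   # Sigma = C C^T
Qbig = np.hstack([block_diag(*QA), block_diag(*QB)])
W = Qbig.T @ np.kron(C, np.eye(M)) @ Qbig
K = sum(ks)
assert np.allclose(W[K:, :K], 0.0)         # W_BA = 0, cf. (77)
```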
Consider the case of $n = 4$. The triangular matrix in (59) is
$$\left(\begin{array}{cccc|cccc}
R_1 & 0 & 0 & 0 & 0 & c_{12}(I\,0)\bar{Q}_{2B} & c_{13}(I\,0)\bar{Q}_2\bar{Q}_{3B} & c_{14}(I\,0)\bar{Q}_2\bar{Q}_3\bar{Q}_{4B} \\
0 & R_2 & 0 & 0 & 0 & 0 & c_{23}(I\,0)\bar{Q}_{3B} & c_{24}(I\,0)\bar{Q}_3\bar{Q}_{4B} \\
0 & 0 & R_3 & 0 & 0 & 0 & 0 & c_{34}(I\,0)\bar{Q}_{4B} \\
0 & 0 & 0 & R_4 & 0 & 0 & 0 & 0 \\
\hline
0 & 0 & 0 & 0 & c_{11}I & c_{12}(0\,I)\bar{Q}_{2B} & c_{13}(0\,I)\bar{Q}_2\bar{Q}_{3B} & c_{14}(0\,I)\bar{Q}_2\bar{Q}_3\bar{Q}_{4B} \\
0 & 0 & 0 & 0 & 0 & c_{22}I & c_{23}(0\,I)\bar{Q}_{3B} & c_{24}(0\,I)\bar{Q}_3\bar{Q}_{4B} \\
0 & 0 & 0 & 0 & 0 & 0 & c_{33}I & c_{34}(0\,I)\bar{Q}_{4B} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & c_{44}I
\end{array}\right) \qquad (81)$$
and rearranging the terms of (59) gives
$$\begin{pmatrix} y_{1A} \\ y_{1B} \\ y_{2A} \\ y_{2B} \\ y_{3A} \\ y_{3B} \\ y_{4A} \\ y_{4B} \end{pmatrix} = \begin{pmatrix}
R_1 & 0 & 0 & c_{12}\bar{Q}_{2B} & 0 & c_{13}\bar{Q}_2\bar{Q}_{3B} & 0 & c_{14}\bar{Q}_2\bar{Q}_3\bar{Q}_{4B} \\
0 & c_{11}I & 0 & & 0 & & 0 & \\
0 & 0 & R_2 & 0 & 0 & c_{23}\bar{Q}_{3B} & 0 & c_{24}\bar{Q}_3\bar{Q}_{4B} \\
0 & 0 & 0 & c_{22}I & 0 & & 0 & \\
0 & 0 & 0 & 0 & R_3 & 0 & 0 & c_{34}\bar{Q}_{4B} \\
0 & 0 & 0 & 0 & 0 & c_{33}I & 0 & \\
0 & 0 & 0 & 0 & 0 & 0 & R_4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & c_{44}I
\end{pmatrix}\begin{pmatrix} \beta_1 \\ v_{1B} \\ \beta_2 \\ v_{2B} \\ \beta_3 \\ v_{3B} \\ \beta_4 \\ v_{4B} \end{pmatrix}, \qquad (82)$$
where each matrix $c_{ij}\bar{Q}_{i+1}\cdots\bar{Q}_{j-1}\bar{Q}_{jB}$ spans the pair of rows $(y_{iA},\ y_{iB})$. Conversely, for $i = 1,\dots,n$,
$$\begin{pmatrix} y_{iA} \\ y_{iB} \end{pmatrix} = \begin{pmatrix} R_i & 0 \\ 0 & c_{ii}I \end{pmatrix}\begin{pmatrix} \beta_i \\ v_{iB} \end{pmatrix} + \sum_{j=i+1}^{n} c_{ij}\big(\bar{Q}_{i+1}\bar{Q}_{i+2}\cdots\bar{Q}_{j-1}\big)\bar{Q}_{jB}\,v_{jB}. \qquad (83)$$
This system is solved by a back-substitution, without forming the matrices $W_{AB}$ and $W_{BB}$, as shown by Algorithm 3. In order to optimize the memory access and exploit cache effects, the accesses to the matrices $\bar{Q}_i$ in Steps 12 and 14 of Algorithm 3 should be reorganized (Tanenbaum, 1999). Updating the vectors $y_j$ is done in one step for each i, and the vectors $\tilde{v}_i$ are computed recursively by the formulae
$$\tilde{v}_i^i = \begin{pmatrix} 0 \\ v_{iB} \end{pmatrix} \qquad\text{and}\qquad \tilde{v}_i^{j-1} = \bar{Q}_j\tilde{v}_i^j. \qquad (84)$$
6. Conclusion

Algorithms for solving VAR models have been proposed and analyzed. The VAR models with zero coefficient constraints or Granger-caused variables have been considered as SURE models with common or proper subset regressors, respectively. The numerically stable algorithms efficiently exploit the Toeplitz, Kronecker, and other structures of the matrices in these models. Furthermore, the proposed algorithms can handle ill-conditioned problems.
ALGORITHM 3. Recursive Solution of the SURE Model with Proper Subset Regressors
1: Compute the QRD (4).
2: Set $R_1 = R_T$ and $\tilde{Y} = R_{TY}$.
3: for $i = 2, \dots, n$ do
4:   Compute the QRD
$$R_{i-1}\hat{S}_i = \hat{Q}_i\begin{pmatrix} R_i \\ 0 \end{pmatrix}.$$
5:   Compute $\tilde{Y}_{1:k_{i-1},\,i:n} = \hat{Q}_i^T\tilde{Y}_{1:k_{i-1},\,i:n}$.
6: end for
7: for $i = n, n-1, \dots, 1$ do
8:   Solve $R_i\beta_i = \tilde{Y}_{1:k_i,\,i}$.
9:   Compute $v_{iB} = \tilde{Y}_{k_i+1:,\,i}/c_{ii}$.
10:   for $j = i-1, i-2, \dots, 1$ do
11:     if $j = i-1$ then
12:       Compute $\tilde{v}_i = \bar{Q}_{iB}v_{iB}$.
13:     else
14:       Compute $\tilde{v}_i = \bar{Q}_{j+1}\tilde{v}_i$.
15:     end if
16:     Compute $\tilde{y}_j = \tilde{y}_j - c_{ji}\tilde{v}_i$.
17:   end for
18: end for

The implementation of the algorithms needs to be investigated. Block versions of the algorithms which are suitable for conventional high performance computers need to be designed (Dongarra et al., 1998). The adaptation of the numerical methods to tackle other models that have matrix structures similar to those considered here also needs to be considered. Currently, the Vector Error Correction Model (VECM) and the Johansen procedure for estimating cointegrated systems are being investigated. The VECM has a structure similar to that of (8), while the Johansen procedure requires the OLS estimation of a linear system having a block Toeplitz structure (Johansen, 1988, 1995; Lütkepohl, 1993).
Appendix A: Displacement Structures Derived from LS Problems Involving Block Toeplitz Matrices

Consider the block Toeplitz matrix $T \in \mathbb{R}^{Mm\times Nn}$ defined by
$$T = [T_{i-j}]_{i=1,\dots,M}^{j=1,\dots,N}, \qquad (A.1)$$
that is,
$$T = \begin{pmatrix} T_0 & T_{-1} & \cdots & T_{1-N} \\ T_1 & T_0 & \cdots & T_{2-N} \\ \vdots & \vdots & & \vdots \\ T_{M-1} & T_{M-2} & \cdots & T_{M-N} \end{pmatrix}, \qquad (A.2)$$
where $T_k \in \mathbb{R}^{m\times n}$. Define $A = T^TT = [A_{ij}]_{ij} \in \mathbb{R}^{Nn\times Nn}$, having blocks $A_{ij} \in \mathbb{R}^{n\times n}$ given by
$$A_{ij} = \sum_{k=1}^{M} T_{k-i}^TT_{k-j} = Y_{i-1}^TY_{j-1}, \qquad (A.3)$$
where $Y_i^T = (T_{-i}^T\ T_{1-i}^T\ \cdots\ T_{M-i-1}^T)$ is the (i+1)th block column of T. The displacement of A w.r.t. $Z_n = Z \otimes I_n$ is given by $\nabla_{Z_n}A = M_1 + M_2$, where
$$M_1 = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{12}^T & & & \\ \vdots & & 0 & \\ A_{1N}^T & & & \end{pmatrix} \qquad (A.4)$$
and
$$M_2 = \begin{pmatrix} 0 & 0 \\ 0 & A_{ij} - A_{i-1,j-1} \end{pmatrix}. \qquad (A.5)$$
From (A.3) it follows that
$$A_{ij} - A_{i-1,j-1} = \sum_{k=1}^{M} T_{k-i}^TT_{k-j} - \sum_{k=1}^{M} T_{k-i+1}^TT_{k-j+1} = T_{1-i}^TT_{1-j} - T_{M+1-i}^TT_{M+1-j}, \qquad (A.6)$$
for $i, j = 2, 3, \dots, N$, and
$$M_2 = G_2^TG_2 - G_4^TG_4, \qquad (A.7)$$
where $G_2 = (0\ T_{-1}\ \cdots\ T_{1-N})$ and $G_4 = (0\ T_{M-1}\ \cdots\ T_{M+1-N})$.
If $Y_0$ has full column rank and its QRD is $Q_0R_0$, then
$$M_1 = G_1^TG_1 - G_3^TG_3, \qquad (A.8)$$
where $G_1 = (R_0\ Q_0^TY_1\ \cdots\ Q_0^TY_{N-1})$ and $G_3 = (0\ Q_0^TY_1\ \cdots\ Q_0^TY_{N-1})$. From (A.7) and (A.8) it follows that the displacement of A is given by
$$\nabla_{Z_n}A = (G_1^T\ G_2^T\ G_3^T\ G_4^T)\begin{pmatrix} I_n & 0 & 0 & 0 \\ 0 & I_m & 0 & 0 \\ 0 & 0 & -I_n & 0 \\ 0 & 0 & 0 & -I_m \end{pmatrix}\begin{pmatrix} G_1 \\ G_2 \\ G_3 \\ G_4 \end{pmatrix}. \qquad (A.9)$$
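The generator identity summarized by (A.9) can also be confirmed numerically; the sketch below (ours) builds $G_1,\dots,G_4$ exactly as above and checks that their J-weighted cross products reproduce the displacement of A:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, M, N = 2, 3, 10, 5
Tb = {k: rng.standard_normal((m, n)) for k in range(1 - N, M)}
T = np.block([[Tb[i - j] for j in range(N)] for i in range(M)])

Yc = lambda i: np.vstack([Tb[k - i] for k in range(M)])  # (i+1)th block column
Q0, R0 = np.linalg.qr(Yc(0))
G1 = np.hstack([R0] + [Q0.T @ Yc(j) for j in range(1, N)])
G3 = np.hstack([np.zeros((n, n))] + [Q0.T @ Yc(j) for j in range(1, N)])
G2 = np.hstack([np.zeros((m, n))] + [Tb[-j] for j in range(1, N)])
G4 = np.hstack([np.zeros((m, n))] + [Tb[M - j] for j in range(1, N)])

A = T.T @ T
F = np.kron(np.eye(N, k=-1), np.eye(n))
lhs = G1.T @ G1 + G2.T @ G2 - G3.T @ G3 - G4.T @ G4      # (A.9)
assert np.allclose(lhs, A - F @ A @ F.T)
```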
References

Anderson, E., Bai, Z. and Dongarra, J.J. (1992). Generalized QR factorization and its applications. Linear Algebra and its Applications, 162, 243–271.
Andrews, H.C. and Kane, J. (1970). Kronecker matrices, computer implementation, and generalized spectra. Journal of the ACM, 17(2), 260–268.
Dongarra, J.J., Duff, I.S., Sorensen, D.C. and van der Vorst, H.A. (1998). Numerical Linear Algebra for High-Performance Computers. SIAM, Philadelphia, PA.
Gallivan, K., Thirumalai, S., Van Dooren, P. and Vermaut, V. (1996). High performance algorithms for Toeplitz and block Toeplitz matrices. Linear Algebra and its Applications, 241–243, 343–388.
Golub, G.H. and Van Loan, C.F. (1996). Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore, Maryland.
Graham, A. (1986). Kronecker Products and Matrix Calculus: With Applications. Ellis Horwood Series in Mathematics and its Applications. Ellis Horwood, Chichester; Halsted Press, New York.
Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press.
Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control, 12, 231–254.
Johansen, S. (1995). Likelihood Based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press, Oxford.
Kailath, T., Kung, S.-Y. and Morf, M. (1979). Displacement ranks of a matrix. Bulletin of the American Mathematical Society, 1, 769–773.
Kailath, T. and Sayed, A.H. (1995). Displacement structure: theory and applications. SIAM Review, 37(3), 297–386.
Kontoghiorghes, E.J. (1999). Parallel strategies for computing the orthogonal factorizations used in the estimation of econometric models. Algorithmica, 25, 58–74.
Kontoghiorghes, E.J. (2000). Parallel Algorithms for Linear Models: Numerical Methods and Estimation Problems, Vol. 15 of Advances in Computational Economics. Kluwer Academic Publishers.
Kontoghiorghes, E.J. and Clarke, M.R.B. (1995). An alternative approach for the numerical solution of seemingly unrelated regression equations models. Computational Statistics and Data Analysis, 19(4), 369–377.
Kontoghiorghes, E.J. and Dinenis, E. (1996). Solving triangular seemingly unrelated regression equations models on massively parallel systems. In M. Gilli (ed.), Computational Economic Systems: Models, Methods and Econometrics, Vol. 5 of Advances in Computational Economics, pp. 191–201.
Kontoghiorghes, E.J., Foschi, P. and Garin, L. (2000). Computationally efficient algorithms for solving SURE models. In Lecture Notes in Computer Science, in press.
Kourouklis, S. and Paige, C.C. (1981). A constrained least squares approach to the general Gauss–Markov linear model. Journal of the American Statistical Association, 76(375), 620–625.
Lütkepohl, H. (1993). Introduction to Multiple Time Series Analysis, 2nd edn. Springer-Verlag.
Paige, C.C. (1978). Numerically stable computations for general univariate linear models. Communications in Statistics – Simulation and Computation, 7(5), 437–453.
Paige, C.C. (1979). Fast numerically stable computations for generalized linear least squares problems. SIAM Journal on Numerical Analysis, 16(1), 165–171.
Paige, C.C. (1990). Some aspects of generalized QR factorizations. In M.G. Cox and S.J. Hammarling (eds.), Reliable Numerical Computation, pp. 71–91.
Regalia, P.A. and Mitra, S.K. (1989). Kronecker products, unitary matrices and signal processing applications. SIAM Review, 31(4), 586–613.
Srivastava, V.K. and Giles, D.E.A. (1987). Seemingly Unrelated Regression Equations Models: Estimation and Inference (Statistics: Textbooks and Monographs), Vol. 80. Marcel Dekker.
Stewart, M. and Van Dooren, P. (1997). Stability issues in the factorization of structured matrices. SIAM Journal on Matrix Analysis and Applications, 18(1), 104–118.
Tanenbaum, A.S. (1999). Structured Computer Organization, 4th edn. Prentice Hall.
Zellner, A. (1962). An efficient method of estimating seemingly unrelated regression equations and tests for aggregation bias. Journal of the American Statistical Association, 57, 348–368.