March, 1996
Causality in Nonlinear Models
Anders Warne
Institute for International Economic Studies, Stockholm University, 10691 Stockholm, Sweden
email: [email protected]
Abstract: The concepts of weak, strong and strict Granger causality are introduced for nonlinear time series models. 1-step ahead predictions are formed using the conditional expectation. The weak form is related to Granger's original definition for linear predictors in that it is based on the forecast error variance, whereas the strong form concerns the conditional variance, and the strict form the conditional distribution. Necessary and sufficient conditions for noncausality are derived for a regime switching VAR with residuals that are Gaussian conditional on the regime. In such models, the strong and strict forms are equivalent and imply linear restrictions, while the weak form produces nonlinear constraints. As an illustration, the hypotheses are tested using monthly U.S. data on money and income.

Keywords: Granger causality, Markov process, nonlinear time series, regime switching, vector autoregression. JEL Classification Numbers: C32.

Parts of this paper were written while I was visiting the Institute of Economics at the University of Copenhagen. I have benefitted greatly from discussions with Henrik Hansen and received valuable comments from seminar participants at the University of Copenhagen, the University of Aarhus, IIES, and the Institute of Statistics and Econometrics, Humboldt University, Berlin. Financial support from Bankforskningsinstitutet is gratefully acknowledged. This paper was printed using funds made available by the Deutsche Forschungsgemeinschaft.
1 Introduction

The most widely used concept of causality in econometrics is due to Granger (1969). Founded on optimal unbiased linear least squares predictors for some given information set X, Granger's definition of causality states that a variable m is causal for a variable y if the variance of the 1-step ahead prediction error for y is smaller when the history of m is included in X than when it is excluded. Furthermore, m is not causal for y if these prediction error variances are equal. Granger causality has primarily been studied in linear VAR models under the assumption that the innovation is conditionally homoskedastic (such as the iid case). For such models, the necessary and sufficient condition for m not to be causal for y (relative to the variables included in the VAR) is that all coefficients on lags of m are zero in the equation describing y. For stationary VARs the Wald, LM, and LR statistics have their usual limiting distributions (under conventional assumptions about the innovation; see e.g. Lütkepohl (1991)), while for VARs with some unit roots the distribution depends on certain parameters; see Sims, Stock and Watson (1990) and Toda and Phillips (1993). In nonlinear models we face a more fundamental problem: how to measure causal effects. Granger (1969) takes the view that a cause must precede an effect or, more precisely, that the future cannot cause the past. Hence, if m causes y it should also help predict y. Given the reliance on rational expectations in much of economic theory, the conditional expectation seems to be the most appropriate candidate as the predictor of interest. The existence of causal effects from m to y may then be investigated by comparing certain moments of the prediction errors for the expectation of y given X and of y given X less the history of m. A simplistic approach is to take the prediction error variance as a measure for whether m causes y or not.
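In the linear VAR case, the zero-coefficient characterization of noncausality can be examined with a standard exclusion test. The sketch below is illustrative only (simulated data, hand-picked lag order and parameter values, none taken from the paper): it fits the y equation of a bivariate VAR by OLS with and without lags of m, and compares the residual sums of squares through an F statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
T, p = 500, 2

# Simulate a bivariate system in which lagged m enters the y equation,
# so m Granger causes y by construction (illustrative parameter values).
y = np.zeros(T)
m = np.zeros(T)
for t in range(p, T):
    y[t] = 0.3 * y[t - 1] + 0.5 * m[t - 1] + rng.standard_normal()
    m[t] = 0.4 * m[t - 1] + rng.standard_normal()

def lagmat(series, p, t0, T):
    # columns: series[t-1], ..., series[t-p] for t = t0, ..., T-1
    return np.column_stack([series[t0 - k:T - k] for k in range(1, p + 1)])

Y = y[p:]
X_u = np.column_stack([np.ones(T - p), lagmat(y, p, p, T), lagmat(m, p, p, T)])
X_r = np.column_stack([np.ones(T - p), lagmat(y, p, p, T)])  # lags of m excluded

def rss(X, Y):
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    e = Y - X @ beta
    return e @ e

rss_u, rss_r = rss(X_u, Y), rss(X_r, Y)
q = p                          # number of exclusion restrictions
dof = (T - p) - X_u.shape[1]   # residual degrees of freedom
F = ((rss_r - rss_u) / q) / (rss_u / dof)
print(round(F, 1))
```

With causality built into the data generating process, the statistic is far above conventional F critical values; under noncausality it would hover around one.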
However, if the conditional variance of the prediction error is a function of the observed data and these are nonstationary, then the forecast error variance need not exist. More importantly, conditional variances have attracted considerable interest from, for instance, the recent literature on asset pricing in finance (see e.g. Ferson (1993)),
the literature on economic policy and business cycles (see e.g. Lucas (1987)), and from the literature on central bank independence and inflation versus price level targeting (see e.g. Svensson (1995) and references therein). While the latter bodies of literature typically examine the variances of certain aggregates under various policy rules, the observation that changes in behavior affect the variances suggests that, from a time series perspective, the conditional variances may depend on the likelihood of future changes in the policy rules. If a policy rule is not fully credible and the degree of credibility varies over time, then the conditional variances should also be time varying. In this paper the variance version is referred to as weak Granger causality and the conditional variance version as strong Granger causality. If m is noncausal for y in a strong sense, then by the law of iterated expectations it is also weakly noncausal for y. The reverse is generally not true. Strong noncausality means that the expected squared prediction error conditional on X is, for all time periods, equal to the expected squared prediction error (for the expectation of y given X less the history of m) conditional on X excluding the history of m; see Granger, Engle and Robins (1986). Weak noncausality, on the other hand, means that these conditional variances are equal on average. From that point of view, the weak form corresponds to long run neutrality for the conditional variances, and the strong form to full neutrality. For statistical reasons, the concept of strict Granger causality is also discussed. There, m is said to be strictly noncausal for y if the conditional distribution for the prediction error is invariant with respect to the history of m.
Strict noncausality is thus closely related to the definition of noncausality in Chamberlain (1982) and Florens and Mouchart (1982), where m is said to be noncausal for y if the distribution for y conditional on X is invariant with respect to m. Although it is difficult to motivate the strict form on purely economic theory arguments, it is potentially useful for two reasons. First, it allows for a statistical interpretation of the restrictions implied by, say, strong noncausality (typically these will not be unique). Second, restrictions that imply strong but not strict noncausality may depend on other properties of the statistical model, which can make inference about those restrictions very difficult.
To study the implications of the three causality concepts we consider a VAR model subject to regime switches through a latent q-state Markov chain, and derive necessary and sufficient conditions for m to be noncausal for y. While the parametric restrictions implied by strong and strict noncausality are linear, some constraints are nonlinear for weak noncausality. In particular, all coefficients on lags of m in the y equation are zero under strong noncausality, whereas only the expected values of these coefficients are required to be zero under weak noncausality. The remainder of the paper is organized as follows. In the next section we define weak, strong and strict noncausality. In section 3 we first present the regime switching VAR, and then provide parametric conditions for noncausality. Section 4 illustrates the theory using monthly U.S. data on money and income, while section 5 summarizes the main findings. Finally, proofs of the propositions are given in the appendix.
2 Granger Causality

2.1 Notation and Basic Assumptions

To be concrete, let $m_t$ and $y_t$ denote money and income, respectively, and let the histories of these variables up to and including period t be given by $M_t \equiv \{m_\tau : \tau = t, t-1, \ldots, 1-p\}$ and $Y_t \equiv \{y_\tau : \tau = t, t-1, \ldots, 1-p\}$, where p is a positive integer. Also, let $z_t$ denote a vector of other variables and $Z_t \equiv \{z_\tau : \tau = t, t-1, \ldots, 1-p\}$ its history. We decompose $z_t$ into two vectors, $z_{1,t}$ and $z_{2,t}$, and define the n dimensional vector $x_t$ such that $x_t \equiv [x_{1,t}' \; x_{2,t}']'$, with $x_{1,t} \equiv [y_t \; z_{1,t}']'$ and $x_{2,t} \equiv [m_t \; z_{2,t}']'$. Hence, the history of $x_t$ can, for instance, be written $X_t \equiv \{Z_t, Y_t, M_t\} \equiv \{X_{1,t}, X_{2,t}\}$. Suppose $x_t$ is a real valued time series of random variables and that there exists a density (probability) function $f_t(x_t|X_{t-1}; \theta)$ for each t. The parameters and the parameter space are denoted by $\theta$ and $\Theta$, where $\Theta$ is a subset of $R^s$, the s dimensional Euclidean space, and it is assumed that the density (probability) function is measurable in $x_t|X_{t-1}$ for every $\theta \in \Theta$ and continuous in $\theta$ for every $x_t|X_{t-1}$ in the sample space. The true value of $\theta$ is denoted by $\theta_0 \in \Theta$. Finally, suppose that the conditional mean $E[x_t|X_{t-1}; \theta]$ is finite, and that the conditional covariance matrix $E[(x_t - E[x_t|X_{t-1}; \theta])(x_t - E[x_t|X_{t-1}; \theta])'|X_{t-1}; \theta]$ is finite and positive definite for all finite t.
2.2 Weak, Strong and Strict Noncausality

Let $u_{t+1}$ denote the 1-step ahead prediction error for $y_{t+1}$ conditional on $X_t$ when the predictor is given by the expectations operator. That is,

$$u_{t+1} \equiv y_{t+1} - E[y_{t+1}|X_t; \theta]. \quad (1)$$

By assumption $u_{t+1}$ has conditional mean zero and positive and finite conditional variance $\sigma^2_{t+1}$. The concept of causality formulated by Granger (1969) concerns optimal (minimum MSE) unbiased 1-step ahead linear least squares predictors. Although the notion cannot directly be translated into nonlinear models, three causality concepts inspired by Granger's ideas will be presented below. A coarse version of Granger noncausality in nonlinear models is the following:

Definition 1: m is said to be noncausal for y in a weak sense iff for all t

$$E[u^2_{t+1}; \theta] = E[\tilde{u}^2_{t+1}; \theta] < \infty, \quad (2)$$

where $\tilde{u}_{t+1} \equiv y_{t+1} - E[y_{t+1}|Z_t, Y_t; \theta]$.
Analogously, we say that m is weakly Granger causal for y if the left hand side of (2) is smaller than the right hand side. The term "weak" does not appear in Granger's definition, but has been added here to emphasize that the concept can be strengthened. A logical step is to measure causal effects from the behavior of the conditional prediction error variance. It is argued in the introduction that economic behavior can influence the time profile of the conditional variance, and it is therefore possible that a variance based causality measure is unfit for detecting certain relevant causal effects. Let us therefore consider the following:
Definition 2: m is said to be noncausal for y in a strong sense iff for all t

$$E[u^2_{t+1}|X_t; \theta] = E[\tilde{u}^2_{t+1}|Z_t, Y_t; \theta] < \infty. \quad (3)$$

Alternatively, m is strongly Granger causal for y if the two random variables in (3) are different for some t; see Granger, Engle and Robins (1986). The above measures of causal effects can be viewed as representing causality in mean and in mean-variance, respectively. Let us finally define a measure based on the behavior of the density (probability) functions for $u_{t+1}$ given $X_t$, denoted by $g_{t+1}(u_{t+1}|X_t; \theta)$, and for $\tilde{u}_{t+1}$ given $Z_t, Y_t$, denoted by $h_{t+1}(\tilde{u}_{t+1}|Z_t, Y_t; \theta)$.

Definition 3: m is said to be noncausal for y in a strict sense iff for all t

$$g_{t+1}(u_{t+1}|X_t; \theta) = h_{t+1}(\tilde{u}_{t+1}|Z_t, Y_t; \theta). \quad (4)$$

In other words, m is strictly Granger causal for y if the conditional distribution for the 1-step ahead prediction error is not invariant with respect to the history of money. This definition of noncausality is similar to the definition in Chamberlain (1982) and Florens and Mouchart (1982), who define m to be noncausal for y if the marginal density for $y_{t+1}$ given $X_t$ is invariant with respect to $M_t$. While strict noncausality is a natural measure of causal effects from a statistical perspective, it is difficult to imagine an economic environment where m can be strongly noncausal for y and yet be strictly causal. However, since the three causality measures are nested (by nested it is understood that (4) implies (3) which implies (2); alternatively, let H(i) be the set of all parametric functions of $\theta$ which are consistent with Definition i, for i = 1, 2, 3; then $H(3) \subseteq H(2) \subseteq H(1)$), strict noncausality provides a means for interpreting parametric restrictions implied by, say, strong noncausality. Moreover, for some nonlinear models weak and strong noncausality can be highly difficult to test when certain sets of restrictions depend in complicated ways
on other properties of the model; see Proposition 2 below. Practical considerations can therefore force a researcher to focus on a more limited hypothesis.
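The gap between the weak and the strong form can be illustrated with a small simulation (this toy design is mine, not the paper's): take $y_{t+1} = \varepsilon_{t+1}$ with $\varepsilon_{t+1}|m_t \sim N(0, 1 + m_t^2)$ and $m_t$ iid. Then both conditional means of $y_{t+1}$ are zero, so $u_{t+1} = \tilde{u}_{t+1}$ and the unconditional prediction error variances coincide (weak noncausality holds), yet the conditional variance $E[u^2_{t+1}|X_t] = 1 + m_t^2$ moves with the history of money (strong noncausality fails).

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000

# Toy model (illustrative, not from the paper): m_t iid N(0,1) and
# y_{t+1} has conditional variance 1 + m_t^2 but conditional mean zero.
m = rng.standard_normal(T)
y_next = rng.standard_normal(T) * np.sqrt(1.0 + m**2)

# Both conditional means are zero, so u = u-tilde = y_next.
u = y_next

# Weak form: the unconditional MSE is E[1 + m^2] = 2 with or without
# the history of money in the information set.
print(round(u.var(), 2))   # close to 2

# Strong form fails: the squared prediction error is predictable from m_t.
high = (u[np.abs(m) > 1] ** 2).mean()    # conditional variance, |m_t| large
low = (u[np.abs(m) <= 1] ** 2).mean()    # conditional variance, |m_t| small
print(high > low)
```

The first number estimates the common forecast error variance; the inequality shows that knowing $m_t$ still sharpens the conditional variance, which is exactly what the strong form is designed to detect.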
3 A Regime Switching VAR Model

In this section we shall study the three causality notions within a particular model which has recently attracted some interest in time series analysis. We first present the model (which is nested within the class of autoregressive models studied in Hamilton (1990)) and then turn to the question of under which conditions m is not Granger causal for y according to the weak, the strong and the strict form, respectively. In section 4 we apply the theoretical results to monthly U.S. postwar data on money and income.
3.1 The Statistical Model

To establish notation, let $x_t$ denote an n dimensional discrete time series generated by the VAR model:

$$x_t = \mu_{s_t} D_t + \sum_{k=1}^{p} A^{(k)}_{s_t} x_{t-k} + \varepsilon_t, \qquad t = 1, 2, \ldots, T, \quad (5)$$

where $\varepsilon_t|s_t \sim N(0, \Omega_{s_t})$ and $\Omega_{s_t}$ is positive definite. The vector $D_t$ is d dimensional and deterministic, e.g. a constant and centered seasonal dummies. The initial values $x_0, \ldots, x_{1-p}$ are taken as fixed. The random state or regime variable $s_t$ is unobserved, conditional on $s_{t-1}$ independent of past x, and assumed to follow a q-state Markov process. In other words, $\Pr[s_t = j|s_{t-1} = i, X_{t-1}] = \Pr[s_t = j|s_{t-1} = i] = p_{ij}$, for all t and $i, j = 1, 2, \ldots, q$. The Markov transition probabilities satisfy $\sum_{j=1}^{q} p_{ij} = 1$ for all i. It is assumed that the Markov process is irreducible (no absorbing states) and we collect the parameters in

$$P = \begin{bmatrix} p_{11} & \cdots & p_{1q} \\ \vdots & & \vdots \\ p_{q1} & \cdots & p_{qq} \end{bmatrix}. \quad (6)$$

By construction one eigenvalue of P is always equal to unity, and to ensure that $s_t$ is ergodic the remaining eigenvalues are assumed to lie inside the unit circle. The ergodic probabilities, $\Pr[s_t = j] = \pi_j$, are collected into $\pi$, where $P'\pi = \pi$. The random matrices $\mu_{s_t}$, $A^{(k)}_{s_t}$ and $\Omega_{s_t}$ depend only on the regime variable $s_t$. Specifically, if $s_t = j$, then $\mu_{s_t} = \mu_j$ and $A^{(k)}_{s_t} = A^{(k)}_j$, while $\Omega_{s_t} = \Omega_j$. Since $s_t$ is ergodic it follows that $\mu_{s_t}$, $A^{(k)}_{s_t}$ and $\Omega_{s_t}$ are ergodic as well. Karlsen (1990) establishes a sufficient condition for $x_t$ to be weakly stationary. His condition applies directly here when $D_t$ does not include any deterministically trending variables. It is noteworthy that his condition allows for unit and explosive roots within states as long as some of the $A^{(k)}_{s_t}$ matrices vary across states. Furthermore, the condition is also valid when $D_t$ includes trending variables and the random matrices $\mu_{s_t}$ and $A^{(k)}_{s_t}$ satisfy certain (nonlinear) restrictions. The interested reader is also referred to Holst et al. (1994) and Warne (1996).

Below we shall consider Markov chains that can be split into two independent processes (where one can be a single regime process). This allows coefficients in two subsystems of x to vary with the regime and, at the same time, be independent. Let $s_{1,t}$ and $s_{2,t}$ be a $q_1$ and a $q_2$ state Markov process, respectively, with $q = q_1 q_2$ and $s_{1,t}$ and $s_{2,t}$ independent, i.e. $p_{ij} = p^{(1)}_{i_1 j_1} p^{(2)}_{i_2 j_2}$, where $\Pr[s_{l,t} = j_l|s_{l,t-1} = i_l] = p^{(l)}_{i_l j_l}$ and $\sum_{j_l=1}^{q_l} p^{(l)}_{i_l j_l} = 1$ for $i_l = 1, \ldots, q_l$. Collecting the parameters into $P^{(1)}$ and $P^{(2)}$ and defining $s_t \equiv s_{1,t} + q_1(s_{2,t} - 1)$ for the pair $(s_{1,t}, s_{2,t})$, we have that $P = (P^{(1)} \otimes P^{(2)})$. While the restrictions implied by independence appear to be nonlinear, they are in fact linear. The reason is that the elements of each row of $P^{(l)}$ sum to unity. Moreover, the Markov process $s_{l,t}$ is serially uncorrelated if, for all $i_l, j_l = 1, \ldots, q_l$, $p^{(l)}_{i_l j_l} = \pi^{(l)}_{j_l}$.
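A minimal simulation of (5) may help fix ideas. The sketch below uses my own illustrative parameter values (a bivariate VAR(1) with a constant and q = 2 states, not estimates from the paper): the regime is drawn from the transition matrix P, and the observation from that regime's Gaussian VAR.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n, q = 400, 2, 2

# Illustrative regime-dependent intercepts mu_j, coefficient matrices A_j
# and residual covariances Omega_j (hypothetical values).
mu = [np.array([0.0, 0.3]), np.array([0.4, 0.3])]
A = [np.array([[0.5, 0.2], [0.0, 0.4]]), np.array([[0.2, 0.0], [0.0, 0.4]])]
chol = [np.linalg.cholesky(np.array([[1.2, -0.1], [-0.1, 0.3]])),
        np.linalg.cholesky(np.array([[0.5, 0.05], [0.05, 0.2]]))]
P = np.array([[0.90, 0.10],    # Pr[s_t = j | s_{t-1} = i]; rows sum to one
              [0.05, 0.95]])

s = np.zeros(T, dtype=int)
x = np.zeros((T, n))
for t in range(1, T):
    s[t] = rng.choice(q, p=P[s[t - 1]])        # Markov regime draw
    eps = chol[s[t]] @ rng.standard_normal(n)  # eps_t | s_t ~ N(0, Omega_{s_t})
    x[t] = mu[s[t]] + A[s[t]] @ x[t - 1] + eps

print(x.shape, s.min(), s.max())
```

Within each regime the dynamics are linear Gaussian; all the nonlinearity of the model comes from the latent switching process.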
3.2 Noncausality Conditions

Consider the following partition of (5):

$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} = \begin{bmatrix} \mu_{1,s_t} \\ \mu_{2,s_t} \end{bmatrix} D_t + \sum_{k=1}^{p} \begin{bmatrix} A^{(k)}_{11,s_t} & A^{(k)}_{12,s_t} \\ A^{(k)}_{21,s_t} & A^{(k)}_{22,s_t} \end{bmatrix} \begin{bmatrix} x_{1,t-k} \\ x_{2,t-k} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}. \quad (7)$$

Furthermore, let us partition $\Omega_{s_t}$ conformably with $\varepsilon_{i,t}$, i.e. $\Omega_{ij,s_t} \equiv E[\varepsilon_{i,t}\varepsilon_{j,t}'|s_t]$. We also have use for a finer partition of (5). For that partition, it suffices to give the income equation:

$$y_t = \mu_{1,s_t} D_t + \sum_{k=1}^{p} \left[ a^{(k)}_{11,s_t} y_{t-k} + a^{(k)}_{12,s_t} z_{1,t-k} + a^{(k)}_{13,s_t} m_{t-k} + a^{(k)}_{14,s_t} z_{2,t-k} \right] + \varepsilon_{1,t}, \quad (8)$$

while $\omega_{11,s_t}$ is the variance of $\varepsilon_{1,t}$ conditional on $s_t$. We shall use the first partition when we discuss regime predictions, while the second partition is used in conjunction with Granger causality.

For expositional reasons, let us first assume that all regimes are known. The prediction of next period's income conditional on $s_{t+1}$ and $X_t$ is then

$$E[y_{t+1}|s_{t+1}, X_t] = y_{t+1} - \varepsilon_{1,t+1}. \quad (9)$$

Accordingly, the prediction error is given by $\varepsilon_{1,t+1}$ and the conditional prediction error variance by $\omega_{11,s_{t+1}}$. The necessary and sufficient condition for money not to Granger cause income in a weak, strong and strict sense is that $a^{(k)}_{13,s_{t+1}} = 0$ for all k and t.

Let us now drop the assumption that the regimes are known. While the regime variable $s_t$ is independent of past x conditional on $s_{t-1}$, it can be predicted using only past x. Let $\Pr[s_{t+1}|X_t]$ denote the probability of a particular state occurring at t+1 conditional on the information available at t. The prediction of next period's income is then given by

$$E[y_{t+1}|X_t] = \sum_{s_{t+1}} E[y_{t+1}|s_{t+1}, X_t] \Pr[s_{t+1}|X_t]. \quad (10)$$

The role for money is different in (10) relative to (9) in that the history of money can now predict income by containing information which helps predict next period's state. Since $s_{t+1}$ is independent of $X_t$ conditional on $s_t$ it follows that

$$\Pr[s_{t+1}|X_t] = \sum_{s_t} \Pr[s_{t+1}|s_t] \Pr[s_t|X_t]. \quad (11)$$
+1
1 +1
2 +1
2 +1
1 +1
Proposition 1 There is no information in X ;t for predicting s ;t 2
1 +1
Pr[s ;t jX ;t; ], for all t i either 1 +1
, i.e. Pr[s1;t+1jXt ; ] =
1
k =k ,
(A1): (I) P = (P P ), i;s = i;s , ij;s ii;s = ii;s ,
ij;s for i; j = 1; 2, k = 1; : : :; p, and (II) k ;s1 = 0 for k = 1; : : :; p; or (1)
( )
(2)
t
i;t
( )
t
( ) 12
i;t
t
i;t
;s
12
t
=0
;t
(A2): P = ({q1 0 P ), (1)
(2)
is satis ed.
Notice rst that all restrictions in (A1) and (A2) are linear. Furthermore, if we change the restrictions in (A1.II) to k ;s2 = 0, then there is no information in X ;t for predicting s ;t. Moreover, in the appendix it is shown that ( ) 21
1
;t
2
Corollary 1 I (A1.I) is satis ed, then for all the predictions of s ;t and s ;t given X 1
2
are independent, i.e. Pr[stjX ; ] = Pr[s1;tjX ; 1] Pr[s2;tjX ; 2] for t; = 1; : : :T .
Hence, for the predictions of s ;t and s ;t to be independent, it is not sucient that the Markov processes are independent. In fact, the joint distribution for xt conditional on st (and Xt? ) being equal to the product between the marginal distributions for xl;t conditional on sl;t (and Xt? ) for l = 1; 2 must also be satis ed. Under these additional restrictions forecasting, ltering and smoothing inference about the two regime variables can be conducted independently. Additionally, 1
2
1
1
10
Corollary 2 I (A1) and k ;s2 = 0 for k = 1; : : : ; p are satis ed, then for t; = 1; : : : ; T ( ) 21
;t
it holds that Pr[stjX ; ] = Pr[s1;tjX1; ; 1] Pr[s2;tjX2; ; 2].
The intuition behind condition (A1) is, in fact, straightforward. Suppose p = Dt = 1, n = q = q = 2, while ;t is iid. The restrictions on s in (A1) are sucient for the money equation residual to be iid. Now consider the experiment of drawing two mt's, one for each regime, when yt? and mt? are xed. The dierence between these two draws is: 1
2
1
mtjs
t =2
t
1
? mtjs = ( ; ? ; ) + ( ; ? ; ) yt? + ( ; ? ; ) mt? : 22
t =1
21
21 2
21 1
1
22 2
22 1
1
(12)
The right hand side of (12) is zero for all (yt? ; mt? ) when the coecients in the money equation are constant across states. Accordingly, if these restrictions are satis ed, then Pr[stjYt; Mt] = Pr[stjYt; Mt? ] and all information about st is found in the income equation. If the coecient on money in that equation is zero for both states, then lags of money play no role for predicting regime switches. Before we present the next result, some additional notation is needed. Speci cally, let ;t E [ ;s +1 jXt; ], while a kl;t is de ned analogously for l = 1; : : : ; 4 and all k. The prediction error is then given by ut = vt + " ;t , where 1
1
1
1
1
( ) 1
t
+1
vt
+1
+1
1 +1
(k) (k ) ? a a 11;t yt+1?k + 11;s +1 =1 Pp (k) Pp (k) (k ) (k ) z + ? a a ? a a 1;t+1?k 12;t 13;t mt+1?k + k=1 12;s +1 k=1 13;s +1 Pp (k) (k ) a ? a 14;t z2;t+1?k ; k=1 14;s +1
;s +1 ? ;t Dt + Ppk 1
t
1
+1
t
t
t
(13)
t
is (conditional on Xt) uncorrelated with " ;t . A sucient, but not necessary, condition for vt to be mean zero stationary is that income is covariance stationary. Another possibility is that xt is cointegrated with cointegration vectors that may depend on the regime. 1 +1
+1
Proposition 2 m is noncausal for y in a weak sense i vt
+1
either
(B1): (A1); or 11
is mean zero stationary and
(B2): (A2), P = {q2 0, and Pqj a k ;j j = 0 for k = 1; : : : ; p; or (2)
(2)
=1
( ) 13
(B3): (A2), ;j = ;j1 , a kl;j = a kl;j1 for l = 1; : : : ; 4 and j = 1; : : : ; q, and Pq1 k j1 a ;j1 j1 = 0 for k = 1; : : : ; p; or 1
=1
( ) 13
( ) 1
1
( ) 1
(1)
(B4): ;j = , a kl;j = a kl for l = 1; 2; 4 and j = 1; : : : ; q, and Pqj a k ;j pij = 0 for i = 1; : : : ; q and k = 1; : : : ; p; or 1
1
( ) 1
( ) 1
=1
( ) 13
(B5): Pqj ;j pij = 0, and Pqj a kl;j pij = 0 for l = 1; : : : ; 4, i = 1; : : : ; q, and k = 1; : : :; p, =1
1
( ) 1
=1
is satis ed.
Some of the restrictions in (B2) to (B5) are seemingly nonlinear. However, as long as P has full rank, condition (B3) (with $q_1 = 1$) states that $a^{(k)}_{13,j} = 0$ for all j and k. In addition, (B3) and (B4) are equivalent under $q_1 = 1$, while (B5) becomes a special case of these conditions. One example of a reduced rank restriction on P is (A2) with $q = q_1 > 1$ ($q_2 = 1$). Now, (B2) and (B3) are equivalent, while (B4) and (B5) are special cases. Another example is when $P = (P^{(1)} \otimes \iota_{q_2}\pi^{(2)\prime})$ with $q_1, q_2 > 1$. Failure to take this possibility into account when testing e.g. the restrictions in (B4) can lead to invalid inference about weak noncausality. In practice, the partial derivatives of the nonlinear restrictions form a matrix with rank equal to the number of restrictions as long as $p_{ij} > 0$ for all i, j. In that sense, the nonlinear restrictions in (B2) to (B5) do not lead to a singularity problem when forming test statistics or estimates using the partials. However, the number of restrictions depends directly on the rank of P, thus making weak noncausality a problematic hypothesis to test whenever q > 2. Still, as long as $a^{(k)}_{13,s_t}$ varies with the regime, the conditions in Proposition 2 are not sufficient for money not to Granger cause income in a strong sense.
Proposition 3: m is noncausal for y in a strong sense iff $v_{t+1}$ is mean zero stationary and either

(C1): (A1); or

(C2): (B2), and $a^{(k)}_{13,j} = 0$ for $j = 1, \ldots, q$ and $k = 1, \ldots, p$; or

(C3): (B3), $a^{(k)}_{13,j} = 0$ for $k = 1, \ldots, p$, and $\omega_{11,j} = \omega_{11,j_1}$ for $j = 1, \ldots, q$; or

(C4): (B4), $a^{(k)}_{13,j} = 0$ for $k = 1, \ldots, p$, and $\omega_{11,j} = \omega_{11}$ for $j = 1, \ldots, q$,

is satisfied.
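Since the restrictions in (C1) to (C4) are linear in the parameters, they can in principle be examined with a standard Wald statistic $W = (R\hat\theta)'[R\hat{V}R']^{-1}(R\hat\theta)$. A generic sketch follows; the estimate vector, its covariance matrix and the restriction matrix are hypothetical placeholders, not the paper's estimates:

```python
import numpy as np

def wald_stat(theta_hat, V_hat, R):
    """Wald statistic for H0: R theta = 0."""
    r = R @ theta_hat
    return float(r @ np.linalg.solve(R @ V_hat @ R.T, r))

# Hypothetical example: theta = (a13_1, a13_2, omega11_1, omega11_2); H0
# imposes a13_1 = a13_2 = 0 and omega11_1 = omega11_2, mimicking the
# structure of (C3)/(C4).
theta_hat = np.array([0.05, -0.02, 1.30, 1.10])
V_hat = np.diag([0.02, 0.02, 0.10, 0.10])
R = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
W = wald_stat(theta_hat, V_hat, R)
crit = 7.815  # 5 percent critical value of chi-square with 3 degrees of freedom
print(round(W, 3), W > crit)
```

Under standard regularity conditions W is asymptotically chi-square with rank(R) degrees of freedom; with these made-up numbers the hypothesis would not be rejected.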
Notice first that all the restrictions in (C1) to (C4) are linear, and that the number of restrictions is known from $n_1$, $n_2$, $q_1$, $q_2$, and p (with $n_i$ being the dimension of $x_{i,t}$). Second, if $q_2 = 1$, then (C2) and (C3) are equivalent, whereas $q_1 = 1$ means that (C3) and (C4) are equivalent. Hence, for the 2-state regime process the restrictions in (C1)-(C3) exhaust the cases when m is strongly noncausal for y. Third, for bivariate VARs with $q_1 = 1$, the combination of (C1) and (C3) implies that the number of states is equal to unity and that $a^{(k)}_{13} = 0$, i.e. the usual Granger noncausality restrictions in single regime VARs.

It is now straightforward to derive the conditions for strict noncausality. From the definitions of strong and strict noncausality we know that the strong form must be satisfied when the strict form holds. In other words, the condition [(C1) or (C2) or (C3) or (C4)] is necessary for strict noncausality.

Proposition 4: m is noncausal for y in a strict sense iff m is noncausal for y in a strong sense.
This result is, at a quick glance, somewhat surprising, but at a closer look evident. Specifically, the density for $u_{t+1}$ conditional on $X_t$ can be written as

$$g_{t+1}(u_{t+1}|X_t; \theta) = \sum_{s_{t+1}} g^{(s)}_{t+1}(u_{t+1}|s_{t+1}, X_t; \theta) \Pr[s_{t+1}|X_t; \theta]. \quad (14)$$

The distribution for $u_{t+1}$ conditional on $s_{t+1}$ and $X_t$ is Gaussian with mean $v_{t+1}$ and variance $\omega_{11,s_{t+1}}$. Hence, the distribution for $u_{t+1}$ conditional on $X_t$ is mixed Gaussian with mean zero and variance $\sigma^2_{t+1} = \sigma^2_{v,t+1} + \sigma^2_{\varepsilon,t+1}$. This distribution is completely described by $v_{t+1}$, $\omega_{11,s_{t+1}}$, and $\Pr[s_{t+1}|X_t]$. Under (C1), the forecast probability of $s_{t+1}$ is equal to the product between the probabilities $\Pr[s_{1,t+1}|X_{1,t}]$ and $\Pr[s_{2,t+1}|X_t]$, while the density functions on the right hand side of (14) are invariant with respect to $X_{2,t}$ and $s_{2,t+1}$, thereby implying that m is strictly noncausal for y. Next, (C2) means that the forecast probability of $s_{t+1}$ is equal to the ergodic probability, while the density functions $g^{(s)}_{t+1}(\cdot)$ do not depend on $M_t$. Under (C3) we have that the forecast probability of $s_{t+1}$ is equal to $\Pr[s_{1,t+1}]$ times $\Pr[s_{2,t+1}|X_t]$, and that $g^{(s)}_{t+1}(\cdot)$ are invariant with respect to $s_{1,t+1}$ and $M_t$. Finally, under (C4) the prediction error, $u_{t+1}$, is independent of $s_{t+1}$ and $X_t$ such that $u_{t+1} \sim N(0, \omega_{11})$. Whenever $\varepsilon_{1,t+1}|s_{t+1}$ has a distribution which is not fully described by the mean and the variance, then there is no reason to expect the strong and strict noncausality restrictions to be equivalent.
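The mixed-Gaussian one-step-ahead density in (14) is straightforward to evaluate once regime-specific means, variances and forecast probabilities are in hand. A sketch with made-up values (the numbers are illustrative only):

```python
import math

def mixture_density(u, means, variances, probs):
    """Evaluate a mixed-Gaussian density in the spirit of eq. (14):
    sum over regimes of the N(mean_j, var_j) density times
    the forecast probability Pr[s_{t+1} = j | X_t]."""
    total = 0.0
    for mu, var, pr in zip(means, variances, probs):
        total += pr * math.exp(-0.5 * (u - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return total

# Illustrative regime means v_{t+1}, variances omega_{11,j} and forecast probs.
means, variances, probs = [-0.4, 0.1], [1.5, 0.4], [0.25, 0.75]
d = mixture_density(0.0, means, variances, probs)
print(d > 0)

# Sanity check: the density integrates to (approximately) one on a wide grid.
grid = [i * 0.01 for i in range(-1000, 1001)]
mass = sum(mixture_density(u, means, variances, probs) for u in grid) * 0.01
print(round(mass, 2))
```

Because the mixture is fully described by its component means, variances and weights, restrictions that fix those three objects pin down the whole conditional distribution, which is the intuition behind Proposition 4.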
4 An Application to Money and Income In this section we shall analyse the causality restrictions for monthly U.S. data on money and income. The variables are M1 and industrial production for the sample period 1959:1 to 1995:2. Both series are seasonally adjusted (as in many previous studies using these data; e.g. Christiano and Ljungqvist (1988)) and taken from Citibase. To deal with trends we take rst dierences of the log levels and to avoid numerical problems in the estimation the variables are multiplied by 100. Moreover, the number of lags and states are xed at 2. In Table 1 some stylized facts about the behavior of money and income are presented. On average, income grows at less than .3 percent per month with a standard deviation of about .9 percent. The average growth in money is somewhat higher, while the standard deviation is about .5 percent. Moreover, income and money growth do not seem to be 2
Formulas for computing the mean and the autocovariance conditional on the state are given in Warne (1996). 2
14
contemporaneously correlated. Turning to the two states, we nd that the estimates of the VAR model roughly suggests a zero average income growth in state 1 and a 50 percent higher volatility than on average. In state 2, income growth is approximately .4 percent and the volatility is quite modest. Money, on the other hand, grows at about the same rate in both states and is somewhat more volatile in state 1. Concerning the contemporaneous correlations, there is a negative correlation in state 1 and a positive in state 2. Both correlation coecients are, however, rather small. Summing up, these simple moments suggest that we can interpret state 1 as \the bad state" and state 2 as \the good state". Since q = 2 it is evident that either q or q is equal to unity. Accordingly, the coecients in the money and income equations cannot both vary over time and be independent. The ML estimates of all 28 parameters are reported in Table 2. The estimates have been obtained analytically via the EM algorithm; for more details the reader is refered to Hamilton (1990,1994) and Warne (1996). Standard errors for the point estimates have been computed from the conditional scores (as in Hamilton (1996)) and are reported within parentheses. The point estimates for the bad state (regime 1) are generally more uncertain than those for the good state. This is, undoubtedly, related to the good state being roughly 3 times as likely (unconditionally) as the bad state. Let us rst consider the hypothesis that, conditional on the state, money does not Granger cause income. That is, ;s = ;s = 0 for each state separately. The results from using Wald and F-statistics are reported in Table 3. For the good state, this hypothesis is rejected at the 5 percent level, and for the bad state at the 60 percent level. Hence, 1
2
3
4
(1) 12
t
(2) 12
t
The statistic max jeig(A^)j refers to Karlsen's (1990) condition for weak stationarity. If the true value of this statistic is less than unity, then xt is weakly stationary. 4 The estimated number of observation for state 1 is approximately 111 and for state 2 about 320. P These estimates are calculated as Tt=1 Pr[st = j jXT ; ^] for j = 1; 2. Interestingly enough, by dividing these numbers by T we almost get the same values as the ergodic probabilities. If we estimate the Markov transition probabilities under P the restriction that the Markov process is serially uncorrelated, the ML estimatorPof j is (1=T ) Tt=1 Pr[st = j jXT ; ^]. Generally,Pthe ML estimator of 1 is approximately equal P to (1=T ) Tt=1 Pr[P st?1 = 1jXT ; ^] for large enough P T if Tt=1 Pr[st?1 = 1jXT ; ^] 1 ? [( Tt=1 Pr[st?1 = 1; st = 2jXT ; ^])=( Tt=1 Pr[st?1 = 2; st = 1jXT ; ^])]( Tt=1 Pr[st?1 = 2jXT ; ^]). This can certainly hold even for serially correlated Markov processes. 3
15
money does not cause income when we know that next period's state is the bad state, while it seems to cause income if next period's state is the good state. One interpretation of these results is that there is \money illusion" in the good state because the agents in the economy face signal extraction costs when attempting to separate nominal from real shocks. If these extraction costs are large relative to the costs of ignoring the signal extraction problem, it can be rational to have money illusion. In the bad state, however, the extraction costs are small relative to the money illusion costs, suggesting that agents \solve" the signal extraction problem in this state. The point estimates of the standard deviations in Table 1 in fact seem to reinforce this story. Generally, signal extraction costs can be assumed to be high (low) when the signal is weak (strong). The strength of the signal may be measured by the relative volatility between income and money. In the good state, this ratio is approximately 1.5 and in the bad state roughly 2. Hence, it is plausible that the extraction costs are higher in the good (low relative volatility) state than in the bad (high relative volatility) state. Still, this interpretation may not be valid in a (structural) model with more variables (like interest rates, prices, and so on) and it does not take into account that the states may not be perfectly predicted. Moreover, the lack of rejection for the bad state may primarily be related to imprecise estimates. Next, consider the hypothesis that the history of money is uninformative about next period's state. Both sets of restrictions, (A1) and (A2) (see Proposition 1), are strongly rejected by the data. Moreover, if (A2) is false, then the Markov process is serially correlated. Based on the point estimates of the transition probabilities in Table 2, good (bad) states are typically followed by good (bad) states. 
Together with the conclusion from testing (A2), this suggests that the regime process is subject to positive serial correlation. While various aspects of the noncausality hypotheses have already been examined, the tests of all sufficient conditions have not yet been addressed. From Proposition 2 we have that if (A1) or (B2) or (B3) is true, then money is weakly noncausal for income. Given the evidence about (A2), it is not surprising that (B2) is strongly rejected as well. Now, if (A2) is false, then P must have full rank 2 and, thus, the nonlinear restrictions in (B3) become linear. According to the results in Table 3 this hypothesis is rejected at the 10 percent
level, but not at the 5 percent level. Hence, there is some evidence that the conditional mean of income is invariant with respect to the history of money. Finally, from Proposition 3 we know that money does not Granger cause income in a strong sense if (A1) or (C2) or (C3) is true. All these restrictions are, individually, rejected at conventional levels of significance, suggesting that there is information in money about the conditional forecast error variance of income. Summing up, if data are generated by a bivariate 2-state Markov switching VAR(2) model, then money Granger causes income for monthly U.S. data in log growth rates. There is some evidence that money is weakly noncausal for income, but the hypothesis of strong (and strict) noncausality is rejected at conventional marginal levels of significance. This can be contrasted with single regime VAR's, where the opposite conclusion is reached in first difference models. In fact, noncausality in such models is very robust with respect to the selection of sample period and lag order (except for very short lag orders, when the estimated residuals do not pass standard serial correlation diagnostics). For example, Christiano and Ljungqvist (1988) show that money Granger causes income in a bivariate single regime VAR (with 12 lags) for log levels, but not for log growth rates. They argue, based on evidence from bootstrapped empirical distributions, that the conclusion from the first difference model is wrong. The evidence from the 2-state VAR (with 2 lags) supports Christiano and Ljungqvist's conclusion that money does Granger cause income in the bivariate VAR, although for a very different reason. Specifically, the results in this paper suggest that the "cause" stems from a signal extraction problem where the history of money is useful for predicting the regime variable.
One may conjecture that m being weakly but not strongly noncausal for y in growth rates can result in Granger causality not being rejected in a single regime VAR for the growth rates while being rejected for the levels, where the effects are accumulated. The suggestion that money and income are correlated because of a signal extraction problem rather than a causal link from money to income goes back to (at least) Lucas (1972) in the macroeconomics literature. The signal extraction problem in the (reduced form) regime switching VAR is primarily about the uncertainty of the regimes, but may
also, at a deeper level, reflect a Lucas type problem through the parameters.
5 Concluding Remarks

The main concern in this paper is the question of how to formulate a Granger causality hypothesis for (parametric) nonlinear time series models. Causality is measured from the behavior of 1-step ahead prediction errors using the conditional expectation. The weak form of noncausality is concerned with the prediction error variance, the strong form compares conditional prediction error variances, while the strict form is based on the conditional distribution of the prediction error. It is argued that the conditional prediction error variance can, from an economic theory point of view, be a more enlightening causality measure than the unconditional variance, while the reasons for examining strict causality are primarily of a statistical nature. When applied to a VAR model subject to changes in regime, it is found that the parametric restrictions implied by weak noncausality can be very difficult to deal with in practice. Specifically, the number and form (linear versus nonlinear) of the restrictions depend critically on the rank of the Markov transition matrix. Strong noncausality, on the other hand, gives sets of linear restrictions that are well defined given the number of states and lags in the model. It seems plausible that these features are not unique to the regime switching model, but also appear in other VAR models with time varying coefficients. Hence, strong noncausality is a simpler concept to deal with when performing statistical analysis and conducting inference. Moreover, given that the residuals of the VAR are Gaussian conditional on the regime, it is shown that strong and strict noncausality are equivalent. In other words, if (and only if) there is no information in the history of money for evaluating the conditional mean and variance of the prediction error of next period's income, then the conditional distribution of the prediction error is invariant with respect to the history of money.
The reason for this equivalence is that the conditional distribution of the prediction error is a mixture of Gaussian distributions with weights given by the regime forecasts.
Table 1: Some stylized facts about money and income growth in the US for the sample 1959:1-1995:2.

Unconditional moments
  variable     mean    st. dev.   corr.
  y            .277     .906       .009
  m            .498     .489
State 1
  y            .049    1.408      -.083
  m            .434     .641
State 2
  y            .354     .634       .090
  m            .519     .424
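The relative-volatility figures cited in the discussion (roughly 2 in the bad state, 1.5 in the good state) follow directly from the state-specific standard deviations in Table 1; a quick check:

```python
# State-specific standard deviations of income (y) and money (m) from Table 1
sd = {"state1": {"y": 1.408, "m": 0.641},   # bad (high-volatility) state
      "state2": {"y": 0.634, "m": 0.424}}   # good (low-volatility) state

# Signal strength measured by the volatility of income relative to money
ratio_bad = sd["state1"]["y"] / sd["state1"]["m"]    # roughly 2
ratio_good = sd["state2"]["y"] / sd["state2"]["m"]   # roughly 1.5
```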
Table 2: ML estimates and standard errors for a 2-state Markov switching VAR(2) model of money and income for the sample 1959:1-1995:2.

Income equation                             Money equation
coefficient       s_t = 1     s_t = 2       coefficient       s_t = 1     s_t = 2
ν_{1,s}           -.086        .140         ν_{2,s}            .394        .201
                  (.238)      (.064)                          (.119)      (.038)
a^{(1)}_{11,s}     .066       -.025         a^{(1)}_{21,s}     .447        .271
                  (.127)      (.058)                          (.070)      (.035)
a^{(1)}_{12,s}     .235        .474         a^{(1)}_{22,s}     .375       -.104
                  (.355)      (.113)                          (.121)      (.056)
a^{(2)}_{11,s}    -.031       -.064         a^{(2)}_{21,s}    -.082        .126
                  (.142)      (.054)                          (.082)      (.033)
a^{(2)}_{12,s}    -.148        .205         a^{(2)}_{22,s}    -.164        .274
                  (.308)      (.111)                          (.134)      (.056)
ω_{11,s}          1.674        .322
                  (.224)      (.036)
ω_{22,s}           .388        .100
                  (.067)      (.010)
ω_{12,s}          -.133        .020
                  (.105)      (.012)

Markov probabilities:  p_11 = .752 (.069)   p_22 = .916 (.029)
max|eig(Â)| = .406    ln L(X_T; θ̂) = -710.6    ρ̂_1 = .254 (.047)
Estimated # obs:  s_t = 1: 111 (91, 131);  s_t = 2: 320 (300, 340)
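The statistic max|eig(Â)| in Table 2 evaluates Karlsen's (1990) weak stationarity condition for the switching model. For a single-regime VAR the analogous check is the spectral radius of the companion matrix; a sketch with hypothetical VAR(2) coefficient matrices (illustration only, not the Table 2 estimates):

```python
import numpy as np

# Hypothetical bivariate VAR(2) coefficient matrices (illustration only)
A1 = np.array([[0.45, 0.20],
               [0.10, 0.40]])
A2 = np.array([[-0.05, 0.10],
               [0.05, -0.10]])

# Companion form: stack x_t with x_{t-1}; stationarity <=> spectral radius < 1
n = A1.shape[0]
companion = np.block([[A1, A2],
                      [np.eye(n), np.zeros((n, n))]])
rho = np.abs(np.linalg.eigvals(companion)).max()
stationary = rho < 1.0
```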
Table 3: Wald and F-tests of the Granger noncausality restrictions.

Hypothesis                          # rest.    Wald      F
Noncausality tests when s_{t+1} is known
  a^{(k)}_{12,1} = 0                   2       1.120     .5433
                                              [.571]    [.581]
  a^{(k)}_{12,2} = 0                   2       6.553     3.178
                                              [.038]    [.043]
Money and regime predictions
  (A1)                                12       60.84     4.976
                                              [.000]    [.000]
  (A2) (q_2 = 1)                       1       57.93     56.05
                                              [.000]    [.000]
Weak noncausality
  (B2) (q_2 = 1)                       3       59.44     19.26
                                              [.000]    [.000]
  (B3) (q_1 = 1)                       7       13.41     1.871
                                              [.063]    [.073]
Strong noncausality
  (C2) (q_2 = 1)                       5       75.88     14.75
                                              [.000]    [.000]
  (C3) (q_1 = 1)                       8       74.78     9.130
                                              [.000]    [.000]

Note: p-values are reported in brackets.
Mathematical Appendix

Proof of Proposition 1

It should be obvious that (A2) implies that there is no information in X_{2,t} for predicting s_{1,t+1}. Let us therefore focus on the only remaining possibility, condition (A1). To prove that these restrictions are necessary and sufficient for Pr[s_{1,t}|X_t] = Pr[s_{1,t}|X_{1,t}] to hold we shall proceed in two steps. The first step involves finding a general condition for predictions of s_{1,t} (and s_{2,t}) to be invariant with respect to alternative information sets. In the second step we show that the parameter restrictions in (A1) are necessary and sufficient for the invariance condition to be satisfied under the two information sets of interest.

Let ξ_{t|τ}(j) = Pr[s_t = j | x_τ, W_τ], where x_t is a vector of variables and W_τ is the history of an observable vector w_t up to and including period τ. The vector w_t can, for example, be defined such that it contains x_{t-1} and various exogenous variables observable at time t. Furthermore, let η_t(j) = f_x(x_t | s_t = j, W_t) be the density function for x_t conditional on the state and the history of w. We stack these functions into q×1 vectors ξ_{t|τ} and η_t, respectively. From e.g. Hamilton (1994) we have that ξ_{t|t}, ξ_{t|t-1}, and η_t are related according to

    ξ_{t|t} = (ξ_{t|t-1} ⊙ η_t) / [ι_q'(ξ_{t|t-1} ⊙ η_t)],   t = 1, 2, ...,      (A.1)

    ξ_{t|t-1} = P' ξ_{t-1|t-1},   t = 2, 3, ...,                                 (A.2)
and ξ_{1|0} = ρ, a q×1 vector of positive constants summing to unity. Here, ⊙ denotes the Hadamard (element-by-element) product and ι_q the q×1 unit vector.

Let s_t be represented by two Markov processes, s_{1,t} and s_{2,t}, which are not necessarily independent. Define j such that j = j_2 + q_2(j_1 - 1) when (s_{1,t}, s_{2,t}) = (j_1, j_2), where q_1, q_2 ≥ 1 and q = q_1 q_2 ≥ 2. Then ξ_{t|τ}(j) = ξ_{t|τ}(j_1, j_2) = Pr[s_{1,t} = j_1, s_{2,t} = j_2 | x_τ, W_τ], while ξ^{(1)}_{t|τ}(j_1) = Σ_{j_2=1}^{q_2} ξ_{t|τ}(j_1, j_2) and similarly for ξ^{(2)}_{t|τ}(j_2). More compactly, this means that ξ^{(1)}_{t|τ} = [I_{q_1} ⊗ ι_{q_2}'] ξ_{t|τ} and ξ^{(2)}_{t|τ} = [ι_{q_1}' ⊗ I_{q_2}] ξ_{t|τ}. The following result about Hadamard and Kronecker products will prove useful below:
Lemma 1  If η_t = (η^{(1)}_t ⊗ η^{(2)}_t) with η^{(l)}_t being q_l×1 for l = 1, 2, then

    [I_{q_1} ⊗ ι_{q_2}'](ξ_{t|t-1} ⊙ η_t) = ([I_{q_1} ⊗ η^{(2)'}_t] ξ_{t|t-1}) ⊙ η^{(1)}_t,      (A.3)

while

    [ι_{q_1}' ⊗ I_{q_2}](ξ_{t|t-1} ⊙ η_t) = ([η^{(1)'}_t ⊗ I_{q_2}] ξ_{t|t-1}) ⊙ η^{(2)}_t.      (A.4)

Proof  The j:th element of (ξ_{t|t-1} ⊙ η_t) is given by ξ_{t|t-1}(j_1, j_2) η^{(1)}_t(j_1) η^{(2)}_t(j_2). Premultiplying this q×1 vector by [I_{q_1} ⊗ ι_{q_2}'] we obtain a q_1 dimensional vector whose j_1:th element is η^{(1)}_t(j_1) Σ_{j_2=1}^{q_2} ξ_{t|t-1}(j_1, j_2) η^{(2)}_t(j_2). Now define

    ξ̃_{t|t-1}(j_1) ≡ [ξ_{t|t-1}(j_1, 1) ... ξ_{t|t-1}(j_1, q_2)]',   j_1 = 1, ..., q_1.      (A.5)

Then ξ̃_{t|t-1}(j_1)' η^{(2)}_t = Σ_{j_2=1}^{q_2} ξ_{t|t-1}(j_1, j_2) η^{(2)}_t(j_2). Collecting these results we find that

    [I_{q_1} ⊗ ι_{q_2}'](ξ_{t|t-1} ⊙ η_t) = [ξ̃_{t|t-1}(1)' η^{(2)}_t  ...  ξ̃_{t|t-1}(q_1)' η^{(2)}_t]' ⊙ η^{(1)}_t.      (A.6)

Define the q_2×q_1 matrix Ξ_{t|t-1} according to Ξ_{t|t-1} ≡ [ξ̃_{t|t-1}(1) ... ξ̃_{t|t-1}(q_1)]. It then follows that

    Ξ_{t|t-1}' η^{(2)}_t = [ξ̃_{t|t-1}(1)' η^{(2)}_t  ...  ξ̃_{t|t-1}(q_1)' η^{(2)}_t]'.      (A.7)

Moreover, ξ_{t|t-1} = vec(Ξ_{t|t-1}), with vec being the column stacking operator. Next,

    Ξ_{t|t-1}' η^{(2)}_t = [η^{(2)'}_t ⊗ I_{q_1}] vec(Ξ_{t|t-1}')
                         = [η^{(2)'}_t ⊗ I_{q_1}] K_{q_2,q_1} vec(Ξ_{t|t-1})
                         = K_{q_1,1} [I_{q_1} ⊗ η^{(2)'}_t] ξ_{t|t-1}
                         = [I_{q_1} ⊗ η^{(2)'}_t] ξ_{t|t-1},      (A.8)

where K_{m,n} is the mn×mn commutation matrix, K_{m,1} = I_m, and the third equality follows by Theorem 3.9 in Magnus and Neudecker (1988). Collecting these last results we have established (A.3). The result (A.4) follows by similar arguments. QED

If s_{1,t} and s_{2,t} are independent, it follows that

    ξ^{(1)}_{t|t-1} = [I_{q_1} ⊗ ι_{q_2}'][P^{(1)} ⊗ P^{(2)}]' ξ_{t-1|t-1} = P^{(1)'} ξ^{(1)}_{t-1|t-1},      (A.9)

since P^{(2)} ι_{q_2} = ι_{q_2}. Similarly, ξ^{(2)}_{t|t-1} = P^{(2)'} ξ^{(2)}_{t-1|t-1}. However, this does not mean that ξ^{(1)}_{t|t-1} and ξ^{(2)}_{t|t-1} are independent since ξ^{(1)}_{t-1|t-1} and ξ^{(2)}_{t-1|t-1} need not be independent.
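Identities (A.3) and (A.4) are easy to verify numerically under the stacking j = j_2 + q_2(j_1 - 1); a small check with random vectors (a sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
q1, q2 = 3, 2
xi = rng.random(q1 * q2)                 # xi_{t|t-1}, stacked as j = j2 + q2*(j1 - 1)
e1, e2 = rng.random(q1), rng.random(q2)  # eta^(1)_t and eta^(2)_t
eta = np.kron(e1, e2)                    # eta_t = eta^(1) (x) eta^(2)

# Identity (A.3): [I (x) iota'] (xi . eta) = ([I (x) eta^(2)'] xi) . eta^(1)
lhs1 = np.kron(np.eye(q1), np.ones(q2)) @ (xi * eta)
rhs1 = (np.kron(np.eye(q1), e2[None, :]) @ xi) * e1

# Identity (A.4): [iota' (x) I] (xi . eta) = ([eta^(1)' (x) I] xi) . eta^(2)
lhs2 = np.kron(np.ones(q1), np.eye(q2)) @ (xi * eta)
rhs2 = (np.kron(e1[None, :], np.eye(q2)) @ xi) * e2
```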
Lemma 2  If (i) η_t = φ_t (η^{(1)}_t ⊗ η^{(2)}_t) where φ_t is a scalar and η^{(l)}_t a q_l×1 vector, (ii) η^{(1)}_t and η^{(2)}_t are vectors of density functions for independent random variables, and (iii) s_{1,t} and s_{2,t} are independent, then for all t = 1, ..., T

    ξ^{(l)}_{t|t} = (ξ^{(l)}_{t|t-1} ⊙ η^{(l)}_t) / [ι_{q_l}'(ξ^{(l)}_{t|t-1} ⊙ η^{(l)}_t)],   l = 1, 2,      (A.10)

with ξ_{t|τ} = (ξ^{(1)}_{t|τ} ⊗ ξ^{(2)}_{t|τ}), where ξ^{(1)}_{t|τ} and ξ^{(2)}_{t|τ} are independent for τ = t, t-1.
Proof  Note first that ι_q' = ι_{q_1}'(I_{q_1} ⊗ ι_{q_2}') = ι_{q_2}'(ι_{q_1}' ⊗ I_{q_2}). For l = 1 we know that ξ^{(1)}_{t|t} = [I_{q_1} ⊗ ι_{q_2}'] ξ_{t|t}. From equation (A.1) we thus have that

    ξ^{(1)}_{t|t} = [I_{q_1} ⊗ ι_{q_2}'][ξ_{t|t-1} ⊙ η_t][ι_{q_1}'(I_{q_1} ⊗ ι_{q_2}')(ξ_{t|t-1} ⊙ η_t)]^{-1}
                  = [([I_{q_1} ⊗ η^{(2)'}_t] ξ_{t|t-1}) ⊙ η^{(1)}_t][ι_{q_1}'[([I_{q_1} ⊗ η^{(2)'}_t] ξ_{t|t-1}) ⊙ η^{(1)}_t]]^{-1},      (A.11)

by Lemma 1 and since the scalar φ_t cancels. A similar expression is obtained for ξ^{(2)}_{t|t}. Let ρ = (ρ^{(1)} ⊗ ρ^{(2)}) where the elements of ρ^{(l)} are positive and sum to unity. Then

    ξ^{(1)}_{1|1} = [(ρ^{(1)} η^{(2)'}_1 ρ^{(2)}) ⊙ η^{(1)}_1][ι_{q_1}'[(ρ^{(1)} η^{(2)'}_1 ρ^{(2)}) ⊙ η^{(1)}_1]]^{-1}
                  = [ρ^{(1)} ⊙ η^{(1)}_1][ι_{q_1}'(ρ^{(1)} ⊙ η^{(1)}_1)]^{-1},      (A.12)

and similarly for ξ^{(2)}_{1|1}. By (ii) it follows that ξ^{(1)}_{1|1} and ξ^{(2)}_{1|1} are independent. Thus, ξ_{1|1} = (ξ^{(1)}_{1|1} ⊗ ξ^{(2)}_{1|1}). Moreover, by (iii) we have that ξ^{(l)}_{2|1} = P^{(l)'} ξ^{(l)}_{1|1}, which are also independent for l = 1, 2. Thus, ξ_{2|1} = (ξ^{(1)}_{2|1} ⊗ ξ^{(2)}_{2|1}) and so on for t = 2, 3, ..., T, thereby establishing sufficiency.

To prove necessity, suppose (i) is not true. Let η_t = (η^{(1)}_t ⊗ η^{(2)}_t) ⊙ ψ_t, where ψ_t ≠ (ψ^{(1)}_t ⊗ ψ^{(2)}_t) for q_l×1 vectors ψ^{(l)}_t. Then, for example,

    ξ^{(1)}_{t|t} = [([I_{q_1} ⊗ η^{(2)'}_t](ξ_{t|t-1} ⊙ ψ_t)) ⊙ η^{(1)}_t][ι_{q_1}'[([I_{q_1} ⊗ η^{(2)'}_t](ξ_{t|t-1} ⊙ ψ_t)) ⊙ η^{(1)}_t]]^{-1}
                  ≠ [([I_{q_1} ⊗ η^{(2)'}_t] ξ_{t|t-1}) ⊙ η^{(1)}_t][ι_{q_1}'[([I_{q_1} ⊗ η^{(2)'}_t] ξ_{t|t-1}) ⊙ η^{(1)}_t]]^{-1}.      (A.13)

The only case in which the inequality can be replaced with an equality is when ψ_t = (ψ^{(1)}_t ⊗ ψ^{(2)}_t). Next, if (ii) does not hold, then for instance ξ^{(1)}_{1|1} and ξ^{(2)}_{1|1} cannot be independent. Finally, if (iii) does not hold, then ξ^{(1)}_{t|t-1} ≠ P^{(1)'} ξ^{(1)}_{t-1|t-1} and depends on ξ^{(2)}_{t-1|t-1} as well. Thus, ξ^{(1)}_{2|1} and ξ^{(2)}_{2|1} cannot be independent even if ξ^{(1)}_{1|1} and ξ^{(2)}_{1|1} are. QED

Note that assumptions (i) and (ii) are often closely related. For the Gaussian distribution, for example, (i) implies (ii) and vice versa. However, there may exist some perverse distributions which can satisfy (i) but not (ii) unless additional parametric conditions hold. We can always select ρ = (ρ^{(1)} ⊗ ρ^{(2)}) when the parameters are assumed to be known. However, Hamilton (1990) shows that when parameters are unknown, the ML estimator of ρ is given by the estimate of ξ_{1|T}. The following Lemma ensures that the results in Lemma 2 also hold if the ML estimator of ρ is consistent.
Lemma 3  If the conditions in Lemma 2 are satisfied, then

    ξ_{t|τ} = ξ^{(1)}_{t|τ} ⊗ ξ^{(2)}_{t|τ},      (A.14)

for all t, τ = 1, ..., T, with ξ^{(1)}_{t|τ} and ξ^{(2)}_{t|τ} being independent.
Proof  Let us first prove this for all τ < t. We have already established in Lemma 2 that ξ^{(1)}_{τ|τ} and ξ^{(2)}_{τ|τ} are independent for all τ. By equation (22.3.13) in Hamilton (1994) we have that ξ_{t|τ} = (P')^{t-τ} ξ_{τ|τ} for τ = 1, ..., t-1. By independence of s_{1,t} and s_{2,t} and of ξ^{(1)}_{τ|τ} and ξ^{(2)}_{τ|τ} we obtain ξ_{t|τ} = ((P^{(1)'})^{t-τ} ξ^{(1)}_{τ|τ} ⊗ (P^{(2)'})^{t-τ} ξ^{(2)}_{τ|τ}) = (ξ^{(1)}_{t|τ} ⊗ ξ^{(2)}_{t|τ}), which are thus independent.

To show (A.14) for τ > t it is sufficient to consider τ = T since the algorithm for computing smooth probabilities is valid for any τ > t. From Kim (1994) (see also Lindgren (1978) and Hamilton (1994)) we get

    ξ_{t|T} = ξ_{t|t} ⊙ P[ξ_{t+1|T} ⊘ ξ_{t+1|t}],   t = 1, ..., T-1,      (A.15)

where ⊘ denotes element-by-element division. To show that ξ_{t|T} = (ξ^{(1)}_{t|T} ⊗ ξ^{(2)}_{t|T}), with ξ^{(l)}_{t|T} independent for l = 1, 2, we begin with t = T-1. By Lemma 2 we have that ξ_{T|τ} = (ξ^{(1)}_{T|τ} ⊗ ξ^{(2)}_{T|τ}) for τ = T, T-1. Accordingly,

    ξ_{T|T} ⊘ ξ_{T|T-1} = [ξ^{(1)}_{T|T} ⊘ ξ^{(1)}_{T|T-1}] ⊗ [ξ^{(2)}_{T|T} ⊘ ξ^{(2)}_{T|T-1}].      (A.16)

Let δ^{(l)}_T ≡ P^{(l)}(ξ^{(l)}_{T|T} ⊘ ξ^{(l)}_{T|T-1}) for l = 1, 2. We then obtain

    P[ξ_{T|T} ⊘ ξ_{T|T-1}] = δ^{(1)}_T ⊗ δ^{(2)}_T.      (A.17)

Hence, ξ_{T-1|T} = ξ_{T-1|T-1} ⊙ (δ^{(1)}_T ⊗ δ^{(2)}_T). With ξ^{(1)}_{T-1|T} = [I_{q_1} ⊗ ι_{q_2}'] ξ_{T-1|T} it follows by Lemmas 1 and 2 that

    ξ^{(1)}_{T-1|T} = ([I_{q_1} ⊗ δ^{(2)'}_T] ξ_{T-1|T-1}) ⊙ δ^{(1)}_T
                    = (δ^{(2)'}_T ξ^{(2)}_{T-1|T-1})(ξ^{(1)}_{T-1|T-1} ⊙ δ^{(1)}_T),      (A.18)

since ξ_{T-1|T-1} = (ξ^{(1)}_{T-1|T-1} ⊗ ξ^{(2)}_{T-1|T-1}). From the definition of δ^{(2)}_T we find that

    δ^{(2)'}_T ξ^{(2)}_{T-1|T-1} = (ξ^{(2)}_{T|T} ⊘ ξ^{(2)}_{T|T-1})' P^{(2)'} ξ^{(2)}_{T-1|T-1}
                                 = (ξ^{(2)}_{T|T} ⊘ ξ^{(2)}_{T|T-1})' ξ^{(2)}_{T|T-1}
                                 = Σ_{j_2=1}^{q_2} ξ^{(2)}_{T|T}(j_2).      (A.19)

This is equal to unity and we thus get

    ξ^{(1)}_{T-1|T} = ξ^{(1)}_{T-1|T-1} ⊙ P^{(1)}[ξ^{(1)}_{T|T} ⊘ ξ^{(1)}_{T|T-1}].      (A.20)

Proceeding with ξ^{(2)}_{T-1|T}, the above arguments imply that

    ξ^{(2)}_{T-1|T} = ξ^{(2)}_{T-1|T-1} ⊙ P^{(2)}[ξ^{(2)}_{T|T} ⊘ ξ^{(2)}_{T|T-1}],      (A.21)

and, hence, by Lemma 2, ξ^{(l)}_{T-1|T} are independent for l = 1, 2 and ξ_{T-1|T} = (ξ^{(1)}_{T-1|T} ⊗ ξ^{(2)}_{T-1|T}). For the remaining t, backwards recursion using the above arguments implies the result. Necessity follows by the arguments in Lemma 2. QED

Note that conditions (i) and (ii) are only sufficient in forecast situations. If s_t is serially uncorrelated, then P' = π ι_q', with π being the vector of ergodic probabilities. Accordingly, for all τ < t, ξ_{t|τ} = (P')^{t-τ} ξ_{τ|τ} = π since ι_q' ξ_{τ|τ} = 1. Hence, if s_{1,t} and s_{2,t} are independent and serially uncorrelated, then ξ_{t|τ} = (ξ^{(1)}_{t|τ} ⊗ ξ^{(2)}_{t|τ}) = (π^{(1)} ⊗ π^{(2)}) for all τ < t.

This completes step one in the proof of Proposition 1. We have established necessary and sufficient conditions for how the information used to predict s_t can be split into information valuable for predicting s_{1,t} but not s_{2,t}, and vice versa, and for when information can be "thrown away" without affecting the regime predictions. Note that the conditions in Lemma 2 are very general in the sense that they apply to any vector of density functions η_t. For example, the functional form can vary over t as well as over states. The crucial underlying assumption is that s_t is independent of information available at time t-1 conditional on s_{t-1}. If this assumption is violated, then the algorithms for computing regime predictions are no longer valid. The assumption that s_{1,t} and s_{2,t} are independent, in fact, increases the level of generality of the results. For example, it allows q_2 = 1, in which case η_t = φ_t η^{(1)}_t (with the scalar φ_t being invariant with respect to s_t) is necessary and sufficient for regime predictions based on the vector densities η_t and η^{(1)}_t to be equivalent. The scalar φ_t can, for instance, be a marginal density. When q_1, q_2 ≥ 2 we allow for the possibility that two subsystems of the model can contain information for predicting one independent regime process each but not the other regime process, while a third subsystem is completely uninformative about regimes.
By considering r independent Markov chains, these results can be generalized further. For my purposes, however, the above results are sufficient.

Now let us return to the regime switching VAR with conditionally Gaussian residuals. Here we find that for each j ∈ {1, ..., q} the joint log density is

    ln η_t(j) = -(n/2) ln(2π) - (1/2) ln det[Ω_j] - (1/2) ε_{t|j}' Ω_j^{-1} ε_{t|j},      (A.22)

where ε_{t|j} = x_t - ν_j D_t - Σ_{k=1}^p A^{(k)}_j x_{t-k}. Let n_1 and n_2 be the number of x_{1,t} and x_{2,t} variables, respectively, with n_1 + n_2 = n. The marginal density for x_{2,t}, conditional on s_t = j and X_{t-1}, is

    ln η^{(2)}_t(j) = -(n_2/2) ln(2π) - (1/2) ln det[Ω_{22,j}] - (1/2) ε_{2,t|j}' Ω_{22,j}^{-1} ε_{2,t|j}.      (A.23)

This density is invariant with respect to s_{1,t} iff (i) Ω_{22,(j_1,j_2)} = Ω_{22,j_2}, ν_{2,(j_1,j_2)} = ν_{2,j_2}, and A^{(k)}_{2l,(j_1,j_2)} = A^{(k)}_{2l,j_2} for all j_1 = 1, ..., q_1, j_2 = 1, ..., q_2, l = 1, 2, and k = 1, ..., p. For q_2 = 1 these restrictions imply that the parameters in the marginal density for x_{2,t} are constant across states. Under these restrictions, the density for x_{1,t}, conditional on s_t = j = j_2 + q_2(j_1 - 1), x_{2,t}, and X_{t-1}, is

    ln η^{(1)}_t(j) = -(n_1/2) ln(2π) - (1/2) ln det[Ω̃_{11,j}]
                      - (1/2)(ε_{1,t|j} - Ω_{12,j} Ω_{22,j_2}^{-1} ε_{2,t|j_2})' Ω̃_{11,j}^{-1} (ε_{1,t|j} - Ω_{12,j} Ω_{22,j_2}^{-1} ε_{2,t|j_2}),      (A.24)

where Ω̃_{11,j} ≡ Ω_{11,j} - Ω_{12,j} Ω_{22,j_2}^{-1} Ω_{12,j}'. For this density function to be invariant with respect to s_{2,t} for q_2 ≥ 2, necessary and sufficient conditions are (ii) Ω_{11,(j_1,j_2)} = Ω_{11,j_1}, ν_{1,(j_1,j_2)} = ν_{1,j_1}, and A^{(k)}_{1l,(j_1,j_2)} = A^{(k)}_{1l,j_1} for all j_1 = 1, ..., q_1, j_2 = 1, ..., q_2, l = 1, 2, and k = 1, ..., p; and (iii) Ω_{12,j} = 0 for all j = 1, ..., q.

Under (i) to (iii) we find that η_t = (η^{(1)}_t ⊗ η^{(2)}_t) for all t, with η^{(l)}_t being the marginal density of x_{l,t} conditional on s_{l,t} and X_{t-1}. If these linear restrictions are not satisfied, then η_t cannot be decomposed into the (Kronecker) product between a q_1 and a q_2 vector density. For q_2 = 1, restrictions (iii) can be dispensed with. In that case, η_t = φ_t η^{(1)}_t, with φ_t being given by the marginal density for x_{2,t}. To satisfy the remaining two conditions in Lemma 2 we only need to let s_{1,t} and s_{2,t} be independent. For q_2 ≥ 2 we have stochastic independence from, in particular, restrictions (iii), and for q_2 = 1 this is not needed since φ_t is just a scalar which cancels in (A.1). By Lemma 2 it then follows that Pr[s_t = j | X_t; θ] = Pr[s_{1,t} = j_1 | X_{1,t}, X_{2,t}; θ_1] Pr[s_{2,t} = j_2 | X_{1,t-1}, X_{2,t}; θ_2]. When q_2 ≥ 2 it also follows that Pr[s_{1,t} = j_1 | X_{1,t}, X_{2,t}; θ_1] = Pr[s_{1,t} = j_1 | X_{1,t}, X_{2,t-1}; θ_1].

The final stage is now straightforward. In order for X_{2,t} to be uninformative about s_{1,t}, (iii) must also hold for q_2 = 1, and (iv) A^{(k)}_{12,j_1} = 0 for all j_1 = 1, ..., q_1 and k = 1, ..., p for q_2 ≥ 1. Together, conditions (i) to (iv) are thus necessary and sufficient for Pr[s_{1,t} = j_1 | X_t; θ] = Pr[s_{1,t} = j_1 | X_{1,t}; θ_1] when s_{1,t} and s_{2,t} are independent.
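Condition (iii) is what makes the Gaussian density factor: with Ω_{12,j} = 0 the joint log density (A.22) equals the sum of the two marginal log densities. A quick numerical confirmation with illustrative numbers (not the paper's estimates):

```python
import numpy as np

def gauss_logpdf(x, mu, S):
    # log density of a multivariate normal N(mu, S) evaluated at x
    d = x - mu
    n = len(x)
    return -0.5 * (n * np.log(2 * np.pi) + np.log(np.linalg.det(S))
                   + d @ np.linalg.solve(S, d))

# Block-diagonal covariance: Omega_12 = 0 (illustrative values)
S11, S22 = np.array([[1.7]]), np.array([[0.4]])
S = np.block([[S11, np.zeros((1, 1))], [np.zeros((1, 1)), S22]])
x, mu = np.array([0.3, -0.2]), np.array([0.1, 0.0])

joint = gauss_logpdf(x, mu, S)
split = gauss_logpdf(x[:1], mu[:1], S11) + gauss_logpdf(x[1:], mu[1:], S22)
```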
Proof of Proposition 2

The prediction of y_{t+1} conditional on X_t is given by

    E[y_{t+1} | X_t] = ν_{1,t} D_{t+1} + Σ_{k=1}^p [a^{(k)}_{11,t} y_{t+1-k} + a^{(k)}_{12,t} z_{1,t+1-k} + a^{(k)}_{13,t} m_{t+1-k} + a^{(k)}_{14,t} z_{2,t+1-k}].      (A.25)

A necessary condition for this expression to be invariant with respect to M_t is that (i) a^{(k)}_{13,t} = 0 for all k and t. This holds iff Σ_{j=1}^q a^{(k)}_{13,j} p_{ij} = 0 for all k and i. As long as P is of full rank it follows that a^{(k)}_{13,j} = 0 for all j and k. Furthermore, the random coefficients on D_{t+1}, y_{t+1-k}, z_{1,t+1-k}, and z_{2,t+1-k} must also be invariant with respect to M_t. That is, ν_{1,t} = E[ν_{1,s_{t+1}} | X_{1,t}, Z_{2,t}] and a^{(k)}_{1i,t} = E[a^{(k)}_{1i,s_{t+1}} | X_{1,t}, Z_{2,t}] for i = 1, 2, 4 and all k. For instance, ν_{1,t} is invariant with respect to M_t iff (ii) condition (A1) in Proposition 1 holds; or (iii) s_{t+1} is serially uncorrelated; or (iv) ν_{1,s_{t+1}} only varies with s_{1,t+1}, which is serially uncorrelated and independent of s_{2,t+1}; or (v) ν_{1,s_{t+1}} = ν̄_1; or (vi) ν_{1,t} = 0 for all t. Note that cases (iii)-(vi) share the property that ν_{1,t} = ν̄_1. Analogous cases can be established for a^{(k)}_{1i,t}. Stationarity of v_{t+1} is now required to ensure that the forecast error variance exists. QED
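The full-rank step can be checked directly: the condition Σ_j a^{(k)}_{13,j} p_{ij} = 0 for all i is the linear system P a = 0, which has only the trivial solution when P is nonsingular (illustrated here with the Table 2 transition probability point estimates):

```python
import numpy as np

# Transition matrix point estimates from Table 2 (rows sum to one)
P = np.array([[0.752, 0.248],
              [0.084, 0.916]])
assert np.linalg.matrix_rank(P) == 2   # full rank, det = .668

# With P nonsingular, P a = 0 forces a = 0, i.e. a_{13,j} = 0 in every state j
a = np.linalg.solve(P, np.zeros(2))
```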
Proof of Proposition 3

The condition [(B1) or (B2) or (B3) or (B4) or (B5)] is necessary for m to be strongly noncausal for y. The cases (C1) to (C4) follow by evaluating the conditional forecast error variance for each (Bi) case separately and assuming strong noncausality. It then follows that (B5) and strong noncausality imply a special case of either (C2) or (C3) or (C4). QED
References

Chamberlain, G. (1982). The General Equivalence of Granger and Sims Causality. Econometrica, 50, 569-581.

Christiano, L. J. & Ljungqvist, L. (1988). Money Does Granger-Cause Output in the Bivariate Money-Output Relation. Journal of Monetary Economics, 22, 217-235.

Ferson, W. E. (1993). Theory and Empirical Testing of Asset Pricing Models. Forthcoming in Jarrow, R. A., Ziemba, W. T. & Maksimovic, V. (Eds.), The Finance Handbook, North-Holland.

Florens, J. P. & Mouchart, M. (1982). A Note on Noncausality. Econometrica, 50, 583-591.

Granger, C. W. J. (1969). Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica, 37, 424-438.

Granger, C. W. J., Engle, R. F. & Robins, R. P. (1986). Wholesale and Retail Prices: Bivariate Time-Series Modelling with Forecastable Error Variances. In Belsley, D. & Kuh, E. (Eds.), Model Reliability, MIT Press, Cambridge, MA, 1-17.

Hamilton, J. D. (1990). Analysis of Time Series Subject to Changes in Regime. Journal of Econometrics, 46, 39-70.

Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press, Princeton.

Hamilton, J. D. (1996). Specification Testing in Markov-Switching Time-Series Models. Journal of Econometrics, 70, 127-157.

Holst, U., Lindgren, G., Holst, J. & Thuvesholmen, M. (1994). Recursive Estimation in Switching Autoregressions with a Markov Regime. Journal of Time Series Analysis, 15, 489-506.

Karlsen, H. (1990). A Class of Non-Linear Time Series Models. PhD Thesis, Department of Mathematics, University of Bergen, Norway.

Kim, C.-J. (1994). Dynamic Linear Models with Markov-Switching. Journal of Econometrics, 60, 1-22.

Lindgren, G. (1978). Markov Regime Models for Mixed Distributions and Switching Regressions. Scandinavian Journal of Statistics, 6, 81-91.

Lucas, R. E., Jr. (1972). Expectations and the Neutrality of Money. Journal of Economic Theory, 4, 103-124.

Lucas, R. E., Jr. (1987). Models of Business Cycles. Basil Blackwell, Oxford.

Lütkepohl, H. (1991). Introduction to Multiple Time Series Analysis. Springer-Verlag, Berlin.

Magnus, J. & Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley, Chichester.

Sims, C. A., Stock, J. H. & Watson, M. W. (1990). Inference in Linear Time Series Models with Some Unit Roots. Econometrica, 58, 113-144.

Svensson, L. E. O. (1995). Price Level Targeting versus Inflation Targeting: A Free Lunch? Manuscript, Institute for International Economic Studies, Stockholm University.

Toda, H. Y. & Phillips, P. C. B. (1993). Vector Autoregressions and Causality. Econometrica, 61, 1367-1393.

Warne, A. (1996). Autocovariance Functions and Maximum Likelihood in a VAR Model under Markov Switching. Manuscript, Institute for International Economic Studies, Stockholm University.