Semi-Parametric Estimation of Noncausal Vector Autoregression
Christian Gourieroux∗   Joann Jasiak†
First version: May 2015; revised August 2016

Abstract

This paper introduces a consistent semi-parametric estimation method for mixed causal/noncausal multivariate non-Gaussian processes. The new estimator, called GCov for generalized covariance, relies on moment conditions written on nonlinear autocovariances of the process. The GCov does not require any distributional assumption on the errors and produces the estimates from a single optimization, whereas the MLE requires a parametric specification as well as multiple optimizations, the number of which is equal to np + 1. The paper also discusses the advantages and limitations of second-order identification in a mixed VAR and shows that, despite some limitations, it is possible to distinguish the different numbers of causal and noncausal components from the autocovariances of the process. The method is illustrated by a simulation study and applied to commodity prices.

Keywords: Multivariate Noncausal Process, Identification, Semi-Parametric Estimation, Generalized Covariance Estimator, Speculative Bubble, Alternative Investment.

JEL numbers: C14, C32, C51.
∗ University of Toronto and CREST, e-mail: [email protected].
† York University, Canada, e-mail: [email protected]. The first author gratefully acknowledges financial support of the chair ACPR: Regulation and Systemic Risks, and of the Global Risk Institute.
1 Introduction
The analysis of stationary linear time series with a two-sided moving average representation involving independent and identically distributed (i.i.d.) errors originated from the early classical time series literature [see e.g. Hannan (1973)]. A particular characteristic of a non-Gaussian stationary linear time series is that its two-sided moving average representation, which includes the present, past and future shocks, can be distinguished from a one-sided moving average representation involving the current and lagged shocks only. In Gaussian processes, those two representations cannot be distinguished and the future shocks that determine the impulse response functions cannot be identified. Nevertheless, Gaussian processes have received particular attention in the time series literature, especially since the mid-seventies, in the framework of the Box-Jenkins method of analysis. That methodology offers a straightforward approach to the identification and estimation of time series from moments of order up to two only, making it possible to accommodate the past dependence in Gaussian processes, but leaving the noncausal and nonlinear effects out of its scope. Accordingly, the class of VARMA (ARMA) models, which has remained the benchmark specification for linear processes over the last 30 years, can capture only the linear causal effects of the past on the present and disregards any noncausal or nonlinear causal dynamics. Even if the data are non-Gaussian, the noncausal components are left out, as the Box-Jenkins method relies on Gaussian pseudo-maximum likelihood estimation and suffers from the aforementioned lack of identification. Consequently, the causal and noncausal dynamics cannot be distinguished, leading the researcher to an erroneous conclusion about the absence of noncausal dynamics. Moreover, given that the Gaussian pseudo-maximum likelihood estimators may be inconsistent when applied to non-Gaussian data, the inference based on those estimators, including the impulse response analysis, may be unreliable [see Gourieroux, Monfort (2015)].

Recently, due to the widespread use of non-Gaussian processes in Economics and Finance, the interest in noncausal dynamics in stationary linear time series has grown considerably. Empirical evidence from applications involving multivariate economic and financial time series with non-Gaussian distributions suggests that estimation of causal VAR models by Gaussian maximum likelihood (ML), in line with the Box-Jenkins procedure, leads to misspecifications. For example, an estimated autoregressive matrix of a non-Gaussian VAR(1) may have eigenvalues of modulus significantly larger than 1
[see e.g. Lanne, Saikkonen (2013) for such estimations, and Gourieroux, Jasiak (2014) for the analysis of causal misspecification]. Another empirically established fact is the presence of nonlinearities in financial data. In the time series literature, there exist theoretical results that point to the equivalence of noncausal linear dynamics with causal nonlinear patterns [see Rosenblatt (2000)]. Accordingly, a linear noncausal model can accommodate complex nonlinear dynamics in calendar time, such as conditional heteroscedasticity. In particular, Gourieroux, Zakoian (2016) have shown that a noncausal linear autoregression can be used to model speculative bubbles in processes with fat tails. There are also valid economic arguments that suggest the presence of noncausal components in structural economic VARMA models. For example, noncausal components can result from a leading effect of a fiscal policy [Leeper, Walker, Yang (2013)], or from the rational expectations hypothesis in multivariate dynamic models [see e.g. Gourieroux, Jasiak, Monfort (2016) for general discussions].

The estimation method that has been used in the recent literature on mixed causal/noncausal processes is the approximate maximum likelihood (ML), which assumes a parametric specification of the error distribution.¹ The ML method is consistent, provided that the error distribution is properly specified, while being generally inconsistent otherwise. The objective of this paper is to improve the methodology in this respect and to introduce semi-parametric estimation methods for mixed causal/noncausal multivariate processes. The paper is focused on the mixed VAR model, which is the benchmark model in the literature on multivariate linear processes.

We show that second-order identification, that is, the identification of the process from moments up to order two, which include the auto- and cross-covariances, is feasible to some extent in the VAR model, contrary to the widespread belief in the lack of second-order identification of non-Gaussian linear processes. We claim that, in general, it is possible to distinguish the mixed processes with different values of $n_1, n_2$, where $n_1$ and $n_2$ denote the causal and the noncausal dimensions, respectively.

For direct estimation of the matrix of autoregressive coefficients in a semi-parametric framework, we propose the GCov (Generalized Covariance)
¹ See Breidt et al. (1991) and Rosenblatt (2000), Chapter 8, for the asymptotic properties of the ML estimators in the univariate case, and Davis, Song (2012) and Lanne, Saikkonen (2013) for the multivariate case.
estimator that relies on moment conditions written on nonlinear autocovariances. Although in the parametric framework this estimator is not fully efficient, it provides consistent estimates in a single optimization, while the ML requires multiple optimizations. More precisely, when the error distribution is parametric, the lack of full efficiency of the generalized covariance estimator is compensated by its numerical simplicity. For example, for a VAR(p) process, the ML estimator requires np + 1 optimizations, which is equal to the total number of all possible combinations of causal and noncausal dimensions of the process.² In contrast, the GCov estimator is obtained from a single optimization only.

The paper is organized as follows. In Section 2, we derive and discuss the causal and noncausal components of a mixed VAR(1) process, along with their second-order properties. Section 3 covers the second-order identification of the mixed VAR(1) process. Section 4 presents the semi-parametric estimation method based on covariance moment conditions. The methodology is extended in Section 5 to the general VAR(p) process. An application to a set of simulated data is presented in Section 6, which shows how to implement the GCov estimator in practice. Section 6 also examines the finite sample properties of the GCov estimator and provides an application to commodity prices. Section 7 concludes. Proofs are gathered in the Appendices. Additional figures are provided in the Complementary Material.
2 The Mixed Vector Autoregressive Process of Order 1
Let us consider a mixed VAR(1) process $(Y_t)$ of dimension $n$, which is strictly stationary and satisfies the recursive system:
\[
Y_t = \Phi Y_{t-1} + \varepsilon_t, \qquad (2.1)
\]
where $(\varepsilon_t)$ is a sequence of i.i.d. random vectors (i.e. a strong white noise) of dimension $n$, and $\Phi$ is an $(n \times n)$ matrix. We assume that:

Assumption A.1: $\varepsilon_t$ is square integrable, with zero mean $E(\varepsilon_t) = 0$ and variance-covariance matrix $V(\varepsilon_t) = \Sigma$.
² For example, in a bivariate VAR(3) model, the MLE estimator requires 7 separate optimizations.
Assumption A.2: The eigenvalues of matrix $\Phi$ are of modulus different from 1.

Assumption A.2 ensures the existence and uniqueness of a strictly stationary solution to the recursive equation (2.1), shown in Section 2.1 below.³ Moreover, this solution admits a two-sided moving average representation. As $\varepsilon_t$ is not assumed to be independent of the lagged values of the process $Y_{t-1}, Y_{t-2}, \ldots$, process (2.1) is not necessarily "causal".
2.1 Causal and noncausal components of Y
To prove the existence of a stationary solution of (2.1) and to derive the two-sided moving average representation of process $Y$, we first find its causal and noncausal components. We follow the approach introduced in Davis, Song (2012) and Gourieroux, Jasiak (2015).

Proposition 1 (Representation Theorem): Let us denote by $n_1$ (resp. by $n_2 = n - n_1$) the number of eigenvalues of $\Phi$ of modulus strictly less than 1 (resp. strictly larger than 1). There exist an invertible $(n \times n)$ matrix $A$, and two square matrices $J_1$ of dimension $(n_1 \times n_1)$ and $J_2$ of dimension $(n_2 \times n_2)$, such that all eigenvalues of $J_1$ (resp. $J_2$) are of modulus strictly less than 1 (resp. larger than 1), and such that:
\[
Y_t = A_1 Y^*_{1,t} + A_2 Y^*_{2,t}, \qquad (2.2)
\]
\[
Y^*_{1,t} = J_1 Y^*_{1,t-1} + \varepsilon^*_{1,t}, \qquad Y^*_{2,t} = J_2 Y^*_{2,t-1} + \varepsilon^*_{2,t}, \qquad (2.3)
\]
\[
\varepsilon^*_{1,t} = A^1 \varepsilon_t, \qquad \varepsilon^*_{2,t} = A^2 \varepsilon_t, \qquad (2.4)
\]
where $A_1, A_2$ (resp. $A^1, A^2$) are the blocks in the decomposition of matrix $A$ as $A = (A_1, A_2)$ [resp. in the decomposition of $A^{-1}$ as $A^{-1} = \begin{pmatrix} A^1 \\ A^2 \end{pmatrix}$].

Proof: It is always possible to decompose matrix $\Phi$ as:
\[
\Phi = A \begin{pmatrix} J_1 & 0 \\ 0 & J_2 \end{pmatrix} A^{-1},
\]
³ See also Deistler (1975), Assumption IV and Lemma 2.
by considering its real Jordan canonical form and gathering in $J_1$ (resp. $J_2$) all sub-blocks associated with the eigenvalues of modulus less than 1 (resp. larger than 1) [see Gourieroux, Jasiak (2015)]. Let us premultiply both sides of equation (2.1) by matrix $A^{-1}$ and write:
\[
Y^*_t = \begin{pmatrix} Y^*_{1,t} \\ Y^*_{2,t} \end{pmatrix} \equiv A^{-1} Y_t, \qquad \varepsilon^*_t = \begin{pmatrix} \varepsilon^*_{1,t} \\ \varepsilon^*_{2,t} \end{pmatrix} \equiv A^{-1} \varepsilon_t.
\]
We get:
\[
Y^*_t = \begin{pmatrix} J_1 & 0 \\ 0 & J_2 \end{pmatrix} Y^*_{t-1} + \varepsilon^*_t,
\]
or:
\[
Y^*_{j,t} = J_j Y^*_{j,t-1} + \varepsilon^*_{j,t}, \qquad j = 1, 2,
\]
as the equivalent of recursive equations (2.3). Moreover, equation $Y_t = A Y^*_t$ is equivalent to:
\[
Y_t = A_1 Y^*_{1,t} + A_2 Y^*_{2,t},
\]
that is, the decomposition given in equation (2.2). This proves Proposition 1. QED

Let us examine the recursive equations (2.3). Since all eigenvalues of $J_1$ are of modulus strictly less than 1, the recursive equation:
\[
Y^*_{1,t} = J_1 Y^*_{1,t-1} + \varepsilon^*_{1,t},
\]
is causal. Next, by recursive backward substitutions, we obtain the causal one-sided moving average representation of $Y^*_{1,t}$:
\[
Y^*_{1,t} = \sum_{h=0}^{\infty} J_1^h \varepsilon^*_{1,t-h} \equiv (Id - J_1 L)^{-1} \varepsilon^*_{1,t}, \qquad (2.5)
\]
where:
\[
(Id - J_1 L)^{-1} = \sum_{h=0}^{\infty} J_1^h L^h, \qquad (2.6)
\]
and $L$ denotes the lag operator.
The second recursive equation, $Y^*_{2,t} = J_2 Y^*_{2,t-1} + \varepsilon^*_{2,t}$, requires a different approach, as the eigenvalues of $J_2$ are of modulus strictly larger than 1. This recursive equation can be rewritten as:
\[
Y^*_{2,t} = J_2^{-1} Y^*_{2,t+1} - J_2^{-1} \varepsilon^*_{2,t+1}, \qquad (2.7)
\]
or, by recursive substitution:
\[
Y^*_{2,t} = -\sum_{h=1}^{\infty} J_2^{-h} \varepsilon^*_{2,t+h} \equiv (Id - J_2 L)^{-1} \varepsilon^*_{2,t}, \qquad (2.8)
\]
where:
\[
(Id - J_2 L)^{-1} = -\sum_{h=1}^{\infty} J_2^{-h} L^{-h}. \qquad (2.9)
\]
By comparing formulas (2.6) and (2.9), we see that the expression of the inverse of $(Id - J_j L)$, $j = 1, 2$, depends on the modulus of the eigenvalues being greater or smaller than 1 [see also Deistler (1975), equation 6, for an equivalent result in the z-transform]. This leads us to the following Corollaries:

Corollary 1: There exists a two-sided moving average solution of (2.1), that is,
\[
Y_t = [A_1 (Id - J_1 L)^{-1} A^1 + A_2 (Id - J_2 L)^{-1} A^2] \varepsilon_t
= \Big[ A_1 \sum_{h=0}^{\infty} J_1^h L^h A^1 - A_2 \sum_{h=1}^{\infty} J_2^{-h} L^{-h} A^2 \Big] \varepsilon_t.
\]

Corollary 2: i) Processes $(Y^*_{1,t})$ and $(Y^*_{2,t})$ are purely causal and noncausal, respectively. They can be interpreted as the causal and noncausal components of process $(Y_t)$; ii) These components are deterministic functions of $(Y_t)$, because $Y^*_{j,t} = A^j Y_t$, $j = 1, 2$.

Corollary 2 clarifies the role of the representation theorem, which can be compared to the representation theorem in cointegration theory. The aim of cointegration analysis is to find a combination of $n_1$ components of the process that is stationary, so that the combination of the $n_2 = n - n_1$ remaining components is nonstationary and represents the common trend. In our framework, we derive $n_1$ purely causal combinations, while the combination of the remaining $n_2 = n - n_1$ purely noncausal components determines the common speculative bubble.⁴ In practice, the number $n_2$ is expected to be small.

Corollary 3: We can always write $(Id - \Phi L)^{-1} = \Xi_1(L) \Xi_2(L^{-1})$, where the roots of $\det \Xi_1(z) = 0$ and of $\det \Xi_2(z) = 0$ are of modulus greater than 1.

Proof: We have:
\[
(Id - \Phi L)^{-1} = (A_1, A_2) \begin{pmatrix} (Id - J_1 L)^{-1} & 0 \\ 0 & (Id - J_2 L)^{-1} \end{pmatrix} \begin{pmatrix} A^1 \\ A^2 \end{pmatrix}
= (A_1, A_2) \begin{pmatrix} (Id - J_1 L)^{-1} & 0 \\ 0 & Id \end{pmatrix} \begin{pmatrix} Id & 0 \\ 0 & (Id - J_2 L)^{-1} \end{pmatrix} \begin{pmatrix} A^1 \\ A^2 \end{pmatrix}.
\]
Therefore, $(Id - \Phi L)^{-1} = \Xi_1(L) \Xi_2(L^{-1})$, where
\[
\Xi_1(L) = A \begin{pmatrix} (Id - J_1 L)^{-1} & 0 \\ 0 & Id \end{pmatrix}
\quad \text{and} \quad
\Xi_2(L^{-1}) = \begin{pmatrix} Id & 0 \\ 0 & (Id - J_2 L)^{-1} \end{pmatrix} A^{-1}.
\]
QED
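As an illustration of Proposition 1, the block decomposition can be computed numerically by splitting the eigenvalues of $\Phi$ according to their modulus. The Python sketch below (numpy) is ours, not the paper's procedure: it uses a plain eigendecomposition instead of the real Jordan form, which is enough when $\Phi$ is diagonalizable, and the function name is hypothetical. The example matrix is the one used in the simulation study of Section 6.

```python
import numpy as np

def split_causal_noncausal(Phi, tol=1e-10):
    """Split a VAR(1) matrix Phi as Phi = A diag(J1, J2) A^{-1} (Proposition 1),
    with the eigenvalues of J1 (resp. J2) of modulus < 1 (resp. > 1).
    Assumes Phi is diagonalizable and satisfies Assumption A.2."""
    eigval, eigvec = np.linalg.eig(Phi)
    if np.any(np.abs(np.abs(eigval) - 1.0) < tol):
        raise ValueError("Assumption A.2 violated: eigenvalue of modulus one.")
    order = np.argsort(np.abs(eigval))          # causal eigenvalues first
    eigval, A = eigval[order], eigvec[:, order]
    n1 = int(np.sum(np.abs(eigval) < 1.0))      # causal dimension n1
    J = np.diag(eigval)
    return A, J[:n1, :n1], J[n1:, n1:], n1

# Example with the matrix used in the simulation study of Section 6
Phi = np.array([[0.7, -1.3],
                [0.0,  2.0]])
A, J1, J2, n1 = split_causal_noncausal(Phi)
A_inv = np.linalg.inv(A)   # rows A^1, A^2 give Y1* = A^1 Y_t, Y2* = A^2 Y_t
```

Since the factors are only defined up to the block-diagonal transformations discussed in Section 3.1, this numerical $A$ need not coincide with the real Jordan form used in the text; only the causal and noncausal spaces are comparable.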
2.2 The autocovariance function
Let us denote by $\Gamma(h) = Cov(Y_t, Y_{t-h})$ the matrix autocovariance of $Y$ of order $h$, and by
\[
\Gamma^*(h) = \begin{pmatrix} \Gamma^*_{1,1}(h) & \Gamma^*_{1,2}(h) \\ \Gamma^*_{2,1}(h) & \Gamma^*_{2,2}(h) \end{pmatrix} = Cov(Y^*_t, Y^*_{t-h})
\]
the autocovariance of $Y^*$ and its block decomposition. The expressions of the autocovariances $\Gamma^*(h)$ are derived in Appendix 1. $\Gamma(h)$ is given in the following Proposition:

Proposition 2:
\[
\Gamma(h) = A_1 \Gamma^*_{1,1}(h) A_1' + A_1 \Gamma^*_{1,2}(h) A_2' + A_2 \Gamma^*_{2,1}(h) A_1' + A_2 \Gamma^*_{2,2}(h) A_2',
\]
where:
\[
\Gamma^*_{1,1}(h) = J_1^{|h|} \Gamma^*_{1,1}(0), \; \text{for } h \geq 0, \qquad = \Gamma^*_{1,1}(0) (J_1')^{|h|}, \; \text{for } h \leq 0,
\]
\[
\Gamma^*_{2,2}(h) = \Gamma^*_{2,2}(0) (J_2')^{-|h|}, \; \text{for } h \geq 0, \qquad = J_2^{-|h|} \Gamma^*_{2,2}(0), \; \text{for } h \leq 0,
\]
\[
\Gamma^*_{1,2}(h) = -[\Sigma^*_{1,2} (J_2')^{-h} + J_1 \Sigma^*_{1,2} (J_2')^{-h+1} + \ldots + J_1^{h-1} \Sigma^*_{1,2} (J_2')^{-1}], \; \text{for } h > 0, \qquad = 0, \; \text{for } h \leq 0,
\]
\[
\Gamma^*_{2,1}(h) = [\Gamma^*_{1,2}(-h)]',
\]
where $\Sigma^*_{1,2} = Cov(\varepsilon^*_{1,t}, \varepsilon^*_{2,t})$.

⁴ See Gourieroux, Zakoian (2016), Gourieroux, Jasiak, Monfort (2016).
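For a numerical check of Proposition 2, the blocks of $\Gamma^*(h)$ can be computed from the model parameters: $\Gamma^*_{1,1}(0)$ and $\Gamma^*_{2,2}(0)$ solve discrete Lyapunov equations implied by the moving average representations (2.5) and (2.8). The sketch below (Python, numpy and scipy) is our illustration, assuming real-valued blocks $A, J_1, J_2$ such as those returned by the previous sketch, with $\Sigma = V(\varepsilon_t)$.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def theoretical_autocov(A, J1, J2, Sigma, h):
    """Gamma(h) of the mixed VAR(1), following Proposition 2."""
    n1 = J1.shape[0]
    A_inv = np.linalg.inv(A)
    S = A_inv @ Sigma @ A_inv.T                      # variance of eps* = A^{-1} eps
    S11, S12, S22 = S[:n1, :n1], S[:n1, n1:], S[n1:, n1:]
    J2i = np.linalg.inv(J2)
    G11_0 = solve_discrete_lyapunov(J1, S11)         # X = J1 X J1' + S11
    G22_0 = solve_discrete_lyapunov(J2i, J2i @ S22 @ J2i.T)
    ah = abs(h)
    mp = np.linalg.matrix_power
    G11 = mp(J1, ah) @ G11_0 if h >= 0 else G11_0 @ mp(J1.T, ah)
    G22 = G22_0 @ mp(J2i.T, ah) if h >= 0 else mp(J2i, ah) @ G22_0

    def G12(k):                                      # Gamma*_{12}(k), zero for k <= 0
        if k <= 0:
            return np.zeros_like(S12)
        return -sum(mp(J1, i) @ S12 @ mp(J2i.T, k - i) for i in range(k))

    A1, A2 = A[:, :n1], A[:, n1:]
    return (A1 @ G11 @ A1.T + A1 @ G12(h) @ A2.T
            + A2 @ G12(-h).T @ A1.T + A2 @ G22 @ A2.T)
```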
3 Second-Order Identification
Second-order identification relies on the moments up to order two, including the autocovariances of a process. In the analysis of mixed multivariate processes, two identification issues arise; they are discussed below in the context of a mixed VAR(1) process.

First, the factor decomposition of $Y_t$ in formula (2.2), $Y_t = A_1 Y^*_{1,t} + A_2 Y^*_{2,t}$, involves the causal and noncausal "factors". As is common in factor analysis, they are defined up to invertible linear transformations. This is the static identification problem. Second, it can be difficult to distinguish the causal and noncausal components of the process and to find their respective numbers from an analysis based only on the first and second-order moments of the observable process. This is the dynamic identification problem.
3.1 Factor Standardization
The decomposition of matrix $\Phi$ into $\Phi = A \begin{pmatrix} J_1 & 0 \\ 0 & J_2 \end{pmatrix} A^{-1}$ is not unique, and neither is the associated decomposition of $Y$ into $Y_t = A_1 Y^*_{1,t} + A_2 Y^*_{2,t}$. Let us consider a block-diagonal invertible matrix $\begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix}$, where $Q_j$ is a square $(n_j \times n_j)$ matrix, $j = 1, 2$. We can write:
\[
\Phi = A \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix} \begin{pmatrix} Q_1^{-1} & 0 \\ 0 & Q_2^{-1} \end{pmatrix} \begin{pmatrix} J_1 & 0 \\ 0 & J_2 \end{pmatrix} \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix} \begin{pmatrix} Q_1^{-1} & 0 \\ 0 & Q_2^{-1} \end{pmatrix} A^{-1}
= \tilde{A} \begin{pmatrix} \tilde{J}_1 & 0 \\ 0 & \tilde{J}_2 \end{pmatrix} \tilde{A}^{-1},
\]
where $\tilde{A}_j = A_j Q_j$, $\tilde{J}_j = Q_j^{-1} J_j Q_j$, $\tilde{A}^j = Q_j^{-1} A^j$, $j = 1, 2$.

Hence, we also have:
\[
Y_t = A_1 Y^*_{1,t} + A_2 Y^*_{2,t} = \tilde{A}_1 \tilde{Y}_{1,t} + \tilde{A}_2 \tilde{Y}_{2,t},
\]
where $\tilde{Y}_{j,t} = \tilde{A}^j Y_t = Q_j^{-1} Y^*_{j,t}$, $j = 1, 2$. The causal and noncausal factors are therefore defined up to an invertible transformation. However, the causal and noncausal vector spaces generated by these factors, or equivalently by the columns of $A_1$ and $A_2$, are more relevant than the factors themselves.

The existing literature on mixed processes agrees on the decomposition such that $\begin{pmatrix} J_1 & 0 \\ 0 & J_2 \end{pmatrix}$ is the Jordan form of matrix $\Phi$ [i.e. the complex Jordan form in Davis, Song (2012), and the real Jordan form in Gourieroux, Jasiak (2015)]. We follow this approach in Section 4.
3.2 The difficulty in identifying the causal dimension
It is known in the literature [see e.g. Chan, Ho, Tong (2006)] that the moving average coefficients of a two-sided moving average process are identifiable (up to the effect of $V(\varepsilon_t) = \Sigma$), if the error distribution has no Gaussian features. More precisely, at most one component of the noise can be Gaussian. Conversely, if the $\varepsilon_t$'s are Gaussian, the dynamics of a two-sided moving average process cannot be distinguished from the dynamics of a purely causal one-sided moving average. This suggests that there is an issue with the dynamic identification based on only the first and second-order moments of the observable process.⁵

⁵ This nonlinear identification result is related to the second-order identification in Deistler (1975), Theorem 2. Indeed, at second order, the autoregressive polynomial is identifiable up to a product with a unimodular matrix, whose elements are polynomial functions of the lag operator. The assumption of i.i.d. non-Gaussian errors (which is stronger than "uncorrelated errors") eliminates the problem of the unimodular matrix.
Surprisingly, in the VAR(1) framework, that is not necessarily the case.⁶ Indeed, the identification issue usually concerns an unconstrained framework, where
\[
Y_t = \sum_{j=-\infty}^{+\infty} B_j \varepsilon_{t-j} = \sum_{j=0}^{\infty} B_j \varepsilon_{t-j} + \sum_{j=1}^{\infty} B_{-j} \varepsilon_{t+j}.
\]
This corresponds to a decomposition into a causal component $\tilde{Y}_{1,t} = \sum_{j=0}^{\infty} B_j \varepsilon_{t-j}$ and a noncausal component $\tilde{Y}_{2,t} = \sum_{j=1}^{\infty} B_{-j} \varepsilon_{t+j}$, both of dimension $n$. The decomposition derived in Proposition 1 shows that the assumption of a mixed VAR(1) process in (2.1) imposes additional restrictions on the dimensions of the causal and noncausal spaces, $n_1$ and $n_2$, which sum up to $n$, and on the patterns of the moving average coefficients (see Corollary 1). Those restrictions facilitate the identification of the causal dimension.⁷

Let us discuss this point for a bivariate process, $n = 2$.

i) First, if $(Y_t)$ is a purely causal VAR(1) process, we get:
\[
\Gamma(h) = \Phi^{|h|} \Gamma(0), \; \text{if } h \geq 0, \qquad = \Gamma(0) (\Phi')^{|h|}, \; \text{if } h \leq 0. \qquad (3.1)
\]
Similarly, if $(Y_t)$ is a purely noncausal VAR(1) process, we get:
\[
\Gamma(h) = \Gamma(0) (\Phi')^{-|h|}, \; \text{if } h \geq 0, \qquad = \Phi^{-|h|} \Gamma(0), \; \text{if } h \leq 0. \qquad (3.2)
\]
By comparing the series of autocovariances, we easily derive the following result:

Proposition 3: At second order, any purely causal VAR(1) process with autoregressive matrix $\Phi$ also admits a purely noncausal VAR(1) representation, with matrix $\tilde{\Phi} = \Gamma(0)^{-1} (\Phi')^{-1} \Gamma(0)$.

⁶ Nor is it the case in the VAR(p) framework (see Section 5).
⁷ This resembles a result in Independent Component Analysis (ICA). The sources can be identified at second order if they are serially correlated with distinct spectra (while being cross-sectionally uncorrelated). This identification result leads to second-order estimation methods of the sources, such as AMUSE [Tong et al. (1990)] or SOBI [Belouchrani et al. (1997)].
Thus, it is not possible to distinguish a purely causal representation from a purely noncausal representation given the knowledge of the autocovariances only. Note that Proposition 3 is valid for any value of $n \geq 2$.

ii) Let us now consider a mixed causal/noncausal process with $n_1 = n_2 = 1$. The autocovariances of the causal and noncausal components are (see Appendix 1 for the cross-covariances):
\[
\gamma^*_{1,1}(h) = \frac{\sigma^*_{1,1}}{1 - J_1^2} J_1^{|h|}, \qquad \gamma^*_{2,2}(h) = \frac{\sigma^*_{2,2}}{J_2^2 - 1} J_2^{-|h|}, \qquad (3.3)
\]
\[
\gamma^*_{1,2}(h) = \frac{\sigma^*_{1,2}}{1 - J_1 J_2} (J_1^h - J_2^{-h}), \; \text{if } h > 0, \qquad = 0, \; \text{if } h \leq 0. \qquad (3.4)
\]
This leads us to the following Proposition (see Appendix 2):

Proposition 4: At second order, the bivariate mixed causal/noncausal process admits a purely causal representation if and only if $\gamma^*_{1,2}(0) = 0 \Leftrightarrow \sigma^*_{1,2} = 0$.

Proposition 4 can be interpreted as follows. If $\sigma^*_{1,2} = 0$, then the "causal" and "noncausal" processes $Y^*_1, Y^*_2$ and the associated innovations $\varepsilon^*_1, \varepsilon^*_2$ are uncorrelated. Then, there exist four representations of the bivariate process, which cannot be distinguished and have the following characteristics: a purely causal representation with the eigenvalues of $\Phi$ equal to $J_1$ and $1/J_2$; a purely noncausal representation with eigenvalues $1/J_1, J_2$; and two mixed causal/noncausal representations with the pair of eigenvalues being either $(J_1, J_2)$ or $(1/J_1, 1/J_2)$. If $\sigma^*_{1,2} \neq 0$, one can distinguish on the basis of second-order properties only the (purely causal, purely noncausal) representations from the mixed ones, but one cannot distinguish the purely causal from the purely noncausal.
3.3 Second-order identification of causal/noncausal dimensions and directions
Let us first show that it is not possible to disentangle a model of causal and noncausal dimensions (n1 , n2 = n − n1 ), respectively, from a model of dimensions (n2 = n − n1 , n1 ), given the knowledge of the autocovariance functions Γ(h), ∀h only.
Proposition 5: At second order, a mixed model with characteristics $[(n_1, A_1, J_1), (n_2, A_2, J_2)]$ cannot be distinguished from a model with characteristics $[(n_2, A_2, J_2^{-1}), (n_1, A_1, J_1^{-1})]$.

Proof: Let us consider the two-sided moving average representation of $(Y_t)$ given in Corollary 1. We get:
\[
Y_t = \{A_1 (Id - J_1 L)^{-1} A^1 + A_2 (Id - J_2 L)^{-1} A^2\} \varepsilon_t
= A_1 (Id - J_1 L)^{-1} \varepsilon^*_{1,t} + A_2 (Id - J_2 L)^{-1} \varepsilon^*_{2,t}.
\]
This representation can be equivalently written as:
\[
Y_t = -A_1 (Id - J_1^{-1} L^{-1})^{-1} J_1^{-1} L^{-1} \varepsilon^*_{1,t} - A_2 (Id - J_2^{-1} L^{-1})^{-1} J_2^{-1} L^{-1} \varepsilon^*_{2,t}.
\]
Let us now define the noise process $\varepsilon^*$ in reverse time as $\tilde{\varepsilon}_t = \varepsilon^*_{-t}$. We get:
\[
Y_t = -A_1 (Id - J_1^{-1} L)^{-1} J_1^{-1} L \tilde{\varepsilon}_{1,t} - A_2 (Id - J_2^{-1} L)^{-1} J_2^{-1} L \tilde{\varepsilon}_{2,t}
= A_1 (Id - J_1^{-1} L)^{-1} \tilde{\varepsilon}^*_{1,t} + A_2 (Id - J_2^{-1} L)^{-1} \tilde{\varepsilon}^*_{2,t},
\]
where $\tilde{\varepsilon}^*_{j,t} = -J_j^{-1} \tilde{\varepsilon}_{j,t-1}$, $j = 1, 2$. This is another mixed causal/noncausal representation of process $(Y_t)$, in which the causal component has characteristics $(n_2, A_2, J_2^{-1})$ and the noncausal component has characteristics $(n_1, A_1, J_1^{-1})$. QED

Thus, an observationally equivalent model is obtained by replacing the eigenvalues by their reciprocals, as long as this is done simultaneously for all causal (resp. noncausal) components. Proposition 6 below extends Proposition 3 for pure processes to mixed causal/noncausal processes.
Proposition 6: i) The product of the causal and noncausal dimensions, equal to $n_1 (n - n_1)$, is identifiable at second order, except on a set of $\Sigma$ of measure zero. ii) For a given pair $[n_{10}, n - n_{10}]$, the pair of causal/noncausal spaces can be second-order identifiable.

Proof: i) From Proposition 2, it follows that the dimension of the vector space generated by the matrices $\Gamma(h)$, $h < 0$, is equal to $n_1^2 + (n - n_1)^2 = n^2 - 2 n_1 (n - n_1)$. This quantity is identifiable, and so is the product $n_1 (n - n_1)$. This proves the result. ii) The second-order identifiability of the couple of spaces is proven in Appendix 3 for $n = 2$. QED

Because
\[
\Gamma^*_{1,2}(h) = Cov(Y^*_{1,t}, Y^*_{2,t-h}) = Cov(A^1 Y_t, A^2 Y_{t-h}) = A^1 \Gamma(h) (A^2)' = 0, \quad \text{for } h \leq 0,
\]
a basis of causal and noncausal directions $A^1_0, A^2_0$ can be found as:
\[
(A^1_0, A^2_0) = \text{Argmin}_{A^1, A^2} \sum_{h=-H}^{0} \| A^1 \Gamma(h) (A^2)' \|^2
\]
\[
\text{s.t. } A^1 \Gamma(0) (A^1)' = Id_{n_1}, \quad A^2 \Gamma(0) (A^2)' = Id_{n - n_1}.
\]
Thus, in a mixed VAR(1) process, second-order identification is feasible to a limited extent, as we can identify the pair of models $[(n_{10}, A^1_0, J_{1,0}), (n - n_{10}, A^2_0, J_{2,0})]$ and $[(n - n_{10}, A^2_0, J_{2,0}^{-1}), (n_{10}, A^1_0, J_{1,0}^{-1})]$. However, we are unable to determine which one of them is true.
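The minimization above can be applied to sample autocovariances. The Python sketch below is our simplification, not the paper's algorithm: it replaces the exact normalization constraints by a quadratic penalty and assumes the causal dimension $n_1$ is known; it is meant only to show that the causal and noncausal directions are reachable from second-order information.

```python
import numpy as np
from scipy.optimize import minimize

def sample_autocov(Y, h):
    """Sample Cov(Y_t, Y_{t-h}) for a (T x n) array Y; h may be negative."""
    Yc = Y - Y.mean(axis=0)
    T = len(Yc)
    return Yc[h:].T @ Yc[:T - h] / T if h >= 0 else Yc[:T + h].T @ Yc[-h:] / T

def causal_noncausal_directions(Y, n1, H=10, penalty=100.0):
    n = Y.shape[1]
    n2 = n - n1
    G = {h: sample_autocov(Y, h) for h in range(-H, 1)}

    def objective(theta):
        A1 = theta[:n1 * n].reshape(n1, n)       # candidate causal directions A^1
        A2 = theta[n1 * n:].reshape(n2, n)       # candidate noncausal directions A^2
        fit = sum(np.sum((A1 @ G[h] @ A2.T) ** 2) for h in range(-H, 1))
        pen = (np.sum((A1 @ G[0] @ A1.T - np.eye(n1)) ** 2)
               + np.sum((A2 @ G[0] @ A2.T - np.eye(n2)) ** 2))
        return fit + penalty * pen               # penalty stands in for the constraints

    theta0 = np.random.default_rng(0).standard_normal(n * n)
    res = minimize(objective, theta0, method="BFGS")
    return res.x[:n1 * n].reshape(n1, n), res.x[n1 * n:].reshape(n2, n)
```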
4 Nonlinear Covariance Estimators
This section introduces a direct consistent estimator of the matrix $\Phi$ of autoregressive coefficients in the VAR(1) model, based on nonlinear sample autocovariances. Given the estimate $\hat{\Phi}$ of $\Phi$, the real Jordan canonical form with elements $\hat{A}, \hat{J}_1, \hat{J}_2$ can also be found, as well as the causal and noncausal components of the process.
4.1 Estimation of Φ
By considering the restrictions $Cov(A^1 Y_t, A^2 Y_{t-h}) = 0$ for $h \leq 0$, based on moments up to order two only, we are unable to disentangle the mixed models $(n_1, n - n_1)$ and $(n - n_1, n_1)$. However, it follows from the nonlinear identification result in Chan, Ho, Tong (2006) that, as long as the error terms $\varepsilon_t$ are serially independent, there exist other nonlinear covariance-based conditions that can be used. These are based on the covariances between nonlinear transforms of the error terms:
\[
c_{j,k}(h, \Phi) = Cov[a_j(Y_t - \Phi Y_{t-1}), a_k(Y_{t-h} - \Phi Y_{t-h-1})], \quad j, k = 1, \ldots, K, \; h = 1, \ldots, H,
\]
defined for a given set of functions $a_k$, $k = 1, \ldots, K$. Appendix 4 defines more precisely a two-step generalized covariance (GCov) estimator of matrix $\Phi$ and derives its asymptotic properties. The asymptotic accuracy of the GCov estimator depends on the choice of the functions $a_k$, $k = 1, \ldots, K$, and on the selected maximal lag $H$. From Proposition 6, it follows that the consistency of the GCov is achieved if the moment conditions include second-order moments up to lag $p$ plus at least one well-chosen nonlinear autocovariance.

Let us discuss the practical choice of the transformations $a_j, a_k$. One can choose a quadratic $a_j$ and a linear $a_k$ to capture the absence of a leverage effect at lag $h$, $h \leq 0$; or a quadratic $a_j$ and a quadratic $a_k$ to capture the absence of volatility persistence at lag $h$, $h \leq 0$; or cubic, quartic, or even higher-order $a_j$ and $a_k$, as proposed for instance to estimate "autoregressive" schemes for mixed univariate processes in Gassiat (1990) and Rosenblatt (2000), Section 8.7.

Theoretically, there exist optimal weights for the various covariances, along the lines of the Generalized Method of Moments. However, from a practical point of view, unequal optimal weights can be numerically cumbersome to implement, due to the rather large dimension of the weighting matrix. There exists a simple way to circumvent that problem by replacing the autocovariances in the objective function by autocorrelations. More precisely, let us introduce a set of functions $a_k$, $k = 1, \ldots, K$, and the sample autocorrelations $\hat{\rho}_{j,k}(h, \Phi) = \widehat{Corr}[a_j(Y_t - \Phi Y_{t-1}), a_k(Y_{t-h} - \Phi Y_{t-h-1})]$. Then, we can consider a weighted covariance estimator, defined as the minimizer of the following objective function:
\[
\hat{\Phi} = \text{argmin}_{\Phi} \sum_{j=1}^{K} \sum_{k=1}^{K} \sum_{h=1}^{H} \hat{\rho}^2_{j,k}(h, \Phi), \qquad (4.1)
\]
where $H$ is the highest selected lag and the theoretical autocorrelations $\rho_{j,k}$ are replaced by their sample counterparts $\hat{\rho}_{j,k}$. This estimator and its accuracy depend on the choice of the functions $a_k$, $k = 1, \ldots, K$, and on the maximum lag $H$.
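A minimal implementation of the objective (4.1) for a VAR(1) is sketched below in Python (numpy and scipy). The transformations are the residual levels and squares, as in the illustration of Section 6.2; the function names, the optimizer and the zero starting value are our choices, not the authors'.

```python
import numpy as np
from scipy.optimize import minimize

def gcov_objective(phi_vec, Y, H=10):
    """Portmanteau objective (4.1) with a_k equal to the levels and squares."""
    n = Y.shape[1]
    Phi = phi_vec.reshape(n, n)
    eps = Y[1:] - Y[:-1] @ Phi.T                  # residuals eps_t(Phi)
    Z = np.column_stack([eps, eps ** 2])          # nonlinear transforms a_k
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)      # standardize so products approximate correlations
    T = len(Z)
    total = 0.0
    for h in range(1, H + 1):
        R = Z[h:].T @ Z[:-h] / T                  # all cross-correlations at lag h
        total += np.sum(R ** 2)
    return total

def gcov_estimate(Y, H=10):
    n = Y.shape[1]
    res = minimize(gcov_objective, np.zeros(n * n), args=(Y, H),
                   method="Nelder-Mead", options={"maxiter": 20000})
    return res.x.reshape(n, n)
```

With a richer set of transformations or a larger dimension, a gradient-based optimizer and several starting values are advisable, since the objective may have local minima associated with observationally close representations.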
4.1.1 Estimation of the Error Distribution
After identifying the causal and noncausal dimensions, the true error terms $\varepsilon_t$, or the transforms $\varepsilon^*_t$, are consistently estimated by the associated residuals $\hat{\varepsilon}_t$, $\hat{\varepsilon}^*_t$, respectively. These residuals are used to estimate the unknown joint density $f^*$, say, of the $\varepsilon^*_t$, by kernel smoothing for example. These functional estimators of the densities can be used for two different purposes. First, $\hat{f}^*$ can be used to simulate trajectories of the mixed process and to compute by bootstrap the accuracy of the generalized covariance estimator. Second, it is possible to reestimate the mixed causal/noncausal model by Maximum Likelihood after substituting this kernel estimator $\hat{f}^*$ for the true $f^*$ [see e.g. Gassiat (1993) for the asymptotic properties of such an adaptive estimation approach].
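As an illustration of this step, the residual density can be estimated with a standard Gaussian kernel and reused for bootstrap draws (a sketch using scipy.stats.gaussian_kde; the array name is ours):

```python
import numpy as np
from scipy.stats import gaussian_kde

def residual_density(eps_hat):
    """Kernel estimates of the joint and marginal residual densities;
    eps_hat is a (T x n) array of GCov residuals."""
    joint_kde = gaussian_kde(eps_hat.T)
    marginal_kdes = [gaussian_kde(eps_hat[:, j]) for j in range(eps_hat.shape[1])]
    return joint_kde, marginal_kdes

# joint_kde.resample(size=T) then yields an (n x T) array of simulated errors,
# which can feed the bootstrap evaluation of the GCov accuracy.
```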
5 Increasing the autoregressive order
The methodology introduced in Sections 2-4 is easily extended to a mixed VAR process of any order p. For expository purposes, we first consider the example of a univariate mixed AR(2) process. Next, we discuss the multivariate model.
5.1 Mixed AR(2) process
Let us consider a univariate process that satisfies the autoregressive equation:
\[
y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t, \qquad (5.1)
\]
where the errors are i.i.d. non-Gaussian. We assume that the autoregressive polynomial can be written as:
\[
\Phi(L) = 1 - \phi_1 L - \phi_2 L^2 = (1 - \theta_1 L)(1 - \theta_2 L), \qquad (5.2)
\]
where $|\theta_1| < 1$ and $|\theta_2| > 1$. This implies that the roots of $\Phi(z)$ are real. Then, we write the partial fraction decomposition:
\[
\frac{1}{\Phi(L)} = \frac{1}{(1 - \theta_1 L)(1 - \theta_2 L)} = \frac{\theta_1}{\theta_1 - \theta_2} \frac{1}{1 - \theta_1 L} + \frac{\theta_2}{\theta_2 - \theta_1} \frac{1}{1 - \theta_2 L}. \qquad (5.3)
\]
It follows that:
\[
y_t = \frac{1}{\Phi(L)} \varepsilon_t = \frac{\theta_1}{\theta_1 - \theta_2} \frac{1}{1 - \theta_1 L} \varepsilon_t + \frac{\theta_2}{\theta_2 - \theta_1} \frac{1}{1 - \theta_2 L} \varepsilon_t,
\]
or, equivalently,
\[
y_t = \frac{\theta_1}{\theta_1 - \theta_2} y^*_{1,t} + \frac{\theta_2}{\theta_2 - \theta_1} y^*_{2,t}, \qquad (5.4)
\]
where $(1 - \theta_1 L) y^*_{1,t} = \varepsilon_t$ and $(1 - \theta_2 L) y^*_{2,t} = \varepsilon_t$.

As $|\theta_1| < 1$, process $y^*_{1,t}$ is a causal AR(1) with causal innovation $\varepsilon^*_{1,t} = \varepsilon_t$. Let us now consider $y^*_{2,t}$. We have:
\[
-\theta_2 L \Big(1 - \frac{1}{\theta_2} L^{-1}\Big) y^*_{2,t} = \varepsilon_t \iff \Big(1 - \frac{1}{\theta_2} L^{-1}\Big) y^*_{2,t} = -\frac{1}{\theta_2} \varepsilon_{t+1}. \qquad (5.5)
\]
Thus, $y^*_{2,t}$ is a noncausal AR(1) process with noncausal innovation $\varepsilon^*_{2,t} = -\frac{1}{\theta_2} \varepsilon_{t+1}$.

The equality (5.4) extends the representation derived in Proposition 1. The first difference is the deterministic relationship between the causal and noncausal innovations, $\varepsilon^*_{2,t} = -\frac{1}{\theta_2} \varepsilon^*_{1,t+1}$, which is a consequence of the unique shock $\varepsilon_t$ driving both the causal and noncausal components. Another difference is the filtering of $y^*_{1,t}, y^*_{2,t}$ from the observations on $y_t$. We have:
\[
(1 - \theta_1 L) y^*_{1,t} = \varepsilon_t = (1 - \theta_1 L)(1 - \theta_2 L) y_t,
\]
and therefore $y^*_{1,t} = y_t - \theta_2 y_{t-1}$. Similarly, $y^*_{2,t} = y_t - \theta_1 y_{t-1}$. Thus, $y^*_{1,t}$ and $y^*_{2,t}$ are combinations of both the current and lagged values of $y_t$, and are not functions of the current value $y_t$ only.
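A quick numerical check of the partial fraction decomposition (5.3), for illustrative values of $\theta_1, \theta_2$ chosen by us, is given below.

```python
import numpy as np

theta1, theta2 = 0.5, 2.0          # |theta1| < 1, |theta2| > 1 (illustrative values)
w1 = theta1 / (theta1 - theta2)    # weight on the causal component
w2 = theta2 / (theta2 - theta1)    # weight on the noncausal component

for z in np.linspace(-0.4, 0.4, 9):           # check (5.3) on a grid of z values
    lhs = 1.0 / ((1 - theta1 * z) * (1 - theta2 * z))
    rhs = w1 / (1 - theta1 * z) + w2 / (1 - theta2 * z)
    assert np.isclose(lhs, rhs)
print("Partial fraction decomposition (5.3) verified numerically.")
```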
5.2 Multivariate VAR(p) process
Let us now consider a mixed VAR(p) process:
\[
Y_t = \Phi_1 Y_{t-1} + \cdots + \Phi_p Y_{t-p} + \varepsilon_t, \qquad (5.6)
\]
where $(\varepsilon_t)$ is a sequence of i.i.d. random vectors of dimension $n$. This model can be equivalently written as a mixed VAR(1) model for the process $X_t$ obtained by stacking the current and lagged values of $Y_t$: $X_t = (Y_t', Y_{t-1}', \ldots, Y_{t-p+1}')'$. We have:
\[
X_t = \Psi X_{t-1} + u_t, \qquad (5.7)
\]
where $u_t = (\varepsilon_t', 0, \ldots, 0)'$ and
\[
\Psi = \begin{pmatrix}
\Phi_1 & \ldots & \ldots & \Phi_p \\
Id & 0 & \ldots & 0 \\
0 & Id & \ldots & 0 \\
0 & \ldots & Id & 0
\end{pmatrix}.
\]
The representation theorem in Proposition 1 can be applied to this new model. We introduce the real Jordan canonical form of matrix $\Psi$:
\[
\Psi = B \begin{pmatrix} J_1 & 0 \\ 0 & J_2 \end{pmatrix} B^{-1},
\]
and define $B = (B_1, B_2)$, $B^{-1} = \begin{pmatrix} B^1 \\ B^2 \end{pmatrix}$. Then,
\[
X_t = B_1 X^*_{1,t} + B_2 X^*_{2,t},
\]
\[
X^*_{1,t} = J_1 X^*_{1,t-1} + u^*_{1,t}, \qquad X^*_{2,t} = J_2 X^*_{2,t-1} + u^*_{2,t},
\]
\[
X^*_{1,t} = B^1 X_t, \quad u^*_{1,t} = B^1 u_t, \qquad X^*_{2,t} = B^2 X_t, \quad u^*_{2,t} = B^2 u_t.
\]
In fact, the Jordan representation is an indirect extension to the multivariate framework of the standard partial fraction decomposition, by the formula:
\[
(Id - \Psi L)^{-1} = B_1 (Id - J_1 L)^{-1} B^1 + B_2 (Id - J_2 L)^{-1} B^2,
\]
that follows from Corollary 1.

The remarks from Section 5.1 remain valid in the multivariate framework. For example:

a) $u^*_{1,t}$ and $u^*_{2,t}$ satisfy a linear deterministic relationship. They both depend on $\varepsilon_t$. However, $\dim u^*_{1,t} + \dim u^*_{2,t} = n_1 + n_2 = np > \dim \varepsilon_t = n$, if $p > 1$.

b) $X^*_{1,t} = B^1 X_t = \sum_{h=0}^{p-1} B^1_h Y_{t-h}$ and $X^*_{2,t} = B^2 X_t = \sum_{h=0}^{p-1} B^2_h Y_{t-h}$. They are functions of the current and lagged values of $Y_t$.

However, extending other results obtained for the VAR(1) process, such as Corollary 3, to the VAR(p) process needs to be done with caution. We know from Corollary 3 that $(Id - \Psi L)^{-1} = \Theta_1(L) \Theta_2(L^{-1})$. This decomposition does not imply that the autoregressive polynomial $\Phi(L) = Id - \Phi_1 L - \cdots - \Phi_p L^p$ can be written as $\Phi(L) = \Theta^*_1(L) \Theta^*_2(L^{-1})$, say, because such a decomposition requires additional assumptions [see e.g. Lanne, Saikkonen (2013)].
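The companion form (5.7) makes it straightforward to reuse the VAR(1) machinery for any order p. The sketch below (Python, with function names of our choosing) builds Ψ from Φ_1, ..., Φ_p and counts the causal and noncausal dimensions; the example reuses the VAR(3) coefficient matrices reported in Section 6.4.

```python
import numpy as np

def companion(Phis):
    """Companion matrix Psi of (5.7) from a list [Phi_1, ..., Phi_p]."""
    n = Phis[0].shape[0]
    p = len(Phis)
    Psi = np.zeros((n * p, n * p))
    Psi[:n, :] = np.hstack(Phis)                 # first block row: Phi_1, ..., Phi_p
    Psi[n:, :-n] = np.eye(n * (p - 1))           # identity blocks below
    return Psi

def causal_noncausal_dimensions(Phis):
    Psi = companion(Phis)
    eigvals = np.linalg.eigvals(Psi)
    n1 = int(np.sum(np.abs(eigvals) < 1.0))      # causal dimension
    return n1, Psi.shape[0] - n1, eigvals

# Example: the bivariate VAR(3) estimated in Section 6.4 (coefficients from the paper)
Phi1 = np.array([[1.179, -0.217], [1.366, 0.555]])
Phi2 = np.array([[-0.3122, 0.150], [-1.083, 0.182]])
Phi3 = np.array([[0.2571, -0.007], [-0.236, 0.181]])
print(causal_noncausal_dimensions([Phi1, Phi2, Phi3]))
```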
5.3 Statistical Inference
The analysis of a mixed VAR(p) process (5.6) can be done along the following lines:

a) Estimate $\Phi_1, \ldots, \Phi_p$, for a given $p$, by the Generalized Covariance estimator based on $\varepsilon_t(\Phi) = Y_t - \Phi_1 Y_{t-1} - \cdots - \Phi_p Y_{t-p}$;

b) Compute $\hat{\Psi}$ from the estimated $\hat{\Phi}_1, \ldots, \hat{\Phi}_p$, and the Jordan canonical form of $\hat{\Psi}$, i.e. $\hat{J}_1, \hat{J}_2, \hat{B}$;

c) Compute the residuals $\hat{\varepsilon}_t = Y_t - \hat{\Phi}_1 Y_{t-1} - \cdots - \hat{\Phi}_p Y_{t-p}$, and plot their nonlinear ACF. If these autocorrelations are still significant, reestimate the model with a higher autoregressive order $p$. If they are not significant, consider $p$ as the estimated autoregressive order;

d) Apply the filtering formulas to obtain the causal and noncausal components $\hat{X}^*_{1,t}, \hat{X}^*_{2,t}$, which can possibly have an economic interpretation;

e) Plot the kernel-smoothed sample densities of $\hat{\varepsilon}_t$ to estimate the common distribution of the errors $\varepsilon_t$.
6 Illustration
In this section, we first illustrate the application of the Generalized Covariance estimator introduced in Section 4 to simulated data and discuss its finite sample properties. Next, the GCov estimator is used to analyze the dynamics of commodity futures.
6.1 The simulated data
Let us consider a bivariate process, $n = 2$, with causal and noncausal dimensions equal to 1: $n_1 = n - n_1 = 1$. The following parameter values are fixed: $J_1 = 0.7$, $J_2 = 2$,
\[
A = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
\]
The errors $\varepsilon_t = (\varepsilon_{1,t}, \varepsilon_{2,t})'$ are such that $\varepsilon_{1,t}, \varepsilon_{2,t}$ are drawn independently from the same Student-t distribution with $\nu = 4$ degrees of freedom, mean zero and variance equal to $\nu / (\nu - 2)$. The autoregressive matrix is equal to:
\[
\Phi = A \begin{pmatrix} J_1 & 0 \\ 0 & J_2 \end{pmatrix} A^{-1} = \begin{pmatrix} 0.7 & -1.3 \\ 0.0 & 2.0 \end{pmatrix}.
\]
We create a sample of length $T = 1000$ by simulating 2000 errors $\varepsilon^s_t$, $t = 1, \ldots, T$, and by computing the simulated transformed errors $\varepsilon^{*s}_t = A^{-1} \varepsilon^s_t$, from which we obtain the values of the causal component $Y^{*s}_{1,t} = J_1 Y^{*s}_{1,t-1} + \varepsilon^{*s}_{1,t}$, $t = 1, \ldots, T$, with initial value $Y^{*s}_{1,0} = 0$, and the values of the noncausal component $Y^{*s}_{2,t} = (1/J_2) Y^{*s}_{2,t+1} - (1/J_2) \varepsilon^{*s}_{2,t+1}$, $t = 1, \ldots, T$, with terminal value $Y^{*s}_{2,T+1} = 0$. Next, we compute the values of the series $Y^s_t = A Y^{*s}_t$, $t = 1, \ldots, T$, and discard the first and last 500 realizations. The components are related to the observed process as follows: $Y_{1,t} = Y^*_{1,t} - Y^*_{2,t}$, $Y_{2,t} = Y^*_{2,t}$, or equivalently $Y^*_{1,t} = Y_{1,t} + Y_{2,t}$, $Y^*_{2,t} = Y_{2,t}$. Hence, the second component of $Y_t$ is purely noncausal and its first component is a mixture of a causal and a noncausal process. Figure 1 shows the path of the two components of the observed process $Y^s$ and Figure 2 shows its autocorrelation function.

[Figure 1: Simulated $Y_t$]
The first observed component has mean -0.147 and variance 8.296, and the second component has mean 0.028 and variance 0.633. Their contemporaneous correlation is -0.221.

[Figure 2: Autocorrelation Function of $Y_t$]

The simulated data display multiple peaks in the trajectory due to the fat tails of the errors. Indeed, for the selected value of the parameter $\nu = 4$, the kurtosis of the errors does not exist. The marginal and cross autocorrelations are significant up to lag 10, with exponential decay rates determined by the values of $J_1 = 0.7$ and $1/J_2 = 0.5$. Let us now consider the auto- and cross-correlations of the causal and noncausal components of $Y^*_t$.

[Figure 3: Autocorrelation Function of $Y^*_t$]

The cross-correlations are almost all insignificant in the South-West panel of Figure 3. This illustrates the condition $\Gamma^*_{1,2}(h) = 0$, for $h \leq 0$, derived in Proposition 2.
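This simulation design can be reproduced in a few lines of Python (numpy); the function name, the seed and the use of numpy's default generator are our choices:

```python
import numpy as np

def simulate_mixed_var1(T=1000, burn=500, J1=0.7, J2=2.0, nu=4, seed=1):
    """Simulate the bivariate mixed VAR(1) of Section 6.1."""
    rng = np.random.default_rng(seed)
    A = np.array([[1.0, -1.0], [0.0, 1.0]])
    n_total = T + 2 * burn
    # i.i.d. Student-t(nu) errors; their variance is nu/(nu-2) as in the text
    eps = rng.standard_t(nu, size=(n_total + 1, 2))
    eps_star = eps @ np.linalg.inv(A).T              # eps* = A^{-1} eps
    Y1s = np.zeros(n_total)                          # causal component, forward recursion
    Y2s = np.zeros(n_total + 1)                      # noncausal component, terminal value 0
    for t in range(1, n_total):
        Y1s[t] = J1 * Y1s[t - 1] + eps_star[t, 0]
    for t in range(n_total - 1, -1, -1):
        Y2s[t] = (Y2s[t + 1] - eps_star[t + 1, 1]) / J2
    Ystar = np.column_stack([Y1s, Y2s[:n_total]])
    Y = Ystar @ A.T                                  # Y_t = A Y*_t
    return Y[burn:burn + T]                          # discard first and last burn values

Y = simulate_mixed_var1()
```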
6.2 Covariance estimators
Let us now illustrate the direct estimation of $\Phi$ by the generalized covariance (GCov) estimator. Let us consider the model $Y_t = \Phi Y_{t-1} + \varepsilon_t$ and write its errors as functions of the unknown parameter matrix $\Phi$: $\varepsilon_t(\Phi) = Y_t - \Phi Y_{t-1}$. The following set of four functions of the errors is considered: $a_1(\varepsilon) = \varepsilon_1$, $a_2(\varepsilon) = \varepsilon_2$, $a_3(\varepsilon) = \varepsilon_1^2$, $a_4(\varepsilon) = \varepsilon_2^2$. For a given value of $\Phi$, and by replacing variable $Y$ and its lag by the observed processes, we compute the following series:
\[
(\varepsilon_{1,t}(\Phi), \varepsilon_{2,t}(\Phi), \varepsilon^2_{1,t}(\Phi), \varepsilon^2_{2,t}(\Phi)) = (Z_{1,t}(\Phi), Z_{2,t}(\Phi), Z_{3,t}(\Phi), Z_{4,t}(\Phi)), \quad t = 1, \ldots, T.
\]
Let $\hat{\rho}_{j,k}(h, \Phi)$ denote the sample autocorrelation between $Z_{j,t}(\Phi)$ and $Z_{k,t-h}(\Phi)$, $j, k = 1, \ldots, 4$. We interpret them as the auto- and cross-correlation functions of $\varepsilon(\Phi)$, and the auto- and cross-correlations of $\varepsilon^2(\Phi)$. The GCov estimate is obtained by minimizing the portmanteau statistic computed from the above auto- and cross-correlations up to and including lag $H = 10$:
\[
\hat{\Phi} = \text{Argmin}_{\Phi} \sum_{j=1}^{4} \sum_{k=1}^{4} \sum_{h=1}^{10} [\hat{\rho}_{j,k}(h, \Phi)]^2. \qquad (5.8)
\]
When $Y$ is replaced with the simulated data introduced in Section 6.1, the following estimated autoregressive matrix is obtained:
\[
\hat{\Phi} = \begin{pmatrix} 0.7246 & -1.4525 \\ -0.0302 & 1.9939 \end{pmatrix},
\]
with eigenvalues $\lambda_1 = 0.690$, $\lambda_2 = 2.027$, close to the true values $J_1 = 0.7$ and $J_2 = 2.0$. The standard errors of $\hat{\Phi}$ are obtained by bootstrap and are equal to 0.023, 0.308 for the elements of the first row, and to 0.009, 0.120 for the elements of the second row.

After estimating $\Phi$, the GCov residuals $\hat{\varepsilon}_t = Y_t - \hat{\Phi} Y_{t-1}$ are computed in order to approximate the density of $\varepsilon_t$. Figures 4 and 5 show the smoothed empirical densities of $\hat{\varepsilon}_{1,t}$ and $\hat{\varepsilon}_{2,t}$ in comparison with the true Student-t density of the errors.

[Figure 4: Empirical Density of GCov Residuals and True Errors (causal component)]
[Figure 5: Empirical Density of GCov Residuals and True Errors (noncausal component)]

The standard kernel-based density estimators, produced by S+ with a default bandwidth, indicate that the empirical densities of the GCov residuals are very close to those of the true errors. More specifically, the extreme values of the GCov residuals are very close to the extremes of the true errors, as their maxima are 7.555 and 7.579, respectively, and their minima are -11.183 and -11.277, respectively. Moreover, the means of the GCov residuals and of the true errors are -0.006 and -0.013, and their standard errors are 1.437 and 1.430, respectively, which are also close. Both series of GCov residuals are serially uncorrelated. Their contemporaneous correlation is statistically significant and equal to -0.095, as compared with the standard asymptotic normality-based critical value of 0.063.

If matrix $\Phi$ is estimated by Ordinary Least Squares (OLS) from the Seemingly Unrelated Regression of $Y_t$ on $Y_{t-1}$, we get the estimated matrix
\[
\hat{\Phi} = \begin{pmatrix} 0.721 & -1.272 \\ 0.008 & 0.473 \end{pmatrix},
\]
with eigenvalues $\lambda_1 = 0.67$ and $\lambda_2 = 0.52$. In this misspecified purely causal model, the eigenvalues are close to $J_1 = 0.7$ and $1/J_2 = 0.5$. The root outside the unit circle is captured by the estimate of its stationary counterpart.
6.3 Finite Sample Performance of the GCov Estimator
To assess the performance of the GCov estimator in finite samples, we simulate samples of length T=100 and T=200 of data generated from two independent series of Student-t errors with 4 degrees of freedom and matrix $\Phi$:
\[
\Phi = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} = \begin{pmatrix} 0.7 & -1.3 \\ 0.0 & 2.0 \end{pmatrix},
\]
with eigenvalues 0.7 and 2.0. The above autoregressive matrix $\Phi$ is identical to the one used in the simulations discussed in Sections 6.1-6.2. Each of the samples of T=100 and T=200 is replicated 300 times. Figure 6 illustrates the densities of the GCov estimators, with the objective function (5.8), of the four components of $\Phi$.

[Insert Figure 6: GCov Coefficient Estimates: T=100 (dashed line), T=200 (dotted line)]

The vertical lines in Figure 6 indicate the true values of the parameters. We observe that for T=100, the GCov densities of $\Phi_{ij}$ have long tails. For T=200, the tails of the densities get shorter as they become more peaked.

From $\hat{\Phi}$, we compute the eigenvalues and find their location with respect to the unit circle. Table 1 provides the frequency of a mixed estimated model (one eigenvalue of modulus less than 1 and one of modulus larger than 1), of a purely causal estimated model (both eigenvalues of modulus less than 1), and of a purely noncausal estimated model (both eigenvalues of modulus greater than 1). The frequency of a mixed estimated model is rather large and increases with the sample size.

Table 1: Estimated Dynamics

sample size   mixed   pure causal   pure noncausal
T=100         79%     5%            16%
T=200         86%     2%            12%

The estimated eigenvalues can be real or complex. For the real-valued ones, Figure 7 displays the densities of the smallest (resp. largest) eigenvalue, with the vertical lines indicating their true values 0.7 and 2, respectively.

[Insert Figure 7: GCov Eigenvalues Estimates: T=100 (dashed line), T=200 (dotted line)]
6.4 Application to commodity futures
Let us now estimate a bivariate VAR model of commodity futures. The data set contains observations on US corn and US soybean futures, expressed in US Dollars and observed daily between August 3, 2015 and May 12, 2016.⁸ There are 200 observations on each commodity, collected only on the working days of each week, excluding the statutory US holidays.

[Insert Figure 8: Evolution of Commodity Futures]

Figure 8 shows the evolution of the commodity futures over the sampling period. Both series display multiple asymmetric spikes and an increase over the final part of the trajectory. Therefore, we demean the data and remove the deterministic trend by regressing each series on time and time squared by ordinary least squares. The residuals of these regressions are displayed jointly in Figure 9.

[Insert Figure 9: Adjusted Data]

Figure 9 reveals the comovements of the futures and the presence of asymmetric spikes. The adjusted data have non-Gaussian sample distributions, with sample variances 241.45 and 771.79 and kurtosis 7.37 and 4.97 for corn and soybean, respectively. The two series are serially correlated in levels and in squares. Figure 10 shows the sample autocorrelations and cross-correlations of the levels. The range of serial correlation, as indicated by the statistically significant values, is quite long. Figure 11 displays the ACF of the squared adjusted data. We observe volatility persistence as well as significant and non-diminishing covolatilities.

[Insert Figure 10: ACF of Adjusted Data]
[Insert Figure 11: ACF of Squared Adjusted Data]
⁸ The data and the characteristics of the futures (exchange, contract size, etc.) are publicly available at www.Investing.com.
We expect the mixed VAR(1) model to accommodate the patterns of the ACF and of the ACF of squares. The GCov estimator is based on an objective function that includes moments up to order 4, with H=10. We get the following estimate of matrix $\Phi$:
\[
\hat{\Phi} = \begin{pmatrix} 0.956 & -0.101 \\ -0.326 & 1.091 \end{pmatrix}.
\]
The standard errors obtained by bootstrap are 0.466, 0.017, 0.012 and 0.465, respectively. The eigenvalues of matrix $\hat{\Phi}$ are $\hat{J}_1 = 0.83$ and $\hat{J}_2 = 1.21$, respectively, which corresponds to a mixed process. The associated matrix $\hat{A}$ is:
\[
\hat{A} = \begin{pmatrix} -0.423 & -0.530 \\ 0.307 & -0.794 \end{pmatrix}.
\]
The fit of the model can be assessed from the ACF of the residuals and of the squared residuals (see Figures 1, 2, 3 in the Complementary Material). There remain significant cross-correlations of the residuals, and the squared residuals are autocorrelated as well. In order to remove them, we fit the VAR(3) model to the data, with the following matrices of estimated autoregressive coefficients:
\[
\hat{\Phi}_1 = \begin{pmatrix} 1.179 & -0.217 \\ 1.366 & 0.555 \end{pmatrix}, \quad
\hat{\Phi}_2 = \begin{pmatrix} -0.3122 & 0.150 \\ -1.083 & 0.182 \end{pmatrix}, \quad
\hat{\Phi}_3 = \begin{pmatrix} 0.2571 & -0.007 \\ -0.236 & 0.181 \end{pmatrix}.
\]
The mixed VAR(3) has two pairs of complex eigenvalues of modulus less than one and two real eigenvalues, one of which is greater than one and one of which is less than one.

[Insert Figure 12: ACF of VAR(3) Residuals]
[Insert Figure 13: ACF of Squared VAR(3) Residuals]

From the ACF of the mixed VAR(3) residuals and squared residuals, we observe that the long-range serial linear and nonlinear dependence has been removed by the VAR(3) model. The model has also eliminated the conditional heteroscedasticity, as the squared residuals are uncorrelated as well. This is due to the nonlinear dynamics in calendar time being accommodated by the mixed model. There is a noncausal space of dimension 1, generated by:
\[
X^*_{2,t} = 0.54 Y_{1,t} + 0.42 Y_{2,t} + 0.52 Y_{1,t-1} + 0.40 Y_{2,t-1} + 0.49 Y_{1,t-2} + 0.38 Y_{2,t-2},
\]
with close values of the coefficients on the present and past values of the process. This result has the following financial interpretation. $X^*_{2,t}$ is the value at date $t$ of a portfolio with a 0.54 share invested in commodity 1, a 0.42 share invested in commodity 2, and the remaining $0.52 Y_{1,t-1} + \cdots + 0.38 Y_{2,t-2}$ invested in cash (a risk-free asset with a zero risk-free rate). This portfolio is regularly updated for the amount invested in cash and is not self-financed. This is an alternative investment, whose value can be used to define an alternative index and to construct options written on that index. These options are insurance products that protect against the effects of noncausal components, such as speculative bubbles and conditional heteroscedasticity. By using them in a suitable manner, the investor can focus on the management of the traditional causal components only.
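For completeness, the noncausal portfolio value above can be computed directly from the adjusted series (a sketch; the array names are ours, the coefficients are those reported above):

```python
import numpy as np

def noncausal_portfolio(Y1, Y2):
    """Noncausal combination X2*_t of Section 6.4, using the coefficients
    reported in the paper; Y1, Y2 are the adjusted corn and soybean series."""
    return (0.54 * Y1[2:] + 0.42 * Y2[2:]
            + 0.52 * Y1[1:-1] + 0.40 * Y2[1:-1]
            + 0.49 * Y1[:-2] + 0.38 * Y2[:-2])
```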
7 Concluding Remarks
This paper examined the identification and estimation of mixed causal/noncausal linear VAR(p) processes. We first show that second-order identification is available only to a limited extent, which is insufficient for practitioners. Therefore, we consider nonlinear transforms of the autocovariance function, which lead us to the Generalized Covariance (GCov) estimator, a one-step estimator of a VAR(p) model with causal and noncausal components. The asymptotic properties of the GCov estimator are derived and presented in the paper. The proposed method also provides the estimates of the causal (resp. noncausal) dimensions and directions, as well as a nonparametric estimate of the error distribution.

When the error distribution is parametric, the lack of full efficiency of the GCov estimator is compensated by its numerical simplicity. Indeed, the ML estimator requires np + 1 optimizations, which is equal to the number of all possible causal dimensions of the process, while the GCov estimator is obtained from a single optimization only. This methodology is applied to commodity prices, and an interpretation in terms of alternative investment is provided.
REFERENCES
Belouchrani, A., Abed-Meraim, K., Cardoso, J., and E. Moulines (1997): "A Blind Source Separation Technique Using Second-Order Statistics", IEEE Transactions on Signal Processing, 45, 434-444.

Breidt, F., Davis, R., Lii, K., and M. Rosenblatt (1991): "Maximum Likelihood Estimation for Noncausal Autoregressive Processes", Journal of Multivariate Analysis, 36, 175-198.

Chan, K., Ho, L., and H. Tong (2006): "A Note on Time-Reversibility of Multivariate Linear Processes", Biometrika, 93, 221-227.

Davis, R., and S. Resnick (1986): "Limit Theory for the Sample Covariance and Correlation Functions of Moving Averages", The Annals of Statistics, 14, 533-558.

Davis, R., and L. Song (2012): "Noncausal Vector AR Processes with Application to Economic Time Series", DP Columbia University.

Deistler, M. (1975): "z-Transform and Identification of Linear Econometric Models with Autocorrelated Errors", Metrika, 22, 13-25.

Gassiat, E. (1990): "Estimation Semi-Paramétrique d'un Modèle Autorégressif Stationnaire Multiindice Non Nécessairement Causal", Ann. Inst. Henri Poincaré, 26, 181-205.

Gassiat, E. (1993): "Adaptive Estimation in Noncausal Stationary AR Processes", Annals of Statistics, 21, 2022-2042.

Gourieroux, C., and J. Jasiak (2001): "Nonlinear Autocorrelograms: An Application to Intertrade Durations", Journal of Time Series Analysis, 23, 1-28.

Gourieroux, C., and J. Jasiak (2014): "Misspecification of Causal and Noncausal Orders in Autoregressive Processes", CREST DP.

Gourieroux, C., and J. Jasiak (2015): "Filtering, Prediction, and Simulation Methods for Noncausal Processes", Journal of Time Series Analysis, 37, 405-430.

Gourieroux, C., Jasiak, J., and A. Monfort (2016): "Stationary Dynamic Equilibria in Rational Expectation Models", CREST DP.

Gourieroux, C., and A. Monfort (1995): "Statistics and Econometric Models", Vol. 1, Cambridge University Press.

Gourieroux, C., and A. Monfort (2015): "Revisiting Identification and Estimation in Structural VARMA Models", CREST DP.

Gourieroux, C., Monfort, A., and J.P. Renne (2016): "Statistical Inference for Independent Component Analysis", CREST DP.

Gourieroux, C., and J.M. Zakoian (2016): "Local Explosion Modelling by Noncausal Process", Journal of the Royal Statistical Society, forthcoming (available online).

Hannan, E. (1973): "The Asymptotic Theory of Linear Time Series Models", Journal of Applied Probability, 10, 130-145.

Hansen, L. (1982): "Large Sample Properties of Generalized Method of Moments Estimators", Econometrica, 50, 1029-1054.

Hansen, L., and K. Singleton (1982): "Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models", Econometrica, 50, 1269-1286.

Hyvarinen, A., Karhunen, J., and E. Oja (2001): "Independent Component Analysis", New York, Wiley.

Ilmonen, P., Nordhausen, K., Oja, H., and E. Ollila (2012): "On Asymptotics of ICA Estimators and Their Performance Indices", DP, Free University of Brussels.

Lanne, M., and P. Saikkonen (2013): "Noncausal Vector Autoregression", Econometric Theory, 29, 447-481.

Leeper, E., Walker, T., and S. Yang (2013): "Fiscal Foresight and Information Flows", Econometrica, 81, 1115-1145.

Rosenblatt, M. (2000): "Gaussian and Non-Gaussian Linear Time Series and Random Fields", Springer Verlag.

Tong, L., Soon, V., Huang, Y., and R. Liu (1990): "AMUSE: A New Blind Identification Algorithm", in Proc. IEEE ISCAS, 1784-1787, New Orleans, May.
Appendix 1

Proof of Proposition 2.

Let us derive the expression of $\Gamma^*_{1,2}(h)$ for $h > 0$. We have:
\[
\Gamma^*_{1,2}(h) = Cov(Y^*_{1,t}, Y^*_{2,t-h})
= -Cov(\varepsilon^*_{1,t} + J_1 \varepsilon^*_{1,t-1} + \ldots + J_1^{h-1} \varepsilon^*_{1,t-h+1}, \; J_2^{-h} \varepsilon^*_{2,t} + \ldots + J_2^{-1} \varepsilon^*_{2,t-h+1})
\]
\[
= -[\Sigma^*_{1,2} (J_2')^{-h} + J_1 \Sigma^*_{1,2} (J_2')^{-h+1} + \ldots + J_1^{h-1} \Sigma^*_{1,2} (J_2')^{-1}], \qquad (a.1)
\]
by using the serial independence of $\varepsilon^*_t$ and $\Sigma^*_{1,2} = Cov(\varepsilon^*_{1,t}, \varepsilon^*_{2,t})$, $\forall t$. In particular, if $n = 2$, $n_1 = n_2 = 1$, then $J_1, J_2$ and $\Sigma^*_{1,2}$ are scalars and we get:
\[
\gamma^*_{1,2}(h) = -\sigma^*_{1,2} [J_2^{-h} + J_1 J_2^{-(h-1)} + \ldots + J_1^{h-1} J_2^{-1}]
= -\sigma^*_{1,2} J_2^{-h} [1 + (J_1 J_2) + \ldots + (J_1 J_2)^{h-1}]
\]
\[
= -\sigma^*_{1,2} J_2^{-h} \frac{1 - (J_1 J_2)^h}{1 - J_1 J_2}
= \frac{\sigma^*_{1,2}}{1 - J_1 J_2} [J_1^h - J_2^{-h}], \quad \text{if } h > 0. \qquad (a.2)
\]
Appendix 2

Condition for distinguishing a mixed process from a pure process, n = 2

As it is equivalent to identify process $(Y_t)$, or a one-to-one linear transformation of $(Y_t)$, we assume that the mixed causal/noncausal process is $(Y^*_t)$ itself, with the autocovariances given in (3.3)-(3.4). By Proposition 3, we consider a pure causal process, without loss of generality. Let us denote such a process by $Y_t = \Phi Y_{t-1} + \varepsilon_t$, where the eigenvalues of $\Phi$ are of modulus strictly smaller than 1, and analyze the conditions ensuring that the associated autocovariances $\Gamma(h)$ coincide with the autocovariances given in (3.3)-(3.4).

Let us assume for expository purposes that $J_1 \neq 1/J_2$. Then matrix $\Phi'$ is diagonalizable with eigenvalues $J_1$ and $1/J_2$, and we can write:
\[
\Phi' = C^{-1} \begin{pmatrix} J_1 & 0 \\ 0 & J_2^{-1} \end{pmatrix} C,
\]
where $C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}$ is the matrix whose columns provide the eigenvectors of $\Phi'$. For $h \leq 0$, we have from (3.2):
\[
\Gamma(h) = \Gamma(0) (\Phi')^{|h|} = \Gamma(0) C^{-1} \begin{pmatrix} J_1^{|h|} & 0 \\ 0 & J_2^{-|h|} \end{pmatrix} C.
\]
Let us now consider the implications of the conditions $\gamma^*_{1,2}(h) = 0$, if $h \leq 0$. We get:
\[
(1, 0) \Gamma(h) \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0, \quad \forall h \leq 0,
\]
\[
\iff (1, 0) \Gamma(0) C^{-1} \begin{pmatrix} J_1^{|h|} & 0 \\ 0 & J_2^{-|h|} \end{pmatrix} C \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0, \quad \forall h \leq 0,
\]
\[
\iff (d_{11}, d_{12}) \begin{pmatrix} J_1^{|h|} & 0 \\ 0 & J_2^{-|h|} \end{pmatrix} \begin{pmatrix} c_{12} \\ c_{22} \end{pmatrix} = 0, \quad \forall h \leq 0,
\]
(where $(d_{11}, d_{12}) = (1, 0) \Gamma(0) C^{-1}$)
\[
\iff d_{11} c_{12} J_1^{|h|} + d_{12} c_{22} J_2^{-|h|} = 0, \quad \forall h \leq 0,
\]
\[
\iff \text{either } c_{12} = 0 \text{ and } d_{12} = 0, \text{ or } c_{22} = 0 \text{ and } d_{11} = 0,
\]
(because the vectors $(d_{11}, d_{12})'$ and $(c_{12}, c_{22})'$ are non-zero vectors and the sequences $J_1^{|h|}$ and $J_2^{-|h|}$ are linearly independent).

a) Let us consider the case $c_{12} = 0$. Then $c_{22}$ can be standardized to 1. We have:
\[
C = \begin{pmatrix} c_{11} & 0 \\ c_{21} & 1 \end{pmatrix}, \qquad C^{-1} = \frac{1}{c_{11}} \begin{pmatrix} 1 & 0 \\ -c_{21} & c_{11} \end{pmatrix},
\]
\[
(1, 0) \Gamma(0) C^{-1} = (\gamma_{11}(0), \gamma_{12}(0)) \frac{1}{c_{11}} \begin{pmatrix} 1 & 0 \\ -c_{21} & c_{11} \end{pmatrix},
\]
and $d_{12} = \gamma_{12}(0)$. Thus, the condition $d_{12} = 0$ is equivalent to the condition $\gamma_{12}(0) = 0$.

b) Similarly, if $c_{22} = 0$, we can fix $c_{12} = 1$. We have:
\[
C = \begin{pmatrix} c_{11} & 1 \\ c_{21} & 0 \end{pmatrix}, \qquad C^{-1} = -\frac{1}{c_{21}} \begin{pmatrix} 0 & -1 \\ -c_{21} & c_{11} \end{pmatrix},
\]
\[
(1, 0) \Gamma(0) C^{-1} = -\frac{1}{c_{21}} (\gamma_{11}(0), \gamma_{12}(0)) \begin{pmatrix} 0 & -1 \\ -c_{21} & c_{11} \end{pmatrix},
\]
and $d_{11} = \gamma_{12}(0)$. The condition $d_{11} = 0$ is equivalent to the condition $\gamma_{12}(0) = 0$.
Appendix 3

Identification of a mixed process for n = 2

Let us consider the mixed process $(Y^*_t)$ with $n_1 = n_2 = 1$ and the autocovariances given in (3.3)-(3.4).

i) From Proposition 5, it follows that this process has the same second-order properties as another mixed process, where the first component is noncausal and associated with the inverse $J_1^{-1}$, and the second one is causal and associated with $J_2^{-1}$.

ii) Let us now consider another mixed representation of process $(Y^*_t)$, with the matrix $A^{-1}$ defining the new causal and noncausal components. The autocovariance of the new causal component is given by:
\[
\tilde{\gamma}_{1,1}(h) = (a^{11}, a^{12}) \Gamma(h) \begin{pmatrix} a^{11} \\ a^{12} \end{pmatrix}
= (a^{11})^2 \frac{\sigma^*_{11}}{1 - J_1^2} J_1^{|h|} + (a^{12})^2 \frac{\sigma^*_{22}}{J_2^2 - 1} J_2^{-|h|}
+ \frac{a^{11} a^{12}}{1 - J_1 J_2} \sigma^*_{12} (J_1^{|h|} - J_2^{-|h|}).
\]
This autocovariance has the exponential form (3.3) only if it is proportional to either $J_1^{|h|}$ or $J_2^{-|h|}$. Without loss of generality, we consider the case when it is proportional to $J_1^{|h|}$ (by applying part i) above). We have:
\[
\tilde{\gamma}_{1,1}(h) = \tilde{\gamma}_{1,1}(0) J_1^{|h|} \iff a^{12} \Big[ a^{12} \frac{\sigma^*_{22}}{J_2^2 - 1} - \frac{a^{11}}{1 - J_1 J_2} \sigma^*_{12} \Big] = 0.
\]
When $a^{12} = 0$, we get the initial causal component $(Y^*_{1,t})$ (defined up to a multiplicative scalar). Otherwise, we can assume $\tilde{a}^{12} = 1$ and obtain:
\[
\tilde{a}^{11} = \frac{\sigma^*_{22}}{J_2^2 - 1} \, \frac{1 - J_1 J_2}{\sigma^*_{12}}, \qquad (a.1)
\]
whenever $\sigma^*_{12} \neq 0$. By symmetry, we can consider the autocovariance of the new second component, that is,
\[
\tilde{\gamma}_{2,2}(h) = (a^{21}, a^{22}) \Gamma(h) \begin{pmatrix} a^{21} \\ a^{22} \end{pmatrix}.
\]
It is proportional to $J_2^{-|h|}$ if and only if either $a^{21} = 0$, or $\tilde{a}^{21} = 1$ and
\[
\tilde{a}^{22} = \frac{\sigma^*_{11}}{1 - J_1^2} \, \frac{1 - J_1 J_2}{\sigma^*_{12}}. \qquad (a.2)
\]
Finally, the cross-covariance between the new components is
\[
\tilde{\gamma}_{1,2}(h) = (a^{11}, a^{12}) \Gamma(h) \begin{pmatrix} a^{21} \\ a^{22} \end{pmatrix}.
\]
This cross-covariance has to be zero for $h < 0$. This condition is equivalent to:
\[
\tilde{\gamma}_{1,2}(h) = 0 \iff \tilde{a}^{11} \tilde{a}^{21} \frac{\sigma^*_{11}}{1 - J_1^2} J_1^{|h|} + \tilde{a}^{12} \tilde{a}^{22} \frac{\sigma^*_{22}}{J_2^2 - 1} J_2^{-|h|}
+ \frac{\tilde{a}^{12} \tilde{a}^{21}}{1 - J_1 J_2} \sigma^*_{12} (J_1^{|h|} - J_2^{-|h|}) = 0, \quad \forall h \leq 0
\]
\[
\iff \left\{
\begin{array}{l}
\tilde{a}^{11} \dfrac{\sigma^*_{11}}{1 - J_1^2} + \dfrac{\sigma^*_{12}}{1 - J_1 J_2} = 0, \\[2mm]
\tilde{a}^{22} \dfrac{\sigma^*_{22}}{J_2^2 - 1} + \dfrac{\sigma^*_{12}}{1 - J_1 J_2} = 0.
\end{array}
\right. \qquad (a.3)
\]
We note that the conditions (a.3) are not compatible with the conditions (a.1)-(a.2). To summarize, if $\sigma^*_{12} \neq 0$, the only identification problem for the mixed process is that it cannot be distinguished from the process with all eigenvalues replaced by their inverses, discussed in part i).
Appendix 4

Covariance Estimators

The asymptotic theory of covariance estimators is similar to the asymptotic theory of moment estimators. The main difference is that the covariance condition
\[
Cov(X, Y) = 0 \iff E(XY) - E(X) E(Y) = 0
\]
involves both a moment and a product of moments, except if one of the variables has mean zero. This appendix explains how the standard results need to be adjusted for the presence of a product of moments.

A.4.1 Expansion of a sample covariance

Let us consider a bivariate strictly stationary process $(X_t, Z_t)$, $t = 1, \ldots, T$, with finite variance. The empirical covariance can be expanded for large $T$ as:
\[
\sqrt{T} [\widehat{Cov}_T(X, Z) - Cov(X, Z)]
= \sqrt{T} \Big\{ \frac{1}{T} \sum_{t=1}^{T} [X_t Z_t - E(XZ)] - \frac{1}{T} \sum_{t=1}^{T} X_t \, \frac{1}{T} \sum_{t=1}^{T} Z_t + E(X) E(Z) \Big\}
\]
\[
= \sqrt{T} \Big\{ \frac{1}{T} \sum_{t=1}^{T} [X_t Z_t - E(XZ)] - \frac{1}{T} \sum_{t=1}^{T} (X_t - E(X)) E(Z) - \frac{1}{T} \sum_{t=1}^{T} (Z_t - E(Z)) E(X) \Big\} + o_p(1)
\]
\[
= \frac{1}{\sqrt{T}} \sum_{t=1}^{T} [(X_t - E(X))(Z_t - E(Z)) - Cov(X, Z)] + o_p(1).
\]
This expansion can be used to compute the asymptotic variance of an empirical covariance, as well as the asymptotic covariance between two sample covariances. For example, if the $(X_t, Z_t)$ are i.i.d., we get:
\[
V_{asy}\{ \sqrt{T} [\widehat{Cov}_T(X, Z) - Cov(X, Z)] \} = V[(X - EX)(Z - EZ)],
\]
\[
Cov_{asy}\{ \sqrt{T} [\widehat{Cov}_T(X, Z) - Cov(X, Z)], \sqrt{T} [\widehat{Cov}_T(U, V) - Cov(U, V)] \}
= Cov[(X - EX)(Z - EZ), (U - EU)(V - EV)].
\]
As is common practice, for serially correlated variables, additional terms need to be introduced for the different lags.

A.4.2 The Generalized Covariance (GCov) estimator

The definition of the generalized covariance estimator is similar to that of the GMM estimator [Hansen (1982), Hansen, Singleton (1982)]. By analogy, we proceed in two steps and first define a consistent covariance estimator with a simple weighting scheme. That first-step estimator may not be fully (semi-parametrically) efficient. In the second step, we select optimal weights, based on the first-step consistent estimator. Below, we describe the second step.

We denote by $\tilde{\phi}_T$ the consistent first-step estimator of $\phi = \text{vec}\, \Phi'$. Let $\tilde{Y}_t = (Y_t', \ldots, Y_{t-H}')'$ denote the history of the process at date $t$ up to the highest lag $H$, and $\Psi_l(\tilde{Y}_t, \phi)$, $l = 1, \ldots, KH$, the functions $a_k(Y_{t-h} - \Phi Y_{t-h-1})$. With each covariance $Cov[\Psi_k(\tilde{Y}_t, \phi), \Psi_l(\tilde{Y}_t, \phi)]$, $k, l = 1, \ldots, KH$, can be associated its sample counterpart evaluated at the first-step estimator value, that is:
\[
\hat{\gamma}_{k,l,T} = \widehat{Cov}[\Psi_k(\tilde{Y}_t, \tilde{\phi}_T), \Psi_l(\tilde{Y}_t, \tilde{\phi}_T)], \qquad (a.1)
\]
and this quantity can be expanded about the true value $\phi$, say. We get:
\[
\sqrt{T} \hat{\gamma}_{k,l,T} = \sqrt{T} \widehat{Cov}(\Psi_k(\tilde{Y}_t, \tilde{\phi}_T), \Psi_l(\tilde{Y}_t, \tilde{\phi}_T))
= \Big[ -\frac{\partial}{\partial \phi'} \widehat{Cov}(\Psi_k(\tilde{Y}_t, \phi), \Psi_l(\tilde{Y}_t, \phi)) \Big]_{\phi = \tilde{\phi}_T} \sqrt{T} (\phi - \tilde{\phi}_T) + u_{k,l,T}
\]
\[
= D_{k,l} \sqrt{T} (\phi - \tilde{\phi}_T) + u_{k,l,T}, \; \text{say}, \quad \forall (k, l), \qquad (a.2)
\]
where the error term is such that $E_{asy}(u_{k,l,T}) = 0$, $\forall (k, l)$, and
\[
Cov_{asy}(u_{k,l,T}, u_{k',l',T})
= Cov_{asy}\Big\{ \sqrt{T} \widehat{Cov}(\Psi_k(\tilde{Y}_t, \phi), \Psi_l(\tilde{Y}_t, \phi)), \; \sqrt{T} \widehat{Cov}(\Psi_{k'}(\tilde{Y}_t, \phi), \Psi_{l'}(\tilde{Y}_t, \phi)) \Big\}
= \sigma_{(k,l),(k',l')}, \; \text{say}.
\]
The system (a.2) can be written in vector form as:
\[
\sqrt{T} \hat{\gamma}_T = D_T \sqrt{T} (\phi - \tilde{\phi}_T) + u_T, \qquad (a.3)
\]
with $E u_T = 0$, $Var(u_T) = \Sigma$. We get asymptotically a linear model in the parameter $\phi$, which can be estimated by Feasible Generalized Least Squares. The Feasible GLS estimator is:
\[
\hat{\phi}_T = \tilde{\phi}_T + (D_T' \hat{\Sigma}_T^{-1} D_T)^{-1} D_T' \hat{\Sigma}_T^{-1} \hat{\gamma}_T, \qquad (a.4)
\]
where $\hat{\Sigma}_T$ is a consistent estimator of $\Sigma$. Then, we can apply a standard approach to derive the following proposition [see e.g. Gourieroux, Monfort (1995), Sections 9.5.2, 9.5.3 and Property 9.11 for the analogous GMM result].

Proposition: The GCov estimator is consistent and asymptotically normal, with asymptotic variance:
\[
V_{asy}[\sqrt{T} (\hat{\phi}_T - \phi)] = (D' \Sigma^{-1} D)^{-1}.
\]
The matrices involved in the expression of $\hat{\phi}_T$ and of its asymptotic distribution can be expressed as follows. We have:
\[
\frac{\partial}{\partial \phi'} Cov[\Psi_k(\tilde{Y}_t, \phi), \Psi_l(\tilde{Y}_t, \phi)]
= Cov\Big[ \frac{\partial}{\partial \phi'} \Psi_k(\tilde{Y}_t, \phi), \Psi_l(\tilde{Y}_t, \phi) \Big]
+ Cov\Big[ \frac{\partial \Psi_l}{\partial \phi'} (\tilde{Y}_t, \phi), \Psi_k(\tilde{Y}_t, \phi) \Big].
\]
Since $\Psi_l(\tilde{Y}_t, \phi)$ is of the form $\Psi_l(\tilde{Y}_t, \phi) = a_k(Y_{t-h} - \Phi Y_{t-h-1})$, the elements $\sigma_{(k,l),(k',l')}$ can also be written by using the results of Section A.4.1 of this appendix. They are consistently estimated by replacing the theoretical variances by their sample counterparts, the parameters by their estimates, and the errors by the associated residuals.
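The second-step update (a.4) is a standard feasible GLS correction; in matrix form it is a one-line computation once $\hat{\gamma}_T$, $D_T$ and $\hat{\Sigma}_T$ are available. A hedged numpy sketch (the inputs would come from the first-step estimator and from numerical derivatives of the sample covariances):

```python
import numpy as np

def gcov_second_step(phi_tilde, gamma_hat, D, Sigma_hat):
    """Feasible GLS update (a.4): phi_hat = phi_tilde + (D' S^-1 D)^-1 D' S^-1 gamma_hat."""
    S_inv = np.linalg.inv(Sigma_hat)
    return phi_tilde + np.linalg.solve(D.T @ S_inv @ D, D.T @ S_inv @ gamma_hat)
```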
[Figure 1: Simulated $Y_t$; $Y_{1t}$: solid line, $Y_{2t}$: dashed line]
[Figure 2: Autocorrelation Function of $Y_t$; panels: Series 1, Series 1 and Series 2, Series 2 and Series 1, Series 2]
[Figure 3: Autocorrelation Function of $Y^*_t$; panels: Series 1, Series 1 and Series 2, Series 2 and Series 1, Series 2]
•
•
•
0.20
•
•
0.15
• •
0.10
density
•
• •
0.05
• •
•
0.0
• ••••••••••••••••••• −15
−10
−5
•
•
•
•
0
••• •••••••••• 5
10
15
epsilon
Figure 4: Empirical Density of GCov Residuals dotted line and True Errors dot symbols (causal component)
41
[Figure 5: Empirical Density of GCov Residuals (dotted line) and True Errors (dot symbols), noncausal component]
[Figure 6: GCov Coefficient Estimates: T=100 (dashed line), T=200 (dotted line)]
[Figure 7: GCov Eigenvalues Estimates: T=100 (dashed line), T=200 (dotted line)]
[Figure 8: Evolution of Commodity Futures; panels: US Corn Futures, US Soybean Futures]
[Figure 9: Adjusted Data (corn and soybean)]
[Figure 10: ACF, Adjusted Data]
[Figure 11: ACF, Squared Adjusted Data]
[Figure 12: ACF, VAR(3) Residuals]
[Figure 13: ACF, Squared VAR(3) Residuals]