Marginal Likelihood based Tests on Cointegration in the Engle-Granger Model
February 15, 2007
Marc K. Francke and Aart F. de Vos Vrije Universiteit Amsterdam, Department of Econometrics, De Boelelaan 1105, NL-1081 HV Amsterdam
Preliminary version, written for the EEA/ESEM Congress, Budapest, Hungary, August 27 - 31, 2007
Abstract
In this article we study the virtues of marginal likelihood in bivariate cointegration tests as known from the seminal article of Engle and Granger (1987). These tests are a special case of unit root tests. In Francke and de Vos (2007) we show that some marginal likelihood based unit root tests are almost uniformly most powerful. First we show that if one of the variables is exogenous, marginalization leads to very powerful tests. In the bivariate cointegration model two marginal likelihood based approaches are reasonably successful: one based on the residuals and one based on the marginal likelihood of the bivariate model.
Keywords: Hypothesis tests; Unit root.
1 Introduction
In this article we study the virtues of marginal likelihood in bivariate cointegration tests as known from the seminal article of Engle and Granger (1987), henceforth EG. These tests are a special case of unit root tests. In Francke and de Vos (2007) we show that unit root tests based on marginal likelihood perform quite well, specifically in first-order autoregressive [AR(1)] models with a constant and also when a trend is involved, and that they are almost uniformly most powerful. More complex stochastic structures can also be tackled with marginal likelihood, and extension to the linear model is straightforward. We treat the case where one of the variables is exogenous and the EG example where this is not the case.

If an explanatory variable (x) is exogenous, marginal likelihood is directly applicable. We show that if x follows a random walk the tests perform about as well as in the case without explanatory variables. Specific to this case is that one can do tests conditional on x and, more in line with the cointegration literature, unconditional on x. The unconditional tests appear to perform almost as well as the conditional tests and have the advantage that the critical values need not be calculated (by simulation) in each case.

More complex stochastic structures can be tackled by model choice based on maximum marginal likelihood. Loss of power is unavoidable, but it is smaller than with other methods. Alternatively, the asymptotic distribution of the marginal likelihood ratio, which for any model depends on only one parameter (the ratio of the unconditional and long term variance), may be used to adjust the marginal likelihood ratio. Unfortunately, known methods to estimate this parameter are too imprecise to give reasonable results in samples as large as 100.

If two series x and y cointegrate and neither of them is exogenous, marginalization with respect to one of them fails. A reasonable solution is the usual estimation of the cointegration parameter by Ordinary Least Squares (OLS) and application of the univariate marginal likelihood tests to the residuals. The outcomes are better than those of the usual tests.

The most promising result is that the bivariate marginal likelihood exists. This does not directly lead to powerful tests, as information on a key parameter (defining the combination that forms the common trend) appears to be weak. For 100 observations, our key example, maximum likelihood estimates of this parameter are imprecise. More study is needed. The existence of the likelihood also offers a bridge to Bayesian analysis.

In section 2.1 we give a summary of marginal likelihood. Section 2.2 gives the specification suited for unit root tests and details for linear models with AR(1) disturbances. Section 3.1 summarizes the tests for the models AR(1) plus constant and constant plus trend. Section 3.2 provides the results on these models when more complex stochastic structures are involved. Section 4 gives the results for the cointegration context: first the case of an exogenous explanatory variable, next the tests based on OLS estimation of the cointegrating relation, and finally the bivariate marginal likelihood.
2 Classical marginal likelihood

2.1 Marginal likelihood in the general linear model
The concept of classical marginal likelihood was introduced by Kalbfleisch and Sprott (1970). For the linear model it is used in the context of unbalanced incomplete block designs by Patterson and Thompson (1971), who refer to it as the likelihood of error contrasts. The use of the classical marginal likelihood is limited to location and scale parameters and some other applications, which may explain why it has remained relatively unknown. The marginal likelihood is the likelihood of a transformation of the data, and it is independent of the nuisance parameters. The generally applicable transformation is $y^* = A'y / \sqrt{y'AA'y}$, leading to

\[ L_{M\beta,\sigma}(\theta) := f(y^*|\theta) = \frac{1}{2}\, \frac{\Gamma(m/2)\, |X'X|^{1/2}}{\pi^{m/2}\, |X'\Omega^{-1}X|^{1/2}\, |\Omega|^{1/2}} \left( \frac{y'\Omega^{-1}M_{X\Omega}\, y}{y'M_X y} \right)^{-m/2}, \qquad (1) \]

an $(m-1)$ dimensional marginal likelihood, independent of $\beta$ and $\sigma^2$, where $A$ is an $n \times m$ matrix, $m = n - k$, $A'X = 0$, $r(A) = m$, $A'A = I_m$, $AA' = M_X$ and $M_{X\Omega} = I - X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}$, see King (1980).
The likelihood $f(y|\theta,\sigma^2,\beta)$ is divided into two parts, the marginal likelihood and its complement,

\[ f(y|\theta,\sigma^2,\beta) = f(y^*|\theta)\, f(B^{*\prime}y|\theta,\sigma^2,\beta). \qquad (2) \]

The claim is that the marginal likelihood contains all information relevant for inference on $\theta$ in the absence of knowledge on the nuisance parameters. There appears to be no loss of information on $\theta$ by using $y^*$ in place of $y$, though it is difficult to give a totally satisfactory justification of this claim, see McCullagh and Nelder (1989). Stated otherwise, the complement of the marginal likelihood, $B^{*\prime}y$, contains no information on $\theta$. Bernardo and Smith (1994, p. 481) criticize this point and call it a “highly controversial notion ... for which no operational definition has ever been provided”. King (1980) shows that $y^*$ is a maximal invariant under the group of transformations

\[ y \to \eta_0 y + X\eta, \qquad (3) \]

where $\eta_0$ is a positive scalar and $\eta$ is a $k \times 1$ vector. The principle of invariance implies that we can treat the maximal invariant $y^*$ as the observed random vector and (1) as its density function, and therefore as a likelihood function for $\theta$, see Rahman and King (1997). We state it as the marginalization axiom.
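As a short worked check (our own addition, using only the definitions given below (1)), the data-dependent ratio in (1) is unchanged under the transformations (3). Since $M_X X = 0$, $M_{X\Omega} X = 0$ and $X'\Omega^{-1}M_{X\Omega} = 0$:

```latex
\begin{aligned}
M_X(\eta_0 y + X\eta) &= \eta_0 M_X y, \\
(\eta_0 y + X\eta)'\,\Omega^{-1}M_{X\Omega}\,(\eta_0 y + X\eta)
  &= \eta_0^2\; y'\Omega^{-1}M_{X\Omega}\,y, \\
\frac{(\eta_0 y + X\eta)'\,\Omega^{-1}M_{X\Omega}\,(\eta_0 y + X\eta)}
     {(\eta_0 y + X\eta)'\,M_X\,(\eta_0 y + X\eta)}
  &= \frac{y'\Omega^{-1}M_{X\Omega}\,y}{y'M_X y}.
\end{aligned}
```

The factor $\eta_0^2$ cancels between numerator and denominator, so (1) depends on $y$ only through quantities that are invariant under (3).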
2.2 Marginal likelihood for linear models with AR(1) disturbances
Consider the first-order autoregressive model

\[ y_t = \mu + x_t'\beta + u_t, \qquad (4) \]
\[ u_{t+1} = \rho u_t + v_t, \quad t = 1, \dots, n, \qquad (5) \]
\[ u_1 \begin{cases} = \xi & \text{for } \rho = 1, \\ \sim N\!\left(0, \sigma_v^2/(1-\rho^2)\right) & \text{for } |\rho| < 1, \end{cases} \qquad (6) \]

where $v_t$ is a potentially serially correlated stationary process with standard normality assumptions, $x_t$ is a $(k-1) \times 1$ vector, and $\xi$ is an unknown scalar. In the model where the $v_t = \varepsilon_t$ are independent, for $|\rho| < 1$, the marginal likelihood (with respect to $\mu$ and $\beta$) is $m = n - k$ dimensional. When $\rho = 1$, it is a specification in first differences, $y_t - y_{t-1} = (x_t - x_{t-1})\beta + \varepsilon_{t-1}$, for $t = 2, \dots, n$; after marginalization with respect to $\beta$ the marginal likelihood is also $m$ dimensional. If we define the following,
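
\[
\begin{aligned}
y_t(\rho) &= y_t - \rho y_{t-1}, &
\Sigma_{yy,\rho} &= (1-\rho^2)\, y_1^2 + \sum_{t=2}^{n} y_t(\rho)^2, \\
x_t(\rho) &= x_t - \rho x_{t-1}, &
\Sigma_{xy,\rho} &= (1-\rho^2)\, x_1' y_1 + \sum_{t=2}^{n} x_t(\rho)' y_t(\rho), \\
\Sigma_{y,\rho} &= y_1 + (1-\rho)\sum_{t=2}^{n-1} y_t + y_n, &
\Sigma_{xx,\rho} &= (1-\rho^2)\, x_1' x_1 + \sum_{t=2}^{n} x_t(\rho)' x_t(\rho), \\
\Sigma_{x,\rho} &= x_1 + (1-\rho)\sum_{t=2}^{n-1} x_t + x_n, &
F &= g(\rho) - (1-\rho)\, \Sigma_{x,\rho} \Sigma_{xx,\rho}^{-1} \Sigma_{x,\rho}', \\
\hat\mu &= F^{-1}\left( \Sigma_{y,\rho} - \Sigma_{x,\rho} \Sigma_{xx,\rho}^{-1} \Sigma_{xy,\rho} \right), &
\hat\beta &= \Sigma_{xx,\rho}^{-1} \Sigma_{xy,\rho} - (1-\rho)\, \Sigma_{xx,\rho}^{-1} \Sigma_{x,\rho}'\, \hat\mu, \\
\tilde X &= \left( I_n - \tfrac{1}{n}\, i i' \right) X, &
g(\rho) &= n - (n-2)\rho,
\end{aligned}
\]

then the residual sum of squares is provided by

\[ RSS_{\mu,\beta}(\rho) = \Sigma_{yy,\rho} - (1-\rho)\, \Sigma_{y,\rho}\, \hat\mu - \Sigma_{xy,\rho}'\, \hat\beta, \qquad (7) \]

and for $|\rho| < 1$ the marginal loglikelihood $\ell_{M\beta}(\rho, \sigma^2)$ can be expressed as

\[ -2\ell_{M\beta}(\rho, \sigma^2) = m \ln 2\pi\sigma^2 + \ln \frac{g(\rho)}{n(1+\rho)} - \ln |\tilde X'\tilde X| + \ln \left| \Sigma_{xx,\rho} - \frac{1-\rho}{g(\rho)}\, \Sigma_{x,\rho}' \Sigma_{x,\rho} \right| + \sigma^{-2} RSS_{\mu,\beta}(\rho). \qquad (8) \]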
The existence of the marginal likelihood in $\rho = 1$ follows from the fact that the marginal likelihood of $A'y$ is proportional to the likelihood of $B'D'y$, where $D$ is the matrix of first differences and $B$ a matrix of full column rank such that $B'(D'X) = 0$. Now $B'D'y \sim N(0, B'D'\Omega DB)$, where $\Omega$ is the covariance matrix of the AR(1) process, and $D'\Omega D$ is well defined for $-1 < \rho \le 1$. The difference between the marginal likelihood and the likelihood of $B'D'y$ is a Jacobian term $\ln|\Sigma_{xx,1}| - \ln|\tilde X'\tilde X| - \ln n$. Note that for $\rho = 1$ this transformation implies that minus twice the loglikelihood of $B'D'y$ equals $m \ln 2\pi\sigma^2 + \sigma^{-2} RSS_{\mu,\beta}(1)$.

Substitution of $\hat\sigma^2_{ML} = RSS_{\mu,\beta}(\rho)/m$ in (8) gives $\ell_{M\beta}(\rho, \hat\sigma^2_{ML}) \propto \ell_{M\beta,\sigma}(\rho)$, the marginal likelihood for $\rho$. Unlike the profile likelihood, this marginal likelihood is also defined for and continuous in $\rho = 1$, and is therefore suited as a basis for unit root tests.
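To make (7) and (8) concrete, the following is a minimal sketch for the simplest case, the AR(1) model with a constant only (no $x_t$, so $k = 1$ and the two determinant terms involving $x$ drop out of (8)). The function names `rss_constant` and `marginal_loglik` are ours, not from the article, and $\sigma^2$ is concentrated out by $RSS/m$ as described above.

```python
import math

def rss_constant(y, rho):
    """RSS_mu(rho) from (7) for the model with a constant only (no x_t)."""
    n = len(y)
    # Sigma_{yy,rho} = (1 - rho^2) y_1^2 + sum_{t=2}^n (y_t - rho y_{t-1})^2
    s_yy = (1 - rho**2) * y[0]**2 + sum((y[t] - rho * y[t - 1])**2 for t in range(1, n))
    # Sigma_{y,rho} = y_1 + (1 - rho) sum_{t=2}^{n-1} y_t + y_n
    s_y = y[0] + (1 - rho) * sum(y[1:n - 1]) + y[n - 1]
    g = n - (n - 2) * rho            # g(rho)
    mu_hat = s_y / g                 # without x_t, F reduces to g(rho)
    return s_yy - (1 - rho) * s_y * mu_hat

def marginal_loglik(y, rho):
    """Concentrated marginal loglikelihood for rho, valid for -1 < rho <= 1."""
    n = len(y)
    m = n - 1                        # m = n - k with k = 1 (constant only)
    g = n - (n - 2) * rho
    rss = rss_constant(y, rho)
    # (8) with sigma^2 replaced by RSS/m; the x-dependent determinants are absent
    minus2l = m * math.log(2 * math.pi * rss / m) + math.log(g / (n * (1 + rho))) + m
    return -0.5 * minus2l
```

At $\rho = 1$ the residual sum of squares collapses to the sum of squared first differences, and the function stays finite and continuous there, which is the property the tests in the next section rely on.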
3 Unit root tests

3.1 Some basic models
Many unit root tests are described in the econometric literature for the model with a constant, and for the model with a constant and linear trend. The classics are those developed in Fuller (1976) and Dickey and Fuller (1979). More recent tests are, for example, the $P_T$ test by Elliott, Rothenberg, and Stock (1996) and the $Q_T$ tests by Elliott (1999). We propose two new ones, based on marginal likelihood. Define $\hat\rho_{ML} = \arg\max \ell_{M\beta,\sigma}(\rho)$, the maximum marginal likelihood estimator of $\rho$. The first test is a marginal loglikelihood difference test, evaluated in $\hat\rho_{ML}$, and is given by

\[ T_1 = \ell_{M\beta,\sigma}(\hat\rho_{ML}) - \ell_{M\beta,\sigma}(\rho = 1). \qquad (9) \]

The second test is

\[ T_2 = \hat\gamma_{ML} = n(1 - \hat\rho_{ML}), \qquad (10) \]

where the local-to-unit root format is used in order to have a useful asymptotic framework to analyse power. In both tests the null hypothesis of a unit root is rejected if the test statistic exceeds the critical value, which depends on the size $\alpha$ and the number of observations $n$.

Test $T_1$ is based on the marginal likelihood ratio (MLR). In terms of $\gamma = n(1-\rho)$ the logarithm of the MLR is provided by

\[ \ln(\mathrm{MLR}(\gamma)) = -\frac{1}{2}\left[ \ln h(\gamma) + m \ln \frac{RSS_{\mu,\beta}(\gamma)}{RSS_{\mu,\beta}(\gamma = 0)} \right], \qquad (11) \]

where $h(\gamma)$ is the ratio of the determinant terms $|X'X|$, $|X'\Omega^{-1}X|$, and $|\Omega|$ under the null and alternative hypothesis. Test $T_1$ can equivalently be formulated as $T_1 = \ln \mathrm{MLR}(\hat\gamma_{ML})$. $T_1$ is an optimal invariant procedure: it depends on a function of a maximal invariant, see Lehmann (1986).

In Francke and de Vos (2007) critical values are calculated for the model with constant and the model with constant and trend. These values are provided in tables 1 and 2. It is also shown that the power functions almost coincide with the power envelope, even in small samples ($n = 25$).
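The two statistics (9) and (10) can be sketched as follows for the AR(1) model with constant. This is an illustration under our own assumptions, not the authors' implementation: the maximisation uses a plain grid on $(-1, 1]$ (the grid range and size are arbitrary choices), additive constants of the loglikelihood are dropped because they cancel in $T_1$, and `simulate_ar1` is a hypothetical helper for generating test data.

```python
import math
import random

def marginal_loglik(y, rho):
    """Concentrated marginal loglikelihood (constant only), up to an additive constant."""
    n = len(y)
    m = n - 1
    g = n - (n - 2) * rho
    s_yy = (1 - rho**2) * y[0]**2 + sum((y[t] - rho * y[t - 1])**2 for t in range(1, n))
    s_y = y[0] + (1 - rho) * sum(y[1:n - 1]) + y[-1]
    rss = s_yy - (1 - rho) * s_y**2 / g
    return -0.5 * (m * math.log(rss / m) + math.log(g / (n * (1 + rho))))

def unit_root_tests(y, grid_size=400, rho_min=-0.99):
    """Return (T1, T2, rho_hat) from (9) and (10) using a grid search over rho."""
    n = len(y)
    step = (1.0 - rho_min) / grid_size
    grid = [rho_min + j * step for j in range(grid_size)] + [1.0]  # rho = 1 included exactly
    rho_hat = max(grid, key=lambda r: marginal_loglik(y, r))
    t1 = marginal_loglik(y, rho_hat) - marginal_loglik(y, 1.0)
    t2 = n * (1 - rho_hat)
    return t1, t2, rho_hat

def simulate_ar1(n, rho, mu=0.0, seed=0):
    """Hypothetical helper: stationary Gaussian AR(1) plus constant, |rho| < 1."""
    rng = random.Random(seed)
    u = rng.gauss(0.0, 1.0 / math.sqrt(1 - rho**2))
    y = []
    for _ in range(n):
        y.append(mu + u)
        u = rho * u + rng.gauss(0.0, 1.0)
    return y
```

For a series of length $n = 100$, the unit root would be rejected at the 5 percent level when `t1` exceeds 1.799 or `t2` exceeds 10.8, the values reported in table 1.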
3.2 Serial correlation
Until now we have explored tests for unit roots under the assumption that the error term $v_t$ in (5) is serially uncorrelated. In practice this assumption is often not valid. It is well known that misspecification of $v_t$ has serious consequences for unit root tests. Notoriously difficult is the ARMA(1,1) (autoregressive moving average) model, where $v_t = \varepsilon_t - \theta\varepsilon_{t-1}$. Applying the AR(1) marginal likelihood based tests when the data are generated by an ARMA(1,1) process results in very serious size distortions. This is already the case in the model without explanatory variables. For instance, the MLR test $T_1$, when $\theta = 0.5$ and $n = 100$, has a size of 0.62 when the 5 percent critical value (1.799 from table 1) is used.

There are two ways to cope with serial correlation in $v_t$ within the context of marginal likelihood. The first is to apply standard likelihood methods for model selection and estimation. The marginal likelihood is well defined for relevant models. If in (5) $v_t$ is a stationary ARMA(p,q) process, the marginal likelihood is also defined when $\rho = 1$, as $\Delta y_t = \Delta x_t \beta + \Delta u_t$, and $\Delta u_t = [(1-L)/(1-\rho L)]\, v_t$ is stationary for $-1 < \rho \le 1$. Computations can be done efficiently by the exact initial Kalman filter, see Koopman (1997), and the diffuse Kalman filter, see De Jong (1988) and De Jong (1991). Numerical problems in the computations of the Kalman filter for $\rho$ near 1 can be avoided by formulating the model directly in first differences. Model selection can be done by the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC) or other criteria. Conditional on the chosen model an MLR test may be applied to the unit root hypothesis.

The second approach is based on the asymptotic distribution of the MLR (11) under serial correlation. The asymptotically relevant parameters are the long term variance $\omega^2$ and the unconditional variance of $v_t$, $\gamma_v(0)$. In Francke and de Vos (2007) it is shown that asymptotically, apart from $\gamma$, the MLR only depends on one parameter, $\kappa = \omega^2/\gamma_v(0)$. Moreover, it is shown for the model with constant (and trend) that

\[ \ln h(\gamma) + \kappa^{-1}\left( m \ln \frac{RSS_{\mu,\beta}(\gamma)}{RSS_{\mu,\beta}(\gamma = 0)} + \gamma \right) - \gamma \qquad (12) \]

converges to the distribution that is known from the AR(1) model where the $v_t$ are i.i.d. Gaussian with variance $\sigma^2$. Consequently, for these models the adjusted residual sum of squares may be used in combination with the critical values computed for the marginal likelihood tests without correlation. So, asymptotically, procedures based on estimation of $\kappa$ will be robust. Consistent estimation of $\kappa$ is possible, though outside the likelihood context, by ordinary least squares on

\[ \Delta y_t = \alpha + \delta_0 y_{t-1} + x_t'\beta + \sum_{i=1}^{l} \delta_i \Delta y_{t-i} + \eta_t, \qquad (13) \]

where $l$ is the number of lags, following the approach of Elliott, Rothenberg, and Stock (1996). It follows that

\[ \hat\kappa = \frac{\hat\omega^2}{\hat\gamma_v(0)} = \sum_{t=l+2}^{n} \hat\eta_t^2 \left( 1 - \sum_{i=1}^{l} \hat\delta_i \right)^{-2} \Bigg/ \sum_{t=l+2}^{n} \left( \Delta y_t - \hat\alpha - \hat\delta_0 y_{t-1} \right)^2, \qquad (14) \]

where $\hat\omega^2$ is an estimate of the spectral density at frequency 0, and $\hat\gamma_v(0)$ is an estimate of the unconditional variance of $v_t$. Lag length selection ($l$) can be done by information criteria. Ng and Perron (2001) show that AIC and BIC tend to select a lag that is too small, and they propose a modified information criterion. Alternatively, nonparametric methods, using for instance a Bartlett window, can be applied to obtain an estimate of $\omega^2$. Conditional on the estimate of $\kappa$ the maximum marginal likelihood estimator of $\gamma$ can be computed from (12).
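The estimator (14) can be sketched in a few lines for the model without explanatory variables $x_t$. The code below is a plain illustration under our own assumptions: `solve` and `kappa_hat` are hypothetical names, the regression (13) is fitted through the normal equations with a small Gaussian elimination routine, and the lag length is passed in by the caller rather than chosen by an information criterion.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, k):
            f = aug[r][c] / aug[c][c]
            for j in range(c, k + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * k
    for c in reversed(range(k)):
        x[c] = (aug[c][k] - sum(aug[c][j] * x[j] for j in range(c + 1, k))) / aug[c][c]
    return x

def kappa_hat(y, lags):
    """Estimate kappa as in (13)-(14) for a model with a constant and no x_t."""
    n = len(y)
    dy = [y[t] - y[t - 1] for t in range(1, n)]   # dy[j] is Delta y at 0-based time j + 1
    rows, target = [], []
    for t in range(lags + 1, n):                  # corresponds to t = l + 2, ..., n (1-based)
        rows.append([1.0, y[t - 1]] + [dy[t - 1 - i] for i in range(1, lags + 1)])
        target.append(dy[t - 1])
    k = lags + 2
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    coef = solve(xtx, xty)                        # (alpha, delta_0, delta_1, ..., delta_l)
    alpha, delta0, deltas = coef[0], coef[1], coef[2:]
    eta = [v - sum(c * x for c, x in zip(coef, r)) for r, v in zip(rows, target)]
    numerator = sum(e * e for e in eta) / (1.0 - sum(deltas))**2
    denominator = sum((v - alpha - delta0 * r[1])**2 for r, v in zip(rows, target))
    return numerator / denominator
```

With `lags = 0` the numerator and denominator of (14) coincide, so the estimate is exactly one, which matches $\kappa = 1$ for serially uncorrelated $v_t$.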
4 Cointegration tests

4.1 Unit root tests with a random walk as explanatory variable
The model

\[ y_t = \mu + x_t\beta + u_t, \qquad (15) \]
\[ x_t = x_{t-1} + \eta_t, \qquad (16) \]

where $\eta_t \sim N(0, \sigma_\eta^2)$ and $u_t$ is specified as in (5) and (6), is a special case of cointegration: $x_t$ is exogenous. For $\rho = 1$ the series do not cointegrate, for $\rho < 1$ they do. In some applications of the cointegration methodology of EG it seems reasonable to assume that $x_t$ is exogenous. An example is income in a consumption relation: though it may be argued that there is feedback as well, this is weak. We will show that marginalization with respect to $x$ works quite well in this context and compare it with the methodology of EG. In later sections we will show, however, that in the case of cointegration as defined in EG, where $x$ is not exogenous, marginalization is a bad idea. The basics of a test on exogeneity in the EG context will be given at the end of this paper.

It is straightforward to obtain a test for $\rho = 1$ conditional on $x_t$ based on the marginal likelihood in (8). This is a slightly different context, as inference is conditional on $x$. An important advantage of conditioning is that inference on $\rho$ no longer depends on the correct model specification of $x$. A disadvantage is that for each realization of $x$ a new critical value of the test must be computed. We found, however, that as long as $x$ is generated as a random walk, the critical values hardly depend on the specific realization of $x$. This implies that for most practical purposes one may use the unconditional critical values, which we computed by generating a new $x$ in each draw. This confirms the validity of an unconditional setup as used by EG.

The marginal likelihood based tests (9) and (10) have been presented in the previous section for the model with constant (and trend). We use the same tests for the case that $x_t$ is a random walk. Critical values are computed from the values of $T_1$ and $T_2$, generating $x_t$ and $u_t$ as independent random walks, different in each replication. Table 3 provides critical values for the $T_1$, $T_2$, DW, and DF statistics for $n = 100$ and $n = 1{,}000$, based on 100,000 replications.
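The simulation of unconditional critical values described above can be sketched as follows for $T_1$ with a constant and one scalar regressor, using the scalar versions of the quantities defined before (7) and of (8). Everything here is an illustrative assumption on our part: the function names, the grid search over $\rho \in [0, 1]$, and the small default number of replications. The term $\ln|\tilde X'\tilde X|$ is dropped because it does not depend on $\rho$.

```python
import math
import random

def marginal_loglik(y, x, rho):
    """Concentrated marginal loglikelihood from (7)-(8), constant plus one regressor."""
    n = len(y)
    m = n - 2
    yr = [y[t] - rho * y[t - 1] for t in range(1, n)]
    xr = [x[t] - rho * x[t - 1] for t in range(1, n)]
    c = 1 - rho**2
    s_yy = c * y[0]**2 + sum(v * v for v in yr)
    s_xy = c * x[0] * y[0] + sum(a * b for a, b in zip(xr, yr))
    s_xx = c * x[0]**2 + sum(v * v for v in xr)
    s_y = y[0] + (1 - rho) * sum(y[1:n - 1]) + y[-1]
    s_x = x[0] + (1 - rho) * sum(x[1:n - 1]) + x[-1]
    g = n - (n - 2) * rho
    f = g - (1 - rho) * s_x**2 / s_xx
    mu = (s_y - s_x * s_xy / s_xx) / f
    beta = s_xy / s_xx - (1 - rho) * s_x * mu / s_xx
    rss = s_yy - (1 - rho) * s_y * mu - s_xy * beta
    minus2l = (m * math.log(rss / m) + math.log(g / (n * (1 + rho)))
               + math.log(s_xx - (1 - rho) * s_x**2 / g))
    return -0.5 * minus2l

def t1_statistic(y, x, grid):
    """T1 from (9); the maximum is taken over the supplied grid of rho values."""
    return max(marginal_loglik(y, x, r) for r in grid) - marginal_loglik(y, x, 1.0)

def random_walk(n, rng):
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def simulate_critical_value(n, reps, alpha=0.05, seed=0, grid_size=40):
    """Empirical (1 - alpha) quantile of T1 when y and x are independent random walks."""
    rng = random.Random(seed)
    grid = [j / grid_size for j in range(grid_size)] + [1.0]  # assumed search range [0, 1]
    stats = sorted(t1_statistic(random_walk(n, rng), random_walk(n, rng), grid)
                   for _ in range(reps))
    idx = min(reps - 1, math.ceil((1 - alpha) * reps) - 1)
    return stats[idx]
```

A call such as `simulate_critical_value(100, 2000)` gives only a rough indication; the values in table 3 are based on 100,000 replications.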
Note that the marginal likelihood is independent of the variance $\sigma_\eta^2$. Though in general keeping $x_t$ fixed or not does not matter, one may still wonder whether characteristics of $x$ may be a reason to use the unconditional test. Specifically, “trending” behavior of $x$ deserves attention. The critical values obtained when $x$ is a linear trend differ significantly from those obtained when $x$ is a random walk, see tables 2 and 3. In fact the latter are rather close to those obtained when $x$ is a constant, compare tables 1 and 3. To explore this further we investigate whether “trending” random walks are a reason to switch to unconditional tests.

We select from 100 different random walks with $n = 100$ observations 5 typical ones, based on the absolute difference between $x_1$ and $x_{100}$. For the 5 random walks these differences are 0.03, 3.3, 6.9, 11.7, and 34.4, respectively. The first case resembles a realization of the AR(1) model with a constant, and the last one a realization of the AR(1) model with a constant and linear trend. For the 5 cases the power of the marginal likelihood tests is considered for $\gamma = 0, 5, 10, 15$, and 20. The results from 50,000 replications are shown in table 4. In the left part of the table critical values for $n = 1{,}000$ are used as an approximation for asymptotic critical values. In the right part the critical values for the specific random walks are used.

Based on table 4 it can be concluded that test $T_1$ has hardly any size distortion. Test $T_2$, however, shows size distortion that increases with $|x_1 - x_{100}|$. It can be concluded that if $x$ is specified as a random walk, the unconditional tests may be used, with $T_1$ as the preferred test. Note that “trending” behavior of the random walk results in loss of power.

Another source of loss of power may be the specification of the disturbance terms $v_t$ in (5) and $\eta_t$ in (16). We explore test results for

\[ v_t = \varepsilon_t - \theta\varepsilon_{t-1}, \qquad (17) \]

and

\[ \eta_t = \xi_t - \theta_x \xi_{t-1}, \qquad (18) \]

resulting in the ARMA(1,1) model,

\[ y_t = \mu + x_t'\beta + u_t, \quad u_{t+1} = \rho u_t + \varepsilon_t - \theta\varepsilon_{t-1}, \quad \text{where } x_{t+1} = x_t + \xi_t - \theta_x \xi_{t-1}. \qquad (19) \]
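We explore different marginal likelihood based tests and compare the results with the Durbin-Watson and Augmented Dickey-Fuller tests. We have 4 different data generating processes: (1) $\theta = 0$, $\theta_x = 0$, (2) $\theta = 0$, $\theta_x = 0.8$, (3) $\theta = 0.5$, $\theta_x = 0$, and (4) $\theta = 0.5$, $\theta_x = 0.8$.

The first group of tests is $T_1$ and $T_2$, as described in the previous section. The evaluation model is the AR(1) model without serial correlation.

In the second group of tests the evaluation model is the ARMA(1,1) model. Conditional on $x_t$, the model in first differences is provided in state-space format by

\[ \Delta y_t = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix} \alpha_t + \Delta x_t \beta, \qquad (20) \]
\[ \alpha_{t+1} = \begin{pmatrix} \rho & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \alpha_t + \begin{pmatrix} 1 \\ -(1+\theta) \\ \theta \end{pmatrix} \varepsilon_t, \qquad (21) \]

for $t = 2, \dots, n$, and $\alpha_1 \sim N(0, \sigma^2 P_1)$ where

\[ P_1 = \begin{pmatrix} 2(\theta^2 + \theta + 1 - \rho\theta)/(1+\rho) & -(1+\theta)^2 + \theta\rho & \theta \\ -(1+\theta)^2 + \theta\rho & 1 + 2\theta + 2\theta^2 & -\theta(1+\theta) \\ \theta & -\theta(1+\theta) & \theta^2 \end{pmatrix}. \qquad (22) \]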
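As a numerical cross-check (our own sketch, not part of the original derivation), the initial covariance of the state in (21) can be obtained by iterating the stationarity condition $P = TPT' + RR'$, with $T$ and $R$ the transition matrix and disturbance loading in (21). The function name `stationary_cov` and the fixed number of iterations are assumptions; the iteration converges for $|\rho| < 1$ because the only nonzero eigenvalue of $T$ is $\rho$.

```python
def stationary_cov(rho, theta, iters=500):
    """Iterate P <- T P T' + R R' for the state equation (21), with sigma^2 = 1."""
    T = [[rho, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
    R = [1.0, -(1.0 + theta), theta]
    Q = [[R[i] * R[j] for j in range(3)] for i in range(3)]   # R R'
    P = [row[:] for row in Q]
    for _ in range(iters):
        TP = [[sum(T[i][k] * P[k][j] for k in range(3)) for j in range(3)]
              for i in range(3)]
        P = [[sum(TP[i][k] * T[j][k] for k in range(3)) + Q[i][j] for j in range(3)]
             for i in range(3)]
    return P
```

For example, `stationary_cov(0.8, 0.5)` can be compared entry by entry with the closed-form expressions for the diagonal of $P_1$ and for its (1,2) and (2,3) elements.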
The marginal likelihood is computed by the Kalman filter and depends on 2 parameters, $\rho$ and $\theta$. The maximum marginal likelihood estimator under the null and alternative hypotheses can be substituted for $\theta$. Unit root tests are applied, based on the marginal likelihood ratio ($T_1^{\hat\theta_{ML}}$), and on the difference between $\hat\rho_{ML}$ and 1 ($T_2^{\hat\theta_{ML}}$), where $\hat\rho_{ML}$ and $\hat\theta_{ML}$ are maximum marginal likelihood estimators from model (20)–(22).

The third and fourth group of tests are based on the adjusted marginal likelihood ratio (12). The tests $T_1^{\kappa}$ and $T_2^{\kappa}$ are based on knowledge of the true $\kappa = (1-\theta)^2/(1+\theta^2)$. In the tests $T_1^{\hat\kappa}$ and $T_2^{\hat\kappa}$, $\kappa$ is estimated by (13) and (14). The lag length $l$ is determined by BIC with a maximum lag of 4.

For all marginal likelihood tests the 5 percent values from table 3 are used as critical values. The critical values for the Durbin-Watson (DW) and Augmented Dickey-Fuller (ADF) tests are also provided in table 3. Lag length selection in the Dickey-Fuller test is also done by BIC with a maximum lag of 4.

Table 5 provides power functions of the different tests for the four data generating processes for $n = 100$ observations, based on 5,000 replications. The results for $\theta_x = 0$ and $\theta_x = 0.8$ are approximately the same, except for the tests $T_i^{\hat\kappa}$ and DW. The DW test suffers from severe size distortion when $\theta_x = 0.8$. For the other tests the power is not much influenced by the difference between a random walk and a more general I(1) process.

When $\theta = 0$ and $\theta_x = 0$ there is hardly any size distortion in any test. When $\theta = 0$ the tests $T_i$ outperform all other tests, including the tests $T_i^{\hat\theta_{ML}}$; assuming the “right” model results in higher power. The tests $T_i^{\hat\theta_{ML}}$ are second best; the loss in power relative to $T_i$ is relatively small.

When $\theta = 0.5$, especially the tests $T_i$ and DW suffer from severe size distortion (0.68), but the tests $T_i^{\hat\kappa}$ and ADF also have large size distortions (0.30). The tests $T_i^{\hat\theta_{ML}}$ and $T_i^{\kappa}$ perform much better, although there is a reduction in power compared to $\theta = 0$. In particular, for $T_1^{\hat\theta_{ML}}$ the size distortion is small.

The general conclusion is that the test $T_1^{\hat\theta_{ML}}$ performs best in this example. If the true value of $\kappa$ is known, the adjusted marginal likelihood ratio tests $T_i^{\kappa}$ are a good alternative. There are possibilities for improvement in the performance of the tests $T_i^{\hat\kappa}$, specifically in the estimation procedure for $\kappa$.
4.2 Marginalization in the cointegration model
The bivariate model, where $x$ and $y$ are correlated, is the real EG example. The model is provided by

\[ y_t + \alpha x_t = \mu + u_t, \quad u_t = \rho u_{t-1} + v_t, \qquad (23) \]
\[ y_t + \beta x_t = d_t, \quad d_t = d_{t-1} + \eta_t, \qquad (24) \]
where the initial conditions for $u_1$ and $d_1$ are as specified in (6). The model where $x_t$ is exogenous is a special case of (23) and $\delta y_t + x_t = d_t$ for $\delta = 0$. One might consider testing for $\rho = 1$ from the first relation, as in the last section, by marginalization with respect to $x$. However, as can be seen from the reduced form of $x_t$ and $y_t$,

\[ y_t = \frac{\beta}{\beta-\alpha}(\mu + u_t) - \frac{\alpha}{\beta-\alpha}\, d_t, \qquad (25) \]
\[ x_t = -\frac{1}{\beta-\alpha}(\mu + u_t) + \frac{1}{\beta-\alpha}\, d_t, \qquad (26) \]
$x_t$ is a weighted average of $u_t$ and $d_t$. Marginalization with respect to $x$ leads to dramatic results, as shown in table 6. The size of the resulting tests is reasonable, but the power is in our leading example as low as 0.18 for $\rho = 0.8$. The explanation is that marginalization with respect to $x_t$ also removes a part of $u_t$, as can be seen from the reduced form Eq. (26).

The logical alternative is to estimate $\alpha$ by OLS, the well known consistent procedure, and to use the residuals $r_t = y_t + \hat\alpha x_t$ for the unit root test in the AR(1) plus constant model. These tests are denoted by $T_i(r)$, for $i = 1, 2$. Because $\alpha$ is estimated, the critical values for these marginal likelihood based tests differ from those in table 3. Critical values for the tests $T_i(r)$ are provided in table 7 for $n = 100$ and $n = 1{,}000$.

Tables 8 and 9 show the results from a simulation study. Power functions are provided for different marginal likelihood based cointegration tests and the Durbin-Watson (DW) and augmented Dickey-Fuller (ADF) tests. Table 8 contains simulation results for $n = 100$ and table 9 for $n = 1{,}000$. The data generating process is given by (17) and (23)–(24), where $\alpha = 2$, $\beta = 1$, and the ratio $\sigma_v/\sigma_\eta = 1$. In the upper part of the tables $u_t$ is AR(1); in the lower part of the tables $u_t$ is ARMA(1,1), where $\theta = 0.5$ in (17). In the tests $T_i(r)$ the evaluation model is AR(1), in the tests $T_i^{\hat\theta_{ML}}(r)$ the evaluation model is ARMA(1,1). In the tests $T_i^{\kappa}(r)$ and $T_i^{\hat\kappa}(r)$ the adjusted marginal likelihood (12) is used. The superscript $\kappa$ means that $\kappa$ is known and $\hat\kappa$ means that $\kappa$ is estimated by (14).

For $n = 100$ and $\theta = 0$ there is almost no size distortion in any test. The tests $T_i(r)$ perform best, slightly better than the Durbin-Watson test DW. There is a small loss in power when $\theta$ or $\kappa$ is estimated. The augmented Dickey-Fuller (ADF) test performs worst. The results for $n = 1{,}000$ hardly differ from the results for $n = 100$.

If the data generating process is ARMA, the results of the marginal likelihood tests $T_i(r)$, both for $n = 100$ and $n = 1{,}000$, are dramatic; like the DW test, they have severe size distortion. Test $T_1^{\hat\theta_{ML}}(r)$ has almost no size distortion, but the power for $\gamma = 20$ and $n = 100$ drops from 0.518 to 0.225 compared to $\theta = 0$. Test $T_2^{\hat\theta_{ML}}(r)$ has a small size distortion. Both tests $T_i^{\hat\theta_{ML}}(r)$ have no size distortion when $n = 1{,}000$, and the power increases from 0.225 for $n = 100$ to 0.764 for $n = 1{,}000$. The tests $T_i^{\kappa}(r)$ are undersized and, like the ADF test, the $T_i^{\hat\kappa}(r)$ tests have moderate size distortion.

These results are all in agreement with our results on univariate models. The fact that $\alpha$ has to be estimated leads to a considerable loss in power for all tests. Note that when a null hypothesis on $\alpha$ is available, which often comes from economic theory, a sequential procedure may be followed. First a test on $\alpha$ can be performed, and then, if the null hypothesis $\alpha_0$ is accepted, a marginal likelihood based test can be applied to $y_t + \alpha_0 x_t$.
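The residual-based procedure of this section can be sketched as follows: data are generated from (23)–(24) through the reduced form (25)–(26), $\alpha$ is estimated by OLS of $y_t$ on a constant and $x_t$, and the residual series $r_t = y_t + \hat\alpha x_t$ is formed. The function names, unit disturbance variances, and the zero starting value for $d_t$ (and for $u_t$ when $\rho = 1$) are our own simplifying assumptions.

```python
import math
import random

def simulate_eg(n, alpha, beta, rho, mu=0.0, seed=0):
    """Simulate (23)-(24) with sigma_v = sigma_eta = 1 via the reduced form (25)-(26)."""
    rng = random.Random(seed)
    # Assumption: stationary start for u when |rho| < 1, zero start otherwise; d starts at 0.
    u = rng.gauss(0.0, 1.0 / math.sqrt(1 - rho**2)) if abs(rho) < 1 else 0.0
    d = 0.0
    ys, xs, us, ds = [], [], [], []
    for _ in range(n):
        ys.append(beta / (beta - alpha) * (mu + u) - alpha / (beta - alpha) * d)   # (25)
        xs.append(-(mu + u) / (beta - alpha) + d / (beta - alpha))                 # (26)
        us.append(u)
        ds.append(d)
        u = rho * u + rng.gauss(0.0, 1.0)
        d = d + rng.gauss(0.0, 1.0)
    return ys, xs, us, ds

def ols_alpha(y, x):
    """OLS of y on a constant and x; returns alpha_hat = -slope and r_t = y_t + alpha_hat x_t."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar)**2 for a in x)
    alpha_hat = -sxy / sxx
    residuals = [b + alpha_hat * a for a, b in zip(x, y)]
    return alpha_hat, residuals
```

The residual series still contains the constant $\mu$, which is why the unit root test is applied to it in the AR(1) plus constant model.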
4.3 Bivariate marginal likelihood in the cointegration model
A bivariate likelihood approach requires, not surprisingly, use of the specification of $d_t$. This was not used in the methods of the previous section. It is assumed that $u_t$ and $d_t$ are independent. The marginal likelihood of $(x, y)$ is proportional to the likelihood of the first differences. Eq. (23) and (24) in first differences are

\[ \Delta y_t + \alpha \Delta x_t = \Delta u_t, \qquad (27) \]
\[ \Delta y_t + \beta \Delta x_t = \eta_t. \qquad (28) \]

Because $\Delta u$ and $\eta$ are independent variables, the likelihood of $(\Delta y, \Delta x)$ is provided by

\[
\begin{aligned}
\ln f(\Delta y, \Delta x|\alpha, \beta, \rho, \sigma_\eta, \sigma_v)
&= (n-1)\ln|\alpha - \beta| + \ln f(\Delta y + \alpha\Delta x, \Delta y + \beta\Delta x|\alpha, \beta, \rho, \sigma_\eta, \sigma_v) \\
&= (n-1)\ln|\alpha - \beta| + \ln f(\Delta u, \eta|\alpha, \beta, \rho, \sigma_\eta, \sigma_v) \\
&= (n-1)\ln|\alpha - \beta| + \ln f(\Delta u|\alpha, \rho, \sigma_v) + \ln f(\eta|\beta, \sigma_\eta),
\end{aligned}
\qquad (29)
\]
and is proportional to the marginal likelihood of $(x, y)$. The results when $\beta$ is assumed known are spectacular, see table 10. It is noteworthy that neither the model for the second equation nor its fit matters. The unit root test is just based on the likelihood of the first equation (27) with a penalty $(n-1)\ln|\alpha - \beta|$.

To estimate $\beta$ there are some alternatives. One is estimation of $\beta$ with OLS on the first differences. This is in general not a good idea, as $\Delta x_t$ is not exogenous, but in our example the resulting test gives reasonable results, of about the same quality as the tests in table 8 based on the OLS estimates of $\alpha$. Another idea is to compute the $\beta$ that gives zero first order autocorrelation in $\Delta y_t + \beta\Delta x_t$. This is a consistent procedure, while the OLS estimation is not.

Another option is to optimize the likelihood (29) with respect to $\alpha$, $\beta$, $\rho$, $\sigma_v$ and $\sigma_\eta$. Let $\hat\alpha_{ML}$, $\hat\beta_{ML}$, $\hat\rho_{ML}$, $\hat\sigma_{v,ML}$ and $\hat\sigma_{\eta,ML}$ denote the maximum likelihood estimators of $\alpha$, $\beta$, $\rho$, $\sigma_v$ and $\sigma_\eta$ respectively. Then $T_i^F$ are marginal likelihood based tests on cointegration in the bivariate model, given by

\[ T_1^F = \ln f(\Delta y, \Delta x|\hat\alpha_{ML}, \hat\beta_{ML}, \hat\rho_{ML}, \hat\sigma_{v,ML}) - \ln f(\Delta y, \Delta x|\hat\alpha_{ML}, \hat\beta_{ML}, \rho = 1, \hat\sigma_{v,ML}), \qquad (30) \]
\[ T_2^F = n(1 - \hat\rho_{ML}). \qquad (31) \]
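Evaluating the loglikelihood (29) for given parameter values can be sketched as below. This is an illustration under our own assumptions rather than the authors' code: the density of $\Delta u$ is computed from the autocovariances of the first difference of an AR(1) process with the initial condition (6), namely $2\sigma_v^2/(1+\rho)$ at lag 0 and $-\sigma_v^2(1-\rho)\rho^{k-1}/(1+\rho)$ at lag $k \ge 1$, using a plain Cholesky factorisation instead of the Kalman filter. All function names are hypothetical.

```python
import math

def gaussian_loglik(z, cov):
    """Log density of z ~ N(0, cov), computed through a Cholesky factorisation."""
    n = len(z)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    w = []                                   # forward substitution: solve L w = z
    for i in range(n):
        w.append((z[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + logdet + sum(v * v for v in w))

def delta_u_cov(size, rho, sigma_v):
    """Covariance matrix of consecutive first differences of the AR(1) process u_t."""
    def gamma(k):
        if k == 0:
            return 2.0 * sigma_v**2 / (1 + rho)
        return -sigma_v**2 * (1 - rho) * rho**(k - 1) / (1 + rho)
    return [[gamma(abs(i - j)) for j in range(size)] for i in range(size)]

def bivariate_loglik(dy, dx, alpha, beta, rho, sigma_v, sigma_eta):
    """Loglikelihood (29) of the first differences (dy, dx), each of length n - 1."""
    n1 = len(dy)
    du = [a + alpha * b for a, b in zip(dy, dx)]     # (27)
    eta = [a + beta * b for a, b in zip(dy, dx)]     # (28)
    ll_u = gaussian_loglik(du, delta_u_cov(n1, rho, sigma_v))
    ll_eta = sum(-0.5 * (math.log(2 * math.pi * sigma_eta**2) + e * e / sigma_eta**2)
                 for e in eta)
    return n1 * math.log(abs(alpha - beta)) + ll_u + ll_eta
```

Computing $T_1^F$ in (30) would additionally require a numerical optimiser over the parameters, which is not shown here.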
Note that for $\rho = 1$, $\alpha$ and $\beta$ are not identified. This is no problem, as any $\beta$ gives the same result for the marginal likelihood ratio test. In optimization procedures of the bivariate marginal likelihood function, however, it requires some care to avoid numerical problems.

In table 11 critical values for the tests $T_i^F$ are given, based on 10,000 replications. Table 12 provides power functions for the $T_i^F$ tests both for $n = 100$ and $n = 1{,}000$. The data generating process is the same as in the previous section ($\theta = 0$). The results of the $T_i^F$ tests are better than those of the tests $T_i(r)$, see tables 8 and 9. The power increases from 0.673 to 0.747 for $n = 100$ and $\gamma = 20$, and from 0.685 to 0.762 for $n = 1{,}000$ and $\gamma = 20$.

In table 13 power functions are provided for a different data generating process, with $\sigma_v/\sigma_\eta = 9$, $\alpha = 2$ and $\beta = -7$. The residual based tests $T_i(r)$ and the DW and ADF tests do not work at all. The size of $T_i^F$ is too small, but the power is quite good. It seems that the critical values of the tests $T_i^F$ depend on the values of $\alpha$, $\beta$ and $\sigma_v/\sigma_\eta$.

Finally, the bivariate likelihood offers a way to test for exogeneity of $x$. Normalizing the “common trend” equation as $\delta y_t + x_t = d_t$ gives no problem; the Jacobian becomes $|1 - \delta\alpha|$ and the hypothesis $\delta = 0$ can be tested. If the hypothesis is accepted, marginalization of the unit root tests in the cointegrating relation is allowed.
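For completeness, the Jacobian under the alternative normalization can be written out as a short worked step (our own addition). With $\delta y_t + x_t = d_t$ in place of (24), the first differences satisfy:

```latex
\begin{pmatrix} \Delta u_t \\ \eta_t \end{pmatrix}
  = \begin{pmatrix} 1 & \alpha \\ \delta & 1 \end{pmatrix}
    \begin{pmatrix} \Delta y_t \\ \Delta x_t \end{pmatrix},
\qquad
\left| \det \begin{pmatrix} 1 & \alpha \\ \delta & 1 \end{pmatrix} \right| = |1 - \delta\alpha| .
```

The term $(n-1)\ln|\alpha - \beta|$ in (29) is then replaced by $(n-1)\ln|1 - \delta\alpha|$, which vanishes at $\delta = 0$, the case of an exogenous $x_t$.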
5 Conclusion
Marginal likelihood based tests offer new opportunities for tests on bivariate cointegration. If one of the variables is exogenous, which can be tested for, very powerful tests are available that marginalize with respect to the exogenous variable. In the bivariate case the results are mixed, but two groups of tests perform reasonably well, and are more robust than the standard Durbin-Watson and augmented Dickey-Fuller tests. The first uses the OLS residuals and applies the tests described in Francke and de Vos (2007). The second uses the complete likelihood. This works well in the pure AR(1) model, but more complex correlation structures still have to be dealt with.
References

Bernardo, J. M., and A. F. M. Smith. 1994. Bayesian Theory. John Wiley, New York.
De Jong, P. 1988. “The Likelihood for a State-Space Model.” Biometrika 75:165–169.
De Jong, P. 1991. “The Diffuse Kalman Filter.” The Annals of Statistics 2:1073–1083.
Dickey, D. A., and W. A. Fuller. 1979. “Unit Roots in Time Series Models: Tests and Implications.” Journal of the American Statistical Association 74:427–431.
Elliott, G. E. 1999. “Efficient Tests for a Unit Root When the Initial Observation is Drawn from its Unconditional Distribution.” International Economic Review 40:767–783.
Elliott, G. E., T. J. Rothenberg, and J. H. Stock. 1996. “Efficient Tests for an Autoregressive Unit Root.” Econometrica 64:813–836.
Engle, R. F., and C. W. J. Granger. 1987. “Co-integration and Error Correction: Representation, Estimation, and Testing.” Econometrica 55:251–276.
Francke, M. K., and A. F. de Vos. 2007. “Marginal Likelihood and Unit Roots.” Journal of Econometrics. To appear.
Fuller, W. A. 1976. Introduction to Statistical Time Series. John Wiley, New York.
Kalbfleisch, J. D., and D. A. Sprott. 1970. “Application of Likelihood Methods to Models Involving Large Numbers of Parameters.” Journal of the Royal Statistical Society B 32:175–208.
King, M. L. 1980. “Robust Tests for Spherical Symmetry and their Application to Least Squares Regression.” The Annals of Statistics 8:1630–1638.
Koopman, S. J. 1997. “Exact Initial Kalman Filtering and Smoothing for Nonstationary Time Series Models.” Journal of the American Statistical Association 92:1630–1638.
Lehmann, E. L. 1986. Testing Statistical Hypotheses. 2nd ed. John Wiley, New York.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall.
Ng, S., and P. Perron. 2001. “Lag Length Selection and the Construction of Unit Root Tests with Good Size and Power.” Econometrica 69:1519–1554.
Patterson, H. D., and R. Thompson. 1971. “Recovery of Inter-Block Information When Block Sizes are Unequal.” Biometrika 58:545–554.
Rahman, S., and M. L. King. 1997. “Marginal-Likelihood Score-Based Tests of Regression Disturbances in the Presence of Nuisance Parameters.” Journal of Econometrics 82:81–106.
Table 1: Critical values for unit root tests in the AR(1) model with constant.

          T1 = l_{Mβ,σ}(ρ̂_ML) − l_{Mβ,σ}(1)       T2 = n(1 − ρ̂_ML)
α \ n     50      100     250     1000        50      100     250     1000
0.01      3.189   3.213   3.224   3.236       16.4    17.0    17.3    17.5
0.025     2.375   2.399   2.408   2.420       13.1    13.5    13.7    13.8
0.05      1.784   1.799   1.803   1.811       10.6    10.8    10.9    11.0
0.10      1.208   1.221   1.227   1.229        8.0     8.1     8.1     8.1

Table 2: Critical values for unit root tests in the AR(1) model with constant and trend.

          T1 = l_{Mβ,σ}(ρ̂_ML) − l_{Mβ,σ}(1)       T2 = n(1 − ρ̂_ML)
α \ n     50      100     250     1000        50      100     250     1000
0.01      3.102   3.134   3.136   3.143       21.9    23.0    23.6    24.0
0.025     2.284   2.310   2.316   2.321       18.3    19.0    19.4    19.6
0.05      1.692   1.712   1.716   1.719       15.4    15.9    16.2    16.3
0.10      1.120   1.136   1.137   1.141       12.3    12.6    12.7    12.8
Table 3: Critical values for unit root tests in the AR(1) model with a random walk as explanatory variable.

         T1              T2              DW              DF
α\n      100     1000    100     1000    100     1000    100     1000
0.01     3.199   3.253   17.8    17.7    50.9    55.6    3.992   3.930
0.025    2.386   2.421   14.1    14.0    43.8    46.8    3.665   3.611
0.05     1.792   1.818   11.2    11.0    38.1    40.4    3.399   3.343
0.10     1.210   1.232   8.3     8.2     31.8    33.5    3.091   3.052
Table 4: Power functions for unit root tests in the AR(1) model with 5 different random walks as explanatory variable.

          γ = n(1 − ρ)                           γ = n(1 − ρ)
Test  x   0      5      10     15     20     |   0      5      10     15     20
T1    1   0.050  0.187  0.499  0.813  0.958  |   0.050  0.189  0.503  0.815  0.959
T1    2   0.050  0.186  0.490  0.799  0.948  |   0.050  0.187  0.492  0.800  0.948
T1    3   0.048  0.187  0.504  0.820  0.961  |   0.050  0.194  0.517  0.830  0.965
T1    4   0.048  0.179  0.454  0.747  0.917  |   0.050  0.186  0.466  0.758  0.921
T1    5   0.047  0.153  0.387  0.676  0.876  |   0.050  0.163  0.408  0.696  0.886
T2    1   0.050  0.184  0.495  0.815  0.963  |   0.050  0.184  0.495  0.815  0.963
T2    2   0.052  0.190  0.499  0.811  0.958  |   0.050  0.183  0.485  0.800  0.953
T2    3   0.048  0.183  0.493  0.818  0.966  |   0.050  0.189  0.505  0.827  0.969
T2    4   0.059  0.210  0.504  0.797  0.945  |   0.050  0.181  0.455  0.756  0.931
T2    5   0.079  0.222  0.503  0.793  0.950  |   0.050  0.142  0.363  0.661  0.886
Table 5: Power functions for cointegration tests for n = 100, where $x_t$ is an exogenous variable.

DGP: θ = 0 and θx = 0
                               γ = n(1 − ρ)
Test                           0      5      10     15     20
$T_1$                          0.051  0.182  0.506  0.809  0.951
$T_2$                          0.051  0.175  0.491  0.802  0.960
$T_1^{\hat\theta_{ML}}$        0.053  0.172  0.441  0.705  0.864
$T_2^{\hat\theta_{ML}}$        0.061  0.197  0.492  0.761  0.913
$T_1^{\kappa}$                 0.051  0.182  0.506  0.809  0.951
$T_2^{\kappa}$                 0.051  0.175  0.491  0.802  0.960
$T_1^{\hat\kappa}$             0.030  0.081  0.243  0.460  0.640
$T_2^{\hat\kappa}$             0.030  0.077  0.232  0.442  0.633
DW                             0.044  0.111  0.286  0.553  0.815
ADF                            0.061  0.116  0.235  0.471  0.713

DGP: θ = 0.5 and θx = 0
                               γ = n(1 − ρ)
Test                           0      5      10     15     20
$T_1$                          0.658  0.975  0.999  1.000  1.000
$T_2$                          0.683  0.979  1.000  1.000  1.000
$T_1^{\hat\theta_{ML}}$        0.046  0.145  0.308  0.447  0.505
$T_2^{\hat\theta_{ML}}$        0.087  0.257  0.503  0.686  0.773
$T_1^{\kappa}$                 0.025  0.088  0.272  0.506  0.706
$T_2^{\kappa}$                 0.031  0.111  0.313  0.575  0.778
$T_1^{\hat\kappa}$             0.300  0.664  0.881  0.953  0.980
$T_2^{\hat\kappa}$             0.313  0.677  0.896  0.967  0.987
DW                             0.686  0.950  0.998  1.000  1.000
ADF                            0.222  0.407  0.656  0.836  0.932

DGP: θ = 0 and θx = 0.8
                               γ = n(1 − ρ)
Test                           0      5      10     15     20
$T_1$                          0.050  0.188  0.526  0.840  0.972
$T_2$                          0.046  0.170  0.485  0.812  0.970
$T_1^{\hat\theta_{ML}}$        0.050  0.182  0.464  0.738  0.896
$T_2^{\hat\theta_{ML}}$        0.055  0.189  0.488  0.769  0.923
$T_1^{\kappa}$                 0.050  0.188  0.526  0.840  0.972
$T_2^{\kappa}$                 0.046  0.170  0.485  0.812  0.970
$T_1^{\hat\kappa}$             0.012  0.053  0.200  0.427  0.652
$T_2^{\hat\kappa}$             0.009  0.046  0.177  0.401  0.628
DW                             0.223  0.231  0.360  0.600  0.827
ADF                            0.056  0.120  0.252  0.465  0.704

DGP: θ = 0.5 and θx = 0.8
                               γ = n(1 − ρ)
Test                           0      5      10     15     20
$T_1$                          0.637  0.979  1.000  1.000  1.000
$T_2$                          0.640  0.978  1.000  1.000  1.000
$T_1^{\hat\theta_{ML}}$        0.044  0.156  0.352  0.511  0.569
$T_2^{\hat\theta_{ML}}$        0.074  0.235  0.501  0.696  0.792
$T_1^{\kappa}$                 0.028  0.107  0.340  0.641  0.856
$T_2^{\kappa}$                 0.031  0.117  0.361  0.679  0.894
$T_1^{\hat\kappa}$             0.238  0.635  0.892  0.967  0.991
$T_2^{\hat\kappa}$             0.234  0.638  0.894  0.971  0.993
DW                             0.723  0.956  0.999  1.000  1.000
ADF                            0.156  0.345  0.606  0.810  0.916

$T_i$ is the test given by (9) and (10). $T_i^{\hat\theta_{ML}}$ is the test where γ and θ are estimated from (20)–(22). $T_i^{\kappa}$ is the test based on the adjusted marginal likelihood ratio (12), using the true value of κ. $T_i^{\hat\kappa}$ is the test based on the adjusted marginal likelihood ratio (12), where κ is estimated by (14). The data generating process is provided by (4)–(6), (17) and (19), with $\sigma_\varepsilon^2 = 1$, $\beta = 5$, $x_{t+1} = x_t + \xi_t - \theta_x \xi_{t-1}$, $\sigma_\xi^2 = 1$ and n = 100.
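
The recursion for the exogenous variable stated in the note to Table 5, x(t+1) = x(t) + ξ(t) − θx ξ(t−1), can be simulated in a few lines. The sketch below is illustrative only and is not the authors' simulation code: the function name `simulate_x`, the starting value x = 0, the Gaussian distribution of ξ and the seeding scheme are all assumptions that the note does not specify.

```python
import random


def simulate_x(n, theta_x, sigma_xi=1.0, seed=0):
    """Simulate n values of x_{t+1} = x_t + xi_t - theta_x * xi_{t-1}.

    Assumptions (not stated in the source): the series starts at 0,
    the innovations xi_t are i.i.d. Gaussian with standard deviation
    sigma_xi, and a pre-sample innovation is drawn for the first step.
    """
    rng = random.Random(seed)
    xi_prev = rng.gauss(0.0, sigma_xi)  # pre-sample innovation xi_{t-1}
    x = [0.0]
    for _ in range(n - 1):
        xi = rng.gauss(0.0, sigma_xi)
        x.append(x[-1] + xi - theta_x * xi_prev)
        xi_prev = xi
    return x


# theta_x = 0 gives a pure random walk; theta_x = 0.8 gives the
# moving-average case used in the right-hand panels of Table 5.
x_rw = simulate_x(100, theta_x=0.0)
x_ma = simulate_x(100, theta_x=0.8)
```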
Table 6: Power functions for cointegration tests for n = 100.

          γ = n(1 − ρ)
Test      0      5      10     15     20
$T_1$     0.052  0.074  0.124  0.146  0.176
$T_2$     0.054  0.076  0.116  0.144  0.172

$T_i$ is the test given by (9) and (10). The data generating process is provided by (17) and (23)–(24), with $\sigma_\varepsilon = \sigma_\eta = 1$, α = 2, θ = 0 and β = 1.
Table 7: Critical values for bivariate residual based marginal likelihood cointegration tests.

         $T_1(r)$        $T_2(r)$
α\n      100     1000    100     1000
0.01     4.75    4.81    23.27   24.31
0.025    3.85    3.89    19.70   20.16
0.05     3.12    3.10    16.56   17.11
0.10     2.35    2.34    13.38   13.75

Tests $T_i(r)$ are based on $r = y + \hat\alpha x$, where α is estimated by OLS on $y = -\alpha x + e$. The data generating process is given below Table 6. The critical values are based on 10,000 replications.
Table 8: Power functions for residual based cointegration tests for n = 100.

DGP: θ = 0
                               γ = n(1 − ρ)
Test                           0      5      10     15     20
$T_1(r)$                       0.051  0.086  0.197  0.416  0.673
$T_2(r)$                       0.052  0.087  0.200  0.417  0.677
$T_1^{\hat\theta_{ML}}(r)$     0.047  0.079  0.164  0.328  0.518
$T_2^{\hat\theta_{ML}}(r)$     0.066  0.111  0.227  0.425  0.635
$T_1^{\kappa}(r)$              0.051  0.086  0.197  0.416  0.673
$T_2^{\kappa}(r)$              0.052  0.087  0.200  0.417  0.677
$T_1^{\hat\kappa}(r)$          0.064  0.099  0.213  0.420  0.660
$T_2^{\hat\kappa}(r)$          0.066  0.101  0.215  0.424  0.662
DW                             0.053  0.087  0.197  0.409  0.668
ADF                            0.051  0.071  0.144  0.284  0.494

DGP: θ = 0.5
                               γ = n(1 − ρ)
Test                           0      5      10     15     20
$T_1(r)$                       0.560  0.878  0.990  0.999  0.999
$T_2(r)$                       0.568  0.883  0.990  0.999  0.999
$T_1^{\hat\theta_{ML}}(r)$     0.043  0.068  0.126  0.189  0.225
$T_2^{\hat\theta_{ML}}(r)$     0.090  0.165  0.303  0.458  0.568
$T_1^{\kappa}(r)$              0.015  0.040  0.126  0.291  0.516
$T_2^{\kappa}(r)$              0.015  0.040  0.126  0.296  0.529
$T_1^{\hat\kappa}(r)$          0.268  0.500  0.765  0.909  0.964
$T_2^{\hat\kappa}(r)$          0.269  0.501  0.762  0.910  0.966
DW                             0.580  0.887  0.989  0.999  1.000
ADF                            0.180  0.319  0.525  0.721  0.850

Tests $T_i(r)$ are based on $r = y + \hat\alpha x$, where α is estimated by OLS on $y = -\alpha x + e$. The data generating process is provided by (17) and (23)–(24), with $\sigma_\varepsilon = \sigma_\eta = 1$, α = 2 and β = 1. Power functions are based on 10,000 replications.
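
The first step of the residual based tests, forming r = y + α̂x with α estimated by OLS on y = −αx + e, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `residual_series` is hypothetical, and the regression is run without an intercept because the note writes it that way; whether a constant is included in the actual computations is not stated here.

```python
def residual_series(x, y):
    """Return (alpha_hat, r) with r_t = y_t + alpha_hat * x_t.

    Assumption: OLS on y = -alpha * x + e is run without an intercept,
    so the OLS slope of y on x is sum(x*y) / sum(x*x) and alpha_hat is
    minus that slope.
    """
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = sxy / sxx        # OLS slope of y on x (no intercept)
    alpha_hat = -slope       # the regression is written as y = -alpha * x + e
    r = [yi + alpha_hat * xi for xi, yi in zip(x, y)]
    return alpha_hat, r


# Toy data in which y = -2 * x holds exactly, so alpha_hat is 2 and
# the residual series is identically zero.
alpha_hat, r = residual_series([1.0, 2.0, 3.0], [-2.0, -4.0, -6.0])
```

The series r would then be passed to the unit root statistics and compared with the critical values of Table 7, which account for α being estimated.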
Table 9: Power functions for residual based cointegration tests for n = 1,000.

DGP: θ = 0
                               γ = n(1 − ρ)
Test                           0      5      10     15     20
$T_1(r)$                       0.055  0.094  0.219  0.432  0.687
$T_2(r)$                       0.053  0.092  0.206  0.420  0.683
$T_1^{\hat\theta_{ML}}(r)$     0.056  0.089  0.209  0.422  0.670
$T_2^{\hat\theta_{ML}}(r)$     0.054  0.089  0.216  0.422  0.669
$T_1^{\kappa}(r)$              0.055  0.094  0.219  0.432  0.687
$T_2^{\kappa}(r)$              0.053  0.092  0.206  0.420  0.683
$T_1^{\hat\kappa}(r)$          0.054  0.093  0.220  0.434  0.685
$T_2^{\hat\kappa}(r)$          0.052  0.092  0.207  0.420  0.682
DW                             0.051  0.085  0.197  0.394  0.649
ADF                            0.054  0.074  0.156  0.315  0.539

DGP: θ = 0.5
                               γ = n(1 − ρ)
Test                           0      5      10     15     20
$T_1(r)$                       0.615  0.935  0.996  1.000  1.000
$T_2(r)$                       0.625  0.933  0.996  1.000  1.000
$T_1^{\hat\theta_{ML}}(r)$     0.052  0.116  0.282  0.515  0.764
$T_2^{\hat\theta_{ML}}(r)$     0.056  0.122  0.294  0.531  0.779
$T_1^{\kappa}(r)$              0.018  0.080  0.228  0.477  0.745
$T_2^{\kappa}(r)$              0.017  0.078  0.223  0.466  0.744
$T_1^{\hat\kappa}(r)$          0.111  0.245  0.502  0.765  0.927
$T_2^{\hat\kappa}(r)$          0.116  0.244  0.500  0.760  0.925
DW                             0.625  0.926  0.995  1.000  1.000
ADF                            0.091  0.167  0.354  0.578  0.802

Tests $T_i(r)$ are based on $r = y + \hat\alpha x$, where α is estimated by OLS on $y = -\alpha x + e$. The data generating process is provided by (17) and (23)–(24), with $\sigma_\varepsilon = \sigma_\eta = 1$, α = 2 and β = 1. Power functions are based on 2,500 replications.
Table 10: Power functions for cointegration tests for n = 100, when β is known.

                      γ = n(1 − ρ)
Test                  0      5      10     15     20
$T_1^{\beta}(x,y)$    0.076  0.246  0.592  0.862  0.972
$T_2^{\beta}(x,y)$    0.062  0.194  0.548  0.828  0.964

$T_1^{\beta}(x,y)$ is based on (29), when β is known. The data generating process is provided by (17) and (23)–(24), with $\sigma_\varepsilon = \sigma_\eta = 1$, α = 2 and β = 1. Power functions are based on 10,000 replications.
Table 11: Critical values for bivariate marginal likelihood cointegration tests.

         $T_1^F$         $T_2^F$
α\n      100     1000    100     1000
0.01     5.39    5.24    25.38   26.37
0.025    4.35    4.33    21.74   22.19
0.05     3.68    3.52    18.74   18.71
0.10     2.91    2.82    15.62   15.63

Tests $T_i^F$ are given by (30) and (31). The data generating process is provided by (17) and (23)–(24), with $\sigma_\varepsilon = \sigma_\eta = 1$, α = 2 and β = 1. The critical values are based on 10,000 replications.
Table 12: Power functions for cointegration tests.

                        γ = n(1 − ρ)
n          Test         0      5      10     15     20
100        $T_1^F$      0.048  0.096  0.233  0.478  0.747
100        $T_2^F$      0.049  0.097  0.232  0.477  0.745
1,000      $T_1^F$      0.056  0.112  0.254  0.503  0.762
1,000      $T_2^F$      0.056  0.114  0.252  0.490  0.756

Tests $T_i^F$ are given by (30) and (31). The data generating process is provided by (17) and (23)–(24), with $\sigma_\varepsilon = \sigma_\eta = 1$, θ = 0, α = 2 and β = 1. Power functions are based on 2,500 replications.
Table 13: Power functions for cointegration tests for a different DGP, when n = 100.

             γ = n(1 − ρ)
Test         0      5      10     15     20
$T_1(r)$     0.050  0.043  0.039  0.040  0.044
$T_2(r)$     0.052  0.045  0.042  0.041  0.046
$T_1^F$      0.007  0.030  0.127  0.360  0.670
$T_2^F$      0.008  0.030  0.127  0.358  0.672
DW           0.054  0.047  0.044  0.044  0.048
ADF          0.052  0.043  0.040  0.038  0.040

Tests $T_i(r)$ are based on $r = y + \hat\alpha x$, where α is estimated by OLS on $y = -\alpha x + e$. Tests $T_i^F$ are given by (30) and (31). The data generating process is provided by (17) and (23)–(24), with $\sigma_\varepsilon = 9$, $\sigma_\eta = 1$, θ = 0, α = 2 and β = −7. Power functions are based on 10,000 replications.