Likelihood-Based Recursive Tests of the Adaptive Learning Hypothesis

Luca Fanelli∗

March 2008

Abstract

We propose likelihood-based recursive tests for the cross-equation restrictions that the class of forward-looking models typically used in monetary policy imposes on vector autoregressive (VAR) systems, under the adaptive learning hypothesis. As the information set increases over time and estimates are updated recursively, the test for the adaptive learning hypothesis amounts to a sequence of likelihood ratio (LR) statistics obtained by comparing the likelihoods of the unrestricted and constrained VAR. We show through simulation experiments that, in order to control the null hypothesis over the entire sequence, a proper set of critical values can be opportunely adapted from the theory of Inoue and Rossi (Journal of Business and Economic Statistics 23, 2005, pp. 336-345), also obtaining satisfactory power against backward-looking alternatives in finite samples. The proposed method is applied to investigate the New Keynesian Phillips Curve (NKPC) on euro area data. The results show that, while the NKPC is sharply rejected under the rational expectations hypothesis, the nonlinear restrictions implied by the model tend to be supported over large parts of the monitoring period when the LR tests are recursively calculated under the adaptive learning hypothesis.

Keywords: Adaptive Learning Hypothesis, Cross-equation Restrictions, New Keynesian Phillips Curve, Recursive Estimation, Sequential Test, VAR.

J.E.L. Classification: C32, C52, D83, E10

∗ Department of Statistical Sciences, University of Bologna, via Belle Arti 41, I-40126 Bologna, Italy. e-mail: [email protected], ph: +39 0541434303, fax: +39 051 232153.


1 Introduction

The class of small-scale dynamic stochastic general equilibrium models that presently dominates the debate in monetary policy is built under the rational expectations hypothesis (REH). Following Muth (1961), this means that rational agents compute expectations from the data generating process. The empirical implications of the REH have been investigated in many fields of macroeconomics and finance (Sargent, 1979; Campbell and Shiller, 1987; Bekaert and Hodrick, 2001, just to mention a few), with mixed evidence. There is growing awareness among macroeconomists and econometricians that the REH requires too much knowledge. In practice, agents display 'bounded' rationality and depart systematically from Muth's (1961) tenet. The standard approach to modelling boundedly rational expectations assumes that agents behave as econometricians and form their forecasts by using adaptive updating rules, see e.g. Pesaran (1987), Sargent (1999), Evans and Honkapohja (1999, 2001), Ireland (2003) and Branch and Evans (2006). This means that as new data become available, agents estimate and update the parameters of their forecasting model, the so-called perceived law of motion, according to recursive least squares (RLS) or 'constant gain' learning rules, see Sargent (1999).¹ Replacing expectations in the forward-looking model with the forecasts implied by the perceived law of motion yields the so-called actual law of motion, which reads as the agents' data generating process. Under certain conditions, it has been found that expectations in these models can converge to the rational expectations equilibrium, which means that in the limit the actual law of motion is indistinguishable from the model solution obtained under the REH.

In this paper, we are not interested in the conditions under which adaptive learning rules converge in the limit to a rational expectations equilibrium or to a 'restricted perceptions equilibrium' (Evans and Honkapohja, 1999, 2003), but in the testable implications that arise during the convergence process. So far, the existing econometric contributions have mainly focused on the problem of estimating the actual law of motion, disregarding the issue of testing the econometric implications of the adaptive learning hypothesis (ALH). One difficulty is that the notion of adaptive learning is logically based on a perpetual updating mechanism, whose ultimate effect (convergence to a rational expectations equilibrium or to a restricted perceptions equilibrium) can be evaluated only in the limit. In this paper we argue that tests of the econometric implications of the ALH can be constructed, under precise conditions, by means of recursive estimation methods and relatively standard testing procedures.

¹ In this context learning is 'adaptive' rather than 'optimal', because it ignores the feedback from the learning rule on the actual law of motion.


This paper contributes to the literature by proposing a likelihood-based approach for testing the restrictions that the class of forward-looking models typically used in monetary policy imposes under the ALH. We consider a fairly general theoretical specification, which covers models such as the New Keynesian Phillips Curve (NKPC), forward-looking aggregate demand equations and Taylor-type policy rules as special cases. The analysis is developed under the following assumptions: (i) the perceived law of motion is specified as a vector autoregressive (VAR) model for the observable variables; (ii) the VAR coefficients are updated through RLS as new data increase the information set over time.

As regards (i), it has been well known since Sargent (1979) that models involving forward-looking behaviour impose a set of restrictions on the VAR agents use to compute forecasts (see Sbordone, 2002, 2005; Kurmann, 2007; Fanelli, 2008, for recent examples); the nonlinear restrictions involving the VAR coefficients and the parameters of the theoretical model are usually denoted as cross-equation restrictions (CER). In the literature, tests for the CER are typically presented as tests of the implications of the REH. We show that when (i) is combined with (ii), the analysis is no longer based on a single set of CER but on a sequence of CER: as new data become available and coefficient estimates are updated recursively, the CER must also be updated and evaluated again.

The task of testing the CER in a recursive framework can be tackled by estimating the VAR recursively through maximum likelihood (ML), both unrestrictedly and subject to the constraints implied by the forward-looking model, and then computing likelihood ratio (LR) tests. In particular, starting from initial coefficient estimates obtained from a given subsample, the LR tests can be computed recursively, extending the end point of the sample, until the full set of available observations is covered. Assuming Gaussian disturbances, the recursive application of ML amounts to RLS. However, in the adaptive learning framework, the test based on the sample of size T must not be considered conclusive, as it must be repeated as the new observations T + 1, T + 2, ..., enter the information set. This means that, from the inferential point of view, standard asymptotic critical values that do not take the recursive (sequential) nature of the test into account cannot be applied each time new data become available. Indeed, due to the law of the iterated logarithm, repeated applications of such tests yield a procedure that rejects a true null hypothesis with probability approaching one as the number of applications grows (Robbins, 1970). More precisely, if conventional critical values are used, the test will have the correct size at each point in time, but it will not have the correct size over the whole sequence of test statistics, as shown by Inoue and Rossi (2005) in the context of recursive tests for predictive ability. In these cases, in order to control the size of the test as it is repeated over time, the analysis must resort to a sequential approach in which the critical values vary over time.


To test the implications of the forward-looking model under the ALH, we compare each recursively computed LR statistic with the asymptotic critical values proposed by Inoue and Rossi (2005). However, although those authors focus on a very general set-up, covering all estimation methods that can be regarded as special cases of the generalized method of moments (including ML), in practice our test reads as a VAR-based recursive test for the nonlinear restrictions implied by forward-looking models, and is not a test for predictive ability. For this reason, we use Monte Carlo experiments to investigate how the suggested testing procedure works in our framework and in finite samples. Our simulations show that when the recursive LR test statistics are compared with the asymptotic critical values derived by Inoue and Rossi (2005) from Brownian motions, the procedure allows a conservative control of the null hypothesis when the data are generated from the constrained VAR, but exhibits low power against backward-looking alternatives, i.e. when the data are generated from unrestricted VARs. We argue that it is still possible to control the size of the sequence successfully while retaining power against backward-looking alternatives in finite samples, provided that the critical values in Inoue and Rossi (2005) are properly adjusted. In particular, the Monte Carlo experiments highlight that a recursive LR test based on linear convex combinations of the above-mentioned critical values and the critical values taken from the standard χ² distribution, the latter having a dominant weight, delivers a procedure with an empirical size very close to the nominal size of the test, and with good power when the forward-looking model is false.

The proposed method is applied to investigate the NKPC on euro area data under the ALH, using the wage share as a measure of firms' real marginal costs, as in Gali and Gertler (1999) and Gali et al. (2001), and including a short-term interest rate in the VAR. The empirical analysis shows that the NKPC is rejected when the model is tested under the REH through a 'one-shot' LR test based on the full sample. Conversely, we find that the NKPC is supported by the data over large parts of the monitoring period (1986-2006), and with a dominating forward-looking inflation component, when the CER are recursively tested under the ALH. The result of the test points out that learning can be regarded as a major source of euro area inflation persistence.

The rest of the paper is organized as follows. Section 2 sketches the idea behind the proposed test of the ALH. Section 3 discusses the technical details of the proposed VAR-based procedure, and Section 4 investigates the size and power properties of the test through Monte Carlo experiments. Section 5 investigates empirically the NKPC on euro area data. Some concluding remarks are reported in Section 6. Technical details are sketched in Appendix A.


2 Background

Consider the forward-looking model

y_t = γ E_t y_{t+1} + δ y_{t−1} + κ w_t + v_t    (1)

where y_t is a scalar, v_t a scalar white noise shock, and w_t a scalar explanatory variable; E_t y_{t+1} indicates the expected value of y_{t+1} formed at time t on the basis of the available information Ω_t (E_t y_{t+1} ≡ E(y_{t+1} | Ω_t)), and γ, δ and κ are the structural parameters, usually subject to the theoretical constraints 0 < γ < 1, 0 < δ < 1, κ > 0.²

Many forward-looking models, including the NKPC that will be investigated in Section 5, can be regarded as special cases of (1). To address identification issues, it is of key importance to know the structure of the process generating w_t in (1). For the sake of simplicity, and without loss of generality, we assume for the moment that w_t is generated by the process

w_t = ρ_1 w_{t−1} + e_t    (2)

where ρ_1 is such that 0 < ρ_1 < 1, and e_t is a white noise process. If a determinate rational expectations solution to the model (1)-(2) exists, it will take the form

y_t = φ_0 + φ_1 y_{t−1} + φ_2 w_{t−1} + η_t    (3)

where η_t is a white noise process with variance σ_η, and φ_0, φ_1 and φ_2 are coefficients that depend on γ, δ, κ and ρ_1. Recently, many authors have replaced the process in (2) with more general specifications based on VAR processes for y_t and w_t (and possibly other variables), and use such a VAR to derive the CER implied by equation (1); the CER are then used to estimate and test the model using either 'limited-information' techniques as in Sbordone (2002, 2005) and Rudd and Whelan (2005a, 2005b, 2006), or 'full-information' ML methods as in Kurmann (2007) and Fanelli (2008).

Suppose now that boundedly rational agents have a perceived law of motion of the form (3), and form their forecasts behaving as econometricians. They gradually learn the values of φ_0, φ_1 and φ_2 from the observed data, and update their forecasts.

² In many circumstances, the disturbance v_t can be given a precise economic meaning; moreover, it is customary to model v_t as a first order autoregressive process. In this paper we rule out autoregressive dynamics for v_t for two reasons: first, any 'external persistence' factor introduced via the disturbance would be ad hoc, and not derived from first principles; second, we interpret v_t as a term capturing unexplained (transitory) deviations from the theory, obtaining an 'inexact' model as in Kurmann (2007) and Fanelli (2008).


It is often argued that when forming expectations in period t, agents have access to information only up to t − 1; in that case, agents' one-step ahead forecasts of y_{t+1} are formed as

Ê_{t−1} y_{t+1} = E(y_{t+1} | H_{t−1}) = φ̂_{0,t−1}(1 + φ̂_{1,t−1}) + φ̂²_{1,t−1} y_{t−1} + φ̂_{2,t−1}(φ̂_{1,t−1} + ρ̂_{1,t−1}) w_{t−1}    (4)

where H_{t−1} = {y_{t−1}, w_{t−1}, y_{t−2}, w_{t−2}, ..., y_1, w_1} ⊆ H_t is the information set available at time t − 1, and φ̂_{0,t−1}, φ̂_{1,t−1}, φ̂_{2,t−1} and ρ̂_{1,t−1} are least squares estimates of the coefficients of (3) and (2), obtained using the data in H_{t−1}. Replacing E_t y_{t+1} with Ê_{t−1} y_{t+1}, plugging the expressions (2) and (4) into equation (1) and solving for y_t gives the so-called actual law of motion under least squares learning:

y_t = γ[φ̂_{0,t−1}(1 + φ̂_{1,t−1})] + (γ φ̂²_{1,t−1} + δ) y_{t−1} + [γ φ̂_{2,t−1}(φ̂_{1,t−1} + ρ̂_{1,t−1}) + κ ρ̂_{1,t−1}] w_{t−1} + v_t.    (5)

The coefficients of model (5) are convolutions of γ, δ, κ and φ̂_{0,t−1}, φ̂_{1,t−1}, φ̂_{2,t−1} and ρ̂_{1,t−1}.

To estimate (5), one can cast the model, along with the equations governing the recursion of coefficient estimates, in state space form, and exploit likelihood-based (possibly Bayesian) techniques, see e.g. Milani (2005). Estimation under adaptive learning rules is implicitly carried out by treating the actual law of motion (5) as the data generating process.

This paper addresses the following question: how can one test the data adequacy of model (1) under the ALH? Answering this question amounts to investigating whether the process governing the convergence of model (5) to the rational expectations equilibrium, or to the 'restricted perceptions equilibrium', is supported by the observed data or not. Our solution is based on the assumption that agents use a VAR for Z_t = (y_t : w_t)' as their perceived law of motion, and apply RLS to update coefficient estimates. We do not require that the Minimum State Variable (MSV) solution (McCallum, 1983) of (1)-(2) be nested within the specified VAR. Rather, we maintain that the chosen VAR for Z_t represents a sound dynamic approximation of the observed time series, so that it can be used as the statistical platform upon which the CER implied by model (1) can be recursively tested as new data enter the information set.³ We argue that, in order to control the null hypothesis successfully over time, the resulting sequence of recursively computed test statistics can be compared with a suitable set of critical values opportunely adapted from Inoue and Rossi (2005). We discuss the method in detail in the next section.

³ In this paper we are not interested in the role of misspecification in adaptive learning. The interested reader is referred to Evans and Honkapohja (1999, 2003) and references therein.
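As an illustration of the learning recursion in (2)-(5), the following Python sketch re-estimates the perceived law of motion (3) by least squares at each date and lets the actual law of motion (5) generate the next observation, with the contemporaneous w_t replaced by its forecast ρ̂_{1,t−1} w_{t−1}, as in our reading of (5). All numerical values (structural parameters, sample sizes, shock scales, starting coefficients) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative structural parameters of eq. (1) and of the AR(1) process (2)
gamma, delta, kappa, rho1 = 0.45, 0.50, 0.10, 0.8
T, T0 = 300, 30          # total sample and pre-learning sample (both made up)

# Exogenous driving process w_t from eq. (2)
w = np.zeros(T)
for t in range(1, T):
    w[t] = rho1 * w[t - 1] + rng.normal(scale=0.5)

def ols_plm(y, w):
    """Least-squares estimates (phi0, phi1, phi2) of the perceived law of
    motion (3) and an AR(1) estimate of rho1 from (2), on the supplied data."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], w[:-1]])
    phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    rho_hat = np.linalg.lstsq(w[:-1, None], w[1:], rcond=None)[0][0]
    return phi, rho_hat

# Pre-learning sample: y is generated from an arbitrary fixed reduced form of
# type (3); these starting values are an assumption of this sketch.
y = np.zeros(T)
for t in range(1, T0):
    y[t] = 0.5 * y[t - 1] + 0.2 * w[t - 1] + rng.normal(scale=0.5)

# Learning phase: the perceived law of motion is re-estimated on H_{t-1} at
# each date, the forecast (4) is formed, and y_t is generated from the actual
# law of motion (5) (with w_t replaced by its forecast rho_hat * w_{t-1}).
for t in range(T0, T):
    (phi0, phi1, phi2), rho_hat = ols_plm(y[:t], w[:t])
    Ey_next = phi0 * (1 + phi1) + phi1 ** 2 * y[t - 1] \
              + phi2 * (phi1 + rho_hat) * w[t - 1]                 # eq. (4)
    y[t] = gamma * Ey_next + delta * y[t - 1] \
           + kappa * rho_hat * w[t - 1] + rng.normal(scale=0.5)    # eq. (5)

print("final perceived-law estimates:", round(phi0, 3), round(phi1, 3),
      round(phi2, 3), round(rho_hat, 3))
```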


3 VAR-based test of the adaptive learning hypothesis

Given the forward-looking model (1), consider the vector of observable variables Z_t = (y_t : w_t : a_t')', where a_t is a q_a × 1 (p = 2 + q_a) sub-vector of variables that do not enter the forward-looking model, but that possibly help to forecast w_t, or which play an important role in the system. In Section 5, where the NKPC will be investigated, y_t will be the inflation rate, w_t a measure of firms' real marginal costs, and a_t will be proxied by a short-term interest rate (q_a = 1). The law of motion for the p × 1 vector of observable variables is given by

Z_t = Σ_{i=1}^{k} B_i Z_{t−i} + ε_t    (6)

where k is the lag length, Z_0, Z_{−1}, ..., Z_{(1−k)} are fixed, B_i, i = 1, 2, ..., k, are p × p matrices of parameters, and ε_t is a martingale difference sequence (MDS) with respect to H_t = σ(Z_t, Z_{t−1}, ..., Z_1) ⊆ Ω_t, with (non-singular) covariance matrix Σ_ε and Gaussian distribution. For the sake of simplicity, and without loss of generality, we assume that the variables in Z_t are given in deterministic component-adjusted form, i.e. Z_t = Z_t^∗ − d_t, where Z_t^∗ is the vector containing the observed time series, d_t is a p × 1 vector obeying B(L)d_t = ΘD_t, with B(L) = I − Σ_{i=1}^{k} B_i L^i the characteristic polynomial and L the lag operator, D_t is the l × 1 vector containing all deterministic components (constant, linear trend, dummies, etc.), and Θ is the corresponding p × l matrix of coefficients. As will be seen below, since the forward-looking model in (1) does not include any deterministic component, we base our discussion on a VAR which is adjusted for deterministic components; otherwise we should also consider a set of CER on the Θ coefficients, which could hardly be justified in economic terms.

If the VAR is (asymptotically) stable, i.e. the roots of det(I_p − Σ_{i=1}^{k} B_i s^i) = 0 are such that |s| > 1, ℓ-step ahead forecasts of Z_t can be computed as

Ê_{t−1} Z_{t+ℓ} = g_z A^{ℓ+1} Z̃_{t−1}    (7)

0 et = (Zt0 : Z 0 : ... : Z 0 where Z t−1 t−k+1 ) is the pk × 1 state vector associated with the VAR (6),



B1

B2

···

  Ip 0p×p · · ·   A=  0p×p Ip · · ·  . .. ..  .. . .  0p×p · · ·

Bk−1

Bk

0p×p .. .

0p×p .. .

Ip

0p×p

         

is the pk × pk companion matrix, and gz is a p × pk selection matrix such that gz Zet = Zt . 7

(8)
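For concreteness, here is a minimal sketch of the companion-form objects in (6)-(8): it stacks made-up coefficient matrices B_1, ..., B_k into A, builds the selection matrix g_z, checks stability, and evaluates the ℓ-step ahead forecasts in (7). The numerical values are arbitrary and only meant to illustrate the mechanics.

```python
import numpy as np

p, k = 3, 2                                  # VAR dimension and lag length

# Made-up (stable) VAR coefficient matrices B_1, ..., B_k
B = [0.4 * np.eye(p), 0.1 * np.eye(p)]

# Companion matrix A of eq. (8): first block row holds B_1, ..., B_k,
# the lower block is [I_{p(k-1)}  0].
A = np.zeros((p * k, p * k))
A[:p, :] = np.hstack(B)
A[p:, :p * (k - 1)] = np.eye(p * (k - 1))

# Selection matrix g_z (p x pk), so that g_z Ztilde_t = Z_t
gz = np.hstack([np.eye(p), np.zeros((p, p * (k - 1)))])

# Stability: all eigenvalues of A inside the unit circle
assert np.max(np.abs(np.linalg.eigvals(A))) < 1

# l-step ahead forecasts of eq. (7): E_{t-1} Z_{t+l} = g_z A^{l+1} Ztilde_{t-1}
Ztilde = np.arange(1.0, p * k + 1)           # an arbitrary state vector
for ell in range(3):
    fc = gz @ np.linalg.matrix_power(A, ell + 1) @ Ztilde
    print("l =", ell, "forecast:", np.round(fc, 3))
```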

Since at each point in time the VAR coefficients are not known and must be estimated from the available data, in practice the forecast in (7) can be replaced with

Ê_{t−1} Z_{t+ℓ} = g_z (A_{t−1})^{ℓ+1} Z̃_{t−1}    (9)

where with the notation A_{t−1} we conventionally denote the counterpart of A in (8) in which the coefficients B_j are recursively replaced with their estimates based on H_{t−1}. More precisely, given initial coefficient estimates based on the sample from t = 1 to T_0, estimates can be updated following recursive rules: when the VAR disturbances are Gaussian, RLS amounts to reiterated applications of ML. From (9) it turns out that

Ê_{t−1} y_{t+1} = g_y (A_{t−1})² Z̃_{t−1}    (10)

Ê_{t−1} y_t = g_y A_{t−1} Z̃_{t−1}    (11)

Ê_{t−1} w_t = g_w A_{t−1} Z̃_{t−1}    (12)

where g_y and g_w are 1 × pk selection vectors such that g_y Z̃_t = y_t and g_w Z̃_t = w_t. Condition both sides of equation (1) with respect to H_{t−1} and apply the law of iterated expectations, obtaining

E(y_t | H_{t−1}) = γ E(y_{t+1} | H_{t−1}) + δ y_{t−1} + κ E(w_t | H_{t−1});    (13)

substitute the VAR forecasts (10)-(12) into (13), yielding the following set of CER

g_y A_{t−1}(I_{pk} − γ A_{t−1}) − δ g_y − κ g_w A_{t−1} = 0_{1×pk},   t = T_0 + 1, T_0 + 2, ...    (14)

which involve the structural parameters and the VAR coefficients, and where T_0 + 1 reads as the 'monitoring time'. In Appendix A we discuss in detail the nature of the nonlinear restrictions in (14), and show that they can be uniquely represented in explicit form as

B_{w,t−1} = f(B_{y,t−1}, B_{a,t−1}, τ)    (15)

where f is a nonlinear vector function, and B_{w,t−1} = g_w A_{t−1}, B_{y,t−1} = g_y A_{t−1}, and B_{a,t−1} = g_a A_{t−1} are the vectors (matrices) containing the VAR coefficients associated with the equations for w_t, y_t and a_t, respectively. The expression (15) shows that under the CER the VAR coefficients of the equation for w_t depend uniquely on the structural parameters and the remaining VAR coefficients. For notational convenience, we have not attached any time index to the structural parameters γ, δ and κ entering (14) (and (15)); yet, as in typical situations the vector τ = (γ : δ : κ)' is not known and must be inferred from the data, it is clear that the estimate of τ obtained from the constrained VAR will also vary over time.

Let

log L̂_t^max = −(t/2) log(det(Σ̂_{ε,t}))    (16)

be the unrestricted VAR log-likelihood (apart from a constant) evaluated at the maximum, where Σ̂_{ε,t} = t^{−1} Σ_{i=1}^{t} (Z_i − B̂_t Z̃_{i−1})(Z_i − B̂_t Z̃_{i−1})' and B̂_t = (B̂_{1,t} : ... : B̂_{k,t}) collects the VAR coefficient estimates based on the first t observations, and let

log L̃_t^max = −(t/2) log(det(Σ̃_{ε,t}))    (17)

be the constrained counterpart of (16), where Σ̃_{ε,t} is the estimated covariance matrix of the restricted VAR at time t. Formally, (16) is the likelihood of the actual law of motion at time t.

Given (16) and (17), the LR statistics for the constrained against the unrestricted VAR at time t are given by the sequence

LR_t = −2(log L̃_t^max − log L̂_t^max),   t = T_0 + 1, T_0 + 2, ...    (18)

In practice, the sequence (18) will be computed from t = T_0 + 1 until T^max, where T^max is the length of the sample at the time the test is computed. Suppose for the moment that we know which critical values must be used in (18). Given the sequential nature of the test, it may happen that the forward-looking model is rejected (accepted) for some t, and accepted (rejected) for other values of t, with t ranging from T_0 + 1 to T^max. Overall, the restrictions implied by the ALH tend to be supported by the data when the test does not reject the null hypothesis over a large fraction of the monitoring interval T_0 + 1, ..., T^max.
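The sketch below shows how the sequence (18) can be assembled in practice: the unrestricted log-likelihood (16) is obtained from recursive OLS fits of the VAR, while the restricted log-likelihood (17) is passed in as a function, since in the paper it comes from the numerical RML step discussed later in this section. The white-noise placeholder used in the example is only there to make the code run on its own and is not the CER-restricted VAR.

```python
import numpy as np

def unrestricted_loglik(Z, k, t):
    """Maximised Gaussian log-likelihood of eq. (16), apart from the constant,
    for an unrestricted VAR(k) fitted by OLS on the first t rows of Z (T x p).
    Here the first k observations serve only as initial values."""
    Y = Z[k:t]
    X = np.hstack([Z[k - i:t - i] for i in range(1, k + 1)])  # lags 1..k
    Bhat = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ Bhat
    Sigma = resid.T @ resid / resid.shape[0]
    return -0.5 * resid.shape[0] * np.log(np.linalg.det(Sigma))

def whitenoise_loglik(Z, k, t):
    """Placeholder 'restricted' model (Z_t i.i.d. Gaussian, all VAR coefficients
    zero).  It is NOT the CER-restricted VAR of the paper; it only makes the
    example self-contained."""
    Y = Z[k:t]
    Sigma = Y.T @ Y / Y.shape[0]
    return -0.5 * Y.shape[0] * np.log(np.linalg.det(Sigma))

def lr_sequence(Z, k, T0, restricted_loglik):
    """LR_t of eq. (18) for t = T0+1, ..., T, with restricted_loglik(Z, k, t)
    returning the constrained log-likelihood (17) (e.g. from an RML routine)."""
    T = Z.shape[0]
    return np.array([2.0 * (unrestricted_loglik(Z, k, t) - restricted_loglik(Z, k, t))
                     for t in range(T0 + 1, T + 1)])

rng = np.random.default_rng(1)
Z = rng.normal(size=(150, 3))                 # simulated data, illustration only
LR = lr_sequence(Z, k=2, T0=50, restricted_loglik=whitenoise_loglik)
print(LR[:5].round(2))
```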

At first sight, to decide about the null, one might compare the LR statistics (18) computed from T_0 + 1 to T^max with the critical value χ²_{r,1−α}, where χ²_{r,1−α} is the 1 − α quantile of the χ² distribution with r degrees of freedom, and r is the number of restrictions being tested under the null, r = pk − dim(τ) (dim(τ) = 3 in equation (1)). However, in the adaptive learning framework the test must be repeated when the observation T^max + 1 is available, and so forth. Therefore, by using conventional critical values in problems like (18), the probability that one eventually rejects the null hypothesis is asymptotically one due to the law of the iterated logarithm, that is

lim_{t→∞} P[LR_t ≥ χ²_{r,1−α}] = 1    (19)

even when the null is true (Robbins, 1970). Inoue and Rossi (2005) have derived a general asymptotic theory which allows one to 'follow' test statistics like (18) through the sequence t = T_0 + 1, T_0 + 2, ..., in such a way that the probability of rejecting the null hypothesis is under control at each t. They show that the asymptotic distribution of test statistics of the type (18) depends on a multivariate Brownian motion. In practice, given r and the nominal size of the test, α, the LR statistics in (18) can be compared with the asymptotic critical value IR_t^{r,α}, computed as IR_t^{r,α} = c²_{r,α} + r log(t/T_0), where c_{r,α} is taken from Table 1 of Inoue and Rossi (2005). Although the asymptotic theory of Inoue and Rossi (2005) covers all estimation methods that can be regarded as special cases of the generalized method of moments (including ML estimation), those authors have developed a test for recursively testing for predictive ability, and not for the class of restrictions in (14). It is therefore natural to ask how the sequence of recursive VAR-based LR tests for the CER (14) performs in samples of finite length. To investigate this issue, in Section 4 we carry out some Monte Carlo experiments with a twofold objective: first, to assess the size and power properties of the sequence of LR_t test statistics when they are compared with the asymptotic critical values IR_t^{r,α} in finite samples; second, to envisage simple adjustments to the proposed test.

To make the testing procedure operational, it is necessary to compute the likelihoods (16) and (17) for each t. In practice, the available sample of observations is split into two parts: the first, a pre-testing period from t = 1 to T_0, is used to produce initial estimates of the VAR coefficients; the second, from T_0 + 1 until T^max, is the sample period used for the recursive evaluation of the forward-looking model. The computation of (16) is straightforward. The computation of (17) involves the recursive estimation of the VAR under the CER and can be carried out, for each t, by restricted maximum likelihood (RML), hence requiring numerical optimization methods.⁴
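To make the comparison with the boundary IR_t^{r,α} = c²_{r,α} + r log(t/T_0), and with the weighted values cv_t^ς introduced in Section 4, concrete, the following sketch computes the critical-value path and the rejection decision for a given LR sequence. The constant c_IR is a placeholder that would have to be replaced by the tabulated c_{r,α} from Table 1 of Inoue and Rossi (2005).

```python
import numpy as np
from scipy.stats import chi2

def critical_value(t, T0, r, alpha, c_IR, weight):
    """Weighted critical value cv_t = (1-weight)*chi2_{r,1-a} + weight*IR_t,
    with IR_t = c_{r,a}^2 + r*log(t/T0).  weight=0 gives the plain chi-square
    quantile, weight=1 the Inoue-Rossi boundary."""
    chi2_q = chi2.ppf(1 - alpha, df=r)
    ir = c_IR ** 2 + r * np.log(t / T0)
    return (1 - weight) * chi2_q + weight * ir

def monitor(LR, T0, r, alpha=0.05, c_IR=3.0, weight=0.25):
    """Rejection indicator for each LR_t, t = T0+1, ..., T0+len(LR).
    c_IR is a placeholder for the tabulated c_{r,alpha}."""
    ts = np.arange(T0 + 1, T0 + len(LR) + 1)
    cv = np.array([critical_value(t, T0, r, alpha, c_IR, weight) for t in ts])
    return LR > cv, cv

# Example: a flat sequence of LR statistics, r = 3 restrictions
LR = np.full(100, 8.0)
reject, cv = monitor(LR, T0=50, r=3)
print("fraction of rejections over the monitoring period:", reject.mean())
```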

4 Monte Carlo evidence

In this section we provide some Monte Carlo experiments to investigate the performance of the test of the ALH described in Section 3. The goal is to investigate the empirical size and power of the sequence of LR test statistics in (18) when the critical values taken from Inoue and Rossi (2005) are used in finite samples. The simulation experiments are also designed to envisage adjustments to the critical values in order to control the size of the procedure while preserving power against backward-looking specifications. The experiment is based on a trivariate (p = 3) VAR with k = 2 lags and a constant, and with disturbances ε_t drawn from a Gaussian distribution with covariance matrix Σ_ε = 0.5 I₃.⁵

⁴ Clearly, when the VAR disturbances are not Gaussian, the RML estimator is a restricted quasi-maximum likelihood (RQML) estimator. Observe that there are circumstances where the structural parameters in τ are known, and estimation under the CER is easier. This may happen (i) when the researcher has an a priori conjecture about the values taken by the structural parameters γ, δ and κ over the monitoring period, and wishes to test empirically the validity of his/her guess, or (ii) when the researcher aims to evaluate whether the estimates of γ, δ and κ obtained in a previous study (and on a different period) are still supported by the data when new observations enter the information set.

⁵ Results for larger values of p and k are available upon request. All computations in this and in the next section have been obtained through Ox 3.0.

To investigate size, data have been generated under the CER, restricting the VAR coefficients as in (15) and fixing the structural parameters at the values γ = 0.45, δ = 0.50 and κ = 0.10, respectively. To evaluate power against backward-looking alternatives (i.e. the unrestricted VAR), data have been generated from the same VAR as before, but without imposing any restrictions on the VAR coefficients. The norm of the vector d' = g_y A(I_{pk} − γA) − δ g_y − κ g_w A can be regarded as a measure of the extent of the deviations from the CER: in principle, the power of the test should increase for larger values of ||d||, ||·|| being the Euclidean norm.
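The deviation measure just defined is easy to evaluate numerically: the sketch below computes d' = g_y A(I_{pk} − γA) − δ g_y − κ g_w A and its Euclidean norm for an arbitrary companion matrix; under the CER the norm is zero. The coefficient values are made up for illustration.

```python
import numpy as np

def cer_deviation(A, gamma, delta, kappa):
    """d' = g_y A (I_pk - gamma*A) - delta*g_y - kappa*g_w A  (cf. eq. (14));
    its Euclidean norm measures the deviation from the CER."""
    pk = A.shape[0]
    gy = np.zeros(pk); gy[0] = 1.0      # selects y_t, the first element of Ztilde_t
    gw = np.zeros(pk); gw[1] = 1.0      # selects w_t, the second element
    d = gy @ A @ (np.eye(pk) - gamma * A) - delta * gy - kappa * (gw @ A)
    return d, np.linalg.norm(d)

# Arbitrary trivariate VAR(2) companion matrix, for illustration only
p, k = 3, 2
A = np.zeros((p * k, p * k))
A[:p, :p] = 0.4 * np.eye(p)
A[:p, p:] = 0.1 * np.eye(p)
A[p:, :p] = np.eye(p)

d, norm_d = cer_deviation(A, gamma=0.45, delta=0.50, kappa=0.10)
print("||d|| =", round(norm_d, 3))
```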

We discuss in detail the size and power properties resulting from the experiments.

Size evaluation

The nominal size of the test was set to α = 0.05. We simulated M = 5000 samples of length T^max = 150 from the constrained VAR, and on each simulated sample we estimated a trivariate VAR with two lags, both unrestrictedly and subject to the CER, using the observations up to T_0 = 50 to form initial coefficient estimates, and the observations from T_0 + 1 up to T^max to evaluate the CER. The constrained estimation was carried out by the BFGS method (Fletcher, 1987).⁶ We then computed the LR statistics (18) for t = T_0 + 1, ..., T^max, and compared each LR_t with the following set of critical values: (a) the χ²_{r,1−α} quantile with r = pk − dim(τ) = 6 − 3 = 3, as in the case of repeated applications of a 'one-shot' LR test for the CER; (b) the critical value IR_t^{r,α} = c²_{r,α} + r log(t/T_0), with c_{r,α} taken from Table 1 in Inoue and Rossi (2005), see Section 3; (c) a set of 'weighted' critical values, cv_t^ς, constructed as linear convex combinations of χ²_{r,1−α} and IR_t^{r,α}, cv_t^ς = (1 − ς)χ²_{r,1−α} + ς IR_t^{r,α}, 0 ≤ ς ≤ 1. The weight ς has been selected from the set {0, 1/4, 1/2, 3/4, 1}, so that cv_t^1 = IR_t^{r,α} and cv_t^0 = χ²_{r,1−α}. The weighted critical values cv_t^ς have been introduced in order to offset the tendency, shown by the test based on the critical value IR_t^{r,α}, to over-preserve the null hypothesis in finite samples, even when the forward-looking model is false (see below).

Figure 1 plots the fraction of times in which the null is rejected using the five critical values. It can be noticed that using χ²_{r,1−α} (cv_t^0) leads to a substantial - albeit not dramatic - over-rejection of the true model. In this case, the tendency of the empirical size to drift away from the nominal size as the number of test repetitions increases can be inferred from the last part of the sequence. On the other hand, the use of the critical value IR_t^{r,α} (cv_t^1) taken from Table 1 of Inoue and Rossi (2005) provides a strictly conservative control of the null. The graph shows that the 'weighted' critical value cv_t^{0.25} tracks the nominal size of the test well over the entire monitoring period.⁷

⁶ We used the following transformations for the structural parameters: γ = 1/(1 + exp(ψ_1)), δ = 1/(1 + exp(ψ_2)), κ = exp(ψ_3), with ψ_i, i = 1, 2, 3, real numbers.

Power evaluation

We repeated the simulation exercise described above, generating M = 5000 samples of length T^max = 150 from a partially restricted VAR. More precisely, the VAR coefficients in the first and third equation of the system have been left at the same values considered in the size experiment, whereas the coefficients in the second equation have been fixed such that the norm ||d|| is equal to 0.47 in one case and to 0.53 in the other. Again, on each simulated sample we estimated a trivariate VAR with two lags, both unrestrictedly and under the CER, using the observations up to T_0 = 50 to form initial coefficient estimates, and the observations from T_0 + 1 up to T^max to evaluate the CER.

The two panels of Figure 2 plot the fraction of times in which the null hypothesis is rejected using the five critical values discussed above. The upper panel refers to the case ||d|| = 0.47, and the lower panel to the case ||d|| = 0.53. Results show, as expected, that the power of the test tends to increase as deviations from the null, measured by the norm of d, become more pronounced. At first glance, the test based on χ²_{r,1−α} (cv_t^0) seems the most powerful, but as already explained in the previous sections, and as shown in the size experiment, repeated comparisons between the test statistics LR_t and the quantile χ²_{r,1−α} lead by construction to a size-biased procedure. On the other hand, the test based on IR_t^{r,α} (cv_t^1) appears to be the least powerful, in line with its conservative nature already detected in the size experiment. The weighted critical values cv_t^ς deliver an interesting balance between size and power: in particular, for relatively small values of r (that is, relatively small values of p and k), the use of cv_t^{0.25} in the test guarantees a good compromise between size coverage and power against backward-looking alternatives. Overall, our simulation experiments, not reported for brevity, suggest that for p = 2 (bivariate VAR) and p = 3 (trivariate VAR), and for k = 2, 3, 4, implying values of r = pk − dim(τ) ranging from r = 1 to r = 9, the choice ς = 1/4 in cv_t^ς = (1 − ς)χ²_{r,1−α} + ς IR_t^{r,α} guarantees a satisfactory control of the null hypothesis in finite samples, without penalizing power against unrestricted VARs.

⁷ For the sake of brevity, we do not document in detail the performance of the BFGS procedure through which we obtain the RML estimates of the VAR coefficients, and in particular of the structural parameters γ, δ and κ. Here it is sufficient to stress that for each sample of length t = T_0 + 1, ..., T^max the forward-looking parameter γ is estimated less precisely than δ and κ. Observe that estimation in the empirical application of Section 5 is performed by combining the BFGS procedure with a grid search for γ, δ and κ.

5 Application to the NKPC

In this section we apply the procedure described in Sections 3 and 4 to test the NKPC on euro area data under the ALH. The traditional 'hybrid' formulation of the NKPC can be regarded as a special case of model (1), with y_t ≡ π_t representing the inflation rate and w_t a measure of firms' real marginal costs; κ is the slope parameter, γ captures the role of expected future inflation, and δ can be related to inertial mechanisms such as indexation, contracts, or rules of thumb.

Although there exist several examples in the literature where dynamic stochastic general equilibrium models comprising NKPC-like supply equations are investigated under adaptive learning, to our knowledge Milani (2005) is the only existing contribution where the econometric analysis of the NKPC is explicitly addressed under adaptive learning rules. We briefly review Milani's (2005) findings in order to stress differences and analogies with the results reported below. Focusing on the United States, Milani (2005) considers a univariate dynamic forecast model for inflation serving as the perceived law of motion, and estimates the resulting actual law of motion recursively using a 'constant gain' version of recursive least squares. He finds that backward-looking inflation components captured by δ are no longer essential to fit the data when the suggested learning algorithm is taken into account. Learning is thus interpreted as the major source of inflation persistence. The analysis in Milani (2005) does not take any stand on the idea of testing whether the actual law of motion he obtains from the NKPC is recursively supported by the data under the ALH.

Data description

We consider quarterly data taken from the last release of the Area-wide Model data set described in Fagan et al. (2001). Variables cover the period 1971.q1-2006.q4; the data used in estimation and testing are counterparts of the variables in the vector Z_t = (π_t : ws_t : i_t)'. The inflation rate π_t is measured by the quarterly change in the log of the GDP deflator; firms' average marginal costs are proxied by the wage share ws_t (log of real unit labour costs) as in Gali et al. (2001); i_t is the short-term nominal interest rate.⁸

VAR specification

The analysis is based on two VARs for Z_t = (π_t : ws_t : i_t)', with 2 and 3 lags, denoted VAR(2) and VAR(3), respectively. Both systems include an intercept, and are used as the statistical platform upon which the restrictions implied by the NKPC under the ALH are tested.

⁸ The GDP deflator is YED in the AWM data set. The wage share (real unit labour costs) is computed as ws_t = 100 × log(WIN_t/YER_t), where WIN is 'Compensation to Employees' (in real terms) and YER is real GDP. The short-term interest rate is STN in the data set.
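As an illustration of the variable construction in the footnote above, the following pandas sketch builds π_t, ws_t and i_t from AWM-style series. The tiny data frame is made up so that the example runs on its own, the 100× scaling of the wage share follows the footnote, and the treatment of inflation (quarterly log-change, unscaled) is our assumption.

```python
import numpy as np
import pandas as pd

# In practice the series come from the AWM database (Fagan et al., 2001); the
# made-up frame below stands in for it so that the sketch is self-contained.
idx = pd.period_range("1971Q1", periods=8, freq="Q")
awm = pd.DataFrame({"YED": np.linspace(1.00, 1.08, 8),    # GDP deflator
                    "WIN": np.linspace(50.0, 54.0, 8),    # compensation to employees (real terms)
                    "YER": np.linspace(100.0, 104.0, 8),  # real GDP
                    "STN": np.linspace(5.0, 4.3, 8)},     # short-term nominal interest rate
                   index=idx)

pi = np.log(awm["YED"]).diff()               # inflation: quarterly change in the log GDP deflator
ws = 100 * np.log(awm["WIN"] / awm["YER"])   # wage share, as in the footnote above
i_short = awm["STN"]

Z = pd.concat([pi, ws, i_short], axis=1, keys=["pi", "ws", "i"]).dropna()
print(Z.head())
```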


We preliminarily tested for the possible presence of breaks in the VAR coefficients, relying on the Sup-Wald test statistics for parameter instability derived by Bai et al. (1998). The Sup-Wald test detects the most likely date for a break in all VAR coefficients, treating the break date as unknown; it allows one to detect stable sub-periods of the sample 1971.q1-2006.q4 over which to evaluate the NKPC. Results are reported in Table 1. Depending on the VAR lag order, the Sup-Wald test identifies two dates located between the end of the seventies and the beginning of the eighties, covering the second oil shock (when the VAR(2) is considered), as the most likely break dates. In all cases, the values of the test statistics are significant at the 1% level, and the 90% confidence intervals are very tight.⁹ From the results in Table 1 we decided to restrict the econometric evaluation of the NKPC to the sub-period 1980.q3-2006.q4, in which both VARs do not exhibit parameter instability.

NKPC evaluation

We have split the sample 1980.q3-2006.q4 into two parts: the sub-period 1980.q3-1985.q4 has been used to form initial VAR coefficient estimates, and the sub-period 1986.q1-2006.q4 has been used to test the NKPC under the ALH. In terms of the notation used in Section 3, T_0 = 1985.q4 and T^max = 2006.q4. Following Batini (2006), the date T_0 + 1 = 1986.q1 is here chosen as the first monitoring time, as after 1984/1985 the nature of the European Exchange Rate Mechanism characterizing the majority of European countries changed from a 'soft' to a 'hard' exchange rate parity arrangement; yet, results are robust to different choices of T_0 + 1.

Table 2 compares the two unrestricted VARs over the monitoring period, using standard information criteria (the Akaike information criterion, AIC, the Hannan-Quinn criterion, HQ, and the Schwarz criterion, SC) and multivariate residual diagnostic tests. Both models fit the data well, though the VAR(2) appears slightly better than the VAR(3).

The RML estimation in the VAR(2) and VAR(3) has been performed by combining the BFGS method with a grid search for the structural parameters γ, δ and κ. This choice, albeit time consuming in the recursive framework, allows one to evaluate the likelihood function at all meaningful points of the structural parameter space, reducing (if the chosen grid is sufficiently fine) the possibility of ending up in a local maximum. We specified the range 0.6-0.98 for γ, the range 0-0.30 for δ, and the range 0.02-0.20 for κ, using 0.02 as the common incremental value; for γ and δ, only pairs fulfilling the restriction γ + δ ≤ 1 have been considered in estimation.¹⁰ The number of CER, r = pk − dim(τ), is r = 3 with the VAR(2), and r = 6 with the VAR(3).

⁹ The p-values associated with the Sup-Wald statistics in Table 1 have been computed by simulation-based methods, applying the local Monte Carlo (LMC) (or 'parametric bootstrap') techniques described in Dufour and Jouini (2006). Observe that the same result is obtained using the Sup-LR test (Andrews, 1993).

14

The upper panels of Figures 3 and 4 plot the sequence of LR statistics computed using the VAR(2) and the VAR(3) over the monitoring period, along with the 5% critical values χ²_{r,1−α} (cv_t^0), cv_t^{0.25} and IR_t^{r,α} (cv_t^1), where cv_t^{0.25} is constructed as cv_t^{0.25} = 0.75 χ²_{r,1−α} + 0.25 IR_t^{r,α}, see Section 4. The lower panels of Figures 3 and 4 report the recursive ML estimates of the structural parameters γ, δ and κ, obtained from the constrained system by combining the BFGS procedure with the grid search.

The graphs reveal that if one focuses on the VAR(2) and the VAR(3) and considers a 'one-shot' LR test for the CER, namely compares the value of the LR test statistic obtained at the date 2006.q4 (using all available information) with the quantile taken from the χ² distribution, the NKPC is sharply rejected. The rejection of the NKPC in the euro area under the REH is in line with Bårdsen et al. (2004), O'Reilly and Whelan (2005), and Fanelli (2008). However, taking into account the sequential nature of the recursive test, and comparing the LR test statistics in the sequence with the critical values cv_t^{0.25} (see Section 4), it turns out that for both models the CER are supported over a large part of the period 1986.q1-2006.q4 (73% of the time for the VAR(2), and 95% of the time for the VAR(3)). The fact that in both cases the rejection of the CER occurs at the end of the monitoring period can be interpreted as further evidence against the REH.

The overwhelming magnitude of the recursively estimated forward-looking parameter, γ, compared to the recursively estimated backward-looking parameter, δ, suggests that intrinsic inflation persistence mechanisms such as indexation, contracts and rules of thumb are no longer crucial ingredients of inflation dynamics under adaptive learning schemes. This result is in line with the findings of Milani (2005) for the U.S. economy; however, unlike that author, we find that non-zero values of δ are consistent with euro area data.

6 Concluding remarks

The REH is pervasive in the macroeconometric literature, and its econometric implications have been tested in many fields of research with mixed results. Although a recent strand of the literature contends that economic agents display bounded rationality, and replaces the REH with adaptive learning schemes, the issue of testing the econometric implications of the ALH has been generally disregarded. In this paper, we have introduced a likelihood-based recursive method for testing the econometric implications of the ALH, considering a class of forward-looking models which is widely used in monetary policy. We focus on the case in which agents use VARs to form their forecasts and RLS to update coefficient estimates over time. Other models and updating rules are equally possible and deserve exploration in future research.

We have shown that, in our set-up, LR tests for the CER implied by forward-looking models can be recursively computed through repeated applications of 'one-shot'-type tests. However, in order to control the null hypothesis successfully over the monitoring period, the test statistics must be compared with critical values that do not remain constant over time. Our Monte Carlo experiments reveal that a proper set of critical values can be opportunely adapted from the theory of Inoue and Rossi (2005), obtaining a procedure which allows us to control the size of the test successfully, without penalizing power against backward-looking alternatives, even in finite samples. The paper has shown that the empirical assessment of the NKPC in the euro area changes substantially when the REH is replaced with the ALH. The result of the test emphasizes the role of adaptive learning as a major source of euro area inflation persistence.

Appendix A

Derivation of cross-equation restrictions in explicit form

In this Appendix we discuss in detail the CER (14). Given the VAR (6) with q_a = 1, its companion matrix A, and the selection vectors g_y, g_w and g_a, let B_y' = g_y A, B_w' = g_w A and B_a' = g_a A be the 1 × pk vectors containing the parameters associated with the VAR marginal equations for y_t, w_t and a_t, respectively. The companion matrix can thus be written as

A = [ B_y'      (1 × pk)
      B_w'      (1 × pk)
      B_a'      (1 × pk)
      ϑ'        ((pk − 3) × pk) ]

where ϑ' is the lower block of the companion matrix, containing zeros and ones only, and B_y' = (B_{y,1} : B_{y,2} : ... : B_{y,pk}), B_w' = (B_{w,1} : B_{w,2} : ... : B_{w,pk}), B_a' = (B_{a,1} : B_{a,2} : ... : B_{a,pk}).

Given these definitions, the CER (14) can be re-written in the form

B_y'(I_{pk} − γA) − δ g_y − κ B_w' = 0_{1×pk}    (20)

and, applying the vec operator to both sides, (I_{pk} − γA')B_y − κ B_w − δ g_y' = 0_{pk×1}. It is now evident that (20) gives rise to the following set of restrictions:

[B_{y,i} − γ(B_{y,1}B_{y,i} + B_{y,p}B_{a,i} + B_{y,2}B_{w,i} + I(i ≤ p(k−1))B_{y,(3+i)})] − κ B_{w,i} − I(i = 1)δ = 0,   i = 1, 2, ..., pk    (21)

where I(·) is the indicator function. As shown in Kurmann (2007) for bivariate systems, and in Fanelli (2008) and Fanelli and Palomba (2007) in a more general set-up, the relations in (21) can be uniquely expressed in explicit form by solving the system with respect to the B_{w,i} coefficients, obtaining

B_{w,i} = (1/(γB_{y,2} + κ)) [B_{y,i} − γ(B_{y,1}B_{y,i} + B_{y,p}B_{a,i} + I(i ≤ p(k−1))B_{y,(3+i)}) − I(i = 1)δ],   i = 1, 2, ..., pk    (22)

provided that B_{y,2} ≠ −(κ/γ).

The constraints in (22) express the parameters associated with the w_t marginal equation of the VAR as a unique function of the other VAR parameters (those in B_y and B_a) and of the structural coefficients τ = (γ : δ : κ)'. Fanelli and Palomba (2007) show that the constrained VAR is (locally) identifiable in the sense of Rothenberg (1971) in a neighborhood of the true parameter values. The number of free parameters under the restrictions (22) is 2pk + dim(τ), whereas the number of parameters of the unrestricted VAR is 3pk; this means that the number of implied over-identifying restrictions is pk − dim(τ), which implies that the VAR lag length, k, must obey the restriction pk ≥ 4 for the CER to be binding.
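A sketch of the mapping in (22): given the y_t- and a_t-equation coefficients and τ = (γ : δ : κ)', it returns the w_t-equation coefficients implied by the CER and verifies that the resulting companion matrix satisfies (20). The indicator I(i ≤ p(k−1)) and the −I(i = 1)δ term follow our reconstruction of (21)-(22) above; the numerical values are illustrative only.

```python
import numpy as np

def cer_bw(By, Ba, gamma, delta, kappa, p=3, k=2):
    """Explicit-form CER (22): coefficients of the w_t equation implied by the
    y_t- and a_t-equation coefficients (By, Ba, length-pk arrays) and by
    tau = (gamma, delta, kappa).  Requires By[1] != -kappa/gamma."""
    pk = p * k
    Bw = np.zeros(pk)
    denom = gamma * By[1] + kappa
    for i in range(pk):                                   # 0-based index
        lead = By[p + i] if i < p * (k - 1) else 0.0      # I(i <= p(k-1)) * B_{y,(3+i)}
        num = By[i] - gamma * (By[0] * By[i] + By[p - 1] * Ba[i] + lead)
        if i == 0:
            num -= delta                                  # the -I(i = 1)*delta term
        Bw[i] = num / denom
    return Bw

# Illustrative values (not estimates from the paper)
gamma, delta, kappa = 0.45, 0.50, 0.10
By = np.array([0.30, 0.10, 0.05, 0.10, 0.02, 0.01])
Ba = np.array([0.05, 0.02, 0.40, 0.01, 0.01, 0.10])
Bw = cer_bw(By, Ba, gamma, delta, kappa)

# Sanity check: the companion matrix built from (By, Bw, Ba) satisfies (20)
p, k = 3, 2
lower = np.hstack([np.eye(p * (k - 1)), np.zeros((p * (k - 1), p))])
A = np.vstack([By, Bw, Ba, lower])
gy, gw = np.eye(p * k)[0], np.eye(p * k)[1]
d = gy @ A @ (np.eye(p * k) - gamma * A) - delta * gy - kappa * (gw @ A)
print("max |d| =", np.abs(d).max())                       # ~0 up to rounding
```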

References

Andrews, D.W.K. (1993), Tests for parameter instability and structural change with unknown change point, Econometrica 61, 821-856.

Bai, J., Lumsdaine, R.L., Stock, J.H. (1998), Testing for and dating common breaks in multivariate time series, Review of Economic Studies 65, 395-432.

Bårdsen, G., Jansen, E.S. and Nymoen, R. (2004), Econometric evaluation of the New Keynesian Phillips curve, Oxford Bulletin of Economics and Statistics 66, 671-685.

Batini, N. (2006), Euro area inflation persistence, Empirical Economics 31, 977-1002.

Bekaert, G., Hodrick, R. (2001), Expectations hypotheses tests, Journal of Finance 56, 1357-1394.

Branch, W.A. (2004), The theory of rationally heterogeneous expectations: evidence from survey data on inflation expectations, Economic Journal 114, 592-621.

Branch, W.A., Evans, G.W. (2006), A simple recursive forecasting model, Economics Letters 91, 158-166.

Campbell, J.Y., Shiller, R.J. (1987), Cointegration and tests of present value models, Journal of Political Economy 95, 1062-1088.

Dufour, J.-M., Jouini, T. (2006), Finite-sample simulation-based inference in VAR models with application to Granger causality testing, Journal of Econometrics 135, 229-254.

Evans, G.W. and Honkapohja, S. (1999), Learning dynamics, in Handbook of Macroeconomics 1A, Chap. 7.

Evans, G.W., Honkapohja, S. (2001), Learning and expectations in macroeconomics, Princeton University Press.

Evans, G.W., Honkapohja, S. (2003a), Adaptive learning and monetary policy design, Journal of Money, Credit and Banking 35, 1045-1072.

Evans, G.W., Honkapohja, S. (2003b), Expectations and the stability problem for optimal monetary policies, Review of Economic Studies 70, 807-824.

Fagan, G., Henry, G. and Mestre, R. (2001), An area-wide model (AWM) for the euro area, European Central Bank, Working Paper No. 42.

Fanelli, L. (2008), Testing the New Keynesian Phillips Curve through Vector Autoregressive models: Results from the euro area, Oxford Bulletin of Economics and Statistics 70, 53-66.

Fanelli, L., Palomba, G. (2007), Simulation-based tests of forward-looking models under VAR learning dynamics, Università Politecnica delle Marche, Quaderno di Ricerca No. 298.

Fletcher, R. (1987), Practical methods of optimization, Wiley-Interscience, New York.

Galí, J., Gertler, M. (1999), Inflation dynamics: a structural econometric analysis, Journal of Monetary Economics 44, 195-222.

Galí, J., Gertler, M. and Lopez-Salido, J.D. (2001), European inflation dynamics, European Economic Review 45, 1237-1270.

Inoue, A., Rossi, B. (2005), Recursive predictability tests for real-time data, Journal of Business and Economic Statistics 23, 336-345.

Ireland, P. (2003), Irrational expectations and econometric practice. Discussion of Orphanides and Williams, "Inflation scares and forecast-based monetary policy", Federal Reserve Bank of Atlanta, Working Paper 2003-22.


Kurmann, A. (2007), Maximum likelihood estimation of dynamic stochastic theories with an application to New Keynesian pricing, Journal of Economic Dynamics and Control 31, 767-796.

McCallum, B.T. (1983), On non-uniqueness in rational expectations models: an attempt at perspective, Journal of Monetary Economics 11, 139-168.

Milani, F. (2005), Adaptive learning and inflation persistence, Working Paper, University of California, Irvine.

Muth, J.F. (1961), Rational expectations and the theory of price movements, Econometrica 29, 315-335.

O'Reilly, G., Whelan, K. (2005), Has euro-area inflation persistence changed over time?, Review of Economics and Statistics 87, 709-720.

Pesaran, H.M. (1987), The limits to rational expectations, Basil Blackwell, Oxford.

Robbins, H. (1970), Statistical methods related to the law of the iterated logarithm, Annals of Mathematical Statistics 41, 1397-1409.

Rothenberg, T. (1971), Identification in parametric models, Econometrica 39, 577-591.

Rudd, J., Whelan, K. (2005a), Does labor's share drive inflation?, Journal of Money, Credit and Banking 37, 297-312.

Rudd, J., Whelan, K. (2005b), New tests of the New Keynesian Phillips curve, Journal of Monetary Economics 52, 1167-1181.

Rudd, J., Whelan, K. (2006), Can rational expectations sticky-price models explain inflation dynamics?, American Economic Review 96, 303-320.

Sargent, T.J. (1979), A note on the maximum likelihood estimation of the rational expectations model of the term structure, Journal of Monetary Economics 5, 133-143.

Sargent, T.J. (1999), The conquest of American inflation, Princeton University Press, Princeton.

Sbordone, A.M. (2002), Prices and unit labor costs: a new test of price stickiness, Journal of Monetary Economics 49, 265-292.

Sbordone, A.M. (2005), Do expected future marginal costs drive inflation dynamics?, Journal of Monetary Economics 52, 1183-1197.

A Figures and Tables

[Figure 1: line plot over the monitoring period t = 51, ..., 150; lines: χ², I&R, cv^{0.75}, cv^{0.50}, cv^{0.25}, nominal size.]

Figure 1. Monte Carlo experiment, T_0 + 1 = 51 and T^max = 150, VAR with p = 3 variables and k = 2 lags. Empirical size of the sequence of LR tests for the CER implied by the forward-looking model (1) with γ = 0.45, δ = 0.50 and κ = 0.10 under the ALH, vs nominal size (0.05). The different lines refer to the empirical size obtained through different critical values.

[Figure 2: two panels of line plots over the monitoring period t = 51, ..., 150, for ||d|| = 0.47 (upper panel) and ||d|| = 0.53 (lower panel); lines: χ², I&R, cv^{0.75}, cv^{0.50}, cv^{0.25}.]

Figure 2. Monte Carlo experiment, T_0 + 1 = 51 and T^max = 150, VAR with p = 3 variables and k = 2 lags. Empirical power of the sequence of LR tests for the CER implied by the forward-looking model (1) under the ALH, for two values of ||d||, see Section 4. The different lines refer to the empirical power obtained using different critical values.

[Figure 3: upper panel, recursive LR statistics for the VAR(2) over 1986-2006 with the critical values χ² (cv^0), I&R (cv^1) and cv^{0.25}; lower panel, recursive estimates of γ, δ and κ.]

Figure 3. Euro area data. Upper panel: sequence of recursively computed LR statistics for the CER implied by the NKPC under the ALH over the period 1986.q1-2006.q4, with corresponding 5% critical values computed as in Section 4. Lower panel: recursively estimated structural parameters obtained through the grid search (Section 5). Results based on the VAR(2).

[Figure 4: upper panel, recursive LR statistics for the VAR(3) over 1986-2006 with the critical values χ² (cv^0), I&R (cv^1) and cv^{0.25}; lower panel, recursive estimates of γ, δ and κ.]

Figure 4. Euro area data. Upper panel: sequence of recursively computed LR statistics for the CER implied by the NKPC under the ALH over the period 1986.q1-2006.q4, with corresponding 5% critical values computed as in Section 4. Lower panel: recursively estimated structural parameters obtained through the grid search (Section 5). Results based on the VAR(3).

Sup-Wald break date statistics, sample period: 1970:2-2006:4

VAR with k lags    Sup-Wald          Break date    90% confidence interval
k = 2              99.82 [0.001]     1979:4        1979:3-1980:1
k = 3              115.91 [0.001]    1976:4        1976:3-1977:1

Table 1: Sup-Wald values of the break date test of Bai et al. (1998) on VARs of lag length 2 and 3. NOTES: p-values in square brackets have been computed using local Monte Carlo simulation techniques (Dufour and Jouini, 2006) and 1000 replications. Trimming fraction = 0.15.

Monitoring period: 1986.q1-2006.q4

          AIC       HQ        SC        Autocor                 Normality
VAR(2)    -24.18    -23.93    -23.57    F(9,175)=1.74 [0.08]    χ²(6)=8.26 [0.22]
VAR(3)    -24.15    -23.80    -23.28    F(9,168)=1.16 [0.32]    χ²(6)=5.55 [0.48]

Table 2: Information criteria and vector residual diagnostic tests for the VAR(2) and VAR(3) over the monitoring period. NOTES: AIC = Akaike; HQ = Hannan-Quinn; SC = Schwarz; Autocor = LM vector test for residual autocorrelation up to 1 lag; Normality = LM vector test for residual normality; p-values in square brackets.

