Should we be surprised by the unreliability of real-time output gap estimates? Density estimates for the Eurozone

James Mitchell∗
National Institute of Economic and Social Research

December 18, 2003
Abstract

Output gap estimates calculated in real-time are known to be unreliable. Recent work has found that, without the benefit of hindsight, it can prove difficult for policy-makers to pin down accurately the current position of the output gap. However, attention has focussed primarily on output gap point estimates alone. This paper considers output gap estimates and their uncertainty more generally. As is well known from the forecasting literature, the fact that point forecasts are inaccurate need not mean that forecasts more generally contain no useful information, provided the outturn falls within the bounds of what was expected. In this sense, the unreliability of real-time output gap estimates need be neither surprising nor lead to sub-optimal inference by users of the estimates. Interpreting real-time output gap estimates as forecasts, we explain the importance of providing measures of uncertainty, via interval or density forecasts, around real-time output gap estimates, and consider how this can be achieved. The importance of allowing, in particular, for parameter uncertainty is discussed. We then explain how the accuracy of these measures of unreliability can be evaluated statistically ex post, so that a decision can be made about their reliability. An application to the Eurozone illustrates the use of these techniques in the context of real-time output gap measurement. Simulated out-of-sample experiments reveal that not only can real-time point estimates of the Eurozone output gap be unreliable, but so can the measures of uncertainty associated with them. This provides a serious challenge to both producers and users of output gap estimates.

Keywords: Output gap; Real-Time; Density Forecasts
JEL Classification: E32; C53
∗ Thanks to Andrew Harvey, Kostas Mouratidis, Tommaso Proietti, Simon van Norden, Ken Wallis and Martin Weale for helpful comments. This paper is part of a programme of work undertaken at the National Institute of Economic and Social Research in London with the support of EUROSTAT. Address for correspondence: James Mitchell, NIESR, 2 Dean Trench Street, Smith Square, London, SW1P 3HE, UK. Tel: +44 (0)207 654 1926. E-Mail: [email protected]. All errors are mine alone.
1 Introduction
Policy-makers require output gap estimates in real-time.1 They do not have the luxury of being able to wait before deciding whether the economy is currently lying above or below its trend level. They have to decide, without the benefit of hindsight, whether a given change to output in the current period is temporary or permanent, that is, whether it is a cyclical or a trend movement.2 Their problem can be interpreted as a forecasting one, since these real-time output gap estimates are forecasts, in the sense that they are expectations of the output gap conditional on incomplete information. Only with the arrival of additional information, such as revised historical data and data not available at the time, do the output gap estimates eventually settle down at their “final” values.3

Recent work has found real-time or end-of-sample output gap (point) estimates to be unreliable, in the sense that there is a large and significant revision or forecasting error; see Orphanides & van Norden's (2002) application to the US economy.4 The revisions associated with real-time estimates are considerable; indeed for the US they were found to be as large as the output gap estimates themselves. So-called data revisions, explained by revisions to published GDP data, were found to be less important than so-called statistical revisions. Statistical revisions are explained by the arrival of new data helping macroeconomists, with the advantage of hindsight, better understand the position of the business cycle, and also perhaps revise what model they use to identify and estimate the output gap. Clearly, policy-makers misjudging the position of the business cycle in real-time, or in other words making poor forecasts, can lead to sub-optimal policy decisions; see Nelson & Nikolov (2001) and Ehrmann & Smets (2003). The findings for the US, therefore, are worrying. Below we find that a similar picture of revisions to output gap estimates emerges when we look at the Eurozone business cycle.
But should we be surprised by this unreliability? Previous work has largely overlooked this question. With a couple of exceptions, to which we turn below, the focus has hitherto been on the point forecast - that is, on the “central tendency” of the output gap. But as is well known from the forecasting literature, forecasts need not be misleading or useless even if point estimates are wrong. Specifically, if the “final” estimate or outturn falls within the bounds of what was expected in real-time, real-time estimates can remain useful to users. It is therefore imperative to provide users of output

1 Since our focus is on the “growth” business cycle rather than the “classical” cycle, the “business cycle” and “output gap” are treated synonymously.
2 In fact, in real time policy-makers apparently require not only current estimates of the output gap but also future or forecasted values; see Schumacher (2002). In this paper we concentrate on obtaining in real-time current estimates of the output gap, and their uncertainty. In any case, as we argue below, one can view obtaining real-time estimates of the (current) output gap as a forecasting exercise.
3 Values are never truly final because data revisions and the arrival of new data are a continuous process.
4 This unreliability, of course, has long been appreciated qualitatively; e.g. see Morgenstern's (1963) discussion (p. 268), where he views the revisions typically associated with national accounts data as “casting serious doubts on the usefulness of national income figures for business cycle analysis”.
gap estimates with not just a real-time point estimate, but also an indication of the degree of uncertainty associated with it. If reliable, users can take decisions accordingly and insure themselves against the possibility of revisions to real-time output point estimates. In fact, measures of uncertainty are useful in their own right if one is interested in analysing, for example, risk and volatility, or the probability of a recession.

Although previous work, such as Orphanides & van Norden (2002) and Camba-Mendez & Rodriguez-Palenzuela (2003), has provided measures of uncertainty associated with output gap estimates, via estimated standard errors, as indicated above the focus has remained on the point estimates; the quality of real-time output gap estimates has been evaluated primarily by focussing on the degree and nature of ex post revisions to output gap point estimates. Typically this has involved examination, for example, of both the correlation between the real-time and final point estimates and the difference between the real-time and final estimates. When presented, measures of uncertainty have been evaluated in ad hoc ways - it has not been made explicit what defines a “good” measure of uncertainty. This can leave us unclear about the reliability of real-time output gap estimates and their uncertainty.

In this paper we provide a more general discussion of the importance of providing measures of uncertainty, via interval and density forecasts, associated with real-time output gap point estimates.5 We explicitly relate the computation and evaluation of measures of uncertainty associated with real-time output gap estimates to recently developed techniques from the forecasting literature. By comparison, previous work has left this relationship, at best, implicit and relied on ad hoc tests.
Camba-Mendez & Rodriguez-Palenzuela (2003) test if the standard error of the real-time estimates is equal to that of the final estimates using what can be interpreted as a test for equal (point) forecast accuracy similar to Diebold & Mariano (1995), while Orphanides & van Norden (2002) use a type of Diebold-Mariano test on the second moments of the revision between real-time and final estimates to test if the variability of the revision process is consistent with what was expected ex post rather than ex ante.6 In contrast, we consider real-time

5 Our work builds on pioneering contributions, such as Stone et al. (1942) and Morgenstern (1963), that look at the uncertainty of economic statistics. Morgenstern, for example, stresses the uncertainties associated with national income statistics and begins to quantify them using uncertainty bands (see Figure 10, p. 269).
6 In the notation of Appendix C, Camba-Mendez & Rodriguez-Palenzuela (2003) test if P_{t|t}(Θ_t) = P_{t|T}(Θ_T), where P_{t|t}(Θ_t) is the variance of the real-time estimate and P_{t|T}(Θ_T) is the variance of the final estimate. Note that the parameters are estimated, respectively, using current and full-sample information. Under squared error loss, Gaussianity and both the real-time and final estimates having zero mean around some ‘true’ estimate (we could think of the final estimate as this ‘true’ estimate), this amounts to a test for equal forecast performance similar to Diebold & Mariano (1995). The hypothesis is tested using an F-test, whose appropriateness rests on certain, unstated, rather strong conditions such as no serial correlation. Orphanides & van Norden (2002), on the other hand, test if E(R_{t|T}) = 0 and E(R_{t|T})^2 = P_{t|t}(Θ_T) − P_{t|T}(Θ_T) via a Diebold-Mariano type test. The estimator for the variance of the revision, P_{t|t}(Θ_T) − P_{t|T}(Θ_T), is not computable in real-time (ex ante), however, requiring full-sample information, T. Nevertheless, it is interesting to compare their statistic with one of the evaluation tests considered below. Their test statistic is

  (1/T) Σ_{t=1}^{T} [ (y^C_{t|T} − y^C_{t|t})^2 / (P_{t|t}(Θ_T) − P_{t|T}(Θ_T)) ] − 1 → N(0, 2πf(0)/T), where
(ex ante) computation of measures of uncertainty associated with real-time estimates, and methods of evaluation that have a clear interpretation and indeed a growing pedigree. The clarity derives from defining what constitutes a “good”, or even “optimal” (in some respect), interval or density forecast.

The plan of the remainder of this paper is as follows. Section 2 illustrates the unreliability of real-time output gap point estimates in the Eurozone. Parameter uncertainty is found to be a dominant source of this unreliability. We then turn to whether we should be surprised by this unreliability. To this end, Section 3 considers how to measure the uncertainty associated with real-time estimates, and then how to evaluate these measures, much as point estimates are evaluated on the basis of their root mean squared error (RMSE) against the outturn. Section 4 then re-visits the simulated real-time application to the Eurozone considered in Section 2. It indicates, and then evaluates, the degree of uncertainty associated with output gap estimates in real-time. Section 5 offers some concluding comments and discusses the implications of the findings for policy-makers. Details of the data and models used are confined to appendices.
2 The unreliability of real-time output gap point estimates illustrated. An application to the Eurozone
Eurozone data are taken from the ECB's Area Wide Model (AWM) database; see Appendix A for more details.7 It is expedient in an application to the Eurozone to focus on statistical revisions alone, as construction of a ‘real-time’ data set is not readily possible. We do, however, make an attempt at constructing an “approximate” real-time data set for GDP; see Appendix A.1.8 This is meant merely to be suggestive, and in the main body of the paper attention is restricted to the AWM data, i.e. to statistical revisions alone, by considering what amounts to a ‘final’ data vintage.

To illustrate the unreliability of real-time output gap estimates the following experiment is undertaken. Full-sample or final estimates of the output gap are derived using data available over the (full) sample period, 1971q1-2003q1. Real-time output gap estimates are computed recursively from 1981q1. This involves using data from 1971q1-1981q1, to provide an initial estimation period of 10 years, to compute the real-time estimate for

(footnote 6 continued) f(0) is some consistent estimate of the spectral density of {(y^C_{t|T} − y^C_{t|t})^2 / (P_{t|t}(Θ_T) − P_{t|T}(Θ_T))}_{t=1}^{T} at frequency zero. In contrast, one of our tests for whether uncertainty is “correctly” accounted for by the forecast [see Section 3.3.2] requires the scaled revision errors (y^C_{t|T} − y^C_{t|t}) / √(P_{t|t}(Θ_t)) to be i.i.d. N(0,1).
7 Related studies that examine real-time output gap estimates for the Eurozone are Runstler (2002) and Camba-Mendez & Rodriguez-Palenzuela (2003). They ignore data revisions, and in fact use AWM data as in the majority of this paper.
8 Although data on other variables are used to help estimate the output gap, we should expect data revisions to them to be minor compared with those typically experienced by GDP.
1981q1.9 Then data for 1971q1-1981q2 are used to re-estimate the output gap (which involves re-estimating the parameters of the models used to measure the output gap) and obtain real-time estimates for 1981q2. This recursive exercise, designed to mimic real-time measurement of the output gap, is carried on until data for the period 1971q1-2000q1 are used to estimate the real-time output gap for 2000q1. The last 3 years are excluded from the real-time simulation to allow for the fact that real-time estimates take time to converge to their ‘final’ values.10 When the latest vintage of data is used throughout the experiment, what we call the real-time estimate is strictly the quasi-real estimate of Orphanides & van Norden (2002). As indicated, we also provide some tentative indication of the impact of data revisions by, from 1993q3, also de-trending using the data vintage available at the time rather than the final one. This is the first time data unreliability, albeit imperfectly proxied, has been considered in the context of Eurozone output gap estimation.

To give an indication of what the final output gap estimates (using the AWM data) look like, Figure 1 presents them for two univariate and five multivariate representative estimators. For a review of these estimators, and the specifications used, see Appendix B. Although the seven different estimates share the same shape, picking up the trough in the mid 1980s and the peak of the early 1990s, Figure 1 reminds us that inference is sensitive to measurement.11 The bivariate HP estimator produces cycles with a far greater, indeed implausible, amplitude than the other estimators; this perhaps illustrates the dangers of imposing, rather than estimating, model parameters. We can also see the end-of-sample problem; the different estimators in Figure 1 do not agree about the current position of the cycle.
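The recursive exercise and the two summary statistics used below (correlation and the noise-to-signal ratio) can be sketched as follows. This is an illustrative NumPy implementation using synthetic quarterly data and an HP filter with λ = 1600, not the AWM data or the paper's estimators; the data-generating process, sample sizes and seed are assumptions for illustration only.

```python
import numpy as np

def hp_trend(y, lam=1600.0):
    """Two-sided HP filter: solve (I + lam*K'K) tau = y for the trend tau."""
    n = len(y)
    K = np.zeros((n - 2, n))              # second-difference operator
    for i in range(n - 2):
        K[i, i], K[i, i + 1], K[i, i + 2] = 1.0, -2.0, 1.0
    return np.linalg.solve(np.eye(n) + lam * (K.T @ K), y)

rng = np.random.default_rng(0)
T = 129                                   # 1971q1-2003q1 is 129 quarters
# synthetic log GDP: deterministic trend plus a persistent AR(1) cycle
cycle = np.zeros(T)
for t in range(1, T):
    cycle[t] = 0.85 * cycle[t - 1] + rng.normal(scale=0.005)
log_y = 0.006 * np.arange(T) + cycle

final_gap = log_y - hp_trend(log_y)       # full-sample ("final") estimates

# recursive quasi-real estimates: end-point of the filter on data up to t
start = 40                                # 10-year initial estimation window
real_time = np.array([(log_y[:t + 1] - hp_trend(log_y[:t + 1]))[-1]
                      for t in range(start, T)])
final = final_gap[start:T]

corr = np.corrcoef(real_time, final)[0, 1]
nsr = np.sqrt(np.mean((real_time - final) ** 2)) / final.std()
print(f"correlation = {corr:.2f}, noise-to-signal = {nsr:.2f}")
```

The end-point of a one-sided pass through the data plays the role of the real-time estimate; re-running the filter as each new observation arrives mimics the recursive design described above.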
The real-time unreliability of output gap point estimates is summarised in Table 1 for the seven representative output gap estimators. Reflecting the fact that the first five estimators in Table 1 can be interpreted within an unobserved components (UC) framework [see Appendix B], and that UC models use the data in two ways, Table 1 examines not only the real-time estimates but also the filtered estimates (the quasi-final estimates in the parlance of Orphanides & van Norden (2002)).12 For the univariate HP filter, as the only parameter is chosen a priori, there is clearly no parameter uncertainty, meaning that the filtered and real-time estimates are equivalent. The reliability of the real-time and filtered point estimates is summarised by their correlation against the final estimates and by the noise-to-signal ratio (NSR). The NSR is the ratio of the RMSE of the revision (the difference between the real-time or filtered estimate and the final estimate) to the standard deviation of the final estimate of the output gap. To provide some indication of how sensitive results are to the chosen sample period, Table

9 We ignore the fact that GDP data are published with, at least, a one quarter lag.
10 Calculations in this paper were performed using the GAUSS and Ox [see Doornik (1998)] programming languages. Use was made of the SsfPack module for Ox; see Koopman et al. (1999).
11 Other studies of the Eurozone business cycle, such as Runstler (2002) and Artis et al. (2003), find similarly shaped cycles to Figure 1.
12 UC models use the data in two ways in the sense that first they estimate the parameters of the model, and secondly they use these estimates to obtain the smoothed estimates of the output gap that are the final estimates of the output gap.
1 contrasts the performance of the real-time and filtered estimates computed over the period 1981q1-2000q1 with those computed over the period 1993q3-2000q1. This latter period was chosen as it falls after the peak in economic activity in the early 1990s. The final column of Table 1 provides a tentative indication of the impact of data revisions on real-time output gap estimation: it presents the correlation of the real-time estimates, using real-time data vintages, against the final estimates using the final data vintage.13

Table 1 indicates that as the future becomes the present output gap estimates are revised. Real-time point estimates of the output gap in the Eurozone, as in the US, are unreliable, in the sense that there is a large and important revision error. This is reflected by the correlation coefficients, over the period 1981q1-2000q1, being in the range 0.27-0.76, and by the noise-to-signal ratio exceeding unity for four of the seven output gap estimators and exceeding 0.8 for the remaining estimators. Reflecting the importance of ex post information in re-defining the parameter values, the filtered estimates are more reliable than the real-time estimates: correlation is higher and the NSR lower. Parameter uncertainty appears to be a dominant source of the unreliability of real-time estimates. Interestingly, as in the US, data revisions do not appear to be a major source of revisions. The real-time estimates using real-time data were found to have similar properties to the real-time estimates that use final (AWM) data. This is reflected in the final column of Table 1, which indicates that use of the real-time data, rather than the final data considered in the sixth column, leads to estimates correlated similarly with the final estimates, with some variation across estimators and the sample period.

Comparing the estimators across the period 1981q1-2000q1, correlation ranges from 0.27 for the univariate HP measure to around 0.6-0.7 for most of the multivariate measures. In this sense it is encouraging that the move from univariate to multivariate measures of the output gap does lead to real-time estimates better correlated with the final estimates. Adding ‘economic information’ appears to help. However, the multivariate estimators have higher noise-to-signal ratios than the univariate UC estimator. It should be added that for the BQ SVAR estimator results were found to be extremely sensitive to the chosen lag order, p. Only with very high p could we obtain a priori plausible looking cycles. The results presented in Table 1 are for p = 16. When p = 1 the correlation coefficient increased from 0.153 to 0.9. Although similar results have been obtained by Camba-Mendez & Rodriguez-Palenzuela (2003), see their Figure 3, we believe this result is misleading. It is based on an implausible looking business cycle with, for example, an amplitude in the range -0.5% to 0.5%, rather than around -2.5% to 2.5% as is plausible and indeed typical for the other estimators. In general, SVAR estimates were found to be sensitive to the lag order chosen.

This unreliability of real-time point estimates also shows up over the period 1993q3-2000q1. Although the univariate estimators are more reliable over this later period than over 1981q1-2000q1, some of the multivariate estimators appear to be less reliable. This may

13 It should be noted that the filtered estimates using the real-time data set (not reported in Table 1) are not identical, as they should be, to those using the AWM data. They are, however, qualitatively similar. Due to its method of construction, detailed in Appendix A.1, the final column of the real-time data matrix for GDP is not identical to the AWM database.
reflect the fact that they do well at picking up the big movements, such as the boom in the early 1990s, but are worse at picking up more minor movements.

Table 1: The unreliability of real-time output gap point estimates. The correlation (r) of alternative real-time and filtered estimates of the Eurozone output gap against the final estimates and the noise-to-signal ratio (NSR)

                 AWM data 1981q1-2000q1        AWM data 1993q3-2000q1        real-time data 1993q3-2000q1
                 real-time      filtered       real-time      filtered       real-time
                 r      NSR     r      NSR     r      NSR     r      NSR     r
HP               0.273  1.43    0.273  1.43    0.875  1.17    0.875  1.17    0.896
Uni UC           0.600  0.82    0.838  0.64    0.904  0.57    0.898  0.71    0.863
Bi UC            0.599  0.87    0.941  0.56    0.864  0.70    0.923  0.58    0.894
Bi HP            0.688  3.31    0.727  1.35    0.585  3.81    0.599  1.87    0.579
Tri UC           0.760  1.79    0.921  0.57    0.352  2.15    0.795  1.07    0.309
SVAR: BQ         0.153  1.30    n/a    n/a     0.195  1.68    n/a    n/a     0.440
SVAR: KPSW       0.575  0.95    n/a    n/a     0.968  0.32    n/a    n/a     0.835
The findings of Table 1 raise the question: should we be surprised by this unreliability? To address it, we first take up the challenge of providing, in real-time, measures of uncertainty associated with real-time (point) estimates of the output gap, via interval and density forecasts. Secondly, we evaluate them to ascertain whether the unreliability of output gap point estimates is surprising.14 Could we have anticipated this revision error? In real-time, was the revision error to the real-time estimate within the bounds of what we could have predicted? It is important to evaluate whether these measures of uncertainty offer a reliable indication of the degree of unreliability associated with output gap estimates, as otherwise all that can be said is that the bands are wider for, say, output gap estimate A than for estimate B; nothing can be inferred about the appropriateness of the bands per se.
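One simple way to make the notion of an "appropriate" band operational is to check, ex post, the empirical coverage of the real-time intervals against their nominal level, and whether the scaled revisions look standard normal. The sketch below uses made-up numbers; in the paper's notation the reported standard errors would come from P_{t|t}(Θ_t) and the "outturns" are the final estimates.

```python
import numpy as np

def coverage(real_time, std_err, final):
    """Share of periods in which the final estimate fell inside the
    real-time 90% interval  real_time +/- 1.6449 * std_err."""
    z = 1.6449                        # two-sided Gaussian critical value, 90%
    inside = np.abs(final - real_time) <= z * std_err
    return inside.mean()

rng = np.random.default_rng(1)
n = 80
final = rng.normal(size=n)
# hypothetical real-time estimates whose reported standard error (0.5)
# understates the true revision volatility (0.8): bands should under-cover
real_time = final + rng.normal(scale=0.8, size=n)
std_err = np.full(n, 0.5)

print(f"empirical coverage of nominal 90% bands: "
      f"{coverage(real_time, std_err, final):.2f}")
# scaled revisions should be roughly i.i.d. N(0,1) if well calibrated
z_scores = (final - real_time) / std_err
print(f"scaled revisions: mean = {z_scores.mean():.2f}, "
      f"std = {z_scores.std():.2f}")
```

With understated standard errors the empirical coverage falls well short of the nominal 90%, which is exactly the kind of failure the evaluation methods of Section 3 are designed to detect.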
3 Uncertainty associated with output gap estimates: density estimates of the output gap
To capture fully the uncertainty associated with the real-time estimates, or forecasts, of the output gap, we construct density forecasts; see Tay & Wallis (2000) for a review

14 Although Orphanides & van Norden (2002), pp. 578-582, provide measures of uncertainty associated with their final (smoothed) and quasi-final (filtered) estimates, they do not present them for real-time estimates.
of density forecasting. Density forecasts of the output gap provide an estimate of the probability distribution of its possible future values. In contrast, so-called “interval” and “event” forecasts provide specific information on forecast uncertainty that can be derived from the density forecast; interval forecasts specify the probability that the actual outcome will fall within a given interval, while event forecasts focus on the probabilities of certain events, such as the probability of recession. Density forecasts of inflation in the U.K., for example, are now provided each quarter both by the Bank of England in its “fan” chart and by the National Institute of Economic and Social Research (NIESR) in its quarterly forecast.

Density forecasts inform the user about the risks involved in using the forecast for decision making. Indeed, interest may lie in the dispersion or tails of the density itself; for example, inflation targets often focus the attention of monetary authorities on the probability of future inflation falling outside some pre-defined target range, while users of growth forecasts may be concerned about the probability of recession. Moreover, volatility forecasts, as measured by the variance, and other measures of risk and uncertainty, can be extracted from the density forecast.

Questions then arise over how the density forecasts should be constructed. We consider two approaches. The first, more traditional, approach relies on a state-space representation for the output gap estimator. The second approach quantifies the degree of uncertainty associated with the fact that future values of the (log) level of output (and of other variables in multivariate measures of the output gap) are unknown but are known to affect real-time estimates. This is achieved simply by forecasting the future repeatedly via simulation techniques.
This second approach not only provides one means of measuring the uncertainty associated with real-time estimates but can also, in principle, improve the actual performance of real-time estimates. This is because the accuracy of real-time estimates is a function of how well future values of the underlying series can be forecast. Let us consider each of these approaches in turn before, in Section 3.3, considering how one can evaluate them.
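To fix ideas, given a Gaussian density forecast for the gap, the corresponding interval and event forecasts follow directly. A minimal sketch, where the point estimate and standard error are hypothetical numbers chosen for illustration:

```python
from scipy.stats import norm

mu, sigma = -0.4, 0.9     # hypothetical real-time gap estimate (%) and std error

# interval forecast: central 90% band implied by the Gaussian density
lo, hi = norm.interval(0.90, loc=mu, scale=sigma)

# event forecast: probability the economy is below trend (gap < 0)
p_below = norm.cdf(0.0, loc=mu, scale=sigma)

print(f"90% interval: [{lo:.2f}, {hi:.2f}]")
print(f"P(gap < 0) = {p_below:.2f}")
```

The same density thus answers both the "how wide are the bands?" question and event questions such as the probability that the economy is currently below trend.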
3.1 Measuring uncertainty using the Kalman filter
Traditionally, when using unobserved-components based measures of the output gap, two sources of uncertainty associated with the cyclical component of output are distinguished: filter and parameter uncertainty. Conditional on Gaussianity (of the disturbances driving the components of the state vector), confidence intervals around the output gap can then be presented given knowledge of the covariance matrix of the estimated state vector.15 This is the approach followed, for example, by Orphanides & van Norden (2002). For a review see Appendix C. As Orphanides & van Norden (2002) acknowledge, these measures of uncertainty capture neither the effects of data revision nor the greater parameter uncertainty associated with estimation in real-time using shorter samples.16 Nevertheless, it is important to evaluate how well the implied interval and density forecasts perform relative to the outturn (i.e. the final estimate). Using the notation of Appendix C, let P_{t|t} denote the covariance matrix of the output gap at time t, y^C_{t|t}, calculated at time t (the output gap is one of the elements of the state vector). Then, conditional on Gaussianity, confidence intervals can be derived, as can density estimates. The Gaussian density is N(y^C_{t|t}, P_{t|t}). Below we examine how to evaluate these interval and density forecasts once the final output gap estimate, y^C_{t|T} where T → ∞, has become available.17

15 The Kalman filter recursions automatically return estimates of the covariance matrix of the state vector; see Harvey (1989). The diagonal elements of these matrices can then be used to construct the confidence intervals and density estimates.
3.2 Measuring uncertainty by repeatedly forecasting the future
We also quantify the degree of uncertainty associated with the fact that future values of the (log) level of output (and of other variables in multivariate measures of the output gap) are unknown but are known to affect real-time estimates.18 Quantification is achieved simply by trying to capture the uncertainty about future values by forecasting them repeatedly, say R times, via simulation procedures that allow for parameter uncertainty in real-time estimates. An alternative is to look at the past revision error variance. Parameter uncertainty is expected, and indeed in Table 1 was revealed, to contribute significantly to the unreliability that may be associated with real-time estimates. Structural changes in the economy, for example, will cause parameter uncertainty. As structural changes often cannot be detected until some time afterwards, it is only with the arrival of additional data that parameter uncertainty is expected to decrease.

The cyclical component at time t is then derived by de-trending the observed data on the (log) level of output up to time t plus the forecasted data for periods (t + 1), (t + 2), ..., (t + h). Given the R sets of forecasts from (t + 1) to (t + h), R estimates of the output gap at time t can be computed. The dispersion of these estimates at time t provides an indication of the range of likely outcomes for the final output gap estimates, y^C_{t|T}, which can be computed once actual data from period (t + 1) to T are published. The R estimates of the output gap can be used to construct confidence intervals (around, say, the median estimate) and density estimates.

Questions then arise over how the density forecasts should be constructed via forecasting. What forecasting model should be used? Should this model change over time? How should we account for uncertainty? How far ahead should we forecast? The “optimal” real-time or one-sided filter, in the sense of delivering the minimum mean square revision

16 An alternative approach is adopted by Fabiani & Mestre (2001). Their measure of uncertainty is based on the use of stochastic simulation. Although they provide confidence intervals, these are not evaluated in any way, making it difficult to know what they measure and how they should be used.
17 We do not present measures of uncertainty associated with SVAR estimates. Note that since SVAR models can be interpreted within a state-space set-up, it should prove possible to define similar measures of uncertainty for the SVAR models as for the UC models considered in this paper. An alternative means of quantifying the uncertainty associated with SVAR output gap estimates would be to undertake a stochastic simulation. Experience suggests that this would deliver very wide confidence bands.
18 Similarly, seasonal adjustment filters usually are applied in real-time to the actual series extended by forecasts.
error, involves application of the two-sided filter to the actual data extended by optimal linear forecasts; for a discussion in the context of seasonal adjustment see Wallis (1982).19 These optimal, minimum mean square error, forecasts are derived via the Kalman filter, exploiting the state-space representation for a given output gap estimator.

Additionally, we experiment with forecasts derived from reduced-form, rather than structural, models. This is a more agnostic approach; indeed, forecasts from atheoretical models have often been found to outperform those from structural models; e.g. see Clements & Hendry (1999). Specifically, for univariate measures of the output gap, forecasts are produced from ARMA models. These can be rationalised as the reduced form of unobserved components models. For example, an ARIMA(0,2,2) model in the log-level of GDP corresponds to the reduced form of a local linear trend UC model; see Harvey (1989), p. 180. It is considered important to consider forecasts from more than one model, given that model uncertainty is an additional source of uncertainty associated with real-time estimates of the output gap.20 We focus on ARMA models estimated in the second difference of the log-level of output, say y_t. We experiment both with forecasting from ARMA models where the number of lags is chosen recursively using some information criterion and with using fixed lags chosen a priori. Results below are presented for an ARMA(0,2) model in the second differences of output.

For multivariate models of the output gap we require forecasts not just of the log-level of output (GDP) but also of the additional economic variables. The same procedure as documented for univariate models is followed, except that a VAR(p) model is considered. A VAR(p) model, with large p, can be seen as an approximation to an underlying VARMA process. In fact, we chose p using the Bayesian information criterion, which is known to select a parsimonious model.
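The forecast-extension idea can be sketched as follows: growth is forecast R times from a fitted autoregression (an AR(1) here, standing in for the paper's ARMA and state-space forecasts), redrawing the slope from its estimated sampling distribution each replication to allow for parameter uncertainty, and each extended series is then de-trended with a two-sided HP filter. All settings, including the synthetic data, are illustrative assumptions.

```python
import numpy as np

def hp_trend(y, lam=1600.0):
    """Two-sided HP filter trend."""
    n = len(y)
    K = np.zeros((n - 2, n))
    for i in range(n - 2):
        K[i, i], K[i, i + 1], K[i, i + 2] = 1.0, -2.0, 1.0
    return np.linalg.solve(np.eye(n) + lam * (K.T @ K), y)

rng = np.random.default_rng(3)
T, h, R = 100, 4, 500                 # sample size, horizon, replications

# synthetic log output (illustrative random walk with drift)
log_y = np.cumsum(0.005 + 0.008 * rng.standard_normal(T)) + 10.0

# fit AR(1) to growth by OLS; se_b is the slope's estimated standard error
g = np.diff(log_y)
x, z = g[:-1] - g.mean(), g[1:] - g.mean()
b = (x @ z) / (x @ x)
resid = z - b * x
se_b = np.sqrt(resid.var() / (x @ x))

gaps_at_T = np.empty(R)
for r in range(R):
    b_r = b + se_b * rng.standard_normal()      # parameter uncertainty
    g_path, last = [], g[-1]
    for _ in range(h):                          # simulate future growth
        last = g.mean() + b_r * (last - g.mean()) + rng.choice(resid)
        g_path.append(last)
    extended = np.concatenate([log_y, log_y[-1] + np.cumsum(g_path)])
    gaps_at_T[r] = (extended - hp_trend(extended))[T - 1]   # gap at time T

lo, hi = np.percentile(gaps_at_T, [5, 95])
print(f"90% band for the time-T gap: [{lo:.4f}, {hi:.4f}]")
```

The dispersion of the R end-point estimates is exactly the object described in the text: it indicates the range of likely outcomes for the final estimate, and its percentiles provide the interval forecast.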
However, parsimonious models are potentially more robust to unforeseen structural breaks; e.g. see Clements & Hendry (1999). The impact of parameter uncertainty on the forecasts is then quantified as follows.21 For forecasts produced from the structural model a Monte Carlo integration is carried out along the lines of Hamilton (1986). This involves taking a Bayesian perspective: the vector of parameters in the model, Θ, is assumed to be a random variable, which is distributed b V (Θ)), where V (Θ) is the variance of Θ obtained from the inverse of approximately N (Θ, b 22 Then the negative Hessian matrix evaluated at the maximum likelihood estimates, Θ. 19
See also van Norden (2002) who in the context of business cycle analysis neatly shows that extending the series {yt} with optimal forecasts prior to de-trending can be seen as equivalent to deriving the optimal one-sided filter, in terms of minimising
$$ E\left[\left(\sum_{j=0}^{T-1}\hat{B}_j y_{T-j} - \sum_{j=-\infty}^{\infty}B_j y_{T-j}\right)^2\right] $$
with respect to $\hat{B}_j$, where $B_j$ are the optimal "ideal" weights given as $B_j = \frac{\sin ju - \sin jl}{\pi j}$ for $|j| \geq 1$ and $B_j = \frac{u-l}{\pi}$ for $j = 0$, where u and l are the bands outside of which the spectral gain is zero, and inside of which the gain is unity.
20 In fact, model uncertainty, in addition to parameter and data uncertainty, are the three sources of uncertainty affecting output gap estimates distinguished by the ECB in its October 2000 bulletin.
21 Alternatively, one could avoid our simulation based techniques by allowing for parameter uncertainty analytically following, for example, Quenneville & Singh (2000).
22 In fact, the variance is obtained from the inverse of minus the numerical second derivatives as this is more reliable; see Doornik (2001), p. 250.
a large number of values of $\Theta$ are drawn from the $N(\hat{\Theta}, V(\hat{\Theta}))$ distribution. This is done j = 1, 2, ..., 2000 times. For each draw optimal forecasts are obtained from the state-space model.23 For forecasts produced from reduced-form ARMA or VAR models we use the bootstrap procedure of Pascual et al. (2001) to study the impact of parameter uncertainty on prediction densities. Even having decided upon a forecasting model, or perhaps a collection of models, how far ahead should one forecast? Results will undoubtedly be sensitive to this choice. If one were concerned only with forecasting, or perhaps we should say "now"-casting, current GDP, since it is available with a one quarter lag, so-called FLASH estimators could be used that exploit, for example, industrial production data, since these are published within the quarter. We, however, consider forecasting further ahead. We limit discussion to forecasts four quarters ahead. An alternative is to choose the forecast horizon with reference to how far ahead the chosen filter looks so that a two-sided filter can be applied even at the end of the sample of 'hard' data; see Harvey & Koopman (2000) and Koopman & Harvey (2003).

3.2.1 Can forecasting also deliver improved real-time (point) estimates?
Forecasting not only provides one means of measuring the uncertainty associated with real-time estimates but may help us improve the accuracy of our real-time point estimates. This is because the accuracy of real-time estimates is a function of how well future values of the underlying series can be forecast; see Appendix C.2 for details. The better one can forecast (in real-time) the future values of the (log) level of output (and other variables in multivariate measures of the output gap) the less the revision. Indeed, if one could forecast perfectly there would, of course, be no revision at all. The potential advantages of forecasting can be appreciated in another, related, way. Forecasting future values prior to de-trending facilitates use of a less one-sided de-trending filter; again for a discussion in the context of seasonal adjustment see Wallis (1982).24 Detrending filters, whether parametric or nonparametric, can be seen to involve application of a moving-average filter to the raw data; e.g. see Harvey & Koopman (2000) and Koopman & Harvey (2003) who present algorithms for deriving the weights for parametric (UC) models. Most de-trending filters, such as the HP filter, imply a smooth two-sided moving average in the middle of the sample with no time-series observation receiving a large weight relative to its close neighbours. However, at the end of the sample the moving average becomes one-sided. It is well known that application of a one-sided filter will lead to more volatile estimates at the end of the sample as the weight on the last observation will be much higher than any of the weights associated with application of the filter in the middle of the sample; e.g. see van Norden (1997). Forecasting prior to de-trending can 23
We also experimented with an alternative Monte-Carlo bootstrap approach suggested by Stoffer & Wall (1991).
24 In fact, as the forecasts are a linear combination of the observed data, the filter applied is still one-sided in the original data. As indicated above, the optimal one-sided filter, in the sense of delivering the minimum mean square revision error, involves application of the two-sided filter to the actual data extended by optimal linear forecasts; see Wallis (1982).
reduce the impact of the last observation. Naturally, if the future is forecast incorrectly this may not deliver more reliable output gap estimates. Re-visiting the real-time experiment in Section 2, and using the AWM dataset, we provide an indication of whether forecasting prior to de-trending can help deliver more accurate estimates in real-time. We focus on the behaviour of the median estimate across 499 bootstrap replications. We consider extending the series in real-time both with optimal forecasts and with reduced-form forecasts. The results are presented in Table 2. For convenience the second column of Table 2 reminds us of the performance of the point estimates considered in Table 1. Table 2 indicates that forecasting ahead in real-time, prior to de-trending, can help - apparently particularly so when reduced-form forecasts are used to extend the series in real-time. The correlation against the final estimates is often higher for the forecasted output gap estimates than the real-time estimates. Clearly these results are merely suggestive since they are sensitive to choices about the forecasting model, forecast horizon etc.25 However, they suggest that there is some value in trying to forecast ahead prior to de-trending. Table 2: The correlation of real-time estimates of the Eurozone output gap with the final estimate when either optimal or atheoretical, reduced-form, forecasts are used to extend the series. Correlations are presented over the period 1981q1-2000q1.
                                     real-time   optimal forecast   reduced-form forecast
Hodrick-Prescott                       0.273          n/a                  0.284
Univariate Unobserved Components       0.600          0.347                0.633
Bivariate Unobserved Components        0.599          0.446                0.705
Bivariate Hodrick-Prescott             0.688          0.617                0.527
Trivariate Unobserved Components       0.760          0.749                0.343

3.3 Evaluation of interval and density forecasts of the output gap
Interval and density forecasts of the output gap are evaluated ex post with respect to the outturn, i.e. the final estimates for the output gap. Since the final estimates are themselves estimated, and not known with certainty, we consider evaluation of the real-time estimates not just against the final (point) estimate, as is traditional, but against the density of the final estimates. Specifically, we consider the impact of parameter uncertainty on the final estimates. Following the discussion above, the density estimate of the final estimate 25
For the HP filter we did experiment with forecasting 25 periods ahead based on the view that this is roughly the length of the filter implied by HP. The correlation of the real-time and final estimates fell to 0.230.
is constructed by taking repeated draws from the distribution of (full-sample) parameter estimates. Then for each draw smoothed estimates of the output gap are computed and used as the basis for the interval and density evaluation tests considered below. Across the draws we compute the mean and variance of the p-value of the evaluation test, plus the proportion of times the p-value > 0.05. Let us now turn to the interval and density evaluation tests.

3.3.1 Evaluation of interval forecasts
Interval forecasts are evaluated using the LR tests of Christoffersen (1998). The asymptotically equivalent Pearson chi-squared tests of Wallis (2003) are also considered.26 A 'good' interval forecast should have correct conditional coverage, so that in volatile periods the interval is wider than in less volatile periods. Define $I_t$ as an indicator variable that takes the value 1 if the outcome falls within the interval forecast at time t, and the value 0 otherwise. Consider an interval forecast for coverage probability p. Then Christoffersen (1998) defines a set of ex ante forecasts as being "efficient" with respect to the information set (say, $\Omega_{t-1}$) if $E(I_t \mid \Omega_{t-1}) = p$. If $\Omega_{t-1} = \{I_{t-1}, I_{t-2}, ...\}$ then this implies $\{I_t\}$ is i.i.d. Bernoulli with parameter p. A test for unconditional coverage, ignoring independence, is then defined as the test for whether actual coverage equals nominal: i.e. $E(I_t) = p$. The LR test is:
$$ LR_{uc} = -2\log\frac{(1-p)^{n_0}\,p^{n_1}}{(1-\hat{\pi})^{n_0}\,\hat{\pi}^{n_1}} \sim \chi^2(1), \qquad (1) $$
where $n_1 = \sum_{j=1}^{n} I_j$, $n_0 = n - n_1$, $\hat{\pi} = n_1/(n_0+n_1)$ is the proportion of successes in the sample and $\hat{\pi} \xrightarrow{p} \pi$. See Wallis (2003) for details of the derivation of the asymptotically equivalent Pearson chi-squared version of this test. One advantage of the Pearson chi-squared test is that the exact distribution of the test statistic can be derived.27 (1) tests whether the coverage is correct but does not have power against the alternative that the 0's and 1's come clustered together in a time-dependent fashion. To remedy this omission, (1) is supplemented with a test for independence. Define the first-order Markov chain:
$$ \Pi_1 = \begin{pmatrix} 1-\pi_{01} & \pi_{01} \\ 1-\pi_{11} & \pi_{11} \end{pmatrix}, \qquad (2) $$
where $\pi_{ij} = P(I_t = j \mid I_{t-1} = i)$. Under independence (2) equals
$$ \Pi_1 = \begin{pmatrix} 1-\pi_1 & \pi_1 \\ 1-\pi_1 & \pi_1 \end{pmatrix}. \qquad (3) $$
26
An alternative regression based approach for evaluating interval forecasts is proposed by Clements & Taylor (2003). This approach allows for more general dependence structures than the first order Markov chain process considered below. 27 In fact, we adopt an alternative second-best solution and use the Yates continuity correction.
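The unconditional coverage, independence and conditional coverage LR tests of this section can be sketched numerically as follows. This is our own illustrative implementation operating on a 0/1 'hit' sequence; the function name is ours, and scipy's chi-squared distribution is assumed for the p-values.

```python
import numpy as np
from scipy.stats import chi2

def christoffersen_tests(hits, p):
    """LR tests of Christoffersen (1998) for a 0/1 interval 'hit' sequence.

    hits[t] = 1 if the outturn fell inside the interval forecast at time t;
    p is the nominal coverage probability. Returns each statistic with its
    asymptotic chi-squared p-value.
    """
    hits = np.asarray(hits, dtype=int)
    n = len(hits)
    n1 = int(hits.sum())
    n0 = n - n1
    pi_hat = n1 / n

    def loglik(q, a, b):
        # Bernoulli log likelihood: a failures, b successes, success prob q.
        q = min(max(q, 1e-12), 1 - 1e-12)   # guard against log(0)
        return a * np.log(1 - q) + b * np.log(q)

    # Unconditional coverage, eq. (1): nominal p vs estimated pi_hat.
    lr_uc = -2 * (loglik(p, n0, n1) - loglik(pi_hat, n0, n1))

    # Transition counts n_ij for the first-order Markov chain, eq. (2).
    nij = np.zeros((2, 2))
    for prev, cur in zip(hits[:-1], hits[1:]):
        nij[prev, cur] += 1
    pi01 = nij[0, 1] / max(nij[0].sum(), 1)
    pi11 = nij[1, 1] / max(nij[1].sum(), 1)
    pi1 = (nij[0, 1] + nij[1, 1]) / nij.sum()

    # Independence test, eq. (4): common pi1 vs state-dependent pi01, pi11.
    lr_ind = -2 * (loglik(pi1, nij[0, 0] + nij[1, 0], nij[0, 1] + nij[1, 1])
                   - loglik(pi01, nij[0, 0], nij[0, 1])
                   - loglik(pi11, nij[1, 0], nij[1, 1]))

    lr_cc = lr_uc + lr_ind                  # conditional coverage, 2 df
    return {"LR_uc": (lr_uc, chi2.sf(lr_uc, 1)),
            "LR_ind": (lr_ind, chi2.sf(lr_ind, 1)),
            "LR_cc": (lr_cc, chi2.sf(lr_cc, 2))}
```

A hit sequence whose sample coverage equals the nominal p gives an LR_uc of zero; clustering of the 0's is picked up by LR_ind even when coverage is correct.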
A LR test for independence (note that in fact only second moments are being considered) is then given by:
$$ LR_{ind} = -2\log\frac{(1-\hat{\pi}_1)^{n_{00}+n_{10}}\,\hat{\pi}_1^{\,n_{01}+n_{11}}}{(1-\hat{\pi}_{01})^{n_{00}}\,\hat{\pi}_{01}^{\,n_{01}}\,(1-\hat{\pi}_{11})^{n_{10}}\,\hat{\pi}_{11}^{\,n_{11}}} \sim \chi^2(1), \qquad (4) $$
where $n_{ij}$ is the number of times event i is followed by event j. Again see Wallis (2003) for details of the derivation of the asymptotically equivalent Pearson chi-squared version of this test. A joint test for correct conditional coverage can then be defined as $LR_{cc} = LR_{uc} + LR_{ind}$. See Wallis (2003) for the Pearson chi-squared version of this test.

3.3.2 Evaluation of density forecasts
Density forecasts are evaluated in two complementary ways. The first approach reduces the density forecast to an interval forecast and uses the LR and chi-squared tests considered in Section 3.3.1; see Wallis (2003). We focus on the central 50%, or inter-quartile-range, forecast implied by the density forecast. Secondly, density forecasts are evaluated ex post using the probability integral transform; see Diebold et al. (1998).28 They popularised the idea of evaluating a sample of density forecasts based on the idea that a density forecast can be considered "optimal" if the model for the density is correctly specified. One can then evaluate forecasts without the need to specify a loss function. This is attractive as it is often hard to define an appropriate general (economic) loss function. These tests evaluate the whole density. Sometimes interest may lie in a particular region of the density, such as the probability of a recession. In these cases it is sensible to evaluate just the event of concern rather than the whole density; for an application see Clements (2002b). A sequence of estimated density forecasts, $\{p_t(y_t^C)\}_{t=1}^{T}$, for the realisations of the process $\{y_t^C\}_{t=1}^{T}$, coincides with the true densities $\{f_t(y_t^C)\}_{t=1}^{T}$ when the sequence of $z_t$ is independently and identically distributed (i.i.d.) with a uniform distribution, U(0,1), where
$$ z_t = \int_{-\infty}^{y_t^C} p_t(u)\,du, \qquad t = 1, \ldots, T. \qquad (5) $$
Therefore, to test whether the density forecasts are optimal and do capture all aspects of the distribution of $y_t^C$ one must test whether the $z_t$ are both i.i.d. and U(0,1). Departures from i.i.d. U(0,1) reveal useful information about model failures. Deviations from uniform i.i.d. indicate that the forecasts have failed to capture some aspect of the underlying data generating process. For example, serial correlation in the z-sequence (in 28
This methodology seeks to obtain the most “accurate” density forecast, in a statistical sense. It can be contrasted with economic approaches to evaluating forecasts that evaluate forecasts in terms of their implied economic value, which derives from postulating a specific (economic) loss function; see Granger & Pesaran (2000) and Clements (2002a).
the first moment, squares, third powers etc.) would indicate poorly modelled dynamics, whereas non-uniformity indicates improper distributional assumptions or poorly modelled dynamics, or both. By taking the inverse normal cumulative density function (CDF) transformation of the $\{z_t\}$ to give, say, $\{z_t^*\}$ the test for uniformity can be considered equivalent to one for normality on $\{z_t^*\}$; see Berkowitz (2001). This is useful as normality tests are widely seen to be more powerful than uniformity tests. For example, consider the Gaussian density forecasts of the output gap computed in real-time, $N(y^C_{t|t}, P_{t|t})$. The probability integral transforms are $z_t = \Phi\left((y^C_{t|T\to\infty} - y^C_{t|t})/\sqrt{P_{t|t}}\right)$. The Berkowitz series $\{z_t^*\}$ are then the scaled revision errors, $(y^C_{t|T\to\infty} - y^C_{t|t})/\sqrt{P_{t|t}}$, that are i.i.d. N(0,1) under the null. However, testing is complicated by the fact that the impact of dependence on the tests for uniformity/normality is unknown, as is the impact of non-uniformity/normality on tests for dependence.
To test for normality of the $\{z_t^*\}_{t=1}^{T^*}$ series we employ the Doornik-Hansen test; see Doornik & Hansen (1994). To test for independence of the $\{z_t^*\}$ series we use the Ljung-Box test for auto-correlation; see Harvey (1989), p. 259. Since dependence may occur in higher moments we consider $(z_t^* - \bar{z}^*)^j$ for j = 1, 2, 3.29 The joint hypothesis of a zero mean and a variance of unity for $\{z_t^*\}$ is tested by computing p-values for the statistic D using the standard normal CDF, where
$$ D = \left(\frac{1}{T^*}\sum_{t=1}^{T^*} (z_t^*)^2 - 1\right)\left(\sigma/\sqrt{T^*}\right)^{-1} $$
and $\sigma$ is an estimator for the standard deviation of $\{(z_t^*)^2\}$. One can robustify against serial correlation (and heteroscedasticity) by using a HAC estimator for this standard deviation. If there is no dependence in the $\{z_t^*\}$ one should expect the robust and non-robust versions of the test to be equal aside from sampling variability (and any impact of heteroscedasticity).
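These computations can be sketched as follows for the Gaussian case discussed above: the probability integral transforms and their inverse-normal transformation, the D statistic with a non-robust or HAC standard deviation (Newey-West weights are our assumption), and a hand-rolled Ljung-Box statistic that can be applied to powers of z*. All function names are ours.

```python
import numpy as np
from scipy.stats import chi2, norm

def pit_zstar(outturn, mean, var):
    """Probability integral transforms of Gaussian density forecasts and
    their inverse-normal (Berkowitz) transformation z*."""
    z = norm.cdf((outturn - mean) / np.sqrt(var))
    z = np.clip(z, 1e-4, 1 - 1e-4)   # guard against z exactly 0 or 1
    return norm.ppf(z)

def d_statistic(zstar, hac_lags=0):
    """D = (mean(z*^2) - 1) / (sigma / sqrt(T)), two-sided normal p-value.
    hac_lags > 0 uses a Newey-West (Bartlett kernel) estimate of sigma^2."""
    s = zstar ** 2
    T = len(s)
    u = s - s.mean()
    var = (u @ u) / T                      # gamma_0: non-robust variance
    for k in range(1, hac_lags + 1):       # add weighted autocovariances
        gk = (u[k:] @ u[:-k]) / T
        var += 2 * (1 - k / (hac_lags + 1)) * gk
    D = (s.mean() - 1) / np.sqrt(var / T)
    return D, 2 * norm.sf(abs(D))

def ljung_box(x, m=8):
    """Ljung-Box Q statistic on x; apply to (z* - mean)**j, j = 1, 2, 3,
    to check for dependence in higher moments."""
    x = np.asarray(x) - np.mean(x)
    T = len(x)
    denom = x @ x
    Q = 0.0
    for k in range(1, m + 1):
        rho = (x[k:] @ x[:-k]) / denom
        Q += rho ** 2 / (T - k)
    Q *= T * (T + 2)
    return Q, chi2.sf(Q, m)
```

Comparing `d_statistic(zstar, 0)` with a HAC version such as `d_statistic(zstar, 4)` mimics the robust/non-robust comparison of the text: the two should differ only through sampling variability if the z* series is independent.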
4 The uncertainty of real-time output gap estimates in the Eurozone: interval and density forecasts
Consider now the two alternative measures of uncertainty, computed in real-time, associated with the real-time point estimates. These are based upon the Gaussian density, $N(y^C_{t|t}, P_{t|t})$, and the simulated density, derived by forecasting four quarters ahead 499 times for each recursive sample. Let us denote the Gaussian density, G, and the simulated density, S. Note that for G we allow only for filter uncertainty; parameter uncertainty
29
Alternatively use could be made of recently proposed joint tests for uniformity and independence; see Hong (2002). Care should be exercised in interpreting results from density evaluation tests. Clements et al. (2003) have found, using Monte-Carlo experiments, that traditional density evaluation tests are not useful in sample sizes typical of macroeconomics; specifically, they found that the tests failed to detect non-linearities present in the data-generating process. Graphical analysis, using P and Q plots, can also be useful in explaining why density forecasts have failed.
could be added in too. We should expect this to add most to uncertainty for those measures of the output gap that estimate most parameters, namely the multivariate measures. In fact, as we shall see, in general, filter uncertainty alone appears to over-estimate the degree of uncertainty and leads to confidence bands that are too wide. Figures 2-6 provide a visual indication of the degree of estimated real-time uncertainty for the UC based estimators of the output gap by computing 95% confidence intervals around the median and point estimates, respectively, for the simulated and Gaussian measures of uncertainty. There are striking differences in the degree of uncertainty (predicted in real-time) associated with the real-time estimates across the alternative estimators of the output gap. As is to be expected, uncertainty is lowest for the Hodrick-Prescott estimator. It is largest for the bivariate Hodrick-Prescott estimator, where the degree of uncertainty is incredibly large. This reflects the imposition of a priori parametric restrictions leading to a poorly defined model, in a statistical sense, where there is considerable uncertainty about the values of the remaining free parameters. With perhaps the sole exception of the trivariate UC estimator (and even then only at certain points in time) the Gaussian confidence bands, derived from the Kalman filter recursions, are wider than those derived via simulation. Indeed, the Gaussian bands are very rarely significantly different from zero; policy-makers on this basis could never be sure about the position of the business cycle. The simulated confidence intervals are narrower and more often, although still not often, do suggest that the output gap is statistically significant. Given these differences between the two alternative estimates of real-time uncertainty it is important to ask: which measure of uncertainty, if any, is better?
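For concreteness, the two kinds of bands being contrasted can be constructed as in the following sketch. The equal-tailed percentile method for the simulated density is our assumption, one choice among several, and the function names are ours.

```python
import numpy as np
from scipy.stats import norm

def gaussian_band(point, p_var, coverage=0.95):
    """Symmetric band from the Kalman-filter Gaussian density N(point, p_var)."""
    z = norm.ppf(0.5 + coverage / 2)       # e.g. 1.96 for 95% coverage
    half = z * np.sqrt(p_var)
    return point - half, point + half

def simulated_band(draws, coverage=0.95):
    """Equal-tailed percentile band from simulated output-gap draws
    (e.g. 499 bootstrap/Monte Carlo replications per period)."""
    lo = np.percentile(draws, 100 * (1 - coverage) / 2, axis=0)
    hi = np.percentile(draws, 100 * (1 + coverage) / 2, axis=0)
    return lo, hi
```

Unlike the Gaussian band, the percentile band need not be symmetric about the point estimate, which is one reason the two measures can disagree about whether zero is covered.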
Does the finding that the Gaussian bands nearly always cover zero in fact correctly quantify the degree of uncertainty associated with real-time estimates? We analyse this formally using the statistical tests considered in Section 3.3. First, however, it is useful simply to contrast the confidence bands for each measure with the actual outturn (printed in bold face) by looking again at Figures 2-6. There are again important differences across the alternative output gap estimators. The outturn frequently falls outside both the simulated and Gaussian 95% confidence bands for the Hodrick-Prescott estimator. For the univariate and bivariate estimators, the simulated bands again, but in contrast to the Gaussian ones, fail to pick up the boom of the early 1990s, with the outturn falling well above the predicted 95% upper band. Reflecting the larger degree of uncertainty associated with the bivariate Hodrick-Prescott and trivariate UC cycles, it is only for these estimators that the outturn generally falls inside the 95% confidence bands. However, it does appear that in the less volatile period in the aftermath of the boom of the early 1990s the confidence bands across the different estimators do a better job at 'covering' the outturn. Tables 3-11 complement these figures by providing the results of the formal evaluation tests for the predicted measures of uncertainty.30 Interval and density evaluation tests are performed both over the periods 1981q1-2000q1 and 1993q3-2000q1, for both the S
30
On occasion for the simulated measure of uncertainty, usually for the univariate estimators, some of the {zt } are exactly 0 or 1. The inverse normal CDF transformation is then taken, and {zt∗ } derived, by either adding or subtracting 0.0001 to 0 or 1, respectively.
and G measures of uncertainty. For all estimators apart from Hodrick-Prescott, where there is no parameter uncertainty, tests are also performed against the outturn allowing for parameter uncertainty. In no cases do either the S or G measures of uncertainty provide a statistically satisfactory indication of the degree of uncertainty associated with real-time output gap estimates when evaluated against the point estimate for the outturn. There is evidence to suggest, however, that the measures of uncertainty are less unreliable over the latter period, 1993q3-2000q1: the p-values tend to be higher. But even over this latter period at least one of the tests rejects the reliability of the particular measure of uncertainty. Reliability of the predicted degree of unreliability associated with real-time output gap estimates requires both the density and interval evaluation tests not to be rejected. Looking at the interval evaluation tests alone is insufficient; for example, over the period 1993q3-2000q1 the simulated measure of uncertainty for the univariate UC estimator does not fail the interval evaluation tests - the 95% and central 50% confidence bands offer a reliable indication of the degree of uncertainty. However, since the estimated density is rejected we know that not all possible confidence intervals are satisfactory. Of course, inference about the interval and density forecasts is complicated by the fact that the different statistical tests central to the evaluation tests will have different small-sample properties. Some of the tests may be more powerful than others. It is possible that a lack of consensus across the alternative statistics about the reliability of the interval and density forecasts reflects not a real disagreement but the statistical properties of the tests. The most common reason for failure of the density forecasts is serial dependence in the first moment of $\{z_t^*\}$.
This is consistent with our finding of considerable persistence in the (scaled) revisions. Evidence for dependence is supported further by examining the robust and non-robust tests for the joint hypothesis of a zero mean and a variance of unity for $\{z_t^*\}$; see Table 12. Table 12 suggests there are often differences between the robust and non-robust versions of this test, which we should not expect if there were independence in $\{z_t^*\}$. In fact, making the interval and density forecasts look yet more deficient, for certain output gap estimators this joint null hypothesis is rejected for the Gaussian measure of uncertainty even when allowing for dependence (that itself should not be present under the null that the density forecast is correct). This is explained by the greater degree of uncertainty suggested by the Gaussian measure leading to $\{z_t^*\}$ with too low a variance - so although the mean is insignificantly different from zero the variance is significantly different from unity. Therefore it appears that not only are real-time point estimates of the output gap unreliable, as indicated in Table 1, but so are measures of uncertainty associated with them. However, a less pessimistic picture is painted when over the period 1993q3-2000q1 the S and G measures of uncertainty are evaluated not just against the final (point) estimates, but against a density for the final estimate that allows for uncertainty about the outturn. Unsurprisingly there is improvement in the reliability of the interval and density forecasts. This is so especially for the simulated measure, where there is always some evidence, across the interval and density evaluation tests, to suggest that the predicted degree of uncertainty is satisfactory: both the mean (across 499 realisations for the outturn) of the p-values and the proportion
of times the p-value is greater than 5% rise above zero, even for the serial dependence test in the first moment of {zt∗ }: LB1. Interestingly, this is so even for those estimators of the output gap that do not offer as reliable point estimates, as evaluated in Table 1. However, it is perhaps little cause for celebration for policy-makers that analysts can only provide them with reliable interval and density forecasts of the output gap when the target (the final output gap they are trying to forecast) is itself acknowledged to be uncertain. Alternative measures of uncertainty, of course, may do better than the two considered here. Future work should consider other ways of measuring the uncertainty associated with real-time estimates with the aim of finding the ‘correct’ density estimate.
5 Concluding comments
This paper stresses the distinction between point estimates of the output gap and measures of uncertainty associated with them. In an application to the Eurozone our results indicate that not only are real-time point estimates of the output gap unreliable, consistent with previous research for the US, but so in general are their measures of uncertainty. This provides a serious challenge to users of output gap estimates. Users must decide what to do given that there is real-time mismeasurement not just of the output gap (point) estimates but their uncertainty too. There is some indication that interval and density forecasts of the output gap, computed in real-time, are more reliable in the aftermath of the boom in the Eurozone in the early 1990s. This perhaps suggests that only conditional on correct identification of a turning point, and without the possibility of pronounced cyclical behaviour in the future, can reliable interval and density forecasts of the output gap be obtained in real-time. Certainly more work is required to find the 'correct' density forecast. Notwithstanding the poor performance of the interval and density forecasts in our simulated out-of-sample application to the Eurozone we suggest, in contrast to current practice, that analysts when estimating the output gap in real-time should also routinely indicate the degree of confidence in their estimates via interval and density forecasts. Even if the predicted degree of uncertainty proves to be incorrect ex post, at least the policy-maker has been warned ex ante of the dangers associated with the real-time point output gap estimates. Our results for some representative univariate and multivariate output gap estimators suggest that typically the predicted degree of uncertainty associated with the point estimates will be so large that it is impossible to reject the hypothesis that the output gap is insignificantly different from zero.
The findings also point to obvious problems in defining fiscal rules with reference to the economic cycle. Performance against a rule which requires the current budget to be balanced “over the cycle” cannot be assessed for several years. It is likely that the true position can be established only after at least two or three years. Similarly reforming the Stability and Growth Pact to take account of the cycle, as has been recently suggested in many circles, runs into obvious problems since the cycle cannot be measured accurately on a timely basis. Moreover, preliminary work we have undertaken suggests that real-time
output gap estimates often have little forecasting power over inflation relative to simple autoregressive alternatives; the estimates are therefore of limited use to monetary policy makers. But this does not appear to be due to the unreliability of real-time output gap estimates but rather the difficulties of forecasting inflation per se. This is consistent with the findings of Orphanides & van Norden (2003) for the US and Canova (2002) for the G-7; it is the instability of the parameters of Phillips Curve equations that helps explain the poor forecasting performance.
6 Bibliography
References
Altissimo, A., Bassanetti, A., Cristadoro, R., Forni, M., Lippi, M., Hallin, M., Reichlin, L. & Veronese, G. (2001), EUROCOIN: A real time coincident indicator of the Euro area business cycle. CEPR Discussion Paper Series No. 3108.
Apel, M. & Jansson, P. (1999), ‘A theory-consistent approach for estimating potential output and the NAIRU’, Economics Letters 64, 271–275.
Artis, M., Marcellino, M. & Proietti, T. (2003), ‘Dating the Euro area business cycle’, CEPR Discussion Paper No. 3696.
Astley, M. & Yates, T. (1999), Inflation and real disequilibria. Bank of England Working Paper.
Barrell, R. & Sefton, J. (1995), ‘Output gaps, some evidence from the UK, France and Germany’, National Institute Economic Review 151, 65–73.
Berkowitz, J. (2001), ‘Testing density forecasts, with applications to risk management’, Journal of Business and Economic Statistics 19, 465–474.
Beyer, A., Doornik, J. & Hendry, D. (2001), ‘Constructing historical Euro-zone data’, Economic Journal 111, 308–327.
Blanchard, O. & Quah, D. (1989), ‘The dynamics of aggregate demand and supply disturbances’, American Economic Review 79, 655–673.
Boone, L. (2000), ‘Comparing semi-structural methods to estimate unobserved variables: the HPMV and Kalman filters approaches’. OECD Economics Department Working Papers No. 240.
Bullard, J. & Keating, J. (1995), ‘The long-run relationship between inflation and output in postwar economies’, Journal of Monetary Economics 36, 477–496.
Camba-Mendez, G. & Rodriguez-Palenzuela, D. (2003), ‘Assessment criteria for output gap estimates’, Economic Modelling 20, 529–562.
Canova, F. (1998), ‘Detrending and business cycle facts’, Journal of Monetary Economics 41(3), 475–512.
Canova, F. (2002), G-7 inflation forecasts. Mimeo, Universitat Pompeu Fabra.
Chagny, O. & Lemoine, M. (2002), The impact of the macroeconomic hypothesis on the estimation of the output gap using a multivariate Hodrick-Prescott filter: the case of the euro-area. Eurostat Working Paper.
Christoffersen, P. (1998), ‘Evaluating interval forecasts’, International Economic Review 39, 841–862.
Clements, M. (2002a), ‘Evaluating the Bank of England density forecasts of inflation’, Warwick University Discussion Paper.
Clements, M. (2002b), ‘An evaluation of the survey of professional forecasters probability distributions of expected inflation and output growth’, Warwick University Discussion Paper.
Clements, M., Franses, P., Smith, J. & van Dijk, D. (2003), ‘On SETAR non-linearity and forecasting’, Journal of Forecasting 22, 359–375.
Clements, M. P. & Hendry, D. F. (1999), Forecasting Non-stationary Economic Time Series, The MIT Press, London, England.
Clements, M. & Taylor, N. (2003), ‘Evaluating interval forecasts of high-frequency financial data’, Journal of Applied Econometrics 18, 445–456.
DeSerres, A. & Guay, A. (1995), Selection of the truncation lag in structural VARs (or VECMs) with long run restrictions. Bank of Canada Working Paper 95-9.
Diebold, F. X., Gunther, A. & Tay, K. (1998), ‘Evaluating density forecasts with application to financial risk management’, International Economic Review 39, 863–883.
Diebold, F. X. & Mariano, R. S. (1995), ‘Comparing predictive accuracy’, Journal of Business and Economic Statistics 13, 253–263.
Doornik, J. (1998), Object-oriented Matrix Programming using Ox, Timberlake Consultants.
Doornik, J. (2001), Ox 3.0. An object-oriented matrix programming language, Timberlake Consultants.
Doornik, J. & Hansen, H. (1994), A practical test for univariate and multivariate normality. Discussion Paper, Nuffield College, Oxford.
Dupasquier, C., Guay, A. & St-Amant, P. (1999), ‘A survey of alternative methodologies for estimating potential output and the output gap’, Journal of Macroeconomics 21, 577–595.
Economic models at the Bank of England (1999).
Ehrmann, M. & Smets, F. (2003), ‘Uncertain potential output: implications for monetary policy’, Journal of Economic Dynamics and Control 27, 1611–1638.
Fabiani, S.
& Mestre, R. (2001), A system approach for measuring the Euro area NAIRU. European Central Bank Working Paper No. 65.
Fagan, G., Henry, J. & Mestre, R. (2001), ‘An area-wide model (AWM) for the EU11’, European Central Bank.
Faust, J. & Leeper, E. (1997), ‘When do long run identifying restrictions give reliable results?’, Journal of Business and Economic Statistics 15, 345–353.
Gerlach, S. & Smets, F. (1999), ‘Output gaps and monetary policy in the EMU area’, European Economic Review 43, 801–812.
Gordon, R. J. (1997), ‘The time-varying NAIRU and its implications for economic policy’, Journal of Economic Perspectives 11, 11–32.
Granger, C. & Pesaran, M. (2000), ‘Economic and statistical measures of forecast accuracy’, Journal of Forecasting 19, 537–560.
Hamilton, J. D. (1986), ‘A standard error for the estimated state vector of a state-space model’, Journal of Econometrics 33, 387–397.
Hamilton, J. D. (1994), Time Series Analysis, Princeton University Press.
Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press.
Harvey, A. C. & Jaeger, A. (1993), ‘Detrending, stylized facts and the business cycle’, Journal of Applied Econometrics 8, 231–247.
Harvey, A. & Koopman, S. (2000), ‘Signal extraction and the formulation of unobserved components models’, Econometrics Journal 3, 84–107.
Harvey, A. & Trimbur, T. (2003), ‘General model based filters for extracting trends and cycles in economic time series’, Review of Economics and Statistics 85, 244–255.
Hong, Y. (2002), ‘Evaluation of out-of-sample probability density forecasts’, Cornell University Discussion Paper.
Kaiser, R. & Maravall, A. (2001), Measuring business cycles in economic time series, Springer.
King, R. G., Plosser, C., Stock, J. & Watson, M. (1991), ‘Stochastic trends and economic fluctuations’, American Economic Review 81, 819–840.
Koopman, S. & Harvey, A. (2003), ‘Computing observation weights for signal extraction and filtering’, Journal of Economic Dynamics and Control 27, 1317–1333.
Koopman, S. J., Shephard, N. & Doornik, J. A.
(1999), ‘Statistical algorithms for models in state space using SsfPack 2.2’, Econometrics Journal 2(1), 107–60. Laxton, D. & Tatlow, R. (1992), ‘A simple multivariate filter for the measurement of potential output’, Bank of Canada 59. 22
Morgenstern, O. (1963), On the Accuracy of Economic Observations, Princeton University Press, New Jersey. Morley, J. (1999), A note on constraining AR(2) parameters in estimation. mimeo, Washington University. Nelson, E. & Nikolov, K. (2001), UK inflation in the 1970s and 1980s: The role of output gap mismeasurement. Bank of England Working Paper no. 148. Orphanides, A. & van Norden, S. (2002), ‘The unreliability of output-gap estimates in real time’, The Review of Economics and Statistics 84, 569–583. Orphanides, A. & van Norden, S. (2003), ‘The reliability of inflation forecasts based on output gap estimates in real time’. CIRANO Working Paper 2003s-01. Pascual, L., Romo, J. & Ruiz, E. (2001), ‘Effects of parameter estimation on prediction densities: a bootstrap approach’, International Journal of Forecasting 17, 83–103. Quah, D. & Vahey, S. (1995), ‘Measuring core inflation’, Economic Journal 105, 1130– 1144. Quenneville, B. & Singh, A. (2000), ‘Bayesian prediction mean squared error for state space models with estimated parameters’, Journal of Time Series Analysis 21, 219– 236. Runstler, G. (2002), The information content of real-time output gap estimates: an application to the Euro area. European Central Bank Working Paper No. 182. Schumacher, C. (2002), ‘Forecasting trend output in the Euro area’, Journal of Forecasting 21(8), 543–558. Staiger, D., Stock, J. & Watson, M. (1997), ‘The NAIRU, unemployment and monetary policy’, Journal of Economic Perspectives 11, 33–49. Stoffer, D. & Wall, K. (1991), ‘Bootstrapping state-space models: Gaussian maximum likelihood estimation and the Kalman filter’, Journal of the American Statistical Association 86, 1024–1033. Stone, J., Champernowne, D. & Meade, J. (1942), ‘The precision of national income estimates’, Review of Economic Studies 9, 111–125. Tay, A. & Wallis, K. (2000), ‘Density forecasting: a survey’, Journal of Forecasting (19), 235–254. van Norden, S. 
(1997), ‘Measurement of the output gap: a discussion of recent research at the Bank of Canada’. Bank of Canada Working Paper.
23
van Norden, S. (2002), Filtering for current analysis. Bank of Canada Working Paper 2002-28. Wallis, K. (1982), ‘Seasonal adjustment and revision of current data: linear filters for the X-11 method’, Journal of the Royal Statistical Society A 145, 74–85. Wallis, K. (2003), ‘Chi-squared tests of interval and density forecasts, and the Bank of England’s fan charts’, International Journal of Forecasting (19), 165–175.
24
A
Data
Official Eurozone data for GDP, published by EUROSTAT, are available only from 1991. Unfortunately this does not offer a sufficiently long time-series for sensible business cycle analysis. Therefore we take the data from the ECB’s Area Wide Model (AWM); see Fagan et al. (2001) [31]. We use real GDP data (AWM code: YER). These data are available from 1970q1-2000q4. The data are then updated to 2003q1 using official data from EUROSTAT (via New Cronos) [32]. All data are used in their seasonally adjusted form. This means the data have in fact been seasonally adjusted using full-sample information, which of course would not be available to policy-makers in real-time. So our results ignore this additional source of uncertainty. For the multivariate estimators of the output gap, price, unemployment, consumption and investment data are required too. Revisions to these data are less important than for GDP data. Price data are the harmonised index of consumer prices [HICP]. These data are taken from the AWM and updated from 2001q1 using New Cronos.
A.1
Construction of an “approximate” real-time data set for Eurozone GDP
A real-time data set for Eurozone GDP is constructed from consecutive quarterly issues of EUROSTAT’s Quarterly National Accounts, starting from the 1993q1 publication. In each consecutive publication figures on quarterly real GDP growth are published, in general, for the previous 7 quarters, although sometimes the previous 9 quarters. This means the data are only revised for these periods. We back-date all the vintages with values to 1990q3 (the earliest observation from the 1993q1 publication) using information that would have been available at the time; i.e. the previous vintages are used to back-date more recent vintages beyond their first published value. From 1998q1 GDP data are published with a one quarter lag; i.e. the 1998q1 publication’s most recent estimate is for 1997q4. Prior to 1998q1 GDP data were published with a two quarter lag, with the odd exception. Viewing each vintage as a column of a matrix, with the first row of each column corresponding to a common starting point, we obtain an upper triangular matrix. We account for this change in the publication lag by inserting a “fake” real-time data vintage between 1997q4 and 1998q1 - this consists of the 1997q4 vintage (which has estimates up to 1997q2) plus the 1997q3 value from the 1998q1 vintage (for this vintage the 1997q4 value is now the most recent estimate). Since there were not always four publications per year (in 1995, for example, there were only two, in 1995q1 and 1995q4) the missing vintages are assumed
[31] These data have been used in other studies of the output gap in the Eurozone such as Runstler (2002) and Camba-Mendez & Rodriguez-Palenzuela (2003). [32] After the completion of this paper an updated version of the AWM database (to 2002q4) was released by Jerome Henry. The real GDP series from this source appears very similar to the series considered in this paper (updated using official data). For example, the correlation of the Harvey-Trimbur cycles extracted from the two series is 0.99635 over the period 1971q1-2002q4. An alternative data source for aggregate Eurozone real GDP data from 1979q4-1999q3 is Beyer et al. (2001). Given the later starting point this series is not considered.
equal to the more recent vintage but observations that would not have been available at the time (given the typical publication lag) are deleted. For example, the 1995q3 vintage (which is not published) is assumed to be the 1995q4 vintage (which is published) minus the value for 1995q2 and the 1995q2 unpublished vintage is the 1995q4 vintage minus the values for 1995q1 and 1995q2. There are also definitional changes regarding the composition of the Eurozone. If there is a choice we opt for the definition closest to the Euro12. To facilitate business cycle analysis the real-time data set needs to go back beyond 1990q3. In this sense our real-time data set is “approximate” in that we backcast using the values from the AWM data-set. That is, the AWM data from 1970q1-1990q2 is updated using the consecutive vintages (the level of real GDP is backed out from the quarterly growth rate). This delivers 43 vintages, with data back to 1970q1, that can be used to examine the output gap in ‘real-time’ from 1992q3 to 2003q1. Although using the AWM data to backcast is ad hoc and does not reflect information truly available at the time, at least the most recent values for a given vintage, the ones most likely to impact on real-time output gap estimates, are genuine.
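The vintage matrix just described can be sketched in a few lines of code. This is purely illustrative: the data are randomly generated, and `n_periods`, `lag` and the vintage indices are hypothetical stand-ins for the EUROSTAT vintages, not the actual data set.

```python
import numpy as np

# Illustrative sketch (hypothetical data): real-time vintages stored as a
# matrix with one column per vintage and one row per observation quarter.
# Entries a vintage could not yet contain, given the publication lag, are
# NaN, which produces the upper triangular pattern described in the text.
n_periods = 8                       # observation quarters (hypothetical)
lag = 1                             # publication lag in quarters
rng = np.random.default_rng(0)
growth = rng.normal(0.5, 0.3, (n_periods, n_periods))  # stand-in estimates

vintages = np.full((n_periods, n_periods), np.nan)
for v in range(n_periods):          # vintage published in quarter v
    last_obs = v - lag              # most recent estimate it contains
    if last_obs >= 0:
        vintages[: last_obs + 1, v] = growth[: last_obs + 1, v]

# A missing vintage is proxied by the next published vintage with the
# observations that would not yet have been available deleted:
missing_v = 4
proxy = vintages[:, missing_v + 1].copy()
proxy[missing_v - lag + 1 :] = np.nan
```

Deleting the final observation(s) of the neighbouring published vintage mirrors the treatment of the unpublished 1995q2 and 1995q3 vintages described above.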
B
Appendix. Alternative output gap estimators
The output gap, the difference between actual and potential output, has an importance in the popular debate which can tend to run ahead of the problems in measuring it; the output gap is not observable. The choice of what measure, or estimator, of the output gap to use is more than a dry academic issue. As Canova’s (1998) analysis, for example, showed, inference can be sensitive to measurement. Various estimators have been proposed. Gerlach & Smets (1999) make the distinction between statistical, structural and mixed estimators [33]. The statistical approach views the estimation of the output gap as a statistical decomposition of actual output into trend and cyclical components. This approach is univariate. The structural approach exploits economic theory to estimate the output gap, typically by using a production function to relate potential output to productivity and inputs of labour and capital; see Barrell & Sefton (1995) and Economic models at the Bank of England (1999). The data requirements for this approach are, however, restrictive since, for example, estimates of the capital stock are required. We
[33] An alternative approach, “euroCOIN” published by the CEPR, constructs a so-called “indicator” of the Eurozone business cycle by extracting the unobserved common component of Eurozone GDP growth from a panel, with a large number of variables, using dynamic principal components; see Altissimo et al. (2001). This cross-sectional information makes it possible to remove high frequency components with fewer lead and lag observations than traditionally used in time-series filters. But since this indicator is extracted from the first difference of GDP, a filter that is known to induce phase shift, rather than directly from the (log) level of GDP, it can less obviously be interpreted as a measure of the business cycle than the estimators considered in this paper. It is perhaps better interpreted as the trend, or likely future path, of GDP growth. Given this interpretation the euroCOIN indicator can be evaluated in terms of its forecast performance. In any case, although euroCOIN does provide a timely estimate of GDP growth, the output gap estimators considered in this paper have been more widely used by policy-makers for business cycle analysis; hence our emphasis on them in this paper.
consider the statistical and mixed approaches. Two statistical estimators are considered, largely for comparative purposes. The Hodrick-Prescott filter is perhaps the most widely used statistical approach for de-trending a time-series; we fix λ at 1600, as is common for quarterly data. Results for this filter provide an important benchmark. The Harvey-Trimbur cycle is a UC cycle that generalises the class of Butterworth filters; these have the attractive property of allowing smooth cycles to be extracted from economic time series - indeed ideal band pass filters emerge as a limiting case; see Harvey & Trimbur (2003) [34].
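For concreteness, the HP trend extraction just described can be written down directly. This is a minimal sketch (the function name and the synthetic series are ours): it solves the penalised least-squares problem (I + λK′K)·trend = y, where K is the second-difference matrix.

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Hodrick-Prescott de-trending: returns (trend, cycle).

    Solves (I + lam * K'K) trend = y, with K the (n-2) x n matrix of
    second differences; lam = 1600 is the conventional quarterly value.
    """
    n = len(y)
    K = np.zeros((n - 2, n))
    for i in range(n - 2):
        K[i, i : i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lam * (K.T @ K), y)
    return trend, y - trend

# Synthetic example: a linear trend plus noise. Note the filter passes an
# exactly linear series through as trend, leaving a zero cycle.
t = np.arange(60.0)
trend, cycle = hp_filter(0.5 * t + np.random.default_rng(1).normal(0, 0.2, 60))
```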
B.1
A mixed approach to measuring the output gap: combining statistics and economics
The mixed approach combines elements of the structural and statistical approaches. Time series methods are used to study output data together with other data that economic theory suggests are closely related to the output gap. The Phillips Curve, for example, suggests that inflation data contain information about the output gap, while Okun’s Law suggests unemployment is important. These economic variables may contain useful information about the supply side of the economy and the stage of the business cycle. On this view, output should not be detrended using output data alone. Specifically we will consider the following class of multivariate estimators of the output gap: (1) Unobserved components models; (2) Multivariate Hodrick-Prescott filters and (3) Structural VAR models [35].
B.2
Multivariate Unobserved Components filters
We consider two representative multivariate unobserved components (UC) or state space models. The first is along the lines of Gerlach & Smets (1999) and Runstler (2002) and uses information on output and inflation. The second also considers unemployment; see Apel & Jansson (1999) and Fabiani & Mestre (2001). To avoid having to fix the signal to noise ratio [see Gordon (1997) and Economic models at the Bank of England (1999)], assumptions can be made about the nature of the cyclical process for unemployment or output; see also Gerlach & Smets (1999), Fabiani & Mestre (2001) and Runstler (2002). This typically takes the form of assuming a stationary cyclical process. Without such an assumption, the trend component typically accounts for all the variation in the level of the variable and soaks up all residual variation. Alternatively, as with the multivariate HP filter, a priori restrictions are placed on the variances with the aim of obtaining plausible looking cycles. This approach has been used, for example, by Chagny & Lemoine (2002).
[34] In the notation of Harvey & Trimbur (2003) we set m = n = 2. [35] Related multivariate approaches to estimating the output gap are based on the multivariate Beveridge-Nelson decomposition and the Cochrane approach; e.g. see Dupasquier et al. (1999).
B.2.1
A bivariate UC model of output and inflation
To give the output gap a more economic interpretation than in univariate unobserved components models, it is related to data on inflation. This is sensible as the relationship between the output gap and inflation is central to the Phillips Curve and the conduct of monetary policy. We consider general models of the form:

y_t = y*_t + y^C_t    (6)
Θ(L) y^C_t = ε^{yC}_t    (7)
Γ(L) Δπ_t = λ y^C_{t-1} + ε^π_t    (8)
y*_t = y*_{t-1} + β_{t-1} + ε^{y*}_t    (9)
β_t = β_{t-1} + ε^β_t    (10)
where y_t is the log of actual output, y*_t is its trend level, y^C_t is the output gap, π_t is quarterly inflation in percentage points measured at an annual rate and the ε are the disturbances. All disturbances are assumed i.i.d. Gaussian. The Phillips Curve equation, (8), provides a link between inflation and aggregate demand (measured here by the output gap). Since inflation is commonly assumed to depend only on nominal factors in the long run, (8) can be seen to impose a long run homogeneity restriction. This can be seen by appreciating that underlying (8) is the following equation expressed in the level of inflation: Γ*(L) π_t = λ(L) y^C_{t-1} + ε^π_t. Long run homogeneity requires Γ*(1) = 0, implying Γ*(L) = Γ(L)(1 − L), meaning the Phillips Curve relationship is expressed in the first differences of inflation, Δπ_t. We allow the lag polynomial Θ(L) to be second order. The roots are constrained to be stationary, but importantly we do allow for complex roots; see Morley (1999) for an account of how the roots can be constrained through a simple re-parameterisation. Experimentation suggested that restricting attention to a first order polynomial led to less sensible looking cycles. Γ(L) is assumed to be first order. We consider a smooth trend specification where σ²_{ε^{y*}} = 0 [36].
[36] Other options are to allow the disturbances to follow moving average processes; e.g. if we let ε^π_t follow an MA process then a far richer dynamic process is possible than with a purely auto-regressive specification; such an approach is followed by Runstler (2002). We also address the issue of whether inflation should depend on lagged, but not current, or current as well as lagged, values of the output (or unemployment) gap. Gerlach & Smets (1999) and Staiger et al. (1997), for example, consider lagged values only. However, current values are also considered; see for example Gordon (1997). This means that inflation and the output gap can be affected by shocks simultaneously, which appears reasonable; see, for example, Astley & Yates (1999) for further discussion of why it is important to allow for such endogeneity. Experimentation revealed that in practice although this assumption affects the timing of the cycles by one period, the shape is largely unaffected.
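To make the structure of (6)-(10) concrete, the model can be simulated under hypothetical parameter values. All numerical values below are our assumptions, chosen only so that the AR(2) gap has stationary complex roots; they are not estimates from the paper.

```python
import numpy as np

# Simulate the bivariate UC model (6)-(10): smooth trend, AR(2) output gap
# with complex roots, and a Phillips Curve in the first difference of
# inflation.  All parameter values are hypothetical.
rng = np.random.default_rng(1)
T = 200
theta1, theta2 = 1.5, -0.6   # AR(2) gap: complex roots, modulus sqrt(0.6)
lam, gamma1 = 0.2, 0.3       # Phillips Curve loading and inflation AR term
beta = np.zeros(T); ystar = np.zeros(T); gap = np.zeros(T); dpi = np.zeros(T)
for t in range(2, T):
    beta[t] = beta[t - 1] + 0.01 * rng.normal()                          # (10)
    ystar[t] = ystar[t - 1] + beta[t - 1]                                # (9), smooth trend
    gap[t] = theta1 * gap[t - 1] + theta2 * gap[t - 2] + 0.5 * rng.normal()  # (7)
    dpi[t] = gamma1 * dpi[t - 1] + lam * gap[t - 1] + 0.3 * rng.normal()     # (8)
y = ystar + gap                                                          # (6)
```

The AR(2) coefficients illustrate the complex-root constraint discussed in the text: the roots of 1 − θ1·z − θ2·z² are complex with modulus below one, so the gap is a stationary, cyclical process.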
B.2.2
A trivariate UC model of output, inflation and unemployment
It is also common to consider unemployment as an additional economic variable; see Apel & Jansson (1999) and Fabiani & Mestre (2001). This is based on the view that the output and unemployment gaps are closely related. Inflation is likely to contain information about the size of both gaps. Restrictions can then be imposed in an unobserved components model that identify not just the Phillips Curve, linking measures of excess demand to inflation, but Okun’s Law, which relates the unemployment and output gaps. Define u_t and u*_t as unemployment and trend unemployment, respectively. We consider the following model, based on Apel & Jansson (1999) and Fabiani & Mestre (2001):

Δπ_t = Γ(L) Δπ_{t-1} + Ψ(L)(u_{t-1} − u*_{t-1}) + ε^π_t    (11)
y_t − y*_t = γ(L)(u_{t-1} − u*_{t-1}) + ε^{qgap}_t    (12)

where

y*_t = y*_{t-1} + m1_{t-1} + ε^{y*}_t    (13)
u*_t = u*_{t-1} + m2_{t-1} + ε^{u*}_t    (14)

and

m1_t = m1_{t-1} + ε^{m1}_t    (15)
m2_t = m2_{t-1} + ε^{m2}_t    (16)

The unemployment gap is modelled as an autoregressive process:

u_t − u*_t = δ(L)(u_{t-1} − u*_{t-1}) + ε^{ugap}_t    (17)
Equation (11) is a version of Gordon’s triangle Phillips Curve model; it relates inflation to movements in the unemployment gap. Expectations are implicit in the inflation dynamics. Equation (12) is an Okun’s Law relationship, relating cyclical unemployment and output movements. The trends in (13) and (14) are assumed to follow a local linear trend model. This representation was used with success by Fabiani & Mestre (2001) in an application to the Eurozone. It implies that the trend of output and unemployment (the NAIRU) are I(2) processes. We consider a first order polynomial for γ(L). δ(L) is second order and constrained to be stationary, but its roots are allowed to be complex (this constraint is empirically important). Ψ(L) is first order. Γ(L) is second order. Again we consider a smooth trend specification where σ²_{ε^{y*}} = 0. It should be noted that the above system does not allow for full endogeneity between real disequilibria and inflation, which would require causation to run both from real disequilibria to inflation and vice-versa. A restrictive path for the transmission of demand shocks is implied, as demand shocks lead to inflation via the unemployment gap, and then from the unemployment gap to the output gap; see Astley & Yates (1999).
B.2.3
Estimation of UC models
The parameters of the univariate and multivariate unobserved components models are estimated by maximum likelihood exploiting their state-space form. Computations are performed using the beta version of SsfPack 3 for Ox. Importantly, in contrast to Fabiani & Mestre (2001), for example, all observable variables are put in the state vector to ensure parameter uncertainty is fully accounted for; see Harvey (1989), pp. 366-368. In the context of Fabiani & Mestre (2001), see their Appendix, this means that inflation is also placed in the state vector rather than left in the measurement equation.
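The models in this paper are estimated with SsfPack 3 for Ox. As a language-neutral illustration of the same idea - maximising the Kalman-filter log-likelihood of a state-space model - here is a sketch in Python for the simplest possible case, a local-level model (the model, data and starting values are ours, not those used in the paper):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y):
    """Negative Kalman-filter log-likelihood of a local-level model,
    with state variance q and measurement variance h (passed in logs)."""
    q, h = np.exp(params)
    a, p, ll = y[0], 1e7, 0.0          # near-diffuse initialisation
    for t in range(1, len(y)):
        p = p + q                      # prediction variance
        f = p + h                      # innovation variance
        v = y[t] - a                   # innovation
        ll += -0.5 * (np.log(2 * np.pi * f) + v * v / f)
        k = p / f                      # Kalman gain
        a, p = a + k * v, p * (1 - k)  # update
    return -ll

rng = np.random.default_rng(2)
level = np.cumsum(rng.normal(0, 0.5, 300))      # true q = 0.25
y = level + rng.normal(0, 1.0, 300)             # true h = 1.0
res = minimize(neg_loglik, x0=[0.0, 0.0], args=(y,), method="Nelder-Mead")
q_hat, h_hat = np.exp(res.x)
```

Working with log-variances keeps the optimisation unconstrained, a common device in ML estimation of state-space models.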
B.3
Multivariate Hodrick-Prescott filters
Laxton & Tatlow (1992) proposed an extension to the Hodrick-Prescott (HP) filter which incorporates economic information. Additional, so-called economic, constraints are imposed on the minimisation from which the HP filter is defined. The residuals from a structural equation, such as the Phillips Curve or Okun’s Law, are added to the minimisation problem that the univariate HP filter seeks to solve. Just as the univariate or traditional HP filter can be interpreted within an unobserved components framework [see Harvey & Jaeger (1993)], so can the multivariate HP filter; see Boone (2000). This facilitates estimation by maximum likelihood and inference, since confidence bands around the estimates can be derived from the Kalman filter recursions.
B.3.1
A bivariate HP model of output and inflation
If one does not assume a particular parametric process for the cyclical component, as is commonly done with the UC models, then to obtain plausible looking cycles one will typically need to constrain the variance of the disturbances driving the elements of the state vector. This is the approach taken by the HP filter, albeit implicitly when the filter is interpreted as a nonparametric filter [37]. To illustrate, consider the following UC model where potential output is assumed to follow, say, a smooth trend representation [see Harvey & Jaeger (1993)] and the economic constraint is based on the Phillips Curve:

y_t = y*_t + y^C_t    (18)
Γ(L) Δπ_t = λ y^C_{t-1} + ε^π_t    (19)
y*_t = y*_{t-1} + β_{t-1}    (20)
β_t = β_{t-1} + ε^{y*}_t    (21)

where all disturbances are i.i.d. Gaussian. Alternative representations for potential output, such as the local linear trend and the random walk representation, can be considered.
[37] The variances of the disturbances, of course, can be estimated by maximum likelihood. It is the decision not to estimate them, but to assume a priori values, that we take to be the defining characteristic of the multivariate HP approach. This contrasts with the multivariate UC approach, where these variances are estimated.
Writing (18) in its state-space form, the transition equation for the state vector (for expositional ease only, making some specific assumptions about the form of the lag polynomials in (18)) is given by:

[ Δπ_t     ]   [ ∗  0   0  ∗ ] [ Δπ_{t-1}  ]   [ ε^π_t    ]
[ y*_t     ] = [ 0  2  −1  0 ] [ y*_{t-1}  ] + [ ε^{y*}_t ]    (22)
[ y*_{t-1} ]   [ 0  1   0  0 ] [ y*_{t-2}  ]   [ 0        ]
[ y^C_t    ]   [ 0  0   0  0 ] [ y^C_{t-1} ]   [ ε^{yC}_t ]

Consistent with how, for the univariate HP filter, the signal to noise ratio (σ²_{y*}/σ²_{yC}) determines the smoothness of the trend series [see Harvey & Jaeger (1993) and Kaiser & Maravall (2001)], the relative variances of the disturbances in (22) control the smoothness of the cycle and the fit of the economic relationship (the second equation in (18)). As σ²_{y*} tends to infinity, more explanatory power is given to the unobserved variable and less importance to the cycle. Let λ1 and λ2 denote these relative variances, which control the problem, where

λ1 = σ²_{yC}/σ²_{y*}    (23)
λ2 = σ²_{yC}/σ²_π    (24)

Setting, without loss of generality, σ²_{yC} = 1, then

λ1 = 1/σ²_{y*}    (25)
λ2 = 1/σ²_π    (26)

Then λ1 controls the smoothness of the trend component of output. As λ1 → 0 the trend becomes very volatile and can soak up all cyclical variation. As λ1 → ∞ the trend tends to a deterministic (smooth) trend. Traditionally, as with the univariate HP filter, σ²_{y*} = 1/1600 ⇔ λ1 = 1600. λ2 controls the fit of the Phillips Curve relationship. As λ2 → 0 (σ²_π → ∞) the fit of the economic relationship worsens, implying less information is provided by the economic relationship. Traditionally σ²_π ≈ 1/25 ⇔ λ2 = 25 or σ²_π ≈ 1 ⇔ λ2 = 1; see Boone (2000) and Chagny & Lemoine (2002). Note that the better the fit of the economic relationship, the better the explanatory power lagged values of the output gap have over inflation.
B.4
Structural VAR (SVAR) models
Following the seminal paper by Blanchard & Quah (1989), VAR models with economic restrictions imposed on the long run have been used to estimate output gaps; see Bullard & Keating (1995) and Camba-Mendez & Rodriguez-Palenzuela (2003), and for a related exercise Quah & Vahey (1995). A perceived advantage of SVAR models, relative to UC models, is that SVAR models can be viewed as one-sided filters; in this sense they overcome the end-point problem associated with UC models, which can be seen to involve the application of a two-sided filter. In the SVAR approach restrictions are imposed on the matrix of long-run multipliers so that demand (or transitory) shocks can be distinguished from supply (or permanent) shocks. Transitory shocks are identified by imposing the restriction that they have no long-run effect on the level of the variable of concern. The output gap is the cumulative sum of the transitory shocks to output. Potential output is then the cumulative sum of the permanent shocks. In contrast to the other multivariate methods of estimating the output gap, the SVAR approach gives the components of output an economic interpretation. Potential output is not assumed to follow a random walk as in UC models. Furthermore, both real disequilibria and inflation are modelled as endogenous variables. We distinguish two broad types of SVAR model: those with and without cointegration. Specifically, we consider models of the type proposed by Blanchard & Quah (1989) and King et al. (1991) that deal with the cases of no cointegration and cointegration, respectively. In cointegrating VAR models there is an additional identification problem taking two related forms: (i) identification of the cointegrating vectors and (ii) identification of the common stochastic trends. Both the cointegrating vectors and the common stochastic trends offer characterisations of the “long run” in a VAR model. A popular approach to identification in the presence of cointegration is King et al. (1991). Cointegration provides information on the number of permanent and transitory shocks. Atheoretical alternatives to the identification strategy of King et al. (1991) have also been considered; for an application to the Eurozone see Schumacher (2002).
B.4.1
Blanchard-Quah SVAR models
We consider SVAR models in line with Astley & Yates (1999) and Camba-Mendez & Rodriguez-Palenzuela (2003). We will consider trivariate VAR models of output, inflation and unemployment; see Camba-Mendez & Rodriguez-Palenzuela (2003). Restrictions can then be imposed on the matrix of long-run multipliers so that inflation is determined in the long run by only one structural shock (the inflation shock), whereas output is determined by two structural shocks (the inflation shock plus a supply shock), and unemployment is determined by three structural shocks (the inflation shock, the supply shock and a demand shock). The output gap is then the cumulative sum of the demand shocks to output. It is important that the estimated (reduced-form) VAR model, which provides the basis for the identification of the output gap, is stationary. This requires the variables in the VAR to be appropriately transformed prior to estimation. Following Camba-Mendez & Rodriguez-Palenzuela (2003) in their application to the Eurozone we consider the first differences of inflation, GDP and unemployment. We did, however, experiment with representations in the level of inflation and unemployment.
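The Blanchard-Quah identification step can be sketched compactly. The example below is ours (simulated bivariate data rather than the paper's trivariate Eurozone system): fit a VAR(1) by OLS, form the long-run multiplier matrix C(1) = (I − A)⁻¹, and choose the structural impact matrix S so that C(1)S is lower triangular, so that the second shock has no long-run effect on the first variable. The output gap would then be obtained by cumulating the transitory-shock contribution to output via the moving-average representation.

```python
import numpy as np

# Blanchard-Quah identification sketch on simulated VAR(1) data
# (hypothetical coefficients; variable 1 plays the role of output growth).
rng = np.random.default_rng(3)
T = 500
A_true = np.array([[0.5, 0.1], [0.2, 0.6]])
X = np.zeros((T, 2))
for t in range(1, T):
    X[t] = A_true @ X[t - 1] + rng.normal(size=2)

Y, Z = X[1:], X[:-1]
A = np.linalg.lstsq(Z, Y, rcond=None)[0].T        # OLS VAR(1) coefficients
U = Y - Z @ A.T                                   # reduced-form residuals
Sigma = U.T @ U / len(U)                          # residual covariance
C1 = np.linalg.inv(np.eye(2) - A)                 # long-run multipliers
# Structural impact matrix: u_t = S eps_t with C(1) S lower triangular,
# i.e. the second (transitory) shock has no long-run effect on variable 1.
S = np.linalg.inv(C1) @ np.linalg.cholesky(C1 @ Sigma @ C1.T)
shocks = U @ np.linalg.inv(S).T                   # unit-variance structural shocks
```

By construction S·S′ reproduces the residual covariance and the (1,2) element of C(1)S is zero, which is exactly the long-run restriction described in the text.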
B.4.2
King-Plosser-Stock-Watson SVAR models
Rather than estimating a stationary (perhaps first-differenced) VAR model, a cointegrating VAR model is considered. We follow King et al. (1991) and consider a three variable system of GDP, consumption and investment. There are two cointegrating vectors: the ratios of consumption and investment to GDP are stationary. Therefore, there is one permanent shock and two transitory shocks. The output gap is the cumulative sum of the two transitory shocks to output.
B.4.3
The importance of lag length
The long-run restrictions implicit in identification require estimation of the matrix of long run responses (the sum of the lag polynomial in the moving-average representation corresponding to the VAR model). This is based on estimation of the sum of the coefficients in the VAR model. Therefore the reliability of SVAR estimates of the output gap rests on reliable estimation of the AR coefficients; see Faust & Leeper (1997) for details. A key issue then is selection of the lag order in the VAR model. Using too small a lag order can lead to significant biases in estimation of the permanent and transitory components; see DeSerres & Guay (1995). DeSerres & Guay (1995) found that use of information criteria led to too low an estimated lag order. This is consistent with our findings, in the sense that the BIC tended to select a lag order of one, but it was only with much larger lag orders that sensible looking output gap estimates were generated.
B.4.4
Uncertainty associated with SVAR estimates
SVAR models require estimation of a large number of parameters. This is expected to give rise to considerable uncertainty about the estimates, as in a changing world we expect considerable parameter uncertainty. One interesting approach to reducing this uncertainty is to impose rank restrictions on the estimated VAR model that reduce the number of estimated parameters; see Camba-Mendez & Rodriguez-Palenzuela (2003). One means of quantifying the uncertainty associated with SVAR estimates of the output gap is to undertake a series of stochastic simulations. An alternative is to appreciate that underlying the SVAR model is a state-space representation; e.g. DeSerres & Guay (1995) exploit a state-space representation that is close in spirit to the BQ approach to investigate the importance of the lag order on output gap estimation via Monte-Carlo simulation.
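One simple way to make such stochastic simulation of parameter uncertainty concrete is a residual bootstrap of an estimated VAR. The sketch below (hypothetical VAR(1) data; the coefficient values are ours) re-estimates the coefficient matrix on bootstrap-resampled series and records the spread of the estimates:

```python
import numpy as np

# Residual bootstrap of VAR(1) coefficient uncertainty (hypothetical data).
rng = np.random.default_rng(6)
T = 300
A_true = np.array([[0.5, 0.1], [0.2, 0.6]])
X = np.zeros((T, 2))
for t in range(1, T):
    X[t] = A_true @ X[t - 1] + rng.normal(size=2)

def ols_var1(X):
    """OLS estimate of A in X_t = A X_{t-1} + u_t."""
    return np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T

A_hat = ols_var1(X)
U = X[1:] - X[:-1] @ A_hat.T                      # residuals
draws = []
for _ in range(200):
    Ub = U[rng.integers(0, len(U), len(U))]       # resample residuals
    Xb = np.zeros_like(X)
    for t in range(1, T):
        Xb[t] = A_hat @ Xb[t - 1] + Ub[t - 1]     # rebuild the series
    draws.append(ols_var1(Xb))
spread = np.std(np.array(draws), axis=0)          # bootstrap standard errors
```

The same resampling loop could be carried through the identification step to produce bands around the implied output gap; see Stoffer & Wall (1991) and Pascual et al. (2001) for bootstrap approaches in related settings.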
C
Appendix. Uncertainty and UC models
C.1
Confidence bands
Let a_{t|T} = E(α_t|Ω_T) and a_{t|t} = E(α_t|Ω_t) denote the optimal (minimum mean squared error [MSE]) estimators of the state vector of the UC model, α_t, based on observations (or more generally the information set Ω) up to and including T and t, respectively, conditional on the parameter vector Θ that is estimated by maximum likelihood. One of the elements of α_t is y^C_t. Then let P_{t|t} = V(a_{t|t} − α_t) and P_{t|T} = V(a_{t|T} − α_t) denote their covariance matrices. Then, assuming the revision a_{t|T} − a_{t|t} is uncorrelated with the final estimation error a_{t|T} − α_t, we know

P_{t|t} = V(a_{t|t} − α_t) = V((a_{t|t} − a_{t|T}) + (a_{t|T} − α_t))    (27)
P_{t|t} = P_{t|T} + V(a_{t|T} − a_{t|t})    (28)

so the MSE of the filtered (or one-sided) estimates is greater than that of the smoothed (or two-sided) estimates by a positive semidefinite matrix, V(a_{t|T} − a_{t|t}). The MSE P_{t|t′} decreases as t′ increases. If Θ were known, P_{t|t} and P_{t|T} would indicate the uncertainty in the Kalman-filter recursions. We call this uncertainty filter uncertainty. However, there is an additional source of uncertainty if Θ is estimated, say by Θ̂; i.e. P_{t|t}(Θ̂) > P_{t|t}(Θ) and P_{t|T}(Θ̂) > P_{t|T}(Θ). We account for both of these types of uncertainty following Hamilton (1986) [38].
C.2
Revisions
Define the revision R_{t|T} as R_{t|T} = a_{t|T} − a_{t|t}. Then we may re-write (27) as:

V(R_{t|T}) = P_{t|t} − P_{t|T}    (29)

The MSE of the revisions is a lower bound for the MSE P_{t|t}. This, as stressed by Runstler (2002), is useful as it provides an indication of the MSE for estimators that cannot be cast in the state-space form. We can interpret (29) as the familiar result that the MSE of the real-time (forecasted) estimate is equal to the variance of the outturn, P_{t|T}, plus the square of the bias, V(R_{t|T}). This assumes the revision process has mean zero, so that V(R_{t|T}) = E(R²_{t|T}). The variance of the revision allowing for parameter uncertainty as well as filter uncertainty is:

V(R_{t|T})(Θ̂) = P_{t|t}(Θ̂) − P_{t|T}(Θ̂)    (30)

C.2.1
Relating revisions to forecast errors
The revisions can be related to the forecast error. Since [see Hamilton (1994), p. 396]:

a_{t|T} = a_{t|t} + P_{t|t} T′ P⁻¹_{t+1|t} (a_{t+1|T} − a_{t+1|t})    (31)
R_{t|T} = P_{t|t} T′ P⁻¹_{t+1|t} (a_{t+1|T} − a_{t+1|t})    (32)

where T is the matrix in the transition equation of the state vector, the revision is a function of the forecast error in a_{t+1|t}. The closer a_{t+1|t} is to the final estimate a_{t+1|T}, the smaller the revision and the less the difference between the filtered and smoothed estimates. Therefore, the better the one-step ahead forecasts of the state vector, the smaller the revision.
[38] An alternative approach, taken by Orphanides & van Norden (2002), is to use analytical approximations to capture the parameter uncertainty; see Quenneville & Singh (2000).
We can see this in another way; see also Kaiser & Maravall (2001), p. 118. Consider the final estimate of the output gap as a known weighted linear function, say centred and symmetric, of the underlying GDP data y_t:

a_{t|T} = B(L, F) y_t    (33)

where B(L, F) = b_0 + Σ_{j=1}^∞ b_j (L^j + F^j), with L y_t = y_{t-1} and F y_t = y_{t+1}. Then the final estimate of the output gap is given by:

a_{t|T} = B(L, F) y_t = b_0 y_t + Σ_{j=1}^∞ b_j y_{t-j} + Σ_{j=1}^∞ b_j y_{t+j}    (34)

But in real-time the future values of y_t, namely y_{t+1}, y_{t+2}, ..., are unknown. To apply the two-sided filter future values need to be forecast. Denote the forecasts of y_{t+1}, y_{t+2}, ... made at time t by E(y_{t+j}|Ω_t). Then we may define the real-time estimate as:

a_{t|t} = b_0 y_t + Σ_{j=1}^∞ b_j y_{t-j} + Σ_{j=1}^∞ b_j E(y_{t+j}|Ω_t)    (35)

Subtracting (35) from (34), the revision is obtained as a function of the forecast errors, y_{t+j} − E(y_{t+j}|Ω_t):

a_{t|T} − a_{t|t} = R_{t|T} = Σ_{j=1}^∞ b_j (y_{t+j} − E(y_{t+j}|Ω_t))    (36)

It follows that if these forecast errors are reduced the revision will decrease.
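Equation (36) can be checked numerically with a truncated symmetric filter. The weights, the data and the naive random-walk forecast below are all hypothetical choices made only for illustration:

```python
import numpy as np

# Numerical check of (36): with a symmetric filter b_0, b_1, b_2
# (hypothetical weights), the revision to the period-t estimate equals the
# weighted sum of the forecast errors on the future observations.
b = {0: 0.4, 1: 0.2, 2: 0.1}
rng = np.random.default_rng(5)
y = rng.normal(size=50)
t = 25
final = b[0] * y[t] + sum(b[j] * (y[t - j] + y[t + j]) for j in (1, 2))
fcast = {j: y[t] for j in (1, 2)}    # naive random-walk forecasts of y_{t+j}
realtime = b[0] * y[t] + sum(b[j] * (y[t - j] + fcast[j]) for j in (1, 2))
revision = final - realtime
# (36): revision = sum_j b_j * (y_{t+j} - E(y_{t+j}|Omega_t))
assert np.isclose(revision, sum(b[j] * (y[t + j] - fcast[j]) for j in (1, 2)))
```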
Table 3: Evaluation of Hodrick-Prescott interval and density estimates: 1981q1-2000q1
                1981q1-2000q1        1993q3-2000q1
                Simulated  Gaussian  Simulated  Gaussian
Density:LB1     0.000      0.000     0.011      0.000
Density:LB2     0.000      0.000     0.367      0.198
Density:LB3     0.000      0.000     0.524      0.504
Density:Norm    0.000      0.000     0.000      0.107
Interval95:UC   0.000      0.000     0.000      0.453
Interval95:Ind  0.000      0.000     0.009      0.000
Interval95:CC   0.000      0.000     0.000      0.000
Interval50:UC   0.000      0.000     0.000      0.124
Interval50:Ind  0.105      0.000     0.768      0.001
Interval50:CC   0.000      0.000     0.001      0.002
Notes: LB1-LB3 are p-values for tests for serial correlation in the first, second and third powers. Norm is the p-value for the normality test. Interval95 and Interval50 are the 95 and 50 per cent interval forecasts. UC is the p-value for the unconditional coverage test, Ind is the p-value for the independence test and CC is the p-value for the conditional coverage test.
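The UC, Ind and CC interval statistics reported in the tables are likelihood-ratio tests in the style of Christoffersen (1998): unconditional coverage, independence against a first-order Markov alternative, and the two combined. A minimal standard-library sketch, applied to an illustrative simulated hit sequence rather than the paper's data:

```python
import math
import random

def chi2_sf(x, df):
    # Chi-square survival function for df in {1, 2} (stdlib only)
    x = max(x, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    return math.exp(-x / 2.0)                 # df == 2

def christoffersen_tests(hits, p):
    """Christoffersen (1998)-style LR tests on a 0/1 hit sequence for an
    interval with nominal coverage p. Returns (p_uc, p_ind, p_cc)."""
    n = len(hits); n1 = sum(hits); n0 = n - n1
    def ll(k1, k0, q):                        # Bernoulli log-likelihood, guarding log(0)
        out = 0.0
        if k1: out += k1 * math.log(q)
        if k0: out += k0 * math.log(1 - q)
        return out
    pi = n1 / n
    lr_uc = -2 * (ll(n1, n0, p) - ll(n1, n0, pi))
    # Transition counts for the first-order Markov alternative
    t00 = t01 = t10 = t11 = 0
    for a, b in zip(hits[:-1], hits[1:]):
        if a == 0 and b == 0: t00 += 1
        elif a == 0: t01 += 1
        elif b == 0: t10 += 1
        else: t11 += 1
    pi01 = t01 / max(t00 + t01, 1)
    pi11 = t11 / max(t10 + t11, 1)
    pibar = (t01 + t11) / (n - 1)
    lr_ind = -2 * (ll(t01 + t11, t00 + t10, pibar)
                   - ll(t01, t00, pi01) - ll(t11, t10, pi11))
    return chi2_sf(lr_uc, 1), chi2_sf(lr_ind, 1), chi2_sf(lr_uc + lr_ind, 2)

random.seed(0)
hits = [1 if random.random() < 0.95 else 0 for _ in range(80)]  # well-calibrated 95% band
```

On a well-calibrated, serially independent hit sequence like this one, all three p-values should typically be well away from zero; clustered or too-frequent misses drive Ind and UC down respectively.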
Table 4: Evaluation of univariate UC interval and density estimates: 1981q1-2000q1

                 1981q1-2000q1           1993q3-2000q1
                 Simulated   Gaussian    Simulated   Gaussian
Density:LB1      0.000       0.000       0.000       0.000
Density:LB2      0.000       0.000       0.008       0.029
Density:LB3      0.000       0.000       0.033       0.044
Density: Norm    0.018       0.16        0.026       0.153
Interval95:UC    0.000       0.000       0.895       0.453
Interval95:Ind   0.000       0.00        0.105       0.000
Interval95:CC    0.000       0.000       0.116       0.000
Interval50:UC    0.000       0.040       0.700       0.000
Interval50:Ind   0.000       0.00        0.138       0.000
Interval50:CC    0.000       0.000       0.251       0.000
Table 5: Evaluation of univariate UC interval and density estimates allowing for uncertainty in the final estimates: 1993q3-2000q1
                 S:mean   S:var   S:prop>0.05   G:mean   G:var   G:prop>0.05
Density:LB1      0.007    0.001   0.028         0.000    0.000   0.000
Density:LB2      0.225    0.084   0.760         0.074    0.023   0.198
Density:LB3      0.253    0.063   0.734         0.057    0.005   0.306
Density: Norm    0.161    0.068   0.384         0.277    0.067   0.824
Interval95:UC    0.730    0.048   1.000         0.453    0.000   1.000
Interval95:Ind   0.020    0.002   0.174         0.000    0.000   0.000
Interval95:CC    0.023    0.003   0.182         0.000    0.000   0.000
Interval50:UC    0.121    0.061   0.264         0.000    0.000   0.000
Interval50:Ind   0.117    0.047   0.450         0.000    0.000   0.000
Interval50:CC    0.050    0.021   0.158         0.000    0.000   0.000
Notes: S denotes the simulated measure of uncertainty and G denotes the Gaussian one. mean is the average of the p-values across the evaluation tests, var is the variance of the p-values and prop>0.05 is the proportion of times the p-value is greater than 5 per cent.
Table 6: Evaluation of bivariate UC interval and density estimates: 1981q1-2000q1
                 1981q1-2000q1           1993q3-2000q1
                 Simulated   Gaussian    Simulated   Gaussian
Density:LB1      0.000       0.000       0.000       0.000
Density:LB2      0.000       0.000       0.010       0.231
Density:LB3      0.000       0.000       0.010       0.037
Density: Norm    0.000       0.076       0.000       0.512
Interval95:UC    0.000       0.855       0.895       0.453
Interval95:Ind   0.000       0.248       0.000       0.000
Interval95:CC    0.000       0.649       0.000       0.000
Interval50:UC    0.001       0.362       0.248       0.000
Interval50:Ind   0.000       0.000       0.379       0.000
Interval50:CC    0.000       0.000       0.347       0.000
Table 7: Evaluation of bivariate UC interval and density estimates allowing for uncertainty in the final estimates: 1993q3-2000q1
                 S:mean   S:var   S:prop>0.05   G:mean   G:var   G:prop>0.05
Density:LB1      0.008    0.001   0.046         0.030    0.003   0.160
Density:LB2      0.189    0.063   0.444         0.573    0.054   1.000
Density:LB3      0.149    0.048   0.442         0.320    0.072   0.864
Density: Norm    0.116    0.041   0.342         0.600    0.039   1.000
Interval95:UC    0.555    0.039   0.996         0.453    0.000   1.000
Interval95:Ind   0.002    0.000   0.014         0.000    0.000   0.000
Interval95:CC    0.002    0.000   0.022         0.000    0.000   0.000
Interval50:UC    0.209    0.079   0.562         0.000    0.000   0.000
Interval50:Ind   0.240    0.052   0.720         0.005    0.001   0.010
Interval50:CC    0.111    0.029   0.386         0.000    0.000   0.000
Table 8: Evaluation of bivariate Hodrick-Prescott interval and density estimates: 1981q1-2000q1

                 1981q1-2000q1           1993q3-2000q1
                 Simulated   Gaussian    Simulated   Gaussian
Density:LB1      0.000       0.000       0.001       0.016
Density:LB2      0.237       0.036       0.244       0.930
Density:LB3      0.649       0.004       0.199       0.827
Density: Norm    0.167       0.675       0.000       0.592
Interval95:UC    0.855       0.080       0.895       0.453
Interval95:Ind   0.505       0.000       0.339       0.000
Interval95:CC    0.788       0.000       0.414       0.000
Interval50:UC    0.040       0.000       0.441       0.000
Interval50:Ind   0.004       0.048       0.300       0.000
Interval50:CC    0.003       0.000       0.301       0.000
Table 9: Evaluation of bivariate Hodrick-Prescott interval and density estimates allowing for uncertainty in the final estimates: 1993q3-2000q1
                 S:mean   S:var   S:prop>0.05   G:mean   G:var   G:prop>0.05
Density:LB1      0.028    0.001   0.296         0.009    0.000   0.000
Density:LB2      0.410    0.051   0.998         0.805    0.018   1.000
Density:LB3      0.327    0.070   0.834         0.717    0.019   1.000
Density: Norm    0.055    0.014   0.198         0.649    0.027   1.000
Interval95:UC    0.293    0.107   0.664         0.453    0.000   1.000
Interval95:Ind   0.473    0.107   0.738         0.000    0.000   0.000
Interval95:CC    0.162    0.039   0.454         0.000    0.000   0.000
Interval50:UC    0.493    0.066   0.962         0.000    0.000   0.000
Interval50:Ind   0.541    0.152   0.840         0.015    0.003   0.030
Interval50:CC    0.497    0.128   0.804         0.000    0.000   0.000
Table 10: Evaluation of trivariate UC interval and density estimates: 1981q1-2000q1

                 1981q1-2000q1           1993q3-2000q1
                 Simulated   Gaussian    Simulated   Gaussian
Density:LB1      0.000       0.000       0.000       0.000
Density:LB2      0.000       0.000       0.004       0.373
Density:LB3      0.000       0.000       0.001       0.240
Density: Norm    0.094       0.257       0.272       0.320
Interval95:UC    0.000       0.015       0.000       0.310
Interval95:Ind   0.000       0.633       0.002       0.535
Interval95:CC    0.000       0.058       0.000       0.615
Interval50:UC    0.000       0.023       0.007       0.700
Interval50:Ind   0.000       0.006       0.006       0.006
Interval50:CC    0.000       0.002       0.002       0.023
Table 11: Evaluation of trivariate UC interval and density estimates allowing for uncertainty in the final estimates: 1993q3-2000q1
                 S:mean   S:var   S:prop>0.05   G:mean   G:var   G:prop>0.05
Density:LB1      0.101    0.039   0.224         0.000    0.000   0.000
Density:LB2      0.229    0.078   0.592         0.160    0.020   0.816
Density:LB3      0.241    0.051   0.714         0.104    0.010   0.626
Density: Norm    0.320    0.072   0.870         0.490    0.034   0.982
Interval95:UC    0.211    0.115   0.332         0.351    0.111   0.644
Interval95:Ind   0.089    0.044   0.188         0.110    0.058   0.268
Interval95:CC    0.087    0.033   0.238         0.117    0.049   0.300
Interval50:UC    0.115    0.075   0.282         0.289    0.113   0.734
Interval50:Ind   0.122    0.038   0.386         0.062    0.024   0.182
Interval50:CC    0.056    0.027   0.168         0.030    0.004   0.176
Table 12: p-values for the joint null hypothesis of a zero mean and unit variance for $\{z_t^*\}$

                1981q1-2000q1               1993q3-2000q1
                Simulated     Gaussian      Simulated     Gaussian
                dep    ind    dep    ind    dep    ind    dep    ind
univariate HP   0.11   0.00   0.20   0.00   0.19   0.00   0.44   0.16
univariate UC   0.17   0.00   0.32   0.03   0.44   0.40   0.00   0.00
bivariate HP    0.30   0.11   0.00   0.00   0.49   0.48   0.00   0.00
bivariate UC    0.15   0.00   0.14   0.00   0.23   0.02   0.00   0.00
trivariate UC   0.13   0.00   0.18   0.00   0.22   0.00   0.32   0.04
Notes: dep denotes a HAC test and ind a test that assumes independence.
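A joint test of zero mean and unit variance for the transformed series $\{z_t^*\}$ can be implemented as a likelihood-ratio test in the spirit of Berkowitz (2001). The sketch below corresponds to the independence ('ind') variant and uses illustrative simulated data; the HAC ('dep') variant reported in Table 12 would instead require a serial-correlation-robust variance estimate.

```python
import math
import random

def joint_mean_var_lr(z):
    """LR test of the joint null that z*_t ~ N(0, 1) against N(mu, sigma^2),
    assuming independence. Returns the chi-square(2) p-value."""
    n = len(z)
    mu = sum(z) / n                            # unrestricted MLE of the mean
    s2 = sum((x - mu) ** 2 for x in z) / n     # unrestricted MLE of the variance
    def loglik(m, v):
        return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v for x in z)
    lr = -2 * (loglik(0.0, 1.0) - loglik(mu, s2))
    return math.exp(-max(lr, 0.0) / 2)         # chi-square(2) survival function

random.seed(2)
z = [random.gauss(0, 1) for _ in range(100)]   # a well-calibrated density forecast
```

For the N(0,1) draws above the p-value should be well-behaved, while shifting the series (e.g. a mean of 2) makes the statistic explode and the p-value collapse towards zero, which is how miscalibration shows up in Table 12.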
[Figure: final output gap estimates plotted over 1970-2000; series shown: HP, univariate_UC, bivariate_UC, bivariate_HP, trivariate_UC, BQ, KPSW]

Figure 1: Final estimates of the Eurozone business cycle: a comparison of different estimators
[Figure: two panels over 1980-2000: simulated 95% confidence intervals (top) and Gaussian 95% confidence intervals (bottom)]

Figure 2: Uncertainty associated with real-time estimates using the univariate UC filter. Note: final estimate is in bold face
[Figure: two panels over 1980-2000: simulated 95% confidence intervals (top) and Gaussian 95% confidence intervals (bottom)]

Figure 3: Uncertainty associated with real-time estimates using the Hodrick-Prescott filter. Note: final estimate is in bold face
[Figure: two panels over 1980-2000: simulated 95% confidence intervals (top) and Gaussian 95% confidence intervals (bottom)]

Figure 4: Uncertainty associated with real-time estimates using the bivariate UC filter. Note: final estimate is in bold face
[Figure: two panels over 1980-2000: simulated 95% confidence intervals (top) and Gaussian 95% confidence intervals (bottom)]

Figure 5: Uncertainty associated with real-time estimates using the bivariate Hodrick-Prescott filter. Note: final estimate is in bold face
[Figure: two panels over 1980-2000: simulated 95% confidence intervals (top) and Gaussian 95% confidence intervals (bottom)]

Figure 6: Uncertainty associated with real-time estimates using the trivariate UC filter. Note: final estimate is in bold face