June 16, 2000

FORECASTING WITH SMOOTH TRANSITION AUTOREGRESSIVE MODELS by Stefan Lundbergh and Timo Teräsvirta

Abstract

This chapter considers the use of smooth transition autoregressive models for forecasting. First, the modelling of time series with these nonlinear models is discussed. Techniques for obtaining multiperiod forecasts are presented. The usefulness of forecast densities in the case of nonlinear models is considered and techniques for graphically displaying such densities are demonstrated. The chapter ends with an empirical example of forecasting two quarterly unemployment series.

Acknowledgements. The first author thanks the Tore Browaldh’s Foundation for financial support. The work of the second author has been supported in part by the Swedish Council for Research in the Humanities and Social Sciences.

Keywords. Density forecast, highest density region, nonlinear forecasting, nonlinear modelling, LSTAR model, time series forecasting

JEL Classification Codes: C22, C52

Address for correspondence: Department of Economic Statistics, Stockholm School of Economics, Box 6501, SE-113 83 Stockholm, Sweden

1 Introduction

Forecasting with a nonlinear model is numerically more complicated than carrying out a similar exercise with a linear model. This makes it worthwhile to discuss forecasting from nonlinear models in more detail. In this chapter, we consider forecasting with a special type of nonlinear model, the so-called smooth transition autoregressive (STAR) model. The idea of smooth transition in the form it appears here can be traced back to Bacon and Watts (1971). For the history and applications of the STAR model to economic time series see, for example, Granger and Teräsvirta (1993) or Teräsvirta (1994). The STAR model nests a linear autoregressive model, and the extra parameters give the model added flexibility, which may be useful in econometric modelling and forecasting. Much of what will be said in this chapter generalizes to the case where we have a smooth transition regression model with regressors that are strongly exogenous for their coefficients. At the moment there do not exist very many examples of forecasting with STAR models in the econometrics or time series literature. Teräsvirta and Anderson (1992) forecast quarterly OECD industrial production series with these models. The results were mixed: in some cases the STAR model yielded slightly better one-quarter-ahead forecasts than the linear model, but in some other cases the situation was the reverse. Forecasters who have forecast with other nonlinear models have obtained similar results. A conclusion of the authors was that the results of comparing forecasts from a linear and nonlinear model probably depend on how ‘nonlinear’ the forecast period is. Recently, Sarantis (1999) modelled nonlinearities in real effective exchange rates for 10 major industrialized countries with STAR models. A set of out-of-sample forecast experiments indicated that the STAR model yielded more accurate forecasts than a pure random walk. On the other hand, it did not perform any better than a linear autoregressive model. 
The STAR model did outperform another nonlinear model, a hidden Markov model with a switching intercept parameterized as in Hamilton (1989). It is worth pointing out that these forecast comparisons are based on point forecasts. We shall argue below that the real value of forecasts from nonlinear models may well lie in forecast densities, whose shape may offer important information to policymakers.

We begin this chapter by defining a STAR model and briefly discussing its properties. We also mention a systematic approach to STAR modelling proposed in the literature. The idea is to first specify the model, estimate its parameters and, finally, evaluate the estimated model. After a short presentation of the modelling cycle we turn to forecasting. First, we show how to obtain multiperiod forecasts numerically from STAR models. Next, we highlight some of the issues emerging in STAR forecasting with a simulation experiment. This includes a discussion of ways of reporting forecasts and forecast densities that result from the numerical forecasting procedure. Finally, we illustrate the practice of forecasting with an example, in which two quarterly unemployment series modelled in Skalin and Teräsvirta (2000) are forecast up to nine quarters ahead with logistic STAR models.

2 STAR model

2.1 Definition

The smooth transition autoregressive model is defined as follows:

y_t = φ'w_t + (θ'w_t) G^L_k(γ, c; s_t) + ε_t    (1)

where {ε_t} is a sequence of independent normal (0, σ²) errors, φ = (φ_0, ..., φ_p)' and θ = (θ_0, ..., θ_p)' are (p + 1) × 1 parameter vectors, and w_t = (1, y_{t−1}, ..., y_{t−p})' is the vector consisting of an intercept and the first p lags of y_t. Furthermore, the transition function is

G^L_k(γ, c; s_t) = (1 + exp{−γ ∏_{i=1}^{k} (s_t − c_i)})^{−1}    (2)

where γ > 0 and c_1 ≤ ... ≤ c_k are identifying restrictions. This model is called the logistic STAR model of order k (LSTAR(k)). The transition variable s_t is either a weakly stationary stochastic variable or a deterministic function of time, t. A common case, considered here, is that

s_t = y_{t−d}, d > 0. In the application of Section 4.2, however, s_t = ∆_4 y_{t−d}. The slope or smoothness parameter γ controls the steepness of the transition function (2). For example, if k = 1 and γ → ∞, then (2) becomes a step function and the STAR model (1) a threshold autoregressive (TAR) model with two regimes. Note that if γ → 0 the model becomes linear, as (2) becomes a constant. The most common cases in practice are k = 1 and k = 2 in (2). In the former case, illustrated in Figure 1, the transition function increases monotonically from zero to unity with s_t. The LSTAR(1) model may thus be applied, for example, to modelling asymmetric business cycles, because the dynamics of the model are different in expansions than in recessions. In the latter case, depicted in Figure 2, the transition function is symmetric about its minimum at (c_1 + c_2)/2 and approaches unity as s_t → ±∞. The minimum lies between zero and 1/2: it equals 1/2 if c_1 = c_2 and approaches zero as γ → ∞ while c_1 < c_2. This is clearly seen from Figure 2. The dynamics of the LSTAR(2) model are therefore similar for both high and low values of the transition variable and different in the middle. Note that if k = 2 and c_1 = c_2, then (2) may be replaced by

G^E(γ, c; s_t) = 1 − exp{−γ(s_t − c)²}.    (3)

The STAR model (1) with (3) is called the exponential STAR (ESTAR) model and has the property that the minimum value of the transition function equals zero; see Figure 3. This is often convenient when interpreting the estimated model, which is the main reason for preferring the ESTAR model to the LSTAR(2) one. Note, however, that the exponential transition function degenerates to the constant 1, except at s_t = c, as γ → ∞, which is an argument in favour of using the logistic function with k = 2. Other choices of transition function are possible. The cumulative distribution function of the normal distribution (Chan and Tong (1986)) and the hyperbolic tangent function (Bacon and Watts (1971)) are among those proposed for the purpose. We just have to assume that the transition function is bounded, continuous, and at least twice differentiable with respect to its parameters everywhere in the sample space.
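As an illustration, the transition functions (2) and (3) are straightforward to evaluate numerically. The sketch below (function names and parameter values are our own, chosen for illustration) computes the logistic transition function for k = 1 and k = 2 and the exponential transition function:

```python
import numpy as np

def logistic_transition(s, gamma, c):
    """Logistic transition function G^L_k of equation (2).
    c is the sequence (c_1, ..., c_k); k is inferred from its length."""
    s = np.asarray(s, dtype=float)
    prod = np.ones_like(s)
    for ci in c:
        prod = prod * (s - ci)
    return 1.0 / (1.0 + np.exp(-gamma * prod))

def exponential_transition(s, gamma, c):
    """Exponential transition function G^E of equation (3)."""
    s = np.asarray(s, dtype=float)
    return 1.0 - np.exp(-gamma * (s - c) ** 2)

s = np.linspace(-5, 5, 1001)

# LSTAR(1): monotonic from 0 to 1, steeper as gamma grows.
g1 = logistic_transition(s, gamma=2.0, c=[0.0])

# LSTAR(2): symmetric about (c_1 + c_2)/2, approaches 1 as |s| grows,
# minimum strictly between 0 and 1/2 when c_1 < c_2.
g2 = logistic_transition(s, gamma=2.0, c=[-1.0, 1.0])

# ESTAR: minimum value exactly zero at s = c.
ge = exponential_transition(s, gamma=2.0, c=0.0)
```

Plotting g1, g2 and ge against s reproduces the qualitative shapes described above for Figures 1 to 3.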


Equation (1) may also be written as follows:

y_t = φ(s_t)'w_t + ε_t    (4)

where φ(s_t) = φ + G^L_k(γ, c; s_t)θ. From (4) it is seen that the STAR model may be interpreted as a linear AR model with time-varying parameters. We may call the STAR model 'locally linear' in the sense that for a fixed value of s_t, (4) is linear. When s_t = t, the coefficients of the AR model evolve deterministically over time. The STAR model may thus display locally explosive behaviour (the roots of the polynomial 1 − Σ_{j=1}^{p} [φ_j + G^L_k(γ, c; s_t)θ_j] z^j lie inside the unit circle for some s_t) but still be globally stable. This property may help in modelling economic time series with sudden movements and asymmetric behaviour.
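A quick way to see the 'locally linear' interpretation at work is to trace the local AR coefficient φ_1 + G^L_1(γ, c; s_t)θ_1 across values of the transition variable. The parameter values below are hypothetical, chosen so that the local AR(1) coefficient exceeds one in absolute value in one regime yet lies inside the stationarity region in the other:

```python
import numpy as np

# Hypothetical LSTAR(1) coefficients (illustrative, not taken from the chapter):
phi = (0.1, 1.05)     # (phi_0, phi_1): locally explosive when G is near 0
theta = (0.0, -0.25)  # (theta_0, theta_1)
gamma, c = 5.0, 0.0

def G(s):
    """Logistic transition function with k = 1."""
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

def local_ar_coefficient(s):
    """AR(1) coefficient phi_1 + G^L_1(gamma, c; s) * theta_1 for fixed s."""
    return phi[1] + G(s) * theta[1]

# Locally explosive for low s (coefficient above 1), stable for high s.
low = local_ar_coefficient(-3.0)
high = local_ar_coefficient(3.0)
```

Here the local coefficient is about 1.05 for strongly negative s and about 0.80 for strongly positive s, so each regime alone would be explosive or stable respectively, while the full model can still be globally stable.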

2.2 Modelling cycle

2.2.1 General

Modelling time series in an organized fashion becomes an important issue when the model is not completely determined by theory. This is often the situation in univariate modelling. Box and Jenkins (1970) devised a modelling cycle for the specification, estimation and evaluation of the family of linear ARMA or ARIMA models. Similar cycles have been proposed for nonlinear models: see Tsay (1989) and Tsay (1998) for the families of univariate and multivariate TAR models. There exists a modelling cycle for STAR models as well. The most recent description of this cycle can be found in Teräsvirta (1998); for earlier accounts see Teräsvirta (1994) and Granger and Teräsvirta (1993). The cycle consists of the specification, estimation and evaluation stages. The specification stage includes the choice of the type of LSTAR model (the order k) and the specification of the lag structure. This stage usually already requires estimation of STAR models. The parameter estimation can be carried out by nonlinear least squares, which is equivalent to conditional maximum likelihood when we assume normality as in Section 2.1. At the evaluation stage the model is subjected to a number of misspecification tests and other checks to ensure its adequacy and feasibility. We discuss these stages below.

2.2.2 Specification

It can be seen from equation (1) that a linear AR(p) model is nested in the logistic STAR model. As mentioned above, the STAR model is linear if γ = 0. The possibility that the series has in fact been generated by a linear model must not be overlooked. Since linear models are easier to apply and forecast from than most nonlinear ones, our first task is to test linearity against STAR. A possible null hypothesis is H_0: γ = 0 and the alternative H_1: γ > 0. This choice makes a statistical problem easily visible: model (1) is only identified under the alternative. Under H_0, both θ and c are nuisance parameters such that no information about their values can be obtained from the observed series. As a consequence, the parameters cannot be consistently estimated and the standard asymptotic distribution theory for the classical test statistics does not work. The first discussion of this problem appeared in Davies (1977). For a recent overview and solutions see Hansen (1996). To solve the problem we follow Luukkonen, Saikkonen, and Teräsvirta (1988) and expand the transition function into a Taylor series around the null point γ = 0. To illustrate, we choose G^L_1 and its third-order expansion. Substituting this expansion for the transition function in (1) yields, after merging terms and reparameterizing,

y_t = β_1'w_t + β_2'w̃_t y_{t−d} + β_3'w̃_t y²_{t−d} + β_4'w̃_t y³_{t−d} + ε*_t    (5)

where w̃_t = (y_{t−1}, ..., y_{t−p})', β_1 = (β_10, β_11, ..., β_1p)', β_j = (β_j1, ..., β_jp)', j = 2, 3, 4, and ε*_t = ε_t + R_3(γ, c; y_{t−d})θ'w_t, the term R_3(γ, c; y_{t−d}) being the remainder from the expansion. Every element β_ji = γβ̃_ji with β̃_ji ≠ 0, for i = 1, ..., p; j = 2, 3, 4. Note, furthermore, that under H_0, ε*_t = ε_t. Thus testing the original hypothesis can be done within (5) by applying the Lagrange Multiplier principle. The new null hypothesis is H_0: β_2 = β_3 = β_4 = 0. Assuming E(ε_t^8) < ∞, the standard LM statistic has an asymptotic χ² distribution with 3p degrees of freedom under the null hypothesis. The auxiliary regression based on the third-order Taylor expansion has power


against both k = 1 and k = 2. For a detailed derivation of the test, see Luukkonen, Saikkonen, and Teräsvirta (1988) or Teräsvirta (1994). It is advisable to use an F-version of the test if the sample size is small. In that situation, the F-test has better size properties, when the dimension of the null hypothesis H0 is large, than the asymptotic χ2 test. In the above, we have assumed that the delay parameter d is known. In practice, it has to be determined from the data. We may combine our linearity test and the determination of d. This is done as follows. First define a set D = {1, 2, ..., d0 } of possible values of the delay parameter. Carry out the test for each of them. Choose the value in D that minimizes the p-value of the test, if the value is sufficiently low for the null hypothesis to be rejected. If no individual test rejects, accept linearity. The motivation of this strategy is given in Teräsvirta (1994). In practice we have to first determine the maximum lag p in our test equation (5). This can be done by applying an appropriate model selection criterion such as AIC to the null model, which is a linear AR(p) model. As autocorrelation in the errors may cause size distortion (Teräsvirta (1994)), it is useful to check that the selected AR model has uncorrelated errors. The Ljung-Box statistic may be used for the purpose. It may be argued that when this test and selection procedure is applied, the significance level of the linearity test is not under the model builder’s control. This is true, but the linearity tests are used here as a model specification device rather than as a strict test. If linearity is erroneously rejected this mistake will most likely show up at the estimation or evaluation stages of the modelling cycle. 
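The testing procedure above can be sketched in code. The function below is our own illustration, not code from the chapter: it estimates the null AR(p) model and the auxiliary regression (5) by least squares and returns the F-version of the LM-type statistic for H_0: β_2 = β_3 = β_4 = 0:

```python
import numpy as np

def star_linearity_test(y, p, d):
    """F-version of the LM-type test of linearity against LSTAR, based on the
    third-order Taylor expansion (auxiliary regression (5)).
    Returns (F statistic, numerator df, denominator df)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    start = max(p, d)
    # Lag matrix (y_{t-1}, ..., y_{t-p}) and the transition variable y_{t-d}
    lags = np.column_stack([y[start - j : T - j] for j in range(1, p + 1)])
    yd = y[start - d : T - d]
    yy = y[start:]
    X0 = np.column_stack([np.ones(len(yy)), lags])          # null model: AR(p)
    X1 = np.column_stack([X0] + [lags * yd[:, None] ** k for k in (1, 2, 3)])
    ssr0 = np.sum((yy - X0 @ np.linalg.lstsq(X0, yy, rcond=None)[0]) ** 2)
    ssr1 = np.sum((yy - X1 @ np.linalg.lstsq(X1, yy, rcond=None)[0]) ** 2)
    df1, df2 = 3 * p, len(yy) - X1.shape[1]
    F = ((ssr0 - ssr1) / df1) / (ssr1 / df2)
    return F, df1, df2
```

In practice one would run this for every d in the set D and pick the delay with the smallest p-value, computed from the F(3p, df2) distribution.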
If the main interest lies not in modelling but in testing theory, that is, testing a linear specification against a STAR one, the auxiliary regression may be modified as in Luukkonen, Saikkonen, and Teräsvirta (1988) to correspond to the case where we only assume that d ∈ D when carrying out the test. If we restrict ourselves to cases k = 1 and k = 2, a choice between them may be made at the specification stage by testing a sequence of hypotheses within the auxiliary regression (5). This is described in Teräsvirta (1994) or Teräsvirta (1998). Another possibility is to estimate both an

LSTAR(1) and an LSTAR(2) model and postpone the final choice to the evaluation stage. As estimating a STAR model is no longer a time-consuming affair, this may be a feasible alternative. Testing linearity and determining the delay d are carried out using the auxiliary regression (5). The specification also includes determining the lag structure in (1). This is best done by selecting a maximum lag (the lag selected for the linear AR model for testing linearity is often a reasonable choice, at least if it exceeds one), estimating the LSTAR model with this maximum lag, then eliminating insignificant lags from the linear and nonlinear part of the equation (1). Here it should be noted that, in addition to the standard restrictions φi = 0 and θj = 0, the exclusion restriction φj = − θj is a feasible alternative. It makes the combined parameter φj + θj G approach zero as G → 1. The restriction φj = 0 has the same effect as G → 0. Some experimenting may thus be necessary to find the appropriate restrictions. In some cases, these become evident from the estimates of the unrestricted model.

2.2.3 Estimation

The estimation of the parameters of the LSTAR(k) model (1) is carried out by conditional maximum likelihood. We assume that the model satisfies the assumptions and regularity conditions MLE.1-MLE.7 in Wooldridge (1994), without actually verifying them here. The conditions guarantee the consistency and asymptotic normality of the parameter estimators. They imply, among other things, that {y_t} is weakly stationary and geometrically ergodic. The estimation can be carried out by using a suitable iterative estimation algorithm; for a useful review, see Hendry (1995), Appendix A5. Many of the algorithms normally lead to the same estimates but may yield rather different estimates of the Hessian matrix, which is required for the inference. Thus the Newton-Raphson algorithm is in principle an excellent choice. However, in addition to analytical second derivatives, it requires starting-values that are already rather close to the optimal ones. The BFGS algorithm based on numerical derivatives has also turned out to be a reasonable choice in practice.


The starting-values are an important issue. First, the slope parameter γ is not scale-invariant, which makes it difficult to find a good starting-value for it. Replacing the transition function (2) by

G^L_k(γ, c; s_t) = (1 + exp{−γ ∏_{i=1}^{k} (s_t − c_i)/σ̂(y)^k})^{−1}

where σ̂(y) is a sample standard deviation of s_t, makes γ approximately scale-free. Note that if γ and c are fixed in (1), then the STAR model is linear in parameters. This suggests constructing a grid for γ and the elements of c. The remaining parameters are estimated conditionally on every parameter combination in the grid by linear least squares. The parameter estimates of the combination with the lowest sum of squared residuals constitute the desired starting-values. We shall not discuss potential numerical problems in the estimation. Difficulties may sometimes be expected if γ is very large. This is due to the fact that the transition function is close to a step function (k = 1) or 'a step and a reverse' (k = 2). For more information, see, for example, Bates and Watts (1988), p. 87, Teräsvirta (1994) or Teräsvirta (1998).
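The grid-search strategy for starting-values can be sketched as follows for an LSTAR(1) model. The helper below is an illustrative implementation with our own function name: for each fixed (γ, c) the model is linear in (φ, θ), so the remaining parameters are obtained by ordinary least squares, and the combination with the smallest sum of squared residuals wins:

```python
import numpy as np

def grid_search_start(y, p, d, gamma_grid, c_grid):
    """Grid search for LSTAR(1) starting-values. Returns the tuple
    (ssr, gamma, c, linear parameters (phi', theta')') for the best grid point."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    start = max(p, d)
    # Regressor matrix w_t = (1, y_{t-1}, ..., y_{t-p})' and transition variable
    W = np.column_stack([np.ones(T - start)] +
                        [y[start - j : T - j] for j in range(1, p + 1)])
    s = y[start - d : T - d]
    target = y[start:]
    sigma = s.std()                      # rescale gamma as in the modified (2)
    best = None
    for gamma in gamma_grid:
        for c in c_grid:
            G = 1.0 / (1.0 + np.exp(-gamma * (s - c) / sigma))
            X = np.column_stack([W, W * G[:, None]])   # (w_t', w_t' G)'
            beta, *_ = np.linalg.lstsq(X, target, rcond=None)
            ssr = np.sum((target - X @ beta) ** 2)
            if best is None or ssr < best[0]:
                best = (ssr, gamma, c, beta)
    return best
```

Since the linear AR model is nested in every conditional regression (θ is unrestricted), the best grid point never fits worse than the linear least-squares fit.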

2.2.4 Evaluation by misspecification tests

After estimating the parameters of a STAR model it is necessary to evaluate the model. Misspecification tests play an important rôle in doing that. The validity of the assumptions underlying the estimation has to be tested carefully. Homoskedasticity, absence of serial correlation in the errors, and parameter stability are among them. When we are dealing with a family of nonlinear models, it is also of interest to know whether or not the estimated model adequately characterizes the nonlinearity present in the series. The null hypothesis of no remaining nonlinearity should thus also be tested. Testing homoskedasticity of the errors against conditional heteroskedasticity is carried out exactly as outlined in Engle (1982); the STAR structure of the conditional mean does not affect the test. The assumption E(ε_t^4) < ∞ is required for the asymptotic theory to work. The other tests may be characterized by including an additive component in (1) such that

y_t = φ'w_t + (θ'w_t) G^L_k(γ, c; s_t) + A(w_t, w_{t−1}, ..., w_{t−p}; ψ) + ε_t    (6)

where ψ is a parameter vector and ε_t ∼ nid(0, σ²). Under the null hypothesis, A ≡ 0. If we test the null of no error autocorrelation, A(w_t, w_{t−1}, ..., w_{t−p}; ψ) = α'v_t, where α = (α_1, ..., α_m)' and v_t = (u_{t−1}, ..., u_{t−m})' with u_t = y_t − φ'w_t − (θ'w_t)G^L_k(γ, c; s_t). The null hypothesis is α = 0. As to parameter constancy, it is natural to test the hypothesis in the smooth transition spirit by allowing the parameters to change smoothly over time under the alternative. Generalize (1) to

y_t = φ(t)'w_t + θ(t)'w_t G^L_k(γ, c; s_t) + ε_t

where the time-varying coefficient vectors have the form φ(t) = φ_0 + φ_1 H^L(γ_1, c_1; t) and θ(t) = θ_0 + θ_1 H^L(γ_1, c_1; t) with

H^L(γ_1, c_1; t) = (1 + exp{−γ_1 ∏_{i=1}^{k} (t − c_{1i})})^{−1},

γ_1 > 0 and c_{11} ≤ ... ≤ c_{1k}. This yields

A(w_t, w_{t−1}, ..., w_{t−p}; ψ) = φ_1'w_t H^L(γ_1, c_1; t) + θ_1'w_t G^L_k(γ, c; s_t) H^L(γ_1, c_1; t).

The null hypothesis is γ_1 = 0 and the alternative γ_1 > 0. When k = 1 in the transition function H^L and γ_1 → ∞, the model has a single structural break. This alternative, very popular among econometricians, thus appears as a special case in this more general setting. The test is based on an approximation of H^L(γ_1, c_1; t) by a Taylor-series expansion to circumvent the identification problem present even here. The null of no additive nonlinearity may be tested by assuming A(w_t, w_{t−1}, ..., w_{t−p}; ψ) = π'w_t G_n(γ_2, c_2; r_t) in (6), where π = (π_0, π_1, ..., π_p)' and r_t is another transition variable. The null hypothesis γ_2 = 0 is tested against γ_2 > 0. Even here, the ensuing identification problem is solved by replacing the transition function by its Taylor-series expansion. Derivation of the tests and

the asymptotic distribution theory are discussed in detail in Eitrheim and Teräsvirta (1996) and Teräsvirta (1998). If at least one misspecification test rejects, the model builder should reconsider the specification. One possibility is to carry out a new specification search. Another one is to conclude that the family of STAR models does not provide a good characterization of the series. Yet another one is to estimate the alternative model and retain that, at least if it passes the new misspecification tests. There is at least one case where that may be a sensible alternative. If the series contains seasonality, which is often the case in the analysis of economic time series, then it is very common that the seasonal pattern changes slowly over time. The reasons include slowly changing institutions and, for instance in production series, technological change. Augmenting an LSTAR(k) model by deterministically changing coefficients instead of trying to respecify it may then be a good alternative. The question then is what to do when forecasting with the model. Should the value of H^L be frozen to where it is at the end of the observation period, or should it be predicted as well? The latter would mean extrapolating its values into the future and using the extrapolated values in forecasting. In some other cases the answer is quite clear. If a model contains a linear trend, then the trend is always extrapolated and the extrapolated values used in forecasting. The same is obviously true when the trend is nonlinear and parameterized as a STAR component as in Leybourne, Newbold, and Vougas (1998). It may be less clear if we consider extrapolating changes in seasonality. Rahiala and Teräsvirta (1993) were in this situation, but in their case the transition function had already practically reached its final value unity at the end of the observed series.

2.2.5 Evaluation by extrapolation

It is difficult to draw conclusions on the stationarity of an estimated STAR model from its estimated coefficients. Nevertheless, from the forecasting point of view it would be important to know whether or not the estimated model is stable. An explosive model may only be applied in very short-term forecasting: when the forecasting horizon increases, the model rapidly becomes useless. A necessary condition for stability is discussed in Granger and Teräsvirta (1993). Consider model (1) without noise. Extrapolate this 'skeleton' (Tong (1990)) from different starting-values. The necessary condition for stability is that the sequence of extrapolated values always converges to a single point, called a stable stationary point. Note, however, that this point need not be unique. If an estimated STAR model satisfies this condition, one may repeat the same exercise with noise added. In fact, this has a few things in common with actual forecasting with STAR models, to which we now turn.
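The skeleton-extrapolation check can be sketched as follows for an LSTAR(1) model. With the parameters of the DGP (13) used in Section 4.1, the noise-free trajectories settle at two distinct stationary points (roughly ±1.9 by our computation), illustrating that the stable stationary point need not be unique:

```python
import numpy as np

def skeleton_limits(phi, theta, gamma, c, starts, n_iter=2000, tol=1e-8):
    """Extrapolate the noise-free LSTAR(1) skeleton from several starting-values.
    Returns the limit point of each trajectory (NaN if it diverges)."""
    limits = []
    for y in starts:
        for _ in range(n_iter):
            G = 1.0 / (1.0 + np.exp(-gamma * (y - c)))
            y_new = phi[0] + phi[1] * y + (theta[0] + theta[1] * y) * G
            if not np.isfinite(y_new) or abs(y_new) > 1e12:
                y = np.nan          # trajectory explodes: model unstable
                break
            if abs(y_new - y) < tol:
                y = y_new           # converged to a stationary point
                break
            y = y_new
        limits.append(y)
    return np.array(limits)

# Parameters of the DGP (13): phi_0 = -0.19, phi_1 = 0.9,
# theta_0 = 0.38, theta_1 = 0, gamma = 10, c = 0.
lims = skeleton_limits((-0.19, 0.9), (0.38, 0.0), 10.0, 0.0,
                       starts=[-5.0, -1.0, 1.0, 5.0])
```

Negative starting-values converge to the lower stationary point and positive ones to the upper point, so the necessary condition for stability is satisfied even though the limit is not unique.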

3 Forecasting with STAR models

3.1 General procedure

STAR models are nonlinear, which makes multiperiod forecasting from them more complicated than from linear models. In forecasting a single period ahead there is no difference between STAR and linear AR models. We shall first discuss the general procedure for obtaining multiperiod forecasts. Our exposition closely follows that in Granger and Teräsvirta (1993). Consider the nonlinear univariate model

y_t = g(w_t; ψ) + ε_t    (7)

where ψ is the parameter vector and {ε_t} is a sequence of independent, identically distributed errors. In the logistic STAR case,

g(w_t; ψ) = φ'w_t + (θ'w_t) G^L_k(γ, c; s_t).    (8)

From (7) it follows that E(y_{t+1}|I_t) = g(w_{t+1}; ψ), which is the unbiased forecast of y_{t+1} made at time t given the past information I_t up until that time. In this case, the relevant information is contained in w_{t+1} = (1, y_t, y_{t−1}, ..., y_{t−(p−1)})'. We denote this forecast y^f_{t+1|t}. Forecasting two


periods ahead is not as easy, because obtaining E(y_{t+2}|I_t) is more complicated. We have

y^f_{t+2|t} = E(y_{t+2}|I_t) = E{[g(w^f_{t+2}; ψ) + ε_{t+2}]|I_t} = E{g(w^f_{t+2}; ψ)|I_t}    (9)

where w^f_{t+2} = (1, y^f_{t+1|t} + ε_{t+1}, y_t, ..., y_{t−(p−2)})'. The exact expression for (9) is

y^f_{t+2|t} = E{g(w^f_{t+2}; ψ)|I_t} = ∫_{−∞}^{∞} g(w^f_{t+2}; ψ) dΦ(z)    (10)

where Φ(z) is the cumulative distribution function of ε_{t+1}. Obtaining the forecast would require numerical integration, and multiple integration would be encountered for longer time horizons. As an example, in an LSTAR(1) model with the maximum lag p = 1 and delay d = 1, (10) has the form

y^f_{t+2|t} = ∫_{−∞}^{∞} [φ_0 + φ_1(y^f_{t+1|t} + z) + (θ_0 + θ_1(y^f_{t+1|t} + z))(1 + exp{−γ(y^f_{t+1|t} + z − c_1)})^{−1}] dΦ(z)

           = φ_0 + φ_1 y^f_{t+1|t} + ∫_{−∞}^{∞} (θ_0 + θ_1(y^f_{t+1|t} + z))(1 + exp{−γ(y^f_{t+1|t} + z − c_1)})^{−1} dΦ(z)    (11)

where y^f_{t+1|t} = φ_0 + φ_1 y_t + (θ_0 + θ_1 y_t)(1 + exp{−γ(y_t − c_1)})^{−1}. Numerical integration of (11) is not very complicated, but, as noticed above, the dimension of the integral grows with the forecast horizon. Thus it would be computationally more feasible to obtain the forecasts recursively without numerical integration. A simple way would be to ignore the error term ε_{t+1} and just use the skeleton. Granger and Teräsvirta (1993) called this the naïve method:

(i) Naïve: y^{fn}_{t+2|t} = g(w^{fn}_{t+2}; ψ) where w^{fn}_{t+2} = (1, y^f_{t+1|t}, y_t, ..., y_{t−(p−2)})'.

This is equivalent to extrapolating with the skeleton. In the above LSTAR(1) case,

y^{fn}_{t+2|t} = φ_0 + φ_1 y^f_{t+1|t} + (θ_0 + θ_1 y^f_{t+1|t})(1 + exp{−γ(y^f_{t+1|t} − c_1)})^{−1}.

Extrapolating this way is simple, but in view of equation (10) it leads to biased forecasts. Another way is to simulate; this is called Monte Carlo in Granger and Teräsvirta (1993):

(ii) Monte Carlo: y^{fm}_{t+2|t} = (1/M) Σ_{m=1}^{M} g(w^f_{t+2,m}; ψ), where each of the M values of ε_{t+1} in w^f_{t+2,m} is drawn independently from the error distribution of (1). By the weak law of large numbers, the forecast is asymptotically unbiased as M → ∞. In the LSTAR(1) case,

y^{fm}_{t+2|t} ≈ φ_0 + φ_1 y^f_{t+1|t} + (1/M) Σ_{m=1}^{M} (θ_0 + θ_1(y^f_{t+1|t} + ε^{(m)}_{t+1}))(1 + exp{−γ(y^f_{t+1|t} + ε^{(m)}_{t+1} − c_1)})^{−1}.    (12)

Finally, if we do not want to rely on the error distribution we have assumed for parameter estimation, we may apply resampling to obtain the forecast. This is the bootstrap method:

(iii) Bootstrap: y^{fb}_{t+2|t} = (1/B) Σ_{b=1}^{B} g(w^f_{t+2,b}; ψ), where each of the B values of ε_{t+1} in w^f_{t+2,b} is drawn independently, with replacement, from the set of residuals of the estimated model.

An advantage of these numerical approximations to the true expectations is that they automatically give a number (M or B) of point forecasts for each period to be predicted. In fact, what is available is a forecast density, and interval forecasts may be constructed on the basis of it. In so doing, one has to remember that the interval forecasts obtained this way do not account for sampling uncertainty, and the intervals are therefore somewhat too narrow. Sampling uncertainty may be accounted for by numerical techniques. Suppose ψ̂ is a consistent and (after rescaling) asymptotically normal estimator of the parameter vector ψ in (7), with a positive definite covariance matrix. The forecasts obtained by any of the above techniques are based on the estimate ψ̂. We can draw a random sample of size R, say, from the large-sample distribution of the estimator ψ̂ and obtain R sets of new estimates. Plugging those into (7) gives R new models. Repeating procedure (ii) or (iii) for each of these leads to RM or RB forecasts. The interval forecasts based on these predictions now accommodate the sampling uncertainty. This approach is quite computer intensive. In the case of STAR models it also requires plenty of human resources: we cannot always exclude the possibility that some of the R models are unstable, and instability in turn would mess up the forecasts. Every model thus has to be checked for stability, which requires human control and an even greater amount of computational resources. In the application of Section 4.2, we choose to ignore the sampling uncertainty.
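Methods (ii) and (iii) can be sketched as follows for an LSTAR(1) model with p = d = 1. The simulation propagates M paths recursively, so forecasts for all horizons up to h are obtained in one pass; function names and default values are our own:

```python
import numpy as np

def lstar1_mean(y_prev, phi, theta, gamma, c):
    """Skeleton g(w_t; psi) of an LSTAR(1) model with p = d = 1."""
    G = 1.0 / (1.0 + np.exp(-gamma * (y_prev - c)))
    return phi[0] + phi[1] * y_prev + (theta[0] + theta[1] * y_prev) * G

def simulate_forecasts(y_last, h, phi, theta, gamma, c,
                       sigma=None, residuals=None, M=10000, rng=None):
    """Multiperiod forecasts by simulation. If `residuals` is given, errors are
    resampled from it with replacement (bootstrap, method (iii)); otherwise they
    are drawn from N(0, sigma^2) (Monte Carlo, method (ii)). Returns an (M, h)
    array whose column j holds the simulated values of y_{t+j+1}."""
    rng = rng or np.random.default_rng()
    paths = np.empty((M, h))
    y = np.full(M, float(y_last))
    for j in range(h):
        if residuals is not None:
            eps = rng.choice(residuals, size=M, replace=True)
        else:
            eps = sigma * rng.standard_normal(M)
        y = lstar1_mean(y, phi, theta, gamma, c) + eps
        paths[:, j] = y
    return paths
```

Column means give the point forecasts, and each column is a sample from the corresponding forecast density, from which interval forecasts may be constructed.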


3.2 Estimating and representing forecast densities

When we generate multiperiod forecasts with the Monte Carlo and bootstrap procedures, we have a whole set of forecasts for each time period. A density forecast for each of these periods may be obtained as follows. First arrange the individual forecasts for a given time period in ascending order. Represent this empirical distribution as a histogram and smooth it to obtain an estimate of the density of the forecast. Various smoothing techniques exist, the most popular of them being kernel estimation and spline smoothing. These are discussed, for example, in Silverman (1986) or Härdle (1990), Chapter 3. In practical work it is helpful to represent these densities by graphs. STAR model density forecasts may well be multimodal. If we want to stress this property, we may graph the densities for each time period; for a recent application, see Matzner-Løber, Gannoun, and De Gooijer (1998). Another idea is to compute and graph so-called highest density regions. This is a compact way of representing these densities for a number of forecast periods simultaneously. Hyndman (1996) shows how to do that. Hyndman (1995) discusses the use of highest density regions in representing forecasts and provides an example of a bimodal density forecast. The 100α% highest density region is estimated as follows. Let x be a continuous random variable with probability density f(x). The highest density region HDR_α(x) = {x : f(x) ≥ f_α}, where f_α > 0 is such that the probability of a given x having a density that at least equals f_α is α. Any point x belonging to a highest density region has a higher density than any point outside this region. If the density is multimodal, then a given highest density region may consist of disjoint subsets of points, depending on α. Wallis (1999), see also Tay and Wallis (2000), considers another way of graphing density forecasts called the fan chart. The graph opens up like a fan as the forecast horizon increases.
The idea is to compute 100α% prediction intervals PI_α = [a, b], with Pr[a < x < b] = α, from the density forecast and represent, say, the 10%, 20%, ..., 90% intervals for the forecasts with different shades


of colour, the colour getting lighter as the interval widens. As such prediction intervals are not unique, choosing one depends on the loss function of the user. One possibility is to choose a and b such that f(a) = f(b) = f_α. Wallis (1999) shows that this choice may be derived from an all-or-nothing loss function: no loss if the observed value lands in the interval and a positive constant loss if it lies outside. The requirement that the prediction interval have the shortest possible length also leads to f(a) = f(b). Note that we may want to allow the interval to consist of a number of disjoint subintervals while imposing the shortest-possible-length requirement. In that case the resulting prediction interval equals the highest density region discussed above. We may call the corresponding unbroken interval the equal density interval. Another way of defining the prediction interval is to let the tail probabilities be equal: PI_α = [a, b] with Pr[x < a] = Pr[x > b] = (1 − α)/2, a < b. A loss function yielding this choice is linear: there is no loss if the observed value lies in the interval, and the loss is a linear function of the distance between the nearest endpoint of the interval and the realization if the latter lies outside the interval. When α → 0, this prediction interval shrinks to the median of the density forecast. This is the prediction interval we shall use in the application of Section 4.2. The equal density interval shrinks to the mode of the density. Examples of these intervals and fan graphs will be given in the next section.
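The equal-tail prediction interval and an approximate highest density region are easy to compute from the simulated forecasts. The sketch below is our own illustration: the kernel-density grid approximation and the bandwidth rule are our choices, not the chapter's:

```python
import numpy as np

def equal_tail_interval(draws, alpha):
    """100*alpha% equal-tail interval: Pr[x < a] = Pr[x > b] = (1 - alpha)/2."""
    lo = (1.0 - alpha) / 2.0
    return np.quantile(draws, lo), np.quantile(draws, 1.0 - lo)

def highest_density_region(draws, alpha, grid_size=512, bandwidth=None):
    """Approximate 100*alpha% HDR from simulated forecasts via a Gaussian
    kernel density estimate on a grid; returns the grid points inside the HDR
    (possibly several disjoint clusters when the density is multimodal)."""
    draws = np.asarray(draws, dtype=float)
    if bandwidth is None:  # Silverman-style rule of thumb
        bandwidth = 1.06 * draws.std() * len(draws) ** (-1 / 5)
    grid = np.linspace(draws.min() - 3 * bandwidth,
                       draws.max() + 3 * bandwidth, grid_size)
    dens = np.exp(-0.5 * ((grid[:, None] - draws[None, :]) / bandwidth) ** 2
                  ).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    # f_alpha: density threshold such that the region {f >= f_alpha} has mass alpha
    order = np.argsort(dens)[::-1]
    mass = np.cumsum(dens[order]) * (grid[1] - grid[0])
    cutoff = dens[order][np.searchsorted(mass, alpha)]
    return grid[dens >= cutoff]
```

For a bimodal forecast density the HDR drops the low-density middle, whereas the equal-tail interval always remains a single interval.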

4 Examples

4.1 Simulated example

In this section, we discuss the forecasting power of an LSTAR model through a simulated example. The results from a single example cannot, of course, be generalized to the whole family of STAR models, but the main purpose of the simulation study is to highlight some issues in forecasting with STAR models. A recurring argument is that in practice a nonlinear model hardly forecasts better than


a linear one. This seems to be the case even if the nonlinear model fits better in-sample than the linear one built on the same information set. It is thus natural to compare forecasts from a STAR model to those from a linear one. The LSTAR model we use to generate the data is defined as follows:

yt = −0.19 + 0.38 [1 + exp(−10yt−1)]−1 + 0.9yt−1 + 0.4ut

(13)

where {ut} is a sequence of independent normally distributed random variables with zero mean and unit variance. A total of 1,000,000 observations is generated with this data-generating process (DGP). In this special case, where only the intercept is time-varying, the DGP may also be interpreted as a special case of a single hidden-layer feedforward artificial neural network (ANN) model; see, for example, Granger and Teräsvirta (1993), p. 105, for a definition of such a model.

The forecasting experiment is set up in the following way. From our time series of one million observations, we randomly choose a subset of 6282 observations. Taking each of these observations as the last observation in a subseries of 1000 observations, we estimate the parameters of the LSTAR model by nonlinear least squares. We also estimate the parameters of two linear models. The first is a linear first-order autoregressive (AR) model for yt. The second is a linear first-order autoregressive model for ∆yt: we call this an autoregressive integrated (ARI) model. The rationale for the ARI model is that if we take those 1000 observations and test the unit-root hypothesis, it is most often not rejected at the 5% significance level. This is not a general feature of data generated by an LSTAR model, but it happens to be the case for model (13). In fact, this model shares some dynamic properties with the simple nonlinear time-series model in Granger and Teräsvirta (1999).

Having done this, we generate forecasts from each estimated nonlinear model by simulation, as discussed in the previous section. We also do this under the assumption that we know the parameters of the LSTAR model. This gives us an idea of how much the fact that the parameters have to be estimated affects forecast accuracy. Furthermore, we generate the corresponding forecasts


from the two linear autoregressive models.

To compare the accuracy of the forecasts we use the Relative Efficiency (RE) measure proposed by Mincer and Zarnowitz (1969). It is defined as the ratio of the Mean Square Forecast Error (MSFE) of the model under consideration to that of its benchmark. We use the linear autoregressive models as the benchmark. A value of RE greater than or equal to unity indicates that the benchmark model provides forecasts at least as accurate as those of the nonlinear LSTAR model.

Figure 4 contains the RE based on 6282 forecast simulations with the linear AR model as the benchmark. It is seen that when the linear AR model is used as the benchmark, the relative forecast accuracy of the LSTAR model improves steadily up to ten periods ahead, after which it stabilizes. With 1000 observations, having to estimate the parameters does not make things much worse compared to the hypothetical case where they are known. The situation is unchanged when the ARI model is used as the benchmark; the corresponding RE graph is not reproduced here. Furthermore, if the median or the mode of the estimated forecast density is used as the nonlinear point forecast, the situation does not change very much.

It is often argued that the nonlinear model forecasts better than the linear one only when one forecasts ‘nonlinear observations’, that is, a sequence of observations whose behaviour cannot be explained in a satisfactory fashion without a nonlinear model. This possibility was mentioned in the Introduction. Recently, Montgomery, Zarnowitz, Tsay, and Tiao (1998) found that the nonlinear threshold autoregressive model and the Markov-switching autoregressive model (Lindgren (1978)) outperformed the AR model during periods of rapidly increasing unemployment but not elsewhere. In our case, we define our ‘nonlinear observations’ to be those which correspond to large changes in the value of the transition function.
A large change is defined to occur when the change in the transition function of the LSTAR model exceeds 0.2 in absolute value. There are exactly 6282 such observations out of one million in our simulated time series, so a large change is indeed a rare event. We choose every such observation to be the last observation in one

of our subseries, which leads to 6282 subseries. As before, we estimate our models and generate our forecasts using those subseries. Figure 5 contains the RE up to 20 periods ahead when the AR model is the benchmark. It is seen that the advantage of knowing the parameters is now clearly greater than in the previous case with randomly chosen series. It seems that estimating the parameters of the LSTAR model sufficiently accurately may not be easy if there is not much information about the nonlinearity available in the sample (large changes in the transition function were a rare event). Moreover, the RE values stabilize after only six periods or so.

From Figure 6 we see that the situation changes dramatically when the ARI model is the benchmark. When the parameters of the LSTAR model are estimated (the realistic case), the model loses its predictive edge after thirteen periods, and the overall gain from using the LSTAR model for forecasting is much smaller than in the previous simulation. This may appear surprising at first, but Clements and Hendry (1999), Chapter 5, offer a plausible explanation. They argue that first differences are a useful device in forecasting in the case of structural breaks in levels, because the model adapts quickly to a new situation even if it is misspecified. Here we do not have structural breaks, but rapid shifts in levels are typical of our simulated STAR model. These may resemble structural breaks, for example changes in the intercept of a model defined in levels. The flexibility inherent in first-difference models is an asset when a large shift in the series occurs at the beginning of the forecasting period.

When forecasting with nonlinear models one may also consider using either the median or the mode of the forecast density as the point forecast. Figure 7 shows the evolution of the RE in the ARI case when the mode of the forecast density is used for this purpose.
It appears that the nonlinear forecasts go off track very quickly. It should be noted, however, that the mode forecast is not optimal when the loss function is quadratic. The RE is based on the quadratic loss function and may therefore be used for comparing mean forecasts; when forecasting with nonlinear models, this is an important distinction. It may also be pointed out that the RE, when the median is used as the point forecast from the LSTAR model, lies between those based on the

mean and the mode forecasts, respectively.

The lessons from this experiment may be summed up as follows. First, there exist STAR models with which one forecasts consistently better than with linear models. Second, a part of the theoretical predictive edge of the STAR model may vanish when the parameters are estimated, and this may happen even at sample sizes that would be considered unrealistically large in macroeconometric applications. Third, when the stationary STAR model displays ‘near unit root behaviour’, a misspecified linear model based on first differences may become a close competitor to the nonlinear model. Moreover, in our experiment this happens when the observations to be predicted show ‘nonlinear behaviour’, which is in some sense atypical. Finally, when comparing point forecasts other than mean forecasts from nonlinear models with forecasts from linear models, the choice of the loss function becomes very important.
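The design of the experiment can be sketched in code. The version below is illustrative only: it simulates the DGP (13), produces Monte Carlo mean forecasts from the known-parameter LSTAR recursion, and computes the RE against an AR(1) benchmark, but it uses independent short replications instead of 6282 subseries drawn from one realization of 1,000,000 observations, and it skips the nonlinear least-squares estimation step.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_lstar(n, burn=200):
    """Simulate the LSTAR DGP (13):
    y_t = -0.19 + 0.38 [1 + exp(-10 y_{t-1})]^{-1} + 0.9 y_{t-1} + 0.4 u_t."""
    y = np.zeros(n + burn)
    u = rng.standard_normal(n + burn)
    for t in range(1, n + burn):
        y[t] = (-0.19 + 0.38 / (1.0 + np.exp(-10.0 * y[t - 1]))
                + 0.9 * y[t - 1] + 0.4 * u[t])
    return y[burn:]                                  # drop the burn-in

def mc_mean_forecast(y_last, horizon, n_paths=2000):
    """Monte Carlo multiperiod forecast: simulate n_paths sample paths from
    the (known-parameter) LSTAR recursion and average them per horizon."""
    paths = np.full(n_paths, float(y_last))
    means = np.empty(horizon)
    for h in range(horizon):
        u = rng.standard_normal(n_paths)
        paths = (-0.19 + 0.38 / (1.0 + np.exp(-10.0 * paths))
                 + 0.9 * paths + 0.4 * u)
        means[h] = paths.mean()
    return means

def ar1_forecast(y, horizon):
    """Fit an AR(1) with intercept by OLS and iterate it forward."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    b = np.linalg.lstsq(X, y[1:], rcond=None)[0]     # [intercept, slope]
    f, out = y[-1], np.empty(horizon)
    for h in range(horizon):
        f = b[0] + b[1] * f
        out[h] = f
    return out

H, n_rep = 5, 200
err_lstar = np.empty((n_rep, H))
err_ar = np.empty((n_rep, H))
for r in range(n_rep):
    y = simulate_lstar(1000 + H)
    history, future = y[:1000], y[1000:]
    err_lstar[r] = mc_mean_forecast(history[-1], H) - future
    err_ar[r] = ar1_forecast(history, H) - future

# Relative Efficiency: MSFE(LSTAR) / MSFE(benchmark); RE < 1 favours the LSTAR model
re = (err_lstar ** 2).mean(axis=0) / (err_ar ** 2).mean(axis=0)
```

With the parameters treated as known, RE should sit at or below one at short horizons, mirroring Figure 4; estimating the parameters by nonlinear least squares, as in the chapter, would erode part of this edge.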

4.2 Empirical example

In this section we consider an application to Danish and Australian quarterly unemployment series. Many unemployment rates appear asymmetric in the sense that the rate may go up rather sharply and then decrease more slowly. Skalin and Teräsvirta (2000) have modelled such series with LSTAR models that are able to characterize this asymmetric behaviour. A typical LSTAR model of this kind may be defined as follows:

∆yt = −αyt−1 + φ′wt + ψ1′dt + (θ′wt + ψ2′dt)GL1(γ, c; ∆4yt−d) + εt

(14)

where wt = (1, ∆yt−1, ..., ∆yt−p)′, dt = (d1t, d2t, d3t)′ is a vector of seasonal dummy variables, α > 0, and the transition function GL1 is defined as in (2). Note that model (14) reduces to a model in first differences if α = 0. The transition variable is a four-quarter difference because possible intra-year nonlinearities are not our concern. The estimated equations and their interpretation can be found in Skalin and Teräsvirta (2000). These series show unit-root behaviour: testing the unit-root hypothesis with the standard augmented Dickey-Fuller test does not lead to rejecting the null hypothesis at


conventional significance levels. The estimation period of Skalin and Teräsvirta for these two models ends in 1995. We forecast the two series nine quarters ahead starting from 1996(1) and compare the forecasts with the corresponding nine new observations that have meanwhile become available. We also forecast the same period with an AR model for ∆yt augmented with seasonal dummies.

The results are given in Tables 1 and 2. For Australia, the LSTAR model forecasts the slow downward movement better than the AR model. In the Danish case, both models forecast reasonably well, the nonlinear model somewhat better than the linear one. (As the forecast horizon is not the same for all forecasts, the standard Diebold-Mariano test (Diebold and Mariano (1995)) for comparing the accuracy of forecasts is not available here.) The 50% equal-tails interval forecasts from the STAR models contain the true values for all periods in both cases. The mode and median point forecasts differ from the means because the forecast densities are not symmetric.

The asymmetry is clearly visible in the fan charts and highest density regions for the forecasts in Figures 8-11. For the Australian unemployment rate, the graph of the highest density regions in Figure 9 also shows that the densities are bimodal for forecasts four and five quarters ahead. As to the forecasts of Danish unemployment, the fan chart in Figure 10 indicates much greater predictive uncertainty than is the case for the Australian forecasts. While the point forecasts from an LSTAR model evaluated in MSFE terms may not be more accurate than those from a linear model, the advantage of forecasts from nonlinear models may lie in this asymmetry. Asymmetric forecast densities may convey important information to decision makers.
In the present case the message is that while the unemployment rate may well decrease in the future, an even faster decrease than foreseen (by the point forecasts) is less likely than a slowdown or even another upturn. Such information is not available to those relying on forecasts from linear models, whose forecast densities are symmetric. If we accept the idea that forecast densities contain useful information, then, instead of comparing point forecasts, the focus could be on comparing forecast densities. Clements and

Smith (2000) recently compared linear and nonlinear forecasts of the seasonally adjusted US GNP and unemployment series. The point forecasts from the nonlinear models considered were not better than those from linear ones. On the other hand, the nonlinear density forecasts were more accurate than those obtained from linear models. These comparisons did not include STAR models, but the results are suggestive nonetheless. Density forecasts are surveyed in Tay and Wallis (2000).
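A minimal sketch of how density forecasts of this kind can be produced: the code below bootstraps future shocks from the in-sample residuals and reads off the percentiles of the kind reported in Tables 1 and 2. The recursion and the residuals are illustrative stand-ins (first-order, with parameter values borrowed from (13)), not the Skalin-Teräsvirta estimates.

```python
import numpy as np

def lstar_step(y_prev, u):
    """One step of a first-order LSTAR recursion (illustrative parameter
    values borrowed from (13), not the estimated unemployment models)."""
    g = 1.0 / (1.0 + np.exp(-10.0 * y_prev))       # logistic transition function
    return -0.19 + 0.38 * g + 0.9 * y_prev + u

def bootstrap_density_forecast(y_last, resid, horizon, n_paths=5000, seed=0):
    """Simulate n_paths future sample paths, drawing the shocks with
    replacement from the in-sample residuals (the bootstrap alternative
    to drawing them from a fitted normal distribution)."""
    rng = np.random.default_rng(seed)
    paths = np.full(n_paths, float(y_last))
    draws = np.empty((horizon, n_paths))
    for h in range(horizon):
        u = rng.choice(resid, size=n_paths, replace=True)
        paths = lstar_step(paths, u)
        draws[h] = paths                           # forecast density at horizon h+1
    return draws

# Stand-in residuals; in practice these come from the estimated model
resid = 0.4 * np.random.default_rng(2).standard_normal(1000)
draws = bootstrap_density_forecast(y_last=0.1, resid=resid, horizon=9)
mean_fc = draws.mean(axis=1)
median_fc = np.median(draws, axis=1)
q25, q75 = np.quantile(draws, [0.25, 0.75], axis=1)   # 50% equal-tails band, as in Tables 1-2
```

The mean, median, and mode of each row of draws give the alternative point forecasts discussed above; asymmetry of these rows is exactly what the fan charts and highest density regions visualize.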

5 Conclusions

We have seen that forecasting with STAR models is somewhat more complicated than forecasting with linear models. As a reward, the forecaster obtains forecast densities that contain considerably more information than mere point forecasts. Reporting these densities in practical work therefore seems worthwhile, and this chapter contains examples of ways of doing that in a compact and illustrative fashion.

The simulation results in the chapter show that mean forecasts from a correctly specified STAR model can be more accurate than their counterparts from a misspecified linear model. This should not come as a surprise. The results also indicate, however, that the nonlinear model may not always have a predictive edge where it might be expected to, namely in forecasting ‘nonlinear observations’. On the other hand, it is no doubt possible to construct STAR models which, at least in simulation studies, have this property. As a whole, we may conclude that STAR models offer an interesting alternative in modelling and forecasting economic time series, and that more work is needed to fully understand their potential in economic forecasting.


References

Bacon, D. W., and D. G. Watts (1971): "Estimating the Transition Between Two Intersecting Straight Lines," Biometrika, 58, 525-534.

Bates, D. M., and D. G. Watts (1988): Nonlinear Regression Analysis and its Applications. Wiley, New York.

Box, G. E. P., and G. M. Jenkins (1970): Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco.

Chan, K. S., and H. Tong (1986): "On Estimating Thresholds in Autoregressive Models," Journal of Time Series Analysis, 7, 178-190.

Clements, M. P., and D. F. Hendry (1999): Forecasting Non-Stationary Economic Time Series. MIT Press, Cambridge, MA.

Clements, M. P., and J. Smith (2000): "Evaluating the Forecast Densities of Linear and Non-Linear Models: Applications to Output Growth and Unemployment," Journal of Forecasting, (forthcoming).

Davies, R. B. (1977): "Hypothesis Testing When a Nuisance Parameter is Present Only under the Alternative," Biometrika, 64, 247-254.

Diebold, F. X., and R. S. Mariano (1995): "Comparing Predictive Accuracy," Journal of Business and Economic Statistics, 13, 253-263.

Eitrheim, Ø., and T. Teräsvirta (1996): "Testing the Adequacy of Smooth Transition Autoregressive Models," Journal of Econometrics, 74, 59-75.

Engle, R. F. (1982): "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica, 50, 987-1007.

Granger, C. W. J., and T. Teräsvirta (1993): Modelling Nonlinear Economic Relationships. Oxford University Press, Oxford.

Granger, C. W. J., and T. Teräsvirta (1999): "A Simple Nonlinear Time Series Model with Misleading Linear Properties," Economics Letters, 62, 161-165.

Hamilton, J. D. (1989): "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357-384.

Hansen, B. E. (1996): "Inference When a Nuisance Parameter Is Not Identified under the Null Hypothesis," Econometrica, 64, 413-430.

Härdle, W. (1990): Applied Nonparametric Regression. Cambridge University Press, Cambridge.

Hendry, D. F. (1995): Dynamic Econometrics. Oxford University Press, Oxford.

Hyndman, R. J. (1995): "Highest-Density Forecast Regions for Non-Linear and Non-Normal Time Series Models," Journal of Forecasting, 14, 431-441.

Hyndman, R. J. (1996): "Computing and Graphing Highest Density Regions," American Statistician, 50, 120-126.

Leybourne, S. J., P. Newbold, and D. Vougas (1998): "Unit Roots and Smooth Transitions," Journal of Time Series Analysis, 19, 83-97.

Lindgren, G. (1978): "Markov Regime Models for Mixed Distributions and Switching Regressions," Scandinavian Journal of Statistics, 5, 81-91.

Luukkonen, R., P. Saikkonen, and T. Teräsvirta (1988): "Testing Linearity Against Smooth Transition Autoregressive Models," Biometrika, 75, 491-499.

Matzner-Løber, E., A. Gannoun, and J. G. De Gooijer (1998): "Nonparametric Forecasting: A Comparison of Three Kernel-Based Methods," Communications in Statistics, Theory and Methods, 27, 1593-1617.

Mincer, J., and V. Zarnowitz (1969): "The Evaluation of Economic Forecasts," in Economic Forecasts and Expectations, ed. by J. Mincer. National Bureau of Economic Research, New York.

Montgomery, A. L., V. Zarnowitz, R. S. Tsay, and G. C. Tiao (1998): "Forecasting the U.S. Unemployment Rate," Journal of the American Statistical Association, 93, 478-493.

Rahiala, M., and T. Teräsvirta (1993): "Business Survey Data in Forecasting the Output of Swedish and Finnish Metal and Engineering Industries: A Kalman Filter Approach," Journal of Forecasting, 12, 255-271.

Sarantis, N. (1999): "Modeling Non-Linearities in Real Effective Exchange Rates," Journal of International Money and Finance, 18, 27-45.

Silverman, B. W. (1986): Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.

Skalin, J., and T. Teräsvirta (2000): "Modelling Asymmetries and Moving Equilibria in Unemployment Rates," Macroeconomic Dynamics, (forthcoming).

Tay, A. S., and K. F. Wallis (2000): "Density Forecasting: A Survey," Journal of Forecasting, (forthcoming).

Teräsvirta, T. (1994): "Specification, Estimation, and Evaluation of Smooth Transition Autoregressive Models," Journal of the American Statistical Association, 89, 208-218.

Teräsvirta, T. (1998): "Modelling Economic Relationships with Smooth Transition Regressions," in Handbook of Applied Economic Statistics, ed. by A. Ullah and D. E. A. Giles, pp. 507-552. Dekker, New York.

Teräsvirta, T., and H. M. Anderson (1992): "Characterizing Nonlinearities in Business Cycles Using Smooth Transition Autoregressive Models," Journal of Applied Econometrics, 7, S119-S136.

Tong, H. (1990): Non-Linear Time Series: A Dynamical System Approach. Oxford University Press, Oxford.

Tsay, R. S. (1989): "Testing and Modeling Threshold Autoregressive Processes," Journal of the American Statistical Association, 84, 231-240.

Tsay, R. S. (1998): "Testing and Modeling Multivariate Threshold Models," Journal of the American Statistical Association, 93, 1188-1202.

Wallis, K. F. (1999): "Asymmetric Density Forecasts of Inflation and the Bank of England's Fan Chart," National Institute Economic Review, 168, 106-112.

Wooldridge, J. M. (1994): "Estimation and Inference for Dependent Processes," in Handbook of Econometrics, ed. by R. F. Engle and D. L. McFadden, vol. 4, pp. 2638-2737. Elsevier, Amsterdam.


Figure 1: Graphs of the logistic transition function (2) with k = 1 for γ = 0.01, 3, 20, and 50. The graph corresponding to the lowest value of γ lies closest to the line GL1 = 1/2.


Figure 2: Graphs of the logistic transition function (2) with k = 2 for γ = 0.01, 3, 20, and 50. The graph corresponding to the lowest value of γ lies closest to the line GL2 = 1/2.


Figure 3: Graphs of the exponential transition function (3) for γ = 0.01, 3, 20, and 50. The graph corresponding to the lowest value of γ lies closest to the horizontal axis GE = 0.


Figure 4: Relative Efficiency (RE) of the mean forecasts from the LSTAR model (13) based on 6282 randomly chosen forecast samples when the linear AR model serves as the benchmark model. Dashed lines represent the case where the parameters of the LSTAR model are estimated, the short dashed lines the case where they are known.


Figure 5: Relative Efficiency (RE) of the mean forecasts from the LSTAR model (13) when the 6282 forecast samples are based on series whose last observation represents a large change and the linear AR model is the benchmark model. Dashed lines represent the case where the parameters of the LSTAR model are estimated, the short dashed lines the case where they are known.


Figure 6: Relative Efficiency (RE) of the mean forecasts from the LSTAR model (13) when the 6282 forecast samples are based on series whose last observation represents a large change and the linear ARI model is the benchmark model. Dashed lines represent the case where the parameters of the LSTAR model are estimated, the short dashed lines the case where they are known.


Figure 7: Relative Efficiency (RE) of the mode forecasts from the LSTAR model (13) when the 6282 forecast samples are based on series whose last observation represents a large change and the linear ARI model is the benchmark model. Dashed lines represent the case where the parameters of the LSTAR model are estimated, the short dashed lines the case where they are known.


Figure 8: The fan chart for forecast densities of the Australian unemployment rate 1996(1)-1998(1) obtained from the LSTAR model of type (14) reported in Skalin and Teräsvirta (2000). The largest interval represents 90% of the estimated forecast density.


Figure 9: Highest density regions for forecast densities of the Australian unemployment rate 1996(1)-1998(1) obtained from the LSTAR model of type (14) reported in Skalin and Teräsvirta (2000). The largest interval represents 90% of the estimated forecast density.


Figure 10: The fan chart for forecast densities of the Danish unemployment rate 1996(1)-1998(1) obtained from the LSTAR model of type (14) reported in Skalin and Teräsvirta (2000). The largest interval represents 90% of the estimated forecast density.


Figure 11: Highest density regions for forecast densities of the Danish unemployment rate 1996(1)-1998(1) obtained from the LSTAR model of type (14) reported in Skalin and Teräsvirta (2000). The largest interval represents 90% of the estimated forecast density.


Quarter     AR forecasts           LSTAR forecasts        Observed
            25%    50%    75%      25%    50%    75%
1996(1)     8.7    9.2    9.7      8.5    9.0    9.5        9.2
1996(2)     7.8    8.6    9.4      7.7    8.5    9.3        8.3
1996(3)     7.6    8.7    9.7      7.4    8.5    9.6        8.4
1996(4)     7.4    8.9   10.2      7.2    8.5    9.9        8.4
1997(1)     8.1    9.9   11.6      8.0    9.4   10.8        9.4
1997(2)     7.3    9.3   11.1      7.4    8.8   10.2        8.5
1997(3)     7.0    9.2   11.2      7.3    8.7   10.1        8.4
1997(4)     7.0    9.3   11.5      7.3    8.6   10.1        8.0
1998(1)     7.7   10.2   12.5      8.2    9.6   10.0        8.9

Table 1: Percentiles of the forecast densities of the Australian unemployment rates 1996(1)-1998(1) obtained using both an AR model and an LSTAR model of type (14). Compare with Figures 8 and 9.
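The coverage claim in Section 4.2, that the 50% equal-tails intervals contain the realized values, can be verified mechanically from Table 1 (values transcribed from its LSTAR columns; Table 2 can be checked the same way):

```python
# (quarter, 25% quantile, 75% quantile, observed value) from the LSTAR columns of Table 1
lstar_rows = [
    ("1996(1)", 8.5, 9.5, 9.2), ("1996(2)", 7.7, 9.3, 8.3), ("1996(3)", 7.4, 9.6, 8.4),
    ("1996(4)", 7.2, 9.9, 8.4), ("1997(1)", 8.0, 10.8, 9.4), ("1997(2)", 7.4, 10.2, 8.5),
    ("1997(3)", 7.3, 10.1, 8.4), ("1997(4)", 7.3, 10.1, 8.0), ("1998(1)", 8.2, 10.0, 8.9),
]
# True: every observed value lies inside its 50% equal-tails interval
covered = all(lo <= obs <= hi for _, lo, hi, obs in lstar_rows)
```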


Quarter     AR forecasts           LSTAR forecasts        Observed
            25%    50%    75%      25%    50%    75%
1996(1)     9.1    9.9   10.6      8.9    9.7   10.4        9.9
1996(2)     7.3    8.7   10.0      7.1    8.7    9.9        8.6
1996(3)     7.0    8.8   10.6      6.6    8.6   10.3        8.6
1996(4)     6.5    8.6   10.7      5.6    8.1   10.4        7.8
1997(1)     7.1    9.8   12.4      5.7    8.6   10.3        8.8
1997(2)     5.6    8.8   11.8      4.2    7.5   10.9        7.7
1997(3)     5.3    9.0   12.2      3.9    7.4   11.0        7.7
1997(4)     5.3    9.0   12.5      3.2    6.9   10.8        6.9
1998(1)     6.0   10.1   14.0      3.4    7.2   11.2        7.7

Table 2: Percentiles of the forecast densities of the Danish unemployment rates 1996(1)-1998(1) obtained using both an AR model and an LSTAR model of type (14). Compare with Figures 10 and 11.
