Nonparametric Estimation of Generalized Impulse Response Functions
Rolf Tschernig and Lijian Yang
Humboldt-Universität zu Berlin, Michigan State University
October 2000
Abstract
A local linear estimator of generalized impulse response (GIR) functions for nonlinear conditional heteroskedastic autoregressive processes is derived and shown to be asymptotically normal. A plug-in bandwidth is obtained that minimizes the asymptotic mean squared error of the GIR estimator. A local linear estimator for the conditional variance function is proposed which has a simpler bias than the standard estimator. This is achieved by appropriately eliminating the conditional mean. As an alternative to the direct local linear estimators of the k-step prediction functions which enter the GIR estimator, the use of multi-stage prediction techniques is suggested. Simulation experiments show the latter estimator to perform best. For quarterly data of the West German real GNP it is found that the size of generalized impulse response functions varies across different histories, a feature which cannot be captured by linear models.
KEY WORDS: Confidence intervals; general impulse response function; heteroskedasticity; local polynomial; multi-stage predictor; nonlinear autoregression; plug-in bandwidth.
1 INTRODUCTION
Recent advances in statistical theory and computer technology have made it possible to use nonparametric techniques for nonlinear time series analysis. Consider the conditional heteroskedastic autoregressive nonlinear (CHARN) process {Y_t}_{t≥0}

    Y_t = f(X_{t-1}) + σ(X_{t-1}) U_t,   t = m, m+1, ...                    (1)

where X_{t-1} = (Y_{t-1}, ..., Y_{t-m})^T, t = m, m+1, ... denotes the vector of lagged observations up to lag m, and f and σ denote the conditional mean and conditional standard deviation, respectively. The series {U_t}_{t≥m} represents i.i.d. random variables with E(U_t) = 0, E(U_t^2) = 1, E(U_t^3) = m_3, E(U_t^4) = m_4 < +∞ which are independent of X_{t-1}. Masry and Tjøstheim (1995) showed asymptotic normality of the Nadaraya-Watson estimator of
Address for Correspondence: Institut für Statistik und Ökonometrie, Wirtschaftswissenschaftliche Fakultät, Humboldt-Universität zu Berlin, Spandauer Str. 1, D-10178 Berlin, Germany, email:
[email protected].
the conditional mean function f under an α-mixing assumption. Härdle, Tsybakov and Yang (1998) proved asymptotic normality for the local linear estimator of f. For selecting the order m one may use the nonparametric procedures suggested by Tjøstheim and Auestad (1994) and Tschernig and Yang (2000), which are based on local constant and local linear estimators of the final prediction error, respectively. Alternatively one may use cross-validation, see Yao and Tong (1994). For further references on nonparametric time series analysis, see the surveys of Tjøstheim (1994) or Härdle, Lütkepohl and Chen (1997).
An important goal of nonlinear time series modelling is the understanding of the underlying dynamics. As is well known from linear time series analysis, it is not sufficient for this task to estimate only the conditional mean function. This is even more so if the conditional mean function is nonlinear. One appropriate tool that allows one to study the dynamics of processes like (1) is the generalized impulse response function.
In this paper we propose nonparametric estimators for generalized impulse response (GIR) functions for CHARN processes (1) and derive their asymptotic properties. Here, we follow Koop, Pesaran and Potter (1996) and define the generalized impulse response GIR_k for horizon k as the quantity by which a prespecified shock u in period t changes the k-step ahead prediction based on information up to period t-1 only. Formally, one has
    GIR_k(x, u) = E(Y_{t+k-1} | X_{t-1} = x, U_t = u) - E(Y_{t+k-1} | X_{t-1} = x)
                = E(Y_{t+k-1} | Y_t = f(x) + σ(x)u, Y_{t-1} = x_1, ..., Y_{t-m+1} = x_{m-1})
                  - E(Y_{t+k-1} | Y_{t-1} = x_1, ..., Y_{t-m} = x_m).                    (2)

In general, GIR_k depends on the condition x as well as on the size and sign of the shock u. An alternative definition of nonlinear impulse response functions is given by Gallant, Rossi and Tauchen (1993).
We propose local linear estimators for the multi-step ahead prediction functions which are contained in GIR_k and derive the asymptotic properties of the resulting plug-in estimator of GIR_k. This also delivers an asymptotically optimal bandwidth and thus allows one to compute a plug-in bandwidth. The conditional standard deviation in GIR_k can be estimated with the local linear volatility estimator of Härdle and Tsybakov (1997). Alternatively, we propose a simpler local linear estimator based on a "de-meaning" idea that is asymptotically as efficient as if the true conditional mean function were known. The prediction functions can be estimated either directly or via the multi-stage prediction techniques of Chen, Yang and Hafner (2000).
For two CHARN processes we compare the performance of the nonparametric direct and multi-stage GIR_k estimators with parametric ones using Monte Carlo simulations. The multi-stage GIR_k estimator with improved volatility estimation shows the best overall performance. It is then used to investigate the dynamics of the West German real GNP.
The paper is organized as follows. In Section 2 we define local linear estimators for the generalized impulse response function and investigate their asymptotic properties. The alternative estimator for the conditional standard deviation is introduced in Section 3. In Section 4 a GIR estimator based on multi-stage prediction is proposed. Issues of implementation are discussed in Section 5. The results of the Monte Carlo study are summarized in Section 6. Section 7 presents the empirical analysis and Section 8 concludes.
2 DIRECT GIR ESTIMATION
To facilitate the presentation, we use the following notation. Denote for any k ≥ 1 the k-step ahead prediction function by

    f_k(x) = E(Y_{t+k-1} | X_{t-1} = x)                    (3)

and write

    Y_{t+k-1} = f_k(X_{t-1}) + σ_k(X_{t-1}) U_{t,k}                    (4)

where

    σ_k^2(x) = Var(Y_{t+k-1} | X_{t-1} = x)                    (5)

and where the U_{t,k}'s are martingale differences since E(U_{t,k} | X_{t-1}) = E(U_{t,k} | Y_{t-1}, ...) = 0, E(U_{t,k}^2 | X_{t-1}) = E(U_{t,k}^2 | Y_{t-1}, ...) = 1, t = m, m+1, .... Apparently, f_1 = f, σ_1 = σ. One also denotes

    σ_{k',k}(x) = Cov{ Y_{t+k'-1}, Y_{t+k-1} | X_{t-1} = x },   k' ≤ k,                    (6)

    σ_{k'k',k}(x) = Cov[ { Y_{t+k'-1} - f_{k'}(X_{t-1}) }^2, Y_{t+k-1} - f_k(X_{t-1}) | X_{t-1} = x ].                    (7)
One can now write the generalized impulse response (GIR_k) function defined in (2) more compactly as

    GIR_k(x, u) = f_{k-1}{ f(x) + σ(x)u, x' } - f_k(x) = f_{k-1}(x_u) - f_k(x)                    (8)

where x' = (x_1, ..., x_{m-1}) and x_u = { f(x) + σ(x)u, x' }. The plug-in estimate of the GIR_k function in (8) is then

    \widehat{GIR}_k(x, u) = f̂_{k-1}(x̂_u) - f̂_k(x)                    (9)

where all unknown functions are replaced by local linear estimates. The estimator of x_u is x̂_u = { f̂(x) + σ̂(x)u, x' }. For defining the local linear estimators, K : IR^1 -> IR^1 denotes a kernel function which is assumed to be a continuous, symmetric and compactly supported probability density and

    K_h(x) = (1/h^m) ∏_{j=1}^m K(x_j / h)

defines the product kernel for x ∈ IR^m and the bandwidth h = γ n^{-1/(m+4)}, γ > 0. Define further the matrices
    Z_k = ( 1             ...  1
            X_{m-1} - x   ...  X_{n-k} - x )^T,
    W_k = diag{ K_h(X_{i-1} - x)/n }_{i=m}^{n-k+1},
    Y_k = ( Y_{m+k-1}, ..., Y_n )^T,

i.e. Z_k has rows (1, (X_{i-1} - x)^T), i = m, ..., n-k+1. The local linear estimator f̂_k(x) of the k-step ahead prediction function f_k(x) can then be written as

    f̂_k(x) = e^T ( Z_k^T W_k Z_k )^{-1} Z_k^T W_k Y_k,   e = (1, 0_{1×m})^T.                    (10)
The local linear estimate σ̂_k(x) of the conditional k-step ahead standard deviation is defined by

    σ̂_k(x) = { e^T ( Z_k^T W_k Z_k )^{-1} Z_k^T W_k Y_k^2 - f̂_k^2(x) }^{1/2}                    (11)

where Y_k^2 denotes elementwise squaring.
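To make the weighted least-squares computation in (10) concrete, here is a minimal sketch in Python/numpy (the paper's own computations are in GAUSS). The Gaussian product kernel and all names are our choices; the function returns the intercept of the local fit, i.e. the estimate of f_k(x):

```python
import numpy as np

def local_linear_fit(X, Y, x, h):
    """Local linear estimate of E(Y | X = x), cf. (10).

    X : (n, m) array of histories X_{i-1}
    Y : (n,) array of responses (the k-step leads Y_{i+k-1} for f_k,
        or their squares for the raw fit entering sigma_k in (11))
    x : (m,) evaluation point, h : scalar bandwidth
    """
    n, m = X.shape
    U = (X - x) / h
    # Gaussian product kernel weights K_h(X_{i-1} - x)
    w = np.exp(-0.5 * np.sum(U ** 2, axis=1)) / (np.sqrt(2 * np.pi) * h) ** m
    Z = np.hstack([np.ones((n, 1)), X - x])      # design matrix Z_k
    A = Z.T @ (Z * (w / n)[:, None])             # Z' W Z
    b = Z.T @ ((w / n) * Y)                      # Z' W Y
    return np.linalg.solve(A, b)[0]              # e' (Z'WZ)^{-1} Z'WY
```

Because the fit is exact for affine functions, a linear conditional mean is reproduced without bias, which is the reason the local linear form has a simpler bias than the Nadaraya-Watson form.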
For simplicity, we write f̂(x) = f̂_1(x), σ̂(x) = σ̂_1(x). In the following theorem we show the asymptotic normality of the local linear GIR_k estimator (9) based on (10) and (11). Closed formulae for the optimal bandwidth are derived in the corollary. We denote ||K||_2^2 = ∫ K^2(u) du, σ_K^2 = ∫ K(u) u^2 du.

Theorem 1  Define the asymptotic variance

    σ_{GIR,k}^2(x, u) = [ ||K||_2^{2m} σ^2(x) / μ(x) ] { σ_{k-1}^2(x_u) μ(x) / [ σ^2(x) μ(x_u) ] + σ_k^2(x)/σ^2(x)
        + [ ∂f_{k-1}(x_u)/∂x_1 ]^2 { 1 + u m_3 + u^2 (m_4 - 1)/4 }
        - [ ∂f_{k-1}(x_u)/∂x_1 ] { 2 σ_{1,k}(x) + u σ_{11,k}(x)/σ(x) } / σ^2(x) }
        - [ ||K||_2^{2m} / μ(x) ] I(x = x_u) { 2 σ_{k-1,k}(x) - 2 [ ∂f_{k-1}(x_u)/∂x_1 ] σ_{1,k-1}(x)
        + u [ ∂f_{k-1}(x_u)/∂x_1 ] σ_{11,k-1}(x)/σ(x) }                    (12)

and the asymptotic bias

    b_{GIR,k}(x, u) = b_{f,k-1}(x_u) - b_{f,k}(x) + [ ∂f_{k-1}(x_u)/∂x_1 ] { b_f(x) + b_σ(x) u }                    (13)

where

    b_{f,k}(x) = σ_K^2 Tr{ ∇^2 f_k(x) } / 2,
    b_{σ,k}(x) = σ_K^2 [ Tr{ ∇^2 ( f_k^2(x) + σ_k^2(x) ) } - 2 f_k(x) Tr{ ∇^2 f_k(x) } ] / { 4 σ_k(x) }.                    (14)

Here Tr{ ∇^2 f_k(x) } denotes the Laplacian of f_k, and one abbreviates b_{f,1}(x), b_{σ,1}(x) simply as b_f(x), b_σ(x). Then under assumptions (A1)-(A3) given in the Appendix, one has

    (n h^m)^{1/2} { \widehat{GIR}_k(x, u) - GIR_k(x, u) - b_{GIR,k}(x, u) h^2 } -> N{ 0, σ_{GIR,k}^2(x, u) }.                    (15)

Corollary 1  The optimal bandwidth for estimating GIR_k(x, u) is

    h_opt(x, u) = { m σ_{GIR,k}^2(x, u) / [ 4 b_{GIR,k}^2(x, u) n ] }^{1/(m+4)}                    (16)

which asymptotically minimizes the mean squared error (MSE)

    MSE_k(x, u; h) = E{ \widehat{GIR}_k(x, u) - GIR_k(x, u) }^2.

For any compact subset C_x of IR^m, the minimizer of the mean integrated squared error (MISE)

    MISE_k(C_x, u; h) = ∫_{C_x} E{ \widehat{GIR}_k(x, u) - GIR_k(x, u) }^2 μ(x) dx

is asymptotically

    h_opt(C_x, u) = { m ∫_{C_x} σ_{GIR,k}^2(x, u) μ(x) dx / [ 4 n ∫_{C_x} b_{GIR,k}^2(x, u) μ(x) dx ] }^{1/(m+4)}.                    (17)
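Once the two asymptotic quantities have been estimated, evaluating (16) is a one-line computation. A sketch, assuming the variance and bias estimates are supplied by separate routines:

```python
def h_opt(m, n, var_gir, bias_gir):
    """Asymptotically optimal bandwidth (16) for the GIR_k estimator.

    m        : number of lags (dimension of the history x)
    n        : sample size
    var_gir  : estimate of the asymptotic variance sigma^2_{GIR,k}(x, u)
    bias_gir : estimate of the asymptotic bias b_{GIR,k}(x, u)
    """
    return (m * var_gir / (4.0 * bias_gir ** 2 * n)) ** (1.0 / (m + 4))
```

The familiar trade-off is visible directly: a larger variance estimate pushes the bandwidth up, a larger bias estimate or sample size pushes it down at the rate n^{-1/(m+4)}.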
Obviously, each of the optimal bandwidth formulas (16) and (17) contains unknown quantities. In Section 5 we discuss estimators for those quantities in order to obtain a plug-in version of the optimal bandwidth (16). This plug-in bandwidth is then used in the Monte Carlo experiments and in the empirical analysis presented in Sections 6 and 7, respectively.
Koop, Pesaran and Potter (1996) consider various definitions of generalized impulse response functions. For example, one alternative to (2) is to allow the condition to be a compact set C_x. Denoting by C_u a compact subset of IR, the generalized impulse response function over these compact sets is defined by
    GIR_k(C_x, C_u) = E{ GIR_k(X_{i-1}, U_i) | X_{i-1} ∈ C_x, U_i ∈ C_u }.                    (18)
For its estimation, we consider its empirical version

    \widehat{GIR}_k(C_x, C_u) = [ 1 / { n P̂(C_x, C_u) } ] Σ_{i=m}^{n-k+1} \widehat{GIR}_k(X_{i-1}, U_i) I(X_{i-1} ∈ C_x, U_i ∈ C_u)                    (19)

where

    P̂(C_x, C_u) = (1/n) Σ_{i=m}^{n-k+1} I(X_{i-1} ∈ C_x, U_i ∈ C_u).

The asymptotic properties of the estimator (19) for generalized impulse response functions over compact sets (C_x, C_u) are summarized in the next theorem.
Theorem 2  Under assumptions (A1)-(A4) given in the Appendix,

    \widehat{GIR}_k(C_x, C_u) - GIR_k(C_x, C_u) = b_{GIR,k}(C_x, C_u) h^2 + o_p(h^2)                    (20)

where

    b_{GIR,k}(C_x, C_u) = E{ b_{GIR,k}(X_{i-1}, U_i) | X_{i-1} ∈ C_x, U_i ∈ C_u }.
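In practice, (19) is simply the average of the pointwise estimates over the observations whose history and residual fall into the two sets. A minimal sketch (the callable gir_hat stands in for the pointwise estimator; all names are our own):

```python
import numpy as np

def gir_over_sets(gir_hat, X_hist, U_res, in_Cx, in_Cu, k):
    """Empirical GIR over compact sets, eq. (19).

    gir_hat      : function (x, u) -> pointwise GIR_k estimate
    X_hist       : (N, m) array of histories X_{i-1}
    U_res        : (N,) array of (estimated) residuals U_i
    in_Cx, in_Cu : indicator functions of the sets C_x and C_u
    k            : horizon (the last k-1 observations are dropped)
    """
    N = X_hist.shape[0] - (k - 1)
    keep = np.array([in_Cx(X_hist[i]) and in_Cu(U_res[i]) for i in range(N)])
    if not keep.any():
        return np.nan                     # empty cell: estimate undefined
    vals = np.array([gir_hat(X_hist[i], U_res[i]) for i in range(N)])
    return vals[keep].mean()              # sum(GIR * I) / (n * P_hat)
```

Dividing the indicator-weighted sum by n·P̂ reduces exactly to the sample mean over the retained observations, which is what the code computes.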
Theorem 2 shows that for generalized impulse response functions over compact sets there does not exist the usual bias-variance trade-off. Within the constraint h = γ n^{-1/(m+4)} it is better to use a smaller h. This, of course, has to be qualified for finite samples. While the estimator for GIR_k proposed in this section has reasonable asymptotic properties, its application may be problematic in finite samples. In the next two sections we discuss this problem in more detail and present an improved estimator.
3 EFFICIENT VOLATILITY ESTIMATION
The GIR_k estimator (9) is based on the standard estimator (11) for the conditional volatility. This local linear estimator σ̂^2(x), however, has an asymptotic bias involving the mean function f, as seen in (14), and hence may perform poorly due to the influence of f. This problem can also occur for the estimators of other auxiliary functions such as σ_k(x) and σ_{1,k}(x), etc., which will be needed for computing the plug-in bandwidth based on formulas (16) or (17). In this
section we present an alternative local linear estimator for the conditional standard deviation that is asymptotically as accurate as if the true function f were known. The proposed method can also be used for estimating the covariance functions σ_k^2(x), σ_{1,k}(x), σ_{11,k}(x).
The idea for estimating σ_k^2(x) is to base the estimator on the estimated residuals and use

    σ̃_k^2(x) = e^T ( Z_k^T W_k Z_k )^{-1} Z_k^T W_k V_k                    (21)

where

    V_k = [ { Y_{m+k-1} - f̂_k(X_{m-1}) }^2, ..., { Y_n - f̂_k(X_{n-k}) }^2 ]^T.

In the next theorem it is shown that this approach is indeed useful.
Theorem 3  Under assumptions (A1)-(A4) in the Appendix, one has

    σ̃_k^2(x) - σ_k^2(x) = b̃_{σ,k}(x) h^2 + [ 1 / { n μ(x) } ] Σ_{j=m}^{n-k+1} K_h(X_{j-1} - x) σ_k^2(X_{j-1}) ( U_{j,k}^2 - 1 ) + o_p(h^2)                    (22)

where

    b̃_{σ,k}(x) = σ_K^2 Tr{ ∇^2 σ_k^2(x) } / 2                    (23)

and

    (n h^m)^{1/2} { σ̃_k^2(x) - σ_k^2(x) - b̃_{σ,k}(x) h^2 } -> N{ 0, σ_{σ,k}^2(x) }

with

    σ_{σ,k}^2(x) = σ_k^4(x) ||K||_2^{2m} ( m_{4,k} - 1 ) / μ(x)

where m_{4,k} = E(U_{j,k}^4).
This theorem basically says that by "de-meaning" one can estimate σ_k^2(x) as well as if one knew the true k-step prediction function f_k. As one would expect, the noise level is the same for both σ̂_k^2(x) and σ̃_k^2(x), which can be seen from (22) and (29). However, comparing b_{σ,k} and b̃_{σ,k} given by (14) and (23), it can be seen that σ̃_k^2(x) has a simpler bias which does not depend on f_k.
In a similar way one can define estimators for the quantities (6) and (7). For example, one can estimate σ_{11,k}(x) as

    σ̃_{11,k}(x) = e^T ( Z_k^T W_k Z_k )^{-1} Z_k^T W_k V_{11,k}

where

    V_{11,k} = [ { Y_m - f̂(X_{m-1}) }^2 { Y_{m+k-1} - f̂_k(X_{m-1}) }, ..., { Y_{n-k+1} - f̂(X_{n-k}) }^2 { Y_n - f̂_k(X_{n-k}) } ]^T

and likewise σ_{1,k}(x). Under assumptions (A1)-(A3) in the Appendix, these respective estimators have properties similar to those of σ̃_k^2(x).
The fact that σ̃_k(x) has a simpler bias that does not involve f_k facilitates the computation of the plug-in bandwidth since the asymptotic bias term in the optimal bandwidth (16) becomes much simpler as well. For this reason, and also because it is more efficient, we use from now on in the GIR_k estimator (9) the new estimator (21) instead of (11) for estimating conditional volatilities. We note that despite the fact that σ̃_k^2(x) is obtained by smoothing positive quantities, it can still take negative values in finite samples. In such cases, one replaces the local linear estimator in (21) by either the local constant (Nadaraya-Watson) estimator or even a homoskedastic estimator, which always produces positive estimates.
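The de-meaned estimator (21), together with the Nadaraya-Watson fallback just described, can be sketched as follows (Python/numpy, Gaussian product kernel; all names are ours, and the fitted k-step means are assumed to be supplied by a separate local linear fit):

```python
import numpy as np

def demeaned_variance_fit(X, Y_lead, f_hat, x, h):
    """De-meaned local linear variance estimate of sigma_k^2(x), cf. (21).

    X      : (n, m) histories X_{i-1}
    Y_lead : (n,) leads Y_{i+k-1}
    f_hat  : (n,) fitted k-step means f_k(X_{i-1}) at each history
    Falls back to a local constant fit if the local linear value
    is negative, which can happen in finite samples.
    """
    n, m = X.shape
    V = (Y_lead - f_hat) ** 2                    # squared residuals, V_k
    U = (X - x) / h
    w = np.exp(-0.5 * np.sum(U ** 2, axis=1))    # Gaussian product kernel
    Z = np.hstack([np.ones((n, 1)), X - x])
    A = Z.T @ (Z * w[:, None])
    b = Z.T @ (w * V)
    s2 = np.linalg.solve(A, b)[0]                # intercept of the local fit
    if s2 <= 0:                                  # negative in finite samples
        s2 = np.sum(w * V) / np.sum(w)           # Nadaraya-Watson, positive
    return s2
```

Smoothing the squared residuals directly, instead of subtracting f̂_k^2(x) from a raw second-moment fit as in (11), is exactly the de-meaning step that removes the mean function from the bias.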
4 MULTI-STAGE GIR ESTIMATION
The main ingredients of the GIR_k estimator (9) are the direct local linear predictors f̂_k and f̂_{k-1}. While they are simple to implement, they may contain too much noise which has accumulated over the k prediction periods. To estimate f_k(x) more efficiently, we therefore propose to use instead the multi-stage method of Chen, Yang and Hafner (2000). To describe the procedure, one starts with Y_t^{(0)} = Y_t and repeats the following stage for j = 1, ..., k-1. For ease of presentation, we use here the Nadaraya-Watson form.

Stage j: Estimate

    f̃_j(x) = Σ_{t=m-1}^{n-k} K_{h_j}(X_t - x) Y_{t+j}^{(j-1)} / Σ_{t=m-1}^{n-k} K_{h_j}(X_t - x)

and obtain the j-th smoothed version of Y_{t+j} by Y_{t+j}^{(j)} = f̃_j(X_t). Then, the conditional mean function f_k(x) is estimated by

    f̃_k(x) = Σ_{t=m-1}^{n-k} K_{h_k}(X_t - x) Y_{t+k}^{(k-1)} / Σ_{t=m-1}^{n-k} K_{h_k}(X_t - x).                    (24)

Graphically, the above recursive method can be represented as

    (Y_{t+k}, X_{t+k-1}) => Y_{t+k}^{(1)},  (Y_{t+k}^{(1)}, X_{t+k-2}) => Y_{t+k}^{(2)},  ...,  (Y_{t+k}^{(k-2)}, X_{t+1}) => Y_{t+k}^{(k-1)},  (Y_{t+k}^{(k-1)}, X_t) => f̃_k(x).

The following theorem is shown in Chen, Yang and Hafner (2000).

Theorem 4  Under conditions (A1)-(A3) in the Appendix, if h_j = o(h_k), n h_j^m -> ∞ for j = 1, ..., k-1, and h_k = γ n^{-1/(m+4)} for some γ > 0, and if the estimators f̃_j(x) are all obtained local linearly, then

    (n h_k^m)^{1/2} { f̃_k(x) - f_k(x) - b_{f,k}(x) h_k^2 } -> N{ 0, ||K||_2^{2m} s_k^2(x) / μ(x) }

where

    s_k^2(x) = Var{ f_{k-1}(X_t) | X_{t-1} = x }.

The local linear GIR_k estimator based on multi-stage prediction is therefore given by

    \widetilde{GIR}_k(x, u) = f̃_{k-1}(x̃_u) - f̃_k(x)                    (25)

with the multi-stage predictor f̃_k(x) and the alternative estimator σ̃_k(x) for the conditional standard deviation given by (24) and (21), respectively. In the next section we turn to issues of implementation.
5 IMPLEMENTATION
Computing the direct or multi-stage GIR estimators (9) or (25) requires suitable bandwidth estimates. Both estimators use the Gaussian kernel and a plug-in bandwidth ĥ_opt which is obtained by consistently estimating the unknown quantities in the asymptotically optimal bandwidth (16). Since the efficient volatility estimator (21) is used instead of the standard estimator (11), the bias term (14) in the optimal bandwidth (16) is replaced by (23). For estimating the second-order direct derivatives in the bias (13) we use a partial quadratic estimator with bandwidth h( m+4, 3 √{V̂ar(X)} ), with V̂ar(X) denoting the geometric mean of the variances of the individual regressors, and

    h(k, σ) = σ { 4/k }^{1/(k+2)} n^{-1/(k+2)}.

The partial quadratic estimator is a simplified version of the partial cubic estimator presented in Yang and Tschernig (1999). For estimating all other unknown functions in σ_{GIR,k}^2 and b_{GIR,k} we employ a plug-in bandwidth for estimating the conditional mean function, see Tschernig and Yang (2000, Section 5) for details of implementation. The main consideration for these bandwidth choices is that they are of the right order.
For the multi-stage GIR_k estimator (25) there does not exist a scalar optimal bandwidth. According to Chen, Yang and Hafner (2000) the optimal bandwidth for the first j ≤ k-1 predictions f̃_j(x) has a different rate. In their simulations they find h_{MS,k-1} = ĥ_opt n^{-4/(m+4)^2} / 5 to work quite well. However, in simulations the application of h_{MS,k-1} was often found to produce too small bandwidths and thus cause singular matrices in the local linear estimator. The plug-in bandwidth ĥ_opt may be smaller than the optimal bandwidth for exclusively estimating f_k since it serves several estimation purposes simultaneously. We therefore use ĥ_opt for all steps. These quantities are also used for computing confidence intervals based on (15). All computations are carried out in GAUSS.
6 MONTE CARLO STUDY
In this section we investigate the performance of the proposed GIR_k estimators for two conditional heteroskedastic autoregressive nonlinear (CHARN) processes (1), each with two lags, 300 observations and i.i.d. standard normal errors U_t:
CHARN1:

    Y_t = -0.4 ( 3 - Y_{t-1}^2 ) / ( 1 + Y_{t-1}^2 ) + 0.6 { 3 - (Y_{t-2} - 0.5)^3 } / { 1 + (Y_{t-2} - 0.5)^4 } + 0.1 σ(Y_{t-1}) U_t

with

    σ^2(y) = 0.8 + 0.4 { 0.1 + 2/(1 + exp(5y)) }^2 y^2 + 0.4 { 0.1 + 2 Φ(-5y) }^2 y^2

and Φ denoting the c.d.f. of the standard normal distribution.
CHARN2:

    Y_t = 0.7 Y_{t-1} - 0.2 Y_{t-2} + ( -0.3 Y_{t-1} + 0.7 Y_{t-2} ) / [ 1 + exp{ -10 ( Y_{t-1} - 0.02 ) } ] + 0.5 σ(Y_{t-1}) U_t

with

    σ^2(y) = 0.25 + 0.75 y^2 / ( 1 + y^2 ).
A plot of one realization of the CHARN1 process, of σ(y) and of the density on the range of the realizations is shown in Figure 1(a) to (c). For the CHARN2 process the corresponding plots are displayed in Figure 1(d) to (f).
For illustration we computed GIR_4(x, 1) functions on a two-dimensional grid for x using the simulation method described in Koop, Pesaran and Potter (1996). Figure 2(a) displays the resulting surface for the CHARN1 process, where all grid points outside the range of one realization of 300 observations are ignored. Figure 2(b) shows the surface of the multi-stage GIR estimates for the 389 relevant grid points. For the CHARN2 process the corresponding plots are shown in Figure 2(c) and (d).
While these estimates seem encouraging, we conducted a simulation study to obtain a more precise evaluation of the suggested methods. To save computation time, we only considered about 50 different histories x. They were obtained by taking subsequent observations of one realization for each process and discarding 5% of those observations for which the density is lowest. The density is estimated using 10000 observations. Based on 100 replications we computed the mean squared errors for various estimators of the generalized impulse response function GIR_k(x, u) with k = 4, 7 and 10 and u = -1, 1. We considered both the one-stage estimator (9) with (10) and (21) and the multi-stage estimator (25) based on (24) and (21). We also fitted a linear homoskedastic AR(2) model and computed the corresponding generalized impulse responses. Finally, for the CHARN2 process we computed the generalized impulse responses based on the estimated parameters of the correct parameterization of the CHARN2 mean function. This last exercise is not possible for the CHARN1 process due to identifiability problems of the parameters.
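The simulation method of Koop, Pesaran and Potter (1996) referred to above amounts to averaging, over many simulated shock paths, the difference between a path started with the fixed shock u and a path started with a random shock. A sketch for a CHARN process with user-supplied f and σ (our own naming; the two paths share random numbers, which reduces the Monte Carlo variance):

```python
import numpy as np

def gir_mc(f, sigma, x, u, k, n_rep=10000, seed=0):
    """Monte Carlo GIR_k(x, u) as in definition (2): the mean difference
    between k-step-ahead paths with the period-t shock fixed at u and
    paths with it drawn at random.  f and sigma take the history
    (x1, ..., xm) ordered from most recent to oldest; errors are N(0,1).
    """
    rng = np.random.default_rng(seed)
    m = len(x)
    total = 0.0
    for _ in range(n_rep):
        U = rng.standard_normal(k)          # shocks for periods t, ..., t+k-1
        paths = []
        for first in (u, U[0]):             # shocked vs. baseline first shock
            hist = list(x)
            y = f(hist) + sigma(hist) * first
            for j in range(1, k):
                hist = [y] + hist[:m - 1]   # roll the history forward
                y = f(hist) + sigma(hist) * U[j]
            paths.append(y)
        total += paths[0] - paths[1]
    return total / n_rep
```

For a linear homoskedastic AR(1) with coefficient φ this reproduces the familiar impulse response φ^{k-1}·σ·u, which provides a simple check of the implementation.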
Since presenting the mean squared errors of the estimators of GIR_k(x, u) for each k, x, u of one process involves 300 numbers, we decided to summarize this information by summing the mean squared errors over the histories x_j into a mean integrated squared error. Inspecting the mean integrated squared errors for the CHARN1 process in Table 1 indicates that both nonparametric estimators perform substantially better than the linear estimator based on a misspecified homoskedastic AR(2) model. Overall, the multi-stage GIR estimator shows the smallest mean integrated squared error. The superiority of the multi-stage GIR estimator over the direct GIR and the linear estimator is also found for the CHARN2 process, although it is now less dramatic with respect to the linear estimator, see Table 2. Note that the multi-stage GIR estimator also outperforms the nonlinear parametric GIR estimator except for k = 7 and u = -1.
The results of this simulation study suggest that the proposed multi-stage estimator can be useful in practice if one expects the underlying process to exhibit substantial nonlinearities or heteroskedasticity or both and the functional form is unknown. In the next section it will be applied to a typical macroeconomic time series problem.
Table 1: Mean integrated squared errors (x 10^-3) of generalized impulse response estimators for the CHARN1 process

    horizon k                4               7               10
    Estimator \ shock u      -1      1       -1      1       -1      1
    multi-stage GIR          5.945   3.754   5.397   8.443   3.952   6.381
    direct GIR               8.557   5.382   8.388   8.967   8.918   7.902
    linear AR(2)             28.297  28.337  16.931  16.880  7.008   8.013

Table 2: Mean integrated squared errors (x 10^-3) of generalized impulse response estimators for the CHARN2 process

    horizon k                4               7               10
    Estimator \ shock u      -1      1       -1      1       -1      1
    multi-stage GIR          1.606   1.495   2.273   1.808   1.280   1.267
    direct GIR               3.085   2.730   3.671   3.421   3.836   3.490
    linear AR(2)             2.261   1.800   2.883   2.695   1.960   1.909
    nonlinear AR(2)          1.700   1.631   1.784   1.818   1.823   1.850
7 An empirical application
For the analysis of business cycles linear models may be inadequate if the dynamics vary with the stage of the cycle. Potential explanations include asymmetric adjustment costs of labor (see Hamermesh and Pfann (1996) for a survey), recessions as cleansing periods (see, for example, Caballero and Hammour (1994)), or the insider-outsider theory (Lindbeck and Snower (1988)). In general, the empirical analysis of relevant macroeconomic time series is based on parametric models such as the smooth transition model (e.g. Skalin and Teräsvirta (1998)), which incorporates seasonal features of macroeconomic time series. On the other hand, it is not easy to choose an appropriate class of nonlinear models. For the latter reason, Yang and Tschernig (1998) extend the CHARN model (1) by deterministic seasonal components. Let S denote the number of seasons. The simplest seasonal model which they propose for a seasonal process V_t is the seasonal shift model

    V_{s+Sτ} - δ_s = f( V_{s+Sτ-1} - δ_{{s-1}}, ..., V_{s+Sτ-m} - δ_{{s-m}} ) + σ( V_{s+Sτ-1} - δ_{{s-1}}, ..., V_{s+Sτ-m} - δ_{{s-m}} ) U_{s+Sτ}

where the time index t is replaced by s + Sτ, s = 0, 1, ..., S-1 and τ = 0, 1, ..., and the δ_s denote seasonal mean shifts. For any integer a we define {a} as the unique integer between 0 and S-1 that is in the same congruence class as a modulo S. For identifiability one requires δ_0 = 0. In the following we estimate the GIR function for the CHARN process Y_{s+Sτ} = V_{s+Sτ} - δ_s. Yang and Tschernig (1998) show that for the purpose of nonparametric inference the deseasonalized Y_t can be obtained by subtracting the estimated δ_s's.
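Estimating the δ_s by seasonal sample means and subtracting them can be sketched in a few lines (our own naming; as in the application below, all S means are subtracted rather than fixing δ_0 = 0):

```python
import numpy as np

def deseasonalize(V, S):
    """Remove seasonal mean shifts from a series with S seasons.

    Estimates delta_s as the seasonal sample means of V and returns
    Y_t = V_t - delta_{t mod S} together with the estimated shifts.
    """
    V = np.asarray(V, dtype=float)
    seasons = np.arange(len(V)) % S            # season index of each t
    delta = np.array([V[seasons == s].mean() for s in range(S)])
    return V - delta[seasons], delta
```

This assumes the series starts in season 0; for a sample starting mid-year the season index would be offset accordingly.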
We use this model for the analysis of the quarterly (seasonally non-adjusted) West German real GNP from 1960:1 to 1990:4 compiled by Wolters (1992, p. 424, note 4). Based on seasonal unit root testing by Franses (1996) we take the first differences of the logs. By subtracting the estimated seasonal means δ̂_1 to δ̂_3, the deseasonalized Y_t's are growth rates with respect to the spring season. In order to avoid the dependence on a specific season, we ignore the identifiability issue and subtract all four means δ̂_0 = 0.0386, δ̂_1 = 0.0518, δ̂_2 = 0.0089 and δ̂_3 = -0.0673.
To keep the model parsimonious we employ a CHARN model with all lags up to four. Since we have more than two lags, we can no longer investigate the generalized impulse response function on a grid as was done in the previous section. Instead, we pick six distinct histories: x_1 = (-0.02, -0.01, 0, 0.01)^T, x_2 = (-0.01, 0, 0.01, 0.02)^T, x_3 = (0, 0.01, 0.02, 0.01)^T, x_4 = (0.01, 0, -0.01, -0.02)^T, x_5 = (0.02, 0.01, 0, -0.01)^T, x_6 = (0.03, 0.02, 0.01, 0)^T, which represent various stages of the business cycle ranging from a substantial downswing to a considerable upswing. These growth rates correspond to the deseasonalized Y_t process.
In Figure 3 we display for each history and a positive unit shock the multi-stage GIR_k(x, 1) estimator (solid line) plus 95% confidence intervals (dots and dashes) and the GIR_k(x, 1) estimator based on a homoskedastic linear model (dashed line) for k = 3, ..., 20, or up to five years. Note that the unit shock is multiplied with the estimated conditional standard deviation, which for the given histories x_i, i = 1, ..., 6, takes values in the range from 0.008 to 0.016. Both models exhibit strong seasonal dynamics. Subtracting seasonal means cannot remove all seasonal effects. The overall dynamics implied by both estimators are qualitatively similar. For history x_4 they are basically identical, see Figure 3(d).
For histories that include a 2% growth of the deseasonalized real GNP, Figures 3(b), (c), (e) and (f) indicate that for the first two years the linear model would overestimate the generalized impulse responses of a unit shock. Taking absolute values, these results also hold for a negative unit shock, as can be seen from Figure 4. Such differences cannot be revealed by GIR estimates based on linear models.
8 Conclusion
Impulse responses have proven important for studying the dynamics of linear time series processes. For conditional heteroskedastic autoregressive nonlinear processes we provide local linear estimators of generalized impulse response functions as defined by Koop, Pesaran and Potter (1996). Asymptotic normal distributions are derived for the proposed nonparametric estimators without prior choice of the parametric form of the process. An efficient new estimator of the conditional variance function is proposed based on a "de-meaning" idea. Plug-in optimal bandwidths are obtained and implemented, and multi-stage prediction techniques are used for enhanced performance.
In a simulation study we compare the direct and multi-stage GIR estimators with a linear parametric estimator for two conditional heteroskedastic autoregressive nonlinear processes of order two and find the multi-stage GIR estimator to perform best in terms of its mean integrated squared error.
Finally, we investigate quarterly data of the West German real GNP using the multi-stage GIR estimator and a GIR estimator based on a linear model after subtracting seasonal means. For six distinct histories it is found that the magnitudes of the nonparametrically estimated generalized impulse response functions differ across histories, a feature that is completely missed by linear models. The responses are smaller if there was considerable growth in at least one quarter. Based on the confidence intervals which are computed using the asymptotic distribution, these differences are significant.
In sum, we may conclude from the results of the Monte Carlo study and the empirical analysis that the proposed nonparametric multi-stage estimator of generalized impulse response functions can be a useful tool for studying nonlinear phenomena in economics and other fields.
APPENDIX
With regard to the process (1) we assume the following:
(A1) The vector process X_{t-1} = (Y_{t-1}, ..., Y_{t-m})^T is strictly stationary and geometrically β-mixing: β(n) ≤ c_0 ξ^n for some 0 < ξ < 1, c_0 > 0. Here

    β(n) = E sup{ | P(A | F_m^k) - P(A) | : A ∈ F_{n+k}^∞ }

where F_t^{t'} is the σ-algebra generated by X_t, X_{t+1}, ..., X_{t'}.
(A2) The stationary distribution of the process X_{t-1} has a density μ(x), x ∈ IR^m, which is continuous.
(A3) The functions f and σ have bounded continuous derivatives up to order 4 and σ is positive on the support of μ.
(A4) There exist constants a, r > 0 such that E exp{ a |Y_0|^r } < +∞.
A discussion of assumptions (A1) to (A3) can be found e.g. in Tschernig and Yang (2000). Assumption (A3) is needed for the functions f_k, σ_k^2 to be 4-th order smooth, as shown in the lemma that follows. The 4-th order smoothness is needed for using the plug-in bandwidths of Yang and Tschernig (1999).
Lemma 1  Under assumptions (A1)-(A3), one has for t ≥ m and k ≥ 2

    f_k(x) = E f_{k-1}{ f(x) + σ(x) U_t, x' },                    (26)
    σ_k^2(x) = E f_{k-1}^2{ f(x) + σ(x) U_t, x' } - f_k^2(x) + E σ_{k-1}^2{ f(x) + σ(x) U_t, x' }.                    (27)

Moreover, all functions f_k, σ_k^2, k = 2, 3, ... have continuous derivatives up to order 4.

Proof. By the definitions in (3), (5) and (4), for any t ≥ m one has

    Y_{t+k-1} = f_{k-1}(X_t) + σ_{k-1}(X_t) U_{t+1,k-1}

and hence

    f_k(x) = E(Y_{t+k-1} | X_{t-1} = x) = E[ f_{k-1}(X_t) | X_t = { f(x) + σ(x) U_t, x' } ]

which is the same as in (26). Likewise, using the martingale property of U_{t+1,k-1}, one has

    σ_k^2(x) = Var(Y_{t+k-1} | X_{t-1} = x) = Var{ f_{k-1}(X_t) + σ_{k-1}(X_t) U_{t+1,k-1} | X_{t-1} = x }
             = Var{ f_{k-1}(X_t) | X_{t-1} = x } + Var{ σ_{k-1}(X_t) U_{t+1,k-1} | X_{t-1} = x }
             = E{ f_{k-1}^2(X_t) | X_{t-1} = x } - [ E{ f_{k-1}(X_t) | X_{t-1} = x } ]^2 + E{ σ_{k-1}^2(X_t) U_{t+1,k-1}^2 | X_{t-1} = x }
             = E[ f_{k-1}^2(X_t) | X_t = { f(x) + σ(x) U_t, x' } ] - f_k^2(x) + E[ σ_{k-1}^2(X_t) | X_t = { f(x) + σ(x) U_t, x' } ]

which is the same as in (27). Now the recursive formulae (26) and (27), assumption (A3), plus the fact that the shock variable U_t has finite 4-th moment yield the smoothness results.

For proving Theorem 1 it is necessary to first derive some auxiliary results and decompose the GIR_k estimator into several terms. By Härdle, Tsybakov and Yang (1998), we have
    f̂_k(x) = f_k(x) + b_{f,k}(x) h^2 + [ 1 / { n μ(x) } ] Σ_{i=m}^{n-k+1} K_h(X_{i-1} - x) σ_k(X_{i-1}) U_{i,k} + o_p(h^2),                    (28)

    σ̂_k^2(x) = σ_k^2(x) + h^2 σ_K^2 [ Tr{ ∇^2 ( f_k^2(x) + σ_k^2(x) ) } - 2 f_k(x) Tr{ ∇^2 f_k(x) } ] / 2
        + [ 1 / { n μ(x) } ] Σ_{i=m}^{n-k+1} K_h(X_{i-1} - x) σ_k^2(X_{i-1}) ( U_{i,k}^2 - 1 ) + o_p(h^2)                    (29)

which entails that

    σ̂_k(x) = σ_k(x) + b_{σ,k}(x) h^2 + [ 1 / { 2 n μ(x) σ_k(x) } ] Σ_{i=m}^{n-k+1} K_h(X_{i-1} - x) σ_k^2(X_{i-1}) ( U_{i,k}^2 - 1 ) + o_p(h^2).                    (30)

Now (28) and (30) imply that the estimated GIR function is

    \widehat{GIR}_k(x, u) = f̂_{k-1}(x̂_u) - f̂_k(x)
        = f_{k-1}(x̂_u) - f_k(x) + { b_{f,k-1}(x̂_u) - b_{f,k}(x) } h^2
          + [ 1 / { n μ(x̂_u) } ] Σ_{i=m}^{n-k+2} K_h(X_{i-1} - x̂_u) σ_{k-1}(X_{i-1}) U_{i,k-1}
          - [ 1 / { n μ(x) } ] Σ_{i=m}^{n-k+1} K_h(X_{i-1} - x) σ_k(X_{i-1}) U_{i,k} + o_p(h^2)
        = f_{k-1}(x_u) - f_k(x) + [ b_{f,k-1}(x_u) - b_{f,k}(x) ] h^2
          + [ 1 / { n μ(x_u) } ] Σ_{i=m}^{n-k+2} K_h(X_{i-1} - x_u) σ_{k-1}(X_{i-1}) U_{i,k-1}
          - [ 1 / { n μ(x) } ] Σ_{i=m}^{n-k+1} K_h(X_{i-1} - x) σ_k(X_{i-1}) U_{i,k}
          + [ ∂f_{k-1}(x_u) / ∂x_1 ] { f̂(x) - f(x) + σ̂(x) u - σ(x) u } + o_p(h^2),

hence

    \widehat{GIR}_k(x, u) = GIR_k(x, u) + b_{GIR,k}(x, u) h^2 + T_1 + T_2 + T_3 + T_4 + o_p(h^2)                    (31)

where b_{GIR,k}(x, u) is as defined in (13) while

    T_1 = [ 1 / { n μ(x_u) } ] Σ_{i=m}^{n-k+2} K_h(X_{i-1} - x_u) σ_{k-1}(X_{i-1}) U_{i,k-1},
    T_2 = - [ 1 / { n μ(x) } ] Σ_{i=m}^{n-k+1} K_h(X_{i-1} - x) σ_k(X_{i-1}) U_{i,k},
    T_3 = [ ∂f_{k-1}(x_u) / ∂x_1 ] [ 1 / { n μ(x) } ] Σ_{i=m}^{n} K_h(X_{i-1} - x) σ(X_{i-1}) U_i,
    T_4 = [ ∂f_{k-1}(x_u) / ∂x_1 ] [ u / { 2 n μ(x) σ(x) } ] Σ_{i=m}^{n} K_h(X_{i-1} - x) σ^2(X_{i-1}) ( U_i^2 - 1 )                    (32)

by Härdle, Tsybakov and Yang (1998). We now consider the expectations of all products T_i T_j, i, j = 1, ..., 4, which are needed to compute the asymptotic variance.

Lemma 2  First, one has the following five equations:

    E(T_1^2) = ||K||_2^{2m} σ_{k-1}^2(x_u) / { n h^m μ(x_u) } + o( n^{-1} h^{-m} ),
    E(T_2^2) = ||K||_2^{2m} σ_k^2(x) / { n h^m μ(x) } + o( n^{-1} h^{-m} ),
    E(T_3^2) = [ ∂f_{k-1}(x_u)/∂x_1 ]^2 ||K||_2^{2m} σ^2(x) / { n h^m μ(x) } + o( n^{-1} h^{-m} ),
    E(T_4^2) = [ ∂f_{k-1}(x_u)/∂x_1 ]^2 [ u^2 ( m_4 - 1 ) / 4 ] ||K||_2^{2m} σ^2(x) / { n h^m μ(x) } + o( n^{-1} h^{-m} ),
    E(T_3 T_4) = [ ∂f_{k-1}(x_u)/∂x_1 ]^2 [ u m_3 / 2 ] ||K||_2^{2m} σ^2(x) / { n h^m μ(x) } + o( n^{-1} h^{-m} ).                    (33)

Furthermore,

    E(T_1 T_2) = - σ_{k-1,k}(x) I(x = x_u) ||K||_2^{2m} / { n h^m μ(x) } + o( n^{-1} h^{-m} ),
    E(T_1 T_3) = [ ∂f_{k-1}(x_u)/∂x_1 ] σ_{1,k-1}(x) I(x = x_u) ||K||_2^{2m} / { n h^m μ(x) } + o( n^{-1} h^{-m} ),
    E(T_1 T_4) = - [ u/2 ] [ ∂f_{k-1}(x_u)/∂x_1 ] σ_{11,k-1}(x) I(x = x_u) ||K||_2^{2m} / { n h^m μ(x) σ(x) } + o( n^{-1} h^{-m} ).                    (34)
Proof. We take E(T_1 T_3) as an illustration. By the definitions in (32),

    E(T_1 T_3) = [ ∂f_{k-1}(x_u)/∂x_1 ] [ 1 / { n^2 μ(x) μ(x_u) } ] Σ_{i=m}^{n} Σ_{j=m}^{n-k+2} E{ K_h(X_{i-1} - x) K_h(X_{j-1} - x_u) σ(X_{i-1}) σ_{k-1}(X_{j-1}) U_i U_{j,k-1} }.

Take a typical term from the double sum,

    E{ K_h(X_{i-1} - x) K_h(X_{j-1} - x_u) σ(X_{i-1}) σ_{k-1}(X_{j-1}) U_i U_{j,k-1} },

and apply the change of variable X_{i-1} = x + hZ; the term becomes

    (1/h^m) E{ K(Z) K( (X_{j-1} - x_u)/h ) σ(x + hZ) σ_{k-1}(X_{j-1}) U_i U_{j,k-1} }.

If i ≠ j, then X_{j-1} = (Y_{j-1}, ..., Y_{j-m})^T contains variables that are not in X_{i-1}, and so further changes of variable will make the above term of order O(h^{-m+1}). If i < j, then both X_{i-1} and U_i are predictable from Y_{j-1}, ..., Y_{j-m}, ..., and so by the martingale property of U_{j,k-1} the above term equals 0. Similarly, the term equals 0 if i > j + k - 2. Hence the only nonzero terms satisfy 0 ≤ i - j ≤ k - 2, and there are only O(n) such terms. Furthermore, these nonzero terms are of order O(h^{-m+1}) unless i = j. So one has

    E(T_1 T_3) = O( n^{-1} h^{-m+1} ) + [ ∂f_{k-1}(x_u)/∂x_1 ] [ 1 / { n^2 μ(x) μ(x_u) } ] Σ_{i=m}^{n-k+2} E{ K_h(X_{i-1} - x) K_h(X_{i-1} - x_u) σ(X_{i-1}) σ_{k-1}(X_{i-1}) U_i U_{i,k-1} }.

If x = x_u then by definition of σ_{1,k-1}(x),

    E{ σ(X_{i-1}) σ_{k-1}(X_{i-1}) U_i U_{i,k-1} | X_{i-1} } = σ_{1,k-1}(X_{i-1})

and so

    [ ∂f_{k-1}(x)/∂x_1 ] [ 1 / { n^2 μ^2(x) } ] Σ_{i=m}^{n-k+2} E{ K_h^2(X_{i-1} - x) σ(X_{i-1}) σ_{k-1}(X_{i-1}) U_i U_{i,k-1} }
        = [ ∂f_{k-1}(x)/∂x_1 ] [ 1 / { n^2 μ^2(x) } ] Σ_{i=m}^{n-k+2} E{ K_h^2(X_{i-1} - x) σ_{1,k-1}(X_{i-1}) }
        = [ ∂f_{k-1}(x)/∂x_1 ] ||K||_2^{2m} σ_{1,k-1}(x) / { n h^m μ(x) } + o( n^{-1} h^{-m} ).

If x ≠ x_u, using the same change of variable X_{i-1} = x + hZ, one gets

    (1/h^{2m}) E{ K( (X_{i-1} - x)/h ) K( (X_{i-1} - x_u)/h ) σ(X_{i-1}) σ_{k-1}(X_{i-1}) U_i U_{i,k-1} }
        = (1/h^m) E{ K(Z) K( (x - x_u)/h + Z ) σ(x + hZ) σ_{k-1}(x + hZ) U_i U_{i,k-1} }

which is of order o(h^{-m}) as

    sup_{z ∈ IR^m} K(z) K( (x - x_u)/h + z ) -> 0.

The latter follows from the fact that x ≠ x_u makes the maximum of ||z|| and || (x - x_u)/h + z || go to infinity uniformly over z ∈ IR^m, together with the boundedness of K and the fact that lim_{||z|| -> ∞} K(z) = 0. Hence, for x ≠ x_u one has

    E(T_1 T_3) = O( n^{-1} h^{-m+1} ) + o( n^{-1} h^{-m} ).
Lemma 3

    E(T_2 T_3) = - [ ∂f_{k-1}(x_u)/∂x_1 ] σ_{1,k}(x) ||K||_2^{2m} / { n h^m μ(x) } + o( n^{-1} h^{-m} ),                    (35)
    E(T_2 T_4) = - [ u/2 ] [ ∂f_{k-1}(x_u)/∂x_1 ] σ_{11,k}(x) ||K||_2^{2m} / { n h^m μ(x) σ(x) } + o( n^{-1} h^{-m} ).                    (36)

Proof. We prove (35) as an illustration. By the definitions in (32),

    E(T_2 T_3) = - [ ∂f_{k-1}(x_u)/∂x_1 ] [ 1 / { n^2 μ^2(x) } ] Σ_{i=m}^{n} Σ_{j=m}^{n-k+1} E{ K_h(X_{i-1} - x) K_h(X_{j-1} - x) σ(X_{i-1}) σ_k(X_{j-1}) U_i U_{j,k} }

and by the same reasoning as in Lemma 2, one has

    E(T_2 T_3) = - [ ∂f_{k-1}(x_u)/∂x_1 ] [ 1 / { n^2 μ^2(x) } ] Σ_{i=m}^{n-k+1} E{ K_h^2(X_{i-1} - x) σ(X_{i-1}) σ_k(X_{i-1}) U_i U_{i,k} } + o( n^{-1} h^{-m} ).

Note that by definition of σ_{1,k}(x),

    E{ σ(X_{i-1}) σ_k(X_{i-1}) U_i U_{i,k} | X_{i-1} } = σ_{1,k}(X_{i-1})

and so

    E(T_2 T_3) = - [ ∂f_{k-1}(x_u)/∂x_1 ] [ 1 / { n^2 μ^2(x) } ] Σ_{i=m}^{n-k+1} E{ K_h^2(X_{i-1} - x) σ_{1,k}(X_{i-1}) } + o( n^{-1} h^{-m} )
               = - [ ∂f_{k-1}(x_u)/∂x_1 ] σ_{1,k}(x) ||K||_2^{2m} / { n h^m μ(x) } + o( n^{-1} h^{-m} )

which is (35).
Lemma 4

    E( T_1 + T_2 + T_3 + T_4 )^2 = n^{-1} h^{-m} σ_{GIR,k}^2(x, u) + o( n^{-1} h^{-m} )

where σ_{GIR,k}^2(x, u) is as defined in (12).

Proof. This follows from equations (33), (34), (35) and (36), together with

    E( T_1 + T_2 + T_3 + T_4 )^2 = Σ_{i=1}^{4} E(T_i^2) + 2 Σ_{i<j} E(T_i T_j).

Proof of Theorem 1.