Regression-type inference in nonparametric autoregression

Michael H. Neumann, Humboldt-Universität zu Berlin, Sonderforschungsbereich 373, Spandauer Straße 1, D-10178 Berlin, Germany

Jens-Peter Kreiss, Technische Universität Braunschweig, Pockelsstraße 14, D-38106 Braunschweig, Germany

1991 Mathematics Subject Classification. Primary 62G07, 62M05; secondary 62G09, 62G15. Keywords and Phrases. Nonparametric autoregression, nonparametric regression, strong approximation, bootstrap, wild bootstrap, confidence bands. Short title. Regression approximates autoregression. The second author gratefully acknowledges the hospitality and support of the SFB 373 at Humboldt University and of the Weierstrass Institute, Berlin.


1. Introduction

Autoregressive models form an important class of processes in time series analysis. A nonparametric version of these models was introduced by Jones (1978). To allow for heteroscedastic modelling of the innovations, one often considers the model
$$X_t = m(X_{t-1},\ldots,X_{t-p}) + v(X_{t-1},\ldots,X_{t-q})\,\varepsilon_t, \qquad (1.1)$$
where the $\varepsilon_t$ are assumed to be i.i.d. with mean 0 and variance 1. Several authors dealt with the interesting statistical problem of estimating the autoregression function $m$ nonparametrically. Robinson (1983), Tjøstheim (1994) and Masry and Tjøstheim (1995) considered usual Nadaraya-Watson estimators. Recently Masry (1996) and Härdle and Tsybakov (1997) investigated local polynomial estimators in this context. For some particular purposes of statistical inference, like the construction of confidence sets and tests of hypotheses, it is also important to gain knowledge about the statistical properties of the underlying estimator. Franke, Kreiss and Mammen (1996) considered time-series specific as well as regression-type bootstrap methods for model (1.1), and showed their consistency for the pointwise behaviour of kernel smoothers of $m$. One of our goals is to show the validity of one of these bootstrap methods for statistics which concern the joint distribution of nonparametric estimators. This is motivated by potential applications to simultaneous confidence bands and nonparametric tests.

In this paper, we also try to consider the situation from a more general point of view. We first show the closeness, in an appropriate sense, of a model like (1.1) to a corresponding regression model. To simplify notation, we restrict ourselves to the case of one lag, that is $p = q = 1$. Without additional effort, we may allow the whole distribution of $\varepsilon_t$ to depend on $X_{t-1}$. Accordingly, our basic assumption is that $X_0,\ldots,X_T$ is a realization from a strictly stationary time-homogeneous Markov chain. The validity of our regression-type bootstrap is based on a strong approximation of a certain nonparametric estimator $\{\hat m_h(x)\}_{x\in I}$, construed as a process in $x$, by a corresponding estimator $\{\tilde m_h(x)\}_{x\in I}$ in a regression model, where $I$ is a certain interval of interest. As a typical nonparametric estimator, we consider a local polynomial estimator. To this end, we first construct a strong approximation of partial sums w.r.t. dyadic intervals, $Z_{j,k} = \sum_{t:\,X_{t-1}\in I_{j,k}} \varepsilon_t$, $I_{j,k} = [F_X^{-1}((k-1)2^{-j}),\, F_X^{-1}(k2^{-j}))$, where $F_X$ is the common cumulative distribution function of the $X_t$, by respective partial sums in a regression model. Such a partial sum approximation w.r.t. dyadic intervals is known to be a powerful tool for proving strong approximations in several instances; the best-known example is the seminal paper by Komlós, Major and Tusnády (1975). We achieve an approximate pairing of the random variables in the autoregressive model with the random variables in the regression model by Skorokhod embedding of the innovations/errors in a common set of Wiener processes assigned to the intervals $I_{j,k}$. Note that, in contrast to recent work of Brown and Low (1996) and Nussbaum (1996), who provide approximations of nonparametric experiments by simpler ones, we derive more practically oriented approximations of nonparametric estimators by their counterparts in observation models of simpler structure.
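To fix ideas, here is a minimal simulation sketch of model (1.1) with one lag ($p = q = 1$); the particular mean and volatility functions below are purely illustrative assumptions of ours, not taken from the paper:

```python
import numpy as np

def simulate_nar(T, m, v, burn_in=200, rng=None):
    """Draw X_0, ..., X_T from X_t = m(X_{t-1}) + v(X_{t-1}) * eps_t, eps_t i.i.d. N(0, 1)."""
    rng = np.random.default_rng(rng)
    x = np.zeros(T + burn_in + 1)
    for t in range(1, len(x)):
        x[t] = m(x[t - 1]) + v(x[t - 1]) * rng.standard_normal()
    return x[burn_in:]  # discard the burn-in so the chain is close to stationarity

# Illustrative choices: bounded conditional mean, ARCH-type conditional volatility.
X = simulate_nar(500, m=lambda u: np.tanh(u), v=lambda u: np.sqrt(1 + 0.5 * u**2))
```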


There will be several interesting consequences of our strong approximation result. Besides our particular use as a first step in proving the validity of a certain regression-type bootstrap, one can immediately derive similarities between properties of estimators in both models. For example, the pointwise or the integrated risks of corresponding nonparametric estimators are asymptotically the same. Moreover, one may also consider this result as a characterization of some kind of asymptotic equivalence of nonparametric autoregression and nonparametric regression concerning statistical inference about the autoregression/regression function. Accordingly, this provides a justification for the use of regression-type methods in the context of nonparametric autoregression, a practice that has been common for a long time.

The second step in proving the validity of the bootstrap proposal consists of constructing a strong approximation between the stochastic part of the local polynomial estimator (LPE) in the regression model and its bootstrap counterpart. Such an approximation has already been derived in a similar context in Neumann and Polzehl (1995), and we will borrow the corresponding result from there. Both strong approximations together yield the desired strong approximation of the stochastic part of the LPE $\hat m_h(x)$ by the bootstrap process. We apply this result to the construction of nonparametric confidence bands and supremum-type tests. The good rate for the approximation error suggests that the wild bootstrap is valid for several other purposes, too. Kreiss, Neumann and Yao (1997) prove this for $L_2$-type tests similar to those developed by Härdle and Mammen (1993) in the regression case.

The paper is organized as follows. In Section 2 we present the strong approximation result for partial sums on dyadic intervals (Theorem 2.1). This implies a strong approximation of an LPE in nonparametric autoregression by an LPE in nonparametric regression. Section 3 contains the wild bootstrap proposal and the corresponding strong approximation. We describe the application of the results to simultaneous bootstrap confidence bands and nonparametric tests. Further, we briefly discuss the higher-dimensional case and robustness properties against possible deviations from our structural model assumption. In Section 4 we present some simulation results in order to demonstrate the finite sample behavior of our proposal. All proofs are deferred to a final Section 5.

2. Approximation of nonparametric autoregression by nonparametric regression

Assume we observe a stretch $X_0,\ldots,X_T$ of a strictly stationary time-homogeneous Markov chain. We are interested in estimating the autoregression function $m(x) = E(X_t \mid X_{t-1} = x)$. First, we write the data generating process in the form of a nonparametric autoregressive model,
$$X_t = m(X_{t-1}) + \varepsilon_t, \qquad t = 1,\ldots,T, \qquad (2.1)$$
where the distribution of $\varepsilon_t$ is allowed to depend on $X_{t-1}$ with
$$E(\varepsilon_t \mid X_{t-1}) = 0, \qquad E(\varepsilon_t^2 \mid X_{t-1}) = v(X_{t-1}).$$
The conditional variance $v(X_{t-1})$ is assumed to be bounded away from zero and infinity on compact intervals. Note that, in contrast to the frequently used assumption of errors of the form $\sigma(X_{t-1})e_t$ with i.i.d. $e_t$'s, the errors here can follow completely different distributions and are not necessarily independent. Such a dependence may arise because the distribution of $\varepsilon_t$ depends, through $X_{t-1}$, on $X_0$ and $\varepsilon_1,\ldots,\varepsilon_{t-1}$. To ensure recurrence, we assume that

(A1) $\{X_t : t \ge 0\}$ is a (strictly) stationary time-homogeneous Markov chain. We denote by $F_X$ the common cumulative distribution function of the $X_t$, which is assumed to be continuous. Furthermore, we assume absolute regularity (i.e. $\beta$-mixing) for $\{X_t\}$ and that the $\beta$-mixing coefficients decay at a geometric rate.

Remark 1. For the definition of $\beta$-mixing we refer to the monograph by Doukhan (1994, Chapter 1). Assumption (A1) is for example fulfilled if we assume the following explicit structure of the data-generating process:
$$X_t = m(X_{t-1}) + \sigma(X_{t-1})\,e'_t, \qquad (2.2)$$
where $\sigma : \mathbb{R} \to (0,\infty)$ and the $e'_t$ denote i.i.d. innovations with zero mean and unit variance. If one assumes further that
$$\limsup_{|x|\to\infty} \frac{E\,|m(x) + \sigma(x)e'_1|}{|x|} < 1$$
and that the distribution of $e'_1$ possesses a nowhere vanishing Lebesgue density, then one may conclude that $\{X_t\}$ defined according to (2.2) is geometrically ergodic (cf. Doukhan (1994, pp. 106-107)), which implies geometric $\beta$-mixing if the chain is stationary, i.e. $X_0 \sim F_X$. Moreover, the assumption of an everywhere positive density of the innovations can be relaxed. Besides a usual drift condition towards a certain compact set $K$, it is enough that $\mathcal{L}(X_t \mid X_{t-1} = x)$ has a conditional density $p_{X_t \mid X_{t-1}}(y \mid x)$ bounded away from zero for $x, y \in K$ and $|x - y| \le \delta$ for some $\delta > 0$; for details see Franke, Kreiss, Mammen and Neumann (1997). The assumption that the chain is stationary may be avoided since, for any initial distribution, we have geometric convergence to the unique stationary distribution by geometric ergodicity. Nevertheless, we assume throughout the whole paper that the underlying Markov chain is stationary. Assumption (A1) will be used to show auxiliary results, for instance that sums of random variables derived from the $X_t$ behave similarly to the independent case; see Lemma 2.1 below. Hence, it becomes clear that other mixing conditions are possible as well; geometric absolute regularity is merely assumed for convenience.

Although it is perhaps more natural to approximate nonparametric autoregression by nonparametric regression with random design, we establish here an approximation by nonparametric regression with nonrandom design. This is done in view of the proposed bootstrap method, which mimics just nonparametric regression with nonrandom design. Let $\{x_0,\ldots,x_{T-1}\}$ be a fixed realization of $\{X_0,\ldots,X_{T-1}\}$. As a counterpart to (2.1) we consider the nonparametric regression model
$$Y_t = m(x_{t-1}) + \eta_t, \qquad t = 1,\ldots,T, \qquad (2.3)$$


where the $\eta_t$'s are independent with $\eta_t \sim \mathcal{L}(\varepsilon_t \mid X_{t-1} = x_{t-1})$. Here we denote the independent variables by small letters to underline the fact that we consider the distribution of the $Y_t$'s conditioned on a fixed realization of $\{X_0,\ldots,X_{T-1}\}$.

Before we turn to the main result of this subsection, we introduce some more notation and provide a useful auxiliary lemma. If we compare the cumulative distribution functions of two random variables, then we can expect that they are close to each other if the difference between the random variables is small with high probability. Because of the frequent use of this fact we formalize it by introducing the following notion.

Definition 2.1. Let $\{Z_T\}$ be a sequence of random variables and let $\{\gamma_T\}$ and $\{\delta_T\}$ be sequences of positive reals. We write
$$Z_T = \tilde O(\gamma_T, \delta_T), \quad \text{if} \quad P(|Z_T| > C\gamma_T) \le C\delta_T \qquad (2.4)$$
holds for $T \ge 1$ and some $C < \infty$.

This definition is obviously stronger than the usual $O_P$, and it is well suited for our particular purpose of constructing confidence bands and critical values for tests; see the applications in Section 4. Whenever we claim that $\tilde O$ holds uniformly over a certain set, we mean that (2.4) is true with the same constant $C$. Here and in the following we make the convention that $\kappa$ denotes a positive but arbitrarily small, and $\lambda$ an arbitrarily large constant.

Before we turn to the strong approximation for the partial sums, we state quite a useful lemma about the stochastic behaviour of sums of geometrically $\beta$-mixing random variables.

Lemma 2.1. Suppose that $(Z_t)_{t=1,\ldots,T}$ is geometrically $\beta$-mixing and $EZ_t = 0$.
(i) If $|Z_t| \le 1$ almost surely, then
$$\sum_{t=1}^T Z_t = \tilde O\Big( \min\Big\{ \sqrt{T\log T},\ \sqrt{\textstyle\sum_t \operatorname{var}(Z_t)}\,\log T + (\log T)^2 \Big\},\ T^{-\lambda} \Big).$$
(ii) Under the weaker assumption that for all $M < \infty$ there exists $C_M < \infty$ such that $E|Z_t|^M \le C_M$, we have
$$\sum_{t=1}^T Z_t = \tilde O\Big( \sqrt{\textstyle\sum_t \operatorname{var}(Z_t)}\,\log T + T^\kappa,\ T^{-\lambda} \Big).$$

Remark 2. If the $Z_t$ were independent, we would obtain similar bounds where some of the logarithmic factors were improved. In the first case, Bernstein's inequality would immediately yield a stochastic upper bound of order $\tilde O\big(\sqrt{\sum_t \operatorname{var}(Z_t)\log T} + \log T,\ T^{-\lambda}\big)$; see, for example, Neumann (1996). In the second case, since $\sup\{|Z_t|\} = \tilde O(T^\kappa, T^{-\lambda})$, we would obtain that $\sum Z_t = \tilde O\big(\sqrt{\sum_t \operatorname{var}(Z_t)\log T} + T^\kappa,\ T^{-\lambda}\big)$. The additional logarithmic terms arise because of the blocking technique used to handle the weak dependence. To ensure the desired behaviour of weighted sums of the $\varepsilon_t$'s and $\eta_t$'s, respectively, we impose the following condition.

(A2) For all $M < \infty$ there exist finite constants $C_M$ such that
$$\sup_{x\in\mathbb{R}} E\big( |\varepsilon_t|^M \mid X_{t-1} = x \big) \le C_M.$$

Actually, it can be seen from the proofs that a certain finite number $M$ of uniformly bounded moments would suffice. However, it seems to be difficult to get a minimal value for $M$, and therefore we do not attempt to give a particular value for it.

2.1. Approximation of partial sums w.r.t. dyadic intervals. The ultimate goal in the present paper is to show the validity of the wild bootstrap for statistics occurring with nonparametric estimators of the autoregression function $m$ beyond the pointwise behaviour of these estimators. To this end, we show in this section that the joint (in $x$) distribution of an LPE $\hat m_h(x)$ can be approximated by the joint distribution of an analogous estimator $\tilde m_h(x)$ defined in a corresponding nonparametric regression model.

It will be shown in the next subsection that $\hat m_h(x)$ can be well approximated by $\bar m(x) = \sum_t w_h(x, X_{t-1}) X_t$, where the weight function $w_h(x, X_{t-1})$ depends only on the spatial position $x$ and a single observation $X_{t-1}$. $\bar m(x)$ can be decomposed into a purely stochastic part $\sum_t w_h(x, X_{t-1})\varepsilon_t$ and an essentially nonrandom part $\sum_t w_h(x, X_{t-1}) m(X_{t-1})$. Whereas the treatment of the latter part will not cause any problems, the approximation of $\sum_t w_h(x, X_{t-1})\varepsilon_t$ by its counterpart $\sum_t w_h(x, x_{t-1})\eta_t$ requires substantially more work.

To formalize such an approximation, we construct, on a sufficiently rich probability space, a pairing of the random vector $(X'_0, \varepsilon'_1, \ldots, \varepsilon'_T)$ with another vector $(\eta'_1, \ldots, \eta'_T)$ such that
1) $(X'_0, \varepsilon'_1, \ldots, \varepsilon'_T) \stackrel{d}{=} (X_0, \varepsilon_1, \ldots, \varepsilon_T)$,
2) $(\eta'_1, \ldots, \eta'_T) \stackrel{d}{=} (\eta_1, \ldots, \eta_T)$, and
3) $\sup_{x\in I} \big\{ \big| \sum_t w_h(x, X'_{t-1})\varepsilon'_t - \sum_t w_h(x, x_{t-1})\eta'_t \big| \big\}$ is small with high probability,
where $I$ is a certain interval of interest. To facilitate notation, we will not distinguish between the original random variables $X_t, \varepsilon_t, \eta_t$ and their artificial counterparts $X'_t, \varepsilon'_t, \eta'_t$, and simply use the notation without the prime.

A first step toward an approximation as in 3) is an approximation of partial sums $\sum_{t:\,X_{t-1}\in I_l} \varepsilon_t$ by $\sum_{t:\,x_{t-1}\in I_l} \eta_t$, where $\{I_l\}$ is an appropriate set of intervals. Using Skorokhod embedding techniques as in the proof of Theorem 2.1, it is rather straightforward to establish such an approximation for nonoverlapping intervals $I_l = [(l-1)\delta_T, l\delta_T)$. This can imply a significant result for the difference in 3) if $\delta_T$ tends to zero at a faster rate than the bandwidth of the LPE.
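As a concrete illustration of these objects, the following sketch computes the partial sums $Z_{j,k}$ of the innovations over the dyadic cells $I_{j,k}$; replacing $F_X$ by the empirical distribution function of the lagged observations is our own simplification to make the quantity computable:

```python
import numpy as np

def dyadic_partial_sums(x_lag, eps, j):
    """Z_{j,k} = sum of eps_t over all t with F_X(x_{t-1}) in [(k-1)2^{-j}, k 2^{-j})."""
    T = len(eps)
    ranks = np.argsort(np.argsort(x_lag))             # 0, ..., T-1
    u = ranks / T                                     # empirical-CDF values in [0, 1)
    k = np.minimum((u * 2**j).astype(int), 2**j - 1)  # dyadic cell of each observation
    z = np.zeros(2**j)
    np.add.at(z, k, eps)                              # accumulate eps_t cell by cell
    return z                                          # z[k-1] corresponds to Z_{j,k}
```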


However, as will be explained below, one often gets a smaller approximation error by specific constructions that yield simultaneously good approximations of partial sums w.r.t. intervals of different lengths. The perhaps best-known example is the strong approximation for empirical processes of i.i.d. random variables by Komlós et al. (1975). We first establish a multiscale approximation for partial sums w.r.t. dyadic intervals $I_{j,k} = [F_X^{-1}((k-1)2^{-j}),\, F_X^{-1}(k2^{-j}))$, $(j,k) \in \mathcal{I}_T$, where $\mathcal{I}_T = \{(j,k) \mid 0 \le j \le j^*,\ 1 \le k \le 2^j\}$, $2^{j^*} \le T < 2^{j^*+1}$. [We set $F_X^{-1}(0) = -\infty$ and $F_X^{-1}(1) = \infty$.] This implies in particular a useful strong approximation w.r.t. the "natural" dyadic intervals $[(k-1)2^{-j}, k2^{-j})$. Let
$$Z_{j,k} = \sum_{t:\,X_{t-1}\in I_{j,k}} \varepsilon_t \qquad\text{and}\qquad Z'_{j,k} = \sum_{t:\,x_{t-1}\in I_{j,k}} \eta_t$$

be partial sums of the errors according to the autoregressive model (2.1) and the regression model (2.3), respectively. The link between the two sampling schemes (2.1) and (2.3) will be reached by Skorokhod embeddings of the $\varepsilon_t$'s and $\eta_t$'s, respectively, in a common set of Wiener processes. Such an embedding was introduced by Skorokhod (1965) for independent random variables and is known as a possible tool to derive strong approximations for partial sums of independent random variables; cf. Csörgő and Révész (1981, Chapter 2). Later the technique was extended to martingales by several authors; a convenient description of the main ideas can be found in Hall and Heyde (1980, Appendix A.1).

Theorem 2.1. Suppose that (A1) and (A2) are fulfilled. On an appropriate probability space, there exists a pairing of the random variables from (2.1) with those from (2.3) such that
$$P\Big( \big| Z_{j,k} - Z'_{j,k} \big| > C\big[ (T2^{-j})^{1/4}\log T + T^\kappa \big] \ \text{ for any } (j,k) \in \mathcal{I}_T \Big) = O(T^{-\lambda})$$
holds uniformly in $(x_0,\ldots,x_{T-1}) \in \Omega_T$ for an appropriate set $\Omega_T$ with $P((X_0,\ldots,X_{T-1}) \notin \Omega_T) = O(T^{-\lambda})$.

From this approximation w.r.t. the intervals $I_{j,k}$, we can immediately derive an analogous approximation for arbitrary intervals. This includes as special cases the natural dyadic intervals $[(k-1)2^{-j}, k2^{-j})$.

Corollary 2.1. Suppose that (A1) and (A2) are fulfilled, and let $\Omega_T$ be as in Theorem 2.1. Then there exists a pairing of the random variables from (2.1) and (2.3) such that
$$P\left( \sup_{-\infty \le c < d \le \infty} \left\{ \frac{\big| \sum_{t:\,X_{t-1}\in[c,d)} \varepsilon_t - \sum_{t:\,x_{t-1}\in[c,d)} \eta_t \big|}{[T\,P(X \in [c,d))]^{1/4}\log T + T^\kappa} \right\} > C \right) = O(T^{-\lambda})$$
holds uniformly in $(x_0,\ldots,x_{T-1}) \in \Omega_T$. An analogous bound holds for weighted sums over functions $w$ of bounded total variation:
$$P\left( \sup_{w\in\mathcal{W}} \left\{ \frac{\big| \sum_t w(X_{t-1})\varepsilon_t - \sum_t w(x_{t-1})\eta_t \big|}{T^{1/4}[TV(w)]^{3/4}\|w\|_1^{1/4}\log T + TV(w)\,T^\kappa} \right\} > C \right) = O(T^{-\lambda})$$
holds uniformly in $(x_0,\ldots,x_{T-1}) \in \Omega_T$.

2.2. Strong approximation for local polynomial estimators. We intend to construct an asymptotic confidence band for the conditional mean function $m$. This makes sense for a region where we have enough information about $m$. To facilitate the technical calculations, we assume

(A3) The stationary density $p_X$ of $X_t$ is bounded and satisfies $p_X(x) \ge C > 0$ for all $x \in [a,b]$,

and construct a confidence band for this interval $[a,b]$.

We focus our attention on so-called local polynomial estimators. These estimators were introduced in a paper by Stone (1977). Tsybakov (1986), Korostelev and Tsybakov (1993, Chapter 1), Fan (1992, 1993) and Fan and Gijbels (1992, 1995) discuss the behavior of LPEs for nonparametric regression in full detail. Recently Masry (1996) and Härdle and Tsybakov (1997) applied LPEs to nonparametric autoregressive models. A $p$-th order local polynomial estimator $\hat m_h(x)$ of $m(x)$ is given as $\hat a_0 = \hat a_0(x; X_0,\ldots,X_T)$, where $\hat a = (\hat a_0,\ldots,\hat a_{p-1})'$ minimizes
$$M_x = \sum_{t=1}^T K\Big(\frac{x - X_{t-1}}{h}\Big) \Big( X_t - \sum_{q=0}^{p-1} a_q \Big(\frac{x - X_{t-1}}{h}\Big)^q \Big)^2. \qquad (2.5)$$
At the moment we only assume that the bandwidth $h$ of the local polynomial estimator satisfies $h = O(T^{-\kappa})$ and $h^{-1} = O(T^{1-\kappa})$ for some $\kappa > 0$. We assume that the kernel $K$ is a nonnegative function of bounded total variation with $\mathrm{supp}(K) \subseteq [-1,1]$, which is bounded away from zero on a set of positive Lebesgue measure. We do not impose any further smoothness conditions on $K$, because only a particular choice of $p$, which makes a certain rate of convergence possible, can be motivated from the estimation point of view. From least-squares theory it is clear that $\hat m_h$ can be written as
$$\hat m_h(x) = \sum_{t=1}^T w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\})\, X_t = \big[ (D_x' K_x D_x)^{-1} D_x' K_x X \big]_1, \qquad (2.6)$$
where $X = (X_1,\ldots,X_T)'$,
$$D_x = \begin{pmatrix} 1 & \frac{x-X_0}{h} & \cdots & \big(\frac{x-X_0}{h}\big)^{p-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \frac{x-X_{T-1}}{h} & \cdots & \big(\frac{x-X_{T-1}}{h}\big)^{p-1} \end{pmatrix}, \qquad K_x = \mathrm{Diag}\Big( K\big(\tfrac{x-X_0}{h}\big), \ldots, K\big(\tfrac{x-X_{T-1}}{h}\big) \Big),$$
and $[\,\cdot\,]_1$ denotes the first entry of a vector. It is shown at the end of the proof of Lemma 2.2 that $(D_x' K_x D_x)^{-1}$ exists with a probability exceeding $1 - O(T^{-\lambda})$.

At first sight the analysis of $\hat m_h$ seems to be quite involved, because the $X_t$'s are dependent and enter into the right-hand side of (2.6) several times. To simplify the investigation of the deviation process (in $x$), $\{\hat m_h(x) - m(x)\}_{x\in[a,b]}$, we approximate it by an analogous deviation process defined by observations according to the nonparametric regression model (2.3). In analogy to (2.6) we define a local polynomial estimator as
$$\tilde m_h(x) = \sum_{t=1}^T w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\, Y_t. \qquad (2.7)$$
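A compact sketch of the estimator (2.5)-(2.6); the Epanechnikov-type kernel is an illustrative choice of ours satisfying the stated conditions on $K$ (nonnegative, bounded variation, support $[-1,1]$):

```python
import numpy as np

def lpe(x, x_lag, x_resp, h, p=2, K=lambda u: np.maximum(1 - u**2, 0.0)):
    """p-th order local polynomial estimate of m(x), cf. (2.5)-(2.6)."""
    u = (x - x_lag) / h
    Dx = np.vander(u, N=p, increasing=True)   # rows (1, u_t, ..., u_t^{p-1})
    kx = K(u)                                 # diagonal entries of K_x
    A = Dx.T @ (kx[:, None] * Dx)             # D_x' K_x D_x
    b = Dx.T @ (kx * x_resp)                  # D_x' K_x X
    return np.linalg.solve(A, b)[0]           # first entry, i.e. the estimate of m(x)

# Usage on the lag pairs (X_{t-1}, X_t) of a series X, over a grid in [a, b]:
# m_hat = [lpe(g, X[:-1], X[1:], h=0.5) for g in np.linspace(a, b, 101)]
```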

Before we turn to the main approximation step, we first derive some approximations to $\hat m_h$ and $\tilde m_h$ which allow us to replace the local polynomial estimators by quantities of a simpler structure. According to (2.6), the weights of the local polynomial estimator can be written as
$$w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\}) = \sum_{q=0}^{p-1} d_q(x; \{X_0,\ldots,X_{T-1}\})\, K\Big(\frac{x - X_{t-1}}{h}\Big) \Big(\frac{x - X_{t-1}}{h}\Big)^q, \qquad (2.8)$$
where $d_q(x; \{X_0,\ldots,X_{T-1}\}) = \big( (D_x' K_x D_x)^{-1} \big)_{1,q+1}$. ($M_{ij}$ denotes the $(i,j)$-th entry of a matrix $M$.) The functions $d_q$ depend on $\{X_0,\ldots,X_{T-1}\}$ in a smooth manner ("smooth" is meant in the sense of bounded total variation, which leads to appropriately decaying coefficients in a Haar series expansion), which yields the following nonrandom approximation:

Lemma 2.2. Assume (A1) and (A3). Then there exist nonrandom functions $d_q^{(1)}(x)$, $d_q^{(1)}(x) = \big( (E D_x' K_x D_x)^{-1} \big)_{1,q+1} = O((Th)^{-1})$, such that
$$\sup_{x\in[a,b]} \Big\{ \big| d_q(x; \{X_0,\ldots,X_{T-1}\}) - d_q^{(1)}(x) \big| \Big\} = \tilde O\big( (Th)^{-3/2}\log T,\ T^{-\lambda} \big).$$

This lemma allows us to introduce weights $w_h(x, X_{t-1})$, which depend only on the single value $X_{t-1}$, namely
$$w_h(x, X_{t-1}) = \sum_{q=0}^{p-1} d_q^{(1)}(x)\, K\Big(\frac{x - X_{t-1}}{h}\Big) \Big(\frac{x - X_{t-1}}{h}\Big)^q. \qquad (2.9)$$


Now we obtain the following assertions, which finally allow us to consider the difference between $\sum_t w_h(x, X_{t-1})\varepsilon_t$ and $\sum_t w_h(x, x_{t-1})\eta_t$ rather than that between the more involved quantities $\hat m_h(x)$ and $\tilde m_h(x)$.

Proposition 2.1. Suppose that (A1) to (A3) are fulfilled. Then
$$\sup_{x\in[a,b]} \Big\{ \Big| \sum_t \big[ w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\}) - w_h(x, X_{t-1}) \big] \varepsilon_t \Big| \Big\} = \tilde O\big( (Th)^{-1}(\log T)^{3/2},\ T^{-\lambda} \big).$$
Analogously,
$$\sup_{x\in[a,b]} \Big\{ \Big| \sum_t \big[ w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\}) - w_h(x, x_{t-1}) \big] \eta_t \Big| \Big\} = \tilde O\big( (Th)^{-1}(\log T)^{3/2},\ T^{-\lambda} \big)$$
holds uniformly in $(x_0,\ldots,x_{T-1}) \in \Omega_T$, where $\Omega_T$ is an appropriate set with $P((X_0,\ldots,X_{T-1}) \notin \Omega_T) = O(T^{-\lambda})$.

For the next assertion, concerning a term which plays a role similar to the usual bias term in nonparametric regression, we need the following assumption.

(A4) $m$ is $p$-times differentiable with $\sup_{x\in[a-\delta,\,b+\delta]}\{|m^{(p)}(x)|\} < \infty$ for some $\delta > 0$.

Proposition 2.2. Suppose that (A1), (A3) and (A4) are fulfilled. As an approximation to the bias-type term we consider the nonrandom quantity
$$b_1(x) = \sum_{q=0}^{p-1} d_q^{(1)}(x) \sum_t E\Big\{ K\Big(\frac{x - X_{t-1}}{h}\Big) \Big(\frac{x - X_{t-1}}{h}\Big)^q \int_x^{X_{t-1}} \frac{(X_{t-1} - s)^{p-1}}{(p-1)!}\, m^{(p)}(s)\, ds \Big\}.$$
Then $\sup_{x\in[a,b]}\{|b_1(x)|\} = O(h^p)$ and
$$\sup_{x\in[a,b]} \Big\{ \Big| \sum_t w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\})\, m(X_{t-1}) - m(x) - b_1(x) \Big| \Big\} = \tilde O\big( h^p (Th)^{-1/2}\log T,\ T^{-\lambda} \big).$$

To establish now the desired approximation of $\sum_t w_h(x, X_{t-1})\varepsilon_t$ by $\sum_t w_h(x, x_{t-1})\eta_t$, we only have to find upper bounds for the total variation and the $L_1$-norm of $w_h(x, \cdot)$. This leads to the following assertion.

Proposition 2.3. Suppose that (A1) to (A3) are fulfilled, and let $\Omega_T$ be as in Theorem 2.1. Then there exists a pairing of the random variables from (2.1) and (2.3) such that
$$\sup_{x\in[a,b]} \Big\{ \Big| \sum_t w_h(x, X_{t-1})\varepsilon_t - \sum_t w_h(x, x_{t-1})\eta_t \Big| \Big\} = \tilde O\big( (Th)^{-3/4}\log T,\ T^{-\lambda} \big)$$
holds uniformly in $(x_0,\ldots,x_{T-1}) \in \Omega_T$.

The approximations given in Propositions 2.1, 2.2 and 2.3 also lead to the approximation of an LPE in nonparametric autoregression by an LPE in nonparametric regression.

Theorem 2.2. Suppose that (A1) to (A4) are fulfilled, and let $\Omega_T$ be as in Theorem 2.1. Then there exists a pairing of the random variables from (2.1) and (2.3) such that
$$\sup_{x\in[a,b]} \{ |\hat m_h(x) - \tilde m_h(x)| \} = \tilde O\big( h^p (Th)^{-1/2}\log T + (Th)^{-3/4}\log T,\ T^{-\lambda} \big)$$
holds uniformly in $(x_0,\ldots,x_{T-1}) \in \Omega_T$.

Besides the technical quantification of a certain upper bound on the rate of approximation of $\hat m_h(x)$ by $\tilde m_h(x)$, the more important fact is that the difference between $\hat m_h(x)$ and $\tilde m_h(x)$ is of smaller order than the stochastic fluctuations of $\hat m_h(x)$, which are $O_P((Th)^{-1/2})$. This can be interpreted as some kind of asymptotic equivalence of nonparametric autoregression and nonparametric regression. It is the first step in proving the validity of a regression-type bootstrap, the so-called wild bootstrap, in nonparametric autoregression. With a simple extra argument, it can also be used for proving asymptotic equivalence of the mean square error of nonparametric estimators in models (2.1) and (2.3).

Remark 3. As was already mentioned, it would perhaps have been more natural to approximate nonparametric autoregression by nonparametric regression with random design. That is, instead of (2.3) we would consider the nonparametric regression model
$$Z_t = m(Y_t) + \eta_t, \qquad t = 1,\ldots,T, \qquad (2.10)$$
where the pairs $(Y_t, Z_t)$ are i.i.d. according to the stationary distribution of the vector $(X_{t-1}, X_t)$ in model (2.1). Let $\bar m_h(x)$ be the local polynomial estimator in model (2.10), which is defined analogously to (2.7). It is easily seen that the statement in Theorem 2.1 implies the asymptotic equivalence of LPEs in models (2.1) and (2.10). Strictly speaking, under (A1) to (A4) there exists a pairing of the random variables from (2.1) with those of (2.10) such that
$$\sup_{x\in[a,b]} \{ |\hat m_h(x) - \bar m_h(x)| \} = \tilde O\big( h^p (Th)^{-1/2}\log T + (Th)^{-3/4}\log T,\ T^{-\lambda} \big).$$

3. The bootstrap

To motivate the particular resampling scheme proposed here, first note the different nature of the stochastic and the "bias-type" term. Even if the current value of the stochastic term is unknown, its distribution can be consistently mimicked by the bootstrap. In contrast, the bias can only be handled if some degrees of smoothness of $m$ are not used by $\hat m_h(x)$. In nonparametric regression and density estimation, there exist two main approaches to handle the bias problem: undersmoothing and explicit bias correction.

We mimic only the stochastic term of the LPE in the bootstrap world, and we will use separate adjustments for the bias. In view of the possibly inhomogeneous conditional variances, we use here the wild bootstrap technique, which was introduced by Wu (1986). A detailed description of this resampling scheme can be found in the monograph by Mammen (1992). It has been used successfully in nonparametric regression in the already mentioned paper by Härdle and Mammen (1993).

Let $(x_0,\ldots,x_T)$ be the realization of $(X_0,\ldots,X_T)$ at hand. We generate independent bootstrap innovations $\varepsilon_1^*,\ldots,\varepsilon_T^*$ with
$$E^* \varepsilon_t^* = 0, \qquad E^* (\varepsilon_t^*)^2 = \hat\varepsilon_t^2 = (x_t - \hat m_h(x_{t-1}))^2.$$
The notation $E^*$ is used to underline the conditional character of the distribution $\mathcal{L}(\varepsilon_1^*,\ldots,\varepsilon_T^* \mid X_0,\ldots,X_T)$. An appropriate counterpart to model (2.3) in the bootstrap world is given by
$$X_t^* = \hat m_h(x_{t-1}) + \varepsilon_t^*, \qquad t = 1,\ldots,T.$$
Since we mimic only the stochastic term $\sum_t w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\})\varepsilon_t$ of the local polynomial estimator, we do not use the $X_t^*$'s explicitly.

3.1. A strong approximation for the bootstrap process. In order to be able to apply devices such as Lemma 2.1, we have to ensure that for all integers $M$ there exists a finite constant $C_M > 0$ such that
$$E^* |\varepsilon_t^*|^M \le C_M |\hat\varepsilon_t|^M.$$
This can be ensured if we assume that $\varepsilon_t^* = \hat\varepsilon_t \zeta_t$ for a sequence of i.i.d. random variables $\zeta_1,\ldots,\zeta_T$ with $E\zeta_1 = 0$, $E(\zeta_1)^2 = 1$, and $E|\zeta_1|^M < \infty$ for all integers $M$.

For getting the desired strong approximation of $\sum_t w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\})\varepsilon_t$ by $\sum_t w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\varepsilon_t^*$, it remains to establish a connection between the latter process (in $x$) and $\sum_t w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\eta_t$. This can be achieved by a construction along the lines of the proof of Theorem 2.1 in Neumann and Polzehl (1995). In contrast to the strong approximation of nonparametric autoregression by nonparametric regression, where we had to devise an embedding scheme which takes the consecutive order of the observations into account, we here have to connect two models with independent observations. Based on Sakhanenko's (1991) strong approximation result for partial sums of independent random variables, we can derive a strong approximation with a smaller error than in our Theorem 2.1.

Lemma 3.1. Suppose that (A2) is fulfilled. On a sufficiently rich probability space, there exists a pairing of $\eta_1,\ldots,\eta_T$ with $\varepsilon_1^*,\ldots,\varepsilon_T^*$ such that
$$\sup_{x\in[a,b]} \Big\{ \Big| \sum_t w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\eta_t - \sum_t w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\varepsilon_t^* \Big| \Big\} = \tilde O\big( (Th)^{-1} T^\kappa,\ T^{-\lambda} \big)$$
holds uniformly in $(x_0,\ldots,x_{T-1}) \in \Omega_T$. In conjunction with Propositions 2.1 and 2.3, we obtain the following theorem:


Theorem 3.1. Suppose that (A1) to (A3) are fulfilled. On a sufficiently rich probability space, there exists a pairing of $X_0, \varepsilon_1,\ldots,\varepsilon_T$ with $\varepsilon_1^*,\ldots,\varepsilon_T^*$ such that
$$\sup_{x\in[a,b]} \Big\{ \Big| \sum_t w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\})\varepsilon_t - \sum_t w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\varepsilon_t^* \Big| \Big\} = \tilde O\big( (Th)^{-3/4}\log T,\ T^{-\lambda} \big)$$
holds uniformly in $(x_0,\ldots,x_{T-1}) \in \Omega_T$.

Theorem 3.1 basically says that the stochastic part of $\hat m_h(x)$ can be successfully mimicked by its wild bootstrap analogue. Hence, one can apply the bootstrap to approximate, besides the pointwise distribution of nonparametric estimators, which was already treated in Franke et al. (1996), supremum-type functionals of the LPE. The perhaps most often studied problem in this context is that of confidence bands for a certain function to be estimated nonparametrically. In addition, we may also employ the wild bootstrap for determining critical values for nonparametric supremum-type tests. We study both problems in the following.

3.2. Bootstrap confidence bands. The construction of nonparametric confidence bands is a classical field of application for bootstrap methods. It is well known that first-order asymptotic theory based on the supremum of an approximating Gaussian process leads to an error in coverage probability of order $(\log T)^{-1}$; see Hall (1991). In contrast, we can obtain an algebraic rate of convergence by using the bootstrap.

In contrast to confidence intervals for the global mean, in nonparametric statistics one always encounters a serious problem due to the bias. The point is that an optimal tuning of nonparametric estimators is achieved by forcing the magnitude of the stochastic term and the bias term to be of the same order. Although one can estimate the behaviour of the stochastic term sufficiently well, there remains uncertainty about the bias term. Hence, without some extra information, it is indeed impossible to construct confidence intervals or bands with an asymptotically correct coverage probability whose length decreases at the rate of optimal estimators. This is quite obvious in the case of pointwise confidence intervals, where the stochastic fluctuations of the estimator are just of the same order as its standard deviation. In the case of simultaneous confidence bands, one has to consider the supremum deviation, which is known to be degenerate: the size of the stochastic part is actually by a factor of order $\sqrt{\log T}$ larger than the pointwise standard deviation; however, the stochastic fluctuations of the supremum deviation are nevertheless of smaller order of magnitude than the pointwise standard deviation.

There are several options for handling this problem. In the regression context, some authors constructed conservative confidence bands for the mean function $m$ on the basis of some prior information about the maximal roughness of $m$; see Knafl, Sacks and Ylvisaker (1985), Hall and Titterington (1988), and Sun and Loader (1994). As Hall and Titterington (1988) mention, such prior knowledge may sometimes arise from physical considerations or previous empirical evidence. However, this approach is clearly restricted to the case where such prior information is indeed available. If this is not the case, there does not exist an entirely satisfactory strategy for the construction of confidence bands. The perhaps cleanest solution is to consider bands


for a smoothed version of $m$,
$$(K_h m)(x) = \sum_t w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\})\, m(X_{t-1}),$$
rather than for $m$ itself. Now the problem is much easier to deal with, and with bands for $K_h m$ we also have more freedom in choosing $h$. Note that $K_h m$ is itself random; however, it can be interpreted as some local average of $m$.

Let $1 - \alpha$ be the nominal coverage probability. We develop simultaneous bands, as opposed to confidence bands which attain a certain coverage probability pointwise. To construct a confidence band of uniform size, we consider the quantity
$$U_T^* = \sup_{x\in[a,b]} \Big\{ \Big| \sum_t w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\, \varepsilon_t^* \Big| \Big\},$$
which is introduced to mimic
$$U_T = \sup_{x\in[a,b]} \{ |\hat m_h(x) - (K_h m)(x)| \}.$$
Let $t_\alpha^*$ be the (random, because it depends on the sample $X_0,\ldots,X_T$ in model (2.1)) $(1-\alpha)$-quantile of $U_T^*$. Then
$$I_\alpha^*(x) = [\hat m_h(x) - t_\alpha^*,\ \hat m_h(x) + t_\alpha^*] \qquad (3.1)$$
is supposed to form an asymptotic confidence band for $K_h m$ at the prescribed level $1-\alpha$.

A more reasonable and perhaps more natural alternative are simultaneous confidence bands whose size is proportional to an estimate of the standard deviation of $\hat m_h(x)$. Whereas the size of $I_\alpha^*$ is essentially driven by the worst case, that is, by the supremum of $V(x) = \operatorname{var}(\hat m_h(x))$, a variable confidence band follows in size the local variability of $\hat m_h(x)$. It can be expected that the area of such a confidence band is smaller than that of a band of uniform size. Moreover, it can serve as a visual diagnostic tool to detect regions where the estimator runs into difficulties, either because of large variances of the $\varepsilon_t$'s or because of too sparse a design.

Now we describe the construction of a confidence band of variable size in detail. The residuals $\hat\varepsilon_t$ can also be used to estimate the variance $V(x)$ of $\hat m_h(x)$ by
$$\hat V(x) = \sum_t w_h^2(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\, \hat\varepsilon_t^2. \qquad (3.2)$$
Let $t_\alpha^{**}$ be the $(1-\alpha)$-quantile of the distribution of
$$V_T^* = \sup_{x\in[a,b]} \Big\{ \Big| \sum_t w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\, \varepsilon_t^* \Big| \Big/ \sqrt{\hat V(x)} \Big\}, \qquad (3.3)$$
which mimics
$$V_T = \sup_{x\in[a,b]} \Big\{ |\hat m_h(x) - (K_h m)(x)| \Big/ \sqrt{\hat V(x)} \Big\}.$$
This leads to a confidence band of the form
$$I_\alpha^{**}(x) = \Big[ \hat m_h(x) - \sqrt{\hat V(x)}\, t_\alpha^{**},\ \hat m_h(x) + \sqrt{\hat V(x)}\, t_\alpha^{**} \Big]. \qquad (3.4)$$
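The following sketch mimics the construction of the uniform-size band (3.1): wild bootstrap innovations $\varepsilon_t^* = \hat\varepsilon_t \zeta_t$ with Rademacher multipliers (one admissible choice of $\zeta_t$, having mean 0, variance 1 and all moments finite), and local-constant kernel weights standing in as a simplified substitute for the LPE weights $w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})$:

```python
import numpy as np

def kernel_weights(x, x_lag, h):
    """Nadaraya-Watson weights, a simple stand-in for the LPE weights w_h(x, .)."""
    k = np.maximum(1 - ((x - x_lag) / h) ** 2, 0.0)
    return k / max(k.sum(), 1e-12)

def uniform_band_halfwidth(x_lag, residuals, h, grid, n_boot=1000, alpha=0.10, rng=None):
    """Bootstrap quantile t*_alpha of U*_T = sup_x |sum_t w_h(x, x_{t-1}) eps*_t|, cf. (3.1)."""
    rng = np.random.default_rng(rng)
    W = np.array([kernel_weights(g, x_lag, h) for g in grid])   # weights on a grid
    zeta = rng.choice([-1.0, 1.0], size=(n_boot, len(residuals)))
    U_star = np.abs((zeta * residuals) @ W.T).max(axis=1)       # n_boot sup-statistics
    return np.quantile(U_star, 1 - alpha)

# Band: m_hat(x) -/+ halfwidth, with residuals eps_hat_t = x_t - m_hat(x_{t-1}).
```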


We already know from Theorem 3.1 that the process $\hat m_h(x) - (K_h m)(x)$ is pathwise close to the conditional process $\sum_t w_h(x, x_{t-1}; \{x_0,\ldots,x_{T-1}\})\varepsilon_t^*$ (conditioned on $X_0, \varepsilon_1,\ldots,\varepsilon_T$) on an appropriate probability space. In order to get statistically relevant results, we have to show that the approximation error is in magnitude below the size of the fluctuations of the supremum functional. If, for example, $V_T$ had a density $p_{V_T}$, then the consistency of the bootstrap confidence bands would follow from a relation like
$$\|p_{V_T}\|_\infty (Th)^{-3/4} \log T = o_P(1).$$
Since the existence of such a density is not guaranteed, we formulate the following assertion, which provides an upper bound for the probabilities that $U_T$ and $V_T$ fall into small intervals.

Lemma 3.2. Suppose that (A1) to (A3) are fulfilled. Then
(i) $P(U_T \in [c_1, c_2]) = O\big( (c_2 - c_1)(Th)^{1/2}(\log T)^{1/2} + (Th)^{-1/2} T^\kappa \big)$,
(ii) $P(V_T \in [c_1, c_2]) = O\big( (c_2 - c_1)(\log T)^{1/2} + (Th)^{-1} T^\kappa \big)$
hold uniformly in $(x_0,\ldots,x_T) \in \Omega_T$.

Part (i) of this lemma follows immediately from Lemma 2.2 in Neumann and Polzehl (1995), and part (ii) is an immediate consequence of (i) and
$$\sup_{x\in[a,b]} \Big\{ \big| \hat V(x) - V(x) \big| \Big\} = \tilde O\big( (Th)^{-3/2}\log T,\ T^{-\lambda} \big). \qquad (3.5)$$

In conjunction with Theorem 3.1, we now obtain an upper bound on the error in coverage probability for $I_\alpha^*$ and $I_\alpha^{**}$, respectively.

Theorem 3.2. Suppose that (A1) to (A3) are fulfilled. Then
(i) $P\big( (K_h m)(x) \in [\hat m_h(x) - t_\alpha^*,\ \hat m_h(x) + t_\alpha^*] \text{ for all } x \in [a,b] \big) = 1 - \alpha + O\big( (Th)^{-1/4}(\log T)^{3/2} \big)$;
(ii) $P\big( (K_h m)(x) \in [\hat m_h(x) - \sqrt{\hat V(x)}\, t_\alpha^{**},\ \hat m_h(x) + \sqrt{\hat V(x)}\, t_\alpha^{**}] \text{ for all } x \in [a,b] \big) = 1 - \alpha + O\big( (Th)^{-1/4}(\log T)^{3/2} \big)$.

Although the construction of confidence intervals or bands for the unsmoothed function $m$ is more problematic, there is nevertheless great interest in such methods. This is reflected by a large amount of literature, mostly in the context of nonparametric density estimation and regression; for a recent survey see Neumann and Polzehl (1995). There are two main routes for dealing with the notorious bias problem: undersmoothing and a subsequent bias correction. Undersmoothing means that we choose $h$ such that the bias-type term $\sum_t w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\})\, m(X_{t-1}) - m(x)$, which is $O(h^p)$, is of smaller order of magnitude than the stochastic fluctuations of $U_T$. Accordingly, the additional condition
$$h^p = o\big( (Th)^{-1/2}(\log T)^{-1/2} \big) \qquad (3.6)$$
would imply that $I_\alpha^*$ and $I_\alpha^{**}$ are confidence bands for $m$ with an asymptotically correct simultaneous coverage probability; for details see Neumann and Kreiss (1996).

Alternatively, we may employ an explicit bias correction. Let $\hat B_T(x)$ be any estimate of $B_T(x) = \sum_t w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\})\, m(X_{t-1}) - m(x)$ with
$$\sup_{x\in[a,b]} \Big\{ \big| \hat B_T(x) - B_T(x) \big| \Big\} = o_P\big( (Th)^{-1/2}(\log T)^{-1/2} \big). \qquad (3.7)$$
Then the bias-corrected bands
$$I_\alpha^{*,c} = \big[ \hat m_h(x) - \hat B_T(x) - t_\alpha^*,\ \hat m_h(x) - \hat B_T(x) + t_\alpha^* \big]$$
and
$$I_\alpha^{**,c} = \Big[ \hat m_h(x) - \hat B_T(x) - \sqrt{\hat V(x)}\, t_\alpha^{**},\ \hat m_h(x) - \hat B_T(x) + \sqrt{\hat V(x)}\, t_\alpha^{**} \Big]$$
are asymptotic confidence bands for $m$ with an asymptotic coverage probability $1 - \alpha$.

Another possibility to handle the bias problem for bootstrap confidence bands is to use an oversmoothed estimator $\hat m_g$, $g \gg h$, as the underlying conditional mean function in the bootstrap world. That is, we define an appropriate counterpart to model (2.3) by
$$X_t^* = \hat m_g(x_{t-1}) + \varepsilon_t^*, \qquad t = 1,\ldots,T.$$
Such a proposal has been used in Härdle and Mammen (1993) for regression and in Franke et al. (1996) for autoregression. In view of an expansion like the one given in Proposition 2.2 for the bias term in the bootstrap situation, it has to be ensured that the $p$-th derivative $\hat m_g^{(p)}$ estimates $m^{(p)}$ consistently. If we denote the $(1-\alpha)$-quantiles of $U_T^{*,0} = \sup_{x\in[a,b]}\{ |\hat m_h^*(x) - \hat m_g(x)| \}$ and $V_T^{*,0} = \sup_{x\in[a,b]}\{ |\hat m_h^*(x) - \hat m_g(x)| / \sqrt{\hat V(x)} \}$ (with $\hat m_h^*$ the LPE based on the bootstrap observations $X_t^*$) by $t_\alpha^{*,0}$ and $t_\alpha^{**,0}$, respectively, then we again obtain bias-corrected confidence bands for $m$ with an asymptotic coverage probability $1 - \alpha$. These bands take exactly the form of $I_\alpha^*$ and $I_\alpha^{**}$ (cf. (3.1) and (3.4)), where $t_\alpha^*, t_\alpha^{**}$ have to be replaced by $t_\alpha^{*,0}, t_\alpha^{**,0}$, respectively. For the simulations in Section 4 we make use of this last proposal, i.e. we report on simulation results for
$$I_\alpha^{**,0} = \Big[ \hat m_h(x) - \sqrt{\hat V(x)}\, t_\alpha^{**,0},\ \hat m_h(x) + \sqrt{\hat V(x)}\, t_\alpha^{**,0} \Big]. \qquad (3.8)$$
Both latter methods have the practical advantage that one may use a bandwidth $h$ of optimal order, which in particular allows one to choose it automatically by any of the popular criteria. In all cases, however, we have to admit that one has to give up optimality considerations for the confidence bands: in the case of undersmoothing, one has to use a suboptimal bandwidth, whereas we have to reserve some additional degrees of smoothness for the bias-correction step of the second and third method.

We do not dwell on the effect of a data-driven bandwidth choice, which is important for a real application of this method. Usually data-driven bandwidths $\hat h$ are intended to approximate a certain nonrandom bandwidth $h_T$. If $(\hat h - h_T)/h_T$ converges at an appropriate rate, then the estimators $\hat m_{\hat h}$ and $\hat m_{h_T}$ are sufficiently close to each other,


such that the results obtained in this paper remain valid; see Neumann (1995) for a detailed investigation of these effects for pointwise confidence intervals in nonparametric regression.

3.3. A supremum-type test. Another classical field of application of bootstrap methods is the determination of critical values for tests. In our nonparametric context, it might be of interest to test the appropriateness of a certain parametric model for $m$. The theory developed in this paper can be readily applied to certain nonparametric tests, as will be shown in the following. We allow a composite hypothesis, that is,
$$H_0: m \in \mathcal{M},$$
where the only requirement is that the function class $\mathcal{M}$ allows a faster rate of convergence than the nonparametric model. We will assume that

(A5) There exists an estimator $\hat m$ of $m$ such that
$$\sup_{x\in\mathbb{R}} \Big\{ \Big| \sum_{t=1}^T K\Big(\frac{x - X_{t-1}}{h}\Big) \big[ \hat m(X_{t-1}) - m(X_{t-1}) \big] \Big| \Big\} = o_P\big( (Th)^{1/2}(\log T)^{-1/2} \big).$$

A sufficient condition for (A5) is obviously that $\hat m$ itself converges in the supremum norm to $m$ at a faster rate than $(Th)^{-1/2}(\log T)^{-1/2}$, which can be expected to hold in certain parametric models, $\mathcal{M} = \{m_\theta \mid \theta \in \Theta\}$. For the particular purpose of testing, there is no reason to use an LPE, which automatically adapts to irregularities in the design. Rather, one may choose the test statistic under the aspect of convenience, for example,
$$W_T = \sup_{x\in\mathbb{R}} \Big\{ \Big| \sum_{t=1}^T K\Big(\frac{x - X_{t-1}}{h}\Big) \big[ X_t - \hat m(X_{t-1}) \big] \Big| \Big\}. \qquad (3.9)$$
This roughly corresponds to a contrast function which weights the difference between $m(x)$ and $m_\theta(x)$ with a factor proportional to the stationary density $p_X(x)$. In principle, it is also possible to look at the difference of (a smoothed version of) $\hat m$ to a nonparametric estimator directly. This would lead to a test statistic like
$$W_T' = \sup_{x\in\mathbb{R}} \Big\{ \Big| \sum_{t=1}^T w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\}) \big[ X_t - \hat m(X_{t-1}) \big] g(X_{t-1}) \Big| \Big\},$$
where $g$ is an appropriate function which downweights contributions from regions where the stationary density is low, and where, therefore, any nonparametric estimator necessarily deteriorates. In the sequel, we restrict our considerations to the technically simpler, and not less natural, test statistic $W_T$. Let $t_\alpha^W$ be the $(1-\alpha)$-quantile of the (random) distribution of
$$W_T^* = \sup_{x\in\mathbb{R}} \Big\{ \Big| \sum_{t=1}^T K\Big(\frac{x - X_{t-1}}{h}\Big)\, \varepsilon_t^* \Big| \Big\}. \qquad (3.10)$$
By analogous considerations as above, one may prove the following theorem:


Theorem 3.3. Suppose that (A1) and (A3) to (A5) are fulfilled. Then, for arbitrary $m \in \mathcal{M}$,
$$P_m\big( W_T > t_\alpha^W \big) = \alpha + o(1).$$

Remark 4. It seems that $L_2$-tests, such as those proposed by Härdle and Mammen (1993) in the regression setup, are the most popular ones among nonparametric statisticians. Such tests can be optimal for testing against smooth alternatives, whereas supremum-type tests have less power in such a situation. On the other hand, supremum-type tests can outperform $L_2$-tests for testing against local alternatives having the form of sharp peaks; see Konakov, Läuter and Liero (1995) and Spokoiny (1996) for more details. Theory for $L_2$-tests in nonparametric autoregression is developed in Kreiss et al. (1997).
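A sketch of the testing recipe (3.9)-(3.10) for a linear null $H_0: m(x) = \theta x$; the least-squares fit of $\theta$, the Rademacher multipliers, and the finite grid over which the supremum is taken are our own simplifications:

```python
import numpy as np

def sup_test_linear(X, h, n_boot=500, alpha=0.05, rng=None):
    """Return (W_T, bootstrap critical value) for H_0: m(x) = theta * x."""
    rng = np.random.default_rng(rng)
    x_lag, x_resp = X[:-1], X[1:]
    theta = np.dot(x_lag, x_resp) / np.dot(x_lag, x_lag)      # LS estimate under H_0
    resid = x_resp - theta * x_lag                            # residuals under H_0
    grid = np.linspace(x_lag.min(), x_lag.max(), 200)
    K = lambda u: np.maximum(1 - u**2, 0.0)
    Kmat = K((grid[:, None] - x_lag[None, :]) / h)            # K((x - X_{t-1}) / h)
    W_T = np.max(np.abs(Kmat @ resid))                        # statistic, cf. (3.9)
    zeta = rng.choice([-1.0, 1.0], size=(n_boot, len(resid)))
    W_star = np.max(np.abs((zeta * resid) @ Kmat.T), axis=1)  # bootstrap version, cf. (3.10)
    return W_T, np.quantile(W_star, 1 - alpha)                # reject H_0 if W_T exceeds
```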

3.4. Some additional remarks.

1) Generalization to higher dimensions. It is quite straightforward to generalize our results to the higher-dimensional case. Suppose that we observe $X_{1-d},\ldots,X_T$, which obey the model
$$X_t = m(X_{t-1},\ldots,X_{t-d}) + \varepsilon_t, \qquad (3.11)$$
where $\mathcal{L}(\varepsilon_t \mid X_{t-1},\ldots,X_{1-d}) = \mathcal{L}(\varepsilon_t \mid X_{t-1},\ldots,X_{t-d})$ and $E(\varepsilon_t \mid X_{t-1},\ldots,X_{t-d}) \equiv 0$. An appropriate counterpart to (3.11) is given by
$$Y_t = m(x_{t-1},\ldots,x_{t-d}) + \eta_t, \qquad (3.12)$$
where $x_{1-d},\ldots,x_T$ is a fixed realization of (3.11), and the $\eta_t \sim \mathcal{L}\big(\varepsilon_t \mid (X_{t-1},\ldots,X_{t-d}) = (x_{t-1},\ldots,x_{t-d})\big)$ are independent. We define a hierarchical set of intervals
$$I_{j,k_1,\ldots,k_d} = [(k_1-1)2^{-j},\ k_1 2^{-j}) \times \cdots \times [(k_d-1)2^{-j},\ k_d 2^{-j}),$$
and corresponding partial sums
$$Z_{j,k_1,\ldots,k_d} = \sum_{t:\,(X_{t-1},\ldots,X_{t-d}) \in I_{j,k_1,\ldots,k_d}} \varepsilon_t, \qquad Z'_{j,k_1,\ldots,k_d} = \sum_{t:\,(x_{t-1},\ldots,x_{t-d}) \in I_{j,k_1,\ldots,k_d}} \eta_t.$$
Following the lines of the proof of Theorem 2.1, we can construct a pairing of the random variables from (3.11) with those from (3.12) such that
$$Z_{j,k_1,\ldots,k_d} - Z'_{j,k_1,\ldots,k_d} = \tilde O\big( [T\,P((X_{t-1},\ldots,X_{t-d}) \in I_{j,k_1,\ldots,k_d})]^{1/4} \log T + T^\kappa,\ T^{-\lambda} \big). \qquad (3.13)$$


If we intend, for example, to devise a test as described in the previous subsection, we have to analyze the difference between
$$\sum_{t=1}^T K\Big(\frac{y_1 - X_{t-1}}{h}\Big) \cdots K\Big(\frac{y_d - X_{t-d}}{h}\Big)\, \varepsilon_t \qquad\text{and}\qquad \sum_{t=1}^T K\Big(\frac{y_1 - x_{t-1}}{h}\Big) \cdots K\Big(\frac{y_d - x_{t-d}}{h}\Big)\, \eta_t,$$
where we choose $h$ such that $h = O(T^{-\kappa})$ and $h^{-d} = O(T^{1-\kappa})$ for some $\kappa > 0$. We approximate these sums by corresponding truncated Haar series expansions. Let $\phi_{j,k}(x) = I(x \in [(k-1)2^{-j}, k2^{-j}))$ and $\psi_{j,k}(x) = I(x \in [(k-1)2^{-j}, (k-1/2)2^{-j})) - I(x \in [(k-1/2)2^{-j}, k2^{-j}))$. Define
$$\kappa_{k_1,\ldots,k_d} = \int\!\!\cdots\!\!\int K\Big(\frac{y_1 - z_1}{h}\Big) \cdots K\Big(\frac{y_d - z_d}{h}\Big)\, \phi_{0,k_1}(z_1) \cdots \phi_{0,k_d}(z_d)\, dz_1 \cdots dz_d$$
and
$$\kappa^{(i_1,\ldots,i_d)}_{j,k_1,\ldots,k_d} = \int\!\!\cdots\!\!\int K\Big(\frac{y_1 - z_1}{h}\Big) \cdots K\Big(\frac{y_d - z_d}{h}\Big)\, \psi^{(i_1)}_{j,k_1}(z_1) \cdots \psi^{(i_d)}_{j,k_d}(z_d)\, dz_1 \cdots dz_d,$$
where $\psi^{(1)}_{j,k} = \psi_{j,k}$ and $\psi^{(0)}_{j,k} = \phi_{j,k}$. If we assume that $K$ is compactly supported and Lipschitz, then
$$\sum_{k_1,\ldots,k_d} |\kappa_{k_1,\ldots,k_d}| = O(h^d) \qquad\text{and}\qquad \sum_{k_1,\ldots,k_d} \big| \kappa^{(i_1,\ldots,i_d)}_{j,k_1,\ldots,k_d} \big| = O\Big( 2^{jd/2} h^d \Big[ \frac{1}{h2^j \wedge 1} \Big]^d \Big).$$
By (3.13), we obtain, for the difference at scale $j$, that
$$\sum_{(i_1,\ldots,i_d)\in\{0,1\}^d\setminus(0,\ldots,0)}\ \sum_{k_1,\ldots,k_d} \kappa^{(i_1,\ldots,i_d)}_{j,k_1,\ldots,k_d} \sum_t \Big[ \psi^{(i_1)}_{j,k_1}(X_{t-1}) \cdots \psi^{(i_d)}_{j,k_d}(X_{t-d})\,\varepsilon_t - \psi^{(i_1)}_{j,k_1}(x_{t-1}) \cdots \psi^{(i_d)}_{j,k_d}(x_{t-d})\,\eta_t \Big]$$
$$= \tilde O\Big( 2^{jd/2} h^d \Big[ \frac{1}{h2^j \wedge 1} \Big]^d \big[ (T2^{-jd})^{1/4} \log T + T^\kappa \big],\ T^{-\lambda} \Big). \qquad (3.14)$$
In contrast to the one-dimensional case, this upper estimate diverges as $j \to \infty$. Hence, we have to choose the finest scale of our truncated Haar series expansion more carefully. Let $j^*$ be chosen such that both
$$2^{-j^*} = O(T^{-\kappa'} h) \qquad (3.15)$$
and
$$2^{j^*d/2} h^d \Big[ \frac{1}{h2^{j^*} \wedge 1} \Big]^d \big[ (T2^{-j^*d})^{1/4} \log T + T^\kappa \big] = O\big( T^{-\kappa'} (Th^d)^{1/2} \big) \qquad (3.16)$$
hold for some $\kappa' > 0$.


According to (3.14) and (3.16), the difference between the truncated Haar series expansions is
$$\Big| \sum_{k_1,\ldots,k_d} \kappa_{k_1,\ldots,k_d} \big( Z_{0,k_1,\ldots,k_d} - Z'_{0,k_1,\ldots,k_d} \big) \Big| + \sum_{0 \le j \le j^*}\ \sum_{(i_1,\ldots,i_d)\in\{0,1\}^d\setminus(0,\ldots,0)} \Big| \sum_{k_1,\ldots,k_d} \kappa^{(i_1,\ldots,i_d)}_{j,k_1,\ldots,k_d} \sum_t \big[ \psi^{(i_1)}_{j,k_1}(X_{t-1}) \cdots \psi^{(i_d)}_{j,k_d}(X_{t-d})\,\varepsilon_t - \psi^{(i_1)}_{j,k_1}(x_{t-1}) \cdots \psi^{(i_d)}_{j,k_d}(x_{t-d})\,\eta_t \big] \Big|$$
$$= \tilde O\big( T^{-\kappa'} (Th^d)^{1/2},\ T^{-\lambda} \big), \qquad (3.17)$$
which is below the magnitude of the pointwise fluctuations of $\sum_{t=1}^T K\big(\frac{y_1 - X_{t-1}}{h}\big) \cdots K\big(\frac{y_d - X_{t-d}}{h}\big)\varepsilon_t$, which are $O((Th^d)^{1/2})$. Because of the additional factor $T^{-\kappa'}$, it is also below the magnitude of the fluctuations of the supremum functional.

The truncation errors, that is, the differences between the statistics of interest and the corresponding truncated Haar series expansions, can be treated as follows. Because of
$$\tau(y_1,\ldots,y_d; z_1,\ldots,z_d) = K\Big(\frac{y_1 - z_1}{h}\Big) \cdots K\Big(\frac{y_d - z_d}{h}\Big) - \Big[ \sum_{k_1,\ldots,k_d} \kappa_{k_1,\ldots,k_d}\, \phi_{0,k_1}(z_1) \cdots \phi_{0,k_d}(z_d) + \sum_{0 \le j \le j^*}\ \sum_{(i_1,\ldots,i_d)\in\{0,1\}^d\setminus(0,\ldots,0)}\ \sum_{k_1,\ldots,k_d} \kappa^{(i_1,\ldots,i_d)}_{j,k_1,\ldots,k_d}\, \psi^{(i_1)}_{j,k_1}(z_1) \cdots \psi^{(i_d)}_{j,k_d}(z_d) \Big] = O\big( 2^{-j^*} h^{-1} \big),$$
we obtain, for any fixed $(y_1,\ldots,y_d)$, that
$$\sum_{t=1}^T \tau(y_1,\ldots,y_d; X_{t-1},\ldots,X_{t-d})\,\varepsilon_t = \tilde O\big( T^{-\kappa'}(Th^d)^{1/2}\log T + T^\kappa,\ T^{-\lambda} \big). \qquad (3.18)$$
Analogously, we can show for the pointwise sums corresponding to the regression model (3.12) that
$$\sum_{t=1}^T \tau(y_1,\ldots,y_d; x_{t-1},\ldots,x_{t-d})\,\eta_t = \tilde O\big( T^{-\kappa'}(Th^d)^{1/2}\log T + T^\kappa,\ T^{-\lambda} \big). \qquad (3.19)$$
By establishing (3.18) and (3.19) on a sufficiently fine grid, we finally obtain, in conjunction with (3.16), the desired approximation for the supremum deviation:
$$\sup_{y_1,\ldots,y_d} \Big\{ \Big| \sum_{t=1}^T K\Big(\frac{y_1 - X_{t-1}}{h}\Big) \cdots K\Big(\frac{y_d - X_{t-d}}{h}\Big)\varepsilon_t - \sum_{t=1}^T K\Big(\frac{y_1 - x_{t-1}}{h}\Big) \cdots K\Big(\frac{y_d - x_{t-d}}{h}\Big)\eta_t \Big| \Big\} = \tilde O\big( T^{-\kappa'}(Th^d)^{1/2}\log T + T^\kappa,\ T^{-\lambda} \big). \qquad (3.20)$$


Note that the approximation error is below the magnitude of the fluctuations of the supremum deviation, which are $O_P((Th^d)^{1/2}(\log T)^{-1/2})$.

2) Robustness against deviations from the model assumptions. Although authors often avoid such a discussion, for obvious reasons, robustness against any kind of deviation from structural model assumptions is an important issue for the reliability of a method in practical applications. Our proofs are clearly based on the Markov property of the time series, since the application of the Skorokhod embedding requires that $E(\varepsilon_t \mid X_0,\ldots,X_{t-1}) \equiv 0$. On the other hand, even if the data generating process does not obey a structural model like (2.1), it nevertheless makes sense to fit such a nonparametric autoregressive model. It would be interesting to know whether the wild bootstrap remains valid in such a case of an inadequate model. Under mixing and some extra conditions on the joint densities, Robinson (1983) showed that the effect of weak dependence vanishes asymptotically for nonparametric estimators. Hart (1995) coined the term "whitening by windowing" for this effect. It is generally connected with rare events, as, for example, the event that a certain $X_t$ falls into the range of a compactly supported kernel, scaled with a bandwidth $h$ tending to zero. In our context of supremum-type statistics, we need an appropriate version of the whitening by windowing principle beyond the pointwise properties of nonparametric estimators. Using techniques completely different from those applied here, Neumann (1996, 1997) derived such results in the context of nonparametric density estimation and nonparametric estimation of the autoregression function, respectively, from weakly dependent random variables. The rate for the approximation in this general context is of course worse than that obtained in the present paper.

4. Simulations

In this section we present the results of a simulation study. The first part deals with simulated simultaneous confidence bands for the conditional mean function $m(x)$, cf. Section 3.2. For this purpose let us consider the following two models:
$$X_t = 4 \cdot \sin(X_{t-1}) + \varepsilon_t \qquad (4.1)$$
and
$$X_t = 0.8 \cdot X_{t-1} + \sqrt{1 + 0.2\, X_{t-1}^2}\, \cdot \varepsilon_t. \qquad (4.2)$$
The latter model is a usual linear first order autoregression with so-called ARCH errors. The innovations $\varepsilon_t$ are assumed to be i.i.d. with zero mean and unit variance. For the innovations in model (4.1) we assume a double exponential distribution, while model (4.2) is assumed to have normally distributed errors.
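A minimal sketch of how the two data-generating processes can be drawn (the Laplace scale $1/\sqrt{2}$ yields unit variance for the double exponential innovations; the burn-in is our own device to approximate stationarity):

```python
import numpy as np

def simulate_41_42(T=500, burn_in=200, rng=None):
    """One path each from models (4.1) and (4.2)."""
    rng = np.random.default_rng(rng)
    x1 = np.zeros(T + burn_in + 1)   # model (4.1), double exponential innovations
    x2 = np.zeros(T + burn_in + 1)   # model (4.2), standard normal innovations
    for t in range(1, T + burn_in + 1):
        x1[t] = 4 * np.sin(x1[t - 1]) + rng.laplace(0.0, 1.0 / np.sqrt(2.0))
        x2[t] = 0.8 * x2[t - 1] + np.sqrt(1 + 0.2 * x2[t - 1] ** 2) * rng.standard_normal()
    return x1[burn_in:], x2[burn_in:]
```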


[Figures 1a-d. Bootstrap confidence bands (thick) based on the LPE (broken) and actual band (thin).]

Based upon $T = 500$ observations $X_1,\ldots,X_T$ we simulate simultaneous confidence bands of variable size (cf. (3.4)) for $m_1(x) = 4 \cdot \sin(x)$ and $m_2(x) = 0.8 \cdot x$. This is done by simulating the 90%-quantile $t_{0.90}$ of $\sup_{x\in[a,b]} |\hat m_h(x) - m(x)| / \sqrt{\hat V(x)}$ from 1000 Monte Carlo replications. The actual 90%-confidence bands of the form
$$I_{0.10} = \Big[ \hat m_h(x) - \sqrt{\hat V(x)}\, t_{0.90},\ \hat m_h(x) + \sqrt{\hat V(x)}\, t_{0.90} \Big]$$
(cf. (2.6) and (3.2) for the definition of $\hat m_h$ and $\hat V$, respectively), together with corresponding bootstrap confidence bands (cf. (3.8)), are shown in Figures 1a-d and 2a-d. Figures 1a-d correspond to model (4.1), while Figures 2a-d show results for model (4.2). In both situations we report on four replications, i.e. four different underlying sets of data, which result in different estimators $\hat m_h$ and different $(1-\alpha)$-quantiles in the bootstrap world. The thick lines in all figures represent bootstrap confidence bands, while thin lines are used for actual confidence bands. Broken lines in all figures indicate the LPE


estimates $\hat m_h$ for each underlying data set of the corresponding figure. $m_1$ is estimated by a local linear estimator $\hat m_h$, while for $m_2$ we make use of a usual Nadaraya-Watson type kernel estimator, that is, a local constant smoother. The bandwidth $h$ is in all cases chosen according to a cross-validation criterion.

[Figures 2a-d. Bootstrap confidence bands (thick) based on the LPE (broken) and actual band (thin).]

Although the above simulation results are only qualitative, they indicate that the bootstrap offers a practicable tool for constructing not only pointwise but also simultaneous confidence bands for nonparametric estimators in nonlinear autoregression.

The following part is devoted to simulation results for the supremum-type test proposed in Section 3.3. As underlying true models we choose an ordinary first order linear autoregression
$$X_t = 0.9 \cdot X_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{normal}, \qquad (4.3)$$
and
$$X_t = 0.9 \cdot \sin(X_{t-1}) + \varepsilon_t, \qquad \varepsilon_t \sim \text{double exponential}. \qquad (4.4)$$


The first table (Table 4.1) reports simulated values of the power function of a statistical test based on the test statistic $W_T$, cf. (3.9), for the hypothesis of a first order linear autoregression with unknown parameter $\theta$. That is, for model (4.3) we expect values around the level $\alpha$, while for model (4.4) we obtain an impression of the power of the testing procedure.

[Figure 3, T=100, model (4.3). Distribution of the test statistic (thick) with five bootstrap approximations (thin).]

As values for the sample size $T$ we use 100 and 200. Again a cross-validation technique is used for the selection of the bandwidth $h$, which is in accordance with the theoretical results of the paper (cf. the discussion at the end of Section 3.2). The number of Monte Carlo replications, both for the bootstrap and the testing procedure, is chosen equal to 500.

Table 4.2 contains similar results for a statistical test of the hypothesis $H: m \in \mathcal{M} = \{\theta \sin(\cdot) \mid \theta \in \mathbb{R}\}$, where $\theta$ is not known. The results presented in Table 4.2 are for exactly the opposite situation of Table 4.1: now the hypothesis is the nonlinear model (4.4) with unknown coefficient $\theta$. The bootstrap distribution mimics the stochastic term $\sum_t w_h(x, X_{t-1}; \{X_0,\ldots,X_{T-1}\})\varepsilon_t$ only, which ensures that even under the alternative a reasonable approximation of the distribution of the test statistic under the hypothesis is achieved. This guarantees reasonable power values of the proposed sup-type test, as can be seen from Tables 4.1-4.2. It can also be seen from these tables that the test of $H: m \in \{\theta \sin(\cdot) \mid \theta \in \mathbb{R}\}$ ($\theta$ unknown) applied to data from model (4.3) has much more power than the test of linearity applied to data from model (4.4). This can be explained by the much more spread-out stationary distribution of model (4.3), which essentially implies that deviations from the hypothesized conditional mean function over a larger interval are taken into account by the sup-distance (cf. Section 3.3).


Model   Theoretical level   Sample size   Simulated power
(4.3)   0.05                100           0.046
                            200           0.044
        0.10                100           0.091
                            200           0.088
(4.4)   0.05                100           0.422
                            200           0.764
        0.10                100           0.600
                            200           0.872

Table 4.1. Simulated power values for testing the hypothesis of a first order linear autoregression.

[Figure 4, T=200, model (4.4). Distribution of the test statistic (thick) with five bootstrap approximations (thin).]

To obtain an impression of the quality of the approximation of the bootstrap distribution $\mathcal{L}(W_T^*)$, cf. (3.10), to the distribution of the test statistic $W_T$, cf. (3.9), we include in Figures 3 and 4 slightly smoothed plots of the density of $\mathcal{L}(W_T)$ (thick line) and five randomly chosen bootstrap approximations of this distribution (thin lines). The bootstrap approximation of $\mathcal{L}(W_T)$ is used to compute critical values for the testing procedure. From Figures 3 and 4 it can be seen that the distribution of the test statistic is rather skewed. All presented plots are based on 1000 Monte Carlo replications.


Model   Theoretical level   Sample size   Simulated power
(4.4)   0.05                100           0.034
                            200           0.036
        0.10                100           0.068
                            200           0.058
(4.3)   0.05                100           0.986
                            200           1.000
        0.10                100           0.984
                            200           1.000

Table 4.2. Simulated power values for testing the hypothesis of a first order nonlinear autoregression.

5. Proofs

Proof of Lemma 2.1. To handle the dependence, we consider blocks of consecutive observations $\{Z_t,\ t \in J_i\}$, where $J_i = \{(i-1)\Delta_T + 1, \ldots, i\Delta_T \wedge T\}$ and $\Delta_T = [C_1 \log T]$. Without loss of generality, we consider the blocks with odd numbers. By Proposition 2 in Doukhan, Massart and Rio (1995), there exists a sequence of independent blocks $\{Z_t',\ t \in J_i\}$, $i$ odd, with the property
$$P\big( (Z_t',\ t \in J_i) \ne (Z_t,\ t \in J_i) \ \text{for any odd } i \big) = O(T^{-\lambda}), \qquad (5.1)$$
where we have to choose the value of $C_1$ in dependence on $\lambda$. After this reduction to the independent case, we will obtain the assertion from Bernstein's inequality, which we quote for the reader's convenience from Shorack and Wellner (1986, p. 855): Let $U_1,\ldots,U_n$ be independent random variables with $EU_i = 0$ and $|U_i| \le K_n$ almost surely. Then, for $U = \sum U_i$,
$$P(U > c) \le \exp\Big( -\frac{c^2/2}{\operatorname{var}(U) + (K_n c)/3} \Big) \le \exp\Big( -\frac{c^2}{4\operatorname{var}(U)} \Big) + \exp\Big( -\frac{3c}{4K_n} \Big)$$
holds for arbitrary $c > 0$. Setting
$$c = \sqrt{\operatorname{var}(U)}\,\sqrt{4\log n} + (4/3)\, K_n \log n$$
we get $P(|U| > c) \le 4\exp(-\log n)$. In other words, we have that
$$U = \tilde O\big( \sqrt{\operatorname{var}(U)\log n} + K_n \log n,\ n^{-1} \big). \qquad (5.2)$$


From Davydov (1970) we get, for $p, q, r \ge 1$ and $1/p + 1/q + 1/r = 1$, that
$$\operatorname{var}\Big( \sum_{t\in J_i} Z_t' \Big) = \sum_{s,t\in J_i} \operatorname{cov}(Z_s', Z_t') \le \sum_{s,t\in J_i} 8\,\beta(|s-t|)^{1/r}\, \|Z_s\|_p \|Z_t\|_q = O(\log T). \qquad (5.3)$$
On the other hand, we obtain by Jensen's inequality that
$$\operatorname{var}\Big( \sum_{t\in J_i} Z_t' \Big) \le \Delta_T \sum_{t\in J_i} \operatorname{var}(Z_t). \qquad (5.4)$$
Assertion (i) now follows from (5.1) to (5.4), as well as $\big\| \sum_{t\in J_i} Z_t' \big\|_\infty = O(\log T)$. Under the hypothesis of (ii), we have in particular that
$$\sup_{t=1,\ldots,T} \{|Z_t|\} = \tilde O\big( T^{\kappa'},\ T^{-\lambda} \big) \qquad (5.5)$$
holds for any $\kappa' \in (0, \kappa)$. Using the truncated random variables $Z_t'' = Z_t I(|Z_t| < T^{\kappa'})$ instead of $Z_t$, we obtain (ii) from (5.1), (5.2), (5.4) and (5.5).

Proof of Theorem 2.1. (i) General idea. The pairing of the observations in the autoregression model (2.1) with those in the regression model (2.3), which provides a close connection between $Z_{j,k}$ and $Z'_{j,k}$, is made via a Skorokhod embedding of the $\varepsilon_t$'s and $\eta_t$'s, respectively, in a common set of Wiener processes. This technique makes use of the well-known fact that any random variable $Y$ with $EY = 0$ and $EY^2 < \infty$ can be represented as the value of a Wiener process stopped at an appropriate random time. Moreover, such a representation is also possible for the partial sum process of independent random variables as well as for a discrete time martingale; see e.g. Hall and Heyde (1980, Appendix A.1) for a convenient description. In particular, one can show asymptotic normality for a martingale with this approach.

However, here we have a different task. We are not interested in a close connection of the two global partial sum processes $S_n = \sum_{t=1}^n \varepsilon_t$ and $S_n' = \sum_{t=1}^n \eta_t$; rather, we are interested in a close connection of the sums of those $\varepsilon_t$'s and $\eta_t$'s which correspond to $X_{t-1}$'s and $x_{t-1}$'s, respectively, that fall into a particular interval. A quite obvious modification of the usual Skorokhod embedding in one Wiener process would be to relate the sets of random variables $\{\varepsilon_1,\ldots,\varepsilon_T\}$ and $\{\eta_1,\ldots,\eta_T\}$ to independent Wiener processes $W_k$, which correspond to the intervals $I_{j^*,k}$ on the finest resolution scale under consideration. This would lead to a pairing of $\{\varepsilon_1,\ldots,\varepsilon_T\}$ with $\{\eta_1,\ldots,\eta_T\}$ which provides a close connection between $Z_{j^*,k}$ and $Z'_{j^*,k}$. If $j^*$ is chosen fine enough, that is, if $2^{-j^*} \ll h$, then we also get $\hat m_h(x) - \tilde m_h(x) = o_P((Th)^{-1/2})$. However, although this monoscale approximation is quite good for the differences between $Z_{j,k}$ and $Z'_{j,k}$ for $j$ close to $j^*$, it is not optimal at coarser scales $j \ll j^*$. In view of this inefficiency we apply here a refined, truly multiscale approximation scheme. Accordingly we will relate the $\varepsilon_t$'s and $\eta_t$'s to Wiener processes $W_{j,k}$ for $(j,k) \in \mathcal{I}_T$.

In the following we describe this construction in detail for the autoregressive process (2.1). The construction in the regression setting (2.3) is completely analogous, and


The construction in the regression setting (2.3) is completely analogous and will only be mentioned briefly. Then we draw conclusions for the rate of approximation of $Z_{j,k}$ by $Z_{j,k}'$, which will complete the proof.

(ii) Embedding of $\varepsilon_1$
Let $W_{j,k}$, $(j,k) \in \mathcal{I}_T$, be independent Wiener processes. We will use each of these processes only on a certain time interval $[0, T_{j,k}]$, where the values of the $T_{j,k}$'s will be specified in part (v) below. At the moment it is only important to know that $T_{0,k} = \infty$. Let $k_1$ be that random number with $X_0 \in I_{j^*,k_1}$. Now we represent $\varepsilon_1$ by the Wiener process $W_{j^*,k_1}$. This should be done with the aid of a stopping time $\tau(1)$, which is constructed according to Lemma A.2 in Hall and Heyde (1980, Appendix A.1). However, since we want to use $W_{j^*,k_1}$ up to some time $T_{j^*,k_1}$ only, it might happen that this is not enough for representing $\varepsilon_1$. In this case we additionally use a certain stretch of the process $W_{j^*-1,[k_1/2]}$, and so on. To formalize this construction, let $k(j)$ be such that

$$I_{j^*,k} \subseteq I_{j^*-1,k(j^*-1)} \subseteq \cdots \subseteq I_{0,k(0)};$$

that is, $k(j) = [k2^{j-j^*}]$, where $[a]$ denotes the largest integer not greater than $a$. According to the above description we represent $\varepsilon_1$ by the following Wiener process:

$$W^{(1)}(s) = \begin{cases} W_{j^*,k_1}(s), & \text{if } 0 \le s \le T_{j^*,k_1}, \\[4pt] W_{j^*,k_1}(T_{j^*,k_1}) + \cdots + W_{j+1,k_1(j+1)}(T_{j+1,k_1(j+1)}) + W_{j,k_1(j)}\bigl(s - T_{j^*,k_1} - \cdots - T_{j+1,k_1(j+1)}\bigr), \\ \quad \text{if } T_{j^*,k_1} + \cdots + T_{j+1,k_1(j+1)} < s \le T_{j^*,k_1} + \cdots + T_{j,k_1(j)}. \end{cases}$$

($W^{(1)}$ is indeed a Wiener process on $[0,\infty)$, since $T_{0,k} = \infty$.) According to Lemma A.2 in Hall and Heyde (1980), we have

$$\mathcal{L}(\varepsilon_1 \mid X_0 = x_0) = \mathcal{L}\bigl(W^{(1)}(\tau(1))\bigr)$$

for an appropriate stopping time $\tau(1)$. To explain the following steps in a formally correct way, we introduce stopping times $\tau_{j,k}^{(t)}$, $t = 0, \ldots, T$, assigned to the corresponding Wiener processes $W_{j,k}$. Define

$$\tau_{j,k}^{(0)} = 0 \quad \text{for all } (j,k) \in \mathcal{I}_T.$$

To get $\tau_{j,k}^{(1)}$ we redefine all those $\tau_{j,k}^{(0)}$'s which are assigned to Wiener processes $W_{j,k}$ that were needed to represent $\varepsilon_1$. According to the above construction we set

$$\tau_{j^*,k_1}^{(1)} = \tau(1) \wedge T_{j^*,k_1}.$$

We redefine further

$$\tau_{j,k_1(j)}^{(1)} = \begin{cases} \bigl[\tau(1) - T_{j^*,k_1} - \cdots - T_{j+1,k_1(j+1)}\bigr] \wedge T_{j,k_1(j)}, & \text{if } T_{j^*,k_1} + \cdots + T_{j+1,k_1(j+1)} < \tau(1), \\ 0, & \text{otherwise.} \end{cases}$$

The remaining stopping times $\tau_{j,l}^{(1)}$ with $l \ne k_1(j)$ keep their preceding value $\tau_{j,l}^{(0)} = 0$. This procedure will be repeated for all other $\varepsilon_t$'s, with the modification that we use only stretches of the Wiener processes which are still untouched by the previous construction steps.
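The basic Skorokhod step used here can be sketched in a few lines: a centered two-point random variable $Y \in \{-a, b\}$ with $P(Y = b) = a/(a+b)$ arises as $W(\tau)$ for the exit time $\tau$ of a Wiener process $W$ from $[-a, b]$. The discretization below is a toy illustration only; the step size and sample count are arbitrary choices, not part of the paper's construction:

```python
# Toy illustration of the Skorokhod embedding: Y in {-a, b} as W(tau),
# tau = exit time of Brownian motion from [-a, b]. Step size dt assumed.
import numpy as np

rng = np.random.default_rng(1)
a, b, dt = 1.0, 2.0, 1e-3

def embed_once():
    w, t = 0.0, 0.0
    while -a < w < b:
        w += np.sqrt(dt) * rng.standard_normal()
        t += dt
    return w, t            # (value at the stopping time, the stopping time)

pairs = [embed_once() for _ in range(500)]
vals = np.array([v for v, _ in pairs])
taus = np.array([t for _, t in pairs])
print("mean of W(tau):", vals.mean())                    # ~ 0 = E[Y]
print("P(W(tau)=b):", (vals > 0).mean(), "vs", a/(a+b))  # hitting probability
print("mean of tau:", taus.mean(), "vs E[Y^2] =", a*b)   # Skorokhod: E[tau]=E[Y^2]
```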


(iii) Embedding of $\varepsilon_t$
Let $k_t$ be that random number with $X_{t-1} \in I_{j^*,k_t}$. We represent $\varepsilon_t$ by means of those parts of $W_{j^*,k_t}, W_{j^*-1,k_t(j^*-1)}, \ldots, W_{0,k_t(0)}$ which have not been used so far. First note that, because of the strong Markov property, the remaining parts

$$W_{j,k_t(j)}\bigl(s + \tau_{j,k_t(j)}^{(t-1)}\bigr) - W_{j,k_t(j)}\bigl(\tau_{j,k_t(j)}^{(t-1)}\bigr)$$

are again Wiener processes. Hence,

$$W^{(t)}(s) = \begin{cases} W_{j^*,k_t}\bigl(s + \tau_{j^*,k_t}^{(t-1)}\bigr) - W_{j^*,k_t}\bigl(\tau_{j^*,k_t}^{(t-1)}\bigr), & \text{if } 0 \le s \le T_{j^*,k_t} - \tau_{j^*,k_t}^{(t-1)}, \\[6pt] \bigl[W_{j^*,k_t}(T_{j^*,k_t}) - W_{j^*,k_t}(\tau_{j^*,k_t}^{(t-1)})\bigr] + \cdots + \bigl[W_{j+1,k_t(j+1)}(T_{j+1,k_t(j+1)}) - W_{j+1,k_t(j+1)}(\tau_{j+1,k_t(j+1)}^{(t-1)})\bigr] \\ \quad + W_{j,k_t(j)}\bigl(s - (T_{j^*,k_t} - \tau_{j^*,k_t}^{(t-1)}) - \cdots - (T_{j+1,k_t(j+1)} - \tau_{j+1,k_t(j+1)}^{(t-1)}) + \tau_{j,k_t(j)}^{(t-1)}\bigr) - W_{j,k_t(j)}\bigl(\tau_{j,k_t(j)}^{(t-1)}\bigr), \\ \quad\quad \text{if } (T_{j^*,k_t} - \tau_{j^*,k_t}^{(t-1)}) + \cdots + (T_{j+1,k_t(j+1)} - \tau_{j+1,k_t(j+1)}^{(t-1)}) < s \le (T_{j^*,k_t} - \tau_{j^*,k_t}^{(t-1)}) + \cdots + (T_{j,k_t(j)} - \tau_{j,k_t(j)}^{(t-1)}), \end{cases}$$

is again a Wiener process on $[0,\infty)$. Now we take, according to the construction in Lemma A.2 in Hall and Heyde (1980), a stopping time $\tau(t)$ with

$$\mathcal{L}(\varepsilon_t \mid X_{t-1} = x_{t-1}) = \mathcal{L}\bigl(W^{(t)}(\tau(t))\bigr).$$

To get $\tau_{j,k}^{(t)}$, we redefine those stopping times $\tau_{j,k}^{(t-1)}$ which are assigned to Wiener processes $W_{j,k}$ that were used to represent $\varepsilon_t$. We set

$$\tau_{j,k_t(j)}^{(t)} = \begin{cases} \Bigl[\tau_{j,k_t(j)}^{(t-1)} + \tau(t) - (T_{j^*,k_t} - \tau_{j^*,k_t}^{(t-1)}) - \cdots - (T_{j+1,k_t(j+1)} - \tau_{j+1,k_t(j+1)}^{(t-1)})\Bigr] \wedge T_{j,k_t(j)}, \\ \quad \text{if } (T_{j^*,k_t} - \tau_{j^*,k_t}^{(t-1)}) + \cdots + (T_{j+1,k_t(j+1)} - \tau_{j+1,k_t(j+1)}^{(t-1)}) < \tau(t), \\[4pt] \tau_{j,k_t(j)}^{(t-1)}, \quad \text{otherwise.} \end{cases}$$

For all $(j,l)$ with $l \ne k_t(j)$ we define $\tau_{j,l}^{(t)} = \tau_{j,l}^{(t-1)}$. After embedding $\varepsilon_1, \ldots, \varepsilon_T$ we arrive at stopping times $\tau_{j,k}^{(T)}$.

(iv) Embedding of $\xi_1, \ldots, \xi_T$
We embed $\xi_1, \ldots, \xi_T$ in complete analogy to the embedding of $\varepsilon_1, \ldots, \varepsilon_T$, in the same Wiener processes $W_{j,k}$, $(j,k) \in \mathcal{I}_T$. In this way, we arrive at stopping times $\tilde\tau_{j,k}^{(t)}$, which play the same role as the $\tau_{j,k}^{(t)}$'s.

(v) Choice of the values for $T_{j,k}$
To motivate our particular choice of the $T_{j,k}$'s we consider first two extreme cases.


If $T_{j^*,k} = \infty$, then $Z_{j^*,k}$ and $Z_{j^*,k}'$ are both completely represented by $W_{j^*,k}$. This will lead to a close connection of $Z_{j^*,k}$ and $Z_{j^*,k}'$. However, this choice is not favorable for scales $j$ with $j \ll j^*$. If, for simplicity, $T_{j,k} = \infty$ for all $k$, then the representations of $Z_{j,k}$ and $Z_{j,k}'$, for $j < j^*$, depend very much on the particular values of $\{X_0, \ldots, X_{T-1}\}$ and $\{x_0, \ldots, x_{T-1}\}$. In general, in the case of too large a $T_{j^*,k}$ there will be a tendency that the representations of $Z_{j,k}$ and $Z_{j,k}'$ use too many different stretches of the Wiener processes $W_{j^*,m}$ with $I_{j^*,m} \subseteq I_{j,k}$, which leads to a suboptimal connection of $Z_{j,k}$ and $Z_{j,k}'$.

On the other hand, if $T_{j^*,k}$ is quite small, then $Z_{j^*,k}$ and $Z_{j^*,k}'$ will be represented in large parts by stretches of Wiener processes $W_{j,m}$, $j < j^*$, which correspond to intervals $I_{j,m} \supseteq I_{j^*,k}$. Moreover, these stretches used for $Z_{j^*,k}$ will be mostly different from those used for $Z_{j^*,k}'$, and therefore we would get a suboptimal connection of $Z_{j^*,k}$ and $Z_{j^*,k}'$.

To find a good compromise between these two conflicting aims, we choose the $T_{j,k}$'s as large as possible, but with the additional property that the stretches $[0, T_{j,k}]$, $j \ne 0$, are used up in the representation of $\{\varepsilon_1, \ldots, \varepsilon_T\}$ and $\{\xi_1, \ldots, \xi_T\}$ with high probability. Strictly speaking, we choose the $T_{j,k}$'s in such a way that

$$P\Bigl(\sum_t \tau(t) I(X_{t-1} \in I_{j,k}) < \sum_{(l,m):\, I_{l,m} \subseteq I_{j,k}} T_{l,m} \ \text{ for any } (j,k) \in \mathcal{I}_T \setminus \{(0,k)\}\Bigr) = O(T^{-\lambda}) \quad (5.6)$$

and

$$P\Bigl(\sum_t \tilde\tau(t) I(x_{t-1} \in I_{j,k}) < \sum_{(l,m):\, I_{l,m} \subseteq I_{j,k}} T_{l,m} \ \text{ for any } (j,k) \in \mathcal{I}_T \setminus \{(0,k)\}\Bigr) = O(T^{-\lambda}). \quad (5.7)$$

To achieve this, we study first the behaviour of the above sums of the stopping times assigned to the interval $I_{j,k}$. Recall that the conditional distribution of $\varepsilon_t$ depends only on $X_{t-1}$. By taking a closer look at the construction of the Skorokhod embedding described in Hall and Heyde (1980), one can see that $\tau(t)$ depends only on $\varepsilon_t$ and $\{W^{(t)}(s);\ 0 \le s \le \tau(t)\}$. Since, for $t \ne t'$, $\{W^{(t)}(s);\ 0 \le s \le \tau(t)\}$ and $\{W^{(t')}(s);\ 0 \le s \le \tau(t')\}$ correspond to disjoint stretches of the Wiener processes $W_{j,k}$ separated by stopping times, the random variables $\tau(t) I(X_{t-1} \in I_{j,k})$ are geometrically $\beta$-mixing. Hence, we obtain by Lemma 2.1(ii) that

$$P\Bigl(\Bigl|\sum_{t:\, X_{t-1} \in I_{j,k}} \tau(t) - T\,\mathrm{E}\,v(X_0) I(X_0 \in I_{j,k})\Bigr| > C\bigl[\sqrt{T2^{-j}}\,\log T + T^{\delta}\bigr]\Bigr) = O(T^{-\lambda}) \quad (5.8)$$

and, analogously,

$$P\Bigl(\Bigl|\sum_{t:\, x_{t-1} \in I_{j,k}} \tilde\tau(t) - T\,\mathrm{E}\,v(X_0) I(X_0 \in I_{j,k})\Bigr| > C\bigl[\sqrt{T2^{-j}}\,\log T + T^{\delta}\bigr]\Bigr) = O(T^{-\lambda}) \quad (5.9)$$

holds uniformly in $(x_0, \ldots, x_{T-1}) \in \Omega_T$, where $\Omega_T$ is an appropriate set of "not too irregular" realizations of $(X_0, \ldots, X_{T-1})$ with $P((X_0, \ldots, X_{T-1}) \notin \Omega_T) = O(T^{-\lambda})$.
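The fine-to-coarse spending of the stopping times, which underlies (5.6) and (5.7), amounts to simple bookkeeping along the ancestor chain of dyadic intervals. The following sketch is hypothetical: the budget values, the 1-based indexing and the ceiling-based ancestor formula are illustrative conventions of ours, not the paper's code:

```python
# Hypothetical bookkeeping sketch for parts (ii)-(iii): a stopping time tau
# for an innovation with X_{t-1} in I_{j*,k} is charged to the budgets
# T_{j,k} along the chain I_{j*,k} in I_{j*-1,.} in ... in I_{0,1},
# exhausting finer budgets before spilling into coarser ones.
import math

jstar = 4
budgets = {(j, k): 1.0 for j in range(1, jstar + 1) for k in range(1, 2**j + 1)}
budgets[(0, 1)] = math.inf        # scale 0 has infinite budget, T_{0,k} = inf
used = {key: 0.0 for key in budgets}

def ancestors(k):
    """Chain (j*, k), (j*-1, .), ..., (0, 1); ceiling used for 1-based indices."""
    return [(j, math.ceil(k * 2 ** (j - jstar))) for j in range(jstar, -1, -1)]

def spend(k, tau):
    """Charge the stopping time tau to the budgets along the ancestor chain."""
    for key in ancestors(k):
        take = min(tau, budgets[key] - used[key])
        used[key] += take
        tau -= take
        if tau <= 0:
            return

spend(k=3, tau=2.7)   # one innovation with X_{t-1} in I_{j*,3}
print({key: round(v, 2) for key, v in used.items() if v > 0})
# {(4, 3): 1.0, (3, 2): 1.0, (2, 1): 0.7} -- two budgets exhausted, one partial
```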


Define

$$S_{j,k} = \sum_{t=1}^{T} \mathrm{E}\,\tau(t) I(X_{t-1} \in I_{j,k}) - C\bigl[\sqrt{T2^{-j}}\,\log T + T^{\delta}\bigr].$$

Further, we define

$$T_{j,k} = S_{j,k} - \sum_{(l,m):\, I_{l,m} \subsetneq I_{j,k}} T_{l,m}.$$

(Then $S_{j,k} = \sum_{(l,m):\, I_{l,m} \subseteq I_{j,k}} T_{l,m}$.) By (5.8) and (5.9) we obtain (5.6) and (5.7).

(vi) Conclusions for $|Z_{j,k} - Z_{j,k}'|$
By (5.6) we obtain with a probability exceeding $1 - O(T^{-\lambda})$ that

$$Z_{j,k} = \sum_{(l,m):\, I_{l,m} \subseteq I_{j,k}} W_{l,m}(T_{l,m}) + \sum_{t:\, X_{t-1} \in I_{j,k}} \ \sum_{(l,m):\, I_{j,k} \subsetneq I_{l,m}} \bigl[W_{l,m}(\tau_{l,m}^{(t)}) - W_{l,m}(\tau_{l,m}^{(t-1)})\bigr] \quad (5.10)$$

and, by (5.7),

$$Z_{j,k}' = \sum_{(l,m):\, I_{l,m} \subseteq I_{j,k}} W_{l,m}(T_{l,m}) + \sum_{t:\, x_{t-1} \in I_{j,k}} \ \sum_{(l,m):\, I_{j,k} \subsetneq I_{l,m}} \bigl[W_{l,m}(\tilde\tau_{l,m}^{(t)}) - W_{l,m}(\tilde\tau_{l,m}^{(t-1)})\bigr], \quad (5.11)$$

which holds again with a probability exceeding $1 - O(T^{-\lambda})$, under the condition $(x_0, \ldots, x_{T-1}) \in \Omega_T$. At this point we see why our particular pairing of $\varepsilon_1, \ldots, \varepsilon_T$ with $\xi_1, \ldots, \xi_T$ provides a close connection between $Z_{j,k}$ and $Z_{j,k}'$: most of the randomness of $Z_{j,k}$ and $Z_{j,k}'$ is contained in the first terms on the right-hand sides of (5.10) and (5.11), respectively. These terms are random, but identical to each other.

Assume now that both $\sum_{t=1}^{T} \tau(t) I(X_{t-1} \in I_{j,k}) \ge \sum_{(l,m):\, I_{l,m} \subseteq I_{j,k}} T_{l,m}$ and $\sum_{t=1}^{T} \tilde\tau(t) I(x_{t-1} \in I_{j,k}) \ge \sum_{(l,m):\, I_{l,m} \subseteq I_{j,k}} T_{l,m}$ are satisfied. By (5.8) and (5.9) we have that

$$\sum_{t:\, X_{t-1} \in I_{j,k}} \ \sum_{(l,m):\, I_{j,k} \subsetneq I_{l,m}} \bigl[\tau_{l,m}^{(t)} - \tau_{l,m}^{(t-1)}\bigr] = \sum_{t:\, X_{t-1} \in I_{j,k}} \tau(t) - S_{j,k} = \tilde O\bigl(\sqrt{T2^{-j}}\,\log T + T^{\delta};\ T^{-\lambda}\bigr)$$

and

$$\sum_{t:\, x_{t-1} \in I_{j,k}} \ \sum_{(l,m):\, I_{j,k} \subsetneq I_{l,m}} \bigl[\tilde\tau_{l,m}^{(t)} - \tilde\tau_{l,m}^{(t-1)}\bigr] = \sum_{t:\, x_{t-1} \in I_{j,k}} \tilde\tau(t) - S_{j,k} = \tilde O\bigl(\sqrt{T2^{-j}}\,\log T + T^{\delta};\ T^{-\lambda}\bigr).$$

Note that, for fixed $t$ and under $X_{t-1} \in I_{j,k}$, the pieces $\{W_{l,m}(s);\ \tau_{l,m}^{(t-1)} \le s \le \tau_{l,m}^{(t)}\}$ of the Wiener processes $W_{l,m}$ corresponding to intervals $I_{l,m} \supsetneq I_{j,k}$ can be composed to a piece of a new Wiener process $W_{j,k}^{res,t}$ on the interval $[0, \tau_{j,k}^{res,t}]$, where

$$\tau_{j,k}^{res,t} = \sum_{(l,m):\, I_{j,k} \subsetneq I_{l,m}} \bigl(\tau_{l,m}^{(t)} - \tau_{l,m}^{(t-1)}\bigr).$$

This is achieved by setting

$$W_{j,k}^{res,t}(s) = \begin{cases} W_{j-1,[k/2]}\bigl(s + \tau_{j-1,[k/2]}^{(t-1)}\bigr) - W_{j-1,[k/2]}\bigl(\tau_{j-1,[k/2]}^{(t-1)}\bigr), & \text{if } 0 \le s \le \tau_{j-1,[k/2]}^{(t)} - \tau_{j-1,[k/2]}^{(t-1)}, \\[6pt] \bigl[W_{j-1,[k/2]}(\tau_{j-1,[k/2]}^{(t)}) - W_{j-1,[k/2]}(\tau_{j-1,[k/2]}^{(t-1)})\bigr] + \cdots + \bigl[W_{l+1,[k2^{l+1-j}]}(\tau_{l+1,[k2^{l+1-j}]}^{(t)}) - W_{l+1,[k2^{l+1-j}]}(\tau_{l+1,[k2^{l+1-j}]}^{(t-1)})\bigr] \\ \quad + \bigl[W_{l,[k2^{l-j}]}(u) - W_{l,[k2^{l-j}]}(\tau_{l,[k2^{l-j}]}^{(t-1)})\bigr], \\ \quad\quad \text{if } s = (\tau_{j-1,[k/2]}^{(t)} - \tau_{j-1,[k/2]}^{(t-1)}) + \cdots + (\tau_{l+1,[k2^{l+1-j}]}^{(t)} - \tau_{l+1,[k2^{l+1-j}]}^{(t-1)}) + (u - \tau_{l,[k2^{l-j}]}^{(t-1)}) \text{ and } u \le \tau_{l,[k2^{l-j}]}^{(t)}. \end{cases}$$

(In the case of $X_{t-1} \notin I_{j,k}$ we simply set $\tau_{j,k}^{res,t} = 0$.) Note that $\{W_{j,k}^{res,t}(s);\ 0 \le s \le \tau_{j,k}^{res,t}\}$ is $\mathcal{F}_t$-measurable. By the strong Markov property, the remaining parts of the Wiener processes $W_{j,k}$, i.e. $\{W_{j,k}(s + \tau_{j,k}^{(t)}) - W_{j,k}(\tau_{j,k}^{(t)});\ 0 \le s < \infty\}$, form again independent Wiener processes, which are also independent of $\mathcal{F}_t$. Hence, we can compose all these parts $W_{j,k}^{res,t}$ considered above to a Wiener process $W_{j,k}^{res}$ by setting

$$W_{j,k}^{res}(s) = \begin{cases} W_{j,k}^{res,1}(s), & \text{if } 0 \le s \le \tau_{j,k}^{res,1}, \\[4pt] W_{j,k}^{res,1}(\tau_{j,k}^{res,1}) + \cdots + W_{j,k}^{res,u-1}(\tau_{j,k}^{res,u-1}) + W_{j,k}^{res,u}\bigl(s - \tau_{j,k}^{res,1} - \cdots - \tau_{j,k}^{res,u-1}\bigr), \\ \quad \text{if } \tau_{j,k}^{res,1} + \cdots + \tau_{j,k}^{res,u-1} < s \le \tau_{j,k}^{res,1} + \cdots + \tau_{j,k}^{res,u}. \end{cases}$$

An analogous construction can be made for the $\tilde\tau_{l,m}^{(t)}$'s, leading to a Wiener process $\tilde W_{j,k}^{res}$. Note that $\tau_{j,k}^{res,1} + \cdots + \tau_{j,k}^{res,T} = \sum_{t:\, X_{t-1} \in I_{j,k}} \tau(t) - S_{j,k}$. Now we obtain by Lemma 1.2.1 in Csörgő and Révész (1981, p. 29), which bounds the increments of a Wiener process over time intervals of a given length, that

$$|Z_{j,k} - Z_{j,k}'| \le \Bigl|\sum_{t:\, X_{t-1} \in I_{j,k}} \ \sum_{(l,m):\, l<j,\ I_{j,k} \subsetneq I_{l,m}} \bigl[W_{l,m}(\tau_{l,m}^{(t)}) - W_{l,m}(\tau_{l,m}^{(t-1)})\bigr]\Bigr| + \Bigl|\sum_{t:\, x_{t-1} \in I_{j,k}} \ \sum_{(l,m):\, l<j,\ I_{j,k} \subsetneq I_{l,m}} \bigl[W_{l,m}(\tilde\tau_{l,m}^{(t)}) - W_{l,m}(\tilde\tau_{l,m}^{(t-1)})\bigr]\Bigr|$$
$$= \Bigl|W_{j,k}^{res}\Bigl(\sum_{t:\, X_{t-1} \in I_{j,k}} \tau(t) - S_{j,k}\Bigr)\Bigr| + \Bigl|\tilde W_{j,k}^{res}\Bigl(\sum_{t:\, x_{t-1} \in I_{j,k}} \tilde\tau(t) - S_{j,k}\Bigr)\Bigr| = \tilde O\bigl([T2^{-j}]^{1/4}\log T + T^{\delta};\ T^{-\lambda}\bigr),$$

which finishes the proof.

Proof of Corollary 2.1. Let $(x_0, \ldots, x_{T-1}) \in \Omega_T$. We assume throughout the proof that

$$\bigl|Z_{j,k} - Z_{j,k}'\bigr| \le C\bigl[(T2^{-j})^{1/4}\log T + T^{\delta}\bigr] \quad (5.12)$$

is fulfilled for all $(j,k) \in \mathcal{I}_T$, which holds true by Theorem 2.1 with a probability of $1 - O(T^{-\lambda})$.


Let $[c,d)$ be an arbitrary interval. Beginning from the coarsest scale, we approximate $[c,d)$ from below by a union of dyadic intervals chosen as large as possible. There exist indices $(j_1,k_1), \ldots, (j_m,k_m)$ such that

$$I_{j_1,k_1} \cup \cdots \cup I_{j_m,k_m} \subseteq [c,d),$$

where $j_1 < \cdots < j_{m-1} \le j_m \le j^*$. Here we choose $j^*$ such that $(T2^{-j^*})^{1/4}\log T \asymp T^{\delta}$. On the other hand, if we add two suitable intervals from the finest scale, say $I_{j^*,l_1}$ and $I_{j^*,l_2}$, then we can approximate $[c,d)$ from above, that is,

$$[c,d) \subseteq I_{j_1,k_1} \cup \cdots \cup I_{j_m,k_m} \cup I_{j^*,l_1} \cup I_{j^*,l_2}.$$

Now we obtain, under (5.12), that

$$\Bigl|\sum_{t:\, X_{t-1} \in [c,d)} \varepsilon_t - \sum_{t:\, x_{t-1} \in [c,d)} \xi_t\Bigr| \le \sum_{i=1}^{m} |Z_{j_i,k_i} - Z_{j_i,k_i}'| + |Z_{j^*,l_1}| + |Z_{j^*,l_2}| + |Z_{j^*,l_1}'| + |Z_{j^*,l_2}'|$$
$$= O\Bigl(\sum_{i=1}^{m} \bigl[(T2^{-j_i})^{1/4}\log T + T^{\delta}\bigr] + T^{\delta}\Bigr) = O\bigl((T2^{-j_1})^{1/4}\log T + T^{\delta}\bigr) = O\bigl([TP(X_0 \in [c,d))]^{1/4}\log T + T^{\delta}\bigr).$$
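The greedy approximation of $[c,d)$ by maximal dyadic intervals is easy to make concrete. In the sketch below we take $F_X$ to be the identity, so that $I_{j,k} = [(k-1)2^{-j}, k2^{-j})$; this choice, and the greedy loop itself, are illustrative assumptions of ours:

```python
# Greedy cover of [c, d) from below by maximal dyadic intervals I_{j,k},
# with F_X = identity assumed for illustration.
from fractions import Fraction

def dyadic_cover(c, d, jstar):
    """List maximal dyadic intervals I_{j,k} inside [c, d), scales j <= jstar."""
    c, d = Fraction(c), Fraction(d)
    pieces, left = [], c
    while left < d:
        for j in range(jstar + 1):                  # try coarsest scale first
            w = Fraction(1, 2 ** j)
            k = left / w                            # I_{j,k} starts at (k-1)2^{-j}
            if k.denominator == 1 and left + w <= d:
                pieces.append((j, int(k) + 1))
                left += w
                break
        else:
            break                                   # remainder finer than 2^{-jstar}
    return pieces, left

pieces, covered_up_to = dyadic_cover(Fraction(3, 16), Fraction(7, 8), jstar=5)
print(pieces)            # [(4, 4), (2, 2), (2, 3), (3, 7)]
print(covered_up_to)     # 7/8: here [c, d) is covered exactly from below
```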

Proof of Corollary 2.2. We approximate $w$ by a truncated Haar wavelet series expansion

$$\tilde w(z) = \sum_k \alpha_k \phi_k(z) + \sum_{0 \le j \le j^*} \sum_k \beta_{j,k} \psi_{j,k}(z), \quad (5.13)$$

where

$$\psi_{j,k}(z) = \begin{cases} 2^{j/2}, & \text{if } (k-1)2^{-j} \le z < (k-1/2)2^{-j}, \\ -2^{j/2}, & \text{if } (k-1/2)2^{-j} \le z < k2^{-j}, \\ 0, & \text{otherwise.} \end{cases}$$

In view of the following calculations, we choose $j^*$ such that $T2^{-j^*} \asymp T^{\delta}$. It holds that

$$\sum_k |\alpha_k| = O(\|w\|_\infty) \quad (5.14)$$

and

$$\sum_k |\beta_{j,k}| = O\bigl(\min\{2^{j/2}\|w\|_\infty,\ 2^{-j/2}\,TV(w)\}\bigr). \quad (5.15)$$
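The coefficient bounds (5.14) and (5.15) can be checked numerically for any concrete $w$ of bounded variation. In the sketch below, the test function, the grid size and the Riemann-sum quadrature are our own choices, not part of the proof:

```python
# Numerical check of the decay in (5.15): for a tent function w with
# TV(w) = 1, sum_k |beta_{j,k}| decays like 2^{-j/2} TV(w).
import numpy as np

def haar_beta(w, j, k, n=2**12):
    """beta_{j,k} = integral of w against the Haar wavelet psi_{j,k} on [0,1]."""
    z = (np.arange(n) + 0.5) / n                    # midpoint grid on [0,1]
    lo, mid, hi = (k - 1) * 2.0**-j, (k - 0.5) * 2.0**-j, k * 2.0**-j
    psi = 2**(j / 2) * ((z >= lo) & (z < mid)) - 2**(j / 2) * ((z >= mid) & (z < hi))
    return np.mean(w(z) * psi)                      # Riemann-sum quadrature

w = lambda z: np.minimum(z, 1 - z)                  # tent function, TV(w) = 1
for j in range(1, 8):
    s = sum(abs(haar_beta(w, j, k)) for k in range(1, 2**j + 1))
    print(j, round(s, 4), round(2.0**(-j / 2), 4))  # s tracks 2^{-j/2}
```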


This implies, with $\tilde Z_{j,k} = \sum_t I(X_{t-1} \in [(k-1)2^{-j}, k2^{-j}))\varepsilon_t$ and $\tilde Z_{j,k}' = \sum_t I(x_{t-1} \in [(k-1)2^{-j}, k2^{-j}))\xi_t$, that

$$\Bigl|\sum_{t=1}^{T} \tilde w(X_{t-1})\varepsilon_t - \sum_{t=1}^{T} \tilde w(x_{t-1})\xi_t\Bigr| \le \Bigl|\sum_k \alpha_k \bigl[\tilde Z_{0,k} - \tilde Z_{0,k}'\bigr]\Bigr| + \Bigl|\sum_{0 \le j \le j^*} \sum_k \beta_{j,k} \Bigl[\sum_t \psi_{j,k}(X_{t-1})\varepsilon_t - \sum_t \psi_{j,k}(x_{t-1})\xi_t\Bigr]\Bigr| + \cdots$$
