Stat. Meth. & Appl. (2007) 16:117–139
DOI 10.1007/s10260-006-0027-3

ORIGINAL ARTICLE
Bootstrap inference in local polynomial regression of time series

Maria Lucia Parrella · Cosimo Vitale
Received: 11 May 2004 / Revised: 14 April 2005 / Accepted: 25 September 2006 / Published online: 7 November 2006 © Springer-Verlag 2006
Abstract In this paper we consider the inferential aspects of the nonparametric estimation of a conditional function g(x; φ) = E[φ(X_t) | 𝐗_{t,m}], where 𝐗_{t,m} represents the vector containing the m conditioning lagged values of the series and φ is an arbitrary measurable function. The local polynomial estimator of order p is used for the estimation of the function g and of its partial derivatives up to a total order p. We consider α-mixing processes, and we propose the use of a particular resampling method, the local polynomial bootstrap, for the approximation of the sampling distribution of the estimator. After analyzing the consistency of the proposed method, we present a simulation study which gives evidence of its finite sample behaviour.

Keywords Nonparametric regression · Local polynomial fitting · Local bootstrap · α-mixing processes

1 Introduction

Several problems arising in the study of linear and nonlinear time series can be formulated in terms of conditional moments. For example, the conditional mean of a process can be used for detecting the trend of the process. The conditional variance, also known as the volatility function, is particularly significant
M. L. Parrella (B) · C. Vitale
Dipartimento di Scienze Economiche e Statistiche, Università di Salerno, Fisciano (SA), Italy
e-mail: [email protected]

C. Vitale
e-mail: [email protected]
in financial and economic markets, because it is connected to investment risk. Conditional moments are also used in forecasting, which is undoubtedly one of the main applications of time series analysis.

Increasing interest has lately been devoted to nonparametric estimators of such conditional moments. For example, the Nadaraya–Watson estimators are often used because they are easy to implement and consistent for the estimation of smooth functions, under appropriate conditions. Nevertheless, they present some difficulties, in terms of significant bias, especially at the boundaries of the support of the function, where few observations are usually available, and in terms of estimation of the derivatives of the function. These problems have motivated the development of another type of nonparametric kernel estimator: the local polynomial estimator, which represents a generalization of the Nadaraya–Watson estimator. The advantages and properties of local polynomial estimators have been analyzed in different contexts by several researchers; see, for example, Ruppert and Wand (1994), Masry and Fan (1997), Masry and Tjøstheim (1995) and Fan and Gijbels (1995). It is well known that local polynomial estimators have better properties than the Nadaraya–Watson estimators in terms of boundary effects. Moreover, they can be used to obtain an estimate of the derivatives of a smooth function in a very natural way. There are several reasons why it may be useful to estimate a derivative of a function: to analyze marginal increments of financial and economic data, to test the linearity of a process, to derive the functional form of the trend of a process, and so on.

The knowledge of the sampling distribution of the local polynomial estimator is required for the construction of confidence bands and hypothesis tests on the regression function. Under appropriate conditions, the asymptotic normal distribution of the local polynomial estimator is known and can be used to approximate such a sampling distribution, but it depends on some characteristics of the process that are unknown and difficult to estimate. This is therefore one of the circumstances in which a computational method, such as the bootstrap, is particularly useful.

To our knowledge, the bootstrap methods that have been proposed so far in nonparametric contexts similar to the one considered here are substantially based on the resampling of the residuals of a nonparametrically fitted autoregressive model. Franke et al. (2002), for example, use the residual bootstrap, while Neumann and Kreiss (1998) adopt the wild bootstrap. These methods require a preliminary nonparametric estimation of some specific conditional curves in order to capture the dependence structure of the series. Moreover, they impose a Markovian assumption on the bootstrap data generating process. In order to avoid such model assumptions, Paparoditis and Politis (2000) propose an alternative resampling method based on the idea of the local bootstrap of Shi (1991). The fundamental idea of this method is to base the bootstrap resampling on a consistent estimation of the conditional distribution function of the process. Paparoditis and Politis applied the local bootstrap to Nadaraya–Watson estimators of conditional moments, and considered a general dependent data context. The procedure is entirely nonparametric and
model-free, since it does not require the estimation of any characteristic of the original process.

This paper starts from the idea of Paparoditis and Politis. As in their paper, we consider the nonparametric local estimation of conditional moments for α-mixing processes, but here we apply the local bootstrap to the multivariate local polynomial estimators of generic order p instead of the Nadaraya–Watson estimators. The objective is to combine the advantages of the local bootstrap with those of local polynomial estimators over the Nadaraya–Watson ones. For this reason the method is called the local polynomial bootstrap (LPB). Since Nadaraya–Watson estimators can be viewed as local polynomial estimators of order p = 0, the LPB represents a generalization of the method proposed by Paparoditis and Politis.

This paper is organized as follows. Section 2 describes the problem and reports the assumptions made for the consistency of the LPB method. In Sect. 3 the multivariate local polynomial estimator is introduced. The local polynomial bootstrap procedure is discussed in detail in Sect. 4, where we also formalize the theorem which establishes its asymptotic validity. Section 5 reports the results from a simulation study, which gives evidence of the finite sample behaviour of the proposed resampling method. Some concluding remarks are reported in Sect. 6, while the appendix contains all technical lemmas and proofs.

2 Setup of the problem

Let {X_t; t ∈ N} be a strictly stationary stochastic process and let φ be an arbitrary real-valued measurable function. We assume that the conditional expectation

    g(x; φ) = E[ φ(X_t) | 𝐗_{t,m} = x ]    (1)
exists, where 𝐗_{t,m} = (X_{t−i_1}, …, X_{t−i_m}) is the vector of past data with respect to the set of m ≥ 1 integer indices i_1 ≤ i_2 ≤ ⋯ ≤ i_m, and x is a point in R^m. For simplicity, from now on we will omit in the notation the dependence of the vector 𝐗_{t,m} on the parameter m, writing simply 𝐗_t (the boldface distinguishes the lag vector from the scalar X_t). We also assume that the function g(x; φ) has continuous partial derivatives up to order p + 1 in an open neighborhood of x. Such partial derivatives are denoted by

    (D^j g)(x; φ) = ∂^{|j|} g(x; φ) / (∂x_1^{j_1} ⋯ ∂x_m^{j_m}),    0 ≤ |j| ≤ p + 1,    (2)

where j = (j_1, …, j_m) and |j| = Σ_{i=1}^m j_i. When |j| = 0, Eq. (2) is equivalent to (1), so an estimator of the quantity in (2) will give an estimate of either the regression function g(x; φ) or its partial derivative of order |j|, according to the specification of the vector j.

Given a finite realization of the process {X_t; t = 1, …, n}, let us consider the nonparametric local polynomial estimator of the function (2). We refer to the
papers of Masry (1996a,b), which explain the methods and properties of such an estimator applied to the multivariate case and to a dependent data context. Let us indicate the estimator with (D^j ĝ_h)(x; φ), for a given 0 ≤ |j| ≤ p. This estimator, as shown in Sect. 3, depends on a kernel function K(·) and on a parameter h, called the smoothing bandwidth, which regulates the smoothness of the estimated function. In order to make inference on the true function, we are interested in the sampling distribution of the following normalized statistic:

    √(n h^{m+2|j|}) [ (D^j ĝ_h)(x; φ) − (D^j g)(x; φ) ],    0 ≤ |j| ≤ p.    (3)
It is known that this statistic converges weakly to a normal distribution, under appropriate conditions. The asymptotic normal distribution is known, but it depends on some unknown characteristics of the process which are difficult to estimate, such as the conditional variance σ²(x; φ) = Var[φ(X_t) | 𝐗_t = x] and the density function of 𝐗_t. In this particular context, the bootstrap is a practical and useful alternative tool for the approximation of the sampling distribution of the local polynomial estimator (D^j ĝ_h)(x; φ).
Note that the above setup is broad enough to include different applications, as the function φ may be selected to give estimates of conditional moment functions, conditional distribution functions and conditional density functions, as well as their partial derivatives. The framework of reference is built on dependent data, but the method can also be applied to i.i.d. observations. Some of the possible applications of this nonparametric estimation method are d-step prediction, volatility analysis of financial data, trend analysis and goodness-of-fit tests.

We now state the assumptions made on the process and on the kernel function.

Assumption 1 The stochastic process {X_t} is strictly stationary and φ is a Borel function on R, such that E|φ(X_t)|^δ < ∞ for some δ ≥ 4.

Assumption 2 The following assumptions are made on the distribution law of X_t:

(a) f_{𝐗_t}(u) < ∞ and f_{𝐗_t,𝐗_{t+l}}(u, v) < ∞ for u, v ∈ I_γ(x), where I_γ(x) = {z : ‖z − x‖ < γ} for some γ > 0 and ‖·‖ is the Euclidean norm. Furthermore, |f_{𝐗_t,𝐗_{t+l}}(u, v) − f_{𝐗_t}(u) f_{𝐗_{t+l}}(v)| < ∞. The functions f_{𝐗_t}(·) and f_{𝐗_t,𝐗_{t+l}}(·,·) denote, respectively, the probability density of 𝐗_t and of (𝐗_t, 𝐗_{t+l}), with l ≥ 1;
(b) the density function f_{𝐗_t}(·) is uniformly continuous on the compact set I_γ(x) and is such that inf_{z∈I_γ(x)} f_{𝐗_t}(z) > 0;
(c) the conditional density function f_{X_t|𝐗_t}(·|·) exists and is bounded;
(d) the conditional density f_{X_t,X_{t+l}|𝐗_t,𝐗_{t+l}}(u, v | y_1, y_2) exists and is bounded for all l ≥ 1;
(e) the conditional distribution function F_{φ(X_t)|𝐗_t}(y|v) of φ(X_t) is continuous at v = x.
Assumption 3 The following assumptions are made on the dependence properties of the stochastic process:

(a) the process {X_t} is α-mixing, with the mixing coefficient α(k) satisfying Σ_{j=1}^∞ j^a α(j)^{1−2/v} < ∞ for some 2 < v ≤ δ and a > 1 − 2/v (as in Tong 1990; Doukhan 1994);
(b) there exists a sequence {s_n} of positive integers satisfying s_n → ∞ and s_n = o(√(n h^m)), such that √(n/h^m) α(s_n) → 0 as n → ∞.

Assumption 4 The following assumptions are made on the smoothing bandwidth h and on the kernel function K(·):

(a) the function K(·) is a product kernel defined on the unit ball in R^m, that is, K(u) = 0 for ‖u‖ > 1;
(b) |u^j K(u) − v^j K(v)| ≤ Q‖u − v‖ for 0 ≤ |j| ≤ 2p + 1 and a constant Q;
(c) the bandwidth h satisfies h → 0 and n h^m → ∞ as n → ∞. Furthermore, h = O(n^{−1/(m+2p+2)}).

Assumption 5 The partial derivatives (D^j g)(x; φ) are bounded and uniformly continuous on R^m, for |j| = p + 1. They are also Lipschitz continuous, that is, |(D^j g)(u; φ) − (D^j g)(v; φ)| ≤ Q‖u − v‖.

3 The multivariate local polynomial estimator

In this section we describe the nonparametric local polynomial estimator of the function (2), as defined in Masry (1996a,b). Local polynomial estimators are based on approximating g(x; φ) locally by a multivariate polynomial of total order p:

    g(z; φ) ≈ Σ_{0≤|j|≤p} (1/j!) (D^j g)(y; φ)|_{y=x} (z − x)^j ≡ Σ_{0≤|j|≤p} b_j(x; φ) (z − x)^j,    (4)
where we use the following notation:

    j! = j_1! × ⋯ × j_m!,    x^j = x_1^{j_1} × ⋯ × x_m^{j_m},

    Σ_{0≤|j|≤p} = Σ_{i=0}^p Σ_{j_1=0}^i ⋯ Σ_{j_m=0}^i  (with j_1 + ⋯ + j_m = i),

    b_j(x; φ) = (1/j!) (D^j g)(y; φ)|_{y=x}.    (5)
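To make the multi-index notation concrete, the following minimal Python sketch (our own illustration, not part of the original paper) enumerates the m-tuples j with |j| ≤ p and the group sizes N_i used below; the function name multi_indices is an assumption of this sketch.

```python
from itertools import product

def multi_indices(p, m):
    """Enumerate the multi-indices j = (j_1, ..., j_m) with |j| = i,
    for i = 0, ..., p; the position of j inside its group plays the
    role of the one-to-one mapping r_i(q) introduced in the text."""
    groups = []
    for i in range(p + 1):
        group = sorted(j for j in product(range(i + 1), repeat=m)
                       if sum(j) == i)
        groups.append(group)
    return groups

# For p = 2 and m = 2: N_0 = 1, N_1 = 2, N_2 = 3, so N = 6.
print([len(g) for g in multi_indices(2, 2)])   # -> [1, 2, 3]
```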
Given the observed time series {X_t, t = 1, …, n}, the estimate of the function in (2), for a given j such that 0 ≤ |j| ≤ p, is obtained by the least squares method, i.e. by minimizing

    Σ_{t=i_m+1}^n [ φ(X_t) − Σ_{0≤|j|≤p} b_j(x; φ) (𝐗_t − x)^j ]² K_h(𝐗_t − x)    (6)
with respect to each b_j(x; φ). Here K_h(u) = h^{−m} K(u/h), where the kernel function K(·) and the bandwidth parameter h satisfy Assumption 4 given in Sect. 2. As in Masry (1996a), we define the quantities

    s_{h,j}(x) = (1/(n − i_m)) Σ_{t=i_m+1}^n ((𝐗_t − x)/h)^j K_h(𝐗_t − x),    0 ≤ |j| ≤ 2p,    (7)

    t_{h,j}(x; φ) = (1/(n − i_m)) Σ_{t=i_m+1}^n φ(X_t) ((𝐗_t − x)/h)^j K_h(𝐗_t − x),    0 ≤ |j| ≤ p.    (8)
Let N_i be the number of distinct m-tuples j with |j| = i. Let us consider a function r_i(q) which establishes a one-to-one relationship between each such m-tuple and the integers 1 ≤ q ≤ N_i. We define the column vectors τ_{h,|j|} and β_{h,|j|}, both of length N_{|j|}, for 0 ≤ |j| ≤ p, whose generic elements of position q are, respectively,

    (τ_{h,|j|})_q = t_{h,r_{|j|}(q)}(x; φ),    (β_{h,|j|})_q = h^{|j|} b_{r_{|j|}(q)}(x; φ).    (9)
We also define the matrices S_{h,|j|,|k|}, of dimension N_{|j|} × N_{|k|}, for 0 ≤ |j| ≤ p and 0 ≤ |k| ≤ p, whose generic elements of position (q, v) are

    (S_{h,|j|,|k|})_{q,v} = s_{h, r_{|j|}(q) + r_{|k|}(v)}(x).    (10)
The quantities defined in (9) and (10) are collected to construct the following arrays:

    β_h(x; φ) = (β_{h,0}′, β_{h,1}′, …, β_{h,p}′)′,    τ_h(x; φ) = (τ_{h,0}′, τ_{h,1}′, …, τ_{h,p}′)′,    (11)

    S_h(x) = [ S_{h,|j|,|k|} ]_{0≤|j|,|k|≤p},    (12)

i.e. S_h(x) is the block matrix whose (|j|, |k|) block is S_{h,|j|,|k|}; the dimensions are respectively (N × 1), (N × 1) and (N × N), where N = Σ_{i=0}^p N_i. Assuming that the matrix S_h(x) is positive definite, the local polynomial estimator of the coefficients b_j(x; φ) in Eq. (4) can be derived as

    β̂_h(x; φ) = S_h^{−1}(x) τ_h(x; φ).    (13)

Looking at (5), (9), (11) and (12), the vector β̂_h(x; φ) gives an estimate of the regression function g(x; φ) and of its partial derivatives of order j, up to a
total order p. In particular, for a given j such that 0 ≤ |j| ≤ p, the qth element of the vector β̂_h(x; φ) is equal to

    [β̂_h(x; φ)]_q = h^{|j|} (D^j ĝ_h)(x; φ) / j!,    where  q = r_{|j|}^{−1}(j) + Σ_{l=0}^{|j|−1} N_l,    (14)
so it can be used to get an estimate of the partial derivative (D^j g)(x; φ).
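For illustration, a minimal Python sketch of the estimator (13)–(14) follows; it solves the weighted least squares problem (6) directly rather than building S_h and τ_h explicitly, reuses the multi_indices helper above, and all names (local_poly_fit, deriv_estimate) are ours. The Epanechnikov product kernel is only a convenient default, not a prescription of the paper.

```python
import numpy as np
from math import factorial

def local_poly_fit(X, y, x0, h, p):
    """Local polynomial fit at x0: regress y_t = phi(X_t) on the scaled
    monomials ((X_t - x0)/h)^j, |j| <= p, weighted by K_h(X_t - x0).
    X is the (n, m) array of lag vectors, y the (n,) responses.
    Returns beta_hat (ordered as in (11)) and the list of multi-indices."""
    n, m = X.shape
    U = (X - x0) / h
    # Epanechnikov product kernel, zero outside the unit cube componentwise
    w = np.prod(np.where(np.abs(U) < 1, 0.75 * (1 - U**2), 0.0), axis=1) / h**m
    js = [j for g in multi_indices(p, m) for j in g]
    Z = np.column_stack([np.prod(U**np.array(j), axis=1) for j in js])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return beta, js

def deriv_estimate(beta, js, j, h):
    """Invert (14): (D^j g_h)(x0) = j! * beta_q / h^{|j|}."""
    q = js.index(tuple(j))
    jfact = np.prod([factorial(ji) for ji in j])
    return jfact * beta[q] / h**sum(j)
```

With j = (0, …, 0) this returns the regression function itself; the weighted regression on scaled monomials is algebraically the same normal-equation system as (13).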
4 Inferential procedure through local polynomial bootstrap

4.1 The resampling method

Under the assumptions made in Sect. 2, Theorem 4 in Masry (1996b) establishes the joint asymptotic normality of the normalized statistics reported in (3). This result can be used to derive approximate confidence bands for the true regression function, provided that the bias vector and the variance/covariance matrix of the asymptotic distribution are known. These last quantities have also been derived, but they depend on some unknown characteristics of the process which are quite difficult to estimate. To overcome this problem, we suggest using a resampling method based on an extension of the local bootstrap of Paparoditis and Politis (2000). We refer to this method as the LPB, since we apply it to local polynomial estimators.

The LPB works in an intuitive way: bootstrap replicates of the observed time series {X_t, t = 1, …, n} are obtained by resampling the observations X_t* on the basis of a consistent estimator of the conditional distribution function F_{X_t|𝐗_t}(·|x). As the sampling distribution of (D^j ĝ_h)(x; φ) depends on this conditional distribution F_{X_t|𝐗_t}(·|x), a resampling procedure which takes into account a consistent estimator of this distribution will correctly reproduce, with high probability, the law of (D^j ĝ_h)(x; φ). We now describe the method in detail.
A. Bootstrap replicates of the time series are generated by resampling each individual observation X_t* independently of the others, according to the relation

    X_t* ∼ F̂_{b,X_t|𝐗_t}(· | 𝐗_t),    i_m + 1 ≤ t ≤ n,    (15)

where F̂_{b,X_t|𝐗_t}(· | 𝐗_t) represents the local polynomial estimate of order p of the conditional distribution function of X_t, given the vector of past values 𝐗_t. This estimate is obtained by using a second bandwidth parameter b, called the resampling bandwidth, whose function is to capture the bias term of the asymptotic distribution of (3), as shown in the Appendix. The local polynomial estimator of the conditional distribution function used in (15) is given by

    F̂_{b,X_t|𝐗_t}(· | x) = e_1′ S_b^{−1}(x) τ_b(x; φ),    (16)
where e_1 is a column vector of length N with all zeroes except a one in the first position, and φ = I_{(−∞,·]} is the indicator function. It is known that this estimator produces a conditional distribution function that is constrained neither to lie between 0 and 1, nor to be monotone increasing. In general this problem arises at the boundary of the function, where there are few observations, and these are concentrated on one side of the symmetric kernel, so that the differences (𝐗_t − x) in (6) tend to have the same sign. To generate the data, we use a modified version of the probability integral transformation: a sequence of random data is generated from a uniform distribution and, for each number, we take the percentile of the function corresponding to the nearest estimated value of the nonparametric distribution function, without taking into consideration the fact that somewhere the function is not monotone increasing. Since we take the percentiles corresponding to random numbers generated from a uniform distribution, we exclude from the resampling those parts of the conditional function outside the interval [0, 1]. This does not imply that the observations at the boundary are always excluded from the resampling, but they may occasionally be excluded when generating some parts of the bootstrap series. Note that this is common in bootstrap applications to kernel estimators: the residual bootstrap always requires that the residuals produced at the boundary of the function be excluded from the resampling, as they derive from a locally estimated autoregressive function which is inaccurate, because of the few observations available.

B. Using the generated bootstrap sequence (15), the bootstrap counterpart of (13) is

    β̂_h*(x; φ) = S_h^{−1}(x) τ_h*(x; φ),    (17)

where the vector τ_h*(x; φ) is obtained by replacing the observed data φ(X_t) in (8) with the bootstrap data φ(X_t*). Considering Eqs. (5), (9), (11) and (12), it is also possible to obtain from (17) a bootstrap replicate of (D^j ĝ_h)(x; φ), given that the qth element of the vector β̂_h*(x; φ) is equal to

    [β̂_h*(x; φ)]_q = h^{|j|} (D^j ĝ_h*)(x; φ) / j!,    with  q = r_{|j|}^{−1}(j) + Σ_{l=0}^{|j|−1} N_l.    (18)
Note that the estimate in (13) is based on the sequence of observed pairs {(X_t, 𝐗_t), t = i_m + 1, …, n}, while the bootstrap estimate in (17) is based on the sequence of bootstrap pairs {(X_t*, 𝐗_t), t = i_m + 1, …, n}, where (X_j*, 𝐗_j) and (X_s*, 𝐗_s) are independent for j ≠ s, and the vectors of lagged values (𝐗_t, t = i_m + 1, …, n) are the observed ones, so that at first sight they seem not to depend on the values (X_t*, t = i_m + 1, …, n). The dependence structure of the observed time series is reproduced in the bootstrap replicated series by generating each X_t* according to an estimated distribution function which varies with the index t, and which is conditioned on the vector 𝐗_t. This choice makes the bootstrap algorithm easier and faster.
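A hedged sketch of steps A–B may help fix ideas: it estimates F̂_b(·|𝐗_t) as the intercept of a local polynomial fit of the indicator φ = I_{(−∞,y]} evaluated on a grid of candidate values, and then applies the nearest-percentile inversion described above. It reuses local_poly_fit from the sketch in Sect. 3; the names lpb_resample and grid are ours, and the brute-force double loop is written for clarity, not efficiency.

```python
import numpy as np

def lpb_resample(X_lags, y, b, p, grid, rng):
    """One LPB replicate (steps A-B): for each t, draw X_t* from the
    local polynomial estimate of F(y | X_t) with resampling bandwidth b,
    via the modified probability integral transformation."""
    n = len(y)
    y_star = np.empty(n)
    for t in range(n):
        # F_hat(g_k | X_t): intercept (q = 0) of the fit of 1{y_s <= g_k}
        F_hat = np.array([local_poly_fit(X_lags, (y <= gk).astype(float),
                                         X_lags[t], b, p)[0][0]
                          for gk in grid])
        u = rng.uniform()
        y_star[t] = grid[np.argmin(np.abs(F_hat - u))]  # nearest percentile
    return y_star
```

Because each X_t* is drawn independently given the observed lag vectors, the loop over t can be vectorized or parallelized, which reflects the computational advantage just mentioned.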
C. The distribution of the normalized statistic in (3), for a given j such that 0 ≤ |j| ≤ p, can be approximated by means of the following bootstrap distribution:

    √(n h^{m+2|j|}) [ (D^j ĝ_h*)(x; φ) − (D^j g*)(x; φ) ] | X_1, …, X_n,    (19)

where the quantity (D^j g*)(x; φ) represents the expected value of the bootstrap statistic, conditioned on the observed data, that is

    g*(x; φ) = E*[ φ(X_t*) | 𝐗_t = x ] = ∫ φ(y) dF̂_{b,X_t|𝐗_t}(y | x) ≡ ĝ_b(x; φ),    (20)

    (D^j g*)(x; φ) = ∂^{|j|} g*(x; φ) / (∂x_1^{j_1} ⋯ ∂x_m^{j_m}),    for 0 ≤ |j| ≤ p.    (21)
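In Monte Carlo form, step C amounts to repeating the resampling B times and reading off percentiles. A minimal sketch follows, reusing the helpers above; the centering term g* is computed here as the oversmoothed derivative estimate with bandwidth b, in the spirit of (20)–(21), and the basic-bootstrap band construction is one possible choice, not mandated by the paper.

```python
import numpy as np

def lpb_band(X_lags, y, x0, h, b, p, j, B=999, level=0.90, rng=None):
    """Confidence bounds for (D^j g)(x0) via the LPB (step C)."""
    rng = rng or np.random.default_rng()
    beta, js = local_poly_fit(X_lags, y, x0, h, p)
    est = deriv_estimate(beta, js, j, h)
    beta_b, js_b = local_poly_fit(X_lags, y, x0, b, p)
    g_star = deriv_estimate(beta_b, js_b, j, b)      # centering term (20)-(21)
    grid = np.sort(y)
    boot = np.empty(B)
    for r in range(B):
        y_star = lpb_resample(X_lags, y, b, p, grid, rng)
        beta_s, _ = local_poly_fit(X_lags, y_star, x0, h, p)
        boot[r] = deriv_estimate(beta_s, js, j, h) - g_star
    a = (1 - level) / 2
    q_lo, q_hi = np.quantile(boot, [a, 1 - a])
    return est - q_hi, est - q_lo                    # basic bootstrap interval
```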
4.2 Consistency of the LPB

In order to analyze the consistency of the LPB method, it is necessary to make an assumption on the new parameter introduced, the resampling bandwidth b. This parameter must satisfy the following condition.

Assumption 6 The resampling bandwidth b is such that b = O(n^{−ω}), with 0 < ω < 1/[m(m + 2)].

Assumption 6 is needed for the bootstrap to obtain a correct approximation of the bias term of the distribution of the statistic in (3). In general this parameter is of greater order than the smoothing bandwidth h of the observed statistic in (13), as follows by comparing Assumptions 4c and 6. This situation, known as the oversmoothing assumption, appears whenever the bootstrap is used to approximate the sampling distribution of some kernel-type estimator [see, for example, Franke et al. (2002), Paparoditis and Politis (2000)]. The consistency of the LPB method is based on the following theorem.

Theorem 1 If Assumptions 1–6 are satisfied, for every continuity point of the functions f_{𝐗_t}(·) and σ²(x; φ), and for every compact set C ⊂ R^m, we have, for a given vector j such that 0 ≤ |j| ≤ p, that

    sup_{x∈C} | L_n(x) − L_n*(x) |  →p  0,    for n → ∞,

where L_n(x) is the distribution of the statistic √(n h^{m+2|j|}) [ (D^j ĝ_h)(x; φ) − (D^j g)(x; φ) ], and L_n*(x) is the distribution of the bootstrap statistic √(n h^{m+2|j|}) [ (D^j ĝ_h*)(x; φ) − (D^j g*)(x; φ) ] | X_1, …, X_n.
Proof See the Appendix.

As follows from condition 4c, the smoothing bandwidth h depends on the sample size n, as well as on the parameters p and m. In particular, it increases with p and m and decreases with n. A natural explanation of this is that the bandwidth must be wide enough to include, within the compact set around the point x, a number of observations sufficient to estimate a multivariate (m ≥ 1) polynomial function of order p. As n grows, the probability of observing that number of points in a small compact set increases as well. Assumption 6 shows that the resampling bandwidth b does not depend on the polynomial order p. Nevertheless, further investigation is required in order to identify any specific relation between h and b, which could be used to find an objective rule for the choice of the resampling bandwidth b, given the bandwidth h.

5 Simulation results

In order to give some evidence of the finite sample behaviour of the local polynomial bootstrap, we report in this section the results of a simulation study. We consider the following two models:

    X_t = (1 + 0.8 e^{−X²_{t−1}}) X_{t−1} − (0.25 + 1.5 e^{−X²_{t−1}}) X_{t−2} + 0.5 ε_t,    (22)

    X_t = sin(X_{t−2}) + ε_t,    (23)

where ε_t ∼ N(0, 1). Model (22) is an exponential autoregressive model of order two, EXPAR(2), discussed in Lu (1999). For this model, the function

    g(X_{t−1}, X_{t−2}) = (1 + 0.8 e^{−X²_{t−1}}) X_{t−1} − (0.25 + 1.5 e^{−X²_{t−1}}) X_{t−2}

represents the conditional mean of the process. Note that in this example and in the next one, the function φ corresponds to the identity function, and it is therefore omitted from the notation. Realizations of length n = 500 have been generated from the model (22) (a simulation sketch is given below), and the nonparametric local polynomial estimator with p = 1, the local linear estimator, has been used to estimate the function g(x). Figure 1 reports the scatter plot of the lagged values (X_{t−1}, X_{t−2}), for a particular realization of the model, and the estimated surface of the conditional mean function g(x) resulting from the local linear estimation. We chose the Epanechnikov kernel function with smoothing bandwidth h = 1.3, and we applied the LPB method to derive the 90% confidence region for the function g(x). The resampling bandwidth b was chosen to be 2. For the sake of brevity, the choice of the bandwidth parameters h and b was made empirically, taking into account the closeness of the estimated surface to the true function. An objective rule for the selection of such parameters, like one based on a cross-validation criterion, would clearly be preferable for optimization reasons, but this is beyond the purposes of this paper. The distribution of the bootstrap statistic has been evaluated by Monte Carlo approximation, using 999 bootstrap replications.
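Realizations of the EXPAR(2) model can be generated as in the following sketch (the burn-in length and seed are our own choices, not specified in the paper):

```python
import numpy as np

def simulate_expar2(n, burn=200, seed=0):
    """Simulate model (22): X_t = (1 + 0.8 e^{-X_{t-1}^2}) X_{t-1}
    - (0.25 + 1.5 e^{-X_{t-1}^2}) X_{t-2} + 0.5 eps_t, eps_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n + burn)
    eps = rng.standard_normal(n + burn)
    for t in range(2, n + burn):
        w = np.exp(-x[t - 1]**2)
        x[t] = (1 + 0.8*w)*x[t - 1] - (0.25 + 1.5*w)*x[t - 2] + 0.5*eps[t]
    return x[burn:]

X = simulate_expar2(500)
X_lags = np.column_stack([X[1:-1], X[:-2]])   # (X_{t-1}, X_{t-2})
resp = X[2:]                                  # response X_t
```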
Fig. 1 Simulation from an EXPAR(2) model. On the left, the scatter plot of the lagged values (X_{t−1}, X_{t−2}), for a particular realization of the model. On the right, the estimated surface of the conditional mean function g(x) obtained through the local linear estimator
The procedure has been repeated on 500 different realizations of the model (22). The results of the bootstrap estimates, in terms of mean values and standard deviations, are reported in Table 1. Figure 2 summarizes the results by representing the mean of the 500 estimations. The value X_{t−1} is represented on the x-axis, while the three rows correspond to different sections of the surface for three different values of X_{t−2}. Similar results are obtained for the other points of the support of the function. On the left-hand side of the figure, the central dashed line of each plot represents the local linear estimate of the function, while the central solid line represents the true function. On the same plots, the two dotted lines are the bootstrap 90% confidence bands, derived through the LPB method. On the right-hand side of the figure, such confidence bands (dotted lines) are compared with the true confidence bands (solid lines), which have been derived by Monte Carlo approximation, considering 5,000 replications of the model. As shown in the figure, the LPB method provides a reasonable approximation of the sampling distribution of the local polynomial estimator. Some discrepancies arise as we move towards the bounds of the function's support, where the local polynomial estimate is generally inaccurate, as few observations are available.

Model (23) is discussed in Paparoditis and Politis (2000). They considered the estimation of the conditional mean m(x) = E(X_t | X_{t−2} = x) = sin(x), while here we concentrate on the estimation of the first derivative of this function, m′(x) = cos(x). Note that this derivative could not be estimated with the Nadaraya–Watson estimator used by Paparoditis and Politis. In this example we consider realizations of length n = 300. The smoothing bandwidth was chosen to be h = 0.8, while the resampling bandwidth was b = 1.4. Since we look at the first derivative of the conditional mean, we consider the local polynomial estimator of degree p = 2. The other settings are the same as in the previous example. Table 2 reports the results of the experiment. Figure 3 summarizes these results. The meanings of the plotted lines are the same as in Figs. 1 and 2. The true confidence bands (solid lines) have been evaluated by Monte Carlo approximation, considering 10,000 replications of the model.
Table 1 Bootstrap estimates of some percentage points of the distribution of the local linear estimator ĝ_h(X_{t−1}, X_{t−2}), for different values of X_{t−1} and X_{t−2}

X_{t−2} = −3:
             X_{t−1} =   −3       −2.6     −2       −1.4     −1
    5%   Exa            −2.447   −2.005   −1.305   −0.413    0.352
         Boo            −2.736   −2.238   −1.600   −0.776   −0.040
         STD             0.919    0.917    0.918    0.921    0.929
    95%  Exa            −1.986   −1.624   −0.869    0.287    1.363
         Boo            −2.085   −1.671   −0.844    0.443    1.752
         STD             0.918    0.917    0.918    0.922    0.931

X_{t−2} = −1.4:
             X_{t−1} =   −3       −2.2     −1.2     −0.2      0.6
    5%   Exa            −2.942   −1.966   −0.486    1.462    2.367
         Boo            −3.215   −2.097   −0.600    1.559    2.533
         STD             0.920    0.916    0.916    0.916    0.920
    95%  Exa            −2.372   −1.734   −0.297    1.742    2.968
         Boo            −2.434   −1.763   −0.250    1.985    3.295
         STD             0.920    0.916    0.916    0.916    0.921

X_{t−2} = −0.2:
             X_{t−1} =   −2       −1.2     −0.2      0.8      1.6
    5%   Exa            −2.128   −1.342   −0.060    1.140    1.769
         Boo            −2.231   −1.469   −0.192    1.090    1.730
         STD             0.916    0.916    0.916    0.916    0.916
    95%  Exa            −1.882   −1.128    0.131    1.358    2.023
         Boo            −1.875   −1.106    0.238    1.492    2.043
         STD             0.916    0.916    0.916    0.916    0.916

"Boo" refers to the mean value while "STD" to the standard deviation of the bootstrap estimates over 500 independent replications of the model (22). The exact values ("Exa") of such percentage points have been derived by Monte Carlo approximation
It is evident that the local quadratic estimator effectively identifies the derivative of the regression function, while the LPB method works satisfactorily in approximating the corresponding sampling distribution.
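A simple way to quantify this agreement on simulated data is the empirical sup-distance between the Monte Carlo and bootstrap distributions of the normalized statistic, mirroring the convergence stated in Theorem 1. A minimal sketch follows; the two input arrays are hypothetical samples produced by experiments like those above.

```python
import numpy as np

def sup_cdf_distance(a, b):
    """Empirical counterpart of sup |L_n(x) - L_n*(x)|: the maximum
    absolute difference between the two samples' empirical CDFs."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return float(np.max(np.abs(Fa - Fb)))
```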
6 Conclusions

In this paper we have proposed a local resampling method for the approximation of the sampling distribution of the multivariate local polynomial estimator of generic order p. This method, called the LPB, represents a generalization of the local bootstrap of Paparoditis and Politis (2000). The purpose is to combine the advantages of the local bootstrap with those of local polynomial estimators over the Nadaraya–Watson estimator. The proposed method is totally model-free and works under general dependence assumptions. In spite of such dependence, the resampling mechanism on which it is based is such that the observations of the bootstrap samples are generated independently of one another. This makes the computational procedure simple and fast. The LPB has the advantage of being easier to apply than the classic asymptotic approximation, since it does not require the estimation of any unknown quantities, namely the bias and the variance of the asymptotic distribution.
Fig. 2 The three rows of the figure refer to three different sections of the support of the function g(x), for different values of X_{t−2}. The value X_{t−1} is represented on the x-axis of each plot. On the left-hand side, the central dashed line of each plot represents the local linear estimate of the conditional mean function, while the central solid line is the true conditional mean function. The two dotted lines are the 0.05 and the 0.95 percentile bands derived through the LPB method. On the right-hand side, such bootstrap confidence bands (dotted lines) are compared with the true confidence bands (solid lines), derived by Monte Carlo approximation. All the estimated curves refer to the mean value of the estimates obtained from 500 different replications of the model
Table 2 Bootstrap estimates of some percentage points of the distribution of the local quadratic estimator m̂_h(X_{t−2}), for different values of X_{t−2}

             X_{t−2} =   −2.4     −1.6     −0.8      0        0.8      1.6      2.4
    5%   Exa            −2.111   −0.661    0.249    0.589    0.242   −0.667   −2.1
         Boo            −1.901   −0.693    0.224    0.51     0.182   −0.701   −2.451
         STD             1.392    0.408    0.202    0.188    0.22     0.418    1.289
    95%  Exa             0.736    0.582    1.094    1.324    1.078    0.597    0.685
         Boo             1.106    0.567    1.061    1.253    1.005    0.52     0.43
         STD             1.152    0.337    0.219    0.195    0.242    0.444    1.364

"Boo" refers to the mean value while "STD" to the standard deviation of the bootstrap estimates over 500 independent replications of the model (23). The exact values ("Exa") of such percentage points have been derived by Monte Carlo approximation
Fig. 3 Simulation from the model X_t = sin(X_{t−2}) + ε_t. Top: the time plot and the scatter plot of the values (X_t, X_{t−2}), for a particular realization of the model. Bottom: on the left, the central dashed line represents the local quadratic estimate of the first derivative of the conditional mean function, while the central solid line is the true derivative function; the two dotted lines are the 0.05 and the 0.95 percentile bands derived through the LPB method. On the right, the LPB confidence bands (dotted lines) are compared with the corresponding true percentile bands evaluated by Monte Carlo approximation (solid lines)
Comparing the LPB with the residual bootstrap, there are some advantages in using the LPB. For example, it is not necessary to "clean" the data of dependence with some initial regression estimation. In the LPB, the bootstrap samples are generated directly on the basis of a local polynomial estimate of the conditional distribution function F_{X_t|𝐗_t}. Moreover, the proposed method is not based on a Markovian assumption on the bootstrap data generating process: this also
means that the bootstrap series can be generated through matrix operations, without the need to implement a (slower) recursive algorithm.

We have presented a theorem that proves the consistency of the proposed method under the assumptions made in Sect. 2. In particular, a resampling bandwidth b is introduced, generally of greater order than the smoothing bandwidth h, whose purpose is to capture the bias term of the distribution of the estimator. Such an oversmoothing assumption is due to the characteristics of kernel estimators, and is common in applications of the bootstrap to problems similar to the one considered here. The results of the simulation study presented in Sect. 5 have demonstrated that the finite sample behaviour of the LPB method is satisfactory.
Appendix

Before reporting the proofs, we need some additional notation. Let us consider the multivariate moments of the functions K(·) and K²(·), defined respectively as

    µ_j = ∫_{R^m} u^j K(u) du,    ν_j = ∫_{R^m} u^j K²(u) du,    for 0 ≤ |j| ≤ 2p.
As for the matrices S_{h,i,j} in (10), we define the N_i × N_j dimensional matrices S_{i,j} and S̃_{i,j} using respectively the quantities µ_{r_i(q)+r_j(v)} and ν_{r_i(q)+r_j(v)}, and collect them to form the matrices S and S̃, as for S_h in (12). Then we define the N × N_{p+1} dimensional matrices M_h and M as

    M_h = ( S_{h,0,p+1}; S_{h,1,p+1}; …; S_{h,p,p+1} ),    M = ( S_{0,p+1}; S_{1,p+1}; …; S_{p,p+1} ),

with the blocks stacked vertically.
Let us arrange the N_{p+1} elements of the derivatives (1/j!)(D^j g)(x; φ), for |j| = p + 1, as a column vector g^{(p+1)}(x; φ), using the same matching rule introduced earlier. Furthermore, let us denote by τ_h^c(x; φ) the centered version of the vector τ_h(x; φ), based on the quantities

    t^c_{h,r_{|j|}(q)}(x; φ) = (1/(n − i_m)) Σ_{t=i_m+1}^n [ φ(X_t) − g(𝐗_t; φ) ] ((𝐗_t − x)/h)^j K_h(𝐗_t − x).
The corresponding bootstrap version τ_h*ᶜ(x; φ) is defined as

    t*ᶜ_{h,r_{|j|}(q)}(x; φ) = (1/(n − i_m)) Σ_{t=i_m+1}^n [ φ(X_t*) − g*(𝐗_t; φ) ] ((𝐗_t − x)/h)^j K_h(𝐗_t − x).

Let Q_n* represent a generic linear combination of the quantities t*ᶜ_{h,j}(x; φ):

    Q_n* = Σ_{0≤|j|≤p} c_j t*ᶜ_{h,j}(x; φ) = (1/(n − i_m)) Σ_{t=i_m+1}^n Z_t*(x; φ),

where

    Z_t*(x; φ) = [ φ(X_t*) − g*(𝐗_t; φ) ] C_h(𝐗_t − x),    C_h(u) = (1/h^m) C(u/h)    and    C(u) = Σ_{0≤|j|≤p} c_j u^j K(u).
We will not indicate the dependence of the quantities defined so far on the point x and on the function φ when this is clear. As usual, we use the star notation to refer to the bootstrap counterparts of such quantities.

Proof of Theorem 1 Following Masry (1996a), the following identity holds:

    β̂_h*(x; φ) − β*(x; φ) = S_h^{−1} τ_h*ᶜ + h^{p+1} S_h^{−1} M_h g*^{(p+1)}(x; φ) + o_p(b^{p+1}),    (24)

so we can write

    √(n h^m) [ β̂_h*(x; φ) − β*(x; φ) ]
        = √(n h^m) S_h^{−1} τ_h*ᶜ + √(n h^{m+2(p+1)}) S_h^{−1} M_h g*^{(p+1)}(x; φ) + o_p(√(n h^m b^{2(p+1)})).

We have from Lemma 2 that the vector τ_h*ᶜ, multiplied by the factor √(n h^m), converges in distribution to a multivariate normal with zero mean and variance–covariance matrix equal to σ²(x; φ) f_{𝐗_t}(x) S̃. This, together with Theorem 1 of Masry (1996b), implies that

    √(n h^m) S_h^{−1} τ_h*ᶜ  →d  N( 0, (σ²(x; φ)/f_{𝐗_t}(x)) S^{−1} S̃ S^{−1} ),

which correctly expresses the variability of the considered estimator, while from Lemma 3 we conclude that

    √(n h^{m+2(p+1)}) S_h^{−1} M_h g*^{(p+1)}(x; φ)  →p  √(n h^{m+2(p+1)}) S^{−1} M g^{(p+1)}(x; φ),

which captures the asymptotic bias.
The proof of Theorem 1 is based on the following lemma, which deals with the uniform convergence of the bootstrap conditional means g*(x; φ) to g(x; φ), for values of x in a decreasing compact set. Put the vector β*(x; φ) as

    β*(x; φ) = E*[ β̂(x; φ) ] = β̂_b(x; φ).    (25)

Lemma 1 Under Assumptions 1–6, let I_h(x) be the compact subset of R^m given by [x − h; x + h], with h > 0. If h → 0 as n → ∞, then we have

    (a)  sup_{x∈I_h(x)} ‖ β*(x; φ) − β(x; φ) ‖  →p  0,    (26)

where β*(x; φ) = E*[β̂(x; φ)] = β̂_b(x; φ), and

    (b)  sup_{x∈I_h(x)} | (D^j g*)(x; φ) − (D^j g)(x; φ) |  →p  0

for each 0 ≤ |j| ≤ p.

Proof Given (14), (20), (21) and (25), the result in (b) directly follows from (a), so we just have to prove (a). The following identity holds:

    β*(x; φ) − β(x; φ) = S_b^{−1} t_b^c + b^{p+1} S_b^{−1} M_b g^{(p+1)}(x; φ) + o_p(b^{p+1}).    (27)

Given (27), it is useful to split the argument of the sup in (26) as follows:

    sup_{x∈I_h(x)} ‖ β*(x; φ) − β(x; φ) ‖
        ≤ sup_{x∈I_h(x)} ‖ S_b^{−1} t_b^c ‖ + sup_{x∈I_h(x)} b^{p+1} ‖ g^{(p+1)}(x; φ) ‖ ‖ S_b^{−1} M_b ‖ + o_p(b^{p+1})
        = B_1 + B_2 + B_3.
Concerning the first term B_1, under hypothesis 2b it is enough to prove that the generic elements of the matrix S_b and of the vector t_b^c converge to zero uniformly on I_h(x), i.e.

    sup_{x∈I_h(x)} | s_{b,j}(x) − f_{𝐗_t}(x) µ_j |  →p  0    for 0 ≤ |j| ≤ 2p,    (28)

    sup_{x∈I_h(x)} | t^c_{b,j}(x) |  →p  0    for 0 ≤ |j| ≤ p.    (29)

In order to prove (28) and (29), we divide the compact set I_h(x) = [x − h; x + h] into a finite number L_n of subsets I_k of dimension m, with sides of length l_n = 2h/L_n^{1/m}, and we consider the central values x_k of such subsets, for k = 1, …, L_n. Note that for n → ∞ we must have h → 0 and l_n → 0, so that the subsets become narrower around the points x_k, and also the width of the interval I_h(x)
tends to zero. Now we study the behaviour of the sup inside the subsets as n → ∞. For the matrix S_b we have

    sup_{x∈I_h(x)} | s_{b,j}(x) − f_{𝐗_t}(x) µ_j |
        ≤ sup_{x∈I_h(x)} | s_{b,j}(x) − E[s_{b,j}(x)] | + sup_{x∈I_h(x)} | E[s_{b,j}(x)] − f_{𝐗_t}(x) µ_j |.

Under hypotheses 4a, 2b and 6, Proposition 1 of Masry (1996a) states that sup_{x∈I_h(x)} |E[s_{b,j}(x)] − f_{𝐗_t}(x) µ_j| = o(1), while for the first term we have

    sup_{x∈I_h(x)} | s_{b,j}(x) − E[s_{b,j}(x)] | = max_{1≤k≤L_n} sup_{x∈I_h(x)∩I_k} | s_{b,j}(x) − E[s_{b,j}(x)] |
        ≤ max_{1≤k≤L_n} sup_{x∈I_h(x)∩I_k} | s_{b,j}(x) − s_{b,j}(x_k) | + max_{1≤k≤L_n} | s_{b,j}(x_k) − E[s_{b,j}(x_k)] |
        + max_{1≤k≤L_n} sup_{x∈I_h(x)∩I_k} | E[s_{b,j}(x_k)] − E[s_{b,j}(x)] |
        = S_1 + S_2 + S_3.
Given (7) and condition 4b, it follows that

    | s_{b,j}(x) − s_{b,j}(x_k) | ≤ (C/b^{m+1}) ‖x − x_k‖,    (30)

so if we set η = l_n b^{−(m+1)} and consider the sup of (30), we have S_1 = O(η), S_3 = O(η) and

    P(S_2 > η) ≤ Σ_{k=1}^{L_n} P( | s_{b,j}(x_k) − E[s_{b,j}(x_k)] | > η )
        ≤ (1/η²) Σ_{k=1}^{L_n} E| s_{b,j}(x_k) − E[s_{b,j}(x_k)] |²
        ≤ (1/η²) L_n sup_{x∈I_h(x)} Var(s_{b,j}(x)) = O( L_n / (η² n b^m) ).

The inequality relative to the term S_2 follows from Markov's inequality and from hypotheses 2a, 3a, 4a and 6, for which sup_{x∈I_h(x)} Var[s_{b,j}(x)] = O(n^{−1} b^{−m}) (see Theorem 1 of Masry 1996a). We proceed in the same way for the generic element of the vector t_b^c:

    sup_{x∈I_h(x)} | t^c_{b,j}(x) | ≤ sup_{x∈I_h(x)} | t^c_{b,j}(x) − t^c_{b,j}(x_k) | + sup_{x∈I_h(x)} | t^c_{b,j}(x_k) |
        = max_{1≤k≤L_n} sup_{x∈D∩I_k} | t^c_{b,j}(x) − t^c_{b,j}(x_k) | + max_{1≤k≤L_n} | t^c_{b,j}(x_k) | = T_1 + T_2.
Given (8) and hypothesis 4b, we get

    | t^c_{b,j}(x) − t^c_{b,j}(x_k) | ≤ (C ‖x − x_k‖ / b^{m+1}) (1/(n − i_m)) Σ_{t=i_m+1}^n | φ(X_t) − g(𝐗_t; φ) |,

so, considering the sup, we have under Assumption 1 that T_1 = O(η). For the term T_2, by applying Markov's inequality and Lemmas 1 and 2 of Masry (1996a), we have under hypotheses 1, 2c–d, 3a, 4a and 6 that

    P(T_2 > η) = P( max_{1≤k≤L_n} | t^c_{b,j}(x_k) | > η ) ≤ Σ_{k=1}^{L_n} P( | t^c_{b,j}(x_k) | > η )
        ≤ (1/η²) Σ_{k=1}^{L_n} E| t^c_{b,j}(x_k) |² ≤ (L_n/η²) max_{1≤k≤L_n} E| t^c_{b,j}(x_k) |² = O( L_n / (η² n b^m) ).

Finally, as L_n = (2h/l_n)^m and η = l_n b^{−(m+1)}, it follows that

    P( T_2 > l_n/b^{m+1} ) = O( b^{m+2} h^m / (l_n^{m+2} n) ),    P( S_2 > l_n/b^{m+1} ) = O( b^{m+2} h^m / (l_n^{m+2} n) ).

Therefore, in order to prove (28) and (29), these probabilities must tend to zero as n → ∞. This is true if we choose l_n = b n^{−(m+2)^{−1}}, b = O(n^{−ω}) and 0 < ω < (m + 2)^{−1}. Regarding the term B_2, the result in (28) and Assumptions 2b and 5 imply that

    B_2 = sup_{x∈I_h(x)} b^{p+1} ‖ g^{(p+1)}(x; φ) ‖ ‖ S_b^{−1} M_b ‖ = O(b^{p+1}).
Lemma 2 Under Assumptions 1–6, for every continuity point of the functions f_{𝐗_t}(·) and σ²(x; φ), we have

    √(n h^m) Q_n*  →d  N( 0, f_{𝐗_t}(x) σ²(x; φ) ∫_{R^m} C²(u) du ).
Proof Note that √(n h^m) Q_n* is equal to

    √(n h^m) (1/(n − i_m)) Σ_{t=i_m+1}^n Z_t* ≈ √(h^m/n) Σ_{t=i_m+1}^n [ φ(X_t*) − g*(𝐗_t; φ) ] C_h(𝐗_t − x)
and that, conditionally on X_1, X_2, …, X_n, the series

    Z_t* = [ φ(X_t*) − g*(𝐗_t; φ) ] C_h(𝐗_t − x),    t = i_m + 1, …, n,

forms a triangular array of independent random variables, with zero mean and variance dependent on t:

    E*(Z_t*) = E*[ φ(X_t*) − g*(𝐗_t; φ) ] C_h(𝐗_t − x) = 0;    Var*(Z_t*) = E*(Z_t*²).

To apply the central limit theorem of Lindeberg–Feller, we have to show that

    (h^m/n) Σ_{t=i_m+1}^n E*(Z_t*²)  →  f_{𝐗_t}(x) σ²(x; φ) ∫_{R^m} C²(u) du    (31)

and that, for every η > 0, the Lindeberg condition is satisfied, i.e.

    (h^m/n) Σ_{t=i_m+1}^n E*[ Z_t*² I( (h^m/n) Z_t*² > η ) ]  →p  0.    (32)
Let us consider first (31). We define the generic conditional central moment of order s of the random variable φ(X_t) as

    V^s(x; φ) = ∫ [ φ(y) − g(x; φ) ]^s f_{X_t|𝐗_t}(y|x) dy,

and consider its bootstrap estimator V^{s*}(𝐗_t; φ) = E*[ φ(X_t*) − g*(𝐗_t; φ) ]^s. We have

    (h^m/n) Σ_{t=i_m+1}^n E*[ (φ(X_t*) − g*(𝐗_t; φ))² ] C_h²(𝐗_t − x)
        = (h^m/n) Σ_{t=i_m+1}^n V²(𝐗_t; φ) C_h²(𝐗_t − x)
        + (h^m/n) Σ_{t=i_m+1}^n [ V^{2*}(𝐗_t; φ) − V²(𝐗_t; φ) ] C_h²(𝐗_t − x).
By Theorem 2 of Masry (1996b), the first term in the last equality converges, under Assumptions 1, 2d, 3a and 4a, to

    (h^m/n) Σ_{t=i_m+1}^n V²(𝐗_t; φ) C_h²(𝐗_t − x)  →  f_{𝐗_t}(x) σ²(x; φ) ∫_{R^m} C²(u) du,

while for the second term, as V²(z; φ) = g(z; φ(y) = y²) − g²(z; φ(y) = y) and V^{2*}(z; φ) = g*(z; φ(y) = y²) − g*²(z; φ(y) = y), given Lemma 1 and some partial results proved in Theorem 2 of Masry, we have that

    | (h^m/n) Σ_{t=i_m+1}^n [ V^{2*}(𝐗_t; φ) − V²(𝐗_t; φ) ] C_h²(𝐗_t − x) |    (33)
        ≤ sup_{z∈I_h(x)} | V^{2*}(z; φ) − V²(z; φ) | O_p(1) ≤ O_p(1) sup_{z∈I_h(x)} | g*(z; φ²) − g(z; φ²) |
        + O_p(1) sup_{z∈I_h(x)} | g*(z; φ) − g(z; φ) | sup_{z∈I_h(x)} | g*(z; φ) + g(z; φ) |  →p  0.
Concerning the Lindeberg condition, using the inequality E[|X| I(|X| > δ)] ≤ δ^{−1} E|X|², with δ > 0 and E|X|² < ∞, we have

    Σ_{t=i_m+1}^n E*[ (h^m/n) Z_t*² I( (h^m/n) Z_t*² > η ) ] ≤ (1/(η n h^m)) (h^{3m}/n) Σ_{t=i_m+1}^n E*(Z_t*⁴).

As before, let us consider first the second part of the last equation:

    (h^{3m}/n) Σ_{t=i_m+1}^n E*(Z_t*⁴)
        = (h^{3m}/n) Σ_{t=i_m+1}^n V⁴(𝐗_t; φ) C_h⁴(𝐗_t − x)
        + (h^{3m}/n) Σ_{t=i_m+1}^n [ V^{4*}(𝐗_t; φ) − V⁴(𝐗_t; φ) ] C_h⁴(𝐗_t − x).

As in the proof of (33), it results from Lemma 1 that

    | (h^{3m}/n) Σ_{t=i_m+1}^n [ V^{4*}(𝐗_t; φ) − V⁴(𝐗_t; φ) ] C_h⁴(𝐗_t − x) |
        ≤ sup_{z∈I_h(x)} | V^{4*}(z; φ) − V⁴(z; φ) | O_p(1)  →p  0,
while, by the hypothesis made on the moments of φ(X_t), it follows that

    (h^{3m}/n) Σ_{t=i_m+1}^n V⁴(𝐗_t; φ) C_h⁴(𝐗_t − x) = O_p(1).

So we have

    Σ_{t=i_m+1}^n E*[ (h^m/n) Z_t*² I( (h^m/n) Z_t*² > η ) ] = O_p( 1/(η n h^m) )  →  0.
Lemma 3 Under Assumptions 1–6, for n → ∞ we have

    √(n h^{m+2(p+1)}) S_h^{−1} M_h g^{(p+1)*}(x; φ)  →p  √(n h^{m+2(p+1)}) S^{−1} M g^{(p+1)}(x; φ).    (34)

Proof The result follows from Theorem 1 of Masry (1996b) and from Lemma 1. The first implies, under hypotheses 2a, 3a, 4a and 4c, that

    S_h  →m.s.  f_{𝐗_t}(x) S,    M_h  →m.s.  f_{𝐗_t}(x) M,

while the second implies that

    g^{(p+1)*}(x; φ)  →p  g^{(p+1)}(x; φ).

Acknowledgments The authors are very grateful to the referees for their valuable comments, which resulted in an improvement over the original manuscript.
References

Doukhan P (1994) Mixing: properties and examples. Lecture notes in statistics, vol 85. Springer, Berlin Heidelberg New York
Fan J, Gijbels I (1995) Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. J R Stat Soc B 57:371–394
Franke J, Kreiss JP, Mammen E (2002) Bootstrap of kernel smoothing in nonlinear time series. Bernoulli 8:1–39
Lu ZQ (1999) Multivariate local polynomial fitting for martingale nonlinear regression models. Ann Inst Stat Math 51:691–706
Masry E (1996a) Multivariate local polynomial regression for time series: uniform strong consistency and rates. J Time Ser Anal 17:571–599
Masry E (1996b) Multivariate regression estimation: local polynomial fitting for time series. Stoch Process Appl 65:81–101
Masry E, Fan J (1997) Local polynomial estimation of regression functions for mixing processes. Scand J Stat 24:165–179
Masry E, Tjøstheim D (1995) Nonparametric estimation and identification of nonlinear ARCH time series. Econ Theory 11:258–289
Neumann MH, Kreiss JP (1998) Regression-type inference in nonparametric autoregression. Ann Stat 26:1570–1613
Paparoditis E, Politis DN (2000) The local bootstrap for kernel estimators under general dependence conditions. Ann Inst Stat Math 52:139–159
Ruppert D, Wand MP (1994) Multivariate locally weighted least squares regression. Ann Stat 22:1346–1370
Shi SG (1991) Local bootstrap. Ann Inst Stat Math 43:667–676
Tong H (1990) Non-linear time series analysis: a dynamical systems approach. Oxford University Press, Oxford