arXiv:math/0612677v5 [math.ST] 20 Mar 2008

The Annals of Statistics 2007, Vol. 35, No. 6, 2474–2503
DOI: 10.1214/009053607000000488
© Institute of Mathematical Statistics, 2007

SPLINE-BACKFITTED KERNEL SMOOTHING OF NONLINEAR ADDITIVE AUTOREGRESSION MODEL

By Li Wang and Lijian Yang

University of Georgia and Michigan State University

Application of nonparametric and semiparametric regression techniques to high-dimensional time series data has been hampered by the lack of effective tools to address the "curse of dimensionality." Under rather weak conditions, we propose spline-backfitted kernel estimators of the component functions for nonlinear additive time series data that are both computationally expedient, so they are usable for analyzing very high-dimensional time series, and theoretically reliable, so inference can be made on the component functions with confidence. Simulation experiments provide strong evidence corroborating the asymptotic theory.

1. Introduction. For the past three decades, various nonparametric and semiparametric regression techniques have been developed for the analysis of nonlinear time series; see, for example, [14, 21, 25], to name one article representative of each decade. Application to high-dimensional time series data, however, has been hampered by the scarcity of smoothing tools that are both computationally expedient and theoretically reliable, which has motivated the procedures proposed in this paper.

In high-dimensional time series smoothing, one unavoidable issue is the "curse of dimensionality," which refers to the poor convergence rate of nonparametric estimation of general multivariate functions. One solution is regression in the form of the additive model introduced by [9]:

(1.1)
$$
Y_i = m(X_{i1}, \ldots, X_{id}) + \sigma(X_{i1}, \ldots, X_{id})\varepsilon_i,
\qquad
m(x_1, \ldots, x_d) = c + \sum_{\alpha=1}^d m_\alpha(x_\alpha),
$$

Received December 2005; revised January 2007.
Supported in part by NSF Grant DMS-04-05330.
AMS 2000 subject classifications. Primary 62M10; secondary 62G08.
Key words and phrases. Bandwidths, B-spline, knots, local linear estimator, mixing, Nadaraya–Watson estimator, nonparametric regression.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2007, Vol. 35, No. 6, 2474–2503. This reprint differs from the original in pagination and typographic detail.


in which the sequence $\{Y_i, \mathbf X_i^T\}_{i=1}^n = \{Y_i, X_{i1}, \ldots, X_{id}\}_{i=1}^n$ is a length-$n$ realization of a $(d+1)$-dimensional time series, the $d$-variate functions $m$ and $\sigma$ are the mean and standard deviation of the response $Y_i$ conditional on the predictor vector $\mathbf X_i = \{X_{i1}, \ldots, X_{id}\}^T$, and each $\varepsilon_i$ is a white noise conditional on $\mathbf X_i$. In a nonlinear additive autoregression context, each predictor $X_{i\alpha}$, $1 \le \alpha \le d$, could be an observed lagged value of $Y_i$, such as $X_{i\alpha} = Y_{i-\alpha}$, or of an exogenous time series. Model (1.1) is therefore exactly the nonlinear additive autoregression model of [14] and [2] with exogenous variables. For identifiability, the additive component functions must satisfy $Em_\alpha(X_{i\alpha}) \equiv 0$, $\alpha = 1, \ldots, d$. We propose estimators of the unknown component functions $\{m_\alpha(\cdot)\}_{\alpha=1}^d$ based on a geometrically $\alpha$-mixing sample $\{Y_i, X_{i1}, \ldots, X_{id}\}_{i=1}^n$ following model (1.1).

If the data were i.i.d. observations instead of a time series realization, many methods would be available for estimating $\{m_\alpha(\cdot)\}_{\alpha=1}^d$. For instance, there are four types of kernel-based estimators: the classic backfitting estimators (CBE) of [9] and [19]; the marginal integration estimators (MIE) of [6, 16, 17, 22, 30] and a kernel-based rate-optimal estimator of [10]; the smooth backfitting estimators (SBE) of [18]; and the two-stage estimators, such as one-step backfitting of the integration estimators of [15], one-step backfitting of the projection estimators of [11], and one Newton step from the nonlinear LSE estimators of [12]. For spline estimators, see [13, 23, 24] and [28]. In the time series context, however, there are fewer theoretically justified methods, owing to the additional difficulty posed by dependence in the data. Among these are the kernel estimators via marginal integration of [25, 29] and the spline estimators of [14]. In addition, [27] has extended the marginal integration kernel estimator to additive coefficient models for weakly dependent data. All of these existing methods are unsatisfactory in regard to either computation or theory: the existing kernel methods are too computationally intensive for high dimension $d$, limiting their applicability to a small number of predictors, while spline methods provide only convergence rates but no asymptotic distributions, so no measure of confidence can be attached to the estimators.

If the last $d-1$ component functions were known by "oracle," one could create $\{Y_{i1}, X_{i1}\}_{i=1}^n$ with $Y_{i1} = Y_i - c - \sum_{\alpha=2}^d m_\alpha(X_{i\alpha}) = m_1(X_{i1}) + \sigma(X_{i1}, \ldots, X_{id})\varepsilon_i$, from which one could compute an "oracle smoother" of the only unknown function $m_1(x_1)$, thus effectively bypassing the "curse of dimensionality." The idea of [15] was to approximate the unobservable variables $Y_{i1}$ by substituting marginal integration kernel estimates for $m_\alpha(X_{i\alpha})$, $i = 1, \ldots, n$, $\alpha = 2, \ldots, d$, and to argue that the error incurred by this "cheating" is of smaller magnitude than the rate $O(n^{-2/5})$ for estimating the function $m_1(x_1)$ from the unobservable data. We modify the procedure of [15] by substituting for $m_\alpha(X_{i\alpha})$, $i = 1, \ldots, n$, $\alpha = 2, \ldots, d$, the corresponding


spline estimators. Specifically, we propose a two-stage estimation procedure: first, we pre-estimate $\{m_\alpha(x_\alpha)\}_{\alpha=2}^d$ by their pilot estimators from an undersmoothed, centered standard spline procedure; next, we construct the pseudo-responses $\hat Y_{i1}$ and approximate $m_1(x_1)$ by its Nadaraya–Watson estimator as in (2.12).

The proposed spline-backfitted kernel (SPBK) estimation method has several advantages over most existing methods. First, as pointed out in [22], the estimator of [15] mixes up different projections, making it uninterpretable if the true data-generating process deviates from additivity, whereas the projections in both steps of our estimator are with respect to the same measure. Second, since our pilot spline estimator is thousands of times faster than the pilot kernel estimator in [15], the proposed method is computationally expedient; see Table 2. Third, the SPBK estimator can be shown to be as efficient as the "oracle smoother" uniformly over any compact range, whereas [15] proved such "oracle efficiency" only at a single point. Moreover, the regularity conditions in our paper are natural, appealing and close to being minimal; in contrast, higher-order smoothness is needed with growing dimensionality of the regressors in [17], and stronger and more obscure conditions are assumed for the two-stage estimation proposed in [12].

The SPBK estimator achieves its seemingly surprising success by borrowing the strengths of both spline and kernel smoothing: the spline step gives a quick initial estimate of all additive components and removes all of them except the one of interest; kernel smoothing is then applied to the cleaned univariate data, yielding an estimator with an asymptotic distribution. Propositions 4.1 and 5.1 are the keys to understanding the proposed estimator's uniform oracle efficiency. They accomplish the well-known "reducing bias by undersmoothing" in the first, spline step and "averaging out the variance" in the second, kernel step, with both steps taking advantage of the joint asymptotics of kernel and spline functions, which is the new feature of our proofs. Reference [7] provides generalized likelihood ratio (GLR) tests for additive models using the backfitting estimator; a similar GLR test based on our SPBK estimator is a feasible topic for future research.

The rest of the paper is organized as follows. In Section 2 we introduce the SPBK estimator and state its asymptotic "oracle efficiency" under appropriate assumptions. In Section 3 we provide some insight into the ideas behind the proofs of the main results by decomposing the estimator's "cheating" error into a bias part and a variance part. In Section 4 we establish the uniform order of the bias term, and in Section 5 that of the variance term. In Section 6 we present Monte Carlo results demonstrating that the SPBK estimator does indeed possess the claimed asymptotic properties. All technical proofs are contained in the Appendix.


2. The SPBK estimator. In this section we describe the spline-backfitted kernel estimation procedure. For convenience, we write vectors as $x = (x_1, \ldots, x_d)$ and take $\|\cdot\|$ as the usual Euclidean norm on $\mathbb{R}^d$, that is, $\|x\| = \sqrt{\sum_{\alpha=1}^d x_\alpha^2}$, and $\|\cdot\|_\infty$ as the sup norm, that is, $\|x\|_\infty = \sup_{1\le\alpha\le d}|x_\alpha|$. In what follows, let $Y_i$ and $\mathbf X_i = (X_{i1}, \ldots, X_{id})^T$ be the $i$th response and predictor vector, denote by $\mathbf Y = (Y_1, \ldots, Y_n)^T$ the response vector and by $(\mathbf X_1, \ldots, \mathbf X_n)^T$ the design matrix, and let $\{Y_i, \mathbf X_i^T\}_{i=1}^n = \{Y_i, X_{i1}, \ldots, X_{id}\}_{i=1}^n$ be observations from a geometrically $\alpha$-mixing process following model (1.1). We assume that each predictor $X_\alpha$ is distributed on a compact interval $[a_\alpha, b_\alpha]$, $\alpha = 1, \ldots, d$, and without loss of generality we take all intervals $[a_\alpha, b_\alpha] = [0, 1]$. We preselect an integer $N = N_n \sim n^{2/5}\log n$; see assumption (A6) below.

Next, for any $\alpha = 1, \ldots, d$, we define the first-order (constant) B-spline function ([3], page 89) as the indicator function $I_{J,\alpha}(x_\alpha)$ of the $N+1$ equally spaced subintervals of $[0,1]$ with length $H = H_n = (N+1)^{-1}$, that is,

(2.1)
$$
I_{J,\alpha}(x_\alpha) = \begin{cases} 1, & JH \le x_\alpha < (J+1)H, \\ 0, & \text{otherwise}, \end{cases}
\qquad J = 0, 1, \ldots, N.
$$
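To make the constant B-spline construction concrete, here is a minimal Python sketch of the indicator basis in (2.1). It is our own illustration (the authors' simulations use R and XploRe); the function name and array layout are not from the paper.

```python
import numpy as np

def constant_bspline_basis(x, N):
    """Indicator (first-order B-spline) basis of (2.1): column J holds I_J(x_i),
    J = 0, ..., N, on the N + 1 equal subintervals of [0, 1] of length H = 1/(N + 1)."""
    x = np.asarray(x, dtype=float)
    H = 1.0 / (N + 1)
    J = np.minimum((x / H).astype(int), N)   # x = 1 is assigned to the last subinterval
    basis = np.zeros((x.size, N + 1))
    basis[np.arange(x.size), J] = 1.0
    return basis
```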

Define the centered spline basis

(2.2)
$$
b_{J,\alpha}(x_\alpha) = I_{J+1,\alpha}(x_\alpha) - \frac{\|I_{J+1,\alpha}\|_2^2}{\|I_{J,\alpha}\|_2^2}\, I_{J,\alpha}(x_\alpha),
\qquad \forall \alpha = 1, \ldots, d,\ J = 1, \ldots, N,
$$

with its standardized version given, for any $\alpha = 1, \ldots, d$, by

(2.3)
$$
B_{J,\alpha}(x_\alpha) = \frac{b_{J,\alpha}(x_\alpha)}{\|b_{J,\alpha}\|_2},
\qquad \forall J = 1, \ldots, N.
$$

Define next the $(1+dN)$-dimensional space $G = G[0,1]$ of additive spline functions as the linear space spanned by $\{1, B_{J,\alpha}(x_\alpha), \alpha = 1, \ldots, d, J = 1, \ldots, N\}$, and denote by $G_n \subset \mathbb{R}^n$ the linear space spanned by $\{1, \{B_{J,\alpha}(X_{i\alpha})\}_{i=1}^n, \alpha = 1, \ldots, d, J = 1, \ldots, N\}$. As $n \to \infty$, the dimension of $G_n$ becomes $1+dN$ with probability approaching 1. The spline estimator of the additive function $m(x)$ is the unique element $\hat m(x) = \hat m_n(x)$ of the space $G$ such that the vector $\{\hat m(\mathbf X_1), \ldots, \hat m(\mathbf X_n)\}^T$ best approximates the response vector $\mathbf Y$. To be precise, we define

(2.4)
$$
\hat m(x) = \hat\lambda'_0 + \sum_{\alpha=1}^d \sum_{J=1}^N \hat\lambda'_{J,\alpha} I_{J,\alpha}(x_\alpha),
$$

where the coefficients $(\hat\lambda'_0, \hat\lambda'_{1,1}, \ldots, \hat\lambda'_{N,d})$ solve the least squares problem
$$
\{\hat\lambda'_0, \hat\lambda'_{1,1}, \ldots, \hat\lambda'_{N,d}\}^T
= \arg\min_{\mathbb{R}^{dN+1}} \sum_{i=1}^n \Biggl\{ Y_i - \lambda_0 - \sum_{\alpha=1}^d \sum_{J=1}^N \lambda_{J,\alpha} I_{J,\alpha}(X_{i\alpha}) \Biggr\}^2.
$$

Simple linear algebra shows that

(2.5)
$$
\hat m(x) = \hat\lambda_0 + \sum_{\alpha=1}^d \sum_{J=1}^N \hat\lambda_{J,\alpha} B_{J,\alpha}(x_\alpha),
$$

where $(\hat\lambda_0, \hat\lambda_{1,1}, \ldots, \hat\lambda_{N,d})$ solve the least squares problem

(2.6)
$$
\{\hat\lambda_0, \hat\lambda_{1,1}, \ldots, \hat\lambda_{N,d}\}^T
= \arg\min_{\mathbb{R}^{dN+1}} \sum_{i=1}^n \Biggl\{ Y_i - \lambda_0 - \sum_{\alpha=1}^d \sum_{J=1}^N \lambda_{J,\alpha} B_{J,\alpha}(X_{i\alpha}) \Biggr\}^2;
$$

while (2.4) is used for the data-analytic implementation, the mathematically equivalent expression (2.5) is convenient for the asymptotic analysis. The pilot estimators of each component function and of the constant are

(2.7)
$$
\hat m_\alpha(x_\alpha) = \sum_{J=1}^N \hat\lambda_{J,\alpha} B_{J,\alpha}(x_\alpha) - n^{-1}\sum_{i=1}^n \sum_{J=1}^N \hat\lambda_{J,\alpha} B_{J,\alpha}(X_{i\alpha}),
\qquad
\hat m_c = \hat\lambda_0 + n^{-1}\sum_{\alpha=1}^d \sum_{i=1}^n \sum_{J=1}^N \hat\lambda_{J,\alpha} B_{J,\alpha}(X_{i\alpha}).
$$
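The pilot step (2.4)–(2.7) is an ordinary least squares fit in the indicator basis. The sketch below, again our own Python rendering rather than the authors' code, works directly with the uncentered basis of (2.4) (the fitted additive surface is the same as under (2.5)) and applies the empirical centering of (2.7) to each raw component.

```python
import numpy as np

def spline_pilot_fit(X, Y, N):
    """Constant-spline pilot fit of the additive model (first stage).
    X: (n, d) predictors rescaled to [0, 1]; Y: (n,) responses; N: interior knot number.
    Returns an estimate of the constant c (the mean fitted value) and a function
    m_hat(alpha, x) giving the empirically centered component estimate as in (2.7)."""
    n, d = X.shape
    H = 1.0 / (N + 1)

    def indicators(x):
        # I_J(x) for J = 1, ..., N; J = 0 is dropped to avoid collinearity with the intercept
        J = np.minimum((np.asarray(x) / H).astype(int), N)
        ind = np.zeros((len(J), N))
        rows = np.where(J >= 1)[0]
        ind[rows, J[rows] - 1] = 1.0
        return ind

    design = np.hstack([np.ones((n, 1))] + [indicators(X[:, a]) for a in range(d)])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)      # least squares problem (2.4)

    def m_hat(alpha, x):
        lam = coef[1 + alpha * N: 1 + (alpha + 1) * N]
        raw = indicators(np.asarray(x)) @ lam
        centre = (indicators(X[:, alpha]) @ lam).mean()    # empirical centering as in (2.7)
        return raw - centre

    c_hat = (design @ coef).mean()                         # equals the sample mean of Y
    return c_hat, m_hat
```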

These pilot estimators are then used to define new pseudo-responses $\hat Y_{i1}$, which are estimates of the unobservable "oracle" responses $Y_{i1}$. Specifically,

(2.8)
$$
\hat Y_{i1} = Y_i - \hat c - \sum_{\alpha=2}^d \hat m_\alpha(X_{i\alpha}),
\qquad
Y_{i1} = Y_i - c - \sum_{\alpha=2}^d m_\alpha(X_{i\alpha}),
$$

where $\hat c = \bar Y_n = n^{-1}\sum_{i=1}^n Y_i$, which is a $\sqrt n$-consistent estimator of $c$ by the central limit theorem. Next, we define the spline-backfitted kernel estimator of $m_1(x_1)$ as $\hat m^*_1(x_1)$ based on $\{\hat Y_{i1}, X_{i1}\}_{i=1}^n$, which mimics the would-be Nadaraya–Watson estimator $\tilde m^*_1(x_1)$ of $m_1(x_1)$ based on $\{Y_{i1}, X_{i1}\}_{i=1}^n$ if the unobservable "oracle" responses $\{Y_{i1}\}_{i=1}^n$ were available:

(2.9)
$$
\hat m^*_1(x_1) = \frac{\sum_{i=1}^n K_h(X_{i1}-x_1)\hat Y_{i1}}{\sum_{i=1}^n K_h(X_{i1}-x_1)},
\qquad
\tilde m^*_1(x_1) = \frac{\sum_{i=1}^n K_h(X_{i1}-x_1) Y_{i1}}{\sum_{i=1}^n K_h(X_{i1}-x_1)},
$$

where $\hat Y_{i1}$ and $Y_{i1}$ are defined in (2.8).
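A minimal sketch of the second, kernel step in (2.8)–(2.9), assuming the `spline_pilot_fit` sketch above. The quartic kernel and the way the bandwidth `h` is passed in are our choices; the paper only requires a kernel satisfying (A5) and a bandwidth of order $n^{-1/5}$.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic (biweight) kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def spbk_estimate(X, Y, alpha, x_grid, N, h):
    """Two-stage SPBK estimate of the alpha-th component on x_grid, following (2.8)-(2.9):
    subtract the spline pilot estimates of the other components, then apply
    Nadaraya-Watson smoothing to the pseudo-responses."""
    n, d = X.shape
    c_hat, m_hat = spline_pilot_fit(X, Y, N)        # first (spline) stage, previous sketch
    Y_pseudo = Y - Y.mean()                          # c_hat equals the sample mean of Y
    for beta in range(d):
        if beta != alpha:
            Y_pseudo = Y_pseudo - m_hat(beta, X[:, beta])
    # second (kernel) stage; the common factor 1/h in K_h cancels between numerator and denominator
    U = (X[:, alpha][None, :] - np.asarray(x_grid)[:, None]) / h
    W = quartic_kernel(U)
    return (W @ Y_pseudo) / W.sum(axis=1)
```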


Throughout this paper, on any fixed interval $[a,b]$ we denote the space of second-order smooth functions by $C^{(2)}[a,b] = \{m \mid m'' \in C[a,b]\}$ and, for any fixed constant $C>0$, the class of Lipschitz continuous functions by $\mathrm{Lip}([a,b],C) = \{m \mid |m(x)-m(x')| \le C|x-x'|, \forall x, x' \in [a,b]\}$. Before presenting the main results, we state the following assumptions.

(A1) The additive component function $m_1(x_1) \in C^{(2)}[0,1]$, while there is a constant $0 < C_\infty < \infty$ such that $m_\beta \in \mathrm{Lip}([0,1], C_\infty)$, $\forall \beta = 2, \ldots, d$.

(A2) There exist positive constants $K_0$ and $\lambda_0$ such that $\alpha(n) \le K_0 e^{-\lambda_0 n}$ holds for all $n$, with the $\alpha$-mixing coefficients for $\{Z_i = (\mathbf X_i^T, \varepsilon_i)\}_{i=1}^n$ defined as

(2.10)
$$
\alpha(k) = \sup_{B \in \sigma\{Z_s, s \le t\},\, C \in \sigma\{Z_s, s \ge t+k\}} |P(B \cap C) - P(B)P(C)|,
\qquad k \ge 1.
$$

(A3) The noise $\varepsilon_i$ satisfies $E(\varepsilon_i \mid \mathbf X_i) = 0$, $E(\varepsilon_i^2 \mid \mathbf X_i) = 1$ and $E(|\varepsilon_i|^{2+\delta} \mid \mathbf X_i) < M_\delta$ for some $\delta > 1/2$ and a finite positive $M_\delta$, and $\sigma(x)$ is continuous on $[0,1]^d$ with
$$
0 < c_\sigma \le \inf_{x \in [0,1]^d} \sigma(x) \le \sup_{x \in [0,1]^d} \sigma(x) \le C_\sigma < \infty.
$$

(A4) The density function $f(x)$ of $\mathbf X$ is continuous and
$$
0 < c_f \le \inf_{x \in [0,1]^d} f(x) \le \sup_{x \in [0,1]^d} f(x) \le C_f < \infty.
$$
The marginal densities $f_\alpha(x_\alpha)$ of $X_\alpha$ have continuous derivatives on $[0,1]$ as well as the uniform upper bound $C_f$ and lower bound $c_f$.

(A5) The kernel function $K \in \mathrm{Lip}([-1,1], C_K)$ for some constant $C_K > 0$, and is bounded, nonnegative, symmetric and supported on $[-1,1]$. The bandwidth satisfies $h \sim n^{-1/5}$, that is, $c_h n^{-1/5} \le h \le C_h n^{-1/5}$ for some positive constants $c_h, C_h$.

(A6) The number of interior knots satisfies $N \sim n^{2/5}\log n$, that is, $c_N n^{2/5}\log n \le N \le C_N n^{2/5}\log n$ for some positive constants $c_N, C_N$.

Remark 2.1. The smoothness assumption on the true component functions is greatly relaxed in our paper and we believe that our assumption (A1) is close to being minimal. By the result of [20], a geometrically ergodic time series is a strongly mixing sequence; therefore assumption (A2) is suitable for (1.1) as a time series model under the aforementioned assumptions. Assumptions (A3)–(A5) are typical in the nonparametric smoothing literature; see, for instance, [5]. For (A6), the proof of Theorem 2.1 in the Appendix will make it clear that the number of knots can be of the more general form $N \sim n^{2/5} N'$, where the sequence $N'$ satisfies $N' \to \infty$ and $n^{-\theta} N' \to 0$ for any $\theta > 0$. As in the literature, there is no optimal way to choose $N'$; here we select $N$ to be of barely larger order than $n^{2/5}$.


The asymptotic properties of the oracle kernel smoother $\tilde m^*_1(x_1)$ are well developed. Under assumptions (A1)–(A5), it is straightforward to verify (as in [1]) that
$$
\sup_{x_1 \in [h, 1-h]} |\tilde m^*_1(x_1) - m_1(x_1)| = o_p(n^{-2/5}\log n),
\qquad
\sqrt{nh}\,\{\tilde m^*_1(x_1) - m_1(x_1) - b_1(x_1)h^2\} \xrightarrow{D} N\{0, v_1^2(x_1)\},
$$
where

(2.11)
$$
b_1(x_1) = \int u^2 K(u)\,du\,\{m_1''(x_1) f_1(x_1)/2 + m_1'(x_1) f_1'(x_1)\} f_1^{-1}(x_1),
\qquad
v_1^2(x_1) = \int K^2(u)\,du\; E[\sigma^2(X_1, \ldots, X_d) \mid X_1 = x_1]\, f_1^{-1}(x_1).
$$

The following theorem states that the asymptotic uniform magnitude of the difference between $\hat m^*_1(x_1)$ and $\tilde m^*_1(x_1)$ is of order $o_p(n^{-2/5})$, which is dominated by the asymptotic uniform size of $\tilde m^*_1(x_1) - m_1(x_1)$. As a result, $\hat m^*_1(x_1)$ has the same asymptotic distribution as $\tilde m^*_1(x_1)$.

Theorem 2.1. Under assumptions (A1)–(A6), the SPBK estimator $\hat m^*_1(x_1)$ given in (2.9) satisfies
$$
\sup_{x_1 \in [0,1]} |\hat m^*_1(x_1) - \tilde m^*_1(x_1)| = o_p(n^{-2/5}).
$$
Hence, with $b_1(x_1)$ and $v_1^2(x_1)$ as defined in (2.11), for any $x_1 \in [h, 1-h]$,
$$
\sqrt{nh}\,\{\hat m^*_1(x_1) - m_1(x_1) - b_1(x_1)h^2\} \xrightarrow{D} N\{0, v_1^2(x_1)\}.
$$

Remark 2.2. Theorem 2.1 holds for $\hat m^*_\alpha(x_\alpha)$ constructed similarly to $\hat m^*_1(x_1)$ for any $\alpha = 2, \ldots, d$, that is,

(2.12)
$$
\hat m^*_\alpha(x_\alpha) = \frac{\sum_{i=1}^n K_h(X_{i\alpha}-x_\alpha)\hat Y_{i\alpha}}{\sum_{i=1}^n K_h(X_{i\alpha}-x_\alpha)},
\qquad
\hat Y_{i\alpha} = Y_i - \hat c - \sum_{1 \le \beta \le d,\, \beta \ne \alpha} \hat m_\beta(X_{i\beta}),
$$
where $\hat m_\beta(X_{i\beta})$, $\beta = 1, \ldots, d$, are the pilot estimators of the component functions given in (2.7). Similar constructions can be based on a local polynomial instead of the Nadaraya–Watson estimator; for more on the properties of local polynomial estimators, in particular their minimax efficiency, see [5].

Remark 2.3. Compared to the SBE in [18], the variance term $v_1^2(x_1)$ is identical to that of the SBE and the bias term $b_1(x_1)$ is much more explicit


than that of the SBE, at least when the Nadaraya–Watson smoother is used. Theorem 2.1 can be used to construct asymptotic confidence intervals: under assumptions (A1)–(A6), for any $\alpha \in (0,1)$, an asymptotic $100(1-\alpha)\%$ pointwise confidence interval for $m_1(x_1)$ is

(2.13)
$$
\hat m^*_1(x_1) - b_1(x_1)h^2 \pm z_{\alpha/2}\,\hat\sigma_1(x_1)\,
\frac{\{\int K^2(u)\,du\}^{1/2}}{\{nh \hat f_1(x_1)\}^{1/2}},
$$

where $\hat\sigma_1(x_1)$ and $\hat f_1(x_1)$ are estimators of $E[\sigma^2(X_1, \ldots, X_d) \mid X_1 = x_1]$ and $f_1(x_1)$, respectively.

The following corollary provides the asymptotic distribution of $\hat m^*(x)$. The proof of this corollary is straightforward and therefore omitted.

Corollary 2.1. Suppose assumptions (A1)–(A6) hold together with the additional assumption that $m_\alpha(x_\alpha) \in C^{(2)}[0,1]$, $\alpha = 2, \ldots, d$. For any $x \in [0,1]^d$, let the SPBK estimators $\hat m^*_\alpha(x_\alpha)$, $\alpha = 1, \ldots, d$, be defined as in (2.12), and let
$$
\hat m^*(x) = \hat c + \sum_{\alpha=1}^d \hat m^*_\alpha(x_\alpha),
\qquad
b(x) = \sum_{\alpha=1}^d b_\alpha(x_\alpha),
\qquad
v^2(x) = \sum_{\alpha=1}^d v_\alpha^2(x_\alpha).
$$
Then
$$
\sqrt{nh}\,\{\hat m^*(x) - m(x) - b(x)h^2\} \xrightarrow{D} N\{0, v^2(x)\}.
$$
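A rough sketch of how the pointwise interval (2.13) might be evaluated on a grid, reusing `spbk_estimate` and `quartic_kernel` from the previous sketch. The paper leaves the estimators $\hat\sigma_1$ and $\hat f_1$ unspecified, so the plug-ins below (Nadaraya–Watson smoothing of squared residuals and a kernel density estimate) are our own choices, and the bias term $b_1(x_1)h^2$ is not estimated here.

```python
import numpy as np
from scipy.stats import norm

def pointwise_ci(X, Y, alpha, x_grid, N, h, level=0.95):
    """Rough pointwise confidence limits for one component, in the spirit of (2.13)."""
    n, d = X.shape
    m_star = spbk_estimate(X, Y, alpha, x_grid, N, h)
    # residuals from the full fitted additive model, used to estimate sigma^2(.)
    fitted = np.full(n, Y.mean())
    for beta in range(d):
        fitted = fitted + spbk_estimate(X, Y, beta, X[:, beta], N, h)
    resid2 = (Y - fitted) ** 2
    U = (X[:, alpha][None, :] - np.asarray(x_grid)[:, None]) / h
    W = quartic_kernel(U)
    sigma2_hat = (W @ resid2) / W.sum(axis=1)       # NW estimate of E[sigma^2 | X_alpha = x]
    f_hat = W.sum(axis=1) / (n * h)                  # kernel density estimate of f_alpha
    kernel_l2 = 5.0 / 7.0                            # int K(u)^2 du for the quartic kernel
    z = norm.ppf(1.0 - (1.0 - level) / 2.0)
    half = z * np.sqrt(sigma2_hat * kernel_l2 / (n * h * f_hat))
    return m_star - half, m_star + half
```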

3. Decomposition. In this section we introduce some additional notation to shed light on the ideas behind the proof of Theorem 2.1. For any functions $\phi, \varphi$ on $[0,1]^d$, define the empirical inner product and the empirical norm as
$$
\langle \phi, \varphi \rangle_{2,n} = n^{-1}\sum_{i=1}^n \phi(\mathbf X_i)\varphi(\mathbf X_i),
\qquad
\|\phi\|_{2,n}^2 = n^{-1}\sum_{i=1}^n \phi^2(\mathbf X_i).
$$
In addition, if the functions $\phi, \varphi$ are $L^2$-integrable, define the theoretical inner product and its corresponding theoretical $L^2$ norm as
$$
\langle \phi, \varphi \rangle_{2} = E\{\phi(\mathbf X_i)\varphi(\mathbf X_i)\},
\qquad
\|\phi\|_{2}^2 = E\{\phi^2(\mathbf X_i)\}.
$$

The evaluation of the spline estimator $\hat m(x)$ at the $n$ observations results in an $n$-dimensional vector $\hat m(\mathbf X_1, \ldots, \mathbf X_n) = \{\hat m(\mathbf X_1), \ldots, \hat m(\mathbf X_n)\}^T$, which can be considered as the projection of $\mathbf Y$ onto the space $G_n$ with respect to the empirical inner product $\langle\cdot,\cdot\rangle_{2,n}$. In general, for any $n$-dimensional vector $\Lambda = \{\Lambda_1, \ldots, \Lambda_n\}^T$, we define $P_n\Lambda(x)$ as the spline function constructed from the projection of $\Lambda$ onto the inner product space $(G_n, \langle\cdot,\cdot\rangle_{2,n})$, that is,
$$
P_n\Lambda(x) = \hat\lambda_0 + \sum_{\alpha=1}^d \sum_{J=1}^N \hat\lambda_{J,\alpha} B_{J,\alpha}(x_\alpha),
$$


ˆ0, λ ˆ 1,1 , . . . , λ ˆ N,d ) given in (2.6). Next, the multivariwith the coefficients (λ ate function Pn Λ(x) is decomposed into the empirically centered additive components Pn,α Λ(xα ), α = 1, . . . , d, and the constant component Pn,c Λ: (3.1)

Pn,α Λ(xα ) = P∗n,α Λ(xα ) − n−1 ˆ 0 + n−1 Pn,c Λ = λ

(3.2)

d X n X

n X

P∗n,α Λ(Xiα ),

i=1

P∗n,α Λ(Xiα ),

α=1 i=1

ˆ J,α BJ,α (xα ). With this new notation, we can where = J=1 λ rewrite the spline estimators m(x), ˆ m ˆ α (xα ), m ˆ c defined in (2.5) and (2.7) as PN

P ∗n,α Λ(xα )

m(x) ˆ = Pn Y(x),

m ˆ α (xα ) = Pn,α Y(xα ),

m ˆ c = Pn,c Y.

Based on the relation Yi = m(Xi ) + σ(Xi )εi , one defines similarly the noiseless spline smoothers and the variance spline components, (3.3) (3.4)

m(x) ˜ = Pn {m(X)}(x), m ˜ c = Pn,c {m(X)},

ε˜(x) = Pn E(x),

m ˜ α (xα ) = Pn,α {m(X)}(xα ),

ε˜α (xα ) = Pn,α E(xα ),

ε˜c = Pn,c E,

E = {σ(Xi )εi }ni=1 .

Due to the linearity of the operawhere the noise vector tors Pn , Pn,c , Pn,α , α = 1, . . . , d, one has the crucial decomposition m(x) ˆ = m(x) ˜ + ε˜(x),

(3.5)

m ˆc =m ˜ c + ε˜c ,

m ˆ α (xα ) = m ˜ α (xα ) + ε˜α (xα ),

for α = 1, . . . , d. As closer examination is needed later for ε˜(x) and ε˜α (xα ), we define in addition ˜ a = {˜ a0 , a ˜1,1 , . . . , a ˜N,d }T as the minimizer of (3.6)

n X i=1

(

σ(Xi )εi − a0 −

d X N X

)2

aJ,α BJ,α (Xiα )

α=1 J=1

.

Then ε˜(x) = ˜ aT B(x), where the vector B(x) and matrix B are defined as (3.7)

B(x) = {1, B1,1 (x1 ), . . . , BN,d (xd )}T ,

B = {B(X1 ), . . . , B(Xn )}T .

a is equal to Thus ˜ a = (BT B)−1 BT E is the solution of (3.6) and specifically ˜ 

1 0dN

(3.8) ×

0TdN hBJ,α , BJ ′ ,α′ i2,n

−1

1 ≤ α, α′ ≤ d, 1 ≤ J, J ′ ≤ N

n 1X σ(Xi )εi n i=1

     

     

n   1X    BJ,α (Xiα )σ(Xi )εi    n  i=1

, 1 ≤ J ≤ N, 1≤α≤d


where $\mathbf 0_p$ is a $p$-vector with all elements 0.

Our main objective is to study the difference between the smoothed backfitted estimator $\hat m^*_1(x_1)$ and the smoothed "oracle" estimator $\tilde m^*_1(x_1)$, both given in (2.9). From now on, we assume without loss of generality that $d = 2$ for notational brevity. Making use of the definition of $\hat c$ and the signal and noise decomposition (3.5), the difference $\tilde m^*_1(x_1) - \hat m^*_1(x_1) - \hat c + c$ can be written as the sum of two terms,

(3.9)
$$
\frac{n^{-1}\sum_{i=1}^n K_h(X_{i1}-x_1)\{\hat m_2(X_{i2}) - m_2(X_{i2})\}}{n^{-1}\sum_{i=1}^n K_h(X_{i1}-x_1)}
= \frac{\Psi_b(x_1) + \Psi_v(x_1)}{n^{-1}\sum_{i=1}^n K_h(X_{i1}-x_1)},
$$
where

(3.10)
$$
\Psi_b(x_1) = \frac1n\sum_{i=1}^n K_h(X_{i1}-x_1)\{\tilde m_2(X_{i2}) - m_2(X_{i2})\},
$$

(3.11)
$$
\Psi_v(x_1) = \frac1n\sum_{i=1}^n K_h(X_{i1}-x_1)\tilde\varepsilon_2(X_{i2}).
$$

The term $\Psi_b(x_1)$ is induced by the bias term $\tilde m_2(X_{i2}) - m_2(X_{i2})$, while $\Psi_v(x_1)$ is related to the variance term $\tilde\varepsilon_2(X_{i2})$. Both terms have order $o_p(n^{-2/5})$ by Propositions 4.1 and 5.1 in the next two sections. Standard kernel density estimation theory ensures that the denominator in (3.9), $n^{-1}\sum_{i=1}^n K_h(X_{i1}-x_1)$, has a positive lower bound for $x_1 \in [0,1]$. The additional nuisance term $\hat c - c$ is clearly of order $O_p(n^{-1/2})$ and thus $o_p(n^{-2/5})$, so it needs no further argument. Theorem 2.1 then follows from Propositions 4.1 and 5.1.

4. Bias reduction for $\Psi_b(x_1)$. In this section we show that the bias term $\Psi_b(x_1)$ of (3.10) is uniformly of order $o_p(n^{-2/5})$ for $x_1 \in [0,1]$.

Proposition 4.1. Under assumptions (A1), (A2) and (A4)–(A6),
$$
\sup_{x_1\in[0,1]} |\Psi_b(x_1)| = O_p(n^{-1/2} + H) = o_p(n^{-2/5}).
$$

Lemma 4.1. Under assumption (A1), there exist functions $g_1, g_2 \in G$ such that
$$
\Biggl\| \tilde m - g + \sum_{\alpha=1}^2 \langle 1, g_\alpha(X_\alpha)\rangle_{2,n} \Biggr\|_{2,n} = O_p(n^{-1/2} + H),
$$
where $g(x) = c + \sum_{\alpha=1}^2 g_\alpha(x_\alpha)$ and $\tilde m$ is defined in (3.3).


Proof. According to the result on page 149 of [3], there is a constant $C_\infty > 0$ such that for each $\alpha = 1, 2$ there is a function $g_\alpha \in G$ with $\|g_\alpha - m_\alpha\|_\infty \le C_\infty H$. Thus $\|g - m\|_\infty \le \sum_{\alpha=1}^2 \|g_\alpha - m_\alpha\|_\infty \le 2C_\infty H$ and $\|\tilde m - m\|_{2,n} \le \|g - m\|_{2,n} \le 2C_\infty H$. Noting that $\|\tilde m - g\|_{2,n} \le \|\tilde m - m\|_{2,n} + \|g - m\|_{2,n} \le 4C_\infty H$, one has

(4.1)
$$
|\langle g_\alpha(X_\alpha), 1\rangle_{2,n}|
\le |\langle 1, g_\alpha(X_\alpha)\rangle_{2,n} - \langle 1, m_\alpha(X_\alpha)\rangle_{2,n}| + |\langle 1, m_\alpha(X_\alpha)\rangle_{2,n}|
\le C_\infty H + O_p(n^{-1/2}).
$$
Therefore
$$
\Biggl\| \tilde m - g + \sum_{\alpha=1}^2 \langle 1, g_\alpha(X_\alpha)\rangle_{2,n} \Biggr\|_{2,n}
\le \|\tilde m - g\|_{2,n} + \sum_{\alpha=1}^2 |\langle 1, g_\alpha(X_\alpha)\rangle_{2,n}|
\le 6C_\infty H + O_p(n^{-1/2}) = O_p(n^{-1/2} + H). \qquad\square
$$

Proof of Proposition 4.1. Denote
$$
R_1 = \sup_{x_1\in[0,1]} \frac{\bigl|\sum_{i=1}^n K_h(X_{i1}-x_1)\{g_2(X_{i2}) - m_2(X_{i2})\}\bigr|}{\sum_{i=1}^n K_h(X_{i1}-x_1)},
\qquad
R_2 = \sup_{x_1\in[0,1]} \frac{\bigl|\sum_{i=1}^n K_h(X_{i1}-x_1)\{\tilde m_2(X_{i2}) - g_2(X_{i2}) + \langle 1, g_2(X_2)\rangle_{2,n}\}\bigr|}{\sum_{i=1}^n K_h(X_{i1}-x_1)};
$$
then $\sup_{x_1\in[0,1]} |\Psi_b(x_1)| \le |\langle 1, g_2(X_2)\rangle_{2,n}| + R_1 + R_2$. For $R_1$, using the result on page 149 of [3], one has $R_1 \le C_\infty H$. To deal with $R_2$, let $B^*_{J,\alpha}(x_\alpha) = B_{J,\alpha}(x_\alpha) - \langle 1, B_{J,\alpha}(X_\alpha)\rangle_{2,n}$ for $J = 1, \ldots, N$, $\alpha = 1, 2$; then one can write
$$
\tilde m(x) - g(x) + \sum_{\alpha=1}^2 \langle 1, g_\alpha(X_\alpha)\rangle_{2,n}
= \tilde a^* + \sum_{\alpha=1}^2\sum_{J=1}^N \tilde a^*_{J,\alpha} B^*_{J,\alpha}(x_\alpha).
$$
Thus $n^{-1}\sum_{i=1}^n K_h(X_{i1}-x_1)\{\tilde m_2(X_{i2}) - g_2(X_{i2}) + \langle 1, g_2(X_2)\rangle_{2,n}\}$ can be rewritten as $n^{-1}\sum_{i=1}^n K_h(X_{i1}-x_1)\sum_{J=1}^N \tilde a^*_{J,2} B^*_{J,2}(X_{i2})$, bounded by
$$
\sum_{J=1}^N |\tilde a^*_{J,2}|\, \sup_{1\le J\le N}\Biggl| n^{-1}\sum_{i=1}^n K_h(X_{i1}-x_1) B^*_{J,2}(X_{i2})\Biggr|
\le \sum_{J=1}^N |\tilde a^*_{J,2}|\Biggl\{ \sup_{1\le J\le N}\Biggl| n^{-1}\sum_{i=1}^n \omega_J(\mathbf X_i, x_1)\Biggr| + A_{n,1}\, n^{-1}\sum_{i=1}^n K_h(X_{i1}-x_1) \Biggr\},
$$
where $A_{n,1} = \sup_{J,\alpha}|\langle 1, B_{J,\alpha}\rangle_{2,n} - \langle 1, B_{J,\alpha}\rangle_{2}| = O_p(n^{-1/2}\log n)$ as in (A.12) and $\omega_J(\mathbf X_i, x_1)$ is defined in (5.5) with mean $\mu_{\omega_J}(x_1)$. By Lemmas A.3 and A.4,


$$
\sup_{x_1\in[0,1]}\sup_{1\le J\le N}\Biggl|\frac1n\sum_{i=1}^n \omega_J(\mathbf X_i, x_1)\Biggr|
\le \sup_{x_1\in[0,1]}\sup_{1\le J\le N}\Biggl|\frac1n\sum_{i=1}^n \omega_J(\mathbf X_i, x_1) - \mu_{\omega_J}(x_1)\Biggr|
+ \sup_{x_1\in[0,1]}\sup_{1\le J\le N}|\mu_{\omega_J}(x_1)|
= O_p(\log n/\sqrt{nh}) + O_p(H^{1/2}) = O_p(H^{1/2}).
$$
Therefore, one has
$$
\sup_{x_1\in[0,1]}\Biggl| n^{-1}\sum_{i=1}^n K_h(X_{i1}-x_1)\{\tilde m_2(X_{i2}) - g_2(X_{i2}) + \langle 1, g_2(X_2)\rangle_{2,n}\}\Biggr|
\le N^{1/2}\Biggl\{\sum_{J=1}^N (\tilde a^*_{J,2})^2\Biggr\}^{1/2}\biggl\{O_p(H^{1/2}) + O_p\Bigl(\frac{\log n}{\sqrt n}\Bigr)\biggr\}
$$
$$
= O_p\Biggl(\Biggl\{\sum_{J=1}^N (\tilde a^*_{J,2})^2\Biggr\}^{1/2}\Biggr)
= O_p\Biggl(\Biggl\| \tilde m - g + \sum_{\alpha=1}^2 \langle 1, g_\alpha(X_\alpha)\rangle_{2,n}\Biggr\|_2\Biggr)
= O_p\Biggl(\Biggl\| \tilde m - g + \sum_{\alpha=1}^2 \langle 1, g_\alpha(X_\alpha)\rangle_{2,n}\Biggr\|_{2,n}\Biggr),
$$
where the last step follows from Lemma A.8. Thus, by Lemma 4.1,

(4.2)
$$
R_2 = O_p(n^{-1/2} + H).
$$

Combining (4.1) and (4.2), one establishes Proposition 4.1. $\square$

5. Variance reduction for $\Psi_v(x_1)$. In this section we show that the term $\Psi_v(x_1)$ given in (3.11) is uniformly of order $o_p(n^{-2/5})$. This is the most challenging part of the proof, and most of the work is done in the Appendix. Define the auxiliary quantity

(5.1)
$$
\tilde\varepsilon^*_2(x_2) = \sum_{J=1}^N \tilde a_{J,2} B_{J,2}(x_2),
$$
where $\tilde a_{J,2}$ is given in (3.8). Definitions (3.1) and (3.2) imply that $\tilde\varepsilon_2(x_2)$ defined in (3.4) is simply the empirical centering of $\tilde\varepsilon^*_2(x_2)$, that is,

(5.2)
$$
\tilde\varepsilon_2(x_2) \equiv \tilde\varepsilon^*_2(x_2) - n^{-1}\sum_{i=1}^n \tilde\varepsilon^*_2(X_{i2}).
$$

Proposition 5.1. Under assumptions (A2)–(A6), one has
$$
\sup_{x_1\in[0,1]} |\Psi_v(x_1)| = O_p(H) = o_p(n^{-2/5}).
$$

According to (5.2), we can write $\Psi_v(x_1) = \Psi_v^{(2)}(x_1) - \Psi_v^{(1)}(x_1)$, where

(5.3)
$$
\Psi_v^{(1)}(x_1) = n^{-1}\sum_{l=1}^n K_h(X_{l1}-x_1)\cdot n^{-1}\sum_{i=1}^n \tilde\varepsilon^*_2(X_{i2}),
$$

(5.4)
$$
\Psi_v^{(2)}(x_1) = n^{-1}\sum_{l=1}^n K_h(X_{l1}-x_1)\tilde\varepsilon^*_2(X_{l2}),
$$
in which $\tilde\varepsilon^*_2$ is given in (5.1). Further, one denotes

(5.5)
$$
\omega_J(\mathbf X_l, x_1) = K_h(X_{l1}-x_1)B_{J,2}(X_{l2}),
\qquad
\mu_{\omega_J}(x_1) = E\,\omega_J(\mathbf X_l, x_1).
$$
By (3.8) and (5.1), $\Psi_v^{(2)}(x_1)$ can be rewritten as

(5.6)
$$
\Psi_v^{(2)}(x_1) = n^{-1}\sum_{l=1}^n\sum_{J=1}^N \tilde a_{J,2}\,\omega_J(\mathbf X_l, x_1).
$$
The uniform orders of $\Psi_v^{(1)}(x_1)$ and $\Psi_v^{(2)}(x_1)$ are given in the next two lemmas.

Lemma 5.1. Under assumptions (A2)–(A6), $\Psi_v^{(1)}(x_1)$ in (5.3) satisfies
$$
\sup_{x_1\in[0,1]} |\Psi_v^{(1)}(x_1)| = O_p\{N(\log n)^2/n\}.
$$

Proof. Based on (5.1),
$$
\Biggl| n^{-1}\sum_{i=1}^n \tilde\varepsilon^*_2(X_{i2})\Biggr|
\le \sum_{J=1}^N |\tilde a_{J,2}|\cdot \sup_{1\le J\le N}\Biggl|\frac1n\sum_{i=1}^n B_{J,2}(X_{i2})\Biggr|.
$$
Lemma A.6 implies that
$$
\sum_{J=1}^N |\tilde a_{J,2}| \le \Biggl\{ N\sum_{J=1}^N \tilde a_{J,2}^2\Biggr\}^{1/2}
\le \{N\tilde{\mathbf a}^T\tilde{\mathbf a}\}^{1/2} = O_p(N n^{-1/2}\log n).
$$
By (A.12), $\sup_{1\le J\le N}|n^{-1}\sum_{i=1}^n B_{J,2}(X_{i2})| \le A_{n,1} = O_p(n^{-1/2}\log n)$, so

(5.7)
$$
\Biggl|\frac1n\sum_{i=1}^n \tilde\varepsilon^*_2(X_{i2})\Biggr| = O_p\{N(\log n)^2/n\}.
$$
By assumption (A5) on the kernel function $K$, standard kernel density estimation theory entails that $\sup_{x_1\in[0,1]}|n^{-1}\sum_{l=1}^n K_h(X_{l1}-x_1)| = O_p(1)$. Thus, with (5.7), the lemma follows immediately. $\square$

Lemma 5.2. Under assumptions (A2)–(A6), $\Psi_v^{(2)}(x_1)$ in (5.4) satisfies
$$
\sup_{x_1\in[0,1]} |\Psi_v^{(2)}(x_1)| = O_p(H).
$$

L. WANG AND L. YANG

Lemma 5.2 follows from Lemmas A.10 and A.11. Proposition 5.1 follows from Lemmas 5.1 and 5.2. 6. Simulation example. In this section we carry out two simulation experiments to illustrate the finite-sample behavior of our SPBK estimators. The programming codes are available in both R 2.2.1 and XploRe. For information on XploRe, see [8] or visit www.xplore-stat.de. The number of interior knots N for the spline estimation as in (2.6) will be determined by the sample size n and a tuning constant c. To be precise, N = min([cn2/5 log n] + 1, [(n/2 − 1)d−1 ]), in which [a] denotes the integer part of a. In our simulation study, we have used c = 0.5, 1.0. As seen in Table 1, the choice of c makes little difference, so we always recommend to use c = 0.5 to save computation for massive data sets. The additional constraint that N ≤ (n/2 − 1)d−1 ensures that the number of terms in the linear least squares problem (2.6), 1 + dN , is no greater than n/2, which is necessary when the sample size n is moderate and the dimension d is high. We have obtained for comparison both the SPBK estimator m ˆ ∗α (xα ) and the “oracle” estimator m ˜ ∗α (xα ) by Nadaraya–Watson regression estimation using a quartic kernel and the rule-of-thumb bandwidth. We consider first the accuracy of the estimation, measured in terms of mean average squared error. Then to see that the SPBK estimator m ˆ ∗α (xα ) is as efficient as the “oracle smoother” m ˜ ∗α (xα ), we define the empirical ∗ ˜ ∗α (xα ) as relative efficiency of m ˆ α (xα ) with respect to m (6.1)

 Pn  ˜ ∗α (Xiα ) − mα (Xiα )}2 1/2 i=1 {m eff α = Pn . ∗ 2

ˆ α (Xiα ) − mα (Xiα )} i=1 {m

Theorem 2.1 indicates that the effα should be close to 1 for all α = 1, . . . , d. Figure 2 provides the kernel density estimations of the above empirical efficiencies to observe the convergence. Example 6.1. A time series {Yt }n+3 t=−1999 is generated according to the nonlinear additive autoregression model with sine functions given in [2], π π Yt−2 − 1.0 sin Yt−3 + σ0 εt , Yt = 1.5 sin 2 2 







σ0 = 0.5, 1.0,

T where {εt }n+3 t=−1996 are i.i.d. standard normal errors. Let Xt = {Yt−1 , Yt−2 , Yt−3 }. n+3 T Theorem 3 on page 91 of [4] establishes that {Yt , Xt }t=−1996 is geometrically ergodic. The first 2000 observations are discarded to make {Yt }n+3 t=1 behave like a geometrically α-mixing and strictly stationary time series. The multivariate datum {Yt , XTt }n+3 t=4 then satisfies assumptions (A1) to (A6) except

15

ADDITIVE AUTOREGRESSION MODEL Table 1 Report of Example 6.1 σ0

n

100 200 0.5 500 1000 100 200 1.0 500 1000

c

Component #1

Component #2

Component #3

1st stage

2nd stage

1st stage

2nd stage

1st stage

2nd stage

0.5 1.0 0.5 1.0 0.5 1.0 0.5 1.0

0.1231 0.1278 0.0539 0.0841 0.0263 0.0595 0.0169 0.0364

0.0461 0.0520 0.0125 0.0144 0.0031 0.0044 0.0015 0.0018

0.1476 0.1404 0.0616 0.0839 0.0306 0.0578 0.0210 0.0367

0.0645 0.0690 0.0275 0.0290 0.0107 0.0115 0.0053 0.0054

0.1254 0.1318 0.0577 0.0848 0.0278 0.0605 0.0178 0.0375

0.0681 0.0726 0.0252 0.0285 0.0102 0.0119 0.0054 0.0059

0.5 1.0 0.5 1.0 0.5 1.0 0.5 1.0

0.3008 0.3088 0.1742 0.2899 0.0924 0.2299 0.0616 0.1460

0.0587 0.0586 0.0256 0.0328 0.0065 0.0078 0.0033 0.0034

0.3298 0.3369 0.1783 0.2830 0.1124 0.2305 0.0637 0.1433

0.1427 0.1364 0.0802 0.0824 0.0421 0.0458 0.0270 0.0275

0.3236 0.3062 0.1892 0.3043 0.1004 0.2314 0.0646 0.1429

0.1393 0.1316 0.0701 0.0721 0.0345 0.0362 0.0224 0.0219

Monte Carlo average squared errors (ASE) based on 100 replications.

that instead of being [0, 1], the range of Yt−α , α = 1, 2, 3, needs to be recalibrated. Since we have no exact knowledge of the distribution of the Yt , we have generated many realizations of size 50,000 from which we found that more than 95% of the observations fall in [−2.58, 2.58] ([−3.14, 3.14]) with σ0 = 0.5 (σ0 = 1). We will estimate the functions {mα (xα )}3α=1 for xα ∈ [−2.58, 2.58] ([−3.14, 3.14]) with σ0 = 0.5 (σ0 = 1.0), where      π π m1 (x1 ) ≡ 0, m2 (x2 ) ≡ 1.5 sin x2 − E 1.5 sin Yt , 2 2 (6.2)      π π m3 (x3 ) ≡ −1.0 sin x3 − E −1.0 sin Yt . 2 2 We choose the sample size n to be 100, 200, 500 and 1000. Table 1 lists the average squared error (ASE) of the SPBK estimators and the constant spline pilot estimators from 100 Monte Carlo replications. As expected, increases in sample size reduce ASE for both estimators and across all combinations of c values and noise levels. Table 1 also shows that our SPBK estimators improve upon the spline pilot estimators immensely regardless of noise level and sample size, which implies that our second Nadaraya–Watson smoothing step is not redundant. To have some impression of the actual function estimates, at noise level σ0 = 0.5 with sample size n = 200, 500, we have plotted the oracle estimator

16

L. WANG AND L. YANG

Fig. 1. Plots of the oracle estimator (dotted blue curve), SPBK estimator (solid red curve) and the 95% pointwise confidence intervals constructed by (2.13) (upper and lower dashed red curves) of the function components mα (xα ) in (6.2), α = 1, 2, 3 (solid green curve).

ADDITIVE AUTOREGRESSION MODEL

17

Fig. 2. Kernel density plots of the 100 empirical efficiencies of m ˆ ∗α (xα ) to m ˜ ∗α (xα ), computed according to (6.1): (a) Example 6.1 (α = 2, d = 3); (b) Example 6.1 (α = 3, d = 3); (c) Example 6.2 (α = 1, d = 30); (d) Example 6.2 (α = 2, d = 30).

(thin dotted lines), SPBK estimator m ˆ ∗α (thin solid lines) and their 95% pointwise confidence intervals (upper and lower dashed curves) for the true functions mα (thick solid lines) in Figure 1. The visual impression of the SPBK estimators is rather satisfactory and their performance improves with increasing n.

18

L. WANG AND L. YANG

To see the convergence, Figure 2(a) and (b) plots the kernel density estimation of the 100 empirical efficiencies for α = 2, 3 and sample sizes n = 100, 200, 500 and 1000 at the noise level σ0 = 0.5. The vertical line at efficiency = 1 is the standard line for the comparison of m ˆ ∗α (xα ) and m ˜ ∗α (xα ). One can clearly see that the center of the density plots is going toward the standard line 1.0 with narrower spread when sample size n is increasing, which is confirmative to the result of Theorem 2.1. Example 6.2. d X

Consider the nonlinear additive heteroscedastic model

π Xt−α + σ(X)εt , Yt = sin 2.5 α=1 



i.i.d.

εt ∼ N (0, 1),

in which XTt = {Xt−1 , . . . , Xt−d } is a sequence of i.i.d. standard normal random variables truncated by [−2.5, 2.5] and √ P d 5 − exp( dα=1 |Xt−α |/d) , σ0 = 0.1. · σ(X) = σ0 P 2 5 + exp( dα=1 |Xt−α |/d)

By this choice of σ(X), we ensure that our design is heteroscedastic, and the variance is roughly proportional to dimension d, which is intended to mimic the case when independent copies of the same kind of univariate regression problem are simply added together. For d = 30, we have run 100 replications for sample size n = 500, 1000, 1500 and 2000. The kernel density estimation of the 100 empirical efficiencies for α = 1, 2 is graphically represented respectively in (c) and (d) of Figure 2. Again one sees that with increasing n, the efficiency distribution converges to 1. Lastly, we provide the computing time of Example 6.1 from 100 replications on an ordinary PC with Intel Pentium IV 1.86 GHz processor and 1.0 GB RAM. The average time run by XploRe to generate one sample of size n and compute the SPBK estimator and marginal integration estimator (MIE) is reported in Table 2. The MIEs have been obtained by directly recalling the “intest” in XploRe. As expected, the computing time for MIE is extremely sensitive to sample size due to the fact that it requires n2 least squares in two steps. In contrast, at least for large sample data, our proposed SPBK is thousands of times faster than MIE. Thus our SPBK estimation is feasible and appealing to deal with massive data sets.

APPENDIX Throughout this section, an ≫ bn means limn→∞ bn /an = 0 and an ∼ bn means limn→∞ bn /an = c, where c is some constant.

19

ADDITIVE AUTOREGRESSION MODEL Table 2 The computing time of Example 6.1 (in seconds) Method

n = 100

n = 200

n = 400

n = 1000

10 0.7

76 0.9

628 1.2

10728 4.5

MIE SPBK

A.1. Preliminaries. We first give the Bernstein inequality for a geometric α-mixing sequence, which plays an important role through our proof. Lemma A.1 (Theorem 1.4, page 31 of [1]). Let {ξt , t ∈ Z} be a zero-mean P real-valued α-mixing process, Sn = ni=1 ξi . Suppose that there exists c > 0 such that for i = 1, . . . , n, k = 3, 4, . . . , E|ξi |k ≤ ck−2 k!Eξi2 < +∞; then for each n > 1, integer q ∈ [1, n/2], each ε > 0 and k ≥ 3, qε2 P (|Sn | ≥ nε) ≤ a1 exp − + a2 (k)α 25m22 + 5cε 





n q+1

2k/(2k+1)

,

where α(·) is the α-mixing coefficient defined in (2.10) and n ε2 a1 = 2 + 2 1 + , q 25m22 + 5cε 





a2 (k) = 11n 1 +

2k/(2k+1) 

5mk

ε

,

with mr = max1≤i≤n kξi kr , r ≥ 2. Lemma A.2.

Under assumptions (A4) and (A6), one has:

(i) There exist constants C0 (f ) and C1 (f ) depending on the marginal densities fα (xα ), α = 1, 2, such that C0 (f )H ≤ kbJ,α k22 ≤ C1 (f )H, where bJ,α is given in (2.2). (ii) For any α = 1, 2, |J ′ − J| ≤ 1, E{BJ,α (Xiα )BJ ′ ,α (Xiα )} ∼ 1; in addition E|BJ,α (Xiα )BJ ′ ,α (Xiα )|k ∼ H 1−k ,

k ≥ 1,

where BJ,α and BJ ′ ,α are defined in (2.3). We refer the proof of the above lemma to Lemma A.2 in [26]. Lemma A.3.

Under assumptions (A4)–(A6), for µωJ (x1 ) given in (5.5), sup

sup |µωJ (x1 )| = O(H 1/2 ).

x1 ∈[0,1] 1≤J≤N

20

L. WANG AND L. YANG

Proof. Denote the theoretical norm of IJ,α in (2.1) for α = 1, 2, J = 1, . . . , N + 1, cJ,α = kIJ,α k22 =

(A.1)

Z

2 IJ,α (xα )fα (xα ) dxα .

By definition, |µωJ (x1 )| = |E{Kh (Xl1 − x1 )BJ,2 (Xl2 )}| is bounded by ZZ

Kh (u1 − x1 )|BJ,2 (u2 )|f (u1, u2 ) du1 du2 = (kbJ,2 k2 )−1

Z Z

K(v1 )IJ+1,2 (u2 )f (hv1 + x1 , u2 ) dv1 du2

cJ+1,2 + cJ,2 

1/2Z Z



K(v1 )IJ,2 (u2 )f (hv1 + x1 , u2 ) dv1 du2 .

The boundedness of the joint density f and the Lipschitz continuity of the kernel K will then imply that sup

sup

x1 ∈[0,1] 1≤J≤N

ZZ

K(v1 )IJ,2 (u2 )f (hv1 + x1 , u2 ) dv1 du2 ≤ CK Cf H.

The proof of the lemma is then completed by (i) of Lemma A.2.  Lemma A.4.

Under assumptions (A2) and (A4)–(A6), one has

n √ −1 X {ωJ (Xl , x1 ) − µωJ (x1 )} = Op (log n/ nh), (A.2) sup sup n x1 ∈[0,1] 1≤J≤N l=1 n −1 X ωJ (Xl , x1 ) = Op (H 1/2 ), (A.3) sup sup n x1 ∈[0,1] 1≤J≤N l=1

where ωJ (Xl , x1 ) and µωJ (x1 ) are given in (5.5).

Proof. For simplicity, denote ωJ∗ (Xl , x1 ) = ωJ (Xl , x1 ) − µωJ (x1 ). Then E{ωJ∗ (Xl , x1 )}2 = EωJ2 (Xl , x1 ) − µ2ωJ (x1 ), while EωJ2 (Xl , x1 ) is equal to h−1 kbJ,2 k−2 2

ZZ



K 2 (v1 ) IJ+1,2 (u2 ) +

cJ+1,2 IJ,2 (u2 ) cJ,2



× f (hv1 + x1 , u2 ) dv1 du2 , where cJ,α is given in (A.1). So EωJ2 (Xl , x1 ) ∼ h−1 and EωJ2 (Xl , x1 ) ≫ µ2ωJ (x1 ). Hence for n sufficiently large, E{ωJ∗ (Xl , x1 )}2 = EωJ2 (Xl , x1 ) −

21

ADDITIVE AUTOREGRESSION MODEL

µ2ωJ (x1 ) ≥ c∗ h−1 , for some positive constant c∗ . When r ≥ 3, the rth moment E|ωJ (Xl , x1 )|r is 1 kbJ,2 kr2

ZZ



Khr (u1 − x1 ) IJ+1,2 (u2 ) +



cJ+1,2 cJ,2

r



IJ,2 (u2 ) f (u1, u2 ) du1 du2 .

It is clear that E|ωJ (Xl , x1 )|r ∼ h(1−r) H 1−r/2 . According to Lemma A.3, one has |EωJ (Xl , x1 )|r ≤ CH r/2 , thus E|ωJ (Xl , x1 )|r ≫ |µωJ (x1 )|r . In addition E|ωJ∗ (Xl , x1 )|r ≤



c hH 1/2

(r−2)

r!E|ωJ∗ (Xl , x1 )|2 ,

so there exists c∗ = ch−1 H −1/2 such that E|ωJ∗ (Xl , x1 )|r ≤ c∗r−2 r!E|ωJ∗ (Xl , x1 )|2 , which implies that {ωJ∗ (Xl , x1 )}nl=1 satisfies Cram´er’s condition. By Bernstein’s inequality, for r = 3 ( )    6/7 n 1 X n qρ2n ∗ P +a2 (3)α ωJ (Xl , x1 ) ≥ ρn ≤ a1 exp − n q+1 25m22 + 5c∗ ρn l=1

with m22 ∼ h−1 , m3 = max1≤i≤n kωJ∗ (Xl , x1 )k3 ≤ {C0 (2h−1 )2 }1/3 and log n ρn = ρ √ , nh 

a2 (3) = 11n 1 +

a1 = 2

ρ2n n +2 1+ , q 25m22 + 5c∗ ρn 



6/7 

5m3 ρn

.

n ] > c0 log n, q > c1 n/ log n Observe that 5c∗ ρn = o(1); by taking q such that [ q+1 for some constants c0 , c1 , one has a1 = O(n/q) = O(log n), a2 (3) = o(n2 ). n Assumption (A2) yields that α([ q+1 ])6/7 ≤ Cn−6λ0 c0 /7 . Thus, for n large enough,

) ( n ρ log n 2 1 X ∗ (A.4) P ≤ cn−c2 ρ log n + Cn2−6λ0 c0 /7 . ωJ (Xl , x1 ) > √ n l=1 nh

We divide the interval [0, 1] into Mn ∼ n6 equally spaced intervals with disjoint endpoints 0 = x1,0 < x1,1 < · · · < x1,Mn = 1. Employing the discretization method, n −1 X ∗ ωJ (Xl , x1 ) sup sup n x1 ∈[0,1] 1≤J≤N l=1 n −1 X ∗ ωJ (Xl , x1,k ) sup n (A.5) = sup 0≤k≤Mn 1≤J≤N l=1 n −1 X ∗ ∗ + sup {ωJ (Xl , x1 ) − ωJ (Xl , x1,k )} . sup sup n 1≤k≤Mn 1≤J≤N x1 ∈[x ,x ] 1,k−1

1,k

l=1

22

L. WANG AND L. YANG

By (A.4), there exists a large enough value ρ > 0 such that for any 1 ≤ k ≤ Mn , ) ( n 1 X ∗ −1/2 ωJ (Xl , x1,k ) > ρ(nh) log n ≤ n−10 , P n l=1

1 ≤ J ≤ N,

which implies that ∞ X

n=1

) n log n −1 X ∗ ωJ (Xl , x1,k ) ≥ ρ √ sup n P sup nh 0≤k≤Mn 1≤J≤N (





l=1

Mn X ∞ X N X

n=1 k=1 J=1 ∞ X

n=1

) ( n log n −1 X ∗ ωJ (Xl , x1,k ) ≥ ρ √ P n nh l=1

N Mn n−10 < ∞.

Thus, the Borel–Cantelli lemma entails that n √ −1 X ∗ ωJ (Xl , x1,k ) = Op (log n/ nh). sup sup n 0≤k≤Mn 1≤J≤N

(A.6)

l=1

Employing Lipschitz continuity of the kernel K, for x1 ∈ [x1,k−1 , x1,k ] sup |Kh (Xl1 − x1 ) − Kh (Xl1 − x1,k )| ≤ CK Mn−1 h−2 .

1≤k≤Mn

According to the fact that Mn ∼ n6 , one has n −1 X ∗ ∗ {ωJ (Xl , x1 ) − ωJ (Xl , x1,k )} sup sup sup n 1≤k≤Mn 1≤J≤N x1 ∈[x1,k−1 ,x1,k ] l=1

≤ CK Mn−1 h−2 sup

sup |BJ,2 (x2 )|

x2 ∈[0,1] 1≤J≤N

= O(Mn−1 h−2 H −1/2 ) = o(n−1 ). Thus (A.2) follows instantly from (A.5) and (A.6). As a result of Lemma A.3 and (A.2), (A.3) holds.  Lemma A.5. Under assumptions (A4) and (A6), there exist constants C0 > c0 > 0 such that for any a = (a0 , a1,1 , . . . , aN,1, a1,2 , . . . , aN,2 ), (A.7) c0

a20

+

X J,α

a2J,α

!

2 !

X X

2 2 aJ,α . ≤ a0 + aJ,α BJ,α ≤ C0 a0 +

J,α

2

J,α

23

ADDITIVE AUTOREGRESSION MODEL

We refer the proof of the above lemma to Lemma A.4 in [26]. The next lemma provides the size of ˜ aT ˜ a, where ˜ a is the least squares solution defined by (3.6). Under assumptions (A2)–(A6), ˜ a satisfies

Lemma A.6.

T

(A.8)

˜ a

˜ a=a ˜20

+

N X 2 X

J=1 α=1

a ˜2J,α = Op {N (log n)2 /n}.

aT BT B˜ a=˜ aT (BT E). Thus Proof. According to (3.7) and (3.8), ˜ kB˜ ak22,n

(A.9)

T

=˜ a



1 hBJ,α , BJ ′ ,α′ i2,n



a=˜ ˜ aT (n−1 BT E).

By (A.15), kB˜ ak22 . ak22,n is bounded below in probability by (1 − An )kB˜ According to (A.7), one has (A.10)

2

! N 2

X

2 XX 2 2 2 a ˜J,α . ˜0 + = a ˜ + a ˜J,α ≥ c0 a

0

kB˜ ak22

J=1 α=1

J,α

2

Meanwhile one can show that ˜ aT (n−1 BT E) is bounded above by s

a ˜20

(A.11)

+

X J,α

+

a ˜2J,α

"(

( n X 1X J,α

n i=1

n 1X σ(Xi )εi n i=1

)2

BJ,α (Xiα )σ(Xi )εi

)2 #1/2

.

aT ˜ a is bounded by Combining (A.9), (A.10) and (A.12), the squared norm ˜ −2 c−2 0 (1 − An )

"(

n 1X σ(Xi )εi n i=1

)2

+

( n X 1X J,α

n i=1

BJ,α (Xiα )σ(Xi )εi

)2 #

.

Using the same truncation of ε as in Lemma A.11, the Bernstein inequality entails that n n √ −1 X −1 X σ(Xi )εi + max BJ,α (Xiα )σ(Xi )εi = Op (log n/ n). n n 1≤J≤N,α=1,2 i=1

i=1

Therefore (A.8) holds since An is of order op (1). 

24

L. WANG AND L. YANG

A.2. Empirical approximation of the theoretical inner product. Lemma A.7.

Under assumptions ( A2), (A4) and (A6), one has sup |h1, BJ,α i2,n − h1, BJ,α i2 | = Op (n−1/2 log n),

(A.12)

J,α

(A.13) sup |hBJ,α , BJ ′ ,α i2,n − hBJ,α , BJ ′ ,α i2 | = Op (n−1/2 H −1/2 log n), J,J ′ ,α

sup 1≤J,J ′ ≤N,α6=α′

(A.14)

|hBJ,α , BJ ′ ,α′ i2,n − hBJ,α , BJ ′ ,α′ i2 |

= Op (n−1/2 log n).

We refer the proof of the above lemma to Lemma A.7 in [26]. Lemma A.8. (A.15) An =

Under assumptions (A2), (A4) and (A6), one has

log n |hg1 , g2 i2,n − hg1 , g2 i2 | = Op 1/2 1/2 kg k kg k n H (−1) 1 2 2 2 g1 ,g2 ∈G 

sup



= op (1).

Proof. For every g1 , g2 ∈ G(−1) , one can write N X 2 X

g1 (X1 , X2 ) = a0 +

aJ,α BJ,α (Xα ),

J=1 α=1

g2 (X1 , X2 ) = a′0 +

N X 2 X

a′J ′ ,α′ BJ ′ ,α′ (Xα′ ),

J ′ =1 α′ =1

where for any J, J ′ = 1, . . . , N, α, α′ = 1, 2, aJ,α and aJ ′ ,α′ are real constants. Then X X |hg1 , g2 i2,n − hg1 , g2 i2 | ≤ ha′0 , aJ,α BJ,α i2,n + ha0 , a′J ′ ,α′ BJ ′ ,α′ i2,n ′ ′ J,α

+

J ,α

X

J,J ′ ,α,α′

|aJ,α ||a′J ′ ,α′ ||hBJ,α , BJ ′ ,α′ i2,n

− hBJ,α , BJ ′ ,α′ i2 | = L1 + L2 + L3 . The equivalence of norms given in (A.7) and (A.12) leads to L1 ≤ An,1 · |a′0 | ·

X J,α

|aJ,α |

25

ADDITIVE AUTOREGRESSION MODEL

a′2 0

≤ C0 An,1

+

X

a′2 J,α

J,α

!1/2

≤ CA,1 An,1 kg1 k2 kg2 k2 H

X

a2J,α

J,α

!1/2

N 1/2

−1/2

= Op (n−1/2 H −1/2 log n)kg1 k2 kg2 k2 .

Similarly, L2 = Op (n−1/2 H −1/2 log n)kg1 k2 kg2 k2 . By the Cauchy–Schwarz inequality L3 ≤

X

J,J ′ ,α,α′

|aJ,α ||a′J ′ ,α′ | max(An,2 , An,3 )

≤ CA,2 max(An,2 , An,3 )kg1 k2 kg2 k2

= Op (n−1/2 H −1/2 log n)kg1 k2 kg2 k2 . Therefore, statement (A.15) is established.  A.3. Proof of Lemma 5.2. Denote V as the theoretical inner product of the B spline basis {1, BJ,α (xα ), J = 1, . . . , N, α = 1, 2}, that is, (A.16)

V=



1

02N

0T2N hBJ,α , BJ ′ ,α′ i2



1 ≤ α, α′ ≤ 2, 1 ≤ J, J ′ ≤ N

,

where 0p = {0, . . . , 0}T . Let S be the inverse matrix of V, that is, (A.17)

1 S = V−1 =  0N 0N 

0TN V11 V21

0TN −1 1 V12  =  0N V22 0N 



0TN S11 S21

0TN S12  . S22 

Lemma A.9. Under assumptions (A4) and (A6), for V, S defined in ( A.16), (A.17), there exist constants CV > cV > 0 and CS > cS > 0 such that cV I2N +1 ≤ V ≤ CV I2N +1 ,

cS I2N +1 ≤ S ≤ CS I2N +1 .

We refer the proof of the above lemma to Lemma A.9 in [26]. Next we denote   0 0T2N V∗ = . ′ 02N hBJ,α , BJ ′ ,α′ i2,n − hBJ,α , BJ ′ ,α′ i2 1 ≤ α, α ≤ 2. 1 ≤ J, J ′ ≤ N

Then ˜ a in (3.8) can be rewritten as ˜ a = (BT B)−1 BT E = (A.18) = (V + V∗ )−1





−1 

1 T B B n

1 T B E . n 

1 T B E n



26

L. WANG AND L. YANG

Now define ˆ a = {ˆ a0 , a ˆ1,1 , . . . , a ˆN,1 , a ˆ1,2 , . . . , a ˆN,2 }T as

ˆ a = V−1 (n−1 BT E) = S(n−1 BT E),

(A.19)

(2)

and define a theoretical version of Ψv (x1 ) in (5.6) as −1 ˆ (2) Ψ v (x1 ) = n

(A.20)

n X N X

a ˆJ,2 ωJ (Xi , x1 ).

i=1 J=1

Under assumptions (A2) to (A6), 2 ˆ (2) sup |Ψ(2) v (x1 ) − Ψv (x1 )| = Op {(log n) /(nH)}.

Lemma A.10.

x1 ∈[0,1]

a = (V + V∗ )˜ a, Proof. According to (A.18) and (A.19), one has Vˆ ∗ which implies that V ˜ a = V(ˆ a−˜ a). Using (A.13) and (A.14), one obtains that kV(ˆ a−˜ a)k = kV∗ ˜ ak ≤ Op (n−1/2 H −1 log n)k˜ ak.

According to Lemma A.6, k˜ ak = Op (n−1/2 N 1/2 log n), so one has kV(ˆ a−˜ a)k ≤ Op {(log n)2 n−1 N 3/2 }.

a−˜ a)k = Op {(log n)2 n−1 N 3/2 }. Lemma A.6 then implies By Lemma A.9, k(ˆ (A.21)

q

kˆ ak ≤ k(ˆ a−˜ a)k + k˜ ak = Op (log n N/n).

PN P (2) ˆ (2) aJ,2 − a ˆJ,2 ) n1 nl=1 ωJ (Xl , x1 )|. Additionally, |Ψv (x1 ) − Ψ v (x1 )| = | J=1 (˜ So     √ (log n)2 (log n)2 1/2 (2) ˆ O (H ) = O . N O sup |Ψ(2) (x ) − Ψ (x )| ≤ p p p 1 1 v v nH nH x∈[0,1]

Therefore the lemma follows.  ˆ (2) Lemma A.11. Under assumptions (A2)–(A6), for Ψ v (x1 ) as defined in (A.20), one has

n

N

i=1

J=1



X X ˆ (2) (x1 )| = sup n−1 sup |Ψ Kh (Xi1 − x1 ) a ˆJ,2 BJ,2 (Xi2 ) = Op (H). v

x1 ∈[0,1]

x1 ∈[0,1]

Proof. Note that

ˆ (2) |Ψ v (x1 )| ≤ (A.22)

N X a ˆJ,2 µωJ (x1 )



J=1

N n X X −1 + a ˆJ,2 n {ωJ (Xi , x1 ) − µωJ (x1 )} J=1

i=1

= Q1 (x1 ) + Q2 (x1 ).

27

ADDITIVE AUTOREGRESSION MODEL

By the Cauchy–Schwarz inequality, (A.21) Lemma A.4 and assumptions (A5), (A6),     q √ (log n)3 log n √ (A.23) sup Q2 (x1 ) = Op (log n N/n) N Op √ = Op . n nh x1 ∈[0,1] Using the discretization idea again as in the proof of Lemma A.4, one has sup Q1 (x1 ) x1 ∈[0,1]

(A.24)

N X a ˆJ,2 µωJ (x1,k ) ≤ max 1≤k≤Mn J=1

N N X X a ˆJ,2 µωJ (x1 ) − a ˆJ,2 µωJ (x1,k ) sup + max 1≤k≤Mn x1 ∈[x1,k−1 ,x1,k ] J=1

J=1

= T1 + T2 ,

where Mn ∼ n. Define next X W1 = max n−1 1≤k≤Mn

X

1≤i≤n 1≤J,J ′ ≤N

X W2 = max n−1 1≤k≤Mn

X

1≤i≤n 1≤J,J ′ ≤N

µωJ (x1,k )sJ+N +1,J ′ +1 BJ ′ ,1 (Xi1 )σ(Xi )εi ,

µωJ (x1,k )sJ+N +1,J ′ +N +1 BJ ′ ,2 (Xi2 )σ(Xi )εi .

Then it is clear that T1 ≤ W1 + W2 . Next we will show that W1 = Op (H). 1 < θ0 < 25 ), where δ is the same as in assumption (A3). Let Dn = nθ0 ( 2+δ Define ε− i,D = εi I(|εi | ≤ Dn ),

ε+ i,D = εi I(|εi | > Dn ),

− ε∗i,D = ε− i,D − E(εi,D |Xi ),

Ui,k = µω (x1,k )T S21 {B1,1 (Xi1 ), . . . , B1,N (Xi1 )}T σ(Xi )ε∗i,D .

Denote W1D = max1≤k≤Mn |n−1 ni=1 Ui,k | as the truncated centered version of W1 . Next we show that |W1 − W1D | = Op (H). Note that |W1 − W1D | ≤ Λ1 + Λ2 , where P

n 1 X Λ1 = max 1≤k≤Mn n

X

n 1 X Λ2 = max 1≤k≤Mn n

X

µωJ (x1,k )sJ+N +1,J ′ +1

i=1 1≤J,J ′ ≤N



× BJ ′ ,1 (Xi1 )σ(Xi )E(ε− i,D |Xi ) ,

i=1 1≤J,J ′ ≤N



µωJ (x1,k )sJ+N +1,J ′ +1 BJ ′ ,1 (Xi1 )σ(Xi )ε+ i,D .

28

L. WANG AND L. YANG

Let µω (x1,k ) = {µω1 (x1,k ), . . . , µωN (x1,k )}T ; then

)N ( n X − −1 T BJ ′ ,1 (Xi1 )σ(Xi )E(εi,D |Xi ) Λ1 = max µω (x1,k ) S21 n 1≤k≤Mn ′ i=1

≤ CS max

1≤k≤Mn

(

N X

µ2ωJ (x1,k )

J=1

N X

(

J=1

J =1



)2 )1/2

n X

1 ˜ i )E(ε− |Xi ) BJ,1 (Xi1 )σ(X i,D n i=1

.

−(1+δ)

+ and By assumption (A3), one has |E(ε− i,D |Xi )| = |E(εi,D |Xi )| ≤ Mδ Dn √ P n 1 Lemma A.1 entails that supJ,α | n i=1 BJ,1 (Xi1 )σ(Xi )| = Op (log n/ n). Therefore

Λ1 ≤ Mδ Dn−(1+δ)

max

1≤k≤Mn

"

N X

µ2ωJ (x1,k )

J=1

N X

(

J=1

)2 #1/2

n 1X BJ,1 (Xi1 )σ(Xi ) n i=1

= Op {N Dn−(1+δ) log2 n/n} = Op (H),

where the last step follows from the choice of Dn . Meanwhile ∞ X

n=1

P (|εn | ≥ Dn ) ≤

∞ X E|εn |2+δ

n=1

Dn2+δ

=

∞ X E(E|εn |2+δ |Xn )

n=1

Dn2+δ



∞ X Mδ

n=1

Dn2+δ

< ∞,

since δ > 1/2. By the Borel–Cantelli lemma, one has with probability 1, n−1

n X

X

µωJ (x1,k )sJ+N +1,J ′ +1 BJ ′ ,1 (Xi1 )σ(Xi )ε+ i,D = 0,

i=1 1≤J,J ′ ≤N

for large n. Therefore, one has |W1 − W1D | ≤ Λ1 + Λ2 = Op (H). Next we will show that W1D = Op (H). Note that the variance of Ui,k is µω (x1,k )T S21 var({B1,1 (Xi1 ), . . . , BN,1 (Xi1 )}T σ(Xi )ε∗i,D )S21 µω (x1,k ). By assumption (A3), c2σ V11 ≤ var({B1,1 (Xi1 ), . . . , BN,1 (Xi1 )}T σ(Xi )) ≤ Cσ2 V11 , var(Ui,k ) ∼ µω (x1,k )T S21 V11 S21 µω (x1,k )Vε,D = µω (x1,k )T S21 µω (x1,k )Vε,D ,

where Vε,D = var{ε∗i,D |Xi .}. Let κ(x1,k ) = {µω (x1,k )T µω (x1,k )}1/2 . Then cS c2σ {κ(x1,k )}2 Vε,D ≤ var(Ui,k ) ≤ CS Cσ2 {κ(x1,k )}2 Vε,D .

Simple calculation leads to

E|Ui,k |r ≤ {c0 κ(x1,k )Dn H −1/2 }r−2 r!E|Ui,k |2 < +∞

for r ≥ 3, so {Ui,k }ni=1 satisfies Cram´er’s condition with Cram´er’s constant c∗ = c0 κ(x1,k )Dn H −1/2 ; hence by Bernstein’s inequality ( ) 6/7    n n qρ2n −1 X , Ui,k ≥ ρn ≤ a1 exp − + a (3)α P n 2 q+1 25m22 + 5c∗ ρn l=1

29

ADDITIVE AUTOREGRESSION MODEL

where m22 ∼ {κ(x1,k )}2 Vε,D , m3 ≤ {c{κ(x1,k )}3 H −1/2 Dn Vε,D }1/3 , ρ2n n , a1 = 2 +2 1+ 2 q 25m2 + 5c∗ ρn 

ρn = ρH,





a2 (3) = 11n 1+

6/7 

5m3 ρn

.

Similar arguments as in Lemma A.4 yield that as n → ∞

qρ2n ρn2/5 qρn → +∞. = ∼ c∗ 25m22 + 5c∗ ρn c0 (log n)5/2 Dn

Taking c0 , ρ large enough, one has ( n ) 1 X P Ui,k > ρH ≤ c log n exp{−c2 ρ2 log n} + Cn2−6λ0 c0 /7 ≤ n−3 , n i=1

for n large enough. Hence ∞ X

n=1

P (|W1D | ≥ ρH) =

Mn ∞ X X

n=1 k=1

! n ∞ 1 X X Ui,k ≥ ρH ≤ Mn n−3 < ∞. P n i=1

n=1

Thus, the Borel–Cantelli lemma entails that W1D = Op (H). Noting the fact that |W1 − W1D | = Op (H), one has that W1 = Op (H). Similarly W2 = Op (H). Thus (A.25)

T1 ≤ W1 + W2 = Op (H).

Employing the Cauchy–Schwarz inequality and Lipschitz continuity of the kernel K, assumption (A5), Lemma A.2(ii) and (A.21) lead to (A.26)

T2 ≤ Op



 PN

N 1/2 log n { n1/2

2 1/2 J=1 EBJ,2 (X12 )} h2 Mn

= op (n−1/2 ).

Combining (A.24), (A.25) and (A.26), one has $\sup_{x_1\in[0,1]} Q_1(x_1) = O_p(H)$. The desired result then follows from (A.22) and (A.23). $\square$

Acknowledgments. This work is part of the first author's dissertation under the supervision of the second author. The authors are very grateful to the Editor, Jianqing Fan, and three anonymous referees for their helpful comments.

REFERENCES

[1] Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes, 2nd ed. Lecture Notes in Statist. 110. Springer, New York. MR1640691
[2] Chen, R. and Tsay, R. S. (1993). Nonlinear additive ARX models. J. Amer. Statist. Assoc. 88 955–967.
[3] de Boor, C. (2001). A Practical Guide to Splines, rev. ed. Springer, New York. MR1900298


[4] Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statist. 85. Springer, New York. MR1312160
[5] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London. MR1383587
[6] Fan, J., Härdle, W. and Mammen, E. (1998). Direct estimation of low-dimensional components in additive models. Ann. Statist. 26 943–971. MR1635422
[7] Fan, J. and Jiang, J. (2005). Nonparametric inferences for additive models. J. Amer. Statist. Assoc. 100 890–907. MR2201017
[8] Härdle, W., Hlávka, Z. and Klinke, S. (2000). XploRe Application Guide. Springer, Berlin.
[9] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London. MR1082147
[10] Hengartner, N. W. and Sperlich, S. (2005). Rate optimal estimation with the integration method in the presence of many covariates. J. Multivariate Anal. 95 246–272. MR2170397
[11] Horowitz, J., Klemelä, J. and Mammen, E. (2006). Optimal estimation in additive regression. Bernoulli 12 271–298. MR2218556
[12] Horowitz, J. and Mammen, E. (2004). Nonparametric estimation of an additive model with a link function. Ann. Statist. 32 2412–2443. MR2153990
[13] Huang, J. Z. (1998). Projection estimation in multiple regression with application to functional ANOVA models. Ann. Statist. 26 242–272. MR1611780
[14] Huang, J. Z. and Yang, L. (2004). Identification of nonlinear additive autoregressive models. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 463–477. MR2062388
[15] Linton, O. B. (1997). Efficient estimation of additive nonparametric regression models. Biometrika 84 469–473. MR1467061
[16] Linton, O. B. and Härdle, W. (1996). Estimation of additive regression models with known links. Biometrika 83 529–540. MR1423873
[17] Linton, O. B. and Nielsen, J. P. (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 82 93–100. MR1332841
[18] Mammen, E., Linton, O. and Nielsen, J. (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Ann. Statist. 27 1443–1490. MR1742496
[19] Opsomer, J. D. and Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression. Ann. Statist. 25 186–211. MR1429922
[20] Pham, D. T. (1986). The mixing property of bilinear and generalized random coefficient autoregressive models. Stochastic Process. Appl. 23 291–300. MR0876051
[21] Robinson, P. M. (1983). Nonparametric estimators for time series. J. Time Ser. Anal. 4 185–207. MR0732897
[22] Sperlich, S., Tjøstheim, D. and Yang, L. (2002). Nonparametric estimation and testing of interaction in additive models. Econometric Theory 18 197–251. MR1891823
[23] Stone, C. J. (1985). Additive regression and other nonparametric models. Ann. Statist. 13 689–705. MR0790566
[24] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation (with discussion). Ann. Statist. 22 118–184. MR1272079
[25] Tjøstheim, D. and Auestad, B. (1994). Nonparametric identification of nonlinear time series: Projections. J. Amer. Statist. Assoc. 89 1398–1409. MR1310230


[26] Wang, L. and Yang, L. (2006). Spline-backfitted kernel smoothing of nonlinear additive autoregression model. Manuscript. Available at www.arxiv.org/abs/math/0612677.
[27] Xue, L. and Yang, L. (2006). Estimation of semiparametric additive coefficient model. J. Statist. Plann. Inference 136 2506–2534. MR2279819
[28] Xue, L. and Yang, L. (2006). Additive coefficient modeling via polynomial spline. Statist. Sinica 16 1423–1446. MR2327498
[29] Yang, L., Härdle, W. and Nielsen, J. P. (1999). Nonparametric autoregression with multiplicative volatility and additive mean. J. Time Ser. Anal. 20 579–604. MR1720162
[30] Yang, L., Sperlich, S. and Härdle, W. (2003). Derivative estimation and testing in generalized additive models. J. Statist. Plann. Inference 115 521–542. MR1985882

Department of Statistics
University of Georgia
Athens, Georgia 30602
USA
E-mail: [email protected]

Department of Statistics and Probability
Michigan State University
East Lansing, Michigan 48824
USA
E-mail: [email protected]
