Second-order approximation for adaptive regression estimators

LSE Research Online Article (refereed)

Oliver B. Linton and Zhijie Xiao Second-order approximation for adaptive regression estimators

Originally published in Econometric Theory, 17 (5), pp. 984-1024. © 2001 Cambridge University Press. You may cite this version as: Linton, Oliver B. & Xiao, Zhijie (2001). Second-order approximation for adaptive regression estimators [online]. London: LSE Research Online. Available at: http://eprints.lse.ac.uk/archive/00000317. Available online: July 2005.


Econometric Theory, 17, 2001, 984-1024. Printed in the United States of America.

SECOND-ORDER APPROXIMATION FOR ADAPTIVE REGRESSION ESTIMATORS

OLIVER LINTON
London School of Economics and Yale University

ZHIJIE XIAO
University of Illinois at Urbana-Champaign

We derive asymptotic expansions for semiparametric adaptive regression estimators. In particular, we derive the asymptotic distribution of the second-order effect of an adaptive estimator in a linear regression whose error density is of unknown functional form. We then show how the choice of smoothing parameters influences the estimator through higher order terms. A method of bandwidth selection is defined by minimizing the second-order mean squared error. We examine both independent and time series regressors; we also extend our results to a t-statistic. Monte Carlo simulations confirm the second-order theory and the usefulness of the bandwidth selection method.

We thank Doug Hodgson, Joel Horowitz, Peter Phillips, Roger Koenker, Tom Rothenberg, and two referees for helpful comments. Financial support from the National Science Foundation is gratefully acknowledged. Gauss programs are available from the second author upon request. Address correspondence to: Oliver Linton, Department of Economics, London School of Economics, Houghton Street, London WC2A 2AE, UK; e-mail: O.Linton@lse.ac.uk; Zhijie Xiao, Department of Economics, University of Illinois at Urbana-Champaign, 1206 South Sixth Street, Champaign, IL 61820, USA; e-mail: zxiao@uiuc.edu.

1. INTRODUCTION

In estimation problems where a Gaussian assumption on the underlying distribution of the data is inappropriate, the so-called adaptive estimator provides an alternative to the conventional Gaussian maximum likelihood estimator (MLE) by replacing the Gaussian density function with a nonparametric estimate of the score function of the log-likelihood. It has been proven that an efficiency gain over the MLE can be achieved by adaptive estimators in many econometric models. Adaptive estimation was first studied by Stein (1956), who considered the problem of estimating and testing hypotheses about a parameter in the presence of an infinite dimensional "nuisance" parameter. Beran (1974) and Stone (1975) considered adaptive estimation in the symmetric location model, whereas Bickel (1982) extended this to linear regression and other models. This latter work provided a starting point for much future work in this area. Manski (1984) studied adaptive estimation in nonlinear models, Kreiss (1987) considered stationary and invertible autoregressive moving average (ARMA) models, Steigerwald (1992) studied linear regression with ARMA errors, and Linton (1993) considered the case of linear regression with autoregressive conditional heteroskedasticity (ARCH). Jeganathan (1995) extended the theory to nonstationary models with i.i.d. errors, and Hodgson (1998) further studied this case but with ARMA errors. Much of this literature has been devoted to first-order theoretical results and has used devices from mathematical statistics, such as sample splitting and discretization, that do not appeal to practitioners. As we argued elsewhere (Linton, 1995), the first-order asymptotics by no means always provide a good approximation to the sampling behavior of semiparametric estimators; for confirmation of this see the simulation evidence in Hsieh and Manski (1987). Furthermore, computing the semiparametric estimates requires the selection of a smoothing parameter $h$, called the bandwidth, that determines the effective degree of parameterization taken by the nuisance function for given sample size $n$. Although the first-order approximation does not reflect the choice of $h(n)$, the finite sample performance of the estimators depends greatly on the choice of bandwidth. We shall use higher order expansions as a means to solve some of the problems presented by the first-order theory. Higher order expansions have a long history of application in econometrics (see, among others, Sargan, 1976; Phillips, 1978; Rothenberg, 1984). Applications of higher order approximations to bandwidth choice in semiparametric models have been studied by Härdle, Hart, Marron, and Tsybakov (1992), Linton (1995, 1996, 1998), Linton and Xiao (1997), Nishiyama and Robinson (1997), Powell and Stoker (1996), and Xiao and Phillips (1996), among others.

In this paper, we derive higher order expansions for an adaptive estimator in linear regression. We do not require the error to be symmetrically distributed. In fact, we show how choices of smoothing parameters influence the semiparametric adaptive estimator by deriving the asymptotic distribution of the second-order effect. This distribution reflects the bandwidth and kernel used and suggests a method of bandwidth choice. We develop rule-of-thumb plug-in bandwidth selection methods for the estimation problem that are convenient to implement and reasonably insensitive to the true underlying density. We also extend the analysis to the t-ratio and to the case of regressors that are not strictly exogenous. The adaptive estimator is quite promising relative to other semiparametric procedures because the nonparametric estimation only involves one-dimensional smoothing and so does not suffer from the curse of dimensionality. In this case, the kernel procedures we employ can work well provided they are implemented appropriately. The main purpose of our asymptotic approximations is to show how the semiparametric adaptive estimator is affected by the smoothing parameters to a higher order and to provide the tools to effect good implementation. Throughout we allow the error


density to be zero at the boundary, which is required to make the situation "regular." This necessitates the use of a trimming function. We use the smooth trimming adopted in Andrews (1995) and Ai (1997).

The paper is organized as follows. The model and estimators are described in the next section. Results of the expansion are given in Section 3, and the details of these expansions can be found in the Appendix. In Section 4 we give some extensions to dependent regressors and t-statistics. Bandwidth selection is discussed in Section 5. In Section 6 we provide a small Monte Carlo experiment that evaluates the effectiveness of the second-order approximation. Section 7 concludes.

For notation, we use $f^{(j)}$ to denote the $j$th derivative of a function $f$, and for a function $g$ of functions $a_1, \ldots, a_d$, we define the linear differential operator

$$ D_q g(a_1, \ldots, a_d)(x) = \sum_{j=1}^{d} \frac{\partial g}{\partial a_j}(a_1, \ldots, a_d)(x) \cdot a_j^{(q)}(x). $$

We also let $\|A\|$ denote the Euclidean norm of the array $A = (a_{i_1, \ldots, i_s})$, defined as $\|A\| = (\sum a_{i_1, \ldots, i_s}^2)^{1/2}$.

2. THE MODEL AND ESTIMATOR

We consider the problem of estimating $\beta \in R^p$ in the following regression model:

$$ y_i = \beta^T x_i + \varepsilon_i, \qquad i = 1, \ldots, n, $$ (1)

where $x_i$ and $\varepsilon_i$ satisfy the following assumptions.

A1. $\varepsilon_i$ and $x_i$ are independent and identically distributed (i.i.d.) random variables and are mutually independent. Furthermore, $E(x_i) = 0$, $V_x = E(x_i x_i^T)$ is positive definite, and for some $\eta > 0$ we have $E[\|x\|^{4+\eta}] < \infty$.

A2. $\varepsilon_i$ has Lebesgue density $f(\varepsilon)$, which has support $\mathrm{supp}(f) = [\underline{a}, \bar{a}]$, where $\underline{a}$ and $\bar{a}$ are unknown boundary parameters that satisfy $-\infty < \underline{a} < \bar{a} < \infty$ and $f(\varepsilon) > 0$ on $(\underline{a}, \bar{a})$.

A3. The density function $f(\cdot)$ has uniformly bounded continuous derivatives up to order $r$, and $f^{(r)}(\varepsilon)$ is Lipschitz continuous on $(\underline{a}, \bar{a})$; i.e., there exists a constant $c$ such that for all $\varepsilon, \varepsilon^* \in (\underline{a}, \bar{a})$ we have $|f^{(r)}(\varepsilon) - f^{(r)}(\varepsilon^*)| \le c|\varepsilon - \varepsilon^*|$.

Because we do not impose any additional restrictions on the density function of $\varepsilon$, we cannot separately identify an intercept. Therefore, we shall absorb the intercept into the error density (which can have arbitrary mean) and assume for convenience that the regressors are mean zero in A1. Our other assumptions on the covariates are very weak.


In A2 we assume that $f(\varepsilon)$ has bounded support. Even though the second-order analysis of adaptive regression estimators can be extended to the case of unbounded support, our discussion in this paper is confined to the bounded support case, partly for simplification and partly for some technical reasons. We discuss this point further later (see Remark 5 in Section 3). When $f$ is strictly positive on $[\underline{a}, \bar{a}]$, the situation is nonregular. In some cases, this can lead to inconsistency of solutions of the likelihood score equations but perhaps to the potential for improved rates of convergence for other estimators. Therefore, we shall make an additional assumption.

A4. $f(\varepsilon)$ and its first $\nu - 1$ derivatives vanish at $\underline{a}$ and $\bar{a}$, whereas $f^{(\nu)}(\underline{a}) \ne 0$ and $f^{(\nu)}(\bar{a}) \ne 0$ for some integer $\nu$ with $2 \le \nu \le r$.

Assumption A4 guarantees that the density $f$ vanishes at the boundary at a sufficiently fast rate so that the properties of regular estimation hold. In this case, one cannot estimate $\beta$ at a rate better than root-$n$. See Akahira and Takeuchi (1995) for a discussion of this issue. This assumption also implies that the Fisher information

$$ I(f) = \int \ell'(\varepsilon)^2 f(\varepsilon)\, d\varepsilon, $$

where $\ell(\varepsilon) = \log f(\varepsilon)$, exists, as do various other integrals used subsequently.

In the sequel we shall let $\beta_0$ be the true parameter value. If the density $f$ were known, the MLE of $\beta_0$, denoted $\bar\beta$, could be obtained by setting the following average score function,

$$ s(\beta) = s(\beta; f) = \frac{1}{n}\sum_{i=1}^{n} s_i(\beta; f) = -\frac{1}{n}\sum_{i=1}^{n} \frac{f'(\varepsilon_i(\beta))}{f(\varepsilon_i(\beta))}\, x_i, $$ (2)

equal to zero, assuming an interior solution of course. Here, for any parameter value $\beta$, $\varepsilon_i(\beta) = y_i - \beta^T x_i$. This method works well in regular situations but can lead to inconsistent estimates in some cases of interest to us (for a discussion of this issue, see Bickel, 1975). An alternative method is given by taking one Newton-Raphson step from a preliminary root-$n$ consistent estimator $\tilde\beta$. That is, let

$$ \bar\beta_{NR} = \tilde\beta + \tilde{I}(\tilde\beta; f)^{-1} s(\tilde\beta; f), $$

where $\tilde{I}$ is a consistent estimate of the information matrix $I = V_x I(f)$. For example,

$$ \tilde{I}(\tilde\beta; f) = -\frac{1}{n}\sum_{i=1}^{n} x_i x_i^T \left[ \frac{f''(\varepsilon_i(\tilde\beta))}{f(\varepsilon_i(\tilde\beta))} - \frac{f'(\varepsilon_i(\tilde\beta))^2}{f(\varepsilon_i(\tilde\beta))^2} \right]. $$
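As a concrete illustration of the one-step construction above, the sketch below computes the average score (2), the estimated information, and one Newton-Raphson step from OLS. The logistic error density is a hypothetical stand-in for a known $f$ (the paper's regular case assumes bounded support); all variable names and tuning choices here are ours, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 2
beta0 = np.array([1.0, -0.5])
x = rng.standard_normal((n, p))
eps = rng.logistic(size=n)       # hypothetical known error density (logistic)
y = x @ beta0 + eps

def l1(e):
    # l'(e) = f'(e)/f(e) for the logistic density
    return -np.tanh(e / 2.0)

def l2(e):
    # l''(e) for the logistic density
    return -0.5 / np.cosh(e / 2.0) ** 2

beta_tilde, *_ = np.linalg.lstsq(x, y, rcond=None)   # preliminary root-n consistent OLS
e = y - x @ beta_tilde
s = -np.mean(x * l1(e)[:, None], axis=0)             # average score s(beta; f), eq. (2)
I_tilde = -(x.T @ (x * l2(e)[:, None])) / n          # estimated information matrix
beta_nr = beta_tilde + np.linalg.solve(I_tilde, s)   # one Newton-Raphson step
print(beta_nr)
```

Note that with a log-concave density such as the logistic, $-\ell''$ is strictly positive, so the estimated information matrix is positive definite whenever the design matrix has full rank.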


This method has been investigated in Rothenberg and Leenders (1964) and Bickel (1975). It is first-order equivalent to the MLE when the MLE is consistent, and it has the added advantage of working in certain nonregular cases where the MLE is inconsistent. In econometrics it is common to refer to the estimator as linearized maximum likelihood or two-step, whereas the statistical literature uses one-step. In the regression case we study, there are many preliminary root-$n$ consistent estimators: e.g., the ordinary least squares estimator.

When $f$ is unknown, we have to replace it by a nonparametric estimate $\tilde f$, say, and we thereby obtain the estimated average score function

$$ \bar{s}(\beta) = s(\beta; \tilde f) = \frac{1}{n}\sum_{i=1}^{n} \bar{s}_i(\beta; \tilde f) = -\frac{1}{n}\sum_{i=1}^{n} \frac{\tilde f'(\varepsilon_i(\beta))}{\tilde f(\varepsilon_i(\beta))}\, x_i. $$ (3)

The semiparametric profile likelihood estimator $\hat\beta_{PL}$ sets $\bar{s}(\beta)$ equal to zero. Similar to the case where $f$ is known, a one-step Newton-Raphson estimator of $\beta$ can be obtained from a preliminary root-$n$ consistent estimator $\tilde\beta$,

$$ \hat\beta_{NR} = \tilde\beta + \tilde{I}(\tilde\beta; \tilde f)^{-1} \bar{s}(\tilde\beta; \tilde f), $$ (4)

where

$$ \tilde{I}(\tilde\beta; \tilde f) = -\frac{1}{n}\sum_{i=1}^{n} x_i x_i^T \left[ \frac{\tilde f_i''(\varepsilon_i(\tilde\beta))}{\tilde f_i(\varepsilon_i(\tilde\beta))} - \frac{\tilde f_i'(\varepsilon_i(\tilde\beta))^2}{\tilde f_i(\varepsilon_i(\tilde\beta))^2} \right]. $$

We shall work with this one-step estimator.

An important ingredient in our estimator is the error density estimate $\tilde f$. We consider the following leave-one-out kernel estimates of $f(t)$ and $f'(t)$ at the point $t = \varepsilon_i(\beta)$ using the residuals $\varepsilon_j(\beta)$ as data:

$$ \tilde f_i(\varepsilon_i(\beta)) = \frac{1}{(n-1)h_n}\sum_{j \ne i} K\!\left(\frac{\varepsilon_i(\beta) - \varepsilon_j(\beta)}{h_n}\right) = \frac{1}{n-1}\sum_{j \ne i} K_{h_n}(\varepsilon_i(\beta) - \varepsilon_j(\beta)), $$

$$ \tilde f_i'(\varepsilon_i(\beta)) = \frac{1}{(n-1)h_n^2}\sum_{j \ne i} K'\!\left(\frac{\varepsilon_i(\beta) - \varepsilon_j(\beta)}{h_n}\right) = \frac{1}{n-1}\sum_{j \ne i} K'_{h_n}(\varepsilon_i(\beta) - \varepsilon_j(\beta)), $$

where $K_{h_n}(t) = K(t/h_n)/h_n$ and $K'_{h_n}(t) = K'(t/h_n)/h_n^2$. Here, $K(\cdot)$ is the kernel function whose properties are given in Assumption A5, which follows, whereas $h_n$ is the bandwidth parameter. In principle, we may consider more general devices that use different bandwidth parameters in the estimation of $f$ and $f'$. However, the additional smoothing parameter brings substantial complication to the higher order analysis, and we consider the simple case where the same $h_n$ is


used in estimating $f$ and $f'$ (also see the subsequent discussion on trimming). Asymptotic results for the bias and variance of these nonparametric density estimates are given in the Appendix. The estimator can be computed using only matrix computations, which makes it very fast.

As in some other applications of kernel regression estimators, the random denominator $\tilde f_i$ can be small and may cause technical difficulty. For this reason, we trim out small $\tilde f_i$ as do Bickel (1982) and Manski (1984) (for a more recent discussion, also see Ai, 1997). However, trimming brings an additional parameter into the estimation and complicates the higher order expansions. We consider the following smoothed trimming (Andrews, 1995; Ai, 1997). Let $g(\cdot)$ be a density function that has support $[0,1]$ with $g(0) = g(1) = 0$, and let

$$ g_b(x) = \frac{1}{b}\, g\!\left(\frac{x}{b} - 1\right), $$

where $b$ is the trimming parameter; then $g_b(x)$ has support $[b, 2b]$. Letting

$$ G_b(x) = \int_{-\infty}^{x} g_b(z)\, dz, $$

we have

$$ G_b(x) = \begin{cases} 0, & x < b, \\ \int_{-\infty}^{x} g_b(z)\, dz, & b \le x \le 2b, \\ 1, & x > 2b. \end{cases} $$

For example, if we consider the following Beta density

$$ g(z) = B(k+1)^{-1} z^k (1-z)^k, \qquad 0 \le z \le 1, $$

for some integer $k$, where $B(k)$ is the beta function defined by $B(k) = \Gamma(k)^2/\Gamma(2k)$ and $\Gamma(k)$ is the Euler gamma function, then it can be verified that for $b \le x \le 2b$

$$ G_b(x) = B(k+1)^{-1} \left\{ \frac{(k!)^2}{(2k+1)!} - \sum_{l=0}^{k} \frac{(k!)^2}{(k-l)!\,(k+l+1)!} \left(1 - \frac{x-b}{b}\right)^{k+l+1} \left(\frac{x-b}{b}\right)^{k-l} \right\}, $$ (5)

which is a $(2k+1)$th order polynomial in $(x-b)/b$. The function $G_b(x)$ is continuously differentiable. This property is important because it allows


us to use standard Taylor series arguments, whereas indicator function trimming would preclude this. We now estimate the average score function (3) by

$$ \bar{s}_n(\tilde\beta) = -\frac{1}{n}\sum_{i=1}^{n} \frac{\tilde f_i'}{\tilde f_i}\, x_i\, G_b(\tilde f_i), $$ (6)

and the information matrix by

$$ \tilde{I}_n(\tilde\beta; \tilde f) = -\frac{1}{n}\sum_{i=1}^{n} x_i x_i^T \left[ \frac{\tilde f_i''}{\tilde f_i} - \left(\frac{\tilde f_i'}{\tilde f_i}\right)^2 \right] G_b(\tilde f_i), $$

where $\tilde f_i = \tilde f_i(\varepsilon_i(\tilde\beta))$ and $\tilde f_i' = \tilde f_i'(\varepsilon_i(\tilde\beta))$. Thus we estimate $\beta$ by the following one-step Newton-Raphson estimator:

$$ \hat\beta = \tilde\beta + \tilde{I}_n(\tilde\beta; \tilde f)^{-1} \bar{s}_n(\tilde\beta; \tilde f). $$ (7)
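The pieces above (leave-one-out kernel estimates, smooth trimming, one-step update) can be assembled end to end. The sketch below is our own minimal implementation, not the authors' Gauss code: it uses the quartic kernel later adopted in Section 6, the $k = 1$ Beta-type trimming from (5), and illustrative values of $h$ and $b$ that we picked ourselves.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 2
beta0 = np.array([1.0, -0.5])
x = rng.standard_normal((n, p))
eps = rng.beta(4.0, 4.0, size=n) - 0.5     # bounded support, density vanishing at the boundary
y = x @ beta0 + eps

# quartic kernel on [-1, 1] and its first two derivatives
K0 = lambda u: np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0)
K1 = lambda u: np.where(np.abs(u) <= 1, -15 / 4 * u * (1 - u**2), 0.0)
K2 = lambda u: np.where(np.abs(u) <= 1, -15 / 4 * (1 - 3 * u**2), 0.0)

def G_b(f, b):
    # smooth trimming built from the Beta density (5) with k = 1: 0 below b, 1 above 2b
    z = np.clip((f - b) / b, 0.0, 1.0)
    return 3 * z**2 - 2 * z**3

h, b = 0.15, 0.05                          # illustrative bandwidth and trimming parameter
beta_tilde, *_ = np.linalg.lstsq(x, y, rcond=None)   # preliminary OLS estimator
e = y - x @ beta_tilde
D = (e[:, None] - e[None, :]) / h          # (eps_i(beta) - eps_j(beta)) / h_n
np.fill_diagonal(D, np.inf)                # leave-one-out: exclude j = i
f_t = K0(D).sum(axis=1) / ((n - 1) * h)    # leave-one-out density estimate f~_i
f1_t = K1(D).sum(axis=1) / ((n - 1) * h**2)
f2_t = K2(D).sum(axis=1) / ((n - 1) * h**3)

fd = np.maximum(f_t, 1e-12)                # numerical guard; trimming kills small densities anyway
G = G_b(f_t, b)
score = -np.mean(x * (f1_t / fd * G)[:, None], axis=0)                  # trimmed score, eq. (6)
I_n = -(x.T @ (x * ((f2_t / fd - (f1_t / fd) ** 2) * G)[:, None])) / n  # trimmed information
beta_hat = beta_tilde + np.linalg.solve(I_n, score)                     # one-step estimator, eq. (7)
```

The $O(n^2)$ pairwise-difference matrix reflects the remark that the estimator needs only matrix computations.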

We study the higher order properties of the adaptive estimator $\hat\beta$ given in (7). We make the following assumptions on the kernel function $K(\cdot)$ and the trimming parameter $b$.

A5. The kernel $K$ has support $[-1,1]$, is symmetric about zero, and satisfies $\int K(u)\, du = 1$. It is twice differentiable on its support and $K''$ is Lipschitz continuous, whereas $K'(0) = 0$. Furthermore, there exists an even positive integer $q$ with $2 < q \le r - 3$ such that

$$ \int u^j K(u)\, du = 0, \quad j = 1, \ldots, q-1, \qquad \text{and} \qquad \int u^q K(u)\, du \ne 0. $$

A6. The trimming function $G_b(x)$ is $(L+1)$th order differentiable for some $L > 4$. In addition, $h \to 0$ and $nh^5/\log n \to \infty$, $b \to 0$, and $h/b \to 0$ as $n \to \infty$.

These assumptions are similar to those used in the existing literature. Note that because $b$ is of larger magnitude than $h$, our estimator will not suffer from boundary bias (for a discussion of boundary issues, see Müller, 1988, pp. 32-36).¹

Define for any function $K$ and integer $q$

$$ \mu_q(K) = \frac{(-1)^q}{q!}\int u^q K(u)\, du; \qquad \|K\|_2 = \left\{\int |K(u)|^2\, du\right\}^{1/2}. $$
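For the quartic kernel used in the Monte Carlo section, these constants can be computed exactly. Strictly speaking that kernel is second order ($q = 2$), which sits at the boundary of A5, but it is the kernel the paper itself uses in Section 6, so it makes a natural worked example (symbolic check, our own):

```python
import sympy as sp

u = sp.symbols('u')
K = sp.Rational(15, 16) * (1 - u**2) ** 2      # quartic kernel from Section 6 on [-1, 1]

total = sp.integrate(K, (u, -1, 1))            # integrates to one
m1 = sp.integrate(u * K, (u, -1, 1))           # vanishes by symmetry
m2 = sp.integrate(u**2 * K, (u, -1, 1))        # first nonzero moment, so q = 2
mu_2 = (-1) ** 2 * m2 / sp.factorial(2)        # mu_q(K) = ((-1)^q / q!) * int u^q K(u) du
K1_sq = sp.integrate(sp.diff(K, u) ** 2, (u, -1, 1))   # ||K'||_2^2, used in Theorem 1(1b)
print(total, mu_2, K1_sq)                      # 1, 1/14, 15/7
```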

This notation will be used in the following sections for the higher order asymptotics.

3. THE EXPANSION

Making a Taylor series expansion of $\bar{s}_n(\tilde\beta)$ about $\bar{s}_n(\beta_0)$ and collecting terms, we obtain


$$ \sqrt{n}(\hat\beta - \beta_0) = \tilde{I}_n(\beta_0)^{-1}\sqrt{n}\,\bar{s}_n(\beta_0) + \{\tilde{I}_n(\tilde\beta)^{-1} - \tilde{I}_n(\beta_0)^{-1}\}\sqrt{n}\,\bar{s}_n(\beta_0) + \{I + \tilde{I}_n(\tilde\beta)^{-1}\bar{s}_n'(\beta^*)\}\sqrt{n}(\tilde\beta - \beta_0), $$ (8)

where $\beta^*$ is an intermediate point, $\bar{s}_n(\beta) = \bar{s}_n(\beta, \tilde f)$, and $\tilde{I}_n(\beta) = \tilde{I}_n(\beta, \tilde f)$, whereas $\bar{s}_n'(\beta) = \partial \bar{s}_n(\beta)/\partial\beta$. A first-order analysis shows that $\sqrt{n}\,\bar{s}_n(\beta_0) = O_p(1)$ and $\tilde{I}_n(\beta_0)^{-1} = O_p(1)$. Note that in the parametric case, both terms in (8) would be $O_p(n^{-1/2})$. In our case, this is true apart from some "trimming terms," which turn out to be of smaller order than our leading trimming terms (see the discussion that follows). Specifically, we obtain in the Appendix that

$$ \sqrt{n}(\hat\beta - \beta_0) = \tilde{I}_n(\beta_0)^{-1}\sqrt{n}\,\bar{s}_n(\beta_0) + \bar{T} + O_p(n^{-1/2}), $$ (9)

where $\bar{T}$ is a small trimming term. We next derive an approximation to $\tilde{I}_n(\beta_0)^{-1}\sqrt{n}\,\bar{s}_n(\beta_0)$. The random variables $\Delta_H = \tilde{I}_n(\beta_0) - I$ and $\Delta_s = \sqrt{n}\{\bar{s}_n(\beta_0) - s(\beta_0)\}$, which are functions of nonparametric estimates of the residual densities and their derivatives, are now the two key elements in the expansion. Both quantities can be decomposed into the sum of different terms that are functions of the bias and variance effects in estimating the density $f$ and its derivatives. We show in the Appendix that

$$ \Delta_s = T_s + O_p(h^q) + O_p\!\left(\frac{1}{\sqrt{n h_n^3}}\right), $$ (10)

$$ \Delta_H = T_H + O_p(h^q) + O_p\!\left(\frac{1}{\sqrt{n^2 h_n^5}}\right) + O_p(n^{-1/2}), $$ (11)

where $T_s$ and $T_H$ are $o_p(1)$ trimming effects (and note that $\bar{T}$ from (9) is of smaller order than $T_s$ and $T_H$). These terms depend on the parameter $b$ and on the boundary behavior of the densities $f, g$. By a geometric series expansion of $\tilde{I}_n(\beta_0)^{-1}$ about $I^{-1}$ we obtain

$$ \tilde{I}_n(\beta_0)^{-1} = I_n^{-1} - I_n^{-1}\Delta_H I_n^{-1} + o_p(\delta_n) = I^{-1} - I^{-1}\Delta_H I^{-1} + o_p(\delta_n), $$

where $\delta_n = \max\{h_n^q, 1/\sqrt{n h_n^3}\}$ is larger than $n^{-1/2}$. Here, $I_n = -s'(\beta_0)$, and by the central limit theorem for independent random variables we obtain that $I_n = I + O_p(n^{-1/2})$. This yields that

$$ \sqrt{n}(\hat\beta - \beta_0) = I^{-1}\sqrt{n}\,s(\beta_0) + I^{-1}\Delta_H I^{-1}\sqrt{n}\,s(\beta_0) + I^{-1}\Delta_s + o_p(\delta_n). $$ (12)

We then obtain the following stochastic expansion of the standardized estimator:

$$ \sqrt{n}(\hat\beta - \beta_0) = X_0 + T + h_n^q B + \frac{1}{\sqrt{n h_n^3}}\, V + o_p(\delta_n), $$ (13)

where $X_0$, $B$, and $V$ are zero mean $O_p(1)$ quantities and $T$ is an $o_p(1)$ quantity. These quantities are defined in the Appendix. Here, $X_0 = I^{-1}\sqrt{n}\,s(\beta_0)$ is the leading term, $T$ is the trimming effect, $B$ reflects the bias effect in nonparametric density estimation, and $V$ reflects the variance effect in the nonparametric estimation. The random variables $X_0$, $T$, and $B$ are all mean zero and sums of i.i.d. random variables, whereas $V$ is a degenerate U-statistic. Note that $\sqrt{n}(\bar\beta - \beta_0) = X_0 + O_p(n^{-1/2})$, where $\bar\beta$ is the infeasible estimator of $\beta$, so that

$$ \sqrt{n}(\hat\beta - \bar\beta) = T + h_n^q B + \frac{V}{\sqrt{n h_n^3}} + o_p(\delta_n). $$

The trimming effect $T$ is an $o_p(1)$ quantity whose magnitude is determined jointly by the trimming parameter $b$ and the rate at which the density $f$ approaches zero at the boundary, but it does not to first order depend on the bandwidth parameter $h_n$ used in the nonparametric density estimation. We are now ready to state the main result of the paper.

THEOREM 1. Suppose that Assumptions A1-A6 hold and denote $t = T + h_n^q B + V/\sqrt{n h_n^3}$. We have the following results.

(1a) If $n h^{2q+3} \to \infty$, then $h^{-q}(t - T) \Rightarrow N(0, \Sigma_1)$, where

$$ \Sigma_1 = \mu_q^2(K)\,[\,I^{-1}M_1 I^{-1} + I^{-1}M_3 I^{-1}M_3 I^{-1} + 2 I^{-1}M_3 I^{-1}M_2 I^{-1}\,] $$

with $I = V_x I(f)$ and $M_1 = V_x\,\mathrm{var}[D_q \ell^{(1)}(\varepsilon)]$, $M_2 = V_x\,\mathrm{cov}[D_q \ell^{(1)}(\varepsilon), \ell^{(1)}(\varepsilon)]$, $M_3 = -V_x E[D_q \ell^{(2)}(\varepsilon)]$.

(1b) If $n h^{2q+3} \to 0$, then $n^{1/2} h^{3/2}(t - T) \Rightarrow N(0, \Sigma_2)$, where $\Sigma_2 = \|K'\|_2^2\, I^{-1} S_1 I^{-1}$ with $S_1 = V_x(\bar{a} - \underline{a})$.

(1c) If $h_n = \gamma n^{-1/(2q+3)}$ for some $\gamma$ with $0 < \gamma < \infty$, then $n^{q/(2q+3)}(t - T) \Rightarrow N(0, \Sigma)$, where $\Sigma = \gamma^{2q}\Sigma_1 + \gamma^{-3}\Sigma_2$.

(2) Finally,

$$ b^{-(\nu-1)/2\nu}\, T \Rightarrow N(0, \Sigma_u), $$ (14)

where $\Sigma_u = \omega(\nu, \underline{a}, \bar{a}, f)\, V_x$ and $\omega(\nu, \underline{a}, \bar{a}, f)$ depends on both the trimming function and the boundary behavior of the density. In particular, if we use the trimming function (5),

$$ \omega(\nu, \underline{a}, \bar{a}, f) = c(\nu)\,[\,f^{(\nu)}(\underline{a})^{1/\nu} + f^{(\nu)}(\bar{a})^{1/\nu}\,], $$

where $c(\nu)$ is a coefficient depending on $\nu$ and $G$.
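A quick symbolic sanity check of the balancing rate in part (1c): with $h_n = \gamma n^{-1/(2q+3)}$, the bias order $h_n^q$ and the variance order $1/\sqrt{n h_n^3}$ coincide (both are $n^{-q/(2q+3)}$), which is why both $\Sigma_1$ and $\Sigma_2$ survive in $\Sigma$. A sketch of the check, with $\gamma$ set to 1:

```python
import sympy as sp

q, n = sp.symbols('q n', positive=True)

h = n ** (-1 / (2 * q + 3))          # h_n = n^(-1/(2q+3)), taking gamma = 1
bias_order = h ** q                  # order of the bias term h_n^q B
var_order = 1 / sp.sqrt(n * h ** 3)  # order of the variance term V / sqrt(n h_n^3)

ratio = sp.simplify(bias_order / var_order)
print(ratio)                         # 1: the two orders balance exactly
```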


Remarks.

1. To minimize the "smoothing effect," we should set $h_n = \gamma n^{-1/(2q+3)}$ so that the second-order bias and variance effects are balanced, i.e., the "in probability" magnitude of $t - T$ is minimized. Note that

$$ D_q \ell^{(1)} = \frac{f^{(q+1)}}{f} - \frac{f^{(q)} f^{(1)}}{f^2}, $$

$$ D_q \ell^{(2)} = \frac{f^{(q+2)}}{f} - \frac{f^{(q)} f^{(2)}}{f^2} - 2\,\frac{f^{(1)}}{f}\left[\frac{f^{(q+1)}}{f} - \frac{f^{(q)} f^{(1)}}{f^2}\right]. $$

The terms $M_j$, $j = 1, 2, 3$, arise from the bias of the nonparametric estimates $\tilde f$ and $\tilde f'$, whereas the term $S_1$ comes from the variance of $\tilde f'$. Both terms are positive, and the overall effect is to increase variance above the first-order limiting variance of $\hat\beta$.

2. The magnitude of the variance of the trimming effect is $O(b^{(\nu-1)/\nu})$, which increases with $b$ and which is of larger order than $h_n^{2q}$ under our assumptions. The limiting variance of the trimming effect is given by (14); this depends on both the trimming function and the boundary behavior of the density function in a complicated manner. Nevertheless, the limiting variance of the trimming effect can be consistently estimated without knowledge of the parameter $\nu$; specifically, $b^{(\nu-1)/\nu}\omega$ can be estimated by

$$ \frac{1}{n}\sum_{i=1}^{n} \left[\frac{\tilde f'(\tilde\varepsilon_i)}{\tilde f(\tilde\varepsilon_i)}\right]^2 [\,1 - G_b(\tilde f_i)\,]^2. $$ (15)

3. When $f$ is strictly positive on $[\underline{a}, \bar{a}]$, the situation is nonregular, and there is the potential for an improved rate of convergence by other estimators. In this case, the two-step estimator $\hat\beta$ may not necessarily have adaptive properties, at least when $\underline{a} > -\infty$ or $\bar{a} < \infty$, because it has too slow a rate of convergence; specifically, when $f$ is known it is possible to obtain estimates with a faster rate of convergence. However, $\hat\beta$ is consistent and asymptotically normal under our conditions in this case. Of course, trimming is no longer needed, and the untrimmed estimator then has the stochastic expansion $X_0 + h_n^q B + V/\sqrt{n h_n^3} + o_p(\delta_n)$.

4. In the regular case, the two-step estimator $\hat\beta$ has exactly the same second-order effect as the profile likelihood estimator (for a similar result, see Linton, 1998).

5. Finally, we consider what happens when the error support is unbounded. As indicated in the analysis in the Appendix, in this case the second-order variance effect involves terms such as $n^{-1}\sum_{i=1}^{n} 1/f(\varepsilon_i)$ (which is related to the $S_1$ term defined previously) that do not satisfy a law of large numbers if $f$ has unbounded support like the Gaussian distribution. In fact, this random sequence grows to infinity in probability at a rate determined by the tails of the distribution. In the Gaussian case, the rate is logarithmic. Thus the order in probability of the second-order terms will be larger and will depend on the tails of the distribution. Also, whether a central limit theorem for these terms operates remains to be seen.
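The two expressions in Remark 1 follow mechanically from the definition of the operator $D_q$ in Section 1, applied to $\ell^{(1)} = f^{(1)}/f$ and $\ell^{(2)} = f^{(2)}/f - (f^{(1)}/f)^2$ viewed as functions of $(f, f^{(1)}, f^{(2)})$, whose $q$th derivatives are $(f^{(q)}, f^{(q+1)}, f^{(q+2)})$. A symbolic check (our own; the symbols stand for the density and its derivatives evaluated at a point):

```python
import sympy as sp

f, f1, f2, fq, fq1, fq2 = sp.symbols('f f_1 f_2 f_q f_q1 f_q2', positive=True)

l1 = f1 / f                     # ell^(1) as a function of (f, f^(1))
l2 = f2 / f - (f1 / f) ** 2     # ell^(2) as a function of (f, f^(1), f^(2))

# D_q g = sum_j (dg/da_j) * a_j^(q), with a_j^(q) = f^(q), f^(q+1), f^(q+2)
Dq_l1 = sp.diff(l1, f) * fq + sp.diff(l1, f1) * fq1
Dq_l2 = sp.diff(l2, f) * fq + sp.diff(l2, f1) * fq1 + sp.diff(l2, f2) * fq2

# the displayed formulas in Remark 1
rhs1 = fq1 / f - fq * f1 / f**2
rhs2 = fq2 / f - fq * f2 / f**2 - 2 * (f1 / f) * (fq1 / f - fq * f1 / f**2)

assert sp.simplify(Dq_l1 - rhs1) == 0
assert sp.simplify(Dq_l2 - rhs2) == 0
```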


4. EXTENSIONS

4.1. t-Statistic

In this section we derive the second-order expansions for t-ratio statistics. Consider the linear hypothesis $H_0: c^T\beta = c_0$, where $c$ is a $p \times 1$ vector of constants and $c_0$ is a scalar. The corresponding t-statistic is

$$ \hat{t} = \frac{c^T\hat\beta - c_0}{\mathrm{se}(c^T\hat\beta)} = \frac{c^T\hat\beta - c_0}{\sqrt{n^{-1}\, c^T \tilde{I}_n(\tilde\beta)^{-1} c}}. $$

Under the null hypothesis that $c^T\beta = c_0$, $\hat{t}$ is asymptotically standard normal and first-order equivalent to the corresponding (infeasible) MLE-based t-ratio

$$ \bar{t} = \frac{c^T\sqrt{n}(\bar\beta - \beta)}{\sqrt{c^T \bar{I}^{-1} c}}, $$

where $\bar{I} = -s'(\bar\beta)$. Under our conditions,

$$ [c^T \tilde{I}_n(\tilde\beta)^{-1} c]^{-1/2} = [c^T I^{-1} c]^{-1/2} - \frac{1}{2}\,[c^T I^{-1} c]^{-3/2}\, c^T I^{-1}\Delta_H I^{-1} c + \text{higher order terms}, $$ (16)

where $\Delta_H$ is defined by (11). Denote the second-order effect as $t_t = \hat{t} - \bar{t}$ and the $o_p(1)$ trimming effect as $t_r$. We have the following result.

THEOREM 2. Suppose that Assumptions A1-A6 hold and that $h_n = \gamma n^{-1/(2q+3)}$ for some $\gamma$ with $0 < \gamma < \infty$. Then, under $H_0$, $n^{q/(2q+3)}(t_t - t_r) \Rightarrow N(0, \sigma_t^2)$, where

$$ \sigma_t^2 = \frac{c^T \Sigma c}{c^T I^{-1} c} - \gamma^{2q}\mu_q^2(K) \times \left\{ \frac{c^T I^{-1} M_2 I^{-1} c \cdot c^T I^{-1} M_3 I^{-1} c}{(c^T I^{-1} c)^2} + \frac{3}{4}\left(\frac{c^T I^{-1} M_3 I^{-1} c}{c^T I^{-1} c}\right)^2 \right\}. $$
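To make the t-ratio concrete, the sketch below computes $\hat{t}$ for $H_0: c^T\beta = c_0$ from a one-step estimator. To keep the example short we use a hypothetical known logistic error density in place of the kernel estimate; $c$, $c_0$, and all tuning choices are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 2
beta0 = np.array([1.0, -0.5])
x = rng.standard_normal((n, p))
eps = rng.logistic(size=n)                  # hypothetical known error density
y = x @ beta0 + eps

# one-step estimator with known logistic density, as in Section 2
l1 = lambda e: -np.tanh(e / 2.0)            # l'(e) for the logistic density
l2 = lambda e: -0.5 / np.cosh(e / 2.0) ** 2 # l''(e)
beta_t, *_ = np.linalg.lstsq(x, y, rcond=None)
e = y - x @ beta_t
I_t = -(x.T @ (x * l2(e)[:, None])) / n     # estimated information matrix
s = -np.mean(x * l1(e)[:, None], axis=0)
beta_hat = beta_t + np.linalg.solve(I_t, s)

# t-statistic for H0: c^T beta = c0, here testing the (true) first coefficient
c, c0 = np.array([1.0, 0.0]), 1.0
t_stat = (c @ beta_hat - c0) / np.sqrt(c @ np.linalg.solve(I_t, c) / n)
print(t_stat)
```

Under the null, `t_stat` is asymptotically standard normal, as the theorem's first-order statement requires.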

The rate of convergence for $t_t - t_r$ is the same as for $t - T$, but the asymptotic variance is slightly different, reflecting the estimation of the asymptotic variance of $\hat\beta$. The trimming terms are similar to those in Theorem 1.

4.2. Time Series Regressors

In this section, we extend our second-order analysis to more general models where the regressors contain lagged disturbances and thus are serially correlated. In particular, we consider the case where the regressor $x_i = (x_{i1}, \ldots, x_{ip})^T$ satisfies Assumption A1′.


A1′. The stochastic process $\{x_i\}$ satisfies

$$ x_i = x_i^* + \sum_{k=1}^{\infty} C_k \varepsilon_{i-k}, \qquad \text{where } C_k = (c_{k1}, \ldots, c_{kp})^T, $$ (17)

where $x_i^*$ and $\varepsilon_i$ are i.i.d. random variables and are mutually independent. Furthermore, there exists a $\rho$ with $0 < \rho < 1$ such that $|c_{kj}| < \rho^k$ for all $j, k$. We require also that $E(x_i) = 0$, that $V_x = E(x_i x_i^T)$ is positive definite, and that for some $\eta > 0$ we have $E[\|x\|^{4+\eta}] < \infty$.

This setting is general enough to include leading cases in time series models such as, say, stationary ARMA time series regression models. For example, we consider the case of a first-order univariate autoregressive regression described as follows:

$$ y_t = \beta y_{t-1} + \varepsilon_t, \qquad t = 0, 1, \ldots, n, $$ (18)

where $|\beta| < 1$ and $\{\varepsilon_i\}$ are i.i.d. random variables with mean zero and finite variance $\sigma_\varepsilon^2$ that satisfy Assumptions A1-A3 in Section 2. Then regression (18) corresponds to the special case of models (1) and (17) with $x_i^* = 0$ and $C_k = \beta^{k-1}$.

The semiparametric adaptive estimator $\hat\beta$ is still consistent and asymptotically normal for this specification of the covariate process. A similar expansion for $\hat\beta$ can be performed. The following theorem summarizes the higher order effects.

THEOREM 3. Suppose that Assumptions A1′ and A2-A6 hold and that $h_n = \gamma n^{-1/(2q+3)}$ for some $\gamma$ with $0 < \gamma < \infty$. Then, $n^{q/(2q+3)}(t - T) \Rightarrow N(0, \Sigma^*)$, where

$$ \Sigma^* = \gamma^{2q}\mu_q^2(K)\,[\,I^{-1}(M_1 + \Gamma)I^{-1} + I^{-1}M_3 I^{-1}M_3 I^{-1} + 2 I^{-1}M_3 I^{-1}M_2 I^{-1}\,] + \gamma^{-3}\|K'\|_2^2\, I^{-1}S_1 I^{-1}, $$

where $I, M_1, M_2, M_3$ are defined as in Theorem 1 with $V_x = E(x_i^* x_i^{*T}) + \sigma_\varepsilon^2\left(\sum_{k=1}^{\infty} C_k C_k^T\right)$ and

$$ \Gamma = \sigma_\varepsilon^2 \left(\sum_{i=0}^{\infty}\sum_{j=1}^{\infty} C_i C_{i+j}^T\right) E^2[D_q \ell^{(1)}]. $$

Remarks.

1. The second-order effects in Theorem 3 are similar to those in Theorem 1, but in model (17) the serial correlation in the regressors brings additional terms into the second-order effect; these additional terms are summarized in $\Gamma$. They arise from autocorrelations within some of the "bias related" terms. When $\varepsilon$ is symmetric about zero, $\Gamma = 0$, because $D_q \ell^{(1)}$ is an odd function for $q$ an even integer.

2. In the special case that $x_i^* = 0$ and $C_k = \beta^{k-1}$, $|\beta| < 1$, we reduce to the case of an autoregression of order one, i.e., $x_t = y_{t-1}$, or

$$ y_t = \beta y_{t-1} + \varepsilon_t, \qquad t = 0, 1, \ldots, n, $$

and $I, M_1, M_2, M_3$ are defined as in Theorem 1 with $V_x = \sigma_\varepsilon^2/(1 - \beta^2)$, whereas

$$ \Gamma = \sigma_\varepsilon^2\, \frac{2\beta}{1-\beta}\, E^2[D_q \ell^{(1)}(\varepsilon)]. $$
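For the AR(1) special case in Remark 2, the stationary regressor variance $V_x = \sigma_\varepsilon^2/(1-\beta^2)$ is easy to confirm by simulation (Gaussian errors here are purely illustrative and are not required by the theorem):

```python
import numpy as np

rng = np.random.default_rng(2)
beta, sigma, n = 0.6, 1.0, 200_000
eps = rng.normal(0.0, sigma, size=n)
y = np.zeros(n)
for t in range(1, n):               # y_t = beta * y_{t-1} + eps_t, eq. (18)
    y[t] = beta * y[t - 1] + eps[t]

V_x_hat = np.mean(y[:-1] ** 2)      # sample second moment of the regressor x_t = y_{t-1}
V_x = sigma**2 / (1 - beta**2)      # stationary variance: 1.5625
print(V_x_hat, V_x)
```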

5. BANDWIDTH SELECTION

The results in the previous sections can be used to select the bandwidth parameter $h_n$ for the semiparametric estimator $\hat\beta$ and the t-ratio. Here, we just consider the estimator in the i.i.d. setting, although similar comments apply to the test statistic and to the dependent data design. The higher order effects generally depend on the bandwidth parameters and the trimming procedure. However, although in principle joint optimization over the trimming and bandwidth parameters may be considered, the analysis would be substantially more complicated, not least because there is only a lower bound on $b$. In this paper, we confine our attention to the effect of bandwidth and keep the choice of trimming parameter fixed. Our analysis is not the best over all possibilities; however, it provides a second best choice, and our analysis shows how the estimator is affected by these parameters. We shall try to minimize the second-order term $t - T = h_n^q B + n^{-1/2} h_n^{-3/2} V$, which is mean zero and has asymptotic variance

$$ \Sigma(h_n) = h_n^{2q}\mu_q^2(K)\,[\,I^{-1}M_1 I^{-1} + I^{-1}M_3 I^{-1}M_3 I^{-1} + 2 I^{-1}M_3 I^{-1}M_2 I^{-1}\,] + \frac{1}{n h_n^3}\,\|K'\|_2^2\, I^{-1}S_1 I^{-1} \equiv h_n^{2q} Q_1 + \frac{1}{n h_n^3}\, Q_2. $$

Specifically, we define an optimal bandwidth as one that minimizes some scalar-valued convex loss function defined on the second-order mean square error matrix $\Sigma(h_n)$. If the loss function is denoted $l(\Sigma)$, then, by Taylor expansion, we obtain the following optimal bandwidth formula:

$$ h_n^{opt} = \left[\frac{3\, s_l^T\, \mathrm{vec}\{Q_2\}}{2q\, s_l^T\, \mathrm{vec}\{Q_1\}}\right]^{1/(2q+3)} n^{-1/(2q+3)}, $$

(19)


where sl 5 ]l~0!0]vecS+ Replacing the unknown quantities sl, Q 1 , and Q 2 in the bandwidth formula by their estimates Zsl, QZ 1 , QZ 2 , we obtain a feasible optimal bandwidth choice+ One way of estimating the optimal bandwidth parameter is the plug-in method+ We consider the following rule-of-thumb method for bandwidth selection as in Silverman ~1986! and Andrews ~1991!+ We specify a parametric model for the error structure $ fp ~{; u!, u [ Q%, and estimators of these parameters, denoted u,Z are used to obtain preliminary estimates of the density functions fp ~{; u! Z and Z These preliminary estimates are then plugged into their derivatives fp~ j ! ~{; u!+ the formulae of I, M1 , M2 , M3 , and S1 to get estimates of them+ Let IZ 5

1 n

1 MZ 1 5 n 2 MZ 2 5 MZ 3 5

SZ 1 5

n

( x i x iT

i51 n

(

x i x iT

i51

2 n

fp ~ «[ i ; u! Z

,

fp~q11! ~ «[ i ; u! Z

i51

i51

1 n

( x i x iT i51

1

1 n

( x i x iT n

n

1

fp' ~ «[ i ; u! Z fp~q! ~ «[ i ; u! Z fp ~ «[ i ; u! Z 2

GJ 2

fp' ~ «[ i ; u! Z fp~q! ~ «[ i ; u! Z fp~q11! ~ «[ i ; u! Z

HF HF HF

fp ~ «[ i ; u! Z 3

fp' ~ «[ i ; u! Z fp~q11! ~ «[ i ; u! Z fp ~ «[ i ; u! Z 2

fp~q12! ~ «[ i ; u! Z fp ~ «[ i ; u! Z

( x i x iT

i51

2

fp ~ «[ i ; u! Z

( x i x iT

n

n

2

n

1 n

1 n

F G HF G F fp' ~ «[ i ; u! Z

GF 2

,

GF 2

fp ~ «[ i ; u! Z 3

2fp' ~ «[ i ; u! Z fp~q11! ~ «[ i ; u! Z

2fp' ~ «[ i ; u! Z 2 fp~q! ~ «[ i ; u! Z 2 fp ~ «[ i ; u! Z 3

fp' ~ «[ i ; u! Z 2 fp~q! ~ «[ i ; u! Z

fp ~ «[ i ; u! Z 2

GF 2

GJ

GJ

fp'' ~ «[ i ; u! Z fp~q! ~ «[ i ; u! Z fp ~ «[ i ; u! Z 2

,

GJ

,

1

( x i x iT fp ~ «[ i ; u!Z , i51

where «[ i 5 yi 2 bZ T x i + Then let Q 1 and Q 2 be estimated by plugging I,Z MZ j, and SZ 1 into the corresponding formulae+ When the parametric specification is correct, these estimates are consistent+ More generally, they will be not far from the truth+ Plugging them into formula ~19!, we get an estimate of the optimal bandwidth+ Under further conditions, this data-based method is secondorder efficient in the sense that the corresponding effect thZ 2 T has the same asymptotic distribution as thopt 2 T+ See Linton ~1998! for a similar result+ We now discuss further the choice of trimming parameter+ Suppose we take b 5 h n12h for some h . 0+ Then under Assumptions A1–A6 with q . 2, we obtain the following expansion:

  √n(β̂ − β₀) = X_0 + h_n^{(1−η)(γ−1)/(2γ)} T_0 + h_n^q B + (1/√(n h_n³)) V + o_p(δ_n),

998

OLIVER LINTON AND ZHIJIE XIAO

where T_0, B, and V are all O_p(1). The bias-related term h_n^q B is of smaller order than the trimming term, and the optimal choice of h_n will now trade off h_n^{(1−η)(γ−1)/(2γ)} T_0 against V/√(n h_n³). The optimal choice of h_n would be

  h* = g(γ, a, ā, f) n^{−γ/(4γ−(γ−1)η−1)},

whose rate depends on η and γ, the first of which is arbitrary and the second of which is unknown. Note also that this bandwidth may not satisfy the restrictions in A6 for some values of γ. Therefore, this method is not appealing. If the trimming term is of some concern, one can estimate it using (15) and correct the standard errors accordingly.

6. MONTE CARLO RESULTS

We conducted a Monte Carlo experiment to evaluate the second-order theory of the semiparametric adaptive regression estimators. We show by simulation how the semiparametric estimators are affected by the choices of smoothing parameters in finite samples. We evaluate the effect of a bandwidth selection criterion that minimizes the second-order mean squared error and the sampling performance of estimators that use different bandwidth choices. The model used for data generation was the following:

  y_i = β x_i + ε_i,  (20)

with β = 1 and x_i i.i.d. standard normal variates. Two different specifications of ε_i were considered. In the first case, the ε_i are i.i.d. t-distributed with 5 degrees of freedom and truncated at ±10. The second case considers centered i.i.d. Beta(4,4) variates, whose probability density vanishes on the boundary and is thus consistent with the requirements for regular estimation. See Devroye (1995) for a discussion of generating Beta random variables. These two specifications of the residuals are denoted DGP(1) and DGP(2) in our analysis. The second-order effects for these examples are also calculated; readers are referred to an early version of this paper (Linton and Xiao, 1998) for the formulae. Two sample sizes are tried, n = 100, 200. In our experiment, x_i and ε_i are independent of each other, and the number of replications is 500 in each case.

The sampling performances of both the ordinary least squares (OLS) estimator and the semiparametric adaptive estimators were examined for each case. For the adaptive estimator, the following kernel function was used in the semiparametric estimation: K(u) = 15(1 − u²)² 1(|u| ≤ 1)/16. For purposes of comparison, we also considered the MLE, which uses knowledge of the density function. In particular, we calculated the two-step Newton–Raphson estimator from the OLS preliminary estimator.

Because we are especially interested in the effect of smoothing parameters on the finite sample performance of the adaptive estimators, different choices of bandwidth parameters were considered and compared. We examined the properties of the adaptive estimators with optimal bandwidth selection and several fixed bandwidth choices. Different trimming parameter values were used in the Monte Carlo experiment, and the effects of the trimming parameter value on the sample performance of these estimators were also examined.

Denoting the jth replication of estimator β as β(j), we calculated in each case the (average) bias (R⁻¹ Σ_{j=1}^R β(j) − 1), the median bias (the median of β minus 1), the variance (R⁻¹ Σ_{j=1}^R (β(j) − β̄)²), the mean squared error (R⁻¹ Σ_{j=1}^R (β(j) − 1)²), and the interquartile range (IQR = the 75% quantile minus the 25% quantile), where R is the number of replications.

Table 1 provides the simulation results for the nonregular case where the ε_i are i.i.d. truncated t-distributions, whose density is strictly positive on its bounded support. Both n = 100 and n = 200 are reported. We calculated the OLS estimator, the MLE, and the adaptive estimators using the optimal bandwidth (19), which was close to 0.035, and fixed bandwidth values h = 0.01, 0.03, 0.05, 0.1, in each case without any trimming. For this case, we can see that the mean squared errors of the MLE, the adaptive estimator with optimal bandwidth, and the OLS estimator are close, although small differences do exist. Substantial differences can be found among adaptive estimators using different bandwidth values. From these results we can see the influence of bandwidth choice on the adaptive estimator.

Table 2 reports the results for DGP(2), where the ε_i are i.i.d. Beta(4,4) random variates and n = 100. The results for the n = 200 case are similar. Besides the

Table 1. Simulation results where ε_i are i.i.d. truncated t-distributions

n = 100
Estimators               Bias       Median Bias   Variance   MSE      IQR
OLS estimator            0.00176    -0.00144      0.0302     0.0303   0.0810
MLE: 2-step from OLS     0.00316    -0.00166      0.0294     0.0294   0.0798
ADAP1: optimal band      0.00320    -0.00178      0.0294     0.0294   0.0801
ADAP2: h = 0.1           0.04815    -0.00132      1.2063     1.2086   0.0958
ADAP3: h = 0.05          0.00406    -0.00267      0.0612     0.0612   0.1207
ADAP4: h = 0.03         -0.00238    -0.00209      0.0437     0.0437   0.0823
ADAP5: h = 0.01         -0.01958    -0.00239      0.2617     0.2621   0.0853

n = 200
Estimators               Bias       Median Bias   Variance   MSE      IQR
OLS estimator            0.00251    -0.00163      0.0135     0.0139   0.0810
MLE: 2-step from OLS     0.00254    -0.00208      0.0131     0.0135   0.0812
ADAP1: optimal band      0.00246    -0.00131      0.0132     0.0136   0.0801
ADAP2: h = 0.1           0.00372    -0.00166      0.0297     0.0304   0.0799
ADAP3: h = 0.05          0.00255    -0.00239      0.0139     0.0145   0.0852
ADAP4: h = 0.03         -0.0027     -0.00178      0.0131     0.0136   0.0801
ADAP5: h = 0.01         -0.0021     -0.00207      0.0172     0.0178   0.0958
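The summary statistics reported in Tables 1 and 2 are straightforward to compute from the R replications. The helper below is our own minimal sketch, matching the definitions of bias, median bias, variance, MSE, and IQR given in the text (the function name `replication_summary` is ours):

```python
import numpy as np

def replication_summary(betas, true_beta=1.0):
    # Bias, median bias, variance, MSE, and IQR across replications,
    # as defined in the text; note the identity MSE = variance + bias^2.
    b = np.asarray(betas, dtype=float)
    bias = b.mean() - true_beta
    med_bias = np.median(b) - true_beta
    var = np.mean((b - b.mean())**2)
    mse = np.mean((b - true_beta)**2)
    iqr = np.quantile(b, 0.75) - np.quantile(b, 0.25)
    return bias, med_bias, var, mse, iqr

# Tiny worked example with four fictitious replications of beta-hat.
bias, med_bias, var, mse, iqr = replication_summary([0.9, 1.0, 1.1, 1.2])
```

The decomposition MSE = variance + bias² is a useful internal consistency check when reproducing tables of this kind.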


Table 2. Simulation results where ε_i are i.i.d. Beta(4,4) random variates

b = 0.005
Estimators               Bias       Median Bias   Variance    MSE        IQR
OLS estimator           -0.00251    -0.00214      0.000385    0.000391   0.0254
MLE: 2-step from OLS    -0.00213    -0.00132      0.000319    0.000323   0.0243
ADAP1: optimal band     -0.00250    -0.00154      0.000369    0.000375   0.0253
ADAP2: h = 0.01         -0.00253    -0.00174      0.000372    0.000379   0.0250
ADAP3: h = 0.003        -0.00236    -0.00084      0.000372    0.000378   0.0238
ADAP4: h = 0.001        -0.00253    -0.00186      0.000376    0.000382   0.0249

b = 0.05
Estimators               Bias       Median Bias   Variance    MSE        IQR
OLS estimator           -0.00251    -0.00214      0.000385    0.000391   0.0254
MLE: 2-step from OLS    -0.00213    -0.00132      0.000327    0.000332   0.0243
ADAP1: optimal band     -0.00279    -0.00262      0.000382    0.000390   0.0242
ADAP2: h = 0.01         -0.00262    -0.00247      0.000390    0.000397   0.0257
ADAP3: h = 0.003        -0.00264    -0.00244      0.000386    0.000393   0.0258
ADAP4: h = 0.001        -0.00249    -0.00208      0.000408    0.000414   0.0263

OLS, MLE, and adaptive estimator with optimal bandwidth, which was close to 0.006, we also considered the adaptive estimators with fixed bandwidths h = 0.001, 0.003, 0.01. We use the trimming function (5) and the following two values of the trimming parameter b: 0.005 and 0.05, corresponding to the two parts of Table 2. Alternative choices of the trimming parameters were tried, and the results are quantitatively similar. These results confirm the previous finding from Table 1 that the finite sample performance of the adaptive estimator is affected by the choice of h. We also see that the efficiency gain from using the density information is relatively higher for DGP(2). A comparison within Table 2 indicates that the choice of trimming parameter value has an important influence on the finite sample performance. In summary, these Monte Carlo results illustrate the influence of the choices of smoothing parameters on the finite sample performance of the semiparametric adaptive regression estimators, and confirm the effectiveness of the second-order theory.
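To make the constructions behind these experiments concrete, the following sketch (our own illustration, not the authors' code) simulates model (20) with Gaussian errors and forms a one-step adaptive estimator from the OLS preliminary fit, using leave-one-out kernel estimates of f and f' with the biweight kernel K(u) = 15(1 − u²)²1(|u| ≤ 1)/16. The small floor on f̂ is our crude stand-in for the paper's trimming:

```python
import numpy as np

rng = np.random.default_rng(1)

def K(u):      # biweight kernel 15(1 - u^2)^2 1(|u| <= 1)/16
    return np.where(np.abs(u) <= 1, 15.0 * (1.0 - u**2)**2 / 16.0, 0.0)

def Kp(u):     # its first derivative
    return np.where(np.abs(u) <= 1, -15.0 * u * (1.0 - u**2) / 4.0, 0.0)

def adaptive_onestep(x, y, h, floor=1e-3):
    # OLS preliminary estimate, then one Newton-type step based on the
    # estimated score psi = -f_hat'/f_hat of the residual density.
    n = len(y)
    b0 = np.sum(x * y) / np.sum(x * x)
    e = y - b0 * x
    D = (e[:, None] - e[None, :]) / h
    f_hat = (K(D).sum(axis=1) - K(0.0)) / ((n - 1) * h)       # leave-one-out
    fp_hat = (Kp(D).sum(axis=1) - Kp(0.0)) / ((n - 1) * h**2)
    psi = -fp_hat / np.maximum(f_hat, floor)
    return b0 + np.sum(x * psi) / np.sum(x**2 * psi**2)

x = rng.standard_normal(400)
y = 1.0 * x + rng.standard_normal(400)      # model (20) with beta = 1
b_hat = adaptive_onestep(x, y, h=0.5)
```

With Gaussian errors the efficient estimator coincides with OLS asymptotically, so the one-step update should remain close to the OLS fit; the kernel pieces are where the bandwidth h enters, which is exactly the channel the tables above explore.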

7. CONCLUSION

The results of this paper readily extend to the multivariate SUR case where ε_i is a vector; see Jeganathan (1995) and Hodgson (1998) for first-order theory. In this case the corresponding second-order rate is n^{q/(2q+d+2)}, which worsens with the dimension. Our results also extend to the nonlinear regression function case as in Manski (1984). Finally, when ε_i has higher order dependence on x_i, it may still be possible to justify our results provided that

  E[ x_i f'(ε_i)/f(ε_i) ] = 0.

This would happen, for example, if the conditional distribution of ε_i given x_i were symmetric about zero. See Hodgson (1999) for a first-order result in this direction. However, in cases where ARCH effects are strong, it may be preferable to work with the adaptive ARCH estimator of Linton (1993).

NOTE

1. In any case, the density and its derivatives are zero at the boundary, so that the bias would be O(h^γ) were we to be estimating there.

REFERENCES

Ai, C. (1997) A semiparametric maximum likelihood estimator. Econometrica 65, 933–963.
Akahira, M. & K. Takeuchi (1995) Non-Regular Statistical Estimation. New York: Springer-Verlag.
Andrews, D.W.K. (1991) Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858.
Andrews, D.W.K. (1995) Nonparametric kernel estimation for semiparametric models. Econometric Theory 11, 560–596.
Beran, R. (1974) Asymptotically efficient adaptive rank estimates in location models. Annals of Statistics 2, 248–266.
Bickel, P.J. (1975) One-step Huber estimates in the linear model. Journal of the American Statistical Association 70, 428–434.
Bickel, P.J. (1982) On adaptive estimation. Annals of Statistics 10, 647–671.
De Jong, P. (1987) A central limit theorem for generalized quadratic forms. Probability Theory and Related Fields 75, 261–277.
Devroye, L. (1995) Non-Uniform Random Variate Generation. New York: Springer-Verlag.
Dieudonné, J. (1969) Foundations of Modern Analysis. New York: Academic Press.
Fan, J. & Q. Li (1996) Central Limit Theorem for Degenerate U-statistics of Absolutely Regular Processes with Applications to Model Specification Testing. Discussion paper 1996-8, University of Guelph.
Härdle, W., J. Hart, J.S. Marron, & A.B. Tsybakov (1992) Bandwidth choice for average derivative estimation. Journal of the American Statistical Association 87, 218–226.
Hodgson, D. (1998) Adaptive estimation of cointegrating regressions with ARMA errors. Journal of Econometrics 85, 231–268.
Hodgson, D. (1999) Partial Maximum Likelihood and Adaptive Estimation in the Presence of Conditional Heterogeneity of Unknown Form. Manuscript, University of Rochester.
Hsieh, D.A. & C.F. Manski (1987) Monte Carlo evidence on adaptive maximum likelihood estimation of a regression. Annals of Statistics 15, 541–551.
Jeganathan, P. (1995) Some aspects of asymptotic theory with applications to time series models. Econometric Theory 11, 818–887.
Kreiss, J. (1987) On adaptive estimation in stationary ARMA processes. Annals of Statistics 15, 112–133.
Linton, O.B. (1993) Adaptive estimation in ARCH models. Econometric Theory 9, 539–569.
Linton, O.B. (1995) Second order approximation in the partially linear regression model. Econometrica 63, 1079–1112.
Linton, O.B. (1996) Second order approximation in a linear regression with heteroskedasticity of unknown form. Econometric Reviews 15, 1–32.
Linton, O.B. (1998) Edgeworth Approximations for Semiparametric Instrumental Variable Estimators and Test Statistics. Manuscript, Yale University.
Linton, O.B. & Z. Xiao (1997) Second Order Approximation in a Semiparametric Binary Choice Model. Manuscript, Yale University.
Linton, O.B. & Z. Xiao (1998) Second Order Approximation for Adaptive Regression Estimators. Manuscript, Yale University.
Manski, C.F. (1984) Adaptive estimation of nonlinear regression models. Econometric Reviews 3, 187–208.
Masry, E. (1996) Multivariate local polynomial regression for time series: Uniform strong consistency and rates. Journal of Time Series Analysis 17, 571–599.
Müller, H.G. (1988) Nonparametric Regression Analysis of Longitudinal Data. London: Springer-Verlag.
Nishiyama, Y. & P.M. Robinson (1997) Edgeworth expansions for semiparametric averaged derivatives. Econometrica, forthcoming.
Phillips, P.C.B. (1978) A general theorem in the theory of asymptotic expansion as approximations to the finite sample distributions of econometric estimators. Econometrica 45, 463–485.
Powell, J.L. & T.M. Stoker (1996) Optimal bandwidth choice for density-weighted averages. Journal of Econometrics 75, 291–316.
Rothenberg, T. (1984) Approximating the distributions of econometric estimators and test statistics. In Z. Griliches & M. Intriligator (eds.), The Handbook of Econometrics, vol. 2. Amsterdam: North-Holland.
Rothenberg, T. & C.T. Leenders (1964) Efficient estimation of simultaneous equation systems. Econometrica 32, 57–76.
Sargan, J.D. (1976) Econometric estimators and the Edgeworth approximation. Econometrica 44, 421–448.
Silverman, B. (1978) Weak and strong uniform consistency of the kernel estimate of a density function and its derivatives. Annals of Statistics 8, 1175–1176.
Silverman, B. (1986) Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Steigerwald, D. (1992) Adaptive estimation in time series regression models. Journal of Econometrics 54, 251–276.
Stein, C. (1956) Efficient nonparametric testing and estimation. In J. Neyman (ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press.
Stone, C. (1975) Adaptive maximum likelihood estimation of a location parameter. Annals of Statistics 3, 267–284.
Xiao, Z. & P.C.B. Phillips (1996) Higher order approximation for a frequency domain regression estimator. Journal of Econometrics 86, 297–336.

APPENDIX

We shall use the notation E_i(·) to denote E(·|F_i), where F_i = {ε_i; x_1, …, x_n}. Note that the Lebesgue density of ε_i(β) = ε_i − (β − β₀)^T x_i, denoted f_β(·), is the convolution of f with the density or probability mass function of x_i. We shall just treat explicitly the case where x_i has a Lebesgue density f_x, because the discrete case is similar. Note that if x_i has unbounded support, then so does ε_i(β). Let K_{h_n}^{(j)}(u) = (1/h_n^{j+1}) K^{(j)}(u/h_n), j = 0, 1, 2.

LEMMA 1. For j = 0, 1, 2, we have as n → ∞,

  E_i[K_{h_n}^{(j)}(ε_i(β) − ε_j(β))] = f_β^{(j)}(ε_i(β)) + (h_n^q/q!) f_β^{(q+j)}(ε_i(β)) ∫ u^q K(u) du + o(h_n^q),  (A.1)

  E_i[K_{h_n}^{(j)}(ε_i(β) − ε_j(β))²] = (1/h_n^{2j+1}) f_β(ε_i(β)) ‖K^{(j)}‖₂² + o(h_n^{−(2j+1)})  (A.2)

uniformly in S = {i : f(ε_i) ≥ b} ∩ {β : ‖β − β₀‖ ≤ c₀/√n} with probability one. Here f_β^{(j)}(u) = ∫ f^{(j)}(u − (β − β₀)^T x) f_x(x) dx.

LEMMA 2. Suppose that Assumptions A1–A6 hold. Then, for j = 0, 1, 2,

  sup_{‖β−β₀‖≤c₀/√n} max_{1≤i≤n} |f̂_β^{(j)}(ε_i(β)) − f_β^{(j)}(ε_i(β))| = O_p(h_n^γ) + O_p(√(log n/(n h_n^{2j+1}))),  (A.3)

  sup_{‖β−β₀‖≤c₀/√n} max_{1≤i≤n} |f̂_β^{(j)}(ε_i(β)) − f_β^{(j)}(ε_i(β))| G_b(|f_i|) = O_p(h_n^q) + O_p(√(log n/(n h_n^{2j+1}))).  (A.4)

In the remainder of this section we write f̂_i, f̂_i', ∂f̂_i/∂β, and ∂f̂_i'/∂β (at the true β₀) in terms of their probability limits and correction terms involving "bias terms" and "variance terms." We decompose f̂(ε_i) − f(ε_i) in the following way:

  f̂_i(ε_i) − f(ε_i) = [f̂_i(ε_i) − E_i{f̂_i(ε_i)}] + [E_i{f̂_i(ε_i)} − f(ε_i)] ≡ V_i + B_i,  (A.5)

where

  E_i{f̂_i(ε_i)} = E_i{K_{h_n}(ε_i − ε_j)} = f(ε_i) + (h_n^q/q!) f^{(q)}(ε_i) ∫ u^q K(u) du + o(h_n^q)

by identity of distribution and Lemma 1. Likewise, the derivative estimate can be written as follows:

  f̂_i' = f̂_i'(ε_i) = f'(ε_i) + [E_i f̂_i'(ε_i) − f'(ε_i)] + [f̂_i'(ε_i) − E_i f̂_i'(ε_i)] = f'(ε_i) + B_i' + V_i',  (A.6)

where

  E_i{f̂_i'(ε_i)} = (1/(n−1)) Σ_{j≠i} E_i{K'_{h_n}(ε_i − ε_j)} = f'(ε_i) + (h_n^q/q!) f^{(q+1)}(ε_i) ∫ u^q K(u) du + o(h_n^q),

using integration by parts.
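The integration-by-parts step is easy to check numerically in a special case. The sketch below (our own construction, not from the paper) takes f standard normal and K the biweight kernel, for which q = 2 and ∫u²K(u)du = 1/7, and verifies both that (1/h)∫K'(u)f(x₀ − uh)du equals ∫K(u)f'(x₀ − uh)du and that this equals f'(x₀) + (h²/2)f'''(x₀)∫u²K(u)du up to o(h²):

```python
import numpy as np

def f(x):    # standard normal density and its 1st and 3rd derivatives
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def f1(x):
    return -x * f(x)

def f3(x):
    return (3.0 * x - x**3) * f(x)

def K(u):    # biweight kernel on [-1, 1]
    return np.where(np.abs(u) <= 1, 15.0 * (1.0 - u**2)**2 / 16.0, 0.0)

def Kp(u):   # its derivative
    return np.where(np.abs(u) <= 1, -15.0 * u * (1.0 - u**2) / 4.0, 0.0)

h, x0 = 0.1, 0.5
du = 2.0 / 200000
u = np.arange(-1.0 + du / 2.0, 1.0, du)             # midpoint grid on [-1, 1]
lhs = np.sum(Kp(u) * f(x0 - u * h)) * du / h        # (1/h) int K'(u) f(x0 - uh) du
rhs = np.sum(K(u) * f1(x0 - u * h)) * du            # int K(u) f'(x0 - uh) du
leading = f1(x0) + 0.5 * h**2 * f3(x0) / 7.0        # f' + (h^2/2) f''' mu_2(K)
```

The agreement of `lhs` and `rhs` is exact integration by parts (K vanishes with its derivative at ±1); the gap to `leading` is of order h⁴.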


For the derivatives with respect to β of the density estimates, we have

  ∂f̂_i(ε_i(β))/∂β = (1/(n−1)) Σ_{j≠i} K'_h(ε_i − ε_j)(x_j − x_i),

  ∂f̂_i'(ε_i(β))/∂β = (1/(n−1)) Σ_{j≠i} K''_h(ε_i − ε_j)(x_j − x_i),

and we make the following decomposition:

  ∂f̂_i(ε_i(β))/∂β = f̄_i' + [E_i{∂f̂(ε_i(β))/∂β} − f̄_i'] + [∂f̂(ε_i(β))/∂β − E_i{∂f̂(ε_i(β))/∂β}] ≡ f̄_i' + B̄_i' + V̄_i',  (A.7)

where

  f̄_i' = (1/(n−1)) Σ_{j≠i} (x_j − x_i) f'(ε_i) = f'(ε_i)(x̄_{−i} − x_i) = −x_i f'(ε_i) + O_p(n^{−1/2}),

  B̄_i' = (1/(n−1)) Σ_{j≠i} (x_j − x_i) E_i K'_h(ε_i − ε_j) − f̄_i'
       = f'(ε_i)(x̄_{−i} − x_i) + (h_n^q/q!) f^{(q+1)}(ε_i) ∫ u^q K(u) du (x̄_{−i} − x_i) − f̄_i' + o(h_n^q)
       = −x_i (h_n^q/q!) f^{(q+1)}(ε_i) ∫ u^q K(u) du + o(h_n^q) + O_p(n^{−1/2}),

where x̄_{−i} = (1/(n−1)) Σ_{j≠i} x_j = O_p(n^{−1/2}). Finally,

  ∂f̂_i'(ε_i(β))/∂β = f̄_i'' + [E_i{∂f̂_i'(ε_i(β))/∂β} − f̄_i''] + [∂f̂_i'(ε_i(β))/∂β − E_i{∂f̂_i'(ε_i(β))/∂β}] = f̄_i'' + B̄_i'' + V̄_i'',  (A.8)

where

  f̄_i'' = (1/(n−1)) Σ_{j≠i} (x_j − x_i) f''(ε_i) = f''(ε_i)(x̄_{−i} − x_i) = −x_i f''(ε_i) + O_p(n^{−1/2}),

  B̄_i'' = (1/(n−1)) Σ_{j≠i} (x_j − x_i) E_i K''_h(ε_i − ε_j) − f̄_i''
        = f''(ε_i)(x̄_{−i} − x_i) + (h_n^q/q!) f^{(q+2)}(ε_i) ∫ u^q K(u) du (x̄_{−i} − x_i) − f̄_i'' + o(h_n^q)
        = −x_i (h_n^q/q!) f^{(q+2)}(ε_i) ∫ u^q K(u) du + o(h_n^q) + O_p(n^{−1/2}).

As to the stochastic terms in (A.5)–(A.8), we have, for example,

  E_i[V̄_i' V̄_i'^T] = (1/(n−1)²) Σ_{j≠i} (x_j − x_i)(x_j − x_i)^T {E_i[K'_h(ε_i − ε_j)²] − [E_i K'_h(ε_i − ε_j)]²}
                  = (1/n) E[(x_j − x_i)(x_j − x_i)^T] E_i[K'_{h_n}(ε_i − ε_j)²] + O(n^{−1}) = O(n^{−1} h_n^{−3}).

In conclusion, B_i, B_i', B̄_i', B̄_i'' are O_p(h_n^q) uniformly in i. The stochastic terms V_i, V_i', V̄_i', and V̄_i'' are mean zero sums of independent random variables with probability orders O_p(n^{−1/2} h_n^{−1/2}), O_p(n^{−1/2} h_n^{−3/2}), O_p(n^{−1/2} h_n^{−3/2}), and O_p(n^{−1/2} h_n^{−5/2}), respectively, which follows from Lemma 1.

Proof of Lemma 1. We first use the law of iterated expectations to write

  E_i[K_{h_n}(ε_i(β) − ε_j(β))] = ∫ [ ∫ K_{h_n}(ε_i(β) − ε_j(β)) f(ε_j) dε_j ] f_x(x_j) dx_j.

We work on the inner integral. Given the moment conditions in Assumption A1 and the bandwidth conditions in A6, and when ‖β − β₀‖ ≤ c₀/√n, we have (see the proof of Lemma 2) max_{1≤i≤n} |(β − β₀)^T(x_i − x_j)| = o_p(b^{1/γ}). Notice that on S we have f(ε_i) ≥ b; thus, for small values of f(ε_i), ε_i − a or ā − ε_i are of order b^{1/γ} by A4. Under Assumption A6, they are larger in order of magnitude than h. As a result, by the change of variable ε_j ↦ u = (ε_i − ε_j − (β − β₀)^T(x_i − x_j))/h_n, we have the following approximation:

  ∫ K_{h_n}(ε_i(β) − ε_j(β)) f(ε_j) dε_j ≈ ∫_{−1}^{1} f(ε_i − (β − β₀)^T(x_i − x_j) + u h_n) K(u) du

on S. The following Taylor expansion of f around ε*_{ij} = ε_i − (β − β₀)^T(x_i − x_j) is valid for q ≤ r:

  f(ε*_{ij} + u h_n) = f(ε*_{ij}) + u h_n f'(ε*_{ij}) + ⋯ + ((u h_n)^q/q!) f^{(q)}(ε*_{ij})
                     + ((u h_n)^q/q!) ∫_0^1 q(1 − t)^{q−1} {f^{(q)}(ε*_{ij} + t u h_n) − f^{(q)}(ε*_{ij})} dt

by Dieudonné (1969, Theorem 8.14.3). Therefore,

  ∫ K_{h_n}(ε_i(β) − ε_j(β)) f(ε_j) dε_j = f(ε*_{ij}) + (h_n^q/q!) f^{(q)}(ε*_{ij}) ∫ u^q K(u) du + R_{ij},

where, by Assumption A4, uniformly in i with probability one,

  |R_{ij}| = (h_n^q/q!) | ∫ u^q K(u) ∫_0^1 q(1 − t)^{q−1} {f^{(q)}(ε*_{ij} + t u h_n) − f^{(q)}(ε*_{ij})} dt du |
           ≤ c h_n^{q+1} ∫ |u|^{q+1} K(u) du

for some constant c, by Lipschitz continuity of f^{(q)}. The final step is to integrate with respect to f_x — the integrals are finite because of Assumptions A1–A3. As for the derivatives, we use integration by parts (which is justified by Assumption A4). We have, e.g.,

  ∫ (1/h_n²) K'((ε_i(β) − ε_j(β))/h_n) f(ε_j) dε_j = (1/h_n) ∫ f'(ε_j) K((ε_i(β) − ε_j(β))/h_n) dε_j
                                                  = ∫ f'(ε*_{ij} + u h_n) K(u) du.

We can then apply the same Taylor expansion as previously, this time of f' around ε*_{ij}. ■
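The structure of (A.1) can be checked numerically in a special case. The sketch below (our construction, not from the paper) smooths a standard normal density with the biweight kernel, for which q = 2 and ∫u²K(u)du = 1/7, and compares the exact smoothing mean E f̂(x₀) = (K_h * f)(x₀) with the leading expansion f(x₀) + (h²/2)f''(x₀)∫u²K(u)du:

```python
import numpy as np

def f(x):        # standard normal density
    return np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def f2(x):       # its second derivative
    return (x**2 - 1.0) * f(x)

def K(u):        # biweight kernel; int K = 1, int u^2 K = 1/7
    return np.where(np.abs(u) <= 1, 15.0 * (1.0 - u**2)**2 / 16.0, 0.0)

h, x0 = 0.1, 0.5
du = 2.0 / 200000
u = np.arange(-1.0 + du / 2.0, 1.0, du)            # midpoint grid on [-1, 1]
Ef_hat = np.sum(K(u) * f(x0 - u * h)) * du         # E f_hat(x0), no sampling noise
leading = f(x0) + 0.5 * h**2 * f2(x0) / 7.0        # f + (h^2/2) f'' mu_2(K)
```

The residual gap is of order h⁴, consistent with the o(h_n^q) remainder in (A.1) for q = 2.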

Proof of Lemma 2. The proof follows from an extension of Silverman (1978), as treated in Andrews (1995). In the second part of the theorem, note that G_b(|f_i|) excludes observations too close to the boundary, so that we can apply the usual Taylor series expansion to treat the bias terms. In the first part of the theorem, the bias terms are small because the density (and its derivatives up to order γ − 1) is zero at the boundary. With regard to the stochastic part in both theorems, uniform convergence over i follows from Masry (1996, Theorem 2). We concentrate on the uniformity with respect to β. We first show that Pr[A] = o(1) for any c_n → ∞, where

  A = { sup_{‖β−β₀‖≤c₀/√n} |f̂_β^{(j)}(ε_i(β)) − E_i{f̂_β^{(j)}(ε_i(β))}| > c_n √(log n/(n h_n^{2j+1})) }.

The proof is made more complicated by the fact that we have not assumed that the support of x_i is finite, so that the density function f_β could have unbounded support — or, more relevantly, the range of the evaluation points {ε_i(β)}_{i=1}^n increases as n → ∞. Define the events

  C_n(d) = { max_{1≤i≤n} ‖x_i‖ ≤ d n^{1/2} b^{1/γ} }.

Then, by Assumption A1, we have

  Pr[C_n^c(d)] ≤ n Pr[‖x_i‖ > d n^{1/2} b^{1/γ}] ≤ n E[‖x_i‖^{4+η}]/(d^{4+η} n^{2+η/2} b^{(4+η)/γ}) → 0,  (A.9)

i.e., max_{1≤i≤n} ‖x_i‖/(n^{1/2} b^{1/γ}) = o_p(1). We shall restrict attention to A ∩ C_n(d), which can be justified by the argument that Pr[A] ≤ Pr[A ∩ C] + Pr[C^c] for any event A. Write β = β₀ + b/√n for any β ∈ N_n(c) = {β : ‖β − β₀‖ ≤ c/√n}. Because N_n(c) is compact, it can be covered by a finite number of cubes I_{nl} with centers β_l = β₀ + b_l/√n having sides of length δ(L), for l = 1, …, L. Note that δ(L) = O(L^{−1/p}). We take

  L(n) = (n/(h_n³ log n))^{p/2}.

We now make a standard decomposition

  sup_{N_n(c)} |f̂_β^{(j)}(ε_i(β)) − E_i{f̂_β^{(j)}(ε_i(β))}|
  ≤ max_{1≤l≤L} sup_{N_n(c)∩I_{nl}} |f̂_β^{(j)}(ε_i(β)) − f̂_β^{(j)}(ε_i(β_l))|
  + max_{1≤l≤L} |f̂_β^{(j)}(ε_i(β_l)) − E_i{f̂_β^{(j)}(ε_i(β_l))}|
  + max_{1≤l≤L} sup_{N_n(c)∩I_{nl}} |E_i{f̂_β^{(j)}(ε_i(β_l))} − E_i{f̂_β^{(j)}(ε_i(β))}|
  ≡ Q_1 + Q_2 + Q_3.

By the Lipschitz condition on the kernel and the Cauchy–Schwarz inequality, we have for any β ∈ N_n(c) ∩ I_{nl},

  (1/h_n^{j+1}) |K^{(j)}((ε_i(β_l) − ε_j(β_l))/h_n) − K^{(j)}((ε_i(β) − ε_j(β))/h_n)|
  ≤ (c/h_n^{j+2}) max_{1≤i≤n} |ε_i(β_l) − ε_i(β)|
  ≤ (c/h_n^{j+2}) ‖β_l − β‖ max_{1≤i≤n} ‖x_i‖
  ≤ c δ(L)/h_n^{j+2} = (c/h_n^{j+2}) (h_n³ log n/n)^{1/2} = c √(log n/(n h_n^{2j+1}))

for some constant c (which can be different from expression to expression) with probability tending to one by (A.9). This gives a bound on Q_1 similar to (3.21) in Masry (1996). The Bonferroni inequality and an exponential bound are used to treat Q_2. Specifically, for any λ > 0, we have

  Pr[ max_{1≤l≤L} |f̂_β^{(j)}(ε_i(β_l)) − E_i{f̂_β^{(j)}(ε_i(β_l))}| > λ ] ≤ 2L(n) exp( −λ²/(2(Σ_{i=1}^n s_{ni}² + mλ)) ),  (A.10)

where m = c/(n h_n^{j+1}) is a bound on the random variable

  Z_{ni} = (1/(n h_n^{j+1})) [ K^{(j)}((ε_i(β_l) − ε_j(β_l))/h_n) − E_i K^{(j)}((ε_i(β_l) − ε_j(β_l))/h_n) ]

and s_{ni}² = var[Z_{ni}] = c/(n² h_n^{2j+1}). We take λ = c_n √(log n/(n h_n^{2j+1})). Note that the Σ_{i=1}^n s_{ni}² term in the denominator of (A.10) dominates when n h_n/log n → ∞. It now follows that the right hand side of (A.10) is o(1), provided the constants are taken large enough. Finally, the term Q_3 is bounded in the same way as Q_1. ■

Proof of Theorem 1. By a Taylor series expansion of s̄_n(β̃) about s̄_n(β₀), we have

  s̄_n(β̃) = s̄_n(β₀) + s̄_n'(β*)(β̃ − β₀),

where s̄_n'(β) = ∂s̄_n(β)/∂β and β* is an intermediate point. Thus

  √n(β̂ − β₀) = Ī_n(β̃)^{−1}[√n s̄_n(β₀) + s̄_n'(β*)√n(β̃ − β₀)] + √n(β̃ − β₀)
             = Ī_n(β₀)^{−1}√n s̄_n(β₀) + {Ī_n(β̃)^{−1} − Ī_n(β₀)^{−1}}√n s̄_n(β₀) + {I + Ī_n(β̃)^{−1} s̄_n'(β*)}√n(β̃ − β₀).

Noticing that s̄_n'(β*) = s̄_n'(β₀) + s̄_n''(β**)(β* − β₀), a first-order analysis shows that s̄_n''(β**) = O_p(1), with limit V_x E{f'''/f − 3f''f'/f² + 2(f'/f)³}. Also observing the fact that β̃ − β₀ = O_p(n^{−1/2}), and β* is an intermediate point between β₀ and β̃, we have s̄_n'(β*) = s̄_n'(β₀) + O_p(n^{−1/2}). In addition,

  s̄_n'(β₀) = −Ī_n(β₀) − (1/n) Σ_{i=1}^n x_i x_i^T g_b(f̂_i(ε_i(β₀))) f̂_i'(ε_i(β₀))²/f̂_i(ε_i(β₀)).

The second term, (1/n) Σ_{i=1}^n x_i x_i^T g_b(f̂_i)(f̂_i')²/f̂_i, is a higher order term that depends on the trimming parameter and on the boundary behavior of the densities. By a similar argument used elsewhere in this paper, we can show that it is o_p(b^{(γ−1)/(2γ)}). In particular, noticing that for small enough b, |g_b(f)| ≤ b^{−1}, the leading term is asymptotically

  E[ x_i x_i^T (f_i')²/f_i g_b(f_i) ] = V_x E[ (f'(ε))²/f(ε) g_b(f(ε)) 1(b ≤ f(ε) ≤ 2b) ]
  ≈ V_x ∫_{a+d_1}^{a+d_2} c(a)(ε − a)^{2(γ−1)} g_b(f(ε)) dε + V_x ∫_{ā−d̄_2}^{ā−d̄_1} c(ā)(ā − ε)^{2(γ−1)} g_b(f(ε)) dε
  ≤ V_x b^{−1} [ c(a) ∫_{a+d_1}^{a+d_2} (ε − a)^{2(γ−1)} dε + c(ā) ∫_{ā−d̄_2}^{ā−d̄_1} (ā − ε)^{2(γ−1)} dε ]
  = V_x b^{−1} [c(a) + c(ā)] b^{(2γ−1)/γ} = o(b^{(γ−1)/(2γ)}),

where d_1, d_2, d̄_1, d̄_2 are defined as in (A.15) and (A.16), which follow. Thus, we have {I + Ī_n(β̃)^{−1} s̄_n'(β*)}√n(β̃ − β₀) = T̄ + O_p(n^{−1/2}), where T̄ is a trimming effect of order o_p(b^{(γ−1)/(2γ)}).

For {Ī_n(β̃)^{−1} − Ī_n(β₀)^{−1}}√n s̄_n(β₀), first, it can be shown that Ī_n(β̃) − Ī_n(β₀) = O_p(n^{−1/2}). Using the results given previously in this Appendix, we have

  f̂_i(ε_i(β̃)) − f̂_i(ε_i(β₀)) = (1/(n−1)) Σ_{j≠i} { K_h[(ε_i − ε_j) + (β̃ − β₀)^T(x_j − x_i)] − K_h[ε_i − ε_j] }
  ≈ (1/(n−1)) Σ_{j≠i} K'_h(ε_i − ε_j)(x_j − x_i)^T(β̃ − β₀)
  ≈ −x_i f'(ε_i)(β̃ − β₀) = O_p(n^{−1/2}),

and similarly f̂_i'(ε_i(β̃)) − f̂_i'(ε_i(β₀)) ≈ −x_i f''(ε_i)(β̃ − β₀) = O_p(n^{−1/2}). Thus, using the definition of Ī_n(β, f̂), we can show that Ī_n(β̃) − Ī_n(β₀) = O_p(n^{−1/2}). In addition, √n s̄_n(β₀) = O_p(1). Thus, by a geometric expansion we have {Ī_n(β̃)^{−1} − Ī_n(β₀)^{−1}}√n s̄_n(β₀) = O_p(n^{−1/2}).

We now develop the expansions of the score function and the Hessian. By definition

  √n s̄_n(β₀) = −(1/√n) Σ_{i=1}^n [f̂'(ε_i(β₀))/f̂(ε_i(β₀))] x_i G_b(f̂_i).

For simplicity, we denote G_b(f̂_i) as G̃_i. For the denominator,

  1/f̂(ε_i) = 1/f(ε_i) − [f̂(ε_i) − f(ε_i)]/f(ε_i)² + [f̂(ε_i) − f(ε_i)]²/f(ε_i)³ + R_2,

where

  R_2 = −[f̂(ε_i) − f(ε_i)]³/(f(ε_i)³ f̂(ε_i)).
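This expansion of 1/f̂ is an exact algebraic identity, which can be checked directly (a small sanity check of our own, with stand-in values for f and f̂):

```python
import numpy as np

rng = np.random.default_rng(3)
f = rng.uniform(0.5, 2.0, size=100)          # stand-ins for f(eps_i)
fh = f + rng.uniform(-0.1, 0.1, size=100)    # stand-ins for f_hat(eps_i)
d = fh - f                                   # estimation error f_hat - f
lhs = 1.0 / fh
# 1/f_hat = 1/f - d/f^2 + d^2/f^3 + R2 with R2 = -d^3/(f^3 f_hat), exactly.
rhs = 1.0 / f - d / f**2 + d**2 / f**3 - d**3 / (f**3 * fh)
gap = np.max(np.abs(lhs - rhs))
```

Because the identity is exact, R_2 is cubic in the estimation error, which is why it is relegated to the remainder below.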


Substituting B_i + V_i for f̂(ε_i) − f(ε_i) and f'(ε_i) + B_i' + V_i' for f̂'(ε_i(β₀)) and Taylor expanding, we get

  √n s̄_n(β₀) = −(1/√n) Σ_{i=1}^n [f̂'(ε_i(β₀))/f̂(ε_i(β₀))] x_i G_b(f̂_i)
  = −(1/√n) Σ_{i=1}^n [ 1/f(ε_i) − (B_i + V_i)/f(ε_i)² + (B_i + V_i)²/f(ε_i)³ + R_2 ] [ f'(ε_i) + B_i' + V_i' ] x_i G_b(f̂_i)
  = −(1/√n) Σ (f'(ε_i)/f(ε_i)) x_i − (1/√n) Σ (B_i'/f(ε_i)) x_i G̃_i − (1/√n) Σ (V_i'/f(ε_i)) x_i G̃_i
  + (1/√n) Σ (f'(ε_i)/f(ε_i)²) B_i x_i G̃_i + (1/√n) Σ (f'(ε_i)/f(ε_i)²) V_i x_i G̃_i
  + (1/√n) Σ (B_i'/f(ε_i)²) B_i x_i G̃_i + (1/√n) Σ (B_i'/f(ε_i)²) V_i x_i G̃_i
  + (1/√n) Σ (V_i'/f(ε_i)²) B_i x_i G̃_i + (1/√n) Σ (V_i'/f(ε_i)²) V_i x_i G̃_i
  − (1/√n) Σ (f'(ε_i)/f(ε_i)³) B_i² x_i G̃_i − (1/√n) Σ (f'(ε_i)/f(ε_i)³) V_i² x_i G̃_i
  − (2/√n) Σ (f'(ε_i)/f(ε_i)³) B_i V_i x_i G̃_i + (1/√n) Σ (f'(ε_i)/f(ε_i)) x_i [1 − G̃_i] + R_{n2}
  = Σ_{j=0}^{11} M_j + Remainder terms,

where the remainder term equals

  R_{n2} = (1/√n) Σ_{i=1}^n [ f̂'(ε_i)[f̂(ε_i) − f(ε_i)]³/(f(ε_i)³ f̂(ε_i)) ] x_i G̃_i  (A.11)


and

  M_0 = −(1/√n) Σ_{i=1}^n (f'(ε_i)/f(ε_i)) x_i;
  M_1 = −(1/√n) Σ (B_i'/f(ε_i)) x_i G̃_i + (1/√n) Σ (f'(ε_i)/f(ε_i)²) B_i x_i G̃_i;
  M_2 = −(1/√n) Σ (V_i'/f(ε_i)) x_i G̃_i;
  M_3 = (1/√n) Σ (f'(ε_i)/f(ε_i)²) V_i x_i G̃_i;
  M_4 = (1/√n) Σ (B_i'/f(ε_i)²) B_i x_i G̃_i;
  M_5 = (1/√n) Σ (B_i'/f(ε_i)²) V_i x_i G̃_i;
  M_6 = (1/√n) Σ (V_i'/f(ε_i)²) B_i x_i G̃_i;
  M_7 = (1/√n) Σ (V_i'/f(ε_i)²) V_i x_i G̃_i;
  M_8 = −(1/√n) Σ (f'(ε_i)/f(ε_i)³) B_i² x_i G̃_i;
  M_9 = −(1/√n) Σ (f'(ε_i)/f(ε_i)³) V_i² x_i G̃_i;
  M_10 = −(2/√n) Σ (f'(ε_i)/f(ε_i)³) B_i V_i x_i G̃_i;
  M_11 = (1/√n) Σ (f'(ε_i)/f(ε_i)) x_i [1 − G̃_i].

The leading term, M_0 = −(1/√n) Σ_{i=1}^n (f'(ε_i)/f(ε_i)) x_i, is of order O_p(1). We now verify the orders of the other terms.

First we show that G_b(f̂_i) can be replaced by G_b(f_i) = G_i. If we perform a first-order expansion of G_b(f̂_i) around f_i, we obtain

  max_{1≤i≤n} |G_b(f̂_i) − G_b(f_i)| = max_{1≤i≤n} |g_b(f̄_i)(f̂_i − f_i)|
  ≤ (1/b) max_{1≤i≤n} |f̂_i − f_i| 1(f̄_i ≥ b)
  ≤ (1/b) max_{1≤i≤n} |f̂_i − f_i| 1(f_i ≥ b) {1 + o_p(1)},

where f̄_i is an intermediate point between f̂_i and f_i. Note that the second inequality follows from Lemma 2, part 1, and Assumption A6. The last term is o_p(1) under our conditions. We also need the further expansion

  G_b(f̂_i) − G_b(f_i) = Σ_{ℓ=1}^{L−1} (1/ℓ!) g_b^{(ℓ−1)}(f_i)(f̂_i − f_i)^ℓ + (1/L!) g_b^{(L−1)}(f̄_i)(f̂_i − f_i)^L,

whose properties can be similarly derived.
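The only properties of the trimming pair (G_b, g_b) used in the bounds below are that g_b is supported on [b, 2b] and that |g_b| = O(b⁻¹), with each further derivative costing another factor b⁻¹. The specific form (5) is not reproduced in this excerpt, so the sketch below uses our own smoothstep polynomial stand-in (the names `G` and `g` and the cubic form are assumptions, not the paper's choice) to make these two properties concrete:

```python
import numpy as np

def G(t, b):
    # Smooth trimming: 0 below b, 1 above 2b, and a cubic smoothstep
    # polynomial in w = (t - b)/b in between (OUR stand-in for (5)).
    w = np.clip((t - b) / b, 0.0, 1.0)
    return 3.0 * w**2 - 2.0 * w**3

def g(t, b, eps=1e-8):
    # Numerical derivative g_b = G_b'.
    return (G(t + eps, b) - G(t - eps, b)) / (2.0 * eps)

b = 0.01
t = np.linspace(0.0, 5.0 * b, 2001)
gv = g(t, b)
outside = (t < b - 1e-6) | (t > 2.0 * b + 1e-6)   # off the band [b, 2b]
```

For this choice, max |g_b| = 1.5/b, attained at the midpoint of the band, illustrating the bound |g_b(f)| ≤ c/b invoked in the text.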


We verify that under Assumption A6, M_11 is of order O_p(b^{(γ−1)/(2γ)}). We have

  M_11 = (1/√n) Σ_{i=1}^n (f'(ε_i)/f(ε_i)) x_i [1 − G_b(f_i)]
       − Σ_{ℓ=1}^{L−1} (1/ℓ!) (1/√n) Σ_{i=1}^n (f'(ε_i)/f(ε_i)) x_i g_b^{(ℓ−1)}(f_i)(f̂_i − f_i)^ℓ
       − (1/L!) (1/√n) Σ_{i=1}^n (f'(ε_i)/f(ε_i)) x_i g_b^{(L−1)}(f̄_i)(f̂_i − f_i)^L.

It can be shown that under the assumptions,

  (1/√n) Σ_{i=1}^n (f'(ε_i)/f(ε_i)) x_i [1 − G_b(f_i)]  (A.12)

is the leading term and is of order O_p(b^{(γ−1)/(2γ)}). Notice that the x_i are mean zero and {x_i, ε_i}_{i=1}^n are generated as an i.i.d. sample. We only need to calculate the second moment of (A.12), which is

  E{ (1/n) Σ_{i=1}^n [f'(ε_i)/f(ε_i)]² x_i x_i' [1 − G_b(f_i)]² } = V_x E{ [f'(ε)/f(ε)]² [1 − G_b(f_i)]² }.

Notice that Assumptions A2–A4 imply that in a small neighborhood around a,

  f(ε) ≈ (1/γ!) f^{(γ)}(a)(ε − a)^γ,  (A.13)

  f'(ε) ≈ (1/(γ−1)!) f^{(γ)}(a)(ε − a)^{γ−1}  (A.14)

for a ≤ ε ≤ a + d with d small. Similar results hold in small neighborhoods around ā. Thus, if we use the trimming function given by (5), noting that G_b(x) is a polynomial in (x − b)/b, it can be shown that

  E{ [f'(ε)/f(ε)]² [1 − G_b(f_i)]² }
  ≈ ∫_a^{a+d_1} c_0(a)(ε − a)^{γ−2} dε + ∫_{a+d_1}^{a+d_2} [ Σ_{l=1}^{2k+2} c_l(a)(ε − a)^{lγ−2} b^{1−l} ] dε
  + ∫_{ā−d̄_1}^{ā} c_0(ā)(ā − ε)^{γ−2} dε + ∫_{ā−d̄_2}^{ā−d̄_1} [ Σ_{l=1}^{2k+2} c_l(ā)(ā − ε)^{lγ−2} b^{1−l} ] dε,

where

  d_1 = (γ!/f^{(γ)}(a))^{1/γ} b^{1/γ};  d_2 = (2·γ!/f^{(γ)}(a))^{1/γ} b^{1/γ},  (A.15)

  d̄_1 = (γ!/f^{(γ)}(ā))^{1/γ} b^{1/γ};  d̄_2 = (2·γ!/f^{(γ)}(ā))^{1/γ} b^{1/γ},  (A.16)


and the c_l(a) are functions of f^{(γ)}(a) and γ. By calculating the integrals, we obtain that

  E{ [f'(ε)/f(ε)]² [1 − G_b(f_i)]² } ≈ w(γ, a, ā, f) b^{(γ−1)/γ},

where w(γ, a, ā, f) = c(γ)[f^{(γ)}(a)^{1/γ} + f^{(γ)}(ā)^{1/γ}], with c(γ) a coefficient that depends on γ.

We now calculate the magnitude of

  (1/ℓ!)(1/√n) Σ_{i=1}^n (f'(ε_i)/f(ε_i)) x_i g_b^{(ℓ−1)}(f_i)(f̂_i − f_i)^ℓ,  ℓ = 1, …, L − 1,

and

  (1/L!)(1/√n) Σ_{i=1}^n (f'(ε_i)/f(ε_i)) x_i g_b^{(L−1)}(f̄_i)(f̂_i − f_i)^L.

The random variable

  (1/√n) Σ_{i=1}^n (f'(ε_i)/f(ε_i)) x_i g_b(f_i)(f̂_i − f_i)

is mean zero, and its second moment is

  E{ (1/n) Σ_{i=1}^n [f'(ε_i)/f(ε_i)]² x_i x_i' [g_b(f_i)]² (f̂_i − f_i)² } = V_x E{ [f'(ε_i)/f(ε_i)]² [g_b(f_i)]² (f̂_i − f_i)² },

by iterated expectations. Notice that for small enough b,

  |g_b(f_i)| ≤ b^{−1}  for any i,

and g_b(·) has support on [b, 2b]. By an application of Lemma 2, we have

  E{ [f'(ε_i)/f(ε_i)]² [g_b(f_i)]² (f̂_i − f_i)² } ≤ c_1 (h^{2q} + n^{−1} h^{−1} log n) b^{−2} E{ [f'(ε_i)/f(ε_i)]² 1(b ≤ f_i ≤ 2b) }

for some constant c_1. Under Assumptions A2 and A4, we can use the approximations (A.13) and (A.14), and thus

  E{ [f'(ε_i)/f(ε_i)]² [g_b(f_i)]² (f̂_i − f_i)² } ≤ c_2 (h^{2q} + n^{−1} h^{−1} log n) b^{−(γ+1)/γ} = o(b^{(γ−1)/γ}).

1014

OLIVER LINTON AND ZHIJIE XIAO

Thus f ' ~«i !

n

1

( !n i51

f ~«i !

x i gb ~ fi !~ fiD 2 fi ! 5 op ~b ~®21!02® !+

Similarly we can show that 1

1

!n

,!

n

f ' ~«i !

i51

f ~«i !

n

f ' ~«i !

(

~,!

x i gb ~ fi !~ fiD 2 fi ! , 5 op ~b ~®21!02® !,

for , 5 1, + + + , L 2 1+
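Any smooth trimming function with these properties behaves the same way in the bounds above. A minimal sketch (the cubic smoothstep below is an assumed stand-in, not the paper's exact polynomial $G_b$) illustrates the two facts just used: $g_b = G_b'$ vanishes off $[b, 2b]$ and satisfies $|g_b| \le c/b$.

```python
import numpy as np

def G(x, b):
    """Smooth trimming weight: 0 for x <= b, 1 for x >= 2b.
    Cubic smoothstep in t = (x - b)/b on the transition band [b, 2b]."""
    t = np.clip((x - b) / b, 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def g(x, b):
    """Derivative G_b'(x); nonzero only on (b, 2b), bounded by c/b."""
    t = (x - b) / b
    inside = (t > 0.0) & (t < 1.0)
    return np.where(inside, 6.0 * t * (1.0 - t) / b, 0.0)

b = 0.05
x = np.linspace(0.0, 0.5, 10001)
print(G(np.array([b / 2, 2.5 * b]), b))   # [0., 1.]
print(np.max(np.abs(g(x, b))) * b)        # sup |g_b| * b = 1.5 (the constant c)
```

For this choice the constant is $c = 3/2$, attained at the midpoint of the transition band.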

For

$$\frac{1}{\sqrt n}\frac{1}{L!}\sum_{i=1}^n \frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\, x_i\, g_b^{(L)}(\bar f_i)(\hat f_i - f_i)^L,$$

notice that

$$|g_b^{(L)}(y)| \le \frac{c_3}{b^L} \quad\text{for some constant } c_3,$$

so

$$\left|\frac{1}{\sqrt n}\frac{1}{L!}\sum_{i=1}^n \frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\, x_i\, g_b^{(L)}(\bar f_i)(\hat f_i - f_i)^L\right|
\le \frac{c}{b^{L+1}}\max_{1\le i\le n}|\hat f_i - f_i|^L\,\frac{1}{\sqrt n}\sum_{i=1}^n \left|\frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\right| x_i\, 1(b \le \bar f_i \le 2b).$$

Notice that $h/b \to 0$ as $n \to \infty$ and $\bar f_i$ is an intermediate point, so by the result of Lemma 2,

$$\left|\frac{1}{\sqrt n}\frac{1}{L!}\sum_{i=1}^n \frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\, x_i\, g_b^{(L)}(\bar f_i)(\hat f_i - f_i)^L\right|
= \frac{c}{b^{L+1}}\max_{1\le i\le n}|\hat f_i - f_i|^L\,\frac{1}{\sqrt n}\sum_{i=1}^n \left|\frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\right| x_i\, 1(b \le f_i \le 2b)\,\{1 + o_p(1)\},$$

and

$$\max_{1\le i\le n}|\hat f_i - f_i|^L = O_p(h^{Lq} + n^{-L/2}h^{-L/2}\log^L n).$$

Thus, for $L > 4$,

$$\left|\frac{1}{\sqrt n}\frac{1}{L!}\sum_{i=1}^n \frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\, x_i\, g_b^{(L)}(\bar f_i)(\hat f_i - f_i)^L\right| = o_p(b^{(\nu-1)/2\nu}).$$
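The uniform rate $\max_i|\hat f_i - f_i| = O_p(h^q + n^{-1/2}h^{-1/2}\log n)$ invoked here can be eyeballed by simulation. In the sketch below the standard normal errors, the Gaussian kernel (so $q = 2$), and the rule-of-thumb bandwidth are all assumptions of the example, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_kde_error(n, rng):
    """Sup-norm error of a Gaussian-kernel density estimate of N(0,1)."""
    eps = rng.standard_normal(n)
    h = 1.06 * n ** (-0.2)                      # rule-of-thumb bandwidth
    grid = np.linspace(-3, 3, 301)
    u = (grid[:, None] - eps[None, :]) / h
    fhat = np.exp(-u**2 / 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    f = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
    return np.max(np.abs(fhat - f))

errs = {n: max_kde_error(n, rng) for n in (250, 4000)}
print(errs)   # the sup-norm error shrinks as n grows
```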

The next two terms are $M_1$ and $M_2$. Letting $z^{\otimes 2}$ denote $zz^T$ for any vector $z$, we have

$$E(M_1M_1^T) \simeq E\left[\frac{1}{\sqrt n}\sum_{i=1}^n \left(\frac{B_i'}{f_i} - \frac{f_i'B_i}{f_i^2}\right)x_iG_i\right]^{\otimes 2}$$
$$= E\left[\frac 1n\sum_{i,k=1}^n x_ix_k^TG_iG_k\,\frac{B_i'B_k'}{f_if_k}\right]
- 2E\left[\frac 1n\sum_{i,k=1}^n x_ix_k^TG_iG_k\,\frac{f_k'B_i'B_k}{f_if_k^2}\right]
+ E\left[\frac 1n\sum_{i,k=1}^n x_ix_k^TG_iG_k\,\frac{f_i'f_k'B_iB_k}{f_i^2f_k^2}\right]$$
$$= E\left\{\frac{h_n^{2q}}{n}\sum_{i=1}^n \left[\frac{f_i^{(q+1)}}{f_i}\right]^2 x_ix_i^TG_i^2\,\mu_q(K)^2\right\}
+ E\left\{\frac{h_n^{2q}}{n}\sum_{i=1}^n \left[\frac{f_i'f_i^{(q)}}{f_i^2}\right]^2 x_ix_i^TG_i^2\,\mu_q(K)^2\right\}$$
$$\quad - 2E\left\{\frac{h_n^{2q}}{n}\sum_{i=1}^n \frac{f_i'f_i^{(q)}f_i^{(q+1)}}{f_i^3}\, x_ix_i^TG_i^2\,\mu_q(K)^2\right\} + o(h_n^{2q}) = O(h_n^{2q}).$$

For $M_2$, we write

$$\frac{1}{\sqrt n}\sum_{i=1}^n \frac{V_i'}{f(\varepsilon_i)}\,x_iG_i = \frac{1}{(n-1)\sqrt n}\sum_{i=1}^n\sum_{j\ne i}\xi_{ij},$$

where $\xi_{ij} = \{K_{h_n}'(\varepsilon_i - \varepsilon_j) - E_iK_{h_n}'(\varepsilon_i - \varepsilon_j)\}\,x_iG_i/f(\varepsilon_i)$ satisfies $E_i\xi_{ij} = 0$. Furthermore,

$$E[M_2M_2^T] \simeq \frac{1}{n^3}\sum_{i=1}^n\sum_{k=1}^n\sum_{j\ne i}\sum_{s\ne k}E[\xi_{ij}\xi_{ks}^T]
\simeq \frac{1}{n^3}\sum_{i=1}^n\sum_{j\ne i}E[\xi_{ij}\xi_{ij}^T] + \text{smaller terms}$$
$$\simeq \frac{1}{n^3}\sum_{i=1}^n\sum_{j\ne i}E[K_h'(\varepsilon_i-\varepsilon_j)]^2\,G_i^2\,x_ix_i^T f(\varepsilon_i)^{-2} + \text{smaller terms}$$
$$\simeq V_x\,\frac{1}{n^3}\sum_{i=1}^n\sum_{j\ne i}\int\!\!\int h^{-4}K'\!\left(\frac{\varepsilon_i-\varepsilon_j}{h}\right)^2 f(\varepsilon_i)^{-2}f(\varepsilon_i)f(\varepsilon_j)\,d\varepsilon_i\,d\varepsilon_j$$
$$= V_x\,\frac{1}{n^3}\sum_{i=1}^n\sum_{j\ne i}h^{-3}\int\!\!\int K'(u)^2 f(\varepsilon_i)^{-1}f(\varepsilon_i-uh)\,du\,d\varepsilon_i
= O\!\left(\frac{1}{nh^3}\right).$$
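The $O(1/(nh^3))$ order of $E[M_2M_2^T]$ reflects the standard fact that $\mathrm{Var}(\hat f'(x)) \approx f(x)\int K'(u)^2\,du/(nh^3)$. A quick Monte Carlo sketch illustrates this; the Gaussian kernel and standard normal errors are assumptions of the example, and for the Gaussian kernel $\int K'(u)^2\,du = 1/(4\sqrt\pi)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, reps = 500, 0.3, 400
x = 0.0

# fhat'(x) = (1/(n h^2)) sum_j K'((x - eps_j)/h) with Gaussian K,
# where K'(u) = -u * exp(-u^2/2)/sqrt(2*pi).
est = np.empty(reps)
for r in range(reps):
    eps = rng.standard_normal(n)
    u = (x - eps) / h
    est[r] = np.sum(-u * np.exp(-u**2 / 2)) / (n * h**2 * np.sqrt(2 * np.pi))

f0 = 1 / np.sqrt(2 * np.pi)          # f(0) for N(0,1)
RKp = 1 / (4 * np.sqrt(np.pi))       # int K'(u)^2 du for the Gaussian kernel
theory = f0 * RKp / (n * h**3)
print(np.var(est) / theory)          # ratio should be near 1
```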

The magnitudes of the other terms follow by straightforward moment calculations, because the random variables are U-statistics, of various orders, whose moments exist. For example, for

$$M_3 = \frac{1}{\sqrt n}\sum_{i=1}^n \frac{f'(\varepsilon_i)}{f(\varepsilon_i)^2}\,x_iV_iG_i,$$

note that this is of the form $\sum\sum_{i\ne j}w_n(Z_i, Z_j)$, where

$$w_n(Z_i, Z_j) = \frac{1}{nh\sqrt n}\,\frac{f'(\varepsilon_i)}{f(\varepsilon_i)^2}\,x_iG_i\left[K\!\left(\frac{\varepsilon_i-\varepsilon_j}{h}\right) - E_iK\!\left(\frac{\varepsilon_i-\varepsilon_j}{h}\right)\right],$$

which satisfies $E_iw_n(Z_i,Z_j) = 0$ and $E_jw_n(Z_i,Z_j) = 0$. Furthermore,

$$E[w_n(Z_i,Z_j)w_n(Z_i,Z_j)^T]
= \frac{1}{n^3h^2}\,V_x\,E\left\{\left(\frac{f'(\varepsilon_i)}{f(\varepsilon_i)^2}\right)^2 G_i^2\left[K\!\left(\frac{\varepsilon_i-\varepsilon_j}{h}\right) - E_iK\!\left(\frac{\varepsilon_i-\varepsilon_j}{h}\right)\right]^2\right\}$$
$$\le \frac{c}{n^3h^2}\,V_x\int\!\!\int \left(\frac{f'(\varepsilon_i)}{f(\varepsilon_i)^2}\right)^2 G_i^2\,K\!\left(\frac{\varepsilon_i-\varepsilon_j}{h}\right)^2 f(\varepsilon_i)f(\varepsilon_j)\,d\varepsilon_i\,d\varepsilon_j$$
$$= \frac{c}{n^3h}\,V_x\,E\left[\left(\frac{f'(\varepsilon_i)}{f(\varepsilon_i)}\right)^2 \frac{1}{f(\varepsilon_i)}\,G_i^2\right]\int K(u)^2\,du\,\{1+o(1)\}
= O\!\left(\frac{1}{n^3h}\right)\times b^{-1/\nu},$$

where the last line follows because

$$\int_{b^{1/\nu}}^\infty \varepsilon^{-2}\,d\varepsilon = O(b^{-1/\nu}).$$

And thus $M_3 = o_p(n^{-1/2}h^{-3/2})$. Similarly, it can be verified that the rest of the terms are of order $o_p(n^{-1/2}h^{-3/2})$ or $o_p(h^q)$.

For $R_{n2}$, by the Cauchy–Schwarz inequality, we obtain that

$$\|R_{n2}\|^2 \le c\left\{\max_{1\le i\le n}\frac{|\hat f_i - f_i|^6}{f_i^6}\,\tilde G_i\right\}^{1/2}
\left\|\frac 1n\sum_{i=1}^n \left\{\frac{\hat f'(\varepsilon_i)}{\hat f(\varepsilon_i)}\right\}^2 x_ix_i^T\,\tilde G_i\right\|^{1/2},$$

where $c$ is a constant, and we have

$$\left\|\frac 1n\sum_{i=1}^n \left\{\frac{\hat f'(\varepsilon_i)}{\hat f(\varepsilon_i)}\right\}^2 x_ix_i^T\,\tilde G_i\right\| = O_p(1),$$

$$\max_{1\le i\le n}\frac{|\hat f_i - f_i|^6}{f_i^6}\,\tilde G_i = O_p(b^{-6}h^{6q} + b^{-6}n^{-3}h^{-3}(\log n)^6).$$

Thus $\|R_{n2}\|^2$ is of order $O_p(h^{3q}b^{-3} + n^{-3/2}h^{-3/2}b^{-3}(\log n)^3)$ and, under Assumption A6, is of smaller order of magnitude than $O_p(h^q + n^{-1/2}h^{-3/2})$.

We now turn to $\Delta_H = \hat I_n(\beta_0) - I$. Notice that

$$\hat I_n(\beta_0) = -\frac 1n\sum_{i=1}^n x_i\,G_b(\hat f_i)\left[\frac{\partial\hat f'(\varepsilon_i)/\partial\beta}{\hat f_i}
- \frac{\hat f'(\varepsilon_i)\,\partial\hat f(\varepsilon_i)/\partial\beta}{\hat f(\varepsilon_i)^2}\right]\Bigg|_{\beta=\beta_0}.$$

Expanding each of the estimates in the square brackets to the third term, writing $\hat f_i = f_i + B_i + V_i$ and, at intermediate points, $\hat f_i' = \bar f_i' + \bar B_i' + \hat V_i'$ and $\hat f_i'' = \bar f_i'' + \bar B_i'' + \hat V_i''$ (here $B$ denotes the bias part and $V$ the stochastic part of the kernel estimates, and bars denote evaluation at intermediate points), then collecting terms, dropping higher order terms, and writing $f_i = f(\varepsilon_i)$, etc., for convenience, we have

$$\hat I_n(\beta_0) = -\frac 1n\sum_{i=1}^n x_ix_i^T\left[\frac{f_i''}{f_i} - \left\{\frac{f_i'}{f_i}\right\}^2\right] + \sum_{\ell=0}^{16}Z_\ell
= \bar I_n + \sum_{\ell=0}^{16}Z_\ell,$$

where $Z_0$ is the difference produced by evaluating the leading term at the intermediate values $(\bar f_i', \bar f_i'')$ rather than at $(f_i', f_i'')$; the terms that are linear in the bias and stochastic parts are

$$Z_1 = -\frac 1n\sum_{i=1}^n x_ix_i^T\,\tilde G_i\left[\frac{\bar B_i''}{f_i} - \frac{\bar f_i''B_i}{f_i^2} - \frac{f_i'\bar B_i'}{f_i^2} - \frac{\bar f_i'B_i'}{f_i^2} + \frac{2f_i'\bar f_i'B_i}{f_i^3}\right],$$

$$Z_2 = -\frac 1n\sum_{i=1}^n x_ix_i^T\,\tilde G_i\left[\frac{\hat V_i''}{f_i} - \frac{\bar f_i''V_i}{f_i^2} - \frac{f_i'\hat V_i'}{f_i^2} - \frac{\bar f_i'V_i'}{f_i^2} + \frac{2f_i'\bar f_i'V_i}{f_i^3}\right];$$

the terms that are quadratic in $B_i$ are

$$Z_3 = \frac 1n\sum_{i=1}^n x_ix_i^T\,\tilde G_i\,\frac{\bar B_i''B_i}{f_i^2}, \qquad
Z_4 = \frac 1n\sum_{i=1}^n x_ix_i^T\,\tilde G_i\,\frac{B_i'\bar B_i'}{f_i^2}, \qquad
Z_5 = -\frac 1n\sum_{i=1}^n x_ix_i^T\,\tilde G_i\,\frac{\bar f_i''B_i^2}{f_i^3},$$

$$Z_6 = -\frac 2n\sum_{i=1}^n x_ix_i^T\,\tilde G_i\,\frac{(f_i'\bar B_i' + \bar f_i'B_i')B_i}{f_i^3}, \qquad
Z_7 = \frac 3n\sum_{i=1}^n x_ix_i^T\,\tilde G_i\,\frac{f_i'\bar f_i'B_i^2}{f_i^4};$$

$Z_8$, $Z_9$, and $Z_{10}$ are the analogous $B_iV_i$ cross-product terms (built from $\bar B_i''V_i/f_i^2 + \hat V_i''B_i/f_i^2$, $B_i'\hat V_i'/f_i^2 + V_i'\bar B_i'/f_i^2$, $\bar f_i''B_iV_i/f_i^3$, $(f_i'\hat V_i' + \bar f_i'V_i')B_i/f_i^3$, and $6f_i'\bar f_i'B_iV_i/f_i^4$); $Z_{11}$ through $Z_{15}$ are the corresponding terms that are quadratic in $V_i$ (built from $\hat V_i''V_i/f_i^2$, $V_i'\hat V_i'/f_i^2$, $\bar f_i''V_i^2/f_i^3$, $(f_i'\hat V_i' + \bar f_i'V_i')V_i/f_i^3$, and $f_i'\bar f_i'V_i^2/f_i^4$); and the trimming term is

$$Z_{16} = \frac 1n\sum_{i=1}^n x_ix_i^T\left[\frac{f_i''}{f_i} - \left\{\frac{f_i'}{f_i}\right\}^2\right]\{1 - \tilde G_i\}.$$

Combining the expansions for $\bar s_n(\beta_0)$ and $\hat I_n$, notice that $\bar I_n = I + O_p(n^{-1/2})$. Dropping higher order terms, we get

$$\sqrt n(\hat\beta - \beta_0) = I^{-1}\sum_{j=0}^{11}M_j + I^{-1}\left(\sum_{\ell=0}^{16}Z_\ell\right)I^{-1}\sum_{j=0}^{11}M_j$$
$$= I^{-1}M_0 + I^{-1}M_1 + I^{-1}M_{11} + I^{-1}M_2 + I^{-1}Z_1I^{-1}M_0 + I^{-1}Z_{16}I^{-1}M_0 + o_p(\delta_n)$$
$$= X_0 + T + h_n^qB + \frac{1}{\sqrt{nh_n^3}}\,V + o_p(\delta_n), \tag{A.17}$$

where $\delta_n = \max\{h_n^q, n^{-1/2}h_n^{-3/2}\}$ and

$$X_0 = I^{-1}M_0, \qquad T = I^{-1}M_{11} + I^{-1}Z_{16}I^{-1}M_0,$$
$$B = h_n^{-q}\{I^{-1}M_1 + I^{-1}\bar Z_1I^{-1}M_0\}, \qquad V = n^{1/2}h_n^{3/2}I^{-1}M_2,$$

where $\bar Z_1 = E(Z_1)$; (A.17) follows because $Z_1 = \bar Z_1 + O_p(h_n^qn^{-1/2})$ by a central limit theorem for independent random variables. Furthermore, note that the term $B$, being a sum of mean-zero independent random variables, satisfies a central limit theorem. For the trimming effect $T$, as we showed in the expansion of the score function, the first term, $I^{-1}M_{11}$, is of order $O_p(b^{(\nu-1)/2\nu})$. By a similar argument, it can be verified that $I^{-1}Z_{16}I^{-1}M_0$ is $O_p(b^{(\nu-1)/\nu})$ and thus of smaller order of magnitude than the leading term. Writing $t = T + h_n^qB + (nh_n^3)^{-1/2}V$ for the second-order term, we have $E(t - T) = 0$ and

$$E(t-T)(t-T)^T = h_n^{2q}E(BB^T) + \frac{1}{nh_n^3}E(VV^T)$$
$$= I^{-1}E(M_1M_1^T)I^{-1} + I^{-1}\bar Z_1I^{-1}E(M_0M_0^T)I^{-1}\bar Z_1^TI^{-1}
+ 2I^{-1}E(M_1M_0^T)I^{-1}\bar Z_1^TI^{-1} + I^{-1}E(M_2M_2^T)I^{-1}.$$
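This decomposition makes the second-order mean squared error behave like $a\,h^{2q} + c/(nh^3)$, whose minimizer satisfies $h^{2q+3} = 3c/(2qan)$. A small sketch (the constants $a$ and $c$ are placeholders standing in for the bias and variance functionals, not values from the paper) checks the closed form against a grid search.

```python
import numpy as np

def mse(h, a, c, n, q):
    """Second-order MSE proxy: squared-bias term plus variance term."""
    return a * h ** (2 * q) + c / (n * h ** 3)

a, c, n, q = 2.0, 0.5, 10_000, 2          # illustrative constants
# First-order condition: 2q*a*h^(2q-1) = 3c/(n*h^4)  =>  h^(2q+3) = 3c/(2q*a*n)
h_star = (3 * c / (2 * q * a * n)) ** (1 / (2 * q + 3))

grid = np.linspace(0.5 * h_star, 2 * h_star, 100001)
h_grid = grid[np.argmin(mse(grid, a, c, n, q))]
print(h_star, h_grid)   # grid minimizer matches the closed form
```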

Previous calculations give that $E(M_1M_1^T) = h_n^{2q}\mathcal M_1 + o(h_n^{2q})$. Also, we have $E(M_0M_0^T) = I$ and $E(M_1M_0^T) = h_n^q\mathcal M_2 + o(h_n^q)$. Finally,

$$E(Z_1) = -E\,\frac 1n\sum_{i=1}^n x_ix_i^T\,\tilde G_i\left[\frac{\bar B_i''}{f_i} - \frac{\bar f_i''B_i}{f_i^2} - \frac{f_i'\bar B_i'}{f_i^2} - \frac{\bar f_i'B_i'}{f_i^2} + \frac{2f_i'\bar f_i'B_i}{f_i^3}\right]$$
$$= -h_n^q\mu_q(K)\,V_x\,E\left\{\frac{f^{(q+2)}(\varepsilon)}{f(\varepsilon)} + 2\,\frac{f'(\varepsilon)^2f^{(q)}(\varepsilon)}{f(\varepsilon)^3}
- 2\,\frac{f'(\varepsilon)f^{(q+1)}(\varepsilon)}{f(\varepsilon)^2} - \frac{f^{(2)}(\varepsilon)f^{(q)}(\varepsilon)}{f(\varepsilon)^2}\right\} + o(h_n^q)
= -h_n^q\mu_q(K)\,V_xD_q + o(h_n^q),$$

and

$$E[M_2M_2^T] \simeq V_x\,\frac 1{n^3}\sum_{i=1}^n\sum_{j\ne i}h^{-3}\int\!\!\int K'(u)^2f(\varepsilon_i)^{-1}f(\varepsilon_i-uh)\,du\,d\varepsilon_i
= \frac 1{nh^3}\,V_x\,Ef(\varepsilon)^{-1}\int K'(u)^2\,du.$$

We now apply de Jong's (1987) central limit theorem for degenerate weighted U-statistics to the scalar quantity $c^TM_2$ for any vector $c$, and the result follows by an application of the Cramér–Wold device. ∎

Proof of Theorem 2. For the linear hypothesis $H_0: c^T\beta = c_0$, the corresponding t-statistic is

$$\hat t = \frac{c^T\hat\beta - c_0}{\mathrm{se}(c^T\hat\beta)} = \frac{\sqrt n\,(c^T\hat\beta - c_0)}{\sqrt{c^T\hat I_n^{-1}c}},$$

where $\hat I_n = -\bar s'(\hat\beta)$ is the estimator of the information matrix. Under the null hypothesis that $c^T\beta = c_0$,

$$\hat t = \frac{c^T\sqrt n\,(\hat\beta - \beta)}{\sqrt{c^T\hat I_n^{-1}c}}. \tag{A.18}$$

Under our conditions, $c^T\hat I_n^{-1}c = c^T[-\bar s'(\beta_0)]^{-1}c + O_p(n^{-1/2})$. Because $\bar I = -\bar s'(\beta_0) = I + O_p(n^{-1/2})$, the expansions of $[c^T\hat I_n^{-1}c]^{-1/2}$ and $\sqrt n(\hat\beta - \beta)$ can be written as

$$[c^T\hat I_n^{-1}c]^{-1/2} = [c^TI^{-1}c]^{-1/2} - \tfrac 12[c^TI^{-1}c]^{-3/2}c^TI^{-1}\Delta_HI^{-1}c + \text{higher order terms} \tag{A.19}$$

and

$$\sqrt n(\hat\beta - \beta) = [I^{-1} + I^{-1}\Delta_HI^{-1}]\sqrt n\,\bar s(\beta_0) + \text{higher order terms}, \tag{A.20}$$

where $\Delta_H$ is defined by (11). The expansions of $\Delta_H$ and $\sqrt n\,\bar s(\beta_0)$ are given in the proof of Theorem 1. Substituting (A.19) and (A.20) into (A.18), it can be verified that, after dropping higher order terms,

$$\hat t = [c^TI^{-1}c]^{-1/2}c^TI^{-1}\left(\sum_{i=0}^{11}M_i\right)
+ [c^TI^{-1}c]^{-1/2}c^TI^{-1}\left(\sum_{i=1}^{16}Z_i\right)I^{-1}\left(\sum_{i=0}^{11}M_i\right)$$
$$\quad - \tfrac 12[c^TI^{-1}c]^{-3/2}\left[c^TI^{-1}\left(\sum_{i=1}^{16}Z_i\right)I^{-1}c\right]c^TI^{-1}\left(\sum_{i=0}^{11}M_i\right) + \text{higher order terms}$$
$$= [c^TI^{-1}c]^{-1/2}c^TI^{-1}\sqrt n\,\bar s(\beta_0)
+ [c^TI^{-1}c]^{-1/2}c^TI^{-1}M_{11}
+ [c^TI^{-1}c]^{-1/2}c^TI^{-1}Z_{16}I^{-1}M_0$$
$$\quad - \tfrac 12[c^TI^{-1}c]^{-3/2}[c^TI^{-1}Z_{16}I^{-1}c]\,c^TI^{-1}M_0
+ [c^TI^{-1}c]^{-1/2}c^TI^{-1}M_1
+ [c^TI^{-1}c]^{-1/2}c^TI^{-1}Z_1I^{-1}M_0$$
$$\quad - \tfrac 12[c^TI^{-1}c]^{-3/2}c^TI^{-1}Z_1I^{-1}c\,c^TI^{-1}M_0
+ [c^TI^{-1}c]^{-1/2}c^TI^{-1}M_2 + \text{higher order terms},$$

where the $M_j$ and $Z_j$ are defined in the proof of Theorem 1. Thus the t-statistic can be expanded as the sum of a leading term, $[c^TI^{-1}c]^{-1/2}c^TI^{-1}\sqrt n\,\bar s(\beta_0)$, and

$$t_t = [c^TI^{-1}c]^{-1/2}c^TI^{-1}M_{11} + [c^TI^{-1}c]^{-1/2}c^TI^{-1}Z_{16}I^{-1}M_0
- \tfrac 12[c^TI^{-1}c]^{-3/2}[c^TI^{-1}Z_{16}I^{-1}c]\,c^TI^{-1}M_0$$
$$\quad + [c^TI^{-1}c]^{-1/2}c^TI^{-1}M_1 + [c^TI^{-1}c]^{-1/2}c^TI^{-1}Z_1I^{-1}M_0
- \tfrac 12[c^TI^{-1}c]^{-3/2}c^TI^{-1}Z_1I^{-1}c\,c^TI^{-1}M_0
+ [c^TI^{-1}c]^{-1/2}c^TI^{-1}M_2.$$

The term $t_t$ can be decomposed into a trimming effect

$$t_r = [c^TI^{-1}c]^{-1/2}c^TI^{-1}M_{11} + [c^TI^{-1}c]^{-1/2}c^TI^{-1}Z_{16}I^{-1}M_0
- \tfrac 12[c^TI^{-1}c]^{-3/2}[c^TI^{-1}Z_{16}I^{-1}c]\,c^TI^{-1}M_0$$

and a nonparametric estimation bias and variance effect,

$$[c^TI^{-1}c]^{-1/2}c^TI^{-1}M_1 + [c^TI^{-1}c]^{-1/2}c^TI^{-1}Z_1I^{-1}M_0
- \tfrac 12[c^TI^{-1}c]^{-3/2}c^TI^{-1}Z_1I^{-1}c\,c^TI^{-1}M_0 + [c^TI^{-1}c]^{-1/2}c^TI^{-1}M_2,$$

which is mean zero, and it can be verified that

$$E(t_t - t_r)^2 = h^{2q}\Big\{[c^TI^{-1}c]^{-1}(c^TI^{-1}\mathcal M_1I^{-1}c)
+ [c^TI^{-1}c]^{-1}(c^TI^{-1}\mathcal M_3I^{-1}\mathcal M_3I^{-1}c)
- \tfrac 34[c^TI^{-1}c]^{-2}(c^TI^{-1}\mathcal M_3I^{-1}c)^2$$
$$\quad + 2[c^TI^{-1}c]^{-1}(c^TI^{-1}\mathcal M_2I^{-1}\mathcal M_3I^{-1}c)
- [c^TI^{-1}c]^{-2}(c^TI^{-1}\mathcal M_2I^{-1}c)(c^TI^{-1}\mathcal M_3I^{-1}c)\Big\}
+ n^{-1}h^{-3}[c^TI^{-1}c]^{-1}(c^TI^{-1}\mathcal S_1I^{-1}c),$$

giving the result in Theorem 2. ∎

Proof of Theorem 3. The steps of the expansion for the adaptive estimator of models (1) and (17) are very similar to those in the previous section. Notice that $x_i^*$ and $\varepsilon_j$ are i.i.d. and mutually independent, and the analysis of the $x_i^*$ part is basically the same as that given earlier in this Appendix. The only substantive difference from the previous section lies in the part $\sum_{k=1}^\infty C_k\varepsilon_{t-k}$, which has serial correlation. Specifically, the expansion (A.17) holds with the same included terms and the same magnitude of approximation error. The two main differences arise in the properties of $M_1$ and $M_2$. Note that $(f'(\varepsilon_t)/f(\varepsilon_t))x_t$ is a martingale difference sequence because

$$E\left[\frac{f'(\varepsilon_t)}{f(\varepsilon_t)}\right] = 0$$

for any regular density $f$. Therefore,

$$E(M_0M_0^T) = \frac 1n\,E\sum_t\left[\frac{f'(\varepsilon_t)}{f(\varepsilon_t)}\,x_t\right]^{\otimes 2} = I(f)\,E(x_tx_t^T)$$

by stationarity and the assumption about the process $x$. Furthermore, $Z_1 = E(Z_1) + o_p(n^{-1/2})$, just as in the i.i.d. case. Note that

$$M_1 = \frac{1}{\sqrt n}\sum_{i=1}^n\left\{\frac{B_i'}{f_i} - \frac{f_i'B_i}{f_i^2}\right\}x_iG_i\,\{1+o(1)\}
= h_n^q\mu_q(K)\,\frac{1}{\sqrt n}\sum_{i=1}^n \eta(\varepsilon_i)\,x_iG_i\,\{1+o(1)\},$$

where

$$\eta(\varepsilon_i) = \frac{f^{(q+1)}(\varepsilon_i)}{f(\varepsilon_i)} - \frac{f'(\varepsilon_i)f^{(q)}(\varepsilon_i)}{f^2(\varepsilon_i)},$$

so that $\eta(\varepsilon_i)x_iG_i$ is not a martingale difference sequence, because $E[\eta(\varepsilon_i)] \ne 0$. However, when $f$ is symmetric about zero it is a martingale difference sequence, because $\eta$ is then an odd function. In any case, $E(M_1) = 0$, and $M_1$ satisfies a central limit theorem for mixing processes. Indeed,

$$E(M_1M_1^T) = h_n^{2q}\mu_q^2(K)\,E\left[\frac{1}{\sqrt n}\sum_{i=1}^n \eta(\varepsilon_i)x_iG_i\right]^{\otimes 2}
\simeq h_n^{2q}\mu_q^2(K)\,E\left[\frac 1n\sum_{i,k=1}^n \eta(\varepsilon_i)x_i\,\eta(\varepsilon_k)x_k^T\right],$$

where

$$E\left[\frac 1n\sum_{i=1}^n \eta^2(\varepsilon_i)x_ix_i^T\right] = E[\eta^2(\varepsilon_i)]\,V_x,$$

$$E\left[\frac 1n\sum\sum_{i\ne k}\eta(\varepsilon_i)x_i\,\eta(\varepsilon_k)x_k^T\right]
= \frac 2n\sum_{i=1}^{n-1}\sum_{r=1}^{n-i}E[\eta(\varepsilon_i)x_i\,\eta(\varepsilon_{i+r})x_{i+r}^T]$$
$$= \frac 2n\sum_{i=1}^{n-1}\sum_{r=1}^{n-i}\sum_{j=1}^\infty\sum_{\ell=1}^\infty C_jC_\ell^T\,
E[\eta(\varepsilon_i)\varepsilon_{i-j}\,\eta(\varepsilon_{i+r})\varepsilon_{i+r-\ell}]$$
$$= \frac 2n\sum_{i=1}^{n-1}\sum_{r=1}^{n-i}\sum_{\ell=1}^\infty C_{r+\ell}C_\ell^T\,
E[\eta(\varepsilon_i)]\,E[\eta(\varepsilon_{i+r})]\,E[\varepsilon_{i-\ell}^2]$$
$$= 2E^2[\eta(\varepsilon_i)]\,\sigma_\varepsilon^2\,\frac 1n\sum_{i=1}^{n-1}\sum_{r=1}^{n-i}\sum_{\ell=1}^\infty C_{r+\ell}C_\ell^T
= 2E^2[\eta(\varepsilon)]\,\sigma_\varepsilon^2\sum_{r=1}^\infty\sum_{\ell=1}^\infty C_{r+\ell}C_\ell^T + o(1).$$

Similarly, we can show that

$$E(M_1M_0^T) = \frac{h_n^q}{n}\,\mu_q(K)\sum_{i=1}^n\sum_{k=1}^n E\left[\eta(\varepsilon_i)\,\frac{f'(\varepsilon_k)}{f(\varepsilon_k)}\,x_iG_i\,x_k^TG_k\right]\{1+o(1)\}
= h_n^q\mu_q(K)\,V_x\,E\left\{\left[\frac{f^{(q+1)}(\varepsilon)}{f(\varepsilon)} - \frac{f'(\varepsilon)f^{(q)}(\varepsilon)}{f(\varepsilon)^2}\right]\frac{f'(\varepsilon)}{f(\varepsilon)}\right\} + o(h_n^q).$$

The second main departure is in the term

$$M_2 = \frac{1}{(n-1)\sqrt n}\sum\sum_{j\ne i}\xi_{ij}
= \frac{1}{(n-1)\sqrt n}\sum_{j\ne i}\{K_h'(\varepsilon_i-\varepsilon_j) - E_iK_h'(\varepsilon_i-\varepsilon_j)\}\,x_iG_i/f(\varepsilon_i).$$

Note that $M_2$ is not necessarily mean zero, because when $j < i$, $K_{h_n}'(\varepsilon_i-\varepsilon_j)$ can be correlated with $x_i$. However,

$$E(M_2) = \frac{1}{(n-1)\sqrt n}\sum_{i=2}^n\sum_{j=1}^{i-1}E[K_h'(\varepsilon_i-\varepsilon_j)\,x_iG_i/f(\varepsilon_i)]$$
$$= \frac{1}{(n-1)\sqrt n}\sum_{i=2}^n\sum_{j=1}^{i-1}\sum_{\ell=1}^\infty C_\ell\,E[K_h'(\varepsilon_i-\varepsilon_j)\,\varepsilon_{i-\ell}/f(\varepsilon_i)]$$
$$= \frac{1}{(n-1)\sqrt n}\sum_{i=2}^n\sum_{j=1}^{i-1}C_{i-j}\,E[K_h'(\varepsilon_i-\varepsilon_j)\,\varepsilon_j/f(\varepsilon_i)]$$
$$= \frac{1}{(n-1)\sqrt n}\sum_{i=2}^n\sum_{k=1}^{n-i}C_k\,E[f'(\varepsilon_i)\varepsilon_i/f(\varepsilon_i) + O(h_n^q)]
= O(n^{-1/2}).$$

Furthermore, $\mathrm{var}(M_2)$ is the same as for the associated i.i.d. sequence (see Fan and Li, 1996). Let $M_2^* = M_2 - E(M_2)$, where

$$M_2^* = \frac{1}{(n-1)\sqrt n}\sum_{i=1}^n\sum_{j\ne i}\xi_{ij}^*,$$

and $\xi_{ij}^* = \xi_{ij} - E(\xi_{ij})$. Therefore, it can be verified that the leading term in

$$E(M_2^*M_2^{*T}) = \frac{1}{(n-1)^2n}\sum_{i=1}^n\sum_{j\ne i}\sum_{\ell=1}^n\sum_{s\ne\ell}E(\xi_{ij}^*\xi_{\ell s}^{*T})$$

is given as follows:

$$\frac{1}{(n-1)^2n}\sum_{i=1}^n\sum_{j\ne i}E\left\{[K_h'(\varepsilon_i-\varepsilon_j)]^2
\left(\sum_{k=1}^\infty C_k\varepsilon_{i-k}\right)\left(\sum_{s=1}^\infty C_s^T\varepsilon_{i-s}\right)f(\varepsilon_i)^{-2}\right\}$$
$$= \frac{1}{(n-1)^2n}\sum_{i=1}^n\sum_{j\ne i}\sum_{k=1}^\infty
E\{[K_h'(\varepsilon_i-\varepsilon_j)]^2\,C_kC_k^T\varepsilon_{i-k}^2\,f(\varepsilon_i)^{-2}\}.$$

In this expectation, if $j \ne i-k$, we get

$$\sigma_\varepsilon^2\left(\sum_{k=1}^\infty C_kC_k^T\right)\frac{1}{(n-1)^2n}\sum_{i=1}^n\sum_{j\ne i}E[K_{h_n}'(\varepsilon_i-\varepsilon_j)]^2f(\varepsilon_i)^{-2},$$

and for $j = i-k$ there is correlation between $\varepsilon_{i-k}$ and $K_h'(\varepsilon_i-\varepsilon_j)$, and we get

$$\frac{1}{(n-1)^2n}\sum_{i=1}^n\sum_{j\ne i}E\{[K_h'(\varepsilon_i-\varepsilon_j)]^2\,C_{i-j}C_{i-j}^T\varepsilon_j^2\}\,f(\varepsilon_i)^{-2}.$$

For the first term,

$$\sigma_\varepsilon^2\left(\sum_{k=1}^\infty C_kC_k^T\right)\frac{1}{(n-1)^2n}\sum_{i=1}^n\sum_{j\ne i}E[K_{h_n}'(\varepsilon_i-\varepsilon_j)]^2f(\varepsilon_i)^{-2}$$
$$= \sigma_\varepsilon^2\left(\sum_{k=1}^\infty C_kC_k^T\right)\frac{1}{(n-1)^2n}\sum_{i=1}^n\sum_{j\ne i}
\int\!\!\int h^{-4}K'\!\left(\frac{\varepsilon_i-\varepsilon_j}{h}\right)^2f(\varepsilon_i)^{-2}f(\varepsilon_i)f(\varepsilon_j)\,d\varepsilon_i\,d\varepsilon_j$$
$$= \sigma_\varepsilon^2\left(\sum_{k=1}^\infty C_kC_k^T\right)\frac{1}{(n-1)^2n}\sum_{i=1}^n\sum_{j\ne i}
h^{-3}\int\!\!\int K'(u)^2f(\varepsilon_i)^{-1}f(\varepsilon_i-uh)\,du\,d\varepsilon_i$$
$$\simeq \frac{1}{nh^3}\,\sigma_\varepsilon^2\left(\sum_{k=1}^\infty C_kC_k^T\right)Ef(\varepsilon)^{-1}\int K'(u)^2\,du.$$

Notice that $\sum_i\sum_{j\ne i}C_{i-j}C_{i-j}^T = O(n)$, and it follows that

$$\frac{1}{(n-1)^2n}\sum_{i=1}^n\sum_{j\ne i}E[K_h'(\varepsilon_i-\varepsilon_j)]^2\,(C_{i-j}C_{i-j}^T\varepsilon_j^2)\,f(\varepsilon_i)^{-2} = O(n^{-2}h^{-3}).$$

The central limit theorem follows from Theorem 2.1 of Fan and Li (1996). ∎
