Convergence rates for series estimators - DSpace@MIT


Working Paper
Department of Economics

CONVERGENCE RATES FOR SERIES ESTIMATORS

Whitney K. Newey

No. 93-10
July 1993

Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139

CONVERGENCE RATES FOR SERIES ESTIMATORS

Whitney K. Newey

MIT Department of Economics July, 1993

This paper consists of part of one originally titled "Consistency and Asymptotic Normality of Nonparametric Projection Estimators." Helpful comments were provided by Andreas Buja and financial support by the NSF and the Sloan Foundation.

Abstract

Least squares projections are a useful way of describing the relationship between random variables. These include conditional expectations and projections on additive functions. Series estimators, i.e. regressions on a finite dimensional vector where dimension grows with sample size, provide a convenient way of estimating such projections. This paper gives convergence rates for these estimators. General results are derived, and primitive regularity conditions given for power series and splines.

Keywords: Nonparametric regression, additive interactive models, random coefficients, polynomials, splines, convergence rates.

1. Introduction

Least squares projections of a random variable y on functions of a random vector x provide a useful way of describing the relationship between y and x. The simplest example is linear regression, the least squares projection on the set of linear combinations of x, as exemplified in Rao (1973, Chapter 4). An interesting nonparametric example is the conditional expectation, the projection on the set of all functions of x with finite mean square. There are also a variety of projections that fall in between these two polar cases, where the set of functions is larger than all linear combinations but smaller than all functions. One example is an additive regression, the projection on functions that are additive in the different elements of x. This case is motivated partly by the difficulty of estimating conditional expectations when x has many components: see Breiman and Stone (1978), Breiman and Friedman (1985), Friedman and Stuetzle (1981), Stone (1985), and Zeldin and Thomas (1977). A generalization that includes some interaction terms is the projection on functions that are additive in some subvectors of x. Another example is random linear combinations of functions of x, as suggested by Riedel (1992) for growth curve estimation.

One simple way to estimate nonparametric projections is by regression on a finite dimensional subset, with dimension allowed to grow with the sample size, e.g. as in Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988), and Andrews (1991), which will be referred to here as series estimation. This type of estimator may not be good at recovering the "fine structure" of the projection relative to other smoothers, e.g. see Buja, Hastie, and Tibshirani (1989), but is computationally simple. Also, projections often show up as nuisance functions in semiparametric estimation, where the fine structure is less important.
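The series estimation idea described above, regression on a finite set of approximating functions whose number grows with the sample size, can be sketched as follows. This is only an illustrative sketch: the simulated design, the function names `power_series_basis` and `series_estimate`, and the growth rule for K are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def power_series_basis(x, K):
    """Return the n-by-K matrix with columns 1, x, x^2, ..., x^(K-1)."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=K, increasing=True)

def series_estimate(x, y, K):
    """Least squares regression of y on the first K power series terms.

    Returns a function that evaluates the fitted projection at new points.
    """
    P = power_series_basis(x, K)
    coef, *_ = np.linalg.lstsq(P, y, rcond=None)
    return lambda x_new: power_series_basis(x_new, K) @ coef

# Hypothetical simulated data: a smooth regression function plus noise.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(2.0 * x) + 0.1 * rng.standard_normal(n)

# Illustrative growth rule for the number of terms (an assumption, not a recommendation).
K = max(2, int(round(n ** 0.25)))
g_hat = series_estimate(x, y, K)

# Average squared error of the fit over a grid, against the known true function.
grid = np.linspace(-1.0, 1.0, 201)
mse = float(np.mean((g_hat(grid) - np.sin(2.0 * grid)) ** 2))
```

The only tuning choice is K; the estimator itself is a single least squares fit, which is the computational simplicity the text refers to.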

This paper derives convergence rates for series estimators of projections. Convergence rates are important because they show how dimension affects the asymptotic accuracy of the estimators (e.g. Stone 1982, 1985). Also, they are useful for the theory of semiparametric estimators that depend on projection estimates (e.g. Newey 1993a). The paper gives mean-square rates for estimation of the projection and uniform convergence rates for estimation of functions and derivatives. Fully primitive regularity conditions are given for power series and regression splines, as well as more general conditions that may apply to other types of series.

Previous work on convergence rates for series estimates includes Agarwal and Studden (1980), Stone (1985, 1990), Cox (1988), and Andrews and Whang (1990). This paper improves on many previous results in the convergence rate or generality of regularity conditions. Uniform convergence rates for functions and their derivatives are given and some of the results allow for a data-based number of approximating terms, unlike all but Cox (1988). Also, the projection does not have to equal the conditional expectation, as in Stone (1985, 1990) but not the others.

2. Series Estimators

The results of this paper concern estimators of least squares projections that can be described as follows. Let $z$ denote a data observation, $y$ and $x$ (measurable) functions of $z$, with $x$ having dimension $r$. Let $\mathcal{G}$ denote a mean-squared closed, linear subspace of the set of all functions of $x$ with finite mean-square. The projection of $y$ on $\mathcal{G}$ is

(2.1) $g_0(x) = \arg\min_{g \in \mathcal{G}} E[\{y - g(x)\}^2]$.

An example is the conditional expectation, $g_0(x) = E[y|x]$, where $\mathcal{G}$ is the set of all measurable functions of $x$ with finite mean-square. Two further examples will be used as illustrations, and are of interest in their own right.

Additive-Interactive Projections: When ... general $x$, ... difficult to estimate $E[y|x]$, a feature often referred to as the "curse of dimensionality" ... so that the individual components have smaller dimension than $x$. One way to describe these is to let $x_\ell$, $(\ell = 1, ..., L)$ be distinct subvectors of $x$, and

$\mathcal{G} = \{\sum_{\ell=1}^{L} g_\ell(x_\ell) : E[g_\ell(x_\ell)^2] < \infty\}$.

For example, if $L = r$ and each $x_\ell$ ... additive functions. The projection on ... nonparametric nonlinearities ...

are satisfied, then

$$\int [\hat g(x) - g_0(x)]^2 \, dF(x) = O_p(K/n + K^{-2s/\mathscr{r}}).$$

The integrated mean square error result for splines that is given here has previously been derived by Stone (1990). The rest of this result is new, although Andrews and Whang (1990) give the same conclusion for the sample mean square error of power series under different hypotheses.

An implication of Theorem 4.1 is that power series will have an optimal integrated mean-square convergence rate if the number of terms is chosen randomly between certain bounds. If $s > 3\mathscr{r}/2$, $\gamma = \mathscr{r}/(2s + \mathscr{r})$, and there are $\bar C \ge \underline{C} > 0$ such that $\underline{K} = \underline{C} n^{\gamma}$ and $\bar K = \bar C n^{\gamma}$, then the mean-square convergence rate is $n^{-2s/(2s + \mathscr{r})}$, which attains Stone's (1982) bound. The side condition that $s > 3\mathscr{r}/2$ is needed to ensure that $\bar K$ satisfies Assumption 4.2. A similar side condition is present for the spline version of Stone (1990), but it has the less stringent form of $s > \mathscr{r}/2$.

Theorem 3.2 can be specialized to obtain uniform convergence rates for power series and spline estimators.

Theorem 4.2: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, then for power series

$$|\hat g - g_0|_0 = O_p(K[(K/n)^{1/2} + K^{-s/\mathscr{r}}]),$$

and for regression splines,

$$|\hat g - g_0|_0 = O_p(K^{1/2}[(K/n)^{1/2} + K^{-s/\mathscr{r}}]).$$
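The trade-off behind the mean-square rate of Theorem 4.1 can be made concrete with a small calculation. The sketch below balances the variance term K/n against the bias term K^(-2s/r), where s is read as the order of differentiability and r as the maximal dimension of the subvectors, and it checks the side conditions under the assumption that the growth restrictions are K^4/n -> 0 for power series and K^2/n -> 0 for splines. The function names are hypothetical and only illustrate the arithmetic.

```python
def optimal_growth_exponent(s, r):
    """Exponent gamma such that K = C * n**gamma equates K/n and K**(-2s/r)."""
    return r / (2.0 * s + r)

def mean_square_rate_exponent(s, r):
    """Exponent rho such that the balanced mean-square rate is n**(-rho)."""
    return 2.0 * s / (2.0 * s + r)

def side_condition_holds(s, r, series="power"):
    """Check that K = C * n**gamma is compatible with the assumed growth restriction.

    Assumed restrictions: K**4 / n -> 0 for power series, K**2 / n -> 0 for splines.
    """
    if series == "power":
        power = 4
    elif series == "spline":
        power = 2
    else:
        raise ValueError("series must be 'power' or 'spline'")
    return power * optimal_growth_exponent(s, r) < 1.0

# Example: one-dimensional components (r = 1) with two derivatives (s = 2).
gamma = optimal_growth_exponent(2, 1)
rho = mean_square_rate_exponent(2, 1)
```

With these conventions the power series check reduces to s > 3r/2 and the spline check to s > r/2, matching the side conditions discussed in the text.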

Obtaining uniform convergence rates for derivatives is more difficult, because approximation rates are difficult to find in the literature. When the argument of each function is only one dimensional, an approximation rate follows by a simple integration argument (e.g. see Lemma A.12 in the Appendix). This approach leads to the following convergence rate for the one-dimensional (i.e. additive model) case.

Theorem 4.3: If Assumptions 3.1 and 4.1-4.3 are satisfied, $\mathscr{r} = 1$, $d < s$, and $p^K(x)$ is a power series or a regression spline with $m \ge s - d$, then for power series,

$$|\hat g - g_0|_d = O_p(K^{1+2d}\{[K/n]^{1/2} + K^{-s+d}\}),$$

and for splines

$$|\hat g - g_0|_d = O_p(K^{(1/2)+d}\{[K/n]^{1/2} + K^{-s+d}\}).$$

In the case of power series, it is possible to obtain an approximation rate by a Taylor expansion argument when the derivatives do not grow too fast with their order. The rate is faster than any power of $K$, leading to the following result.

Theorem 4.4: If Assumptions 3.1 and 4.1-4.3 are satisfied, $p^K(x)$ is a power series, and there is a constant $C$ such that for each multi-index $\lambda$ the $\lambda$ partial derivative of each additive component of $g_0(x)$ exists and is bounded by $C^{|\lambda|}$, then for any positive integers $\alpha$ and $d$,

$$|\hat g - g_0|_d = O_p(K^{1+2d}\{[K/n]^{1/2} + K^{-\alpha}\}).$$

The uniform convergence rates are not optimal in the sense of Stone (1982), but they improve on existing results.

For the one regressor, power series case Theorem 4.2

improves on Cox's (1988) rate of

(K ).

For the other cases there do

not seem to be any existing results in the literature, so that Theorems 4.2 - 4.4 give

the only uniform convergence rates available.

It

would be interesting to obtain further

improvements on these results, and investigate the possibility of attaining optimal uniform convergence rates for series estimators of additive interactive models.
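To see how the uniform rates of this section behave numerically, the following sketch evaluates bounds of the form zeta_d(K) * [(K/n)^(1/2) + K^(-alpha)], ignoring constants. It assumes zeta_d(K) = K^(1+2d) for power series and K^((1/2)+d) for splines, with alpha = s/r for the function itself (d = 0) and alpha = s - d for derivatives in the one-dimensional case. The function names and the grid search over K are hypothetical illustrations, not part of the paper.

```python
def uniform_rate_bound(n, K, s, d=0, r=1, series="power"):
    """Evaluate zeta_d(K) * [(K/n)**0.5 + K**(-alpha)], up to constants."""
    if d > 0 and r != 1:
        raise ValueError("derivative rates are sketched only for r = 1")
    if series == "power":
        zeta = float(K) ** (1 + 2 * d)
    elif series == "spline":
        zeta = float(K) ** (0.5 + d)
    else:
        raise ValueError("series must be 'power' or 'spline'")
    # Approximation exponent: s/r for the function, s - d for its d-th derivative.
    alpha = s / r if d == 0 else s - d
    return zeta * ((K / n) ** 0.5 + float(K) ** (-alpha))

def best_K(n, s, d=0, r=1, series="power", K_max=200):
    """Grid search for the K in 1..K_max that minimizes the bound."""
    return min(range(1, K_max + 1),
               key=lambda K: uniform_rate_bound(n, K, s, d, r, series))

# Example: n = 10,000 observations, s = 2, one-dimensional components.
k_power = best_K(10_000, s=2, series="power")
k_spline = best_K(10_000, s=2, series="spline")
```

Because the spline factor K^(1/2) is smaller than the power series factor K, the spline bound is smaller at any given K > 1 and is minimized at a larger K in this example.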

5. Covariate Interactive Projections.

Estimation of random coefficient projections provides a second example of how the general results of Section 3 can be applied to specific estimators. This Section gives convergence rates for power series and regression spline estimators of projections on the set $\mathcal{G}$ described in equation (2.3). For simplicity, results will be restricted to mean-square and uniform convergence rates for the function, but not for its derivatives. Also, the $\mathcal{H}_\ell$ in equation (2.3) will each be taken equal to the set of all functions of $u$ with finite mean-square. Convergence rates can be derived under the following analog to the conditions of Section 4.

Assumption 5.1: i) $u$ is continuously distributed with a support that is a cartesian product of compact intervals, and bounded density that is also bounded away from zero. ii) $\underline{K}(n) = \bar K(n) = K$, $K$ is restricted to be a multiple of $L$, and $p^K(x) = w \otimes p^{K/L}(u)$, where either a) $p^{K/L}(u)$ is a power series with $K^4/n \to 0$, or b) $p^{K/L}(u)$ are splines, the support of $u$ is $[-1,1]^r$, and $K^2/n \to 0$. iii) Each of the components $h_{\ell 0}(u)$, $(\ell = 1, ..., L)$, is continuously differentiable of order $s$ on the support of $u$. iv) $w$ is bounded, and $E[ww'|u]$ has smallest eigenvalue that is bounded away from zero on the support of $u$.
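The regressor structure in Assumption 5.1, where the approximating vector is the Kronecker product of w with a series in u, can be sketched as follows. This is an illustrative sketch under stated assumptions: u is scalar, the series in u is a power series, and the simulated coefficient functions, variable names, and the helper `interactive_basis` are all hypothetical.

```python
import numpy as np

def interactive_basis(w, u, terms_per_coef):
    """Rows are w_i (Kronecker) p(u_i), with p(u) = (1, u, ..., u^(terms_per_coef - 1))."""
    w = np.asarray(w, dtype=float)                          # shape (n, L)
    u = np.asarray(u, dtype=float)                          # shape (n,)
    p_u = np.vander(u, N=terms_per_coef, increasing=True)   # shape (n, K/L)
    # Row-wise Kronecker product: column l * (K/L) + j holds w[:, l] * p_u[:, j].
    return np.einsum("il,ij->ilj", w, p_u).reshape(len(u), -1)

# Hypothetical random coefficient design: y = h_1(u) * w_1 + h_2(u) * w_2 + noise.
rng = np.random.default_rng(1)
n, L, m = 1000, 2, 3                       # m plays the role of K / L
u = rng.uniform(-1.0, 1.0, size=n)
w = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
h1 = 1.0 + u                               # coefficient function on w_1
h2 = u ** 2                                # coefficient function on w_2
y = h1 * w[:, 0] + h2 * w[:, 1] + 0.05 * rng.standard_normal(n)

P = interactive_basis(w, u, m)             # n-by-K matrix with K = L * m
coef, *_ = np.linalg.lstsq(P, y, rcond=None)
coef_by_function = coef.reshape(L, m)      # row l holds the series coefficients of h_l
```

Each row of `coef_by_function` gives the estimated series coefficients of one coefficient function, so K is automatically a multiple of L, as the assumption requires.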

These conditions lead to the following result on mean-square convergence.

Theorem 5.1: If Assumptions 3.1 and 5.1 are satisfied, then

$$\sum_{i=1}^{n} [\hat g(x_i) - g_0(x_i)]^2 / n = O_p(K/n + K^{-2s/r}).$$

Also,

2

0,

x

subvector of

is a

c >

1

such that for each

cSa(x)dlF(x )'F(x (

{Z^^x^t ElhfxJ2]

x„

< », I =

C )]

*

t

L)

1

closed in mean-square.

Proof:

Let

H

=

L

-

-

iZ^hfa)}

and

II

2 1/2 a ^ = [Ja(x) dF(x)J II

.

By Proposition 2 of Section 4 of the Appendix of Bickel, Klaassen, Ritov, and Wellner (1993),

K

is

closed

if

and

H

.}

only

there

if

(note

h,

C

a constant

is

such that for each

need not be unique).

h-

maximal dimension of

h e H,

h

II

£ Cmax.dlh.ll

II

Following Stone (1990, Lemma 1),

for some

}

suppose that the

x.

is

r,

and suppose that this property holds whenever the

maximal dimension of the

x-

is

r-1

= E»,h.(x.), E[h.(xJS(x.,

such that for =

)]

Consequently,

for all measurable functions of

-1

~

E[h.(x.

2

To show

1.

)

— x.

components

that there

x.,

is

a unique decomposition

is

a constant

x.,

c

xf >

that

,

is

not a proper

such that

1

E[h(x)

that are not components of

function of a strict subvector of

2

1

s c" J

maxKsK3^ llL.-Zj.ll =

Then by the Markov inequality, conclusion then follows by

=

0,

and

,

k

a

I,

3P(f,m,x)/3x. = P(3f/3x.,m-l,x),

Also, 3 f(x)

C m(K)

such that

* a f (x)_sAf (x)

case, note that

there is

f(x)\ s

form of the remainder,

= P(f,m(K),x).

'

CK~

max r \d

is

denote the Taylor series up

P(f,m,x)

f(x),

X

differentiable of order

Proof:

all

f(x) - P(3 f,m-| A|,x)|

For splines, if

A.14:

iZ-p^'nl

such that for

be the largest integer such that

/[(m(K)-d)!] s

sup

such that fix)

A,

>

3 P(f,m,x) = P(3 f,m-|A|,x).

m(K)

C

is

.

star-shaped, there exists

max

and

C

there is

so that by the intermediate value

p (x),

star-shaped and there

orders and for all multi-indices

for an expansion around

by induction

Next, let

is

U

Oss

all

to order

1

all

d >

a,

QED

.

For power series, if

A.13:

continuously differentiable of

C

+1

d =

3 p (x)/3x

X

Is

X sd |

X

'

(x)_P(aXf m(K)_ X '

a compact box and

then there are

a = /-d

^

for

a,

r =

1

C > and

I

fix)

I

>

is

x)

I

"

CK_a

K

such that for

all

d s m-1

a = £/r

and

a spanning vector for splines of degree


-

continuously

follows by Theorem 12.8 of Schumaker (1981). is

QED

-

there is

for

n

d =

0.

For the other

m-d,

with knot

spacing bounded by

w

there exists

OK

CK

K

for

such that for

K

f

large enough and some

K

= p

(x)

K

(x)'7r

K

,

sup

Therefore, by Powell

C.

d

x

3 f(x)/3x

l

d

d

- 3 f

(x)/3x

(1981),

d


0)P,

the elements of

knots,

-

ck 5+v+2\

QED.

.

Next, a well known property of B-splines that for

P

s

satisfied then Assumptions 3.2 and

4.1 is

x = x_

where

corresponding to components of

Theorem

CK

a

for the knot sequence

m, j,

^(K)

Then existence of a nonsingular matrix by inclusion

5 * ,HW

so

(x)

-

,

/rf -

iv ***' 5)

equation (2.3),

in

J

last equality follows by

A.16:

as

,

J

J

cn'iwuk-.)]

Mk-s)

= C*C

is

C

with

X

.

min

(I

1

P. .

.

c,L

p.

1587) that for

(x)P. . (x)'dx) £ t,L

C

P. ,(x) =

for

all

Therefore, the boundedness away from zero of the smallest

P

K

(x)

a subvector of


r

®»_,P/

,

(

x #). analogously to the proof of

Lemma

since changing even knot spacing

Also,

A. 14.

argument of B-splines,

sup_ \d B

Ax)/dx

.

Lemma

that there

n

K

2 (x.)'ji]

t

Lemma

Also, by

let

/n s sup

l

„|g n (x)-p

K

K

let

Z = JT (x)P (x)'dF(x)

A. 10

and

let

re

the hypotheses of

P

K

=

(K

_2a )

so

= O (K —

_2a ).

p



= (K/n)

e

1/2

The

A. 7.

e

K p

hypotheses of

Lemma

A. 8,

By Assumption 3.2 and Lemmas

)]'.

n

replacing

(x)



= (K/n)

Then

(x).

eq.

1/2

For each

.

(A.2) is

K 2 T[g (x)-P (x)'Tt] dF(x) s

n

(K/n) +

+

Proof of Theorem 3.2: (x),

d = 0,

Then by the second conclusion of Lemma A.8,

).

p

p

K

the

K 2 2 J[g (x)-i(x)] dF(x) s 2T[g (x)-P (x)'n] dF(x) + 2(n-7t)'Z(n-ir)

s

p

K

_2a

with

A.8 are satisfied with

K

(K

CK

s

In the

A. 8.

K

p (x)) and

—let

*y

(x)'ir|

P

Lemma

P (x

[pfyxj]

Lemma

replacing

(x)

\C

„|g n (x)-P

(A.3)

p

be as above, except with

satisfied (with

sup

and

2

(x)'ir|

A. 7 is satisfied

Lemma

proven using

is

K

u

xea.

Lemma

the hypothesis of

A.ll,

The second conclusion

K

in

be that from Assumption 3.4 with

it

first conclusion then follows by the conclusion of

A.ll,

present follows as

is

such that

E.i=l [g u n (x.)-p l

(A.2)

K

For each

3.1:

C

is

x

implying the bounds on

QED.

8.4.

Proof of Theorem

d s m,

,

The proof when

derivatives given in the conclusion.

proof of

J CL

I

equivalent to rescaling the

is

Because

P

(l)r.

[g ft ^1=1 °0

(x)

is

i

|g_(x)-P%x)'ir|

b),

.,

d

when

K a

so that

K,

n

Lemma

2 (x.)'Tr]

p

P

K

K

|g-(x)-P (x)'n|

,

d

=

_a

p

(K —

).

(x).

p

2a ).

QED.

Also, by

|g_(x)-P (x)'ir|. s

Also,

Lemma

A.8 and the triangle inequality,


(K/n + — K

K

replacing

(x)

can be chosen so that

O

/n =

l

of Theorem 3.1 that eq. (A.2) and the hypotheses of first conclusion of

K

l

a constant nonsingular linear transformation of

Assumption 3.4 will be satisfied for

Assumption 3.3

(x.)-P

it

follows as in the proof

A.8 are satisfied.

Then by the

K

(A.4)

lg -£l

K/

s lg -P 'nl

d

IP

d

pap

= ° (K~

Proof of Theorem

=

C.(K)0 ((K/n)

CK

where for each

and

A. 12

A. 14

+K

_a

d =

a = a/a.

and

s O (K p

=

)

4.1 it

d

d

1/2

+K~ ~~

Lemma

A. 3 there exists

Lemma

x,

follows by Theorem 3.1 with for splines.

d =

K

there

n = £.n.,

Lemma

K

A. 3 are

a representation

is

n

I

Assumption 3.4

a.

lg n » - P

with

K

'

n n \

satisfied

is

Assumptions 3.2 and equation

A. 15,

C Q (K) =

and

QED.

]).

less than or equal to

is

(3.1)

are

Then the conclusion

and Assumption 4.2 implies that Assumption 3.3 holds.

satisfied,

a

follows that the hypotheses of

inequality, for

Also, by

+ C (K)"*-wll

)

(C,

the dimension of

*

a

pa,(K)[(K/n)

follows that for each

it

Then by the triangle

with

1/2

By Assumption

4.1:

E/8n^ x ^'

Then by Lemmas s

+

)

Therefore, by the conclusion of

satisfied.

SqM

a

(w-w)l

C n (K) =

for power series and

K

1/2

QED.

Proof of Theorem 4.2: It follows as in the proof of Theorem 4.1 that Assumptions 3.1 - 3.4 are satisfied, with $\zeta_0(K) = K$ for power series and $\zeta_0(K) = K^{1/2}$ for splines. The conclusion then follows by Theorem 3.2. QED.

Proof of Theorem 4.3:

proof of Theorem 4.2, except that Assumption 3.4

is

now

(3.1)

< d (K) =

a = -&+d

satisfied with

are

now

K

satisfied with

(1/2)+d

K

A. 12

and

that Assumption 4.1

A. 12 is

and Assumption 3.2 and equation

A. 14,

for power series, by

Lemma

A.16.

Lemma

A. 15,

Lemma

and

A. 14,

a

Assumption 3.4

satisfied with

u

replacing

equal to the vector from the conclusion of


is

x.

satisfied with

Let

Lemmas

A. 13 is

QED.

> 0.

By

similar to that of Theorems 4.1 and 4.2.

is

and with

QED.

Follows as in the proof of Theorem 4.3, except that

The proof

5.1:

bounded and Lemmas

(u)

Lemmas

by

show that Assumption 3.4 holds for any

Proof of Theorem

P

in the

Cj(K) =

for splines, by

Proof of Theorem 4.4: applied to

Follows as

P

(x)

a =