Undersmoothing and bias corrected functional estimation


WORKING PAPER
DEPARTMENT OF ECONOMICS

Undersmoothing and Bias Corrected Functional Estimation

Whitney Newey

No. 98-17                                October 1998

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASS. 02139


UNDERSMOOTHING AND BIAS CORRECTED FUNCTIONAL ESTIMATION

Abbreviated Title: Undersmoothing and Bias Correction

By Whitney Newey, Fushing Hsieh, and James Robins
Massachusetts Institute of Technology, Academia Sinica, and Harvard University
First version October 1991; revised July 1998.

SUMMARY

There are many important examples of √n-consistently estimable functionals that are interesting in econometrics, such as average derivatives and nonparametric consumer surplus. Corresponding estimators may require undersmoothing of the nonparametric estimator to achieve √n-consistency, due to first order bias in the expected influence function. We give a general bias correction that can be added to a plug-in estimator to remove the need for undersmoothing and improve its higher order properties. We also describe a bootstrap smoothing correction for the nonparametric estimator that achieves analogous results for the plug-in estimator, and show that idempotent transformations of the empirical distribution need not require undersmoothing for √n-consistency. We find that this bias correction can lead to large efficiency improvements and lower sensitivity to bandwidth choice.

Research partially supported by NSF grant SBR-9409707.

Key words and phrases: Functional estimation, nonparametric estimation, √n-consistency, undersmoothing, influence function, bias correction.

1. Introduction

Functionals of nonparametric estimators that can be √n-consistent have important applications in econometrics, including average derivatives and average consumer surplus. Most functional estimators are based on nonparametric estimation, and many require "undersmoothing" of the nonparametric estimator to achieve √n-consistency, meaning the bias of the nonparametric estimator shrinks faster than its variance. In this paper we show that this requirement can be removed, by either adding a bias correction term to the functional estimator or using a smoothing correction for the nonparametric estimator, leading to √n-consistent functional estimation without undersmoothing. These modifications also lead to functional estimators with improved higher-order efficiency, that may attain √n-consistency under weaker smoothness conditions.

The source of this improvement is not a reduction in the bias of the functional estimator. Many previous functional estimators have a bias that is the same order as the bias of the density estimator. In contrast, the estimators we develop have a bias that is the same order as the product of the bias of the nonparametric estimator with another bias term. The other bias term is that for the influence function of the estimator, which is the mean-square derivative of the functional. We refer to the corresponding reduction in bias order as a bias complementarity, with the influence function bias term complementing the nonparametric bias term.

The properties of the influence function play a key role in our analysis. When the influence function is smooth, in certain ways to be made precise below, the influence function bias term will be small enough that the need for undersmoothing will be removed. For some of the estimators we consider, the influence function will be required to be smooth as a function of the density. This is a regularity condition that is often satisfied. For other estimators the influence function will be required to be smooth as a function of the data, a condition that does not hold for some functionals.

We consider two approaches to bias corrected functional estimation.

Our first approach is to add to the functional estimator a bias correction, consisting of the difference of integrals of an influence function estimator over the empirical distribution and the nonparametric estimate. This additional term does not affect the asymptotic distribution of the estimator in the √n-consistent case but is used to obtain improved higher order properties. Our second approach is to do a bootstrap smoothing correction to the nonparametric estimator before using it in functional estimation. This correction consists of replacing a nonparametric distribution estimator F̂ by F̃ = F̂ - (G - F̂) = 2F̂ - G, where G is an estimator obtained from F̂ by the same transformation by which F̂ is obtained from the empirical distribution. We show that using this estimator removes the need for undersmoothing when the influence function is smooth enough. For kernel estimators we show that this smoothing correction leads to a particular kind of kernel, the "twicing" kernel described below. We also show that undersmoothing will not be needed for √n-consistency when F̂ is an idempotent transformation of the empirical distribution, so that G = F̂ and F̃ = F̂, and the bootstrap correction is built into the original estimator. This idempotent estimator class includes orthogonal series density estimators, series estimators of conditional expectations, and sieve estimators, and thus provides an explanation for the lack of an undersmoothing requirement for these estimators.

We emphasize that the bias reduction we consider does not come from reducing the bias of the density estimator. Such higher order bias reductions (e.g. via higher-order kernels) do not remove the requirement that bias shrinks faster than variance for the nonparametric estimator. That requirement can only be removed if the bias of the functional estimator is of smaller order than the bias of the nonparametric estimator. This reduction in bias is brought about by the bias complementarity we will discuss.

The bias reduction may be accompanied by some increase in the variance of the functional estimator. Although the variance still shrinks at the same rate, the size of the variance will be larger. In large samples the reduction in bias will allow adjustment of the smoothing parameters so that the variance is smaller, although the bias reduction could increase the small sample variance of the estimator. Bias reductions for nonparametric estimators (like higher order kernels) have similar properties, except that they depend on higher order properties of the nonparametric estimator, whereas our functional bias reduction depends on the properties of the influence function. We find that in a simple example the bias correction can give a large efficiency gain and reduce sensitivity to the choice of bandwidth.

The type of estimator we consider has antecedents in the literature. Bickel and Ritov (1988) developed an estimator for the average density that is √n-consistent under minimal smoothness conditions. This estimator has a similar form to the bias corrected functional discussed here, as do estimators in Pastuchova and Hasminskii (1989). Our contribution is to give a general version of this type of estimator and show the improvement in its properties. Also, we show how bootstrap corrected nonparametric estimators remove the need for undersmoothing, and hence that undersmoothing is not needed for idempotent nonparametric estimators.

Section Two of the paper describes the general form of the bias corrected functional we consider, and gives results for kernel estimators. Section Three describes a nonparametric estimator with a bootstrap correction for smoothing and its use for functional estimation. Section Four considers a special class of linear estimators where mean-square error calculations are feasible, showing more precisely the higher order effect of the bias correction, and giving exact MSE calculations for one case. Section Five analyzes semiparametric m-estimators, shows that undersmoothing is not needed when the nonparametric estimator has no effect on the limiting distribution, and derives results for kernel estimation with a smoothing correction. Section Six concludes.

2. Bias Corrected Functional Estimation

To focus on the essential features of the problem at hand it is useful to begin our discussion with a linear functional

    μ(F) = ∫δ(z)F(dz),

where δ(z) is known and F denotes some unsigned measure (charge) on z. Although this case is relatively simple, the role of undersmoothing in achieving √n-consistency is easily understood here. Let F̂ be some nonparametric estimator of the true distribution F₀. For example, F̂ could correspond to a nonparametric density estimator f̂, with F̂(z) = ∫1(u≤z)f̂(u)du. Consider the estimator μ̂ = μ(F̂) = ∫δ(z)F̂(dz) = ∫δ(z)f̂(z)dz. Assuming that the order of integration can be interchanged, the bias of this estimator is

(2.1)    E[μ̂] - μ₀ = ∫δ(z)(F̄-F₀)(dz),    F̄(z) = E[F̂(z)].

If the order of this bias is the same as the order of the pointwise bias of the nonparametric density estimator, as occurs in many cases, then √n-consistency will involve the pointwise bias shrinking faster than 1/√n. Furthermore, the pointwise standard deviation of a nonparametric density estimator generally shrinks no faster than 1/√n, so that √n-consistency will involve the pointwise bias shrinking faster than the pointwise standard deviation. This property is referred to as undersmoothing, since less bias is generally associated with less smoothing, and the fastest shrinkage of mean square error is generally associated with bias and standard deviation shrinking at the same rate.

A way to reduce the bias of μ̂ is to form an estimator of the bias term in equation (2.1) and subtract it off. In this simplest case there is an unbiased estimator of the bias term, ∫δ(z)F̂(dz) - Σᵢδ(zᵢ)/n, and subtracting it off gives

(2.2)    μ̃ = μ̂ + Σᵢδ(zᵢ)/n - ∫δ(z)F̂(dz) = Σᵢδ(zᵢ)/n,

the usual unbiased estimator.
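As a concrete illustration (ours, not from the paper), the collapse of the corrected estimator in (2.2) to the sample average can be checked numerically. The sketch below assumes a Gaussian kernel, a hypothetical function δ(z) = z², and simulated standard normal data; mu_plugin approximates ∫δ(z)f̂(z)dz on a grid.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=500)      # sample; F_0 = N(0,1)
h = 0.5                       # deliberately large bandwidth

def delta(x):                 # hypothetical known function delta(z)
    return x ** 2

# Gaussian kernel density estimate f-hat evaluated on a grid
grid = np.linspace(-8.0, 8.0, 4001)
dz = grid[1] - grid[0]
fhat = np.exp(-0.5 * ((grid[:, None] - z[None, :]) / h) ** 2).mean(axis=1) \
       / (h * np.sqrt(2.0 * np.pi))

mu_plugin = (delta(grid) * fhat).sum() * dz   # plug-in: integral of delta * f-hat
bias_hat = mu_plugin - delta(z).mean()        # unbiased estimate of the bias term
mu_corrected = mu_plugin - bias_hat           # equation (2.2): the sample average

print(mu_plugin, mu_corrected, delta(z).mean())
```

For a Gaussian kernel, ∫z²f̂(z)dz works out to Σᵢzᵢ²/n + h², so the plug-in estimate carries an h² smoothing bias that the correction removes exactly.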

For nonlinear functionals the same basic idea can be applied to a first-order bias term. To describe this approach, suppose that μ(F) has some restricted domain 𝓕 to which both F̂(z) and F₀(z) belong. For example, 𝓕 might be restricted to have elements that are absolutely continuous with respect to Lebesgue measure, with a density f corresponding to each F ∈ 𝓕. Also suppose that there is an expansion of μ(F) on 𝓕 such that

(2.3)    μ(F) = μ(F₀) + ∫δ(z,F₀)(F-F₀)(dz) + R(F-F₀,F₀),    |R(F-F₀,F₀)| = o(‖F-F₀‖),

where ‖F‖ denotes a function semi-norm. Let δ(z) = δ(z,F₀), ψ(z) = δ(z) - E[δ(z)], and let P̂ denote the empirical distribution. Then the plug-in estimator μ̂ = μ(F̂) satisfies

(2.4)    √n(μ̂ - μ₀) = Σᵢψ(zᵢ)/√n + Rₙ + Bₙ,    Rₙ = √nR(F̂-F₀,F₀),    Bₙ = √n∫δ(z)(F̂-P̂)(dz).

The first order bias of this estimator will be E[Bₙ] = √n∫δ(z)(F̄-F₀)(dz), with Rₙ including higher order terms. A bias correction can be formed by subtracting an estimate of this first order term. If δ(z) were known, an unbiased estimate of the first order term would be ∫δ(z)F̂(dz) - Σᵢδ(zᵢ)/n. A feasible version of this bias estimate can be formed by replacing δ(z) with an estimator δ̂(z). Then subtracting the corresponding bias estimator gives

(2.5)    μ̃ = μ̂ + Σᵢδ̂(zᵢ)/n - ∫δ̂(z)F̂(dz) = μ̂ + ∫δ̂(z)(P̂-F̂)(dz).

Unlike the linear functional case, this estimator will not be exactly unbiased, because of the higher order term R(F̂-F₀,F₀) and the estimation of δ(z). Nevertheless, because we have subtracted an estimator of the first-order bias of μ̂, it should have smaller bias than μ̂. In fact, as discussed below, μ̃ has an asymptotic expansion that is the same as that of μ̂, except one of the remainder terms of this estimator is of smaller order.

A simple example is the average density. Suppose that 𝓕 is restricted to contain only absolutely continuous elements with square integrable densities, and let μ(F) = ∫f(z)²dz. Letting f and f₀ denote the densities corresponding to F and F₀, note that

    μ(F) = μ(F₀) + ∫2f₀(z)[f(z)-f₀(z)]dz + ∫[f(z)-f₀(z)]²dz,

so that equation (2.3) is satisfied with δ(z,F₀) = 2f₀(z), δ(z,F) = 2f(z), and ‖F‖ = {∫f(z)²dz}^{1/2}. Here δ̂(z) = 2f̂(z), and a bias corrected estimator is given by

(2.6)    μ̃ = ∫f̂(z)²dz + ∫2f̂(z)(P̂-F̂)(dz) = 2Σᵢf̂(zᵢ)/n - ∫f̂(z)²dz.

In this example the bias corrected estimator is a linear combination of two well known estimators, the plug-in estimator ∫f̂(z)²dz and the average estimated density Σᵢf̂(zᵢ)/n.
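To make equation (2.6) concrete, here is a small numerical sketch (our illustration, with a Gaussian kernel and hypothetical simulated data). With Gaussian kernels both pieces have closed forms: ∫f̂(z)²dz is the mean over all pairs of a Gaussian density with scale h√2, and Σᵢf̂(zᵢ)/n is the same double average with scale h.

```python
import numpy as np

def gauss(d, s):                       # Gaussian density with scale s, evaluated at d
    return np.exp(-0.5 * (d / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(1)
z = rng.normal(size=1000)              # hypothetical sample from N(0,1)
h = 0.3
d = z[:, None] - z[None, :]            # all pairwise differences

plug_in = gauss(d, h * np.sqrt(2.0)).mean()   # integral of f-hat squared
avg_density = gauss(d, h).mean()              # sum of f-hat(z_i)/n
corrected = 2.0 * avg_density - plug_in       # equation (2.6)

print(plug_in, corrected)              # target: 1/(2*sqrt(pi)) ~ 0.2821
```

For N(0,1) data the true average density is 1/(2√π) ≈ 0.2821; the plug-in term alone is pulled down by smoothing, while the combination in (2.6) cancels the first-order part of that bias.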

To see the effect of the bias correction, we derive the order of the remainder terms in equation (2.4) under regularity conditions on the density and kernel. Assume that f₀(z) is continuously differentiable of order s+d with bounded derivatives, and that ∫sup_{|Δ|≤ε}|∂^λ f₀(z+Δ)|dz < ∞ for some ε > 0 and all λ with |λ| = s+d.

Theorem 2.1: If Assumptions 1-3 are satisfied, δ(z) is continuous with probability one and bounded, and h = hₙ is such that nh^{r+2d}/ln(n) → ∞ and h → 0, then Bₙ = Op(√n h^s) and Rₙ = Op(ln(n)/√n h^{r+2d} + √n h^{2s}). Also, if √n h^s → 0 and ln(n)/√n h^{r+2d} → 0, then

    √n(μ̂ - μ₀) = Σᵢψ(zᵢ)/√n + op(1).

The hypotheses of this result require undersmoothing. The optimal bandwidth (minimum mean-square error) for estimation of the d-th derivative of f₀(z), when the kernel has continuous and bounded derivatives of order d+s, is h = n^{-1/(r+2d+2s)}. At this value √n h^s does not go to zero, so h must be chosen smaller than the optimal order.

To obtain conditions for √n-consistency of the bias corrected estimator, it is essential to be specific about δ̂(z). Here we assume that δ̂(z) = δ(z,F̂), and that δ(z,F) satisfies certain smoothness conditions in F.

Assumption 4: There is a set of functions 𝓕, a bounded function b(z), and D(z,F) linear in F such that, for F ∈ 𝓕 and z ∈ Z, |D(z,F)| ≤ b(z)‖F‖ and |δ(z,F) - δ(z,F₀) - D(z,F-F₀)| ≤ b(z)‖F-F₀‖², and for small enough h, F̂ ∈ 𝓕 with probability one.

This condition follows from second order Frechet differentiability of δ(z,F) in F, with bounded derivative. It is helpful in deriving the order of both the stochastic equicontinuity term Sₙ and the nonlinear term Rₙ. We use this assumption, rather than empirical process methods, because a direct proof for kernel estimators (as in Newey and McFadden, 1994) seems to allow for a wider class of functionals. In particular, empirical process methods for showing that Sₙ →p 0 rely heavily on smoothness of δ(z,F) in z, which can be avoided by using Assumption 4. Also, when δ(z,F) is linear in F (i.e. b(z) = 0, as for the average density), U-statistic theory gives Sₙ →p 0 under very weak conditions.

These conditions lead to the following result for the bias corrected estimator:

Theorem 2.2: If Assumptions 1-4 are satisfied, nh^{r+2d}/ln(n) → ∞, and h → 0, then

(2.9)    √n(μ̃ - μ₀) = Σᵢψ(zᵢ)/√n + Op(ln(n)/√n h^{r+2d} + √n h^{2s} + h^s).

Also, if h = hₙ is such that ln(n)/√n h^{r+2d} → 0 and √n h^{2s} → 0, then √n(μ̃ - μ₀) = Σᵢψ(zᵢ)/√n + op(1).

The upper bound on Bₙ obtained here will be smaller than the upper bound obtained for μ̂ in Theorem 2.1, for a range of bandwidths. The conditions for √n-consistency of μ̃ are therefore weaker than the conditions for μ̂, in the sense that they require less smoothness. Specifically, existence of h satisfying the bandwidth conditions of Theorem 2.1 requires s > r + 2d, while existence of h satisfying the conditions of Theorem 2.2 for √n-consistency requires only

(2.10)    s > (r+2d)/2.

Thus, with the bias-corrected estimator, f₀(z) is only required to have half the number of derivatives as for the original estimator; alternatively, if f₀(z) has all the derivatives that are needed, the kernel for μ̃ need only be half the order of the kernel for μ̂. Also, if the bandwidth is chosen to be the mean-square minimizing value h = n^{-1/(r+2d+2s)} for estimation of the d-th derivative of f₀(z), which is pointwise optimal, then any s > d + r/2 will satisfy the conditions for √n-consistency. This result then shows that √n-consistency of μ̃ does not require undersmoothing of f̂. This improvement is achieved at the expense of smoothness of δ(z,F) as a function of F, as in Assumption 4.

3. Bootstrap Corrected Nonparametric Estimation

Another approach that also reduces the size of the bias, and can remove the need to undersmooth, is to plug in a nonparametric estimator that has been corrected for smoothing. To motivate this correction, recall that the first order bias in functional estimation is of the order of E[F̂] - F₀. We could eliminate this bias if we could replace F̂ by the empirical distribution P̂, which is unbiased. Often this replacement is not possible (e.g. when μ(F) depends on the density of F), and smoothing is required to bring F̂ into the domain of μ(F). However, we can construct a smooth estimate of the smoothing effect and use it to partially correct for bias.

To describe this correction, suppose that F̂ = CP̂ for some transformation C with domain that includes the empirical distribution P̂. Often C will be a smoothing transformation, such as convolution for kernel estimators, that depends on the sample size and gets close to the identity I as the sample size grows. Then for large samples CF̂ - F̂ should be an estimate of the smoothing effect CP̂ - P̂. Subtracting CF̂ - F̂ from F̂ gives

(3.1)    F̃ = F̂ - (CF̂ - F̂) = 2F̂ - CF̂ = 2CP̂ - C²P̂ = (2C-C²)P̂,

where C² is the composition of C with itself. This is a bootstrap correction for smoothing, in the sense that P̂ is replaced by F̂ in the smoothing effect CP̂ - P̂ to form the correction CF̂ - F̂.

In this Section we will consider an estimator obtained by plugging in this smoothing corrected nonparametric estimator, giving μ̄ = μ(F̃). It will be shown that this estimator may be √n-consistent without undersmoothing. This approach, of bootstrap correcting the distribution estimator and plugging it in, allows the same nonparametric estimator to attain the optimal nonparametric convergence rates and be plugged into functionals that attain the optimal 1/√n rate. To achieve this simultaneous optimality, the influence function will need to satisfy certain conditions that are detailed below.

To understand the effect that the bootstrap correction has on the functional estimator, consider an expansion analogous to equation (2.4):

(3.2)    √n(μ̄ - μ₀) = Σᵢψ(zᵢ)/√n + R̄ₙ + B̄ₙ,    R̄ₙ = √nR(F̃-F₀,F₀),    B̄ₙ = √n∫δ(z)(F̃-P̂)(dz).

This expansion is the same as for μ(F̂), except that F̃ replaces F̂, with E[B̄ₙ] being the first order bias term. Suppose that C is linear, so that E[F̃] = E[(2C-C²)P̂] = (2C-C²)F₀. Also, suppose that there is a transformation C* of δ(z) such that ∫δ(z)CF(dz) = ∫C*δ(z)F(dz) for F = F₀ and F = CF₀. With more structure, C* may be interpreted as the adjoint of C, as in examples given below. Then if the order of integration can be interchanged,

(3.3)    E[B̄ₙ] = √n∫δ(z)[(2C-C²-I)F₀](dz) = -√n∫δ(z)[(I-C)²F₀](dz) = -√n∫[(I-C*)δ](z)[(I-C)F₀](dz).

Here (I-C)F₀ represents smoothing bias from the transformation C, which is small in large samples when C is close to I. The term (I-C*)δ is an analogous term that should also be close to zero in large samples. Comparing equation (3.3) with E[Bₙ] = -√n∫δ(z)[(I-C)F₀](dz) from equation (2.4), we see that using the bootstrap-corrected estimator F̃ leads to the replacement of δ by (I-C*)δ in the integral, reducing the first order bias when (I-C*)δ is close to zero. This is the bias complementarity effect referred to above, where the bias in F̂ has been complemented by the influence function remainder (I-C*)δ.

Kernel estimators are an important example, where F̂ has density

    f̂(z) = ∫κ(z,u)P̂(du),    κ(z,u) = h^{-r}K((z-u)/h).

Plugging F̂ into the formula for f̂ leads to a transformation C where CF has density ∫κ(z,u)f(u)du for F with density f, and the bootstrap corrected estimator is F̃ = 2F̂ - CF̂. To describe F̃, let K̄(u) = 2K(u) - ∫K(u-t)K(t)dt be the "twicing" kernel associated with K(u). Then the density of F̃ will be

(3.4)    f̃(z) = 2f̂(z) - ∫κ(z,u)f̂(u)du = 2f̂(z) - ∫[∫κ(z,u)κ(u,t)du]P̂(dt) = ∫κ̄(z,u)P̂(du),
         κ̄(z,u) = 2κ(z,u) - ∫κ(z,t)κ(t,u)dt = h^{-r}K̄((z-u)/h).

That is, the smoothing correction gives a kernel estimator with a twicing kernel that is constructed from the original kernel.
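For a concrete case (our illustration, not in the paper): if K is the standard Gaussian density, then K*K is a Gaussian with standard deviation √2, so the twicing kernel is K̄(u) = 2K(u) - K(u/√2)/√2, and the corrected density estimate is simply f̃(z) = 2f̂_h(z) - f̂_{h√2}(z), a combination of two Gaussian kernel estimates. The sketch below checks that K̄ integrates to one, has vanishing second moment (i.e. its order doubles), and takes negative values:

```python
import numpy as np

# Gaussian kernel K and its twicing kernel 2K - K*K;
# for a standard Gaussian, K*K is a Gaussian with standard deviation sqrt(2).
def K(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def K_twice(u):
    return 2.0 * K(u) - K(u / np.sqrt(2.0)) / np.sqrt(2.0)

u = np.linspace(-10.0, 10.0, 20001)
du = u[1] - u[0]

mass = (K_twice(u) * du).sum()        # should integrate to 1
m2 = (u ** 2 * K_twice(u) * du).sum() # second moment cancels: 2*1 - 2 = 0
print(mass, m2, K_twice(u).min())
```

The vanishing second moment makes K̄ a fourth-order kernel, which is the analytic face of the bias reduction in (3.3).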

We use twicing kernel estimators to illustrate the bias correction, although they may not be the best choice of nonparametric estimator. A twicing kernel has order that is twice that of the original kernel and is not everywhere positive. Consequently, the density estimator f̃(z) may not be everywhere positive, which may be an undesirable feature. Also, it is known that using higher order kernels may not improve mean-square error in most sample sizes of interest, e.g. see Marron and Wand (1992). This motivates a search for other bootstrap corrected estimators, as considered below.

To understand the bias formula in equation (3.3) for kernel estimators, we need to find a transformation C* with ∫C*δ(z)F(dz) = ∫δ(z)CF(dz). Let δ̄(z) = ∫κ(u,z)δ(u)du. For any F with density f, CF has density ∫κ(z,u)f(u)du, so that

    ∫δ(z)CF(dz) = ∫δ(z)[∫κ(z,u)f(u)du]dz = ∫δ̄(z)f(z)dz = ∫C*δ(z)F(dz),    C*δ(z) = δ̄(z).

Here C* is the integral transform obtained by interchanging z and u in κ, which is known to be the adjoint under certain conditions (Luenberger, 1969, p. 153). This transformation is a convolution, with δ̄(z) = ∫K(u)δ(z+hu)du. Equation (3.3) now becomes

(3.5)    E[B̄ₙ] = -√n∫[δ(z)-δ̄(z)][f₀(z)-f̄(z)]dz.

The bias effect of the bootstrap correction is to replace the integrated convolution bias ∫δ(z)[f̄(z)-f₀(z)]dz with the integral of the product of convolution biases in equation (3.5). The magnitude of the bias reduction will depend on the convolution bias δ(z)-δ̄(z), which in turn depends on the smoothness of the influence function, as is well known from the kernel estimation literature. The following result makes these conditions precise.

Theorem 3.1: If Assumptions 1-3 are satisfied, δ(z) is continuously differentiable of order t with bounded derivatives, nh^{r+2d}/ln(n) → ∞, and h → 0, then equation (3.2) is satisfied with

    R̄ₙ + B̄ₙ = Op(ln(n)/√n h^{r+2d} + √n h^{s+t} + h^s).

Also, if h = hₙ is such that ln(n)/√n h^{r+2d} → 0 and √n h^{s+t} → 0, then √n(μ̄ - μ₀) = Σᵢψ(zᵢ)/√n + op(1).

The upper bound on B̄ₙ is smaller than the bound on Bₙ in Theorem 2.1, because √n h^s has been replaced by √n h^{s+t}. This replacement means that sufficient conditions for √n-consistency of μ̄ = μ(F̃) are different than for the original plug-in estimator μ̂. If the bandwidth is chosen so that the dominating remainder terms ln(n)/√n h^{r+2d} and √n h^{s+t} are asymptotically proportional, then μ(F̃) will be √n-consistent if

(3.6)    s + t > r + 2d.

This condition allows some tradeoff of smoothness of f₀(z) and δ(z) for attaining √n-consistency.

This estimator also attains √n-consistency without undersmoothing if the influence function is smooth enough. Consider the case where f₀(z) is only s times differentiable, so that the order of the pointwise bias in f̂ is h^s. Choose the bandwidth so that the squared bias order h^{2s} is proportional to the order n⁻¹h^{-r-2d} of the pointwise variance. Then the conditions for √n-consistency are satisfied if

(3.7)    t > r/2 + d.

The smoothing bias correction depends on smoothness of the influence function in z, in contrast to the bias correction of Section 2, which depends on smoothness of the influence function in F. The bias correction of Section 2 may still give √n-consistency even when δ(z) is discontinuous in z. For example, the functional μ(F) = ∫1(a≤z≤b)f(z)²dz has δ(z) = 1(a≤z≤b)2f(z), which is discontinuous when the density is positive at a or b, and so does not satisfy the conditions of Theorem 3.1. Nevertheless, it does satisfy the conditions of Theorem 2.2, so the bias corrected estimator of (2.5) could attain √n-consistency.

Consider a plug-in estimator with an ordinary kernel of order s > r, where the bandwidth is chosen to give √n-consistency. The plug-in estimator with a twicing kernel of the same order will also be √n-consistent for the same bandwidth choice, if t = s > r/2. That is, √n-consistency with a twicing kernel only requires f₀(z) to be half as smooth, if δ(z) is also smooth enough. Also, the same bandwidth no longer has to involve undersmoothing, because only half as many derivatives of the density are needed to exist, and the optimal bandwidth for estimating a density with fewer derivatives will be smaller.

There are other nonparametric estimators that have the same functional bias reduction property as twicing kernel estimators. A particularly important class are those where the smoothing transformation is "built into" F̂, i.e. C is idempotent, with CF̂ = F̂, so that F̃ = 2F̂ - CF̂ = F̂ and the bootstrap smoothing correction may not be needed in this case. Our results then indicate that undersmoothing is not needed.

One idempotent nonparametric estimator is an orthogonal series density estimator. Let p_j(u), j = 1, 2, ..., be a sequence of functions that are orthonormal with respect to Lebesgue measure on ℝ^r, i.e. ∫p_j(u)p_k(u)du = 1 if j = k and equal to zero otherwise. Let p^J(u) = (p_1(u),...,p_J(u))′ and â = Σᵢp^J(zᵢ)/n. Then an orthogonal series estimator is

    f̂(z) = p^J(z)′â = ∫κ(z,u)P̂(du),    κ(z,u) = p^J(z)′p^J(u).

Like the kernel estimator, f̂ has the linear form ∫κ(z,u)P̂(du), but κ(z,u) is now an inner product of orthonormal functions rather than a kernel. To see the effect of the bootstrap correction, plug f̂ into the density formula to obtain

    ∫κ(z,u)f̂(u)du = ∫[∫κ(z,u)κ(u,t)du]P̂(dt) = ∫p^J(z)′[∫p^J(u)p^J(u)′du]p^J(t)P̂(dt) = ∫κ(z,t)P̂(dt) = f̂(z),

so that the transformation C is idempotent. Note that f̄(z) = E[f̂(z)] = p^J(z)′∫p^J(u)f₀(u)du is the minimum integrated squared error (ISE) approximation to f₀(z), and δ̄(z) = ∫δ(u)κ(u,z)du = p^J(z)′∫δ(u)p^J(u)du is the minimum ISE approximation to δ(z). Then by ∫δ̄(z)[f₀(z)-f̄(z)]dz = 0,

    E[Bₙ] = √n∫δ(z)[f̄(z)-f₀(z)]dz = √n∫[δ(z)-δ̄(z)][f̄(z)-f₀(z)]dz.

Here the bias term that appears to be first order is actually second order, a result of the bias correction being built into f̂. Consequently, the bias of the functional estimator can shrink faster than the pointwise bias, removing the need to undersmooth. This example can be made precise by specifying an ISE rate of approximation for f₀(z) and δ(z), leading to the following result:

Theorem 3.2: If Assumption 1 is satisfied, {∫[f̄(z)-f₀(z)]²dz}^{1/2} = O(J^{-s/r}), {∫[δ̄(z)-δ(z)]²dz}^{1/2} = O(J^{-t/r}), δ̄(z)-δ(z) is bounded, and √n‖F̂-F₀‖² →p 0, then equation (2.4) is satisfied with

    Bₙ = Σᵢ[δ̄(zᵢ)-δ(zᵢ)]/√n = Op(√n J^{-s/r-t/r} + J^{-t/r}).

Furthermore, if √n J^{-s/r-t/r} → 0, then √n(μ̂ - μ₀) = Σᵢψ(zᵢ)/√n + op(1).

The hypotheses of this result are not very primitive, but are consistent with the rate J^{-s/r} of approximation by orthogonal polynomials, where s is the number of continuous derivatives that exist. The conditions are specific enough to see that undersmoothing will not be required for √n-consistency. Specifically, √n‖F̂-F₀‖² →p 0 by hypothesis, but the conditions only require that √n J^{-s/r-t/r} → 0; if t is large enough, this is implied by the ISE bias J^{-s/r} shrinking at the rate n^{-1/4}, meaning the bias could be allowed to shrink at the same rate as the standard deviation without affecting √n-consistency, which will not require undersmoothing.

We can obtain more primitive conditions in the case of specific functionals. For example, the average density estimator is √n-consistent if √n(J/n + J^{-2s/r}) → 0, which will hold if the mean-square bias of f̂ converges slightly faster than n^{-1/4}.

So far we have only considered the case where the smoothing transformation C is linear. When C is nonlinear, analogous results should also hold: a bootstrap smoothing correction may remove the need for undersmoothing when the influence function has enough derivatives, and this correction will be built into estimators that are idempotent transformations of the empirical distribution. These results could be shown by including an expansion of C in the analysis. However, this would greatly complicate the analysis, so we content ourselves with pointing out some existing examples of nonlinear, idempotent transformations where it is known that undersmoothing is not needed.

Newey (1994) showed that undersmoothing is not needed for functionals of a series estimator of a conditional expectation. This occurs because series estimators of conditional expectations are idempotent, leading to a corresponding idempotent distribution estimator. For brevity, we omit details. Shen (1997) has also shown that undersmoothing may not be needed for sieve density estimators when the influence function is smooth enough. This also corresponds to an idempotent transformation. We can describe a sieve estimator as the solution to

    f̂ = argmax_{f∈B} Σᵢ ln f(zᵢ)/n = argmax_{f∈B} ∫[ln f(z)]P̂(dz),

where B is some restricted class of densities. Sieve estimators are of this form, where B is some parametric family that also imposes other restrictions, such as boundedness of higher order derivatives. It follows from the information inequality that

    f̂ = argmax_{f∈B} ∫[ln f(z)]f̂(z)dz,

i.e. if we replace P̂ by F̂ in the transformation that gives F̂, we obtain F̂ again. Thus, the transformation is idempotent, so the smoothing correction is built into a sieve estimator.
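The idempotence argument for series estimators can be checked numerically. The following sketch (our own, with an assumed cosine basis on [0,1] and hypothetical Beta-distributed data) applies the smoothing transformation C to the series estimate f̂ and recovers the same coefficients, so that CF̂ = F̂:

```python
import numpy as np

J = 6
def basis(u):                      # cosine series, orthonormal on [0, 1]
    u = np.atleast_1d(u)
    cols = [np.ones_like(u)] + [np.sqrt(2.0) * np.cos(j * np.pi * u)
                                for j in range(1, J)]
    return np.stack(cols, axis=-1)

rng = np.random.default_rng(2)
z = rng.beta(2.0, 3.0, size=400)   # hypothetical sample on [0, 1]
a_hat = basis(z).mean(axis=0)      # series coefficients of f-hat

# apply the smoothing transformation C: b_j = integral of p_j(u) f-hat(u) du
u = np.linspace(0.0, 1.0, 100001)
du = u[1] - u[0]
fhat_u = basis(u) @ a_hat
b = (basis(u) * fhat_u[:, None] * du).sum(axis=0)

print(np.max(np.abs(b - a_hat)))   # near zero: C reproduces f-hat
```

By orthonormality, integrating f̂ against each basis function returns its own coefficient, which is exactly the idempotence that builds the bootstrap correction into the estimator.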

Higher order bootstrap smoothing corrections could also be carried out. Consider

F̄_L = [I - (I-C)^L]P̂,

where L is a positive integer. We have F̄₁ = F̂, while F̄_L with L ≥ 2 will correspond to higher order bootstrap corrections. Assuming C is linear, as before, the estimator F̄_L will have an expansion like that of equation (3.2), with j ≤ L replacing j ≤ 2, and a correspondingly smaller first-order bias term. This represents a higher-order bias complementarity, where some of the smoothing bias is shifted from F̂ to δ̂. Of course, if C is idempotent these higher-order bootstrap corrections are built into the estimator, i.e. F̄ = F̂.

We also note that it is possible to bootstrap correct the functional, forming an estimator as

μ̄ = μ(F̂) - [μ(CF̂) - μ(F̂)] = 2μ(F̂) - μ(CF̂).

If the functional were linear then μ̄ = μ(F̃), i.e. the estimator is the same whether we bootstrap correct the functional or the density. In the general nonlinear case the first-order linear terms in the expansion of equation (2.3) would be the same for both estimators, so that μ̄ and μ̃ should have the same first-order bias.
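The contrast between idempotent and non-idempotent C can be sketched numerically (a minimal illustration of ours, not from the paper; the grid, bandwidth, and polynomial basis below are arbitrary choices). Representing C as a matrix acting on a discretized density, the bootstrap smoothing correction I - (I-C)² = 2C - C² leaves a projection-type (idempotent) C unchanged but genuinely alters a kernel-type smoother:

```python
import numpy as np

# Hypothetical discretization: the smoothing operator C acts as a matrix on a
# vector of cell values.  All names and tuning choices here are illustrative.
m = 50
x = np.linspace(0.0, 1.0, m)

# Non-idempotent C: local (kernel-type) averaging on the grid.
h = 0.1
W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
C_smooth = W / W.sum(axis=1, keepdims=True)   # rows sum to one

# Idempotent C: least-squares projection onto cubic polynomials (series-type fit).
B = np.vander(x, 4)                           # basis columns: x^3, x^2, x, 1
C_proj = B @ np.linalg.lstsq(B, np.eye(m), rcond=None)[0]

def corrected(C):
    """Bootstrap smoothing correction of Section 3: 2C - C^2 = I - (I-C)^2."""
    return 2 * C - C @ C

# Projection case: C^2 = C, so the correction changes nothing.
assert np.allclose(corrected(C_proj), C_proj, atol=1e-8)

# Kernel-type case: the correction genuinely differs from C.
print(np.abs(corrected(C_smooth) - C_smooth).max())
```

The projection case is the matrix analogue of the sieve and series estimators discussed above: their built-in idempotence makes the explicit correction redundant.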

4. Linear Kernel Averages

Many important estimators are averages of functions of nonparametric estimators and data observations. Examples include the well known average density estimator Σ_{i=1}^n f̂(x_i)/n and the weighted average derivative estimator of Powell, Stock, and Stoker (1989). The bias corrected estimation results can be extended to cover this case, and we do so in this Section and the following one.

Consider a parameter of interest

μ₀ = E[g(z,F₀)] = ∫ g(z,F₀)F₀(dz),

where g(z,F) is some known function, that may have a restricted domain as a function of F. One way to estimate μ₀ is to plug a nonparametric estimator F̂ into g(z,F) and integrate over the empirical distribution to obtain

(4.1)  μ̂ = ∫ g(z,F̂)P̂(dz) = Σ_{i=1}^n g(z_i,F̂)/n.

One could also use F̂ in place of P̂, although the estimator μ̂ is often computationally simpler.

To understand how a bias correction should be constructed for this estimator, let

(4.2)  m(F) = E[g(z,F)] = ∫ g(z,F)F₀(dz).

Then

μ̂ = ∫ g(z,F₀)P̂(dz) + {m(F̂) - m(F₀)} + S_n,   S_n = ∫ [g(z,F̂) - g(z,F₀)](P̂-F₀)(dz).

Here S_n is a stochastic equicontinuity term that is second order, so to first order μ̂ is ∫ g(z,F₀)P̂(dz) + m(F̂) - m(F₀). The term ∫ g(z,F₀)P̂(dz) is unbiased for μ₀, but the expectation of m(F̂) - m(F₀) may depart from zero, due to smoothing inherent in F̂ and to nonlinearity in F. Therefore, to bias correct μ̂ we need to bias correct for the functional m(F̂).

This correction can be constructed by applying the analysis of Section 2. Suppose that μ(F) = ∫ g(z,F)F₀(dz) satisfies equation (2.3) and has influence function δ(z). The correction is the same as in Section 2, except that the estimator δ̂(z) = δ(z,F̂) estimates the influence function of ∫ g(z,F)F₀(dz), and the corresponding estimator is

(4.3)  μ̃ = μ̂ + ∫ δ̂(z)(P̂-F̂)(dz) = μ̂ + Σ_{i=1}^n δ̂(z_i)/n - ∫ δ̂(z)F̂(dz).

It is also possible to form a bias corrected estimator by applying the analysis of Section 3. Plugging a bootstrap corrected nonparametric estimator F̄ into g(z,F) gives

(4.4)  μ̄ = ∫ g(z,F̄)P̂(dz) = Σ_{i=1}^n g(z_i,F̄)/n.

It can be shown by results analogous to those of Sections 2 and 3 that the need for undersmoothing can be removed by both approaches to bias correction. For brevity we omit this general analysis, which is a special case of Section 5. We focus here on the linear case, where g(z,F) is linear in F and F̂ is linear, where we can derive asymptotic mean-square error results. This allows us to quantify the variance effect of the bias reduction, and includes several interesting examples.
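For the average density case g(z,F) = f(z), the plug-in estimator (4.1) and the correction (4.3) with δ̂(z) = f̂(z) can be sketched in a few lines (our illustration; the Gaussian kernel, the bandwidth, and the closed form for ∫f̂² are assumptions of the sketch, not prescriptions of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(500)   # illustrative data, z_i ~ N(0,1)
h = 0.4                        # bandwidth chosen for illustration only

def fhat(t, z, h):
    """Kernel density estimate f-hat(t) with a standard normal kernel."""
    u = (t[:, None] - z[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

# Plug-in estimator (4.1): mu-hat = sum_i g(z_i, F-hat)/n with g(z,F) = f(z).
mu_plug = fhat(z, z, h).mean()

# For the Gaussian kernel, ∫ f-hat(t)^2 dt has the closed form
#   (1/n^2) Σ_i Σ_j N(z_i - z_j; 0, 2 h^2).
d = z[:, None] - z[None, :]
int_fhat_sq = (np.exp(-d**2 / (4 * h**2)) / (2 * h * np.sqrt(np.pi))).mean()

# Correction (4.3) with delta-hat(z) = f-hat(z):
#   mu-tilde = mu-hat + [ Σ_i f-hat(z_i)/n - ∫ f-hat(z)^2 dz ].
mu_corr = 2 * mu_plug - int_fhat_sq

print(mu_plug, mu_corr)   # both should lie near E[f0(z)] = 1/(2*sqrt(pi))
```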

We consider functionals of the form

μ(F) = E[v ∂^λ{E_F[y|x]f(x)}],

where z includes x, v, and y, and v and y are not elements of x. One example is the average density, where v = y = 1 and λ = 0. Another is a weighted average derivative, where y = 1 and λ is a unit vector, so that by integration by parts

μ₀ = E[v ∂^λ f₀(x)] = E[E[v|x]∂^λ f₀(x)] = -E[f₀(x)∂^λ E[v|x]]/2.

A third example is a density weighted conditional covariance, where λ = 0 and μ(F) = E[v·E_F[y|x]f(x)]. The estimators are obtained by substituting a kernel estimator for ∂^λ{E_F[y|x]f(x)} and averaging over the z_i. They have the form

(4.5)  μ̂ = Σ_{i=1}^n v_i[∂^λ Σ_{j=1}^n K_h(x_i-x_j)y_j/n]/n = Σ_{i=1}^n Σ_{j=1}^n ∂^λ K_h(x_i-x_j)v_i y_j/n².

These types of estimators have been considered by Hall and Marron (1987), Härdle and Tsybakov (1993), Powell and Stoker (1997), and others. Our contribution is to show how the bias complementarity affects the mean-square error of these estimators.

The estimator μ̂ has precisely the form in equation (4.1). To describe this form, suppose that z = (x,w), where w includes y and v. The kernel estimator of the marginal density of x is f̂(x) = Σ_{i=1}^n K_h(x-x_i)/n. Also, for any function a(w), the estimator of E[a(w)|x] is Σ_{i=1}^n a(w_i)K_h(x-x_i)/[nf̂(x)]. Therefore, the estimator of ∂^λ{E_F[y|x]f(x)} is ∂^λ Σ_{j=1}^n K_h(x-x_j)y_j/n, so that μ̂ is equal to ∫ g(z,F̂)P̂(dz) in equation (4.1).

It turns out that with a symmetric kernel and a certain estimator of δ(z), the bias corrected estimator of equation (4.3) is identical to the bootstrap corrected estimator of equation (4.4) based on the corresponding twicing kernel K̄.
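Concretely, for the standard normal K the corresponding twicing kernel is K̄(u) = 2K(u) - (K*K)(u), where K*K is the N(0,2) density. The following check (our own, purely numerical illustration) verifies that K̄ integrates to one and has vanishing second moment but nonzero fourth moment, i.e. it behaves like a fourth-order kernel:

```python
import numpy as np

# Twicing kernel of the standard normal K: K-bar(u) = 2*K(u) - (K*K)(u),
# where K*K is the N(0,2) density (convolution of two standard normals).
def K(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def K_bar(u):
    return 2 * K(u) - np.exp(-u**2 / 4) / np.sqrt(4 * np.pi)

# Numerical moments on a wide uniform grid (an illustrative check, not a proof).
u = np.linspace(-10.0, 10.0, 20001)
du = u[1] - u[0]
m0 = (K_bar(u)).sum() * du          # integrates to 1
m2 = (u**2 * K_bar(u)).sum() * du   # second moment 0: twicing kills h^2 bias
m4 = (u**4 * K_bar(u)).sum() * du   # fourth moment nonzero (equals 2*3 - 12)

print(m0, m2, m4)
```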

To describe this result, apply Newey (1994) to obtain the influence function of μ(F) = E[v ∂^λ{E_F[y|x]f(x)}]. Let b(x) = ∂^λ{E[y|x]f₀(x)} and a(x) = (-1)^{|λ|}∂^λ{E[v|x]f₀(x)}. Then the influence function is

(4.7)  ψ(z) = v b(x) + a(x)y - 2μ₀.

Let Q and B denote the constants given in the Appendix, where B is formed from products of s-th order derivatives of a(x) and t-th order derivatives of b(x), B = Σ_{|j|=s}Σ_{|k|=t}∫ ∂^j a(x)∂^k b(x)dx/(s!t!) up to kernel moment factors.

Theorem 4.1: If Assumptions 2 and 5-7 are satisfied then as n → ∞ and h → 0,

(4.11)  MSE(μ̂) = n⁻¹Var[ψ(z)] + n⁻²h^{-r-2|λ|}Q + h^{2(s+t)}B² + o(n⁻¹ + n⁻²h^{-r-2|λ|} + h^{2(s+t)}).

The first term in this expression is the usual asymptotic variance under √n-consistency; the other two terms are variance and bias terms from kernel estimation. The estimator will be √n-consistent, with the usual asymptotic variance term dominating, if n²h^{r+2|λ|} → ∞ and nh^{2(s+t)} → 0, meaning the pointwise variance of the kernel estimator goes to zero and the product of pointwise biases for a(x) and b(x) shrinks faster than 1/√n. If the bandwidth h is chosen to balance the variance and bias terms, so that n⁻²h^{-r-2|λ|} and h^{2(s+t)} are asymptotically proportional, μ̂ will be √n-consistent if

(4.12)  s + t ≥ r/2 + |λ|.

This condition is weaker than the requirement of Sections 2 and 3: equation (4.12) allows equality rather than the strict inequality of equations (2.10) and (3.6), which is made possible by the linearity of g(z,F) in F and the absence of the own observation term; uniform convergence rates for F̂ are not used here. However, s + t = r/2 + |λ| in equation (4.11) is a knife edge case where the variance remainder term is the same size as the leading term. In this case √n(μ̂ - μ₀) will not be asymptotically normal with variance Var(ψ(z)), so the usual functional asymptotic theory does not apply.

We can compare the MSE in Theorem 4.1 to the MSE for the estimator based on the original kernel to evaluate the effect of the bias correction. Let

B̄ = Σ_{|j|=s}∫ b(x)∂^j a(x)dx/s!.

Then by Powell and Stoker (1997), the estimator μ̂_K obtained from equation (4.9) by replacing K̄ with K has

(4.13)  MSE(μ̂_K) = n⁻¹Var[ψ(z)] + n⁻²h^{-r-2|λ|}{∫[∂^λK(u)]²du/∫[∂^λK̄(u)]²du}Q + h^{2s}B̄² + o(h^{2s} + n⁻²h^{-r-2|λ|} + n⁻¹).

The comparison of the MSE of μ̂ and μ̂_K is partly analogous to a comparison between the pointwise mean-square errors of the corresponding kernel estimators. The ratio of kernel variance terms ∫[∂^λK̄(u)]²du/∫[∂^λK(u)]²du is exactly the same as the ratio of pointwise variance terms for estimation of ∂^λf₀(x) at a point, with ∫[∂^λK̄(u)]²du known to be larger than ∫[∂^λK(u)]²du. On the other hand, because of the bias complementarity, the bias term is not reduced in exactly the same way as for pointwise estimation: the exponent is 2(s+t) rather than 2s, and the constant B depends on t-th order derivatives of b(x) as well as s-th order derivatives of a(x), while B̄ depends only on s-th order derivatives of a(x). This bias reduction is better in some ways than the pointwise bias reduction from a higher order kernel. In particular, it may require no additional derivatives to exist when smoothness of b(x) implies smoothness of a(x) (e.g. as for the average density).

The asymptotic mean-square error (MSE) given in Theorem 4.1, when t = s and the kernel is of order s, can be used to obtain an optimal bandwidth formula for estimation of μ₀. Minimizing the sum of the second and third terms in equation (4.11) gives

(4.14)  h = [Q(r+2|λ|)/(4sB²n²)]^{1/(4s+r+2|λ|)}.

This bandwidth is optimal, in the sense of minimizing the leading terms in the MSE of the estimator. It is interesting to note that although undersmoothing is not required for √n-consistency, undersmoothing is still optimal for μ̂: the optimal bandwidth converges to zero faster than a bandwidth that minimizes the asymptotic mean-square error of the pointwise nonparametric estimator. This feature seems to be specific to linear functional estimators with the own observation deleted. If the own observation were included then there would be an additional term in the MSE which dominates the n⁻²h^{-r-2|λ|} term, and which will make the rate for the optimal bandwidth for functional estimation the same as for nonparametric estimation. Also, as discussed in Section 3, when the functional is nonlinear there is an additional remainder term R(F̂-F₀,F₀) which is of no smaller order than ‖F̂-F₀‖², which under Assumption 1 is of order (max{n^{-1/2}h^{-r/2}, h^s})², so that undersmoothing would not be optimal.

The optimal bandwidth formula in equation (4.14) is potentially useful for choosing the bandwidth in practice. The constants Q and B are unknown, but could be estimated by replacing the unknown nonparametric terms in their formulae by kernel estimates to obtain Q̂ and B̂, which could then be used in place of Q and B in the optimal bandwidth formula to form an estimate.
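Equation (4.14) is straightforward to use once estimates of Q and B are in hand. As a sketch (with arbitrary placeholder values standing in for Q̂ and B̂, which would be kernel estimates in practice), the closed form can be checked against a direct grid minimization of the two leading MSE terms:

```python
import numpy as np

# Equation (4.14): h = [Q (r + 2|lambda|) / (4 s B^2 n^2)]^(1/(4s + r + 2|lambda|)).
# Q and B below are placeholder values standing in for estimates Q-hat, B-hat.
Q, B, n, r, s, lam = 1.3, 0.7, 1000, 2, 2, 0

h_star = (Q * (r + 2 * lam) / (4 * s * B**2 * n**2)) ** (1.0 / (4 * s + r + 2 * lam))

def mse_terms(h):
    """Second and third leading terms of the MSE in (4.11)."""
    return Q / (n**2 * h ** (r + 2 * lam)) + B**2 * h ** (4 * s)

# Check: h_star minimizes the two leading terms on a fine grid around it.
grid = np.linspace(0.5 * h_star, 2.0 * h_star, 10001)
h_grid = grid[np.argmin(mse_terms(grid))]

print(h_star, h_grid)   # the closed form and the grid minimizer should agree
```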

An interesting example is the density weighted average derivative estimator described above, where |λ| = 1 and y = 1. Here the influence function for μ̂ is

ψ(z) = ∂^λf₀(x){v - E[v|x]} - f₀(x)∂^λE[v|x] - μ₀,

as derived in Powell, Stock, and Stoker (1989). Suppose that f₀(x) is s+1 times differentiable and E[v|x] is t times differentiable, that the original kernel has order at least s, that all the following integrals exist, and let

Q = {∫[∂^λK̄(u)]²du}∫ Var(v|x)f₀(x)²dx,

with B the corresponding bias constant, formed from products of s-th order derivatives of f₀(x) and t-th order derivatives of E[v|x]f₀(x), divided by s!t!. Then the conclusion of Theorem 4.1 implies that, as n → ∞ and h → 0,

(4.15)  MSE(μ̂) = Var[ψ(z)]/n + n⁻²h^{-r-2}Q + h^{2(s+t)}B² + o(n⁻¹ + n⁻²h^{-r-2} + h^{2(s+t)}).

In this example the optimal bandwidth with s = t is

h = [Q(r+2)/(4sB²n²)]^{1/(4s+r+2)}.

For example, with a normal kernel and s = t = r = 2, this optimal bandwidth is proportional to n^{-1/6}.
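A minimal one-dimensional sketch of the density weighted average derivative estimator (our own illustrative implementation of the Powell, Stock, and Stoker (1989) form; the Gaussian kernel and bandwidth are arbitrary choices). Because ∂K_h is an odd function for a symmetric kernel, the double sum vanishes when y is constant, as the average derivative of a constant regression should:

```python
import numpy as np

def pss_estimator(x, y, h):
    """Density weighted average derivative, one-dimensional sketch:
    mu-hat = -2 * sum_{i != j} K_h'(x_i - x_j) y_j / (n (n - 1)),
    with K the standard normal density (an illustrative choice)."""
    n = len(x)
    d = x[:, None] - x[None, :]
    # Derivative of the Gaussian kernel K_h(d) = phi(d/h)/h:
    #   K_h'(d) = -(d / h^3) * phi(d / h).
    Kp = -(d / h**3) * np.exp(-0.5 * (d / h) ** 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(Kp, 0.0)          # delete own observations
    return -2.0 * (Kp * y[None, :]).sum() / (n * (n - 1))

rng = np.random.default_rng(1)
x = rng.standard_normal(400)

# Constant regression: the (i,j) and (j,i) terms cancel by antisymmetry of
# K_h', so the estimate is zero up to floating-point rounding.
print(pss_estimator(x, np.ones(400), h=0.5))
```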

A similar result holds under Hölder conditions rather than differentiability. Take λ = 0, and suppose there are functions C_a(x) and C_b(x) such that, for ‖x̄-x‖ small enough,

|a(x̄) - a(x)| ≤ C_a(x)‖x̄-x‖^s,   |b(x̄) - b(x)| ≤ C_b(x)‖x̄-x‖^t,   ∫ C_a(x)C_b(x)dx < ∞.

The order of the bias is again the product of the pointwise bias orders, and if the bandwidth is chosen so that the variance and bias terms are asymptotically proportional then the estimator is √n-consistent if s + t ≥ r/2, with

MSE(μ̂) = Var[ψ(z)]/n + O(n⁻²h^{-r}) + O(h^{2(s+t)}).

This result is similar to the condition in equation (4.12) for the differentiable case, but only requires a Hölder condition rather than derivatives. It shows that the bias complementarity available with the twicing kernel is not solely an artifact of the higher order property of the twicing kernel.

We can illustrate the results we have derived so far by comparing different average density estimators. If we regard μ(F) = ∫ f(z)²dz as a nonlinear functional and derive a bias correction from the influence function estimate δ̂(z) = 2f̂(z), we obtain the corrected estimator of equation (2.6),

μ̃₁ = 2Σ_{i=1}^n f̂(z_i)/n - ∫ f̂(z)²dz.

Plugging in a bootstrap corrected density estimator f̄(z) instead gives

μ̃₂ = ∫ f̄(z)²dz,

which will generally be different from μ̃₁. Regarding μ₀ as the expectation of the density, μ₀ = E[g(z,F₀)] for g(z,F) = f(z), which can be estimated by f̂(z), and noting that E[g(z,F)] has influence function f(z), the bias corrected estimator of equation (4.3) becomes

μ̃₃ = Σ_{i=1}^n f̂(z_i)/n + ∫ f̂(z)(P̂-F̂)(dz) = 2∫ f̂(z)P̂(dz) - ∫ f̂(z)²dz = μ̃₁.

Also, the bias corrected estimator of equation (4.4) will be

μ̃₄ = Σ_{i=1}^n f̄(z_i)/n.

It follows from equation (4.8) that μ̃₄ = μ̃₁ if f̂ and f̄ are kernel estimators where f̄ is based on the twicing kernel corresponding to f̂. Here we see that three of the bias corrected estimators are equal, with the different one being the nonlinear integral ∫ f̄(z)²dz of a bias corrected nonparametric estimator.
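The equality μ̃₄ = μ̃₁ for twicing-kernel estimators can be verified numerically (our sketch; the Gaussian kernel is assumed so that ∫f̂² and K*K have closed forms):

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal(300)
h = 0.35
d = z[:, None] - z[None, :]

def gauss(d, s):
    """N(0, s^2) density evaluated at d."""
    return np.exp(-0.5 * (d / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Average of the original kernel estimate over the data: sum_i f-hat(z_i)/n.
avg_fhat = gauss(d, h).mean()

# ∫ f-hat(z)^2 dz for a Gaussian kernel: pairwise N(0, 2 h^2) terms.
int_fhat_sq = gauss(d, np.sqrt(2) * h).mean()

# mu-tilde-1 = 2 Σ_i f-hat(z_i)/n - ∫ f-hat^2.
mu1 = 2 * avg_fhat - int_fhat_sq

# mu-tilde-4: average of the twicing kernel estimate f-bar, where
# K-bar_h = 2 K_h - (K*K)_h and (K*K)_h is the N(0, 2 h^2) density.
mu4 = (2 * gauss(d, h) - gauss(d, np.sqrt(2) * h)).mean()

print(abs(mu1 - mu4))   # identical up to floating-point rounding
```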

The conclusion of Theorem 2.2 gives √n-consistency of μ̃₁ if f₀ is s times differentiable with s > r/2, the original kernel has order at least s, nh^r → ∞, and √nh^{2s} → 0.

A further reduction in the MSE of the bias corrected kernel estimator is possible by deleting the own observation, to obtain

(4.16)  μ̂ = Σ_{i=1}^n Σ_{j≠i} K̄_h(z_i-z_j)/[n(n-1)].

Applying Theorem 4.1, this estimator will be √n-consistent if the bandwidth is chosen as in equation (4.14) and s ≥ r/4. Here a(x) = b(x) = f₀(x), t = s, and λ = 0, so the estimator is √n-consistent if (s+t)/2 = s ≥ r/4 and the other conditions described above are satisfied. Furthermore, by Theorem 4.2, μ̂ will still be √n-consistent if f₀ satisfies a Hölder condition of order s ≥ r/4 rather than a differentiability condition. The case s = r/4 is a knife edge, where the variance remainder term will be the same size as the leading term Var(ψ(z))/n. The condition that the smoothness be at least r/4 was shown by Bickel and Ritov (1988) to be necessary for existence of a √n-consistent (regular) estimator. Thus, the twicing kernel estimator μ̂ attains √n-consistency under minimal conditions, like the more complicated sample splitting estimator of Bickel and Ritov (1988).

To illustrate the potential efficiency gains from using twicing kernels we have done some exact MSE comparisons for different estimators of the average density when the true density is a standard normal and a standard normal kernel is used to construct the estimator. Specifically, we have calculated the exact MSE for the estimator in equation (4.16) when the z_i are i.i.d. with N(0,I) distribution and K(u) is either the standard normal density or the twicing kernel based on the standard normal. The bandwidths have been chosen to minimize the asymptotic MSE for the respective estimators, as in equation (4.14) for the twicing kernel and the analogous equation for the standard normal kernel (with its corresponding larger bias term). Table One gives the sample sizes at which the MSE for the twicing kernel estimator becomes smaller than the MSE of the normal kernel.

Table One: MSE Crossover Sample Size

    Dimension r:    1   2   3   4   5   6   7   8
    Sample size:   18  17  19  21  25  31  39  52

Figure One graphs the ratio of MSE as a function of sample size for dimensions r = 1 up to r = 4. We restricted attention to the r ≤ 4 case because for r > 4 the standard normal kernel estimator will not be √n-consistent while the twicing kernel estimator will be, up to dimension r = 8, so the MSE ratio would asymptote to zero for higher dimensions. These graphs show persistent MSE gains. Figure Two presents graphs of the MSE as a function of bandwidth for dimension r = 3 and for sample sizes 50, 100, and 200. These graphs allow us to compare the sensitivity of the MSE to the choice of bandwidth for the standard normal and twicing kernels. Although the MSE of the twicing kernel can turn sharply upward at low bandwidths, we find that it is flat, and close to its minimum, over a wide range of bandwidths. In this sense the MSE for the twicing kernel seems less sensitive to the choice of bandwidth than the original kernel.
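In the same spirit as these comparisons, a small Monte Carlo (our own sketch with an arbitrary, non-optimal bandwidth) estimates the average density for r = 1, where the true value is ∫f₀(z)²dz = 1/(2√π) ≈ 0.282:

```python
import numpy as np

def avg_density(z, h, twicing):
    """Own-observation-deleted average density estimator (4.16), r = 1."""
    n = len(z)
    d = z[:, None] - z[None, :]
    def gauss(s):
        return np.exp(-0.5 * (d / s) ** 2) / (s * np.sqrt(2 * np.pi))
    # Twicing kernel: K-bar_h = 2 K_h - (K*K)_h, with (K*K)_h the N(0, 2h^2) density.
    K = 2 * gauss(h) - gauss(np.sqrt(2) * h) if twicing else gauss(h)
    np.fill_diagonal(K, 0.0)
    return K.sum() / (n * (n - 1))

rng = np.random.default_rng(3)
truth = 1.0 / (2.0 * np.sqrt(np.pi))       # E[f0(z)] for the N(0,1) density
n, h, reps = 100, 0.8, 200

est = np.array([[avg_density(rng.standard_normal(n), h, t) for t in (False, True)]
                for _ in range(reps)])
mse = ((est - truth) ** 2).mean(axis=0)
print("MSE normal kernel:", mse[0], " MSE twicing kernel:", mse[1])
```

At this deliberately oversmoothed bandwidth the normal-kernel bias dominates its MSE, so the twicing kernel comes out ahead, consistent with the flatness of its MSE in bandwidth noted above.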

5. Semiparametric M-estimation

Estimators that solve an estimating equation with a nonparametric component have many interesting applications and include as special cases all the ones we have considered so far. This class also includes profile likelihood estimators and many others, e.g. see Bickel et al. (1990). We refer to this class of estimators as semiparametric m-estimators. In this Section we develop bias-corrected versions of these estimators.

To describe a semiparametric m-estimator, let F̂ and P̂ be as previously discussed, let β denote a q × 1 parameter vector, and let m(z,β,F) denote a q × 1 vector of functions. Let β̂ solve

(5.1)  Σ_{i=1}^n m(z_i,β̂,F̂)/n = 0.

This includes as special cases the estimators considered so far, where m(z,β,F) = β - μ(F).

To obtain bias corrections it is useful to begin by expanding the moment equation around the true value of the parameter β₀. Let m(z,F) = m(z,β₀,F) and M̂ = n⁻¹Σ_{i=1}^n ∂m(z_i,β̄,F̂)/∂β, where β̄ lies on the line joining β̂ and β₀ and may actually differ from element to element of ∂m/∂β. Then a mean-value expansion and solving for β̂ gives

(5.2)  √n(β̂ - β₀) = -M̂⁻¹Σ_{i=1}^n m(z_i,F̂)/√n.

The consistency of M̂ for M = E[∂m(z,β₀,F₀)/∂β] will generally follow from consistency of β̂ and F̂ and a uniform law of large numbers, and nonsingularity of M is a straightforward property. Asymptotic normality and √n-consistency of β̂ can then be obtained by applying the analysis of Section 4 to Σ_{i=1}^n m(z_i,F̂)/√n, with g(z,F) = m(z,F), and accounting for the Jacobian term. Bias corrections for β̂ are bias corrections for this moment functional, μ(F) = ∫ m(z,F)F₀(dz), where the undersmoothing considerations are the same as in Section 4.

The first approach is by an influence function correction like that of equation (4.3). Let δ(z) be the influence function of ∫ m(z,F)F₀(dz) and let δ̂(z) be an estimator of this influence function. Then a bias corrected estimator is given by

(5.3)  β̃ = β̂ - M̂⁻¹[Σ_{i=1}^n δ̂(z_i)/n - ∫ δ̂(z)F̂(dz)] = β̂ - M̂⁻¹∫ δ̂(z)(P̂-F̂)(dz).

This estimator has a form like that in equation (4.3), except that the Jacobian estimator M̂ is included to account for the presence of m in equation (5.1). To see how the correction affects the estimator, let ψ(z) = m(z,F₀) + δ(z) - E[δ(z)]. Then

(5.4)  √n(β̃ - β₀) = -M⁻¹[Σ_{i=1}^n ψ(z_i)/√n + √n(R_{1n} + R_{2n} + R_{3n})] + R_{4n},

R_{1n} = ∫[m(z,F̂) - m(z,F₀)](P̂-F₀)(dz),
R_{2n} = ∫[m(z,F̂) - m(z,F₀)]F₀(dz) - ∫ δ(z)(F̂-F₀)(dz),
R_{3n} = ∫[δ̂(z) - δ(z)](P̂-F̂)(dz),
R_{4n} = √n(M̂⁻¹ - M⁻¹)∫ δ̂(z)(P̂-F̂)(dz).

The remainder terms in equation (5.4) are all second-order, so that √n-consistency of β̃ should not require undersmoothing, although precise regularity conditions are difficult to specify at the level of generality considered here.

One important property of semiparametric estimators follows immediately from equation (5.4). Suppose that δ(z) = 0, meaning that estimation of F does not affect the asymptotic distribution of β̂. Then δ̂(z) = 0 is consistent, and the bias corrected estimator would be the original estimator. Consequently, the original estimator should not need undersmoothing. Thus, we find that if the presence of F̂ does not affect the large sample distribution of β̂, no undersmoothing will be needed.

There are many interesting semiparametric estimators where estimation of F does not affect the asymptotic variance, and so undersmoothing may not be needed. As shown in Newey (1994), if m(z,β,F) is the gradient of a function q(z,β,F) and F is an estimator of the "profile" distribution that maximizes E[q(z,β,F)], then estimation of F does not affect the asymptotic variance. Hence, undersmoothing may not be needed for any of these estimators. Many known semiparametric estimators are special cases of this result, including Robinson (1988), Chen and Shiau (1991), and Ichimura (1993). Also, if m(z,β,F) is an efficient score for β in a semiparametric model and F̂ is an estimator that imposes the restrictions implied by the model, then the limiting distribution of β̂ is not affected by estimation of F when the model is correct, as has been demonstrated in many examples in the literature. Hence, undersmoothing may not be needed when m(z,β,F) is an efficient score and the model is true.

The second approach to bias correction is to use a bootstrap corrected estimator F̄ in the formation of the estimator of β, choosing β̄ as the solution to

(5.5)  Σ_{i=1}^n m(z_i,β̄,F̄)/n = 0.

This estimator should not need undersmoothing because F̄ will replace F̂ in the average Σ_{i=1}^n m(z_i,F̂)/n in the expansion of equation (5.2). As discussed in Section 3, any estimator F̂ that is an idempotent transformation of the empirical distribution, such as a sieve estimator or a series estimator of a conditional expectation, has this correction built in, so undersmoothing may not be required for √n-consistency of β̂.

We could derive precise results on bias-corrected m-estimation for both the estimators of equation (5.3) and equation (5.5). However, because the ideas here are basically the same as in Sections 2 and 3, we choose to be brief, and consider only the estimator of equation (5.5), focusing on conditions for √n-consistency when F̄ is a twicing kernel estimator.
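To make the m-estimation setup concrete, here is a minimal sketch (entirely our construction, not an estimator from the paper): take m(z,β,F) = f(x)(y - β), a density weighted mean, so that the sample moment condition (5.1) with a kernel plug-in f̂ has a closed-form solution and an explicit Jacobian term M̂:

```python
import numpy as np

# Illustrative m-estimator: m(z, beta, F) = f(x) * (y - beta), a density
# weighted mean.  The data generating process and bandwidth are arbitrary
# choices for the sketch.
rng = np.random.default_rng(4)
n, h = 500, 0.4
x = rng.standard_normal(n)
y = 1.0 + 0.5 * x + 0.1 * rng.standard_normal(n)

d = x[:, None] - x[None, :]
fhat = (np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2 * np.pi))).mean(axis=1)

# Solve the moment condition sum_i f-hat(x_i) * (y_i - beta) / n = 0 for beta.
beta_hat = (fhat * y).sum() / fhat.sum()

# Jacobian term M-hat = n^-1 sum_i d m / d beta = -mean of f-hat (nonsingular).
M_hat = -fhat.mean()

print(beta_hat, M_hat)
```

Here the true density weighted mean is E[f₀(x)y]/E[f₀(x)] = 1 by symmetry, so β̂ should be close to one; in a serious application β would be found by a root-finder when no closed form exists.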

Newey and McFadden (1994) have already given general results on √n-consistency of semiparametric kernel estimators. We build on their results, deriving a corresponding result for twicing kernels, showing that undersmoothing is not needed. We adopt their specification for m(z,β,F), where m depends on F only through γ(x) = E[w|x]f(x) and w is a vector of random variables that are not elements of x. A twicing kernel estimator of the true function γ₀(x) = E[w|x]f₀(x) would be

γ̄(x) = Σ_{i=1}^n K̄_h(x-x_i)w_i/n.

Letting m(z,β,γ) be a q × 1 vector of functions that depends on β and the function γ(·), a kernel semiparametric bootstrap corrected m-estimator would be β̄ solving

0 = Σ_{i=1}^n m(z_i,β̄,γ̄)/n.

To specify regularity conditions we modify the norm ‖·‖ to apply to γ. Let X denote a compact set and ‖γ‖ = sup_{x∈X,|j|≤d}‖∂^jγ(x)‖. This is a Sobolev norm like the one used for the kernel results of Sections 2 and 3.

Assumption 9: γ₀(x) is continuously differentiable to order s with bounded derivatives, and for Δ small enough, ∫ sup_{|j|≤s}|∂^jγ₀(x+Δ)|dx < ∞.

Assumption 10: For m(z,γ) = m(z,β₀,γ) there are b(z) and D(z,γ), with D(z,γ) linear in γ, such that for ‖γ-γ₀‖ small enough, ‖m(z,γ) - m(z,γ₀) - D(z,γ-γ₀)‖ ≤ b(z)‖γ-γ₀‖², ‖D(z,γ)‖ ≤ b(z)‖γ‖, and E[b(z)] < ∞. Also, there is a function ν(x) such that ∫ D(z,γ)F₀(dz) = ∫ ν(x)γ(x)dx.

Let γ*(x) = ∫ K̄(u)γ₀(x+hu)du, ρ_n = ln(n)/√(nh^{r+2d}) + h^s, and

R_{1n} = √n∫[m(z,γ̄) - m(z,γ₀) - D(z,γ̄-γ₀)]F₀(dz),
R_{2n} = √n∫[m(z,γ̄) - m(z,γ₀)](P̂-F₀)(dz),
R_{3n} = √n∫[γ*(x) - γ₀(x)]w P̂(dz).

We first give a result that bounds the remainder terms in the expansion for the average Σ_{i=1}^n m(z_i,γ̄)/√n.

Theorem 5.1: If Assumptions 2, 9, and 10 are satisfied, then for ψ(z) = m(z,γ₀) + ν(x)w - E[ν(x)w],

Σ_{i=1}^n m(z_i,γ̄)/√n = Σ_{i=1}^n ψ(z_i)/√n + O_p(√nρ_n² + √nh^{s+t} + h^s).

Furthermore, if ν(x) = 0, then this result also holds with K(u) replacing K̄(u) and remainder O_p(√nρ_n²).

To show a corresponding √n-consistency result for the semiparametric estimator we need to make assumptions that ensure convergence of the Jacobian term M̂. The next condition suffices.

Assumption 11: For all β in a neighborhood of β₀ and all γ with ‖γ-γ₀‖ small enough, m(z,β,γ) is continuously differentiable in β, for some c > 0,

‖∂m(z,β,γ)/∂β - ∂m(z,β₀,γ₀)/∂β‖ ≤ b(z)(‖β-β₀‖^c + ‖γ-γ₀‖^c),

and M = E[∂m(z,β₀,γ₀)/∂β] exists and is nonsingular.

Theorem 5.2: If Assumptions 2 and 9-11 are satisfied, ψ(z) has mean zero and finite second moments, √nρ_n² → 0, and √nh^{s+t} → 0, then

√n(β̄ - β₀) →_d N(0, M⁻¹Var(ψ(z))M⁻¹′).

Furthermore, if ν(x) = 0, then the same result holds with K(u) replacing K̄(u).

When ν(x) is nonzero the conditions for √n-consistency of this estimator are exactly analogous to those discussed following Theorem 3.1; in particular, undersmoothing will not be needed if s + t > r/2 + d. The case where ν(x) = 0 corresponds to estimation of γ having no effect on the asymptotic variance of β̂. As previously discussed, no bias correction is needed for this case: √n-consistency follows from Theorem 5.2 even if the kernel is the original (non-twicing) one.
-y

6.

Conclusion

paper we have shown that

In this

is

it

estimator where the degree of smoothing

often possible to use a nonparametric

optimal (MSE minimizing) for estimation of the

is

function to construct a v'n-consistent estimator of a functional.

One approach involves

adding a bias correction to the estimator where the nonparametric estimate

Another involves a smoothing correction to the nonparametric estimator.

is

plugged

in.

An important

class of nonparametric estimators, that are idempotent transformations of the empirical distribution, have this smoothing correction built

in,

so that undersmoothing

is

not

needed for /n-consistency, including orthogonal series density estimators, series estimators of conditional expectations, and sieve estimators.

The influence function plays a key role

in

achieving Vn'-consistency without

For the additive correction to a plug

undersmoothing.

in

estimator

it

must be possible to

construct a nonparametric estimator of the influence function that converges sufficiently fast.

For the smoothing adjustment to the nonparametric estimator the influence function

must be smooth as a function of the data. to the functional.

A

These properties may or may not be intrinsic

topic of future research

would be to seek to identify the class of

functionals for which Vn-consistent estimation without undersmoothing

We have

is

possible.

discussed bias correction and undersmoothing for an increasingly general

class of estimators, beginning with functionals of a density and ending with

semiparametric m-estimators.

undersmoothing

is

not needed

variance of the estimator. estimators.

We showed

We found

that for semiparametric m-estimators

when nonparametric estimation does not affect the asymptotic

Regularity conditions were given for "twicing" kernel

that this class of kernel estimators has a special property, being

the outcome of a smoothing correction applied to the original kernel, that removes the necessity of undersmoothing for these estimators.

Our numerical results show MSE gains

for the twicing kernel at small sample sizes and less sensitivity to bandwidth choice in

estimation of the average density, indicating some potential for use in practice.

- 41 -

Appendix: Proofs of Theorems

Throughout the Appendix, that

may

Op =

By

2.1:

Op

(ln(n)/-/Hh'^''^*^

>

Also, for small

0.

will represent a generic positive constants,

enough

Newey and McFadden

8.10 of

R

Therefore, for

+ V^h^^).

By continuity of

jK(u)[6(2+hu)-6(z)]du.



Lemma

(ln(n)/nh'^"^^^ + h^^).

O(VnllF-F^II^) =

h

C

be different in different uses.

Proof of Theorem IIF-F^II^

and

c

5(z)

bounded

|5(z+A)|

probability one.

theorem

Var(B

Also,

n

2 |e(z,h)|

^ [Jb(z,u)du]

= Var(e(z,h}} ^ E[e(z,h)

)

2 ]



2

£ C,

0.

^

as

>

=

for

e(z,h) =

that

as



is finite

K

by

u h

u

U,

= b(z,u),



e(z,h)

n

for almost all

and has finite integral over

compact, so by the dominated convergence theorem

R

= y.",e(z.,h)/v^ ^1=1 1

n

having bounded support

K(u)

sup |K(u)[5(z+hu)-5(z)]| < l(uel/)sup |K(u)l2sup

with probability one by

B

5(z+hu)-5(z) -^

6(z),

by

h,

Also, for

follows that

it

equation (2.4), ^

in

n

(1994),

with

>

so by the dominated convergence

Also, by the usual

mean-value

expansion,

jK(u)[fQ(z-hu)-fQ(z)]du

(A.l)

.uVf^(x)

= JK(u){E^~}((-h)-J/j!)E,^,

= (-h)^X I

where

I

h

I

convenience.

£

|

h

|

^

uVf^(x-hu)Mu

^/K(u)u'^a\Q(z-hu)du/s!, I

and dependence of

Then for small enough

|E[e(z,h)]|

+ {{-hf/s\)l,^,

h

on

z

and

u

is

suppressed for notational

h,

= |J'jK(u)[5(z+hu)-5(z)]dufQ(z)dz| ^ J|5(z)

|

|

J-K(u)[fQ(z-hu)-fQ(z)]du|dz

^ Ch^I|;^,=s[^|K(u}|u^du]JsuP||^„^^la\Q(z+A)|dz = O(h^).

Then by the Markov inequality,

B

=

(V^

+ o(l)),

giving the first conclusion.

Furthermore, under the stated bandwidth conditions both

- 42 -

B

n

-^

and

R

n

-^

0,

giving

>

QED.

the second conclusion.

The following Lemma

g(z) = E[w|z],

variable,

Lemma linear,

that

£

\n(r)\

By

for

h

2

Clly

II

=

for

For

n

z ^

all

Z

=

2'„(z)

linearity of

m(3^),

for

Let

h

continuous in

(

+2^

= m(^

)

=

and

Z

Also, by

).

--z))] = E[m(g(z)K,

h

(

--z))]

^

|m(3r„)|

(--zlH =

IIK

so by linearity of

z g S,

in the

z

on

z

)?.

d

and hence

is

uniform measure on

semi-norm It

to order

K(v)

if.

m.{is),

=

G

BhJGh

support and linearity of



>

it

G

>

(

n

•-z))du.

IIJ'„'3r(u)K.

h

('-uldCu

For

T

n

from Section

-u)ll

- 43 -



Furthermore, by

u,

having finite

J

)

J

2,

Also,

J

= J"m(3'(u)K (•-u))du h J

.

Then by the

m(r) = m(J„r(u)K, (--ujdu) = J„m(3'(u)K, (--ujldu = E[m(wK. (--z))]. 6 h G n h

2.2:

is

of up

triangle inequality,

Proof of Theorem

and

continuous and

iS^

z

^-Cz)

9'(z)K ('-z)

J~„m(y(z)K,

is

m(X„3'(u)K. (•-u)du).

h



J

follows that

m(J"„9'{u)K ('-ujdu

mC^-),

is

h

with respect to S,

compact,

m(3r(z)K, ('-z))

h

bounded and continuous on

J

Z

J'„m(3'(z)K, (--zjjdu,

y(u)K, (z-u)

m(J'„3r(u)K, (•-u)du

Then, by continuity of

and by

d

Hence

ll»ll.

follows that

Also, since each derivative of

0,

llj-^ll

and

J

continuous differentiability of

to order

so that

n

such

be a sequence of measures with finite support, that

u,

in distribution to the

in

= m(y

all

h

h

J'„m(3'(u)K, ('-uljdu.

bounded

G

i.

m('j')

m(3^(z)K (--z)) = 0,

i?,

h

converge

0,

i

^

a compact set

is

y,(z) = J„3r(u)K, (z-u)du

Let

is

iJ-C^f)

E[m(wK^( --z))] = m(^).

compact there

u i ^.

z e Z,

g(z)f (z).

are continuous,

f^(z)

E[wK^('-z)],

=^

)

E[m(wK^(--z))] = E[wm(K, (--z))] = E[g(z)m(K,

b

denote a random

Z.

Then by

0.

'

and

z e Z

all

3-(u)K, (z-u)du. joc

r(

and

g(z)

having bounded support and

=

K, (z-u)

satisfied,

is

w

Let

denote a possible value of

^-(z)

then for

Cllj-ll,

K(u)

r„(z) = S

and

If Assumption 2

A.l:

Proof:

useful in the proofs to follow.

is

QED.

It

(A.2)

s VES\Siz)-5{z)\\hz)-rAz)\dz

O

I

n

£ v/Kj-blzJlfCzj-fQCzJIdzllF-FQll + v^JI D(z,F-Fq)

=

< ZV^JblzJdzllF-F^II

|

+ /Kh

(ln(n)/V^h

|

f(z)-fQ(z) |dz

).

p

Next, by

bounded,

b{z)

^ Cb{z)h

b{z)IIK ('-z)!!

D(z,F-F

that

Also, by Assumptions

m.




as

h

~



>

0.

Since

[a(x)v + £(x)y]

theorem imply that



>

2

0,

Var(a(x)v+?(x)y)

'



>

'

).

By applying the dominated

follows from Assumption 7 that

a(x)v + 2(x)y

so that

2 2 ^ C(v +y ),

it

otn^V"""^' ^

^ (Q-Q^) +

a(x) = XK(u)a(x+hu)du.

and

that

(A. 9)

follows that

convergence theorem as we have done previously

Ux)

^''^').

-2,-r-2|A|,

''^ '

^'^'q^ + o([l/n(n-l)]h ^

'^



>

a(x)v + £(x)y

2(x) -

as

Assumption 7 and the dominated convergence Furthermore, by integration by parts

Var(i//(z)).

and interchanging the order of differentiation and integration,

Emz^,z^)+k{z^,z^)\z^] = E[S'^K^(x^-X2)y2V^|z^] + ElS'^K^Cx^-x^ly^v^lz^]

=

hl2 22

E[S'^K, (x -x„)E[y„|x„]|z,]v,

11

h21

22

+ E[a'^K. (x„-x,)E[v„ |x„] Izjy,

= [ja'^Kj^(x^-x)E(y|x]fQ(x)dx]v^ +

(-1)

''^ '

11

ja'^Kj^(x^-x)b(x)dx]y^ = a(x)v + 2(x)y.

Therefore,

(A.12)

[(n-2)/n(n-l)]Var(E[fe(z^,Z2)+^(z2,z^)|z^])

=

Var(i/>(z))/n + o(n"^).

Plugging the results from (A.10)-(A.12) into {A.8) and using

- 47 -

MSE(jn

)

= Var(fx

)

+

2 (E[/j

to combine equations (A. 7) and (A. 8) and then gives the result.

]-Mf^5

By Assumption 8 and

Proof of Theorem 4.2:

made as small Then for

having bounded support,

as desired uniformly over the support of

small enough,

h

K(u)

=

|a(x)-a(x)|

|

u

by choosing

h

QED.

hu

can be

small enough.

£

JK(u)[a(x+hu)-a(x)]du|

J|K(u)| |a(x+hu)-a(x)|du ^ J|K(u)|C (x)llhull^du = h^C (x)/] K(u) llull^du = Ch^C (x)

and

|

similarly,

|b(x)-b(x)|

lEfjl

s Ch C

a

(x).

It

then follows by equation

^ T|i(x)-a(x)| |b(x)-b(x)|dx
