"undersmoothing" of the nonparametric estimator to achieve /n-consistency, meaning the bias of the nonparametric estimator shrinks faster than its variance.
WORKING PAPER
DEPARTMENT OF ECONOMICS

Undersmoothing and Bias Corrected Functional Estimation

Whitney Newey

No. 98-17                                             October 1998

Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02142
UNDERSMOOTHING AND BIAS CORRECTED FUNCTIONAL ESTIMATION

Abbreviated Title: Undersmoothing and Bias Correction

By Whitney Newey, Fushing Hsieh, and James Robins
Massachusetts Institute of Technology, Academia Sinica, and Harvard University

First version October 1991; revised July 1998.
SUMMARY

There are many important examples of √n-consistently estimable functionals that are interesting in econometrics, such as average derivatives and nonparametric consumer surplus. Corresponding estimators may require undersmoothing to achieve √n-consistency, due to first order bias in the expected influence function. We give a general bias correction that can be added to a plug-in estimator to remove the need for undersmoothing and improve its higher order properties. We also describe a bootstrap smoothing correction for the nonparametric estimator that achieves analogous results for the plug-in estimator, and show that idempotent transformations of the empirical distribution need not require undersmoothing for √n-consistency. We find that this bias correction can lead to large efficiency improvements and lower sensitivity to bandwidth choice.

Research partially supported by NSF grant SBR-9409707.

Key words and phrases: Functional estimation, nonparametric estimation, √n-consistency, undersmoothing, influence function, bias correction.
1. Introduction

Functionals of nonparametric estimators that can be √n-consistent have important applications in econometrics, including average derivatives and average consumer surplus. Most functional estimators are based on nonparametric estimation and many require "undersmoothing" of the nonparametric estimator to achieve √n-consistency, meaning the bias of the nonparametric estimator shrinks faster than its variance. In this paper we show that this requirement can be removed, by either adding a bias correction term to the functional estimator or using a smoothing correction for the nonparametric estimator, leading to √n-consistent functional estimation without undersmoothing.
These modifications also lead to functional estimators with improved higher-order efficiency, that may attain √n-consistency when others do not. The source of this improvement is a reduction in the bias of the functional estimator. Many previous functional estimators have a bias that is the same order as the bias of the density estimator. In contrast, the estimators we develop have a bias that is the same order as the product of the bias of the nonparametric estimator with another bias term. The other bias term is that for the influence function of the estimator, which is the mean-square derivative of the functional. We refer to the corresponding reduction in bias order as a bias complementarity, with the influence function bias term complementing the nonparametric bias term.
The properties of the influence function play a key role in our analysis. When the influence function is smooth, in certain ways to be made precise below, the influence function bias term will be small enough that the need for undersmoothing will be removed. For some of the estimators we consider, the influence function will be required to be smooth as a function of the density. This is a regularity condition that is often satisfied. For other estimators the influence function will be required to be smooth as a function of the data, a condition that does not hold for some functionals.
We consider two approaches to bias corrected functional estimation. Our first approach is to add to the functional estimator a bias correction, consisting of the difference of integrals of an influence function estimator over the empirical distribution and the nonparametric estimate. This additional term does not affect the asymptotic distribution of the estimator in the √n-consistent case, but is used to obtain improved higher order properties. Our second approach is to do a bootstrap smoothing correction to the nonparametric estimator before using it in functional estimation. This correction consists of replacing a nonparametric distribution estimator F̂ by F̃ = F̂ - (Ĝ - F̂) = 2F̂ - Ĝ, where Ĝ is an estimator obtained from F̂ by the same transformation that F̂ is obtained from the empirical distribution. We show that using this estimator removes the need for undersmoothing when the influence function is smooth enough. For kernel estimators this smoothing correction leads to a particular kind of kernel, the "twicing" kernel described below. We also show that undersmoothing will not be needed for √n-consistency when F̂ is an idempotent transformation of the empirical distribution, so that Ĝ = F̂ and F̃ = F̂, and the bootstrap correction is built into the original estimator. This idempotent estimator class includes orthogonal series density estimators, series estimators of conditional expectations, and sieve estimators, and thus provides an explanation for the lack of an undersmoothing requirement for these estimators.
We emphasize that the bias reduction we consider does not come from reducing the bias of the density estimator. Such higher order bias reductions (e.g. via higher-order kernels) do not remove the requirement that bias shrinks faster than variance for the nonparametric estimator. That requirement can only be removed if the bias of the functional estimator is smaller order than the bias of the nonparametric estimator. This reduction in bias is brought about by the bias complementarity we will discuss.

The bias reduction may be accompanied by some increase in the variance of the functional estimator. Although the variance still shrinks at the same rate, the size of the variance will be larger. In large samples the reduction in bias will allow adjustment of the smoothing parameters so that the variance is smaller, although the bias reduction could increase the small sample variance of the estimator. Bias reductions for nonparametric estimators (like higher order kernels) have similar properties, except that they depend on higher order properties of the nonparametric estimator, whereas our functional bias reduction depends on the properties of the influence function. We find that in a simple example the bias correction can give a large efficiency gain and reduce sensitivity to the choice of bandwidth.
The type of estimator we consider has antecedents in the literature. Bickel and Ritov (1988) developed an estimator for the average density that is √n-consistent under minimal smoothness conditions. This estimator has a similar form to the bias corrected functional discussed here, as do estimators in Pastuchova and Hasminskii (1989). Our contribution is to give a general version of this type of estimator and show the improvement in its properties. Also, we show how bootstrap corrected nonparametric estimators remove the need for undersmoothing, and hence that undersmoothing is not needed for idempotent nonparametric estimators.

Section Two of the paper describes the general form of the bias corrected functional we consider, and gives results for kernel estimators. Section Three describes a nonparametric estimator with a bootstrap correction for smoothing and its use for functional estimation. Section Four considers a special class of linear estimators where mean-square error calculations are feasible, showing more precisely the higher order effect of the bias correction, and giving exact MSE calculations for one case. Section Five analyzes semiparametric m-estimators, shows that undersmoothing is not needed when the nonparametric estimator has no effect on the limiting distribution, and derives results for kernel estimation with a smoothing correction. Section Six concludes.
2. Bias Corrected Functional Estimation

To focus on the essential features of the problem at hand, it is useful to begin our discussion with a linear functional

    μ(F) = ∫δ(z)F(dz),

where δ(z) is known and F denotes some unsigned measure (charge) on z. Although this case is relatively simple, the role of undersmoothing in achieving √n-consistency is easily understood here. Let F̂ be some nonparametric estimator of the true distribution F₀. For example, F̂ could correspond to a nonparametric density estimator f̂, with F̂(z) = ∫1(u≤z)f̂(u)du. Consider the estimator μ̂ = μ(F̂) = ∫δ(z)F̂(dz) = ∫δ(z)f̂(z)dz, and let F̄(z) = E[F̂(z)]. Assuming that the order of integration can be interchanged, the bias of this estimator is

(2.1)    E[μ̂] - μ₀ = ∫δ(z)F̄(dz) - ∫δ(z)F₀(dz) = ∫δ(z)(F̄-F₀)(dz).
If the order of this bias is the same as the order of the pointwise bias of the nonparametric density estimator, as occurs in many cases, then √n-consistency will involve the pointwise bias shrinking faster than 1/√n. Furthermore, the pointwise standard deviation of a nonparametric density estimator generally shrinks no faster than 1/√n, so that √n-consistency will involve the pointwise bias shrinking faster than the pointwise standard deviation. This property is referred to as undersmoothing, since less bias is generally associated with less smoothing, and the fastest shrinkage of mean square error is generally associated with bias and standard deviation shrinking at the same rate.
A way to reduce the bias of μ̂ in equation (2.1) is to form an estimator of the bias term and subtract it off. In this simplest case there is an unbiased estimator of the bias term,

(2.2)    ∫δ(z)F̂(dz) - Σ_{i=1}^n δ(z_i)/n,

and subtracting it off gives

    μ̃ = μ̂ + Σ_{i=1}^n δ(z_i)/n - ∫δ(z)F̂(dz) = Σ_{i=1}^n δ(z_i)/n,

the usual unbiased estimator.
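The collapse of the corrected linear estimator to the sample average can be checked numerically. In this minimal sketch, δ(z) = z², the Gaussian kernel, and the grid integration are illustrative choices, not anything from the paper:

```python
import numpy as np

# Plug-in estimate of the linear functional mu(F) = int delta(z) F(dz) from a
# kernel density estimate, plus the unbiased correction of equation (2.2).
# delta(z) = z^2 and the Gaussian kernel are illustrative assumptions.
rng = np.random.default_rng(1)
z = rng.standard_normal(500)
h = 0.4

grid = np.linspace(-8.0, 8.0, 4001)
dg = grid[1] - grid[0]
# kernel density estimate fhat on the grid
fhat = np.exp(-0.5 * ((grid[:, None] - z[None, :]) / h) ** 2).sum(axis=1) \
       / (len(z) * h * np.sqrt(2.0 * np.pi))

delta = grid ** 2
mu_plug = (delta * fhat).sum() * dg        # mu_hat = int delta(z) fhat(z) dz
bias_est = mu_plug - (z ** 2).mean()       # (2.2): int delta dFhat - sum_i delta(z_i)/n
mu_corrected = mu_plug - bias_est          # equals the sample average exactly
```

Subtracting the estimated bias term reproduces Σδ(z_i)/n, as the algebra above indicates; the plug-in estimate itself is off by roughly h² for this choice of δ.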
For nonlinear functionals the same basic idea can be applied to a first-order bias term. To describe this approach, suppose that μ(F) has some restricted domain ℱ to which both F̂(z) and F₀(z) belong. For example, ℱ might be restricted to have elements that are absolutely continuous with respect to Lebesgue measure, with a density f corresponding to each F ∈ ℱ. Also suppose that there is an expansion of μ(F) on ℱ such that

(2.3)    μ(F) = μ(F₀) + ∫δ(z,F₀)(F-F₀)(dz) + R(F-F₀,F₀),    |R(F-F₀,F₀)| = o(‖F-F₀‖),

where ‖F‖ denotes a function semi-norm. Let δ(z) = δ(z,F₀) and ψ(z) = δ(z) - E[δ(z)], and let P̂ denote the empirical distribution. Then the plug-in estimator μ̂ = μ(F̂) satisfies

(2.4)    √n(μ̂ - μ₀) = Σ_{i=1}^n ψ(z_i)/√n + B_n + R_n,    B_n = √n∫δ(z)(F̂-P̂)(dz),    R_n = √nR(F̂-F₀,F₀).

The first order bias of this estimator will be E[B_n] = √n∫δ(z)(F̄-F₀)(dz), with R_n including higher order terms. A bias correction can be formed by subtracting an estimate of this first order term. If δ(z) were known, an unbiased estimate of the first order term would be ∫δ(z)F̂(dz) - Σ_{i=1}^n δ(z_i)/n. A feasible version of this bias estimate can be formed by replacing δ(z) with an estimator δ̂(z). Then subtracting the corresponding bias estimator gives

(2.5)    μ̃ = μ̂ + Σ_{i=1}^n δ̂(z_i)/n - ∫δ̂(z)F̂(dz) = μ̂ + ∫δ̂(z)(P̂-F̂)(dz).

Unlike the linear functional case, this estimator will not be exactly unbiased, because of the higher order term R(F̂-F₀,F₀) and the estimation of δ(z). Nevertheless, because we have subtracted an estimator of the first-order bias of μ̂, it should have smaller bias than μ̂. In fact, as discussed below, μ̃ has an asymptotic expansion that is the same as μ̂, except one of the remainder terms is of smaller order.
A simple example is the average density. Suppose that ℱ is restricted to contain only absolutely continuous elements with square integrable densities and let μ(F) = ∫f(z)²dz. Letting f and f₀ denote the densities corresponding to F and F₀, note that

    μ(F) = μ(F₀) + ∫2f₀(z)[f(z)-f₀(z)]dz + ∫[f(z)-f₀(z)]²dz,

so that equation (2.3) is satisfied with δ(z,F₀) = 2f₀(z) and ‖F‖ = {∫f(z)²dz}^{1/2}. Letting δ̂(z) = δ(z,F̂) = 2f̂(z), a bias corrected estimator is given by

(2.6)    μ̃ = ∫f̂(z)²dz + ∫2f̂(z)(P̂-F̂)(dz) = 2Σ_{i=1}^n f̂(z_i)/n - ∫f̂(z)²dz.

In this example the bias corrected estimator is a linear combination of two well known estimators, the plug-in estimator ∫f̂(z)²dz and the average estimated density Σ_{i=1}^n f̂(z_i)/n.
To see the effect of the bias correction, suppose that f₀(z) is continuously differentiable of order s+d on ℝ^r with bounded derivatives, and that ∫sup_{|Δ|≤c}|∂^λf₀(z+Δ)|dz < ∞ for some c > 0 and all λ with |λ| ≤ s+d.

Theorem 2.1: If Assumptions 1-3 are satisfied, δ(z) is continuous with probability one and bounded, and h = h_n is such that nh^{r+2d}/ln(n) → ∞ and √nh^s → 0, then B_n = O_p(ln(n)/√(nh^{r+2d}) + √nh^s) = o_p(1) and

    √n(μ̂-μ₀) = Σ_{i=1}^n ψ(z_i)/√n + o_p(1).

The hypotheses of this result require undersmoothing. The optimal bandwidth (minimum mean-square error) for estimation of the d-th derivative of f₀(z), when f₀(z) has s+d continuous and bounded derivatives and the kernel is of order s, is h = n^{-1/(r+2d+2s)}. Here √nh^s does not go to zero, so h has to be chosen smaller than n^{-1/(r+2d+2s)} in order to have √nh^s → 0.
To obtain conditions for √n-consistency of the bias corrected estimator, it is essential to be specific about δ̂(z). Here we assume that δ̂(z) = δ(z,F̂), where δ(z,F) satisfies certain smoothness conditions in F.

Assumption 4: There is a set of functions ℱ with F̂ ∈ ℱ with probability one, a bounded b(z), and D(z,F) linear in F, such that for F ∈ ℱ and small enough ‖F-F₀‖,

    |δ(z,F) - δ(z,F₀) - D(z,F-F₀)| ≤ b(z)‖F-F₀‖²,    |D(z,F)| ≤ b(z)‖F‖.

This condition follows from second order Frechet differentiability of δ(z,F) in F, with bounded derivative. It is helpful in deriving the order of both the stochastic equicontinuity term and the nonlinear remainder term. We use this assumption, rather than empirical process methods, because a direct proof for kernel estimators (as in Newey and McFadden, 1994) seems to allow for a wider class of functionals. In particular, empirical process methods for showing that the stochastic equicontinuity term goes to zero rely heavily on smoothness of δ(z,F) in z, which can be avoided by using Assumption 4. Also, when δ(z,F) is linear in F (i.e. b(z) = 0, as for the average density), U-statistic theory gives this result under very weak conditions. These conditions lead to the following result for the bias corrected estimator:
Theorem 2.2: If Assumptions 1-4 are satisfied and h = h_n is such that nh^{r+2d}/ln(n) → ∞, h → 0, and √nh^{2s} → 0, then

(2.9)    √n(μ̃-μ₀) = Σ_{i=1}^n ψ(z_i)/√n + O_p(ln(n)/√(nh^{r+2d}) + √nh^{2s} + h^s) + o_p(1).

The upper bound on B_n obtained here will be smaller than the upper bound obtained for μ̂ in Theorem 2.1, for a range of bandwidths h satisfying the conditions of this theorem. Also, if h is the mean-square minimizing value for estimation of the d-th derivative of f₀(z), which is the bandwidth h = n^{-1/(r+2d+2s)}, then √nh^{2s} → 0 requires s > d + r/2, and in this case the estimator will be √n-consistent for any value of h satisfying the conditions of this theorem. This result then shows that √n-consistency of μ̃ does not require undersmoothing of f̂.

The conditions for √n-consistency of μ̃ are weaker than the conditions for √n-consistency of μ̂, in the sense that they require less smoothness. Specifically, existence of h satisfying the bandwidth conditions of Theorem 2.1 requires s > r + 2d, while existence of h satisfying the conditions of Theorem 2.2 for √n-consistency requires only

(2.10)    s > (r+2d)/2.

Thus, with the bias-corrected estimator f₀(z) is only required to have half the number of derivatives as for the original estimator. Alternatively, if f₀(z) has all the derivatives that are needed, the kernel for μ̃ need only be half the order of the kernel for μ̂, and for s large enough the bandwidth which is pointwise optimal for estimation of the d-th derivative will satisfy the conditions for √n-consistency. This improvement is achieved at the expense of smoothness of δ(z,F) as a function of F, as in Assumption 4.
3. Bootstrap Corrected Nonparametric Estimation

Another approach that also reduces the size of the bias, and can remove the need to undersmooth, is to plug in a nonparametric estimator that has been corrected for smoothing. To motivate this correction, recall that the first order bias in functional estimation is of the order of E[F̂]-F₀. We could eliminate this bias if we could replace F̂ by the empirical distribution P̂, which is unbiased. Often this replacement is not possible (e.g. when μ(F) depends on the density of F), because smoothing is required to bring F̂ into the domain of μ(F), and smoothing induces some bias. However, we can construct a smooth estimate of the smoothing effect and use it to partially correct for smoothing bias.

To describe this correction, suppose that F̂ = CP̂ for some transformation C with domain that includes the empirical distribution P̂. Often C will be a smoothing transformation, such as convolution for kernel estimators, that depends on the sample size and gets close to the identity I as the sample size grows. Then for large samples CF̂ - F̂ should be an estimate of the smoothing effect CP̂ - P̂. Subtracting CF̂ - F̂ from F̂ gives

(3.1)    F̃ = F̂ - (CF̂ - F̂) = 2F̂ - CF̂ = 2CP̂ - C²P̂ = (2C-C²)P̂,

where C² is the composition of C with itself. This is a bootstrap correction for smoothing, in the sense that P̂ is replaced by F̂ in the smoothing effect CP̂ - P̂ to form the correction CF̂ - F̂.

In this Section we will consider an estimator obtained by plugging the corrected nonparametric estimator into the functional, giving μ̃ = μ(F̃). It will be shown that this estimator may be √n-consistent without undersmoothing. This approach, of bootstrap correcting the distribution estimator and plugging it in, allows the same nonparametric estimator to attain the optimal nonparametric convergence rates and be plugged into functionals that attain the optimal 1/√n rate. To achieve this simultaneous optimality, the influence function will need to satisfy certain conditions that are detailed below.
To understand the effect that the bootstrap correction has on the functional estimator, consider an expansion analogous to equation (2.4) with F̃ replacing F̂:

(3.2)    √n(μ̃ - μ₀) = Σ_{i=1}^n ψ(z_i)/√n + B_n + R_n,    B_n = √n∫δ(z)(F̃-P̂)(dz),    R_n = √nR(F̃-F₀,F₀),

with E[B_n] being the first order bias term. Suppose that C is linear, so that

    E[F̃] = E[(2C-C²)P̂] = (2C-C²)F₀.

Also, suppose that there is a transformation C* of δ(z) such that ∫δ(z)CF(dz) = ∫C*δ(z)F(dz) for F in the domain of C. With more structure C* may be interpreted as the adjoint of C, as in examples given below. Then if the order of integration can be interchanged,

(3.3)    E[B_n] = √n∫δ(z)[(2C-C²-I)F₀](dz) = -√n∫δ(z)[(I-C)²F₀](dz)
               = -√n∫[(I-C*)δ](z)[(I-C)F₀](dz).

Here (I-C)F₀ represents smoothing bias from the transformation C, which will be small in large samples when C is close to I. The term (I-C*)δ is an analogous term that should also be close to zero in large samples. Comparing equation (3.3) with E[B_n] = -√n∫δ(z)[(I-C)F₀](dz) from equation (2.4), we see that using the bootstrap-corrected estimator F̃ leads to the replacement of δ by (I-C*)δ in the integral, reducing the first order bias when (I-C*)δ is close to zero. This is the bias complementarity effect referred to above, where the bias in F̂ has been complemented by the influence function remainder (I-C*)δ.
Kernel estimators are an important example, where F̂ has density

    f̂(z) = ∫K(z,u)P̂(du),    K(z,u) = h^{-r}k((z-u)/h).

This leads to a transformation C where CF has density ∫K(z,u)f(u)du for F with density f. Plugging in F̂ in the formula for CF, the bootstrap corrected estimator is F̃ = 2F̂ - CF̂. To describe F̃, let k̄(u) = 2k(u) - ∫k(u-t)k(t)dt be the "twicing" kernel associated with k(u). Then the density of F̃ will be

(3.4)    f̃(z) = 2f̂(z) - ∫K(z,u)f̂(u)du = 2f̂(z) - ∫[∫K(z,u)K(u,t)du]P̂(dt) = ∫K̄(z,u)P̂(du),
         K̄(z,u) = 2K(z,u) - ∫K(z,t)K(t,u)dt = h^{-r}k̄((z-u)/h).

That is, the smoothing correction gives a kernel estimator with a twicing kernel that is constructed from the original kernel.

We use twicing kernel estimators to illustrate the bias correction, although they may not be the best choice of nonparametric estimator. A twicing kernel has order that is twice that of the original kernel and is not everywhere positive. Consequently, the density estimator f̃(z) may not be everywhere positive, which may be an undesirable feature. Also, it is known that using higher order kernels may not improve mean-square error in most sample sizes of interest, e.g. see Marron and Wand (1992). This motivates a search for other bootstrap corrected estimators, as considered below.
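The twicing construction can be checked numerically for a Gaussian base kernel, where the convolution k∗k is the N(0,2) density in closed form (the Gaussian choice is an illustrative assumption; the paper's kernel is generic):

```python
import numpy as np

def k(u):
    """Base kernel: standard normal density (order 2)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def k_twice(u):
    """Twicing kernel 2k(u) - (k*k)(u); for Gaussian k, k*k is the N(0,2) density."""
    conv = np.exp(-0.25 * u ** 2) / np.sqrt(4.0 * np.pi)
    return 2.0 * k(u) - conv

u = np.linspace(-10.0, 10.0, 200001)
du = u[1] - u[0]
total = k_twice(u).sum() * du            # still integrates to one
m2 = (u ** 2 * k_twice(u)).sum() * du    # second moment vanishes: order doubles, 2 -> 4
neg_tail = k_twice(3.5)                  # negative in the tails
```

The vanishing second moment is the doubling of the kernel order mentioned above, and the negative tail value illustrates why f̃ need not be everywhere positive.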
To understand the bias formula in equation (3.3) for kernel estimators we need to find a transformation C* with ∫C*δ(z)F(dz) = ∫δ(z)CF(dz). Let δ̄(z) = ∫K(u,z)δ(u)du, so that for any F with density f,

    ∫δ(z)CF(dz) = ∫δ(z)[∫K(z,u)f(u)du]dz = ∫δ̄(z)f(z)dz = ∫C*δ(z)F(dz),

where C*δ(z) = δ̄(z). Here C* is the integral transform obtained by interchanging u and z in K, which is known to be the adjoint under certain conditions (Luenberger, 1969, p. 153). This transformation is a convolution, with

    δ̄(z) = ∫k(u)δ(z+hu)du.

Equation (3.3) now becomes

(3.5)    E[B_n] = -√n∫[δ(z)-δ̄(z)][f₀(z)-f̄(z)]dz,    f̄(z) = ∫K(z,u)f₀(u)du = E[f̂(z)].

The bias effect of the bootstrap correction is to replace the integrated convolution bias ∫δ(z)[f₀(z)-f̄(z)]dz with the integral of the product of convolution biases in equation (3.5). The magnitude of the bias reduction will depend on the convolution bias δ(z)-δ̄(z), which in turn depends on the smoothness of the influence function, as is well known from the kernel estimation literature. The following result makes these conditions precise.
Theorem 3.1: If Assumptions 1-3 are satisfied, δ(z) is continuously differentiable of order t on ℝ^r with bounded derivatives, and h = h_n is such that ln(n)/√(nh^{r+2d}) → 0 and √nh^{s+t} → 0, then equation (3.2) is satisfied with B_n = O_p(√nh^{s+t}) and

    √n(μ̃-μ₀) = Σ_{i=1}^n ψ(z_i)/√n + O_p(ln(n)/√(nh^{r+2d}) + √nh^{s+t} + h^t) + o_p(1).

The upper bound on B_n is smaller than the bound on B_n in Theorem 2.1, because √nh^s has been replaced by √nh^{s+t}. This replacement means that sufficient conditions for √n-consistency of μ̃ = μ(F̃) are different than for the original plug-in estimator μ̂ = μ(F̂). If the bandwidth is chosen so the dominating remainder terms ln(n)/√(nh^{r+2d}) and √nh^{s+t} are asymptotically proportional, then μ(F̃) will be √n-consistent if

(3.6)    s + t > r + 2d.

This condition allows some tradeoff of smoothness of f₀(z) and δ(z) for attaining √n-consistency.
This estimator also attains √n-consistency without undersmoothing if the influence function is smooth enough. Consider the case where f₀(z) is only s times differentiable, so that the order of the pointwise bias in f̂ is h^s. Choose the bandwidth so that the squared bias order h^{2s} is proportional to the order (nh^{r+2d})^{-1} of the pointwise variance. Then the conditions for √n-consistency are satisfied if

(3.7)    t > r/2 + d.

The smoothing bias correction depends on smoothness of the influence function in z, in contrast to the bias correction of Section 2, which depends on smoothness of the influence function in F. The bias correction of Section 2 may still give √n-consistency even when δ(z) is discontinuous in z. For example, the functional μ(F) = ∫1(a≤z≤b)f(z)²dz has δ(z) = 1(a≤z≤b)2f₀(z), which is discontinuous when the density is positive at a or b, and so does not satisfy the conditions of Theorem 3.1. Nevertheless, it does satisfy the conditions of Theorem 2.2, so that the bias corrected estimator of (2.5) could attain √n-consistency without undersmoothing.

Consider a plug-in estimator with an ordinary kernel of order s > r, where the bandwidth is chosen to give √n-consistency. The plug-in estimator with a twicing kernel of the same order will also be √n-consistent for the same bandwidth choice, if t = s > r/2. That is, √n-consistency with a twicing kernel only requires f₀(z) to be half as smooth, if δ(z) is also smooth enough. Also, the same bandwidth no longer has to involve undersmoothing, because only half as many derivatives of the density are needed to exist, and the optimal bandwidth for estimating a density with fewer derivatives will be smaller.
There are other nonparametric estimators that have the same functional bias reduction property as twicing kernel estimators. A particularly important class are those where the smoothing transformation is "built into" F̂, i.e. CF̂ = F̂. Here F̂ is idempotent, with F̃ = 2F̂ - F̂ = F̂, so the bootstrap smoothing correction may not be needed in this case. Our results then indicate that undersmoothing is not needed.

One idempotent nonparametric estimator is an orthogonal series density estimator. Let φ_j(u), (j = 1, 2, ...) be a sequence of functions that are orthonormal with respect to Lebesgue measure on ℝ^r, i.e. ∫φ_i(u)φ_j(u)du = 1 if i = j and equal to zero otherwise. Let p^J(u) = (φ_1(u),...,φ_J(u))' and â = Σ_{i=1}^n p^J(z_i)/n. Then an orthogonal series estimator is

    f̂(z) = p^J(z)'â = ∫K(z,u)P̂(du),    K(z,u) = p^J(z)'p^J(u).

Like the kernel estimator, f̂ has the linear form ∫K(z,u)P̂(du), but K(z,u) is now an inner product of orthonormal functions rather than a kernel. To see the effect of the bootstrap correction, plug F̂ in for P̂ in the density formula to obtain

    ∫K(z,u)f̂(u)du = ∫[∫K(z,u)K(u,t)du]P̂(dt) = ∫p^J(z)'[∫p^J(u)p^J(u)'du]p^J(t)P̂(dt) = ∫K(z,t)P̂(dt) = f̂(z),

so that the transformation C is idempotent.
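The idempotence ∫K(z,u)K(u,t)du = K(z,t) can be verified numerically. This sketch uses a cosine basis on [0,1], an illustrative orthonormal system rather than one singled out by the paper:

```python
import numpy as np

J = 5                                        # number of basis functions (illustrative)
N = 2000
u = (np.arange(N) + 0.5) / N                 # midpoint grid on [0,1]
du = 1.0 / N

def basis(x):
    """Orthonormal cosine basis on [0,1] with respect to Lebesgue measure."""
    cols = [np.ones_like(x)] + [np.sqrt(2.0) * np.cos(j * np.pi * x) for j in range(1, J)]
    return np.stack(cols, axis=-1)

P = basis(u)                                 # (N, J) matrix of basis values
K = P @ P.T                                  # K(u_i, u_j) = p(u_i)'p(u_j)
K2 = (K @ K) * du                            # discretized int K(z,u)K(u,t) du
err = np.abs(K2 - K).max()                   # K2 == K: applying C twice changes nothing
```

Since C² = C, the correction 2C - C² reduces to C itself: the bootstrap correction is built in.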
Note that f̄(z) = E[f̂(z)] = p^J(z)'∫f₀(u)p^J(u)du is the minimum integrated squared error (ISE) approximation to f₀(z), and δ̄(z) = ∫δ(u)K(u,z)du = p^J(z)'∫δ(u)p^J(u)du is the minimum ISE approximation to δ(z). Then by ∫δ̄(z)[f₀(z)-f̄(z)]dz = 0,

    E[B_n] = √n∫δ(z)[f̄(z)-f₀(z)]dz = -√n∫[δ(z)-δ̄(z)][f₀(z)-f̄(z)]dz.

Here the bias term that appears to be first order is actually second order, a result of the bias correction being built into f̂. Consequently, the bias of the functional estimator can shrink faster than the pointwise bias, removing the need to undersmooth. This example can be made precise by specifying an ISE rate of approximation for
f₀(z) and δ(z), leading to the following result:

Theorem 3.2: If Assumption 1 is satisfied, δ(z) is bounded, {∫[f̄(z)-f₀(z)]²dz}^{1/2} = O(J^{-s/r}), {∫[δ(z)-δ̄(z)]²dz}^{1/2} = O(J^{-t/r}) → 0, √n‖F̂-F₀‖² = o_p(1), and √nJ^{-s/r-t/r} → 0, then equation (2.4) is satisfied with B_n = Σ_{i=1}^n {δ̄(z_i)-δ(z_i)}/√n = o_p(1) and

    √n(μ̂-μ₀) = Σ_{i=1}^n ψ(z_i)/√n + o_p(1).

The hypotheses of this result are not very primitive, but are consistent with the known rate J^{-s/r} for approximation by orthogonal polynomials, where s is the number of continuous derivatives of f₀(z) that exist and t is that of δ(z). The conditions only require that √nJ^{-s/r-t/r} → 0, so if t is large enough undersmoothing will not be required for √n-consistency. Specifically, if n^{1/4}J^{-s/r} → 0, so that the mean-square bias of f̂ converges slightly faster than n^{-1/4}, then J^{-t/r} converging faster than n^{-1/4} will suffice for √nJ^{-s/r-t/r} → 0. Therefore the bias could be allowed to shrink at the same rate as the standard deviation without affecting √n-consistency, which will not require undersmoothing. We can obtain more primitive conditions in the case of specific functionals. For example, the average density estimator μ(F̂) = ∫f̂(z)²dz will be √n-consistent if √n(J/n + J^{-2s/r}) → 0.

So far we have only considered the case where the smoothing transformation C is linear. When C is nonlinear analogous results should also hold: a bootstrap smoothing correction may remove the need for undersmoothing when the influence function has enough derivatives, and this correction will be built into estimators that are idempotent transformations of the empirical distribution. These results could be shown by including an expansion of C in the analysis. However, this would greatly complicate the analysis, so we content ourselves with pointing out some existing examples of nonlinear, idempotent transformations where it is known that undersmoothing is not needed.

Newey (1994) showed that undersmoothing is not needed for functionals of a series estimator of a conditional expectation. This occurs because series estimators of conditional expectations are idempotent, leading to a corresponding idempotent distribution estimator. For brevity, we omit details. Shen (1997) has also shown that undersmoothing may not be needed for sieve density estimators when the influence function is smooth enough. This also corresponds to an idempotent transformation. We can describe a sieve estimator as the solution to
    f̂ = argmax_{f∈B} Σ_{i=1}^n ln f(z_i)/n,    i.e.    f̂ = argmax_{f∈B} ∫ln f(z)P̂(dz),

where B is some restricted class of densities. Sieve estimators are of this form, where B is some parametric family that also imposes other restrictions, such as boundedness of higher order derivatives. It follows from the information inequality that if we replace P̂ by F̂ we obtain f̂ again, i.e.

    f̂ = argmax_{f∈B} ∫[ln f(z)]f̂(z)dz.

Thus, the transformation that gives F̂ is idempotent, so the smoothing correction is built into a sieve estimator.
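A minimal numerical sketch of the information-inequality argument, with the Gaussian location family standing in for B (an illustrative choice of restricted class, not the paper's):

```python
import numpy as np

grid = np.linspace(-10.0, 10.0, 20001)
dg = grid[1] - grid[0]

def normal_pdf(z, m):
    return np.exp(-0.5 * (z - m) ** 2) / np.sqrt(2.0 * np.pi)

m_hat = 0.7                                  # mean of the fitted density fhat (illustrative)
f_hat = normal_pdf(grid, m_hat)

# maximize int [ln f_m(z)] fhat(z) dz over the family B = {N(m, 1)}
ms = np.linspace(-2.0, 2.0, 801)
obj = [(np.log(normal_pdf(grid, m)) * f_hat).sum() * dg for m in ms]
m_star = ms[int(np.argmax(obj))]
# m_star equals m_hat: replacing the empirical distribution by Fhat returns fhat itself
```

Maximizing the f̂-weighted log-likelihood over B recovers f̂, which is the idempotence used above.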
Higher order bootstrap smoothing corrections could also be carried out. Consider replacing F̃ by F̃_L = [I-(I-C)^L]P̂, where L is a positive integer. We have F̃_1 = F̂ and F̃_2 = F̃, while any integer L > 2 will correspond to higher order bootstrap corrections. Assuming C is linear as before, and letting F̃_L replace F̃ in equation (3.2), the estimator μ(F̃_L) will have an expansion like equation (3.2), and the first-order bias term will be

    E[B_n] = -√n∫[(I-C*)^{L-1}δ](z)[(I-C)F₀](dz).

This represents a higher-order bias complementarity, where some of the smoothing bias is shifted from F̂ to δ. Of course, if C is idempotent these higher-order bootstrap corrections are built into the estimator, i.e. F̃_L = F̂.

We also note that it is possible to bootstrap correct the functional, forming an estimator as

    μ̃ = μ(F̂) - [μ(CF̂)-μ(F̂)] = 2μ(F̂) - μ(CF̂).

If the functional were linear then μ̃ = μ(F̃), i.e. the estimator is the same whether we bootstrap correct the functional or the density. In the general nonlinear case the first-order linear terms in the expansion of equation (2.3) would be the same for both estimators, so that μ̃ and μ(F̃) should have the same first-order bias.
Linear Kernel Averages
4.
Many important estimators are averages data observations.
do so
Examples include the well known average density estimator
and the weighted average derivative estimator of Powell, Stock, and Stoker
X|._,f(x.)/n (1989).
of functions of nonparametric estimators and
The bias corrected estimation results can be extended to cover this case, and we Section and the following one.
in this
Consider a parameter of interest
= E[g(2,FQ)] = Jg(z,FQ)FQ(dz),
^Xq
where F.
g(z,F)
is
some known function, that may have a restricted domain as a function of
One way to estimate
fi
is
to plug a nonparametric estimator
F
into
integrate over the empirical distribution to obtain
(4.1)
11
= J'g(z,F)P(dz) = Xj^^g(Zj,F)/n.
One could also use
F
in place of
P,
although the estimator
computationally simpler.
- 24 -
fi
is
often
g(z,F)
and
To understand how a bias correction should be constructed for this estimator, let

(4.2)  μ(F) = E[g(z,F)] = ∫g(z,F)F_0(dz).

Then

μ̂ = ∫g(z,F_0)P̂(dz) + {μ(F̂)-μ(F_0)} + Ŝ,  Ŝ = ∫[g(z,F̂)-g(z,F_0)](P̂-F_0)(dz).

Here Ŝ is a stochastic equicontinuity term that is second order, so to first order μ̂ is ∫g(z,F_0)P̂(dz) + μ(F̂)-μ(F_0). The term ∫g(z,F_0)P̂(dz) is unbiased for μ_0, but the expectation of μ(F̂)-μ(F_0) may depart from zero due to smoothing inherent in F̂ and to nonlinearity in F. Therefore, to bias correct μ̂ we need to bias correct for the functional μ(F). This correction can be constructed by applying the analysis of Section 2. Suppose that μ(F) = ∫g(z,F)F_0(dz) satisfies equation (2.3) and has influence function δ(z) = δ(z,F_0), let δ̂(z) = δ(z,F̂) denote an estimator of δ(z), and let ∫δ̂(z)(P̂-F̂)(dz) be the corresponding estimator of the correction term. Then the bias corrected estimator is

(4.3)  μ̃ = μ̂ + ∫δ̂(z)(P̂-F̂)(dz) = μ̂ + Σ_{i=1}^n δ̂(z_i)/n - ∫δ̂(z)F̂(dz).

This correction is the same as in Section 2 except that δ̂(z) estimates the influence function of ∫g(z,F)F_0(dz) as a functional of F.

It is also possible to form a bias corrected estimator by applying the analysis of Section 3. Plugging a bootstrap corrected nonparametric estimator F̄ into g(z,F) gives

(4.4)  μ̄ = ∫g(z,F̄)P̂(dz) = Σ_{i=1}^n g(z_i,F̄)/n.
It can be shown by results analogous to those of Section 2 and Section 3 that the need for undersmoothing can be removed by both approaches to bias correction. For brevity we omit this general analysis, which is a special case of Section 5. We focus here on the linear case, where g(z,F) is linear in F, where we can derive asymptotic mean-square error results. This allows us to quantify the variance effect of the bias reduction, and includes several interesting examples.
We consider the class of parameters

μ_0 = E[v ∂^λ T_0(x)],  T_0(x) = E[y|x]f_0(x),

where v and y are variables that are not elements of x and ∂^λ denotes a partial derivative of order |λ|. One example is the average density, where v = y = 1 and λ = 0, so that μ_0 = E[f_0(x)]. Another is a density weighted conditional covariance, where λ = 0, so that μ_0 = E[vE[y|x]f_0(x)]. A third example is the weighted average derivative, where y = 1 and λ is a unit vector, so that T_0(x) = f_0(x) and, by integration by parts,

μ_0 = E[v∂^λ T_0(x)] = E[E[v|x]∂^λ f_0(x)] = -E[f_0(x)∂^λ E[v|x]]/2,

which is proportional to the density weighted average derivative of Powell, Stock, and Stoker (1989).

The estimators are obtained by substituting a kernel estimator for ∂^λ{E[y|x]f(x)} and averaging over z_i. They have the form

(4.5)  μ̂ = Σ_{i=1}^n v_i[∂^λ Σ_{j=1}^n K_h(x_i-x_j)y_j/n]/n = Σ_{i=1}^n Σ_{j=1}^n v_i ∂^λ K_h(x_i-x_j)y_j/n².

These types of estimators have been considered by Hall and Marron (1987), Härdle and Tsybakov (1993), Powell and Stoker (1997), and others. Our contribution is to show how the bias complementarity affects the mean-square error of these estimators.
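In the average density case (v = y = 1, λ = 0) the double sum in equation (4.5) is a few lines of code. The following sketch, with r = 1 and arbitrary sample size, bandwidth, and seed, is illustrative rather than part of the paper.

```python
import numpy as np

def K(u):                          # standard normal kernel (r = 1)
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def kernel_average(x, v, y, h):
    """mu_hat = sum_{i,j} v_i K_h(x_i - x_j) y_j / n^2, the lambda = 0 case of (4.5)."""
    n = x.size
    Kmat = K((x[:, None] - x[None, :]) / h) / h    # K_h(x_i - x_j)
    return v @ Kmat @ y / n**2

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
ones = np.ones_like(x)
mu_hat = kernel_average(x, ones, ones, h=0.5)      # average density estimate
# population target is E[f0(x)] = 1/(2*sqrt(pi)) ~ 0.282; mu_hat carries smoothing bias
```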
The estimator μ̂ has precisely the form in equation (4.1). To describe this form, suppose that z = (x,w), where w includes y and v. The marginal density of x is estimated by the kernel estimator f̂(x) = Σ_{i=1}^n K_h(x-x_i)/n and, for any function a(w), E[a(w)|x] is estimated by Ê[a(w)|x] = Σ_{i=1}^n a(w_i)K_h(x-x_i)/[nf̂(x)]. Then ∂^λ{Ê[y|x]f̂(x)} = ∂^λ Σ_{j=1}^n K_h(x-x_j)y_j/n, so that μ̂ in equation (4.5) is equal to Σ_{i=1}^n v_i ∂^λ{Ê[y|x_i]f̂(x_i)}/n = ∫g(z,F̂)P̂(dz) for g(z,F) = v∂^λ{E_F[y|x]f(x)}.

It turns out that with a symmetric kernel and a certain δ̂(z), the bias corrected estimator of equation (4.3) is identical to the bootstrap corrected estimator based on the corresponding twicing kernel K̄(u) = 2K(u) - ∫K(t)K(u-t)dt. To describe this result, apply Newey (1994) to obtain the influence function formula

(4.7)  δ(z) = (-1)^{|λ|} y ∂^λ{E[v|x]f_0(x)} - μ_0,

and let δ̂(z) be the estimator obtained by replacing ∂^λ{E[v|x]f_0(x)} with the kernel estimator ∂^λ Σ_{j=1}^n K_h(x-x_j)v_j/n and μ_0 with μ̂. Then the bias corrected estimator of equation (4.3) satisfies

(4.8)  μ̃ = μ̄,

where

(4.9)  μ̄ = Σ_{i=1}^n Σ_{j=1}^n v_i ∂^λ K̄_h(x_i-x_j)y_j/n²

is the estimator of equation (4.5) with the twicing kernel K̄ replacing K.
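For the standard normal kernel the twicing kernel K̄(u) = 2K(u) - (K∗K)(u) has a closed form, since the convolution of two standard normal densities is the N(0,2) density. The check below (a sketch; the grid and tolerances are arbitrary choices) verifies numerically that K̄ integrates to one and has vanishing second moment but nonzero fourth moment, i.e. it is a kernel of order 4 when K has order 2.

```python
import numpy as np

def K(u):                                   # standard normal kernel, order 2
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def K_twice(u):                             # twicing kernel 2K - K*K; here K*K is the N(0,2) density
    return 2 * K(u) - np.exp(-u**2 / 4) / np.sqrt(4 * np.pi)

def trapz(y, x):                            # simple trapezoid rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

u = np.linspace(-12, 12, 240001)
m0 = trapz(K_twice(u), u)                   # zeroth moment: 2*1 - 1 = 1
m2 = trapz(u**2 * K_twice(u), u)            # second moment: 2*1 - 2 = 0
m4 = trapz(u**4 * K_twice(u), u)            # fourth moment: 2*3 - 12 = -6, nonzero
```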
The influence function of μ̄ (and of μ̂) is then

ψ(z) = va(x) + yℓ(x) - 2μ_0,  a(x) = ∂^λ{E[y|x]f_0(x)},  ℓ(x) = (-1)^{|λ|}∂^λ b(x),  b(x) = E[v|x]f_0(x).

To describe the MSE of μ̄, suppose that a(x) is s times and b(x) is t times continuously differentiable, let C_j = ∫K(u)u^j du denote the kernel moments, and let

B = Σ_{|j|=s}Σ_{|k|=t}C_jC_k∫[∂^j a(x)][∂^k b(x)]dx/(s!t!),

with Q the constant of the second-order variance term, which is proportional to ∫[∂^λ K̄(u)]²du; Q is given explicitly for the weighted average derivative example below.

Theorem 4.1: If Assumptions 2 and 5-7 are satisfied then as n → ∞ and h → 0,

(4.11)  MSE(μ̄) = n^{-1}Var[ψ(z)] + n^{-2}h^{-r-2|λ|}Q + h^{2s+2t}B² + o(n^{-1} + n^{-2}h^{-r-2|λ|} + h^{2s+2t}).
The first term in this expression is the usual asymptotic variance under √n-consistency, which will dominate if the other terms go to zero. The other two terms are variance and bias terms from kernel estimation. The estimator will be √n-consistent, with the usual asymptotic variance term dominating, if nh^{r+2|λ|} → ∞ and nh^{2(s+t)} → 0, meaning the pointwise variance of the kernel estimator goes to zero and the product of pointwise biases for â(x) and b̂(x) shrinks faster than 1/√n. If the bandwidth h is chosen to balance the variance and bias terms, so that n^{-2}h^{-r-2|λ|} and h^{2s+2t} are asymptotically proportional, μ̄ will be √n-consistent if

(4.12)  s + t ≥ r/2 + |λ|.

This condition is weaker than the requirement of Sections 2 and 3, which is made possible by the linearity of g(z,F) in F and the absence of the own observation term. Equation (4.12) allows equality rather than the strict inequality of equations (2.10) and (3.6), which is possible because uniform convergence rates for F̂ are not used here. However, s + t = r/2 + |λ| is a knife edge case where the variance remainder term in equation (4.11) is the same size as the leading term. In this case √n(μ̄ - μ_0) will not be asymptotically normal with variance Var(ψ(z)), so the usual functional asymptotic theory does not apply.
We can compare the MSE in Theorem 4.1 to the MSE for the estimator based on the original kernel to evaluate the effect of the bias correction. Let

B̄ = Σ_{|j|=s}C_j∫b(x)[∂^j a(x)]dx/s!.

Then by Powell and Stoker (1997), the MSE of the estimator μ̂ obtained from equation (4.9) by replacing K̄ with K is

(4.13)  MSE(μ̂) = n^{-1}Var[ψ(z)] + n^{-2}h^{-r-2|λ|}{∫[∂^λK(u)]²du/∫[∂^λK̄(u)]²du}Q + h^{2s}B̄² + o(n^{-1} + n^{-2}h^{-r-2|λ|} + h^{2s}).

The comparison of the MSEs of μ̂ and μ̄ is partly analogous to a comparison between the pointwise MSEs of the corresponding kernel estimators. The ratio of kernel variance terms for μ̄ and μ̂ is ∫[∂^λK̄(u)]²du/∫[∂^λK(u)]²du, which is exactly the same as the ratio of pointwise variance terms for estimation of f_0(x) at a point, with ∫[∂^λK̄(u)]²du known to be the larger. On the other hand, because of the bias complementarity, the bias term is not reduced in exactly the same way as for pointwise estimation. The h^{s+t}B bias term depends on the s order derivatives of a(x) and the t order derivatives of b(x), rather than on higher order derivatives of a(x) alone. This bias reduction is better in some ways than the pointwise bias reduction from a higher order kernel. In particular, it may require no additional derivatives to exist when smoothness of b(x) implies smoothness of a(x) (e.g. as for the average density). The asymptotic mean-square error (MSE) given in Theorem 4.1 when t = s and the kernel is of order s can be used to obtain an optimal bandwidth formula for estimation of μ_0.
Minimizing the sum of the second and third terms in equation (4.11) (with t = s) gives

(4.14)  h* = [Q(r+2|λ|)/(4sB²n²)]^{1/(4s+r+2|λ|)}.

This bandwidth is optimal, in the sense of minimizing the leading terms in the MSE of the estimator. It is interesting to note that although undersmoothing is not required for √n-consistency, undersmoothing is still optimal for μ̄: the optimal bandwidth converges faster to zero than a bandwidth that minimizes the asymptotic mean-square error of â(x). This feature seems to be specific to linear functional estimators with the own observation deleted. If the own observation were included then there would be an n^{-1}h^{-r} term in the MSE, which dominates the n^{-2}h^{-r-2|λ|} term, and will make the rate for the optimal bandwidth for functional estimation the same as for nonparametric estimation. Also, as discussed in Section 3, when the functional is nonlinear there is an additional remainder term R(F̄-F_0,F_0) which is of no smaller order than ||F̄-F_0||² = O_p(max{n^{-1}h^{-r}, h^{2s}}) under Assumption 1. This should make the MSE of a size where undersmoothing would not be optimal.

The optimal bandwidth formula in equation (4.14) is potentially useful for choosing the bandwidth in practice. The constants Q and B are unknown, but could be estimated by replacing the unknown nonparametric terms in their formulae by kernel estimates to obtain Q̂ and B̂, which could then be used in place of Q and B in the optimal bandwidth formula to form an estimate.
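The rule of equation (4.14) is easy to evaluate once Q and B (or estimates Q̂ and B̂) are in hand. The sketch below just codes the formula and illustrates that the implied rate n^{-2/(4s+r+2|λ|)} shrinks faster than the pointwise-MSE rate n^{-1/(2s+r)}; the numerical values of Q and B are placeholders, not estimates.

```python
def h_functional(Q, B, n, s, r, lam):
    """Equation (4.14): h* = [Q(r+2|lambda|)/(4 s B^2 n^2)]^{1/(4s+r+2|lambda|)}."""
    return (Q * (r + 2 * lam) / (4 * s * B**2 * n**2)) ** (1 / (4 * s + r + 2 * lam))

def h_pointwise(n, s, r):
    """Rate of a bandwidth minimizing the pointwise MSE of a(x): proportional to n^{-1/(2s+r)}."""
    return n ** (-1 / (2 * s + r))

# placeholder constants Q = B = 1; r = 1, second-order kernel twiced (s = t = 2), lambda = 0
for n in (100, 1000, 10000):
    print(n, h_functional(1.0, 1.0, n, 2, 1, 0), h_pointwise(n, 2, 1))
```

With s = 2, r = 1, λ = 0 the functional-optimal bandwidth shrinks at rate n^{-2/9}, faster than the pointwise-optimal n^{-1/5}, which is the undersmoothing-still-optimal phenomenon noted above.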
An interesting example is the density weighted average derivative estimator described above, where |λ| = 1 and y = 1. Here the influence function for μ̄ is

ψ(z) = ∂^λ f_0(x){v-E[v|x]} - f_0(x)∂^λ E[v|x] - 2μ_0,

as derived in Powell, Stock, and Stoker (1989). Suppose that f_0(x) is s+1 times differentiable and E[v|x] is t times differentiable, that the original kernel has order at least s, that all the following integrals exist, and let

Q = {∫[∂^λK̄(u)]²du}∫Var(v|x)f_0(x)²dx,
B = Σ_{|j|=s}Σ_{|k|=t}C_jC_k∫[∂^j∂^λ f_0(x)]∂^k{E[v|x]f_0(x)}dx/(s!t!).

Then, the conclusion of Theorem 4.1 implies that as n → ∞ and h → 0,

(4.15)  MSE(μ̄) = Var[ψ(z)]/n + n^{-2}h^{-r-2}Q + h^{2s+2t}B² + o(n^{-1} + n^{-2}h^{-r-2} + h^{2s+2t}).

In this example the optimal bandwidth with s = t will be

h* = [Q(r+2)/(4sB²n²)]^{1/(4s+r+2)}.
For example, with a normal kernel, s = t = 2.

The differentiability conditions can also be replaced by Hölder conditions.

Assumption 8: There are C_a(x), C_b(x), ζ_a, and ζ_b such that, for ||x̃-x|| small enough,

|a(x̃)-a(x)| ≤ C_a(x)||x̃-x||^{ζ_a},  |b(x̃)-b(x)| ≤ C_b(x)||x̃-x||^{ζ_b},  ∫C_a(x)C_b(x)dx < ∞.

Theorem 4.2: If Assumption 8 and the other conditions of Theorem 4.1 are satisfied then

MSE(μ̄) = Var[ψ(z)]/n + O(n^{-2}h^{-r-2|λ|}) + O(h^{2ζ_a+2ζ_b}).

If the bandwidth is chosen so that n^{-2}h^{-r-2|λ|} and h^{2ζ_a+2ζ_b} are asymptotically proportional then the estimator is √n-consistent if ζ_a + ζ_b ≥ r/2 + |λ|. This result is similar to the condition in equation (4.12) for the differentiable case, but only requires a Hölder condition rather than derivatives. The order of the bias is again the product of pointwise bias orders. This result shows that the bias complementarity available with the twicing kernel is not solely an artifact of the higher order property of the twicing kernel.
We can illustrate the results we have derived so far by comparing different average density estimators. If we regard μ(F) = ∫f(z)²dz as a nonlinear functional and derive a bias correction from the influence function estimate δ̂(z) = 2f̂(z), we obtain the bias corrected estimator of equation (2.6),

μ̂_1 = 2Σ_{i=1}^n f̂(z_i)/n - ∫f̂(z)²dz.

Plugging a bootstrap corrected density estimator f̄(z) into the functional gives

μ̂_2 = ∫f̄(z)²dz,

which will generally be different from μ̂_1. Alternatively, we may regard μ_0 = E[g(z,F_0)] as the expectation of the density, with g(z,F) = f(z), whose influence function can be estimated by δ̂(z) = f̂(z). The bias corrected estimator of equation (4.3) then becomes

μ̂_3 = Σ_{i=1}^n f̂(z_i)/n + ∫f̂(z)(P̂-F̂)(dz) = 2∫f̂(z)P̂(dz) - ∫f̂(z)²dz = μ̂_1.

Also, the bias corrected estimator of equation (4.4) will be μ̂_4 = Σ_{i=1}^n f̄(z_i)/n. It follows from equation (4.8) that μ̂_4 = μ̂_1 if f̂ and f̄ are kernel estimators where f̄ is based on the twicing kernel corresponding to f̂. Here we see that three of the bias corrected estimators are equal, with the different one being the nonlinear integral ∫f̄(z)²dz of a bias corrected nonparametric estimator.

The conclusion of Theorem 2.2 gives √n-consistency of μ̂_1 if f_0(z) is s times differentiable with s > r/2, the original kernel has order at least s, ln(n)/√n h^r → 0, and √n h^{2s} → 0. A further reduction in MSE of the bias corrected kernel estimator is possible by deleting the own observation to obtain

(4.16)  μ̂ = Σ_{i≠j} K̄_h(z_i-z_j)/[n(n-1)].
Applying Theorem 4.1, this estimator will be √n-consistent if the bandwidth is chosen as in equation (4.14) and s ≥ r/4. Furthermore, by Theorem 4.2, if a(x) = b(x) = f_0(z) satisfies Assumption 8 with ζ_a = ζ_b = ζ, the other conditions described above are satisfied, and the bandwidth is chosen so that n^{-2}h^{-r} is proportional to h^{4ζ}, then μ̂ will still be √n-consistent if (ζ_a+ζ_b)/2 = ζ ≥ r/4. For example, if r = 1 then μ̂ is √n-consistent for ζ ≥ 1/4. Here ζ = 1/4 is a knife edge case where the variance remainder term will be the same size as the leading term Var(ψ(z))/n. The condition ζ ≥ 1/4 was shown by Bickel and Ritov (1988) to be necessary for existence of a √n-consistent (regular) estimator. Thus, the twicing kernel estimator attains √n-consistency under minimal conditions, like the more complicated sample splitting estimator of Bickel and Ritov (1988).
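Equation (4.16) is simple to implement. The sketch below (sample size, bandwidth, and seed are arbitrary illustrative choices) computes the own-observation-deleted average density with the twicing kernel of the standard normal for r = 1.

```python
import numpy as np

def K(u):                               # standard normal kernel
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def K_twice(u):                         # twicing kernel 2K - K*K for the standard normal K
    return 2 * K(u) - np.exp(-u**2 / 4) / np.sqrt(4 * np.pi)

def avg_density_loo(z, h, kernel):
    """Equation (4.16): sum_{i != j} kernel_h(z_i - z_j) / [n(n-1)], kernel_h(u) = kernel(u/h)/h."""
    n = z.size
    M = kernel((z[:, None] - z[None, :]) / h) / h
    np.fill_diagonal(M, 0.0)            # delete the own-observation terms
    return M.sum() / (n * (n - 1))

rng = np.random.default_rng(1)
z = rng.standard_normal(1000)           # f0 = N(0,1), so mu0 = 1/(2*sqrt(pi)) ~ 0.282
est = avg_density_loo(z, h=0.4, kernel=K_twice)
```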
To illustrate the potential efficiency gains from using twicing kernels we have done some exact MSE comparisons for different estimators of the average density when the true density is a standard normal and a standard normal kernel is used to construct the estimator. Specifically, we have calculated the exact MSE for the estimator in equation (4.16) when the z_i are i.i.d. with N(0,I) distribution and K(u) is either the standard normal density or the twicing kernel based on the standard normal. The bandwidths have been chosen to minimize the asymptotic MSE for the respective estimators, as in equation (4.14) for the twicing kernel and the analogous equation for the standard normal kernel (with corresponding larger bias term). Table One gives the sample sizes at which the MSE for the twicing kernel estimator becomes smaller than the MSE of the normal kernel.

Table One: MSE Crossover Sample Size

  Dimension r:   1   2   3   4   5   6   7   8
  Sample size:  18  17  19  21  25  31  39  52

Figure One graphs the ratio of MSEs as a function of sample size for dimensions r = 1 up to r = 4. We restricted attention to the r ≤ 4 case because the MSE ratio would asymptote to zero for higher dimensions: for r > 4 the standard normal kernel will not be √n-consistent but the twicing kernel estimator will, up to dimension r = 8. These graphs show persistent MSE gains.

Figure Two presents graphs of the MSE as a function of the bandwidth for dimensions r = 1 and r = 3 and for sample sizes 50, 100, and 200. These graphs allow us to compare the sensitivity of the MSE to the choice of bandwidth for the standard normal and twicing kernels. Although the MSE of the twicing kernel can turn sharply upward at low bandwidths, we find that it is flat, and close to its minimum, over a wide range of bandwidths. In this sense the MSE for the twicing kernel seems less sensitive to the choice of bandwidth than the original kernel.
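Table One is based on exact MSE calculations. A crude Monte Carlo analogue of the r = 1 comparison, with a fixed non-optimal bandwidth and other purely illustrative settings, looks like this:

```python
import numpy as np

def K(u):
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def K_twice(u):                         # twicing kernel of the standard normal
    return 2 * K(u) - np.exp(-u**2 / 4) / np.sqrt(4 * np.pi)

def avg_density_loo(z, h, kernel):
    """Own-observation-deleted average density, equation (4.16)."""
    n = z.size
    M = kernel((z[:, None] - z[None, :]) / h) / h
    np.fill_diagonal(M, 0.0)
    return M.sum() / (n * (n - 1))

mu0 = 1 / (2 * np.sqrt(np.pi))          # E[f0(x)] for f0 = N(0,1), r = 1
rng = np.random.default_rng(2)
n, reps, h = 100, 500, 0.5
mse = {"normal": 0.0, "twicing": 0.0}
for _ in range(reps):
    z = rng.standard_normal(n)
    mse["normal"] += (avg_density_loo(z, h, K) - mu0) ** 2 / reps
    mse["twicing"] += (avg_density_loo(z, h, K_twice) - mu0) ** 2 / reps
```

At n = 100, well past the r = 1 crossover in Table One, the twicing kernel version should show the smaller simulated MSE, driven by its much smaller smoothing bias.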
5. Semiparametric M-estimation

Estimators that solve an estimating equation with a nonparametric component have many interesting applications and include as special cases all the ones we have considered so far. This class also includes profile likelihood estimators and many others, e.g. see Bickel et. al. (1990). We refer to this class of estimators as semiparametric m-estimators. In this Section we develop bias-corrected versions of these estimators.
To describe a semiparametric m-estimator, let F̂ and P̂ be as previously discussed, and let m(z,β,F) denote a q × 1 vector of functions, where β is a q × 1 parameter vector. The estimator β̂ solves

(5.1)  Σ_{i=1}^n m(z_i,β̂,F̂)/n = 0.

This includes as special cases μ̂ = μ(F̂), where m(z,β,F) = μ(F) - β, and the estimator Σ_{i=1}^n g(z_i,F̂)/n, where m(z,β,F) = g(z,F) - β.

To obtain bias corrections it is useful to begin by expanding the moment equation around the true value of the parameter. Let

m(z,F) = m(z,β_0,F),  M̂ = n^{-1}Σ_{i=1}^n ∂m(z_i,β̄,F̂)/∂β,  M = E[∂m(z,β_0,F_0)/∂β],

where β̄ is a mean value that lies on a line joining β̂ and β_0 and actually differs from element to element of m. Then expanding and solving for β̂ gives

(5.2)  √n(β̂-β_0) = -M̂^{-1}Σ_{i=1}^n m(z_i,F̂)/√n.

Consistency of M̂ for M and nonsingularity of M̂ will generally be an implication of consistency of β̂ and F̂ and a uniform law of large numbers. Asymptotic normality and √n-consistency of β̂ will then follow from asymptotic normality of Σ_{i=1}^n m(z_i,F̂)/√n, which is where undersmoothing and bias corrections may be important.

Bias corrections for β̂ can be obtained by applying the analysis of Section 4 to g(z,F) = m(z,F) and accounting for the Jacobian term. The first approach is by an influence function correction like that of equation (4.3). Let δ(z) be the influence function of μ(F) = ∫m(z,F)F_0(dz), and let δ̂(z) be an estimator of this influence function. Then a bias corrected estimator is given by

(5.3)  β̃ = β̂ - M̂^{-1}[Σ_{i=1}^n δ̂(z_i)/n - ∫δ̂(z)F̂(dz)] = β̂ - M̂^{-1}∫δ̂(z)(P̂-F̂)(dz).

This estimator has a form like that in equation (4.3), except that the Jacobian estimator M̂ is included to account for the presence of β in equation (5.1).
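To make the correction of equation (5.3) concrete in the simplest scalar case, take m(z,β,F) = f(x) - β (the average density), so that M̂ = -1 and δ̂(z) = f̂(z); the correction then reproduces the bias corrected average density estimator 2Σ_i f̂(x_i)/n - ∫f̂(z)²dz of Section 4. The sketch below spells this out; the grid integration, sample size, bandwidth, and seed are illustrative assumptions.

```python
import numpy as np

def K(u):                                   # standard normal kernel
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def trapz(y, x):                            # simple trapezoid rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

rng = np.random.default_rng(3)
x = rng.standard_normal(500)
h = 0.4

# m(z, beta, F) = f(x) - beta  =>  beta_hat = mean_i f_hat(x_i)
f_hat_at_data = (K((x[:, None] - x[None, :]) / h) / h).mean(axis=1)
beta_hat = f_hat_at_data.mean()

# correction (5.3) with M_hat = -1 and delta_hat(z) = f_hat(z):
# beta_tilde = beta_hat + [mean_i f_hat(x_i) - int f_hat(z)^2 dz]
z = np.linspace(-8.0, 8.0, 4001)
f_hat_grid = K((z[:, None] - x[None, :]) / h).mean(axis=1) / h
beta_tilde = beta_hat + (f_hat_at_data.mean() - trapz(f_hat_grid**2, z))
# equals 2*beta_hat - int f_hat(z)^2 dz, the bias corrected average density
```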
To see how the correction affects the estimator, let ψ(z) = m(z,F_0) + δ(z) - E[δ(z)]. Then

(5.4)  √n(β̃-β_0) = -M^{-1}[Σ_{i=1}^n ψ(z_i)/√n + √n(R_{1n} + R_{2n} + R_{3n})] + R_{4n},

where

R_{1n} = ∫[m(z,F̂)-m(z,F_0)](P̂-F_0)(dz),
R_{2n} = ∫[m(z,F̂)-m(z,F_0)]F_0(dz) - ∫δ(z)(F̂-F_0)(dz),
R_{3n} = ∫[δ̂(z)-δ(z)](P̂-F̂)(dz),
R_{4n} = √n(M̂^{-1}-M^{-1})∫δ̂(z)(P̂-F̂)(dz).

The remainder terms in equation (5.4) are all second-order, so that √n-consistency of β̃ should not require undersmoothing, although precise regularity conditions are difficult to specify at the level of generality considered here.
One important property of semiparametric estimators follows immediately from equation (5.4). Suppose that δ(z) = 0, meaning that estimation of F does not affect the asymptotic distribution of β̂. Then δ̂(z) = 0 is consistent, and the bias corrected estimator would be the original estimator. Consequently, the original estimator should not need undersmoothing. Thus, we find that if the presence of F̂ does not affect the large sample distribution of β̂, no undersmoothing will be needed.
There are many interesting semiparametric estimators where estimation of F does not affect the asymptotic variance, and so undersmoothing may not be needed. As shown in Newey (1994), if m(z,β,F) is the gradient of a function q(z,β,F) and F̂ is an estimator of the "profile" distribution that maximizes E[q(z,β,F)], then estimation of F does not affect the asymptotic variance. Hence, undersmoothing may not be needed for any of these estimators. Many known semiparametric estimators are special cases of this result, including Robinson (1988), Chen and Shiau (1991), and Ichimura (1993). Also, if m(z,β,F) is an efficient score for β in a semiparametric model and F̂ is an estimator that imposes the restrictions implied by the model, then the limiting distribution of β̂ is in many examples not affected by estimation of F when the model is correct. Hence, undersmoothing may not be needed when m(z,β,F) is an efficient score and the model is true, as has been demonstrated in the literature.
The second approach to bias correction is to use a bootstrap corrected estimator F̄ in the formation of the estimator of β, choosing β̄ as the solution to

(5.5)  Σ_{i=1}^n m(z_i,β̄,F̄)/n = 0.

This estimator should not need undersmoothing because F̄ will replace F̂ in the average Σ_{i=1}^n m(z_i,F̂)/n in the expansion of equation (5.2). As discussed in Section 3, any idempotent transformation of the empirical distribution, such as a sieve estimator or a series estimator of a conditional expectation, has this correction built in, so undersmoothing may not be required for √n-consistency of β̂.

We could derive precise results on bias-corrected m-estimation for both the estimators of equations (5.3) and (5.5). However, because the ideas here are basically the same as in Sections 2 and 3, we choose to be brief, and consider only the estimator of equation (5.5), focusing on conditions for √n-consistency when F̄ is a twicing kernel estimator.
Newey and McFadden (1994) have already given general results on √n-consistency of semiparametric kernel estimators. We build on their results, deriving a corresponding result for twicing kernels, showing that undersmoothing is not needed. We adopt their specification for m(z,β,F), where m depends on F only through γ(x) = E[w|x]f(x) and w is a vector of random variables that are not elements of x. A twicing kernel estimator of the true function γ_0(x) = E[w|x]f_0(x) would be

γ̄(x) = Σ_{i=1}^n K̄_h(x-x_i)w_i/n.

Letting m(z,β,γ) be a q × 1 vector of functions that depends on β and the function γ(·), a kernel semiparametric bootstrap corrected m-estimator would be β̄ solving

0 = Σ_{i=1}^n m(z_i,β̄,γ̄)/n.
To specify regularity conditions we modify the norm ||·|| to apply to γ. Let X denote a compact set, and let

||γ|| = max_{|λ|≤d} sup_{x∈X} ||∂^λγ(x)||.

This is a Sobolev norm, like the one used for the kernel results of Section 2.

Assumption 9: m(z,γ) = m(z,β_0,γ) is linear in γ, and there is b(z) with ||m(z,γ)|| ≤ b(z)||γ|| and E[b(z)] < ∞. Also, there is ν(x) such that, for all γ with ||γ-γ_0|| small enough,

E[m(z,γ) - m(z,γ_0)] = ∫ν(x)'[γ(x)-γ_0(x)]dx.

Assumption 10: γ_0(x) is continuously differentiable to order s with bounded derivatives, ∫sup_{||Δ||≤ε}||∂^λγ_0(x+Δ)||dx < ∞ for all |λ| = s and some ε > 0, and ν(x) is continuously differentiable to order t with bounded derivatives.

Also, let ρ_n = (ln(n)/nh^{r+2d})^{1/2} + h^s, the uniform convergence rate for ||γ̄-γ_0||. We first give a result that bounds the remainder terms in the expansion for the average Σ_{i=1}^n m(z_i,γ̄)/√n.
Theorem 5.1: If Assumptions 9 and 10 are satisfied then, for ψ(z) = m(z,γ_0) + ν(x)'w - E[ν(x)'w],

Σ_{i=1}^n m(z_i,γ̄)/√n = Σ_{i=1}^n ψ(z_i)/√n + B_n + R_n,  |B_n| ≤ C(√n h^{s+t} + h^s),  R_n = O_p(√n ρ_n²).

Furthermore, if ν(x) = 0 then this result also holds with K(u) replacing K̄(u).
To show a corresponding √n-consistency result for the semiparametric estimator we need to make assumptions that ensure convergence of the Jacobian term M̂. The next condition suffices.

Assumption 11: m(z,β,γ) is continuously differentiable in β for all β in a neighborhood of β_0 and all γ with ||γ-γ_0|| small enough, and for some c > 0 and b(z) with finite second moments,

||∂m(z,β,γ)/∂β - ∂m(z,β_0,γ_0)/∂β|| ≤ b(z)(||β-β_0||^c + ||γ-γ_0||^c).

Also, M = E[∂m(z,β_0,γ_0)/∂β] exists and is nonsingular, and ψ(z) has mean zero and finite second moments.

Theorem 5.2: If Assumptions 2 and 9-11 are satisfied, √n ρ_n² → 0, and √n h^{s+t} → 0, then

√n(β̄-β_0) → N(0, M^{-1}Var(ψ(z))M^{-1}').

Furthermore, if ν(x) = 0 then the same result holds with K(u) replacing K̄(u).

When ν(x) is nonzero the conditions for √n-consistency of this estimator are exactly analogous to those discussed following Theorem 3.1, and will be satisfied if s + t > r/2 + d. The case where ν(x) = 0 corresponds to estimation of γ having no effect on the asymptotic variance of β̂. As previously discussed, no bias correction is needed in this case: if ν(x) = 0 and the kernel is the original (non-twicing) one then √n-consistency follows from Theorem 5.2. In particular, undersmoothing will not be needed.
6. Conclusion

In this paper we have shown that it is often possible to use a nonparametric estimator whose degree of smoothing is optimal (MSE minimizing) for estimation of the function to construct a √n-consistent estimator of a functional. One approach involves adding a bias correction to the estimator where the nonparametric estimate is plugged in. Another involves a smoothing correction to the nonparametric estimator. An important class of nonparametric estimators, those that are idempotent transformations of the empirical distribution, have this smoothing correction built in, so that undersmoothing is not needed for √n-consistency; this class includes orthogonal series density estimators, series estimators of conditional expectations, and sieve estimators.

The influence function plays a key role in achieving √n-consistency without undersmoothing. For the additive correction to a plug-in estimator it must be possible to construct a nonparametric estimator of the influence function that converges sufficiently fast. For the smoothing adjustment to the nonparametric estimator the influence function must be smooth as a function of the data. These properties may or may not be intrinsic to the functional. A topic of future research would be to seek to identify the class of functionals for which √n-consistent estimation without undersmoothing is possible.

We have discussed bias correction and undersmoothing for an increasingly general class of estimators, beginning with functionals of a density and ending with semiparametric m-estimators. We found that for semiparametric m-estimators undersmoothing is not needed when nonparametric estimation does not affect the asymptotic variance of the estimator. Regularity conditions were given for "twicing" kernel estimators. We showed that this class of kernel estimators has a special property, being the outcome of a smoothing correction applied to the original kernel, that removes the necessity of undersmoothing for these estimators. Our numerical results show MSE gains for the twicing kernel at small sample sizes and less sensitivity to bandwidth choice in estimation of the average density, indicating some potential for use in practice.
Appendix: Proofs of Theorems

Throughout the Appendix, C will represent a generic positive constant that may be different in different uses.

Proof of Theorem 2.1: By Lemma 8.10 of Newey and McFadden (1994), ||F̂-F_0|| = O_p((ln(n)/nh^r)^{1/2} + h^s). Therefore,

R_n = O(√n||F̂-F_0||²) = O_p(ln(n)/√n h^r + √n h^{2s}).

Also, B_n = Σ_{i=1}^n e(z_i,h)/√n in equation (2.4), for e(z,h) = ∫K(u)[δ(z+hu)-δ(z)]du. For U the bounded support of K(u) and h small enough,

sup_u|K(u)[δ(z+hu)-δ(z)]| ≤ 1(u∈U)sup_u|K(u)|·2sup_z|δ(z)| = b(z,u),

which is bounded and has finite integral over the compact set U. By continuity of δ(z) with probability one, δ(z+hu)-δ(z) → 0 as h → 0, so by the dominated convergence theorem e(z,h) → 0 for almost all z. Also, |e(z,h)| ≤ ∫b(z,u)du ≤ C, so by the dominated convergence theorem Var(B_n) = Var(e(z,h)) ≤ E[e(z,h)²] → 0 as h → 0. Also, by the usual mean-value expansion,

(A.1)  ∫K(u)[f_0(z-hu)-f_0(z)]du = ∫K(u){Σ_{j=1}^{s-1}((-h)^j/j!)Σ_{|λ|=j}u^λ∂^λf_0(z)}du + ((-h)^s/s!)Σ_{|λ|=s}∫K(u)u^λ∂^λf_0(z-h̃u)du
       = (-h)^s Σ_{|λ|=s}∫K(u)u^λ∂^λf_0(z-h̃u)du/s!,

where |h̃| ≤ |h| and the dependence of h̃ on z and u is suppressed for notational convenience; the j < s terms vanish because K has order s. Then, for small enough h, changing variables,

|E[e(z,h)]| = |∫∫K(u)[δ(z+hu)-δ(z)]du f_0(z)dz| = |∫δ(z){∫K(u)[f_0(z-hu)-f_0(z)]du}dz|
            ≤ Ch^s Σ_{|λ|=s}[∫|K(u)u^λ|du]∫sup_{||Δ||≤ε}|∂^λf_0(z+Δ)|dz = O(h^s).

Then by the Markov inequality, B_n = O_p(√n h^s) + o_p(1), giving the first conclusion. Furthermore, under the stated bandwidth conditions both B_n → 0 and R_n → 0, giving the second conclusion. QED.
The following Lemma is useful in the proofs to follow. Let w denote a random variable, g(z) = E[w|z], and γ_0(z) = g(z)f_0(z).

Lemma A.1: If Assumption 2 is satisfied and m(γ) is linear, with |m(γ)| ≤ C||γ|| for all γ, then for

γ_h(z) = ∫_G γ_0(u)K_h(z-u)du,

where G is a compact set containing the support Z of z, it follows that m(γ_h) = E[m(wK_h(·-z))].

Proof: By iterated expectations and linearity of m,

E[m(wK_h(·-z))] = E[wm(K_h(·-z))] = E[g(z)m(K_h(·-z))] = E[m(g(z)K_h(·-z))] = ∫m(γ_0(u)K_h(·-u))du,

where the last equality uses linearity to bring the scalar f_0(u) inside m. Since Z is compact and K(u) has bounded support, there is a compact G such that γ_0(u)K_h(z-u) = 0 for all z ∈ Z and u ∉ G. Also, each derivative of γ_0(u)K_h(z-u) up to order d is bounded and continuous in u, so that m(γ_0(u)K_h(·-u)) is bounded and continuous in u. Let G_n be a sequence of measures with finite support that converge in distribution to the uniform measure on G. By linearity of m,

m(∫γ_0(u)K_h(·-u)G_n(du)) = ∫m(γ_0(u)K_h(·-u))G_n(du).

By continuity of γ_0(u)K_h(z-u) and its derivatives in u, ||∫γ_0(u)K_h(·-u)G_n(du) - ∫_G γ_0(u)K_h(·-u)du|| → 0, so by |m(γ)| ≤ C||γ|| and the triangle inequality it follows that

m(∫_G γ_0(u)K_h(·-u)du) = ∫_G m(γ_0(u)K_h(·-u))du.

Hence m(γ_h) = ∫_G m(γ_0(u)K_h(·-u))du = E[m(wK_h(·-z))]. QED.

Proof of Theorem 2.2: By the triangle inequality,

(A.2)  √n|∫[δ̂(z)-δ(z)][f̂(z)-f_0(z)]dz| ≤ √n∫|δ̂(z)-δ(z)||f̂(z)-f_0(z)|dz
      ≤ √n∫b(z)|f̂(z)-f_0(z)|dz·||F̂-F_0|| + √n∫|D(z,F̂-F_0)||f̂(z)-f_0(z)|dz
      ≤ 2√n∫b(z)dz·||F̂-F_0||² = O_p(ln(n)/√n h^r + √n h^{2s}).
Next, since b(z) is bounded, ā(x)v + ℓ̄(x)y → a(x)v + ℓ(x)y as h → 0, and |ā(x)v + ℓ̄(x)y|² ≤ C(v² + y²), Assumption 7 and the dominated convergence theorem imply that Var(ā(x)v + ℓ̄(x)y) → Var(a(x)v + ℓ(x)y) = Var(ψ(z)). By applying the dominated convergence theorem as we have done previously, it also follows that

(A.9)  2[1/n(n-1)]h^{-r-2|λ|}(Q̂ - Q) = o([1/n(n-1)]h^{-r-2|λ|}),

where ā(x) = ∫K̄(u)a(x+hu)du and ℓ̄(x) and Q̂ are defined analogously. Furthermore, by integration by parts and interchanging the order of differentiation and integration,

E[k(z_1,z_2)+k(z_2,z_1)|z_1] = E[∂^λK̄_h(x_1-x_2)y_2v_1|z_1] + E[∂^λK̄_h(x_2-x_1)y_1v_2|z_1]
  = E[∂^λK̄_h(x_1-x_2)E[y_2|x_2]|z_1]v_1 + E[∂^λK̄_h(x_2-x_1)E[v_2|x_2]|z_1]y_1
  = [∫∂^λK̄_h(x_1-x)E[y|x]f_0(x)dx]v_1 + (-1)^{|λ|}[∫∂^λK̄_h(x_1-x)b(x)dx]y_1 = ā(x_1)v_1 + ℓ̄(x_1)y_1.

Therefore,

(A.12)  [(n-2)/n(n-1)]Var(E[k(z_1,z_2)+k(z_2,z_1)|z_1]) = Var(ψ(z))/n + o(n^{-1}).

Plugging the results from (A.10)-(A.12) into (A.8), and using MSE(μ̄) = Var(μ̄) + (E[μ̄]-μ_0)² to combine equations (A.7) and (A.8), then gives the result. QED.
Proof of Theorem 4.2: By Assumption 8 and K̄(u) having bounded support, ||hu|| can be made as small as desired, uniformly over the support of u, by choosing h small enough. Then for h small enough,

|ā(x)-a(x)| = |∫K̄(u)[a(x+hu)-a(x)]du| ≤ ∫|K̄(u)||a(x+hu)-a(x)|du ≤ ∫|K̄(u)|C_a(x)||hu||^{ζ_a}du
            = h^{ζ_a}C_a(x)∫|K̄(u)|·||u||^{ζ_a}du = Ch^{ζ_a}C_a(x),

and similarly, |b̄(x)-b(x)| ≤ Ch^{ζ_b}C_b(x). It then follows that the bias satisfies

|E[μ̄]-μ_0| ≤ ∫|ā(x)-a(x)||b̄(x)-b(x)|dx ≤ Ch^{ζ_a+ζ_b}∫C_a(x)C_b(x)dx.