Working Paper, Department of Economics

KERNEL ESTIMATION OF PARTIAL MEANS AND A GENERAL VARIANCE ESTIMATOR

Whitney K. Newey

No. 93-3    Dec. 1992

Massachusetts Institute of Technology
50 Memorial Drive, Cambridge, Mass. 02139
KERNEL ESTIMATION OF PARTIAL MEANS AND A GENERAL VARIANCE ESTIMATOR*

Whitney K. Newey
MIT Department of Economics

December, 1991
Revised: December, 1992

* Financial support was provided by the NSF. P. Robinson and T. Stoker provided useful comments.
Abstract

Econometric applications of kernel estimators are proliferating, suggesting the need for convenient variance estimates and conditions for asymptotic normality. This paper develops a general "delta method" variance estimator for functionals of kernel estimators. Also, regularity conditions for asymptotic normality are given, along with a guide to verifying them for particular estimators. The general results are applied to partial means, which are averages of kernel estimators over some of their arguments with other arguments held fixed. Partial means have econometric applications, such as consumer surplus estimation, and are useful for estimation of additive nonparametric models.

Keywords: Kernel estimation, partial means, standard errors, delta method, functional estimation.
1. Introduction
There are a growing number of applications where estimators use the kernel method in their construction, i.e. where functionals of kernel estimators are involved. Examples include average derivative estimation (Hardle and Stoker, 1989, and Powell, Stock, and Stoker, 1989), nonparametric policy analysis (Stock, 1989), consumer surplus estimation (Hausman and Newey, 1992), and others that are the topic of current research. An important example in this paper is a partial mean, which is an average of a kernel regression estimator over some components, holding others fixed. The growth of kernel applications suggests the need for a general variance estimator that applies to many cases, including partial means. This paper presents one such estimator. Also, the paper gives general results on asymptotic normality of functionals of kernel estimators.

Partial means control for covariates by averaging over them. They are related to additive nonparametric models and have important uses in economics, as further discussed below. It is shown here that their convergence rate is determined by the number of components that are averaged out, being faster the more components that are averaged over.

The variance estimator is based on differentiating the functional with respect to the contribution of each observation to the kernel, like the Huber (1967) asymptotic variance for m-estimators. Also, it is a generalization of the "delta method" for functions of sample means. A more common method is to calculate the asymptotic variance formula and then "plug-in" consistent estimators. This method can be quite difficult when the asymptotic formula is complicated, as often seems to be the case. In contrast, the approach described here only requires knowing the form of the functional and kernel. In this way it is like the bootstrap. Also, it gives consistent standard errors even for fixed bandwidths (when the estimator is centered at its limit), unlike the more common approach.

An alternative approach to variance estimation, or confidence intervals, is the bootstrap. The bootstrap may give consistent confidence intervals (e.g. by the percentile method) for the same types of functionals considered here, although this does not appear to be known. In any case, variance estimates are useful for bootstrap improvements to the asymptotic distribution, as considered in Hall (1992).

The variance formula given here has antecedents in the literature. For a kernel density at a point it is equal to the sample variance of the kernel observations, as recently considered by Hall (1992). For a kernel regression at a point, a related estimator was proposed by Bierens (1987). Also, the standard errors for average derivatives in Hardle and Stoker (1989) and Powell, Stock, and Stoker (1989) are equal to this estimator when the kernel is symmetric. New cases included here are partial means and estimators that depend (possibly) nonlinearly on all of the density or regression function, and not just on its value at sample points.

Section 2 sets up m-estimators that depend on kernel densities or regressions, and gives examples. Section 3 gives the standard errors, i.e. the asymptotic variance estimator. Section 4 describes partial means and their estimators, and associated asymptotic theory. Section 5 gives some general lemmas that are useful for the asymptotic theory of partial means, and more generally for other nonlinear functionals of kernel estimators. The proofs are collected in Appendix A, and Appendix B contains some technical lemmas.
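The special case noted above, where the variance formula for a kernel density at a point reduces to the sample variance of the kernel observations, can be sketched numerically. This is an illustrative sketch, not code from the paper: the Gaussian kernel and all names are assumptions.

```python
import numpy as np

def density_and_se(x, xbar, sigma):
    """Kernel density estimate at xbar and its delta-method standard error.

    For a density at a point, the general variance estimator reduces to the
    sample variance of the "kernel observations" K_sigma(xbar - x_i), so the
    standard error is their standard deviation over sqrt(n).
    (Gaussian kernel assumed for illustration.)"""
    k = np.exp(-0.5 * ((xbar - x) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return k.mean(), k.std(ddof=1) / np.sqrt(len(x))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)              # true N(0,1) density at 0 is about 0.399
fhat, se = density_and_se(x, xbar=0.0, sigma=0.3)
```

Note that nothing here requires an analytic asymptotic variance formula; the standard error is computed directly from the per-observation kernel contributions.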
2. The Estimators
The estimators considered in this paper are two step estimators where the first step is a vector of kernel estimators. To describe the first step, let z_i (i = 1, ..., n) denote data observations that include observations x_i and y_i on a k × 1 vector x of continuously distributed variables and an r × 1 vector y of variables. Let h_0(x) denote the product of the density f_0(x) of x with E[y|x], i.e.

(2.1)    h_0(x) = E[y|x] f_0(x).

Let K(u) denote a kernel function satisfying ∫K(u)du = 1 and other conditions given in Section 4, and for a bandwidth σ > 0 let K_σ(u) = σ^{-k} K(u/σ). Then a kernel estimator of h_0 is

(2.2)    ĥ(x) = n^{-1} Σ_{i=1}^n y_i K_σ(x − x_i).

This estimator is the first step considered here.
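Equation (2.2) can be sketched in code as follows. The Gaussian product kernel and the variable names are illustrative assumptions; the paper restricts K only through the conditions of Section 4.

```python
import numpy as np

def h_hat(x_eval, x, y, sigma):
    """First-step kernel estimator of equation (2.2):
    h_hat(x_eval) = n^{-1} sum_i y_i K_sigma(x_eval - x_i),
    with K_sigma(u) = sigma^{-k} K(u/sigma) and a Gaussian product kernel K
    (an assumption for illustration)."""
    u = (x_eval - x) / sigma                     # n x k scaled differences
    kern = np.exp(-0.5 * (u ** 2).sum(axis=1))   # unnormalized K(u_i)
    kern /= (np.sqrt(2.0 * np.pi) * sigma) ** x.shape[1]
    return (y * kern[:, None]).mean(axis=0)      # r-vector estimate of h_0(x_eval)

# With y = (1, q)', the first component of h_hat estimates the density f_0(x),
# and the ratio of components is the usual kernel regression of q on x.
rng = np.random.default_rng(1)
x = rng.standard_normal((500, 1))
q = x[:, 0] + 0.5 * rng.standard_normal(500)
y = np.column_stack([np.ones(500), q])
h = h_hat(np.zeros(1), x, y, sigma=0.3)
ghat = h[1] / h[0]        # kernel regression estimate of E[q | x = 0]
```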
A second step allowed for in this paper is an m-estimator that depends on the estimated function ĥ. To describe such an estimator, let β denote a vector of parameters, with true value β_0, and let m(z, β, h) be a vector of functions that depend on the observation, the parameter, and the function h. Suppose that

E[m(z, β_0, h_0)] = 0.

Here m(z, β, h) is allowed to depend on the entire function h, and not just its value at observed points; see below for examples. A second step estimator is a β̂ that solves a corresponding sample equation,

(2.3)    Σ_{i=1}^n m(z_i, β̂, ĥ) = 0.

This is a two-step m-estimator where the first step is the kernel estimator described above.
This estimator includes as special cases functions of kernel estimators evaluated at points, e.g. a kernel density estimator at a point.
Some other interesting examples
are as follows:
Partial Means: An example that is (apparently) new is an average of a nonparametric regression over some variables, holding others fixed. Let q denote a random variable that is included in z, and let g_0(x) = E[q|x]. Partition x = (x_1, x_2), let x̄_1 be some fixed value that has the same dimension as x_1, and let τ(x_2) be some function of x_2. The object

β_0 = E[τ(x_2) g_0(x̄_1, x_2)]

is a partial mean, an average over some conditioning variables holding others fixed. Let y = (1, q)', so that f̂(x) = ĥ_1(x) and ĥ_2(x)/ĥ_1(x) is a kernel regression estimator of g_0(x). For x_i = (x_{1i}, x_{2i}), an estimator of the partial mean is

β̂ = n^{-1} Σ_{i=1}^n τ(x_{2i}) ĥ_2(x̄_1, x_{2i})/ĥ_1(x̄_1, x_{2i}).

This estimator is a special case of equation (2.3) with m(z, β, h) = τ(x_2) h_2(x̄_1, x_2)/h_1(x̄_1, x_2) − β. It shows how explicit estimators can be included as special cases of equation (2.3). Further discussion is given in Section 4.
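A minimal sketch of the partial mean estimator above, under illustrative assumptions (Gaussian product kernel, scalar x_1 and x_2, and τ ≡ 1 by default):

```python
import numpy as np

def partial_mean(x1bar, x1, x2, q, sigma, tau=lambda v: 1.0):
    """Partial mean estimator: average the kernel regression of q on (x1, x2)
    over the sample x2 values, holding x1 fixed at x1bar.  With y = (1, q)'
    the ratio h2/h1 of equation (2.2) components reduces to the weighted
    average below, since the sigma^{-k}/n factors cancel."""
    xmat = np.column_stack([x1, x2])
    total = 0.0
    for x2i in x2:
        u = (np.array([x1bar, x2i]) - xmat) / sigma
        k = np.exp(-0.5 * (u ** 2).sum(axis=1))      # Gaussian product kernel
        total += tau(x2i) * (q * k).sum() / k.sum()  # regression at (x1bar, x2i)
    return total / len(x2)

rng = np.random.default_rng(2)
n = 400
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
q = x1 + x2 + 0.5 * rng.standard_normal(n)
beta = partial_mean(0.0, x1, x2, q, sigma=0.4)  # estimates E[g_0(0, x2)] = E[x2]
```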
Differential Equation Solution: An estimator with economic applications solves a differential equation depending on a nonparametric regression. To describe the estimator, let y = (1, q)', x = (x_1, x_2)', and let two fixed values for x_1 be denoted by p̄_0 and p̄_1, with p̄_0 […]
[…] β̂ ± 1.96 [V̂ar(β̂)/n]^{1/2} will be an asymptotic 95 percent confidence interval. It is interesting to note that the β̂ … will be asymptotically valid. … The derivatives in the variance estimator can be calculated analytically. Alternatively, if the analytic derivatives are very hard to construct, V̂ can be calculated as the numerical derivative of n^{-1} Σ_{i=1}^n m(z_i, β̂, ĥ + …) …
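A numerical-derivative version of this idea can be sketched as follows: treat the estimator as a function of per-observation weights on the kernel contributions, difference along the direction of each observation, and take the sample variance of the resulting influences. The weighting scheme and all names are illustrative assumptions, not the paper's exact construction; for a linear functional such as a density at a point, the result matches the sample variance of the kernel observations noted in the introduction.

```python
import numpy as np

def numerical_delta_se(functional, n, eps=1e-6):
    """Numerical-derivative sketch of the delta-method variance estimator.
    `functional(w)` recomputes the estimator with observation i's kernel
    contribution given weight w[i] (equal weights 1/n reproduce the estimator).
    The influence of observation i is the derivative of the functional along
    the direction that moves weight toward observation i."""
    w = np.full(n, 1.0 / n)
    base = functional(w)
    psi = np.empty(n)
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0
        psi[i] = (functional((1.0 - eps) * w + eps * e_i) - base) / eps
    return base, psi.std(ddof=1) / np.sqrt(n)

# Check on a kernel density at a point, where the delta-method standard error
# equals the standard deviation of the kernel observations over sqrt(n).
rng = np.random.default_rng(3)
x = rng.standard_normal(300)
sigma, xbar = 0.3, 0.0
kobs = np.exp(-0.5 * ((xbar - x) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
fhat, se = numerical_delta_se(lambda w: w @ kobs, len(x))
ci = (fhat - 1.96 * se, fhat + 1.96 * se)   # asymptotic 95 percent interval
```

Only the form of the functional and the kernel are needed; no analytic asymptotic variance formula enters the computation.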
Theorem 4.1: Suppose that …, g_0(x̄_1, x_2) and f_0(x̄_1, x_2) are continuous, sup‖…‖ < ∞, σ → 0, and nσ^k/ln(n) → ∞. Then

√n σ^{k_1/2} (β̂ − β_0) →d N(0, V),

and V̂ →p V.
The conditions here embody "undersmoothing," meaning that the bias goes to zero faster than the variance. Undersmoothing is reflected in the conclusion, where the limiting distribution is centered at zero, rather than at a bias term. An improved convergence rate for partial means over pointwise estimators is reflected in the normalizing factor √n σ^{k_1/2} in the asymptotic distribution result: the convergence rate for the partial mean is (nσ^{k_1})^{-1/2}, while the corresponding rate from the usual asymptotic normality result for pointwise estimators is (nσ^k)^{-1/2}, which converges to zero more slowly, by the factor σ^{-(k−k_1)/2}.
Smoothness is quite easy to show for many functionals. To define the norm, for a matrix of functions B(x) let ∂^j B(x)/∂x^j denote any of the j-th order partial derivatives of all elements of B(x), let X denote a set that is contained in the support of x, and for a nonnegative integer j let ‖B‖_j = max_{ℓ≤j} sup_{x∈X} ‖∂^ℓ B(x)/∂x^ℓ‖. If the derivatives do not exist for some x ∈ X, … the uniform convergence … conclusion of the following result. Let m̄(β) = E[m(z, β, h_0)], and suppose that m(z, β, h) is continuous at each β, … is compact, and there is b(z) with E[b(z)] < ∞ such that, for d ≤ Δ+1 and ln(n)/(nσ^…) → 0,

(5.1)    ‖m(z, β, h) − m(z, β, h̄)‖ ≤ b(z)(‖h − h̄‖_d …).

Lemma 5.1: … Then for m̄(h) = ∫D(z, h)dF(z),

√n σ^a Σ_{i=1}^n [m(z_i, ĥ) − m(z_i, h_0)]/n = √n σ^a [m̄(ĥ) − m̄(h_0)] + o_p(1).

The conditions of this result imply Frechet differentiability at h_0 of m(z, h) as a function of h, in the Sobolev norm ‖h‖_Δ with Δ = max{Δ_1, …, Δ_r}. The remainder bounds are formulated with different norms, rather than Δ = Δ_1 = … = Δ_r, to allow weaker conditions for asymptotic normality in some cases.
Asymptotic normality of β̂ can be shown by combining Lemma 5.1 with either Lemma 5.2 or 5.3. In the √n-consistent case, it will follow from Lemmas 5.2 and 5.3 that

Σ_{i=1}^n m(z_i, ĥ)/√n = Σ_{i=1}^n {m(z_i, h_0) + δ_i − E[δ_i]}/√n + o_p(1),

so that asymptotic normality, with asymptotic variance V = Var(m(z_i, h_0) + δ_i), follows by the central limit theorem. In the slower than √n-consistent case, where σ^a Σ_{i=1}^n m(z_i, h_0)/√n →p 0, asymptotic normality follows by Lemma 5.3 with m̄(h) = ∫D(z, h; β_0, h_0)dF(z) and δ_i = v(x_i)y_i, provided that Σ_{i=1}^n ‖δ̂_i − δ_i‖/n →p 0 and the other conditions of Lemmas 5.2 and 5.4 are satisfied, with σ → 0 and nσ^… → ∞.
Appendix A: Proofs of Theorems

Throughout the appendix, C will denote a generic constant that may be different in different uses, and Σ_i = Σ_{i=1}^n. Also, CS, M, and T will refer to the Cauchy-Schwartz, Markov, and triangle inequalities, respectively, and DCT to the dominated convergence theorem. Before proving the results in the body of the paper it is useful to state and prove some intermediate results.
Proof of Theorem 4.1: The proof proceeds by checking the conditions of Lemmas 5.1 and 5.3–5.5. Let x = (x_1, x_2), τ(x) = τ(x_2), ‖h‖ = sup_X ‖h(x)‖, and a = k_1/2, and let

m(z, h) = τ(x) h_2(x)/h_1(x),    D(z, h; h̄) = τ(x)[h_2(x) − {h̄_2(x)/h̄_1(x)} h_1(x)]/h̄_1(x),

with D(z, h) = D(z, h; h_0). Let X be the compact set of hypothesis …, on which f_0(x̄_1, x_2) is bounded below, and choose ε small enough that h_1 is bounded away from zero for ‖h − h_0‖ ≤ ε. Then for ‖h − h_0‖ ≤ ε and ‖h̄ − h_0‖ ≤ ε,

|m(z, h) − m(z, h̄) − D(z, h − h̄; h̄)| = |[h̄_1(x)/h_1(x) − 1] D(z, h − h̄; h̄)| ≤ C ‖h − h̄‖²,    |D(z, h; h̄)| ≤ C ‖h‖,

so that the rate hypotheses of Lemma 5.1 are satisfied with a = k_1/2, by [ln(n)/(nσ^k)]^{1/2} → 0 and 2[ln(n)]^{1/2} σ^{a+s−k/2} + √n σ^{a+s} → 0.

Also, let x(t) = (x̄_1, t) and υ(t) = τ(x(t)) f_0^{-1}(x(t)) …, so that m̄(h) = ∫D(z, h; h_0)dF(z) = ∫…(t)dt. This function of t is bounded and continuous, and zero outside a compact set, by continuity of g_0 and f_0 and the assumption about τ, so the hypotheses of Lemma 5.4 are satisfied. Thus, the conclusion of Lemma 5.3 holds for √n σ^a Σ_i {m(z_i, ĥ) − E[m(z, h_0)]}/n, and V̂ → V by hypothesis and equation (4.3). The conclusion then follows by T. QED.
Proof of Lemma 5.5: Let D_ij = D(z_i, y_j K_σ(·−x_j); β_0, h_0) and D̄_i = E[D_ij | z_i] = ∫D(z_i, y K_σ(·−x); β_0, h_0)dF(z). By Lemma B.4, E[‖D_ij − D̄_ij‖²] ≤ C E[b(z)²] … Then by a V-statistic projection on the basic observations and Chebyshev's inequality,

σ^a Σ_i {D̄_i − E[D̄_i] − D(z_i, h_0) + E[D(z_i, h_0)]}/√n →p 0,

and Σ_i ‖m(z_i, β̂, ĥ) − m(z_i, β_0, h_0)‖/n →p 0 follows by a standard argument, similar to the proof of Lemma 5.1. … The conclusion then follows by the Liapunov central limit theorem. QED.
Proof of Lemma 5.3: Note that, as shown in the proof of Lemma 5.2,

Σ_i (δ̂_i − Σ_j δ̂_j/n)(δ̂_i − Σ_j δ̂_j/n)'/n →p V,

since E[δ_i δ_i'] → V, Σ_i ‖δ̂_i − δ_i‖²/n →p 0, and the z_i are i.i.d., by the law of large numbers. Under the conditions of the lemma, σ^{2a}(E[‖D̄_i‖²] + E[‖D(z_i, h_0)‖²]) → 0 and n^{-1} σ^{4a} E[‖δ_i‖⁴] → 0, so that, by M and the triangle inequality, … →p 0. The conclusion then follows by T. QED.

Lemma B.2: If Assumptions K, H, and Y are satisfied for d ≤ j+s, then sup_X |∂^j ĥ(x)/∂x^j − E[∂^j ĥ(x)/∂x^j]| = O_p(…). The conclusion follows by applying this bound to each derivative of ĥ up to order j. QED.
Lemma B.3: If the hypotheses of Lemmas … are satisfied with d ≤ j+s, then ‖E[ĥ] − h_0‖_j = O(σ^m).

Proof: By K(u) having finite support, ∫K(u)du = 1, and a Taylor expansion of h_0 around x, it follows that

(B.6)    E[ĥ](x) = E[y_i K_σ(x − x_i)] = ∫h_0(t) K((x − t)/σ) σ^{-k} dt = ∫K(u) h_0(x − σu) du,

(B.7)    ∂^j E[ĥ](x)/∂x^j = ∫K(u) ∂^j h_0(x − σu)/∂x^j du, …