working paper
department of economics

KERNEL ESTIMATION OF PARTIAL MEANS AND A GENERAL VARIANCE ESTIMATOR

Whitney K. Newey

No. 93-3        Dec. 1992

massachusetts institute of technology
50 memorial drive
Cambridge, mass. 02139

KERNEL ESTIMATION OF PARTIAL MEANS AND A GENERAL VARIANCE ESTIMATOR*

Whitney K. Newey
MIT Department of Economics

December, 1991
Revised: December, 1992

* Financial support was provided by the NSF. P. Robinson and T. Stoker provided useful comments.

Abstract

Econometric applications of kernel estimators are proliferating, suggesting the need for convenient variance estimates and conditions for asymptotic normality. This paper develops a general "delta method" variance estimator for functionals of kernel estimators. Also, regularity conditions for asymptotic normality are given, along with a guide to verifying them for particular estimators. The general results are applied to partial means, which are averages of kernel estimators over some of their arguments with other arguments held fixed. Partial means have econometric applications, such as consumer surplus estimation, and are useful for estimation of additive nonparametric models.

Keywords: Kernel estimation, partial means, standard errors, delta method, functional estimation.

1. Introduction

There are a growing number of applications where estimators use the kernel method in their construction, i.e. where functionals of kernel estimators are involved. Examples include average derivative estimation (Hardle and Stoker, 1989, and Powell, Stock, and Stoker, 1989), nonparametric policy analysis (Stock, 1989), consumer surplus estimation (Hausman and Newey, 1992), and others that are the topic of current research. An important example in this paper is a partial mean, which is an average of a kernel regression estimator over some components, holding others fixed. The growth of kernel applications suggests the need for a general variance estimator that applies to many cases, including partial means. This paper presents one such estimator. Also, the paper gives general results on asymptotic normality of functionals of kernel estimators.

Partial means control for covariates by averaging over them. They are related to additive nonparametric models and have important uses in economics, as further discussed below. It is shown here that their convergence rate is determined by the number of components that are averaged out, being faster the more components that are averaged over.

The variance estimator is based on differentiating the functional with respect to the contribution of each observation to the kernel. A more common method is to calculate the asymptotic variance formula and then "plug-in" consistent estimators. This method can be quite difficult when the asymptotic formula is complicated, as often seems to be the case. In contrast, the approach described here only requires knowing the form of the functional and kernel. In this way it is like the Huber (1967) asymptotic variance for m-estimators. Also, it gives consistent standard errors even for fixed bandwidths (when the estimator is centered at its limit), unlike the more common approach. Also, it is a generalization of the "delta method" for functions of sample means.

An alternative approach to variance estimation, or confidence intervals, is the bootstrap. The bootstrap may give consistent confidence intervals (e.g. by the percentile method) for the same types of functionals considered here, although this does not appear to be known. In any case, variance estimates are useful for bootstrap improvements to the asymptotic distribution, as considered in Hall (1992).

The variance formula given here has antecedents in the literature. For a kernel density at a point it is equal to the sample variance of the kernel observations, as recently considered by Hall (1992). For a kernel regression at a point, a related estimator was proposed by Bierens (1987). Also, the standard errors for average derivatives in Hardle and Stoker (1989) and Powell, Stock, and Stoker (1989) are equal to this estimator when the kernel is symmetric. New cases included here are partial means, and estimators that depend (possibly) nonlinearly on all of the density or regression function, and not just on its value at sample points.

Section 2 sets up m-estimators that depend on kernel densities or regressions, and gives examples. Section 3 gives the standard errors, i.e. the asymptotic variance estimator. Section 4 describes partial means and their estimators, and associated asymptotic theory. Section 5 gives some general lemmas that are useful for the asymptotic theory of partial means, and more generally for other nonlinear functionals of kernel estimators. The proofs are collected in Appendix A, and Appendix B contains some technical lemmas.

2. The Estimators

The estimators considered in this paper are two-step estimators where the first step is a vector of kernel estimators. To describe the first step, let z_i (i = 1, ..., n) denote data observations that include observations y_i and x_i on y, an r × 1 vector of variables, and x, a k × 1 vector of continuously distributed variables, and let h_0(x) denote the product of the density f_0(x) of x with E[y|x], as

(2.1)    h_0(x) = E[y|x]·f_0(x).

Let K(u) denote a kernel function satisfying ∫K(u)du = 1 and other conditions given in Section 4, where u is k × 1. Then for a bandwidth σ > 0, let K_σ(u) = σ^{-k}·K(u/σ), and

(2.2)    ĥ(x) = n^{-1}·Σ_{i=1}^{n} y_i·K_σ(x − x_i).

This kernel estimator of h_0 is the first step considered here.

A second step allowed for in this paper is an m-estimator that depends on the estimated function ĥ. To describe such an estimator, let β denote a vector of parameters, with true value β_0, and m(z,β,h) a vector of functions that depend on the observation, the parameter, and the function h, and suppose that

    E[m(z,β_0,h_0)] = 0.

Here m(z,β,h) is allowed to depend on the entire function h, and not just its value at observed points; see below for examples. A second step estimator is a β̂ that solves a corresponding sample equation

(2.3)    Σ_{i=1}^{n} m(z_i,β̂,ĥ) = 0.

This is a two-step m-estimator where the first step is the kernel estimator described above.
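As a concrete illustration (not from the paper), here is a minimal Python sketch of the first-step estimator in equation (2.2); the Gaussian product kernel and the function names are choices made for the example:

```python
import numpy as np

def gaussian_kernel(u):
    """Product Gaussian kernel on R^k; integrates to one, as required by (2.2)."""
    k = u.shape[-1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi) ** (k / 2.0)

def h_hat(x, y_data, x_data, sigma):
    """First-step kernel estimator h(x) = n^{-1} sum_i y_i K_sigma(x - x_i),
    with K_sigma(u) = sigma^{-k} K(u / sigma); estimates E[y|x] f_0(x)."""
    n, k = x_data.shape
    weights = gaussian_kernel((x - x_data) / sigma) / sigma ** k  # K_sigma(x - x_i)
    return y_data.T @ weights / n                                 # r x 1 estimate
```

With y_i ≡ 1 this reduces to a kernel density estimator of f_0(x); with y_i = (1, q_i)′, the ratio of the second component of ĥ(x) to the first estimates the regression E[q|x].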

This estimator includes as special cases functions of kernel estimators evaluated at points, e.g. a kernel density estimator at a point.

Some other interesting examples are as follows:

Partial Means: An example that is (apparently) new is an average of a nonparametric regression over some variables, holding others fixed. Let q denote a random variable that is included in z, and let g_0(x) = E[q|x]. Partition x = (x_1, x_2), let x̄_1 be some fixed value for x_1, and let τ(x_2) be a weight function. The object of interest is the expectation

(2.3a)    β_0 = E[τ(x_2)·g_0(x̄_1, x_2)].

This object is a partial mean, an average over some conditioning variables holding others fixed. An estimator of this expectation can be formed by averaging a kernel regression estimator over the observations on x_2, with x_1 held fixed at x̄_1. Let y = (1, q)′, so that f̂(x) = ĥ_1(x) is a kernel density estimator and ĥ_2(x)/ĥ_1(x) is a kernel regression estimator, let x_i = (x_{1i}, x_{2i}), and let

    β̂ = n^{-1}·Σ_{i=1}^{n} τ(x_{2i})·ĥ_2(x̄_1, x_{2i})/ĥ_1(x̄_1, x_{2i}).

This estimator is a special case of equation (2.3) with m(z,β,h) = τ(x_2)·h_2(x̄_1, x_2)/h_1(x̄_1, x_2) − β. It shows how explicit estimators can be included as special cases of equation (2.3). Further discussion is given in Section 4.
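To make the partial mean concrete, here is a small Python sketch (illustrative only: scalar x_1 and x_2, a Gaussian product kernel, and the helper name partial_mean are assumptions of the example; the σ^{-k} normalizations cancel in the ratio ĥ_2/ĥ_1):

```python
import numpy as np

def partial_mean(q, x1, x2, x1_bar, sigma, tau=None):
    """Partial mean: average over the observed x2_i of the kernel regression
    estimate of E[q | x1 = x1_bar, x2 = x2_i], weighted by tau(x2_i)."""
    n = len(q)
    w_tau = np.ones(n) if tau is None else tau(x2)
    g_hat = np.empty(n)
    for i in range(n):
        u1 = (x1_bar - x1) / sigma          # distance in the held-fixed component
        u2 = (x2[i] - x2) / sigma           # distance in the averaged component
        kern = np.exp(-0.5 * (u1 ** 2 + u2 ** 2))  # Gaussian product kernel
        g_hat[i] = kern @ q / kern.sum()    # h2(x1_bar, x2_i) / h1(x1_bar, x2_i)
    return np.mean(w_tau * g_hat)
```

Each g_hat[i] is the kernel regression evaluated at (x̄_1, x_{2i}); averaging over i implements β̂ above.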

Differential Equation Solution: An estimator with economic applications is one that solves a differential equation depending on a nonparametric regression. To describe the estimator, let y = (1, q)′ and x = (x_1, x_2)′, and let p⁰ and p¹ be some fixed values for x_1, with p ...

... β̂ ± 1.96·V̂^{1/2}/√n will be an asymptotic 95 percent confidence interval. It is interesting to note that ... will be asymptotically valid. ... Alternatively, if the analytic derivative is very hard to construct, it can be calculated as the numerical derivative of n^{-1}·Σ_{i=1}^{n} m(z_i, β̂, ĥ + ...) ...

... and f_0(x) are continuous, ‖β̂ − β_0‖ →p 0, σ → 0, and nσ.../ln(n) → ∞, then

    √(nσ^{k_1})·(β̂ − β_0) →d N(0, V).
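The interval construction itself is simple arithmetic; here is a sketch, assuming an estimate v_hat of the asymptotic variance V has already been computed (for instance by the standard errors of Section 3):

```python
import math

def asymptotic_ci(beta_hat, v_hat, n, z=1.96):
    """Asymptotic confidence interval beta_hat +/- z * sqrt(v_hat / n);
    z = 1.96 gives the 95 percent interval."""
    half_width = z * math.sqrt(v_hat / n)
    return beta_hat - half_width, beta_hat + half_width
```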

The conditions here embody "undersmoothing," meaning that the bias goes to zero faster than the variance. Undersmoothing is reflected in the conclusion, where the limiting distribution is centered at zero, rather than at a bias term. An improved convergence rate for partial means over pointwise estimators is reflected in the normalizing factor √(nσ^{k_1}) in the asymptotic distribution result. The convergence rate is (nσ^{k_1})^{-1/2}, while the corresponding rate from the usual asymptotic normality result for pointwise estimators is (nσ^{k})^{-1/2}, which converges to zero more slowly by the factor σ^{-(k-k_1)/2}.
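As a purely numeric illustration of the two rates (the sample size and bandwidth below are arbitrary):

```python
def rate(n, sigma, dim):
    """Convergence rate (n * sigma**dim) ** (-1/2): smaller is faster."""
    return (n * sigma ** dim) ** -0.5

# Partial mean with k1 = 1 held-fixed component, versus a pointwise
# estimator in k = 3 dimensions, at illustrative n and sigma:
n, sigma = 10_000, 0.2
partial_mean_rate = rate(n, sigma, 1)  # (n * sigma)^{-1/2}
pointwise_rate = rate(n, sigma, 3)     # (n * sigma^3)^{-1/2}, larger (slower)
```

The ratio of the two rates is σ^{-(k-k_1)/2}, here 0.2^{-1} = 5, matching the factor in the text.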

It is quite easy to show smoothness of functionals in this setting. To define the norm, for a matrix of functions B(x), let ∂^j B(x)/∂x^j denote any of the j-th order partial derivatives of the elements of B(x), for a nonnegative integer j, let X denote a set that is contained in the support of x, and let

    ‖B‖_j ≡ max_{ℓ≤j} sup_{x∈X} ‖∂^ℓ B(x)/∂x^ℓ‖.

Also, let m̄(β) = E[m(z,β,h_0)]. Uniform convergence is the conclusion of the following result, even if the derivatives do not exist for some x.

Lemma 5.1: If B is compact, m(z,β,ĥ) is continuous at each β ∈ B with probability one, and there are d ≤ Δ+1 and b(z) with E[b(z)] < ∞ such that, for ‖h − h_0‖ small enough,

(5.1)    ‖m(z,β,h) − m(z,β,h_0)‖ ≤ b(z)·(‖h − h_0‖)^d,

with ln(n)/(nσ^k) → 0, then sup_{β∈B} ‖n^{-1}·Σ_{i=1}^{n} m(z_i,β,ĥ) − m̄(β)‖ →p 0 and m̄(β) is continuous.
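The norm ‖B‖_j can be approximated numerically for a given one-dimensional function; here is a sketch using central finite differences on a grid (the grid, step size, and function names are choices made for the example):

```python
import numpy as np

def nth_derivative(f, x, order, eps=1e-4):
    """Central finite-difference approximation to the order-th derivative of f at x."""
    if order == 0:
        return f(x)
    return (nth_derivative(f, x + eps, order - 1, eps)
            - nth_derivative(f, x - eps, order - 1, eps)) / (2.0 * eps)

def sobolev_sup_norm(f, grid, j):
    """||f||_j: max over derivative orders l <= j of sup over the grid of |d^l f/dx^l|."""
    return max(
        max(abs(nth_derivative(f, x, l)) for x in grid)
        for l in range(j + 1)
    )
```

For example, for f = sin on [0, 2π], both sup|sin| and sup|cos| equal one, so ‖f‖_1 ≈ 1.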

Then, for m(h) = ∫D(z,h)dF(z),

    σ^a·Σ_{i=1}^{n} [m(z_i,ĥ) − m(z_i,h_0)]/√n = √n·σ^a·[m(ĥ) − m(h_0)] + o_p(1).

The conditions of this result imply Frechet differentiability of m(z,h), as a function of h, at h_0 in the Sobolev norm ‖h‖_Δ, Δ = max{Δ_1, ..., Δ_r}. The remainder bounds are formulated with different norms, rather than the single norm ‖h‖_Δ, to allow weaker conditions for asymptotic normality in some cases.

Asymptotic normality of β̂ can be shown by combining either Lemmas 5.2 and 5.3 or Lemmas 5.3 and 5.4. In the √n-consistent case, it will follow from Lemma 5.2 that

    Σ_{i=1}^{n} m(z_i,ĥ)/√n = Σ_{i=1}^{n} {m(z_i,h_0) + δ_i − E[δ_i]}/√n + o_p(1),

so that asymptotic normality, with asymptotic variance V = Var(m(z_i,h_0) + δ_i), follows by the central limit theorem. In the slower than √n-consistent case, where σ^a·Σ_{i=1}^{n} m(z_i,h_0)/√n →p 0, it will be the case that m(h) = ∫D(z,h)dF(z) satisfies the conditions of Lemma 5.4, with σ^a·Σ_{i=1}^{n} m(z_i,ĥ)/√n →d N(0,V), so that the conclusion can be shown by combining Lemmas 5.3 and 5.4.

Lemma 5.2: Suppose that ... → ∞, D(z,h;β_0,h_0) is linear in h, and ... Then, for m(h) = ∫D(z,h;β_0,h_0)dF(z) and δ_i = v(x_i)·y_i,

    Σ_{i=1}^{n} ‖δ̂_i − δ_i‖²/n →p 0,

and m(h) satisfies the conditions of Lemmas 5.3 and 5.4.

Appendix A: Proofs of Theorems

Throughout the appendix, C will denote a generic constant that may be different in different uses, and Σ_i ≡ Σ_{i=1}^{n}. Also, CS, M, and T will refer to the Cauchy-Schwartz, Markov, and triangle inequalities, respectively, and DCT to the dominated convergence theorem. Before proving the results in the body of the paper it is useful to state and prove some intermediate results.

Proof of Theorem 4.1: The proof proceeds by checking the conditions of Lemmas 5.3 - 5.5. Let ‖h‖ = sup_{x∈X} ‖h(x)‖, where X is the compact set of hypothesis iii), let τ(x) = τ(x_2), x(t) = (x̄_1, t), and a = k_1/2, and let

    m(z,h) = τ(x)·h_2(x)/h_1(x),
    D(z,h;h̄) = τ(x)·[h_2(x) − {h̄_2(x)/h̄_1(x)}·h_1(x)]/h̄_1(x),    D(z,h) = D(z,h;h_0).

Choose ε small enough that h̄_1(x) is bounded below on X for all h̄ with ‖h̄ − h_0‖ ≤ ε. Then for such h̄,

    |m(z,h̄) − m(z,h_0) − D(z,h̄ − h_0;h_0)| = |[h_{01}(x)/h̄_1(x) − 1]·D(z,h̄ − h_0;h_0)| ≤ C·‖h̄ − h_0‖²,
    |D(z,h̄;h_0)| ≤ C·‖h̄‖,

and, because ln(n)/(nσ^k) goes to zero faster than some power of σ by hypothesis, √n·σ^a·‖ĥ − h_0‖² →p 0, so that the rate hypotheses of Lemma 5.5 are satisfied. Also, by a change of variables,

    m(h) = ∫D(z,h;h_0)dF(z) = ∫u(t)·[h_2(x(t)) − {h_{02}(x(t))/h_{01}(x(t))}·h_1(x(t))]dt,

for a function u(t) = τ(x(t))·f_0(x(t))^{-1}·... that is bounded and continuous and zero outside a compact set, by continuity of f_0 and g_0 and the assumption about τ. Thus, the conclusion of Lemma 5.4 is satisfied. The other conditions of Lemmas 5.3 - 5.5 are also satisfied by hypothesis and by equation (4.3). Then

    Σ_{i=1}^{n} {m(z_i,ĥ) − E[m(z,h_0)]}/√n ...

and the conclusion follows by T. QED.

Proof of Lemma 5.5: Let D_ij = D(z_i, y_j·K_σ(·−x_j); β_0, h_0) and D̄_i = E[D_ij | z_i]. By Lemma B.4,

    E[‖D_ij − D̄_ij‖²] ≤ E[b(z)²]·...,

so that, by Chebyshev's inequality,

    σ^{2a}·Σ_{i=1}^{n} {D̄_i − E[D̄_i] − D(z_i,h_0) + E[D(z_i,h_0)]}/√n →p 0.

Then, by a V-statistic projection on the basic observations, ... Also,

    Σ_{i=1}^{n} ‖m(z_i,β̂,ĥ) − m(z_i,β_0,h_0)‖²/n →p 0

follows by a standard argument, similar to the proof of Lemma 5.1. The conclusion then follows by the Liapunov central limit theorem. QED.

Proof of Lemma 5.3: Note that, for the δ̂_i and δ_i defined in the proof of Lemma 5.2,

    E[‖δ̂_i − δ_i‖²] ≤ C·σ^{2a}·(E[‖D̄_i − δ_i‖²] + E[‖D_ij‖²]/n) = o(1),

so that Σ_{i=1}^{n} ‖δ̂_i − δ_i‖²/n →p 0 by M. As was shown in the proof of Lemma 5.2, E[δ_i·δ_i′] → V and n^{-1}·σ^{4a}·E[‖δ_i‖⁴] → 0, so that, by the law of large numbers,

    Σ_{i=1}^{n} (δ̂_i − Σ_j δ̂_j/n)·(δ̂_i − Σ_j δ̂_j/n)′/n →p V.

The conclusion then follows by T. QED.

Lemma B.2: If Assumptions K, H, and Y are satisfied for d ≤ j+s, then sup_X |Ĥ(x) − H(x)| →p 0 ...

Proof: Since the observations are i.i.d., ... E[...] ..., and Prob(... > M/2) < ε. By the triangle inequality,

    sup_X |Ĥ(x) − H(x)| ≤ sup_X |Ĥ(x) − E[Ĥ(x)]| + sup_X |E[Ĥ(x)] − H(x)| = O_p(...).

The conclusion then follows by applying this conclusion to each derivative of Ĥ up to order j. QED.

Lemma B.3: If the hypotheses of Lemmas ... are satisfied, with K(u) having finite support, ∫K(u)du = 1, and ∫K(u)·{⊗_{ℓ=1}^{m̃} u}du = 0 for 1 ≤ m̃ < m, then ‖E[ĥ] − h_0‖_j ≤ C·σ^m.

Proof: Note that by a change of variables,

(B.6)    E[ĥ](x) = E[y_i·K_σ(x − x_i)] = ∫h_0(t)·[K((x−t)/σ)/σ^k]dt = ∫K(u)·h_0(x − σu)du,

so it follows that

(B.7)    ∂^j E[ĥ](x)/∂x^j = ∫K(u)·[∂^j h_0(x − σu)/∂x^j]du.

Then, by a Taylor expansion of ∂^j h_0(x − σu)/∂x^j around σ = 0, the terms of order less than m vanish, being constant matrices times ∫K(u)·{⊗_{ℓ=1}^{m̃} u}du = 0, so that

    ‖∂^j E[ĥ](x)/∂x^j − ∂^j h_0(x)/∂x^j‖ ≤ C·[∫|K(u)|·‖u‖^m du]·sup_x ‖∂^{j+m} h_0(x)/∂x^{j+m}‖·σ^m,

giving the conclusion. QED.
