Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries
http://www.archive.org/details/convergenceratesOOnewe
Working Paper
Department of Economics
Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139

CONVERGENCE RATES FOR SERIES ESTIMATORS

Whitney K. Newey

No. 93-10, July 1993
This paper consists of part of one originally titled "Consistency and Asymptotic Normality of Nonparametric Projection Estimators." Helpful comments were provided by Andreas Buja and financial support by the NSF and the Sloan Foundation.
Abstract
Least squares projections are a useful way of describing the relationship between random variables. These include conditional expectations and projections on additive functions. Series estimators, i.e. regressions on a finite dimensional vector whose dimension grows with sample size, provide a convenient way of estimating such projections. This paper gives convergence rates for these estimators. General results are derived, and primitive regularity conditions are given for power series and splines.

Keywords: Nonparametric regression, additive interactive models, random coefficients, polynomials, splines, convergence rates.
1. Introduction
Least squares projections of a random variable y on functions of a random vector x provide a useful way of describing the relationship between y and x. The simplest example is linear regression, the least squares projection on the set of linear combinations of x, as exemplified in Rao (1973, Chapter 4). An interesting nonparametric example is the conditional expectation, the projection on the set of all functions of x with finite mean square. There are also a variety of projections that fall in between these two polar cases, where the set of functions is larger than all linear combinations but smaller than all functions. One example is an additive regression, the projection on functions that are additive in the different elements of x. This case is motivated partly by the difficulty of estimating conditional expectations when x has many components: see Breiman and Stone (1978), Breiman and Friedman (1985), Friedman and Stuetzle (1981), Stone (1985), and Zeldin and Thomas (1977). A generalization that includes some interaction terms is the projection on functions that are additive in some subvectors of x. Another example is random linear combinations of functions of x, as suggested by Riedel (1992) for growth curve estimation.
One simple way to estimate nonparametric projections is by regression on a finite dimensional subset, with dimension allowed to grow with the sample size, e.g. as in Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988), and Andrews (1991), which will be referred to here as series estimation. This type of estimator may not be as good at recovering the "fine structure" of the projection as other smoothers, e.g. see Buja, Hastie, and Tibshirani (1989), but it is computationally simple. Also, projections often show up as nuisance functions in semiparametric estimation, where the fine structure is less important.
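A minimal sketch of series estimation in this sense, assuming a scalar regressor, a raw power series basis $1, x, \ldots, x^{K-1}$, and a hypothetical rule $K = \lceil n^{1/5} \rceil$ for letting the dimension grow with the sample size. The rule and the function names are illustrative only; they are not prescribed by the paper.

```python
import numpy as np


def power_series_basis(x, K):
    """n-by-K matrix with columns 1, x, x**2, ..., x**(K-1)."""
    return np.vander(np.asarray(x, dtype=float), N=K, increasing=True)


def series_fit(x, y, K):
    """Least squares regression of y on the first K power series terms.

    Returns a function that evaluates the fitted projection p^K(x)' pi_hat.
    """
    P = power_series_basis(x, K)
    # lstsq returns a minimum-norm solution if P'P happens to be singular.
    pi_hat, *_ = np.linalg.lstsq(P, y, rcond=None)
    return lambda x_new: power_series_basis(x_new, K) @ pi_hat


rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, n)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)

K = int(np.ceil(n ** (1 / 5)))  # hypothetical growth rule; K = 4 when n = 500
g_hat = series_fit(x, y, K)
fitted = g_hat(x)
```

Raw powers become ill-conditioned as K grows; an orthogonal polynomial basis spans the same space and so gives the same fitted projection with better numerical behavior.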
This paper derives convergence rates for series estimators of projections. Convergence rates are important because they show how dimension affects the asymptotic accuracy of the estimators (e.g. Stone 1982, 1985). They are also useful for the theory of semiparametric estimators that depend on projection estimates (e.g. Newey 1993a). The paper gives mean-square rates for estimation of the projection and uniform convergence rates for estimation of functions and derivatives. Fully primitive regularity conditions are given for power series and regression splines, as well as more general conditions that may apply to other types of series.
Previous work on convergence rates for series estimates includes Agarwal and Studden (1980), Stone (1985, 1990), Cox (1988), and Andrews and Whang (1990). This paper improves on many previous results in the convergence rate or generality of regularity conditions. Uniform convergence rates for functions and their derivatives are given, and some of the results allow for a data-based number of approximating terms, unlike all but Cox (1988). Also, the projection does not have to equal the conditional expectation, as in Stone (1985, 1990) but not the others.
2. Series Estimators
The results of this paper concern estimators of least squares projections that can be described as follows. Let z denote a data observation, y and x (measurable) functions of z, with x having dimension r. Let $\mathcal{G}$ denote a mean-squared closed, linear subspace of the set of all functions of x with finite mean-square. The projection of y on $\mathcal{G}$ is

$$g_0(x) = \arg\min_{g \in \mathcal{G}} E[\{y - g(x)\}^2]. \qquad (2.1)$$

An example is the conditional expectation, $g_0(x) = E[y|x]$, where $\mathcal{G}$ is the set of all measurable functions of x with finite mean-square. Two further examples will be used as illustrations, and are of interest in their own right.
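Additive-Interactive Projections: When x has many components, it is difficult to estimate E[y|x] for general x, a feature often referred to as the "curse of dimensionality." This difficulty can be reduced by projecting on functions that are additive in subvectors of x, so that the individual components have smaller dimension than x. One way to describe these is to let $x_\ell$ $(\ell = 1, \ldots, L)$ be distinct subvectors of x and

$$\mathcal{G} = \Big\{\sum_{\ell=1}^{L} g_\ell(x_\ell) : E[g_\ell(x_\ell)^2] < \infty\Big\}.$$

For example, if L = r and each $x_\ell$ is a single component of x, then $\mathcal{G}$ is the set of additive functions. The projection on $\mathcal{G}$ allows for nonparametric nonlinearities in each $x_\ell$.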
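For the additive case (L = r, each $x_\ell$ a single component of x), a series basis can be formed from univariate terms in each component. The sketch below is only an illustration under that assumption, using raw powers up to a hypothetical order J; interaction terms for subvectors $x_\ell$ of dimension greater than one would add products of powers within each subvector. The target is noise-free and lies in the span of the basis, so least squares recovers it exactly.

```python
import numpy as np


def additive_power_basis(X, J):
    """Series terms for an additive projection (each x_l a single column of X).

    Columns: one shared constant, then x_l, x_l**2, ..., x_l**J for each l.
    A single constant is used because separate constants for each component
    would be perfectly collinear.
    """
    X = np.asarray(X, dtype=float)
    n, r = X.shape
    cols = [np.ones(n)]
    for l in range(r):
        for j in range(1, J + 1):
            cols.append(X[:, l] ** j)
    return np.column_stack(cols)  # K = 1 + r * J columns


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = X[:, 0] ** 2 + X[:, 1] ** 3  # noise-free additive target

P = additive_power_basis(X, J=3)
coef, *_ = np.linalg.lstsq(P, y, rcond=None)
fitted = P @ coef
```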
are satisfied, then

$$\int [\hat g(x) - g_0(x)]^2 \, dF(x) = O_p(K/n + K^{-2\alpha/\bar a}).$$
The integrated mean square error result for splines given here has previously been derived by Stone (1990). The rest of this result is new, although Andrews and Whang (1990) give the same conclusion for the sample mean square error of power series under different hypotheses.

An implication of Theorem 4.1 is that power series and splines will have an optimal integrated mean-square convergence rate if the number of terms K is chosen randomly between certain bounds. Let $\gamma = \bar a/(2\alpha + \bar a)$. If there are constants $0 < c \le C$ such that $c n^{\gamma} \le K \le C n^{\gamma}$ and K satisfies Assumption 4.2, then $K/n$ and $K^{-2\alpha/\bar a}$ are both of order $n^{-2\alpha/(2\alpha + \bar a)}$, so that the mean-square convergence rate attains Stone's (1982) bound. For power series, the side condition $\alpha > 3\bar a/2$ is needed to ensure that such a K satisfies Assumption 4.2. A similar side condition is present for the spline version of Stone (1990), but it has the less stringent form of $\alpha > \bar a/2$.
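The rate arithmetic can be sketched as follows. This assumes, as an illustration, that Assumption 4.2 amounts to $K^4/n \to 0$ for power series and $K^2/n \to 0$ for splines; that reading is inferred from the analogous conditions in Assumption 5.1 and is consistent with the side conditions $\alpha > 3\bar a/2$ and $\alpha > \bar a/2$, but it is not stated in this section. The function names are hypothetical.

```python
from fractions import Fraction


def k_exponent(alpha, a_bar):
    """gamma = a_bar / (2 alpha + a_bar), so that K ~ n**gamma."""
    return Fraction(a_bar) / (2 * Fraction(alpha) + Fraction(a_bar))


def mse_exponent(alpha, a_bar):
    """Exponent of n in K/n + K**(-2 alpha / a_bar) when K ~ n**gamma."""
    gamma = k_exponent(alpha, a_bar)
    # K/n ~ n**(gamma - 1) and K**(-2 alpha / a_bar) ~ n**(-2 alpha gamma / a_bar);
    # both equal n**(-2 alpha / (2 alpha + a_bar)), so the two terms balance.
    assert gamma - 1 == -2 * Fraction(alpha) * gamma / Fraction(a_bar)
    return gamma - 1


def side_condition(alpha, a_bar, power):
    """True if K**power / n -> 0 when K ~ n**gamma, i.e. power * gamma < 1.

    Assumed: power = 4 for power series, power = 2 for splines.
    """
    return power * k_exponent(alpha, a_bar) < 1
```

Exact rational arithmetic avoids floating-point ambiguity at the boundary cases $\alpha = 3\bar a/2$ and $\alpha = \bar a/2$, where the condition just fails.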
Theorem 3.2 can be specialized to obtain uniform convergence rates for power series and spline estimators.

Theorem 4.2: If Assumptions 3.1 and 4.1 - 4.3 are satisfied, then for power series

$$|\hat g - g_0|_0 = O_p\big(K[(K/n)^{1/2} + K^{-\alpha/\bar a}]\big),$$

and for regression splines,

$$|\hat g - g_0|_0 = O_p\big(K^{1/2}[(K/n)^{1/2} + K^{-\alpha/\bar a}]\big).$$
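The two bounds in Theorem 4.2 differ only in the leading factor, K for power series and $K^{1/2}$ for splines. A small hypothetical helper, with constants omitted, makes the comparison concrete for illustrative values n = 10,000, K = 16, $\alpha = 2$, $\bar a = 1$.

```python
import math


def uniform_rate(K, n, alpha, a_bar, basis):
    """Order of the Theorem 4.2 bound on sup_x |g_hat(x) - g_0(x)|, constants omitted.

    basis is "power" (leading factor K) or "spline" (leading factor K**0.5).
    """
    inner = math.sqrt(K / n) + K ** (-alpha / a_bar)
    if basis == "power":
        return K * inner
    if basis == "spline":
        return math.sqrt(K) * inner
    raise ValueError("basis must be 'power' or 'spline'")


power = uniform_rate(16, 10_000, alpha=2, a_bar=1, basis="power")
spline = uniform_rate(16, 10_000, alpha=2, a_bar=1, basis="spline")
```

With these values the power series bound is $K^{1/2} = 4$ times the spline bound.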
Obtaining uniform convergence rates for derivatives is more difficult, because approximation rates are difficult to find in the literature. When the argument of each function is only one dimensional, an approximation rate follows by a simple integration argument (e.g. see Lemma A.12 in the Appendix). This approach leads to the following convergence rate for the one-dimensional (i.e. additive model) case.

Theorem 4.3: If Assumptions 3.1 and 4.1-4.3 are satisfied, each $x_\ell$ is one-dimensional, and $d < \alpha$, then for power series

$$|\hat g - g_0|_d = O_p\big(K^{1+2d}\{[K/n]^{1/2} + K^{-\alpha+d}\}\big),$$

and for splines

$$|\hat g - g_0|_d = O_p\big(K^{(1/2)+d}\{[K/n]^{1/2} + K^{-\alpha+d}\}\big).$$

In the case of power series, it is possible to obtain an approximation rate by a Taylor expansion argument when the derivatives do not grow too fast with their order. The rate is faster than any power of K, leading to the following result.
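To see why a Taylor argument can give a rate faster than any power of K, consider a one-dimensional example chosen here for illustration (it is not from the paper): $f(x) = \sin x$ on $[-1,1]$, whose derivatives of every order are bounded by $1 = C^{|\lambda|}$ with C = 1. The Taylor polynomial of order m uses K = m + 1 power series terms, and the Lagrange remainder bounds the sup-norm error by $1/(m+1)!$, so even $K^5$ times the error keeps falling as K grows.

```python
import math

import numpy as np


def taylor_sin(x, m):
    """Taylor polynomial of sin around 0 up to order m (K = m + 1 power terms)."""
    total = np.zeros_like(x, dtype=float)
    for k in range(1, m + 1, 2):  # even-order derivatives of sin vanish at 0
        total += (-1) ** ((k - 1) // 2) * x ** k / math.factorial(k)
    return total


x = np.linspace(-1.0, 1.0, 2001)
errors = {}
for m in (3, 7, 11, 15):
    K = m + 1
    errors[K] = float(np.max(np.abs(np.sin(x) - taylor_sin(x, m))))
    # Lagrange remainder: |sin^(m+1)| <= 1 on [-1, 1], so the error is <= 1/(m+1)!
    assert errors[K] <= 1.0 / math.factorial(m + 1) + 1e-12

# K**5 * error keeps shrinking, consistent with a rate faster than K**(-5).
scaled = [K ** 5 * errors[K] for K in sorted(errors)]
```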
Theorem 4.4: If Assumptions 3.1 and 4.1-4.3 are satisfied, $P^K(x)$ is a power series, and there is a constant C such that for each multi-index $\lambda$, the $\lambda$th partial derivative of each additive component of $g_0(x)$ exists and is bounded by $C^{|\lambda|}$, then for any positive integers a and d,

$$|\hat g - g_0|_d = O_p\big(K^{1+2d}\{[K/n]^{1/2} + K^{-a}\}\big).$$
The uniform convergence rates are not optimal in the sense of Stone (1982), but they improve on existing results. For the one regressor, power series case, Theorem 4.2 improves on Cox's (1988) rate. For the other cases there do not seem to be any existing results in the literature, so that Theorems 4.2 - 4.4 give the only uniform convergence rates available. It would be interesting to obtain further improvements on these results, and to investigate the possibility of attaining optimal uniform convergence rates for series estimators of additive interactive models.
5. Covariate Interactive Projections

Estimation of random coefficient projections provides a second example of how the general results of Section 3 can be applied to specific estimators. This Section gives convergence rates for power series and regression spline estimators of projections on the set $\mathcal{G}$ described in equation (2.3). For simplicity, results will be restricted to mean-square and uniform convergence rates for the function, but not for its derivatives. Also, the sets of coefficient functions in equation (2.3) will each be taken equal to the set of all functions of u with finite mean-square. Convergence rates can be derived under the following analog to the conditions of Section 4.
Assumption 5.1: i) u is continuously distributed with a support that is a cartesian product of compact intervals, and a bounded density that is also bounded away from zero on its support; ii) $P^K(x) = w \otimes p^{K/L}(u)$, with K restricted to be a multiple of L, where either a) $p^K(u)$ is a power series and $K^4/n \to 0$, or b) $p^K(u)$ are splines, the support of u is $[-1,1]^{r}$, and $K^2/n \to 0$; iii) each of the coefficient functions is continuously differentiable of order $\alpha$ on the support of u; iv) w is bounded, and $E[ww'|u]$ has smallest eigenvalue that is bounded away from zero on the support of u.
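Condition ii) builds the series terms as a Kronecker product of the covariates w with a basis in u. A minimal sketch, assuming the random-coefficient form $g(x) = \sum_k w_k h_k(u)$ (equation (2.3) itself is not reproduced in this section), L = 2 with $w = (1, w_1)'$, and a power series in a scalar u with J = K/L terms. All names are hypothetical, and the target is noise-free so the fit is exact.

```python
import numpy as np


def interactive_basis(W, u, J):
    """Row i equals kron(W[i], [1, u_i, ..., u_i**(J-1)]), giving K = L * J columns."""
    W = np.asarray(W, dtype=float)
    p = np.vander(np.asarray(u, dtype=float), N=J, increasing=True)
    # (n, L) and (n, J) -> (n, L, J) -> (n, L * J), matching the Kronecker order.
    return np.einsum("il,ij->ilj", W, p).reshape(W.shape[0], -1)


rng = np.random.default_rng(2)
n = 300
u = rng.uniform(-1.0, 1.0, n)
W = np.column_stack([np.ones(n), rng.uniform(0.5, 1.5, n)])  # L = 2, w bounded

# Coefficient functions h_1(u) = 1 + u**2 and h_2(u) = u - u**3.
y = (1 + u ** 2) + W[:, 1] * (u - u ** 3)

P = interactive_basis(W, u, J=4)  # K = 8, a multiple of L = 2
coef, *_ = np.linalg.lstsq(P, y, rcond=None)
fitted = P @ coef
```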
These conditions lead to the following result on mean-square convergence.
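Theorem 5.1: If Assumption 5.1 is satisfied, then

$$\sum_{i=1}^{n} [\hat g(x_i) - g_0(x_i)]^2/n = O_p(K/n + K^{-2\alpha/r}).$$

Also, if there is a constant $c > 1$ such that for each $\ell$ and every nonnegative function $a(x)$, $\int a(x)\, d[F(x_\ell)F(x_{-\ell})] \le c\int a(x)\, dF(x)$, where $x_{-\ell}$ denotes the components of x not in $x_\ell$, then $\{\sum_{\ell=1}^{L} h_\ell(x_\ell) : E[h_\ell(x_\ell)^2] < \infty,\ \ell = 1, \ldots, L\}$ is closed in mean-square.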
Proof: Let $\mathcal{H} = \{\sum_{\ell=1}^{L} h_\ell(x_\ell)\}$ and $\|a\| = [\int a(x)^2 dF(x)]^{1/2}$. By Proposition 2 of Section 4 of the Appendix of Bickel, Klaassen, Ritov, and Wellner (1993), $\mathcal{H}$ is closed if and only if there is a constant C such that for each $h \in \mathcal{H}$ there is a decomposition $h = \sum_\ell h_\ell$ with $\max_\ell \|h_\ell\| \le C\|h\|$ (note that the $h_\ell$ need not be unique). Following Stone (1990, Lemma 1), suppose that the maximal dimension of the $x_\ell$ is r, and suppose that this property holds whenever the maximal dimension of the $x_\ell$ is r-1.
Lemma A.13: For power series, if the support of x is star-shaped, f(x) is continuously differentiable of all orders, and there is a constant C such that for all multi-indices $\lambda$, $\sup_x |\partial^\lambda f(x)| \le C^{|\lambda|}$, then for any $a > 0$ and d there are a constant $\tilde C$ and vectors $\pi_K$ such that $\max_{|\lambda| \le d} \sup_x |\partial^\lambda f(x) - \partial^\lambda p^K(x)'\pi_K| \le \tilde C K^{-a}$.

Proof: Let $P(f,m,x)$ denote the Taylor series of f(x) up to order m for an expansion around 0. Then $\partial P(f,m,x)/\partial x_j = P(\partial f/\partial x_j, m-1, x)$, so that by induction $\partial^\lambda P(f,m,x) = P(\partial^\lambda f, m-|\lambda|, x)$. Next, let m(K) be the largest integer such that there is $\pi_K$ with $p^K(x)'\pi_K = P(f, m(K), x)$. Since the support is star-shaped, by the intermediate value form of the remainder,

$$\max_{|\lambda| \le d} \sup_x |\partial^\lambda f(x) - \partial^\lambda P(f, m(K), x)| \le C^{m(K)+1}/[(m(K)-d)!] \le \tilde C K^{-a},$$

since m(K) grows with K and the factorial eventually dominates any power of K. QED.

Lemma A.14: For splines, if the support of x is a compact box, f(x) is continuously differentiable of order $\ell$, and $p^K(x)$ is a spanning vector for splines of degree m with knot spacing bounded by $CK^{-1/r}$, then there are C > 0 and $\pi_K$ such that $\sup_x |f(x) - p^K(x)'\pi_K| \le CK^{-\alpha}$ with $\alpha = \ell/r$. Also, if r = 1, then for all $d \le m-1$, $\sup_x |\partial^d f(x) - \partial^d p^K(x)'\pi_K| \le CK^{-\alpha}$ with $\alpha = \ell - d$.

Proof: For d = 0 the conclusion follows by Theorem 12.8 of Schumaker (1981). For the other case, the conclusion follows by Powell (1981). QED.
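The Taylor series identity $\partial P(f,m,x)/\partial x_j = P(\partial f/\partial x_j, m-1, x)$ used in the power series argument above can be checked numerically in one dimension. The sketch assumes a hypothetical $f(x) = \exp(2x)$, whose derivatives at 0 are $f^{(k)}(0) = 2^k$, and compares the derivative of its order-m Taylor polynomial with the order-(m-1) Taylor polynomial of $f'$.

```python
import math

import numpy as np
from numpy.polynomial import Polynomial


def taylor_poly(derivs_at_0, m):
    """P(f, m, x): Taylor polynomial of order m around 0, given f^(k)(0) for k = 0..m."""
    return Polynomial([derivs_at_0[k] / math.factorial(k) for k in range(m + 1)])


m = 6
f_derivs = [2.0 ** k for k in range(m + 1)]         # f(x) = exp(2x)
fprime_derivs = [2.0 ** (k + 1) for k in range(m)]  # f'(x) = 2 exp(2x)

lhs = taylor_poly(f_derivs, m).deriv()   # d/dx P(f, m, x)
rhs = taylor_poly(fprime_derivs, m - 1)  # P(f', m - 1, x)
gap = np.max(np.abs(lhs.coef - rhs.coef))
```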
Next, a well known property of B-splines (p. 1587) is that, when the B-splines are suitably normalized, the smallest eigenvalue of $\int P_{\ell,L}(x)P_{\ell,L}(x)'dx$ is bounded away from zero. Therefore, since the density of x is bounded away from zero, the smallest eigenvalue of $\int P^K(x)P^K(x)'dF(x)$ is also bounded away from zero. Also, since changing even knot spacing is equivalent to rescaling the argument of B-splines, $\sup_x |d^k B(x)/dx^k| \le CK^k$ for each B-spline B(x) with knot spacing of order 1/K.
Proof of Theorem 3.1: Let $\Sigma = \int P^K(x)P^K(x)'dF(x)$, let $\hat\pi$ be the least squares coefficients, so that $\hat g(x) = P^K(x)'\hat\pi$, and let $\pi$ be the coefficient vector from Assumption 3.4, so that $\sup_x |g_0(x) - P^K(x)'\pi| = O(K^{-\alpha})$. By Assumption 3.2 and Lemmas A.7 and A.8, $(\hat\pi - \pi)'\Sigma(\hat\pi - \pi) = O_p(K/n + K^{-2\alpha})$. Also,

$$\sum_{i=1}^{n} [g_0(x_i) - P^K(x_i)'\pi]^2/n \le \sup_x |g_0(x) - P^K(x)'\pi|^2 = O(K^{-2\alpha}),$$

$$\int [g_0(x) - P^K(x)'\pi]^2 dF(x) \le \sup_x |g_0(x) - P^K(x)'\pi|^2 = O(K^{-2\alpha}).$$

Then, since $(a+b)^2 \le 2a^2 + 2b^2$,

$$\int [g_0(x) - \hat g(x)]^2 dF(x) \le 2\int [g_0(x) - P^K(x)'\pi]^2 dF(x) + 2(\hat\pi - \pi)'\Sigma(\hat\pi - \pi) = O_p(K/n + K^{-2\alpha}).$$

QED.

Proof of Theorem 3.2: Let $\pi$ be as above, so that $|g_0 - P^{K\prime}\pi|_d = O(K^{-\alpha})$, and note that $|P^{K\prime}(\hat\pi - \pi)|_d \le \zeta_d(K)\|\hat\pi - \pi\|$. Then by the second conclusion of Lemma A.8 and the triangle inequality,

$$|\hat g - g_0|_d \le |g_0 - P^{K\prime}\pi|_d + |P^{K\prime}(\hat\pi - \pi)|_d = O(K^{-\alpha}) + \zeta_d(K)\, O_p\big((K/n)^{1/2} + K^{-\alpha}\big), \qquad (A.4)$$

implying the bounds on derivatives given in the conclusion. QED.
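The key step in the proof of Theorem 3.1 combines $(a+b)^2 \le 2a^2 + 2b^2$ with the identity $\int [P^K(x)'(\hat\pi - \pi)]^2 dF(x) = (\hat\pi - \pi)'\Sigma(\hat\pi - \pi)$, where $\Sigma = \int P^K(x)P^K(x)'dF(x)$. The sketch below checks both numerically, taking F to be a hypothetical discrete distribution (an evaluation grid with equal weights) and $g_0(x) = e^x$. Because the inequality holds for any coefficient vector, a perturbed vector stands in for the estimate rather than an actual least squares fit.

```python
import numpy as np

rng = np.random.default_rng(3)
K = 5
x_grid = np.linspace(-1.0, 1.0, 401)          # support points of a discrete F
weights = np.full(x_grid.size, 1.0 / x_grid.size)
P = np.vander(x_grid, N=K, increasing=True)
g0 = np.exp(x_grid)                            # hypothetical projection g_0

Sigma = P.T @ (weights[:, None] * P)           # sum over x of w(x) P(x) P(x)'
pi = np.linalg.lstsq(P, g0, rcond=None)[0]     # an approximating coefficient vector
pi_hat = pi + 0.1 * rng.standard_normal(K)     # stand-in for the estimate

g_hat = P @ pi_hat
lhs = np.sum(weights * (g0 - g_hat) ** 2)
approx_term = np.sum(weights * (g0 - P @ pi) ** 2)
quad_form = (pi_hat - pi) @ Sigma @ (pi_hat - pi)
rhs = 2 * approx_term + 2 * quad_form
```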
Proof of Theorem 4.1: By Assumption 4.1 and Lemma A.3 there exists a representation $g_0(x) = \sum_{\ell} g_{\ell}(x_\ell)$ to which the approximation results apply. Then by Lemmas A.12 and A.14, Assumption 3.4 is satisfied with $\alpha$ replaced by $\alpha/\bar a$. Also, by Lemmas A.15 and A.16, Assumption 3.2 and equation (3.1) are satisfied, and Assumption 4.2 implies that Assumption 3.3 holds. Then the conclusion follows by Theorem 3.1 with $\zeta_0(K) = K$ for power series and $\zeta_0(K) = K^{1/2}$ for splines. QED.
Proof of Theorem 4.2: It follows as in the proof of Theorem 4.1 that Assumptions 3.1 - 3.4 are satisfied, with $\zeta_0(K) = K$ for power series and $\zeta_0(K) = K^{1/2}$ for splines. The conclusion then follows by Theorem 3.2. QED.
Proof of Theorem 4.3: Follows as in the proof of Theorem 4.2, except that Assumption 3.4 is now satisfied with $\alpha$ replaced by $\alpha - d$, by Lemmas A.12 and A.14, and Assumption 3.2 and equation (3.1) are now satisfied with $\zeta_d(K) = K^{1+2d}$ for power series, by Lemma A.15, and $\zeta_d(K) = K^{(1/2)+d}$ for splines, by Lemma A.16. QED.

Proof of Theorem 4.4: Follows as in the proof of Theorem 4.3, except that Lemma A.13 is applied to show that Assumption 3.4 holds for any $\alpha > 0$. QED.

Proof of Theorem 5.1: The proof is similar to that of Theorems 4.1 and 4.2, using the boundedness of w. QED.