Testing linearity of regression models with dependent errors by kernel based methods

Stefanie Biedermann
Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany
email: [email protected]

Holger Dette
Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany
email: [email protected]
FAX: +49 2 34 3214 559, Tel.: +49 2 34 322 82 84

July 20, 2000

Abstract
In a recent paper Gonzalez Manteiga and Vilar Fernandez (1995) considered the problem of testing linearity of a regression under MA($\infty$) structure of the errors using a weighted $L_2$-distance between a parametric and a nonparametric fit. They established asymptotic normality of the corresponding test statistic under the hypothesis and under local alternatives. In the present paper we extend these results and establish asymptotic normality of the statistic under fixed alternatives. This result is then used to prove that the optimal (with respect to uniform maximization of power) weight function in the test of Gonzalez Manteiga and Vilar Fernandez (1995) is given by the Lebesgue measure, independently of the design density. The paper also discusses several extensions of tests proposed by Azzalini and Bowman (1993), Zheng (1996) and Dette (1999) to the case of non-independent errors and compares these methods with the method of Gonzalez Manteiga and Vilar Fernandez (1995). It is demonstrated that among the kernel based methods the approach of the latter authors is the most efficient from an asymptotic point of view.
Keywords: test of linearity, nonparametric regression, moving average process, optimal weighted least squares, asymptotic relative efficiency
1 Introduction

Consider the common nonparametric regression model

(1.1)  $Y_i = m(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n,$

where $m$ denotes the regression function, $x_i$ the $i$th explanatory variable varying in the interval $[0,1]$, and the $\varepsilon_i$ form a triangular array of random errors with zero mean. It is an important question in applied statistics whether a linear model describes the data adequately, i.e.

(1.2)  $H_0: \; m(x) = \sum_{j=1}^{p} \vartheta_j g_j(x) \qquad \forall\, x \in [0,1],$
where $g_1, \ldots, g_p$ are given linearly independent functions and $\vartheta = (\vartheta_1, \ldots, \vartheta_p)^T \in \mathbb{R}^p$ denotes the vector of parameters. Linear models are attractive among practitioners because they describe the relation between the response and the predictor in a concise way. Much effort has been devoted to the problem of checking linearity in the recent literature, because misspecification of a linear model may lead to serious errors in the subsequent data analysis. For some recent literature we refer to Eubank and Hart (1992), Azzalini and Bowman (1993), Brodeau (1993), Stute, Gonzalez Manteiga and Presedo Quindimil (1998), Dette and Munk (1998), Alcala, Christobal and Gonzalez Manteiga (1999) or Dette (1999). While most authors consider the case of independent innovations, much less progress has been made on the problem of checking linearity in the case of dependent errors. Recently, Gonzalez Manteiga and Vilar Fernandez (1995) studied the problem of testing the lack of fit of a parametric regression under an MA($\infty$) structure of the errors by considering the weighted $L_2$-distance

(1.3)  $\hat{T}_n^{(1)} = \frac{1}{n} \sum_{i=1}^{n} \Big\{ \hat{m}_n(x_i) - \sum_{j=1}^{p} \hat\vartheta_j g_j(x_i) \Big\}^2 w(x_i),$
where $w$ denotes a (known) weight function, $\hat\vartheta_n = (\hat\vartheta_1, \ldots, \hat\vartheta_p)^T$ the weighted (with respect to $w$) LSE in the assumed linear regression, and $\hat{m}_n$ is the nonparametric curve estimator of Gasser and Müller (1979). Note that originally a smoothed version of the parametric fit was considered in $T_n^{(1)}$ in order to avoid problems with the bias [see also Härdle and Mammen (1993)]. The differences between the two statistics are minor and will be explained at the end of Section 2. Therefore we will also refer to $T_n^{(1)}$ as the statistic introduced by Gonzalez Manteiga and Vilar Fernandez (1995). The statistic (1.3) defines an empirical distance between a parametric and a nonparametric estimate of the regression, and the null hypothesis (1.2) is rejected for large values of $\hat{T}_n^{(1)}$. Gonzalez Manteiga and Vilar Fernandez (1995) proved asymptotic normality of $\hat{T}_n^{(1)}$ under the hypothesis of linearity and under local alternatives, and as a consequence they obtained the consistency of this procedure. A bootstrap version of this test was examined by means of a simulation study in Vilar Fernandez and Gonzalez Manteiga (1996). In this paper we are interested in the asymptotic behaviour of the statistic $\hat{T}_n^{(1)}$ (and of several related tests) under fixed alternatives. These results are important for at least two reasons. On the one hand, we obtain estimates of the type II error, which are of particular interest if the hypothesis of linearity is not rejected. On the other hand, we will demonstrate below that these results can be used for the determination of an optimal weight function $w$ in the statistic $\hat{T}_n^{(1)}$ such that the (asymptotic) power at any fixed alternative becomes maximal. The paper is organized as follows. In Section 2 we introduce the necessary notation and establish asymptotic normality of $\hat{T}_n^{(1)}$ under fixed alternatives. This result is used to prove that the uniform weight function maximizes the (asymptotic) power of the corresponding test under any fixed alternative and that this property does not depend on the underlying design density. Section 3 discusses generalizations of the tests of Azzalini and Bowman (1993), Zheng (1996) and Dette (1999) to the case of errors with MA($\infty$) structure and compares the different methods from a local asymptotic point of view. In particular, it is shown that from an asymptotic viewpoint the approach of Gonzalez Manteiga and Vilar Fernandez (1995) yields the most efficient procedure for testing linearity under MA($\infty$) structure of the errors. Finally, some of the proofs are given in Section 4.
2 The statistic $T_n^{(1)}$ and its asymptotic distribution under fixed alternatives

Throughout this paper we consider the regression model (1.1) with a fixed design given by

(2.1)  $\int_0^{x_i} f(t)\,dt = \frac{i}{n},$

where $f$ is a positive density on the interval $[0,1]$ [see Sacks and Ylvisaker (1970)]. We also assume that

$M^2(\vartheta) = \int_0^1 \big( m(x) - \vartheta^T g(x) \big)^2 f(x)\, w(x)\,dx$

is minimal at a unique point $\vartheta_0 \in \Theta$, where $\Theta$ denotes the interior of a compact subset of $\mathbb{R}^p$ (note that $M^2(\vartheta_0) = 0$ if and only if the hypothesis of linearity is valid). In the general regression model we use the nonparametric curve estimate of Gasser and Müller (1979),
(2.2)  $\hat{m}_n(x) = \frac{1}{h} \sum_{j=1}^{n} Y_j \int_{s_{j-1}}^{s_j} K\Big( \frac{x-s}{h} \Big)\,ds,$

where $s_0 = 0$, $s_n = 1$, $s_{j-1} \le x_j \le s_j$ $(j = 2, \ldots, n)$, $h$ is the bandwidth and $K$ is a symmetric kernel with compact support, say $[-1,1]$. For the asymptotic analysis of the statistic $T_n^{(1)}$ in (1.3) we require the following basic assumptions [see also Gonzalez Manteiga and Vilar Fernandez (1995)]. The design density, the regression, the weight and the kernel function are assumed to be sufficiently smooth, that is
(2.3)  $g_1, \ldots, g_p,\; w,\; f,\; m \in C^{(r)}[0,1], \qquad K \in C^{(2)}[-1,1],$

where $r \ge 2$ and $C^{(p)}[0,1]$ denotes the set of $p$-times continuously differentiable functions. Throughout this paper

(2.4)  $U_p = \mathrm{span}\{ g_1, \ldots, g_p \}$

denotes the linear subspace spanned by the linearly independent regression functions $g_1, \ldots, g_p$, and obviously the null hypothesis (1.2) is valid if and only if

$m \in U_p.$
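For illustration, the following is a minimal numerical sketch of the Gasser-Müller estimator (2.2). It is not part of the original analysis; the Epanechnikov kernel, the midpoint choice of the $s_j$, and the example curve are illustrative assumptions only.

```python
# Minimal sketch of the Gasser-Mueller estimator (2.2); all concrete choices
# (Epanechnikov kernel, midpoint partition, test curve) are illustrative.
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def gasser_mueller(x_eval, x_design, y, h):
    """m_hat(x) = (1/h) * sum_j Y_j * int_{s_{j-1}}^{s_j} K((x-s)/h) ds,
    with s_0 = 0, s_n = 1 and interior s_j taken as design midpoints."""
    n = len(x_design)
    s = np.concatenate(([0.0], (x_design[1:] + x_design[:-1]) / 2.0, [1.0]))
    est = np.zeros_like(x_eval)
    for j in range(n):
        grid = np.linspace(s[j], s[j + 1], 25)   # quadrature nodes in [s_{j-1}, s_j]
        kern = epanechnikov((x_eval[:, None] - grid[None, :]) / h)
        est += y[j] * kern.mean(axis=1) * (s[j + 1] - s[j]) / h
    return est

rng = np.random.default_rng(0)
n, h = 200, 0.1
x_design = (np.arange(1, n + 1) - 0.5) / n       # uniform fixed design, f = 1
y = np.sin(2 * np.pi * x_design) + 0.3 * rng.standard_normal(n)
print(gasser_mueller(np.linspace(0.05, 0.95, 5), x_design, y, h))
```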
The errors $\varepsilon_i$ are generated by a stationary causal process

$\varepsilon_i = \sum_{j=0}^{\infty} b_j e_{i-j},$

where $\{e_i\}$ is a sequence of independent identically distributed random variables with zero mean, zero kurtosis and $\sigma_e^2 = \mathrm{Var}(e_1) < \infty$, such that

(2.5)  $E\big[ |e_i|^{4+2\delta} \big] < \infty$

(for some $\delta > 0$), and the autocovariance function $\gamma(k) = E[\varepsilon_1 \varepsilon_{1+k}] = \sigma_e^2 \sum_{j=0}^{\infty} b_j b_{j+k}$ is absolutely summable and additionally

(2.6)  $\sum_{s=-\infty}^{\infty} |s|\, |\gamma(s)| < \infty.$
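A small sketch (again ours, not the authors') simulates errors of this type by truncating the MA($\infty$) expansion and checks the autocovariance formula for $\gamma(k)$; the geometric coefficients $b_j = 2^{-j}$ are an arbitrary illustrative choice.

```python
# Sketch: truncated MA(infinity) errors eps_i = sum_j b_j e_{i-j} and the
# implied autocovariance gamma(k) = sigma_e^2 * sum_j b_j b_{j+k} (sigma_e^2 = 1).
import numpy as np

def ma_errors(n, b, rng):
    """Generate n observations of an MA(q-1) approximation, q = len(b)."""
    e = rng.standard_normal(n + len(b) - 1)   # i.i.d. innovations
    return np.convolve(e, b, mode='valid')    # length n

def autocov(b, k):
    return sum(b[j] * b[j + k] for j in range(len(b) - k))

rng = np.random.default_rng(1)
b = 0.5 ** np.arange(20)                      # geometrically decaying b_j
eps = ma_errors(5000, b, rng)
print("empirical gamma(1):  ", np.mean(eps[:-1] * eps[1:]))
print("theoretical gamma(1):", autocov(b, 1))
```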
Finally, we assume that the bandwidth in (2.2) satisfies

(2.7)  $n h^{3/2} \to \infty, \qquad h\, n^{\beta} \to 0, \qquad \beta = \frac{2+\delta}{2\delta+2},$

and that the weight function has support contained in the interval $[0,1]$. The following theorem (part b) specifies the asymptotic distribution of the statistic $T_n^{(1)}$ introduced by Gonzalez Manteiga and Vilar Fernandez (1995) under fixed alternatives. Because there is a term missing in the asymptotic bias under the hypothesis of linearity given by the last-named authors, we also restate it here (part a).
Theorem 2.1. Assume that (2.1), (2.3)-(2.7) are satisfied and $n \to \infty$.
(a) Under the hypothesis of linearity we have

(2.8)  $n\sqrt{h}\, \Big( T_n^{(1)} - \frac{B_1}{nh} \Big) \xrightarrow{\;\mathcal{D}\;} N(0, \sigma_0^2),$

where the asymptotic variance and bias are given by

(2.9)  $\sigma_0^2 = 2 \Big( \sum_{s=-\infty}^{\infty} \gamma(s) \Big)^2 \int_{-2}^{2} (K * K)^2(z)\,dz \int_0^1 w^2(x)\,dx,$

(2.10)  $B_1 = \sum_{s=-\infty}^{\infty} \gamma(s) \int_{-1}^{1} K^2(z)\,dz \int_0^1 w(x)\,dx,$

respectively, and $K * K$ denotes the convolution of $K$ with itself.
(b) Under a fixed alternative $m \notin U_p = \mathrm{span}\{g_1, \ldots, g_p\}$ we have
(2.11)  $\sqrt{n}\, \Big( T_n^{(1)} - M_1^2 - \frac{B_1}{nh} \Big) \xrightarrow{\;\mathcal{D}\;} N(0, \sigma_1^2),$

where the asymptotic bias and variance are given by

(2.12)  $M_1^2 = \int_0^1 w(x)\, \Delta^2(x) f(x)\,dx,$

(2.13)  $\sigma_1^2 = 4 \sum_{s=-\infty}^{\infty} \gamma(s) \int_0^1 w^2(x)\, \Delta^2(x) f(x)\,dx,$

$\Delta = m - P_{U_p} m$, and $P_{U_p}$ denotes the orthogonal projection onto $U_p$ with respect to the inner product $\langle q_1, q_2 \rangle = \int_0^1 q_1(x) q_2(x) w(x) f(x)\,dx.$
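As a rough illustration of how $T_n^{(1)}$ from (1.3) is assembled in practice, the following sketch computes the statistic with uniform weight $w \equiv 1$ and a straight-line null model. It is ours, not the authors', and it reuses `gasser_mueller`, `x_design`, `y` and `h` from the first sketch above.

```python
# Sketch of T_n^(1) from (1.3) with w = 1, reusing gasser_mueller from above.
import numpy as np

def t1_statistic(x, y, h, basis):
    G = np.column_stack([g(x) for g in basis])       # design matrix of the null model
    theta = np.linalg.lstsq(G, y, rcond=None)[0]     # LSE (uniform weight)
    mhat = gasser_mueller(x, x, y, h)                # nonparametric fit at design points
    return np.mean((mhat - G @ theta) ** 2)

basis = [lambda t: np.ones_like(t), lambda t: t]     # H_0: straight lines
print(t1_statistic(x_design, y, h, basis))
```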
It is important to note the different rates of convergence under the null hypothesis and under the alternative in Theorem 2.1. While under the hypothesis of linearity [and under local alternatives converging to the null at a rate $(n\sqrt{h})^{-1/2}$] the variance of $T_n^{(1)}$ is of order $(n^2 h)^{-1}$, it is of order $n^{-1}$ under fixed alternatives. The second part of Theorem 2.1 is particularly useful for the analysis of the type II error of the test which rejects the hypothesis whenever

(2.14)  $n\sqrt{h}\, \Big\{ T_n^{(1)} - \frac{B_1}{nh} \Big\} > \sigma_0\, u_{1-\alpha}$

[$u_{1-\alpha}$ is the $(1-\alpha)$ quantile of the standard normal distribution, and in practice $B_1$ and $\sigma_0^2$ have to be replaced by consistent estimates]. Because the acceptance of the null hypothesis leads to a data analysis adapted to the linear model, this error is often considered as more important than the type I error. By Theorem 2.1 (b) the probability of a rejection is approximately given by

(2.15)  $P(\text{``rejection''}) = P\Big( n\sqrt{h}\, \Big\{ T_n^{(1)} - \frac{B_1}{nh} \Big\} > \sigma_0 u_{1-\alpha} \Big)$
$\qquad = P\Big( \sqrt{n}\, \Big\{ T_n^{(1)} - M_1^2 - \frac{B_1}{nh} \Big\} > \frac{\sigma_0 u_{1-\alpha}}{\sqrt{nh}} - \sqrt{n}\, M_1^2 \Big)$
$\qquad \approx \Phi\Big( \frac{\sqrt{n}\, M_1^2}{\sigma_1} - \frac{\sigma_0 u_{1-\alpha}}{\sigma_1 \sqrt{nh}} \Big) \approx \Phi\Big( \frac{\sqrt{n}\, M_1^2}{\sigma_1} \Big),$

where $\sigma_0^2$, $M_1^2$ and $\sigma_1^2$ are defined in (2.9), (2.12), (2.13), respectively. A further important application of the second part of Theorem 2.1 is given in the following corollary, which identifies an optimal weight function such that the asymptotic power becomes maximal.
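The decision rule (2.14) itself is one line of code; the sketch below (ours) spells it out, taking estimates of $B_1$ and $\sigma_0$ as given inputs, since their construction from (2.9) and (2.10) with estimated autocovariances is not discussed in this paper.

```python
# Sketch of the level-alpha decision rule (2.14); B1_hat and sigma0_hat are
# assumed to be supplied by consistent estimators.
import numpy as np
from scipy.stats import norm

def linearity_test(Tn, n, h, B1_hat, sigma0_hat, alpha=0.05):
    """Reject H_0 when n*sqrt(h)*(Tn - B1_hat/(n*h)) > sigma0_hat * u_{1-alpha}."""
    stat = n * np.sqrt(h) * (Tn - B1_hat / (n * h))
    return stat > sigma0_hat * norm.ppf(1 - alpha), stat
```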
Corollary 2.2. Under the assumptions of Theorem 2.1 the asymptotic power $\Phi(\sqrt{n}\, M_1^2 / \sigma_1)$ of the test (2.14) is maximized for the weight function proportional to the Lebesgue density on the interval $[0,1]$, uniformly with respect to $m \notin U_p$.
Proof. In order to make the dependence of the asymptotic power on the weight function $w$ more explicit, we denote the quantities in (2.12) and (2.13) by $M_1^2(w)$, $\Delta_w$ and $\sigma_1^2(w)$, respectively, and obtain

(2.16)  $\frac{M_1^4(w)}{\sigma_1^2(w)} = \frac{\big( \int_0^1 w(x) \Delta_w^2(x) f(x)\,dx \big)^2}{4 \sum_{s=-\infty}^{\infty} \gamma(s) \int_0^1 w^2(x) \Delta_w^2(x) f(x)\,dx}$
$\qquad = \frac{\big( \int_0^1 w(x) \Delta_w(x) \Delta_\lambda(x) f(x)\,dx \big)^2}{4 \sum_{s=-\infty}^{\infty} \gamma(s) \int_0^1 w^2(x) \Delta_w^2(x) f(x)\,dx}$
$\qquad \le \frac{\int_0^1 \Delta_\lambda^2(x) f(x)\,dx}{4 \sum_{s=-\infty}^{\infty} \gamma(s)} = \frac{M_1^4(\lambda)}{\sigma_1^2(\lambda)},$

where $\lambda$ denotes the Lebesgue density and the inequality follows from Cauchy's inequality applied to the factors $w(x) \Delta_w(x) \sqrt{f(x)}$ and $\sqrt{f(x)}\, \Delta_\lambda(x)$ [the second equality holds because $\Delta_w - \Delta_\lambda \in U_p$ and $\Delta_w$ is orthogonal to $U_p$ with respect to the corresponding inner product]. Discussing equality in (2.16) shows that the optimal weight function has to be constant. Therefore the Lebesgue density (or any multiple) maximizes the asymptotic power independently of the specific alternative. □
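A quick numerical illustration (ours) of Corollary 2.2: discretizing all integrals on a grid, the ratio $M_1^4(w)/\sigma_1^2(w)$, which drives the asymptotic power by (2.15), is indeed largest for the constant weight. The factor $4\sum_s \gamma(s)$ is omitted since it does not depend on $w$; the alternative and null model are arbitrary illustrative choices.

```python
# Numerical check of Corollary 2.2: M1^4(w)/sigma1^2(w) (up to a constant) is
# maximal for constant w; all integrals approximated by grid averages on [0,1].
import numpy as np

x = np.linspace(0, 1, 2001)
f = np.ones_like(x)                          # uniform design density
m = np.sin(2 * np.pi * x)                    # a fixed alternative
G = np.column_stack([np.ones_like(x), x])    # null model: straight lines

def power_ratio(w):
    wf = np.sqrt(w * f)                      # weighted projection w.r.t. w*f
    theta = np.linalg.lstsq(G * wf[:, None], m * wf, rcond=None)[0]
    delta = m - G @ theta                    # Delta_w = m - P_Up m
    M1sq = np.mean(w * delta**2 * f)         # int w Delta^2 f dx
    s1sq = np.mean(w**2 * delta**2 * f)      # w-dependent factor of sigma_1^2
    return M1sq**2 / s1sq

print("constant weight:", power_ratio(np.ones_like(x)))
print("w(x) = 1 + x  :", power_ratio(1 + x))
```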
Remark 2.3. We note that Gonzalez Manteiga and Vilar Fernandez (1995) worked with a modified weighted LSE $\tilde\vartheta_n$ in the definition of $T_n^{(1)}$, which minimizes

$\sum_{i=1}^{n} \{ \hat{m}_n(x_i) - \vartheta^T g(x_i) \}^2 w(x_i).$

Theorem 2.1 and Corollary 2.2 remain valid in this case. Under the null hypothesis of linearity this method avoids a bias of order $O(h^{2r})$ [see also Härdle and Mammen (1993)]. However, under fixed alternatives this bias also appears if the smoothed version of the weighted LSE is used. Because the main interest of this paper is the asymptotic behaviour under fixed alternatives, we worked with the classical weighted LSE and used a sufficiently small bandwidth [see assumption (2.7)] in order to obtain the order $o(1)$ for the corresponding term in the bias of the standardized statistic.
3 Related tests of linearity

In this section we discuss the asymptotic behaviour of several related tests which were recently introduced in the context of independent observations. We begin with a test statistic proposed by Zheng (1996),

(3.1)  $T_n^{(2)} = \frac{1}{n(n-1)h} \sum_{i \ne j} K\Big( \frac{x_i - x_j}{h} \Big) w(x_i) w(x_j)\, \hat\varepsilon_i \hat\varepsilon_j,$

where $\hat\varepsilon_i$ are the residuals formed from a weighted least squares fit, i.e.

(3.2)  $\hat\varepsilon_i = Y_i - \sum_{\ell=1}^{p} g_\ell(x_i)\, \hat\vartheta_\ell$

[note that in contrast to Zheng's (1996) work we introduced a weight function in the definition of $T_n^{(2)}$].
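The following sketch (ours) computes the weighted version (3.1) of Zheng's statistic from the parametric residuals (3.2); the Epanechnikov kernel and the default $w \equiv 1$ are illustrative choices.

```python
# Sketch of the weighted Zheng statistic (3.1) built from residuals (3.2).
import numpy as np

def epanechnikov(u):
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

def t2_statistic(x, y, h, basis, w=None):
    n = len(x)
    wts = np.ones(n) if w is None else w(x)
    G = np.column_stack([g(x) for g in basis])
    resid = y - G @ np.linalg.lstsq(G, y, rcond=None)[0]   # eps_hat from (3.2)
    Kmat = epanechnikov((x[:, None] - x[None, :]) / h)
    np.fill_diagonal(Kmat, 0.0)                            # sum over i != j only
    r = wts * resid
    return (r @ Kmat @ r) / (n * (n - 1) * h)
```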
Theorem 3.1. If the assumptions of Theorem 2.1 are satisfied, we have under the null hypothesis of linearity

$n\sqrt{h}\, \Big( T_n^{(2)} - \frac{B_2}{nh} \Big) \xrightarrow{\;\mathcal{D}\;} N(0, \bar\sigma_0^2),$

where the asymptotic variance and bias are given by

(3.3)  $\bar\sigma_0^2 = 2 \Big( \sum_{s=-\infty}^{\infty} \gamma(s) \Big)^2 \int_{-1}^{1} K^2(z)\,dz \int_0^1 f^2(x)\, w^4(x)\,dx,$

$B_2 = K(0) \sum_{s=-\infty,\, s \ne 0}^{\infty} \gamma(s) \int_0^1 w^2(x) f(x)\,dx.$
Under a fixed alternative we obtain

$\sqrt{n}\, \Big( T_n^{(2)} - M_2^2 - \frac{\tilde B_2}{nh} \Big) \xrightarrow{\;\mathcal{D}\;} N(0, \bar\sigma_1^2),$

where the asymptotic bias and variance are given by

$M_2^2 = \int_0^1 \Delta^2(x)\, f^2(x)\, w^2(x)\,dx,$

$\tilde B_2 = B_2 - K(0) \int_0^1 \Delta^2(x)\, w^2(x) f(x)\,dx,$

(3.4)  $\bar\sigma_1^2 = 4 \sum_{s=-\infty}^{\infty} \gamma(s) \int_0^1 f(x)\, w^2(x) \big\{ (\Delta f w)(x) - P_{U_p}(\Delta f w)(x) \big\}^2\,dx.$
Our next example considers the asymptotic behaviour of the test of Dette (1999), who studied a difference of variance estimators as test statistic, i.e.

$T_n^{(3)} = \hat\sigma^2_{LSE} - \hat\sigma^2_{HM}.$

Here $\hat\sigma^2_{LSE}$ is the weighted least squares estimator of the variance in the linear regression model, and $\hat\sigma^2_{HM}$ is a weighted version of the nonparametric estimator introduced by Hall and Marron (1990), which is defined by

$\hat\sigma^2_{HM} = \frac{1}{\nu} \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{n} w_{ij} Y_j \Big)^2 w(x_i),$

where $\nu = n - 2 \sum_{i=1}^{n} w_{ii} + \sum_{i=1}^{n} \sum_{k=1}^{n} w_{ik}^2$ and

(3.5)  $w_{ij} = \frac{K\big( \frac{x_i - x_j}{h} \big)}{\sum_{l=1}^{n} K\big( \frac{x_i - x_l}{h} \big)}.$
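The estimator $\hat\sigma^2_{HM}$ and the difference statistic $T_n^{(3)}$ translate directly into code. The sketch below is our own illustration for $w \equiv 1$; the normalization $\nu$ follows the display above.

```python
# Sketch of the Hall-Marron variance estimator with weights (3.5) and of
# T_n^(3) = sigma2_LSE - sigma2_HM, for w = 1 and the Epanechnikov kernel.
import numpy as np

def hall_marron_var(x, y, h):
    K = lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1)
    Kmat = K((x[:, None] - x[None, :]) / h)
    W = Kmat / Kmat.sum(axis=1, keepdims=True)        # smoother weights w_ij of (3.5)
    nu = len(x) - 2 * np.trace(W) + np.sum(W**2)      # normalizing constant nu
    return np.sum((y - W @ y) ** 2) / nu

def t3_statistic(x, y, h, basis):
    G = np.column_stack([g(x) for g in basis])
    resid = y - G @ np.linalg.lstsq(G, y, rcond=None)[0]
    sigma2_lse = np.sum(resid ** 2) / (len(x) - G.shape[1])  # variance in linear model
    return sigma2_lse - hall_marron_var(x, y, h)
```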
Theorem 3.2. If the assumptions of Theorem 2.1 are satisfied, we have under the null hypothesis of linearity

$n\sqrt{h}\, \Big( T_n^{(3)} - \frac{B_3}{nh} \Big) \xrightarrow{\;\mathcal{D}\;} N(0, \tau_0^2),$

where the asymptotic bias and variance are given by

(3.6)  $B_3 = \sum_{s=-\infty,\, s \ne 0}^{\infty} \gamma(s) \Big\{ 2K(0) - \int_{-1}^{1} K^2(x)\,dx \Big\} \int_0^1 w(x)\,dx,$

$\tau_0^2 = 2 \Big( \sum_{s=-\infty}^{\infty} \gamma(s) \Big)^2 \int_{-2}^{2} \{ 2K(x) - (K * K)(x) \}^2\,dx \int_0^1 w^2(x)\,dx.$

Under a fixed alternative we obtain

$\sqrt{n}\, \Big( T_n^{(3)} - M_1^2 - \frac{B_3}{nh} \Big) \xrightarrow{\;\mathcal{D}\;} N(0, \tau_1^2),$

where the asymptotic variance is given by

(3.7)  $\tau_1^2 = 4 \sum_{s=-\infty}^{\infty} \gamma(s) \int_0^1 f(x)\, w^2(x)\, \Delta^2(x)\,dx.$
Corollary 3.3. Under the assumptions of Theorem 2.1 the asymptotic power of the test which rejects $H_0$ whenever

$n\sqrt{h}\, \Big( T_n^{(3)} - \frac{B_3}{nh} \Big) > \tau_0\, u_{1-\alpha}$

is maximized for the weight function proportional to the density of the Lebesgue measure, uniformly with respect to $m \notin U_p$.
A very similar statistic was considered by Azzalini and Bowman (1993),

$T_n^{(4)} = \frac{\hat\varepsilon^T \hat\varepsilon - \hat\varepsilon^T M \hat\varepsilon}{\hat\varepsilon^T M \hat\varepsilon},$

where $\hat\varepsilon = \big( \sqrt{w(x_1)}\, \hat\varepsilon_1, \ldots, \sqrt{w(x_n)}\, \hat\varepsilon_n \big)^T$ is the vector of (weighted) residuals formed from a weighted LSE fit, $M = (I_n - W)^T (I_n - W)$, and $W = (w_{ij})_{i,j=1}^{n}$ is the matrix defined by the weights (3.5). Roughly speaking, this statistic is obtained from the statistic $T_n^{(3)}$ by replacing the original observations by residuals from a parametric fit.
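In matrix form the statistic is immediate; the sketch below (ours, with $w \equiv 1$) mirrors the display above.

```python
# Sketch of the Azzalini-Bowman ratio statistic T_n^(4) with w = 1.
import numpy as np

def t4_statistic(x, y, h, basis):
    n = len(x)
    K = lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1)
    Kmat = K((x[:, None] - x[None, :]) / h)
    W = Kmat / Kmat.sum(axis=1, keepdims=True)       # smoother weights (3.5)
    G = np.column_stack([g(x) for g in basis])
    e = y - G @ np.linalg.lstsq(G, y, rcond=None)[0] # residuals from the linear fit
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    return (e @ e - e @ M @ e) / (e @ M @ e)
```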
Theorem 3.4. If the assumptions of Theorem 2.1 are satisfied, we have under the null hypothesis of linearity

$n\sqrt{h}\, \Big( T_n^{(4)} - \frac{B_4}{nh} \Big) \xrightarrow{\;\mathcal{D}\;} N(0, \tau_0^2 / \kappa^2),$

where the asymptotic bias is given by

$B_4 = \sum_{s=-\infty}^{\infty} \gamma(s) \Big\{ 2K(0) - \int_{-1}^{1} K^2(z)\,dz \Big\} \int_0^1 w(x)\,dx,$

$\tau_0^2$ is defined in (3.6) and $\kappa$ is a constant of proportionality given by

$\kappa = \gamma(0) \int_0^1 w(x) f(x)\,dx.$

Under a fixed alternative we obtain

$\sqrt{n}\, \Big( T_n^{(4)} - \frac{M_1^2}{\kappa} - \frac{B_4}{nh} \Big) \xrightarrow{\;\mathcal{D}\;} N(0, \tau_1^2 / \kappa^2),$

where $\tau_1^2$ is defined in (3.7).
Corollary 3.5. Under the assumptions of Theorem 2.1 the asymptotic power of the test which rejects $H_0$ whenever

$n\sqrt{h}\, \Big( T_n^{(4)} - \frac{B_4}{nh} \Big) > u_{1-\alpha}\, \tau_0 / \kappa$

is maximized for the weight function proportional to the density of the Lebesgue measure, uniformly with respect to $m \notin U_p$.
Remark 3.6. Note that we are not able to derive a result similar to Corollaries 3.3 and 3.5 about the optimal weight function for the test proposed by Zheng (1996), because the asymptotic variance under the alternative in Theorem 3.1 is more complicated compared to Theorems 3.2 and 3.4.
We conclude this section with a brief comparison of the different methods based on $T_n^{(1)}, \ldots, T_n^{(4)}$. Calculations similar to those used in the derivation of (2.15) show that the asymptotic power of the test based on $T_n^{(i)}$ is given by

(3.8)  $p_i \approx \Phi\Big( \frac{\sqrt{n}\, M_1^2}{\xi_1^{(i)}} - \frac{u_{1-\alpha}\, \xi_0^{(i)}}{\sqrt{nh}\; \xi_1^{(i)}} \Big), \qquad i = 1, 3, 4,$

where (for $j = 0, 1$)

$\xi_j^{(i)} = \begin{cases} \sigma_j & \text{if } i = 1, \\ \tau_j & \text{if } i = 3, 4, \end{cases}$

and $\sigma_0^2$, $\sigma_1^2$, $\tau_0^2$, $\tau_1^2$ are defined in (2.9), (2.13), (3.6) and (3.7), respectively. The application of the Lebesgue measure as optimal weight function makes the dominating term in (3.8) equal for all methods, namely

(3.9)  $\frac{\sqrt{n}\, M_1^2}{\sigma_1} = \sqrt{n}\, \Big( \frac{\int_0^1 \Delta^2(x) f(x)\,dx}{4 \sum_{s=-\infty}^{\infty} \gamma(s)} \Big)^{1/2}.$
2
R1
2 2
0
1
1
1
1
2 2
0
1
1
1
where y = K (0) (x)w (x)f (x)dx and ; are de ned in (3.3) and (3.4), respectively. The following result shows that the dominating term in (3.10) is smaller than the term in (3.9). Consequently for any weight function a test of linearity based on Tn is (asymptotically) less ecient than procedures based on Tn ; Tn and Tn provided that the Lebesgue measure is used as the optimal weight function in these procedures. 0
2
2
2 0
2 1
(2)
(1)
(3)
(4)
Lemma 3.7. Under the assumptions of Theorem 2.1 it follows that

$4 \sum_{s=-\infty}^{\infty} \gamma(s)\, \frac{M_2^4}{\bar\sigma_1^2} = \frac{\Big( \int_0^1 \Delta^2(x)\, w^2(x) f^2(x)\,dx \Big)^2}{\int_0^1 f(x)\, w^2(x) \big\{ (\Delta f w)(x) - P_{U_p}(\Delta f w)(x) \big\}^2\,dx} \le \int_0^1 \Delta^2(x) f(x)\,dx$

for every weight function $w$ such that the integrals in this inequality exist.
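Lemma 3.7 can also be checked numerically for any particular $w$. The sketch below (ours) discretizes all integrals and forms the projection $P_{U_p}$ with respect to $\langle q_1, q_2 \rangle = \int q_1 q_2\, w f\,dx$, matching Theorem 2.1.

```python
# Numerical check of the inequality in Lemma 3.7 for one non-constant weight.
import numpy as np

x = np.linspace(0, 1, 4001)
f = np.ones_like(x)
m = np.sin(2 * np.pi * x)
G = np.column_stack([np.ones_like(x), x])
w = 1 + x                                    # an arbitrary weight function

def project(target, weight):
    """Projection onto span{1, x} w.r.t. <q1,q2> = int q1 q2 weight dx."""
    sw = np.sqrt(weight)
    theta = np.linalg.lstsq(G * sw[:, None], target * sw, rcond=None)[0]
    return G @ theta

delta = m - project(m, w * f)                # Delta for the w-weighted LSE
lhs_num = np.mean(delta**2 * w**2 * f**2) ** 2
g_fun = delta * f * w
lhs_den = np.mean(f * w**2 * (g_fun - project(g_fun, w * f)) ** 2)
print("lhs:", lhs_num / lhs_den, "<= rhs:", np.mean(delta**2 * f))
```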
It follows from (3.8) that for the remaining procedures the power is maximized by minimizing the asymptotic variance $\xi_0^{(i)}$ under the null hypothesis. Our final result shows that $\xi_0^{(i)}$ becomes minimal for the test of Gonzalez Manteiga and Vilar Fernandez (1995), and consequently this procedure is asymptotically most powerful among the kernel based methods discussed in this paper.
Lemma 3.8. For any square integrable density $K$ we have

$\int (K * K)^2(x)\,dx \le \int K^2(x)\,dx \le \int (2K - K * K)^2(x)\,dx,$

or equivalently $\sigma_0^2 \le \bar\sigma_0^2 \le \tau_0^2$.
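The kernel inequality of Lemma 3.8 is easy to verify numerically; the sketch below (ours) does so for the Epanechnikov kernel, approximating the convolution $K * K$ on a grid.

```python
# Numerical check of Lemma 3.8 for the Epanechnikov kernel.
import numpy as np

dz = 0.001
z = np.arange(-1, 1 + dz, dz)
K = 0.75 * (1 - z**2)
KK = np.convolve(K, K) * dz                  # K*K on [-2, 2], same spacing dz
int_KK2 = np.sum(KK**2) * dz
int_K2 = np.sum(K**2) * dz
K_pad = np.zeros_like(KK)                    # embed K on the [-2, 2] grid
mid = (len(KK) - len(K)) // 2
K_pad[mid:mid + len(K)] = K
int_2K_KK2 = np.sum((2 * K_pad - KK)**2) * dz
print(int_KK2, "<=", int_K2, "<=", int_2K_KK2)
```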
4 Proofs

Because all proofs are similar, we restrict ourselves exemplarily to a proof of Theorem 3.1, for which the asymptotics is slightly more complicated. For the sake of a transparent notation we only consider the case $w = \lambda$ (here $\lambda$ denotes the density of the Lebesgue measure on the interval $[0,1]$). Without loss of generality we assume orthonormality of the regression functions $g_1, \ldots, g_p$ with respect to the density $f$. Introducing the notation $g(x) = (g_1(x), \ldots, g_p(x))^T$, the residuals in (3.2) can be written as

(4.1)  $\hat\varepsilon_i = \varepsilon_i + \Delta(x_i) - g^T(x_i) \{ \hat\vartheta_n - \vartheta_0 \},$

where $\vartheta_0$ is the unique minimizer of $\int_0^1 (m(x) - \vartheta^T g(x))^2 f(x)\,dx$. Our first Lemma specifies the asymptotic behaviour of $\hat\vartheta_n - \vartheta_0$ under the null hypothesis and under fixed alternatives.
Lemma A.1. Under the assumptions of Theorem 2.1, $w \equiv 1$ and orthonormal regression functions we have for any $m \in C^{(r)}[0,1]$

$\sqrt{n}\, (\hat\vartheta_n - \vartheta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} g(x_i)\, \varepsilon_i + o_p(1) \xrightarrow{\;\mathcal{D}\;} N\Big( 0, \Big( \sum_{s=-\infty}^{\infty} \gamma(s) \Big) I_p \Big),$

where $I_p$ denotes the $p \times p$ identity matrix.
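A small simulation (ours) illustrating Lemma A.1: with regressors orthonormal with respect to the uniform design density, $n\,\mathrm{Cov}(\hat\vartheta_n)$ approaches $\big(\sum_s \gamma(s)\big) I_p$, which is approximately $4 I_2$ for the truncated geometric MA coefficients used below.

```python
# Simulation check of Lemma A.1: n * Cov(theta_hat) approx (sum_s gamma(s)) I_p.
# g_1 = 1 and g_2 = sqrt(3)(2x - 1) are orthonormal w.r.t. the uniform density.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 2000
xd = (np.arange(1, n + 1) - 0.5) / n
G = np.column_stack([np.ones(n), np.sqrt(3) * (2 * xd - 1)])
b = 0.5 ** np.arange(20)                     # truncated MA coefficients
ests = []
for _ in range(reps):
    eps = np.convolve(rng.standard_normal(n + len(b) - 1), b, mode='valid')
    y = 1.0 + 0.5 * xd + eps                 # a model satisfying H_0
    ests.append(np.linalg.lstsq(G, y, rcond=None)[0])
print(n * np.cov(np.array(ests).T))          # approx (sum_j b_j)^2 * I_2 = 4 * I_2
```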
Proof. Recalling the notation $\Delta(x_i) = (m - P_{U_p} m)(x_i) = m(x_i) - \vartheta_0^T g(x_i)$, we obtain $Y_i = \Delta(x_i) + g^T(x_i)\, \vartheta_0 + \varepsilon_i$ and

$\sqrt{n}\, (\hat\vartheta_n - \vartheta_0) = \sqrt{n}\, B_n^{-1} \Big\{ \frac{1}{n} \sum_{i=1}^{n} g(x_i) \Delta(x_i) + \frac{1}{n} \sum_{i=1}^{n} g(x_i) \varepsilon_i \Big\},$

where

(4.2)  $B_n = \frac{1}{n} \sum_{i=1}^{n} g(x_i)\, g^T(x_i) = I_p + O\Big( \frac{1}{n} \Big)$

is the design matrix of the LSE

$\hat\vartheta_n = B_n^{-1}\, \frac{1}{n} \sum_{i=1}^{n} g(x_i)\, Y_i.$

For the first term in the sum we note that

$\frac{1}{n} \sum_{i=1}^{n} g(x_i) \Delta(x_i) = \int_0^1 g(x) \Delta(x) f(x)\,dx + O\Big( \frac{1}{n} \Big) = O\Big( \frac{1}{n} \Big),$

where the last estimate follows from the fact that $\vartheta_0 \in \Theta$ is the unique minimizer of

$\int_0^1 \big( m(x) - \vartheta^T g(x) \big)^2 f(x)\,dx.$

Observing (4.2), this establishes the first equality of Lemma A.1. The asymptotic normality now follows exactly by the same arguments as given by Gonzalez Manteiga and Vilar Fernandez (1995) in the proof of their Theorem 1. □

Throughout the proof of Theorem 3.1 we make use of the decomposition
(4.3)  $T_n^{(2)} = V_{1,n} - 2 \big\{ V_{2,n}^{(1)} - V_{2,n}^{(2)} \big\} + \big\{ V_{3,n}^{(1)} - 2 V_{3,n}^{(2)} + V_{3,n}^{(3)} \big\},$
which is obtained from (4.1) and the notation

(4.4)
$V_{1,n} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} \frac{1}{h} K\Big( \frac{x_i - x_j}{h} \Big)\, \varepsilon_i \varepsilon_j,$
$V_{2,n}^{(1)} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} \frac{1}{h} K\Big( \frac{x_i - x_j}{h} \Big)\, \varepsilon_i\, g^T(x_j) \{ \hat\vartheta_n - \vartheta_0 \},$
$V_{2,n}^{(2)} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} \frac{1}{h} K\Big( \frac{x_i - x_j}{h} \Big)\, \varepsilon_i\, \Delta(x_j),$
$V_{3,n}^{(1)} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} \frac{1}{h} K\Big( \frac{x_i - x_j}{h} \Big)\, g^T(x_i) \{ \hat\vartheta_n - \vartheta_0 \}\; g^T(x_j) \{ \hat\vartheta_n - \vartheta_0 \},$
$V_{3,n}^{(2)} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} \frac{1}{h} K\Big( \frac{x_i - x_j}{h} \Big)\, \Delta(x_i)\, g^T(x_j) \{ \hat\vartheta_n - \vartheta_0 \},$
$V_{3,n}^{(3)} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1,\, j \ne i}^{n} \frac{1}{h} K\Big( \frac{x_i - x_j}{h} \Big)\, \Delta(x_i)\, \Delta(x_j).$
Proof of part a) of Theorem 3.1. Under the hypothesis of linearity $\Delta \equiv 0$ we have $V_{2,n}^{(2)} = V_{3,n}^{(2)} = V_{3,n}^{(3)} = 0$. The remaining terms are treated essentially in the same way as in Gonzalez Manteiga and Vilar Fernandez (1995), and therefore we only state the main steps here. We have

(4.5)  $V_{3,n}^{(1)} = o_p\Big( \frac{1}{n\sqrt{h}} \Big), \qquad V_{2,n}^{(1)} = o_p\Big( \frac{1}{n\sqrt{h}} \Big),$

and for the asymptotic bias and variance of $V_{1,n}$

(4.6)  $E[V_{1,n}] = \frac{K(0)}{nh} \sum_{s=-\infty,\, s \ne 0}^{\infty} \gamma(s) + o\Big( \frac{1}{n\sqrt{h}} \Big),$

(4.7)  $\mathrm{Var}(V_{1,n}) = \frac{2}{n^2 h} \Big( \sum_{r=-\infty}^{\infty} \gamma(r) \Big)^2 \int_0^1 f^2(x)\,dx \int_{-1}^{1} K^2(z)\,dz + o\Big( \frac{1}{n^2 h} \Big).$

Note that the derivation of (4.6) requires a finite first moment of the autocovariance function, as assumed in (2.6), and the condition $n h^{3/2} \to \infty$ specified in (2.7). These assumptions are necessary but not stated explicitly in Gonzalez Manteiga and Vilar Fernandez (1995). Finally, the asymptotic normality of $n\sqrt{h}\, (V_{1,n} - E[V_{1,n}])$ follows from a central limit theorem for triangular arrays with $m(n)$-dependent main part [see Nieuwenhuis (1992)].
show (4.8)
Tn ? E [V ;n] = 2fV ;n ? V ;n g + V ;n + op( p1n ) (2)
1
(2) 2
(2) 3
12
(3) 3
where V ;n is nonrandom and asymptotically equivalent to n X n n X X 1 1 K (0) (x ) 1 1 x i ? xj V ;n = n(n ? 1) ( x K i )(xj ) ? i h n(n ? 1) i h i j h Z Z 1 K (0) (x)f (x) dx + o p : = (x)f (x) dx ? nh n Combining this estimate with (4.6) and (4.8) yields for the statistic of interest B~ = 2fV ? V g + o ( p1 ) Tn ? M ? nh (4.9) p ;n ;n n For the variance of the dominating term on the right hand side of (4.9) we obtain n x ? x X 1 K i h j (xi)gT (xj )g(xk )"k bn = V ar(V ;n ? V ;n ) = V ar n (n ? 1)h i;j;k n n XX 1 x i ? xj ? n(n ? 1)h K ( h )(xj )"i + o( n1 ) i j (3) 3
(3) 3
2
=1
1
2
=1
=1
1
2
2
0
0
(2)
(2) 3
2
2
2 2
(2) 2
(2) 2
(2) 3
2
=1
=1
=1
p where we used the representation of n(^n ? ) of Lemma A.1. Changing the order of summation yields n n n 1 X X 1 K ( xk ? xj )(xk )gT (xj )g(xi) " (4.10) bn = V ar i ni n(n ? 1)h j;k h n o X 1 x i ? xj ? (n ? 1)h K ( h )(xj ) + o( n1 ) j 0
2
=1
=1
=1
n X n X
n X n X
K ( xk ?h xj )(xk )gT (xj )g(xi)
(r ? i) n(n ? 1)h j k i r n o X ? (n ?1 1)h K ( xi ?h xj )(xj ) j n n n XX K ( xk ?h xj )(xk )gT (xj )g(xr ) n(n ?1 1)h j k n o X 1 ? (n ? 1)h K ( xr ?h xj )(xj ) + o( n1 ) j
= n1
n
1
2
=1
=1
=1
=1
=1
=1
=1
=1
Z 1 n1 Z Z X 1
(s) f (x) h K ( z ?h y )(z)gT (y)g(x)f (y)f (z) dz dy = n s ?1 Z o 1 ? h K ( x ?h y )(y)f (y) dy dx + o( n1 ) 1
=
1
0
0
1
0
1
2
0
Z Z p 1 o nX X 1 g (f )(y)gl(y)f (y) dy ? (f )(x) dx + o( 1 )
(s) f (x) = l (x) n s ?1 n l 1
1
=
0
=1
0
13
2
Summarizing these calculations gives 4nV (V ;n ? V ;n ) = 4 (4.11) nlim !1 (2) 2
(2) 3
1 X
s=?1
(s)
Z
1
f (x)f(f )(x) ? PU (f )(x)g dx = : 2
p
0
2 1
In order to establish asymptotic normality we apply Theorem 2.3 of Nieuwenhuis (1992) to the statistic n X V ;n ? V ;n = Xi;n + o( p1n ) (4.12) i where [note that we have applied Lemma A.1 in the de nition of the Xi;n]: Xi;n = ci;n"i 3
2
=1
Xi;n;m n = ci;n
(4.13)
m (n) X
( )
Xi;n;m n = ci;n ( )
r=0
br ei?r
1 X
r=m(n)+1
br ei?r
and the constants ci;n are de ned by n n X n X 1 1 (4.14) K ( xk ?h xj )(xk )gT (xj )g(xi) ci;n = n n(n ? 1)h j k n o X ? (n ?1 1)h K ( xi ?h xj )(xj ) : j =1
=1
=1
We are now establishing conditions Pn (C1), (C2) and (C2 ) in Theorem 2.3 of Nieuwenhuis (1992) noting that bn = Var( i Xi;n) = =(4n) + o(1=n) by (4.10). We start with the condition (C2) and obtain 2
=1
2 1
j j 1 V ar X Xk;n = O(n) X j?i bn j?i k i k i = +1
j X
(l ? k)ck;ncl;n
l i j j X X
= +1 = +1
1 = O( n1 ) j ? i k i l i j (l ? k)j X j
(s)j(1 ? jsj ) = O( 1 ) n sm n X X = O( n1 ) jbr jjbsj jCov(e ?r; e t?s )j + o( n1 ) t2Z r;s>m n X
X
1
1+
( )
1
1+
( )
X = O( n1 )e jbr j + o( n1 ) = o( n1 ) r>m n 2
2
( )
which gives the corresponding estimate (C2); that is j X Xk;n;m n 1) 1 = o ( max V ar i