ASYMPTOTIC NORMALITY OF PARAMETRIC PART IN PARTIAL LINEAR HETEROSCEDASTIC REGRESSION MODELS

Hua Liang and Wolfgang Härdle

Abstract

Consider the partial linear heteroscedastic model $Y_i = X_i^T\beta + g(T_i) + \sigma_i e_i$, $1 \le i \le n$, with random variables $(X_i, T_i)$, response variables $Y_i$ and unknown regression function $g(\cdot)$. We assume that the errors are heteroscedastic, i.e., $\sigma_i^2 \ne \mathrm{const}$, and that the $e_i$ are i.i.d. random errors with mean zero and variance 1. In this partial linear heteroscedastic model, we consider the situations where the variance is an unknown smooth function of exogenous variables, of the nonlinear variable $T_i$, or of the mean response $X_i^T\beta + g(T_i)$. Under general assumptions, we construct an estimator of the regression parameter vector $\beta$ which is asymptotically equivalent to the weighted least squares estimator with known variances. In constructing the estimator, the technique of sample splitting is adopted.

Key Words and Phrases: Nonparametric estimation, partial linear model, heteroscedasticity, semiparametric model, asymptotic normality. Short title: Heteroscedasticity.

1 INTRODUCTION

Consider the semiparametric partial linear regression model, defined by

$$Y_i = X_i^T\beta + g(T_i) + \varepsilon_i, \qquad i = 1, \dots, n, \tag{1}$$

Hua Liang is Associate Professor of Statistics at the Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080. Wolfgang Härdle is Professor of Econometrics at the Institut für Statistik und Ökonometrie, Humboldt-Universität zu Berlin, D-10178 Berlin, Germany. This research was supported by Sonderforschungsbereich 373 "Quantifikation und Simulation Ökonomischer Prozesse". The first author was supported by the Alexander von Humboldt Foundation. The authors would like to thank Dr. Ulrike Grasshoff for her valuable comments.

with $X_i = (x_{i1}, \dots, x_{ip})^T$ and $T_i \in [0,1]$ random design points, $\beta = (\beta_1, \dots, \beta_p)^T$ the unknown parameter vector, and $g$ an unknown Lipschitz continuous function from $[0,1]$ to $\mathbb{R}^1$. The random errors $\varepsilon_1, \dots, \varepsilon_n$ are mean-zero variables with variance 1. This model was studied by Engle et al. (1986) under the assumption of constant error variance. More recent work in this semiparametric context has dealt with the estimation of $\beta$ at a root-$n$ rate. Chen (1988), Heckman (1986), Robinson (1988) and Speckman (1988) constructed $\sqrt n$-consistent estimates of $\beta$ under various assumptions on the function $g$ and on the distributions of $\varepsilon$ and $(X, T)$. Cuzick (1992a) constructed efficient estimates of $\beta$ when the error density is known. The problem was extended later to the case of unknown error distribution by Cuzick (1992b) and Schick (1993). Schick (1996a, b) considered the problem of heteroscedasticity, i.e., of nonconstant error variance, for model (1). He constructed root-$n$ consistent weighted least squares estimates of the finite-dimensional parameter with random weights, and gave an optimal weight function for the case where the variance is known up to a multiplicative constant. His model for the nonconstant variance function of $Y$ given $(X, T)$ assumed that it is some unknown smooth function of an exogenous random vector $W$, which is unrelated to $\beta$ and $g$.

The present paper aims at a uniformly applicable approach and extends some of the existing results. It is concerned with the cases where $\sigma_i^2$ is a function of independent exogenous variables, a function of $T_i$, or a function of $X_i^T\beta + g(T_i)$. The aim of this paper is to present a uniformly applicable method for estimating the parameter $\beta$ in the regression model (1) with heteroscedastic errors, and then to prove that in large samples there is no cost, under appropriate conditions, due to estimating the variance function. Our analysis is related to the literature on heteroscedasticity in regression models. Earlier papers are Bickel (1978), Carroll (1982), Carroll and Ruppert (1982) and Müller and Stadtmüller (1987). There are mainly two kinds of theoretical analysis: the parametric approach, which generally assumes $\sigma_i^2 = H(X_i, \theta)$ or $H(X_i^T\beta, \theta)$ for known $H$ [see Box and Hill (1974), Carroll (1982), Carroll and Ruppert (1982), Jobson and Fuller (1980) and Mak (1992)], and the nonparametric approach, which assumes $\sigma_i^2 = H(X_i)$ or $H(X_i^T\beta)$ for unknown $H$ [see Carroll and Härdle (1989), Fuller and Rao (1978) and Hall and Carroll (1989)].

Let $\{(Y_i, X_i, T_i);\ i = 1, \dots, n\}$ denote a random sample from

$$Y_i = X_i^T\beta + g(T_i) + \sigma_i e_i, \qquad i = 1, \dots, n, \tag{2}$$

where $X_i$ and $T_i$ are the same as those in model (1), the $e_i$ are i.i.d. with mean 0 and variance 1, and the $\sigma_i^2$ are functions of other variables, whose specific forms are discussed in later sections.

The classical approach works as follows. Assume $\{(X_i, T_i, Y_i);\ i = 1, \dots, n\}$ satisfy model (2). Let $\{W_{ni}(t) = W_{ni}(t; T_1, \dots, T_n);\ i = 1, \dots, n\}$ be probability weight functions depending only on the design points $T_1, \dots, T_n$. Since $g(T_i) = E(Y_i - X_i^T\beta \mid T_i)$, with $\beta$ the "true" parameter value, a natural estimator of $g$ is

$$g_n(t) = \sum_{j=1}^n W_{nj}(t)\big(Y_j - X_j^T\beta\big).$$

Replacing $g(T_i)$ by $g_n(T_i)$ in model (2), we obtain the least squares estimator of $\beta$,

$$\hat\beta_{LS} = \big(\widetilde X^T\widetilde X\big)^{-1}\widetilde X^T\widetilde Y, \tag{3}$$

where $\widetilde X^T = (\widetilde X_1, \dots, \widetilde X_n)$ with $\widetilde X_i = X_i - \sum_{j=1}^n W_{nj}(T_i)X_j$, and $\widetilde Y = (\widetilde Y_1, \dots, \widetilde Y_n)^T$ with $\widetilde Y_i = Y_i - \sum_{j=1}^n W_{nj}(T_i)Y_j$. When the errors are heteroscedastic, $\hat\beta_{LS}$ is modified to a weighted least squares estimator

$$\hat\beta_W = \Big(\sum_{i=1}^n \gamma_i\widetilde X_i\widetilde X_i^T\Big)^{-1}\sum_{i=1}^n \gamma_i\widetilde X_i\widetilde Y_i \tag{4}$$

for some weights $\gamma_i$, $i = 1, \dots, n$. In our model (2) we take $\gamma_i = 1/\sigma_i^2$. In principle the weights $\gamma_i$ (or the $\sigma_i^2$) are unknown and must be estimated. Suppose $\{\hat\gamma_i;\ i = 1, \dots, n\}$ is a sequence of estimators of $\gamma_i$. Naturally one could take $\hat\beta_W$ in (4), with $\gamma_i$ replaced by $\hat\gamma_i$, as the estimator of $\beta$. In order to develop the asymptotic theory, we use the idea of sample splitting. Let $k_n$ be the integer part of $n/2$, and let $\hat\gamma_i^{(1)}$ and $\hat\gamma_i^{(2)}$ be estimators of $\gamma_i$ based on the first $k_n$ observations $(X_1, T_1, Y_1), \dots, (X_{k_n}, T_{k_n}, Y_{k_n})$ and on the remaining $n - k_n$ observations $(X_{k_n+1}, T_{k_n+1}, Y_{k_n+1}), \dots, (X_n, T_n, Y_n)$, respectively. Define

$$\hat\beta_{nW} = \Big(\sum_{i=1}^{k_n}\hat\gamma_i^{(2)}\widetilde X_i\widetilde X_i^T + \sum_{i=k_n+1}^{n}\hat\gamma_i^{(1)}\widetilde X_i\widetilde X_i^T\Big)^{-1}\Big(\sum_{i=1}^{k_n}\hat\gamma_i^{(2)}\widetilde X_i\widetilde Y_i + \sum_{i=k_n+1}^{n}\hat\gamma_i^{(1)}\widetilde X_i\widetilde Y_i\Big) \tag{5}$$

as the estimator of $\beta$: each observation is weighted by a variance estimate computed from the other half of the sample. The next step is to establish our conclusion, that is, to prove that $\hat\beta_{nW}$ is asymptotically normal. We first prove that $\hat\beta_W$ is asymptotically normal, and then prove that $\sqrt n\,(\hat\beta_{nW} - \hat\beta_W)$ converges to zero in probability.
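To make the construction (3)-(5) concrete, the following sketch implements it numerically. It is a minimal illustration under assumptions of our own choosing, not a prescription of the paper: Nadaraya-Watson probability weights with the quartic kernel used in the simulation section below, a user-supplied bandwidth, and variance estimates obtained by smoothing squared residuals on each half-sample; all function names are ours.

```python
import numpy as np

def nw_weights(t_eval, t, h):
    """Nadaraya-Watson probability weights built from the quartic kernel
    (15/16)(1 - u^2)^2 I(|u| <= 1); row i contains W_nj(t_eval[i])."""
    u = (np.asarray(t_eval)[:, None] - np.asarray(t)[None, :]) / h
    k = (15 / 16) * (1 - u ** 2) ** 2 * (np.abs(u) <= 1)
    return k / np.maximum(k.sum(axis=1, keepdims=True), 1e-12)

def partial_residuals(x, y, t, h):
    """X-tilde and Y-tilde: center X and Y by smoothing on T."""
    w = nw_weights(t, t, h)                 # n x n matrix of W_nj(T_i)
    return x - w @ x, y - w @ y

def beta_ls(x_t, y_t):
    """Unweighted estimator (3)."""
    return np.linalg.solve(x_t.T @ x_t, x_t.T @ y_t)

def beta_split_wls(x, y, t, h):
    """Split-sample weighted estimator (5) with gamma_i = 1 / sigma_i^2,
    each half weighted by variance estimates from the other half."""
    n = len(y)
    kn = n // 2
    x_t, y_t = partial_residuals(x, y, t, h)
    resid2 = (y_t - x_t @ beta_ls(x_t, y_t)) ** 2
    gam = np.empty(n)
    # sigma_i^2 estimated here by smoothing squared residuals on T, using
    # only the *other* half of the sample (one assumed variance model).
    w2 = nw_weights(t[:kn], t[kn:], h)      # first half <- second half
    w1 = nw_weights(t[kn:], t[:kn], h)      # second half <- first half
    gam[:kn] = 1.0 / np.maximum(w2 @ resid2[kn:], 1e-8)   # gamma-hat^(2)
    gam[kn:] = 1.0 / np.maximum(w1 @ resid2[:kn], 1e-8)   # gamma-hat^(1)
    a = (x_t * gam[:, None]).T @ x_t
    b = (x_t * gam[:, None]).T @ y_t
    return np.linalg.solve(a, b)
```

Setting all weights to 1 recovers (3); the results below show that plugging in estimated weights in this cross-weighted fashion is asymptotically free.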

Some notation is needed: $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^T$; $\tilde\varepsilon = (\tilde\varepsilon_1, \dots, \tilde\varepsilon_n)^T$ with $\tilde\varepsilon_i = \varepsilon_i - \sum_{j=1}^n W_{nj}(T_i)\varepsilon_j$; $\tilde g_{ni} = g(T_i) - \sum_{k=1}^n W_{nk}(T_i)g(T_k)$; $\widetilde G = (g(T_1) - g_n(T_1), \dots, g(T_n) - g_n(T_n))^T$; $h_j(t) = E(x_{ij} \mid T_i = t)$ and $u_{ij} = x_{ij} - h_j(T_i)$ for $i = 1, \dots, n$ and $j = 1, \dots, p$. We will use the following assumptions.

Assumption 1. $\sup_t E(\|X_1\|^3 \mid T = t) < \infty$ and $\lim_{n\to\infty} n^{-1}\sum_{i=1}^n \gamma_i u_i u_i^T = B$, where $u_i = (u_{i1}, \dots, u_{ip})^T$ and $B$ is a positive definite matrix.

Assumption 2. $g(\cdot)$ and the $h_j(\cdot)$ are all Lipschitz continuous of order 1.

Assumption 3. The weight functions $W_{ni}(\cdot)$ satisfy the following:

(i) $\max_{1\le i\le n}\sum_{j=1}^n W_{ni}(T_j) = O(1)$ a.s.;

(ii) $\max_{i,j\le n} W_{ni}(T_j) = O(b_n)$ a.s., with $b_n = n^{-2/3}$;

(iii) $\max_{1\le i\le n}\sum_{j=1}^n W_{nj}(T_i)\,I(|T_j - T_i| > c_n) = O(c_n)$ a.s., with $c_n = n^{-1/2}\log^{-1} n$.

Assumption 4. There exist constants $C_1$ and $C_2$ such that

$$0 < C_1 \le \min_{1\le i\le n}\sigma_i^2 \le \max_{1\le i\le n}\sigma_i^2 \le C_2 < \infty.$$

In the proof of Lemma 2.3, the truncated parts $V_j'$ are handled by an exponential inequality:

$$P\Big\{\Big|\sum_{j=1}^n a_{ji}\big(V_j' - EV_j'\big)\Big| > C_1 n^{-s}\log n\Big\} \le 2\exp\Big\{-\frac{C_1^2 n^{-2s}\log^2 n}{2\sum_{j=1}^n a_{ji}^2\,EV_j'^2 + 2n^{-p_1+1/r-s}\,C_1\log n}\Big\} \le 2\exp(-C_1 C_2\log n) \le Cn^{-3/2}$$

for some large $C_1 > 0$.

The second-to-last inequality follows from

$$\sum_{j=1}^n a_{ji}^2\,EV_j'^2 \le \sup_j |a_{ji}|\sum_{j=1}^n |a_{ji}|\,EV_j'^2 = O\big(n^{-p_1+p_2}\big) \quad\text{and}\quad n^{-p_1+1/r-s}\log n \le n^{-p_1+p_2}.$$

By the Borel-Cantelli lemma,

$$\max_{1\le i\le n}\Big|\sum_{j=1}^n a_{ji}\big(V_j' - EV_j'\big)\Big| = O(n^{-s}\log n) \quad a.s. \tag{7}$$

Let $1 \le p < 2$ and $1/p + 1/q = 1$ be such that $1/q < (p_1 + p_2)/2 - 1/r$. By Hölder's inequality,

$$\max_{1\le i\le n}\Big|\sum_{j=1}^n a_{ji}\big(V_j'' - EV_j''\big)\Big| \le \max_{1\le i\le n}\Big(\sum_{j=1}^n |a_{ji}|^q\Big)^{1/q}\Big(\sum_{j=1}^n |V_j'' - EV_j''|^p\Big)^{1/p} \le Cn^{-p_1(q-1)/q}\Big(\sum_{j=1}^n |V_j'' - EV_j''|^p\Big)^{1/p}. \tag{8}$$

Observe that

$$\frac{1}{n}\sum_{j=1}^n\Big\{|V_j'' - EV_j''|^p - E|V_j'' - EV_j''|^p\Big\} \to 0 \quad a.s. \tag{9}$$

and $E|V_j''|^p \le E|V_j|^r\,n^{-1+p/r}$, so that

$$\sum_{j=1}^n E|V_j'' - EV_j''|^p \le Cn^{p/r} \quad a.s. \tag{10}$$

Combining (8), (9) with (10), we obtain

$$\max_{1\le i\le n}\Big|\sum_{k=1}^n a_{ki}\big(V_k'' - EV_k''\big)\Big| \le Cn^{-p_1(q-1)/q+1/r} = o(n^{-s}) \quad a.s. \tag{11}$$

Lemma 2.3 follows from (7) and (11) directly.

Let $r = 3$, $V_k = e_k$ or $u_{kl}$, $a_{ji} = W_{nj}(T_i)$, $p_1 = 2/3$ and $p_2 = 0$. We obtain the following formulas, which will play critical roles in the proofs of the theorems:

$$\max_{1\le i\le n}\Big|\sum_{k=1}^n W_{nk}(T_i)\,e_k\Big| = O(n^{-1/3}\log n) \quad a.s. \tag{12}$$

and

$$\max_{1\le i\le n}\Big|\sum_{k=1}^n W_{nk}(T_i)\,u_{kl}\Big| = O(n^{-1/3}\log n) \quad a.s.,\ \ l = 1, \dots, p.$$
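These uniform rates can be checked by simulation. The sketch below is our illustration, not part of the paper; the bandwidth choice $h = n^{-1/3}$, which makes the maximal weight of order $n^{-2/3}$ as in Assumption 3(ii), is an assumption. It compares Monte Carlo draws of $\max_i|\sum_k W_{nk}(T_i)e_k|$ with the rate $n^{-1/3}\log n$.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_weighted_error_sum(n, h):
    """One draw of max_i |sum_k W_nk(T_i) e_k| with quartic-kernel weights."""
    t = rng.uniform(0, 1, n)
    e = rng.standard_normal(n)
    u = (t[:, None] - t[None, :]) / h
    k = (15 / 16) * (1 - u ** 2) ** 2 * (np.abs(u) <= 1)
    w = k / np.maximum(k.sum(axis=1, keepdims=True), 1e-12)
    return np.max(np.abs(w @ e))

for n in (100, 400, 1600):
    h = n ** (-1 / 3)   # max weight ~ 1/(n h) = n^{-2/3}, as in Assumption 3(ii)
    draws = [max_weighted_error_sum(n, h) for _ in range(200)]
    print(n, round(float(np.mean(draws)), 4), round(n ** (-1 / 3) * np.log(n), 4))
```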

3 VARIANCE IS A FUNCTION OF OTHER RANDOM VARIABLES

This section is devoted to the nonparametric heteroscedasticity structure

$$\sigma_i^2 = H(W_i), \qquad H \text{ unknown Lipschitz continuous},$$

where the $W_i$, $i = 1, \dots, n$, are also design points, which are assumed to be independent of the $e_i$ and $(X_i, T_i)$ and defined on $[0,1]$. Define

$$\widehat H_n(w) = \sum_{j=1}^n \widetilde W_{nj}(w)\big(Y_j - X_j^T\hat\beta_{LS} - g_n(T_j)\big)^2$$

as the estimator of $H(w)$, where $\{\widetilde W_{nj}(w);\ j = 1, \dots, n\}$ is a sequence of weight functions satisfying the same assumptions as $\{W_{nj}(t);\ j = 1, \dots, n\}$.
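A sketch of this estimator, reusing `nw_weights`, `partial_residuals` and `beta_ls` from the snippet in Section 1 (and exploiting the identity $Y_j - X_j^T\hat\beta_{LS} - g_n(T_j) = \widetilde Y_j - \widetilde X_j^T\hat\beta_{LS}$ used in the proof below), might look as follows; smoothing on $W$ with a quartic kernel is again our own choice.

```python
import numpy as np

def variance_function_estimate(w_obs, x, y, t, h):
    """H-hat_n at the observed W_i: smooth the squared residuals
    (Y_j - X_j' beta_LS - g_n(T_j))^2 = (Y-tilde_j - X-tilde_j' beta_LS)^2
    against the exogenous variable W."""
    x_t, y_t = partial_residuals(x, y, t, h)
    resid2 = (y_t - x_t @ beta_ls(x_t, y_t)) ** 2
    w_mat = nw_weights(w_obs, w_obs, h)    # W-tilde_nj(W_i) in row i
    return w_mat @ resid2
```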

Theorem 3.1. Under our assumptions,

$$\sup_{1\le i\le n}\big|\widehat H_n(W_i) - H(W_i)\big| = o_P(n^{-1/3}\log n).$$

Proof. Set $\varepsilon_i = \sigma_i e_i$ for $i = 1, \dots, n$. Note that

$$\widehat H_n(W_i) = \sum_{j=1}^n \widetilde W_{nj}(W_i)\big(\widetilde Y_j - \widetilde X_j^T\hat\beta_{LS}\big)^2 = \sum_{j=1}^n \widetilde W_{nj}(W_i)\big\{\widetilde X_j^T(\beta - \hat\beta_{LS}) + \tilde g(T_j) + \tilde\varepsilon_j\big\}^2$$
$$= (\beta - \hat\beta_{LS})^T\sum_{j=1}^n \widetilde W_{nj}(W_i)\widetilde X_j\widetilde X_j^T(\beta - \hat\beta_{LS}) + \sum_{j=1}^n \widetilde W_{nj}(W_i)\tilde g^2(T_j) + \sum_{j=1}^n \widetilde W_{nj}(W_i)\tilde\varepsilon_j^2$$
$$\quad + 2(\beta - \hat\beta_{LS})^T\sum_{j=1}^n \widetilde W_{nj}(W_i)\widetilde X_j\tilde g(T_j) + 2(\beta - \hat\beta_{LS})^T\sum_{j=1}^n \widetilde W_{nj}(W_i)\widetilde X_j\tilde\varepsilon_j + 2\sum_{j=1}^n \widetilde W_{nj}(W_i)\tilde g(T_j)\tilde\varepsilon_j. \tag{13}$$

Since $0 < \widetilde W_{nj}(W_i) \le Cn^{-2/3}$, the matrix $\sum_{j=1}^n \widetilde W_{nj}(W_i)\widetilde X_j\widetilde X_j^T$ is symmetric, and $\sum_{j=1}^n \big\{\widetilde W_{nj}(W_i) - Cn^{-2/3}\big\}\widetilde X_j\widetilde X_j^T$ is a $p\times p$ nonpositive definite matrix. Recall that $\hat\beta_{LS} - \beta = O_P(n^{-1/2})$. These arguments imply that the first term of (13) is $O_P(n^{-2/3})$. The second term of (13) is easily shown to be of order $O_P(n^{1/3}c_n^2)$. Now we want to show that

$$\sup_i\Big|\sum_{j=1}^n \widetilde W_{nj}(W_i)\tilde\varepsilon_j^2 - H(W_i)\Big| = O_P(n^{-1/3}\log n). \tag{14}$$

This is equivalent to proving the following three items:

$$\sup_i\Big|\sum_{j=1}^n \widetilde W_{nj}(W_i)\Big\{\sum_{k=1}^n W_{nk}(T_j)\varepsilon_k\Big\}^2\Big| = O_P(n^{-1/3}\log n), \tag{15}$$

$$\sup_i\Big|\sum_{j=1}^n \widetilde W_{nj}(W_i)\varepsilon_j^2 - H(W_i)\Big| = O_P(n^{-1/3}\log n), \tag{16}$$

$$\sup_i\Big|\sum_{j=1}^n \widetilde W_{nj}(W_i)\varepsilon_j\Big\{\sum_{k=1}^n W_{nk}(T_j)\varepsilon_k\Big\}\Big| = O_P(n^{-1/3}\log n). \tag{17}$$

(12) assures that (15) holds. Lipschitz continuity of $H(\cdot)$ and the assumptions on $\widetilde W_{nj}(\cdot)$ entail

$$\sup_i\Big|\sum_{j=1}^n \widetilde W_{nj}(W_i)H(W_j) - H(W_i)\Big| = O(n^{-1/3}\log n), \tag{18}$$

whose proof is similar to that of Lemma 2.1. By taking $a_{ki} = \widetilde W_{nk}(W_i)H(W_k)$, $V_k = e_k^2 - 1$, $r = 2$, $p_1 = 2/3$ and $p_2 = 0$ in Lemma 2.3, we have

$$\sup_i\Big|\sum_{j=1}^n \widetilde W_{nj}(W_i)H(W_j)\big(e_j^2 - 1\big)\Big| = O_P(n^{-1/3}\log n). \tag{19}$$

A combination of (19) and (18), together with $\varepsilon_j^2 = H(W_j)e_j^2$, yields (16). (16), (15) and the Cauchy-Schwarz inequality imply (17). Thus we have proved (14). The last three terms of (13) are all of order $o_P(n^{-1/3}\log n)$ by the Cauchy-Schwarz inequality and the conclusions for the first three terms of (13). This completes the proof of Theorem 3.1.

4 VARIANCE IS A FUNCTION OF THE DESIGN $T_i$

In this section we consider the case in which the variance $\sigma_i^2$ is a function of the design points $T_i$, i.e.,

$$\sigma_i^2 = H(T_i), \qquad H \text{ unknown Lipschitz continuous}.$$

As in Section 3, we define our estimator of $H(\cdot)$ as

$$\widehat H_n(t) = \sum_{j=1}^n \widetilde W_{nj}(t)\big\{Y_j - X_j^T\hat\beta_{LS} - g_n(T_j)\big\}^2.$$

Theorem 4.1. Under our assumptions,

$$\sup_{1\le i\le n}\big|\widehat H_n(T_i) - H(T_i)\big| = o_P(n^{-1/3}\log n).$$

Proof. The proof of Theorem 4.1 is similar to that of Theorem 3.1 and is omitted.

5 VARIANCE IS A FUNCTION OF THE MEAN

Here we consider model (2) with

$$\sigma_i^2 = H\{X_i^T\beta + g(T_i)\}, \qquad H \text{ unknown Lipschitz continuous},$$

which means that the variance is an unknown function of the mean response. Related situations in linear and nonlinear models are discussed by Carroll (1982), Box and Hill (1974), Bickel (1978), Jobson and Fuller (1980) and Carroll and Ruppert (1982). Engle et al. (1986), Green et al. (1985), Wahba (1984) and others studied estimation of the regression function $X^T\beta + g(T)$. Since $H(\cdot)$ is assumed completely unknown, the standard method is to get information about $H(\cdot)$ by replication, i.e., we consider the following "improved" partial linear heteroscedastic model

$$Y_{ij} = X_i^T\beta + g(T_i) + \sigma_i e_{ij}, \qquad j = 1, \dots, m_i;\ i = 1, \dots, n.$$

Here $Y_{ij}$ is the response of the $j$th replicate at the design point $(X_i, T_i)$, the $e_{ij}$ are i.i.d. with mean 0 and variance 1, and $\beta$, $g(\cdot)$ and $(X_i, T_i)$ are the same as in model (2). We borrow the idea of Fuller and Rao (1978) for the linear heteroscedastic model to construct an estimate of $\sigma_i^2$: fit the least squares estimate $\hat\beta_{LS}$ and the nonparametric estimate $g_n(T_i)$ to the data to form the predicted values $X_i^T\hat\beta_{LS} + g_n(T_i)$ and the residuals $Y_{ij} - \{X_i^T\hat\beta_{LS} + g_n(T_i)\}$, and estimate

$$\hat\sigma_i^2 = \frac{1}{m_i}\sum_{j=1}^{m_i}\big[Y_{ij} - \{X_i^T\hat\beta_{LS} + g_n(T_i)\}\big]^2. \tag{20}$$

When each $m_i$ stays bounded, Fuller and Rao (1978) concluded that the weighted estimate based on (20) and the weighted least squares estimate based on the true weights have different limiting distributions; this results from the fact that the $\hat\sigma_i^2$ do not converge in probability to the true $\sigma_i^2$.

Theorem 5.1. Let $m_i = a_n n^{2q} \stackrel{\rm def}{=} m(n)$ for some sequence $a_n$ converging to infinity, with $q \le 1/4$. Under our assumptions,

$$\sup_{1\le i\le n}\big|\hat\sigma_i^2 - H\{X_i^T\beta + g(T_i)\}\big| = o_P(n^{-q}).$$

Proof. We only outline the proof of the theorem. In fact,

$$\big|\hat\sigma_i^2 - H\{X_i^T\beta + g(T_i)\}\big| \le 3\big\{X_i^T(\beta - \hat\beta_{LS})\big\}^2 + 3\big\{g(T_i) - g_n(T_i)\big\}^2 + \frac{3\sigma_i^2}{m_i}\Big|\sum_{j=1}^{m_i}\big(e_{ij}^2 - 1\big)\Big|.$$

The first two items are obviously $o_P(n^{-q})$. Since the $e_{ij}$ are i.i.d. with mean zero and variance 1, after taking $m_i = a_n n^{2q}$, $\frac{1}{m_i}\sum_{j=1}^{m_i}(e_{ij}^2 - 1)$ is equivalent to $\frac{1}{m(n)}\sum_{j=1}^{m(n)}(e_{1j}^2 - 1)$. Using the law of the iterated logarithm and the boundedness of $H(\cdot)$, one knows that

$$\frac{1}{m_i}\Big|\sum_{j=1}^{m_i}\big(e_{ij}^2 - 1\big)\Big| = O\big\{m(n)^{-1/2}\log m(n)\big\} = o_P(n^{-q}).$$

Thus we derive the proof of Theorem 5.1.
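In code, the replication-based estimate (20) is a one-liner per design point. The sketch below is an illustration with hypothetical inputs of our own naming: `y_rep[i]` holds the $m_i$ replicates $Y_{i1}, \dots, Y_{im_i}$ and `fitted[i]` the predicted value $X_i^T\hat\beta_{LS} + g_n(T_i)$.

```python
import numpy as np

def sigma2_by_replication(y_rep, fitted):
    """Estimate sigma_i^2 as in (20): mean squared deviation of the
    replicated responses from the fitted mean at each design point."""
    return np.array([np.mean((np.asarray(yi) - fi) ** 2)
                     for yi, fi in zip(y_rep, fitted)])

# Example with m_1 = 2 and m_2 = 3 replicates:
# sigma2_by_replication([[1.2, 0.9], [2.1, 2.4, 2.2]], [1.0, 2.3])
```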

6 SIMULATION

We present a small simulation study to illustrate the behaviour of the previous results. We took the following model with different variance functions:

$$Y_i = X_i^T\beta + g(T_i) + \sigma_i\varepsilon_i, \qquad i = 1, \dots, n = 300.$$

Here the $\varepsilon_i$ are standard normal random variables, the $X_i$ and the $T_i$ are both uniform random variables on $[0,1]$, $\beta = (1, 0.75)^T$ and $g(t) = \sin(t)$. The number of simulation runs for each situation is 500. Three models for the variance functions are considered. LSE and WLSE denote the least squares estimator and the weighted least squares estimator given in (3) and (5), respectively.

- Model 1: $\sigma_i^2 = T_i^2$;

- Model 2: $\sigma_i^2 = W_i^3$, where the $W_i$ are i.i.d. uniformly distributed random variables;

- Model 3: $\sigma_i^2 = a_1\exp\big[a_2\{X_i^T\beta + g(T_i)\}^2\big]$, where $(a_1, a_2) = (1/4, 1/3200)$. This model was considered by Carroll (1982) without the term $g(T_i)$.

TABLE 1: Simulation results ($\times 10^{-3}$)

Estimator   Model   Bias ($\beta_1=1$)   MSE ($\beta_1=1$)   Bias ($\beta_2=0.75$)   MSE ($\beta_2=0.75$)
LSE         1       8.696                8.7291              23.401                  9.1567
WLSE        1       4.230                2.2592              1.93                    2.0011
LSE         2       12.882               7.2312              5.595                   8.4213
WLSE        2       5.676                1.9235              0.357                   1.3241
LSE         3       5.9                  4.351               18.83                   8.521
WLSE        3       1.87                 1.762               3.94                    2.642

From Table 1, one can see that our estimator (WLSE) is better than the LSE in terms of both bias and MSE for each of the above models. We also examined the behaviour of the estimator of the nonparametric part, namely

$$\hat g_n(t) = \sum_{i=1}^n \omega_{ni}(t)\big(Y_i - X_i^T\hat\beta_{nW}\big),$$

Figure 1: Estimates of the function g(T) for the first model.

where the $\omega_{ni}(\cdot)$ are weight functions which also satisfy Assumption 3. In the simulations we take Nadaraya-Watson weights with the quartic kernel $(15/16)(1-u^2)^2\,I(|u|\le 1)$ and select the bandwidth by the cross-validation criterion. Figures 1, 2 and 3 show the simulation results for the nonparametric parts of models 1, 2 and 3, respectively. In the figures, solid lines represent the true values and dashed lines our estimates. The figures indicate that our estimators of the nonparametric part also perform well, except in neighbourhoods of the points 0 and 1.
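The study is straightforward to replicate in outline. The sketch below is ours, not the authors' original code: it draws from the three variance models with $n = 300$ and 500 replications and reports bias and MSE of the split-sample WLSE, reusing `beta_split_wls` from the snippet in Section 1; for brevity it fixes the bandwidth at $n^{-1/3}$ instead of cross-validating.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, 0.75])
n, n_sim = 300, 500
h = n ** (-1 / 3)   # fixed bandwidth; the paper selects it by cross-validation

def one_sample(model):
    """Draw (X, Y, T) from the simulation design with the given variance model."""
    x = rng.uniform(0, 1, (n, 2))
    t = rng.uniform(0, 1, n)
    mean = x @ beta + np.sin(t)
    if model == 1:
        sig2 = t ** 2                              # Model 1: sigma_i^2 = T_i^2
    elif model == 2:
        sig2 = rng.uniform(0, 1, n) ** 3           # Model 2: sigma_i^2 = W_i^3
    else:
        sig2 = 0.25 * np.exp(mean ** 2 / 3200.0)   # Model 3
    y = mean + np.sqrt(sig2) * rng.standard_normal(n)
    return x, y, t

for model in (1, 2, 3):
    est = np.array([beta_split_wls(*one_sample(model), h) for _ in range(n_sim)])
    print(model, est.mean(axis=0) - beta, ((est - beta) ** 2).mean(axis=0))
```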

7 PROOFS OF THEOREMS

First, two further notations are introduced:

$$\widehat A_n = \sum_{i=1}^n \hat\gamma_i\widetilde X_i\widetilde X_i^T, \qquad A_n = \sum_{i=1}^n \gamma_i\widetilde X_i\widetilde X_i^T,$$

where $\hat\gamma_i$ stands for $\hat\gamma_i^{(2)}$ if $i \le k_n$ and for $\hat\gamma_i^{(1)}$ otherwise. For any matrix $S$, $s(j,l)$ denotes the $(j,l)$th element of $S$.

Proof of Theorem 1. It follows from the definition of $\hat\beta_W$ that

$$\hat\beta_W - \beta = A_n^{-1}\Big\{\sum_{i=1}^n \gamma_i\widetilde X_i\tilde g(T_i) + \sum_{i=1}^n \gamma_i\widetilde X_i\tilde\varepsilon_i\Big\}.$$

Figure 2: Estimates of the function g(T) for the second model.

Figure 3: Estimates of the function g(T) for the third model.

We will complete the proof by the following three steps: for $j = 1, \dots, p$,

(i) $H_{1j} = \frac{1}{\sqrt n}\sum_{i=1}^n \gamma_i\tilde x_{ij}\tilde g(T_i) = o_P(1)$;

(ii) $H_{2j} = \frac{1}{\sqrt n}\sum_{i=1}^n \gamma_i\tilde x_{ij}\big\{\sum_{k=1}^n W_{nk}(T_i)\varepsilon_k\big\} = o_P(1)$;

(iii) $H_3 = \frac{1}{\sqrt n}\sum_{i=1}^n \gamma_i\widetilde X_i\varepsilon_i \overset{L}{\longrightarrow} N(0, B_1)$, which together with (i), (ii) and $n^{-1}A_n \to B$ yields $\sqrt n\,(\hat\beta_W - \beta) \overset{L}{\longrightarrow} N(0, B^{-1}B_1B^{-1})$.

The proof of (i) is based mainly on Lemmas 2.1 and 2.3. Denote $\tilde h_{nij} = h_j(T_i) - \sum_{k=1}^n W_{nk}(T_i)h_j(T_k)$. Note that

$$\sqrt n\,H_{1j} = \sum_{i=1}^n \gamma_i u_{ij}\tilde g_{ni} + \sum_{i=1}^n \gamma_i\tilde h_{nij}\tilde g_{ni} - \sum_{i=1}^n \gamma_i\Big\{\sum_{q=1}^n W_{nq}(T_i)u_{qj}\Big\}\tilde g_{ni}. \tag{21}$$

In Lemma 2.3 we take $r = 2$, $V_k = u_{kl}$, $a_{ji} = \tilde g_{nj}$, $1/4 < p_1 < 1/3$ and $p_2 = 1 - 2p_1$. Then the first term of (21) is

$$O_P\big(n^{-(2p_1 - 1/2)}\big) = o_P(n^{1/2}).$$

The second term of (21) is easily shown to be of order $O_P(nc_n^2)$ by using Lemma 2.1. The third term of (21) can be handled by using Abel's inequality and Lemmas 2.1 and 2.3; hence

$$\Big|\sum_{i=1}^n\sum_{q=1}^n \gamma_i W_{nq}(T_i)u_{qj}\tilde g_{ni}\Big| \le C_2\,n\max_{1\le i\le n}|\tilde g_{ni}|\,\max_{1\le i\le n}\Big|\sum_{q=1}^n W_{nq}(T_i)u_{qj}\Big| = O(n^{2/3}c_n\log n).$$

Thus we complete the proof of (i).

We now show (ii), i.e., $\sqrt n\,H_{2j} \overset{P}{\longrightarrow} 0$. Notice that

$$\sqrt n\,H_{2j} = \sum_{i=1}^n \gamma_i\Big\{\sum_{k=1}^n \tilde x_{kj}W_{ni}(T_k)\Big\}\varepsilon_i = \sum_{i=1}^n \gamma_i\Big\{\sum_{k=1}^n u_{kj}W_{ni}(T_k)\Big\}\varepsilon_i + \sum_{i=1}^n \gamma_i\Big\{\sum_{k=1}^n \tilde h_{nkj}W_{ni}(T_k)\Big\}\varepsilon_i - \sum_{i=1}^n \gamma_i\Big[\sum_{k=1}^n\Big\{\sum_{q=1}^n u_{qj}W_{nq}(T_k)\Big\}W_{ni}(T_k)\Big]\varepsilon_i. \tag{22}$$

The order of the first term of (22) is $O\big(n^{-(2p_1-1/2)}\log n\big)$, obtained by letting $r = 2$, $V_k = e_k$, $a_{li} = \sum_{k=1}^n u_{kj}W_{nl}(T_k)$, $1/4 < p_1 < 1/3$ and $p_2 = 1 - 2p_1$ in Lemma 2.3. It follows from Lemma 2.1 and (12) that the second term of (22) is bounded by

$$n\max_{1\le k\le n}\Big|\sum_{i=1}^n W_{ni}(T_k)\varepsilon_i\Big|\,\max_{j,k\le n}|\tilde h_{nkj}| = O(n^{2/3}c_n\log n) \quad a.s. \tag{23}$$

The same argument as that for (23) shows that the third term of (22) can be dealt with as

$$\Big|\sum_{i=1}^n \gamma_i\Big[\sum_{k=1}^n\Big\{\sum_{q=1}^n u_{qj}W_{nq}(T_k)\Big\}W_{ni}(T_k)\Big]\varepsilon_i\Big| \le n\max_{1\le k\le n}\Big|\sum_{i=1}^n W_{ni}(T_k)\varepsilon_i\Big|\,\max_{1\le k\le n}\Big|\sum_{q=1}^n u_{qj}W_{nq}(T_k)\Big| = O(n^{1/3}\log^2 n) = o(n^{1/2}) \quad a.s. \tag{24}$$

A combination of (22)-(24) entails (ii).

Finally, the central limit theorem and Lemma 2.2 imply that

$$\frac{1}{\sqrt n}\sum_{i=1}^n \gamma_i\widetilde X_i\varepsilon_i \overset{L}{\longrightarrow} N(0, B_1),$$

which, together with the fact that $n^{-1}A_n \to B$, implies that

$$\sqrt n\,A_n^{-1}\sum_{i=1}^n \gamma_i\widetilde X_i\varepsilon_i \overset{L}{\longrightarrow} N(0, B^{-1}B_1B^{-1}).$$

This completes the proof of Theorem 1.

Proof of Theorem 2. In order to complete the proof of Theorem 2, we prove that

$$\sqrt n\,(\hat\beta_{nW} - \hat\beta_W) = o_P(1).$$

First we state a fact, whose proof is immediately derived from (6) and Lemma 2.2:

$$\frac{1}{n}\big|\hat a_n(j,l) - a_n(j,l)\big| = o_P(n^{-q}) \tag{25}$$

for $j, l = 1, \dots, p$. This will be used repeatedly later. It follows that

$$\hat\beta_{nW} - \hat\beta_W = \widehat A_n^{-1}(A_n - \widehat A_n)A_n^{-1}\sum_{i=1}^n \gamma_i\widetilde X_i\tilde g(T_i) + \widehat A_n^{-1}\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\widetilde X_i\tilde g(T_i) + \widehat A_n^{-1}(A_n - \widehat A_n)A_n^{-1}\sum_{i=1}^n \gamma_i\widetilde X_i\tilde\varepsilon_i$$
$$\quad + \widehat A_n^{-1}\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\widetilde X_i\tilde\varepsilon_i + \widehat A_n^{-1}\sum_{i=k_n+1}^{n}\big(\hat\gamma_i^{(1)} - \gamma_i\big)\widetilde X_i\tilde g(T_i) + \widehat A_n^{-1}\sum_{i=k_n+1}^{n}\big(\hat\gamma_i^{(1)} - \gamma_i\big)\widetilde X_i\tilde\varepsilon_i. \tag{26}$$

By the Cauchy-Schwarz inequality, for any $j = 1, \dots, p$,

$$\Big|\sum_{i=1}^n \gamma_i\tilde x_{ij}\tilde g(T_i)\Big| \le C\sqrt n\,\max_{1\le i\le n}|\tilde g(T_i)|\Big(\sum_{i=1}^n \tilde x_{ij}^2\Big)^{1/2}.$$

This is $o_P(n^{3/4})$ by Lemmas 2.1 and 2.2. Thus each element of the first term of (26) is $o_P(n^{-1/2})$, by noting that each element of $\widehat A_n^{-1}(A_n - \widehat A_n)A_n^{-1}$ is $o_P(n^{-5/4})$. A similar argument shows that each element of the second and fifth terms is also $o_P(n^{-1/2})$. Recalling the proofs that $H_{2j} = o_P(1)$ and that $H_3$ converges to a normal distribution, we conclude that the third term of (26) is also $o_P(n^{-1/2})$. Thus we see that the difficult problem is to show that the fourth and the last terms of (26) are both $o_P(n^{-1/2})$. Since their proofs are the same, we only show that, for $j = 1, \dots, p$,

$$\Big[\widehat A_n^{-1}\Big\{\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\widetilde X_i\tilde\varepsilon_i\Big\}\Big]_j = o_P(n^{-1/2}),$$

or equivalently

$$\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\tilde x_{ij}\tilde\varepsilon_i = o_P(n^{1/2}). \tag{27}$$

Since $\tilde\varepsilon_i = \varepsilon_i - \sum_{k=1}^n W_{nk}(T_i)\varepsilon_k$, we handle the two parts separately. Let $\{\eta_n\}$ be a sequence of numbers converging to zero but satisfying $\eta_n > n^{-1/4}$. Then for any $\delta > 0$ and $j = 1, \dots, p$,

$$P\Big\{\Big|\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\tilde x_{ij}\varepsilon_i\,I\big(|\hat\gamma_i^{(2)} - \gamma_i| \ge \eta_n\big)\Big| > \delta n^{1/2}\Big\} \le P\Big\{\max_{1\le i\le n}\big|\hat\gamma_i^{(2)} - \gamma_i\big| \ge \eta_n\Big\} \to 0. \tag{28}$$

The last step is due to (6). Next we deal with the term

$$P\Big\{\Big|\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\tilde x_{ij}\varepsilon_i\,I\big(|\hat\gamma_i^{(2)} - \gamma_i| \le \eta_n\big)\Big| > \delta n^{1/2}\Big\}$$

by Chebyshev's inequality. Since $\hat\gamma_i^{(2)}$ is independent of $\varepsilon_i$ for $i = 1, \dots, k_n$, we can easily calculate

$$E\Big\{\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\tilde x_{ij}\varepsilon_i\,I\big(|\hat\gamma_i^{(2)} - \gamma_i| \le \eta_n\big)\Big\}^2.$$

This is why we use the splitting technique to estimate $\gamma_i$ by $\hat\gamma_i^{(1)}$ and $\hat\gamma_i^{(2)}$. In fact,

$$P\Big\{\Big|\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\tilde x_{ij}\varepsilon_i\,I\big(|\hat\gamma_i^{(2)} - \gamma_i| \le \eta_n\big)\Big| > \delta n^{1/2}\Big\} \le \frac{\sum_{i=1}^{k_n}E\big\{\big(\gamma_i - \hat\gamma_i^{(2)}\big)I\big(|\gamma_i - \hat\gamma_i^{(2)}| \le \eta_n\big)\big\}^2\,E\|\widetilde X_i\|^2\,E\varepsilon_i^2}{\delta^2 n} \le \frac{C\,k_n\eta_n^2}{\delta^2 n} \to 0. \tag{29}$$

Thus, by (28) and (29),

$$\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\tilde x_{ij}\varepsilon_i = o_P(n^{1/2}).$$

Finally,

$$\Big|\sum_{i=1}^{k_n}\big(\hat\gamma_i^{(2)} - \gamma_i\big)\tilde x_{ij}\Big\{\sum_{k=1}^n W_{nk}(T_i)\varepsilon_k\Big\}\Big| \le \sqrt n\Big(\sum_{i=1}^{k_n}\tilde x_{ij}^2\Big)^{1/2}\max_{1\le i\le n}\big|\hat\gamma_i^{(2)} - \gamma_i\big|\,\max_{1\le i\le n}\Big|\sum_{k=1}^n W_{nk}(T_i)\varepsilon_k\Big|.$$

This is $o_P(n^{1/2})$ by using (25), (12) and Lemma 2.2, which together with (29) entails (27). This completes the proof of Theorem 2.

REFERENCES

Bickel, P.J. (1978). Using residuals robustly I: Tests for heteroscedasticity, nonlinearity. Annals of Statistics, 6, 266-291.

Bickel, P.J., Klaassen, C.A.J., Ritov, Y. and Wellner, J.A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. The Johns Hopkins University Press.

Box, G.E.P. and Hill, W.J. (1974). Correcting inhomogeneity of variance with power transformation weighting. Technometrics, 16, 385-389.

Carroll, R.J. (1982). Adapting for heteroscedasticity in linear models. Annals of Statistics, 10, 1224-1233.

Carroll, R.J. and Härdle, W. (1989). Second order effects in semiparametric weighted least squares regression. Statistics, 2, 179-186.

Carroll, R.J. and Ruppert, D. (1982). Robust estimation in heteroscedastic linear models. Annals of Statistics, 10, 429-441.

Chen, H. (1988). Convergence rates for parametric components in a partly linear model. Annals of Statistics, 16, 136-146.

Cuzick, J. (1992a). Semiparametric additive regression. Journal of the Royal Statistical Society, Series B, 54, 831-843.

Cuzick, J. (1992b). Efficient estimates in semiparametric additive regression models with unknown error distribution. Annals of Statistics, 20, 1129-1136.

Engle, R.F., Granger, C.W.J., Rice, J. and Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association, 81, 310-320.

Fuller, W.A. and Rao, J.N.K. (1978). Estimation for a linear regression model with unknown diagonal covariance matrix. Annals of Statistics, 6, 1149-1158.

Green, P., Jennison, C. and Seheult, A. (1985). Analysis of field experiments by least squares smoothing. Journal of the Royal Statistical Society, Series B, 47, 299-315.

Hall, P. and Carroll, R.J. (1989). Variance function estimation in regression: the effect of estimating the mean. Journal of the Royal Statistical Society, Series B, 51, 3-14.

Heckman, N.E. (1986). Spline smoothing in partly linear models. Journal of the Royal Statistical Society, Series B, 48, 244-248.

Jobson, J.D. and Fuller, W.A. (1980). Least squares estimation when covariance matrix and parameter vector are functionally related. Journal of the American Statistical Association, 75, 176-181.

Mak, T.K. (1992). Estimation of parameters in heteroscedastic linear models. Journal of the Royal Statistical Society, Series B, 54, 648-655.

Müller, H.-G. and Stadtmüller, U. (1987). Estimation of heteroscedasticity in regression analysis. Annals of Statistics, 15, 610-625.

Robinson, P.M. (1987). Asymptotically efficient estimation in the presence of heteroscedasticity of unknown form. Econometrica, 55, 875-891.

Robinson, P.M. (1988). Root-N-consistent semiparametric regression. Econometrica, 56, 931-954.

Schick, A. (1987). A note on the construction of asymptotically linear estimators. Journal of Statistical Planning and Inference, 16, 89-105. Correction: 22 (1989), 269-270.

Schick, A. (1996a). Weighted least squares estimates in partly linear regression models. Statistics and Probability Letters, 27, 281-287.

Schick, A. (1996b). Efficient estimates in linear and nonlinear regression with heteroscedastic errors. To appear in Journal of Statistical Planning and Inference.

Speckman, P. (1988). Kernel smoothing in partial linear models. Journal of the Royal Statistical Society, Series B, 50, 413-436.

Wahba, G. (1984). Partial spline models for the semi-parametric estimation of several variables. In Statistical Analysis of Time Series, 319-329.
