Efficient estimation for semivarying-coefficient models

3 downloads 0 Views 999KB Size Report
bias of the parameter estimator is of order h3 when a symmetric kernel is used, ... Some key words: Efficient estimator; Local linear; Semivarying-coefficient ...
Biometrika (2004), 91, 3, pp. 661–681 © 2004 Biometrika Trust Printed in Great Britain

Efficient estimation for semivarying-coefficient models B YINGCUN XIA Department of Statistics and Applied Probability, National University of Singapore, Singapore [email protected]

WENYANG ZHANG Institute of Mathematics & Statistics, University of Kent, Canterbury, Kent CT 2 7NF, U.K. [email protected]

 HOWELL TONG Department of Statistics & Actuarial Science, T he University of Hong Kong, Pokfulam Road, Hong Kong [email protected]

S Motivated by two practical problems, we propose a new procedure for estimating a semivarying-coefficient model. Asymptotic properties are established which show that the bias of the parameter estimator is of order h3 when a symmetric kernel is used, where h is the bandwidth, and the variance is of order n−1 and efficient in the semiparametric sense. Undersmoothing is unnecessary for the root-n consistency of the estimators. Therefore, commonly used bandwidth selection methods can be employed. A model selection method is also developed. Simulations demonstrate how the proposed method works. Some insights are obtained into the two motivating problems by using the proposed models. Some key words: Efficient estimator; Local linear; Semivarying-coefficient model; Strong a-mixing; Varyingcoefficient model.

1. I 1·1. Preamble A number of recent modelling approaches relax the conditions imposed on traditional parametric models and explore the hidden structure. Examples include additive models (Breiman & Friedman, 1985; Hastie & Tibshirani, 1990), varying-coefficient models (Hastie & Tibshirani, 1993; Fan & Zhang, 1999), low-dimensional interaction models (Friedman, 1991; Gu & Wahba, 1993; Stone et al., 1997), partially linear models (Wahba, 1984; Green & Silverman, 1994) and their hybrids (Carroll et al., 1997; Fan et al., 1998; Heckman et al., 1998), among others. Varying-coefficient models are particularly appealing in longitudinal studies where they allow us to examine the extent to which covariates affect responses over time (Hoover et al., 1998; Fan & J. T. Zhang, 2000). For nonlinear time series applications, Chen &

662

Y. X, W. Z  H. T

Tsay (1993) and Xia & Li (1999) develop functional-coefficient autoregressive models. For applications in economics, see Durlauf (2001). A remaining question of interest is what to do when some of the coefficients of the varying-coefficient model are not really varying. 1·2. T wo motivating practical problems and one statistical model Our first practical problem is about the relationship of human health to air pollution and weather conditions; see for example Smith et al. (1999) and Xia et al. (2002). It is relevant to study how weather conditions affect human health, directly or indirectly, via pollutants. More specifically, since patients suffering from circulatory and respiratory illnesses react differently under different weather conditions, it is relevant to investigate what pollutant levels and what weather conditions would combine to aggravate the human circulatory and respiratory problems to the extent that hospital admission is necessary. In our example, the pollutant data and the weather data are the daily average levels of sulphur dioxide, x in mgm−3, nitrogen dioxide, x in mgm−3, respirable suspended t1 t2 particulates, x in mgm−3, ozone, x in mgm−3, relative humidity, U in %, and tempert3 t4 t ature, V in °C. The data were collected daily in Hong Kong from 1 January 1994 t to 31 December 1995; see Fig. 1(a). Since additional hospital beds were released to

Fig. 1. (a) The lower trace gives the daily hospital admissions of patients suffering from circulatory and respiratory problems in Hong Kong; the upper trace gives the daily humidity, rescaled for easier visualisation. (b) The lower trace gives monthly measles cases in New York City. The upper trace gives the birth rate, rescaled for easier visualisation.

Semivarying-coeYcient models

663

accommodate circulatory and respiratory patients at the beginning of 1995, there is an increasing trend in the number of admissions. We use y to denote the daily hospital t admissions with the trend removed; see Xia et al. (2002) for more details. Now we consider modelling the time series y . First, there is strong autocorrelation in y , which we remove t t using autoregression with lag q. Secondly, there is a day-of-the-week effect, presumably due to the hospital booking system. We use dummy variables D (l=1, 2, . . . , 6) to tl denote the day-of-the-week effect, where D =1 if day t is the lth day of the week and tl D =0 otherwise. One possible approach is to start with a linear approximation for the tl effects of pollutants on y , noting that it is not clear how a weather factor, such as the t humidity U , affects y . It might affect y directly by f (U ) or indirectly via the pollutants, t t t 0 t for example, as f (U )x (k=1, 2, 3, 4). Taking these factors into account, we may consider k t tk the tentative model q 6 4 y = ∑ b y + ∑ c D +f (U )+ ∑ f (U )x +e . (1·1) t k t−k l tl 0 t j t tj t k=1 l=1 j=1 In model (1·1), the pollutants affect y linearly, but their coefficients depend on the weather t condition. Our second practical problem is about the transmission mechanism of epidemics. The susceptible-exposed-infectious-recovered mechanism describes an epidemic in four discrete states. Infectious individuals spread the disease to the susceptible population. Those in the susceptible population to which the disease is transmitted become exposed and after a period of time, the incubation period, these individuals become infectious. Individuals remain infectious for a period of time, the infectious period, and then these individuals recover. It is assumed that the recovered individuals are considered to be permanently immune to the disease, as in the case of measles or whooping cough. In the susceptible-infectious-recovered mechanism, which is a simplified version, the rate of transmission is strictly proportional to the product of the density of susceptibles, S, and the number of infectious hosts, I. It follows that dI/dt3SI. Liu & Levin (1986) showed that adding exponents to the S and I, dI/dt3ScIa, can describe the observed data better. The mixing parameters of transmission, a and c, give a phenomenological model for local spatial variation in the contact rate between the infectious hosts and susceptibles. Following the discrete time series model of Bailey (1975, p. 156), we have I =R Sc Ia e , S =S +B −I +u , t t t−1 t−1 t t t−1 t−1 t−1 t

(1·2)

where B is the number of births at time t. The basic reproductive rate R is essentially t t the average number of successful offspring that a parasite is intrinsically capable of producing; see Anderson & May (1991, pp. 17, 138). In most cases, the basic reproductive rate is a function of the seasonality. For monthly data let D (k=1, . . . , 12) be dummy tk variables such that D =1 if the corresponding month is the kth month of a year and tk D =0 otherwise. Then R =exp (W 12 c D ). tk t k=1 k tk Figure 1(b) shows the monthly observed cases of measles in New York City. The dynamics displays biennial cycles after 1945 and 3- or 4-year cycles before 1945. This change seems to synchronise with the change with the birth rate. With fixed parameters c (k=1, 2, . . . , 12), a and c, it is very hard for model (1·2) to describe the dynamics in k New York. An immediate question is how the birth rate changes the mixing parameters, i.e. which parameters will change with the birth rate. If we take logarithms on both sides

664

Y. X, W. Z  H. T

of the first equation in (1·2), we have the model 12 y = ∑ c D +c(b ) log (S )+a(b )y +e , (1·3) t k tk t−1 t−1 t−1 t−1 t k=1 where y =log (I ) and e =log (e ). The statistical task is to determine whether or not t t t t c(b ) and a(b ) are constants. t−1 t−1 The above problems suggest a common model, p y= ∑ a (U)x +ZTb+e, j j j=1 E(e|U, X, Z)=0, var (e|U, X, Z)=s2(U),

(1·4)

where X=(x , . . . , x )T. This is a semivarying-coefficient regression model, where b is a 1 p q-dimensional unknown vector and a (.) ( j=1, . . . , p) are unknown functions to be j estimated. Note that, if in model (1·4) we set b=0, (1·4) becomes a standard varying-coefficient model. If a (.)=0 ( j=2, . . . , p) and x =1, (1·4) becomes a standard semiparametric j 1 model. It would be naive to treat (1·4) as a special case of the varying-coefficient models because the information provided by the semiparametric structure would not be used and, as a result, we would pay an unnecessary price on the variance of the estimator. A sensible way is to develop a new estimation procedure for (1·4). Although there is substantial literature about the estimation procedure for semiparametric models (Speckman, 1988; Wahba, 1984; Green & Silverman, 1994, Ch. 4), as far as we know, all existing methods require undersmoothing in order that the estimator of b achieve the convergence rate of n−D, because the bias of the estimator is typically O(h2). Based on local linear modelling (Fan & Gijbels, 1996, Ch. 3), we propose a new estimation procedure for (1·4), in which undersmoothing is not necessary, the bias of the proposed estimator for b is O(h3) and the variance is O(n−1). Theoretically, the proposed method achieves an important objective in semiparametrics, namely to estimate the parameters and the nonparametric functions of the same model at their optimal consistency rates simultaneously, i.e. using a common bandwidth (Ha¨rdle et al., 1993; Carroll et al., 1997). With no need to undersmooth, we can employ commonly used bandwidth selection methods in our estimation procedure even for the estimation of the parameters in practice. The  (Akaike, 1973) and crossvalidation (Stone, 1977) model selection methods are discussed and compared. On the basis of the discussion and simulations, we prefer crossvalidation. However, crossvalidation is computationally intensive, especially when the number of covariates is large, so we propose a related method that simplifies the calculations. To address possible overfitting (Stone, 1977) we put a slightly heavier penalty on model complexity. The method is shown to behave appropriately asymptotically.

2. M 2·1. Notation and motivation Let 1˘ be the n-dimensional vector with all entries 1, and let 0 be the q×p matrix q×p with all entries 0. Let (U , XT , ZT , Y ) (i=1, . . . , n) denote samples from (U, XT , ZT , y). i i i i

Semivarying-coeYcient models

665

Also let X =(x , . . . , x )T, b =(b , . . . , b )T, c =(c , . . . , c )T (i=1, . . . , n), i i1 ip i i1 ip i i1 ip ˘ =(X , . . . , X )T, a(U)=(a (U), . . . , a (U))T, Y=(Y , . . . , Y )T, X 1 p 1 n 1 n

P P

˘ =diag (U , . . . , U ), Z ˘ =(Z , . . . , Z )T, Z*=1˘ EZ ˘, n = Y˘ =1˘ EY, U 1 n 1 n k W=diag (W , . . . , W ), W =diag {K (U −U ), . . . , K (U −U )}, m = 1 n i h 1 i h n i i

ukK(u)du,

uiK2(u)du.

Throughout this paper, we assume that (U , XT , ZT , Y ) (i=1, 2, . . .) is a strongly mixing i i i i process and that s2(u)¬s2. The proposed methodology and the asymptotic properties still apply for the case where s2(u) is not constant but at the expense of more complex notation. Traditionally, the estimation procedure for b goes as follows. Using a Taylor’s expansion, we have a (U)ja (u)+a∞ (u)(U−u) ( j=1, . . . , p), j j j which leads to the local least squares estimation procedure, in which we minimise n ∑ [Y −ZT b−{b+c(U −u)}TX ]2K (U −u) (2·1) i i i i h i i=1 with respect to (b, c) to obtain (bA , cA ). Let aA (u) be bA and let u range over U ( j=1, . . . , n). j Then we have ˘,U ˘X ˘ −U X ˘ )TW (X ˘,U ˘X ˘ −U X ˘ )}−1(X ˘,U ˘X ˘ −U X ˘ )TW (Y−Z ˘ b). aA (U )=(I , 0 ){(X j p p×p j j j j j Replacing a(U ) ( j=1, . . . , n) in (1·4) by aA (U ), we obtain j j Y =XT aA (U )+ZT b+e (i=1, . . . , n). i i i i i Letting

A

˘,U ˘X ˘ −U X ˘ )TW (X ˘,U ˘X ˘ −U X ˘ )}−1(X ˘,U ˘X ˘ −U X ˘ )TW XT (I , 0 ){(X 1 p p×p 1 1 1 1 1 e H= we have

˘,U ˘X ˘ −U X ˘ )TW (X ˘,U ˘X ˘ −U X ˘ )}−1(X ˘,U ˘X ˘ −U X ˘ )TW XT (I , 0 ){(X n p p×p n n n n n ˘ b)+Z ˘ b+e, Y=H(Y−Z

B (2·2)

where e=(e , . . . , e )T. A least squares estimation procedure applied to (2·2) yields the 1 n estimator ˘ T(I −H )T(I −H )Z ˘ }−1Z ˘ T(I −H )T(I −H )Y. bA ={Z n n n n As an estimator of b, bA has a bias of order O (h2) and a variance of order O (n−1). P P Undersmoothing is typically necessary to achieve the convergence rate of n−D for bA . Another approach is to minimise (2·1) with respect to (b, c, b) to obtain (bA , cA , bA (u)), and r let u range over U ( j=1, . . . , n). The estimator of b is taken to be j n bA =n−1 ∑ bA (U ), r j j=1

Y. X, W. Z  H. T

666

which still has bias of order O (h2) and variance of order O (n−1). To achieve the P P convergence rate of n−D, undersmoothing is again necessary. The above two approaches cannot estimate b efficiently as they have bias of order O (h2). On the other hand, b is an unknown vector rather than an unknown function. P There should therefore be a global approach for estimating it with bias of higher order than O (h2), which prompts the following procedure. P 2·2. Estimation procedure Note that b is a global constant. It could be estimated more efficiently. We estimate b by replacing u in (2·1) by U and summing with respect to j. This leads to the following j semi-local least squares estimation procedure: minimise n n n−2 ∑ ∑ [Y −{b +c (U −U )}TX −ZT b]2K (U −U ) (2·3) i j j i j i i h i j j=1 i=1 with respect to b , c ( j=1, . . . , n) and b. Denote the minimiser by (b@ , c@ , . . . , b@ , c@ , b@ )T. 1 1 n n j j The estimator of a(U ) is b@ , which we also denote by a@ (U ), and the estimator of b is b@ , i i i where b@ =(0 , I )(VTW V)−1VTW Y˘, q×2pn q ˘,U ˘X ˘ )−U ˘ E(0 , X ˘ ). V=(A, Z*) diag {I Ediag (I , h−1I ), I }, A=I E(X n p p q n n×p We call the proposed approach a semi-local least squares estimation procedure because a(u) is estimated locally, and b is estimated globally. It is intuitively clear that the bias of b@ is O(h3), because the derivative of (2·3) with respect to b should be zero at (b@ , c@ , . . . , b@ , c@ , b@ )T; that is n n 1 1 n n ∑ ∑ [Y −{b@ +c@ (U −U )}TX −ZT b@ ]Z K (U −U )=0, j i j i i i h i j j i j=1 i=1 which gives us

q

r

−1 n n n n ∑ ∑ [Y −{b@ +c@ (U −U )}TX ]Z K (U −U ). ∑ ∑ Z ZT K (U −U ) j i j i i h i j i j i i h i j j=1 i=1 j=1 i=1 Note that there is a bias of O(h2) when we approximate a(U ) in Y by a local linear i i function, and that the bias of b@ is also O(h2). If these two terms cancel each other out in j the double summation, then the bias is reduced to order h3. We shall verify this in § 3. The proposed procedure only gives the estimate of a(.) at U ( j=1, . . . , n). In general, j a(.) can be estimated in two steps. First, estimate b based on the proposed approach, and substitute b@ for b in (1·4). Then (1·4) becomes a standard varying-coefficient model and the method in Fan & Zhang (1999) can be used to get the final estimator of a(.). To be specific, for any u, ˘,U ˘X ˘ −uX ˘ )TW (X ˘,U ˘X ˘ −uX ˘ )}−1(X ˘,U ˘X ˘ −uX ˘ )TW (Y−Z ˘ b@ ), a@ (u)=(I , 0 ){(X p p×p 0 0 where W =diag {K (U −u), . . . , K (U −u)}. As we shall see, the convergence rate of b 0 h 1 h n is of the order of n−D. Thus, the estimator of a(.) enjoys all the properties in Fan & Zhang (1999). b@ =

2·3. Model selection Fan et al. (2003) have developed an extension of , which for our models may be described as follows. For ease of exposition, denote all the covariates here by x with

Semivarying-coeYcient models

667

different subscripts, and let P=p+q. Suppose that S is any subset of {1, 2, . . . , P}, l P P . l∏ ∑ k k=0 We consider the semivarying-coefficient model

AB

y = ∑ b x + ∑ a (U )x +e . (2·4) i k ik k i ik i kµSl k1Sl Let (S ) be the value in (2·3) corresponding to the above model. For the minil misation in (2·3), the ‘degrees of freedom’ is m =n −p , where n =tr (W ) and l l l l p =tr {(VTW V)−1VTW 2V}; see Fan et al. (2003) for more details. We define , l corresponding to the above candidate model, by (S )=log {(S )/m }+2p /n . l l l l l Suppose that (S )=min (S ). Then the model with constant coefficients for l0 l l S is the model selected by the  method. l0 Although for many parametric cases asymptotic equivalence between the  method and the crossvalidation method holds, there is a fundamental difference in terms of operation: the latter does not require an explicit knowledge of the dimensionality of the parameter space. We develop an alternative to the above  approach. It is essentially a crossvalidation method under our setting. For every i, we first estimate a and b based k k on observations {( y , X , U ), jNi}. Denote the estimators by b@ ci and a@ ci (u). Then the j j j k k crossvalidation sum of squares is defined as

q

r

2 n (S )=n−1 ∑ y − ∑ b@ ci x − ∑ a@ ci (U )x . l i k ik k i ik 1 i=1 kµSl k Sl Suppose that (S )=min (S ). Then the model with constant coefficients for S is l0 l l l0 the ‘preferred’ model. Our fairly extensive simulation experiments suggest that the crossvalidation method is a very competitive alternative to the  method defined earlier. A typical set of simulation results is reported in § 4. It is well known that, under general conditions in parametric modelling, the  and the crossvalidation methods usually lead to an inconsistent estimator of the dimension of the parameter space. For linear models, several modifications have been made to the , which is defined as log (s@ 2)+c p/n, where s@ 2 is the mean of the residual sum of squares n of the working model, p is number of covariates and c =2. Well-known examples are n c =log (n) (Schwartz, 1978) and c =c log log (n) (Hannan & Quinn, 1979), for which the n n penalties of selecting too large a p are bigger than that for the . For some nonparametric and semiparametric models, Cheng & Tong (1992), Xia et al. (2002) and others have shown that the crossvalidation method can lead to consistent model selection because the penalty against overfitting increases exponentially with dimensionality. However, as we will see in the next section, the penalty against there being too many varying coefficients in the present model increases only algebraically. Similarly to the modifications of the , we explore a modification of the crossvalidation method in § 4. Since the number of possible subsets S is very large when the number of covariates l is large, a simplification of the crossvalidation procedure is also explored there. There are other possible approaches for us to select models. One potential approach is to use data-driven bandwidths to decide directly which coefficients are parametric and

668

Y. X, W. Z  H. T

which are nonparametric. The coefficients with very large bandwidths could be categorised as parametric and those with small bandwidth as nonparametric. On this ground, to find the best model is to find the best multivariate bandwidth. Similar ideas appeared in an unpublished Weierstrass Institute, Berlin technical report by A. Samarov, V. Spokoiny and C. Celine for the selection of the linear part and the nonlinear part in a partially linear model (Engle et al., 1986). The reasons for taking the model selection approach to find the best model rather than the multivariate bandwidth selection approach are as follows: it is computationally very difficult to search for the optimal bandwidth vector in a multivariate setting; it is hard to define how large the bandwidth should be when we could treat the corresponding term as parametric; since we use local linear methods in our estimation procedure, if we went for the bandwidth selection approach, the coefficient with bandwidth h that is infinite or very large may be constant but it may also be a linear function of U, which means that the bandwidth selection approach cannot tell us directly which coefficients are constant; and, since we are selecting between constant coefficients and functional coefficients, the bandwidth selection approach cannot use fully the information of the constant alternative, which has an impact on the efficiency of the selection. 3. A  Let L(u)=E(X XT |U=u), L (u)=E(Z XT |U=u), L (u)=E(Z ZT |U=u), 1 1 1 1 1 2 1 1 and let f (u) be the density function of U. T 1. Under the conditions in the Appendix, if h  0 and log h/(nh3)  0, then ˘,U ˘,Z ˘ )=O (h3), bias (b@ |X P ˘,U ˘,Z ˘ )=s2n−1[E{L (U ) f (U )−L (U )L(U )−1L (U )T f (U )}]−1 var (b@ |X 2 1 1 1 1 1 1 1 1 ×E{L (U ) f 2(U )−L (U )L(U )−1L (U )T f 2(U )} 2 1 1 1 1 1 1 1 1 ×[E{L (U ) f (U )−L (U )L(U )−1L (U )T f (U )}]−1{1+O (h)}. 2 1 1 1 1 1 1 1 1 P Remark 1. As b@ is a linear function of Y, its asymptotic distribution is normal. Remark 2. If U is uniformly distributed, then 1 ˘,U ˘,Z ˘ )=s2n−1(EV )−1{1+O (h)}, var (b@ |X P where V=L (U )−L (U )L(U )−1L (U )T. 2 1 1 1 1 1 1 For semiparametric models, since L(U )=1, we have V=cov (Z |U ). Therefore, 1 1 1 @ ˘ ˘ ˘ n var (b|X, U, Z)=s2{E cov (Z |U )}−1{1+O (h)}. 1 1 P Furthermore, if U is independent of Z , then 1 1 ˘,U ˘,Z ˘ )=s2{cov (Z )}−1{1+O (h)}. n var (b@ |X 1 P This implies that the proposed estimator is as efficient as the least squares estimator for the standard linear model.

Semivarying-coeYcient models

669

We now turn to the asymptotic behaviour of the crossvalidation method of the proposed model selection procedure. Since the convergence rate of b@ is n−1/2 when the bandwidth is taken to be the optimal bandwidth O(n−1/5), we can see, from the proof of Theorem 2, that asymptotically the crossvalidation sum of squares of the proposed estimation procedure is the same as that of Fan & Zhang’s (1999) method for a standard varying-coefficient model, which is model (1·4) with either known b or q=0. Thus, we only need to present the crossvalidation sum of squares of Fan & Zhang’s (1999) method for the model (1·4) with q=0. T 2. Under the conditions in the Appendix and if q=0, that is S is empty, if l nDh3  0 and nh/log n  2, the crossvalidation sum of squares of Fan & Zhang’s (1999) method is =n−1eTe+4−1n2 E{XT a◊(U )}2h4+n−1h−1m s2E{ f (U )−1XT L(U )−1X } 2 1 1 0 1 1 1 1 +o (n−1h−1+h4), P where a◊(.)=(a◊ (.), . . . , a◊ (.))T. 1 p Remark 3. In the formula for  in Theorem 2, the first term represents the effect of random error, the second term reflects the bias of the corresponding estimator, and the third term is the variance of the estimator. If we treat model (1·4) as a varying-coefficient model and appeal to the method proposed by Fan & Zhang (1999), the resulting value of  is n−1eTe+4−1n2 E{XT a◊(U )}2h4 2 1 1 +n−1h−1m s2E{ f (U )−1(XT , ZT )G−1(XT , ZT )T}+o (n−1h−1+h4), 0 1 1 1 1 1 P because the second derivative of constant b is zero. Here

A

B

L(U ) L (U )T 1 1 1 . L (U ) L (U ) 1 1 2 1 As expected, there is no difference between the bias terms if we treat model (1·4) as a varying-coefficient model or as a semivarying-coefficient model. However, simple calculation gives G=

(XT , ZT )G−1(XT , ZT )T−XT L(U )−1X =dV−D{L (U )L(U )−1X −Z }d2>0, 1 1 1 1 1 1 1 1 1 1 1 1 where djd2=jTj, which indicates that the variance coming from treating model (1·4) as a varying-coefficient model is larger, and the increment is detectable up to O(n−1h−1); this leads to an increment in  which is detectable up to O(n−1 h−1). This means that a model selection procedure based on  is sensible since O(n−1h−1) is the dominant term besides the random error term n−1eTe, which does not change with the model. The above discussion leads to the next theorem, which indicates that the proposed crossvalidation procedure yields consistent model selection. T 3. Under the conditions in the Appendix, if nDh3  0 and nh/log n  2, then lim pr (S =S )=1, l0 0 n2 where S is the minimiser of (S ) defined in § 2·3, and S is the true model. l0 l 0

670

Y. X, W. Z  H. T

4. A   Our simulations give us some insight into how the method depends on the sample size n and the signal-to-noise ratio for finite datasets. The models are designed to change with the sample size, so that we can further check the local ‘power’ of our model selection method. We do not have prior knowledge of which coefficients are varying and which are not, so that all the covariates here are denoted by x with different subscripts. Recall that our model can be written as P−L P y= ∑ a (U)x + ∑ b x +e. (4·1) j j j j j=1 j=P−L+1 In all the calculations, the crossvalidation bandwidth choice and the Epanechnikov kernel are used. The crossvalidation bandwidth is chosen as follows. For every possible h, we first estimate b as given in § 2. Denote the estimator by b@ . For each l, we minimise h n min ∑ [y −{bT +(U −U )cT }X −b@ T Z ]2K (U −U ) (l=1, 2, . . . , n). l i l l i h i h i l al,bl i=1 i iNl Suppose that the minimum is achieved at (b@ (h), c@ (h)). The bandwidth is chosen to be l l n h@ =arg min ∑ {y −b@ T (h)X −b@ T Z }2. l l h l h l=1 l As we mentioned in § 2·2, implementation of the crossvalidation model selection procedure can be very time-consuming when P is large. In fact, we need to estimate 2P models with different combinations. If we estimate model (4·1) assuming that all the coefficients are varying, we have, by Theorem 2, that min {a@ (U)}=c+o (1), max {a@ (U)}=O {h2+(nh)−D}, l p l p P−L+1∏l∏P 1∏l∏P−L where c>0 and ‘’ denotes standard deviation. Thus, (4·2) min {a@ (U)}. max {a@ (U)}< l l P−L+1∏l∏P 1∏l∏P−L Inequality (4·2) tells us that estimators of constant coefficients always have smaller standard deviation than those of varying coefficients. Exploiting this fact, we suggest that the following simplified crossvalidation model selection procedure is worth exploring for large P. Step 1. Set L =0, so that all the coefficients are varying. For each i=1, . . . , n, estimate model (4·1) based on {(X , y ), lNi, l=1, 2, . . . , n} and obtain estimates l l a@ ci (u), a@ ci (u), . . . , a@ ci (u) using the method of Fan & Zhang (1999). Calculate 1 2 n P v =[n−1W {a@ ci (u )−m }2]D, for j=1, . . . , P, where m =n−1W n a@ ci (u ), and j ij ij i=1 j i i=1 j i 2 n P (0)=n−1 ∑ y − ∑ a@ ci (U)x . i j ij i=1 j=1 Step 2. Look for the minimum value among {v , j=1, . . . , P−L }, v say, and j P−L then set L =L +1. We estimate model (4·1) using the method in § 2. Obtain esti(u), b@ , . . . , b@ based on all the observations and mates a@ (u), a@ (u), . . . , a@ P−L P−L+1 P 2 1

q

r

Semivarying-coeYcient models 671 a@ ci (u), a@ ci (u), . . . , a@ ci (u), b@ ci , . . . , b@ ci based on {(X , y ), lNi, l=1, 2, . . . , n} using 1 2 P−L P−L+1 P l l the proposed method. Calculate v ={a (u ), i=1, 2, . . . , n}, j j i 2 n P−L P , (L )=n−1 ∑ y − ∑ a@ (U)x − ∑ b@ x j ij i j ij i=1 j=1 j=P−L+1 2 n P−L P . (L )=n−1 ∑ y − ∑ a@ ci (U)x − ∑ b@ ci x i j ij j ij i=1 j=1 j=P−L+1 Step 3. If (L )

Suggest Documents