Sankhya B (November 2010) 72:189–201 DOI 10.1007/s13571-011-0009-9

Improved prediction in finite population sampling using convex combination of parametric and non-parametric models

N. G. N. Prasad · Subhash R. Lele

Received: 14 October 2009 / Revised and Accepted: 14 July 2010 / Published online: 10 May 2011 © Indian Statistical Institute 2011

Abstract We consider inference for sampling from a finite population when information on auxiliary variables is available. In such situations it is well known that both model-based and model-assisted approaches perform better than the purely design-based approach, provided the assumed model that links the study variable and the auxiliary variables is appropriate. An approach based on non-parametric regression is robust against model misspecification and performs well when the sample size is large. However, for small to medium sample sizes a parametric model-based or model-assisted approach performs better, even if the assumed parametric model is not the correct one. In this paper, we propose a compromise approach that considers a convex combination of a parametric and a non-parametric model, where the weight for the parametric model is determined by its adequacy. We determine the optimal weight by minimizing the cross-validation based prediction error. We illustrate the use of this idea for the stratified simple random sampling design and the probability proportional to size sampling design. Using simulations, we show that our approach provides predictions that are better than both the purely parametric model based and the purely non-parametric model based predictions.

Keywords Bias-variance tradeoff · Design based · Model based · Model assisted · Super-population · Semi-parametric regression

N. G. N. Prasad (corresponding author) · S. R. Lele, Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, Canada T6G–2G1, e-mail: [email protected]


1 Introduction

In finite population sampling, interest lies either in predicting the values of the individual unobserved units or in predicting the population or strata totals (Valliant et al. 2000). Simple estimators of population or strata totals are given by their sample analogues. These estimators are known to be both design unbiased and model consistent (Thompson 1997). In many situations, in addition to the observations on the quantity of interest, some auxiliary variables are also available. This information can be exploited to improve the performance of the simple estimators based on the sample strata means and totals (Valliant et al. 2000; Sarndal et al. 1992), and it can also be used to predict the values of the individual unobserved units. However, this improvement comes at a price: one needs to stipulate a model that relates the covariates to the response variable. Unfortunately, if the presumed model is inappropriate, model-based approaches can falter (Hansen et al. 1983). One can instead use a non-parametric regression model that assumes little structure beyond some smoothness (Breidt et al. 2005; Thompson 1997, p. 195). But if the sample size is small to medium, the variance of the estimated non-parametric regression function can be substantial. On the other hand, if the sample size is small, the parametric model, although inappropriate and thus biased, is estimated with higher accuracy. It thus stands to reason that for small to medium sample sizes an inappropriate parametric model might be more useful than a non-parametric model. In this paper, we consider a compromise between the fully parametric and the fully non-parametric model: a convex combination of the parametric model and the non-parametric model in which the weight for the parametric model is determined by its adequacy, so that if the adequacy of the parametric model is high, its weight is high, and vice versa. We determine the optimal weight by minimizing the cross-validation based prediction error.
Using theoretical calculations, we show that as the sample size increases, the weight of the assumed model converges to 1 if the model holds; on the other hand, if the model is not correct, its weight converges to 0. For small to medium sample sizes, the weight reflects the bias-variance trade-off. If the sample size is small, a parametric model can be estimated accurately but is biased if it is not the true model; on the other hand, non-parametric model estimation has smaller bias but is highly variable. Thus, for small to medium sample sizes, even an incorrect parametric (read, simple) model might be preferable to a correct non-parametric (read, complex) model. We let the data determine the appropriate combination of the simple and the complex model. We illustrate the use of this convex combination model for predicting population totals under the stratified simple random sampling and probability proportional to size sampling designs. Using simulations, we show that our approach provides predictions that are better than both the purely parametric model based predictions and the purely non-parametric model based predictions.


2 Notation and assumptions

Let $Y_1, Y_2, \ldots, Y_N$ denote the finite population values and $X_1, X_2, \ldots, X_N$ the associated covariates. Let $Y_1, Y_2, \ldots, Y_n$ denote the sample and $X_1, X_2, \ldots, X_n$ the associated covariates. Let $m(x, \beta)$ denote the parametric model that relates the covariates to the responses; we assume that $\beta$ is a finite-dimensional parameter. Let $k(x)$ denote the kernel regression model. We assume that the response variables and the covariates are related by the model $Y_i = f(X_i) + \epsilon_i$, where $f(\cdot)$ is a smooth function and the $\epsilon_i$ are independent, identically distributed random variables with mean zero and finite variance $\sigma^2$. The parametric model $m(x, \beta)$ need not be the same as $f(\cdot)$.

2.1 Parametric model, non-parametric model and their convex combination

Agresti (1990, p. 183) discusses the gain in inferential power obtained by using an adequate (but not necessarily correct) parametric model in the context of categorical data analysis. He shows that if the sample size is small to medium, it is often better to use a (potentially incorrect) parametric model than a fully non-parametric model. Altham (1984) provides similar results in a more general context. To combine the efficiency of the parametric model with the robustness of the non-parametric approach, Olkin and Spiegelhalter (1987) used a convex combination of a parametric model and a non-parametric model in the context of density estimation. In the context of categorical data analysis, Rudas et al. (1994) used a convex combination of a parametric model and a non-parametric model. In this paper, we propose to use such a convex combination in the regression context and show its utility for inference in finite population sampling.
Instead of fitting only the parametric or only the non-parametric model to the data, we fit the convex combination

$$\pi_n^* \, m(X, \beta) + (1 - \pi_n^*)\, k(X),$$

where $m(x, \beta)$ denotes the working parametric model and $k(X)$ denotes the non-parametric model. Below we give an algorithmic description of the procedure for estimating the weight $\pi_n^*$; the solution is justified in the Appendix.

Algorithm for the computation of the optimal convex combination:

Step 1: Fit the parametric model using any of the standard methods, for example, maximum likelihood or least squares. The main requirement on the method is that it provide consistent estimators of the parameters when the model is correct. If the model is incorrect, this means that we obtain the parameter value that minimizes the divergence between the true model and the parametric model (White 1981).

Step 2: Fit the non-parametric regression model using any standard method. The main requirement on the method is that it provide a consistent estimator of the regression function; see Hardle (1990) for a description of such methods. In the case of multiple covariates, one may consider methods such as projection pursuit regression (Friedman and Stuetzle 1981) instead of standard kernel regression.

Step 3: Determine the optimal convex combination by minimizing the cross-validation prediction error under the constraint $\pi_n^* \in [0, 1]$. That is, obtain the value of $\pi_n^*$ that minimizes

$$\sum_{i=1}^{n} \Big[ Y_i - \Big\{ \pi_n^* \, m\big(X_i, \hat\beta_{(-i)}\big) + \big(1 - \pi_n^*\big)\, \hat k_{(-i)}(X_i) \Big\} \Big]^2,$$

where $m(X_i, \hat\beta_{(-i)})$ and $\hat k_{(-i)}(X_i)$ denote the corresponding model fits after deleting the $i$-th observation. In this paper we consider only the squared error loss for the cross-validation prediction error. One may consider other loss functions, such as the absolute error or the Huber loss (Huber 2003). Similarly, instead of the unweighted cross-validation prediction error, one may use a weighted cross-validation prediction error, with weights proportional to the sampling design weights, to find the optimal value of $\pi_n^*$. We illustrate the use of both unweighted and weighted cross-validation prediction errors in the simulation study.

The following results are proved in the Appendix.

(a) If the parametric model is the true model, its bias is zero. Hence, consistency of the estimator implies that the variance, and therefore the mean squared error, converges to zero as the sample size increases. The rate at which the mean squared error of the parametric model converges to zero is usually faster than that of the non-parametric model. Hence, as $n \to \infty$, $\pi_n^* \to 1$.

(b) If the parametric model is not the true model, the mean squared error of the parametric model does not converge to zero, whereas that of the non-parametric model does; hence, as $n \to \infty$, $\pi_n^* \to 0$. How quickly $\pi_n^* \to 0$ (not the rate, but the constant) depends on the bias and variance of the parametric model (its MSE) relative to the MSE of the non-parametric model. If the parametric model is adequate, this ratio is small and the convergence is slow (thus retaining the power of the parametric model for longer), and vice versa.

In the following we apply the convex combination of the parametric and the non-parametric regression model in the finite population sampling context. We show that one can obtain model robustness through the use of the non-parametric model and statistical efficiency through the use of the parametric model.
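The three steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' code: the working model is a straight line fitted by least squares, the non-parametric fit is a Nadaraya–Watson estimator with a Gaussian kernel and a fixed, hand-picked bandwidth (the paper does not specify a kernel or bandwidth), and all function names (`fit_linear`, `kernel_fit`, `loo_fits`, `optimal_weight`, `convex_predict`) are hypothetical. Because the cross-validation criterion is quadratic in $\pi_n^*$, its constrained minimizer is the unconstrained one clipped to $[0, 1]$.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares fit of the working model m(x, beta) = beta0 + beta1 * x."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def kernel_fit(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel, evaluated at x_eval."""
    u = (np.asarray(x_eval)[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w @ y_train) / w.sum(axis=1)


def loo_fits(x, y, bandwidth):
    """Steps 1 and 2 refitted with the i-th observation deleted, for every i."""
    n = len(x)
    m_loo, k_loo = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta = fit_linear(x[keep], y[keep])
        m_loo[i] = beta[0] + beta[1] * x[i]
        k_loo[i] = kernel_fit(x[keep], y[keep], x[i:i + 1], bandwidth)[0]
    return m_loo, k_loo


def optimal_weight(x, y, bandwidth=0.5):
    """Step 3: minimize sum_i (y_i - pi*m_i - (1 - pi)*k_i)**2 over pi in [0, 1].

    With r = y - k and d = m - k the criterion is ||r - pi*d||**2, a convex
    quadratic in pi, so the constrained minimizer is (r.d)/(d.d) clipped to [0, 1].
    """
    m_loo, k_loo = loo_fits(x, y, bandwidth)
    r, d = y - k_loo, m_loo - k_loo
    dd = d @ d
    if dd == 0.0:  # both fits coincide, so every pi gives the same error
        return 1.0
    return float(np.clip((r @ d) / dd, 0.0, 1.0))


def convex_predict(x_s, y_s, x_new, pi, bandwidth=0.5):
    """pi * m(x, beta_hat) + (1 - pi) * k_hat(x), both fitted to the whole sample."""
    beta = fit_linear(x_s, y_s)
    k_hat = kernel_fit(x_s, y_s, x_new, bandwidth)
    return pi * (beta[0] + beta[1] * x_new) + (1 - pi) * k_hat


rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
noise = rng.normal(0, np.sqrt(5), 100)
y_lin = 5 + 1.5 * x + noise          # linear working model is correct
y_sin = 5 + 5 * np.sin(x) + noise    # linear working model is wrong
pi_linear = optimal_weight(x, y_lin)
pi_sine = optimal_weight(x, y_sin)
print(f"pi (linear truth) = {pi_linear:.2f}, pi (sine truth) = {pi_sine:.2f}")
print(convex_predict(x, y_lin, np.array([2.0, 8.0]), pi_linear))
```

The explicit loop mirrors the "delete the $i$-th observation" wording of Step 3; for large samples the same leave-one-out fits can be obtained in closed form.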

3 Simulation based performance comparison

In the following, we study the finite-sample behavior, in terms of MSE, of the estimator (predictor) of the finite population total based on the convex combination idea described in the previous section. For simplicity we consider only the case of a single covariate, $X$. Two common sampling designs, stratified simple random sampling and probability proportional to size sampling, are considered for illustration.

(1) The stratified simple random sampling design

In the simulation study, covariate values were generated from the uniform distribution on the range 0–10. The general model used was

$$Y_i = m(X_i, \beta) + g(X_i) + \epsilon_i, \qquad i = 1, 2, \ldots, N,$$

where the $\epsilon_i$ are normally distributed random variables with mean zero and variance 5. We generated four finite populations $Y_1, Y_2, \ldots, Y_N$, each of size $N = 500$, according to the following four scenarios.

1. Population A: $m(X_i, \beta) = \beta_0 + \beta_1 X_i$ with $\beta_0 = 5$ and $\beta_1 = 1.5$, and $g(X_i) = 0$. For this population, the linear regression model is the true model.
2. Population B: $m(X_i, \beta) = \beta_0 + \beta_1 X_i$ with $\beta_0 = 5$ and $\beta_1 = 1.5$, and $g(X_i) = 0.5 \sin(X_i)$. For this population, a non-linear component is added to the linear regression model, but it is not very large.
3. Population C: $m(X_i, \beta) = \beta_0 + \beta_1 X_i$ with $\beta_0 = 5$ and $\beta_1 = 1.5$, and $g(X_i) = 5 \sin(X_i)$. For this population, a large non-linear component is added to the linear regression model.
4. Population D: $m(X_i, \beta) = 5$ and $g(X_i) = 5 \sin(X_i)$. For this population, the model is completely non-linear.

Each of the above populations is stratified into $L = 5$ strata of 100 units each, based on the covariate values. For each population, a stratified simple random sampling design was used to draw a sample of size $n_h$ from the $h$-th stratum. Two choices, $n_h = 20$ and $n_h = 40$, were used in the simulation study. For a given sample, we used the following five estimators (predictors) of the population total $T = \sum_{i=1}^{N} Y_i$, where $s_h$ denotes the sample and $U_h$ the set of all units in stratum $h$:

(i) Design-based estimator (Cochran 1977): $\hat T_{N,D} = \sum_{h=1}^{L} N_h \bar y_h$.

(ii) Model-based predictive inference (Valliant et al. 2000):
$$\hat T_{N,P} = \sum_{h=1}^{L} n_h \bar y_h + \sum_{h=1}^{L} \sum_{i \notin s_h} m(x_{hi}, \hat\beta).$$

(iii) Model-assisted inference (Sarndal et al. 1992):
$$\hat T_{N,A} = \sum_{h=1}^{L} \frac{N_h}{n_h} \sum_{i \in s_h} \big\{ y_{hi} - m(x_{hi}, \hat\beta) \big\} + \sum_{h=1}^{L} \sum_{i \in U_h} m(x_{hi}, \hat\beta).$$

(iv) Kernel regression based predictor (Thompson 1997): $\hat T_{NK} = \sum_{i \in s} Y_i + \sum_{i \notin s} \hat K(X_i)$.

(v) Convex combination based predictor:
$$\hat T_{NW} = \sum_{i \in s} Y_i + \sum_{i \notin s} \big\{ \pi_n^* \, m(X_i, \hat\beta) + (1 - \pi_n^*)\, \hat K(X_i) \big\}.$$
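The stratified design and estimators (i)–(iii) can be sketched as follows with NumPy. This is an illustration under stated assumptions, not the authors' simulation code: the populations are read as $Y = m + g + \epsilon$, the working line is fitted by ordinary least squares pooled across strata, and the seed, the number of replicates and all function names are hypothetical. The summary printed at the end is the percent absolute relative deviation used in Table 1; estimators (iv) and (v) would replace the fitted line for the non-sampled units by the kernel or convex-combination fit.

```python
import numpy as np

rng = np.random.default_rng(2010)


def make_population(kind, N=500):
    """Populations A-D: Y = m(X) + g(X) + eps, X ~ U(0, 10), Var(eps) = 5."""
    x = rng.uniform(0, 10, N)
    eps = rng.normal(0, np.sqrt(5), N)
    mean = {
        "A": 5 + 1.5 * x,
        "B": 5 + 1.5 * x + 0.5 * np.sin(x),
        "C": 5 + 1.5 * x + 5 * np.sin(x),
        "D": 5 + 5 * np.sin(x),
    }[kind]
    return x, mean + eps


def stratify(x, L=5):
    """Label units 0..L-1 by cutting the sorted covariate into L equal groups."""
    labels = np.empty(len(x), dtype=int)
    labels[np.argsort(x)] = np.repeat(np.arange(L), len(x) // L)
    return labels


def stratified_srs(labels, n_h):
    """Simple random sample without replacement of n_h units in every stratum."""
    return np.concatenate([
        rng.choice(np.flatnonzero(labels == h), size=n_h, replace=False)
        for h in np.unique(labels)
    ])


def estimators(x, y, labels, s):
    """Estimators (i)-(iii) of the population total from the sample s."""
    design = np.column_stack([np.ones(len(s)), x[s]])
    beta, *_ = np.linalg.lstsq(design, y[s], rcond=None)
    m_hat = beta[0] + beta[1] * x            # fitted working line for all N units
    out = np.ones(len(x), dtype=bool)
    out[s] = False                           # non-sampled units
    t_design = t_assisted = 0.0
    for h in np.unique(labels):
        in_h = labels == h
        s_h = s[labels[s] == h]
        N_h, n_h = in_h.sum(), len(s_h)
        t_design += N_h * y[s_h].mean()
        t_assisted += m_hat[in_h].sum() + N_h / n_h * (y[s_h] - m_hat[s_h]).sum()
    t_model = y[s].sum() + m_hat[out].sum()
    return {"design": t_design, "model": t_model, "assisted": t_assisted}


x, y = make_population("C")
labels = stratify(x)
T = y.sum()
errors = {"design": [], "model": [], "assisted": []}
for _ in range(500):
    s = stratified_srs(labels, n_h=20)
    for name, t_hat in estimators(x, y, labels, s).items():
        errors[name].append(100 * abs((t_hat - T) / T))  # percent abs. relative deviation
for name, e in errors.items():
    print(f"{name:9s} min={min(e):.2f} max={max(e):.2f} mean={np.mean(e):.2f}")
```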

The summary numbers on the percent absolute relative deviation, $100 \times |(\text{Predictor} - T)/T|$, namely the minimum, maximum and average values taken over 500 simulated samples for the four populations, are presented in Table 1.

Table 1 Summary statistics (min, max and mean) for percent relative absolute error ($100 \times |\text{predictor} - \text{true value}|/\text{true value}$) based on 500 Monte Carlo runs

| Population | Method | Min (n = 30) | Max (n = 30) | Mean (n = 30) | Min (n = 50) | Max (n = 50) | Mean (n = 50) |
|---|---|---|---|---|---|---|---|
| A | Convex combination | 2.31 | 3.99 | 3.04 | 2.14 | 3.21 | 2.81 |
| A | Linear model-based | 2.3 | 4.01 | 3.01 | 2.1 | 3.12 | 2.79 |
| A | Kernel regression | 2.33 | 4.11 | 3.45 | 2.15 | 3.21 | 2.83 |
| B | Convex combination | 2.4 | 4.03 | 3.11 | 2.15 | 3.93 | 2.69 |
| B | Linear model-based | 2.43 | 4.11 | 3.09 | 2.33 | 4.11 | 2.9 |
| B | Kernel regression | 2.42 | 4.1 | 3.31 | 2.12 | 3.81 | 2.63 |
| C | Convex combination | 3.44 | 4.71 | 3.77 | 2.84 | 3.91 | 3.14 |
| C | Linear model-based | 3.91 | 5.89 | 4.93 | 3.01 | 4.29 | 3.39 |
| C | Kernel regression | 3.33 | 4.69 | 3.71 | 2.83 | 3.89 | 3.13 |
| D | Convex combination | 3.74 | 4.83 | 3.86 | 3.14 | 3.93 | 3.46 |
| D | Linear model-based | 4.92 | 6.01 | 5.42 | 4.22 | 5.81 | 4.99 |
| D | Kernel regression | 3.71 | 4.79 | 3.82 | 3.11 | 3.89 | 3.42 |

Based on the entries in Table 1 we make the following observations. In the following, by working model we mean the linear regression model. For Population A, where the working model is identical to the one used to generate the population, the linear model based estimator (model based or model assisted) is, as expected, better than the kernel regression based estimator as well as the design based estimator. Notice, however, that the convex combination based estimator is almost as good as the linear model based estimator. For Population B, where the working model is slightly different from the population model, the convex combination method is superior to both the linear model based and the kernel regression based approaches. For Populations C and D, where the linear model is quite inappropriate,


the convex combination method and the non-parametric method have small percent absolute relative deviations, comparable to each other and smaller than those of the linear model-based method. The differences are negligible when $n_h = 40$: with a larger sample size we can get away with fewer model assumptions and use the kernel regression or the purely design-based estimator. The convex combination estimator behaves like a compromise between the model-based and the non-parametric estimators: it borrows robustness from the non-parametric method and efficiency from the model-based approach.

(2) The probability proportional to size sampling (PPS) design

Now we consider the probability proportional to size (PPS) design. In this design the probability that the $i$-th individual is selected in the sample is proportional to its covariate value $X_i$. We generated finite populations from the model

$$Y_i = m(X_i, \beta) + g(X_i) + \epsilon_i, \qquad i = 1, 2, \ldots, N,$$

where the $\epsilon_i$ are normally distributed random variables with mean zero and variance 2, and the $X_i$ are generated from a gamma distribution with scale 10 and location 100. We generated two finite populations, each of size $N$, according to the following two scenarios:

1. Population A1: $m(X_i, \beta) = \beta_0 + \beta_1 X_i$ with $\beta_0 = 5$ and $\beta_1 = 1.5$, and $g(X_i) = 0$. For this population, the linear regression model is the true model.
2. Population B1: $m(X_i, \beta) = 200 + 2.5(X_i - 8)$ and $g(X_i) = 3.5(X_i - 20)^2 - 1.5 X_i^3 + 2.5(X_i - 5)^4 + 4.8 \exp\{-0.8(X_i - 5)\}$. For this population, the true model is completely non-linear.

Two sample sizes, $n = 30$ and $n = 50$, were considered for estimating the population total $T = \sum_{i=1}^{N} Y_i$. The estimators (predictors) of the population total under the different approaches are as follows:

(i) Purely design-based estimator: $\hat T_{N,D} = \sum_{i \in s} \dfrac{y_i}{n p_i}$.

(ii) Model-assisted estimator:
$$\hat T_{NMD} = \sum_{i \in s} \frac{y_i - m(X_i, \hat\beta)}{n p_i} + \sum_{i=1}^{N} m(X_i, \hat\beta).$$

(iii) Parametric model-based predictor: $\hat T_{NM} = \sum_{i \in s} Y_i + \sum_{i \notin s} m(X_i, \hat\beta)$.

(iv) Kernel regression based predictor: $\hat T_{NK} = \sum_{i \in s} Y_i + \sum_{i \notin s} \hat K(X_i)$.

(v) Convex combination based predictor:
$$\hat T_{NW} = \sum_{i \in s} Y_i + \sum_{i \notin s} \big\{ \pi_n^* \, m(X_i, \hat\beta) + (1 - \pi_n^*)\, \hat K(X_i) \big\}.$$

In the above, the working model is a linear model, $\hat\beta$ is the weighted least squares estimator of $\beta$, and $p_i = X_i / \sum_{j=1}^{N} X_j$.
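The PPS estimators (i)–(iii) and the two summary measures of Table 2 can be sketched as below. Several details are assumptions rather than the paper's settings: draws are made with replacement (so $\sum_{i \in s} y_i/(n p_i)$ is the Hansen–Hurwitz estimator), the population size and gamma parameters are invented because the text does not state them fully, the weighted least squares weights are taken as $1/p_i$, the model-based predictor uses the distinct sampled units, and absolute bias is computed as the distance between the average prediction and $T$. Relative efficiency is MSE(design)/MSE(predictor).

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population in the spirit of Population A1 (N and the gamma
# parameters are assumptions, not the paper's settings).
N = 1000
x = rng.gamma(shape=5.0, scale=10.0, size=N)
y = 5 + 1.5 * x + rng.normal(0, np.sqrt(2), N)
T = y.sum()
p = x / x.sum()                                       # p_i = X_i / sum_j X_j


def pps_estimates(n):
    draws = rng.choice(N, size=n, replace=True, p=p)  # PPS with replacement
    w = 1.0 / (n * p[draws])                          # Hansen-Hurwitz weights
    # Weighted least squares for the working line, weights 1/p_i (assumption).
    design = np.column_stack([np.ones(n), x[draws]])
    sw = np.sqrt(1.0 / p[draws])
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y[draws] * sw, rcond=None)
    m_hat = beta[0] + beta[1] * x
    s = np.unique(draws)                              # distinct sampled units
    out = np.ones(N, dtype=bool)
    out[s] = False
    return {
        "design": np.sum(w * y[draws]),
        "assisted": np.sum(w * (y[draws] - m_hat[draws])) + m_hat.sum(),
        "model": y[s].sum() + m_hat[out].sum(),
    }


runs = [pps_estimates(n=30) for _ in range(500)]
est = {k: np.array([r[k] for r in runs]) for k in runs[0]}
mse = {k: np.mean((v - T) ** 2) for k, v in est.items()}
for k, v in est.items():
    abs_bias = abs(v.mean() - T)
    rel_eff = mse["design"] / mse[k]
    print(f"{k:8s} |bias| = {abs_bias:8.2f}  relative efficiency = {rel_eff:.3f}")
```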


Absolute bias, $|\text{Predictor} - \sum_{i=1}^{N} Y_i|$, and relative efficiency with respect to the pure design-based estimator, MSE(predictor under pure design)/MSE(predictor), were computed from 500 Monte Carlo runs. Summary results of this simulation study are given in Table 2. Clearly, when the parametric model is appropriate, the model-based predictive approach works best. Notice, however, that the convex combination approach works substantially better than the kernel regression based approach, and it is comparable to the model-assisted approach in this situation. On the other hand, when the working model is inappropriate, the convex combination based approach is clearly better than the model-based approach. More significantly, it is also superior to both the kernel regression based approach and the model-assisted approach: it has smaller absolute bias and higher relative efficiency. This shows that the convex combination approach is both robust and efficient.

Table 2 Monte Carlo estimates of absolute bias and relative efficiency based on 500 runs for the probability proportional to size sampling design under population models A and B

| Population | Sample size | Measure | Model-assisted | Pure model | Kernel | Proposed |
|---|---|---|---|---|---|---|
| A | 30 | Absolute bias | 19.14 | 14.24 | 28.32 | 21.34 |
| A | 30 | Relative efficiency | 1.34 | 1.42 | 1.16 | 1.33 |
| A | 50 | Absolute bias | 12.34 | 10.23 | 21.81 | 12.41 |
| A | 50 | Relative efficiency | 1.28 | 1.39 | 1.17 | 1.31 |
| B | 30 | Absolute bias | 91.82 | 100.45 | 33.5 | 31.71 |
| B | 30 | Relative efficiency | 0.7572 | 0.6556 | 1.1279 | 1.84 |
| B | 50 | Absolute bias | 65.13 | 62.85 | 29.35 | 22.16 |
| B | 50 | Relative efficiency | 0.9112 | 0.8131 | 1.3215 | 1.9613 |

In Table 3 we provide descriptive statistics of the weight $\pi_n$ over the 500 simulations. When the working model is identical to the true model, the average weight for the working model is 0.91, whereas when the working model is inappropriate, the average weight for the parametric model is 0.18. This illustrates that the data-based determination of the weight described in Section 2 is sensible.

Table 3 Summary statistics (mean, median and standard deviation) for $\pi_n$ over 500 simulated samples

| Population | n | Mean | Median | Standard deviation |
|---|---|---|---|---|
| A | 30 | 0.911 | 0.92 | 0.215 |
| A | 50 | 0.921 | 0.91 | 0.221 |
| B | 30 | 0.182 | 0.1 | 0.185 |
| B | 50 | 0.173 | 0.11 | 0.203 |

3.1 Performance comparison for a complex survey design based on a real data set

Similar to the study conducted by Royall and Cumberland (1981), we now compare the performance of different predictors in a more realistic situation where the true model is unknown. We use data collected by the California State Department of Education. These data contain the annual Academic Performance Index (API) of schools, which quantifies the overall rating of a school. One of the goals of collecting these data is to compare the performance of different regions in terms of the total API of a region. To obtain the total API score of a region, one can either conduct a census of all schools in the region or conduct a sample survey of the schools in the region and predict the total API score of the region from the sample. A reasonable sampling design is a two-stage design in which school districts are the primary sampling units and schools within a district are the secondary sampling units. In the following we compare the performance of the convex combination approach with that of the other approaches for predicting the total API of a single region under this two-stage sampling design.

As our population, we consider the California State region (http://www.cde.ca.gov/ta/ac/ap/), containing 757 districts and a total of 6,194 schools. The academic performance of a school is generally related to the education of the parents: the higher the level of parental education, the better the API is expected to be. The percentage of parents with post-graduate education is available for all schools in a given region, and we use it as auxiliary information to predict the API of the schools not in the sample. We selected a simple random sample of 50 districts from the 757 districts and, within each selected district, a simple random sample of ten schools. We use a linear model as our working model to relate the API score to the percentage of parents with post-graduate education. Based on this sample, we predict the mean API of the region, $\bar T = \frac{1}{N} \sum_{i=1}^{N} Y_i$, using model-assisted estimation, model-based prediction, kernel regression based prediction and convex combination based prediction, as follows. Here $W_i$ denotes the design weight of sampled school $i$.

(i) Model-assisted estimator:
$$\hat{\bar T}_{NMD} = \frac{1}{N} \Big[ \sum_{i \in s} W_i \{ Y_i - m(X_i, \hat\beta) \} + \sum_{i=1}^{N} m(X_i, \hat\beta) \Big].$$

(ii) Pure model predictor:
$$\hat{\bar T}_{NM} = \frac{1}{N} \Big[ \sum_{i \in s} Y_i + \sum_{i \notin s} m(X_i, \hat\beta) \Big].$$

(iii) Kernel predictor:
$$\hat{\bar T}_{NK} = \frac{1}{N} \Big[ \sum_{i \in s} Y_i + \sum_{i \notin s} \hat K(X_i) \Big].$$

(iv) Convex combination based predictor:
$$\hat{\bar T}_{NW} = \frac{1}{N} \Big[ \sum_{i \in s} Y_i + \sum_{i \notin s} \big\{ \pi_n \, m(X_i, \hat\beta) + (1 - \pi_n)\, \hat K(X_i) \big\} \Big].$$

These predictions were compared with the true mean API score for California State. To compare their performance, 500 Monte Carlo runs were used to compute predicted values of the mean API score from the above predictors, along with the pure design-based estimator (Cochran 1977). These predicted values were then used to compute the absolute bias and mean squared error of each predictor.

Table 4 Percent relative absolute bias and relative efficiency based on 500 Monte Carlo samples for PPS simulation study

| | Model-assisted | Model-based | Kernel | Proposed* | Proposed |
|---|---|---|---|---|---|
| Absolute bias (%) | 9.22 | 4.87 | 7.44 | 8.33 | 7.33 |
| Relative efficiency | 1.24 | 1.21 | 1.11 | 1.36 | 1.33 |

*Computed with design weights in estimating $\pi_n$.

The values reported for relative absolute bias and relative efficiency (defined as a ratio of mean squared errors) in Table 4 indicate that the model-based prediction approach works quite well, suggesting that the linear model is adequate. This is also reflected in the fact that the weight for the parametric model in the convex combination varied from 0.71 to 1. The model-assisted approach yields a somewhat larger bias with no gain in efficiency. The kernel regression approach yields a larger bias and some loss in efficiency, which is to be expected since the parametric model is adequate for these data. The convex combination approach, on the other hand, yields a somewhat larger bias than the model-based predictor but provides some increase in efficiency; it offers a trade-off between relative absolute bias and relative efficiency with respect to the pure design estimator. We also considered using design weights in determining the weight of the convex combination, but the effect was negligible: it increased the bias with no associated increase in efficiency.
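The two-stage design, and the design weights $W_i$ that enter the model-assisted estimator of the mean, can be illustrated on a synthetic stand-in for the API population. Only the 757 districts, the 50 sampled districts and the ten schools per district come from the text; the district sizes, the covariate distribution, the response model and the ordinary least-squares fit of the working line are all hypothetical, and districts with fewer than ten schools are taken in full (an assumption). Here $W_i = (M/m)(N_d/n_d)$ is the inverse of the two-stage inclusion probability.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the API population: only M = 757 comes from the text.
M = 757
sizes = rng.integers(2, 20, size=M)              # schools per district (invented)
district = np.repeat(np.arange(M), sizes)
N = len(district)
x = rng.uniform(0, 40, N)                        # % parents with post-graduate education
y = 600 + 5 * x + rng.normal(0, 30, N)           # API score (invented model)


def two_stage_sample(m=50, n_per=10):
    """SRS of m districts, then SRS of up to n_per schools in each (all if fewer)."""
    chosen = rng.choice(M, size=m, replace=False)
    idx, wts = [], []
    for d in chosen:
        schools = np.flatnonzero(district == d)
        k = min(n_per, len(schools))
        idx.append(rng.choice(schools, size=k, replace=False))
        # W_i = 1 / P(district selected) / P(school selected | district)
        wts.append(np.full(k, (M / m) * (len(schools) / k)))
    return np.concatenate(idx), np.concatenate(wts)


s, W = two_stage_sample()
beta = np.polyfit(x[s], y[s], 1)                 # working linear model (OLS)
m_hat = np.polyval(beta, x)
out = np.ones(N, dtype=bool)
out[s] = False

mean_true = y.mean()
mean_assisted = (np.sum(W * (y[s] - m_hat[s])) + m_hat.sum()) / N
mean_model = (y[s].sum() + m_hat[out].sum()) / N
print(f"true={mean_true:.1f} assisted={mean_assisted:.1f} model={mean_model:.1f}")
print(f"estimated N from weights = {W.sum():.0f} (true N = {N})")
```

As a sanity check, the weights summed over the sample estimate the number of schools $N$ without bias under this design.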

4 Summary

In this paper, we have shown that a compromise between a parametric and a non-parametric model can improve the performance of estimators of population totals under stratified simple random sampling and PPS designs. The utility of this approach is, of course, not limited to the sampling situation: it can be used whenever the goal of regression modelling is prediction rather than parameter estimation. The estimated weight could also be used to assess the adequacy of the parametric model. Incorporating sampling weights is an important problem in survey sampling. The method described in this paper can be extended to other complex survey designs as well as to small area estimation problems. We have deliberately avoided the philosophical discussion of whether model-assisted design-based inference or model-based prediction is inferentially correct. In this paper, we simply replace the model component in these approaches, whether model based or model assisted, by the convex combination of the parametric regression and the non-parametric regression. We show that using a convex combination of a parametric and a non-parametric model improves performance whichever philosophy one chooses to follow.

Acknowledgement This work was partially supported by grants from the Natural Sciences and Engineering Research Council of Canada.


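Appendix: Behavior of the weight $\pi_n$

Let us look at the theoretical analogue of the minimization criterion to understand what the algorithm for computing $\pi_n$ provides. Writing $\bar\pi = 1 - \pi$, consider

$$
\begin{aligned}
E_{X,Y}\big[Y - \pi m(X, \hat\beta) - \bar\pi \hat K(X)\big]^2
&= E_{X,Y}\big[Y - \pi m(X, \hat\beta) + \pi m(X, \beta) - \pi m(X, \beta) - \bar\pi \hat K(X) + \bar\pi K(X) - \bar\pi K(X)\big]^2 \\
&= E_{X,Y}\big[Y - \pi m(X, \beta) - \bar\pi K(X) - \pi\{m(X, \hat\beta) - m(X, \beta)\} - \bar\pi\{\hat K(X) - K(X)\}\big]^2 \\
&= E_{X,Y}\big[Y - K(X) - \pi\{m(X, \beta) - K(X)\} - \pi\{m(X, \hat\beta) - m(X, \beta)\} - \bar\pi\{\hat K(X) - K(X)\}\big]^2. \qquad (1)
\end{aligned}
$$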

Notice the following results:

(a) It follows from the model assumptions that $V_1 = E_{X,Y}\{Y - K(X)\}^2 = \sigma^2$.

(b) The mean squared difference between the true model and the parametric model is $V_2 = E_{X,Y}\{m(X, \beta) - K(X)\}^2 = O(1)$ (conditions: $X$ has a finite range and the mean functions are finite).

(c) The parametric model may or may not be the true model. Hence, following White (1981), let $\beta$ be defined as the value that minimizes $E_{X,Y}\{Y - m(X, \beta)\}^2$. It then follows that $V_3 = E_{X,Y}\{m(X, \hat\beta) - m(X, \beta)\}^2 = O(n^{-1})$ by the $\sqrt{n}$-consistency of the estimator $\hat\beta$.

(d) The commonly used non-parametric regression estimators are known to be $n^{4/10}$-consistent (Hardle 1990). It thus follows that $V_4 = E_{X,Y}\{\hat K(X) - K(X)\}^2 = O(n^{-8/10})$.

(e) Since $E_{Y|X}[Y - K(X) \mid X = x] = 0$, it follows that $C_1 = E_{X,Y}[\{Y - K(X)\}\{m(X, \beta) - K(X)\}] = 0$.

The following results follow from the Cauchy–Schwarz inequality (Cassella and Berger 2002):

(f) $C_2 = E_{X,Y}[\{Y - K(X)\}\{m(X, \hat\beta) - m(X, \beta)\}] \le \sqrt{O(1) \cdot O(n^{-1})} = O(n^{-1/2})$,

(g) $C_3 = E_{X,Y}[\{Y - K(X)\}\{\hat K(X) - K(X)\}] \le \sqrt{O(1) \cdot O(n^{-8/10})} = O(n^{-4/10})$,

(h) $C_4 = E_{X,Y}[\{m(X, \beta) - K(X)\}\{m(X, \hat\beta) - m(X, \beta)\}] \le \sqrt{O(1) \cdot O(n^{-1})} = O(n^{-1/2})$,

(i) $C_5 = E_{X,Y}[\{m(X, \beta) - K(X)\}\{\hat K(X) - K(X)\}] \le \sqrt{O(1) \cdot O(n^{-8/10})} = O(n^{-4/10})$,

(j) $C_6 = E_{X,Y}[\{m(X, \hat\beta) - m(X, \beta)\}\{\hat K(X) - K(X)\}] \le \sqrt{O(n^{-1}) \cdot O(n^{-8/10})} = O(n^{-0.9})$.

Case (a): Now suppose that the parametric model is not the correct model. In this case we can write, as $n \to \infty$,

$$\frac{1}{n} \sum_{i=1}^{n} \big\{ Y_i - \pi m(X_i, \hat\beta) - \bar\pi \hat K(X_i) \big\}^2 \approx E_{X,Y}\big\{ Y - \pi m(X, \hat\beta) - \bar\pi \hat K(X) \big\}^2 = \sigma^2 + \pi^2 V_2 + O(n^{-4/10}).$$

Hence, in this case the prediction error is minimized when $\pi_n \to 0$ as $n \to \infty$.

Case (b): Now suppose that the parametric model is the correct model. Then

$$\frac{1}{n} \sum_{i=1}^{n} \big\{ Y_i - \pi m(X_i, \hat\beta) - \bar\pi \hat K(X_i) \big\}^2 \approx E_{X,Y}\big\{ Y - \pi m(X, \hat\beta) - \bar\pi \hat K(X) \big\}^2 = \sigma^2 + (1 - \pi) C_3 + O(n^{-1/2}).$$

Moreover, as $n \to \infty$, $C_3 = \mathrm{Cov}(Y, \hat K(X)) > 0$. Thus it follows that the prediction error is minimized when $\pi_n \to 1$ as $n \to \infty$.
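A small simulation can illustrate the limiting behaviour derived above: the cross-validated weight should stay near 1 when the linear working model is correct and near 0 when it is not. This sketch uses closed-form leave-one-out fits (the hat-matrix identity for least squares, and a zeroed diagonal for a Gaussian-kernel Nadaraya–Watson fit) with a bandwidth of order $n^{-1/5}$; the kernel, the bandwidth constant, the two true mean functions, the seed and the number of replications are assumptions for illustration, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(11)


def cv_weight(x, y, bandwidth):
    """Cross-validated pi_n using closed-form leave-one-out fits."""
    # Least squares: the LOO prediction at X_i equals y_i - e_i / (1 - h_ii).
    design = np.column_stack([np.ones_like(x), x])
    hat = design @ np.linalg.solve(design.T @ design, design.T)
    resid = y - hat @ y
    m_loo = y - resid / (1.0 - np.diag(hat))
    # Nadaraya-Watson (Gaussian kernel): deleting i means zeroing its own weight.
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    np.fill_diagonal(w, 0.0)
    k_loo = (w @ y) / w.sum(axis=1)
    r, d = y - k_loo, m_loo - k_loo
    return float(np.clip((r @ d) / (d @ d), 0.0, 1.0))


def linear(x):
    return 5 + 1.5 * x          # the working model is correct


def sine(x):
    return 5 + 5 * np.sin(x)    # the working model is wrong


def average_weight(n, truth, reps=50):
    """Mean of pi_n over `reps` samples of size n from Y = truth(X) + eps."""
    weights = []
    for _ in range(reps):
        x = rng.uniform(0, 10, n)
        y = truth(x) + rng.normal(0, np.sqrt(5), n)
        weights.append(cv_weight(x, y, bandwidth=2.0 * n ** -0.2))
    return float(np.mean(weights))


results = {n: (average_weight(n, linear), average_weight(n, sine)) for n in (25, 100, 400)}
for n, (a, b) in results.items():
    print(f"n = {n:3d}: mean pi_n (linear truth) = {a:.2f}, (sine truth) = {b:.2f}")
```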

References

Agresti, A. 1990. Categorical data analysis. New York: Wiley.

Altham, P.M.E. 1984. Improving the precision of estimation by fitting a model. Journal of the Royal Statistical Society. Series B, Statistical Methodology 46:118–119.

Breidt, F.J., G. Claeskens, and J.D. Opsomer. 2005. Model-assisted estimation for complex surveys using penalized splines. Biometrika 92(4):831–846.

Cassella, G., and R.L. Berger. 2002. Statistical inference. 2nd edition. New York: Wiley.

Cochran, W.G. 1977. Sampling techniques. New York: Wiley.

Friedman, J., and W. Stuetzle. 1981. Projection pursuit regression. Journal of the American Statistical Association 76:817–823.

Hansen, M.H., W.G. Madow, and B.J. Tepping. 1983. An evaluation of model-dependent and probability-sampling inferences in sample surveys (with discussion). Journal of the American Statistical Association 78:776–807.

Hardle, W. 1990. Applied nonparametric regression. Cambridge: Cambridge University Press.

Huber, P.J. 2003. Robust statistics. New York: Wiley.

Olkin, I., and C.H. Spiegelhalter. 1987. A semiparametric approach to density estimation. Journal of the American Statistical Association 82:858–865.

Royall, R.M., and W.G. Cumberland. 1981. The finite-population linear regression estimator and estimators of its variance: An empirical study. Journal of the American Statistical Association 76:924–930.

Rudas, T., C.C. Clogg, and B. Lindsay. 1994. A new index of fit based on mixture methods for the analysis of contingency tables. Journal of the Royal Statistical Society. Series B, Statistical Methodology 56(4):623–639.


Sarndal, C.E., B. Swensson, and J.H. Wretman. 1992. Model assisted survey sampling. New York: Springer.

Thompson, M. 1997. Theory of sample surveys. London: Chapman & Hall.

Valliant, R., A.H. Dorfman, and R. Royall. 2000. Finite population sampling and inference: A prediction approach. New York: Wiley.

White, H. 1981. Consequences and detection of misspecified nonlinear regression models. Journal of the American Statistical Association 76:419–433.
