
Joint Statistical Meetings - Section on Survey Research Methods

MODEL WEIGHTS FOR REGRESSION ESTIMATION

Wayne A. Fuller and Mingue Park
Statistical Laboratory, Snedecor Hall, Iowa State University, Ames, IA 50011-1210

Abstract: Linear models that form the basis for survey regression estimation and the conditions under which the regression estimators are design consistent are reviewed. Model justification for some commonly used regression estimators is presented. Tests for reduced models against design consistent models are discussed.

1. Introduction

The earliest references to the use of regression in survey sampling include Jessen (1942) and Cochran (1942). Regression in similar contexts would certainly have been used earlier, and Cochran (1977, page 189) mentions a regression on leaf area by Watson (1937). Cochran (1942) gave the basic theory for regression in survey sampling, relying on linear model theory. He showed that the linear model did not need to hold in order for the regression estimator to perform well. He derived an expression for the $O(n^{-1})$ bias and an $O(n^{-2})$ approximation for the variance. He also showed that for the model with regression passing through the origin and error variances proportional to $x$, the ratio estimator is the generalized least squares estimator.

Brewer (1963) is an early reference that considers linear estimation using a superpopulation model to determine an optimal procedure. He was concerned with finding the optimal design for the ratio estimator and discussed the possible conflict between an optimal design under the model and a design that is less model dependent. See also Brewer (1979).

Various estimators have been proposed for estimating a finite population mean under a regression superpopulation model that postulates a relationship between the study variable and a set of auxiliary variables. Royall (1970, 1976) adapted linear prediction theory to the finite population situation and suggested the best linear model unbiased predictor (BLUP) for a finite population total. Cassel, Särndal and Wretman (1976) and Särndal (1980) proposed a generalized regression predictor that is asymptotically design unbiased and design consistent. Isaki and Fuller (1982) considered predictors of the regression type that are model unbiased and design consistent. Wright (1983) proposed a class of predictors, called QR-predictors, that contains most proposed predictors.

Inference based on prediction theory is sensitive to model misspecification, as illustrated by Hansen, Madow and Tepping (1983). Many techniques for robust inference have been suggested. See Royall (1992), Royall and Cumberland (1981a, 1981b), and Royall and Herson (1973a, 1973b). Design consistency has been proposed (Isaki and Fuller, 1982; Robinson and Särndal, 1983) as a way of providing protection against model misspecification in the large sample setting.

In this paper, we consider the problem of constructing estimators with good model properties, such as model unbiasedness and minimum model variance, that are also design consistent. Assume the finite population is a realization from the superpopulation model

$$y_N = X_N \beta + e_N, \qquad e_N \sim (0, \Sigma_{eeN}), \tag{1}$$

where $y_N = (y_1, \cdots, y_N)'$, $e_N = (e_1, \cdots, e_N)'$, $X_N = (x_1', \cdots, x_N')'$, and $x_i = (x_{i1}, \cdots, x_{ik})$. We assume the covariance matrix $\Sigma_{eeN}$ is a positive definite matrix. Expressions without the subscript $N$ are used to denote the corresponding sample quantities; for example, $y = (y_1, \cdots, y_n)'$ is the vector of sample observations.

To investigate the large sample properties of certain estimators, we define sequences of populations, samples and sampling designs. The set of indices for the $N$-th finite population is $U_N = \{1, \cdots, N\}$, where $N = 1, 2, \cdots$. Associated with the $j$-th element of the $N$-th finite population is a vector of characteristics, denoted by $y_{jN}$. Let $F_N = \{y_{1N}, \cdots, y_{NN}\}$ be the set of vectors for the $N$-th finite population. The population mean of $y$ for the $N$-th finite population is $\bar y_N = N^{-1} \sum_{i=1}^{N} y_{iN}$. Let $A_N$ denote the set of indices appearing in the sample selected from the $N$-th finite population. The sample size is denoted by $n_N$. The sample size $n_N$ is strictly less than $N$ and $n_N \to \infty$ as $N \to \infty$. We assume that samples are selected according to the probability rule $P_N(\cdot)$. Under the specified sequence of populations, samples, and sampling designs, we define a sequence of estimators



$\hat\theta_N$ of the population mean $\bar y_N$ to be design consistent if, for all $\epsilon > 0$,

$$\lim_{N \to \infty} P\left\{ |\hat\theta_N - \bar y_N| > \epsilon \mid F_N \right\} = 0,$$

where the notation indicates that the $N$-th finite population is held fixed and the probability depends only on the sampling design.

2. Design consistent regression estimators

Under some superpopulation models and designs, the BLUP constructed under the model is also design consistent. Assume the superpopulation model in which the covariance matrix of the errors is a multiple of the identity matrix and the column of ones is in the column space of $X_N$. Without loss of generality, assume the first element of $x_i$ is identically equal to one, and let $x_i = (1, x_{1,i})$. The regression estimator of the population mean of $y$ is

$$\bar y_{reg} = \bar x_N \tilde\beta = \bar y_n + (\bar x_{1,N} - \bar x_{1,n}) \tilde\beta_1, \tag{2}$$

where

$$(\bar y_n, \bar x_{1,n}) = n^{-1} \sum_{i \in A} (y_i, x_{1,i}),$$

$$\tilde\beta = (\tilde\beta_0, \tilde\beta_1')' = (X'X)^{-1} X'y,$$

and $\bar x_{1,N}$ is the population mean of $x_{1,i}$. See Cochran (1977, p. 193). The estimator (2) is the BLUP under the model (1) and is also design consistent if the sample is selected by simple random sampling. Many of the samples encountered in practice are more complicated than simple random nonreplacement samples. Theorem 1 gives conditions under which the regression estimator is design consistent.

Theorem 1. Let $\{F_N\}$ be a sequence of finite populations, where $F_N$ is a random sample of size $N$ selected from an infinite superpopulation with finite fourth moments. Let $q_i = (y_i, x_i)$ be a vector with mean $\bar q_N = (\bar y_N, \bar x_N)$ for the $N$-th population. Let a sequence of probability samples be selected from the sequence $\{F_N\}$. Define the regression estimator of $\bar y_N$ by

$$\bar y_{reg} = \bar x_N \tilde\beta,$$

where $\tilde\beta$ is a design consistent estimator of a parameter denoted by $\beta_N$. Then

$$\operatorname*{p\,lim}_{N \to \infty} \{ (\bar y_{reg} - \bar y_N) \mid F_N \} = 0$$

if and only if

$$\operatorname*{p\,lim}_{N \to \infty} \{ \bar e_N \mid F_N \} = 0,$$

where $e_i = y_i - x_i \beta_N$. ∎

Proof. Omitted.

Consider the construction of an estimator to meet the requirements of Theorem 1. Let a sample design have selection probabilities $\pi_i$ and define the sample estimator of the mean by

$$(\bar y_\pi, \bar x_\pi) = \left( \sum_{i \in A} \pi_i^{-1} \right)^{-1} \sum_{i \in A} \pi_i^{-1} (y_i, x_i) =: \sum_{i \in A} a_i (y_i, x_i), \tag{3}$$

where

$$a_i = \left( \sum_{j \in A} \pi_j^{-1} \right)^{-1} \pi_i^{-1}.$$

Assume the first element of $x_i$ is identically equal to one. By analogy to (2), consider an estimator of the form

$$\bar y_{reg} = \bar y_\pi + (\bar x_{1,N} - \bar x_{1,\pi}) \hat\beta_1. \tag{4}$$

Assume

$$\left[ (\bar y_\pi, \bar x_{1,\pi}, \hat\beta_1) - (\bar y_N, \bar x_{1,N}, \beta_{1,N}) \right] \mid F_N = O_p(b_N),$$

where $b_N \to 0$ as $N \to \infty$. Then

$$\begin{aligned}
\bar y_{reg} - \bar y_N &= \bar y_\pi - \bar y_N + (\bar x_{1,N} - \bar x_{1,\pi}) \hat\beta_1 \\
&= \bar y_\pi - \bar y_N + (\bar x_{1,N} - \bar x_{1,\pi}) \beta_{1,N} + O_p(b_N^2) \\
&= \bar e_\pi + O_p(b_N^2),
\end{aligned} \tag{5}$$

where

$$e_i = y_i - \bar y_N - (x_{1,i} - \bar x_{1,N}) \beta_{1,N}. \tag{6}$$

The population mean of the $e_i$ of (6) is zero for any $\beta_{1,N}$. Therefore (4) gives a way to construct a design consistent estimator. The estimator (4) is linear in $y$ and can be written $\bar y_{reg} = \sum_{i \in A} w_i y_i$. Regression weights that define a regression estimator of the form (4) can be constructed by minimizing the quadratic objective function

$$(w - \alpha)' \Phi (w - \alpha) \tag{7}$$

subject to

$$w'X - \bar x_N = 0, \tag{8}$$

where $\alpha$ is a vector of initial weights and $\Phi$ is a positive definite matrix. One possible $\Phi$-matrix is a diagonal matrix with the diagonal elements being the initial weights. Possible initial weights are $\alpha_i = N^{-1} \pi_i^{-1}$ or $a_i$ of (3).
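The constrained minimization in (7) and (8) has a closed-form solution, $w = \alpha + \Phi^{-1} X (X'\Phi^{-1}X)^{-1} (\bar x_N - \alpha'X)'$, obtained from the first-order conditions of the Lagrangian. The following is a minimal NumPy sketch of that computation, not code from the paper. The function name `regression_weights`, the simulated inclusion probabilities, and the population mean vector `xbar_N` are illustrative assumptions.

```python
import numpy as np

def regression_weights(X, alpha, phi, xbar_N):
    """Minimize (w - alpha)' Phi (w - alpha) subject to w'X = xbar_N."""
    phi_inv_X = np.linalg.solve(phi, X)          # Phi^{-1} X
    gap = xbar_N - alpha @ X                     # xbar_N - alpha'X
    lam = np.linalg.solve(X.T @ phi_inv_X, gap)  # (X' Phi^{-1} X)^{-1} gap
    return alpha + phi_inv_X @ lam

# Hypothetical sample of n = 8 elements with unequal inclusion probabilities.
rng = np.random.default_rng(0)
n = 8
pi = rng.uniform(0.1, 0.5, size=n)
X = np.column_stack([np.ones(n), rng.normal(5.0, 1.0, size=n)])

alpha = (1.0 / pi) / np.sum(1.0 / pi)  # initial weights a_i of (3)
phi = np.diag(alpha)                   # diagonal Phi built from the initial weights
xbar_N = np.array([1.0, 5.2])          # assumed known population mean of x

w = regression_weights(X, alpha, phi, xbar_N)
print(w @ X)    # reproduces xbar_N, so constraint (8) holds
print(w.sum())  # equals 1 because the first column of X is the column of ones
```

Because the column of ones is in $X$ and the first element of $\bar x_N$ is one, the weights sum to one, which is the location invariance property used later in the paper.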



We observed that the regression estimator (2) is the BLUP under the regression model with homogeneous variances and is also design consistent under simple random sampling. But, for some models and designs, the estimator that is conditionally best, given $X$, need not be a design consistent estimator. Assume the superpopulation model (1). Under the model, the unknown finite population mean is

$$\bar y_N = \bar x_N \beta + \bar e_N. \tag{9}$$

It follows that, under the model, the best linear conditionally unbiased predictor of $\bar y_N$, conditional on $X$, is

$$\bar y_{BLUP} = N^{-1} \left[ \sum_{i \in A} y_i + (N - n) \bar x_{N-n} \hat\beta + J_{N-n}' \Sigma_{ee\bar{A}A} \Sigma_{ee}^{-1} \left( y - X\hat\beta \right) \right], \tag{10}$$

where

$$\bar x_{N-n} = (N - n)^{-1} (N \bar x_N - n \bar x_n),$$

$$\hat\beta = \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1} y, \tag{11}$$

$\Sigma_{ee\bar{A}A} = E\{ e_{\bar A} e_A' \}$, $e_{\bar A} = (e_{n+1}, \cdots, e_N)'$, $J_{N-n}$ is an $(N - n)$-dimensional column vector of ones, and $\bar A$ is the set of elements in $U$ that are not in $A$. See Royall (1976). The estimator (10) will be design consistent if the design probabilities, the matrix $\Sigma_{eeN}$ and the matrix $X_N$ meet certain conditions. These conditions have been considered by, among others, Isaki (1970), Royall (1970, 1976), Scott and Smith (1974), Cassel, Särndal and Wretman (1976, 1979, 1983), Zyskind (1976), Tallis (1978), Isaki and Fuller (1982), Wright (1983), Pfeffermann (1984), Tam (1986), Brewer, Hanif and Tam (1988), Montanari (1999) and Gerow and McCulloch (2000). We summarize the results in Theorem 2.

Theorem 2. Let the superpopulation model be given by (1). Assume a sequence of populations, designs and estimators such that

$$\left[ (\bar y_{HT}, \bar x_{HT}) - (\bar y_N, \bar x_N) \right] \mid F_N = N^{-1} \left\{ \sum_{i=1}^{n} \pi_i^{-1} (y_i, x_i) - (T_{y,N}, T_{x,N}) \right\} \Bigm| F_N = O_p(n_N^{-\alpha}), \tag{12}$$

where the $\pi_i$ are the inclusion probabilities, $(T_{y,N}, T_{x,N})$ is the total of $(y, x)$ for the $N$-th population and $\alpha > 0$. Let $\hat\beta$ be defined by (11) and let $\{\beta_N\}$ be a sequence such that

$$(\hat\beta - \beta_N) \mid F_N = O_p(n_N^{-\alpha}). \tag{13}$$

Assume there is a sequence $\{\gamma_N\}$ such that

$$X \gamma_N = \Sigma_{ee} L_\pi, \tag{14}$$

where $L_\pi = (\pi_1^{-1}, \cdots, \pi_n^{-1})'$, for every sample from $U_N$ that is possible under the design. Then

$$\left( \bar x_N \hat\beta - \bar y_N \right) \mid F_N = O_p(n_N^{-\alpha}). \tag{15}$$

If, in addition,

$$X \eta_N = \Sigma_{ee} J_n + \Sigma_{eeA\bar{A}} J_{N-n}, \tag{16}$$

where $J_n$ is an $n$-dimensional column vector of ones, for a sequence $\{\eta_N\}$ and all possible samples, then $\bar y_{BLUP}$ of (10) satisfies

$$(\bar y_{BLUP} - \bar y_N) \mid F_N = O_p(n_N^{-\alpha}). \tag{17}$$

Assume there is a sequence $\{\zeta_N\}$ such that

$$X \zeta_N = \Sigma_{ee} (L_\pi - J_n) - \Sigma_{eeA\bar{A}} J_{N-n}, \tag{18}$$

for every sample from $U_N$ that is possible under the design. Then $\bar y_{BLUP}$ of (10) is expressible as

$$\bar y_{BLUP} = \bar y_{HT} + N^{-1} (N - n) (\bar x_{N-n} - \bar x_c) \hat\beta = \bar y_{HT} + (\bar x_N - \bar x_{HT}) \hat\beta, \tag{19}$$

and

$$(\bar y_{BLUP} - \bar y_N) \mid F_N = O_p(n_N^{-\alpha}), \tag{20}$$

where

$$\bar x_c = (N - n)^{-1} \sum_{i \in A} \left( \pi_i^{-1} - 1 \right) x_i.$$

Proof. The sufficient condition for the estimator to be design consistent given in Theorem 1 is

$$\operatorname*{p\,lim}_{N \to \infty} \left( \bar y_N - \bar x_N \beta_N \mid F_N \right) = 0. \tag{21}$$

By assumptions (12) and (13), a sufficient condition for (21) is

$$\operatorname*{p\,lim}_{N \to \infty} \left( \bar y_{HT} - \bar x_{HT} \hat\beta \mid F_N \right) = 0. \tag{22}$$

A sufficient condition for (22) is

$$\sum_{i=1}^{n} \left( y_i - x_i \hat\beta \right) \pi_i^{-1} = 0, \tag{23}$$

for all $A$ with positive probability. By the properties of the generalized least squares estimator of (11),

$$\left( y - X\hat\beta \right)' \Sigma_{ee}^{-1} X = 0,$$


for every $X$ such that $X' \Sigma_{ee}^{-1} X$ is nonsingular. Therefore, if there is a $\gamma_N$ such that $\Sigma_{ee}^{-1} X \gamma_N = L_\pi$, condition (23) is satisfied. By assumptions (12), (13) and (14),

$$\begin{aligned}
\bar x_N \hat\beta - \bar y_N &= (\bar x_N - \bar x_{HT}) \hat\beta + (\bar y_{HT} - \bar y_N) - \left( \bar y_{HT} - \bar x_{HT} \hat\beta \right) \\
&= (\bar x_N - \bar x_{HT}) \beta_N + (\bar y_{HT} - \bar y_N) + (\bar x_N - \bar x_{HT}) \left( \hat\beta - \beta_N \right) \\
&= O_p(n_N^{-\alpha}).
\end{aligned} \tag{24}$$

If (16) is satisfied,

$$\bar y_{BLUP} = \bar x_N \hat\beta + N^{-1} \left[ \left( J_n' \Sigma_{ee} + J_{N-n}' \Sigma_{ee\bar{A}A} \right) \Sigma_{ee}^{-1} \left( y - X\hat\beta \right) \right] = \bar x_N \hat\beta. \tag{25}$$

Result (17) follows from (24) and (25). If (18) is satisfied,

$$\begin{aligned}
0 &= N^{-1} \left( y - X\hat\beta \right)' \Sigma_{ee}^{-1} X \zeta_N \\
&= N^{-1} \left( y - X\hat\beta \right)' \left[ (L_\pi - J_n) - \Sigma_{ee}^{-1} \Sigma_{eeA\bar{A}} J_{N-n} \right] \\
&= N^{-1} (N - n) \left( \bar y_c - \bar x_c \hat\beta \right) - N^{-1} \left( y - X\hat\beta \right)' \Sigma_{ee}^{-1} \Sigma_{eeA\bar{A}} J_{N-n}.
\end{aligned}$$

It follows that $\bar y_{BLUP}$ of (10) is

$$\begin{aligned}
\bar y_{BLUP} &= N^{-1} \left[ \sum_{i \in A} y_i + (N - n) \bar y_c + (N - n) (\bar x_{N-n} - \bar x_c) \hat\beta \right] \\
&= \bar y_{HT} + N^{-1} (N - n) (\bar x_{N-n} - \bar x_c) \hat\beta \\
&= \bar y_{HT} + (\bar x_N - \bar x_{HT}) \hat\beta.
\end{aligned}$$

The error in the predictor is $O_p(n_N^{-\alpha})$ because of assumptions (12) and (13). ∎

Theorem 2 gives the conditions under which the best linear unbiased predictor is design consistent. In particular, if (18) is satisfied, the estimator is the regression estimator with the coefficient estimated by the generalized least squares estimator. Theorem 2 also gives a way of constructing a model based design consistent estimator.

Montanari (1987) introduced the general QR-predictor as an extension of the QR-predictor of Wright (1983) and gave conditions under which the general QR-predictor is design consistent. The general QR-predictor is

$$\bar y_{QR} = \bar x_N \hat\beta + N^{-1} \sum_{i \in A} r_i \left( y_i - x_i \hat\beta \right), \tag{26}$$

where

$$\hat\beta = (X'QX)^{-1} X'Qy,$$

and $Q$ is an $n \times n$ matrix whose $(i, j)$-th element is denoted by $q_{ij}$. Conditions under which the estimator (26) is design consistent are

$$\operatorname*{p\,lim}_{N \to \infty} \hat\beta = B_N \tag{27}$$

and

$$c \in C(W_N X_N), \tag{28}$$

where

$$B_N = (X_N' W_N X_N)^{-1} X_N' W_N y_N,$$

$$c = (c_1, \cdots, c_N)', \qquad c_i = 1 - r_i \pi_i,$$

$W_N$ is an $N \times N$ symmetric matrix whose $(i, j)$-th entry is $q_{ij} \pi_{ij}$, and $C(X)$ denotes the space spanned by the columns of $X$.

A specification of $\Sigma_{eeN}$ may be particularly appropriate for two-stage cluster samples. See Royall (1976) and Montanari (1987). A reasonable model is that in which there is common correlation among items in the same primary sampling unit and zero correlation between units in different primary sampling units. That is, a potential model for the $j$-th observation in cluster $i$ is

$$y_{ij} = x_{ij} \beta + u_{ij}, \tag{29}$$

$$u_{ij} = b_i + e_{ij}, \qquad b_i \sim II(0, \sigma_b^2), \qquad e_{ij} \sim II(0, \sigma_e^2),$$

where $e_{ij}$ is independent of $b_k$ for all $i$, $j$ and $k$. Under the model (29), the BLUP defined in (10) is a general QR-predictor with

$$r = (r_1, \cdots, r_n)' = J_n + \Sigma_{ee}^{-1} \Sigma_{eeA\bar{A}} J_{N-n},$$

and $Q = \Sigma_{ee}^{-1}$, where $\Sigma_{ee}$ is a block diagonal matrix in which the $i$-th block is the $m_i \times m_i$ matrix

$$\sigma_e^2 I_{m_i} + \sigma_b^2 J_{m_i} J_{m_i}',$$

and $m_i$ is the number of sampled elements in cluster $i$. Let a sample of primary sampling units be selected by an unequal probability sampling design and a simple random nonreplacement sample of secondary sampling units be selected within each selected primary sampling unit. Under the specified model and design, the condition for the BLUP of the finite population mean to be design consistent given by Montanari (1987) is equivalent to the condition given in (18). Although the two conditions are equivalent under the specified setup, the equivalence of the two conditions does not hold in general. The condition in (18) is easier to check than the condition (28) because we only need the first order inclusion probabilities, the values of the auxiliary variables corresponding to sampled elements and the covariance matrix of $e_N$. Note that condition (28) is a function of $X_N$, $Q$, $r_i$ and the first and second order inclusion probabilities for elements in the population.
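The role of condition (14) in the proof of Theorem 2 can be illustrated numerically: when $\Sigma_{ee} L_\pi$ lies in the column space of $X$, the generalized least squares residuals satisfy (23) exactly. The sketch below is an illustration under assumed values, not part of the paper. It builds the block diagonal cluster covariance of model (29), appends the column $\Sigma_{ee} L_\pi$ to $X$ so that (14) holds by construction, and evaluates the weighted residual sum. The helper names `cluster_cov` and `gls`, the cluster sizes, and the variance components are hypothetical.

```python
import numpy as np

def cluster_cov(sizes, sigma_e2, sigma_b2):
    """Block diagonal Sigma_ee with blocks sigma_e2 * I + sigma_b2 * J J'."""
    n = sum(sizes)
    S = np.zeros((n, n))
    start = 0
    for m in sizes:
        S[start:start + m, start:start + m] = (
            sigma_e2 * np.eye(m) + sigma_b2 * np.ones((m, m))
        )
        start += m
    return S

def gls(X, S, y):
    """Generalized least squares coefficient (X' S^{-1} X)^{-1} X' S^{-1} y."""
    S_inv_X = np.linalg.solve(S, X)
    return np.linalg.solve(X.T @ S_inv_X, S_inv_X.T @ y)

rng = np.random.default_rng(1)
sizes = [3, 2, 4]                 # assumed numbers of sampled elements per cluster
n = sum(sizes)
S = cluster_cov(sizes, sigma_e2=1.0, sigma_b2=0.5)

pi = rng.uniform(0.1, 0.6, size=n)
L_pi = 1.0 / pi                   # L_pi of (14)
x1 = rng.normal(size=n)
y = rng.normal(size=n)

# Adding the column Sigma_ee L_pi makes (14) hold with gamma = (0, 0, 1)'.
X_full = np.column_stack([np.ones(n), x1, S @ L_pi])
resid_full = y - X_full @ gls(X_full, S, y)
weighted_resid_sum = L_pi @ resid_full   # left side of (23)
print(weighted_resid_sum)                # zero up to rounding error

# Without that column, (14) generally fails and (23) need not hold.
X_reduced = X_full[:, :2]
print(L_pi @ (y - X_reduced @ gls(X_reduced, S, y)))
```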

3. Mixed model regression estimation

The regression model with random components has been heavily used for small area estimation. See Rao (2002). Royall (1976) and Montanari (1987) used the model for cluster samples. The model has also been used for mean estimation in the post stratification setting. See Little (1993) and Lazzeroni and Little (1998). We consider the mixed model

$$y = X_0 \beta_0 + X_2 \beta_2 + e, \tag{30}$$

where $\beta_2 \sim (0, \Psi)$, $e \sim (0, \Phi)$, $X_0$ is an $n \times l$ matrix, $X_2$ is an $n \times (p - l)$ matrix, the random vector $\beta_2$ is independent of $e$, and $\beta_0$ is a fixed vector. The best linear model unbiased predictor of $\bar x_N \beta$ is $w'y$, where the vector $w$ is chosen to minimize

$$V\{ w'y - \bar x_N \beta \} = V\{ w'e + (w'X_2 - \bar x_{2,N}) \beta_2 \} \tag{31}$$

subject to the constraint

$$w'X_0 = \bar x_{0,N}, \tag{32}$$

and $\bar x_N = (\bar x_{0,N}, \bar x_{2,N})$ is the population mean of $x$. If $\alpha$ is a vector of preliminary weights and if $\Phi\alpha$ is in the column space of $X_0$, then the vector $w$ can be obtained from the Lagrangian

$$Q = (w - \alpha)' \Phi (w - \alpha) + (w'X_2 - \bar x_{2,N}) \Psi (w'X_2 - \bar x_{2,N})' + 2 \lambda' (w'X_0 - \bar x_{0,N})', \tag{33}$$

where $\lambda$ is a vector of Lagrangian multipliers. The partial derivatives with respect to $w$ and $\lambda$ are

$$\frac{1}{2} \frac{\partial Q}{\partial w} = \Phi w - \Phi\alpha + X_2 \Psi (X_2'w - \bar x_{2,N}') + X_0 \lambda, \tag{34}$$

and

$$\frac{1}{2} \frac{\partial Q}{\partial \lambda} = X_0'w - \bar x_{0,N}'.$$

If we multiply (34) by $X_0' \Phi^{-1}$, multiply (34) by $X_2' \Phi^{-1}$, and set the results equal to zero, we obtain the linear equation

$$\begin{bmatrix} X_0' \Phi^{-1} X_0 & X_0' \Phi^{-1} X_2 \Psi \\ X_2' \Phi^{-1} X_0 & I + X_2' \Phi^{-1} X_2 \Psi \end{bmatrix} \begin{bmatrix} \lambda \\ X_2'w - \bar x_{2,N}' \end{bmatrix} = \begin{bmatrix} \bar x_{0,\pi}' - \bar x_{0,N}' \\ \bar x_{2,\pi}' - \bar x_{2,N}' \end{bmatrix}, \tag{35}$$

where $(\bar x_{0,\pi}, \bar x_{2,\pi}) = \alpha'(X_0, X_2)$. Thus, the vector of weights that minimizes the objective function $Q$ is

$$\begin{aligned}
w &= \alpha - \Phi^{-1} X_2 \Psi (X_2'w - \bar x_{2,N}') - \Phi^{-1} X_0 \lambda \\
&= \alpha - \left[ \Phi^{-1} X_0, \; \Phi^{-1} X_2 \Psi \right] \begin{bmatrix} \lambda \\ X_2'w - \bar x_{2,N}' \end{bmatrix} \\
&= \alpha + \Phi^{-1} X \begin{bmatrix} X_0' \Phi^{-1} X_0 & X_0' \Phi^{-1} X_2 \\ X_2' \Phi^{-1} X_0 & \Psi^{-1} + X_2' \Phi^{-1} X_2 \end{bmatrix}^{-1} \begin{bmatrix} \bar x_{0,N}' - \bar x_{0,\pi}' \\ \bar x_{2,N}' - \bar x_{2,\pi}' \end{bmatrix},
\end{aligned} \tag{36}$$

where

$$\lambda = Q^{-1} \left\{ \left( \bar x_{0,\pi}' - \bar x_{0,N}' \right) - X_0' \Phi^{-1} X_2 \left( \Psi^{-1} + X_2' \Phi^{-1} X_2 \right)^{-1} \left( \bar x_{2,\pi}' - \bar x_{2,N}' \right) \right\},$$

and, in this expression,

$$Q = X_0' \Phi^{-1} X_0 - X_0' \Phi^{-1} X_2 \left( \Psi^{-1} + X_2' \Phi^{-1} X_2 \right)^{-1} X_2' \Phi^{-1} X_0.$$

The estimator defined with the vector of weights (36) is

$$\bar y_{rreg} = \bar y_\pi + (\bar x_N - \bar x_\pi) \hat\theta, \tag{37}$$

where

$$\hat\theta = H_{\Psi xx}^{-1} X' \Phi^{-1} y,$$

and

$$H_{\Psi xx} = \begin{bmatrix} X_0' \Phi^{-1} X_0 & X_0' \Phi^{-1} X_2 \\ X_2' \Phi^{-1} X_0 & \Psi^{-1} + X_2' \Phi^{-1} X_2 \end{bmatrix}.$$

The estimator defined in (37) is a design consistent estimator and is the best predictor under the mixed model. Estimation for the population mean in the present context differs from the situation under the model (29) in that the population mean of $X_2$ is assumed to be known in estimation under model (30). Thus, in the estimation under model (30) we are estimating a linear combination of fixed and random effects, while the regression estimator under model (29) is an estimator of fixed effects. To derive the large sample properties of the estimator, consider a sequence of $\Psi_N$. If $\Psi_N^{-1}$ is increasing at the same rate as the sample size $n$, the estimator (37) is a design consistent estimator of the population mean of $y$.
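The weight formula (36) and the estimator (37) can be checked against each other numerically. The following NumPy sketch is an illustration under assumed inputs rather than the authors' implementation. It computes $w = \alpha + \Phi^{-1} X H_{\Psi xx}^{-1} (\bar x_N - \bar x_\pi)'$ and confirms that the fixed-effect constraint (32) holds exactly while the control on $X_2$ is only approximate, being shrunk through $\Psi^{-1}$. The function name `mixed_model_weights` and all simulated quantities ($\Phi = I$, $\Psi = 0.5 I$, the population means) are hypothetical choices.

```python
import numpy as np

def mixed_model_weights(X0, X2, alpha, Phi, Psi, xbar_N):
    """Weights of (36): alpha + Phi^{-1} X H^{-1} (xbar_N - alpha'X)'."""
    X = np.hstack([X0, X2])
    l = X0.shape[1]
    Phi_inv_X = np.linalg.solve(Phi, X)
    H = X.T @ Phi_inv_X                    # X' Phi^{-1} X
    H[l:, l:] += np.linalg.inv(Psi)        # add Psi^{-1} to the X2 block
    d = xbar_N - alpha @ X                 # xbar_N - xbar_pi
    w = alpha + Phi_inv_X @ np.linalg.solve(H, d)
    return w, H

rng = np.random.default_rng(2)
n = 10
pi = rng.uniform(0.1, 0.5, size=n)
alpha = (1.0 / pi) / np.sum(1.0 / pi)      # a_i of (3), so alpha'y = ybar_pi

X0 = np.ones((n, 1))                       # fixed-effect part (intercept)
X2 = rng.normal(size=(n, 2))               # random-effect part
X = np.hstack([X0, X2])
Phi = np.eye(n)
Psi = 0.5 * np.eye(2)
xbar_N = np.array([1.0, 0.1, -0.2])        # assumed known population means
y = rng.normal(size=n)

w, H = mixed_model_weights(X0, X2, alpha, Phi, Psi, xbar_N)

# Estimator (37) computed directly from theta_hat.
theta_hat = np.linalg.solve(H, X.T @ np.linalg.solve(Phi, y))
y_rreg = alpha @ y + (xbar_N - alpha @ X) @ theta_hat

print(w @ X0, xbar_N[:1])   # constraint (32) holds exactly
print(w @ X2, xbar_N[1:])   # X2 means are matched only approximately
print(w @ y, y_rreg)        # the two forms of the estimator agree
```

Letting the entries of `Psi` grow makes $\Psi^{-1}$ vanish, and the weights then approach the ordinary regression weights that match all of $\bar x_N$.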

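Theorem 3. Let $\{F_N, A_N, \Phi_N, \Psi_N\}$ be a sequence of populations, samples, and positive definite matrices such that

$$\left[ (\bar y_\pi, \bar x_\pi) - (\bar y_N, \bar x_N) \right] \mid F_N = O_p\left( n_N^{-1/2} \right), \tag{38}$$

where $n_N$ is the sample size for the $N$-th sample and $(\bar y_N, \bar x_N)$ is the population mean of $(y, x)$. Assume there exists a sequence $Q_{z\phi z,N}$ such that

$$\left[ n_N^{-1} Z' \Phi_N^{-1} Z - Q_{z\phi z,N} \right] \mid F_N = O_p\left( n_N^{-1/2} \right) \tag{39}$$

and

$$\lim_{N \to \infty} N^{-1} Q_{z\phi z,N} = Q_{z\phi z}, \tag{40}$$

where $Z = (z_1', \cdots, z_n')'$, $z_i = (y_i, x_i)$ and $Q_{z\phi z}$ is a positive definite matrix. Assume

$$\lim_{N \to \infty} \left[ n_N^{-1} \Psi_N^{-1} \right] = \Psi^{-1}, \tag{41}$$

where $\Psi$ is a positive definite matrix. Then, estimator (37) satisfies

$$\bar y_{rreg} - \bar y_N = \bar y_\pi - \bar y_N + (\bar x_N - \bar x_\pi) \theta_N + O_p\left( n_N^{-1} \right) = O_p\left( n_N^{-1/2} \right), \tag{42}$$

where

$$\theta_N = \left[ Q_{x\phi x,N} + \Lambda_{\phi,N} \right]^{-1} Q_{x\phi y,N} = H_{x\psi x,N}^{-1} Q_{x\phi y,N},$$

$$H_{x\psi x,N} = Q_{x\phi x,N} + \Lambda_{\phi,N}, \qquad \text{and} \qquad \Lambda_{\phi,N} = \frac{1}{n_N} \begin{bmatrix} 0 & 0 \\ 0 & \Psi_N^{-1} \end{bmatrix}.$$

Proof. The estimator (37) is

$$\bar y_{rreg} = \bar y_\pi + (\bar x_N - \bar x_\pi) \theta_N + (\bar x_N - \bar x_\pi) \left( \hat\theta - \theta_N \right). \tag{43}$$

Writing $\Lambda_N$ for $\Lambda_{\phi,N}$, the population characteristic $\theta_N$ is

$$\begin{aligned}
\theta_N &= H_{x\psi x,N}^{-1} Q_{x\phi y,N} \\
&= H_{x\psi x,N}^{-1} \left[ Q_{x\phi x,N} \theta_N + Q_{x\phi a,N} + \Lambda_N \theta_N - \Lambda_N \theta_N \right] \\
&= \theta_N + H_{x\psi x,N}^{-1} \left[ Q_{x\phi a,N} - \Lambda_N \theta_N \right],
\end{aligned} \tag{44}$$

where $a_i = y_i - x_i \theta_N$ and $Q_{x\phi a,N} = Q_{x\phi y,N} - Q_{x\phi x,N} \theta_N$. By (44), $Q_{x\phi a,N} - \Lambda_N \theta_N = 0$. The error of $\hat\theta$ in estimating $\theta_N$ is

$$\begin{aligned}
\hat\theta - \theta_N &= \left( \frac{1}{n_N} H_{x\psi x} \right)^{-1} \frac{1}{n_N} Q_{x\phi y} - \theta_N \\
&= \left( \frac{1}{n_N} H_{x\psi x} \right)^{-1} \left\{ \frac{1}{n_N} Q_{x\phi x} \theta_N + \frac{1}{n_N} Q_{x\phi a} - \frac{1}{n_N} Q_{x\phi x} \theta_N - \Lambda_N \theta_N \right\} \\
&= \left( \frac{1}{n_N} H_{x\psi x} \right)^{-1} \left( \frac{1}{n_N} Q_{x\phi a} - \Lambda_N \theta_N \right) \\
&= \left( \frac{1}{n_N} H_{x\psi x} \right)^{-1} \left( \frac{1}{n_N} Q_{x\phi a} - Q_{x\phi a,N} \right),
\end{aligned} \tag{45}$$

where $Q_{x\phi x} = X' \Phi^{-1} X$, $Q_{x\phi y} = X' \Phi^{-1} y$ and $Q_{x\phi a} = Q_{x\phi y} - Q_{x\phi x} \theta_N$. By the assumption (39),

$$\left( \frac{1}{n_N} Q_{x\phi a} - Q_{x\phi a,N} \right) = O_p\left( n_N^{-1/2} \right),$$

and $\left( \hat\theta - \theta_N \right) = O_p\left( n_N^{-1/2} \right)$ because $n_N^{-1} H_{x\psi x}$ is bounded. The result follows from (39) and (43). ∎

If $\Psi_N$ is fixed or $\Psi_N \to \infty$ as $n \to \infty$, then the estimator $\hat\theta$ approaches the weighted least squares estimator and we obtain design consistency for $\bar y_{rreg}$ because $\hat\theta$ converges to the population analog of the weighted least squares estimator. The proof of Theorem 4 is a proof of the assertion that the estimator $\bar y_{rreg}$ defined in (37) is the best linear conditionally unbiased predictor for the population mean of $y$ under the mixed model.

Theorem 4. Consider the mixed model (30). Assume that there exists a column vector $c_1$ such that

$$\Phi \alpha = X_0 c_1.$$

Let $\bar y_{rreg} = w'y$, where $w$ is the vector of weights that minimizes

$$(w - \alpha)' \Phi (w - \alpha) + (w'X_2 - \bar x_{2,N}) \Psi (w'X_2 - \bar x_{2,N})', \tag{46}$$

subject to

$$w'X_0 - \bar x_{0,N} = 0, \tag{47}$$

where $\bar x_N = (\bar x_{0,N}, \bar x_{2,N})$ is the population mean of $x$. Then $\bar y_{rreg}$ is the best linear conditionally unbiased predictor of $\bar x_{0,N} \beta_0 + \bar x_{2,N} \beta_2$.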



Proof. Under the restriction (47), the objective function (46) is

$$\begin{aligned}
& w' \Phi w - 2 w' \Phi \alpha + \alpha' \Phi \alpha + (w'X_2 - \bar x_{2,N}) \Psi (w'X_2 - \bar x_{2,N})' \\
&\quad = w' \Phi w - 2 w' X_0 c_1 + \alpha' \Phi \alpha + (w'X_2 - \bar x_{2,N}) \Psi (w'X_2 - \bar x_{2,N})' \\
&\quad = w' \Phi w - 2 \bar x_{0,N} c_1 + \alpha' \Phi \alpha + (w'X_2 - \bar x_{2,N}) \Psi (w'X_2 - \bar x_{2,N})' \\
&\quad = w' \Phi w + (w'X_2 - \bar x_{2,N}) \Psi (w'X_2 - \bar x_{2,N})' + C_1,
\end{aligned} \tag{48}$$

where $C_1 = \alpha' \Phi \alpha - 2 \bar x_{0,N} c_1$ is a constant that is independent of $w$. The conditional expectation of the error of a linear predictor $w'y$ under the model is

$$E\{ (w'y - \bar x_{0,N} \beta_0 - \bar x_{2,N} \beta_2) \mid X \} = w' X_0 \beta_0 - \bar x_{0,N} \beta_0.$$

Thus, the necessary and sufficient condition for $w'y$ to be unbiased for the population mean of $y$ is

$$w'X_0 - \bar x_{0,N} = 0, \tag{49}$$

which is equivalent to (47). Under the constraint (47), the conditional variance of a linear estimator is

$$V\{ (w'y - \bar x_{0,N} \beta_0 - \bar x_{2,N} \beta_2) \mid X \} = w' \Phi w + (w'X_2 - \bar x_{2,N}) \Psi (w'X_2 - \bar x_{2,N})'.$$

Thus, minimizing the objective function (46) subject to (47) is equivalent to minimizing the conditional variance of a linear predictor under the restriction for a linear estimator to be conditionally unbiased. ∎

4. Construction of a model based design consistent regression estimator

In the previous section, we obtained the condition for the BLUP to be design consistent. In this section, we consider the problem of constructing a model based design consistent regression estimator when condition (14) or (18) is not satisfied. We call a regression model for which (14) holds a full model. If (14) does not hold, we call the model a reduced model. We cannot expect condition (14) for a full model to hold for every $y$ in a general purpose survey because $\Sigma_{ee}$ will be different for different $y$'s. Therefore, given a reduced model, we search for a good model estimator under the model (1) in the class of design consistent estimators of the form (4). As we have seen in the previous section, the estimator of the form (4) is design consistent if the estimator of the regression coefficient is a design consistent estimator of a constant. By (4), the requirement of design consistency is essentially a requirement that the estimated regression function pass through the design consistent estimator of the population mean vector. To force the regression through $(\bar x_{1,\pi}, \bar y_\pi)$, we compute the regression of $y - \bar y_\pi$ on $x_1 - \bar x_{1,\pi}$. The transformed regression model for the sample can be written

$$(I - Ja') y = (I - Ja') X_1 \beta_1 + (I - Ja') e,$$

where $X_1 = (x_{1,1}', \cdots, x_{1,n}')'$, $(\bar y_\pi, \bar x_{1,\pi}) = a'(y, X_1)$, $a = (a_1, \cdots, a_n)'$ and the $a_i$ were defined in (3). The regression estimator of the mean is expressed as

$$\bar y_{reg} = \sum_{i \in A} w_i y_i =: \sum_{i \in A} (a_i + b_i) y_i, \tag{50}$$

where $b_i$ is to be determined. If the estimator is to be location invariant we require $\sum_{i \in A} w_i = 1$ or, equivalently,

$$\sum_{i \in A} b_i = 0. \tag{51}$$

The restriction that $\sum_{i \in A} w_i x_{1,i} = \bar x_{1,N}$ becomes

$$\sum_{i \in A} b_i (x_{1,i} - \bar x_{1,\pi}) = \bar x_{1,N} - \bar x_{1,\pi}. \tag{52}$$

To simplify the expressions, let

$$b = (b_1, \cdots, b_n)', \qquad z_i = (1, x_{1,i} - \bar x_{1,\pi}), \qquad \bar z_c = (0, \bar x_{1,N} - \bar x_{1,\pi}),$$

and $Z = (z_1', \cdots, z_n')'$. Then the $b_i$ that give the minimum variance of $\hat\beta_1$ are obtained by minimizing the Lagrangian

$$b' \Sigma_{ee} b + \sum_{j=1}^{p} \lambda_j \left( \sum_{i=1}^{n} b_i z_{ji} - \bar z_{cj} \right) \tag{53}$$

with respect to $b$. The solution vector is

$$b' = \bar z_c \left( Z' \Sigma_{ee}^{-1} Z \right)^{-1} Z' \Sigma_{ee}^{-1}. \tag{54}$$

Thus, the regression estimator (50) with $b$ of (54) is

$$\bar y_{reg,1} = (a + b)' y = \bar y_\pi + (\bar x_{1,N} - \bar x_{1,\pi}) \hat\beta_1, \tag{55}$$

where $\hat\beta_1$ is the second component of

$$\left( \hat\beta_0, \hat\beta_1' \right)' = \left( Z' \Sigma_{ee}^{-1} Z \right)^{-1} Z' \Sigma_{ee}^{-1} y. \tag{56}$$

In constructing the regression estimator (50), we obtained design consistency by forcing the regression line through the design consistent estimator of the population mean. We can also construct a design consistent estimator by adding a vector satisfying (14) to the $X$ matrix if the original matrix $X$ does not satisfy (14). This creates a full model from the original reduced model.
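The construction in (50) through (56) can be traced with a small numerical example. The sketch below is illustrative only: the simulated data, the diagonal $\Sigma_{ee}$, and the population mean `xbar1_N` are assumptions, and a single auxiliary variable is used. It computes $b$ from (54), confirms the restrictions (51) and (52), and checks that $(a + b)'y$ reproduces the regression form in (55).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 9
pi = rng.uniform(0.1, 0.5, size=n)
a = (1.0 / pi) / np.sum(1.0 / pi)          # a_i of (3)

x1 = rng.normal(3.0, 1.0, size=n)          # single auxiliary variable
y = 2.0 + 1.5 * x1 + rng.normal(size=n)
S = np.diag(rng.uniform(0.5, 2.0, size=n)) # assumed Sigma_ee (heteroscedastic)
xbar1_N = 3.3                              # assumed known population mean of x1

xbar1_pi = a @ x1
Z = np.column_stack([np.ones(n), x1 - xbar1_pi])  # rows z_i = (1, x_1i - xbar_1pi)
zbar_c = np.array([0.0, xbar1_N - xbar1_pi])

S_inv_Z = np.linalg.solve(S, Z)
G = Z.T @ S_inv_Z                          # Z' Sigma^{-1} Z

# (54): b' = zbar_c (Z' S^{-1} Z)^{-1} Z' S^{-1}, stored as a column vector.
b = S_inv_Z @ np.linalg.solve(G, zbar_c)

# (56): generalized least squares coefficients of y on Z.
coef = np.linalg.solve(G, S_inv_Z.T @ y)

# (55): regression form of the estimator.
y_reg1 = a @ y + (xbar1_N - xbar1_pi) * coef[1]

print(b.sum())                                       # restriction (51): zero
print(b @ (x1 - xbar1_pi), xbar1_N - xbar1_pi)       # restriction (52)
print((a + b) @ y, y_reg1)                           # both sides of (55)
```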



To describe this approach let $z$ denote the added variable, where $z = (z_1, \cdots, z_n)'$ satisfies

$$z = \Sigma_{ee} L_\pi, \tag{57}$$

where $L_\pi$ is defined in (14). We will find it convenient to work with the vector of deviations

$$z_d = z - X \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1} z, \tag{58}$$

where $X$ is the original matrix of auxiliary variables with known population mean vector, $\bar x_N$. The vector $z_d$ is orthogonal in the metric $\Sigma_{ee}$ to $X$. Under this approach our full model for the sample is

$$y = Z_1 \beta_{y,Z_1} + e, \qquad e \sim (0, \Sigma_{ee}), \tag{59}$$

where

$$Z_1 = (z_d, X).$$

There are two possible situations associated with this approach. In the first, the population mean of the added variable, $\bar z_{d,N}$, is known. In this case, the resulting estimator

$$\bar y_{reg} = \bar z_{1,N} \hat\beta_{y,Z_1}, \tag{60}$$

where

$$\bar z_{1,N} = (\bar z_{d,N}, \bar x_N), \qquad \hat\beta_{y,Z_1} = \left( Z_1' \Sigma_{ee}^{-1} Z_1 \right)^{-1} Z_1' \Sigma_{ee}^{-1} y,$$

is the best linear, conditionally unbiased predictor under the full model (59). If $\Sigma_{ee}$ is known and if an equal probability sample is selected, then the regression estimator (60) is calculable. If the population mean of the added variable is not known, the mean of the added variable $z_d$ can be estimated with a design consistent estimator. A design consistent estimator of $\bar z_{d,N}$ is

$$\bar z_{d,\pi} = \left( \sum_{i \in A} \pi_i^{-1} \right)^{-1} \sum_{i \in A} \pi_i^{-1} z_{d,i}. \tag{61}$$

Then a regression estimator of the population mean of $y$ can be constructed by replacing the unknown mean of $z_d$ with the estimated mean to obtain

$$\bar y_{reg,2} = (\bar z_{d,\pi}, \bar x_N) \hat\beta_{y,Z_1}, \tag{62}$$

where $\hat\beta_{y,Z_1}$ is of (60). The regression estimator (62) has the form of (55). For the estimator to be location invariant, we assume the first element of $x_i$ is identically equal to one and let the matrix $X = (J_n, X_1)$.

Theorem 5. The regression estimator of (62) can be written as

$$\bar y_{reg,2} = \bar y_\pi + (\bar x_N - \bar x_\pi) \hat\beta_{y,X}, \tag{63}$$

where

$$(\bar y_\pi, \bar x_\pi) = \left( \sum_{i \in A} \pi_i^{-1} \right)^{-1} \sum_{i \in A} \pi_i^{-1} (y_i, x_i) =: \sum_{i \in A} a_i (y_i, x_i),$$

$$a_i = \left( \sum_{j \in A} \pi_j^{-1} \right)^{-1} \pi_i^{-1},$$

and

$$\hat\beta_{y,X} = \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1} y.$$

Also, the vector of weights used to define the regression estimator (63) is

$$w' = a' + (\bar x_N - \bar x_\pi) \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1}, \tag{64}$$

where $a = (a_1, \cdots, a_n)'$, and is identically equal to the vector of weights defined in (55).

Proof. By construction $Z_1' \Sigma_{ee}^{-1} Z_1$ is block diagonal with $z_d' \Sigma_{ee}^{-1} z_d$ as one block and $X' \Sigma_{ee}^{-1} X$ as the other block. Thus $\hat\beta_{y,Z_1}$ is expressed as

$$\begin{aligned}
\hat\beta_{y,Z_1} = \begin{bmatrix} \hat\beta_{y,z_d} \\ \hat\beta_{y,X} \end{bmatrix}
&= \begin{bmatrix} \left( z_d' \Sigma_{ee}^{-1} z_d \right)^{-1} z_d' \Sigma_{ee}^{-1} y \\ \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1} y \end{bmatrix} \\
&= \begin{bmatrix} \{ L_\pi' z_d \}^{-1} \{ L_\pi' y - L_\pi' X \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1} y \} \\ \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1} y \end{bmatrix},
\end{aligned}$$

because

$$\begin{aligned}
z_d' \Sigma_{ee}^{-1} z_d &= L_\pi' \Sigma_{ee} L_\pi - L_\pi' X \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' L_\pi \\
&= L_\pi' \left\{ \Sigma_{ee} L_\pi - X \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' L_\pi \right\} \\
&= L_\pi' \left\{ z - X \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1} z \right\} \\
&= L_\pi' z_d.
\end{aligned}$$

The regression estimator (62) is

$$\begin{aligned}
\bar y_{reg} &= (\bar z_{d,\pi}, \bar x_N) \hat\beta_{y,Z_1} \\
&= \bar z_{d,\pi} \hat\beta_{y,z_d} + \bar x_N \hat\beta_{y,X} \\
&= \left( \sum_{i \in A} \pi_i^{-1} \right)^{-1} (L_\pi' z_d) (L_\pi' z_d)^{-1} \left( \sum_{i \in A} \pi_i^{-1} \right) \left\{ \bar y_\pi - \bar x_\pi \hat\beta_{y,X} \right\} + \bar x_N \hat\beta_{y,X} \\
&= \bar y_\pi + (\bar x_N - \bar x_\pi) \hat\beta_{y,X}.
\end{aligned}$$

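The matrix $Z$ that was used to define the vector $b$ in (55) is expressed as

$$Z = \begin{pmatrix} J_n & X_1 - J_n \bar x_{1,\pi} \end{pmatrix} = \begin{pmatrix} J_n & X_1 \end{pmatrix} \begin{pmatrix} 1 & -\bar x_{1,\pi} \\ 0 & I \end{pmatrix} =: \begin{pmatrix} J_n & X_1 \end{pmatrix} T,$$

where

$$T = \begin{pmatrix} 1 & -\bar x_{1,\pi} \\ 0 & I \end{pmatrix}.$$

By using the inverse of a partitioned matrix, the vector $b$ in (55) is

$$\begin{aligned}
b' &= \begin{pmatrix} 0 & \bar x_{1,N} - \bar x_{1,\pi} \end{pmatrix} \left[ T' \begin{pmatrix} J_n' \\ X_1' \end{pmatrix} \Sigma_{ee}^{-1} \begin{pmatrix} J_n & X_1 \end{pmatrix} T \right]^{-1} T' \begin{pmatrix} J_n' \\ X_1' \end{pmatrix} \Sigma_{ee}^{-1} \\
&= \begin{pmatrix} 0 & \bar x_{1,N} - \bar x_{1,\pi} \end{pmatrix} T^{-1} \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1} \\
&= \begin{pmatrix} 0 & \bar x_{1,N} - \bar x_{1,\pi} \end{pmatrix} \begin{pmatrix} 1 & \bar x_{1,\pi} \\ 0 & I \end{pmatrix} \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1} \\
&= (\bar x_N - \bar x_\pi) \left( X' \Sigma_{ee}^{-1} X \right)^{-1} X' \Sigma_{ee}^{-1}.
\end{aligned}$$

The result follows from (55) and (64). ∎

Thus the regression estimator of the finite population mean based on the full model, but with the mean of $z$ unknown and estimated, is the regression estimator with $\beta_{y,x}$ estimated by the generalized least squares regression of $y$ on $x$ using the covariance matrix $\Sigma_{ee}$. The estimator is conditionally model unbiased under the reduced model containing only $x$ if the reduced model is true. If the coefficient for $z_d$ is not zero, the reduced model is not true. Then the estimator is conditionally model biased, but the estimator is approximately unconditionally unbiased for the finite population mean because

$$\begin{aligned}
E\left\{ E\left[ \bar y_\pi + (\bar x_N - \bar x_\pi) \hat\beta_{y,X} \right] \right\} &= E\left\{ \bar x_\pi \beta_{y,X} + \bar z_{d,\pi} \beta_{y,z_d} + (\bar x_N - \bar x_\pi) \beta_{y,X} \mid F \right\} \\
&\doteq \bar z_{d,N} \beta_{y,z_d} + \bar x_N \beta_{y,X},
\end{aligned} \tag{65}$$

where the approximation is due to the use of the ratio estimator $\bar z_{d,\pi}$ defined in (61). Because the variable $z$ is the variable whose omission from the full model can produce a bias, it seems prudent to test the coefficient of $z$ before using the reduced model to construct an estimator for the population mean of $y$. This can be done using a model estimator of the variance,

$$\hat V\left\{ \hat\beta_{y,Z_1} \mid Z_1 \right\} = \left( Z_1' \Sigma_{ee}^{-1} Z_1 \right)^{-1},$$

or using the design estimator of variance. See Du Mouchel and Duncan (1983) and Fuller (1984).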
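The identity in Theorem 5 can be checked numerically. The sketch below, written under assumed inputs, builds $z = \Sigma_{ee} L_\pi$ and $z_d$ as in (57) and (58), fits the full model (59), and compares the estimator (62) with the form (63). It also forms the model-based test statistic for the coefficient of $z_d$ using the variance expression given above. The covariance matrix, the sample, and the population mean `xbar_N` are simulated and hypothetical, and `gls` is an illustrative helper rather than a function from any survey package.

```python
import numpy as np

def gls(X, S, y):
    """Generalized least squares coefficient (X' S^{-1} X)^{-1} X' S^{-1} y."""
    S_inv_X = np.linalg.solve(S, X)
    return np.linalg.solve(X.T @ S_inv_X, S_inv_X.T @ y)

rng = np.random.default_rng(4)
n = 12
pi = rng.uniform(0.1, 0.5, size=n)
L_pi = 1.0 / pi
a = L_pi / L_pi.sum()                       # a_i of (3)

X = np.column_stack([np.ones(n), rng.normal(size=n)])   # X = (J_n, X_1)
A = rng.normal(size=(n, n))
S = A @ A.T / n + np.eye(n)                 # an assumed positive definite Sigma_ee
y = rng.normal(size=n)
xbar_N = np.array([1.0, 0.2])               # assumed known population mean of x

z = S @ L_pi                                # (57)
z_d = z - X @ gls(X, S, z)                  # (58)
Z1 = np.column_stack([z_d, X])              # full-model design matrix of (59)

beta_Z1 = gls(Z1, S, y)                     # coefficients (beta_zd, beta_X)
beta_X = gls(X, S, y)                       # reduced-model coefficients

zbar_d_pi = a @ z_d                         # (61)
y_reg2_full = np.concatenate([[zbar_d_pi], xbar_N]) @ beta_Z1   # (62)
y_reg2_short = a @ y + (xbar_N - a @ X) @ beta_X                # (63)
print(y_reg2_full, y_reg2_short)            # equal up to rounding error

# Model-based test statistic for the coefficient of z_d.
V_model = np.linalg.inv(Z1.T @ np.linalg.solve(S, Z1))
t_zd = beta_Z1[0] / np.sqrt(V_model[0, 0])
print(t_zd)
```

A large absolute value of `t_zd` would suggest that the reduced model is inadequate; the sketch treats $\Sigma_{ee}$ as known, as in the variance expression above.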

5.

Bibliography

Brewer, K.R.W. (1963). Ratio estimation and finite population: some results deductible from the assumption of an underlying stochastic process. Australian Journal of Statistics. 5, 93–105. Brewer, K.R.W. (1979). A class of robust sampling designs for large scale surveys. Journal of the American Statistical Association. 74, 911–915. Brewer, K.R.W., Hanif, M. and Tam, S. M. (1988). How nearly can model-based prediction and design-based estimation be reconciled? Journal of the American Statistical Association. 83, 128–132. Cassel, C.M., Särndal, C.E. and Wretman, J.H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika. 63, 615–620. Cassel, C.M., Särndal, C.E. and Wretman, J.H. (1979). Prediction theory for finite populations when modelbased and design-based principles are combined. Scandinavian Journal of Statistics. 6, 97–106. Cassel, C.M., Särndal, C.E. and Wretman, J.H. (1983). Some uses of statistical models in connection with the nonresponse problem. In: W.G. Madow and I. Olkin (eds.), Incomplete Data in Sample Surveys, Vol. 3. Academic Press, New York, pp. 143–160. Cochran, W.G. (1942). Sampling theory when the sampling units are of unequal sizes. Journal of the American Statistical Association. 37, 199–212. Cochran, W.G. (1977). Sampling Techniques. 3rd ed. Wiley, New York. Du Mouchel, W. H. and Duncan, G. J. (1983). Using survey weights in multiple regression analysis of stratified samples. Journal of the American Statistical Association. 78, 535–543. Fuller, W. A. (1984). Least squares and related analyses for comples survey designs. Survey Methodology. 10, 97–118. Gerow, K. and McCulloch, C.E. (2000). Simultaneously model unbiased, design-unbiased estimation. Biometrics. 56, 873–878. Hansen, M.H., Madow, W.G. and Tepping, B.J. (1983). An evaluation of model-dependent and probabilitysampling inferences in sample surveys. Journal of the American Statistical Association. 
78, 776–793. Isaki, C.T. (1970). Survey designs utilizing prior infor-

1098


mation. Unpublished Ph.D. thesis. Iowa State University. Isaki, C.T. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. Journal of the American Statistical Association. 77, 89–96.

Jessen, R.J. (1942). Statistical investigation of a sample survey for obtaining farm facts. Iowa Agricultural Experiment Station Research Bulletin. 304.

Lazzeroni, L.C. and Little, R.J.A. (1998). Random-effects models for smoothing poststratification weights. Journal of Official Statistics. 14, 61–78.

Little, R.J.A. (1993). Post-stratification: a modeler's perspective. Journal of the American Statistical Association. 88, 1001–1012.

Montanari, G.E. (1987). Post-sampling efficient Q-R prediction in large-sample surveys. International Statistical Review. 55, 191–202.

Montanari, G.E. (1999). A study on the conditional properties of finite population mean estimators. Metron. 57, 21–35.

Pfeffermann, D. (1984). Note on large sample properties of balanced samples. Journal of the Royal Statistical Society. 46, 38–41.

Rao, J.N.K. (2002). Small area estimation: Theory and Methods. Wiley, New York.

Robinson, P.M. and Särndal, C.E. (1983). Asymptotic properties of the generalized regression estimator in probability sampling. Sankhya Series B. 45, 240–248.

Royall, R.M. (1970). On finite population sampling theory under certain linear regression models. Biometrika. 57, 377–387.

Royall, R.M. (1976). The linear least squares prediction approach to two-stage sampling. Journal of the American Statistical Association. 71, 657–664.

Royall, R.M. (1992). Robust and optimal design under prediction models for finite populations. Survey Methodology. 18, 179–185.

Royall, R.M. and Cumberland, W.G. (1981a). An empirical study of the ratio estimator and estimators of its variance. Journal of the American Statistical Association. 76, 66–77.

Royall, R.M. and Cumberland, W.G. (1981b). The finite population linear regression estimator and estimators of its variance, an empirical study. Journal of the American Statistical Association. 76, 924–930.

Royall, R.M. and Herson, J. (1973a). Robust estimation in finite populations I. Journal of the American Statistical Association. 68, 880–889.

Royall, R.M. and Herson, J. (1973b). Robust estimation in finite populations II: Stratification on a size variable. Journal of the American Statistical Association. 68, 890–893.

Särndal, C.E. (1980). On π inverse weighting versus best linear unbiased weighting in probability sampling. Biometrika. 67, 639–650.

Särndal, C.E. (1996). Efficient estimators with simple variance in unequal probability sampling. Journal of the American Statistical Association. 91, 1289–1300.

Scott, A. and Smith, T.M.F. (1974). Linear superpopulation models in survey sampling. Sankhya C. 36, 143–146.

Tallis, G.M. (1978). Note on robust estimation in finite populations. Sankhya C. 40, 136–138.

Tam, S.M. (1986). Characterization of best model-based predictors in survey sampling. Biometrika. 73, 232–235.

Watson, D.J. (1937). The estimation of leaf area in field crops. Journal of Agricultural Science. 27, 474–483.

Wright, R.L. (1983). Finite population sampling with multivariate auxiliary information. Journal of the American Statistical Association. 78, 879–884.

Zyskind, G. (1976). On canonical forms, non-negative covariance matrices and best and simple least squares linear estimators in linear models. Annals of Mathematical Statistics. 38, 1092–1109.
