2003 Joint Statistical Meetings - Section on Survey Research Methods
Design Consistent Estimators for a Mixed Linear Model on Survey Data

Rong Huang and Mike Hidiroglou
Business Survey Methods Division, Statistics Canada, Ottawa, Ontario K1Y 0A6

ABSTRACT

Some investigations associated with large government surveys require statistical analysis for populations that have a complex hierarchical structure. Classical analyses often fail to account for the nature of complex sampling designs and may result in incorrect inference for the parameters of interest. Linear mixed models can be used to analyze survey data collected from such populations in order to incorporate the complex hierarchical design structure. In this paper, we develop a method for estimating the parameters of the linear mixed model that accounts for such sampling designs. We obtain the pseudo best linear unbiased estimators for the fixed and random effects by solving weighted sample estimating equations. The use of survey weights results in design consistent estimation. We also derive estimators of the variance components for the nested error linear regression model. We compare the efficiency of the proposed estimators with that of existing estimators using a simulation study based on a two-stage sampling design. Several informative and non-informative sampling schemes are considered in the simulation.
1 Introduction

Many large surveys conducted by government agencies use multistage sampling schemes because of the multilevel or hierarchical structure of the population from which the sample units are selected. Consequently, investigations associated with such surveys typically require statistical analysis for populations that have a complex hierarchical structure.

Classical analyses often fail to account for the nature of complex sampling designs and may result in biased estimators for the parameters of interest. Linear mixed models (LMM) or multilevel models (Goldstein, 1995) have been used to analyze survey data collected from populations with complex hierarchical design structures. Related work can be found in Henderson (1975), Laird and Ware (1982), Battese et al. (1988), Prasad and Rao (1990), and Schall (1991). Ignoring the sampling design can still produce unbiased estimators for the parameters of interest in some cases, for example when the characteristics of the sampling design are fully explained by the LMM (non-informative design). However, when the sampling design is informative, in the sense that the sampling weights depend on the values of the response even after conditioning on the covariates in the LMM, the conventional estimators of the model parameters can be biased. How to incorporate informative sampling designs into multilevel analysis has generated much research, especially in the field of small area estimation (Prasad and Rao, 1999; You and Rao, 2002). Analytical methods for estimating model parameters that incorporate informative designs are desired for general LMM on complex survey data.

In this paper, we develop a general method for estimating the parameters of the linear mixed model on survey data accounting for sampling designs. In section 2, we introduce the LMM we have considered and two special cases that have been widely employed for analyzing survey data. In section 3, we propose the pseudo empirical best linear unbiased estimators for the fixed and random effects by solving weighted sample estimating equations (WSEE). In section 4, we develop estimators for variance components for
the nested error linear regression model. In section 5, we discuss the model-based and design-based properties of the estimators for variance components. We then compare the efficiency of the proposed estimators with that of existing estimators using a simulation study in section 6. We conclude our paper in section 7.

2 Model

Suppose that a population $U$ has a two-level structure represented by levels 1 and 2. Using the Goldstein (1995) terminology, level 2 has $M$ clusters and the $i$th cluster has $N_i$ level 1 elements. Let $Y_{ij}$ denote the response of interest from the $j$th element in the $i$th cluster. Let $\underline{z}_{ij} = \mathrm{vec}\{z_{kij}\}_{k=1}^{q_i}$ denote a vector of characteristics associated with a $q_i \times 1$ random effect $b_i$ for the $j$th element within the $i$th cluster, where vec is an operator such that $\mathrm{vec}\{\bullet_k\}_{k=1}^{K} = (\bullet_1^T, \ldots, \bullet_K^T)^T$. Let $\underline{x}_{ij} = \mathrm{vec}\{x_{kij}\}_{k=1}^{p}$, with $x_{1ij} = 1$, denote the vector of characteristics associated with the fixed effect $\beta$ for the $j$th element within the $i$th cluster. Let $\epsilon_{ij}$ be the random disturbance. The linear mixed model we consider can be expressed as

$$Y_{ij} = \underline{x}_{ij}^T \beta + \underline{z}_{ij}^T b_i + \epsilon_{ij} \qquad (2.1)$$

for $i = 1, \ldots, M$, $j = 1, \ldots, N_i$.

Model (2.1) can be expressed in matrix form. Let $Y_i = \mathrm{vec}\{Y_{ij}\}_{j=1}^{N_i}$ and $Y = \mathrm{vec}\{Y_i\}_{i=1}^{M}$, an $N \times 1$ vector, where $N = \sum_{i=1}^{M} N_i$. Similarly, let $\epsilon_i = \mathrm{vec}\{\epsilon_{ij}\}_{j=1}^{N_i}$ and $\epsilon = \mathrm{vec}\{\epsilon_i\}_{i=1}^{M}$. Denote $\mathrm{vec}\{b_i\}_{i=1}^{M}$ as $\underline{b}$. Let $\underline{Z}_i = \mathrm{vec}\{\underline{z}_{ij}^T\}_{j=1}^{N_i}$ and $\underline{Z} = \oplus\{\underline{Z}_i\}_{i=1}^{M}$, where $\oplus$ denotes the direct sum. Matrix $\underline{Z}$ is block diagonal with $\underline{Z}_i$, $i = 1, \ldots, M$, as its diagonal blocks. Let $\underline{X}_i = \mathrm{vec}\{\underline{x}_{ij}^T\}_{j=1}^{N_i}$ and $\underline{X} = \mathrm{vec}\{\underline{X}_i\}_{i=1}^{M}$. The matrix expression of (2.1) is then

$$Y = \underline{X}\beta + \underline{Z}\,\underline{b} + \epsilon \qquad (2.2)$$

The random effects $b_i$ and the disturbances $\epsilon_i$ are assumed to have mean 0 and variance-covariance matrices $\underline{D}_i$ and $\underline{R}_i$, respectively. We assume that $b_i$ and $\epsilon_i$ are independent. The variance-covariance matrix $\underline{R} = \mathrm{cov}(Y \mid \underline{b})$ is normally made up of diagonal blocks, $\underline{R} = \oplus\{\underline{R}_i\}_{i=1}^{M}$. The random effects $b_i$, $i = 1, \ldots, M$, are usually independent of one another, which implies that the variance-covariance matrix of $\underline{b}$ is $\underline{D} = \oplus\{\underline{D}_i\}_{i=1}^{M}$. The resulting variance-covariance matrix of the responses $Y$ is $\underline{V} = \underline{R} + \underline{Z}\,\underline{D}\,\underline{Z}^T$, where $\underline{V} = \oplus\{\underline{V}_i\}_{i=1}^{M}$ with blocks $\underline{V}_i = \underline{R}_i + \underline{Z}_i \underline{D}_i \underline{Z}_i^T$, $i = 1, \ldots, M$.

2.1 Nested error linear regression model

The nested error linear regression model is a special case of model (2.1). Battese, Harter and Fuller (1988), Prasad and Rao (1990), and others used the nested error linear regression model to estimate the means of small areas. In the nested error linear regression model, the random effects are essentially the area effects. Thus, the vector $b_i$ becomes a scalar, still denoted $b_i$, with covariate $\underline{z}_{ij} = 1$. For each observation $Y_{ij}$, we have

$$Y_{ij} = \underline{x}_{ij}^T \beta + b_i + \epsilon_{ij} \qquad (2.3)$$

The random area effects $b_i$, $i = 1, \ldots, M$, are independent of one another with mean 0 and a common variance $\sigma_b^2$, i.e., $\underline{D} = \mathrm{cov}(\underline{b}) = \sigma_b^2 I_M$. The random errors $\epsilon_{ij}$, $i = 1, \ldots, M$, $j = 1, \ldots, N_i$, are assumed to be independent with mean 0 and a common variance $\sigma_e^2$, i.e., $\underline{R} = \mathrm{cov}(\epsilon) = \sigma_e^2 I_N$.

2.2 Simple random effects model

In a simple random effects model, all elements share a common mean, i.e., the fixed effect $\beta$ reduces to a scalar $\mu$. Consequently, the covariate is $\underline{x}_{ij} = 1$. The random effects are essentially the cluster effects. The vector $b_i$ becomes a scalar, denoted $b_i$, with covariate $\underline{z}_{ij} = 1$, the same as in the nested error linear regression model (2.3). The simple random effects model can be expressed as

$$Y_{ij} = \mu + b_i + \epsilon_{ij}, \qquad (2.4)$$

which is also known as a one-way classification model in ANOVA.
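As an illustration of models (2.3) and (2.4), the following is a minimal sketch that draws one finite population with NumPy. The function name `simulate_nested_error`, the normal distributions for $b_i$ and $\epsilon_{ij}$, and the standard normal covariates are assumptions made for this example only; the model itself only specifies means and variances.

```python
import numpy as np

def simulate_nested_error(cluster_sizes, beta, sigma_b, sigma_e, seed=0):
    """Draw one population from Y_ij = x_ij' beta + b_i + eps_ij (model 2.3).

    Normality of b_i and eps_ij is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    cluster_sizes = np.asarray(cluster_sizes)
    beta = np.asarray(beta, dtype=float)
    M, N = cluster_sizes.size, int(cluster_sizes.sum())
    cluster = np.repeat(np.arange(M), cluster_sizes)   # cluster label of each element
    # First covariate is the intercept x_1ij = 1; the others are illustrative N(0, 1) draws.
    X = np.column_stack([np.ones(N), rng.normal(size=(N, beta.size - 1))])
    b = rng.normal(0.0, sigma_b, size=M)               # area effects, variance sigma_b^2
    eps = rng.normal(0.0, sigma_e, size=N)             # element errors, variance sigma_e^2
    Y = X @ beta + b[cluster] + eps
    return Y, X, cluster, b

# Three clusters with N_i = 3, 5, 4 and p = 2 fixed effects.
Y, X, cluster, b = simulate_nested_error([3, 5, 4], beta=[10.0, 2.0],
                                         sigma_b=1.0, sigma_e=0.5)
```

Passing a single-element `beta` gives the simple random effects model (2.4), since the design matrix then contains only the intercept.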
3 Estimation

3.1 Sampling design set-up
Given the population model (2.2), we can obtain a corresponding census estimating equation (CEE). Since population data are usually not available in practice, we estimate the parameters in the CEE using a sample. We assume a two-stage sampling design. At stage 1, $m$ level 2 clusters are selected from the $M$ population clusters with inclusion probabilities $\pi_i$, $i = 1, \ldots, m$. At stage 2, $n_i$ elements are selected with inclusion probabilities $\pi_{j|i}$ from the $N_i$ elements within the $i$th selected cluster. In total, $n = \sum_i n_i$ units are sampled from the population. The weight associated with the $i$th cluster is $w_i = \pi_i^{-1}$, whereas the one associated with the $j$th selected element in the $i$th selected cluster is $w_{j|i} = \pi_{j|i}^{-1}$. Hence, the inclusion probability associated with the $ij$th unit is $\pi_{ij} = \pi_i \pi_{j|i}$ and its weight is $w_{ij} = w_i w_{j|i}$. Let $W_{.|i} = \oplus\{w_{j|i}\}_{j=1}^{n_i}$ and $W = \oplus\{w_i W_{.|i}\}_{i=1}^{m}$ be the matrices of second stage weights and final weights, respectively.

We further assume that the sample data $y$ follow the same model as the population data,

$$y = X\beta + Zb + e. \qquad (3.1)$$

The random effects $b$ and the disturbance $e$ are assumed to have mean 0 and variance-covariance matrices $D$ and $R$, respectively. The variance-covariance matrix is $R = \mathrm{cov}(y \mid b) = \oplus\{R_i\}_{i=1}^{m}$, where $R_i$ is a sub-matrix of $\underline{R}_i = \mathrm{cov}(Y_i \mid b_i)$, and $D = \oplus\{D_i\}_{i=1}^{m}$. The resulting variance-covariance matrix of the responses $y$ is $V = R + ZDZ^T$, where $V = \oplus\{V_i\}_{i=1}^{m}$ with blocks $V_i = R_i + Z_i D_i Z_i^T$, $i = 1, \ldots, m$.

Under model (3.1), the unweighted sample estimating equation (UWSEE) $\Phi_0$ is

$$\Phi_0(\beta, b) = \begin{pmatrix} \Phi_{01} \\ \Phi_{02} \end{pmatrix} + \begin{pmatrix} 0 \\ \Phi_{03} \end{pmatrix} = 0 \qquad (3.2)$$

where $\Phi_{01}(y, \beta, b) = X^T R^{-1}(y - X\beta - Zb)$, $\Phi_{02}(y, \beta, b) = \mathrm{vec}\{\Phi_{02i}(y_i, \beta, b_i)\}_{i=1}^{m}$ with $\Phi_{02i}(y_i, \beta, b_i) = Z_i^T R_i^{-1}(y_i - X_i\beta - Z_i b_i)$, and $\Phi_{03}(b) = \mathrm{vec}\{\Phi_{03i}(b_i)\}_{i=1}^{m}$ with $\Phi_{03i}(b_i) = -D_i^{-1} b_i$ for $i = 1, \ldots, m$.

Investigations (e.g. Pfeffermann et al. (1998)) showed that estimates obtained by solving the UWSEE can be biased when sampling is informative. Incorporating design weights, we construct a WSEE as

$$\Phi^*(\beta, b) = \begin{pmatrix} \hat\Phi_1 \\ \hat\Phi_2 \end{pmatrix} + \begin{pmatrix} 0 \\ \Omega^{-1}\Phi_3 \end{pmatrix} = 0 \qquad (3.3)$$

where $\hat\Phi_1 = X^T R^{-1} W (y - X\beta - Zb)$ and $\hat\Phi_{2i} = Z_i^T R_i^{-1} W_{.|i} (y_i - X_i\beta - Z_i b_i)$, $i = 1, \ldots, m$. Define $\Omega = \oplus\{\Omega_i\}_{i=1}^{m}$ as a matrix of the first order inclusion probabilities $\pi_i$, $i = 1, \ldots, m$, with $\Omega_i = \pi_i I_{q_i}$. Let $\hat\Phi_2 = \mathrm{vec}\{\pi_i^{-1}\hat\Phi_{2i}\}_{i=1}^{m} = Z^T R^{-1} W (y - X\beta - Zb)$. It can be shown that when the variance-covariance matrix $\mathrm{cov}(Y \mid b) = \underline{R}$ is diagonal, $\hat\Phi_1$ is the optimal estimator for $\Phi_1$ in the CEE by Theorem 1 in Godambe and Thompson (1986). A similar argument applies to $\hat\Phi_{2i}$.

Solving the WSEE (3.3), we obtain the pseudo BLUP for $\beta$ and $b$ as

$$\hat\beta = \{X^T [V^*]^{-1} X\}^{-} X^T [V^*]^{-1} y \qquad (3.4)$$

and

$$\hat b_i = \Omega_i D_i Z_i^T [V_i^*]^{-1} (y_i - X_i \hat\beta) \qquad (3.5)$$

where $V^* = \oplus\{V_i^*\}_{i=1}^{m}$ with $V_i^* = \pi_i [W_{.|i}^{-1} R_i + Z_i D_i Z_i^T]$.

When the variance components are known, the covariances of $\hat\beta$ and $\hat b_i$ are

$$\mathrm{cov}(\hat\beta) = \{X^T [V^*]^{-1} X\}^{-} \{X^T [V^*]^{-1} V [V^*]^{-1} X\} \{X^T [V^*]^{-1} X\}^{-} \qquad (3.6)$$

and

$$\mathrm{cov}(\hat b_i) = \{\Omega_i D_i Z_i^T [V_i^*]^{-1}\}\, C\, \{\Omega_i D_i Z_i^T [V_i^*]^{-1}\}^T \qquad (3.7)$$

where $C = \mathrm{cov}(y_i - X_i\hat\beta) = V_i + X_i\,\mathrm{cov}(\hat\beta)\,X_i^T - 2 V_i [V_i^*]^{-1} X_i [X^T [V^*]^{-1} X]^{-} X_i^T$.

The $i$th component of $\hat\Phi_2 + \Omega^{-1}\Phi_3$ in WSEE (3.3) does not reduce to $\Phi_{02i} + \Phi_{03i}$ in the simple SEE (3.2) under a simple random sampling (SRS) design. To adjust for the SRS design, we define $\Omega_i^\dagger = \left(\sum_j w_{ij} / \sum_j w_{ij}^2\right) I_{q_i}$ and $\Omega^\dagger = \oplus\{\Omega_i^\dagger\}_{i=1}^{m}$. The WSEE adjusted for SRS is

$$\Phi^\dagger(\beta, b) = \begin{pmatrix} \hat\Phi_1 \\ \hat\Phi_2 \end{pmatrix} + \begin{pmatrix} 0 \\ [\Omega^\dagger]^{-1}\Phi_3 \end{pmatrix} = 0. \qquad (3.8)$$

In the case of SRS at both the cluster level and the element level, $W_{.|i} = \frac{N_i}{n_i} I_{n_i}$ and $W = \oplus\{\frac{M}{m}\frac{N_i}{n_i} I_{n_i}\}_{i=1}^{m}$. The matrix $\Omega_i^\dagger$ is then $\frac{m}{M}\frac{n_i}{N_i} I_{q_i}$, and the $i$th component of the equation $\hat\Phi_2 + [\Omega^\dagger]^{-1}\Phi_3 = 0$ in (3.8) reduces to $\Phi_{02i} + \Phi_{03i} = 0$ as in the simple SEE (3.2).

Solving $\Phi^\dagger$, we obtain another set of pseudo BLUP $\hat\beta^\dagger$, $\hat b_i^\dagger$ and their variance-covariance matrices by substituting $V^*$ by $V^\dagger$ and $\Omega$ by $\Omega^\dagger$ in (3.4)-(3.7), where $V^\dagger = W^{-1} R + Z\,\Omega^\dagger D Z^T$ and $C = \mathrm{cov}(y_i - X_i\hat\beta^\dagger) = V_i + X_i\,\mathrm{cov}(\hat\beta^\dagger)\,X_i^T - 2 V_i [V_i^\dagger]^{-1} X_i [X^T [V^\dagger]^{-1} X]^{-} X_i^T$.
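The pseudo BLUP in (3.4) and (3.5) can be sketched numerically as follows, assuming known variance components. The function name `pseudo_blup`, its list-based argument layout, and the toy data are hypothetical choices for this example; the block form $V_i^* = \pi_i[W_{.|i}^{-1} R_i + Z_i D_i Z_i^T]$ is taken from the text, and `numpy.linalg.pinv` stands in for the generalized inverse.

```python
import numpy as np

def pseudo_blup(y, X, Z, R, D, pi, w_cond):
    """Pseudo BLUP of beta and b_i following (3.4)-(3.5), variance components known.

    Every argument is a list with one entry per sampled cluster i:
    y[i] (n_i,), X[i] (n_i, p), Z[i] (n_i, q_i), R[i] (n_i, n_i),
    D[i] (q_i, q_i), pi[i] scalar pi_i, w_cond[i] (n_i,) holding w_{j|i}.
    """
    # V*_i = pi_i [W_{.|i}^{-1} R_i + Z_i D_i Z_i^T], inverted block by block.
    V_star_inv = [
        np.linalg.inv(p * (np.diag(1.0 / w) @ Ri + Zi @ Di @ Zi.T))
        for Zi, Ri, Di, p, w in zip(Z, R, D, pi, w_cond)
    ]
    XtVX = sum(Xi.T @ Vi @ Xi for Xi, Vi in zip(X, V_star_inv))
    XtVy = sum(Xi.T @ Vi @ yi for Xi, Vi, yi in zip(X, V_star_inv, y))
    beta = np.linalg.pinv(XtVX) @ XtVy   # generalized inverse as in (3.4)
    # b_i = Omega_i D_i Z_i^T [V*_i]^{-1} (y_i - X_i beta), with Omega_i = pi_i I.
    b = [p * Di @ Zi.T @ Vi @ (yi - Xi @ beta)
         for yi, Xi, Zi, Di, p, Vi in zip(y, X, Z, D, pi, V_star_inv)]
    return beta, b

# Toy nested error example with made-up numbers:
# Z_i = 1_{n_i}, D_i = sigma_b^2, R_i = sigma_e^2 I.
sigma_b2, sigma_e2 = 2.0, 1.0
y = [np.array([1.0, 2.0, 4.0]), np.array([3.0, 5.0]), np.array([2.0, 2.5, 3.5, 6.0])]
x = [np.array([0.1, 0.5, 0.9]), np.array([0.3, 0.8]), np.array([0.2, 0.4, 0.6, 1.0])]
X = [np.column_stack([np.ones(v.size), v]) for v in x]
Z = [np.ones((v.size, 1)) for v in x]
R = [sigma_e2 * np.eye(v.size) for v in x]
D = [np.array([[sigma_b2]]) for _ in x]
pi = [0.5, 0.25, 0.2]
w_cond = [np.array([1.0, 2.0, 1.0]), np.array([2.0, 2.0]), np.array([1.0, 3.0, 2.0, 1.0])]

beta_hat, b_hat = pseudo_blup(y, X, Z, R, D, pi, w_cond)
```

Inverting each block separately avoids forming the full $n \times n$ matrix $V^*$, which is block diagonal by construction. For the SRS-adjusted version, one would replace $\pi_i$ in both places by the scalar in $\Omega_i^\dagger$, under the same assumptions.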
When the variance components are unknown, we obtain the pseudo EBLUP for $\beta$ and $b$ and their covariances by plugging in the consistent estimates of the variance components discussed in section 4.

3.2 Nested error linear regression model

The variance-covariance matrix for the random effects vector $b$ is $D = \sigma_b^2 I_m$, that for the random errors is $R = \sigma_e^2 I_n$, and the covariate matrix is $Z_i = 1_{n_i}$. Consequently, the covariance of $y_i$ is $V_i = \sigma_b^2 J_{n_i} + \sigma_e^2 I_{n_i}$, where $J_{n_i} = 1_{n_i} 1_{n_i}^T$. Using (3.4) and (3.5), we obtain the pseudo BLUP estimator for the fixed effects as

$$\hat\beta = \Big[\sum_{ij} x_{ij} d_{ij}^T\Big]^{-1} \Big[\sum_{ij} y_{ij} d_{ij}\Big] \qquad (3.9)$$

and that for the random effects $b_i$ as

$$\hat b_i = \gamma_i (\bar y_{iw} - \bar x_{iw}^T \hat\beta) \qquad (3.10)$$

where $d_{ij} = w_{ij}(x_{ij} - \gamma_i \bar x_{iw})$, $\gamma_i = \sigma_b^2 / (\sigma_e^2 / \sum_j w_{j|i} + \sigma_b^2)$, $\bar y_{iw} = \sum_j w_{j|i} y_{ij} / \sum_j w_{j|i}$ and $\bar x_{iw} = \sum_j w_{j|i} x_{ij} / \sum_j w_{j|i}$. The estimator (3.9) agrees with that obtained by Pfeffermann et al. (1998). It is straightforward to obtain the corresponding variance-covariance matrices of $\hat\beta$ and $\hat b_i$, $i = 1, \ldots, m$.

The pseudo BLUP adjusted for SRS can be obtained by solving the WSEE (3.8). The resulting estimators $\hat\beta^\dagger$ and $\hat b_i^\dagger$ have the same expressions as (3.9) and (3.10), respectively, with $\gamma_i$ replaced by $\gamma_i^\dagger = \sigma_b^2 / \big(\sigma_e^2 \sum_j w_{ij}^2 / (\sum_j w_{ij})^2 + \sigma_b^2\big)$. The estimators $\hat\beta^\dagger$ and $\hat b_i^\dagger$ we obtain for the nested error linear regression model coincide with those of You and Rao (2002).

3.3 Simple random effects model

Substituting $x_{ij} = 1$ into (3.9) and (3.10), we obtain the pseudo BLUP estimator of the mean as

$$\hat\mu = \frac{\sum_{ij} w_{ij}(1 - \gamma_i)\,\bar y_{iw}}{\sum_{ij} w_{ij}(1 - \gamma_i)} \qquad (3.11)$$

and that for the random effects as

$$\hat b_i = \gamma_i(\bar y_{iw} - \hat\mu). \qquad (3.12)$$

The respective variances of $\hat\mu$ and $\hat b_i$ can be obtained when $\sigma_e^2$ and $\sigma_b^2$ are known.

Replacing $\gamma_i$ by $\gamma_i^\dagger$ in (3.11) and (3.12) results in the pseudo BLUP $\hat\mu^\dagger$ and $\hat b_i^\dagger$, respectively. Note that the estimator $\hat\mu^\dagger$ reduces to $\hat\mu_0 = \sum_i n_i(1 - \tilde\gamma_i)\bar y_i / \sum_i n_i(1 - \tilde\gamma_i)$ in the case of SRS, where $\tilde\gamma_i = \sigma_b^2 / (\sigma_e^2 / n_i + \sigma_b^2)$. The estimator $\hat\mu_0$ coincides with the one obtained by applying the Prasad and Rao (1990) procedure to the simple random effects model. Note that the method of Prasad and Rao (1990) does not account for sampling weights. The estimators $\hat\mu$ and $\hat\mu^\dagger$ result from using both individual level and aggregated level models. Prasad and Rao (1999) proposed an additional estimator $\hat\mu^* = \sum_i \gamma_i^\dagger \bar y_{iw} / \sum_i \gamma_i^\dagger$ for $\mu$ using only an aggregated level model. We compare these four estimators for $\mu$ ($\hat\mu$, $\hat\mu^\dagger$, $\hat\mu_0$, $\hat\mu^*$) with a simulation study in Section 6.

4 Estimation of variance components

The maximum likelihood (ML) method, the restricted maximum likelihood (REML) method and the EM algorithm have been used for estimating the variance-covariance of the response vector $y$. Pfeffermann et al. (1998) proposed an iterative process for estimating the variance-covariance matrix in model (2.2) using only first-order selection probabilities. For the special case of the nested error linear regression model, Prasad and Rao (1990) and You and Rao (2002) adopted Henderson's method III to estimate the variance components $\sigma_b^2$ and $\sigma_e^2$. However, the accuracy of these estimated variances is not known in the design based context. Furthermore, they do not take into account the sampling design.

One way to account for the sampling design is to duplicate the sample data using the sample weights to obtain a pseudo population. The resulting pseudo population resembles the true population insofar as the sampled cluster $i$ can be considered to have been selected with probability $\pi_i$, and the sampled element $j$ within cluster $i$ can be considered to have been selected with probability $\pi_{j|i}$, from this pseudo population.

The duplication occurs in two steps. We
assume that the sampling weights $\{w_i\}$ and $\{w_{j|i}\}$ are integers. First, the response $y_{ij}$ and the covariates $x_{ij}$ and $z_{ij}$ observed on the $ij$th element are duplicated $w_{j|i}$ times. The inflated response vector is $\tilde y_i = \mathrm{vec}\{1_{w_{j|i}}\, y_{ij}\}_{j=1}^{n_i}$. Similarly, the inflated covariate matrices are $\tilde X_i = \mathrm{vec}\{1_{w_{j|i}}\, x_{ij}^T\}_{j=1}^{n_i}$ and $\tilde Z_i = \mathrm{vec}\{1_{w_{j|i}}\, z_{ij}^T\}_{j=1}^{n_i}$.

Second, the inflated vector $\tilde y_i$ and the covariate matrices $\tilde X_i$ and $\tilde Z_i$ are duplicated $w_i$ times to form the response vector $\tilde y$ and the covariate matrices $\tilde X$ and $\tilde Z$ of the pseudo population. This results in $\tilde y = \mathrm{vec}\{1_{w_i} \otimes \tilde y_i\}_{i=1}^{m}$, $\tilde X = \mathrm{vec}\{1_{w_i} \otimes \tilde X_i\}_{i=1}^{m}$, and $\tilde Z = \oplus\{I_{w_i} \otimes \tilde Z_i\}_{i=1}^{m}$.

Denote the inflated random disturbances as $\tilde e$ and the inflated random effects as $\tilde b$. Their respective variance-covariance matrices are $\tilde R$ and $\tilde D$. Here, we assume that the components of $\tilde e$, the $\{\tilde e_{ij}\}$, are independent. Also, the components of $\tilde b$, the $\{\tilde b_i\}$, are independent. The resulting model for this pseudo population is

$$\tilde y = \tilde X\beta + \tilde Z\tilde b + \tilde e \qquad (4.1)$$

and the variance-covariance matrix of $\tilde y$ is $\tilde V = \tilde R + \tilde Z\tilde D\tilde Z^T$.

Using Henderson's method III on the pseudo population, the sum of squared errors (SSE) for fitting the nested error linear regression model (4.1) on the pseudo population is

$$SSE = \sum_{ij} w_{ij}(y_{ij} - \bar y_{iw})^2 - \Big[\sum_{ij} w_{ij}(y_{ij} - \bar y_{iw})(x_{ij} - \bar x_{iw})^T\Big] \Big[\sum_{ij} w_{ij}(x_{ij} - \bar x_{iw})(x_{ij} - \bar x_{iw})^T\Big]^{-1} \Big[\sum_{ij} w_{ij}(y_{ij} - \bar y_{iw})(x_{ij} - \bar x_{iw})\Big]$$

and the estimator for $\sigma_e^2$ is

$$\hat\sigma_{ew}^2 = SSE \Big/ \Big(\sum_{ij} w_{ij} - \sum_i w_i - p + 1\Big). \qquad (4.2)$$

Denote the difference between the reduction in the sum of squares due to fitting model (4.1) and that due to fitting the model $\tilde y = \tilde X\beta + \tilde e$ as $S(b|\beta)$, where

$$S(b|\beta) = \sum_{ij} w_{ij} y_{ij}^2 - \Big(\sum_{ij} w_{ij} y_{ij} x_{ij}^T\Big)\Big(\sum_{ij} w_{ij} x_{ij} x_{ij}^T\Big)^{-1}\Big(\sum_{ij} w_{ij} y_{ij} x_{ij}\Big) - SSE.$$

Consequently, the variance component $\sigma_b^2$ can be estimated by

$$\hat\sigma_{bw}^2 = \Big[S(b|\beta) - \Big(\sum_i w_i - 1\Big)\hat\sigma_{ew}^2\Big] \Big/ \Big(\sum_{ij} w_{ij} - t_1\Big), \qquad (4.3)$$

where $t_1 = \mathrm{tr}\big[(\sum_{ij} w_{ij} x_{ij} x_{ij}^T)^{-1} \sum_i w_i (\sum_j w_{j|i})^2\, \bar x_{iw} \bar x_{iw}^T\big]$.

For the simple random effects model, by substituting $x_{ij} = 1$ and $p = 1$ into (4.2) and (4.3), the estimators for the variance components are obtained as

$$\hat\sigma_{e0w}^2 = \sum_{ij} w_{ij}(y_{ij} - \bar y_{iw})^2 \Big/ \Big(\sum_{ij} w_{ij} - \sum_i w_i\Big) \qquad (4.4)$$

and

$$\hat\sigma_{b0w}^2 = \frac{\sum_{ij} w_{ij}(\bar y_{iw} - \bar y_w)^2 - (\sum_i w_i - 1)\hat\sigma_e^2}{\sum_{ij} w_{ij} - \sum_i w_i (\sum_j w_{j|i})^2 / \sum_{ij} w_{ij}} \qquad (4.5)$$

where $\bar y_w = \sum_{ij} w_{ij} y_{ij} / \sum_{ij} w_{ij}$.

Estimators (4.4) and (4.5) are identical to those obtained by Graubard and Korn (1996) ($\hat\sigma_{eC}^2$ and $\hat\sigma_{aC}^2$), respectively. As noted by Graubard and Korn, estimators (4.4) and (4.5) do not reduce to the simple sample estimators $\hat\sigma_{e0u}^2 = \sum_{ij}(y_{ij} - \bar y_i)^2 / \sum_i (n_i - 1)$ and $\hat\sigma_{b0u}^2 = [\sum_{ij}(y_{ij} - \bar y)^2 - (n - 1)\hat\sigma_e^2] / (n - \sum_i n_i^2 / n)$ (denoted as $\hat\sigma_{eA}^2$ and $\hat\sigma_{bA}^2$ in Graubard and Korn (1996)) for some non-informative sample designs, where $\bar y_i = \sum_j y_{ij} / n_i$.

Furthermore, under model (2.3) for the sample data, estimators (4.2) and (4.3), and thus (4.4) and (4.5), are not model unbiased. To adjust for the bias, we calculate the expectations of SSE and $S(b|\beta)$ under model (2.3). Unbiased estimators for $\sigma_e^2$ and $\sigma_b^2$ are obtained as

$$\hat\sigma_{e\,adj}^2 = SSE \Big/ \Big[\sum_{ij} w_{ij} - \sum_i \frac{\sum_j w_{ij}^2}{\sum_j w_{ij}} - t_2\Big] \qquad (4.6)$$

and

$$\hat\sigma_{b\,adj}^2 = \Big[\sum_{ij} w_{ij} y_{ij}^2 - \Big(\sum_{ij} w_{ij} y_{ij} x_{ij}^T\Big)\Big(\sum_{ij} w_{ij} x_{ij} x_{ij}^T\Big)^{-1}\Big(\sum_{ij} w_{ij} y_{ij} x_{ij}\Big) - \Big(\sum_{ij} w_{ij} - t_3\Big)\hat\sigma_e^2\Big] \Big/ \Big(\sum_{ij} w_{ij} - t_4\Big), \qquad (4.7)$$

where $t_2 = \mathrm{tr}\{[\sum_{ij} w_{ij}(x_{ij} - \bar x_{iw})(x_{ij} - \bar x_{iw})^T]^{-} \sum_{ij} w_{ij}^2 (x_{ij} - \bar x_{iw})(x_{ij} - \bar x_{iw})^T\}$, $t_3 = \mathrm{tr}\{[\sum_{ij} w_{ij} x_{ij} x_{ij}^T]^{-} \sum_{ij} w_{ij}^2 x_{ij} x_{ij}^T\}$, and $t_4 = \mathrm{tr}\{[\sum_{ij} w_{ij} x_{ij} x_{ij}^T]^{-1} \sum_i (\sum_j w_{ij})^2\, \bar x_{iw} \bar x_{iw}^T\}$.

Estimators (4.6) and (4.7) reduce to the simple sample estimators resulting from Henderson's method III that were used by Prasad and Rao (1990) and You and Rao (2002) for some non-informative sampling designs.
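For the simple random effects model, the link between the two-step duplication and the weighted estimators (4.4) and (4.5) can be checked numerically. The sketch below is illustrative only: the function names `weighted_variance_components` and `duplicate`, the per-element array layout, and the toy data are assumptions of this example. With integer weights, applying the same formulas with unit weights to the duplicated data (the usual one-way ANOVA estimators on the pseudo population) should reproduce the weighted estimates on the sample.

```python
import numpy as np

def weighted_variance_components(y, cluster, w_i, w_cond):
    """Estimators (4.4) and (4.5) for the simple random effects model.

    Per-element arrays: y (responses), cluster (integer labels 0..m-1),
    w_i (weight of the element's cluster), w_cond (w_{j|i}).
    """
    y = np.asarray(y, dtype=float)
    w_i, w_cond = np.asarray(w_i), np.asarray(w_cond)
    w = w_i * w_cond                                     # final weights w_ij
    sum_w = w.sum()
    ybar_iw = np.bincount(cluster, weights=w * y) / np.bincount(cluster, weights=w)
    ybar_w = (w * y).sum() / sum_w
    w_cluster = np.bincount(cluster, weights=w_i) / np.bincount(cluster)  # one w_i per cluster
    sum_w_cond = np.bincount(cluster, weights=w_cond)                     # sum_j w_{j|i}
    within = (w * (y - ybar_iw[cluster]) ** 2).sum()
    sigma2_e = within / (sum_w - w_cluster.sum())                         # (4.4)
    between = (w * (ybar_iw[cluster] - ybar_w) ** 2).sum()
    denom = sum_w - (w_cluster * sum_w_cond ** 2).sum() / sum_w
    sigma2_b = (between - (w_cluster.sum() - 1.0) * sigma2_e) / denom     # (4.5)
    return sigma2_e, sigma2_b

def duplicate(y, cluster, w_i, w_cond):
    """Two-step duplication into a pseudo population (integer weights assumed)."""
    ys, labels, next_label = [], [], 0
    for c in np.unique(cluster):
        idx = cluster == c
        y_c = np.repeat(y[idx], w_cond[idx])             # step 1: each element w_{j|i} times
        for _ in range(int(w_i[idx][0])):                # step 2: each cluster w_i times
            ys.append(y_c)
            labels.append(np.full(y_c.size, next_label))
            next_label += 1
    return np.concatenate(ys), np.concatenate(labels)

# Made-up sample: two clusters with integer weights.
y = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
cluster = np.array([0, 0, 0, 1, 1])
w_i = np.array([2, 2, 2, 3, 3])
w_cond = np.array([1, 2, 1, 2, 2])

est_sample = weighted_variance_components(y, cluster, w_i, w_cond)
y_tilde, lab_tilde = duplicate(y, cluster, w_i, w_cond)
ones = np.ones(y_tilde.size)
est_pseudo = weighted_variance_components(y_tilde, lab_tilde, ones, ones)
```

The weighted formulas never materialize the pseudo population, so they remain usable when the weights are large or non-integer; the explicit duplication is shown here only to make the construction concrete.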
For the simple random effects model (2.4), the model unbiased estimators for $\sigma_e^2$ and $\sigma_b^2$ are obtained by substituting $x_{ij} = 1$ into (4.6) and (4.7):

$$\hat\sigma_{e0\,adj}^2 = \frac{\sum_{ij} w_{ij}(y_{ij} - \bar y_{iw})^2}{\sum_{ij} w_{ij} - \sum_i \big(\sum_j w_{ij}^2 / \sum_j w_{ij}\big)} \qquad (4.8)$$

and

$$\hat\sigma_{b0\,adj}^2 = \frac{\sum_{ij} w_{ij}(\bar y_{iw} - \bar y_w)^2 - c\,\hat\sigma_e^2}{\sum_{ij} w_{ij} - \sum_i (\sum_j w_{ij})^2 / \sum_{ij} w_{ij}}, \qquad (4.9)$$

where $c = \sum_i \big(\sum_j w_{ij}^2 / \sum_j w_{ij}\big) - \sum_{ij} w_{ij}^2 / \sum_{ij} w_{ij}$.

Estimators $\hat\sigma_{e0\,adj}^2$ and $\hat\sigma_{b0\,adj}^2$ reduce to $\hat\sigma_{e0u}^2$ and $\hat\sigma_{b0u}^2$ for some non-informative sample designs considered by Graubard and Korn (1996) (e.g. SRS with $w_i = M/m$ and $w_{j|i} = N/n$).

5 Model and Design-based properties of variance components

In this section, we investigate the model and design based properties of the variance component estimators for the simple random effects model (2.4), as well as others discussed in Korn and Graubard (2003). Given that the population values $Y_{ij}$ are assumed to satisfy the simple random effects model (2.4), we are interested in estimating the variances of the $\{b_i\}$ and $\{\epsilon_{ij}\}$, namely $\sigma_b^2$ and $\sigma_e^2$. The model-based variance components $S_e^2$ and $S_b^2$ are estimated via the method of moments (Searle et al. (1992), page 106). Let ...

... sampling design given in section 3.2 is used to draw the sample. As noted in section 4, estimators (4.4) and (4.5) are identical to $\hat\sigma_{eC}^2$ and $\hat\sigma_{bC}^2$ in Graubard and Korn (1996). We provide the design-based and model-based properties of the variance component estimators given in section 4, as well as of those provided in Graubard and Korn (1996) and Korn and Graubard (2003).

The various variance component estimators for $\sigma_e^2$ are

$$\hat\sigma_{eA}^2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i} (y_{ij} - \bar y_i)^2}{\sum_{i=1}^{m} (n_i - 1)}, \qquad (5.3)$$

$$\hat\sigma_{eB}^2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i} w_{ij}(y_{ij} - \bar y_{iw})^2}{\sum_{i=1}^{m}\sum_{j=1}^{n_i} (w_{ij} - 1)}, \qquad (5.4)$$

$$\hat\sigma_{eC}^2 = \frac{\sum_{i=1}^{m} w_i \sum_{j=1}^{n_i} w_{j|i}(y_{ij} - \bar y_{iw})^2}{\sum_{i=1}^{m} w_i \big(\sum_{j=1}^{n_i} w_{j|i} - 1\big)}, \qquad (5.5)$$

$$\hat\sigma_{eD}^2 = \frac{\sum_{i=1}^{m} \bar w_i \sum_{j=1}^{n_i} (y_{ij} - \bar y_i)^2}{\sum_{i=1}^{m} \bar w_i (n_i - 1)}, \qquad (5.6)$$

where $\bar w_i = \sum_{j=1}^{n_i} w_{ij} / n_i$ is the within cluster $i$ sample mean of the weights $w_{ij}$, and
1 2 2 σ ˆeKG =
m
i=1
ni (Ni −1)I(ni >1) πi
m
i=1
(yik −yij )2 (2) π (kj)|i ni 1 k