Design Consistent Estimators for a Mix Linear Model on Survey Data

2003 Joint Statistical Meetings - Section on Survey Research Methods

Design Consistent Estimators for a Mixed Linear Model on Survey Data Rong Huang and Mike Hidiroglou Business Survey Methods Division, Statistics Canada, Ottawa, Ontario K1Y 0A6 ABSTRACT Some investigations associated with large government surveys typically require statistical analysis for populations that have a complex hierarchical structure. Classical analysis often fail to account for the nature of complex sampling designs and possibly result in incorrect inference for the parameters of interest. Linear mixed models can be used to analyze survey data collected from such populations in order to incorporate the complex hierarchical design structure. In this paper, we develop a method for estimating the parameters of the linear mixed model accounting for such sampling designs. We obtain the pseudo best linear unbiased estimators for the fixed and random effects by solving weighted sample estimating equations. The use of survey weights results in design consistent estimation. We also derive estimators for variance components for the nested error linear regression model. We compare the efficiency of the proposed estimators with that of existing estimators using a simulation study. This simulation study uses a two stage sampling design. Several informative or non-informative sampling schemes are considered in the simulation.

1

Classical analysis often fail to account for the nature of complex sampling designs and possibly result in biased estimators for the parameters of interest. Linear mixed models (LMM) or multilevel models (Goldstein, 1995) have been used to analyze survey data collected from populations with complex hierarchical design structures. Related work can be found in Henderson (1975), Laird and Ware (1982), Battese, et.al. (1983), Prasad and Rao (1990), and Schall (1991). Ignoring sampling designs might still be able to produce unbiased estimators for the parameter of interest in some cases, such as, when the characteristics of the sampling design can be fully explained by the LMM (non-informative design). However, when sampling designs are informative in the sense that sampling weights depend on the values of the response, even after conditioning on the covariates in the LMM, the conventional estimators of the model parameters can be biased. How to incorporate informative sampling designs into the multilevel analysis has generated much research, especially in the field of small area estimation (Prasad and Rao (1999), and You and Rao (2002)). Analytical methods for estimating model parameters incorporating informative designs are desired for general LMM on complex survey data.

Introduction

In this paper, we develop a general method for estimating the parameters of the linear mixed Many large government surveys conducted model on survey data accounting for sampling by government agencies utilize multistage sam- designs. In section 2, we introduce the LMM pling schema because of the nature of the multi- we have considered and two special cases that level or hierarchical structure of the population have been widely employed for analyzing survey from which the sample units are selected. Conse- data. In section 3, we propose the pseudo empirquently, investigations associated with such sur- ical best linear unbiased estimators for the fixed veys typically require statistical analysis for pop- and random effects by solving weighted sample ulations that have a complex hierarchical struc- estimating equations (WSEE). In section 4, we ture. develop estimators for variance components for

1897


the nested error linear regression model. In sec- Z − = {V −i }M −i = −D −Z −T , where V i=1 with blocks V T tion 5, we discuss the model-based and design- R −i + Z −i D −i Z −i , i = 1, . . . , M . based properties of the estimators for variance components. We then compare the efficiency of 2.1 Nested error linear regression model the proposed estimators with that of existing estimators using a simulation study in section 6. The nested error linear regression model is We conclude our paper in section 7. a special case of model (2.1). Battese, Harter and Fuller (1988), Prasad and Rao (1990), etc., 2 Model used the nested error linear regression model to estimate the mean of small areas. In the nested Suppose that a population U has a two-level error linear regression model, random effects in structure represented by levels 1 and 2. Using the model are essentially the area effects. Thus, the Goldstein (1995) terminology, level 2 has M bi becomes a scalar denoted as bi with covariates clusters and the ith cluster has Ni level 1 ele− z ij = 1. For each observation Yij , we have ments. Let Yij denote the response of interest from the j th element in the ith cluster. Let − z ij = x Tij β + bi + ij (2.3) Yij = − i denote a vector of characteristics asvec{zkij }qk=1 Random area effects bi , i = 1, . . . , M are indesociated with a qi × 1 random effect bi for the j th element within the ith cluster, where vec is an pendent to one another with mean 0 and a com2 T T T − = cov(b −) = σb2 IM . Ranoperator such that vec{•k }K k=1 = (•1 , . . . , •K ) . mon variance σb , i.e., D p Let − x ij = vec{xkij }k=1 , with x1ij = 1, denote dom errors ij , i = 1, . . . , M, j = 1, . . . , Ni are asthe vector of characteristics associated with fixed sumed to be independent with a mean 0 and a − = cov() = σe2 IN . effect β for the j th element within the ith cluster. common variance σe2 , i.e., R Let ij be the random disturbance and the linear mixed model we considered can be expressed as 2.2 Simple random effects model In a simple random effects model, all elex Tij β + − z Tij bi + ij (2.1) Yij = − ments share a common mean, i.e., the fixed effor i = 1, . . . , M, j = 1, . . . , Ni . fects β reduces to a scalar µ. Consequently, coModel (2.1) can be expressed in a matrix form. variate − x ij = 1. Random effects are essentially M i and Y = vec{Y } , a Let Yi = vec{Yij }N the cluster effects. Vector bi becomes a scalar i i=1 j=1 M −ij = 1, which are denoted as b i with covariates z N × 1 vector, where N = i=1 Ni . Similarly, let Ni the same as in the nested error linear regression i = vec{ij }j=1 , and = vec{i }M i=1 . Denote model (2.3). The simple random effects model i vec{bi }M as − b . Let Z −i = vec{z −Tij }N and Z −= i=1 j=1 can be expressed as denotes the direct sum. {Z −i }M i=1 , where Matrix Z − is block diagonal with Z −i , i = 1, . . . , M (2.4) Yij = µ + bi + ij , i −Tij }N as its diagonal blocks. Let X −i = vec{x j=1 and which is also known as a one-way classification X − = vec{X −i }M i=1 . The matrix expression of (2.1) model in ANOVA. is then: Y=X −β+Z −b − + (2.2) The random effects bi and the disturbance i are assumed to have mean 0 and variance−i , respectively. We covariance matrices D −i and R assume that bi and i are independent. The variance-covariance matrix R − = cov(Y|b −) is normally made up of diagonal blocks, R − = {R −i }M i=1 . The random effects bi , i = 1, . . . , M are usually independent from one another, which implies thatthe variance-covariance matrix of − b is D − = {D −i }M i=1 . The resulting variancecovariance matrix of responses Y is V − = R −+

3

3.1

Estimation

Sampling design set-up

Given the population model (2.2), we can obtain a corresponding census estimating equation (CEE). Since population data are usually not available in practice, we estimate parameters in the CEE using a sample. We assume a twostage sampling design. At stage 1, m level 2 clusters are selected from the M population clusters with inclusion probability πi , i = 1, . . . , m. At

1898


stage 2, ni elements are selected with inclusion probability πj|i from the Ni elements within the ith selected cluster. Totally, n = i ni units are sampled from the population. The weight associated with the ith cluster is wi = πi−1 , whereas the one associated with the j th selected element in −1 the ith selected cluster is wj|i = πj|i . Hence, the inclusion probability associated with the ij th unit is πij = π = wi wj|i . Let i πj|i and its weight is wij i {wj|i }nj=1 and W = {wi W.|i }m W.|i = i=1 be the matrices of second stage weights and final weights, respectively. We further assume that the sample data y follow the same model as the population data,

when the variance-covariance matrix cov(Y|b) = ˆ 1 is the optimal estimator for R is diagonal, Φ Φ1 in the CEE by Theorem 1 in Godambe and Thompson (1986). A similar argument fits for ˆ 2i . Φ Solving the WSEE (3.2), we obtain the pseudo BLUP for β and b as ˆ = {XT [V∗ ]−1 X}− XT [V∗ ]−1 y β

(3.4)

and

ˆ ˆ i = Ωi Di ZT [V∗ ]−1 (yi − Xi β) (3.5) b i i −1 ∗ where V∗ = {Vi∗ }m i=1 with Vi = πi [W.|i Ri + Zi Di ZTi ]. y = Xβ + Zb + e. (3.1) When the variance components are known, the ˆ and b ˆ i are covariances of β The random effects b and the disturbance e ˆ = {XT [V∗ ]−1 X}− {XT [V∗ ]−1 V are assumed to have mean 0 and variancecov(β) −1 −1 covariance matrices D and R, respectively. The [V∗ ] X}{XT [V∗ ] X}− (3.6) variance-covariance matrix R = cov(y|b) = −i = and {Ri }m i is a sub-matrix of R i=1 , where R cov(Yi |bi ), D = {Di }m The resulting i=1 . ˆ i ) = {Ωi Di ZT [V∗ ]−1 }C cov(b variance-covariance matrix of responses y is V = i i −1 R + ZDZT , where V = {Vi }m {Ωi Di ZTi [Vi∗ ] }T (3.7) i=1 with blocks T Vi = Ri + Zi Di Zi , i = 1, . . . , m. ˆ T− ˆ = Vi + Xi cov(β)X Under model (3.1), the unweighted sample es- where C = cov(yi − Xi β) i −1 −1 ∗ T ∗ − T timating equation (UWSEE) Φ0 is 2Vi [Vi ] Xi [X [V ] X] Xi . ˆ 2i + Φ ˆ 3i in WSEE (3.3) does The function Φ Φ01 0 not reduce to Φ + Φ + =0 (3.2) Φ0 (β, b) = 02i 03i in simple SEE (3.2) Φ02 Φ03 for simple random sampling (SRS) design. To w ij where Φ01 (y, β, b) = XT R−1 (y − Xβ − Zb), adjust for SRS design, we define Ω†i = j w2 Iqi ij m j Φ02 (y, β, b) = vec{Φ02i (yi , β, bi )}i=1 , † m † T −1 and Ω = {Ω } . The WSEE adjusted for i=1 i Φ02i (yi , β, bi ) = Zi Ri (yi − Xi β − Zi bi ) † m SRS is Φ and Φ03 (b) = vec{Φ03i (bi )}i=1 , and Φ03i (bi ) = −D−1 i bi for i = 1, . . . , m. ˆ1 0 Φ † Investigations (e.g. Pfeffermann, et al. (1998)) Φ (β, b) = ˆ 2 + [Ω† ]−1 Φ3 = 0. (3.8) Φ showed that estimates obtained by solving the UWSEE can be biased when sampling is informaIn the case of SRS at cluster level and element tive. Incorporating design weights, we construct M Ni i level, W.|i = N I and W = { ni Ini }m n i i=1 . ni m a WSEE as † M Ni th The matrix Ωi = m ni Ini . The i component ˆ1 0 Φ ∗ ˆ 2 + [Ω† ]−1 Φ3 = 0 in (3.7) reduces Φ (β, b) = ˆ + = 0 (3.3) of equation Φ −1 Ω Φ3 Φ2 Φ02i + Φ03i = 0 as in the simple SEE (3.2). Solving Φ† , we obtain another set of pseudo ˆ 2i = ˆ 1 = XT R−1 W(y − Xβ − Zb), Φ where Φ ˆ † ˆ† ZTi R−1 (yi − Xi β − Zi bi ), i = 1 . . . , m. De- BLUP β , bi and their variance-covariance ma-† i W.|i trices by substituting V∗ by V† and Ω by Ω fine Ω = {Ωi }m i=1 as a matrix of the first † † −1 T order inclusion probabilities πi , i = 1, . . . , m, in 3.4-3.7, where V = W R + Ω ZDZ , † T ˆ ˆ ˆ 2 = vec{π −1 Φ ˆ 2i }m = C = cov(yi − Xi β) = Vi + Xi cov(β)Xi − with Ωi = πi Iq . Let Φ i

i

i=1

ZT R−1 W(y − Xβ − Zb). It can be shown that

−1

2Vi [Vi† ]

1899

−1

Xi [XT [V† ]

X]− XTi .


When the variance components are unknown, The respective variances of µ ˆ and ˆbi can be we obtain the pseudo EBLUP for β and b and obtained when σe2 and σb2 are known. their covariances by plugging in the consistent Replacing γi by γi† in (3.11) and (3.12) results estimates of variance components as discussed in in the pseudo BLUP µ ˆ† and ˆb†i , respectively. Note section 4. n (1−˜ γi )y i ˆ 0 = i n (1−˜γ ) i that estimator µ ˆ† reduces to µ

3.2

i

i

i Nested error linear regression in the case of SRS, where γ˜i = σ2 /(σ2 /n 2 i + σb ). e b model The estimator µ ˆ0 coincides with the one ob-

The variance-covariance matrix for the random effects vector b is D = σb2 Im , for the random errors is R = σe2 In , and the covariate matrix is Zi = 1ni . Consequently, the covariance of yi is Vi = σb2 Jni + σe2 Ini , where Jni = 1ni 1Tni . Using (3.4) and (3.5), we obtain the pseudo BLUP estimator for fixed effects as ˆ =[ β xij dTij ]−1 [ yij dij ] (3.9) ij

ij

tained by applying the Prasad and Rao (1990) procedure on the simple random effects model. Note that the method of Prasad and Rao (1990) does not account for sampling weights. Estimators µ ˆ and µ ˆ† result from using both individual level and aggregated level models. Prasad and Rao (1999) proposed an additional estima † † tor µ ˆ∗ = i γi y iw / i γi for µ using only an aggregated level model. We compare these four ˆ0 , µ ˆ∗ ) with a simulation estimators for µ (ˆ µ, µ ˆ† , µ study in Section 6.

and that for random effects bi as ˆbi = γi (y − xT β) ˆ iw iw

(3.10)

4

where d = wij (xij − γ γi = ij i xiw ) and σb2 /(σe2 / j wj|i +σb2 ), y = w y / iw j j|i ij j wj|i and xiw = j wj|i xij / j wj|i . The estimators (3.9) agrees with that obtained by Pfeffermann, et al. (1998). It is straightforward to obtain the ˆ corresponding variance-covariance matrices of β and ˆbi , i = 1, . . . , m. The pseudo BLUP adjusted for SRS can be obtained by solving the WSEE (3.8). Resulting ˆ † and ˆb† have the same expression estimators β i as (3.9) and (3.10), respectively, with γi replaced

Estimation of components

variance

Maximum likelihood (ML) method, restricted maximum likelihood (REML) method and the EM algorithm have been used for estimating the variance-covariance of response vector y. Pfeffermann et al. (1998) proposed an iterative process for estimating the variance-covariance matrix in model (2.2) using only first-order selection probabilities. For the special case of the nested error linear regression model, Prasad and Rao (1990) 2 and You and Rao (2002) adopted Henderson’s wij ˆ† by γi† = σb2 /(σe2 ( jw )2 + σb2 ). Estimators β method III to estimate the variance components ij j σb2 and σe2 . However, the accuracy of these esti† ˆ and bi we obtained for the nested error linear mated variances is not known in the design based regression model are in coincidence with those context. Furthermore, they do not take into acby You and Rao (2002). count the sampling design. One way to account for the sampling design 3.3 Simple random effects model is to duplicate the sample data using sample weights to obtain a pseudo population. The Substituting xij = 1 into (3.9) and (3.10), we resulting pseudo population resembles the true obtain the pseudo BLUP estimator of the mean population in so far as that the sampled cluster as i can be considered to have been selected with w (1 − γ )y i iw ij ij (3.11) probability πi and that the sampled element j µ ˆ= ij wij (1 − γi ) within cluster i can be considered to have been selected with probability πj|i from this pseudo and that for the random effects population. bi = γi (yiw − µ The duplication occurs in two steps. We ˆ). (3.12)

1900


assume that sampling weights {wi }’s and {wj|i }’s are integers. First, the response yij and the covariates xij and zij observed on the ij th element are duplicated wj|i ˜i = times. The response vector is y inflated i . Similarly, the inflated coyij }nj=1 vec{1wj|i T ni ˜ i = vec{1w xij }j=1 and variate matrix is X j|i T ni ˜ i = vec{1w Z } . z ij j=1 j|i ˜ i , the covariate Second, the inflated vector y ˜ i are duplicated wi times to ˜ i and Z matrices X ˜ ˜ , covariate matrices X form the response vector y ˜ of the pseudo population. This results in and Z m ˜ = vec{1wi X ˜ i }m , ˜ = vec{1 y y ˜i }i=1 , X wi i=1 m ˜ = {Iwi ˜ i} . and Z Z i=1 Denote inflated random disturbances as ˜ and ˜ Their respective inflated random effects as b. ˜ and D. ˜ Here, variance-covariance matrices are R we assume that the components of ˜, {˜ ij }’s, are ˜ {˜bi }’s, independent. Also, the components of b, are independent. The resulting model for this pseudo population is ˜+˜ ˜ +Z ˜b ˜ = Xβ y (4.1)

For the simple random effects model, by substituting xij = 1 and p = 1 into (4.2) and (4.3), the estimators for variance components are obtained as 2 = wij (yij − y iw )2 /( wij − wi ) σ ê0w ij

ij

i

(4.4) and

wij (yiw − y w )2 − ( i wi − 1)ˆ σe2 2 ij wij − i wi ( j wj|i ) / ij wij (4.5) where yw = ij wij yij / ij wij . Estimators (4.4) and (4.5) are identical to those obtained by Graubard and Korn (1996) 2 2 (ˆ σeC and σ âC ), respectively. As noted by Graubard and Korn, estimators (4.4) and (4.5) 2 = do simple sample estimators ê0u σ not reduce2 to 2 (y − y ) / (n − 1) and σ ˆ = [ (y ij − i b0u ij ij i i ij y)2 − (n − 1)ˆ σe2 ]/(n − i n2i /n) (denoted as 2 2 σ êA and σ ˆbA in Graubard and Korn (1996)) for some non-informative sample designs, where yi = j yij /ni . ˜ = Furthermore, under model (2.3) for sample ˜ is V and the variance-covariance matrix of y T ˜ ˜ ˜ ˜ data, estimators (4.2) and (4.3), and thus (4.4) R + ZDZ . Using Henderson’s method III on the pseudo and (4.5) are not model unbiased. To adjust for population, the sum of square errors (SSE) the bias, we calculate the expectation of SSE and for fitting a nested error linear regression S(b|β) under model (2.3). Unbiased estimators model (4.1) population is for σe2 and σb2 are obtained as on the pseudo 2 SSE = ij − y iw ) − ij wij (y ij wij (yij − 2 j wij y iw )(xij − xiw )T [ ij wij (xij − xiw )(xij − 2 σ êadj = SSE/[ wij − − t2 )] (4.6) xiw )T ]−1 ij wij (yij − y iw )(xij − xiw ) and the j wij ij i estimator for σe2 is and 2 σ êw = SSE/( wij − wi − p + 1). (4.2) 2 σ ˆb2adj = [ wij yij −( wij yij xTij )( wij xij xTij )−1 ij i

2 σ ˆb0w

ij =

ij

Denote the difference between the reduction of sum of squares due to fitting model (4.1) ˜ + ˜ ˜= Xβ and that due to fitting model y 2 as S(b|β), where S(b|β) = ij wij yij − ( ij wij yij xTij )( ij wij xij xTij )−1 ( ij wij yij xij )− SSE. Consequently, the variance component σb2 can be estimated by 2 2 σ ˆbw = [S(b|β) − ( wi − 1)ˆ σew ]/( wij − t1 ),

ij

ij

ij

wij yij xij )−( wij−t3 )ˆ σe2 ]/( wij−t4 ), (4.7) ij

ij

where t2 = tr{[ ij wij (xij − xiw )(xij − 2 xiw )T ]− ij wij (xij − xiw )(xij − xiw )T }, t3 = T − 2 T tr{[ ij wij xij xij ] ij wij xij xij }, and t4 = tr{ ij wij xij xTij ]−1 i ( j wij )2 xiw xTiw }. Estimators (4.6) and (4.7) reduce to simple i ij sample estimators resulting from Henderson’s (4.3) method III that were used by Prasad and Rao where t1 = tr[( ij wij xij xTij )−1 i wi ( j wj|i )2 (1990) and You and Rao (2002) for some nonxiw xTiw ]. informative sampling designs.

1901


For the simple random effects model (2.4), the model unbiased estimators for σe2 and σb2 are obtained by substituting xij = 1 into (4.6) and (4.7) 2 ij wij (yij − y iw ) 2 2 (4.8) σ ê0adj = wij j w − ij ij i

sampling design given in section 3.2 is used to draw the sample. As noted in section 4, estima2 and tors (4.4) and (4.5) are identical to the σ êc 2 σ ˆbc in Graubard and Korn (1996). We provide the design-based and model-based properties of the following variance component estimators given in section 4, as well as those wij j provided in Graubard and Korn (1996), and Korn and and Graubard (2003). The various variance components estimators 2 2 σe ij wij (y iw − y w ) − cˆ 2 for σe2 are σ ˆb0 = , (4.9) adj ( wij )2 i m ni j ij wij − wij (yij − yi )2 i=1 ij 2 mj=1 , (5.3) σ êA = 2 i=1 (ni − 1) where c = − i j wij / j wij m ni 2 2 w / w . i=1 j=1 wij (yij − y iw ) ij ij ij ij 2 , (5.4) = σ ˆ 2 2 2 m ni eB Estimators σ ê0 and σ ˆb0 reduce to σ ê0u adj adj i=1 j=1 (wij − 1) 2 ni m and σ ˆb0u for some non-informative sample de2 j=1 wj|i (yij − y iw ) i=1 wi 2 signs considered by Graubard and Korn (1996) ni σ êC = , and m (e.g. SRS with wi = M/m and wj|i = N/n). j=1 wj|i − 1) i=1 wi ( (5.5) ni m 2 j=1 (yij − y i ) i=1 wi 5 Model and Design-based 2 m σ êD = , (5.6) i=1 w i (ni − 1) properties of variance ni where w i = j=1 wij /ni is the within cluster i components sample mean of the weights wij , and In this section, we will investigate the model and design based properties of variance components for the simple random effects model (2.4) as well as others discussed in Korn and Graubard (2003). Given that the population values Yij are assumed to satisfy the simple random effects model (2.4), we are interested in estimating the variances of the {bi }’s and {ij }’s, namely σb2 and σe2 . The model-based variance components Se2 and Sb2 are estimated via the methods of moments (Searle et al. (1992), page 106). Let

1 2 2 σ êKG =

m

i=1

ni (Ni −1)I(ni >1) πi

m

i=1

(yik −yij )2 (2) π (kj)|i ni 1 k

Design Consistent Estimators for a Mix Linear Model on Survey Data

Design Consistent Estimators for a Mix Linear Model on Survey Data

Suggest Documents

Consistent Estimators for Panel Duration Data with

SPLINE ESTIMATORS FOR THE FUNCTIONAL LINEAR MODEL

SPLINE ESTIMATORS FOR THE FUNCTIONAL LINEAR MODEL

consistent procedures for mixed linear model selection

Design and Implementation of a Consistent Data Store for a ...

iterative estimators of parameters in a linear model with partially ...

Classical and Bayesian Linear Data Estimators for ... - Semantic Scholar

A Fuzzy-Neuro Model for Normal Concrete Mix Design - CiteSeerX

consistent procedures for mixed linear model selection - Sankhya

Projection Estimators for Generalized Linear Models

On a Class of Parameters Estimators in Linear Models

Data and model uncertainty estimation for linear

Robust Estimators for the Fixed Effects Panel Data Model - CiteSeerX

A consistent dynamical model for pion-photoproduction

A thermodynamically consistent model for multicomponent ... - arXiv

On altimetric data assimilation experiments in a linear model ... - Horizon

A Lane consistent optical model potential for nucleon scattering on ...

A Thermodynamically Consistent Damage Model for ... - CiteSeerX

Admissible Estimators in the General Multivariate Linear Model ... - EMIS

Biogeographical Survey Identifies Consistent

Robust estimators for generalized linear models with a dispersion ...

Improved Linear Combination of Two Estimators for a Function of ...

Local linear estimators and a statistical framework for event related ...

Data-driven calibration of linear estimators with minimal penalties