Multilevel-model assisted generalized regression estimators for domain estimation

Risto Lehtonen and Ari Veijanen
Statistics Finland, Työpajakatu 13, FIN-00022 STATISTICS FINLAND

1. Introduction

In design-based domain estimation based on sample survey data, domain statistics such as totals are regarded as fixed unknown quantities. The sampling design and estimated response probabilities alone can be used to derive estimators that are unbiased under repeated sampling from the fixed population. In a model-based approach, in contrast, a model is imposed on the data, and the estimators are typically unbiased under hypothetical repeated generation of the population but biased under repeated sampling from a fixed population. Despite the design bias, such model-dependent estimators are often preferred for domain estimation because they can have small variance even for small domains, owing to efficient use of auxiliary data in the estimation procedure. Multilevel models (Goldstein 1995) are often used in the model-dependent approach for domain estimation.

Under a design-based approach for domain estimation, auxiliary data are usually incorporated in the estimation procedure by model-assisted techniques. Linear regression models are often adopted in the construction of generalized regression estimators (GREG; Särndal et al. 1992; Estevao et al. 1995) that are design-unbiased. GREG estimators can be expected to be precise for sufficiently large domains, but in the smallest domains they tend to have large variance. Thus, small domains seem to call for model-dependent (or composite) estimators.

In this paper we introduce a class of design-based GREG estimators (MGREG estimators) for domain estimation that are assisted by a multilevel model. We examine the properties of the MGREG estimators by Monte Carlo methods.

2. Estimators

Consider a model $y_k = f(x_k; \beta) + \varepsilon_k$ for standard GREG estimation, where $x_k$ denotes the vector of auxiliary variables.
After estimating the parameter vector $\beta$ from the sample, we estimate the domain total $Y_d$, the sum of the $y_k$'s over a domain $U_d$, using the fitted values $\hat{y}_k = f(x_k; \hat{\beta})$ and the observations in the subsample $s_d = s \cap U_d$ by the following design-unbiased GREG estimator:

$$\hat{Y}_d = \sum_{k \in U_d} \hat{y}_k + \sum_{k \in s_d} a_k (y_k - \hat{y}_k), \qquad (1)$$

where the $a_k$'s are the sampling weights (possibly adjusted for nonresponse).

In our two-level linear regression model for a simple MGREG estimator, some of the coefficients of the linear model are regarded as random variables varying from one domain to another: for observation $k$ in the $d$th domain, $y_k = x_k'\beta + z_k'u_d + \varepsilon_k$, where the vector $z_k$ is a subset of $x_k$ and $u_d$ is a realization from a zero-mean multivariate normal distribution. We estimate the random effects $u_d$ in each domain and obtain fitted values $\hat{y}_k = x_k'\hat{\beta} + z_k'\hat{u}_d$. A model-dependent domain estimator amounts to the sum of these values over the domain. This estimator appears preferable to the corresponding synthetic estimator obtained by fitting an ordinary linear model, since the estimated random effects may account for model misspecification, for example. Finally, a model-assisted MGREG estimator is derived by incorporating the fitted values in the corresponding GREG estimator. For binary or polytomous responses the approach can be generalized by adopting more realistic models such as logistic models (Lehtonen and Veijanen 1998).

3. Experiments

Our Monte Carlo experiments are based on data from the Finnish Labour Force Survey of Statistics Finland, with labour force status (employed, unemployed, or not in labour force) as the polytomous response variable. Our models incorporate four auxiliary variables: age, sex, region (NUTS2), and a jobseeker indicator showing which persons were registered as unemployed jobseekers according to the administrative records of the Ministry of Labour. These auxiliary data were merged with the survey data at the micro level by using personal identification numbers that are unique in both data sources. A sample of 12,000 individuals yields regional estimates for employment and unemployment in a total of 84 NUTS4 sub-regions, which are used as domains in our experiments. We compare the properties (bias and efficiency) of the standard model-assisted GREG estimators, model-dependent synthetic estimators based on multilevel models, and multilevel-model assisted MGREG estimators.
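As an illustration of how estimator (1) in Section 2 is assembled, the following minimal Python sketch computes a GREG domain total from fitted values, observations, and sampling weights. It is not the authors' implementation: the function names `greg_domain_total` and `fit_line`, the toy population, the single auxiliary variable, the ordinary (unweighted) least-squares fit, and the simple random sampling design are all assumptions made for the example and do not reflect the Labour Force Survey setup.

```python
import random


def greg_domain_total(y_hat, domain, sample, y_obs, a, d):
    """Estimator (1) for domain d (hypothetical helper).

    y_hat  : fitted values for every population unit (length N)
    domain : domain label of every population unit (length N)
    sample : population indices of the sampled units (length n)
    y_obs  : observed y for the sampled units, same order as `sample`
    a      : sampling weights for the sampled units, same order as `sample`
    """
    # Synthetic part: fitted values summed over the whole domain U_d.
    synthetic = sum(yh for yh, dom in zip(y_hat, domain) if dom == d)
    # Correction part: weighted residuals summed over s_d (sampled units in U_d).
    correction = sum(
        w * (yk - y_hat[k])
        for k, yk, w in zip(sample, y_obs, a)
        if domain[k] == d
    )
    return synthetic + correction


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (unweighted, for brevity)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical toy population: 4 domains, one auxiliary variable x.
random.seed(0)
N, n = 1000, 100
x = [random.uniform(0, 10) for _ in range(N)]
domain = [random.randrange(4) for _ in range(N)]
y = [2.0 + 1.5 * xi + 0.5 * di + random.gauss(0, 1) for xi, di in zip(x, domain)]

# Simple random sample without replacement; every weight is N / n.
sample = random.sample(range(N), n)
y_obs = [y[k] for k in sample]
a = [N / n] * n

# Fit the assisting model on the sample, then predict for the whole population.
b0, b1 = fit_line([x[k] for k in sample], y_obs)
y_hat = [b0 + b1 * xi for xi in x]

for d in range(4):
    est = greg_domain_total(y_hat, domain, sample, y_obs, a, d)
    true = sum(yk for yk, dom in zip(y, domain) if dom == d)
    print(f"domain {d}: GREG estimate {est:.1f}, true total {true:.1f}")
```

Because the function only needs fitted values for every population unit, the same code would yield an MGREG-type estimate if `y_hat` came from a multilevel model with estimated domain effects, and dropping the correction term would give the corresponding synthetic estimate.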

REFERENCES

Estevao, V., Hidiroglou, M. A. and Särndal, C.-E. (1995). Methodological principles for a Generalized Estimation System at Statistics Canada. Journal of Official Statistics 11, 181-204.

Goldstein, H. (1995). Multilevel Statistical Models. Second edition. London: Arnold and New York: John Wiley & Sons.

Lehtonen, R. and Veijanen, A. (1998). Logistic generalized regression estimators. Survey Methodology 24, 51-55.

Särndal, C.-E., Swensson, B. and Wretman, J. H. (1992). Model Assisted Survey Sampling. New York: Springer.

SUMMARY

A new approach to domain estimation in survey sampling is presented, based on generalized regression estimation. The estimators are intended especially for small area estimation. The properties of the estimators are examined by Monte Carlo experiments based on data from the Labour Force Survey of Statistics Finland.