Journal of Exposure Analysis and Environmental Epidemiology (2005) 15, 431–438 © 2005 Nature Publishing Group All rights reserved 1053-4245/05/$30.00
www.nature.com/jea
Population coverage and nonresponse bias in a large-scale human exposure study

PAUL MOSQUIN,a ROY WHITMORE,a CINDY SUERKEN,b AND JIM QUACKENBOSSc

a RTI International, Research Triangle Park, North Carolina, USA
b Department of Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
c USEPA National Exposure Research Laboratory, Las Vegas, Nevada, USA
We used estimates derived from screener variables of the National Human Exposure Assessment Survey (NHEXAS) Phase I field study in EPA Region V (one of three NHEXAS Phase I field studies) to examine biases resulting from survey nonresponse and/or incomplete population coverage inherent in the study design. For variables with population values obtainable from Census projections, the combined effect of nonresponse and coverage bias was tested for after each stage of nonresponse using design-based weights. For variables where population values were not available as Census projections, nonresponse bias was tested for after the screener stage of nonresponse using weights adjusted for screener nonresponse. Additional tests for bias were performed using final survey weights to evaluate the performance of survey weight adjustments in reducing observed bias. Comparison of biases estimated using both design-based and adjusted weights was used to identify potentially important weight adjustment variables for future exposure studies, identify possible weaknesses in survey design strategies, and support the use of nonresponse and poststratification weight adjustments to reduce bias in future survey studies. Journal of Exposure Analysis and Environmental Epidemiology (2005) 15, 431–438. doi:10.1038/sj.jea.7500421; published online 26 January 2005
Keywords: nonresponse, human exposure survey, population coverage.
1. Address all correspondence to: Paul Mosquin, Research Triangle Institute, RTI International, Box 12194, 3040 Cornwallis Road, Research Triangle Park, NC 27709-2194, USA. Tel.: +1 919 316 3971. Fax: +1 919 541 6722. E-mail: [email protected]
Received 16 March 2004; accepted 20 December 2004; published online 26 January 2005

Introduction

Objectives
Four common shortcomings in sample surveys are: (i) specifying a survey population which excludes subgroups of the ideal target population (e.g., individuals living in institutions) from the sampling frame, (ii) failing to obtain 100% response rates, (iii) accepting less than ideal precision goals to obtain acceptable survey costs, and (iv) substituting sample units (e.g., substituting a willing population member for an unwilling sample one) (Callahan et al., 1995). The objective of this paper was to determine whether the first two of these shortcomings, incomplete frame coverage and survey nonresponse, resulted in detectable bias in survey estimates for the National Human Exposure Assessment Survey (NHEXAS) Phase I field study conducted in EPA Region V, the Great Lakes states, from July 1995 through May 1997. In this paper, tests for the presence of coverage bias and/or nonresponse bias were performed at all possible stages of the survey in which nonresponse could occur (i.e., screener
questionnaire, baseline questionnaire, core media sample stage, and optional media sample stage). These tests were performed on two types of background variables obtained from the initial screener questionnaire: (i) variables with population data available from Census Bureau projections and (ii) variables without such population data. For the first group of variables, tests comparing respondent estimates to "known" population values provided an estimate of the accumulated bias due to both nonresponse and incomplete coverage. For the second group of variables, respondent estimates after later stages of nonresponse were compared to estimates based on the much larger sample of individuals rostered during screening. This approach provided a test of accumulated bias due to survey nonresponse alone. For both groups, tests were also performed using final survey weights to evaluate the performance of survey weight adjustments in reducing observed biases.
Background
The NHEXAS study was undertaken to assess exposures to a wide range of chemicals across diverse media in EPA Region V. It had a target population of non-institutionalized permanent residents of households in EPA Region V (Minnesota, Wisconsin, Michigan, Illinois, Indiana, and Ohio) during July 1995 to May 1997. Notable exclusions were residences on military bases, group quarters (e.g.,
college dormitories), and individuals deemed mentally or physically incapable of participating (mostly frail, elderly persons). No age restrictions were placed on the target population, in order to test the feasibility of the monitoring methods for all age groups.

The NHEXAS survey was implemented using a complex, four-stage survey design. The first three stages were: a first-stage sample of counties, a second-stage sample of area segments within counties, and a third-stage sample of housing units within segments. Willing adult members of selected third-stage households were asked to complete a household screening questionnaire. This questionnaire contained the rostering information needed to complete the fourth stage, in which individuals were randomly selected from households for more detailed interviewing and monitoring. Individuals selected at the fourth stage completed a baseline questionnaire, after which they were scheduled for collection of environmental media samples about 1–3 weeks later. Participants were required to complete 6 days of "core" media monitoring (air VOCs, water sampling, and household dust). Participation in collection of the other media (air aerosols, food, urine, blood, and hair) was strongly encouraged, but not required. Upon completion of the study, over 37 different analytes had been measured in a variety of media for 249 participating individuals.

Within the NHEXAS design, nonresponse could occur at various points in the data collection process, each of which might present a different pattern of nonresponse. Nonresponse could occur for the "screening questionnaire" (a roster of household members and their demographic characteristics), for the "baseline questionnaire", for the core media samples, or for any of the optional media samples. The impact of nonresponse on final survey estimates was conditional on whether variables measured in the survey were dependent on response status. Three possibilities exist: that the missing data were missing completely at random (MCAR), that the missing data were missing at random (MAR), or that the data were neither MCAR nor MAR
(Little and Rubin, 1987). These possibilities make progressively fewer assumptions about the missing data mechanism and, as a result, require increasingly greater knowledge regarding that mechanism to compensate for nonresponse bias. Neither the MCAR nor the MAR situation requires explicit modeling of the missing data mechanism. For the NHEXAS data, weight adjustment methods were introduced that provided some robustness under the MAR assumption (Little and Rubin, 1987). Where the response status of an individual was correlated with one or more background variables, the use of MAR adjustment methods (weighting and poststratification) was validated and variables of concern for future studies were identified.

Because nonresponse could occur at various points in the study, several weight adjustment steps were required. An overview of the adjustment procedures is presented in Table 1 and discussed in detail in Whitmore et al. (1999). Each nonresponse stage had both a nonresponse adjustment and a poststratification adjustment, except for the optional environmental media, where sample sizes were considered too small for poststratification. Adjustments were conducted using the variables considered most appropriate for each stage, subject to sample size constraints. The first nonresponse adjustment used the weighting class approach because good background variables were lacking for households that were screener nonrespondents. At later stages of the study, a logit model was favored because screener background variables were available and the respondent groups were smaller. The variables given in Table 1 are standard predictors, with the exception of the "field worker loop", which indicates one of four temporal and geographic sampling passes that field workers made through the six-state study region during data collection. Methods described by Folsom (1991) were used for the model-based poststratification and nonresponse adjustments.

Whitmore et al. (1999) suggested that meaningful levels of nonresponse bias might exist in the NHEXAS survey data and would be worth investigating. For surveys, EPA guidelines recommend at least a 75% response rate for in-person or telephone interviews (US EPA, 1983).
Table 1. Nonresponse and poststratification adjustment methods used for NHEXAS nonresponse stages.

Adjustment step                  Screener                  Baseline                                Core media samples               Optional media samples
Nonresponse adjustment
  Method                         Weighting class           Logistic regression                     Logistic regression              Logistic regression
  Variables used                 Field worker loop(a)      Screener background variables           Field worker loop, age, gender   Field worker loop, age, gender
                                                           and their interactions(b)
Poststratification adjustment
  Method                         Control totals            Generalized raking                      Generalized raking               None
  Variables used                 Householder age, state    Age; cross of race, ethnicity, gender   Age, gender

(a) This variable records the sampling pass on which the individual was contacted. Four sampling passes were made through the study region.
(b) Interactions selected using a CHAID procedure (Kass, 1980).
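To make the adjustment steps in Table 1 concrete, the following minimal sketch (Python with pandas; not the NHEXAS production code) shows a screener-stage weighting-class nonresponse adjustment followed by poststratification to external control totals. All column names, weights, and control totals below are hypothetical.

    # Illustrative sketch (not the NHEXAS implementation) of the two screener-stage
    # adjustments in Table 1: a weighting-class nonresponse adjustment within field
    # worker loops, followed by poststratification to external control totals.
    import pandas as pd

    df = pd.DataFrame({
        "loop":      [1, 1, 1, 2, 2, 2, 2],                              # field worker loop (weighting class)
        "base_wt":   [90.0, 110.0, 100.0, 80.0, 120.0, 95.0, 105.0],     # design-based weight
        "responded": [1, 0, 1, 1, 1, 0, 1],                              # screener response indicator
        "post_cell": ["A", "A", "B", "A", "B", "B", "B"],                # poststratum (e.g., householder age x state)
    })

    # 1. Weighting-class adjustment: within each loop, inflate respondent weights by
    #    (total weight of all sampled units) / (total weight of respondents).
    total_wt = df.groupby("loop")["base_wt"].transform("sum")
    resp_wt = df["base_wt"].where(df["responded"] == 1).groupby(df["loop"]).transform("sum")
    df["nr_wt"] = df["base_wt"] * (total_wt / resp_wt) * df["responded"]  # nonrespondents get weight 0

    # 2. Poststratification: scale respondent weights to reproduce known control totals.
    controls = {"A": 30000.0, "B": 45000.0}                               # hypothetical population counts
    resp = df[df["responded"] == 1].copy()
    ps_factor = resp["post_cell"].map(controls) / resp.groupby("post_cell")["nr_wt"].transform("sum")
    resp["final_wt"] = resp["nr_wt"] * ps_factor
    print(resp[["loop", "post_cell", "base_wt", "nr_wt", "final_wt"]])

At the baseline and media stages, Table 1 indicates that the class-based factor was replaced by a logistic-regression model of response (presumably entering the weights through the estimated response propensities) and that generalized raking took the place of simple poststratification.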
In the NHEXAS study, the estimated population response (participation) rate at the screener stage (first nonresponse stage) was 71.9%, and the estimated population response rate after the third nonresponse stage (core media data collection) was 42.7%. Hence, about 28.1% of the survey population was not covered by the first stage of data collection, while another 29.2% was not covered by the start of the third stage. Nonresponse for additional, more burdensome types of monitoring led to further decreases in population coverage. These low response rates are, however, not unusual for exposure studies due to the often substantial burden placed on respondents (Cox et al., 1988; Callahan et al., 1995). The NHEXAS study placed more burden than most, as attempts were made to collect samples from a variety of media from selected individuals, including personal air VOCs, personal air aerosols, indoor air aerosols, food and beverages, urine, blood, and hair.

The low response rates typically observed in exposure studies may have a strong effect on bias in survey estimates, as illustrated by an example from Callahan et al. (1995) (see footnote 1). For a population characteristic with prevalence equal to 20% (e.g., 20% of the population exceeds a threshold exposure concentration) and a response rate of 90%, the bias can lie between -9% and +2%. However, if the response rate is decreased to 60%, then the bias may be anywhere between -20% and +13%. Such potentially high levels of bias in exposure studies could be reduced if a sample of nonrespondents could be converted to respondents at some time during the survey. However, doing so can be expensive. In the NHEXAS survey, conversion was attempted through a fixed number of call-back attempts, and participation rates were increased by offering incentives for completion of the more burdensome types of monitoring (Whitmore et al., 1999).

Footnote 1: These bounds on the bias are obtained using the identities p = \eta p_{NR} + (1 - \eta) p_R and Bias = p_R - p, where p is the overall population prevalence (mean), p_{NR} is the prevalence among nonrespondents, p_R is the prevalence among respondents, and \eta is the nonresponse rate.
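These bounds follow directly from the identities in footnote 1; the short sketch below (illustrative only, not part of the original analysis) reproduces them for an arbitrary prevalence and response rate.

    # Bias bounds implied by footnote 1: p = eta*p_NR + (1 - eta)*p_R and Bias = p_R - p,
    # where eta is the nonresponse rate. The bounds follow from letting the unknown
    # nonrespondent prevalence p_NR range over all feasible values.

    def bias_bounds(p, response_rate):
        """Return (min_bias, max_bias) for overall prevalence p and a given response rate."""
        eta = 1.0 - response_rate                          # nonresponse rate
        if eta == 0.0:
            return (0.0, 0.0)                              # full response: no nonresponse bias
        # p_NR must keep p_R = (p - eta*p_NR)/(1 - eta) inside [0, 1]
        p_nr_lo = max(0.0, (p - (1.0 - eta)) / eta)
        p_nr_hi = min(1.0, p / eta)
        bias = lambda p_nr: eta * (p - p_nr) / (1.0 - eta)  # equals p_R - p
        return (bias(p_nr_hi), bias(p_nr_lo))

    # Reproduces the Callahan et al. (1995) example quoted above:
    print(bias_bounds(0.20, 0.90))   # approximately (-0.089, +0.022)
    print(bias_bounds(0.20, 0.60))   # approximately (-0.200, +0.133)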
Methods

This section describes the statistical methodology for estimating and testing bias, the multiple testing procedures adopted, and the variables and biases for which they were tested (e.g., incomplete coverage and/or nonresponse). The variables available for bias testing were divided into two groups: those which had available census projections and those which did not. Census projections treated as population values (see footnote 2) were used to estimate (and test) bias due
to the combined effects of incomplete coverage and nonresponse. Where census projections were not available, nonresponse bias among individuals in screened households was evaluated using the screener distributions as a comparative baseline.

Footnote 2: It is expected that the projections are more precise than the estimated distributions obtained in this paper. The projections are relatively near-term (5 years forward) and are averaged over several relatively slow-growing states.
Estimating and Testing for Bias
For a categorical variable with K levels, the bias in the estimate \hat{p}_k of the true proportion p_k^0 at level k is defined as

    B(\hat{p}_k) = E[\hat{p}_k] - p_k^0

where the expectation E[\hat{p}_k] is taken over the sample design and the response propensity distribution for each stage of nonresponse. Thus, the bias at level k can be estimated as \hat{B}(\hat{p}_k) = \hat{p}_k - p_k^0 when p_k^0 is known. A K-level categorical variable leads to K - 1 independent estimates. For this study, the reference population was either the population at large (census projections) or the population of individuals who could be rostered. Bias was tested for using a single goodness-of-fit test over all K levels. To test whether \hat{p}_1, ..., \hat{p}_K are compatible with the known p_1^0, ..., p_K^0, the hypotheses are

    H_0: p_k = p_k^0 for all k = 1, ..., K
    H_1: p_k \neq p_k^0 for at least one k
Any significant bias detected can be attributed to systematic differences between the responding population and the reference population. This test can be conducted according to the methods of Rao and Scott (1981) for weighted survey data, using a modified chi-square statistic with K - 1 degrees of freedom.

Our approach was to identify biases of potential concern through estimation and hypothesis testing. Note that biases identified in this manner are subject to type-I error (being identified as significant when they are not) and, even when correctly identified, may not be associated with a bias in the primary response: exposure level. However, identified biases indicate variables that are important to include in future exposure studies, both to improve survey design and to have available for nonresponse adjustment under an MAR assumption. The statistical tests used here had relatively low power because of the moderate sample sizes, so not all meaningful biases would have been detected.
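To make the testing approach concrete, the following sketch (not the code used for the published analysis) computes the estimated biases and a simplified goodness-of-fit statistic in which the full Rao and Scott (1981) correction is replaced by a single analyst-supplied average design effect; all weights, categories, and the design effect in the example are hypothetical.

    # Illustrative sketch: a design-effect-corrected goodness-of-fit test of weighted
    # category proportions against known population proportions. The published analysis
    # used the Rao-Scott (1981) adjustment; here the generalized design effects are
    # summarized by a single analyst-supplied value `deff`.
    import numpy as np
    from scipy.stats import chi2

    def corrected_gof_test(weights, categories, p0, deff=1.5):
        """weights: survey weights; categories: level index (0..K-1) per respondent;
        p0: known population proportions (length K); deff: assumed average design effect."""
        weights = np.asarray(weights, dtype=float)
        categories = np.asarray(categories)
        p0 = np.asarray(p0, dtype=float)
        K = len(p0)
        n = len(weights)                                   # nominal (unweighted) sample size
        # weighted estimates of the K proportions
        p_hat = np.array([weights[categories == k].sum() for k in range(K)]) / weights.sum()
        bias_hat = p_hat - p0                              # estimated bias at each level
        # Pearson statistic from the weighted proportions, deflated by the design effect
        X2_corr = n * np.sum(bias_hat**2 / p0) / deff
        p_value = chi2.sf(X2_corr, df=K - 1)
        return bias_hat, X2_corr, p_value

    # Hypothetical example: three age groups compared with census proportions
    rng = np.random.default_rng(0)
    cats = rng.choice(3, size=300, p=[0.30, 0.45, 0.25])
    wts = rng.uniform(0.5, 2.0, size=300)
    print(corrected_gof_test(wts, cats, p0=[0.26, 0.48, 0.26]))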
Adjustments for Multiple Comparisons
Two approaches were taken to address the multiple comparisons problem: a per comparison error rate approach and a false discovery rate approach (Benjamini and Hochberg, 1995; Genovese and Wasserman, 2002). The per comparison error rate approach ignores multiple testing issues by conducting each test separately at an α = 0.05 significance level. For null hypotheses that are true, approximately 5% will be rejected by chance alone.
Thus, the use of this approach allows the simple identification of test results for which further investigation may be in order. In contrast, the false discovery rate procedure provides a more stringent standard for rejection, fixing a false discovery rate of q* over a specified family of tests. The false discovery rate (the expected proportion of rejected null hypotheses that are falsely rejected) is less than or equal to the familywise error rate (as might be controlled with a Bonferroni-type approach), with equality holding only if all of the null hypotheses are in fact true. When one or more null hypotheses are false, the true familywise error rate is larger than the false discovery rate, and there is a consequent gain in power. The false discovery rate procedure is implemented as follows: order the P-values P(1) ≤ ... ≤ P(m) for the m separate tests within a given family, find the largest k for which P(k) ≤ k·q*/m, and then reject all null hypotheses associated with P(1), ..., P(k).

Because the tests performed in this study were dependent across response stages (individuals present at a later nonresponse stage were also present at earlier stages), multiple comparison test families were restricted to groups of tests that were considered independent. These were taken to be all tests at a given response stage for each kind of comparison. Comparisons to census projections had families of size three (three variables), while comparisons to screener estimates were of size six. The false discovery rate was set to q* = 0.05, which is the Bonferroni level if all null hypotheses are true.
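The step-up rule just described can be sketched directly; the following minimal illustration (not from the original analysis) applies it to a hypothetical family of six P-values.

    # Benjamini-Hochberg step-up procedure: reject H(1),...,H(k) for the largest k
    # with P(k) <= k*q/m, where P(1) <= ... <= P(m) are the ordered P-values.

    def benjamini_hochberg(p_values, q=0.05):
        """Return the indices (into the original list) of rejected hypotheses."""
        m = len(p_values)
        order = sorted(range(m), key=lambda i: p_values[i])   # indices sorted by P-value
        k_max = 0
        for rank, i in enumerate(order, start=1):
            if p_values[i] <= rank * q / m:
                k_max = rank                                   # largest rank passing the threshold
        return sorted(order[:k_max])                           # reject all hypotheses up to rank k_max

    # Hypothetical family of six tests (e.g., six screener variables at one stage)
    pvals = [0.003, 0.012, 0.040, 0.110, 0.240, 0.700]
    print(benjamini_hochberg(pvals, q=0.05))   # -> [0, 1]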
Bias due Solely to Nonresponse
For screener variables which did not have census projections available, bias due solely to nonresponse was estimated and tested for by comparing distributions among respondents at stages of nonresponse occurring after the screener stage (baseline questionnaire, core media, optional media) with the corresponding distributions obtained from all individuals rostered during screening. For these comparisons, the screener distributions (based on many more respondents) were treated as fixed, which allowed the use of the statistical methods described earlier. Screener variables used in this analysis were: highest level of education, smoking status, work status, type of home, home ownership, and telephone in house.

Again, two sets of comparisons were made. Both compared distributions of respondents at later nonresponse stages to the distributions for all rostered individuals. The first set used weights adjusted for the probabilities of selection, for nonresponse, and for poststratification at the screener stage. Comparisons based on these weights allowed assessment of accumulated nonresponse bias at later stages of nonresponse. The second set of comparisons used weights that were adjusted for nonresponse and poststratified for the stage or media under consideration. These comparisons allowed an evaluation of the performance of the weight adjustments in reducing bias for these variables.
Results

Bias due to Incomplete Coverage and Nonresponse
Screener background variables with population values available from Census Bureau projections included age, gender, and race/ethnicity. Their respective distributions were obtained after collapsing across state-level 1995 projections for EPA Region V. Respondent distributions were compared to census distributions for the screening questionnaires, baseline questionnaires, core media data, and optional environmental media collections (personal air aerosol, food, urine, blood metals, blood VOC, and hair). All comparisons were of study participant distributions versus census distributions, with the exception of the screening stage, for which distributions based on all rostered individuals were compared to census distributions. Any bias detected by comparison of study distributions to census distributions was attributed to the combined effects of nonresponse and incomplete population coverage.

Two sets of comparisons were made. The first set used weights adjusted only for the design-based probabilities of selection. The second used weights adjusted for nonresponse and poststratified for the stage or media under consideration. The first comparison allowed evaluation of the accumulated bias across multiple stages of nonresponse, while the second allowed evaluation of the cumulative performance of the weight adjustments for that stage of nonresponse (screening, baseline, or media participation).
Screening Questionnaire Coverage and Nonresponse Bias
Table 2 presents the results of bias estimation relative to census projections using the design-based weights. Bias estimated in this table could be attributed to the combined effects of nonresponse and incomplete coverage. Two variables, gender and age, were significant at an unadjusted α = 0.05 level (Table 2). For gender, distributions at the time of rostering were close to census projections (estimated male bias of 2.08 percentage points above the census rate), but the bias increased significantly at the baseline questionnaire due to a greater propensity for females to participate in the study. The elevated frequency of female respondents was retained at later collection stages, with the magnitude of the resulting bias becoming even more pronounced for the optional personal air aerosol and blood VOC media. For the age variable, the distribution was significantly different from census projections at the screener stage but lost significance at the baseline stage. The significant difference at the screener stage might be attributed primarily to an under-representation of 18–24-year olds in the sample due to incomplete coverage: individuals of this age group were more likely to be residing on military bases or college campuses, and those temporarily living away from a sampled home were not included on the roster. The subsequent loss of significance for age at the baseline questionnaire might be attributed to a reduced propensity for 0–17-year olds to enter the study.
Table 2. Biases of respondent distributions from census projections as estimated using design-based weights.

                              Census           Respondent distribution bias
Variable / level              projection (%)   Screener   Baseline   Core media   Personal air aerosol   Food    Urine   Blood metals   Blood VOC   Hair
Gender
  Male                        48.68             2.08       9.18       8.67         14.23                  9.96    8.51    9.85          12.81       6.35
  Female                      51.32             2.08       9.18       8.67         14.23                  9.96    8.51    9.85          12.81       6.35
Race/ethnicity
  White non-Hispanic          83.31             3.97       3.54       4.66          2.59                  2.60    1.04    6.24           5.23       5.59
  Other                       16.69             3.97       3.54       4.66          2.59                  2.60    1.04    6.24           5.23       5.59
Age
  0–17                        26.20             7.83       1.30       0.15          1.49                  0.20    2.79    8.80          12.04       2.49
  18–24                        9.51             3.13       1.62       1.20          0.22                  0.78    1.01    0.09           0.82       2.14
  25–34                       15.20             0.15       3.15       3.91          1.60                  8.08    4.18    9.26          10.86       4.18
  35–44                       16.30             0.28       3.64       4.46          4.35                  0.12    1.69    4.41           5.89       3.97
  45–64                       20.01             2.98       2.54       3.54          2.30                  3.53    0.08    1.43           2.10       0.04
  65+                         12.78             2.15       1.33       3.77          4.91                  5.25    4.17    3.35           3.43       3.57
Respondent sample size                          1456(a)    326        247           169                   159     202     165            153        182

Boldfaced entries indicate significance at the unadjusted α = 0.05 level. Shaded cells indicate significance at a q* = 0.05 false discovery rate level.
(a) Rostered individuals from 555 responding households out of a total of 805 sampled households.
Table 3. Biases of respondent estimates from census projections as estimated using final adjusted weights.

                              Census           Respondent distribution bias
Variable / level              projection (%)   Screener   Baseline   Core media   Personal air aerosol   Food    Urine   Blood metals   Blood VOC   Hair
Gender
  Male                        48.68             2.18       2.58       0.05          0.05                  0.26    0.05    0.05           0.05       0.05
  Female                      51.32             2.18       2.58       0.05          0.05                  0.26    0.05    0.05           0.05       0.05
Race/ethnicity
  White non-Hispanic          83.31             2.64       0.24       0.58          1.00                  5.07    3.22    8.26           7.65       7.11
  Other                       16.69             2.64       0.24       0.58          1.00                  5.07    3.22    8.26           7.65       7.11
Age
  0–17                        26.20             3.28       1.16       0.79          0.41                  1.66    1.25    4.04          10.39       1.43
  18–24                        9.51             4.04       1.37       1.23          2.23                  1.15    2.67    1.25           3.66       0.11
  25–34                       15.20             0.26       0.27       0.04          1.34                  0.69    0.93    3.28           7.21       1.80
  35–44                       16.30             0.07       0.21       0.08          0.56                  2.07    2.03    0.58           0.07       1.01
  45–64                       20.01             0.96       0.35       0.16          0.84                  0.27    2.76    0.86           1.11       1.88
  65+                         12.78             1.39       0.08       0.24          0.76                  1.62    1.21    0.96           0.56       1.35
Respondent sample size                          1456(a)    326        247           169                   159     202     165            153        182

Boldfaced entries indicate significance at the unadjusted α = 0.05 level. No cells had significance at a q* = 0.05 false discovery rate level.
(a) Rostered individuals from 555 responding households out of 805 sampled households.
Age later regained significance at the optional blood media collection stage; this significance could largely be attributed to the pronounced nonparticipation of the 0–17-year-old age group. All variables significant at the unadjusted α = 0.05 level remained significant at a false discovery rate of q* = 0.05.

Biases estimated relative to census projections using the final adjusted weights are given in Table 3 and are, in general, notably smaller than the corresponding biases in Table 2. Biases for the screening questionnaire, and for race/ethnicity in the blood metals medium, were significant at an unadjusted α = 0.05 level, yet not significant at a false discovery rate of q* = 0.05.
The unadjusted significance of age was not surprising as weight adjustment procedures were strictly household- (not person-) level at the screener stage. Person-level poststratification was not conducted at that stage because the screener data were largely collected for use in later adjustment procedures. Besides the continued significance of age, biases were reduced by poststratification and were low for baseline and core media. Biases increased for optional media, which lacked a poststratification step due to reduced sample sizes.
Nonresponse Bias in Screener Variables
Table 4 provides nonresponse bias estimates for selected screener variables for which census projections were unavailable.
Table 4. Bias of respondent estimates relative to rostered individuals, using final screener weights.

                                      Screening questionnaire,   Distribution difference
Variable / level                      adjusted (%)               Baseline   Core media   Personal air aerosol   Food    Urine   Blood metals   Blood VOC   Hair
Highest level of education
  At most high school graduate        60.04                      13.08      13.22        12.33                  13.59   16.48   15.67          15.68       16.72
  Some college or technical school    24.81                       9.28       8.53         6.74                   4.69   10.04    6.38           5.11        8.65
  College graduate                    15.15                       3.80       4.69         5.60                   8.90    6.44    9.29          10.58        8.07
Smoking status
  Yes                                 31.28                       3.99       7.85         7.11                   9.47    7.21    8.77           8.32        8.73
  No                                  68.72                       3.99       7.85         7.11                   9.47    7.21    8.77           8.32        8.73
Work outside home
  Yes                                 65.99                       3.01       1.95         3.14                   2.58    3.80    4.07           4.74        3.20
  No                                  34.01                       3.01       1.95         3.14                   2.58    3.80    4.07           4.74        3.20
Type of home(a)
  Single family detached              68.02                       6.72       6.69         8.66                   8.72    8.67    7.62           6.94        9.43
  Other                               31.98                       6.72       6.69         8.66                   8.72    8.67    7.62           6.94        9.43
Home ownership(a)
  Own                                 66.87                       0.69       0.72         1.15                   0.02    2.23    3.21           3.64        3.79
  Rent                                33.13                       0.69       0.72         1.15                   0.02    2.23    3.21           3.64        3.79
Telephone in house(a)
  Yes                                 92.18                       0.38       0.99         0.60                   0.09    1.30    1.53           2.66        2.98
  No                                   7.82                       0.38       0.99         0.60                   0.09    1.30    1.53           2.66        2.98
Sample size, respondents(b)           1456(c)                     326        249          169                    159     202     165            153         182

Boldfaced entries indicate significance at the unadjusted α = 0.05 level. Shaded cells indicate significance at a q* = 0.05 false discovery rate level.
(a) Person-level distribution of household characteristics.
(b) Highest level of education, smoking status, and work outside home have a lower sample size due to restriction to individuals 18 years or older.
(c) Includes all those listed on the rosters for 555 responding households out of 805 sampled households.
Bias estimated in this table could be attributed to the combined effects of nonresponse at stages after the screener stage. The most notable trend in Table 4 is the significant and pronounced under-representation of the "at most high school graduate" group in all post-baseline stages. This bias could be attributed primarily to the difficulty of bringing these individuals into the study at the baseline stage. The under-representation of these individuals continued through the remaining stages, becoming more pronounced for the optional urine, blood, and hair media. "Type of home" was also significant for the optional hair medium, attributable to an excess of individuals from single-family detached homes in the sample. This excess was first present after baseline collection and generally increased in magnitude through later collection stages. Although not significant, Table 4 also shows a tendency for smokers to be present in higher frequency than expected; the trend begins at baseline and becomes more pronounced in later stages. The variables work outside the home, home ownership, and presence of a telephone in the home showed neither significance nor a discernible trend in bias.

In contrast to the results in Table 4, Table 5 provides nonresponse bias estimates and significance test results using the final adjusted weights. Variables such as highest level of education no longer showed significance when the final adjusted weights were applied. Other variables, such as home ownership, telephone in house, and type of home, became significant at an unadjusted α = 0.05 level. There was no discernible
pattern to the changes in bias, given that weight adjustments that reduced known biases appeared to have mixed effects on the bias of other variables.
Discussion

This paper has analyzed nonresponse and coverage bias in a large-scale exposure survey, NHEXAS. Questions of interest include how future studies might be improved, given the biases estimated here, and how the weight adjustment procedures performed.

Notable biases were observed for gender, age, highest level of education, smoking status, and type of home. For gender, there was a significant tendency for women to respond more than men, particularly at the baseline participation stage. For age, there were apparent coverage problems for 18–24-year olds at the screener stage of the survey and difficulty in retaining 0–17-year olds, both in getting them to participate in the study by responding to the baseline questionnaire and in obtaining analyzable blood samples. Highest level of education showed a deficit of individuals with at most high school degrees, a deficit that first became evident with the difficulty of recruiting these individuals into the baseline study. More smokers were found in the study than expected, although the trend was not significant. Finally, beginning with the baseline stage, more individuals from single-family dwellings were present than expected.
Table 5. Bias of respondent estimates using final analysis weights relative to estimates based on all rostered individuals using final screener weights.

                                      Screening questionnaire,   Distribution difference
Variable / level                      adjusted (%)               Baseline   Core media   Personal air aerosol   Food    Urine   Blood metals   Blood VOC   Hair
Highest level of education
  At most high school graduate        60.04                       8.66       8.67         8.26                   9.96    9.95    9.67          10.02       10.63
  Some college or technical school    24.81                       7.04       6.66         7.04                   5.64    7.87    5.51           4.88        6.72
  College graduate                    15.15                       1.62       2.02         1.22                   4.32    2.08    4.16           5.14        3.90
Smoking status
  Yes                                 31.28                       0.23       2.07         0.41                   3.21    1.97    2.75           2.07        2.04
  No                                  68.72                       0.23       2.07         0.41                   3.21    1.97    2.75           2.07        2.04
Work outside home
  Yes                                 65.99                       3.99       1.92         8.12                   3.45    0.93    0.90           1.65        0.80
  No                                  34.01                       3.99       1.92         8.12                   3.45    0.93    0.90           1.65        0.80
Type of home(a)
  Single family detached              68.02                       9.63       9.46        12.13                  11.24   11.37   10.46           7.58       11.59
  Other                               31.98                       9.63       9.46        12.13                  11.24   11.37   10.46           7.58       11.59
Home ownership(a)
  Own                                 66.87                       6.57       6.65         9.02                   8.22    7.96   11.12          10.67        9.37
  Rent                                33.13                       6.57       6.65         9.02                   8.22    7.96   11.12          10.67        9.37
Telephone in house(a)
  Yes                                 92.18                       2.19       2.34         1.30                   2.44    2.58    3.42           4.30        4.49
  No                                   7.82                       2.19       2.34         1.30                   2.44    2.58    3.42           4.30        4.49
Sample size, respondents(b)           1456(c)                     326        249          169                    159     202     165            153         182

Boldfaced entries indicate significance at the unadjusted α = 0.05 level. Shaded cells indicate significance at a q* = 0.05 false discovery rate level.
(a) Person-level distribution of household characteristics.
(b) Highest level of education, smoking status, and work outside home have a lower sample size due to restriction to individuals 18 years or older.
(c) Includes all those listed on the rosters for 555 responding households out of 805 sampled households.
Biases for each of these variables might be reduced in future surveys by modifying the survey study design (e.g., including group quarters, such as dormitories), providing greater incentives, using less burdensome methods for environmental and body-burden monitoring, or modifying the effort or approach of field workers during recruitment.

For variables with available census projections, the performance of the weight adjustment procedures was good, as illustrated by Table 3. In most cases, biases were reduced in comparison to biases computed using unadjusted weights. Poststratification to control totals based on some or all of these variables greatly reduced bias for the baseline questionnaire and core media analyses, as was expected. Biases were somewhat larger for the optional media, perhaps because their weights had a nonresponse adjustment but no poststratification adjustment owing to smaller sample sizes (see Table 1).

Based on the estimates in Tables 4 and 5, the NHEXAS weight adjustments had varied effects on the estimated biases of variables not available in the census projections. For example, although highest level of education appeared to benefit from the adjustments, type of home and home ownership showed increased bias. For these latter two variables, most of the increase in bias occurred with the baseline questionnaire weight adjustments, attributable to the correlated effects of the variables used for the baseline adjustments. The lack of consistent improvement for the non-census variables suggests that having larger monitoring-phase sample sizes to support
nonresponse models with more variables would be beneficial in future exposure monitoring studies.

Although the variables of primary interest in an exposure analysis are the exposure measurements themselves, the variables used for this analysis were necessarily restricted to those in the screening questionnaire. The weight adjustments used to reduce their biases may have decreased bias for the primary survey variables (the exposure measurements) to the extent that those biases were related to the weight control variables (gender, race/ethnicity, and age). Although the variables used for the bias tests in this paper were not the primary outcomes of interest (exposure levels), detection of bias in estimates made from them may be useful in that (i) any bias may be correlated with bias in outcomes of interest, (ii) the practical and statistical significance of detected bias is important in exposure studies, which typically have poor overall response rates, (iii) variables with detectable bias might be considered for adjustment steps in future exposure studies, and (iv) their use in nonresponse and poststratification adjustment in the NHEXAS study allows an evaluation of the performance of weight adjustments by comparison of estimates obtained using preadjustment and postadjustment weights. Direct assessment of bias in primary survey variables could be accomplished in future studies by following up a subsample of nonrespondents with sufficient incentives to achieve high response rates in the subsample. However, respondents obtained in this way are typically much more expensive than initial respondents.
Acknowledgements

The United States Environmental Protection Agency, through its Office of Research and Development, funded and collaborated in the research described here under Contract Number 68-D-99-008 and Assistance Agreement Number CR821902 to Research Triangle Institute. It has been subjected to Agency review and approved for publication. We thank C.A. Clayton for preparation of the statistical analysis database. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.
References

Benjamini Y., and Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 1995: 57: 289–300.
Callahan M.A., Clickner R.P., Whitmore R.W., Kalton K., and Sexton K. Overview of important design issues for a national human exposure assessment survey. J Expos Anal Environ Epidemiol 1995: 3: 257–282.
Cox B.G., Mage D.T., and Immerman F.W. Sample design considerations for indoor air exposure surveys. J Air Pollut Control 1988: 38: 1266–1270.
Folsom R.E. Exponential and logistic weight adjustments for sampling and nonresponse error reduction. Proc Amer Stat Assoc Soc Stat Sec 1991, pp. 197–202.
Genovese C., and Wasserman L. Operating characteristics and extensions of the false discovery rate procedure. J R Statist Soc B 2002: 64: 499–517.
Kass G.V. An exploratory technique for investigating large quantities of categorical data. Appl Statist 1980: 29: 119–127.
Little R.J., and Rubin D.B. Statistical Analysis with Missing Data. John Wiley & Sons, New York, 1987.
Rao J.N.K., and Scott A.J. The analysis of categorical data from complex surveys: chi-squared tests for goodness of fit and independence in two-way tables. J Am Statist Assoc 1981: 76: 221–230.
US EPA. Survey Management Handbook, Volume I: Guidelines for Planning and Managing a Statistical Survey. EPA-230/12-84-002. Office of Policy, Planning and Evaluation, Washington, DC, 1983.
Whitmore R.W., Byron M.Z., Clayton C.A., Thomas K.W., Zelon H.S., Pellizari E.D., Lioy P.J., and Quackenboss J.J. Sampling design, response rates, and analysis weights for the National Human Exposure Assessment Survey (NHEXAS) in EPA Region 5. J Expos Anal Environ Epidemiol 1999: 9: 369–380.