Examining the Relationship Between Nonresponse Propensity and Data Quality in Two National Household Surveys

Scott Fricker, Office of Survey Methods Research, BLS
Roger Tourangeau, Joint Program in Survey Methodology

September 21, 2011
Overview

1. Literature on the response propensity - measurement error link
2. Description of surveys and analytic methods
3. Findings on the relation of nonresponse propensity to data quality
4. Effects of potential common causal factors
Trends in Response Rates

Survey nonparticipation is not new. The last 20 years have seen a steady decline in rates of contact and cooperation, and increases in other nonresponse (e.g., due to physical impairments).
Survey organizations have implemented NR reduction techniques in response (e.g., advance letters, incentives, rigorous callback protocols).
These methods may increase response rates but do not necessarily reduce nonresponse bias (e.g., Keeter et al., 2000, 2006; Curtin et al., 2000, 2005).
Nonresponse - Data Quality Link

Until recently, this link received little attention: nonresponse was seen as motivational, measurement error as cognitive. That assumption of independent causes may be untenable.

A few empirical examinations:
Early vs. late responders and data quality (DQ): late responders and initial refusers show more skips, more DK responses, and shorter answers (e.g., Friedman et al., 2003; cf. Yan et al., 2004)
Propensity to respond and DQ: Olson (2006) - mixed findings; Tourangeau, Groves, and Redline (2010) - a sensitive topic lowered both participation and accuracy
Causal Mechanism(s) for Link?

Covariance may reflect cause(s) common to both NR and DQ. Several candidates may apply to a broad range of surveys:
Topic interest
Social capital - activates norms of cooperation and prompts careful processing
Busyness or time stress - produces a disinclination both to respond AND to respond carefully

Identifying and statistically controlling for shared explanatory factors should eliminate the relationship and provide a means of removing bias (see, e.g., Biemer, 2001)

Purpose of this study: explore the NR propensity - DQ relation and candidate common causal factors
Overview of CPS and ATUS Analyses
Issues investigated using data from two national household surveys: Current Population Survey (CPS) and the American Time Use Survey (ATUS)
Examined overall nonresponse (i.e., not separating out noncontacts and refusals).
Created nonresponse propensity models (multivariate logistic regression) with predictors related to the following:
Busyness
Social capital
Survey process
Respondent demographics

Models accounted for the complex survey design and base weights
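The propensity-modeling step above can be sketched as follows. This is a minimal illustration, not the authors' actual code: the data are simulated, the four predictors stand in for the busyness, social capital, survey-process, and demographic constructs, and the uniform weights stand in for the CPS/ATUS base weights.

```python
# Illustrative sketch of a weighted multivariate logistic regression
# for nonresponse propensity. All data and variable names are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
# Stand-ins for busyness, social capital, survey-process, and demographic predictors
X = rng.normal(size=(n, 4))
# Simulate the DV: 1 = nonresponse in a later wave, 0 = response
true_logit = X @ np.array([0.8, -0.5, 0.3, 0.1])
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))
base_weights = rng.uniform(0.5, 2.0, size=n)  # stand-in for design/base weights

model = LogisticRegression().fit(X, y, sample_weight=base_weights)
phat = model.predict_proba(X)[:, 1]  # predicted nonresponse propensities, in [0, 1]
print(phat.shape)
```

In a production analysis the complex sample design (strata, clusters, replicate weights) would require survey-aware variance estimation rather than plain `sample_weight`.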
Overview of CPS and ATUS Analyses, continued
Current Population Survey (CPS) Dataset – 97,053 households in sample for all 8 CPS waves during 2-year period – Excluded replacement HH, those that were ever ineligible, and those that did not respond in any of the eight waves • Not modeling “hard core” nonrespondents
IVs constructed from information available in first two rounds of CPS DV: nonresponse in any of the last six CPS interviews (yes/no)
American Time Use Survey (ATUS) Dataset 25,778 respondents, excluding ATUS ineligible and those excluded from CPS dataset
IVs constructed from information available on CPS and ATUS files DV: nonresponse in ATUS (yes/no) 7
Overview of CPS and ATUS Analyses, continued

On the basis of predicted probabilities of nonresponse, households were divided into propensity quintiles:
CPS: low NR propensity (1%) to high NR propensity (30%), 19,400 cases per group
ATUS: low NR group (10.6%) to high NR group (54.9%), 3,275 respondents per group
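The quintile construction described above amounts to an equal-sized split on the predicted probabilities; a minimal sketch with simulated propensities (not the survey data):

```python
# Illustrative: divide cases into five equal-sized nonresponse-propensity groups.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
phat = rng.uniform(0.0, 0.6, size=500)  # stand-in predicted nonresponse probabilities

# qcut splits at the 20th/40th/60th/80th percentiles -> 100 cases per quintile
df = pd.DataFrame({"phat": phat})
df["quintile"] = pd.qcut(df["phat"], q=5, labels=[1, 2, 3, 4, 5])

# Mean propensity rises monotonically from the low- to the high-propensity group
print(df.groupby("quintile", observed=True)["phat"].mean())
```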
Examined the relationship between NR propensity for each survey and DQ indicators.

Data quality indicators:
CPS: item nonresponse, round value reports, classification errors (reflecting potentially spurious changes in Rs' answers), and interview-reinterview response variance
ATUS: total diary activities reported, missing 'basic' activities, round values for durations, item nonresponse on labor-force questions
Overview of CPS and ATUS Analyses, continued
Examined means of DQ indicators across the propensity strata to assess the relative size and direction of the NR propensity – DQ association
To make it easier to compare the relative strength of each DQ measure's association with propensity, I standardized the measures into standard deviation units
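Standardizing into standard-deviation units puts indicators measured on different scales on a common footing. A small sketch with made-up quintile means (not the actual CPS estimates):

```python
# Illustrative: z-score two data-quality indicators (hypothetical values,
# one mean per propensity quintile) so their low-to-high gaps are comparable.
import numpy as np

item_nr_rate = np.array([0.02, 0.03, 0.04, 0.06, 0.09])  # e.g., item NR rate
round_values = np.array([0.10, 0.12, 0.13, 0.15, 0.20])  # e.g., round value reports

def standardize(x):
    """Convert to standard-deviation units: mean 0, SD 1."""
    return (x - x.mean()) / x.std(ddof=1)

z_item = standardize(item_nr_rate)
z_round = standardize(round_values)

# After standardizing, the high-minus-low quintile difference of each
# indicator is expressed in SD units and can be compared directly.
print(z_item[-1] - z_item[0], z_round[-1] - z_round[0])
```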
Then assessed effects of controlling for potential shared explanatory factors
Findings: Predictors of CPS Survey Nonresponse

Table 1. Significant Predictors of Nonresponse in Waves 3-8 of CPS

Controls: Respondent age, sex, race, origin; Urbanicity x region; # of non-family/relatives present; Household size; Citizenship; Season of CPS wave 1 interview; HH ownership (own vs. rent)
Level of effort / Reluctance: CPS nonresponse in wave 1 or 2; # of contact attempts in wave 1; Family income item nonresponse in wave 1
Social Capital: Racial diversity (county); Educational attainment (county); Median family income (tract); Income inequality (county)
Busyness: Hours worked (wave 1); Employment status; All working adults in HH work 40+ hours per week?; Marital status; Presence of young child(ren)

Max-rescaled R-squared: 0.1637
Findings: Predictors of ATUS Survey Nonresponse

Table 2. Significant Predictors of ATUS Nonresponse

Controls: Respondent age, race, origin; Family income; # of non-family/relatives present
Level of effort / Reluctance: CPS nonresponse in waves 3-8; # of call attempts; ATUS respondent same as CPS; Family income item nonresponse in CPS
Busyness: Employment status; Marital status; Presence of young child(ren); Percent of HH adults who work; Occupation type (executive/professional, service, support/production, not in labor force)
Social Capital: Median family income (tract); Racial diversity (county); Diversity x region

Max-rescaled R-squared: 0.3708
CPS Findings
What Would Explain the Covariance?

Is there a shared explanatory factor (or factors) that mediates this relationship? If so, the relationship should be eliminated once the common cause is statistically controlled.

Potential common cause variables:
Busyness (hours worked, commute times)
Social capital (marital status, children, home ownership, educational attainment in the community)
Survey burden (item burden)

Examined the effects of these variables on the data quality indicator that showed the strongest association with nonresponse propensity (i.e., item NR)
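The logic of this control check can be illustrated with a toy simulation: if a common cause (here, a simulated "busyness" variable) fully drives both propensity and item NR, the propensity coefficient should shrink toward zero once that variable enters the model. This is a sketch of the reasoning, not the study's actual estimation:

```python
# Illustrative common-cause check: compare the propensity slope in a model of
# item nonresponse before vs. after controlling the candidate common cause.
# All variables are simulated stand-ins.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
busyness = rng.normal(size=n)                      # hypothetical common cause
propensity = 0.5 * busyness + rng.normal(size=n)   # NR propensity driven by it
item_nr = 0.4 * busyness + rng.normal(size=n)      # item NR driven by it too

def first_slope(y, predictors):
    """OLS coefficient on the first predictor, with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

b_raw = first_slope(item_nr, [propensity])             # unadjusted association
b_ctrl = first_slope(item_nr, [propensity, busyness])  # adjusted for common cause
print(b_raw, b_ctrl)  # b_ctrl shrinks toward zero when busyness is the common cause
```

In the actual findings (next slides), the adjusted association did not shrink, which argues against these candidates as the shared explanation.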
CPS Findings, 2 - Item NR

Figure 2. Effects of Potential Common Cause Variables on the Relationship Between CPS Item Nonresponse and Unit Nonresponse Propensity
ATUS Findings
ATUS Findings, continued

Figure 5. Effects of Potential Common Cause Variables on the Relationship Between the Number of ATUS Diary Reports and ATUS Nonresponse Propensity
Summary of Findings

Data quality decreased as the probability of nonresponse increased.
The strength of the relationship varied by data quality indicator: effects were stronger in the CPS than in the ATUS, strongest for CPS item NR and round value reports, and, in the ATUS, for the number of diary reports.
When data quality and nonresponse did covary, controlling for potential common cause variables did not weaken the relationship.

Implications for survey organizations: extraordinary efforts to bring difficult-to-contact or reluctant sample members into the respondent pool may not reduce nonresponse error (e.g., Curtin et al., 2000; Keeter et al., 2006) AND may also produce significant increases in measurement error.
Thank You
Acknowledgements/Thanks to:
JPSM & BLS colleagues
Dissertation Committee - Roger Tourangeau (Chair, UMD), Stanley Presser (UMD), Fred Conrad (UMICH), Suzanne Bianchi (UMD), Clyde Tucker (BLS)
POQ reviewers and editors (anonymous; Paul Biemer)
Questions?
Contact Information

Scott S. Fricker
Research Psychologist
Office of Survey Methods Research
www.bls.gov/osmr
202-691-7390
[email protected]