Using Paradata for Nonresponse Weighting Adjustments
May 13, 2010
Presentation to the 2010 AAPOR Conference
Donsig Jang and Amang Sukasih (Mathematica Policy Research)
Kelly H. Kang and Stephen H. Cohen (National Science Foundation)
Disclaimer
Opinions expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the National Science Foundation (NSF).
Nonresponse
Nonresponse is common in data collection
There has been considerable discussion about the difficulty of obtaining high response rates in government surveys. The discussion typically focuses on:
– Survey operations
– Treatment of nonresponse
Understanding survey operations is critical to dealing with nonresponse, and vice versa
Nonresponse Adjustments for Estimation
Implicit assumption that nonresponse is missing at random (MAR); i.e.,
$$\Pr(R_i \mid X_i) = p_i$$
Can estimate a population parameter ($Y$) with respondent data only ($\hat{Y}_R$), where
$$\hat{Y}_R = \sum_{i \in R} \hat{p}_i^{-1} y_i$$
Critical to gather auxiliary variables (X’s) so that the errors $p_i - \hat{p}_i$ are small and nonsystematic
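The estimator on this slide can be illustrated with a small sketch. The respondent values and estimated propensities below are made up for illustration, and the function name `ipw_total` is hypothetical. The slide's formula carries no sampling weights, so the sketch has none either.

```python
# Toy illustration of Y_hat_R = sum over respondents of y_i / p_hat_i.
# All numbers are invented for illustration; they are not NSRCG data.

def ipw_total(y_values, propensities):
    """Sum respondent values, each weighted by the inverse of its estimated propensity."""
    if len(y_values) != len(propensities):
        raise ValueError("need exactly one propensity per respondent")
    total = 0.0
    for y, p in zip(y_values, propensities):
        if not 0.0 < p <= 1.0:
            raise ValueError("estimated propensities must lie in (0, 1]")
        total += y / p
    return total

# Hypothetical respondents: observed y_i and estimated propensities p_hat_i
respondent_y = [10.0, 20.0, 30.0]
respondent_p_hat = [0.5, 0.8, 0.5]

# 10/0.5 + 20/0.8 + 30/0.5 = 20 + 25 + 60
estimate = ipw_total(respondent_y, respondent_p_hat)
print(estimate)
```

A respondent with a low estimated propensity stands in for more nonrespondents, which is why errors in the estimated propensities feed directly into the estimate.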
How to Estimate Response Propensities
X’s are usually limited to variables in the sampling frame
– Can a sample member’s response propensity be completely determined by frame variables only?
Implicitly assume that the effect of data collection protocol on response propensities is constant across all sampling groups
Data collection protocol has become more complicated to increase response propensity
– Multiple modes, incentives, targeting underrepresented groups, multiple reminders, etc.
– Responsive survey design (Groves et al. 2006)
How to Estimate Response Propensities (cont’d.)
To what extent does each data collection stimulus affect response rates across different groups? (Leverage-Saliency Theory, Groves and Singer 2000)
May need to bring data collection information into response propensity estimation
Paradata
Paradata includes interviewers’ level of effort, method of data transmittal, and type of incentive treatment
Statisticians have begun to look at the paradata for weighting adjustments and study the connection between paradata, response propensity, and survey outcomes (e.g., Biemer 2009)
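One simple way to see what paradata can add is to compare propensities estimated from a frame variable alone with propensities estimated from the frame variable crossed with a paradata item. The sketch below uses weighting-class response rates rather than a logistic regression model, purely to keep the example short. The records, the variable names, and the function `cell_propensities` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records: (frame variable, paradata item, responded?)
# "degree" stands in for a frame variable, "incentive" for a paradata item.
sample = [
    ("bachelor", "incentive", 1),
    ("bachelor", "incentive", 1),
    ("bachelor", "incentive", 1),
    ("bachelor", "incentive", 0),
    ("bachelor", "none", 1),
    ("bachelor", "none", 0),
    ("bachelor", "none", 0),
    ("bachelor", "none", 0),
    ("master", "incentive", 1),
    ("master", "incentive", 1),
    ("master", "none", 1),
    ("master", "none", 0),
]

def cell_propensities(records, key):
    """Estimate propensity as the unweighted response rate within each cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [respondents, sample members]
    for record in records:
        cell = key(record)
        counts[cell][0] += record[2]
        counts[cell][1] += 1
    return {cell: resp / n for cell, (resp, n) in counts.items()}

# Cells defined by the frame variable only
frame_only = cell_propensities(sample, key=lambda r: r[0])
# Cells defined by the frame variable crossed with the paradata item
frame_and_paradata = cell_propensities(sample, key=lambda r: (r[0], r[1]))

print(frame_only)
print(frame_and_paradata)
```

In this invented data the frame-only rate for bachelor's recipients averages over two quite different incentive groups, which is the kind of variation a frame-only adjustment cannot see.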
National Survey of Recent College Graduates (NSRCG)
Sponsored by NSF and conducted every two or three years since 1974
Target population: recent graduates with bachelor’s or master’s degrees in science, engineering, or health
Covers two or three graduate cohorts
– 2003: AY01 (7/1/2000–6/30/2001), AY02
– 2006: AY03, AY04, AY05
– 2008: AY06, AY07
NSRCG (cont’d.)
Information collected on demographics, education, employment, etc.
Two-stage sample design: school sample (first stage) and graduate sample (second stage)
– Sample sizes: 300 schools and 9,000 graduates per graduate cohort
For more information, see www.nsf.gov/statistics/srvyrecentgrads
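The two-stage design described above can be sketched in a few lines. The slides do not say how schools or graduates are selected, so the sketch assumes simple random sampling at both stages. The frame, the sample sizes, and the function `two_stage_sample` are hypothetical and much smaller than the 300 schools and 9,000 graduates per cohort cited above.

```python
import random

def two_stage_sample(frame, n_schools, n_grads_per_school, seed=0):
    """Sample schools, then graduates within each sampled school.

    Assumes simple random sampling at both stages (an assumption, not the
    documented NSRCG method). Returns (school, graduate, base_weight) tuples.
    """
    rng = random.Random(seed)
    schools = rng.sample(sorted(frame), n_schools)
    p_school = n_schools / len(frame)  # first-stage selection probability
    selected = []
    for school in schools:
        grads = frame[school]
        n_take = min(n_grads_per_school, len(grads))
        p_grad = n_take / len(grads)  # second-stage selection probability
        weight = 1.0 / (p_school * p_grad)  # inverse of overall probability
        for grad in rng.sample(grads, n_take):
            selected.append((school, grad, weight))
    return selected

# Hypothetical frame: 6 schools with 10 graduates each
frame = {f"school_{s}": [f"grad_{s}_{g}" for g in range(10)] for s in range(6)}
sample = two_stage_sample(frame, n_schools=3, n_grads_per_school=5)

print(len(sample))
print(sum(weight for _, _, weight in sample))
```

Under these assumptions the base weights sum to the frame size, and a nonresponse adjustment would then multiply each respondent's base weight by the inverse of its estimated response propensity.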
Unweighted Response Rates for NSRCG
[Figure: unweighted response rates for the 2003, 2006, and 2008 NSRCG; vertical axis from 50 to 90 percent]
Estimated Coefficients from Logistic Regression Model of Survey Response Using 2003, 2006, and 2008 NSRCG Data

Variable                                            Category
Intercept
Graduate cohort (ref: AY02, AY05, AY07)             Year 3 (AY03): only for 2006 survey
                                                    Year 2 (AY01, AY04, AY06)
Degree level                                        Master's
Field of major (ref: computer and info. science)    Health
                                                    Engineering
                                                    Social science
                                                    Psychology
                                                    Physical science
                                                    Mathematical science
                                                    Life science
Race/ethnicity (ref: white)                         Minority (black, Hispanic, Am. Indian)
                                                    Asian
Gender                                              Male
Residence status                                    Non-U.S. residents
Survey year (ref: 2003)                             2008
                                                    2006
Estimate Pr > ChiSq 0.897