ICIK Working Paper
Checking for Nonresponse Bias in Web-Only Surveys of Special Populations Using a Mixed-Mode (Web-with-Mail) Design

Brian J. Grim,* Ladislaus M. Semali, and Audrey N. Maretzki
Pennsylvania State University

Since web surveys can be a quick and cost-effective option to survey special populations, it is important to understand the extent to which the data and analyses are biased by nonresponse. We demonstrate that using an experimental mixed-mode design provides a way to check whether nonresponse is a nonignorable source of bias in web surveys of special populations; we also demonstrate that it provides a cost-effective option to correct for nonresponse bias, if bias exists. We incorporated an experimental mixed-mode design to test for nonresponse bias in a web survey of the entire faculty at Penn State University. The members of the whole population sample were randomly assigned to two different subsamples: 5548 were assigned to a web-only treatment group and 1000 were assigned to a web-with-mail treatment group, which received the identical web treatment plus an equivalent paper follow-up survey and a reminder postcard. Comparison of the samples provides a cost-effective way to identify sources of nonresponse bias. The design also increased response rates, potentially in a 'significant' way. We also demonstrate the benefits of using structural equation modeling as a method to explore the effects of nonresponse bias.
Introduction

Using the web as an alternative to other survey modes such as mail or telephone is becoming increasingly accepted (Couper 2000). Using an 'equivalent' web component in a mixed-mode approach is often seen as a strategy to decrease costs, increase the speed of data collection, and increase response rates with the hope of decreasing the amount of nonresponse error (Dillman 2000; Schaefer and Dillman 1998).
* Contact: [email protected]. The authors wish to thank David R. Johnson for consultation on this project and Talat Azhar for assistance with project management.
Web surveys are especially attractive to use as a stand-alone mode for surveys within 'special populations,' e.g., organizational populations, where all members have an official email address provided by their organization (Couper, Traugott and Lamias 2001; Kaplowitz, Hadlock and Levine 2004; Sills and Song 2002). Since web surveys can be a quick and cost-effective option to survey special populations, it is important to understand the extent to which the data and analyses are biased by nonresponse. This is a particular concern because response rates to web surveys tend to be lower than to other modes (Couper, Blair and Triplett 1999; Dillman et al. 2001; Sheehan 2001),1 including among university faculty populations (Less 2003).2 Groves et al. (2004:165) have suggested that mixed-mode designs provide a 'creative' way to do methodological research. Using an experimental mixed-mode design provides a way to 'creatively' check whether nonresponse is a nonignorable source of bias in web surveys of special populations; it also provides a cost-effective option to correct for nonresponse bias, if bias exists.

Methods

We incorporated an experimental mixed-mode design to test for nonresponse bias in a web survey of the entire faculty (N=6548) at The Pennsylvania State University in the fall of 2004.3 The members of the whole population sample were randomly assigned to two different subsamples: 5548 were assigned to a four-contact web-only group and 1000 were assigned to a web-with-mail group, which received the identical four web contacts plus an equivalent paper follow-up survey and a reminder postcard.
1. Dillman et al. (2001), in a multi-modal study, reported a web response rate of only 12.7 percent, compared to much higher rates for other modes. Kim Sheehan (2001) analyzed response rates to email surveys from 1986 to 2000 and found that, as in all modes of survey research, response rates are declining; the average response rate of the web surveys she reviewed was 24%.
2. Karen Hill Less (2003) conducted a web survey of all faculty in the North Carolina Community College system (59 independent two-year public institutions). She reported a 20.0% response rate among faculty members confirmed as having received the survey, which was only 13.8% of the total number of faculty sent the survey. The total response rate was only 11.1% of the full-time faculty members in the North Carolina Community College system.
3. This included all full-time and part-time faculty at all 23 Penn State locations as well as all extension educators. The study was approved by Penn State's Office of Research Protections (IRB # 19575).
Our substantive interest made it desirable to survey the entire population rather than a smaller sample of the population because the research was exploratory in nature, aiming to study a topic that has not been researched either at Penn State or in depth in the literature. The research objective was to see how extensively Indigenous Knowledges (Semali and Kincheloe 1999),4 a form of engagement scholarship (Boyer 1996), is incorporated across the whole university. A whole population sample allowed coverage of every college, campus, and department in the university. Since many departments have small numbers of faculty members, a less inclusive sample would have missed many small departments where there was no way to predict whether Indigenous Knowledges (IK) were being incorporated into teaching and/or research activities. A web survey is a particularly good option for this population since all Penn State faculty are assigned email accounts, which are frequently used since the addresses and passwords associated with these accounts are tied to other services including library accounts and personnel information. These email accounts and addresses are kept current and are discontinued for faculty almost instantaneously upon termination of employment.5
4. Indigenous Knowledges (IK), as used by the researchers, is "local, traditional, or folk knowledge or ways of knowing that are grounded in the experience of a local community." Knowledges is used in the plural to emphasize that there are different bodies of indigenous knowledge, not just different types of indigenous knowledge.
5. The fact that all Penn State faculty have official email addresses overcomes one of the most serious concerns in web surveying, i.e., coverage bias (Kaye and Johnson 1999; Crawford, Couper and Lamias 2001). There is the potential for nonresponse bias due to coverage error if not all have equal access to the web (Alvarez and VanBeselaere 2003); however, the use of the web within the Penn State community is ubiquitous, and, in general, university communities mirror if not lead the general public in the adoption of such technology (Less 2003). Still, the varying level of respondent experience and comfort with web browsers is a possible source of bias (Dillman, Tortora and Bowker 2001).
The survey had 45 items divided into four sections: (a) basic demographics; (b) questions on the use of IK in teaching, research, and outreach; (c) sections completed only by those who use IK in their teaching; and (d) general questions on interest in utilizing IK.

Pre-notice emails were sent to the whole population sample on October 12, 2004. An email invitation with a link to the survey was sent out on October 14 to the entire sample, excluding those who wrote back asking to be removed from the survey.6 Email reminders were sent on October 18 to all nonrespondents, continuing to exclude those who asked to be removed. On October 21, paper surveys were mailed out to the 843 nonrespondents of the N=1000 subsample who had not responded to the web survey.7 On that same day, a final email invitation and link to the survey was sent out to all nonrespondents. On October 23, a postcard reminder was sent to those who received the paper surveys.8

Of the 5548 in the web-only sample, 1471 responded (26.5%), while 452 of 1000 (45.2%) responded from the web-with-mail sample. The latter response rate is comparable to rigorous studies such as the national telephone Survey of Consumer Attitudes, which presently has a response rate of 48% (Curtin, Presser and Singer 2005). The combined total was 1923 respondents, or 29.4% of the whole population (Table 1). [Insert Table 1 about here.]

6. Those who asked to be removed from the survey sample were not counted as respondents.
7. Of the 1000 in the web-with-mail subsample, 157 had already completed the web survey, so only 843 received a paper survey follow-up.
8. Considering the potential objection of university faculty to repeated follow-ups, a moderate level of follow-up effort was deemed most appropriate. A more aggressive follow-up strategy may have resulted in higher response rates.
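The assignment and response-rate figures reported above can be summarized in a few lines of code. The sketch below is illustrative only: the email frame and the seeded shuffle are hypothetical stand-ins, while the group sizes and respondent counts are the ones reported in the text.

# Sketch of the experimental assignment and response-rate arithmetic described
# above. The email frame is hypothetical; the group sizes (5548 / 1000) and
# respondent counts (1471 / 452) are the figures reported in the text.
import random

frame = [f"faculty_{i}" for i in range(6548)]   # hypothetical email frame
random.seed(2004)
random.shuffle(frame)
web_with_mail = frame[:1000]                    # web plus paper follow-up group
web_only      = frame[1000:]                    # four-contact web-only group

def response_rate(respondents: int, invited: int) -> float:
    return 100.0 * respondents / invited

print(round(response_rate(1471, len(web_only)), 1))       # 26.5
print(round(response_rate(452, len(web_with_mail)), 1))   # 45.2
print(round(response_rate(1471 + 452, len(frame)), 1))    # 29.4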
Analyses

The analysis proceeded in three stages.9 First, the possibility of a mode effect within the web-with-mail sample was investigated by comparing the item means of respondents to the web mode (N=315) with those of respondents to the paper mode (N=137). There were no statistically significant differences between these two groups on any of the demographic or substantive questions of the survey. There was only one nearly significant finding: tenure track respondents were more likely to respond to the web survey than to the paper survey. We found this in a re-coded variable10 which collapsed the various types of faculty appointments into two categories, tenure track/standing (2) versus non-tenure track/non-standing (1): web mean = 1.5116 versus mail mean = 1.4211, p = .082, two-tailed. This provides some evidence that mode does have an effect; it also suggests that there is underrepresentation of non-tenure track faculty, and thus potentially nonresponse bias, in the web-only study. This is not a surprising finding, in that other studies have similarly found that higher status employees are more likely to respond by web (Couper, Blair and Triplett 1999).

Second, we analyzed the difference between the web-only sample (N=1471; 26.5% response rate) and the combined web-with-mail sample, which had a much higher response rate (N=452; 45.2% response rate). This comparison picked up one significant difference that a comparison of the modes within the web-with-mail survey did not show. The difference was on an open-ended question about departmental affiliation, which we coded into two categories: technical (2) versus non-technical (1). A t-test for difference of means showed that respondents from predominantly technical departments were more likely to respond to the web-with-mail survey than to the web-only survey (web-with-mail mean = 1.4840 versus web-only mean = 1.4152, p = .016, two-tailed).

9. A fourth stage, not highlighted here, compared the general demographics of the university community with those of the respondents in both samples and indicated that the survey respondents generally matched the demographics of the university. For example, neither sample showed a significant difference in the likelihood of minorities responding to the survey compared to Whites. However, both samples showed the same significant difference: women were more likely than men to respond to the survey. In general, this is not a surprising finding given that women are generally more likely to respond to surveys than men (Green 1996). See Appendix A.
10. See Appendix B for a summary of the variables.
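The mean comparisons in the first two stages are two-tailed, independent-samples t-tests on recoded binary indicators. A minimal sketch of such a comparison is shown below; the coded vectors are hypothetical stand-ins for the survey data, not the study's actual responses.

# Two-sample, two-tailed t-test on a recoded binary indicator, as used in the
# mode and sample comparisons above. The coded vectors are hypothetical
# stand-ins (tenure track/standing = 2, non-tenure track/non-standing = 1).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
web_group  = rng.choice([1, 2], size=315, p=[0.49, 0.51])  # web respondents
mail_group = rng.choice([1, 2], size=137, p=[0.58, 0.42])  # paper respondents

t_stat, p_value = ttest_ind(web_group, mail_group)  # two-tailed by default
print(round(web_group.mean(), 4), round(mail_group.mean(), 4))
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")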
Somewhat counterintuitively, those from technical departments were less likely to respond to the web-only treatment. On the one hand, it could be that respondents from technical departments were less likely to respond because the topic was not salient to them. In that case, receiving the paper mail follow-up may have increased the sense of saliency for those in technical departments. On the other hand, it could be that those more heavily involved with technology may be receiving higher volumes of email, and thus may be more inclined to filter or trash incoming mail. Either way, the paper survey and reminder postcard seem to have been instrumental in a higher response from faculty in technical departments in the web-with-mail treatment,11 with the result that the overall response rate for the web component of the web-with-mail sample was higher than that for the web component of the web-only sample (31.5% versus 26.5%). This finding provides some evidence of nonresponse bias in that faculty members from technical departments may be underrepresented in the web-only sample.

And third, since the comparison of the web-with-mail and web-only samples indicated that technical departments were significantly underrepresented, we then specifically checked whether this might bias the substantive interpretation of the effects of being from a technical department.
11. There was not a significant difference in their response to web versus paper within the web-with-mail sample; the difference was in the overall response rate.
The substantive thesis of this research was that a Penn State faculty member's use of Indigenous Knowledges12 is positively affected by working at a branch campus and by a high level of peer support, but negatively affected by working in a technical department and by the demands of the higher rank of tenure. We used structural equation modeling13 (Figure 1, Model A) to test the model implicit in our substantive thesis. Model A (Figure 1) uses only the data from the web-with-mail sample, N=452. In Model A, p is greater than .05, indicating that the model is a good fit with the data, i.e., any departure of the data from the model is insignificant. The other fit statistics also are good.14 [Insert Figure 1 about here.] Though Model A 'fits the data,' there is only one significant path in the model, i.e., the negative association of technical department with the incorporation of IK in teaching (standardized coefficient -.15, p < .05, two-tailed).
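The model just described can be written down in a few lines of lavaan-style syntax. The sketch below uses the Python package semopy purely for illustration: the authors' actual software, variable names, and exact path layout are not reported here, so this specification (including the path from technical department to the latent peer-support factor) is an assumption based on the paths discussed in the text.

# Sketch of a structural equation model with the structure described above.
# All column names are hypothetical; semopy is used only for illustration.
import pandas as pd
import semopy

model_desc = """
PeerSupport =~ psu_peer_support + external_peer_support
PeerSupport ~ technical_dept
ik_use ~ technical_dept + branch_campus + tenure_rank + PeerSupport
"""

data = pd.read_csv("faculty_survey.csv")  # hypothetical respondent-level file

model = semopy.Model(model_desc)
model.fit(data)                  # maximum likelihood estimation by default
print(model.inspect())           # path estimates, standard errors, p-values
print(semopy.calc_stats(model))  # chi-square, df, NFI, TLI, RMSEA, etc.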
12. The dependent variable to be explained is the level of a Penn State faculty member's use of Indigenous Knowledges (IK, i.e., local, traditional, and/or folk knowledge or ways of knowing that are grounded in the experience of a local community), particularly in teaching. This variable (see Appendix B) is treated as an ordinal variable in that if faculty members incorporate IK into their research or outreach but not in their teaching (=2), then this is reasonably a step towards incorporating IK in teaching (=3), in that what someone researches eventually finds its way into the classroom. This variable is also seen as an indicator of a faculty member's engagement with the population the university serves. See publications such as the American Association of State Colleges and Universities' "Stepping Forward as Stewards of Place: A Guide for Leading Public Engagement at State Colleges and Universities," available at http://www.unomaha.edu/plan/stewardsofplace_02.pdf.
13. Structural equation modeling (SEM) is especially appropriate for this study in that it allows a theory-driven model to be developed and tested, and provides various goodness-of-fit tests and other checks on the model that increase the level of confidence in the findings. Though SEM is often discussed as if it were causal modeling, this is not appropriate, since the causal mechanisms in SEM are no surer than in other forms of linear regression analysis. Also see John Fox for a summary of this issue: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-sems.pdf.
14. In structural equation modeling it is possible that more than one model can fit the data when a model is overidentified, i.e., when there are more known quantities (observed variances and covariances) than free parameters to be estimated. Overidentification is indicated by positive degrees of freedom (df=5). This overidentification allows for goodness-of-fit tests, i.e., tests of whether this particular model fits the data. The chi-square statistic for Model A (chi-sq=6.560, df=5, p=.255) indicates that this model is a good fit with the data, i.e., any departure of the data from this model is insignificant. Also, when the ratio of the chi-square statistic to the degrees of freedom is close to or less than 1, the model is generally accepted; the chi-sq/df ratio of 1.312 suggests that the model fits the data. Since the chi-square statistic is sensitive to sample size, it is useful to discuss the results of three other goodness-of-fit tests reported in Figure 1. First, the Normed Fit Index (NFI) score of 0.953 estimates that the model is 95.3% away from the worst-fitting model and 4.7% away from what could be a statistically ideal model. Since models with overall fit indices of less than NFI=.9 can usually be substantially improved (Bentler and Bonett 1980), further adjustments to this model would not be expected to make any appreciable improvement. The second test, the Tucker-Lewis Index (TLI) developed by Tucker and Lewis (1973), is more consistent across sample sizes. Unlike the NFI, the TLI is not bounded by 0 and 1; scores close to 1 are considered to indicate a very good fit, and this model's TLI=.944 indicates a good fit. The third goodness-of-fit test reported in addition to the chi-square test is the Root Mean Square Error of Approximation (RMSEA). The RMSEA goes beyond a comparison of this model with a worst-case model by adjusting for degrees of freedom. It permits the testing of the proposed model against other possible models and uses estimates adjusted for discrepancies in population moments, not just sample moments (Steiger 1990). The RMSEA also produces an estimate that adjusts for sample size. This model's RMSEA=.026 indicates a good fit, considering that a value of < .050 indicates a close fit (Browne and Cudeck 1993).
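The chi-square/df ratio and the RMSEA values quoted above can be reproduced directly from the reported chi-square, degrees of freedom, and sample sizes. The sketch below assumes the standard Steiger-Lind RMSEA formula; whether the authors' software used N or N-1 in the denominator is an assumption, and both give .026 here.

# Reproduce the chi-square/df ratio and RMSEA for Model A from the reported
# values (chi-square = 6.560, df = 5, N = 452). The formula used here,
# sqrt(max(chi2 - df, 0) / (df * (N - 1))), is the standard Steiger-Lind form.
from math import sqrt

def rmsea(chi2: float, df: int, n: int) -> float:
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

chi2, df, n = 6.560, 5, 452          # Model A (web-with-mail sample)
print(round(chi2 / df, 3))           # 1.312, the reported chi-sq/df ratio
print(round(rmsea(chi2, df, n), 3))  # 0.026, matching the reported RMSEA

# Model C (combined sample): chi-square = 3.340, df = 5, N = 1923
print(round(rmsea(3.340, 5, 1923), 3))  # 0.0, matching the reported RMSEA = .000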
The lack of statistical significance of the other paths may be due to having a fairly small number of cases to evaluate, e.g., only 102 respondents completed the item Penn State Peer Support. As mentioned above, the underrepresentation of technical departments may bias the results. An indication of this would be a substantive change in the results of the same analysis using the web-only sample, N=1471 (Model B, Figure 2), instead of the web-with-mail sample. [Insert Figure 2 about here.] In Model B, p is again greater than .05, indicating that any departure of the data from the model is insignificant. The other fit statistics are also good. In this model, all of the paths are statistically significant at p ≤ .056, two-tailed, due to having a larger number of cases for each item, e.g., 347 respondents completed the item Penn State Peer Support. The increased strength in Model B of several relationships may be due to nonresponse biasing the analysis. The stronger relationship between technical department and incorporation of IK in Model B (-.18) than in Model A (-.15), as well as the stronger relationship between technical department and Peer Support in Model B (-.21) versus Model A (-.14), may be due to persons from technical departments being underrepresented in the web-only sample.

Given such findings, it would seem advisable to weight the web-only data to compensate for the possible underrepresentation of technical departments (just as one would for the underrepresentation of males). However, this is problematic in that while there are data from Penn State on male-female ratios, there are no data available on the breakdown of faculty by department. This means that the only way the data could be weighted would be based on the ratio of technical to non-technical respondents from the web-with-mail sample, which may or may not be accurate.
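One way to operationalize this would be a simple cell-weighting adjustment that treats the technical/non-technical split in the web-with-mail sample as the target distribution. The sketch below is only an illustration of that idea: the data file and column names are hypothetical, and, as noted above, the target itself may or may not be accurate.

# Sketch of cell weighting for the web-only respondents, using the
# technical/non-technical split observed in the web-with-mail sample as the
# target proportions. Column names ('sample', 'technical') are hypothetical.
import pandas as pd

df = pd.read_csv("faculty_survey.csv")  # hypothetical combined respondent file

target = (df.loc[df["sample"] == "web_with_mail", "technical"]
            .value_counts(normalize=True))        # e.g., roughly {1: .52, 2: .48}
observed = (df.loc[df["sample"] == "web_only", "technical"]
              .value_counts(normalize=True))

cell_weights = target / observed                  # one weight per category
df.loc[df["sample"] == "web_only", "weight"] = (
    df.loc[df["sample"] == "web_only", "technical"].map(cell_weights))
df.loc[df["sample"] == "web_with_mail", "weight"] = 1.0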
Another alternative to mitigate nonresponse bias is to combine the two subsamples (A and B), since they do not overlap (Model C, Figure 3). [Insert Figure 3 about here.] Combining the two subsample data sets may be less preferable than weighting; nonetheless, it does produce a larger and 'richer' data set for analysis. Model C is an improvement over the two previous models. It fits the data extremely well (chi-square=3.340, df=5, p=.648; NFI=.992; TLI=1.018; RMSEA=.000). Also, all paths from exogenous variables are now statistically significant at p < .05, two-tailed. The relationship between being in a technical department and the incorporation of IK in teaching in Model C (-.17) provides a statistic closer to that of Model A (-.15), which would arguably be considered a better estimate due to the higher response rate, albeit smaller n. The wide discrepancy between the regression paths to and from the latent variable Peer Support for using IK may be an artifact of having a much smaller n for the two peer support variables in Model A. Lacking reliable external data upon which to weight survey data, combining the independent treatment groups into one sample provides a way to lessen the degree of nonresponse bias.
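As a companion to the weighting sketch above, the combination strategy amounts to concatenating the two non-overlapping respondent files and re-fitting the same model. Again, the file names, column names, and the semopy package are illustrative assumptions, not the authors' actual workflow.

# Sketch of the combination strategy: stack the two non-overlapping respondent
# files and re-fit the same model on the pooled data, as in Model C.
import pandas as pd
import semopy

model_desc = """
PeerSupport =~ psu_peer_support + external_peer_support
PeerSupport ~ technical_dept
ik_use ~ technical_dept + branch_campus + tenure_rank + PeerSupport
"""

web_only      = pd.read_csv("web_only_respondents.csv")       # hypothetical
web_with_mail = pd.read_csv("web_with_mail_respondents.csv")  # hypothetical
combined = pd.concat([web_only, web_with_mail], ignore_index=True)

model_c = semopy.Model(model_desc)
model_c.fit(combined)
print(semopy.calc_stats(model_c))  # chi-square, df, NFI, TLI, RMSEA, etc.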
Discussion

This study, which used a mixed-mode experimental design, has made several contributions to understanding the dynamics involved with response (and nonresponse) to web surveys, especially web surveys of special populations.

First, the results indicate that nonresponse in web surveys of special populations should not be ignored. Along with Couper, Blair and Triplett (1999), we found evidence that higher status employees are more likely to respond to a web survey than are lower status employees. In this specific case, tenure track faculty members were more likely to respond to the web mode and non-tenure track faculty were more likely to respond to the paper survey option.

Second (and counterintuitively), we found that more technically inclined respondents were significantly less likely to respond to web-only surveys. This does not mean that they will not respond by web, but that receiving a paper survey or postcard may help get the survey past their more sophisticated filtering practices. A rival explanation is that respondents from technical departments were less likely to respond because the topic was not salient for them. Instead of breaking through their higher filtering process, the web-with-mail treatment may have increased the sense of saliency of the survey. Either way, the results extend a previous finding that pre-notification by paper mail increases overall web response rates (Kaplowitz, Hadlock and Levine 2004) by demonstrating that mailed materials sent as follow-up to a web survey were also associated with an increase in the web response rate.

And third, this mixed-mode experimental design boosted the overall response rate from 26.5% in the web-only sample to 29.4% in the combined sample. While the response rate increase seems relatively negligible (about 3 percentage points), it does have an important effect on the significance levels in the substantive analysis. For example, three of the five regression paths in Model C (Figure 3) increased in their level of statistical significance over the analyses using the web-only data (Model B, Figure 2) and the web-with-mail data (Model A, Figure 1). Since significant results are generally more desirable than non-significant results, a response rate increase of about 3 percentage points produced a 'significant' benefit for a moderate cost.
This study indicates that there are benefits to including a mixed-mode component as part of a web survey of special populations. It provides a cost-effective way to identify potential sources of nonresponse bias, and it increases response rates, potentially in a 'significant' way. These benefits, as well as the wider potential of using structural equation modeling as a method to explore the effects of nonresponse bias, are areas of interest for future research.
REFERENCES

Alvarez, R. Michael, and Carla VanBeselaere. 2003. "Web-Based Surveys." California Institute of Technology. Internet last accessed 7/27/05: http://survey.caltech.edu/encyclopedia.pdf.

Bentler, P.M., and Douglas G. Bonett. 1980. "Significance Tests and Goodness of Fit in the Analysis of Covariance Structures." Psychological Bulletin 88:588-606.

Boyer, Ernest. 1996. "The Scholarship of Engagement." Journal of Public Outreach 1:11-20.

Browne, Michael W., and Robert Cudeck. 1993. "Alternative Ways of Assessing Model Fit." Pages 136-162 in K.A. Bollen and J.S. Long [Eds.], Testing Structural Equation Models. Newbury Park, CA: Sage.

Couper, Mick P. 2000. "Web Surveys: A Review of Issues and Approaches." Public Opinion Quarterly 64:464-494.

Couper, Mick P., Johnny Blair, and Timothy Triplett. 1999. "A Comparison of Mail and E-mail for a Survey of Employees in Federal Statistical Agencies." Journal of Official Statistics 15:39-56.

Couper, Mick P., Michael W. Traugott, and Mark J. Lamias. 2001. "Web Survey Design and Administration." Public Opinion Quarterly 65:230-53.

Crawford, Scott, Mick P. Couper, and Mark Lamias. 2001. "Web Surveys: Perception of Burden." Social Science Computer Review 19:146-162.

Curtin, Richard, Stanley Presser, and Eleanor Singer. 2005. "Changes in Telephone Survey Nonresponse over the Past Quarter Century." Public Opinion Quarterly 69:87-98.

Dillman, Don A. 2000. Mail and Internet Surveys: The Tailored Design Method. New York: John Wiley and Sons.

Dillman, Don A., Robert D. Tortora, and Dennis Bowker. 1998. "Principles for Constructing Web Surveys." Internet last accessed 6/05: http://survey.sesrc.wsu.edu/dillman/papers.htm

Dillman, Don A., Glenn Phelps, Robert Tortora, Karen Swift, Julie Kohrell, and Jodi Berck. 2001. "Response Rate and Measurement Differences in Mixed Mode Surveys Using Mail, Telephone, Interactive Voice Response and the Internet." Draft Paper. Internet last accessed 7/21/05: http://survey.sesrc.wsu.edu/dillman/papers/Mixed%20Mode%20ppr%20_with%20Gallup_%20POQ.pdf
Fox, John. 2002. "Structural Equation Models: Appendix to An R and S-PLUS Companion to Applied Regression." Internet last accessed 5/6/05: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-sems.pdf

Green, Kathy E. 1995. "Sociodemographic Factors and Mail Survey Response." Psychology and Marketing 13:171-84.

Groves, Robert M., Floyd J. Fowler, Jr., Mick P. Couper, James M. Lepkowski, Eleanor Singer, and Roger Tourangeau. 2004. Survey Methodology. New York: Wiley-Interscience.

Kaplowitz, Michael D., Timothy D. Hadlock, and Ralph Levine. 2004. "A Comparison of Web and Mail Survey Response Rates." Public Opinion Quarterly 68:94-101.

Kaye, Barbara K., and Thomas J. Johnson. 1999. "Research Methodology: Taming the Cyber Frontier." Social Science Computer Review 17:323-337.

Less, Karen Hill. 2003. Faculty Adoption of Computer Technology for Instruction in the North Carolina Community College System. A dissertation presented to the faculty of the Department of Educational Leadership and Policy Analysis, East Tennessee State University. Internet last accessed 6/05: http://etdsubmit.etsu.edu/etd/theses/available/etd-0627103160627/unrestricted/LessK08052003f.pdf

Schaefer, David R., and Don A. Dillman. 1998. "Development of a Standard E-mail Methodology." Public Opinion Quarterly 62:378-397.

Semali, Ladislaus M., and Joe L. Kincheloe. 1999. What is Indigenous Knowledge? Voices from the Academy. New York: Falmer Press.

Sheehan, Kim. 2001. "E-mail Survey Response Rates: A Review." Journal of Computer-Mediated Communication 6. Internet last accessed 5/05: http://www.ascusc.org/jcmc/vol6/issue2/sheehan.html#references

Sills, Stephan J., and Chunyuan Song. 2002. "Innovations in Survey Research: An Application of Web Surveys." Social Science Computer Review 20:22-30.

Steiger, James H. 1990. EzPATH: Causal Modeling. Evanston, IL: Systat.

Tucker, Ledyard R., and Charles Lewis. 1973. "A Reliability Coefficient for Maximum Likelihood Factor Analysis." Psychometrika 38:1-10.
Figures
Figure 1: Model A Structural Equation Model, N=452
Note: Penn State Peer Support n=102; External Peer Support n=101
Figure 2: Model B Structural Equation Model, N=1471
Note: Penn State Peer Support n=347; External Peer Support n=346
Figure 3: Model C Structural Equation Model, N=1923
Note: Penn State Peer Support n=449; External Peer Support n=447
Appendix A: Chi-Square Tests of Difference for Gender of Respondent

A significant difference was found in the likelihood of women responding to the survey compared to men, in the combined sample as well as in each subsample. In general, this is not a surprising finding given that women are generally more likely to respond to surveys (Green 1996). See the following tables.
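The tests reported here are chi-square tests of independence between gender and response status. A minimal sketch of such a test follows; the 2x2 counts are purely illustrative, not the study's actual tabulations.

# Chi-square test of independence between gender and response status, as in
# Appendix A. The counts below are hypothetical, for illustration only.
from scipy.stats import chi2_contingency

#                 responded, did not respond
table = [[ 900, 1800],   # women (hypothetical counts)
         [1000, 2848]]   # men   (hypothetical counts)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")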
Appendix B: Variables Used