Answers for Self and Proxy—Using Eye Tracking to Uncover Respondent Burden and Usability Issues in Online Questionnaires

Erica Olmsted-Hawala 1, Temika Holland, and Elizabeth Nichols

U.S. Census Bureau, Center for Survey Measurement, 4600 Silver Hill Road, Washington, DC 20233
{Erica.L.Olmsted.Hawala, Temika.Holland, Elizabeth.May.Nichols}@census.gov

1 Disclaimer: This report is released to inform interested parties of research and to encourage discussion. Any views expressed on the methodological issues are those of the authors and not necessarily those of the U.S. Census Bureau.
Abstract. In a study of the American Community Survey online instrument, we assessed how people answered questions about themselves and about other individuals living in their household, using eye-tracking data and other qualitative measures. This paper focuses on the number of fixations (whether participants looked at specific areas of the screen), fixation duration (how long participants looked at the questions and answers), and the number of unique visits (whether participants rechecked the question and answer options). Results showed that for the age, date of birth, and employment duties questions, participants had more fixations and higher visit counts, and spent more time on the screen, when answering about unrelated members of their household than when answering about themselves. Differing eye movements for proxy reporting suggest that answering some survey questions for other, unrelated people places more burden on respondents than answering about oneself. However, not all questions showed this tendency, so eye tracking alone is not enough to detect burden.

Keywords: eye tracking, usability, questionnaire design, proxy reporting, respondent burden
1 Introduction/Background
Eye tracking has been used in data collection surveys for a number of purposes, both to help influence the design of surveys and to understand how respondents interact with survey questions. For example, it has been used to assess the usability of branching instructions in paper survey designs: gaze paths (or trails) were used to identify whether respondents read a branching instruction and then followed it to skip a question or move forward to the next appropriate question [1]. Eye tracking has also been used in studying online questionnaires, for example to identify whether participants noticed instructions informing the respondent how to return to the survey if they were logged out [2]. Eye tracking has helped to visualize what survey designers intuitively understood: that respondents notice the first few response options more than the later response options [3]. However, there are also cautions when using eye tracking with complex survey designs. Researchers recommend using eye tracking in conjunction with other data, such as qualitative results collected from usability testing, when reporting findings, because complex surveys may elicit the "blank gaze," in which participants recall information while looking at the screen but their attention is not actually on where their eyes are focused [4].

There is a substantial body of research investigating how, particularly in household surveys, data quality varies by whether the respondent is answering questions about themselves or about other individuals (also known as proxy reporting). Many household surveys rely on proxy reporting because of cost or time constraints. For example, it may not be possible within a certain time limit and budget to have all individuals in a household answer the survey. It is typical then to have one person in the household report for all others in that household. In general, the consensus on whether proxy reporting is reliable is mixed, for many reasons. One primary reason, noted by Moore in a review of the literature, is methodological: for example, many studies fail to assign respondents a priori to answer as self or proxy, introducing self-selection bias [5]. Some researchers have found that proxy reports can be accurate, depending on the subject of the questions [6; 7]. However, when proxies are asked questions that are more burdensome or difficult to answer, such as questions about race/ethnicity [8], mental health [9], income [10], or other questions concerning subsections of the population [11], proxy reporting may be significantly inaccurate. In addition, the quality of proxy reporting differs depending on the relationship of the proxy to the target individual [12]. That is, when the proxy respondent is more closely related to the person they are answering the questions about, the data are more accurate [13].

To date, the evaluation of proxy data has relied on comparing self-reports to proxy reports to examine differences, or on comparing proxy reports to administrative records. We hypothesize that eye-tracking data can be another method to identify questions that are difficult for proxy respondents. We were specifically interested in evaluating questions that are difficult or burdensome to answer about people who are unrelated to the respondent. To do so, we compare an individual's eye-tracking behavior when answering questions about themselves with their behavior when answering the same questions about an unrelated person residing in the same household. Based on prior literature suggesting that increased eye fixations and durations can be an indicator of burden (confusion/frustration) [14; 2; 15], we hypothesized that there would be a higher count of eye fixations, a longer fixation duration, and more visit counts (that is, returns to the same area of interest) for proxy reports about unrelated individuals than for participants' reports about themselves.
2 Methods
In the fall of 2013, the U.S. Census Bureau conducted a usability test with eye tracking on the online American Community Survey (ACS). The ACS is an ongoing national survey administered to nearly 3 million housing units per year. The survey generates data that assist in the allocation of more than $450 billion in federal and state funds. The aim of the usability testing was to improve the design of the survey questions and the navigation features of the online instrument, thereby minimizing measurement error. Specifically, the testing investigated whether changes to the instrument would reduce break-offs and the number of edit messages received at particular screens, especially when answering for unrelated household members [16]. This paper focuses on some of those same screens, including questions that are difficult to answer for an unrelated individual, such as date of birth, employment information, and wages.

2.1 Participants
Ten participants who lived with unrelated household members were recruited from a database managed by the Center for Survey Measurement. These participants resided in the Washington, DC metropolitan area and had responded to a newspaper advertisement seeking people interested in participating in research studies at the Census Bureau. Participants' households consisted of a mix of related and unrelated household members. Participants answered detailed questions for all household members: themselves and every other individual living in the household, related and unrelated. Of the 10 respondents, one had one unrelated person in their household and the others had between two and nine unrelated persons in their household. Participants were given a $40 honorarium.

Participants completed a questionnaire about their computer use and Internet experience. To be eligible for the study, participants had to meet a minimum Internet experience requirement: using the Internet three times a week for at least a year, for more than simply checking their email. If participants met this requirement and lived in complex households (which we defined as households with three or more persons, at least one of them unrelated to the participant), they were scheduled to participate in a usability test. All participants were considered knowledgeable in navigating the Internet and using a computer, although some were more experienced than others. Participant demographics are presented in Table 1. All participants were unfamiliar with the ACS, and none reported having received this survey in the past.
Table 1. Mean (and Range) Participant Demographics

                                                            Participants (range)
N                                                           10
Gender                                                      3M / 7F
Age                                                         31 (21-51)
Education                                                   7 < High school & some college; 3 > Masters or PhD
Difficulty in learning to use new Websites a                1.4 (1-2)
Difficulty in using the Internet a                          1 (1-1)
Overall experience with computers to use the Internet b     1.2 (1-2)
Familiarity with Census Bureau Website c                    1.7 (1-4)

a Scale: 1 (Not difficult at all) – 5 (Extremely difficult)
b Scale: 1 (A great deal) – 5 (None)
c Scale: 1 (Not familiar at all) – 5 (Extremely familiar)
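For illustration only, the eligibility rule described above can be expressed as a simple predicate. The sketch below (in Python) is ours; the parameter names and this encoding are assumptions and were not part of the study's screening materials, which took the form of a questionnaire.

# Hypothetical sketch of the recruitment screening rule in Sect. 2.1.
# Parameter names are illustrative only.
def is_eligible(internet_days_per_week: int,
                internet_years: float,
                beyond_email: bool,
                household_size: int,
                unrelated_members: int) -> bool:
    """Internet use at least three times a week for at least a year (for more
    than simply checking email), plus a 'complex' household: three or more
    persons with at least one member unrelated to the participant."""
    experienced = (internet_days_per_week >= 3
                   and internet_years >= 1
                   and beyond_email)
    complex_household = household_size >= 3 and unrelated_members >= 1
    return experienced and complex_household

# A daily Internet user of two years living in a four-person household
# with two unrelated housemates meets both criteria.
print(is_eligible(7, 2, True, 4, 2))  # True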
2.2 Procedure
Usability testing was conducted in the Human Factors and Usability laboratory at the U.S. Census Bureau in Suitland, MD. Each participant sat in a room facing a one-way mirror and a wall camera. The room was equipped with a 17-inch LCD monitor with an attached Tobii X-120 eye tracker. The test administrator (TA) began the session by reading a brief introduction about the purpose of the study and the uses of the data collected in the survey. The participant, working one-on-one with the TA, signed a consent form granting permission to be video and audio recorded, completed a short-term memory task, and underwent a brief eye-tracking calibration. Once the setup was complete, the participant was left alone in the room. The TA monitored the session from the other side of the one-way mirror, communicating with the participant via microphones and speakers, and began the video and audio recording.

After completing a short questionnaire about their Internet search habits and strategies and a short-term memory exercise, the participant answered the online ACS survey. The participant completed the ACS survey in silence, and their eye movements were recorded unobtrusively. At the conclusion of the ACS survey, eye tracking was stopped and the participant completed a satisfaction questionnaire loosely based on the Questionnaire for User Interaction Satisfaction [17]. Upon completion of the satisfaction questionnaire, the TA returned to the participant's room, asked a set list of debriefing questions, paid the participant, and escorted the participant from the building.
2.3 Qualitative usability feedback
Participants were debriefed at the conclusion of the study using a list of scripted questions; the TA also deviated from the script to ask targeted questions based on the participants' responses. This debriefing allowed for an unscripted conversation about the survey questionnaire participants had just completed. The verbalized comments from the debriefing were reviewed and transcribed by the authors of this paper.

2.4 Eye tracking
We recorded the eye-tracking data using Tobii Studio [18]. We examined eye movements in predefined areas of interest (AOIs). For each screen analyzed, the AOIs included the question stem, any italic instructional text, and the answer field. We focused on the eye-tracking data related to participants' answers about themselves and about other, unrelated individuals in their household. For each AOI, we examined the total number of fixations, to assess where participants looked on the screen; the total fixation duration, to assess how long participants spent on each area of the screen; and the total number of unique visits to the AOI, to assess whether participants rechecked the question stem, instructional text, or response fields. Fixation duration and the number of fixations per screen were evaluated to assess the amount of burden participants experienced when responding to these survey items. We compared the questionnaire screens on which a participant answered questions about themselves with the screens on which the participant answered questions about unrelated individuals living in the household, in order to identify differences in eye-movement patterns.
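To make the three measures concrete, the sketch below shows how fixation count, total fixation duration, and visit count can be aggregated for a rectangular AOI. This is an illustration only: the data layout, the AOI names, and the visit definition used here (a maximal run of consecutive fixations inside the AOI) are our assumptions, not the Tobii Studio export format or the scripts used in this study.

# Illustrative sketch of the three AOI measures used in the analysis:
# fixation count, total fixation duration, and number of unique visits.
from dataclasses import dataclass
from typing import List


@dataclass
class Fixation:
    x: float            # gaze x-coordinate in pixels
    y: float            # gaze y-coordinate in pixels
    duration: float     # fixation duration in seconds


@dataclass
class AOI:
    name: str           # e.g., "question stem", "italic instructions", "response field"
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, f: Fixation) -> bool:
        return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom


def aoi_measures(fixations: List[Fixation], aoi: AOI) -> dict:
    """Aggregate fixation count, total fixation duration, and visit count for
    one AOI. A 'visit' is counted each time the gaze enters the AOI, i.e., a
    maximal run of consecutive fixations that fall inside it."""
    count = 0
    total_duration = 0.0
    visits = 0
    inside_previous = False
    for f in fixations:             # fixations are assumed to be in time order
        inside = aoi.contains(f)
        if inside:
            count += 1
            total_duration += f.duration
            if not inside_previous:
                visits += 1         # gaze just entered the AOI: new visit
        inside_previous = inside
    return {"aoi": aoi.name,
            "fixation_count": count,
            "fixation_duration": total_duration,
            "visit_count": visits}


# Example with made-up coordinates: two fixations inside the AOI, separated
# by one fixation outside it, giving 2 fixations, 0.56 s, and 2 visits.
aoi = AOI("response field", left=100, top=300, right=600, bottom=360)
fixes = [Fixation(120, 320, 0.21), Fixation(700, 320, 0.18), Fixation(130, 340, 0.35)]
print(aoi_measures(fixes, aoi))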
3 Results
The analysis plan included a comparison of the eye-tracking data when participants answered questions about themselves and when they answered about other, unrelated people in their household. This paper reports eye-tracking results for a few of the screens that appeared to have more usability issues and were more difficult for proxy respondents. A paired t-test was performed to assess differences in fixation duration, fixation counts, and visit counts on survey items between the two report types: self-reports and reports about unrelated individuals in the household, which we refer to as proxy reports (a minimal illustrative sketch of this paired comparison is given after Table 2). Table 2 provides the means and standard errors for each AOI in which significant differences were observed.

Table 2. Means and standard errors for AOIs

Screen (AOI)              n   Fixation Duration Mean (SE)     Fixation Counts Mean (SE)       Visit Counts Mean (SE)
                              Self           Proxy            Self           Proxy            Self           Proxy

Date of Birth
  Question Text           7   .74 (.24)**    1.32 (.38)**     3.86 (.85)**   6.71 (1.71)**    2.29 (.52)*    3.71 (.78)*
  Italic instructions     3   ns             ns               ns             ns               ns             ns
  Response field(s)       8   1.54 (.55)**   2.89 (.93)**     ns             ns               4.75 (.84)*    8.25 (1.70)*
  Entire Screen           8   4.48 (.77)**   8.67 (1.24)**    23.5 (1.81)*   45.25 (10)*      7.25 (.56)*    18.88 (6.01)*

Employment Duties
  Question Text           5   ns             ns               3.2 (1.11)*    9.4 (2.80)*      2.2 (.58)**    4.4 (.87)**
  Italic instructions     5   ns             ns               ns             ns               ns             ns
  Response field          7   ns             ns               ns             ns               ns             ns
  Entire Screen           8   5.02 (1.36)*   8.29 (2.39)*     23.5 (4.84)*   35.5 (7.55)*     ns             ns
Employer location
  Question Text
    Main                  –   ns             ns               ns             ns               ns
    a                     –   ns             ns               ns             ns               4.75 (.75)**
    b                     –   ns             ns               ns             ns               ns
    c                     –   ns             ns               ns             ns               5.25 (1.60)*
    d                     –   .91 (.20)**    .37 (.12)**      5.00 (1.00)**  2.25 (.95)**     3.00 (.71)**
    e                     –   ns             ns               ns             ns               ns
    f                     –   ns             ns               ns             ns               ns
  Italic Instructions
    Main                  4   ns             ns
    a                     5   ns             ns
  Response field(s)
    a                     4   ns             ns
    b                     3   ns             ns
    c                     4   ns             ns
    d                     4   ns             ns
    e                     3   ns             ns
    f                     3   ns             ns
  Entire Screen           5   ns             ns

Wages
  Question Text           –   ns             ns
  Italic instructions     –   ns             ns
  Response field(s)       –   ns             ns
  Entire Screen           –   ns             ns

Note: ns = not significant; * and ** indicate a statistically significant difference between the self and proxy means (paired t-test); – = not available.
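As an illustration of the paired comparison reported in Table 2, the sketch below runs a paired t-test on one measure for one AOI. The values are placeholders rather than the study data, and scipy is simply one common choice of library for this test.

# Sketch of the paired t-test comparing self vs. proxy reports on one AOI
# measure (e.g., total fixation duration on the Entire Screen).
# The numbers below are placeholders, not the study data.
from scipy import stats

# One value per participant: the measure when answering about oneself and
# when answering the same question about an unrelated household member.
self_duration = [3.9, 4.4, 5.1, 3.2, 4.8, 4.1, 5.0, 4.3]
proxy_duration = [7.9, 9.2, 8.1, 6.5, 10.3, 7.7, 9.6, 8.4]

t_stat, p_value = stats.ttest_rel(proxy_duration, self_duration)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")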