Survey Sidekick: Structuring Scientifically Sound Surveys

I-Han Hsiao1, Shuguang Han2, Manav Malhotra1, Hui Soo Chae1, Gary Natriello1

1 EdLab, Teachers College Columbia University, New York City, NY, USA
{ih2240; mm2625; hsc2001; gjn6}@columbia.edu
2 School of Information Sciences, University of Pittsburgh, Pittsburgh, PA, USA
{shh69}@pitt.edu
Abstract. Online surveys are becoming a more popular means of information gathering in both academia and industry because of their relatively low cost and ease of delivery. However, there is increasing debate about data quality in online surveys. We present a novel survey prototyping tool that integrates embedded learning resources to facilitate the survey prototyping process and encourage the creation of scientifically sound surveys. Results from a controlled pilot study confirmed that survey structure follows three guiding principles: simple-first, structure-coherent, and gradual-difficulty-increase, revealing positive effects of learning resources on survey structure. Keywords: Survey Design, Hidden Markov Model, Ill-defined domain
1 Introduction
The web has lowered the barrier to collecting information through surveys [1]. Survey Monkey1, one of the most popular online survey tools, has hosted the creation of more than 15 million online surveys. However, to date, most online survey tools focus mainly on supporting survey delivery and simple analytics, neglecting the quality of the survey itself. Experienced survey researchers can rely on their expertise to ensure survey validity and reliability; inexperienced survey creators, lacking feedback or guidance, may unknowingly create biased and incomplete surveys. In this paper, we study and report an innovative solution that encourages the creation of scientifically sound surveys.

In traditional Artificial Intelligence, intelligent tutoring systems have succeeded in automatically providing feedback for problem solving and direct instruction, in the form of examples or definitions of concepts [2] or auto-grading [3]. Recent intelligent tutoring systems face new challenges from the increasing importance of interdisciplinary study in ill-defined domains, where quick and sound feedback is not guaranteed and the quality of answers is difficult to evaluate. Examples include reasoning about legal arguments [4], providing semantic and constructive feedback for survey design [5], and programming assignments [3, 6], among others.

1 Survey Monkey: https://www.surveymonkey.com

Constraint-based tutors have been shown to provide effective feedback for ill-defined problems, but manually generating constraints for a broad domain is time-consuming [9, 10]. A less costly way of obtaining constructive feedback is to collect answers from online Q&A systems [11] or crowdsourcing [12], or to gather community feedback through a systematic peer-review process [13]. However, these approaches still present challenges such as low answering rates [14] and ambiguous answer quality [15]. To address these challenges and move from automatic assessment to a more data-driven approach to feedback generation, proposed techniques include using a probabilistic distance-to-solution measure to assess progress and identify misconceptions or the problem-solving path [6], and forms of latent semantic analysis (LSA) for automatic evaluation and topic mapping [3]. QUAID [16] is one of the few web tools that assist survey methodologists in examining survey questions, covering their wording, syntax, and semantics. Our focus is on survey structure and adaptive learning resources during the survey prototyping process. Applying these prior findings to our study, we hypothesize that embedding learning resources and providing automatic hints [8] during the survey prototyping process, together with dynamic survey structure modeling, will enhance survey design quality. Before effective feedback can be provided, understanding how users behave when creating surveys also helps in suggesting better learning resources. To investigate these issues, we present an innovative system, Survey Sidekick, and study the effectiveness of our approach.
2 Survey Sidekick
Survey Sidekick (https://surveysidekick.com) is an online survey tool developed by EdLab, Teachers College Columbia University. The beta version was launched in October 2012 and is currently open by invitation. We currently have 444 users, 102 of whom have designed one or more surveys. Survey Sidekick supports survey design, delivery, data analytics, and reporting. The system includes embedded learning resources (orange icons) and interface support (blue icons) (Figure 1). Both are displayed at the relevant moment during the prototyping process, alongside individual questions or the entire survey. The embedded learning resources are tutorials extracted from a survey design textbook [17]; further design rationale is reported in [5]. In this work, we extend the dynamic learning-resource support by evaluating survey structure and question composition.
Figure 1. Survey editing interface; https://surveysidekick.com
3 Modeling Sequential Survey Structure using HMM
The Hidden Markov Model (HMM) is a popular method for modeling sequential data. In this study, we employ an HMM to model users' hidden tactics in designing a survey, using the choice of each answer type (e.g. free text, Likert, and multiple choice) as the observable outcome of the hidden tactics. Taken together, the hidden tactics can be thought of as the strategy used to design the survey. A similar study was conducted by Yue et al. [18] to understand users' information-seeking behavior. Survey Sidekick supports 7 different question/answer types: Numeric (N), Free text (F), Short answer (S), Multiple choice with a single correct answer (MS), Multiple choice with multiple correct answers (MM), Likert with a 5-value scale (LI), and Likert list with more than 5 values (LL). A survey yields a sequence of answer types T1 to TM, each drawn from the predefined set TS = {N, F, S, MS, MM, LI}. The HMM assumes a corresponding sequence of hidden states H1 to HM; each answer type is generated by its hidden state, and different answer types can be generated by the same hidden state with different probabilities. An HMM has several parameters: the number of hidden states HS, the start probability pi_i of each state, the transition probabilities a_ij between any two hidden states, and the emission probability b_i(k) from each state to each answer type.
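To make the model concrete, the sequence likelihood that underpins both HMM training and model selection can be computed with the scaled forward algorithm. The sketch below is illustrative rather than the paper's implementation: the toy parameter values are placeholders, and the AIC helper simply counts the free parameters named above (start, transition, and emission probabilities).

```python
import numpy as np

# Answer types used in the model: TS = {N, F, S, MS, MM, LI}
TYPES = ["N", "F", "S", "MS", "MM", "LI"]

def forward_loglik(obs, start, trans, emit):
    """Log-likelihood of an observed answer-type sequence under an HMM,
    via the scaled forward algorithm (avoids numerical underflow)."""
    alpha = start * emit[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        loglik += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return loglik

def aic(loglik, n_states, n_symbols):
    """AIC = 2k - 2*loglik, where k counts the free HMM parameters:
    (HS-1) start + HS*(HS-1) transition + HS*(M-1) emission probabilities."""
    k = (n_states - 1) + n_states * (n_states - 1) + n_states * (n_symbols - 1)
    return 2 * k - 2 * loglik

# Toy 2-state model (values are illustrative only).
start = np.array([0.8, 0.2])
trans = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
emit = np.full((2, len(TYPES)), 1 / len(TYPES))  # uniform emissions
seq = [TYPES.index(t) for t in ["N", "MS", "LI", "LI", "F"]]
print(forward_loglik(seq, start, trans, emit))  # 5 * log(1/6), about -8.96
```

In practice one would fit models over a range of state counts (e.g. with Baum-Welch training) and keep the HS value minimizing AIC, which is the selection procedure described in Section 4.1.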
4 Evaluation & Results
We randomly selected a training dataset of 38 surveys containing 1,048 questions from Survey Sidekick. For the test dataset, we recruited 22 subjects and randomly assigned them to control and treatment groups to design a survey for the same scenario [5]; the control group had no access to learning resources, while the treatment group did. All usage logs were collected, including survey content (questions, question types, survey layouts, survey administration) and learning-resource usage (modules, sub-modules, and access points: static list view or dynamic box view).

4.1 Survey Structure Analysis
The first step in using an HMM is to determine the number of hidden states, i.e. the model selection problem. A complex model with a large number of states increases the sequence likelihood, since more parameters are available to describe the data precisely; the tradeoff is a high risk of over-fitting. We chose seven hidden states (HS = 7) because this setting performed best under the Akaike Information Criterion (AIC). The emission probabilities of each hidden state and the transition probabilities are shown in Table 1, in which probabilities under 0.05 were removed for clarity of presentation. The hidden states can be thought of as the underlying "tactics" or "strategies" surveyors use to design their surveys. For example, in HS2 the designers focused on collecting information with Likert questions, while in HS5 the designers tend to collect data using either Likert questions or multiple-choice questions. However, some hidden states (e.g. HS1) also have high probabilities of generating both Likert and free-text questions, which suggests that designers make alternative choices for collecting the same type of information.

Table 1. The hidden states of survey structure: emission probabilities (left) and transitions among the hidden states (right).
4.1.1 Simple first principle
HS7 has a high prior (start) probability, which means that surveys usually begin in HS7, asking a numeric question or a simple multiple-choice question (e.g. a demographic question such as "What is your date of birth?"). HS2 also has a reasonable start probability; it emits Likert or short-answer questions. Moreover, the prior probabilities indicate that the complex question type (free text) is less likely to appear as a survey opener (HS1 & HS3). This result aligns with the literature on designing survey openings: Iarossi [7] suggests beginning the survey with simple questions.

4.1.2 Structure coherence principle
The probability in each diagonal cell of the transition matrix is the highest in its row, which reveals an interesting fact about the survey design process: questions of the same type tend to be placed close together, indicating consistency within sub-sections. One of the biggest benefits of designing a survey this way is that structural coherence may help designers reduce the cognitive load caused by switching between different question types. This finding is again supported by a design principle proposed by Iarossi [7], finishing one topic before raising a new one. That principle focuses on content consistency, but the HMM results strongly suggest that consistency also holds at the structural level. In addition, maintaining structural consistency appears to be more manageable when the survey involves skip logic or detailed questions.

4.1.3 Gradual difficulty increase principle
We also observed several inter-state transitions: HS6→HS3, HS6→HS7, HS7→HS6, and HS3→HS4. Take HS7→HS6 for instance: after raising the opening questions (HS7), designers may continue with simple short-answer questions or more difficult multiple-choice questions. To give a concrete example, after demographic questions (usually numeric) or skip-logic questions (usually multiple choice with a single correct answer) are asked, a more difficult multiple-choice or free-text question is likely to follow, soliciting more in-depth information from survey takers. If instead short-answer questions were asked, the next step will either stay in the same state (self-transition), go back to another round of simple questions (HS6→HS7), or move to even more difficult questions, e.g. open-ended questions (HS6→HS3). The transition HS3→HS4 (free-text question to multiple-choice question) also suggests that designers tend to follow open-ended questions with further in-depth questions. In addition, we found that HS1, HS2, and HS5 are less likely to transition to other hidden states; their corresponding question types, such as Likert and free-text questions, tend to appear at the very end of the survey.
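The three structural principles above can be read directly off a fitted HMM's parameters: start probabilities for simple-first, diagonal dominance of the transition matrix for coherence, and the strongest off-diagonal transitions for difficulty progression. A minimal sketch, using small illustrative matrices rather than the paper's fitted values:

```python
import numpy as np

# Illustrative 3-state parameters; the paper's fitted model has HS = 7.
start = np.array([0.70, 0.25, 0.05])    # pi_i: start probabilities
trans = np.array([[0.80, 0.15, 0.05],   # a_ij: transition matrix
                  [0.09, 0.75, 0.16],
                  [0.05, 0.10, 0.85]])

# Simple-first: states with high start probability are the usual openers.
openers = np.argsort(start)[::-1]

# Structure coherence: is each row's diagonal entry its maximum,
# i.e. do designers tend to stay in the same tactic?
coherent = all(trans[i, i] == trans[i].max() for i in range(len(trans)))

# Gradual difficulty increase: the strongest off-diagonal transition
# shows where designers most often move between tactics.
off_diag = trans - np.diag(np.diag(trans))
i, j = map(int, np.unravel_index(off_diag.argmax(), off_diag.shape))
print(openers[0], coherent, (i, j))  # 0 True (1, 2)
```

The same inspection applied to the fitted 7-state model would surface the HS7→HS6 and HS6→HS3 transitions discussed above.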
4.2 Effects of Learning Resources
To evaluate how learning resources affect survey structure during prototyping, we examined the topics users accessed, survey question types, question text, and survey layout edits and moves. We found that, on average, each user in the learning-resources-enabled group studied 5.18 topics, and 57 topics in total. They made significantly more moves, or structural edits, (p