Multiple-Choice Questioning Is an Efficient Instructional Methodology That May Be Widely Implemented in Academic Courses to Improve Exam Performance
Current Directions in Psychological Science, 22(6), 471–477. © The Author(s) 2013. DOI: 10.1177/0963721413495870
Arnold L. Glass and Neha Sinha, Rutgers University
Abstract

Distributed multiple-choice questioning during instruction improves exam performance in middle-school and college classes. Instructional multiple-choice questioning is effective whether done in class or online and improves performance on short-answer and free-recall exam questions as well as multiple-choice questions. It may improve performance on exam questions about facts not previously queried.

Keywords: testing effect, distributed study, long-term retention, instruction

The entire purpose of pedagogy is the learning and retention of knowledge or a skill; therefore, any factor of a study task that influences long-term retention is clearly relevant. In laboratory studies, answering a question about the study material increased long-term retention compared with repeated listening or reading (Carpenter, 2012; Carpenter, Pashler, Wixted, & Vul, 2008; Marsh, Roediger, Bjork, & Bjork, 2007; McDaniel, Roediger, & McDermott, 2007). Furthermore, when a question was repeated during study, doing so after an interval increased the effect of questioning on learning and retention compared with massed questioning (Carpenter, Cepeda, Rohrer, Kang, & Pashler, 2012).

Taken together, the results of laboratory studies suggest that distributed questioning would be an effective instructional method if integrated into the curricula of actual academic courses. However, these studies fell short of demonstrating this (Roediger & Marsh, 2005) for two reasons. First, the retention intervals used in laboratory studies, ranging from less than a day to a week, are much shorter than the retention intervals, measured in weeks or months, that are relevant to academic performance (Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006). Effects that are obtained at shorter retention intervals cannot be assumed to generalize to longer intervals. Second, during laboratory experiments, students had no opportunity to review the study materials during the
retention interval. However, in actual academic courses, students are encouraged to review study materials (i.e., study for the exam) during the retention interval, and almost all of them certainly do so. It is possible that the effect of distributed questioning observed in a laboratory experiment, in which studying during the retention interval is impossible, will be reduced in size or even obliterated by the general improvement in exam performance that results from the studying done for an exam in an actual course. Therefore, the only way to demonstrate that distributed questioning is an effective instructional method in an actual academic course is to demonstrate an effect of distributed questioning on exam performance in an actual academic course.

Fortunately, advances in technology made it possible to develop a method for studying the effect of distributed questioning in an academic course: the course-embedded experimental paradigm. In this paradigm, a within-student, within-item, counterbalanced experimental design is embedded within a multisection academic course. The course materials and exams are also the experimental treatments and measures. By
introducing personal response systems (clickers) into the classroom, it was possible to monitor academic performance in much more detail than previously possible. By having students gain access to and answer homework assignments online, it was possible to monitor homework performance in more detail as well.

Nine recent experimental studies that made use of this method have been published. Three of these studies were of middle-school courses (McDaniel, Agarwal, Huelser, McDermott, & Roediger, 2011; McDaniel, Thomas, Agarwal, McDermott, & Roediger, 2013; Roediger, Agarwal, McDaniel, & McDermott, 2011), four were of college courses (Glass, 2009; Glass, Brill, & Ingate, 2008; McDaniel, Wildman, & Anderson, 2012; Shapiro & Gordon, 2012), and two were of training programs for medical residents (Larsen, Butler, Lawson, & Roediger, 2013; Larsen, Butler, & Roediger, 2009).

All of the studies except those of Larsen et al. (2009) made use of variants of the same experimental procedure. Either the same multiple-choice exam question was asked up to three times before the exam or a related question, the answer for which was implied by the same fact statement as the exam question, was asked up to three times before the exam. In the control condition, the answer to the multiple-choice question was presented by itself, in statement form, at the same points in the lesson that the question was asked in the experimental condition. Throughout the course, prelesson questions were presented at the beginning of a class presenting the lesson containing the fact statement tested by the question. Postlesson questions were presented at the end of the lesson, shortly after the fact statements they tested. Review questions were presented 1 or more days after the class containing the lesson. Immediate feedback as to the correct answer always followed prelesson, postlesson, and review questions. Furthermore, each exam question was presented on a monthly unit exam and, in most experiments, again on the end-of-semester final exam.

Most important, all the studies used counterbalanced within-student, within-question experimental designs, embedded within multisection courses, in which the experimental factor was the effect of the number of related questions encountered before the exam on performance on the exam question. Despite the variety in the subject matter and in the student populations, the effects across studies were similar enough that the means used in the figures are representative of all results of all experiments.

Figure 1 provides a bridge from the laboratory to the classroom. It shows that repeating a multiple-choice question during the semester increases performance on the question on a final exam. Figure 2 shows that multiple-choice questioning also improves exam performance on a novel question, the answer to which is implied by the same fact statement as the repeated question.
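To make the counterbalanced, within-student, within-item logic just described concrete, the sketch below shows one way that exam questions could be rotated through questioning conditions across the sections of a multisection course, so that each question appears in each condition and each student answers questions in every condition. This is only an illustration under simplified, assumed condition names; it is not the assignment procedure or the materials used in any of the studies cited above.

```python
from itertools import cycle

# Illustrative questioning conditions (a simplification; the cited studies
# combined prelesson, postlesson, and review questions in various ways).
CONDITIONS = [
    "control (statement only)",
    "prelesson",
    "postlesson",
    "prelesson + postlesson",
]

def assign_conditions(question_ids, n_sections):
    """Rotate each exam question through the conditions across course sections.

    With n_sections equal to (or a multiple of) the number of conditions,
    every question appears in every condition across sections, and every
    student answers questions from every condition (a within-student,
    within-item, counterbalanced design).
    """
    assignment = {}
    for section in range(n_sections):
        shift = section % len(CONDITIONS)
        # Shift the condition sequence by one position per section,
        # producing a Latin-square-style counterbalancing.
        shifted = cycle(CONDITIONS[shift:] + CONDITIONS[:shift])
        assignment[section] = {q: next(shifted) for q in question_ids}
    return assignment

if __name__ == "__main__":
    plan = assign_conditions(question_ids=[f"Q{i}" for i in range(1, 9)], n_sections=4)
    for section, mapping in plan.items():
        print(f"Section {section}: {mapping}")
```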
Fig. 1. The improvement in exam performance as the result of a review test 1 or 2 days before a unit exam, for the unit exam and the final exam (McDaniel, Agarwal, Huelser, McDermott, & Roediger, 2011, Experiments 1 & 2B; Roediger, Agarwal, McDaniel, & McDermott, 2011, Experiments 1 & 2). Bars indicate performance when no review test was given and the increase in performance when a review test was given.
A Previous Question Increases Short-Term and Long-Term Exam Performance on the Repeated Questions

It is unsurprising that asking some of the questions that will appear on the exam on the day before, with feedback, improves performance on those questions on the exam (Figure 1). However, more interesting is the long-term effect of a review 1 or 2 days before a unit exam when those same questions are repeated 1 to 3 months later on the final exam. The purpose of academic instruction is not a short-term improvement on a unit exam but the imparting of long-term knowledge, which may be measured by performance on an end-of-semester final exam. As shown in Figure 1, the review’s short-term effect on unit-exam performance persists when the same questions are repeated on the end-of-semester final exam 1 to 3 months later (McDaniel et al., 2011; Roediger et al., 2011).

Two aspects of the results shown in Figure 1 are important. With regard to long-term retention on the final exam, a significantly greater increase in performance occurs after two or more prior presentations of the question compared with one prior presentation of the question. In Figure 1, this is the difference between final-exam questions answered previously only on the unit exam versus final-exam questions asked previously on both the
unit exam and a preceding review test. Furthermore, the increase in performance caused by two versus one prior presentation falls in the range of a medium to large effect, as shown in Table 1. In terms of the usual academic grading system, the increase is approximately a letter grade.

Fig. 2. The improvement in performance as a function of the number of question presentations (first through fourth) when prelesson quizzes, postlesson quizzes, or both were presented over the 3- to 4-week interval before a unit exam (McDaniel, Agarwal, Huelser, McDermott, & Roediger, 2011, Experiment 2B; Glass, Brill, & Ingate, 2008).
Questioning Distributed Over 3 to 4 Weeks of Instruction Increases Long-Term Exam Performance on Both the Repeated Questions and Novel Related Questions

Figure 2 shows the results when prelesson and postlesson but no review questions have been presented before the unit exam. A unit-exam question was preceded by two (prelesson and postlesson), one (prelesson or postlesson), or no previously answered questions. The questions used as prelesson and postlesson questions were either identical to or related to the unit-exam questions. However, this factor had no effect and so is not shown in the figure. The interval between the previous question and the unit exam ranged from 2 to 30 days, with a mean of approximately 14 days (Glass et al., 2008; McDaniel et al., 2011). The questions on the unit exam were repeated on the final exam.

McDaniel, Thomas, et al. (2013) and Shapiro and Gordon (2012) also found that preexam questions improved performance on related exam questions for which the answers were implied by the same fact statement. The point of education is not verbatim retention of questions and answers but retention of the underlying fact statements that imply the answers. The effect on
related exam questions is the critical finding that demonstrates the academic value of distributed questioning as an instructional method.

Table 1. Effect Sizes of Distributed Questioning for All Conditions in Three Middle-School or College Class-Embedded Studies

Condition | Unit exam (d) | Final exam (d)
McDaniel, Agarwal, Huelser, McDermott, and Roediger (2011), Experiment 1
  Prelesson, postlesson, and review | 1.60 | 0.56
McDaniel et al. (2011), Experiment 2A
  Prelesson only | 0.24 |
  Postlesson only | 0.50 |
  Prelesson and postlesson | 0.48 |
  Review only | 1.15 |
  Prelesson and review | 1.10 |
  Postlesson and review | 1.30 |
  Prelesson, postlesson, and review | 1.50 |
McDaniel et al. (2011), Experiment 2B
  Prelesson only | 0.23 | 0.30
  Postlesson only | 0.65 | 0.30
  Prelesson and postlesson | 0.68 | 0.70
  Review only | 0.82 | 0.62
  Prelesson and review | 1.00 | 0.50
  Postlesson and review | 1.00 | 0.70
  Prelesson, postlesson, and review | 1.30 | 0.90
Roediger, Agarwal, McDaniel, and McDermott (2011), Experiment 1
  Prelesson, postlesson, and review | 1.80 | 0.73
Roediger et al. (2011), Experiment 2
  Prelesson, postlesson, and review | 1.00 | 0.26
Glass, Brill, and Ingate (2008)
  Prelesson, identical exam question | 0.40 | 0.18
  Postlesson, identical exam question | 0.41 | 0.30
  Pre- and postlesson, identical exam question | 0.89 | 0.64
Glass et al. (2008)
  Prelesson, related exam question | 0.30 | 0.20
  Postlesson, related exam question | 0.19 | 0.06
  Pre- and postlesson, related exam question | 0.71 | 0.60

Note: The effect size (Cohen's d) compares the difference between the percentage of correct answers on the exams for questions not presented before the exam and the percentage of correct answers on the exams for questions presented one or more times before the exam. An effect size of 0.2 to 0.3 is a small effect, 0.4 to 0.7 is a medium effect, and above 0.8 is a large effect. Blank cells indicate conditions for which no final-exam effect size is available.
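The note to Table 1 can be made concrete with a short computation. The sketch below shows a generic pooled-standard-deviation version of Cohen's d applied to hypothetical per-student percentage-correct scores; both the numbers and the particular formula are our illustration, not the computations reported in the cited studies, which may have used within-subject variants.

```python
from statistics import mean, stdev

def cohens_d(scores_tested, scores_untested):
    """Cohen's d for the difference in percentage correct between questions
    presented before the exam and questions not presented before the exam.
    Uses a pooled standard deviation; this is an illustrative formula only."""
    m1, m2 = mean(scores_tested), mean(scores_untested)
    s1, s2 = stdev(scores_tested), stdev(scores_untested)
    n1, n2 = len(scores_tested), len(scores_untested)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical per-student percentage-correct scores.
tested = [82, 75, 90, 68, 85, 79, 88, 73]
untested = [70, 62, 78, 55, 74, 66, 71, 60]
print(round(cohens_d(tested, untested), 2))
```

If, for example, the standard deviation of exam scores were around 12 percentage points, a d of about 0.8 would correspond to a gain of roughly 10 points, consistent with the letter-grade interpretation offered in the text.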
Varied Questioning Increases Generalization to a Related Question

Most theories of memory predict that the degree of generalization to a novel related question should be a positive function of the number of different related questions asked. Glass (2009) tested this prediction by comparing the effects of repeating a question versus asking different related questions. In the same condition, the same question was repeated as a prelesson, a postlesson, and a review question asked 1 week after the postlesson question. In the varied condition, three different related questions were used as the prelesson, postlesson, and review questions. A fourth related question that did not appear as a prelesson, postlesson, or review question in either condition appeared on a unit exam and the final exam.

In Figure 3, bars in the same color represent results for the same questions. In the same condition, performance was best on the review question. In contrast, in the varied condition, performance declined on the review question. However, this proved to be a desirable difficulty (Bjork, 1994), because it produced generalization, which led to better performance on the novel exam questions in the varied condition. These findings raise the possibility that when generalization to semantically related items is considered, the monotonically increasing function characterizing verbatim learning does not apply. Instead, by systematically
varying the similarity among related questions, higher error rates for related questions in early quizzes may be a necessary price for lower error rates on similar items on later exams.

Fig. 3. The effects of same condition and varied condition questioning on performance on related but novel exam questions (Glass, 2009). For each pair of bars, the same condition is shown on the left, and the varied condition is shown on the right. The left prelesson, right prelesson, left postlesson, and left review bars are all white because the same questions were asked in all of these conditions. The left unit exam, right unit exam, left final exam, and right final exam bars are all black because the same questions were asked in all of these conditions. The right postlesson bar is a unique shade of light gray because these questions were only asked in this condition. The right review bar is a unique shade of dark gray because these questions were only asked in this condition.
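The two questioning schedules compared by Glass (2009) can also be summarized as simple data structures. In the sketch below, the question labels are hypothetical placeholders for four related questions that are all answerable from the same underlying fact statement; only the structure of the schedules follows the description above.

```python
# Hypothetical labels for four related questions, all answerable from the
# same underlying fact statement (placeholders, not actual course items).
RELATED_QUESTIONS = ["fact1_qA", "fact1_qB", "fact1_qC", "fact1_qD"]

# Same condition: one question is repeated at every pre-exam opportunity.
same_condition = {
    "prelesson": "fact1_qA",
    "postlesson": "fact1_qA",
    "review": "fact1_qA",      # asked 1 week after the postlesson question
    "unit_exam": "fact1_qD",   # a novel related question
    "final_exam": "fact1_qD",
}

# Varied condition: three different related questions precede the exam.
varied_condition = {
    "prelesson": "fact1_qA",
    "postlesson": "fact1_qB",
    "review": "fact1_qC",
    "unit_exam": "fact1_qD",   # the same novel related question as above
    "final_exam": "fact1_qD",
}

for phase in ["prelesson", "postlesson", "review", "unit_exam", "final_exam"]:
    print(f"{phase:12s} same: {same_condition[phase]:10s} varied: {varied_condition[phase]}")
```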
Distributed Questioning With Multiple-Choice Questions Increases Exam Performance on Related Short-Answer Questions

Multiple-choice questions improve performance not only on subsequent multiple-choice questions but also on short-answer questions. Two classroom studies examined the benefits of incorporating short-answer questions into
instruction. Larsen et al. (2009) used short-answer questions and demonstrated that repeated short-answer testing improves short-answer performance. In an online course, McDaniel et al. (2012) found effects of the same size for repeated short-answer versus repeated multiple-choice questions about the same content. Furthermore, Roediger et al. (2011) found that multiple-choice quizzes improved performance on both related short-answer questions and free recall of the facts tested.

That multiple-choice questions improve performance on short-answer questions is not entirely surprising, because contemporary theories of recognition have converged on the hypothesis that recognition and recall are the result of the same retrieval processes (Diana, Reder,
Arndt, & Park, 2006; Yonelinas, 2002). Therefore, both multiple-choice and short-answer questions are cued recall tasks. Consistent with the recollection hypothesis, laboratory research by Little, Bjork, Bjork, and Angello (2012) found that well-constructed multiple-choice questions resulted in the recollection of considerable semantic information rather than just specific answers.
Conclusion

Distributed multiple-choice questioning has been demonstrated to be an effective and efficient instructional method for increasing exam performance for a variety of student populations and topics. Furthermore, with compliant students, minimal class time is required for instructional questioning because the questions can be asked as homework. Glass (2009), Glass et al. (2008), and Roediger et al. (2011) integrated online prelesson questions, review questions, or both into a classroom course, and McDaniel et al. (2012) integrated the questions into an entirely online course. The low cost and ease of implementation, combined with the consistency of the results, provide ample justification for the widespread, immediate adoption of distributed questioning as an instructional method.

Glass, Ingate, and Sinha (2013) found that 4 months after the end of the course (hence, after the final exam), performance was better on questions related to those asked on
the final exam than on questions related to those that appeared only on unit exams. What is tested may be the most important determinant of the long-term retention of academic material. We are at the very beginning of an era in which surprising new findings will transform our understanding of how students learn.

A variety of theories have been proposed to explain why distributed questioning is effective. On the one hand, Glass (2009) explains the effect from the perspective of a dual-system description of mammalian memory (Yin & Knowlton, 2006) in which the memory system is designed to forget idiosyncratic events but retain routine events. Consequently, the distributed repetition of the question is assumed to be a causal feature of its long-term retention, regardless of context. Consistent with this prediction, the effect is found for both classroom and online lessons (Glass, 2009; Glass et al., 2008; McDaniel et al., 2012; Roediger et al., 2011).

On the other hand, Mayer et al. (2009) describe the effect of distributed questioning as an example of the hypothesis that social activity stimulated by clicker responses is more memorable than passive listening and solitary writing. Consistent with this prediction, Mayer et al. found that students who used clickers in class ultimately performed better on exams than students who wrote their answers on paper. In addition, the social activity is supposed to raise the memorability of the entire lesson, not just the memorability of the questions and their answers. Consistent with this prediction, Mayer
et al. found that students who answered classroom questions with clickers, compared with those who did not, performed better on exam questions about both nonqueried and queried facts from the lesson. However, Shapiro (2009) did not replicate this finding.

It is likely that integration of questioning into classroom lessons has multiple effects. It is fortunate for integrated questioning that all of the hypothesized effects are presumed to improve exam performance, and all of the predictions that have been tested have been confirmed. This multiplicity of effects may be the factor responsible for the wide effectiveness of distributed questioning. Some effects are specific to verbal questions, whereas other effects are general to other forms of testing (Larsen et al., 2013).

Recommended Reading

Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). New York, NY: Worth. An insightful article on instruction, including the role played by testing.

Carpenter, S. K., Cepeda, N. J., Rohrer, D., Kang, S. H. K., & Pashler, H. (2012). (See References). A review of recent research on spacing.

Glass, A. L., & Sinha, N. (2013). Providing the answers does improve performance on a college final exam. Educational Psychology, 32, 1–32. Related research on academic testing.

Roediger, H. L., III, Agarwal, P. K., Kang, S. H. K., & Marsh, E. J. (2010). Benefits of testing memory: Best practices and boundary conditions. In G. M. Davies & D. B. Wright (Eds.), Current issues in applied research (pp. 13–49). New York, NY: Psychology Press. A more detailed review of the benefits of instructional testing.
Declaration of Conflicting Interests The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
References

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press.

Carpenter, S. K. (2012). Testing enhances the transfer of learning. Current Directions in Psychological Science, 21, 279–283. doi:10.1177/0963721412452728

Carpenter, S. K., Cepeda, N. J., Rohrer, D., Kang, S. H. K., & Pashler, H. (2012). Using spacing to enhance diverse forms of learning: Review of recent research and implications for instruction. Educational Psychology Review, 24, 369–378. doi:10.1007/s10648-012-9205-z
Carpenter, S. K., Pashler, H., Wixted, J. T., & Vul, E. (2008). The effects of tests on learning and forgetting. Memory & Cognition, 36, 438–448. doi:10.3758/MC.36.2.438

Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380. doi:10.1037/0033-2909.132.3.354

Diana, R., Reder, L. M., Arndt, J., & Park, H. (2006). Models of recognition: A review of arguments in favor of a dual process account. Psychonomic Bulletin & Review, 13, 1–21.

Glass, A. L. (2009). The effect of distributed questioning with varied examples on exam performance on inference questions. Educational Psychology, 29, 831–848. doi:10.1080/01443410903310674

Glass, A. L., Brill, G., & Ingate, M. (2008). Combined online and in-class pretesting improves exam performance in general psychology. Educational Psychology, 28, 483–504. doi:10.1080/01443410701777280

Glass, A. L., Ingate, M., & Sinha, N. (2013). The effect of a final exam on long-term retention. Journal of General Psychology, 140, 224–241.

Larsen, D. P., Butler, A. C., Lawson, A. L., & Roediger, H. L. (2013). The importance of seeing the patient: Test-enhanced learning with standardized patients and written tests improves clinical application of knowledge. Advances in Health Sciences Education, 18, 409–425. doi:10.1007/s10459-012-9379-7

Larsen, D. P., Butler, A. C., & Roediger, H. L. (2009). Repeated testing improves long-term retention relative to repeated study: A randomized controlled trial. Medical Education, 43, 1174–1181. doi:10.1111/j.1365-2923.2009.03518.x

Little, J. L., Bjork, E. L., Bjork, R. A., & Angello, G. (2012). Multiple-choice tests exonerated, at least of some charges: Fostering test-induced learning and avoiding test-induced forgetting. Psychological Science, 23, 1337–1344. doi:10.1177/0956797612443370

Marsh, E. J., Roediger, H. L., III, Bjork, R. A., & Bjork, E. L. (2007). The memorial consequences of multiple-choice testing. Psychonomic Bulletin & Review, 14, 194–199.

Mayer, R. E., Stull, A., DeLeeuw, K., Almeroth, K., Bimber, B., Chun, D., . . . Zhang, H. (2009). Clickers in college classrooms: Fostering learning with questioning methods in large lecture classes. Contemporary Educational Psychology, 34, 51–57. doi:10.1016/j.cedpsych.2008.04.002

McDaniel, M. A., Agarwal, P. K., Huelser, B. J., McDermott, K. B., & Roediger, H. L., III. (2011). Test-enhanced learning in a middle school science classroom: The effects of quiz frequency and placement. Journal of Educational Psychology, 103, 399–414. doi:10.1037/a0021782

McDaniel, M. A., Roediger, H. L., III, & McDermott, K. B. (2007). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin & Review, 14, 200–206.

McDaniel, M. A., Thomas, R. C., Agarwal, P. K., McDermott, K. B., & Roediger, H. L., III. (2013). Quizzing in middle-school science: Successful transfer performance on classroom exams. Applied Cognitive Psychology, 27, 360–372. doi:10.1002/acp.2914
McDaniel, M. A., Wildman, K. M., & Anderson, J. L. (2012). Using quizzes to enhance summative-assessment performance in a Web-based class: An experimental study. Journal of Applied Research in Memory and Cognition, 1, 18–26. doi:10.1016/j.jarmac.2011.10.001

Roediger, H. L., III, Agarwal, P. K., McDaniel, M. A., & McDermott, K. B. (2011). Test-enhanced learning in the classroom: Long-term improvements from quizzing. Journal of Experimental Psychology: Applied, 17, 382–395. doi:10.1037/a0026252

Roediger, H. L., III, & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155–1159. doi:10.1037/0278-7393.31.5.1155
Shapiro, A. M. (2009). An empirical study of personal response technology for improving attendance and learning in a large class. Journal of the Scholarship of Teaching and Learning, 9, 13–26.

Shapiro, A. M., & Gordon, L. T. (2012). A controlled study of clicker-assisted memory enhancement in college classrooms. Applied Cognitive Psychology, 26, 635–643. doi:10.1002/acp.2843

Yin, H. H., & Knowlton, B. J. (2006). The role of the basal ganglia in habit formation. Nature Reviews Neuroscience, 7, 464–476.

Yonelinas, A. P. (2002). The nature of recollection and familiarity: A review of 30 years of research. Journal of Memory and Language, 46, 441–517.