Metacognition and Learning
DOI 10.1007/s11409-015-9142-6

Improving metacognition in the classroom through instruction, training, and feedback

Aimee A. Callender 1 & Ana M. Franco-Watkins 1 & Andrew S. Roberts 1

Received: 19 February 2014 / Accepted: 29 April 2015
© Springer Science+Business Media New York 2015

* Aimee A. Callender
[email protected]

1 Department of Psychology, Auburn University, Auburn, AL 36849, USA

Abstract Accurately judging one’s performance in the classroom can be challenging considering most students tend to be overconfident and overestimate their actual performance. The current work draws upon the metacognition and decision making literatures to examine improving metacognition in the classroom. Using historical data from several semesters of an upper-level undergraduate course (N=127), we analyzed students’ judgments of their performance and their actual performance for two exams. Students were instructed on the concepts of overconfidence, received feedback on exams, and were given incentives for accurate calibration. We found results consistent with the "unskilled and unaware" effect (Kruger & Dunning, Journal of Personality and Social Psychology, 77(6), 1121–1134, 1999), where lower performing students initially displayed overconfidence and the highest performing students initially displayed underconfidence. Importantly, students were able to change both judgments and performance such that metacognitive accuracy improved significantly from the first to the second exam. In a second study, two additional semesters of the same course used in Study 1 were examined (N=90). For one of the semesters, feedback was not provided, allowing us to determine whether feedback can improve both metacognitive judgments and performance. Our findings revealed significant improvements in performance paired with decreases in overconfidence on Exam 2, but only for students who received feedback about their performance and judgments. We postulate that feedback may be an important component in improving metacognitive judgments.

Keywords Overconfidence · Underconfidence · Metacognition · Performance · Students

Overconfidence is a robust phenomenon affecting individuals across many domains and different levels of experience. Because most people tend to be overconfident in their judgments (Fischhoff et al. 1977; Koriat and Goldsmith 1996; Lichtenstein et al. 1982; Ludwig and
Nafziger 2011), accurately judging one’s performance or knowledge can be challenging. For example, when lawyers are asked to judge the outcome of a trial, they exhibit overconfidence (Goodman-Delahunty et al. 2010), even those who have been practicing for up to 40 years. The pervasiveness of overconfidence is observed in numerous situations including entrepreneurs starting new businesses (Koellinger et al. 2007), drivers’ competency estimates (Delhomme 1991), and students’ self-evaluations of their performance (Dunlosky and Rawson 2012). In these cases, overconfidence can be problematic, leading to poor decisions such as making inappropriate investment allocations, engaging in dangerous behaviors when driving, or not allocating study time appropriately.

The ability to accurately monitor and judge performance is not easily accomplished, and the discrepancy between judgments and performance has been of interest both to metacognitive researchers interested in student learning and performance (e.g., Hacker et al. 2008b; Miller and Geraci 2011a, b) and to decision making researchers examining probability judgments and accuracy (e.g., Brenner et al. 1996; Fischhoff et al. 1977; Keren 1991). Although researchers in these two domains examine different content, they have a common goal: furthering the understanding of the underlying mechanisms involved in metacognitive monitoring and reducing the discrepancies between judgments and performance.

Metacognition is particularly important in the classroom as knowledge about one’s own learning affects future study choices and learning (Metcalfe 2009; Rawson and Dunlosky 2007). Psychology courses, particularly those in the areas of judgment and decision making and cognition, are unique in that metacognition is not only required for gauging learning in the course but is also a topic of instruction. Thus, we examined how instruction on concepts related to confidence and metacognition, and incentives for metacognitive accuracy, may influence students’ future test performance. We also investigated the role of feedback in the classroom. However, we must caution that feedback was a naturally occurring variable and was not experimentally manipulated (one class received feedback and one class did not), so we cannot draw causal conclusions about feedback.

Regardless of the level of actual performance, successful metacognition entails making an accurate judgment of performance. Due to the amount of material students are expected to learn, it is important that students accurately monitor their knowledge while studying in order to regulate their study choices efficiently (Rawson and Dunlosky 2007). This monitoring should theoretically result in better performance; however, judgments of test performance are often inaccurate, and the tendency is to overestimate actual performance (Finn and Metcalfe 2007; Metcalfe 2009; Miller and Geraci 2011a, b), resulting in poor calibration (the association between judged and actual performance; Lichtenstein et al. 1982). For example, a student who scores 75 % on an exam may inaccurately judge that they answered 85 % of the questions correctly, thus demonstrating overconfidence and poor calibration. The student may be able to improve calibration on the next exam, even if they earn the same grade, by lowering their judgment of performance to match actual performance, 75 %. In this context, closely monitoring performance and changing judgments accordingly is necessary to achieve accurate calibration.
When a discrepancy between actual performance and the judgment occurs, an adjustment to performance, the judgment, or both, must occur to maintain accurate calibration; however, improvements in calibration can occur independently of changes in performance. Although performance on subsequent exams may change (and often both performance and judgments change), improvements in calibration are not predicated on improvements in performance per se. Therefore, it is important to understand when metacognitive changes occur and whether these changes are due to changes in performance, judgments of performance, or both.


Although the emphasis is often on judgments exceeding actual performance, failures in metacognition are not limited to overconfidence. Students can be underconfident about their abilities. In a classroom setting, underconfidence may result in wasting time studying already-learned information (Maki et al. 2005, 2009). Thus, understanding the conditions that can lead to both overconfidence and underconfidence is integral to improving metacognition.

In the decision making literature, the difficulty of the task can affect both overconfidence and underconfidence (see Juslin et al. 2000 for a review). Specifically, there is a tendency to be overconfident for difficult questions and underconfident for easy questions (Lichtenstein et al. 1982), known as the hard-easy effect. Although this phenomenon is typically observed at the question level, other researchers have observed similar patterns at the test level. Students typically construct an impression of the overall difficulty or ease of an exam on a global level. Accordingly, a poorer test performer might rate the test as difficult or challenging, yet overestimate their performance. Kruger and Dunning (1999) noted that those who perform poorly are often overconfident, deeming them "unskilled and unaware". In such cases, these individuals are "doubly cursed" because not only do they fail to perform well, they are unaware that their performance is inferior (Dunning et al. 2003; Hartwig and Dunlosky 2014; Miller and Geraci 2011b). Conversely, a better test performer may tend to be underconfident because they found the test to be easier than expected or are more aware of the items they did not know and the items they guessed (Hacker et al. 2008a, b). This would result in lower levels of confidence to account for potential uncertainty about performance. Consequently, we anticipated that there would be differences in judgments of performance based on the level of test performance. To investigate these differences, we used the grade obtained on the first exam to examine changes in performance and judgments from an initial exam to a second exam. We anticipated that those who earned an A would exhibit underconfidence whereas those who earned a D or an F would exhibit overconfidence.

Training, incentives and feedback in calibration studies

Providing training on metacognition and incentives for accurate calibration may improve calibration, although the extant research is mixed. Cao and Nietfeld (2007) conducted a classroom study across an entire semester, requiring students to indicate their confidence level on several concepts learned each day. Surprisingly, even with experience making judgments and receiving feedback about performance, the students did not automatically adjust their study strategies. In a similar study, however, Nietfeld et al. (2005) found that frequent monitoring activities did lead to improvements in performance across a semester. In contrast, Bol et al. (2005) found that overt practice making predictions and postdictions on quizzes administered in a classroom setting did not sufficiently improve calibration or exam performance. Thus, it is unclear under which circumstances training in metacognition will lead to improvements in performance or metacognitive accuracy. One possibility is that the inclusion of feedback may be critical to improving performance and judgments. Renner and Renner (2001) found that question-by-question feedback improved performance and decreased confidence judgments when feedback was provided on multiple quizzes over the course of a semester. Additionally, when feedback was paired with training, calibration improved (Huff and Nietfeld 2009; Lichtenstein and Fischhoff 1980; Renner and Renner 2001). In these studies, training consisted of pairing multiple trials of confidence judgments with feedback on calibration. Prior work implemented training that involved practice but not instruction on the concepts of metacognition and metacognitive errors, as we have done in our studies. Additional training in the form of instruction on the concepts may provide general
knowledge that can be applied to making judgments. Thus, a novel aspect of the current studies is the instruction on the concepts of confidence, judgments, and overconfidence prior to eliciting judgments of test performance.

Instruction and training have been implemented in several ways. Some studies include practice making judgments and opportunities for self-reflection, such as Cao and Nietfeld (2007) and Nietfeld et al. (2005). Other studies have provided students with metacognitive strategies, such as training 5th-grade students in metacomprehension strategies, monitoring accuracy, or both (Huff and Nietfeld 2009). When the training was provided to the children, calibration improved. However, the training provided in Huff and Nietfeld’s study was very different from the training provided in the current studies. In the current studies, rather than providing students with specific ways to improve comprehension monitoring, the instructor provided instruction on the concepts of confidence and overconfidence, calibration, and the hard-easy effect. Students received training in that they practiced making confidence judgments about trivia and received feedback on their performance (via iClickers) and the accuracy of their judgments. The original goal of the instruction and training in the current studies was for students to learn the concepts, as this was done in an upper-level psychology course. The goal of the studies, then, was to determine whether learning about the concepts and receiving training on the concepts translates into improved calibration accuracy when students judge their own exam performance.

Incentives have also been shown to improve calibration in both laboratory and classroom settings. In a laboratory study, providing incentives for improving calibration improved subsequent test performance more than simply rewarding better performance (Schraw et al. 1993). Additionally, Hacker et al. (2008a) found that lower-performing students were capable of improving calibration, but only when incentives (in the form of extra credit on the exam) were provided. However, Miller and Geraci (2011a) demonstrated that incentives alone did not improve calibration; when concrete feedback was paired with incentives, lower-performing students improved their calibration accuracy from the first to the second exam. One of the key differences in the current studies compared to previous studies is that the incentives were provided based on metacognitive accuracy; more accurate judgments corresponded to more bonus points on the exam. Although Hacker et al. (2008a, b) used similar scales for incentives, they required students to be calibrated in both predictions and postdictions to earn these points. Effortful attempts to retrieve information often improve the accuracy of metacognitive judgments (Finn and Metcalfe 2007); therefore, postdictions are potentially a stronger metric of metacognitive accuracy because students will be more knowledgeable of each exam’s content and can use this information in making their judgments.

The current studies used archival data to investigate whether the combination of instruction on metacognitive concepts, training on making judgments, feedback on performance and calibration, and incentives for accurate calibration could improve metacognition in a real classroom situation. Because of the nature of the course content, all students received training via class lectures on these topics.
All students then took unit exams and made postdiction judgments of their performance immediately after taking each exam (the judgment was made on the last page of the exam, and students could go back through the exam before making it). Thus, the course allowed a real-world test of the link between the learned material and how well students applied this knowledge in a situation (an exam) that provided incentives to improve calibration.


Measuring metacognition

The association between judgments and actual performance has been calculated in a variety of ways. Two primary methods calculate calibration at a global level. One method uses the absolute difference between judged and test performance (Hacker et al. 2008a, b; Miller and Geraci 2011a), and the other method, calibration bias, takes into account the directionality of the difference between judgments and test performance (Huff and Nietfeld 2009). Using different calculations can allow for different interpretations (see Schraw et al. 2013 for a review). For example, with the absolute difference score, the same score can represent an underconfident and an overconfident person, yet psychologically, the resultant phenomena are different. Using a calibration bias score is more informative, indicating whether the learner was overconfident or underconfident. Nonetheless, neither method is able to distinguish whether changes across time are due to changes in performance, changes in judgments, or a combination of both elements changing. The possibility that both may change is important to consider and is often overlooked in the literature. Consequently, a second aim of the current studies was to determine which elements (judgment, test performance, or both) changed across time: specifically, whether calibration improved on the second exam after receiving feedback on the first exam, and whether students of differing abilities in test performance demonstrated improvements in judgments, performance, or both. The best performing students, given that their performance is already high, should show greater improvements in judgments of performance, whereas the lower performing students should benefit more from improving test performance and, to some degree, improving judgments.
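To make the two global indices concrete, the following sketch (ours, not taken from the article or the studies it cites) computes absolute accuracy and calibration bias, assuming judgments and exam scores on a 0–100 scale:

```python
# Illustrative sketch (not from the article): the two global calibration
# indices discussed above, computed from a postdiction judgment and an
# actual exam score, both on a 0-100 scale.

def absolute_accuracy(judgment: float, score: float) -> float:
    """Absolute difference between judged and actual performance.
    Smaller values mean better calibration; the sign of the error is lost."""
    return abs(judgment - score)

def calibration_bias(judgment: float, score: float) -> float:
    """Signed difference between judged and actual performance.
    Positive values indicate overconfidence, negative values underconfidence."""
    return judgment - score

# Example from the introduction: a student scores 75 but judges 85.
print(absolute_accuracy(85, 75))  # 10  -> poorly calibrated
print(calibration_bias(85, 75))   # +10 -> overconfident
```

Note that the absolute score of 10 alone cannot distinguish a student who judged 85 from one who judged 65, whereas the signed bias score can, which is the interpretive advantage described above.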

Hypotheses

Based on previous research and our study, the following hypotheses were made for Studies 1 and 2:

1. Higher performers would exhibit underconfidence whereas lower performers would exhibit overconfidence.
2. The combination of training, incentives, and feedback would lead to improvements in calibration across the semester.
3. Changes would be made to both performance and judgments. It was predicted that the types of changes would vary by ability (grade on Exam 1).

An additional hypothesis was made for Study 2:

4. Students who received in-class feedback about their performance and judgments would improve calibration more than students who did not receive feedback.

Current studies

In order to examine metacognitive changes in judgments and/or performance in a real classroom situation, we analyzed historical data from several semesters of an upper-level undergraduate decision making course. Because this was a historical data set, we were limited in some of the information that was available to us to analyze, including full demographic information and psychometric information on the exams, although we report some general
exam properties in the next section. Previous research has also reported similar data with similar limitations of incomplete demographic information and limited psychometric properties for each exam (see Miller and Geraci 2011a). Additionally, it is important to note that this was not a controlled experiment and the aim was to improve student calibration in a real classroom where students had a stake in passing the course. Because students took multiple exams, we examined changes in performance on exams as well as changes in judgments across the first two exams in the course. A unique aspect of this study was that students received instruction on concepts related to metacognition and had practice making judgments. As an application of these learned concepts, the students made judgments (postdictions) about their performance on exams taken in the course and received bonus points for being metacognitively well calibrated. Postdiction judgments were used out of practicality based on classroom instructional methods. Finally, Study 1 and Study 2 differed in that all of the students in Study 1 received feedback about performance and judgments whereas Study 2 compared two sections, one of which received feedback and one that did not receive any feedback.

Study 1

Method

Participants & design

Data represent the total number of students (N=127) enrolled in an upper-level decision making course across five different semesters. The same instructor taught the course each semester and course material was consistent across semesters. The participants were undergraduate students enrolled in the course, with approximately 32 % male and 68 % female.1 We did not have an experimental design due to the historical nature of the data; however, we analyzed the data based on exam (1 and 2), performance, and judgments (as well as differences across exams on these measures) by Exam 1 Grade Group (A, B, C, D/F). Exam, performance, and judgment scores were used as within-subject variables and Grade Group as a between-subjects variable.

1 Because this was a historical data set spanning five semesters (Fall 2009 to Fall 2011) and the classes were not taught with the objective of analyzing course performance and judgments, we only have gender as demographic data on the students; more detailed demographic information is not included in the class rosters provided to instructors.

Materials and procedure

Instruction & training

Students enrolled in an upper-level decision making course were introduced to the concepts of overconfidence, underconfidence, and the hard-easy effect as part of the course curriculum. This instruction spanned two class periods (a total of 150 min). In addition to the instruction on the concepts related to metacognition, students also received training and feedback on making accurate judgments. The training required all of the students to answer multiple choice trivia questions based on information pertaining to the university’s football team and school spirit, and the questions were classified as hard or easy to further illustrate
differences in confidence judgments based on question type. Each participant answered the questions via a classroom response system (iClicker) and also judged their confidence for each response. Feedback via the iClicker software was provided for each set of questions. The students received immediate feedback, showing them the correct answer and allowing them to compare their confidence with performance. The students were then asked to report, using iClickers, whether they were overconfident, underconfident, or well calibrated on the previous set of questions, thereby indicating their judgment-performance gap. Students were then provided with psychological explanations for the phenomena as well as information about overconfidence. Furthermore, students were made aware during this lecture that they would be asked to judge their performance on the first exam, and that bonus points could be earned based on the degree to which their judgment mapped onto performance. Students used iClickers throughout the course (almost every class period, except one or two sessions where no iClicker responses were recorded) to answer questions about decisions and choices and to complete short quizzes on course content; however, only in the overconfidence-calibration lectures did students judge their performance accuracy and examine the judgment-performance gap.

Judgments, incentives, & feedback

Students were administered an exam consisting of multiple choice, matching, and short-answer questions approximately 1 to 2 weeks after training. After completing the exam, but prior to turning it in to the instructor, students were given the following instructions on the last page of the exam: Rate your performance (from 0 to 100) on this exam (not including the bonus) __________. Additionally, on the last page of the exam, the importance of good calibration was made explicit to the students by delineating the bonus incentive as follows:

If you are 0–2 points from your exam score, you will receive an additional 5 points
If you are 2.5–4 points from your exam score, you will receive an additional 4 points
If you are 4.5–7.5 points from your exam score, you will receive an additional 3 points
If you are 8–10.5 points from your exam score, you will receive an additional 2 points
If you are 11–15 points from your exam score, you will receive an additional 1 point
If you are off by more than 15 points from your exam score, you will receive an additional 0 points

Two days later, the instructor returned the graded exam to students. Before going over the exam in detail with the students, the instructor presented a calibration scatterplot demonstrating the class distribution for judgments and performance on Exam 1. Thus, students could view class calibration before receiving their exam back for review. Next, the instructor and students went over each exam question to provide feedback on performance, for a total of 30 min of the class period. Lastly, students were asked to closely examine how well their judgments matched their performance and to consider ways to improve either judgment or performance for the next exam. Students were reminded that they would have an opportunity to make a similar judgment to earn bonus points on the next exam in the course. An identical procedure was followed for Exam 2.
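For concreteness, the sketch below shows one way the bonus schedule above could be expressed in code. This is our illustration, not part of the course materials; the function name is an assumption, and values that fall in the small gaps between the published bands (e.g., a discrepancy of 2.3) are assumed to round down to the next band.

```python
# Illustrative sketch of the bonus schedule described above (our own code,
# not taken from the course materials). Judgment and score are on a 0-100
# scale; the discrepancy between them determines the bonus added to the exam.

def calibration_bonus(judgment: float, score: float) -> int:
    """Return bonus points earned for metacognitive accuracy."""
    discrepancy = abs(judgment - score)
    if discrepancy <= 2:
        return 5
    elif discrepancy <= 4:
        return 4
    elif discrepancy <= 7.5:
        return 3
    elif discrepancy <= 10.5:
        return 2
    elif discrepancy <= 15:
        return 1
    return 0

# A student who scores 78 but judges 81 is 3 points off and earns 4 bonus points.
print(calibration_bonus(81, 78))  # 4
```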

Letter grade classification for exam 1

Students were classified into four grade groups based on their first exam performance score (without the bonus). An "A" represented a score of 90 or above (n=20), a "B" represented a score of 80–89 (n=36), a "C" represented a score of 70–79 (n=36), and a "D/F" represented a score of 69 or less (n=35).

Table 1 Distribution of students per Exam 1 grade group (A, B, C, D or F) per semester for the same course

Semester       A    B    C    D or F   Total   Mean rank
Fall 2009      1    9    10   7        27      58.67
Fall 2010      6    10   8    12       36      62.33
Fall 2011      2    4    7    10       23      49.89
Spring 2010    7    8    9    4        28      74.71
Summer 2011    4    5    2    2        13      81.58
Total          20   36   36   35       127

Note that the student sample reflects students who completed the course; therefore, some students who earned a D or F on the first exam elected to drop the course and are not included in the study. Table 1 presents the distribution of students per Exam 1 letter grade (A, B, C, D/F) per semester. A Kruskal-Wallis analysis indicated a difference across semesters [χ2(4) = 10.066, p = .04], with the Fall 2009, 2010, and 2011 semesters having similar ranks contrasted with the Spring 2010 and Summer 2011 semesters. As can be noted from Table 1, the lower mean ranks for the Fall semesters reflect a larger proportion of D/F grades. The exact same exam was not delivered each semester and different students took the course each semester, so there is natural variability in exams and students (as noted in Tables 1 and 2). However, it should be noted that all comparisons for exam performance and judgments are within subject at the student level and not based on across-semester comparisons. Hence, we are most interested in each student’s calibration improvements from one exam to the other and not across semesters.
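As a rough, hypothetical illustration of the classification and the semester comparison reported above: the grade cutoffs come from the text, but the made-up scores and the choice of raw Exam 1 scores as the input to the Kruskal-Wallis test are our assumptions (the original analysis may have used a different input, such as grade ranks).

```python
# Hypothetical sketch (not the authors' analysis code) of the grade-group
# classification and a Kruskal-Wallis comparison of Exam 1 scores across
# semesters.
from scipy.stats import kruskal

def grade_group(exam1_score: float) -> str:
    """Classify an Exam 1 score (without bonus) into the four grade groups."""
    if exam1_score >= 90:
        return "A"
    elif exam1_score >= 80:
        return "B"
    elif exam1_score >= 70:
        return "C"
    return "D/F"

# Made-up Exam 1 scores for three semesters, for illustration only.
scores_by_semester = {
    "Fall 2009":   [62, 71, 78, 83, 55, 90, 74],
    "Fall 2010":   [68, 85, 91, 73, 60, 79],
    "Spring 2010": [88, 92, 81, 77, 94, 86],
}

H, p = kruskal(*scores_by_semester.values())
print(f"Kruskal-Wallis H = {H:.3f}, p = {p:.3f}")
print({s: [grade_group(x) for x in v] for s, v in scores_by_semester.items()})
```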

Results

Exam properties

Because this study is based on historical data and the exams were not created to be equivalent (although they were based in large part on the same content and the same questions), there is some variability in terms of multiple choice questions versus other items (e.g., true/false, matching, and short answer), as can be noted in Table 2.2 However, it should be noted that the purpose of this study was to examine a global assessment of exam performance (comprising all the different types of questions) as well as a global judgment of the total exam grade. We did not ask students to provide an estimate for each section of the exam. Students were made aware of the type of judgment they would make in advance of taking the exam.

We conducted two separate one-way ANOVAs to determine whether student performance differences on Exam 1 or Exam 2 existed based on course semester. There were no differences on either Exam 1, F(4, 122) = 1.85, p = .13, ηp² = 0.06, or Exam 2, F(4, 122) = 1.17, p = .33, ηp² = 0.04. When we examined the omnibus repeated measures ANOVA, where we entered Exams 1 and 2 as a repeated measures variable and course semester as the independent variable, only the main effect of exam was significant, F(1, 122) = 10.20, p

2 Because of the historical nature of the data set, item-by-item responses and some of the individual-level data on the exams’ content (multiple choice versus other content) are incomplete and therefore are not presented in the table.
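A rough sketch of the analyses described above follows. It is our illustration, not the authors' code: the long-format data layout, column names, and toy values are assumptions, and the mixed-design model assumes the pingouin package is available.

```python
# Illustrative sketch (not the authors' code): one-way ANOVA on Exam 1 scores
# by semester, and a mixed-design ANOVA with exam as a within-subject factor
# and semester as a between-subjects factor.
import pandas as pd
import pingouin as pg
from scipy.stats import f_oneway

# Assumed long format: one row per student per exam.
df = pd.DataFrame({
    "student":  [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "semester": ["Fall 2009"] * 6 + ["Spring 2010"] * 6,
    "exam":     ["Exam 1", "Exam 2"] * 6,
    "score":    [62, 70, 75, 78, 81, 85, 88, 90, 72, 80, 94, 91],
})

# One-way ANOVA: does Exam 1 performance differ by semester?
exam1 = df[df["exam"] == "Exam 1"]
groups = [g["score"].values for _, g in exam1.groupby("semester")]
print(f_oneway(*groups))

# Mixed-design (split-plot) ANOVA: exam within subjects, semester between.
print(pg.mixed_anova(data=df, dv="score", within="exam",
                     subject="student", between="semester"))
```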

Table 2 Number of points per question type for Exam 1 and Exam 2 in Study 1

Semester and exam   Multiple choice   Other items   Breakdown of other items by question type
                    total             total         Matching   True/False   Short answer
Fall 2009
  Exam 1            50                50            24         –            26
  Exam 2            50                50            10         10           30
Fall 2010
  Exam 1            60                40            10         10           20
  Exam 2            60                40            18         6            16
Fall 2011
  Exam 1            50                50            10         24           16
  Exam 2            50                50            16         14           20
Spring 2010
  Exam 1            60                40            20         –            20
  Exam 2            60                40            –          20           20
Summer 2011
  Exam 1            50                50            18         12           20
  Exam 2            60                40            18         6            16

Multiple choice, matching, and true/false items were worth 2 points each, and each short answer question had its own point allotment based on the details of the question. Students completed at least two short answer questions.

