Session T1G
Development, Testing, and Application of a Chemistry Concept Inventory

Stephen Krause1, James Birk2, Richard Bauer3, Brooke Jenkins4, and Michael J. Pavelich5

Abstract - A Chemistry Concept Inventory (CCI) has been created that provides linkages to misconceptions observed in chemistry and in subsequent introductory materials engineering courses, as revealed by a Materials Concept Inventory (MCI). The CCI topics were bonding, intermolecular forces, electrochemistry, equilibrium, thermochemistry, and acids and bases. Numerous students were interviewed during question development to verify that the questions and responses were interpreted as intended. Questioning students on topics of molecular shape gave helpful insight into how students solve problems. For example, a question might be written to test one aspect of a topic, but students might solve it differently, using reasoning that still leads to a correct answer; the item is then testing something other than the intended topic. The interviews led to some unique findings about the spatial understanding and misconceptions held by these students. Multiple rounds of testing were then used to develop a valid Chemistry Concept Inventory.

Index Terms - Chemistry Concept Inventory, misconceptions, student interviews, validity.

INTRODUCTION

Over the last two decades, new theories of learning and associated methods of teaching have been emerging in the fields of science, math, engineering, and technology. The new methods have demonstrated that, compared to teacher-centered lecturing, a more hands-on, inquiry-based, student-centered model of education increases a student's knowledge and conceptual understanding of a subject.
To compare the new and old methods, new assessment tools are being created to measure students' conceptual knowledge as well as to understand what preconceptions and background limitations students have when entering a class. One of the most widely used and influential assessment tools is the Force Concept Inventory (FCI), which was created by Hestenes et al. [1]-[2] and tested broadly by Hake [3] for students in high school and college physics classes. A concept inventory is a series of multiple-choice, frequently illustrated questions based on qualitative, concept-oriented problems on a particular subject. It measures deep understanding and conceptual knowledge of a subject rather than a student's ability to solve problems by memorizing facts or by just solving equations. The results have been compared to the performance of students in classes with different teaching methods to determine teaching effectiveness. Over more than a decade, the FCI has shown quite clearly that the application of new methods of teaching based on education reform is, indeed, more successful in developing conceptual understanding. Hake's [3] findings indicated that students in reformed courses with "interactive engagement" significantly increased their gain in conceptual understanding compared to students in traditional, teacher-centered lecture courses. As part of a project on innovative education in engineering, D. Evans and others in the NSF-sponsored Foundation Coalition [4] are developing concept inventory assessment tools similar to the FCI. As part of this process, a concept inventory for chemistry has been developed, called the Chemistry Concept Inventory (CCI).

CHEMISTRY CONCEPT INVENTORY DEVELOPMENT

The Chemistry Concept Inventory is a multiple-choice test designed to assess the effect of curriculum changes. The goal was to make a reliable instrument that is easy to use, can be administered in a short period of time, and can accurately assess student understanding of general chemistry topics. If the CCI can easily and accurately assess student understanding, then teaching techniques can be evaluated for effectiveness by comparing a student group using new techniques to a control group. This paper describes both the development and the application of the CCI. Topics covered by the CCI were selected by a group of chemistry educators in consultation with engineering faculty. These topics are first introduced in general chemistry and then reappear in later engineering courses.
The topics, selected from both semesters of a two-semester chemistry sequence, are shown in Table 1.
1 Stephen Krause, Professor, Dept of Chemical & Materials Engineering, Arizona State University, Tempe AZ 85287-6006, [email protected]
2 James Birk, Professor, Dept of Chemistry, Arizona State University, Tempe AZ 85287-1604, [email protected]
3 Richard Bauer, Lecturer, Dept of Chemistry, Arizona State University, Tempe AZ 85287-1604, [email protected]
4 Brooke Jenkins, Graduate Student, Dept of Chemistry, Arizona State University, Tempe AZ 85287-1604, [email protected]
5 Michael J. Pavelich, Professor, Dept of Chemistry, Colorado School of Mines, Golden, CO 80401, [email protected]
0-7803-8552-7/04/$20.00 © 2004 IEEE October 20 – 23, 2004, Savannah, GA 34th ASEE/IEEE Frontiers in Education Conference T1G-1
TABLE 1
TOPICS COVERED

Course | Topic | Sub-Topics
Chemistry I | Thermochemistry | Heat; Thermal Conductivity; Thermal Equilibrium
Chemistry I | Bonding | Bond Polarity; Octet Rule; Molecular Structure
Chemistry I | Intermolecular Forces | Intermolecular Forces
Chemistry II | Equilibrium | Equilibrium Rate; Equilibrium - Dynamic vs. Static; Equilibrium - Le Chatelier's Principle; Equilibrium Constant
Chemistry II | Acids and Bases | Acid/Base Neutralization; Acid Strength; pH
Chemistry II | Electrochemistry | Oxidation/Reduction; Voltaic Cells; Electrolytic Cells

Once the topic areas were determined, an extensive literature search was carried out to identify previously studied misconceptions associated with the CCI topics. There is a large body of literature on chemistry misconceptions, so distractors could be designed to test for known misconceptions. This search also led to natural sub-topics within each of the main topics. For each sub-topic, at least three questions were written, giving a total of 30 questions for the Chemistry I inventory and 31 questions for the Chemistry II inventory. The questions were intended to be conceptual, not mathematical or algorithmic. These questions were initially given to the students in Chemistry I and II as part of their weekly quizzes. The questions were then compiled into Version A of the CCI. Because the questions were given after the students had covered the material in lecture, only post-test data were available. Table 2 summarizes these results.

TABLE 2
RESULTS FROM CCI VERSION A

Chemistry I: N = 326 students; Alpha = 0.7883; Post-Test Mean = 14.73/30 (49.1%)
Chemistry II: N = 158 students; Alpha = 0.7855; Post-Test Mean = 18.53/31 (59.8%)
The tests were analyzed and descriptive data were gathered. The coefficient alpha, discrimination index, and difficulty index were used to evaluate this first version of the CCI. The coefficient alpha is a measure of the internal reliability of the test, which is the ability of the test to evaluate an individual consistently. Alpha values range from 0 to 1, with a value of 0.7 or higher indicating that the test is reasonably reliable. Both the Chemistry I and Chemistry II inventories scored above 0.7.

An alpha value is calculated for the whole test. In contrast, discrimination and difficulty indices are calculated for each individual question. The discrimination index is a measure of how well a question discriminates between students. To calculate it, students are first ranked by overall performance on the exam. Then, a top portion of the class is compared to the bottom portion on each question. If every student in the top portion answered the question correctly and every student in the bottom portion answered it incorrectly, the question would perfectly discriminate between good students and poor students, and its discrimination index would be 1. The discrimination index ranges from -1 to 1. Along with the discrimination index, it is important to consider the difficulty index, which is the percent of students that answered the question correctly. Since a question can receive a low discrimination index because the majority of the class answers it correctly (or because the majority answers it incorrectly), it is important to combine these two indices when evaluating a question.

One of the goals of the CCI was that it could be administered in a short period of time, so the test had to be shortened. Questions were eliminated so that the Chemistry I and Chemistry II inventories were each 20 questions long. Since three questions had been written on each sub-topic, one question was eliminated, leaving two questions per sub-topic. Questions for elimination were selected by determining what effect their elimination would have on the coefficient alpha. Eliminating questions can lower the alpha and thus the reliability of the CCI, so the questions eliminated were those that would have the least negative effect on the alpha. These data, as well as expert judgment, were combined to eliminate weak questions, leaving two 20-question tests. A number of questions were also modified to make them clearer.

Version B of the Chemistry I and II CCIs was then piloted during the summer of 2003. The Chemistry I Version B was given to university students at the beginning of the semester as a pre-test and again at the end as a post-test. The Chemistry II Version B was given at a community college at the beginning of the semester as a pre-test; the questions were then administered again, spread out over several weekly quizzes, and the results from those quizzes were combined for use as the post-test. The results from Version B of the CCI are summarized in Table 3.
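The three item statistics described above can be computed directly from a matrix of scored responses. The sketch below is illustrative only: the 0/1 response matrix is hypothetical (not the study's data), and it uses top and bottom thirds for the discrimination index, which is one common convention.

```python
# Illustrative sketch (not the authors' code): coefficient alpha, difficulty,
# and discrimination indices from a hypothetical 0/1 response matrix.
# Rows are students, columns are questions (1 = correct).
responses = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
]

n_students = len(responses)
n_items = len(responses[0])
totals = [sum(row) for row in responses]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Coefficient (Cronbach's) alpha: internal-consistency reliability of the test.
item_vars = [variance([row[j] for row in responses]) for j in range(n_items)]
alpha = (n_items / (n_items - 1)) * (1 - sum(item_vars) / variance(totals))

# Difficulty index: fraction of students answering each item correctly.
difficulty = [sum(row[j] for row in responses) / n_students for j in range(n_items)]

# Discrimination index: proportion correct in the top-scoring group minus
# proportion correct in the bottom-scoring group (top/bottom thirds here).
ranked = sorted(range(n_students), key=lambda i: totals[i], reverse=True)
k = n_students // 3
top, bottom = ranked[:k], ranked[-k:]
discrimination = [
    sum(responses[i][j] for i in top) / k - sum(responses[i][j] for i in bottom) / k
    for j in range(n_items)
]
```

With so few hypothetical students the alpha is low; the point of the sketch is only the shape of the computation, not a realistic reliability estimate.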
TABLE 3
RESULTS FROM CCI VERSION B

Chemistry I: N = 42; Alpha = 0.7135; Pre-Test Mean = 5.48/20 (27.4%); Post-Test Mean = 10.60/20 (53.0%)
Chemistry II: N = 42; Alpha = 0.4188; Pre-Test Mean = 7.17/20 (35.9%); Post-Test Mean = 10.93/20 (54.7%)

These results show lower alpha values for Version B than for Version A of the CCI. Several factors may combine to explain the lower reliability. A smaller number of students were available to be tested, and with a smaller sample size it is more difficult to establish reliability. In addition, the tests were shortened from 30 (or 31) questions to 20 questions, and shortening a test inherently lowers its reliability coefficient, since items testing the same topic are removed.

During the summer of 2003, 11 students were interviewed in depth on seven questions on molecular shape from the CCI Chemistry I Version B. The interviews gave useful insights into how students solve problems. For example, a distractor might be written to test for a particular misconception, but students might be selecting that distractor for a different reason; the item is then testing something other than the intended misconception. The students selected for these interviews were currently taking Chemistry I and were interviewed just after covering the relevant material in lecture. The interviews led to some unique findings about the spatial understanding and misconceptions held by these students. They also helped to validate the test: information from the interviews was used to modify questions to make them clearer and to ensure that the inventory was testing what was intended.

In one question, students had to determine the polarity of a molecule from a written description. It read:

Consider a molecule with the formula ZA2 where Z is the central atom and A and Z have different electronegativities. In which of the following cases would this molecule always be non-polar?

A) If A has 2 lone pairs and Z has no lone pairs.
B) If Z has 2 lone pairs and A has no lone pairs.
C) If Z has 1 lone pair and A has 3 lone pairs.
D) If A is drastically more electronegative than Z.
E) If Z is drastically more electronegative than A.
During the interviews it became obvious that the students were misinterpreting the statements, which led to incorrect molecular drawings when they tried to answer the question. The main confusion was over the number of lone pairs of electrons that should be on each of the A atoms. The purpose of the question is not to test whether students can draw a molecule from a written description, but whether they can determine the polarity of the molecule; it was not the intent for students to miss the question because they could not draw the molecule correctly. To clarify the question, the word "each" was inserted into the three relevant options, so that the question now reads:

Consider a molecule with the formula ZA2 where Z is the central atom and A and Z have different electronegativities. In which of the following cases would this molecule always be non-polar?

A) If each A has 2 lone pairs and Z has no lone pairs.
B) If Z has 2 lone pairs and each A has no lone pairs.
C) If Z has 1 lone pair and each A has 3 lone pairs.
D) If A is drastically more electronegative than Z.
E) If Z is drastically more electronegative than A.
This new wording was tested on a group of incoming teaching assistants, who were asked to draw the molecules that options A, B, and C represent. In nearly every case the drawings that the teaching assistants produced were the intended ones. Results from Version B were evaluated and Version C of the CCI was developed. During Fall 2003, the Chemistry I and II CCIs were administered to a large number of students. Results from this version are shown in Table 4.

TABLE 4
RESULTS FROM CCI VERSION C

Chemistry I: N = 556 students; Alpha = 0.6803; Pre-Test Mean = 4.94/20 (24.7%); Post-Test Mean = 8.90/20 (44.5%)
Chemistry II: N = 195 students; Alpha = 0.5957; Pre-Test Mean = 6.51/20 (33.6%); Post-Test Mean = 9.63/20 (48.1%)

Two things should be noted about these results. First, each group's pre-test mean is consistent with guessing, as expected of students at the beginning of the semester. Second, the gains were not substantial. At the end of the Fall semester, the CCIs were administered again as a post-test. The difference between the pre-test and post-test scores represents what is gained by being in the class, and the tests indicate that conceptual understanding of the topics did not improve a great deal from being in the class. The normalized gains, as defined by Hake [3], were in the low range, with values of 24% for Chemistry I and 22% for Chemistry II. Table 5 summarizes the steps taken to develop the CCI.
TABLE 5
STEPS IN THE DEVELOPMENT OF THE CCI

1. Topic areas to be covered were selected.
2. The literature was searched for research on misconceptions in those topic areas.
3. Sub-topic areas were determined.
4. At least three questions were written for each sub-topic.
5. These questions were administered to students. (Version A)
6. Weak questions were eliminated, based on the discrimination index, difficulty index, and coefficient alpha.
7. The remaining questions were administered to students. (Version B)
8. Students were interviewed.
9. Questions were modified, based on results from test administration and student interviews.
10. The modified questions were administered to students. (Version C)
11. Steps 8 through 10 were repeated until acceptable results were attained.

APPLICATION OF THE CHEMISTRY CONCEPT INVENTORY

The Chemistry Concept Inventories were statistically analyzed in order to answer two questions. First, were there significant differences in the gains between different instructors of the same course? Second, was there a correlation between score on the CCI and performance in the course? Each question will be discussed in detail.

Are there significant differences in the gains between different instructors of the same course?

In investigating differences in the gains, we are looking at the effect of the instructor on student learning: do different instructors affect student understanding differently? To answer this, all the results from the CCIs were considered. A one-way ANOVA was used to evaluate the relationship between the normalized gains and the instructor. Normalized gains were used because they describe the change in conceptual understanding from the beginning of the course to the end, normalized to account for the portion of points a student could possibly have gained. The following equation was used to calculate the normalized gains:

normalized gain = (post-test score - pre-test score) / (20 - pre-test score)

Normalized gains between groups were found to be significantly different, F(5, 704) = 2.62, p = 0.02. Following this analysis, a series of comparisons was done. The CCI I results were used because there were five different sections of this course; two of the sections (A and C) were team-taught, which left four different instructional settings with the possibility of four different teaching styles. The CCI II results were not used because the same instructor taught both sections of that course. The sections of Chemistry I were compared to one another, and the results of these comparisons are shown in Table 6.

TABLE 6
COMPARISONS OF SCORES, GAINS, AND NORMALIZED GAINS BY INSTRUCTOR (N = 529)

Section | Pre-Test Average Score | Post-Test Average Score | Average Gain | Mean Normalized Gain | Std. Dev.
A | 5.12 | 8.34 | 3.22 | 0.21 | 0.21
B | 5.03 | 9.49 | 4.46 | 0.29 | 0.23
C | 4.57 | 8.10 | 3.53 | 0.20 | 0.26
D | 5.25 | 9.51 | 4.26 | 0.28 | 0.22
E | 5.23 | 9.29 | 4.06 | 0.25 | 0.24

From these results we see that sections A and C had lower gains than the other three sections. From classroom observations made during the semester, we had noted that sections A and C were taught by traditional lecture methods, while the other three sections were taught using the same classroom materials, which involved a highly visual approach with systematic student group work during the lecture period. While we note that there are numerous factors that might explain these results, from time of day and class demographics to gender of the instructor, the differences in teaching style may be a significant contributor to student performance. At present, we are not ready to fully explain all the factors contributing to these differences; this is left to future projects that can use these inventories to answer such research questions. It is enough to note that differences in the gains occurred and that they are consistent with research on learning.

Is there a correlation between score on the CCI and performance in the course?

For the second question, the results were used to determine whether there was a correlation between score on the CCI and performance in the course. Ideally the two should be correlated, since both the CCI and the course grade are supposed to measure understanding of chemistry. Previous attempts at developing chemistry concept inventories have been unable to establish this correlation. Fortunately, not only were we able to establish a correlation, but the correlation coefficients are quite high. The results for each inventory are shown in Table 7.
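The normalized-gain formula used in the analysis above is straightforward to compute per student before averaging. A minimal sketch, assuming a 20-question inventory; the pre/post scores are hypothetical, not the study's data:

```python
# Illustrative sketch: Hake-style normalized gain for a 20-question inventory.
# The pre/post scores below are hypothetical, not the study's raw data.
MAX_SCORE = 20

def normalized_gain(pre, post, max_score=MAX_SCORE):
    # Fraction of the available improvement (max_score - pre) actually realized.
    return (post - pre) / (max_score - pre)

pre_scores = [4, 6, 5, 7, 3]
post_scores = [9, 12, 8, 14, 7]

gains = [normalized_gain(pre, post) for pre, post in zip(pre_scores, post_scores)]
mean_gain = sum(gains) / len(gains)

# A student scoring 5/20 before and 10/20 after recovers 5 of the 15
# available points, a normalized gain of about 0.33.
```

Averaging the per-student gains (rather than applying the formula to class means) is one common convention; either choice should be stated when reporting results.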
TABLE 7
PEARSON CORRELATION COEFFICIENTS (R) FOR CCI SCORE AND COURSE AVERAGE

CCI | Section | r (CCI Score vs. Course Average) | N | p value | R2
Chem I | All | 0.57 | 564 | 0.00* | 0.33
Chem I | A | 0.67 | 111 | 0.00* | 0.45
Chem I | B | 0.58 | 102 | 0.00* | 0.34
Chem I | C | 0.60 | 133 | 0.00* | 0.36
Chem I | D | 0.51 | 112 | 0.00* | 0.26
Chem I | E | 0.65 | 106 | 0.00* | 0.42
Chem II | All | 0.61 | 193 | 0.00* | 0.37
Chem II | A | 0.62 | 112 | 0.00* | 0.39
Chem II | B | 0.64 | 81 | 0.00* | 0.41
*Indicates statistically significant results.

First consider the results for all of the sections of each inventory. These results can be interpreted by squaring the correlation coefficient, which gives the amount of variance shared by the two measures. For the CCI I, one third of the variance in scores on the CCI matches the variance seen in course performance. This means there is a factor that contributes substantially to both the results on the CCI and the course average; this factor is presumably conceptual understanding. Results for the CCI II showed an even higher correlation. These CCI tests are strongly related to how well students perform in the course. The breakdown by section, also presented in Table 7, shows that these correlations hold up in smaller groups, with all values above 0.50.

CONCLUSIONS

The results just discussed help to establish the validity of the Chemistry Concept Inventories. The CCI score and the course average should be well correlated, and given the high correlations found, we can conclude that the CCI measures a fundamental factor instrumental to understanding chemistry. This validates these tests for use in future research projects that assess different approaches that might influence this factor, which is assumed to be conceptual understanding. A few more minor changes have been made to the tests based on the results of the last trial. They are now ready to be used by education researchers to answer additional questions, some of which have been alluded to in this paper. This inventory should become a valuable tool in assessing pedagogy in introductory chemistry classes.
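The shared-variance interpretation used in the correlation analysis (squaring the Pearson coefficient) can be illustrated with a short sketch; the paired scores below are hypothetical, not the study's data.

```python
# Illustrative sketch: Pearson correlation between an inventory score and a
# course average, and the shared variance r**2, for hypothetical paired data.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

cci_scores = [6, 9, 11, 8, 14, 10, 13, 7]       # hypothetical, out of 20
course_avgs = [62, 74, 80, 70, 91, 78, 85, 65]  # hypothetical percentages

r = pearson_r(cci_scores, course_avgs)
shared_variance = r ** 2  # fraction of variance common to both measures
```

For the fabricated data above the two measures track each other closely, so r is near 1; with real class data, values like those in Table 7 (r around 0.5 to 0.7) would give a shared variance of roughly one quarter to one half.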
REFERENCES

[1] Hestenes, D., Wells, M., and Swackhamer, G., "Force Concept Inventory," The Physics Teacher, 30(3), 141-151, 1992.
[2] Hestenes, D., and Wells, M., "A Mechanics Baseline Test," The Physics Teacher, 30, 159-166, 1992.
[3] Hake, R. R., "Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses," American Journal of Physics, 66(1), 64-74, 1998.
[4] Evans, D. L., Gray, D., Krause, S., Martin, J., Midkiff, C., Notaros, B. M., Pavelich, M., Rancour, D., Reed-Rhoads, T., Steif, P., Streveler, R., and Wage, K., "Progress on Concept Inventory Assessment Tools," 33rd ASEE/IEEE Frontiers in Education Conference, Boulder, CO, 2003.