J Sci Educ Technol (2014) 23:82–97 DOI 10.1007/s10956-013-9452-x
The Use and Effectiveness of an Argumentation and Evaluation Intervention in Science Classes

Janis A. Bulgren · James D. Ellis · Janet G. Marquis
Published online: 6 June 2013. © Springer Science+Business Media New York 2013
Abstract This study explored teachers’ use of the Argumentation and Evaluation Intervention (AEI) and associated graphic organizer to enhance the performance of students in middle and secondary science classrooms. The results reported here are from the third year of a design study during which the procedures were developed in collaboration with teachers. A quasi-experimental pretest– posttest design with 8 experimental and 8 control teachers was used with a total of 282 students. An open-ended test assessed students’ abilities to evaluate a scientific argument made in an article. The students were asked to identify the claim and its qualifiers, identify and evaluate the evidence given for the claim, examine the reasoning in support of the claim, consider counterarguments, and construct and explain a conclusion about the claim. The quality of students’ responses was assessed using a scoring rubric for each step of the argumentation process. Findings indicated a significantly higher overall score and large effect size in favor of students who were instructed using the AEI compared to students who received traditional lecture– discussion instruction. Subgroup and subscale scores are also presented. Teacher satisfaction and student satisfaction and confidence levels are reported.
J. A. Bulgren (corresponding author), Center for Research on Learning, University of Kansas, J. R. Pearson Hall, 1122 West Campus Road, Lawrence, KS 66045, USA; e-mail: [email protected]
J. D. Ellis, School of Education, University of Kansas, Lawrence, KS, USA
J. G. Marquis, Schiefelbusch Life Span Institute, University of Kansas, Lawrence, KS, USA
Keywords Argumentation · Higher-order thinking · Middle- and secondary-level science instruction · Graphic organizer · Design study · Student diversity
Introduction

According to the Framework for K-12 Science and Engineering, all science students must be able to engage in argumentation based on evidence [National Academy Press (NAP) 2012]. In addition to having a prominent place in the Framework, the need for expertise in argumentation also appears in the goals of the Common Core State Standards, particularly related to literacy in science and technical subjects (CCSS 2010). Specifically, according to the Standards, if students are to succeed in college or careers, they must be able to integrate knowledge and ideas, delineate and evaluate claims and arguments, and assess the reasoning used in arguments.

In addition to college and career readiness, students must be able to analyze claims made in scientific journals, newspaper articles and editorials, on television and the Internet, or in classroom and everyday discourse. This is necessary whether they are explaining their own experimental results or engaging in social debates about claims made by others (Kolsto and Ratcliffe 2007). Expertise in these areas is particularly critical if students are to engage in the higher-order thinking necessary to compete in the world economy of the twenty-first century (Conley 2008; Heller and Greenleaf 2007; National Research Council 2012). Ultimately, the goal is that students learn to apply scientific practices to everyday challenges (Bricker and Bell 2008) and develop defensible ways to convince others of the truth of a conclusion (Lawson 2003). Argumentation is the complex reasoning process that allows students to do this.
Toulmin et al. (1984) defined the components of argumentation as reasoning from data to arrive at a claim by using warrants that tie the evidence to the claim, considering additional supports for the warrant, and proposing qualifiers of and rebuttals to the claim. These components can be used to evaluate claims in a variety of areas. Subsequently, other researchers have applied the components of argumentation to specific areas such as science. For example, Driver et al. (2000) explored how argumentation facilitates the construction of plausible links between scientists' conjectures and evidence. Others continued to explore and expand the use of argumentation in scientific thinking and learning (Bulgren and Ellis 2012; Duschl and Osborne 2002; Erduran et al. 2004; Kelly et al. 2007; Jimenez-Alexandre and Erduran 2007; Krajcik et al. 2007; Linn et al. 2003; Sadler 2004; Sandoval and Millwood 2007).

Particularly related to scientific thinking, other components of reasoning are important to the development of argumentation. Kuhn (1991) called the methodological components of scientific thinking the "skills" of argumentation. These include an appreciation of the role of empirical evidence (Kuhn et al. 1988) and the ability to judge the credibility of evidence. The credibility of evidence may be judged by reliability (Schauble 1996), experimental control (Koslowski et al. 1989; Kuhn et al. 1995; Schauble 1996), and objectivity (Klahr et al. 1993; Kuhn et al. 1995; Penner and Klahr 1996; Schauble 1996). These components represent complex, but necessary, ways of thinking if students are to learn to craft arguments.

Unfortunately, school science instruction rarely focuses on teaching students how to craft arguments (Bricker and Bell 2008; Norton-Meier et al. 2008). Duschl and Osborne (2002) highlighted this failure relative to innovative instruction, contending that few changes occurred in science classrooms, school environments, and teaching practices during the last half of the twentieth century. The difficulty in making changes to instruction in argumentation in science classrooms is addressed by Sadler (2004). In a review of research, he found mixed results from efforts designed to help students analyze complex arguments and suggested that these may be due, in part, to the complex interrelationships between socio-scientific issues and the nature of science. Nevertheless, he contended that in order to achieve the goals laid out in reform documents, efforts must continue to help students engage in argumentation.

Unfortunately, data from national assessments such as the National Assessment of Educational Progress indicated that most young Americans do not have a firm grasp of higher-order intellectual skills (Jacobs 2008). Therefore, innovative instruction is needed to reverse these trends. This effort may be particularly challenging for many teachers whose classes contain students of diverse abilities, especially in light of goals that all students graduate from
high school prepared for college and careers (U.S. Department of Education 2010). Therefore, development of effective evidence-based instructional methods for use with all students across the grade levels is needed (Reznitskaya and Anderson 2002). Analyses by groups from the What Works Clearinghouse have identified effective practices. These include providing explicit vocabulary instruction, providing comprehension strategy instruction and opportunities for discussion (Kamil et al. 2008), as well as organizing instruction and study, combining graphics with verbal descriptions, using prequestions to introduce a topic, delaying judgments of learning, and asking deep explanatory questions (Pashler et al. 2007). Furthermore, Ysseldyke (2009) emphasized that effective treatments, such as those that may incorporate these proven components, must be implemented with fidelity or integrity if students are to fully benefit.

In response to these challenges, the intervention used in this study brings together many of the evidence-based components described above into a single instructional procedure to help all students learn components of argumentation with an emphasis on scientific reasoning. The reasoning components were based not only on philosophers such as Toulmin et al. (1984) but also on components of scientific analysis and evaluation such as objectivity, reliability, validity, and methodology.

The Purpose of the Study

The purpose of this study was to develop and determine the effectiveness of the Argumentation and Evaluation Intervention (AEI) in helping students engage in the reasoning associated with the argumentation challenges in the Framework and the CCSS. In addition to a focus on the total group of students, it was the intent of the study to determine response to the intervention by students of different genders, students with learning disabilities, students identified as gifted, and students at different grade levels. Other purposes were to explore the satisfaction level of teachers and the satisfaction and confidence levels of students who were taught using the AEI. The sources of claims, either from articles or from results of laboratory experiments, were also identified and tallied. The results reported here are from the third year of a design study (Bannan-Ritland 2003; Kelly et al. 2008) during which the instructional procedures were developed in collaboration with teachers (Bulgren and Ellis 2012).

The Argumentation and Evaluation Intervention

The AEI is part of a programmatic line of research on Content Enhancement Routines (CERs) (Lenz and Bulgren 2013) designed to help all students succeed in general education courses. Components of the AEI are presented below.
The Argumentation and Evaluation Guide (AEG)

The AEG is a graphic device constructed to support the analysis and evaluation of a claim. During the instructional process, information associated with each of nine steps of an associated cognitive strategy is written on the AEG (see Fig. 1 for an example AEG). On this form, each of the nine steps is cued by a number and a guiding question, and space is provided to write an answer to each question. The guiding questions or directions for the steps are as follows: (1) "What is the claim, including any qualifiers?"; (2) "What evidence is presented?"; (3) "Identify the type of evidence (data, fact, opinion, or theory)"; (4) "Evaluate the quality of the evidence as poor, average, or good"; (5) "What chain of reasoning (warrant) connects the evidence to the claim (authority, theory, or type of logic)?"; (6) "Identify the type of reasoning"; (7) "Evaluate the quality of the reasoning as poor, average, or good"; (8) "What are your concerns about the believability of the claim (your counterarguments, rebuttals, or new questions)?"; and (9) "Accept, reject, or withhold judgment about the claim. Explain your decision."

The teacher completes an AEG prior to the class, but only as an instructional plan. In class, the teacher and all students start with a blank guide and interactively discuss and complete it. Therefore, the final guide may differ from what the teacher planned because it incorporates students' contributions, questions, and insights. It is important to note that this guide is never handed out to the students in completed form.

The Argumentation and Evaluation Strategy (AES)

The intervention incorporates a flexible cognitive reasoning strategy that guides students as they evaluate the components of arguments made in support of claims. The strategy steps parallel the questions on the AEG. The strategy consists of the following steps: (a) identify the claim and qualifiers; (b) identify the evidence presented; (c) identify the type of evidence as data, fact, opinion, or theory; (d) evaluate the quality of the evidence; (e) explore the reasoning that connects the evidence to the claim; (f) identify the type of reasoning as theory, authority, or logic (e.g., analogy, cause-effect, correlation, or generalization); (g) evaluate the quality of the reasoning; (h) explore counterarguments, rebuttals, or new questions; and (i) draw a conclusion, either accepting, rejecting, or withholding judgment about the claim, and explain the reasoning for the decision.
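To make the structure of the guide and strategy concrete, the following is a minimal illustrative sketch, not part of the published intervention, that represents the nine AEG prompts as a simple data structure and prints a blank guide; the short step labels and the function name are hypothetical conveniences.

```python
# Illustrative only: the nine AEG guiding questions represented as data,
# so a blank guide can be generated for students to complete during discussion.
AEG_STEPS = [
    ("Claim", "What is the claim, including any qualifiers?"),
    ("Evidence", "What evidence is presented?"),
    ("Type of evidence", "Identify the type of evidence (data, fact, opinion, or theory)."),
    ("Quality of evidence", "Evaluate the quality of the evidence as poor, average, or good."),
    ("Chain of reasoning", "What chain of reasoning (warrant) connects the evidence to the claim?"),
    ("Type of reasoning", "Identify the type of reasoning (authority, theory, or type of logic)."),
    ("Quality of reasoning", "Evaluate the quality of the reasoning as poor, average, or good."),
    ("Concerns", "What are your concerns about the believability of the claim?"),
    ("Conclusion", "Accept, reject, or withhold judgment about the claim, and explain your decision."),
]

def print_blank_guide():
    """Print a blank Argumentation and Evaluation Guide, one prompt per step."""
    for number, (label, question) in enumerate(AEG_STEPS, start=1):
        print(f"{number}. {label}: {question}")
        print("   Answer: ______________________________")

if __name__ == "__main__":
    print_blank_guide()
```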
The Argumentation and Evaluation Routine (AER)

These instructional procedures were designed to support teachers' instruction and to guide student dialogue both in whole-group and in small-group cooperative structures. These methods were sequenced within three instructional phases: "Cue," "Do," and "Review." During the "Cue" phase, the teacher (a) introduces the topic of the lesson, (b) explicitly informs students about the importance and benefits of understanding the targeted information, (c) distributes and explains the one-page graphic organizer (the AEG), and (d) prompts the students to take notes on the AEG and participate in the discussion. During the "Do" phase, the major part of the routine, the parts of the guide are completed by the teacher and all students by following a set of nine strategic thinking steps; these are the steps of the strategy that are cued on the AEG described previously. Finally, in the "Review" phase, the teacher and students collaboratively review the information covered in the "Do" phase and the process that has been used to analyze and evaluate the claim and supporting argument. The routine is meant to support rather than replace the ways that teachers teach critical content and procedures. Therefore, the components of the intervention are flexible and do not replace hands-on learning experiences.

Method

Participants

Sixteen teachers in grades 6 through 9 from five school districts participated, eight in the experimental and eight in the control conditions, matched for grade, subject, and school district. Subjects included General Science (four classes in each condition), Life Science, Earth Science, Biology, and Physical Science (one class in each condition). The classrooms were typical general education–inclusive classrooms. A total of 282 students in grades 6, 7, 8, and 9 from the classes of the 16 participating teachers took part in the study, 158 in the experimental condition (78 female and 80 male) and 124 in the control condition (61 female and 63 male). Students with an Individualized Education Program (IEP) were identified. Among these were students identified as gifted (GI), 13 in the experimental condition and 9 in the control condition. Others were identified as having a specific learning disability (LD), 9 in the experimental condition and 13 in the control condition.

Procedures

Professional Development Procedures
Fig. 1 Sample Argumentation and Evaluation Guide. © Bulgren & Ellis 2011

Teachers in the experimental condition attended a 10-day institute during the summer prior to the study and met with researchers once a month thereafter. During the summer,
each experimental teacher was asked to develop at least 10 ways to use the AEG as part of his or her ongoing instruction, identify relevant articles, text, video, or laboratory experiments, complete the appropriate AEGs, teach the topics to the other teachers, and apply the scoring rubrics to ensure common understandings of the evaluation components.

Student Testing Procedures

Pretests were administered at the beginning of the school year in all experimental and control classes before the experimental teachers began instruction on the unit. The posttests were administered after teachers completed instruction.

Measures

Student Achievement Measure

The student assessment consisted of an instrument designed to evaluate students' ability to use the steps of argumentation discussed above to answer questions in writing about an argument made in support of a claim. First, researchers developed a written argument in the form of a one-page presentation about the dangers to women of drinking colas. A script containing the argument in support of the claim was prepared (see "Appendix 1" for the script). In addition, a 10-item test was developed for students to answer questions about the argument (see "Appendix 2" for the test). For purposes of evaluation, the "claim" step was separated into two parts: (a) the student accurately identifies the claim, and (b) the student explicitly identifies each qualifier. The remaining items of the assessment are congruent with the remaining components of the instructional strategy: (c) identify the evidence, (d) label the evidence by type, (e) judge the quality of the evidence, (f) identify the reasoning that allowed the claimant to make the claim based on the evidence presented, (g) label the reasoning by type, (h) judge the quality of the reasoning, (i) present rebuttals or counterarguments, and (j) come to a conclusion about the quality of the claim and explain the reasoning that supports the conclusion. The assessment was designed to allow analysis by four subscores for the following clusters: Claims (claim and qualifiers); Evidence (identification, type, and quality of evidence); Reasoning (identification, type, and quality of reasoning); and Conclusion (identification and explanation of the conclusion). The instrument was used as a pre- and posttest measure. In addition, a Scoring Rubric was developed (see "Appendix 3" for the rubric).

Reliability

Interscorer reliability was determined by having coders independently score a random sample (20 %) of tests taken
by students in all classes in both conditions. Reliability was calculated for the 10 items individually and for the total test. For the total test, two measures of reliability were calculated: the intraclass correlation coefficient (ICC) and the Pearson correlation (r). On the total scores, the correlation between the coders' scores was .954; the intraclass correlation coefficient was .976. For the individual items, percent agreement and weighted Kappa were calculated. The percent agreement was calculated by dividing the number of agreements by the number of agreements plus disagreements and multiplying by 100. Percent agreement ranged from 59 to 93, with a mean of 78.2. The weighted Kappa statistic ranged from .46 to .89. The distribution of the Kappa scores for the 10 items and their interpretation by Landis and Koch (1977) were as follows: four scores above .81, considered to be "almost perfect agreement"; four scores between .61 and .80, considered to be "substantial agreement"; and two scores between .41 and .60, considered to be "moderate agreement."

Student Satisfaction Questionnaire

This instrument was developed to determine students' satisfaction with the AEI. The survey consisted of eight items, each formatted using a 7-point Likert-type scale, in which a "1" indicated "completely dissatisfied" and a "7" indicated "completely satisfied." See the left side of Table 1 for the questions related to student satisfaction. This questionnaire was administered only to the students in the experimental condition.

Student Confidence Questionnaire

This instrument was developed to assess students' confidence related to their preparedness for engaging in analysis and evaluation of claims and their supporting arguments. The survey consisted of three items, each formatted using a 7-point Likert-type scale, in which a "1" indicated "completely disagree" and a "7" indicated "completely agree." See the left side of Table 1 for the questions related to confidence. This questionnaire was administered to the students in the experimental condition.

Teacher Satisfaction Questionnaire

Finally, an instrument was developed to determine teachers' satisfaction with the AEI. This survey consisted of 20 items, each formatted using a 7-point Likert-type scale, in which a "1" indicated "completely disagree" and a "7" indicated "completely agree." See the left side of Table 2 for the questions related to teacher satisfaction.
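For readers who wish to compute the interscorer reliability statistics described above for their own rubric data, the following is a minimal illustrative sketch, not code from the original study. The two arrays of coder scores are hypothetical, and the weighted Kappa here uses scikit-learn's linear weighting, which may differ from the weighting scheme the authors used.

```python
# Illustrative reliability computations for two coders' rubric scores (0-3).
# The example scores are made up; they stand in for the 20 % sample of
# double-scored tests described in the text.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

coder_a = np.array([3, 2, 2, 1, 0, 3, 2, 1, 1, 2])  # hypothetical item scores
coder_b = np.array([3, 2, 1, 1, 0, 3, 2, 2, 1, 2])

# Percent agreement: agreements / (agreements + disagreements) * 100
percent_agreement = (coder_a == coder_b).mean() * 100

# Weighted Kappa for ordinal rubric levels (linear weights assumed here)
kappa = cohen_kappa_score(coder_a, coder_b, weights="linear")

# Pearson correlation between the coders' scores
r, _ = pearsonr(coder_a, coder_b)

print(f"Percent agreement: {percent_agreement:.1f}")
print(f"Weighted Kappa:    {kappa:.2f}")
print(f"Pearson r:         {r:.3f}")
```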
Table 1 Results of student satisfaction and confidence surveys

Satisfaction (How satisfied are you that the argumentation and evaluation intervention helped you to ...)
1. Understand how to think about important science issues in class? (M = 4.44, SD = 1.69, N = 129)
2. Think about important science issues outside class, such as from newspapers, the Internet, or television news? (M = 4.50, SD = 1.76, N = 128)
3. Focus your attention on what was important in class? (M = 4.53, SD = 1.71, N = 129)
4. Study for tests? (M = 3.74, SD = 1.94, N = 129)
5. Do well on tests? (M = 3.88, SD = 1.91, N = 126)
6. Improve your grades? (M = 3.95, SD = 1.96, N = 129)
7. Use knowledge that you already have to think about science issues. (M = 4.86, SD = 1.65, N = 127)
8. How satisfied are you with this new way of teaching as compared to when your teacher did not use it? (M = 4.02, SD = 1.81, N = 128)

Confidence (How confident are you ...)
With the correctness of the steps you go through as you think about a statement or claim dealing with a science issue? (M = 4.92, SD = 1.65, N = 129)
With the correctness of your decision about the accuracy of a statement you hear or read about a science issue? (M = 4.54, SD = 1.67, N = 129)
That you can explain your judgment about science issues to others? (M = 4.98, SD = 1.97, N = 128)
Table 2 Results of teacher satisfaction survey (To what degree do you think that ...). N = 8 for all items.
1. This is a flexible procedure that can easily be adapted for use with other teaching procedures I use (M = 3.88, SD = 1.25)
2. Once I understood this procedure, it was easy for me to use with my students (M = 3.50, SD = 1.41)
3. The time that it took to understand this procedure was worth the benefits that followed (M = 4.88, SD = 1.73)
4. There is sufficient time to incorporate this procedure into my course and cover the required science standards (M = 2.25, SD = 1.28)
5. I will continue to use all parts of the AEG in future lessons (M = 3.00, SD = 2.39)
6. I will adapt and use only certain elements of the AEG in future lessons (M = 5.88, SD = 1.25)
7. This instruction fits with my teaching procedures (M = 4.50, SD = 1.60)
8. This procedure enhances students' understanding of science content in my course (M = 5.38, SD = 1.41)
9. This procedure enhances students' development of inquiry and critical thinking skills (M = 6.25, SD = .89)
10. This procedure fits my vision of teaching (M = 4.63, SD = 1.85)
11. Reasoning about claims and conclusions is important in my content area (M = 6.25, SD = .71)
12. This procedure might improve students' ability to write essays (M = 4.13, SD = 1.46)
13. This procedure improves the quality of student discussion about science (M = 5.12, SD = 1.55)
14. My students enjoyed it when I used the procedures in my classes (M = 2.13, SD = 1.64)
15. The students have mastered the elements of the AEG (M = 3.38, SD = 1.41)
16. I would recommend the use of this procedure to other teachers (M = 5.00, SD = 1.41)
17. As a result of using this procedure, my students demonstrate enhanced awareness and ability to analyze science-related issues (personal and social) presented in the media outside of class (M = 4.63, SD = 1.19)
18. This procedure might improve my students' reading comprehension (M = 5.00, SD = 1.85)
19. This procedure improves students' ability to analyze science data and develop conclusions from their own science investigations (M = 5.75, SD = 1.28)
20. This procedure improves interactions when working in small groups (M = 5.13, SD = 1.25)
Research Design

A quasi-experimental pretest–posttest design was used, with teachers in eight classrooms using the AEI and teachers in the eight control classrooms using the typical instructional procedures. The following primary research questions guided the study:

1. Do students who received the AEI have higher total test mean scores than students in the control group? (See Table 3.)
2. Do students who received the AEI have higher mean subscale test scores than students in the control group? (See Table 3.)
Table 3 Mean test results by total and subscale scores

Total
  Pre   Experimental: M = .88,  N = 152, SD = .43
  Pre   Control:      M = .85,  N = 124, SD = .33
  Post  Experimental: M = 1.65, N = 158, SD = .47
  Post  Control:      M = .97,  N = 124, SD = .31
  Interaction effect pre–post by condition: F(1,13) = 140.9, p < .0001; main effect groups pretest: F(1,13) = .37, p = .55; main effect groups posttest: F(1,13) = 47.5, p < .0001

Claim
  Pre   Experimental: M = 1.28, N = 152, SD = .60
  Pre   Control:      M = 1.27, N = 124, SD = .52
  Post  Experimental: M = 1.41, N = 158, SD = .54
  Post  Control:      M = 1.24, N = 124, SD = .64
  Interaction effect pre–post by condition: F(1,13) = 2.9, p = .11; main effect groups pretest: F(1,13) = .10, p = .76; main effect groups posttest: F(1,13) = 2.89, p = .11

Evidence
  Pre   Experimental: M = 1.10, N = 152, SD = .78
  Pre   Control:      M = 1.01, N = 124, SD = .45
  Post  Experimental: M = 2.06, N = 158, SD = .76
  Post  Control:      M = 1.19, N = 124, SD = .46
  Interaction effect pre–post by condition: F(1,13) = 60.1, p < .0001; main effect groups pretest: F(1,13) = .74, p = .40; main effect groups posttest: F(1,13) = 40.1, p < .0001

Reasoning
  Pre   Experimental: M = .33,  N = 152, SD = .32
  Pre   Control:      M = .44,  N = 124, SD = .38
  Post  Experimental: M = 1.28, N = 158, SD = .61
  Post  Control:      M = .52,  N = 124, SD = .31
  Interaction effect pre–post by condition: F(1,13) = 156.3, p < .0001; main effect groups pretest: F(1,13) = .50, p = .49; main effect groups posttest: F(1,13) = 66.2, p < .0001

Conclusion
  Pre   Experimental: M = .96,  N = 152, SD = 1.09
  Pre   Control:      M = .83,  N = 124, SD = .63
  Post  Experimental: M = 1.83, N = 158, SD = .77
  Post  Control:      M = 1.06, N = 124, SD = .65
  Interaction effect pre–post by condition: F(1,13) = 27.4, p = .0002; main effect groups pretest: F(1,13) = 1.2, p = .29; main effect groups posttest: F(1,13) = 31.7, p < .0001

Note: The F value and probability level given for the pre–post by condition effect refer to the interaction between the experimental/control condition and the pre–post time of assessment. The probabilities for the pretest and posttest main effects are the probabilities that the scale/subscale means of the experimental and control groups are equal at the pretest and at the posttest.
Secondary research questions focused on the differences in performance between the experimental and control students in various subgroups in the study. Specifically, total test mean score differences were examined for the following subgroups: males and females, students with LD, gifted students, and students in various grades. In addition, the satisfaction and confidence levels for students and satisfaction levels for teachers in the AEI group were examined. Finally, teachers' selection of articles versus laboratory experiments to explore claims was collected.

Data Analysis

The data for the primary questions (questions 1 and 2 above) were analyzed using a general linear mixed model (GLMM) approach. This approach is appropriate for hierarchical or nested data (e.g., students nested within
classrooms) since it allows modeling of the complex variance structure due to differences among students within the same classroom as well as differences among the different classrooms. In these analyses, three levels were included: time (pre–post), students, and classrooms/teacher. The primary interest was the interaction between time and condition: Do the students in the AEI condition gain (pre- to post-) more knowledge than the students in the control condition? This same analytic approach was used to determine differences between the males and females and between the different grades. For analyses involving the gifted students and the students with LD, a general linear model (repeated measures analysis of variance) approach was used since the limited number of students (22 in each condition) were scattered among the 16 classrooms, and the dependency due to classrooms, therefore, was not an issue. Effect sizes for the GLMM analyses were calculated using the modification of the Hedges' g method suggested for hierarchical linear model (HLM) analyses by the What Works Clearinghouse (2010). Bonferroni
adjustments for significance levels were used where appropriate. In addition to the total scores over all 10 test items, four subscales, each based on a set of related questions, were analyzed: Subscale 1 (hereafter referred to as "Claim") included items 1 and 2 describing the claim and its qualifiers; Subscale 2 (hereafter referred to as "Evidence") included items 3, 4, and 5 concerning the evidence; Subscale 3 (hereafter referred to as "Reasoning") included items 6, 7, and 8 regarding the reasoning; and Subscale 4 (hereafter referred to as "Conclusion") included item 9, regarding counterarguments, rebuttals, and new questions, and item 10, describing the conclusion. Mean scores were computed for the teacher satisfaction measures as well as the student satisfaction and confidence measures. See the "Measures" section for discussion of reliability.
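The following is a minimal sketch, not the authors' actual analysis code, of how a comparable mixed-model analysis and a WWC-style Hedges' g could be set up in Python with statsmodels. The data frame df and its column names (score, time, condition, classroom, student) are hypothetical, the variance-components formulation is only one way to express student-within-classroom nesting, and the effect-size function omits the additional cluster adjustments the WWC applies to hierarchical designs.

```python
# Illustrative mixed-model analysis of pre/post scores nested in classrooms.
# df is a hypothetical long-format DataFrame with one row per student per
# time point: columns score, time ('pre'/'post'), condition ('AEI'/'control'),
# plus classroom and student identifiers.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_pre_post_model(df: pd.DataFrame):
    """Fit a mixed model with a time-by-condition interaction,
    random intercepts for classrooms, and a student variance component."""
    model = smf.mixedlm(
        "score ~ C(time) * C(condition)",
        data=df,
        groups="classroom",
        re_formula="1",
        vc_formula={"student": "0 + C(student)"},
    )
    return model.fit()

def hedges_g(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Hedges' g with the usual small-sample correction; the WWC's extra
    adjustment for clustered (HLM) designs is not included here."""
    pooled_sd = np.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2))
    correction = 1 - 3 / (4 * (n_1 + n_2) - 9)
    return correction * (mean_1 - mean_2) / pooled_sd

# Example: posttest total-score contrast from Table 3 (experimental vs. control)
print(round(hedges_g(1.65, 0.97, 0.47, 0.31, 158, 124), 2))
```

Applied to the posttest total-score means and standard deviations in Table 3, this simplified formula yields a value close to the reported effect size of 1.7, although the published figure reflects the full WWC adjustment for hierarchical analyses.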
Results

Student Test Results

Total Test Scores for All Participants

In the analysis of total test mean differences between the AEI group and the control group, significant results were found for the interaction between experimental condition and change in scores from pretest to posttest, F(1,13) = 140.9, p < .0001, effect size = 1.7, large. The significant interaction effect indicates that one group, in this case the AEI group, gained more than the control group. Examination of the estimated means shows that there were no statistically significant differences between the two groups on the pretest, but differences were found on the posttest. The observed means and standard deviations for the total test and the subscale scores are presented in Table 3.
Subscale Scores for All Participants

Similar results were found for three of the four subscale scores (significance level: .05/4 = .0125). Those three were as follows: Evidence (identify, type, and quality of evidence), F(1,13) = 60.1, p < .0001, effect size = 1.4, large; Reasoning (identify, type, and quality of reasoning), F(1,13) = 156.3, p < .0001, effect size = 1.6, large; and Conclusion (arrive at and explain a conclusion), F(1,13) = 27.4, p = .0002, effect size = 1.1, large. The only subscale for which significant differences were not found was Claim (identify a claim and associated qualifiers), F(1,13) = 2.94, p = .11. These results are presented in Table 3.
Individual Item Scores for All Participants

In addition to total and subscale cluster scores, descriptive statistics were computed to analyze students' performance on each of the ten individual items. This analysis indicated that mean posttest scores achieved by students in the experimental condition were higher than those of students in the control condition for every question except one; this was the question assessing students' ability to identify qualifiers.1 Two other items were of interest. First, for the question requiring students to identify the chain of reasoning (Toulmin's warrant), students in the control condition scored, on average, .65 on the pretest and .64 on the posttest. By contrast, mean scores achieved by students in the experimental condition were, on average, .45 on the pretest and 1.15 on the posttest. Second, for the question that assessed students' abilities to label the type of reasoning in the argument (based on authority, theory, or logic), the mean for students in the control condition was virtually the same from pretest to posttest, .15 and .16. On the same question, students in the experimental condition improved from a mean score of .08 on the pretest to 1.58 on the posttest.

Subgroup Test Scores

Each subgroup was tested with an overall alpha level of .05. For the males and females, there was a statistically significant three-way interaction between gender, condition, and time, F(1,13) = 7.02, p = .02. Follow-up analyses showed that the treatment group had higher scores than the control group, regardless of gender, F(1,13) = 48.17, p < .0001. Further examination of the gender effects revealed that in the control group, the mean difference for females between pretest and posttest scores was .06, compared to .18 for males. In the AEI group, the mean difference for females between the pretest and posttest scores was greater than that for the males, .86 and .69, respectively. Further, the female scores were slightly higher than the male scores on the posttest. See Table 4 for total test scores by gender.

For students identified as having a LD and those identified as gifted [those with Individualized Education Programs (IEPs)], subgroup analyses revealed positive results. For students with LD, statistically significant results were found between the experimental and control groups in favor of the students in the experimental condition, F(1,20) = 6.16, p = .022, effect size = 1.1, large. For the gifted students, the results were also statistically significant, in favor of those in the experimental condition, F(1,20) = 10.96, p = .003, effect size = 1.5, large. Students in the experimental condition from these subgroups, on average, made gains from pre- to posttest. See Table 4 for total mean scores for students with LD and those identified as gifted.2

No statistically significant differences were found among students in the different grades. Overall, students at different grade levels in the control condition scored a mean of .97 on the posttest, whereas those in the AEI condition scored a mean of 1.65. Inspection of the data revealed that the sixth-grade students, in general, made the largest relative gains from pre- to posttest. For the students in the sixth grade, the mean posttest scores for students in the experimental condition were 2.7 times greater than those on the pretest (.53 vs. 1.44). For the other grades, the gains from pretest to posttest for students in the experimental condition were markedly larger than those for students in the control condition. For example, the difference between the mean pretest and posttest scores for students in the experimental condition in the seventh grade was more than 50 % larger (1.18 vs. 1.91), and almost double for the eighth graders (.91 vs. 1.7) and ninth graders (.75 vs. 1.37), when compared to students in the control condition. The students in the ninth-grade experimental condition (whose teacher used the AEI 6 times) received the lowest mean score of all the grades.3

Table 4 Numbers, means, and standard deviations on total scores for subgroups

Experimental
  Pre:  Female n = 73, M(SD) = .85 (.34);  Male n = 79, M(SD) = .90 (.51);  LD n = 9,  M(SD) = .72 (.27);  Gifted n = 13, M(SD) = 1.03 (.39)
  Post: Female n = 78, M(SD) = 1.71 (.48); Male n = 80, M(SD) = 1.59 (.46); LD n = 9,  M(SD) = 1.21 (.46); Gifted n = 14, M(SD) = 1.88 (.34)
Control
  Pre:  Female n = 61, M(SD) = .90 (.31);  Male n = 63, M(SD) = .81 (.34);  LD n = 13, M(SD) = .65 (.39);  Gifted n = 9,  M(SD) = .96 (.26)
  Post: Female n = 61, M(SD) = .96 (.28);  Male n = 63, M(SD) = .99 (.34); LD n = 13, M(SD) = .78 (.39);  Gifted n = 9,  M(SD) = 1.21 (.27)

1 Complete pretest and posttest means and standard deviations on individual items are available upon request.
2 Complete pretest and posttest means and standard deviations for subgroup performance are available upon request.
3 Complete pretest and posttest means and standard deviations for performance by grade are available upon request.
Teacher Use Results

When teachers selected topics, they drew on either laboratory experiments or articles. The ratio of laboratory experiments to articles was as follows: 1:9 for one teacher (who used three videos in place of some written articles); 3:7 for four teachers; 4:6 for one teacher; 5:5 for one teacher; and 0:6 for the teacher who implemented the AEI six times. In general, when the teachers used the AEI, students were taught as a whole group. Seven teachers used the AEI 10 times throughout the school year, during one class period each; class periods ranged from 45 to 50 min. The eighth teacher, who taught ninth-grade Biology, used the AEI six times.

Measures of Satisfaction and Confidence

Descriptive statistics for student satisfaction and confidence are presented in Table 1, and for teacher satisfaction in Table 2.
Discussion

Student Test Results

Total and Subscale Results

Analyses of total test scores for the entire group of students indicated a significant difference and large effect size in favor of students in the experimental condition whose teachers used the AEI. This is an important overall finding for teachers who are responding to challenges to help their students engage in higher-order reasoning, such as that associated with argumentation. Furthermore, significant differences and large effect sizes in favor of students in the experimental condition were found for three of four subscale scores, those associated with evidence, reasoning, and conclusions—challenging components of argumentation. As such, these
findings reinforce the overall evidence of the power of the AEI to help students respond to higher-order thinking and reasoning challenges, particularly the most challenging components associated with argumentation in the Framework and the CCSS. Among the four subscale scores, the only one for which significant differences between the experimental and control groups were not found was that representing students' abilities to identify the claim and qualifiers to the claim. The reason for this was explored by further analysis of individual test item scores.

Individual Item Results

Deeper analysis of the individual test items resulted in findings of interest, particularly in light of the sub-score data in which claims and qualifiers were analyzed together and for which no significant difference was found between groups of students in the experimental and control conditions. Specifically, individual item analysis suggests that students in the experimental condition performed significantly better than those in the control group when asked to identify a claim, but questions remain as to why no difference was found between the groups relative to identifying qualifiers. Ultimately, this use of deeper analysis to compare items individually revealed insights and issues not available when sub-scores were reported.

Furthermore, analysis of scores on two other individual test items illustrates the efficacy of the AEI to enhance student performance on components that are central to evaluating arguments: identifying and labeling the chain of reasoning. These findings of differences between the groups on the two questions are important because of the emphasis on the more complex components of higher-order thinking in the CCSS (2010) and the science Framework (NAP 2012). That is, the chain of reasoning in this study paralleled the higher-order thinking embodied in Toulmin's warrant, which challenged students to evaluate the ways that the evidence did, or did not, allow the claimant to make the claim. Furthermore, this supports the use of finer-grained analyses of responses on individual test items in addition to the analysis of total test scores and subscale test scores. Such analyses add insights into preliminary sets of information and allow teachers and researchers to focus more precisely on which instructional components need more support.

Subgroup Results

Results for students identified as having a LD and those identified as gifted [those with Individualized Education Programs (IEPs)] revealed positive results. These results
support the ability of students of diverse abilities and achievement levels to benefit from instruction in the same general education science class. This is an important finding in light of national goals to help all students succeed in critical content areas such as science.

In terms of gender, a positive finding was that both males and females in the experimental condition performed significantly better than students in the control condition. Of interest is the finding that, on average, females outperformed males on the total posttest score in the experimental condition, more than doubling their mean scores from pretest to posttest. This is an important finding given goals of helping female students achieve in the areas of science, technology, engineering, and mathematics.

For students at different grade levels, the finding that sixth-grade students in the experimental condition made the largest relative gains from pretest to posttest (a 2.7-fold increase in mean scores) is important relative to a frequent perception among teachers that students in the earlier grades are not sufficiently mature to benefit from this type of instruction (Ellis and Bulgren 2009). However, since only one sixth-grade teacher participated, the performance of the sixth-grade students may be due to teacher effects. For the other grades, the gains from pretest to posttest for students in the experimental condition in the seventh, eighth, and ninth grades were larger than those for students in the control condition. However, the students in the ninth-grade experimental condition received the lowest mean posttest score of all the grades. This is noteworthy because the teacher in the ninth grade used the AEI only six times, whereas the teachers in the other grades used it 10 times each, raising questions about the need for adequate experience with new instructional procedures for students to fully benefit.
Satisfaction and Confidence Results

Student Satisfaction

Based on their responses to the satisfaction survey, students were most satisfied that the intervention helped them use the knowledge they already had to think about science issues. On the other hand, the lowest satisfaction scores were awarded to the questions regarding whether the AEI helped them study for tests, do well on tests, or improve their grades. Their feedback on studying and doing well on tests may indicate that students had accurate impressions that they were not likely to encounter questions associated with the higher-order reasoning required for argumentation on their regular classroom tests. Students' perception that the intervention was
not highly likely to help improve their grades requires closer analysis. Most teachers used the intervention ten times, often representing more than 10 days of instruction. Therefore, it is worth exploring whether the teachers did not, in fact, award a considerable grade weighting to the argumentation tasks, or, if they did, whether this was not shared with the students. Students need an accurate view of what the teacher considers important and for what they will be held accountable.

Student Confidence

In general, mean student confidence ratings fell between "neither confident nor not confident" and "somewhat confident." Students of teachers in the experimental condition awarded the highest mean score to the question that asked how confident they were that they could explain their judgments about science issues to others. This is an important finding in light of recent emphasis on helping students acquire the ability to conduct explorations of claims and arguments and share the results with others (CCSS 2010). Students awarded the second highest mean rating to the question asking how confident they were with the correctness of the steps they went through as they thought about a science issue. Finally, they awarded the lowest mean score to the question asking how confident they were with the correctness of their decision about the accuracy of a statement they heard or read about scientific issues. This points to the need for more practice activities designed to help students generalize their learning.
Teacher Satisfaction

For teachers, two relatively high mean satisfaction ratings were awarded to the questions indicating that the AEI enhanced students' development of critical thinking skills and that reasoning about claims and conclusions was important to their content area. Furthermore, teachers awarded a higher mean score to their belief that the procedure improved students' abilities to analyze science data and develop conclusions from their own science investigations than they did to students' ability to analyze science-related issues presented in the media outside of class. This lower mean rating parallels the low mean rating awarded by students relative to their ability to generalize their reasoning to settings outside the classroom. As a result, students may need more instructional support from teachers, and higher expectations that they will learn to analyze real-world issues and generalize the use of higher-order reasoning.

The teachers awarded a low mean satisfaction rating to a question exploring whether they had time to incorporate
the entire AEI into their instruction. Teachers, in general, indicated they would not use the entire routine in the future. It is difficult to incorporate any new intervention into instruction, but particularly one that requires the use of higher-order reasoning. It is, of course, possible that, given recent emphases on higher-order thinking and reasoning such as that associated with argumentation, such interventions may well become more prominent. Nevertheless, teachers must have adequate time to prepare and implement new instructional procedures with fidelity.

Relative to this study, Ysseldyke (2009) emphasized the importance of treatment integrity. He contended that when effective treatments are implemented with fidelity or integrity, the treatments have a strong effect on student outcomes; when that is not the case, research results can be misleading, making us think that a treatment was not effective when, in fact, it was the implementation that was not effective. This may become an important issue for this intervention if teachers pick and choose which components to use, especially if they omit the higher-order thinking components related to analyzing and evaluating evidence and reasoning. When research studies provide evidence that a package of instructional interventions serves to help students learn, fidelity of implementation, or an accepted range of modifications, must become an important consideration.

Another important issue involves teachers' perceptions that their students did not particularly enjoy the argumentation and evaluation intervention and activities. Student enjoyment is, indeed, a concern in education. However, Ysseldyke (2009) raised another important issue relative to the importance of challenging all students as they learn. He contended that instruction should be matched to the level of each student's ability and that all students should be working at what Vygotsky (1978) called their zone of proximal development. In this study, student enjoyment and ease of learning may have been a greater consideration for teachers than challenging students to think and reason well. The issues of enjoyment of the learning process and ease of learning require future study.
Limitations and Future Research

Limitations

Despite significant differences found for the mean total score and three of the four mean subscale scores in favor of students in the experimental condition, the rubric scoring method raises concerns. The scoring utilized a rubric with four points (0-1-2-3); the highest score, 3, indicated that the student's performance was "very good and met standards," and a score of 2 indicated that
students were "making good progress toward improvement." In the posttest scores for students in the experimental condition, only 3 of the 10 items reached mean scores above 2.0. Therefore, more refinements may be needed to identify the nuances of the incremental steps toward student progress. In addition, refinements are needed for the scoring rubric itself. Items 5 and 6 had weighted Kappa scores that indicated only moderate agreement. Further revision of the scoring rubric was undertaken with a new random sample of 20 % of the participants. For this sample and the new scoring rubric for the two items, the reliability scores improved: For item 5, the percent agreement was 78 and the weighted Kappa was .78; for item 6, the percent agreement was 84 and the weighted Kappa was .77. These Kappa scores represent excellent agreement.
Future Research

Future research might address the treatment integrity issues raised by Ysseldyke (2009). Teachers reported that they might implement the procedures by picking and choosing among components. Future studies might compare implementation of the total package versus partial implementation of the instructional intervention to determine what, if any, differences in students' performance outcomes occur. This could lead to guidelines for modifications that are acceptable but that do not alter the core components of instruction.

In a related issue, future research is needed to determine the amount of support teachers need to adopt the AEI and incorporate it into ongoing instruction. Despite their recognition that the AEI enhanced students' understanding of science, development of inquiry and critical thinking skills, and ability to analyze data and develop conclusions, teachers had concerns about the time needed to incorporate the routine into their instruction. Future research is needed to determine whether this perception was associated with the challenges of adopting a new instructional procedure, and whether using the AEI in subsequent years would be perceived differently. If not, more supports may be needed to facilitate the use of the AER. This support may come from department and school communities of learning or from prepared sample argumentation materials and guides.

For students, research is needed to enhance their understanding of components of logic, such as those incorporated into the AER. In particular, students may need support in recognizing that claims may be warranted based on correlational reasoning or causal reasoning, that the two
must be carefully distinguished, and that causal reasoning is the more powerful of the two. Furthermore, research could be conducted to determine whether the findings of no significant differences between students in the experimental and control conditions on abilities related to identifying qualifiers resulted from the ability of both groups to identify qualifiers or whether more instruction in the AER must be devoted to qualifiers.

In addition, research could be conducted to determine whether students can acquire knowledge of the steps of a strategic approach when it is used primarily to teach content and can transfer use of the strategy to new settings. To that end, some authors have recommended that students develop a deep understanding of science and use general strategies in particular scientific domains (Reiser et al. 2001). Others have emphasized the importance of reasoning processes for either domain-general or domain-specific knowledge (Klahr et al. 1993). Building on this thinking, the expanded scope and use of general strategies, domain-specific strategies, and cross-curricular higher-order reasoning strategies might be explored. Such research might focus on strategic approaches that support transfer and generalization from one content area, domain, or discipline to another.

Students' exposure to the intervention is a related issue in need of future research. In this study, teachers were asked to implement the new intervention at least 10 times; students were tested once before instruction and once at the end of instruction. Research is needed to explore the optimal number of times students need to experience the intervention to benefit. This might involve a design in which students are tested after each implementation of the procedure, or at intervals, rather than the pretest/posttest design used in this study. This issue was raised by the lower performance of students in the class of the teacher who implemented the intervention 6 times rather than the 10 times used by other teachers. Her students performed at lower levels than any other grade level. Therefore, exposure to instruction needs to be explored.

In addition, grade level differences might benefit from further study. Although students at the sixth-grade level, on average, achieved higher scores than any other group, teacher differences may have played a role. Only one teacher at the sixth-grade level participated. Therefore, exploration of the AEI with multiple teachers across multiple grade levels could provide additional valuable insights.

Given student ratings, more research on student satisfaction and confidence levels is also needed. Introduction of new instructional procedures is challenging for both
teachers and students. It is possible that students do not rate any instructional procedure highly when a new and difficult learning challenge is introduced, or when a teacher is learning to implement the procedures in a research setting. As a result, familiarity with teachers' use of an instructional procedure may result in higher student satisfaction and confidence ratings. Furthermore, students' feedback to teachers, possibly in the form of easily administered surveys, might provide teachers with insights regarding students' learning needs.

In general, this study supports the use of the Argumentation and Evaluation Intervention in inclusive general education science classes with groups of students of diverse ability and achievement levels, different genders, and different grade levels to help students learn and use the components of argumentation. Overall, the study supports ongoing calls for ways to support students as they think about and analyze claims and arguments in science classes and the real world. Research can continue to analyze argumentation needs presented in the Common Core State Standards and Science Framework, in addition to the Next Generation Science Standards (Achieve 2013), and to refine and develop assessments, instruction that supports these goals and standards, and professional development supports for teachers.

Acknowledgments This material is based upon work supported by the National Science Foundation under Grant Number 0554414. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Appendix 1: The Dangers of Drinking Soda

When I was growing up, my mother always told me that drinking soda pop was bad for my health. My friends and I felt good and thought drinking pop was fine. We stopped by the drug store on the way home from school to get a cherry Coke several times a week. Mom warned me about the "empty calories" in the sugar and the link between
sugar and tooth decay. It turns out that there may be an even more important reason for young girls to avoid carbonated drinks—drinking too many of them causes women to have weaker bones.

In Boston, a Tufts University study found women drinking three 12 oz. colas a day—either regular or diet—had 5 percent less bone density than women who drink less than a serving a day. Less dense bones are weaker bones. At first researchers thought the weaker bones were caused by the women drinking soda instead of milk, but they found out the amount of milk they drank did not change (http://www.vitabeat.com/study-soda-thins-bones-in-women/v/4611/).

Another study of 2,500 people concluded that women who drank carbonated beverages had low bone mineral density in their hips. This was true for women who drank regular or diet soft drinks, but not for the men (http://engineering.curiouscatblog.net/2006/10/07/another-strikeagainst-cola/).

A third study done by the Harvard School of Public Health in 2000 used 460 high school students and concluded that girls who drank carbonated soft drinks were three times as likely to break a bone than those who consumed other drinks (http://www.prevention.com/cda/article/the-case-against-soda/d9bc50d1fa803110VgnVCM10000013281eac/health/healthy.living.centers/diabetes/0/0/4).

Researchers give two reasons for the connection between drinking soda pop and weaker bones. First, soda contains a chemical called phosphoric acid, which may interfere with the body's ability to use calcium to make strong bones. Second, drinking a lot of soda is likely to leave you with little appetite for milk, vegetables, protein and other food that your body needs (http://www.diabeticsfightback.com/SodaHazardTomlison.html).

So, it turns out my mother was right when she warned me about drinking soda. Not only should she have been concerned about the 10 teaspoons of sugar in a can of pop, but also about the risk of drinking soda causing me to have weaker bones.
Appendix 2: Pretest/Posttest
Evaluation of a Claim

Name _______________________________________   Date _________________
Teacher _____________________________________   Hour _________________

Directions: Read the article on the next page and answer the following questions:

1. What CLAIM is made in the article?
2. What QUALIFIERS to the claim (if any) are made?
3. What EVIDENCE is used to support the claim?
4. What TYPES OF EVIDENCE are used to support the claim?
5. How well does the QUALITY OF EVIDENCE support the claim? Explain your evaluation.
6. What CHAIN OF REASONING proves that the evidence supports the claim?
7. What TYPE of reasoning is used to support the claim?
8. How well does the QUALITY OF REASONING support the claim? Explain your evaluation.
9. What are your CONCERNS about the believability of the claim?
10. Indicate whether you would ACCEPT, REJECT or WITHHOLD JUDGMENT of the claim made in this article. EXPLAIN your evaluation that led you to either accept or reject the claim. (Use the back of this page for your explanation.)
Appendix 3: Argumentation and Evaluation Scoring Rubric
Name: ______________________________ Teacher: ______________________________
Date: ______________________ Topic: ______________________

Scoring guidelines for each step (a score of 0-3 is recorded for each step):
0 = Poor, 1 = Needs Improvement, 2 = Good (Progress Toward Improvement), 3 = Very Good (Meets Standards)

1 Claim
0: Student gives no response.
1: The student response inaccurately identifies the claim being made or writes a response not structured as a claim.
2: The student partially identifies the claim being made.
3: The student accurately identifies the claim being made.

1 Qualifier
0: Student gives no response.
1: The student response fails to accurately identify qualifier(s) within the claim OR fails to state there are no qualifiers present.
2: The student partially identifies qualifier(s) within the claim that are present.
3: The student accurately identifies most of the qualifier(s) within the claim OR correctly states that none are present.

2 Evidence
0: Student gives no response.
1: The student response identifies evidence that fails to support the claim.
2: The student accurately cites some evidence used to support the claim.
3: The student accurately identifies most of the evidence used to support the claim.

3 Identifying Types of Evidence: Data, Fact, Theory or Opinion
0: Student gives no response.
1: The student response fails to accurately identify any types of evidence.
2: The student accurately identifies some types of evidence.
3: The student accurately identifies all of the evidence as data, fact, theory, or opinion.

4 Evaluation of Quality of Evidence
0: Student gives no response.
1: The student response fails to accurately evaluate OR discuss the quality of the evidence.
2: The student evaluates and discusses some of the quality of evidence OR indicates that quality was not relevant.
3: The student evaluates and discusses the quality of evidence (i.e., validity, reliability, objectivity/bias or controlled experiment).

5 Chain of Reasoning (Warrant)
0: Student gives no response.
1: The student response fails to explain the author's reasoning connecting the evidence to the claim.
2: The student explains some of the author's reasoning connecting the evidence to the claim.
3: The student explains the author's reasoning connecting the evidence to the claim (i.e., authority, theory, or types of logic such as generalization, analogy, correlation, or cause and effect).

6 Identification of Types of Reasoning
0: Student gives no response.
1: The student response fails to accurately identify types of reasoning.
2: The student accurately identifies some types of reasoning.
3: The student accurately identifies the types of reasoning used (i.e., authority, theory, or types of logic, such as generalization, analogy, correlation, or cause and effect).

7 Evaluation of Quality of Reasoning
0: Student gives no response.
1: The student response fails to accurately evaluate the quality of reasoning OR explain his/her evaluation.
2: The student evaluates some of the quality of reasoning and/or explains some of his/her evaluation.
3: The student evaluates the quality of reasoning AND explains his/her evaluation (i.e., authority, theory, or types of logic, such as generalization, analogy, correlation, or cause and effect).

8 Concerns of the student
0: Student gives no response.
1: The student response raises no new relevant concerns.
2: The student raises some new relevant concerns.
3: The student clearly raises new relevant concerns AND expresses them as counterarguments, rebuttals or new questions OR states there are none.

9 Conclusion and explanation about the Claim
0: Student gives no response.
1: The student response neither makes a conclusion to accept, reject or withhold a decision about the claim NOR provides an explanation of his or her reasoning.
2: The student makes a conclusion to accept, reject or withhold a decision about the claim OR provides an explanation about his or her reasoning.
3: The student makes a conclusion to accept, reject or withhold a decision about the claim AND provides an explanation for his or her reasoning.

© Bulgren & Ellis, 2011
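The rubric yields a per-step score of 0-3 across ten rows, so a single response can earn up to 30 points. The following minimal sketch, which is not part of the published instrument, shows one way the rubric could be encoded and tallied if scores were recorded electronically; all identifiers here (RUBRIC_STEPS, LEVELS, total_score) are illustrative assumptions rather than anything used in the study.

    # Illustrative Python sketch for tallying Appendix 3 rubric scores.
    RUBRIC_STEPS = [
        "Claim",
        "Qualifier",
        "Evidence",
        "Identifying Types of Evidence",
        "Evaluation of Quality of Evidence",
        "Chain of Reasoning (Warrant)",
        "Identification of Types of Reasoning",
        "Evaluation of Quality of Reasoning",
        "Concerns of the Student",
        "Conclusion and Explanation about the Claim",
    ]

    # Performance levels follow the rubric header: each step is scored 0-3.
    LEVELS = {
        0: "Poor",
        1: "Needs Improvement",
        2: "Good (Progress Toward Improvement)",
        3: "Very Good (Meets Standards)",
    }

    def total_score(step_scores):
        """Sum the per-step scores (each 0-3) for one student response."""
        for step in RUBRIC_STEPS:
            if step_scores.get(step) not in LEVELS:
                raise ValueError("Each step needs a score of 0-3; missing or invalid: " + step)
        return sum(step_scores[step] for step in RUBRIC_STEPS)

    # Example: a response scored at level 2 on every step earns 20 of a possible 30 points.
    print(total_score({step: 2 for step in RUBRIC_STEPS}))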