Using Science Notebooks to Assess Students’ Conceptual Understanding

Pamela R. Aschbacher & Alicia C. Alonzo
CAPSI, California Institute of Technology, Pasadena, CA 91125
pama@caltech.edu, acalonzo@caltech.edu

Paper presented at the Annual Meeting of the AERA San Diego, April 15, 2004 Session 52.027: Assessment for Reform-Based Science Teaching and Learning

The project on which this paper is based was supported by a grant from the National Science Foundation (REC 0106994). The findings and opinions expressed in this report do not necessarily reflect the positions or policies of the agency. We greatly appreciate the students, teachers, and district science coordinators who participated in this study, and we also wish to acknowledge the following CAPSI staff and associates for their contributions to the project from which this paper is drawn: Jerry Pine, Ellen Roth, Melanie Jones, Cameron McPhee-Baker, Marie Rodriguez, Laurie Thompson, Claire Haagensen, Leila Gonzalez, Linda Endicott, Joy Hinckley, Saskya Byerly, and Ashley Grant. We also thank Shirley Magnusson and Annemarie Palincsar for their generous advice to our project.

Introduction

In the past decade, national reform documents on science education have advocated a significant shift in emphasis from earlier visions of what students should learn in science classrooms. Instead of memorizing isolated scientific facts, the standards call for students to demonstrate deep conceptual understanding of a few fundamental ideas (AAAS, 1993; NRC, 1996). The success of these reform efforts depends in large part on assessment, at both the classroom and large-scale levels. In the classroom, assessments provide feedback to teachers and students to improve teaching and learning. This feedback is a critical component of effective teaching (Black, 1998; Harlen, 1990). Without information about student understanding, teachers' efforts to improve practice are limited. Especially when implementing new ways of teaching, it is not enough for the teacher to consider what he or she did; to evaluate how a reform is succeeding in the classroom, the teacher must consider what was actually understood by the students (Hein & Price, 1994). Just as traditional forms of teaching are inadequate to address the learning goals embodied in current reform efforts, traditional forms of assessment are ill-equipped to assess the types of understandings students are expected to demonstrate (Herman, Aschbacher & Winters, 1992).

This paper explores the potential of a particular type of classroom-level science assessment, notebooks, to help teachers monitor their students' complex conceptual understanding. The use of notebooks as part of science instruction (collections of student writing during and after hands-on investigations) has been encouraged in many districts over the past two decades.
This approach is supported by a number of researchers who advocate writing in science to enhance student understanding of scientific content and processes, as well as general writing skills (Baxter, Bass & Glaser, 2000, 2001; Keys, Prain, Hand & Collins, 1999; Rivard & Straw, 2000; Shepardson & Britsch, 1997). Additional support for the instructional value of notebooks comes from the literature on expertise, which suggests that competence is developed through active construction of knowledge, including explaining concepts to oneself or others (Chi, 2000).

Given their close ties to instruction, science notebooks seem to have potential as assessment tools for at least two reasons:

1) Notebooks are an embedded part of the curriculum. Teachers can obtain information about student understanding at any point in time without spending precious instructional time creating their own quizzes or giving an external assessment.

2) Because they are embedded in the curriculum, notebooks are a direct measure of student understanding of the implemented curriculum, and are thus directly relevant for formative assessment purposes.

Despite notebooks' potential as a formative assessment tool, research suggests that teachers tend not to use notebooks in this way, i.e., to improve teaching and learning (Baxter, Bass & Glaser, 2000; Ruiz-Primo, Li & Shavelson, 2001; Alonzo, 2001). In our experience, most teachers receive little if any professional guidance in how or why to use notebooks effectively, and it is well known that few teachers have had much if any background in assessment (Stiggins, 1991).


Several researchers have explored the relative value of notebooks and other methods of assessment. For example, Baxter & Shavelson (1994) found that scores on science notebooks were comparable to observations of hands-on tasks (group means were similar, and correlations were between .75 and .84), while correlations between hands-on tasks and other assessment methods were lower (.28 to .53). They concluded that notebooks were useful tools and that each method may measure different yet related aspects of science achievement. Sugrue, Webb & Schlackman (1998) suggest that for complex concepts, open-ended written responses can uncover more misconceptions than multiple-choice items reveal. Ruiz-Primo, Li & Shavelson (2001) explored the relationship between overall notebook performance and overall performance on other measures. But teachers need to be able to monitor and diagnose student understanding of specific concepts. This paper examines the utility of notebook evidence of specific conceptual understanding for formative assessment. We also examine how teachers' patterns of practice (Aschbacher & Roth, 2002; Spillane & Zeuli, 1999) with science notebooks may affect their utility as a formative assessment tool.

Our Approach

This research is part of a three-year design experiment to study whether and how rigorous use of science notebooks may improve teaching practices and student achievement in science. Four districts that had already been using science notebooks expressed an interest in participating in research on how to use them more effectively. A collaborative team of science coordinators, expert teachers, and researchers has been developing and studying several iterations of a model for notebook design and implementation to accompany hands-on inquiry science curricula in 4th and 5th grade classrooms. This paper is based on some of the data from Year 2 of that study.
In the course of this work we defined student notebooks as a set of entries written by the student that reflect inquiry experiences within the science classroom; thus they reflect both teaching and learning as they occur. In professional development for teachers, we have advocated that entries include a research question (also called a "focus question") for the inquiry, records of data collected during an investigation, organization of data to facilitate analysis, knowledge claims that relate to the original research question, and evidence, involving both data and reasoning, to support those claims. Notebooks may also include other entries, such as predictions, materials and procedures, and questions for future investigation. Teachers were encouraged to experiment with using notebooks for other types of science writing related to the curriculum, and some included entries such as letters written to another class about what students had investigated and discovered.

We posed three questions for this analysis:

1) How well do notebook scores predict other measures of students' conceptual understanding?

2) How does teacher use of notebooks affect their utility as a formative assessment tool?

3) What are the major challenges teachers face in using notebooks to assess students' conceptual understanding?


Method

Overview

This paper utilizes both quantitative and qualitative data from several sources collected during and after the teaching of an elementary hands-on science unit during the 2002/03 academic year. Primary data sources include student science notebooks, multiple-choice pre- and post-tests of unit concepts, a performance assessment of one concept embedded in the middle of the unit, and teacher interviews. Four districts participated in the study. They were drawn from a network of a dozen districts in the state that were trying to establish and maintain hands-on elementary science programs. The majority of students come from low-income families. Many of the students are also English language learners. (See Table 1.)

Table 1. District Characteristics

             K-5 students   % FRPL*   % English Language Learners   Location
District A       5,428        56                28                  Suburban
District B      11,693        69                32                  Urban
District C       5,319        75                20                  Suburban/rural
District D       8,632        74                56                  Rural
State        2,902,294        47                25

*FRPL indicates students eligible for free or reduced price lunch.

The four districts in the study had been encouraging, but not requiring, teachers to use science notebooks prior to this study. With limited resources for science professional development, districts had trained only some teachers in how to use notebooks in science. Principals' understanding of and support for science notebook use was limited, and everyone was feeling the pressures of the accountability system in language arts. All four districts were eager to join the study on how to use science notebooks more effectively, and they hoped that improved use could support both science and literacy goals. One science unit the districts had in common at the 4th or 5th grade was Circuits and Pathways (Education Development Center, 1990), which became the target unit for this study.

Participants

Two groups of teachers and their 4th and 5th grade students participated in this study. One group, the "Protocol Teachers" (PTs), were experienced teachers, nominated by their district science coordinators and principals, who agreed to work with us for 2-3 years. Four PTs had already helped train other teachers in their districts in science. These teachers were selected as PTs with the goal that the extensive professional development they would receive over the course of the project would help them coach other teachers in their districts after the study. By Year 2 of the study, they ranged in teaching experience from 7 to 25 years, with an average of 15 years. All but one had taught the target unit before, and all were experienced in teaching hands-on science at the target grade level and in using notebooks per district recommendations. For our study, they participated in our professional development, agreed to implement


project ideas about notebooks during the target unit, let us collect data in their classrooms, and were interviewed about their work. Data from 8 of the PTs are used in this paper.

The second group of teachers, "Regular Teachers" (RTs), provide a comparison group that illustrates typical district practice for this paper (n=17). These teachers were also recruited by the science coordinators, principals, and PTs and volunteered to participate in the study during Years 2 and 3. All of them had had some (minimal) district professional development on how to use the science curriculum and notebooks. Nine of them were "experienced" teachers (ETs) who had taught from 4 to 27 years (13 on average) and who had taught the unit and grade level before. Eight of them were considered "novice" teachers (NTs), who had only 1 to 3 years of teaching experience (with the exception of two who had 5 and 7 years of experience but had never taught the target unit and grade level before). During Year 2 of the study, RTs taught the target unit and used science notebooks as recommended by the districts, without any guidance or professional development from us, thereby providing a baseline measure of typical practice in these districts.

Professional Development

In Year 1 (Y1), PTs participated in 2 days of professional development and 3 study group meetings. In Year 2 they had 1 day of professional development prior to teaching the target unit, and during the middle of the unit they had an additional day of professional development and 2 more study group meetings. Teachers were paid $150/day stipends for the professional development workshops and $25/hour for the hour-long after-school study groups held while they were teaching the unit. Professional development for the project was conducted by a team of CAPSI researchers, district science coordinators, and two consultants with strong experience in elementary science professional development.
The workshops included hands-on learning experiences for teachers that addressed the science content of the unit, the purposes and content of science notebooks as they relate to the nature of scientific inquiry, analysis of student notebook entries to assess learning and revise instruction, and productive feedback strategies and their rationale. Study group meetings gave teachers additional opportunities to examine their students' notebooks for evidence of conceptual understanding, to practice giving feedback to students in their notebooks, and to extend their own science content knowledge through hands-on experiences and discussion.

Data Sources

Data for this paper include ratings of students' science notebooks, scores on multiple-choice pre- and post-tests of unit concepts, and scores on a performance assessment designed for the unit called "What's My Circuit?" (WMC). For this paper, we use the work of 10 students per class, randomly sampled from those who had completed all assessments and had parent permission to participate. A few students were excluded because they were such recent immigrants that they were unable to take the tests or write their notebooks in English.

Notebooks. For the purposes of the larger study we developed a set of scoring rubrics to rate various aspects of student conceptual understanding and inquiry process, as well as classroom opportunity-to-learn variables. Analyses for this paper deal with three


conceptual understanding scales: understanding of simple, series, and parallel circuits, which are summarized in Figure 1 in the appendix. Each of the concept scales addresses several sub-ideas that relate to the central concept. For each concept, students were given a point if their notebooks contained drawings showing that they had successfully constructed the type of circuit in question. Additional points were awarded for each of the sub-ideas that were evident in their entries for relevant notebook lessons. The simple circuit scale has a total of 7 points, the series circuit scale has 5, and the parallel circuit scale has 4 points. We developed precursors of these scales and used them in Year 1, then revised them to the current form for use in Years 2 and 3.

One of our scientist-researchers who helped create the scales trained two research assistants (RAs) with very strong science backgrounds to use the scales. All 3 raters scored the first 4 classes of notebooks and reached consensus on each one; by this point they had reached 80% overall agreement. Since it was easy to overlook evidence in student notebooks for a variety of reasons, both RAs read each of the sampled notebooks in the remaining classes and came to consensus on the scores. As they worked, the scientist-researcher also scored 20% of the sampled notebooks in each class and met with them regularly to check the reliability of their ratings and to recalibrate as necessary. (See Table 2.)

Table 2. Reliability of Notebook Scores on Conceptual Understanding

                                                      2 trained RAs'   2 RAs' consensus
                                                      agreement        w/ senior rater
Could student build circuit (yes or no)               81.6%            88.9%
Content understanding (of sub-ideas) over 3 scales*   82.2%            81.3%

*Agreement on this part of the scale is plus or minus one point.
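The agreement statistics in Table 2 are simple percent-agreement figures: exact agreement for the yes/no "could the student build the circuit" judgment, and agreement within plus or minus one point for the content scales. A minimal sketch of that computation (the rater scores below are hypothetical, for illustration only; they are not the study's data):

```python
# Percent agreement between two raters' scores.
# All scores below are hypothetical, for illustration only.

def percent_agreement(rater_a, rater_b, tolerance=0):
    """Share of items (as a percent) on which two raters agree within `tolerance` points."""
    assert len(rater_a) == len(rater_b)
    hits = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return 100.0 * hits / len(rater_a)

# Yes/no judgment (could the student build the circuit): exact agreement.
build_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
build_b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print(round(percent_agreement(build_a, build_b), 1))  # 90.0

# Content scale (e.g., 0-7 points for simple circuits): agreement within +/- 1 point.
content_a = [5, 3, 6, 2, 4, 7, 1, 3]
content_b = [4, 3, 7, 2, 2, 6, 1, 4]
print(round(percent_agreement(content_a, content_b, tolerance=1), 1))  # 87.5
```

The `tolerance` parameter captures the "plus or minus one" convention noted under Table 2; setting it to 0 recovers exact agreement.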

Multiple-choice test. In Year 1 of the study we developed a multiple-choice test of unit concepts, drawing on ideas from assessments used in research by Palincsar and Magnusson for their GisML unit of study on electricity (Magnusson, Palincsar, Ford, Lomangino, Hapgood, & MacLean, 2000), from research by Sugrue, Webb and colleagues (e.g., Sugrue, Webb & Schlackman, 1998), and from the pre- and post-assessments included in the Insights curriculum. We refined our Y1 version for use in Y2. Eleven of the 14 items used in Y2 are relevant to the 3 concepts we explore here: 5 items addressed knowledge of simple circuits (4 points); 4 items addressed series circuits (4 points); and 2 addressed parallel circuits in combination with series circuits as taught in the curriculum (2 points). The same test was given prior to the unit and after the last day of the unit.

Performance assessment. WMC was designed several years ago by Jerry Pine and Gail Baxter to accompany the target unit in one of the districts and was then adopted for use as an embedded assessment activity in some of the other districts. We adapted the assessment for administration by teachers in Y1 of our study, then refined it and administered it ourselves in Y2 for greater standardization. WMC was administered in the middle of the unit. In WMC, students are to figure out the configurations of 2 mystery circuits (each built inside a box with only a lightbulb showing). They are given similar materials to make their own circuits for comparison purposes. The questions related to series circuits and used in this analysis are: drawing 3 circuits accurately (6 points),


explaining their reasoning (3 points), and generalizing a rule for the brightness of the circuits related to the materials (3 points).

Table 3 lists the sub-ideas for understanding the three types of circuits analyzed for this paper and indicates which ideas are measured by the notebook ratings, post-test, and/or WMC performance assessment.

Table 3. Ideas Measured in Notebooks, Performance Assessment, and Pre/Post Test

Simple circuit: simple circuit drawing; complete circuit; directional flow of electricity; conceptual understanding of CCPs¹; 1-3 CCPs; 4th CCP; flow of electricity through the bulb.

Series circuit: series circuit drawing; single pathway; adding bulbs; adding batteries; remove bulb; relative brightness of bulbs in series circuit.

Parallel circuit: parallel circuit drawing; multiple pathways; brightness of bulbs (compared to series or simple circuits); remove bulb.

(For each sub-idea, the table marks which of the three measures addresses it: the notebook ratings, the WMC performance assessment, and/or the pre/post test.)

Results and Discussion

Analyses of our data suggest that notebooks have some validity as measures of classroom learning, but the value of that measurement depends heavily on how teachers use them.

How well do notebook scores predict other measures of students' conceptual understanding?

Table 4 presents student achievement on three concepts measured by the notebooks and the multiple-choice post test. The table also includes achievement on the series circuit concept measured by the WMC performance assessment. All scores were normalized to 100 so that we could more easily compare them and refer to the percent of points obtained. The table compares classes taught by the three groups of teachers in the study: PTs with special training, untrained experienced teachers (ETs), and untrained novice teachers (NTs). The latter two groups represent the range of current

¹ The critical contact points (CCPs) are the points on the battery and bulb that must be in contact with each other or with the wire to complete a simple circuit.


baseline practice in the districts studied. For simplicity's sake, pretest scores are not included in the table. However, there were only very small differences among the three teacher groups on each pretest concept score (differences ranged from only 1 to 6%), so the three groups of classes (trained and untrained, experienced and novice) were quite comparable at the beginning of the unit. Since ETs' and NTs' achievement levels were similar, some of the subsequent analyses combine these groups into one group of regular (baseline/untrained) teachers (RTs).

Table 4. Student performance on different assessments of three electrical circuit concepts. Entries are mean percent of points (SD).

                      MC Post Test                         Notebooks                          WMC
Teachers              Simple     Series     Parallel      Simple     Series     Parallel     Series
PTs (N=77)            73 (16.8)  78 (30.9)  29 (37.5)     41 (29.8)  36 (35.1)  31 (27.9)    42 (20.4)
ETs (N=89)            68 (21.4)  60 (40.3)  17 (28.3)     10 (19.1)  17 (25.3)  14 (20.5)    32 (20.8)
NTs (N=79)            65 (18.6)  54 (36.7)  12 (25.6)     20 (24.7)   6 (12.1)   7 (10.3)    25 (18.3)
RTs (ET+NT) (N=168)   67 (20.2)  57 (38.6)  15 (27.1)     15 (22.4)  12 (20.9)  10 (16.9)    28 (20.0)
All T's (N=245)       69 (19.4)  64 (37.6)  19 (31.4)     23 (27.7)  19 (28.4)  17 (23.0)    33 (21.0)
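The normalization behind Table 4 is a rescaling of raw scores to percent of total possible points, so that scales with different maxima (7, 5, and 4 points for the three notebook concept scales) can be compared directly. A minimal sketch, using the scale maxima described earlier and hypothetical raw scores for one student:

```python
# Rescale raw scores to percent of total possible points so that scales with
# different maxima are directly comparable.
# Scale maxima follow the notebook concept scales described in the Method
# section; the student's raw scores are hypothetical.

def normalize(raw, max_points):
    return 100.0 * raw / max_points

scale_max = {"simple": 7, "series": 5, "parallel": 4}
student = {"simple": 3, "series": 2, "parallel": 1}  # hypothetical raw scores

normalized = {k: round(normalize(v, scale_max[k]), 1) for k, v in student.items()}
print(normalized)  # {'simple': 42.9, 'series': 40.0, 'parallel': 25.0}
```

Group means of these normalized scores can then be read directly as "percent of points obtained," as in Table 4.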

These results show that regardless of the type of measure used, students taught by PTs showed greater understanding of the 3 target concepts than students taught by either the untrained experienced or the untrained novice teachers. These results suggest that it was the special training, rather than prior teaching experience, that accounted for the better performance of the PTs' students. Later we will look at how trained teachers used notebooks, to examine what kinds of practices make notebooks more or less useful as assessment tools. Figure 1 displays some of the results from Table 4 and represents the combined ET and NT classes as RTs for easier comparison.

Fig. 1. Multiple Measures of Student Conceptual Understanding in Trained and Untrained Teachers' Classes. [Bar chart comparing notebook (NB), post-test (PO), and WMC scores for PT and RT classes on the simple, series, and parallel circuit concepts; values are drawn from Table 4.]

For parallel circuits, the two kinds of measures (notebooks and post test) agreed quite closely about the performance level of each group. For simple and series circuit concepts, however, notebooks and the multiple-choice test portrayed somewhat different pictures of student understanding, or perhaps measured somewhat different abilities. Notebooks gave a somewhat harsher view of conceptual understanding than the post test, particularly for simple and series circuits. For example, PT classes got about 40% of the notebook simple circuit points, whereas they got 73% of the simple circuit points on the post test. For RT classes, there was an even greater disparity between results for the different assessment methods: students in these untrained teachers' classes received rather low scores on the notebook concept ratings (10-15%) but much higher scores on the post test (54 to 68%).

Regression analyses on all classes combined confirmed that the notebook scores predict performance on other measures, but they account for only a very small amount of the variance in each case. (See Table 5.) When regressions were run to compare the predictive value for PTs and RTs, notebook scores were consistently more predictive for PT classes than for RT classes. We believe that training in how to use notebooks was largely responsible for this difference.

Table 5. Summary of Regression Analyses, Using Notebook Scores to Predict Scores on Other Measures

Predictor     Dependent Variable   R Square   Significance Level
NB Simple     Post Simple          .058       .001
NB Series     Post Series          .036       .003
NB Series     WMC Series           .109       .001
NB Parallel   Post Parallel        .047       .001
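For a single-predictor regression like those summarized in Table 5, R² equals the squared Pearson correlation between the predictor and the outcome. A stdlib-only sketch of that computation, using made-up normalized scores for a handful of students (not the study's data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical normalized (0-100) scores for eight students.
nb_series = [36, 10, 55, 20, 0, 45, 30, 15]    # notebook series-circuit score
post_series = [60, 40, 80, 55, 30, 70, 50, 45]  # post-test series-circuit score

r = pearson_r(nb_series, post_series)
r_squared = r * r  # with one predictor, R^2 is just r^2
print(round(r_squared, 3))
```

With real classroom data the same computation would be run per concept; the small R² values in Table 5 indicate that notebook scores, while significant predictors, leave most of the variance in the other measures unexplained.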

There are several reasons why notebooks might have somewhat limited predictive power. Notebook entries reflect learning in real time, whereas the post test is a summative measure. Of course, learning is not instant; students don't usually understand a concept as soon as it is introduced. In fact, notebooks are useful in the classroom precisely because they can provide a window on students' emerging conceptions. Teachers can use them as a formative assessment tool to help decide when to revisit concepts or ideas in later units. But whether or not teachers explicitly re-teach earlier concepts in later lessons, students themselves can have insights about early concepts when they build new kinds of circuits in later lessons. This possibility is supported by comparing data from three sources for the series circuit concept: the NB series and WMC series scores were obtained at about the same point in time, and they are more highly correlated (r = .33) than the NB series scores are with the POST series scores (r = .19).²

Another reason we might not expect notebooks to predict post-test scores well is that the two measures may tap somewhat different abilities. Indeed, notebook entries are meant to be generative responses, whereas the post test is a multiple-choice test. The test involves mostly recognition: of drawings of three types of circuits, and of statements or drawings that represent ideas learned from hands-on experience about the relative brightness of bulbs in different circuits and the effect of removing a bulb. For example,

² 2-tailed Pearson correlations; both are significant at p
