When Science Counts as Much as Reading and Mathematics: An ...

5 downloads 68244 Views 402KB Size Report
Sep 3, 2012 - Twitter: @epaa_aape. Accepted: ...... Washington, DC: The National Academies Press. .... Michael W. Apple University of Wisconsin, Madison.
education policy analysis archives A peer-reviewed, independent, open access, multilingual journal

epaa aape Arizona State University

Volume 20 Number 26

September 3rd, 2012

ISSN 1068-2341

When Science Counts as Much as Reading and Mathematics: An Examination of Differing State Accountability Policies Eugene Judson Mary Lou Fulton Teachers College, Arizona State University United States Citation: Judson, E. (2012). When Science Counts as Much as Reading and Mathematics: An Examination of Differing State Accountability Policies. Education Policy Analysis Archives, 20 (26). Retrieved [date], from http://epaa.asu.edu/ojs/article/view/987 Abstract: Although only results from mathematics and reading assessments are required to be used when Adequate Yearly Progress (AYP) of schools is calculated, some states have elected to include science achievement results either in their AYP calculations or as part of a separate dual accountability system. This study examined 2009 National Assessment for Educational Progress (NAEP) results based on how states use, or do not use, science in their accountability programs. Consideration was given to the idea that including science achievement might detract from efforts, and consequently results, in mathematics and reading. Results from both fourth- and eighth-grade data indicated that states choosing to use science in their accountability calculations did not lose ground in those other subjects. Fourth-grade data indicates that the states using science in their accountability programs additionally had significantly higher science achievement than the other states. Journal website: http://epaa.asu.edu/ojs/ Facebook: /EPAAAAPE Twitter: @epaa_aape

Manuscript received: 11/07/2011 Revisions received: 03/15/2012 Accepted: 05/27/2012

Education Policy Analysis Archives Vol. 20 No. 26

2

Keywords: accountability; assessment; educational policy; science education; large scale assessment Cuando la ciencia cuenta tanto como lectura y matemáticas: un estudio de las políticas estatales diferenciadas de rendición de cuentas Resumen: Aunque sólo resultados de las evaluaciones de matemáticas y lectura se requieren para calcular el Progreso Anual Adecuado (por sus siglas en inglés, AYP) de las escuelas, algunos estados han optado por incluir los resultados en el área de ciencias, ya sea en sus cálculos de progreso anual adecuado o como parte de un sistema doble de rendición de cuentas. Este estudio examinó los resultados para 2009 de la Evaluación Nacional de Progreso Educativo (por sus siglas en inglés, NAEP) en función si los estados utilizaron o no los resultados en el área de ciencias en sus programas de rendición de cuentas. Se tuvo en cuenta la idea de que como los esfuerzos en el área de ciencias podrían disminuir los esfuerzos y por tanto los resultados en las matemáticas y la lectura. Los resultados tanto en cuarto y octavo grados indican que los estados que optaron por utilizar la ciencia en sus evaluaciones no perdieron terreno en los otros temas. Además, para el cuarto grado los datos indican que los estados que utilizaron datos del área de ciencias en sus programas de rendición de cuentas tuvieron logros significativamente mayores en ciencias que otros estados. Palabras clave: rendición de cuentas; evaluación; política educativa; educación científica; evaluación a gran escala Quando a ciência conta tanto como a leitura e a matemática: um estudo de políticas públicas diferenciadas de prestação de contas Resumo: Embora apenas os resultados para leitura e matemática sejam necessários para o cálculo de Progresso Anual Adequado (a sigla em Inglês, é AYP) de escolas, alguns estados optaram por incluir os resultados das ciências, em seus cálculos de AYP ou como parte de um sistema dual de prestação de contas. Este estudo analisou os resultados de 2009 da Avaliação Nacional do Progresso Educacional (a sigla em Inglês é NAEP), dependendo se os estados usaram ou não os resultados na área da ciência em seus programas de avaliação. Considerou-se que os esforços em ciência poderiam enfraquecer os esforços e, portanto, os resultados nas áreas de matemática e leitura. Os resultados tanto na quarta quanto na oitava série indicam que os Estados que decidem usar a ciência em suas avaliações não perderam terreno em outras disciplinas. Além disso, para a quarta série, a evidência sugere que os estados que utilizaram dados da área da ciência em seus programas de avaliação tiverem resultados na área de ciência significativamente maior do que outros estados. Palavras-chave: prestação de contas; avaliação; política educacional; educação científica; avaliação em grande escala

Introduction In the decade that has followed the passing of the No Child Left Behind (NCLB) Act, educators, policymakers, and researchers have often referred to a resulting narrowed curriculum. Typically, a narrowed curriculum refers to an overconcentration on the content areas of mathematics and reading and a diminished focus on other subjects, such as science and social studies. This is commonly believed to occur because achievement results from mathematics and reading contribute directly to the calculation of Adequate Yearly Progress (AYP) for schools and for local educational agencies (LEAs). When schools or LEAs miss AYP in successive years in the same area, such as mathematics achievement, then the institution is subject to intervention from the state department of education and if AYP is missed in the same area for six consecutive years, invasive actions such as removal of a school board or replacement of staff may occur (Northwest Regional

When Science Counts as Much as Reading and Mathematics

3

Educational Laboratory, 2004). Other measures such as the percentage of students tested, high school graduation rates, and attendance rates, also contribute to AYP calculations, but it is student achievement results from mathematics and reading that are the primary reasons schools and LEAs miss AYP (Hoff, 2009). Criteria for the use of mathematics and reading achievement results were laid out in the NCLB Act. A chief objective for the designers of NCLB was that 100 percent of students would meet or exceed grade level expectations in mathematics and reading by the 2013-14 academic year (No Child Left Behind Act, 2002; Spellings, 2007). Regarding science, the NCLB Act specified that science standards were to be in place by the 2005-06 school year and, by the 2007-2008 school year, states were to have in place science assessments to be administered at least once during grades 3-5; grades 6-9; and grades 10-12 (Gross et al, 2005). Thirty-one states administer science assessments to only the three grades minimally required. Mathematics and reading assessments are however administered to at least seven grades in all fifty states. Significant to this study, although it is a requirement for the states to have science standards in place and to assess science, there is no stipulation within the NCLB Act requiring states to use the results from their science assessments as part of accountability calculations. Despite there being no federal requirement to include science achievement into accountability calculations, eleven states have chosen to do just that. To take the figure of speech of the narrowing curriculum further, the author suggests the image of an accountability filtration system (Figure 1). On the left side of the accountability membrane it is envisioned there is an extensive curriculum scope where all content areas are represented equitably. However, the accountability membrane allows primarily just the mathematics and reading content to easily pass. The fraction of a content area present to the left or to the right of the membrane would perhaps be proportionate to the amount of effort dedicated to teaching and learning that particular subject. In this simplistic model it is predicted that student learning is balanced on the left side of the membrane where effort is distributed among the subjects and equivalent achievement is occurring. To the right of the membrane it may be believed that greater concentration of the two subjects of mathematics and reading leads to focused effort by teachers and students and subsequently greater achievement in these two most tested subjects. In this sense, the curriculum hasn’t been narrowed as much as it has been restricted.

math

technology

arts social studies P.E./health science reading

Figure 1. The Accountability Filter

A C C O U N T A B I L I T Y

math reading reading math math reading

Education Policy Analysis Archives Vol. 20 No. 26

4

However this filtration system operates differently among the states. In some states science is passing through the accountability membrane in addition to mathematics and reading. That is, a few states include science achievement directly in their accountability calculations alongside mathematics achievement and reading achievement. If a narrowed, or filtered, curriculum does lead to improved results in the subjects that pass through the accountability boundary, then there might be concern that achievement becomes more diffused in those few states where more than just two subjects are allowed to pass through. This might be a particular concern in elementary schools where greater latitude is often allowed regarding how much time and effort is spent on each subject. That is to say, if elementary teachers must be held accountable for three, instead of two subjects, the concern can immediately be that in order to attend to science, there will be a loss in mathematics or reading, or both. This concern for how science is attended to in classrooms and how that attention may be affected by state accountability policies does not simply result from a sentiment for fairness among the subject areas, but stems from broader national interest. Various national-level policy documents have promoted the importance of supporting effective science education programs as a means to strengthen national pride and invigorate the United States’ global competitiveness (e.g., National Governors Association, 2007; National Research Council, 2007). While these documents, and policies such as the America COMPETES Act, endorse an agenda of enriched science curricula and improved science teacher training, they have not impacted the practices of state departments of education calculating AYP. The intent of this study was to investigate effects of allowing more than the two subject areas of mathematics and reading to be part of accountability calculations. In comparing the few states that integrate science into their accountability calculations with all other states, two research questions were pursued: 1. Does science achievement of students differ between states that use science as part of their accountability calculations and states that do not use science in their accountability calculations? 2. Does mathematics achievement and reading achievement of students differ between states that use science as part of their accountability calculations and states that do not use science in their accountability calculations?

Background It can be generally agreed that a primary intention of the NCLB legislation was for all children to achieve specific proficiency standards in mathematics and reading. This goal was supported through several key provisions of the NCLB Act such as requiring schools that have not made AYP for two consecutive years to provide opportunities for their students to receive extra academic assistance and promoting teacher quality by ensuring teachers meet state certification requirements (Shaul & Ganson, 2005). Although it is not difficult to find commentaries and studies that contest the value of NCLB (e.g., Amrein-Beardsley, 2009), there does exist lukewarm support for the efficacy of NCLB in the research literature. Using estimates of accountability pressure among the states, Nichols, Glass, and Berliner (2006) found no relationship between accountability pressure with later cohort mathematics achievement on National Assessment of Educational Progress (NAEP) results at the fourth- and eighth-grade levels; however, the researchers did find a causal relationship between high-stakes testing pressure and subsequent achievement on non-cohort fourth-grade mathematics achievement (i.e., comparing achievement to a subsequent year’s achievement from the same grade level). Nichols, Glass, and Berliner attributed this effect as likely

When Science Counts as Much as Reading and Mathematics

5

due to increased time spent on mathematics instruction. Also, using a rating scale to approximate states’ strength of accountability, Carnoy and Loeb (2002) found substantial gains in eighth-grade mathematics scores when states raised external pressure on schools. Finally, in a comprehensive study of the effects of external accountability on student achievement, Lee (2008) examined the 76 effect size estimates from 14 large-scale studies. Results of Lee’s meta-analysis showed modest positive policy effects on average, but the study did not address possible shifting of resources or attention from one subject area to another when accountability pressures mount. Although a scholarly debate may persist for years to come regarding the value of policies that provide sanctions and rewards based largely on results from once-a-year multiple-choice tests, teachers and school administrators do not have the luxury of dismissing the NCLB related requirements that have been translated and set down by their state departments of education. While it is known that achievement of both high and low performing students can be affected by multiple variables, such as teacher quality (Darling-Hammond, 2000) and parental influence (Davis-Kean, 2005), there remains the question of what effect the distal variable of an accountability program has on different levels of student performance. Reback (2008) found that when school personnel believe the goal of attaining proficiency is achievable they are quick to respond to looming interventions and students at the lowest levels of performance make greater than expected gains in mathematics, yet high achieving students do not make similar gains in mathematics. However, this type of differentiated improvement may be due to school personnel attending to achievement in an educational triage manner – with students who are nearly passing or failing high stakes exams receiving greatest attention. Reback concluded that accountability incentives could influence achievement across subjects and across grades: If a school has a relatively strong incentive to improve students' math performance in a particular grade, then the lowest achieving students in that grade outperform similar schoolmates. The other students in that grade, however, perform worse than similar schoolmates in the other grades, (unless their own performance is relatively important for the school's rating). If a school has a relatively strong incentive to improve some students' reading performance in a particular grade, then other students in this grade perform much worse than similar schoolmates. The findings are again consistent with schools sacrificing general performance in a classroom to focus on the performance of particular students. (p. 1411) In support of this idea that accountability policies can lead to a shifting of resources, in a case study of one elementary school, Booher-Jennings (2005) also concluded that teachers’ response to potential accountability consequences led to a focus on students close to performance thresholds and to diminished attention for other students. Similarly, Diamond and Spillane (2004) analyzed data from observations and interviews and inferred that low performing schools had a more limited focus on improving achievement among a narrow band of students who were at or near performance levels. This narrowed focus was also found within the Chicago Public Schools where Neal and Schanzenbach (2010) used data from standardized tests to learn that students in the third and fourth deciles of prior achievement made greater than expected gains in both mathematics and reading, while students in the first and second deciles remained stagnant. However, other researchers have found that although failing schools may target students who are on the boundary of meeting performance expectations, this does not occur at the expense of the higher performing students enrolled at the same schools (Springer, 2007). Adding to the mixed results of the NCLB effect, Ballou and Springer (2008) found that improvements experienced at schools do not necessarily come at the cost of affecting high-performing students. In fact, they found pressured schools tend to increase achievement in most grades and not just in the grades

Education Policy Analysis Archives Vol. 20 No. 26

6

where low achievement had led to the schools missing performance targets. Furthermore, the findings of these researchers cast doubt on the notion that NCLB related gains are largest among schools most likely to face sanctions; in fact, Ballou and Springer found the opposite: that response to NCLB has been greatest among schools that are least threatened by interventions due to low test scores. While the aforementioned research literature addresses how resources and attention can be transacted between the subjects of reading and mathematics, across grades, and among student groups, less has been studied regarding the effects on other subjects, namely science. There was optimism earlier in the history of NCLB that the requirement of states to assess science in at least three grades would actually lead to an increased focus on the subject (Cavanagh, 2007). However, a review of further reports leads to the conclusion that any early optimism turned to concern about science being relegated to a subject less important than reading or mathematics, especially in the elementary grades. Although it has been reported that elementary teachers describe their beliefs about teaching science to be unchanged by NCLB, and that they possess a generally positive attitude (Milner, Sondergeld, Demir, Johnson, & Czerniak, 2011), there are descriptions of schools diminishing the amount of time spent on science that have been reported repeatedly (Kingsbury, 2007; Linn, 2008; McMurrer, 2007; McMurrer, 2008). As might be expected, school and district personnel have reported cuts in time spent on science, as well as arts, social studies, physical education, and even lunch and recess. These cuts have been attributed to a shared perception for the need to spend more time on mathematics and reading. In fact, analysis of data from elementary schools found that in light of NCLB mandates, science was cut by at least 75 minutes per week in at least half of the reporting districts (McMurrer, 2008). Yet, as discussed, accountability programs are not identical among the states and a few states do require science achievement to be part of calculations when determining if schools are meeting or not meeting targets necessary to avoid intervention from their state departments of education. In a comparison of fourth-grade and eighth-grade 2005 NAEP science achievement results, states were grouped based on whether they did or did not include science in their accountability calculations (Judson, 2010). Results of this study revealed that there was no appreciable difference between the groups of states when comparing eighth-grade NAEP science results. However, analysis of the fourth-grade NAEP science results revealed there were significant differences in favor of states that used science in their accountability programs. The medium effect size of the difference in fourthgrade results between the groups of states can on the one hand be taken as support for including science into accountability formulas. On the other hand, missing from this study was an examination of what was simultaneously occurring to achievement in mathematics and reading. If the states that use science in their accountability programs are shown to have significantly higher science achievement than other states on a common assessment but are also found to have inferior achievement on mathematics or reading achievement, then the argument can be made that allowing a third subject to pass through the accountability membrane leads to diffused results across the other high stakes content areas. The intent of this study was to then pick up where this earlier study left off. Using the more recent 2009 NAEP science achievement data, comparisons were to be made between states that choose to use and not use science in their accountability programs. Additionally, the analysis here would go further and examine if differences between these groups of states could be detected in the mathematics and reading achievement results of their fourth- and eighth-grade students.

When Science Counts as Much as Reading and Mathematics

7

Methods Categorizing the States The states were grouped into three categories based on how they use or do not use science achievement in their accountability program calculations. Although all of the states assess science achievement in at least three grades, only a few have mandated that the results from their state science assessments contribute to the determinations of whether schools and LEAs are meeting accountability benchmarks. Choosing to use science as part of accountability programs may be done in one of two ways. One way is for a state to directly integrate science achievement results into the federally required AYP calculations. Although the states are required to use high school graduation rate as a variable when determining if secondary schools are meeting expectations, the states are allowed to decide what variable they will use as an additional indicator when calculating AYP for the elementary grades. The large majority of states have selected attendance as their additional indicator for the elementary grades. However, a few states, such as New York, use science achievement as their additional indicator when calculating AYP. The second way that science achievement can contribute to accountability is when it is part of a second accountability system that is parallel to the NCLB required AYP-based accountability system. Because most states did have some form of accountability and reporting in place prior to the commencement of NCLB legislation, several of those states have chosen to continue with, and adapt, their previous accountability systems. At present there are fifteen states that have dual accountability programs (i.e., the AYP accountability program plus a state accountability program) in place (Blank & Hovanetz, 2009). Among these fifteen states, a few require that, as part of their state accountability program, science achievement be included in the calculations. For example, within Utah’s U-PASS accountability system, science achievement contributes 20 percent of a school’s proficiency rating. For this study, to be labeled as a state that “uses” science in their dual accountability program, it was determined that this parallel accountability system needed to carry potential penalties when targets are not met, just as the AYP-based accountability program does. In other words, if the dual accountability program simply assessed and reported schools’ status, but did not have the weight of possible intervention when schools failed, then that was viewed equivalent to the predominant AYPbased system in which states report science results but those results do not contribute to computing whether or not a school or LEA is subject to sanctions. The states were categorized into three groups. Group 1 was comprised of the thirty-nine states that do not use science achievement in their accountability calculations. The definitions for the second and third groups of states were based on (a) the degree to which science is required as an accountability variable, and (b) the grade levels assessed by the states. Each of these criteria is clarified here. Among the eleven states that integrate science achievement results into accountability calculations, some of those states allow schools to select science from a menu of choices. This is the case in Georgia where science may be chosen as the additional indicator for AYP by the elementary schools, but most often the schools select to use attendance rate. If a state allows for science to be used in accountability, but does not require it, that state was placed in Group 2. Because the NAEP science assessment achievement results were used in this study as the dependent variable when comparing states, the definitions for the second and third groups were also influenced by the grades tested by NAEP. That is, while some states do require that science achievement be included in accountability, the grades from which they require those results do

Education Policy Analysis Archives Vol. 20 No. 26

8

not always match the grades assessed by NAEP. NAEP tests science in only fourth- and eighthgrade. If a state required science to be included in accountability but did not require results from fourth- or eighth-grade, the state was placed in Group 2. There were cases such as that of North Carolina that requires fifth- and eighth-grade science results to be used in their accountability programs. In this instance, North Carolina matches only one NAEP grade. Therefore, North Carolina was placed in Group 2 for fourth-grade categorization and in Group 3 for eighth-grade categorization. Group 2 are the states that have either partially committed to using science in accountability by making it a choice or do require the use of science achievement in accountability, but do not require those results from a NAEP tested grade (i.e., fourth- or eighth-grade). The rationale for creating the category of Group 2, as opposed to aggregating these states with the states that required the use of science achievement results and did match the NAEP grades, was to maintain a clear focus on possible effects of a stricter definition of using science achievement in accountability. The Group 2 states were not coupled with the Group 1 states because there was interest to see if perhaps a spillover effect was in play in states that had made some movement in the direction of using science achievement in accountability. Group 3 are those states that require science to be used in accountability calculations and require achievement results from a grade matching a NAEP tested grade. The groupings of the states are provided in Table 1. Table 1 Categories of States Based on Use of Science Achievement in Accountability Calculations 4th Grade

8th Grade

Group 1 – Do not include science in accountability.

All other states

All other states

Group 2 – Include science in accountability, but either as a choice or do not use results from the NAEP tested grade

CA, GA, MI, NC, OH, VA

GA, KY

Group 3 – Require science in accountability from the NAEP tested grade

KY, NY, SC, TN, UT

CA, MI, NY, NC, OH, SC, TN, UT, VA

Use of NAEP Data As mentioned, NAEP data were utilized to address the research questions. The first research question is an inquiry of whether the process of including science into accountability calculations affects student achievement. NAEP science achievement results from 2000 and 2009 were available and considered well suited to address the first research question. In reauthorizing the Elementary and Secondary Education Act (ESEA), NCLB legislation was passed in 2001 and began to take effect in 2002 (No Child Left Behind Act, 2002). The NAEP

When Science Counts as Much as Reading and Mathematics

9

data from 2000 then provided a glimpse into pre-accountability austerity. Because the framework for the NAEP science achievement assessment was different in 2009 than it was in 2000 (National Assessment Governing Board, 2008), scale scores from the two years could not be strictly compared. Although the science framework had changed, certainly both the 2000 and 2009 assessments were built on the construct of science and the 2000 achievement results could be used as covariate variables when comparing 2009 fourth- and eighth-grade results across Groups 1, 2, and 3. A point of clarification must be made regarding the use of NAEP data for this study that investigated how differing AYP practices among the states might yield dissimilar results on NAEP assessments. It is important to note that NAEP assessment results are not integrated into any AYP calculations and are not used as proxies for states’ high stakes accountability tests. Studies have demonstrated that the percentage of students scoring proficient on state examinations can be extremely different from the percent of students scoring at Proficient level on NAEP (Stoneberg, 2007) and therefore using NAEP data when examining AYP practices may warrant caution. For example, although overall NAEP results have remained relatively stable over time, student proficiency on state assessments has increased in some states (Jacob, 2007). A cause of this discrepancy is likely due to the evidence revealed in a state level examination of proficiency standards conducted by the National Center for Education Statistics (NCES) indicating that for approximately half of the cases, examined in grades 4 and 8, the rigor of states’ standards had decreased between 2005 and 2009 (Bandeira de Mello, 2011). However, this lack of corroboration between NAEP results and the proportion of students meeting states’ proficiency standards was not viewed as a deterrent to this study. Within this study NAEP was not used as confirmatory evidence of states’ assessment results; rather, NAEP was utilized only as a basis of comparison of achievement in the content areas of science, mathematics, and reading, as NAEP “offers the most reliable and equitable measures of student achievement across states” (Nicholas, 2005, p. 2). That is, it was not an objective to determine if states’ methods of determining proportions of proficient students on their state assessments correlated with NAEP. Instead, the intent was to determine if the variable of including science results from state assessments in state-level accountability practices influenced achievement, as detected by NAEP. To be consistent with the selection of the science NAEP analysis, mathematics and reading NAEP data were selected from 2009 and a pre-NCLB year. Similar to science, mathematics had been assessed in fourth- and eighth-grade in both 2000 and 2009. Reading had been assessed in both fourth- and eighth-grade in 1998 and again in 2009. The use of these data allowed the second research question to be addressed. If the inclusion of science into accountability programs reduced attention to the core subjects of mathematics and reading, then there was anticipation of a negative impact on mathematics and reading achievement in the Group 3 states. Data Analysis A series of analysis of covariance (ANCOVA) were conducted for the three subject areas of science, mathematics, and reading using the pre-NCLB NAEP mean scale scores as covariates and the 2009 NAEP mean scale scores as the dependent variables. The ANCOVA analyses were conducted for each of the three subjects of reading, mathematics, and science and for both fourth- and eighth-grade data. The use of ANCOVA analysis allowed the 2009 data to be adjusted on the pre-NCLB covariate data; this procedure yields adjusted means, also referred to as estimated marginal means, on the 2009 scale scores “as if” all groups had equivalent scale

Education Policy Analysis Archives Vol. 20 No. 26

10

scores in the pre-NCLB year. Significant omnibus F-test results were followed up with Fisher least significant difference (LSD) post-hoc comparisons. NAEP achievement data from 1998 were used as the covariate when analyzing differences among the groups in reading and NAEP achievement data from 2000 were used as the covariates when analyzing differences among the groups in mathematics and science. To be included in the analyzed dataset, states needed to have reported NAEP data from the pre-NCLB year and from 2009. This requirement eliminated some states from the study. Fourth-grade data were available from 39 states for reading, 40 states for mathematics, and 37 states for science. Eighth-grade data were available from 36 states for reading, 39 states for mathematics, and 36 states for science. Data were analyzed for all students and were also disaggregated and analyzed based on socioeconomic status (SES). To varying degrees, SES status has been shown to be related to academic achievement in multiple subject areas (e.g., McGraw, Lubienski, & Strutchens, 2006; Perry & McConney, 2010; Stipek & Ryan, 1997, Willms, 2003). Also of interest was determining if effects, attributable to the use of science in state accountability programs, could be detected among student groups based on ethnicity. Researchers have examined disparity in achievement among ethnic groups on standardized tests that is often related to SES differences among the groups (Flores, 2007; Magnuson, Rosenbaum, & Waldfogel, 2008). However, the NAEP data were inadequate to examine groups based on ethnicity. Disaggregation of NAEP science data, based on ethnicity, provided too few states meeting criteria of the National Assessment Governing Board (NAGB), that oversees NAEP administration and reporting, to be useable across all ethnic groups. For example, only a total of four states among the eighth-grade data met the NAGB criteria for both the pre-NCLB and post-NCLB data years for the groups of Asian and American Indian students. A reduced amount of states reported other ethnic groups (e.g., Hispanic), but not to the extreme of the example of Asian and American Indian student reporting. Because of the requirement that data needed to be available from both the pre-NCLB year and from 2009 had already vetted the states to be analyzed by approximately 30 percent, analysis remained focused on the broad category of all students and the SES-based categories that were generally well reported by the states.

Results Results of ANCOVA analysis from the aggregate category of all students and SES-based categories (i.e., eligible and not eligible for free or reduced lunch) are provided for fourth-grade in Table 2. Separate ANCOVA analysis of all students’ NAEP scale scores for the three fourth-grade subject areas revealed that after controlling for the pre-NCLB scale scores (covariates) there were no significant differences among the groups in reading or in mathematics. There were, though, significant differences in the category of all students among the groups of states in their fourth-grade science results, F(2, 34) = 4.831, p < .05. A significant difference among the groups of states persisted when evaluating the fourth-grade data in the categories of students eligible for free or reduced lunch, F(2, 34) = 3.639, p < .05, and students not eligible for free or reduced lunch, F(2, 34) = 4.286, p < .05.

When Science Counts as Much as Reading and Mathematics

11

Table 2 Fourth-grade NAEP Pre- and Post-NCLB, All Students and SES-based Groups Pre-NCLB Subject Reading

Students All

Eligible free/reduced Not eligible free/reduced Math

All

Eligible free/reduced Not eligible free/reduced Science

All

Eligible free/reduced Not eligible free/reduced

Post-NCLB

Group

n

M

SD

M

SD

EMM

1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

29 5 5 29 5 5 29 5 5 29 6 5 29 6 5 29 6 5 26 6 5 26 6 5 26 6 5

214.7 211.4 214.0 199.0 194.3 199.8 225.3 223.2 225.5 224.9 225.1 222.2 212.5 210.7 209.1 234.1 235.2 233.3 149.4 146.6 147.8 135.9 128.9 134.4 160.0 159.3 159.2

8.2 6.0 3.6 8.0 7.3 5.4 5.7 3.1 3.9 6.6 7.4 3.5 6.0 6.8 4.1 4.5 4.4 3.0 8.3 10.1 5.5 9.1 10.0 5.9 5.2 5.1 2.8

220.0 218.3 220.4 207.1 204.3 208.8 231.0 230.8 231.2 239.4 239.1 237.5 228.9 226.5 227.2 248.6 250.0 247.2 149.9 149.6 152.0 136.7 134.0 139.2 161.7 163.0 164.6

6.9 5.9 4.4 5.4 5.3 5.3 5.3 3.2 4.3 6.5 5.1 3.7 5.6 4.9 3.9 4.8 3.4 3.0 7.8 9.1 5.5 7.6 7.7 6.3 5.6 6.2 4.5

219.7 220.1 220.5 206.9 205.8 208.3 230.8 231.9 230.9 239.2 238.7 239.4 228.5 227.2 229.0 248.7 249.2 247.8 149.3 151.5 152.9 135.7 138.2 139.4 161.5 163.6 165.2

F

p

.090

.914

.347

.709

.152

.859

.066

.936

.403

.671

.299

.743

4.831

.014*

3.639

.037*

4.286

.022*

EMM: Estimated Marginal Mean *p < .05 Fisher LSD comparisons were conducted to determine differences between the groups in fourth-grade science achievement. The post hoc tests revealed significant differences in fourth-grade between Group 1 (i.e., states not including science in accountability) and Group 3 (i.e., states that require science in accountability), p < .05, for the categories of all students, students eligible for free or reduced lunch, and students not eligible for free or reduced lunch. In all three cases, Group 3 had significantly higher mean scale science scores. Fourth-grade science achievement was significantly higher in states where science was required to be part of an accountability program. There were no significant differences in the fourth-grade data between Group 2 and either Group 1 or Group 3 for the all students and SES-based categories.

Education Policy Analysis Archives Vol. 20 No. 26

12

Partial eta-squared ( p2) measurements were used to determine effect sizes in which small, medium, and large effects were operationalized as .01, .06, and .14, respectively (Stevens, 1992). The effect of using science in accountability programs on the 2009 mean NAEP science scale scores was considered large for the fourth grade categories of all students ( p2 = .226), students eligible for free or reduced lunch ( p2 = .181), and students not eligible for free or reduced lunch ( p2 = .206). The eighth-grade data were similarly analyzed. Table 3 provides results of the ANCOVA analyses of the three groups of states across the subjects of reading, mathematics, and science. Table 3 Eighth-grade NAEP Pre- and Post-NCLB, All Students and SES-based Groups Pre-NCLB Subject Reading

Students All

Eligible free/reduced Not eligible free/reduced Math

All

Eligible free/reduced Not eligible free/reduced Science

All

Eligible free/reduced Not eligible free/reduced

Post-NCLB

Group

n

M

SD

M

SD

EMM

1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3

27 2 7 27 2 7 27 2 7 28 2 9 28 2 9 28 2 9 25 2 9 25 2 9 25 2 9

261.4 259.8 260.3 247.4 245.4 244.0 268.4 269.1 269.5 272.8 267.6 271.1 256.4 250.4 251.8 281.0 279.1 280.2 148.6 145.8 147.0 132.3 129.1 128.6 157.2 156.5 155.8

6.2 3.6 5.3 6.0 7.5 5.5 5.1 1.7 3.5 9.2 3.2 7.4 9.5 6.3 6.5 7.3 1.9 5.9 8.9 5.4 9.3 10.3 8.8 9.2 6.5 3.5 7.5

262.5 263.5 260.8 250.2 253.0 247.9 271.2 273.8 270.5 282.4 278.4 280.7 268.2 266.2 265.5 292.3 290.0 291.4 149.5 151.6 149.4 135.6 140.1 134.2 159.5 162.9 160.6

6.5 4.7 4.8 4.9 5.4 3.9 5.1 3.1 2.8 8.9 1.2 5.3 7.0 2.3 4.4 7.2 0.5 4.4 8.5 6.7 7.4 8.5 10.3 7.1 6.1 2.9 4.2

262.3 264.6 261.5 249.8 253.5 249.1 271.4 273.5 270.0 281.9 282.1 281.6 267.4 268.9 267.4 292.1 291.3 291.8 149.0 153.6 150.3 134.8 141.7 136.2 159.3 163.1 161.3

F

p

.468

.625

.971

.389

1.094

.347

.029

.972

.146

.865

.054

.948

2.711

.082

4.269

.023*

3.556

.040*

EMM: Estimated Marginal Mean *p < .05 Similar to the fourth-grade analysis, the only significant differences among the three groups were found in the subject of science. There were no significant differences among

When Science Counts as Much as Reading and Mathematics

13

groups in either eighth-grade reading or eighth-grade mathematics data when comparing 2009 mean scale scores, with the pre-NCLB scores used as the covariates. Unlike the fourth-grade results, the eighth-grade results in science did not align neatly with the hypothesis that inclusion of science into accountability programs will lead to greater achievement. The omnibus F tests did not reveal a significant effect in the category of all students. However, there was a significant difference among the groups in the categories of students eligible for free or reduced lunch, F(2, 33) = 4.269, p