Article

Mathematical Communication in State Standards Before the Common Core

Educational Policy 2017, Vol. 31(3) 275­–302 © The Author(s) 2015 Reprints and permissions: sagepub.com/journalsPermissions.nav DOI: 10.1177/0895904815595723 journals.sagepub.com/home/epx

Karl Wesley Kosko1 and Yang Gao1

Abstract

Mathematical communication has been an important feature of standards documents since the National Council of Teachers of Mathematics' (NCTM) (1989) Curriculum and Evaluation Standards. Such an emphasis has influenced the content standards of states from then to the present. This study examined how effective the prevalence of various forms of mathematical communication in 2009 state standards documents was in regard to National Assessment of Educational Progress (NAEP) 2009 achievement scores for Grade 4. Analysis suggests mixed results, with potential implications as states move toward fully implementing the Common Core State Standards in Mathematics. Specifically, although including language requiring mathematical descriptions from students had a positive effect on Grade 4 NAEP 2009 achievement scores, including language requiring rationales and justifications was not found to have a statistically significant effect.

Keywords
mathematics education, standards, achievement, common core

1Kent State University, OH, USA

Corresponding Author: Karl Wesley Kosko, Kent State University, 401 White Hall, P.O. Box 5190, Kent, OH 44242, USA. Email: [email protected]


Introduction

Many education reformers continue to advocate engaging students in mathematical communication through discussion and writing. First gaining traction as a reform-oriented position with the publication of Curriculum and Evaluation Standards for School Mathematics (National Council of Teachers of Mathematics [NCTM], 1989), mathematical communication has been observed to engage students in developing deeper understandings of mathematics (e.g., Hufferd-Ackles, Fuson, & Sherin, 2004; Staples, 2007; Yackel & Cobb, 1996). Furthermore, student engagement in mathematical communication has been found to statistically predict students' mathematics achievement (Hiebert & Wearne, 1993; Koichu, Berman, & Moore, 2007; Kosko, 2012; Mercer & Sams, 2006).

While evidence exists to support teacher practices that engage students in mathematical communication, no such examination has been done from a policy perspective. Specifically, the importance of mathematical communication was first argued successfully from a standards document (Lampert & Cobb, 2000; McLeod, 2003; NCTM, 1989). Therefore, it is important to evaluate the effectiveness of various standards documents in advocating student engagement in mathematical communication. As individual states transition away from their own standards to the new Common Core State Standards in Mathematics (CCSSM), the importance of such an evaluation is more pertinent than ever. It is important to evaluate the differences between the "old" standards and the "new" CCSSM to determine whether we are moving forward effectively. Furthermore, there is a need to examine in what ways different content standards may be improved by including expectations that students describe and justify their mathematics. The purpose of the present study is to provide a baseline of research for future efforts in the field to review and recommend revisions for standards-based language.
Specifically, we chose to examine the general effectiveness of including such expectations as integrated in mathematics content standards. To accomplish these goals, we examined the effectiveness of states’ standards expectations for mathematical communication in regard to students’ National Assessment of Education Progress (NAEP) 2009 mathematics scores. Comparing states by way of students’ NAEP 2009 mathematics achievement provides a common metric between states, as well as a metric that is similar in form (i.e., standardized test) to how most states tend to assess the effectiveness of teachers’ instruction in accordance with standards. In a similar manner, this study examines the effectiveness of the various 2009 standards documents (as opposed to teachers’ instruction) regarding conveyance of expectations for students to communicate mathematically.


A Brief History of Mathematical Communication in the Content Standards

Although the modern standards movement can be dated as far back as the launch of Sputnik in 1957, the first explicit national standards recommendations for mathematics came about in NCTM's (1989) Curriculum and Evaluation Standards for School Mathematics (Lappan & Wanko, 2003). The new standards document was effectively the first such publication of its kind and led to various states' and textbook publishers' incorporation and adaptation of the standards (Lappan & Wanko, 2003; Porter, 1994). Within this initial standards document was an expectation for students to communicate about their mathematics. As McLeod (2003) notes,

The importance of mathematics as communication was noted in early drafts, but mostly in the context of the growing need for mathematical literacy in the postindustrial age. At some point, communication got listed in the 5-8 group as one of the "big ideas" that deserved attention, perhaps because the issue of communication had been a central theme in some College Board publications that were available. In later drafts, the Communication Standard became one of the general standards listed for all grade levels. The emphasis on communication was a new feature in NCTM recommendations, a new idea that later appeared regularly in standards documents for other school subjects. (p. 777)

The communication standard was the second standard listed in each grade band (NCTM, 1989). A useful example of expectations comes from the Grades K-4 band in which NCTM (1989) advocated communication as a means for children to make connections among multiple means of representing mathematical ideas (pictorial, graphical, symbolic, etc.). Discussion and writing were encouraged as ways for students to develop better understandings of mathematics. Furthermore, the Curriculum and Evaluation Standards for School Mathematics encouraged teachers at all grade levels to engage students in communication through the posing of questions, facilitating discussions among students, and engaging students in writing for mathematics (NCTM, 1989). The legacy of NCTM’s 1989 document can be seen in Principles and Standards for School Mathematics (NCTM, 2000), a revision of their 1989 standards document. The Principles and Standards refined the ideas posed at various grade bands in 1989 and provided a more comprehensive set of expectations for what they referred to as the mathematical process standard of communication. As a process standard, communication was viewed as distinct, but integrated with the mathematics content standards as a “[way] of acquiring and using content knowledge” (NCTM, 2000, p. 29). Specifically,
NCTM (2000) suggested that from prekindergarten through Grade 12, students should organize and consolidate their mathematical thinking through communication; communicate their mathematical thinking coherently and clearly to peers, teachers, and others; analyze and evaluate the mathematical thinking and strategies of others; and use the language of mathematics to express mathematical ideas precisely (NCTM, 2000). Among the specific actions that support these usages of communication are students describing, explaining, justifying, conjecturing, writing, questioning, arguing, listening, and speaking about their mathematics.

More recently, the Common Core State Standards for Mathematics (CCSSM) have included expectations for various aspects of mathematical communication within their grade-specific content standards and their more general Standards for Mathematical Practice (Common Core State Standards Initiative [CCSSI], 2010). These expectations include the mathematical practice to construct viable arguments and critique the reasoning of others. Like the NCTM communication process standard that preceded it, this mathematical practice includes expectations for students to provide descriptions and rationales in various formats, as well as to interpret them via reading and listening to others' mathematical arguments.

This brief history of mathematical communication in mathematics standards documents shows that mathematical communication has consistently been considered an important expectation since the first serious standards document was written. Yet, the general effect on students' mathematics achievement of including such expectations in standards documents has not been examined. We argue that such an examination is necessary to determine how effective the language used to convey such expectations of mathematical communication is in its most current form.

Interpretations of Mathematical Communication

Much of the literature suggests teachers' self-reported familiarity with and understanding of NCTM-aligned standards relate to what teachers do in the classroom and to their students' mathematics achievement (Hamilton & Berends, 2006; Loeber, 2008; Young, 2007). However, other research reports discrepancies between what various state standards intend and what teachers interpret them to intend. H. C. Hill (2001) observed how teachers in a particular school district interpreted new mathematics standards in their state as they aligned them to the adopted textbook series in the district (Saxon Math). She notes that state standards were interpreted in reference to the curriculum, rather than the curriculum being interpreted in terms of the standards. Interpretations by the teachers were observed to represent "a wide gulf [from] what the state wants and what teachers determined to do within this district"
(H. C. Hill, 2001, p. 310). A separate study by H. C. Hill (2005) found differences between teachers' and researchers' assessments of teaching practices in the classroom in regard to "mathematical communication and representation." Examining secondary mathematics teachers' interpretations of the Common Core Standard for Mathematical Practice, construct viable arguments and critique the reasoning of others, Kosko, Rougee, and Herbst (2014) also found that many teachers held interpretations of such practices that did not match the expectations of researchers.

Observations such as those presented in the prior paragraph may help explain findings from Kosko and Miyazaki (2012), who found significant variance in the effect of discussion frequency on mathematics achievement between teachers and schools. The variance was so large that in many schools, engaging in discussion had a negative effect, while in others it was generally positive. Yet, the cause of such variation may go beyond the teacher. In examining various state standards for the usage of primary verbs in content standards, Larnell and Smith (2011) noted that verbs such as "explain" and "justify" (as well as others that would be classified here as related to communication) were used relatively infrequently. The verb "explain," which was used more frequently than other communication-related verbs, was used in varying ways: "Though explain is commonly considered a relatively high-level verb in Bloom's Taxonomy, this analysis suggests that states may be using it to identify low-level cognitive processes" (pp. 111-112) as well as high-level cognitive processes. Thus, Larnell and Smith's (2011) findings suggest that misinterpretation of standards, and of mathematical communication in standards, may originate at the state level, and not merely at the teacher level. It also seems entirely plausible that teachers' misinterpretations (or correct interpretations) may be influenced by the state policy documents to which they are beholden.
Although it is not a purpose of this study to investigate teachers’ interpretations of standards documents, they are discussed here because such factors may influence the effectiveness of states’ standards. As useful as it would be to incorporate teachers’ interpretations of standards in the present study, the nature of data (as will be discussed later) does not allow for such a comparison at the scale of the present study.

Research Question

Research suggests that standards documents may affect how teachers incorporate mathematical communication in their classrooms (e.g., Loeber, 2008; Young, 2007). In turn, students who are taught by teachers incorporating mathematical communication appropriately tend to have higher mathematics
achievement (Hiebert & Wearne, 1993; Koichu et al., 2007; Kosko, 2012; Mercer & Sams, 2006). Yet, some evidence exists that suggests teachers may not interpret standards as intended, when it comes to mathematical communication (H. C. Hill, 2001, 2005; Kosko et al., 2014), possibly because some states’ standards may not include consistent language in providing expectations in content standards for mathematical communication (Larnell & Smith, 2011). In essence, these various studies suggest engaging students in mathematical discussions and writing in particular ways can have a statistically significant and meaningful effect on their mathematics achievement, but it is unclear whether or not state standards as policy documents are effective in ensuring students engage in such actions. The purpose of this study is to provide an initial examination of whether and to what degree states’ incorporation of mathematical communication in their mathematics standards documents affects students’ mathematics achievement. Our focus on mathematical communication is interpreted from NCTM’s (2000) descriptions, in accordance with McLeod’s (2003) description of the historical incorporation of mathematical communication in standards documents. While the term effectiveness can be interpreted in a number of ways, the present study uses the term in regard to students’ mathematics achievement. The reason for such an interpretation is simple: Those who drive policy (e.g., politicians and the public) typically interpret effectiveness of education reforms in terms of test scores. Whether or not such usage is appropriate is a worthwhile topic, but not one which we discuss here. Rather, we use this indicator of effectiveness because of its present currency in policy decision making. Therefore, the present study uses that metric to determine how “effective” policy documents in the form of state content standards are. 
To assess this study's objective, a secondary analysis of NAEP 2009 data was conducted. NAEP 2009 provides an assessment common to all 50 states and at the last measured point before the adoption of CCSSM, which began in 2010. As will be described in the forthcoming sections, state mathematics content standards effective in the 2008-2009 academic year were collected and examined for elements of mathematical communication. These data were then merged with NAEP 2009 Grade 4 restricted-use data to provide an indicator of the effectiveness of various states' standards emphasizing mathematical communication on students' mathematics achievement.1 Merging of the data and the analysis that followed allowed for the examination of the following research question:

Research Question 1: To what degree does the prevalence of states' content standards that emphasize aspects of mathematical communication (in terms of descriptions and explanations, as well as justification and rationale) affect students' mathematical achievement in those states?
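The merging step described above can be sketched with pandas. All column names and values below are hypothetical stand-ins for the restricted-use NAEP variables and the authors' standards codes, not actual study data:

```python
import pandas as pd

# Hypothetical state-level prevalence of communication standards
# (proportion of Grade 4 content standards coded "description"/"rationale").
standards = pd.DataFrame({
    "state": ["OH", "MD", "HI"],
    "description_prop": [0.18, 0.22, 0.15],
    "rationale_prop": [0.04, 0.07, 0.06],
})

# Hypothetical student-level records in the style of NAEP restricted-use data.
students = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "state": ["OH", "OH", "MD", "HI"],
    "math_score": [241.2, 228.9, 252.4, 239.0],
})

# Left-join the state-level codes onto each student record so that
# standards prevalence can enter the analysis as a state-level predictor.
merged = students.merge(standards, on="state", how="left", validate="m:1")
```

The `validate="m:1"` argument guards against accidental duplication of state rows, which would silently inflate the student-level sample.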


Nature of Data

Two sources of data were used in the present study. The first was the set of mathematics content standards documents from all 50 U.S. states. These data are publicly available on request from the government of each of those states. The second source of data was restricted-use data for Grade 4 mathematics in the NAEP 2009 assessment (National Center for Education Statistics [NCES], 2009b). The NAEP mathematics assessment uses a common set of items to examine the status of mathematics learning across the United States. Additional data related to the students, teachers, schools, and districts are collected as well to provide a general indication of the status of education in the nation. The nature of NAEP's common assessment across states, which can have very different state-level content tests, was a necessary requirement for the present study's goals.

Although the nature of NAEP 2009 provided a useful and necessary source of data for this study, use of such data came with certain limitations. Specifically, neither teachers' interpretations of standards nor their teaching practices were observed in NAEP 2009, and such effects are not assessed in the present study. This limits the interpretability and available range of focus of the present study to the effectiveness of the standards documents alone, so inferences regarding teaching practices cannot be made.

Sample

The NAEP 2009 sample incorporated a multistage cluster sampling design to obtain a sample where "estimates of population and subpopulation characteristics could be obtained with reasonably high precision" (NCES, 2009a, p. 68). Certain subpopulations were therefore oversampled (e.g., Native Americans) to ensure that appropriate estimates for those subpopulations could be obtained. To adjust for this oversampling in certain clusters, NCES (2009a) recommends use of certain sample weights, depending on the analysis performed. In accordance with these recommendations, the sample weight identified in the data set with the designation ORIGWT was used in the present study. Using this weight, the sample in this study can be interpreted as representative of students enrolled in Grade 4 in a public school in one of the 50 U.S. states.2

While NAEP 2009 collected data from private schools, the goals of this study were to examine how state mathematics content standards influenced student achievement, and thus only public school data were used. In addition, only students who resided in a U.S. state were included in the study. Other reductions in sample were due to various independent variables included as
covariates in the study. The result of these reductions was an effective sample3 of 23,340 students taught by 3,610 teachers in 50 U.S. states. Descriptive characteristics for the sample are further provided in the measures section below, as these characteristics were used as covariates in the present study.
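The role of a design weight such as ORIGWT can be illustrated with a small numeric sketch. The scores and weights below are invented for illustration and are not NAEP values:

```python
import numpy as np

# Hypothetical achievement scores and ORIGWT-style sampling weights for
# five students. Oversampled subpopulations receive smaller weights so
# that weighted estimates reflect the target population rather than the
# sample design.
scores = np.array([230.0, 245.0, 252.0, 218.0, 240.0])
weights = np.array([1.2, 0.8, 1.0, 1.5, 0.9])

# Design-weighted mean: sum(w_i * y_i) / sum(w_i).
weighted_mean = np.average(scores, weights=weights)

# The unweighted mean would over-represent the oversampled clusters.
unweighted_mean = scores.mean()
```

Here the weighted mean (about 234.6) differs from the unweighted mean (237.0) because the design assigns more weight to the lower-scoring records in this invented example.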

Measures

Dependent variable.  Students' mathematics achievement was used as the outcome in this study. According to NCES (2009a), the NAEP 2009 mathematics test assessed five content strands: number properties and operations; measurement; geometry; data analysis, statistics, and probability; and algebra. The assessment used multiple-choice, short constructed-response, and extended constructed-response items in such a manner as to cover all content strands. NAEP 2009 tests were assessed and scored using item response theory (IRT). "IRT models the probability of answering a question in a certain way as a mathematical function of proficiency or skill" (NCES, 2009a, p. 26). NAEP created a composite score based on IRT estimations with a range of 0 to 500. For the present analysis, the mean of variables MRPCM1 to MRPCM5, which represent composite score estimates as previously described, was used to create the outcome measure (MathScore). MathScore represents the mathematics composite score for students across all content strands (M = 240.05, SD = 26.96, range = 120.60-332.77).

Independent variables.  The primary purpose of this study is to examine the effect of mathematics content standards that require students to communicate, with specific focus on how the prevalence of such standards affects students' mathematics achievement scores. As noted in the review of literature, the Principles and Standards for School Mathematics (NCTM, 2000) describes a variety of ways for students to communicate mathematically. Using NCTM's guidelines, two codes were used to study states' 2009 content standards: description and rationale. We incorporated systemic functional linguistics (SFL) in our coding process to identify language indicative of expectations for communication (Halliday & Matthiessen, 2004).
Specifically, we examined the system of transitivity, which serves to organize the experiential function of language, to identify standards conveying an expectation for mathematical description or rationale. Through the system of transitivity, experience is conveyed primarily through grammatical processes (e.g., I added 42; the answer is 20), with participants and other grammatical elements providing appropriate context. Although transitive processes often convey the central meaning of a grammatical clause, or unit of grammatical meaning, these processes can, in certain
contexts, become embedded in a clause such that the meaning is not initially at the fore. This process, referred to as nominalization, causes the transitive process to serve as a participant in the clause. In other words, such processes are acted on by other processes, thereby providing multiple layers of meaning in a grammatical text. While others have looked at the difference between nominalized and nonnominalized processes in content standards (i.e., Kosko & Herbst, 2011), we looked only at the presence of processes eliciting expectations for communication (i.e., description and rationale). This presence was counted whether the process acted as a main transitive process in the standard or if it was embedded in the clause as a participant. Throughout the descriptions of our coding, we provide examples with italicized processes and nominalized processes to help illustrate their presence in standards language. Description refers to standards that require students to describe mathematical procedures, strategies, steps, or concepts. According to the Principles and Standards (NCTM, 2000), communication helps students to “gain insights into their thinking when they present their methods for solving problems” (p. 60), and students should “learn new mathematical concepts” through giving “verbal accounts and explanations” (p. 61). Maryland (2004) requires students to identify, describe, extend, analyze, and create a non-numeric growing or repeating pattern. (Grade 4, Patterns and Functions, 4.1.2)

While this particular standard asks students to do several things regarding patterns, asking students to describe these patterns encourages them to articulate procedures or concepts, which further confirms communication as “a way of sharing ideas and clarifying understanding” (NCTM, 2000, p. 60). Hawaii (2007) asks Grade 4 students to explain the need to use standard units for measuring. (Grade 4, Measurement, 4.4.1)

This standard corresponds to the Principles and Standards recommendation that students should "explain their ideas in mathematic class" (p. 62). Both example standards provided here utilize a transitive process for describing some form of mathematical procedure or concept, and thus were coded as description standards.

Rationale designates standards that require students to articulate justifications, conjecture or create hypotheses, or engage in mathematical arguments. Taking some Grade 4 standards as examples, Hawaii (2007) asks students to


propose and justify conclusions/predictions based on data. (MA4.13.1)

Similarly, Missouri (2008) requires students to estimate and justify products of whole numbers. (3. D)

The transitive process justify solicits the action to engage students in mathematical communication and arguments, which has the potential to help them better understand mathematical concepts or procedures. Therefore, examples such as the ones provided in this paragraph were coded as rationale standards.

Grade 4 mathematics content standards were collected for analysis. To align the standards with NAEP mathematics achievement scores, standards documents that were in effect in the 2008-2009 academic year were collected for analysis (as this coincided with the year NAEP 2009 data were collected). These documents were collected from a variety of sources, including the websites of the Department of Education in each U.S. state, and each of these offices was contacted directly to verify that we had identified the standards document effective in the 2008-2009 academic year. After standards documents were collected, a portion of the data (10%) was examined by both authors, as is typical in such analyses, to refine the coding rubric (see Hruschka et al., 2004). The remaining 90% of data were coded by both authors individually. Afterward, both authors compared and reconciled results. Weighted Kappa was calculated to assess interrater reliability of prereconciled data, as it allows ratings closer in magnitude to be given more weight in terms of agreement than ratings further apart in magnitude (Landis & Koch, 1977). Our weighted Kappa was .62 for description and .60 for rationale. Landis and Koch (1977) suggested that kappa coefficients of .41 to .60 indicate moderate reliability and .61 to .80 indicate substantial reliability. This indicates that the coding of 2009 standards was sufficiently reliable for the present study. Descriptive statistics suggest that description standards (M = .16; SD = .10; range = .00-.47) may be more prevalent than rationale standards (M = .05; SD = .05; range = .00-.24).
However, there was substantial variance in the frequency of standards that required either form of communication. While most states had relatively low frequencies of standards requiring descriptions or rationales from students, there was a great deal of variation in the degree to which different states incorporated such expectations of students, with some states having no such expectations and others requiring descriptions for nearly half of their math content standards and rationales for nearly a quarter.
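A weighted Kappa of the kind reported above can be computed with scikit-learn. The two rating vectors below are invented ordinal codes for ten standards, not the authors' actual coding data:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical pre-reconciliation codes from two raters: the number of
# "description" processes each rater identified in ten standards.
rater1 = [0, 1, 1, 0, 2, 1, 0, 0, 1, 2]
rater2 = [0, 1, 2, 0, 2, 1, 0, 1, 1, 2]

# Linear weights penalize disagreements in proportion to their distance,
# so a 1-vs-2 disagreement counts less against agreement than 0-vs-2.
kappa = cohen_kappa_score(rater1, rater2, weights="linear")
```

With eight exact agreements and two off-by-one disagreements, the linearly weighted Kappa here lands in the "substantial" band of Landis and Koch's (1977) scale.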


Table 1.  Descriptive Statistics for Student Covariates at Level 1.

Variable     M      SD
dFemale      0.49   0.50
dLunch       0.49   0.50
dBlack       0.16   0.37
dHispanic    0.22   0.41
dAsian       0.05   0.22
dNatAmer     0.01   0.11
dOther       0.02   0.12

Covariates.  While the focus of the present study is on how language in 2009 state mathematics standards affected students' mathematics achievement in Grade 4, other factors have historically affected mathematics achievement. Namely, gender, race/ethnicity, and socioeconomic status (SES) have each been consistently associated with differences in mathematics achievement. Tate (1997) reviewed the literature and found that between 1973 and 1992, the achievement gap attributable to race closed in Grades 4, 8, and 12. This was also found to be true of SES. However, Lubienski (2002) examined NAEP data from 1990, 1996, and 2000 and found that the achievement gap due to race increased between 1990 and 2000 for Grade 8 Black students, but remained the same at Grade 4. SES did not adequately explain the achievement gaps. A later study of NAEP data found that, while small, the achievement gap due to gender was consistently present for data between 1990 and 2003 (McGraw, Lubienski, & Strutchens, 2006). This finding is consistent with prior research (Tate, 1997). Given the consistent effects associated with gender, race, and SES, these factors were included at the student level of analysis in the present study.

Gender was included by taking the variable SEX from the NAEP data set and dummy coding it (dFemale) so that the effect would associate with being female (0 = male; 1 = female). For race/ethnicity, the variable SRACE, which indicated several designations of race for students, was separated into five dummy-coded variables so that the designation White/Caucasian was the reference group: dBlack (0 = not Black; 1 = Black), dHispanic (0 = not Hispanic; 1 = Hispanic), dAsian (0 = not Asian; 1 = Asian), dNatAmer (0 = not Native American; 1 = Native American), and dOther (0 = not Other non-White ethnicity; 1 = Other non-White ethnicity).
SES was assessed using the variable SLUNCH1, which designated a student’s eligibility for the National School Lunch Program (i.e., free and/or reduced lunch). As the provided variable included multiple designations, it was dummy coded into dLunch (0 = not eligible for free/reduced lunch; 1 = eligible for free/reduced lunch). Descriptive statistics for these student-level covariates are displayed in Table 1. Additional covariates were included in the analysis to account for teacher factors influential to mathematics achievement. According to findings by Clotfelter, Ladd, and Vigdor (2007), the more years of experience
a teacher has, the higher his or her students’ mathematics achievement scores are. This finding has been confirmed through use of experience as a covariate in other literature (e.g., Kosko & Miyazaki, 2012). Therefore, we included a dichotomous variable Experience (M = 0.83, SD = 0.37), which represents whether a teacher has more than 3 years of experience4 (1 = more than 3 years; 0 = 3 years or less). The next teacher-level covariate included SES_Mean, which indicates the percentage of students enrolled in a teacher’s class who are eligible for free or reduced lunch (M = 0.47, SD = 0.34) and has been found to influence individual student achievement (de Fraine, van Damme, van Landeghem, Opdenakker, & Onghena, 2003). We also included MathScore_Mean (M = 236.74, SD = 20.23), which is the average NAEP mathematics achievement score for a teacher’s class. Mean class achievement is another teacher-level factor found to influence mathematics achievement at the individual student level (de Jong, Westerhof, & Kruiter, 2004; Opdenakker, van Damme, de Fraine, van Landeghem, & Onghena, 2002) and inclusion of this variable is meant to account for this effect.
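Class-level aggregates of this kind can be derived from student records with a groupby-transform; the identifiers and values below are invented for illustration:

```python
import pandas as pd

# Hypothetical student records with teacher identifiers.
df = pd.DataFrame({
    "teacher_id": [1, 1, 1, 2, 2, 2],
    "math_score": [230.0, 245.0, 252.0, 218.0, 240.0, 236.0],
    "d_lunch":    [1, 0, 0, 1, 1, 0],
})

# Class-level covariates in the spirit of SES_Mean and MathScore_Mean:
# proportion of the class eligible for free/reduced lunch and mean class
# achievement, broadcast back onto each student row.
df["ses_mean"] = df.groupby("teacher_id")["d_lunch"].transform("mean")
df["mathscore_mean"] = df.groupby("teacher_id")["math_score"].transform("mean")
```

Using `transform` (rather than `agg`) keeps the result aligned with the student-level rows, which is the shape a multilevel model expects for Level 2 predictors.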

Analysis and Results

The present study incorporated hierarchical linear modeling (HLM) to examine the effect of language used in Grade 4 mathematics content standards in various states on students' mathematics achievement, as assessed by the 2009 NAEP assessment. HLM accounts for the nested structure of social science data (e.g., students are nested in teachers' classes, teachers are nested within the states where they teach) by using multilevel regression analysis (Raudenbush & Bryk, 2002). Specifically, the group averages of coefficients' effects in the individual student-level regression are treated as outcome measures for regression equations at the teacher level. Likewise, effects in teacher-level regression equations are treated as outcomes at the state level. For the analysis presented here, maximum likelihood estimation (MLE) was used to estimate effects at each level. The reported effects are, in actuality, averages of effects for each grouping factor, which are in turn weighted for the specific variance associated with each group.

While there are various versions of HLM, the present study used a three-level model (HLM-3) to investigate the research question. Specifically, it is assumed that state standards documents are read directly, and thus interpreted directly, by classroom teachers. These teachers in turn instruct their students in accordance with their interpretations of the standards documents. So, students were assessed at Level 1 in the three-level HLM and were assumed to be nested by teacher at Level 2, who were in turn nested within the state at Level 3.


As is customary in HLM analysis, the HLM-3 models were constructed by first analyzing the data with no predictors, and then including predictors sequentially at each level to properly assess changes in effects and the variance due to the nested nature of data (Raudenbush & Bryk, 2002). Therefore, in the sections that follow, the construction and modification of each model leading up to the final model of analysis is described.

Unconditional Model

The initial model examined in all HLM analyses is the unconditional model, or the model without conditions (i.e., variables). Analysis of the unconditional model allows the researchers to determine the variance in the outcome measure and how much of this variance is due to the respective levels in the analysis. It also allows for a generalized baseline interpretation of the outcome measure across the sample, without adjusting for other factors. In Equation Set 1 below, the mathematics score (MathScore) of student i taught by teacher j in state k is represented as the outcome at Level 1. The variable π_{0jk} represents the mean achievement of students taught by teacher j in state k, and e_{ijk} represents the random variance from this mean that is associated with the individual student i. At Level 2, the mean score per teacher, π_{0jk}, is the outcome variable, with β_{00k} representing the mean achievement for state k and r_{0jk} the random variance associated with the individual teacher j. At Level 3, β_{00k}, or the mean score per state, is the outcome variable, with γ_{000} symbolizing the grand mean, or overall average, of math achievement scores in all 50 states. The variable u_{00k} represents the deviation of state k's mean achievement from the grand mean.

Equation Set 1:

Level 1: (MathScore)_{ijk} = π_{0jk} + e_{ijk}.
Level 2: π_{0jk} = β_{00k} + r_{0jk}.
Level 3: β_{00k} = γ_{000} + u_{00k}.

Results from the unconditional model suggest the typical student in the United States, residing in 1 of 50 states and not accounting for any other factors, has a mathematics achievement score of 239.50, which is statistically significantly different from zero at the .001 level.
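The structure of Equation Set 1 can be illustrated with a short simulation. The grand mean (239.50) is taken from the reported results; the three variance components (state, teacher, student) are arbitrary assumed values chosen only to show how the nested error terms combine, not estimates from the study.

```python
import numpy as np

# Illustrative simulation of the unconditional model in Equation Set 1.
# The grand mean (239.50) comes from the reported results; the standard
# deviations of u, r, and e are assumed values for demonstration only.
rng = np.random.default_rng(0)
grand_mean = 239.50                      # gamma_000
n_states, n_teachers, n_students = 50, 20, 10

u = rng.normal(0, 5, n_states)                             # u_00k: state deviations
r = rng.normal(0, 8, (n_states, n_teachers))               # r_0jk: teacher deviations
e = rng.normal(0, 25, (n_states, n_teachers, n_students))  # e_ijk: student deviations

# MathScore_ijk = gamma_000 + u_00k + r_0jk + e_ijk
scores = grand_mean + u[:, None, None] + r[:, :, None] + e

# Averaging over all simulated students recovers the grand mean.
print(round(scores.mean(), 1))
```

Fitting the actual model would require multilevel software; the simulation only demonstrates how each level contributes variance around the grand mean.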


Model 1—Student-Level Covariates

Following the unconditional model, an initial conditional model was constructed by incorporating student factors at Level 1. These changes are represented in Equation Set 2. With all Level 1 predictors included, π_{0jk} now represents the mean achievement of White male students, not eligible for free and reduced lunch, who are taught by teacher j in state k. The remaining variables each represent the effect of belonging to a certain classification on a student's mathematics achievement score. Thus, π_{1jk} represents the effect of being a female student, π_{2jk} represents the effect of being eligible for free and reduced lunch, and the variables π_{3jk} through π_{7jk} represent belonging to an ethnicity other than White/Caucasian (i.e., Black, Hispanic, Asian, Native American, or Other, respectively). The term e_{ijk} represents the random variance associated with student i. At Level 2, all slopes for covariates were set constant so as to provide a more parsimonious model. Specifically, as the focus of the study is on the effect of certain language uses in states' standards (at Level 3), the variance of Level 1 and Level 2 variables between grouping factors (teachers or states) was not addressed.

Equation Set 2:

(MathScore)_{ijk} = π_{0jk} + π_{1jk}·(dFemale)_{ijk} + π_{2jk}·(dLunch)_{ijk} + π_{3jk}·(dBlack)_{ijk} + π_{4jk}·(dHispanic)_{ijk} + π_{5jk}·(dAsian)_{ijk} + π_{6jk}·(dNatAmer)_{ijk} + π_{7jk}·(dOther)_{ijk} + e_{ijk}.

Effects at Level 1 are outcome variables at Level 2. Likewise, effects in each Level 2 equation become the outcome variables at Level 3. At Level 2 and Level 3, random effects are associated only with the intercept, meaning that only the variation of the intercept was measured during this portion of analysis. This was done to keep a parsimonious, or simplified, model as additional variables were included at other levels. Results are presented in Table 2 and indicate that dFemale has a statistically significant, negative effect on students' mathematics achievement (γ_{100} = −1.12, p < .01). Other covariates found to have statistically significant effects were dLunch (γ_{200} = −12.69, p < .001), dBlack (γ_{300} = −16.86, p < .001), dHispanic (γ_{400} = −12.14, p < .001), dNatAmer (γ_{600} = −13.75, p < .001), and dOther (γ_{700} = −10.34, p < .01). The results suggest academic advantages for students who are not eligible for free and reduced lunch, who


Table 2. Level 1 Coefficient Results for Grades 4 and 8.

Variable                   Coefficient    SE
Intercept, γ_{000}           251.33***    0.77
dFemale, γ_{100}              −1.12**     0.17
dLunch, γ_{200}              −12.69***    0.21
dBlack, γ_{300}              −16.86***    0.29
dHispanic, γ_{400}           −12.14***    0.32
dAsian, γ_{500}                1.64       0.49
dNatAmer, γ_{600}            −13.75***    0.77
dOther, γ_{700}              −10.34**     0.74

*p < .05. **p < .01. ***p < .001.

are male, and who are either White or Asian (the math scores of students reported as Asian were not found to be statistically significantly different from those of White students). These results generally follow findings from previous research, and these covariates therefore merit inclusion in the present analysis.
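The dummy coding underlying Equation Set 2 can be sketched as follows. The records, column names, and category labels below are hypothetical illustrations, not the actual NAEP variable names; the point is only that White male students not eligible for free or reduced lunch form the reference group captured by the intercept π_{0jk}.

```python
import pandas as pd

# Hypothetical student records; names and categories are illustrative only.
students = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "lunch_eligible": [0, 1, 0, 1],
    "race": ["White", "Black", "Asian", "Hispanic"],
})

# Dummy-code so that White males not eligible for free/reduced lunch are
# the reference category, matching the interpretation of the intercept.
students["dFemale"] = (students["gender"] == "female").astype(int)
students["dLunch"] = students["lunch_eligible"]
race_dummies = pd.get_dummies(students["race"], prefix="d").drop(
    columns="d_White")  # White is the reference group
students = pd.concat([students, race_dummies], axis=1)

print(students[["dFemale", "dLunch", "d_Asian", "d_Black", "d_Hispanic"]])
```

Each dummy column then enters the Level 1 regression with its own coefficient (π_{1jk} through π_{7jk}), so each reported effect is a contrast against the reference group.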

Model 2—Teacher-Level Covariates

The next step in constructing the HLM-3 models was to include covariates at Level 2, the teacher level. Equation Set 2 represents the model at Level 1, as no changes were made at this level from Model 1. Equation Set 3 represents additions to the HLM-3 model, which take place at Level 2.

Equation Set 3:

π_{0jk} = β_{00k} + β_{01k}·(Experience)_{jk} + β_{02k}·(SES_Mean)_{jk} + β_{03k}·(MathScore_Mean)_{jk} + r_{0jk}.
π_{1jk} = β_{10k}.
π_{2jk} = β_{20k}.
π_{3jk} = β_{30k}.
π_{4jk} = β_{40k}.
π_{5jk} = β_{50k}.
π_{6jk} = β_{60k}.
π_{7jk} = β_{70k}.


Table 3. Results for Model 2.

Variable                   Coefficient    SE
Intercept, γ_{000}           246.46***    0.93
SES_Mean, γ_{010}             14.33***    0.69
MathScore_Mean, γ_{020}        0.96***    0.01
dFemale, γ_{100}              −1.04**     0.32
dLunch, γ_{200}              −10.57***    0.66
dBlack, γ_{300}               −9.46***    0.58
dHispanic, γ_{400}            −6.44***    0.52
dAsian, γ_{500}                1.16       1.04
dNatAmer, γ_{600}             −7.69***    0.98
dOther, γ_{700}               −6.90**     2.53

*p < .05. **p < .01. ***p < .001.

As can be seen in Equation Set 3, Experience, SES_Mean, and MathScore_Mean were included at Level 2 as predictors of the Level 1 intercept, π_{0jk}, or the mean math achievement score for teacher j in state k. SES_Mean and MathScore_Mean were group mean centered, meaning that the effects for these particular variables are adjusted for the state mean of teacher j. So, β_{01k} represents the effect of a teacher having more than 3 years of experience in the classroom on the mean math achievement score of his or her students. Also, β_{02k} represents the effect of having a class with larger (or smaller) percentages of students eligible for free and reduced lunch than the average for state k. Similarly, β_{03k} represents this effect in regard to higher average NAEP mathematics achievement scores relative to the average achievement score in state k. Initial results from Model 2 indicated that Experience was not statistically significant. We examined whether the removal of this variable would significantly change the model deviance, an indicator of model fit. This examination suggested that removal of the variable did not decrease model fit, χ²(df = 1) = .13, p > .05. Therefore, the variable was removed to defer to a more parsimonious model. Results for Model 2 are presented in Table 3 and indicate that coefficients at Level 1 were statistically significant at levels similar to those found in Model 1. At Level 2, SES_Mean, now designated by β_{01k}, was found to have a positive and statistically significant effect on math achievement scores (γ_{010} = 14.33, p < .001), indicating that teachers with higher percentages of students eligible for free and reduced lunch had higher average mathematics scores in their class. This finding is counter to previous research examining class mean SES. Although


this particular finding is outside the scope of the present study, it is worth noting that prior research has indicated eligibility for free or reduced lunch may not be a reliable indicator of poverty, primarily because many students are eligible for such lunch programs for 1 or 2 years and then rise out of such indicators of poverty (M. S. Hill & Jenkins, 2001). However, a less reliable indicator would more likely result in statistically nonsignificant findings, suggesting that our small finding here needs further investigation. We acknowledge the possible limitation of eligibility for free or reduced lunch as a sole measure of SES, but include it due to a lack of additional information on the SES of the student population in the sample. As expected, MathScore_Mean, β_{02k}, was found to be a statistically significant and positive predictor of student achievement (γ_{020} = 0.96, p < .001).
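Two of the steps described in this section can be sketched concretely: group-mean centering (expressing each teacher-level value relative to its state mean) and the likelihood-ratio check of dropping Experience (comparing the deviance change of .13 against a chi-square distribution with df = 1). The data frame below uses hypothetical state labels and values for illustration; the deviance value comes from the text.

```python
import math
import pandas as pd

# Group-mean centering: each teacher-level value is expressed relative to
# the mean of the teacher's state. States and values here are hypothetical.
teachers = pd.DataFrame({
    "state": ["OH", "OH", "KY", "KY"],
    "ses_mean": [0.40, 0.60, 0.30, 0.50],
})
teachers["ses_centered"] = (
    teachers["ses_mean"]
    - teachers.groupby("state")["ses_mean"].transform("mean")
)

# Likelihood-ratio check of dropping Experience: the deviance difference
# (.13, from the text) is referred to chi-square with df = 1. For df = 1,
# the chi-square survival function equals erfc(sqrt(x / 2)).
deviance_change = 0.13
p_value = math.erfc(math.sqrt(deviance_change / 2))

print(teachers["ses_centered"].round(2).tolist(), round(p_value, 2))
```

Because the p value is well above .05, removing Experience does not significantly worsen model fit, matching the decision reported in the text.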

Model 3—Final Model

The final model was constructed by including the main variables of interest at Level 3. These additions are represented by Equation Set 4.

Equation Set 4:

Level 2:
π_{0jk} = β_{00k} + β_{01k}·(SES_Mean)_{jk} + β_{02k}·(MathScore_Mean)_{jk} + r_{0jk}.
π_{1jk} = β_{10k}.
π_{2jk} = β_{20k}.
π_{3jk} = β_{30k}.
π_{4jk} = β_{40k}.
π_{5jk} = β_{50k}.
π_{6jk} = β_{60k}.
π_{7jk} = β_{70k}.

Level 3:
β_{00k} = γ_{000} + γ_{001}·(Description)_k + γ_{002}·(Rationale)_k + u_{00k}.
β_{01k} = γ_{010}.
β_{02k} = γ_{020}.
β_{10k} = γ_{100}.
β_{20k} = γ_{200}.
β_{30k} = γ_{300}.
β_{40k} = γ_{400}.
β_{50k} = γ_{500}.
β_{60k} = γ_{600}.
β_{70k} = γ_{700}.

Level 2 was left unchanged from Model 2. At Level 3, the independent variables Description and Rationale were added as predictors for β_{00k}, which represents the mean math achievement score for White male students in state k who are taught by teachers whose classes have average achievement and average percentages of students eligible for free or reduced lunch for their respective state. Given the nature of the independent variables, β_{00k} also represents the aforementioned mean score for a state k that does not have any content standards including language encouraging mathematical communication, in the form of either description or rationale. Moreover, γ_{001} represents the effect on a state's mean math achievement associated with the percentage of mathematics content standards requiring students to describe or explain their mathematics. The effect of the percentage of standards requiring students to provide conjectures, rationales, or justifications is represented by γ_{002}. Finally, u_{00k} represents the random variance associated with state k from the grand mean of all states' math achievement scores.

Final model results. Final model results for Grade 4 are presented in Table 4. As can be seen from Table 4, Level 1 coefficients saw little change from previous models, and Level 2 covariates were also found to be similar to findings from Model 2. For Level 3, Description was found to have a positive and statistically significant effect (γ_{001} = 22.41, p < .01). However, Rationale was not found to be statistically significant (γ_{002} = −2.12, p = .90). These coefficients should be interpreted in reference to the percentage of such standards a state has included. For example, in a state with 16% of its standards including requirements for description (the national average), the average student would typically see an increase of 3.59 points in his or her score. Such a change is not meaningful in magnitude.
Rather, to improve the average student’s mathematics achievement score by just half a standard deviation, a state would need approximately 60% of its content standards to include expectations for mathematical description. As indicated previously, no state had such high frequencies of mathematics standards requiring description. Therefore, these results suggest that although the inclusion of expectations for mathematical description in content standards does have a statistically significant effect on students’ mathematics achievement, for most states this effect is not meaningful in size. Furthermore, inclusion of standards requiring rationales from students was not found to have a statistically significant effect on their mathematics achievement. Essentially, various states’ incorporation of mathematical communication in their Grade 4


Table 4. Final Model Results (Grade 4).

Variable                   Coefficient    SE
Intercept, γ_{000}           243.21***    1.81
Description, γ_{001}          22.41***    7.81
Rationale, γ_{002}            −2.12      16.94
SES_Mean, γ_{010}             14.32***    0.68
MathScore_Mean, γ_{020}        0.96***    0.01
dFemale, γ_{100}              −1.04**     0.32
dLunch, γ_{200}              −10.56***    0.66
dBlack, γ_{300}               −9.47***    0.58
dHispanic, γ_{400}            −6.44***    0.51
dAsian, γ_{500}                1.19       1.03
dNatAmer, γ_{600}             −7.68***    0.98
dOther, γ_{700}               −6.95**     2.53

*p < .05. **p < .01. ***p < .001.

mathematics standards documents is generally not effective in improving students’ mathematics achievement.
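The effect-size interpretation above reduces to simple arithmetic. The Description coefficient (22.41) and the national average prevalence (16%) come from the text; the student-level standard deviation of roughly 27 NAEP points is an assumed value chosen to be consistent with the half-standard-deviation claim, not a figure reported in this excerpt.

```python
# Back-of-envelope check of the Description effect interpretation.
# 22.41 and 0.16 come from the reported results; the SD of ~27 NAEP
# points is an assumption, not a reported figure.
description_coef = 22.41
avg_prevalence = 0.16

# Expected score gain for a state at the national average prevalence.
gain_at_average = description_coef * avg_prevalence

# Prevalence needed for a half-standard-deviation improvement.
assumed_sd = 27.0
prevalence_needed = (0.5 * assumed_sd) / description_coef

print(round(gain_at_average, 2), round(prevalence_needed, 2))
```

The first quantity reproduces the 3.59-point gain cited in the text, and the second shows why roughly 60% of a state's standards would need to require description to move achievement by half a standard deviation.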

Discussion

Results of the present study should be interpreted with great care. Specifically, these findings are indicative of the quality of state standards that were in effect in 2009 for Grade 4, most of which are no longer in effect. Furthermore, the results are indicative of mathematics content standards specifically and not of mathematics teaching practices generally. It is of utmost importance that these facts be considered prior to interpreting the findings presented and discussed here.

The present study presents a complicated view of how various states' standards conveyance of mathematical communication affected students' mathematics achievement. Results revealed that higher frequencies of description standards in Grade 4 documents did have a statistically significant effect on mathematics scores. However, the size of these effects was relatively low in magnitude. Interestingly, higher frequencies of rationale standards were not found to have a statistically significant effect on mathematics achievement. On one hand, these findings could be interpreted to mean that efforts to include expectations for mathematical communication are not worthwhile. On the other hand, there is a large body of research demonstrating young children's ability to


describe and argue about their mathematics (e.g., Hufferd-Ackles et al., 2004; Yackel & Cobb, 1996). Therefore, expectations for Grade 4 children to explain and justify their mathematics are reasonable. In addition, prior research on the effect of NCTM's (2000) Principles and Standards document suggests this is not the case (Hamilton & Berends, 2006; Loeber, 2008; Young, 2007). Such documents can be effective, but the majority of U.S. states' standards documents simply were not effective, or were not effective enough. Rather than conclude that standards do not matter, it is more likely that various states' incorporation of communication expectations in content standards has not been done in an effective manner. Such a conclusion would align with studies such as that conducted by Larnell and Smith (2011), who found that various states' inclusion of mathematics communication-oriented expectations was vague, inconsistent, and did not always align with the higher expectations described by NCTM (2000). Although the present study did not report on the quality of such expectations in various state content standards, its findings do suggest that such expectations, regardless of quality, were included in relatively few of each state's content standards. Therefore, findings from the present study provide additional evidence that the language used in state standards is ineffective at communicating expectations for students (particularly in regard to providing mathematical descriptions and rationales). If the reason for the lackluster effects found in this study rests in the manner in which such expectations were included in content standards, then it is logical that low frequency, the vague nature of inclusion, or a combination of both would create conditions under which these standards would not have a sizable effect on student achievement.
Although findings from the present study are general in nature, they can inform potential improvements regarding the inclusion of communication expectations in content standards. Practical implications from this study suggest that mathematics content standards should include expectations for mathematical communication in higher frequencies and in language more explicit and supportive of NCTM (2000) recommendations. While approximately half of the states did include reference to communication in a separate set of expectations, there appears to be little effect on students' mathematics achievement whether or not this separate section is included in a document.⁵ Therefore, it may be that for expectations for mathematical communication to be conveyed effectively, they should be embedded in content standards. However, such expectations must be conveyed in a meaningful manner. For example, Larnell and Smith's (2011) analysis suggests that standards including expectations for mathematical descriptions should do so in a way that elicits higher reasoning and engagement in the content. One solution to this mismatch is for states to


provide further specification to enhance standards' ability to convey expectations for mathematical communication. For example, Hawaii (2007) asks Grade 4 students to "propose and justify conclusions/predictions based on data." The standard lacks specification on the type of data with which conclusions and/or predictions would be associated, and this in turn shapes the format of the justification to be solicited from students. Certain types of data elicit more or less sophisticated justifications. For example, a pie chart illustrating the number of boys and girls in a classroom would likely elicit less interesting justifications than a line plot of different heights in the classroom. In addition to improving the wording of standards, another possibility is to provide teachers with meaningful resources that articulate, for each standard, how students should engage in communicating mathematically. Confrey et al. (2012) provided one example of such a resource with the website www.turnonccmath.net. The resource provides in-depth descriptions and learning trajectories for standards and groups of standards. However, examination of the effectiveness of such a resource on teachers' interpretations of standards is still needed. Examining and improving the clarity of standards' expectations for mathematical communication provides a useful starting point for improving the quality of mathematics content standards. Yet, research findings suggest that part of the issue may be how teachers interpret, or misinterpret, such standards (e.g., H. C. Hill, 2001, 2005). H. C. Hill's (2005) observations of teachers' interpretations of standards documents found that teachers interpreted the standards from the perspective of the curriculum at hand (i.e., the textbook), rather than the intent of particular standards.
Newton's (2009) observations of Grade 6 teachers showed that teachers consistently engaged students in relatively less descriptive or "how" questioning than the written curriculum (textbook) they were assigned. If we pair the findings of H. C. Hill (2005) and Newton (2009), an interesting problem comes to the fore. Potentially, teachers may interpret content standards' expectations for mathematical communication via the expectations of their textbook, but their interpretations of such textbooks' expectations may also not match the intent behind the curriculum. While the present study did not investigate teachers' interpretations of standards, such factors may have influenced the findings presented here. Yet, it is unclear how separable the issues of unclear standards (or curriculum in general) and teachers' misinterpretations of such standards actually are. The two issues are likely intertwined, and further study is needed to determine how much effort should be devoted to clarifying standards, and how much should be devoted to professional development regarding standards. The findings of the present study suggest that including standards that require descriptions and rationales of mathematics does not affect students' mathematics achievement. However, it is unclear how well NAEP measures


Figure 1. NAEP extended constructed-response item that includes the question "why or why not." Source. Obtained from NAEP questions tool (NCES, 2014). Note. NAEP = National Assessment of Educational Progress; NCES = National Center for Education Statistics.

these particular skills. Although NAEP does include some prompts that solicit explanations, accessible items do not provide a clear assessment of rationales on Grade 4 items. Figure 1 provides an example item from the NAEP Question Tool that seemingly solicits a rationale. However, the scoring rubric evaluates the correct or incorrect answer along with whether the description of procedures is appropriate. No mention of rationale or justification is provided. A similar phenomenon was observed by Kosko and Wilkins (2011) in which open-response items from various assessments of quantitative literacy were examined. Specifically, while several items included requests for descriptions and rationales, the rubrics often did not assess such data from test takers. If a similar phenomenon is present in NAEP items, it may explain why the effect sizes obtained in this study were not larger. However, various studies have documented the positive association between these mathematical communication behaviors and achievement scores, where achievement was not measured with items soliciting description or justification (e.g., Hiebert & Wearne, 1993; Kosko, 2012; Kosko & Miyazaki, 2012). Therefore, it is more likely that the findings of the present study are due mainly to the manner in which various states incorporated communication expectations in their content standards.

The Common Core

The findings of the present study may provide some information on how the CCSSM will impact student achievement in the coming years. Table 5 shows the average makeup of each type of communication standard for 2009-effective standards alongside the CCSSM. While higher prevalence of descriptive standards was found to have a positive effect on mathematics achievement at Grade 4, there is a much lower


Table 5. Comparison of Grade 4 Communication Standards in 2009 and Common Core Standards.

Standard type    2009 state standards    Common Core State Standards
Descriptive             0.16                       0.06
Rationale               0.05                       0.12

prevalence of such standards in the new CCSSM. Yet, the standards included in CCSSM that solicit descriptions appear to provide some clarity in the kind of explanations expected of students. For example, the standard below states that students should explain calculations when multiplying multidigit numbers using different representations. Although anecdotal, comparing this expectation with the examples provided earlier from Maryland and Hawaii suggests a more purposeful expectation from CCSSM than from some states.

    Multiply a whole number of up to four digits by a one-digit whole number, and multiply two two-digit numbers, using strategies based on place value and the properties of operations. Illustrate and explain the calculation by using equations, rectangular arrays, and/or area models. (CCSSI, 2010, p. 29)
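The place-value strategy named in the quoted standard can be made concrete with a small sketch. The numbers below (3 × 2,146) are made up for illustration; the code simply decomposes a factor into its place-value parts and sums the partial products, the kind of explanation the standard asks students to provide.

```python
def place_value_parts(n: int) -> list[int]:
    """Split a whole number into place-value parts, e.g. 2146 -> [2000, 100, 40, 6]."""
    parts = []
    place = 1
    while n > 0:
        digit = n % 10
        if digit:
            parts.append(digit * place)
        n //= 10
        place *= 10
    return parts[::-1]

# A place-value explanation of 3 x 2,146: multiply each part, then sum
# the partial products (the distributive property the standard invokes).
parts = place_value_parts(2146)
partial_products = [3 * p for p in parts]
total = sum(partial_products)

print(parts, partial_products, total)
```

Each partial product corresponds to one row of the rectangular array or one region of the area model mentioned in the standard.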

While description standards are less prevalent in Grade 4 CCSSM standards, rationale standards saw subtle increases over the national mean in 2009. As with the example provided for description standards, the rationale standards in CCSSM provide more specific expectations of what should be included. In the example below, rationales for comparing numbers with decimals are conveyed through the expectation of reasoning via comparison of the size of numbers and justifying symbolic representations through other representations, such as a visual model.

    Compare two decimals to hundredths by reasoning about their size. Recognize that comparisons are valid only when the two decimals refer to the same whole. Record the results of comparisons with symbols >, =, or