New Zealand teachers’ overall teacher judgements (OTJs): Equivocal or unequivocal? Jenny Poskitt and Kerry Mitchell
Abstract

Central to New Zealand National Standards is the concept of overall teacher judgements (OTJs). This paper examines the concepts of OTJs and standards through international literature and the experiences of a sample of New Zealand teachers in 2010. Standards contain expectations (what is, what could be and what might be desirable), that is, implicit degrees of performance. Teacher capacity to judge current and future performance is important. With multiple opportunities to gather pertinent information, teachers are best placed to make valid (unequivocal) judgements on student achievement when they have shared understandings of standards. Because standards comprise multiple criteria, not all of which are evident in samples of student achievement, teacher understanding of standards develops through professional conversations and moderation processes. In 2010 New Zealand teachers had mixed (equivocal) understandings of National Standards, applied them in different ways and had minimal experience of moderation processes.
Introduction

Within the international context of political concern about standards of education, the New Zealand Government launched National Standards into New Zealand schools in 2010 for students in Years 1–8. The National Standards have been defined in writing, reading and mathematics in relation to the national curriculum. The intention is to monitor student achievement, to ascertain whether standards are improving or declining across the school sector, and to ensure that by senior secondary school students are sufficiently prepared to achieve the National Certificate of Educational Achievement (NCEA) at Level 2. To date, New Zealand has avoided the widely criticised national testing programmes introduced elsewhere, notably the No Child Left Behind policy introduced in the US in 2001. This is because New Zealand values the central role of formative assessment in improving learning and teaching, and the professionalism of its teachers (Absolum, Flockton, Hattie, Hipkins, & Reid, 2009). Critical to the implementation of National Standards in New Zealand are the notion of standards and the centrality of the OTJ. These concepts are explored in the following sections before examining data from New Zealand teachers about their understanding and implementation of OTJs.

Assessment Matters 4 : 2012
Standards

In everyday language the word standards has a range of meanings, such as a standard hotel room (implying a basic level) and products meeting New Zealand safety standards (implying quality assurance of manufacturing). In education, standards, according to Sadler (1987, p. 191), refer to “fixed points of reference for assessing individual students”. However, omitted from this definition is the inherent element of expectation: the interrelationship, if you like, between what is, what might be acceptable and what might be desirable. For example, perhaps a standard light bulb could illuminate for 10,000 hours, but with some technological advancement it may be possible to manufacture a light bulb that illuminates for 30,000 hours. This additional time might require less environmentally sustainable materials or highly expensive manufacturing processes. A compromise is needed between creating light bulbs with a longer illumination lifespan and “acceptable” manufacturing and environmental costs. Similarly, standards in education are somewhat arbitrary. For example, Hipkins and Hodgen posit an operational definition based on socioconstructivist beliefs: “a standard is a collective of exemplars, shared experiences and accumulating practice” (2011, p. 2, emphasis in original). Fundamental to the understanding of standards is the recognition that they are defined to serve particular purposes (Sadler, 1987), in a particular period of time, within a particular political, cultural and societal context, and with implicit values. In education, specifying standards is a process of ascertaining what is desirable and what is possible within human and resource constraints. Sadler’s (1989, p. 129) refined definition makes a distinction between possibility and desirability: “a standard or reference level is a designated degree of performance or excellence. It becomes a goal when it is desired, aimed for or aspired to.” A key concept in standards, according to Ferrara, Johnson and Chen (2005), is the prediction, from the attainment of one standard, of subsequent performance in adjacent standards. Standards relate to, and are defined in conjunction with, other standards within a set of standards. However, a set of standards does not imply equal increments or an interval scale (Sadler, 1987), particularly when the underlying body of knowledge is not continuous. Sadler’s (1989, p. 129) definition of a standard—“a designated degree of performance or excellence”—implies not only the specification of the degree of performance or excellence, but also the ascertaining or judging of that performance. Questions arise as to who might make such judgements and on what basis.
Teacher judgements

Pivotal to the New Zealand National Standards policy is the notion of OTJs. Each of the words overall, teacher and judgement is examined separately in the next sections before the combined term is discussed, beginning with the central word teacher. The New Zealand Government has decreed, by using the term “overall teacher judgement”, that teachers are best placed to make judgements on National Standards. There is support in the research literature for teachers making judgements. Analysing data from a Queensland study, Cumming, Wyatt-Smith, Elkins and Neville (2006, p. 16) argue that, “Teachers as professionals are able to make appropriate judgements about students’ work and ... are best placed to make judgements ... and to provide full information on student performance in a range of contexts and through a range of assessment opportunities.” They argue that teachers can draw upon a wide pool of formal and informal classroom assessments, over a period of time, from various learning situations, to synthesise a judgement, and use that judgement to inform the next teaching and learning steps. Higher validity is achieved from using multiple assessments in various learning contexts than from basing judgements on one or a few national tests, for example. Using teacher judgements is consistent with contemporary theories of pedagogy and formative assessment, wherein the assessment information can be used immediately by teachers and learners to inform subsequent learning. Moreover, Sadler (1987) argued that qualitative judgements can be made across the curriculum, in a range of learning situations, across a range of cognitive and performance tasks (e.g., musical performances, product production such as food or materials technology, sports). Therefore, Sadler (1987, p. 193) maintains, “standards-referenced assessment draws upon the professional ability of competent teachers to make sound qualitative judgements of the kind they make constantly in teaching”. Judgements are integral components of teaching, learning and assessment.
Judgements

Sadler (1989, p. 121) states, “In assessing the quality of a student’s work or performance, the teacher must possess a concept of quality appropriate to the task, and be able to judge the student’s work in relation to that concept.” This understanding of judgement is based partly on the tacit knowledge that “experts” or “judges” hold intuitively about a standard (Sadler, 1987). In this context, tacit knowledge comprises experience, built up over time, of student performance with particular tasks or concept development. The difficulties with tacit knowledge are the inaccessibility of this knowledge to non-experts and students, disagreements amongst experts, the time-consuming nature of developing consensus amongst a panel of experts, and the variability of final decisions depending on the social dynamics of a judging panel (Sadler, 1987). To counter these limitations of tacit knowledge, Sadler advocates the combined use of exemplars and verbal descriptions. Exemplars are denotative of standards in that they illustrate them, often by way of an annotated product, or indicative attributes, skills or processes. Typically, however, standards are based on multiple criteria. Because exemplars illustrate only a few criteria, they are limited in which components of a standard they clarify. To rely solely on exemplars would necessitate innumerable exemplars to portray the multiple criteria within a standard. However, when used in conjunction with verbal descriptions, fewer exemplars are required. Verbal descriptions explain “the properties that characterise something of the designated level of quality” (Sadler, 1987, p. 201), and in so doing make the qualities accessible to people other than the experts. While verbal descriptions may be limited by the precision of the language used, and hence are always somewhat subject to interpretation, therein lies the learning potential. Deconstructing the standard within the context in which it is applied (e.g., reading performance in a rural classroom) enables deeper understanding of the quality characteristics defining the standard, and of the importance given to particular criteria evident in samples of student work. Remember that a standard comprises multiple criteria, and samples of work will demonstrate many, but not all, of those criteria. Some criteria will be sharper than others (Sadler, 1989). Sharp criteria are those that are present or not, or correct/incorrect (e.g., punctuation), while fuzzy criteria are “characterised by a continuous gradation from one state to another ... they have no absolutes and unambiguous meaning independent of its context” (Sadler, 1989, p. 124). For example, whether a student’s piece of writing has impact on the reader has “fuzziness” because all pieces of writing have an impact on the reader—but to what degree and in what ways will vary. In deriving a judgement, teachers bring together a compilation of criteria, only some of which are manifested in any particular student’s work. The dilemma can be determining whether a standard has been met when some specific criteria are not evident. This situation can lead to debates about whether all criteria within a standard need to be evident to demonstrate competency, or whether a midpoint judgement of a standard suffices.
To some extent the decision will be determined by whether teachers use analytic or holistic scoring. Analytic scoring comprises identifying the criteria to be judged, determining the weighting given to each component, judging samples of work in relation to each criterion and awarding marks accordingly, and subsequently totalling the separate scores to ascertain the overall score. In contrast, holistic scoring requires the assessor to make an overall or configurational assessment (Kaplan, 1964, p. 211, as cited in Sadler, 1989, p. 132) and subsequently defend it by reference to pertinent criteria. The strength of this approach is its capacity to capture unexpected applications of criteria, particularly for tasks or processes that require problem solving or creativity. Moreover, it is more relevant to everyday life, where multiple solutions to problems can be deemed successful and are not dependent on predetermined and prescribed criteria. Judgements can therefore vary according to which samples of work or performances are compared against which particular frame of reference. Indeed, Hipkins (2010) argues that the application, and therefore the judgement, of a standard is socially constructed and dynamic according to evolving professional knowledge, understandings and interpretations. She states (p. 19), “the standard resides in the collective constituted by: the formal standard’s definition and notes; the body of tasks used to assess the standard; the range of student work generated by those tasks; and the history of judgements that builds up in relation to how and why student work meets a standard or not”.
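The mechanics of analytic scoring described above can be sketched in code. This is an illustrative simplification only; the criteria names, marks and weightings below are invented for the sketch and are not drawn from the National Standards or from any scoring scheme discussed in this paper:

```python
def analytic_score(marks, weights):
    """Analytic scoring: judge each criterion separately, weight it,
    then total the separate scores into an overall score."""
    if set(marks) != set(weights):
        raise ValueError("each criterion must have both a mark and a weight")
    return sum(marks[c] * weights[c] for c in weights)

# Hypothetical writing sample judged on three invented criteria (marks out of 5).
marks = {"punctuation": 4, "structure": 3, "impact": 2}
weights = {"punctuation": 0.2, "structure": 0.3, "impact": 0.5}
overall = analytic_score(marks, weights)  # 0.8 + 0.9 + 1.0 = 2.7
```

Holistic scoring, by contrast, resists this kind of decomposition: the assessor forms a single configurational judgement first and appeals to criteria only afterwards, to defend it.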
Overall teacher judgements

Bringing together multiple samples of work in order to derive a holistic judgement of a student’s performance in relation to the National Standards is the essence of the New Zealand OTJ. New Zealand has focused on formative and cumulative teacher judgements, with the intention of generating more valid and informative assessments of student progress. According to the Ministry of Education fact sheet on OTJs (Ministry of Education, 2010):

[an OTJ] draws on and applies the evidence gathered up to a particular point in time in order to make a judgement about a student’s progress and achievement. Using a range of approaches allows the student to participate throughout the assessment process, building their assessment capability ... No single source of information can accurately summarise a student’s achievement or progress. A range of approaches is necessary in order to compile a comprehensive picture of the areas of progress, areas requiring attention, and what a student’s progress looks like.
“Triangulation” of information increases the dependability of the OTJ. See Figure 1 for more detail.
Figure 1: OTJ in relation to National Standards

The overall teacher judgement (OTJ) is a decision made in relation to the National Standards, which are used as a signpost. Evidence is triangulated from three sources:

Observation of process. Evidence obtained from informal assessment opportunities, incorporating the observation of process, such as:
• Focused classroom observation
• Student work books
• Tasks: e.g., maths tasks, ARBs
• Running records
• Student peer assessment

Learning conversations. Evidence arising from learning conversations, such as:
• Conferencing
• Interviewing
• Questioning
• Explaining
• Discussing

Tool outcomes. Evidence obtained from assessment tools, including standardised tools, such as:
• 6 year observation survey
• PAT
• STAR
• e-asTTle
• GloSS
• IKAN
• NumPA

Source: Ministry of Education (2010)
Difficulties with teacher judgements

National policy for OTJs implies that teachers understand what OTJs comprise and that they are able to use the processes appropriately in classroom practice. However, the application of teacher judgement has been somewhat problematic in other countries. Gardiner, Tom and Wilson (2000) and Harlen (2007) argue that the key issue in the role of teacher judgements is consistency. In order for the wider educational community to be confident about teacher judgements, teachers need to be consistent, so that “judgements about student learning are not dependent on the individual teacher, student, location or time, and are based on a shared understanding of syllabus and standards of learning” (New South Wales Department of Education and Training, 2008, p. 1). Users of teacher judgements need to be confident about accuracy and consistency across the profession. To this end, a report in Queensland by Cumming et al. (2006, p. 9) recommended guidance for teachers’ judgement: “Descriptive elaborations of standards, citing the benchmark performance within such standards, should be developed to guide teacher assessment, with opportunities for teachers to engage through some process of moderation and sharing of exemplars.” These authors recommended further professional development with teachers on the interpretation and use of the assessment data on which teachers formulate judgements. The Queensland Studies Authority funded several projects within a research programme to explore the capacity of teachers to make judgements against defined standards, and whether their judgements could provide more valid and reliable information about student capabilities than standardised tests (Cumming et al., 2006; Maxwell, 2002). Despite increased validity, there were ongoing issues with the comparability and consistency of teacher judgements in Queensland, particularly in aligning teacher assessment task results with norm-based assessments. Similar concerns about the reliability of teacher judgements were found in the US. A research study by Meisels, Bickel, Nicholson, Xue and Atkins-Burnett (2001) investigated teachers’ capability to make consistent—and therefore trustworthy—judgements on student work. Meisels and colleagues asked teachers to collect and evaluate samples of student work in structured portfolios, to use checklists, and to prepare summary reports three times a year. The researchers compared this Work Sampling System with student results on the Woodcock-Johnson battery of literacy and math measures and found correlations between the two ranging from 0.5 to 0.75. These figures indicate a sufficient degree of correlation to trust and support teacher judgements, but an insufficient statistical relationship to suggest that the Work Sampling System and the Woodcock-Johnson battery measure the same dimensions. Meisels et al.
(2001) concluded that teacher judgements from student work samples are inadequate for high-stakes decision making or comparison of students, but helpful for informing ongoing teaching and learning programmes, for ongoing monitoring of student progress and for informing parents. Harlen (2007) argues that the reliability of teacher judgement is low, but can be improved through quality control and quality assurance processes, whereby task specification is increased, professional collaboration and learning are encouraged, and effective moderation processes are put in place. Many authors (e.g., Cumming et al., 2006; Sadler, 1989) argue that, despite the lower reliability of teacher judgements, the gains in validity and the valuing of formative assessment for ongoing learning are immense. A system that values teacher judgements supports formative assessment practices. Cumming et al. (2006, p. 16) cite Kellis and Silvernail (2002), who argue that teacher judgement of student progress has multiple benefits:

[firstly it] provides immediate feedback; secondly teachers are able to use more recent information on student learning to make instructional decisions rather than waiting months for test scores; and thirdly, owing to the unobtrusive nature of teacher judgement techniques, such as daily observation, teachers have the opportunity to make deeper judgements of student learning that go beyond fact based, short answer responses.
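The strength of agreement reported earlier by Meisels et al. (correlations between 0.5 and 0.75) can be made concrete with a small sketch. The student scores below are invented for illustration and are not data from that study; the Pearson coefficient simply quantifies how far two measures agree on the ordering of the same students:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for five students: teacher portfolio judgements vs. a
# standardised test, both on an arbitrary 1-5 scale (invented values).
portfolio = [1, 2, 3, 4, 5]
test      = [3, 1, 2, 4, 5]
r = pearson(portfolio, test)  # 0.7: broad agreement, not identical dimensions
```

A coefficient in this band means the two measures broadly rank students similarly, which supports trusting teacher judgements, while falling well short of the near-1.0 agreement that would suggest they capture the same dimensions.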
Moreover, quality professional discussions during moderation processes can be valuable opportunities for teachers to clarify expectations, confirm judgements, increase confidence (e.g., Maxwell, 2002), critically reflect on achievement and assessment activities, and make adjustments to learning and teaching approaches. What is pertinent here is that the research literature demonstrates that the application of OTJs can be problematic unless teachers are clear about what constitutes an overall teacher judgement, teachers have common understandings of “standards”, such understandings are supported by clear criteria and exemplars of student work demonstrating achievement of standards, and moderation processes are used to ensure appropriate and consistent judgements are made (Gardiner et al., 2000). Given that New Zealand is only beginning to use OTJs in the primary school sector, a research investigation set out to explore teacher understandings about OTJs and how teachers were applying OTJs in the classroom and school.
Context and methodology of the research study

Although some studies have been conducted about teacher judgements in countries such as the US and Australia, little was known about New Zealand teachers’ understanding, and application, of OTJs, or any related professional learning needs. Because little was known about teachers’ understandings, an exploratory (intrinsic) case study research design was selected (Punch, 2009). Exploratory case studies enable researchers to develop a better understanding of a case (teacher understanding and use of OTJs), in natural settings (schools), within the complexity of school life. The phenomenon of OTJs was explored within the bounded context of a sample of 10 primary schools in the 2010 Assess to Learn (AtoL) professional development contracts (for information about AtoL, refer, for example, to Poskitt and Taylor, 2008). These schools were invited to participate because they were already part of a larger evaluation study (the AtoL contract), and hence willing to engage with external evaluation and research. Furthermore, their participation in formative assessment professional learning meant they were likely to have some understanding of classroom and school-wide assessment processes that could be drawn on for making OTJs. If these teachers struggled with the concept and application of OTJs, then this would have significant implications for nationwide professional development programmes. Schools from four regions were invited to participate, from which 10 schools chose to be involved: three in Auckland, three in Christchurch, three in Manawatu and one in Wellington. This represented five (out of eight) AtoL professional development providers. Typically, case study research design uses multiple sources of evidence. Within the broader evaluation of AtoL, a range of data was collected: researcher, teacher and facilitator observations; student achievement data; and national questionnaires of teachers and principals. In this smaller exploratory case study, semistructured interviews were the primary source of data, supplemented by document analysis. Semistructured interviews enable researchers to prepare interview questions in advance (Punch, 2009).
Advance planning also enables interviewees to be at ease, knowing the questions to be asked, and to gather relevant information and think or reflect prior to the interview. Semistructured interviews also incorporate some flexibility, allowing spontaneity beyond the preprepared questions and further in-depth exploration of ideas with participants.
Interviews were conducted with school principals and teachers. An interview protocol was developed and used for all interviews. During the interviews, participants were encouraged to refer to samples of teacher or school documentation related to understandings of OTJs, benchmarks or reference points against which judgements were made, and processes used in the school. The interviews were recorded and analysed. The first interviews took place mid-year, in June 2010, and the same people were interviewed again in November 2010. Findings from the interviews were reported to the Ministry of Education and to the AtoL providers so that assessment facilitators could respond to the identified needs.
Data analysis

Interviews were initially transcribed and then collated according to the type of interviewee and the interview question. For example, all principal interviews were labelled and collated into a principal response file; responses were then analysed question by question. Coding sought to summarise the data by identifying themes and patterns. The first phase established descriptive codes and themes (finding and describing relationships amongst the data). The intention was to develop the data into second-level, inferential pattern codes (giving meaning or interpretation to the data relationships). However, when the researchers met to validate the coding and discuss emerging relationships, the similarity to Gardiner et al.’s (2000) study became apparent, and the data were then reorganised into the four questions discussed below (e.g., “What do teachers understand about OTJs?”). The resultant findings could potentially be used for ongoing professional learning purposes. The reader is cautioned, however, that this is an exploratory study of only 10 schools, so the findings, while potentially useful for informing future professional development provision, have limited generalisability.
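The collation step described above, with responses grouped first by interviewee type and then by question, can be sketched as a simple grouping. The transcript fragments and question labels below are invented for illustration and are not quotations from the study:

```python
from collections import defaultdict

def collate(responses):
    """Group interview responses by interviewee role, then by question,
    so each question can be analysed across all interviewees of one role."""
    files = defaultdict(lambda: defaultdict(list))
    for role, question, answer in responses:
        files[role][question].append(answer)
    return files

# Hypothetical transcript fragments (invented, not from the study).
responses = [
    ("principal", "Q1", "We moderate judgements in syndicate meetings."),
    ("teacher",   "Q1", "Mostly a gut feeling backed by running records."),
    ("principal", "Q2", "Staff use the writing matrix as a reference point."),
    ("teacher",   "Q1", "I triangulate at least three pieces of evidence."),
]
files = collate(responses)
# files["teacher"]["Q1"] now holds both teacher answers to question 1,
# ready for question-by-question analysis within that role.
```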
Findings

Principals (n = 10) and teachers (n = 30) were interviewed to explore understandings about making OTJs and the processes used in the school to support teachers. This paper examines only the data related to teachers.
The data are presented here in relation to Gardiner et al.’s (2000) four identified principles (phrased here as questions): What do teachers understand about the constitution of OTJs? How common are their understandings about “standards”? To what extent do they use clear criteria and exemplars of student work to support their understandings? How are moderation processes used to ensure appropriate and consistent judgements are made?
Teacher findings

What do teachers understand about OTJs?

Teachers varied in the way they conceptualised OTJs. Three interview extracts illustrate contrasting understandings about OTJs: the first referring to “gut feeling”; the second to the need to refer to information but uncertainty about how to interpret the information; and the third to drawing on a large collection of data, including students’ perspectives of their work:

My initial thinking is it is a gut feeling and reaction. I observe at mat time, listen and see their [children’s] response to my questions. I sometimes gather information at the end of a lesson to support my gut feeling. (School P, Teacher 1, June 2010)

Having more than one piece of evidence—a range of assessments to get best fit. Not sure—need one piece of standardised assessment, but I place those in background especially if done some time ago—in process but not the dominating factor. I may make up a test according to the criteria I have and use that as one piece in the puzzle. (School L, Teacher 4, June 2010)

OTJs are formed by combining a range of data from informal and formal assessment, incorporating the child’s views as well as the teacher’s. I draw on conversations with the child, something in writing (like their use of self- and peer assessment), and refer to my modelling books. I incorporate the kid’s judgements and evidence from a range of writing like narrative and report writing and highlight when we have evidence that something has been achieved in relation to the [writing] matrix. I use a range of data, at least three sources, because otherwise the information could be affected by influences on the day like playground interactions. (School L, Teacher 4, June 2010)
A few teachers (n = 5/30) viewed an OTJ as a “gut feeling” derived from their professional interactions with students, while most viewed an OTJ as a professional judgement based on considered analysis of a combined range of assessment data. Some teachers (n = 7/30) added to this mix the student’s self- or peer assessment, the student’s view of their overall performance, and the views of their professional peers. Others considered that an OTJ was a matter of identifying the “best fit” between criteria and performance and triangulating data. Finally, a few teachers (n = 3/30) compared the combination of assessment data against a reference framework (such as writing matrices, the numeracy project framework, the national curriculum or the National Standards). Analysis of the data in this small case study revealed two themes in teachers’ understanding of OTJs: deriving the OTJ and making sense of the OTJ.
The derivation of OTJs

The first theme, relating to how judgements were derived, contained three categories of understandings about OTJs:
a) gut feeling
b) (intra)professional judgement based on a range of assessment information
c) (inter)professional judgement derived from a combination of the teacher’s own professional judgement, collegial professional discussions and student input.

These categories build from a holistic but largely unsubstantiated view (gut feeling), to a judgement based on synthesis of a collection of assessment information (intraprofessional judgement), to a more socially derived view based on collegial professional discussion of the teacher’s judgement with input from the student (interprofessional judgement). This latter category aligns with Hipkins’ (2010) definition of judgements (refer to the section above) as socially constructed.
Making sense of the OTJ

The second theme contained two categories of sense making:
a) “best fit”
b) comparison against particular frameworks (such as school-based or nationally derived matrices, formal assessments or norm-referenced assessments).

Teachers struggled with the conceptual alignment of the overall judgement: was it a midpoint, the high end, or was full competency implied? Most teachers seemed to resolve this dilemma by what they termed “best fit”: the point where converging data predominantly coalesced. Sadler (1987) discusses these conundrums about how to judge achievement within levels and argues, “features listed at each level can at best be indicative, not definitive” (p. 203). He continues: “The areas of maximum ambiguity then become the grade boundaries themselves, yet in the context of actual grading decisions, these are the critical points” (p. 205). This dilemma is why Sadler (1987) advocates the use of exemplars and verbal descriptions to help teachers focus attention on significant features of student work. Other teachers in this study sought to compare a derived judgement against a framework (another recommendation of Sadler, 1987), but the frameworks used varied from teacher to teacher and undermined the consistency for which they were striving. Newton (2005) argues that to achieve high levels of consistency or comparability, similar tools and frameworks based on the same constructs need to be used. If teachers are using a range of tools that emphasise different constructs, then true comparability and consistency are nebulous. Newton (2005) posits that value judgements are often used when technical difficulties are encountered, and there is inherent variability in value judgements. The teachers interviewed in this study varied in their understanding of OTJs and how to derive them. When this level of ambiguity exists, there is likely to be minimal consistency.
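One minimal way to picture the “best fit” strategy, the point where converging data predominantly coalesce, is to take the modal category across several pieces of evidence. This is an illustrative simplification under stated assumptions, not a description of any school’s actual procedure; the evidence labels are invented:

```python
from collections import Counter

def best_fit(judgements):
    """Return the modal category across several evidence-based sub-judgements.
    Ties resolve in favour of the judgement supplied first (insertion order)."""
    return Counter(judgements).most_common(1)[0][0]

# Hypothetical sub-judgements for one student against a standard, each from a
# different evidence source, e.g. running record, test, work sample, conference.
evidence = ["at", "at", "above", "at"]
overall = best_fit(evidence)  # "at": where the converging data coalesce
```

The sketch also makes the paper’s point visible: when the sub-judgements split more evenly, the modal category is a much weaker warrant, which is exactly where exemplars, verbal descriptions and moderation matter.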
Supporting judgements with evidence

Teachers varied in the extent to which they referred to evidence or data in supporting their OTJs. A few teachers (n = 4/30) relied extensively on their years of teaching experience (they “knew where a child was and what to do based on their experience”), while most teachers mentioned referring to a range of tools such as running records, classroom observations while teaching groups of children, formal tests (e.g., asTTle, PATs), exemplars and modelling books. Conversations with students (“We check with kids … can they explain it?”) were also cited as evidence supporting OTJs. Three interview extracts illustrate differing consideration of evidence: first, limited use of information; second, drawing on a more extensive range of information; third, confusion:

Our professional judgement and knowledge that we call on after years of experience [teaching this level] combined with observations of children. (School B, Teacher 1, June 2010)

I use a range of information like standardised tests, PATs, diagnostic surveys and knowledge tests in maths, modelling books, kid’s work, matrix in writing, feedback and look to see if kids have acted on it—write in pencil and edit in black pen and see whether change has occurred since I made the comment (written or oral). Self-assessments they do. Sometimes I discuss with teacher aide progress students making and discuss with them before making judgement. (School L, Teacher 4, June 2010)

Our professional knowledge over the year has grown so [my team leader] said I mark them harder because my standards may have changed as my knowledge increases … If I have done my OTJ I’d put in the writing level they got in the test or would I actually ... no you’d put in your OTJ ... and change the level that way ... well, I am not sure really. (School TP, Teacher 2, June 2010)
Themes analysed from the data related to two categories: (a) reliance on tacit professional knowledge; (b) reference to evidence. A small group of teachers relied on their "years of experience"—in other words, their professional tacit knowledge. Sadler (1989, p. 126) states, "teachers' conceptions of quality are typically held, largely in unarticulated form, inside their heads as tacit knowledge". The difficulty with tacit knowledge is that it can be changeable, depending on the quality of work being appraised and changing teacher understanding (as seen in the third quote), and, of greater concern, it is not accessible to learners (nor their parents) and therefore not available for use in improving ongoing learning and teaching. The other teachers referred to some evidence to support their judgements. These teachers were able to show how particular characteristics of student work demonstrated achievement of certain features of the standards. However, the range and pertinence of the data drawn upon varied amongst the teachers interviewed, and the predicament they faced was how to synthesise information from formal and informal assessments, especially when there was some divergence in the data. Hipkins (2010) argues the need for teachers in all schools to determine which sources of evidence are relevant to particular standards and how they are combined to represent attainment of the overall standard.

In November, when the same teachers were interviewed again, a few teachers reported minimal difference in their understanding of OTJs. Indeed, three teachers stated that they now perceived it as a "snapshot at this stage of the year, rather than progress over time". In contrast, two teachers reported now realising the importance of considering class work as well as test results, the need for reflection and moderation, and for triangulating data from multiple sources. Another group of teachers reported changes such as now comparing beginning and end-of-year data to show progress, and understanding the need to distinguish students performing below, at and above standard. The final group of teachers indicated that they now ensure they have at least three pieces of evidence which they triangulate and on which they base OTJs. In essence these teachers were refining their referent evidence—the range and relevance of the data used to inform or justify their judgements.
By the end of 2010, teachers in this study still varied in their understanding of OTJs, how to derive them and on what basis, although most teachers were drawing on a range of assessment information. Wyatt-Smith, Klenowski and Gunn (2010, p. 59) argue, “teachers’ judgements and interpretations of assessment data are fundamental in achieving greater coherence” and dependability. In the first year of implementation there was mixed understanding about OTJs.
How common are teacher understandings of "standards"?

It appeared that few discussions had occurred related to understanding of standards. Instead more attention was focused on use or assessment of the "standards". Most teachers discussed their judgement of "standards" with colleagues either informally or formally:

Ask other colleagues' views on student work and 'where to' level. I approach experienced staff or those with deep knowledge of curriculum and NS and find out more information about particular characteristics that are indicative of a stage or level. I now have increased familiarisation of standards with our matrix. (School H, Teacher 1, June 2010)

Ask other people—work with people within the team; moderating took samples of different level and worked with people in team and compared samples of work we thought were in a particular level. Curriculum leaders help, RTLit, Numeracy adviser, exemplars, advisors, moderation, seek the views of other schools if really stuck. (School L, Teacher 2, June 2010)
The first quote indicates that the teacher was searching for deeper knowledge about the curriculum and the characteristics of the cognitive knowledge and skills embodied in the standard. It might be inferred that the teacher was seeking to understand the composition of the standard. Nevertheless, both quotes reveal, through these teachers' professional conversations with colleagues, an implicit understanding of the socially constructed nature of standards (Hipkins, 2010). Although sharing common understandings of standards is considered by Gardiner et al. (2000) to be fundamental to consistency, teachers in this study were more focused on common use, rather than common understanding, of the standards.
To what extent did teachers use clear criteria and exemplars of student work to support their understandings?

Few teachers referred specifically to exemplars, but almost all teachers talked of using students' work and student observations alongside tests to complement decisions (rather than understandings of "standards" per se). Several teachers used an extensive range of information, as illustrated in this quote:

Focused classroom observations, student workbooks, modelling books, running records, learning conversations, standardised tests e.g. PATs, Gloss. NS focused us on making a difference and making it higher and lifting our game; some kids formerly okay and now thought perhaps borderline. Makes for focused teaching and knowing what to do differently to move kids on and kids are aware now too in AtoL. Fortunate in implementing new curriculum and AtoL, reporting to parents and NS—all aligned; outstanding facilitation of facilitator has been integral to staff understanding and taking on board. (School H, Teacher 3, June 2010)
Some teachers discussed the need for more exemplars and deeper discussion of what constitutes achievement of particular standards: “we need to tease out what we mean by particular indicators in our matrices”, “working through differences in our marking”, “even out teacher judgements”. Only one group of teachers reported comparing their analyses and judgements against standards and exemplars. Sadler (1989) advocates the value of exemplars and verbal descriptions to explicate standards and as points of reference in the forming of judgements. Many teachers in this study expressed awareness of the need for assessment information to support judgements, but few teachers appeared to support their understandings and judgements by using clear criteria or exemplars.
To what extent are moderation processes used to ensure appropriate and consistent judgements are made?

Collegial support and informal discussion were most frequently mentioned, along with gathering multiple sources of evidence to formulate judgements. Formal moderation of OTJs was used in few schools in 2010:

Talking with other teachers and getting second opinion. (School P, Teacher 1, June 2010)

Starting to develop own school matrices and hope to use that for moderation. Matrices give us confidence—recorded in writing then got support to back up judgements if asked. Matrices are really valuable for less experienced teachers. We also refer to criteria, tests used, combined with professional experience from what we know where kids need to be. We look at 'where at' and 'where need to get to'. We focus on where they started from; realise kids all different. (School B, Teacher 2, June 2010)
Our team moderates. Moderation occurs firstly in pairs [of teachers] and we bring samples you are unsure of to a meeting and discuss together as a team … We highlight children where we're a bit unsure—these are the ones we make sure we have enough evidence … we talk to children, not just look at the assessment results and we refer to the standards … (School M, Teacher 2, November 2010)
Moderation is more than informal conversations with colleagues (Hipkins, 2010); indeed, it involves deep philosophical discussions in which the tensions between assessment, pedagogical and content knowledge are deconstructed and reconstructed in new ways so that teacher understanding is extended. Let us briefly explore this notion, as extensive discussion of moderation is outside the scope of this paper.

In making OTJs, teachers draw on a complex array of data—formal and informal, quantitative and qualitative. Teacher understandings are based not only on content knowledge (theoretical and practical knowledge of a curriculum area), but also on pedagogical knowledge (learning and teaching theories and practices) and assessment processes. Teachers vary in their depth of knowledge and understanding of curriculum, progressions of learning, assessment tools and the meanings that can be derived from them, as well as in the interrelationships they have with their students. All of these factors influence judgements. Wiliam (1998, as cited in Reid, 2007) talks of "construct referencing", where notions of quality are matters of professional judgement rather than fact. In Wiliam's view, the interpretation of evidence is more important than the criterion descriptors used. For primary teachers, interpretation is aided by an enhanced understanding of progression principles (Wiliam, 1998, as cited in Reid, 2007). Furthermore, Reid (2007, p. 144) argues that "engaging in assessment moderation appears to help teachers resolve formative and summative assessment tensions and strengthen links between pedagogy, curriculum and assessment".

Teachers as learners can consider their learning as an interactive social activity as well as one which is individually located. Engaging in some form of peer review (moderation) enables teachers to check their interpretations with other colleagues, to debate understandings about such matters as curriculum, learning and assessment, and to have their views deepened, challenged or confirmed. Professional conversations within moderation processes are the basis of potential deep learning and shared understandings of curriculum, pedagogy and assessment. In turn, these clearer understandings help teachers in their instructional and assessment processes to assist students to "develop the capacity to monitor the quality of their own work during its actual production" (Sadler, 2009, p. 45), thus linking assessment with improved learning and teaching.
Conclusion and implications

The actualisation of improved learning, teaching and assessment from the implementation of National Standards may be possible if highly valid and reliable data are generated and used to inform professional and student learning. However, as seen in this study, such ideals are challenging to achieve when teachers are surrounded by uncertainty and confusion about the meaning of, and process for deriving, OTJs. Judgements are by nature somewhat arbitrary when they incorporate fuzzy criteria (Sadler, 1989), criteria whose characteristics become explicated when abstract ideals are manifested in actual performances or student works (Sadler, 2009). Interpreting the meanings of standards by comparing particular characteristics of student work against referent frameworks (Hipkins, 2010; Reid, 2007), in conjunction with verbal descriptors and exemplars (Sadler, 1989), is a means of developing greater consistency of judgements amongst teachers. But if the interpretation is more important than the actual criteria or descriptors used (Reid, 2007), then teachers need to be given structured opportunities for professional discussions. Indeed, Wyatt-Smith et al. (2010) argued that, in moderation processes, teachers' professional conversations vacillated as they moved to and fro between samples of students' achievements, stated standards, and assessment and curriculum material. Influenced by the social dynamics of the group and by new information, they were deepening their theoretical understandings. In essence there is tension between teachers' tacit knowledge (gut feeling), intra- and inter-professional judgements, and explicit knowledge.

Although having clarity about the composition of OTJs and standards, drawing on appropriate evidence to underpin those judgements, and using exemplars and verbal descriptors to support the judgements is likely to lead to greater consistency (Sadler, 1998; Gardiner et al., 2000), it is only through moderation processes that teachers will reach deeper understanding of standards (Wyatt-Smith et al., 2010). Deep or meaningful change in teacher beliefs and professional practice takes time, generally considerably longer than anticipated (Timperley, Wilson, Barrar, & Fung, 2007). New Zealand has only just begun the journey of National Standards and associated OTJs. The dynamic interplay between the formally defined (standards, pedagogical content knowledge, curriculum and assessments) and the socially interpreted takes time and skill, but it is essential. When this interplay occurs and incorporates the active participation of students (Sadler, 2009), it is likely that New Zealand teacher judgements will transform from equivocal (changeable, of varying interpretation) to unequivocal (sound, dependable and consistent).
Acknowledgement

Thank you to the principals and teachers who willingly agreed to participate in this study, and to the Ministry of Education for funding the research.
References

Absolum, M., Flockton, L., Hattie, J., Hipkins, R., & Reid, I. (2009). Directions for assessment in New Zealand. Retrieved 1 November 2011, from http://assessment.tki.org.nz/Assessment-in-the-classroom/Assessment-position-papers

Cumming, J., Wyatt-Smith, C., Elkins, J., & Neville, M. (2006). Teacher judgment: Building an evidentiary base for quality literacy and numeracy education. Brisbane: Centre for Applied Language, Literacy and Communication Studies and Centre for Learning Research, Griffith University. Retrieved 30 November 2010, from Queensland Studies Authority: http://www.qsa.qld.edu.au/downloads/publications/research_qsa_teacher_judgment.pdf

Ferrara, S., Johnson, E., & Chen, W. H. L. (2005). Vertically articulated performance standards: Logic, procedures, and likely classification accuracy. Applied Measurement in Education, 18(1), 35–59.

Gardiner, J., Tom, C., & Wilson, K. (2000). Consistency of teacher judgement: Research report. Brisbane: Queensland School Curriculum Council. Retrieved 4 December 2010, from Queensland Studies Authority: http://www.qsa.qld.edu.au/downloads/publications/research_qscc_teacher_judgment.pdf

Harlen, W. (2007). Assessment of learning. London: Sage.

Hipkins, R. (2010). Learning through moderation: Minding our language. set: Research Information for Teachers, 1, 18–19.

Hipkins, R., & Hodgen, E. (2011, September). National standards, moderation challenges and teacher learning. Paper presented at the Symposium on Assessment and Learner Outcomes, Wellington.

Kellis, M., & Silvernail, D. (2002). Considering the place of teacher judgement in Maine's local assessment systems. Gorham, ME: Maine Centre for Education Policy, Applied Research and Evaluation, University of Southern Maine.

Maxwell, G. S. (2002). Moderation of teacher judgements in student assessment: Discussion paper on assessment and reporting. Brisbane: School of Education, The University of Queensland.

Meisels, S. J., Bickel, D. D., Nicholson, J., Xue, Y., & Atkins-Burnett, S. (2001). Trusting teachers' judgments: A validity study of a curriculum-embedded performance assessment in kindergarten to grade 3. American Educational Research Journal, 38(1), 73–95.

Ministry of Education. (2010). National Standards factsheet: Overall teacher judgment (OTJ). Retrieved 4 December 2010, from Te Kete Ipurangi: http://nzcurriculum.tki.org.nz/National-Standards/Key-information/Fact-sheets/Overall-teacher-judgment

New South Wales Department of Education and Training. (2008). Consistent teacher judgement. Retrieved 4 December 2010, from www.curriculumsupport.education.nsw.gov.au/consistent_teacher/consistency.htm

Newton, P. E. (2005). Examination standards and the limits of linking. Assessment in Education, 12(2), 105–123.

Poskitt, J., & Taylor, K. (2008). National education findings of Assess to Learn (AtoL) report. Auckland: Education Group. Retrieved from Education Counts: http://www.educationcounts.govt.nz/publications/schooling/27968/2

Punch, K. (2009). Introduction to research methods in education. London: Sage.

Reid, L. (2007). Teachers talking about writing assessment: Valuable professional learning? Improving Schools, 10(2), 132–149.

Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of Education, 13(2), 191–209.

Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.

Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education, 5(1), 77–85.

Sadler, D. R. (2009). Transforming holistic assessment and grading into a vehicle for complex learning. In G. Joughin (Ed.), Assessment, learning and judgement in higher education (pp. 45–64). Wollongong, NSW: Springer.
Timperley, H., Wilson, A., Barrar, H., & Fung, I. (2007). Teacher professional learning and development: Best evidence synthesis iteration. Wellington: Ministry of Education.

Wiliam, D. (1998, July). The validity of teacher assessments. Paper presented at the 22nd annual conference of the International Group for the Psychology of Mathematics Education, Stellenbosch, South Africa.

Wyatt-Smith, C., Klenowski, V., & Gunn, S. (2010). The centrality of teachers' judgement practice in assessment: A study of standards in moderation. Assessment in Education: Principles, Policy & Practice, 17(1), 59–75.
The authors

Dr Jenny Poskitt is Director, Graduate School of Education at Massey University. Jenny's research and teaching interests include professional learning, assessment, adolescent learning and engagement. Email: [email protected]

Kerry Mitchell is a director of The Education Group, a private consultancy business. Kerry has research and professional expertise in assessment, leadership, principal appraisal, literacy and numeracy. Email: [email protected]