New Zealand and Louisiana practicing teachers

Contextual differences in conceptions of feedback


New Zealand and Louisiana practicing teachers’ conceptions of feedback: Impact of Assessment of Learning versus Assessment for Learning policies?

Gavin T. L. Brown
The Hong Kong Institute of Education

Lois R. Harris
The University of Auckland

Chrissie O’Quin & Kenneth E. Lane
Southeastern Louisiana University

Abstract

Teachers’ beliefs about feedback matter because teachers are responsible for implementing feedback in classrooms. This paper compares the conceptions of feedback of practicing teachers from two very different jurisdictions (Louisiana, USA, n=308; New Zealand, n=518). Responses to a common research inventory were modelled independently, but multi-group confirmatory factor analysis produced inadmissible solutions for both models. Joint factor analysis produced a five-factor solution, which was inadmissible for the Louisiana teachers. Intercorrelations around feedback as teacher-grading exceeded 1.00 for Louisiana teachers, whereas New Zealand teachers had correlations close to zero for this factor. While both groups of teachers endorsed the notion of feedback for improved learning, differences appear related to contrasting assessment policy frameworks (i.e., high-stakes in Louisiana, low-stakes in New Zealand).



Introduction

Feedback is almost universally endorsed as a powerful strategy for teachers of all subjects and grade levels (e.g., Leahy, Lyon, Thompson, & Wiliam, 2005; Reeves, 2000). Hattie and Timperley (2007, p. 102) cite it as “among the most critical influences on student learning.” Feedback can increase learner satisfaction and persistence (Kluger & DeNisi, 1996), and contribute to students adopting more productive learning strategies (Vollmeyer & Rheinberg, 2005). However, what counts as ‘good feedback’ is contested (Shute, 2008). Kluger and DeNisi (1996) found that feedback actually decreased student performance in one-third of the studies analysed in their literature review, demonstrating that feedback, when inappropriately administered, can have negative effects.

To better understand the various types of feedback and their potential outcomes, numerous authors have created models of feedback (e.g., Butler & Winne, 1995; Hattie & Timperley, 2007; Shute, 2008; Tunstall & Gipps, 1996). For example, Hattie and Timperley’s (2007) influential review of the feedback literature identified four types of feedback, along with a range of factors mediating their effectiveness. These types were feedback task (e.g., whether work was correct or incorrect), feedback process (e.g., comments about the processes or strategies underpinning the task), feedback self-regulation (e.g., reminding students of strategies they can use to improve their own work), and feedback self (e.g., non-specific praise and comments about effort). Feedback process and feedback self-regulation are thought to be the most important when working towards deep processing and mastery of tasks, while feedback task can be useful if students incorporate it into a strategy to improve their process or self-regulation.
Hattie and Timperley argue that feedback self is the least effective form of feedback as it rarely contains any useful information about the tasks, with students seldom translating it into increased efficacy, engagement, or understanding of the task.

When looking at the effectiveness of feedback, the particular type selected is just one of the variables potentially affecting its influence on learning. Regardless of the type of feedback used, the student’s ability to interpret and use the feedback and their motivation to do so (Sadler, 2010) also mediate its effectiveness. Additionally, the timing, complexity, and accuracy of the feedback contribute to its influence on student performance (Hattie & Timperley, 2007). These variables are often under the control or supervision of the classroom teacher. Additionally, some types of feedback (e.g., grades, reports to parents) are required by authorities (e.g., school principals, education boards), irrespective of teachers’ beliefs or educational research on best practice.

Teacher Understandings of Feedback

Despite the power that teachers commonly exercise over the delivery of feedback, there has been little research to date examining teachers’ conceptions of feedback. Most studies examine teachers’ enacted practices, making it important to also identify and understand which feedback practices and purposes teachers endorse. Irving, Harris, and Peterson (in press) found New Zealand teachers identified three main purposes for feedback: (1) improving student learning (e.g., providing information about weaknesses in student work and how to correct them), (2) reporting and compliance purposes (e.g., giving grades, providing hints to students about what their final grades might be to prevent surprise when marks were received), and (3) encouraging students (e.g., praise, feedback about good ‘non-academic’ qualities like effort).
Additionally, teachers identified a fourth classification: they articulated that feedback which did not lead to student action was useless or irrelevant (i.e., comments given alongside grades); however, they described continuing to use these ‘useless’ practices out of politeness to students or because of stakeholder expectations. Irving, Harris, and



Peterson’s study provides insight into teacher thinking, but is limited in its generalizability due to its small sample size and the uniqueness of the New Zealand assessment context, which is based on Assessment for Learning principles (e.g., Black & Wiliam, 1998).

Clearly, studies carried out in one jurisdiction do not automatically apply in other educational contexts where policy frameworks and practices are different. Research suggests that differences in culture or society lead not only to differences in policy, but also to variations in how teachers understand the corresponding practices and processes (Brown & Harris, 2009; Brown, Lake, & Matters, 2011; Brown & Michaelides, 2011). For example, Carless (2011) has shown that tests and examinations have to be used for their potential to inform improvements in teaching in the context of examination-dominated Hong Kong educational culture, even though testing is commonly rejected within a western Assessment for Learning approach. Additionally, teachers in New Zealand and Hong Kong, which have very different assessment regimes¹, had very different conceptions of how grading students was related to the purpose of improvement, as indicated by their responses to a common research instrument (Brown, Kennedy, Fok, Chan, & Yu, 2009). Hong Kong teachers strongly associated grading with improvement, whereas in New Zealand the two had only a weak positive correlation. In contrast, Hamilton et al. (2007) reported that teachers in California, Georgia, and Pennsylvania had very similar responses to, experiences in, and attitudes towards standards-based accountability assessments, perhaps because there were very small differences in how the systems were implemented. Hence, it may be expected that teachers working within differing policy contexts would be likely to have systematic differences in their beliefs about assessment.
The model underpinning the study reported in this paper (i.e., Brown & Harris, 2009) posits that societal and cultural contexts determine the nature of policy priorities; that policy priorities, including reforms or innovations, affect teachers’ beliefs and attitudes; and that these, in turn, influence the translation of policy into practice in the classroom and school. Hence, student achievement outcomes are partly a function of policy reforms as implemented by teachers. In this model, teachers’ beliefs mediate between policy and outcomes, and policy directions are also partly a function of priorities within a society or culture, suggesting that variation in culture, society, policy, and practices can lead to systematic variation in teachers’ beliefs. In terms of assessment and feedback practices, the largest tensions in teacher thinking seem to exist between improvement and accountability (Harris & Brown, 2009). Generally, teachers can be assumed to be very much in favour of Assessment for Learning practices that lead to improved learning (e.g., feedback), whereas teachers are presumably neutral or negative about the use of assessment for teacher and school accountability.

This paper compares how two samples of teachers, one from Louisiana, USA and one from New Zealand, responded to a new self-report inventory about their conceptions of feedback, consisting of 71 items related to 10 theoretically derived constructs (Harris & Brown, 2008). The theoretical framework suggests that the more the policy framework emphasises high-stakes consequences within an accountability framework, the more teachers are likely to conceive of feedback in terms of grading, ranking, and accountability, whereas the more the policy framework is low stakes, the more teachers will likely conceive of feedback in terms of learning-oriented improvement.
Hence, it is important to examine these two contexts to understand how they are similar and different when interpreting the results of this study.

¹ New Zealand has no public examinations before senior secondary school, whereas Hong Kong uses public examinations extensively (Choi, 1999).



Contexts of the Studies

The two locations used in this study, while a convenience sample, have extremely different assessment contexts, with one reflecting a very high-stakes accountability-oriented framework and the other a much lower-stakes improvement-oriented framework. Comparing and contrasting these contexts can help evaluate whether contextual factors may be associated with teacher thinking.

Louisiana. Louisiana schools are under tremendous pressure to improve academic achievement as evaluated through external testing and accountability measures (e.g., NCLB), often considered Assessment of Learning. Louisiana, with a population of about 4.5 million people (US Census Bureau, 2009), has the second highest rate of poverty in the United States at 18% (Council for a Better Louisiana, 2006), along with a high rate of minority students (52% minority students in Louisiana compared with 41% nationwide) (The Louisiana Department of Education, 2006). Seventy-six percent of public schools remain below Louisiana’s 10-year performance goal set for the end of the 2009-10 school year (Council for a Better Louisiana, 2009). At the time of the study, Louisiana offered alternative schools as an option for students struggling within the mainstream²; these alternative schools primarily catered for students who were academically functioning two or more years below grade level or had major behavioural difficulties. At a state level, Louisiana schools are judged based on a weighted composite of achievement, attendance, and dropout data (Schildkamp & Visscher, 2010). Schools not meeting targets are often the subject of media bashings, forced to adopt a narrow test-based curriculum, and may be encouraged to focus on strategies designed to move students on the brink of classification (i.e., ‘bubble kids’) across the line, often at the expense of other student groups (Schildkamp & Visscher, 2010).
In more extreme cases, in schools not meeting targets, the principal may be fired or transferred, the school converted to a charter school, or half of the teachers fired (Louisiana Educator, 2010). However, the irony of the charter school option is that most public schools converted into charter schools recently have failed to improve upon student results (Lussier, 2010). Hence, in all school settings, teachers and school administrators are under tremendous pressure to produce “results” identified through standardised measures.

New Zealand. New Zealand as an educational context is quite different to Louisiana. Because of the country’s relatively small population [approximately 4.4 million according to Statistics New Zealand (2011)], there is a national education system. While the majority of New Zealanders identify as Pakeha³, approximately 15% of the population identify as Maori. There are also rapidly growing populations of people identifying as Pasifika⁴ (approximately 6%) and Asian (approximately 8%), due, in part, to an influx in immigration from these regions (Statistics New Zealand, 2011). New Zealand has a somewhat unique assessment environment. School quality is determined through triennial reviews by the Education Review Office; these do not require schools to demonstrate effectiveness through any one assessment method. The New Zealand Ministry of Education (2010, p. 5) espouses Assessment for Learning as its assessment philosophy, stating “We have a deliberate focus on the use of professional teacher judgment underpinned by assessment for learning principles rather than a narrow testing regime,” a system quite different to that experienced in Louisiana. System monitoring takes place in Years 4 and 8 through the National Education Monitoring Project’s light sampling of student

² Now, the majority of these alternative schools have been converted into charter schools.

³ Pakeha is the Māori word for New Zealanders of European descent.
⁴ Pasifika is a term used to describe people who culturally identify with and/or have immigrated from neighbouring Pacific island nations (e.g., Samoa, Tonga, etc.).



performance (Brown et al., 2008). The national assessment policy prior to Year 11 (students nominally 15 years old) emphasizes voluntary, school-based assessment for the purposes of raising achievement and improving instruction relative to the outcomes and objectives specified in the national curriculum (Crooks, 2010). This curriculum is child-centred, non-prescriptive, holistic, and integrated while, simultaneously, having managerial overtones with specified outcomes and objectives across multiple levels. Hence, prior to Year 11, all assessment practices are voluntary and low stakes, making it possible for teachers to implement a range of feedback practices without the threats presented by testing and grading. While the primary sector commonly uses informal assessments and standardised tests (Crooks, 2010), mainly for the purpose of improving instruction and student learning (Croft, Strafford, & Mapa, 2000; Hill, 2000), secondary school assessment is often focused on preparing for or implementing the high-stakes student qualifications system (i.e., the National Certificate of Educational Achievement), which combines internally administered assessments and external end-of-year examinations. This qualification system begins formally in the third year of secondary schooling when students are about age 15 (Crooks, 2010); however, there is considerable evidence that teachers often introduce the vocabulary and procedures of such assessments in Years 9 and 10 to ‘prepare’ students (Harris, Harnett, & Brown, 2009). Hence, the influence of high-stakes testing on feedback practices becomes stronger as students progress through high school.
Overall, however, tests and examinations in New Zealand are evaluative for students (especially in the final years of schooling), whereas standardized tests function, for schools, as improvement-oriented assessments.

While New Zealand and the US state of Louisiana have populations of similar size, they have very different demographic characteristics and assessment environments. The next part of the paper examines how participants were selected from these two regions and discusses how the questionnaire was administered.

Participants and Procedures

Louisiana. The study was carried out in one urban school district that had a large number of students at risk of academic failure. This school system educates 42,742 students in 88 schools; recent demographic information on these students showed that 76% were ‘in poverty’, 11% were disabled according to Exceptional Students Service Guidelines, and 89% were of minority ethnicity, with 62 different ethnic groups represented (Louisiana Department of Education, 2008a). The district’s average test scores indicated unsatisfactory achievement levels, with 15 schools being in danger of takeover by the state of Louisiana because of unsatisfactory school performance scores over a period of many years. Two of these schools were also in danger of disciplinary sanctions from the United States Department of Education because of unsatisfactory subgroup scores (Louisiana Department of Education, 2008b). Four schools had already been taken over by the state, two of which were middle schools (i.e., schools with students nominally aged 11-14). The questionnaire was distributed to 818 middle school teachers within this urban district, leading to 308 completed surveys (38% response rate). Of those who responded, 78% self-reported as regular middle school teachers, while 13% self-reported as academic alternative middle school teachers.
Nine percent reported their role in education as either a discipline alternative teacher (i.e., teacher specialising in the management of difficult student behaviour) or administrative staff (i.e., principal or assistant principal). Eighty percent of participants were female; 47% were Caucasian, 40% African American, 5% Asian, 2% Hispanic, and 5% Other. Fifty-five percent had more than ten years of teaching experience, while 45% had less than ten years of experience. The survey questionnaires were administered online using a commercial electronic survey collection tool. This format guaranteed anonymity of respondents and was viable thanks to the ubiquity of electronic mail and internet services



used regularly in school employment. With the approval of the district Assistant Superintendent, and at her request, principals directed their teachers to participate in the survey as a part of that month’s teacher professional development held at individual schools. Teachers had one week to complete the survey. Reminders were sent out at four different times during the week.

New Zealand. The study was carried out nationally, with participants recruited from all over New Zealand. In total, 1492 teacher surveys were delivered to 457 schools, to be completed by teachers of students in Years 5-10. Valid responses were received from 518 teachers, constituting a 35% return rate. Of the 518 valid respondents, 72% were female (n=374) and 82% were of New Zealand European or Pakeha ethnicity (n=422). These proportions are consistent with the 2004 Teacher Census (New Zealand Ministry of Education, 2005), in which 80% of respondents were European/Pākeha⁵ and 82% of primary and 58% of secondary teachers were female. Fifty-six percent of the sample had taught for more than 10 years. Approximately half (52%) reported being a teacher with no additional responsibilities (e.g., department head, dean, director, manager, or subject specialist).

While the questionnaire sent to participants was identical to that received by Louisiana teachers, different distribution methods were used in New Zealand. Printed survey forms were mailed to New Zealand primary and secondary schools which had been selected according to a stratified representative frame using size, region, and socio-economic strata. When forms were returned blank, they were sent out again to a school with a similar stratification. School principals distributed the questionnaires to volunteer teachers, who returned their questionnaires directly to the research team in postage-paid envelopes.

When comparing these two samples, differences in results may partly be a consequence of the different sampling and administration methods.
For example, the New Zealand sample is nationally representative while the Louisiana sample is drawn from one district. The New Zealand sample covers elementary/primary and high school/secondary populations, while the Louisiana sample is made up only of middle school teachers; however, it is important to note that there is considerable overlap in the nominal ages of the students being taught by these two samples. The New Zealand sample is much less ethnically diverse than that of Louisiana, while both samples are predominantly female and have similar levels of teaching experience. The administrative method is also noticeably different. Putting aside the issue of paper versus web response format (see Beck, Yan, & Wang, 2009 and Schonlau, van Soest, Kapteyn, & Couper, 2009), the issue of response bias as a consequence of the semi-compulsory approach adopted in Louisiana has to be considered, even though both approaches yielded a similar response rate. These differences must be taken into account when interpreting the results of this study.

Instrument

As there was no existing instrument designed to measure teacher conceptions of feedback quantitatively, Harris and Brown (2010) devised a questionnaire to measure this construct. Items related to ten feedback constructs were drafted. The first four factors related to Irving, Harris, and Peterson’s (in press) four purposes of feedback (i.e., irrelevance, improvement, reporting and compliance, and encouragement). The next four factors were related to Hattie and Timperley’s (2007) four feedback types (i.e., task, process, self-regulation, and self). The final two factors were related to variables arising from the feedback literature. Factor 9 related to the validity of self and peer feedback, while Factor 10 related to feedback’s timing. These ten factors are listed below with a sample item provided for each:

⁵ Pākeha is the indigenous Māori word for white people.



Purposes
Irrelevance: Feedback is pointless because students ignore my comments and directions.
Improvement: Students use the feedback I give them to improve their work.
Reporting and Compliance: I give feedback because my students and parents expect it.
Encouragement: The point of feedback is to make students feel good about themselves.

Types
Task: My feedback tells students whether they have gotten the right answer or not.
Process: My feedback focuses on the procedures underpinning tasks rather than whether the work is correct or incorrect.
Self-Regulation: Good feedback reminds students that they already know how to check their own work.
Self: Good feedback pays attention to student effort over accuracy.

Other
Peer and self feedback: Students are able to provide accurate and useful feedback to each other and themselves.
Timeliness of feedback: Delaying feedback helps students learn to fix things for themselves.

Respondents used a six-point, positively-packed agreement rating scale known to generate discrimination in contexts of social desirability (Brown, 2004). Responses were coded: strongly disagree=1, mostly disagree=2, slightly agree=3, moderately agree=4, mostly agree=5, and strongly agree=6. Factor analysis for each sample was carried out independently. Exploratory results were tested with confirmatory factor analysis and generated models with acceptable fit indices; that is, χ²/df had p > .05, gamma hat > .90, RMSEA and SRMR