NZ Teacher Conceptions of Assessment
Teachers' conceptions of assessment: Comparing primary and secondary teachers in New Zealand
Gavin T L Brown

Abstract
The conceptions teachers have about assessment are assumed to influence their practices and to be consistent with the jurisdictional and policy frameworks in which they work. This paper compares two groups of teachers (i.e., New Zealand primary and secondary) in response to the Teachers’ Conceptions of Assessment (TCoA-IIIA) self-administered survey inventory. The previously reported four-factor hierarchical model for primary teachers (i.e., improvement, irrelevance, school accountability, and student accountability) was found to be statistically invariant with good fit characteristics across both groups using nested, multi-group invariance testing in confirmatory factor analysis. The only statistically significant difference was the mean score for the student accountability conception, which was more strongly endorsed by secondary teachers, consistent with their role in administering the New Zealand qualifications system. The study suggests that teachers develop or adopt conceptions of assessment that allow them to successfully function within their own policy or legal framework.

Author
Dr Brown is an Associate Professor in the Department of Curriculum and Instruction, the Hong Kong Institute of Education. From 1 September 2011 he will be Associate Professor in the Faculty of Education at the University of Auckland. His research interests focus on educational assessment and social psychological responses to assessment. Email: [email protected]; [email protected]

Acknowledgements
Funding for this research came from Auckland UniServices Ltd. and the University of Auckland Research Office. A University of Auckland Faculty of Education-funded Summer Scholar, Hanida Moh’d Aris, is thanked for her assistance with data preparation. An earlier version of this paper was presented at the 2007 NZARE annual conference. This is the author's pre-publication version of the manuscript to appear in 2011 in volume 3 of Assessment Matters, published by NZCER.

Teachers’ conceptions of educational processes (e.g., teaching, learning, curriculum, or assessment) are “general mental structure[s], encompassing beliefs, meanings, concepts, propositions, rules, mental images, preferences, and the like” (Thompson, 1992, p. 130) arising largely from the person’s experience of the phenomenon. Conceptions act as a framework through which a teacher views, interprets, and interacts with the teaching environment (Marton, 1981) and these have been shown to have a strong effect on educational practices and outcomes (Calderhead, 1996; Clark & Peterson, 1986). Specifically, teachers’ beliefs about students, learning, teaching, and subjects influence assessment techniques and practices (Cizek, Fitzgerald, Shawn, & Rachor, 1995; Kahn, 2000; Tittle, 1994). New Zealand primary teachers’ espoused use of informal–interactive assessment practices was strongly predicted by their self-reported agreement that assessment was for improved teaching and learning and that assessment was irrelevant (Brown, 2009).
Consistent with the importance of intentions in shaping human behaviour (Ajzen, 1991) and the emphasis on assessment for learning (Black & Wiliam, 1998), some researchers have been investigating teachers’ conceptions of assessment by focusing on teachers’ purposes or intentions for assessment (e.g., Brown, 2008; Stamp, 1987) rather than on the types of assessment used (e.g., Gipps, Brown, McCallum, & McAlister, 1995; Hill, 2000; McMillan, Myran, & Workman, 2002). Three major purposes for assessment exist (Heaton, 1975; Torrance & Pryor, 1998; Webb, 1992): improvement (assessment improves teaching and learning); student accountability (assessment makes students accountable for learning); and school accountability (assessment makes schools and teachers accountable). Further, an anti-purpose can be seen in teachers treating assessment as fundamentally irrelevant to the life and work of teachers and students (irrelevance) (Shohamy, 2001).
Improvement, commonly referred to as assessment for learning or formative assessment, has been shown to have a positive effect on educational outcomes through involving students in the assessment process with the primary function of assessment being constructive feedback about who needs to learn what next (Absolum, 2006; Black & Wiliam, 1998; Brookhart, 2004; Hattie & Timperley, 2007; New Zealand Ministry of Education, 1994, 2007; Stiggins, Arter, Chappuis, & Chappuis, 2005). The central tenet of student accountability is that rewards for students, including certification, are determined through the use of information from assessment processes. Some argue that assessment places a necessary and important motivating pressure on students; whereas others emphasise the legitimacy and importance of publicly certifying standards of learning attained by students. The core idea behind school accountability is that assessment results are used to evaluate the quality of schooling (whether that be individual teachers, schools, or whole systems). Two distinct and potentially competitive rationales exist: demonstrating to governments, taxpayers, and the public that schools and teachers deliver quality instruction and results (Smith & Fey, 2000); and systematically improving the quality of instructional results, sometimes known as schooling improvement (Linn, 2000). Irrelevance is based on the view that external evaluation processes are inadequate, inaccurate, and/or irrelevant to teachers’ ability to improve student learning (Shohamy, 2001). Teachers have been reported as having multiple and conflicting conceptions of assessment. For example, in some studies, teachers have agreed that assessment can improve teaching and learning (Bulterman-Bos, Verloop, Terwel, & Wardekker, 2003; Garcia, 1987; Saltzgaver, 1983) and, in the same studies, they have also indicated that they treat assessment as irrelevant despite their having to use it. Other
studies, which have shown that teachers endorsed improvement purposes, have also reported that, simultaneously, (1) they disagreed with the use of assessment to make students accountable (Philippou & Christou, 1997); (2) they believed assessment interferes with teachers’ work (Stamp, 1987) or has forced them to conduct assessments they did not believe in (James & Pedder, 2005); (3) they considered assessment should not be used to report to others, even though this is what was actually done (Warren & Nisbet, 1999); (4) they considered assessment extremely relevant to their teaching practice (Lovett & Sinclair, 2005); or (5) they used assessment to fulfil student qualification requirements (Hume & Coll, 2009). A phenomenographic analysis of 26 New Zealand teachers’ conceptions of assessment identified significant tensions around the competing purposes of improvement and accountability (Harris & Brown, 2009).
It seems likely, if teacher beliefs are ecologically rational responses (Rieskamp & Reimer, 2007) to policies and practices in their professional environment, that teachers should have multiple and possibly contradictory conceptions of assessment because assessment is used in multiple ways. For example, educational systems use assessments to simultaneously guide improvements in teaching and learning (improvement), to judge the quality of student learning (student accountability), and to make inferences about the quality of schools (school accountability). Hence, while improvement and accountability might be construed as polar opposites, it is just as likely that teachers will have differing levels of positive endorsement for both of these purposes, if they accept or are required to accept the legitimacy of accountability mechanisms.
Furthermore, assuming belief systems are ecologically rational, differences between societies and cultures in how assessment is used should also generate differences in how teachers conceive of assessment, a point argued by Brown, Lake, & Matters (2011) in their analysis of Queensland teachers’ conceptions of assessment. Hence, rather than attempt a reductionist account of how teachers conceive of assessment as either being in favour of summative or formative assessment, it seems appropriate to examine teachers’ responses to the four contrasting, possibly contradictory, purposes of assessment. Allowing teachers to have opinions about all four purposes simultaneously permits detailed examination of how the teachers associate the purposes with each other and how strongly they endorse each purpose. It is for this reason that a survey inventory of Teachers’ Conceptions of Assessment (TCoA) was developed (Brown, 2002).

The Teachers’ Conceptions of Assessment inventory
Two early versions of the TCoA inventory were developed with relatively small samples of practising and pre-service teachers (Brown, 2002). The TCoA-III was administered in a large national survey of New Zealand primary teachers (Brown, 2004) and the abridged version (TCoA-IIIA), which has the same structure as the full version, was validated with large samples of Queensland primary and secondary teachers (Brown, Lake, & Matters, 2011) and with a Chinese translation in Hong Kong (Brown, Kennedy, Fok, Chan, & Yu, 2009). The TCoA inventory is a self-report survey that allows teachers to indicate their level of agreement with statements related to the four main purposes of assessment described above. Brown (2008) found that New Zealand primary teachers tended to agree with the improvement and student accountability conceptions and disagreed with the irrelevance and school accountability conceptions. Irrelevance was positively correlated with student accountability and negatively correlated with improvement, while improvement was positively correlated with school accountability. These results
suggested that New Zealand primary teachers saw assessment as a relevant means of improving teaching and learning that also simultaneously held students accountable, an aspect that they saw as irrelevant. Furthermore, while they rejected the idea that schools should be held accountable through assessment, they associated improved teaching and learning as an aspect of quality schooling. This positive association between improvement and school accountability meant that teachers endorsed deep cognitive assessments as the only way to reveal the difference schools made (Brown, 2009). However, it remains to be seen whether New Zealand secondary and primary teachers have similar conceptions of assessment. It may be that secondary teachers have different emphases around the four conceptions of assessment because of the effect the national qualifications system has on teaching and assessment practices. For example, high school teachers may agree less with improvement because the need to cover a broad syllabus in preparation for qualifications would reduce the formative effect of assessment. On the other hand, the current New Zealand qualifications system (i.e., unit and achievement standards) offers many opportunities for assessment to be an integral part of a formative teaching process (New Zealand Qualifications Authority, 2001). Furthermore, as adolescents progress through secondary schooling, there is a well-established expectation that students will take increasing responsibility for their learning and lives. Hence, especially given that New Zealand secondary teachers act as in-school assessors or examiners for the qualifications system, it is possible that secondary teachers, in contrast to primary teachers, would place a greater emphasis on student accountability as the purpose of assessment. 
It is also likely that among secondary school teachers, student accountability would be less positively correlated with irrelevance; in other words, evaluating students might not be irrelevant since secondary teachers are required to participate in the certification assessment of their students. Thus, it is pertinent and useful to understand whether primary and secondary teachers in New Zealand actually conceive of assessment differently. Understanding similarities and differences in how assessment is conceived might assist in improving the quality of primary–secondary sector conversations about assessment practices. This may also have a beneficial effect for children as they make the transition between sectors.

The study
This paper examines the responses of New Zealand primary and secondary teachers to the Teachers’ Conceptions of Assessment (version III abridged) questionnaire to determine the degree to which the questionnaire, developed with primary teachers, was admissible and well-fitting to the responses of secondary teachers. Then, the study examines the degree to which the TCoA-IIIA four-factor analytic model was statistically equivalent for both groups. After evaluating the psychometric properties of the inventory across the two groups, the paper examines the differences in mean scores and factor inter-correlations to answer questions about how primary and secondary teachers in New Zealand are different in their conceptions of assessment.

Context
At the time of this study, the New Zealand Ministry of Education required schools to monitor and ensure that students reached expected levels of achievement and, at the same time, use voluntary, school-based assessments for the purposes of
raising achievement and improving the quality of teaching programmes (Ministry of Education, 1994). The national policy required school assessments to provide clear indicators of student performance relative to the outcomes specified in the national curriculum statements (New Zealand Ministry of Education, 1993, 2007). A range of nationally standardised assessment tools (e.g., exemplars, item resource banks, computerised teacher-managed testing tools) were available for teachers to use as appropriate (Crooks, 2010). Additionally, the Ministry of Education provided funding for professional development programmes that focus on improving teachers’ use of assessment for learning (e.g., Assessment for Better Learning, Assess to Learn). New Zealand primary school teachers make extensive use of teacher-made observations, conversations, checklists, and standardised tests (Crooks, 2010). Most teachers reported that they used voluntary diagnostic assessments frequently or always to change the way they taught their students; for example, by identifying students for further appraisal, grouping students for instruction, and planning instructional activities (Croft, Strafford, & Mapa, 2000). Also, primary school teachers use their assessments to evaluate their own teaching programmes (Hill, 2000). In contrast, the secondary school assessment environment, while being governed by the same policy framework as the primary school system, was dominated by the National Qualifications Framework (NQF) (New Zealand Qualifications Authority, 1993). Officially, school qualifications assessment begins in the third year (students nominally aged 15) of secondary schooling (i.e., National Certificate of Educational Achievement [NCEA] Level 1) (Crooks, 2010). 
However, the importance of the externally-administered school qualifications has meant considerable washback effects, with much adoption of qualifications assessment systems in the first two years of secondary schooling (Bashford, 2007; Mizutani, 2006; Rae, 2007). Furthermore, approximately half of the content in each subject is evaluated through school-based teacher assessments of student performances (i.e., internal assessments). This means that teachers act as assessors as well as instructors throughout the three levels of the NCEA administered in New Zealand secondary schools. Hence, we might expect, given the quite different roles that secondary and primary teachers play in regard to student qualifications purposes, that secondary teachers will endorse student accountability more than primary teachers. Furthermore, given the importance placed on school self-evaluation, primary school teachers might endorse assessment as school accountability more than secondary teachers. Nonetheless, given the common policy framework emphasising assessment for learning, it seems plausible that agreement for improvement and irrelevance should be similar in both populations.

Participants
Data were obtained between 2001 and 2007. During this time period, the New Zealand assessment policy and practice environment was relatively stable. School principals were asked to distribute questionnaires to staff inviting them to voluntarily complete them. To protect anonymity, teachers returned their completed questionnaire directly to the researcher. Details of the primary and secondary teacher samples are given in Table 1.

New Zealand primary teachers
In 2001, a nationally representative sample of 525 teachers responded to the full inventory (Brown, 2004). An additional 48 responses to the abridged inventory were
obtained in 2004 from teachers in the Auckland region. Of the 573 teachers, 81 percent worked in either contributing primary or full primary schools. Approximately 83 percent were of New Zealand European or Pākehā ethnicity (vs. 87 percent in New Zealand primary population), 70 percent were female (vs. 71 percent in New Zealand primary population), and 70 percent had taught for more than 10 years (vs. 50 percent long service in New Zealand primary population). Twenty-seven percent of participants were school principals or associate/deputy principals. This group included principals who had an active teaching role and those who did not. The gender and ethnicity demographic characteristics of the teachers in this sample reflected those of the Ministry of Education (Sturrock, 1999) New Zealand primary school teacher census.

New Zealand secondary teachers
In 2005, nine teachers in the Auckland region responded to the abridged inventory (Brown, 2005). A nationally representative sample of 395 teachers responded to the full inventory in 2007. Eighty-four percent of the participants were employed in secondary schools (either Year 7–13 or Year 9–13). Of these 404 teachers, just 54 percent were female, 75 percent were of New Zealand European or Pākehā ethnicity, and 67 percent had taught more than 10 years. These proportions are consistent with the 2004 teacher census,ⁱ which reported 80 percent of respondents as European/Pākehā and 58 percent of secondary teachers as female. Only 9 percent were not active classroom teachers (i.e., the principals and associate/deputy principals).

Table 1 Demographic Characteristics of Primary and Secondary Teacher Samples (Percent)
                                                      Level
Characteristics                               Primary      Secondary
                                              (n = 573)    (n = 404)
Ethnicity
  New Zealand European/Pākehā                 83           75
  New Zealand Māori                           6            4
  Asian/Other                                 7            12
  Pacific Nation                              1            1
  Missing                                     2            9
Sex
  Female                                      70           54
  Male                                        21           38
  Missing                                     1            8
Years of Teaching Experience
  More than 10                                70           67
  Between 6 and 10                            12           15
  Between 2 and 5                             13           6
  Less than 2                                 7            2
  Missing                                     1            10
Role in School
  Principal                                   12           1
  Associate Principal or Deputy Principal     15           6
  Senior Teacher                              11
  Dept HoD; Assistant HoD; Faculty HoD        0            49
  Teacher                                     49           30
  Other/Missing                               4            13
School Type
  Secondary (Years 9–13)                                   67
  Secondary (Years 7–13)                                   17
  Intermediate (Years 7–8)                    8
  Full Primary (Years 1–8)                    41
  Contributing Primary (Years 1–6)            40
  Composite                                   2            13
  Missing                                                  3

Instruments
Teacher Conceptions of Assessment
The Teacher Conceptions of Assessment version III abridged (TCoA-IIIA) inventory, consisting of 27 items (Brown, 2006), was used. Responses aggregate into nine factors, each made up of three items, which form four inter-correlated, intention-oriented conceptions of assessment (i.e., improvement, student accountability, school accountability, and irrelevance). The improvement conception had four contributing factors (i.e., assessment describes student learning, assessment is valid, assessment improves student learning, and assessment improves teaching). Irrelevance had three contributing factors (i.e., assessment is unfair, assessment is ignored, and assessment is inaccurate). Student accountability and school accountability conceptions included single factors. More details of each item are seen in Figure 1.
Participants responded by selecting the one of six agreement ratings that best expressed their opinion about each statement. The rating scale used a positively-packed format in which there are two negative categories (i.e., strongly disagree, mostly disagree) and four positive categories (i.e., slightly agree, moderately agree, mostly agree, strongly agree). This response format is beneficial when participants are expected to be positively inclined towards the constructs, because it generates more variance in the positive range (Klockars & Yamagishi, 1988; Lam & Klockars, 1982).

Definitions of assessment
In order to determine which types of assessment teachers were thinking of as they responded to the TCoA-IIIA, they were presented with a list of 11 practices and were asked to select all that they had in mind when they thought about assessment.
The list of practices formed two conceptual groups: a formal, test-like group (i.e., teacher-made written test, standardised test, essay test, and 1–3 hour examination) and an informal, interactive group (i.e., unplanned observation, oral question and answer, planned observations, student written work, student self- or peer-assessment, conferencing, and portfolio/scrapbook). It was expected that secondary teachers would, on average, choose more of the formal, test-like practices as assessment, while primary teachers would, on average, choose more of the interactive, informal assessments.

Analysis
Because an existing model for the TCoA-IIIA was available, responses for the primary and secondary teacher groups were analysed using confirmatory factor analysis and multi-group invariance testing. The goal was to identify whether the original model from primary teachers fit the secondary teacher data well and to what extent the model was equivalent for both groups.

Data preparation
Cases with fewer than 90 percent valid responses were removed and values for responses missing at random for the two instruments were calculated using the expectation maximisation missing values procedure (Dempster, Laird, & Rubin, 1977).

Confirmatory factor analysis
Confirmatory factor analysis tests the fit of a set of pathways within and among factors by using the factor patterns, covariance patterns, and residual or error values within a data matrix (Byrne, 2001; Hoyle, 1995; Klem, 2000). In confirmatory factor analysis, relationships between variables and latent factors that are not expected are set to zero, while the expected relationships are free to load onto their appropriate factors (Byrne, 2001). Large samples are required to provide stable parameter estimates; samples of N > 400 are robust against nonconvergence, improper solutions, or Heywood cases (Boomsma & Hoogland, 2001). Thus, group size would not be a significant factor in model estimation. The quality of fit for a model to the underlying data matrix is best tested with measures that are not affected by sample size or model complexity. In line with current practice (Cheung & Rensvold, 2002; Fan & Sivo, 2007; Marsh, Hau, & Wen, 2004; Vandenberg & Lance, 2000), acceptable fit for a model was imputed when the χ² per df was statistically nonsignificant (p > .05), gamma hat > .90, and RMSEA and standardized root mean residuals (SRMR) were both .05). Thus, in terms of improvement, primary and secondary teachers thought alike. Likewise, the differences in school accountability and irrelevance were not statistically significant.
In general, the teachers disagreed (i.e., less than slight agreement) with school accountability. In contrast, for student accountability, there was a moderate and statistically significant difference in mean scores: secondary teachers agreed more with this construct (d = .42). In general, the teachers disagreed with the factors “assessment is bad” and “assessment is ignored” (i.e., M < 3.0 slightly agree), while they gave moderate agreement (M = 3.86) to the factor “assessment is inaccurate”. Although they rejected the notion that assessment was negative, destructive, or irrelevant, they did agree that it was inaccurate. It would be interesting to discover whether this level of agreement applied equally to formal, externally sourced tests and examinations and to teacher-made observations, checklists, and judgements.
Thus, it can be concluded that both groups of teachers had very similar levels of agreement about three of the conceptions of assessment and differed only on student accountability, a conception agreed to more by secondary teachers. In general, the teachers agreed with improvement, while disagreeing with school accountability. This pattern of mean scores suggests that messages about assessment for improvement would be positively received, while messages about making schools and teachers accountable through assessment would probably be rejected. The results are also consistent with the notion that endorsement of grading, evaluating, and holding students accountable as a legitimate function of assessment is ecologically rational within the New Zealand framework of qualifications assessment in secondary school.

Conceptions correlational structures
The conceptions inter-correlations were statistically equivalent and the values (Figure 1) apply equally to both groups. Irrelevance was positively correlated with student accountability and inversely related to improvement and school accountability.
Student accountability was positively related to both school accountability and improvement. Contrary to expectations, secondary teacher responses did not associate student accountability with improvement more than primary teachers, nor did they connect student accountability with irrelevance more than primary teachers.
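
As an illustration of how the reported effect size relates to the group statistics, the sketch below recomputes Cohen's d for student accountability from the rounded means, standard deviations, and group sizes given in Table 3. It assumes the common pooled-standard-deviation form of d; the paper does not state which variant was used, and the function name is purely illustrative.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardised mean difference (group b minus group a).

    Assumption: the pooled-SD form of Cohen's d; the source does not
    say which formula produced its reported value.
    """
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var)


# Student accountability, rounded values from Table 3:
# primary M = 3.54 (SD = .95, n = 573); secondary M = 3.93 (SD = .86, n = 404)
d_student = cohens_d(3.54, 0.95, 573, 3.93, 0.86, 404)
print(round(d_student, 2))
```

With these rounded inputs the result is approximately .43, which agrees with the reported d = .42 to within rounding of the tabled means and standard deviations.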

Table 3 Model 2 Factor and Conceptions Statistics [Mean Scores (SD); MANOVA, Effect Size]

Factors
                              NZ Prim        NZ Sec         MANOVA (N=977; df=1)
Conception / Factor           (N = 573)      (N = 404)      F (p)
Irrelevance
  Bad                         2.42 (.85)     2.50 (.87)     2.36 (.13)
  Ignore                      2.43 (.94)     2.33 (.83)     2.83 (.09)
  Inaccurate                  3.86 (.94)     3.86 (.97)     .01 (.92)
Improvement
  Describe                    3.97 (.83)     3.93 (.80)     .51 (.47)
  Valid                       3.35 (.90)     3.55 (.84)     11.30 (.00)
  Student Learning            4.45 (.94)     4.39 (.89)     1.14 (.29)
  Teaching                    4.61 (.87)     4.21 (.85)     49.41 (.00)
Accountability
  School                      2.70 (1.01)    2.68 (1.06)    .09 (.77)
  Student                     3.54 (.95)     3.93 (.86)     42.04 (.00)

Conceptions
                              NZ Prim        NZ Sec                                  Effect Size
Conception                    (N = 573)      (N = 404)      MANOVA                   (Cohen’s d)
Irrelevance                   2.90 (.68)     2.90 (.65)     F(1,977)=.00; p=.96      .00
Improvement                   4.10 (.69)     4.02 (.66)     F(1,977)=2.71; p=.10     .11
School Accountability         2.70 (1.01)    2.68 (1.06)    F(1,977)=.09; p=.77      .02
Student Accountability        3.54 (.95)     3.93 (.86)     F(1,977)=42.04           .42

Note: = statistically significant difference by group.
pprimary; N: Primary=475; Secondary=375; ***=p