LIYING CHENG

WASHBACK, IMPACT AND CONSEQUENCES

INTRODUCTION

Testing, large-scale high-stakes testing in particular, tends to induce consequences for its stakeholders. It is clear that “testing is never a neutral process and always has consequences” (Stobart, 2003, p. 140). Testing is a differentiating ritual for students: “for every one who advances there will be some who stay behind” (Wall, 2000, p. 500). It is well known in the field of education that there is a set of relationships, intended and unintended, positive and negative, between testing, teaching and learning. The earliest literature can possibly be traced back to Latham (1877), who referred to an examination system as an “encroaching power” and described how it influences the prevalent view of life and work among young men, and how it affects parents, teachers, the writers of educational books, and the public’s notion of education (p. 2). Washback and impact in language testing are, however, relatively new concepts. By comparison, general education has a longer and more substantial body of research, in which researchers refer to the phenomenon as measurement-driven instruction (e.g., Popham, 1987), test-curriculum alignment (Shepard, 1990), and consequences (Cizek, 2001) (see Cheng and Curtis, 2004 for a detailed review). The concept of measurement-driven instruction holds that testing should drive instruction. Test-curriculum alignment focuses on the relationship between test content and curriculum, which can narrow the curriculum when teachers teach to the test. The consequences of high-stakes testing include intended and unintended, positive and negative effects on instruction, students, teachers, and the school. Only since the late 1980s has there been a rapidly increasing number of studies in language testing (see Alderson and Wall, 1993; Bailey, 1996; Wall, 1997 for reviews of earlier washback studies).
Wall (1997) defines impact as “any of the effects that a test may have on individuals, policies or practices, within the classroom, the school, the educational system or society as a whole.” She also points out that “washback (also known as backwash) is sometimes used as a synonym of impact, but it is more frequently used to refer to the effects of tests on teaching and learning” (p. 291). Some language testers consider washback to be one dimension of impact (Bachman and Palmer, 1996; Hamp-Lyons, 1997). Hamp-Lyons (1997) suggested a view of test influence that falls between the narrow one of washback and the all-encompassing one of impact. The effects of testing on teaching and learning have primarily been associated with test validity (consequential validity); Messick, for example, refers to washback as “only one form of testing consequences that need to be weighted in evaluating validity” (Messick, 1996, p. 243). He advocates examining the two threats to test validity, construct under-representation and construct-irrelevant variance, to determine the possible consequences that a test can have on teaching and learning. Bachman (2005) proposes a framework with a set of principles and procedures for linking test scores and score-based inferences to test use and the consequences of test use. In addition, the effects of testing on teaching and learning are increasingly discussed from the point of view of critical language testing (Shohamy, 2001), including ethics and fairness in language testing (Elder, 1997; Hamp-Lyons, 1997; Kunnan, 2000; Davies, Ethics, Professionalism, Rights and Codes, Volume 7), all of which are expressions of social concern. Shohamy (2001) pointed out the political uses and abuses of language tests and called for examining the hidden agendas of the testing industry and of high-stakes tests. Kunnan (2000) discussed the role of tests as instruments of social policy and control. He also drew on research in ethics to link validity and consequences and created a test fairness framework (Kunnan, 2004). Hamp-Lyons (1997) argued for an encompassing ethics framework to examine the consequences of testing on language learning at the classroom level as well as at the educational, social, and political levels. Together, these developments led to the creation of a Code of Ethics for the International Language Testing Association (see Davies, 2003).

E. Shohamy and N. H. Hornberger (eds), Encyclopedia of Language and Education, 2nd Edition, Volume 7: Language Testing and Assessment, 349–364. © 2008 Springer Science+Business Media LLC.

EARLY DEVELOPMENTS

The work of Alderson and Wall (1993) and Wall and Alderson (1993) marked a significant development in shaping the constructs of washback studies for the field of language testing. Alderson and Wall (1993) explored the potential positive and negative relationships between testing, teaching and learning, and questioned whether washback could be a property of test validity. They proposed 15 hypotheses regarding the potential influence of language testing on various aspects of language teaching and learning, which guided washback studies for years to come. Wall and Alderson (1993) was the first empirical washback study published in the field of language testing; it investigated the nature of washback of a new national English examination in Sri Lanka by observing what was happening inside the classroom. A review of the early literature indicates that there seem to be at least two major types of washback studies: those relating to traditional, multiple-choice, large-scale standardized tests, which are perceived to have had mainly negative influences on the quality of teaching and learning (Shepard, 1990), and those in which a specific test or examination has been modified and improved (e.g., with more communicative tasks, as in Wall and Alderson, 1993) in order to exert a positive influence on teaching and learning (see also Cheng, 2005). In fact, many studies in language testing have focused on this aspect of the washback mechanism, that is, using the influence of language testing to change teaching and learning, although these studies have found positive, negative and/or no influence on teaching and learning.
In 1996, a special issue of Language Testing published a series of articles that further explored the nature of washback and empirically investigated the relationship between testing, teaching and learning. In this issue, Messick (1996) suggested building validity considerations into test design in order to promote positive washback and to avoid construct under-representation and construct-irrelevant variance. Although Messick did not specify how researchers could go about studying washback through test design validation, he pointed out that test washback could be associated with test properties, that is, with the potential relationship between test design and its consequences for teaching and learning. In this way, he offered a coherent argument for investigating factors in testing in relation to factors in teaching and learning. Bailey (1996, p. 268), however, argued that any test, whether good or bad in terms of validity, can have either negative or positive washback “to the extent that it promotes or impedes the accomplishment of educational goals held by learners and/or program personnel.” Her argument indicated that washback effects (positive or negative) might differ for different groups of stakeholders. Wall (1996) stressed the difficulties in finding explanations of how tests exert influence on teaching and turned to innovation theory to offer “insights into why attempts to introduce change in the classroom are often not as effective as their designers hoped they would be” (p. 334). The three empirical studies reported in the same special issue further demonstrated that washback effects occur to different extents for different individuals and for different aspects of teaching and learning within a specific educational context. In particular, language tests were seen to have a more direct washback effect on teaching content than on teaching methodology. In their study of washback on Test of English as a Foreign Language (TOEFL) preparation courses, Alderson and Hamp-Lyons (1996) found that the TOEFL affects both what and how teachers teach, but the effect is not the same in degree or kind from teacher to teacher, and the simple difference between TOEFL and non-TOEFL teaching did not explain why the teachers taught the way they did. Watanabe (1996) investigated the effect of the university entrance examination on the prevalent use of the grammar-translation method in Japan. His analyses of past English examinations, classroom observations, and interviews with teachers showed very little relationship between the test content and the use of this particular teaching methodology. Rather, teacher factors, including personal beliefs, past education, and academic background, seemed to be more important in determining the teaching methodology a teacher employs. Shohamy, Donitsa-Schmidt, and Ferman (1996) pointed out that the degree of impact of a test is often influenced by several other factors: the status of the subject matter tested, the nature of the test (low or high stakes), and the uses to which the test scores are put. Furthermore, the washback effect may change over time and may not last indefinitely within the system. In summary, testing may be only one of the factors that “affect how innovations [through testing] succeed or fail and that influence teacher (and pupil) behaviors” (Wall and Alderson, 1993, p. 68).

MAJOR CONTRIBUTIONS

The 10 years since the 1996 special issue of Language Testing have seen a flurry of empirical studies investigating different tests within different teaching and learning contexts. These studies have investigated the influence of testing on teachers (including teaching assistants) and teaching (Burrows, 2004; Cheng, 2005; Ferman, 2004; Hayes and Read, 2004; Nazari, 2005; Scaramucci, 2002; Saif, 2006; Wall, 2006), textbooks (Read and Hayes, 2003; Saville and Hawkey, 2004; Yu and Tung, 2005), learners and learning (Andrews, Fullilove, and Wong, 2002; Chen and He, 2003; Robb and Ercanbrack, 1999; Watanabe, 2001), attitudes toward testing (Cheng, 2005; Jin, 2000; Read and Hayes, 2003), and test preparation behaviors (Stoneman, 2005). Some of these studies investigated the influence of a national English examination on local English language teaching and learning, given its high-stakes nature in a particular context such as Brazil (Scaramucci, 2002), China (Jin, 2000; Qi, 2004, 2005; Zhao, 2003), Hong Kong (Andrews, 1995; Andrews, Fullilove, and Wong, 2002; Cheng, 2005), Iran (Nazari, 2005; Nemati, 2003), Israel (Ferman, 2004; Shohamy, Donitsa-Schmidt, and Ferman, 1996), Japan (Watanabe, 1996), Romania (Gosa, 2004), Sri Lanka (Wall, 2005), and Taiwan (Chen, 2002; Shih, 2005). Others investigated worldwide English tests such as the International English Language Testing System (IELTS) (Green, 2003; Hayes and Read, 2004; Nguyen, 1997), TOEFL (Alderson and Hamp-Lyons, 1996; Robb and Ercanbrack, 1999), and the Michigan Examination for Certificate of Competency (Irvine-Niakaris, 1997). Two major longitudinal studies by Dianne Wall (2005) and Liying Cheng (2005) (also published in various journal articles since the 1990s) have made a substantial contribution to the understanding of the complexity of washback and offered methodological implications for washback studies. Wall (2005) documents a study examining one of the widely believed ways of creating change in an education system: introducing or redesigning high-stakes examinations. Her study analyzed the effects of a national examination in English as a Foreign Language in Sri Lanka that was meant to serve as a lever for change. It illustrated how the intended outcome was altered by factors in the exam itself, as well as by the characteristics of the educational setting, the teachers and the learners. The study also reviewed the literature on examination impact and innovation in education, and provided guidelines for educators who continue to believe in the potential of examinations to effect curriculum change. This method is not foolproof, however, as many factors can affect the impact of such an innovation (Wall, 2005). Cheng investigated the impact of the Hong Kong Certificate of Education Examination in English (HKCEE), a high-stakes public examination, on the classroom teaching and learning of English in Hong Kong secondary schools, a situation similar to that reported by Wall (2005), where an examination is used as the change agent (see Cheng, 2005).
The washback effect of this public examination change was observed initially at the macro level, among different parties or levels of stakeholders within the Hong Kong educational context, and subsequently at the micro level, in terms of classroom teaching and learning, including teachers’ (and learners’) attitudes, teaching content and classroom interaction. This was a large-scale, three-phase study using multiple methods to explore the multivariate nature of washback. The findings indicate that the washback effect of the new examination on classroom teaching was limited, although the new examination was specifically designed to bring about positive washback effects on teaching and learning in schools. Her study further demonstrated a situation similar to that found in Wall and Alderson (1993): the change in the examination influenced what teachers taught, but not how they taught. Cheng, Watanabe, with Curtis’s Washback in Language Testing: Research Contexts and Methods (2004) is a cornerstone collection of washback studies, an area of research that attracted the initial attention of the field of language testing about 20 years ago. This volume is the first systematic attempt to capture the essence of washback and, through its collection of washback studies from around the world, responds to the question “what does washback look like?” (Cheng, Watanabe, and Curtis, 2004, p. ix), a step beyond the question “does washback exist?” posed by Alderson and Wall (1993). The volume consists of two sections. The first section highlights the concept and nature of washback through a historical review of the phenomenon by Cheng and Curtis; a methodology chapter by Watanabe, who disentangles the dimensions and aspects of washback research to guide washback studies; and a further critique of washback and curriculum innovation by Andrews, who pointed out that “it is precisely the power of high-stakes tests (or the strength of the perceptions which are held about them) that makes them potentially so influential upon the curriculum and curricular innovation” (p. 37). The second section showcases a range of studies conducted in the USA, where Stecher, Chun, and Barron investigated the influence of tests on school practices resulting from the introduction of test-based reform efforts at the state level; the UK, where Saville and Hawkey looked at the impact of IELTS on the content and nature of IELTS-related teaching materials; New Zealand, where Hayes and Read reported their study of the impact of IELTS on the way international students prepare for academic study in New Zealand; Australia, where Burrows explored the differences between teachers’ classroom practices before and after the introduction of a new competency-based curriculum and assessment, taking into account teachers’ belief systems and their consequent response to the change; Japan, where Watanabe explored the teacher factors mediating washback through detailed classroom observations within the school setting; Hong Kong, where Cheng used a repeated survey measure to investigate the effects of a newly introduced examination on teachers’ pedagogical practices; China, where Qi explored the intended washback on secondary school English teaching and learning from the point of view of both the test developers and the senior teachers; and Israel, where Ferman investigated whether and how a national high-stakes test affected educational processes, teachers and students, and their teaching and learning. This book brings together washback studies on various aspects of teaching and learning conducted in many parts of the world and constitutes a substantial body of research that has contributed to our understanding of the washback and impact of our tests. In light of these empirical studies, we can no longer take for granted that where there is a test, there is a direct effect.

Washback is a highly complex phenomenon, and these studies show that simply changing the contents or methods of an examination will not necessarily bring about direct and desirable changes in teaching and learning. Rather, various factors within a particular educational context are involved in engineering desirable washback. However, questions remain about which factors are involved and under which conditions beneficial washback is most likely to be generated.

WORK IN PROGRESS

Recently, several important projects commissioned by major testing agencies such as Cambridge ESOL and Educational Testing Service (ETS) have played an increasingly major role in producing clusters of washback and impact studies. These studies are conducted in many countries around the world on the same test, for example, TOEFL or IELTS. They tend to be large-scale, multiphase, and multifaceted, and offer important directions for future research. Impact (including washback) is a key focus of the Cambridge ESOL research and validation program, which is designed to ensure that all ESOL assessment products meet acceptable standards in relation to the four essential test qualities of validity, reliability, impact and practicality. With impact as one of these key assessment properties, long-term research on IELTS, such as the three-phase IELTS impact study, has been implemented through the Cambridge ESOL Research & Validation Group. Hawkey’s (2006) book exemplifies test impact work in the context of Cambridge ESOL’s test production methodology and ongoing validation program, and provides illustrative examples of how impact studies may be undertaken, drawn from both the IELTS impact study and the Progetto Lingue 2000 impact study in Italy. The two studies provide thorough and detailed information on the washback and impact of the tests on a range of stakeholders: candidates and learners, teachers, education managers, and receiving institutions. The data presented provide directly relevant feedback for test and program validation. In addition to such fairly large-scale impact studies, the 65 projects under the joint IDP Education Australia/British Council IELTS funded research program, which is managed jointly with Cambridge ESOL, have included, since 2002, around 20 studies directly investigating test impact and washback (see further details at www.CambridgeESOL.org/rs_notes).
These studies have been conducted in different parts of the world with IELTS test-takers and have investigated aspects such as:
• Candidate identity, learning and performance, with specific reference to the affective and academic impact of IELTS on successful IELTS students
• Ethnographic study of classroom instruction, and of the relationship between teacher background and classroom instruction, on an IELTS preparation program
• The impact of IELTS on receiving institutions, for example, tertiary decision-makers’ attitudes to English language tests, the use of IELTS for university selection, and IELTS as a predictor of academic language performance
• Perceptions of the IELTS skills modules, for example, the speaking test and the writing test, features of written language production, and the impact of computer versus pen-and-paper versions
These clusters of studies investigated not only the broad consequences of IELTS for a wide range of factors (impact, as defined in the Introduction) but also the effects of IELTS on classroom teaching and learning (washback). It seems clear that worldwide high-stakes language tests such as IELTS exert a powerful influence on large numbers of language learners and teachers. The TOEFL test is similar to IELTS in this respect. With the introduction of the Next Generation TOEFL (TOEFL iBT) in 2005, ETS funded a series of studies, two of which examine the impact of the TOEFL test (see Hamp-Lyons and Brown, 2007; Wall and Horak, 2006). The study reported by Hamp-Lyons and Brown was designed in three stages: the first stage developed and validated instruments for the impact study; the second stage collected and analyzed data on TOEFL preparation in the USA, China and Egypt in the period immediately preceding the introduction of the TOEFL iBT in 2005; and in the third stage, data are to be collected and analyzed in the same countries and institutions in order to identify change and constancy in beliefs, attitudes, methods, and the content of instruction under the influence of the significant changes to the existing TOEFL.
Within these three countries, both university-based and commercial institutions have been studied, and participants from both TOEFL-taking and non-TOEFL-taking contexts have been included. Because data are collected at several points as information about the changes to TOEFL is disseminated, the design allows change to be observed as it happens. The study primarily uses questionnaire data from teachers and students in all centres. However, it also incorporates classroom observation data in the USA, student participant logs in China, and teacher and student interviews in the USA and China. The initial findings have revealed differences in perceptions and attitudes between TOEFL teachers and their students, but surprisingly few differences between the views of students who are preparing for TOEFL and those who are not. Differences are also emerging across the three countries.

The TOEFL Impact Study in Central and Eastern Europe (Wall and Horak, 2006) investigated whether the new version of TOEFL, the TOEFL iBT, contributed to changes in teaching and learning after its introduction. Phase 1 was a ‘baseline study,’ which described the type of teaching and learning taking place in commercial language teaching operations before details about the content and format of the new TOEFL test were released. This would provide a point of comparison for any changes that might occur in the future. Phase 2 was a ‘transition study,’ which traced the reactions of teachers and teaching institutions to the news released about the TOEFL iBT and investigated the arrangements they made for new preparation courses. Phase 3 aimed to find out whether textbooks accurately reflected the new test and how teachers used them in the classroom. Data were collected via computer-mediated communication, with informants providing not only responses to questions about activities in their classrooms and institutions but also reactions to tasks designed to probe their understanding of the new test construct and format. Apart from the main projects commissioned by the two testing agencies, an increasing number of doctoral-level washback studies add to our understanding of the complex nature of washback and impact (e.g., Glover, 2006; Gosa, 2004; Gu, 2005; Scott, 2005; Shih, 2006; Stoneman, 2005). Glover (2006) explored washback on how teachers teach by analyzing teacher talk and established some links between testing and certain aspects of teacher talk. Gosa (2004) used student diaries to investigate the influence of the Romanian national examination on learners. Gu (2005) conducted her study of the College English Test (CET) in China in both case study settings and nationwide contexts. A wide range of stakeholders (4,500 in total) were involved, and multiple research methods were used.
Her findings indicate a mix of positive and negative washback effects on teaching and learning. Certain test formats, namely speaking and writing, seem to induce positive effects, leading to more such activities in classroom teaching and learning, while multiple-choice test items induce negative effects, with multiple-choice becoming the way teachers teach and students learn. The author concludes that “the CET is part of the complex set of factors that determine the outcome of College English teaching and learning” (Gu, 2005, p. 2). Scott (2005) conducted an exploratory case study of the effects of high-stakes statutory testing on primary English as an Additional Language (EAL) learners in the UK. Her findings illustrated the extent of congruence and/or dissonance between EAL-oriented teaching and washback from high-stakes testing. Shih (2006) investigated stakeholders’ perceptions (including those of department heads, teachers, students, and their partners/spouses) of the Taiwan General English Proficiency Test and its washback on school policies and on teaching and learning. Stoneman (2005) examined and compared the nature and extent of test preparation among university students. She suggested that testing, as a strategy to promote desirable changes in learners and their learning, may or may not bring about the predicted results.

PROBLEMS AND DIFFICULTIES

Although an increasing number of empirical washback and impact studies have been conducted over the 20 years since the late 1980s, researchers in the field of language education still wrestle with the nature of washback and do not know exactly how to promote the positive and reduce the negative washback and impact of our tests. As indicated above, washback is one dimension of the consequences of testing, concerning classroom teaching and learning, while impact studies include the broader effects of testing (as defined in Wall, 1997); both assume a causal relationship between testing, teaching and learning that has not yet been established. Most empirical washback and impact studies have established only an exploratory relationship. In many cases, we cannot be confident that certain aspects of teaching and learning perceptions and behaviors are the direct, causal effects of testing. They may well be within certain contexts; however, this relationship has not yet been fully disentangled. Further, apart from the studies on IELTS and TOEFL, where a worldwide test influences teachers and learners across countries and educational contexts, the majority of empirical studies examine the effects of one single test within one educational context, using research instruments designed specifically for that particular study. The strength of these studies is that they have investigated factors that affect the intensity of washback¹, such as test factors (test methods, test contents, skills tested, purpose(s) of the test), prestige factors (stakes of the test, status of the test), personal factors (teachers’ educational backgrounds and their beliefs), micro-context factors (the school/university setting), and macro-context factors (the specific society in which the tests are used) (Cheng, Watanabe, and Curtis, 2004).
In fact, many of the factors related to the influence of testing on teaching and learning illustrated in Wall (2000) have been empirically studied. However, little overlap exists among the studies either in what factors affect washback or in researchers’ reports of the negative and positive aspects of washback (Brown, 1997). Further, although washback effects are evident in various contexts, there does not seem to be overall agreement on which factors affect the intensity of washback and which factors promote positive or negative washback. This is a challenging feature of washback and impact studies, since researchers set out to investigate a very complex relationship (causal or exploratory) among testing, teaching and learning. Such a relationship can be influenced or mediated by any of the factors of a test and/or within a particular educational system. This complexity causes problems and difficulties in any washback and impact research, which in turn challenges any researcher who wishes to conduct, is conducting, or has conducted such studies. In many ways, the nature of such a study requires subtle, refined and sophisticated research skills to disentangle the relationship. Researchers need to understand the specificity, intensity, length, intentionality, and value of washback/impact and how (or where and when) to observe the salient aspects of teaching and learning that are potentially influenced by the test. They also need to identify their own biases, analyze the particular test and its context, and formulate predictions of what washback/impact will look like before designing and conducting the study (see also Watanabe, 2004). Washback and impact studies are, by definition, evaluation studies, which require researchers not only to understand but also to make value judgments about the local educational context as well as the larger social, political, and economic factors governing teaching and learning in relation to a test/examination or a testing system. Researchers need to acquire both the breadth and depth of research skills necessary to avoid investigating random factors of teaching and learning, which may or may not have a direct relationship with testing.

¹ ‘Washback intensity’ refers to the degree of the washback effect in an area or a number of areas of teaching and learning that are affected by an examination (Cheng, 2005, p. 33).

FUTURE DIRECTIONS

Future washback and impact studies investigating the consequences of language testing clearly need to be multiphase, multimethod and longitudinal in nature. Washback and impact take time to evolve; longitudinal studies are therefore essential, with repeated observations (and measures) of classroom teaching, including teachers and students, as well as of policy, curriculum, and assessment documents. Researchers also need to be immersed in the educational system, interacting with a wide range of stakeholders. In addition, researchers should pay attention to the seasonality of the phenomenon; that is, the timing of researchers’ observations may influence what we discover about washback (Bailey, 1999; Cheng, 2005; Watanabe, 1996). Examples such as the IELTS impact studies (see Hawkey, 2006) and the two impact studies on the TOEFL iBT conducted across different countries and continents over a few years (Hamp-Lyons and Brown, 2007; Wall and Horak, 2006) have a great deal to contribute to our understanding of this complex phenomenon. Studies of a single test within an individual context by a single researcher can still offer valuable insights for that particular context; however, resources would be best used if a group of researchers worked collaboratively to carry out a series of studies on the same test within the same educational context. In this way, researchers could investigate a range of different aspects of the phenomenon, as discussed in the Introduction. Their findings could then be cross-referenced to portray a more accurate picture of the effects of the test, avoiding the “blind men and the elephant” syndrome. In addition, the methodology (and the methods) used to conduct washback studies needs to be further refined. For example, more sophisticated methods, such as those linking directly with test-takers’ characteristics, learning processes, and learning outcomes (test performance), need to be employed beyond the classroom observations and survey methods (interviews and questionnaires) commonly used in the studies reviewed in this chapter. Building on the increasing number of studies carried out on the same test or within the same educational context, future researchers can replicate or refine methodologies and procedures, which was not possible 20 years ago. Such replication would allow researchers to build on what we have learned conceptually and methodologically over the years and further our understanding of this phenomenon. While it would be useful to continue to study the effects of tests on broad aspects of teaching, it is essential to turn our attention to the effects on student learning, since students receive the most direct impact of testing (see Wall, 2000).
What previous studies have not focused on is the direct influence of testing on students (e.g., their perceptions, strategy use, motivation, anxiety, and affect) and on their learning (e.g., what and how they learn or how they perform on a test). It is also important to investigate the impact of the test constructs, test methods and the function of the test on students and on their learning processes (including test-taking processes) and learning outcomes (test scores or other outcome measures) (see Cheng, Klinger and Zheng, 2007, for an example). The results of such investigations should then inform in-depth observations of the students. Furthermore, these studies should be conducted in direct relation to a test, for example, by examining test takers’ cognitive, psychological and emotional responses to a test. This type of research can link the consequences of testing with test validity. It would also be worthwhile for washback and impact studies to look more closely at the test-taker population, for example, at the learning and testing characteristics of the students in the study. We know that high-stakes
testing such as IELTS or TOEFL influences students. However, will the test affect students learning English in one country differently from students in another country where educational traditions (beliefs and values) differ? If so, what factors induce these differences in washback? Without a thorough understanding of where these students come from and the characteristics they bring to their learning and testing, we are unlikely to fully understand the nature of test washback and impact. Ultimately, washback/impact researchers need to analyze the test under study fully and understand how it is used. Bachman (2005, p. 7) states that “the extensive research on validity and validation has tended to ignore test use, on the one hand, while discussions of test use and consequences have tended to ignore validity, on the other”. It is, then, essential for us to establish the link between test validity and test consequences. It is therefore imperative that washback/impact researchers work together with other language testing researchers, as well as with educational policy makers and testing agencies, to address the issue of validity, in particular the fairness and ethics of our tests.

See Also: Alan Davies: Ethics, Professionalism, Rights and Codes (Volume 7); Geoff Brindley: Educational Reform and Language Testing (Volume 7); Antony Kunnan: Large Scale Language Assessments (Volume 7)

REFERENCES

Alderson, J.C. and Hamp-Lyons, L.: 1996, ‘TOEFL preparation courses: A case study’, Language Testing 13, 280–297.
Alderson, J.C. and Wall, D.: 1993, ‘Does washback exist?’, Applied Linguistics 14, 115–129.
Andrews, S.: 1995, ‘Washback or washout? The relationship between examination reform and curriculum innovation’, in D. Nunan, V. Berry, and R. Berry (eds.), Bringing About Change in Language Education, University of Hong Kong, Hong Kong, 67–81.
Andrews, S., Fullilove, J., and Wong, Y.: 2002, ‘Targeting washback—A case study’, System 30, 207–223.
Bachman, L.F.: 2005, ‘Building and supporting a case for test use’, Language Assessment Quarterly 2(1), 1–34.
Bachman, L.F. and Palmer, A.S.: 1996, Language Testing in Practice, Oxford University Press, Oxford, England.
Bailey, K.M.: 1996, ‘Working for washback: A review of the washback concept in language testing’, Language Testing 13, 257–279.
Bailey, K.M.: 1999, Washback in Language Testing, Educational Testing Service, Princeton, NJ.
Brown, J.D.: 1997, ‘Do tests washback on the language classroom?’, The TESOLANZ Journal 5, 63–80.
Burrows, C.: 2004, ‘Washback in classroom-based assessment: A study of the washback effect in the Australian adult migrant English program’, in L. Cheng, Y. Watanabe, and A. Curtis (eds.), Washback in Language Testing: Research Contexts and Methods, Lawrence Erlbaum Associates, Mahwah, NJ, 113–128.
Chen, L.-M.: 2002, Washback of a Public Exam on English Teaching, Unpublished PhD dissertation, The Ohio State University.
Chen, Z. and He, Y.: 2003, ‘Influence of CET-4 on college students and some suggestions’, Journal of Technology College Education 22, 40–41.
Cheng, L.: 2005, Changing Language Teaching through Language Testing: A Washback Study, Studies in Language Testing: Volume 21, Cambridge University Press, Cambridge.
Cheng, L. and Curtis, A.: 2004, ‘Washback or backwash: A review of the impact of testing on teaching and learning’, in L. Cheng, Y. Watanabe, and A. Curtis (eds.), Washback in Language Testing: Research Contexts and Methods, Lawrence Erlbaum Associates, Mahwah, NJ, 3–18.
Cheng, L., Klinger, D., and Zheng, Y.: 2007, ‘The challenges of the Ontario Secondary School Literacy Test for second language students’, Language Testing 24(2), 1–24.
Cizek, G.J.: 2001, ‘More unintended consequences of high-stakes testing’, Educational Measurement: Issues and Practice 23(3), 1–17.
Davies, A.: 2003, ‘Three heresies of language testing research’, Language Testing 20(4), 355–368.
Elder, C.: 1997, ‘What does test bias have to do with fairness?’, Language Testing 14, 261–277.
Ferman, I.: 2004, ‘The washback of an EFL national oral matriculation test to teaching and learning’, in L. Cheng, Y. Watanabe, and A. Curtis (eds.), Washback in Language Testing: Research Contexts and Methods, Lawrence Erlbaum Associates, Mahwah, NJ, 191–210.
Glover, P.: 2006, Examination Influence on How Teachers Teach: A Study of Teacher Talk, Unpublished PhD thesis, University of Lancaster.
Gosa, G.: 2004, Investigating Washback: A Case Study Using Student Diaries, Unpublished PhD dissertation, Lancaster University, UK.
Green, A.: 2003, Test Impact and English for Academic Purposes: A Comparative Study in Backwash Between IELTS Preparation and University Pre-sessional Courses, Unpublished PhD thesis, Centre for Research in Testing, Evaluation and Curriculum in ELT, University of Surrey, Roehampton.
Gu, X.: 2005, ‘Positive or negative? An empirical study of CET washback on college English teaching and learning in China’, ILTA Online Newsletter 2. Retrieved June 1, 2006, from http://www.iltaonline.com/newsletter/02-2005oct/
Hamp-Lyons, L.: 1997, ‘Washback, impact and validity: Ethical concerns’, Language Testing 14(3), 295–303.
Hamp-Lyons, L. and Brown, A.: 2007, The Effect of Changes in the New TOEFL Format on the Teaching and Learning of EFL/ESL: Stage 2 (2003–5): Entering Innovation, Submitted to the TOEFL Research Committee, Educational Testing Service.
Hawkey, R.: 2006, Impact Theory and Practice: Studies of the IELTS Test and Progetto Lingue 2000, Cambridge University Press, Cambridge.
Hayes, B. and Read, J.: 2004, ‘IELTS test preparation in New Zealand: Preparing students for the IELTS academic module’, in L. Cheng, Y. Watanabe, and A. Curtis (eds.), Washback in Language Testing: Research Contexts and Methods, Lawrence Erlbaum Associates, Mahwah, NJ, 97–112.
Irvine-Niakaris, C.: 1997, ‘Current proficiency testing: A reflection of teaching’, Forum 35, 16–21.
Jin, Y.: 2000, ‘Washback of College English Test-Spoken English Test on teaching’, Foreign Language World 80, 56–61.
Kunnan, A.: 2000, Fairness and Validation in Language Assessment: Selected Papers From the 19th Language Testing Research Colloquium, Orlando, Florida, Studies in Language Testing: Volume 9, Cambridge University Press, Cambridge.
Kunnan, A.J.: 2004, ‘Test fairness’, in M. Milanovic, C. Weir, and S. Bolton (eds.), European Language Testing in a Global Context: Selected Papers from the ALTE Conference in Barcelona, Cambridge University Press, Cambridge.
Latham, H.: 1877, On the Action of Examinations Considered as a Means of Selection, Deighton, Bell and Company, Cambridge.
Messick, S.: 1996, ‘Validity and washback in language testing’, Language Testing 13, 243–256.
Nazari, A.: 2005, ‘Washback effects on TEFL: A case study from Iran’, IATEFL Voices 185, 9–10.
Nemati, M.: 2003, ‘The positive washback effect of introducing essay writing tests in EFL environments’, Indian Journal of Applied Linguistics 29, 49–62.
Nguyen, P.: 1997, Washback Effects of International English Language Testing System at the Vietnam National University, Unpublished PhD thesis, University of Melbourne.
Popham, W.J.: 1987, ‘The merits of measurement-driven instruction’, Phi Delta Kappan 68, 679–682.
Qi, L.: 2004, ‘Has a high-stakes test produced the intended changes?’, in L. Cheng, Y. Watanabe, and A. Curtis (eds.), Washback in Language Testing: Research Contexts and Methods, Lawrence Erlbaum Associates, Mahwah, NJ, 171–190.
Qi, L.: 2005, ‘Stakeholders’ conflicting aims undermine the washback function of a high-stakes test’, Language Testing 22, 142–173.
Read, J. and Hayes, B.: 2003, ‘The impact of IELTS on preparation for academic study in New Zealand’, in R. Tulloh (ed.), International English Language Testing System Research Reports 2003, Volume 4, IELTS Australia, Canberra.
Robb, T.N. and Ercanbrack, J.: 1999, ‘A study of the effect of direct test preparation on the TOEIC scores of Japanese university students’, TESL-EJ 3, A2, http://tesl-ej.org/ej12/toc.html.
Saif, S.: 2006, ‘Aiming for positive washback: A case study of international teaching assistants’, Language Testing 23, 1–34.
Saville, N. and Hawkey, R.: 2004, ‘The IELTS impact study: Investigating washback on teaching materials’, in L. Cheng, Y. Watanabe, and A. Curtis (eds.), Washback in Language Testing: Research Contexts and Methods, Lawrence Erlbaum Associates, Mahwah, NJ, 73–96.
Scaramucci, M.V.R.: 2002, ‘Entrance examinations and TEFL in Brazil: A case study’, Revista Brasileira de Lingüística Aplicada 2, 61–81, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil.
Scott, C.: 2005, Washback in the UK Primary Context with EAL Learners: Exploratory Case Studies, Unpublished PhD thesis, University of Bristol.
Shepard, L.A.: 1990, ‘Inflated test score gains: Is the problem old norms or teaching the test?’, Educational Measurement: Issues and Practice 9, 15–22.
Shih, C-M.: 2006, Perceptions of the General English Proficiency Test and its Washback: A Case Study at the Two Taiwan Technological Institutes, Unpublished PhD dissertation, University of Toronto.
Shohamy, E.: 2001, The Power of Tests: A Critical Perspective on the Uses of Language Tests, Longman, Essex, England.
Shohamy, E., Donitsa-Schmidt, S., and Ferman, I.: 1996, ‘Test impact revisited: Washback effect over time’, Language Testing 13, 298–317.
Stecher, B., Chun, T., and Barron, S.: 2004, ‘The effects of assessment-driven reform on the teaching of writing in Washington State’, in L. Cheng, Y. Watanabe, and A. Curtis (eds.), Washback in Language Testing: Research Contexts and Methods, Lawrence Erlbaum Associates, Mahwah, NJ, 53–72.
Stobart, G.: 2003, ‘The impact of assessment: Intended and unintended consequences’, Assessment in Education 16, 139–140.
Stoneman, B.: 2005, An Impact Study of an Exit English Test for University Graduates in Hong Kong: Investigating Whether the Status of a Test Affects Students’ Test Preparation Activities, Unpublished PhD thesis, Hong Kong Polytechnic University.
Wall, D.: 1996, ‘Introducing new tests into traditional systems: Insights from general education and from innovation theory’, Language Testing 13, 334–354.
Wall, D.: 1997, ‘Impact and washback in language testing’, in C. Clapham and D. Corson (eds.), Encyclopedia of Language and Education, 291–302.
Wall, D.: 2000, ‘The impact of high-stakes testing on teaching and learning: Can this be predicted or controlled?’, System 28, 499–509.
Wall, D.: 2005, The Impact of High-Stakes Examinations on Classroom Teaching: A Case Study Using Insights from Testing and Innovation Theory, Studies in Language Testing: Volume 22, Cambridge University Press, Cambridge.
Wall, D. and Alderson, J.C.: 1993, ‘Examining washback: The Sri Lankan impact study’, Language Testing 10, 41–69.
Wall, D. and Horak, T.: 2006, The TOEFL Impact Study: Phase 1, TOEFL Monograph 34, Educational Testing Service.
Watanabe, Y.: 1996, ‘Does grammar translation come from the entrance examination? Preliminary findings from classroom-based research’, Language Testing 13, 318–333.
Watanabe, Y.: 2001, ‘Does the university entrance examination motivate learners? A case study of learner interviews’, in Akita Association of English Studies (ed.), Trans-equator Exchanges: A Collection of Academic Papers in Honour of Professor David Ingram, Author, Akita, Japan, 100–110.
Yu, G.K.H. and Tung, R.H.C.: 2005, ‘The washback effects of JCEEEs in the past fifty years’, Proceedings of the 22nd Conference on English Teaching and Learning, Normal University, Taipei, Taiwan, 379–403.
Zhao, L.: 2003, ‘College English teaching evaluation system in China: Major problems and corresponding countermeasures’, Indian Journal of Applied Linguistics 29, 85–98.