Editorial
Assessing language for specific purposes
Liz Hamp-Lyons and Tom Lumley
Hong Kong Polytechnic University
Language Testing 2001 18 (2) 127–132

Address for correspondence: Liz Hamp-Lyons, Asian Centre for Language Assessment Research, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong; e-mail: eglhl@polyu.edu.hk
This issue of Language Testing focuses on an area that continues to fascinate and trouble many of us: assessing language for specific purposes (LSP). The five articles in this issue present a range of challenging issues and findings that will inform our understanding of LSP assessment, but also further problematize it. This is no bad thing as English spreads wider still around the world and more and more users see English as a wholly instrumental skill in their lives. The issues apply, of course, equally to languages other than English.

Davies questions the English for Specific Purposes (and by extension, the LSP) enterprise on both practical and theoretical grounds. Davies rejects the notion of considering Specific Purposes (SPs) as registers alone, agreeing with Widdowson and others that SPs are characterized by their communicative natures. As he points out, with communication in mind, we are in the territory of discourse and therefore of blurred boundaries. This is an issue which is also taken up by Elder and Cumming in this issue. Davies’ critique of the IELTS development, and of the validation study of the original ELTS, leads him to question the influence that SP factors (linguistic or discoursal) have in test-taker performances relative to other factors, such as mastery of the language core (grammar, core lexis, etc.). Davies concludes that, to date, SP tests can only be judged on the pragmatic criterion of test usefulness.

In her article, Elder uses three language tests for teachers to examine the problem of indeterminacy in LSP tests, in the process of which she calls into question the notion that any test has an inherent claim to validity by virtue of its SP-ness. The tests illustrate some of the diversity of issues that may arise when we consider the language ability needed by both subject teachers and foreign language teachers.
Elder describes how in none of these cases can the language of teaching be described as a well-defined domain. She considers how the notions of situational and interactional authenticity raise rather separate problems, depending on the test’s purpose, and how we still lack a satisfactory approach to settling issues of authenticity. The article also illustrates how measurement problems can arise when the reality of assessing task fulfilment conflicts with raters’ perceptions of linguistic competence in the test performance. It is clear that the relationship between performance and ability is far from settled.

Douglas addresses the issue of the criteria used in assessing LSP tests. Again, this article deals with the issue of separability of language and content. He reinforces the points made by Jacoby and McNamara (1999) that second language assessments based entirely on linguistic criteria may fail to satisfy the purpose of the test user; by contrast, the use of ‘indigenous’ assessment criteria identified by carefully chosen professionals may provide very useful supplements to assessments based on native speaker norms. It is not enough to consult subject specialists only when designing the task; the criteria are equally worthy of attention in LSP test development. There are practical as well as theoretical difficulties in balancing the level of input from specialists with that of linguists. Trying to resolve this problem is one of the many interesting challenges facing those who work in LSP test development and validation.

Wu and Stansfield’s article introduces an interesting context for language assessment: the translation ability of employees in a Law Enforcement Agency. Their focus is on authenticity of task, addressed through a description of the stages followed in developing the Listening Summary Translation Exam (LSTE) in Taiwanese. This involves exploration of the rather exotic ground of wire-tapping and drug-dealing – very much the ‘sharp end’ of LSP assessment. The article illustrates some of the practical difficulties, associated among others with Munby’s work (1978), of using a taxonomy for test development purposes. The taxonomy presented here lists both content and functions, which mirrors the problem faced continually in LSP testing (also described clearly in Elder’s article) of distinguishing language knowledge from strategic competence: how can we describe theoretically the knowledge and competence required to both understand and convey successfully the message of the listening text?

Cumming’s in-depth interviews with teachers around the world led him to uncover an SP vs. general purpose (GP) distinction in their orientations to the work they do. The difference in orientation was most obviously signalled in the criteria these teachers use to assess their students’ writing. The ‘SP view’ was related to narrower forms viewed as appropriate for written work, and more restricted criteria
for judgement of the writing. The ‘GP view’ seemed to be related to more variable assessment methods and criteria, and focused more on individual learners. Cumming notes that even the ‘boundary’ between SP and GP teaching seemed to be blurred, so wide was the variation in teaching and assessment practices.

In his recent book, Douglas (2000) suggests two reasons for engaging in specific purpose testing: language performances vary with context; and specific purpose language is precise (p. 281). However, he also notes that there is some evidence of a tendency in language assessment to reduce specific purpose tests gradually towards less and less field specificity, citing the move from ELTS to IELTS (1) to IELTS (Revised); and the replacement of the PLAB English assessment by a clinical skills measure and an IELTS screening (pp. 280–81). We see, then, two potentially opposing views of a relationship between SP contexts/SP teaching, and language testing/assessment.

SP issues are interesting to language testers because they have the ability to illuminate the relationship between language as vehicle and language as content/function/action. Since SP language tests are difficult to create and validate, and demand relatively more resources than more generic tests, they demand more of the skills of the language tester: this kind of challenge will never fail to be attractive. As we have said, the article by Wu and Stansfield in this issue illustrates the ‘sharp end’ of SP language test development and validation. However, as we move further from the sharp end towards less precisely defined contexts with their concomitant language behaviours and needs, questions about boundary issues arise. To answer these questions, we need a theory of LSP testing.

Both Davies and Elder in this issue suggest that current SP assessment theory is inadequate. Davies’ article questions the basis for any theoretical claims made on behalf of SP testing, on the grounds that no sustainable distinction can be made for SP tests in contrast to general purpose tests. He suggests that the essential characteristic of SP tests is no more than their pragmatic purpose in fulfilling a need. Elder’s article looks at domains of teacher competencies but acknowledges that there are blurred boundaries between language and content, and even between domains of language: she finds SP test theory inadequate for test development needs. Douglas (2000) suggests there are three key characteristics of LSP tests/assessments: specificity, authenticity and separability of language and content. But this simple characterization does not equal, or lead to, a theory. Douglas (2000) also proposes a model that distinguishes language knowledge, strategic competence and background knowledge; he emphasizes the concept of discourse domains (Douglas and Selinker, 1985). Wu and Stansfield focus on the notion
of authenticity and target language use (Bachman and Palmer, 1996) domains. However, doubt is raised in their article about the extent to which the test they describe seems to fit all these areas of the model; it appears to have not so much a clearly specified domain of knowledge as a domain of use. It seems that the language used draws on rather general knowledge; nevertheless, test-takers will require strategic competence of unspecified kinds in order to perform successfully on the test. Certainly we cannot say that LSP testing theory is further developed by Wu and Stansfield’s article. Cumming, like Elder, examines the issue of blurred boundaries. He identifies the relationship between orientations of teachers towards curriculum and assessment as deciding whether or not assessment practices are perceived as SP or GP.

This takes us to the question of how we recognize an SP test. It may be helpful, in thinking about an SP test, to start from its purpose and (or) the need it fills. This involves, first, the purpose for which the assessment measure is designed (in response to some situational need). The notion of specificity, including a domain of knowledge, suggests that SP tests should give us a way of including rather than excluding content. This content should provide on some level a target as well as a vehicle of communication. Secondly, we can consider the purpose to which the test is put. Can or should we draw lines for the use of particular tests? It seems obvious that certain tests can only be used in certain situations, as the examples offered by Wu and Stansfield and by Elder show. Yet, as Cumming shows, an assessment that is not by design an SP assessment can also generate SP-like behaviour and data. To take another example, should a test such as IELTS, originally designed to screen university applicants, later also be used to make immigration decisions, as happened in Australia and New Zealand? Is IELTS as it now stands an SP test? Do test-takers and users of IELTS scores suffer because of the loss of specificity over time in IELTS?

Thus we can also think about an SP test from the point of view of audience. Tests have multiple audiences, but the one that is most likely to drive design decisions is the score consumer. TOEFL is the generic, non-specific test it is because it meets the demands of a large and disparate ‘audience’ or groups of score consumers who want simple ‘information’ about the language performance of a wide range of people, without regard to their strategic competence or background knowledge domains. The current view of appropriate test use as grounded in construct or consequential validity suggests that this expectation by consumers is unwarranted; but without adequate theory it is difficult to argue this. This raises the perennial problem of how to educate the consumer, which is an issue that extends well beyond SP testing.
Educating the user is bound to be more difficult, however, when we ourselves remain unclear about the definitions and parameters of the construct we are working with. What is an SP test? As is the case with all test development, decisions about modes and levels, about text types, task types and item types must be tailored to purpose/need and audience. Performance samples must be gathered, criteria developed, raters trained and test-taker production judged. Why, then, is one test/assessment/measure to be labelled ‘SP’ while another is not? First, a more detailed and context-linked needs analysis is needed than is usual with general purpose test development. Indeed, in our view, a more detailed and context-linked needs analysis is needed than usually underpins a ‘specific purpose’ test. Further, a clearly defined set of test specifications must inform test development on all dimensions: skills, levels, task types, performance requirements, items, focus of judgements, etc. But as we have already pointed out, specificity is not enough; authenticity is also necessary. While test developers typically think about authenticity of text types, task types and instantiations of these, we would argue that the assessment criteria and the rater orientations in an SP test must also be authentic. It is not enough that the criteria should have an SP ‘face’: they also must be applied in SP ways. This requires raters who have a suitable background for making judgements relevant to the situations in which the SP language will be employed. Douglas’ article in this issue looks at criteria on LSP tests and uses Jacoby and McNamara’s (1999) notion of ‘indigenous assessment’ to argue in favour of developing criteria that may be more authentic to the knowledge domain of an SP test.

But how are we to go further in uncovering what the issues are in and for SP testing? We will need to turn to the test-takers themselves – and to other stakeholders in SP test use, such as score consumers – for greater understanding of what we do and what its impact is. Hamp-Lyons (1986) used statistical and text analyses to look at the original, more field-related, ELTS, but it was discussions with ELTS test candidates that led her to hypothesize about the interrelationships between the ‘specific’ texts and tasks in ELTS and the test-takers’ own positions as members of a disciplinary community, or as outsiders. Lumley (1998) found that ESL-trained raters could apply GP criteria in an SP test to produce similar judgements to those made by health professionals. However, as Jacoby and McNamara (1999) point out, the result was a test that did not satisfy the larger audience of health professionals who worked with those who had taken the test.

A new approach to SP assessment is taken in a current project (Hamp-Lyons et al., in progress) which aims to develop an on-line assessment of the English language competence of employees
(initially accountants) in Hong Kong. The project not only involves representatives of the test score consumers (accountants as employers) during the needs analysis and task design stages, but also intends to make administration of the assessment procedures, including task selection and assessment, the responsibility of the score consumers. It will (try to) match tasks and scores to job levels and to different aspects of a company/profession. It is thus explicitly user-oriented. It aims to combine specificity and authenticity. The issue of the separability of content and language, as illustrated numerous times in the pages of this issue, remains to be confronted.

References

Bachman, L. and Palmer, A. 1996: Language testing in practice. Oxford: Oxford University Press.
Douglas, D. 2000: Assessing languages for specific purposes. Cambridge: Cambridge University Press.
Douglas, D. and Selinker, L. 1985: Principles for language tests within the ‘discourse domains’ theory of interlanguage. Language Testing 2, 205–26.
Hamp-Lyons, L. 1986: Writing in a foreign language and rhetorical transfer: influences on evaluators’ ratings. In Meara, P., editor, British series in applied linguistics, Volume 1: Selected papers from the 1985 annual meeting. London: CILT, 72–84.
Hamp-Lyons, L., Hamilton, J., Lumley, T. and Lockwood, J. in progress: A context-led English language assessment system. Central Earmarked Research Grant, Hong Kong University Grants Council. Project due to finish end 2001.
Jacoby, S. and McNamara, T. 1999: Locating competence. English for Specific Purposes 18, 213–41.
Lumley, T. 1998: Perceptions of language-trained raters and occupational experts in a test of occupational English language proficiency. English for Specific Purposes 17, 347–67.
Munby, J. 1978: Communicative syllabus design. Cambridge: Cambridge University Press.