Designing rating scales for small-group interaction
Roger Nunn
ELT Journal Volume 54/2 April 2000 © Oxford University Press 2000
Downloaded from http://eltj.oxfordjournals.org/ at Petroleum Institute (PI) on August 20, 2013
Classroom activities in small groups provide opportunities for practising important interaction skills such as distributing and competing for opportunities to speak, holding the floor, adjusting to the contributions of other speakers, and negotiating real understanding when exchanging information, opinions, feelings, and attitudes. A rating scale is proposed here as a practical means of addressing the difficult task of assessing both the level of a particular communicative performance in a small group and the general ability to perform in small-group conversations over time. This paper will argue that theoretical difficulties of designing and using rating scales for this purpose, while requiring serious consideration, are outweighed by practical advantages. Rating scales not only report test performances. They can also guide the teaching process, defining the principles for the construction of both assessment and classroom tasks and providing teachers (and students) with achievable goals which they themselves have formulated in writing.
Designing rating scales for assessing small-group performance illustrates the difficulty, common to many attempts at principled EFL activity, of satisfying the conflicting demands of both theory and daily practice. While small-group interaction is commonly used in language lessons which focus on developing interactive ability, the assessment of the ability to interact in a small group is seen as more problematic, and is often avoided. Yet there are compelling reasons to assess what we teach, not the least of which may be that students in institutional settings take activities which are assessed more seriously. Rating scales are proposed here as a means of providing the necessary counterbalance between the teaching and the assessment of small-group interaction.
Setting
Naturally, the general aims and the specific objectives of language courses need to be sensitive to the educational setting. The rating scales described in this paper were designed in response to a practical need at Kochi National University in Japan, where a course in English Conversation has recently become compulsory for all first-year students. Observation of English classes in local secondary schools, and initial assessment by university instructors, indicate that students have had little experience in participating in conversations. Secondary school classrooms are not always suitable settings for activities which devolve responsibility for managing the interaction to the students. In Japan, for example, small-group interaction is not proposed as a viable activity for a school system which has its own priorities, one of which is to prepare students for written university entrance exams. But the development of interactive skills is a more appropriate general aim for first-year university conversation classes. This is partly because students who have learnt well in school need to activate language already learnt, but also because students who have not learnt well in six years of school English classes have little to lose by trying a different approach. In classes of perhaps 40 or 45 students, it also seems logical to use small-group activities as one means of providing the necessary practice in spoken interaction. With a large team of (mainly part-time) teachers, but no common examination, rating scales help to provide common criteria for both teaching and assessing small-group interaction.
Why teach small-group interaction?
Conversational analysts have long placed great emphasis on the fact that conversation is a participant-managed system. Sacks et al. (1974: 234) demonstrate that to participate in interaction, active attention to the obtaining and distribution of turns to speak is required. However, students are often extremely reluctant either to self-select or to take the responsibility for nominating the next speaker ('current speaker select next') in a formal classroom context. Recent analysis of the structure of spoken discourse (Coulthard 1992, Tsui 1994) also emphasizes the 'here-and-now' process that participants are engaged in as they continually adjust to the other participants' contributions. Negotiating real understanding by adjusting to the contributions of other speakers (Bygate 1987: 22-41) is not only a necessary skill in terms of interaction in a foreign language; it is also worthwhile in terms of promoting truly intercultural exchange. While pairs are frequently preferred for teaching and particularly for assessing interactive ability, there are many situations in which spoken interaction is not restricted to face-to-face interaction between only two speakers.
Because it is more difficult to participate in small-group interaction in a foreign language, extensive practice is required. Students must coordinate not only the linguistic means of expressing the information or message they wish to communicate, but also the skills of participation, which often require pinpoint timing and precision. They must obtain turns, maintain contributions, defend their floor rights, nominate other speakers, adjust to the contributions of other participants, improvise, negotiate meaning, repair misunderstandings, defend opinions, and support other participants.
Teachers of small-group interaction also need to be aware that there are other kinds of interaction that need practice, and that small-group conversations may, on occasion, leave even able students excluded. Students can be made aware throughout the instruction period that the ability to dominate a group conversation is not the same as the ability to participate in a successful conversation. While equal access to participation is provided, this does not mean that participants should be evaluated in terms of the quantity of participation, which will always be variable within a group. On their side, candidates in examinations have to be aware that they must participate to an extent that allows them to be evaluated, which in practice means anything above about 20 per cent participation. Having three candidates in each group was considered to be a good number for assessing a group conversation in this project, as more than that number increases the possibility of some students dominating or being dominated.
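The 20 per cent guideline above can be made concrete with a small worked sketch. The paper does not say how participation was measured, so the sketch assumes, purely for illustration, that each candidate's talk time in seconds has been noted; the function names, the candidate labels, and the treatment of 'about 20 per cent' as a fixed cut-off are all hypothetical.

```python
# Illustrative sketch only: talk time in seconds is an assumed measure of
# participation; the paper does not specify how participation was counted.
MIN_SHARE = 0.20  # "about 20 per cent" in the text, treated here as a fixed cut-off


def participation_shares(talk_seconds):
    """Return each candidate's share of the group's total talk time."""
    total = sum(talk_seconds.values())
    if total <= 0:
        raise ValueError("no talk time recorded for this group")
    return {name: seconds / total for name, seconds in talk_seconds.items()}


def below_assessable_share(talk_seconds, min_share=MIN_SHARE):
    """List candidates whose share is not above the minimum needed for rating."""
    shares = participation_shares(talk_seconds)
    return sorted(name for name, share in shares.items() if share <= min_share)


# A 15-minute (900-second) conversation between three candidates.
group = {"Candidate A": 400, "Candidate B": 350, "Candidate C": 150}
print(below_assessable_share(group))  # ['Candidate C']
```

A share below the cut-off says only that there was too little speech to rate; consistent with the point made above, it is not itself a score for the quantity of participation.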
Why use rating scales?
A rating scale is a practical means of assessing the level of a particular communicative performance by using a number of descriptive bands for a particular skill, on a scale of competence ranging from excellence to failure. The main general advantage of designing rating scales for any course is the harmony that can be achieved between the potentially discordant and conflicting perspectives of teachers, learners, and assessors.
Providing students and teachers with achievable goals
The general aim relevant to this discussion is for students to develop the ability to maintain a simulated conversation in a small group. Few published rating scales are available for this aim, and it is difficult for course books to provide suitable activities if each group member is to possess information not available to the others. Designing and wording rating scales is a useful activity in itself, because the very process of writing the banded descriptions of attainment in specific skills requires detailed attention not only to assessment, but also to teaching and learning objectives. A rating scale provides the student with a realistic goal by describing the performance just above her or his present level. Rating scales also focus attention on what both individual students and groups of students are good at, and what needs more attention. While there is disagreement about the significance of the 'washback effect' of tests, it does not seem controversial to suppose that in graded courses in institutional learning, what is seen to be tested is more likely to be taught and learnt. Descriptions of desired performances link the students' natural ability to pass exams to the need to develop real language skills.
Alderson (1991: 71-85) discusses the reasons for using rating scales in some detail, but only a short summary will be provided here. Firstly, rating scales provide an easily understandable report (op. cit.: 72) for candidates, administrators, course designers, and teachers on the level of performance of individuals or groups, at the same time as providing descriptions of what candidates can do. They can report on 'typical or likely behaviours of candidates at any given level' or on the proportions of candidates at each level. Secondly, rating scales can guide the rating process (op. cit.: 73), standardizing the criteria for an individual rater or acting as 'a common standard for different raters'. Finally, they also help to guide the construction of tasks (op. cit.: 74) which allow students to display the described behaviours at their own level.
Designing the rating scales
The wording of the rating scales is necessarily a compromise between theoretical and practical considerations in the specific institutional context in which they will be used. Feedback from a team of teachers during piloting indicated that they felt comfortable handling up to four scales when assessing a group of three students during a 15-minute conversation. The choice of four scales meant that bands had to be designed using a relatively global definition of skills objectives covering the areas considered to constitute interactive competence in small-group conversations.
Interactive skills
The first two scales cover (1) the interactive ability to participate in keeping a conversation going in terms of turn-taking and negotiation, and (2) the ability to bring about a genuine exchange of information and express feelings, opinions, and attitudes in relation to the information. For each rating scale, four bands were used, respecting the need for a small number of bands in terms of reliability (Underhill 1987: 100). Each of the following four scales is a combination of two other scales which are presented below it. For example, scale 1 below is a combination of scales 1a and 1b. Only the combined scale is used for assessment, but teachers are aware of the scales they are derived from.
1 Keeping a conversation going: turn-taking and negotiation
  1 Has (almost) no ability to keep a conversation going. Without constant help, the conversation is always likely to break down.
  2 Rarely self-selects, but responds minimally to other speakers, and sometimes supports their contributions. Negotiates rarely and/or only with a very limited repertoire. Communication sometimes breaks down without support.
  3 Responds fully when nominated, supports other speakers, and sometimes self-selects. Has an adequate repertoire for negotiation. Communication almost never breaks down.
  4 Is able to take initiatives, self-selecting and negotiating whenever necessary, drawing on a wide repertoire of expressions and techniques. Helps other participants to join in and interrupts politely when appropriate.
1a Keeping a conversation going: turn-taking
  1 Has (almost) no ability to exploit turn-taking to keep a conversation going. Without constant help, the conversation is always likely to break down.
  2 Rarely self-selects, but responds minimally to other speakers, and sometimes supports their contributions. Only rarely nominates other speakers, even when he/she has the floor. Communication sometimes breaks down without support.
  3 Responds fully when nominated, supports other speakers, and sometimes self-selects. Communication almost never breaks down.
  4 Is able to take initiatives, self-selecting, holding the floor, interrupting or nominating as the conversation demands. Helps other participants to join in.
1b Making communication effective: negotiation
  1 Has (almost) no ability to negotiate effectively. Without constant help, communication of even basic information is unlikely to be successful.
  2 Sometimes adjusts to the contributions of other speakers, but rarely negotiates, and then only with a very limited repertoire, which limits the effectiveness of the communication.
  3 Is able to negotiate when necessary, adjusting to the contributions of other speakers and demonstrating an adequate repertoire for negotiation. Communication is normally effective and successful.
  4 Is able to adjust fully to other speakers' contributions, taking initiatives and negotiating persistently whenever necessary, drawing on a wide repertoire of expressions and techniques. Takes a full share of the responsibility for successful communication.
2 Content of contributions: exchanging information, ideas, and feelings
  1 Has almost no ability to communicate even basic information such as age, price, etc.
  2 Can only communicate the most basic information, and cannot really express ideas or feelings on anything but the most basic everyday topics.
  3 Can communicate information on a reasonable range of topics, and can express opinions, feelings, and ideas to a certain degree on a more limited range of topics.
  4 Has a sound ability to communicate information, and to express feelings, opinions, and ideas on a variety of topics.
2a Content of contributions: exchanging information
  1 Has almost no ability to communicate even basic information, such as age, price, etc.
  2 Can only exchange the most basic information on everyday topics.
  3 Can exchange information adequately on a reasonable range of topics.
  4 Has a sound ability to exchange information on a wide variety of topics.
2b Content of contributions: expression of opinions, ideas, and feelings
  1 Has (almost) no ability to communicate even basic opinions, ideas, or feelings.
  2 Can only express opinions, ideas, or feelings in a fairly limited manner on everyday topics.
  3 Can express opinions, feelings, and ideas adequately on common topics.
  4 Has a sound ability to express feelings, opinions, and ideas on a variety of topics.
Intelligibility
The other two scales were designed to emphasize the use of micro-linguistic skills, reflecting the common concern of conversation teachers trying to strike a balance between developing confidence on the one hand and micro-linguistic ability on the other. However, when the course objectives lean more towards the confidence-building skills related to keeping a conversation going, pronunciation (both individual sounds and intonation), grammar, and vocabulary are best assessed in terms of intelligibility in a conversational context, and not as ends in themselves.
3 Pronunciation of individual sounds and intonation (note that loudness is not considered here)
  1 The speaker is almost impossible to understand.
  2 Inadequate use of intonation and/or poor pronunciation of individual sounds make(s) the speaker very difficult to follow without compensation.
  3 Adequate use of intonation and pronunciation of individual sounds; some attempt is made to make important syllables prominent. The message is intelligible, although there are occasional lapses.
  4 Good use of intonation and accurate pronunciation of individual sounds makes the speaker easy to follow. Intelligibility is almost never impeded by wrong sounds, insufficient or misplaced prominence.
3a Intonation
  1 Flat intonation makes the speaker almost impossible to understand. There is (almost) no attempt to make key words or tonic syllables prominent.
  2 Inadequate use of intonation makes the speaker very difficult to follow without compensation. There is little effort to make important words or syllables stand out.
  3 Adequate use of intonation, making the intelligibility of the message fairly high, although there are occasional lapses. Intelligibility is sometimes impeded by making the wrong syllables prominent.
  4 Good use of intonation makes the speaker easy to follow. Intelligibility is almost never impeded by insufficient or misplaced prominence.
3b Pronunciation of individual sounds
  1 The speaker is almost impossible to understand.
  2 Poor pronunciation of individual sounds makes the speaker very difficult to follow without compensation.
  3 A reasonable pronunciation of individual sounds; the message is intelligible, although there are occasional lapses.
  4 Accurate pronunciation of individual sounds helps to make the speaker easy to follow. Intelligibility is almost never impeded by mispronounced sounds.
4 Grammar and vocabulary
  1 Poor structure and/or inadequate and inappropriate use of vocabulary make it (almost) impossible to understand.
  2 Barely adequate use of structure, and limited vocabulary, make it difficult to follow without compensation.
  3 Vocabulary and structure are normally adequate for the task. Fairly frequent errors don't seriously impede comprehension.
  4 Good use of structure and vocabulary make the speaker easy to understand. Only a few errors which don't impede comprehension at all.
4a Intelligibility: grammar
  1 Poor structure makes it (almost) impossible to understand.
  2 Barely adequate use of structure makes it difficult to follow without compensation.
  3 Structure is normally adequate for the task. Errors may still be fairly frequent but they don't seriously impede comprehension.
  4 Good use of structure makes the speaker easy to understand. Only a few errors, which hardly impede comprehension at all.
4b Intelligibility: vocabulary
  1 Inadequate and inappropriate use of vocabulary make it (almost) impossible to understand.
  2 Limited and inappropriate use of vocabulary make it difficult to follow without compensation.
  3 Vocabulary is normally adequate for the task. Some inappropriate usage, but it doesn't seriously impede comprehension.
  4 Normally appropriate use of a wide repertoire of vocabulary. Only a little inappropriate usage, which hardly impedes communication at all.
Problems of using rating scales for small-group interaction
Weir (1990: 79) points to several disadvantages of using interactive tasks for assessment. Some of them are common sense, and some theoretical. The common-sense points relate to the possibility of one participant dominating the interaction. This may be due to a large difference in proficiency or greater interest in or knowledge of a particular topic. Weir also refers to practical constraints, such as the time needed, test security, and administration, etc.
Oral assessment sheet
Name of student:
Ability to keep a conversation going: 1 2 3 4
Content of contributions: information, ideas, feelings expressed: 1 2 3 4
Intelligibility: pronunciation: 1 2 3 4
Intelligibility: grammar/vocabulary: 1 2 3 4
Any other comments:
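For teams who keep their records electronically, the sheet above could be mirrored by a simple record type. What follows is a minimal sketch and not part of the original project: the class and field names are hypothetical, and the only rule taken from the paper is that each of the four combined scales is rated on bands 1 to 4. The bands are kept as a profile rather than summed, since the paper does not describe any total score.

```python
from dataclasses import dataclass

BANDS = (1, 2, 3, 4)  # four bands per scale, as on the assessment sheet


@dataclass
class OralAssessment:
    """One completed assessment sheet (hypothetical field names)."""
    student: str
    keeping_conversation_going: int   # scale 1
    content_of_contributions: int     # scale 2
    pronunciation: int                # scale 3
    grammar_vocabulary: int           # scale 4
    comments: str = ""

    def __post_init__(self):
        # Reject any rating that is not one of the four described bands.
        for scale, band in self.profile().items():
            if band not in BANDS:
                raise ValueError(f"{scale}: band must be 1-4, got {band!r}")

    def profile(self):
        """Return the four band ratings keyed by scale name."""
        return {
            "keeping_conversation_going": self.keeping_conversation_going,
            "content_of_contributions": self.content_of_contributions,
            "pronunciation": self.pronunciation,
            "grammar_vocabulary": self.grammar_vocabulary,
        }


sheet = OralAssessment("Student A", 3, 2, 3, 2, comments="Rarely nominates others.")
print(sheet.profile())
```

Storing one such record per test occasion would make it straightforward to compare a student's profile over time, or to set a teacher's ratings beside the student's self-assessment.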
Practical difficulties of test administration
The assessment of small-group interaction requires considerable practice and training. Initially assessors find that there are too many features to concentrate on. This problem is alleviated by providing an assessment sheet with a brief summary of each band on the rating scales.
Assessors have to become very familiar with the more general definitions, since they will not have time to refer to them during the assessment. It was also apparent that they should not give too much importance to early contributions to the conversation, nor be unduly influenced by isolated instances of very positive or very negative contributions. Their aim should be to assess stable performance over time. A further difficulty is created by the borderline cases, when a performance appears to reflect some features assigned to one band description and others assigned in equal proportions to the band below or above. These difficulties tend to lead to the conclusion that tests of interactive ability should never be the only means of assessing a student's ability to speak a language. When the teacher is the assessor, training, practice, and his or her knowledge of his or her own students will partly resolve some of these difficulties. However, the teacher's familiarity with his or her students might create a reliability problem in the later assessments, which is why it is important for teachers to work together in teams whenever possible.
Theoretical difficulties
Weir (1990: 79) suggests that an important theoretical problem of using interactive tasks is the 'extrapolations' made from one test performance to a candidate's ability to perform in other situations. Pollitt (1991: 87-94) summarizes this essential theoretical difficulty of tests using band-scales in one question:
Can you generalise from a student's performance on one task to predict performance on another? (90)
This fundamental question relates to the long-standing debate between testing simulations of 'real' performance and testing 'underlying' competence. The genuine concern about generalizing from (a single) test performance raises a very serious question about the use of band-scales in tests in which an unknown candidate is assessed on only one occasion. When interactive tasks are used for course assessment, it is essential to test students at regular intervals. The extrapolation issue can also be addressed by developing alternative methods of assessment, such as continuous assessment throughout the course. Students can also become involved in their own assessment. If the scales are translated into the students' native language, they can be asked to assess themselves at the beginning of the course and after each test performance, and be given the opportunity to compare their assessment to the teacher's. In this way, students can also set themselves targets for future assessments.
Problems of task implementation
An attempt to set up realistic conditions of performance is an important feature of construct validity, but we should never forget that a task only simulates real-life communication. The problem of extrapolation is hence inextricably linked to task design and implementation, as task type and task familiarity always have an influence on performance which cannot be measured. Task familiarity is potentially the biggest concern in assessing performance, as it is difficult to decide how familiar students should be allowed to become with tasks. A totally unfamiliar task type is unlikely to produce typical performance, but high familiarity may easily help exam-wise students to perform above the level they can be expected to achieve in real life. Practice in the classroom must in any case never involve rehearsal of assessed task activities. For this project, no more than basic familiarity with the actual assessment task types was allowed. Students had frequent practice in the skills of small-group interaction, using a very wide variety of tasks, but were not over-trained in the actual assessment tasks: only one practice session in the lesson preceding each assessment used a similar task format. Assessment tasks always involved a three-way exchange of information, followed by a discussion using this information to try to come to some kind of agreement or decision in the group.
Conclusions
This paper has only considered in detail the design of rating scales for small-group interaction. However, the considerable difficulties of reliability and validation need to be fully understood when the final results are interpreted, the safest course being to use parallel forms of continuous assessment which allow for the individual differences and inbuilt inconsistencies which are part of any subjective process. The temptation of making facile extrapolations about how well students can perform in real-life conversations should also be avoided.
The unequivocal publication of objectives, and the practical value of tests being seen to test what is taught in a direct way, are two clear advantages of using rating scales, the same criteria being used for teaching, learning, and assessment. Rating scales provide a basis for course evaluation which goes to the heart of the matter. What was the students' starting level? How much did they improve in the skills which were targeted for improvement? Yet possibly the strongest argument in favour of spending so much time on improving the subjective assessment of actual performance is simply that it is done anyway by the majority of language teachers. The question, therefore, is not whether to do it, but how to do it as fairly and efficiently as possible.
Received March 1999
References
Alderson, C. 1991. 'Bands and Scores' in Alderson and North 1991.
Alderson, C. and B. North (eds.). 1991. Language Testing in the 1990s. London: Macmillan.
Bygate, M. 1987. Speaking. Oxford: Oxford University Press.
Coulthard, M. (ed.). 1992. Advances in Spoken Discourse Analysis. London: Routledge.
Pollitt, A. 1991. 'Reaction to C. Alderson's "Bands and Scores"' in Alderson and North 1991.
Sacks, H., E. Schegloff, and G. Jefferson. 1974. 'A simplest systematics for the organization of turn-taking for conversation' in Language 50/4: 696-735.
Tsui, A. 1994. English Conversation. Oxford: Oxford University Press.
Underhill, N. 1987. Testing Spoken Language. Cambridge: Cambridge University Press.
Weir, C. 1990. Communicative Language Testing. Hemel Hempstead: Prentice Hall.
The author
Roger Nunn started his career in language teaching in England in 1976. Since 1979 he has worked in TEFL overseas in Germany, Ethiopia, Qatar, and Japan. He is presently an Associate Professor in the Department of International Studies at the University of Kochi in Southern Japan. He has a Licentiate Diploma in TEFL (Trinity College, London) and an MA and PhD in TEFL from the Centre for Applied Language Studies at the University of Reading. His current academic interests include intercultural communication, spoken discourse analysis, classroom interaction, the international media, and the novels of Jane Austen. Email