Assessment & Evaluation in Higher Education Vol. 31, No. 3, June 2006, pp. 287–301

Assertion-reason multiple-choice testing as a tool for deep learning: a qualitative analysis

Jeremy B. Williams*
Universitas 21 Global, Singapore

This paper reflects on the ongoing debate surrounding the usefulness (or otherwise) of multiple-choice questions (MCQ) as an assessment instrument. The context is a graduate school of business in Australia where an experiment was conducted to investigate the use of assertion-reason questions (ARQ), a sophisticated form of MCQ that aims to encourage higher-order thinking on the part of the student. It builds on the work of Connelly (2004), which produced a quantitative analysis of the use of ARQ testing in two economics course units in a flexibly-delivered Master of Business Administration (MBA) program. Connelly's main findings were that ARQ tests were good substitutes for the more conventional type of multiple-choice/short-answer questions and, perhaps more significantly, that ARQ test performance was a good predictor of student performance in essays—the assessment instrument most widely favoured as an indicator of deeper learning. The main focus of this paper is the validity of the second of these findings. Analysis of questionnaire data casts some doubt over whether student performance in ARQ tests can, indeed, be looked upon as a sound indicator of deeper learning; student reactions and opinions suggest instead that performance might have more to do with one's proficiency in the English language.

Introduction

Since the time they were first proposed by Arthur Otis, and used on a large scale by the US Army to measure the abilities of new recruits around the time of the First World War (Caruano, 1999), the efficacy of multiple-choice type questions (MCQs) as an assessment tool has attracted considerable debate. As Berk (1998) quips, the MCQ format 'holds world records in the categories of most popular, most unpopular, most used, most misused, most loved, and most hated'. Indeed, the literature that has been generated on the subject is voluminous, the bulk of it appearing during the 1990s.

*Universitas 21 Global, 5 Shenton Way, #01-01 UIC Building, Singapore 068808. Email: [email protected]

Following in the footsteps of the United States, many countries around the world have embraced MCQ testing as the foundation of their testing systems. The main advantage of MCQ testing is its versatility. There are significant cost savings—particularly where large numbers are involved—and it is a format that can provide precision where other measurement options may be lacking (e.g. observing performance or interviewing). Criticisms of MCQs, on the other hand, tend to centre upon unreliability due to random effects (e.g. Burton, 2001), the inequity of the format in terms of its bias towards certain socio-economic or ethnic groups (e.g. De Vita, 2002), and also the depth of learning the format engenders (or lack thereof) (e.g. Leamnson, 1999, p. 111).

This paper will reflect on this ongoing debate in the context of a graduate school of business in Australia where an experiment was conducted in the use of assertion-reason questions (ARQ), a sophisticated form of MCQ that aims to encourage higher-order thinking on the part of the student. The paper will first provide a brief insight into the background to the project, before reporting on the results of evaluations undertaken to date. In particular, it extends the quantitative study of one of the collaborators on the project (Connelly, 2004), who found ARQ test performance to be a good predictor of student performance in essay work which, appropriately structured, is the assessment instrument most widely favoured as an indicator of deeper learning (Brown et al., 1997; Haladyna, 1999). The findings of this paper do not contradict the conclusion drawn by Connelly, but suggest that student performance in ARQ tests may have as much to do with their linguistic skills and the time taken to process complex prose as with their conceptual understanding and problem-solving ability.

The paper concludes that an ARQ format, although not widely used, constitutes a useful assessment tool and one that appears to be superior to the traditional MCQ format in terms of student learning outcomes. Importantly, though, to ensure equitable treatment of student groups, the questions need to be tested and carefully edited, and might be better utilised for formative purposes, as an online, self-paced learning device. This conclusion is equally applicable to other disciplines and is not exclusive to economics and other business subjects.

The context

The Brisbane Graduate School of Business (BGSB) is one of six schools in the Faculty of Business at Queensland University of Technology (QUT), and was formed in 1995 to administer the MBA—a full-fee-paying program. Commencing in 1999, an innovative new MBA course structure was introduced offering prospective students greater flexibility and choice, through seven-week, half-semester-long course units. Since this time, student numbers have trebled at the same time as course fees have more than doubled, and entry standards have been raised. The BGSB currently has around 1,000 students in the MBA and associated programs. Around three quarters of these students are enrolled for part-time study. These students are almost exclusively Australian residents. The majority of full-time students are international in origin, recruited from 35 different countries, nearly all of whom speak English as

their second language. The average age of BGSB students is 33 years, and male students outnumber females by a ratio of 3:2.

Funded entirely from student fees, the BGSB, like other institutions in the same position, is very sensitive to market perceptions of its services. One strategy actively pursued by the School has been to gain an international reputation for the flexible delivery of its programs. Flexible delivery is, by definition, a client-oriented approach because it is a commitment, on the part of the education provider, to tailor courses to meet the various individual needs of its students. Furthermore, it is tacit recognition of the fact that the student profile has changed quite dramatically—socially, culturally, economically—and that, pedagogically, there is a need to cater for this increasingly diverse student body.

The essence of flexible delivery is that it provides students with a number of different options for study. It is not prescriptive in the sense that one approach to study is identified as being superior to another. A student can chart a route through a degree that is most compatible with their social, family and working lives, and their preferred learning style. In short, flexible delivery is non-discriminatory, catering equally for an international student, a single parent working part-time, or a business executive travelling regularly overseas and interstate.

At the heart of this strategy of flexible delivery has been the development of online learning and teaching (OLT) sites. The framework for OLT sites varies from course unit to course unit but, typically, there is a download facility where students can access PowerPoint lecture slides, solutions to problems, past examination papers and the like, discussion forums, chat rooms, and discipline-relevant web links. Until recently, however, little attention has been devoted to assessment, and how this might be integrated with the OLT system.

The project

Commencing in 1999, funding was secured to investigate the use of computer-assisted learning in the form of optional weekly, timed, MCQ tests, which could be accessed remotely, or on campus. These traditional MCQ tests were being conducted in class at this time as part of formal assessment, and invigilated in the standard way. By putting them online, the idea was to enhance flexibility by providing opportunity for students unable to attend class to complete the tests, while at the same time freeing up class time for interaction and discussion.

By September 2000, two economics course units within the MBA (GSN411: Economics of Strategy, and GSN414: Business Conditions Analysis) were trialling online MCQ tests, albeit with modified objectives. The tests, accessible via the OLT system, are marked automatically, providing students with instant feedback on their progress. Early in the trial, a student received a mark for participation (5% for the completion of five tests, as long as they scored 70% or above in each test) rather than a mark for performance. Even this low weighting was subsequently removed. The decision to go along this path arose largely because, despite the best efforts of the project team, no solution could be found to the

problem of invigilation. Quite simply, a test involving the use of 'point-and-click' radio buttons was an open invitation for students to cheat if they were unsupervised. Thus, in the absence of any cheap and readily accessible devices for online test supervision, the project team elected to use the test banks they had developed primarily for formative assessment purposes where they would be used online. However, where class tests would continue to be held, ARQs would be used in preference to the traditional MCQ format.

The questions

Although the MCQ format has been criticised almost since the time of its inception, it perhaps met with its most formidable challenge during the 1990s as an increasing number of educationalists, guided by the constructivist theories proffered (most notably) by Marton & Säljö (1976a, b), Entwistle (1981), Biggs (1987, 1993), and Ramsden (1992), argued for teaching and assessment methods that encouraged higher-order thinking skills. As Steffe and Gale (1995) point out, while constructivism offers no unitary theoretical position, whichever strand of constructivism one adheres to, most constructivists would agree that, essentially, learners arrive at meaning by actively selecting and constructing their own knowledge through experience (both individual and social).

A key criticism of MCQ has been that real world tasks are not usually multiple-choice tasks, and passing an MCQ test is not equivalent to mastering a real world skill. In short, MCQs suffer from a lack of authenticity (Wiggins, 1990). Ideally, say the critics, instructors ought to be asking questions that go beyond the mere memorisation of facts, encouraging students to apply, analyse and synthesise their knowledge. Hakel (1998), for example, makes the point that recognising a correct response from a list does not demonstrate that students can construct that response themselves, the narrowness and appearance of precision in MCQs inhibiting other information relevant to decision-making.

In response to the constructivists, there has been a steady stream of research work from the statistics community (see, for example, Case & Swanson, 1996 and Haladyna, 1999) that has tested a variety of MCQ structures for measuring complex cognitive outcomes. A strong case for the continued use of MCQs is also advanced by those advocating computer-assisted learning. Bracey (1998), a known critic of MCQs, opines that when teaching is lecturing and testing is multiple choice, one can never know for sure whether students have really understood what was being taught. However, he does concede that, at graduate school level, stems can be more complex and questions more subtly worded such that understanding is demonstrated. He also acknowledges the great promise of technology. The uptake of computer-assisted assessment has, indeed, been gathering pace and, if the number and quality of international conferences dedicated to the subject is anything to go by, there is every chance that, if implemented with pedagogical (as well as technical) awareness, it will serve the educational sector well. This was certainly the feeling of the project team when it introduced the online ARQ tests.

There is nothing novel about the ARQ format. Heywood (1999) observes that ARQs first appeared in UK 'A-level' secondary school examinations in the 1960s, although it would seem the format was used even earlier than this in US medical exams (see Moore, 1954, in Hubbard & Clemans, 1961). It is surprising, therefore, that the academic literature on the subject is quite sparse. Connelly (2004) provides an overview of the existing published material (see Newble et al., 1979; Skakun et al., 1979; Fox, 1983; and Scouller & Prosser, 1994) before going on to extol the main virtue of the ARQ test item; viz. that 'its structure facilitates the construction of questions that test student learning beyond recall. In particular, higher level thinking and application of key concepts may sometimes be more easily constructed using this format, than by using a conventional multiple-choice approach alone' (Connelly, 2004, p. 362).

A key concern of the project team, mindful of the criticisms made of MCQs, was to develop question sets that would test reasoning (procedural knowledge) rather than recall (declarative knowledge). In terms of Bloom's taxonomy (Bloom, 1956), the goal was to focus on the highest levels of learning within the cognitive domain: analysis, synthesis and evaluation (see Figure 1). Carneson et al. (n.d.), in their application of Bloom's taxonomy to different types of MCQs, identify ARQs as belonging to the very highest level in the cognitive hierarchy because they contain elements of all the other categories, and the fact that 'one is asked to pass judgement on, for example, the logical consistency of written material, the validity of experimental procedures or interpretation of data' (Carneson et al., n.d., Appendix C). An example question is illustrated in Figure 2.

Figure 1. Bloom's taxonomy

Figure 2. Example ARQ question

Assertion: In a small open economy, if the prevailing world price of a good is lower than the domestic price, the quantity supplied by the domestic producer will be greater than the domestic quantity demanded, increasing domestic producer surplus.

BECAUSE

Reason: In a small, open economy, any surplus in the domestic market will be absorbed by the rest of the world. This increases domestic consumer surplus.

(a) True; True; Correct reason
(b) True; True; Incorrect reason
(c) True; False
(d) False; True
(e) False; False

(The correct answer is (d).)

Like traditional MCQs, ARQs present students with a number of possible solutions. In contrast to traditional MCQs, however, ARQs also include a true/false element (CAA Centre, 2000). Specifically, each item consists of two statements, an assertion and a reason, that are linked by the word 'because'. The student then selects from a multiple-choice legend after proceeding through a number of steps. First, he or she must determine whether the 'assertion' is true or false, and then whether the 'reason' is true or false. If one, or both, of the statements is deemed false, then the answer will be alternative (c), (d), or (e) accordingly. If, on the other hand, both statements are deemed true, a third step is required whereby the respondent must determine whether the second statement provides an accurate explanation for the first.

Traditional MCQs usually test only one issue or concept. ARQs, on the other hand, test two per question (the assertion and the reason statements) plus the validity of the 'because' link in the event that the assertion and reason are both true. On the basis that judging the correctness of two statements must be harder than judging the correctness of one, it would follow that ARQs present more of an intellectual challenge than traditional MCQs. One might put forward the case that because options (a) and (b) require a third step of reasoning (only two steps being required if the learner correctly identifies a false statement), questions with correct answers of (c), (d) or (e) may be less effective in terms of learning outcomes. That is, assuming all answers (a)–(e) occur in roughly equal proportions in an exam, the 'because' statement would only be tested in 40% of the questions, the depth of learning being relatively less the remaining 60% of the time. However, this is hardly a reason for not using ARQs, as a two-step question is still preferable to a single-step traditional MCQ.
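To make the marking logic concrete, the following minimal sketch (not taken from the paper; the function and parameter names are illustrative assumptions) maps the three judgements described above onto the five-option legend, and checks the worked example from Figure 2.

```python
def arq_answer(assertion_true, reason_true, reason_explains_assertion=False):
    """Return the ARQ option (a)-(e) for one pattern of judgements.

    The third judgement is only consulted when both statements are true,
    mirroring the 'third step' described in the text.
    """
    if assertion_true and reason_true:
        return "a" if reason_explains_assertion else "b"
    if assertion_true and not reason_true:
        return "c"
    if not assertion_true and reason_true:
        return "d"
    return "e"


# Figure 2: a world price below the domestic price makes the country a net
# importer, so the assertion is false; the reason is judged true on its own.
# The published key is therefore (d).
assert arq_answer(assertion_true=False, reason_true=True) == "d"
```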


If one wanted to reward ARQ test candidates in accordance with their depth of learning per question, then one possibility would be to assign a proportionately higher weighting to questions with solutions of (a) or (b). The project team elected not to proceed along these lines, preferring, instead, to view the learning experience of ARQ tests in their totality. Presented with ten such questions per test, the students involved in the trial were able to take each test as many times as they wished without penalty. Initially, to be eligible for the 5% credit allowed for completion of these tests, students were required to score at least 70% on each of the tests. To encourage students to persist until they got all the questions right, question feedback was generated after each attempt at the test, without explicitly presenting the student with the correct solution.

The results

As Connelly (2004, p. 363) observes, the writing of ARQ test items for the two course units trialling the assessment instrument was not particularly difficult, but it took some time for the students to become accustomed to the format. Where the questions were used offline in class, more time was required by the students to compute the solutions than had been the case with traditional MCQs, and average test scores were significantly lower. These outcomes notwithstanding, student evaluation of the ARQ format has been generally encouraging. Tables 1 to 5 present the results of an online questionnaire administered during March 2001 that was open to all students who had been enrolled in the two economics course units over the previous six months. There were 69 respondents, which equates to around 15% of the total number of students enrolled in these course units during this time. (Note: some of these students took invigilated ARQ tests in class as well as online.)

Table 1 clearly illustrates that students found this form of MCQ considerably more intellectually challenging than traditional MCQs. Importantly, 64% of students were of the view that the learning outcomes associated with ARQ tests were superior to those associated with traditional MCQs, only 16% adjudging them to be inferior (Table 2). Examples of detailed student comments support this aggregate picture:

Table 1. Level of intellectual challenge presented by ARQ

Question 1: In terms of the intellectual challenge it presented, how did you find the assertion-reason format?

A. It was very challenging. (Score 30, 44%)
B. It was moderately challenging. (Score 34, 49%)
C. The challenge it presented was no different to any other type of multiple-choice testing I have encountered. (Score 5, 7%)
D. It was moderately easy. (Score 0, 0%)
E. It was childishly easy. (Score 0, 0%)

Table 2. Learning outcomes produced by ARQ

Question 2: In terms of learning outcomes, how did you find the assertion-reason format compared to the more traditional multiple-choice format?

A. The assertion-reason format produced far superior outcomes. (Score 15, 22%)
B. The assertion-reason format produced moderately superior outcomes. (Score 29, 42%)
C. The learning outcomes were more or less the same. (Score 14, 20%)
D. The assertion-reason format produced moderately inferior outcomes. (Score 8, 12%)
E. The assertion-reason format produced significantly inferior outcomes. (Score 3, 4%)

It forced you to learn as you progressed through the unit. It is a very good idea. [serial no. 69]

I liked it because it made you think rather than just match the best one. [serial no. 72]

I thought it was a good tool to help study content with. [serial no. 81]

To make the best use of these quizzes it is necessary to have at least read some of the material being tested. This is a useful mechanism for slow starters. [serial no. 26]

Mimics real world decision making. Useful for understanding concepts. Anything to get away from rote learning testing, which is non-productive. [serial no. 129]

They [the assertion-reason format] are a much more challenging format from the conventional multiple-choice questions [sic]. Hence seem more suited to a masters level. [serial no. 34]

The frequency of the online tests (one per week) also received the resounding support of the students (Table 3), and while 45% of students felt the 5% participation mark (awarded earlier in the trial) was about right, a total of 42% felt a larger weight should be attached (Table 4). This sentiment is probably a reflection of the amount of time and effort required on the part of the student to complete the tests. However, with a pass mark of 70% and the student able to have as many attempts as they like with no invigilation, the project team was loath to give a higher weighting. (Indeed, for reasons explained earlier, the 5% participation mark has since been removed altogether.)

The results from Table 5 also give the project team some cause for optimism, with 56% of respondents stating that they think the instrument could be used on other course units, and a further 25% saying that ARQ tests should be a feature of all course units in the MBA. Only 1.5% of respondents called for the idea to be abandoned.

Table 3. Frequency of ARQ tests

Question 3: How useful did you find the quizzes in terms of their frequency?

A. There were way too many. (Score 1, 1.5%)
B. There were a few more than necessary. (Score 3, 4.5%)
C. One a week is just about right. (Score 61, 88%)
D. There could have been a few more. (Score 0, 0%)
E. There could have been a lot more. (Score 1, 1.5%)
F. No answer. (Score 3, 4.5%)

Table 4. Weighting of ARQ assessment item

Question 4: Given that there is no mechanism for guarding against student cheating, what proportion of the course unit marks do you think should be allocated for these quizzes?

A. The proportion of total marks ought to be increased significantly. (Score 2, 3%)
B. The proportion of total marks could be increased slightly. (Score 27, 39%)
C. 5 per cent is just about right. (Score 31, 45%)
D. The proportion of total marks could be reduced slightly. (Score 1, 1.5%)
E. No marks should be allocated. The assessment should be entirely formative. (Score 7, 10%)
F. No answer. (Score 1, 1.5%)

On the downside, some student comments suggest that caution should be exercised in the preparation of ARQ questions, implying that the degree of difficulty had a lot to do with semantics and one's mastery of the English language. Examples include:

Most questions were very good. Sometimes the wording was quite tricky … the assistance given after each attempt … helped in the understanding of the subject. [serial no. 8]

… semantics seem to play a large role in determining the correct answer. [serial no. 67]

It was challenging due to ambiguity rather than degree of difficulty. [serial no. 33]

I found the assertion-reason format overly difficult in relation to testing your English skills rather than knowledge of the subject being studied. i.e. you had to pick up on little idiosyncrases (spelling?) [sic] on the way in which the questions were worded. This is a difficult and onerous thing to do when already nervous and on edge in a test environment. [serial no. 90]

To get an answer right doesn't necessarily show you are a better economist than someone who gets it wrong: I think what it shows is that you are better at logically answering structured problems. [serial no. 26]

With the array of options it is necessary to make sure there are no ambiguous questions or facts [sic] stated within those questions. I guess this is the same with any multiple choice exam. [serial no. 26]

Table 5. Relative merit of ARQ assessment instrument

Question 5: Would you like to see this kind of on-line, formative assessment used more widely?

A. Yes, it's a great idea—all course units should have quizzes like these on their OLT sites. (Score 17, 25%)
B. Yes, it could work well for some course units. (Score 39, 56%)
C. I don't feel strongly either way. (Score 6, 8.5%)
D. No, the assessment type is ok, but it shouldn't be used on-line. (Score 4, 6%)
E. No, it's a waste of time—abandon the idea. (Score 1, 1.5%)
F. No answer. (Score 2, 3%)

Eighteen months after this initial survey, another online questionnaire was circulated, this time to all MBA students. The survey focused on assessment practices in general, rather than ARQ tests in particular, but one of the five questions was dedicated to ARQ questions and their relative merits when compared to traditional MCQs. A total of 187 students responded, corresponding to approximately 20% of enrolled students. The responses to this question are presented in Table 6.

On this occasion, while the students were still showing a preference for ARQ over traditional MCQs (39% compared to 29%, combining options B and D against options A and C), this inclination appears less pronounced than it was 18 months previously. One possible explanation for this is that, as ARQ tests became more widely used in the School, quality control diminished. When the project team piloted the new assessment instrument, great care was taken to avoid ambiguity and overly complex language and, as the comments above suggest, even then they were not always successful. Close inspection of individual comments from students in the second survey, specifically in relation to Question 6, and discussion with a focus group of nine students (including eight international students), provide considerable evidence to support this hypothesis. A selection of student comments is detailed below:

Assertion reason are useful, however are often worded very poorly or ambiguously. They should be used only when the question can be framed clearly, without ambiguity. [serial no. 625]

I have been surprised by the extensive use of MC at BGSB. I believe they can be useful as self-paced learning tools, however I don't believe that should contribute substantially to the overall mark. While I understand the theory behind using assertion-reason MC, I sometimes feel that the correct answer is too much about interpreting the phrasing of the question, not about whether I understand the material. [serial no. 634]

Table 6. The role of multiple-choice type assessments in a business school

Question 6: Which of the following options best describes your view on the role of multiple-choice type assessments?

A. The more traditional type of multiple-choice questions is a useful means of assessment because they help me learn. The assertion-reason type of multiple-choice questions is less useful. (Score 30, 16%)
B. The assertion-reason type of multiple-choice questions is a useful means of assessment because they help me learn. The more traditional type of multiple-choice questions is less useful. (Score 42, 22%)
C. The more traditional multiple-choice type questions are a useful means of assessment because they help me learn, but they should be located on OLT sites for formative assessment (self-paced learning) purposes only. (Score 24, 13%)
D. The assertion-reason type of multiple-choice questions is a useful means of assessment because they help me learn, but they should be located on OLT sites for formative assessment (self-paced learning) purposes only. (Score 31, 17%)
E. There is no place for multiple-choice type questions in a Master's level course. (Score 33, 18%)
F. No answer. (Score 27†, 14%)

†The high proportion of people who elected to submit no answer can be explained by the fact that many students had had no exposure to ARQ questions and therefore were not in a position to comment.

Assertion-reason MC test are good for black/white subjects like Economics. In soft subjects like Entrepreneurship they are very subjective. [serial no. 781]

The assertion-reason questions should not play so much on double negatives as this does not assist in learning. [serial no. 887]

MCQ questions will work perfectly with the absolute and exact answer questions. i.e. Finance, Accounting or any exact answers. … For some argueable [sic] answers, these kind of assessments create confusion and upset since we were forced to accept what the lecturer thinks is the right answer. Indeed, at Masters' level, the argument about what we believe is more important than the right or wrong answer. [serial no. 945]

All the MCQ tests I have done to date—and especially the assertion reasoning ones. Those currently being used for the 2a Entrepreneurship … unnit [sic] are a classic example: They often show insufficient attention both to the logic behind the question, and fail particularly from the use of imprecise language to express the question. This leads to ambiguity and frustration: what I call speed camera questions! [serial no. 977]

Using multiple choice questions that relate to trickery rather than encouraging learning are pointless. This has been experienced in a core subject that I have completed and is shared among other students. [serial no. 1216]

In my personal opinion the assertion-reason questions are very good, but they are much more difficult for international students, who do not speak English as their first language. [serial no. 1581]

The assertion reasoning type questions should not be used a summative assessment, because it takes up too much time especially in strict examination type setting. But it is useful as a learning tool. [serial no. 1756]

Assertion-reason is an excellant [sic] study tool, however I have found that certain lecturers use vague or misleading statements that do not serve the purpose of assertion reason. This is the major fault in this type of assessment. [serial no. 1994]

It is not useful when some methods of assessment are unrealistically strict on time e.g. assertion and reasoning multiple choice questions—I had an exam that was 20 minutes all up including perusal and there were 15 very complex questions and perusal time end was not announced. The whole exercise was a waste of time due to unrealistic timing—this left 1 minute per question and the wording was so complex it took 1 minute to just read the question. [serial no. 2024]

MCQ's have an important role to play as they can differentiate understanding and application of what is read and applied. They should test knowledge and not require you to have a Masters in English to remove ambiguity. [serial no. 1197]

Most multiple choice type questions end up in debate over English rather than the true purpose of the question. In real life, where are they used? [serial no. 1364]

Summary and conclusions

This paper began by pointing out that, for many, the MCQ test is an economical and versatile assessment instrument capable of providing the necessary precision required to measure learning outcomes. Critics of MCQs, meanwhile, question its validity in certain settings. Typically, criticisms fall into one of three categories: those that

concentrate on unreliability arising from random effects such as guessing, those that focus upon the inequity of the format in terms of its inherent bias towards certain socio-economic or ethnic groups, and those that question the depth of learning the instrument is capable of producing.

Mindful of these philosophical positions, an experiment was conducted with graduate-level business students that focused primarily on the third category of criticisms. The aim, simply, was to investigate the robustness of an ARQ test format to determine whether it was possible to assemble questions that induced the kind of higher-order thinking generally required of graduate students. The first part of this experiment was essentially quantitative in nature, and results of regression analysis showed ARQ test performance to be a good predictor of student performance in essays, the assessment instrument most widely favoured as an indicator of deeper learning (Connelly, 2004). The second part of the experiment was to analyse the qualitative data collected by the project team to ascertain whether this lent support to the hypothesis that there is a positive correlation between student ARQ test performance and their performance in essays.

Analysis of the qualitative data collected in this second phase of the project reveals that student performance in ARQ tests—in this experiment, at least—may have as much to do with a student's linguistic skills and the time taken to process complex prose as with their conceptual understanding and problem-solving ability. This does beg the question, of course, as to whether students' essay performance is also largely determined by their proficiency in the English language. Assuming an essay is structured to facilitate deep learning in the first place and not simply to regurgitate text (Williams, 2004), this ought not to be the case, as a student is able to construct a response to an essay question and convey meaning, the structure and internal consistency of this response being more important than mastery of the finer points of English grammar. Such an active role is not possible in an ARQ test setting, where the student role is more passive.

This is not to reject the ARQ model, because there is certainly sufficient positive comment from students to suggest that ARQ tests are capable of producing useful learning outcomes, especially if there is some interaction as a consequence of their being online. Importantly, though, for this type of assessment instrument to be truly effective, psychometric editing of questions is a must. This could be said of all MCQs, of course, but given the additional complexity of ARQs, it is even more important that the meaning is clear and the wording free of ambiguity. This means beta testing—not just to check that test design and administration are functional—but to ensure that test items are constructed in accordance with accepted standards and practices for 'high-stakes' testing. Statistical analysis of the results of the beta test will reveal which questions are too hard or too easy, which discriminate among more knowledgeable and less knowledgeable candidates, which show evidence of not being clearly understood, and so on. Those items that demonstrate good psychometric performance can remain, and those that do not may be edited to improve communication, be it a case of correcting grammar or spelling, maintaining a consistent style, or removing potentially offensive or biased language.
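By way of illustration only (this sketch is not part of the study, and the function and data names are assumptions), the kind of statistical analysis referred to here can be as simple as computing a classical difficulty index (the proportion of candidates answering an item correctly) and a point-biserial discrimination index (the correlation between success on an item and the score on the rest of the test):

```python
import numpy as np

def item_statistics(responses):
    """Classical item analysis for a 0/1 response matrix.

    responses: rows are candidates, columns are test items;
    1 means the candidate answered the item correctly.
    """
    responses = np.asarray(responses, dtype=float)
    totals = responses.sum(axis=1)
    stats = []
    for j in range(responses.shape[1]):
        item = responses[:, j]
        difficulty = item.mean()      # proportion correct: flags too-hard or too-easy items
        rest_score = totals - item    # candidate score on the remaining items
        discrimination = float(np.corrcoef(item, rest_score)[0, 1])
        stats.append({"item": j + 1,
                      "difficulty": float(difficulty),
                      "discrimination": discrimination})
    return stats

# Toy beta-test data: five candidates, three items.
demo = [[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1],
        [1, 1, 1],
        [0, 0, 0]]
for row in item_statistics(demo):
    print(row)
```

Items with extreme difficulty values, or with low or negative discrimination, would be the ones earmarked for rewording or removal in the manner described above.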

In the absence of such intervention, one will inevitably be subject to criticism from those parties who maintain that MCQs produce inequitable outcomes for certain student cohorts—in this case, those for whom English is a second language (see Paxton, 2000). For an institution like the BGSB, with its public commitment to flexible delivery, this kind of criticism is particularly unpalatable. In the context of the two economics course units at the centre of this project, the professional opinion of an academic linguist and ethicist (Gesche, 2003) is unequivocal:

For a native English speaker, your ARQ questions are very elegant, precise, concise and logical. Many of the ARQ questions for GSN411 are linguistically and conceptually just beautiful. I am not surprised that they received some critical acclaim. … However, for a non-English speaking person (NESB) this lack of 'redundancy' (or economy of words) plus a selection of relatively uncommon words (for a NESB person) can cause tremendous problems. … I think it was a good idea to take the quizzes out of any timed, classroom assessment. …

In conclusion, while the depth of learning is unlikely to parallel that emanating from more authentic learning tasks such as case study analysis or some other aspect of a problem-based curriculum (Williams, 2004), the experience with ARQs in this experiment would suggest that learning outcomes are likely to be superior to those produced by traditional MCQs, which tend to focus on recall rather than reasoning. An important lesson to be learnt, however, is that there is absolutely no margin for error when it comes to the authoring of questions. It is conceivable that in disciplines other than business (science or mathematically-oriented subjects, perhaps), assertion-reason statements may be linguistically more straightforward, in which case this will be less of an issue. However, irrespective of the discipline, if ARQs are to be used effectively they need to be psychometrically tested prior to use. This would reduce the likelihood of any criticism from an equity point of view. ARQs might also be more appropriately utilised in an online environment for formative assessment purposes only, where there is no time constraint and where there is ample opportunity for students to master any linguistic intricacies.

Acknowledgments

The project on which this paper reports was made possible through a Small Teaching and Learning Grant provided by Queensland University of Technology (QUT). The author wishes to acknowledge the contribution of co-collaborator, Dr Luke Connelly, and the assistance of Elizabeth Heathcote from the Software, Multimedia and Internet Learning Environments (SMILE) section within Teaching and Learning Support Services (TALSS) at QUT. The author also gratefully acknowledges the comments of two anonymous referees on an earlier draft of this paper. Any remaining errors or inaccuracies are the responsibility of the author, and the author alone.

Note on the contributor

Jeremy Williams is currently Director of Pedagogy and Assessment and Associate Professor in E-Learning at Universitas 21 Global (U21G), and Adjunct Professor

in Economics at the Brisbane Graduate School of Business (BGSB). At U21G, he is responsible for the oversight of all aspects of pedagogy and assessment, specifically in relation to quality control and the application of best practice. One of Jeremy's main research interests is the question of authentic assessment and the ways in which assessment items might be contextualised to promote greater student engagement and deeper learning. In addition to his work in the e-learning area, he has spent the last two decades teaching, researching and consulting in the field of economics, with work experience in several countries including Australia, the United Kingdom, France, Singapore, Malaysia and India. Before joining Universitas 21 Global in 2003, Jeremy was Teaching Fellow and Director of the MBA program at the BGSB.

References

Berk, R. A. (1998) A humorous account of 10 multiple-choice test-item flaws that clue testwise students, Electronic Journal on Excellence in College Teaching, 9(2). Available online at: http://ject.lib.muohio.edu/contents/article.php?article=170 (accessed 31 December 2004).
Biggs, J. (1987) Student approaches to learning and studying (Hawthorn, Victoria, Australian Council for Educational Research).
Biggs, J. (1993) What do inventories of students' learning process really measure? A theoretical review and clarification, British Journal of Educational Psychology, 83, 3–19.
Bloom, B. S. (Ed.) (1956) Taxonomy of educational objectives: the classification of educational goals: handbook I, cognitive domain (London, Longman Group).
Bracey, G. (1998) Put to the test: an educator's and consumer's guide to standardized testing (Bloomington, IN, Phi Delta Kappa).
Brown, G., Bull, J. & Pendlebury, M. (1997) Assessing student learning in higher education (London, Routledge).
Burton, R. F. (2001) Quantifying the effects of chance in multiple choice and true/false tests: question selection and guessing of answers, Assessment and Evaluation in Higher Education, 26(1), 41–50.
Computer Assisted Assessment (CAA) Centre (2000) Designing and using objective tests (University of Luton, CAA Centre).
Carneson, J., Delpierre, G. & Masters, K. (n.d.) Designing and managing multiple choice questions. Available online at: http://web.uct.ac.za/projects/cbe/mcqman/mcqman01.html (accessed 31 December 2004).
Caruano, R. M. (1999) An historical overview of standardised educational testing. Available online at: http://www.gwu.edu/∼gjackson/caruano.PDF (accessed 31 December 2004).
Case, S. M. & Swanson, D. B. (1996) Constructing written test questions for the basic and clinical sciences (Philadelphia, PA, National Board of Medical Examiners).
Connelly, L. B. (2004) Assertion-reason assessment in formative and summative tests: results from two graduate case studies, in: R. Ottewill, E. Borredon, L. Falque, B. Macfarlane & A. Wall (Eds) Educational innovation in economics and business VIII: pedagogy, technology and innovation (Dordrecht, Kluwer Academic Publishers), 359–378.
De Vita, G. (2002) Cultural equivalence in the assessment of home and international business management students: a UK exploratory study, Studies in Higher Education, 27(2), 221–231.
Entwistle, N. (1981) Styles of learning and teaching: an integrated outline of educational psychology for students, teachers and lecturers (Chichester, John Wiley).
Fox, J. S. (1983) The multiple choice tutorial: its use in the reinforcement of fundamentals in medical education, Medical Education, 17, 90–94.
Gesche, A. (2003) Personal e-mail (April).

Hakel, M. D. (Ed.) (1998) Beyond multiple choice: evaluating alternatives to traditional testing for selection (Mahwah, NJ, Lawrence Erlbaum Associates).
Haladyna, T. M. (1999) Developing and validating multiple-choice test items (2nd edn) (London, Lawrence Erlbaum Associates).
Heywood, J. (1999) Review: assessing student learning in higher education, Studies in Higher Education, 24(1), 133–134.
Hubbard, J. P. & Clemans, W. V. (1961) Multiple choice questions in medicine: a guide for examiner and examinee (Philadelphia, PA, Lea and Febiger).
Leamnson, R. (1999) Thinking about teaching and learning (Sterling, VA, Stylus Publishing).
Marton, F. & Säljö, R. (1976a) On qualitative differences in learning—1: outcome and process, British Journal of Educational Psychology, 46, 4–11.
Marton, F. & Säljö, R. (1976b) On qualitative differences in learning—2: outcome as a function of the learner's conception of the task, British Journal of Educational Psychology, 46, 115–127.
Moore, R. A. (1954) Methods of examining students in medicine, Journal of Medical Education, 29(1), 23–27.
Newble, D. I., Baxter, A. & Elmslie, R. G. (1979) A comparison of multiple-choice and free-response tests in examinations of clinical competence, Medical Education, 13, 263–268.
Paxton, M. (2000) A linguistic perspective on multiple choice questioning, Assessment & Evaluation in Higher Education, 25(2), 109–119.
Ramsden, P. (1992) Learning to teach in higher education (London, Routledge).
Scouller, K. M. & Prosser, M. (1994) Students' experiences in studying for multiple choice question examinations, Studies in Higher Education, 19, 267–279.
Skakun, E. N., Nanson, E. M., Kling, S. & Taylor, W. C. (1979) A preliminary investigation of three types of multiple choice questions, Medical Education, 13, 91–96.
Steffe, L. P. & Gale, J. (Eds) (1995) Constructivism in education (Hillsdale, NJ, Erlbaum).
Wiggins, G. (1990) The case for authentic assessment, Practical Assessment, Research & Evaluation, 2(2). Available online at: http://pareonline.net/getvn.asp?v=2&n=2 (accessed 31 December 2004).
Williams, J. B. (2004) Creating authentic assessments: a method for the authoring of open book open web examinations, in: R. Atkinson, C. McBeath, D. Jonas-Dwyer & R. Phillips (Eds) Beyond the comfort zone: proceedings of the 21st ASCILITE Conference (vol. 2). Available online at: http://www.ascilite.org.au/conferences/perth04/procs/pdf/williams.pdf (accessed 31 December 2004).
