This article was downloaded by: [Professor Brian Whalley] On: 07 January 2015, At: 02:57 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Assessment & Evaluation in Higher Education Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/caeh20

Evaluating student assessments: the use of optimal foraging theory

W. Brian Whalley

Geography Department, University of Sheffield, Sheffield, UK

Published online: 03 Jan 2015.

To cite this article: W. Brian Whalley (2015): Evaluating student assessments: the use of optimal foraging theory, Assessment & Evaluation in Higher Education, DOI: 10.1080/02602938.2014.991909

To link to this article: http://dx.doi.org/10.1080/02602938.2014.991909


Assessment & Evaluation in Higher Education, 2014 http://dx.doi.org/10.1080/02602938.2014.991909

Evaluating student assessments: the use of optimal foraging theory

W. Brian Whalley*

Geography Department, University of Sheffield, Sheffield, UK

The concepts of optimal foraging theory and the marginal value theorem are used to investigate possible student behaviour in accruing marks in various forms of assessment. The ideas of predator energy consumption, handling and search times can be evaluated in terms of student behaviour and gaining marks or ‘attainment’. These ideas can be used to examine student responses to dealing with assessments by examining a marks awarded/time-on-task curve. The nonlinear, cumulative mark gain, as a Gompertz function, has implications for how students tackle continuously assessed projects as well as examination questions. The attainment of a student can be viewed in these general terms, as well as in specific aspects such as question ‘difficulty’ and mark gain in an examination answer. Prospect theory, from econometrics and psychology, can also be used to suggest ways in which students might tackle problems in examinations. The implications of this analysis are considered with respect to setting questions, criterion referencing of assignments and dealing with ‘troublesome knowledge’. The ideas can also be used to assist problems regarding mark fidelity and integrity as well as mark comparability.

Keywords: assessment; optimal foraging theory; marginal value theory; prospect theory

*Email: b.whalley@sheffield.ac.uk
© 2014 Taylor & Francis

Introduction

Assessment is generally seen as the main promoter of student ‘learning’. In its diverse forms, it is one of the commonest things tutors do in higher and further education, and yet, is one of the most difficult to accomplish satisfactorily. Furthermore, a wide variety of assessment procedures exist in higher and further education institutions (Bryan and Klegg 2006), and these need forms of comparison or standards in order to moderate them for various programmes of study. External examiners view and read students’ contributions, but underlying principles and concepts are only now being investigated. Biggs (2003) has stressed the importance of ‘constructive alignment’ in assessment, but less widely known are the linkages to ‘thinking styles’ to enhance learning via assessment (Biggs 2001). For the examination component, books have been devoted to assisting students with the practicalities of assessment (e.g. Cottrell 2012). However, there has been very little attention to ways in which students actually do assessed pieces of work, whether examinations, essays, multiple-choice questions, reports or practical work in general. This article examines, in formal but general terms, some aspects of assessment to see how they are related to some of the ideas mentioned above. It does this by employing concepts from ecology and psychology; namely, foraging theory and the
marginal value theorem. The aim is to cast light on assessment processes by examining mark attainment over time and to show that these ideas can also be related to some recent contributions to the topic of assessment (Knight and Yorke 2003; Baum, Yorke, and Coffey 2004; Dunn et al. 2004; Falchikov 2005; Bryan and Klegg 2006; Boud and Falchikov 2007; Rust 2007; Stobart 2008). At a purely practical level, Race (2001, 2005) considers advantages and disadvantages of many types of assessment that are touched upon below, as are some of the examination-related issues mentioned by Miller and Parlett (1974).

Background

This paper is part of a deconstruction of assessment and examines the basic educational processes with some ideas from behavioural ecology. By this route, it tries to highlight some aspects that tutors, as well as examiners, might be able to improve and thereby enable students to do better. For example, we might suggest that students should plan their assessments and not leave things until the last minute. This may seem obvious – to both student and tutor – but this does not stop either party leaving things until the last minute. Of concern here is not just the time available, although there are implications, but how students should use their time wisely and the best way for them to do so. In other words, giving thought to the overall process, which is often much more than just sit down, ‘think’ and write. The critique goes beyond this to look at ways in which students gain marks as part of their achievements. It is also designed to help assessment setting and to bridge the gap between tutors’ demands and students’ fulfilment of various assessment tasks.
The approach is also designed to allow better formulation of rather vaguely worded statements about assessment, and link to practical help provided by various authors (Ramsden 2003; Race 2005; Brown and Pickford 2006), as well as provide insights into assessment from the student point of view (Cottrell 2012). Tools for comparing assessments are rare, although Sadler (2009a, 2010) has recently examined the concepts of ‘fidelity’ and ‘integrity’ in academic assessment and achievement. There are explanations about forms of assessment (Stobart 2008), but there is, apparently, no discussion of any ‘theoretical’ basis for assessment. This paper attempts such an overview.

Pub quizzes, tests and multiple-choice questions

We start by considering perhaps the simplest style of assessment, whether summative or formative; a test of vocabulary, general knowledge or arithmetic, as typically performed in primary education. Pub, radio and TV quizzes are mostly of this kind. They are used to discriminate between individuals or teams, and are not held to be conducive to ‘deep learning’ in an academic context. Multiple-choice questions are a somewhat more sophisticated version of such tests. Figure 1 shows the mark gain for totally correct answers (line a) and a generalised, less than perfect, scoring (b) over the time of the test. In a multiple-choice question set, it would be reasonable to think of a linear mark gain from the start to the end of the examination or test, assuming that all questions are weighted equally. This does not necessarily mean that the questions are of the same ‘difficulty’, although students may perceive such differences.

Figure 1. Generalised mark gain for a simple test or Multiple-Choice Question approach to assessment where: (a) perfect score, (b) less than 100%, the loss of marks could be anywhere on the line and produce a stepped appearance.

Most forms of assessment are not like simple tests, although whether they necessarily promote deep learning will not be discussed here. We need a rather different model to examine mark gain over time in assessments.

Optimum foraging theory

Optimal foraging theory, from ecological theory and some related concepts from psychology, helps visualise the manner in which students approach a problem of foraging for marks (i.e. ‘nutrition’). In the following, some sort of practical activity is envisaged but it can also encompass essay writing, examinations and the general context of ‘attainment’. Optimal foraging theory was developed by animal ecologists in the 1970s; further information can be found in several books and papers (Stephens and Krebs 1986; Kamil, Krebs, and Pulliam 1987; Stephens 2008) and aspects of theory in, for example, Pirolli (2007). A basic exposition is given in Wikipedia, and also the related concept of marginal value theorem. For clarity, the following uses the terminology and material of these Wikipedia articles unless otherwise stated. The application to information searching is provided by Pirolli (2007), although a simpler model will be used here. Applications in behavioural ecology and related ideas (without mathematics and graphs) can be found in Bird and O’Connell (2006). The basic idea behind optimal foraging theory can be considered in terms of animals’ need to intake food for a healthy metabolism: cows in pasture, lions on the savannah, goats in scrubland, etc. In general terms, optimal foraging theory tries to model behaviour of ‘predation’ (including herbivores as well as carnivores). In particular, it assumes that predators focus on consuming the most energy while expending the least amount of energy in the process of searching for it, killing and devouring. It is useful to visualise a carnivore, such as a lion, at the top of a local food chain – this provides an analogy with students.
The following is an outline of the concept and defines the following components:

E, the amount of energy (calories or kJ, the units are not significant) derived from a prey item (or plants in an area or patch). Although there are other nutritional requirements, only the energy-related term is considered. Note that ‘energy’ is closely related to the concept of ‘work’ and has the same units.

h, the ‘handling time’ which includes capture, killing, eating and digesting the prey; it starts once the prey has been identified. Thus, E/h is the ‘profitability’ of the prey item. Various prey types will be identified and eaten with varying amounts of profitability. Note that ‘power’ is work done per unit time, and that we can consider a system operating under low power or producing high power, although in the following discussion, we are generally looking at averages.

s, the search time. For an animal in the wild, a search time, s, can also be assumed. This, of course, may vary according to season, availability of various prey types, density of foraged material, etc.

Hence, the predator will try to maximise E/(h + s). For a range of prey types, the predator’s average intake rate is $E_{av}/(h_{av} + s_{av})$, where $E_{av}$ is the average energy of all prey items in the diet, $h_{av}$ is the average handling time and $s_{av}$ is the average search time. In ecological practice, observations could provide these data. When the predator has found an item that it does not currently eat, it has two choices. It can eat the new item, in which case the profitability is $E_{new}/h_{new}$, or it can leave it and search for an item already in its diet, in which case we should use $E_{av}/(h_{av} + s_{av})$. The predator should eat this new item when $E_{new}/h_{new} \geq E_{av}/(h_{av} + s_{av})$. This is because the new item increases the predator’s energy intake per unit time. From the above, the following ecological scenarios can be envisaged:

(1) Predators with short handling times (h) and long search times (s) should be generalists and include a wide range of prey types.
(2) Specialists will have long handling times and relatively short search times; lions have a very short search time but a high (energy consumptive) handling time, which can be prohibitively large for some prey. They therefore pick out the sick and old.
(3) In unproductive environments, predators should be generalists, but in productive environments, they should be specialists.
Response curves allow us to look at certain aspects, especially handling terms, more specifically. As each prey item has its own search time then:

(4) When predator density increases, the search time depends on the density of the prey.
(5) At low prey densities, the predator is searching most of the time and eating every prey item it can find and kill; the search time s is relatively large (compared with h).
(6) At high prey densities, each new prey item is caught almost immediately. The predator spends almost all of its time catching, eating or digesting the prey; s is small. It chooses only those individuals with the highest E/h.

Some of these implications will be used in discussing assessment tasks.
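The inclusion rule described above, eating a new item only when its profitability is at least the current average intake rate, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the original analysis: the class name `Item`, the function names and all numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A prey item (or, by analogy, an assessment task). Hypothetical type."""
    energy: float    # E: energy (or marks) gained from the item
    handling: float  # h: handling time for the item


def average_intake_rate(diet, avg_search_time):
    """Return E_av / (h_av + s_av) for the items currently in the diet."""
    e_av = sum(item.energy for item in diet) / len(diet)
    h_av = sum(item.handling for item in diet) / len(diet)
    return e_av / (h_av + avg_search_time)


def should_eat(new_item, diet, avg_search_time):
    """Inclusion rule: eat when E_new / h_new >= E_av / (h_av + s_av)."""
    profitability = new_item.energy / new_item.handling
    return profitability >= average_intake_rate(diet, avg_search_time)


# Illustrative numbers only (arbitrary units).
diet = [Item(energy=10.0, handling=2.0), Item(energy=6.0, handling=1.0)]
s_av = 2.5

worth_taking = should_eat(Item(energy=3.0, handling=1.0), diet, s_av)
not_worth_taking = should_eat(Item(energy=3.0, handling=2.0), diet, s_av)
```

With these made-up values the average intake rate is 8 / (1.5 + 2.5) = 2, so an item with profitability 3 is taken and one with profitability 1.5 is left.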

Student behaviour and OFT

We can now consider ‘a student’ (indeed academic or tutor) in a similar way to an animal foraging, and use the same symbols, analogous terminology and numbered consequences.

Thus, E would correspond to the yield or marks for a particular piece of assessment (scaled to 100%). More generally, we might wish to think of this as an ‘achievement’. The handling time (h) is that time spent on task from start to finish; a student would want to maximise E/h in much the same way as an animal. What actually constitutes h depends on the task, but might include time in the laboratory/library, doing experiments, producing graphs, writing and even thinking. All these need to be taken into account when setting a task for students and evaluating the necessary time for the activity. Optimal foraging theory identifies s as the seeking or search time, such that total time on task is h + s. The type of predator/environment relationship is significant. For the student model, we might think of s as a ‘prevaricator’ or ‘delay’ term. We can, thus, redefine the search term (s) as a potential delay component that might be sub-divided as (s1 + s2 + …) where, for example, the times might be:

s1: general (‘I wonder how to tackle this’)
s2: a further procrastination component (‘I’m doing sport this weekend’)
s3: a more specific delay (‘I really don’t understand this’, etc.)

For various tasks, there may be different components of s according to student experience, aptitude, etc. Different students in a class may ascribe to these values in different ways or use them as excuses. The components may have to be recognised, even if not acted upon. An immediate suggestion here is for institutions to show students how to minimise s (prevaricator) components by helping students to identify and reduce them at an early stage in their educational programmes. We can revisit the forms of the expression, animal behaviour and possible student behaviour and re-examine student assessment/achievement in the light of optimal foraging theory. For an animal, the length of times of s and h may not be normally significant (as long as some food is caught so the animal lives). This is slightly different for a student where, if the total time available for completion of a task is T, then T = h + s and h needs to be maximised and s (and possible sub-components) minimised. Students need marks as much as animals need food.

Using optimal foraging theory in practical considerations in setting tasks

Here, it is helpful to differentiate between tasks, what tutors set students to work on, and activities, which are what students actually do (Beetham 2013). In this way, we may suggest that, as well as components that give rise to an estimate of h (time on task), the components of s also need to be considered. Both may require specific explanation to students, especially those entering a world requiring more independent work than previously encountered. Evidence suggests that students, especially incoming students, are largely unaware of the terminology used in the assessment process (‘explain’, ‘describe’, essay, report, etc.), let alone how such items are implemented. For example, guidance could be given in the following way to help students plan a project:

h1: plan what you need to do for the activity (and feed back here to revise your estimate) (15 min)
h2: consider health and safety issues (15 min)
h3: set out the equipment and consider the specifics of what you actually have to do (30 min)
h4: perform the experiment and check you have all the data required, tidy up, etc. (2 h)
h5: do the calculations and plot the results and check h4 (1 h)
h6: do a preliminary draft of your report, check other information required from library, web sources, etc. (3 h)
h7: write report, proofread, check, reread and then hand it in (2 h).

Thus, h (optimally) = sum (h1 : h7) = 9 h. Adding a safety factor of 10% gives approximately 10 h; i.e. this is possible in one day.

Time available from start to hand in: Tavailable = h + s = 2 weeks (say).

Such a project may appear to be over-specified, but again, this may be necessary in making sure that students engage with all that is required and gain experience in foraging for marks. Such a time allocation model illustrates the need for tutors to make sure there is adequate time, scheduling equipment and rooms, etc., as well as expected times on task required for the student. This is tacit knowledge; when students become more adept at experimentation then less scaffolding need be supplied. What we tend to forget are the insidious effects of the s components, the reasons why students hand essays in late or leave it until the last minute. These are the aspects that (presumably) animals do not do as they may affect the total time taken by adding to each component of h. For example, for students:

h4: may be followed by sh4 (‘going to have a coffee’)
h5: might incorporate sh5 (‘how on earth do you do this calculation?’)
h6: might involve, spread out, sh6 (‘I’ve been searching for the data in the library and on the web and it took me ages to find it.’)
h7: might involve the activity (‘I mislaid my memory stick’).

The statements in parentheses could become indicators of threshold knowledge or excuses for lateness. Knowing what these statements are for a class could be used to provide informal information for tutors. Considering the other aspects noted above, what might these tell us about tasks for students, especially given that student behaviours may well be very diverse in a class (corresponding to all sorts of foragers/predators)? With ever-widening participation in the UK system some attention needs to be given to the way students approach problems, especially if they have received little active experiential education. Not all students will have the same ways of treating problems. This relates to students’ ‘thinking styles’ (Sternberg 1997; Biggs 2001) rather than ‘learning styles’. The following items are suggested interpretations from predation theory to student activity. They may be forced or even naive but the general model does provide a student focus that is helpful.

(1) Predators with short handling times and long search times should be generalists and include a wide range of prey types. This aspect of student behaviour should be minimised. Just because all tasks so far have been done rapidly (low h) by a student it does not mean that a problem will not occur, especially if things are left to the last minute and a computer failure occurs or a bus is missed. Tutors might like to investigate various tasks to maximise educational competence and identify means of overcoming ‘troublesome knowledge’.

(2) Specialists will have long handling times and relatively short search times. Tasks with long handling times (T) should be treated carefully in case problems occur. This may not be difficult for a laboratory experiment and write-up, but must be looked at carefully (by providing formative assistance) where T = several months, such as for a dissertation. In an undergraduate degree, it may not be beneficial for a student to require a high degree of competence before a piece of work can be completed.

(3) In unproductive environments, predators should be generalists but in productive environments they should be specialists. This could apply when a student is looking for a project to undertake, does not know where to go for data or where information searching techniques are not adequate (Pirolli 2007). For example, direction from a dissertation supervisor at the early stages of formulating a question is very important.

(4) When predator density increases, the search time depends on the density of the prey. Are there enough pieces of equipment; are the necessary books in the library?

(5) At low prey densities, the predator is searching most of the time and eating every prey item it can find and kill. One corollary is that care should be taken in choosing dissertation (including PhD) topics. In particular, students need support if results are not forthcoming or if achievement is deemed to be low. There are clearly personal points of view here; does the student feel reluctant to seek the supervisor? There may be cultural as well as personal aspects to take into consideration. Unless appropriately directed, students may embark on reading that is knowledge accretion into ever-finer detail without considering the wider view. Response curves, of predation against time, allow us to look at certain aspects of foraging, especially handling terms. More specifically, this can be done by considering the marginal value theorem.

(6) At high prey densities, each new prey item is caught almost immediately. The predator spends almost all of its time catching, eating or digesting the prey. This might be taken as a justification for multiple-choice questions, although the metacognitive approach using certainty-based marking (Gardner-Medwin 2006) and the provision of good and immediate feedback are educationally more sound.

The analysis so far suggests that we can use optimal foraging theory-based reasoning to assist consideration of setting tasks and student time usage in activities. Hence,

Consideration 1 – The component parts in a task and the time various activities might take in this time may be considerable; do students need some guidance in this? That is, we should enhance student experience so that the next time h + s for any component is reduced, or at least explained and optimised if not monitored. The latter may be very important for Level 1 students. Getting students to do their own ‘time and motion’ study for an activity would be useful for identification of delay/prevaricator components (s1, etc.). Incorporated in the delay terms could be the achievement of tacit experience. This might be difficult for some students. One upshot is that students have to know how to deal with the equipment/programme/concept perhaps even (semi) automatically. Only practice and experience will show a student (and of course students will
have different experiences here) what to do. This needs to be taken into account when planning the task. Similarly, ‘troublesome knowledge’ (Meyer and Land 2005) or a ‘sticking point’ (Whalley and Taylor 2008) may extend the length of an s component, and in doing so might produce psychological reasons for driving down student attainment. Cognitive overload (Sweller 1994) and ‘information overload’ (Toffler 1971) may also significantly impinge on students’ well-being. Tutors are apt to provide long reading lists. This may be more overbearing and intimidating than helpful for many students, who are not sufficiently well trained in dealing with seeking and digesting data and metadata. ‘Overlong’ reading lists may be detrimental and may themselves produce competition for resources. Charnov (1976) has suggested, in the theory of patch choice, a predator should choose patches which should be foraged and when to leave them as a consequence of the marginal rate of energy receipt compared with the overall average in the habitat. The analogy with respect to information resources will not be extended here, but the presentation by Shirky (2009) is especially significant from a general academic viewpoint, as much as the use of Web 2.0 and cognitive psychological approaches to education. One way of combating undue length of s components might be for students to keep a time log in a personal development planning or laboratory book. Tutors might want to provide guidance of a trick concept or issue, perhaps by introducing a ‘preflight’ (Novak et al. 1999; Whalley and Taylor 2008). To minimise delay times, students need guidance and practice in handling them. How and when will depend very much on the task in hand. We see, therefore, that what might appear to be a simple task might have much more involved than the tutor might first consider. Optimal foraging theory can help us deconstruct some of these issues and provide better guidance for students. 
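The time-allocation example given earlier (components h1 to h7, a 10% safety factor and T = h + s) can be combined with the suggested time log in a short sketch. The handling times are those from the guidance list; the two-week deadline is counted here as elapsed hours, which is an assumption, and the delay entries in `delay_log` are invented purely for illustration.

```python
# Handling-time estimates in hours, taken from the h1..h7 guidance list.
handling_hours = {
    "h1 plan": 0.25,
    "h2 health and safety": 0.25,
    "h3 set out equipment": 0.5,
    "h4 perform experiment": 2.0,
    "h5 calculations and plots": 1.0,
    "h6 draft report": 3.0,
    "h7 write and check report": 2.0,
}

SAFETY_FACTOR = 0.10         # the 10% margin used in the text
T_AVAILABLE_HOURS = 14 * 24  # assumption: "2 weeks" counted as elapsed hours

h_optimal = sum(handling_hours.values())     # total handling time
h_planned = h_optimal * (1 + SAFETY_FACTOR)  # roughly 10 h, as in the text
s_maximum = T_AVAILABLE_HOURS - h_planned    # most that delay terms can absorb

# Hypothetical time log of delay (s) components kept by a student, in hours.
delay_log = {
    "s1 how to tackle this": 1.0,
    "s2 sport this weekend": 48.0,
    "s3 stuck on a calculation": 3.0,
}
s_logged = sum(delay_log.values())

# The task is still feasible while logged delays leave room for the handling time.
on_track = s_logged <= s_maximum
```

A log of this kind makes the s components visible to the student and the tutor; it does not by itself say which delays are avoidable.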
This simple model also needs to take into account not only how we learn but how we make decisions (Kahneman 2011) and mistakes (Hallinan 2009).

Marginal value theorem

This aspect of foraging behaviour was developed by Charnov (1976) and follows from the concept of diminishing (marginal) costs in economics. Figure 2 shows the generalised response curve of ‘energy gain’ against time. Instead of energy gain, our analogy would suggest student ‘attainment’ on the ordinate.

Optimal foraging theory and Figure 2 allow us to focus on other aspects of a task set for students, whether this be a practical activity, such as a laboratory experiment, or an examination. Consider further the case of an examination, starting at T0 and continuing until the defined time and, for simplicity, consider one question in that time. The curve then defines mark gain per question. More generally, we could think of the ordinate as ‘attainment’. For ‘essay type’ questions, the curve is, as a first approximation, more likely to be a curve, F(Tp), as in Figure 2, than the straight line of Figure 1. This also illustrates what is widely known in higher and further education assessment, that it is not common to find mark attainment of 100% for (usually) essay-type examination answers or for essays. Sadler (2005) has raised this issue with respect to achievement. Indeed, one is tempted to ask why the curve is not a straight line from the origin. It does show (if the curve is a reasonable approximation to mark accreditation to a single answer) why students should not overstay a dwell time on one question much beyond the optimal, Tp (opt). That is, to achieve a diminishing mark gain over an ever-shorter time at the expense of another question where not even the optimum stay time is reached. This assumes that mark awarding can be plotted as such a curve. It also suggests that marks awarded for a question should be plotted on a curve in order for a student to gain say, 60% on the vertical axis. However, without data it is not possible to apply theory more precisely (e.g. to define the shape of the curve better).

Figure 2. Optimal Stay duration as a function of time against energy gain. From Krebs et al. (1981). The curve from the origin at T0 shows an energy gain with time and which is non-linear (‘diminishing returns’). A common description would be one person picking apples from a tree, easy and with high productivity at first but gradually becoming less productive over time. The tangent, AB, to the curve F(Tp) gives the optimal stay time (at that patch) Tp (opt). The time from point A, of AB, the tangent to the curve, to T0 is the transit time to the foraging patch.

Nevertheless, Figure 2 does raise questions which are topical and relate to standards and criteria as well as achievement and comparability. These questions include:

• Where ‘should’ (or perhaps ‘might’) grade boundaries (let alone actual marks) lie on the ordinate for a given mark gain curve such as Figure 2?
• How can assessment criteria be related to such a plot? For example, might there be a complicated curve with plateaus demarcating grade boundaries, rather than a smooth curve, as in Figure 2?
• Do, or should, different questions in the same examination have similar curves?
• Should ‘difficult’ and ‘easy’ questions have similar curves, and what actually expresses a measure of difficulty?
• Should the optimal stay time approximate to the estimated duration of a question? If so, how does the tutor devise this?
• Should different mark gain curves be associated with various thinking styles (Sternberg 1997) and related to specific types of question?

Unfortunately, we currently lack data to answer these questions. Nevertheless, the use of marginal value theory allows a more dispassionate analysis of examinations than is generally undertaken, and hints again at a move away from the ‘connoisseur approach’ to assessment (Rust, Price, and O’Donovan 2003) and towards criteria referencing.
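The tangent construction of Figure 2 can be sketched numerically. Since no data exist for real mark gain curves, the example below simply assumes a saturating curve F(t) = a t / (k + t); the parameters `a` and `k`, the function names and the time values are all hypothetical. The optimal stay time is taken as the t that maximises the long-run rate F(t) / (transit + t), which is the point where the line from A touches the curve.

```python
def gain(t, a=100.0, k=20.0):
    """Assumed diminishing-returns curve F(t): marks gained after t minutes."""
    return a * t / (k + t)


def optimal_stay(transit, t_max=180.0, step=0.01):
    """Stay time that maximises the overall rate F(t) / (transit + t).

    Geometrically, this is where the straight line drawn from the point
    (-transit, 0) on the time axis is tangent to the gain curve.
    A simple grid search is used to keep the sketch dependency-free.
    """
    best_t, best_rate = 0.0, 0.0
    n_steps = int(t_max / step)
    for i in range(1, n_steps + 1):
        t = i * step
        rate = gain(t) / (transit + t)
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t


# For this particular assumed curve the optimum works out as sqrt(k * transit).
t_short = optimal_stay(transit=5.0)   # about 10 minutes
t_long = optimal_stay(transit=20.0)   # about 20 minutes
```

Under these assumptions a longer transit (or delay) time gives a shallower tangent and a longer optimal stay, which is the qualitative behaviour the marginal value theorem predicts; whether real mark accrual follows such a curve remains an open question.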

The tangent, AB, the intersection with the time line at A, indicates a period before T0 the start of answering the question. This time period might correspond to a search time (or delay time, s), i.e. before the start of the handling (or writing) time, T0. The time from T0 corresponds to the time h from our consideration of optimal foraging theory; capture, killing, eating and digesting in the wild, but for mark accretion by students. Thus, Tp(opt)’ can be increased if this head start can be given. This is shown in Figure 3 with a new tangent to F(Tp) as CD. Is it possible to achieve such a head start? Several ways can be suggested. The use of appropriate questions and phraseology, at least making sure that students know what is wanted, is one requirement. Setting ‘seen’ examinations may be another. In general, any way in which students’ cognitive processes can be activated before they start to write, such as reading through a question in advance, would be beneficial. We can also use the marginal value theorem to cast further light on the assessment and mark-awarding process. Figure 3 adapts Figure 2 with the addition of a dashed line CD, again as a tangent to F(Tp). Its lower gradient than AB results in a longer stay (dwell) time, i.e. Tp(opt)’ is further from the origin, T0, than previously, and secondly, the tangent intersects the ordinate at a higher value. There are several ways this might be produced for animal predators. How might this be interpreted for student mark attainment? The shape of F(Tp) shown in Figure 3 is given in Krebs, Houston, and Charnov (1981), and is similar to a plot of the Michaelis–Menten theory of enzyme kinetics. In ascribing marks, it suggests that the first few marks are essentially easy to get. With some standards (as opposed to marking criteria) this might correspond to 15% rather than 0%. However, this curve may not model mark gain appropriately, so we might look at other mathematical forms of curve. 
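To make the contrast between curve shapes concrete, the sketch below compares the marks gained in the first minute under a Michaelis–Menten-type curve with those gained under a Gompertz curve, one of the sigmoid forms considered next. The three-parameter form y(t) = a exp(b exp(ct)) with b and c negative is assumed for the Gompertz curve, and every parameter value is hypothetical, chosen only so that both curves approach 100 marks.

```python
import math


def saturating(t, a=100.0, k=20.0):
    """Michaelis-Menten-type curve a*t/(k+t): steepest at t = 0."""
    return a * t / (k + t)


def gompertz(t, a=100.0, b=-5.0, c=-0.08):
    """Gompertz curve y(t) = a * exp(b * exp(c * t)), with b < 0 and c < 0."""
    return a * math.exp(b * math.exp(c * t))


def marginal_gain(curve, t, dt=1.0):
    """Marks gained over the next dt minutes, starting from time t."""
    return curve(t + dt) - curve(t)


# Marks picked up in the first minute under each assumed curve.
early_saturating = marginal_gain(saturating, 0.0)
early_gompertz = marginal_gain(gompertz, 0.0)

# With these parameters the Gompertz curve is steepest where b * exp(c * t) = -1.
t_inflection = math.log(5.0) / 0.08
mid_gompertz = marginal_gain(gompertz, t_inflection)
```

With these illustrative parameters the saturating curve awards its marks fastest at the very start, whereas the Gompertz curve starts slowly and gains fastest part-way through the answer.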
The most obvious alternatives are sigmoid functions, such as the logistic function and its relatives, the error function and the Gompertz function. The latter is shown in Figure 4. These functions show slow growth at the start (the Gompertz function is, in addition, asymmetric about its inflection point) and have been used to model learning

Figure 3. The Marginal Value model of optimal time in a patch, from Krebs et al. (1981). The optimal stay time Tp(opt) resulting from the tangent AB to F(Tp) can be lengthened by adding, at an ‘energetic cost’ T0C, the line CD which shifts Tp(opt) to the right to a new position Tp(opt)’. The ‘energy’ required could be produced, educationally, by reducing the transit time (horizontally) or by overcoming troublesome knowledge or other impediment (vertically). The time AT0 (i.e. Tt) might be envisaged as ‘thinking time’ in preparation for the answer.


Assessment & Evaluation in Higher Education


Figure 4. A Gompertz function as a mark attainment curve against time (as abscissa). The precise shape is governed by three parameters (in the general form $y(t) = a e^{b e^{ct}}$). A notable aspect of this function is that, like other sigmoid functions such as the logistic, it has cumulative growth properties: previous values build the curve. The insets show families of the Gompertz function with different parameters.

and growth processes (e.g. Mahajan and Muller 1979). There are apparently no data on what mark accretion curves look like, let alone what they ‘should’ look like, although something like Figure 4 is likely. Although the exact form of the function is not known, applying marginal value theory to Figure 4 still provokes the same sorts of question: how do marks accrue in answering a question, where are the grade boundaries, how difficult is the question, and so on?

Some other considerations of mark accrual involving choices: prospect theory

It is also interesting to apply prospect theory (Kahneman and Tversky 1979) to the way in which a student actually provides an answer. Prospect theory describes decisions between alternatives that involve risk or uncertain outcomes. The risk might be answering a question without having done the reading, or being faced with a choice of questions where the candidate knows (or thinks they know) the material well enough to get a good mark. The theory consists of two stages: editing and evaluation. For an examination, the evaluation phase, writing the answer, is preceded by an ‘editing stage’ where the outcomes of the decision to do a particular question are appraised (usually heuristically). This may be important in determining how well a student plans an answer. The evaluation builds upon what students ‘know’, think they know, expect to come up in the examination, or actually write down. Decisions made in the first few moments of an (unseen) examination can influence the marks achieved thereafter.

Consider an unseen examination with a choice of questions to answer. First, a decision has to be made about which optional question to answer. Obvious constraints include how much revision has been undertaken and how prepared students are. Additionally, students may perceive a question as being ‘hard’.
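How a choice between questions might be appraised under risk can be sketched as follows. This is only an illustration: it assumes a power-form value function with the commonly quoted parameter estimates from Tversky and Kahneman's later work on cumulative prospect theory (exponent 0.88, loss aversion 2.25), none of which is stated in this paper. The reference mark, the two candidate questions and their outcome probabilities are hypothetical, and probability weighting is omitted for brevity.

```python
def prospect_value(mark, reference, alpha=0.88, loss_aversion=2.25):
    """Subjective value of a mark relative to the mark the student expects.

    Gains are valued as x**alpha; losses are additionally scaled by loss_aversion.
    The default parameters are assumed, not taken from the paper.
    """
    x = mark - reference
    if x >= 0:
        return x ** alpha
    return -loss_aversion * (-x) ** alpha


def question_value(outcomes, reference):
    """Probability-weighted value of a question; outcomes = [(probability, mark), ...]."""
    return sum(p * prospect_value(mark, reference) for p, mark in outcomes)


# Hypothetical choice: a 'safe' question versus a 'hard' one with the same expected mark.
reference_mark = 60.0
safe_question = [(1.0, 60.0)]
hard_question = [(0.5, 75.0), (0.5, 45.0)]

safe_value = question_value(safe_question, reference_mark)
hard_value = question_value(hard_question, reference_mark)
print(f"safe: {safe_value:.2f}, hard: {hard_value:.2f}")
```

Both hypothetical questions have the same expected mark, yet the loss-averse valuation favours the safe one, which is the kind of heuristic appraisal the editing stage describes.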
Indeed, examiners themselves might view a particular question as ‘hard’, but how should this be marked in practice? Without extending a discussion as to how ‘hard’ is interpreted or what it entails in an answer, we might still ask what should (or might) the curve look like? Figure 5 shows mark accrual for a question. Curve a is a possible ‘response curve’ for a question’s mark scheme expected by the examiner. It is then clear that tutors need to fully explain the nature of ‘harder’ questions in order to be fair to all. This would not matter if the mark awarded (or gain) graph was linear, but in the type of response we are envisaging this is unlikely to be the case. Hence, we come back to optimal foraging theory, marginal value theory and the implications for student choice, answering technique and what have been termed the ‘rules of the game’ (Norton et al. 2001; Bloxham and West 2004).

Figure 5 is similar to Figure 1 of Sadler (2010). However, Sadler’s diagram shows the attainment for two students (curves b and c) over four equal time periods for different assessments. This paper is concerned with individual marking of questions, but does suggest that ‘curve matching’ might be useful in making comparisons of assessment schemes and student responses to a question in terms of mark accrual over a given time. This has yet to be determined quantitatively, but suggests the importance of establishing assessment criteria in order to facilitate comparisons.

In Figure 5, curve a is a Gompertz function, showing a written response (the evaluation stage of prospect theory) with Topt positioned about t3. Response curve b shows an initial slow mark accrual, perhaps with the student in a rush at the end of the answer, where the ‘optimal’ stay time Topt is near t3 but nevertheless with the student having gained about 80% of available marks. For c, the written response shows a very rapid mark gain, i.e. high E/h in terms of optimal foraging theory. For student response c, the written material might have been selected judiciously (what the examiner wanted), or the student was able to write fast or perhaps used Tt (of Figure 3) wisely. In all three cases, the curves show that it is difficult to achieve the last 10–20% of the available marks.
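The difficulty of gaining the final marks can be illustrated with a minimal sketch, assuming the Gompertz form y(t) = a·exp(b·exp(c·t)) with negative b and c. The parameter values below are hypothetical; they are not fitted to any data or to the curves of Figure 5, and serve only to show how marks per minute fall off towards the end of an answer.

```python
import math


def gompertz(t, a=100.0, b=-5.0, c=-0.1):
    """Marks accrued after t minutes: a * exp(b * exp(c * t)), with b < 0 and c < 0."""
    return a * math.exp(b * math.exp(c * t))


def time_to_fraction(fraction, b=-5.0, c=-0.1):
    """Minutes needed to reach a given fraction (0 < fraction < 1) of the maximum a.

    Inverts the Gompertz form: t = ln(ln(fraction) / b) / c.
    """
    return math.log(math.log(fraction) / b) / c


t80 = time_to_fraction(0.80)
t95 = time_to_fraction(0.95)

# Average marks per minute up to 80% of the marks, and for the next 15%.
start_marks = gompertz(0.0)
early_rate = (gompertz(t80) - start_marks) / t80
late_rate = (gompertz(t95) - gompertz(t80)) / (t95 - t80)

print(f"80% reached after {t80:.1f} min, 95% after {t95:.1f} min")
print(f"marks per minute: early {early_rate:.2f}, late {late_rate:.2f}")
```

With these assumed parameters the rate of mark gain between 80% and 95% is well below the average rate up to 80%, so a student weighing time against marks might reasonably move on before the curve flattens.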
Feedback to students might indicate why these final marks are hard to gain. For instance, in terms of prospect theory, it could be a poor choice of question, in that a student thought they knew more than they did. Alternatively, they might have taken the risk that a descriptive answer, rather than an analytical one, would suffice to get some marks. At present, there seems to be no work on utility theory or prospect theory (e.g. Kahneman 2011) in student choice under assessment conditions.

Figure 5. Attainment (= marks accrued to maximum possible in the time available) for three types of response to a question, with comparisons over time quartiles. Curve a is a Gompertz function, perhaps showing an average response; curve b shows slow take-up of the required response, perhaps with a rush at the end of the answer. For curve c, the response is very rapid, but does the person know the answer or the best way to respond to the question? As well as being expected responses, the curves could also map the expectation of the examiner for what might be deemed to be an ‘easy’ (c), ‘average’ (a) and ‘difficult’ (b) question. A Gompertz-type curve is assumed for mark attainment in each case.

Discussion

Various forms of assessment have been used widely for many purposes (Stobart 2008). Boud (2007) examines various higher education policies and highlights various statements about assessment: ‘quality assurance, of confirming learning outcomes’, ‘as an activity of determining achievement’, ‘of quality assurance, in terms of ensuring confidence in standards and procedures’ and ‘measurement of outcomes’. Boud points out that the dominant discourse is that assessment constructs learners as passive subjects (17). He then starts to provide some realignment that places ‘informing judgement’ centrally and ‘is able to include key graduate learning attributes as an intrinsic part of what assessment is for’, and ‘gives prominence to students making judgements about their own learning’. However, implementation of assessment (in its broadest terms) needs to be a product of institutional policies and, especially, of the practice of tutors and examiners. Furthermore, students themselves need transparency in the assessment process, not least in peer-assessment practices.

My experience of discussing the ‘theory’ presented in this paper with students suggests that they appreciate it. Although they may not do ‘better’ in the various forms of assessment I have set, at least they feel more relaxed with the process of assessment. Theory helps to inform their judgement of assessment tasks, and thereby provides some immediate feed-forward as they are engaged on the assessment. This applies to all forms of assessment, especially examinations where students are under pressure to perform well, and where decisions of which questions to answer and what to write may account for a substantial proportion of a module’s marks. Thus, we need to be careful in addressing student preparation for learning in general as much as examination taking.
This may well be critical in assisting students to make the transition from compulsory to tertiary education. The simple models outlined here provide contexts for times taken in project work as well as examinations of various kinds, and they also allow a corresponding view of marks gained per unit time. The ideas thus help identify ways in which continuously assessed work might be made appropriately meaningful and set in an appropriate time framework. The variety of experiential, problem-based and similar types of assessment needs more careful treatment and explanation in awarding marks than we normally provide. Furthermore, how students spend time on task and identify time-wasting activities may be as important as what they write, and decision-making under risk also has a part to play in task completion or in submitting material late.

The ecological theory adapted here helps reveal something about assessment processes and deserves a more comprehensive evaluation in the context of pedagogy and practice (Baum, Yorke, and Coffey 2004). Posing such questions in the framework of Chickering and Ehrmann’s (1996) ‘seven principles of good education’, and within analyses of student and tutor behaviour, can help guide both communities towards better educational practices. Such practices may go beyond the general seven principles and relate to specific aspects of assessment, such as the manner of scaling and aggregating students’ marks (McLachlan and Whiten 2000). More detailed contextual use of the optimal foraging theory and marginal value theory models will be discussed in a subsequent paper.


Along with a generalised foraging model, decision-making processes are also important and are implicit in general human behaviour, and should thus be accommodated in ‘learning how to learn’ as much as feedback for ‘assessment for learning’ (Stobart 2008). Falchikov (2005) lists and discusses seven pillars of assessment: why?, how to?, what to?, when to?, how well do we?, who assesses? and what next? With respect to ‘what next?’, she asks (254), ‘What is there on the horizon in terms of involving students in assessment’ and suggests that ‘A first step … would be to bring at least a modicum of light where there is now darkness’. We have much still to understand about individual cognitive processes for student learning, including assessment, and tutors’ marking. Whilst the ideas suggested in this paper are still a long way from providing any cognitive theory of assessment, they do indicate ways in which students can be shown what is expected of them and tutors what to expect of students’ responses. Newstead (2004, 97) suggested that ‘things may improve if lecturers communicate clearly with students what the purposes of assessment are and ensure that their marking reflects those purposes’.

Conclusions

The use of optimal foraging theory, the marginal value theorem and prospect theory helps identify what we might want from assessment and its evaluation in all its forms. We need a wider view of what assessment is actually for, which is not the same as how it is used. Assessment also needs to be viewed from student as well as tutor positions. The validity of various assessment practices (Newstead 2002), as well as concepts such as moderation, grade integrity, fidelity and indeterminacy, raised by Sadler (2009a, 2009b, 2009c, 2010), can also be identified and even examined by the tools outlined here. It is also possible to place assessment within a student-personalised framework (Newstead 2004; Falchikov 2005) and within ideas of human behaviour (e.g. Bird and O’Connell 2006; Kahneman 2011).

Acknowledgement

I thank Professor Bob Elwood for comments on an earlier version of this paper.

Notes on contributor

W. Brian Whalley taught and researched in geomorphology and geology for most of his academic life at The Queen’s University of Belfast. He retired from QUB and is now Emeritus at the University of Sheffield. He was awarded a National Teaching Fellowship in 2008. He continues to work on practical pedagogy, including participation in a recently completed HEA-funded project on ‘Enhancing Fieldwork Learning’. He is now learning how to fly a glider, which places him firmly back on the experiential learning cycle ab initio.

References

Baum, D., M. Yorke, and M. Coffey. 2004. “What is Happening When We Assess, and How Can We Use Our Understanding of This to Improve Assessment?” Assessment & Evaluation in Higher Education 29 (4): 451–477.
Beetham, H. 2013. “Designing for Active Learning in Technology-rich Contexts.” In Rethinking Pedagogy for a Digital Age, edited by H. Beetham and R. Sharpe, 31–48. New York: Routledge.


Biggs, J. 2001. “Enhancing Learning: A Matter of Style or Approach?” In Perspectives on Thinking, Learning and Cognitive Styles, edited by R. J. Sternberg and L.-F. Zhang, 73–102. Mahwah, NJ: Lawrence Erlbaum.
Biggs, J. 2003. Teaching for Quality Learning at University. Buckingham: Open University Press.
Bird, D. W., and J. F. O’Connell. 2006. “Behavioral Ecology and Archaeology.” Journal of Archaeological Research 14 (2): 143–188.
Bloxham, S., and A. West. 2004. “Understanding the Rules of the Game: Marking Peer Assessment as a Medium for Developing Students’ Conceptions of Assessment.” Assessment & Evaluation in Higher Education 29 (6): 721–733.
Boud, D. 2007. “Reframing Assessment as If Learning Was Important.” In Rethinking Assessment in Higher Education: Learning for the Longer Term, edited by D. Boud and N. Falchikov, 14–25. London: Routledge.
Boud, D., and N. Falchikov. 2007. Rethinking Assessment in Higher Education: Learning for the Longer Term, 206. London: Routledge.
Brown, S., and R. Pickford. 2006. Assessing Skills and Practice. London: Routledge.
Bryan, C., and K. Clegg. 2006. Innovative Assessment in Higher Education, 233. London: Routledge.
Charnov, E. L. 1976. “Optimal Foraging: The Marginal Value Theorem.” Theoretical Population Biology 9: 129–136.
Chickering, A. W., and S. C. Ehrmann. October, 1996. “Implementing the Seven Principles: Technology as Lever.” AAHE Bulletin 306. http://www.clt.astate.edu/clthome/ImplementingtheSevenPrinciples,EhrmannandChickering.pdf.
Cottrell, S. 2012. The Exam Skills Handbook. Basingstoke: Palgrave Macmillan.
Dunn, L., C. Morgan, M. O’Reilly, and S. Parry. 2004. The Student Assessment Handbook. London: RoutledgeFalmer.
Falchikov, N. 2005. Improving Assessment through Student Involvement. London: Routledge.
Gardner-Medwin, A. R. 2006. “Confidence-based Marking.” In Innovative Assessment in Higher Education, edited by C. Bryan and K. Clegg, 141–149. London: Routledge.
Hallinan, J. T. 2009. Why We Make Mistakes. New York: Broadway Books.
Kahneman, D. 2011. Thinking, Fast and Slow. London: Allen Lane.
Kahneman, D., and A. Tversky. 1979. “Prospect Theory: An Analysis of Decision under Risk.” Econometrica 47: 263–291.
Kamil, A. C., J. R. Krebs, and H. R. Pulliam. 1987. Foraging Behavior. New York: Plenum Press.
Knight, P. T., and M. Yorke. 2003. Assessment, Learning and Employability. Maidenhead: Open University Press.
Krebs, J. R., A. I. Houston, and E. L. Charnov. 1981. “A Comparative Analysis of Optimal Foraging Behavior: Laboratory Simulations.” In Foraging Behaviour, edited by A. C. Kamil and T. D. Sargent, 3–18. New York: Garland/STPM Press.
Mahajan, V., and E. Muller. 1979. “Innovation Diffusion and New Product Growth Models in Marketing.” The Journal of Marketing 43 (4): 55–68.
McLachlan, J. C., and S. C. Whiten. 2000. “Marks, Scores and Grades: Scaling and Aggregating Student Assessment Outcomes.” Medical Education 34: 788–797.
Meyer, J. H. F., and R. Land. 2005. “Threshold Concepts and Troublesome Knowledge (2): Epistemological Considerations and a Conceptual Framework for Teaching and Learning.” Higher Education 49 (3): 373–388.
Miller, C. M. L., and M. Parlett. 1974. Up to the Mark. Guildford: Society for Research in Higher Education.
Newstead, S. 2002. “Examining the Examiners: Why Are We So Bad at Assessing Students?” Psychology Learning and Teaching 2 (2): 70–75.
Newstead, S. 2004. “The Purposes of Assessment.” Psychology Learning and Teaching 3 (2): 97–101.
Norton, L., A. J. Tilley, S. E. Newstead, and A. Franklyn-Stokes. 2001. “The Pressures of Assessment in Undergraduate Courses and Their Effect on Student Behaviours.” Assessment & Evaluation in Higher Education 26 (3): 116–126.
Novak, G. M., E. T. Patterson, A. D. Gavrin, and W. Christian. 1999. Just-in-Time Teaching. Upper Saddle River, NJ: Prentice Hall.


Pirolli, P. 2007. Information Foraging Theory: Adaptive Interaction with Information. Oxford: Oxford University Press.
Race, P. 2001. The Lecturer’s Toolkit: A Resource for Developing Learning, Teaching and Assessment. London: Kogan Page.
Race, P. 2005. Making Learning Happen. London: Sage.
Ramsden, P. 2003. Learning to Teach in Higher Education. London: RoutledgeFalmer.
Rust, C. 2007. “Towards a Scholarship of Assessment.” Assessment & Evaluation in Higher Education 32 (2): 229–237.
Rust, C., M. Price, and B. O’Donovan. 2003. “Improving Students’ Learning by Developing Their Understanding of Assessment Criteria and Processes.” Assessment & Evaluation in Higher Education 28 (2): 146–164.
Sadler, D. R. 2005. “Interpretations of Criteria Based Assessment and Grading in Higher Education.” Assessment & Evaluation in Higher Education 30 (2): 175–194.
Sadler, D. R. 2009a. “Grade Integrity and the Representation of Academic Achievement.” Studies in Higher Education 34 (7): 807–826.
Sadler, D. R. 2009b. “Indeterminacy in the Use of Preset Criteria for Assessment and Grading.” Assessment & Evaluation in Higher Education 34 (2): 159–179.
Sadler, D. R. 2009c. Moderation, Grading and Calibration. Good Practices in Assessment Symposium, Griffith University. Accessed August 7, 2014. http://www.griffith.edu.au/__data/assets/pdf_file/0017/211940/GPA-Symposium2009-Edited-Keynote-Address-FINAL.pdf
Sadler, D. R. 2010. “Fidelity as a Precondition for Integrity in Grading Academic Achievement.” Assessment & Evaluation in Higher Education 35 (6): 727–743.
Shirky, C. 2009. It’s Not Information Overload. It’s Filter Failure. Accessed August 6, 2014. http://web2expo.blip.tv/file/1277460
Stephens, D. W. 2008. Foraging: Behaviour and Ecology. Chicago, IL: University of Chicago Press.
Stephens, D. W., and J. R. Krebs. 1986. Foraging Theory. Princeton, NJ: Princeton University Press.
Sternberg, R. J. 1997. Thinking Styles. Cambridge: Cambridge University Press.
Stobart, G. 2008. Testing Times: The Uses and Abuses of Assessment. London: Routledge.
Sweller, J. 1994. “Cognitive Load Theory, Learning Difficulty, and Instructional Design.” Learning and Instruction 4: 295–312.
Toffler, A. 1971. Future Shock. London: Pan.
Whalley, W. B., and L. Taylor. 2008. “Using Criterion-referenced Assessment and ‘Preflights’ to Enhance Education in Practical Assignments.” Planet 20: 29–36.