The Requirements of Mathematical Reasoning in Upper secondary Level Assessments Torulf Palm, Jesper Boesen and Johan Lithner June 16, 2006
Abstract

Many studies have shown that students often use different kinds of mathematically superficial reasoning in their task solving endeavours. The focus on such reasoning may be one of the main reasons for the difficulties many students have in their learning of mathematics. Studies of textbooks and classroom instruction have revealed several characteristics of the learning environment that are likely to support the development and the continued use of superficial reasoning. In the study presented in this paper we investigate the mathematical reasoning required to solve the tasks in the Swedish national tests and a random selection of Swedish teacher-made tests. The results show that only a small proportion of the tasks in the teacher-made tests require the students to produce new reasoning and to consider the intrinsic mathematical properties involved in the tasks. In contrast, the results also show that the national tests include a large proportion of tasks for which memorisation of facts and procedures is not sufficient.
1 Introduction
It is widely shown that many students of all age groups often use mathematically superficial reasoning when solving all kinds of tasks (Cox, 1994; Palm, 2002; Schoenfeld, 1991; Tall, 1996; Verschaffel, Greer, and De Corte, 2000). The most elementary example is the key word strategy (Hegarty, Mayer, and Monk, 1995), which can be exemplified by the choice of using addition if the task text includes the word ‘more’ and subtraction if the text includes the word ‘less’. Studies on upper secondary and undergraduate students indicate that reasoning focusing on what is familiar and remembered at a superficial level is dominant over reasoning based on mathematical properties of the components involved, even when the latter can lead to considerable progress
(Bergqvist, Lithner, and Sumpter, 2003; Lithner, 2000, 2003, in preparation). The students’ beliefs often do not seem to include the latter type of mathematical reasoning as a main approach, even though they master the necessary knowledge base. Since the students seldom make any attempts to construct their own solution reasoning, it is crucial for them to find solution procedures to copy, and the choice of procedure to use is often made on mathematically superficial grounds. The reliance on such reasoning is not likely to be efficient for the learning of advanced mathematical thinking or for achieving relational understanding (Skemp, 1978) of basic mathematical concepts and ideas. In addition, it is likely to be devastating for task solution success when dealing with non-routine tasks for which no ready-made solution procedures are directly available to the students. Indeed, a large body of research has shown that many students of different age groups have great difficulty solving non-routine tasks (Boesen, Lithner, and Palm, 2005; Schoenfeld, 1985; Selden, Selden, and Mason, 1994; Verschaffel et al., 2000). However, the focus on remembering procedures superficially related to the task at hand also limits the possibilities of task solution success on routine tasks when the procedure is forgotten or a mistake in the procedure is made. The reasons for this focus on mathematically superficial reasoning may, at least to a large extent, be found in the students’ learning environment. There has been much criticism of the experienced ‘diet’ of stereotypical tasks that students in elementary and lower secondary school encounter (Reusser, 1988; Verschaffel et al., 2000).
In a study of undergraduate calculus textbooks (Lithner, 2004) it was found that most of the tasks could be solved, and actually were solved (Lithner, 2003), by looking for and copying procedures earlier in the same textbook section without considering relevant mathematical properties. Such characteristics of the tasks included in the textbooks may be one influential factor in the development of mathematically superficial reasoning. The study reported in this paper deals with assessment, another component of the learning environment that has been claimed to be capable of substantially influencing student learning. The purpose of the study is to investigate which kinds of mathematical reasoning are required to successfully solve the tasks in the written tests that upper secondary school students encounter in their mathematical studies (including both national tests and teacher-made classroom tests). The results of the study may contribute to knowledge of the characteristics of the students’ learning environment and extend the foundation for explanations of the students’ choices of task solution strategies. A theoretical framework for
mathematical reasoning (Lithner, 2005) will be used for the definition of the concept mathematical reasoning used in this paper and for descriptions of different kinds of mathematical reasoning. In addition, a framework has been developed for the classification of the tasks in terms of the mathematical reasoning required to successfully solve them. This framework includes a procedure for analysing the tasks as well as a set of task variables to consider in this procedure. The framework for mathematical reasoning will be described in the next section about Mathematical reasoning and the framework for analysing the assessment tasks will be described in the Method section.
2 Mathematical reasoning
The framework for analysing mathematical reasoning is a summary of Lithner (2005), which is a theoretical structuring of the outcomes of a series of empirical studies aimed at analysing characteristics of the relation between reasoning types and learning difficulties in mathematics (Bergqvist et al., 2003; Lithner, 2000, 2003, 2004). The framework defines different types of mathematical reasoning found in the empirical studies. These comprise rich problem solving (in terms of Creative mathematically founded reasoning) and a family of reasoning types characterised by a striving to recall algorithms or facts (in terms of Imitative reasoning). The reasoning types will be defined below (see Figure 1 for an overview).

Figure 1: An overview of empirically established reasoning types.

The framework is chosen for this study since it

a. is firmly anchored in empirical data in the sense that it captures key characteristics of reasoning that in empirical studies have been found to be used by students.
b. does not only consist of vague abstract definitions and sets of examples, but is built on well-defined concepts and can be used as a concrete tool for determining characteristics of empirical data.

c. is uniform in the sense that characterisations of different reasoning types are comparable within the framework. This enables connections between different learning and achievement situations to be identified, in order to explain origins of reasoning types and their relations to learning difficulties.

d. can be used to communicate key phenomena between researchers, students, teachers, textbook authors, curricula developers etc., and provide a basis for analysing, understanding, and constructing learning environment components like tests, textbooks and teaching.

Before defining the different reasoning types, a number of terms that are used in the definitions will be treated. Reasoning in this paper is the line of thought, the way of thinking, adopted to produce assertions and reach conclusions. It is not necessarily based on formal deductive logic, and may even be incorrect as long as there is some kind of sensible (to the reasoner) argument that guides the thinking. Argumentation is the substantiation, the part of the reasoning that aims at convincing oneself, or someone else, that the reasoning is appropriate. In particular in a task-solving situation, which is called a problematic situation if it is not clear how to proceed, two types of argumentation are central: (1) The strategy choice, where ‘choice’ is seen in a wide sense (choose, recall, construct, discover, guess, etc.), can be supported by predictive argumentation: Will the strategy solve the difficulty? (2) The strategy implementation can be supported by verificative argumentation: Did the strategy solve the difficulty?
2.1 Creative Mathematically Founded Reasoning
When contrasting creative reasoning in mathematics to imitative reasoning there are two types of considerations to make that will be briefly discussed below: What makes it creative and what makes it mathematical? Creativity. According to Haylock (1997) there are at least two major ways in which the term is used: i) thinking that is divergent and overcomes fixation and ii) the thinking behind a product that is perceived as grandiose by a large group of people. Silver (1997) argues that “although creativity is being associated with the notion of ‘genius’ or exceptional ability, it can be productive for mathematics educators to view creativity instead as an orientation or disposition toward mathematical activity that can be fostered broadly in the
general school population”. Thus, a notion of creativity limited to ii), the thinking of geniuses or the creation of great ideas with large impact on our society, is not suitable for the purposes of this paper. Instead, what is central are the creative aspects of ordinary students’ everyday task-solving thinking: the reasoning that goes beyond just following strict algorithmic paths or recalling ideas provided by others. Regarding i), Haylock (1997) sees two types of fixation. Content universe fixation concerns the range of elements seen as appropriate for applications to a given problem: useful knowledge is not seen as useful. Algorithmic fixation is shown in the repeated use of an initially successful algorithm that becomes an inappropriate fixation. According to Silver (1997), a new research-based view of creativity suggests that it is related to deep, flexible knowledge in content domains and associated with long periods of work and reflection rather than rapid and exceptional insights. The framework of this paper amalgamates Haylock’s and Silver’s views and sees fluency, flexibility and novelty as key qualities of creativity. However, in the analyses of the reasoning in the empirical studies that resulted in the framework (Lithner, 2005), a need arose to characterise what distinguishes creative mathematical reasoning from general creative reasoning. This resulted in complementing the description of creativity by adding the aspects of plausibility and mathematical foundation. Plausibility. One could claim that the argumentation in creative mathematical reasoning should be logically strict as in proof, but this is inappropriate in the school context. In school tasks, one of the goals is also to achieve a high degree of certainty, but one crucial distinction from professional tasks is that within the didactic contract (Brousseau, 1997) of school it is allowed to guess, to take chances, and to use ideas and reasoning that are not completely firmly founded.
Even in exams it is often accepted to have only, for example, 50% of the answers correct, while it would be absurd if mathematicians or engineers were correct in only 50% of their conclusions. This implies that it is allowed, and perhaps even encouraged, within school task solving to use forms of mathematical reasoning with considerably reduced requirements on logical rigour. Pólya (1954, p. iv) stresses the important role of reasoning that is less strict than proof: “In strict reasoning the principal thing is to distinguish a proof from a guess, a valid demonstration from an invalid attempt. In plausible reasoning the principal thing is to distinguish a guess from a guess, a more reasonable guess from a less reasonable guess”. Mathematical foundation. In this framework, well-founded arguments are anchored in intrinsic properties of components involved in the reasoning. Before specifying this foundation it is necessary to briefly discuss the components one is reasoning about, which consist of objects, transformations, and concepts.
The object is the fundamental entity, the ‘thing’ that one is doing something with or the result of doing something, e.g. numbers, variables, functions, graphs, diagrams, matrices, etc. A transformation is what is being done to an object (or several), and the outcome is another object (or several). Counting apples is a transformation applied to real-world objects and the outcome is a number. To calculate a determinant is a transformation on a matrix. A well-defined sequence of transformations, e.g. finding a local maximum of a third-degree polynomial, will be called a procedure. A concept is a central mathematical idea built on a related set of objects, transformations, and their properties, for example the concept of function or the concept of infinity. A property of a component is mathematical if it is accepted by the mathematical community as correct. Since a property of a component may be more or less relevant in a particular context and problematic situation, it is necessary to distinguish between intrinsic properties that are central and surface properties that have no or little relevance in a specific situation. In deciding which of the fractions 99/120 and 3/2 is the larger, the size of the numbers (99, 120, 3 and 2) is a surface property that is insufficient to consider in this particular task, while the quotient captures the intrinsic property. With the preparations made above it is now possible to define Creative mathematically founded reasoning (CR) as fulfilling the following conditions:

I. Novelty. A new (to the reasoner) sequence of solution reasoning is created, or a forgotten sequence is recreated. To imitate an answer or a solution procedure is not included in CR.

II. Flexibility. It fluently admits different approaches and adaptions to the situation. It does not suffer from fixation that hinders the progress.

III. Plausibility.
There are arguments supporting the strategy choice and/or strategy implementation, motivating why the conclusions are true or plausible. Guesses, vague intuitions and affective reasons are not considered.

IV. Mathematical foundation. The argumentation is founded on intrinsic mathematical properties of the components involved in the reasoning.
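The fraction example given for intrinsic versus surface properties can be made concrete with a short computation (an illustration of ours, not taken from the framework):

```python
from fractions import Fraction

# Surface property: the component numbers 99 and 120 are larger than
# 3 and 2, which misleadingly suggests that 99/120 is the larger fraction.
# Intrinsic property: the value of the quotient itself decides the task.
a = Fraction(99, 120)                      # 99/120 = 33/40, less than 1
b = Fraction(3, 2)                         # 3/2, greater than 1

surface_suggests_a = 99 > 3 and 120 > 2    # misleading surface comparison
intrinsic_answer = max(a, b)               # compares the actual quotients

print(surface_suggests_a)   # True
print(intrinsic_answer)     # 3/2
```

The surface comparison of component sizes points the wrong way; only the quotient comparison resolves the task correctly.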
2.2 Imitative Reasoning
Different versions of imitative reasoning are used by students more frequently than CR. In these types of reasoning students copy or follow a model or an example without any attempt at originality. Learning difficulties are partly related to a reduction of complexity that appears as a procedural focus on facts and algorithms and a lack of relational understanding (Lithner, 2005). Hiebert (2003) finds massive amounts of converging data showing that students know some basic elementary skills but there is not much depth and understanding. Leron and Hazzan (1997) argue that analyses of task-solving behaviour should not only consider attempts to understand the task, and successes and failures in such attempts. They emphasise additional non-cognitive means of trying to cope: attempts to guess and to find familiar surface clues for action, and the need to meet the expectations of the teacher or researcher. The definitions below aim at characterising superficial reasoning that may be based on such attempts to cope, where the two main types found in the empirical studies mentioned above are defined as Memorised and Algorithmic reasoning.

Memorised reasoning (MR) fulfils the following conditions: i) The strategy choice is founded on recalling an answer by memory. ii) The strategy implementation consists only of writing the answer down. Any part of the answer can be described without having considered the preceding parts. An example is to recall every step of a proof.

An algorithm is a set of rules that, if followed, will solve a particular task type. The most common algorithms consist of procedures. Algorithmic reasoning (AR) fulfils the following conditions: i) The strategy choice is founded on recalling by memory, not the whole answer in detail as in MR, but an algorithm that will guarantee that a correct solution can be reached. ii) After this algorithm is given or recalled, the reasoning parts that remain in the strategy implementation are trivial for the reasoner and only a careless mistake can hinder an answer to the task being reached.

Fundamental in AR is how to identify a suitable algorithm. If this can be done, the rest is straightforward. AR based on surface property considerations is common, often dominating, and Bergqvist et al. (2003) and Lithner (2000, 2003, 2004) have distinguished three (partly overlapping) families of common reasoning:

Familiar AR/MR (FAR).
This reasoning consists of attempts in the strategy choice to identify the task as being of a familiar type with a corresponding known solution algorithm or complete answer. The simplest example is a version of the key word strategy, where the word ‘more’ in a text is connected to the addition algorithm and ‘less’ to subtraction (Hegarty et al., 1995). Another example can be found in Lithner (2000), which describes how students make a holistic but superficial interpretation of the task text and reach a clear but faulty image that it is of a particular familiar type.

Delimiting AR (DAR). The algorithm is chosen from a set of algorithms that are available to the reasoner, and the set is delimited by the reasoner through the included algorithms’ surface property relations to the task. For example, if the task contains a second-degree polynomial the reasoner can
choose to solve the corresponding equation even if the task asks for the maximum of the polynomial (Bergqvist et al., 2003). Here the reasoner does not have to see the task as a familiar one.

Guided AR (GAR). An individual’s reasoning can be guided by a source external to the task. The two main types empirically found are: (i) person-guided AR, when someone (e.g. a teacher or a peer) pilots a student’s solution; and (ii) text-guided AR, where the strategy choice is founded on identifying similar surface properties in an example, definition, theorem, rule, or some other situation in a text source connected to the task.

Using the definitions of AR and MR we can now also define two different categories of CR: local and global CR. Reasoning that is mainly based on MR or AR but contains minor, local elements of CR will be called local CR (LCR), while reasoning that contains large elements of CR is called global CR (GCR). The latter may still contain large elements of MR or AR. One difference between LCR and MR/AR is that the latter may be possible to carry out without considering any intrinsic mathematical properties. CR cannot be done arbitrarily, not even locally, and in LCR it may be necessary to understand large parts of the task in order to make the required local decisions.
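The reasoning types defined in this section and their relations can be summarised as a small tree (a hypothetical illustration of ours mirroring the overview in Figure 1, not part of the published framework):

```python
# The reasoning-type taxonomy as a nested dictionary: keys are reasoning
# types, values are their sub-types (empty dict = no sub-types).
TAXONOMY = {
    "Creative mathematically founded reasoning (CR)": {
        "Local CR (LCR)": {},
        "Global CR (GCR)": {},
    },
    "Imitative reasoning": {
        "Memorised reasoning (MR)": {},
        "Algorithmic reasoning (AR)": {
            "Familiar AR/MR (FAR)": {},
            "Delimiting AR (DAR)": {},
            "Guided AR (GAR)": {},
        },
    },
}

def leaves(tree):
    """Collect the most specific (leaf) reasoning types of the taxonomy."""
    result = []
    for name, sub in tree.items():
        result.extend(leaves(sub) if sub else [name])
    return result

print(leaves(TAXONOMY))
```

The six leaves (LCR, GCR, MR, FAR, DAR, GAR) are the categories used in the classification of tasks later in the paper.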
3 Research questions
The purpose of this paper is to investigate which kinds of mathematical reasoning are required of Swedish upper secondary school students to solve the tasks in the assessments they encounter in their mathematical studies. Using the terminology of the theoretical framework above, the following specific research questions will be addressed:

Q1. What are the proportions of tasks that require GCR for a successful solution in the national tests and the teacher-made tests that Swedish students from three large study programmes encounter?

Q2. Is there a difference between the three study programmes regarding the proportion of tasks in the teacher-made tests that require GCR for a successful solution?

Q3. Is there a difference between the mathematical courses regarding the proportion of tasks in the teacher-made tests and the national tests that require GCR for a successful solution?
Q4. Is there a difference between the teacher-made tests and the national tests regarding the proportion of tasks that require GCR for a successful solution?
4 Method
4.1 The Tests

4.1.1 School System
The Swedish upper secondary school (grades 10-12) is not compulsory, but 98% of the students in compulsory school continue their studies in grades 10-12. In the upper secondary school each student attends one of 16 study programmes, which are vocational or more theoretical. School mathematics in the Swedish upper secondary school is divided into five consecutive courses, A-E. Course A is studied by all students and is often the only mathematics course taken by students following a vocational programme. Students studying the Social science programme are also required to take course B. Students following the Natural science programme are required to extend their mathematics studies to include at least courses C and D. The Swedish grading system is based on national criteria for four different grade levels: Not passed, Pass, Pass with distinction and Pass with special distinction. The national assessment system is therefore also criterion-based. The students receive a test grade, determined by cut scores decided in advance. The course grade is based on both the national test result and other performances during the course, such as performance on the teacher-made classroom tests. The national course tests (NCTs) for the courses A-D are administered twice a year. Normally, the tests have a ten-year period in which they are classified as strictly secret, but for some tests the secrecy is lifted so that teachers and students can see what the tests look like.
4.1.2 Test Selection Procedure
According to a survey of teachers accompanying the national tests in spring 2003, the national tests and the teacher-made classroom assessments constitute the vast majority of the tests the students encounter in upper secondary school. Therefore, these two kinds of tests were selected for the analysis. The study programmes selected for the sample of teacher-made assessments are the Natural science programme (NV), the Social science programme (SP) and the Hotel and restaurant programme (HP). The first
two were chosen because they are the largest theoretical programmes. HP was chosen since it is one of the largest vocational programmes and is attended by approximately equal numbers of boys and girls. A random selection of Swedish teacher-made tests for the courses A-D administered to students from the three chosen study programmes was included in the study. A multi-step procedure was applied to select the tests. First, there was a randomised selection of schools. In this selection all Swedish upper secondary schools were weighted according to the number of students attending the specific programmes of the study. For example, when selecting tests given to students attending the Natural science programme and studying course A, a school with 1000 students attending this programme and course was given 1000 posts and a school with 200 students was given 200 posts. Then, a randomised selection of teachers (teaching the specific programme and course in the school year 2003/2004) was made at each school picked out in the school selection. In this procedure schools could be selected more than once, and in those cases more than one teacher was selected. Finally, the teachers were asked to randomly select one of the tests they gave the students in the particular course and programme in 2003/2004. Only written tests with at least 30 minutes of test time were to be considered. In addition, all eight national tests administered for the courses A-D in 2003/2004 (two for each course) were selected for the analysis.
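The weighted school selection described above can be sketched in a few lines (a hypothetical illustration; the school names and enrolment figures are invented, and the paper does not specify how the weighting was implemented):

```python
import random

# Each school receives one "post" per student attending the given
# programme and course, so a school's selection probability is
# proportional to its enrolment for that programme and course.
def select_school(enrolment, rng):
    """Draw one school with probability proportional to its enrolment."""
    schools = list(enrolment)
    weights = [enrolment[s] for s in schools]
    return rng.choices(schools, weights=weights, k=1)[0]

enrolment = {"School A": 1000, "School B": 200, "School C": 50}
rng = random.Random(0)  # fixed seed for a reproducible demonstration
picks = [select_school(enrolment, rng) for _ in range(10_000)]

# School A holds 1000 of 1250 posts, so it should be drawn about 80% of the time.
print(picks.count("School A") / len(picks))
```

Repeated draws make the weighting visible: over many selections, each school's share of picks approaches its share of posts.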
4.2 Analysis
The kind of reasoning that is required to solve a task cannot be described by considering only the task itself. A task may be solved with algorithmic reasoning by a student to whom the algorithm is familiar, but the same task may require some form of creative reasoning from a less experienced student who has not seen or used the algorithm. In categorising tasks it is thus necessary to consider the relation (Niss and Jensen, 2002; Schoenfeld, 1985) between the task and the student the task is designed for. It is not possible to consider the students’ complete learning experiences, so we have restricted the analysis to the textbook, which is a part of the learning environment that was possible to analyse in this large-scale study. This is a reduction of complexity, but the dominant learning activity in the Swedish upper secondary school is the students’ work with their textbooks (Skolverket, 2003) and it seems that most classroom activities are guided by the textbook content. Internationally, too, this seems to be a quite common learning activity (Lester and Lambdin, 2004).
4.2.1 Analysis Procedure
One of the two parts of the framework developed for the classification of the assessment tasks, in terms of the mathematical reasoning required to solve them, is the procedure used for the analysis. The overall idea behind the procedure is to determine whether it is possible, and reasonable, that the students can solve the assessment task by using Algorithmic reasoning (AR), Memorised reasoning (MR), or some other mathematically superficial reasoning type that is not in the present reasoning framework, or whether Creative mathematically founded reasoning (CR) is required. For AR or MR to be sufficient to solve an assessment task it is assumed that (1) the students must have encountered an applicable algorithm or fact in the textbook on several occasions (so that they have a fair chance of being able to remember it) and (2) the characteristics of the assessment task must be such that the students can relate the task to tasks or examples in the textbook that can be solved with the same algorithm, or to facts in the textbook that solve the assessment task. Task variables considered important in this sense are described in the next section. The analysis procedure comprises the following steps:

I. Analysis of the assessment task – answers and solutions. Identification of the answers (for MR) or algorithms (for AR) that can solve the assessment task.

II. Analysis of the assessment task – other task characteristics. Description of the task by means of the task variables described in the next section.

III. Analysis of the textbook – answers and solutions.

a. A search for exercises and examples in the textbook that can be solved with the same answer or algorithm as the assessment task (when judging whether the tasks can be solved with the same algorithm, the equipment (such as calculators) available to the students must be taken into account).

b. A search in the theory text in the textbook for components that include the answer or algorithm. These components can be rules, theorems, described facts, etc.

IV. Analysis of the textbook – other task characteristics.
a. A search for exercises and examples in the textbook that are similar to the assessment task with respect to the task variables described in the next section.

b. A search in the theory text in the textbook for information that seems closely related to the assessment task with respect to the task variables described in the next section.

V. Conclusion about, and argumentation for, a required reasoning type. An argumentation, based on the information obtained in steps I-IV, backing the conclusion that it is possible and reasonable for the students to solve the assessment task with one of the mathematically superficial reasoning types, or backing the conclusion that creative and mathematically founded reasoning is required.

The following is a description of the essence of the argumentation backing the reasoning requirement classifications of the assessment tasks.

Familiar AR classification. The assessment task is judged to be very similar to at least three of the exercises or examples in the textbook in terms of the task variables described in the next section. Therefore, the students are judged to have the possibility to relate the assessment task to these textbook tasks and, in the assessment situation, apply the algorithms used in the textbook tasks. The existence of at least three such exercises or examples makes it likely that the connection between the exercises/examples and the assessment task can be made, and the assessment task is therefore classified as solvable by the reasoning type FAR (the choice of three as the required number of similar exercises or examples is of course somewhat arbitrary, but is based on the idea that many students need to see or work with a few tasks with similar characteristics before they can recall them).
The reasonableness of this number (and of the whole analysis) was validated by a study of the reasoning students actually used on tasks classified in this way (see the section Validity of the analysis later in this paper, and Boesen et al., 2005).

Guided AR classification. The test conditions include a textbook or a formula sheet that can be used to copy a described procedure that will solve the assessment task. The argumentation is similar to the one backing a FAR classification, but since the students do not need to recall the exercises or examples, only one such exercise or example is enough to classify an assessment task as solvable by GAR.

MR classification. This argumentation is also similar to the one backing a FAR classification, but the classification of MR requires three
answers in the exercises, examples or theory text that solve the assessment task. Note that for the classification of all of the mathematically superficial reasoning types FAR, GAR and MR, the whole task need not be experienced as familiar. It is sufficient that some of the important characteristics of the task variables described in the next section indicate to the students that familiar information can be used.

Other types of superficial (non-CR) reasoning. No such classifications were made.

CR classification. Reasonable mathematically superficial reasoning is judged not possible to use for solving the task, since the possible answers or algorithms solving the assessment task are not described in the textbook. In addition, creative and mathematically founded reasoning that will solve the task is judged to be possible for the students taking the assessment to carry out. That it is possible means that the content knowledge is included in the textbook and the required creative steps are judged to be of a kind that may be carried out by at least some of the students. Therefore, the assessment task is judged to require, and to be solvable by, CR.
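The classification logic above can be condensed into a short decision sketch (our own summary, not the authors' implementation; the function name and parameters are ours, while the threshold counts follow those stated in the text):

```python
# A hedged sketch of the reasoning-requirement classification: counts of
# similar textbook exercises/examples, matching answers, and the
# availability of aids determine the assigned reasoning type.
def classify(similar_with_same_algorithm, matching_answers, aid_available):
    """Return the reasoning type judged sufficient for an assessment task.

    similar_with_same_algorithm: number of textbook exercises/examples very
        similar to the task and solvable by the same algorithm.
    matching_answers: number of textbook answers that directly solve it.
    aid_available: True if the test allows a textbook or formula sheet
        containing at least one applicable procedure.
    """
    if aid_available:
        return "GAR"   # one accessible exercise or example suffices
    if similar_with_same_algorithm >= 3:
        return "FAR"   # three similar tasks assumed enough to recall
    if matching_answers >= 3:
        return "MR"    # three directly usable answers in the textbook
    return "CR"        # no superficial route: creative reasoning required

print(classify(4, 0, False))   # FAR
print(classify(0, 0, True))    # GAR
print(classify(1, 1, False))   # CR
```

The sketch omits step V's qualitative judgment that CR is actually within reach of the students; in the real procedure a CR classification also requires that argument.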
4.2.2 Analysis of Task Relatedness
The second part of the framework used for the classification of the tasks, in terms of their requirements on mathematical reasoning, is a set of task variables. Before defining what we mean by task variables we first define what we mean by a task. By a task in a test (or test item) we mean “. . . an instruction or question that requires a student response under certain conditions and specific scoring rules” (Haladyna, 1997). The term task variable will be used in accordance with its use in Kulm (1993), meaning “any characteristic of problem tasks which assumes a particular value from a set of possible values. A task variable may thus be numerical (e.g. the number of words in a problem) or classificatory (e.g. problem content area)”. The task variables included in the classification framework are used in the analysis to compare the characteristics of the assessment tasks with the characteristics of the theory, exercises and examples in the textbooks. The purpose of this comparison is to judge whether there is a similarity between the assessment task and what the students have met in the textbook. Such similarity may be sufficient for them to have reasonable possibilities of relating the assessment task to known solution algorithms or facts via the theory, exercises and examples in the textbook. Only task variables that can be argued to be important for these possibilities are included in the framework (for the argumentation for the included task variables see below). It is acknowledged that task solving takes place in a socio-cultural situation and that the choices made by the students depend on subject variables and situation variables as well as on task variables. However, in the analysis of the similarities between the assessment task and information in the textbook, only task variables are considered. The following is a description of the task variables chosen and the reasons for including them in the analysis.

1. Assignment. According to the definition above, an assessment task includes, explicitly or implicitly, a question or an instruction, an assignment, to do something. The assignment sets up the goals for the students’ work. Since it is a central as well as conspicuous characteristic of a task, it may be used to relate an assessment task to exercises or examples in the textbook. If the assignment in the assessment task is recognised as included in textbook exercises or examples that are solvable by a certain algorithm or fact, the students may conclude that the assessment task is solvable with the same algorithm or fact. In addition, the question was one of the similarity dimensions used by the students to relate tasks to each other in both a study by Silver (1997) and a study by Chartoff (1976). Examples of different types of assignments are: differentiate the following polynomial; find the minimum value of the function; and explain why the derivative of the function has the following numerical value.

2. Explicit information about the situation. It is common that the assignment is embedded in a description of some situation. This description may include two parts:

(a) Explicit information about mathematical components (the terms given, the values given, etc.)

(b) Explicit information about a real-life event

If the situation described in the task, the figurative context (Clarke and Helme, 1998), is purely mathematical, then only mathematical components are described in the task.
The mathematical components given in the task may suggest a solution method if the students recognise that the described mathematical components are similar to those described in a number of textbook tasks that are solved with a certain algorithm or fact. For example, if (1) two probability values are given in the assessment task, one for each of two events, and the assignment asks for a third probability value, and (2) most of the
tasks in the textbook that include this combination of probability values are solvable by multiplication of the two given probability values, this may suggest to the students that the assessment task is solvable by multiplication of the two given probability values. If the situation described is extra-mathematical then this situation may be, for example, a situation from daily life or from some other discipline such as physics or biology. In these cases there may or may not exist explicit information about the mathematical components in the described situation. The information about the real life event may be recognised as included in textbook tasks or examples that are solvable by a certain algorithm and may therefore suggest relatedness to such tasks and to the solution method used to solve them. For example, information about the amount of deposits made on a specific number of occasions together with an interest rate may suggest solving the task with an algorithm for geometric sums if most of the textbook tasks including this information are solvable with this algorithm. In the Silver study (Silver, 1997) the students did pay attention to contextual details, and in the Chartoff study (Chartoff, 1976) the contextual setting was also a factor that the students used for relating tasks to each other. 3. Representations. The main components (object, transformation and concept) and the real world phenomena of the situation described in the task can be conveyed to the students by different types of representations. The representations chosen may make it easier or more difficult to recognise the relatedness to similar tasks encountered in the learning environment, depending on whether the components are represented in the same form in the assessment task and in the textbook tasks. Examples of different forms of representations are pictorial, symbolic, textual, tabular and graphical. 4. Linguistic features. 
A written description of the situation (all of the assessment tasks and the textbook tasks are in writing) includes words and may therefore have different linguistic features that may be of importance when students relate tasks. The differences in linguistic features may be of semantic character (the meaning of words and sentences) or of syntactic character (e.g. long text, the question put in the middle of the text, choice of numerals or words, difficult grammar). The semantic part of this task variable refers to mathematical terms and key words or phrases that may work as verbal clues to a solution. It has been shown that key words can function as such clues (Hegarty et al., 1995). For example,
if most of the textbook tasks that include the term maximum in the assignment can be solved by differentiation and setting the derivative equal to zero, then this term may function as a trigger for applying this algorithm to assessment tasks that include the term. Words or phrases are also a type of contextual feature that students have been shown to use when relating tasks to each other (Chartoff, 1976; Silver, 1979). The syntactic features will probably influence students mostly when the texts in the tasks differ so much in difficulty that the relatedness in other variables cannot be seen. Long text with difficult grammar may make it more difficult to see the relatedness between tasks. 5. Explicitly formulated hints. An example of an explicitly formulated hint or requirement is “use derivatives to. . . ”. Such a hint or requirement can directly guide students towards a solution algorithm. 6. Response format. An assessment task requires a student response. When students are faced with the task and look for a solution they obviously have not provided a solution yet. Thus, we are not considering the students’ possible solutions to the task as a task variable to be analysed in terms of task relatedness (this variable is, however, analysed in steps I and III of the analysis procedure). However, the format of the solution required may be directly inferred from the task and is therefore a possible task variable for students to consider when looking for relatedness with tasks encountered in the textbook. Selecting from predetermined answers, providing short answers, and providing extended solutions including justifications for the answers are examples of different response formats. 4.2.3
Examples of Classifications
The following three examples of assessment tasks will be used to illustrate the analysis and the classifications made. Task 1. You have the function y = 3 − x. Is the function linear or exponential? Explain how you know this! (Comment: the task also included the graph of the function) Task 2. The value of a car, y SEK, can be calculated with the formula y = 120000 · 0.85^x, where x is the time in years after the purchase. (a) How much is the car worth after 3 years? (b) How much did the car cost when it was new?
Task 3. The following is known about the function f: f(7) = 3, and for 7 ≤ x ≤ 9 it holds that 0.8 ≤ f'(x) ≤ 1.2. Determine the largest possible value of f(9). Task 1 was classified as possible to solve with MR. In the theory text a linear function is defined as a function whose graph is a straight line, whereupon several linear graphs are shown. Exponential functions are said to describe courses of events where something changes by a certain percentage, and several exponential functions are also shown as graphs. In addition, a linear function and an exponential function are displayed in the same graph and the differences are discussed. Since it suffices to justify the answer with a reference to the characteristics of the graph, there are several places in the textbook where the answer to this task exists. In addition, the characteristics of the task were judged to be similar enough to the characteristics of the information including the answer. There are no tasks in the textbook that include exactly this type of assignment, but the explicit information about, and the representation of, the mathematical components are similar in the assessment task and in the theory and examples in the textbook. There is a linear function represented both by an algebraic expression and by a graph. The important terms function, linear and exponential are given and there are no problematic syntactic features. Task 2a is an example of an assessment task that was classified as requiring FAR. There are several similar tasks in the textbook. The following task is such a textbook task: The value of a savings bond is estimated to increase according to the formula y = 100000 · 1.12^x, where x is the time in years. Calculate the value of the savings bond a) when x = 0, b) when x = 1, c) after 3 years. In these three subtasks the assignment does not differ in a significant way from the assessment task (Task 2a). 
The information about the mathematical components and the representation of them is similar – both include an exponential function expressed as an algebraic expression and a value for x (even if the assessment task does not explicitly state that it is a value for x, which the first two textbook subtasks do). The students are allowed the same equipment (a calculator), and the syntactic features are not likely to stand in the way of experiencing relatedness. The type of response is also the same. The real life events are both about money, but they are not exactly the same. This was not judged to be very important in this case since so many other important characteristics are similar. In addition, there is similar information about both real life events. The conclusion is that there are textbook tasks that both can be solved with the same algorithm as the assessment task and possess similar characteristics regarding the set of task
variables used in the analysis. Therefore, the assessment task was classified as requiring only FAR and not CR. Task 2b was classified as requiring LCR. The difference from Task 2a is that Task 2b does not explicitly state a value (for x), which concerns the task variable explicit information about mathematical components. This means that the students first have to realise that x = 0 when the car is new. Since there were no tasks in the textbook where this reasoning had to be done, the students cannot directly apply a memorised algorithm but have to produce new reasoning and consider the intrinsic properties of a mathematical component (the meaning of x = 0 in this applied situation). This is CR, but since it was considered not to constitute a significant step the task was classified as requiring LCR and not GCR. Task 3 is an assessment task that is not similar to any task, example or other information in the textbook. There are, for example, no tasks with this assignment and with this information about the mathematical components (the information about the function f). In addition, there are no tasks in the textbook that are solved with any of the possible algorithms that can solve the assessment task. Therefore, the task was classified as requiring GCR. 4.2.4
Validity of the Analysis
The meaningfulness of the results of the analysis depends on the relation between the reasoning requirements theoretically established for the tasks (carried out in this study) and the reasoning actually used by students. One part of the validation of the analysis is the theoretical argumentation for the appropriateness of (1) the procedure used for the task classification and (2) the set of task variables used in the task classification. This argumentation is made in the Method section of this paper. A second part of the validation of this way of establishing reasoning requirements for assessment tasks is to compare the classifications with the kinds of reasoning students actually use. Such a study was made, in which the reasoning used by eight students when solving national tests in an authentic national test situation was compared with the tasks’ reasoning requirements established with the procedure described in this paper (Boesen et al., 2005). The eight students were divided among the four course tests (for courses A-D) so that two solution attempts were made on each test (and item). In total 191 solution attempts were classified. For the theoretically established reasoning requirements to be consistent with the students’ reasoning it should be possible for the students to solve the tasks with the reasoning judged to be required (the
establishment procedure should not provide reasoning requirements that are too high). In addition, the procedure of establishing reasoning requirements should not provide reasoning requirements that are too low: the students should not be able to solve a task with less creative reasoning than what is judged to be required (MR and AR are defined to be less creative forms of reasoning than LCR, which in turn is considered less creative than GCR). For 4% of the tasks in the study either both students failed to solve the task or the student(s) that succeeded used more creative reasoning than what was judged to be required. This means that for only 4% of the tasks were there indications that the classification procedure may have provided too high reasoning requirements. Three percent of the tasks in the tests were solved with less creative reasoning than what was theoretically established, indicating that the procedure very rarely provided too low reasoning requirements. Thus, the analysis of the reasoning the students used indicates that the establishment of reasoning requirements in this way can provide meaningful results and be a valuable tool for the evaluation of assessments.
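The classification logic described above, applied to the three example tasks, can be condensed into a sketch. The boolean inputs are hypothetical stand-ins for the textual judgements made in the study (whether the answer can be recalled from the textbook, whether a solving algorithm can be recalled via task relatedness, and whether the required creative step is significant); the arithmetic at the end simply checks the computations behind Tasks 2 and 3.

```python
def classify(answer_in_textbook: bool,
             algorithm_via_relatedness: bool,
             creative_step_significant: bool) -> str:
    """Sketch of the reasoning-requirement classification.

    MR:  the answer itself can be recalled from the textbook.
    FAR: a complete solution algorithm can be recalled via relatedness
         to textbook tasks.
    LCR / GCR: creative reasoning is required; GCR when the creative
         step is judged significant, LCR otherwise.
    """
    if answer_in_textbook:
        return "MR"
    if algorithm_via_relatedness:
        return "FAR"
    return "GCR" if creative_step_significant else "LCR"

# The four example classifications from the text:
task1 = classify(True, False, False)    # "MR"
task2a = classify(False, True, False)   # "FAR"
task2b = classify(False, False, False)  # "LCR"
task3 = classify(False, False, True)    # "GCR"

# Task 2a: car value after 3 years, y = 120000 * 0.85**x.
value_after_3 = 120000 * 0.85 ** 3      # about 73695 SEK

# Task 2b: the price when new corresponds to x = 0.
price_new = 120000 * 0.85 ** 0          # 120000 SEK

# Task 3: f(7) = 3 and f'(x) <= 1.2 on [7, 9] bound the growth, so the
# largest possible value is f(7) + 1.2 * (9 - 7) = 5.4.
largest_f9 = 3 + 1.2 * (9 - 7)
```

The sketch folds the relatedness judgement into the booleans; in the study each of these inputs was itself the outcome of the task-variable comparison with the textbook described in the Method section.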
5
Results
Eight national tests and 52 teacher-made classroom tests were analysed. In total 1186 assessment tasks were classified in terms of their reasoning requirements. The proportion of tasks requiring GCR is low on the teacher-made tests (see Table 1 below). The mean proportions vary between 7% for the A-course tests developed for the Social science programme and 24% for the C-course tests developed for the Natural science programme (the 95% confidence intervals are 0%-14% and 19%-29% respectively). Students from different study programmes passing through the compulsory courses for each programme will in total encounter about the same proportion of tasks requiring GCR in the tests. Taking into account the teacher-made tests from all of the compulsory courses for each programme, the proportions of tasks requiring GCR are 16%, 11% and 16% for the study programmes HP, SP and NV respectively. Since the proportions of tasks requiring LCR are also relatively low, most of the tasks in the teacher-made tests are possible to solve with one of the superficial reasoning types MR or AR. This is not possible to the same extent in the national tests, where the requirements of GCR are much higher: in all of the national tests about half of the tasks require GCR for a successful solution. These results are not to be interpreted as meaning that the tests made for the
Table 1: The table displays the mean proportions (%) of the tasks in the national tests (NCTs) and the teacher-made classroom tests that require the mathematical reasoning types MR/AR, LCR and GCR respectively. The proportions are presented for each course (A-D) and study programme (HP, SP and NV).

Test       MR/AR   LCR   GCR
HP A          70    14    16
SP A          88     6     7
SP B          74    12    15
SP Tot        80     9    11
NV A          78    13     8
NV B          73    15    12
NV C          60    16    24
NV D          58    22    20
NV Tot        67    16    16
NCT A         27    19    54
NCT B         42    11    47
NCT C         49     7    44
NCT D         26    18    56
NCT Tot       36    14    50
different study programmes are equally difficult. The analysis is made in relation to the students’ textbooks, and although the students from different programmes sometimes study the same course they do not have the same textbooks; the examples, the exercises, and partly the content of the textbooks differ between study programmes. For example, the textbooks for the study programme NV include technically more advanced equations than the textbooks intended for the study programme HP. However, the results do say that the students enrolled in the different programmes are faced with similar requirements in the teacher-made tests regarding the type of mathematical reasoning. In addition, the analysis shows that the teacher-made tests and the national tests do not emphasise the same types of reasoning. Since Algorithmic reasoning and Memorised reasoning are sufficient for achieving high scores on many of the teacher-made tests, such scores may be accomplished without producing new reasoning and without considering the intrinsic mathematical properties of the components involved in the reasoning. Thus, relational understanding of the concepts and methods used is not required. High scores may be obtained by relating the assessment
task to tasks encountered in their textbooks (by superficial properties of the tasks) and then recall facts and procedures connected to these textbook tasks. Such reasoning is not possible to the same extent in the national tests. The specific study programme does not seem to influence the kinds of reasonings that are required in the tests. Since the students from the different programmes partly attend different courses this does not follow from what was noted above that the students from different study programmes encounter approximately the same proportion of tasks requiring GCR. However, this conclusion can be drawn from a comparison of the teacher-made tests developed for the different study programmes while keeping the course constant. In this comparison no large differences can be found. The proportions of tasks requiring GCR are 16%, 7% and 8% for the A-course tests developed for HP, SP and NV respectively. The mean proportion of tasks requiring GCR in the B-course tests for the SP-students are 12% and the corresponding proportion in the B-course tests for the NV-students is 15%. The national tests are the same for all study programmes. When keeping the study programme constant the influence of the course on the proportion of tasks requiring GCR can be analysed. The difference between the proportion of tasks requiring GCR on the A-course tests and the corresponding proportion for the B-course tests developed for the SP students is not statistically significant (p>0.05). The only statistically significant difference (p