To appear in IEEE Proceedings of SRIG-ET’94.

The Calculator Project - Formal Reasoning about Programs

Steve Reeves
Department of Computer Science, University of Waikato, New Zealand

Doug Goldson
Department of Computer Science, Massey University, New Zealand

Pat Fung, Tim O’Shea
CITE, The Open University, Milton Keynes, UK

Mike Hopkins, Richard Bornat
Department of Computer Science, QMW, London, UK

Abstract

This paper describes The Calculator Project, a three-year joint research project between the Centre for Information Technology in Education at The Open University, U.K. (Pat Fung, Tim O’Shea) and the Department of Computer Science, QMW, University of London, U.K. (Richard Bornat, Doug Goldson, Mike Hopkins, Steve Reeves). The project was funded by the U.K. Joint Council Initiative in Cognitive Science and Human-Computer Interaction. The central aim of the project was to test the hypothesis that providing so-called calculators would improve students’ performance in those parts of the undergraduate first year that rely on formal reasoning skills.

1: Introduction

Computer Science undergraduates find “formal methods”, i.e. the use of mathematical notations and methods for supporting the development of computer software and systems, difficult, and in particular they find the part of formal methods that demands reasoning about their programs the most difficult. The work reported in this paper set out to see what factors affected these difficulties and also to test the hypothesis that providing so-called calculators would ease the burden that formal reasoning imposes. In order to gather these data the project team used many different methods: questionnaires, interviews, test and exam results, logging of system use, and monitoring of teacher and student discussion on electronic bulletin boards. As we shall see, this data gathering exercise allowed us to show some statistically significant results concerning the experience of students prior to enrolling on the courses. It also, equally valuably, produced much anecdotal evidence about how courses should be run. The plan of this paper is as follows: in the next section we introduce and discuss the term “calculators”; we then go on to introduce the main hypothesis of the experimental work of the project and the design of the experiments that we conducted; next we introduce and discuss two of the calculators used during the project; following that we present and discuss the results that we have, so far, obtained; finally we suggest where a follow-on project might lead and suggest some improvements to our work.

2: Calculators?

A calculator, for us, is a self-contained, robust, simple-to-use software device which supports working in some area, analogous to the way the familiar hand-held electronic calculator supports working in arithmetic. The traditional calculator supports its users in arithmetic tasks by relieving them of the burden of carrying out the arithmetic operations, allowing them to concentrate on the substance of the problem at hand. The aim of the calculators developed or borrowed in this project was to support learning to program in Miranda (“Miranda” is a trademark of Research Software Ltd.) and the understanding and use of the language of first-order logic. An important design feature of a calculator should be that it can be used without a great deal of explanation and instruction. A second feature should be that it provides a way for students to build up a reliable model of the way in which a calculation takes place. They should come to understand how it works and what rules it follows. This means that we actually have two goals for our calculators: they should support the student in the task of building an understanding of the activities that the calculator helps to carry out; and, once those activities have been properly learned, they should support the student in the more mechanical and book-keeping parts of their tasks, so that the tasks they can tackle become progressively harder. When we first developed our ideas for this project we had also hoped that the calculators would be thought so much fun to use that students would use them spontaneously and perhaps play with them. This turned out to be a very naïve idea on our part: our students were too busy, and too oriented towards doing what they had to do, to choose to do something viewed as extra work.

3: Project hypothesis and design

In the project we had a simple hypothesis: providing calculators for first-year undergraduate students in Computer Science would improve their performance on courses that involve formal reasoning, which students typically find hard, to say the least [1,2]. The project had two parallel and interconnecting streams of work: one (based at QMW) scoured the world for existing calculators and evaluated them and, because no suitable calculator existed for functional programming, developed one of its own; the other stream (based at the OU) carried out an educational evaluation by comparing students who had used the calculators with those who had not. The comparison was done using a mixture of interviews, questionnaires, consideration of exam, test and coursework results, and monitoring of the electronic bulletin boards that students and tutors on the courses concerned used for day-to-day communication about the courses. The project had, broadly, six main phases, and the Open University (OU) and QMW teams worked in parallel most of the time. The phases, broadly, were:

OU1. Gather information about first-year undergraduates, and design information-gathering techniques for future years based on these results;

QMW1. Search for, or build, systems which would be useful as calculators supporting functional programming and formal reasoning;

OU2. Plan and carry out “before, during and after” assessments of students on the functional programming and logic courses;

QMW2. Use the calculators in teaching;

OU3. Analyse the data gathered and write up the results;

QMW3. Improve the calculators based on feedback from OU2.

It became clear during QMW1 that while there were many, many calculators which could be used to support the logic course (of very varied usefulness, see [3]), there were none for the functional programming course. Hence, we had to build our own calculator to support the functional programming course, which used Miranda. So, the outcome of these initial phases was the decision to use Tarski’s World [4], developed by the Center for the Study of Language and Information, Stanford University, to support the logic course, and our own MiraCalc [5], which supported the functional programming course.

In the following academic year the courses “Functional Programming One” (FP1) and “Introduction to Logic” (ItL) used these two calculators to support their teaching via laboratory work and demonstrations during lectures. The students were surveyed at the start of these courses using questionnaires and interviews which were designed on the basis of the results from OU1. These initial surveys gave us a good idea of the sorts of students we were teaching and allowed the OU team to choose representative students according to several criteria that we had hypothesized might make a difference to their achievement on the courses. These criteria included gender, level of achievement in maths, amount of programming experience and whether they used a computer at home. These representative students were interviewed in some depth during and after the courses so that we could get a very clear picture of how they had perceived the courses. We also gathered data from tests and exams to give some quantitative information on their achievements on the courses. Further, we logged the amount of time that each student used MiraCalc to see whether there was any significant difference between students who did and did not use that calculator. We will see the results of this data gathering in section five below.

4: The Calculators

Below we will work through parts of two simple examples to show how each calculator performs, but first we make some general remarks. The calculators had to have two main attributes: robustness and ease of use. They clearly had to be robust, since in the environment in which they were to be used we could not afford to have them break because of students’ mistakes or because of the inevitable, enthusiastically conducted, experiments whose goal was to find out how to break them. They had to be easy to use so that the calculators themselves did not become part of the problem that the students were facing; they were intended to ease the students’ burdens, not add to them. They had to have good supporting documentation or help facilities, and it had to be clear at every point that the students had control.

4.1: Tarski’s World

The main aims of this system are to introduce the syntax of first-order logic (actually an interpreted first-order logic with equality), to make the bridge to its semantics via the idea of an interpretation, and to support the teaching of semantic ideas like consistency, inconsistency and entailment. Central to all this is the graphical representation of the link between syntax and semantics, the interpretation. Experience shows that students, when presented with the usual formal definition of an interpretation (a pair of functions, one mapping terms to objects in the universe and the other mapping predicate letters to relations of the appropriate arity), start to give up. In Tarski’s World very natural use is made of a picture, the situation, to give meaning to terms and predicates. The situation displays a picture of the world which contains some objects of certain shapes (cubes, tetrahedra, dodecahedra) with certain attributes (names, sizes and shapes), standing in certain relations to other objects (larger than, in front of, between). Figure one shows, in the top left-hand window, a picture of a “world”, the situation, in which the sentences in the lower left-hand window are interpreted. The student can decide whether a given string is a well-formed formula or a sentence, and the system can then check well-formedness (see the small box in the top right-hand corner of the sentence window in figure one). This allows the student to exercise their syntactic knowledge and, with suitable examples and questions, supports the teaching of these notions. Having mastered some of the syntax of the language, the student can then go on to learn about the semantics. Given a sentence and a situation, the student can try to reason for themselves about the truth-value of the sentence. Having reached some conclusion, they can use the system to check it. For simple examples it is usually enough to be told that you are wrong. In more complex cases the student can use the “game” to investigate why their choice was wrong. The game poses a series of questions about the truth-values of parts of the sentence concerned, gradually “homing in” on a misunderstanding, which is made very clear by using the situation to show why the truth-value was wrongly calculated by the student.
Experience shows that this is, not surprisingly, viewed as a much more interesting way of calculating truth-values than going through seemingly endless exercises using truth-tables. Typical problems involve being given a situation and a list of sentences and being asked which sentences are true in the situation, or being given a list of sentences and being asked to build a situation that makes them all true, if possible. This allows the ideas of consistency and entailment, and their opposites, to be introduced and developed. As ever with systems which have a graphical component at their heart, it is impossible to do justice to Tarski’s World in words and “snapshots”; the reader is urged to try it and see. Suffice it to say that of all the logic support tools we looked at (around 15 of them), Tarski’s World was amongst the best.

Figure One: Tarski’s World
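What makes the truth “game” possible is that truth in a finite situation is mechanisable: quantifiers over a finite universe reduce to traversals of that universe. The following Haskell fragment is our own illustration of that idea, not Tarski’s World code; all the names (Obj, situation, larger, and so on) are invented for the sketch.

```haskell
-- A minimal sketch (ours, not Tarski's World code) of why truth in a
-- finite situation is checkable: quantifiers become list traversals.
data Size = Small | Medium | Large deriving (Eq, Ord, Show)
data Obj  = Obj { objName :: String, objSize :: Size } deriving Show

-- A "situation": a finite universe of named, sized objects.
situation :: [Obj]
situation = [Obj "a" Small, Obj "b" Large, Obj "c" Medium]

-- An interpreted binary predicate, as in "Larger(x, y)".
larger :: Obj -> Obj -> Bool
larger x y = objSize x > objSize y

-- Quantifiers over a finite universe are traversals; this is what
-- lets the "game" break a sentence into questions about its parts.
forAll, exists :: [Obj] -> (Obj -> Bool) -> Bool
forAll u p = all p u
exists u p = any p u

-- "Every object is small or is larger than something":
claim :: Bool
claim = forAll situation
          (\x -> objSize x == Small || exists situation (larger x))
```

Evaluating `claim` amounts to playing the game exhaustively: each quantifier step picks a witness or a counterexample from the situation, just as the system asks the student to do.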

4.2: MiraCalc

MiraCalc aims simply to provide an environment in which students can investigate the syntactic structure of their scripts, find out more about the types of expressions in a script, and evaluate expressions in the environment formed by a script in a variety of ways: either fully, as in the Miranda system, in largish chunks (so-called “skips”), or one step at a time (in the manner of Bird and Wadler’s explanations in [6]). The step-by-step evaluation option turned out to be highly favoured by the students, since it allowed them to see exactly how evaluation in Miranda works and it clearly supported the task of working out why their programs were not working as expected. Since a functional language has none of the complications of, for example, global variables and side-effects that afflict other sorts of language, the step-by-step evaluation mechanism, and, more importantly, the rather small amount of information that has to be given to the user to make sense of what is happening at each step, can be kept relatively simple. Simplicity was one of the main attributes that we decided on for a successful system.
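Stepwise evaluation of this kind can be sketched with a toy single-step reducer. This is our own Haskell sketch of the idea, not MiraCalc’s implementation, and the tiny arithmetic language is invented for the example.

```haskell
-- A toy stepper in the spirit of MiraCalc's stepwise mode: each call
-- to `step` performs one leftmost-outermost reduction, or reports
-- (via Nothing) that the expression is already a value.
data Expr = Lit Int | Add Expr Expr | Mul Expr Expr deriving (Eq, Show)

step :: Expr -> Maybe Expr
step (Lit _)               = Nothing                  -- already a value
step (Add (Lit a) (Lit b)) = Just (Lit (a + b))
step (Add a b)             = case step a of
                               Just a' -> Just (Add a' b)
                               Nothing -> fmap (Add a) (step b)
step (Mul (Lit a) (Lit b)) = Just (Lit (a * b))
step (Mul a b)             = case step a of
                               Just a' -> Just (Mul a' b)
                               Nothing -> fmap (Mul a) (step b)

-- The full trace a student would page through, one line per step.
evalTrace :: Expr -> [Expr]
evalTrace e = e : maybe [] evalTrace (step e)
```

Because the language is pure, the only information the user needs at each step is which redex was chosen and what it became; this is the “rather small amount of information” that keeps such a stepper simple.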

Figure Two: MiraCalc on a Mac

Figure two shows, in the top window, a parsed script and, in the lower right-hand window, the result of the first step in the evaluation of numsfrom 5. The lower left-hand window (actually a dialogue box) allows the user to say how large a step should be taken next. In order to give a flavour of MiraCalc we briefly consider three examples. (More detailed examples can be found in [5] and [7]; note also that there are two versions of MiraCalc, one for a Mac and one for an X terminal.) Once a script has been typed or read in and successfully parsed, the Expand and Contract operations can be used. They are meant to illustrate the structure of program expressions, the grammar of definitions, the associativity and precedence of operators, and the structure of curried functions and functions defined using patterns. Expand is used to expand expressions, Contract Left and Contract Right to contract expressions. Back can be used to go back to previous states of the cursor. Expand works as follows. It takes the current cursor selection in the script window and then expands this selection to cover the smallest enclosing expression. For example, writing brackets for the box that marks the extent of the highlighted text in the window, the effect of successive expansions can be illustrated as

  [times] (double 2) (double 2)
  [times (double 2)] (double 2)
  [times (double 2) (double 2)]
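One way to realise Expand, sketched here in Haskell as our own reconstruction rather than MiraCalc’s code, is to treat the current selection as a path into the parse tree; expanding then simply moves one level up, so the highlight grows to the smallest enclosing expression.

```haskell
-- A selection is a path into the parse tree; Expand drops the last
-- step of the path, moving the highlight one level up the tree.
data Tree = Leaf String | App Tree Tree deriving (Eq, Show)

-- "times (double 2) (double 2)", with application associating left.
script :: Tree
script = App (App (Leaf "times") d) d
  where d = App (Leaf "double") (Leaf "2")

type Path = [Int]                -- 0 = left child, 1 = right child

subtree :: Tree -> Path -> Tree
subtree t []             = t
subtree (App l r) (p:ps) = subtree (if p == 0 then l else r) ps
subtree t _              = t     -- path runs past a leaf: stay put

expand :: Path -> Path           -- one level up the tree
expand [] = []
expand p  = init p
```

Starting with the cursor on times (the path [0,0]), successive calls to expand select times (double 2) and then the whole application, matching the successive highlights described in the text.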

Figure Three: MiraCalc under X, working on scopes

This example illustrates the fact that application associates to the left. Another feature of MiraCalc, which is relevant to any programming language with name binding, is that it gives support for the task of determining when an occurrence of a name is free, bound or binding in an environment. This is a part of the use of a programming language that poses problems for many students, of course, so it is important that a system like MiraCalc provides support for learning about it. As an example of the sort of question that might be posed, consider the following: which binding occurrence of x binds the second occurrence of x in

  f x = x + 1 where x = 1

and does f 2 have the value 2 or 3? To answer this question we can experiment with MiraCalc. Assuming that we have entered the definition into the calculator, we select the second occurrence of x and check that it is bound by selecting the Bound option from the Calculate menu. A message is displayed confirming our guess as correct and, importantly for deciding the binding of this x, the cursor selection in the window is moved to cover the binding occurrence of x. The window will now look like the one in figure three.
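The free/bound check itself rests on a standard computation of the free names of an expression. The following is our own illustrative sketch, not MiraCalc’s algorithm, for a tiny invented language with a where-style local binding.

```haskell
-- Names occurring free in an expression. A Where binds its name in
-- the body, shadowing any enclosing binding of the same name.
data E = Var String
       | Num Int
       | Plus E E
       | Where E (String, E)    -- body `where` name = rhs
       deriving Show

freeVars :: E -> [String]
freeVars (Var x)               = [x]
freeVars (Num _)               = []
freeVars (Plus a b)            = freeVars a ++ freeVars b
freeVars (Where body (x, rhs)) =
  filter (/= x) (freeVars body) ++ freeVars rhs

-- The right-hand side of  f x = x + 1 where x = 1 :
fBody :: E
fBody = Where (Plus (Var "x") (Num 1)) ("x", Num 1)
```

Since freeVars fBody is empty, the x in x + 1 is captured by the where clause rather than by f’s parameter: in Miranda, as in Haskell, a where clause shadows the formal parameter, so f 2 evaluates to 2, which is what MiraCalc’s moved cursor reveals.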

As a final example of what can be done with MiraCalc, figure four shows, in the top pane of the window, a script which contains definitions of various functions for producing and manipulating lists of natural numbers. One example is

  numsfrom_inc x y = x : (numsfrom_inc (x + y) y)

The function numsfrom_inc, given a first argument x and a second argument y, produces as much of the list [x, x + y, x + y + y, x + y + y + y, ...] as we care to calculate. The lower pane of figure four shows the first five steps in the stepwise evaluation of

  numsfrom_inc 5 1

This feature turned out to be a heavily used, and very popular, one. What one student said during interview gives the flavour of their response to stepwise evaluation:

- I mean to be honest without the calculator I think everyone would be a hell of a lot more lost because it’s so much easier. You can just go in and if you don’t understand how it reduces you can go through step by step and it will tell you. I mean it’s like having your own little teacher there saying well, this is how it does it, which is very useful [8].
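The definition transcribes almost verbatim into Haskell, a lazy language close enough to Miranda that the behaviour is the same; only the demanded prefix of the infinite list is ever computed, which is exactly what the stepper makes visible.

```haskell
-- The script's numsfrom_inc, transcribed into Haskell. Laziness means
-- the infinite list is unfolded only as far as it is demanded:
-- numsfromInc 5 1 becomes 5 : numsfromInc 6 1, then 5 : 6 : ..., etc.
numsfromInc :: Int -> Int -> [Int]
numsfromInc x y = x : numsfromInc (x + y) y

-- Demanding five elements forces exactly five unfoldings.
firstFive :: [Int]
firstFive = take 5 (numsfromInc 5 1)
```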

5: Results

There is room here to present only a brief view of the results of the project. In the following paragraphs we paraphrase material from the fuller report [8]. There is no doubt that the use of the calculators helped students and that they had a beneficial effect on their learning of formal reasoning. Here we show why we can conclude this and consider in which ways the tools have helped and whom they have helped most. For both MiraCalc and Tarski's World a selection of assessment measures was used. Data in relation to Tarski's World are qualitative, using student perceptions and responses to questionnaires to gauge its effect. These data are strongly positive in showing the use that students have made of the tool and the benefits which they perceive in using it. Questionnaire replies came from almost 65% of the target population. Of these, 90% had found the tool useful. Even more impressive is the 91% of those students who positively recommended that it should be used for the following student intake. The benefits that they perceived from using Tarski's World were varied. It was used regularly, in that 63% of questionnaire respondents used it for all or most of the lab sessions allocated for its use. Data from interviews showed that students enjoyed using it. A major benefit was its appealingly simple, user-friendly interface, which encouraged exploration:

- When you use the ‘game’ bit you can see the steps, you can see what you've done wrong. It's a bit like computer games
- If you don't understand the sentence, then you can build it up bit by bit. It's easy to try things in different ways
- It makes it very easy to experiment. You can try out ideas about what the sentences mean. It's great, it would be a pain trying out different things on paper, having to keep erasing things, crossing things out

Another benefit was that of being able to visualise the more complex ideas to which students were introduced:

- Now we're doing quantifiers it's really useful
- Now we're going a bit more into depth with you know...[quantifiers], it's useful for that now. It helps make sense of it.

Figure Four: Stepwise evaluation

- If there is something with lots of ‘nots’ in, then I use it, to keep check.
- I got a sort of picture in my head, from Tarski's World, 'cos it's easy to picture and I work from there. It is very useful as a tool. It is good to be able to see everything.

A most interesting finding from the data collected was the absence of comments on difficulties relating to this area of the course. In a study of the previous intake at QMW [9], a difficulty which students had noted in relation to this introductory logic course was that of interpreting, building and manipulating logical statements. The ease with which students were able to do so using Tarski's World suggests that this difficulty was being addressed by the tool. Data collected on the use of MiraCalc were both qualitative and quantitative. Analysis and interpretation of the quantitative data indicates clearly that MiraCalc was of benefit to the students using the system and that its use had a positive effect on their learning of formal reasoning methods. Qualitative data, collected from interviews and returns from questionnaires, indicate that where students had used it, they perceived the tool as useful and as helping them in a number of areas. In addition to the qualitative data collected, quantitative data were available to help assess the effect of using MiraCalc upon learning outcomes, i.e. records of the number and length of each student's MiraCalc sessions and background information given by students at the beginning of the first semester. For the purposes of this study, learning outcomes were represented by the marks which students scored at the end of the year for the course Functional Programming 1. Since the frequency of scores in FP1 did not satisfy normal distribution assumptions, the Mann-Whitney U / Wilcoxon rank-sum W test was used in the following data analyses.
For a number of the tests, notably those related to cross-tabulation procedures, data relating to FP1 scores were ordered using the 25%, 50% and 75% quartile boundaries, with corresponding score values of 26, 56 and 88, e.g. the top quartile referring to the 25% of students whose end-of-year score for FP1 was greater than 88. For tests relating to MiraCalc use, non-users were classed as those who had used the software tool for an hour or less. In relation to the level of use, a five-point scale was used, ranging from very low to very high. Initial analysis showed a significant difference, in relation to FP1 scores, between those who had used MiraCalc in their first semester and those who had not (p ≤ 0.001, Mann-Whitney). The mean score of those using MiraCalc was higher than that of those students who did not use MiraCalc. Next we looked at the level of MiraCalc use in relation to FP1 scores. Here, there was a

difference between those with a low or very low level of MiraCalc use and those with an average or above level of use, but there were no significant differences between groups as the level of MiraCalc use rose from average to very high. Other factors which might have had an effect on FP1 scores were also considered. Among these were mathematical background prior to university, previous experience of programming, previous home use of a computer, students' motivations, and expectations of the course. These factors and FP1 scores were examined using Spearman's rank correlation coefficient as a measure of the relationships between them. Only mathematical background showed a significant correlation with FP1 scores. From previous studies [9] we were aware that a significant factor was likely to be whether students had studied A level maths or not. (“A level maths” here refers to a British qualification: “A levels”, i.e. “advanced levels”, are exams taken in the second year of the sixth form at secondary school (seventh form in New Zealand), the last possible year of a school education, which for most students is when they are around 17-18 years old.) Further analysis of the data confirmed the significance of A level maths. In relation to FP1 scores, a significant difference was shown between the two groups of students, those who had studied A level maths and those who had not (p < 0.003, Mann-Whitney):

  Group      Cases   Mean Score   Mean Rank
  maths      29      71.4         59.2
  no maths   64      49.4         41.5
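The rank-sum arithmetic behind the Mann-Whitney U statistic reported in this section can be sketched as follows. This is a Haskell illustration of ours on made-up scores, not the project's analysis code.

```haskell
import Data.List (sort)

-- Each value's rank is its position in the sorted, pooled sample;
-- tied values receive the mean of the rank positions they occupy.
ranks :: [Double] -> [Double]
ranks xs = map rankOf xs
  where
    sorted   = sort xs
    rankOf v = let positions = [i | (i, s) <- zip [1 ..] sorted, s == v]
               in sum positions / fromIntegral (length positions)

-- U = W - n1(n1 + 1)/2, where W is the rank sum of the first sample
-- within the pooled ranking of both samples.
mannWhitneyU :: [Double] -> [Double] -> Double
mannWhitneyU a b = w - n1 * (n1 + 1) / 2
  where
    n1 = fromIntegral (length a)
    w  = sum (take (length a) (ranks (a ++ b)))
```

When every value in the first sample exceeds every value in the second, U reaches its maximum n1 × n2; when the samples are reversed it is 0. The significance level (the p value quoted above) is then read from the sampling distribution of U.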

While it is very encouraging that the use of MiraCalc shows a positive effect in relation to the end-of-year scores, it is even more interesting to look at where the effect is most evident. We next considered different groupings of students, using levels of programming expertise and maths backgrounds, in order to address this point. Perhaps surprisingly, there appeared to be no significant relationship between the benefits of using MiraCalc and levels of programming expertise in relation to FP1 scores. Looking at the relationship between the use of MiraCalc, maths background and FP1, however, was very enlightening. Here, in relation to FP1, we looked at two groups: those students with a mathematical background, i.e. those who had studied maths at A level, and those without. Within each of these groups, we then looked at whether or not they had used MiraCalc (figure five).

  Mean Score (FP1)   55.11   78.750
  Mean Rank          10.0    17.2
  p (two-tailed)

Figure five