A Principled Approach to the Evaluation of SV: a case-study in Prolog

by Paul Mulholland, Knowledge Media Institute, Open University

A large amount of Software Visualization (SV) technology has been developed. This is particularly the case for the Prolog programming language, whose execution model causes particular difficulties for the learner (e.g. Taylor, 1988; Fung et al., 1990; Schertz et al., 1990). As a result, a wide range of Prolog SVs (or tracers) exist, and though many claims are made regarding their usefulness and suitability for various potential user populations, there is little empirical evidence. This paper reports an empirical investigation into the suitability of four tracers for an early novice population. The methodology aims to learn from the lessons of SV and Computer Based Training (CBT) evaluation, which has produced many conflicting results that cannot be clearly interpreted. The empirical approach uses protocol analysis (Ericsson and Simon, 1984) to develop a fine-grained account of the user, identifying information access, the use of strategies, and misunderstandings of the SV and execution. This approach allows differences in performance to be explained with greater confidence. The results show overall performance differences across subjects using the SVs which can be interpreted using the protocols. The interpretation permits prescriptions to be made as to the suitability of the four tracers for novices, and suggestions as to how each could be improved.

Introduction


Background

A number of studies have been undertaken to investigate the relative advantages of various types of display for computer-based instruction (CBI) or computer-aided learning (CAL). Most of these have focused on determining the efficacy of particular display features such as graphics, animation or colour. These studies have produced a number of findings which appear to be contradictory.

Rigney and Lutz (1976) and Alesandrini and Rigney (1981) investigated the usefulness of graphical representations for the presentation of chemistry concepts. Both studies found animation to be an advantageous feature. Other studies have failed to find any benefit for animation in educational technology. For example, in a study by Reed (1985), subjects were given rules enabling them to estimate how long it would take the computer to perform algebra word problems. Those receiving a dynamic simulation of the behaviour of the computer performed no better than those viewing a static representation. In a study of the effects of graphics and animation on learning outcomes, Baek and Layne (1988) found an advantage for graphics and animation over text for teaching a mathematical concept, whereas Peters and Daiker (1982) found no advantage for graphics or animation in a CAL environment.

This kind of problem has also pervaded evaluation studies into the efficacy of textual and graphical programming notations. Cunniff and Taylor (1987) investigated the effect of textual and graphical presentation on novice code comprehension. The study compared the comprehension of equivalent code written in one graphical (FPL) and one textual (Pascal) language. Subjects had to perform three tasks thought to be central to computer program comprehension: recognising and counting types of program construct, determining the values assigned to specific variables, and determining the number of times particular program segments would be executed. They found faster response times with FPL than Pascal. The accuracy of responses was also superior in FPL. This was particularly so for questions requiring the comprehension of program conditionals. There were also far more errors with Pascal than FPL on questions relating to the values of variables.


A rather different result was obtained by Badre and Allen (1989) in their comparison of a graphical and a textual programming notation. Overall they found no difference in bug location time between the two notations. They then separately analysed the performance of the novice and expert programmers within the subject sample. They found no effect of notation for experts, but did find superior performance with the textual notation among the novice subjects.

Within the evaluation of SV, global classifications have been applied to the test materials rather than to the features of the SV itself, though once again the findings have been less than straightforward. Patel et al. (1991) performed a direct comparison between three Prolog trace formats. They investigated the relative speed with which subjects could access information from static displays of three Prolog SVs. Five programs were used: three focused on backtracking (i.e. the retrying of earlier goals) and two on recursion. One SV performed best overall, though the pattern of results could not be explained by the distinction between recursion and backtracking.

From this review it can be seen that much of the empirical work in the fields of ITS, SV and program notations suffers from the problem of trying to find global generalisations that are not there. Many of the studies derive performance results without deriving the information necessary to explain them. Observations are usually not made of, for example, how the subjects approached the task and what features they found confusing. Evaluation studies therefore seem to be providing a set of isolated findings which can often appear contradictory on the surface. As only global measures of performance have been used in many of the studies, it is necessary to rely on anecdotal explanations of any apparent contradictions. A research methodology that focused more on providing a qualitative account of why a particular result occurred would be better able to move away from isolated observations toward building an overall picture of what is occurring. The next two sections outline a software environment and a psychologically motivated empirical framework which together provide a foundation for the evaluation study.


The Prolog Program Visualization Laboratory (PPVL)

car(mini).
car(jaguar).
gold(ring).
bike(bone_shaker).
bike(honda).
silver(honda).

fun(Object) :-
    car(Object),
    gold(Object).

fun(Object) :-
    bike(Object),
    silver(Object).

Figure 1. The fun program.

call fun(_1)
UNIFY 1 []
call car(_1)
UNIFY 1 [_1 = mini]
exit car(mini)
call gold(mini)
fail gold(mini)
redo car(mini)
fail car(_1)
UNIFY 2 []
call bike(_1)
UNIFY 1 [_1 = honda]
exit bike(honda)
call silver(honda)
UNIFY 1 []
exit silver(honda)
exit fun(honda)

Figure 2. Spy trace of the query fun(What).


PPVL provides an experimental laboratory on which to base a systematic comparison of tracers. PPVL incorporates four Prolog SVs (Spy, PTP, TPM and TTT), providing the first opportunity to study a number of fully implemented tracers within the same environment. PPVL is implemented in MacProlog™ version 4.5 running under Macintosh™ System 7.1. PPVL provides a common interface and navigation scheme for all tracers, so differences in performance due to the ease of use of different interface technologies are minimised. PPVL also internally records all user activity at the terminal.

The empirical work considered here focuses on the visualization of Prolog. Prolog is a logic programming language. It allows programs to be written and read in a declarative way consistent with the predicate logic assertions they represent. There are two main kinds of construct used in Prolog: facts and rules. Facts are unconditionally true. Rules specify something that is true given that one or more conditions are satisfied. The fun program (figure 1) contains six facts (for the predicates car, gold, bike and silver) and two (fun) rules. Each separate rule or fact constitutes a clause. The real-world or declarative meaning of the first rule can be expressed as "something is fun if it is a car and is gold", and the second can be expressed as "something is fun if it is a bike and is silver". The first part of the rule, constituting the goal (in this case fun(Object)), is known as the head of the rule. The subgoals which have to be satisfied (in this case car and gold, or bike and silver) in order for the goal in the head of the rule to be true form the body of the rule.

The Spy tracer (see figure 2) is a stepwise, linear, textual SV which adopts the Byrd Box model of Prolog execution (Byrd, 1980). The model uses a procedural interpretation of Horn clause logic. The head of a rule is classed as a procedure and the body treated as one or more sub-procedures. Byrd's aim in the development of Spy was to provide a basic but complete account of Prolog underpinned by a consistent execution model.

PTP (Prolog Trace Package) (see figure 3) was developed by Eisenstadt (1984, 1985) to provide a more detailed and readable account of Prolog execution than is found in Spy. PTP aimed to make the account of execution as explicit as possible, thereby reducing the amount of interpretation required of the user. Particular areas where PTP aimed to improve on Spy were the presentation of more specific status information and a more explicit presentation of unification.
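To make the declarative reading concrete, the following is a minimal sketch of a session with the fun program at a standard Prolog top level (the prompt and answer formatting vary between implementations; a semicolon requests further solutions):

?- fun(What).
What = honda

?- car(What).
What = mini ;
What = jaguar ;
no

As the Spy trace in figure 2 shows, the answer honda is only reached after the first fun clause fails (no car in the database is gold) and execution backtracks into the second clause.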


TPM (Transparent Prolog Machine) (see figure 4) aimed to provide the very detailed account provided by PTP in a much more accessible form. TPM uses an AND/OR tree model of Prolog execution (Eisenstadt and Brayshaw, 1988; Brayshaw and Eisenstadt, 1991). Execution is shown as a depth-first search of the execution tree. Unlike the other SVs, TPM incorporates two levels of granularity. The Coarse Grained View (CGV) uses an AND/OR tree to provide an overview of how clauses are interrelated during execution. Fine Grained Views (FGVs), giving the unification details for a particular node, are obtained by selecting the node in question. The fine grained view uses a lozenge notation to show variable binding.

The Textual Tree Tracer (TTT) (see figure 5) has an underlying model similar to TPM but uses a sideways textual tree notation to provide a single view of execution which more closely resembles the source code (Taylor et al., 1991). Unlike linear textual tracers such as Spy and PTP, current information relating to a previously encountered goal is displayed with or over the previous information. This keeps all information relating to a particular goal in the same location. For example, all information pertaining to the car subgoal is contained in lines 2 and 3 of the TTT trace, though in PTP it is spread over lines 3, 4, 7, 8 and 9. Seven symbols relating to clause status are employed, five of them distinguishing types of failure. The variable binding history is shown directly below the goal to which it relates. The aim behind TTT was to provide the richness of information found in the TPM trace in a form more closely resembling the underlying source code. This approach illuminates an important trade-off in the design of SV notations. TPM aims to show the structure and nature of the execution in a maximally clear form. For TTT, constructing a SV notation close to the underlying source code was a primary aim. The designers of TTT felt that the "tracer should be designed so as to enhance the ease with which the trace output can be correlated with the source code of the program being traced" (Taylor et al., 1991, p. 4).
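The contrast can be made concrete by extracting the lines relating to the car subgoal from the traces in figures 3 and 5. In TTT the goal and its binding history occupy a single location:

***2: car(What) 1SF
      |1 What ≠ mini

whereas in PTP the same information is spread over five lines:

3: ? car(_1)
4: +*car(mini) [1]
7: ^ car(mini)
8: < car(mini) [1]
9: -~car(_1)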

1: ? fun(_1)
2: > fun(_1) [1]
3: ? car(_1)
4: +*car(mini) [1]
5: ? gold(mini)
6: -0gold(mini)
7: ^ car(mini)
8: < car(mini) [1]
9: -~car(_1)
10: < fun(_1) [1]
11: > fun(_1) [2]
12: ? bike(_1)
13: +*bike(honda) [1]
14: ? silver(honda)
15: +*silver(honda) [1]
16: + fun(honda) [2]

Figure 3. PTP trace of the query fun(What).

Figure 4. TPM CGV of the query fun(What).

>>>1: fun(What) 1F/2S
      |2 What = honda
***2: car(What) 1SF
      |1 What ≠ mini
***3: gold(mini) Fu
***4: bike(What) 1S
      |1 What = honda
***5: silver(honda) 1S

Figure 5. TTT trace of the query fun(What).

Methodological requirements


The review of previous studies suggests a number of requirements for future empirical work. A first important characteristic is that the results should provide some integrated understanding of the subjects' performance in relation to the task and their individual characteristics. For example, in terms of understanding performance within the context of the task, Patel et al. (1991) raised the issue of whether performance differences between subjects using different SVs were due to information access or task modification. By this they meant, on the one hand, whether the nature of the SV affected the rate at which necessary information could be accessed from the display without altering the strategic approach taken to the task, or alternatively whether the nature of the SV could affect the approach taken by the subject in performing the task. A desirable quality of the methodology would be that it could produce results that shed light on this kind of issue.

In order to provide an integrated understanding of performance it is necessary to derive a far more fine-grained account of the subject than gross performance measures alone can provide. An accepted method of gaining a fine-grained account of the cognitive activities occurring during some task is protocol analysis (Ericsson and Simon, 1984). This study employs the technique of having subjects work in pairs and talk between themselves during the task, allowing a protocol to be recorded without placing artificial demands on the subjects. The effectiveness of using subject pairs to evaluate human-computer interfaces has already been expounded by Suchman (1987).

The methodology should also have a theoretical basis in what is already known from previous evaluation studies and general research within the psychology of programming. Research in the psychology of programming can be used to motivate what kinds of things could be looked for or expected within the protocols. Previous research can also show what kinds of experimental hypotheses are likely or unlikely to yield meaningful results. One aspect of previous studies to which this particularly applies is classification according to features of the display or source code. Global classifications of display features, such as the use of colour or animation, tend to miss the key issue and produce confusing or contradictory results. The important point is what certain features are used to represent, rather than whether they are used at all. Additionally, this form of display classification could not hope to distinguish between many Prolog SVs, such as Spy, PTP and TTT, though there may be large performance differences between them. Patel et al. (1991) found EPTB (Ditchev and du Boulay, 1987) to be significantly better than Spy, though both are textual tracers with very similar dynamics. The focus of the methodology is therefore more on providing a framework for understanding what occurs than on testing global hypotheses.


Another desirable criterion is generalizability. This applies both to the empirical findings and to the methodology itself. The more fine-grained account of how the subjects perform should provide a picture not only of how well subjects did in some particular situation but also of why they performed as they did. This information allows justifiable assertions about the range of situations or subjects to which the findings are likely to apply. This may permit some basic prescriptions as to how suitable particular SVs are likely to be for certain situations. The results of a fine-grained account could also be used to motivate improvements to existing SV systems. Ideally the methodology should also be applicable to new or different programming languages or types of SV.

Outline of the study

The study involved 64 Open University summer school cognitive psychology students taking an Artificial Intelligence project. Students taking the project are required to model a simple cognitive theory in Prolog. Each summer school project lasts approximately 2.5 days. Each tracer was used as the main teaching focus and sole debugging aid for one week (i.e. two AI project groups). Prior to the summer school the students had completed assessed work using Prolog to model a simple AI problem.

A four-way between-subjects design was used, with 16 subjects per cell working in pairs. Each pair of subjects was given five minutes to familiarise themselves with a program presented on a printed sheet. They each retained a copy of this program throughout the experiment. The program was an isomorphic variant of the one used by Coombs and Stell (1985) to investigate backtracking misconceptions. The pairs were then asked to work through the traces of four versions of the program, each of which had been modified in some way. Their task was to identify the difference between the program on the sheet and the one they were tracing. They had no access to the source code of the modified versions. After five minutes the subjects were given the option of moving on to the next problem. This was used as an upper bound for timing data. Verbal protocols were taken throughout.

Program modifications were selected which required the novice to focus on different types of information in order to correctly identify the change. The four problems given were a change in a relation name, a changed atom name, a data flow change and a control flow change. The data flow change was either passing the wrong variable from a rule or changing a variable within a


rule to an atom. The control flow change was either a swap in the subgoal order within a rule or in the fact order within the database. Examples of the four modification types are sketched below.
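By way of illustration, the following are hypothetical examples of the four modification types, constructed here from the fun program of figure 1 (the actual experimental program, an isomorphic variant of Coombs and Stell's material, is not reproduced in this paper):

% 1. Changed relation name: gold renamed in the first rule.
fun(Object) :- car(Object), golden(Object).

% 2. Changed atom name: honda replaced by yamaha in a fact.
bike(yamaha).

% 3. Data flow change: a variable within a rule changed to an atom.
fun(Object) :- bike(Object), silver(honda).

% 4. Control flow change: the subgoal order of a rule swapped.
fun(Object) :- gold(Object), car(Object).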

Results

Performance on the task

The mean total number of problems solved by each subject pair is shown in figure 6. There was a significant main effect of tracer, F(3, 28) = 3.260, p < 0.05. A planned comparison revealed a significant difference between the graphical and textual tracers, F(1, 28) = 8.174, p < 0.01.

Tracer   Solutions
Spy      2.250
PTP      2.750
TPM      1.500
TTT      2.375

Figure 6. Mean number of problems completed within five minutes.

Information content

A preliminary analysis of the protocols revealed eight kinds of information discussed within protocol utterances (see figure 7). Some, such as CFI, DFI and SOURCE, would be expected from the work of Bergantz and Hassell (1991) and Pennington (1987). Others, such as ETO and TRACE, reflect the role of the SV within information access and use.

Information  Description
CFI          Derive control flow information from the trace
DFI          Derive data flow information from the trace
ETO          Compare to an earlier trace output
GOAL         Comment on the goal of all or part of the program
PRED         Predict future behaviour of the trace
READ         Read the trace output
SOURCE       Refer to or reconstruct source code
TRACE        Comment on navigation or notation of the trace

Figure 7. Protocol coding scheme for information types.
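By way of illustration, the following are hypothetical utterances of the kind some of these categories would capture (constructed examples based on the fun program, not excerpts from the actual protocols):

CFI: "So after gold fails it goes back and tries the bike rule."
DFI: "Object must be bound to honda at this point."
PRED: "I think silver(honda) will succeed next."
SOURCE: "That line matches the second fun rule in the program."
TRACE: "Where has the top of the trace gone? We need to scroll back."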

As control flow and data flow are the information types most central to program comprehension, a two-way mixed ANOVA was performed on these data. This revealed significant main effects of SV, F(3, 26) = 5.155, p < 0.01, and information type, F(1, 26) = 15.262, p < 0.01. A Tukey (HSD) post-hoc comparison revealed significant differences between PTP and TPM (p < 0.05) and between TTT and TPM (p < 0.05). A one-way analysis of variance revealed a significant main effect of SV on trace-related utterances (TRACE), F(3, 26) = 5.536, p < 0.01. A Tukey (HSD) post-hoc comparison revealed significant differences


between PTP and TPM (p < 0.01) and between TTT and TPM (p < 0.05). The frequencies of CFI, DFI and TRACE utterances are shown in figure 8.

Figure 8. Mean number of CFI, DFI and TRACE utterances.

Strategies

A preliminary analysis of the protocols revealed seven comprehension strategies (see figure 9). The presence of REVIEW and TEST strategies would be expected from the work of, among others, Green (1977) and Katz and Anderson (1988). The other strategies relate to the way the SV was used within the comprehension process. A strategy was defined as a set of temporally close utterances performing some function relating to the comprehension of the program, the SV, the execution, or the interrelations between them. A strategy may be carried out by one subject or jointly between the pair.

Strategy     Description
REVIEW CF    Review previous execution steps
REVIEW DF    Review previous data flow
TEST CF      Predict and test future steps of the trace
TEST DF      Predict and test future bindings of variables
EXPERIENCE   Compare against previous experience of the tracer
SOURCEMAP    Map successive steps of the trace against the code
OVERVIEW     Comment on the overall trace output at some point

Figure 9. Strategies identified in the protocols.

A two-factor ANOVA comparing the four SVs across the two review strategies revealed a main effect of tracer, F(3, 26) = 3.495, p < 0.05, and a significant interaction between SV and strategy, F(3, 26) = 4.304, p < 0.05. Simple effects were found for the strategy REVIEW DF, F(3, 43) = 5.528, p < 0.01, and


the tracer TTT, F(1, 26) = 9.333, p < 0.01. A Tukey (HSD) post-hoc comparison revealed a significant difference between PTP and TPM (p < 0.05). The review strategies are shown graphically in figure 10.

Figure 10. Mean number of REVIEW CF and REVIEW DF strategies for each SV.

A two-factor ANOVA comparing the four SVs across the two test strategies revealed a main effect of tracer, F(3, 26) = 3.253, p = 0.0378. A Tukey (HSD) post-hoc comparison revealed a significant difference between PTP and TPM (p < 0.05). The test strategies are shown graphically in figure 11.

Figure 11. Mean number of TEST CF and TEST DF strategies for each SV.

A one-way analysis of variance of the distribution of the SOURCEMAP strategy revealed a main effect of SV, F(3, 26) = 5.656, p < 0.01 (figure 12). A Tukey (HSD) post-hoc comparison revealed significant differences between TTT and TPM (p < 0.01) and between TTT and Spy (p < 0.05).

Figure 12. Mean number of SOURCEMAP strategies for each SV.

Misunderstandings

A preliminary analysis of the protocols revealed four main misunderstandings of the trace: clause-goal, control flow, data flow and timing.


Clause-goal misunderstandings arise when the clause in the program is compared in an inappropriate way to the goal as currently shown in the trace. Control flow misunderstandings tended to result from the subject using an incorrect model of control flow which the SV failed to counteract. Two different kinds of data flow misunderstanding were identified. Some examples appeared to result from a disparity between the binding observed in the trace and the binding (wrongly) expected by the subjects. Other instances resulted from subjects using an incorrect or incomplete model of how data flow occurs in Prolog. Time misunderstandings are those resulting from a failure to appreciate which point in the execution is being shown at any particular moment.
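As a hypothetical illustration (the actual protocol excerpts are not reproduced here), consider the car subgoal of the fun program as it appears in the Spy trace of figure 2:

% Clause in the source program:
car(mini).
% Goal as displayed in the trace before unification:
%   call car(_1)
% A clause-goal misunderstanding arises if the reader treats the
% displayed goal car(_1) and the clause head car(mini) as the same
% object, rather than as a goal and a clause with which it may unify.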

Misunderstanding  Spy   PTP   TPM   TTT
Clause-goal       1.63  0.57  1.00  1.43
Control flow      0.75  0.00  0.00  0.00
Data flow         1.13  0.43  0.13  0.29
Time              0.00  0.00  2.75  0.43
Total             2.08  1.00  3.88  2.16

Figure 13. Mean number of misunderstandings of the trace per subject pair.

The mean numbers of misunderstandings of the trace are shown in figure 13. A one-way ANOVA of the total number of misunderstandings for each subject pair revealed a main effect of tracer, F(3, 26) = 3.669, p < 0.05. A Tukey (HSD) post-hoc comparison revealed a significant difference between PTP and TPM (p < 0.05).

Discussion

In terms of completion rates, TTT and PTP both significantly outperformed TPM. Spy fared less well than the other two textual tracers, and TPM performed worst overall. The results therefore show that the features of a SV have a significant effect on its usability for Prolog beginners.

Though the performance rates of TTT and PTP subjects were similar, the protocols suggest there were important differences in how the subjects used and adapted to the two tracers. The information content shows that PTP subjects derived more control flow information, whereas TTT subjects focused more on data flow. The linear indented display of PTP, showing clauses being entered and exited, provides a very clear model of control flow. The more compact TTT display, showing higher-level variable names permeating through the trace, provides a good model of data flow.


Similarly, far more control flow strategies were identified in the PTP protocols. Subjects had more misunderstandings of TTT than of PTP. TTT subjects were more likely to confuse the clause and the goal as displayed in the trace. This is understandable because, unlike PTP, TTT does not distinguish directly between clause and goal. TTT subjects also showed a number of time misunderstandings, no doubt due to the difficulty of adapting to a textual display that develops in a non-linear fashion.

The protocols can also shed some light on why Spy performed less well than the other textual tracers. Spy subjects obtained less control flow and data flow information from the tracer than TTT and PTP subjects. This is largely due to the UNIFY lines in the trace being harder to understand (and easier to ignore) than the clause and goal pairings of PTP. This, combined with the length of the display, made it far more difficult for subjects to derive a control flow or data flow account of the execution. As a result these subjects used far fewer control flow and data flow related strategies to aid comprehension, relying far more on mapping between lines of the trace and the source code. Source-mapping can be thought of as a default strategy to which subjects defer when their understanding is insufficient to permit a higher-level strategy. Spy subjects were also far more likely to misunderstand control flow and data flow, and to confuse the clause and goal as presented in the trace. This again is probably because the UNIFY line fails to show clearly the entering and exiting of clauses and unification, or to describe the relation between the clause and the goal in a way the novice users could understand.

The protocols can also help explain why TPM subjects accomplished far less than those using a textual trace. TPM subjects made fewer control flow and data flow statements, though they commented far more on the goals of the program. This suggests that although the subjects could gain a vague understanding of the program at a higher level of abstraction, they had difficulty using the trace to extract more fine-grained information. TPM subjects also relied heavily on source-mapping and on discussing the overall trace to gain an understanding of the execution. This supports the view that when the subjects looked at the trace in more detail they had to rely on a simple strategy requiring little understanding. TPM subjects were the only ones to use an OVERVIEW strategy, which was used to gain a more global account of the execution. Subjects could use the SV to spot basic changes such as incorrect relation names or an incorrect ordering of subgoals, but were unable to derive more complex


information. TPM subjects also frequently lost their position in the execution. It could therefore be that preserving the entire AND/OR tree shape throughout the execution is a mixed blessing when novices are using the trace unaided.

Conclusions

The methodology appeared to serve its purpose of providing a fine-grained account of the performance of subjects, succeeding in identifying information access, strategies and misunderstandings in the protocols. The protocol-based approach offers an explanation of why subjects performed the way they did. The approach identified important problems subjects have when using particular SVs, which can be used to motivate design modifications. The methodology was also able to identify interesting differences in how the task was carried out. Using subject pairs was found to be a successful empirical technique, facilitating detailed verbal reports in a naturalistic way.

Acknowledgements

This work was supported by an EPSRC postgraduate research studentship.

References

Alesandrini, K. L. and Rigney, J. W. (1981). Pictorial presentation and review strategies in science learning. Journal of Research in Science Teaching, 18(5), 465-474.

Badre, A. N. and Allen, J. (1989). Graphic language representation and programming behaviour. In G. Salvendy and M. J. Smith (Eds.), Designing and Using Human-Computer Interfaces and Knowledge-Based Systems. Amsterdam: Elsevier.

Baek, Y. K. and Layne, B. H. (1988). Color, graphics and animation in a computer-assisted learning tutorial lesson. Journal of Computer-Based Instruction, 15(4), 131-135.

Bergantz, D. and Hassell, J. (1991). Information relationships in PROLOG programs: how do programmers comprehend functionality? International Journal of Man-Machine Studies, 35, 313-328.

Brayshaw, M. and Eisenstadt, M. (1991). A practical tracer for Prolog. International Journal of Man-Machine Studies, 42, 597-631.


Byrd, L. (1980). Understanding the control flow of Prolog programs. In S. Tärnlund (Ed.), Proceedings of the Logic Programming Workshop, Debrecen, Hungary.

Coombs, M. J. and Stell, J. G. (1985). A model for debugging Prolog by symbolic execution: the separation of specification and procedure. Research Report MMIGR137, Department of Computer Science, University of Strathclyde.

Cunniff, N. and Taylor, R. P. (1987). Graphical vs. textual representation: an empirical study of novices' program comprehension. In G. M. Olson, S. Sheppard and E. Soloway (Eds.), Empirical Studies of Programmers: Second Workshop. Norwood, NJ: Ablex.

Ditchev, C. and du Boulay, J. B. H. (1987). An enhanced trace tool for Prolog. In Proceedings of the Third International Conference, Children in the Information Age, Sofia, Bulgaria.

Eisenstadt, M. (1984). A powerful Prolog trace package. In Proceedings of the 6th European Conference on Artificial Intelligence, Pisa, Italy.

Eisenstadt, M. (1985). Tracing and debugging Prolog programs by retrospective zooming. In R. Hawley (Ed.), Artificial Intelligence Programming Environments. Chichester, UK: Ellis Horwood.

Eisenstadt, M. and Brayshaw, M. (1988). The Transparent Prolog Machine (TPM): an execution model and graphical debugger for logic programming. Journal of Logic Programming, 5(4), 277-342.

Ericsson, K. A. and Simon, H. A. (1984). Protocol Analysis: Verbal Reports as Data. Cambridge, MA: MIT Press.


Fung, P., Brayshaw, M., du Boulay, B. and Elsom-Cook, M. (1990). Towards a taxonomy of novices' misconceptions of the Prolog interpreter. Instructional Science, 19(4/5), 311-336.

Green, T. R. G. (1977). Conditional program statements and their comprehensibility to professional programmers. Journal of Occupational Psychology, 50, 93-109.

Katz, I. R. and Anderson, J. R. (1988). Debugging: an analysis of bug location strategies. Human-Computer Interaction, 3, 351-399.

Patel, M. J., du Boulay, B. and Taylor, C. (1991). Effect of format on information and problem solving. In Proceedings of the 13th Annual Conference of the Cognitive Science Society, Chicago.

Pennington, N. (1987). Stimulus structures and mental representations in expert comprehension of computer programs. Cognitive Psychology, 19, 295-341.

Peters, H. J. and Daiker, K. C. (1982). Graphics and animation as instructional tools: a case study. Pipeline, 7(1), 11-13.

Reed, S. K. (1985). Effect of computer graphics on improving estimates to algebra word problems. Journal of Educational Psychology, 77(3), 285-298.

Rigney, J. W. and Lutz, K. A. (1976). Effect of graphic analogies of concepts in chemistry on learning and attitude. Journal of Educational Psychology, 68(3), 305-311.

Schertz, Z., Goldberg, D. and Fund, Z. (1990). Cognitive implications of learning Prolog - mistakes and misconceptions. Journal of Educational Computing Research, 6(1), 89-110.


Suchman, L. A. (1987). Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge: Cambridge University Press.

Taylor, C., du Boulay, B. and Patel, M. (1991). Outline proposal for a Prolog 'Textual Tree Tracer' (TTT). CSRP No. 177, Department of Cognitive and Computing Sciences, University of Sussex.

Taylor, J. A. (1988). Programming in Prolog: an in-depth study of the problems for beginners learning to program in Prolog. Unpublished PhD thesis, Department of Cognitive and Computing Sciences, University of Sussex.