THE DESIGN OF THE USER INTERFACE FOR SOFTWARE DEVELOPMENT TOOLS

by

Mark Anthony Toleman

BAppSc (DDIAE), MSc (JCU), GradDipBus (DDIAE)

A thesis submitted to the Department of Computer Science, The University of Queensland, for the degree of DOCTOR OF PHILOSOPHY

May 1996

To Margaret, Jessica and Thomas

DECLARATION

To the best of my knowledge and belief, this thesis, including that material closely related to the joint papers [WT89, TW91b, WT92, TWC92, TW94a, TW94b, TW95, TW96], reports my original work except as otherwise acknowledged in the text. It has not been submitted, either in full or in part, for a degree at this or any other university.

Mark A. Toleman

ACKNOWLEDGEMENTS

I am most grateful to my supervisor, Professor Jim Welsh, for his guidance, support, and friendship throughout the course of my doctoral program, and for his careful reading and assessment of the drafts of this thesis. I would also like to thank the members of the Software Tools Group for their assistance during my studies. Alan Chapman programmed the Macintosh for the experiment on the usability of graphical and text-based menu selection schemes. Henry Eastment and Bob Wicks provided assistance with experimental design choices. I lectured at the University of Southern Queensland during my studies, and its support is gratefully acknowledged. Finally, I want to thank my family, Margaret, Jessica and Thomas, for their love and support, without which this thesis would not have been possible.

ABSTRACT

This thesis is concerned with design options for user interfaces for language-based editors, and in particular the evaluation of these options where competing choices are available. Most design and evaluation within this domain is based on the intuition and experience of the designers. Tool designers consider themselves typical users of the tools that they design and tend to evaluate their products subjectively rather than seek the views of their colleagues or indeed the potential users of those products. We consider this approach inadequate if the quality of software tools is to improve, and we advocate the use of more systematic methods.

Four studies have been conducted as part of this research project. All four show how user interface design choices for software development tools can be evaluated using methods not routinely applied in this domain. Firstly, we conducted a study of the application of user interface design guidelines by examining their use in a retrospective evaluation of the user interface of an existing language-based editor. The analysis reinforced many of the original intuitive design decisions and confirmed subsequent design options included in the editor's successor, but clearly indicated the usefulness of such guidance. This study also focused our attention on a range of user interface issues and indicated areas needing further investigation. One such area was the display representation appropriate to certain views of a document, in particular a document's overall hierarchical structure and the use of this structure as a menu for selecting a section for examination. Thus, in a second study, we examined the efficiency, from the user's perspective, of two different representations of this hierarchical structure: a graphical version and a text-based version. Both representations were equally efficient, although a majority of users perceived that they performed better with (and preferred) the graphical representation.

An alternative to intuition for evaluating the design of the user interface is the predictive modelling approach. Models, which can assist design choices before implementation or prototyping, were built to compare the two basic paradigms for language-based editing: tree-building and (text) recognition. This modelling exercise was useful in that it confirmed our view on the efficiency of various editing paradigms, but it also pointed to gaps in our knowledge and the need for empirical studies. Thus, in a usability experiment, we examined software engineers using various editing paradigms, and we measured the efficiency and effectiveness of the various design options. Users performed similarly with both paradigms, although they perceived that they performed better with the tree-building paradigm. By utilising the same tasks in the usability experiment as in the predictive modelling analysis, we were able to compare results from both evaluations and effectively validate the predictive models. The validation showed a close correlation between predicted and actual empirical results, indicating the usefulness of such modelling schemes.

In summary, software development tools are amenable to the same types of evaluation as other software products, but there are challenges. We have conducted a guideline review, built predictive models, and experimentally evaluated aspects of user interfaces for language-based editors, and expect such evaluations to enhance the usability of the tools that we develop in future. Moreover, these studies have convinced us that such systematic experimentation is vital if important user interface design issues relevant to software tool design are to receive adequate consideration.

Contents

1 Introduction

2 Approaches to User Interface Design
  2.1 User Interface Design Strategies
  2.2 User Experience-Based Design
    2.2.1 Design by Intuition
    2.2.2 Experiments to Assist Design Choices
  2.3 Model-Based Design
    2.3.1 Predictive Models for Comparing Designs
    2.3.2 Anthropomorphic Design
    2.3.3 Cognitive Approach to Design
  2.4 Use of Guidelines in Design
  2.5 Summary

3 Interface Design for Software Tools
  3.1 User Interface Design Strategies
  3.2 User Experience-Based Design
    3.2.1 Design by Intuition
    3.2.2 Experiments to Assist Design Choices
  3.3 Model-Based Design
    3.3.1 Predictive Models for Comparing Designs
    3.3.2 Anthropomorphic Design
    3.3.3 Cognitive Approach to Design
  3.4 Use of Guidelines in Design
  3.5 Project Description

4 Software Development Tools
  4.1 Language-Based Editors
  4.2 Models in Editor Design
  4.3 Editing Paradigms
  4.4 The Synthesizer Generator
  4.5 UQ1
  4.6 Summary

5 Guidelines and a Language-Based Editor
  5.1 How Guidelines can be Used
  5.2 Choosing the Guidelines
    5.2.1 Application of Guidelines
    5.2.2 Software for Application of the Guidelines
  5.3 Correspondence of Editor Functions and the Guidelines
    5.3.1 Data Entry
    5.3.2 Data Display
    5.3.3 Sequence Control
    5.3.4 User Guidance
    5.3.5 Data Transmission
    5.3.6 Data Protection
  5.4 Application Procedure
  5.5 Analysis of Guidelines Review
    5.5.1 Feedback and Response Rate
    5.5.2 Editor Functionality
    5.5.3 Document Structure and Appearance
    5.5.4 Input Device Consistency
    5.5.5 User Needs and User Models
  5.6 Conclusion
  5.7 Summary

6 An Experiment with Menu Design
  6.1 Representation of Hierarchies
  6.2 Experimental Details
    6.2.1 Experimental Subjects
    6.2.2 Experimental Apparatus
    6.2.3 Experimental Procedure
    6.2.4 Experimental Design
  6.3 Experimental Results
    6.3.1 Mouse Dexterity
    6.3.2 Errors in Selection of Menu Items
    6.3.3 Menu Style Differences
    6.3.4 Menu Style Preferences
  6.4 Discussion of Experimental Procedures and Results
  6.5 Summary

7 A Keystroke Model of Editing Paradigms
  7.1 The Editing Paradigm Debate
  7.2 The Keystroke-Level Model
  7.3 Programming Tasks
  7.4 Tree-Building Paradigm
  7.5 Tree-Building with Keyboard-Only Input
  7.6 Text-Recognition Paradigm
  7.7 Text-Recognition with Mouse-Based Template Selection
  7.8 Plain Text Editing
  7.9 Comparative Results and Discussion
  7.10 Summary

8 An Experiment with Editing Paradigms
  8.1 Experimental Context
  8.2 Experimental Details
    8.2.1 Experimental Subjects
    8.2.2 Experimental Apparatus
    8.2.3 Experimental Procedure
    8.2.4 Experimental Design
  8.3 Usability Evaluation
    8.3.1 Task Completion Times
    8.3.2 Errors in Editor Use
    8.3.3 Menu Selection versus Typing
    8.3.4 Use of Acceptance versus Typing
    8.3.5 Perceptions and Preferences of Subjects
  8.4 Conclusions
  8.5 Summary

9 Validation of Keystroke-Level Models
  9.1 Overall Keystroke-Level Model Validation
  9.2 Data Collection for Parameter Estimation
    9.2.1 Hypotheses Related to KLM Validation
    9.2.2 Statistical Analysis
    9.2.3 Results for M2K
    9.2.4 Results for MK
    9.2.5 Results for MPK
    9.2.6 Results for MPD
  9.3 Conclusions
  9.4 Summary

10 Need for Further Work
  10.1 Detail Suppression
    10.1.1 A Possible Experiment
  10.2 Multilingual Documents
    10.2.1 Possible Experiments
  10.3 Summary

11 Conclusions
  11.1 Summary of this Thesis
  11.2 Major Outcomes and Contributions
  11.3 Future Work

A Sample Guideline Use

B Hierarchies Used in Menu Experiment
  B.1 Partial University Hierarchy (A)
    B.1.1 Description
    B.1.2 Hierarchy
    B.1.3 Questions
  B.2 Staff in University Schools (B)
    B.2.1 Description
    B.2.2 Hierarchy
    B.2.3 Questions
  B.3 The Bachelor of Information Technology Course Structure (C)
    B.3.1 Description
    B.3.2 Hierarchy
    B.3.3 Questions
  B.4 The Widget Farming Company (D)
    B.4.1 Description
    B.4.2 Hierarchy
    B.4.3 Questions
  B.5 The British Royal Family
    B.5.1 Description
    B.5.2 Hierarchy
    B.5.3 Question

C Program Development Code Segments

D Program Maintenance Code Segments

E Instructions for Editing Experiment
  E.1 Introduction
  E.2 Program Development and Maintenance Tasks
  E.3 Experimental Procedure
  E.4 Summary

F Editing Paradigm Experiment Details
  F.1 Sample Task
    F.1.1 Task 1: Junk
    F.1.2 Task 2: Binary
  F.2 Program Input Tasks
    F.2.1 Task 1: Check
    F.2.2 Task 2: AddNumbers
    F.2.3 Task 3: AddNumbersAlso
    F.2.4 Task 4: StoreCount
    F.2.5 Task 5: Random
    F.2.6 Task 6: ComputeChange
    F.2.7 Task 7: CheckInput
    F.2.8 Task 8: IterativeSum
    F.2.9 Task 9: Count
  F.3 Program Maintenance Tasks
    F.3.1 Task 1: Dummy
    F.3.2 Task 2: Dummy
    F.3.3 Task 3: AddNumbers
    F.3.4 Task 4: AddNumbers
    F.3.5 Task 5: AddNumbers
    F.3.6 Task 6: AddNumbers
    F.3.7 Task 7: ComputeChange
    F.3.8 Task 8: ComputeChange
    F.3.9 Task 9: ComputeChange
    F.3.10 Task 10: IterativeSum
    F.3.11 Task 11: IterativeSum
  F.4 Typing/Mousing Test

G Questionnaire for Editing Experiment

H Tables of Error Analysis Means

List of Figures

4.1 Diagram from Welsh, Broom and Kiong [WBK91] showing pluralistic model of program structure.
4.2 The Cornell Pascal editor interface.
4.3 UQ1 Pascal editor showing block-oriented program display.
5.1 Guideline 1.04 Fast Response, from Smith and Mosier [SM86].
6.1 Macintosh program simulation of UQ editor windows.
6.2 Graphical menu style.
7.1 An example program input task.
7.2 An example program maintenance task.
9.1 Linear relationships between KLM predicted and actual execution times for three editors.
10.1 A scrolled view of program text.
10.2 Detail suppression by structural distance.
10.3 Mixed language zones.

List of Tables

5.1 Distribution of guidelines in Smith and Mosier [SM86].
5.2 Distribution of applicable and non-applicable guidelines.
6.1 Analysis of variance table showing sources of variation.
6.2 Total number of errors before correct selection of an item.
6.3 Analysis of variance table, for experienced subjects, for hierarchy A and question type I.
6.4 Mean time in seconds for selection of items from two menu styles by experienced subjects.
6.5 Mean time in seconds for selection of items from two menu styles by inexperienced subjects.
6.6 Mean time in seconds for selection of items when the graphical menu style was presented to experienced subjects with stated preferences.
6.7 Mean time in seconds for selection of items when the text-based list menu style was presented to experienced subjects with stated preferences.
6.8 Mean time in seconds for selection of items when graphical or text-based list style menus were presented to inexperienced users with stated menu style preferences.
7.1 KLM analysis for program code input for tree-building paradigm with mouse-based menu selection (CSG-TB).
7.2 KLM analysis for program code maintenance for tree-building paradigm with mouse-based menu selection (CSG-TB).
7.3 KLM analysis for program code input for tree-building paradigm with keyboard-based menu selection (CSG-TBk).
7.4 KLM analysis for program code maintenance for tree-building paradigm with keyboard-based menu selection (CSG-TBk).
7.5 KLM analysis for program code input for text-recognition paradigm (UQ1-TR).
7.6 KLM analysis for program code maintenance for text-recognition paradigm (UQ1-TR).
7.7 KLM analysis for program code input for text-recognition paradigm with mouse-based menu selection (UQ1-TB).
7.8 KLM analysis for program code maintenance for text-recognition paradigm with mouse-based menu selection (UQ1-TB).
7.9 KLM analysis for program code input for plain text editing with automatic indent propagation (EDIT).
7.10 KLM analysis for program code maintenance for plain text editing with auto-indentation (EDIT).
7.11 KLM analysis for program code maintenance for modeless text-recognition (UQ*).
7.12 KLM time estimates for all paradigms for program input.
7.13 Comparison ratios of KLM estimates for program input.
7.14 KLM time estimates for all paradigms for program maintenance.
7.15 Comparison ratios of KLM estimates for program maintenance.
8.1 Loadings on computer while editing experiment conducted.
8.2 Program input and maintenance tasks performed by all subjects with all editors.
8.3 Analysis of variance table of task completion time for program input task 1.
8.4 Overall completion times for program input and maintenance tasks (sec).
8.5 Analysis of variance table of task completion time for program input task 1.
8.6 Types of error in program input and maintenance tasks.
8.7 Times for program input and maintenance sub-task events.
8.8 Counts of the use of accept for program input and maintenance sub-task events.
8.9 Times for program input and maintenance sub-task events.
8.10 Rank totals over five subjects, comparing perceptions and preferences of those subjects.
9.1 Empirically measured times for program input and maintenance tasks used in KLM study.
9.2 KLM analysis time estimates and, in parentheses, percentage absolute error of these compared with empirical data for program input and maintenance tasks.
9.3 Correlation and regression results for comparing KLM predicted and actual times.
9.4 Analysis of variance table for M2K events.
9.5 Mean times for M2K events.
9.6 Analysis of variance table for MK events.
9.7 Mean times for MK events for editors.
9.8 Mean times for MK events for subjects and individuals' typing speeds.
9.9 Analysis of variance table for MPK events.
9.10 Mean times for MPK events for editors.
9.11 Analysis of variance table for MPD events.
9.12 Mean times for MPD events for subjects.
9.13 Comparison of actual and predicted times (sec) for KLM events including M2K, MK, MPK, MPD.
9.14 KLM estimates using empirically determined values of M2K, MK, MPK, MPD and, in parentheses, percentage absolute error of these compared with empirical data for program input and maintenance tasks.
9.15 Correlation and regression results for comparing new KLM predicted and actual times.
10.1 Performance data comparing hypertext, scrolling and folding program browsing mechanisms (Monk, Walsh and Dix [MWD88]).
H.1 Number of types of error in program input and maintenance tasks.
H.2 Number of actual errors in program input and maintenance tasks.
H.3 Error times for program input and maintenance tasks (sec).
H.4 Average time for each error for program input and maintenance tasks (sec).

Chapter 1

Introduction

Software development is an expensive business, principally because trained human intellect is the primary resource. To assist software developers with their tasks, software development methodologies have evolved and continue to be enhanced and created. There are numerous methodologies addressing software life-cycle problems, but these methodologies are complex and require software development environments, which provide automated tools, to support software developers. To meet the competing demands of keeping costs down and quality high, the search continues for better methodologies and better support tools.

The objective of software development tools[1] is to assist software developers to produce quality products in minimum time. Such tools should maximise the product quality and productivity achieved by enabling their users to perform their creative intellectual activity under optimal conditions, by preventing or detecting human errors as they occur, and by relieving the users of routine mental and physical activity associated with the productive process.

[1] We often drop the words "software development" or the word "development" to avoid heaviness throughout this thesis.

Software tools are required by and used by experienced and expert users: software engineers. General computer users are usually interested in tools with built-in knowledge in domains other than computing, whereas software engineers use tools that incorporate specialised knowledge about software and software development. The interfaces between the user and the computer required for these different purposes are likely to be as different as the tasks themselves. In addition, the expertise of individual software engineers varies over time, with engineers becoming more proficient in the use of a tool as experience with the tool grows. Since software engineers are likely to spend much of their time at this higher level of expertise, the design of user interfaces for software tools should emphasise optimising the physical and intellectual ergonomics involved, rather than ensuring comprehensibility for users unfamiliar with the tool. Thus, for maximum efficiency to be achieved by software engineers in the use of software tools, it is vital that the human-computer interface be optimised for the task of software development.

For the designers producing software tools (and for designers of other software too), one problem is that rapid advances in user interface technology imply correspondingly rapid change in the factors that determine user interface success. Another problem is that software tool designers are often unaware of basic research in the area of human-computer interaction, so these results are not considered in user interface design. Ideally, user interface choices should be determined, or at least validated, by application of a well-defined set of interface selection and evaluation criteria or procedures. In practice, such criteria and procedures are not used by many tool developers, either through ignorance of results available in the literature or through the lack of a workable set of procedures for applying them. Tool developers typically rely on their intuition in deciding on and choosing between user interface options; they consider themselves representative users and believe that other users have a similar sense of tool usability and relevance. With many tools this is apparently not the case, since users have been slow to adopt these tools due to `interaction' concerns. Software tools are frequently criticised for their idiosyncratic user interfaces and poor usability, both by software engineers (the potential users of the tools) and by human factors researchers.

Clearly, there is a requirement to make software tool developers more aware of the need for careful user interface design of their tools. A more systematic approach, which encompasses an understanding of the cognitive processes associated with software development and the tasks to be performed by the software engineer, has been advocated for over a decade, yet many tool designers continue to ignore this research and advice. Poor communication between psychologists and ergonomists on the one hand, and software engineers developing tools on the other, must be partly to blame. Without effective communication of research results, intuition (and subjective evaluation) of the tool developers often becomes the only basis for interface design choices. Of course, many tools and software development environments are experiments with some conceptual model or another, so it can sometimes be argued that such advice can be ignored in order to experiment with aspects other than the user interface. Nevertheless, the main problems remain: how to improve the communication of results from cognitive psychology to tool designers, and how to encourage them to make use of such knowledge.

Communication of psychological models and notations to tool designers, and communication of the need to empirically test design choices with prospective users, remain significant concerns among cognitive psychologists and ergonomists. Where aspects of cognitive psychology have been considered in designing user interfaces for software tools (as in the case of some language-based editors), there has been neither validation nor empirical testing of the interfaces chosen, only subjective discussion about the usefulness of particular design options. Prototyping and empirical user testing (combined with iterative design) are the most basic elements of the usability engineering model, but the typical model applied by tool designers ignores empirical user testing.

The preceding arguments and discussion indicate that tool developers have been remiss. They have failed to adequately evaluate the user interfaces of their tools, relying instead on intuition and subjective evaluation in considering the usability of their tools. There is a need to show that evaluation techniques other than intuition have a place in the design of user interfaces for software tools. The objective of this thesis, then, is to investigate systematic approaches to evaluation, such as those described above, in an examination of design choices for the user interface of typical software tools. In chapter 2 we review some of the strategies that designers can apply in both the identification of design options and the evaluation of competing options. Chapter 3 also examines these strategies, but in the context of their past and potential application to software tools; it defines the remaining parts of the thesis and previews the potential contributions of this research.

Chapter 2

Approaches to User Interface Design

In designing the user interface for any software object there are many interacting concerns. The designer firstly considers the task or tasks to be performed using the particular software product. This might be done with or without advice from and observation of the potential users of the software. The designer must make choices between paradigms, or even provide multiple paradigms, for the various aspects of the human-computer communication. If a particular paradigm is chosen, say direct manipulation for input and system control, then the physical devices required to implement that paradigm must also be chosen. Essentially, the designer has various key issues of concern in the design of the user interface, may need assistance in identifying design options, and requires strategies for choosing between design alternatives. In this chapter we review user interface design strategies that are available to software designers and developers.

2.1 User Interface Design Strategies

Designers can employ a number of strategies that assist in making design choices for a particular software system. For example, in discussing the design of programming languages, environments and methodologies, Soloway [Sol84] noted two common scenarios. Typically, designers rely on intuition, based on their experiences, in deciding what is and what is not an important feature to include in a particular software system. Alternatively, designers could examine the tasks and methods employed by users, and in the process develop a theory of work behaviour that could be empirically tested. If testing was successful, then design principles could be derived. In essence, Soloway advocated that psychological theories and the evaluation strategies used by psychologists should play a significant part in the design process.

In considering the design of user interfaces generally, Eberts and Eberts [EE89][1] categorised the possible approaches under the four headings of empirical (results from experiments), predictive modelling (construction of engineering-style models to assist evaluation of design choices), anthropomorphic (modelling human-computer interaction on human-human communication) and cognitive (results from behaviour studies and psychology). Of course, some studies of user interface design transect these approaches (for example, predictive models are based on cognitive theories), but the categorisation is useful as an organisational tool. Intuition or experience (introspection) was not explicitly considered as an approach but was mentioned as a baseline for examination of the effectiveness of the other approaches.

[1] Eberts [Ebe94] has recently published a new textbook that confirms this categorisation of approaches and expands the detail associated with each approach.

These approaches can be used to either identify or evaluate design alternatives. Introspection is a methodology in itself, covering both identification and evaluation of design options. The empirical approach is primarily concerned with evaluation of design choices using statistically valid experiments with users. Predictive modelling approaches (called `formal' by [MN94]) are also used in the evaluation process. In contrast, the anthropomorphic and cognitive approaches provide assistance in identifying design options for the designer to consider. Where alternatives occur, they can be evaluated using intuition, experiments or a predictive modelling approach. Thus, the strategies available for identifying and evaluating design decisions concerning the user interface can be categorised.

Probably the most common way that information about design choices is presented to the designer is through sets of guidelines, and through a wide range of handbooks and textbooks. Guidelines are based on examination of the extensive research literature available on human-computer interaction and, very often, on the experience of the particular authors (for example [SM86, May92]). They encompass, or at least try to encompass, all the approaches above in presenting to the designer a comprehensive guide to the design of the user interface. Their main advantage is that the designer is freed from having to sift through the extensive literature on user interface design.

At this stage there are few `automatic' tools that assist these strategies. Moreover, in their current primitive state, there is argument about their existence, use, accuracy and coverage [MN94, Bal95]. This may change with time and effort, but such `tools' are not considered further in this thesis.

We now examine these strategies from the general perspective of their use in the development of a software system. Our grouping is more hierarchical than in [EE89], reflecting two foci: user experience-based design and model-based design.

2.2 User Experience-Based Design

2.2.1 Design by Intuition

Many software engineers design systems for users based on their intuition about what those users need and want. Designers rely on their own opinion (or a consensus of expert opinion) when choosing between design options [AD89]. They assume that their experience is representative. Assuming that personal behaviour is typical is a doubtful assumption [Dow93]. Nevertheless, design by intuition has been a significant methodology in the past and probably will be in the future. The requirement for creativity, insight and innovation in design is ever present, but designers of all types of software are becoming more aware of the real needs of users and the obligation to design `knowing the user'. This means more than simply communicating with users (although that is a start). Thimbleby [Thi90, page 189] has said:

    . . . there is one crucial difference between a designer and a computer scientist . . . the designer knows he does not know how to design . . . the computer scientist merely implements a system and by calling the process `design' assumes that the user interface is `designed'.

Another approach involves `participation' by the end-users of software in both design and evaluation [PRS+94]. This approach considers a range of user concerns, including organisational, social and technical concerns, and emphasises the close relationship between design and evaluation.


While intuition certainly contributes to the identification of design choices, a systematic approach, through for example usability engineering and including a range of evaluation techniques, must be incorporated into the design of the user interface for software. This is essential if the users and potential users of that software are to be satisfied with the product and software engineering is to advance as a `true engineering discipline' [RBS93].

2.2.2 Experiments to Assist Design Choices

The empirical (or experimental) approach does offer an alternative to intuition in the evaluation of user interfaces. In fact, as noted in [EE89], many experimental studies have found results that disagree with commonly held intuitive views. According to Nielsen [Nie92], empirical testing is one of the key elements of the usability engineering life-cycle.

Experimental studies comparing various user interface options are common. There are several types of experiment that have been done or can be done by a designer. As an example, a laboratory-style controlled experiment may indicate the best pointing device for a particular situation. Normally these types of experiments are conducted under strict supervision, with all variables except the ones of interest held constant. The experimenter proposes hypotheses about the variables of interest and designs the experiment so that results are amenable to statistical analysis. Typically the experiments are carried out on a prototype system so that results can be fed back to the designers for incorporation in the next version of the product. A field experiment, conducted after release of a product, can also provide valuable information, particularly since the users are more likely to be working in their usual environments on their own problems. Of course, lack of control over these field experiments is a major concern.

From the software designer's viewpoint, the transition to workstations with bitmapped displays and pointing devices has led to a large number of interface options. The designer can choose between many input devices, such as keyboard, mouse, light pen, touch screen and voice, and between various output devices and qualities of output. They can choose between several methods of menu display, including on-mouse, popup, walk-in and fixed. For systems that query the user, the designer can choose from fill-in-the-blank, parametric or direct manipulation approaches. In general, the relative merits of the options available are amenable to empirical evaluation, and a wide literature is available (for a review, see for example [Ebe87]). The scope for empirical evaluation in user interface design is thus virtually unbounded.

There are disadvantages associated with such experimentation. The cost of experimentation, in both time and physical resources, can be significant. Furthermore, poor experimental methodology or a poor choice of experimental design can invalidate results [Gil90]. The choice of experimental subjects is critical, particularly where extrapolation of results to some larger population is to be considered. Experiments are often carried out in environments and surroundings that are unfamiliar to the subjects, thus leading to questions about the generality of any conclusions reached. Many experiments are conducted in an attempt to fill some requirement at a particular time and are thus `need-driven' rather than `theory-driven' [EE89]. The experimental approach is not a generative one: it can only test available design options and does not itself generate ideas for new or better designs, although user feedback can.

By incorporating the empirical approach, interface design choices can be evaluated (both objectively and subjectively) by the users (or potential users) of a product, with the results of these evaluations being returned to the designer for consideration in the next design.
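To make the flavour of such a controlled comparison concrete, the sketch below runs a two-sample significance test on task completion times for two interface options. Everything in it is illustrative: the data, the condition names and the 5% significance level are assumptions made for this example, not results from this thesis, and the experiments reported in later chapters use analysis of variance over more elaborate designs rather than a simple t-test.

    # A minimal sketch of a two-condition usability comparison (hypothetical data).
    # Each list holds one group's task completion times in seconds.
    from scipy import stats

    graphical = [41.2, 38.7, 45.1, 39.9, 43.3, 40.8, 44.0, 37.5]  # assumed data
    text_list = [44.0, 42.1, 46.8, 40.2, 45.5, 43.9, 47.2, 41.0]  # assumed data

    # Welch's two-sample t-test of the null hypothesis of equal mean times.
    t, p = stats.ttest_ind(graphical, text_list, equal_var=False)

    print(f"mean graphical = {sum(graphical) / len(graphical):.1f} s")
    print(f"mean text list = {sum(text_list) / len(text_list):.1f} s")
    print(f"t = {t:.2f}, p = {p:.3f}")
    if p < 0.05:
        print("Difference is significant at the 5% level.")
    else:
        print("No significant difference detected at the 5% level.")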

2.3 Model-Based Design

2.3.1 Predictive Models for Comparing Designs

Another alternative to intuition for evaluating design choices is the predictive modelling approach. Here the idea is to predict the performance of humans interacting with computers in a similar way to predictions with engineering models. Thus, a model is built and used to help evaluate various user interface design options even before prototyping. Models are useful in other ways too. They can guide design decisions by describing how a user might decompose a problem into sub-problems. Models can also capture a representation of the knowledge that users have. In the first case the designer is performing a `hierarchical task analysis', whereas the second involves both physical and mental operations and is termed a `cognitive task analysis'. There are several modelling techniques available for software designers, but no single technique can provide all the information a designer might need. Descriptions of typical modelling techniques are available elsewhere (see for example [Joh92, pages 114-150] or [Lin94, pages 5-12]).

The most common and well-known cognitive modelling technique is the Model Human Processor described by Card, Moran and Newell [CMN83]. In this model, each of the human processor subsystems (perceptual, motor and cognitive) is described by a few important parameters, for example storage capacity and decay time for an item, and by formulae such as Fitts' Law for the sizing and placement of on-screen controls.

The GOMS (Goals, Operators, Methods and Selection of Methods) [CMN83] approach to predictive modelling is one type of modelling scheme that attempts to produce a formal model of the human-computer interaction task. Users are assumed to act rationally to attain their goals. Their behaviour is predicted both qualitatively, in the way they achieve their goals (in selecting between different methods), and quantitatively, in terms of time to do tasks and error frequencies. A GOMS analysis predicts user behaviour, while the KLM (Keystroke-Level Model) [CMN80, CMN83] can be used to predict the time taken by skilled users in carrying out tasks using a particular system. This quantitative assessment is based on the time required to use the primitive actions made available in the system, such as keystroking, pointing, homing and drawing. It also includes a mental operator that describes the time spent in preparing to use these primitive physical actions, and the response time of the system where this is in excess of mental preparation time.

The GOMS/KLM approach has been applied extensively to the use of text editors [CMN83]. A review of the use of this approach and the domains where it has been applied can be found in [OO90]. More recently it has been used in designing on-line help and documentation [GE90, EP91], in analysing the tasks carried out by telephone operators [Joh90], in developing a model of browsing on-line documents [PJ92], in analysing the playing of video games [JV92], in aircraft cockpit design [IPI94], in specialised CAD programs [GK94], and in designing text entry systems for people with disabilities [KL94b].

There are several problems with the GOMS/KLM approach. It assumes that users are skilled in the tasks to be done, that is, it assumes experienced and expert behaviour. This is a recognised flaw of the approach and one that must be acknowledged by designers using it. The main difficulties with the GOMS model lie in obtaining its components for any particular task and in applying the approach to complete systems.[2] Some tasks may not be amenable to this approach at all, and applying the approach to complete systems is particularly problematic. Nevertheless, certain components of software systems, and different designs for those components, are comparable using predictive models.

[2] An enhanced version of GOMS, NGOMSL [Kie88], incorporates the KLM and supposedly assists with some of these difficulties, but the application of NGOMSL is much more complex than the KLM and its practical use is not well documented [Ebe94].
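To show how a KLM estimate is actually assembled, the sketch below sums standard operator times for a small, hypothetical editing sub-task. The operator durations are the published averages from Card, Moran and Newell [CMN83]; the task encoding and the Fitts' Law coefficients in the pointing helper are assumptions made for this illustration, not the models built in chapter 7 (chapter 9 estimates some operator values empirically).

    # A minimal sketch of a Keystroke-Level Model time estimate.
    import math

    # Average operator times in seconds, from Card, Moran and Newell [CMN83].
    OPERATORS = {
        "K": 0.28,  # keystroke, average non-secretarial typist (about 40 wpm)
        "P": 1.10,  # point at a target with a mouse (average over targets)
        "H": 0.40,  # home hands between keyboard and mouse
        "M": 1.35,  # mental preparation for the next physical action
    }

    def fitts_pointing_time(distance, width, a=0.1, b=0.1):
        """Pointing time by Fitts' Law, MT = a + b * log2(distance/width + 1).
        The coefficients a and b here are illustrative assumptions."""
        return a + b * math.log2(distance / width + 1)

    def klm_estimate(encoding):
        """Sum operator times for a task written as a string such as 'H P K'."""
        return sum(OPERATORS[op] for op in encoding.split())

    # Hypothetical sub-task: home to the mouse, point at a menu template,
    # click, prepare mentally, then type a five-character identifier.
    task = "H P K M K K K K K"
    print(f"predicted execution time = {klm_estimate(task):.2f} s")
    print(f"pointing 300 units to a 20-unit target = "
          f"{fitts_pointing_time(300, 20):.2f} s")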

2.3.2 Anthropomorphic Design

Predictive modelling can be seen as the attribution of machine-like qualities to human users. In contrast, anthropomorphism is the attribution of human qualities to non-human entities. This approach provides an alternative to intuition in the identification of design options. Human-human communication is used as a model of human-computer interaction; the computer is considered, at least to some extent, human. It has been argued that if computers behaved more like humans then humans would have less difficulty with computers. The qualities that can be applied to the computer include natural language processing, voice communication, adaptable systems, help messages, tutoring, friendliness, and user profiles [EE89]. Many of these qualities are difficult to define and even more difficult to build into a user interface. The development of natural systems is dependent on technological advances, and future progress will also depend on further understanding of artificial intelligence. Development of so-called `intelligent agents' (computer programs that actively assist users in their tasks) is part of current research and, although designing and prototyping these programs is an important issue for computer science, empirical studies of these agents with users are required [Rie94]. Sometimes natural systems have been found to be inferior, and users have rejected `friendly' systems in favour of computer-oriented ones. Studies detailing difficulties with such systems are cited in [EE89], including [Sca81, DL81, SW83, Jen84]. Various user types, experts usually, prefer less verbose systems, finding them inefficient once they are familiar with the system and tasks.


In [EE89] it was argued that the anthropomorphic approach was better applied by examining mismatches between human-human and human-computer interaction rather than trying to design systems that matched these. Clearly, this anthropomorphic approach is of relevance to software and systems developed for public and general use but may be less relevant where expert users are concerned.

2.3.3 Cognitive Approach to Design

The cognitive approach applies the theories of cognitive psychology and cognitive science to user interface design. These cognitive theories attempt to explain how users view information, how this information is stored in a user's memory, how information is manipulated for problem solving and decision making, and how users respond once solutions (or potential solutions) are discovered. This approach revolves around three important concepts: conceptual models built by designers, mental models developed by users, and the representation of the conceptual model that a user sees (the user interface or system model) [Nor83, EE89]. Designers develop conceptual models of a system (the designer's model of the task and objects [Whi90]), prior to implementation, to provide a framework for the presentation of the functionality of that system to the user [May92]. Users then form their mental model (often called the user model) of the implemented system. In general, the user's model is based on experience with a previous system, whether that system was a manual or an automated one, and on both visible and invisible aspects of the new system. Given this framework, the degree to which the designer's conceptual model (implemented as a system model) helps to build the user's mental model appears vital to the usability of the system. However, there has been little research to assess the influence of different conceptual models on software use [May92].

Smith and Lansman [SL92] described a writing environment that was designed based on what they termed `cognitive modes'. Essentially these were strategies and tactics that users employed during document generation. Their evaluations of the environment did not conclusively prove their theories. Some design decisions were vindicated but others were not.

There is a substantial body of literature regarding cognitive issues and their application in the cognitive approach that can be used and applied in the area of user interface design. For example, Gardiner and Christie [GC87] presented principles from the literature on thinking, memory, skill acquisition and language, in applying cognitive psychology to user interface design. Eberts and Eberts [EE89] noted that theories of problem solving, spatial reasoning, analogical reasoning and metaphors, attentional models, and goals, plans and scripts have also been used.[3]

[3] Eberts [Ebe94] expanded this list to include connectionist or neural network models, but more for their potential application than for current use in user interface design.

Two of the more important concepts to emerge from considerations of models have been metaphors and direct manipulation [Shn83]. Metaphors combine characteristics from a familiar domain (such as an office desktop [SIK+82]) with the system interface so that they become one and the same. Indeed, the anthropomorphic approach might well be seen as a special metaphor where the computer is seen as having human qualities. Direct manipulation of on-screen objects is a `natural' component of metaphors that relate real-world entities to screen-based objects. Users are able to interact with and handle objects on the screen as they would real objects. We note, however, that direct manipulation is not necessarily linked to metaphors. In general, direct manipulation systems reduce the need to type commands. Our visual perception skills allow us to quickly interpret the screen display and directly manipulate any objects that are present (whether they be metaphor-based or not).

The book by Mayhew [May92] included a chapter on conceptual models in which fourteen guidelines were presented to assist designers to develop conceptual models. Galitz [Gal93] provided similar advice in a chapter on system considerations, as did Larson [Lar92] in a chapter on end-users' conceptual models. Regardless of this information, the communication of these cognitive results from psychologists to software engineers continues to be a significant concern, and much more effort is required from both disciplines [Gre94, SH94]. However, this is not only a problem for software engineers. Most human-computer interaction researchers create models and then suggest that they can be used in system design, but rarely do they indicate how they can be used, much less demonstrate or evaluate their usefulness [Whi90]. We need to evaluate the user interface of software, and in particular the models that provide the framework underlying the functionality afforded by that software.


2.4 Use of Guidelines in Design Choices about user interface design options can be assisted by reference to standards and guidelines. By encapsulating generalisable results obtained by either experience-based or model-based design, guidelines are one way of communicating the results of experiments, and knowledge from areas such as psychology and graphic design, to software engineers. Additionally guideline reviews are a useful informal evaluation method [JMWU91]. Every issue of the SIGCHI Bulletin of the Association for Computing Machinery (ACM) contains a column devoted to standards, both current and emerging. Nearly all computing-oriented standards organisations are attempting to examine some aspect of human-computer interaction. For example, hardware standards have been developed by the Federal Republic of Germany's national standards organisation (DIN) and the American National Standards Institute (ANSI) have adopted technical standards for keyboards, video displays and furniture. Standards for software are rarer. DIN has proposed a standard that might be called `software aspects of user interfaces' [Gou88]. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) P1201 standards working group established two sub-groups: P1201.1 to de ne a standard user interface management system (UIMS) based on the X Window System,4 and P1201.2 to de ne recommended practices for graphical user interface drivability. ISO 9241 will eventually have 17 parts and be quite comprehensive in this area [TN91]. So far only three parts; Introduction, Task Requirements and Visual Display Requirements, have completed the entire ISO standards development and approval process [Bil94] with the rst part (approved in 1988) currently undergoing signi cant review. ISO 9241 is expected to form the basis of the European Community health and safety directive 90/270/EEC on health and safety requirements for work with display screen equipment [Bil93] and any ANSI software user interface standards e ort [Ree94, Bil95]. Computer hardware and software manufacturers also provide guidelines. The development of the Xerox Star is a classic example [SIK+ 82]. NCR has a human factors guideline series and there are various Macintosh user interface guidelines 4X

Window System is a trademark of the Massachusetts Institute of Technology (MIT).

CHAPTER 2. APPROACHES TO USER INTERFACE DESIGN

14

Some organisations and companies go so far as to require observance of their guidelines. The US Department of Defense has developed detailed requirements for applying human usability engineering to the development of systems (Military Specification MIL-H-46855) [Gou88]. Availability of guidelines of this type is largely dependent on the sponsoring institution. Many manufacturers, for example IBM, Wang and DEC, develop guidelines for in-house use only [Smi88].

The formulation of `look and feel' guidelines such as OSF/Motif[5] [Ope93] and AT&T and Sun's OPEN LOOK[6] guidelines [Sun89] assists the designer in maintaining uniformity across a range of tools and systems. The extent to which such guidelines become `standards' remains to be seen, but provided they are consistent with results arising from human factors research, they provide an effective channel for communicating these results to the software engineer. There may be argument about the interpretation of guidelines such as OSF/Motif and OPEN LOOK, but this is considered a relatively minor problem by some if not all [NSC+91]. Further operating-system-specific standards and guidelines are available (see for example [Ano94]).

Molich and Nielsen [MN90] argued that small sets of general guidelines were a better basis for design than large sets, since large sets were often not consulted during the design process because of their sheer size. They proposed the use of only nine usability principles. The ten general design principles described in [App88] provided similar guidance. Such guidelines are designed to `raise the consciousness' of designers, but they provide little in the way of detailed assistance. The main problem with such guidelines is that the intuition, experience and skill of the designer are still the main components used in interpreting and implementing such guidance. Recently Mayhew [May92] published nearly 300 guidelines, each with examples and illustrations, all presented against a backdrop of the relevant research. This guideline set is much more comprehensive than those in [MN90, App88], but there is little or no indexing of the guidelines and no support via, for example, software.

The guidelines presented by Smith and Mosier [SM86] are another typical example of such sets of guidelines. These guidelines form the foundation for the current work on ISO 9241 [BM94].

[5] OSF/Motif is a trademark of the Open Software Foundation.
[6] OPEN LOOK is a trademark of AT&T.


There are 944 guidelines in this set, making it one of the largest sets currently available [TN91]. The set not only provides extensive indexing and cross-referencing of guidelines, but there is also software supporting use of the guidelines: NaviText SAM[7] [PM88], BruitSAM [Ian92], and HyperSAM [Ian94].[8]

Many of the guidelines above were developed when full-screen, character-based systems were the norm. Nevertheless, such guidelines still embody good design principles and are still appropriate for graphical user interfaces as well as for even more futuristic hardware technology represented in, say, virtual reality environments [Nie92].

Guidelines can be used both as design tools and as evaluation tools. Jeffries et al. [JMWU91] examined four different techniques for user interface evaluation: heuristic evaluation,[9] software guidelines, cognitive walkthroughs and usability testing. The results indicated that guidelines were useful as a tool to identify important usability problems, and that guidelines were usable by software engineers who were not necessarily user interface design specialists. More recently, guideline reviews have been categorised as a typical usability inspection method [MN94].

The effort in applying guidelines is large, particularly where the set is large. Guidelines are often repetitive, too general or too specific, highly interconnected, open to different interpretations by different users and to multiple interpretations by the same user. However, individual guidelines act as a stimulus for inquiry and discussion on proposed user interface design alternatives. There is no doubt that, as a checklist, guidelines are an extremely useful tool, but given the effort required to use them, the extent to which they provide complete guidance needs to be evaluated. Guidelines need to be investigated since they are a significant resource and have the potential to assist software designers both in the design of the user interface of their software and, through guideline reviews, in user interface evaluation.

[7] NaviText is a trademark of Northern Lights Software.
[8] Work describing the continuing development of tools that support the use of guidelines is maintained in [Van94].
[9] Heuristic evaluations are those done by experts with a view to finding obvious usability errors.


2.5 Summary

We have reviewed strategies that enable software engineers to identify design options and make decisions about and between options for the design of user interfaces for software. It is not at all clear that software engineers explicitly use such strategies or apply systematic approaches to user interface design decisions. In the next chapter we examine these strategies and their application to the design of the user interface for software development tools, and then define the objectives and content of the rest of this thesis.

Chapter 3
Interface Design for Software Tools

In this thesis the domain of interest is software development tools. Thus, we are interested in how the strategies presented in the previous chapter might be applied to user interface design for software tools. In this chapter we examine these strategies in this context, the user interface issues that arise, and their relevance and application to the design of tools. We also define the investigation described in the remainder of the thesis.

3.1 User Interface Design Strategies

To achieve the goal of this thesis, the strategies associated with user interface design, as presented in the previous chapter, require examination from the perspective of the design of software development tools. Such an examination leads to a better understanding of the requirements of user interface design in this domain.

3.2 User Experience-Based Design

3.2.1 Design by Intuition

It is common for software engineers to develop a user interface for a software product with minimal reference to the potential users of that product, in effect using their intuition and common sense to choose an appropriate user interface. This is especially true of the designers of software tools. There is a natural tendency for these designers to see themselves as typical users and to disregard the considerable


variation in individual users' characteristics that actually exists, and how these characteristics evolve as users become more experienced with a tool. Thimbleby [Thi90] suggested that there was evidence that these developers have designed effective tools for their own use, but that this led them to underestimate the problems that other users experienced when using the tools. Of course, software developers do have a better understanding than usual of the user's problems when it comes to software tool use, but this implies more responsibility, not less, if these tools are to gain acceptance and be used by other software engineers. Typically, the uptake rate of innovative software tools by software engineers is lower than their designers expect, and this is often attributed to usability issues.

Evidence of the use of a participative approach to the design and evaluation of software tools was difficult to locate. It seems likely that some software tools have been informally evaluated by potential users of the tools (with such evaluations fed back to the designers), but explicit descriptions of this process have not been found. Zelkowitz [ZKIH89, Zel90] reported on a survey where computer science students were asked to rate and comment on their satisfaction with the syntax-directed editor SUPPORT. The students rated SUPPORT lower than typical text editing environments available at the time. Nevertheless, development of the tool continued for several years and through several re-designs, but evaluations of these were not reported.

Intuition forms the basis for many software evaluations and, in particular, for evaluations of the user interface of software tools. This is unsatisfactory and more rigorous methods are required. The opinions of the tool designers are important, but less subjective measures should be sought in evaluating ease of use and, indeed, the optimality of the user interface for all users of software tools. Intuition is important to tool design, but more systematic approaches to the selection and evaluation of design options need exploration.

3.2.2 Experiments to Assist Design Choices

Experimental studies with prototypes and users are a valuable means of selecting between user interface design options. Unfortunately, empirical studies of users interacting with software tools are not common. Where evaluations have been


done and reported, they were usually informal and primarily anecdotal. The CHI'90 Workshop on Structure Editors reported in [NS90] noted experimental studies by only three sets of researchers [ZKIH89, Chi90, GW91][1] and concluded that there was far too little of this type of research. The consensus of the workshop participants was that the complexity of tools, the types of tasks and actions in which tool users engaged, and decisions about what and how to evaluate, hindered research in this area. More recently, the report from the ICSE-16 Workshop on Software Engineering and Computer-Human Interaction [TC94] criticised the lack of testing and evaluation in this area, noting how `immature' many studies were, particularly those consisting only of questionnaires.

In a recent review of structure-oriented environments, Whittle, Gautier and Ratcliffe [WGR94] noted that evaluation of the usability of these environments had received little attention. They highlighted only three articles [Lan86, WHBK87, Min90] and reported on one continuing project [OSWS93]. Several reasons were given for this lack of attention: designers tended to rely on their own evaluations or informal evaluations by their peers; usability evaluations were hampered by the relatively weak market penetration of the tools and the consequent lack of user experience with them; and between-environment evaluations were difficult to undertake due to the diversity of facilities offered.

There are problems with experimentation, such as cost, subject selection and extrapolation of results to `real life', but the objectivity that experiments bring to the evaluation process is invaluable. Experiments with software tools and tool users, and experimental methodologies appropriate to this domain, are required.

[1] At the time of the report the last study had not been published.

3.3 Model-Based Design

3.3.1 Predictive Models for Comparing Designs

Modelling techniques, such as GOMS/KLM, predict potential usage patterns for systems and timings for specific tasks. Text editors, which are primitive software tools, have been studied extensively using predictive models. Embley and Nagy [EN81] reviewed the literature on the application of predictive models to text editing


tasks, and indeed much of the discussion in [CMN83] was based on examination of editor use. All studies reported the usefulness of the approach in accurately predicting user actions and task times. Since then, though, the technique does not seem to have been applied in the area of software tools.

Of the problems with the GOMS/KLM approach, those related to component identification within tasks and application to a complete system seemed most relevant to software tools, and most difficult. The assumption of skilled users and error-free behaviour is less of a problem when examining software engineers using a software tool than it is for other situations where unskilled users are required to use tools and techniques about which they have no expert knowledge. In some instances the identification of task components is simple. For example, a task such as document correction after mark-up, which could involve cut and paste operators to achieve the stated goal, might be implemented using mouse-based menu selections or keyboard-based commands. Another task, such as document comprehension prior to correction, is much more abstract, involving the user's knowledge and cognitive skills as well as appropriate perceptual and motor skills for assisting with browsing and searching. It may be difficult or impossible to completely model this comprehension and correction task using GOMS/KLM techniques, since it is not just the command structure that is in use: the form of the display system also affects task times and error frequency. Thus, creating models for complete systems may be limited to only qualitative analysis, since we are limited by our ability to model individual tasks. Despite such limitations, the approach has been shown to be useful in a range of situations, and with text editors in particular, so it warrants investigation as a technique to apply to the evaluation of competing designs for interface components of current-generation software tools.
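To make the keystroke-level flavour of this technique concrete, the sketch below estimates task times by summing standard operator times. It is a minimal illustration, not a model from the studies cited: the operator values are the commonly cited KLM figures, and the two command sequences are hypothetical ways of performing the same correction.

    # A minimal keystroke-level model (KLM) sketch. Operator times are the
    # commonly cited KLM values; the command sequences below are invented.
    KLM_TIMES = {
        "K": 0.20,  # press a key (average skilled typist)
        "P": 1.10,  # point with the mouse at a target
        "B": 0.10,  # press or release a mouse button
        "H": 0.40,  # move hands between keyboard and mouse
        "M": 1.35,  # mental preparation for an action
    }

    def predict_time(operators):
        """Predict execution time (seconds) for a sequence of KLM operators."""
        return sum(KLM_TIMES[op] for op in operators)

    # Hypothetical comparison: deleting a marked statement via mouse-based
    # menu selection versus via a keyboard command.
    mouse_menu = ["H", "M", "P", "B", "B", "M", "P", "B", "B"]
    keyboard = ["M"] + ["K"] * 7  # e.g. a seven-keystroke delete command

    print("mouse/menu: %.2f s" % predict_time(mouse_menu))
    print("keyboard:   %.2f s" % predict_time(keyboard))

Comparisons of this kind are what the models in chapter 7 provide at larger scale; the limitation noted above remains, since nothing in such a model captures the display system or the user's comprehension effort.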

3.3.2 Anthropomorphic Design

Anthropomorphic approaches to design, where human qualities are applied to computer systems, are reasonably common. However, the usefulness of such systems, where they have been designed and developed for software engineers, is relatively unknown. In particular, software engineers would be concerned if such systems


treated them as novices. This approach to developing user interfaces for tools and environments is exemplified by the work of Carroll [Car94] and his colleagues.[2] They were concerned with examining actual use of systems as a basis for design by developing scenarios of system use. For example, the Guru tool utilised a software developer's natural tendency to apply self-evaluation techniques to code writing by offering suggestions on possible improvements to code design. One of the problems noted, though, concerned the level of advice offered, which may cause programmers to consider their code from too narrow a perspective. Carroll argued that this approach warranted further consideration, although whether this was in the context of the design of software tools or of computer systems for use by a much wider range of user types was unclear.

The Programmer's Apprentice project [RW90] was another significant research effort investigating the use of artificial intelligence techniques to support the software development process. Its aim was to make expert software engineers `superproductive' by providing them with an intelligent assistant (similar to a human assistant or junior programmer). A demonstration system, KBEmacs, was developed, but as far as we can ascertain there have been no empirical studies with software engineers using the software.

Anthropomorphic approaches, and in particular `intelligent' tools, appear to have received some attention from designers of software development tools, but to what extent the tools developed have been successful is difficult to gauge. Moreover, if this approach suggests tool qualities that software engineers perceive as appropriate for novice users rather than experts, then those tools are unlikely to be utilised.

[2] See the reference list in [Car94].

3.3.3 Cognitive Approach to Design

Cognitive science and cognitive psychology have much to offer the software engineer designing the user interface for a software tool. A decade ago, Soloway [Sol84] argued that software development environments, languages and methodologies should be based, at least in part, on a cognitive understanding of the programmer and the tasks to be performed by the programmer. There have been numerous studies of the cognitive processes involved in programming (both before and after


Soloway's paper). For example, the Empirical Studies of Programmers workshops [SI86, OSS87, KBMR91, CSS93] have been the premier forums in this area and much research has followed from findings reported there. However, communication of much of this research to software tool designers has been a problem. Green [Gre94] commented recently that software engineers had paid `little or no heed to empirical results obtained by cognitive psychologists' and had produced systems which were `strange and gawky', characteristics that should have been obvious. In defence of software engineers, he noted that many software development environments were viewed by software engineers as experiments and, therefore, not necessarily perfect in any sense.

Koubek et al. [KSDL89] examined the overall process of software development to generate a methodology suitable for the extraction of knowledge of the process. This review recognised various models of the cognitive processes in computer programming and the relationship between the various tasks in software development (based on the waterfall model of the software life-cycle) and cognitive aspects of programming performance. They reviewed cognitive aspects of all software life-cycle stages. It has been argued that these activities interact to such an extent that it is difficult to consider aspects of them separately [PG90], but they are not necessarily identical and they can be, and have been, studied separately [KSDL89].

Since software development tools cover the entire range of software development activities, it seems reasonable to suppose that such research has been considered in designing user interfaces for software tools. Generally, though, this is not the case. The review of text editors in [EN81] noted the `rudimentary fashion' in which cognitive psychology had been applied to their study. Robertson and Black [RB86] reviewed some later studies of text editing (noting that it was a complex cognitive skill) as well as experimenting with how users developed and used plans while editing. A more recent review of text editors by Roberts [Rob88] suggested that designers needed to follow general cognitive psychology principles in designing editors. The review also concluded that evaluation must be a key component of the design phase, with evaluation by users the most trustworthy means of validating usability.


Considerable attention has been given to cognitive aspects of the software development process. However, the extent to which cognitive issues have been explicitly incorporated in software development tools is limited. Further, where they have been included, there appears to be no evaluation, or only primitive evaluation, of the design choices generated by them. For example, we are aware of the underlying conceptual models and anticipated user models used to generate design options for the user interface of various syntax-directed and language-based editors [WBK91], but there has been no empirical testing, and there is only anecdotal evidence to support the options chosen. Clearly, this cognitive approach provides many potentially useful ideas for assisting in design decisions related to the user interface for software tools. The three key concepts of conceptual model, mental (user) model and user interface representation need consideration in the design of software tools, and these tools need subsequent evaluation by users to ensure that the models are appropriate.

3.4 Use of Guidelines in Design

Guidelines, and their use in assisting the design of user interfaces, have been studied extensively, but as far as we can ascertain there are few studies of their use in the design of the user interface for software tools. We are aware of the consideration of the OPEN LOOK guidelines in designing the `look and feel' of a specific language-based editor [BWW89]. Another small study [LL93] examined the use of Motif guidelines by four designers in generating the interface for a tool that supported the browsing of files for modules for reuse. Of the designs produced in this study, only one was fully Motif-compliant. As well, nearly one-half of the design deviations identified were violations of general design recommendations and best practice as documented in available guidelines. Other instances of the use of guidelines in this domain probably exist, but explicit reference to such use has not been found and, in general, design rationales for software tools rarely indicate the basis for the design of the interface. Guidelines need to be investigated further since they are a significant resource and have the potential to assist software tool designers both in the design of the user interface of their tools and, through guideline reviews, in user interface evaluation.


3.5 Project Description

The objective of this study is to investigate the feasibility of systematic approaches to the evaluation of user interface design for software development tools by exploring the application of the relevant design strategies mentioned in previous sections. Based on these strategies, general outcomes for the design of user interfaces for software tools include: more consideration by software tool designers of user-centred design principles; realisation by tool designers of the importance of interface design guidelines; recognition of the use of predictive modelling in the early stages of design; and a determination, on the part of software tool designers, to engage in user-based experiments to evaluate their tools. Outcomes specific to language-based editors also arise, though the details emerge more completely in chapter 4.

In chapter 4 we introduce two alternative existing tools that provide a focus for our investigations, and we identify two significant conceptual issues. These issues coincide with usability issues identified in [KU93] relating to `user view of language structures' and `enforced construction discipline'. In chapter 5 we describe a guideline review to analyse an existing language-based editor's user interface. By comparing the editor and the guidelines in this retrospective study, it is possible to gauge the extent to which the editor conforms to the guidelines. Issues neither addressed nor uncovered by the guidelines, which the editor designers considered important, are also indicated. In chapter 6 we re-examine the first of the conceptual issues identified in chapter 4 and describe an experiment to compare the effectiveness, for menu selection purposes, of textual display versus graphical display for hierarchical navigation spaces such as the presentation of the overall structure of a document. In chapter 7 the second conceptual issue identified in chapter 4, the choice of editing paradigm, is evaluated using predictive modelling techniques. Analysis and results from the models suggest the need for experiments that both verify the findings and validate the models. In chapter 8 we describe such a usability experiment on the editing paradigms, and in chapter 9 we present a validation of the predictive models from chapter 7 based on the data collected in the usability experiment.


In chapter 10 we present two untested conceptual issues and discuss methods appropriate to their examination. These methods are based on the procedures and results of the previous experiments and, as such, form a framework for future studies in this domain. Finally, chapter 11 summarises the major contributions of this thesis and discusses topics for future investigation.

Chapter 4
Software Development Tools

In the previous chapter we examined user interface design strategies and how they have been applied to designing software tools. We noted how designers tend to rely on their intuition in tool design and evaluation, and we suggested that this was inadequate and unsatisfactory. In this chapter we review conceptual models that have been proposed for tool design, briefly describe two software tools that are currently available and whose user interfaces are examples of implementations of those models, and identify two conceptual issues underlying the design of such tools.

4.1 Language-Based Editors

The essential characteristics of a software tool include its use for the display and editing of static or dynamic objects. Software tools are available for all phases of the software development life-cycle. Unfortunately, though, we have neither the time nor the physical resources to examine all software tools or types of tools. Software tools can be categorised by the roles they play. Sommerville [Som95, pages 508-509] provided a functional and an activity-based classification of tools. Language-based editors are typical software tools in that they support construction and checking activities, the dominant roles of software tools. Thus we examine them in the case-studies and experiments reported in this thesis.

The extent to which language-based editors are being utilised in practice is unknown, although anecdotal evidence suggests that they are little used outside educational and research institutions. Minor [Min90] investigated reasons for their lack of success in attracting wider use and noted `interaction' with such editors as


one of the main concerns of users. In [Zel90, KU93] it was reported that many users criticised language-based editors for being too restrictive, inflexible and inefficient. In many ways, then, such tools were ideal as case-studies and examples for the application of the user interface design strategies mentioned so far.

We begin by discussing conceptual models of documents expressed in highly structured languages (structured documents). Such models form an integral part of the design process in that they suggest design options for the tools that display and manipulate the documents. Then we examine user interface aspects of two language-based editors.

4.2 Models in Editor Design

A cognitive approach to user interface design for software tools has been advocated for some time. Using this approach in the context of editors implies that designers need to develop a conceptual model of the data and the operations that may be applied to that data. The system model (user interface) represents how this conceptual model is presented to the user and, at any particular instant, it is the current program and all programs able to be constructed from it using the available data manipulation operations. Provided this system model is consistent with the user's (mental) model of data and operations, the design is a good one. In most of the literature describing language-based editors and their design rationale we find little or no reference to this approach to design.

Welsh, Broom and Kiong [WBK91], in describing the design rationale for the UQ editors, postulated a conceptual model based on the entity to be edited by a user: the source code. A program is essentially a tree, so it is not surprising that early syntax-directed editors provided editing operations as tree manipulations, effectively giving the user a tree editor. However, this implicit model[1] of the objects under manipulation and their manipulation methods is a narrow view of the editing process. For example, it seems unreasonable to require users to edit arithmetic expressions in a tree-like manner, so we would expect a mismatch between this part of the conceptual model and any model formed by a user.

[1] We say implicit because there is no evidence that the appropriateness of such conceptual models was explicitly considered in arriving at the design.


In formulating their conceptual model, [WBK91] considered the hierarchy of constructs that appear in programs. Their diagram (reproduced as figure 4.1) indicated a pluralistic model of program data where users sometimes viewed programs as text and sometimes as tree structures, depending on the user's purpose at the time. An upper bound for a tree-like view is easy to fix, since few users perceive a hierarchy of nested modules and blocks as other than a hierarchical tree structure. This is one of the basic disadvantages of text editors: the requirement `to manipulate it (program text) as a flattened space' [WBK91]. There is no difficulty in identifying a lower bound either, since no users consider identifiers or literals as trees of characters. Thus, at this lower bound, any editor must allow simple text-editing operations. In between these bounds the situation is less clear, it being `difficult to justify an exclusively tree-like or an exclusively textual perception of the constructs' [WBK91]. They argued that an editor that only recognised tree structure for the entities in the middle part of figure 4.1 was no better, and could be worse, than one that only recognised text. This observation is consistent with comments by others [Wat82, Min92].

    compilable units, programs, etc.                                tree-like
    nested modules, packages, procedures and functions              tree-like
    declaration parts, structured statements and structured types   trees, lines, symbols or text?
    simple statements and declarations                               trees, lines, symbols or text?
    expressions, lists, etc.                                         trees, lines, symbols or text?
    identifiers, literals, etc.                                      text-like
    characters                                                       text-like

Figure 4.1: Diagram from Welsh, Broom and Kiong [WBK91] showing the pluralistic model of program structure.

Another difficulty [WBK91] perceived with a purely tree-based view of programs related to the display of programs. Most code is displayed and manipulated as text. As such, it fails to reflect a tree-based view of a program, so users are forced to abstract from a textual view. Graphical display of programs is possible, but only modest program fragments can be displayed in this manner. Users prefer commands which represent `direct manipulation' of the displayed representation to those whose effect is defined in terms of some abstraction of the display [Shn92]. The textual display enforced by display limitations therefore biases the user towards operations


defined in textual terms. The conceptual model of [WBK91] suggested that a pluralistic view of program structure was more logical than a purely tree-based view. This model was implemented in the UQ editors, but so far it has not been validated. Subsequent chapters in this thesis explore aspects relevant to this conceptual model.

4.3 Editing Paradigms

The most striking difference in language-based editor designs that follows from the (implicit) model of documents is the editing paradigm. In language-based editors to date, two basic paradigms for the input and editing of program fragments have been recognised. One is the tree-building paradigm, in which the user is expected to perceive and manipulate the program as a structurally correct (if incomplete) tree at all times. The user extends an existing program by expanding a previously unexpanded node of the program tree, usually by selecting an appropriate node template from the menu of allowable templates at that point. Likewise, the user may only delete existing constructs that are optional, or replace an existing construct with another of the same syntactic class.

Tree-building by template selection has the obvious advantages that construction of syntactically incorrect programs is precluded, and that the user is freed from all need to type syntactic sugar in the program: the only textual input required is the user-determined lexicals, such as identifiers and literals, which appear as leaves of the tree. However, tree-building is an unnatural and very tedious way of constructing low-level constructs such as expressions and identifier lists, so most editors adopting the paradigm (including the Cornell editor used here) allow users some form of text input and editing at these levels. The conceptual model of program structure underlying this approach is simple, consisting of a strict tree model at those levels where manipulation by template selection is required, with textual words or phrases as the leaves of the tree concerned. From the user's viewpoint, however, the tree model is compromised by the fact that a textual display is normally used at all levels. To formulate the next operation required, the user must exercise significant cognitive effort to abstract


from this textual display to the tree structure involved.

The alternative to tree-building is the (text) recognition paradigm, in which the user manipulates the displayed representation in textual terms, and the editor parses or `recognises' the textual changes to deduce the program tree required. The delete, insert and replace operations permitted by this approach are clearly a superset of those allowed by a tree editor, giving the user greater flexibility as to how a given change is effected. Depending on the nature of the text inserted or deleted, the user may conceive the change involved as a tree manipulation, a symbol sequence alteration, or a character-based textual alteration, but need not distinguish which it is. The editor is thus tolerant of a pluralistic model of program structure. Text recognition is more natural for the input of low-level constructs, and in general has the advantage of `direct manipulation'; that is, the user perceives and effects the change involved directly in terms of the displayed representation. It does not, however, preclude user error, and in general the editor must tolerate incorrect intermediate program states as changes are made. Its effectiveness in achieving error-free programs (and in reducing the keystroke effort required of the user) depends critically on the nature of the parsing process applied, the degree of its synchronisation with the actual keying of the text involved, and its capacity to propagate the consequences of changes in an efficient manner.

The choice between the tree-building and text-recognition paradigms has been an issue in language-based editor design over the past decade, with much intuitive comment appearing in the literature. To the best of our knowledge, however, no systematic attempt to demonstrate the advantage of one paradigm or the other, either by application of relevant theories or by controlled experimental evaluation, has been made. In chapter 7 we present a predictive modelling approach to the evaluation and comparison of these paradigms, while in chapter 8 a usability experiment comparing the paradigms is reported.
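The contrast between the two paradigms can be made concrete with a toy sketch. The grammar, template names and the trivial `parser' below are our own assumptions for illustration, not the behaviour of any editor discussed in this thesis: the tree-building function can only expand a placeholder with a legal template, while the recognition function accepts an arbitrary textual edit and checks it afterwards.

    # Toy contrast of the editing paradigms over a tiny statement grammar.
    # Templates map a construct class to a skeleton with [placeholders].
    TEMPLATES = {
        "if": "if [expr] then [stmt]",
        "while": "while [expr] do [stmt]",
        "assign": "[var] := [expr]",
    }

    def expand(program, template_name):
        """Tree-building: replace the first [stmt] placeholder with a chosen
        template. Syntactically incorrect programs cannot be constructed,
        because only the templates on offer can be selected."""
        if template_name not in TEMPLATES:
            raise ValueError("no such template in the menu")
        return program.replace("[stmt]", TEMPLATES[template_name], 1)

    def recognise(text):
        """Text recognition: accept any edited text, then check it. A real
        editor would run an incremental parser here; this stand-in merely
        tests for unexpanded placeholders and a recognisable statement."""
        complete = "[" not in text
        plausible = any(k in text for k in ("if ", "while ", ":="))
        return complete and plausible

    # Tree-building: grow a program by successive template selection.
    p = "[stmt]"
    p = expand(p, "while")   # 'while [expr] do [stmt]'
    p = expand(p, "assign")  # 'while [expr] do [var] := [expr]'
    print(p)

    # Recognition: the user freely retypes the text; the editor then parses.
    print(recognise("while i < n do x := x + 1"))  # True

Even at this scale the trade-off described above is visible: the tree-builder never sees an ill-formed program but forces every change through the template menu, while the recogniser permits any intermediate text and must diagnose it after the fact.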


4.4 The Synthesizer Generator

The Synthesizer Generator[2] [RT84, RT89a, RT89b][3] is a system for generating language-based editors. It has been used to produce editors for a wide variety of applications, including programming languages such as Pascal, C and FORTRAN 77.[4] Originally, editors were built for visual display terminals only, so all input of data and commands was through the keyboard. Editors can now be generated for the X Window System and SunView[5] as well as for visual display terminals. Figure 4.2 shows an example of the editor's user interface. As far as we can ascertain, the Synthesizer Generator user interface was designed intuitively, with no reference to the approach in section 4.2. The X Window System version used in our studies supported multiple overlapping windows and mouse-based selection of data and commands.

Using editors built with the Synthesizer Generator, language constructs at all levels are created and manipulated as templates. These templates provide formatted patterns for each construct in a language. The Cornell editor presents the user with a menu of template constructs and operators. This menu is permanently displayed and dynamically updated as the user inputs text and selects templates. Thus the editors are essentially tree-building editors. However, the editor builder can also specify that certain constructs, usually at lower levels such as expressions, can be entered and edited as text. The example editor that we used provided this hybrid approach to editing at this level.

[2] The Synthesizer Generator is a trademark of GrammaTech, Inc.
[3] Work on the Synthesizer Generator began at Cornell University with the Cornell Program Synthesizer. This program evolved into the Cornell Synthesizer Generator and now the commercial product known as the Synthesizer Generator. In this thesis we refer to editors built using the Synthesizer Generator as Cornell editors and, where appropriate, abbreviate (Cornell) Synthesizer Generator to CSG.
[4] For a more complete list see appendix E of [RT89b].
[5] SunView is a trademark of Sun Microsystems, Incorporated.


Figure 4.2: The Cornell Pascal editor interface.


4.5 UQ1

UQ1 [WRL86] is a language-based editor designed for use on a bit-mapped display with mouse and keyboard input. Currently there are SunView and X Window System versions. It is a generic tool for inspecting, creating and editing software documents and has been instantiated for a number of programming languages, including Pascal, Modula-2 and Ada, and for the specification language Z. The features of the Z instantiation are a superset of those prevailing in the other instantiations. A detailed description of the Z editor may be found in [CHW90].

UQ1's conceptual design was based on consideration of the cognitive aspects of program editing as discussed in section 4.2. It is a multiple-window language-based editor. Each window displays a formatted textual view of the document being edited or browsed. During document input, typed text is parsed and formatted.[6] UQ1 is a (text) recognition editor. Figure 4.3 shows a multiple-window formatted view of a Pascal program displayed using UQ1. The UQ1 editor is actually a precursor to UQ2 [BWW90] and, more recently, UQ* [WH94], new editing environments being developed at the University of Queensland.

The physical appearance of UQ1's user interface was designed intuitively. A heuristic evaluation of this aspect of the interface would identify a number of major deficiencies for professional use, including:

- user commands are activated via a control panel which is permanently displayed and consumes too much screen real estate,
- button sizes and placement are not tailored to efficient normal use,
- insufficient keyboard accelerators are provided to avoid slow interleaving of mouse and keyboard activity, and
- multi-window display management is inconsistent with the host windowing system's conventions.

[6] More complete details of UQ1's functionality and user interface characteristics are given in chapter 5.


Figure 4.3: UQ1 Pascal editor showing block-oriented program display.

In contrast, the designers of UQ2 considered anecdotal user reaction to UQ1 and the OPEN LOOK style guide in designing UQ2's user interface, which has the following features:

- permanently displayed control features are limited to a narrow band of buttons through which menus or command windows are displayed when required,
- frequently used edit commands are available from a dynamically changing on-mouse menu within the edit area,
- default keyboard accelerators are defined for all appropriate commands, with additional or alternative accelerators definable via a user-customising option, and
- multi-window display management conforms to the host windowing conventions.


A similar range of features was considered for UQ*'s design. Unfortunately, at the time of these studies development of UQ2 had ceased in favour of UQ*, which was still under development, while UQ1 was stable and readily available. Thus, UQ1 was the platform used throughout this study although, where relevant, features of UQ2 and UQ*[7] are also considered and discussed.

From the model of program structure discussed in section 4.2, [WBK91] postulated that the user perceived the structure of, say, a Pascal program as a tree of nested blocks. In practice, however, the user was most commonly interested in looking at one block at a time, less frequently in comparing the content of two or more blocks, and occasionally in reviewing the overall hierarchic structure itself. To meet the user's predominant need (to look at one block at a time) the solution adopted in the UQ editors[8] was that each view presented was a single block and its associated heading, with any nested blocks abbreviated by elision of their bodies. In the UQ1 screen shown as figure 4.3, the topmost sub-window shows the display of the outermost block of a trivial Pascal program, Example. The ellipsis points (...) after each procedure heading indicate that the corresponding procedure bodies are suppressed by the display system. To view any of these abbreviated procedures, the user invokes zoom in. This replaces the current view by that of the procedure concerned, in which the bodies of its nested procedures or functions are again abbreviated. From this point, the user may zoom in, zoom out, or pan forward and pan back between sibling blocks. Alternatively, absolute selection of a new block can be made at any time from a menu of all viewable blocks in the program. This menu of viewable blocks also fulfills the user's occasional need to review the overall hierarchic structure. To meet the user's remaining need, that of comparing the content of two or more blocks, simultaneous display of views in two or more sub-windows was enabled. Figure 4.3 shows the use of a second sub-window to display the procedure FixPair, in parallel with the main program itself.

[7] In future, unless stated otherwise, when we refer to UQ* we also mean UQ2.
[8] The Cornell editor can also be built with a similar facility, although the example editor used in this study offered only a flat scrollable presentation.


Anecdotal evidence from users suggests that they find these block-oriented display choices in the UQ editors preferable to a flat scrollable representation. This is consistent with the unequivocal conceptual model at block level. In principle, the model also suggests that the menu of viewable blocks should be displayed in a graphical tree-like form. In practice, UQ1 displays this menu as an indented list of block names, as shown on the right in figure 4.3. The effectiveness or otherwise of the textual display of this block-menu versus a graphical display requires examination. An empirical investigation of this issue is presented in chapter 6.
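As an illustration of the structure behind these block-oriented displays, the sketch below builds a block tree for a hypothetical program and derives both an elided single-block view and the indented block-menu. The program and its blocks are invented, and the rendering is only a rough approximation of UQ1's output.

    # Sketch of a block tree supporting elided views and an indented menu.
    class Block:
        def __init__(self, heading, body=(), children=()):
            self.heading = heading          # e.g. 'procedure FixPair;'
            self.body = list(body)          # this block's own display lines
            self.children = list(children)  # nested procedures/functions

    def elided_view(block):
        """One block and its heading; nested block bodies elided with '...'."""
        lines = [block.heading]
        lines += ["  " + child.heading + " ..." for child in block.children]
        lines += ["  " + line for line in block.body]
        return "\n".join(lines)

    def block_menu(block, depth=0, active=None):
        """The menu of all viewable blocks, as an indented list of names."""
        mark = " **" if block is active else ""
        lines = ["  " * depth + block.heading + mark]
        for child in block.children:
            lines += block_menu(child, depth + 1, active)
        return lines

    # Hypothetical program: Example contains FixPair and Sort; Sort nests Swap.
    swap = Block("procedure Swap;", ["t := a; a := b; b := t"])
    sort = Block("procedure Sort;", ["..."], [swap])
    fixpair = Block("procedure FixPair;", ["..."])
    example = Block("program Example;", ["begin ... end."], [fixpair, sort])

    print(elided_view(example))       # single-block view with elision
    print("\n".join(block_menu(example, active=fixpair)))

Zoom in corresponds to re-rooting the view at a child block, zoom out at the parent, and pan forward or back at a sibling; the graphical alternative evaluated in chapter 6 would draw the same tree rather than indent it.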

4.6 Summary

Language-based editors are yet to receive the recognition, as assistants to the software development process, that their designers and implementors think they deserve. Their widespread use outside academic and research contexts is still to occur, and growth in their use will remain small until their design reflects the view of software, and the functionality of tools for manipulating that software, required by users.

Conceptual models form a basis for assisting the designer to understand a user's needs. We have reviewed conceptual models proposed for language-based editors and some issues that they raise. Various editors have system models (and hence user interfaces) based on some or all aspects of these conceptual models. We now want to validate these issues both theoretically and with users to see how well the user's model fits the conceptual and system models. To do so we use the concepts of guideline reviews, predictive models and usability experiments as described in chapter 2. In particular, the intuitive design of the interface of UQ1 invites evaluation against user interface design guidelines. We pursue this in the next chapter. The block-menu display issue invites a simple usability experiment to compare textual and graphical presentations (chapter 6). The basic editing paradigms used in UQ1 and the Synthesizer Generator are compared theoretically using predictive modelling (chapter 7) and in a usability experiment (chapter 8).

Chapter 5
Guidelines and a Language-Based Editor

In chapter 2 we mentioned some of the sets of user interface design guidelines[1] available to software designers. Potentially, these guidelines offer significant assistance to the software developer in the design of all types of software, including software development tools. Therefore, an examination of their use in this domain is warranted. In this chapter we use guidelines to evaluate the intuitively designed user interface of UQ1.

[1] We often drop the words user interface design and simply refer to guidelines to avoid heaviness throughout this chapter.

5.1 How Guidelines can be Used

Guideline reviews have been shown to be a useful method of identifying important usability problems in existing software, and they are usable by software engineers who are not necessarily user interface design specialists [JMWU91]. By comparing the user interface of a software tool with a typical set of guidelines (if the guidelines were not used in the original design of the tool) it is possible to gauge the extent to which the tool conforms to the guidelines. Here we seek answers to three questions.

Firstly, how might the use of guidelines affect the design of the user interface of a typical software development tool (designed in an intuitive fashion without reference to guidelines)? Any study of this question provides insight into the impact of guidelines on this type of software product. The guidelines could suggest changes to a tool's user interface or


enhancements for future versions of the tool. In this sense, retrospective application of guidelines is used to evaluate a user interface.

Secondly, and equally importantly, are any user interface issues relevant to a tool and its successors not addressed by the guidelines? Experience with using UQ1 has already shown a number of user interface deficiencies, and corresponding improvements have been incorporated in the design of its successor UQ*. The interesting issue is the extent to which the guidelines identify these deficiencies and point to the same solutions. Guidelines have been successful in identifying many, but not all, problem types across a wide range of user activity [JMWU91, CB94]. Thus in this case it is the guidelines themselves that are on trial.

Our third question concerns the usability of the guidelines themselves. Becoming familiar with voluminous sets of guidelines is time consuming, and a designer needs to have some faith in the completeness of a set of guidelines if the designer is to invest the time to examine and utilise them. In trying to make use of long and complex documents, such as guidelines, much material goes unread, is misunderstood or is difficult to interpret [dSB90]. Consequently, guidelines are frequently not heeded [TS91]. Guidelines are a significant resource, but they themselves must be usable.

5.2 Choosing the Guidelines

Various guideline documents were mentioned in chapter 2. Many relate to user interface design at the physical level and some at the functional and conceptual levels. The guidelines presented by Smith and Mosier [SM86] are typical of such sets and are a significant, readily available resource for software designers. They principally address physical and functional level interface design issues, avoiding reference to specific hardware and user interface implementation software. That is, they are a general set of guidelines applicable across all hardware and software platforms. They are more general than so-called `look and feel' style guides [LL93] such as Motif and OPEN LOOK. These style guides provide some high-level design principles but are mainly intended to enhance consistency of interface presentation across different applications. Moreover, they are difficult to interpret and apply, and their coverage of user interface design concepts is small [Mye94].

The Smith and Mosier guidelines were compiled using material that the authors


had previously published (including research on the use of guidelines by designers) and the published material of many others. The aim of these guidelines is to provide the user interface designer with a comprehensive and exhaustive set of guidelines that are effective in a broad range of applications. Wording of the guidelines is in terms of the functions that a user must perform and the functional capabilities that a designer should provide, rather than the particular physical devices that might be used to implement those functions. Many of the guidelines were developed before the ready availability of graphical user interfaces, but nevertheless they still provide good design principles [Nie92]. These guidelines form the starting point for ISO standards, such as ISO 9241, currently being developed [BM94]. There are 944 guidelines in the set, spread over six functional areas, making this one of the largest sets currently available [TN91]. Table 5.1 shows the distribution of the guidelines over these six functional areas.

Table 5.1: Distribution of guidelines in Smith and Mosier [SM86].

    Functional Area          Number of Guidelines
    1. Data entry                     199
    2. Data display                   298
    3. Sequence control               184
    4. User guidance                  110
    5. Data transmission               83
    6. Data protection                 70

Guidelines on data entry relate to the function of accepting data that is input by a user. Such guidelines might relate to filling in forms or the input of text to a document. They are not meant to be used for input related to the control of software; that is, they are not applicable to the command language of a software object. The guidelines in the functional area of sequence control are appropriate in this latter case. Similarly, the guidelines on data display are concerned with task-related data rather than prompts, error messages, or help and guidance messages; these functions are treated under user guidance. The functional area of data transmission concentrates on guidelines in the area of communication: data transfer between systems and communication between individual users. Data protection deals with issues of security, whether it be to secure data from deliberate unauthorised access or accidental access and


consequent problems of integrity.

Each guideline may include examples, comments, references to other relevant guidelines, exceptions, and references to external material relevant to the guideline such as design standards or research that influenced the guideline. An example guideline is shown in figure 5.1.

    1.04 Fast Response

    Ensure that the computer will acknowledge data entry actions rapidly, so
    that users are not slowed or paced by delays in computer response; for
    normal operation, delays in displayed feedback should not exceed 0.2
    seconds.

    EXAMPLE: A key press should be followed by seemingly immediate display
    of its associated symbol, or by some other display change.

    COMMENT: This recommendation is intended to ensure efficient operation
    in routine, repetitive data entry tasks. Longer delays may be tolerable
    in special circumstances, perhaps to reduce variability in computer
    response, or perhaps in cases where data entry comprises a relatively
    small portion of the user's task.

    COMMENT: Note that this guideline refers to acknowledgement, rather than
    final processing of entries, which may be deferred pending an explicit
    ENTER action.

    REFERENCE: [EG75] Table 2.

    SEE ALSO: 3.018, 3.019.

Figure 5.1: Guideline 1.04 Fast Response, from Smith and Mosier [SM86].

5.2.1 Application of Guidelines

Normally, guidelines are considered before prototyping of software. In general these steps are followed:

1. Select the relevant guidelines. In some cases it is easy to eliminate guidelines since the proposed system is not expected to use a particular feature. For example, some guidelines relate specifically to speech input, which may not be necessary or appropriate for a particular product.


2. Apply those relevant guidelines that are required. Some guidelines may conflict, or time and budget constraints may prevent the software designer from considering certain guidelines.

3. Translate each guideline into specific design rules. Without specific rules, programmers interpret guidelines in almost as many ways as there are programmers. It is necessary to instantiate guidelines in terms of specific rules so that all the members of a design team may act consistently regarding the application of a particular guideline.

4. Evaluate the system designed. Evaluation is necessary to ensure that the design rules have indeed been followed.

In this study, guidelines were used retrospectively. All relevant guidelines were chosen and examined. Specific design rules were not developed because the software existed already. Evaluation was based on consideration of the compliance or non-compliance of the software with the guidelines. This evaluation procedure is general enough to be applied to any software product. More explicit application procedures, for the particular software development tool examined, are contained in later sections.
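Since a retrospective review of this kind is essentially a systematic pass over a large structured data set, it is worth sketching how it could be mechanised. The guideline records, topics and annotation below are invented examples rather than entries from [SM86].

    # Sketch of guideline records supporting selection and annotation.
    from dataclasses import dataclass, field

    @dataclass
    class Guideline:
        number: str                      # e.g. '1.04'
        area: str                        # one of the six functional areas
        title: str
        text: str
        topics: set = field(default_factory=set)
        annotation: str = ""             # reviewer's compliance note

    guidelines = [
        Guideline("1.04", "Data entry", "Fast Response",
                  "Acknowledge data entry actions within 0.2 seconds.",
                  {"feedback", "timing"}),
        Guideline("3.02", "Sequence control", "Consistent Actions",
                  "Make control actions consistent across transactions.",
                  {"consistency"}),
    ]

    def select(guidelines, area=None, topic=None):
        """Step 1: gather the guidelines relevant to the tool under review."""
        return [g for g in guidelines
                if (area is None or g.area == area)
                and (topic is None or topic in g.topics)]

    # Retrospective use (steps 2 and 4, with step 3 omitted): annotate each
    # relevant guideline with a compliance judgement about the existing tool.
    for g in select(guidelines, area="Data entry"):
        g.annotation = "complies: typed symbols are echoed immediately"
        print(g.number, g.title, "->", g.annotation)

This is, in miniature, what the hypertext support described in the next section provides for the full set of 944 guidelines.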

5.2.2 Software for Application of the Guidelines

The Smith and Mosier guidelines document is large, comprising 488 pages. With 944 individual guidelines to examine, many being interrelated, the task of dealing with such a document was non-trivial. NaviText SAM is a software system presenting these guidelines in a hypertext environment, allowing easy navigation, collation and individual annotation of relevant guidelines. This compares favourably with simply examining the printed document, which would only allow copying of relevant guidelines, with very limited scope for individualising the guidelines for the particular application. The software allows guidelines to be easily `gathered', either by simple sequential scanning of the guidelines or by selection of topics of particular relevance to some aspect in which the designer or experimenter is interested.


In this study we used the software to generate a list of guidelines, including guideline descriptions, and then annotated these based on experience with the software development tool.

5.3 Correspondence of Editor Functions and the Guidelines

To use the guidelines as an aid to developing software, or as a tool to evaluate existing software, it was necessary to partition all operations associated with editor use into the functional areas into which the guidelines themselves were partitioned. The version of the UQ1 editor that aided the production of Z language documents was the focus of this study, although much of the discussion in this chapter is relevant to the instantiations for the other languages. Features of the Z instantiation were a superset of those prevailing for the other languages.

As already mentioned, the user interface of the UQ1 editor was intuitively designed. In effect, UQ1 was an excellent example for this study. Its successor, the new editing environment UQ*, has a less idiosyncratic user interface than UQ1 due to the influence of `look and feel' guidelines, but, in part, it was this influence that led us to choose UQ1 rather than UQ* for this study. As well, UQ1 was stable while UQ* was still under development.

The six functional areas covered by the guidelines have already been described. Below is a breakdown of the UQ1 Z editor's operations in terms of these functional areas. We then examine the application of the guidelines to the Z editor.

5.3.1 Data Entry

The editor was a tool designed for inspecting, creating and editing Z documents. For the purposes of this functional categorisation, data entry was essentially the entry of Z text. The Z language has a rich vocabulary of mathematical symbols, well beyond the range provided by a standard QWERTY keyboard. In the current version of UQ1, such symbols were input as equivalent ASCII keystroke sequences. In insertion mode, the user typed characters after the block cursor in the active


context of the document. The editor attempted to interpret the sequence of keys entered as a Z symbol and to display that symbol. If the ASCII text was consistent with an appropriate Z symbol, at a particular point in a Z construct, then it was parsed and formatted for display. The editor rejected symbols representing syntax errors and the user was immediately invited to enter another symbol. If an error was detected then an error message appeared in the MESSAGE area of the control panel. When appropriate, the editor anticipated and displayed mandatory downstream symbols for acceptance by the user.

5.3.2 Data Display

Once data was entered, data display functions were concerned with the display of Z text. Since Z symbols were mathematically based and few were found on a normal QWERTY keyboard, input of Z text was achieved by the user typing Z text representations (as described under data entry functions above). On completion of a representation, the actual symbol was displayed in the bit-mapped display window. For example, the logical not was input as lnot/ but the displayed symbol was ¬.

In the document display region there could be several windows. The Z text displayed within any particular window was defined to be a context. A context may be the entire document or a section, sub-section or sub-sub-section. The user was responsible for structuring a document into sections. The user also controlled the number of windows displayed and which contexts were displayed in those windows. When a context was displayed in a window it was compressed to show only the headings of directly-embedded contexts. Elided detail within these subordinate contexts was represented by the ellipsis (...). If a compressed context still did not fit into its window then it was scrollable by the user. Within a window, the physical format and display of Z constructs was controlled automatically by `adaptive formatting' [RW81]. At any instant there was one active window containing the active context. Within this active context, the highlight indicated the user's or the system's current focus of attention, as a reverse-video text sequence.
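The keystroke-sequence mechanism lends itself to a small illustration. The sketch below is our own reconstruction of the general idea, not UQ1's implementation; apart from lnot/, which is the documented example, the ASCII representations are invented.

    # Sketch of ASCII-sequence input for Z's mathematical symbols: the user
    # types a textual representation terminated by '/', and the editor
    # substitutes the displayed symbol or rejects the sequence.
    SYMBOLS = {
        "lnot": "\u00ac",     # logical not (the documented example)
        "land": "\u2227",     # hypothetical: logical and
        "forall": "\u2200",   # hypothetical: universal quantifier
        "implies": "\u21d2",  # hypothetical: implication
    }

    def enter_sequence(keys):
        """Interpret a keystroke sequence; return (symbol, error message)."""
        if not keys.endswith("/"):
            return None, "representation not yet terminated"
        name = keys[:-1]
        if name in SYMBOLS:
            return SYMBOLS[name], ""
        # Unknown sequences are rejected and reported in the MESSAGE area;
        # the user is invited to enter another symbol.
        return None, "MESSAGE: `%s' is not a Z symbol here" % name

    for seq in ("lnot/", "forall/", "lnott/"):
        symbol, error = enter_sequence(seq)
        print(seq, "->", symbol if symbol else error)

A real editor would additionally check that the recognised symbol is syntactically legal at the current point in the Z construct, as the data entry description above notes.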


5.3.3 Sequence Control

Most commands referred to the active context. In inspection mode, the user could define the highlight shown in the active context by moving the mouse cursor to a character and clicking the left, middle or right mouse button. The left mouse button highlighted the character, the middle button highlighted the symbol containing that character, and the right button highlighted the construct that properly contained the character.

A control panel facilitated activation of most sequence control functions. Use of the mouse was necessary to invoke any of these controls. The user moved the mouse cursor to the required control panel button and clicked an appropriate mouse button. There were 26 such screen buttons in nine groupings:

(a) Window management: Clone, Close
(b) Context selection: Contexts, Zoom In, Zoom Out, Pan Bwd, Pan Fwd
(c) Highlight movement: ←, →, ↑, ↓
(d) Editing: Insert, Change, Append, Coalesce, Delete, Undo
(e) Checking: Syntax, Analysis
(f) Text searching: Backward, Forward
(g) Text movement: Save
(h) Files: Write, Write `LaTeX'
(i) User guidance/System control: Help, QUIT

In all cases above, except QUIT, the user could activate the option by moving the mouse cursor to the screen button and clicking the left mouse button. In the case of QUIT the right mouse button needed to be pressed and held down to display a pop-up window containing a menu of available options. The user could then make a selection by moving the highlight within the pop-up window to the desired option and releasing the mouse button. The editing buttons could also be selected in a similar way to the QUIT operation, but for each edit button a single option was available. The management of screen `real-estate', or the display of windows to view particular Z constructs, was facilitated by the host's windowing system SunView (an X Window version of the editor is now available) together with the Clone and Close control panel buttons. To create a second window, the user first needed to ensure that there was space below the current window, by reducing its size if necessary. This was a SunView procedure: the user positioned the mouse cursor on the bottom of the window pane, held down the middle mouse button and the CTRL key on the keyboard, moved the mouse cursor upwards until the required space was available, then released the CTRL key and the mouse button. The user could then select the Clone button to produce a copy of the current context in a new window. Once at least two windows were available, the Close button could be used to close the active window. The user could navigate within a Z document using one of five context-switching command buttons: Contexts, Zoom In, Zoom Out, Pan Bwd and Pan Fwd. The Contexts button allowed absolute selection of a context from a menu of available contexts, while the other buttons allowed relative context selection. Choosing the Contexts button presented the user with a window of the available contexts listed in an indented style reflecting the hierarchical structure of the Z document. The menu had limited display capacity but was scrollable in either direction using a SunView scroll bar. When the active context was other than the entire document, its identifier within this list was marked with asterisks. Selection of a new context


to become the active context was achieved by moving the mouse cursor to the required context label and clicking the left mouse button. Contexts could also be expanded (Zoom In) by selecting elided sections, sub-sections or sub-sub-sections. Selection was achieved by moving the mouse cursor to the ellipsis beside the name of the required context and clicking the middle or right mouse button. The user could also highlight the name of the section to be expanded; in this case the left mouse button also enabled selection. The new context became the active context, replacing the previously existing context in the active window. From the control panel, four large arrow buttons allowed the user to control the highlight of Z text in the active context. The ← and → buttons moved the highlight, displayed in the active context, one character, symbol or construct (depending on the button clicked) left or right. The ↑ and ↓ buttons expanded and contracted the highlight: ↑ expanded the highlight to the smallest construct that properly enclosed the current highlight, while ↓ reduced the highlight to its left-most sub-construct. The editing buttons allowed the user to invoke one of five editing operations: Insert, Change, Append, Coalesce and Delete. Once an edit operation was complete, an extra button, Undo, became available. In inspection mode the user could request a syntax check of the entire document by selecting the Syntax button in the checking group. A Z document could be analysed using various tools by selecting the Analysis button. Text searching involved setting two parameters, Range and Case, determining the Target text, and selecting the direction of a search, either Backward or Forward, through the document. The range of the search for a target was local (the active context) or global (the entire document). The case sensitivity indicated whether the case of source and target text must match (sensitive) or not (insensitive). On editor startup these parameters had default values, but they could be changed thereafter by moving the mouse cursor to the name of the option or its current setting and clicking the left mouse button; repeating this operation cycled through the available option settings (in this case only two). Absolute selection could be made from a pop-up window using the right button (in a similar manner to the QUIT operation). On editor startup the target text zone was empty and the target was undefined. With


the mouse cursor in the control panel, characters input from the keyboard, up to the next RETURN key, specified the target. If the target text zone was empty the user could simply hit the RETURN key and the text highlighted in the active context became the target text. To change previously entered text it was necessary to position the mouse cursor appropriately within the text and either use the Delete key or insert new text. To invoke a search the user selected either the Backward or Forward control panel button, depending on the direction through the document in which the user wanted the search to proceed. If the search was successful then the target was shown highlighted in the active context. Where global searches required a change of context, switching between contexts occurred to enable display of the target text. If a match was not found within the specified search range, the current view and highlight remained unchanged and a message appeared in the MESSAGE area of the control panel. Searching concluded at the end of the document for forward searching and at the beginning of the document for backward searching; there was no wrap-around for searches. Highlighted text in a Z context could be saved for later copying by clicking the Save button. The save buffer, labelled Buffer, displayed a truncated representation of the highlighted text; Z text representations rather than Z symbols were displayed. The writing of files involved setting three parameters, Extent, Page Width and Indentation, which on editor startup had default values. The user needed to determine the Filename text and select the type of file to be produced. The Extent was either the current highlight, the active context or the entire document; it was set in the same manner as the search parameters. If the editor was started with a particular file as input on the system command line then that name was present as the filename text; otherwise the Filename text zone was empty. The Page Width and Indentation could be changed in the same way as the target text in the search procedure. To write the file to the UNIX file system (UNIX is a registered trademark of AT&T), the user selected either the Write button or the Write `LaTeX' button. The first option wrote a file suitable for later editing and display using this editor, while the second wrote a file appropriate for incorporation in a text document to be processed by LaTeX. There were two forms of help, one in inspection mode providing assistance with


commands and the other in insertion mode giving language-sensitive help. Details on the invocation and use of help are described in the next section on user guidance. Selection of the QUIT button allowed the user to exit the editor; the options presented to the user depended on the status of the document. All sequence control functions used in insertion mode could be input from the keyboard. During insertion the user could request language help by the key sequence CTRL-z, which elicited the same response as using the Help button. Other commands that operated in insertion mode allowed the user to accept an anticipated Z symbol (CTRL-a), copy a previously saved piece of text at the current insertion location (CTRL-c), or insert the contents of another file at the current insertion location (CTRL-r). Invocation of this last command caused display of a window containing a prompt for input of the filename to be read, Read filename, and two buttons, Ok and Cancel. The user typed the name of the file, pressed the RETURN key, and then moved the mouse cursor from its current position in the active context to select one of the two buttons.
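The search behaviour described above (a bounded range, optional case sensitivity, a chosen direction, and no wrap-around) can be summarised in a few lines of C. The sketch below treats the document as a flat sequence of symbol spellings, a deliberate simplification of UQ1's structured representation, and all names are our own.

    #include <string.h>
    #include <ctype.h>

    enum direction { BACKWARD, FORWARD };

    /* Whole-symbol comparison, honouring the Case parameter. */
    static int matches(const char *sym, const char *target, int sensitive)
    {
        if (sensitive)
            return strcmp(sym, target) == 0;
        for (; *sym && *target; sym++, target++)
            if (tolower((unsigned char)*sym) != tolower((unsigned char)*target))
                return 0;
        return *sym == *target;   /* both strings must end together */
    }

    /* Search the range [lo, hi) from position pos in the given
       direction, with no wrap-around; return the index of the match
       or -1 if the search fails (the view and highlight are then
       left unchanged, with a message in the MESSAGE area). */
    long search(const char **syms, long lo, long hi, long pos,
                enum direction dir, const char *target, int sensitive)
    {
        long i, step = (dir == FORWARD) ? 1 : -1;
        for (i = pos + step; i >= lo && i < hi; i += step)
            if (matches(syms[i], target, sensitive))
                return i;
        return -1;
    }

A local search would pass the bounds of the active context as lo and hi; a global search would pass the bounds of the entire document.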

5.3.4 User Guidance

Prompts, help and error messages were all dealt with in the set of guidelines in the functional area of user guidance. There were four different cursor styles: a block highlight cursor in the active context, a diamond-shaped cursor in the non-active control panel window, a triangular blinking cursor in the active control panel window, and the arrow cursor of the mouse, which could appear in either the document display region or the control panel. In inspection mode the user could move the mouse cursor to a control panel button to select a command; a command button, once selected, was highlighted. The Help button provided assistance to the user. The type of assistance depended on the mode, inspection or insertion. In insertion mode, the help was language-sensitive: it presented the user with a pop-up window with the buttons Expand, List Control Cmds, Restart, Help and Quit, each of which could be selected with the left mouse button. An area underneath these was available for display of help information applicable to the language upon which the editor was


based. In inspection mode, help was available on the use of editor commands and operations. Here the user was presented with a menu of command options and a window above that menu in which the text appropriate to an option appeared. One of the options, EXIT, allowed the user to exit help mode. The left button of the mouse was used to select all help options. Errors in the use of the editor were reported in various ways. In inspection mode, with the mouse cursor in the document display region, use of the keyboard produced a null response, and mouse button clicks in white space produced an audible beep. In the control panel, mouse button clicks outside the available button areas met with a null response. In insertion mode it was not possible to use the mouse at all, except to solicit help using the Help button in the control panel. If mouse buttons were clicked while the mouse cursor was in the document display region there was a null response. If the mouse cursor was moved to the control panel while the editor was in insertion mode, then clicking the mouse buttons produced an error message in the MESSAGE area indicating how the user might exit insertion mode before selecting another command option.

5.3.5 Data Transmission

Data transfer, between systems and between users, is the focus of the data transmission functional area. These guidelines concentrated on the format of messages passed between systems. The editor did not perform any of these functions; they were the responsibility of the host operating system, UNIX, which provided tools for electronic mail, file transfer and monitoring mechanisms for these functions.

5.3.6 Data Protection

Data protection is concerned with the security of data. Many of the issues addressed in the guidelines are accommodated by the host operating system; for example, the operating system controlled access to individual files, by particular individuals, using its security system. However, the editor did take some responsibility, mainly in the provision of reversible control actions, such as undo for editing operations, and confirmation actions for various destructive sequence control actions, such as file overwrite when exiting the editor or updating files already in existence.


5.4 Application Procedure

In this study the entire guideline set was examined. NaviText SAM facilitated collection and annotation of relevant guidelines. For each guideline an assessment was made of its applicability, and if it was considered applicable (in any way) to the editor then that detail was recorded. Appendix A gives several examples of the guidelines reviewed and the associated discussion. Each example includes the sentence describing the guideline, as given in the guidelines document, the list of associated guidelines, and a discussion of the applicability of the guideline to the Z editor. The discussion was based on the partitioning presented in the previous section, experience in using the editor, details presented in the Z Editor User Manual [ZMa] and discussions held with active users of the editor. The complete set of guidelines and the corresponding discussion is available as [TW91a].

5.5 Analysis of Guidelines Review

Of the 944 guidelines contained in the guideline set, only 437 (46.3%) were considered applicable to the Z editor. This compared favourably with the survey conducted by [SM84b] on the application of guidelines, in which respondents indicated that they applied only 40% of the guidelines published in another report [SM84a]. Table 5.2 shows the distribution of applicable guidelines in each of the six functional areas.

Table 5.2: Distribution of applicable and non-applicable guidelines.

Functional Area     Applicable   Total   % Applicable
Data entry                  94     199             47
Data display               134     298             45
Sequence control            84     184             46
User guidance               85     110             77
Data transmission            0      83              0
Data protection             40      70             57
Total                      437     944             46

For data entry, the guidelines associated with tables (10), graphics (44) and speech input (6) accounted for most of those not applicable; the parenthesised figures here and below give the number of guidelines concerned (for example, there were ten guidelines concerning tables). Guidelines on tables


(17) and graphics (105) were not applicable to data display and comprised the majority of the non-applicable guidelines there. Sequence control guidelines that were not applicable included those on interrupts (11) and inappropriate dialogue types (53). User guidance guidelines suggested the recording of user interaction, but there was no such recording by the editor (7). As already discussed, data transmission functions were carried out by the operating system, so none of these guidelines was relevant. Most data security was also handled by the operating system, so guidelines on user identification (8), file and data access (10) and data transmission security (7) were not considered. Examination of the guidelines, in the context of the Z editor, highlighted many issues concerning the user interface of the editor. Some issues were relatively trivial. For example, guideline 1.12 required that the cursor be designed so that it did not obscure any character displayed in the position designated by the cursor; clearly the editor needed to comply with this, and did. Just what constituted compliance was also interesting. It was sometimes difficult to interpret the guidelines as presented. The authors indicated that it was hazardous to interpret the guidelines using only the comments and examples given: these could be too narrow and specific, and the designer needed to be able to recognise the generality of a guideline and apply it appropriately. Thus the guidelines became a focusing mechanism, directing the attention of a designer to the design of a user interface rather than being prescriptive about the design. Many guidelines needed to be interpreted within the functional setting of the Z editor, so the issue of compliance became less clear. As well, many issues were far less trivial than the design of a cursor and appeared to relate less to physical issues of user interface design and more to higher-level issues of concern. Such issues included feedback and response rate, editor functionality, document structure and appearance, input device consistency, and user needs and user models. In the following sections we discuss these higher-level issues.

5.5.1 Feedback and Response Rate

Feedback on user operations, whether the interaction is correct in some sense or not, is vital for harmonious human-computer interaction. Many guidelines dealt with


feedback and the associated issue of response rate. It has long been recognised that response rate is of prime concern to all computer users, and possibly even more important to skilled users such as software engineers using a language-based editor. Statements of requirements about speed of response are very easy to make but not always easy to achieve given current hardware. Some guidelines even went so far as to suggest minimum response times for various types of operation. As examples, guideline 1.04 suggested that data entry actions should be acknowledged within 0.2 seconds, while guideline 3.018 suggested that for control actions such as NEXT PAGE the system should respond within 0.5 to 1.0 seconds, and that error messages should be displayed within 2.0 to 4.0 seconds. While these figures suggest different acceptable delay times for various tasks, they do not indicate a general delay-time hierarchy. Such a hierarchy should recognise several types of delay, including: undetectable (< 0.2 seconds), detectable (> 0.2 seconds), noticeable (the user has to wait), and unacceptable (the wait time exceeds the user's assessment of the work required). Guideline 1.03 required that there be feedback for all data entry actions. In addition, guideline 1.012 required that every data entry transaction be acknowledged; exceptions to this latter guideline included repetitive data entry transactions. During data entry, Z text was entered into the active context as Z symbol representations (ASCII text sequences). As text was typed by the user, it was echoed on the display. If the text sequence was consistent with an appropriate Z symbol then the text was parsed and the symbol displayed; otherwise an error was detected and an error message displayed in the MESSAGE area of the control panel. Response rate was rapid for this data input operation, meeting the requirement of guideline 1.04 that feedback delays should not exceed 0.2 seconds. Thus the editor complied with these guidelines. Sequence control actions also required feedback. Guideline 3.014 required that every control action be acknowledged immediately. All UQ1 control actions were acknowledged in some sense, but sometimes the information accompanying the feedback was inadequate. For example, several guidelines (3.05, 4.23 and 4.24) suggested that if transactions were expected to take some time to complete then the user should be provided with information related to the status of the transaction.
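A delay-time hierarchy of this kind is easily expressed in code. The following C sketch classifies a measured delay; the 0.2-second threshold comes from the guidelines quoted above, while the one-second boundary between detectable and noticeable delays, and the expected_work parameter standing for the user's assessment of the work required, are our own assumptions for illustration.

    enum delay_class { UNDETECTABLE, DETECTABLE, NOTICEABLE, UNACCEPTABLE };

    enum delay_class classify_delay(double seconds, double expected_work)
    {
        if (seconds < 0.2)
            return UNDETECTABLE;
        if (seconds > expected_work)
            return UNACCEPTABLE;   /* the wait exceeds the perceived work */
        if (seconds > 1.0)
            return NOTICEABLE;     /* the user has to wait */
        return DETECTABLE;
    }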


With UQ1, initial loading, parsing and formatting of a document file, reading a file into an existing document, or copying text from the save buffer was a lengthy process. The user was informed that the editor was parsing the text (via a message in the MESSAGE area of the control panel) but there was no indication of progress. None of the text from the outside source was displayed until it was all parsed; that is, the text was read and parsed in its entirety before display. If the document was syntactically correct then the first symbol was highlighted. Alternatively, if a syntax error was detected, the offending symbol was highlighted and a corresponding message displayed in the MESSAGE area. From the point of view of editor design, it is relatively trivial to provide a status indicator, since the speed of text loading is approximately proportional to the amount of text. In practice, UQ1's designers recognised the slow loading problem and a more pro-active solution (that is, one that actually reduces the time taken in loading, parsing and formatting text) has been sought. The current structure of data files used by the editor precludes increases in file processing speed. More specialised formatting of files could be implemented to speed the operation, but this would probably involve the use of more complex file structures. The technology to remedy the problem, possibly using persistent storage techniques, is currently being investigated for the UQ editors [Ped94a, Ped94b]. In any case, the designers envisage that it may still be necessary to implement some status indicator mechanism for this text processing situation. UQ1's parse strategy required upstream correctness during insertion, but insertions or deletions could create downstream inconsistency. To re-establish overall correctness the user needed to invoke a syntax check; this could be viewed either as a lack of feedback or as avoidance of irritating feedback (the user knows, for example, that inserting or deleting a BEGIN creates a downstream inconsistency without being told). Incremental reparse was achieved by dumping parse states at regular intervals determined by the editor builder, a space/time tradeoff that potentially affects response time. UQ* addresses both these issues with a space-efficient incremental parsing algorithm [KW86, KW91] which delivers immediate downstream propagation of edit consequences with minimal, typically undetectable, delay. Thus, from experience and practical use of UQ1, the designers had recognised the problems mentioned here prior to this guidelines analysis. Response rate and quality of feedback have always been of concern to the designers of UQ1. The guidelines reinforced these as key issues for examination, but the problems identified regarding feedback and response rate were either solved or had solutions under consideration before this analysis.
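The checkpointing scheme just described can be sketched as follows; the interval, names and helper routines are hypothetical stand-ins, and the real space-efficient algorithms are those of [KW86, KW91].

    /* Parse states are dumped every CHECKPOINT_INTERVAL symbols, so an
       edit at position p restarts the parse from the nearest earlier
       checkpoint rather than from the start of the document.  The
       interval embodies the editor builder's space/time tradeoff. */
    #define CHECKPOINT_INTERVAL 64

    struct parse_state;                     /* opaque parser snapshot */
    extern struct parse_state *saved[];     /* saved[i]: state before symbol
                                               i * CHECKPOINT_INTERVAL */
    extern void restore(struct parse_state *s);
    extern void parse_forward(long from, long to);

    void reparse_after_edit(long edit_pos, long doc_len)
    {
        long chk = edit_pos / CHECKPOINT_INTERVAL;  /* nearest earlier dump */
        restore(saved[chk]);
        parse_forward(chk * CHECKPOINT_INTERVAL, doc_len);
    }

A smaller interval shortens the reparse but multiplies the number of stored states, which is the response-time consequence noted above.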

5.5.2 Editor Functionality

Normally, user interface guidelines would not be expected to provide explicit guidance within the domain of the actual application. However, several guidelines related specifically to the editing of text; this must be attributed to the pervasive nature of text-editing in computer usage. In particular, guideline 1.32 suggested that simple editing should be available during text entry without the necessity of invoking a separate edit mode. The guideline did not endorse modeless over modal editors, and in fact pointed out that experienced users may find advantages in a modal editor. However, it did suggest that simple editing, such as correcting typographical errors or making simple text changes, should not require the user to invoke a separate edit mode. The Z editor was not a modeless editor: there were two distinct modes, one for inspection of existing text, and one for insertion of new text or change of existing text. Further discussion of the merits of modality in language-based editors is left to chapter 7. For the moment, there were particular features of UQ1 that made its modality irritating. In insertion mode the user could correct errors in an incomplete Z symbol by backspacing, with deletion, to the offending character and re-typing. However, changes to symbols already accepted by the editor required the user to exit insertion mode and initiate a separate edit operation to change them. In general, the user was locked into a left-to-right, symbol-by-symbol progression within insertion mode. There was no concept of backspacing over previously typed symbols to allow change, or of moving to another part of the document to make a change, while the editor was in insertion mode. The designers of UQ1 recognised this problem and the current version of UQ* is a modeless editor. Some guidelines suggested functionality that should be available in a text-based environment. The editor provided most of the functionality suggested but did not


provide global search and replace mechanisms (guidelines 1.312 and 1.313). The editor's search, insert and copy operations could be combined to achieve this effect, but the execution was clumsy. A macro command facility, as suggested in guidelines 3.1.514 and 3.218, would enable a simpler approach. There was no macro facility in UQ1, but the UQ* design does include this facility. The designers of UQ* have already considered the issues raised here and have incorporated some into the design of UQ*. Modeless, as opposed to bi-modal, editing has been implemented, and macro commands are proposed which would enable the provision of user-defined names for defined series of commands.

5.5.3 Document Structure and Appearance

Several guidelines, including 1.812, 1.031, 1.618 and 2.4.811, referred to the entry and display of data in an organised hierarchic structure. These guidelines suggested that where such structures occur in an application there should be corresponding assistance in entering, displaying and manipulating the structures. The emphasis in these guidelines appeared to be on the high-level structure of documents, so they related to coarse- rather than fine-grain detail of documents. Users were responsible for the overall structuring of Z documents into logical sections. Once a document was created, the UQ1 Z editor used its structure to assist document display and navigation. A viewable context was the entire document, a section, sub-section or sub-sub-section. Any one window displayed a single context. When a context was displayed in a window, the text was compressed to show only the headings of directly-embedded contexts; elided detail within these contexts was represented by an ellipsis (...). When users have hierarchic structures it is important that they be able to navigate easily within those structures to view and edit parts of them (guidelines 2.7.215 and 2.7.217). The editor provided a window that listed all contexts, in an indented form, reflecting the document's structure. This window provided an overview of the hierarchic structure of the document, and the user could select a context to be displayed from it as if the items were options on a menu. The indented list is just one format for such a view of a structure; another is a graphical structure-chart style of view, as used in other software development tools


(for example [MR89, GW91]). The guidelines gave no advice on the presentation or format of such overviews. We have compared the textual version of this menu with a graphical one in a controlled experiment and present the results in chapter 6. In language-based editing, the whole document is considered a hierarchic structure. UQ1's highlight in the active context could be defined as any viewable sequence of characters, with scrolling if necessary, within that context. Relative highlight definition allowed the user to move the highlight, in a tree-walking fashion, through the entire context. Thus, the editor had several mechanisms to allow the user to make optimal use of both the coarse- and fine-grained hierarchic structure of a document. Guideline 2.71 suggested that the user should be able to tailor data coverage within a display and suppress some data. Contexts played a positive role here because they provided automatic tailoring of the data display: the user was able to zoom in and out of contexts and pan between sibling contexts, so this context-oriented display reduced the user's need to tailor the display. Other guidelines required that the user be able to adapt data entry and data display characteristics to the user's style. A UQ1 user could not alter the displayed format of Z text directly, as this was fixed by the language description supplied to the generic editor at editor-build time; that is, override by a user of individual instances of display decisions was not supported. However, because of the editor's generic nature it was easy to provide versions with alternative display styles for the same documents and languages. Nevertheless, although UQ1 provided some flexibility for display adaptation, it might be seen to fail to meet these guidelines. Once a particular version of the editor (with its set of display styles) was chosen by a user, the only mechanism for the user to suppress or expose part of a context was by scrolling it in a smaller window using the SunView scroll bar. This is a shortcoming contrary to the guidelines requiring display flexibility and user-tailoring of document presentation. UQ*'s designers have experimented with `suppression by structural distance' [BW91] as a user-selectable alternative to scrolling when a context is too large for the available window. Using this technique, text furthest (in some structural sense) from the user's focus of attention (the highlight) is suppressed until the view fits. Thus the user can alter the visible detail of a context


by highlight movement, with an effect resembling a `fisheye lens'. Control of the parameters defining the structural distance calculation also enables the precise focus, or rate of attenuation, of the fisheye effect to be adjusted. In developing this feature, the designers of UQ* also recognised a consequent user interface problem that is related, somewhat indirectly, to another guideline in the set. If distance suppression is applied before the adaptive formatting used in UQ1 and UQ*, the resultant shape of the displayed text varies dramatically as the user moves the highlight. Such changes of shape could create recognition problems for a user. For this reason the designers chose a suppression strategy that suppresses only whole lines of the formatted, unsuppressed view. Guideline 2.7.44 suggested that it was important to consider human perceptual abilities where there were changing patterns in `graphic data displays', a concern sometimes known as `preserving the user's mental map'. While UQ*'s concerns are mainly text-based, the same principles clearly apply. A comparison of the efficiency and effectiveness of suppression by structural distance and traditional scrolling techniques needs investigation, but at this stage the platform for such experimentation is unavailable. It is one of several issues requiring attention and foreshadowed in future work.
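A minimal sketch of suppression by structural distance is given below, assuming each formatted line has already been assigned a distance (in some structural sense) from the highlight. Whole lines are kept or suppressed, as in the strategy described above; the names and the simple thresholding loop are our own, not UQ*'s.

    #include <stddef.h>

    /* Suppress the most distant lines until at most max_lines remain
       visible.  keep[i] is set to 1 if line i stays visible, 0 if it
       is elided; the highlight line itself has distance 0 and is
       never suppressed. */
    void fisheye_suppress(const int *distance, int *keep,
                          size_t nlines, size_t max_lines)
    {
        size_t visible = nlines, i;
        int threshold, max = 0;

        for (i = 0; i < nlines; i++) {
            keep[i] = 1;
            if (distance[i] > max)
                max = distance[i];
        }
        /* Lower the distance threshold until the view fits. */
        for (threshold = max; visible > max_lines && threshold > 0; threshold--)
            for (i = 0; i < nlines; i++)
                if (keep[i] && distance[i] >= threshold) {
                    keep[i] = 0;
                    visible--;
                }
    }

Adjusting how distance is computed, rather than the loop itself, is where the focus and rate of attenuation of the fisheye effect would be controlled.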

5.5.4 Input Device Consistency

Guideline 1.05 suggested that the user be able to use one data entry device for most data entry transactions. There were two possible data entry devices used in conjunction with the editor: keyboard and mouse. In UQ1's current implementation, within a particular mode, one input device dominated the user's attention. In insertion mode the keyboard was used almost exclusively; the only actions the mouse could invoke in insertion mode were language-sensitive help and, as a side effect of requesting a file to be read and incorporated in the current document, selection in the pop-up dialogue box used to input the name of the file concerned. In inspection mode the mouse was used almost exclusively; it was only necessary to use the keyboard to input text patterns for search, and to alter parameters associated with file writing operations, such as the name of a file.


In this sense the editor conformed to the guideline. However, it is likely that potential enhancements to the current editor may violate it. At present, the editor anticipates only mandatory downstream symbols for easy user acceptance; all other symbols must be keyed in full using their ASCII input representation. In practice, the parser component of the editor can easily predict the allowed alternative symbols at any point, or indeed the allowed constructs that imply corresponding symbol sequences. To further reduce user input effort in insertion mode, particularly where Z symbols with clumsy or hard-to-remember input representations are involved, it is logical to make available a selectable menu of the symbols and constructs allowable at any point. Inclusion of selectable constructs in this menu would go some way to simulating the template-based input provided by tree-building editors such as the Cornell Program Synthesizer, and editors that can be built using the Synthesizer Generator, without compromising the basic recognition paradigm on which UQ1 is based. The selection mechanism for such a menu could be either keyboard- or mouse-based, but mouse-based selection seems more natural. The menu input option, however, cannot relieve the user of all keyboard usage, since variable-spelling symbols such as identifiers and literals must still be keyed. Like most template-based editors, this mouse-based enhancement to the basic UQ1 input paradigm, with its interleaving of mouse and keyboard use, would therefore appear to violate the guideline. At this stage the designers envisage a mouse-based solution to this enhancement and feel that the guidelines (including 1.05 and 1.114) may be over-simple for this application. We have applied the techniques of predictive modelling to an evaluation of keyboard-based program input and combined keyboard- and mouse-based strategies; the results are presented in chapter 7. This evaluation led us to a further experiment, presented in chapter 8.
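The envisaged menu-based enhancement might take the following form, with the parser queried for the symbols allowable at the insertion point. All of the routines named here are hypothetical stand-ins for UQ1 components, not its actual interfaces.

    #define MAX_ALTERNATIVES 32

    /* Assumed parser query: fills alt[] with the symbols allowed at
       the current insertion point and returns how many there are. */
    extern int allowed_symbols(const char *alt[], int max);
    extern int menu_select(const char *alt[], int n);  /* mouse-based choice */
    extern void insert_symbol(const char *sym);

    void menu_insert(void)
    {
        const char *alt[MAX_ALTERNATIVES];
        int n = allowed_symbols(alt, MAX_ALTERNATIVES);

        if (n == 1) {
            insert_symbol(alt[0]);   /* mandatory symbol: accept directly,
                                        as UQ1's anticipation already does */
        } else if (n > 1) {
            int choice = menu_select(alt, n);
            if (choice >= 0)
                insert_symbol(alt[choice]);
        }
    }

It is this interleaving of mouse selection with keyed identifiers and literals that would bring the design into conflict with guideline 1.05.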

5.5.5 User Needs and User Models

In general, matching system facilities to user needs implies some model of what those needs are, and hence some model of how the user conceives the objects under manipulation. Guidelines 3.03 and 3.04 dealt with the issue of relating sequence control actions to user needs. Sequence control was entirely in the hands of the user,


that is, the user decided when to enter Z text, when to change it, what searches to undertake, when to create files and so forth, using explicit commands to do so. However, the associated controls for navigation and selection within the document, together with the display of the document itself, were significantly influenced by the potential user models of document structure. Various guidelines, including 2.01 to 2.04, dealt in some way with the requirement that the display of data should match the needs of the user. From a user model of program structure [WBK91] (summarised in section 4.2), a UQ1 user almost certainly perceived the primary structure of a UQ1 document as a tree or hierarchy of nested contexts. The user was most commonly interested in examining one context at a time, sometimes interested in comparing two contexts, and occasionally in reviewing the overall hierarchic structure itself. When users have hierarchic structures it is important that they be able to navigate easily within those structures to view and edit parts of them (guidelines 2.7.215 and 2.7.217). The hierarchical structure of a document was captured in the context menu, which could be used to view the overall structure of a document or to allow absolute selection of a context for inspection or edit. UQ1 accommodated comparison of contexts via multiple views, with the user able to tailor the editor's set of windows to suit the view required. From this unequivocal model of user needs, UQ1's context-based display, multiple window usage and context selection window menu were easily justified; the editor allowed the user to make optimal use of a document's overall hierarchic structure. For other levels of document structure, however, the user model was much more equivocal or pluralistic. At schema level in a Z specification (or statement level in a Pascal program) the user was likely to oscillate between a hierarchic tree model, a symbol sequence model, or a textual lines-and-characters model, depending on the task in hand. It was on the basis of this pluralistic user model of the data concerned that the relatively rich set of highlight-setting commands in UQ1 was justified. The pluralistic model, however, is equally relevant to the single most significant design choice in a language-based editor: whether to adopt the tree-building paradigm or the text recognition paradigm for document entry and editing. Several guidelines, including 1.812, 1.031 and 1.618, referred to the entry of data


that was in an organised hierarchic structure. These guidelines suggested that where such structures occur in an application there should be corresponding assistance in the entry, display and manipulation of them. In effect, these guidelines endorsed a tree-based approach to editing, although anecdotal evidence suggests that this approach may not be appropriate. Other guidelines (in particular 3.1.85 and 1.61) referred to direct manipulation of objects as a means for data entry and sequence control. Direct manipulation is consistent with the text recognition approach, that is, WYSIWYE (what you see is what you edit). Therefore, since there is conflicting design guidance on the choice of editing paradigm, designers have tended to base their decision on other grounds such as intuition. We have already argued that relying entirely on intuition is inadequate as a user interface design and evaluation methodology, and in chapters 7 and 8 we present experimental results on the choice of editing paradigm.

5.6 Conclusion

In many cases, not explicitly discussed in this chapter, the guidelines could be considered common sense and simply reinforced the original intuitive design decisions taken for the user interface of the case-study editor, UQ1. Where the guidelines were at odds with design decisions, in many cases they served to confirm subsequent choices made for the interface of UQ1's successor, UQ*. In general, as a checklist, the guidelines proved to be an extremely useful tool. Some of the guidelines, such as those on response rate, seemed obvious and simple enough but could involve enormous technical difficulties in implementation; thus the domain of the application must be considered when applying the guidelines. The effort in applying the guidelines, in an evaluation sense, was also large. Some 60 person-days were required to examine the Z editor from the perspective of these guidelines. This included learning how to use the editor, learning how to use the guidelines software, intensive examination of the guidelines, and examination of the guidelines with the editor. Documenting the experience, including the discussion here, took another 30 days [TW91a, TW91b]. Many of the guidelines seemed repetitive. For example, guidelines 1.014 and


6.316 were virtually identical except that they were grouped in different functional areas: data entry and data protection. For this analysis, the discussion of both was identical. It may be that the authors employed a form of controlled redundancy so that important guidance is not missed by the interface designer. The interconnection of the guidelines was of some interest because it gave an indication of the usability of the guidelines. Most of the guidelines related to at least one other guideline. For example, three guidelines on window overlays and the requirement that they be non-destructive (2.7.110, 2.4.815 and 2.7.510) referenced each other and no other guidelines. However, the dependency graph for guideline 2.7.21, on the display of data relevant to a user, was large, including well over 100 other guidelines. Obviously the degree of dependency of one guideline on another is important, and clearly the guideline set is highly interconnected in some instances. The NaviText SAM software provided mechanisms for exploring these associations, but they were not ideal; its interface was relatively poor, being essentially character-based with no direct manipulation facility. HyperSAM [Ian94], an alternative, more usable mechanism that assists in the exploration of these associations, is now available. It may also be the case that there were too many guidelines and that the set could have been significantly simplified. Others (for example [MN90, dSB90, TS91]) have argued for the use of smaller sets of guidelines to assist designers, but it seems that still larger sets are becoming available (for example, ISO 9241 reportedly will contain well over 1000 guidelines). Some guidelines were not easy to interpret and there might have been multiple interpretations of any one guideline. This was a problem only in that we had to be constantly alert to alternative interpretations for a guideline; many of the guidelines included examples and comments that indicated completely different interpretations from those we initially contemplated. As expected with such a large set of guidelines, there were some conflicts (also noted as a problem in [Mye94]).

5.7 Summary

In this chapter we have demonstrated the use of guidelines as an evaluation tool for the design of the user interface of a software development tool. Clearly, if such


a set of guidelines had been readily available in their current form at the time of UQ1's design, then their use, in conjunction with other design strategies, would have been beneficial. Nevertheless, for some seemingly important issues there was conflicting guidance or a lack of guidance. The guidelines offered conflicting requirements about the need to maintain input device consistency. Perhaps the most important issue on which conflicting advice was given related to a very basic design decision: the correspondence between user requirements and the design of data entry. For language-based editors the most important design decision is whether to adopt the tree-building (template-based) or the text recognition paradigm for document input and maintenance. The guidelines also encouraged designers to provide overviews where a hierarchical structure existed in an application, but there was no advice on the presentation of such overviews or an appropriate representation for such a view. Thus far, the designers of UQ1 have presented the user with a text-based indented-list format as the overview of a structured document, but alternative graphical views are possible, particularly given current bit-mapped screen hardware and graphical user interface software. It may be unfair to criticise the guidelines for not listing these graphical alternatives, given that many of them are based on character-oriented terminal systems, but even the text-based view was not offered as a solution. In the three chapters that follow we explore some of the issues raised here that were not addressed by the guidelines, in particular, issues of hierarchical structure representation and editing paradigms.

Chapter 6

An Experiment with Menu Design

In chapter 4 a user model of editing suggested that the menu of viewable blocks should be displayed in a graphical tree-like form. In UQ1 this menu is displayed as an indented list of block names. From the previous chapter, guidelines suggested that overviews should be provided where hierarchical structures exist in an application, but no suggestions on overview presentation were offered. In this chapter we describe an experiment to compare two alternative hierarchical structure overviews, one text-based and one graphical.

6.1 Representation of Hierarchies

There are many contentious issues in the research literature regarding language-based editors and environments for the development of programs generally. One issue for which there appears to be little empirical evidence is the advantage, or otherwise, of certain graphical views of program code and data. Graphics are not always better than text-based displays, as has been shown by, for example, [PLSS84, Ste84]. Often software tool designers assume that graphics will improve an otherwise text-based user interface. In program visualisation some form of graphics is used to represent some aspect of a program. Myers [Mye90], in a review of the literature on visual programming and program visualisation (a more recent and comprehensive taxonomy of visual programming environments and systems is given in [PBS93]), stated that there was much scepticism about the part to be played in the software development process by such systems, and referred to [Bro87a] and [Dij89]. In [Mye90] the view was expressed that for professional


programmers, textual languages were most appropriate, while visual programming may be useful in learning environments and for novices and non-programmers. However, according to [Alm90], there were few empirical results showing that data visualisation even aided students in understanding and debugging their programs. More recently, [Gre92] noted that little `hard evidence' existed to indicate that visualisation improved performance for programmers, and this was reinforced by the review in [PP92]. There is some evidence that the addition of graphics reduces error rates for casual computer users using a videotex system [MM86], but even for these users and this environment there was no significant improvement in performance times. Nevertheless, many programming environments offer various textual and graphical views of programs and data (for example, see Myers' taxonomy [Mye90] and newer environments such as the Mjølner/ORM environment [MBD+90], MultiView [Mar90] and MViews [GH92]). One data visualisation issue is that of displaying the hierarchical tree-structure of the modules or blocks of a program. For example, one graphical representation of a program written in a block-structured language such as Pascal or Modula-2 is the structure chart model of the hierarchical structure of the blocks or modules making up the program. This graphic is frequently used by programmers in reviewing the overall structure of a program and in examining the various dependencies between modules. Sometimes the programmer wishes to review the overall hierarchical structure of the entire program so that an alternative or extra block may be selected for examination or editing. This view of the program may therefore become a menu from which the programmer selects another block to be displayed or edited. There are, of course, other ways of finding and selecting blocks to be displayed or edited, for example, search facilities. Such facilities, however, do not provide an overview of the structure of a document, which is important [MWD88, HS94]. They also assume prior knowledge of the structure of a document and its contents, which is not necessarily valid in many maintenance programming situations. The user model of programs in [WBK91], which we summarised in section 4.2, suggested that this menu should be displayed graphically, and this approach has been adopted by, for example, [JCL85]. While workstations with bit-mapped displays are capable of generating graphical representations of program constructs, the display area


consumed by such representations must be very high even to allow modest program segments to be displayed effectively. Thus, it is difficult to represent other than relatively trivial program procedure structures graphically on screens currently in popular use. An alternative to the tree-like form is the display of the overall program structure as an indented, text-based list of block names; the language-based editor UQ1 implements the concept in this manner (figure 4.3). Here the effects on programmers and non-programmers of these implementation strategies, for the display of menus representing all the viewable items of a hierarchical structure, are examined using an empirical approach to evaluation. Modelling schemes such as GOMS/KLM appear unable to assist in evaluation here, since display comprehension consumes the bulk of task times and user interactions are essentially the same for each strategy. The factors that may influence any difference in performance between the implementation strategies include physical characteristics such as how cluttered a display is, spacing between menu objects, text size and graphic representation, amongst others. This experiment was not designed to test these factors (although there is little doubt that such factors may be influential, see for example [Mus94]) but merely to examine the overall performance difference between the two particular implementations.
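The asymmetry in screen-space demands can be seen from how little machinery the indented list requires: a pre-order traversal of the block tree suffices, whereas the graphical form also needs box layout and connecting lines. The following C sketch, with hypothetical names, prints such an indented menu.

    #include <stdio.h>

    struct block {
        const char *name;
        struct block *child;    /* first nested block */
        struct block *sibling;  /* next block at the same level */
    };

    /* Print the indented-list menu: one line per block, indented
       three spaces per level of nesting, in pre-order. */
    void print_menu(const struct block *b, int depth)
    {
        for (; b != NULL; b = b->sibling) {
            printf("%*s%s\n", depth * 3, "", b->name);
            print_menu(b->child, depth + 1);
        }
    }

    int main(void)
    {
        /* A fragment of a fictitious hierarchy, in the spirit of the
           experiment's hierarchy D. */
        struct block sales = { "Sales", NULL, NULL };
        struct block admin = { "Administration", NULL, &sales };
        struct block md    = { "Managing Director", &admin, NULL };
        print_menu(&md, 0);
        return 0;
    }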

6.2 Experimental Details

6.2.1 Experimental Subjects

The experiment was carried out at the University of Southern Queensland in the School of Information Technology with the assistance of academic staff, post-graduate students and administrative staff from the School. In one experiment fifteen subjects were selected at random from the academic staff and post-graduate students of the School and asked to participate. All these participants were experienced computer users, having experience not only as academics but also as systems analysts, programmers and software engineers. This group constituted the experienced users. In a second experiment seven subjects were randomly selected from the admin-


istrative staff of the School. Although these subjects were familiar with computing equipment and software, they could not be considered computer professionals and had little or no knowledge of computer programming. This group constituted the inexperienced users.

6.2.2 Experimental Apparatus

The program was implemented in C on a Macintosh II (68020, 16 MHz) with a 48 cm screen (1024 × 768 pixels). The program's interface simulated the view of a document as produced by the UQ1 editor, in that windows were used to display various views of a document (figure 6.1).

Figure 6.1: Macintosh program simulation of UQ editor windows.

Figure 6.2 shows the graphical menu equivalent of the list menu from figure 6.1. Four hierarchies were used for both experimental groups.


Figure 6.2: Graphical menu style.

The hierarchical structures used were:

A. the organisational structure of the University of Southern Queensland,
B. the personnel and structure of the School of Engineering (not the school the subjects were from),
C. the structure of the Bachelor of Information Technology course, and
D. the personnel and structure of a fictitious company.

These hierarchies included two which were well known to the subjects (the organisational structure of the University of Southern Queensland, A, and the structure of the Bachelor of Information Technology course at that University, C) and two which were not known to the subjects (the personnel and structure of another University School, B, and the personnel and structure of a fictitious company, D). Complete details about the hierarchies can be found in appendix B. Both experimental groups would have had various levels of knowledge about hierarchies A and B, but no attempt was made to measure that knowledge. The hierarchies were chosen to give a wide range of trials rather than to measure specific details about them per se. For each hierarchy and menu style three types of questions were asked, namely:

I. select a particular, named, unit within that hierarchy,
II. select a unit superior to a given unit,
III. select a unit given structural information about its location.


With the first question type the subjects were required to select an item representing a particular, named unit within the hierarchy (I). The second requested that the subjects select a unit superior to a given unit, that is, a unit that was the parent or grandparent of a unit (II). In the third the subjects were asked to select a unit given some structural information about its location in the hierarchy (III). The actual questions asked are given in appendix B. In all menu and question instances the subjects were presented with two windows: one briefly describing the hierarchy to be used and one displaying the instructions. The instruction window included a START button which, when clicked, displayed the relevant menu. After the START button was clicked, selection of an item from the menu was made by pointing to the item and clicking the mouse button. For the list menu, an item was the line containing the appropriate text, whereas for the graphical menu it was any point in the box containing the text.
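The two selection rules can be sketched as simple hit-tests; the geometry types and function names below are our own assumptions, not the simulation program's actual code.

    struct point { int x, y; };
    struct box   { int left, top, right, bottom; };

    /* List menu: the hit item is determined solely by the line the
       click falls on, so the x coordinate is ignored. */
    int list_hit(struct point p, int top, int line_height, int nitems)
    {
        int i = (p.y - top) / line_height;
        return (p.y >= top && i < nitems) ? i : -1;
    }

    /* Graphical menu: the click must fall inside an item's box. */
    int graphic_hit(struct point p, const struct box *boxes, int nitems)
    {
        int i;
        for (i = 0; i < nitems; i++)
            if (p.x >= boxes[i].left && p.x <= boxes[i].right &&
                p.y >= boxes[i].top  && p.y <= boxes[i].bottom)
                return i;
        return -1;
    }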

6.2.3 Experimental Procedure

The experiments were carried out in the same room for all subjects, and during the experiments the reactions of subjects were monitored. Before the experiment proper was started, each subject was presented with a small exercise to assess their dexterity with the mouse. In this exercise the subjects were required to move the mouse pointer to ten randomly presented boxes and click while the mouse cursor was in the box. The time taken for the entire set was recorded in milliseconds. Each subject was also given the opportunity to examine and experiment with both menu styles, on a hierarchy and with questions not used in the experiment itself. The examples shown in figure 6.1 and figure 6.2 are the menus and questions used for this practice session. All menu instances were displayed in a different random sequence for each subject. The subjects were not aware which hierarchy, question and menu style combination would appear at any particular stage in the experiment. The number of errors that occurred in selection of each item was recorded, as was the time (in milliseconds) required to find and make the correct selection. Timings were recorded from the moment that a subject clicked on the START button in the instruction window until the correct item in the menu was selected (by clicking on


it). The actual time for correct selection of an item was measured after all errors (if any) had been eliminated for that subject. Errors and timings were automatically recorded by the simulation program and logged to a file. At the end of the experiment subjects were asked to indicate their menu style preference and to briefly describe their reasons for that preference.
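The measurement procedure amounts to the following loop, sketched here with hypothetical names; clock() stands in for the Macintosh millisecond timer (it measures processor rather than elapsed time, so the real program's timing source differed).

    #include <stdio.h>
    #include <time.h>

    /* Assumed event hook: blocks until a menu item is clicked and
       returns its index. */
    extern int await_selection(void);

    /* Time one trial from the START click to the first correct
       selection, counting each wrong selection as an error, and log
       the result to a file as the simulation program did. */
    void run_trial(int correct_item, FILE *log, int trial_id)
    {
        clock_t start = clock();
        int errors = 0;

        while (await_selection() != correct_item)
            errors++;

        fprintf(log, "trial %d: errors %d, time %ld ms\n",
                trial_id, errors,
                (long)((double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC));
    }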

6.2.4 Experimental Design

For both experiments there were two treatments: (a) graphical representation of a hierarchy, and (b) text-based list representation of a hierarchy. The null hypothesis concerning these two strategies and their relative efficiencies was: there is no difference between graphical representation and text-based list representation with respect to item selection times. The experimental design chosen was a change-over design [Gri65, MJ84], which is a special case of a repeated measures design [NWK90, pages 1035-1072]. With a repeated measures design the same subject is used for all treatments. One disadvantage of such designs is that carry-over or residual effects may interfere with tests of treatment differences, but change-over designs measure these residual effects, enabling us to obtain unbiased estimates of treatment differences. The main attraction of repeated measures designs is that variation among subjects can be eliminated from the error term used in testing the difference between treatments. A second advantage is that fewer subjects are required to obtain tests of significance of a certain power, provided that the residual effects of treatments can be measured or ignored. Compared to a completely randomised design, where only one treatment can be allocated to a subject, the change-over design requires as few as one-half of the subjects for an equivalent test of statistical significance [Gri65, page 476]. The first experiment had fifteen subjects and the second had seven. For each experiment there were twelve hierarchy and question combinations, made up of a factorial combination of the four hierarchical structures and three question types.


The time required to correctly find and make selections was analysed by analysis of variance for each menu, hierarchy and question combination. The layout of the analysis of variance table for experienced and inexperienced subjects is shown in table 6.1.

Table 6.1: Analysis of variance table showing sources of variation.

                             Degrees of Freedom
Source of Variation          Experts   Novices
Treatment Sequence                 1         1
Error (Subjects)                  13         5
Treatment                          1         1
Treatment × Sequence               1         1
Error (Treatment)                 13         5
Total                             29        13

6.3 Experimental Results

6.3.1 Mouse Dexterity

Experienced subjects were slightly quicker during the mouse dexterity test (1.645sec  0.296 sd for each randomly presented object) than inexperienced subjects (1.786sec  0.645 sd) but this di erence was not statistically signi cant (P > 0:05).2 The experienced subjects were less variable in their responses as well.

6.3.2 Errors in Selection of Menu Items

Most subjects had little difficulty interpreting the questions and subsequently selecting correct items for either menu style. The total number of errors, before correct item selection, for experienced and inexperienced subjects is given in table 6.2. No analysis was undertaken, nor comparison made, between the graphical menu style and the text-based list menu style for the number of errors that occurred when subjects failed to select the correct item: the data here were too sparse for any proper statistical analysis. However, it appeared that neither representation was particularly prone to provoking subjects into error, and the distribution of errors was similar for both representations. Inexperienced subjects had some difficulty with hierarchy C and question II, but no clear reason can be given for this.


Table 6.2: Total number of errors before correct selection of an item.

                          Experienced       Inexperienced
                           Menu Style         Menu Style
  Hierarchy  Question   Graphic   List     Graphic   List
  A          I             2        1         1        1
             II            4        4         2        2
             III           1        1         2        5
  B          I             0        0         0        0
             II            1        0         1        0
             III           1        0         2        4
  C          I             0        1         0        3
             II            3        4         2       13
             III           1        3         1        0
  D          I             0        0         2        0
             II            0        1         3        1
             III           0        1         0        0

6.3.3 Menu Style Differences

Table 6.3 shows the full analysis of variance, for experienced subjects, for hierarchy A and question type I. There was a significant Treatment × Sequence interaction in this instance but no significant treatment effect. All other hierarchy and question type combinations were analysed in this manner as well.

Table 6.3: Analysis of variance table, for experienced subjects, for hierarchy A and question type I.

  Source of              Degrees of   Sums of   Mean      F Test   Significance
  Variation              Freedom      Squares   Squares
  Treatment Sequence          1         5.811    5.811     0.33     P > 0.05
  Error (Subjects)           13       226.2     17.40
  Treatment                   1         6.503    6.503     0.80     P > 0.05
  Treatment × Sequence        1        60.47    60.47      7.41     P < 0.05
  Error (Treatment)          13       106.0      8.157
  Total                      29       405.0
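To see how the F tests in table 6.3 arise, each effect's mean square is divided by the corresponding error mean square:

    F(Treatment) = 6.503 / 8.157 = 0.80
    F(Treatment × Sequence) = 60.47 / 8.157 = 7.41

With 1 and 13 degrees of freedom the 5% critical value of F is approximately 4.67, so only the interaction is significant.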


Table 6.4 shows the mean time to selection for the various menu style, hierarchy and question combinations for the experienced subjects. Only hierarchy type B and question type III produced a significant difference between treatments (P < 0.01).

Table 6.4: Mean time in seconds for selection of items from two menu styles by experienced subjects.

  Menu Instance             Menu Style
  Hierarchy  Question    Graphic    List     Error M.S.
  A          I             6.096    5.165       8.157
             II            7.382    6.354       8.882
             III           8.717    8.324      15.62
  B          I             4.385    4.408       8.678
             II            5.816    4.867       7.638
             III           8.234    3.195       7.596
  C          I             5.370    4.792      14.06
             II            7.060    6.850       4.958
             III           5.293    9.749      42.19
  D          I             7.234    5.541      11.86
             II            8.179    9.199      53.38
             III           3.954    5.520      20.48

Table 6.5: Mean time in seconds for selection of items from two menu styles by inexperienced subjects.

  Menu Instance             Menu Style
  Hierarchy  Question    Graphic    List     Error M.S.
  A          I             8.277    8.602     131.0
             II            5.296    7.583      10.66
             III          10.43    12.83       81.02
  B          I             3.206    8.117      29.91
             II            8.507    4.762      42.18
             III           8.072    9.918     122.5
  C          I             7.217    5.911      16.77
             II            9.954   14.00       33.73
             III           7.170    4.315      29.62
  D          I             7.265    3.151      20.46
             II            6.548    8.298      39.02
             III           3.833    2.529       0.719

The analysis of variance for inexperienced subjects was similar to that for the experienced subjects, except that there were fewer subjects (7) and consequently fewer degrees of freedom for the error terms (5). Table 6.5 shows the mean time to selection for the various menu style, hierarchy and question combinations for the inexperienced subjects.


Only hierarchy type D and question type III produced a significant difference between treatments. From a pure efficiency point of view, that is, the time taken for subjects to understand a particular question regarding a hierarchy and to find an item in that hierarchy, there was no difference between the graphical menu style and the text-based list menu style for this direct manipulation environment. This was true regardless of the experience level of the subject. Clearly, from a design point of view, a text-based list is easier to create and display than a graphical representation.

6.3.4 Menu Style Preferences

Aesthetically, subjects preferred the graphical representation over the text-based list representation by a ratio of about 2:1. Of the experienced subjects, ten preferred the graphical style of menu while five preferred the text-based list style. Of the inexperienced subjects, five preferred the graphical menu style while only two liked the text-based list style.

A post-design analysis of this new treatment, user preference, was conducted for both sets of subjects. By its nature this analysis assumes a completely randomised design and thus cannot take account of the sequence of presentation of the menu styles. However, sequence of presentation was not a significant effect in the designed part of this experiment, so the assumption is likely to be a reasonable one. The analysis of variance table for a completely randomised design contains only a treatment effect (one degree of freedom for two treatments) and an error term (for between-subject variation).

Table 6.6 shows the mean time taken for experienced users to find and select the required item when the menu shown was the graphical representation. There was no significant difference in performance between those subjects who preferred the graphical representation and those who preferred the text-based list representation, for any of the menu style and question type combinations. Table 6.7 shows the mean time taken to find and select the required item when the menu shown was the text-based list representation. Again, there was no significant difference between those subjects who preferred the graphical representation and those who preferred the text-based list representation.


Table 6.6: Mean time in seconds for selection of items when the graphical menu style was presented to experienced subjects with stated preferences.

  Menu Instance           User Preference
  Hierarchy  Question    Graphic    List     Error M.S.
  A          I             6.371    5.546      16.99
             II            7.421    7.305      15.71
             III           8.497    9.157      28.03
  B          I             5.137    2.882       4.732
             II            5.751    5.948      10.14
             III           9.116    6.470      18.21
  C          I             6.262    3.584      20.63
             II            7.692    5.795      14.79
             III           5.395    5.090      10.99
  D          I             7.388    6.926      26.84
             II            7.877    8.785      47.17
             III           4.183    3.497       1.675

Table 6.7: Mean time in seconds for selection of items when the text-based list menu style was presented to experienced subjects with stated preferences.

  Menu Instance           User Preference
  Hierarchy  Question    Graphic    List     Error M.S.
  A          I             6.138    3.219      11.31
             II            6.934    5.193      14.72
             III           8.441    8.090      28.50
  B          I             5.739    1.745      18.07
             II            5.010    4.581       6.130
             III           3.509    2.566       2.195
  C          I             5.592    3.191       8.728
             II            7.157    6.238      17.57
             III          10.84     7.563      76.99
  D          I             7.355    1.915      30.35
             II           10.44     6.706     102.9
             III           6.911    2.739      36.62

For inexperienced users the group size of seven was small, and any parametric statistical analysis with such small group sizes must be viewed with caution. The results for inexperienced users are therefore presented as means only, with no corresponding statistical analysis or inference. Table 6.8 presents the results for the inexperienced users.


Table 6.8: Mean time in seconds for selection of items when graphical or text-based list style menus were presented to inexperienced users with stated menu style preferences.

  Menu Style Presented       Graphic              List
  Menu Instance           User Preference    User Preference
  Hierarchy  Question    Graphic    List    Graphic    List
  A          I             5.403   15.46      9.965    5.195
             II            5.286    5.319     9.622    2.485
             III          11.92     6.708    16.78     2.960
  B          I             3.704    1.961     9.257    5.270
             II           10.16     4.359     5.290    3.441
             III           8.312    7.473    13.14     1.870
  C          I             5.769   10.84      7.398    2.194
             II           10.74     7.989    17.47     5.340
             III           8.176    4.655     4.578    3.657
  D          I             8.914    3.142     2.926    3.715
             II            6.347    7.049    10.49     2.806
             III           4.252    2.784     2.623    2.294

6.4 Discussion of Experimental Procedures and Results

For the experiment we simulated the physical interface of an existing language-based editor and used example hierarchies that were not based on program documents. We could have modified the editor itself; however, this would have had many implications for the rest of the experiment. Firstly, the `look and feel' of the editor would have been an issue for users, so we would have needed to restrict the population of subjects to those familiar with the editor and its use. This population was small. Secondly, if we had used a modified version of the editor it would have been difficult to test the issue of interest, since the editor is a general editing tool and not specifically designed for conducting experiments such as this. Thirdly, instrumentation of the simulated editor interface was relatively easy compared to that required to instrument UQ1. All in all, a simulated editor reduced these confounding effects and provided a vehicle for further experimentation. By using non-programming examples of hierarchies, some of which were known to the subjects, within a simulated editor, we were able to draw on a much wider population of subjects. The nature of the experimental situation meant that control over any contaminating variables was good.


The results were clear and concise, if not astounding.

One of the first criticisms of this experiment could be that the example hierarchies were artificial from the perspective of programs in a programming environment. We counter this with the view that experienced programmers, by virtue of their education and training, should `understand' the concept of structure whether it is applied to a program or to some other entity. In a program maintenance context the programmer may be unfamiliar with a particular program and its structure. Two of the hierarchies (B and D) presented to subjects were unknown to them; these hierarchies can therefore be considered simulations of this program maintenance situation. In a program development context, and in some maintenance contexts, the programmer is very familiar with a particular document's structure. Subjects were familiar with two of the hierarchies (A and C) presented here; several noted that they knew the structure presented and that this helped them in answering questions and selecting items. Thus we covered a range of hierarchies, which would have been difficult had we insisted on programming examples.

A second criticism could relate to the size of the population of experimental subjects. What number of subjects is required to obtain statistically significant results? This is one of the most common questions in experimental design. We chose to use 15 experienced subjects, but by using a change-over design, in which subjects saw both treatments, this effectively equated to about 30 subjects. One of the main concerns that researchers have with such designs is the residual effect of a previous treatment. However, the random allocation of treatments, together with the padding provided by a selection of menu and question combinations and the method of statistical analysis, eliminated this concern. In fact it provided better estimates of treatment differences, since variation within subjects was eliminated.

The type of experiment meant that each subject was only required to participate for a relatively short time, about 20 minutes. In long or complex experiments users are prone to lose interest: they complete the experiment in ways other than that intended for their treatment group [MR92], drop out part way through, or simply refuse to participate in the first place. Thus a relatively short experiment timeframe and a simple experimental task had advantages. Computing professionals, as well as novice users, were readily available for the experiment.


Both types of user found the experiment interesting and indicated a willingness to participate in further experiments of a similar type.

At the end of each experiment trial we asked users to indicate their menu style preference and to briefly describe their reasons for that preference. We noted a 2:1 preference for graphical over text-based representations of the menu, but the subjects did not consider this a factor in helping to make decisions during the experiment. Is user preference not the best way to choose the type of menu to be displayed in the editor? Clearly, user preference was important, but what we wanted to investigate was an issue of software ergonomics, and we found no difference between the representations regarding timing. That is, whichever menu style a subject preferred, they performed equally well regardless of that preference. From a design perspective, the text-based representation is simpler to implement, although from a user perspective it may be prudent to offer both views eventually. This post hoc style of survey is a useful adjunct to the experiment proper, since it quickly provided additional information. Several subjects noted that they learned about particular hierarchies as the experiment proceeded, but at this stage no analysis of these observations has been conducted. However, the design of the experiment and the data collection mechanism allow this concept to be investigated.

The style of the graphical menu (and for that matter the text-based menu) may also be of concern. Our view was top-down with drop-down sub-menus and sub-sub-menus, but it could easily have been a bottom-up or side-on view. These alternative menus were not examined, but they may be more suitable; only further experimentation could ascertain the answer. However, in this application the menu is more than a simple selection mechanism. One of its main objectives is to provide an overview of the hierarchy of a program or structured document. In that context so-called `walk-in' menus would not have been suitable, since they require explicit action by the user to reveal the structure. Such menu implementations are less efficient because part of the objective of the display of the menu is to provide an aid to memory [WLM94].

We might have allowed either or both representations to scroll.


This complication was not required and indeed was deliberately not examined at this stage. It would have introduced extraneous variables to an otherwise uncomplicated design, and in any case the issues involved are better dealt with as another conceptual issue, scrolling. The implementation of scrolling is simple for the text-based style (and is provided in UQ1 in this manner) but, given the specific experimental engine, the graphical style would have been more difficult.

6.5 Summary

There have been numerous studies of menu layout and menu use (see for example [Kig84, Ant88, MP90, Mus94]) and this experiment could be considered just another. However, the menus considered here, and their context, simulate an important concept for programmers: the modular structure of a program. Tools for program development need to be designed to support programmers in a manner that is optimal in the sense that program development will proceed most efficiently. If a graphical representation of some concept provides such an optimal solution, then that representation should be utilised. The results from this limited experiment indicate that for the situation concerned, namely the representation of the overall structure of a program and its use as a menu for module selection, a graphical representation is no more efficient than a more simply implemented text-based representation. There is an argument, though, based on stated user preferences, that the graphical view was preferred by some users and should be offered, in some form, as an alternative.

This experiment embodied a simple concept for experimentation, a good population from which to select, controlled conditions, simple tasks, and relatively easy execution and analysis. In contrast, the experiment reported in chapter 8 on our second conceptual issue, the choice of an editing paradigm, is more problematic: selection of tasks, subjects, and parameters to be compared poses many more problems. Prior to that, we present a theoretical analysis of the editing paradigm issue.

Chapter 7

A Keystroke Model of Editing Paradigms

In section 4.3 we discussed the two basic editing paradigms used in language-based editors: tree-building and text recognition. Moreover, we noted that there had not been any attempt to systematically compare the paradigms. In this chapter we apply the techniques of the predictive modelling approach to an evaluation of these editing paradigms.

7.1 The Editing Paradigm Debate

The debate about the choice between tree-building and text-recognition as language-based editing paradigms continues (see discussions in [WBK91, Min92, KU93, WGR94]). For example, [KU93] suggested that the strict discipline imposed by the tree-building approach was appropriate for novices but not for experienced software developers. Thus a decision could be based on the type of user expected to utilise the product. An alternative they suggested was to consider user preferences and base a decision on a majority of user views, but where no clear majority prevailed, provide both. No single solution was proposed by [WGR94], where it was suggested that a `subjective stalemate' had been reached between advocates of one or the other of these approaches to language-based editing.

From the point of view of this thesis, the key word above is subjective, since to the best of our knowledge no systematic attempt to demonstrate the advantage of one paradigm or the other, either by application of relevant theories or by empirical investigation, has been made.


An alternative to this informal and intuitive approach to the evaluation of such issues is the predictive modelling approach [EE89]. Using this approach a model is built which can be used to help evaluate various issues before prototyping commences. In particular, the Keystroke-Level Model (KLM) [CMN80, CMN83] can be used to assess the efficiency of the paradigms.

The tree-building paradigm has been implemented in a variety of editors. In this study the example used was a Pascal editor generated using the Synthesizer Generator. Example text-recognition editors include the UQ editors, UQ1 and UQ*, which were also used here. In principle, we are interested in a two-way comparison of the tree-building and text-recognition paradigms. In practice, it is useful to consider some variants, since users can use the same editor in different ways. The Cornell editor allowed the user to exercise the tree-building paradigm via either the mouse (CSG-TB) or the keyboard (CSG-TBk) for menu selection and cursor movement. UQ1 was available in its standard form as a pure text-recognition editor (UQ1-TR) and in an enhanced form which effectively simulated template-based input without compromising its basic editing paradigm (UQ1-TB). A plain text editor was also considered (EDIT), since plain text editors obviously provide an alternative to language-based editing of either form.

7.2 The Keystroke-Level Model

The software development tool designer is interested in designing tools that are most efficient for use by software engineers. There are various measures of efficiency, and the time required to do a task is one of them. The Keystroke-Level Model (KLM) enables a designer to predict the time taken by experienced-expert users to carry out a given task using a given tool or paradigm. There are some restrictions and caveats on its use. The KLM assumes error-free expert behaviour; for software development tools used by software engineers, this may be a reasonable assumption. As well, it predicts only the time to execute a task and not the time taken in acquiring the information needed to execute the task. That is, program comprehension tasks were not studied here; instead we examined program development and maintenance tasks that occur either after, or interleaved with, comprehension.


According to [CMN80, CMN83], the time to execute a task can be described using four physical operators, K (keystroking), P (pointing), H (homing) and D (drawing), one mental operator M, and a system response operator R. The execution time for a task is the sum of the times for each of these operators (expressed as T_operator) and is given by

    T_execute = T_K + T_P + T_H + T_D + T_M + T_R.

Estimates of these various operator times have been determined [CMN80]. Total time spent in keystroking and button pressing is given by T_K and is based on the number of keys pressed (keys, not characters, so an upper-case A is two keystrokes: SHIFT and a) and the average typing speed (40 wpm) of the typical non-secretarial typist, namely 0.28 sec per keystroke. The P operator represents pointing to a target using a mouse, an average of 1.1 sec. With multiple input devices, the user must shift hands between those devices; the homing operator H accounts for this movement, with an average of 0.4 sec between devices. The D operator represents using the mouse to draw straight line segments. Originally this was included by [CMN80] to indicate the wide scope of tasks that could be covered by the KLM (for example computer-aided drafting, graphics and painting), but in this study we adapted it to represent text highlighting tasks that are effectively drawing tasks (point, button down, draw, button up). The D operator has two parameters, the number of lines drawn and the length of those lines; the time taken to draw a single line of length l_D cm is 0.9 + 0.16 l_D sec. Time spent `preparing' to carry out a physical operator is covered by the M operator, with an average time of 1.35 sec. Heuristic rules describe the placement of M operators in an analysis of a task [CMN83, page 265]. The R operator represents the system response time. It is relevant only when the user has to wait before execution of one of the four physical operators, and it may be partially or totally subsumed by an M operator.

The `physical encoding' for the KLM operators is relatively simple: it follows typical use of a system by an experienced-expert user. The `cognitive encoding' (use of the M operator) is more difficult, although the heuristic rules act as a guide to placement.


These rules have an underlying psychological principle: users cognitively organise methods according to sub-method `chunks'. The time estimate for the M operator represents the time to retrieve a chunk of information from long-term memory into working memory. It is not unusual for the KLM to account for between 80% and 90% of the time to execute tasks, which implies that it is an accurate modelling tool [CMN80, OO90, PJ92, GJA93].
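Since the KLM is simply a weighted sum of operator counts, its predictions can be computed mechanically. The following Pascal function is a minimal sketch using the operator times quoted above; the function and parameter names are ours, for illustration only, and form no part of the editors discussed.

    { Predicted execution time in seconds for a task encoded as
      k keystrokes, p pointing actions, h homing moves, m mental
      operators, nD drawn line segments of total length lD cm,
      and r seconds of system response time. }
    FUNCTION KLMTime(k, p, h, m, nD: Integer; lD, r: Real): Real;
    BEGIN
      KLMTime := 0.28 * k              { keystroking }
               + 1.10 * p              { pointing with the mouse }
               + 0.40 * h              { homing between devices }
               + 1.35 * m              { mental preparation }
               + 0.90 * nD + 0.16 * lD { drawing line segments }
               + r                     { system response }
    END;

For example, the encoding 33K + 4P + 3H + 4M derived in section 7.4 corresponds to KLMTime(33, 4, 3, 4, 0, 0.0, 0.0) and yields the 20.24 sec estimate quoted there.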

We now present the program development and program maintenance tasks used to analyse the editing paradigms.

7.3 Programming Tasks

Program development involves the input of program source code using an editor. Some software engineers develop their code off-line and then type in that code using their favourite editor. Others use pseudocode as the basis for their programs and develop the syntax as they input the code using the editor. Still others sit straight down in front of their workstations and begin typing. In this study we were mainly concerned with actions consistent with the first two, more organised, program input styles. Thus the example source code was either assumed to exist before input, in typed or hand-written form, or to exist in a form that was relatively easy to transform into source code at input time. The definition of a unit-task in this context is most likely a `chunk' of program and may even be an example of a `programming plan' familiar to experienced programmers [Sol87].

In figure 7.1 we present a simple example of a development task. It is a Pascal procedure from [WRL86] and serves as an example of the application of the KLM technique for the various paradigms.

    PROCEDURE Check;
    BEGIN
      IF Flag THEN
        count := count + 1
    END;

Figure 7.1: An example program input task.

The software engineer would have little difficulty entering this code using any program editing paradigm. A keystroke analysis provides an estimate of the time taken by experienced software engineers to input the code.


This was a relatively trivial input example, yet typical of a unit-task as defined in [CMN83]. Further examples of similar unit-tasks, used for evaluation of the editing paradigms (taken from a typical Pascal textbook [CC85]), are given in appendix C. A range of tasks, including input of procedure and function stubs and various language construct types, was considered.

Program maintenance tasks include fixing errors or making enhancements. In both instances the software engineer needs special skills, such as those necessary for program comprehension, debugging techniques, specification comprehension, and so on. They also need to be able to efficiently utilise the available tools that facilitate these maintenance tasks. As for program development, in this study we were interested in organised maintenance methods. Thus the maintenance activities to be undertaken by software engineers were assumed to be well defined; that is, they were implementing changes that had been previously determined. As an example, we consider a simple maintenance task for part of a Pascal procedure. The program segment in figure 7.2 needs the WHILE-DO construct converted into an IF-THEN construct. This trivial `text-book' example was used to demonstrate the application of the KLM technique to the program maintenance task.

    WHILE count <> 0 DO
    BEGIN
      read (char);
      count := count - 1
    END

Figure 7.2: An example program maintenance task.

Other examples of unit-tasks were developed, and sample program segments and changes are given in appendix D. As for program development, a range of tasks was included. Trivial alterations were made, including changing a variable's name in a procedure, changing a logical expression (the condition part of a decision construct), and adding an extra statement to a loop. Structural alterations included reversing the sense of a FOR loop and removing a decision construct, leaving the statements as a statement block; the decision construct was subsequently replaced. In another example we removed a loop from around statements and subsequently replaced it.

7.4 Tree-Building Paradigm

The Pascal editor supplied as an example with the Synthesizer Generator is a compromise between a strictly tree-based editor and a text editor. Language constructs at all levels can be created and manipulated as templates, but at lower levels the user can also enter and edit expressions as text. The Cornell editor presents the user with a menu of template constructs and operators. This menu is permanently displayed and dynamically updated as the user inputs text and selects templates. There are three methods of template selection. A template can be selected from the displayed menu using the mouse to point to the menu item and a button click to initiate the selection. An alternative menu is available from the `transformations' option of the menu-bar; only the first of these two menu methods was considered here, as time estimates for the second were almost always greater. The user can also use the keyboard to invoke execute-command and type some unambiguous prefix of the template name. This method is not considered in this section but is in the next.

Table 7.1 shows the KLM analysis for the program development example (figure 7.1), while table 7.2 gives the analysis for the program maintenance example (figure 7.2). Both tables have four columns. The first column presents the method used for each part of the task (for example, Select procedure from menu), while the second gives the physical operator encoding as set out in the KLM (for example, P [procedure]). Column three represents the application of `Rule 0' for inserting the memory operator, M, while column four shows the application of the other heuristic rules for the removal of M operators [CMN83, page 265]. It is this last column that provides the list of operators for the execution time estimate.

In table 7.1 we follow the precedent set in [CMN83] of placing M operators in front of Ps that select commands. The allocation of M operators is significant, since the time assumed for this operator (1.35 sec) is equivalent to about five keystrokes. Problems with the heuristic rules for placement of M operators have been discussed elsewhere [LNBN93], and even [CMN83] noted difficulties arising from a lack of knowledge about the `cognitive unit'.


Table 7.1: KLM analysis for program code input for tree-building paradigm with mouse-based menu selection (CSG-TB).

  Operation                            Physical Operators                 Include Ms   Remove Ms
  Select procedure from menu           P [procedure] K [left button]      MP MK        MP K
  Select <identifier> of procedure     P [<identifier>] K [left button]   P K          P K
  Reach for keyboard                   H [keyboard]                       H            H
  Enter Check                          6K [Check]                         6K           6K
  Reach for mouse                      H [mouse]                          H            H
  Select <statement> (in procedure)    P [<statement>] K [left button]    P K          P K
  Select ifthen from menu              P [ifthen] K [left button]         MP MK        MP K
  Reach for keyboard                   H [keyboard]                       H            H
  Enter Flag                           5K [Flag]                          5K           5K
  Select forward-with-optionals        K [RETURN]                         MK           MK
    (move to <statement>)
  Enter count:=count+1                 16K [count:=count+1]               16K          16K
  Select pointer-down                  K [↓]                              MK           MK
    (finish text input)

Note that we chose the keyboard-based forward-with-optionals for simple forward tree-walking movements, while we used mouse-based moves for major relocation before input or where it was more efficient (from a time viewpoint) than keyboard-based moves. (According to [RT89b], forward-with-optionals changes the selection to the next resting place in a forward preorder traversal of the abstract-syntax tree, stopping at placeholders with optional constituents.) We assumed that, whatever the editing paradigm (except plain text editing), the spacing within expressions, conditionals, declarations, and so on, was handled by the editor automatically. The Cornell editor displayed Pascal keywords in lower-case. Comments were not included in any of the experiments, since all paradigms handled them in a similar fashion and they were mostly input of text. Using the operator estimates given previously, the total time (sec) to input the program development example with the mouse-based version of the Cornell tree-building editor is:


    33K + 4P + 3H + 4M = 33 × 0.28 + 4 × 1.1 + 3 × 0.4 + 4 × 1.35 = 20.24 sec

Table 7.2: KLM analysis for program code maintenance for tree-building paradigm with mouse-based menu selection (CSG-TB).

  Operation                          Physical Operators                    Include Ms       Remove Ms
  Select pointer location            P [end] K [left button]               P K              P K
    after end of while
  Select statement-seq from menu     P [statement-seq] K [left button]     MP MK            MP K
  Select ifthen from menu            P [ifthen] K [left button]            MP MK            MP K
  Select count<>0 of while           P [count<>0] K [left button]          P K              P K
  Select cut-structure               P [cut-structure] D [0.25cm]          MP MD [0.25cm]   MP D (0.25)
    from pull-down menu
  Select <condition> of if-then      P [<condition>] K [left button]       P K              P K
  Select paste-structure             P [paste-structure] D [1.25cm]        MP MD [1.25cm]   MP D (1.25)
    from pull-down menu
  Select begin-end of while          P [begin-end] K [left button]         P K              P K
  Select cut-structure               P [cut-structure] D [0.25cm]          MP MD [0.25cm]   MP D (0.25)
    from pull-down menu
  Select <statement> of if-then      P [<statement>] K [left button]       P K              P K
  Select paste-structure             P [paste-structure] D [1.25cm]        MP MD [1.25cm]   MP D (1.25)
    from pull-down menu
  Select while construct             P [while] K [left button]             P K              P K
  Select cut-structure               P [cut-structure] D [0.25cm]          MP MD [0.25cm]   MP D (0.25)
    from pull-down menu
  Select begin-end of if             P [begin-end] K [left button]         P K              P K
    (terminate <statement>)

As already mentioned, this was a relatively trivial example, yet it is typical of a unit-task, where the `execution' of a unit-task is assumed to take in the vicinity of 20 sec [CMN83, page 261].

Currently, to alter a WHILE-DO construct to an IF-THEN construct, the user must create an adjacent IF-THEN template; move the statement block and expression, separately; and remove the empty WHILE-DO template. A CLIPPED buffer is used in the Cornell editor to hold the most recently cut structure, block of statements or condition. There is only one such buffer, although a separate DELETED buffer contains the most recently deleted construct. Using the above strategy, the mouse-based version of the tree-building paradigm predicts our program maintenance task to take 32.39 sec (table 7.2). The Synthesizer Generator enables the provision of transformations for maintenance tasks of this kind, but in the case of this simple editor no transformation is provided for this particular change.

The KLM analysis of the maintenance example highlights two potential problems with the Cornell editor design. One is its provision of only a single CLIPPED buffer, which immediately suggests a potential enhancement to the editor: the provision of multiple CLIPPED buffers to accommodate cut text or program segments during program maintenance. A second problem concerns the mix of keyboard and mouse.

At least two alternative design options are possible solutions to the first problem. The current CLIPPED buffer could be implemented as a stack, such that structures are accessible only from one end, with the user moving them between the buffer and the document in `last in, first out' order. Provided the height of this stack is small, it is probably not a significant cognitive load on the user's working memory capacity. Using this model of the save buffers, the time estimate for execution of our program maintenance example reduced to 27.22 sec (5.17 sec less than before). With this proposed implementation the user no longer needs to create an empty IF-THEN construct and can simply use the left-over <statement> after deleting the WHILE-DO construct. There is also no need to eliminate the left-over <statement> once all editing is complete (the last P K pair in table 7.2). A second alternative provides multiple CLIPPED buffers (a clipboard), with the user able to select the appropriate buffer at the appropriate time. Both the stack implementation and the clipboard option involve two operations to retrieve a construct: a point and a select operation. The clipboard option has the advantage that clipped structures are placed on a clipboard and made available for use at other points in a program. In either case the time estimates are sure to be less, and the flexibility greater, than in the current single-buffer implementation.


The second potential problem with the Cornell editor relates to the keyboard and mouse usage it encourages. As unit-tasks are combined into program segments and complete programs, the tree-building paradigm with mouse-based selection of templates, program constructs and edit operations tends to violate one of the basic guidelines in the extensive set provided in [SM86]. Guideline 1.05 indicates that the user should be able to maintain a single method for entering data, with minimal shifting between physical input devices such as keyboard and mouse. In the next section we examine the keyboard-based version of this editor.

7.5 Tree-Building with Keyboard-Only Input

Using the Cornell editor, template selection and cursor movement can be effected by keyboard commands. Two of these, forward-with-optionals and pointer-down, have already been utilised in the mouse-based version of this editor. The user can avoid use of the mouse entirely and use the keyboard exclusively for entering data and locating the cursor within a document. The display of template menus and the general appearance of the user interface remain the same as for mouse-based interaction. In addition, though, template selection using execute-command generates a window where commands can be typed; the window appears at the current mouse cursor location.

Table 7.3 shows an encoding of the program development example for the keyboard-based version of the tree-building editor. The encoding of the operator forward-preorder, used to move to <statement>, needs some explanation. Where the keystroke sequence is CTRL SHIFT n for one instance of the operator, it seems unnecessary to count the keystrokes CTRL and SHIFT for the immediately following instance, since those keys are already depressed by the user. In table 7.3 we also removed M operators from two positions where the heuristic rules would have left them: after entering the variable-length strings associated with invocation of an execute-command. We argued that template selection is a single `cognitive unit', so only one M operator is required, already allocated at invocation. From the user's viewpoint, however, the tree model is compromised by the fact that a textual display is normally used at all levels. To formulate the next operation required, the user must exercise significant cognitive effort to abstract from this textual display to the tree structure involved. It is possible, therefore, that our assumptions are not valid, and observation of actual use of the editor may be necessary either to validate our assumptions or to correctly allocate Ms.


Table 7.3: KLM analysis for program code input for tree-building paradigm with keyboard-based menu selection (CSG-TBk).

  Operation                          Physical Operators      Include Ms   Remove Ms
  Invoke execute-command             K [TAB]                 MK           MK
  Enter p(rocedure)                  K [p]                   K            K
  End execute-command                K [RETURN]              MK           K
  Select forward-with-optionals      K [RETURN]              MK           K
    (move to <identifier>)
  Enter Check                        6K [Check]              6K           6K
  Select forward-preorder            3K [CTRL SHIFT n]       M3K          M3K
    (ignore optionals, move to begin)
  Select forward-preorder            K [n]                   MK           K
    (move to <statement>)
  Invoke execute-command             K [TAB]                 MK           MK
  Enter ift(hen)                     3K [ift]                3K           3K
  End execute-command                K [RETURN]              MK           K
    (move to <condition>)
  Enter Flag                         5K [Flag]               5K           5K
  Select forward-with-optionals      K [RETURN]              MK           MK
    (move to <statement>)
  Enter count:=count+1               16K [count:=count+1]    16K          16K
  Select pointer-down                K [↓]                   MK           MK
    (finish text input)
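Summing the final column of table 7.3 gives 42K + 5M = 11.76 + 6.75 = 18.51 sec.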

from this textual display to the tree structure involved. It is possible, therefore, that our assumptions are not valid and observation of actual use of the editor may be necessary to either validate our assumption or correctly allocate Ms. With these encoding choices, keyboard-based input of the program development example is predicted to take 18.51sec (table 7.3)|1.73sec less than with the mouse-based version (table 7.1). Table 7.4 is a KLM analysis for the keyboard-based tree-building editor for our simple maintenance example. As for program input, the de nition of a cognitive unit in uenced the placement of M operators. In this case not only did it impact on invocation of execute-command sequences it also applied in cut and paste operations. In each instance of this the user moved the pointer to the required position before carrying out the operation. Whether these two components (for example, move and cut) can be considered a single cognitive unit is debatable but


Table 7.4: KLM analysis for program code maintenance for tree-building paradigm with keyboard-based menu selection (CSG-TBk).

  Operation                          Physical Operators        Include Ms   Remove Ms
  Select pointer location            P [end] K [left button]   P K          P K
    after end of while
  Reach for keyboard                 H [keyboard]              H            H
  Select forward-with-optionals      K [RETURN]                MK           MK
    (create <statement>)
  Invoke execute-command             K [TAB]                   MK           MK
  Enter ift(hen)                     3K [ift]                  3K           3K
  End execute-command                K [RETURN]                MK           K
    (move to <condition>)
  Select pointer-up                  K [↑]                     MK           MK
    (move to begin-end)
  Select cut-to-clipped              3K [CTRL w]               M3K          3K
    (cut begin-end)
  Select pointer-down × 2            2K [↓↓]                   M2K          M2K
    (move to <statement> of if-then)
  Select paste-from-clipped          3K [CTRL y]               M3K          3K
    (paste begin-end)
  Select pointer-up × 3              3K [↑↑↑]                  M3K          M3K
  Select forward-with-optionals      K [RETURN]                MK           K
    (move to count<>0)
  Select cut-to-clipped              3K [CTRL w]               M3K          3K
    (cut count<>0)
  Select pointer-down × 2            2K [↓↓]                   M2K          M2K
  Select forward-with-optionals      K [RETURN]                MK           K
    (move to <condition> of if-then)
  Select paste-from-clipped          3K [CTRL y]               M3K          3K
    (paste count<>0)
  Select pointer-up × 3              3K [↑↑↑]                  M3K          M3K
    (move to while-do)
  Select cut-to-clipped              3K [CTRL w]               M3K          3K
    (cut while-do)
  Select forward-with-optionals      K [RETURN]                MK           MK
    (terminate <statement>)
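The final column of table 7.4 totals 1P + 1H + 36K + 8M = 1.10 + 0.40 + 10.08 + 10.80 = 22.38 sec.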


With these encoding choices, the time prediction for execution is 22.38 sec, 10.01 sec less than with mouse-based interaction (table 7.2). The time estimates for keyboard-based editing are thus lower than for the mouse-based version, vindicating Smith and Mosier's guideline 1.05 [SM86]; however, the clumsiness of editing with the keyboard alone may push many users towards either all mouse-based operations or some mixture of mouse and keyboard. Exclusive use of the keyboard for this paradigm appears unnatural in many instances.

7.6 Text-Recognition Paradigm

UQ1 is a recognition editor, parsing and formatting text as it is typed. The editor anticipates downstream symbols, which can either be accepted or overtyped. It caters for good typists, who can ignore the display completely and touch-type the symbol sequences, while poor typists can exploit the anticipatory powers of the editor to minimise the keystrokes required.

We assumed that input of PROCEDURE Check was part of a larger task, so that the editor was already in insertion mode and remained in insertion mode after entry of the procedure. The impact of UQ1's modality is discussed further in following sections. We also assumed that the user perceived the input of the procedure heading (up to and including the ;) as a single unit. Thereafter the accept command was used to progress past the BEGIN, THEN and END symbols. We reasoned that the user views symbols such as ; more simply than BEGIN, and that this is a natural use of the editor. The UQ editors use CTRL a (a two-keystroke sequence) to initiate acceptance of downstream symbols; this could be reduced to one keystroke. For text-recognition it would sometimes be more efficient simply to type the required text. As an example, the acceptance of a THEN keyword counts as M2K (1.91 sec) but only 6K (1.68 sec), including a SPACE before and after THEN, if simply typed in lower case. The question, of course, is whether the M operator is unnecessary in the case of this acceptance; for the purposes here we retained it, but experimental evidence could suggest otherwise.


Table 7.5 provides the KLM analysis for one feasible way of using text-recognition for the program development example, and table 7.6 provides the analysis for the simple program maintenance example. For text-recognition input of the program development example the time estimate is 17.49 sec; the program maintenance example is predicted to take 14.36 sec to execute. Thus text-recognition is faster than tree-building for both the program development and the program maintenance task.

Table 7.5: KLM analysis for program code input for text-recognition paradigm (UQ1-TR).

  Operation                 Physical Operators       Include Ms   Remove Ms
  Enter procedure           10K [procedure ]         10K          10K
  Enter Check               6K [Check]               6K           6K
  Enter ;                   K [;]                    K            K
  Accept BEGIN              2K [CTRL a]              M2K          M2K
  Enter if                  3K [if ]                 3K           3K
  Enter Flag                5K [Flag]                5K           5K
  Accept THEN               2K [CTRL a]              M2K          M2K
  Enter count:=count+1      16K [count:=count+1]     16K          16K
  Accept END                2K [CTRL a]              M2K          M2K
  Enter ;                   K [;]                    K            K

Table 7.6: KLM analysis for program code maintenance for text-recognition paradigm (UQ1-TR).

  Operation                 Physical Operators             Include Ms   Remove Ms
  Select WHILE word         P [WHILE] K [middle button]    P K          P K
  Select Change option      P [Change] K [left button]     MP MK        MP K
    from control panel
  Reach for keyboard        H [keyboard]                   H            H
  Enter if                  2K [if]                        2K           2K
  Terminate insert mode     K [ESC]                        MK           MK
  Reach for mouse           H [mouse]                      H            H
  Select DO word            P [DO] K [middle button]       P K          P K
  Select Change option      P [Change] K [left button]     MP MK        MP K
    from control panel
  Reach for keyboard        H [keyboard]                   H            H
  Enter THEN                4K [then]                      4K           4K
  Terminate insert mode     K [ESC]                        MK           MK
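The final column of table 7.6 totals 12K + 4P + 3H + 4M = 3.36 + 4.40 + 1.20 + 5.40 = 14.36 sec.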


In the case of program development there are fewer M operators for text-recognition, while the apparent `direct manipulation' of objects appears to be the basis of the advantage in program maintenance: the nature of text-recognition is that the user perceives the changes as just textual changes and can alter programs directly.

7.7 Text-Recognition with Mouse-Based Template Selection

Recently, UQ1 was enhanced to allow program development using menu selection. This was a relatively simple adaptation, since the parser was able to predict all possible constructs, terminals and non-terminals at any editing stage. The enhancement allows UQ1 to effectively simulate template-based input without compromising its basic editing paradigm (recognition). Table 7.7 provides the KLM analysis for text-recognition using mouse-based menu selection of templates for the program development example. The time estimate is 20.51 sec.

Table 7.7: KLM analysis for program code input for text-recognition paradigm with mouse-based menu selection (UQ1-TB).

  Operation                      Physical Operators               Include Ms   Remove Ms
  Select PROCEDURE from menu     P [PROCEDURE] K [left button]    MP MK        MP K
  Reach for keyboard             H [keyboard]                     H            H
  Enter Check                    6K [Check]                       6K           6K
  Enter ;                        K [;]                            K            K
  Accept BEGIN                   2K [CTRL a]                      M2K          M2K
  Reach for mouse                H [mouse]                        H            H
  Select IF-THEN from menu       P [IF-THEN] K [left button]      MP MK        MP K
  Reach for keyboard             H [keyboard]                     H            H
  Enter Flag                     5K [Flag]                        5K           5K
  Accept THEN                    2K [CTRL a]                      M2K          M2K
  Enter count:=count+1           16K [count:=count+1]             16K          16K
  Accept END                     2K [CTRL a]                      M2K          M2K
  Enter ;                        K [;]                            K            K

As in the case of the Cornell editor, implementation of menu selection increased time estimates. The relative efficiencies of keyboard-based versus mouse-based methods can be examined by considering the entry of the keyword PROCEDURE. For keyboard-based input the user types ten keystrokes (procedure followed by a space), taking 2.80 sec to input the word. Selecting PROCEDURE from a menu involves the operator sequence MPKH at least, and may require HMPKH if the keyboard was in use before menu selection commenced (the extra H being required to reach for the mouse). The time required is thus from 3.13 sec (1.35 + 1.1 + 0.28 + 0.4) to 3.53 sec, so any keyword or keyword/optional-component combination requiring the typing of fewer than eleven characters is more efficiently entered with the keyboard than with the mouse.

In program maintenance situations this enhanced recognition editor is better than the Cornell editor but slightly worse than pure recognition editing (table 7.8). The time estimate is 16.53 sec.

Table 7.8: KLM analysis for program code maintenance for text-recognition paradigm with mouse-based menu selection (UQ1-TB).

  Operation                    Physical Operators             Include Ms   Remove Ms
  Select WHILE word            P [WHILE] K [middle button]    P K          P K
  Select Change option         P [Change] K [left button]     MP MK        MP K
    from control panel
  Select IF-THEN from menu     P [IF-THEN] K [left button]    MP MK        MP K
  Reach for keyboard           H [keyboard]                   H            H
  Terminate insert mode        K [ESC]                        MK           MK
  Reach for mouse              H [mouse]                      H            H
  Select DO word               P [DO] K [middle button]       P K          P K
  Select Change option         P [Change] K [left button]     MP MK        MP K
    from control panel
  Reach for keyboard           H [keyboard]                   H            H
  Enter THEN                   4K [then]                      4K           4K
  Terminate insert mode        K [ESC]                        MK           MK

In some situations, with some languages, even experienced-expert users may be willing to sacrifice speed, through keyboard/mouse exchanges, for the assistance provided by menu display and selection of constructs and symbols. The example program input and maintenance tasks presented here represent traditional programming language source code input and code change. Menu selection may have more application when the language used by the software engineer is richer, for example the specification language Z. A relatively trivial Z schema can involve the use of many mathematical and graphical symbols not available on normal QWERTY keyboards. To input these symbols the user would normally need to type ASCII character sequences (up to 19 keystrokes in UQ1) that are equivalent to Z symbols, with these representations being transformed into Z symbols on-screen. A menu would provide a list of possible operators from which the user could select. One problem then becomes how to present this menu in a way that assists the user. Whether the selection mechanism is better implemented as mouse-based or keyboard-based is not clear, but in either case the menu may provide assistance, not only in reducing execution time, but also in reducing acquisition time for input of such symbols.

7.8 Plain Text Editing

One test of the general usability of language-based editors implemented using either paradigm is to compare them with plain text editors such as the UNIX system editor vi or the MS-DOS Version 6.2 editor EDIT. (MS-DOS is a trademark of Microsoft Corporation.) A text editor imposes extra responsibilities on the user, including the formatting of text: the spacing between symbols, the placement of newlines and the adjustment of margins during program input all involve non-trivial decisions. Consequently the placement of Ms reflects this, although the heuristic rules for removal of Ms eliminated them in the case of indentation. Language-based editors also syntactically check documents during input and change, whereas plain text editors do not. Table 7.9 provides the KLM analysis for EDIT for the program development example; the time estimate is 23.37 sec. Comparison with traditional text editors is not as straightforward for maintenance tasks, since there are so many different editors with different ways of selecting text for change. Table 7.10 provides a feasible solution for the program maintenance example.


Table 7.9: KLM analysis for program code input for plain text editing with automatic indent propagation (EDIT).

  Operation                  Physical Operators          Include Ms   Remove Ms
  Enter procedure            10K [procedure ]            10K          10K
  Enter Check                6K [Check]                  6K           6K
  Enter ;                    K [;]                       K            K
  Enter newline              K [RETURN]                  MK           MK
  Enter indent               2K [two spaces]             M2K          2K
  Enter begin                5K [begin]                  5K           5K
  Enter newline              K [RETURN]                  MK           MK
  Enter indent               2K [two spaces]             M2K          2K
  Enter if                   3K [if ]                    3K           3K
  Enter Flag                 5K [Flag]                   5K           5K
  Enter then                 6K [ then ]                 6K           6K
  Enter count := count + 1   20K [count := count + 1]    20K          20K
  Enter newline              K [RETURN]                  MK           MK
  Select backspace × 2       2K [two backspaces]         M2K          2K
  Enter end                  3K [end]                    3K           3K
  Enter ;                    K [;]                       K            K

Table 7.10: KLM analysis for program code maintenance for plain text editing with auto-indentation (EDIT).

  Operation                  Physical Operators     Include Ms    Remove Ms
  Select while word          P [while] D [1cm]      P D [1cm]     P D (1)
    by dragging pointer
  Reach for keyboard         H [keyboard]           H             H
  Enter if                   2K [if]                2K            2K
  Reach for mouse            H [mouse]              H             H
  Select do word             P [do] D [0.4cm]       P D [0.4cm]   P D (0.4)
    by dragging pointer
  Reach for keyboard         H [keyboard]           H             H
  Enter then                 4K [then]              4K            4K

With this strategy the time to complete this task is 7.10 sec; here there are no Ms to consider. This is surprisingly fast, even when compared to the bi-modal UQ editor, and may be an argument for modeless editors. Prototypes of a new editor, UQ*, indicate that a modeless approach is at least feasible. There is no difference between modeless and bi-modal text-recognition for the program development examples.


However, in table 7.11 we present one strategy for modeless text-recognition editing of our program maintenance example. Here the predicted time for the task is 5.64 sec, 1.46 sec faster than plain text editing. We still have some reservations about its efficacy, in particular where the editor may treat a trivial change to a document as a significant one: eager text rearrangement as the user inputs the program character by character may not be exactly what the user expects, owing to the editor's interpretation of some intermediate stages of the input. Nevertheless the feasibility of a modeless approach has been demonstrated.

Table 7.11: KLM analysis for program code maintenance for modeless text-recognition (UQ*).

  Operation              Physical Operators             Include Ms   Remove Ms
  Select WHILE word      P [while] K [middle button]    P K          P K
  Reach for keyboard     H [keyboard]                   H            H
  Enter if               2K [if]                        2K           2K
  Reach for mouse        H [mouse]                      H            H
  Select DO word         P [do] K [middle button]       P K          P K
  Reach for keyboard     H [keyboard]                   H            H
  Enter then             4K [then]                      4K           4K
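The final column of table 7.11 totals 8K + 2P + 3H = 2.24 + 2.20 + 1.20 = 5.64 sec.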

7.9 Comparative Results and Discussion

In the preceding sections we developed keystroke models of some simple program development and maintenance examples for the various editing paradigms. Similar analyses were undertaken for the examples in appendices C and D. The results for the program development examples are presented in tables 7.12 and 7.13. Table 7.12 gives the actual time estimates for the example code. In table 7.13 we provide a comparison among the editing paradigms by dividing the time estimate for each task and each paradigm by the corresponding time estimate for text-recognition (so that, for example, the CSG-TB entry for Check is 20.24/17.49 = 1.16). In all situations text-recognition provides the best estimated time. Averages over the paradigms are probably not instructive; however, in more than half the cases text-recognition is more than 20% faster than tree-building, and using a plain text editor is relatively slow.


The difference between tree-building and text-recognition is mainly attributable to the extra M operators.

Table 7.12: KLM analysis time estimates for tree-building (CSG-TB), tree-building with keyboard-based menu selection (CSG-TBk), text-recognition (UQ1-TR), text-recognition with mouse-based menu selection (UQ1-TB) and plain text editing (EDIT) paradigms for program development.

  Procedure                       CSG-TB   CSG-TBk   UQ1-TR   UQ1-TB   EDIT
  Check-PROCEDURE                  20.24     18.51    17.49    20.51   23.37
  AddNumbers-PROCEDURE             21.21     19.17    17.36    17.69   27.52
  AddNumbers-FOR                   29.53     28.77    23.98    29.70   31.16
  StoreCount-CASE                  50.77     50.23    40.22    50.05   55.98
  Random-FUNCTION                  18.36     16.32    12.60    13.21   22.48
  ComputeChange-IF-THEN-ELSE       16.70     16.04    12.83    15.63   20.52
  CheckInput-REPEAT-UNTIL          37.70     36.76    33.55    34.72   40.40
  IterativeSum-WHILE-DO            30.03     28.31    25.66    28.96   33.96

Table 7.13: Comparisons of tree-building (CSG-TB), tree-building with keyboard-based menu selection (CSG-TBk), text-recognition with mouse-based menu selection (UQ1-TB) and plain text editing (EDIT) paradigms, as a ratio of text-recognition (UQ1-TR), for program development.

  Procedure                       CSG-TB   CSG-TBk   UQ1-TB   EDIT
  Check                             1.16      1.06     1.17    1.34
  AddNumbers-PROCEDURE              1.22      1.10     1.02    1.59
  AddNumbers-FOR                    1.23      1.20     1.24    1.30
  StoreCount-CASE                   1.26      1.25     1.24    1.39
  Random-FUNCTION                   1.46      1.30     1.05    1.78
  ComputeChange-IF-THEN-ELSE        1.30      1.25     1.22    1.60
  CheckInput-REPEAT-UNTIL           1.12      1.10     1.03    1.20
  IterativeSum-WHILE-DO             1.17      1.10     1.13    1.32

Table 7.14 presents the results for the selected maintenance tasks, while table 7.15 provides a comparison among the paradigms in the same way as table 7.13. Either the modeless text-recognition editor (UQ*) or a plain modeless text editor (EDIT) proved best for program maintenance in almost all instances. The availability of transformations in the tree-building paradigm is interesting. For the Cornell editor used, in the WHILE-DO change there is no transformation to IF-THEN, but the reverse is available. This transformation facility allows a decrease of 28.28 sec in the time to execute the reversal, whereas the corresponding editing operations for the text-recognition approach have similar time estimates in both directions. The bi-modal text-recognition paradigm is less efficient than tree-building in situations requiring the user to terminate insertion of text and where the tree-building paradigm allows simple modeless text changes.


Table 7.14: KLM analysis time estimates for tree-building (CSG-TB), tree-building with keyboard-based menu selection (CSG-TBk), text-recognition (UQ1-TR), text-recognition with mouse-based menu selection (UQ1-TB), plain text editing (EDIT) and modeless text-recognition (UQ*) paradigms for program maintenance.

  Procedure                       CSG-TB   CSG-TBk   UQ1-TR   UQ1-TB   EDIT    UQ*
  WHILE-DO change                  32.39     22.38    14.36    16.53    7.10   5.64
  IF-THEN change                    4.11      4.25    14.64    15.97    7.35   5.92
  AddNumbers-insert statement      15.12     15.12    15.15    15.15   11.86  12.14
  AddNumbers-remove statement       5.05      4.25     4.11     4.11    3.29   4.11
  AddNumbers-reverse loop          19.00     13.74    11.84    11.84    7.48   7.48
  AddNumbers-alter var. name       18.87     15.27    21.78    21.78   10.29   8.70
  ComputeChange-remove IF          29.03     16.37    10.28    10.28    9.69   7.27
  ComputeChange-insert IF          29.54     20.52    19.63    22.85   20.79  10.91
  ComputeChange-alter IF            4.69      4.69     7.96     7.96    3.88   3.20
  IterativeSum-remove WHILE        18.28     13.01     9.32     9.32    8.90   9.32
  IterativeSum-insert WHILE        19.04     16.91    21.03    23.81   25.50  12.31

Table 7.15: Comparisons of tree-building (CSG-TB), tree-building with keyboard-based menu selection (CSG-TBk), text-recognition with mouse-based menu selection (UQ1-TB), plain text editing (EDIT) paradigms, and modeless text-recognition (UQ*), as a ratio of text-recognition (UQ1-TR) for program maintenance.

Procedure                     CSG-TB  CSG-TBk  UQ1-TB  EDIT   UQ*
WHILE-DO change                 2.26     1.56    1.15  0.49  0.39
IF-THEN change                  0.28     0.29    1.09  0.50  0.40
AddNumbers–insert statement     1.00     1.00    1.00  0.78  0.80
AddNumbers–remove statement     1.23     1.03    1.00  0.80  1.00
AddNumbers–reverse loop         1.60     1.16    1.00  0.63  0.63
AddNumbers–alter var. name      0.87     0.70    1.00  0.47  0.40
ComputeChange–remove IF         2.82     1.59    1.00  0.94  0.71
ComputeChange–insert IF         1.50     1.05    1.16  1.06  0.56
ComputeChange–alter IF          0.59     0.59    1.00  0.49  0.40
IterativeSum–remove WHILE       1.96     1.40    1.00  0.95  1.00
IterativeSum–insert WHILE       0.91     0.80    1.13  1.21  0.59

Both UQ editors use the text-recognition paradigm. The decision to use the text-recognition approach was based on an intuitive preference for the direct manipulation style, and the evolutionary advantages for professional users who were already accustomed to the text-editing paradigm. By applying the KLM we showed that for program development this paradigm was the most efficient. (The validity of this claim depends on the KLM parameter estimates used, which we investigate further in chapter 9.)
Text-recognition editing was not as efficient as tree-building in some program maintenance situations (four out of eleven of our examples). Specifically, where transformations from one construct to another were available, the tree-building editor was faster. Nevertheless, for 60% of the maintenance tasks bi-modal text-recognition was faster than tree-building (up to 182% faster in one case), and modeless text-recognition was usually faster than tree-building.

There are instances where input of lengthy or difficult-to-remember templates and language operators may be assisted by menu display and selection. This is particularly true for complex formal languages such as the specification language Z. It is on this basis more than any other that the decision to allow template/operator selection in the UQ editors is justified.

One concern about the KLM is the partitioning of timing for unit-tasks into acquisition and execution time estimates. The provision of menu selection in our Z language editor is an example. Here the menu may be more useful for development of Z language expressions, where there are many possible operators, than for language constructs. The menu acts as a prompt for the user, since these operators and their ASCII representation are difficult to remember. This implies a mix of acquisition and execution time components for a task. The KLM assumes that these are independent, so the KLM may be inadequate in these circumstances.
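The sensitivity of such estimates to M placement is easy to see with a small calculator over the operator values used in this chapter (K = 0.28sec, P = 1.1sec, M = 1.35sec, following [CMN80]). The sketch below is illustrative only; the operator strings are generic and not tied to any particular task analysed here.

    # Sum KLM operator times for a sequence such as "MPK" or "M2K".
    OP = {"K": 0.28, "P": 1.1, "M": 1.35}   # seconds, as assumed in this chapter

    def klm(ops: str) -> float:
        total, count = 0.0, 1
        for ch in ops:
            if ch.isdigit():
                count = int(ch)          # repeat count for the next operator
            else:
                total += count * OP[ch]
                count = 1
        return total

    print(klm("MK"))    # 1.63 -- e.g. the ESC that ends an insertion
    print(klm("MPK"))   # 2.73 -- a menu selection preceded by a mental step
    print(klm("PK"))    # 1.38 -- the same selection with the M omitted

Dropping or adding a single M changes an event estimate by 1.35sec, which is why the placement question matters so much in practice.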

7.10 Summary

The KLM has proved to be a useful design evaluation tool. It has not only assisted in comparing relevant design options but has also indicated inadequacies of the UQ implementation as it currently stands. However, one of the KLM's main problems seems to be the difficulty of deciding where M operators should be placed. This is non-trivial, needing careful thought and probably experimentation for some design situations.

A predictive model such as the KLM enables a designer to predict the time taken by experienced-expert users in carrying out a given task using a given tool or paradigm.
However, such models make many assumptions, including error-free expert behaviour, and they only predict time to execute a task, not the time taken in acquiring the information to execute the task. It is unlikely, however, that even expert software engineers behave in an error-free way when using software development tools, so error type, frequency and distribution are of interest. Attempts to incorporate error scenarios in models are expected to be the basis of research into further modelling schemes, but as yet these new schemes are not available [OO90]. Thus, although models provide some theoretical basis on which to choose between design options, there are many issues not covered. Based on these arguments, a controlled experiment using software engineers was undertaken and is presented in the following chapter. The data collected also serves to validate the predictive models, this validation being presented in chapter 9.

Chapter 8

An Experiment with Editing Paradigms

In chapter 7 we examined the use of predictive modelling as a technique to evaluate editing paradigms. We acknowledged a number of problems with the approach and the need for experimentation in a controlled environment to assist with solutions to those problems. In this chapter we are still interested in comparing the two basic paradigms for editing that are commonly associated with language-based editors: tree-building and text-recognition. However, unlike chapter 7, we describe a laboratory-style experiment rather than a predictive modelling exercise.

8.1 Experimental Context

As with the predictive modelling approach, we remain interested in the relative efficiency of the two paradigms, that is, finding which paradigm is more efficient for use by software engineers engaged in typical software development tasks. The preference a user has for one paradigm over the other is also of interest, as was the case for the hierarchical structure presentation experiment of chapter 6. Thus, the two paradigms of tree-building and text-recognition are the treatments in our formal experimental design.

Quantitative performance levels are best determined using examples based on program development tasks. Here we set code input or maintenance tasks for the subjects and measure completion times and error rates. The actual choice of task is likely to be of concern, since some tasks may fare better under one paradigm than another. A selection of tasks based on representative programmer activities had already been developed in the KLM study.


8.2 Experimental Details

8.2.1 Experimental Subjects

For our experiment, we needed users who were familiar with the `look and feel' of the editors, with their use, and with the programming language and tasks to be performed. Thus, the type of user in which we were interested was special: a software engineer. This group was small, since the editors were experimental research tools primarily used to test conceptual ideas of various researchers. Five final-year honours students (two females and three males) participated in the experiment as part of a course on the construction of programming environments at the University of Queensland in the Department of Computer Science. All subjects had used the editors as part of their course, and participation in the experiment contributed to their assessment for the course.

8.2.2 Experimental Apparatus

For this experiment we used a Labtam X-terminal (48cm monitor, 16Mb memory) connected to a Sun SPARC 20. During the experiment other users were able to utilise the computer, so a log was kept of machine load. Table 8.1 shows the means of the average number of jobs in the run queue (sampled every minute using uptime) during each user's session each day. These loads, although not optimal for such an experiment, were at least consistent during the experiment's schedule.

Table 8.1: Loadings on computer while editing experiment conducted.

User  Day 1  Day 2  Day 3
1      2.77   3.79   3.33
2      3.70   2.28   3.53
3      3.30   3.83   3.47
4      3.03   2.87   3.27
5      2.88   3.11   3.59
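A load log of this kind is simple to reproduce. The sketch below, assuming a Unix host with uptime on the PATH, samples the one-minute load average each minute and reports the session mean, as in table 8.1; it is a reconstruction for illustration, not the script actually used in the study.

    # Sample the 1-minute load average via uptime and average the samples.
    import re, subprocess, time

    def one_minute_load():
        out = subprocess.run(["uptime"], capture_output=True, text=True).stdout
        # uptime ends with e.g. "load average: 3.03, 2.87, 3.27"
        return float(re.search(r"load averages?:\s*([\d.]+)", out).group(1))

    samples = []
    for _ in range(3):        # one sample per minute (3 here for brevity)
        samples.append(one_minute_load())
        time.sleep(60)
    print(sum(samples) / len(samples))   # session mean, as in table 8.1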


Two editors, an existing text-recognition editor, UQ1, and a Pascal editor provided with the Synthesizer Generator package, constituted the experimental platforms. The particular UQ1 editor had been enhanced to allow it to simulate template-based input (that is, menu selection of language components) without the need to compromise its basic text-recognition editing paradigm. By using the one UQ editor for experimenting with both paradigms, we potentially avoided the confounding effects of different physical interface presentation that would prevail if two different editors were compared. Both editors, UQ1 and the Cornell Pascal editor, were X Window System based versions.

The screen desktop that experimental subjects used included a console window, a clock icon, an xterm window that logged system load and invoked the X event log program, and an xterm that invoked the editor. However, as far as the subjects were concerned, only the editor and its interface were relevant, and both editor windows obscured most of these other windows and icons in any case.

The tasks to be performed were the same as those in the KLM study. They represented a cross-section of the types of tasks in which programmers might engage. By utilising the same tasks as for the predictive modelling exercise, we were able to validate the KLM models.

8.2.3 Experimental Procedure

Before the experiment, all subjects were given a document that explained the aims and objectives of the experiment as well as some detail of the procedures to be followed (see appendix E). Participation in the experiment contributed to the individual's assessment in their course, but it was made clear to the subjects that their individual performances with the editing paradigms and the editors were not included as part of this assessment. Their participation and report on the experiment formed the only basis for assessment.

The experiment was conducted on the same equipment in the same room for each subject. This arrangement necessitated individual testing of subjects, with each subject given up to one hour to complete the required tasks for the designated editor/paradigm. As far as possible, the time of day that subjects participated in the experiment was the most convenient time for the subject.

Before beginning with a particular editor or paradigm, subjects were given an opportunity to relax and become familiar with the experimental setup. This involved physical adjustments of chair, screen, mouse, and so on, as well as preliminary experimentation with a set of programs and tasks not included in the experiment proper (see sample tasks in appendix F).
During the experiment two computer-based logs were automatically kept. The first, already mentioned, monitored the load on the computer while each user was carrying out the allotted tasks. All mouse and keyboard events were monitored using a second log. The programs collectively called xmon, including xmonui and xmond [McF91], were used for this data collection and monitoring function. Use of this program was relatively straightforward. The user interface allowed selection from a wide range of possible X events for capture, but for this experiment we restricted the data collection to ButtonPress, ButtonRelease and KeyPress events. Even with such a restricted set of events, the log files for each editor and each subject ranged in size up to 1Mb.

All experimental sessions were video-taped, with the video camera placed to the left of the subject and pointing at the terminal screen. (Initially we had thought that the computer-based log would be sufficient, but the video was essential for deciphering editing tasks and useful for recording user opinions about various issues.) Audio recording was also enabled to obtain comments made by subjects during a session. Subject reactions were monitored, but intervention during a session was only undertaken when clarification of a task was required or when system functions needed attention. Between the computer-based log, the video-taping and the observation, all errors, the time required for all tasks, and individual mouse and keyboard activity were recorded.

At the end of the experiment the subjects were asked to complete a questionnaire concerning aspects of the experiment, including their preference for the individual treatments. The survey instrument is given in appendix G.

8.2.4 Experimental Design

There were three treatments:

1. program input using text-recognition (UQ1-TR);
2. program input using tree-building via the UQ1 editor (UQ1-TB); and
3. program input using tree-building via the Cornell editor (CSG-TB).

For each of the five subjects there were 20 tasks to be completed on each of three days. These tasks included nine program input tasks and eleven program maintenance tasks.
Of the program input tasks, the first and the last were almost identical. The order of the tasks as presented to the subjects is given in table 8.2. Subjects completed all program input tasks and then all maintenance tasks.

Table 8.2: Program input and maintenance tasks performed by all subjects with all editors.

Input Tasks                        Maintenance Tasks
1. Check–PROCEDURE                  1. WHILE-DO change
2. AddNumbers–PROCEDURE             2. IF-THEN change
3. AddNumbers–FOR                   3. AddNumbers–insert statement
4. StoreCount–CASE                  4. AddNumbers–remove statement
5. Random–FUNCTION                  5. AddNumbers–reverse loop
6. ComputeChange–IF-THEN-ELSE       6. AddNumbers–alter var. name
7. CheckInput–REPEAT-UNTIL          7. ComputeChange–remove IF
8. IterativeSum–WHILE-DO            8. ComputeChange–insert IF
9. Count–PROCEDURE                  9. ComputeChange–alter IF
                                   10. IterativeSum–remove WHILE
                                   11. IterativeSum–insert WHILE

Treatment sequence was randomly assigned for each subject. Subjects undertook the software development tasks for one paradigm on one day, another paradigm on the next day and the remaining paradigm on the following day. Randomisation and a reasonable delay between treatment applications decreased any carry-over (or residual) effect of previous treatments. Nevertheless, it was possible that users learned particular tasks over the three days of the experiment. (Analyses of the effect of day of editor use, ignoring treatment, showed a significant day effect for only two parameters: overall completion time for maintenance task 6 (day 1: 36.5sec, day 2: 26.7sec and day 3: 23.1sec), and number of errors in maintenance task 7 (day 1: 2.5, day 2: 1.6 and day 3: 1.6).) We reasoned that there might have been some settling-in effect or even a learning effect for individual editors on a day, so all subjects completed the tasks in the same order to minimise variation between tasks. Two tasks, Check–PROCEDURE and Count–PROCEDURE, tasks 1 and 9 for input, were essentially identical, so we were able to note any learning effect by comparing these tasks.

In this experiment there were three treatments and five subjects, with 20 experimental instances or trials (tasks). The experimental design was a change-over design, the same as that used in the experiments of chapter 6. However, because of the small number of subjects it was infeasible to analyse the data using a change-over design, so the data were analysed as a randomised block design with the individuals as blocks.
Thus, unlike the analyses of chapter 6, we were unable to measure the sequencing effect of the treatments in conjunction with measuring the treatment effect. However, the randomisation afforded by the change-over design was equally valid for the randomised block design.

Time required to complete each task was analysed by analysis of variance. The number of error types and the actual number of errors associated with task completion were also analysed by analysis of variance, after applying the log transformation typically applied before analysing data consisting of counts. Time associated with errors and average error times were analysed by analysis of variance as well. Some errors were filtered before any analysis; in particular, long time delays between ending one task and beginning the next were eliminated from analysis. In order to assess learnability (or settling-in) within a paradigm or editor, the parameters above, for program input tasks 1 and 9, were analysed as repeated measures by assuming a split-plot design.
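For illustration, a randomised block analysis of this kind can be expressed with modern statistical software. The sketch below uses pandas and statsmodels (not the tools used in the original study), with made-up data; the log(count + 1) line shows one common variant of the transformation applied to the error counts.

    # Randomised block ANOVA: subjects as blocks, editors as treatments.
    import pandas as pd
    import numpy as np
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # One row per subject x treatment; the values here are illustrative.
    df = pd.DataFrame({
        "subject":   ["s1", "s1", "s1", "s2", "s2", "s2"],
        "treatment": ["UQ1-TR", "UQ1-TB", "CSG-TB"] * 2,
        "time":      [33.1, 55.2, 47.0, 30.4, 58.9, 44.1],
        "errors":    [3, 5, 4, 2, 6, 3],
    })
    df["log_errors"] = np.log(df["errors"] + 1)   # transform for count data

    fit = smf.ols("time ~ C(subject) + C(treatment)", data=df).fit()
    print(anova_lm(fit))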

8.3 Usability Evaluation

The usability evaluation encompassed analysis and examination of various parameters. Here we analysed task completion times, number of errors and error times. Various error types were also examined, as were data related to individual subjects' preferences, perceptions and experiences. For all analyses there were two null hypotheses concerning each of the parameters:

1. There are no differences between editors.
2. There are no differences between subjects.

8.3.1 Task Completion Times

Table 8.3 shows the analysis of variance of task completion time for program input task 1. Analyses for other tasks had a similar format. The error term in this analysis has one degree of freedom less than the norm, due to the loss of data relating to one of the subjects for one day and treatment, UQ1-TB. Such data loss is common in empirical studies and sometimes unavoidable (although annoying), but the resulting analysis is no less valid using standard least-squares techniques.
Table 8.3: Analysis of variance table of task completion time for program input task 1.

Source of   Degrees of  Sums of  Mean     F Test  Significance
Variation   Freedom     Squares  Squares
Treatments    2          998.5    499.3    1.14    P > 0.05
Subjects      4         1404.8    351.2    0.80    P > 0.05
Error         7         3070.4    438.6
Total        13         5473.7

Table 8.4 shows the mean times taken by subjects to complete the program input and maintenance tasks. For interpretation of this table it is only instructive to consider means within a row. Means within the same row with the same superscript are not significantly different; if the superscripts are missing then the means in that row are not significantly different. The column SE Mean is a measure of the accuracy of the means of each of the treatments, based on the data collected; it is calculated as √(Error/n), where n is the number of subjects making up that mean. (The tables in appendix H have the same structure as table 8.4, so this description is relevant to those tables also.)

There were few statistically significant results. Analyses of two program input tasks and two maintenance tasks yielded significant differences between treatments. A difference between subjects was found significant in only one maintenance task, IterativeSum–insert WHILE.

Input of PROCEDURE-stub AddNumbers was particularly problematic for the subjects when using CSG-TB. This was probably due to the number of options presented to the subjects while tree-walking through the parameter list for such a construct. The same problem occurred, but to a lesser extent, for the FUNCTION-stub Random.

Input and maintenance of the WHILE-DO construct in FUNCTION IterativeSum was slower using UQ1 (in either form) than using CSG-TB, although this difference was not always statistically significant. On the surface there was little reason for this difference but, as the samples below indicate, observation and comments by subjects indicated that there was confusion with the presentation of downstream symbols as provided by UQ1, and whether or not to accept them.
Table 8.4: Overall completion times for program input and maintenance tasks (sec). Means within the same row, with the same superscript, are not significantly different.

Input Task                    UQ1-TR   UQ1-TB   CSG-TB   SE Mean
Check–PROCEDURE                33.47    56.51    46.92      9.93
AddNumbers–PROCEDURE           19.81a   31.91ab  75.55b    13.77
AddNumbers–FOR                 47.85    45.76    43.76      5.60
StoreCount–CASE                95.76    73.66    87.58     15.15
Random–FUNCTION                22.30    16.14    37.36      5.93
ComputeChange–IF-THEN-ELSE     20.88    37.86    34.95      6.01
CheckInput–REPEAT-UNTIL        62.20    54.74    65.57     13.00
IterativeSum–WHILE-DO          58.96a   49.85ab  33.78b     5.66
Count–PROCEDURE                38.79    47.56    25.86      5.57

Maintenance Task              UQ1-TR   UQ1-TB   CSG-TB   SE Mean
WHILE-DO change                25.41    22.48    42.25      6.79
IF-THEN change                 21.66a   15.78a    2.16b     3.34
AddNumbers–insert statement    26.17    21.06    31.12      4.38
AddNumbers–remove statement     4.64     3.55    13.10      2.75
AddNumbers–reverse loop        19.36    23.07    23.60      2.70
AddNumbers–alter var. name     28.58    26.94    28.99      4.66
ComputeChange–remove IF        20.21    13.85    51.31      9.16
ComputeChange–insert IF        37.70    37.86    49.24      9.22
ComputeChange–alter IF         10.99     9.99     6.47      1.47
IterativeSum–remove WHILE      29.43    19.70    26.85      6.76
IterativeSum–insert WHILE      47.89a   66.33b   38.72a     4.89

One subject solved the problem by doing syntax checks at various points in the program input task, thus lengthening task time. Others found the display less than helpful and were confused about what to do next, and consequently wasted time.

    I'll do a syntax check; to be sure.

    UQ1 was not helping me much with semicolons (;).

    ... accepting of END is not obvious. There's that END that I keep (accepting); not obvious when it's needed.

    It (UQ1) got rid of the END!

Changing constructs within the same syntactic class (for example, IF-THEN to WHILE-DO) using the transformation function available in CSG-TB was clearly very fast, requiring only two selections with the mouse, and far superior to any method currently available in the UQ editors. Several subjects made comments about the
ease of use of transformations. Such transformations are a powerful function in a language-based editor.

An analysis was also performed to ascertain whether there was any difference between the completion times for the two equivalent tasks, input tasks 1 and 9. A repeated measures analysis, assuming a split-plot arrangement where the individual events constituted the components of the plot, is presented in table 8.5.

Table 8.5: Analysis of variance table of task completion time for program input tasks 1 and 9.

Source of          Degrees of  Sums of  Mean     F Test  Significance
Variation          Freedom     Squares  Squares
Treatments           2          1165.3   582.7    1.48    P > 0.05
Subjects             4           882.6   220.7    0.56    P > 0.05
Error                7          2756.3   393.8
Tasks                1           405.6   405.6    2.44    P > 0.05
Treatment × Tasks    2           870.0   435.0    2.62    P > 0.05
Error               11          1827.5   166.1
Total               27          7907.3

The mean completion time for task 1 was 44.59sec, while for task 9 it was 36.93sec. Although this was not a significant difference, there was concern that the subjects did improve in efficiency with particular editors. A closer examination reveals that this improvement in performance was particularly apparent for the CSG-TB treatment, where the mean times for the two tasks were 46.92sec and 25.86sec respectively.

8.3.2 Errors in Editor Use

These subjects, although familiar with the editors used in the experiment, could not be considered expert users. We categorised 17 types of error, many trivial, in order to assist in the examination of the usability of the editors. Once listed, the errors were of interest in themselves, in that we could analyse error times, number of errors and so on. However, their identification also enabled us to subtract their effect from task completion times and to allow a correlation to be made between the actual performance and the results of the models from the predictive modelling exercise of chapter 7. We discuss this model validation later; for now we examine the errors themselves. Table 8.6 shows the error types found. Not all were errors as such, but rather deviations from expected actions associated with a particular editor or editing paradigm.
Table 8.6: Types of error in program input and maintenance tasks.

Code  Error Description
A     Fix an incorrect upper case/lower case instance
B     Unnecessary spaces left in code
C     Incorrectly fixed correct data
D     Return to fix incorrectly typed data
E     Long time spent navigating during a task
F     Unnecessary ENTER key
G     Excessive time spent on one keystroke
H     Unnecessary use of SHIFT key
I     Incorrect or excess data left behind
J     Unnecessary CTRL key
K     Unnecessary ALT key
L     Long time break between tasks
M     Problem recognising what to do with editor-supplied text
N     Incorrectly typing text in Control Panel (UQ1 only)
O     Time to change between Control Panel and Text Region (UQ1 only)
P     Unnecessary ESC key
Q     Unnecessary syntax check

As for task completion time, analysis of variance was used to provide a basis for comparing treatments. The analysis of variance table has the same format as that for task completion times. There are four possible analyses relevant to error data:

1. number of error types in program input and maintenance tasks;
2. number of actual errors in program input and maintenance tasks;
3. error times for program input and maintenance tasks;
4. average time for each error for program input and maintenance tasks.

Analyses were conducted on all these variables, tables of means for which are given in appendix H. These tables have the same structure as table 8.4. In five maintenance tasks UQ1-TR induced significantly more types of error than CSG-TB; however, UQ1-TB also produced more types of error than CSG-TB in three of these five. This would indicate that it was not just the editing paradigm that caused more types of error to be produced, but that some aspect of the UQ editor also influenced errors.
In two program input tasks and six maintenance tasks UQ1-TR showed significantly more actual errors than CSG-TB, but there was no difference between UQ1-TR and UQ1-TB in five of these instances. Analyses of total error time showed no significant differences either between treatments or between subjects.

Only two maintenance tasks produced significant differences between treatments with respect to average error time. These tasks were ComputeChange–remove IF and ComputeChange–alter IF. For the first task, subjects spent significantly more time on their errors with CSG-TB than with either UQ editor, but with the second task the reverse was the case. Removal of the IF construct was simple in the case of the UQ editors, since it involved highlighting and deleting the appropriate components of the construct to leave the statement clauses intact. This was less trivial in the case of CSG-TB and involved several cut and paste operations, and repositioning of statement groupings. The task of altering the IF construct's logical expression was simple with the modeless CSG-TB editor but caused minor problems with the bi-modal UQ editors.

Intuitively we expected text-recognition to produce more errors, since the paradigm needs to be more tolerant of errors than tree-building. Our intuition was borne out, but more so for maintenance tasks than program input tasks, for reasons that are unclear at this stage.

As for task completion time, analyses were performed to identify whether there was any difference between tasks 1 and 9 for all the error parameters. No error variable showed any significant difference between these two tasks.

8.3.3 Menu Selection versus Typing

For two editors, UQ1-TB and CSG-TB, the subjects were able to select language construct keywords from a menu of those available. In section 7.7 we used the KLM to compare and contrast this input method with keyboard entry of such keywords. Some language constructs occurred more than once in our example tasks, so there were multiple instances of those. The data were analysed by analysis of variance with the same format as in previous sections. Where multiple instances of keywords were available, the data were combined (since variability between instances was
comparable) and analysed as repeated measures using a split-plot design. Table 8.7 shows actual times for inputting eight different language construct keywords using each of the editors.

Table 8.7: Times for program input and maintenance sub-task events. Means within the same row, with the same superscript, are not significantly different.

Sub-Task    UQ1-TR  UQ1-TB  CSG-TB  SE Mean  Instances
PROCEDURE    2.52a   3.63a   3.82a    1.16       3
FUNCTION     1.64a   2.03a   3.58a    1.13       1
REPEAT       1.93a   3.22ab  4.08b    0.44       1
WHILE        1.15a   3.22b   7.32b    0.87       3
BEGIN        1.17a   4.81b   3.34b    0.57       2
CASE         1.17a   3.72ab  4.90b    0.88       1
FOR          1.23a   3.15a   6.03b    0.71       1
IF           0.77a   3.85b   4.52b    0.45       5

In only 5.9% of cases (9/153) did the subjects not use the menu facility when it was available. Statistically significant timing differences were found for the six shortest keywords. For the two longest keywords, PROCEDURE and FUNCTION, UQ1-TR was faster than both UQ1-TB and CSG-TB, but this was not statistically significant. Average keystroke timings for typing of language construct keywords ranged from 0.21sec to 0.41sec per key (the KLM assumed 0.28sec per key).
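These timings can be set against the KLM's own arithmetic: typing costs one K per character, while menu selection was modelled earlier as M + P + K = 2.73sec regardless of keyword length, so the model predicts a break-even at keywords of about ten characters. The sketch below simply restates that arithmetic with the chapter's assumed values.

    # Typing versus menu selection under the KLM (values from this thesis).
    K, MENU = 0.28, 2.73   # sec per keystroke; sec per M+P+K menu selection

    for kw in ["IF", "WHILE", "FUNCTION", "PROCEDURE"]:
        typed = len(kw) * K
        better = "typing" if typed < MENU else "menu"
        print(f"{kw:10s} typed={typed:.2f}s  menu={MENU}s -> {better}")
    # Break-even is at 2.73/0.28 ~ 9.75 characters, so even PROCEDURE
    # (9 keys, 2.52s) is predicted to be faster typed than selected.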

8.3.4 Use of Acceptance versus Typing

Both UQ editors anticipated downstream symbols that could be either accepted or overtyped. Subjects were able to elect for themselves whether they accepted downstream symbols or not. Frequency-of-use data were available for both editors, UQ1-TR and UQ1-TB, five subjects and 30 instances where it was feasible to use CTRL a, the accept key sequence. This categorical data was analysed using loglinear models [BFH75], with interpretation indicating that the effect of the editors was not significant. That is, there was no difference between the editors, so we `collapsed' the table relative to this variable to obtain table 8.8. This table also includes each subject's average typing speed (in sec per keystroke) from a standard typing exercise administered at the end of the experiment (see appendix F). Two subjects utilised CTRL a on less than 40% of the occasions it was practicable
to use it, while three subjects used it on more than 70% of these occasions. The low-use subjects were relatively quick typists, which may have accounted for their preference, although one did note the usefulness of the concept.

    It (CTRL a) is probably faster for slow typists.

    ... it makes it easy to skip over the THEN part (when constructing an IF clause).

Table 8.8: Counts of the use of accept for program input and maintenance sub-task events.

User  Used CTRL a  No CTRL a  Typing Speed
1       11 (37%)      19         0.161
2       45 (82%)      10         0.189
3       18 (31%)      40         0.216
4       41 (72%)      16         0.319
5       41 (75%)      14         0.273

The times required to either type or accept downstream symbols were also available. Table 8.9 shows the mean times for use of this feature. No significant differences were noticed between editor implementations, so data were aggregated for each downstream symbol. Even with this aggregation the number of data for some symbols was relatively small and, for these symbols, statistical analyses were of dubious value.

Table 8.9: Times for program input and maintenance sub-task events. Means within the same row, with the same superscript, are not significantly different.

Sub-Task  Used CTRL a  SE Mean   n   No CTRL a  SE Mean   n
UNTIL        2.31a       0.80    1     1.06a      0.28    8
THEN         1.25a       0.16   13     0.58b      0.10   32
END          2.86a       0.58   13     0.65b      0.35   36
DO           1.33a       0.19    8     0.59b      0.11   26
OF           1.26a       0.43    2     0.37a      0.23    7
:=           1.77a       0.36    3     0.79a      0.26    6
;            2.34a       0.27   29     1.30b      0.24   36
)            2.05a       0.15   11     0.70b      0.12   16

When accept was used, subjects took longer to input CTRL a than to type the symbol itself, for example UNTIL. This was true regardless of the symbol.


8.3.5 Perceptions and Preferences of Subjects

All subjects independently answered a simple questionnaire after using all three editing paradigms/editors (appendix G). Two of the questions solicited the subjects' perceptions of their performance with the editors, and a third their preferred editing environment. In these questions we used a scale of 1–3 to rank the three editors. For example, if a subject ranked CSG-TB fastest followed by UQ1-TR and UQ1-TB, then CSG-TB was ranked 3, UQ1-TR was ranked 2 and UQ1-TB was ranked 1. Table 8.10 gives the rank-totals for each editor for these questions. The Friedman two-way analysis of variance for ranks [Sie56, pages 166–172] was used to test the null hypothesis that the editors did not differ in overall ranking. Friedman's statistic, χ²r, as well as the exact probability associated with the statistic, is reported. Although there was no significant difference in overall ranking between editors for any of the questions, we would concede that with sample sizes this small the analysis of ranks lacks power.
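For N subjects each ranking k treatments, Friedman's statistic can be computed directly from the rank totals Rj as χ²r = [12/(Nk(k+1))] ΣRj² − 3N(k+1). The sketch below, added for illustration, reproduces the values reported in table 8.10 below from its rank totals.

    # Friedman's chi-squared statistic from rank totals (N subjects, k editors).
    def friedman_chi2r(rank_totals, n_subjects):
        k = len(rank_totals)
        return (12.0 / (n_subjects * k * (k + 1))) * sum(r * r for r in rank_totals) \
               - 3.0 * n_subjects * (k + 1)

    print(friedman_chi2r([8, 9, 13], 5))       # speed:      2.8
    print(friedman_chi2r([7.5, 10.5, 12], 5))  # accuracy:   2.1
    print(friedman_chi2r([8, 8, 14], 5))       # preference: 4.8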

Table 8.10: Rank totals over five subjects, comparing perceptions and preferences of those subjects.

Question    UQ1-TR  UQ1-TB  CSG-TB   χ²r  Significance
Speed          8       9      13     2.8    P = 0.37
Accuracy       7.5    10.5    12     2.1    P = 0.43
Preference     8       8      14     4.8    P = 0.12

It was also instructive to consider highest rankings. On the question of performance speed, four out of five of the subjects perceived that they were fastest with CSG-TB (the other chose UQ1-TR). Performance accuracy was more even, with three considering they were most accurate when using CSG-TB and one each for the other two editors. Editor preference matched performance speed, with four subjects preferring CSG-TB and only one UQ1-TR.

As well as ranking the editors, the subjects made comments about these parameters: speed, accuracy and preference. Tree-building reportedly had several advantages over text-recognition. The menu of available construct templates was considered useful because it seemed relatively fast as an insertion mechanism, and the menu itself served to continually remind the user of the constructs available
at any particular time. On the other hand, the subjects noted that tree-building involved more mouse and keyboard interchanges, hindering speed of use but not affecting accuracy.

Both UQ editors had one feature that frustrated most subjects: simple textual changes required a complex set of actions due to the bi-modal editing style. CSG-TB was modeless with respect to simple text changes, such as altering the name of an identifier. The modality of the UQ editors has already been discussed (see section 7.8) but, although the subjects preferred the modeless style, this aspect is not directly related to editing paradigm per se.

Some subjects also had difficulty with the formatted presentation of documents using the UQ editors, particularly when engaged in program maintenance. The view of a document seemed to change appreciably in such instances, and it was not until insertion was complete (as signified by using the ESC key) that the text was reformatted. The exact state of the document was in question, and some found this aspect confusing. In contrast, with CSG-TB the user effectively filled in the blank spaces within fixed-format language templates, and at all times the overall structure of the document was evident.

The other five questions that the subjects answered were framed to assist them in their task of reporting on the experiment itself. Most relevant to this experiment was their perception of the conditions under which the experiment was conducted. All subjects reported that they were comfortable with the setup, noting that the room was quiet, there were no distractions or interruptions, and the seating arrangements and position of equipment were good. The use of video to record events was an initial concern of some subjects, but when it was made clear that only the computer screen was to be included on the video these concerns disappeared. The presence of the experimenter was not considered a problem.

8.4 Conclusions

In general, the subjects preferred the CSG editor over the UQ editors (with either paradigm implemented) and estimated that their speed and accuracy with the CSG editor were best when comparing the three paradigms or editors. However, this perception of the users that they performed best with a particular paradigm did not
reflect the reality of the situation. There appeared to be few statistically significant differences between the usability parameters used in this experiment to compare language-based editing paradigms.

User reaction to the different paradigms was of importance, but complex languages, such as the specification language Z, may prove better for measuring the effectiveness and efficiency of language-based editing paradigms than typical imperative programming languages such as Pascal. The richness of a language such as Z almost dictates that the user must have some assistance in the preparation of documents. This is readily provided via menus of selectable constructs, operators and lexicals, but the best structure for these menus is far from clear at this stage.

There were a number of problems generated by the experiment itself. We only had a small number of subjects (five), but each subject used all editors, so this effectively equated to 15 experimental units. The randomisation provided by the change-over design ensured that any residual effects of one treatment on another were negligible, but nevertheless we were not able to adequately measure these residual effects. Even so, we are unsatisfied, and it would be useful to replicate this experiment at a later stage to confirm the results obtained.

Our experimental subjects were not practising software engineers but final-year honours computer science students who, although highly intelligent, had little or no industry experience. The question must be asked: can we extrapolate the results from this experiment, with its fairly small sample of students, to software engineers in general? The answer is probably no, but such experimentation and analysis provide estimates of variability suitable for calculating sample sizes that are able to give more powerful tests of treatment differences for future experimentation. As such tools become used by a wider range of professionals, we may be able to conduct the experiments with more experienced users.

Our tasks were initially chosen for the purpose of modelling editing tasks using the KLM. Certainly the tasks were typical of the types of editing tasks undertaken by software engineers, but they were `out of context', and studying long-term use by software engineers of such products could be a useful complementary activity. However, such `situated' research is expensive and there are many difficulties, in particular the lack of control and availability of appropriate subjects. In this experiment all subjects were comfortable with the experimental setup and none felt that the situation was artificial in any sense, although some commented on the relatively simple editing tasks they were required to perform.
The experimental procedure and apparatus were not perfect either. Our video-recording of a computer screen was not particularly satisfactory. Firstly, the resolution was relatively poor, and we needed to view the replay of the tape at close range to distinguish various actions and movements. We needed to exercise due care in this monitoring process to ensure that we matched the video record and data log; this often necessitated several replays of the one short segment of video-tape. Secondly, the video-camera and computer screen were not synchronised, which produced a rolling effect on the play-back device when reviewing the tape. Both these problems made data capture a long, laborious task, and both could have been alleviated (to some extent at least) by feeding video output directly from the computer to the video-camera in the first place, an option we discovered too late. Instrumentation of the editors, rather than monitoring the whole X Window System session using an event logger, could have facilitated better and easier data extraction. However, such a luxury is not always possible in product evaluation unless source code is available, and even then there are likely to be many difficulties in the instrumentation process itself.

8.5 Summary

The usability experiment provided valuable new data on the comparison of language-based editing paradigms. Although limited to some extent by sample size, the experiment showed little advantage to either tree-building or text-recognition, and indicated that some hybrid of the two was most appropriate. Users perceived that they performed better with tree-building, but the timings obtained did not support this.

Chapter 9

Validation of Keystroke-Level Models

In chapter 7 we built Keystroke-Level Models, and in chapter 8 we conducted an empirical usability experiment to evaluate editing paradigms. By utilising the same tasks in the usability experiment as in the predictive modelling exercise, we are able to compare results from both evaluations and effectively validate the overall KLM estimates and specific operator values involved. The placement and estimation of the memory operator is of particular interest. To convince ourselves that predictive modelling was worthwhile to apply in our design domain we validated our models; we report that validation in this chapter.

9.1 Overall Keystroke-Level Model Validation

The structure and conduct of the usability experiment meant that we were able not only to examine usability issues related to the editors and editing paradigms but also to perform a validation of the theoretical models already proposed in chapter 7. In table 9.1 we present data generated by differencing overall task completion times and task error times and averaging over the subjects. These data are comparable to the KLM data given in tables 7.12 and 7.14 of chapter 7, and summarised in table 9.2.
Table 9.1: Empirically measured times for program input and maintenance tasks used in the KLM study.

Input Task                    UQ1-TR  UQ1-TB  CSG-TB
Check–PROCEDURE                23.60   35.90   29.52
AddNumbers–PROCEDURE           17.24   21.95   24.56
AddNumbers–FOR                 28.81   36.60   31.75
StoreCount–CASE                56.44   59.90   56.59
Random–FUNCTION                14.38   15.38   29.19
ComputeChange–IF-THEN-ELSE     14.17   19.75   23.18
CheckInput–REPEAT-UNTIL        41.67   44.56   42.06
IterativeSum–WHILE-DO          32.38   36.76   26.60
Count–PROCEDURE                18.66   30.11   18.15

Maintenance Task              UQ1-TR  UQ1-TB  CSG-TB
WHILE-DO change                13.15   17.44   31.07
IF-THEN change                 13.97   13.36    1.39
AddNumbers–insert statement    20.94   17.28   19.51
AddNumbers–remove statement     4.64    3.08    5.82
AddNumbers–reverse loop        14.23   18.82   14.46
AddNumbers–alter var. name     19.11   18.95   26.34
ComputeChange–remove IF        13.84   13.61   27.17
ComputeChange–insert IF        15.20   24.09   35.16
ComputeChange–alter IF          7.35    5.15    6.29
IterativeSum–remove WHILE      17.27   11.42   19.58
IterativeSum–insert WHILE      26.33   36.18   24.67

Several methods were available to compare predicted and empirical results. The two used were percentage absolute error and correlation/regression. The figures in parentheses in table 9.2 represent percentage absolute errors for each task and editor; each of these values was calculated as

    100 × |predicted − empirical| / empirical.

Averages for these errors were calculated for each editor: UQ1-TR 17.7%, UQ1-TB 23.4%, CSG-TB 26.8% (CSG-TB 17.9% if the large value for the IF-THEN change is omitted).

Correlation analyses between predicted and empirical times for individual subjects produced highly related results. For each editor there was a highly significant correlation (P < 0.01) between predicted execution times (KLM) and actual execution times. Linear regression relationships also reflected this high correlation, with a linear relationship between predicted times (dependent variable) and actual times (independent variable) accounting for between 67.1% and 79.2% of the overall variation (table 9.3). Figure 9.1 displays the linear relationships between predicted and actual execution times for all three editors. The column SE Slope is a measure of the accuracy of the estimated value of the slope of the regression line (the Slope column); it is calculated as √(Error/Σ(x_i − x̄)²), where the x_i are the KLM estimates for all tasks for an editor and x̄ is their mean. (The same description of SE Slope applies to table 9.15.)
Table 9.2: KLM analysis time estimates and, in parentheses, percentage absolute error of these compared with empirical data for program input and maintenance tasks.

Input Task                    UQ1-TR        UQ1-TB        CSG-TB
Check–PROCEDURE               17.49 (25.9)  20.51 (42.9)  20.24  (31.4)
AddNumbers–PROCEDURE          17.36  (0.7)  17.69 (19.4)  21.21  (13.6)
AddNumbers–FOR                23.98 (16.8)  29.70 (18.9)  29.59   (6.8)
StoreCount–CASE               40.22 (28.7)  50.05 (16.4)  50.77  (10.3)
Random–FUNCTION               12.60 (12.4)  13.21 (14.1)  18.36  (37.1)
ComputeChange–IF-THEN-ELSE    12.83  (9.5)  15.63 (20.9)  16.70  (28.0)
CheckInput–REPEAT-UNTIL       33.55 (19.5)  34.72 (22.1)  37.70  (10.4)
IterativeSum–WHILE-DO         25.66 (20.8)  28.96 (21.2)  30.03  (12.9)
Count–PROCEDURE               17.49  (6.3)  20.51 (31.9)  20.24  (11.5)

Maintenance Task              UQ1-TR        UQ1-TB        CSG-TB
WHILE-DO change               14.36  (9.2)  16.53  (5.2)  32.39   (4.2)
IF-THEN change                14.64  (4.8)  15.97 (19.5)   4.11 (195.7)
AddNumbers–insert statement   15.15 (27.7)  15.15 (12.3)  15.12  (22.5)
AddNumbers–remove statement    4.11 (11.4)   4.11 (33.4)   5.05  (13.2)
AddNumbers–reverse loop       11.84 (16.8)  11.84 (37.1)  19.00  (31.4)
AddNumbers–alter var. name    21.78 (14.0)  21.78 (14.9)  18.87  (28.4)
ComputeChange–remove IF       10.28 (25.7)  10.28 (24.5)  29.03   (6.8)
ComputeChange–insert IF       19.63 (29.1)  22.85  (5.1)  29.54  (16.0)
ComputeChange–alter IF         7.96  (8.3)   7.96 (54.6)   4.69  (25.4)
IterativeSum–remove WHILE      9.32 (46.0)   9.32 (18.4)  18.28   (6.6)
IterativeSum–insert WHILE     21.03 (20.1)  23.81 (34.2)  19.04  (22.8)
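The bracketed figures are straightforward to reproduce. The sketch below, added for illustration, recomputes the Check–PROCEDURE entry for UQ1-TR from the prediction in table 7.12 and the measurement in table 9.1.

    # Percentage absolute error between a KLM prediction and a measurement.
    def pct_abs_error(predicted, empirical):
        return 100.0 * abs(predicted - empirical) / empirical

    # KLM prediction (table 7.12) versus measured time (table 9.1), UQ1-TR:
    print(round(pct_abs_error(17.49, 23.60), 1))   # 25.9, as in table 9.2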

Table 9.3: Correlation and regression results for comparing KLM predicted and actual times.

Editor   Correlation      Linear Regression
         Coefficient (r)  Intercept  Slope  SE Slope
UQ1-TR      0.819            7.05    0.509   0.036
UQ1-TB      0.890            4.63    0.602   0.035
CSG-TB      0.845            5.02    0.691   0.047

The three regression equations indicate that actual execution times are marginally greater than predicted values. This result is not as obvious from the other analysis, percentage absolute error, since there the direction of any error is not available.



[Figure 9.1: Linear relationships between KLM predicted and actual execution times for three editors. The plot shows KLM predicted execution time (vertical axis, 0–50sec) against empirically evaluated execution time (horizontal axis, 0–60sec), with one fitted line each for UQ1-TR, UQ1-TB and CSG-TB.]

Thus, although there is a reasonable correlation between empirical data and KLM predicted data, there appears to be a systematic difference as well. For example, for an actual execution time of 20sec, the KLM predicted times for each editor are: UQ1-TR 17.23, UQ1-TB 16.67, CSG-TB 18.84. Both analyses, average percentage absolute error and correlation/regression, indicate that model-predicted values represent close to 80% of empirical data values, which accords with other literature [CMN80, OO90, PJ92, GJA93].
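The quoted values at 20sec follow directly from the fitted lines of table 9.3. The sketch below, added for illustration, simply evaluates each regression line at that point.

    # Evaluate the fitted regression lines of table 9.3 at 20 seconds.
    lines = {  # editor: (intercept, slope), from table 9.3
        "UQ1-TR": (7.05, 0.509),
        "UQ1-TB": (4.63, 0.602),
        "CSG-TB": (5.02, 0.691),
    }
    for editor, (a, b) in lines.items():
        print(editor, round(a + b * 20, 2))   # 17.23, 16.67, 18.84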

9.2 Data Collection for Parameter Estimation

All key press, mouse button press and mouse button release events and their timings were recorded for each subject and for each editor. The KLM predicted equivalent actions in each of the tasks analysed, so timings for model predictions were compared with actual timings.


As already indicated, the principal concern with the KLM was the estimate of time for the memory operator, M. For all instances with the KLM this parameter was predicted to be 1.35sec. With each task and editor there were events that were predicted to include this memory operator. There were four instances of such events.

1. For the editors UQ1-TR and UQ1-TB an M operator was assumed to precede acceptance of downstream symbols using the two-keystroke sequence CTRL a. Such events were modelled as M2K, and all were extracted from the computer-based log of events for each editor and subject.

2. Single keystroke sequences that were predicted to be preceded by an M operator were evident in all three editors. In both UQ editors the ESC key was used to exit from insertion mode and return to navigation mode. The CSG-TB editor used the ENTER or RETURN key to finalise statements input by the subject and provide template holes for filling by the user. Such events were modelled by MK.

3. All three editors included event sequences that required subjects to use the mouse to point to items for selection purposes. Items included editing commands, such as insert, delete and change for the UQ editors, and language command menu items, such as ifthen and while for the UQ1-TB and CSG-TB editors. The KLM placed Ms in front of all such events, which were modelled as MPK.

4. Editing operations involving one editor, CSG-TB, required subjects to use a pull-down menu to select the editing operations cut, copy and paste. Each event here involved pointing to a menu bar, holding down the left mouse button, drawing to the required editing operation (cut, copy or paste) and releasing the mouse button. This was modelled as MPD (D for draw). An alternative would have been to model this event as MPK press PK release, since it was possible to use the menu in this way. However, actual use of the editor by subjects indicated that this was not the preferred method, and MPD more closely simulated their actions.


To estimate memory operator times for these event types we also needed estimates of times for K, PK and PD. Estimates for K may be found by using the computer-based log of all keystrokes not involving or biased by memory operators. Unfortunately, estimates for the other two parameter combinations were more problematic. It was possible to estimate the values of P and D by counting video frames, but the main difficulty was in knowing when to start the timing for each parameter, that is, when the M effect had stopped and the other parameter started. We chose, therefore, to consider the events in toto rather than attempt to split them into component parts.

9.2.1 Hypotheses Related to KLM Validation

There were three principal aims for this analysis. Two related to hypotheses relevant to the types of event listed above; the third involved obtaining estimates of timings for the events. Thus for each event there were up to two null hypotheses concerning the timing measurements associated with those events, namely:

1. There are no differences between editors.
2. There are no differences between subjects.

The second hypothesis related to the variability of performance of the subjects, a commonly cited problem [KL94b]. When null hypotheses were rejected, estimates of timings needed to be calculated for the individual treatments or subjects. If a hypothesis was not rejected, then an estimate was based on all available data across treatments, subjects or both.

9.2.2 Statistical Analysis

The data were analysed by analysis of variance. There were one, two or three treatments (editors) depending on the type of event (M2K, MK, MPK, MPD), with up to five subjects and a variable number of instances of events (samples) within each combination of treatment × subject. In experimental designs with unequal subclass numbers (samples) the method of analysis involves the fitting of general linear models using statistical software such as GLIM [Nel85]. Fitting these
models is an iterative process and, although there is no technical difficulty in fitting them, their interpretation is complex.

9.2.3 Results for M2K

Table 9.4 shows the analysis of variance for the timing data pertaining to M2K events.

Table 9.4: Analysis of variance table for M2K events.

Source of   Degrees of  Sums of  Mean     F Test  Significance
Variation   Freedom     Squares  Squares
Treatments    1           9.403   9.403    2.06    P > 0.05
Subjects      4           5.374   1.343    0.29    P > 0.05
Error         2           9.133   4.567
Sampling    140         147.6
Total       147         168.7

The analysis indicated no significant effects of either editors or subjects. It was worthy of note, however, that for one subject no data were available (through an accident of data collection) and for another subject there were no data for this event type. Table 9.5 shows the mean times for M2K events for each editor and subject. The second subject was the only one recording an appreciable time difference between editors.

Table 9.5: Mean times for M2K events.

Subject  UQ1-TR  UQ1-TB
1         1.971     -
2         1.307   2.581
3           -     1.691
4         1.948   2.213
5         1.736   1.784

The overall mean time for M2K events of 1.971sec compared very favourably with the KLM predicted value of 1.91sec.

9.2.4 Results for MK

Table 9.6 shows the analysis of variance for the timing data pertaining to MK events.


Table 9.6: Analysis of variance table for MK events.

Source of   Degrees of  Sums of  Mean     F Test  Significance
Variation   Freedom     Squares  Squares
Treatments    2           8.102   4.051    4.76    P < 0.05
Subjects      4          15.65    3.913    4.60    P < 0.05
Error         7           5.957   0.851
Sampling    226         151.7
Total       239         180.5

The analysis indicated significant differences between editors and between subjects. Table 9.7 shows the mean times for MK events for each editor, while table 9.8 gives the means for subjects.

Table 9.7: Mean times for MK events for editors.

Editor   Mean
UQ1-TR   1.372
UQ1-TB   1.317
CSG-TB   0.976

Table 9.8: Mean times for MK events for subjects, and individuals' typing speeds.

Subject  Mean   Typing Speed
1        0.882     0.161
2        1.261     0.189
3        1.174     0.216
4        1.555     0.319
5        0.835     0.273

For each editor the mean time for an MK event was less than the KLM predicted value of 1.63sec. The timing for CSG-TB events of this type was significantly less than for events from either of the UQ editors; timings for the UQ editors were similar. There was variability between subjects also. In the CSG-TB editor the event related to use of the RETURN key, which is a relatively large key compared to the ESC key, the UQ1 key represented by this event type. Not only is the RETURN key more regularly used, but its proximity to all the other keys, compared with the ESC key, may have been a factor.


There was a positive correlation (r = 0.479) between the time subjects took for the event and their average typing speed, but this was not significant. An alternative explanation for the subject differences is simply inherent subject variability for this event type across all editors.

9.2.5 Results for MPK

Table 9.9 shows the analysis of variance for the timing data pertaining to MPK events.

Table 9.9: Analysis of variance table for MPK events.

Source of   Degrees of  Sums of  Mean     F Test  Significance
Variation   Freedom     Squares  Squares
Treatments    2          165.0    82.50    5.87    P < 0.05
Subjects      4          9.240    2.310    0.16    P > 0.05
Error         7          98.38    14.05
Sampling    304          2735.    8.998
Total       317          3004.

The analysis indicated a significant difference between editors but not between subjects. Table 9.10 shows the mean times for MPK events for each editor.

Table 9.10: Mean times for MPK events for editors.

Editor   Mean
UQ1-TR   2.592
UQ1-TB   2.868
CSG-TB   4.320

For each UQ editor the mean time for an MPK event was comparable with the KLM predicted value of 2.73sec. The timing for CSG-TB events of this type was significantly greater (P < 0.01) than for events from either of the UQ editors; the two UQ editors were similar. The longer time for CSG-TB events may have been related to the structure of the menu from which users selected items. It was essentially horizontal and, depending on the context, there were potentially many choices from which to select. For similar contexts, the UQ1-TB menu was a vertical layout, but the same number of choices would have been evident.


9.2.6 Results for MPD

Table 9.11 shows the analysis of variance for the timing data pertaining to MPD events, which occur only in the CSG-TB editor. In this analysis the only error term available to test the effect of subject was the sampling error, so no significance testing was done.

Table 9.11: Analysis of variance table for MPD events.

Source of       Degrees of  Sums of  Mean
Variation       Freedom     Squares  Squares
Subjects          3          12.55    4.185
Sampling Error   73          67.98    0.931
Total            76          80.54

Table 9.12 shows the mean times for MPD events for each subject for which events were recorded. There is an indication of a difference between subjects. In all cases, subjects recorded times less than the KLM predicted value of 3.67sec (assuming an average 2cm distance traversal for the mouse cursor before lifting the left mouse button).

Table 9.12: Mean times for MPD events for subjects.

Subject  Mean
2        3.051
3        3.404
4        2.323
5        2.826

9.3 Conclusions

This analysis indicates that the KLM estimates of time to execute tasks are reasonably accurate. KLM predicted execution times account for about 80% of actual, empirically measured execution times. However, we did find some differences between predicted and empirically measured values for some events involving memory operators. Table 9.13 shows a summary of the predicted and empirically determined timings for the KLM events analysed here. These values could be used as alternatives to those supplied by [CMN80] when comparing design options for software development tools of the type described here.
Table 9.13: Comparison of actual and predicted times (sec) for KLM events including M2K, MK, MPK and MPD.

KLM      Actual                        Predicted
Event    UQ1-TR   UQ1-TB   CSG-TB     KLM
M2K      1.971    1.971    -          1.91
MK       1.372    1.317    0.976      1.63
MPK      2.592    2.868    4.320      2.73
MPD      -        -        2.774      3.67

Table 9.14: KLM estimates using empirically determined values of M2K, MK, MPK, MPD and, in parentheses, percentage absolute error of these compared with empirical data for program input and maintenance tasks.

Input Task                    UQ1-TR         UQ1-TB         CSG-TB
Check-PROCEDURE               17.67 (25.1)   20.97 (41.6)   22.11 (25.1)
AddNumbers-PROCEDURE          17.36 (0.7)    17.83 (18.8)   21.49 (12.5)
AddNumbers-FOR                24.10 (16.3)   30.22 (17.4)   29.44 (7.3)
StoreCount-CASE               40.34 (28.5)   50.25 (16.1)   52.36 (7.5)
Random-FUNCTION               12.60 (12.4)   13.35 (13.2)   17.33 (40.6)
ComputeChange-IF-THEN-ELSE    12.89 (9.0)    15.89 (19.5)   16.33 (29.6)
CheckInput-REPEAT-UNTIL       33.61 (19.3)   34.92 (21.6)   37.33 (11.2)
IterativeSum-WHILE-DO         25.78 (20.4)   29.36 (20.1)   31.90 (19.9)
Count-PROCEDURE               17.67 (5.3)    20.97 (30.4)   22.11 (21.8)

Maintenance Task              UQ1-TR         UQ1-TB         CSG-TB
WHILE-DO change               13.57 (3.2)    16.32 (6.4)    37.67 (21.2)
IF-THEN change                14.17 (1.4)    15.76 (18.0)   5.70 (310.1)
AddNumbers-insert statement   15.01 (28.3)   15.29 (11.5)   13.81 (29.2)
AddNumbers-remove statement   3.97 (14.4)    4.25 (38.0)    5.53 (5.0)
AddNumbers-reverse loop       11.44 (19.6)   11.67 (38.0)   19.58 (34.4)
AddNumbers-alter var. name    21.38 (11.9)   21.61 (14.0)   18.33 (39.9)
ComputeChange-remove IF       10.00 (27.7)   10.56 (22.4)   29.82 (9.8)
ComputeChange-insert IF       18.90 (24.3)   22.84 (5.2)    32.09 (8.7)
ComputeChange-alter IF        7.56 (2.9)     7.92 (53.8)    4.04 (35.8)
IterativeSum-remove WHILE     9.04 (47.7)    9.60 (15.9)    19.98 (2.0)
IterativeSum-insert WHILE     20.30 (22.9)   23.80 (34.2)   20.78 (15.8)

Using these empirically determined time estimates of KLM events we predicted a new set of KLM time estimates for our tasks. Table 9.14 shows these new predictions and the corresponding calculations of absolute percentage error.
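The arithmetic behind table 9.14 is straightforward. The following is a minimal sketch using the empirically determined UQ1-TR operator times from table 9.13, together with a standard KLM keystroke time; the operator counts and the actual task time in the example are hypothetical, whereas the thesis derives the real counts from each task's method encoding.

# Empirically determined operator times (sec) for UQ1-TR from table 9.13,
# plus a standard KLM keystroke time for an average typist from [CMN80].
OPERATOR_TIMES = {
    "M2K": 1.971,   # two mental operators followed by a keystroke
    "MK":  1.372,   # mental operator followed by a keystroke
    "MPK": 2.592,   # mental operator, point, then keystroke/button press
    "K":   0.28,    # plain keystroke, average typist
}

def klm_estimate(operator_counts):
    """Sum the operator occurrences weighted by their unit times."""
    return sum(OPERATOR_TIMES[op] * n for op, n in operator_counts.items())

def pct_abs_error(predicted, actual):
    """Percentage absolute error, as reported in parentheses in table 9.14."""
    return abs(predicted - actual) / actual * 100

# Hypothetical encoding of a small input task: 2 MK events and 53 keystrokes.
task_counts = {"MK": 2, "K": 53}
predicted = klm_estimate(task_counts)
print(f"{predicted:.2f} sec, {pct_abs_error(predicted, 21.5):.1f}% error")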


Average errors for the individual editors are now: UQ1-TR 17.1%, UQ1-TB 22.8%, CSG-TB 34.4% (CSG-TB 19.9% if the large value for the IF-THEN change task is omitted). These figures are almost identical to the averages calculated from table 9.2. In table 9.15 we show the new correlation coefficients and regression equations obtained after re-estimating KLM data for our tasks and comparing with actual times. While no significant improvement in overall correlation is noticed, there has not been any reduction in association between predicted and actual times either. On this basis the original KLM estimates from [CMN80] appear to be reasonable. Further improvements to the predictions might be obtained by considering alternative estimates of P based on Fitts' Law calculations, since the screen layout of each of the editors is known. Improvements might also be noticed if actual rather than estimated values of users' typing speeds were included. Neither of these has been considered at this stage, since our principal concern was with the estimates and placement of M parameters, though both may be considered in future work.

Table 9.15: Correlation and regression results for comparing new KLM predicted and actual times.

Editor   Correlation       Linear Regression
         Coefficient (r)   Intercept   Slope   SE Slope
UQ1-TR   0.823             6.69        0.517   0.036
UQ1-TB   0.894             4.63        0.608   0.035
CSG-TB   0.828             5.45        0.707   0.052

The KLM assumes that users perform their tasks in as efficient a manner as possible, including the selection of optimal methods where there are alternatives. In practice, users select methods with which they are familiar, which they think are best (in some context), or which are part of their minimal set of methods. It is in situations such as this that other models such as GOMS [CMN83] and CCT [KP85] may be useful, but even these cannot handle error scenarios adequately, so experimentation and usability studies are still needed to compare design options where errors are a major concern.
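The quantities in table 9.15, and the Fitts' Law refinement of the P operator suggested above, both reduce to short calculations. The sketch below shows both; the task-time data are hypothetical placeholders, and the coefficients in the pointing-time formula are illustrative assumptions to be calibrated for each device, not values taken from [CMN80].

import math

def linear_fit(xs, ys):
    """Least-squares fit ys ~ intercept + slope*xs, returning the correlation
    coefficient r, intercept, slope and standard error of the slope: the
    quantities reported in table 9.15."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    residual_ss = syy - slope * sxy          # sum of squares about the line
    se_slope = math.sqrt(residual_ss / (n - 2) / sxx)
    return r, intercept, slope, se_slope

def pointing_time(distance, target_size, a=0.8, b=0.1):
    """Welford-style Fitts' Law estimate (sec) for the P operator;
    a and b are assumed placeholder coefficients."""
    return a + b * math.log2(distance / target_size + 0.5)

# Hypothetical predicted and actual task times (sec).
predicted = [17.7, 17.4, 24.1, 40.3, 12.6, 33.6]
actual = [22.1, 17.2, 28.0, 51.4, 14.2, 41.7]
print(linear_fit(predicted, actual))
print(pointing_time(distance=8.0, target_size=1.0))  # 8 cm move, 1 cm target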


9.4 Summary

GOMS/KLM is the one `real' engineering approach to user interface evaluation currently available to software engineers but, compared with other engineering disciplines, it is immature and our understanding of its application in many design situations is at an early stage of development. The validation of the KLM data showed that even simple models can be useful in providing evidence to assist design choices. We applied the KLM analysis to a complex tool in a domain where such evaluation is rare and, although we had reservations about some aspects, our validation showed that the technique was as accurate in this domain as in many others. Thus, we conclude that such models can play a useful role in user interface design for software development tools and provide a reasonable basis for decisions on design choices.

Chapter 10

Need for Further Work

The previous three chapters presented experiments with two conceptual issues relevant to the design of software tools. In this chapter we examine two more such issues requiring similar examination in the future: detail suppression and multilingual documents. This is not meant to imply that these are the only two issues relevant to software development tools left to investigate. Our purpose is to demonstrate that the techniques already examined are applicable to many other issues and to illustrate the difficulties involved in doing so.

10.1 Detail Suppression

One of the principal tools needed in developing or maintaining computer programs is a program display tool. Such a tool enables a programmer to read a program either to understand what it does or to fix errors or make enhancements. All computer displays are limited in size, so showing a program's textual (or diagrammatic) representation in its entirety is impossible (except for a trivial program). Even if it were possible to show large program representations in full, such an expanse of material would be incomprehensible to the programmer. The most common solution to this problem is to allow the user to scroll the representation within the available display space, which for graphical user interfaces is a window. Scrolling basically comes for free as part of the windowing software. Alternative presentation mechanisms are, however, available and may be more appropriate in this domain. Automatic formatting of program text would seem to be a desirable feature of any program editor.


Indeed, in chapter 5 our guidelines evaluation suggested that structured documents should be presented to a user and manipulated in a manner appropriate to the structure of the document. Many editors provide formatted views of program text. In particular, so-called folding editors provide a means by which the user can hide or expose certain parts of a program document in the current view. Typically, these parts are functions or procedures, which constitute the major folds in programs, although folds encapsulating other levels of structure are also common. These editors allow the display and editing of a program's text based on opening and closing these folds. A review of folding techniques, folding editors and the advantages they provide to programmers can be found in [KL94a]. The concept of folding is also known by the terms elision and holophrasting.

As far as we can ascertain, scrolling and folding as program browsing mechanisms have not been compared in empirical studies. One study by Monk, Walsh and Dix [MWD88] compared scrolling with hypertext and folding with hypertext, but scrolling and folding were not explicitly considered in the analysis and there was no discussion that compared them. Nevertheless, an analysis of the summary data presented in their table 2 (re-presented in table 10.1) showed no significant difference in either the number of tasks correctly performed or the number of tasks performed per hour when comparing scrolling and folding mechanisms. Based on their description of the folding browser, its poor performance was probably attributable to the primitive nature of the browser itself rather than the technique per se.

Table 10.1: Performance data comparing hypertext, scrolling and folding program browsing mechanisms (Monk, Walsh and Dix [MWD88]).

Treatment   Tasks Correct      Tasks per hour
            Mean     SD        Mean    SD
Hypertext   13.5     0.93      49.2    13.9
Scrolling   13.2     1.20      68.1    21.0
Folding     13.1     1.10      56.7    12.9

To avoid prodigal consumption of display capacity, the UQ editors employ `adaptive formatting' as defined in [RW81]. This allows trivial occurrences of potentially large constructs to assume one-line formats, while larger occurrences adopt the minimum multi-line form consistent with their structure, extent and readability.
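To make the interplay of folding and formatting concrete, the following is a minimal sketch of a fold structure whose closed folds collapse to a one-line abbreviated form. The node shape and rendering rules are illustrative assumptions, not the algorithm used by the UQ editors.

from dataclasses import dataclass, field

@dataclass
class Fold:
    header: str                                  # e.g. "PROCEDURE WritePair(v: pair);"
    body: list = field(default_factory=list)     # plain lines and nested Folds
    open: bool = False

def render(node, indent=0):
    """Yield display lines; a closed fold collapses to its header plus an ellipsis."""
    pad = " " * indent
    if isinstance(node, str):
        yield pad + node
    elif node.open:
        yield pad + node.header
        for child in node.body:
            yield from render(child, indent + 3)
    else:
        yield pad + node.header + " ..."         # one-line abbreviated form

program = Fold("PROGRAM Example(input, output);", [
    Fold("PROCEDURE WritePair(v: pair);", ["..."]),
    Fold("BEGIN", ["WHILE NOT eof(input) DO ..."], open=True),
], open=True)

print("\n".join(render(program)))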


The combination of `nested block abbreviation' (folding) and adaptive formatting minimises the display requirements of a program block while maintaining its readability, but it cannot guarantee that the resultant view fits in a window of given size. In UQ1, this problem was solved by resorting to the text-editor solution of scrolling the viewable text within the display window. Figure 10.1 shows the effect of scrolling when the top sub-window in figure 4.3 is reduced to half its height. While such scrolling is consistent with a purely textual view of the block concerned, it is unsatisfactory when structural issues are relevant, since structurally related symbols which are textually separated by more than the window size cannot be viewed simultaneously.

   PROCEDURE WritePair(v: pair);
      ...;
   BEGIN
      WHILE NOT eof(input) DO
         BEGIN
            ReadPair(NextPair);
            FixPair(NextPair);
            WritePair(NextPair)

Figure 10.1: A scrolled view of program text.

For this reason, Broom's prototype of UQ2 offered `suppression by structural distance' (a phrase we sometimes shorten to `structural suppression') [BW86, Bro87b, BW91] as a selectable alternative to scrolling when a block view was too large for the available sub-window. This was a variation on a suppression technique described by Mikelsons [Mik81]. With the option selected, text furthest from the user's focus of attention (in some structural sense) was suppressed until the view fitted. Figure 10.2 shows the effect of structural suppression when the top sub-window in figure 4.3 is reduced to half its height, and the user's focus or highlight is on the conditional expression controlling the while-loop in the program body.

   PROGRAM Example(input, output);
   .......
   BEGIN
      WHILE NOT eof(input) DO
         BEGIN
         ...
         END
   END.

Figure 10.2: Detail suppression by structural distance.

This fisheye lens style of view has been applied extensively to graphically presented information in, for example, Spence and Apperley's `bifocal display' for displaying office information [SA82], display of subway networks [Fur86], visualisation of `linear' structures [MRC91], 3D visualisations of file systems [RMC91], navigation in hierarchically clustered telephone networks [SZB+92], geographical maps [SB92, SB94], computer-aided design drawings [MG93], and diagramming techniques in information systems [Chu95]. For a review of distortion-oriented techniques see [LA94].

There has been little empirical evaluation of fisheye techniques comparing them with traditional display mechanisms such as scrolling. The work of Furnas [Fur86] included a small usability study which did indicate that fisheye views were superior to flat views. In [HCMM89], subjects navigated around a fictional subway network undertaking various tasks using either scrolling or a fisheye view technique. From the perspective of user performance times, the fisheye view offered no advantage when both nodes could be displayed at once. Fisheye was slightly faster than scrolling when a target node was hidden (in the scrolling view). However, in complex tasks where the user was required to navigate through various nodes along several routes, scrolling was faster than fisheye. The authors attributed this to the `disorienting nature' of the fisheye view, with subjects finding the shifts in view orientation under their particular implementation of the technique `jarring'. They contemplated further research on this issue of spatial distortion, since there were no guidelines available for selecting the appropriate degree of distortion applicable to various parts of the fisheye display. Schaffer et al. [SZB+92] also reported an empirical study comparing fisheye and full-zoom techniques. Navigation tasks were completed significantly faster using the fisheye views, though some software enhancements are required to reduce the number of errors made by users (20% in the case of fisheye views).
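Furnas's fisheye formulation [Fur86] assigns each element a degree of interest, DOI(x | focus) = API(x) - D(x, focus), where API is an a priori importance (for tree-structured documents, typically the negated depth) and D is a structural distance; elements whose DOI falls below a threshold are suppressed. The sketch below illustrates the calculation; the toy structure tree and the threshold value are assumptions for illustration only.

def degree_of_interest(depth, dist_to_focus):
    """DOI = a priori importance (negated depth) minus distance to the focus."""
    return -depth - dist_to_focus

def fisheye_view(nodes, threshold=-4):
    """nodes: {name: (depth, structural distance to focus)} -> names kept in view."""
    return [name for name, (depth, dist) in nodes.items()
            if degree_of_interest(depth, dist) >= threshold]

# A toy structure tree with the user's highlight on a while-loop condition.
nodes = {
    "PROGRAM Example":          (0, 3),
    "PROCEDURE WritePair":      (1, 4),   # falls below the threshold: suppressed
    "main block BEGIN":         (1, 2),
    "WHILE condition (focus)":  (2, 0),
    "loop body statements":     (3, 1),
}
print(fisheye_view(nodes))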


Detail suppression in text-based systems suffers a similar lack of empirical research and guidance for designers. By providing scrolling and structural suppression as user-selectable alternatives in UQ2, variations in user preference were catered for, but a significant interface choice remained to be resolved within the structural suppression option. In a system that combines adaptive formatting and text compression, an important issue was whether the formatting decisions took account of text suppression in determining the layouts required. If they did, then a statement whose unsuppressed layout was

   IF condition THEN
      BEGIN
         statements
      END
   ELSE
      BEGIN
         statements
      END;

might under partial suppression take the one-line form

   IF condition THEN ... ELSE ...;

While this approach made optimum use of the available display area, it meant that the shape of a program fragment altered as the user's focus of attention changed. Such changes of shape created recognition problems for the user (noted in graphical presentation systems as well [HCMM89]), and for this reason Broom preferred a suppression strategy which suppressed only whole lines of the formatted unsuppressed view. This decision was a strictly intuitive one on the designers' part, and has not been validated. However its underlying principle, which preserves an object's shape while varying the relative levels of projected detail of its components, might be one which has been investigated in other contexts. Unfortunately, no literature reporting investigations of this aspect seems to be available.
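The difference between the two strategies can be captured in a few lines. In the sketch below, the multi-line layout, the choice of lines to keep, and the ellipsis marker are all illustrative assumptions; the point is only that line-level suppression preserves the fragment's shape where reformatting to a one-line form does not.

UNSUPPRESSED = [
    "IF condition THEN",
    "   BEGIN",
    "      statements",
    "   END",
    "ELSE",
    "   BEGIN",
    "      statements",
    "   END;",
]

def reformat_suppression():
    """Reformat to a one-line form: compact, but the fragment's shape changes."""
    return ["IF condition THEN ... ELSE ...;"]

def line_suppression(keep):
    """Suppress whole lines of the fixed layout, keeping the overall shape;
    each run of suppressed lines is shown as a single ellipsis line."""
    view, in_gap = [], False
    for i, line in enumerate(UNSUPPRESSED):
        if i in keep:
            view.append(line)
            in_gap = False
        elif not in_gap:
            view.append("   ...")
            in_gap = True
    return view

print("\n".join(reformat_suppression()))
print()
print("\n".join(line_suppression(keep={0, 4, 7})))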


10.1.1 A Possible Experiment

As already indicated, Broom's prototype of UQ2 offered structural suppression as a selectable alternative to scrolling when viewing a given context. Some contexts could include elided sub-contexts (and some at the bottom of a hierarchy obviously will not), so both options allow us to examine the issue of folding also. Since UQ2 is a generic language-based editor, we have the opportunity to examine not only the use of formal languages such as programming languages, but also simple text-based structured documents such as this thesis. Because of the editor's generic nature we can select and use subjects from a wide variety of backgrounds, although we would want to make our comparisons within certain groups. For example, we would stratify the potential subjects based on the type of structured documents they were able to handle. One obvious grouping is software engineers, but in this experiment we do not believe that we are as restricted as we were in the experiments of chapter 8. There we required experience with the editor as a prerequisite for participation; here only limited experience should be required, since the tasks involved should not demand an intimate knowledge of the editor. Basic navigation and highlight-setting skills are all that is required.

Users are familiar with the concept of scrolling, as it is common in graphical user interfaces and indeed in some character-based systems, but they are less likely to be familiar with the concept of structural suppression. Thus, subjects may be biased by previous experience. This complicates the choice of experimental tasks. For some tasks scrolling is naturally favoured, while for others structural suppression would be better. Scrolling may be the better mechanism when the user wants to find the number of occurrences of a variable in a construct or when searching for a particular object. Structural suppression will almost certainly be better where there is a major spatial separation between the items of interest, since structural suppression, by its nature, allows them to be viewed in the one window. These differences and the need to take account of them are consistent with the findings in [HCMM89].

We are interested in the relative efficiency of the treatments. In both cases the user is interested in viewing and/or editing a relatively large structured document.


The experiment should examine the comprehensibility of documents presented using all methods. On a quantitative level we can set tasks (a non-trivial activity in itself) and compare completion times. Many program comprehension studies (for example [SSSW86, Wie86, Pen87, MWD88, CW91]) and text comprehension studies (for example [ERL+89, CS90]) have already been undertaken, and activities based on these or similar ones may be suitable here. As well, the user should find the view of the document appropriate for the tasks, so we need to query the user about this. In all treatments the shape of the displayed text changes as it is scrolled or as the user alters their focus of attention, but the extent to which this affects the user's ability to read and comprehend program code is unknown and needs to be evaluated.

10.2 Multilingual Documents

Good software does not just consist of well-formed programs in appropriate programming languages; it also involves specification, design and implementation documents, which provide all the information necessary to the understanding and maintenance of the programs concerned. In its simplest form, software documentation involves the integration of program text with informal natural language material, but in more rigorous development methods it may involve the combination of formal material in several languages (a specification language, a design language and a programming language), all accompanied by appropriate informal text.

The most obvious example of a program preparation system designed to integrate program and informal material was Knuth's WEB system [Knu84]. `Literate programs' produced via WEB consisted of a narrative commentary which presented program design decisions in a logical sequence, with relevant program fragments embedded at appropriate points. The input prepared for WEB reflected this narrative sequence. To obtain a compilable program from the input representation, a utility was invoked which extracted and rearranged the embedded program fragments to form a compilable text which obeyed the syntax rules of the programming language concerned.

Few empirical studies of the effectiveness of the literate programming style have been undertaken even though the WEB system is over a decade old.


Bertholf [Ber93] studied novice and intermediate programmers to determine the comprehensibility of literate programs compared to traditional modular programs, and found that the literate programming style improved comprehension appreciably.

From the user's point of view, a WEB document had a simple conceptual model: it was a sequential interleaving of commentary and program text. Its major strength was that it permitted an arbitrary grain and sequence of presentation to be used in presenting a program design. This was particularly important with languages which were restrictive in the order in which definitions and declarations must occur. In other respects, however, the WEB model had some disadvantages (a sketch of the extraction step just described follows the list):

- The rearrangement of program fragments needed to produce a compilable program created an inconsistency between the user's view of the program (the narrative presentation) and its executable semantic structure. When the user needed to think in terms of this semantic structure, as at debug time, a considerable cognitive load arose.

- By its nature, the WEB representation of a program was not well-suited to exploiting on-line hierarchic abstraction facilities such as those we described earlier for nested blocks.
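The extraction-and-rearrangement step works by recursively expanding named fragment references until only programming-language text remains. The sketch below uses a noweb-style <<name>> reference syntax rather than Knuth's exact WEB markup, and the fragment contents are invented for illustration.

import re

# Named program fragments as they might appear, out of order, in a narrative.
FRAGMENTS = {
    "main program": "BEGIN\n<<process all pairs>>\nEND.",
    "process all pairs": "WHILE NOT eof(input) DO\n   <<handle one pair>>",
    "handle one pair": "BEGIN ReadPair(p); FixPair(p); WritePair(p) END",
}

REFERENCE = re.compile(r"<<(.+?)>>")

def tangle(name, fragments):
    """Recursively substitute fragment references to yield compilable text."""
    return REFERENCE.sub(lambda m: tangle(m.group(1), fragments),
                         fragments[name])

print(tangle("main program", FRAGMENTS))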

Several editing tools have been developed to exploit the literate programming paradigm while dealing with these problems in various ways. For a recent review of some typical tools see [Ziv94, st95]. Many of these tools were based on a hypertext architecture and, although there have been usability studies of hypertext, there have been no evaluations of the usability of hypertext for software development. Indeed, it has been argued that the hypertext architecture is `too complex and verbose', and another approach was a single integrated system which handled all document types [Mun94].

UQ2 provides a document structure in which inherent hierarchic structure (either of formal material such as the program code itself or of the narrative text in which it is embedded) can be used as the framework for document organisation. It does so by generalising the concept of hierarchic block-oriented navigation and display described earlier to a hierarchic context structure.


This context structure may be generated either implicitly, by formal language constructs such as program blocks, or explicitly, by user command within informal text. A context is the unit of on-screen document display, with automatic abbreviation of nested contexts, and the user may zoom and pan through this context hierarchy as described before. The significant extension is that the content of each context is a sequence of language zones, allowing an interleaving of formal material with informal text, or the juxtaposition of, say, a formal specification with the program code it describes. Any of the zones in a context may contain nested contexts if the language concerned permits them. In an on-screen context display the zones are displayed in sequence, with minimal indication of the zone boundaries. Figure 10.3 shows a UQ2 display of the outermost context of a simple program document, which consists of a text zone, a zone giving a formal specification of the program in the Z language, another text zone, a Modula-2 zone which is the outermost program block, and a final text zone.

Figure 10.3: Mixed language zones.
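The context-and-zone model just described can be expressed as a small recursive data structure. The sketch below is an illustrative assumption about its shape, not UQ2's implementation, and the document content shown is invented.

from dataclasses import dataclass, field

@dataclass
class Zone:
    language: str                                # e.g. "text", "Z", "Modula-2"
    content: list = field(default_factory=list)  # strings and nested Contexts

@dataclass
class Context:
    name: str
    zones: list = field(default_factory=list)    # an interleaving of languages

document = Context("Example document", [
    Zone("text", ["This program reads, fixes and writes pairs."]),
    Zone("Z", ["(formal specification of the program)"]),
    Zone("text", ["The implementation follows."]),
    Zone("Modula-2", ["MODULE Example;",
                      Context("outermost program block", []),
                      "END Example."]),
    Zone("text", ["Closing remarks."]),
])

# Display the zones in sequence, abbreviating nested contexts to their names.
for zone in document.zones:
    for item in zone.content:
        print(item.name + " ..." if isinstance(item, Context) else item)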


The conceptual model underlying this interface is the postulated `unambiguous tree model' of the nested context structure in a document. For the sequence of language zones that make up a context, however, the predicted user's model is much less clear. At times, users may be conscious of the zone boundaries concerned: during initial input, for example, or when considering the semantic content of specific zones. At other times, however, they may conceive a sequence of material that crosses one or more zone boundaries as a single unit and expect to manipulate it as such.

A related conceptual issue in any multilingual system is whether the user assumes that material in language A which is visually similar to some allowable sequence in language B can be used as such without explicit transliteration; the WYSIWYG principle suggests that it should. In some cases implicit transliteration seems natural, for example when the same expression can be used with equivalent meanings in a specification language and a programming language. Generally, however, provision of implicit transliteration is fraught with danger, and multilingual systems may well require the user to be aware of language differences in these situations, and to achieve explicit transliteration in some way. In this case uncertainty with respect to the user's intentions cannot readily be accommodated in the interface without requiring explicit discrimination by the user.

Extension of UQ2's command set to enable multilingual document manipulation reflects these conclusions. As indicated above, context structure may be created either implicitly by formal language structure, or explicitly by user command. Once created, the context structure of a UQ2 document is hard, and can only be altered by operations that correspond, at least implicitly, to tree-based editing. Thus the user can delete an entire context from the enclosing context level, but cannot alter the boundaries between a context and its parent, or between sibling contexts, by directly editing the boundaries as such.

In contrast, the user has complete freedom to edit zone boundaries within the current context. The highlight on which edit operations operate can be defined as any contiguous sequence of material visible in the current context, and as such may straddle one or more zone boundaries. Deleting or replacing such a highlight alters the zone structure, with the editor automatically eliminating or creating new zone boundaries as required.


This automatic adjustment of zone boundaries is consistent with the principle that no material changes language implicitly during edit operations, but the user is provided with explicit commands for language transliteration, either in situ or during copying.

10.2.1 Possible Experiments

From the preceding material, at least two experiments are possible:

- a comparison of UQ2's `integrated' approach with the `separate but related' document approach that hypertext links allow, and

- a validation of UQ2's editing constraints for integrated multilingual documents.

UQ2 provides a single-document approach to document integration as opposed to a hypertext architecture, so the first type of experiment we might propose is one comparing these two approaches and tools that implement them. For control purposes it would be useful also to include a traditional approach and tools.

Conducting a usability study of these approaches, and indeed comparing their efficiency, is likely to encounter many problems. Few programmers are familiar with the literate programming style and even fewer have experience with editing systems that implement the style. Thus we have an immediate problem with the availability of expertise and, moreover, a problem with experience bias. Any experiment would need to run over an extended period of time to allow the subjects to become accustomed to the software system in use. Such a longitudinal study would provide valuable information on the learnability of the systems as well as extensive data on their usability. Unfortunately, it is unlikely that large numbers of subjects would be available for such a trial, so a series of case-studies may be a compromise. Even then the tasks to be undertaken would need to be chosen carefully. In the other experiments already discussed, relatively short experimental tasks were acceptable, whereas here that is unlikely to be the case. The tasks selected might be quite large, so it may be appropriate to combine this experiment with some other project.


Of course, such a situation may in itself be a problem similar to that found when attempting to study programmers in their natural work environment.

The second experiment, to examine the validity of UQ2's editing constraints, is even more problematic. Most of the problems noted for the first experiment are still relevant, but in addition there is the question of the design of the experiment. A comparison of explicit and implicit transliteration options might be viable, but user expertise rather than design option may well be what such an experiment tests. In particular, design options that might relax the proposed hard context structure of a UQ2 document to allow less constrained editing would fundamentally alter the underlying system model and cause consequent changes in a range of editor features. No editor with these features is available, and adapting UQ2 to provide them for comparative evaluation would require significant re-implementation. In general, we note that in evaluating specific features of an innovative tool design, a comparable tool which differs in just these features may be difficult to obtain, and the design of an experiment that isolates the use of just these features from other aspects of the tool may be impossible.

10.3 Summary

There are many unresolved issues associated with software development tools and the design of their user interfaces. Both of the issues presented in this chapter have some anecdotal evidence surrounding the design options relevant to them, but neither issue has received enough experimental treatment to allow selection between the options. Such issues should be resolved by experimentation wherever possible. In some cases, however, such experimentation is difficult for various reasons. Particularly problematic are issues where the design options involved are difficult to isolate, or to make available for comparative evaluation.

Chapter 11

Conclusions

11.1 Summary of this Thesis

The objective of this thesis was to investigate the applicability of user interface design techniques to the design of the user interface of software development tools, in particular, language-based editors. We focused on systematic approaches to the evaluation of interface design issues, including a guideline review, a predictive modelling approach (use of keystroke-level models) and empirical approaches (experimentation with tool users). The thesis was organised as follows:

- Chapter 2 discussed the strategies currently used in user interface design and their application to software design generally. We identified three categories for these strategies: user-experience-based design, model-based design and the use of guidelines to assist design. Intuition on the part of designers and all types of experiments with users constituted user-experience-based design. Predictive modelling, application of the principle of anthropomorphism, and consideration of cognitive models and aspects of users characterised model-based design. The extent to which any of these (except intuition) was used by software engineers was questioned.

- Chapter 3 examined the application, or lack of application, of these design strategies to software development tools. Intuition, as a design and evaluation strategy, was particularly common among software tool designers. However, the penetration of the other strategies into this domain remained weak.


- Chapter 4 discussed user models of programs as a basis for assisting the designer to understand a software tool user's needs. The example software tools (language-based editors) used in the study were discussed. We also introduced two issues arising from these user models: the appropriate representation for the hierarchical structure of a program, and the choice of editing paradigm. Both of these issues have been the focus of intuitive comment but little or no systematic analysis.

- Chapter 5 described a guideline review of the user interface for one language-based editor, the Z language instantiation of the UQ1 editor. The review indicated that if such a set of guidelines had been available before UQ1's design and implementation, their use would have had benefits. This case-study, although a significant task in itself, focused our attention on user interface issues relevant to the editor and indicated areas needing further investigation. In particular, there was little or no guidance on the two issues introduced in chapter 4.

- Chapter 6 described and presented the results of an experiment that compared the efficiency of navigation menus using two representations of hierarchic structure. Both representations, graphical and text-based, proved equally efficient. However, a majority of users preferred the graphical representation. If we consider the hierarchic structures as simulations of program structure, then this preference agrees with the postulated user model of programs. Thus, although UQ1 currently implements the concept using a text-based approach, we argue that UQ* should offer both representations.

- Chapter 7 presented keystroke-level models of the editing paradigms commonly used in language-based editors. This predictive modelling exercise was useful in that it confirmed the view that the text recognition approach was more efficient than the tree-building approach in a majority of cases. However, it also highlighted some problems with the models, including the estimate of the memory parameter in some contexts and the KLM's assumption of error-free expert behaviour. User experimentation could assist in both regards.


- Chapter 8 described a usability experiment on language-based editing paradigms, as suggested in chapter 7. The usability experiment showed little advantage to either text recognition or tree-building; however, it was limited by sample size and the level of expertise of the subjects.

- Chapter 9 described a validation of the keystroke-level models built in chapter 7, using data collected in the experiments described in chapter 8. KLM estimations of task execution times proved accurate (when user errors were acknowledged and eliminated from the analysis), confirming the view that the technique is useful for early design choice considerations.

- Chapter 10 presented two other conceptual issues relevant to the design of language-based editors: document display where the display area is less than that required to view a given portion of the document in its entirety, and the display and editing requirements of multilingual documents. These are typical of the many unresolved issues relevant to software tools and their user interfaces. We demonstrated that the techniques already examined were applicable to such issues and illustrated the difficulties involved in doing so.

11.2 Major Outcomes and Contributions

The major contributions of this thesis can be categorised under two headings: the process of user interface design and evaluation in the software tool domain, and specific outcomes relevant to the particular software development tools examined. Contributions to the process of design and evaluation are:

- Design evaluation strategies other than intuition are feasible for designing the user interface of software tools, such as language-based editors, and are relevant to the usability of the subsequent tools. The relatively slow uptake rate of software tools is evidence of a lack of understanding and consideration of the needs of users. By developing user models and considering cognitive aspects of the programmer and the tasks to be performed, designers might have developed tools more suitable to their users and therefore more likely to be utilised. Consideration of such models leads not only to identification of design options but also suggests systematic evaluation procedures where more than one design option occurs. The designers of UQ1 did consider such user models as the basis for design decisions for UQ1's user interface. However, until now there had been no validation of the models nor usability studies of the implementation based on the models. We have used strategies other than intuition to successfully validate the models.

- User interface design guidelines are an important resource representing more than commonsense guidance, and they can and should be consulted by software tool designers. Using a retrospective analysis involving the application of guidelines to an existing user interface, we showed that several inappropriate design decisions made for UQ1 could have been avoided if guidelines of significant coverage had been consulted. Guidelines do not provide complete advice on all issues, but their use may shorten the time to a final product by eliminating some design errors at an earlier stage.

- Predictive modelling approaches for the evaluation of competing design options are feasible, even for complex software objects such as language-based editors and some design decisions associated with them. Keystroke-level models built to allow a comparison between basic language-based editing strategies showed that text recognition was a more efficient approach to editing than its counterpart, tree-building. Thus, through the use of models we showed that we could predict the efficiency of editor use before the editor had been built. Validation of the models, comparing them with actual experiments with users and various editor implementations, showed that the models were accurate provided that user errors were taken into account.

- Experimental approaches to user interface evaluation of software development tools are feasible, but difficult. Some of these difficulties include: the availability of appropriately trained and experienced users (where we are interested in optimising user interaction); the development of tasks that are typical, of an appropriate size, and contextual for the users involved; and the availability of resources. Nevertheless, we have shown that experimentation with complex tools and highly skilled users is possible. Many challenges remain, though, not the least of which is the need for a cultural shift in the software engineering community towards such user-based experimentation.

Outcomes specific to language-based editors are:

- Graphical representations of the hierarchic structure of information are not necessarily ergonomically better than text-based representations. Providing an overview of a program's structure is an important aspect identified from the user model of programs. At this level the model suggested that a graphical representation was more appropriate than a text-based one. Guidelines also indicated that providing an overview of any structured document was required, as indeed were methods to manipulate the structure and its contents, but no advice on representation was given. For various reasons, including availability of display space and ease of implementation, UQ1 implemented a text-based view and not a graphical one. Our experiment to compare these two representations showed that both were equally efficient. This result is only relevant in this context and does not hold for all graphical representations of program documentation.

- Text recognition is at least as efficient as, if not better than, tree-building or hybrid editing paradigms for language-based editors. Our predictive models and our experiments with users and various editors confirmed the choice of text recognition as a viable and suitable editing paradigm.

11.3 Future Work

In chapter 10 two further conceptual issues were canvassed as candidates for future study, but there are many such issues. The critical need is that systematic approaches to design evaluation be utilised when examining the options apparent from consideration of any individual issue. In particular, user-based experimentation with user interface choices relevant to software development tools should be undertaken.

Appendix A

Sample Guideline Use

1.04 Fast Response

Ensure that the computer will acknowledge data entry actions rapidly, so that users are not slowed or paced by delays in computer response; for normal operation, delays in displayed feedback should not exceed 0.2 seconds. See also 3.018, 3.019.

In insertion mode, typed entry of Z text is accompanied by automatic formatting and incremental syntax checking of the typed symbols. Unless the system is heavily overloaded, it should never be the case that the time taken in this formatting and checking is longer than 0.2 seconds. Text may also be copied from a save buffer while in insertion mode. Depending on system load, the time taken for this operation may exceed 0.2 seconds. The user may also request another file of data to be read into the current document while the editor is in insertion mode. It is likely that formatting and syntax checking, depending on the size and complexity of the file to be read, may well exceed the suggested 0.2 seconds.

The most significant delay in parsing and formatting files occurs at editor startup time when the user requests to edit a particular file. This, however, would seem to be a special case and not considered normal operation. The experienced Z editor user would be prepared for such occurrences. The current structure of data files produced by the editor precludes increases in speed of response to read or load operations. The speed of these processes is mainly dependent on the size of the file being read. More specialised formatting of files could be implemented to speed the operations, but this would probably involve use of more complex file structures.

To test all of these issues thoroughly would require a set of tasks to be performed and timed under various system load conditions. There is anecdotal evidence, at least, to suggest that users are aware when they make trivial, as opposed to extensive, changes to a document and are thus prepared for what would seem an appropriate response time. The time taken is commensurate with the effort involved in making the data entry or change.



1.12 Nonobscuring Cursor

Design the cursor so that it does not obscure any other character displayed in the position designated by the cursor.

The insertion mode solid block cursor and the inspection mode highlight employ brightness inversion to show the character or characters they are marking.

2.08 User Control of Data Display

Allow users to control the amount, format, and complexity of displayed data as necessary to meet task requirements. See also 2.02 Only Necessary Data Displayed, 2.81 Flexible Design for Data Display, 2.7 Display Control.

The main document region can include several windows. These context windows have a fixed width, which is the full width of the editor window less the width of the control panel. Window re-sizing and repositioning are as per SunView; however, changes in window width are ignored as far as document formatting is concerned. The user has no control over the format of Z text. Definitions relating to the physical layout of Z text in a context are given in the grammar file developed by the editor builder. These definitions are applied to all Z constructs created by a user, and there are no mechanisms for tailoring presentation other than by creating a new grammar file and building a new editor. Equally, the user has no control over the complexity of Z text displayed. Text within a context is either fully displayed in its true mathematical detail or completely compressed if it is a directly-embedded context. The user cannot control the level of compression of text, that is, view part of a directly-embedded context within the parent context.

2.13 Consistent Text Format

When textual material is formatted, as in structured messages, adopt a consistent format from one display to another. See also 2.06 Consistent Display Format.


All Z text is automatically formatted. The user is responsible for structuring the text into contexts. A context has a name and is either the entire document, a section, sub-section or sub-sub-section. The physical display of a context is automatically controlled by the editor.

2.7.34 Visual Integration of Changing Graphics

If a user must visually integrate changing patterns on a graphic display, update the data at a rate appropriate to human perceptual abilities for that kind of data change.

Scrolling of Z text within a window is controlled by the user. In this editor there is no facility for the user to control the level of suppression/compression of text other than to view an entire context. If such a facility were available, then dynamic display of Z constructs and/or contexts would be enabled, and the changing shape of text would have the potential to cause problems for human perceptual abilities.

3.05 Control by Explicit User Action

Allow users to control transaction sequencing by explicit action; defer computer processing until an explicit user action has been taken. See also 1.09, 1.14, 1.41, 4.02, 6.09, 6.35.

The user controls the selection of all transactions, whether they be navigation or edit mode operations. The exception might be the global search facility, where contexts can be switched implicitly by the editor to bring a found sequence of symbols into view. Z text representations are automatically converted to Z symbols, if correctly entered. The user is immediately interrupted if syntax errors are encountered on entry of Z text. In the WRITE TO FILE data entry area of the control panel, the content of a text zone is accepted by invoking the command to process the required function, for example, Write. In the SEARCH data entry area, the content of the Target is not accepted until the RETURN key is pressed. If this text is changed but the RETURN key is not pressed, then any subsequent search commands try to find the previous target. In the window used to enter the filename of a file to be read in insertion mode, the filename must end in the RETURN key, otherwise the name is not recognised.

3.08 Distinctive Display of Control Information

Design all displays so that features relevant to sequence control are distinctive in position and/or format.


See also 2.52, 4.06.

Except for insertion mode control commands (Accept, Copy, Read and Help), all control functions are displayed on the control panel, which is permanently displayed.

4.010 Clear Control Labels

Label function keys and other controls clearly to indicate their function. See also 1.010, 3.1.44.

Function keys are not utilised in this implementation. Most controls are clearly labelled on the control panel. The only control functions that are not labelled are those that allow anticipated Z symbols to be automatically accepted, saved text copied, files read, and help activated, all while in insertion mode. These control functions could be implemented as function keys.

4.318 User Confirmation of Destructive Entries

Require the user to take some explicit action to confirm a potentially destructive data/command entry before the computer will execute it. See also 3.57, 6.018, 6.020, 6.319.

The QUIT option signals to the user that data will be lost if the action is continued and changes have been made and not already saved. Change/delete operations are not confirmed, but any edit operation, up to the next edit operation, may be reversed.

6.35 Explicit User Actions

Require users to take some explicit ENTER action to accomplish data entry/change transactions; data change should not occur as a possibly unrecognised side effect of other actions. See also 1.09, 1.14, 3.05, 3.1.36, 4.02, 6.09.


In insertion mode the editor recognises the end of each Z symbol representation, most but not all ending with the / key, and immediately produces the appropriate symbol and performs any associated actions in the way of formatting or checking. In addition, and according to the Z Editor User Manual [ZMa], the current symbol is terminated by use of the commands accept, copy, help and read. Some user actions in creating a document involve explicit enter operations, such as invoking accept to automatically accept an anticipated symbol produced by the editor. In the WRITE TO FILE data entry area of the control panel, the content of a text zone is accepted by invoking the command to process the required function, for example, Write. In the SEARCH data entry area, the content of the Target is not accepted until the RETURN key is pressed. If this text is changed but the RETURN key is not pressed, then any subsequent search commands try to find the previous target. In the window used to enter the filename of a file to be read in insertion mode, the filename must end in the RETURN key, otherwise the name is not recognised. To invoke the actual reading of a file into Z text when using the read command, the user must explicitly select the Ok button in the read window.

An explicit instruction, the end insert command (the ESC key), is required to exit insertion mode. Throughout insertion mode, syntax errors are indicated to the user, and action to correct them, or else action to end insertion mode, is required of the user before proceeding.

The control panel data entry areas of SEARCH and WRITE TO FILE and the read file window do comply with the guideline. These functions are never actually processed until the user points the mouse cursor at an activation button and selects one by clicking the left mouse button. Thus, the user can fill in details appropriate to a search, such as the target text, but until the Forward or Backward button is selected no search is carried out.

6.310 Editing Entries After Error Detection

Following error detection, allow users to edit entries so that they must rekey only those portions that were in error. See also 4.315, 6.010.

During document input, the editor rejects symbols representing syntax errors and the user is immediately invited to enter another symbol. The representation of the incorrect symbol is lost and the user inputs the new symbol from the start. Syntax checks after deletion or insertion may also discover syntax errors, which the user corrects in a similar fashion. To change previously entered text it is necessary to position the mouse cursor appropriately and either use the Delete key or simply insert new text. For example, to change Target: pset/ to Target: pset1/, position the mouse pointer to the left of / and click the left mouse button. Type 1 then press the RETURN key to finish the change.


If the editor is started with a particular file as input on the system command line, then that name is present in the Filename area; otherwise that area is empty. Page Width is set to 75 and Indentation to 2. Input and change of text in these areas is the same as the procedure to change the target text.

Appendix B

Hierarchies Used in Menu Experiment

Each section below describes one of the four hierarchies used in the block-oriented program display experiment. A final section gives details of the example hierarchy that the subjects were able to trial without timing. Included are the description all subjects were given, the hierarchy in the indented-list format, and the three questions asked of the subjects.

B.1 Partial University Hierarchy - A

B.1.1 Description

The University of Southern Queensland is managed by a Vice-Chancellor with the assistance of a Deputy-Vice-Chancellor, two Pro-Vice-Chancellors and a Director.

B.1.2 Hierarchy

Vice-Chancellor
   Deputy-Vice-Chancellor
   Pro-Vice-Chancellor (Academic)
      Accounting & Finance
      Applied Science
      Arts
      Engineering
      Education
      Information Technology
      Management
         Economics
         Human Res. Man.
         Marketing
         Operations Man.
         Public Admin.
   Pro-Vice-Chancellor (Learn. Serv.)


      Academic Services
      Distance Education
      Information Res.
      International Educ.
      Student Services
      Univ. Centres
         CINITEC
         Land Use
         Performance
         Human Perf.
         Astronomy
         Regional Info.
         Educ. Res. Dev.
   Director (Admin.)
      Business Serv.
         Grounds
         Maintenance
         Printery
         Store
      Finance
      Food Services
      Personnel
      Res. Colleges

B.1.3 Questions

I Select the item representing the ASTRONOMY CENTRE.

II Select the item representing the UNIT WHICH DIRECTLY CONTROLS the PRINTERY.

III Knowing that PUBLIC ADMIN is a sub-unit of the MANAGEMENT unit, select the item representing the PUBLIC ADMIN UNIT.

B.2 Staff in University Schools - B

B.2.1 Description

Staff within each school of the University of Southern Queensland are responsible to other staff within that school or, in the case of the Dean, to the Pro-Vice-Chancellor (Academic Affairs).

B.2.2 Hierarchy

Swannell
Ayers
Daniels


Young
Brett
McClachlan
Grant-Thomson
Seal
Glasby
Conway
Fleming
Rogers
Morgan
Rixon
Harris
Hilton
O'Shea
Porter
Smith
Kilpatrick
Durack
Fair
Fordyce
Ball
Black
Dobney
Leis
Parsons
Fulcher
Eastwell
Hayes
Pemberton

B.2.3 Questions

I Select the item representing the person named LEIS.

II Knowing that the person named FAIR is responsible to KILPATRICK, select the item representing the person named FAIR.

III Select the item representing the PERSON WHO DIRECTLY CONTROLS the person named CONWAY.


B.3 The Bachelor of Information Technology Course Structure - C

B.3.1 Description

There are three streams in the Bachelor of Information Technology course at the University of Southern Queensland. Each contains core units and major units.

B.3.2 Hierarchy

Bachelor of Information Technology
   Commercial Computing
      Core
         Introduction to Accounting
         Management & Organ. Behaviour
         Introduction to Law
         Business Economics
      Major
         Programming I
         Digital Communications
         Applications Development I
   End-User Computing
      Core
         Program Design & Dev.
         Data Analysis
         Systems Analysis
         Programming I
      Major
         Programming II
         Expert Systems Development
         Info. Tech. Project
   Information Science
      Core
         Algebra & Calculus I
         Algebra & Calculus II
         Algorithms & Logic
      Major
         Computer Engineering I
         Linear Systems & Control
         Numerical Computing
         Robotics & Machine Vision

B.3.3 Questions

I Select the item representing the unit ALGORITHMS & LOGIC.


II Select the item representing the STREAM which has NUMERICAL COMPUTING as one of its units.

III Knowing that the END-USER COMPUTING stream has EXPERT SYSTEMS DEVELOPMENT as one of its units, select the item representing EXPERT SYSTEMS DEVELOPMENT.

B.4 The Widget Farming Company - D

B.4.1 Description

The Widget Farming Company has four divisions each headed by a manager and containing several sta responsible to that manager. Overall control of the company is in the hands of the company chairman.

B.4.2 Hierarchy

I. Jones - Chairman
   A. Jones - Farms
      Lindemann
      Rogan
      Keer
      Gatfield
      Ducket
      Hagan
   Cameron - Trading
      Hess
      Facer
      Smith
      Haben
      Cave
      Evans
      Baldwin
      Firman
   Bowe - Construction
      Kratzman
      Muller
      Weier
      Keller
      Marriot
   McLeish - Sales
      Gold
      Gall
      Brandon
      Williams
      Hook
      M. Jones


B.4.3 Questions

I Select the item representing the person named KELLER.

II Select the item representing the PERSON WHO DIRECTLY CONTROLS the person named HABEN.

III Knowing that GALL is responsible to McLEISH, select the item representing the person named GALL.

B.5 The British Royal Family

B.5.1 Description

A partial family tree for the British Royal Family includes the Queen (Elizabeth II), her four children and their children.

B.5.2 Hierarchy

Elizabeth - Phillip
   Charles - Dianna
      William
      Harrold
   Anne - Mark
      Peter
      Sarah
   Andrew - Sarah
      Beatrice
      Eugenie
   Edward

B.5.3 Question

Select the item which represents the PARENTS of PETER.

Appendix C

Program Development Code Segments

- PROCEDURE Check

PROCEDURE Check;
BEGIN
   IF Flag THEN
      count := count + 1
END;

- PROCEDURE AddNumbers - PROCEDURE stub and FOR loop

PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);
{Read and add a sequence of Several numbers.}
VAR Count, Current: integer;
BEGIN
   Sum := 0;
   FOR Count := 1 TO Several DO
      BEGIN
         readln (Current);
         Sum := Sum + Current
      END {for}
END; {AddNumbers}

- PROCEDURE StoreCount - CASE construct

PROCEDURE StoreCount(ThisChar: char; VAR Ecount, Ocount, Zcount: integer);
{Increments the proper total.}
BEGIN
   CASE ThisChar OF
      ',', ' '                : ; {Ignore these}
      '2', '4', '6', '8'      : Ecount := Ecount + 1;
      '1', '3', '5', '7', '9' : Ocount := Ocount + 1;
      '0'                     : Zcount := Zcount + 1
   END {case}
END; {StoreCount}


- FUNCTION Random - FUNCTION stub

FUNCTION Random(VAR Seed: integer): real;
{Generates a pseudo-random number such that 0 < ...}
...

- PROCEDURE CheckInput - REPEAT-UNTIL construct

...
   IF Value > Upper THEN
      writeln (Value: 1, ' was too large. Try again.')
UNTIL Value IN [Lower..Upper]
END; {CheckInput}

- FUNCTION IterativeSum - WHILE-DO construct

FUNCTION IterativeSum(Limit: integer): integer;
{Iteratively sums the series, 1 through Limit.}
VAR TemporarySum: integer;
BEGIN
   TemporarySum := Limit;
   WHILE Limit > 1 DO
      BEGIN
         Limit := Limit - 1;
         TemporarySum := Limit + TemporarySum
      END;
   IterativeSum := TemporarySum
END; {IterativeSum}

- PROCEDURE Count

This last procedure was not used in the KLM study, since it is essentially the same as PROCEDURE Check, but it was used in the usability study.

PROCEDURE Count;
BEGIN
   IF Test THEN
      check := check + 1
END;

Appendix D

Program Maintenance Code Segments

- WHILE-DO change to IF-THEN

WHILE count<>0 DO
BEGIN
   read (char);
   count := count - 1
END

becomes

IF count<>0 THEN
BEGIN
   read (char);
   count := count - 1
END

- IF-THEN change to WHILE-DO (reverse of previous)

- AddNumbers - Insert extra statement in loop

FOR Count := 1 TO Several DO
BEGIN
   readln (Current);
   Sum := Sum + Current
END

becomes

FOR Count := 1 TO Several DO
BEGIN
   readln (Current);
   writeln (Current:5,' was input.');
   Sum := Sum + Current
END


- AddNumbers - Remove extra statement in loop (reverse of previous)

- AddNumbers - Reverse loop order

  FOR Count := 1 TO Several DO
    BEGIN
      readln (Current);
      Sum := Sum + Current
    END

becomes

  FOR Count := Several DOWNTO 1 DO
    BEGIN
      readln (Current);
      Sum := Sum + Current
    END

- AddNumbers - Alter variable name: Current becomes Now

- ComputeChange - Remove IF-THEN-ELSE (ignore comment)

  IF Pieces > 1 THEN
    writeln ('s')
  ELSE
    writeln

becomes

  writeln ('s');
  writeln

- ComputeChange - Insert IF-THEN-ELSE (reverse of previous)

- ComputeChange - Alter IF condition and expression

  IF Pieces > 1

becomes

  IF Pieces >= 2

- IterativeSum - Remove WHILE-DO

  WHILE Limit > 1 DO
    BEGIN
      Limit := Limit - 1;
      TemporarySum := Limit + TemporarySum
    END;
  IterativeSum := TemporarySum

becomes

  Limit := Limit - 1;
  TemporarySum := Limit + TemporarySum;
  IterativeSum := TemporarySum

- IterativeSum - Insert WHILE-DO (reverse of previous)


Appendix E Instructions for Editing Experiment

E.1 Introduction

The software development tool designer is interested in designing tools that are most efficient for use by software engineers. There are various measures of efficiency, and the time required to do tasks is one of them. A predictive model such as the Keystroke-Level Model (KLM) enables a designer to predict the time taken by experienced, expert users in carrying out a given task using a given tool or paradigm. Such models make many assumptions, including error-free expert behaviour, and they predict only the time to execute a task, not the time taken in acquiring the information needed to execute it. It is unlikely, however, that even expert software engineers behave in an error-free way when using software development tools, so error type, frequency and distribution are of interest. Thus, although models provide some theoretical basis on which to choose between design options, there are many issues they do not cover. An alternative, complementary activity is a controlled experiment in which a user's performance with various design options can be measured. Analyses of these measures can then be used to assist a designer in choosing between design options.

The choice between the two language-based editing paradigms of tree-building (or template-based) and text recognition has been an issue for some time now, with much intuitive comment appearing in the literature. We have conducted a theoretical study using predictive modelling techniques, but it raised more issues than it resolved and suggested that controlled experimentation may provide the answers.

For this experiment we shall use two existing editors as the experimental platforms. The Pascal editor, supplied as an example with the Cornell Synthesizer Generator, is a compromise between a strictly tree-based editor and a text editor. Language constructs at all levels can be created and manipulated as templates, but at lower levels the user can also enter and edit expressions as text. The Cornell editor presents the user with a menu of template constructs and operators. This menu is permanently displayed and dynamically updated as the user inputs text and selects templates. A template can be selected from the displayed menu by using the mouse to point to the menu item and clicking a button to initiate the selection.
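As a back-of-envelope illustration of the kind of prediction involved (using the nominal operator times published by Card, Moran and Newell, which are assumed here rather than taken from this thesis's own models), the KLM estimate for selecting a template from the permanently displayed menu is the sum of a mental preparation (M), homing the hand onto the mouse (H), pointing at the menu item (P) and a button press (K):

$$T_{execute} = t_M + t_H + t_P + t_K \approx 1.35 + 0.40 + 1.10 + 0.20 = 3.05 \mbox{ seconds.}$$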


The editor provides various navigational facilities, both keyboard- and mouse-based, as well as cut and paste operators and buffers to hold cut text and structures.

The second editor to be used is UQ1, a recognition editor, which parses and formats text as it is typed. The editor anticipates downstream symbols, which can be either accepted or overtyped. It caters for good typists in that they can ignore the display completely and touch-type the symbol sequences. Poor typists, however, can exploit the anticipatory powers of the editor to minimise the keystrokes required.

Recently, UQ1 has been enhanced to allow program development using menu selection. This was a relatively simple adaptation, since the parser is able to predict all possible constructs, terminals and non-terminals at any editing stage. This enhancement allows UQ1 to effectively simulate template-based input without compromising its basic text-recognition editing paradigm. The menu is displayed during insertion mode by pressing the middle mouse button. Once displayed, it can be moved and re-sized to suit the needs of a particular user. It is updated automatically to reflect the allowable constructs, terminals and non-terminals at the current insertion location in the program. Items can be selected from the menu for insertion in the document by pointing to them and clicking the left mouse button. Thus, both paradigms can be examined within the one editor. This enables us to avoid the possible confounding effects of different physical interface presentation which might prevail between two different editors.
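The essence of this prediction step can be sketched in a few lines. The program below is a minimal illustration only: the two-nonterminal grammar, the table entries and all names are invented for the sketch and are not taken from UQ1's implementation. It shows how a table-driven parser can enumerate the symbols that are legal at the current insertion point, which is exactly the information needed to populate such a menu.

  PROGRAM PredictMenu;
  { Sketch only: the grammar, table entries and names below are invented
    for illustration and are not taken from UQ1 itself. }
  CONST
    NumNonterms = 2;
    NumTerms = 4;
  TYPE
    Nonterm = 1..NumNonterms;  {1 = Statement, 2 = Expression}
    Term = 1..NumTerms;        {1 = IF, 2 = identifier, 3 = number, 4 = BEGIN}
  VAR
    Starters: ARRAY [Nonterm, Term] OF boolean;  {may this terminal start this nonterminal?}
    Names: ARRAY [Term] OF PACKED ARRAY [1..10] OF char;
    Top: Nonterm;  {nonterminal expected at the insertion point}
    t: Term;
  BEGIN
    {A Statement may begin with IF, an identifier or BEGIN.}
    Starters[1, 1] := true;  Starters[1, 2] := true;
    Starters[1, 3] := false; Starters[1, 4] := true;
    {An Expression may begin with an identifier or a number.}
    Starters[2, 1] := false; Starters[2, 2] := true;
    Starters[2, 3] := true;  Starters[2, 4] := false;
    Names[1] := 'IF        '; Names[2] := 'identifier';
    Names[3] := 'number    '; Names[4] := 'BEGIN     ';
    Top := 1;  {suppose a Statement is expected here}
    writeln ('Legal next symbols:');
    FOR t := 1 TO NumTerms DO
      IF Starters[Top, t] THEN writeln ('  ', Names[t])
  END.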

E.2 Program Development and Maintenance Tasks

Program development involves the input of program source code using an editor. Software engineers sometimes develop their code off-line, then type in that code using their favourite editor. Others use pseudocode as the basis for their programs and develop the syntax as they input the code using the editor. Still others sit straight down in front of their workstations and begin typing. In this study we are mainly concerned with actions consistent with the first two, more organised, program input styles. Thus, the example source code is either assumed to exist before input, in typed or hand-written form, or to exist in a form that is relatively easy to transform into source code at input time. The definition of a unit-task in this context will most likely be a `chunk' of program, and may even be an example of a `programming plan' familiar to experienced programmers.

Program maintenance tasks might include fixing errors or making enhancements. In both instances the software engineer needs special skills, such as those necessary for program comprehension, debugging and specification comprehension. They also need to be able to use efficiently the tools which facilitate these maintenance tasks. As in program development, we are interested in organised maintenance methods. Thus, the maintenance activities to be undertaken are assumed to be well defined; that is, the software engineers are implementing changes that have been previously determined.

In this study both types of task will be used. For program development we consider a range of tasks, including input of procedure and function stubs and various language construct types. Program maintenance tasks include trivial alterations to code as well as relatively complex structural changes.


E.3 Experimental Procedure

The experiment will be conducted on the same computer in the same room for each subject; subjects will therefore be tested individually rather than in groups. Participation in the experiment will contribute to the assessment for each subject in their CPE course. However, subjects will be made aware that their individual performances with the editing paradigms and the editors are not part of this assessment; participation and their report on the experiment form the only basis for assessment.

During the experiment a computer-based log will be kept automatically. This log includes all mouse and keyboard events. All experimental sessions will also be video-taped; the video-tapes complement the main data collection device of the computer-based log and ensure that no reactions are missed. The author will monitor subject reactions during a session, intervening only when clarification of a task is required or when system functions need attention. Between the computer-based log, the video-taping and the observation, all errors, the time required for all tasks, and individual mouse and keyboard activity will be recorded.

Three treatments have been devised for the experiment, involving input and maintenance of program code using:

1. tree-building with CSG (TB-CSG);
2. text-recognition with UQ1 (TR-UQ1); and
3. tree-building with UQ1 (TB-UQ1).

Each treatment will be undertaken by each subject on a different day, with the allocation of treatment sequence randomised. For each subject there are 20 tasks to be completed for each treatment: nine program input tasks and eleven program maintenance tasks.

Before any experimental treatments are considered, subjects will be given an opportunity to become familiar with the particular experimental setup. At this time no data will be collected and subjects are free to experiment with the experimental platforms in any way they see fit. This will occur on a day prior to the running of any treatment. At the end of the experiment, after each subject has experienced all three treatments, each subject will participate in a structured interview with the author.
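The format of the computer-based log is not specified in these instructions, so the record below is an assumption made purely for illustration: it shows the kind of timestamped entry from which task times, error counts and individual mouse and keyboard activity could later be derived.

  PROGRAM LogSketch;
  { Sketch only: the record layout and field names are assumptions for
    illustration, not the actual log format used in the experiment. }
  TYPE
    EventKind = (KeyPress, MouseClick, MouseMove);
    LogEntry = RECORD
      Kind: EventKind;
      Millisecs: integer;  {time since the session started}
      X, Y: integer;       {pointer position for mouse events}
      Key: char            {character for keyboard events}
    END;
  VAR
    e: LogEntry;
  BEGIN
    {Record a mouse click; task durations follow from differences between
     the timestamps of successive task boundary events.}
    e.Kind := MouseClick; e.Millisecs := 12840;
    e.X := 311; e.Y := 42; e.Key := ' ';
    writeln ('click at ', e.Millisecs: 1, ' ms, position (', e.X: 1, ',', e.Y: 1, ')')
  END.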

E.4 Summary

We are comparing editing paradigms used in language-based editors. The measures in the comparison are primarily error frequency and keystroke and mouse performance, measured by the time to complete various typical program input and maintenance tasks. More subjective measures, such as the usability of the paradigms, will also be assessed.

On Day 1 subjects will have an opportunity to become familiar with the experimental setup. On 3 subsequent days the subjects will undertake 20 editing tasks per day using 3 different treatments (1 treatment per day). On the last of these 3 days the subjects will participate in a short structured interview. Over the total of 4 days no more than 4 hours (1 hour per day) should be necessary to complete the required tasks.

Appendix F Editing Paradigm Experiment Details

F.1 Sample Task

In the program called Sample you need to insert several program segments. Use the editor provided and the editing paradigm allocated to input these in as efficient a manner as you can. This is a practice session only, and no keyboard or mouse use is being monitored.

F.1.1 Task 1: Junk

Enter the PROCEDURE Junk stub after PROCEDURE Binary:

  PROCEDURE Junk (Many: integer; VAR Total: real);

F.1.2 Task 2: Binary

In PROCEDURE Binary:

  PROCEDURE Binary(Value: integer; Numbers: NumberArray; VAR Position: integer);
  VAR Midpoint, Left, Right: integer;
  BEGIN
    Left := 1;
    Right := MAX;
    REPEAT
      Midpoint := (Left + Right) DIV 2;
      IF Value < Numbers[Midpoint] THEN
        Right := Midpoint - 1
      ELSE
        Left := Midpoint + 1
    UNTIL (Value = Midpoint) OR (Left > Right)
  END;

insert the missing IF construct, as shown below:


  PROCEDURE Binary(Value: integer; Numbers: NumberArray; VAR Position: integer);
  VAR Midpoint, Left, Right: integer;
  BEGIN
    Left := 1;
    Right := MAX;
    REPEAT
      Midpoint := (Left + Right) DIV 2;
      IF Value < Numbers[Midpoint] THEN
        Right := Midpoint - 1
      ELSE
        Left := Midpoint + 1
    UNTIL (Value = Midpoint) OR (Left > Right);
    IF Value = Numbers[Midpoint] THEN
      Position := Midpoint
    ELSE
      Position := 0
  END;

F.2 Program Input Tasks

In the program called ProgramInput you need to insert several program segments. Use the editor provided and the editing paradigm allocated to input these in as efficient a manner as you can. All keyboard and mouse use is being monitored and timed, but only the times for each individual task are important, not the overall time for all tasks.

F.2.1 Task 1: Check

Enter PROCEDURE Check after the program header:

  PROCEDURE Check;
  BEGIN
    IF Flag THEN
      count := count + 1
  END;

F.2.2 Task 2: AddNumbers

Enter the PROCEDURE AddNumbers stub after PROCEDURE Check:

  PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);


F.2.3 Task 3: AddNumbersAlso

Extend the PROCEDURE AddNumbersAlso shell from:

  PROCEDURE AddNumbersAlso(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers}
  VAR Count, Current: integer;
  BEGIN
    Sum := 0
  END;

to the full PROCEDURE:

  PROCEDURE AddNumbersAlso(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers}
  VAR Count, Current: integer;
  BEGIN
    Sum := 0;
    FOR Count := 1 TO Several DO
      BEGIN
        readln (Current);
        Sum := Sum + Current
      END
  END;

F.2.4 Task 4: StoreCount

Extend the PROCEDURE StoreCount shell from:

  PROCEDURE StoreCount(ThisChar: char; VAR Ecount, Ocount, Zcount: integer);
  {Increments the proper total}
  BEGIN
  END;

to the full PROCEDURE:

  PROCEDURE StoreCount(ThisChar: char; VAR Ecount, Ocount, Zcount: integer);
  {Increments the proper total}
  BEGIN
    CASE ThisChar OF
      ',', ' ' : ;
      '2', '4', '6', '8' : Ecount := Ecount + 1;
      '1', '3', '5', '7', '9' : Ocount := Ocount + 1;
      '0' : Zcount := Zcount + 1
    END
  END;


F.2.5 Task 5: Random

Enter the FUNCTION Random stub after PROCEDURE StoreCount:

  FUNCTION Random(VAR Seed: integer): real;

F.2.6 Task 6: ComputeChange

Extend the PROCEDURE ComputeChange shell from:

  PROCEDURE ComputeChange(Unit: integer; VAR Change: integer);
  {Prints number of coins. Reduces Change by that many Units.}
  VAR Pieces: integer;
  BEGIN
    Pieces := Change DIV Unit;
    Change := Change MOD (Pieces * Unit);
    write (Pieces: 1)
  END;

to the full PROCEDURE:

  PROCEDURE ComputeChange(Unit: integer; VAR Change: integer);
  {Prints number of coins. Reduces Change by that many Units.}
  VAR Pieces: integer;
  BEGIN
    Pieces := Change DIV Unit;
    Change := Change MOD (Pieces * Unit);
    write (Pieces: 1);
    IF Pieces > 1 THEN
      writeln ('s')
    ELSE
      writeln
  END;

F.2.7 Task 7: CheckInput

Extend the PROCEDURE CheckInput shell from:

  PROCEDURE CheckInput(VAR Value: integer; Upper, Lower: integer);
  {Gets and returns a value between Lower and Upper}
  BEGIN
  END;

to the full PROCEDURE:

  PROCEDURE CheckInput(VAR Value: integer; Upper, Lower: integer);
  {Gets and returns a value between Lower and Upper}
  BEGIN
    REPEAT
      writeln ('Enter an integer from ', Lower: 1, ' to ', Upper: 1);
      readln (Value)
    UNTIL Value IN [Lower..Upper]
  END;

F.2.8 Task 8: IterativeSum

Extend the FUNCTION IterativeSum shell from:

  FUNCTION IterativeSum(Limit: integer): integer;
  {Iteratively sums the series 1 through Limit.}
  VAR TemporarySum: integer;
  BEGIN
    TemporarySum := Limit;
    IterativeSum := TemporarySum
  END;

to the full FUNCTION:

  FUNCTION IterativeSum(Limit: integer): integer;
  {Iteratively sums the series 1 through Limit.}
  VAR TemporarySum: integer;
  BEGIN
    TemporarySum := Limit;
    WHILE Limit > 1 DO
      BEGIN
        Limit := Limit - 1;
        TemporarySum := Limit + TemporarySum
      END;
    IterativeSum := TemporarySum
  END;

F.2.9 Task 9: Count

Enter PROCEDURE Count after FUNCTION IterativeSum:

  PROCEDURE Count;
  BEGIN
    IF Test THEN
      check := check + 1
  END;


F.3 Program Maintenance Tasks

In the program called ProgramMaintenance you need to alter several program segments. Use the editor provided and the editing paradigm allocated to change these in as efficient a manner as you can. All keyboard and mouse use is being monitored and timed, but only the times for each individual task are important, not the overall time for all tasks.

F.3.1 Task 1: Dummy

Change the WHILE-DO of PROCEDURE Dummy:

  PROCEDURE Dummy;
  {Dummy procedure.}
  VAR count: integer;
      char: char;
  BEGIN
    WHILE count <> 0 DO
      BEGIN
        read (char);
        count := count - 1
      END
  END;

to an IF-THEN construct, as shown below:

  PROCEDURE Dummy;
  {Dummy procedure.}
  VAR count: integer;
      char: char;
  BEGIN
    IF count <> 0 THEN
      BEGIN
        read (char);
        count := count - 1
      END
  END;

F.3.2 Task 2: Dummy

Change the IF-THEN of PROCEDURE Dummy:

  PROCEDURE Dummy;
  {Dummy procedure.}
  VAR count: integer;
      char: char;
  BEGIN
    IF count <> 0 THEN
      BEGIN
        read (char);
        count := count - 1
      END
  END;

to a WHILE-DO construct, as shown below:

  PROCEDURE Dummy;
  {Dummy procedure.}
  VAR count: integer;
      char: char;
  BEGIN
    WHILE count <> 0 DO
      BEGIN
        read (char);
        count := count - 1
      END
  END;

F.3.3 Task 3: AddNumbers

In PROCEDURE AddNumbers:

  PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers.}
  VAR Count, Current: integer;
  BEGIN
    Sum := 0;
    FOR Count := 1 TO Several DO
      BEGIN
        readln (Current);
        Sum := Sum + Current
      END
  END;

insert the extra writeln statement, as shown below:

  PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers.}
  VAR Count, Current: integer;
  BEGIN
    Sum := 0;
    FOR Count := 1 TO Several DO
      BEGIN
        readln (Current);
        writeln (Current:5, ' was input.');
        Sum := Sum + Current
      END
  END;


F.3.4 Task 4: AddNumbers

In PROCEDURE AddNumbers:

  PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers.}
  VAR Count, Current: integer;
  BEGIN
    Sum := 0;
    FOR Count := 1 TO Several DO
      BEGIN
        readln (Current);
        writeln (Current:5, ' was input.');
        Sum := Sum + Current
      END
  END;

remove the writeln statement, as shown below:

  PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers.}
  VAR Count, Current: integer;
  BEGIN
    Sum := 0;
    FOR Count := 1 TO Several DO
      BEGIN
        readln (Current);
        Sum := Sum + Current
      END
  END;

F.3.5 Task 5: AddNumbers

In PROCEDURE AddNumbers:

  PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers.}
  VAR Count, Current: integer;
  BEGIN
    Sum := 0;
    FOR Count := 1 TO Several DO
      BEGIN
        readln (Current);
        Sum := Sum + Current
      END
  END;

reverse the sense of the FOR statement, as shown below:

  PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers.}
  VAR Count, Current: integer;
  BEGIN
    Sum := 0;
    FOR Count := Several DOWNTO 1 DO
      BEGIN
        readln (Current);
        Sum := Sum + Current
      END
  END;

F.3.6 Task 6: AddNumbers

In PROCEDURE AddNumbers:

  PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers.}
  VAR Count, Current: integer;
  BEGIN
    Sum := 0;
    FOR Count := 1 TO Several DO
      BEGIN
        readln (Current);
        Sum := Sum + Current
      END
  END;

change the three (3) occurrences of the variable named Current to the new name Now, as shown below:

  PROCEDURE AddNumbers(Several: integer; VAR Sum: integer);
  {Read and add a sequence of Several Numbers.}
  VAR Count, Now: integer;
  BEGIN
    Sum := 0;
    FOR Count := 1 TO Several DO
      BEGIN
        readln (Now);
        Sum := Sum + Now
      END
  END;


F.3.7 Task 7: ComputeChange

In PROCEDURE ComputeChange:

  PROCEDURE ComputeChange(Unit: integer; VAR Change: integer);
  {Prints number of coins. Reduces Change by that many Units.}
  VAR Pieces: integer;
  BEGIN
    Pieces := Change DIV Unit;
    Change := Change MOD (Pieces * Unit);
    write (Pieces: 1);
    IF Pieces > 1 THEN
      writeln ('s')
    ELSE
      writeln
  END;

remove the IF construct but leave the statements of the THEN and ELSE clauses, as shown below:

  PROCEDURE ComputeChange(Unit: integer; VAR Change: integer);
  {Prints number of coins. Reduces Change by that many Units.}
  VAR Pieces: integer;
  BEGIN
    Pieces := Change DIV Unit;
    Change := Change MOD (Pieces * Unit);
    write (Pieces: 1);
    writeln ('s');
    writeln
  END;

F.3.8 Task 8: ComputeChange

In PROCEDURE ComputeChange:

  PROCEDURE ComputeChange(Unit: integer; VAR Change: integer);
  {Prints number of coins. Reduces Change by that many Units.}
  VAR Pieces: integer;
  BEGIN
    Pieces := Change DIV Unit;
    Change := Change MOD (Pieces * Unit);
    write (Pieces: 1);
    writeln ('s');
    writeln
  END;

insert an IF construct, making writeln ('s') the statement of the THEN clause and writeln the statement of the ELSE clause, as shown below:

  PROCEDURE ComputeChange(Unit: integer; VAR Change: integer);
  {Prints number of coins. Reduces Change by that many Units.}
  VAR Pieces: integer;
  BEGIN
    Pieces := Change DIV Unit;
    Change := Change MOD (Pieces * Unit);
    write (Pieces: 1);
    IF Pieces > 1 THEN
      writeln ('s')
    ELSE
      writeln
  END;

F.3.9 Task 9: ComputeChange

In PROCEDURE ComputeChange:

  PROCEDURE ComputeChange(Unit: integer; VAR Change: integer);
  {Prints number of coins. Reduces Change by that many Units.}
  VAR Pieces: integer;
  BEGIN
    Pieces := Change DIV Unit;
    Change := Change MOD (Pieces * Unit);
    write (Pieces: 1);
    IF Pieces > 1 THEN
      writeln ('s')
    ELSE
      writeln
  END;

alter the IF conditional expression from Pieces > 1 to Pieces >= 2, as shown below:

  PROCEDURE ComputeChange(Unit: integer; VAR Change: integer);
  {Prints number of coins. Reduces Change by that many Units.}
  VAR Pieces: integer;
  BEGIN
    Pieces := Change DIV Unit;
    Change := Change MOD (Pieces * Unit);
    write (Pieces: 1);
    IF Pieces >= 2 THEN
      writeln ('s')
    ELSE
      writeln
  END;

F.3.10 Task 10: IterativeSum

In FUNCTION IterativeSum:

  FUNCTION IterativeSum(Limit: integer): integer;
  {Iteratively sums the series 1 through Limit.}
  VAR TemporarySum: integer;
  BEGIN
    TemporarySum := Limit;
    WHILE Limit > 1 DO
      BEGIN
        Limit := Limit - 1;
        TemporarySum := Limit + TemporarySum
      END;
    IterativeSum := TemporarySum
  END;

remove the WHILE loop leaving the statements intact, as shown below:

  FUNCTION IterativeSum(Limit: integer): integer;
  {Iteratively sums the series 1 through Limit.}
  VAR TemporarySum: integer;
  BEGIN
    TemporarySum := Limit;
    Limit := Limit - 1;
    TemporarySum := Limit + TemporarySum;
    IterativeSum := TemporarySum
  END;


F.3.11 Task 11: IterativeSum

In FUNCTION IterativeSum:

  FUNCTION IterativeSum(Limit: integer): integer;
  {Iteratively sums the series 1 through Limit.}
  VAR TemporarySum: integer;
  BEGIN
    TemporarySum := Limit;
    Limit := Limit - 1;
    TemporarySum := Limit + TemporarySum;
    IterativeSum := TemporarySum
  END;

insert a WHILE loop around the statements, as shown below:

  FUNCTION IterativeSum(Limit: integer): integer;
  {Iteratively sums the series 1 through Limit.}
  VAR TemporarySum: integer;
  BEGIN
    TemporarySum := Limit;
    WHILE Limit > 1 DO
      BEGIN
        Limit := Limit - 1;
        TemporarySum := Limit + TemporarySum
      END;
    IterativeSum := TemporarySum
  END;

F.4 Typing/Mousing Test

In the program called TypingTest you need to insert a comment and check the syntax of the program. Use the UQ1 editor to input this in as efficient a manner as you can. All keyboard and mouse use is being monitored and timed.

Enter the following comment before END in the program and then check the syntax of the program:

  {The quick brown fox jumps over the lazy dog.}

The steps you should take are:

1. Click on the E of END in the program.
2. Click on Insert in the control panel.
3. Click in the document region.
4. Type the comment: {The quick brown fox jumps over the lazy dog.}
5. Type the ESC key.
6. Click on Syntax in the control panel.

Appendix G Questionnaire for Editing Experiment

For this part of the questionnaire we would like you to think back over your experience with the three editing paradigms and their implementations:

1. tree-building with CSG (TB-CSG);
2. text-recognition with UQ1 (TR-UQ1); and
3. tree-building with UQ1 (TB-UQ1).

Q1. From fastest to slowest, rank your performance speed for each of the three paradigms. Briefly explain the reasons for your rankings.

Q2. From best to worst, rank your performance accuracy for each of the three paradigms. Briefly explain the reasons for your rankings.

Q3. From best to worst, rank your preference for each of the three paradigms. Briefly explain the reasons for your rankings.

We would also like you to consider your involvement as a subject in the experiment and the experimental approach per se.

Q4. What did you feel about the experimental setup? Be frank. Include comments on the room and equipment, the presence of video, the presence of the experimenter and any other factors you think important.

Q5. Did you think that the tasks presented were a representative sample of typical program development tasks (in Pascal)? If not, what would you have included?

Q6. Do you think that this type of experimentation is useful in deciding on design options, particularly where issues of efficiency of use are important? Why?

Q7. What do you think are the main problems with this type of experimentation? List and comment on any you consider.

Q8. Would you consider using this experimental approach to assist your own decision making about design options? Why?


Thank you very much for your assistance in participating in the experiments and in answering these questions.

Mark Toleman

Appendix H Tables of Error Analysis Means

In this appendix we present tables of means for the four error parameters from chapter 8. Those parameters, and the associated tables presented here, are:

1. Number of types of error in program input and maintenance tasks, table H.1;
2. Number of actual errors in program input and maintenance tasks, table H.2;
3. Error times for program input and maintenance tasks, table H.3;
4. Average time for each error for program input and maintenance tasks, table H.4.
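The fourth parameter is derived from the second and third. For a given task and treatment, a subject's average time per error is naturally

$$\bar{t}_{err} = T_{err} / n_{err},$$

where $T_{err}$ is the subject's total error time (summarised in table H.3) and $n_{err}$ the subject's number of actual errors (summarised in table H.2). Since table H.4 reports means of these per-subject ratios, its entries need not equal the quotients of the corresponding entries in tables H.3 and H.2; this reading is inferred here from the parameter names, as chapter 8 itself is not reproduced in this appendix.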


Table H.1: Number of types of error in program input and maintenance tasks. Means within the same row, with the same superscript, are not significantly different.

Input Task                         UQ1-TR   UQ1-TB   CSG-TB   SE Mean
Check - PROCEDURE                   2.6      2.3      2.0      0.16
AddNumbers - PROCEDURE              1.9^a    2.3^a    3.4^b    0.077
AddNumbers - FOR                    3.1      2.6      1.9      0.096
StoreCount - CASE                   3.4      2.1      3.1      0.11
Random - FUNCTION                   1.7      2.0      1.4      0.11
ComputeChange - IF-THEN-ELSE        2.2      2.6      1.7      0.16
CheckInput - REPEAT-UNTIL           3.1      2.3      2.7      0.090
IterativeSum - WHILE-DO             2.7      2.0      1.7      0.12
Count - PROCEDURE                   2.8      2.5      2.2      0.065

Maintenance Task                   UQ1-TR   UQ1-TB   CSG-TB   SE Mean
WHILE-DO change                     3.4      1.7      1.6      0.16
IF-THEN change                      3.3^a    0.7^b    0.1^b    0.18
AddNumbers - insert statement       2.9      3.0      2.1      0.13
AddNumbers - remove statement       0.0      0.1      0.6      0.14
AddNumbers - reverse loop           2.2      1.5      0.8      0.24
AddNumbers - alter variable name    2.1^a    1.9^a    0.7^b    0.14
ComputeChange - remove IF           1.9      1.8      1.1      0.11
ComputeChange - insert IF           3.9^a    2.8^ab   2.2^b    0.094
ComputeChange - alter IF            1.9^a    2.3^a    0.7^b    0.10
IterativeSum - remove WHILE         1.9      2.3      0.8      0.19
IterativeSum - insert WHILE         4.1^a    5.1^a    2.0^b    0.073


Table H.2: Number of actual errors in program input and maintenance tasks. Means within the same row, with the same superscript, are not significantly different.

Input Task                         UQ1-TR   UQ1-TB   CSG-TB   SE Mean
Check - PROCEDURE                   8.2^a    6.5^ab   4.2^b    0.11
AddNumbers - PROCEDURE              5.4      7.5      6.0      0.11
AddNumbers - FOR                   10.9      7.2      4.8      0.20
StoreCount - CASE                  23.0     24.1     28.6      0.11
Random - FUNCTION                   4.3      3.6      2.7      0.14
ComputeChange - IF-THEN-ELSE        5.4      5.3      3.0      0.19
CheckInput - REPEAT-UNTIL          11.5      7.6      9.4      0.086
IterativeSum - WHILE-DO            12.7     11.5     11.0      0.082
Count - PROCEDURE                   8.6^a    6.0^b    5.2^b    0.083

Maintenance Task                   UQ1-TR   UQ1-TB   CSG-TB   SE Mean
WHILE-DO change                     4.7^a    2.2^b    1.8^b    0.14
IF-THEN change                      4.5^a    0.9^b    0.1^b    0.18
AddNumbers - insert statement       3.3      3.5      2.8      0.23
AddNumbers - remove statement       0.0      0.1      0.6      0.14
AddNumbers - reverse loop           2.9      1.8      1.1      0.31
AddNumbers - alter variable name    4.9^a    4.9^a    0.9^b    0.14
ComputeChange - remove IF           1.9      1.7      1.8      0.10
ComputeChange - insert IF           7.8^a    4.4^ab   3.5^b    0.16
ComputeChange - alter IF            2.4^a    3.0^a    1.0^b    0.11
IterativeSum - remove WHILE         2.5      2.5      1.1      0.36
IterativeSum - insert WHILE         7.8^a    8.3^a    3.0^b    0.11


Table H.3: Error times for program input and maintenance tasks (sec). Means within the same row, with the same superscript, are not significantly different.

Input Task                         UQ1-TR   UQ1-TB   CSG-TB   SE Mean
Check - PROCEDURE                   9.87    18.62    17.39     9.98
AddNumbers - PROCEDURE              2.57     9.96    50.99    15.42
AddNumbers - FOR                   19.04     9.15    12.01     6.26
StoreCount - CASE                  39.31    13.75    30.99    14.58
Random - FUNCTION                   7.93     0.75     8.17     3.76
ComputeChange - IF-THEN-ELSE        6.71    18.11    11.77     4.92
CheckInput - REPEAT-UNTIL          20.53    10.18    23.51    11.43
IterativeSum - WHILE-DO            26.57    13.09     7.19     7.54
Count - PROCEDURE                  20.12    17.45     7.71     5.47

Maintenance Task                   UQ1-TR   UQ1-TB   CSG-TB   SE Mean
WHILE-DO change                    12.26     5.04    11.18     5.99
IF-THEN change                      7.69     2.41     0.77     1.65
AddNumbers - insert statement       5.23     3.78    11.61     4.97
AddNumbers - remove statement       0.00     0.47     7.28     2.57
AddNumbers - reverse loop           5.13     4.24     9.14     2.98
AddNumbers - alter variable name    9.47     7.99     2.65     2.66
ComputeChange - remove IF           6.37     0.24    24.14     6.71
ComputeChange - insert IF          22.50    13.76    14.08     5.68
ComputeChange - alter IF            3.64     4.83     0.18     1.02
IterativeSum - remove WHILE        12.16     8.27     7.27     4.39
IterativeSum - insert WHILE        21.57    30.15    14.06     7.32


Table H.4: Average time for each error for program input and maintenance tasks (sec). Means within the same row, with the same superscript, are not significantly different.

Input Task                         UQ1-TR   UQ1-TB   CSG-TB   SE Mean
Check - PROCEDURE                   1.00     2.66     3.47     2.05
AddNumbers - PROCEDURE              0.40     0.50    10.92     3.49
AddNumbers - FOR                    1.56     0.78     3.28     1.14
StoreCount - CASE                   1.38     0.60     1.07     0.45
Random - FUNCTION                   2.39     0.09     3.43     1.50
ComputeChange - IF-THEN-ELSE        1.15     3.76     2.57     1.02
CheckInput - REPEAT-UNTIL           1.96     0.92     1.93     0.88
IterativeSum - WHILE-DO             2.02     1.05     0.56     0.54
Count - PROCEDURE                   2.15     2.90     1.39     0.75

Maintenance Task                   UQ1-TR   UQ1-TB   CSG-TB   SE Mean
WHILE-DO change                     2.18     1.83     9.16     2.25
IF-THEN change                      1.63     1.58     0.47     0.54
AddNumbers - insert statement       1.80     0.78     3.21     1.27
AddNumbers - remove statement       0.00     0.00     5.96     2.50
AddNumbers - reverse loop           1.45     1.67     4.34     1.59
AddNumbers - alter variable name    1.78     1.66     1.83     0.57
ComputeChange - remove IF           3.08^a   0.65^a   9.79^b   1.88
ComputeChange - insert IF           2.68     2.85     4.05     0.88
ComputeChange - alter IF            1.47^a   1.58^a   0.05^b   0.35
IterativeSum - remove WHILE         6.50     3.44     2.30     1.06
IterativeSum - insert WHILE         2.88     4.36     4.60     1.78

Bibliography [AD89]

[Alm90] [Ano94] [Ant88] [App88] [Bal95] [Ber93] [BFH75] [Bil93] [Bil94] [Bil95] [BM94]

B. Auernheimer and J.L. Dyck. Human-computer interaction: Psychology and computer science meet. In B.E. Shriver, editor, Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences (Software Track), volume II, pages 425{426, 1989. V. Almstrum. User interface evaluation: Data visualization in Pascal Genie. School of Urban and Public A airs, Carnegie Mellon University, 1990. Personal Communication. Anon. Publications. SIGCHI Bulletin, 26(1):64{70, 1994. J.F. Antin. An empirical comparison of menu selection, command entry and combined modes of computer control. Behaviour and Information Technology, 7:173{182, 1988. Apple, Inc. Human Interface Guidelines|The Apple Desktop Interface. Addison-Wesley, Reading, MA, 1988. S. Balbo. Automatic evaluation of user interface usability: Dream or reality? In S. Balbo, editor, QCHI95 Symposium Proceedings, pages 9{16. Bond University, 1995. C.F. Bertholf. Comprehension of literate programs by novice and intermediate programmers. Master's thesis, Department of Computer Science, Portland State University, Portland, USA, March 1993. Y.M.M Bishop, S.E. Fienberg, and P.W. Holland. Discrete Multivariate Analysis: Theory and Practice. The MIT Press, Cambridge, Mass., 1975. P. Billingsley. Standards Factor: A closer look at ISO 9241. SIGCHI Bulletin, 25(3):11{13, 1993. P. Billingsley. Standards: The pace quickens . . . . SIGCHI Bulletin, 26(3):8{13, 1994. P. Billingsley. Standards: ANSI/HFES 200 standards well underway. SIGCHI Bulletin, 27(3):8{9, 1995. N. Bevan and M. MacLeod. Usability measurement in context. Behaviour and Information Technology, 13(1/2):132{145, 1994. 191

BIBLIOGRAPHY [Bro87a]

192

F.P. Brooks. No silver bullet: Essence and accidents of software engineering. Computer, 20(4):10{19, 1987. [Bro87b] B.M. Broom. Aspects of Interactive Program Display. PhD thesis, Department of Computer Science, The University of Queensland, Brisbane, Australia, September 1987. [BW86] B.M. Broom and J. Welsh. Detail compression techniques for interactive program display. In G.W. Gerrity, editor, Ninth Australian Computer Science Conference, pages 83{93, 1986. [BW91] B. Broom and J. Welsh. Display techniques for language-based editors. Technical Report 198, Department of Computer Science, The University of Queensland, Brisbane, Australia, 1991. [BWW89] B. Broom, J. Welsh, and L. Wildman. UQ2 editor user manual. Technical report, Department of Computer Science, The University of Queensland, Brisbane, Australia, 1989. [BWW90] B. Broom, J. Welsh, and L. Wildman. UQ2: A multilingual document editor. In Proceedings of the Fifth Australian Software Engineering Conference, pages 289{294, 1990. [Car94] J.M. Carroll. Making use a design representation. Communications of the ACM, 37(12):29{35, 1994. [CB94] D.L. Cuomo and C.D. Bowen. Understanding usability issues addressed by three user-system interface evaluation techniques. Interacting with Computers, 6(1):86{108, 1994. [CC85] D. Cooper and M. Clancy. Oh! Pascal! W.W. Norton, New York, NY, second edition, 1985. [Chi90] J. Chin. Position paper. Paper presented to the CHI'90 Workshop on Structure Editors, 1990. [Chu95] N. Churcher. Photi|a sheye view of bubbles. Information and Software Technology, 37(1):31{37, 1995. [CHW90] D.A. Carrington, I.J. Hayes, and J. Welsh. A syntax-directed editor for object-oriented speci cations. In Tools Paci c'90, pages 46{57, November 1990. [CMN80] S.K. Card, T.P. Moran, and A. Newell. The keystroke-level model for user performance time with interactive systems. Communications of the ACM, 23(7):396{410, 1980. [CMN83] S.K. Card, T.P. Moran, and A. Newell. The Psychology of HumanComputer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ, 1983.

BIBLIOGRAPHY [CS90]

[CSS93] [CW91] [Dij89] [DL81]

[Dow93] [dSB90]

[Ebe87] [Ebe94] [EE89]

[EG75] [EN81] [EP91]

193

R. Chimera and B. Shneiderman. Evaluation of three interfaces for browsing large hierarchical tables of contents. Technical Report CS-TR2620, Computer Science Department, University of Maryland, College Park, MD, 1990. C.R. Cook, J.C. Scholtz, and J.C. Spohrer, editors. Empirical Studies of Programmers: Fifth Workshop. Ablex, Norwood, NJ, 1993. C.L. Corritore and S. Wiedenbeck. What do novices learn during program comprehension? International Journal of Human-Computer Interaction, 3(2):199{222, 1991. E.W. Dijkstra. On the cruelty of really teaching computer science. Communications of the ACM, 32(12):1398{1404, 1989. S.T. Dumais and T.K. Landauer. Psychological investigations of natural terminology for command and query languages. In A. Badre and B. Shneiderman, editors, Directions in Human/Computer Interaction, pages 95{109. Ablex, Norwood, NJ, 1981. A. Downton. Evaluation techniques for human-computer systems design. In A. Downton, editor, Engineering the Human-Computer Interface, pages 265{295. McGraw-Hill, London, 1993. F. de Souza and N. Bevan. The use of guidelines in menu interface design: evaluation of a draft standard. In D. Diaper, D. Gilmore, G. Cockton, and B. Shackel, editors, Human-Computer Interaction| INTERACT'90, pages 435{440, 1990. R.E. Eberts. Human-computer interaction. In P.A. Hancock, editor, Human Factors Psychology, pages 249{304. North Holland, Amsterdam, 1987. R.E. Eberts. User Interface Design. Prentice-Hall, Inc., Englewood Cli s, NJ, 1994. R.E. Eberts and C.G. Eberts. Four approaches to human computer interaction. In P.A. Hancock and M.H. Chignell, editors, Intelligent Interfaces: Theory, Research and Design, pages 69{127. North Holland, Amsterdam, 1989. S.E. Engel and R.E. Granda. Guidelines for man/display interfaces. Technical Report TR 00.2720, IBM, Poughkeepsie, NY, 1975. D.W. Embley and G. Nagy. Behavioral aspects of text editors. ACM Computing Surveys, 13(1):33{70, 1981. J. Elkerton and S.L. Palmiter. Designing help using a GOMS model: An information retrieval evaluation. Human Factors, 33(2):185{204, 1991.

BIBLIOGRAPHY

194

[ERL+89] D.E. Egan, J.R. Remde, T.K. Landauer, C.C. Lochbaum, and L.M. Gomez. Behavioral evaluation and analysis of a hypertext browser. In K. Bice and C. Lewis, editors, Human Factors in Computing Systems| CHI'89, pages 205{210, 1989. [Fur86] G.W. Furnas. Generalized sheye views. In M. Mantei and P. Orbeton, editors, Human Factors in Computing Systems Conference|CHI'86, pages 16{23, 1986. [Gal93] W.O. Galitz. User-Interface Screen Design. QED Publishing Group, Boston, 1993. [GC87] M.M. Gardiner and B. Christie, editors. Applying Cognitive Psychology to User-Interface Design. John Wiley, Chichester, 1987. [GE90] R. Gong and J. Elkerton. Designing minimal documentation using a GOMS model: A usability evaluation of an engineering approach. In J.C. Chew and J. Whiteside, editors, Human Factors in Computing Systems|CHI'90, pages 99{106, 1990. [GH92] J.C. Grundy and J.G. Hosking. MViews: A framework for developing visual programming environments. In TOOLS Paci c'93, Englewood Cli s, NJ, 1992. Prentice-Hall. [Gil90] D.J. Gilmore. Methodological issues in the study of programming. In J.-M. Hoc, T.R.G. Green, R. Samurcay, and D.J. Gilmore, editors, Psychology of Programming, pages 83{96. Academic Press, London, 1990. [GJA93] W.D. Gray, B.E. John, and M.E. Atwood. Project Ernestine: Validating a GOMS analysis for predicting and explaining real-world performance. Human-Computer Interaction, 8(3):237{309, 1993. [GK94] R. Gong and D. Kieras. A validation of the GOMS model methodology in development of a specialized, commercial software application. In B. Adelson, S. Dumais, and J. Olson, editors, Human Factors in Computing Systems|CHI'94, pages 351{357, 1994. [Gou88] J.D. Gould. How to design usable systems. In M. Helander, editor, Handbook of Human-Computer Interaction, pages 757{789. North Holland, Amsterdam, 1988. [Gre92] T. Green. Does visualization really work?, March 1992. Discussion in comp.graphics.visualization. [Gre94] T.R.G. Green. Why software engineers don't listen to what psychologists don't tell them anyway. In D.J. Gilmore, R.L. Winder, and F. Detienne, editors, User-Centred Requirements for Software Engineering Environments, pages 323{333. Springer-Verlag, Berlin, 1994. [Gri65] J.E. Grizzle. The two-period change-over design and its use in clinical trials. Biometrics, 21:467{480, 1965.

BIBLIOGRAPHY [GW91]

195

D.R. Goldenson and B.J. Wang. Use of structure editing tools by novice programmers. In J. Koenemann-Belliveau, T.G. Moher, and S.P. Robertson, editors, Empirical Studies of Programmers: Fourth Workshop, pages 99{120. Ablex, Norwood, NJ, 1991. [HCMM89] J.G. Hollands, T.T. Carey, M.L. Matthews, and C.A. McCann. Presenting a graphical network: a comparison of performance using sheye and scrolling views. In G. Salvendy and M.J. Smith, editors, Designing and Using Human-Computer Interfaces and Knowledge Based Systems, pages 313{320, 1989. [HS94] S. Houde and R. Sellman. In search of design principles for programming environments. In B. Adelson, S. Dumais, and J. Olson, editors, Human Factors in Computing Systems|CHI'94, pages 424{430, 1994. [Ian92] R. Iannella. BruitSAM: An interface for user interface guidelines. In Proceedings of the Australian Computer Society Queensland Branch Conference, Gold Coast, Australia, 1992. [Ian94] R. Iannella. HyperSAM: A practical user interface guidelines management system. In R. Iannella, editor, Second Annual CHISIG (Queensland) Symposium|QCHI94, pages 19{24, 1994. [IPI94] S. Irving, P. Polson, and J.E. Irving. A GOMS analysis of the advanced automated cockpit. In B. Adelson, S. Dumais, and J. Olson, editors, Human Factors in Computing Systems|CHI'94, pages 344{350, 1994. [JCL85] C.R. Jesshope, M.J. Crawley, and G.L. Lovegrove. An intelligent Pascal editor for a graphical oriented workstation. Software|Practice and Experience, 15:1103{1119, 1985. [Jen84] J.P. Jenkins. An application of an expert system to problem solving in process control displays. In G. Salvendy, editor, Human-Computer Interaction, pages 255{260. Elsevier, Amsterdam, 1984. Also in Proceedings of the First USA-Japan Conference on Human-Computer Interaction. [JMWU91] R. Je ries, J.R. Miller, C. Wharton, and K.M. Uyeda. User interface evaluation in the real world: A comparison of four techniques. In S.P. Robertson, G.M. Olson, and J.S. Olson, editors, Human Factors in Computing Systems|CHI'91, pages 119{124, 1991. [Joh90] B.E. John. Extensions of GOMS analyses to expert performance requiring perception of dynamic visual and auditory information. In J.C. Chew and J. Whiteside, editors, Human Factors in Computing Systems|CHI'90, pages 107{115, 1990. [Joh92] P. Johnson. Human Computer Interaction: Psychology, Task Analysis and Software Engineering. McGraw-Hill, London, 1992.

BIBLIOGRAPHY [JV92]

196

B.E. John and A.H. Vera. A GOMS analysis of a graphic, machinepaced, highly interactive task. In P. Bauersfeld, J. Bennett, and G. Lynch, editors, Human Factors in Computing Systems|CHI'92, pages 251{258, 1992. [KBMR91] J. Koenemann-Belliveau, T.G. Moher, and S.P. Robertson, editors. Empirical Studies of Programmers: Fourth Workshop. Ablex, Norwood, NJ, 1991. [Kie88] D.E. Kieras. Towards a practical GOMS model methodology for user interface design. In M. Helander, editor, Handbook of Human-Computer Interaction, pages 67{85. North Holland, Amsterdam, 1988. [Kig84] J.I. Kiger. The depth/breadth trade-o in the design of menu-driven user interfaces. International Journal of Man-Machine Studies, 20:201{ 214, 1984. [KL94a] R.J. King and Y.K. Leung. Designing a user interface for folding editors to support collaborative work. In G. Cockton, S.W. Draper, and G.R.S. Weir, editors, HCI'94: People and Computers IX, pages 369{381, 1994. [KL94b] H.H. Koester and S.P. Levine. Validation of a keystroke-level model for a text entry system used by people with disabilities. In E.P. Glinert, editor, ASSETS'94: The First Annual International ACM/SIGCAPH Conference on Assistive Technologies, 1994. 31 October { 1 November, Los Angles. [Knu84] D.E. Knuth. Literate programming. The Computer Journal, 27:97{111, 1984. [KP85] D.E. Kieras and P.G. Polson. An approach to the formal analysis of user complexity. International Journal of Man-Machine Studies, 22:365{394, 1985. [KSDL89] R.J. Koubek, G. Salvendy, H.E. Dunsmore, and W.K. LeBold. Cognitive issues in the process of software development: Review and reappraisal. International Journal of Man-Machine Studies, 30:171{191, 1989. [KU93] A.A. Khwaja and J.E. Urban. Syntax-directed editing environments: Issues and features. In E. Deaton, G.H. Berghel, and G. Hedrick, editors, Applied Computing: States of the Art and Practice|1993, pages 230{237, New York, NY, 1993. ACM Press. Also in Proceedings of the 1993 ACM SIGAPP Symposium, Indianapolis, IN, 14{16 February. [KW86] D. Kiong and J. Welsh. An incremental parser for language-based editors. In Proceedings of the Nineth Australian Computer Science Conference, pages 107{118, 1986. [KW91] D. Kiong and J. Welsh. An incremental parse strategy for languagebased editors. Technical Report 208, Department of Computer Science, The University of Queensland, Brisbane, Australia, 1991.

BIBLIOGRAPHY [LA94]

197

Y.K. Leung and M.D. Apperley. A review and taxonomy of distortionoriented presentation techniques. ACM Transactions on ComputerHuman Interaction, 1(1):126{160, 1994. [Lan86] B. Lang. On the usefulness of syntax directed editors. In IFIP WG2.4 International Workshop on Advanced Programming Environments, pages 47{51, Trondheim, Norway, 1986. [Lar92] J.A. Larson. Interactive Software: Tools for Building Interactive User Interfaces. Prentice-Hall, Inc., Englewood Cli s, NJ, 1992. [Lin94] G. Lindgaard. Usability Testing and System Evaluation. Chapman & Hall, London, 1994. [LL93] J. Lowgren and U. Lauren. Supporting the use of guidelines and style guides in professional user interface design. Interacting with Computers, 5(4):385{396, 1993. [LNBN93] D.M. Lane, H.A. Napier, R.R. Batsell, and J.L. Naman. Predicting the skilled use of hierarchical menus with the keystroke-level model. Human-Computer Interaction, 8(2):185{192, 1993. [Mar90] C.D. Marlin. A distributed implementation of a multiple view integrated software development environment. In Proceedings of the Fifth Conference on Knowledge-Based Software Assistant, pages 388{402, Syracuse, NY, 1990. [May92] D.J. Mayhew. Principles and Guidelines in Software User Interface Design. Prentice-Hall, Inc., Englewood Cli s, NJ, 1992. [MBD+90] B. Magnusson, M. Bengtsson, L.-O. Dahlin, G. Fries, A. Gustavsson, G. Hedin, S. Minor, D. Ocsarsson, and M. Taube. An overview of the Mjlner/ORM environment: Incremental language and software development. Technical Report LU-CS-TR:90-57, Department of Computer Science, Lund University, Sweden, 1990. [McF91] G. McFarlane. Xmon UNIX Manual page, 1991. [MG93] D. Mitta and D. Gunning. Simplifying graphics-based data: applying the sheye lens viewing strategy. Behaviour and Information Technology, 12(1):1{16, 1993. [Mik81] M. Mikelsons. Pretty printing in an interactive programming environment. SIGPLAN Notices, 16(6):108{116, 1981. [Min90] S. Minor. On structure-oriented editing. PhD thesis, Department of Computer Science, Lund University, Sweden, 1990. [Min92] S. Minor. Interacting with structure-oriented editors. International Journal of Man-Machine Studies, 37:399{418, 1992.

BIBLIOGRAPHY [MJ84]

198

G.A. Milliken and D.E. Johnson. Analysis of Messy Data, volume 1: Designed Experiments. Lifetime Learning Publications, Belmont, California, 1984. [MM86] P. Muter and C. Mayson. The role of graphics in item selection from menus. Behaviour and Information Technology, 5(1):89{95, 1986. [MN90] R. Molich and J. Nielsen. Improving a human-computer dialogue. Communications of the ACM, 33(3):338{348, 1990. [MN94] R.L. Mack and J. Nielsen. Executive summary. In J. Nielsen and R.L. Mack, editors, Usability Inspection Methods, pages 1{23. John Wiley, New York, 1994. [MP90] Z. Mills and M. Prime. Are all menus the same? an empirical study. In D. Diaper, D. Gilmore, G. Cockton, and B. Shackel, editors, HumanComputer Interaction|INTERACT'90, pages 423{427, 1990. [MR89] S. Meyers and S.P. Reiss. Representing programs in multiparadigm software development environments. In Compsac 89, pages 420{427, 1989. [MR92] S. Meyers and S.P. Reiss. An empirical study of multiple-view software development. ACM SIGSOFT Software Engineering Notes, 17(5):47{ 57, 1992. Also in H. Weber, editor, ACM SIGSOFT'92: Fifth Symposium on Software Development Environments, Washington D.C., 9{11 December. [MRC91] J.D. Mackinlay, G.C. Robertson, and S.K. Card. The Perspective Wall: Detail and context smoothly integrated. In S.P. Robertson, G.M. Olson, and J.S. Olson, editors, Human Factors in Computing Systems| CHI'91, pages 173{179, 1991. [Mun94] E.V. Munson. Interoperability of software documents. In R.N. Taylor and J. Coutaz, editors, ICSE'94 Workshop on Software Engineering and Computer-Human Interaction: Joint Research Issues, pages 153{ 161, 1994. [Mus94] J. Musseler. Using predictors to partition menu selection times. Behaviour and Information Technology, 13(6):362{372, 1994. [MWD88] A.F. Monk, P. Walsh, and A. Dix. A comparison of hypertext, scrolling and folding as mechanisms for program browsing. In D.M. Jones and R. Winder, editors, HCI'88: People and Computers IV, pages 420{435, 1988. [Mye90] B.A. Myers. Taxonomies of visual programming and program visualization. Journal of Visual Languages and Computing, 1:97{123, 1990. [Mye94] B.A. Myers. Challenges of HCI design and implementation. Interactions, 1(1):73{83, 1994.

BIBLIOGRAPHY [Nel85] [Nie92]

199

J.A. Nelder. Glim77 Reference Manual. Royal Statistical Society, 1985. J. Nielsen. The usability engineering life cycle. Computer, 25(3):12{22, 1992. [Nor83] D.A. Norman. Some observations on mental models. In D. Getner and A.L. Stevens, editors, Mental Models. Lawrence Erlbaum Associates, Hillsdale, NJ, 1983. [NS90] L. Neal and G. Szwillus. Report on the CHI'90 workshop on structure editors. SIGCHI Bulletin, 22(2):49{53, 1990. [NSC+ 91] R.J. Norman, W. Stevens, E.J. Chikofsky, J. Jenkins, B.L. Rubenstein, and G. Forte. CASE at the start of the 1990's. In Proceedings of the Thirteenth International Conference on Software Engineering, pages 128{139, 1991. [NWK90] J. Neter, W. Wasserman, and M.H. Kutner. Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Design. Irwin, Homewood, IL, third edition, 1990. [OO90] J.R. Olson and G.M. Olson. The growth of cognitive modeling in human-computer interaction since GOMS. Human-Computer Interaction, 5(2{3):221{265, 1990. [Ope93] Open Software Foundation, Inc. OSF/Motif Style Guide. Prentice-Hall, Inc., Englewood Cli s, NJ, 1993. [OSS87] G.M. Olson, S. Sheppard, and E. Soloway, editors. Empirical Studies of Programmers: Second Workshop. Ablex, Norwood, NJ, 1987. [st95] K. sterbye. Literate Smalltalk programming using hypertext. IEEE Transactions on Software Engineering, 21(2):138{145, 1995. [OSWS93] T. Oberndorf, C. Schmiedekamp, P. Warminster, and V. Squitieri. Next generation computer resources (NGCR) project support environment standards (PSES). In Eleventh Annual National Conference on Ada Technology, pages 160{168, Williamsburg, VA, March 1993. [PBS93] B.A. Price, R.M. Baecker, and I.S. Small. A principled taxonomy of software visualization. Journal of Visual Languages and Computing, 4(3):211{266, 1993. [Ped94a] M. Pedersen. An analysis of introducing UQ LBEs to persistent storage environments. Software Veri cation Research Centre, Department of Computer Science, The University of Queensland, 1994. Working Paper #1. [Ped94b] M. Pedersen. Towards persistence for recognition editors. Software Veri cation Research Centre, Department of Computer Science, The University of Queensland, 1994. Working Paper #4.

BIBLIOGRAPHY [Pen87]

200

N. Pennington. Stimulus structures and mental representations in expert comprehension of computer programs. Cognitive Psychology, 19:295{341, 1987. [PG90] N. Pennington and B. Grabowski. The tasks of programming. In J.-M. Hoc, T.R.G. Green, R. Samurcay, and D.J. Gilmore, editors, Psychology of Programming, pages 45{62. Academic Press, London, 1990. [PJ92] V.A. Peck and B.E. John. Browser-Soar: A computational model of a highly interactive task. In P. Bauersfeld, J. Bennett, and G. Lynch, editors, Human Factors in Computing Systems|CHI'92, pages 165{ 172, 1992. [PLSS84] M. Power, C. Lashley, P. Sanchez, and B. Shneiderman. An experimental comparison of tabular and graphic data presentation. International Journal of Man-Machine Studies, 20:545{566, 1984. [PM88] G. Perlman and T. Moorhead. NaviTextSAM Software User Manual. Northern Lights Software Corporation, Westford, MA, 1988. [PP92] M. Petre and B.A. Price. Why computer interfaces are not like paintings: The user as a deliberate reader. In East-West International Conference on Human-Computer Interaction|EWHCI'92, pages 217{224, 1992. [PRS+ 94] J. Preece, Y. Rogers, H. Sharp, D. Benyon, S. Holland, and T. Carey. Human-Computer Interaction. Addison-Wesley, Wokingham, England, 1994. [RB86] S.P. Robertson and J.B. Black. Structure and development of plans in computer text editing. Human-Computer Interaction, 2(3):201{226, 1986. [RBS93] H.D. Rombach, V.R. Basili, and R.W. Selby. Experimental software engineering issues: Critical assessment and future directions. In H.D. Rombach, V.R. Basili, and R.W. Selby, editors, Lecturer Notes in Computer Science 706, pages v{xiii. Springer-Verlag, Berlin, 1993. Experimental Software Engineering Issues: Critical Assessment and Future Directions, International Workshop, September 1992. [Ree94] P. Reed. Standards: ANSI/HFES software user interface standardization: Critical issues. SIGCHI Bulletin, 26(2):12{15, 1994. [Rie94] D. Riecken (Guest Editor). Intelligent agents. Communications of the ACM, 37(7):19{21, 1994. [RMC91] G.C. Robertson, J.D. Mackinlay, and S.K. Card. Cone Trees: Animated 3D visualizations of hierarchical information. In S.P. Robertson, G.M. Olson, and J.S. Olson, editors, Human Factors in Computing Systems|CHI'91, pages 189{194, 1991.

BIBLIOGRAPHY [Rob88] [RT84]

[RT89a] [RT89b] [RW81] [RW90] [SA82] [SB92] [SB94] [Sca81] [SH94] [Shn83] [Shn92] [SI86]

201

T.L. Roberts. Text editors. In M. Helander, editor, Handbook of Human-Computer Interaction, pages 655{672. North Holland, Amsterdam, 1988. T.W. Reps and T. Teitelbaum. The Synthesizer Generator. SIGPLAN Notices, 19(5):42{48, 1984. Also in P. Henderson, editor, Proceedings of ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, Pittsburgh, May 1984. T.W. Reps and T. Teitelbaum. The Synthesizer Generator: A System for Constructing Language-Based Editors. Springer-Verlag, New York, 1989. T.W. Reps and T. Teitelbaum. The Synthesizer Generator Reference Manual. Springer-Verlag, New York, third edition, 1989. G.A. Rose and J. Welsh. Formatted programming languages. Software|Practice and Experience, 11:651{669, 1981. C. Rich and R.C. Waters. The Programmer's Apprentice. AddisonWesley, Reading, MA, 1990. R. Spence and M.D. Apperley. Data base navigation: An oce environment for the professional. Behaviour and Information Technology, 1(1):43{54, 1982. M. Sarkar and M.H. Brown. Graphical sheye views of graphs. In P. Bauersfeld, J. Bennett, and G. Lynch, editors, Human Factors in Computing Systems|CHI'92, pages 83{91, 1992. M. Sarkar and M.H. Brown. Graphical sheye views. Communications of the ACM, 37(12):73{84, 1994. D.L. Scapin. Computer commands in restricted natural language: Some aspects of memory and experience. Human Factors, 23:84{90, 1981. S.B. Shum and N. Hammond. Delivering HCI modelling to designers: a framework and case study of cognitive modelling. Interacting with Computers, 6(3):314{341, 1994. B. Shneiderman. Direct manipulation: A step beyond programming langauges. Computer, 16(8):57{69, 1983. B. Shneiderman. Designing the User Interface: Strategies for E ective Human-Computer Interaction. Addison-Wesley, Reading, MA, second edition, 1992. E. Soloway and S. Iyengar, editors. Empirical Studies of Programmers. Ablex, Norwood, NJ, 1986.

BIBLIOGRAPHY [Sie56]

202

S. Siegel. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, Tokyo, 1956. [SIK+ 82] D.C. Smith, C. Irby, R. Kimball, B. Verplank, and E. Harslem. Designing the star user interface. Byte, 7(4):242{288, 1982. [SL92] J.B. Smith and M. Lansman. Designing theory-based systems: a case study. In P. Bauersfeld, J. Bennett, and G. Lynch, editors, Human Factors in Computing Systems|CHI'92, pages 479{488, 1992. [SM84a] S.L. Smith and J.N. Mosier. Design guidelines for user-system interface software. Technical Report ESD-TR-84-190, The Mitre Corporation, Bedford, MA, 1984. NTIS AD A154 907. [SM84b] S.L. Smith and J.N. Mosier. The user interface to computer-based information systems: A survey of current software design practice. Behaviour and Information Technology, 3:195{203, 1984. [SM86] S.L. Smith and J.N. Mosier. Guidelines for designing user interface software. Technical Report ESD-TR-86-278, The Mitre Corporation, Bedford, MA, 1986. Also published as NTIS AD A177 198. [Smi88] S.L. Smith. Standards versus guidelines for designing user interface software. In M. Helander, editor, Handbook of Human-Computer Interaction, pages 877{889. North Holland, Amsterdam, 1988. [Sol84] E. Soloway. A cognitively-based methodology for designing languages, environments and methodologies. SIGPLAN Notices, 19(5):193{196, 1984. [Sol87] E. Soloway. I can't tell what in the code implements what in the specs. In G. Salvendy, editor, Cognitive Engineering in the Design of Human-Computer Interaction and Expert Systems, pages 317{328, 1987. Also in Proceedings of the Second International Conference on Human-Computer Interaction. [Som95] I. Sommerville. Software Engineering. Addison-Wesley, Wokingham, England, fth edition, 1995. [SSSW86] B. Shneiderman, P. Shafer, R. Simon, and L. Weldon. Display strategies for program browsing. IEEE Software, 3:7{15, May 1986. [Ste84] K.R. Stern. An evaluation of written, graphics, and voice messages in proceduralized instructions. In Proceedings of the Human Factors Society|28th Annual Meeting, pages 314{318, 1984. [Sun89] Sun Microsystems, Inc. OPEN LOOK Graphical User Interface Functional Speci cation. Addison-Wesley, Reading, MA, 1989. [SW83] D.W. Small and L.J. Weldon. An experimental comparison of natural and structured query languages. Human Factors, 25:253{263, 1983.

BIBLIOGRAPHY

203

[SZB+92] D. Scha er, Z. Zuo, L. Bartram, J. Dill, S. Dubs, S. Greenberg, and M. Roseman. Comparing sheye and full-zoom techniques for navigation of hierarchically clustered networks. Technical Report 92/491/29, Department of Computer Science, The University of Calgary, Calgary, Canada, 1992. [TC94] R.N. Taylor and J. Coutaz. Workshop on software engineering and computer-human interaction: Joint research issues. In Proceedings of the Sixteenth International Conference on Software Engineering, pages 356{357, 1994. [Thi90] H. Thimbleby. Failure in the technical user-interface design process. Computers and Graphics, 9(3):187{193, 1990. [TN91] H. Thovtrup and J. Nielsen. Assessing the usability of a user interface standard. In S.P. Robertson, G.M. Olson, and J.S. Olson, editors, Human Factors in Computing Systems|CHI'91, pages 335{341, 1991. [TS91] L. Tetzla and D.R. Schwartz. The use of guidelines in interface design. In S.P. Robertson, G.M. Olson, and J.S. Olson, editors, Human Factors in Computing Systems|CHI'91, pages 329{333, 1991. [TW91a] M.A. Toleman and J. Welsh. An analysis of the Smith and Mosier user interface guidelines. Department of Computer Science, The University of Queensland, June 1991. Working Document. [TW91b] M.A. Toleman and J. Welsh. Retrospective application of user interface guidelines: A case study of a language-based editor. In J.H. Hammond, R.R. Hall, and I. Kaplan, editors, People Before Technology| OZCHI'91, pages 33{38, 1991. [TW94a] M.A. Toleman and J. Welsh. An evaluation of editing paradigms. In S. Howard and Y.K. Leung, editors, Harmony Through Working Together|OZCHI'94, pages 73{78, 1994. Also available in extended version as Technical Report 94-5, Software Veri cation Research Centre, Department of Computer Science, The University of Queensland, Brisbane, Australia. [TW94b] M.A. Toleman and J. Welsh. Issues in the design of experiments for studying user interaction with software development tools. In R. Iannella, editor, Second Annual CHISIG (Queensland) Symposium| QCHI94, pages 49{54, 1994. Also available as Technical Report 9430, Software Veri cation Research Centre, Department of Computer Science, The University of Queensland, Brisbane, Australia. [TW95] M.A. Toleman and J. Welsh. An empirical investigation of languagebased editing paradigms. In H. Hasan and C. Nicastri, editors, HCI: A Light into the Future|OZCHI'95, pages 163{168, 1995. Also available as Technical Report 95-45, Software Veri cation Research Centre, Department of Computer Science, The University of Queensland, Brisbane, Australia.

[TW96] M.A. Toleman and J. Welsh. Can design choices for language-based editors be analysed with Keystroke-Level Models? In M.A. Sasse, R.J. Cunningham, and R.L. Winder, editors, People and Computers XI – Proceedings of HCI'96, pages 97–112, London, 1996. Springer-Verlag. Presented at HCI'96, London, 20–23 August. Also available as Technical Report 96-36, Software Verification Research Centre, Department of Computer Science, The University of Queensland, Brisbane, Australia.

[TWC92] M.A. Toleman, J. Welsh, and A.J. Chapman. An empirical investigation of menu design in language-based editors. ACM SIGSOFT Software Engineering Notes, 17(5):41–46, 1992. Also in H. Weber, editor, ACM SIGSOFT'92: Fifth Symposium on Software Development Environments, Washington D.C., 9–11 December.

[Van94] J. Vanderdonckt. The Tools for Working with Guidelines bibliography, 1994. Available at ftp://arzach.info.fundp.ac.be/pub/papers/jvd.

[Wat82] R.C. Waters. Program editors should not abandon text oriented commands. SIGPLAN Notices, 17(7):39–46, 1982.

[WBK91] J. Welsh, B. Broom, and D. Kiong. A design rationale for a language-based editor. Software – Practice and Experience, 21:923–948, 1991.

[WGR94] B.R. Whittle, R.J. Gautier, and M. Ratcliffe. Trends in structure-oriented environments. International Journal of Software Engineering and Knowledge Engineering, 4(1):123–157, 1994.

[WH94] J. Welsh and J. Han. Software documents: Concepts and tools. Software – Concepts and Tools, 15:12–25, 1994.

[WHBK87] N.H. Weiderman, A.N. Habermann, M.W. Borger, and M.H. Klein. A methodology for evaluating environments. SIGPLAN Notices, 22(1):199–207, 1987. Also in P. Henderson, editor, Proceedings of the Second ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, Palo Alto, December 1986.

[Whi90] A. Whitefield. Human-computer interaction models and their roles in the design of interactive systems. In P. Falzon, editor, Cognitive Ergonomics: Understanding, Learning and Designing Human-Computer Interaction, pages 7–25. Academic Press, London, 1990.

[Wie86] S. Wiedenbeck. Processes in computer program comprehension. In E. Soloway and S. Iyengar, editors, Empirical Studies of Programmers, pages 48–57. Ablex, Norwood, NJ, 1986.

[WLM94] P. Wright, A. Lickorish, and R. Milroy. Remembering while mousing: The cognitive costs of mouse clicks. SIGCHI Bulletin, 26(1):41–45, 1994.

[WRL86] J. Welsh, G.A. Rose, and M. Lloyd. An adaptive program editor. The Australian Computer Journal, 18(2):67–74, 1986.

[WT89] J. Welsh and M.A. Toleman. A case study in user interface design. In Proceedings of HCI Australia'89, pages 19–28, Melbourne, Australia, 1989.

[WT92] J. Welsh and M.A. Toleman. Conceptual issues in language-based editor design. International Journal of Man-Machine Studies, 37:419–430, 1992.

[Zel90] M.V. Zelkowitz. Evolution towards specifications environment: Experiences with syntax editors. Information and Software Technology, 32(3):191–198, 1990.

[Ziv94] H. Ziv. Research issues in the intersection of hypertext and software development environments. In R.N. Taylor and J. Coutaz, editors, ICSE'94 Workshop on Software Engineering and Computer-Human Interaction: Joint Research Issues, pages 99–105, 1994.

[ZKIH89] M.V. Zelkowitz, B. Kowalchack, D. Itkin, and L. Herman. Experiences building a syntax-directed editor. Software Engineering Journal, 4(6):294–300, 1989.

[ZMa] Z Editor User Manual. Department of Computer Science, The University of Queensland.
