in Proceedings of the 6th International Conference on HCI, Pacifico Yokohama (Japan), 9-14 July 1995, Edited by Y. Anzai and K. Ogawa

Software Tools for Evaluating the Usability of User Interfaces

Sandrine Balbo
Human-Computer Communications Centre
School of Information Technology
Bond University
4229 QLD - AUSTRALIA
Phone: +61 75 953331
Fax: +61 75 953320
Email: [email protected]
WWW: www.dstc.Bond.edu.au:8008/staff/sandrine.html

Abstract

In this article we review several techniques and methods for evaluating the usability of user interfaces (UI). So far, the evaluation process has mostly been based on "craft techniques" [Long 89], but as we will demonstrate, formalisation is possible, and a few software packages in this domain will be presented. The models and techniques we consider are:
• general guidelines such as those proposed by [Smith 86, Nielsen 90, Bastien 93],
• the Cognitive Walkthrough [Lewis 90],
• metrics [Whiteside 85, Bevan 94],
• usability labs [Hammontree 92, Weiler 93],
• predictive models [Young 90, Barnard 87],
• automatic monitoring systems [Siochi 91, Balbo 94] and
• critics [Löwgren 90, Kolsky 89].
We present these methods and techniques around a taxonomy developed by Joëlle Coutaz in [Coutaz 94], a taxonomy designed to help in the choice of a method for evaluating UI. We also highlight the role played by software tools in evaluating the usability of UI.

1. Introduction

The development of interactive systems is not an exact science. It relies on an iterative four-step process that includes requirement analysis, design, implementation, and evaluation. A number of tools are now available to support the first three phases. In particular, for the requirements and design phases, task models such as GOMS [Card 83] and MAD [Pierret-Goldreich 89] provide useful ways to structure the task space. Implementation is supported by an even wider variety of tools, ranging from toolboxes such as Motif [Motif 90], to application skeletons such as MacApp [Schmucker 86], and UI management systems such as SIROCCO [Normand 92] or Interface Builder [Webster 89]. On the other hand, the software engineering community has placed little emphasis on the evaluation of UI.
For the purpose of this article, we organise the evaluation methods around the taxonomy developed by Joëlle Coutaz in [Coutaz 94]. This taxonomy is designed to help the person in charge of the evaluation (not always an expert) formulate questions that guide the choice of the right tool for conducting the evaluation. It proposes a set of five preoccupations, where each preoccupation is decomposed into a set of axes. These five preoccupations concern knowledge resources, hardware resources, environment resources, human resources, and outcomes. Let us now examine each preoccupation in turn.

2. Knowledge resources

[Figure: Knowledge resources axes — Descriptions (user model, task model, external specifications, scenarios, none) and Ergonomic expertise (high, low, none).]

The first preoccupation with which this taxonomy is concerned is the knowledge needed to conduct the evaluation. We consider two types of knowledge required in conducting an evaluation: the description needed as an input for the evaluation, and the level of expertise required from the evaluator in order to perform the evaluation.
• The Smith & Mosier guidelines result from empirical knowledge, inferred by observation and experimentation [Smith 86]. They can be used by the human factors specialist just by “looking at the interface”. These guidelines propose a set of 944 recommendations concerning very specific points to be examined, and they are quite difficult to manage for an expert who is not an ergonomist. Thus, in the case of the Smith & Mosier guidelines, if we consider the knowledge resources preoccupation, no description is necessary but the ergonomic expertise of the evaluator must be high.
By providing tools and software packages to the evaluator, we try to reduce the amount of ergonomic expertise needed to conduct the evaluation; a small illustration of such tool support follows.
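To give a concrete idea of how even simple software support can help an evaluator manage such a large body of recommendations, the short sketch below stores a few guidelines as plain records and filters them by a keyword stem. The record format and the filtering function are our own illustration, not part of the Smith & Mosier material; only the two guideline identifiers quoted later in this article are real.

# A minimal sketch of machine-readable guidelines; the record format is
# hypothetical, and only the two entries quoted from Smith & Mosier are real.
GUIDELINES = [
    {"id": "2.7.1/4", "text": "Consistent format for display labels"},
    {"id": "4.4/9", "text": "Consistent format for prompts"},
]

def find_guidelines(stem):
    # Return every guideline whose text contains the given keyword stem.
    stem = stem.lower()
    return [g for g in GUIDELINES if stem in g["text"].lower()]

for g in find_guidelines("consisten"):
    print(g["id"], "-", g["text"])

Such a keyword search is what makes it possible to observe, for instance, that the stem “consisten” appears in 142 of the 944 guidelines (see section 3).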

3. Environment resources

The next preoccupation concerns the environment resources, which define the context of the evaluation. This dimension is expressed along a set of five axes: the location where the evaluation takes place, the structure of the dialogue provided by the interface, the phase of the software development life cycle in which the evaluation may be conducted, the type of interface being evaluated, and the financial or temporal constraints on the evaluation.

[Figure: Environment resources axes — Software development phase (requirement analysis, design, implementation, tests), Dialogue structure (flexible, mixed, rigid), User interface type (mono/multi user, multimedia/multimodal, direct manipulation), Location (in the zoo, in the wild), Financial & temporal constraints (money limit, calendar limit).]
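Before turning to specific methods, we note that the environment resources lend themselves to a simple machine-readable encoding. The sketch below records the context of one hypothetical evaluation along the five axes of the figure above; the field names and the toy budget rule are our own assumptions and are not part of [Coutaz 94].

# Hypothetical encoding of the environment resources of one planned evaluation.
evaluation_context = {
    "software_development_phase": "design",   # requirement analysis, design,
                                              # implementation or tests
    "dialogue_structure": "mixed",            # flexible, mixed or rigid
    "user_interface_type": ["multi user", "direct manipulation"],
    "location": "in the zoo",                 # controlled setting, vs. "in the wild"
    "constraints": {"money_limit": True, "calendar_limit": False},
}

def budget_allows_several_evaluators(context):
    # Toy rule: criteria-based methods call for at least three evaluators
    # (see below), which a tight budget may rule out.
    return not context["constraints"]["money_limit"]

print(budget_allows_several_evaluators(evaluation_context))   # False for this tight-budget example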

• [Nielsen 90] and [Bastien 93] propose sets of ergonomic criteria derived from existing guidelines, similar to the Smith & Mosier guidelines but structured differently. They supply a “way of improving the completeness and explicitness of the diagnosis, of standardising the format of the evaluation, and of better documenting the evaluation” [Bastien 93, p. 1]. This is achieved by giving recommendations of a higher level than those proposed by Smith & Mosier. For example, [Bastien 93] treats consistency as a single criterion, whereas Smith & Mosier introduce it at several places, as in guideline “2.7.1/4 Consistent format for display labels” or guideline “4.4/9 Consistent format for prompts”. We note that in the Smith & Mosier guidelines, the stem “consisten” (cf. consistent, consistency, etc.) appears in 142 guidelines.
• The Cognitive Walkthrough supplies another type of support for evaluation, in the form of a questionnaire [Lewis 90]. This questionnaire focuses primarily “on ease of use for first-time users, using a theoretical analysis based on the CE+ theory of Polson and Lewis” [Rieman 91, p. 427].
Regarding methods such as those proposed by [Bastien 93], [Nielsen 90] or the Cognitive Walkthrough, the only restriction concerning the environment resources refers to the financial and temporal constraints. A study by Pollier [Pollier 91] shows that, to be effective, an evaluation needs to be conducted by at least three evaluators: an evaluator working alone will find an average of only 42% of the problems in the interface (see the sketch below). Therefore, if the budget is tight, such methods will not be suitable.
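The 42% figure invites a quick back-of-the-envelope calculation. The sketch below estimates the proportion of problems expected to be found by k evaluators under the simplifying assumption that each evaluator detects any given problem independently with probability 0.42; neither this independence assumption nor the resulting formula comes from [Pollier 91], they are only meant to illustrate why a single evaluator is not enough.

# Expected coverage with k evaluators, each assumed to find any given problem
# independently with probability 0.42 (an illustrative assumption only).
def expected_coverage(k, p=0.42):
    return 1.0 - (1.0 - p) ** k

for k in range(1, 6):
    print(k, "evaluator(s):", round(expected_coverage(k) * 100), "% of problems")

Under this rough model, three evaluators would already cover about 80% of the problems, which is consistent with the recommendation above.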

4. Human resources

[Figure: Human resources axes — Number of evaluators, Expertise of the evaluators (levels of expertise; usability culture, quality assurance culture), Number of subjects, Subject types (representative/effective), Origin of subjects (external, in house).]
The human resources concern the persons involved in the evaluation process. This may refer to the evaluators as well as to the subjects. For the evaluators, the taxonomy takes into account their number and their level of expertise, which is directly linked to the knowledge resources preoccupation. The second aspect of the human resources concerns the users: their number, but also their type and their origin.
Regarding the methods previously discussed, let us highlight the fact that at least three evaluators are needed to cover the ergonomic problems in the interface. The only method using software help was the Cognitive Walkthrough, which introduces an Apple HyperCard stack to manage the navigation between the different questionnaires [Rieman 91].
A second step towards the use of software tools during the evaluation stage is to provide quantitative observations. This is accomplished by the use of predictive techniques in the early stages of the software development life cycle, or by the use of experimental techniques, including metrics or usability labs. To illustrate the predictive techniques, we present below the Programmable User Model (PUM) and the Interacting Cognitive Sub-systems (ICS), which are both based on predictive models.
• PUM predicts learnability or usability problems from a formal description of the knowledge required from the end user in order to run the UI [Young 90]. This description is compiled into rules, which represent the user's ability to accomplish a particular task with a particular UI. Based on this knowledge, the PUM cognitive architecture tries to elaborate a plan. If no plan can be generated, the designer is notified of a potential usability or learnability problem.
• ICS models the human architecture for perception, cognition, and action [Barnard 87]. Each of these three entities is decomposed into sub-systems; a total of nine sub-systems compose the ICS model. These sub-systems process information in parallel and communicate through a common data bus. Observing the flow of data on the bus among the sub-systems provides information about the cognitive complexity of the interface.
The advantage of such techniques is that they are applicable early in the development life cycle. On the other hand, the expertise of the evaluator must be high in order to interpret the results they provide. Another disadvantage is that they are based on hypotheses rather than on real data, unlike the experimental methods to which we now turn.
• Early in 1985, Whiteside et al. introduced a metric of UI usability which allows the comparison of different UIs for the same system [Whiteside 85]. The metric expresses the performance of a user on a task in terms of the rate at which work is completed: (1/T) * P * C, where T = time spent on the task (in minutes), P = percentage of the task completed, and C = an arbitrary time-unit constant, equal to the fastest time spent on the task by an expert (a worked example is sketched at the end of this section).
• More recently, the objective of the MUSiC (Measuring the Usability of Systems in Context) project is to provide tools which can be used to measure user performance, cognitive workload, and user-perceived quality in the laboratory and in the field [Bevan 94]. Several tools have been developed to achieve that goal: the Evaluation Design Manager, the video analysis support tool DRUM, SUMI to measure user-perceived quality, the Measurement of Mental Effort tool, and the Analytical Measurement tool.
The human resources are centred on the evaluator as well as the user. We believe that both these factors are central to the evaluation of the usability of UI.
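As a worked example of the Whiteside et al. metric, the sketch below computes (1/T) * P * C for a few sessions. The session times and completion rates are invented for illustration; with C set to the fastest expert time, a perfect expert scores 100.

# Worked example of the usability metric of [Whiteside 85]:
#   score = (1/T) * P * C
# T = time on task in minutes, P = percentage of the task completed,
# C = fastest expert time on the task. The session data are invented.
EXPERT_BEST_TIME = 2.0   # minutes

def usability_score(time_minutes, percent_completed, c=EXPERT_BEST_TIME):
    return (1.0 / time_minutes) * percent_completed * c

sessions = [
    ("expert", 2.0, 100),     # complete, at the reference speed -> 100
    ("novice A", 5.0, 100),   # complete, but slower -> 40
    ("novice B", 4.0, 60),    # faster than A, yet only 60% completed -> 30
]
for name, t, p in sessions:
    print(name, usability_score(t, p))

Because the score rewards both speed and completeness on the same task, two interfaces for the same system can be compared directly.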

5. Hardware resources

[Figure: Hardware resources axes — What is evaluated (product, version, prototype, mockup, task model, user model) and Instruments for data capture (usability lab, computer, paper & pencil, none).]

The hardware resources cover the physical components of the evaluation. They include the object of the evaluation (i.e. what is evaluated) and the instruments used to capture the data. This second axis emphasises again the importance of observing the end user manipulating the UI while conducting the evaluation.
As a further step towards the automation of UI usability and learnability evaluation, we now review tools that do observe the end user in action. These actions are recorded during task execution with the real system, on video tape or on the computer itself through event capture. We first introduce usability labs and the tools developed to analyse their results.
• Usability labs are generally used to record sessions of the user manipulating a UI without being intrusive [Weiler 93]. These sessions can be evaluated on the fly, as well as later on. The tools that help the evaluator afterwards cover a vast range, from a simple replay of the sessions to filters and multimedia data analysers that allow the evaluator to go through the sessions in an effective way [Hammontree 92].
The capture of events on the computer opens the perspective of an automatic analysis of the recorded information. Going in that direction, we present two software packages: Maximal Repeating Pattern (MRP) and ÉMA, an automatic analysis mechanism.
• MRP detects repeating patterns of behaviour by analysing the log of the end user's actions [Siochi 91]. These actions are recorded during task execution with the real system. Detected patterns can then be used by the human factors specialist as driving cues for the evaluation (see the sketch at the end of this section).
• The automatic analysis provided by ÉMA is not only based on the acquisition of the end user's actions, but also uses a data-flow oriented dialogue model of the interface being tested to detect patterns of behaviour modelled by a set of heuristic rules. ÉMA automatically constructs an analysis that helps the evaluator perform the evaluation [Balbo 94].
In the case of MRP or ÉMA, a prototype or a version of the product is needed, and an automatic analysis is provided. As for usability labs, they may apply to task models or user models as well, but do not provide automatic analysis.
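To make the idea behind MRP-style log analysis concrete, the sketch below extracts repeated action patterns from a small, hypothetical event log. It is only an illustration of the principle; the actual maximal repeating pattern analysis of [Siochi 91] is more elaborate and works on real interaction logs.

from collections import defaultdict

def repeating_patterns(events, min_len=2, min_count=2):
    # Count every contiguous sub-sequence of at least min_len events.
    counts = defaultdict(int)
    n = len(events)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            counts[tuple(events[i:j])] += 1
    repeated = {p: c for p, c in counts.items() if c >= min_count}
    # Keep only "maximal" patterns: drop a pattern contained in a longer
    # repeated pattern that occurs just as often.
    def contains(longer, shorter):
        k = len(shorter)
        return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))
    return {p: c for p, c in repeated.items()
            if not any(len(q) > len(p) and cq == c and contains(q, p)
                       for q, cq in repeated.items())}

# Hypothetical event log captured while the end user works with the UI.
log = ["open", "select", "copy", "paste", "select", "copy", "paste", "close"]
for pattern, count in repeating_patterns(log).items():
    print(count, "x", " -> ".join(pattern))   # 2 x select -> copy -> paste

A pattern such as "select, copy, paste" repeated many times is exactly the kind of cue a human factors specialist could use, for example to suggest a shortcut.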

6. Outcomes

[Figure: Outcomes axes — Rendering support (computer, video, paper & pencil) and Information types (objective/subjective, quantitative/qualitative, predictive/explicative/corrective).]

The outcomes of an evaluation technique or method are characterised by the rendering support as well as by the type of information provided. This second axis allows objective information, quantitative results, or corrective outputs to be distinguished. We present here two systems, Knowledge-based Review of user Interfaces (KRI) and SYNOP, that provide corrective output automatically.
• KRI provides the designer with an automatic critic [Löwgren 90]. The tool uses a collection of ergonomic rules to evaluate a formal description of the interface. The evaluation produces a set of comments and suggestions. The improvements, however, are concerned with lexical issues only, such as the minimum distance between two icons or the ordering of menu items (see the sketch below), and only form-filling interfaces are covered by this technique.
• The expert system SYNOP is similar to KRI with regard to the services it provides [Kolsky 89]. However, its organisation differs, as do the interfaces it addresses: SYNOP is applicable only to control system applications.
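The following sketch illustrates the rule-based critic idea in the spirit of KRI or SYNOP. The interface description, the two rules, and the distance threshold are all hypothetical; the real tools use their own formal notations and much richer rule bases.

# A minimal rule-based critic over a hypothetical formal UI description.
MIN_ICON_DISTANCE = 32   # pixels; an assumed threshold for illustration only

def check_icon_spacing(description):
    # Flag pairs of icons placed closer than the assumed minimum distance.
    comments = []
    icons = [w for w in description["widgets"] if w["type"] == "icon"]
    for i, a in enumerate(icons):
        for b in icons[i + 1:]:
            dist = ((a["x"] - b["x"]) ** 2 + (a["y"] - b["y"]) ** 2) ** 0.5
            if dist < MIN_ICON_DISTANCE:
                comments.append(f"Icons '{a['name']}' and '{b['name']}' are only "
                                f"{dist:.0f}px apart (minimum {MIN_ICON_DISTANCE}px).")
    return comments

def check_menu_ordering(description):
    # Flag menus whose items are not in alphabetical order (a stand-in for
    # the kind of lexical rule mentioned above).
    return [f"Menu '{m['name']}': items are not ordered alphabetically."
            for m in description["menus"] if m["items"] != sorted(m["items"])]

ui = {
    "widgets": [{"type": "icon", "name": "open", "x": 10, "y": 10},
                {"type": "icon", "name": "save", "x": 20, "y": 10}],
    "menus": [{"name": "File", "items": ["Save", "Open", "Quit"]}],
}
for rule in (check_icon_spacing, check_menu_ordering):
    for comment in rule(ui):
        print(comment)

Both rules operate on the description alone, without any end user in the loop, which is precisely the limitation discussed below.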

With KRI or SYNOP, the level of automation of the evaluation is very high: SYNOP, for example, is able to modify the interface by itself. However, these systems exclude the user from the evaluation, and thereby lose the constructive information that observing users can provide.

7. Conclusion

This discussion has resulted in a comparative analysis of some of the software packages available to help in evaluating the usability of UI. We note four different uses of software tools during the evaluation of the usability of UI: help in conducting the analysis (Cognitive Walkthrough, usability labs, MUSiC), computerised capture (usability labs, MRP, ÉMA), automatic analysis (MRP, ÉMA), and automatic critic (KRI, SYNOP). The outcomes preoccupation is concerned with the rendering support and the type of information provided. We should consider including in that preoccupation a new axis as well, referring to the use of software during the evaluation.

Acknowledgments

Many thanks to Nadine Ozkan, Michael Rees and Dominique Scapin for their review of this paper.

REFERENCES
[Balbo 94] S. Balbo, “Évaluation ergonomique des interfaces utilisateur : un pas vers l'automatisation”, PhD thesis, University of Grenoble I, France, September 1994
[Barnard 87] P.J. Barnard, “Cognitive Resources and the Learning of Human-Computer Dialogs”, in Interfacing Thought, edited by J.M. Carroll, The MIT Press, pp. 112-158, 1987
[Bastien 93] J-M. C. Bastien & D.L. Scapin, “Ergonomic Criteria for the Evaluation of Human-Computer Interfaces”, Rapport Technique INRIA no 156, June 1993
[Bevan 94] N. Bevan & M. Macleod, “Usability measurement in context”, Behaviour and Information Technology, Vol. 13, Nos 1 and 2, pp. 132-145, 1994
[Card 83] S.K. Card, T.P. Moran & A. Newell, “The Psychology of Human-Computer Interaction”, Lawrence Erlbaum Associates, 1983
[Coutaz 94] J. Coutaz & S. Balbo, “Évaluation des interfaces utilisateur : taxonomie et recommandations”, IHM'94, Human-Computer Interaction Conference, Lille (France), December 1994
[Hammontree 92] M.L. Hammontree, J.J. Hendrickson & B.W. Hensley, “Integrated data capture and analysis tools for research and testing on graphical user interfaces”, in Proceedings of the CHI'92 Conference, pp. 431-432, ACM Press, Monterey, 3-7 May 1992
[Kolsky 89] C. Kolski, “Contribution à l'ergonomie de conception des interfaces graphiques homme-machine dans les procédés industriels : application au système expert SYNOP”, PhD thesis, Université de Valenciennes et du Hainaut-Cambrésis, January 1989
[Lewis 90] C. Lewis, P. Polson, C. Wharton & J. Rieman, “Testing a Walkthrough Methodology for Theory-Based Design of Walk-Up-and-Use Interfaces”, Proceedings of the CHI'90 Conference, Seattle, April 1990, ACM, New York, pp. 235-242

[Long 89] J. Long & J. Dowell, “Conceptions of the Discipline of HCI: Craft, Applied Science, and Engineering”, in Proceedings of the Fifth Conference of the BCS HCI SIG, A. Sutcliffe and L. Macaulay (Eds), Cambridge University Press, 1989
[Löwgren 90] J. Löwgren & T. Nordqvist, “A Knowledge-Based Tool for User Interface Evaluation and its Integration in a UIMS”, Human-Computer Interaction - INTERACT'90, pp. 395-400
[Motif 90] OSF/Motif Programmer's Guide, Open Software Foundation, Prentice Hall, 1990

[Nielsen 90] J. Nielsen & R. Molich, “Heuristic evaluation of user interfaces”, Proceedings of the CHI'90 Conference, Seattle, 1990, ACM, New York, pp. 249-256
[Normand 92] V. Normand, “Le modèle Sirocco : de la spécification conceptuelle des interfaces utilisateur à leur réalisation”, PhD thesis, University of Grenoble I, France, 1992
[Pierret-Goldreich 89] C. Pierret-Goldreich, I. Delouis & D.L. Scapin, “Un Outil d'Acquisition et de Représentation des Tâches Orienté-Objet”, rapport de recherche INRIA no 1063, Programme 8 : Communication Homme-Machine, August 1989
[Pollier 91] A. Pollier, “Évaluation d'une interface par des ergonomes : diagnostics et stratégies”, Rapport de recherche INRIA no 1391, February 1991
[Rieman 91] J. Rieman, S. Davis, D.C. Hair, M. Esemplare, P. Polson & C. Lewis, “An Automated Cognitive Walkthrough”, CHI'91 Conference Proceedings, New Orleans (USA), ACM Press, Addison-Wesley, April 27 - May 2, 1991
[Schmucker 86] K. Schmucker, “MacApp: an application framework”, Byte 11(8), 1986, pp. 189-193
[Siochi 91] A.C. Siochi & D. Hix, “A study of computer-supported user interface evaluation using maximal repeating pattern analysis”, CHI'91 Conference Proceedings, New Orleans (USA), ACM Press, Addison-Wesley, April 27 - May 2, 1991
[Smith 86] S.L. Smith & J.N. Mosier, “Guidelines for designing user interface software”, Report MTR-10090 ESD-TR-86-278, The MITRE Corporation, Bedford, MA, August 1986
[Webster 89] B.F. Webster, “The NeXT Book”, Addison-Wesley, Reading, Mass., 1989
[Weiler 93] P. Weiler, “Software for the Usability Lab: a Sampling of Current Tools”, INTERCHI'93 Conference Proceedings, Amsterdam (The Netherlands), ACM Press, Addison-Wesley, April 24-29, 1993
[Whiteside 85] J. Whiteside, S. Jones, P.S. Levy & D. Wixon, “User Performance with Command, Menu, and Iconic Interfaces”, in Proceedings of the CHI'85 Conference, ACM Press, Addison-Wesley, April 1985
[Young 90] R.M. Young & J. Whittington, “A Knowledge Analysis of Interactivity”, Proceedings of INTERACT'90, edited by D. Diaper, G. Cockton & B. Shackel, Elsevier Scientific Publishers B.V., pp. 207-212, 27-31 August 1990, Cambridge (United Kingdom)