GRAPHICAL USER INTERFACE EVALUATION FOR MESSAGING AND DIRECTORY SYSTEMS

Renato Iannella
Distributed Systems Technology Centre Pty Ltd
University of Queensland
Queensland, 4072, AUSTRALIA
Phone: +61 7 365 4310
Fax: +61 7 365 4311
Email: [email protected]

ABSTRACT

Two key international standards for messaging and directory services are evaluated using a prototype system. The aim is to develop a graphical user interface reference model that can be used by implementors to guide and benchmark their own development. The evaluation process that enabled the specification of the reference model is discussed. The development and evaluation of such graphical user interface reference models is seen as a key factor in the acceptance of user-oriented communication standards.

KEYWORDS

User Interface Evaluation; Usability Evaluation; Messaging Systems; Directory Systems; Reference Models.

INTRODUCTION

This paper reports on the evaluation of a graphical user interface (GUI) for X.400 Message Handling Systems (MHS) incorporating X.500 Directory Services (DS). The X.400 MHS international standard is a complex and encompassing protocol describing the interchange of electronic mail messages between cooperating systems. The X.500 DS international standard protocol describes the structure and contents of a highly distributed database. One of the primary visions of X.500 is to act as a global electronic mail address directory and to cater for the proliferation of complex addressing structures used in X.400 MHS. A key area of concern for both the X.400 and X.500 standards is that they do not provide any guidelines for the user interfaces, called User Agents (UA), to systems utilising these standards.

The usability evaluation involved the selection of appropriate and effective evaluation methodologies. Each method has advantages and disadvantages, both in terms of the time and cost involved to achieve varying levels of results. For the evaluation of the UA, three methods were selected on the basis of cost-effectiveness and identification of user interface problems. This selection process was controlled by the resources available for the evaluation and the level of expertise required to administer each method. The evaluation methods selected comprised a scenario-based usability trial utilising the thinking-aloud method, followed up with user questionnaires.

The scenario-based usability evaluation described a number of common tasks in the daily use of electronic mail systems. The identified tasks cover a range of user activities that should be supported by X.400 MHS, including the use of X.500 DS for address searching. The usability tests were video taped, which allowed a complete and thorough analysis of each user's interaction with the UA. A series of questionnaires is used to back up the findings of the analysis and provide a statistical analysis across the usability trials. The method used to analyse the video tapes is discussed as it formed an integral part of the evaluation method. A summary of the usability evaluation results is used to identify aspects of the UA interface that may require redesign. The UA prototype is then modified to reflect the new interface design changes. The usability evaluation process is repeated on the redesigned interface in an iterative manner. The results from the second evaluation are then discussed and matched against the original interface problems.

USER INTERFACE EVALUATION

The evaluation of the interface is undoubtedly one of the most important aspects of software development. Without evaluation, designers would have little indication as to the successful and, more importantly, the unsuccessful interface designs. From a management point of view, the evaluation of a system is important in determining the feasibility of introducing Human-Computer Interaction (HCI) usability techniques (e.g. rapid prototyping, task analysis) into the development lifecycle. Evaluation also allows some comparison of the cost savings of new (and presumably better) interfaces over existing systems. Karat (1993) reports on a possible cost-benefit ratio of 1:100 for projects involving usability evaluation of the interface.

Usability is defined as an evaluation of a system that is designed to measure the functionality of the system in respect to (Potosnak, 1988):
1. real users undertaking,
2. real tasks, with
3. real products.
The users should be representative of the typical user of that application and the number should be of optimal size for the anticipated evaluation results. The tasks should represent a whole user task and allow assessment of the interface for consistency with the users' conceptual models of the tasks. The usability tests should be performed on real products or working prototypes and not rely on the users' imagination of the final product. Evaluations are normally performed during the development of the system to catch any potential problems early.

There are numerous methods for evaluation and usually a combination is used to provide a complete evaluation of an interface. Such evaluations cover formal, empirical, contextual, and ethnographic approaches to interface analysis. A balance must be achieved to generate results that are applicable and representative of the interface under evaluation. A number of evaluation methods are reviewed in Nielsen (1989).

One of the most interesting and cost-effective human factors methodologies for interface evaluation is the 'thinking-aloud' method (Lewis, 1982), in which users 'talk aloud' as they perform their tasks. The thinking-aloud method has been shown to be highly effective (Jorgensen, 1989). Another evaluation method utilises 'scenarios', which are small tasks that the user may typically perform on a regular basis when interacting with software. The designer can get quick and frequent feedback by developing effective and small scenarios that cover the bulk of the typical tasks a user may perform. The scenario may reduce the level of prototype functionality but may instead emphasise a specific interface style under evaluation. The prototypes used may also be developed more quickly so as to only simulate the interface being tested with the scenario.

Heuristic evaluation involves the use of a small set of major guidelines that represent the largest proportion of problems in user interfaces. Such a set can be found in (Molich & Nielsen, 1990). Each heuristic needs to be applied to the interface under evaluation, which requires some experience with the principles. One of the least expensive usability evaluation methods is the short questionnaire asking users about the performance and functionality of a system. Questionnaires require a lot of planning but do allow data to be collected from large numbers of users. The questions should be formatted to gather as much quantitative information as practical and should be worded to elicit such responses.

USABILITY EVALUATION METHOD

The usability evaluation was undertaken on a prototype UA for an X.400 MHS. The X.400 UA was initially designed utilising a User Interface Management System, then subsequently implemented under the X Window System with the OSF/Motif graphical user environment. Initial designs were subjected to a heuristic evaluation to capture any interface problems early in the development. The usability evaluation devised for the X.400 UA consisted of:
• five scenarios
• thinking-aloud method
• video-taped sessions
• questionnaires
The video allows the evaluation to be reviewed a number of times at a later stage. If the review of the video highlights problems with the interface, then a subsequent usability evaluation can be performed after a redesign of the interface. The process can then continue in this iterative manner. Video also allows developers to see their software in actual use and provides a permanent record of the events. Figure 1 shows a typical set-up for the usability trial.

Figure 1: Usability Evaluation Set-up (video camera, mirror mounted on the monitor, and scenario tasks sheet)

The taping of the trialists provides a convenient mechanism for reviewing their interactions with the software, as all actions can be scrutinised. As an added benefit, a small mirror was added to the top-left of the computer monitor to provide feedback on the eye movements of the trialist. This provides extra information as to where the trialist is looking on the screen. The trialists were also asked to use the thinking-aloud method to provide further feedback on what they were thinking during the trial.

Since the trialists were being video-taped, assurances had to be made to the users to protect their confidentiality. Each trialist was asked to sign a consent form that enforced this and also followed organisational guidelines for experimentation involving human subjects. The consent form used was an adaptation from Perlman (1992).

Scenarios

For the evaluation of the X.400 UA, the trialists were asked to perform a number of typical tasks common for electronic mail users. These scenarios enable a benchmark in the final evaluation of the interface. The scenarios developed for the usability evaluation reflect five common tasks that are typical of day-to-day use of electronic mail applications. The scenario tasks are representative of realistic tasks and cover all of the system's functionality. Table 1 lists two of the scenarios used in the evaluation.

Table 1: Usability Scenarios

Task 1: Read the mail message from John Lewis about the FIMS standard. Reply to the message and ask him to send you a copy of the FIMS standard. Move the message to the folder about FIMS standards.

Task 2: You and your work colleague Joe Hook are working on the HCCC Annual Report. The report is due soon and you wish to send Joe a message asking if he has finished with his part of the report. You also require the following to be specified before sending the message:
1. Since this is urgent and highly important, indicate that you wish to receive automatic notification of when he reads this message.
2. Also indicate that you wish to receive a reply from him by the 1st July 1993.
3. Make sure a copy of the message gets sent to the dean of the school.

Each scenario was designed to utilise and test certain functionalities of the X.400 UA in a typical 'real-world' situation. The scenarios offer challenges to the trialists in transferring each task-metaphor to the X.400 UA interface.

Questionnaires

A number of questionnaires were formulated to elicit quantitative measures from the usability trials and provide an indication of the background of the trialists. Questionnaires are an easy and quick method of providing extra feedback on the user interface. The first questionnaire asks the trialist about the number of years of experience they have had with computers and various electronic mail systems. This small questionnaire establishes the history and ensures a common background among the trialists. The second questionnaire focuses on the functionality of the X.400 UA. The trialist is asked to indicate their agreement with a number of statements about the clarity of each messaging function that was involved in the usability evaluation. Table 2 lists a small sample of the 15 questions asked. The third questionnaire focuses on the graphical user interface of the X.400 UA. The trialist is asked to indicate their agreement with a number of statements about aspects of the X.400 UA interface. The questions were taken from a small subset of (Ravden & Johnson, 1989). Ravden & Johnson provide a detailed checklist of numerous questions grouped into nine sections. For the interface questionnaire, only two questions from each of seven sections were selected, which provided a small and concise set targeted at the interface under evaluation. Table 3 lists a sample of the 15 questions asked. It was important to present these two questionnaires separately so as to best elicit the trialists' responses. If the questionnaires were combined, it would be difficult to ascertain whether the responses were highlighting problems in the interface or inappropriate functionality.

Table 2: Functionality Questionnaire

Rate your agreement with the following statements (Place a ✔ in the appropriate column):
Strongly Disagree / Disagree / Neutral / Agree / Strongly Agree

It was clear to read a mail message
It was clear to reply to a mail message
It was clear to move a message to a folder
It was clear on setting message options
It was clear on addressing mail messages
Overall, the system provided adequate functions for electronic mail

Table 3: Interface Questionnaire

Rate your agreement with the following statements (Place a ✔ in the appropriate column):
Never / Some of the Time / Neutral / Most of the Time / Always

Is the way the system responds to user actions consistent at all times
Are status messages informative and accurate
Is it clear what different parts of the system do
Is the system flexible in allowing the user to choose options
Does the system protect against errors in user actions
Overall, the interface was pleasing and easy to use

Trial Process

The trialists for the evaluation were selected from undergraduate computing students, all of whom had experience using graphical and non-graphical electronic mail systems but none of whom had experience with X.400 mail. The trialists were paid $A15/hour for their participation in the evaluation, which took no longer than one hour, and each trial group consisted of six students. A sample size of six is double the number recommended by Perlman (1992) and also allows quantitative differences to be assessed. The following steps were taken for each trialist:
1. Explain the concept of usability evaluations and the purpose of the research.
2. Explain the X.400 UA prototype.
3. Reconfirm that the trial will be video-taped.
4. Explain that the trial is to test the software, not the users.
5. Explain the consent form and emphasise the confidentiality clause.
6. Issue the user experience questionnaire.
7. Explain the five task scenarios.
8. Explain (with an example) the thinking-aloud method.
9. Run video and check microphone.
10. If the user seems stuck during the trial, ask the trialist what they are 'thinking about'.
11. When finished, stop video.
12. Issue the functionality and interface questionnaires.
13. Ask the trialist not to divulge the content of the evaluation to other people (particularly other trialists).
14. Thank and reimburse the trialist.
A 'dry-run' was first performed on a test trialist to gain experience with the procedure and to get feedback from the trialist on the sequence of steps or any other aspect that may have been unclear.

Video Review

After each trial, a comprehensive review of the video tape is undertaken to analyse each trialist. As each trialist performs each task scenario, the entire sequence of steps performed is recorded, as well as the time taken to complete the scenario. Table 4 shows an example of the level of detail recorded for each trialist. The example shows a trialist's attempt at a scenario task and lists all the actions and possible problems.

Table 4: Usability Task Recording Example

Sub Task: Address Message
1. Selected Dean of School nickname
2. Clicked on Add button
3. Realised incorrect; clicked on recipient name and Remove button
4. Selected 'Copy' radio button and added correctly

Sub Task: Message Options
1. Clicked on Recipient Options button; no response (again)
2. Clicked on Joe Hook in recipient list
3. Clicked on Recipient Options button
4. Selected Receipt-Notification and Reply Requested toggle buttons
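As an illustration only, the structured recording of each trialist's actions and scenario times could be captured in simple records like the following sketch; the record fields and example values are hypothetical and not part of the original evaluation tooling.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubTaskRecording:
    """One sub-task of a scenario, with the observed actions and any problems."""
    name: str                          # e.g. "Address Message"
    actions: List[str] = field(default_factory=list)
    problems: List[str] = field(default_factory=list)

@dataclass
class ScenarioRecording:
    """A single trialist's attempt at one scenario task."""
    trialist_id: str
    scenario_no: int
    time_taken_secs: float             # time to complete the whole scenario
    sub_tasks: List[SubTaskRecording] = field(default_factory=list)

# Hypothetical example mirroring Table 4
recording = ScenarioRecording(
    trialist_id="T3",
    scenario_no=2,
    time_taken_secs=412.0,
    sub_tasks=[
        SubTaskRecording(
            name="Address Message",
            actions=[
                "Selected Dean of School nickname",
                "Clicked on Add button",
                "Clicked on recipient name and Remove button",
                "Selected 'Copy' radio button and added correctly",
            ],
            problems=["Initially added recipient as primary rather than copy"],
        ),
    ],
)
```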

Interface Problem Summary

There are generally three levels of problems found with usability evaluations (Perlman, 1992):
1. High-level: difficulties with the task and not the user interface.
2. Medium-level: problems accomplishing the task because of the user interface.
3. Low-level: easy to fix user interface problems.
An example of a high-level problem is misunderstanding the scenario task. The majority of problems found in the usability evaluation fall into the latter two categories and required minor to substantial changes in the user interface of the X.400 UA.

After the completion of the analysis of the videos and each task for each trialist, another summary was made from the generated task recordings. This summary concentrated on the problems experienced by the trialists for each task that were due to the user interface. Each point in the summary was numbered to aid in the identification of the solution in the newly redesigned interface. A sample summary appears in Table 5 and is referenced by the scenario number and the identified problem number. The summary also includes a classification of the problem into the three levels.

Table 5: Interface Problem Summary (Scenario 1)

Problem 1 (Medium): Could not find the message to read; looked in message folders first before looking at the message list.
Problem 2 (Low): Clicked on Read button without selecting a message first.
Problem 3 (Low): Message list text is too small.
Problem 4 (Medium): Move message dialog was confusing. User wasn't sure what to do with the two lists.

The summary enables the redesign of the X.400 UA interface to be structured and focused on the particular issues it raises. The problems identified as high-level could not be solved with interface changes as they reflected misconceptions about the scenario tasks. It was inadvisable to change the scenario tasks at this point as it would then defeat the purpose of a comparison across different groups of trialists.

Iterative Process

After the redesigned interfaces are implemented in the prototype, the entire usability evaluation is repeated with different users. A second summary of interface problems is then generated. The results of this summary dictate the feasibility of a subsequent redesign and evaluation process. If the summary shows a small number of problems, or a number of high-level problems, then the evaluation process may be concluded. Otherwise, the interface needs to be redesigned to address the new interface problems and another usability trial run. This process continues until the identified problems do not warrant a subsequent usability evaluation.
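As a rough illustration of this stopping rule (not part of the original method), the decision could be expressed as a simple check over the classified problem summary; the record format and threshold value below are assumptions.

```python
from collections import Counter

# Each entry: (problem_no, scenario_no, description, level)
# Hypothetical records in the style of Table 5.
problems = [
    (1, 1, "Could not find the message to read", "Medium"),
    (2, 1, "Clicked on Read button without selecting a message", "Low"),
    (3, 1, "Message list text is too small", "Low"),
    (4, 1, "Move message dialog was confusing", "Medium"),
]

def another_iteration_needed(problems, max_fixable=2):
    """Decide whether a further redesign/evaluation cycle is warranted.

    High-level problems reflect task misconceptions rather than the
    interface, so only medium- and low-level problems count towards
    the decision. The max_fixable threshold is an assumed parameter.
    """
    counts = Counter(level for *_, level in problems)
    fixable = counts["Medium"] + counts["Low"]
    return fixable > max_fixable

print(another_iteration_needed(problems))  # True: four fixable problems remain
```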

Questionnaire Results

An analysis of the questionnaires, comparing the different groups of trialists before and after the interface changes to the X.400 UA, adds further evidence to the results of the design changes. The rating questionnaire answers were assigned values between 1 (lowest) and 5 (highest) and the average values calculated for each question. A statistical analysis of the data collated from the questionnaires was performed to establish whether there are any significant differences between the groups of usability trials. A two-sample difference test assuming a normal distribution (t-test) and a non-parametric test (Wilcoxon) were utilised to establish statistical evidence.

The functionality and interface questionnaires were further summarised to reflect the higher-level aspects of each questionnaire. The questions were grouped into common areas and the data values averaged. For example, the functionality questionnaire was grouped into: Message Management, Options, Addressing, and Overall. The interface questionnaire was similarly grouped into: Visual Clarity, Explicitness, Consistency, Flexibility, Expectations, Error Prevention, Feedback, and Overall.

The data was presented in summary tables and charts. Although the questionnaires only reflect the attitudes of the users, they do provide extra support for the changes in the interface and functionality of the X.400 UA after the redesign process.

Scenario Task Times

An analysis of the times recorded to complete each task also provides feedback to the evaluation process. A comparison of the average times across the different usability evaluations indicates the success of the interface design changes. Statistical tests can also be applied to the data to provide additional support.
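A short sketch of such a comparison follows, using hypothetical per-scenario completion times (in seconds) for the two trial groups; the figures are illustrative, not measured results.

```python
import numpy as np

# Hypothetical completion times (seconds) per trialist for scenario 1,
# taken from the video reviews of each trial group.
times_before = np.array([310, 275, 340, 295, 360, 320])
times_after = np.array([240, 210, 265, 230, 255, 245])

mean_before = times_before.mean()
mean_after = times_after.mean()
improvement = (mean_before - mean_after) / mean_before * 100

print(f"mean time before redesign: {mean_before:.0f} s")
print(f"mean time after redesign:  {mean_after:.0f} s")
print(f"relative improvement:      {improvement:.1f}%")
```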

CONCLUSION

The usability engineering methodology described has provided convincing and successful results on a real prototyped electronic mail system. The method is a practical approach and identifies the iterative process required to integrate usability evaluations into a software engineering project. The method described is a cost-effective measure for ensuring the usability requirements of a software product and can easily be integrated into the software engineering development lifecycle.

The conclusions from this paper include the recommended use of a comprehensive scenario-based usability evaluation for messaging systems and the development of GUI reference models to be associated with standards that involve human-user interaction. A GUI reference model that has been subjected to a usability evaluation may feasibly be used as a benchmark for implementors of such systems. The major benefits include the earlier release of software conforming to international standards with a 'pretested' GUI and, hence, greater user acceptance of the software. A GUI reference model for application-layer protocols, such as X.400 and X.500, may also prove to be an important precursor to the development of many other HCI-centred standards.

ACKNOWLEDGMENTS

The research reported in this paper was undertaken whilst the author was an employee of Bond University. The author acknowledges the support of the DSTC Pty Ltd in the continuation of this work.

REFERENCES

Jorgensen, Anker Helms. Using the Thinking-Aloud Method in System Development. In: Salvendy, G & Smith, M J (eds), Designing and Using Human-Computer Interfaces and Knowledge Based Systems. Elsevier Science Publishers (1989): 743-750.

Karat, Clare-Marie. Usability Engineering in Dollars and Cents. IEEE Software 10.3 (May 1993): 88-89.

Lewis, Clayton. Using the 'Thinking-Aloud' Method in Cognitive Interface Design. Research Report RC 9265 (#40713). IBM, New York, 1982.

Molich, Rolf & Nielsen, Jakob. Improving a Human-Computer Dialogue. Communications of the ACM 33.3 (Mar 1990): 338-348.

Nielsen, Jakob. Usability Engineering at a Discount. In: Salvendy, G & Smith, M J (eds), Designing and Using Human-Computer Interfaces and Knowledge Based Systems. Elsevier Science Publishers (1989): 394-401.

Perlman, Gary. Practical User Interface Evaluation. OZCHI92 Conference Workshop. Gold Coast, Australia, 25 Nov 1992.

Potosnak, Kathleen. Recipe for a Usability Test. IEEE Software, Human Factors column (Nov 1988): 83-84.

Ravden, Susannah J & Johnson, Graham I. Evaluating Usability of Human-Computer Interfaces: A Practical Method. Ellis Horwood Limited, 1989.
