
R. PRATES, C. DE SOUZA, AND S. BARBOSA

methods & tools


A Method for Evaluating the Communicability of User Interfaces


User interfaces can be viewed as one-shot, higher-order messages sent from designers to users. The content of such messages is a designer’s conception of who the users are, what their needs and expectations are, and, more important, how the designer has chosen to meet these requirements through an interactive artifact. The form of the messages is an interactive language (i.e., a series of organized dialog patterns determining how and which other lower-order messages can be exchanged between users and systems). From this perspective, user interface design is a semiotic engineering task [1] whose target is to convey the specific principles of communication that are embedded in any software artifact. Thus, designers should be assisted in achieving this goal and in evaluating how well they do it in different situations.

interactions...january + february 2000


Raquel O. Prates
Departamento de Informática, PUC-Rio
Rua Marquês de São Vicente, 225
Rio de Janeiro, RJ, Brazil 22453-900
[email protected]
Departamento de Informática e Ciência da Computação, UERJ
R. São Francisco Xavier, 524 – 6o. andar
Rio de Janeiro, RJ, Brazil, 20550-013

Clarisse S. de Souza and Simone D. J. Barbosa
Departamento de Informática, PUC-Rio
Rua Marquês de São Vicente, 225
Rio de Janeiro, RJ, Brazil 22453-900
[email protected], [email protected]

In parallel with software usability, we can then assess software communicability. Communicability is the property of software that efficiently and effectively conveys to users its underlying design intent and interactive principles. Thus, the goal of the communicability evaluation method is to let designers appreciate how well users are getting the intended messages across the interface and to identify communication breakdowns that may take place during interaction. This method is carried out in three steps that can be performed by different groups of people (users, designers, human–computer interaction [HCI] experts and semiotic engineering experts). It yields distinctive types of representations about interaction, which tell us something about user–system interactive patterns and designer-to-user (intentional or unintentional) communication. As users try to meet their needs and expectations, by exchanging messages with the system, they should be able to grasp the designer’s conception of who they are and what they want. A system can be perceived as a “discourse deputy” for the designer. The deputy can only communicate to users the set of conversational turns and themes that the designer has predicted at design time.

Conversely, users can only tell the deputy what the communicative language allows them to. Furthermore, users can only communicate with the designers’ deputies, but not with designers themselves. Therefore, unlike in human conversation, where message sender and receiver can negotiate what they mean, interactive words and phrases in HCI have a fixed meaning (because of implementation) when used by either party. What varies is the intention of use, which depends on contextual features. As a consequence, user interfaces should help users negotiate what they mean by what they say and what is meant by the designer’s deputy. For instance, applications that provide an undo feature support this negotiation by allowing users to correct their misconceptions at low cost. Figures 1 and 2 show examples of high and low communicability, respectively. In Figure 1, an analogy with an existing physical design is being effectively used to tell users what they must do to play back compact discs in the

Figure 1. Example of high communicability.

Figure 2. Example of low communicability.



computer. In Figure 2, however, within the context of searching for a computer, there are some readily available operations (in the pulldown menu) that are completely unrelated to the present context of discourse (but related to file manipulation). Users have little chance of assigning any meaning to the corresponding interface elements, since they refer to a whole different context of use, which may then be long forgotten.

The communicability evaluation method described in this paper provides a way for the evaluators to identify points in which the designer may have failed to convey to users his intended message, as well as a way for users to communicate with the actual designers, although indirectly, what they have not understood or agreed with. When users perform the communicability evaluation, they can spontaneously express their expectations, attitudes, interpretations, approval, or rejection toward HCI design choices present in software. When designers or experts perform the test, they produce what should be perceived as an inferred message about the same topics, qualified by the evaluator’s background and expertise in HCI.

Communicability evaluation can be made at different stages of design and serve different targets. In formative evaluation, it can help designers compare design alternatives and make further design decisions. In summative evaluation, it can inform the changes needed in new releases. Compared with other evaluation methods, ours focuses on the signs, structures, and conversational patterns presented to users at the interface level, signaling the immediate interpretations assigned to them and the role they play in user-to-system and designer-to-user communication.

Steps in the Method

The communicability evaluation method consists of three major steps: tagging, interpretation, and semiotic profiling. Each of these steps requires different expertise from the evaluator and yields a distinctive type of representation about the interaction. Users, designers, HCI experts, or semiotic engineering experts may perform the first step, tagging, which identifies communication breakdown points. The next step, interpretation, maps these breakdowns to HCI problems. In this step an HCI expert is usually necessary. Semiotic profiling, the last step, requires a semiotic engineering expert and yields a characterization of the overall message conveyed by the system.

Tagging

Tagging amounts to “putting words in the user’s mouth” while observing his actions during goal-oriented interactions. The “words” are selected from a set of utterances that express a user’s reaction to what happens during interaction (e.g., “Oops!” or “What’s this?” or “Where is that?” and the like) when conversational breakdowns occur. The result of tagging is a correlation between elements of a predefined set of utterances (to be discussed later) and a sequence of actions in the interface. These actions for accomplishing a predefined task must be recorded using software that is able to capture mouse-pointer movements and other screen events (e.g., Lotus® ScreenCam™).

This step may be performed by users, designers, or experts. Users may perform the tagging either while executing a task, or afterwards by playing a screen movie of their own interaction. In these cases, we capture a spontaneous response to interaction patterns, cast in the form of one of the available utterances, as a kind of constrained thinking-aloud protocol. When designers or HCI experts do the tagging, they identify the interactive breakdowns that users experienced and express them by means of the same set of utterances. That is why we say that they put words in the users’ mouths, in an attempt to recreate a verbal protocol. Designers and experts could also videotape and take notes of users’ behavior during test sessions and use this material for occasional disambiguation during the tagging process.

The following set of utterances is the one we have selected to express different kinds of breakdown situations and user


METHODS & TOOLS COLUMN EDITORS Michael Muller Lotus Development Corp. 55 Cambridge Parkway Cambridge, MA 02142 USA +1-617-693-4235 fax: +1-617-693-1407 [email protected]

Finn Kensing IT- University in Copenhagen Glentevej 67 2400 København NV Denmark + 45 3816 8888 lokal 829 fax: + 45 3816 8899 [email protected]


attitudes likely to occur during human-computer interaction [2]. They can be mapped to various ontologies of HCI design problems or guidelines and represent categories of HCI phenomena. The utterances in parentheses are variants of the most representative utterance associated with the interactive pattern described.

✖ Where is? (What now?) The user seems to be searching for a specific function but demonstrates difficulty in locating it. So, he sequentially (worse case) or thematically (better case) browses menus and/or toolbars for that function, without triggering any action. This category includes a special case we have called “What now?”, which applies when a user is clearly searching for a clue of what to do next and not searching for a specific function that he hopes will achieve what he wants to do.

✖ What’s this? (Object or action?) The user seems to be exploring the possibilities of interaction to gain more (or some) understanding of what a specific function achieves. He lingers on some symbol waiting for a tool tip and/or explicitly calls for help about that symbol, or he hesitates between what he thinks are equivalent options. This category also includes cases in which users are confused about widgets being associated with objects instead of actions and vice versa (“Object or action?”).

✖ Oops! (I can’t do it this way. / Where am I?) This category accounts for cases in which a user performs some action to achieve a specific state of affairs, but the outcome is not what he expected. The user then either immediately corrects his decision (typically via Undo or by attempting to restore some previous state) or completes the task with an additional sequence of actions. Sometimes the user follows some path

of action and then realizes that it’s not leading him where he expected. He then cancels the sequence of actions and chooses a different path. In this case the associated utterance is “I can’t do it this way.” This category includes another one, “Where am I?”, in which the user performs some action that is appropriate in another context but not in the current one.

✖ Why doesn’t it? (What happened?) This category involves cases in which the user expects some sort of outcome but does not achieve it. The subsequent scenario is that he then insists on the same path, as if he were so sure that some function should do what he expects that he simply cannot accept the fact that it doesn’t. Movies show that users carefully step through the path again and again to check that they are not doing something wrong. The alternative scenario (“What happened?”) is when they do not get feedback from the system and are apparently unable to assign meaning to the function’s outcome (halt for a moment).

✖ Looks fine to me… The user achieves some result he believes is the expected one. At times he misinterprets feedback from the application and does not realize that the result is not the expected one.

✖ I can’t do it. The user is unable to achieve the proposed goal, either because he does not know how to or because he does not have enough resources (time, will, patience) to do it.

✖ Thanks, but no, thanks. (I can do otherwise.) The user ignores some preferential intended affordance present in the application’s interface and finds another way around task execution to achieve his goal. If the user has successfully used the afforded strategy before and still decides to switch to a different path of action, then it is a case of “Thanks, but no, thanks.” If the user is not aware of the intended affordance or has not been able to use it effectively (that is, he has probably uttered an “I can’t do it this way” right before he engaged in an alternative path), then it is a case of “I can do otherwise.” Whereas “Thanks, but no, thanks” is an explicit declination of some affordance, “I can do otherwise” is a case of missing some intended affordance.

Notice that all the utterances could be expressed naturally by the users, except for “Looks fine to me…” and “I can do otherwise.” These utterances are tagged to a user’s actions only when the user is unaware of the result of his actions or of some function afforded by the interface. But if someone is unaware of something, he cannot possibly tell what he is unaware of. Therefore, these taggings can be produced by users themselves only after the fact, that is, when they watch a movie of their own performance and realize then what they didn’t realize before.

For instance, in Figure 2 we can assign the utterance “What’s this?” to the Edit > Undo Copy menu item and “What happened?” when the error message appears. For samples of communicability tagged movies, visit http://peirce.inf.puc-rio.br/ and select SERG resources (under Communicability Tagging).

Interpretation

This step consists of tabulating the gathered data and mapping the utterances onto HCI ontologies of problems or design guidelines. This step must be done by an HCI expert, unless the mapping has been predefined. In that case, designers can benefit from some sort of automatic mapping and obtain a mechanically generated diagnosis of interaction problems. We have associated the seven categories of HCI phenomena (as represented by their corresponding utterances) with four high-level classes of interaction and usability problems, as shown in Table 1. Notice that navigation, meaning assignment, task accomplishment, and missing of affordance are known usability problems. However, communicability evaluation reveals yet another class of problems: declination of affordance. Popular sets of design guidelines or usability principles do not address this problem explicitly, nor do cognitively based evaluation methods and techniques. Nevertheless, in communicability evaluation, this phenomenon can be perceived and can be used to refine HCI problem taxonomies. Users generally decline an affordance when they regard the cost–benefit ratio for an afforded feature as disadvantageous compared with an alternative way of performing the same task. Among the causes of declination of affordance are inconvenient navigational structures, such as deep nesting in menu structures or lack of shortcuts.

HCI experts may use alternative taxonomies for a more precise user-system interaction diagnosis. For instance, utterances may be mapped to such distinct taxonomies as Nielsen’s discount evaluation guidelines [3], Shneiderman’s eight golden rules [7], Norman’s gulfs of execution and evaluation

Table 1. Mapping conversational categories onto high-level interactive and usability problems.

Problem classes:
Navigation | Meaning Assignment | Task Accomplishment | Declination / Missing of Affordance

Utterances:
I can’t do it. | Looks fine to me… | Where is? | What now? | What’s this? | Object or action? | Why doesn’t it? | What happened? | Oops! | I can’t do it this way. | Where am I? | Thanks, but no, thanks. | I can do otherwise.
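As a rough illustration of the tabulation step, the sketch below records taggings as (time, utterance) pairs and counts them per problem class. The mapping covers only the associations stated explicitly in the article (“What’s this?” and “Why doesn’t it?” to meaning assignment; “Thanks, but no, thanks.” to declination and “I can do otherwise.” to missing of affordance). The names `Tag`, `PROBLEM_CLASS`, and `tabulate`, as well as the sample session, are hypothetical and are not part of any tooling described by the authors.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record of one tagging: the moment in the screen movie at which
# the breakdown was observed and the utterance the tagger selected.
@dataclass(frozen=True)
class Tag:
    seconds: float
    utterance: str


# Partial utterance-to-problem mapping (an assumption for illustration).
# Only associations stated in the article are listed; any other utterance is
# reported as "unmapped" so that an HCI expert can classify it by hand.
PROBLEM_CLASS = {
    "What's this?": "Meaning Assignment",
    "Why doesn't it?": "Meaning Assignment",
    "Thanks, but no, thanks.": "Declination of Affordance",
    "I can do otherwise.": "Missing of Affordance",
}


def tabulate(tags):
    """Count taggings per problem class."""
    return Counter(PROBLEM_CLASS.get(t.utterance, "unmapped") for t in tags)


# Invented sample session, for illustration only.
session = [
    Tag(12.0, "Where is?"),
    Tag(31.5, "What's this?"),
    Tag(47.2, "Why doesn't it?"),
    Tag(80.0, "I can do otherwise."),
]
counts = tabulate(session)
```

Keeping unmapped utterances visible, rather than guessing a class for them, reflects the article's point that the mapping is an expert judgment unless it has been predefined.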



[4], or even Sellen and Nicol’s taxonomy of contents for building online help [6].

Independently of the taxonomy chosen for the mapping, during interpretation HCI experts identify the main interaction problems at the interface. By examining the utterances, the expert has some indication of the cause of the problem and, thus, of its solution. The taggings capture more precisely the symptoms of interaction breakdowns, and thus give a more detailed and refined indication of these problems. For instance, both the “Why doesn’t it?” and “What’s this?” utterances may be mapped to a meaning assignment category. Nevertheless, the former indicates that users believe they understand the signs but are generating the wrong interpretations for them, whereas the latter indicates that they can’t generate any interpretation. The expert can then plan for the redesign of the communicative breakdown points of the interface, supported by more in-depth information. From this interpretation, the expert can also plan the context-sensitive help, or the entire help module (especially if redesign is not an option or if the breakdown is not perceived as critical), as well as create the application’s semiotic profile.

Semiotic Profiling

Profiling must be done by semiotic engineering experts. Profiling consists of interpreting the tabulation in semiotic terms, in an attempt to retrieve the original designer’s meta-communication, that is, the meaning of the designer-to-user message. Thus, semiotic profiling adds value to the evaluation made during interpretation, since it goes beyond the communication breakdowns and interaction problems identified and tackles a more abstract level, the interface language. At this level, the semiotic engineering expert deals with the implicit messages conveyed by the choice of signs, structures, and interactive patterns that compose the user interface. These messages may be intentional or unintentional and greatly influence the perceptions and reactions of users to software artifacts. Unintentional messages are generally the result of designers’ tacit knowledge and assumptions. The role of the expert is to reveal these implicit messages to designers, who may then change or confirm their choices.

An example of the outcome of the semiotic profiling of an application is the realization that some piece of software is implicitly conveyed as a toolbox for solving a certain set of related problems, whereas another piece (a commercial competitor, for instance) is conveyed as a monitor of the user’s action.

Conclusion

Communicability evaluation provides different instances of representation that can be used in different ways by people with different degrees and types of expertise. Semiotic engineering and HCI experts can use this method to evaluate interface design, identifying problems and proposing redesign choices. Designers can use it to predict or diagnose interaction problems. And users can use it as a means of direct or indirect communication with the designer. Direct communication takes place when either (a) designers have access to users’ interaction through the movies and can recognize problems that occurred, or (b) designers have access to users’ own taggings. Indirect communication can be mediated by (a) the HCI expert, who produces taggings and interpretations based on his technical knowledge and professional experience, or (b) the semiotic engineering expert, who can additionally build the application’s semiotic profile. Moreover, taggings provide a common language in which users, designers, and HCI and semiotic engineering experts can share their knowledge.

As we said, communicability evaluation can be used at different stages of the design process. At early stages, it can serve as a formative evaluation tool, allowing designers to compare different design options or assess the choices they have made. In particular, our method can be used as an instrument for inspection evaluation, since designers and experts may try to put themselves in the users’ shoes and tag potential interactive breakdowns. At later stages, it can be used as a summative evaluation tool to inform the features to be changed or added in future software releases.

Our method applies basically to single-user interfaces. Multi-user interfaces would probably require other utterances related to interacting with other users, such as “Who are you?”, “What are you doing?”, and “Where are you?” The same is true of artificial intelligence applications, for which utterances related to the system’s cognitive abilities are likely to occur (e.g., “Do you know this?” or “Can you learn this?”).

Compared with other evaluation methods [5], our method focuses on what is being said by the interface signs a user is supposed to interpret. Thus, we do not directly address problems of inadequate task or user modeling, except perhaps as a further inference on why certain communication is conveyed. By the same token, other methods typically do not directly address the problems we deal with. For instance, failure to provide feedback [e.g., 3, 7] may cause differentiated taggings (e.g., “Why doesn’t it?”, “What’s this?”, “I can do otherwise.”). The effect of differentiation can be noticed in redesign tasks or online help design, for example. Designers can use the tagging to decide which sign or message they will incorporate into the application so that the problem is solved or minimized.

Sequences of utterances may additionally provide relevant insights about how users interpret the designers’ messages. For instance, a “Where is?” utterance may often be followed by an “I can do otherwise” utterance. In this case, the sequence probably indicates that the user fails to perceive some feature’s intended affordance and thus that the designer is not getting the message across. Also, a sequence of “Thanks, but no, thanks” utterances is likely to indicate a mismatch between the designer’s ideal user and the actual user who is participating in the evaluation.

The goals and issues evaluated by usability and communicability methods are distinct and complementary. Thus, in order to have a broader evaluation of software, taking into consideration appropriateness to the task, user’s performance, and communication of design intents, the expert should combine usability and communicability methods. In our user interface evaluation projects we have combined communicability evaluation (all three steps), user interviews, and interface inspection. As a result, we have been able to evaluate the interface thoroughly for the factors mentioned.

Some challenging issues about communicability evaluation remain, namely,

✦ Is this set of utterances appropriate? Is it technology dependent? Is it culturally determined? If we allow people to tag movies with other utterances (in addition to the ones in our set), we may sense whether the set we are working with is satisfactory for the analysis or not. The same applies to specialized technologies, such as multi-user applications or artificial intelligence–based systems. Different cultures may also react in different ways to the same communicative acts (even without translation problems). The latter may be particularly interesting for software localization.

✦ What is the spectrum of taggings that can be done to the same movie by different groups of people (users, designers, and experts)? We shall soon have the results of a case study in which interaction with a small application will be tagged by different groups of people. By contrasting the taggings, we expect to assess the range of plausible interpretations the same phenomena can yield, which is seldom achieved with other evaluation methods.




✦ How do utterances change along the users’ learning curve? An ongoing study with 6 participants using the same software over a period of 10 weeks will provide us with data for an appreciation of the types of utterances novices are likely to make, compared with more experienced users. For instance, Where is? can be expected to disappear, or at least to come down to an insignificant number of occurrences during a task. Conversely, Thanks, but no, thanks might become more frequent over time.
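The sequence and learning-curve questions above lend themselves to simple counting. The sketch below counts how often one utterance immediately follows another within a session and how often each utterance occurs per session. It uses invented data and hypothetical helper names (`utterance_pairs`, `frequency_by_session`), and illustrates the kind of analysis discussed rather than the procedure used in the studies mentioned.

```python
from collections import Counter


def utterance_pairs(utterances):
    """Count adjacent (previous, next) utterance pairs in one tagged session."""
    return Counter(zip(utterances, utterances[1:]))


def frequency_by_session(sessions):
    """Return one Counter of utterance frequencies per session, in order."""
    return [Counter(s) for s in sessions]


# Invented taggings for two sessions of the same hypothetical participant.
week1 = ["Where is?", "I can do otherwise.", "Where is?", "What's this?",
         "Where is?", "I can do otherwise."]
week10 = ["Thanks, but no, thanks.", "Oops!", "Thanks, but no, thanks."]

pairs = utterance_pairs(week1)
# A frequent ("Where is?", "I can do otherwise.") pair would suggest that an
# intended affordance is being missed.
missed = pairs[("Where is?", "I can do otherwise.")]

trend = frequency_by_session([week1, week10])
where_is_trend = [c["Where is?"] for c in trend]
no_thanks_trend = [c["Thanks, but no, thanks."] for c in trend]
```

Only immediately adjacent utterances are paired here; a study might instead allow a time window between the two taggings, which is a design choice the article leaves open.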


© ACM 1072-5220/00/0100 $5.00

Acknowledgments

The authors would like to thank CNPq for their support. They also thank all their colleagues who discussed the issues presented in this paper and gave them insightful suggestions, in particular, Tom Carey, Michael Muller, Kevin Harrigan, and John Thompson. The members of the Semiotic Engineering Research Group and the graduate students of INF2062 at PUC-Rio also contributed significantly to the results presented here.

References

1. de Souza, C.S. The semiotic engineering of user interface languages. International Journal of Man-Machine Studies 39 (1993), pp. 753–773.
2. de Souza, C.S., Prates, R.O., and Barbosa, S.D.J. A Method for Evaluating Software Communicability. In Proceedings of the Second Brazilian Workshop in Human–Computer Interaction (IHC ’99), forthcoming.
3. Nielsen, J. Usability Engineering. Academic Press, Boston, MA, 1994.
4. Norman, D. Cognitive Engineering. In Norman, D. A. and Draper, S. W. (eds.), User-Centered System Design. Lawrence Erlbaum Associates, Hillsdale, NJ, 1986, pp. 411–432.
5. Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S., and Carey, T. Human-Computer Interaction. Addison-Wesley, Reading, MA, 1994.
6. Sellen, A. and Nicol, A. Building User-Centered Online Help. In B. Laurel (ed.), The Art of Human-Computer Interface Design. Addison-Wesley, Reading, MA, 1990.
7. Shneiderman, B. Designing the User Interface. Addison-Wesley, Reading, MA, 1998.
