A Toolkit for Evaluating Peripheral Awareness Displays

Tara Matthews
University of California, Berkeley
Soda Hall, Berkeley, CA 94720
[email protected]
Jennifer Mankoff
Carnegie Mellon University
5000 Forbes Ave, Pittsburgh, PA 15213
[email protected]
ABSTRACT
Peripheral displays are an important class of application: they enable people to be constantly aware of information while doing other activities. How does one evaluate a peripheral awareness display to determine whether it provides awareness without inappropriately distracting the user from other tasks? We explore evaluation metrics for peripheral displays and describe support we propose to add to an existing toolkit for creating peripheral display prototypes.

Author Keywords
Peripheral displays, evaluation, awareness, distraction.
ACM Classification Keywords
D.2.2 [Software Engineering]: Design Tools and Techniques — software libraries; user interfaces. H.5.1 [Information Interfaces]: User Interfaces — input devices and strategies; interaction styles; prototyping; user-centered design.

INTRODUCTION
Peripheral displays can allow a person to be aware of information while she is attending to some other primary activity. Such displays are particularly well suited to helping people maintain awareness of other individuals or groups because they enable continuous monitoring as a secondary activity. For example, the Digital Family Portrait [12] is a digital picture frame that changes based on the pictured person’s activities. The display is meant to provide relatives with a qualitative sense of the geographically distant person’s well-being, helping them to stay connected more continuously.
A challenge when creating peripheral awareness displays is evaluating them. How does one determine that users are aware of displayed information while not being inappropriately distracted from other activities? Several recent papers have claimed that ambient and peripheral displays cannot be effectively evaluated unless they are deployed in a user’s natural environment for long periods of time [5,15]. A natural deployment and long-term use imply that the peripheral display is fully built and functional. However, building a fully functional prototype and deploying it for a long period is extremely costly. Further, an application typically requires several iterations before it is realistically ready for use [3]. The iterative design process would be many times more time-consuming if a fully functional prototype were deployed long-term for each iteration. We believe that the evaluation of low-effort prototypes is necessary for designing peripheral displays.

In past work, we presented the Peripheral Displays Toolkit (PTK), a toolkit that provides structured support for managing user attention in the development of peripheral displays [9]. The PTK is well suited to enabling quick prototype creation. As a next step, we plan to add support for evaluating prototypes built with the PTK. Initial support will focus on measuring metrics specific to peripheral displays: awareness and distraction. Methods to be supported include context-aware experience sampling, image and audio experience sampling, logging of both the peripheral display task and the primary task, and data analysis.

DIFFICULTIES EVALUATING PERIPHERAL DISPLAYS
In order to inform the design of a toolkit for peripheral displays, we interviewed 10 peripheral display creators. When asked what the hardest part of designing peripheral displays was, 7 of 10 participants said “evaluation.” The other participants pointed to early design decisions as most difficult. Clearly, an evaluation technique that helps designers wade through design options and select the best one would be extremely valuable. One participant aptly stated the problem: “How do you generate as real of an experience as would suffice for your data collection needs? I’m looking for [an evaluation technique] that will have a high data pay-off for a low time cost.” However, such an evaluation technique does not currently exist, as other participants pointed out: “the real value in many of these systems is only apparent longitudinally.” Evaluation is also difficult because peripheral displays commonly involve hardware, so building a sufficiently realistic prototype is time-consuming. One participant noted how this lengthens the iterative design cycle:
“How do you evaluate a peripheral display? You can’t do a typical lab usability study. We had to implement a working prototype and deploy it in people’s work place. If we had found that it was all wrong, we had to throw away all that work.”
The PTK provides a potential solution: the toolkit enables rapid prototyping of displays. With new support for evaluations, the toolkit will help creators determine which prototype design is best.

EVALUATION METRICS
Traditional user interfaces are evaluated with metrics tied to accomplishing a concrete focal activity: efficiency or time to completion, success rate, number of errors, and quality of the results. Peripheral displays are used non-focally and are largely non-interactive, so their effectiveness must be measured differently, and traditional evaluation metrics are less applicable.

This section presents five evaluation metrics for peripheral displays, derived from our interviews of peripheral display creators, lessons learned from our own past studies [4,6,8], and the ambient display heuristics [8]: awareness, distraction, appeal, learnability, and effects of breakdowns. After defining these metrics, we describe in the next section how support added to the PTK will help measure them. We focus on awareness and distraction as a first step, because existing methods are least effective at measuring these two metrics.

Awareness
Awareness is the amount of information shown by the display that users are able to recall, understand, or use. Since the purpose of a peripheral display is to convey information, the user’s awareness of that information is a natural measure of the display’s effectiveness. Three of ten interview participants said user awareness of information is an important metric for evaluating peripheral displays. Our past studies of the bus mobile [8], sound displays for the deaf [4], and email displays [6] have all explored measuring awareness, as have many other studies in the peripheral display literature [10,11,12,13].

Distraction
Distraction is the amount of attention the display attracts away from a user’s primary activity. Designers should measure distraction because it affects both the user’s productivity on the primary activity and her qualitative feelings about the display. Three of ten interview participants named distraction as an important metric, saying that a crucial measure of success for peripheral displays is that they be peripheral, distracting the user only when appropriate. Our past studies of the bus mobile [8], sound displays for the deaf [4], and email displays [6] have all measured distraction, often (but not always) with a goal of minimizing it.

Appeal (Usefulness, Aesthetics)
Appeal refers to how much the user enjoys using the display and the way it looks; in other words, this metric represents her overall feelings about the display. Appeal can be broken down into usefulness and aesthetics, which represent two very different aspects of users’ qualitative feelings about a display.

Four of ten interview participants mentioned usefulness and aesthetics as important when evaluating peripheral displays. Adoption of a display depends upon its appeal to users, as several interview participants learned in their own projects. This metric is also informed by the ambient display heuristics presented in [8], several of which bear on appeal: “aesthetic and pleasing design,” “useful and relevant information,” and “match between design of ambient display and environment.”

Learnability
Learnability is the amount of time and effort required for users to learn to get information from the display. It is often affected by the chosen mapping from input to output, since peripheral displays do not always display input literally but instead show it in an abstract fashion. Learnability may be important to a display’s adoption: users may be less likely to use displays that are difficult to learn, unless interpreting the display is meant to present a challenge and users expect this [14]. Again, the ambient display heuristics point to learnability as an important metric (“consistent and intuitive mapping”). This heuristic assumes that a display should be easy to learn and interpret. We agree that learnability is important, but we argue that the designer should decide how learnable a display needs to be based on the situation in which it is used and on feedback from users; sometimes, for example, a challenging display is desirable.

Effects of Breakdowns
The effects of breakdowns concern how apparent breakdowns are and how easily users can recover from them. The goal should be to make breakdowns obvious and recovery easy. The ambient display heuristics point to the effects of breakdowns in noting that “visibility of system state,” which includes error states, is an important design consideration. This has proven difficult for peripheral displays to accomplish, because they are typically non-interactive (i.e., input to the display is not typically controlled by users). In an evaluation of the bus mobile, the visibility of breakdowns was shown to be a problem [8]: bus tokens hanging still beneath the mobile skirt had two meanings, that no buses were scheduled or that a motor was broken, and users could not tell the difference. An interview participant told a story of a peripheral display breakdown causing mild panic in a lab group. The display showed the activity on a main server; at one point, an error caused the display to freeze, users interpreted this to mean that the main server had frozen, and they frantically searched for problems with it. In one of our past field studies of the Email Orb [6], the Orb displayed nothing for half a day before users noticed. These examples show the importance of making breakdowns apparent and recovery easy.
Focus on Awareness and Distraction
As a first step in exploring toolkit evaluation support, we will focus on awareness and distraction. Traditional evaluation methods are not clearly applicable to measuring the awareness a peripheral display provides or the distraction it causes, since those methods target focal activities. Both metrics are greatly affected by the user’s focal activity, which means they are best measured while the user performs such an activity; the challenge is to evaluate activities that are peripheral to the focal activities we already know how to evaluate. Though the proposed support for awareness and distraction may also help gather information about appeal, learnability, and the effects of breakdowns, we will address those metrics more directly in future work.

TOOLKIT SUPPORT FOR EVALUATION
We propose toolkit support for several methods that have been effective in evaluating awareness and distraction in our past lab and field studies. These methods include context-aware experience sampling, image and audio experience sampling, logging of both the peripheral task and the primary task, and data analysis. When employed in dual-task lab studies or field deployments, these methods enable evaluation at various stages in the design process. When added to the PTK, a toolkit for quickly prototyping peripheral displays, this evaluation support will improve the iterative design process for peripheral displays.
Context-Aware Experience Sampling
Experience sampling gathers in situ feedback from users during field studies by asking questions periodically throughout the day. In situ feedback is valuable because users report on how well the display is doing in context, rather than giving a generalized impression after the fact. PTK support for experience sampling will come in the form of graphical pop-up questionnaires. To measure a user’s awareness of displayed information, questionnaires will ask about the information shown (e.g., for the Digital Family Portrait, “Can you guess what activities Grandma has done in the last hour?”). The toolkit will include a mechanism for blanking the display whenever a questionnaire appears (e.g., showing a white screen). This enables measurement of the user’s awareness immediately before the experience sampling interruption.
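To make this concrete, the sketch below shows one way a toolkit could blank a display and pose an awareness question when a sample fires. It is a minimal illustration in Java, not the actual PTK API; the Sampleable interface and every method name in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hook into a display prototype; not the actual PTK API. */
interface Sampleable {
    void blank();                 // e.g., show a white screen
    void restore();               // resume normal rendering
    String describeCurrentInfo(); // ground truth for scoring answers later
}

/** Poses an awareness question while the display is blanked. */
class AwarenessSampler {
    private final Sampleable display;
    private final List<String> log = new ArrayList<>();

    AwarenessSampler(Sampleable display) {
        this.display = display;
    }

    void sample(String question) {
        // Blank the display *before* asking, so the user must answer from
        // memory rather than by glancing at the display mid-questionnaire.
        display.blank();
        String truth = display.describeCurrentInfo();
        String answer = javax.swing.JOptionPane.showInputDialog(null, question);
        log.add("truth=" + truth + "\tanswer=" + answer);
        display.restore();
    }
}
```

Recording the ground truth alongside the answer would let the correctness check described under Data Analysis below run automatically.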
To measure distraction, pop-up questionnaires could ask Likert-scale questions that explore a user’s opinion as she uses the display, for example: “When the display shows new information, how much does it interrupt your current work? 1 (not at all) – 5 (completely).” Questionnaires could also ask what the user is currently doing, how important the peripherally displayed information is, and how appropriate she thinks the level of interruption is.

The PTK’s event handling and notification system enables context-aware decisions about when questionnaires should appear. First, questionnaires should appear only if information has recently been displayed; the PTK knows when events arrive at a display and can schedule questionnaires appropriately after these times. Second, designers may be especially interested in the awareness and distraction caused by events of different notification levels. Notification levels specify the importance of making the user aware of a piece of information and affect the way the information is displayed. For example, a low notification level event may cause a slow, subtle transition on the display, while a high notification level event may cause the display to flash bright colors. It follows that users should exhibit more awareness and distraction for high notification levels and less for low ones. To measure this, the PTK will schedule experience sampling after a mixture of events with different notification levels. Finally, the PTK will let designers customize experience sampling (e.g., which notification levels to target, the time to wait after events are displayed before sampling, whether or not to randomize sample intervals, which questions to ask, and more).
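A sketch of how such context-aware scheduling might look, again in Java: the Level enum, the SampleScheduler class, and its parameters are hypothetical stand-ins for whatever the PTK actually exposes, not its real notification levels.

```java
import java.util.EnumSet;
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical notification levels; stand-ins for the PTK's own set. */
enum Level { LOW, MEDIUM, HIGH }

/** Schedules an experience sample shortly after qualifying events. */
class SampleScheduler {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final EnumSet<Level> targets; // which levels to sample after
    private final long baseDelayMs;       // wait after the event displays (> 0)
    private final boolean randomize;      // jitter so users cannot anticipate
    private final Random rng = new Random();

    SampleScheduler(EnumSet<Level> targets, long baseDelayMs, boolean randomize) {
        this.targets = targets;
        this.baseDelayMs = baseDelayMs;
        this.randomize = randomize;
    }

    /** Call whenever the display renders an event; samples only target levels. */
    void onEventDisplayed(Level level, Runnable questionnaire) {
        if (!targets.contains(level)) return;
        long delay = baseDelayMs;
        if (randomize) delay += rng.nextInt((int) baseDelayMs); // up to 2x base
        timer.schedule(questionnaire, delay, TimeUnit.MILLISECONDS);
    }
}
```

The three constructor parameters correspond directly to the customizations listed above: targeted notification levels, wait time after display, and randomized sampling intervals.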
Image and Audio Experience Sampling
Intille et al. [7] presented a method called image-based experience sampling and reflection, in which a digital image is captured during a field study and later presented to the user for reflection. We plan to incorporate a version of this method in the PTK for both image and audio samples. A digital camera (or microphone) will capture images (or audio) of the peripheral display’s environment (additional cameras or microphones can be placed in other locations as well), and samples will be captured in a context-aware way (i.e., when events have recently been displayed, taking notification levels into consideration). Later, when the user goes to her desktop computer, a questionnaire will appear showing the captured image (or playing the audio) and asking about the situation (e.g., “What were you doing?” “What information did the display convey?” “Did the display distract you from your other activities?” “Did the display help you accomplish your other activity?”). Alternatively, the designer may present the images or audio to users in interviews to improve recall of the situation.

The benefits of image and audio experience sampling are that (1) users are not interrupted in the middle of an activity, (2) information can easily be gathered about displays placed anywhere in the environment once the user goes to her computer (a device most people have), and (3) the moment a person first sits down at her computer is likely to be an in-between-tasks time, and thus a good time for interruptions [1].

Audio sampling is particularly appropriate for displays that convey auditory information or that convey information auditorally (e.g., [2,4,11,13]).
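The sketch below illustrates the deferred-review pattern this method implies: capture media when an event is displayed, queue it, and pose reflection questions when the user next reaches her desktop. The Camera and Microphone interfaces are hypothetical placeholders for real device APIs, which the PTK would have to wrap.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical device wrappers; real code would bind to camera/mic drivers. */
interface Camera { BufferedImage capture(); }
interface Microphone { byte[] record(int seconds); }

/** One pending sample: media captured when an event was displayed. */
class MediaSample {
    final long timestampMs = System.currentTimeMillis();
    final BufferedImage image; // null if audio-only
    final byte[] audio;        // null if image-only
    MediaSample(BufferedImage image, byte[] audio) {
        this.image = image;
        this.audio = audio;
    }
}

/** Captures media at display time; defers questions until the user returns. */
class MediaSampler {
    private final Camera camera;
    private final Microphone mic;
    private final Deque<MediaSample> pending = new ArrayDeque<>();

    MediaSampler(Camera camera, Microphone mic) {
        this.camera = camera;
        this.mic = mic;
    }

    /** Called (context-aware) when a targeted event is displayed. */
    void captureSample() {
        pending.add(new MediaSample(camera.capture(), mic.record(10)));
    }

    /** Called when the user next sits down at her desktop computer. */
    void reviewPending() {
        while (!pending.isEmpty()) {
            MediaSample s = pending.poll();
            // Show s.image or play s.audio, then pose reflection questions,
            // e.g., "What were you doing?" "What did the display convey?"
            System.out.println("Reviewing sample captured at " + s.timestampMs);
        }
    }
}
```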
Logging
The PTK will also include support for several types of logging, which will aid data collection during both dual-task lab studies and field studies:
o Input to the display, stored so that it can be used to recreate the output.
o If the primary task is accomplished on a computer, logging of that task (including keystrokes, active window changes, and times between window changes).
o Similar logging of displaced tasks that are accomplished on a computer. A displaced task is the task the peripheral display replaces, and it should be measured before the display is deployed (e.g., in the Email Orb evaluation [6], the PTK could have logged email use before the Orb was deployed). Displaced-task logs are used to measure changes in behavior.
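A minimal sketch of the kind of timestamped, append-only log these requirements suggest; the record format and class name are our own illustration, not the PTK’s.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

/** Minimal append-only log; one tab-separated, timestamped record per line. */
class EventLog implements AutoCloseable {
    private final PrintWriter out;

    EventLog(String path) throws IOException {
        out = new PrintWriter(new FileWriter(path, true)); // append mode
    }

    /** Log display input with enough detail to recreate the output later. */
    void logDisplayInput(String source, String data, String level) {
        write("INPUT", source + "\t" + data + "\t" + level);
    }

    /** Log primary-task (or displaced-task) activity on the computer. */
    void logTaskEvent(String kind, String detail) {
        write(kind, detail); // e.g., kind = "KEY" or "WINDOW"
    }

    private void write(String tag, String fields) {
        out.println(System.currentTimeMillis() + "\t" + tag + "\t" + fields);
        out.flush(); // flush each record so a crash loses little data
    }

    @Override public void close() { out.close(); }
}
```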
Data Analysis
Given the information gathered from experience samples and logs, the PTK will support data analysis:
o Answers to experience sampling awareness questions will be checked for correctness against the input logs.
o Primary task logs will be analyzed to determine whether displayed events distracted users; for example, the user’s keystroke rate may slow after an event is displayed. Logs can also indicate awareness of displayed information; for example, a task change in the primary activity could indicate awareness.
o Experience samples and distraction measures (derived from changes in the primary task logs) will be categorized by notification level, giving designers an idea of how well notification transitions worked.
Additional data analysis support, such as visualizations and searching, will follow as future work.
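As an illustration of the second analysis, the sketch below flags displayed events that were followed by a drop in keystroke rate, a rough proxy for distraction; the window size, threshold, and class name are hypothetical choices, not values the PTK prescribes.

```java
import java.util.List;

/** Flags displayed events followed by a sharp drop in keystroke rate. */
class DistractionAnalyzer {
    private final long windowMs;    // how far before/after each event to look
    private final double threshold; // e.g., 0.5 means the rate fell by half

    DistractionAnalyzer(long windowMs, double threshold) {
        this.windowMs = windowMs;
        this.threshold = threshold;
    }

    /** keystrokes: sorted keystroke timestamps from the primary-task log. */
    boolean wasDistracting(long eventTimeMs, List<Long> keystrokes) {
        int before = countBetween(keystrokes, eventTimeMs - windowMs, eventTimeMs);
        int after = countBetween(keystrokes, eventTimeMs, eventTimeMs + windowMs);
        // A sharp drop right after the event suggests attention shifted away.
        return before > 0 && after < before * threshold;
    }

    private int countBetween(List<Long> times, long from, long to) {
        int n = 0;
        for (long t : times) if (t >= from && t < to) n++;
        return n;
    }
}
```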
VALIDATION
To validate the evaluation support for peripheral awareness displays, we will iteratively develop and evaluate new designs of a sound visualization for the deaf, based on [4]. We will also conduct a long-term field study of the display to determine whether the PTK evaluation support improves our ability to measure awareness and distraction compared to our previous study.
CONCLUSION
Peripheral displays offer designers creative ways to help people maintain continuous awareness of other individuals or groups while doing other activities. Evaluating such displays remains a challenge, as our interviews of peripheral display creators found. In this paper, we presented evaluation metrics based on past work: awareness, distraction, appeal, learnability, and effects of breakdowns. Focusing on awareness and distraction as a first step, we proposed new evaluation support for the Peripheral Displays Toolkit, including context-aware experience sampling, image and audio experience sampling, and logging of both the primary task and the peripheral display activity. Already well suited to rapid prototyping, the PTK with evaluation support will provide a powerful tool for iteratively designing peripheral awareness displays.
REFERENCES
1. Czerwinski, M., Cutrell, E. and Horvitz, E. “Instant messaging: Effects of relevance and time.” In People and Computers XIV: Proc. of HCI ’00, v. 2, pp. 71-76.
2. Gaver, W. et al. “Effective sounds in complex systems: The ARKola simulation.” In Proc. of CHI ’91, pp. 85-90.
3. Gould, J.D. and Lewis, C. “Designing for usability: Key principles and what designers think.” Communications of the ACM, v. 28, n. 3, pp. 300-311, 1985.
4. Ho-Ching, W. et al. “Can you see what I hear? The design and evaluation of a peripheral sound display for the deaf.” In Proc. of CHI ’03, pp. 161-168.
5. Holmquist, L.E. “Evaluating the comprehension of ambient displays.” In Extended Abstracts of CHI ’04, p. 1545.
6. Hsieh, G. and Mankoff, J. “A comparison of two peripheral displays for monitoring email: Measuring usability, awareness, and distraction.” Tech. Report UCB//CSD-03-1286, U.C. Berkeley, 2003.
7. Intille, S. et al. “Eliciting user preferences using image-based experience sampling and reflection.” In Extended Abstracts of CHI ’02, pp. 738-739.
8. Mankoff, J. et al. “Heuristic evaluation of ambient displays.” In Proc. of CHI ’03, pp. 169-176.
9. Matthews, T., Dey, A.K., Mankoff, J., Carter, S. and Rattenbury, T. “A toolkit for managing user attention in peripheral displays.” In Proc. of UIST ’04, pp. 247-256.
10. McCrickard, D.S. and Zhao, Q.A. “Supporting information awareness using animated widgets.” In USENIX Technical Program, pp. 117-127, 2000.
11. Mynatt, E.D. et al. “Designing audio aura.” In Proc. of CHI ’98, pp. 566-573.
12. Mynatt, E.D. et al. “Digital family portraits: Providing peace of mind for extended family members.” In Proc. of CHI ’01, pp. 333-340.
13. Pedersen, E.R. and Sokoler, T. “AROMA: Abstract representation of presence supporting mutual awareness.” In Proc. of CHI ’97, pp. 51-58.
14. Redström, J. et al. “Informative art: Using amplified artworks as information displays.” In Proc. of DARE ’00, pp. 103-114.
15. Skog, T. et al. “Between aesthetics and utility: Designing ambient information visualizations.” In Proc. of InfoVis ’03, pp. 233-240.