Conferences
Editor: Jason Hong, Carnegie Mellon University, [email protected]
Evaluating Pervasive and Ubiquitous Systems

Steve Neely, Graeme Stevenson, Christian Kray, Ingrid Mulder, Kay Connelly, and Katie A. Siek
Recognized evaluation strategies are essential to systematically advance a research field's state of the art. Pervasive and ubiquitous computing needs such strategies to mature as a discipline and to enable researchers to objectively assess and compare new techniques' contributions. Researchers have shown that evaluating ubiquitous systems can be difficult, so approaches tend to be subjective, piecemeal, or both. To ensure that the validity and usability of proposed systems won't be compromised, researchers must reach consensus on a set of standard evaluation methods for ubiquitous systems. Otherwise, methods for scientifically testing and presenting advances on the state of the art will remain unclear.

Driven by these issues, several workshops focusing on the evaluation of ubiquitous systems have taken place over the last three years. Here, we summarize and discuss the main outcomes of five related workshops. Kay Connelly and Katie A. Siek discuss the first two workshops in their series on studying usability in the wild. Ingrid Mulder reports on the In Situ workshop at Mobile HCI 2007. Steve Neely and Graeme Stevenson present the Ubiquitous Systems Evaluation workshop at UbiComp 2007, and Christian Kray writes about the workshop on Evaluating Ubiquitous Systems with Users at the European Conference on Ambient Intelligence 2007.
STUDYING USABILITY IN THE WILD

Two workshops looked at challenges associated with evaluating ubiquitous computing in nonlab environments. The first, Reality Testing: HCI Challenges in Nontraditional Environments, was held in conjunction with CHI (Computer-Human Interaction) 2006. The second, Technology Has Escaped from the Zoo: Studying Usability in the Wild, was held at Interact 2007. Both workshops were organized around application domains, with sessions consisting of short paper presentations followed by panel discussions. The main goals were to examine key challenges associated with conducting user studies in nontraditional environments and to identify common themes to contribute to an evaluation framework.

The primary application areas represented in one or both workshops were healthcare, the military, everyday life and games, and social software. The healthcare domain is broadly defined as research conducted in healthcare or home-care environments with patients or clinicians. Healthcare environments typically have limited, stressful user spaces, and privacy concerns can restrict recording equipment. Research in the military domain might be conducted inside a military facility or in situ with a traveling military unit. These applications' operational environments are high-stress and cramped, and often involve unusual equipment (such as heavy packs or hazmat suits), making realistic lab simulations difficult. These same conditions also make in situ testing challenging. The everyday life domain encompasses studies conducted in a home, on the street, or almost anywhere people normally interact. Everyday life environments typically involve multiple interruptions and pose challenges to recording data without getting in the participant's way. Ubiquitous computing games pose similar challenges because they typically take place outside on the streets. Social software enhances interaction between groups of friends, often using social-networking techniques. The software can be difficult to study because of the need to recruit groups of people who already know each other.

Each domain area had its own specific constraints, but workshop participants identified common issues that crossed domains. In particular, methodology, data collection and analysis, ethics, and the multidisciplinary nature of such research emerged as common and important topics of discussion.

Methodology was the most difficult topic we discussed. Many agreed that it was often necessary to triangulate methods to piece together the large, complex picture inherent in nontraditional environments. But which combination of methods is best suited for different situations wasn't clear.
How to handle specific environmental issues was another focus for discussion. For instance, mobility introduces a host of technical problems related to deploying designs and gathering usability data. In addition, sample size can be a problem. Researchers might have access to only a small number of participants for a short amount of time, either because the study is resource-intensive (as in the case of ubiquitous games and everyday life) or because access is limited to a small target population (as in the case of the military and certain health conditions). Small study sizes can limit the research's generalizability. Finally, participants generally agreed that because much of this research is exploratory, data collection early in a project tends to be open-ended. This lets themes emerge that can direct data collection toward answering specific research questions later.

Workshop participants agreed that adequate guidelines to help researchers set up and safely conduct user studies in these nontraditional environments don't yet exist. Several participants are working together to formalize guidelines for future studies.

More information about these workshops is available at www.cs.indiana.edu/surg/CHI2006 and www.cs.indiana.edu/surg/interact2007.
IN SITU

In Situ 2007, a workshop held in conjunction with ACM MobileHCI 2007, explored the role of mobile devices and pervasive technology in in situ evaluation. Participants discussed research questions, future directions, relevant technologies, and in situ evaluation methods and tools. Mobile technologies and personal services are becoming more complex and are used in diverse contexts, so it's harder to validly study and evaluate users' experiences with these technologies. Given the complex nature of these emerging technologies, in situ measurement seems to be a promising way to study subjective and dynamic phenomena, such as the user's experiences in the context of use.
Traditional methods such as observation, contextual inquiry, and lab studies can't fully capture the user experience in context.

The workshop consisted of two parts. In the morning, participants presented and discussed selected papers, concentrating on methods and tools for evaluation in the wild. Approaches fell into two categories. The first comprised researchers who applied classic methodologies and tools, such as thinking aloud and interviewing (sometimes in new combinations), to pervasive systems in field experiments. The second category comprised methods in which researchers used mobile and pervasive technology itself to evaluate systems, for example by analyzing and capturing data from a mobile phone's various sensors.
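To make the second category concrete, the sketch below shows the kind of lightweight logger such studies rely on: it periodically samples whatever sensors a device exposes and appends timestamped rows to a CSV file for later analysis. This is a minimal illustration only; the read_sensors stub, its placeholder readings, and the file name are invented here, since real sensor access is platform specific.

```python
import csv
import time
from datetime import datetime, timezone

def read_sensors():
    """Poll whatever sensors the platform exposes.

    Sensor access is platform specific, so this stub returns
    placeholder values; on a real device it would wrap the
    phone's sensor API.
    """
    return {"battery_level": 0.87, "gsm_cell_id": "1234-567"}

def log_session(path, duration_s=60, interval_s=5):
    """Sample the sensors at a fixed interval and append
    timestamped rows to a CSV file for later analysis."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        end = time.time() + duration_s
        while time.time() < end:
            reading = read_sensors()
            writer.writerow(
                [datetime.now(timezone.utc).isoformat()]
                + [reading[k] for k in sorted(reading)]
            )
            f.flush()  # don't lose rows if the study app is killed
            time.sleep(interval_s)

if __name__ == "__main__":
    log_session("field_trial_log.csv", duration_s=30, interval_s=5)
```

A logger along these lines runs unattended during a field trial, trading the richness of an observer's notes for unobtrusive, continuous capture.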
In the second block of the morning session, participants separated into groups to explore the issues that arose from the presentations. A discussion on communication mobility sparked several interesting threads on issues such as private space entering public space and monitoring people's activities. Other issues related to how to set up an evaluation for unknown contexts and context effects. What variables should we measure? Can we know in advance what variables must be measured? How should we deal with the massive and varied amount of data that results from capturing numerous users' in-context experiences?

In the second half of the workshop, participants exchanged ideas and experiences to identify design and evaluation challenges. We discussed the ideal setup for a proper evaluation in the wild and how an "exemplary guide for in situ evaluation" would differ from commonly available handbooks on social-science research methods and human-computer interaction. Participants also drafted outlines for such a handbook of in situ evaluations.

Workshop participants agreed that considering in situ evaluation from different perspectives is important and that privacy is a delicate issue. The discussions covered a broad array of domains, from healthcare and persuasive technologies to social technology and the home environment. The workshop stressed the value of in situ research for understanding behavior and user experience as well as for informing new systems' design. It also provided more insight into how we can exploit qualities of new technology in in situ evaluation. Attendees reflected on the methods and tools used in this area and discussed their impact. New tooling is indeed helpful in data collection and analysis, but considering how much data can be captured automatically is also important. At the end of the day, we had to leave some questions unanswered, in particular: How can we effectively deal with huge data sets? And how can we interpret data in meaningful and useful ways?

More information about this workshop is available at http://insitu2007.freeband.nl.
UBIQUITOUS SYSTEMS EVALUATION

The first Ubiquitous Systems Evaluation workshop (USE 07) was held in conjunction with UbiComp 2007. USE 07 brought together researchers from across the ubiquitous computing domain to discuss their experiences, with the long-term view of forming a toolkit of techniques to support evaluation in the domain. The workshop included presentations of selected papers on the themes of users and usability, prototype-based systems development, and ubiquitous-system components. After the talks, participants discussed the various themes that the speakers had identified.
A one-size-fits-all approach to evaluating ubiquitous systems is unrealistic. The variety of contributing factors, from scale to context to the nature of user interaction, implies the need for a tailored approach. One idea that emerged was the possibility of decomposing core system functionalities into components and pairing each component with specific evaluation strategies (see the sketch at the end of this section). The outcome would be a toolkit of techniques from which system builders could draw. Making these techniques open, available, and easy for others to use would afford comparative evaluation between similar components across systems.

On a related theme, a clear need emerged for realistic, nontrivial demonstrator scenarios to assess the relative merits of different evaluation techniques. Such a suite of demonstrators should cover the broad spectrum of factors that affect ubiquitous systems. Until this is achieved, the choice of one evaluation strategy over another remains subjective.

Another finding was that in many instances, it makes sense to evaluate personal experiences rather than require test subjects to follow a script of actions. The question of how to compare the results from a set of personalized evaluations remains open.

Effective evaluations require real-world data. Although this data exists in many forms (for example, mobile-phone network usage), privacy concerns render such data largely unobtainable. Although some forms of data are more easily made anonymous than others (such as text versus video), more subtle issues are involved. Increasingly sophisticated data-mining techniques might support the recovery of identity and relationships from obfuscated data. To make real-world data sets available to researchers, we must consider such issues.

Current evaluation practices include the use of theoretical models, virtual environments, lab-based trials, and field studies. The choice of technique greatly depends on the deployed infrastructure's availability and on factors affecting experiments' reproducibility. A disconnect currently exists between user evaluations in the physical world and those in virtual worlds, but there is great potential in combining these approaches to drive user evaluation throughout the development life cycle.
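As a rough illustration of the component-pairing idea discussed above, the following sketch records candidate evaluation techniques against named system components in a simple registry and assembles a per-system evaluation plan. The component names and strategy labels are invented for illustration; they are not a taxonomy agreed on at the workshop.

```python
# A toy registry pairing system components with candidate evaluation
# strategies. All names below are hypothetical examples.
EVALUATION_STRATEGIES = {
    "location_sensing": ["accuracy benchmark against ground truth",
                         "simulated-trace replay"],
    "context_inference": ["precision/recall on annotated logs",
                          "labeled-scenario replay"],
    "user_interface": ["in situ think-aloud",
                       "longitudinal diary study"],
}

def evaluation_plan(components):
    """Return candidate strategies for each component of a system,
    flagging components the toolkit doesn't yet cover."""
    return {c: EVALUATION_STRATEGIES.get(c, ["<no strategy registered>"])
            for c in components}

if __name__ == "__main__":
    plan = evaluation_plan(
        ["location_sensing", "user_interface", "power_management"])
    for component, strategies in plan.items():
        print(component, "->", "; ".join(strategies))
```

Because every system drawing on the same registry evaluates a given component the same way, results for that component become comparable across systems, which is the point of the proposed toolkit.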
More information about this workshop is available at www.useworkshop.org.
EVALUATING UBIQUITOUS SYSTEMS WITH USERS

The Workshop on Evaluating Ubiquitous Systems with Users took place in November 2007 at the European Conference on Ambient Intelligence (AmI 07). Participants came from several disciplines, including design, psychology, and computer science. The workshop chairs had two main goals: to build community and to identify future research directions.
We emphasized discussion over technical-paper presentation by alternating between work in small groups and plenary sessions that synthesized the group-discussion results. Topics for group work included case studies (for example, how would you evaluate this particular system?), particular issues or problems, and how they could be solved in the future.

The workshop touched on several important aspects of evaluating ubiquitous systems with users. The first relates to the nature of ubiquitous systems, which often integrate a large amount of data from a set of sensors while interacting with human users. An interdisciplinary evaluation appears to be the best way to gain a holistic understanding and cover all relevant factors of a ubiquitous system, its performance, and its environmental impact. A second aspect relates closely to the following observation: while quantitative analysis is well suited to evaluating a ubiquitous system's technical components, qualitative methods might be more appropriate when evaluating how a system integrates with everyday activities.
Another defining feature of ubiquitous systems is their context awareness, and this feature has implications for evaluating such systems. One key question here is how to effectively factor in the context. While longitudinal or field studies naturally incorporate context, they do so in an uncontrolled way, and measuring and recording the actual context is often difficult. We discussed virtual environments as an alternative that allows many contextual factors to be controlled but frequently lacks realism.

Another issue that emerged during the discussion is the comparability of systems. Other disciplines often rely on standardized tasks, methods, or test data sets. These don't yet exist for ubiquitous systems. Moreover, defining them will be quite difficult because of the context's potential impact on system behavior. Nevertheless, standardized tools have been instrumental in advancing disciplines such as robotics and HCI, so they could similarly benefit ubiquitous computing if they were developed for this type of system.

Finally, participants discussed a particularly important problem in the context of longitudinal and field studies. Because a ubiquitous system is usually meant to integrate with everyday activities, it's important to know whether it achieves a seamless integration. The issue of measuring and detecting disengagement has received little attention so far, but if systems are to be deployed in the real world, knowing whether most potential users are (willfully or accidentally) ignoring them will be important.
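One simple way disengagement detection could be operationalized, sketched below under assumptions of our own, is to compare a participant's recent interaction frequency against an earlier baseline from the same usage log. The event-log format, the window length, and the drop ratio are all invented defaults for illustration, not validated thresholds.

```python
from collections import Counter
from datetime import timedelta

def disengagement_flagged(events, window=7, drop_ratio=0.5):
    """Flag a participant whose average daily interaction count over
    the most recent `window` days fell below `drop_ratio` times the
    average of the preceding window -- a crude proxy for a deployed
    system being (willfully or accidentally) ignored.

    `events` is an iterable of (date, event_name) pairs.
    """
    counts = Counter(day for day, _ in events)
    if not counts:
        return True  # no interactions at all
    # Fill in days with zero events so silence counts against the user.
    first, last = min(counts), max(counts)
    days = [first + timedelta(n) for n in range((last - first).days + 1)]
    if len(days) < 2 * window:
        return False  # not enough history to judge
    recent = sum(counts[d] for d in days[-window:]) / window
    earlier = sum(counts[d] for d in days[-2 * window:-window]) / window
    return earlier > 0 and recent < drop_ratio * earlier
```

Even a crude detector like this makes disengagement a measurable quantity in a longitudinal study rather than something noticed only in exit interviews.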
More information about this workshop is available at http://homepages.cs.ncl.ac.uk/c.kray/ubieval.html.

It has been interesting to discover several independently arranged workshops motivated by defining and formalizing evaluation strategies in the ubiquitous computing domain. The workshop tracks have been popular in terms of submission rates and attendance. This strong interest suggests that the evaluation of ubiquitous systems is an important topic requiring further research. These workshops highlighted the necessity for standardized evaluation techniques that can be reused across the community. Specifically, they identified the need for further exploration of methods, data sets, and scenarios. The choice between controlled lab studies and context-driven in situ studies was a recurrent theme driving discussion.

Many challenges remain to be addressed. How do we decide which combination of evaluation techniques to use, and when? How can we more easily compare systems and identify advances? To what extent can we protect privacy when recording data? What's the best way to publish data sets and methodologies to make them open and accessible to others? What information can we infer using virtual environments as a means to evaluate ubiquitous systems? How do we effectively deal with context in highly dynamic environments?

Currently, we're preparing a special issue called "Advances in Evaluating Mobile and Ubiquitous Systems" for the International Journal of Mobile Human Computer Interaction. Additionally, further workshops on this topic are planned. The first upcoming one is USE 08 (www.useworkshop.org), which will be held in conjunction with the UbiComp conference.

ACKNOWLEDGMENTS

We thank all the attendees for their papers, presentations, and participation in stimulating discussions.

Steve Neely is a postdoctoral fellow in computer science at University College Dublin. Contact him at [email protected].

Graeme Stevenson is a doctoral candidate in computer science at University College Dublin. Contact him at [email protected].

Christian Kray is a lecturer in the School of Computing Science at Newcastle University. Contact him at [email protected].

Ingrid Mulder is a professor for human-centered information-communication technology at Rotterdam University's Institute for Communication, Media, and Information Technology and a lead researcher on user experience and methodological innovation at Telematica Instituut. Contact her at [email protected].

Kay Connelly is an assistant professor of computer science at Indiana University. Contact her at [email protected].

Katie A. Siek is an assistant professor of computer science at the University of Colorado at Boulder. Contact her at [email protected].