Exploring Multimodality in the Laboratory and the Field

Lynne Baillie
FTW, Donau-City-Strasse, Vienna, Austria
+43 1505283098
[email protected]

Raimund Schatz
FTW, Donau-City-Strasse, Vienna, Austria
+43 1505283098
[email protected]

ABSTRACT
New mobile applications pose fresh challenges to us, as researchers, in how to design and evaluate them, because they give users access to powerful computing devices through small interfaces that typically have limited input facilities. One way of overcoming these shortcomings is to utilize the possibilities of multimodality. We report in this paper how we designed, developed, and evaluated a multimodal mobile application through a combination of laboratory and field studies. This is the first time, as far as we know, that a multimodal application has been developed in such a way. We did this so that we would understand more about where and when users envisioned using different modes of interaction and what problems they might encounter when using an application in context.

Categories and Subject Descriptors
H.5 [Information Interfaces and Presentation] (I.7); H.5.2 [User Interfaces]: User-centered design.

General Terms Design, Experimentation, Human Factors and Theory.

To investigate the potential benefits of multimodal interfaces in mobile contexts, the project developed three applications targeted at the domains of work (messaging client), entertainment (quiz game), and travel (tourist guide). For the sake of brevity and clarity we present only one application in this paper: the messaging client. The messaging client is intended to provide business users with multimodal access to messages, including email, voice messages, SMS, and MMS, while on the move. Figure 1 presents an overview of our application development process. We start with Study 1: Requirements Gathering (see Section 4). The scenarios and insights generated from these studies guide the development of storyboards and use cases. In addition to the requirements gathering and final user evaluation, we involve users at an intermediary stage (Study 2: Design, see Section 5) in order to obtain feedback on our first prototypes. The results from the design study inform the subsequent redesign stages, which focus on the graceful degradation of the visual design down to smart- and low-end phones as well as the design of the voice interface. The final evaluation (Study 3: Evaluation, see Section 6) assesses the overall usability of the application and the value of multimodal interaction as perceived by our target group.

Keywords Action Scenarios, Multimodal Interaction, Mobile applications and devices.

1. INTRODUCTION
The advent of 3G technology poses challenges to researchers and application designers who are trying to build mobile multimedia applications for business and entertainment use, mainly in how to address the limitations of mobile interfaces, e.g. tiny displays and keypads, the context of use, and information retrieval on the fly. These and other factors add to the difficulty of producing a useful and usable design. Our research project focuses on the development of an infrastructure for multimodal applications targeted at a range of different mobile client devices (PDAs and smartphones). Users should be able to interact visually, by voice, or by a combination of both, according to their current individual preferences.

[Figure 1 shows the process as a flow: Preparation, then Study 1: Requirements Gathering (lab and field), then Storyboards, Use Cases, and Visual Prototype Design for PDA, then Study 2: Prototype Evaluation (lab and field), then Redesign for PDA (Visual Design, Symbian) and Voice Interface Design, then Implementation, then Application Version 1, then Study 3: User Evaluation (lab and field).]

Figure 1. Application Development Process.

This paper describes how we designed and evaluated a multimodal mobile messaging application in the laboratory and in the field. The methods that informed our research are discussed in Sections 2 and 3. The following sections trace the application's evolution through the development stages of requirements gathering, design, redesign, and evaluation. We close with a discussion of what we learned from undertaking this research about the appropriateness of multimodality for mobile applications.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICMI'05, October 4-6, 2005, Trento, Italy. Copyright 2005 ACM 1-59593-028-0/05/0010...$5.00.

2. BACKGROUND
Most of the mobile studies we have encountered focus on laboratory studies. However, it has been claimed that the use of laboratories can foster a lack of cooperation between designers, usability specialists, engineers, and users [7], mainly because of the focus they place on the analysis/evaluation part of the design process. Investigations in the field have been undertaken into the use of mobile devices; however, they have mainly focused either on trying to gain knowledge about an already deployed device by logging users and interviewing them about their use [20, 26] or on the evaluation stage [8, 4, 14, 17]. A further issue was that the only comparable studies we could find on the evaluation of multimodal applications had taken place either in the laboratory or in a quiet office, sitting at a desk [18]. We did not think this was the only context in which our users would use our application, and therefore concluded that it was essential to undertake studies in the 'wild'. After investigating several possible methods and techniques for involving users in our project, we settled upon using some of the principles and techniques from Scenario-Based [9] and Participatory Design. The reason for using these methodologies was that they are user driven rather than designer driven and have proved successful in the past in new, unusual, and mobile settings [1, 28, 30]. We also invented some new techniques of our own.

3. METHODS
A user study matrix was developed to aid us in the design, development, and evaluation of the three multimodal mobile applications in our project; for the sake of clarity and brevity, we present only one application in this paper. The orientation of the matrix is toward helping users take part in the design process. There are some practical problems in applying this approach to design, for instance time pressure and manpower; however, these problems are not limited to this approach. The matrix, its focuses, and the methods employed are outlined in Table 1 and in the following sub-sections.

Table 1. User Studies Matrix

Phase I, Discovering current practices (method: Observational Studies): Study 1: Essential; Study 2: Not required; Study 3: Not required.
Phase II, Facilitating discussion (method: Multi-modal Scenario Discussion): Study 1: Essential (conceptual scenario); Study 2: Essential (concrete scenario); Study 3: Not required.
Phase III, Gathering data about the users and their wants (method: Questionnaire): Study 1: Essential; Study 2: Not required; Study 3: Essential.
Phase IV, Investigating current problems and future use (method: Action Scenarios): Study 1: Essential, focus on past interaction; Study 2: Essential, focus on prototype application; Study 3: Essential, focus on finished application.
Phase V, Contextualising ideas for the application (method: User Sketches / Voice Interaction Snapshots): Study 1: Essential; Study 2: Essential; Study 3: Not required.

3.1 Phase I: Discovering Current Practices
In this phase we undertake field observations (in the case of the messaging client, in the local area around the research center and in the city). During these observations we watch people using their mobile applications in different settings, e.g. in the underground, while using escalators, and so on. We believe that undertaking observational studies in the field is essential, as other researchers [33, 16] have found that observations in the field provided their projects with richer insights into people's actual use of their mobile phones. We use the results from the observations to aid us during our scenario building sessions.

3.2 Phase II: Facilitating Discussion
We thought that scenarios might be an ideal tool to facilitate the involvement of users at different stages of our design process, as they have proved to be a useful tool for early exploratory design situations and user evaluations in the past [5, 7, 30]. Another reason was that we felt that using scenarios in the design of our applications would help us to keep the future use of the envisioned applications in view as the applications were developed. Scenarios can also be used for 'good' and 'bad' use situations [6]. This can help to clarify what users want, as well as what they do not want. Scenarios have been used in the workplace to facilitate discussions about new design concepts. However, can they be used to discuss the use of a multimodal mobile application in the same way? Gaver and Dunne [13] thought that the use of impressionistic scenarios might help communicate ideas about possible design concepts. However, they felt that conveying these ideas at this level presented a challenge; for example, if the scenarios were presented too abstractly, people could not imagine interacting with the application or system; if they were presented too concretely, the users would focus on the details rather than the overall intentions. O'Brien et al. [25], however, found in their home studies that when people think of a new technology they think of the possible uses they could have for it and translate these into scenarios. Therefore, people do not just think of stories when talking about current technology, but actually apply old stories to possible new situations with new devices. However, we should be wary of relying only on this data as a way of approaching the design of new devices for mobile communication, as the design could fail due to a lack of connection to daily practice.

Tollmar et al. [31] suggested that a possible solution to this problem could be to use scenario-based design in conjunction with field observations (we incorporated this solution into our user studies matrix). These studies suggested to us that the use of scenarios as a way of eliciting information from participants about current activities, and perhaps also orientating them to the possibilities of new applications, might prove a fruitful one. When it comes to mobile technologies, the ability to conceptualize the social setting of the application, and its use, is crucial. As Suchman [29] said, the significance of an action is tied in a fundamental way to the physical and social world. One potential problem of using scenarios is suggested by [16], who said that there are limitations to using scenarios in traditional user-centered design, as they are far removed from the situatedness of the activity. Some researchers have tried to combat this by acting out a scenario in situ themselves, to try to understand the potential use of an application [15]. We discuss how we used Action Scenarios to combat this problem in Sub-section 3.5. We used scenario discussions as a tool to facilitate understanding between designers and users of the new application being suggested. The designers present the scenarios to participants and ask them to provide feedback on any parts of the scenarios they feel need changing, that they do not like, or do not understand.

3.3 Phase III: Gathering Data
At certain times during the development of an application more concrete information is required. In the first study (requirements gathering) we use a questionnaire to ask questions based around the main topic areas of the application. In the case of the messaging client these were: context (where and when the users see themselves using the application), filtering, modality use, billing, and how the phone should deal with incoming calls and texts. We do not usually use a questionnaire in Study 2 (Design), as it has been found not to be necessary, but one can be used if required. In Study 3 (Evaluation), we use a standard usability questionnaire to gauge user satisfaction.

3.4 Phase IV: User Sketches, Annotations and Voice Interaction Snapshots (VIS)
In this phase we ask participants to sketch their own interfaces for an application. We ask them to take into account the fact that the interface will be considerably smaller than they are used to seeing on their PC or laptop. We take various mobile devices (e.g. smartphone, WAP-enabled phone) with us to the user studies so that the users can see the different types of phone an application could be deployed on. There are two reasons why users are asked to sketch an interface for an application. Firstly, it is hoped that by asking them to sketch their own interface, we will provide a way of extracting and learning about the needs and wishes of users. Secondly, it is anticipated that asking the more diffident participants to sketch their own interface can help them to create an overflow in their imagination, as described by McKim [22]. He found that visual thinking is experienced to the fullest when seeing, imagining, and drawing merge into active interplay. Other researchers have also found that this is a fruitful way of inviting participants to conceptualize their ideas [2, 11]. In Study 2, the emphasis changes in that we provide the users with HTML mock-ups and ask them to choose one function and create a Voice Interaction Snapshot (VIS). The reason for the invention of this technique was that we realized that the requirements gathering techniques we were using did not gather enough data about the voice interaction. As a consequence, we found that we could not fully implement voice into the prototypes until we found a way of gathering more information on this topic from users. We had the general requirements from the first study for when and where voice would be used, but not the 'how'. Therefore we needed a new tool/technique that would help us gather this type of information from users.

The tool works in the following way: participants are asked to choose one part of the system, e.g. going to their inbox or sending a new email, and map out a Voice Interaction Snapshot; the resulting snapshot looks somewhat like a flow chart. For Study 3 (Evaluation), we ask users to annotate printouts of the application screens after completing the field and laboratory studies, with the aim of gathering data about their subjective opinions of the icons, colors, and layout.
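Since a Voice Interaction Snapshot looks somewhat like a flow chart, it can be captured as an ordered list of dialogue steps, each recording who acts, in which modality, and what is said or done. The following is a minimal sketch of such a representation; the class and function names are our own illustration, not tooling from the project:

```python
# Hypothetical sketch: a Voice Interaction Snapshot (VIS) captured as an
# ordered sequence of dialogue steps. Each step records who acts (user or
# system), the modality used, and the utterance or action.
from dataclasses import dataclass

@dataclass
class Step:
    actor: str      # "user" or "system"
    modality: str   # "voice" or "text"
    utterance: str

# One imagined snapshot for the "go to inbox" function, mirroring the kind
# of flow participants mapped out.
vis_inbox = [
    Step("user",   "voice", "go to inbox"),
    Step("system", "voice", "you have 3 new messages"),
    Step("user",   "voice", "view message 1"),
    Step("user",   "text",  "reply: typed answer"),
]

def transcript(vis):
    """Render a VIS one step per line, the way participants sketched them."""
    return "\n".join(f"{s.actor} ({s.modality}): {s.utterance}" for s in vis)

print(transcript(vis_inbox))
```

Keeping the modality on every step is what makes the snapshot useful for multimodal design: a mid-dialogue switch from voice to text shows up explicitly as a change in the `modality` field.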

3.5 Phase V: Action Scenarios
Because of the multimodal possibilities of our applications, we decided that a richer tool than a scenario was required. Therefore we decided to build a new tool: Action Scenarios [3]. There are two types of Action Scenario. The first type (used in Study 1) tries to discover past problems encountered with a device or application: a potential user is asked to provide a user story or vignette about a past experience with a device or application and to act out this experience, either with the device itself or with a device provided as a prop. The second type of Action Scenario (used in Studies 2 and 3) is one which:
- has been built with the sole purpose of a user acting it out in situ (one of our goals was to discover more about the usability of multimodality in everyday situations, therefore using an application in situ could be said to be a prerequisite to learning more about actual use);
- requires the user to undertake common tasks which they would normally do when using the application (for our messaging application, common tasks could be said to be reading, replying to, forwarding, or deleting a message).
The potential user is also free to invent an Action Scenario of their own on how they would interact with the application or service. However, the user should always be given an Action Scenario provided by the project team, so that they can understand the potential of the technology and know what level of detail an Action Scenario should have. Howard et al. [15] used scenarios in a similar way: they used them as a prop and asked actors to act out scenarios in situ, and found that this gave the design team the opportunity to examine the interaction between the device and the physical, social, and digital context. However, they encountered problems with using actors to act out scenarios, as the actors had concerns about dramatic tension, characterization, and emotional integrity. Our work differs from theirs in two ways. Firstly, we involved potential future users taken from the general population. Secondly, we encouraged the users themselves to develop their own Action Scenarios. We hoped that by doing this they would feel they were actively contributing to the design and evaluation, in the same way as suggested by [10, 23]. Data was captured from the studies in four ways: facilitators' notes; participants' sketches, annotations, and VIS; questionnaires; and videotape. The notes and tapes were reviewed by two of the design group and, when possible, with the participants themselves.

4. STUDY 1: REQUIREMENTS GATHERING
In the first user study the users undertook Phases II-V. The participants consisted of a mix of senior administrative staff and managers (though not computing or engineering orientated staff), aged 26-41, of a telecommunications service provider. Twelve participants took part in the study, split evenly along gender lines. The study was videotaped and notes were taken by facilitators (see Sub-section 3.5 for further information). Each participant was given paper, pens, and pencils on which to make notes and draw.

4.1 Results
The participants started by discussing the scenario provided by the design team. The participants talked about the different parts

of the scenario, where and when they could imagine themselves using such an application, discussing the different points contained in the scenarios and making suggestions of their own. The next phase saw the participants filling in a questionnaire, which asked for information such as age, experience with email applications and mobile devices, and so on. We also used this opportunity to gather other information, such as the contexts in which they saw themselves using such an application (see Table 2). We used the outcomes from Table 2 to decide upon when and where we would ask users to undertake Action Scenarios in the field.

Table 2. The Preferred Mode of Interaction of the Participants in Different Contexts

Context   Text  Voice  Mix  Would not use  Outcome
Bus        12     0     0        0         text
Train      11     0     1        0         text
Walking     2     5     5        0         mix
Airport     6     0     5        1         text
Breaks      2     1     9        0         mix
Home        6     0     5        1         text
Bar         5     0     7        0         mix
Driving     0     6     0        6¹        voice/WNU
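The Outcome column in Table 2 can be read as the modality chosen by the most participants in each context ("WNU" standing for "would not use"). The sketch below reproduces that reading; the tie-breaking rules are our own assumption, since the paper reports only the final outcomes:

```python
# Sketch: deriving Table 2's "Outcome" column as the modality preferred by
# the most participants in each context. The tie-breaking rules (a tie
# involving "mix" resolves to mix; other ties are reported jointly) are our
# assumption about how the table was compiled.
counts = {
    # context: (text, voice, mix, would_not_use)
    "Bus":     (12, 0, 0, 0),
    "Train":   (11, 0, 1, 0),
    "Walking": (2, 5, 5, 0),
    "Driving": (0, 6, 0, 6),
}
LABELS = ("text", "voice", "mix", "WNU")

def outcome(row):
    best = max(row)
    winners = [LABELS[i] for i, n in enumerate(row) if n == best]
    if "mix" in winners:       # a tie involving "mix" resolves to mix
        return "mix"
    return "/".join(winners)   # other ties reported jointly, e.g. "voice/WNU"

print({ctx: outcome(row) for ctx, row in counts.items()})
```

Applied to the counts above, this yields "text" for Bus and Train, "mix" for Walking, and "voice/WNU" for Driving, matching the table.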

The participants were then asked to act out an Action Scenario of when and how they might use our concept application. The Action Scenarios suggested by the participants contained contexts we had not thought about (e.g. usage while cooking or doing housework). Also, many of the participants initially questioned the use of voice; the Action Scenarios helped them see that there could be contexts in which the use of voice would make sense and would be usable, e.g. listening to an email while running on a treadmill at the gym. In addition, Action Scenarios enabled participants to prioritize our list of suggested application features and to elicit new ones. In the final phase the participants drew a sketch of how they envisioned the application looking (see Figure 2). This resulted in a set of interesting points being raised that pertained to the overall concept, which we could consider incorporating into our design, and a list of each participant's likes and dislikes.

Figure 2. The participant on the left is explaining her drawing of the messaging client and the participant on the right is acting out his own Action Scenario.

We were now at the stage of drawing up a strategy for the visual prototype design, as the studies had provided us with information we could use to revise the application concept, build an iteration roadmap, and revise and finalize the application scenarios.

5. STUDY 2: DESIGN
From the information gathered in the first study we drew up storyboards, sketches, and use cases. We then began to develop the application screens in HTML. Since we had taken multimodality into account during our user studies, and subsequently in the drawing up of the use cases, we had hoped that our interface screens would naturally reflect this. However, this was not the case, and we found it difficult, if not impossible, to know how a user would interact with the messaging client by voice. In order to combat these uncertainties, we decided that, for the initial prototype at least, each voice command should correspond to a visual command. Once we had completed the first prototype of the application, we again approached users to see what they thought of it and whether they agreed or disagreed with our concept of the user interaction. We undertook the second study for two reasons: we needed to evaluate our application with potential users in context, and we wanted to find out more about the potential use of voice/multimodality within the application.
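The one-voice-command-per-visual-command rule can be enforced structurally by registering every action once, with both a button label and a voice phrase, so that neither modality can drift ahead of the other. A hypothetical sketch (names and handlers are illustrative, not the project's implementation):

```python
# Hypothetical sketch of the "each voice command corresponds to a visual
# command" rule: every action in the messaging client is registered exactly
# once with both a button label and a voice phrase.
ACTIONS = {}

def register(name, button_label, voice_phrase, handler):
    ACTIONS[name] = {"button": button_label,
                     "voice": voice_phrase.lower(),
                     "handler": handler}

def on_voice(phrase):
    """Dispatch a recognized voice phrase to the same handler a tap uses."""
    for action in ACTIONS.values():
        if action["voice"] == phrase.lower().strip():
            return action["handler"]()
    return "not understood"

register("inbox",  "Inbox",  "go to inbox",    lambda: "showing inbox")
register("reply",  "Reply",  "reply",          lambda: "composing reply")
register("delete", "Delete", "delete message", lambda: "message deleted")

print(on_voice("Go to inbox"))
```

Because both modalities resolve to the same handler, a feature added only visually (or only by voice) is immediately visible as a gap in the registry.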

5.1 Procedure
The same participants took part in the second study and the data was collected in the same way. We decided to employ Phases II, IV, and V, but not III. The reason for this was that we were dealing with a very basic prototype and did not think that a usability questionnaire was appropriate at this stage. In addition, the relevant user information had already been gathered during the first study.

5.2 Results
The scenario discussion was again lively, with the participants engaging with our application concept. While we were confident that the participants could draw or annotate the printouts provided, we were less sure that they could map out a possible VIS. However, our concern was unfounded, as the participants realized that what we wanted was an idea of the interaction flow between a user and the application. An example of one of the VISs provided by a participant can be seen below:

Participant (voice command): "go to inbox"
System (audio response): "you have x number of messages"
Participant (voice command): "view message 1"
Participant comment: "I start reading the message and decide I want to reply to the sender: Reply text"
Participant: "I then start typing in my reply using text"

¹ The reason given to us for not using the application in the car was that some of the participants were concerned about safety issues.

(Tom Wright: User Study, Scotland, 1/12/04)

The users then undertook Action Scenarios in the field. The participants could clearly articulate problems they had with the application while working through the Action Scenarios with the application prototype in the field. The Action Scenarios helped us evaluate and redesign our application and learn more about problems with the proposed application in context. The feedback from the Voice Interaction Snapshots helped us streamline the flow of interaction in some parts of the application. For example, we changed the messaging application so that it would automatically select the intended modality of a newly composed message from the context: if the user speaks "new message" by voice, the default message modality is a voice message. We also discovered more about how the users thought they would use the multimodal features within the application. From the results, we were able to abandon table pagination in favor of vertical scrolling for inboxes with a large number of messages, and to redesign icons and interface elements that the participants had found confusing. In addition, we were able to remove and add some voice commands to accommodate the large variety of phrasings our users exhibited, and to learn the preferred modalities of users in certain contexts (some of these changes can be seen in Figure 3).

[Figure 3 annotations: comprehensibility of icons; rapid selection of table lines and opening of messages is important; long tables should not be split excessively into pages, scrolling is accepted.]

Figure 3. The design before the study (left) and the redesigned interface after the study (right).
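The context-derived default described above (a spoken "new message" yields a voice message, a tapped one yields text) reduces to a small rule. A sketch, with illustrative names rather than the project's actual code:

```python
# Sketch of the redesign rule: a newly composed message defaults to the
# modality of the command that created it. Function and field names are
# illustrative, not taken from the project's implementation.
def default_message_modality(command_modality):
    """Spoken "new message" -> voice message; tapped/typed -> text."""
    return "voice" if command_modality == "voice" else "text"

def compose(command_modality):
    # The user can still override the default before sending.
    return {"modality": default_message_modality(command_modality),
            "body": None}

print(compose("voice"), compose("visual"))
```

The point of the rule is that the user's last observed modality is the cheapest available signal of context, so the common case needs no extra selection step.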

6. STUDY 3: EVALUATION
The aim of the evaluation was to investigate the overall usability of the application, as well as to discover whether multimodality can truly improve mobile interfaces. Any user evaluation should assess the extent and accessibility of the application's functionality and the user's experience of the interaction, and identify specific problems. In addition, our study investigated whether the user had a different experience, and used different modalities, in the field than in the lab.

6.1 Procedure
The evaluation took place at the research center. There were two settings in which the users undertook the evaluation: a laboratory free from interruptions and noise, and outside, encompassing the area from the research center building to the nearby shops and train station. It was hoped that we would discover additional information about the users' interaction with the application by undertaking part of the study in a natural setting. We also anticipated that we would discover differences in the choices of interaction with the application in the field, as opposed to the lab. Ten test users took part in the study, five in each group. We believe that this is a reasonable number of test users, as many usability experts believe that between 4 and 6 users is an adequate sample size for such a study [10]. The test users were evenly split along gender lines and reflected the general user population in all aspects except age; unfortunately, we found it impossible to attract older test users to evaluate our application. We used a within-subjects design. Our independent variable was context, which had two possibilities: field or laboratory. Group 1 was asked to undertake the four Action Scenarios in the laboratory and then in the field, and Group 2 vice versa. Our dependent variable was the time taken to complete an Action Scenario. Others have suggested that the time for a task such as reading an email message can be rated on a scale where 5 minutes or more is unacceptable [12]. However, we would counter this by saying that these measures were for accessing email on a PC and not while on the move. We have found that users begin to feel frustrated with an application if they cannot complete a task within a shorter period of time, e.g. 2 minutes; this is something that has also been found by other researchers [24]. We undertook some pre-testing with 2 expert users and found that they could complete each Action Scenario in less than two minutes. We therefore felt that a rating scale in which 3 minutes or more is unacceptable was more appropriate for our application. We also gathered other performance measures (as recommended by [32] and [12]), such as time spent on and recovering from errors, number of errors (menu and selection), and repeats.
Following the same procedure as above, we decided on a rating scale for errors that we would apply to each Action Scenario: 0 = excellent, 1-2 = acceptable, and >2 = unacceptable. The users were given a set of four Action Scenarios to be performed. Our Action Scenarios were in quite a different format from task lists, as can be seen below:

You are a businessperson called Alex who is currently making their way from their office to a meeting across town. You want to stay up to date with your messages, even though you do not have access to a PC. Please complete the following scenario acting as 'Alex' using the mobile device. You receive a text message from Julia asking why you have not responded to the email she sent you. You open up the messaging application and start to look for it. You find the message you received from Julia ([email protected]) and start to read it. After reading the email, you decide to answer it quickly, so you compose an answer and send it back to Julia. You want to make sure that the message has been sent, so you switch to the Sent box and check whether the message you sent to Julia has been sent, as you don't want another reminder from her.
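The two rating scales can be written as small functions. The error thresholds are the ones given above (0 = excellent, 1-2 = acceptable, >2 = unacceptable); the time boundaries (under 2 minutes excellent, up to 3 minutes acceptable) are our reading of the 2- and 3-minute figures mentioned in the text, not an exact quotation of the scale:

```python
# Sketch of the two per-Action-Scenario rating scales. The error scale is
# as stated in the text; the time boundaries are our assumption based on
# the 2-minute (frustration) and 3-minute (unacceptable) figures mentioned.
def rate_time(minutes):
    if minutes < 2.0:
        return "excellent"
    if minutes <= 3.0:
        return "acceptable"
    return "unacceptable"

def rate_errors(errors):
    if errors == 0:
        return "excellent"
    if errors <= 2:
        return "acceptable"
    return "unacceptable"
```

Applied to the mean error counts reported in the results (2.6 in the lab, 1.5 in the field), these functions reproduce the "unacceptable" and "acceptable" labels used there.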

The users were given a user satisfaction questionnaire and asked to fill it in after each Action Scenario. The questionnaire followed the same structure as ones used in other usability studies [19]. This was done so that we could assess the user satisfaction with the application in both settings. During the studies the users were encouraged to talk to the evaluator and discuss freely the problems they were having with the application. According to [10] this form of evaluation has the following advantages: the process is less constrained, the user is

The users were given a brief introduction to the messaging application in our lounge while having a coffee. A facilitator took them through a sample scenario (the same one was demonstrated to all the participants) and the different parts of the evaluation study procedure were explained. The user then undertook a sample scenario themselves before commencement of the study. We analyzed the data using simple descriptive statistics e.g. means, medians and times. The rationale for this is that we believe that these are sufficient to make meaningful recommendations based on our small sample size (10 users, five for each condition). This view is supported by other usability researchers and practitioners [27, 12]. Inferential statistics could have been used to obtain statistically significant results. However, we feel that the application of these tests are most suitable when comparing two versions of an application or when the sample size for each condition is at least 10 to 12 participants, this was not the case with this study.

6.2 Results

It can be seen from Figure 4 that in the lab only one Action Scenario was completed in the time frame rated as excellent; in the other three cases the time taken would be rated as acceptable or unacceptable.

[Chart: time taken (mins) per Scenario Number (s1-s4) for Groups 1 and 2 in the lab.]

Figure 4. Time taken to complete an Action Scenario in the Lab.

If one looks at the field results, however, the result was reversed: three out of four of the Action Scenarios could be completed in less than two minutes (excellent). Only one (Action Scenario 1) took slightly longer, achieving a rating of only acceptable.

[Chart: time taken (mins) per Scenario Number (s1-s4) for Groups 1 and 2 in the field.]

Figure 5. Time taken to complete an Action Scenario in the Field.

Overall (Figure 6), the time taken in the lab to complete each Action Scenario was longer than in the field. This result was not what we expected: before the evaluation, we thought that the users would take longer in the field because they would be more flustered and there are more distractions there.

[Chart: overall time taken (mins) per Scenario Number (s1-s4) in the lab and the field.]

Figure 6. Time Taken to complete an Action Scenario Overall.

The error rates followed a similar pattern to the times, in that there were more errors in the lab (mean number of errors 2.6 = unacceptable) than in the field (mean number of errors 1.5 = acceptable); this again was something we were not expecting. However, it perhaps points to the value of undertaking field studies throughout the design process, as the application that resulted was easier to use in one of the main contexts in which it will be used. Another reason for these results could be that (as the facilitators noted) the users seemed more relaxed in the field. We encouraged the users to undertake the Action Scenarios wherever they wanted in the locale and to undertake any other tasks they would normally do. The users took us at our word, and one user lit up a cigarette to smoke while undertaking an Action Scenario (see Figure 7).

Figure 7. Participant using the messaging client in the field and the lab.

It could also be the case that the users perceived that they had more control over the field environment than the lab and were therefore more comfortable and relaxed in this setting. As regards the usefulness of multimodality, we found that when we asked Group 1 (Lab/Field) whether they had found the multimodality option useful (after they had used the application in the laboratory only), the majority said that they did not find it useful. However, this result was completely reversed once they had used the application in both settings. We also found that in the lab the users preferred to use either voice or text and did not mix modalities. In contrast, in the field the users tended to mix modalities as and when they thought it suitable, e.g. when traveling on escalators, going up or down stairs, and so on.
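The "excellent", "acceptable" and "unacceptable" labels applied to times above follow simple threshold rules. A minimal sketch in Python: the two-minute bound for "excellent" appears in the results, but the three-minute cutoff separating "acceptable" from "unacceptable" is an assumed value for illustration only:

```python
def rate_time(minutes: float) -> str:
    """Rate a task completion time. The 2-minute 'excellent' bound is
    taken from the results; the 3-minute acceptable/unacceptable cutoff
    is an assumption for illustration, not a figure from the paper."""
    if minutes < 2.0:
        return "excellent"
    if minutes <= 3.0:
        return "acceptable"
    return "unacceptable"

print(rate_time(1.7), rate_time(2.6), rate_time(3.2))
# -> excellent acceptable unacceptable
```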

The video tape, notes and annotated screenshots were analyzed by three usability experts to produce a list of usability problems. They counted the number and severity of the usability problems identified in each setting, as this would show the extent to which each setting supports the identification of usability problems.

Table 3. Number of usability problems identified in each setting

Severity Rating                                                    Lab   Field
Level 1: Prevents completion of a task                               6       5
Level 2: Creates significant delay and frustration                  13      12
Level 3: Minor effect                                                6       2
Level 4: Problems that are subtle and point to a
         possible future enhancement                                 5       6
Sum                                                                 30      25
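The counts in Table 3 amount to a simple tally of identified problems by setting and severity level. A sketch of such a tally, with a hypothetical problem list standing in for the one produced by the expert review:

```python
from collections import Counter

# Hypothetical (setting, severity-level) pairs -- placeholder data,
# not the problems identified in the study.
problems = [
    ("lab", 1), ("lab", 2), ("lab", 2), ("lab", 3),
    ("field", 1), ("field", 2), ("field", 4),
]

counts = Counter(problems)
for setting in ("lab", "field"):
    row = [counts[(setting, level)] for level in (1, 2, 3, 4)]
    print(setting, row, "sum:", sum(row))
# -> lab [1, 2, 1, 0] sum: 4
# -> field [1, 1, 0, 1] sum: 3
```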

There was only a minor difference between the identification of severe problems (levels 1 and 2) in the lab and the field (a difference of one problem at each level). More of the minor (level 3) usability problems were identified in the laboratory than in the field; this finding agrees with previous research in this area [4]. At level 4 there was again only one extra usability problem identified. Finally, by building multimodal applications and making users central to that development, we believed that our applications would be more acceptable and usable in a mobile environment. We found that the users agreed and were more satisfied with the application overall in the field, the system being rated 3.84 in the field and 2.44 in the lab (1 = poor, 5 = excellent).

7. DISCUSSION

As regards the new tools that we invented, we found that the Action Scenarios and Voice Interaction Snapshots proved useful and could be applied at different stages of the design process. The Action Scenarios provided us with information about the cultural context and context of use, and helped us to concretize our design concepts and redesign our applications. In addition, they proved very useful at the evaluation stage: instead of being given a single task or a list of unrelated tasks to complete, users were asked to complete a whole action sequence, which could be said to be more natural; this is in contrast to the normal practice of giving people a list of tasks [18]. One of the reasons for using Action Scenarios was the difficulty we had experienced in giving users such a task list in the field in previous studies at our research center: the users had to refer to the list constantly, which caused them to stop using the application and return to it only after an interval of several seconds. The two techniques are flexible enough to fit in with and supplement existing approaches to requirements gathering, design and evaluation. Some researchers have claimed that undertaking user evaluations in the field is not 'worth the hassle' (Kjeldskov et al. [17]) and is expensive, as have others [21]. We hope that our findings go some way toward refuting these claims, as we found that the overall usability of the application, in terms of effectiveness, efficiency and satisfaction, was rated more highly in the field than in the lab. In addition, the users made fewer errors and were more relaxed in the field setting. The field studies were found to cost roughly the same as the lab study, once all relevant costs (e.g. the cost of maintaining and equipping a room as a laboratory) were taken into account. We think the cost of a user study depends more on the type of study (its depth and breadth, the need for specialist settings, and so on) than on the setting alone.

8. CONCLUSION

This paper has presented an overview of our application development process, in which we used a new user-centric matrix to aid us in our goal of building a more usable multimodal mobile application that would work well in context. We started with Study 1: Requirements Gathering (see Section 4). The scenarios and insights generated from these studies guided the development of the storyboards and use cases. These artifacts not only serve as starting points for the visual prototype design but also provide us with information on aspects of mobile multimodality. In addition to the requirements gathering and final user evaluation, we involved users at an intermediary stage (Study 2: Design, see Section 5) in order to obtain feedback on our first prototypes. The results from the design study informed the subsequent redesign stages, which focused on the graceful degradation of the visual design down to smart- and low-end phones as well as the design of the voice interface. The final evaluation (Study 3: Evaluation, see Section 6) assessed the overall usability of the applications and the value of multimodal interaction as perceived by our target group. We hope that these findings go some way toward clarifying the need for both lab and field studies throughout the design of an application: undertaking studies solely in the lab or solely in the field is not sufficient, as both settings are required to uncover important usability issues, as our work has demonstrated. In conclusion, we have provided some new techniques for the design and evaluation of multimodal mobile applications and offered insights into how studies can be conducted in the field and the lab at different stages during design, development and evaluation. However, we believe that our research is still quite limited, and more research on tools and methods appropriate for carrying out such studies is crucial if we are to gain user acceptance of multimodality.

9. REFERENCES
[1] Andersen, N.E., Kensing, F., Lundin, J., Mathiassen, L., Munk-Madsen, A., Rasbech, M., and Sørgaard, P. Professional Systems Development: Experience, Ideas, and Action. Prentice-Hall, London, 1990.
[2] Baillie, L., Benyon, D.R., Macaulay, C., and Petersen, M.G. Investigating Design Issues in Household Environments. Cognition, Technology and Work, 1, 2003, 33-43.
[3] Bell, D. Flirt Project (Flexible Information and Recreation for Mobile Users). Available from: http://www.research.philips.com/Assets/Downloadablefile/passw4_22(1)-1048.pdf
[4] Beck, E.T., Christiansen, M.K., and Kjeldskov, J. Experimental Evaluation of Techniques for Usability Testing of Mobile Systems in a Laboratory Setting. In Proceedings of OzCHI 2003 (Brisbane, Australia, 2003).
[5] Bellotti, V., and Smith, I. Informing the Design of an Information Management System with Iterative Fieldwork. In Proceedings of DIS 2000 (New York, 2000), 227-237.
[6] Bødker, S. Scenarios in User-Centered Design: Setting the Stage for Reflection and Action. Interacting with Computers, 13, 2000, 61-75.
[7] Bødker, S., and Buur, J. The Design Collaboratorium: A Place for Usability Design. ACM Transactions on Computer-Human Interaction, 9, 2, 2002, 152-169.
[8] Brewster, S. Overcoming the Lack of Screen Space on Mobile Computers. Personal and Ubiquitous Computing, 6, 3, 2002, 188-205.
[9] Carroll, J.M. Making Use: Scenario-Based Design of Human-Computer Interactions. MIT Press, Cambridge, Mass., 2000.
[10] Dix, A., Finlay, J., Abowd, G., and Beale, R. Human-Computer Interaction. Prentice Hall, Hemel Hempstead, 1993.
[11] Druin, A. Cooperative Inquiry: Developing New Technologies for Children with Children. In Proceedings of CHI'99 (Pittsburgh, USA, 1999).
[12] Dumas, J.S., and Redish, J.C. A Practical Guide to Usability Testing (Revised Edition). Intellect, Exeter, England, 1994.
[13] Gaver, W., and Dunne, A. Projected Realities: Conceptual Design for Cultural Effect. In Proceedings of CHI'99 (Pittsburgh, USA, 1999). ACM Press, New York, 1999.
[14] Goodman, J., Gray, P., Khammampad, K., and Brewster, S. Using Landmarks to Support Older People in Navigation. In Proceedings of Mobile HCI 2004 (Glasgow, Scotland, Sept 13-16, 2004). Springer-Verlag, Berlin, Heidelberg, 2004, 38-48.
[15] Howard, S., Carroll, J., Murphy, J., Peck, J., and Vetere, F. Provoking Innovation: Acting-out in Contextual Scenarios. In Proceedings of HF2002 Human Factors Conference (Melbourne, Australia, 2002).
[16] Iacucci, G., Kuutti, K., and Ranta, M. On the Move with a Magic Thing: Role Playing in Concept Design of Mobile Services and Devices. In Proceedings of Designing Interactive Systems (DIS'00). ACM Press, 2000, 193-202.
[17] Kjeldskov, J., Skov, M.B., Als, B.S., and Hoegh, R.T. Is It Worth the Hassle? Exploring the Added Value of Evaluating the Usability of Context-Aware Mobile Systems in the Field. In Proceedings of Mobile HCI 2004 (Glasgow, Scotland, Sept 13-16, 2004). Springer-Verlag, Berlin, Heidelberg, 2004, 61-73.
[18] Lai, J. Facilitating Mobile Communication with Multimodal Access to Email Messages on a Cell Phone. In Proceedings of CHI'04 (Vienna, Austria, April 24-29, 2004). ACM Press, New York, 2004, 1259-1262.
[19] Lewis, J.R. IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instructions for Use. International Journal of Human-Computer Interaction, 7, 1, 1995, 57-78.
[20] Makela, A., Giller, V., Tscheligi, M., and Sefelin, R. Joking, Storytelling, Art Sharing, Expressing Affection: A Field Trial of How Children and Their Social Network Communicate with Digital Images in Leisure Time. In Proceedings of CHI'00. ACM Press, New York, 2000, 548-555.
[21] Mao, J.Y., Vredenburg, K., Smith, P.W., and Carey, T. The State of User-Centered Design Practice. Communications of the ACM, 48, 3, March 2005, 106-109.
[22] McKim, R.H. Experiences in Visual Thinking. PWS Publishers, Boston, Massachusetts, 1972.
[23] Monk, A.F., Wright, P., Haber, J., and Davenport, L. Improving Your Human-Computer Interface: A Practical Approach. Prentice Hall International, Hemel Hempstead, 1993.
[24] Nikkanen, M. User-Centered Development of a Browser-Agnostic Mobile E-Mail Application. In Proceedings of NordiCHI'04 (Tampere, Finland, Oct 23-27, 2004). ACM Press, New York, NY, 2004, 53-56.
[25] O'Brien, J., Rodden, T., Rouncefield, M., and Hughes, J. At Home with the Technology: An Ethnographic Study of a Set-Top-Box Trial. ACM Transactions on Computer-Human Interaction, 6, 3, 1999, 282-308.
[26] Palen, L., and Salzman, M. Beyond the Handset: Designing for Wireless Communications. ACM Transactions on Computer-Human Interaction, 9, 2002, 125-151.
[27] Rubin, J. Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. John Wiley & Sons, New York, 1994.
[28] Sperschneider, W., and Bagger, K. Ethnographic Fieldwork Under Industrial Constraints: Towards Design-in-Context. In Proceedings of NordiCHI 2000 (Stockholm, Sweden, 2000).
[29] Suchman, L. Making Work Visible. Communications of the ACM, 38, 9, 1995, 56-64.
[30] Tamminen, S., Oulasvirta, A., Toiskallio, K., and Kankainen, A. Understanding Mobile Contexts. In Proceedings of Mobile HCI 2003 (Udine, Italy, 2003), 17-31.
[31] Tollmar, K., Junestrand, S., and Torgny, O. Virtually Living Together: A Design Framework for New Communication Media. In Proceedings of Designing Interactive Systems (DIS2000) (Liverpool, UK, 2000).
[32] Tyldesley, D.A. Employing Usability Engineering in the Development of Office Products. Computer Journal, 31, 5, 1988, 431-436.
[33] Weilenmann, A., and Larsson, C. Collaborative Use of Mobile Telephones: A Field Study of Swedish Teenagers. In Proceedings of NordiCHI 2000 (Stockholm, Sweden, 2000).
