User Evaluation of Augmented Reality Systems David J. Haniff and Chris Baber The University of Birmingham Edgbaston Birmingham B15 2TT
[email protected] Abstract Augmented Reality (AR) systems need to be evaluated for their appropriateness for a given task. Three approaches are used in this paper to evaluate the Waterpump Augmented Reality Tool (WART): verbal protocol, performance time and a questionnaire. The WART system is compared with a paper version of the assembly instructions. The verbal protocol revealed more cognitive processing for the paper version of the task than for WART, the task took longer to complete with WART than with the paper version, and the questionnaire revealed the effect on performance of problems associated with AR. All of the participants in the evaluation, however, were positive about WART and appreciated its usefulness.
1. Introduction The evaluation of Augmented Reality (AR) can take a variety of forms, for instance usability engineering or system prototyping. However, these evaluation techniques do not examine what the user is thinking while using the AR system. All systems require the user to ‘think’ to perform the task, but AR systems require the user to think in a specific way due to the nature of the systems. AR systems are usually implemented with a head-mounted display (HMD) or portable display device, and there is also the issue of system lag associated with the technology. These and other factors make the use of AR systems unique; moreover, the cognitive processing required to use the systems differs from that of other technologies. The closest technological relation of AR is virtual reality (VR), where system latency is also an issue [1].
A multi-disciplinary approach to system development can be used to consider the internal thought processes being carried out while using AR technology with context-aware graphics. Psychology is the study of how the mind works and is suitable for gaining an idea of the internal mechanisms functioning within the mind when using these systems. The psychological technique of verbal protocol (thinking out loud while performing the task) is used to evaluate the Waterpump Augmented Reality Tool (WART) to ascertain its effectiveness. The cognitive processes can be examined by analysing the words spoken when the user ‘thinks aloud’. For example, when the user says ‘I am finding the correct part’, the cognitive process being executed is the search for the part. Other evaluation techniques can be used to support the verbal protocol. A questionnaire specific to AR systems has been developed and used to evaluate WART. A more conventional approach to assessing systems is to ‘time’ the user while using the AR system. To evaluate the performance of AR it needs to be compared with another technology. Here the AR system is compared with a standard paper-based approach to the assembly of a waterpump, and the comparison of paper versus WART is presented within this paper. WART is an AR system built using the AR Toolkit that provides instructions for the construction of a waterpump. The system uses markers to identify parts of a waterpump and superimposes graphics representing the parts that need to be placed in the assembly. The AR Toolkit has, however, not been fully evaluated.
2. User Evaluation of AR Usability engineering techniques such as ‘heuristic evaluation’ have been used by software developers to evaluate a system against a set of criteria; evaluators, for example, assess the
Proceedings of the Seventh International Conference on Information Visualization (IV’03) 1093-9547/03 $17.00 © 2003 IEEE
user control [2] and the help system of the software. Verbal protocol analysis takes longer than many usability engineering techniques; however, more can be gained through a detailed examination of the data. The evaluation of AR systems and the information that is presented on the display can consist of a prototyping approach whereby the user informs the design by actually using the system. Feiner et al. [3] evaluate their touring machine by prototyping the system. The system was developed using mobile computers and assessed by the users. However, the cognitive processing is often overlooked using this method. By analysing what people are thinking about when using an information presentation technique within an application, we can understand the reasons for human performance levels for that particular information representation. AR representations have distinct attributes, such as the combination of rendered graphics with real world objects and the effects of system latency on the registration of the real objects. Verbal protocol allows the thought patterns associated with these attributes to be extracted. Verbal protocol has also been supported by other evaluation techniques, such as the time taken to complete a task using the AR system and the use of questionnaires to gather the users’ response to the system. The AR evaluation techniques can be classified temporally according to when the data are recorded: the verbal protocol is recorded continuously during the activity, the ‘time’ is recorded at the end of the activity, and the questionnaire is completed after the activity.
[Figure 1 diagram: timeline showing Continuous: Verbal protocol; End: Time measure; After: Questionnaire]
Figure 1: Time of recording
The qualitative measures of verbal protocol and the questionnaire are introspective and retrospective respectively. The verbal protocol requires the user to talk about their thought processes as they happen, whereas the questionnaire is more reflective on the activity. The recording of the time is a quantitative measure.
3. Evaluating AR using Verbal Protocol AR research has not looked fully at the user perspective of AR enabled activities. The user’s cognitive function can be understood through the verbalization of what they are thinking. This is called verbal protocol. Verbal protocol is used to ascertain the internal processing conducted by the user while carrying out a task. The verbal statements made by the individual are recorded using a tape recorder or by hand and then examined. There are many advantages to using verbal protocol, for example getting access to information otherwise unattainable and being an unobtrusive method of extracting cognitive and physical actions. There are also disadvantages, such as the possible inability of the subject to express what they are thinking and the view that thinking and verbalisation are two separate processes. Verbal protocol is, however, an appropriate method for the extraction of thoughts. Ericsson and Simon suggest: “verbal reports, elicited with care and interpreted with full understanding of the circumstances under which they were obtained, are a valuable and thoroughly reliable source of information about cognitive processes” p. 247 [4].
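As an illustration of how recorded utterances might be turned into counts of cognitive processes, the following is a minimal sketch. The keyword-to-process coding scheme and the sample transcript are hypothetical assumptions made for this example (only the utterance ‘I am finding the correct part’ comes from the text); the paper does not specify the coding scheme actually used.

```python
from collections import Counter

# Hypothetical coding scheme: keyword -> cognitive process label.
# These categories are illustrative assumptions, not the scheme used in the study.
CODING_SCHEME = {
    "finding": "search",
    "looking for": "search",
    "compare": "comparison",
    "fits": "verification",
    "attach": "assembly action",
}


def code_utterance(utterance):
    """Return the label of the first keyword found in the utterance, or 'uncoded'."""
    text = utterance.lower()
    for keyword, process in CODING_SCHEME.items():
        if keyword in text:
            return process
    return "uncoded"


def tally_processes(utterances):
    """Count how many utterances fall under each process label."""
    return Counter(code_utterance(u) for u in utterances)


# Invented transcript; only the first utterance is taken from the paper.
transcript = [
    "I am finding the correct part",
    "Now I attach it to the base",
    "Hmm",
]
print(tally_processes(transcript))
```

In practice a human analyst would usually code the transcript; a keyword lookup like this is only a rough first pass that leaves ambiguous utterances as ‘uncoded’ for manual review.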
4. WART: Waterpump Augmented Reality Tool Assembly requires the manipulation of objects to construct a whole. AR systems lend themselves to the assembly process because they are closely associated with real world objects. Pertinent information can be displayed in the appropriate position within the construction. The user does not have to look away at secondary information for instructions; the information is within the field of view. For example, AR has been used for wirebundle assembly at Boeing [6], architectural construction [7] and for evaluating assembly sequences [8]. The WART system developed at the University of Birmingham provides a sequence of instructions attached to parts of an assembly for a typical construction activity. The context-aware system uses the AR Toolkit 2.11 libraries produced by Hiroshima City University and the University of Washington. The toolkit is written in Visual C++ and uses Microsoft Vision C++ classes. The AR system runs in Windows 98 and uses the OpenGL glut
library to produce the virtual overlays. The distance at which a pattern can be recognized depends on the size of the pattern; the patterns in figure 4 are 3x3 inches. The camera used is a standard off-the-shelf web cam. The system is video-based: it captures the scene and adds rendered graphics to the image. The pattern matching process consists of storing a static bitmap representation of the target pattern through the use of a make_pattern application. This stored pattern is then matched against an object in the real world that has been identified as having the basic structure common to all patterns. The camera position is then ascertained by examining the size and orientation of the pattern. A virtual object is associated with the pattern within a data file and is then drawn on the AR display.
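The matching and distance ideas described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the AR Toolkit's actual algorithm: patterns are modelled as small grids of pixel values, matching uses normalised correlation over the four 90-degree rotations, and distance follows a pinhole-camera model. All function names and the focal length value are hypothetical.

```python
import math


def rotate90(grid):
    """Rotate a square grid of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def correlation(a, b):
    """Normalised correlation between two equally sized grids (1.0 = same pattern)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0


def best_match(candidate, template):
    """Compare the candidate with the template at 0, 90, 180 and 270 degrees.

    Returns (best score, clockwise rotation of the template in degrees).
    """
    best = (-1.0, 0)
    current = template
    for quarter_turns in range(4):
        score = correlation(candidate, current)
        if score > best[0]:
            best = (score, quarter_turns * 90)
        current = rotate90(current)
    return best


def estimate_distance(marker_size, focal_length_px, apparent_size_px):
    """Pinhole-camera estimate: distance = focal length * real size / apparent size."""
    return focal_length_px * marker_size / apparent_size_px


# Stored template and the same pattern seen rotated by a quarter turn.
template = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
seen = rotate90(template)
score, rotation = best_match(seen, template)

# 3-inch marker (from the text), hypothetical 500 px focal length,
# marker appearing 100 px wide in the image.
distance_inches = estimate_distance(3.0, 500.0, 100.0)
```

The distance function shows why recognition range scales with pattern size: for a fixed minimum apparent size in pixels, a larger marker can be placed proportionally further from the camera.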
5. WART: Evaluation 5.1. Introduction The following evaluation compares the AR system, WART, with standard paper-based instruction consisting of a 2D engineering drawing. The context-aware AR technology itself requires a specific type of interaction with real world artefacts. For example, recognition problems can occur if the markers are moved too quickly, because the system cannot keep up with fast movement. This situation could be improved with faster hardware. Nevertheless, the AR technology needs to be assessed for its usefulness despite these restrictions. There are also further augmented reality features that need to be addressed to assess its usability. These features consist of problems associated with AR technology:
• System lag/registration: The system lag within the AR system is due to a low frame rate display. What is displayed lags behind what is happening in the real world. This is due to the computational intensity of the processing. The frame rate for high-end VR systems is approx. 30 frames per second (fps).
• Image Disparity: The camera’s view used to provide video feedback may be offset from the user’s view of the real world. This may lead to disorientation.
• Resolution: The sharpness of the display can affect the recognition of objects within the scene. The higher the resolution, the more pixels need to be processed for computer vision based AR systems.
• Rendering: The rendering of objects can aid the recognition of virtual objects. However, rendering adds computational load on the computer and as a consequence can contribute to system lag.
• Manoeuvrability: The manoeuvrability of the objects receiving supplemental information can be affected by the sensors being used. Tethered 3D trackers limit the movement of objects, and system lag can lead to slow movement of objects as the user tries to match the slow frame rate.
• Environmental Conditions: The environmental conditions can influence the input sensors. With computer vision based systems, for example, the lighting can affect the recognition of objects. In addition, with 3D trackers, electromagnetic disturbances can affect the accuracy of the sensors.
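The relationship between frame rate and perceived lag in the list above can be made concrete with some simple arithmetic. The sketch below is illustrative only: the function names, the 5 fps figure and the marker speed are hypothetical assumptions, and only the 30 fps value comes from the text.

```python
def frame_interval_ms(fps):
    """Time between displayed frames, in milliseconds."""
    return 1000.0 / fps


def overlay_offset_px(speed_px_per_s, fps, frames_of_delay=1):
    """How far (in pixels) a moving marker travels while the display is
    frames_of_delay frames behind, i.e. how far the overlay trails the part."""
    return speed_px_per_s * frames_of_delay / fps


# 30 fps is the high-end VR figure quoted in the text.
interval_fast = frame_interval_ms(30)

# 5 fps is a hypothetical low frame rate for a slow vision-based system.
interval_slow = frame_interval_ms(5)

# A marker moving at a hypothetical 100 px/s, one frame of delay at 5 fps.
offset = overlay_offset_px(100, 5)
```

Under these assumptions the overlay trails the part by tens of pixels at a low frame rate, which is one way to read the ‘swimming’ of virtual objects that participants are asked about in the questionnaire.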
A questionnaire has been developed to capture these concerns. The AR specific questions presented to the users after using the AR system, which address some of these issues, are:
1. The virtual objects swam as I moved the part (system lag). Scale: 1 (Not at all) to 7 (Very Much So)
2. This system lag was detrimental to my performance. Scale: 1 (Not at all) to 7 (Very Much So)
3. The clarity of the display was... Scale: 1 (Poor) to 7 (Very Good)
4. The clarity of the display affected my performance. Scale: 1 (Not at all) to 7 (Very Much So)
5. My movements were... Scale: 1 (Very Slow) to 7 (Very Fast)
6. I found the image displacement (camera view separate to eye view) difficult to adjust to. Scale: 1 (Not at all) to 7 (Very Much So)
7. I had difficulty performing the task due to this image displacement. Scale: 1 (Not at all) to 7 (Very Much So)
8. The virtual objects were appropriate for the task. Scale: 1 (Not at all) to 7 (Very Much So)
Figure 2: AR questionnaire
The AR specific questions relate to finely grained [9] graphically based AR systems. Finely grained systems closely couple the virtual objects to the real world object; coarsely grained systems, for example those using GPS sensors, are less accurate. The aim of the study is to assess the performance of context-aware AR technology for an artefact-based activity by comparing AR with paper-based instruction. The study differs from the VR study presented in Boud et al. [10] by addressing the ‘on-line’ use of instructions (continual use of instruction) as opposed to looking at the amount of learning facilitated by the technology.
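Responses to the 7-point items in figure 2 could be summarised per question as in the following minimal sketch. The function name and the response values are hypothetical and invented purely for illustration; they are not data from the study.

```python
from statistics import mean, median

SCALE_MIN, SCALE_MAX = 1, 7  # the 7-point scale used by every item in figure 2


def summarise_item(responses):
    """Validate the responses to one item and return simple summary statistics."""
    for r in responses:
        if not SCALE_MIN <= r <= SCALE_MAX:
            raise ValueError(f"response {r} is outside the 1-7 scale")
    return {"n": len(responses), "mean": mean(responses), "median": median(responses)}


# Invented responses to question 1 ("The virtual objects swam as I moved the part").
q1_responses = [5, 6, 4, 7, 5]
q1_summary = summarise_item(q1_responses)
```

Note that the direction of the scale differs between items (a high score is unfavourable for question 1 but favourable for question 3), so items should be summarised separately rather than averaged into a single score.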
5.2. Equipment A web-cam digital camera is mounted onto a tripod to provide the video feedback. The camera is a standard web-cam running at 320x240 resolution. A Fujitsu Stylistic Pentium (200 MHz) is used to run the AR software. The lighting is restricted by closing the blinds of the room used for the experiment; this is to prevent glare, which affects the recognition performed by the system. Three visual markers are placed on the waterpump parts. The visual markers themselves are mounted on card to ensure that they remain flat. Each visual marker consists of a square containing a shape. The water-pump parts are placed on a large desk with the camera and tripod.
5.3. Participants The study used ten postgraduate participants aged 22-30 from the University of Birmingham with no knowledge or experience of using context-aware augmented reality systems or of constructing water-pumps. The participants had no visual impairments that could not be resolved with corrective lenses. There were 8 males and 2 females participating in the study.
5.4. Method The water-pump parts are placed in front of the participants on a desk within a lab. They could either stand or be seated throughout the experiment, whichever they felt comfortable with. Three of the water-pump parts have visual markers on them; these are the largest parts within the waterpump assembly. There are three visual markers, each with a number of parts associated with it (see figure 3). The three main parts are placed in the sequence of assembly, and the other parts are placed randomly next to them. The three main parts themselves were placed at an angle of 30 degrees in order for the camera to have a full view of the markers on the parts. The light was also restricted above the camera with a piece of card to avoid glare and consequent mis-recognition. The participants were instructed that there were three main parts with markers and that instructions were associated with them in the form of virtual graphical parts. The investigator also gave an example of what to do with a part bearing a marker, in terms of positioning it within the view of the camera. They were also instructed to move onto the next main part once they had constructed the current main part (see figure 3).
Figure 3: Sub-assemblies
The paper condition consisted of a 2D engineering drawing of the assembly process. The time taken for both the 2D condition and the AR condition was recorded. In addition, the participants were asked to think out loud while doing the task. Ericsson and Simon [4] point out that verbalization will not change the processes involved in completing the task but it may slow down the subject’s performance. Because both the paper and WART conditions in this comparison use verbal protocol, both are subject to this time delay, and the comparison is therefore fair. The verbal protocol data was recorded by hand due to the limited availability of recording equipment while conducting the experiment. There was minimal contact with the participants while they carried out the activities. Following the AR condition they were asked to complete a questionnaire and to comment on the technology for the AR system. A questionnaire was not provided for the paper condition. The objective of the questionnaire was to extract the opinion of the subjects on issues concerned with the use of AR systems, such as the image disparity and resolution issues described earlier. These are specific
concerns for the use of the AR toolkit and the hardware used on the AR system.
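The completion times recorded for the two conditions could be compared along the following lines. This is a minimal sketch that assumes each participant completed both conditions (a paired design, which the text does not state explicitly) and uses invented times in minutes; none of the numbers below are data from the study.

```python
import math
from statistics import mean, stdev


def paired_t(times_a, times_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(times_a) != len(times_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(times_a, times_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev is the sample (n-1) form
    return mean(diffs) / standard_error, n - 1


# Hypothetical completion times in minutes for the same five participants.
ar_times = [5.0, 6.5, 4.0, 7.0, 5.5]
paper_times = [3.0, 3.5, 3.0, 4.0, 3.5]

t_value, dof = paired_t(ar_times, paper_times)
print(f"AR mean {mean(ar_times):.2f}, SD {stdev(ar_times):.2f}")
print(f"Paper mean {mean(paper_times):.2f}, SD {stdev(paper_times):.2f}")
print(f"t({dof}) = {t_value:.3f}")
```

If the two conditions had instead been completed by different groups of participants, an independent-samples test would be the appropriate choice.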
Figure 4: Water-pump AR Tool [photograph; labels: stylistic & camera, waterpump]
Figure 6: Mean Utterances [bar chart titled "Verbal Protocol"; conditions: AR, Paper; vertical axis: mean utterances, 0 to 120]
5.5. Results It took less time to complete the task with the paper-based instructions than with the AR system (see figure 5). The standard deviation for the AR condition was 1.24 and the standard deviation for the paper condition was 0.3. A Student's t-test was performed on the data; the difference between the two conditions is significant [t(4)=4.604, p