The Use of Eye-Tracking in Usability Testing of Medical Devices

Proceedings of the 2017 International Symposium on Human Factors and Ergonomics in Health Care

Thomas Koester, Jesper E. Brøsted, Jeanette J. Jakobsen, Heike P. Malmros & Niels K. Andreasen
FORCE Technology, Department of Applied Psychology
Kongens Lyngby, Denmark

The study presented here was made in collaboration with a medical device manufacturer. Normally, the device manufacturer uses verbal self-reporting protocols, interviews and observations in their formative usability tests during product development. The objective of our study was to investigate whether the use of eye-tracking technology can contribute to the data collection and bring new data and knowledge into the product development. The use of eye-tracking provided five unique insights and findings. Although their evaluated significance varied, a couple of findings stand out as important, indicating that the use of eye-trackers can indeed contribute positively to the results obtained from a usability test based on traditional ethnographic methods. It should be noted that the use of eye-tracking requires additional time, resources and technical skills, as well as optimal light conditions. However, with the promising perspectives in mind, eye-tracking is recommendable as an additional tool for usability studies.

Copyright 2017 Human Factors and Ergonomics Society. All rights reserved. 10.1177/2327857917061042

INTRODUCTION

The present study was made in collaboration with a medical device manufacturer. Normally, the device manufacturer uses verbal self-reporting protocols, interviews and observations in their formative usability tests during product development. The objective of our study was to investigate whether the use of eye-tracking technology (Tobii Pro Glasses 2 shown in figure 1 and iMotions 6.2.5 recording software) can contribute to the data collection and bring new data and knowledge into the product development. It was the assumption that the use of eye-tracking would provide an insight into the visual perception of the user that cannot be obtained from verbal self-reporting, interviews or observations. In this context, eye-tracking is considered a supplement to the ethnographic data collection (interview and observation), giving further insight into the visual perception and attention of the user, and not a standalone tool. The observation of user actions and use errors is still the core of the formative usability test, but the combination of traditional ethnographic methods and eye-tracking could give the development team a much deeper understanding of the reasons and explanations behind the observed user actions and use errors. Furthermore, this could provide better insights into the root causes of use errors, as expected by the FDA and other regulatory authorities. In general, the idea is that narratives and stories can, and should, be designed through a strategic approach using principles from human factors and basic psychological knowledge, just as when designing the physical device.

The study is presented as work in progress and is conducted by the Department of Applied Psychology at FORCE Technology in cooperation with the medical device manufacturer. The name of the manufacturer and the product in development cannot be mentioned in this paper for confidentiality reasons during the product development process.

RESEARCH QUESTIONS

The study tried to answer the following questions:

1. What are the pros and cons of the use of eye-tracking in usability tests?
2. How can eye-tracking used in combination with conventional methods such as observations, interviews and verbal self-reporting support the understanding of the user's visual attention and visual distraction during use?
3. Will eye-tracking used in combination with conventional methods provide faster and more efficient usability tests yielding more details, more insights and better understanding of e.g. root causes of use errors?

THEORY

The study is based on the idea of composite methods that combine ethnographic methods with data collection based on visual perception (eye-tracking). This idea is inspired by the PCA model (perception, cognition, action) of user behavior as described in the “Guidance for Industry and Food and Drug Administration Staff” (2016). Some methods, such as observation of behavior, can give us information about “action”. Other methods, such as eye-tracking, can give us information about “perception”. And “cognition” can be accessed using verbal self-reporting (e.g. “thinking-aloud” reflections) and interviews.

However, in the regular design of formative usability studies, the medical device manufacturer today uses only observations (including video observations), self-reporting and interviews to gain access to the perception of the user, i.e. what the user reports having perceived. We know that such reports can be heavily influenced by an individual’s urge to give socially desirable responses (giving answers perceived as appropriate) and by memory-related issues (what is remembered, possibly influenced by short-term memory span, primacy-recency effects or false memories). Equally, the interview data can be affected by the fact that very short perception episodes and perception during automated actions might happen at a non-conscious level. Furthermore, participants are not asked about their experience or the cause of difficulty in using the device between tasks, but only in a post-test interview. In relation to this, it was our assumption that the use of eye-tracking would provide data about very short visual perception episodes and/or episodes that are part of automated behavior and therefore difficult for the user to report during self-reporting or interviews.

METHOD

The study focused on two scenarios or use cases with a medical device: (1) how the user correctly identifies the “device status”, and (2) how the user handles replacement of consumables in the maintenance of the device. The identification of the device status includes (a) identifying whether the device is ready for use or not and (b) identifying whether the device needs maintenance or whether problems need to be solved before use (and which problems to solve, if any).

The scenarios described above were tested as part of a formative usability test with ten participants. The test took place in February 2017 per the normal schedule for the product development process. A small pilot study with two participants was carried out prior to the test, and its results are included in the study, bringing the number of participants to twelve.

Participants

The medical device is intended for use by health care professionals. The twelve participants were recruited from other departments in the manufacturer’s organization and six of the participants had prior experience with similar devices.

Research design

As mentioned, the test was conducted per normal practice for the medical device manufacturer, using the ethnographic methods normally applied, including observation, verbal self-reporting and interviews. This process included three observers from the manufacturer (manufacturer’s team). The manufacturer’s team carried out an analysis based on their data collection. The test was also observed by a team of psychologists from FORCE Technology using eye-tracking technology (Tobii Pro Glasses 2 and iMotions 6.2.5 software) to investigate and record details about the participants’ visual perception and attention during the use scenarios. The team of psychologists performed an additional analysis based on the eye-tracking recordings. The results of the analysis made by the manufacturer’s team and the results of the analysis made by the team of psychologists using eye-tracking were compared and evaluated to identify additional information and results gained from the use of eye-tracking.

The technology of eye-tracking: What is eye-tracking and how does it work?

Eye trackers “are video based and use information about the center of the pupil and the center of one or many corneal reflections in the eye image to estimate movements of the eye” (Nyström et al. 2016). In short, mobile eye-tracking (eye-tracking glasses) refers to the measurement of eye activity, making it possible to measure and analyze what a test person is looking at, in which order and for how long. The final eye-tracking recording including gaze data is in fact a three-piece data collection consisting of infrared images from the eye camera, data from the scene camera (a regular HD camera) on the front of the glasses, and the data output of the software. The outcome is an estimate of the gaze direction. The eye camera thus records eye movements, while the scene camera records the surroundings toward which the glasses are pointed during head movements.

The mobile eye-tracking device used in this study (Tobii Pro Glasses 2 shown in figure 1) has a sampling rate of 50 Hz, i.e. each gaze point represents 20 ms. A cluster of gaze points denotes a fixation, a stabilization of the retina toward a specific object of interest. Typically, the duration of a fixation ranges from 150 to 600 ms (Duchowski, 2007). The rapid eye movements between fixations are called saccades; they move the fovea to a new location of interest and can be either voluntary or reflexive (Duchowski, 2007). While saccades are rapid, another type of eye movement, smooth pursuit, lets the eyes steadily follow a moving object without generating obvious saccades. Fixations, saccades and smooth pursuit are prominent metrics of visual attention. However, according to Posner et al. (1980), attention can be shifted independently of the fovea, making it possible to attend to an object while maintaining gaze elsewhere. This is called the attentional spotlight: attention can be compared to a spotlight that enhances the efficiency of detection of events within its beam. Notably, the fovea has no special connection to the attentional system, so it is possible to look at one point while attending to another (e.g. in the periphery of the visual field).

The eye-tracking device is also equipped with a processing unit (box) containing the image detection, 3D eye model and gaze mapping algorithms, in which the raw video material is stored and processed so that it is ready to be analyzed. This box is not shown in figure 1, but usually, as was the case in our study, it is worn in a belt clip. Through its own Wi-Fi network the box is connected to a PC from which it is possible to livestream camera recordings and related gaze points.

In the analysis, heat maps and areas of interest (AOIs) were presented. Heat maps (not shown in this article) are static or dynamic aggregations of gaze points and fixations revealing the distribution of visual attention. Based on a color-coded scheme, heat maps serve as a method to visualize which elements of the stimulus draw attention. Red areas indicate a high number of gaze points and thus an increased level of interest, whereas yellow and green areas indicate decreasing visual attention. Areas of interest provide metrics such as fixation counts and durations: boundaries are defined around a feature or element of the eye-tracking stimulus, and the software calculates the desired metrics within the given boundary over the time interval of interest (iMotions, 2016). See figure 2.
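The notion of a fixation as a cluster of gaze points sampled every 20 ms can be illustrated with a small sketch. The code below uses a simple dispersion-threshold approach (often referred to as I-DT); it is not the algorithm used by the Tobii or iMotions software, which the study does not describe. The function names, the pixel-based coordinates and the 30-pixel dispersion threshold are assumptions chosen for illustration only; the 150 ms minimum duration follows the lower bound of the fixation range cited above.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

SAMPLE_INTERVAL_MS = 20  # 50 Hz sampling rate -> one gaze point every 20 ms


@dataclass
class Fixation:
    start_ms: int      # time of the first gaze point in the cluster
    duration_ms: int   # number of gaze points times 20 ms
    x: float           # centroid of the cluster (hypothetical pixel units)
    y: float


def dispersion(points: Sequence[Tuple[float, float]]) -> float:
    """Spread of a group of gaze points: x-range plus y-range."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(
    gaze: Sequence[Tuple[float, float]],
    max_dispersion_px: float = 30.0,   # assumed threshold, not from the study
    min_duration_ms: int = 150,        # lower bound of the cited fixation range
) -> List[Fixation]:
    """Group consecutive gaze points into fixations (dispersion-threshold sketch)."""
    min_samples = math.ceil(min_duration_ms / SAMPLE_INTERVAL_MS)
    fixations: List[Fixation] = []
    i = 0
    while i + min_samples <= len(gaze):
        j = i + min_samples
        if dispersion(gaze[i:j]) <= max_dispersion_px:
            # Grow the window while the points stay tightly clustered.
            while j < len(gaze) and dispersion(gaze[i:j + 1]) <= max_dispersion_px:
                j += 1
            window = gaze[i:j]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append(
                Fixation(i * SAMPLE_INTERVAL_MS, len(window) * SAMPLE_INTERVAL_MS, cx, cy)
            )
            i = j
        else:
            # No stable cluster starts here (e.g. a saccade); move on one sample.
            i += 1
    return fixations
```

With this sketch, ten identical gaze points followed by a short jump and twelve further identical points would yield two fixations of 200 ms and 240 ms; the three samples in between are too few to count as a fixation and would be treated as part of a saccade.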

Figure 1: Psychologist Jeanette J. Jakobsen wearing Tobii Pro Glasses 2.

Figure 2: Gaze map aggregating four participants’ use of eye-tracking. The figure is not an illustration of the actual device. It is for illustration purposes only.
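As a rough illustration of how metrics can be calculated within an AOI boundary, the sketch below computes three common AOI metrics from a list of fixations: time to first fixation (time from stimulus onset until the first fixation inside the AOI), time spent (summed duration of fixations inside the AOI) and ratio (how many participants fixated the AOI at all). This is a minimal sketch under stated assumptions and not the iMotions implementation: the rectangular AOI shape, the class and function names, and the coordinate units are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fixation:
    start_ms: int      # time since stimulus onset
    duration_ms: int
    x: float           # hypothetical pixel coordinates in the scene
    y: float


@dataclass
class AOI:
    """Rectangular area of interest (an assumption; real AOIs may have any shape)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation: Fixation) -> bool:
        return (self.left <= fixation.x <= self.right
                and self.top <= fixation.y <= self.bottom)


def time_to_first_fixation(fixations: List[Fixation], aoi: AOI) -> Optional[int]:
    """Milliseconds from stimulus onset to the first fixation inside the AOI."""
    hits = [f.start_ms for f in fixations if aoi.contains(f)]
    return min(hits) if hits else None  # None: the AOI was never fixated


def time_spent(fixations: List[Fixation], aoi: AOI) -> int:
    """Total fixation duration inside the AOI, in milliseconds."""
    return sum(f.duration_ms for f in fixations if aoi.contains(f))


def hit_ratio(per_participant: Dict[str, List[Fixation]], aoi: AOI) -> str:
    """Share of participants with at least one fixation inside the AOI, e.g. '1/2'."""
    hits = sum(
        1 for fixations in per_participant.values()
        if time_to_first_fixation(fixations, aoi) is not None
    )
    return f"{hits}/{len(per_participant)}"
```

A long time to first fixation or a low ratio for an AOI that contains a task-critical element is the kind of pattern that would point to a possible usability problem.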

Evaluation parameters

The five most important parameters for evaluation (criteria for success) were:

A. FASTER RESULTS: Eye-tracking will support faster results (e.g. insights available at a very early stage during the usability test, and comments from the participant can be obtained in an interview right after the test).
B. LEVEL OF DETAIL: Eye-tracking provides deeper details of the perception and attention of the user compared to e.g. the user’s self-reporting.
C. ADDITIONAL INFORMATION: Eye-tracking provides additional information not obtainable by means of traditional methods, e.g. additional information about the root cause of use errors and about the user’s attention to important indications and calls to action from the device.
D. BETTER UNDERSTANDING OF VISUAL DISTRACTIONS DURING USE: Eye-tracking provides a more valid representation of visual distractions than self-reporting of distractions, which is typically “filtered” through, for example, social desirability (it is not socially acceptable to be easily distracted during work) and impression management (it is considered more professional to be non-distractible).
E. TO WHAT EXTENT ARE USERS INFLUENCED BY THE EYE-TRACKING TECHNOLOGY ITSELF, e.g. change of focus, enhanced attention, enhanced awareness, difference in content of self-reports or interviews?

RESULTS AND DISCUSSION

Observations during the usability study

While the test participants were engaged in the experiment, the team of psychologists closely followed the eye-tracking livestream for immediate observations. By observing each test participant, it quickly became clear which scenarios in the design called for the most attention. It also became clear that some of the test participants tended to adjust the eye-tracking glasses, which called for new instructions that would not have been given without the immediate observation (live-streamed video). In this setting, eye-tracking also revealed that participants tried to read the instructions from the facilitator’s papers while they were being given the instructions. It can be argued that the participant might lose some information in that process.

Analyses after the usability study

Figure 2 shows an example of a type of analysis made after the usability study itself. The figure is not an illustration of the actual device. It is for illustration purposes only. The areas marked 1, 2, 3, 4 and 5 are so-called “areas of interest” (AOIs) with three chosen metrics: time to first fixation (TTFF), time spent and ratio. Time to first fixation indicates the amount of time it takes a test participant to look at a specific AOI from the stimulus onset (it takes around 1.5 s for the four respondents to look at AOI #1). Time spent quantifies the amount of time that respondents have spent on an AOI (the respondents spent 4.8 s looking at AOI #2), and ratio indicates how many of the respondents look at a specific AOI from the stimulus onset (0/4 respondents look at AOI #5).

The study focused on two scenarios with the medical device:

Scenario 1. The test participants should identify the device status, i.e. whether the device is ready to be used or not.
Scenario 2. The test participants should be able to identify whether the device needs maintenance or other problem-solving actions, such as finding the correct function in the interface. This function is denoted as the configuration menu in the interface and marked as AOI #5 in figure 2.

As the AOI analysis illustrates, for all four test participants who went through scenario 2, AOI #5 has the longest TTFF of all demarcated areas. These results indicate that participants had trouble finding the right function, which was crucial for solving their task, thereby revealing a possible design flaw in the interface (i.e. identifying a usability problem).

Insights, findings and classification of results

The analysis made by the team of psychologists included both observations during the usability study and analyses after the usability study as explained in the two previous sections. The analysis showed seven main insights and findings as illustrated in table 1 below:

Insights and findings | Classification of results

The participants started to look (sneak peek) at the device already when they received instructions for the usability test | Better understanding of visual distractions during use

None of the participants looked at an important button (AOI #5 in figure 2) when first visiting a particular screen | Additional information

Participants with prior experience from other similar devices were influenced by this experience and thereby showed reduced performance compared to inexperienced participants (also known as negative transfer) | Faster results; Additional information

Some participants did not notice an important indication of failure condition, but after the test they reported that this indication must have been present and in fact, they did act accordingly | Level of detail

Participants with experience did not read instructions on the screen or skipped the last lines of the instructions | Faster results; Level of detail; Additional information

One participant looked at a product name only and ignored an important product identification symbol | Faster results; Additional information

Some participants tried to match the product name shown on the screen with the product name on the packaging of consumables. Furthermore, they spent much time on this, and in their effort to do so, they missed the correct match of product identification symbol (different packaging share almost the same product name and the product discrimination should be done on basis of the product identification symbol. There is a risk that users select a wrong box with consumables and fail to notice the small difference between product names) | Level of detail; Better understanding of visual distractions during use

Table 1: The insights and findings correspond to the five most important parameters for evaluation (criteria for success) as identified under “Method”. Parameters A-D are indicated in the table and parameter E was evaluated by means of interviews after each session.

Evaluation

The seven insights and findings described under “Results” were presented to the manufacturer’s team for comments and evaluations concerning three issues:

1. It was evaluated if each insight and finding had also been identified by the manufacturer’s team in their own analysis or whether it was a result of the use of eye-tracking as additional method apart from the traditional ethnographic methods.
2. It was evaluated how significant, valuable or relevant each of the seven insights and findings were per the manufacturer’s evaluation in relation to the experimental design of the usability test, the practical administration of the test, the scenography, instructions to participants etc.
3. It was evaluated how significant, valuable or relevant each of the seven insights and findings were per the manufacturer’s evaluation in relation to the design process, prevention of use errors etc.

Significance, value or relevance

Five (marked *) of the seven insights and findings were not identified in the manufacturer’s analysis based on traditional ethnographic methods. They can therefore be attributed to the use of the eye-tracking method (evaluation issue 1). The manufacturer’s evaluation of the significance, value or relevance (issues 2 and 3 in the previous section) is presented in table 2.

Insights and findings | Issue 2 | Issue 3

* The participants started to look (sneak peek) at the device already when they received instructions for the usability test | Very low | Very low

* None of the participants looked at an important button (AOI #5 in figure 2) when first visiting a particular screen | Very low | High

Participants with prior experience from other similar devices were influenced by this experience and thereby showed reduced performance compared to inexperienced participants (negative transfer) | Low | Very low

* Some participants did not notice an important indication of failure condition, but after the test they reported that this indication must have been present and they acted accordingly | None | Low

* Participants with experience did not read instructions on the screen or skipped the last lines of the instructions | Low | Very high

* One participant looked at a product name only and ignored an important product identification symbol | Very low | Some

The participants tried to match the product name shown on the screen with the product name on the packaging of consumables. Furthermore, they spent much time on this, and in their effort to do so, they missed the correct match of product identification symbol | Very low | Very low

Table 2: Manufacturer’s evaluation

The cons of eye-tracking: Influence from eye-tracking and problems during data collection

The first research question in our study is about the pros and cons of eye-tracking in formative usability testing. The pros have been discussed in the previous sections about insights, findings, results, evaluation etc. The cons are discussed in this section and relate to how eye-tracking influences the participant during the usability test and to some problems during data collection.

All participants were interviewed after their sessions to learn how the equipment had influenced them during the usability test. They were asked how they experienced the use of eye-tracking and how the technology affected them in the situation. The interviews were based on three questions:

1. WAS IT UNCOMFORTABLE WEARING THE GLASSES? Most participants (9) reported no discomfort, two participants reported some discomfort and a single participant reported that the glasses were ill-fitting.
2. DID YOU NOTICE THE PRESENCE OF THE GLASSES DURING THE TEST? Seven participants reported that they did not notice the glasses during the test and five participants reported that they noticed the glasses, e.g. “In the beginning you can feel they are there” and “The edge is a bit annoying”.
3. DID THE GLASSES AFFECT THE WAY YOU INTERACTED WITH THE DEVICE? Almost all participants (11) reported that the glasses did not affect their interaction with the device. A single participant reported “Maybe a little because I was thinking about where I was looking”.

During the study, respondents could mostly move freely without complications. Even so, we experienced that the glasses sometimes tended to shift during the recording, resulting in less accurate gaze points. This may have been due to movements of the participant’s facial muscles, making the glasses move a little. It also became evident that some of the test persons who wear glasses in their everyday lives showed a tendency to adjust the eye-trackers as they would their regular glasses. Instructions were given early in the process to prevent this from happening too often. One test person was using strong contact lenses (-11 diopters), making it impossible to reliably calibrate the eye-trackers. Finally, it was discovered during the pilot study that the lighting in the room contrasted strongly with the screen of the medical device, making the text too difficult to read on the recording. The solution was to mount extra light in the room, reducing the contrast between the lighting in the room and the screen on the medical device.

Limitations and further studies

Undoubtedly, the present study would have benefitted from including an interview with the participants while reviewing the recorded video feed with the eye-tracking information shown as it happened. Technically, this is easily done, albeit time-consuming. If participants can re-experience their own task execution by seeing the video recording including the eye-tracking dots provided by the iMotions software, it is quite likely that additional reflections and detailed reports will emerge. Put differently, the eye-tracking video can most likely serve as an excellent source of retrieval cues for the participants. This remains to be shown in future studies.

CONCLUSIONS

It is concluded from the study that the use of eye-tracking provided five unique insights and findings. Although their evaluated significance varied, a couple of findings stand out as important. This implies that the use of eye-trackers can indeed contribute positively to the results obtained from a usability test based on traditional ethnographic methods. It should be noted that the use of eye-tracking requires additional time, resources and technical skills, as well as optimal light conditions. However, with the promising perspectives in mind, eye-tracking is recommendable as an additional tool for usability studies.

REFERENCES

Duchowski, A. T. (2007). Eye Tracking Methodology: Theory and Practice (2nd ed., pp. 42-47).
iMotions (2016). Eye Tracking. Pocket Guide. iMotions Biometric Research Platform.
Irwin, D. E. (1992). Visual Memory Within and Across Fixations. In K. Rayner (Ed.), Eye movements and visual cognition: Scene perception and reading (pp. 146-165). New York: Springer-Verlag. (Springer Series in Neuropsychology)
Nyström, M., Hooge, I., & Andersson, R. (2016). Pupil size influences the eye-tracker signal during saccades. Vision Research, 121, 95-103.
Posner, M., Snyder, C., & Davidson, B. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109(2), 160-174.
U.S. Department of Health and Human Services - Food and Drug Administration - CDRH (2016). Applying Human Factors and Usability Engineering to Medical Devices. Guidance for Industry and Food and Drug Administration Staff, February 2016.