
Desirability and Usability: Testing Visceral and Pupillary Data in UX Methodology

by Maryn Belling, Fer O'Neil and David Ryan

Summary:
- Usability studies and UX often rely on reflective data rather than visceral data for measuring variables related to user desirability.
- Integrating visceral data into usability studies can help User-Centered Design and UX better triangulate user reflections.

Keywords: usability, UX, eye-tracking, desirability, EyeGuide® Mobile Tracker, pilot study, visceral, cognitive responses

We undertook this pilot study to determine whether people's reflective perception of desirability correlated with their visceral responses. Specifically, we wanted to explore and better understand the phenomenon of desirability because it allows us as researchers to study the liminal thresholds where visceral signals become cognitive ones. To aid in this study, we used a methodology that triangulated think-aloud protocol, pupil dilation via EyeGuide™, and emotional response cards to help determine whether respondents' visceral reactions (via pupil dilation) correlated with their self-reported reactions.

Is What Is Desirable Trackable?

Desirability—the appeal of a design or product—is essential to understanding user experience (UX) because, as D. Norman (2002) argues, "attractive things work better." Consequently, though attractive and appealing denote emotively focused, user-related values, they connote strong usability traits even when audiences indicate challenges with the utility and usability of a product or process. Below are brief summaries of studies that shaped our research agenda:

- In an empirical study of the visual design of computer-simulated cash machines, the perception of a technology's "attractiveness" led to a perception of its usability (Kurosu & Kashimura, 1995; Tractinsky, 1997).
- A SURL study found that users perceived a website to be more or less usable based on an impression of its visual design, regardless of a user's actual usability experience with the site (Phillips & Chaparro, 2009).
- Understanding the emotional dimension of a user's experience, therefore, might be more effective than measuring "pure" usability (Stolterman, 1995).


- The ISO defines usability as having three components: effectiveness, efficiency, and satisfaction. Despite existing studies that measure and report on effectiveness and efficiency, researchers acknowledge that they are challenged to measure the subjective, visceral data that indicates user satisfaction (Hornbæk, 2006), including the desirability of design.

Understanding Visceral and Reflective Data

Existing desirability studies often rely on reflective data—such as verbal or written survey responses—rather than pre-cognitive, visceral data to understand desirability. The problem is that participant-provided reflective data often contain social desirability bias or self-editing. For instance, the results of desirability studies using Microsoft Product Reaction Cards (MPRC), regardless of how quickly participants are required to respond, are once removed from participants' visceral responses because visceral responses are pre-cognitive. Once users start qualifying their responses via reflective tools—such as MPRCs printed as a checklist, or prompts asking participants to choose either a positive trait or its opposite within a linguistic context ("interesting" or "uninteresting") to quantitatively assess user responses to visual design—such survey results are considered "anecdotal" (Barnum & Palmer, 2010) because these responses are shaped reflectively rather than viscerally.

Visceral Impressions: Millisecond Threshold

Because users often make first-impression judgments in as little as four milliseconds (Zajonc, 1980; 2000), researchers must capture users' visceral responses nearly instantaneously for results to be effective (Hawley, 2010; Norman, 2002, p. 23). Finding an effective empirical testing process that produces primary visceral data would benefit UX researchers because such data could complement secondary reflective data. Design and engineering teams could make decisions based on which design elements elicit an optimal emotional response rather than rely solely on secondary reports of satisfaction (Martin & Hanington, 2012). Lindgaard, Fernandes, Dudek & Brown (2006) illustrate the series of steps that occur in the first 50 milliseconds of a human's interaction with data (see Figure 1). The measurable part of the process—movement of the eyes, mouth, arms, etc.—occurs within that first 50 ms., sometimes before a user can interpret the experience cognitively (Lindgaard et al., 2006, p. 116).


Figure 1. Lindgaard et al. (2006) illustration of human data perception.

Pilot Study: Triangulating Eye-Tracking

Because our eyes are an important part of the body's larger viscerosensory system, our pilot study used eye-tracking—an experimental method that collects pupillary data on whether a pupil dilates and where, how quickly, and how long a user fixates on a design element—to capture and validate a user's visceral responses to static designs for advertisements promoting technical communication PhD programs. Pupil dilation can indicate stress, surprise, cognitive load, emotion, conflict, or arousal (Cavanaugh et al., 2014). A long fixation on a design element may indicate that a user is taking a longer time interpreting or relating to a design or object (Cooke, 2006). Alternatively, if a user fixates on a particular design element quickly, it could mean that part of the design is more noticeable (Berlin, 2012). We compared pupil dilation and fixation data to visceral and reflective data, using previously demonstrated methods, to determine whether we could use such data to measure the desirability of static designs.

To gather reflective data, we used MPRC (Benedek & Miner, 2002), a Likert scale asking users to rate each design, low-fidelity paper prototyping (Still & Morris, 2010), and think-aloud protocol (Boren & Ramey, 2000). We also used emotion heuristics as an alternative way to measure visceral response (de Lera & Garreta-Domingo, 2007).
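To make the data handling concrete, here is a minimal sketch of how raw tracker samples could be reduced to the two metrics discussed above: baseline-corrected pupil dilation and fixation time per design. The file layout, column names, and 500 ms baseline window are hypothetical assumptions for illustration; they do not describe EyeGuide's actual export format or the pipeline used in this pilot.

```python
# Minimal sketch of reducing raw eye-tracker samples to two per-stimulus
# metrics: baseline-corrected pupil dilation and total fixation time.
# The CSV layout, column names, and 500 ms baseline window are hypothetical
# assumptions, not EyeGuide's actual export format.
import csv
from collections import defaultdict

def load_samples(path):
    """Yield one dict per timestamped gaze/pupil sample."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {
                "stimulus": row["stimulus_id"],
                "t_ms": float(row["timestamp_ms"]),
                "pupil_mm": float(row["pupil_diameter_mm"]),
                "fixation_id": row["fixation_id"],  # empty string = saccade sample
            }

def summarize(samples, baseline_ms=500):
    """Per stimulus: mean dilation change from baseline, and total fixation time."""
    by_stim = defaultdict(list)
    for s in samples:
        by_stim[s["stimulus"]].append(s)

    summary = {}
    for stim, rows in by_stim.items():
        rows.sort(key=lambda r: r["t_ms"])
        t0 = rows[0]["t_ms"]
        baseline = [r["pupil_mm"] for r in rows if r["t_ms"] - t0 < baseline_ms]
        rest = [r["pupil_mm"] for r in rows if r["t_ms"] - t0 >= baseline_ms]
        base = sum(baseline) / max(len(baseline), 1)
        dilation_change = sum(rest) / max(len(rest), 1) - base
        # Approximate fixation time: fixation samples x mean sampling interval.
        interval = (rows[-1]["t_ms"] - t0) / max(len(rows) - 1, 1)
        fixation_ms = sum(1 for r in rows if r["fixation_id"]) * interval
        summary[stim] = {"dilation_change_mm": dilation_change,
                         "fixation_ms": fixation_ms}
    return summary
```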


A/B Testing

We had our participants look at seven examples of static advertisements for university PhD programs in technical communication, four of which were existing ads and three of which were proposed renderings created by a design team (see Figure 3). We created two testing protocols (see Figure 2), recruited individuals interested in PhD programs, and divided eight participants (three men, five women) equally into the two groups.

Protocol A:
- Eye-tracking
- Microsoft Product Reaction Cards
- PrEmo
- Likert scale

Protocol B:
- Low-fidelity paper prototyping
- Concurrent think-aloud protocol
- PrEmo
- Ten emotion heuristics Likert scale

Figure 2: Protocol methods.
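As a small illustration of the group split just described, the sketch below encodes the two protocols from Figure 2 and assigns eight participants evenly between them. The randomized, size-balanced assignment and the participant labels are illustrative assumptions, not a record of the actual recruiting procedure.

```python
# Sketch of a balanced A/B assignment for eight participants.
# The protocol contents mirror Figure 2; the randomized split and the
# participant labels are illustrative assumptions only.
import random

PROTOCOLS = {
    "A": ["Eye-tracking", "Microsoft Product Reaction Cards",
          "PrEmo", "Likert scale"],
    "B": ["Low-fidelity paper prototyping", "Concurrent think-aloud protocol",
          "PrEmo", "Ten emotion heuristics Likert scale"],
}

def assign_groups(participants, seed=None):
    """Shuffle participants and split them evenly between protocols A and B."""
    rng = random.Random(seed)
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}

if __name__ == "__main__":
    participants = [f"P{i}" for i in range(1, 9)]  # eight participants
    groups = assign_groups(participants, seed=42)
    for label, members in groups.items():
        print(label, PROTOCOLS[label], members)
```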

Figure 3: Static renderings for Protocol A.

Results and Discussion

When we began the testing process, we hypothesized that, regardless of which protocol was used, triangulation would help determine the measurability of the desirability of a design. In other words, A/B testing would be mutually authenticating, validating the experimental process as a method for testing desirability. It quickly became clear that distilling the complexity of desirability into "positive, negative, and neutral" codes was problematic because the research design introduced too many divergent variables. For instance, data fell into two separate categories: (1) data that could be coded as positive, neutral, or negative; and (2) data from low-fidelity paper prototyping that was unique to each participant's imagination and fell outside of the three codes. In Protocol B, participants could remove design elements and add elements that they imagined should be there, thereby complicating the "positive-negative-neutral" coding considerably.
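To illustrate the coding step just described, the sketch below maps individual observations onto the positive/neutral/negative scheme and flags the open-ended prototyping data that falls outside it. The observation fields and the Likert/dilation cutoffs are hypothetical assumptions, not the instruments' actual outputs or the thresholds used in this pilot.

```python
# Illustrative sketch of the three-valued coding scheme discussed above.
# The observation fields and the Likert/dilation cutoffs are hypothetical
# assumptions, not the actual instrument outputs or the pilot's thresholds.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Observation:
    participant: str
    design: str
    source: str                          # "MPRC", "Likert", "PrEmo", "pupil", "prototyping"
    likert: Optional[int] = None         # 1-5 rating, if applicable
    dilation_change_mm: Optional[float] = None
    note: Optional[str] = None           # free-form prototyping remark

def code_observation(obs: Observation) -> str:
    """Reduce one observation to 'positive', 'neutral', 'negative', or 'uncodable'."""
    if obs.source == "prototyping":
        # Open-ended edits (adding/removing elements) do not fit the scheme.
        return "uncodable"
    if obs.source == "Likert" and obs.likert is not None:
        if obs.likert >= 4:
            return "positive"
        return "negative" if obs.likert <= 2 else "neutral"
    if obs.source == "pupil" and obs.dilation_change_mm is not None:
        # Treat a clear dilation increase as a (tentatively) positive visceral signal.
        return "positive" if obs.dilation_change_mm > 0.1 else "neutral"
    return "neutral"
```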


Of the visceral data we collected, pupil dilation became the most problematic. Because a variety of affective states cause pupil dilation, including fear, difficulty, arousal, confusion, and surprise (Chatham, Frank, & Munakata, 2009; Goldwater, 1972; Hess & Polt, 1960; Kahneman & Beatty, 1966; Laeng, Sirois, & Gredeback, 2012), it may not be possible to link pupil dilation with a single emotion or affect such as "like" versus neutrality or negativity. However, pupil dilation data may be useful both as supplemental, correlational data alongside other sources of visceral and reflective data and as an illustration of the experimental process of gathering desirability information, because the data can affirm reflective responses.

The Validity of Measuring Visceral Response

We offer this discussion in two parts. First, when half of the participants gave reflective responses to the designs, their self-reporting validated their visceral responses. In this respect, a user's visceral response played a persuasive role in the overall experience of the user. This result is important, for it validates the importance of and need for measuring visceral responses in relation to behavioral and reflective measurements.

The second part poses some practical, interpretive challenges and raises some theoretical issues. The challenge is in understanding why half of the respondents chose to self-report a response different from their visceral reactions. For 50% of our participants, eye-scan results indicated a positive pupil dilation response, yet these users reported a neutral or negative user experience through the traditional methods.

Interpreting Visceral Data

There are a few ways to interpret this potential dissonance. First, for these respondents, visceral responses are micro-experiences that are factored into a much larger spectrum of reflection, so their initial responses are not as important as their final impressions. Though Lindgaard et al. (2006) argue that 50 ms. is the time necessary to formulate a valid first impression, our participants engaged in a five-second test for each design. In their decision-making, this period allowed the users to weigh their visceral responses against other, perhaps more important and more complicated impressions. If 50 ms. is the time it takes for a visceral response to form, then the question is: how long does this first response last before other visceral impressions form? Is it 51 ms.? What is more important is that 50% of our participants weighed their initial, visceral response (measured as a generally favorable response via pupil dilation) against other responses and allowed the latter ones (neutral and negative) to dominate their reporting.

There are a few probable reasons for this result. First, the humorous nature of some of the designs (Figure 3) may have provoked visceral responses that led to cognitive dissonance. For instance, the humor-focused ads seemed to intentionally challenge conventional PhD program ads and, perhaps, the perceived values of potential doctoral students. Though these designs could have appealed directly to the visceral instincts of users, they posed problems regarding the utility of the ads (akin to "I viscerally like this subversively humorous 'size matters' ad, but I don't think it is usable because it goes against my public values, or I have doubts about the ad's persuasive utility").


In this reflective circumstance, a user's visceral response via eye-tracking may indicate strong personal desirability, but in the more reflective parts of the reporting process (MPRCs, PrEmo, and Likert scales), users chose to report a neutral or negative perception. For these participants, whatever positive visceral response was felt and measured was processed as a value contrary to the perceived utility or usability of the designs. In this process, their self-reporting privileges a reflective response over a physiological one (Lindgaard et al., 2006, p. 117).

Integrating Visceral Responses

This interpretation does not necessarily confound our findings regarding integrating visceral data within existing UX methodologies, for our discussion adds to the field of inquiry regarding the study of viscerosensory data and UX. We encourage further study of the complex interplay between involuntary, physiological responses via eye-tracking and voluntary, cognitive responses via self-reporting. UX testers must be careful when interpreting these viscerosensory areas when visceral data seems incongruent with reflective data.

We also affirm the need for more studies in humor-centered design and UX because many studies that focus on understanding visceral responses do so in relation to non-humor-based aesthetics, despite the pervasiveness of humor. A better understanding of the techniques of humor-centered design and its relationship to UX can help researchers design activities and tests that measure how users factor humor into their decision-making. This recommendation also raises a compelling need to understand the intent of designers as a factor in testing user response, particularly when designers create viscerally oriented interfaces that challenge users with subversive messages, novel approaches, or unconventional techniques. Because the intent behind some of the designs was unknown (beyond garnering attention and sending messages), our findings could have been better informed had we achieved a clearer understanding of the comic framing of three of their four designs. Integrating testers early in the design process improves communication, achieves a shared understanding of design intent, and promotes group collaboration (Gothelf & Seiden, 2013).

Finally, though tracking eye movement can provide "so many aspects into the window of cognition" (Poole & Ball, 2005, p. 3), usability testers must be careful in working with the eye-mind hypothesis, particularly as their projects relate to viscerally oriented interfaces designed to challenge users with humor or examples of unconventional thinking or language. Eye-tracking does provide reliable data for measuring a user's eye movements and fixations, but there are other variables for testers to consider when factoring pupillary responses into the reflective, self-reporting processes of users. Our experiment used a five-second eye-tracking sequence in conjunction with tasks that required comparative analysis and reflective self-reporting. In this process, visceral responses were contextualized with "reasoning behavior" (Muldner et al., 2009), where pupil dilation data were processed alongside tasks and sequences that required cognitive and emotive acts of reporting.
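At its core, the 50/50 result discussed above is a per-participant agreement check between the coded visceral response and the coded reflective response. A minimal sketch of such a check follows; the participant records are hypothetical placeholders rather than the pilot's data.

```python
# Sketch of the visceral-vs-reflective congruence check discussed above.
# The records below are hypothetical placeholders, not the pilot's data.

def congruence_rate(records):
    """Fraction of participants whose visceral and reflective codes agree."""
    agree = sum(1 for r in records if r["visceral"] == r["reflective"])
    return agree / len(records)

if __name__ == "__main__":
    # Each record holds the coded visceral response (from pupil dilation)
    # and the coded reflective response (from MPRC/PrEmo/Likert reporting).
    records = [
        {"participant": "P1", "visceral": "positive", "reflective": "positive"},
        {"participant": "P2", "visceral": "positive", "reflective": "negative"},
        {"participant": "P3", "visceral": "positive", "reflective": "neutral"},
        {"participant": "P4", "visceral": "positive", "reflective": "positive"},
    ]
    print(f"Congruence: {congruence_rate(records):.0%}")  # 50% for this toy set
```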


For half of the users, their visceral responses (via pupillary dilation) to the largely humor-centered aesthetic designs strongly influenced their reflective choices regarding the usability of these ads. This data is encouraging because it shows that a user's visceral response can be isolated (Hawley, 2010), measured, and factored into usability testing. However, for the other half of the respondents, their pupillary results led to alternative conclusions that are worth studying. Though we have explored some reasons why, this incongruity deserves further investigation so usability testers can improve their understanding and integration of viscerosensory data within their research methodology. The next desirability-usability study could enlarge the scope with more users, employ alternative methods and tasks, and use aesthetic designs that do not carry humorous messages.

Acknowledgments

Special thanks to Dr. Brian Still and his colleagues at EyeGuide.

References

Barnum, C., & Palmer, L. (2010). More than a feeling: Understanding the desirability factor in user experience. Ext. Abstracts CHI 2010, 10-15. ACM.

Berlin, D. (2012). Psychophysiology and eye tracking: New and old technologies that complement usability research. Retrieved from http://www.slideshare.net/Banderlin/psychophysiology-and-eyetracking-in-user-experience

Boren, M. T., & Ramey, J. (2000). Thinking aloud: Reconciling theory and practice. IEEE Transactions on Professional Communication, 43(3), 261-278.

de Lera, E., & Garreta-Domingo, M. (2007). Ten emotion heuristics: Guidelines for assessing the user's affective dimension easily and cost-effectively. In People and Computers XXI: Proceedings of the 21st British HCI Group Annual Conference on People and Computers, University of Lancaster, 3-7 September. Swindon, UK: BCS.

Desmet, P. M. A. (2003). Measuring emotion: Development and application of an instrument to measure emotional responses to products. In M. A. Blythe, A. F. Monk, K. Overbeeke, & P. C. Wright (Eds.), Funology: From usability to enjoyment (pp. 111-123). Dordrecht: Kluwer Academic Publishers.

Gothelf, J., & Seiden, J. (2013). Lean UX: Applying lean principles to improve user experience. Cambridge, MA: O'Reilly Media.

Hawley, M. (2010, February 22). Rapid desirability testing: A case study. UX Matters. Retrieved September 21, 2014.


Kurosu, M., & Kashimura, K. (1995). Apparent usability vs. inherent usability. CHI '95 Conference Companion, 292-293.

Lindgaard, G., Fernandes, G., Dudek, C., & Brown, J. (2006). Attention web designers: You have 50 milliseconds to make a good first impression! Behaviour & Information Technology, 25(2), 115-126. DOI: 10.1080/01449290500330448

Martin, B., & Hanington, B. (2012). Desirability testing (Chapter 29). In Universal methods of design: 100 ways to research complex problems, develop innovative ideas, and design effective solutions. Beverly, MA: Rockport Publishers.

Muldner, K., Christopherson, R., Atkinson, R., & Burleson, W. (2009). Investigating the utility of eye-tracking information on affect and reasoning for user modeling. In User Modeling, Adaptation, and Personalization: 17th International Conference, UMAP 2009, Trento, Italy (LNCS 5535, pp. 138-149). Springer-Verlag.

Norman, D. A. (2002). Emotion and design: Attractive things work better. Interactions Magazine, ix(4), 36-42. Retrieved from http://www.jnd.org/dn.mss/emotion_design_at.html

Phillips, C., & Chaparro, B. (2009, October). Visual appeal vs. usability: Which one influences user perceptions of a website more? Usability News, 11(2). Retrieved from http://usabilitynews.org/visual-appeal-vs-usability-which-one-influences-user-perceptions-of-a-website-more/

Poole, A., & Ball, L. J. (2005). Eye tracking in human-computer interaction and usability research: Current status and future prospects. In C. Ghaoui (Ed.), Encyclopedia of Human-Computer Interaction. Pennsylvania: Idea Group, Inc.

Redish, J., & Barnum, C. (2011). Overlap, influence, intertwining: The interplay of UX and technical communication. Journal of Usability Studies, 6(3), 90-101.

Still, B. (2010). Mapping usability: An ecological framework for analyzing user experience. In Albers & Still (Eds.), Usability of complex information systems: Evaluation of user interaction (pp. 89-108). Boca Raton, FL: CRC Press.

Stolterman, E. (1995). The aesthetics of information systems. Institute of Information Processing, University of Umea, Sweden.

Tractinsky, N. (1997). Aesthetics and apparent usability: Empirically assessing cultural and methodological issues. CHI 97 Electronic Publications: Papers.

Zajonc, R. (1980). Feeling and thinking: Preferences need no inferences. American Psychologist, 35(2), 151-175.

Zajonc, R. (2000). Feeling and thinking: Closing the debate over the independence of affect. In Feeling and Thinking: The Role of Affect in Social Cognition (pp. 31-58).


About the Authors

Fer O'Neil
Fer is a technical writer for a global security software company and a PhD student in the Technical Communication and Rhetoric program at Texas Tech University.

David Ryan
David is Faculty Chair and Academic Director of the MA in Professional Communication program at the University of San Francisco. He serves as director of USF's Usability Research Lab.

Maryn Belling
Maryn serves as Executive Director of United Fund of Globe Miami and is a PhD student in the Technical Communication and Rhetoric program at Texas Tech University.

Citing this Pilot Study:
Belling, M., O'Neil, F., & Ryan, D. (2017). Desirability and usability: Testing visceral and pupillary data in UX methodology. Unpublished manuscript, Texas Tech University, Lubbock, TX.