International Journal of Human - Computer Studies 101 (2017) 1–9
Real-time detection of navigation problems on the World ‘Wild’ Web
Markel Vigo, Simon Harper
University of Manchester, School of Computer Science, Manchester M13 9PL, United Kingdom
Keywords: Automated usability testing; Web usage; Web interaction

Abstract
We propose a set of algorithms to detect navigation problems in real-time. To do so, we operationalise some navigation strategies suggested by the literature and investigate the extent to which the exhibition of these strategies is an indicator of navigation problems. Our Firefox extension senses behaviour indicative of a user experiencing interaction problems. Once these problems are detected we can suggest changes to the sites concerned, and eventually adapt a site in real time to better accommodate the user. A remote longitudinal study monitored real website user behaviour, analysing every application event on the client side both individually and in combination. The study was conducted with 34 participants over 400 days, totalling 567 h of normal usage with no task restriction. Our sensing algorithms detected 374 issues with an 85% precision for purposeful Web use, suggesting that, indeed, when users search for specific information the exhibition of these strategies indicates the presence of problems. This contribution is novel in that, as opposed to a post-hoc analysis of user interaction, real-time detection of navigation problems at the user end opens up new research avenues in the realm of adaptive interfaces and usability analysis.
1. Introduction

The World Wide Web has become the ecosystem that enables individuals to satisfy their information and communication needs. Often, however, navigating the Web is far from a smooth and satisfying experience. There are several reasons why the Web can be a difficult environment under certain circumstances: a website can be a flawed artefact because of usability, accessibility and information architecture problems. Sometimes, users may lack expertise, be unfamiliar with design conventions, or there may be a mismatch between the expectations of individuals and the intentions of Web designers. We explore whether the navigation strategies users employ to overcome the difficulties they encounter can be detected as they happen, and when these strategies are indicators of navigation problems. Our previous work found identifying the behavioural strategies of users who operate under regular interaction modalities on the Web (i.e. able-bodied users with desktop computers) to be a challenging and costly task (Vigo and Harper, 2013a). It is even more difficult if we aim to identify these strategies in naturalistic and ecologically valid settings, where no predetermined task model is assumed. Therefore, we propose a method to transfer solutions from constrained populations, who cope more frequently and overtly, to broader ones. In order to suggest the validity of the method we have previously illustrated the process of
observation, analysis, coding, deployment and modification of two behavioural strategies that have been identified in visually disabled users, re-checking and retracing (Vigo and Harper, 2013d). The outcomes were twofold: firstly, we discovered an overlap between the strategies employed by populations operating under different interaction modalities. Previous research found common problems between regular and constrained interaction modalities (Yesilada et al., 2010), in that able-bodied users and physically disabled users had overlapping problems with mobile data input. This finding was not only confirmed by our work; we also found that there are common strategies to overcome such problems. Secondly, the proposed method allows us to find how the operationalisation of the strategies differs between populations, which facilitates the identification of the behavioural solutions employed by broad populations on the Web. In this work we describe a longitudinal and ecologically valid way of studying behaviour in the real world, moving away from controlled laboratory studies, which give clean data but in a setting that does not take into account real-world activities or the challenging behaviours of task-free interaction,1 and does not acknowledge that behaviour evolves over time. To do so, we embed a set of algorithms that sense such behaviours into a Mozilla Firefox add-on. We distributed this add-on to 34 participants who used it cumulatively for 400 person-days, totalling approximately 567 h of normal usage and with no task
⁎ Corresponding author. E-mail address: [email protected] (S. Harper).
1 By ‘task free’ we mean to move away from the dictionary definition of a ‘piece of work imposed as a duty’ – Oxford English Dictionary – to activity undertaken by choice, and browsing where a directed task (regardless of at whose direction) is not undertaken.
http://dx.doi.org/10.1016/j.ijhcs.2016.12.002
Received 14 January 2016; Received in revised form 8 December 2016; Accepted 14 December 2016; Available online 21 December 2016
1071-5819/© 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/BY/4.0/).
restriction. Our sensing algorithms detected 374 problems users were having when interacting with the Web. We evaluated the precision of our sensing algorithms by asking users to confirm problems once we had recognised they were in difficulties. In this case, we were able to test our precision; we got 85% correct for purposeful Web use. Our results suggest that, when users are actively seeking some information or need to complete a transaction, the navigation strategies employed by users are indicative of problematic situations. Furthermore, this precision is stable across the two independent cohorts of participants who reported their feedback during different periods, which suggests the reliability of these algorithms. The automatic detection of navigation problems in real-time at the user end opens up new avenues in usability evaluation and repair. The contributions of this paper are the following:

• From our observations and a literature analysis we identify a number of strategies users employ when encountering problems on the Web.
• These strategies are amenable to computation, so we develop a set of algorithms to detect such strategies automatically in real-time. The algorithms and the Mozilla Firefox add-on that contains them are openly available.
• A long-term remote study suggests that the algorithms can reliably detect navigation problems when users navigate purposefully on the Web.

2. Background

Individuals adjust to the demands and constraints of their environment through adaptive behaviour. Coping, which is considered an extreme adaptive behaviour, occurs when the adjustments are a response to ‘specific external or internal demands that are appraised as taxing or exceeding the resources of a person’ (Lazarus and Folkman, 1984). Typically, these extreme adaptations are exhibited when individuals encounter new situations in which learned and automatic responses cannot be triggered. Coping does not necessarily entail successful outcomes or mastery over a situation. The effectiveness of coping strategies is determined by several factors, including the effects of a given strategy in the long term, the application of problem-solving strategies and the management of one's emotions. Adaptation and coping involve the simultaneous management of the following variables (White, 1974):

• Keeping the right amount of information about the environment in order to plan subsequent actions: adaptation may require seeking more information, or filtering/removing existing information if its excess prevents decision making.
• Being alert and ready to process information.
• Maintaining autonomy and freedom in order to monitor the available ways to escape from any threatening situation.

2.1. Behavioural adaptation on the Web

Investigating the extreme adaptations employed by users on the Web is a challenging and costly task (Vigo and Harper, 2013a). A study about frustrating computer interactions found one challenging event occurring every 75 min (Novick et al., 2007). Therefore, in order to observe a significant amount of coping responses on the desktop Web, a number of resources (participants, time and observers) are required. As said, in situ observations are costly but help to isolate why coping responses are given and which strategies are employed by users. Note that the emotions expressed by users play a fundamental role in identifying such strategies, and onsite observations allow us to identify emotional coping responses. In the absence of a baseline of adaptive strategies on the Web, studies on human frustration with computers are the closest related work.

Some studies based on self-reporting have been able to identify the causes of computer frustration (Ceaparu et al., 2004). When it comes to describing the strategies employed to cope with frustration, users report quite vaguely: ‘I found an alternative solution’ or ‘I figured out how to fix it myself’. Self-reporting introduces subjectivity and provides a coarse-grained account of the strategies employed when problems are encountered. A method to detect user frustration on search engines obtained 87% precision by analysing query features, such as task duration or query length, and comparing these to what users reported (Feild et al., 2010).

2.2. Detecting extreme adaptive behaviour

While there are a number of ways to approach the detection of the adaptive strategies employed by users, work has been under way in log analysis and custom browsers since the late 1990s and early 2000s. While the specific outcomes of these methods were not directly related to identifying adaptive strategies, they were interested in understanding human behaviour on websites to refine recommender systems and intra-site Web search. Actions including mouse clicks, mouse movement, scrolling and elapsed time have been used as implicit ratings; these were analysed, individually and in combination, and compared with explicit ratings. Researchers found that the time spent on a page, the amount of scrolling on a page and the combination of time and scrolling had a strong correlation with explicit interest, while individual scrolling methods and mouse clicks were ineffective in predicting explicit interest (Claypool et al., 2001). Further, as logs contain a rich source of information about how users actually access a website (Ding and Zhou, 2007), the relationships of Web pages in user sessions have been analysed based on their similarities. In this case, new Web page representations have been generated and combined with original text-based representations in website search (Zhou et al., 2006). Indeed, navigation metrics have also been calculated from Web logging data and used as indicators of user characteristics and task outcomes (Juvina and Oostendorp, 2006). Results show that spatial-semantic cognitive mechanisms seem to be crucial in adequately performing Web navigation tasks, which may prove problematic for visually disabled users performing re-checking and retracing tasks (Vigo and Harper, 2013d).

Juvina and Oostendorp (2006) suggest that ‘the fact that user characteristics and task outcomes can be estimated with reasonable accuracy based on navigation metrics suggests the possibility of building adaptive navigation support in web applications’. Our goal is more ambitious: we want to detect the adaptive strategies employed by users on the Web regardless of the task and type of page. These are navigation behaviours exhibited by users who are facing difficulties, which often bring about frustration. Thus, studies about frustration when interacting with computers lead to identifying the sources of such difficulties. Ceaparu et al. (2004) mention pop-up ads, long download times and slow or dropped connections as the main causes of frustration generated by the Internet, where only pop-up ads are directly related to Web content. In order to cope with such frustration, users report employing different adaptive strategies. These findings are consistent with a subsequent study that explored the strategies exhibited in order to overcome episodes of frustration: Novick et al. (2007) found that strategies include asking for help, employing trial and error, implementing workarounds and giving up. However, little is known on how these workarounds and trial-and-error strategies are actually realised. Some studies provide a number of hints on the interaction events which may be indicators of problems. In a study to characterise difficult search tasks, Aula et al. (2010) found that scrolling up and down a page without reading the content may be a signal of frustration and lack of confidence. They also found that the re-visitation of pages over time might also be a symptom of extreme frustration. Further, Feild et al.
3.1. Quick preview
(2010) proposed a method to detect user frustration on search engine result pages (SERPs), obtaining 87% precision by comparing search query features, such as task duration or search query length, with user frustration as reported through after-task questionnaires. It was also found that mouse dwell on textual content predicts user frustration 15% better than chance (Navalpakkam and Churchill, 2012). In order to increase the ecological validity of detecting coping strategies, we propose a bottom-up approach, as opposed to a traditional top-down approach that subjects users to a predetermined task model:
This strategy is exhibited by scrolling down pages very quickly in order to get a preview of the entire page (Aula et al., 2010). In order to automatically detect quick scans, Algorithm 1 estimates the time it takes users to reach the bottom of the page. Fastest text scanning is established at a rate of 600 words per minute (Masson, 1982), although users do not normally read all Web content but just scan through it. Therefore, a correction rate (see correction_rate in Algorithm 1) quantifies the amount of text a user is supposed to be able to scan. A flag is raised if the bottom of the page is reached faster than a quick reading of the page would take. We consider the strategy complete if, with this flag raised, the top of the page is later reached again.
1. Observe edge interaction cases (e.g. people with disabilities), where extreme adaptation is more frequent and overt, and isolate coping strategies.
2. Implement algorithms to automatically detect the identified coping strategies.
3. Deploy the coping strategy detection algorithms in the ‘wild’ and run user studies that monitor the behaviour of broader populations.
4. During the studies, get feedback from users every time a strategy is detected.
5. Refine the algorithms based on user feedback.
6. Go to step 3 until significant detection rates are obtained.
Algorithm 1. Quick preview strategy (executed at every scrolling event).

if current_scroll_Y_location ≥ distance_to_the_bottom then
  reading_ratio = no_of_words_in_web_page / time_spent_to_reach_bottom
  if reading_ratio > (600 × correction_rate) then
    bottom_reached_fast = true
  end
end
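As a concrete illustration, Algorithm 1 can be rendered in executable form roughly as follows. This is a sketch only: the add-on itself runs in the browser (in JavaScript), and the class name, the per-event interface and the default `correction_rate` are our assumptions, not the authors' code.

```python
FASTEST_SCAN_WPM = 600  # fastest text scanning rate (Masson, 1982)

class QuickPreviewDetector:
    """Sketch of Algorithm 1: flags a quick scroll to the bottom of a
    page, and reports the strategy once the user returns to the top."""

    def __init__(self, n_words, page_height_px, correction_rate=2.0):
        # correction_rate quantifies how much text a user is expected to
        # scan rather than read; 2.0 is an assumed tuning default.
        self.n_words = n_words
        self.page_height_px = page_height_px
        self.correction_rate = correction_rate
        self.bottom_reached_fast = False

    def on_scroll(self, scroll_y_px, minutes_since_load):
        """Executed at every scrolling event; returns True once the
        complete quick-preview strategy has been exhibited."""
        if scroll_y_px >= self.page_height_px:       # bottom reached
            reading_ratio = self.n_words / minutes_since_load
            if reading_ratio > FASTEST_SCAN_WPM * self.correction_rate:
                self.bottom_reached_fast = True      # too fast to read
        # strategy complete: fast scroll to bottom, then back to top
        return self.bottom_reached_fast and scroll_y_px == 0
```

A detection fires only when both conditions have been met: the bottom was reached faster than the corrected 600 wpm scanning ceiling, and the user subsequently scrolled back to the top.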
While many of these strategies are well documented in the literature, the way in which they are implemented by users is often vaguely reported. Therefore, we tried different implementation alternatives, ran several tests in the laboratory, exposed users to challenging tasks in these pages and obtained feedback from them. This led us to a final set of algorithms that worked well in a laboratory setting and were later refined in two long-term remote tests (Vigo and Harper, 2013b). This paper focuses on implementation, deployment, and feedback (within the model-cycle described above); prior work has focused on observation and iterative refinement of the strategies (Vigo and Harper, 2013a). The coping strategies derived here were initially sourced from the literature as a starting point, then translated into computational algorithms, deployed and iteratively experimented over, spanning a period of months and with different groups of users (Vigo and Harper, 2013c). The outcomes of these prior experiments showed strategies which were not amenable to computation and were therefore discarded, and enabled the generation of new strategies which were not previously elucidated in the literature (Vigo and Harper, 2013d).
Hesitant behaviour is common before asking for help (Novick et al., 2007). In order to catch hesitancy, the algorithm checks whether the following sequence of Web pages (wp) occurs: wpi → wpj → wpi → wpk → wpi, where → entails that wpi links to wpj, and users did not merely scan wpj and wpk but devoted a considerable amount of time to them, as measured by the reading ratio computed in Algorithm 1. This suggests that users did not get to wpj and wpk just for a quick viewing, but stayed for a period of time – this is informed by the fact that time spent reading a Web page was found to be a predictor of lostness (Herder and Juvina, 2004).
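Assuming visited pages are logged as (URL, dwell time) pairs, the hesitancy pattern above can be sketched as follows. The function name and the fixed dwell threshold (standing in for the reading-ratio check of Algorithm 1) are illustrative assumptions, not the authors' implementation.

```python
def detect_hesitancy(visits, min_dwell_minutes=0.5):
    """Sketch of the hesitancy check: looks for the sequence
    wp_i -> wp_j -> wp_i -> wp_k -> wp_i, where wp_j and wp_k were
    dwelled on for a considerable time rather than merely scanned.

    `visits` is a list of (url, dwell_minutes) pairs in visit order."""
    for t in range(len(visits) - 4):
        (a, _), (j, dj), (b, _), (k, dk), (c, _) = visits[t:t + 5]
        # the same hub page a is returned to twice, via distinct pages
        if a == b == c and j != a and k != a and j != k:
            # both detours were read, not just glanced at
            if dj >= min_dwell_minutes and dk >= min_dwell_minutes:
                return True
    return False
```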
3. Algorithms to detect navigation strategies
3.3. Retracing or circling
In the HCI tradition, automated evaluation of navigation problems has been carried out using navigation models that draw from cognitive models of human behaviour (Chi et al., 2003; Blackmon et al., 2005). We propose an indirect way of detecting navigation problems: detecting the consequences of such problems. We hypothesise that the exhibition of certain strategies might be a reaction or adjustment to navigation problems. Hence, navigation strategies can be understood as behavioural markers of cognitive processes that indicate the presence of problematic situations. We can therefore infer the existence of navigation problems by detecting the exhibition of the strategies. The criteria to select a set of strategies for this study are as follows:
Users tend to retrace their steps starting from a safe harbour, accessed through teleporting or backtracking, until they land again on the Web page that contained the original problem. Then, users follow a sequence of pages that have not been visited before. Hence, the detection of this strategy enables us to identify turning points – also called hubs (Milic-Frayling et al., 2004) – that avert problematic pages. Retracing, which is often exhibited as a revisitation strategy (Obendorf et al., 2007), is indicative of users struggling when looping on the same set of pages (Thomas, 2014). The retracing algorithm detects the longest traversed path which has been repeated at least twice. For instance, in the following sequence of Web pages: wpi → wpj → wpk → wpl → wpm → wpl → wpk → wpj → wpk → wpl → wpm → wpn, the algorithm identifies {wpk, wpl, wpm} as the longest pattern; a detection event will be triggered when the user lands on wpn, identifying wpm as the turning point page.
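The core of the retracing check – the longest traversed path repeated at least twice – can be sketched as a search over contiguous windows of the navigation trail. This version is purely illustrative: it ignores how each transition was made, whereas the description above additionally distinguishes backtracking from forward link-following.

```python
def longest_repeated_path(trail):
    """Sketch of the retracing detector's core: return the longest
    contiguous sequence of pages traversed at least twice in `trail`
    (a list of page identifiers in visit order), or [] if none."""
    n = len(trail)
    for length in range(n - 1, 0, -1):       # try longest paths first
        first_start = {}
        for s in range(n - length + 1):
            window = tuple(trail[s:s + length])
            if window in first_start:        # second traversal found
                return list(window)
            first_start[window] = s
    return []
```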
Algorithm 1 (continued).

if bottom_reached_fast ∧ current_scroll_Y_location = 0 then
  report_detection()
end
3.2. Asking for help
• The strategies may be indicative of problematic interactions on the Web.
• Their existence is supported by our own anecdotal lab observations, although they must be well documented by previous work.
• The algorithms to detect such strategies have to be amenable to computation.
3.4. Quick revisitation

This strategy entails accessing a Web page for a second time after navigating back to the original page from which the user landed.
Typically, the objective of the first landing is to have a quick preview of the page. Oftentimes we found that users click backwards to assure themselves they clicked on the right link by reading its content again. It has been documented as a revisitation strategy (Adar et al., 2008) and also as a way to have a quick preview of a page (Kules and Shneiderman, 2008); it is suggested to be an indicator of problems (Aula et al., 2010). Indeed, this behaviour is well known in the information retrieval community, where it is simply called ‘quick back’ (Dan et al., 2012). Consequently, the algorithm for quick revisitation monitors whether the following sequence of Web pages occurs: wpi → wpj ⇒ wpi → wpj, where ⇒ is a transition between pages as a result of typing the URL or clicking on the back button of the browser, and wpj is the page visited twice. This strong transition is sought because it suggests an intentional action, as opposed to getting to the previous page by chance.
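Assuming each transition is logged together with how it was made, the pattern wpi → wpj ⇒ wpi → wpj can be sketched as below. The triple representation and the kind labels (`'link'`, `'back'`, `'typed'`) are our assumptions for illustration.

```python
def is_quick_revisitation(transitions):
    """Sketch of the quick-revisitation check. `transitions` is a list
    of (from_page, to_page, kind) triples in chronological order, with
    kind in {'link', 'back', 'typed'}; 'back' and 'typed' are the
    'strong', intentional transitions denoted by => in the text."""
    for t in range(len(transitions) - 2):
        (a, b, _), (b2, a2, k2), (a3, b3, _) = transitions[t:t + 3]
        if (a, b) == (a3, b3) and (b2, a2) == (b, a) \
                and k2 in ('back', 'typed'):
            return True  # wp_i -> wp_j => wp_i -> wp_j observed
    return False
```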
4. Study

Two cohorts of participants took part in two remote studies that ran in different periods of time. By doing this we wanted to assess the reproducibility of the results across different periods of time and different groups of people. We recruited participants using a snowball sampling method as well as through mailing lists of technology enthusiasts. Each study lasted 10 days, during which participants had to install the Mozilla Firefox add-on and respond to the notifications that popped up while doing their typical Web browsing tasks.

4.1. Participants

20 participants (10 female), median age of 32 (SD=8.5, age range 25–60), took part in the first study. Eight of them participated from home, 7 from the workplace and 5 from both locations. All of them but one had at least a university degree and 13 reported being expert Web surfers.2 Eleven of them were Mozilla Firefox users, while Chrome and Safari were the preferred browsers of 7 and 2 participants respectively. 17 participants reported being familiar with Mozilla Firefox. A majority of participants (12) navigated on the Web less than 10 h per week, while 3 of them did 10–20 h and 5 of them did so for more than 20 h.

In the second study 14 participants (7 female), with a median age of 31.5 (SD=9.7, age range 21–61), took part from home (5), work (5) and both locations (4). All participants except one had a university degree and 10 of them considered themselves to be expert Web users. Chrome was the preferred browser of 6 participants, whereas the remaining participants were Mozilla Firefox (4 participants), Safari (3) and Internet Explorer (1) users. All of them were familiar with, and had used in the past, the Mozilla Firefox browser. Five of the participants navigated on the Web more than 20 h per week, whereas 3 of them did so 10–20 h and 6 of them navigated less than 10 h per week. Despite some differences, the two cohorts of participants are similarly characterised.

4.2. Tasks

Participants were given total freedom to visit Web pages and carry out their daily tasks, so they were not told to conduct specific tasks on any particular website. We proceeded in this way because we wanted to identify navigation problems in a more ecologically valid scenario, where problems would emerge naturally (Fig. 1).

4.3. Tooling

We developed a software artefact that, implemented as a Mozilla Firefox add-on, injects into the browser the algorithms described. The artefact is able to unequivocally identify user profiles (almost always 1:1) by automatically generating a key which is stored in the browser database. All communications with the server contain this key so that the obtained feedback can be linked to particular individuals. The detection algorithms are injected when Web pages load into the browser. These algorithms are conceived as independent agents that track particular user interactions and behave as observers that target specific actions of users. When the algorithms detect the exhibition of a given strategy, a form pops up within the bottom area of the browser – see Fig. 2. At this point, users are requested to confirm whether they were actually facing a navigation problem and are asked to give feedback about what they were doing when the notification popped up – see Fig. 3. When users complete their feedback the information is sent to a remote server, where the data is ready to be analysed. Responding to notifications in such a way may raise some concerns about the intrusiveness of the artefact, as confirming whether a strategy was actually employed interrupts the natural interaction flow. The alternative that avoids such intrusion would be to ask users on a post-hoc basis; in this case spontaneity would be lost and users would have difficulty recalling their activities (Kellar et al., 2008). It should also be noted that users are free not to send feedback. When another Web page is loaded, the notifications belonging to the previous page disappear.

4.4. Materials

Participants were sent a tutorial to learn the installation procedures and functionalities of the add-on. If a strategy was detected by the add-on, a notification would show up in the bottom bar and would remain there until it was filled out and submitted; if the user decided to navigate to another Web page, this would make the notification disappear. Participants were explicitly told that they were free to deactivate the add-on at any time.

4.5. Analytical framework

We acknowledge the existence of several frameworks for describing the different activities users carry out on the Web (Byrne et al., 1999; Sellen et al., 2002; Kellar et al., 2007). Indeed, the most recent study analysing user activity on the Web (Lindley et al., 2012) may better capture the Web usage routines of our participants in accordance with the state of the art of current Web technologies. This framework identifies five modes of Web use:

• In respite mode, familiar Web pages are visited to take a break from the main activity on, typically, news, social networks or Web mail. This usage is reported to be exhibited often and for very short periods of time. Users do not have information needs to satisfy nor want to be engaged with content.
• Orienting is a very similar behaviour to respite. The only difference resides in the fact that users are willing to be engaged, and consequently sessions tend to be longer than in respite mode.
• Opportunistic use is exhibited in one's leisure time, and users show a ‘wandering around’ navigation strategy in which they end up landing on pages they are not familiar with.
• Purposeful use indicates using the Web to achieve specific goals. It implies satisfying informational needs or conducting a transaction (i.e. buying an item on an e-commerce site). Either way, purposeful use entails the intention of completing a task.
• Lean back internet use suggests using the Web as a means to consume audio and video content.

Because of the subtle differences between respite and orienting, which essentially concern users' willingness to engage, we merge these two categories into consumption use. Also, we added two more
2 This was self-reported by answering Yes, No or I don't know to the question ‘Do you consider yourself to be an expert in Web browsing?’.
Fig. 1. Regular state of the add-on.
Fig. 2. Notification popping up on the bottom bar.
emerging categories based on our observations and initial studies (Vigo and Harper, 2013b):

• Web application use refers to the use of those Web applications stored in the cloud which mimic the applications of traditional desktop operating systems, including calendars, mail, word processors and spreadsheets.
• Comparison of Web pages is often considered a revisitation behaviour which is mostly implemented by switching between the tabs of the browser (Dubroy and Balakrishnan, 2010).

Fig. 3. Feedback form.

Using the above framework we classify the context of Web use of the reports submitted by participants following these criteria: (i) the URL of the Web page in which a given strategy was detected tells the type of the site and/or (ii) written feedback about what participants were doing indicates the type of activity they were carrying out. We follow criterion (i) to classify feedback reports into consumption, lean back and Web app use (see Table 1). Alternatively, we follow criterion (ii) for classifying reports into opportunistic, purposeful and comparison use (see Table 2). Both classification schemes are useful for identifying behaviour: artefact classification attempts to understand behaviour a priori, from the perspective of the expected use of the website, while behaviour classification attempts to understand the same behaviour from an analysis of how and what users actually do. The set of keywords in the second column of Table 2 was used for guidance when classifying the feedback, although the final decision was made by taking into account the context of use (i.e. type of site +
earlier – we did not expect mortality due to errors, incorrect reporting, or over-reporting. In this case, we see no bias in the recorded precision and, therefore, no serious effects on validity. Each feedback report contained a boolean field telling whether the detection was right or not and another field containing text conveying what the participant was doing when the detection was triggered. First, we classified the reports submitted by participants using the analytical framework described above and assigned each report to a category. Then, we collected the number of hits (i.e. true positives, tp) and failures (i.e. false positives, fp) in order to compute the precision of the algorithms to detect navigation difficulties, where precision = tp / (tp + fp).
Table 1
Artefact classification framework for consumption, lean back and Web application.

Web use         | Type of pages         | Examples found
Consumption     | Web mail              | Corporate Web mail, Gmail, Yahoo! mail, Windows Live
                | Social networks       | Facebook
Lean back       | Video/audio broadcast | Youtube, allmyvideos.net, BBC iPlayer
Web application | Cloud applications    | Google Docs, Google Drive, Google Calendar
Precision was computed per study and per strategy as shown in Table 3.
Table 2
Behaviour classification framework for opportunistic, purposeful and comparison.

Web use       | Keywords                                                           | Examples found
Opportunistic | Having a look, skimming through, glancing at, exploring, browsing  | ‘I'm not seeking anything specific, I was just having a look’; ‘Just exploring a set of pictures’
Purposeful    | Search, seek, looking for, find                                    | ‘I'm looking for houses in the area where I live although the properties I found so far are too expensive and I keep on searching’; ‘I'm looking for a page that contains my data usage [in a mobile phone company site] but I'm not able to find it even if I know this page exists’
Comparison    | Comparing, switching, between, different tabs                      | ‘I was switching between tabs rapidly to compare information’; ‘Just tabbing between different search results provided by Google’
feedback). The quotations in the rightmost column are selected from the feedback we got and are exemplars of the category to which they belong. Most of the classifications are exclusive; that is, most of the time there is no overlap between different categories: when editing a Google Docs document (i.e. Web application use) participants do not navigate on the Web to complete a certain task (i.e. purposeful use). However, there are some exceptions: some participants, especially on social network sites, look for specific content: ‘I'm trying to find some pictures a friend of mine uploaded on Facebook’. In these situations, we assessed the whole context of use, and in this particular case it was considered purposeful use rather than consumption use.
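The keyword guidance of Table 2 could be sketched as a first-pass classifier like the one below. This is purely illustrative: as described above, the final categorisation was made manually by assessing the whole context of use, with the keywords serving only as guidance.

```python
KEYWORDS = {  # guidance keywords, condensed from Table 2
    "opportunistic": ["having a look", "skimming", "glancing",
                      "exploring", "browsing"],
    "purposeful":    ["search", "seek", "looking for", "find"],
    "comparison":    ["comparing", "switching", "between", "tabs"],
}

def suggest_category(feedback):
    """Suggest a behaviour category for a written feedback report,
    or None when no guidance keyword matches (leaving the decision
    entirely to the human coder)."""
    text = feedback.lower()
    for category, words in KEYWORDS.items():
        if any(w in text for w in words):
            return category
    return None
```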
In Table 3, overall values per strategy and Web use indicate the weighted aggregation of the scores obtained in the two studies. Results show that Web usage strongly determines the precision of the algorithms. Overall precision is very low – in the range [0–0.17] – in all categories but purposeful use, where a high precision (0.85) is obtained. If we compare the results between studies, scores are in general higher for study 1, except for purposeful use, where overall precision increased from 0.81 to 0.87. For lean back and comparison use, not only are very few of the strategies triggered, but those which are triggered consistently fail to detect problematic navigation situations. Consumption and opportunistic use show more variable behaviour, although they still yield low overall precision. Purposeful use not only yields higher precision scores, but these are also more stable across studies. Purposeful use could potentially be split into further categories based on the type of website in which this mode of Web use was identified: search tasks through search queries, typically in search engines and e-commerce sites, and exploration of directories such as cinema listings, travel planning websites, furniture catalogues and online libraries. Table 3 highlights those strategies whose detection indicated navigation problems with a high precision and were consistent across the two studies: quick preview, asking for help and quick revisitation are indicators of navigation problems when the Web use is purposeful, and retracing behaviour is also a predictor of problems when the use is opportunistic or purposeful. The values for the purposeful category are:
5. Results A total number of 367 and 220 h of navigation were monitored in the first and second studies respectively. Median times of Web use differed between groups: 15 h 05 min for the first study and 5 h 57 min for the second. The algorithms detected 183 strategies in study 1 (one strategy every 2 h 30 min), while the participants in the study 2 triggered 191 detections (one strategy every 1 h 26 min). In both studies participants responded to the notifications 80% of the times, accounting for 147 and 153 responses respectively for each study. Often we see experimental tools or instruments having high usage early on, when people are enthusiastic (and/or playing with the tool to see how it works), then a steep decline.3 This mortality rate is not directly measured in our case because usage decline maybe a natural effect (a trough in use). This said, we did ask that participants inform us if they no longer wished to participate further – we did not receive any such notifications. We rationalise that this is because the overhead in participation is both tiny, and stress free – compared to a laboratory study. Our participants did not need to attend a particular location at a particular time, and they did not have to alter their habits when surfing the Web. Further, due to the iterative development cycle – discussed 3
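The weighted aggregation of per-study scores mentioned above can be sketched as follows. This is a minimal illustration under one assumption of ours: each study's precision is weighted by the number of detections it contributed (the paper does not state the exact weighting, and the detection counts below are illustrative, not reported figures).

```python
def weighted_precision(studies):
    """Aggregate per-study precision scores, weighting each study by
    the number of detections it contributed.

    `studies` is a list of (precision, n_detections) pairs."""
    total_hits = sum(p * n for p, n in studies)  # estimated correct detections
    total_n = sum(n for _, n in studies)         # all detections
    return total_hits / total_n

# Illustrative counts only (per-category detection totals are not reported):
overall = weighted_precision([(0.81, 30), (0.87, 30)])  # 0.84 with equal counts
```

With unequal counts the overall score shifts towards the larger study, which is why the overall purposeful score (0.85) need not be the plain mean of the two per-study scores.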
• Quick preview: 27 hits/30 instances.
• Asking for help: 12/16.
• Retracing: 9/11.
• Quick revisitation: 2/2.
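These hit/instance counts translate directly into per-strategy precision (hits divided by instances); a quick check in code, using the counts listed above:

```python
counts = {
    "quick preview": (27, 30),
    "asking for help": (12, 16),
    "retracing": (9, 11),
    "quick revisitation": (2, 2),
}

# Precision per strategy: hits / instances.
precision = {name: hits / n for name, (hits, n) in counts.items()}
# quick preview -> 0.90, asking for help -> 0.75,
# retracing -> ~0.82, quick revisitation -> 1.0
```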
3 Known as mortality, or dropout rate.

Table 3
Precision scores per strategy across studies and mode of Web use, where dark red corresponds to 0–0.24; orange: 0.25–0.49; light green: 0.50–0.74; dark green: 0.75–1; NA: no occurrences.

Further, we had 83 instances for consumption, 36 for opportunistic, 59 for purposeful, 9 for lean back, 15 for Web app and 3 for comparison mode. Failure to detect the quick preview strategy is mainly caused by consumption use (‘I'm just trying to see the latest updates on Facebook’), while opportunistic use was reported as ‘I was just bored and decided to browse this Web’ when browsing on a travel website. For purposeful use, a precision of 0.90 is obtained and problems were caught when participants could not find what they were looking for: ‘I'm looking for kitchen tables [on Ebay] and I'm trying different searching criteria: I search by price or by closeness to where I live’. Sometimes the quick preview strategy was observed when participants knew a page contained what they were looking for and scrolled down very quickly to check whether it contained the expected content. Asking for help is the least precise detection algorithm: even for purposeful use, the Web use for which it exhibited the best performance, only 3 out of 4 detections are correct (0.75). The algorithm was right in detecting situations which were unclear. For instance, when navigating on the journey planner of the Transport for London website a participant acknowledged the situation was confusing: ‘It's only confusing as I'm trying to arrange convenient travel times between the tube and flight times’. Another participant felt lost as he was not able to find the file announced on the website: ‘I'm looking for a place to download the file the page is talking about’. False positives again mainly indicated Web mail and social network use. The algorithm detecting retracing behaviours showed a good performance for opportunistic use, yielding the maximum precision in a consistent way. Feedback for opportunistic use suggests that the retracing algorithm successfully detected turning points in a meandering navigation: ‘I clicked on a link without much expectations about what was coming next’ and ‘I've never been to the page before, I'm following a link from a tweet’. For purposeful use, results suggest that the algorithm detects problems when participants needed a particular piece of information to complete their tasks.
Participants indicate a lack of completion either because they are not able to complete their tasks: ‘Google is not giving me the answer I need’; or because they do not have enough information to continue their navigation: ‘There wasn't enough information on the previous link’. For purposeful use participants employed the quick revisitation technique to check different alternatives, ‘I'm trying to decide which product to buy [in Amazon]. There do not seem to be any alternatives to the one.’, or to recheck previous actions, ‘I'd forgotten whether I was looking at the primary results page, or the citations page [at Google Scholar]’. Interestingly, precision scores of 0.5 were obtained for opportunistic use, with feedback suggesting that participants were exploring the Web: ‘I was looking for something Google suggested while typing’. The hits stem from the fact that quick revisitation often occurs in an exploratory context.
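For illustration, a quick revisitation detector of the kind discussed here could flag a return to a recently visited page within a short time window. The 10-second threshold and the event representation below are our assumptions for this sketch, not the paper's actual algorithm:

```python
def quick_revisitations(visits, window=10.0):
    """Yield (timestamp, url) pairs where the user returns to a page
    visited less than `window` seconds earlier.

    `visits` is an iterable of (timestamp_seconds, url) page loads;
    the threshold is an illustrative assumption."""
    last_seen = {}
    for t, url in visits:
        if url in last_seen and t - last_seen[url] <= window:
            yield (t, url)
        last_seen[url] = t

log = [(0, "results"), (3, "item"), (5, "results"), (40, "item")]
flagged = list(quick_revisitations(log))  # [(5, "results")]; the later return is too slow
```

A deployed detector would also need to separate problematic quick revisits from the exploratory ones described above, e.g. by conditioning on the detected mode of Web use.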
6. Discussion

The performance of the proposed algorithms as detectors of navigation problems is very variable. Lean back, Web app, and comparison use do not appear to be scenarios in which navigation strategies are indicators of problems, mainly because there is little navigation involved in these activities. For lean back use there is little interaction on the browser apart from using the audio/video player controls, and navigation behaviour is absent. In the case of Web app use, cloud applications contain problems that might be more closely related to traditional usability problems of desktop applications, which are unrelated to hypertext navigation. When it comes to comparing pages there is a particular set of idiosyncrasies, again unrelated to hypertext navigation, if comparisons are conducted by switching tabs (Dubroy and Balakrishnan, 2010). As far as the remaining Web uses are concerned, purposeful use clearly stands out for highly precise and stable scores across studies. On the other hand, consumption yields at times highly precise but inconsistent precision scores: see for instance retracing, which gets no precision at all in the second study after having obtained scores of 1 and 0.6 respectively. Consumption use is typically exhibited in bursts of activity on pages users are familiar with (Lindley et al., 2012), so a priori, this familiarity should not pose any problem to users. Quick revisitation obtains the highest and most reliable scores – still low – for opportunistic use. The fact that opportunistic use is of an exploratory nature (entailing wandering-around behaviours and vague navigation goals) may lead users to give up the task as soon as they encounter an obstacle. This withdrawal would not have any negative consequence on users as there is no information need to satisfy. Hence, users could choose any other Web page from which to start another navigation session. The operationalisation of navigation strategies as indicators of navigation problems was precise when participants searched for specific information. This behaviour was articulated when looking for a specific Web page in a site, a specific piece of information (i.e. looking up a word in a dictionary) or a particular item in an e-commerce catalogue.

In the literature there are different definitions of ‘finding specific information’. Indeed, this can be seen in search–browse cycles, in which navigation is recognised to involve a sequence of searching and browsing tasks based on both known and serendipitous information discovery. When people want to find information in a search task they may go to a Web page, but if they go to a page during a search task they may not necessarily be finding specific information (e.g. finding ‘the world population in 2014’). They may be exploring a new category and finding new avenues for exploration (e.g. as in starting with a Wikipedia page for exploratory search). While this is correct, to an extent, the defining aspect here is progress. The user makes progress through sequences of pages; however, this is not the same as navigation strategies which become problematic – and detection is different because progress does not occur. In these circumstances, participants could not afford to withdraw from carrying out their tasks as their completion was key for the activities they were carrying out. Navigation problems triggered strategies to overcome the obstacles encountered. Whether these strategies were key to successfully pre-empt problems and complete their tasks we do not know. The proposed set of
algorithms intercepted these strategies correctly when navigation was purposeful. The fact that some of the strategies emerged in the context of work tasks (Lindley et al., 2012; Thomas, 2014) may be the cause of poor performance on non-work-related modes of Web use (i.e. all but opportunistic and purposeful).

6.1. Automating the detection of Web use

Telling automatically whether a Web page belongs to a social network or Web mail, broadcasts audio/video content, or contains a desktop application on the cloud is relatively straightforward. Naively, just by maintaining a list of the most popular sites that belong to each such category (à la blacklist), a large amount of noise would be filtered. By following this approach we could categorise Web use as defined in Table 1. Of course site owners, who may use our instrumentation, know well what is on their site. However, their site classification may not take into account the behavioural frameworks that allow them to understand ‘interaction-in-use’. Regarding the Web usages of Table 2, the detection of comparison use could be achieved by analysing switching behaviour across tabs (Dubroy and Balakrishnan, 2010). However, detecting (and distinguishing) opportunistic and purposeful use is more challenging, as for the same behaviour the only difference is the intention of users and their willingness to engage (Lindley et al., 2012). Future work should focus on how to distinguish such Web uses by finding intent predictors in interaction data. Automatic detection of Web use, jointly with the set of algorithms presented in this paper, opens up a number of possibilities in the field of Web interaction. From a practical point of view, the problems encountered by users could be tackled from two perspectives: one, webmasters could include these algorithms in their sites and keep track of the navigation problems experienced by users. A high number of strategy detections on a particular Web page would be an indicator of usability problems that require immediate attention. Two, user-tailored interventions on the browser could be automatically delivered employing adaptive Web techniques. From a research perspective, these algorithms pave the way towards automated usability testing of navigation problems-in-use; they also enable the longitudinal analysis of individuals' behaviour.

6.2. Methodological considerations

The naturalistic setting in which the studies were run prevents us from knowing all the existing problems because we do not control the stimuli, the tasks and the specific informational needs of users. Therefore, whether the algorithms detected all the problems encountered is unknown. At most we find that the algorithms are precise when navigation is purposeful (the proportion of detected navigation problems that are also correct), but we cannot compute the recall and accuracy because we do not know the number of false negatives (the problems that were missed by the algorithms). Controlled studies would allow us to compute recall and tailor thresholds for websites and individuals (Thomas, 2014), but that approach is only applicable to the remit of the websites used as stimuli. At the expense of not knowing the recall, our approach is more generalisable and has the following advantages: (i) we avoid the harms to external validity caused by a laboratory setting; (ii) we allow problematic situations to emerge naturalistically; (iii) we do not have to induce problematic situations in the lab, which might be ethically problematic; (iv) and we avoid in situ observations, in which the presence of the observer leads to the Guinea Pig Effect (Webb et al., 2000). Future work will be devoted to the collection of more data points in order to support and corroborate the reliability of these initial promising outcomes. However, we do find that our study may not be balanced with respect to the general population. The way we advertised, the task itself, and the nature of the users' self-selection may all have affected the demographics of our participant pool. Our participants tended to be well educated and interested in the Web. How typical of the wider population the strategies we found are is not a question we can directly answer, nor can we assess the consequences for the frequencies of the strategies found. However, we also do not claim predictive ability across a population, but that we can predict the action of an individual based on the traits they are exhibiting. Further, while we also accept that our tool – by nature – may make some incorrect predictions whereby help is offered but not required, these are still more helpful than not providing help when it is required. Indeed, for site owners who may have thousands or millions of visitors our work may have great practical significance. Pragmatically, precision (as opposed to recall) is important for both the user experience and the site owners' conservative logging of the interaction scenarios. On a final methodological note, it has been suggested that we may bolster our precision by replaying recordings of user activity after the fact. Participants could note what problems they have based on, e.g., watching a video replay of their interactions. Indeed, this is a useful suggestion, and one that we did employ in the early laboratory-based development work as detailed in Section 2.2. This iterative development started in the laboratory with participants self-reporting and experimenters making notes. The work then went into longitudinal iterative development, with remote users self-reporting whether our proposed algorithms correctly identified coping behaviours.

7. Concluding remarks

We discover some navigation strategies – quick preview, asking for help, retracing and quick revisitation – that indicate navigation problems on the Web when users have a clear informational need or want to complete a transaction. We operationalise such navigation strategies as algorithms to detect navigation problems on the user end, in real time and automatically. These findings are robust in that:

• the ecological validity of these strategies is high, as they were exhibited in settings which were not constrained by the tasks, the presence of experts or the location of the study;
• they are highly precise, as corroborated by 34 participants who gave feedback about how well the strategies detected actual problems during a period of 10 days.
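The four strategy detectors named above could be combined in a browser-side monitor along the following lines. This is an entirely illustrative sketch: the dispatcher structure and the toy retracing predicate are our assumptions, not the real logic of the Firefox plugin.

```python
def monitor(events, detectors, on_problem):
    """Run every strategy detector over the growing event log and
    report the strategy name and triggering event on each hit."""
    for i in range(1, len(events) + 1):
        window = events[:i]
        for name, detect in detectors.items():
            if detect(window):
                on_problem(name, window[-1])

problems = []
detectors = {
    # Toy stand-in: treat a return to the page two steps back as retracing.
    "retracing": lambda w: len(w) >= 3 and w[-1] == w[-3] != w[-2],
}
monitor(["home", "list", "home", "item"], detectors,
        lambda name, event: problems.append((name, event)))
# problems == [("retracing", "home")]
```

In a real deployment the `on_problem` callback would drive the notification and feedback mechanism described in the studies, rather than appending to a list.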
As future work, it would be interesting to run the CoLiDes model (Blackmon et al., 2005) in parallel to our approach, to investigate whether the two provide the same results; e.g. do they detect the same confusing hyperlinks? Further, it would be useful to understand how easy (or hard) it is to diagnose problems with a site based on this classification of user sessions, and what variety of problems, or problem pages, relate to a single site. This work presents an interesting contrast with other (more constrained and controlled) work, where similar techniques worked well. Indeed, we may argue that the longitudinal in-the-wild nature of our work is as much a contribution as the results detailed here. This said, we believe these findings – as they stand – open up new avenues for usability testing, as problems can be automatically collected and analysed as they are encountered. Moreover, interventions could be delivered as soon as problems are detected.

Acknowledgements
This research was approved by the University of Manchester School of Computer Science Ethics Committee, approval ID: CS51. The source code of the Mozilla Firefox plugin discussed in the paper is available at https://www.bitbucket.org/IAMLab/cope. The COPE project is funded by the Department of Education, Universities and Research of the Basque Government (BFI-2010305) and the Engineering and Physical Sciences Research Council (EPSRC) Knowledge Transfer Award (EP/H500154/1–R113685) 'TEOA: Text Entry For Older Adults' Grant.

References

Adar, E., Teevan, J., Dumais, S.T., 2008. Large scale analysis of web revisitation patterns. In: Proceedings of CHI'08, pp. 1197–1206. http://dx.doi.org/10.1145/1357054.1357241.
Aula, A., Khan, R.M., Guan, Z., 2010. How does search behavior change as search becomes more difficult? In: Proceedings of CHI'10, pp. 35–44. http://dx.doi.org/10.1145/1753326.1753333.
Blackmon, M.H., Kitajima, M., Polson, P.G., 2005. Tool for accurately predicting website navigation problems, non-problems, problem severity, and effectiveness of repairs. In: Proceedings of CHI'05, pp. 31–40. http://dx.doi.org/10.1145/1054972.1054978.
Byrne, M.D., John, B.E., Wehrle, N.S., Crow, D.C., 1999. The tangled Web we wove: a taskonomy of WWW use. In: Proceedings of CHI'99, pp. 544–551. http://dx.doi.org/10.1145/302979.303154.
Ceaparu, I., Lazar, J., Bessiere, K., Robinson, J., Shneiderman, B., 2004. Determining causes and severity of end-user frustration. Int. J. Hum.–Comput. Interact. 17 (3), 333–356.
Chi, E.H., Rosien, A., Supattanasiri, G., Williams, A., Royer, C., Chow, C., Robles, E., Dalal, B., Chen, J., Cousins, S., 2003. The bloodhound project: automating discovery of web usability issues using the InfoScent™ simulator. In: Proceedings of CHI'03, pp. 505–512. http://dx.doi.org/10.1145/642611.642699.
Claypool, M., Le, P., Wased, M., Brown, D., 2001. Implicit interest indicators. In: Proceedings of IUI'01, pp. 33–40. http://dx.doi.org/10.1145/359784.359836.
Dan, O., Dmitriev, P., White, R.W., 2012. Mining for insights in the search engine query stream. In: Proceedings of WWW'12 Companion, pp. 489–490. http://dx.doi.org/10.1145/2187980.2188091.
Ding, C., Zhou, J., 2007. Improving website search with server log analysis and multiple evidence combination. Int. J. Web Grid Serv. 3 (2), 103–127. http://dx.doi.org/10.1504/IJWGS.2007.014071.
Dubroy, P., Balakrishnan, R., 2010. A study of tabbed browsing among Mozilla Firefox users. In: Proceedings of CHI'10, pp. 673–682. http://dx.doi.org/10.1145/1753326.1753426.
Feild, H.A., Allan, J., Jones, R., 2010. Predicting searcher frustration. In: Proceedings of SIGIR'10, pp. 34–41. http://dx.doi.org/10.1145/1835449.1835458.
Herder, E., Juvina, I., 2004. Discovery of individual user navigation styles. In: Proceedings of the Workshop on Individual Differences in Adaptive Hypermedia at AH'04.
Juvina, I., Oostendorp, H.V., 2006. Individual differences and behavioral metrics involved in modeling web navigation. Univers. Access Inf. Soc. 4 (3), 258–269. http://dx.doi.org/10.1007/s10209-005-0007-7.
Kellar, M., Hawkey, K., Inkpen, K.M., Watters, C., 2008. Challenges of capturing natural web-based user behaviors. Int. J. Hum.–Comput. Interact. 24 (4), 385–409.
Kellar, M., Watters, C., Inkpen, K.M., 2007. An exploration of web-based monitoring: implications for design. In: Proceedings of CHI'07, pp. 377–386. http://dx.doi.org/10.1145/1240624.1240686.
Kules, B., Shneiderman, B., 2008. Users can change their web search tactics: design guidelines for categorized overviews. Inf. Process. Manag. 44 (2), 463–484.
Lazarus, R., Folkman, S., 1984. Stress, Appraisal, and Coping. Springer.
Lindley, S.E., Meek, S., Sellen, A., Harper, R., 2012. ‘It's simply integral to what I do’: enquiries into how the web is weaved into everyday life. In: Proceedings of WWW'12, pp. 1067–1076. http://dx.doi.org/10.1145/2187836.2187979.
Masson, M.E., 1982. Cognitive processes in skimming stories. J. Exp. Psychol.: Learn. Mem. Cogn. 8 (5), 400–417.
Milic-Frayling, N., Jones, R., Rodden, K., Smyth, G., Blackwell, A., Sommerer, R., 2004. SmartBack: supporting users in back navigation. In: Proceedings of WWW'04, pp. 63–71. http://dx.doi.org/10.1145/988672.988682.
Navalpakkam, V., Churchill, E., 2012. Mouse tracking: measuring and predicting users' experience of web-based content. In: Proceedings of CHI'12, pp. 2963–2972. http://dx.doi.org/10.1145/2208636.2208705.
Novick, D.G., Elizalde, E., Bean, N., 2007. Toward a more accurate view of when and how people seek help with computer applications. In: Proceedings of SIGDOC'07, pp. 95–102. http://dx.doi.org/10.1145/1297144.1297165.
Obendorf, H., Weinreich, H., Herder, E., Mayer, M., 2007. Web page revisitation revisited: implications of a long-term click-stream study of browser usage. In: Proceedings of CHI'07, pp. 597–606. http://dx.doi.org/10.1145/1240624.1240719.
Sellen, A.J., Murphy, R., Shaw, K.L., 2002. How knowledge workers use the web. In: Proceedings of CHI'02, pp. 227–234. http://dx.doi.org/10.1145/503376.503418.
Thomas, P., 2014. Using interaction data to explain difficulty navigating online. ACM Trans. Web 8 (4), 24:1–24:41. http://dx.doi.org/10.1145/2656343.
Vigo, M., Harper, S., 2013a. Challenging information foraging theory: screen reader users are not always driven by information scent. In: Proceedings of HT'13, pp. 60–68. http://dx.doi.org/10.1145/2481492.2481499.
Vigo, M., Harper, S., 2013b. Considering people with disabilities as überusers for eliciting generalisable coping strategies on the web. In: Proceedings of WebSci'13, pp. 441–444. http://dx.doi.org/10.1145/2464464.2464494.
Vigo, M., Harper, S., 2013c. Coping tactics employed by visually disabled users on the web. Int. J. Hum.–Comput. Stud. 71 (11), 1013–1025.
Vigo, M., Harper, S., 2013d. Evaluating accessibility-in-use. In: Proceedings of W4A'13, pp. 7:1–7:4. http://dx.doi.org/10.1145/2461121.2461136.
Webb, E., Campbell, D., Schwartz, R., Sechrest, L., 2000. Unobtrusive Measures, 2nd ed. Sage Publications.
White, R., 1974. Strategies of adaptation: an attempt at systematic description. In: Coping and Adaptation. Basic Books.
Yesilada, Y., Harper, S., Chen, T., Trewin, S., 2010. Small-device users situationally impaired by input. Comput. Hum. Behav. 26 (3), 427–435.
Zhou, J., Ding, C., Androutsos, D., 2006. Improving web site search using web server logs. In: Proceedings of CASCON'06. IBM Corp. http://dx.doi.org/10.1145/1188966.1188996.