LNAI 6703 - Towards a Fully Computational Model of Web-Navigation

Towards a Fully Computational Model of Web-Navigation

Saraschandra Karanam1,*, Herre van Oostendorp2, and Bipin Indurkhya1

1 International Institute of Information Technology-Hyderabad, Gachibowli, Hyderabad, Andhra Pradesh, India
[email protected], [email protected]
2 Utrecht University, Utrecht, The Netherlands
[email protected]

Abstract. In this paper, we take the first steps towards developing a fully automatic tool for supporting users' navigation on the web. We developed a prototype that takes a user-goal and a website URL as input, predicts the correct hyperlink to click on each web page starting from the home page, and uses that prediction as support for users. We evaluated our system's usefulness with data from real users. With system-generated support, users took significantly less time and fewer clicks, were significantly less disoriented and more accurate, and perceived the support positively. Projected extensions to this system are discussed.

Keywords: cognitive model, automation, support, navigation, web.

1 Introduction

This paper presents an approach to supporting web navigation by means of a computational model. Though several studies have established the usefulness of providing support based on cognitive models to end-users, no fully automatic tools have been developed so far for web navigation. Several existing tools are used either for evaluating hyperlink structure (Auto-CWW, based on CoLiDeS) or for predicting user navigation behavior on the web (Bloodhound, based on WUFIS). For example, Cognitive Walkthrough for the Web (Auto-CWW) [1] is an analytical method (based on a cognitive model of web-navigation called CoLiDeS [2]) to inspect the usability of websites. It tries to account for the four steps of parsing, elaborating, focusing and selecting in CoLiDeS. It also provides a publicly available online interface called AutoCWW (http://autocww.colorado.edu/), which allows one to run CWW online. One bottleneck is that the steps of identifying the headings and the hyperlinks under each heading in a page, designating the correct hyperlink for each goal, and setting various parameters concerning LSA (a computational mechanism to compute similarity between two texts, described later in detail), such as the semantic space, word frequencies and the minimum cosine value, to come up

* Corresponding author.

K.G. Mehrotra et al. (Eds.): IEA/AIE 2011, Part I, LNAI 6703, pp. 327–337, 2011. © Springer-Verlag Berlin Heidelberg 2011


with the link and heading elaborations need to be entered manually. Auto-CWW then generates a report identifying potential usability problems such as unfamiliar hyperlink text and competing or confusing hyperlinks. A designer can use this report to correct the website's hyperlinks.

Bloodhound, developed by [3], predicts how typical users would navigate through a website hierarchy given their goals. It combines information-retrieval and spreading-activation techniques to arrive at probabilities associated with each hyperlink that specify the proportion of users who would navigate through it. Bloodhound takes a starting page, a few keywords that describe the user-goal, and a destination page as input. It outputs average task success based on the percentage of simulated users who reach the destination page for each goal.

ScentTrails [4] brings together the strengths of both browsing and searching behavior. It operates as a proxy between the user and the web server. A ScentTrails user can input a list of search terms (keywords) into an input box at any point while browsing. ScentTrails highlights hyperlinks on the current page that lead the user towards his or her goal. It has been found that with ScentTrails running, users finish their tasks quicker than in a normal scenario without ScentTrails.

Both Auto-CWW and Bloodhound are tools for web-designers and evaluators, not for supporting end-users. ScentTrails, though designed for supporting end-users, makes the underlying assumption that the website structure is known beforehand: a user can enter queries at any point during a browsing session, and the ScentTrails system directs the user along paths that lead from the current page to his or her desired target pages. Our proposed system does not make this assumption. It takes as input from the user only the goal and a website URL, and nothing else.
A fully automatic model of web-navigation has many potential benefits for people working under cognitively challenging conditions. Screen readers for visually impaired persons can be made more efficient with an automated model that reads out only relevant information. Elderly people have one or more of the following problems: they can forget their original goal, or forget the outcome of previous steps, which can affect their next action; they may have limited mental capacity to filter out information that is not relevant to their goal; and their planning capabilities may be weak in complex scenarios [5]. For these situations, an automated model can plan an optimized path for a goal, provide only relevant information, and keep track of progress towards the completion of the task. Naive users who are new to the internet generally do not employ efficient navigation strategies: they follow a more exploratory navigation style; they get lost and disoriented quite often; and they are slow and inefficient in finding their information. An automated model that provides visual cues can help such users learn the art of navigation faster. An experienced internet user generally opens multiple applications and multiple browser tabs at once: listening to songs, writing a report, replying to a friend by email, chatting on a chat application, and searching the internet for the meaning of a complex word used in the report. In these scenarios, she or he would appreciate an automated model that reduces the time spent on one of these tasks.

In previous research [1], it was shown that the CoLiDeS model could be used to predict user navigation behavior and also to come up with the correct navigation path

for each goal. CoLiDeS can find its way towards the target page by picking the most relevant hyperlink on each page, based on semantic similarity between the user-goal and all the hyperlink texts on the page. The basic idea of the CoLiDeS model [2] [7] is that a web page is made up of many objects competing for the user's attention. Users are assumed to manage this complexity by an attention cycle and an action-selection cycle. In the attention cycle, they first parse the web page into regions and then focus on a region that is relevant to their goal. In the action-selection cycle, each of the parsed regions is comprehended and elaborated based on the user's memory; the various links are compared in relevancy to the goal, and finally the link that has the highest information scent – that is, the highest semantic similarity between the link and the user's goal – is selected. For this, the Latent Semantic Analysis technique is used [6]. This process is repeated for every page visited by users until they reach the target page.

This model can be used to build a tool that gives the successful path back to the user, which could help the user reduce the effort spent filtering unnecessary information. Thus, the tool we are developing is designed to help users in cognitively challenging scenarios. In the next section, we provide the details of our system.
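The link-selection step – scoring each hyperlink against the goal by cosine similarity in an LSA-style reduced space – can be sketched as follows. This is a toy illustration, not the authors' implementation: the three-document "corpus", the term vectors and the helper names are invented for the example, and a real LSA space is built from a large corpus with about 300 retained dimensions.

```python
import numpy as np

# Toy term-document co-occurrence matrix (rows = terms, columns = documents).
# A real LSA space keeps ~300 dimensions; here we keep 2 purely for illustration.
terms = ["heart", "blood", "bone"]
A = np.array([
    [2.0, 0.0, 1.0],   # "heart"
    [1.0, 0.0, 2.0],   # "blood"
    [0.0, 3.0, 0.0],   # "bone"
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vec = dict(zip(terms, U[:, :k] * s[:k]))   # term vectors in the reduced space

def cosine(a, b):
    # Semantic relatedness: +1 identical, -1 opposite, near 0 unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_link(goal, links):
    # CoLiDeS-style action selection: follow the link with the highest
    # information scent, i.e. the highest goal-link cosine similarity.
    return max(links, key=lambda link: cosine(term_vec[goal], term_vec[link]))

# "heart" and "blood" co-occur in the toy corpus, so a "blood" goal
# prefers the "heart" link over the "bone" link:
assert select_link("blood", ["heart", "bone"]) == "heart"
```

Note that the truncation to k dimensions is what makes "heart" and "blood" close even though they never appear in exactly the same proportions: shared document contexts collapse onto the same latent dimension.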

2 System Details

We first provide a brief description of Latent Semantic Analysis (LSA), developed by [6], which forms the backbone of our system. LSA is a machine-learning technique that builds a semantic space representing a given user population's understanding of words, short texts and whole texts by applying statistical computations, and represents them as vectors in a multidimensional space of about 300 dimensions. It uses singular value decomposition, a general form of factor analysis, to condense a very large term-document co-occurrence matrix into a much smaller representation. The cosine value between two vectors in this representation gives a measure of their semantic relatedness: each cosine value lies between +1 (identical) and –1 (opposite), and near-zero values represent two unrelated texts. LSA provides many different semantic spaces ('psychology', 'biology', 'heart', for example) to represent the differences in vocabulary levels of various user-groups and the terminology of different domains (http://lsa.colorado.edu). To model age differences, semantic spaces are available based on American grade levels: 03, 06, 09, 12 and 1st year college. LSA provides a functionality to compute the similarity between a piece of text and a group of texts (one-to-many analysis), and a functionality to retrieve all terms from the semantic space that are close to a particular term (a minimum frequency and a minimum similarity measure can be specified).

Building a semantic space locally and running the LSA algorithms ourselves is not our expertise, so we use the LSA server provided by the University of Colorado (http://lsa.colorado.edu). This means our program must automatically fill in values in the forms on different pages of the LSA server. Leaving out the coding details, we provide a high-level description of the steps involved in running our system:

1. Take user-goal and website URL as input.
2. Extract hyperlinks from the website.
3. Elaborate the hyperlink text using LSA's 'Near-neighbor analysis'. These elaboration steps simulate the spreading-activation processes happening in the user's working memory, which are known to help in text comprehension [1].
4. Compute semantic similarity between the user-goal and the elaborated representation of the hyperlink text using LSA's 'one-to-many analysis'.
5. Output the hyperlink with the maximum similarity measure.
6. Extract the URL associated with the hyperlink from the previous step.
7. If the current page is the target page, stop the system; else go to Step 8.
8. Give the URL extracted in Step 6 as input back to Step 2.

Fig. 1. Schematic representation of the steps involved in the automation system

Refer to Figure 1 for a schematic representation of these steps.
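The loop in Steps 1–8 can be sketched as below. This is a schematic, not the actual implementation: `fetch`, `extract_hyperlinks`, `elaborate`, `similarity` and `is_target` are hypothetical stand-ins for the HTML-parsing and LSA-server calls the real system makes, injected here so the control flow can run on a toy site.

```python
from collections import namedtuple

Link = namedtuple("Link", "text href")

def navigate(goal, url, fetch, extract_hyperlinks, elaborate, similarity,
             is_target, max_steps=20):
    # Steps 1-8: repeatedly pick the hyperlink whose elaborated text is most
    # similar to the goal, until the target page is reached.
    path = [url]
    for _ in range(max_steps):
        page = fetch(url)                    # fetch the current page
        if is_target(page, goal):            # Step 7: stop at the target
            return path
        links = extract_hyperlinks(page)     # Step 2
        # Steps 3-5: elaborate each link text and score it against the goal
        best = max(links, key=lambda ln: similarity(goal, elaborate(ln.text)))
        url = best.href                      # Steps 6 and 8: follow the link
        path.append(url)
    return path

# Toy three-page site and a crude word-overlap "similarity":
site = {"home": [Link("heart", "heart"), Link("bone", "bone")],
        "heart": [Link("blood circulation", "target")],
        "target": []}
path = navigate("heart blood", "home",
                fetch=lambda u: u,
                extract_hyperlinks=lambda p: site[p],
                elaborate=lambda t: t,
                similarity=lambda g, t: len(set(g.split()) & set(t.split())),
                is_target=lambda p, g: p == "target")
assert path == ["home", "heart", "target"]
```

The `max_steps` cap is a guard the schematic adds so the loop terminates even when no link ever leads to the target.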

3 A Behavioral Study to Evaluate the Support Generated by the Automated System

Our aim is to study the usefulness of the automated support generated by our system. Would users find it useful? In which form should the support be provided? There is already some literature in this direction [7], where support in the form of auditory cues was found annoying by the participants. Here we replicate the study of [7] using visually highlighted links instead. In [7], the process of inserting suggestions was not done automatically; we automate it by creating two versions of a mock-up website: a control condition without any visual highlighting, and a support condition with the model-predicted hyperlinks highlighted in green color. We hypothesize that the

support will be found useful by the participants, and that their navigation performance in terms of time, number of clicks taken to finish the task, accuracy and overall disorientation will significantly improve.

3.1 Method

Participants. Nineteen participants from International Institute of Information Technology-Hyderabad and five participants from Jawaharlal Nehru Technological University, Hyderabad participated in the experiment. All were computer science students. A questionnaire with six multiple-choice questions on the Human Body, the topic of our mock-up pages, was given to the participants. Correct answers were scored as 1 and wrong answers as 0. Individual scores were computed by summing each participant's scores over all questions. All our participants scored low, so we can safely assume that they had low prior domain knowledge.

Material. A mock-up website on the Human Body with 34 web pages spread across four levels of depth was used. Eight user-goals (or tasks) were designed, two for each level, each requiring users to navigate, search and find an answer.

Design. We had two conditions: a control condition, where no support was provided; and a support condition, where support was provided in the form of highlighted links. These conditions are shown in Figures 2 and 3, respectively.

Fig. 2. Navigation format on different pages in the control condition

The automated system based on CoLiDeS described in the previous section was run on the eight information retrieval tasks. The results of the simulations were the paths predicted by the system. Based on these results, the model-predicted paths for each goal were highlighted in green color (see Figure 3). It is important to emphasize that this support is derived from the automated model, and thus from CoLiDeS: the computation of semantic similarity between the goal description and the hyperlink text. We used a between-subjects design: half the participants received the

Fig. 3. Navigation format on different pages in the support condition

control condition and the other half the support condition. The dependent variables were mean task-completion time, mean number of clicks, mean disorientation and mean task accuracy. Disorientation was measured using Smith's measure [8]:

L = √((N/S – 1)² + (R/N – 1)²)

where R = the number of nodes required to finish the task successfully (thus, the number of nodes on the optimal path); S = the total number of nodes visited while searching; and N = the number of different nodes visited while searching.

Task accuracy was measured by scoring the answers given by the users. The correct answer from the correct page was scored 1. A wrong answer from the correct page was scored 0.5. Wrong answers from wrong pages and answers beyond the time limit were scored 0.

Procedure. The participants were given the eight information retrieval tasks in random order. They were first presented with the task description on the screen, and then the website was presented in a browser. The task description was always present in the top-left corner, in case a participant wished to read it again. In the control condition, participants had to read the task, browse the website, search for the answer, type the answer in the space provided and move on to the next task. In the support condition, the task was the same, except that participants got a message reminding them that they were moving away from the model-predicted path whenever they did not choose the model-predicted hyperlink; they were free, however, to choose their own path.

3.2 Results

Mean task-completion time. An independent-samples t-test between the control and support conditions was performed. There was a significant difference in mean task-completion times between the control and support conditions, t(22)=2.67, p
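Smith's disorientation measure and the accuracy scoring described in the Design section can be computed from a click log as sketched below. This is an illustrative sketch under invented example logs and function names, not the authors' analysis code.

```python
import math

def lostness(visited, optimal_len):
    # Smith's measure: L = sqrt((N/S - 1)^2 + (R/N - 1)^2), where
    # S = total nodes visited, N = distinct nodes visited,
    # R = nodes on the optimal path. L = 0 means a perfectly efficient search.
    S = len(visited)
    N = len(set(visited))
    R = optimal_len
    return math.sqrt((N / S - 1) ** 2 + (R / N - 1) ** 2)

def accuracy(correct_page, correct_answer, timed_out):
    # Scoring scheme: 1 for the correct answer from the correct page,
    # 0.5 for a wrong answer from the correct page, 0 otherwise.
    if timed_out or not correct_page:
        return 0.0
    return 1.0 if correct_answer else 0.5

# A user who follows the optimal 3-node path exactly is not lost at all;
# revisits and detours raise L above zero:
assert lostness(["home", "heart", "target"], 3) == 0.0
assert lostness(["home", "bone", "home", "heart", "target"], 3) > 0
assert accuracy(True, False, False) == 0.5
```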
