Original Article

Adaptive visualization of search results: Bringing user models to visual analytics

Jae-wook Ahn and Peter Brusilovsky

Abstract Adaptive visualization is a new approach at the crossroads of user modeling and information visualization. Taking into account information about a user, adaptive visualization attempts to provide user-adapted visual presentation of information. This paper proposes Adaptive VIBE, an approach for adaptive visualization of search results in an intelligence analysis context. Adaptive VIBE extends the popular VIBE visualization framework by infusing user model terms as reference points for spatial document arrangement and manipulation. We explored the value of the proposed approach using data obtained from a user study. The result demonstrated that user modeling and spatial visualization technologies are able to reinforce each other, creating an enhanced level of user support. Spatial visualization amplifies the user model's ability to separate relevant and non-relevant documents, whereas user modeling adds valuable reference points to relevance-based spatial visualization. Information Visualization (2009) 8, 167 -- 179. doi:10.1057/ivs.2009.12


School of Information Sciences, 135 North Bellefield Avenue, University of Pittsburgh, Pittsburgh, PA 15260, USA. E-mails: [email protected], [email protected]


Keywords: visualization; VIBE; user model; query; information filtering


Introduction

Received: 18 March 2009 Revised: 18 March 2009 Accepted: 20 April 2009

The emergence and the immense popularity of the World Wide Web caused a major advance in the field of information retrieval. Despite a rapidly increasing number of indexed pages, modern Web search engines are able to retrieve good quality results in response to a user’s query. Yet, even the best search engines cannot achieve perfect precision: the returned ranked list of results still mixes pages that are truly relevant to the user’s needs with those that bear only marginal similarity to the query. The majority of Web searchers, who have insufficient search experience and a poor understanding of search mechanisms, use very short queries that tend to return many irrelevant results.1 Distinguishing good and bad results is still a major problem for the majority of searchers, who are known to rely more on the search engine ranking (which is not perfect) than on their own abilities to recognize relevant pages.2,3 As a result, they also rarely go beyond a few top results,4 further decreasing their chances of selecting relevant pages.

The need to improve the user’s search experience motivated researchers from several fields ‘beyond information retrieval’ to explore how specific techniques from their fields can enhance the value of information retrieval for end users. Among these efforts, very promising results were delivered by researchers from two fields: user modeling and information visualization.

Researchers in the field of user modeling explore the application of user models (also known as user profiles) to improve a range of information access technologies.5 Developed originally in the context of information filtering, user models are now broadly applied in both content-based recommendation systems6 and personalized Web search systems.7 Observing user actions over time (for example, queries issued, documents selected and explored, links traversed), a personalized search system attempts to capture user interests ‘beyond the query’ and to represent them in the user model. The user model can be applied either to generate query expansions8,9 or to re-rank search results, promoting documents that are more similar to the user interests represented in the model.10,11 In both cases, user models are able to separate relevant and non-relevant documents by pulling relevant documents to the top of the ranked list, while also decreasing the rank of non-relevant documents. Among the cited approaches, adaptive re-ranking of search results has become very popular over the last few years and is used in a range of Web search systems.7,12–14

Information visualization systems explore a very different approach to help users of search engines: visual presentation of search results. The goal of this approach is to replace a one-dimensional ranked list with a two- or three-dimensional presentation of retrieved results, appealing to human intelligence and a human’s ability to analyze visual information. Over the years, information visualization systems have explored a range of approaches to help users in retrieving relevant documents. One of the most popular approaches is spatial visualization, which attempts to allocate documents in a two- or three-dimensional space. One of the benefits of spatial visualization, which is important in the context of distinguishing good and bad results, is that when one or more relevant documents are discovered, documents located close to them will likely also be relevant. Spatial visualization may use a range of techniques for spatial allocation of documents. For example, VIBE15 and InfoCrystal16 use a relevance-based visualization approach, where document allocation is based on relevance between documents and query terms. Envision and Movie Finder used feature-based visualization,17,18 where vertical and horizontal document allocation is based on the value of selected metadata. Other systems used similarity-based approaches such as spring forces or self-organized maps,19,20 where spatial allocation of documents is driven by similarity between documents. Among the listed approaches, the VIBE approach developed at the University of Pittsburgh and used in Adaptive VIBE is probably the most popular. It was applied with minor changes in several visualization systems.21–23

The personalization and spatial visualization approaches are typically considered as two alternative ways to assist the users of search systems, and there are almost no attempts to bring them together. However, the pioneering work of the Lighthouse system,19 which combined similarity-based visualization with profile-driven personalization, demonstrated that these approaches are really complementary and can reinforce each other. Inspired by this work, we explored adaptive visualization approaches based on both relevance-based and similarity-based visualization.24,25 This paper presents an account of our work on the Adaptive VIBE approach, which merges explicit user modeling with relevance-based visualization. Our studies demonstrated that these technologies are able to reinforce each other, creating an enhanced level of user support. Spatial visualization amplifies the user model’s ability to separate relevant and non-relevant documents, while user modeling adds valuable reference points to relevance-based spatial visualization.

The next section introduces the Adaptive VIBE technology along with the context in which it was implemented. The following sections report the design and the results of a user study of Adaptive VIBE. At the end, we summarize the contribution of the paper and discuss future research plans.

Despite a good volume of research in both contributing areas, there are few attempts to bring adaptive search and information visualization together. The only work that we can cite as an example is the Lighthouse system,19 which uses a spatial visualization approach to adaptive visualization.

© 2009 Palgrave Macmillan 1473-8716 Information Visualization www.palgrave-journals.com/ivs/

Vol. 8, 3, 167 – 179

Adaptive Visualization of Search Results with Adaptive VIBE

The Adaptive VIBE visualization approach proposed in this paper is based on the well-known VIBE visualization framework.15 To implement Adaptive VIBE, we developed a Web-based version of the VIBE framework and extended it with several features that are important for adaptive visualization. To evaluate Adaptive VIBE in a meaningful context, we integrated it into a text-based adaptive information retrieval system, TaskSieve. This section briefly introduces the TaskSieve system and the VIBE framework, followed by a discussion of the integrated adaptive visualization platform.


TaskSieve: The host system

The version of Adaptive VIBE presented in this paper was implemented as a visual search result representation and exploration component of TaskSieve (Figure 1), an adaptive information retrieval system for intelligence analysis.26 TaskSieve is able to personalize user search results with the help of an explicit user model, which is represented as a weighted vector of terms (as shown in the rectangle in Figure 1). At the user’s request, search results can be re-ranked taking the user model into account. As a study of the system demonstrated,26 the use of the model can improve the rank of useful documents and help the user to discover them. The user models in TaskSieve are built by observing user actions, such as saving interesting text passages in the user notebook (also known as the shoebox). Whenever the user notes are saved, TaskSieve extracts important keywords (such as TRAIN, FIRE, SKI and KILL in Figure 1) representing user interests and integrates them into the user model. More details on the user model structure and maintenance can be found in Ahn et al.26
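As an illustration only, re-ranking with a weighted term vector can be sketched as follows. This is not TaskSieve's actual implementation: the cosine measure, the mixing parameter `alpha`, the function names and all scores and weights are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rerank(results, user_model, alpha=0.5):
    """Re-rank results by mixing the query score with user-model similarity.

    results: list of (doc_id, query_score, term_vector) tuples.
    alpha:   hypothetical mixing weight; 1.0 ignores the user model.
    """
    scored = [
        (alpha * query_score + (1 - alpha) * cosine(user_model, vec), doc_id)
        for doc_id, query_score, vec in results
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]

# Hypothetical user model and result list (weights are made up).
user_model = {"train": 0.9, "fire": 0.7, "ski": 0.6}
results = [
    ("d1", 0.80, {"election": 1.0}),
    ("d2", 0.75, {"train": 0.8, "fire": 0.5}),
]
print(rerank(results, user_model))
```

In this toy example the document that shares terms with the user model overtakes the one with the slightly higher query score, which is the promotion effect described above.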

Figure 1: TaskSieve: Text-based adaptive information filtering system equipped with a user model viewer.

The generic VIBE visualization framework

VIBE was originally developed at Molde College in Norway and the School of Information Sciences at the University of Pittsburgh.15 It is a relevance-based document visualization tool that supports document exploration with Points of Interest (POIs). POIs represent key concepts or keywords and are displayed as movable icons on the screen. The documents are placed according to their similarities to the POIs. The main idea is that if a document is more similar to POI Pa than to POI Pb, then it is placed closer to Pa than to Pb, and the closeness is determined by the document-to-POI similarity ratio. For example, if a document has similarities of 0.3 and 0.6 to POIs Pa and Pb respectively, the similarity ratio to these two POIs is 1:2 and the document is placed one-third of the way from Pb to Pa on the line connecting those two POIs, because it is twice as similar to Pb as to Pa. The detailed algorithm for placing a document among multiple POIs was presented in Olsen et al.15 Users can drag POIs anywhere they want, and the locations of the documents are dynamically updated depending on their similarities to the POIs. Users can easily discover which documents are more similar to a certain POI from their locations, and they can also gauge the degree of similarity from how far the documents move (documents that closely follow the movement of a POI can be considered very similar to that POI).
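The two-POI example can be reproduced with a minimal sketch that assumes a document is placed at the similarity-weighted average of the POI coordinates. This assumption matches the 1:2 example above; the exact multi-POI algorithm is the one given in Olsen et al.15 and may differ in detail. The function name and coordinates are hypothetical.

```python
def place_document(similarities, poi_positions):
    """Return the (x, y) position of a document among POIs.

    Assumes a similarity-weighted average of POI coordinates.
    Returns None when the document is similar to no POI.
    """
    total = sum(similarities)
    if total == 0:
        return None
    x = sum(s * px for s, (px, _) in zip(similarities, poi_positions)) / total
    y = sum(s * py for s, (_, py) in zip(similarities, poi_positions)) / total
    return (x, y)

# Pa at x = 0 and Pb at x = 3; similarities 0.3 and 0.6 as in the text.
pa, pb = (0.0, 0.0), (3.0, 0.0)
x, y = place_document([0.3, 0.6], [pa, pb])
print(round(x, 6), round(y, 6))
```

With Pa and Pb three units apart, the document lands about two units from Pa and one unit from Pb, that is, one-third of the way from Pb.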


The adaptive VIBE approach

The key idea of the Adaptive VIBE approach is to make relevance-based visualization adaptive by using top user model terms as POIs. Traditional visualization of search results using VIBE treated query terms as POIs. Adaptive VIBE uses both query terms and profile terms as POIs. To develop this kind of visualization we started with our own applet-based version of VIBE that implemented basic functionalities15 and equipped it with various new features to support the user model visualization. Figure 2 shows our Adaptive VIBE implementation in the context of the TaskSieve system.

Figure 2: Adaptive VIBE integrated into TaskSieve adaptive news filtering system.

Looking at the Adaptive VIBE interface (Figure 2), it is easy to see that the traditional text-based ranked list in Figure 1 was replaced with the Adaptive VIBE spatial visualization. Not only is the filtering result presented visually, but it is also presented adaptively. The top terms from TaskSieve’s user model (for example, LAUNDER, FUJIMORI and so forth) are added as POIs in the visualization area (Figure 2). They are shown as blue circles (2) and spatially separated from the query terms (VLADIMIR and MONTESINOS) shown as yellow circles (1) in the visualization area. Unlike the query terms, which are entered by users, the user model terms are automatically extracted by the user-modeling engine from user notes (6). The documents are visualized as white squares and placed according to their relatedness to the POIs (query or user model POIs). To reflect document relevance traditionally shown by a ranked list of results, Adaptive VIBE uses larger square sizes and more intensive colors for items more similar to the query and the user model. By mousing over or clicking on a document icon, users can examine summaries (3) or full text (4) of the documents and then add any interesting passages to the user notes area (6) by simply selecting the text passage and clicking on the ‘Relevant’ button.

Because the user model (as the POIs) is directly incorporated in the visualization, users can actively mediate between the user- and the system-based information filtering results, understand how each component has contributed to them, control the effects of the visualization elements, and discover clues for heading toward better filtering results.

In the example above, all 100 documents were retrieved by the user query ‘VLADIMIR MONTESINOS’. Even though all these documents are related to the person Vladimir Montesinos (former head of Peru’s intelligence service), they discuss Montesinos in different contexts, which could be more or less relevant to the user’s past interests. For example, some documents may present information about the relationship between Montesinos and Peruvian President Fujimori (the ones in the middle of the line between the POI MONTESINOS and the POI FUJIMORI).


Other documents may be talking about Montesinos in the context of money laundering (the POI LAUNDER). As reflected in the user model, both the Fujimori and money laundering topics are of interest to the user, and the documents talking about Montesinos in these contexts may be especially relevant to the user’s information need. Users can examine and understand these various relationships among the documents and the POIs by observing the spatial distribution of the documents on the screen, and pick out the most promising documents. To further explore relationships between documents, query terms and the user model, users can move POIs. For example, if a user is curious about which of the user model terms LAUNDER or FUJIMORI is most closely related to a specific document (which is not clear owing to the close proximity of the POIs LAUNDER and FUJIMORI in the initial layout), moving these two POIs reveals the strength of their influence on various documents. The more related a document is to a POI, the more it will be affected by the movement of that POI.
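The effect of moving a POI can be illustrated with a small sketch. It assumes, for illustration only, that documents sit at the similarity-weighted average of the POI positions; under that assumption a document shifts by its share of similarity to the moved POI times the distance the POI was moved. The POI coordinates and similarity values below are made up.

```python
import math

def place(sims, pois):
    """Similarity-weighted average of POI positions (assumed placement rule)."""
    total = sum(sims)
    x = sum(s * p[0] for s, p in zip(sims, pois)) / total
    y = sum(s * p[1] for s, p in zip(sims, pois)) / total
    return (x, y)

def shift_after_move(sims, pois, k, new_pos):
    """Distance a document travels when POI number k is dragged to new_pos."""
    before = place(sims, pois)
    moved = list(pois)
    moved[k] = new_pos
    after = place(sims, moved)
    return math.hypot(after[0] - before[0], after[1] - before[1])

# Hypothetical POIs: LAUNDER, FUJIMORI, MONTESINOS.
pois = [(0.0, 0.0), (1.0, 0.0), (0.0, 5.0)]
doc_a = [0.8, 0.1, 0.1]  # mostly about money laundering
doc_b = [0.1, 0.8, 0.1]  # mostly about Fujimori

# Drag LAUNDER four units to the left and compare how far each document moves.
new_launder = (-4.0, 0.0)
shift_a = shift_after_move(doc_a, pois, 0, new_launder)
shift_b = shift_after_move(doc_b, pois, 0, new_launder)
print(shift_a > shift_b)
```

The document dominated by LAUNDER follows the dragged POI much further than the one dominated by FUJIMORI, which is how dragging can disambiguate two POIs that start close together.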

Adaptive VIBE features for user model visualization

Our implementation of VIBE includes a range of features to help users more efficiently explore and understand the relationships among POIs and documents. This section presents only two of these features, which were added to support the adaptive functionality of the system. The rest of the information exploration features were retained from the earlier non-adaptive version of VIBE and were introduced in earlier publications.27,28

POI allocation presets

The original VIBE provided a single default layout of POIs: a circular form with no distinction among the POIs in terms of their positions and visual cues. We developed two additional default layouts: Hemisphere and Parallel (Figure 3). They were designed to separate the user model and the query terms. The Hemisphere layout places the query terms on the left hemisphere of the circle and the user model terms on the right hemisphere (Figure 3(b)). The Parallel layout attempts to further increase the separation by placing them in two opposing columns (Figure 3(c)). The initial location of the POIs and documents is important because the users usually have to work with more than five POIs (query and user model POIs altogether), requiring them to perform a lot of manipulations to change the default layouts. A proper initial layout can instantly provide an information-rich overview of the relationships between the retrieved documents and the query/user model terms. This issue becomes clearer if we recall that the document positions (ranks) are very important in ranked lists provided by many search engines. Users tend to examine just a small fraction of the top ranked documents from the lists within the first or second pages1 even though they are allowed to explore much further in the lists.

Figure 3: Adaptive VIBE visualization supporting three POI layouts (query and user model POIs are painted in different colors and locations).

POI dock

The Adaptive VIBE interface was extended with a POI dock, shown as a white rectangle on the right-hand side of the screen (Figure 4). The dock allows users to temporarily disable some POIs by dragging them to the dock area. The addition of the dock was very important for Adaptive VIBE. First, because of the addition of the user model POIs, Adaptive VIBE usually requires the user to manipulate more POIs than traditional VIBE. To make the exploration more efficient, a user can dock some less important POIs. Beyond this basic advantage, the POI dock can also be used for adjusting the effect of the user model on the visualization plane. Note that the user has no control over the original selection of the user model POIs: the system simply selects the top 10 most important user model terms (the top five are arranged according to the selected layout and the next five are docked). Some of these terms may have little connection to the current query. Disabling and re-enabling the POIs allows the user to control the effect of the user model in the context of the given query. Figure 6 shows an example of this feature of the dock. Originally, Adaptive VIBE enabled five user model POIs. However, the user decided to disable three POIs (POPE, RUSSIA and SENTENCE). Those three POIs are docked, and only two POIs remain on the visualization. By editing the user model content and removing the effect of the disabled POIs, the users could see more clearly which documents are more related to which POIs.


Relevant Document Separation by Visualizing the Real User Activity Data

In order to test the effectiveness of Adaptive VIBE, we started with a usability study of an Adaptive VIBE prototype that has the same core adaptive visualization capability as the current version introduced in the previous section. The results27 showed that the subjects appreciated the role and the importance of the proposed Adaptive VIBE approach and its visualization features. Encouraged by this result, we conducted a study to further assess the value of the proposed adaptive visualization framework.

Our goal in this study was to verify whether Adaptive VIBE is able to help users locate relevant documents more effectively. Specifically, this can be understood as the ability to separate relevant and non-relevant documents, which stems from the known property of user models to distinguish these two groups of documents. As the user model accumulates user interests over multiple interactions, truly relevant documents tend to be more similar to the user model than non-relevant documents, which bear similarity to the last query only. This property of user models is applied in many adaptive search systems.

An ideal adaptive information filtering system would return only information relevant to user needs, but reality falls short of that. Even systems with good performance (for example, 90 per cent precision) still display 10 per cent irrelevant information. Systems need to be able to discriminate between relevant and irrelevant information. Traditional search engines or information filtering systems with a textual interface have adopted ranked lists, trying to place as many relevant items as possible at the top of the list and irrelevant items at the bottom. For example, a system with very good performance would put all relevant items above rank 20 (two pages if it displays 10 items per page) and irrelevant items under rank 20. Users would then be provided with a list that has a perfect separation between the relevant and irrelevant information at rank 20 (of course, they cannot notice it before they examine the list).

Likewise, we can consider a similar issue with a visual presentation of the items, especially within the VIBE framework. The main concern of the users of the information filtering system is to find relevant information. If the distribution of the relevant and irrelevant information on the visualization is merely random, users will lose the clues for locating relevant items, become frustrated and doubt the performance of the system. However, if the relevant and irrelevant items form separate clusters, then the users can pick and examine a couple of representative items (that is, centroids) from each cluster and discover a group of good items according to their judgments on those sample documents. This scenario can greatly reduce user effort if the system can generate clusters of mostly relevant items that are separate from those of non-relevant items.

Following this consideration, we hypothesize that the proposed Adaptive VIBE framework will be able to help users to access relevant documents by separating relevant and non-relevant documents spatially on its visualization plane. The effect of this separation ability can be hypothesized in two aspects: (1) it will create two separate clusters mostly containing relevant and non-relevant documents respectively; and (2) the relevant document clusters will mostly be located closer to the user model than to the user queries. We discuss these hypotheses in more detail and explain the methodologies to test them in the following sections.

Figure 4: Separation between relevant and non-relevant items using VIBE.

The TaskSieve data set

In order to test the hypotheses introduced above, we built a data set containing the snapshots of real user actions and real user models from our previous text-based adaptive information filtering study of the TaskSieve interface shown in Figure 1.26 The use of this study data was important as it was based on fully annotated ‘ground truth’ data: for each task performed by the user we knew which documents were relevant to the task and which were not. Using this detailed study log data, we were able to generate the visual representation of the user search results.

Figure 4 shows an example of this visualization. It is an instance from the experiment where a subject entered a query (CONVICT and PARDON) and the system returned a set of documents using the user model (YEAR, POPE, ESPIONAGE, PRISON and RUSSIA). The user model had been built from the user’s previous activities with the system. Here we can see the user model painted as blue POIs (placed on the right-hand side of the screen), the user query terms as yellow POIs (on the left-hand side), and the documents as squares forming two separate clusters in green and red. Green means relevant to the current task and red means irrelevant; a clear separation between the relevant and non-relevant clusters is visible in this figure. White circles with green and red boundaries represent the centroids of the clusters. The visual cues regarding the relevance information (green or red document/centroid markers) in the figures were generated from ground truth data on the document relevancy to specific topics. The colors were not visible to the subjects during the adaptive information filtering experiment; they were added for the user log data analysis stage in this experiment.

In the TaskSieve study where this user model data set was collected, ten subjects were asked to search for information from the TDT4 news text corpus (Topic Detection and Tracking Project, http://projects.ldc.upenn.edu/TDT) and mark passages they found relevant to the tasks. Six topics were chosen from the TDT4 corpus as tasks to be completed. The marked passages were stored in an electronic notebook provided by TaskSieve so that the subjects were able to submit them as final reports at the end of their experiment sessions. The notes were also used to construct task models, which are a type of user model specifically targeted to reflect task-based short-term interests in order to solve different tasks (we did not make a clear distinction between the user model and task model in this paper). Using this task model, the system could provide personalized filtering results. We adopted the bag-of-words approach for constructing and exploiting the task models. Whenever the subjects took notes, the top N important terms were extracted from the list of notes and added to the task models. The task model terms were represented in the vector space model using the well-known TF-IDF weighting scheme.29 The system retrieves a list of documents using a search engine with the user queries and then post-filters the ranked list using the task models. While interacting with the system by entering queries and making notes with important passages, the subjects automatically construct task models. The task models were reset when subjects started a task, so that they could work with different task models for different tasks. Each subject completed four tasks: two of them for the experiment system and the other two for the baseline system, which provided a simple ad hoc search interface without the task models.

Each user activity and change to the user models in the TaskSieve study was traced and recorded in log files. We could test the proposed VIBE-based visualization interface using this log data. The information extracted from the TaskSieve log data for our visualization experiment is as follows (italicized items are the three core elements to be visualized).

1 Record ID
2 User ID
3 Topic ID of the target task
4 Query terms
5 User model (Top N weighted terms)
6 Documents retrieved using the query and the task model (top 100)
7 Relevance information about the retrieved documents

As mentioned before, we also have the ground truth information to check whether a retrieved document is relevant or not to the given task (which was not revealed to the subjects during the TaskSieve study). This information was collected by real human experts and the detailed information about this process was presented in He et al.30

From the original data set, some records that cannot contribute to this replication were removed. An example of a removed record is one containing only non-relevant documents, which provides no information about relevant documents and for which the VIBE-based visualization bears no meaning. Another example is the data at the very beginning of each session, where the task model was not yet built; so again, the VIBE visualization cannot be constructed with it. After cleaning, 105 records remained. Each record included every attribute described above and was fed into the Adaptive VIBE system to generate the visualization.


Adaptive VIBE experiment using the TaskSieve data set

Thanks to the complete set of data, we were able to replicate the activities and results shown while the subjects interacted with the text-based adaptive information filtering system and could replace the textual interface with the VIBE-based visual interface we proposed. We then observed whether the visual interface could successfully provide a better representation and separation of the adaptively filtered documents. We extended our VIBE implementation so that it can run in a batch-processing mode to do the experiment with the log data. It calculated the locations of all the document points in the data set on the Adaptive VIBE visualization plane. Detailed information regarding the visualizations, such as all document IDs and their positions (x and y coordinates on the 2D visualization plane), was recorded for analysis. Because we wanted to compare the three layouts the current VIBE supports (Radial, Parallel and Hemisphere), we repeated this process for each layout for every single record in the data set and recorded the document locations separately. The relevance information for each document was also recorded in order to compare the relevant and non-relevant document clusters. Below is the list of attributes recorded for the data analysis.

1 Record ID
2 Layout – Radial, Parallel or Hemisphere
3 Horizontal document position (x coordinate on the visualization plane)
4 Vertical document position (y coordinate on the visualization plane)
5 Relevance (relevant or non-relevant)

Hypotheses and measures

Hypothesis 1 -- Separation of relevant and non-relevant documents

By visualizing the adaptive filtering results using the VIBE framework and adding the new layouts that stress the separation of user model and query terms, we hypothesize that the relevant and non-relevant documents would create clusters and would be distributed separately on the display, because the proposed method can separate the effect of the query and user model terms spatially. This can be stated formally as

Hypothesis 1: The query and the user model POI separation of Adaptive VIBE will create distinct document clusters that contain relevant and non-relevant documents and the clusters will be displayed separately on the screen.

Adaptive VIBE provides three different POI layouts as discussed in the previous section. One of them is a radial layout, where all the POIs (regardless of whether they are query or user model POIs) are treated equivalently and the separation of those two groups is minimal. The other two layouts (hemisphere and parallel) are different in that they clearly separate the territory of the two groups. Therefore, we can test whether the document cluster creation and separation is bigger in the hemisphere or parallel layouts than in the radial layout. The hypothesis above can be redefined as follows:

Hypothesis 1-1: The hemisphere and the parallel layout of Adaptive VIBE will create bigger cluster separation between relevant and non-relevant documents than the radial layout.

The creation and the separation of the clusters can be tested in two ways: how separate they are from each other and how compact each of them is. By examining the location of the two cluster centroids, we can measure how much those two clusters are separated from each other. If the distance between the two cluster centroids is large enough, we can conclude that the clusters are separate from each other. However, we should also consider the compactness of the clusters at the same time. Even if the distance between the centers (centroids) is big enough, too much dispersion of the clusters will result in great overlap of the cluster borders and will weaken the effect of the big centroid distance.

Therefore, for measuring the separation, we can first calculate the between-cluster variance (VAR_BC) of the relevant and non-relevant document groups.

$$\mathrm{VAR}_{BC} = \frac{\sum_{j \in \{rel,\, nonrel\}} n_j \, \mathrm{dist}(centroid_j, centroid_{all})^2}{2} \qquad (1)$$

where centroid_rel, centroid_nonrel and centroid_all represent the average of the relevant, non-relevant, and all document points respectively, n_j is the number of documents in cluster j, and dist is the Euclidean distance between two points. Note again that we could distinguish these centroids because we have the ground truth information as described earlier. We then calculated the within-cluster variance (VAR_WC), which measures the dispersion of the document points in the clusters, defined as follows:

$$\mathrm{VAR}_{WC} = \frac{\sum_{j \in \{rel,\, nonrel\}} \sum_{i=1}^{n_j} \mathrm{dist}(d_{ij}, centroid_j)^2}{|D|} \qquad (2)$$

where d_ij is each document in the retrieved lists and |D| is the total number of documents in the lists. Therefore, to test Hypothesis 1, we should check whether the between-cluster variance (VAR_BC) in the hemisphere and the parallel layouts is larger than that in the radial layout. At the same time, the within-cluster variance (VAR_WC) should be compared to check the dispersion of the clusters.
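The separation and compactness measures can be sketched in a few lines. The sketch assumes that the between-cluster variance sums the size-weighted squared distances from each cluster centroid to the overall centroid and divides by the number of clusters (2), and that the within-cluster variance divides the summed squared distances from each document to its own cluster centroid by the total number of documents. The function names and the sample coordinates are hypothetical.

```python
import math

def centroid(points):
    """Mean position of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def cluster_variances(relevant, nonrelevant):
    """Return (between-cluster, within-cluster) variance for one record.

    Both lists must be non-empty; records without relevant documents
    cannot be measured this way.
    """
    clusters = [relevant, nonrelevant]
    all_points = relevant + nonrelevant
    c_all = centroid(all_points)
    var_bc = sum(len(c) * dist(centroid(c), c_all) ** 2 for c in clusters) / 2
    var_wc = sum(
        dist(p, centroid(c)) ** 2 for c in clusters for p in c
    ) / len(all_points)
    return var_bc, var_wc

# Made-up document positions: two tight clusters four units apart.
relevant = [(0.0, 0.0), (0.0, 2.0)]
nonrelevant = [(4.0, 0.0), (4.0, 2.0)]
var_bc, var_wc = cluster_variances(relevant, nonrelevant)
print(var_bc, var_wc)
```

A layout that separates the clusters well should yield a large first value together with a small second value.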


© 2009 Palgrave Macmillan 1473-8716

Hypothesis 2 -- User model effect

The first hypothesis concerned whether relevant and non-relevant document clusters are formed by separating the user query and the user model on the Adaptive VIBE visualization plane. However, separation by itself is not sufficient for helping users to access relevant information quickly. For example, users will be frustrated with an adaptive search system that in some cases places all relevant documents at the top of the ranked list and all non-relevant documents at the bottom and, in other cases, does the reverse. Although in each case relevant and non-relevant documents are clearly separated from each other, the user would not know where to expect relevant documents. Therefore, we need to assess whether Adaptive VIBE can place a cluster of relevant documents in a predictable location. As discussed earlier, user models of adaptive systems tend to attract truly relevant documents better than marginally relevant documents. Given that, we may expect Adaptive VIBE to place relevant documents closer to the user model POIs, thus helping users to locate them. This property can be checked by directly comparing the positions of the relevant and non-relevant document centroids extracted from the log data. Following this idea, we defined the second hypothesis.

Hypothesis 2: The centroids of relevant documents are closer to the user model than those of non-relevant documents.

Information Visualization Vol. 8, 3, 167 – 179


We could calculate the relevant and non-relevant document centroid positions whenever the users in the data set received a set of retrieved documents from the system.

Analysis


Separation of relevant and non-relevant documents

As discussed before, we tried to verify whether the relevant and non-relevant document clusters were really formed and sufficiently separated from each other. We examined all 105 records from the data set and calculated the variances between the relevant and non-relevant cluster centroids (VARBC) to measure the cluster separation. Table 1 compares these variances among the three VIBE layouts. Radial is the circular format that is the default layout of the traditional VIBE, and Parallel and Hemisphere are new layouts designed to separate the user model from the user query. A higher VARBC means that the centroids of the relevant and non-relevant document clusters are more distant from each other. The Parallel layout showed the biggest variance between the clusters, which means that its visual separation between the relevant and non-relevant documents was the biggest among the three layouts. The Radial layout does not consider the different types of POIs (query or user model terms) and just places them sequentially on a circle. It showed the least variance between the cluster centroids. The differences between the average VARBC of the new layouts (Parallel and


Hypothesis 2-1: The horizontal coordinates of the relevant document centroids are greater, on average, than those of the non-relevant document centroids.

Hemisphere) and the Radial layout were statistically significant (paired Wilcoxon signed-rank test). We also compared the within-cluster variances (VARWC) of the three layouts. We calculated these variances in order to estimate the dispersion of the clusters. Even if the distance between the centroids is large, that distance is meaningless if the clusters spread beyond it. That is, the boundaries estimated by the within-cluster variance (VARWC) should not exceed the centroid distance estimated by the between-cluster variance (VARBC). In fact, the high degree of separation, especially with the Parallel layout, could have been expected because that layout greatly stretches the space horizontally, so its between-cluster distance is very likely to be bigger than that of the other settings. Therefore, the within-cluster variances should be examined to verify the usefulness of the layouts with the higher between-cluster variances. If the documents forming a cluster are similar to each other and uniform, then they will be more likely to move to one extreme with a layout like the Parallel. If they are not uniform, they will be more likely to form a long, strip-shaped cluster spreading from left to right (from the query terms to the user model terms) with the Parallel layout. The results in Table 2 and Figure 5 show that the clusters generated by the Parallel layout had the highest spread, around 17 000 on average. The Hemisphere and the Radial layouts showed spreads of around 16 000 and 12 000 on average, respectively. However, these variances are all only around 1/10 of the corresponding between-cluster variances, suggesting that the clusters are distant enough from each other and form distinct separations. We have seen so far that the VIBE visualization for adaptive information filtering, especially with the newly proposed POI layouts, can separate relevant information from non-relevant information.
However, one might suspect that this separation is just an outcome of distinguishing two random groups of POIs, and that the distinction need not be drawn specifically between the user model and query term POI groups. Therefore, we investigated the effect of the user models on separating the relevant information from the


The new VIBE POI layouts (Parallel and Hemisphere) place the user model terms on the right-hand side in the current experimental setting. Therefore, we can simply check whether the horizontal coordinates of the relevant document cluster centroids are greater than those of the non-relevant document cluster centroids, which would mean that they are closer to the user model POIs on the right-hand side of the visualization.
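The check described above can be sketched in Python as follows. The record fields mirror the logged attributes (record ID, layout, x and y position, relevance); the class name, the function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class LogRow:
    """One logged document position (field names are illustrative)."""
    record_id: int
    layout: str      # "Radial", "Parallel" or "Hemisphere"
    x: float         # horizontal position on the visualization plane
    y: float         # vertical position on the visualization plane
    relevant: bool   # ground-truth relevance


def horizontal_gaps(rows):
    """Per record: centroid x of relevant minus centroid x of non-relevant."""
    xs = defaultdict(lambda: {True: [], False: []})
    for row in rows:
        xs[row.record_id][row.relevant].append(row.x)
    gaps = {}
    for record_id, groups in xs.items():
        # A gap is only defined when both groups are present in the record.
        if groups[True] and groups[False]:
            gaps[record_id] = mean(groups[True]) - mean(groups[False])
    return gaps


# Hypothetical rows for a single record drawn with the Parallel layout.
rows = [
    LogRow(1, "Parallel", 300.0, 100.0, True),
    LogRow(1, "Parallel", 320.0, 120.0, True),
    LogRow(1, "Parallel", 100.0, 100.0, False),
    LogRow(1, "Parallel", 120.0, 120.0, False),
]
print(horizontal_gaps(rows))
```

A positive gap means that the relevant centroid lies further to the right, that is, closer to the user model POIs in the Parallel and Hemisphere layouts.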

Table 1: Cluster separation (between-cluster variances)

                           Radial        Parallel                 Hemisphere
VARBC                      109 325.05    220 362.31               186 679.57
Difference (from Radial)   -             111 037.26 (P < 0.01)    77 354.52 (P < 0.01)

Table 2: Cluster dispersion (within-cluster variances)

                           Radial        Parallel                 Hemisphere
VARWC                      12 124.21     16 788.85                15 617.01
Difference (from Radial)   -             4 664.64 (P < 0.01)      3 492.79 (P < 0.01)
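The statement that the within-cluster variances are only around 1/10 of the between-cluster variances can be checked from the averages reported in Tables 1 and 2. The short Python sketch below does this arithmetic; the variable names are illustrative.

```python
# Average variances per layout, as reported in Tables 1 and 2.
var_bc = {"Radial": 109325.05, "Parallel": 220362.31, "Hemisphere": 186679.57}
var_wc = {"Radial": 12124.21, "Parallel": 16788.85, "Hemisphere": 15617.01}

# Ratio of dispersion to separation for each layout.
ratios = {layout: var_wc[layout] / var_bc[layout] for layout in var_bc}

for layout, ratio in ratios.items():
    print(f"{layout}: VARWC / VARBC = {ratio:.3f}")
```

All three ratios fall between roughly 0.08 and 0.11, with the Parallel layout giving the smallest one.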


Figure 5: Cluster separation and dispersion (between- and within-cluster variances). [Bar chart of VARBC and VARWC for the Radial, Parallel and Hemisphere layouts; vertical axis from 0 to 250 000.]

non-relevant information. We did this by disabling the user model terms in the active POI list and observing the change in the clustering. The assumption here is that if the separation decreases after the user model is removed from the visualization, then we can interpret the decrease as the effect of the user model. Figure 6 illustrates this process. The upper visualization shows two distinctive clusters: the relevant document cluster closer to the user queries (CONVICT and CLEMENT) and the non-relevant document cluster closer to the user model terms (POPE, YEAR, RUSSIA, SENTENCE and CHARGE). However, when three of the user model terms are removed (lower), the separation decreases radically. Figure 7 shows the comparisons of the states with and without the user model using all 105 records

Figure 6: Decrease of cluster separation by removing user model POIs (green squares represent relevant and red represent non-relevant documents).


(we did not count intermediate states such as the one in the example above). We compared the between-cluster variances (VARBC) across different numbers of query terms. The state without any user model (Radial without user models) was measured by dropping all user model POIs and leaving only the query terms in the Radial VIBE POI layout. The case of just one query term was discarded because the VIBE visualization cannot be drawn with a single POI. The Parallel and the Hemisphere layouts could not be drawn for this query-only situation, because those layouts assume that both user model and query terms are used. Here we can observe that the relevant and non-relevant clusters were best separated when the number of query terms was two (with five user model terms by default) and they were drawn with the user model POIs using the Parallel or Hemisphere layouts. The variances dropped when the number of query terms increased beyond two. The condition without the user model (Radial with no user model) always showed worse separation than the Parallel layout with the user model. Therefore, we could conclude that the Adaptive VIBE framework, with its fusion of the user model and the query, has the power to visually discriminate relevant from non-relevant information, especially because of the discriminating ability of the user model.
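The effect of dropping the user model POIs can be illustrated with a small Python sketch. It assumes the usual VIBE placement rule, in which a document is drawn at the similarity-weighted average of the active POI positions. The POI coordinates, the similarity scores and the function name are hypothetical; only the term names are borrowed from Figure 6.

```python
def vibe_position(scores, pois):
    """Similarity-weighted average of POI positions (assumed VIBE rule).

    scores: term -> similarity of the document to that term
    pois:   term -> (x, y) position of the active POI
    Returns None when the document matches none of the active POIs.
    """
    total = sum(scores.get(term, 0.0) for term in pois)
    if total == 0:
        return None
    x = sum(scores.get(term, 0.0) * xy[0] for term, xy in pois.items()) / total
    y = sum(scores.get(term, 0.0) * xy[1] for term, xy in pois.items()) / total
    return (x, y)


# Hypothetical layout: query terms on the left, user model terms on the right.
query_pois = {"CONVICT": (0.0, 0.0), "CLEMENT": (0.0, 100.0)}
user_model_pois = {"POPE": (400.0, 0.0), "SENTENCE": (400.0, 100.0)}
all_pois = {**query_pois, **user_model_pois}

# Two hypothetical documents with different similarity profiles.
doc_a = {"CONVICT": 1.0, "POPE": 3.0}
doc_b = {"CONVICT": 3.0, "POPE": 1.0}

with_um = [vibe_position(d, all_pois) for d in (doc_a, doc_b)]
without_um = [vibe_position(d, query_pois) for d in (doc_a, doc_b)]
print(with_um)     # documents spread horizontally between the two POI groups
print(without_um)  # documents collapse onto the query POIs
```

In this toy setting the two documents are 200 pixels apart when the user model POIs are active and coincide once those POIs are removed, which mirrors the loss of separation described above.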


non-relevant document clusters. We examined which of the two centroids (relevant or non-relevant) was closer to the user model terms, as discussed in the previous sections. Table 3 summarizes the result. Here, the centroid positions were calculated on the standard Java Swing plane, where the origin (x = 0, y = 0) is located at the top-left corner of the interface. Therefore, as the difference (horizontal position of the relevant cluster centroid minus that of the non-relevant cluster centroid) becomes positive and larger, we can determine that the relevant document clusters are closer to the user model terms positioned on the right-hand side of the display in the Parallel and Hemisphere layouts. The Parallel layout showed the biggest difference between clusters among the three layouts, while the Hemisphere layout showed the second biggest. We conducted paired Wilcoxon signed-rank tests, pairing the two centroids of each visualization within every layout, and found that the horizontal positions of the relevant document cluster centroids were greater than those of the non-relevant cluster centroids, which suggests that the relevant document clusters were closer to the user model on average. This result was statistically significant for all three layouts.
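For readers unfamiliar with the test, the following Python sketch shows how a paired Wilcoxon signed-rank comparison of centroid positions could be computed. It is a simplified illustration, not the procedure used in the study: the p-value relies on the normal approximation without tie or continuity correction, and the sample coordinates are hypothetical.

```python
from math import erf, sqrt


def wilcoxon_signed_rank(a, b):
    """Paired Wilcoxon signed-rank test (simplified sketch).

    Returns (W_plus, W_minus, p) where p is a two-sided p-value from the
    normal approximation, without tie or continuity correction.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean_w) / sd_w  # z <= 0 by construction
    p = 1 + erf(z / sqrt(2))                    # equals 2 * Phi(z)
    return w_plus, w_minus, p


# Hypothetical centroid x positions for eight visualizations.
relevant_x = [304, 310, 298, 325, 330, 300, 315, 340]
non_relevant_x = [284, 290, 301, 280, 295, 270, 310, 300]
print(wilcoxon_signed_rank(relevant_x, non_relevant_x))
```

With so few pairs the normal approximation is rough; a statistics package with an exact small-sample method would be preferable for real data.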

Conclusions and Future Work

User model effect

In order to test the second hypothesis that the user model attracts the relevant document cluster, we directly compared the centroid positions of the relevant and


This paper proposes a novel adaptive visualization approach that bridges two different directions of research on information access: personalized search and spatial visualization. We built an adaptive visualization system, Adaptive VIBE, that implements this approach. In this system, the query terms and the user model terms that represent the current user interests are shown as two separate groups of reference points, called POIs, so that the retrieved documents can be adaptively arranged according to their relevance to each group. The incorporation of the user model into the visualization allowed the system to provide more flexible and user-centered visual information filtering, along with the conventional query-based output. Adaptive VIBE expands the ideas of user control over personalization, which we started to explore in our earlier work.26,27 The use of both search terms and model terms as POIs allows the user to mediate flexibly between documents more relevant to the query and those more relevant to different longer-term interests represented in the model. To investigate whether the proposed approach is really effective, we implemented it in a personalized search

Figure 7: Comparison of between-cluster variances when the visualization was equipped with the user model (in Radial, Parallel or Hemisphere layouts) or not (no UM).

Table 3: Horizontal positions of the cluster centroids

                                       Radial             Parallel            Hemisphere
Relevant                               304.3              300.9               337.7
Non-relevant                           283.9              207.96              295.3
Difference (Relevant – Non-relevant)   20.4 (P < 0.01)    92.94 (P < 0.01)    42.40 (P < 0.01)


system, TaskSieve, and conducted an experiment. A real user activity log data set containing user queries, user models and retrieved document information was visualized using Adaptive VIBE. The experiment revealed that Adaptive VIBE is able to provide separate clustering of relevant and non-relevant documents while displaying relevant documents in the proximity of the visualized user model. These properties allowed the system to provide good visual separation of relevant and non-relevant documents. These results provide good evidence that the proposed adaptive visualization method can serve as a useful tool to help users locate relevant information in a large corpus. In the future, we plan to explore more advanced user model elicitation methods and incorporate them in the next version of the adaptive visualization system. Even though the current simple keyword- and frequency-based user model method works well in general, we have noticed some situations where the user model presentation was neither precise nor useful enough, resulting in failures in the adaptive visualization. We believe that these shortcomings can be overcome by incorporating advanced user modeling technologies, including concept-level user modeling and networked representations of user model entities.

References

1 Jansen, B.J., Spink, A. and Saracevic, T. (2000) Real life real users and real needs: A study and analysis of user queries on the web. Information Processing and Management 36(2): 207–227.
2 Jansen, B.J., Zhang, M. and Zhang, Y. (2007) Brand awareness and the evaluation of search results. In: C. Williamson, M.E. Zurko, P. Patel-Schneider and P. Shenoy (eds.) The 16th International Conference on World Wide Web, WWW ’07. Banff, Canada: ACM, pp. 1139–1140.
3 Keane, M., O’Brien, M. and Smyth, B. (2008) Are people biased in their use of search engines? Communications of the ACM 51(2): 49–52.
4 Cutrell, E. and Guan, Z. (2007) What are you looking for?: An eye-tracking study of information usage in web search. In: B. Begole, S. Payne, E. Churchill, R. St. Amant, D. Gilmore and M.B. Rosson (eds.) CHI ’07: ACM SIGCHI Conference on Human Factors in Computing Systems. San José, CA: ACM Press, pp. 407–416.
5 Gauch, S., Speretta, M., Chandramouli, A. and Micarelli, A. (2007) User profiles for personalized information access. In: P. Brusilovsky, A. Kobsa and W. Nejdl (eds.) The Adaptive Web: Methods and Strategies of Web Personalization. Berlin Heidelberg, New York: Springer-Verlag, pp. 54–89.
6 Pazzani, M.J. and Billsus, D. (2007) Content-based recommendation systems. In: P. Brusilovsky, A. Kobsa and W. Nejdl (eds.) The Adaptive Web: Methods and Strategies of Web Personalization. Berlin Heidelberg, New York: Springer-Verlag, pp. 325–341.
7 Micarelli, A., Gasparetti, F., Sciarrone, F. and Gauch, S. (2007) Personalized search on the World Wide Web. In: P. Brusilovsky, A. Kobsa and W. Nejdl (eds.) The Adaptive Web: Methods and Strategies of Web Personalization. Berlin Heidelberg, New York: Springer-Verlag, pp. 195–230.
8 Chirita, P.-A., Firan, C.S. and Nejdl, W. (2006) Summarizing local context to personalize global web search. In: P.S. Yu, V. Tsotras, E. Fox and B. Liu (eds.) ACM Fifteenth Conference on Information and Knowledge Management, CIKM 2006. Arlington, VA: ACM, pp. 287–296.
9 Koutrika, G. and Ioannidis, Y. (2005) A unified user-profile framework for query disambiguation and personalization. In: J.G. Carbonell and J. Siekmann (eds.) Workshop on New Technologies for Personalized Information Access at 10th International User Modeling Conference, UM 2005. Edinburgh, UK: Springer, pp. 44–53.
10 Sieg, A., Mobasher, B. and Burke, R. (2007) Web search personalization with ontological user profiles. In: M.J. Silva, A.H.F. Laender, R.A. Baeza-Yates, D.L. McGuinness, B. Olstad, Ø.H. Olsen and A.O. Falcão (eds.) ACM Sixteenth Conference on Information and Knowledge Management, CIKM 2007. New York: ACM, pp. 525–534.
11 Micarelli, A. and Sciarrone, F. (2004) Anatomy and empirical evaluation of an adaptive Web-based information filtering system. User Modeling and User Adapted Interaction 14: 159–200.
12 Arezki, R. et al. (2004) Web information retrieval based on user profile. In: W. Nejdl and P. De Bra (eds.) Third International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (AH’2004). The Netherlands: Springer, pp. 275–278.
13 Teevan, J., Dumais, S. and Horvitz, E. (2005) Personalizing search via automated analysis of interests and activities. In: G. Marchionini, A. Moffat, J. Tait, R. Baeza-Yates and N. Ziviani (eds.) Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, pp. 449–456.
14 Braun, M., Dellschaft, K., Franz, T., Hering, D., Jungen, P., Metzler, H., Müller, E., Rostilov, A. and Saathoff, C. (2008) Personalized search and exploration with MyTag. In: J. Huai, R. Chen, H.-W. Hon, Y. Liu, W.-Y. Ma, A. Tomkins and X. Zhang (eds.) The 17th International Conference on World Wide Web, WWW ’08. Beijing, China: ACM, pp. 1031–1032.
15 Olsen, K.A., Korfhage, R.R., Sochats, K.M., Spring, M.B. and Williams, J.G. (1993) Visualisation of a document collection: The VIBE system. Information Processing and Management 29(1): 69–81.
16 Spoerri, A. (1993) InfoCrystal: A visual tool for information retrieval and management. In: B. Bhargava, T. Finin and Y. Yesha (eds.) CIKM ’93: The Second International Conference on Information and Knowledge Management. Washington, DC: ACM, pp. 11–20.
17 Ahlberg, C. and Shneiderman, B. (1994) Visual information seeking: Tight coupling of dynamic query filters with starfield displays. In: B. Adelson, S. Dumais and J. Olson (eds.) CHI ’94, SIGCHI Conference on Human Factors in Computing Systems. Boston, MA: ACM Press, pp. 313–317.
18 Fox, E. et al. (1993) Users, user interfaces, and objects: Envision, a digital library. Journal of the American Society for Information Science 44(8): 480–491.
19 Leuski, A. and Allan, J. (2004) Interactive information retrieval using clustering and spatial proximity. User Modeling and User Adapted Interaction 14(2–3): 259–288.
20 Lin, X. (1997) Map displays for information retrieval. Journal of the American Society for Information Science 48(1): 40–54.
21 Benford, S., Snowdon, D., Greenhalgh, C., Ingram, R., Knox, I. and Brown, C. (1995) VR-VIBE: A virtual environment for cooperative information retrieval. Proceedings of EUROGRAPHICS ’95 14(3): 349–360.
22 Christel, M. and Martin, D. (1998) Information visualization within a digital video library. Journal of Intelligent Information Systems 11(3): 235–257.
23 Hemmje, M., Kunkel, C. and Willett, A. (1994) LyberWorld – A visualization user interface supporting fulltext retrieval. In: W.B. Croft and C.J. van Rijsbergen (eds.) SIGIR ’94: The 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: Springer-Verlag, pp. 249–259.
24 Ahn, J.-w., Farzan, R. and Brusilovsky, P. (2006) A two-level adaptive visualization for information access to open-corpus educational resources. In: P. Brusilovsky, J. Dron and J. Kurhila (eds.) Workshop on the Social Navigation and Community-Based Adaptation Technologies at the 4th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems. Dublin, Ireland: Springer, pp. 497–505.
25 Brusilovsky, P., Ahn, J.-w., Dumitriu, T. and Yudelson, M. (2006) Adaptive knowledge-based visualization for accessing educational examples. Information Visualization. London, UK: IEEE, pp. 142–147.
26 Ahn, J.-w., Brusilovsky, P., He, D., Grady, J. and Li, Q. (2008) Personalized web exploration with task models. In: J. Huai, R. Chen, H.-W. Hon, Y. Liu, W.-Y. Ma, A. Tomkins and X. Zhang (eds.) The 17th International Conference on World Wide Web, WWW ’08. Beijing, China: ACM, pp. 1–10.
27 Ahn, J.-w. and Brusilovsky, P. (2007) From user query to user model and back: Adaptive relevance-based visualization for information foraging. In: T.Y. Lin, L. Haas, J. Kacprzyk and R. Motwani (eds.) The 2007 International Conference on Web Intelligence, WI ’07. Silicon Valley, CA: IEEE, pp. 706–712.
28 Ahn, J.-w., Brusilovsky, P. and Sosnovsky, S. (2006) QuizVIBE: Accessing educational objects with adaptive relevance-based visualization. In: T. Reeves and S. Yamashita (eds.) World Conference on E-Learning, E-Learn 2006. Honolulu, HI: AACE, pp. 2707–2714.
29 Salton, G., Wong, A. and Yang, C. (1975) A vector space model for automatic indexing. Communications of the ACM 18(11): 613–620.
30 He, D., Brusilovsky, P., Ahn, J.-w., Grady, J., Farzan, R., Peng, Y., Yang, Y. and Rogati, M. (2008) An evaluation of adaptive filtering in the context of realistic task-based information exploration. Information Processing and Management 44: 511–533.
