A Trustable Recommender System for Web Content

Tingshao Zhu, Russ Greiner
Dept. of Computing Science, University of Alberta, Canada T6G 2E8

Gerald Häubl
School of Business, University of Alberta, Canada T6G 2R6

Bob Price, Kevin Jewell
Dept. of Computing Science, University of Alberta, Canada T6G 2E8

1 ABSTRACT

This paper presents three principles for building a Web recommender system that can be trusted to provide valuable page recommendations to Web users. We also demonstrate the implementation of these principles in an effective complete-Web recommender system, WebICLite. WebICLite is able to predict relevant pages based on a learned model, whose parameters are estimated from a labelled corpus. Data from a recent user study demonstrate that the prediction model can recommend previously unseen pages of high relevance from anywhere on the Web for any user.

1.1 Keywords

Browsing Behavior Model, Machine Learning, Information Search, Web Content Recommendation

2 Introduction

While the World Wide Web contains a vast quantity of information, it is often time-consuming and difficult for Web users to find the information they are seeking. It is natural to ask whether computational techniques can be used to assist the user in finding useful pages. Today, millions of users employ information retrieval techniques in the form of popular search engines to successfully find useful pages.

In order to benefit from these search engines, users must explicitly perform searches, and thus they must have intuitions about what keywords they should use to efficiently circumscribe the information they are seeking within the billions of Web pages that search engines typically index. We believe, however, that the time has come for information systems to go beyond simple fulfillment of specific requests. We envision interfaces that actively make suggestions for useful content, i.e., Web recommender systems.

We envision Web recommender systems that generate understandable and trustable recommendations, present these recommendations to users in a natural way, and even include interfaces that can encourage users to provide ratings in order to “close the loop” for recommendations. For example, it is important for the system to help users understand the items recommended and then evaluate the recommendations.

To build a Web recommender system, it is important to understand both what recommender systems are, and how they can be applied effectively. In this paper, we introduce three principles that we followed as we built our trustable Web recommender system, WebICLite:

Useful and Complete-Web Recommendation: To gain the trust of Web users, it is critical to generate useful recommendations, as less relevant recommendations will discourage users. Since the Internet contains a vast amount of information spread across multiple sites, we observed that to satisfy the current information need, it is important to provide the most relevant pages from anywhere on the Web.¹

Adaptive: A user’s needs can change dramatically even in the course of a single browsing session. As the user works on various tasks and subtasks, s/he will often require information on various unrelated topics, including topics the user has investigated before. This required us to build a system that can suggest recommendations dynamically, that is, the system needs to be adaptive to the current user and browsing session.

User interface: In addition to generating useful recommendations, it is also critically important to design an efficient mechanism for presenting these recommendations to users, one that is compatible with a user’s natural behavior while using the Web. If the recommender interface is different from traditional interfaces, then people will have to change their behavior in order to adapt to the new interface.

One of our research goals is to understand how best to present recommendations to users so as to encourage them to explicitly rate content recommendations (e.g., how good is this recommendation?). Such relevance feedback can subsequently be used to tune the performance of the recommender system.

¹ We use the phrase “anywhere in the Web” to mean “anywhere that Google has indexed”.

We propose that there exists a general user-independent browsing behavior model of goal-directed information search on the Web. That is, people follow some very general rules to locate the information they are seeking. If we (or our algorithms) can detect such patterns, then our systems can provide more useful content recommendations to the user. In our research, we use machine learning to acquire browsing behavior models, which can then be used to generate useful recommendations; we also consider learning personalized models.

There are many advantages to learning a model from data, i.e., to tuning the model’s internal parameters based on training examples. First, a system that adapts to the user’s actual actions will perform better than one that does not. Of course, we could have a person inspect the earlier performance and then make these modifications by hand. A machine learning algorithm, however, can discover features in datasets that are beyond the scope of human inspection, as they are too large or too complex.

We developed the WebICLite Web recommender system to be “passive”, in that it can recommend pages relevant to the user’s current information need without requiring the user to do any additional work (e.g., the user does not need to answer intrusive questions, etc.). Like most recommendation systems, WebICLite watches a user as s/he navigates through a sequence of pages, and suggests pages that (it predicts) will provide the relevant information. WebICLite is unique in several respects. First, while many recommendation systems are server-side and hence specific to a single Web site [9, 10], our client-side system is not so specific, and so can point users to pages anywhere on the Web. Secondly, WebICLite can predict the user’s information need dynamically, based on his/her current context (e.g., the current session). The third difference deals with the goal of the recommendation system: our goal is to recommend only useful pages, i.e., pages that are relevant to the user’s task. These “Information Content” pages (aka IC-pages) are just the pages the user needs to solve his/her current task, and not the overhead pages required to reach them. This differs from systems that instead attempt to lead the user to pages that similar users have visited earlier (independent of whether those “familiar” pages in fact contain the answer to the user’s current quest).

Section 3 discusses related work. Section 4 then describes a simple procedure for training the model, and the results of a user study (i.e., LILAC) that demonstrates the ability of our model to predict pages useful to the current user, from anywhere on the Web. Section 5 describes an implementation of our ideas in the form of a stand-alone Web browser, WebICLite, that runs on the user’s computer and provides on-line recommendations to pages anywhere on the Web (not just on the user’s current Web site). Finally, Section 6 concludes with a summary of our key contributions and insights.

3 Related Work

Many groups have built various types of systems that recommend pages to Web users. This section will summarize several of those systems, and discuss how they differ from our approach. Zukerman [15] distinguishes two main approaches to learning which pages will appeal to a user: A content-based system tries to learn a model based on the contents of the Web page, while a collaborative system bases its model on finding “similar” users, and assuming the current user will like the pages that those similar users have visited.

Our WebICLite system is basically content-based, as its decisions are based on the combination of the page content and the applied actions. However, our approach differs from the standard content-based systems: As many such systems are restricted to a single Web site, their classifiers can be based on a limited range of words or URLs; this means they can make predictions about the importance of specific URLs (see association rules [1]), or of specific hand-selected words [4, 7, 3]. We did not want to restrict ourselves to a single Web site, but wanted a system that could recommend pages anywhere on the Web; this meant we did not want to restrict the range of words, or pages. For this reason, we built our content-based classifiers based on characteristics (“browsing properties”) of the words. As a result, our system is not restricted to predefined words, nor to Web pages that already have been visited by this user, nor even to Web pages that have been visited by similar users. As we expect that these browsing properties will be common across Web users, we expect them to be useful even for novel users, and so help them find pages even in novel Web environments.

Another way to predict useful Web pages is to take advantage of a Web user model. Chi et al. [6], and Pirolli and Fu [11], construct the Information Need from the context of the followed hyperlinks based on the theory of “Information Scent”, and view it as the information that the user wants. While this appears very similar to our approach, note that our approach is based on the idea that some browsing properties of a word (e.g., context around hyperlinks, or word appearing in titles, etc.) indicate whether a user considers a word to be important in the current context; moreover, our system has learned this “which words are important” predicate from training data, rather than by hand-coding an ad hoc assumption. Letizia [8] and Watson [5] anticipate the user’s information needs using heuristics. While the heuristics may represent the user behavior well, we believe a model learned from user data should produce a more accurate user model.

Table 1 compares the following types of recommender systems in terms of their key properties:
• COB: Co-occurrence Based, e.g., Association Rule [1], Sequential Pattern [2], etc.
• CF: Collaborative Filtering [12]
• CB: Content-Based [4, 7, 3]
• HBM: Heuristic-Based Model [11, 8, 5]
• IC-Models: IC-based Models; see Section 4.

Property                           COB         CF        CB          HBM         IC-Models
Specific Site/Domain               Yes         Yes       Yes         No          No
Model Acquisition                  Learning    Learning  Learning    Hand-coded  Learning
Annotation Required (training)     No          No        Yes         No          Yes
Annotation Required (performance)  No          No        No          No          No
Reference Set                      Population  Group     Individual  None        Individual/Group/Population
Recommending Useful Information    No          N/A       Yes         Yes         Yes
Using Sequential Information       No          No        No          Yes         Yes

Table 1: Techniques for Recommender System

The most common kinds of collaborative filtering are based on weakly informative choice signals from users (purchases at Amazon, views of a web page, etc.). We view these systems as not requiring user annotation. In some systems, such as movie databases, users explicitly rate the value of items. We would treat these explicit ratings as annotations.

4 Learning User-Web Interaction Models

Our work on web browsing behavior models is driven by several observations we have made concerning the nature of human-computer interaction while searching for information on the Web. Consider the example suggested by Figure 1. Imagine the user needs information about marine animals. The user sees a page with links on “Dolphins” and “Whales”. Clicking on the “Dolphins” link takes the user to a page about the NFL Football team. This page is not an IC-page for this user at this time, so the user “backs up”. We might conclude that the word “Football”, which appears on the previous page but not on the current page, might not be an IC-word. The user then tries the “Whales” link, which leads to a page whose title includes “whale”, and which includes whales, oceans, and other marine terms in its content. The user then follows a link on that page to another with similar content, and so forth, until terminating on a page about a local aquarium. We might conclude that words such as “Whale” and “Ocean” are IC-words, but “Football” is not.

The above observations suggest that the user’s current information need can be identified from the pages the user visits and the actions that the user applies to the pages. The content of the page is communicated by the roles that words play on the page. We make the simplifying assumption that we can represent the information need of the session by a set of significant words from the session. We further assume that the significance of words can be judged independently for each word from the roles played by instances of the word on pages throughout the sequence (e.g., appearing as plain text, highlighted in a title, appearing in a followed hyperlink, etc.). To capture the reaction of the Web user, we developed a number of browsing features from the page sequence. The complete list of these “browsing features” can be found in our previous papers [13, 14].

Figure 1: Browsing Behavior to Locate IC-page

To obtain the user model for Web recommendation, we first collected a set of annotated web logs (where the user has indicated which pages are IC-pages), from which our learning algorithm learned to characterize the IC-page associated with the pages in any partial subsession. Again, see [13, 14] for the details.

4.1 LILAC Study

The five-week study, named “LILAC” (Learn from the Internet: Log, Annotation, Content), attempts to evaluate the browsing behavior models that we proposed. In particular, it is designed to qualitatively assess the strengths of different browsing-behavior-based models, relative to a baseline model, and to gather data for further studies. We want to evaluate these models on actual users working on day-to-day tasks, to determine if the models help users, currently visiting arbitrary pages, find useful information from arbitrary sites on the Web.

The process begins when the participants install the WebIC software² on their own computer, and begin browsing their own choice of web pages.³ Whenever the subject discovers an “Information Content” page, s/he is instructed to label this page as an IC-page.

² As the name suggests, this WebIC system is the original system, which has evolved into the current WebICLite.
³ We requested that they use English-language pages, but of course did not enforce this. We also gave them ways to turn off the observation part of WebIC when dealing with private information, e.g., email or banking.

Here, WebIC will ask the subject to compare the current IC-page to an alternative page that it provides. The subject can choose one of the following options:
• Fully answered my question
• Relevant, but does not answer my question fully
• Interesting, but not so relevant
• Remotely related, but still in left field
• Not related at all
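Responses on this five-point scale lend themselves to a per-model summary as relative percentages. The following is a minimal sketch of such a tally, assuming evaluations are stored as (model, response) pairs; the function name, the record layout, and the sample records are illustrative assumptions and are not taken from WebIC or from the study data.

```python
from collections import Counter

# The five response options, copied from the study description.
RESPONSES = [
    "Fully answered my question",
    "Relevant, but does not answer my question fully",
    "Interesting, but not so relevant",
    "Remotely related, but still in left field",
    "Not related at all",
]


def relative_percentages(evaluations):
    """Map each model name to the percentage of each response it received.

    `evaluations` is an iterable of (model_name, response) pairs
    (a hypothetical record layout).
    """
    counts = {}
    for model, response in evaluations:
        if response not in RESPONSES:
            raise ValueError(f"unknown response: {response!r}")
        counts.setdefault(model, Counter())[response] += 1
    result = {}
    for model, counter in counts.items():
        total = sum(counter.values())
        result[model] = {r: 100.0 * counter[r] / total for r in RESPONSES}
    return result


# Made-up records, for illustration only (not data from the study).
sample = [
    ("FHW", RESPONSES[0]),
    ("FHW", RESPONSES[4]),
    ("IC-word", RESPONSES[0]),
    ("IC-word", RESPONSES[0]),
    ("IC-word", RESPONSES[1]),
    ("IC-word", RESPONSES[4]),
]
shares = relative_percentages(sample)
print(shares["FHW"][RESPONSES[0]])      # 50.0
print(shares["IC-word"][RESPONSES[1]])  # 25.0
```

Normalizing within each model, rather than across all responses, keeps models comparable even when they were asked for different numbers of recommendations.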

The only other interaction occurs when the subject cannot find a page that addresses his/her information need, and so clicks on the “Suggest” button. Here, WebIC presents a recommended page for review. As before, the subject is then asked to rate the usefulness of the suggested page with respect to his/her current information needs.

4.2 Overall Results of LILAC

Figure 2: Overall Results of the LILAC Study

Figure 3: WebICLite: An Effective Complete-Web Recommender System

A total of 104 subjects participated in the five-week LILAC study, visiting 93,443 Web pages. Over this period of time, they marked 2,977 pages as IC-pages and asked for recommendations by clicking the “Suggest” button 2,531 times. In addition to the baseline model (FHW), we also trained three kinds of models, IC-word, IC-Relevant, and IC-Query, each based on browsing features. Moreover, we used the cumulative browsing logs, up to week n − 1, to produce the set of new models that the users downloaded for week n.

Followed Hyperlink Word (FHW): Here, a word w is considered important if it appeared as the anchor text of any hyperlink that was followed in the page sequence. Notice there is no training involved in this model. This model is a baseline for this study. It was motivated by its similarity to the “Inferring User Need by Information Scent” (IUNIS) model [6].
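Because FHW involves no training, it can be written down directly. The sketch below is one possible reading of the definition, assuming each visited page is recorded as a dict with a hypothetical `followed_anchor` field that holds the anchor text of the link the user clicked to leave that page; the field name, the tokenization, and the example URLs are assumptions made for illustration.

```python
import re


def fhw_words(page_sequence):
    """Return the set of words that appear in the anchor text of followed links.

    `page_sequence` is a list of dicts; each has a hypothetical
    "followed_anchor" key holding the anchor text of the link the user
    clicked to leave that page (None if the page was left another way).
    """
    important = set()
    for page in page_sequence:
        anchor = page.get("followed_anchor")
        if anchor:
            # Lower-case and split on non-alphanumerics (a simple assumed tokenizer).
            important.update(re.findall(r"[a-z0-9]+", anchor.lower()))
    return important


session = [
    {"url": "http://example.org/animals", "followed_anchor": "Marine Whales"},
    {"url": "http://example.org/whales", "followed_anchor": "Local aquarium"},
    {"url": "http://example.org/aquarium", "followed_anchor": None},
]
print(sorted(fhw_words(session)))  # ['aquarium', 'local', 'marine', 'whales']
```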

IC-word: We compute the “browsing behavior” features of each word w encountered in the current session, and also note whether w appears on the associated IC-page (i.e., if it is an IC-word). We then train a decision tree to predict which words are IC-words given only their browsing properties.

IC-Relevant: We again compute the “browsing behavior” features of each word w encountered in the current session. We also explicitly ask the user to select which of these words are “relevant” to his/her search task. We then train a decision tree to predict which words are “relevant”, based only on their browsing properties.

IC-Query: The IC-word model considers each word in the IC-page to be an IC-word. However, many of these words are general (e.g., “the”, “page”, etc.) and therefore not particularly relevant to the page content. In particular, only a few of these words would help locate this page in a search engine. We label these words as IC-Query words. To obtain these IC-Query words, we run a “Query Identifier (QI)” on each IC-page p, which tries to produce the subset of words that would locate p if given to a search engine (here Google). We then train a decision tree to predict, based on browsing features, the words most likely to locate the IC-page by querying a search engine.

At the conclusion of the study, we collected the evaluation results of our browsing-feature-based models (i.e., IC-models) and the baseline model (FHW). Figure 2 shows the overall results; the bars here show the relative percentage of each of the evaluation responses for each model.

As suggested by this figure, and confirmed by statistical tests (shown at http://www.web-ic.com/lilac/results.html), each of the different IC-models performs better than the baseline model. This result validates our basic assumption that we are able to provide useful recommendations by integrating the user’s browsing behaviors into the prediction.

5 WebICLite: An Effective Complete-Web Recommender System

The WebIC system that we used in the LILAC study has evolved into the WebICLite recommendation system, whose interface appears in Figure 3. Like WebIC, WebICLite is a client-side, Internet Explorer-based multi-tab web browser. It observes the user’s browsing behavior, extracts the browsing properties of the words encountered, and then uses those browsing properties to predict the user’s current information need, from which it suggests (hopefully) useful pages from anywhere on the Web, without any explicit input from the user. WebICLite predicts IC-pages using models of user browsing patterns trained from previously annotated web logs. It computes browsing properties for essentially all of the words that appear in any of the observed pages, and generates an appropriate query to a search engine (e.g., Google) to retrieve an appropriate IC-page. The query is selected from the predicted information need using the trained models.
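The flow described here (observe pages, compute per-word browsing properties, predict significant words, form a query) can be sketched roughly as follows. This is only an illustration under stated assumptions: the three features are simplified stand-ins for the full feature list in [13, 14], `stand_in_model` is a hand-written placeholder for a trained decision tree, the page record layout is hypothetical, and no search engine is actually contacted.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case and split on non-alphanumerics (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def browsing_features(session):
    """Count, for every word in the session, three illustrative features.

    Each page is a dict with hypothetical keys "title", "body" and
    "followed_anchor" (anchor text of the link the user clicked, or "").
    Each feature counts the number of pages on which the word plays that role.
    """
    feats = defaultdict(lambda: {"in_title": 0, "in_body": 0, "in_followed_anchor": 0})
    roles = (("title", "in_title"), ("body", "in_body"),
             ("followed_anchor", "in_followed_anchor"))
    for page in session:
        for key, name in roles:
            for word in set(tokenize(page.get(key) or "")):
                feats[word][name] += 1
    return dict(feats)


def stand_in_model(f):
    """Placeholder for a trained classifier: True if the word looks significant."""
    return f["in_title"] >= 1 and f["in_followed_anchor"] >= 1


def synthesize_query(session, model=stand_in_model, max_words=4):
    """Build a query string from the words the model predicts to be significant."""
    feats = browsing_features(session)
    chosen = [w for w, f in feats.items() if model(f)]
    # Prefer words seen in the body of more pages; break ties alphabetically.
    chosen.sort(key=lambda w: (-feats[w]["in_body"], w))
    return " ".join(chosen[:max_words])


session = [
    {"title": "Sea animals", "body": "dolphins whales football",
     "followed_anchor": "whales"},
    {"title": "Whales of the ocean", "body": "whales ocean marine",
     "followed_anchor": "ocean aquarium"},
    {"title": "Ocean aquarium", "body": "aquarium ocean whales tickets",
     "followed_anchor": ""},
]
print(synthesize_query(session))  # whales ocean aquarium
```

Keeping the predictor as a pluggable function mirrors the idea that different trained models can be swapped in without changing the feature extraction or the query step.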

WebICLite differs from other IE-based browsers and Web recommender systems as follows. We found that some models work better for certain browsing patterns, as determined by general characteristics of the current session. WebICLite therefore chooses the model that works best for the current browsing session, then uses this model to provide recommendations.

Modeling (Training): WebICLite chooses one of several models to suggest useful Web pages, where each of these models has been trained to learn the relationship between the user’s current page-action sequence and his/her information needs using browsing features. Our current system builds a general “browsing behavior model” from the entire population of users. (Another option is to build a personalized model for a specific user, based only on his/her own annotated web logs. Here, we can obtain a recommendation that is specific to the user’s browsing patterns, and thus might be more accurate. We can also train models for a user community based on the data from all the members of that community, e.g., by country or by domain.)

Recommendation: To generate a recommendation, the system first extracts the browsing features of the words in the current session, and then applies the learned patterns to generate a synthesized query, which is subsequently sent to a search engine (e.g., Google).

5.1 Adaptive Recommender System

WebICLite can be adapted to the current browsing session and user. Data from the LILAC study suggest that people tend to surf the Web by following some general browsing session patterns. One example of a search pattern looks like:

http://www.google.ca/
http://www.google.ca/search?q=spaceshipone
http://www.scaled.com/projects/tierone
http://www.google.ca/search?q=spaceshipone
http://www.scaled.com
http://www.google.ca/search?q=spaceshipone

This pattern might suggest that sometimes, even though the user provides explicit input, s/he still cannot find the relevant information. In this case, our models might find, from his/her browsing behavior, the exact query that will return the relevant page.

5.2 Evaluation Interface

In the current implementation, the user can click “Suggest” to ask WebICLite to propose a Web page anytime s/he needs assistance.

In order to collect relevance feedback, WebICLite then asks the user to evaluate the suggested page, as shown in Figure 4. The user is asked to “Tell us what you feel about the suggested page”, to indicate whether the information provided on the page suggested by WebICLite was relevant for his/her search task, just as the user did in LILAC.

Figure 4: The Evaluation Interface of WebICLite

WebICLite also provides options that allow the user to keep browsing from the suggested page. He/she can select an appropriate action: Discard the suggested page, Open it on the current tab, or Open it in a new tab.

5.3 Learning from Evaluation

In order to train our models, the study participants must actively label IC-pages while browsing the Web; of course, this is inconvenient for the user, and unrealistic in a production version of the product. To solve this data collection problem, we propose to passively train a model based on previous evaluation results. Recall that every time a user requests a recommendation, we generate a search query using one of the models to return a page to the user, which s/he is then asked to evaluate. If we assume that the search engine (e.g., Google) remains relatively consistent (i.e., in terms of the quality of pages returned) over time, we can infer the evaluation of the search query from the actual evaluation of the recommended page. Thus we can label each query with one of the evaluation outcomes. We can then attempt to learn a “Fully”-page classifier by considering (as positive examples) only the queries that are evaluated as “Fully”, and the rest as negative.
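The retrospective labelling proposed under Learning from Evaluation (queries whose recommended page was rated “Fully” become positive examples, all others negative) might be set up along these lines. The record layout, the function name, and the sample log are hypothetical and only illustrate the labelling step; training a classifier on the resulting pairs is left out.

```python
FULLY = "Fully answered my question"


def label_queries(evaluation_log):
    """Turn (query_words, rating) records into (query_words, is_positive) pairs.

    A query counts as a positive example only when the page it retrieved
    was rated FULLY; every other rating makes it a negative example.
    """
    return [(tuple(words), rating == FULLY) for words, rating in evaluation_log]


# Hypothetical evaluation records (not data from the study).
log = [
    (["spaceshipone", "tierone"], "Fully answered my question"),
    (["spaceshipone"], "Interesting, but not so relevant"),
    (["whales", "aquarium"], "Relevant, but does not answer my question fully"),
]
labelled = label_queries(log)
positives = [query for query, is_positive in labelled if is_positive]
print(positives)  # [('spaceshipone', 'tierone')]
```

Note that this labelling leans on the stated assumption that the search engine returns pages of roughly consistent quality over time; if that does not hold, a rating of the page says less about the query that produced it.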

In the LILAC study, the IC-Query models were trained directly on the pages marked as IC-pages. In the last week, we changed the experimental protocol to train a model based on all queries that resulted in a “Fully” evaluation in the previous weeks. Figure 5 presents the results of these two models: the one trained on the pages marked as IC-pages, and the retrospective one.

Figure 5: Alternate Training on Evaluation

As suggested here, both approaches produce similar performance. This result is significant, as it will allow us to continuously refine the model without requiring users to label IC-pages while browsing the Internet. Importantly, this alternate training method will make the use of WebICLite more realistic in real-world situations.

6 Conclusion and Future Work

The browsing behavior model in WebICLite is independent of any particular words or domain. Our model uses a unique source of information, i.e., browsing features, to provide recommendations when other paradigms cannot.

We are currently investigating the features that can provide specific page recommendations to the user. In the future, we plan to release WebICLite as freeware to do further tests on other domains and larger user pools.

Acknowledgement

The authors gratefully acknowledge the generous support from Canada’s Natural Sciences and Engineering Research Council, the Alberta Ingenuity Centre for Machine Learning (http://www.aicml.ca), and the Social Sciences and Humanities Research Council of Canada Initiative on the New Economy Research Alliances Program (SSHRC 538-2002-1013).

7 REFERENCES

1. R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. of the 20th Int’l Conference on Very Large Databases (VLDB’94), Santiago, Chile, Sep 1994.
2. R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of the Int’l Conference on Data Engineering (ICDE), Taipei, Taiwan, Mar 1995.
3. C. Anderson and E. Horvitz. Web montage: A dynamic personalized start page. In Proceedings of the 11th World Wide Web Conference (WWW 2002), Hawaii, USA, 2002.
4. D. Billsus and M. Pazzani. A hybrid user model for news story classification. In Proceedings of the Seventh International Conference on User Modeling (UM ’99), Banff, Canada, 1999.
5. J. Budzik and K. Hammond. Watson: Anticipating and contextualizing information needs. In Proceedings of 62nd Annual Meeting of the American Society for Information Science, Medford, NJ, 1999.
6. E. Chi, P. Pirolli, K. Chen, and J. Pitkow. Using information scent to model user information needs and actions on the web. In ACM CHI 2001 Conference on Human Factors in Computing Systems, pages 490–497, Seattle WA, 2001.
7. A. Jennings and H. Higuchi. A user model neural network for a personal news service. User Modeling and User-Adapted Interaction, 3(1):1–25, 1993.
8. H. Lieberman. Letizia: An agent that assists web browsing. In International Joint Conference on Artificial Intelligence, Montreal, Canada, Aug 1995.
9. B. Mobasher, R. Cooley, and J. Srivastava. Automatic personalization through web usage mining. Technical Report TR99-010, Department of Computer Science, DePaul University, 1999.
10. M. Perkowitz and O. Etzioni. Adaptive sites: Automatically learning from user access patterns. Technical Report UW-CSE97-03-01, University of Washington, 1997.
11. P. Pirolli and W. Fu. SNIF-ACT: A model of information foraging on the world wide web. In Ninth International Conference on User Modeling, Johnstown, PA, 2003.
12. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work, pages 175–186, Chapel Hill, North Carolina, 1994. ACM.
13. T. Zhu, R. Greiner, and G. Häubl. An effective complete-web recommender system. In The Twelfth International World Wide Web Conference (WWW2003), Budapest, Hungary, May 2003.
14. T. Zhu, R. Greiner, and G. Häubl. Learning a model of a web user’s interests. In The 9th International Conference on User Modeling (UM2003), Johnstown, USA, June 2003.
15. I. Zukerman and D. Albrecht. Predictive statistical models for user modeling. User Modeling and User-Adapted Interaction, 11(1-2):5–18, 2001.
