I-SPY: Anonymous, Community-Based Personalization by Collaborative Meta-Search*

Barry Smyth(1,2), Jill Freyne(1), Maurice Coyle(1), Peter Briggs(1), and Evelyn Balfe(1)

(1) Smart Media Institute, University College Dublin, Dublin, Ireland
(2) ChangingWorlds Ltd., Trintech Building, South County Business Park, Leopardstown, Dublin, Ireland
{firstname.lastname}@ucd.ie
Abstract. Today’s Web search engines often fail to satisfy the needs of their users, in part because search engines do not respond well to the vague queries of many users. One potentially promising solution involves the introduction of context into the search process as a means of elaborating vague queries. In this paper we describe and evaluate a novel approach to using context in Web search that adapts a generic search engine for the needs of a specialist community of users. This collaborative search method enjoys significant performance benefits and avoids the privacy and security concerns that are commonly associated with related personalization research.
1 Introduction
The information overload problem continues to frustrate Web users as they find it increasingly difficult to locate the right information at the right time, even with today's modern search engines. Many factors are responsible and certainly, the sheer quantity of information, and its growth rate, tax even the most advanced search engines. For example, various estimates indicate that even the largest search engines often cover only a fraction of the available information space [13]. However, this search engine coverage issue is just part of the problem, and indeed can be relieved by using meta-search methods [5, 18]. Perhaps a more pressing issue is the degree to which those pages that are covered can be accurately ranked relative to a given query. In particular, while search engines work very well for properly formulated queries, they come up short when presented with an average Web user's query, which contains only about 2 query terms [12]. The outcome is long lists of apparently relevant results, but with genuinely useful results (for the target user) few and far between. Furthermore, these problems are exacerbated by the new generation of mobile Internet devices (eg. WAP phones and PDAs): their restricted input capabilities and limited screen real-estate mean that mobile users are even less likely to provide well formulated queries or to tolerate long result-lists.

Generally speaking, recent advances have focused on ways to handle vague queries by improving existing page analysis, indexing and ranking methods. However, vagueness remains a significant problem: a query might include terms that identify the primary information target, but exclude terms that usefully describe the search context. For example, a simple query for "jaguar" does not indicate whether the user is interested in cars or cats, and queries for "Michael Jordan" do not distinguish between the basketball star, the Berkeley professor, or the recently appointed chairperson of computer-services giant, EDS. Thus, many researchers have recently focused on ways to exploit context in Web search, either by explicitly establishing search context up-front or by implicitly inferring context as part of the search process (see also [11]).

We describe a novel, deceptively simple, yet powerful technique for exploiting context during search (Section 3). This collaborative search method acts as a post-processing service for existing search engines, re-ranking results based on the learned preferences of a community of users; see also [19]. We describe its implementation in the I-SPY system (http://ispy.ucd.ie) and show how I-SPY achieves this level of personalization in an anonymous fashion, without storing individual user profiles, thus avoiding many of the privacy issues usually associated with personalization techniques. In Section 4 we discuss the results of a preliminary live-user trial in which the collaborative search technique is found to enjoy improved result accuracy.

* The support of the Informatics Research Initiative of Enterprise Ireland is gratefully acknowledged.
2 Background
The importance of context is well understood in Web search, but where does context information come from and how can it be used? There are two basic options: either it can be explicitly provided by the user or search engine, or it can be implicitly inferred from the local search environment; see [11] for a comprehensive review of the use of context in Web search.

2.1 Explicit Context Manipulation
There are two ways to capture explicit context; they are perhaps less interesting from an AI viewpoint, but are outlined here for completeness. The first option asks users to provide context terms as part of their search query. For example, Inquirus 2 [6] supplements a traditional keyword-based query with a context category; users select from a set of categories such as "research paper", "homepage" etc. As a meta-search engine, Inquirus 2 uses the context categories to select which underlying search engines should handle the user's query. The category can also be used for query modification (eg. a query for research papers on "web search" might be modified to include terms such as "references"). In related work, [7] presents
techniques for automatically learning high-quality query modifications that are capable of improving search precision at reasonable levels of recall.

The second option for introducing context into Web search is to use a specialised search engine whose index has been designed to cover a restricted information domain (eg. www.invisibleweb.com, www.MP3.com, www.ireland.com etc.), essentially fixing the context prior to searching. Some specialised search engines automatically maintain their indexes by using information extraction techniques to locate and index relevant content (see [10]). Good examples include CiteSeer [14], for searching scientific literature, and DEADLINER [9], for conference and workshop information. For example, CiteSeer crawls the Web looking for scientific articles in a variety of document formats (HTML, PDF, Postscript etc.) and builds an index that is well suited to literature searching.

2.2 Implicit Context Inference
Establishing the search context by asking users to provide context terms, or by fixing the context through the use of a specialised search engine, is clearly not a complete solution to the context problem. Whatever their willingness to use specialised search engines for specialised searches, most users are fundamentally lazy and do not like having to include additional context terms as part of their query. But what if context could be automatically inferred? This question is being answered by a wide range of research focusing on different techniques for capturing different types of context. In fact, two basic approaches have become popular, depending on whether external or local context sources are exploited.

External Context Sources. Users rarely perform searches in isolation. It is much more likely that the search will be related to some other task that they are currently performing. Perhaps they are reading a Web page, replying to an email, or writing a document when they need to search for some associated piece of information. By taking advantage of a user's activity immediately prior to the search it may be possible to determine a suitable search context. This is the goal of systems such as Watson [4], the Remembrance Agent [17], and Letizia [15]. Watson and the Remembrance Agent provide just-in-time information access by deriving context from everyday application usage. For example, as a Watson user edits a document in Microsoft Word, or browses with Internet Explorer, Watson attempts to identify informative terms in the target document by using a heuristic term-weighting algorithm. If the user then searches with an explicit query, Watson modifies this query by adding these newly derived terms. Similarly, Letizia analyses the content of Web pages that the user is currently browsing, extracting informative keywords using similar term-weighting heuristics, and proactively searches out from the current page for related pages. In this sense, Letizia is more of a browsing assistant than a search assistant, but it exploits context in a similar manner.

[8] describes a method that uses categories from the Open Directory Project (ODP) (www.dmoz.org) as a source of context to guide a topic-sensitive version
of PageRank [3]. Briefly, the URLs below each of the 16 top-level ODP categories are used to generate 16 PageRank vectors that are biased with respect to each category. These biased vectors are used to generate query-specific importance scores for ranking pages at query-time that are more accurate than generic PageRank scores. Similarly, for searches performed in context (eg. when a user performs a search by highlighting words in a Web page), context-sensitive PageRank scores can be computed based on the terms and topics in the region of the highlighted terms.

Local Context Sources. Local context analysis and relevance feedback techniques attempt to use the results of a search as the basis for context assessment. These methods analyse search results in order to extract useful context terms which can then be used to supplement the user's original query. Typically these context terms are those that are highly correlated in the initial search results. For example, the technique proposed by [16] extracts correlated terms from the top-ranking search results in order to focus context on the most relevant results, as opposed to the entire set. This idea of using the local search context can be extended beyond a single search episode. Many users will perform a sequence of searches on a specific topic, and their response to the results can provide valuable context information. Thus, by monitoring and tracking queries, results and user actions, it may be possible to model search context over an extended search session, or even across multiple search sessions. For example, [1] describes the SearchPad system, which extracts context information, in the form of useful queries and promising result-lists, from multiple search sessions. Similarly, [2] describes the CASPER search engine for job advertisements, which maintains client-side user profiles that include job cases that users have liked and disliked in previous searches.
These profiles are used to classify and re-rank the results of future searches based on features of jobs that users have preferred in the past.
3 Collaborative Search
Collaborative search is motivated by two key ideas. First, specialised search engines attract communities of users with similar information needs and so serve as a useful way to limit variations in search context. For example, a search field on a motoring Web site is likely to attract queries with a motoring theme, and queries like "jaguar" are more likely to relate to fast cars than wild cats. Second, by monitoring user selections for a query it is possible to build a model of query-page relevance based on the probability that a given page pj will be selected by a user when returned as a result for query qi. The collaborative search approach combines both of these ideas in the form of a meta-search engine that analyses the patterns of queries, results and user selections from a given search interface. This approach has been fully implemented in the I-SPY search engine and will be detailed and evaluated in the following sections.
3.1 The I-SPY System Architecture
The I-SPY architecture is presented in Fig. 1. Each user query, q, is submitted to the base-level search engines (S1 ... Sn) after adapting q for each Si using the appropriate adapter. Similarly, the result set, Ri, returned by a particular Si is adapted for use by I-SPY to produce Ri′, which can then be combined and re-ranked by I-SPY, just like a traditional meta-search engine. However, I-SPY's key innovation involves the capture of search histories and their use in ranking metrics that reflect user behaviours.
Fig. 1. The I-SPY system architecture.
The unique feature of I-SPY is its ability to personalize its search results for a particular community of users without relying on content-analysis techniques (eg. [2, 12]). To achieve this, I-SPY borrows ideas from collaborative filtering research to profile the search experiences of users. Collaborative filtering methods represent user interests in terms of ratings (provided by the user, either implicitly or explicitly) over a set of items (eg. books, CDs, movies, etc.). Recommendations are made to a target user by selecting items preferred by other users with correlated rating histories. Thus collaborative filtering operates by exploiting a graded mapping between users and items. I-SPY exploits a similar relationship between queries and result pages (Web pages, images, audio files, video files etc.), represented as a hit matrix (see Fig. 1). It maintains a separate hit matrix for each search community, and each element of the hit matrix, Hij, contains a value vij (that is, Hij = vij) to indicate that vij users have found page pj relevant for query qi. In other words, each time a user selects a page pj for a query term qi, I-SPY updates the hit matrix accordingly. For example, consider the situation in which an I-SPY search box is placed on the Web pages of a motoring Web site. This search box will be associated with a specific search service within I-SPY and as such is allocated its own hit matrix. The queries submitted through this search box, and the results that are subsequently clicked, will be recorded in this particular hit matrix. In general, a single I-SPY deployment may service a number of separate search communities by maintaining separate hit matrices, performance permitting.
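The hit-matrix bookkeeping described above can be sketched in a few lines. This is a minimal illustration only; the class name, method name and community labels are our own assumptions, not taken from the I-SPY implementation:

```python
from collections import defaultdict

class HitMatrix:
    """One hit matrix per search community: hits[q][p] counts how many
    community members selected page p after submitting query q."""

    def __init__(self):
        self.hits = defaultdict(lambda: defaultdict(int))

    def record_selection(self, query, page):
        # Called each time a result page is clicked for a query.
        self.hits[query][page] += 1

# A single deployment can service several communities by keeping a
# separate hit matrix per search box (names here are illustrative).
communities = {"motoring": HitMatrix(), "programming": HitMatrix()}
communities["motoring"].record_selection("jaguar", "www.jaguar.com")
communities["motoring"].record_selection("jaguar", "www.jaguar.com")
```

Keeping one matrix per search box is what routes "jaguar" clicks on a motoring site into the motoring community's history rather than a shared global one.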
3.2 Collaborative Ranking
I-SPY's key innovation is its ability to exploit the hit matrix as a direct source of relevancy information; after all, its entries reflect concrete relevancy judgments by users with respect to query-page mappings. Most search engines, on the other hand, rely on indirect relevancy judgments based on overlaps between query and page terms, but I-SPY has access to the fact that, historically, vij users have selected page pj when it is retrieved for query qi. I-SPY uses this information in many ways, but in particular the relevance of a page pj to query qi is estimated by the probability that pj will be selected for query qi (see Equation 1).

    Relevance(pj, qi) = Hij / Σ∀j Hij                                  (1)
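Equation 1, together with the re-ranking it drives, can be sketched directly. The helper names are our own, and the toy hit counts below are invented for illustration (loosely inspired by the Occam example that follows):

```python
def relevance(hits, query, page):
    # Equation 1: the fraction of all selections recorded for `query`
    # that went to `page`; zero when the query has no history yet.
    row = hits.get(query, {})
    total = sum(row.values())
    return row.get(page, 0) / total if total else 0.0

def rerank(hits, query, results):
    # Order a meta-search result list by behavioural relevance.
    return sorted(results, key=lambda p: relevance(hits, query, p),
                  reverse=True)

# Invented selection history for the query "occam":
hits = {"occam": {"the-occam-archive": 34, "occams-razor": 11,
                  "occam-faq": 22}}
rerank(hits, "occam", ["occams-razor", "occam-faq", "the-occam-archive"])
# → ["the-occam-archive", "occam-faq", "occams-razor"]
```

Note that a query with no history falls back to a relevance of zero for every page, which is why an untrained I-SPY defaults to the standard meta-search ordering.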
By way of an example, Figs. 2 and 3 show two screen-shots of the I-SPY system. Each presents part of the results page for a query by a computer science student for the Occam programming language; in this case a single query term, Occam, is used as shown. Fig. 2 shows the standard result-list returned before I-SPY has built up its hit table data, so the results are ordered using a standard meta-search ranking function, giving preference to results that are highly ranked by I-SPY's underlying search engines; in this case, Yahoo! and Splat!, although only Yahoo! results happen to be shown in the screenshots. Clearly the results presented are not especially relevant. In fact, none of the first 4 results shown refer directly to the Occam programming language. For example, the first result is for a page entitled "What is Occam's Razor?" and relates to William of Occam's famous principle of choice; this result obtains a ranking score of 25% as indicated. In contrast, Fig. 3 shows the results for the same query, but after I-SPY has been trained by a community of computer science students; see Section 4 for further details. The results are now ranked by I-SPY's relevance metric, as discussed above, rather than by the standard meta-search ranking function. This time the results are very different: all of the top 4 results are clearly relevant to the Occam programming language rather than to other interpretations of the query. For example, the top ranking result, "The Occam Archive", has a relevance value of 34%. In other words, this page has been selected 34% of the times that Occam has been used as a query. Note that this page is assigned a score of only 11% by the standard meta-search ranking function and previously would have been ranked in 6th position.

3.3 Community Filtering
A key point to understand about this relevancy metric is that it is tuned to the preferences of a particular set of users, the community of I-SPY users, and the queries and pages that they tend to prefer. Deploy I-SPY on a motoring web site and its hit matrix will be populated with query terms and selected pages that are relevant to car fans. Over time the value-space of the relevancy metric will adapt to fit the appropriate query-page mappings that serve this target community. For example, queries for "jaguar" will tend to result in the prioritisation of car sites because previously, when users have submitted this query term, they will have selected Jaguar car sites, ignoring the wild cat pages. The wild cat pages may still be returned but will be relegated to the bottom of the result-list.

Fig. 2. I-SPY search results before training.

Fig. 3. I-SPY search results after training.

In fact I-SPY can deploy multiple I-SPY search agents, each with its own separate hit table. Thus the central I-SPY engine can be used to service many different search services across a range of portals, for example, each one adapted for the needs of a particular user group through its associated hit matrix. Alternatively, different hit matrices could be associated with different regions of the same site to bias search with respect to different topics. For instance, above we reviewed work by [8] in which PageRank is biased with respect to different topic groups in an Internet directory (ODP). A similar strategy can be supported by I-SPY. Placing a search box on the Programming Languages directory page will naturally tend to capture queries from this domain. The behaviour of the users providing these queries will gradually adjust I-SPY's relevancy metric and ranking function in favour of Programming Languages pages. I-SPY's administration module (see Figure 1) manages this functionality and also supports the potential for sharing or combining hit matrices when appropriate, as a way of kick-starting the tuning of a new search service, or of broadening the scope of the search bias; for example, a music site may benefit from an I-SPY search service that uses an initial hit matrix that is the combination of hit matrices previously generated from related services such as an entertainment search service and a live concert search service.
4 Evaluation
In this section we describe the results of a preliminary evaluation of I-SPY, focusing on its ability to anticipate the needs of a community of users.

4.1 Methodology
For our evaluation it was appropriate to focus on a specific search domain. We chose that of programming languages and selected the names of 60 programming languages listed by Yahoo!. I-SPY was configured to query two underlying search engines, Yahoo! (which uses Google) and Splat!, and each of the 60 queries was submitted to I-SPY to obtain a set of result pages based on a standard meta-search ranking function. This produced 60 result-lists containing between 10 and 30 results, with a median of 27 results. A group of 20 computer science students took part in the evaluation. These students were asked to analyse each result-list and select those results that appeared to be relevant, based on the summary result description returned by I-SPY (which is the summary description returned by Yahoo! and/or Splat!). Next, a leave-one-out relevancy evaluation was performed. Each user in turn was designated the test user, with the remaining 19 serving as training users. The relevancy results produced by the training users are used to populate I-SPY's hit
matrix and the results for each query were re-ranked using I-SPY's behavioural ranking metric. Next we counted the number of these results listed as relevant by the test user, for different result-list size limits (k); here k = 5..30, and result-lists with fewer than k items did not contribute to the results for that list size. Finally, we also made the equivalent relevancy measurements by analysing the results produced by the untrained version of I-SPY (Standard), which uses the standard meta-search ranking metric and which serves as a benchmark. So for each query, and each test user, we have the number of relevant results returned by I-SPY and by Standard for various result-list sizes.

4.2 Results
Figure 4(a) presents these results for I-SPY and the benchmark search engine, as a graph of the mean number of relevant pages across all 20 users and 60 queries, and for each result-list size. Clearly the results indicate that I-SPY’s collaborative search technique has a positive impact in terms of the number of relevant items returned. For example, when the results are restricted to the top 5 we find that the benchmark only returns 3.16 relevant pages on average, compared with 4.8 pages for I-SPY’s behavioural ranking metric; a relative increase for I-SPY of over 50%. Put another way: I-SPY can retrieve about 8 relevant results with result-list sizes of 10 items. To retrieve the same number of relevant items takes our benchmark system 20 items.
Fig. 4. Experimental results.
The I-SPY benefit is preserved across all result-list sizes although the relative benefit naturally decreases as the result-list increases to its maximum size of 30 items; at this size both engines retrieve the same number of relevant items since
both have access to the same list of results and differ only in the way that they order them. Figs. 4(b) and 4(c) present the corresponding precision and recall results. Briefly, the precision of a list of k results is the percentage of these k that are relevant, while the recall of a list of k results is the percentage of the relevant results that are among the k. Because of our experimental design these are really bounded versions of the standard precision and recall metrics, and the measures for each engine will converge once a complete result-list is returned. The results indicate a significant and consistent benefit for I-SPY over the standard meta-search benchmark. For example, for result-lists of 5 items, I-SPY achieves a precision of just over 96% compared to the standard meta-search precision of only 63%. Similarly, at the same list size, we find an average recall for I-SPY of 64% compared to just under 40% for the standard method. Indeed we see that I-SPY achieves 100% recall at just over 20 items; it takes the benchmark 30 items to achieve the same level of recall.

In summary, the above results indicate that I-SPY enjoys a clear advantage over the benchmark meta-search engine. Moreover, the fact that larger relative benefits are available at smaller result-list sizes is particularly interesting and useful. Users rarely sift through large result-lists even when they are returned, so the more relevant items that can be presented early on, the better; this is why it is important to evaluate precision and recall at different list sizes. In addition, these results indicate that I-SPY's technology is likely to be especially valuable in situations where large result-lists must be truncated for external reasons. This is commonly the case with mobile applications because of the limited screen-space of current mobile devices; many are capable of displaying fewer than 5 items per screen.
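The bounded precision and recall measures used in this evaluation can be written down directly; a minimal sketch, with our own helper names and invented toy data:

```python
def precision_at_k(results, relevant, k):
    # Fraction of the top-k results that the user judged relevant.
    top = results[:k]
    return sum(p in relevant for p in top) / len(top)

def recall_at_k(results, relevant, k):
    # Fraction of all relevant results that appear in the top k.
    return sum(p in relevant for p in results[:k]) / len(relevant)

results = ["p1", "p2", "p3", "p4", "p5", "p6"]   # ranked result-list
relevant = {"p1", "p2", "p4"}                    # test user's judgments
precision_at_k(results, relevant, 5)  # 3/5 = 0.6
recall_at_k(results, relevant, 5)     # 3/3 = 1.0
```

Both measures are "bounded" in the sense of the text: once k reaches the full list length, every engine that returns the same result set converges to the same values.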
5 Discussion
Finally, before concluding we discuss some further applications of collaborative search, including its ability to guide query elaboration and page similarity predictions, as well as its use in multimedia retrieval domains. We also highlight some important issues that must be addressed if applicability is to be guaranteed across all domains.

5.1 Query Elaboration & Page Similarities
I-SPY's hit matrix can be used to estimate query term similarities and page similarities as well as query-page relevance. For example, the correlation between two hit matrix rows, corresponding to queries qx and qy, is a measure of the relationship between the relevancy of the different pages that have been selected for these queries. A strong positive correlation means that the same pages tend to be judged similarly for both queries. This indicates a potential relationship between qx and qy. Moreover, if qx and qy have a strong correlation, and a proper subset of qx's relevant pages are found to be relevant for qy, then qx might be a
more general query than qy. This type of information can be a valuable source of knowledge for query modification techniques that add related query terms (more specialised or generalised) to help cope with vague or under-specified queries. Similarly, column correlations can be used to compute result similarities, as a way of automatically suggesting related pages that may not have a direct similarity (in terms of term-overlap) to the specified query terms, and thus pages that would not be selected by the base-level search engines.

5.2 Multimedia Content
I-SPY's collaborative ranking, query elaboration and result similarity strategies can pay significant dividends in domains where traditional indexes are difficult or expensive to compile. For example, indexing non-textual content (music, graphics etc.) is expensive because of the lack of automatic indexing techniques. Consider an MP3 search engine and a preliminary index whereby individual MP3 files are manually indexed according to basic genre categories, with additional information (such as artist, album name, and title words) extracted by heuristic methods. Furthermore, assume that the Celine Dion song, "My Heart Will Go On" (the theme from the hit movie, Titanic) does not contain "Titanic" as a default index term, so searches on this term will fail to retrieve this result. Although Google is very good at retrieving songs by their title (as one would expect, since song titles are usually relatively unique), it fails to perform well when less specific queries are submitted; for instance, the above song is not returned in the top 100 results for the query "Titanic". Occasionally a user will add "Celine Dion" to their failed "Titanic" query and get the above result. If they select it then I-SPY will log this in its hit table for the three query terms and, as a result, I-SPY now has the potential to suggest the above result in the future for the lone "Titanic" query. By taking advantage of these occasional query correlations I-SPY can effectively enhance the coverage, precision and recall of the underlying search engine irrespective of its content type or index completeness.

5.3 The Cold-Start Problem and Relevancy Biases
There are a number of important problems with collaborative search that need to be addressed in order to guarantee its full applicability across a broad range of search tasks. Perhaps the most important is the so-called cold-start problem: newly indexed Web pages will find it difficult to attract user attention since they will, by default, have a low relevancy score under I-SPY's metric and so appear far down in result-lists, limiting their ability to attract the hits they may deserve for a given query. Essentially there is an in-built bias towards older pages. There are a number of ways that this problem might be dealt with. One is to look at ways to normalise the relevancy of pages with respect to their age. For example, we might measure the age of a page by the time (or number of
queries) since its first hit and amplify the relevancy of young pages relative to older pages. Indeed there is another side to this problem. Just as new pages find it difficult to attract hits, so too older pages may find it too easy to attract hits. In the worst case scenario this could even bias I-SPY’s result-lists towards pages that are likely to be out of date and thus less relevant to current users than they were to past users. Once again, biasing relevance towards new pages should help to cope with this problem. Of course in general there are many factors that can and probably should be taken into account when ranking search results. We have focused primarily on I-SPY’s relevancy factor, but other factors such as the age of a page and its meta-search ranking are also appropriate. As part of our future work we will explore how best to combine these factors to produce optimal result rankings. This may or may not involve a direct combination of the rankings. For example, one option is to present search results not as a single list of results, as is normally the case, but perhaps as two or more lists of results in order to emphasise the different qualities of the returned pages. For instance, in general only a subset of search results are likely to have non-zero I-SPY relevance scores (that is, a subset will have been selected in the past for the current query) and so it is practical to present the I-SPY results that do have relevancy scores as special recommendations (ranked by their relevancy) while presenting the remaining results separately, ranked by their meta-score. In turn a third list of new pages, ranked by meta-search score or relevancy, could also be separately presented.
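One way the age-normalisation idea above might be realised is sketched below. This is purely illustrative: the exponential decay schedule, the half-life value and the function name are our assumptions, not part of I-SPY.

```python
import math

def age_weighted_relevance(base_relevance, queries_since_first_hit,
                           half_life=500.0):
    # Amplify the relevance of young pages: a page first selected only
    # recently gets a boost that decays away as it accumulates history.
    # base_relevance is the Equation 1 score; age is measured in the
    # number of queries since the page's first hit, as suggested above.
    boost = 1.0 + math.exp(-queries_since_first_hit / half_life)
    return base_relevance * boost

age_weighted_relevance(0.10, 0)      # a brand-new page: doubled to 0.2
age_weighted_relevance(0.10, 5000)   # an old page: essentially unchanged
```

The same boost simultaneously addresses the opposite bias noted above, since very old pages receive no amplification and so cannot entrench themselves purely on accumulated hits.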
6 Conclusions
Improving the accuracy of Web search engines by introducing context into the search process is an important and challenging research problem. In this paper we have described the I-SPY system which attempts to discover patterns in the activity of a community of searchers in order to determine general search context and prioritise search results accordingly. I-SPY can be used in collaboration with any underlying search engines, and builds a relevancy model based on the selection histories of its user community. This model can be used to re-rank search results based on the past selections of I-SPY users and we have shown that this approach can lead to significant improvements in overall search accuracy. I-SPY’s collaborative search method makes no strong assumptions about the form of the underlying search engines and is generally applicable across a range of content types. Its ranking metric is computationally efficient (O(k) in the number of search results) and requires no additional parsing of result pages. Finally, its ability to personalize search results for the needs of a community is possible without the need to store individualised search histories; no individual user profiles are stored and no user identification is necessary. This has significant security and privacy advantages compared to many more traditional approaches to personalization.
References

1. K. Bharat. SearchPad: Explicit Capture of Search Context to Support Web Search. In Proceedings of the Ninth International World-Wide Web Conference, 2000.
2. K. Bradley, R. Rafter, and B. Smyth. Case-based User Profiling for Content Personalization. In P. Brusilovsky, O. Stock, and C. Strapparava, editors, Proceedings of the International Conference on Adaptive Hypermedia and Adaptive Web-based Systems, pages 62–72. Springer-Verlag, 2000.
3. S. Brin and L. Page. The Anatomy of a Large-Scale Web Search Engine. In Proceedings of the Seventh International World-Wide Web Conference, 1998.
4. J. Budzik and K. Hammond. User Interactions with Everyday Applications as Context for Just-In-Time Information Access. In Proceedings of the International Conference on Intelligent User Interfaces. ACM Press, 2000.
5. D. Dreilinger and A. Howe. Experiences with Selecting Search Engines Using Meta Search. ACM Transactions on Information Systems, 15(3):195–222, 1997.
6. E. Glover, S. Lawrence, M. D. Gordon, W. P. Birmingham, and C. Lee Giles. Web Search - Your Way. Communications of the ACM, 2000.
7. E. J. Glover, G. W. Flake, S. Lawrence, W. P. Birmingham, A. Kruger, C. Lee Giles, and D. M. Pennock. Improving Category Specific Web Search by Learning Query Modifications. In Proceedings of the Symposium on Applications and the Internet (SAINT), pages 23–31. IEEE Computer Society, 2001.
8. T. H. Haveliwala. Topic-Sensitive PageRank. In Proceedings of the World-Wide Web Conference. ACM Press, 2002.
9. A. Kruger, C. Lee Giles, F. Coetzee, E. Glover, G. Flake, S. Lawrence, and C. Omlin. DEADLINER: Building a New Niche Search Engine. In Proceedings of the Ninth International Conference on Information and Knowledge Management, 2000.
10. N. Kushmerick. Wrapper Induction for Information Extraction. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 729–735. Morgan Kaufmann, 1997.
11. S. Lawrence. Context in Web Search. IEEE Data Engineering Bulletin, 23(3):25–32, 2000.
12. S. Lawrence and C. Lee Giles. Context and Page Analysis for Improved Web Search. IEEE Internet Computing, July-August:38–46, 1998.
13. S. Lawrence and C. Lee Giles. Accessibility of Information on the Web. Nature, 400(6740):107–109, 1999.
14. S. Lawrence and C. Lee Giles. Searching the Web: General and Scientific Information Access. IEEE Communications, 37(1):116–122, 1999.
15. H. Lieberman. Letizia: An Agent That Assists Web Browsing. In C. Mellish, editor, Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI'95, pages 924–929. Morgan Kaufmann, 1995.
16. M. Mitra, A. Singhal, and C. Buckley. Improving Automatic Query Expansion. In Proceedings of ACM SIGIR. ACM Press, 1998.
17. B. J. Rhodes and T. Starner. Remembrance Agent: A Continuously Running Automated Information Retrieval System. In Proceedings of the First International Conference on the Practical Applications of Intelligent Agents and Multi-Agent Technologies, pages 487–495, 1996.
18. E. Selberg and O. Etzioni. The Meta-Crawler Architecture for Resource Aggregation on the Web. IEEE Expert, Jan-Feb:11–14, 1997.
19. B. Smyth, E. Balfe, P. Briggs, M. Coyle, and J. Freyne. Collaborative Web Search. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI-03, Acapulco, Mexico. Morgan Kaufmann, 2003.