Recommending Better Queries Based on Click-Through Data

Georges Dupret and Marcelo Mendoza
Departamento de Ciencias de la Computación, Universidad de Chile

Abstract. We present a method to help a user redefine a query based on past users' experience, namely the click-through data recorded by a search engine. Unlike most previous work, the method we propose attempts to recommend better queries rather than merely related queries. It is effective at identifying query specializations or sub-topics because it takes into account the co-occurrence of documents in individual query sessions. It is also particularly simple to implement.
1
Introduction
There is hardly a paper in the literature on search engines that does not open by reminding readers of the spectacular growth of the Internet and the need for new methods to help users navigate this immense source of heterogeneous information. Besides the practical difficulties inherent in manipulating a large number of pages, the difficulty of ranking documents based on generally very short queries, and the fact that information needs are often imprecisely formulated, users are also constrained by the query interface and often have difficulty articulating a query that leads to satisfactory results. It also happens that they do not know exactly what they are searching for and select query terms by trial and error. Better formulations of vague queries are likely to appear in the query logs, because experienced users tend to avoid vague formulations and issue queries specific enough to discard irrelevant documents. Different studies support this claim: [11] reports that an independent survey of 40,000 web users found that after a failed search, 76% of users try rephrasing their query on the same search engine. Moreover, queries can be identical or articulated with different terms yet represent the same information need, as reported in [9, 14]. We propose in this work a simple and novel query recommendation algorithm. While most methods recommend queries related to the original query, this method aims at improving it.

1.1

Related Work
The scientific literature follows essentially three research lines in order to recommend related queries: query clustering algorithms, association rules and query expansion.
There is recent related work on query clustering using query log files. Baeza-Yates et al. [2] build a term-weight vector for each query. Each term is weighted according to the number of document occurrences and the number of selections of the documents in which the term appears. Unfortunately, the method is limited to queries that appear in the log and is biased by the search engine. Wen et al. [15] propose to cluster similar queries in order to recommend URLs in answer to frequently asked queries. They use four notions of query distance: (1) based on query keywords or phrases; (2) based on string matching of keywords; (3) based on common clicked URLs; and (4) based on the distance of the clicked documents in some predefined hierarchy. From a study of the log of a popular search engine, Jansen et al. [9] conclude that most queries are short (around 2 terms per query) and imprecise, and that the average number of clicked pages per answer is very low (around 3 clicks per query). Notions (1)-(3) are therefore difficult to apply in practice, because the distance matrices between queries that they generate are very sparse. [16] and [7] nevertheless report satisfactory results. Notion (4) requires documents to be classified in a concept taxonomy. Zaïane and Strilets [18] present a query recommendation method based on seven different notions of query similarity. Three of them are mild variations of notions (1) and (3). The remaining ones use the content and title of the URLs in the result of a query. Their approach is designed for a meta-search engine, and thus none of their similarity measures consider user click-through data. Fonseca et al. [8] present a method to discover related queries based on association rules. Here queries play the role of items in traditional association rules. The query log is viewed as a set of transactions, where each transaction represents a session in which a single user submits a sequence of related queries within a time frame.
The method gives good results; however, two problems arise. First, it is difficult to determine sessions of successive queries that belong to the same search process. Second, the most interesting related queries, those submitted by different users, cannot be discovered: the support of a rule increases only if its queries appear in the same query session, so they must have been submitted by the same user. Finally, there are several relevant works based on query augmentation or expansion [1, 5, 6] and query reformulation [12]. The basic idea is to reformulate the query so that it gets closer to the term-weight vector of the documents the user is looking for. In [4], large query logs are used to construct a surrogate for each document consisting of the queries that were a close match to that document. It is found that the queries that match a document are a fair description of its content. The authors also investigate whether query associations can play a role in query expansion. In the same spirit, [13] proposes a somewhat similar approach of summarizing documents with queries: query association is based on the notion that a query that is highly similar to a document is a good descriptor of that document. The concept of related search has recently taken on new importance with the development of search engines that place it at the center of the ranking algorithm [16] to improve precision (e.g. AskJeeves, http://www.askjeeves.com/).
Unlike traditional search engines, these new systems try to understand the user's question in order to suggest similar questions that other people have asked and for which the system has the correct answers. In most cases these answers have been prepared or checked by human editors. The assumption behind such a system is that many people are interested in the same questions, the Frequently Asked Questions (FAQs). Based on these observations, Wen et al. [16] suggest a method for clustering queries in order to identify FAQs and their sets of associated relevant documents. According to [17], Teoma (www.teoma.com) also uses click-through log-based ranking. Given a particular query, it exploits the previous session logs of the same query to return the pages that most users visit. To obtain statistically significant results, this is only applied to a small set of popular queries. Because the engine is a commercial one, further details on the method are difficult to obtain. The method we propose is closely related to the work of Wen et al. [10]. These authors observe that common selections in two or more queries identify common sub-topics, and they derive a measure of similarity between queries based on user feedback. They also propose a similarity measure proportional to the number of shared clicked (or selected) documents taken individually, and observe a "surprising ability to cluster semantically related queries that contain different words". It is worth quoting them further, as the argument is the basis of this work: "In addition, this measure is also very useful in distinguishing between queries that happen to be worded similarly but stem from different information needs. For example, if one user asked for "law" and clicked on the articles about legal problems, and another user asked "law" and clicked the articles about the order of nature, the two cases can be easily distinguished through user clicks."

1.2

Contribution
Most work on the related topics of query recommendation and augmentation is based on some form of clustering, where the features are the terms appearing in the queries and/or the documents. Recommendations or query augmentation terms are then extracted from the clusters. Because they are related to the original query, they might help the user refine it. By contrast, the simple method we propose here aims at discovering alternate queries that improve the search engine ranking of documents: we order the documents selected during past sessions of a query according to the ranking of other past queries. If the resulting ranking is better than the original one, we recommend the associated query. This is described in Sect. 2. We apply this method in Sect. 3 to the query logs of TodoCL, an important Chilean search engine. Finally, we conclude and describe future work in Sect. 4.
2
Query Recommendation Algorithm
Many identical queries can represent different user information needs. Depending on the topic the user has in mind, he will tend to select a particular sub-group of documents. Consequently, the set of selections in a session reflects a sub-topic of the original query. Consider for example the users submitting the query car. They might be interested in renting, buying or repairing a car, or in formula one. Whatever their variety of interests, users initially interested in car rental will make similar selections. We might attempt to assess the correlations between the documents selected during sessions of the same query, create clusters and identify queries relevant to each cluster, but we prefer a simpler, more direct method where clustering is done at the level of query sessions.

We first need to introduce some definitions. We understand by keyword any unbroken string that describes the query or document content. A query is a set of one or more keywords that represents an information need formulated to the search engine. The same query may be submitted several times; each submission induces a different query session. Following Wen et al. [15], a query session consists of one query and the URLs the user clicked on, while a click is a Web page selection belonging to a query session. We also define a notion of consistency between a query and a document:

Definition 1 (Consistency). A document is consistent with a query if it has been selected a significant number of times during the sessions of the query.

Consistency ensures that a query and a document bear a natural relation in the opinion of users, and discards documents that have been selected by mistake once or a few times. Similarly, we say that a query and a set of documents are consistent if each document in the set is consistent with the query. We now describe the algorithm. Consider the set D(s_q) of documents selected during a session s_q of a query q.
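Definition 1 can be computed directly from the click-through log by counting, for each (query, document) pair, the sessions in which the document was clicked. The sketch below is only illustrative, not the authors' implementation: the log layout and the threshold `min_support`, standing in for "a significant number of times", are our assumptions.

```python
from collections import Counter

def consistent_documents(sessions, min_support=3):
    """For each query, return the documents selected a significant
    number of times across that query's sessions (Def. 1).

    `sessions` is a list of (query, clicked_urls) pairs, one pair per
    query session.  `min_support` is an illustrative threshold.
    """
    counts = Counter()
    for query, clicks in sessions:
        for url in set(clicks):          # count each URL once per session
            counts[(query, url)] += 1
    consistent = {}
    for (query, url), n in counts.items():
        if n >= min_support:
            consistent.setdefault(query, set()).add(url)
    return consistent

# A document clicked in only one session is discarded as noise.
log = [
    ("car rental", ["a.cl", "b.cl"]),
    ("car rental", ["a.cl"]),
    ("car rental", ["a.cl", "c.cl"]),
]
print(consistent_documents(log, min_support=3))  # {'car rental': {'a.cl'}}
```

A session-level deduplication (`set(clicks)`) ensures that repeated clicks inside one session do not inflate the count.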
If we assume that this set represents the information need of the user who generated the session, we may ask whether alternate queries exist in the logs that (1) are consistent with D(s_q) and (2) rank the documents of D(s_q) better. If such alternate queries exist, they return the same information D(s_q) as the original query but with a better ranking, and are therefore potential query recommendations. We then repeat the procedure for each session of the original query, select the alternative queries that appear in a significant number of sessions, and propose them as recommendations to the user interested in q. To apply this algorithm, we need a criterion to compare the ranking of a set of documents for two different queries. We first define the rank of a document in a query:

Definition 2 (Rank of a Document). The rank of document u in query q, denoted r(u, q), is the position of document u in the listing returned by the search engine.

We say that a document has a high or large rank if the search engine estimates that its relevance to the query is comparatively low. We extend this definition to sets of documents:

Definition 3 (Rank of a Set of Documents). The rank of a set U of documents in a query q is defined as r(U, q) = max_{u in U} r(u, q).
Fig. 1. Comparison of the ranking of two queries. A session of the original query q1 contains selections of documents U3 and U6, appearing at positions 3 and 6 respectively. The rank of this set of documents is 6 by virtue of Def. 3. By contrast, query q2 achieves rank 4 for the same set of documents and therefore qualifies as a candidate recommendation.
In other words, the document with the worst ranking determines the rank of the set. If a set of documents achieves a lower rank in a query qa than in a query qb, then we say that qa ranks the documents better than qb.

Definition 4 (Ranking Comparison). A query qa ranks a set U of documents better than a query qb if r(U, qa) < r(U, qb).

This criterion is illustrated in Fig. 1 for a session containing two documents. One might be reluctant to reject an alternative query that ranks most documents of a session well and only one or a few of them poorly. The argument against recommending such a query is the following: if the documents with a large rank in the alternate query appear in a significant number of sessions, they are important to the users. Presenting the alternate query as a recommendation would mislead users into following recommendations that conceal part of their information need. We can formalize the method as follows:

Definition 5 (Recommendation). A query qa is a recommendation for a query qb if a significant number of sessions of qb are consistent with qa and are ranked better by qa than by qb.

The recommendation algorithm induces a directed graph between queries. The original query is the root of a tree with the recommendations as leaves. Each branch of the tree represents a different specialization or sub-topic of the original query. The depth between the root and its leaves is always one, because we require the recommendations to improve the ranking of the associated document set. We could follow another policy and iteratively add the recommendations of the recommendations to the graph. We would obtain a deeper graph but, as experiments show, it is of little value because the sub-topics of a recommendation are generally not sub-topics of the original query. Finally, we observe that nothing prevents two queries from recommending each other:
Definition 6 (Quasi-synonyms). Two queries are quasi-synonyms if they recommend each other.

We will see in the numerical experiments of the next section that this definition leads to queries that are indeed semantically very close.
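The procedure implied by Definitions 2-6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `rank(u, q)` stands for a hypothetical lookup of the search engine's result position (None if the document is absent), `consistent` is the per-query set of consistent documents of Def. 1, and the threshold `min_sessions` is an assumed stand-in for "a significant number of sessions".

```python
from collections import Counter

def rank_of_set(urls, query, rank):
    """Def. 3: the rank of a set of documents is the rank of its
    worst-ranked member; None if any document is absent from the
    answer to the query."""
    ranks = [rank(u, query) for u in urls]
    return None if None in ranks else max(ranks)

def session_recommendations(session_urls, orig_query, candidates, rank, consistent):
    """Queries consistent with the session's selections (Def. 1) that
    rank them better than the original query does (Def. 4)."""
    base = rank_of_set(session_urls, orig_query, rank)
    result = []
    for q in candidates:
        # The candidate must be consistent with every selected document.
        if q == orig_query or not session_urls <= consistent.get(q, set()):
            continue
        r = rank_of_set(session_urls, q, rank)
        if base is not None and r is not None and r < base:
            result.append(q)
    return result

def recommendations(query, sessions, candidates, rank, consistent, min_sessions=2):
    """Def. 5: recommend queries endorsed by at least `min_sessions`
    sessions of the original query."""
    votes = Counter()
    for urls in sessions[query]:
        votes.update(session_recommendations(set(urls), query, candidates,
                                             rank, consistent))
    return [q for q, n in votes.items() if n >= min_sessions]
```

Quasi-synonyms (Def. 6) then reduce to a mutual-membership check: qa and qb are quasi-synonyms when `qa in recommendations(qb, ...)` and `qb in recommendations(qa, ...)`.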
3
Evaluation of the Algorithm
The algorithm implied by Def. 5 is easy to implement once the log data are organized in a relational database. It is considerably simpler than most of the algorithms referred to in the Introduction. Candidate queries for the set of documents selected during a session are searched for, and if a significant proportion of sessions points to a given query, it is recommended. We identify three methods for evaluating query recommendations in the literature. The first consists in asking a group of volunteers to test the system; results are then compiled and conclusions drawn in favor of one method or another [13]. The second method consists in modifying an existing search engine and proposing the query recommendations to regular users. This solution is adopted in [3], where the authors had access to the Lycos search engine. Finally, in [18], the authors opted to analyze their results in detail to evaluate whether they make sense. This is the methodology we adopt here. We used the logs of the TodoCL search engine (www.todocl.cl) for a period of three months. TodoCL is a search engine that mainly covers the .cl domain (Chilean web pages) and some pages in the .net domain related to Chilean ISP providers. It indexed over 3,100,000 Web pages and currently receives over 50,000 requests per day. Over three months the logs gathered 20,563,567 requests, most of them with no selections: meta search engines issue queries and re-use the answers of TodoCL but do not return information on user selections. A total of 213,540 distinct queries led to 892,425 registered selections corresponding to 387,427 different URLs. There are 390,932 query sessions. Thus, on average, users clicked 2.28 pages per query session and 4.17 pages per query. Queries are originally in Spanish and have been translated to English. We intend to illustrate that the recommendation algorithm has the ability to identify sub-topics and suggest query refinements.

Fig. 2 shows the recommendation graph for the query valparaiso. The numbers inside nodes refer to the number of sessions registered in the logs for the query. They are reported only for nodes having children, as they help make sense of the numbers on the edges, which count the number of users who would have benefited from the query reformulation. This query requires some contextual explanation. Valparaiso is an important tourist and harbor city with several universities. It is also home to El Mercurio, the most important Chilean newspaper. Fig. 2 captures all this information. It also recommends some queries that are typical of any city of some importance, such as city hall, municipality and so on. The most potentially beneficial recommendations have a higher link number. For example, 9/33 ≈ 27% of the users would have had access to a better ranking of the documents they
[Fig. 2 graph: valparaiso (33 sessions), with recommendations harbour of valparaiso, university, electromecanics in valparaiso, www.valparaiso.cl, municipality valparaiso, mercurio valparaiso and el mercurio]
Fig. 2. Query valparaiso and associated recommendations. The node numbers indicate the number of query sessions. The edge numbers count the sessions improved by the pointed-to query. A given session can recommend various queries.
selected if they had searched for university instead of valparaiso. This also implicitly suggests the query university valparaiso to the user, although we may be presuming the user's intentions. The query maps, illustrated in Fig. 3, shows how a particularly vague query can be disambiguated. The TodoCL engine click-through data naturally bias the recommendations towards Chilean information. A third example concerns the "naval battle in Iquique" of May 21, 1879 between Peru and Chile. The query graph is represented in Fig. 4. The point we want to illustrate here is the ability of the algorithm to extract alternative search strategies from the logs and suggest them to users. Out of the 47 sessions, 2 sessions were ranked better by armada of Chile and sea of Chile. In turn, out of the 5 sessions for armada of Chile, 3 would have been better answered by sea of Chile. Probably the most interesting recommendations are for the biography of Arturo Prat, who was captain of the "Esmeralda", a Chilean ship involved in the battle, for the Ancón treaty that was signed afterward between Peru and Chile, and for the Government of José Joaquín Prieto, responsible for the declaration of war against the Peru-Bolivian Confederation. A more complex recommendation graph is presented in Fig. 5. The user who issued the query fiat is essentially recommended to specify the car model he is interested in, whether he wants spare parts, or whether he is interested in selling or buying
[Fig. 3 graph: maps (53 sessions), with recommendations map of chilean cities, map antofagasta, map argentina and map chile]
Fig. 3. Query maps and associated recommendations.
a fiat. Note that such a graph also suggests to a user interested in, say, the history or the profitability of the company that he should issue a query more specific to his needs. Finally, we already observed that two queries can recommend each other. We show in Table 1 a list of such query pairs found in the logs. We also report the number of original query sessions and the number of sessions enhanced by the recommendation, so as to give an appreciation of the statistical significance of the links. We excluded mutual recommendation pairs with fewer than 2 links. For example, in the first row, out of the 13 sessions for ads, 3 would have been better satisfied by advert, while 10 of the 20 sessions for advert would have been better satisfied by ads. Both pairs of queries lawyer tetareli ↔ tetareli and lawyer tetarelli ↔ tetarelli are quasi-synonyms, but although the four queries clearly refer to the same information need (only a spelling variation distinguishes the two pairs), there are no recommendations from one pair to the other. The reason is that TodoCL returns only documents with the spelling corresponding to the query. On the other hand, all the pairs reported in Table 1 are high-quality quasi-synonyms, as van ↔ light truck and genealogy ↔ family name illustrate. The proposed method generates a clustering a posteriori, where different sessions consisting of sometimes completely different sets of documents end up recommending the same query. This query can then label this group of sessions. In some sense, the search engine itself has been used to cluster the queries.
4
Conclusion
We have proposed a method for query recommendation based on user logs that is simple to implement and has low computational cost. Contrary to methods based on clustering, where related queries are proposed to the user, the recommendations our method makes are offered only if they are expected to improve the ranking. Moreover, the algorithm does not rely on the particular terms appearing in the
[Fig. 4 graph: naval battle in iquique (47 sessions), with recommendations armada of chile, sea of chile, biography arturo prat, government of jose joaquin prieto and ancon treaty]
Fig. 4. Query "naval battle in Iquique" and associated recommendations.
query and documents, making it robust to alternative formulations of an identical information need. Our experiments show that the query graphs induced by our method identify information needs and relate queries that share no common terms. Among the limitations of the method, one is inherent to query logs: only queries present in the logs can be recommended or give rise to recommendations. Problems might also arise if the search engine presents some weakness or if the logs are not large enough: some variations in the expression of an information need fail to be clustered. Finally, queries change over time, and query recommendation systems must reflect the dynamic nature of the data. If the documents appearing in answer to a query are re-ordered to match their popularity in past sessions of the same query, an interesting synergy appears with our recommendation algorithm: because the document ranking better matches past user selections, the association between selected documents and the query improves. At the same time, the recommendation algorithm matches past query sessions to an improved ranking, leading to recommendations of higher quality.
Table 1. "Quasi-synonym" queries recommend each other. Isapres is a generic name for health insurance companies and Tetareli or Tetarelli is supposed to be a family name.

→       query                       query                        ←
3/13    ads                         advert                       10/20
3/105   cars                        used cars                    2/241
34/284  chat                        sports                       2/13
4/9     chileporno                  pornochile                   2/142
2/21    classified ads              advertisement                2/20
4/12    code of penal proceedings   code of penal procedure      2/9
3/10    courses of english          english courses              2/5
2/27    dvd                         musical dvd                  2/5
2/5     family name                 genealogy                    2/16
3/9     hotels in santiago          hotels santiago              2/11
3/24    isapres                     superintendence of isapres   3/8
2/10    lawyer tetareli             tetareli                     2/10
4/17    lawyer tetarelli            tetarelli                    2/20
5/67    mail in chile               mail company of chile        2/3
7/15    penal code                  code of penal procedure      2/9
8/43    rent houses                 houses to rent               2/14
2/58    van                         light truck                  3/25
References

1. R. Baeza-Yates. Query usage mining in search engines. In A. Scime, editor, Web Mining: Applications and Techniques. Idea Group, 2004.
2. R. Baeza-Yates, C. Hurtado, and M. Mendoza. Query recommendation using query logs in search engines. In Current Trends in Database Technology - EDBT 2004 Workshops, LNCS 3268, pages 588-596, Heraklion, Greece, March 2004.
3. D. Beeferman and A. Berger. Agglomerative clustering of a search engine query log. In Knowledge Discovery and Data Mining, pages 407-416, 2000.
4. B. Billerbeck, F. Scholer, H. E. Williams, and J. Zobel. Query expansion using associated queries. In CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management, pages 2-9, New York, NY, USA, 2003. ACM Press.
5. K. Chakrabarti, M. Ortega-Binderberger, S. Mehrotra, and K. Porkaew. Evaluating refined queries in top-k retrieval systems. IEEE Transactions on Knowledge and Data Engineering, 16(2):256-270, 2004.
6. H. Cui, J. Wen, J. Nie, and W. Ma. Query expansion by mining user logs. IEEE Transactions on Knowledge and Data Engineering, 15(4), July/August 2003.
7. L. Fitzpatrick and M. Dent. Automatic feedback using past queries: social searching? In SIGIR '97: Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval, pages 306-313, New York, NY, USA, 1997. ACM Press.
8. B. M. Fonseca, P. B. Golgher, E. S. De Moura, and N. Ziviani. Using association rules to discover search engine related queries. In First Latin American Web Congress (LA-WEB '03), Santiago, Chile, November 2003.
9. M. Jansen, A. Spink, J. Bateman, and T. Saracevic. Real life information retrieval: a study of user queries on the web. ACM SIGIR Forum, 32(1):5-17, 1998.
10. J.-R. Wen, J.-Y. Nie, and H.-J. Zhang. Clustering user queries of a search engine. 2001.
11. NPD. Search and portal site survey. Published by NPD New Media Services, 2000. Reported in www.searchenginewatch.com.
12. K. M. Risvik, T. Mikołajewski, and P. Boros. Query segmentation for web search. In WWW2003, Budapest, Hungary. ACM, May 2003.
13. F. Scholer and H. E. Williams. Query association for effective retrieval. In CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management, pages 324-331, New York, NY, USA, 2002. ACM Press.
14. C. Silverstein, M. Henzinger, M. Hannes, and M. Moricz. Analysis of a very large Alta Vista query log. SIGIR Forum, 33(3):6-12, 1999.
15. J. Wen, J. Nie, and H. Zhang. Clustering user queries of a search engine. In Proc. 10th International World Wide Web Conference. W3C, 2001.
16. J.-R. Wen, J.-Y. Nie, and H.-J. Zhang. Query clustering using user logs. ACM Trans. Inf. Syst., 20(1):59-81, 2002.
17. G. Xue, H. Zeng, Z. Chen, W. Ma, and C. Lu. Log mining to improve the performance of site search. In WISE Workshops, pages 238-245, 2002.
18. O. R. Zaïane and A. Strilets. Finding similar queries to satisfy searches based on query traces. In Proceedings of the International Workshop on Efficient Web-Based Information Systems (EWIS), Montpellier, France, September 2002.
[Fig. 5 graph: fiat (32 sessions), with recommendations fiat uno, auto nissan centra, automobiles fiat, tire fiat 600, fiat 600, fiat 147, second hand fiat, fiat bravo, spare pieces fiat 600, fiat sale and fiat palio]

Fig. 5. Query fiat and associated recommendations.