Click-graph Modeling in view of Facet Attribute Estimation of Web Search Queries

No Author Given
No Institute Given
Abstract. We use the clickthrough data of a Japanese commercial search engine to evaluate the similarity between a query and a facet category from the click patterns on URLs. Using a small number of seed queries, we extract a set of topical words that form search queries together with the same facet directive words, e.g. ‘curry recipe’ or ‘software download’. We adopt a PageRank-like random walk on the query-URL bipartite graphs, called “Biased ClickRank”, in order to propagate facet attributes through the click graphs. We observed that query-to-URL links are too sparse to capture query variations, whereas query-to-domain links are too dense to discriminate among the different usages of broadly related queries. We therefore integrate decomposed URL paths into the click graph to capture the differences in click patterns at the appropriate level of granularity. Our expanded graph model improves recall as well as average precision over baseline graph models.
1 Introduction
Search engines record user interactions with search results in clickthrough logs. Typically, each entry contains information such as the search query string, a time stamp, a browser identifier, the clicked URL and its rank position. A click is considered an implicit confirmation from the user of the URL's relevance. This has been used for optimizing search ranking [1–3], although clicks are biased by the rank positions of the URLs. In order to use such information for search assistance functions, Beeferman et al. converted clickthrough data into query-URL bipartite graphs, where links represent the frequency of click events on a URL in response to a query [4], and merged similar nodes on the basis of link patterns. Queries for which the same URLs are clicked presumably play similar roles in information seeking activities. This is therefore applicable to query suggestion or recommendation, where query-to-query similarities need to be taken into account. Another approach is query classification [5] or query intent estimation [6]. These methods use training or seed data to learn classifiers that discriminate the intentions behind a query, which are typically task related. We instead take advantage of an obvious clue to the user intention behind a query: the facet directive words sometimes present in it. Examples of such facets are “recipe” in a query like “pizza recipe” or “image” in
Table 1. Example of clicked URLs of ‘curry’ and ‘curry recipe’

curry:
  http://www.currymuseum.com/
  http://www.ichibanya.co.jp/
  http://www.sbfoods.co.jp/

curry recipe:
  http://www.recipe.nestle.co.jp/kind/curry/
  http://recipe.gourmet.yahoo.co.jp/
  http://www.foods.co.jp/curry/
“Eiffel tower image.” These facets are sometimes used as vertical search indicators in commercial search engines such as Yahoo search [7] or Yahoo Japan search [8]. As illustrated in Table 1, the query “curry” is related to a curry museum, a curry restaurant (ichibanya) or a spice company (sbfoods), whereas “curry recipe” is specific to recipe-specialized sites such as “recipe.nestle.co.jp”, “recipe.gourmet.yahoo.co.jp” or “foods.co.jp”. The query “curry recipe” clarifies the user intention: URLs about recipes are clicked rather than URLs about restaurants, museums, history, etc. The second part of a facet query does not form a noun phrase with the preceding word; instead, it specifies the search intention. We refer to the first part of the query as the “topic” and to the second part as the “facet” or “facet directive” word. The first part indicates the domain of interest of the search user, whereas the second part helps to focus the search intention. It is syntactically easy to identify the facet directive part in Japanese queries, because searchers normally do not put a space inside noun phrases but do put one before a facet directive word. The question we address in this work is how, starting from the click patterns of a query like “curry” or “curry recipe”, we can relate it to other queries of the same facet, like “pasta” or “pasta recipe”, on the basis of click patterns, in order to retrieve the topic words associated with the facet. We found that a random walk through the query-URL click graphs, similar to the topic-sensitive PageRank algorithm on the Web graph, is an effective method. Firstly, in Section 3 we propose an effective and efficient adaptation of biased PageRank [9] to bipartite graphs. We show how it improves the effectiveness over the baseline model of Li et al. [6] on Japanese clickthrough data. Secondly, in Section 4 we propose to expand the query-URL bipartite graphs by taking the URL structure into account.
This expanded graph is able to capture finer relations between queries and URLs. We compare it with two baseline models, namely the query-URL and query-domain bipartite graphs. The rest of the paper is organized as follows. Section 2 briefly surveys previous studies on related matters. We present our evaluation results and analyses in Section 5. Section 6 concludes the paper. To the best of our knowledge, this is the first study to expand click graphs in this way. The method allows massive amounts of queries and URLs to be annotated with facet directives, which can be used for search assistance functions such as query suggestion / recommendation, or directly to improve search ranking.
2 Related Work
A click event on a URL is considered an implicit user feedback regarding the relevance of a document to a query. Clicks on a document are strongly biased by the document's position in the ranking, but suitable methods have been successfully used to optimize search engine ranking [1–3]. Beeferman et al. studied Web search query logs and carried out clustering on clickthrough data by iterating the following steps: (a) combining the two most similar queries, (b) combining the two most similar URLs [4]. The generated clusters are used to enhance query suggestion functions that assist web search users. Since this approach is essentially based on clustering rather than classification, it is not directly applicable to annotating queries or URLs with specific facets. Xue et al. [10] studied click data to create meta-data for web pages. They estimated document-to-document similarities on the basis of co-clicked queries; query strings, used as meta-data or tags, are then spread over similar documents. Craswell et al., who use click data for image retrieval, experimented with backward random walks [11]. Their method is based on query-to-document transition probabilities on the click graph. Rose et al. studied query log analysis in view of the user's goal in the search activity [12]. They reorganized Broder's trichotomy of web search types [13], replacing “transactional” queries by “resource” queries. It is worth noting that “resource” queries tend to collocate with facet-specific directive words such as “download”, “install” or “mpeg”. Nguyen et al. studied faceted query classification into schemes of attributes such as “ambiguity”, “authority sensitivity”, “temporal sensitivity” and “spatial sensitivity” [14]. They mapped their schemes to Rose et al.'s goal classifications. We are mostly inspired by Li et al., who used a set of queries as a seed set and propagated their labels through the click graph.
They applied this method to semi-supervised query classification into two intent classes, namely product and job intent [6]. Their method uses both learning with click graphs and content-based regularization, but we evaluate only their click graph learning model as the baseline; we cover this in more detail in the next section. Like the domain node model evaluated in this work, they used the shrunk domain name instead of the whole URL in their bipartite graphs.
3 Learning biased ClickRank vectors with click graphs
Let U be a set of URLs and Q a set of queries; V = U ∪ Q is the set of vertices of the click graph. The click graph is a bipartite graph (V, E). Each edge in E represents registered click events on a URL in U in response to a query in Q. This graph is represented by an adjacency matrix A ∈ ℕ^(|V|×|V|), where the element A_{i,j} is the click count of URL i against query j. The edges are bi-directional, i.e. once a URL is clicked in response to a query, a bi-directional edge is created. We normalize the adjacency matrix by out-degrees, i.e. the sum of each row, to obtain the transition matrix B:
B_{i,j} = A_{i,j} / Σ_j A_{i,j}    (1)
Because the edges are bi-directional, matrix A is symmetric, whereas matrix B is not, since the denominator, i.e. the out-degree of each vertex, differs for each row. Unlike the inter-page hyperlink structure of the Web on which PageRank is computed, out-degrees are always strictly positive in such click graphs, since edges and vertices are created only when a click event is observed. Let S be a subset of V representing a set of seed queries or URLs corresponding to a specific search intention. The vector s of dimension |V| is defined as follows:

s_i = 1/|S| if V_i ∈ S, and s_i = 0 if V_i ∉ S    (2)

where V_i is the query or URL indexed by i in matrix A. In this paper, we use only queries as input seed sets. The score of each vertex is computed iteratively according to

m^(k) = (1 − α) B m^(k−1) + α s, with α ∈ (0, 1)    (3)
until convergence is achieved, i.e. until m^(k) = m^(k−1) for all k beyond a certain value. We call the stationary solution m*. We rank each query in the click graph in descending order of its score (this ranking excludes the seed queries). This process is equivalent to biased PageRank, i.e. a random walk consisting of a transition to a neighbor vertex with probability (1 − α) and a random teleportation to one of the seed vertices with probability α. We call this adaptation of biased PageRank to click graphs ‘biased ClickRank’. Haveliwala introduced biased PageRank in his topic-sensitive PageRank paper [9]. Traditional PageRank uses a random walk with random teleportation and gives a general importance score to each page on the Web graph, whereas topic-sensitive PageRank uses a biased teleportation to a subset of vertices representing a topic. Consequently, the scores reflect not only the importance but also the topical relevance to the seed set. For comparison, Li et al. normalize the scores along the paths from a query via clicked URLs to other queries:

D_{i,i} = Σ_j (A A^T)_{i,j}, and D_{i,j} = 0 for i ≠ j
B′ = D^(−1/2) A
m^(k) = (1 − α) B′ B′^T m^(k−1) + α s    (4)
The drawback of this method is that the computation of D is very expensive, which limits the size of the graph that can be handled in practice. Moreover, this normalization is sensitive to URLs with many clicks, because it normalizes the scores by the sum of the scores of the query vertices neighboring the target query via any URL.
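The biased ClickRank iteration of Eqs. 1–3 can be sketched as follows. This is a minimal sketch on a hypothetical toy graph with two queries and two URLs; the function name and the toy click counts are ours, not from the paper.

```python
import numpy as np

def biased_clickrank(A, seeds, alpha=0.25, tol=1e-10, max_iter=1000):
    """Biased ClickRank on a symmetric click-count matrix A over V = queries ∪ URLs.

    seeds: indices of the seed vertices (queries in the paper's setting).
    """
    A = np.asarray(A, dtype=float)
    B = A / A.sum(axis=1, keepdims=True)   # row-normalize by out-degree (Eq. 1)
    s = np.zeros(len(A))                   # teleportation vector (Eq. 2)
    s[list(seeds)] = 1.0 / len(seeds)
    m = s.copy()
    for _ in range(max_iter):              # power iteration, Eq. 3 taken literally
        m_next = (1.0 - alpha) * B @ m + alpha * s
        if np.abs(m_next - m).max() < tol:
            break
        m = m_next
    return m_next

# Toy example: vertices 0, 1 are queries; 2, 3 are URLs.
# Query 0 clicked URL 2 twice; query 1 clicked URLs 2 and 3 once each.
A = [[0, 0, 2, 0],
     [0, 0, 1, 1],
     [2, 1, 0, 0],
     [0, 1, 0, 0]]
m = biased_clickrank(A, seeds=[0])
# The non-seed query 1 receives a positive score through the shared URL 2.
```

Out-degrees are strictly positive by construction, as noted above, so the row normalization never divides by zero, and the (1 − α) damping makes the iteration a contraction that converges to the stationary m*.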
4 Using URL structures
Fig. 1. Example of graphs of queries and URLs
Fig. 2. Example of graphs of queries and domains
URLs and queries tied by a click event tend to have a strong semantic relation, but the associated click graph is sparse. Weak relations are comparatively difficult to extract. For example, Fig. 1 shows the URLs of the Ministry of Foreign Affairs along with the queries used to search for it. ‘England’, ‘Ukraine’ and ‘USA’ have no link with the URL. We call this basic model the URL node model. When the domain name is used instead of the URL, as in Li et al. who merged URLs by three-level domain name matching [6], all the other queries, including ‘visa’, ‘passport’ and ‘mofa’¹, link to the domain name. Fig. 2 shows the graph using domains instead of URLs for the same click data, which we call the domain node model. The problem now is that all queries are tied with the same strength to the domain name, irrespective of their semantic relations. In order to capture different levels of granularity, we use the hierarchical structure of the URL by sequentially extracting the left substrings of the URL. Fig. 3 shows an example of such an expansion by supplementing prefix URLs; this is the model we propose in this paper, namely the hierarchy node model. The query is considered to have weighted relations with each level of the hierarchy attached to the URL. For example, the query ‘England’ is related to each of the prefixes of the clicked URL with the same weight, as in Table 2. We will show that this model is able to capture the semantic relations among queries. We devised different weighting schemes to quantify the relation between a query and the URL hierarchy. BW stands for Baseline Weighting, where edges are added with a weight equal to the original query-to-URL weight extracted from the transition matrix B. The original edge between the query and the full URL retains its original weight, and the expanded edges are given the same weight. In this scheme, clicks on a URL with a deeper hierarchy have comparatively more impact.

¹ ‘Ministry of Foreign Affairs’
Fig. 3. Example of graphs of queries and URL paths

Table 2. Example of supplemented URLs and their weights, when the original frequency is 1

Original and supplemented URL                      BW   UW    LDW     EDW
http://www.mofa.go.jp/mofaj/area/uk/index.html     1    0.2   0.3333  0.5161
http://www.mofa.go.jp/mofaj/area/uk                1    0.2   0.2667  0.2580
http://www.mofa.go.jp/mofaj/area                   1    0.2   0.2000  0.1290
http://www.mofa.go.jp/mofaj                        1    0.2   0.1333  0.0645
http://www.mofa.go.jp                              1    0.2   0.0667  0.0322
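The sequential left-substring expansion can be sketched as follows; this is a minimal sketch, and the helper name `url_prefixes` is ours, not from the paper.

```python
def url_prefixes(url: str) -> list[str]:
    """Return the hierarchy of prefix URLs, from the domain down to the full URL."""
    scheme, rest = url.split("://", 1)
    host, *path = rest.rstrip("/").split("/")
    prefixes = [f"{scheme}://{host}"]          # shortest prefix: the domain itself
    for segment in path:                       # grow one path segment at a time
        prefixes.append(prefixes[-1] + "/" + segment)
    return prefixes

# The example URL of Table 2 expands into five hierarchy nodes:
for p in url_prefixes("http://www.mofa.go.jp/mofaj/area/uk/index.html"):
    print(p)
```

Each prefix becomes an extra vertex in the click graph, and the clicked query is linked to every one of them with one of the weights described below.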
UW stands for Uniform Weighting: the original weight is distributed evenly over the edges generated by the sequential expansion, so that the sum of the edge weights accounts for the original weight. LDW stands for Linearly Decaying Weighting: edges are added with a weight that decays linearly as the suffix of the original URL is shrunk; the sum of the edge weights accounts for the original weight. EDW stands for Exponentially Decaying Weighting: edges receive a weight that decays exponentially as the suffix of the original URL is shrunk; again, the sum of the edge weights accounts for the original weight. LDW applies the following weight to the click frequency:

LDW = freq · p / (n(n + 1)/2)    (5)
where p is the position from the root hierarchy and n is the depth of the leaf hierarchy. For example, p = 5 and n = 5 for the first URL in Table 2 and p = 1 and n = 5 for the last URL.
EDW is:

EDW = freq · a^(n−p+1) / Σ_{i=1}^{n} a^i    (6)

where a is a smoothing parameter that adjusts the decay curve; we used a = 0.5 in this paper. With p counted from the root as in Eq. 5, the full URL thus receives the largest weight and each shorter prefix receives a fraction a of the next longer one, consistently with Table 2.
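The LDW and EDW weights of Eqs. 5–6 can be sketched as below; a minimal sketch assuming p is counted from the root as defined above, with function names of our own choosing.

```python
def ldw(p: int, n: int, freq: float = 1.0) -> float:
    """Linearly decaying weight (Eq. 5): p = position from the root, n = leaf depth."""
    return freq * p / (n * (n + 1) / 2)

def edw(p: int, n: int, a: float = 0.5, freq: float = 1.0) -> float:
    """Exponentially decaying weight (Eq. 6) with smoothing parameter a."""
    return freq * a ** (n - p + 1) / sum(a ** i for i in range(1, n + 1))

# Reproducing the first row of Table 2 (full URL: p = 5, n = 5):
print(round(ldw(5, 5), 4), round(edw(5, 5), 4))  # 0.3333 0.5161
```

By construction both schemes distribute exactly the original click frequency over the n supplemented edges, so the weights of one expanded URL always sum to freq.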
5 Evaluation

5.1 Experimental settings
We selected one million edges with high click frequency from one day of the clickthrough logs of a Japanese commercial web search engine. Table 3 reports some statistics of this 1M.set data set. We also extracted a smaller subset, 100K.set, intended for comparison with the baseline method; this subset was kept small to ease the computation. The seed sets for 1M.set consist of seven facet words, namely ‘download’, ‘fashion’, ‘image’, ‘recipe’, ‘reservation’, ‘second hand’ and ‘stock price’, which are frequently used in Japanese queries. The queries including these facet directive words as their last element were extracted from the query log data. As the 100K.set contains only a small number of examples, we used only two facet words. Table 4 shows examples of seed data for ‘fashion’, ‘recipe’ and ‘reservation’. Each of these queries was issued at least once together with a facet directive word in our logs. We used these queries to form various test sets. We partitioned the queries into two parts: a seed set and a test set. The seed set is used to learn the biased ClickRank vectors, and the test set is reserved for evaluation. Evaluation consists of a 2-fold cross validation over the seed / test sets. For the seed sets, queries both with and without facet directive words are used. For evaluation, only queries without facet directive words are used. Following Haveliwala [9], we set α to 0.25 in Eq. 3. The biased ClickRank algorithm gives a score to each query as well as to each URL in the bipartite graph. We use the query score to rank the queries. After excluding seed queries and queries with test facet directive words², we evaluate the query ranking in terms of standard information retrieval measures such as recall at 1000, precision at 100 and average precision [15].

5.2 Results
Table 5 shows the top ranked queries for the ‘recipe’ seed set. Most queries are related to cooking, cuisine or food, and some queries indicate a Web site or a TV program providing such information. We also computed the ranking using

² We excluded queries with the facet directive word from the test queries, because their facet attribute is explicit from the query surface form.
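The ranking measures used in this evaluation (recall at 1000 and average precision) can be sketched as follows; a minimal sketch in which the ranked list and the relevant set are hypothetical.

```python
def recall_at_k(ranked, relevant, k=1000):
    """Fraction of the relevant queries found in the top k of the ranking."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each retrieved relevant query."""
    relevant, hits, total = set(relevant), 0, 0.0
    for rank, q in enumerate(ranked, start=1):
        if q in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

# Hypothetical ranking of four queries, two of which are labeled relevant:
ap = average_precision(["pasta", "hotel", "pizza", "golf"], {"pasta", "pizza"})
# precisions at the relevant ranks are 1/1 and 2/3, so AP = (1 + 2/3) / 2 ≈ 0.8333
```

In the experiments below, the "relevant" set is the automatically extracted test set of queries known to co-occur with the facet directive word.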
Table 3. Statistics of the experimental data

                                    1M.set      100K.set
Number of query nodes               403,574     51,664
Number of URL nodes                 686,055     74,370
Number of query - URL edges         1,000,000   100,000
Number of URL hierarchy nodes       1,054,717   111,962
Number of supplemented edges        2,464,690   202,497
Number of domain nodes              225,728     41,603
Number of query to domain edges     932,192     94,845
Number of seed/test queries         4,344       354
Table 4. Examples of facets and their seed queries (English glosses of the Japanese queries)

(Fashion): (street), (France), (celebrities), (80's)
(Recipe): (Thai curry), (apple sour), (bread)
(Reservation): (golf), (Italian), (skyliner: rapid train), (jr super express), (hotel abroad)

the method by Li et al.³. Because this method takes as long as 8 hours to compute one set of two-fold cross validation for one facet classification (compared to only 2 minutes with our method), we used the specially reserved 100K.set for the comparison. For evaluation purposes, we assume that the evaluation query sets represent the distribution of the true category query sets. We used recall at 1000 and average precision to compare the query rankings. As Table 6 shows, biased ClickRank performs better than the method of Li et al.: improvements of 12.4% in macro averaged recall and of 110.2% in mean average precision are observed. Fig. 4 shows the precision-recall curves of the two models, where biased ClickRank clearly dominates the baseline model. The difference is mainly caused by the normalization strategies. Li et al. use the ‘volume’ of all paths of length two for regularization, i.e. the sums of all elements of each row of AA^T. As Baeza-Yates et al. [16] suggest, the click distribution follows a power law: there is a small number of URLs with many clicks and many URLs with a small number of clicks. Once a query is linked to a URL on which many clicks are registered, possibly related to many queries, the influence of other URLs with a small number of clicks fades away in their method. Table 7 compares the three models described in Section 4 for generating the graph. We compared the macro averaged recalls of the first 1000 results
³ They also propose to regularize their graph-based algorithm with a content-based classification. In this paper, we compare only with their graph-based algorithm.
Table 5. Top 10 queries against ‘recipe’ (Japanese queries shown by their English glosses)

query                                   score
(goya champuru: traditional cuisine)    0.0503
(tofu hamburg)                          0.0336
(carbonara)                             0.0311
(salad noodle)                          0.0245
(Hanamaru market: TV program)           0.0240
(cold sliced meat)                      0.0233
(cookpad: recipe site)                  0.0232
(simple cookies)                        0.0231
(simple how to cook pickled plums)      0.0227
(vichyssoise)                           0.0219
Table 6. Comparison of the method by Li et al. and biased ClickRank using 100K.set, measured by recall@1000 and average precision (in parentheses). Both methods use the domain node model. Number of relevant queries in brackets.

Facet label[#rel]   Li et al.         Biased ClickRank
Image[220]          0.5909 (0.0670)   0.6227 (0.1411)
Recipe[134]         0.5746 (0.0778)   0.6866 (0.1632)
Macro Avg.          0.5828 (0.0724)   0.6546 (0.1522)
of seven facet words, as well as the average precisions. The hierarchy node model achieved the best recall with 0.4596, followed by the domain node model with 0.4184 and the URL node model with 0.3780. The differences between the hierarchy node model and the other two models are statistically significant (t-test, p < 0.05), whereas the difference between the domain node and URL node models is not. The average precision measures show similar tendencies, but the difference between the hierarchy and URL node models is not statistically significant. In both measures, the hierarchy node model is consistently better than the other models. Fig. 5 shows the precision-recall curves of the hierarchy, domain and URL node models. Although the URL node model is better in initial precision, the hierarchy node model outperforms it at all other recall points. After the recall point of 0.5, the curve of the URL node model quickly approaches the bottom line. The limitation of the URL node model is clearly related to graph sparseness. Starting, for example, from the 689 seed query nodes associated with the ‘recipe’ facet, only 7,879 out of the 403,574 queries of the whole graph receive a positive score; the other queries are disconnected from the seed queries. Table 8 compares the three weighting strategies for the hierarchy node model described in Section 4 with the baseline weighting (BW) reported in Table 7. None of the strategies improves the recall at a statistically significant level. In average precision, LDW is the best, but the difference is not statistically significant. For the BW strategy, the longer URLs are overweighted compared to the shorter URLs, because the larger number of edges generated is not
Fig. 4. Precision-recall curves of 100K.set evaluations: Biased ClickRank and Li et al.'s algorithm

Fig. 5. Precision-recall curves of 1M.set evaluations: Domain node, hierarchy node and URL node models

Table 7. Comparison of graph models measured by recall@1000 and average precision (in parentheses). The hierarchy node model uses the BW weighting strategy. Number of relevant queries in brackets.

Facet label[#rel]   Domain node       Hierarchy node    URL node
Download[375]       0.4747 (0.1791)   0.5306 (0.2197)   0.4613 (0.1825)
Fashion[52]         0.2885 (0.0379)   0.2885 (0.0435)   0.2115 (0.0658)
Image[2235]         0.2380 (0.1767)   0.2676 (0.2113)   0.2237 (0.1213)
Recipe[1374]        0.3552 (0.1941)   0.4192 (0.2773)   0.3035 (0.1542)
Reservation[60]     0.5833 (0.1555)   0.6333 (0.1770)   0.6500 (0.1903)
Second hand[57]     0.5439 (0.1015)   0.5965 (0.1030)   0.4035 (0.0965)
Stock price[191]    0.4450 (0.0886)   0.4817 (0.1161)   0.3927 (0.1177)
Macro Avg.          0.4184 (0.1333)   0.4596 (0.1640)   0.3780 (0.1326)
compensated by a reduction of the edge weights. The fact that the BW strategy is slightly better than the other strategies, at least in recall at 1000, suggests that longer URLs are more useful than shorter URLs as ranking features.

5.3 Editorial Evaluation
We observe in Table 7 that the facets ‘recipe’ and ‘image’ do not perform well compared to the other facet categories. Because human assessors do not report such a degradation, we suspect the existence of relevant but unlabeled queries in the higher positions of the ranking. This obviously causes a degradation in both precision and recall at the higher rankings. We manually assessed the relevance of the top 100 queries and report the results in Table 9 in terms of precision at 100. The URL node model is the best, with the hierarchy node model slightly inferior to it; the domain node model performs much worse. The hierarchy node model is able to retrieve related queries thoroughly, while slightly sacrificing early precision. This order is consistent with the initial precisions in Fig. 5; consequently, we can validate our test set automatically extracted from
Table 8. Comparison of weighting strategies for the expanded graph nodes, evaluated by recall@1000 and average precision (in parentheses). Number of relevant queries in brackets.

Facet label[#rel]   UW                LDW               EDW
Download[375]       0.5360 (0.2110)   0.5387 (0.2147)   0.5200 (0.2122)
Fashion[52]         0.2885 (0.0526)   0.2885 (0.0571)   0.2885 (0.0589)
Image[2235]         0.2613 (0.2003)   0.2519 (0.1977)   0.2501 (0.1928)
Recipe[1374]        0.4258 (0.2670)   0.4250 (0.2718)   0.4170 (0.2631)
Reservation[60]     0.6833 (0.1916)   0.7000 (0.1967)   0.7000 (0.1994)
Second hand[57]     0.5439 (0.1046)   0.5439 (0.1090)   0.5439 (0.1075)
Stock price[191]    0.4764 (0.1211)   0.4607 (0.1253)   0.4607 (0.1245)
Macro Avg.          0.4593 (0.1640)   0.4584 (0.1675)   0.4543 (0.1655)
Table 9. Editorial evaluation of graph models by precision@100 (number of retrieved relevant queries in parentheses)

Facet label    Domain node   Hierarchy node   URL node
Image(2235)    0.41 (41)     0.50 (50)        0.53 (53)
Recipe(1374)   0.26 (26)     0.43 (43)        0.46 (46)
Macro Avg.     0.335         0.465            0.495
query logs. Despite the existence of unlabeled relevant queries, our test set is reliable for comparing the effectiveness of query ranking methods.
6 Conclusions
In this study, we applied a PageRank-like score propagation approach, biased ClickRank, to click graphs extracted from the clickthrough data of a commercial web search engine. From a limited number of seed queries, the method is able to find queries with similar characteristics in search user behavior. Our score propagation method is 12.4% better in recall and 110.2% better in average precision than the baseline model by Li et al. The hierarchy node model proved very effective at achieving high recall with reasonably good precision: it improves recall by 9.8% over the domain node model and by 21.6% over the URL node model. As future work, not only the URLs but also the queries will be expanded by decomposing them into their constituent parts in the query-URL bipartite graphs. This should help to further alleviate the sparseness problems in query-URL click graphs.
References

1. Joachims, T., Granka, L., Pan, B., Hembrooke, H., Gay, G.: Accurately interpreting clickthrough data as implicit feedback. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA (2005) 154–161
2. Dupret, G., Piwowarski, B.: A user browsing model to predict search engine click data from past observations. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2008) 331–338
3. Dupret, G., et al.: Cumulated relevance: A model to estimate document relevance from the clickthrough logs of a web search engine. In: Proceedings of the Third International Conference on Web Search and Web Data Mining (WSDM 2010), New York City, USA (2010) to appear
4. Beeferman, D., Berger, A.: Agglomerative clustering of a search engine query log. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, NY, USA (2000) 407–416
5. Broder, A., Fontoura, M., Gabrilovich, E., Joshi, A., Josifovski, V., Zhang, T.: Robust classification of rare queries using web knowledge. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands (2007) 231–238
6. Li, X., Wang, Y., Acero, A.: Learning query intent from regularized click graphs. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA (2008) 339–346
7. Yahoo! Search, http://search.yahoo.com/
8. Yahoo! JAPAN, http://www.yahoo.co.jp/
9. Haveliwala, T.: Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering 15(4) (2003) 784–796
10. Xue, G.R., Zeng, H.J., Chen, Z., Yu, Y., Ma, W.Y., Xi, W., Fan, W.: Optimizing web search using web click-through data. In: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, Washington, D.C., USA (2004) 118–126
11. Craswell, N., Szummer, M.: Random walks on the click graph. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA (2007) 239–246
12. Rose, D.E., Levinson, D.: Understanding user goals in web search. In: Proceedings of the 13th International Conference on World Wide Web (WWW 2004), New York, NY, USA (2004) 13–19
13. Broder, A.: A taxonomy of web search. ACM SIGIR Forum 36(2) (2002) 3–10
14. Nguyen, B.V., Kan, M.Y.: Functional faceted web query analysis. In: WWW 2007 Workshop on Query Log Analysis: Social and Technological Challenges, Banff, Canada (2007) http://www2007.org/workshop-W6.php
15. Text REtrieval Conference: Common evaluation measures. http://trec.nist.gov/pubs/trec16/appendices/measures.pdf
16. Baeza-Yates, R., Tiberi, A.: Extracting semantic relations from query logs. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA (2007) 76–85
In: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, ACM New York, NY, USA (2007) 239–246 Rose, D.E., Levinson, D.: Understanding user goals in web search. In: Proceedings of the 13th international conference on World Wide Web, WWW 2004, New York, NY, USA, May 17-20, 2004. (2004) 13–19 Broder, A.: A taxonomy of web search. ACM SIGIR Forum 32(2) (2002) 3–10 Nguyen, B.V., Kan, M.Y.: Functional faceted web query analysis. In: Proceedings of the Sixteenth International World Wide Web Conference (WWW2007) Workshop on Query log analysis: Social and Technological Challenges. (2007) http://www2007.org/workshop-W6.php, Banff, Canada. : Text REtrieval Conference, Common evaluation measures http://trec.nist.gov/pubs/trec16/appendices/measures.pdf. Baeza-Yates, R., Tiberi, A.: Extracting semantic relations from query logs. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. (2007) 76–85 San Jose, California, USA.