Query Subtopic Mining for Search Result Diversification

Md Zia Ullah and Masaki Aono†
Department of Computer Science and Engineering, Toyohashi University of Technology
1-1 Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan
Email: [email protected], [email protected]
Abstract—Web search queries are usually short, ambiguous, and contain multiple aspects or subtopics. Different users may have different search intents (or information needs) when submitting the same query. The task of identifying the subtopics underlying a query has received much attention in recent years. In this paper, we propose a method that exploits the query reformulations provided by three major Web search engines (WSEs) to uncover different query subtopics. We estimate the importance of the subtopics by introducing multiple query-dependent and query-independent features, and rank the subtopics by balancing relevancy and novelty. Our experiment with the NTCIR-10 INTENT-2 English Subtopic Mining test collection shows that our method outperforms all participants' methods in the NTCIR-10 INTENT-2 task in terms of D#-nDCG@10.

Keywords—subtopic mining; intent; diversification
I. INTRODUCTION
When an information need forms in a user's mind, it is typed into the search box as a short sequence of words; ideally, the search engine should respond with a ranked list of snippet results that best meets the user's need. Web search queries are typically short, ambiguous, and contain multiple aspects or subtopics [1]–[3]. Queries are commonly classified into two types: faceted and ambiguous. The search intent of a faceted query is usually clear, so the search engine can return helpful results. Information retrieval systems, however, often fail to capture a user's search intent exactly when the submitted query is ambiguous, because an ambiguous query has more than one interpretation, and different users may have different intents for the same query, each corresponding to a different subtopic. Some intents of a query are constantly popular, while others are time-dependent. For example, the query "apple" may refer to two subtopics: (1) "apple Inc." and (2) "apple fruit". Each subtopic may also contain several second-level subtopics, for example "apple iPhone 5s", "apple iPad", "apple iOS", and "apple store" with respect to the subtopic "apple Inc.". In some cases, the subtopics associated with a query can be temporally ambiguous; for instance, the query "US Open" is more likely to target the tennis tournament in September and the golf tournament in June [4]. In addition, it is not always clear which aspect of a multi-faceted query a user actually desires. For example, the faceted query "air travel information" may contain different subtopics, such as (1) information on air travel, airports, and airlines, (2) restrictions on checked baggage during air travel, and (3) websites that collect statistics and report on airports [5].

Given the enormous size of the Web, a misunderstanding of the information need underlying a user's query can misguide the search engine into producing a ranked result page that frustrates the user and leads the user to abandon the originally submitted query. Traditional information retrieval models, such as the Boolean model and the vector space model, treat every input query as a clear, well-defined representation and completely neglect any sort of ambiguity. Ignoring the underlying subtopics of a query, such models might produce top-ranked documents containing relevant information on only one aspect of the query, eventually leaving the user unsatisfied. To maximize user satisfaction, a retrieval model should select a list of documents that is not only relevant to the popular subtopics but also covers the different subtopics of the user's query. A sensible approach, therefore, is to diversify the documents retrieved for the user's query [6]. A diversified retrieval model should produce a ranked list of documents with maximum coverage and minimum redundancy with respect to the possible aspects underlying the query. The solution to this diversification problem can be decomposed into two parts: understanding the intent behind a query, i.e., query subtopic mining, and diversifying the results with respect to the user's possible intents.

Recently, mining the subtopics of a user query to diversify the retrieved documents has received considerable attention [7]. Several methods have been proposed for mining subtopics from different sources, such as the retrieved documents, query logs, Wikipedia, Freebase [8], and the related search services provided by commercial search engines [5], [9], [10]. In this paper, we address the problem of query subtopic mining, which is defined as: "given a query, list up its possible subtopics which specialize or disambiguate the search intent of the original query". This is the same task setting as the NTCIR-10 INTENT-2 English Subtopic Mining subtask [7]. In our approach, we exploit the query reformulations provided by three major WSEs to uncover different query subtopics. We estimate the importance of the subtopics by introducing multiple query-dependent and query-independent features, and rank the subtopics by balancing relevancy and novelty. The experimental results indicate that our method outperforms all other participants' methods [11]–[15] in terms of D#-nDCG@10 in NTCIR-10 INTENT-2.

The rest of the paper is organized as follows: Section II overviews related work on query subtopic mining. Section III introduces our proposed framework for subtopic generation, subtopic feature extraction, and subtopic ranking. Section IV presents the overall experiments and the results we obtained. Finally, concluding remarks and some future directions of our work are described in Section V.
II. RELATED WORK
Queries are usually short, ambiguous, and/or underspecified [1]–[3]. To understand the meanings of queries, researchers have defined taxonomies and classified queries into predefined categories. Song et al. [16] divided queries into three categories: ambiguous queries, which have more than one meaning; broad queries, which cover a variety of subtopics; and clear queries, which have a specific meaning or narrow topic. At the query level, Broder [17] divided query intent into navigational, informational, and transactional types. Nguyen and Kan [18] classified queries into four general facets of ambiguity, authority, temporal sensitivity, and spatial sensitivity. Boldi et al. [19] created a query-flow graph with query phrase nodes and used it for query recommendation. Query suggestion is a key technique for generating alternative queries that help users drill down to a subtopic of the original query [20], [21]. Different from query suggestion or query completion, subtopic mining focuses more on the diversity of the possible subtopics of the original query than on inferring relevant queries. Hu et al. [22] integrated the knowledge contained in Wikipedia to predict the possible intents of a given query. Radlinski et al. [23] proposed an approach for inferring query intents from reformulations and clicks: for an input query, the click and reformulation information are combined to identify a set of possibly related queries, from which an undirected graph is constructed; an edge is introduced between two queries if they were often clicked for the same documents, and random walk similarity is finally used to find intent clusters. At the session level, Radlinski and Joachims [24] mined intents from query chains and used them in a learning-to-rank algorithm. Recently, Wang et al. [5] proposed a method that mines the subtopics of a query either directly from the query itself or indirectly from the documents retrieved for it, in order to diversify the search results. In the indirect approach, subtopics are extracted by clustering, topic modeling, and concept-tagging of the retrieved documents; in the direct approach, several external resources, such as Wikipedia, the Open Directory Project (ODP), search query logs, and related search services, are investigated to mine subtopics. Santos et al. [9] leveraged the query reformulations of WSEs to mine sub-queries (i.e., subtopics) for diversifying web search results. The surrounding text of query terms in the top retrieved documents has also been utilized to mine and rank subtopics [10].
III. PROPOSED FRAMEWORK
In this section, we describe our proposed methodology, which is decomposed into three parts: subtopic candidate generation, subtopic feature extraction, and subtopic ranking. The whole framework is depicted in Fig. 1. Each part is detailed in the following sections.
Fig. 1: A subtopic mining framework (a query, together with Google, Yahoo, and Bing query completions as resources, feeds subtopic candidate generation, followed by subtopic feature extraction, subtopic importance estimation, and subtopic diversification, producing ranked subtopics).
A. Subtopic candidate generation

Query suggestions obtained from WSEs are an easy and effective source of subtopics. Santos et al. [9] evaluated the effectiveness of two kinds of query suggestions for search result diversification: "suggested queries" (a.k.a. query auto-completions), which are presented in a drop-down list within the search box as the user enters a query, and "related queries", which are shown alongside the ranked list of URLs within a search engine result page (SERP). Since their results reveal that "suggested queries" are more effective for search result diversification, we also decided to use suggested queries rather than related queries. Inspired by the work of Santos et al. [9], our assumption is that the "suggested queries" across WSEs contain some facets or intents of the user query. Given a query, we retrieve all the suggested queries from the WSEs, aggregate them by filtering out duplicates and wrongly represented ones, and consider the remainder as subtopic candidates. For example, given the query "figs", we generate a list of subtopic candidates including "home garden figs", "health benefits of figs", "figs cooking", "figs nutrition", "figs chocolate", "Charlestown", etc.
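To make the aggregation step concrete, the following is a minimal sketch in Python. The fetcher callables are hypothetical stand-ins for the WSEs' suggestion APIs (they are not specified in the paper); the filtering mirrors the duplicate removal described above.

```python
from typing import Callable

def generate_candidates(query: str,
                        fetchers: list[Callable[[str], list[str]]]) -> list[str]:
    """Pool suggested queries from several WSEs into subtopic candidates."""
    seen: set[str] = set()
    candidates: list[str] = []
    for fetch in fetchers:                            # one fetcher per search engine
        for suggestion in fetch(query):
            s = " ".join(suggestion.lower().split())  # normalize case and whitespace
            # Drop empty, duplicate, or degenerate suggestions (identical to the query).
            if s and s != query.lower() and s not in seen:
                seen.add(s)
                candidates.append(s)
    return candidates
```

A call such as generate_candidates("figs", [google_suggest, yahoo_suggest, bing_suggest]) would then yield the pooled candidate list, with each (hypothetical) fetcher wrapping one engine's suggestion endpoint.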
B. Subtopic features extraction

To measure the importance of the subtopic candidates, we fuse multiple features extracted from them. We broadly organize all subtopic candidate features used by our approach as either query-dependent or query-independent, according to whether they are computed on-the-fly at query time or offline at indexing time, respectively. While the query-dependent features are standard features commonly used in the learning-to-rank literature for web search [25], the query-independent ones are specifically proposed here to estimate the quality of different subtopic candidates. We now briefly describe how each feature is computed, along with its parameter settings.

1) Query-dependent features: Given a query Q, query-dependent features are computed directly by scoring the occurrences of the terms of Q in each subtopic candidate of {S_1, S_2, S_3, ..., S_N}. The query-dependent features include term dependency, term frequency, lexical, and Web hit count based features.

The Dirichlet-smoothed language model feature f_{LMDS}(Q, S) is based on the language modeling approach to information retrieval [26], smoothed using Dirichlet smoothing [27]. It is the log-likelihood of the query being generated from the subtopic candidate:

f_{LMDS}(Q, S) = \sum_{w \in Q} tf_{w,Q} \log \frac{tf_{w,S} + \mu P(w \mid C)}{|S| + \mu}    (1)

where tf_{w,Q} is the number of times w occurs in the query, tf_{w,S} is the number of times w occurs in the subtopic candidate, |S| is the number of terms in the subtopic candidate, P(w|C) is the background language model, and \mu is a tunable smoothing parameter, empirically set to 2,500 in our experiments.

The Jelinek-Mercer-smoothed language model feature f_{LMJM}(Q, S) linearly combines the probability of a query term given the subtopic candidate with its probability under the background language model:

f_{LMJM}(Q, S) = \sum_{w \in Q} \left[ \lambda P(w \mid S) + (1 - \lambda) P(w \mid C) \right]    (2)

where P(w|S) is the probability of the query term w given the subtopic candidate, P(w|C) is the background language model, and \lambda is a controlling parameter, set to \frac{\mu}{|S| + \mu} with \mu = 2,500.

A particularly effective approach to exploiting term dependency was proposed in [28]. In this model, unigram, sequential bigram, and full bigram dependencies are linearly interpolated. The Markov random field term dependency feature f_{MRF}(Q, S) is computed as:

f_{MRF}(Q, S) = \alpha_u \sum_{w_i \in Q} \log P(w_i \mid \theta_S) + \alpha_s \sum_{\substack{w_i, w_j \in Q \\ j = i+1}} \log P(\langle w_i, w_j \rangle \mid \theta_S) + \alpha_f \sum_{\substack{w_i, w_j \in Q \\ j \neq i}} \log P(\langle w_i, w_j \rangle \mid \theta_S)    (3)

where the parameters \alpha_u, \alpha_s, and \alpha_f control the weights of the unigram, sequential bigram, and full bigram models in the linear combination, respectively, and \theta_S is the language model of the subtopic candidate S. The weights \alpha_u, \alpha_s, and \alpha_f are given equal importance and empirically set to 0.333.

To measure the lexical similarity between the query and the subtopic candidate, the edit distance based feature f_{EDS}(Q, S) is computed as:

f_{EDS}(Q, S) = 1 - \frac{\mathrm{EditDistance}(Q, S)}{\max(\mathrm{length}(Q), \mathrm{length}(S))}    (4)

where the edit distance is the number of edit operations (insertion, deletion, or substitution of a word) necessary to unify the two strings, and length(Q) is the number of terms in the query Q.
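As an illustration, here is a minimal Python sketch of Eq. (4) using a word-level Levenshtein distance; the whitespace tokenization is a simplifying assumption.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (insertion, deletion, substitution)."""
    dp = list(range(len(b) + 1))           # dp[j] = distance between a[:i] and b[:j]
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,         # deletion
                                     dp[j - 1] + 1,     # insertion
                                     prev + (wa != wb)) # substitution
    return dp[-1]

def f_eds(query: str, subtopic: str) -> float:
    """Edit distance feature (Eq. 4), normalized by the longer term sequence."""
    q, s = query.split(), subtopic.split()
    m = max(len(q), len(s))
    return 1.0 - edit_distance(q, s) / m if m else 0.0
```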
Another simple lexical feature is the exact match. The exact match feature f_{EM}(Q, S) is a binary feature that returns 1 if there is an exact lexical match of the query Q within the subtopic candidate S [29]:

f_{EM}(Q, S) = I(Q \text{ is a substring of } S)    (5)

where I is the indicator function, which returns 1 if its argument is satisfied.

The overlap feature f_{Overlap}(Q, S) is simply the fraction of query terms that occur, after stemming and stopping, in the subtopic candidate [29]:

f_{Overlap}(Q, S) = \frac{\sum_{w \in Q} I(w \in S)}{|Q|}    (6)

The Overlap-syn feature f_{Overlap\text{-}syn}(Q, S) generalizes the overlap feature by also considering synonyms of the query terms. It is the fraction of query terms that either match a subtopic candidate term or have a synonym that does:

f_{Overlap\text{-}syn}(Q, S) = \frac{\sum_{w \in Q} I(\mathrm{Syn}(w) \in S)}{|Q|}    (7)

where Syn(w) denotes the set of synonyms of the term w, including the term itself. We use WordNet 3.0 [30] to obtain the synonyms of nouns, adjectives, verbs, and adverbs, after canonicalization with the Krovetz stemmer [31] and POS tagging with the Stanford NLP parser [32].

BM25 [33] is an effective term weighting method that incorporates the query term frequency, the subtopic length, and the inverse subtopic frequency. The BM25 feature f_{BM25}(Q, S) is computed as:

f_{BM25}(Q, S) = \sum_{w \in Q} tf_{w,Q} \cdot \frac{tf_{w,S} (k_1 + 1)}{k_1 \left(1 - b + b \cdot \frac{l_S}{avgl_S}\right) + tf_{w,S}} \cdot \log \frac{N - n_w + 0.5}{n_w + 0.5}    (8)

where l_S is the subtopic length in terms, avgl_S is the average length of the subtopics, N is the total number of subtopics in the collection, n_w is the number of subtopic candidates containing w (the subtopic frequency), and k_1 and b are free parameters. In our experiments, the parameters are empirically set to k_1 = 1.80 and b = 0.75.

A non-parametric divergence from randomness (DFR) based model, DFH [34], has been shown to perform effectively across a variety of Web search tasks [35]. The term frequency based DFH feature f_{DFH}(Q, S) is computed as:

f_{DFH}(Q, S) = \sum_{w \in S} \frac{\left(1 - \frac{tf_{w,S}}{l_S}\right)^2}{tf_{w,S} + 1} \left[ tf_{w,S} \log_2 \left( \frac{tf_{w,S} \cdot avgl_S}{l_S \cdot tf_{w,C}} \right) + 0.5 \log_2 \left( 2\pi \, tf_{w,S} \left( 1 - \frac{tf_{w,S}}{l_S} \right) \right) \right]    (9)

where tf_{w,S} is the frequency of the term in the subtopic candidate S and tf_{w,C} is the frequency of the term in the collection.
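To ground the term frequency features, the following is a minimal sketch of the Dirichlet-smoothed LM feature (Eq. 1) and the BM25 feature (Eq. 8). The background model probabilities, the per-term subtopic frequencies, and the whitespace tokenization are assumptions supplied by the caller rather than details fixed by the paper.

```python
import math
from collections import Counter

def f_lmds(query: str, subtopic: str,
           p_background: dict[str, float], mu: float = 2500.0) -> float:
    """Dirichlet-smoothed LM feature (Eq. 1)."""
    s_tf = Counter(subtopic.split())
    s_len = sum(s_tf.values())
    score = 0.0
    for w, tf_wq in Counter(query.split()).items():
        p_wc = p_background.get(w, 1e-9)      # background model P(w|C), floored
        score += tf_wq * math.log((s_tf[w] + mu * p_wc) / (s_len + mu))
    return score

def f_bm25(query: str, subtopic: str, n_w: dict[str, int],
           n_subtopics: int, avg_len: float,
           k1: float = 1.80, b: float = 0.75) -> float:
    """BM25 feature (Eq. 8); n_w holds the number of candidates containing each term."""
    s_tf = Counter(subtopic.split())
    l_s = sum(s_tf.values())
    score = 0.0
    for w, tf_wq in Counter(query.split()).items():
        tf_ws = s_tf[w]
        if tf_ws == 0:
            continue
        nw = n_w.get(w, 0)
        idf = math.log((n_subtopics - nw + 0.5) / (nw + 0.5))
        tf_part = tf_ws * (k1 + 1) / (k1 * (1 - b + b * l_s / avg_len) + tf_ws)
        score += tf_wq * tf_part * idf
    return score
```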
Usually, if a subtopic is frequently mentioned in Web pages, it might be more important than others. Following this intuition, we use the search engine hit counts of a subtopic candidate. Let Hit(S) denote the search engine hit count of a subtopic candidate S, obtained with the Bing API. The hit count based feature f_{hit}(Q, S) is computed as:

f_{hit}(Q, S) = \frac{\mathrm{Hit}(S)}{\sum_{S_i \in C} \mathrm{Hit}(S_i)}    (10)

A subtopic candidate contains the query topic itself; therefore, a point-wise mutual information (PMI) feature based on the search engine hit counts is defined as:

f_{PMI}(Q, S) = \log \frac{\mathrm{Hit}(S)}{\mathrm{Hit}(Q) \left( \mathrm{Hit}(Q) - \mathrm{Hit}(S) \right)}    (11)
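A minimal sketch of Eqs. (10)–(11) follows; hit_count is a hypothetical callable standing in for a hit count lookup such as the Bing API, and the guard against non-positive arguments to the logarithm is our own defensive assumption.

```python
import math
from typing import Callable

def f_hit(subtopic: str, candidates: list[str],
          hit_count: Callable[[str], int]) -> float:
    """Hit count feature (Eq. 10): hit count normalized over all candidates."""
    total = sum(hit_count(s) for s in candidates)
    return hit_count(subtopic) / total if total else 0.0

def f_pmi(query: str, subtopic: str,
          hit_count: Callable[[str], int]) -> float:
    """PMI feature (Eq. 11) from search engine hit counts."""
    hq, hs = hit_count(query), hit_count(subtopic)
    denom = hq * (hq - hs)
    return math.log(hs / denom) if hs > 0 and denom > 0 else float("-inf")
```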
2) Query-independent features: The goal of query-independent features is to encode prior knowledge about each individual subtopic candidate. Here, we propose some simple query-independent features.

The voting feature f_{Voting}(Q, S) is the number of search engines that return the subtopic candidate S:

f_{Voting}(Q, S) = \sum_{i=1}^{N} I(S \in R_i)    (12)

where I is an indicator function that returns 1 if its argument is true, N is the number of search engines, and R_i is the ranked list of subtopic candidates from search engine i.

The reciprocal rank feature f_{RR}(Q, S) is based on the reciprocal of the rank of the subtopic among the suggested queries of the search engines:

f_{RR}(Q, S) = \sum_{i=1}^{N} \frac{1}{\sqrt{rank_i(S) + 1}}    (13)

where rank_i(S) is the rank of the subtopic in the i-th search engine and N is the number of search engines.

Longer terms in a subtopic may reflect a more thoughtful and readable style. To capture the readability of a subtopic, the average term length (ATL) of a subtopic candidate is defined as:

f_{ATL}(Q, S) = \frac{1}{l_S} \sum_{w \in S} tf_{w,S} \, l_w    (14)

where l_w denotes the length in characters of the term w.

Another recently proposed readability feature is topic cohesiveness (TC) [36]. The topic cohesiveness feature f_{TC}(Q, S) is computed as:

f_{TC}(Q, S) = -\sum_{w \in S} P(w \mid S) \log P(w \mid S)    (15)

where P(w|S) is computed using maximum likelihood estimation.
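The query-independent features can be computed from the per-engine suggestion lists and the candidate strings alone. Below is a minimal sketch of Eqs. (12)–(15), assuming whitespace tokenization and that candidates absent from an engine's list contribute zero to the reciprocal rank sum.

```python
import math
from collections import Counter

def f_voting(subtopic: str, ranked_lists: list[list[str]]) -> int:
    """Voting feature (Eq. 12): number of engines returning the candidate."""
    return sum(1 for r in ranked_lists if subtopic in r)

def f_rr(subtopic: str, ranked_lists: list[list[str]]) -> float:
    """Reciprocal rank feature (Eq. 13)."""
    score = 0.0
    for r in ranked_lists:
        if subtopic in r:
            rank = r.index(subtopic) + 1      # 1-based rank in this engine's list
            score += 1.0 / math.sqrt(rank + 1)
    return score

def f_atl(subtopic: str) -> float:
    """Average term length feature (Eq. 14)."""
    terms = subtopic.split()
    return sum(len(w) for w in terms) / len(terms) if terms else 0.0

def f_tc(subtopic: str) -> float:
    """Topic cohesiveness feature (Eq. 15): entropy of the MLE term distribution."""
    tf = Counter(subtopic.split())
    total = sum(tf.values())
    if not total:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in tf.values())
```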
C. Subtopic ranking

In this section, we rank the subtopic candidates so as to optimize both their relevancy and their diversity, which is the goal of the subtopic mining task in NTCIR-10 INTENT-2.

1) Subtopic importance estimation: We estimate the importance score Imp(·) of a subtopic candidate from the features extracted above. We normalize each feature using max-min normalization. For a subtopic candidate S, we represent all the extracted features in a feature vector FV_S = {f_{LMDS}, f_{LMJM}, f_{MRF}, ..., f_{TC}} of dimension 15, one dimension per feature. Thus, given a query, we have a list of subtopic candidates with their corresponding feature vectors. The mean feature vector MF is computed from the feature vectors of all the subtopic candidates as:

\mathbf{MF} = \frac{1}{N} \sum_{j=1}^{N} \mathbf{FV}_{S_j}    (16)

where N is the number of subtopic candidates. We define the importance score Imp(S) of a subtopic candidate S as the cosine similarity between the subtopic feature vector FV_S and the mean feature vector MF:

\mathrm{Imp}(S) = \mathrm{CosineSim}(\mathbf{FV}_S, \mathbf{MF}) = \frac{\mathbf{FV}_S \cdot \mathbf{MF}}{\| \mathbf{FV}_S \| \, \| \mathbf{MF} \|}    (17)

We take the importance score Imp(S) of a subtopic candidate S as its relevancy score Rel(S) for subtopic diversification.

2) Subtopic diversification: We utilize the maximal marginal relevance (MMR) framework [37] to further improve the diversity of the mined subtopic candidates. The MMR model regards the ranking problem as a procedure of successively selecting the "best" unranked object (usually a document, but in our scenario a subtopic) and appending it to the tail of the ranked list. When looking for the next best subtopic, the MMR model chooses not the most relevant one, but the one that best balances relevancy and novelty, where novelty means that a subtopic is new compared with those already chosen and ranked. The MMR model can therefore further improve the diversity of the subtopic ranking, which is one of the main goals of the NTCIR-10 INTENT task. Given a relevance function Rel(·) and a similarity function Sim(·, ·), the MMR model is set up as follows:

S_{i+1} = \arg\max_{S \notin D_i} \left[ \alpha \, \mathrm{Rel}(S) + (1 - \alpha) \, \mathrm{Nov}(S, D_i) \right]

where \alpha \in [0, 1] is a combining parameter, and then D_{i+1} = D_i \cup \{S_{i+1}\}. Here, S_i is the subtopic ranked at the i-th position and D_i is the collection containing the top i subtopics. The function Nov(S, D_i) measures the novelty of S given that D_i has already been chosen and ranked. We implement the novelty function as:

\mathrm{Nov}(S, D_i) = -\max_{S' \in D_i} \mathrm{Sim}(S, S')

To compute Sim(S, S'), we first represent each subtopic as a set of terms and then apply the Jaccard similarity:

\mathrm{Sim}(S, S') = \frac{|S \cap S'|}{|S \cup S'|}

That is, we find the maximum among the similarity values between S and all S' \in D_i and take its negative as the novelty score: the more similar S is to some previously selected subtopic, the less novel it is.
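A compact end-to-end sketch of Eqs. (16)–(17) and the greedy MMR loop follows, assuming a numpy matrix of max-min-normalized feature vectors (one row per candidate) and the Jaccard novelty defined above; α is left as a free parameter as in the paper.

```python
import numpy as np

def importance_scores(fv: np.ndarray) -> np.ndarray:
    """Imp(S) (Eq. 17): cosine similarity of each row of fv to the mean vector (Eq. 16)."""
    mf = fv.mean(axis=0)
    denom = np.linalg.norm(fv, axis=1) * np.linalg.norm(mf)
    return (fv @ mf) / np.clip(denom, 1e-12, None)

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the term sets of two subtopics."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def mmr_rank(subtopics: list[str], rel: np.ndarray, alpha: float = 0.5) -> list[str]:
    """Greedy MMR: repeatedly append the candidate maximizing
    alpha * Rel(S) + (1 - alpha) * Nov(S, D_i)."""
    ranked: list[int] = []
    remaining = set(range(len(subtopics)))
    while remaining:
        def mmr(i: int) -> float:
            # Nov(S, D_i) = -max similarity to an already selected subtopic.
            nov = -max((jaccard(subtopics[i], subtopics[j]) for j in ranked),
                       default=0.0)
            return alpha * rel[i] + (1 - alpha) * nov
        best = max(remaining, key=mmr)
        ranked.append(best)
        remaining.remove(best)
    return [subtopics[i] for i in ranked]
```

In this pipeline, the scores returned by importance_scores serve as Rel(S), and mmr_rank produces the final diversified ranking.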
TABLE I: Comparative results of our proposed method and the other participants in the NTCIR-10 INTENT-2 task. Acronyms: THUIR: Tsinghua University, THCIB: Canon Information Technology, hultech: Normandie University, KLE: Pohang University of Science and Technology, ORG: Microsoft Research Asia, TUTA1: The University of Tokushima, and LIA: University of Avignon.

Method            I-rec@10   D-nDCG@10   D#-nDCG@10
Our method        0.4405     0.5314      0.4835↑
THUIR-S-E-4A      0.4364     0.5062      0.4713
THCIB-S-E-1A      0.4431     0.4657      0.4544
hultech-S-E-1A    0.3680     0.5368      0.4524
KLE-S-E-4A        0.4457     0.4401      0.4429
SEM12-S-E-2A      0.3777     0.4290      0.4014
ORG-S-E-4A        0.3815     0.3829      0.3822
TUTA1-S-E-1A      0.2181     0.2577      0.2379
LIA-S-E-4A        0.2000     0.2753      0.2376
IV. EXPERIMENTS AND DISCUSSION
A. Dataset

We evaluated our proposed method on the NTCIR-10 INTENT-2 English Subtopic Mining test collection and compared it to the other participants' work [7]. The test collection consists of a set of 50 query topics and, as relevance judgments, a set of possible subtopics for each query topic, obtained by pooling the submitted runs and assessor voting.

B. Evaluation metrics

We evaluated our method using I-rec, D-nDCG, and D#-nDCG at cutoff 10. I-rec measures the diversity of the returned subtopics, i.e., what fraction of the known intents is covered. D-nDCG measures the overall relevance across all intents, taking the subtopic ranking into account. D#-nDCG is a combination of I-rec and D-nDCG, and is used as the primary evaluation metric by the INTENT task organizers. The advantages of D#-nDCG over other diversity metrics (e.g., a-nDCG [38] and intent-aware metrics [39]) are discussed in [7]. In our experiments, we use NTCIREVAL [40], the tool provided by the NTCIR-10 organizers with its default settings, to compute these three metrics. A small sketch of the simplest of these measures, I-rec, follows at the end of this section.

C. Experimental results

Our evaluation results are shown in Table I. They indicate that our method is effective at mining and ranking subtopics: it outperforms the best runs in terms of D#-nDCG at cutoff 10, even though we used only one subtopic mining resource (the WSEs' suggested-query services) while the other participants used several resources to mine subtopic candidates.
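For intuition only, here is a minimal sketch of I-rec@k under its standard definition, the proportion of known intents covered by the top k subtopics; the per-subtopic intent labels are assumed to come from the relevance judgments, and the exact bookkeeping in NTCIREVAL may differ.

```python
def i_rec_at_k(ranked_subtopics: list[str],
               intent_labels: dict[str, set[str]],
               all_intents: set[str], k: int = 10) -> float:
    """I-rec@k: fraction of known intents covered by the top-k subtopics.
    intent_labels maps a subtopic string to the set of intents it covers."""
    covered: set[str] = set()
    for s in ranked_subtopics[:k]:
        covered |= intent_labels.get(s, set())
    return len(covered & all_intents) / len(all_intents) if all_intents else 0.0
```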
V. CONCLUSION AND FUTURE WORK
In this paper, we proposed a method to address the query subtopic mining problem. First, given a query, we retrieved the query suggestions provided by three major WSEs and generated a pool of subtopic candidates, filtering out near duplicates. Then, we extracted several query-dependent and query-independent features, including term dependency, term frequency, and lexical features. Next, we estimated the importance of the subtopic candidates using the extracted features. Finally, we utilized the subtopic importance to rank and diversify the subtopic candidates. Experimental results show that our proposed method is effective at mining and ranking subtopics, and it outperforms the related previous methods, although we used only one resource to mine subtopics whereas previous methods used several. In our method, we focused on effectively ranking subtopics rather than generating them. As future work, we will mine subtopics from multiple resources including search results, Wikipedia, and Freebase. Another of our future plans is to efficiently utilize the mined subtopics for search result diversification.

REFERENCES

[1] R. Song, Z. Luo, J.-Y. Nie, Y. Yu, and H.-W. Hon, "Identification of ambiguous queries in web search," Information Processing & Management, vol. 45, no. 2, pp. 216–229, 2009.
[2] C. L. Clarke, N. Craswell, and I. Soboroff, "Overview of the TREC 2009 web track," DTIC Document, Tech. Rep., 2009.
[3] K. Spärck-Jones, S. E. Robertson, and M. Sanderson, "Ambiguous requests: implications for retrieval tests, systems and theories," in ACM SIGIR Forum, vol. 41, no. 2. ACM, 2007, pp. 8–17.
[4] T. N. Nguyen and N. Kanhabua, "Leveraging dynamic query subtopics for time-aware search result diversification," in Advances in Information Retrieval. Springer, 2014, pp. 222–234.
[5] C.-J. Wang, Y.-W. Lin, M.-F. Tsai, and H.-H. Chen, "Mining subtopics from different aspects for diversifying search results," Information Retrieval, vol. 16, no. 4, pp. 452–483, 2013.
[6] C. L. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon, "Novelty and diversity in information retrieval evaluation," in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2008, pp. 659–666.
[7] T. Sakai, Z. Dou, T. Yamamoto, Y. Liu, M. Zhang, R. Song, M. P. Kato, and M. Iwata, "Overview of the NTCIR-10 INTENT-2 task," in Proceedings of NTCIR, vol. 10, 2013, pp. 1–26.
[8] Google. (2014) Freebase data dumps. [Online]. Available: https://developers.google.com/freebase/data
[9] R. L. Santos, C. Macdonald, and I. Ounis, "Exploiting query reformulations for web search result diversification," in Proceedings of the 19th International Conference on World Wide Web. ACM, 2010, pp. 881–890.
[10] Q. Wang, Y. Qian, R. Song, Z. Dou, F. Zhang, T. Sakai, and Q. Zheng, "Mining subtopics from text fragments for a web query," Information Retrieval, vol. 16, no. 4, pp. 484–503, 2013.
[11] A. Damien, M. Zhang, Y. Liu, and S. Ma, "Improve web search diversification with intent subtopic mining," in Natural Language Processing and Chinese Computing. Springer, 2013, pp. 322–333.
[12] J. G. Moreno and G. Dias, "Hultech at the NTCIR-10 INTENT-2 task: Discovering user intents through search results clustering," in Proceedings of NTCIR, vol. 10, 2013.
[13] J. Wang, G. Tang, Y. Xia, Q. Zhou, F. Zheng, Q. Hu, S. Na, and Y. Huang, "Understanding the query: THCIB and THUIS at NTCIR-10 INTENT task," in Proceedings of NTCIR, vol. 10, 2013.
[14] S.-J. Kim and J.-H. Lee, "The KLE's subtopic mining system for the NTCIR-10 INTENT-2 task," in Proceedings of NTCIR, vol. 10, 2013.
[15] M. Z. Ullah, M. Aono, and M. H. Seddiqui, "SEM12 at the NTCIR-10 INTENT-2 English Subtopic Mining subtask," in Proceedings of NTCIR, vol. 10, 2013.
[16] R. Song, Z. Luo, J.-R. Wen, Y. Yu, and H.-W. Hon, "Identifying ambiguous queries in web search," in Proceedings of the 16th International Conference on World Wide Web. ACM, 2007, pp. 1169–1170.
[17] A. Broder, "A taxonomy of web search," in ACM SIGIR Forum, vol. 36, no. 2. ACM, 2002, pp. 3–10.
[18] B. V. Nguyen and M.-Y. Kan, "Functional faceted web query analysis," in WWW2007: 16th International World Wide Web Conference, 2007.
[19] P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna, "The query-flow graph: model and applications," in Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM, 2008, pp. 609–618.
[20] Z. Zhang and O. Nasraoui, "Mining search engine query logs for query recommendation," in Proceedings of the 15th International Conference on World Wide Web, 2006, pp. 1039–1040.
[21] Q. Mei, D. Zhou, and K. Church, "Query suggestion using hitting time," in Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM, 2008, pp. 469–478.
[22] J. Hu, G. Wang, F. Lochovsky, J.-T. Sun, and Z. Chen, "Understanding user's query intent with Wikipedia," in Proceedings of the 18th International Conference on World Wide Web. ACM, 2009, pp. 471–480.
[23] F. Radlinski, M. Szummer, and N. Craswell, "Inferring query intent from reformulations and clicks," in Proceedings of the 19th International Conference on World Wide Web. ACM, 2010, pp. 1171–1172.
[24] F. Radlinski and T. Joachims, "Query chains: learning to rank from implicit feedback," in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM, 2005, pp. 239–248.
[25] T.-Y. Liu, "Learning to rank for information retrieval," Foundations and Trends in Information Retrieval, vol. 3, no. 3, pp. 225–331, 2009.
[26] F. Song and W. B. Croft, "A general language model for information retrieval," in Proceedings of the Eighth International Conference on Information and Knowledge Management. ACM, 1999, pp. 316–321.
[27] C. Zhai and J. Lafferty, "A study of smoothing methods for language models applied to ad hoc information retrieval," in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2001, pp. 334–342.
[28] D. Metzler and W. B. Croft, "A Markov random field model for term dependencies," in Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2005, pp. 472–479.
[29] D. Metzler and T. Kanungo, "Machine learned sentence selection strategies for query-biased summarization," in SIGIR Learning to Rank Workshop, 2008, pp. 40–47.
[30] C. Fellbaum, WordNet. Wiley Online Library, 1998.
[31] R. Krovetz, "Viewing morphology as an inference process," in Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1993, pp. 191–202.
[32] R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng, "Parsing with compositional vector grammars," in Proceedings of the ACL Conference. Citeseer, 2013.
[33] S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, M. Gatford et al., "Okapi at TREC-3," NIST Special Publication SP, pp. 109–109, 1995.
[34] G. Amati, G. Amodeo, M. Bianchi, C. Gaibisso, and G. Gambosi, "FUB, IASI-CNR and University of Tor Vergata at TREC 2008 blog track," DTIC Document, Tech. Rep., 2008.
[35] R. L. Santos, R. M. McCreadie, C. Macdonald, and I. Ounis, "University of Glasgow at TREC 2010: Experiments with Terrier in blog and web tracks," in TREC, 2010.
[36] M. Bendersky, W. B. Croft, and Y. Diao, "Quality-biased ranking of web documents," in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. ACM, 2011, pp. 95–104.
[37] J. Carbonell and J. Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries," in Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1998, pp. 335–336.
[38] C. L. Clarke, N. Craswell, I. Soboroff, and A. Ashkan, "A comparative analysis of cascade measures for novelty and diversity," in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. ACM, 2011, pp. 75–84.
[39] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong, "Diversifying search results," in Proceedings of the Second ACM International Conference on Web Search and Data Mining. ACM, 2009, pp. 5–14.
[40] T. Sakai, "NTCIREVAL: A generic toolkit for information access evaluation," in Proceedings of the Forum on Information Technology, vol. 2, 2011, pp. 23–30.