Diversification for Multi-domain Result Sets - Semantic Scholar

2 downloads 40159 Views 112KB Size Report
Oct 28, 2011 - entities, like “Find an affordable house in a city with low criminality index, good ... notions of diversity for multi-domain result sets, compares.
Diversification for Multi-domain Result Sets ∗ Alessandro Bozzon, Marco Brambilla, Piero Fraternali, Marco Tagliasacchi Politecnico di Milano, Dipartimento di Elettronica e Informazione P.zza Leornardo Da Vinci 32, 20133 - Milano - Italy

{name.surname}@polimi.it

ABSTRACT

1. INTRODUCTION

Multi-domain search answers to queries spanning multiple entities, like “Find an affordable house in a city with low criminality index, good schools and medical services”, by producing ranked sets of entity combinations that maximize relevance, measured by a function expressing the user’s preferences. Due to the combinatorial nature of results, good entity instances (e.g., inexpensive houses) tend to appear repeatedly in top-ranked combinations. To improve the quality of the result set, it is important to balance relevance (i.e., high values of the ranking function) with diversity, which promotes different, yet almost equally relevant, entities in the top-k combinations. This paper explores two different notions of diversity for multi-domain result sets, compares experimentally alternative algorithms for the trade-off between relevance and diversity, and performs a user study for evaluating the utility of diversification in multi-domain queries.

Multi-domain search tries to respond to queries that involve multiple correlated concepts [2, 4]. Formally, multidomain queries can be represented as rank-join queries over a set of relations, representing the wrapped data sources [10]. Each item in the result set is a combination of objects that satisfy the join and selection conditions, and the result set is ranked according to a scoring function. Due to the combinatorial nature of multi-domain search, the number of combinations in the result set is normally very high, and strongly relevant objects tend to combine repeatedly with many other concepts, requiring the user to scroll down the list of results deeply to see alternative, maybe only slightly less relevant,objects. Improving the diversity of the result set is the aim of diversification, which can be defined in the context of multi-domain search as the selection of K elements out of a universe of N combinations, so to maximize a quality criterion that combines the relevance and the diversity of the objects of distinct types seen by the user. The contributions of the paper includes: (1) the formalization of the diversification problem in the context of multidomain search; following the approach in [9], we propose two criteria for comparing combinations (categorical and quantitative diversity); (2) a study of the behavior of three known greedy algorithms, testing the hypothesis that the diversification algorithms improve the quality of the result set with respect to a baseline constituted by the selection of the most relevant K combinations; (3) an evaluation of the perception and utility of diversification in multi-domain search with a user study based on selection tasks involving combinations of correlated objects. The paper is organized as follows: Section 2 presents the formalization of the problem and introduces different diversity measures for combinations; Section 3 illustrates the results of the experimental activity; Section 4 discusses previous work; Section 5 concludes and discusses future work.

Categories and Subject Descriptors H.3.3 [Information Search and Retrieval]: Search Process; H.3.4 [System and Software]: Performance Evaluation (efficiency and effectiveness); H.3.5 [On-line Information Services]: Web-based services

General Terms Theory, Experimentation, Performance

Keywords Result diversification, multi-domain search ∗The research leading to these results has received funding from the European Research Council under the 2008 Call for “IDEAS Advanced Grants” as part of the Search Computing (SeCo) project. We wish to thank A. Vaccarella and M. Follo for their contribution to the implementation of the query analysis framework.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM’11, October 24–28, 2011, Glasgow, Scotland, UK. Copyright 2011 ACM 978-1-4503-0717-8/11/10 ...$10.00.

2. MULTI-DOMAIN DIVERSIFICATION Consider a set of relations R1 , R2 , . . . , Rn , where each Ri denotes the result set returned by invoking a search service σi over a Web data source. Each tuple ti ∈ Ri =< i i i a1i , a2i , . . . , am > has schema Ri (A1i : D1i , . . . , Am : Dm i i i ), mi i where Am is an attribute of relation R and D is the asi i i sociated domain. As usual in measurement theory, we distinguish the domains Dki into categorical, when values admit only equality test, and quantitative, when values can be organized in vectors embedded in a metric space.

A multi-domain query over the search services σ1 , . . . σn is defined as a join query q = R1 1 . . . 1 Rn over the relations R1 , . . . , Rn , where the join predicate can be arbitrary. We call combination an element of the join τ = t1 1 · · · 1 tn =< 1 2 mn 1 >, and result set R the a11 , a21 , . . . , am 1 , . . . , an , an , . . . , an set of combinations satisfying the query q.

2.1 Relevance The goal of multi-domain search is to support the user in selecting one or more combinations from the result set of a multi-domain query, so to maximize the satisfaction of his information seeking task. To this end, combinations can be presented in order of relevance, according to a user-defined relevance score function S(τ, q), which can be assumed to be normalized in the [0, 1] range, where 1 indicates the highest relevance. When the result set R is sorted, e.g. in descending order of relevance, τk indicates the k-th combination of R. For instance, given the relations Hotel (HName, HLoc, HRating, HPrice), Restaurant(RName, RLoc, RRating, RPrice), and Museum(MName, MLoc, MRating, MPrice), consider a function city(), which returns the city where the geographical coordinates of a location belong to, and the multidomain query q = select * from Hotel, Restaurant, Museum where city(HLoc) = Milan ∧ city(RLoc) = city(HLoc) ∧ city(MLoc) = city(HLoc). The overall price of the combination S(τ, q) = sum(HPrice[th ], RPrice[tr ], MPrice[tm ]) could be used to rank hotel, restaurant and museum triples. Note that S is a simple linear (thus monotone) functions based solely on a subset of the attribute values of the constituent tuples, but more complex functions, not necessarily monotone, that might incorporate external knowledge (e.g. road maps) are allowed.

2.2 Diversity We consider two different criteria to express the similarity (or symmetrically, the diversity) of combinations: (1) Categorical diversity: two combinations are compared based on the equality of the values of one or more categorical attributes of the tuples that constitute them. As a special case, categorical diversity can be based on the key attributes: this means evaluating if an object (or sub-combination of objects) is repeated in the two combinations. (2) Quantitative diversity: the diversity of two combinations is defined as their distance, expressed by some metric function. In both cases, for each pair of combinations τu and τv , it is possible to define a diversity measure δ : R × R → [0, 1], normalized in the [0, 1] interval, where 0 indicates maximum similarity. j

ji,d

Definition 2.1. Let Ai i,1 , . . . , Ai i be a subset of di atji,d j tributes of relation Ri and vi (τ ) = [ai i,1 , . . . , ai i ]T the projection of a combination τ on such attributes. Categorical P diversity is defined as δ(τu , τv ) = 1 − n1 n 1 , v i=1 i (τu )=vi (τv ) where n is the number of relations Ri and 1 is the indicator function, returning one when the predicate is satisfied. Intuitively, categorical diversity is the percentage of tuples in two combinations that do not coincide on the attributes ji,d j Ai i,1 , . . . , Ai i . When these attributes are the key, categorical diversity can be interpreted as the percentage of objects that appear only in one of the two combinations. Definition 2.2. Let vi (τ ) be as in Definition 2.1. Let v(τ ) = [v1 (τ ), . . . , vn (τ )]T = [v1 (τ ), . . . , vd (τ )]T denote the

concatenation of length d = d1 + . . . + dn of such vectors. Given d user-defined weights w1 . . . , wd and a normalization constant qPδmax , quantitative diversity is defined as δ(τu , τv ) = d p 1 p l=1 wl |vl (τu ) − vl (τv )| δmax Quantitative diversity between combinations τu , τv is formalized as the (weighted) lp -norm of the difference between the vectors v(τu ) and v(τv ). The normalization constant can be chosen, e.g., as the maximum distance value between pairs of combinations in the result set.

2.3 Computing Relevant and Diverse Combinations Let N = |R| denote the number of combinations in the result set and RK ⊆ R the subset of combinations that are presented to the user, where K = |RK |. We are interested in identifying a subset RK which is both relevant and diverse. Fixing the relevance score S(·, q), the dissimilarity function δ(·, ·), and a given integer K, we aim at selecting a set RK ⊆ R of combinations, which is the solution of the following optimization problem [9]: R∗K = argmax F (RK , S(·, q), δ(·, ·)), where F (·) is an RK ⊆R,|RK |=K

objective function that takes into account both relevance and diversity. Two commonly used objective functions are MaxSum and MaxMin, as defined in [9]. Solving the problem for such functions is NP-hard, as it can be reduced to the minimum kcenter problem. Nevertheless, greedy algorithms exist [9], which give a 2-approximation solution in polynomial time. The underlying idea for those algorithm is to create an auxiliary function δ ′ (·, ·) and iteratively construct the solution by incrementally adding combinations in such a way as to locally maximize the given objective function. A λ > 0 parameter specifies the trade-off between relevance and diversity. Another objective function, closely related to the aforementioned ones, is Maximal Marginal Relevance (MMR), initially proposed in [3]. Indeed, MMR implicitly maximizes an hybrid objective function whereby relevance scores are summed together, while the minimum distance between pairs of objects is controlled.

3. EXPERIMENTS We conducted quantitative experiments and a subjective user study to understand how the considered diversification algorithms perform with respect to each other and to the non-diversified baseline. The experiments address both categorical and quantitative diversity measures. All the algorithms were evaluated using a λ = 1, thus giving equal importance to both diversity and ranking.

3.1 Evaluation measures The evaluation of diversity in multi-domain result sets must assess the ability of a given algorithm to retrieve useful and diverse tuples within the first K results in the query answer. We therefore adapted the α-Discounted Cumulative Gain (α-DCGK ) [5] to measure the usefulness (gain) of a tuple ti ∈ Ri – where R1 , . . . , Rn are the relations involved in a multi-domain query – based on its position in the result list and its novelty w.r.t. the previous results in the ranking. Therefore, we define α-DCGK as PK Pni=1 J (τk ,ti )(1−α)rti ,k−1 α-DCGK = , where J(τk , ti ) k=1 log (1+j) 2

10

No Div. MMR Max Sum Max Min

0.1 MD−Recall

α DCG

15

5 5

12 α DCG

10 8

15 20 a) αDCG Categorical

25

No Div. MMR Max Sum Max Min

30

MD−Recall

14

10

6

No Div. MMR Max Sum Max Min

0.05

0

0.02

5

10 15 20 b) MD−RECALL Categorical

25

30

10 15 20 d) MD−Recall Quantitative

25

30

No Div. MMR Max Sum Max Min

0.01

4 2

5

10

15 20 c) αDCG Quantitative

25

30

0

5

Figure 1: Experimental results for α-DCGK (a,c) and M D-RecallK (b,d) measures. returns 1 when tuple ti appears in a combination τk at position k in the result set and 0 otherwise. J is defined as J(τk , ti ) = 1πRi (τk )=ti , where πRi denotes the projection P over the attributes of Ri , and rti ,k−1 = k−1 j=1 J(τj , ti ) quantifies the number of combinations up to position k − 1 that contain the tuple ti . In our experiments, α is set to 0.5 to evaluate novelty and relevance equally. Sub-topic recall at rank K (S-RecallK ) [12] is a recall measure often applied to evaluate diversification algorithms. In multi-domain search, a tuple in a combination can be assimilated to a subtopic in a document. Therefore, multik Q |∪K k=1 Ri | domain recall at rank K – M D-RecallK = n , i=1 |Ri | measures, for each relation Ri involved in a query (with i = 1 . . . n) and for all rank positions k from 1 to K, the set of distinct tuples (Rik = {ti ∈ Ri |∃j ≤ k. πRi (τj ) = ti }) retrieved in the result set, with respect to the entire population of the relation (|Ri |).

3.2 Datasets Experiment has been performed on two usage scenarios. In the first scenario – Night Out (NO) – a user looks for a museum, a restaurant, and a hotel in Milan. We created a dataset consisting of the Hotel (50 tuples), Restaurant (50 tuples), and Museum (50 tuples) relations, computing the mutual location distances, and defining three quantitative relevance scores – the combination cost, the total walking distance, and the average ratings. In the Study Abroad (SA) scenario, a user looks for a U.S. university, considering the rating of the university, the quality of life in the area, and the overall cost, including accommodation. The supporting dataset consists of three relations University (60 US universities with their academic quality score1 , walkability score of the surroundings2 , and average tuition fee), State (including their crime rate), and Flat (1200 flats). Joins were performed on the state attribute. We defined three relevance scores: the overall yearly expenditure, a “quality” index 3 , and the distance between the university and the flat. 1 Source: http://archive.ics.uci.edu/ml/machine-learningdatabases/university/ 2 WalkScore - http://www.walkscore.com/ 3 A function of the academic quality score, the walkability index of a university and the crime rate in a state

3.3 Quantitative Evaluation Figure 1 shows the average results for multiple execution of 20 queries on both datasets; each query applies one of the score functions described in Section 3.2, and diversification according to categorical and quantitative distance. Each data point of the X-axis represents the k-th element in the result-set. Using the categorical diversity function, MMR and MaxMin always outperform the un-diversified baseline, while MaxSum does not provide significant improvements. One can also notice that MMR and MaxMin offer similar performance: this is not surprising, as the greedy algorithms for the two objective functions are also very similar. Using the quantitative distance, all algorithms provide similar performance; MMR and MaxMin degrade their performance w.r.t. categorical distance, and behave only slightly better than the baseline. This may also be influenced by the chosen quality measures (α-DCGK and M D-RecallK ), that are based on diversity of extracted objects and therefore are more suited to evaluate categorical diversification. Overall, these results support the hypothesis that diversification algorithms can improve the quality of multi-domain result sets.

3.4 User Study The goal of the user study was to assess the benefits in terms of user experience of adopting diversification algorithms. We considered only categorical distance, as it provided better performance in the quantitative analysis. We also excluded MaxMin from the evaluation, as provided performance comparable to MMR. The experiment was designed as a controlled betweensubject study, where 66 users4 were required to perform the Study Abroad task. A common user interface has been created, showing the combinations of one result set to let the user choose three preferred combinations. The interface presented combinations in a paginated list (10 combinations per page), and a shopping cart widget let users save their favorite results. Each participant was assigned to one of the three experiments in a round-robin fashion. For each experiment we collected the total task completion time and the 4 Bachelor and Master students; 95% male; age range from 20 to 28 years old.

25

Seconds

Page Number

300

200

100

0

15 10 5 0

No Div.

MMR

MaxSum

(a)

6. REFERENCES

20

No Div.

MMR

MaxSum

(b)

Figure 2: Average user completion time for the task (a) and average number of result pages explored (b).

number of visited result pages. Each participant took an average of 8 minutes to complete the task. Figure 2 shows the average and standard deviation of the completion time and the number of explored pages. We observe that MMR and MaxSum lead, on average, to 45% and 13% reduction in terms of task completion time, and 60% and 16% reduction in terms of number of explored pages, w.r.t. the baseline. A Kruskal-Wallis one-way, non parametric ANOVA test, provided evidence of the fact that, for each measure, the average values are different at significance level α = 0.01.

4.

RELATED WORK

Result diversification is a well-investigated topic: recent works overview and classify the existing approaches [8] and provide a systematic axiomatization of the problem [9]. A broad distinction can be done between the contributions that focus on diversifying search results for document collections [1, 7] and those that concentrate instead on structured data sets [6, 11]. Our work is mostly related to diversification applied to structured data. The work in [11] examines the diversification considering an attribute-based notion of similarity. Multi-domain diversification, as discussed in this paper, is a broader problem; it could be partially reduced to prefix-based diversification only in the case of categorical diversity. Keyword search in structured databases is addressed in [6], where diversification is applied to query interpretations, which are assumed to be available from the knowledge of the database content and query logs. Our work assumes, for simplicity, unambiguous queries and thus a fixed interpretation, but could reuse the interpretation diversification approach of [6] to cope for multi-domain searches with more than one possible interpretation.

5.

CONCLUSIONS

This work discusses how the diversification techniques well studied in the context of IR can be extended to support multi-domain search applications. We experimentally tested three diversification algorithms, showing that they can introduce a significant degree of diversification in the result set; we also conducted a user study, which demonstrated a positive perception of the utility of diversification by users. In the future, we plan to extend this work by characterizing the interplay between the score function and the similarity measure, to better understand the circumstances when relevance alone is sufficient to diversify; the goal is to end up with a clear methodology for application developers to choose the most promising scoring and diversity functions, based on the available statistics on data distribution, and user preferences.

[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM ’09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5–14, New York, NY, USA, 2009. ACM. [2] A. Bozzon, M. Brambilla, S. Ceri, and P. Fraternali. Liquid query: multi-domain exploratory search on the web. In WWW ’10: Proceedings of the 19th international conference on World wide web, pages 161–170, New York, NY, USA, 2010. ACM. [3] J. Carbonell and J. Goldstein. The use of mmr, diversity-based reranking for reordering documents and producing summaries. In SIGIR ’98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 335–336, New York, NY, USA, 1998. ACM. [4] S. Ceri and M. Brambilla, editors. Search Computing Trends and Developments, volume 6585. Springer, Mar. 2011. [5] C. L. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. B¨ uttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 659–666, New York, NY, USA, 2008. ACM. [6] E. Demidova, P. Fankhauser, X. Zhou, and W. Nejdl. Divq: diversification for keyword search over structured databases. In SIGIR ’10: Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pages 331–338, New York, NY, USA, 2010. ACM. [7] Z. Dou et al. Multi-dimensional search result diversification. In Proceedings of the fourth ACM international conference on Web search and data mining, WSDM ’11, pages 475–484, New York, NY, USA, 2011. ACM. [8] M. Drosou and E. Pitoura. Search result diversification. ACM SIGMOD Record, 39:41–47, September 2010. [9] S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In WWW ’09: Proceedings of the 18th international conference on World wide web, pages 381–390, New York, NY, USA, 2009. ACM. [10] D. Martinenghi and M. Tagliasacchi. Proximity rank join. Proceedings of the VLDB Endowment, 3:352–363, September 2010. [11] E. Vee, U. Srivastava, J. Shanmugasundaram, P. Bhat, and S. A. Yahia. Efficient computation of diverse query results. In ICDE ’08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, pages 228–236, Washington, DC, USA, 2008. IEEE Computer Society. [12] C. X. Zhai, W. W. Cohen, and J. Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In SIGIR ’03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, pages 10–17, New York, NY, USA, 2003. ACM.

Suggest Documents