Interactive Exploration of Multi-Dimensional Information ... - FORTH-ICS

UNIVERSITY OF CRETE
DEPARTMENT OF COMPUTER SCIENCE
FACULTY OF SCIENCES AND ENGINEERING

Interactive Exploration of Multi-Dimensional Information Spaces with Preference Support by

Panagiotis Papadakos

PhD Dissertation Presented in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

Heraklion, November 2013

UNIVERSITY OF CRETE
DEPARTMENT OF COMPUTER SCIENCE

Interactive Exploration of Multi-Dimensional Information Spaces with Preference Support

PhD Dissertation Presented by Panagiotis Papadakos in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

APPROVED BY:

Author: Papadakos Panagiotis

Supervisor: Tzitzikas Yannis, Assistant Professor, University of Crete

Committee Member: Plexousakis Dimitris, Professor, University of Crete

Committee Member: Savidis Anthony, Professor, University of Crete

Committee Member: Spyratos Nicolas, Professor Emeritus, University of Paris-South

Committee Member: Vassiliadis Panos, Assistant Professor, University of Ioannina

Committee Member: Rauber Andreas, Associate Professor, Vienna University of Technology

Committee Member: Paltoglou Georgios, Senior Lecturer, University of Wolverhampton

Department Chairman: Trahanias Panos, Professor, University of Crete

Heraklion, November 2013

To the sacred reflexive, symmetric and transitive relation of a student and a teacher

“The only principle that does not inhibit progress is: anything goes.” – Paul Karl Feyerabend Against Method (1976)

The drawing on the previous page sketches the following:

i) Preferences are part of the cognitive process of decision making.
ii) This dissertation takes advantage of multi-dimensional hierarchies.
iii) The process that any PhD student has to face: starting from a few unrealistic aims, going through the gradual immersion and disorientation in the ocean of available knowledge (a difficult and frustrating situation where the help of the advisor is appreciated), to the final gathering and integration of the contributions (see also the respective drawing on p. 175).

Image taken from the beamer XeLaTeX template available from https://github.com/drbunsen/drbunsen-beamer

Acknowledgments

The following pages cannot convey my feelings and the experiences that I gained during all these years of my Doctoral Dissertation odyssey. The blank space was filled by black ink in just a few seconds. Only a faint odour recalls the process of their imprinting. But my ‘imprinting’ was a long process. Different ‘typesetters’ wrote their words with different metal sorts in different places. Their printings have affected my scientific, artistic and human nature, and I owe them my present. These are the people I would like to thank.

The ‘End’ of this work could not have been typed without the undivided and unconditional support of my supervisor, Assistant Professor Yannis Tzitzikas. Through all these years his academic advice and directions were always ‘on the spot’. I am also grateful to him for his constant mentorship and for believing in me when I had lost my confidence. Although I was his first PhD student, he managed to accommodate the particularities of my personality and to stimulate my interest and enthusiasm. More importantly, just as blinkers keep horses from seeing what nature meant them to see, which is just about everything, I was taught to try to remove my mental blinkers.

I want to deeply thank Professor Dimitris Plexousakis, head of the Information Systems Laboratory (ISL), for the time he devoted to me all these years. He has been a critic and at the same time a supportive advisor. His insights have been really inspiring and crucial. As the head of the lab he created a highly creative and inspiring environment for me.

I would like to express my sincere appreciation to the third member of my advisory committee, Associate Professor Anthony Savidis, who was my supervisor during my MSc “3 Dimensional CRC” voyage in the Human Computer Interaction (HCI) lab. Although the initial plans of my PhD thesis moved into territory unfamiliar to him, he managed to understand my work and provide guidance and comments.

Furthermore, I am indebted to the other members of my examination committee, Professor Emeritus Nicolas Spyratos, Assistant Professor Panos Vassiliadis, Associate Professor Andreas Rauber and Senior Lecturer Georgios Paltoglou, for their constructive comments and suggestions.

I was fortunate to meet Irini Fundulaki and Kostas Stefanidis, two researchers who helped me a lot to gain self-confidence. Irini motivated a number of interesting discussions that helped me understand my work more deeply. Kostas is the motivating example of a young, smart, capable and passionate researcher. I wish him all the best in his career.

Moreover, I would like to acknowledge the support of the Institute of Computer Science of the Foundation for Research and Technology - Hellas (FORTH-ICS), and especially the ISL, both financially and for the facilities (the lights of the laboratory were sometimes kept on until early morning). It is a nice place to be, with exceptional people who elicit inspiring discussions.

This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: “Herakleitus II. Investing in knowledge society through the European Social Fund”. Despite the above formal words that I have to write, this financial support has been really important for me, especially during this period of financial crisis.

Finally, I want to thank the following persons with whom I spent a lot of time all these years: Nikos Tsagkarakis for all the time that we spent together, our discussions, the summits we reached, for cultivating ‘our’ vineyard and drinking the raki ‘spirit’ we produced, Anna-Maria Papadaki for being an ‘earthy’ human being, Aristea Papadimitriou for her gaze and our philosophical discussions, Georgia Troullinou for taking care of me when I was for a second time an ‘infant’, Michalis Papadakis for his ‘amanedes’, Dimitris Robotis for cooking on Sundays, Andreas Sfakianakis for being bald, Despoina Pavlidi for the house in Panagia, Sofia Skandali for our trips, Christina Lantzaki for our interesting discussions on graphs, Nikos Manolis and Maria Psaraki for the times in the ‘Lefka’ basements, “Aksas” for not listening to his name Nikos, my colleagues at FORTH, Theodore Patkos, Yannis Marketakis, Pavlos Fafalios, Nikos Armenatzoglou, Yannis Kitsos, Dimitris Andreou, Stella Kopidaki and Yannis Kargakis, Corina Doerr for the nice logo of Hippalus and Ionas for the name Hippalus, Yannis Roussakis for ping-pong, my neighbours in the lab, George Baryannis and Chrysostomos Zeginis, as well as Ioannis Chrysakis, Dimitra Zografistou, Roula Avgoustaki, Lida Charami, Athina Kritsotaki, Irini Maravellia and Manos Papadakis for their patience, Maria Moutsaki for ‘scanning’ and Dimitris Aggelakis for ‘windows’, George Konstantinidis for our friendship before he left Greece, and Dimitra Makri for her understanding.

This work is a result of the constant support of my parents, Stavros and Maria, and my two sisters, Stavroula and Katerina. They always believed in me and supported me in any possible way.

Abstract

Users access large amounts of information resources (documents or data) mainly through search functions, where they type a few words and the system (web search engine, query engine) returns a linear list of hits. While this is often satisfactory for focalized search, it does not provide enough support for recall-oriented (exploratory) information needs, which constitute the majority according to various user studies. The interaction model of Faceted and Dynamic Taxonomies (FDT) is a highly prevalent model for exploratory search. It allows users to get an overview of the information space (e.g. search results) and offers them various groupings of the results (based on their attributes, metadata, or other dynamically mined information). These groupings enable users to restrict their focus gradually and in a simple way (through clicks, i.e. without having to formulate queries), enabling them to locate resources that would be difficult to locate otherwise (especially the low-ranked ones). The enrichment of search mechanisms with preferences could prove useful for recall-oriented information needs. However, the current approaches for preference-based access (mainly from the area of databases) seem to ignore the fact that users should be acquainted with the information space and the available choices in order to describe their preferences effectively. In this dissertation we extend the interaction model of FDT with preference actions that allow users to express their preferences interactively, gradually, and in a simple way.

Initially, we introduce a preference framework appropriate for information spaces comprising resources described by attributes whose values can be hierarchically organized and/or multi-valued. We define the language, its semantics and the required algorithms. The framework supports preference inheritance in the hierarchies, automatic conflict resolution, as well as preference composition (prioritization, Pareto and their combination). Subsequently, we enrich the FDT model with preference actions and we propose logical optimizations and methods for exploiting the intrinsic characteristics of the FDT-based interaction, aiming at making it applicable to large amounts of information. Then, we present the design and the implementation of the web-based system Hippalus, which realizes the extended interaction model.

Regarding user benefits, we first theoretically analyze user gain in terms of the number and difficulty of choices, and then we describe and analyze three user-based evaluations that we have conducted. The first investigates the degree of effectiveness of preferences (and the effort to express them) when users are not aware of the available choices. The results showed that only 20% of the users managed to express effective preferences without knowing the available choices. The second comparatively evaluates FDT and other exploratory models. The results showed that the majority of users preferred FDT, were more satisfied by FDT and achieved higher rates of task completion with FDT. The last one concerns the evaluation of the preference-enriched FDT as realized by Hippalus. The results were impressive. Even on a very small dataset, with the preference-enriched FDT all users successfully completed all tasks in 1/3 of the time and with 1/3 of the actions in comparison to the plain FDT. Moreover, all users (100%), both plain and expert, preferred the preference-enriched interface.

Keywords: Preferences, Exploratory Search, Interactive Information Retrieval, Decision Making

Supervisor: Tzitzikas Yannis, Assistant Professor, Computer Science Department, University of Crete

Extended Abstract (Περίληψη)

Users typically access large volumes of information resources (data or documents) through search functions, where they traditionally type a few keywords and the search system (e.g. a search engine or a query evaluation system) returns a linear list of hits. Although this is satisfactory for focalized search, responses of this type do not adequately support exploratory (recall-oriented) needs, which, according to various studies, are the majority. A by now widespread model of exploratory search is interaction through Faceted and Dynamic Taxonomies (FDT). This model lets users get an overview of the information space, e.g. the results of a search, by offering them various groupings of the results (based on their attributes, their metadata, or other dynamically mined information). These groupings allow users to restrict their focus gradually and in a simple way (plain clicks), i.e. without having to formulate queries, and ultimately to find resources that would be hard to find in the linear result list because of their low ranking. Enriching search mechanisms with preferences could prove particularly useful for exploratory (recall-oriented) needs. However, current approaches to preference-based information access (which come mainly from the database area) ignore the fact that users must be familiar with the information space and the available choices in order to describe their preferences effectively. In this dissertation we extend the FDT interaction model with actions that allow users to express their preferences interactively, gradually, and in a simple way.

Initially we introduce a preference model suitable for information spaces consisting of resources described by attributes whose values may be hierarchically organized and/or multi-valued. We define the language, its semantics and the related algorithms. The model supports preference inheritance in the hierarchies and automatic conflict resolution, as well as preference composition operators (prioritization, Pareto and their combination). Subsequently we enrich the FDT model with preference actions and propose various optimizations and ways of exploiting the intrinsic characteristics of FDT, so that the model is applicable to large volumes of information. We then present the design and implementation of the web-based system Hippalus, which implements the extended interaction model.

Regarding the benefit to the user, we first analyze the benefits theoretically, in terms of the number of choices and the difficulty of the decisions the user has to make, and then describe and analyze the results of three user evaluations. The first investigates the degree of effectiveness of preferences (and the effort of formulating them) when the user has no knowledge of the available choices. The results showed that only 20% of users can express effective preferences without knowledge of the available choices. The second evaluates the efficiency of FDT against other exploratory models; the results showed that FDT was preferred by most users, offered greater satisfaction and led to higher task completion rates. The third concerns the preference-extended FDT model and was carried out using the Hippalus system. The results were impressive. Even on very small collections, with the preference-enabled interface all users successfully completed all tasks in 1/3 of the time (!) and with one third of the actions compared to plain FDT. In addition, 100% of the users, plain and expert, preferred the preference-enriched interface.

Keywords: Preferences, Exploratory Search, Interactive Information Retrieval, Decision Making

Supervisor: Tzitzikas Yannis, Assistant Professor, Computer Science Department, University of Crete

Contents

Acknowledgments
Abstract
Extended Abstract (Εκτεταμένη Περίληψη)
Table of Contents
List of Figures
List of Tables

1 Introduction
  1.1 Objective
  1.2 Context, Approach and Research Questions
  1.3 Contribution
  1.4 Produced Publications
  1.5 Thesis Outline

2 Background and Related Work
  2.1 Context: Exploration through FDT
  2.2 Preference Management in General
    2.2.1 Various Perspectives of Preference Management
  2.3 Faceted and Dynamic Taxonomies (FDT) and Preferences: Motivation
  2.4 The Database World
  2.5 IR and Preferences
  2.6 FDT and Preferences: Past and Related Works
  2.7 Motivation and Running Example

3 A Preference Framework for Multidimensional Information Spaces
  3.1 Syntax of the Language
  3.2 The Domain of Semantics
  3.3 Syntax to Semantics
    3.3.1 Flat Single-Valued Attributes
    3.3.2 Set-Valued Attributes
    3.3.3 Best/Worst Preferences over Hierarchically Organized Values
    3.3.4 Relative Preferences over Hierarchically Organized Values
    3.3.5 Preferences over Hierarchical Set-Valued Attributes
  3.4 Multi-Facet Preferences
    3.4.1 Prioritized Composition
    3.4.2 Pareto Composition and Best Matches Only (BMO)-set
    3.4.3 Combination of Priority and Pareto Compositions
  3.5 A Complete Example

4 Complexity and Optimizations
  4.1 Computational Complexity
  4.2 Optimizations for Deriving the Preference-based Order
    4.2.1 An Algorithm based on the Focal Object Set
    4.2.2 Optimizations for Capturing Set-Valued Attributes and Top-K Requirements
  4.3 Optimizations for Multi-Facet Preferences
    4.3.1 Prioritized Composition
    4.3.2 Pareto Composition
    4.3.3 Combination of Priority and Pareto Compositions

5 Applicability and the System Hippalus
  5.1 Application in Searching
    5.1.1 Case: Web Searching
    5.1.2 Case: Relational Databases
    5.1.3 Case: RDF Bases
  5.2 Hippalus: A Preference Enriched Faceted Exploratory System
    5.2.1 Software Architecture
    5.2.2 Visualization and User Interface
    5.2.3 Interaction Example

6 Evaluation
  6.1 Evaluation Approaches & Metrics
    6.1.1 Metrics for Exploratory Search
    6.1.2 Metrics Related to the Proposed Interaction Scheme
  6.2 Theoretical Analysis of the Number of User Decisions and Effort in FDT
  6.3 DiFEPreKO Hypothesis
    6.3.1 Analytical Comparison
    6.3.2 User Study
  6.4 Evaluation of Various Exploration Approaches
  6.5 Evaluation of Hippalus System
  6.6 Evaluation Conclusion

7 Conclusion and Future Research
  7.1 Synopsis of Contributions
  7.2 Directions for Future Work and Research

Appendices
  A Complete Syntax of Preference Language
  B Binary Relations
  C Acronyms

List of Figures

2.1 Dynamic Taxonomies
2.2 Finding a Hotel in the Island of Symi (FDT over booking.com)
2.3 Checking Olympus Cameras (FDT over eBay)
2.4 FDT-based GUI of the Mitos WSE
2.5 Distinctions of Preference Management Approaches
2.6 SciNet Prototype
2.7 Example Taxonomies
3.1 Hasse Diagram of Preference Relation Over E (E, R≻)
3.2 Example for Flat Single-Valued Attributes
3.3 Example for a DAG
3.4 Example for Flat Multi-Valued Attributes
3.5 Example of Preferences Without Exploiting Hierarchies
3.6 Hasse Diagram of Actions Refinement
3.7 Taxonomy of Manufacturers
3.8 Hasse Diagram of Scope-Based Ordering of Preference Actions
3.9 Examples of Conflicts
3.10 Relative Inherited Preferences and Conflicts Examples
3.11 Examples of Cycles of the Form e ≺ e′ ≺ e
3.12 Hasse Diagram of the Relation R for the Manufacturer Attribute
3.13 Scope Based Ordering of Actions (Left for Best/Worst Actions, Right for Relative Preference Actions): Complete Example
3.14 Hasse Diagram for the Relation Rbw: Complete Example
3.15 Hasse Diagram for the Relation R≻: Complete Example
3.16 Hasse Diagram for the Relation R: Complete Example
3.17 Hasse Diagram for Ordering Multi-Valued Attributes According to MoreWins Rule: Complete Example
3.18 Hasse Diagram for Ordering Multi-Valued Attributes According to MoreGoodLessBad Rule: Complete Example
5.1 Processes of Web Searching and Exploratory Web Searching
5.2 Process of Exploratory Web Searching Enhanced with Preference Actions
5.3 Mitos GUI for Exploratory Web Searching
5.4 Facets and Zoom-Points of Running Example
5.5 Example of RDF/S
5.6 System Architecture
5.7 The RDF Knowledge Base
5.8 Hippalus: The Main Page of Hippalus
5.9 Hippalus: Value Expansion - Object Restriction
5.10 Hippalus: Expression of Relative Preference Korean ≻ European
5.11 Hippalus: (a): Expressing Preferences, (b): Object Restrictions after Preference Expression
5.12 Hippalus: Composition of Preference Actions. Manufacturer Prioritized to Price
5.13 Hippalus: Composition of Preference Actions. Price Prioritized to Manufacturer
5.14 Hippalus: Composition of Preference Actions. Default Combination Mode
5.15 Hippalus: Restricted Focus with Preferences Applied
6.1 Available IR Metrics
6.2 Evaluation Step B: Users Select a Car from the List (1st page)
6.3 Evaluation Step B: Users Select a Car from the List (2nd page)
6.4 Probabilities and Distribution Function of the Binomial Distribution
6.5 Comparative Evaluation Process
6.6 Plurality and Borda results for (a) Ease of Use, (b) Usefulness, (c) Preference and (d) Satisfaction
6.7 Average Values in Last Step of Each Task. (a) for Timings (T) and Actions (A), while (b) Depicts the Values for Recall (R), Precision (P) and Average Precision (AP)

List of Tables

2.1 Basic Notions and Notations
3.1 Scopes (Direct and Under Inheritance)
3.2 Scopes: Example for Best/Worst Preferences
3.3 Scopes: Example for Relative Preferences
3.4 Complete Example: Scopes and Active Scopes
3.5 Tuples in Database: Complete Example
4.1 PrefOrderOpt Changes for Capturing Relative Preferences Over Hierarchically Organized Values
4.2 Complexity for Non-Optimized and Optimized Alg. PrefOrder and PrefOrderOpt
6.1 Choices and Number of Clicks
6.2 Example of Hypothesis Evaluation Results
6.3 Results of the hypothesis evaluation
6.4 Percentages of the 30 Users that Expressed a Preference Over a Valid Attribute
6.5 Graeco-Latin Square Design
6.6 Plain and Expert Users Average, Max and Min Timings and User Actions for each Task for both UIs per each User Group
6.7 Plain and Expert Users Average, Max and Min Timings and User Actions per each Task and All Tasks for both UIs
6.8 Plain and Expert Users Recall, Precision and Average Precision Metrics per each and all Tasks for both UIs

Chapter 1

Introduction

Contents

1.1 Objective
1.2 Context, Approach and Research Questions
1.3 Contribution
1.4 Produced Publications
1.5 Thesis Outline

1.1 Objective

In one sentence, the general objective of this thesis is to offer users a flexible and effective method for accessing large amounts of data, one that supports recall-oriented information needs and decision making.

1.2 Context, Approach and Research Questions

Users access large amounts of information resources (documents or data) mainly through search functions, where the user types a few words and the system (web search engine, query engine) returns a 1

2

Chapter 1. Introduction

linear list of hits. While this is often satisfactory for focalized search, where the user knows exactly what he is looking for and can be satisfied by a single hit, it does not provide enough support for recalloriented (exploratory) information needs, which are the majority according to various user studies (Rose and Levinson (2004); Crawford (2006)). Below we describe in brief the “world of unstructured data” and the “world of structured data”. Information Retrieval (IR) is the area of study concerned with the processes by which user queries to information systems are matched against stored objects (in principle full-text documents), which are finally returned to the user. Mainly, it is a system-based approach that does not take into consideration the user, except during query formulation. However, researchers recently have started trying to understand the role of users in IR, since there is a belief that we cannot design effective IR systems without knowing how users interact with them (Kelly (2009)). This has led to the development of Interactive Information Retrieval (IIR), where users are studied along with their interactions with systems and information. Traditional IR abstracts human interactions and experiences out of the evaluation of a retrieval system, and as a result leads to suboptimal IR systems. The interest of the community for a TREC-style evaluation framework for studying interaction and users led in three recent Tracks. These Tracks include the Interactive Track (TRECs 3-11), the HARD Track (TRECs 12-14) and ciQA(TRECs 15-16)1 . Each one of them experimented with different evaluation frameworks but none of them was successful in establishing a generic evaluation framework (Kelly (2009)). On the other hand, the recent applications must cope with a wide range of data, which can be unstructured (full text documents), semi-structured (XML) or structured (databases, linked-data). 
Furthermore, a plethora of new tasks, quite different from the classical query evaluation task, are being performed: from data mining algorithms and machine learning to collaborative recommendation and filtering. As a result, there is an interest in the integration of IR and databases, as in Papadakos et al. (2008a); Li et al. (2011), and in the exploitation of available techniques from both scientific areas in a user-friendly way.

In the world of structured information (e.g. databases, the Semantic Web) users are offered powerful and expressive languages to query the underlying information. However, such query languages are not directly utilized by end users, since the formulation of queries is a laborious and difficult task for them (Reisner (1981)). Consequently, efforts for exploiting such languages in user-friendly general-purpose models of exploration/navigation have come up (e.g. Chakrabarti et al. (2004); Oren et al. (2006); Mäkelä et al. (2006); Hildebrand et al. (2006); Becker and Bizer (2009); Le Phuoc et al. (2010); Ferré and Hermann (2012)). For instance, the interaction scheme of Faceted and Dynamic Taxonomies (FDT) (Sacco and Tzitzikas (2009)) allows users to explore the information space and is suitable for recall-oriented tasks, such as the ones found in Exploratory Search (ES)² and decision making environments. By using the FDT interaction scheme, users can restrict their focus (object set) without having to formulate queries, through a small set of actions (zoom in/out/side). Each action corresponds to a query (formulated on-the-fly) which can be enacted by a simple click. This approach can be applied over the results of an IR system (simple user query with relative terms), a relational database (by using available query languages like SQL) or data published under Semantic Web technologies like the RDF/S or OWL data models (querying them using SPARQL, SQWRL, etc).

In this dissertation we investigate how we can extend these actions with preference management. Such an extension can further ease the interaction and speed up the restriction of the focus to those parts of the information space that the user is interested in. Such actions can be especially beneficial for mobile devices and User Interfaces (UIs) with limited screen real estate (e.g. smartphones and tablet computers), which need special interfaces (such as the one proposed by Neumann and Schmeier (2012)). To this end, we extend the FDT interaction with user-specified preferences. In other words, FDT allows constructing queries by simple navigation/exploration actions, and this work extends this set of actions in order to offer preference-compliant exploration.

Works on preference management over structured data (e.g. Kießling (2002); Chomicki (2003); Andreka et al. (2002); Kießling and Kostler (2002)) require either that the user formulates complex preference queries, or that the application developer builds specialized interfaces which internally construct such queries. Moreover, and more importantly, for formulating an effective preference specification the user should already be acquainted with the information space and the available choices, which could be unknown, as in the case of web databases (Stefanidis et al. (2011a)). In addition, in this work we formulated the hypothesis that, without knowing the available choices, the declarative expression of preferences is a tiresome and time-consuming process, and we verified it through a user study.

¹ http://trec.nist.gov/tracks.html
² The ES initiative aims to provide users with better tools for advanced information seeking tasks such as learning, investigation and analysis, according to Marchionini (2006).

The above observations justify the need for flexible and universal (i.e. general purpose) access methods that offer exploration services and real-time preference elicitation. The requirements for such exploratory environments include: a) generality: they should be capable of capturing a wide range of information spaces and user information needs, b) expressiveness: it should be possible for the user to interactively specify complex preference structures, and c) usability: the users should be able to use and understand the interaction immediately, and the resulting interaction should be effective and desired by the users. As a result, the main research questions that arise from the above are:
• How can we gradually and flexibly specify preferences over information spaces that might be hierarchically organized and might support multi-valued attributes, and what will their semantics be?
• How can we tackle the algorithmic perspective so that the proposed preference-extended FDT interaction can be applied over large information bases?
• How does the preference-extended FDT interaction affect the user effort and other metrics during exploratory tasks?

1.3 Contribution

The key points and contributions of this thesis are:
• It introduces an interaction model for preference elicitation during FDT exploration. Most works on preference management focus only on the order of objects, while this work also focuses on the other aspects of the FDT interaction scheme (facets/zoom-points), i.e. on the order of the “queries” the user can select for changing his focus.
• At first we introduce a preference framework appropriate for information spaces comprising resources described by attributes whose values can be hierarchically organized and/or multi-valued. We define the language, its semantics and the required algorithms. The framework supports preference inheritance in the hierarchies, automatic conflict resolution, as well as preference composition (prioritization, Pareto and their combination).
• It elaborates on the system and algorithmic perspective of the proposed approach, and introduces methods that allow applying the approach over large information spaces.
• Subsequently, we present the design and the implementation of the web-based system Hippalus, which realizes the extended interaction model.
• Regarding the benefits for the users, at first we analyze theoretically the user gain in terms of number and difficulty of choices.
• Then we describe and analyze three user-based evaluations that we have conducted. The first investigates the degree of effectiveness of preferences (and the effort to express them) when the user is not aware of the available choices. The results show that only 20% of the users managed to express effective preferences without knowing the available choices. The second comparatively evaluates FDT and other exploratory models. The results show that the majority of users preferred FDT, were more satisfied by FDT, and achieved higher rates of task completion with FDT. Finally, the last one evaluates the preference-enriched FDT as realized by Hippalus. The results were impressive. Even in a very small dataset, with the preference-enriched FDT all users completed successfully all tasks in 1/3 of the time and with 1/3 of the actions in comparison to the plain FDT. Moreover, all (100%) of the users (either simple or experts) preferred the preference-enriched interface.

To the best of our knowledge this is the first work that supports the above.

1.4 Produced Publications

The research activity related to this thesis has so far produced 2 journal, 3 conference, 1 workshop and 1 demo papers along with 2 technical reports, which are briefly described below:

Related to the application of FDT over a Web Search Engine (WSE)
• The DEXA’08 Workshops paper Tzitzikas et al. (2008) describes FleXplorer, a framework for FDT that can manage millions of objects in real-time and is used by the Mitos WSE³.
• The idea of combining the interaction scheme of FDT with on-line clustering algorithms was described in the conference paper Papadakos et al. (2009a), presented at ECDL’09, and also in HDMS’09 (Papadakos et al. (2009b)).
• The ECDL’09 Doctoral Consortium workshop paper Papadakos (2009) describes the initial vision of this dissertation.
• The WISE’09 conference paper Kopidaki et al. (2009) describes a snippet-based clustering algorithm named NM-STC, which is used by the previous work.
• The KAIS’12 journal paper Papadakos et al. (2012a) proposes the exploitation of both static and dynamic metadata for the FDT interaction scheme, studies an incremental way of speeding up the exploration process of this approach and provides the results of an experimental user study over Mitos.

Extension of FDT with preferences
• The FI’12 journal paper Tzitzikas and Papadakos (2013) motivates the need for real-time preference elicitation and introduces a language for enriching the interaction scheme of FDT with preference elicitation and preference-based interaction. Key aspects of the proposed approach include the support of hierarchically organized values, the support of set-valued attributes, and the incremental preference specification mode, with the scope-based method for resolving conflicts. The proposed algorithms take advantage of the rapid reduction of the information space through the use of FDT, and are independent of the size of the information base.
• A demo paper, showcasing an implementation of the above functionality over an RDF exploratory system, is described in Papadakos et al. (2012b).

Related to IR indexing and querying
• The technical report Papadakos et al. (2008b), published in CORR’08, describes Mitos, a DBMS-based WSE that provides the FDT interaction scheme.
• The DBMS-index of Mitos is discussed in the PCI’08 conference paper Papadakos et al. (2008a), where different database representations are discussed and compared.
• An extension of the previous work with one additional representation and experimental results is provided in the technical report Papadakos et al. (2009c), published in CORR’09.

Submitted and under review
• An article based on the hypothesis user study described in Section 6.3 has been submitted to the International Journal of Information Technology & Decision Making. The title of the article is “Comparing the Effectiveness of Intentional Preferences versus Preferences over Specific Choices: A User Study”.
• A paper that describes in detail the Hippalus system and discusses the results of the evaluation in Section 6.5 has been submitted for review to the 1st International Workshop on Exploratory Search in Databases and the Web (ExploreDB 2014), with the title “Hippalus: Preference-enriched Faceted Exploration”.

³ Under development by the Department of Computer Science of the University of Crete and FORTH-ICS (http://groogle.csd.uoc.gr:8080/mitos/).

1.5 Thesis Outline

The rest of this thesis is organized as follows. Chapter 2 provides the required background information on FDT and preference management, and discusses related work. Chapter 3 defines the syntax and semantics of a preference specification language for multi-dimensional hierarchical information spaces. Chapter 4 elaborates on the algorithmic perspective of the proposed approach and introduces a number of optimizations which are crucial for the applicability of the framework. Chapter 5 examines implementation and application issues of the approach. Chapter 6 discusses user effort, checks the validity of the hypothesis of this thesis and examines the results of a number of user-based evaluations. Finally, Chapter 7 concludes the thesis and identifies issues that are worth further work and research.

Chapter 2

Background and Related Work

Contents
2.1 Context: Exploration through FDT
2.2 Preference Management in General
2.2.1 Various Perspectives of Preference Management
2.3 FDT and Preferences: Motivation
2.4 The Database World
2.5 IR and Preferences
2.6 FDT and Preferences: Past and Related Works
2.7 Motivation and Running Example

In this chapter we discuss the background and the related work regarding FDT and preferences. Specifically, Section 2.1 reviews and provides notions and notations for FDT. Regarding preferences and personalization, Section 2.2 reviews preference management in general. In Section 2.3 we motivate the enrichment of FDT with preference actions. Section 2.4 and Section 2.5 discuss preferences under the prism of the database and IR world respectively. Finally, Section 2.6 discusses related work in the context of decision making and preferences over FDT, while Section 2.7 provides the motivating example of this thesis.

2.1 Context: Exploration through FDT

Most Database (DB) and IR applications, as well as most Web Search Engines (WSEs), are very effective for focalized search, i.e. they make the assumption that users can accurately describe their information need using a query, which is usually a small sequence of words. However, as several user studies have shown, a high percentage of search tasks are exploratory (Crawford (2006); Rose and Levinson (2004)): the user does not know his information need accurately (e.g. in WSEs users provide on average 2.4 words, as described in Inan (2006)) and he cannot be satisfied by a single ‘hit’. The information needs emerge as users iteratively seek, learn and reflect on the gathered results during the session (Byström and Järvelin (1995); Chowdhury et al. (2011)). As a result, focalized search very commonly leads to inadequate interactions and poor results. The available UIs in most cases do not aid the user in query formulation, and do not provide any exploration services. The returned answers are simple ranked lists of results, with no organization. For this reason users typically reformulate their initial query, inspect the top elements of the returned answer, and so on.

One approach to this problem is results clustering (Hearst and Pedersen (1996); Zamir and Etzioni (1998); Kopidaki et al. (2009)), which provides an overview of the search results. A survey of web clustering engines is provided in Carpineto et al. (2009). Results clustering aims at grouping the results into topics, called clusters, with predictive names (labels), aiding the user to quickly locate documents that he otherwise would not find in practice, especially if these documents are low ranked (and thus not in the first result pages). The snippets, the cluster labels and their structure are one instance of what we call dynamically-mined metadata. We use this term to refer to metadata which are query-dependent, i.e. they cannot be extracted/mined a-priori. The problem with clustering is that such metadata are usually mined only from the top-K part of a query answer, because it would be unacceptably expensive (for real-time interaction) to apply these tasks on large quantities of data. In addition, the lack of predictability, the fact that a number of algorithms create clusters with common results, the difficulty of labeling the groups (at least for non snippet-based algorithms) and the counterintuitiveness of cluster sub-hierarchies make the explicit use of clustering difficult for the users (Hearst (2006)).

Other approaches to exploratory search (Meij et al. (2009); Shokouhi and Radinsky (2012); Fafalios et al. (2012b); Shokouhi (2013)) include completions, either of a single term the user is typing in or of the entire query, and auto suggestions representing the full user query intent. Such completions also include result completions (i.e. the user is presented with a number of results according to the keywords he has typed). Such approaches have been used for a while by mainstream search engines like Google¹, Freebase², commercial sites like eBay, or Evi³, which is an answer engine.

On the other hand, modern environments should guide users in exploring the information space and in expressing their information needs in a progressive manner. Systems supporting FDT offer a simple, efficient and effective way for explorative tasks and are discussed in Sacco and Tzitzikas (2009). Dynamic taxonomies (faceted or not) is an interaction framework based on a multi-dimensional classification of (possibly heterogeneous) data objects, allowing users to browse and explore the information space in a guided, yet unconstrained way through a simple visual interface. Features of this framework include: (a) display of current results in multiple categorization schemes (called facets - or just attributes), (b) display of categories (i.e. attribute values) leading to non-empty results only, (c) display of the count information of the indexed objects of each category (i.e. the number of results the user will get by selecting that category), and (d) the user can refine his focus gradually, i.e. it is a session-based interaction paradigm, in contrast to the query-and-response dialog of current WSEs, which is stateless.

Figure 2.1: Dynamic Taxonomies

An example of the idea of dynamic taxonomies assuming only one facet is shown in Figure 2.1. Figure 2.1(a) shows a taxonomy comprising 10 terms (European, Italian, Spanish, German, Fiat, Lancia, Seat⁴, VW, BMW, Audi) and 8 indexed objects (1-8). Figure 2.1(b) shows the dynamic taxonomy if we restrict our focus to the objects {4,5,6}. Notice that it comprises only 6 terms, those that lead to objects in {4,5,6}. Figure 2.1(c) sketches user interaction, based on the restriction shown in Figure 2.1(b) (e.g. at the left side bar). Notice the count number next to each term.

¹ http://www.google.com
² http://www.freebase.com/
³ http://www.evi.com/
⁴ Since Seat was bought by VW we assume that cars built by Seat are both Spanish and German.

User Interaction. The user explores or navigates the information space by setting and changing his focus. The notion of focus can be intensional or extensional. Specifically, any conjunction of terms (or any boolean expression of terms in general) is a possible focus. In this case we say that the focus is defined intensionally. For example, the initial focus can be empty, or the top term of a facet. However, the user can also start from an arbitrary set of objects, and this is the common case in the context of a WSE. In that case we say that the focus is defined extensionally. Specifically, if A is the result of a free text query q (or if A is a set of tuples returned by an SQL query q), then the interaction is based on the restriction of the faceted taxonomy on A (Figure 2.1(b) shows the restriction of a taxonomy on the objects {4,5,6}). At any point during the interaction, we compute and provide to the user the immediate zoom-in/out/side points along with count information (as shown in Figure 2.1(c)). When the user selects one of these points, the selected term is added to the focus (corresponding to a more refined query), and so on.

Notions and Notations. Table 2.1 formally defines and introduces notations for terms, terminologies, taxonomies, faceted taxonomies, interpretations, descriptions and materialized faceted taxonomies, which will be used in the sequel. The upper part of the table describes taxonomies. The middle part of the table describes materialized faceted taxonomies, which is actually the kind of information sources that we consider. In brief, Obj is a set of objects (e.g. the set of all documents indexed by a WSE), each described with respect to one or more aspects (facets), where the description of an object with respect to one facet consists of assigning to the object one or more terms from the taxonomy that corresponds to that facet. $I$ is the interpretation function, while $\bar{I}$ also takes into account the semantics of the taxonomy (i.e. the subsumption relation). For example, assuming the example of Figure 2.1(a), we have $I(\text{Lancia}) = \{2, 3\}$ and $I(\text{Italian}) = \emptyset$, while $\bar{I}(\text{Lancia}) = \{2, 3\}$ and $\bar{I}(\text{Italian}) = \{1, 2, 3\}$. The lower part of the table describes the FDT-interaction. For example, Figure 2.1(b) depicts the restriction over the set $A = \{4, 5, 6\}$, and the reduced terminology $T_A$ is the set of shown terms.

Scalability. Regarding scalability, we should mention that FDT can provide real-time exploration services for millions of objects using techniques like those proposed in Yee et al. (2003); Sacco (2006a); Ben-Yitzhak et al. (2008). Thorough experimental results over FleXplorer are given in Tzitzikas et al. (2008).
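The difference between the interpretation I and its model Ī, and the effect of restricting the taxonomy to an object set A, can be illustrated with a minimal Python sketch. The encoding below is only an assumption in the spirit of Figure 2.1: the text fixes I(Lancia) = {2, 3}, I(Italian) = ∅, the fact that Seat is both Spanish and German, and the restriction to {4, 5, 6}; the remaining object assignments and all function names are hypothetical, chosen so that the numbers quoted in the text are reproduced.

```python
# Direct broader terms of each term (hypothetical encoding of Figure 2.1(a)).
BROADER = {
    "European": set(),
    "Italian": {"European"}, "Spanish": {"European"}, "German": {"European"},
    "Fiat": {"Italian"}, "Lancia": {"Italian"},
    "Seat": {"Spanish", "German"},
    "VW": {"German"}, "BMW": {"German"}, "Audi": {"German"},
}

# Interpretation I: term -> directly indexed objects (assumed assignment).
I = {
    "Fiat": {1}, "Lancia": {2, 3}, "Seat": {4},
    "VW": {5}, "BMW": {6}, "Audi": {7, 8},
}


def narrower_or_equal(t):
    """All terms t' with t' <= t (reflexive and transitive)."""
    result = {t}
    for child, parents in BROADER.items():
        if t in parents:
            result |= narrower_or_equal(child)
    return result


def model(t, restrict_to=None):
    """I-bar(t): union of I(t') over all t' <= t, optionally intersected with A."""
    objs = set()
    for t2 in narrower_or_equal(t):
        objs |= I.get(t2, set())
    return objs if restrict_to is None else objs & restrict_to


def reduced_terminology(A):
    """T_A: the terms whose model, restricted to A, is non-empty."""
    return {t for t in BROADER if model(t, A)}
```

Under these assumptions, `model("Italian")` yields the three objects indexed under Fiat and Lancia although nothing is indexed directly under Italian, and `reduced_terminology({4, 5, 6})` contains six terms, which matches the restriction described for Figure 2.1(b).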

TAXONOMY
Name | Notation | Definition
terminology | $T$ | a set of terms (can capture categorical/numeric values)
subsumption | $\leq$ | a partial order (reflexive, transitive and antisymmetric)
taxonomy | $(T, \leq)$ | $T$ is a terminology, $\leq$ a subsumption relation over $T$
broaders of $t$ | $B^+(t)$ | $\{ t' \mid t < t' \}$
narrowers of $t$ | $N^+(t)$ | $\{ t' \mid t' < t \}$
direct broaders of $t$ | $B(t)$ | $\mathrm{minimal}_{<}(B^+(t))$
direct narrowers of $t$ | $N(t)$ | $\mathrm{maximal}_{<}(N^+(t))$
top element | $\top_i$ | $\top_i = \mathrm{maximal}_{\leq}(T_i)$

MATERIALIZED FACETED TAXONOMIES
Name | Notation | Definition
faceted taxonomy | $F = \{F_1, ..., F_k\}$ | $F_i = (T_i, \leq_i)$, for $i = 1, ..., k$, and all $T_i$ are disjoint
object domain | $Obj$ | any denumerable set of objects
interpretation of $T$ | $I$ | any function $I : T \to \mathcal{P}(Obj)$
materialized faceted taxonomy | $(F, I)$ | $F$ is a faceted taxonomy $\{F_1, ..., F_k\}$ and $I$ is an interpretation of $T = \bigcup_{i=1,k} T_i$
ordering over interpretations | $I \sqsubseteq I'$ | $I(t) \subseteq I'(t)$ for each $t \in T$
model of $(T, \leq)$ induced by $I$ | $\bar{I}$ | $\bar{I}(t) = \bigcup \{ I(t') \mid t' \leq t \}$
description of $o$ wrt $I$ | $D_I(o)$ | $D_I(o) = \{ t \in T \mid o \in I(t) \}$
description of $o$ wrt $\bar{I}$ | $D_{\bar{I}}(o) \equiv \bar{D}_I(o)$ | $\{ t \in T \mid o \in \bar{I}(t) \} = \bigcup_{t \in D_I(o)} (\{t\} \cup B^+(t))$

FDT-INTERACTION: BASIC NOTIONS AND NOTATIONS
Name | Notation | Definition
focus (intensionally specified) | $ctx$ | any subset of $T$ such that $ctx = \mathrm{minimal}(ctx)$
projection on $F_i$ | $ctx_i$ | $ctx \cap T_i$

Kinds of zoom points w.r.t. a facet $i$ while being at $ctx$
zoom points | $AZ_i(ctx)$ | $\{ t \in T_i \mid \bar{I}(ctx) \cap \bar{I}(t) \neq \emptyset \}$
zoom-in points | $Z_i^+(ctx)$ | $AZ_i(ctx) \cap N^+(ctx_i)$
immediate zoom-in points | $Z_i(ctx)$ | $\mathrm{maximal}(Z_i^+(ctx)) = AZ_i(ctx) \cap N(ctx_i)$
zoom-side points | $ZR_i^+(ctx)$ | $AZ_i(ctx) \setminus \{ ctx_i \cup N^+(ctx_i) \cup B^+(ctx_i) \}$
immediate zoom-side points | $ZR_i(ctx)$ | $\mathrm{maximal}(ZR_i^+(ctx))$

Restriction over an object set $A \subseteq Obj$ (i.e. extensional focus)
reduced interpretation | $I_A$ | $I_A(t) = I(t) \cap A$
reduced terminology | $T_A$ | $\{ t \in T \mid \bar{I}_A(t) \neq \emptyset \} = \{ t \in T \mid \bar{I}(t) \cap A \neq \emptyset \} = \bigcup_{o \in A} B^+(D_I(o))$

Table 2.1: Basic Notions and Notations
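As an illustration of the zoom-point definitions of Table 2.1, the following Python sketch computes the immediate zoom-in points of a focus together with their count information. It is a minimal sketch under stated assumptions, not the implementation used by FleXplorer: the single facet, the object assignments and the function names are hypothetical, and the focus ctx is read as a conjunction of its terms, so that Ī(ctx) is the intersection of the models of its terms.

```python
# Hypothetical single facet: term -> direct narrower terms, i.e. N(t).
NARROWER = {
    "European": {"Italian", "Spanish", "German"},
    "Italian": {"Fiat", "Lancia"},
    "Spanish": {"Seat"},
    "German": {"Seat", "VW", "BMW", "Audi"},
}

# Assumed interpretation I: term -> directly indexed objects.
I = {"Fiat": {1}, "Lancia": {2, 3}, "Seat": {4},
     "VW": {5}, "BMW": {6}, "Audi": {7, 8}}
OBJ = set().union(*I.values())


def model(t):
    """I-bar(t): objects indexed under t or under any narrower term."""
    objs = set(I.get(t, set()))
    for child in NARROWER.get(t, set()):
        objs |= model(child)
    return objs


def focus_objects(ctx):
    """I-bar(ctx), reading the focus as a conjunction of its terms."""
    objs = set(OBJ)
    for t in ctx:
        objs &= model(t)
    return objs


def immediate_zoom_in(ctx):
    """Z(ctx) = AZ(ctx) ∩ N(ctx), returned as {term: count}."""
    current = focus_objects(ctx)
    candidates = set().union(*(NARROWER.get(t, set()) for t in ctx))
    counts = {t: len(current & model(t)) for t in candidates}
    # Only terms leading to a non-empty result are zoom points.
    return {t: n for t, n in counts.items() if n > 0}
```

For example, `immediate_zoom_in({"European"})` returns the three country terms with the number of objects each would leave in the focus, which is the count information displayed next to each zoom point in the interface.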

Figure 2.2: Finding a Hotel in the Island of Symi (FDT over booking.com)

As expected, the computation of zoom-in points with count information is more expensive than without: in 1 sec we can compute the zoom-in points of around 240,000 results (i.e. |A| = 240,000) with count information, while without count information we can compute the zoom-in points of around 540,000 results.

Figure 2.3: Checking Olympus Cameras (FDT over eBay)

Applications. Examples of applications of faceted metadata-search include: e-commerce (e.g. eBay, shown in Figure 2.3, or Amazon⁵), library and bibliographic portals (e.g. DBLP, ACM Digital Library), booking applications (e.g. booking.com⁶, as shown in Figure 2.2), museum portals (e.g. Hyvönen et al. (2005) and Europeana⁷), mobile phone browsers (e.g. Karlson et al. (2006)), specialized search engines and portals (e.g. Mäkelä et al. (2005); Yee et al. (2003)), Semantic Web (e.g. Hildebrand et al. (2006); Mäkelä et al. (2006)), general purpose WSEs (e.g. Mitos, Papadakos et al. (2009a)) and collaborative environments (e.g. mSpace, Schraefel et al. (2003)). Moreover, as shown in Papadakos et al. (2012a), this interaction scheme can act complementarily to the query-and-response dialogue of current WSEs, along with available dynamic metadata mined through clustering techniques (Kopidaki et al. (2009)) or entity mining (Fafalios et al. (2012a, 2013); Kitsos et al. (2013); Fafalios and Tzitzikas (2013)). As an application example, Figure 2.4 shows a screenshot of a WSE that supports FDT exploration.

⁵ http://www.amazon.com
⁶ http://www.booking.com
⁷ http://www.europeana.eu

Figure 2.4: FDT-based GUI of the Mitos WSE

Specifically, it shows the screen after the user submitted the query java. In that screenshot, 4 different facets are shown, each corresponding to one metadata attribute (at the left bar). The values (zoom-points or terms) of two of these facets (By date and By clustering) are hierarchically organized, while the values of the remaining two facets (By filetype and By domain) are flat (no hierarchical organization). The results of the current focus appear in the right frame. For more on this application the reader can refer to Papadakos et al. (2012a), while the real-time snippet-based results clustering algorithm employed is described in Kopidaki et al. (2009).

2.2 Preference Management in General

Preferences are a central part of our everyday lives and guide human decision making. Commonly, preferences are not hard constraints but wishes, simple or complicated ones (covering one or more aspects), which might or might not be satisfied. Such wishes might be independent, or might affect each other, even in conflicting ways (Stefanidis et al. (2011a)). Since preferences are a multi-disciplinary topic, they have been studied in a number of fields. Such fields include Philosophy (Hansson (2001)), social sciences like Psychology (Scherer (2005)) and Economics (Fishburn (1999)), and Decision Making (Lichtenstein and Slovic (2006)). Furthermore, they have been thoroughly studied in a number of Computer Science areas. Specifically, they have been studied in the fields of Artificial Intelligence (AI) (Wellman and Doyle (1991)), Human Computer Interaction (HCI) (Linden et al. (1997)), and especially in Information Systems (ISs), like databases (Kießling (2002); Chomicki (2003)), XML (Kießling et al. (2001)), and OLAP (Golfarelli et al. (2011)). A survey on representation, composition and application of preferences in DBs is given in Stefanidis et al. (2011a), while a survey of major questions and approaches for preference handling in applications such as recommender systems, personal assistant agents and personalized user interfaces is given in Peintner et al. (2008). Pu and Chen (2008) propose guidelines and report examples for product search and recommender systems. In Figure 2.5 we show some distinctions of preference management approaches from various perspectives, which are discussed below.

2.2.1 Various Perspectives of Preference Management

We can identify the following perspectives of preference management⁸:

Subject of Personalization. In general, a user can express preferences regarding the informational contents of an application, the visualization of the contents, the services that the user has access to at any time, or the interaction between the user and the application.

Preference Formulation. Preferences can be defined either using a qualitative approach, as in Kießling (2002); Chomicki (2003) and Georgiadis et al. (2008), or a quantitative approach, as in Agrawal and Wimmers (2000); Balke and Güntzer (2004) and Koutrika and Ioannidis (2005). According to the qualitative approach, preferences are described directly, using a preference relation $\succ_{Pref}$ (i.e. $x \succ_{Pref} y$). Preference relations may be specified using logical formulas (Chomicki (2003)), or by using special preference constructors (Kießling (2002)). In the quantitative approach, preferences are described indirectly by defining scoring functions (i.e. $Score(x) > Score(y)$). Scores may be assigned through preference functions (Agrawal and Wimmers (2000)) or through degrees of interest under specific satisfied conditions (Koutrika and Ioannidis (2004)). The qualitative approach is more powerful and expressive than the quantitative approach, since not every preference can be modeled using scoring functions, according to Chomicki (2003) and Fishburn (1970). There are also approaches that support a mixture of qualitative and quantitative preferences (Rossi et al. (2008)). This can be done by putting together a CP-net (Conditional Preference Network)⁹ and a set of constraints.

Certainty of Preference. The above approaches can be further specialized depending on whether the expressed preferences are crisp or fuzzy (uncertain). Uncertainty expresses the level of confidence that a particular preference holds and can be modeled by using fuzzy set theory. Barrett and Salles (2006) review the literature on fuzzy preferences.

Sources of Preference. Preferences can be specified explicitly by the users, either through a query language that supports preferences (Kießling and Kostler (2002); Levandoski et al. (2010)) or through the mediation of an application that produces such queries (Kießling et al. (2011b)), or implicitly, by silently tracking user actions and monitoring user behaviour. The latter category includes works like Gadanho and Lhuillier (2007), Kelly and Teevan (2003) and Pound et al. (2011). In addition, preferences can be inferred based on the assumption that similar people like similar things. Such works include collaborative filtering systems (Schafer et al. (2001) and Rashid et al. (2002)). Machine learning has also been applied for learning preference value functions. For example, desJardins et al. (2006) and Wagstaff et al. (2010) present methods for learning preferences over sets of items, by taking as input a collection of positive examples.

Subject Information Space. Another criterion is the structure of the underlying information space (unstructured information (i.e. text), relational spaces, multi-dimensional spaces with hierarchically organized attribute domains, support of multi-valued attributes, etc).

Context. Preferences can hold unconditionally, and in this case they are called context-free. On the other hand, contextual or conditional preferences hold when specific conditions are met. Furthermore, contextual preferences can be divided into internal, when the context can be defined from information available in the data over which preferences are expressed, and external otherwise. Computing context (e.g. network connectivity), user context (e.g. profile), physical context (e.g. temperature) and time are common types of external contexts (Chen and Kotz (2000)). A model for the propagation of user preferences through contexts is described in Ciaccia and Torlone (2011), while a model for expressing contextual preferences is described in Stefanidis et al. (2011b).

⁸ This categorization is based on the survey of Stefanidis et al. (2011a).
⁹ A CP-net is a directed graph G over attributes V, whose nodes are annotated with conditional preference tables for each attribute (Boutilier et al. (2004)), and uses conditional ceteris paribus (all else being equal) semantics.

Figure 2.5: Distinctions of Preference Management Approaches
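The distinction between the qualitative and the quantitative formulation discussed above can be made concrete with a small Python sketch. Everything in it is an illustrative assumption: the hotel data, the attribute names, the particular scoring function and the helper names do not come from any of the cited systems. It contrasts a preference relation induced by a scoring function with a Pareto-style relation defined directly over two attributes.

```python
from itertools import permutations


def prefers_by_score(score):
    """Quantitative approach: x is preferred to y iff Score(x) > Score(y)."""
    return lambda x, y: score(x) > score(y)


def pareto_prefers(x, y):
    """Qualitative approach: x is at least as good as y on every attribute
    (lower price, higher rating) and strictly better on at least one."""
    at_least = x["price"] <= y["price"] and x["rating"] >= y["rating"]
    strictly = x["price"] < y["price"] or x["rating"] > y["rating"]
    return at_least and strictly


# Hypothetical objects.
hotels = [
    {"name": "A", "price": 50, "rating": 3},
    {"name": "B", "price": 80, "rating": 5},
    {"name": "C", "price": 90, "rating": 4},
]


def incomparable_pairs(prefers, items):
    """Unordered pairs that the relation does not order in either direction."""
    return {frozenset((x["name"], y["name"]))
            for x, y in permutations(items, 2)
            if not prefers(x, y) and not prefers(y, x)}


# One possible scoring function (an arbitrary choice for illustration).
by_score = prefers_by_score(lambda h: 20 * h["rating"] - h["price"])
```

In this example the Pareto relation prefers B to C but leaves A unordered with respect to both B and C. A relation induced by a scoring function leaves two objects unordered only when their scores are equal, so A being unordered with both B and C would force B and C to have equal scores, contradicting the preference of B over C. This is one simple instance of a preference that no scoring function reproduces exactly.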

20

Chapter 2. Background and Related Work

Elasticity. Preferences can be exact or elastic. Exact preferences can either be satisfied or not, while elastic ones should be satisfied as closely as possible. For example, Kießling (2002) captures elastic preferences using the AROUND preference construct and distance functions.

Complexity. Complexity describes the degree of detail and how specific a preference is. Generic or simple preferences express preferences over a single attribute of the entities of interest, while a compound preference combines a number of simple preferences.

Completeness. The description of user preferences is usually incomplete, since it is inconvenient for users to express preferences over all pairs of objects in the domain of interest (Stefanidis et al. (2011a)). In such cases, the lack of preference relations over some objects can be interpreted as an equal preference (i.e. the objects are equivalent), as an incomparability (i.e. the objects cannot be compared), or as a gap in our preference knowledge, which can be avoided by defining a preorder extending the given partial order (Ross (2007)).

Semantics. Preferences can use two different semantics: ceteris paribus semantics and totalitarian semantics. Ceteris paribus is Latin and means “all else being equal”. For example, the preference “I prefer a square table over a round table” holds when all other attributes, like size, wood, etc., are the same. On the other hand, totalitarian semantics mean that if “I prefer an object o1 over o2” for a specific attribute, then I do not prefer o2 over o1 for any other attribute (Pareto semantics).

Stability. Furthermore, it is difficult to assume that user preferences are stable, so frameworks that capture preferences should not assume that they are fixed. Users change their preferences even while inspecting the available choices. For example, Doyle (2004) shows how easily preferences change over time. Chomicki (2007) proposes an incremental preference revision framework, where the revised preference relation is produced by composing the original preference relation with another preference relation, using preference composition methods like union, prioritized and Pareto composition. Elaborating even more, in this thesis we show that the expression of user preferences is time-consuming and results in incomplete preferences when the user does not have the ability to view and explore the existing choices. We have named this hypothesis the Difficulty of Formulating Effective Preferences without Knowing the Options (DiFEPreKO) hypothesis, which we evaluate through a user study in Section 6.3.

Granularity. Preferences can be expressed at different levels of granularity. For example in databases, preferences can be expressed over individual tuples, sets of tuples (i.e. where preferences do not depend only on individual tuple values but also on properties of groups of tuples, like in Brafman et al. (2006);
Binshtok et al. (2007) and Zhang and Chomicki (2011)), attributes (Georgiadis et al. (2008)), relationships (i.e. preferences expressed over relationships between two types of entities) (Koutrika and Ioannidis (2004)), relations (i.e. preferences expressed on classes of entities) and finally facts (i.e. preferences on the space of hierarchy attributes) (Golfarelli et al. (2011)). In the FDT world, preferences can be expressed over different objects, zoom-points and facets. Most of the available works focus only on objects. Recently, some works also affect facets, like Dash et al. (2008), Wagner et al. (2011) and Pound et al. (2011), which will be presented later in Section 2.6.
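To make the Elasticity characteristic discussed above more concrete, the following minimal Python sketch contrasts an exact preference with an elastic one in the spirit of an AROUND-style construct. The helper names (`exact`, `around`, `rank_by_distance`), the absolute-difference distance and the car prices are assumptions made only for illustration; this is not the Preference SQL implementation.

```python
# Hypothetical car prices in Euros, used only for illustration.
prices = {"o1": 50000, "o2": 15000, "o3": 28000}

def exact(target):
    """Exact preference: a value either satisfies the wish or it does not."""
    return lambda value: value == target

def around(target):
    """Elastic preference (assumed distance: absolute difference to the target)."""
    return lambda value: abs(value - target)

def rank_by_distance(values, distance):
    """Order object ids from most to least preferred (ties broken by id)."""
    return sorted(values, key=lambda oid: (distance(values[oid]), oid))

exact_matches = [oid for oid, p in prices.items() if exact(30000)(p)]
elastic_ranking = rank_by_distance(prices, around(30000))

print(exact_matches)    # objects costing exactly 30000
print(elastic_ranking)  # objects ordered by closeness to 30000
```

Under these assumptions the exact preference leaves the user without any candidate, whereas the elastic one still yields an ordering of all available objects.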

2.3 FDT and Preferences: Motivation

One main thesis of this work is that effective preference specification presupposes knowledge of the information space and of the available choices. FDT-based interaction can aid users in getting acquainted with the information space and the available choices. Therefore FDT can aid preference elicitation even if, instead of the preference actions proposed in this thesis, other well-known approaches (e.g. Preference SQL described in Kießling and Kostler (2002) and Kießling et al. (2011a), or FlexPref described in Levandoski et al. (2010)) are employed for expressing user preferences and/or deriving the corresponding object ranking. The computation and display of zoom points reduces the need for specifying complex preference profiles, and users can explore the available choices (or the most preferred ones) non-linearly. For instance, by clicking on the zoom points the user can inspect the available choices based on the desired values. Without effective exploration services the user is obliged to explore blocks of objects linearly, and the derivation of small blocks (equivalently, many blocks) requires rich preference specifications which are cumbersome to acquire. Let us consider a set of attributes and suppose the user selects one zoom point of the first attribute. The FDT approach will show those values of the remaining attributes that are “active”. Such browsing can aid users in identifying, for each attribute, those values for which it is worth specifying a complex value tradeoff (e.g. by using a quantitative approach). On a multi-dimensional space where user preferences for each dimension have been specified, the efficient set (also called skyline, or Pareto optimal set) is indeed very useful if the user is interested in one hit (e.g. one car to buy, one hotel to book). In the FDT approach and with the actions that we propose (specifically with term-scoped actions), the most preferred values from each dimension are shown as zoom points and in decreasing order of preference. We know that all objects that have these values (i.e.
those objects that have at least one of the most preferred values of an attribute) are certainly part of the skyline. So the preference-extended FDT interaction inherently provides partial skyline support. However, to compute the entire skyline we need to apply a skyline algorithm (e.g. Kossmann et al. (2002); Papadias et al. (2005)), so skyline computation can be considered a helpful complementary service. The computed skyline can then be explored using the FDT method.
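As a rough illustration of what computing the entire skyline involves, the sketch below uses a naive quadratic comparison of all pairs of objects. It is not one of the cited algorithms, which are considerably more efficient; the two dimensions (price and mileage, lower is better) and the four cars are hypothetical.

```python
# Hypothetical cars described by (price, mileage); lower is better in both dimensions.
cars = {
    "o1": (50000, 54000),
    "o2": (15000, 76000),
    "o3": (20000, 80000),
    "o4": (14000, 90000),
}

def dominates(p, q):
    """p dominates q if p is at least as good everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))

def skyline(objs):
    """Naive O(n^2) skyline: keep the objects that no other object dominates."""
    return sorted(
        oid for oid, p in objs.items()
        if not any(dominates(q, p) for q in objs.values())
    )

result = skyline(cars)
print(result)
```

Here o3 is dropped because o2 is both cheaper and has a lower mileage, while the other three cars are pairwise incomparable and therefore all remain.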

2.4 The Database World

For applying user preferences over relational data, many different methods have been proposed in the literature. The most used ones are skylines (i.e. return the objects in a database that are not dominated by any other object in the data) (Börzsönyi et al. (2001); Kossmann et al. (2002); Chomicki et al. (2003); Papadias et al. (2005)) and top-K (i.e. score each object using a monotonic ranking function and return the K objects with the highest scores) (Chaudhuri and Gravano (1999); Chang and Hwang (2002); Ilyas et al. (2004a,b)). Other methods include k-dominance (i.e. consider only k-dimensional subspaces for dominance) (Chan et al. (2006a)), k-frequency (i.e. rank each object based on how often it is returned in the skyline when different numbers of dimensions are considered) (Chan et al. (2006b)), top-k dominance (i.e. rank each object based on how many other objects it dominates and return the k objects with the highest score) (Yiu and Mamoulis (2007)), k-representative dominance (i.e. select k skyline points so that the number of points dominated by at least one of these k skyline points is maximized) (Lin et al. (2007)), hybrid multi-objective methods (i.e. compute sets of objects that are non-dominated with respect to a set of monotonic objective functions) (Balke and Güntzer (2004)), ranked skylines (i.e. adapt to user-specific information needs and identify the skyline results for a user-specified retrieval size k) (Lee et al. (2009)), distance-based dominance (i.e. a new definition of representative skyline that minimizes the distance between a non-representative skyline point and its nearest representative) (Tao et al. (2009)), and lastly ϵ-skylines (i.e. the number of skyline points can be increased or decreased, a built-in rank is provided for all objects, and weights can be assigned to different dimensions) (Xia et al. (2008)). Finally, user satisfaction can be further improved by increasing the diversity of the results, like in desJardins and Wagstaff (2005) and Vee et al. (2009).
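Two of the methods listed above, top-K with a monotonic ranking function and top-k dominance, can be sketched in a few lines of Python. This is only a naive illustration under stated assumptions: the objects, the two normalized attributes (higher is better), the equal weights and the function names are all hypothetical, and none of the cited algorithms is reproduced.

```python
import heapq

# Hypothetical objects with two normalized attributes (higher is better).
objects = {
    "a": (0.9, 0.2),
    "b": (0.6, 0.6),
    "c": (0.5, 0.5),
    "d": (0.1, 0.95),
    "e": (0.3, 0.3),
}

def weighted_sum(weights):
    """Monotonic scoring function: with non-negative weights, a higher
    attribute value can never lower the score."""
    return lambda attrs: sum(w * x for w, x in zip(weights, attrs))

def top_k(objs, score, k):
    """Return the ids of the k objects with the highest scores."""
    return heapq.nlargest(k, objs, key=lambda oid: score(objs[oid]))

def dominates(p, q):
    """p dominates q: at least as good everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))

def top_k_dominating(objs, k):
    """Rank each object by the number of objects it dominates; return the best k."""
    counts = {
        oid: sum(dominates(p, q) for q in objs.values())
        for oid, p in objs.items()
    }
    return heapq.nlargest(k, objs, key=lambda oid: counts[oid])

by_score = top_k(objects, weighted_sum((0.5, 0.5)), 2)
by_dominance = top_k_dominating(objects, 2)
print(by_score, by_dominance)
```

The two rankings differ on this data: the weighted sum rewards object a for its single very high attribute, whereas the dominance count rewards c, which is better than e in both attributes.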

2.5 IR and Preferences

Preference management and personalization in IR has been approached from various perspectives. The initial step for personalizing IR systems was query reformulation through explicit relevance feedback (Rocchio (1971); Choi et al. (2001); Bot and Wu (2004)), or pseudo-relevance feedback (Kelly and Belkin (2001); Kelly and Teevan (2003)), which is implicit feedback inferred from user behavior (e.g. selection of a document, time the document is open, etc.). The approaches for personalization and information filtering can be roughly classified into two categories: content-based filtering and collaborative filtering. In the first approach, the documents are monitored and the system pushes to the user the ones that best match his user profile. The user can provide explicit relevance feedback, updating his profile using different retrieval models, like Boolean, VSM, probabilistic models (Robertson and Jones (1976); Yu et al. (2004); Zigoris and Zhang (2006); Zhang and Koren (2007)), retrieval models that rank objects based on user-defined reference points (Korfhage (1997)), inference networks (Callan (1996)), language models (Croft and Lafferty (2003)), user feedback to improve preference learning (Cohen et al. (1999)) and machine learning algorithms for learning ranking functions (Lewis (2001); Yang et al. (2005); Shawe-Taylor et al. (2002); Burges et al. (2005); Zhai and Lafferty (2006); Zha et al. (2006); Liu (2011)). In the latter approach, the system takes advantage of other similar user profiles and preferences, in addition to the content of the documents. Memory-based (utilize the entire user-item database to generate a prediction) and model-based (provide item recommendations by first developing a model of user ratings) approaches have been proposed (Breese et al. (1998); Delgado and Ishii (1999); Herlocker et al. (1999); Hofmann and Puzicha (1999); Jin et al. (2004); Konstan et al. (1997); White et al. (2010)). Other approaches (Basu et al. (1998); Melville et al. (2001); Wang et al. (2006); Pitkow et al. (2002)) try to combine both techniques to provide an effective recommendation system.

A very recent and interesting approach is described in Ruotsalo et al. (2013a). This work presents the design and study of interactive user modeling, where the user model’s features are keywords, and aims to support exploratory tasks. Specifically, this work allows the users to perceive the state of the user model at all times and to provide feedback that directly rewards and penalizes its features. In addition, the users can continuously tune the system’s belief about their information needs. Feedback is provided by dragging and dropping keywords from the available documents into the exploratory view. Keywords near the center of the exploratory view are more important than keywords near the edges. Figure 2.6 shows a snapshot of SciNet, which is a prototype implementing the above functionality. The results show that interactive user modeling can help users to find relevant, novel and diverse results more effectively, without compromising task execution time. The same authors in Ruotsalo et al. (2013b) introduce interactive intent modeling, where the user directs exploratory search by providing feedback on estimates of search intents. The estimates are visualized in an Intent Radar, where relevant intents are close to the center and similar intents have similar angles.

Figure 2.6: SciNet Prototype

Such approaches, except Ruotsalo et al. (2013a) which also affects keywords, affect only object ranking and do not exploit available metadata (which could be mined statically or dynamically as proposed in Papadakos et al. (2012a); Kitsos et al. (2013)). With respect to our proposal they can be considered complementary techniques that are based solely on the textual content of the objects. In addition, the proposed model can incorporate IR-like rankings by exploiting a Relevance facet, which corresponds to the score returned by the WSE. Furthermore, they do not engage users (again except Ruotsalo et al. (2013a)) in using the available personalization techniques during the search process.

2.6 FDT and Preferences: Past and Related Works

Supporting personalization in FDT is not well studied. Most FDT systems, like Flamenco (Hearst et al. (2002)), output facets and zoom-points in lexicographical order. An alternative is to order facets and zoom-points based on the number of indexed documents, as in Oren et al. (2006). Some other systems, like eBay Express (http://www.ebay.com; now merged into the main eBay portal), present only a manually chosen subset of facets to the users, and the zoom-points are again ranked based on the number of indexed documents. Manually selecting and maintaining a number of preferred facets can be time consuming, especially for systems that support a great number of facets and zoom-points. In addition, in systems like eBay or Amazon, users are able to order the available objects according to simple object ordering operations over one specific attribute (e.g. order objects according to Price, or Price + Shipping, or Duration of auction, in ascending or descending order).

Set-Cover Ranking. One of the first approaches for facet ranking is described in Dakka et al. (2005). Specifically, this work aims at providing automatic and scalable methods for the creation of multifaceted interfaces. In addition, it provides methods for selecting the best portions of the generated hierarchies (considering the limitations of screen space). It introduces two approaches for facet ranking. The first tries to maximize the number of indexed objects that are accessible from the top-k facets (set-cover ranking). The second, named merit-based, takes into consideration the structural properties of the subhierarchies under the selected facets (i.e. the structure of zoom-points). Specifically, the merit-based method ranks higher those facets that enable users to access their contents with the smallest cost on average.

Interestingness Ranking. Another approach, described in Dash et al. (2008), tries to select the list of facets that will be displayed to the user following a query, a problem called the facet selection problem. In this method the notion of interestingness is incorporated into the ranking. Each facet is measured based on how surprising it is, by aggregating the “interestingness” of its values given a certain expectation. They define three different ways for setting the expectations. The first is the natural one, where the users assume a natural distribution in the data-set (i.e. documents uniformly distributed along each facet, or that facets are independent). The second is the navigational one, where they assume that the user is already familiar with
the repository and the expectation is set based on how the user navigates the results. Finally, there is an ad-hoc way, where the user sets the expectation based on an arbitrary query. However, in this approach, users cannot explicitly define their preferences over facets and zoom-points and cannot affect the ordering of the objects.

Collaborative Approaches. A collaborative filtering method with explicit user ratings for designing a personalized FDT system is proposed in Koren et al. (2008), where several algorithms are proposed and evaluated. The authors propose a general probabilistic framework to build faceted document models and user relevance models. Users express a preference for retrieved documents, and facet-value pairs are ranked according to their probability of being included in a document relevant to the user. Their objective is to minimize user cost, which is defined as the time needed for reaching an item of interest. This time is an aggregation of the times for reading facet headings, browsing facet hierarchies and correcting browsing mistakes. Moreover, the authors provide an evaluation methodology for personalized faceted search research, in order to complement user studies by being cheap, repeatable, and controllable. In contrast to our work, this work does not allow the user to express any facet and zoom-point preferences. Furthermore, it assumes that each user is searching for exactly one document, and that the user has perfect knowledge of the target document. This can be the case only for focalized search, but not for exploratory search, which is our point of interest.

A number of collaborative approaches for the personalization of faceted search and visual graph navigation in Semantic Web data, by content filtering based on (manually or automatically) created ontologies, are proposed in Tvarožek (2006); Tvarožek and Bieliková (2007a,b,c,d); Tvarožek et al. (2008). These approaches take advantage of metadata stored in an ontology to create new facet descriptions at runtime. The set of available facets and restrictions adapts to the in-session user behavior and to the long-term characteristics of the user and of other users stored in the user model. According to these approaches, relevance to users is measured by calculating the distance between values in the hierarchical ontology. In addition, they annotate search results to improve user orientation and guidance. Again, it is a collaborative approach and there is no support for explicit preferences.

Minimum Effort and Cost Approaches. Minimum-effort driven navigational techniques for enterprise databases and warehouses are described in Roy et al. (2008). At each step of the navigation, the system asks the user one or more questions
regarding different facets. Then, according to the user response, it dynamically fetches the next most promising set of facets. For example, in a cars database, a very simple faceted search interface is one where the user is prompted with an attribute (e.g. Manufacturer), to which he responds with a desired value (e.g. Honda), after which the next appropriate attribute (e.g. Model) is suggested, to which he again responds with a desired value (e.g. Accord). The proposed approach is based on the construction of minimal cost decision trees, which is an NP-complete problem. As a result, they use a simple approximation algorithm. This algorithm selects facets based on their ability to rapidly drill down to the most promising tuples, as well as on the user's ability to provide desired values for them. In addition, in Roy and Das (2009) the same authors investigate opportunities to improve the performance of minimum-effort driven faceted search techniques. The main idea is motivated by the early stopping techniques used in the TA-family of algorithms for top-K computations. In comparison to the approaches proposed in this thesis, this work does not allow users to express preferences, and it only concerns which facets will be displayed.

In the same manner but for zoom-points, Kashyap et al. (2010) propose a cost-based system for faceted navigation, named FACeTOR. The user is presented with a subset of all possible facet conditions (zoom-points), which are selected based on a probabilistic cost model of user navigation. This approach aims to minimize the overall navigation cost, while every result is guaranteed to be reachable by a facet condition. Since the selection of the optimal facet conditions is NP-hard, they present two intuitive heuristics. The first is inspired by an approximation algorithm for the weighted set cover problem and attempts to find a relatively small set of suggestions that have a high probability of being recognized by users. The second heuristic greedily selects each facet condition assuming that all future suggestions have identical properties. This automatic approach only concerns zoom-points and does not allow users to express preferences.

Semantically Enriching Tweets. Abel et al. (2011) present an adaptive and personalized faceted search engine for Twitter. They propose strategies for inferring facets and facet-values (entities and topics) from tweets and related external Web resources, by semantically enriching tweets. Given the semantically enriched tweets, they propose user and context modeling strategies that identify the (current) interests of a given Twitter user and allow for contextualizing the demands of this user. As a result, they propose faceted search strategies for content exploration on Twitter and methods that adapt to the interests and context of a user, by ranking the facets and facet-values. Finally, they present an evaluation environment based on simulated users
to evaluate different strategies in this adaptive faceted search engine on Twitter. All the above functionality is offered automatically, and as a result the user cannot explicitly express his preferences over facets, values and objects or define his context.

Log Based Utility. Pound et al. (2011) model the user faceted-search behavior using the intersection of web query-logs with existing structured data, in order to capture facet and facet-value utility for a specific query. They present an automated scalable solution that elicits user preferences on attributes and values. They propose different disambiguation techniques, ranging from simple keyword matching to more sophisticated probabilistic models (based on clustering, logs or clicks), for mapping keywords to different possible attributes. Furthermore, they present a variety of techniques that deal with disambiguating amongst different overlapping attribute-value pairs per query (table or context dependent value selection). In addition, they discuss how to use signals from the data, like entropy and sparseness, to discover which attributes make better facets. As a result, facets and values are ordered according to available log information, and users are not allowed to explicitly express their preferences for their specific information need.

Intuition Based Ranking. All of the above approaches assume a precise information need. That is, relevance, interestingness, and user costs (for fulfilling an information need) have been employed for measuring facet importance. On the other hand, Wagner et al. (2011) provide a browsing-oriented approach (i.e. the user has a fuzzy information need and slowly explores an unknown collection of items) for facet ranking. They use an aggregation function over different intuitions and metrics for facet ranking. In particular, they prefer facets that allow users to modify the result set via small and uniform facet operations. In addition, they group facets and their values by using a divisive hierarchical clustering algorithm, leading to an Extended Facet Tree. Finally, they provide a task-based evaluation of their system regarding effectiveness and efficiency. Compared to our approach, this approach tries to rank facets and facet values according to the characteristics of the space of facets and facet values, but it does not support explicit user preferences or the ranking of objects according to preference. On the other hand, this is the first method that targets exploratory search and shows that the ranking of facets and facet values can be effective for exploratory information needs.

Preference Search. Finally, Kießling et al. (2011b) propose the substitution of Faceted Search, which they consider a tedious and time-consuming trial-and-error process, with Preference Search. Preference Search replaces lengthy user sessions by one single user request, where the user completes a search mask. The user input is then automatically compiled into one single Preference SQL query. This query is afterwards augmented in a context-sensitive and user-adaptive fashion by a recommender component, using sensors and friends' recommendations from a social network. The system then presents the BMO objects to the user. Excluding the recommender system, the above functionality can be easily implemented using our proposed method, by letting the user express his preferences over the related facets. Then the system could return to him the top objects for each facet for which the user has defined a preference (i.e. the Pareto optimal set). Furthermore, our method is more expressive, since it allows preferences over attributes with hierarchically organized values which are possibly set-valued. The support of hierarchies can make the expression of preferences less time consuming, more intuitive and less cognitively demanding. One further note is that Kießling et al. (2011b) assume that Faceted Search can return empty results, which is not true in our case, since only categories that lead to non-empty results are displayed. Specifically, our hypothesis is that FDT can aid exploratory search by letting the user progressively express his information needs. In addition, since preferences are incomplete and, most importantly, change over time, the proposed Preference Search method, with its single user request, can be successful for focalized search but not for exploratory environments.

2.7 Motivation and Running Example

Let us first motivate the benefits of FDT for decision making over our running example. Consider an international dealer of used cars and suppose that the available cars are stored in a relational table of the form: Car(id, Manufacturer, Model, Category, Price, Color, Power, Volume, Year, Mileage, Fuel, Location, Comment, Accessories). An instance of the table is shown below:

Id  Manufacturer  Model        Category  Price  Color  Power  Volume  Year  Mileage  Fuel    Location   Comment    Accessories
o1  Porsche       Carrera 911  Cabrio    50000  Black  350    3600    2005  54000    Petrol  Cefalonia  Uncrashed  {ABS,AT}
o2  Alfa Romeo    164          Sedan     15000  Red    180    3000    1995  76000    Petrol  Heraklion  Crashed    {ABS}
... ...           ...          ...       ...    ...    ...    ...     ...   ...      ...     ...        ...        ...

In addition, there are three taxonomies that have been designed in order to provide a hierarchical organization for the values of the attributes Manufacturer, Fuel, and Location. The leaves of these
taxonomies are the domains of the corresponding attributes which are recorded in the tuples of the relational table (our model also allows tuples that contain values which are not necessarily leaves of the corresponding taxonomy). Specifically, assume the taxonomies shown in Figure 2.7.

Figure 2.7: Example Taxonomies

Example 1. Assume that somebody, call him James, wants to change his car. He is interested in a family car, although he preferred sports cars when he was younger. His wife prefers Jeeps, but he is reluctant due to the extra parking space required and because the garage of his home is somewhat small. He believes that Japanese and German cars are more reliable than French or Korean cars. He likes the fact that hybrid cars consume less, are more ecological, and that the annual taxes are lower for such cars. James lives in the city of Heraklion, so cars owned by persons who do not live on the island of Crete (where Heraklion is located) are less preferred by him (due to the traveling time and cost), unless the case is exceptional. In addition, he cannot afford an expensive car. Ideally he would like a Porsche with four doors (e.g. Porsche Panamera) and enough space for luggage, hybrid with consumption less than 6 lt/100 km, bigger than the Panamera (to satisfy his wife) but smaller than the Cayenne, with less than 10 thousand kilometers, on sale by his favorite neighbor and at a very good price (e.g. less than 30K Euros), but this is a utopian desire. James aims at buying one car, but it is probable that he would buy a “Porsche Carrera 4s” if available at a very good price, and another decent but inexpensive family car to satisfy the remaining requirements.

Although lengthy, the above description is by no means complete. There are a lot of other aspects that would determine James’ final decision (years of guarantee, grip, airbags, Euro NCAP stars, color, trunk, GPS, CD player, trip calculator, sunroof, etc.). What we want to stress with this example is that the specification of preferences is a laborious, cumbersome and time-consuming task, and that the resulting descriptions are in most cases incomplete. Pragmatically, decision making is based on complex tradeoffs that involve several (certain or uncertain) attributes as well as the user’s attitude towards risk (Keeney and Raiffa (1976)). Moreover, preferences are not stable over time. We believe that it is beneficial to provide users with an interaction method in which the preference specification cost is paid gradually and depends on the available choices. For example, why spend time expressing complex tradeoffs between Porsche models with 4 doors versus those with 2 doors if no Porsche car is on sale? Therefore an effective interaction that shows users the available choices is important for reducing the preference specification cost and for speeding up decision making (see also Section 6.3 for a user-based evaluation of the DiFEPreKO hypothesis). In brief, the proposed preference specification actions affect the presentation order of:

• facets, i.e. the order in which facets (i.e. criteria, attributes) appear,
• terms, i.e. the order in which the zoom-in/side points (i.e. criteria values, attribute values) appear (which can be hierarchically organized and/or set-valued), and
• objects (of the focus), i.e. the order in which the objects (i.e. choices) appear.

Now suppose a user who (a) likes European cars, (b) does not like Italian cars, (c) likes Ferrari, and (d) prefers low prices. According to the framework that we propose, the user can express the above preferences straightforwardly, i.e. without having to refer to particular European countries or Italian manufacturers (for expressing (a) and (b)), thanks to the hierarchically organized values and preference inheritance. Furthermore, he does not have to express all the above in one shot. He can provide them gradually and in any order, say (b)-(a)-(d)-(c), and there is no need to define priorities for resolving the conflicts (e.g. the fact that he likes Ferrari but does not like Italian cars). The priority will be deduced automatically by a scope-based conflict resolution rule. For instance, the scope of (b), i.e. Italian cars, is contained in the scope of (a), which is the set of European cars, so (b) prevails on Italian cars. Analogously, (c) prevails on Ferrari cars (despite the fact that Ferrari is Italian).
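The scope-based reasoning over statements (a), (b) and (c) can be illustrated with a small Python sketch. It is only an approximation of the idea under stated assumptions: the taxonomy fragment, the `like`/`dislike` labels and the function name are hypothetical, statement (d) is left out because it concerns a different facet (Price), and the formal rule is the one defined later in the thesis, not this code.

```python
# Hypothetical fragment of a manufacturer taxonomy (child -> parent).
PARENT = {
    "European": None, "Asian": None,
    "Italian": "European", "German": "European", "Japanese": "Asian",
    "Ferrari": "Italian", "Fiat": "Italian", "BMW": "German", "Toyota": "Japanese",
}

def effective_preference(term, actions):
    """Walk from the term towards the root of the taxonomy. The first action
    found is anchored at the nearest ancestor (or the term itself), i.e. it
    has the narrowest scope containing the term, so it prevails."""
    node = term
    while node is not None:
        if node in actions:
            return actions[node]
        node = PARENT[node]
    return None  # no action applies to this term

# The statements are given in the order (b)-(a)-(c); the order does not matter.
actions = {}
actions["Italian"] = "dislike"   # (b) does not like Italian cars
actions["European"] = "like"     # (a) likes European cars
actions["Ferrari"] = "like"      # (c) likes Ferrari

for car in ("Ferrari", "Fiat", "BMW", "Toyota"):
    print(car, effective_preference(car, actions))
```

Because each term is resolved by its nearest anchored ancestor, no explicit priorities between the statements are needed, and inserting the statements in a different order gives the same result.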

Moreover, the user can express more expressive statements like (e) I prefer Asian to European cars, and (f) I prefer Italian to Korean cars. From these two statements we can deduce that the user prefers Fiat to Kia, and prefers Toyota to Peugeot. The above are examples of just some of the functionalities offered by the proposed approach.

With respect to the characteristics described earlier in Section 2.2.1, this work focuses on multi-dimensional spaces with hierarchically organized attribute domains, and on explicitly specified, crisp, qualitative user preferences. We assume that these preferences hold unconditionally (i.e. they are context-free), and that they are exact (although we provide support for distance functions) and simple (we assume that preference inheritance is not a compound preference, and we also provide prioritized and Pareto composition). We also focus on the preference elicitation process. Preference elicitation refers to the problem of developing a decision support system capable of generating recommendations to a user, thus assisting him in decision making. It is important for such a system to model the user’s preferences accurately, find hidden preferences and avoid redundancy. A survey of preference elicitation methods is given in Chen and Pu (2004), while a survey of preference elicitation from a computer scientist’s perspective is given in Braziunas (2006). Most of the above methods focus on the quantitative approach, i.e. on the elicitation of multi-criteria value functions. In this thesis we use the term real-time preference elicitation because, according to our approach: (a) the system requires the user to express his preferences only for those facets/values that are involved in the available (and restricted) set of choices (i.e. not for the entire value space), and (b) we exploit the hierarchical organization of terms for reducing the number of preferences that have to be explicitly specified.
To conclude, and to the best of our knowledge, this is the first work that proposes an incremental preference elicitation mode which allows the user to define the desired preference structure gradually and flexibly, over attributes with hierarchically organized and possibly set-valued values, and which employs a scope-based conflict resolution rule.
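The deduction drawn from the relative statements (e) and (f) earlier in this section (Fiat is preferred to Kia, Toyota to Peugeot) can be reproduced with the following Python sketch. It rests on loudly stated assumptions: the taxonomy fragment is hypothetical, and the notion of a "narrower" statement used here (both of its terms lie under the terms of another statement, in either orientation) is a simplification chosen for illustration, not the formal definition of active scopes given in the thesis.

```python
# Hypothetical fragment of a manufacturer taxonomy (child -> parent).
PARENT = {
    "European": None, "Asian": None,
    "Italian": "European", "French": "European",
    "Korean": "Asian", "Japanese": "Asian",
    "Fiat": "Italian", "Peugeot": "French", "Kia": "Korean", "Toyota": "Japanese",
}

def is_under(term, ancestor):
    """True if ancestor is the term itself or one of its ancestors."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT[term]
    return False

def narrower(s, t):
    """Assumed specificity test: statement s is narrower than statement t if
    both terms of s lie under the terms of t, in either orientation."""
    (a1, b1), (a2, b2) = s, t
    same = is_under(a1, a2) and is_under(b1, b2)
    flipped = is_under(a1, b2) and is_under(b1, a2)
    return s != t and (same or flipped)

def prefers(x, y, statements):
    """True if x is preferred to y, False if y is preferred to x, None if the
    statements do not decide. A statement (a, b) reads 'a is preferred to b'."""
    applicable = []
    for a, b in statements:
        if is_under(x, a) and is_under(y, b):
            applicable.append(((a, b), True))
        elif is_under(y, a) and is_under(x, b):
            applicable.append(((a, b), False))
    # Keep only statements that are not overridden by a narrower applicable one.
    active = [(s, v) for s, v in applicable
              if not any(narrower(t, s) for t, _ in applicable)]
    verdicts = {v for _, v in active}
    return verdicts.pop() if len(verdicts) == 1 else None

statements = [("Asian", "European"),   # (e)
              ("Italian", "Korean")]   # (f)

print(prefers("Fiat", "Kia", statements))        # (f) overrides (e) for this pair
print(prefers("Toyota", "Peugeot", statements))  # only (e) applies
```

For the pair Fiat and Kia both statements apply with opposite verdicts, and the narrower statement (f) decides; for Toyota and Peugeot only (e) applies, so it is simply inherited down the hierarchy.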

Chapter 3

A Preference Framework for Multidimensional Information Spaces (Syntax, Semantics and Algorithms)

Contents

3.1 Syntax of the Language
3.2 The Domain of Semantics
3.3 Syntax to Semantics
    3.3.1 Flat Single-Valued Attributes
    3.3.2 Set-Valued Attributes
    3.3.3 Best/Worst Preferences over Hierarchically Organized Values
    3.3.4 Relative Preferences over Hierarchically Organized Values
    3.3.5 Preferences over Hierarchical Set-Valued Attributes
3.4 Multi-Facet Preferences
    3.4.1 Prioritized Composition
    3.4.2 Pareto Composition and BMO-set
    3.4.3 Combination of Priority and Pareto Compositions
3.5 A Complete Example

33

34

In this chapter we extend the interaction of FDT with user actions for preference specification/elicitation. Specifically, we introduce a preference framework appropriate for information spaces comprising resources described by attributes whose values can be hierarchically organized and/or multi-valued. We define the language, its semantics and the required algorithms. The framework supports preference inheritance in the hierarchies, automatic conflict resolution, as well as preference composition (prioritized, Pareto and their combination). In particular, Section 3.1 introduces the syntax for preference actions, Section 3.2 describes the domain of the semantics, and Section 3.3 defines the mapping from syntax to semantics for flat, hierarchical, single-valued and set-valued attributes. Finally, Section 3.4 describes the composition (prioritized and Pareto) of preference actions over multiple facets.

3.1 Syntax of the Language

Here we introduce a language consisting of statements that we call preference actions, which can be easily enacted by simple user input actions (e.g. mouse selections). We consider an information space like the one described in Section 2.7. Each action has a scopeType and a spec, which consists of an anchor and a rankSpec. In more detail, the scopeType (either facets order, terms order, or objects order) determines which kind of elements the action affects (facets, terms, or objects). Furthermore, each action is “anchored” (anchor) to one element, which can be a facet, a term or even an object¹. This anchor allows enacting the preference actions through the GUI straightforwardly, as we will see in Section 5.2.2 (i.e. if the user right-clicks on an element e, a pop-up window will appear and allow the user to select the desired preference action, which will then be anchored to e). Finally, each action is associated with a rankSpec (rank description), which can be lexicographic (for ordering strings lexicographically), count (for ordering elements based on the number of objects that are classified under them), value (for ordering numerically-valued facets) and indexedBy (for ordering objects according to the number of facets each object is indexed by)². The language also defines actions for supporting best / worst elements (i.e. shortcuts to express preferred / non-preferred elements according to specific policies). Later on, we extend the language to also capture relative preferences, Prioritized and Pareto composition, intervals, etc.

¹ Such actions would be interesting, for example, in expressing positive or negative preference over a specific object.
² In addition we could also support any other method suggested in the bibliography for automatically ranking facets and facet-values, as discussed in Section 2.6.

Syntactically, preference actions are defined through the following grammar in a BNF variation:

⟨stmt⟩      ::= ⟨scopeType⟩ ⟨spec⟩
⟨scopeType⟩ ::= facets order : | terms order : | objects order :
⟨spec⟩      ::= ⟨anchor⟩ ⟨rankSpec⟩
⟨anchor⟩    ::= facet ⟨Fi⟩
              | term ⟨tj⟩
              | object ⟨ok⟩
              | ϵ                                   // the empty string
⟨rankSpec⟩  ::= {lexicographic | count | value | indexedBy} {min | max}
              | best | worst
              | use scoreFunction ⟨score()⟩ {min | max}

In the above grammar Fi, tj and ok denote names that match a facet, a term or an object respectively, while score is the name of a real-valued function provided by the user or the application programmer (e.g. an around operator for proximity search, which can be the edit distance for categorical values, or the absolute value of the distance for numerical values). Some examples follow:

(1) facets order: count max
(2) facets order: facet Manufacturer best
(3) terms order: facet Year value max
(4) objects order: term Location.Cefalonia best
(5) objects order: facet Relevance value max
(6) objects order: use scoreFunction Relevance * dist(Price,20K) max

Before explaining their semantics formally, let us first describe them informally. The 1st action specifies that the facets are ordered in decreasing order with respect to their count information (i.e. max counts are preferred). The 2nd places the facet Manufacturer at the top of the facets list. The 3rd specifies that the terms of the facet Year are to be shown in decreasing order. The 4th places all objects classified (directly or indirectly) under the term Cefalonia at the top of the object ordering. Now suppose a user who starts the car seeking process by formulating a free text query which the WSE evaluates over the attribute comment of the database. In this case the user would like to see the objects in decreasing order with respect to their relevance. The 5th action captures this requirement, where the facet called Relevance corresponds to the score returned by the WSE. Finally, the 6th action orders the objects based on a function over the relevance score and the distance from a given price.

We now extend the syntax to support relative preferences over facets and terms, like those illustrated earlier:

⟨stmt⟩ | facets order : prefer facet ⟨Fi⟩ to ⟨Fj⟩
⟨stmt⟩ | terms order : prefer term ⟨ti⟩ to ⟨tj⟩
⟨stmt⟩ | objects order : prefer term ⟨ti⟩ to ⟨tj⟩
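To make the shape of these statements concrete, the following is a minimal Python sketch that parses the basic and the relative ("prefer") forms into a dictionary. The function name `parse_action`, the dictionary keys and the regular expressions are assumptions made for illustration only; they are not part of the framework, and score functions, compositions and intervals are deliberately not covered.

```python
import re

# Hypothetical helper: covers only the basic and the "prefer" forms of the grammar.
_BASIC = re.compile(
    r"^(facets|terms|objects) order\s*:\s*"
    r"(?:(facet|term|object)\s+(\S+)\s+)?"          # optional anchor
    r"(?:(lexicographic|count|value|indexedBy)\s+(min|max)|(best|worst))$"
)
_PREFER = re.compile(
    r"^(facets|terms|objects) order\s*:\s*prefer\s+(facet|term)\s+(\S+)\s+to\s+(\S+)$"
)


def parse_action(text):
    """Parse one preference action into a plain dictionary."""
    text = text.strip()
    m = _PREFER.match(text)
    if m:
        scope, kind, better, worse = m.groups()
        return {"scopeType": scope + " order", "kind": kind, "prefer": (better, worse)}
    m = _BASIC.match(text)
    if m is None:
        raise ValueError(f"unsupported preference action: {text!r}")
    scope, kind, name, rank, direction, best_worst = m.groups()
    return {
        "scopeType": scope + " order",
        "anchor": (kind, name) if kind else None,   # None stands for the empty anchor
        "rankSpec": rank or best_worst,
        "direction": direction,                     # None for best / worst
    }
```

For instance, `parse_action("terms order: facet Year value max")` yields an action anchored to the facet Year with rankSpec `value` and direction `max`.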

Regarding object ranking, we extend the syntax to allow compositions of preferences that synthesize two or more actions with complex preference constructors over the different facets. Such actions include Pareto, Pareto Optimal (i.e. same ordering as the skyline), Priority and Combinational composition. The syntax of such actions is given below:

⟨stmt⟩ | objects order : Pareto ⟨setOfFacets⟩
⟨stmt⟩ | objects order : ParetoOptimal ⟨setOfFacets⟩
⟨stmt⟩ | objects order : Priority ⟨orderedSetOfFacets⟩
⟨stmt⟩ | objects order : Combinational ⟨bucketOrderedSetOfFacets⟩

Below we introduce some possible extensions, although we do not focus on them. For example, the compositions described above presuppose a number of object-scoped preference actions over each facet that participates in the composition. On the other hand, in the skyline³ operator of SQL, for each attribute participating in the skyline, a single preference is expressed along with the operator (e.g. SELECT * FROM Cars SKYLINE OF price MIN, consumption MIN).

⟨specList⟩ ::= ⟨Fi⟩ {LOW | HIGH} | ⟨specList⟩
⟨stmt⟩ | objects order : skylineOf ⟨specList⟩

³ In brief, the skylines as in Papadias et al. (2005) are the maximal (w.r.t. preference) elements, i.e. those which are not dominated by others. This set is also called efficient set, or Pareto optimal set.

Furthermore, we can extend the syntax to support interval-anchored actions and named actions (which ease the formulation of more complex preferences):

⟨anchor⟩ | term interval [ ⟨ti⟩ ⟨tj⟩ ]
⟨namedStmt⟩ ::= NamedAction ⟨String⟩ : ⟨stmt⟩

Notice that for interval-anchored actions, we only consider as endpoints of the interval numerical values of the same facet Ff (i.e. ti, tj ∈ Tf) such that ti ≤ tj. Then with term interval [ ⟨ti⟩ ⟨tj⟩ ] we denote all tk ∈ Tf such that ti ≤ tk ≤ tj. In this case, we use as anchor of a preference action all available values between ti and tj. Such actions can be used as shortcuts and can be easily defined through simple menus. The complete syntax of the language is given in Appendix A.

3.2 The Domain of Semantics

In general, a preference over a set of elements E can be expressed as a binary relation over the elements of E. In the described approach, we do not assume that preference relationships are transitive. So we hereafter assume that a preference relation is a binary relation (E, ≻) (sometimes we will also use its dual relation, denoted by ≺). The proposed approach can also be used if we assume transitivity of the preference relationships (as in Kießling (2002), i.e. a preference relation is a strict partial order (E, ≻)), except for set-valued attributes, since the MoreWins-Rule described later in Def. 3 is not transitive. The actions specified by the syntax allow structuring (ordering) the materialized faceted taxonomy according to the preferences. Independently of how many actions have been issued and what their semantics are, the defined preference at each point in time comprises k + 2 preference relations. Specifically:

• one over the facets: $(\{F_1, \ldots, F_k\}, \succ_F)$,
• k preference relations, one for each facet $F_i$ (of the form $(T_i, \succ_i)$), and
• one preference relation for the objects: $(A, \succ_{Obj})$.

Let B be the set of actions the user has issued. We can partition this set into k + 2 subsets (where k is the number of facets) as follows: $B_F$ holds the user actions for facets, $B_{T_i}$ holds the user actions for the terms of each facet $F_i$, and $B_{Obj}$ holds the user actions regarding the objects’ preferences. So

$$B = B_F \cup \Big(\bigcup_i B_{T_i}\Big) \cup B_{Obj}$$

As each of these sets can contain more than one action, we have to specify how the corresponding preference relation is defined, e.g. how the preference relation $(T_i, \succ_i)$ is defined from the actions in $B_{T_i}$.

Let us now introduce some required notions about preference relations. Consider a set E = {Porsche, Ferrari, Fiat} and a preference relation R≻ over E consisting of one relationship, specifically R≻ = {Porsche ≻ Ferrari}. We shall use dom(R≻) to denote the elements of E that participate in R≻, here dom(R≻) = {Porsche, Ferrari}, and call inactive the elements of E which are not members of dom(R≻), in our case Fiat. Given a preference relation R≻, with R≺ we will denote its dual order. Commonly, preference relations are illustrated using Hasse diagrams. In our case (E, R≻) can be illustrated as shown in Figure 3.1. Given a preference relation R≻ and two objects o1, o2, with o1 ≻ o2 we will denote that o1 is preferred to o2, and with o1 ≺ o2 the reverse.

Figure 3.1: Hasse Diagram of Preference Relation Over E (E, R≻)

Definition 1 (Valid Preference) We consider a preference relation R≻ over a set of elements E to be valid, if it is acyclic.

Given a set of objects Obj, a bucket order B on Obj with |Obj| items is the total order L defined over the |B| sets B1, ..., B|B|, where the |B| buckets are a partition⁴ of Obj. For any two items oi and oj in Obj, if they are in the same bucket, there is no preference precedence between oi and oj, and these two items are said to be “tied”. If item oi belongs to Bk and item oj belongs to Bl, we say that oi is preferred to oj if and only if Bk precedes Bl according to the total order L. A total order on Obj can be viewed as a special case of a bucket order in which every bucket consists of only one item.

⁴ All blocks are pairwise disjoint.

Definition 2 We say that a linear or bucket order L over E respects a binary relation R over E, if R ⊆ L. □
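The condition of Definition 2 can be checked mechanically. The sketch below, in Python, represents a bucket order as a list of sets (most preferred bucket first) and a relation as a set of (preferred, less preferred) pairs; both representations and the function names are assumptions made for illustration.

```python
def induced_pairs(buckets):
    """All pairs (a, b) such that the bucket of a strictly precedes the bucket of b."""
    pairs = set()
    for i, earlier in enumerate(buckets):
        for later in buckets[i + 1:]:
            pairs |= {(a, b) for a in earlier for b in later}
    return pairs


def respects(buckets, relation):
    """True if every relationship of `relation` also holds in the bucket order (R ⊆ L)."""
    return set(relation) <= induced_pairs(buckets)
```

Note that two tied elements (same bucket) never satisfy a relationship between them, so a bucket order that ties Porsche and Ferrari does not respect Porsche ≻ Ferrari.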

3.3 Syntax to Semantics

To describe formally the semantics of the syntax we have to define precisely what the various keywords of the syntax, like count, mean. Initially, note that the semantics of lexicographic, count, value, and indexedBy are straightforward, and each defines a linear or bucket order. The same is true also for use scoreFunction. Note however that count is not applicable to objects, while indexedBy is only valid for objects. For a term t, t.count is the number of objects in A indexed by term t or by a narrower term of t. For a facet Fi, Fi.count is the number of elements in A which are indexed by terms of Fi. For example, consider Figure 2.1(a), where we have only one facet, say Manufacturer. At that point we have Manufacturer.count = 8, while for the term Italian we have Italian.count = 3. In the restriction on the set A = {4, 5, 6} that is shown in Figure 2.1(b), we have Manufacturer.count = 3, while Italian is not shown, since Italian.count = 0. Formally, and using the FDT notations of Table 2.1, we have $t.count = |\bar{I}(t) \cap A|$ and $F_i.count = |J(F_i) \cap A|$, where $J(F_i) = \{o \in Obj \mid D(o) \cap T_i \neq \emptyset\}$. The semantics of best/worst(ei) and prefer ei to ej actions are defined in an aggregated way (i.e. not in isolation) and are clarified next.
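As an illustration of the two count definitions, here is a small Python sketch. The taxonomy and the object descriptions are hypothetical data, chosen only so that the counts agree with the numbers quoted above; they are not the actual content of Figure 2.1, and the names `NARROWER`, `DESCRIPTION`, `term_count` and `facet_count` are assumptions.

```python
# Hypothetical data (not the content of Figure 2.1): a tiny Manufacturer facet.
NARROWER = {"Italian": ["Fiat", "Ferrari"], "German": ["BMW", "Audi"], "Japanese": ["Toyota"]}
DESCRIPTION = {  # D(o): the terms that directly index each object
    1: {"Fiat"}, 2: {"Ferrari"}, 3: {"Fiat"}, 4: {"BMW"},
    5: {"Audi"}, 6: {"Toyota"}, 7: {"BMW"}, 8: {"Toyota"},
}


def narrower_or_self(term):
    """The term itself together with all of its narrower terms."""
    result = {term}
    for child in NARROWER.get(term, []):
        result |= narrower_or_self(child)
    return result


def term_count(term, focus):
    """t.count: objects of the focus set indexed by t or by a narrower term of t."""
    covered = narrower_or_self(term)
    return sum(1 for o in focus if DESCRIPTION[o] & covered)


def facet_count(facet_terms, focus):
    """F.count: objects of the focus set indexed by at least one term of the facet."""
    return sum(1 for o in focus if DESCRIPTION[o] & facet_terms)


# All terms of the (hypothetical) Manufacturer facet.
MANUFACTURER = set().union(*(narrower_or_self(t) for t in NARROWER))
```

Restricting the focus set to {4, 5, 6} drops the count of Italian to zero, which is why that term would no longer be shown.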

3.3.1 Flat Single-Valued Attributes

We will now define the semantics of actions that express qualitative preferences, i.e. actions of the form best(ei), worst(ei), and prefer ei to ej, starting from the case of single-valued and flat attributes. Let B (resp. W) be the elements of E on which a best (resp. worst) action has been defined. Let R≻ be the relative preferences (of the form ei ≻ ej) over E provided by the user. We shall now introduce an algorithm, Alg. Apply, which takes as input these sets and derives one linear or bucket order. The algorithm also takes a parameter Policy which determines the ordering of the inactive elements (explained later on).

Algorithm 1 Apply(E, B, W, R≻, Policy)
Input: the set of elements E, the set of best elements B over E, the set of worst elements W over E, a set of relative relationships R≻ over E, and Policy for inactive elements
Output: a bucket order L over E that respects R
1: Rbw ← {(b, w) | b ∈ B, w ∈ W} // each best is preferred to each worst
2: R ← Rbw ∪ R≻ // add relative prefs
3: L ← SourceRemoval(R) // produce blocks with boundaries
4: I ← E \ (B ∪ W ∪ dom(R≻)) // I contains the inactive elements
5: L′ ← addInactiveElements(L, I, Policy)
6: return L′

Algorithm 2 SourceRemoval(R)
Input: a binary relation R over E
Output: a bucket order L over E that respects R
1: L ← ⟨⟩
2: repeat
3:   S ← maximal≻(R)
4:   R ← R \ {(x, y) ∈ R | x ∈ S} // Remove the maximal elements and their outgoing relationships
5:   L ← L.append(S) // Append a bucket to L
6: until R = ∅ // stop when no elements remain
7: return L

At first the algorithm constructs a graph by connecting each best to each worst element ((b, w) means b ≻ w). So best/worst are interpreted as “each best is preferred to each worst”. Then it adds to the graph the relationships in R≻. Note that the parameters B and W actually define a set of relationships (Rbw at line 1 of the algorithm), so they could have been expressed directly through the R≻ parameter; however, we keep them separate as they constitute a shorthand that is easily enacted by the user. In order to create Rbw this algorithm assumes that |B| ≥ 1 and |W| ≥ 1. If this is not the case, we can use different policies. Although a linear or bucket order could be produced by traversing the graph in a breadth first search (BFS) manner (where the first block will contain the most preferred elements, the second the next most preferred, etc.), if the transitive reduction is a DAG (Directed Acyclic Graph, i.e. not a tree), then BFS could yield wrong results (i.e. the produced linear or bucket order would not respect the condition of Definition 2). This will be made clear in a following example. If, instead of BFS, we use a topological sorting algorithm, which yields a linear ordering of the nodes of a DAG such that each node comes before all nodes to which it has outbound edges (e.g. Alg. SourceRemoval shown above), we can always get a linear order that respects R. In particular, Alg. SourceRemoval is based on the source removal algorithm described in Kahn (1962), with the additional condition that all maximal nodes removed in the same step are inserted in the same bucket. Initially, it finds all the maximal elements of R, moves them into a bucket, continues with the maximal elements among the remaining ones, and so on.

Figure 3.2: Example for Flat Single-Valued Attributes

To give an example, let B = {Ferrari}, W = {Fiat, Lancia} and R≻ = {Porsche ≻ Ferrari, Porsche ≻ Fiat}. Figure 3.2 shows at the left the diagram of R, and at the right the result of topological sorting (as derived by step 3 of Apply), i.e. L = ⟨Porsche, Ferrari, {Fiat, Lancia}⟩, meaning that the bucket order consists of three blocks (the first two are singletons).

Figure 3.3: Example for a DAG

Consider another example, where R≻ = {Porsche ≻ Ferrari, Ferrari ≻ Lancia, Lancia ≻ Fiat, Porsche ≻ Fiat}. Figure 3.3 shows the resulting total order (i.e. L = ⟨Porsche, Ferrari, Lancia, Fiat⟩) derived by topological sorting. If the final order was derived using BFS, the bucket order would be LBFS = ⟨Porsche, {Ferrari, Fiat}, Lancia⟩, although Fiat is the least preferred car. As a result R ⊈ LBFS (i.e. LBFS does not respect R).

Regarding inactive elements (elements not participating in any action), they can be considered as maximal or minimal elements according to the application needs (controlled by the parameter Policy of Alg. Apply). For example consider a facet Fi with values from a set Ti, and a number of actions that define the sets Bi, Wi, R≻i. By using E = Ti and calling Alg. Apply, in line 4 we compute the set of inactive elements I = E \ (Bi ∪ Wi ∪ dom(R≻i)) (where dom(R≻i) is the set of elements of E that participate in R≻i). Now by using the command addInactiveElements (line 5 of Alg. Apply) and passing as parameters the bucket order L, the inactive elements I and the policy based on the application needs, which can be maximal (resp. minimal), we put the inactive elements at the beginning (resp. end) of the bucket order as a new block, obtaining L′.

As a final note, our approach assumes totalitarian semantics regarding the attributes that do not participate in any preference action. For example, if Ferrari ≻ Fiat, then any car manufactured by Ferrari is preferred to any car manufactured by Fiat. In the opposite case (i.e. if our approach supported the ceteris paribus semantics), a Ferrari would be preferred to a Fiat car only if these cars agreed regarding preference on the values of all other attributes.
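The two algorithms of this section can be sketched in Python as follows. This is an illustrative reading and not the implementation used in this work: relations are sets of (preferred, less preferred) pairs, bucket orders are lists of sets, the names `source_removal` and `apply_actions` are assumptions, and the policy values "maximal"/"minimal" follow the description above.

```python
def source_removal(relation):
    """Layered topological sort: every step removes all current maximal elements."""
    nodes = {x for pair in relation for x in pair}
    edges = set(relation)
    buckets = []
    while nodes:
        dominated = {worse for (_, worse) in edges}
        maximal = nodes - dominated          # nothing remaining is preferred to these
        if not maximal:
            raise ValueError("the preference relation contains a cycle")
        buckets.append(maximal)
        nodes -= maximal
        edges = {(b, w) for (b, w) in edges if b not in maximal}
    return buckets


def apply_actions(elements, best, worst, relative, policy="minimal"):
    """Sketch of Alg. Apply; like the original it assumes |B| >= 1 and |W| >= 1."""
    r_bw = {(b, w) for b in best for w in worst}     # each best beats each worst
    buckets = source_removal(r_bw | set(relative))
    active = set(best) | set(worst) | {x for pair in relative for x in pair}
    inactive = set(elements) - active
    if inactive:
        # inactive elements form one extra block, first or last depending on the policy
        buckets = [inactive] + buckets if policy == "maximal" else buckets + [inactive]
    return buckets
```

Run on the two examples above, the sketch reproduces the bucket orders of Figures 3.2 and 3.3, including the case where BFS would wrongly tie Ferrari and Fiat.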

3.3.2 Set-Valued Attributes

Multi-valued attributes appear in several cases (social tags, clustering, etc.). In our running example suppose that the attribute accessories of the table Car is multi-valued, taking values like ABS, ESP (Electronic Stability Program), AT (Auto-Transmission), DVD, etc.

Definition 3 (Induced Preference over Sets: MoreWins-Rule) If s, s′ are two subsets of E, with wins(s, s′) we will denote the number of “times” s beats s′ according to ≻. Formally:

$$wins(s, s') = |\{(e, e') \mid e \in s,\ e' \in s',\ e \succ e'\}|$$

Any subset S of the powerset of E (i.e. S ⊆ P(E)) can be ordered according to a preference relation that we will denote by ≻{}, defined by the following rule:

$$s \succ_{\{\}} s' \ \text{ iff } \ wins(s, s') > wins(s', s)$$ □
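A direct transcription of Definition 3 into Python could look as follows. The relation `PREFERS` is a hypothetical preference over atomic values, given explicitly as (preferred, less preferred) pairs; the function names are assumptions as well.

```python
def wins(s, t, prefers):
    """Number of pairs (e, e2) with e in s, e2 in t and e preferred to e2."""
    return sum(1 for e in s for e2 in t if (e, e2) in prefers)


def more_wins(s, t, prefers):
    """MoreWins-Rule: s is preferred to t iff s beats t more often than t beats s."""
    return wins(s, t, prefers) > wins(t, s, prefers)


# Hypothetical relation over atomic values, used only for illustration.
PREFERS = {("ABS", "AT"), ("ABS", "ESP"), ("AT", "ESP")}
```

With this relation, {ABS} beats {AT, ESP} twice and is never beaten, so it is preferred under the rule, while a set compared with itself is always a tie.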

As an example consider a set T = {ABS, ESP, AT, DVD} and three statements which define ABS as best, ESP as worst and that ABS ≻ AT. Now consider the following family of sets: S = {{ABS}, {ESP}, {ABS, ESP}, {AT, ABS}, {AT, ESP}, {DVD, ESP}}. The wins(s, s′)/wins(s′, s) values of all pairs of sets from the above family are shown in the next table (the last column shows the number of clear winnings, not ties). By using Def. 3 (i.e. ≻{}) and then applying topological sorting we get the following bucket order ⟨{ABS}, {AT, ABS}, {ABS, ESP}, {{AT, ESP}, {DVD, ESP}}, {ESP}⟩, as shown in Fig. 3.4.

w(s, s′)/w(s′, s)   {ABS}   {ESP}   {ABS,ESP}   {AT,ABS}   {AT,ESP}   {DVD,ESP}   all
{ABS}               0/0     1/0     1/0         1/0        2/0        2/0         5/0
{ESP}               0/1     0/0     0/1         0/2        0/1        0/1         0/5
{ABS,ESP}           0/1     1/0     1/1         1/2        2/1        2/1         3/2
{AT,ABS}            0/1     2/0     2/1         1/1        3/0        3/0         4/1
{AT,ESP}            0/2     1/0     1/2         0/3        1/1        1/1         1/3
{DVD,ESP}           0/2     1/0     1/2         0/3        1/1        1/1         1/3

Figure 3.4: Example for Flat Multi-Valued Attributes

Now suppose that both ABS and ESP are defined as best elements, and that both AT and DVD are defined as worst. In that case it holds:

$$wins(\{ABS\}, \{ABS, ESP\}) = wins(\{ABS, ESP\}, \{ABS\}) = 0$$
$$wins(\{AT\}, \{AT, DVD\}) = wins(\{AT, DVD\}, \{AT\}) = 0$$

This means that with wins we get 0 whenever sets with only best elements are compared, and whenever sets with only worst elements are compared. If we would like to break such ties we could adopt a MoreGoodLessBad-rule (the more best elements the better, and the fewer worst elements the better). To define it formally, we first have to introduce some notation. Given an element e we use sup(e) to denote the number of elements that e dominates, minus 1. Formally, $sup(e) = |\{e' \in E \mid e \succ e'\}| - 1$. Notice that each worst element takes a negative value. Given a set of values s we define the support of s, denoted by Support(s), by summing up the support of its terms, i.e. $Support(s) = \sum_{e \in s} sup(e)$. Note that since a worst value takes −1 we can discriminate an s having one worst term from an s′ having 10 worst terms (Support(s) = −1, while Support(s′) = −10). We can now proceed and define:

Definition 4 (Breaking ties: MoreGoodLessBad-rule) If wins(s, s′) = wins(s′, s) and Support(s) > Support(s′) then s ≻{} s′.



In our case: Support({ABS, ESP}) = 2 > Support({ABS}) = 1 > Support({AT}) = −1 > Support({AT, DVD}) = −2, and the induced ordering, i.e. ⟨{ABS, ESP}, {ABS}, {AT}, {AT, DVD}⟩, satisfies the MoreGoodLessBad-rule.

To conclude, in case we have preferences over atomic values but the information space has set-valued attributes, it is enough to use Alg. Apply with a small modification. Initially, we follow the first two steps of Alg. Apply, in order to compute the relation ≻ over the atomic values. We should stress here that to compute wins correctly we have to take into account the transitive closure of the preference relation. For example, if a ≻ b and b ≻ c and we want to compute wins({a, e}, {c, e}) we should consider that a ≻ c. In other words, the ordering of the individual values (the topological sorting of Apply) has to be taken into account before computing wins over sets. Then we compute the wins (and the Support to break ties), to define ≻{}. Afterwards we continue with the next steps of Alg. Apply, i.e. topological sorting and so on, eventually yielding the final bucket order of the sets. The steps are given in more detail in Alg. 3.
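The tie-breaking step can be sketched in Python as below, using the example in which ABS and ESP are best and AT and DVD are worst. The function names and the pair-based representation of ≻ are assumptions, and the relation is taken to be already transitively closed (which holds trivially here, since it contains no chains).

```python
def wins(s, t, prefers):
    """Number of pairs (e, e2) with e in s, e2 in t and e preferred to e2."""
    return sum(1 for e in s for e2 in t if (e, e2) in prefers)


def sup(e, elements, prefers):
    """Number of elements that e dominates, minus 1."""
    return sum(1 for other in elements if (e, other) in prefers) - 1


def support(s, elements, prefers):
    return sum(sup(e, elements, prefers) for e in s)


def set_preferred(s, t, elements, prefers):
    """MoreWins-Rule, with ties broken by the MoreGoodLessBad-rule."""
    ws, wt = wins(s, t, prefers), wins(t, s, prefers)
    if ws != wt:
        return ws > wt
    return support(s, elements, prefers) > support(t, elements, prefers)


ELEMENTS = {"ABS", "ESP", "AT", "DVD"}
# Best = {ABS, ESP}, Worst = {AT, DVD}: each best is preferred to each worst.
PREFERS = {(b, w) for b in ("ABS", "ESP") for w in ("AT", "DVD")}
```

The supports computed this way are 2, 1, −1 and −2 for {ABS, ESP}, {ABS}, {AT} and {AT, DVD} respectively, matching the values given in the text.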

3.3.3 Best/Worst Preferences over Hierarchically Organized Values

Algorithm 3 ApplyOverFamiliesOfSets(E, B, W, R≻, Policy)
Input: the set of elements E (here each element of E is a set), the set of best elements B, the set of worst elements W, a set of relative relationships R≻, and Policy for inactive elements
Output: a bucket order L over E
1: Rbw ← {(b, w) | b ∈ B, w ∈ W}
2: R ← Rbw ∪ R≻
3: R ← Closure_transitivity(R) // Addition of the transitively induced links
4: for each e, e′ ∈ E, s.t. e ≠ e′ do
5:   if wins(e, e′) > wins(e′, e) then
6:     set e ≻{} e′
7:   else if wins(e, e′) < wins(e′, e) then
8:     set e′ ≻{} e
9:   else if wins(e, e′) = wins(e′, e) then
10:    resolve the tie by computing the support(e) and support(e′)
11: L ← SourceRemoval(≻{})
12: I ← E \ dom(≻{}) // I is the set of inactive elements
13: L′ ← addInactiveElements(L, I, Policy)
14: return L′

So far we have considered single-valued and set-valued attributes over flat (non-hierarchically organized) value domains. Let us now consider hierarchically organized values. As an example, if the user is interested in “Italian” cars and marks them as “best”, then it is reasonable to apply “best” also to its narrower terms, i.e. to Ferrari, Fiat, etc. It is not hard to see that the approach described in the previous section is not adequate for terms which are not leaves. For example suppose the following set of actions (using an informal syntax): B = {Best(European), Worst(Italian), Best(Ferrari)}, which define the sets B = {European, Ferrari}, W = {Italian}. If we apply Alg. Apply without taking into account the taxonomy we would get the bucket order shown in Figure 3.5, which does not make much sense, nor helps us to derive the intended ordering of cars.

Figure 3.5: Example of Preferences Without Exploiting Hierarchies

It follows that without proper exploitation of the subsumption relation, the user would have to issue a high number of actions, all anchored to leaf terms. To tackle this problem, below we introduce a form of preference inheritance where preferences are “inherited” by the narrower terms. Let b be an action in B. We shall use scope(b) to denote the scope of the action b, which is the set of elements (either facets, terms, or objects) that are affected by this action. To capture inheritance we will redefine the scope of actions which are anchored to terms of a taxonomy.

Definition 5 (Scope and Inheritance) Let b be an action b = ⟨e, rs⟩ where e is its anchor and rs the other part of the action. The scope of b is defined as:

$$scope(b) = scope(\langle e, rs \rangle) = \bigcup_{e' \in N^*(e)} scope(\langle e', rs \rangle)$$

where $N^*(e)$ stands for e and the narrower elements of e, formally $N^*(e) = \{e\} \cup N^+(e) = \{e' \mid e' \leq e\}$. □

In other words, the scope of b is the union of the scopes of the actions obtained by replacing the anchor e with a narrower term of e. Table 3.1 defines exactly the scope for each action, while the scopes of our example according to Def. 5 are shown in the first two columns of Table 3.2.

Table 3.1: Scopes (Direct and Under Inheritance)

scopeType       anchor      (D)irect scope   (I)nherited scope
terms order     facet Fi    $T_i$            $T_i$
terms order     term tj     $\{t_j\}$        $N^*(t_j)$
objects order   term tj     $I(t_j)$         $\bar{I}(t_j)$

Table 3.2: Scopes: Example for Best/Worst Preferences

action               scope                                                                                                                    active scope
b1: Best(European)   {European, German, Audi, BMW, Porsche, French, Citroen, Peugeot, Italian, Lancia, Ferrari, Fiat, Lamborghini}   scope(b1) \ scope(b2)
b2: Worst(Italian)   {Italian, Lancia, Ferrari, Fiat, Lamborghini}                                                                     scope(b2) \ scope(b3)
b3: Best(Ferrari)    {Ferrari}                                                                                                         scope(b3)

Note that the set of actions B = {b1, b2, b3} defines a valid preference, i.e. no cycles are formed (recall Def. 1). However, if we “unfold” each b ∈ B based on its scope, then we will get a B′ that does not define a valid preference, e.g. Ferrari will be both best and worst, and this forms a cycle. To tackle this problem, and to provide an intuitive interpretation of the user’s actions, we will introduce what we call active scope, after first introducing some required definitions.

Definition 6 We say that an action b is equally or more refined than an action b′, denoted by b ⊑ b′, if scope(b) ⊆ scope(b′).



In this way a preorder (reflexive and transitive) relation over B, denoted by (B, ⊑), is defined. In the case of our example, the Hasse diagram of ⊑ is shown in Figure 3.6.

Figure 3.6: Hasse Diagram of Actions Refinement

The objective is to use (B, ⊑) for resolving the conflicts incurred due to inheritance. This can be done by assuming that more specific preferences prevail over less specific ones (specificity). In particular, we introduce the following rule:

Definition 7 (Scope-based Dominance Rule) If A ⊆ scope(b) ⊆ scope(b′) then b′ is dominated by b on A, and thus action b′ should not determine the ordering of A.

We can now define the active scope of each action, by excluding from its scope the scopes of its direct children with respect to ⊑. Specifically, we can define active scope as:

Definition 8 (Active Scope) If C(b) denotes the direct children of b with respect to ⊑, then the active scope of b, denoted by aScope(b), is defined as:

$$aScope(b) = scope(b) \setminus \Big(\bigcup_{b' \in C(b)} scope(b')\Big)$$ □

In our example, the active scopes are shown in Table 3.2. From these we obtain B = aScope(b1) ∪ aScope(b3), and W = aScope(b2), which define a valid preference. Specifically, Alg. Apply will return (assuming inactive elements go at the end) the following bucket order: ⟨{European, Ferrari, German, Audi, BMW, Porsche, French, Citroen, Peugeot}, {Lancia, Fiat, Lamborghini}, {Asian, Japanese, Toyota, Korean, Kia, American, U.S.A., Chrysler}⟩
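The computation of scopes (Def. 5) and active scopes (Def. 8) for term-scoped actions can be sketched in Python as follows. The taxonomy fragment is an assumption reconstructed from the scopes listed in Table 3.2, the function names are hypothetical, and the sketch treats only strictly smaller scopes as children (two actions with identical scopes are not handled).

```python
# Assumed fragment of the Manufacturer taxonomy, reconstructed from Table 3.2.
CHILDREN = {
    "European": ["German", "French", "Italian"],
    "German": ["Audi", "BMW", "Porsche"],
    "French": ["Citroen", "Peugeot"],
    "Italian": ["Lancia", "Ferrari", "Fiat", "Lamborghini"],
}


def narrower_or_self(term):
    """N*(e): the term itself together with all of its narrower terms."""
    result = {term}
    for child in CHILDREN.get(term, []):
        result |= narrower_or_self(child)
    return result


def active_scopes(anchors):
    """anchors maps an action name to its anchor term; returns name -> active scope."""
    scope = {name: narrower_or_self(term) for name, term in anchors.items()}
    active = {}
    for b in scope:
        # actions that are strictly more refined than b
        refined = [c for c in scope if c != b and scope[c] < scope[b]]
        # direct children of b: refined actions not below another refined action
        direct = [c for c in refined if not any(scope[c] < scope[d] for d in refined)]
        active[b] = scope[b].difference(*(scope[c] for c in direct))
    return active
```

For the three actions of the running example, the result agrees with the third column of Table 3.2: the active scope of b1 excludes everything under Italian, and the active scope of b2 excludes Ferrari.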

Now, consider the same set of actions B but suppose that they are object-scoped instead of term-scoped, and assume that the table Cars contains the following tuples:

Id    Manuf
P     Porsche
L     Lancia
Fi    Fiat
Fe    Ferrari
T     Toyota
...

The (plain and active) scopes in this case are:

action               scope             active scope
b1: Best(European)   {P, L, Fi, Fe}    {P}
b2: Worst(Italian)   {L, Fi, Fe}       {L, Fi}
b3: Best(Ferrari)    {Fe}              {Fe}

The sets B and W of the active scopes are: B = {P, Fe} and W = {L, Fi}. With these parameters Alg. Apply will yield the ordering: ⟨{P, Fe}, {L, Fi}, {T}⟩.

The algorithm that supports inherited preferences and scope-based resolution of conflicts is Alg. PrefOrder. It starts by computing the scopes of each action b ∈ B (line 2) using Def. 5 in order to compute the preorder relation (B, ⊑) (line 3). Afterwards, it computes the active scopes using Def. 8 (line 5), and expands the original set of actions B to a new set of actions B′, by including the new actions computed by the active scopes (line 6). Then, it parses the new action set B′ in order to get B, W and R≻ (line 8). Finally, it calls Alg. Apply (line 9).

Algorithm 4 PrefOrder(E, B, Policy)
Input: the set of elements E, the set of actions B, and Policy for inactive elements
Output: a bucket order L over E
1: // Part (i): Computation of (B, ⊑)
2: Compute the scopes of the actions in B
3: Form (B, ⊑)
4: // Part (ii): Efficient Computation of Act. Scopes
5: Use (B, ⊑) to compute the active scopes of the actions in B
6: Use the active scopes to expand the set B to a set B′
7: // Part (iii): Derivation of the final bucket order
8: (B, W, R≻) ← Parse(B′)
9: return Apply(E, B, W, R≻, Policy) // call to Alg. 1

Let us now discuss a number of propositions.

Prop. 1 If B ∩ W = ∅ and (T, ≤) is a tree, then in the expanded (through active scopes) actions, a term cannot be both Best and Worst. ⋄

Proof: Since (T, ≤) is a tree, for each term t there is one and only one path starting from t and ending at the root of the tree. The term t will be in the active scope of the closest action, i.e. in the active scope of an action anchored on t, or on its father, or on the father of its father, and so on. Therefore it can only be in the active scope of an action anchored to its closest (in the path) term. Since B ∩ W = ∅, that anchor can be either in B or in W (not both), therefore t cannot be both Best and Worst.

This means that the inheritance of preferences over tree-structured facets cannot create any ambiguity. However, if (T, ≤) is a DAG (Directed Acyclic Graph), then Prop. 1 does not always hold, e.g. consider a term having two direct fathers, one defined as best, the other as worst. Such actions do not define a valid preference, and below we show how we can detect such cases. Let:

effAnchors(t) = minimal{t′ | t ≤ t′ and t′ is anchor of one preference action}

Prop. 2 If B ∩ W = ∅, then there is not any ambiguity about a term t iff the actions in effAnchors(t) are either all Best or all Worst. ⋄

Proof: It is a straightforward consequence of the definitions that a term t will be in the active scopes of the actions anchored to the terms that belong to the set effAnchors(t) = minimal{t′ | t ≤ t′ and t′ is anchor of one preference action}. If all such actions are Best (resp. Worst) statements, then t will be Best (resp. Worst) in the expanded statements. If however some of these actions are Best and some are Worst, then (since t will be in the active scopes of all of them) t will be both Best and Worst, and thus the expansion will create ambiguities (and hence an invalid preference). Note that Prop. 1 is a special case of Prop. 2, since in trees for each term t it holds: |effAnchors(t)| ≤ 1
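The check suggested by Prop. 2 can be sketched in Python as below. The DAG, the term names and the function names are all hypothetical and serve only to illustrate the definition of effAnchors; actions are represented as a mapping from anchor term to "best" or "worst".

```python
def broader_or_self(term, parents):
    """The term itself together with all of its broader terms (t' with t <= t')."""
    result = {term}
    for parent in parents.get(term, ()):
        result |= broader_or_self(parent, parents)
    return result


def eff_anchors(term, parents, actions):
    """Minimal anchored terms among the term and its broader terms."""
    candidates = broader_or_self(term, parents) & set(actions)
    return {
        a for a in candidates
        # a is minimal if no other candidate is narrower than a
        if not any(c != a and a in broader_or_self(c, parents) for c in candidates)
    }


def is_ambiguous(term, parents, actions):
    """True if the effective anchors of the term mix Best and Worst actions."""
    return len({actions[a] for a in eff_anchors(term, parents, actions)}) > 1


# Hypothetical DAG: the term Hybrid has two direct fathers.
PARENTS = {"Hybrid": ["Sporty", "Expensive"], "Sporty": ["Car"], "Expensive": ["Car"]}
```

With Sporty marked as best and Expensive as worst, Hybrid has two effective anchors of opposite polarity and is therefore ambiguous; if instead Car is worst and Sporty is best, the only effective anchor of Hybrid is Sporty and no ambiguity arises.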


Algorithmically, we can check whether the actions defined over a DAG-structured facet create an ambiguity by checking the condition of Prop. 2 only for those terms which have more than one direct father.

Prop. 3 Alg. PrefOrder respects the scope-based dominance rule (Def. 7).

Proof: Suppose the opposite, i.e. suppose that ∃ b, b′ and A ⊆ Obj s.t. A ⊆ scope(b) ⊆ scope(b′) and that PrefOrder orders the elements of A on the basis of action b′. This cannot be true, since according to Def. 8 the active scope of b′ will not contain A. Notice that although in the definition of active scopes (Def. 8) only the direct children are used, the scope (defined as in Def. 5) is based on N*(e), so it takes into account all children w.r.t. ≤. For this reason it is enough in Def. 8 (and actually more efficient at implementation level) to consider only the direct children.

3.3.4 Relative Preferences over Hierarchically Organized Values

To complete the expressive power of the proposed actions, here we study the case of relative (qualitative) preferences over hierarchically organized values. Specifically, our objective is to support sets of preferences of the form:

(b1): Asian ≻ European
(b2): European ≻ Kia
(b3): BMW ≻ Asian
(b4): Kia ≻ Fiat
(b5): Toyota ≻ Kia

whose semantics take into account inheritance, and whose conflicts are resolved in an intuitive manner. To this end we will define the scope and the expansion of such preferences.

Definition 9 (Scope of Relative Preferences) The scope of a preference relationship ei ≻ ej, denoted by scope(ei ≻ ej), is defined as:

scope(ei ≻ ej) = (N*(ei) × N*(ej)) ∪ (N*(ej) × N*(ei)) □


Definition 10 (Expansion of Relative Preferences) The expansion of a preference relationship ei ≻ ej, denoted by expansion(ei ≻ ej), is defined as:

expansion(ei ≻ ej) = {e′i ≻ e′j | e′i ∈ N*(ei), e′j ∈ N*(ej)} □

This means that expansion(ei ≻ ej) actually "unfolds" the preference relationship ei ≻ ej on the basis of the subsumption relationships, while scope(ei ≻ ej) does not contain any preference relationship (it is used for resolving conflicts, as we shall see below). The scope-based ordering of such actions is defined as before (Def. 6), i.e. b ⊑ b′ iff scope(b) ⊆ scope(b′). We can now define the active scope of a preference ei ≻ ej by excluding from its expansion all relationships e′i ≻ e′j such that (e′i, e′j) belongs to the scope of a child (w.r.t. ⊑) action.

Definition 11 (Active Scope of Relative Preferences) The active scope of a preference action b, in the context of a set of preference actions B, is defined as:

aScope(b) = {ei ≻ ej ∈ expansion(b) | ∄ b′ ∈ B s.t. b′ ⊑ b and (ei, ej) ∈ scope(b′)}

which is equivalent to:

aScope(b) = expansion(b) \ ⋃_{b′ ⊑ b} {ei ≻ ej | (ei, ej) ∈ scope(b′)} □

Assume that the taxonomy of manufacturers has the form shown in Figure 3.7.

Figure 3.7: Taxonomy of Manufacturers

Then the scope-based ordering of preferences b1-b5 is that shown in Figure 3.8, while Table 3.3 shows the scopes, expansions and active scopes of the actions.

Figure 3.8: Hasse Diagram of Scope-Based Ordering of Preference Actions

Table 3.3: Scopes: Example for Relative Preferences

preference | expansion | active scope
b1: Asian ≻ European | Asian ≻ European, Asian ≻ BMW, Asian ≻ Fiat, Kia ≻ European, Kia ≻ BMW, Kia ≻ Fiat, Toyota ≻ European, Toyota ≻ BMW, Toyota ≻ Fiat, Lexus ≻ European, Lexus ≻ BMW, Lexus ≻ Fiat | Asian ≻ European, Asian ≻ Fiat, Toyota ≻ European, Toyota ≻ Fiat, Lexus ≻ European, Lexus ≻ Fiat
b2: European ≻ Kia | European ≻ Kia, BMW ≻ Kia, Fiat ≻ Kia | European ≻ Kia, BMW ≻ Kia
b3: BMW ≻ Asian | BMW ≻ Asian, BMW ≻ Kia, BMW ≻ Toyota, BMW ≻ Lexus | BMW ≻ Asian, BMW ≻ Kia, BMW ≻ Toyota, BMW ≻ Lexus
b4: Kia ≻ Fiat | Kia ≻ Fiat | Kia ≻ Fiat
b5: Toyota ≻ Kia | Toyota ≻ Kia | Toyota ≻ Kia
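Definitions 9-11 can be traced with a short Python sketch. The taxonomy used here is an assumption inferred from the expansions listed in Table 3.3 (Asian with children Kia, Toyota and Lexus; European with children BMW and Fiat), since Figure 3.7 is not reproduced in the text. A "child" action is taken to be any other action whose scope is contained in the scope of the action at hand; the function names are illustrative only.

```python
from itertools import product

# Assumed taxonomy, inferred from the expansions of Table 3.3.
CHILDREN = {"Asian": ["Kia", "Toyota", "Lexus"], "European": ["BMW", "Fiat"]}


def narrower(term):
    """N*(term): the term together with all of its descendants."""
    result = {term}
    for child in CHILDREN.get(term, []):
        result |= narrower(child)
    return result


def expansion(pref):
    """Def. 10: unfold (better, worse) over the descendants of both terms."""
    better, worse = pref
    return set(product(narrower(better), narrower(worse)))


def scope(pref):
    """Def. 9: the expansion pairs in both directions."""
    pairs = expansion(pref)
    return pairs | {(y, x) for (x, y) in pairs}


def active_scope(pref, prefs):
    """Def. 11: drop the pairs covered by the scope of a child action."""
    blocked = set()
    for other in prefs:
        if other != pref and scope(other) <= scope(pref):
            blocked |= scope(other)
    return expansion(pref) - blocked


PREFS = [("Asian", "European"), ("European", "Kia"), ("BMW", "Asian"),
         ("Kia", "Fiat"), ("Toyota", "Kia")]

for p in PREFS:
    print(p, sorted(active_scope(p, PREFS)))
```

Under these assumptions the sketch yields the active scopes of Table 3.3; for instance, Fiat ≻ Kia is removed from the active scope of b2 because the pair falls within the scope of b4.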

As in the case of Best/Worst preferences (and Prop. 1 and 2), here we have to examine whether the expansion of relative preferences creates ambiguities (conflicts), apart from those which are resolved by the scope-based rule, and how we can identify such cases. Let B be a set of relative preference actions, which define a valid preference relation R≻ over a (T, ≤). We will examine whether a preference relationship between two terms e and e′ of T (either e ≻ e′ or e′ ≻ e) can be in the active scope of more than one action in B. If this holds, then both e ≻ e′ and e′ ≻ e could belong to the expanded (through the active scopes) preference relation, and thus that preference relation would be invalid.

Let us make the hypothesis that a relationship e ≻ e′ belongs to the active scope of two actions bi and bj such that bi ≠ bj. Suppose that bi: ti ≻ ti′ and bj: tj ≻ tj′. Certainly e ≻ e′ should belong to the expansions of both bi and bj. Membership to the expansion of bi means: e ≤ ti and e′ ≤ ti′. Membership to the expansion of bj means: e ≤ tj and e′ ≤ tj′. We can identify the following cases:

(i) If ti ≤ tj and ti′ ≤ tj′, then it holds bi ⊑ bj and thus e ≻ e′ can belong to the active scope of bi only (and not of bj).
(ii) If tj ≤ ti and tj′ ≤ ti′, then it holds bj ⊑ bi and thus e ≻ e′ can belong to the active scope of bj only (and not of bi).
(iii) If ti ≤ tj and tj′ ≤ ti′, or tj ≤ ti and ti′ ≤ tj′, then neither bi ⊑ bj nor bj ⊑ bi holds. This means that in such cases e ≻ e′ could belong to the active scopes of both. An example is shown in Figure 3.9 (left).
(iv) If ti || tj and/or tj′ || ti′, again we have bi ̸⊑ bj and bj ̸⊑ bi, meaning that e ≻ e′ would belong to the active scopes of both. Note that the case ti || tj and tj′ || ti′ can occur in DAGs, and an example is shown in Figure 3.9 (right). For the case of trees we cannot have ti || tj, since we know that e ≤ ti and e ≤ tj (these three relationships cannot hold together in a tree). For the same reason, in trees it cannot hold tj′ || ti′.

Figure 3.9: Examples of Conflicts (left: BMW > KIA due to German > Asian, BMW < KIA due to Korean > European; right: I > J due to A > B, I < J due to R > P)

The cases (iii) and (iv) are indicative situations where a conflict can occur. Note that case (iii) can occur both in trees and DAGs, while case (iv) only in DAGs.

It follows from the above that we need methods for detecting the cases where inheritance causes invalidities. One method to do so is to compute the expansion and then check for cycles. This means that a classical cycle detection algorithm (e.g. topological sort) is enough for detecting such cases. We could also avoid the expansion step in some cases. Below we elaborate on a method that could be applied for the case of tree-structured taxonomies. To begin with, let Re denote the expanded (through the notion of active scopes) preference relation of R≻ (obviously, R≻ ⊆ Re).

Prop. 4 (Relative Inherited Preferences and Conflicts) For tree-structured taxonomies, the expansion through active scopes of a valid preference relation R≻ (yielding a preference relation Re) can create a conflict iff (if and only if) there are two actions in R≻ (not necessarily different) of the form a ≺ b and c ≺ d such that either: (i) a ≤ d and c ≤ b hold, or (ii) b ≤ c and d ≤ a hold. If these actions are the same, meaning that a = c and d = b, the formulation of the proposition becomes: Re has a conflict iff there is an action a ≺ b and either a ≤ b or b ≤ a. ⋄

Proof:
(Direction: if the conditions of the proposition hold then Re has a conflict)
As we can see from Figure 3.10 (i), if the conditions of the proposition hold, then Re contains a conflict (either between a and c, or between d and b). Regarding the special case (where the two actions are the same), note that if b ≤ a then we get the cycle b ≺ b (see Figure 3.10 (ii-left)). If a ≤ b then we get the cycle a ≺ a (see Figure 3.10 (ii-middle)). Note that non-trivial cycles (i.e. not self-cycles) can also occur, e.g. if c ≤ b ≤ a, with the expansion we will get c ≺ b and b ≺ c (see Figure 3.10 (ii-right)).

(Direction: if Re has a conflict then the conditions of the proposition hold)
Trivial Cycle. Suppose that Re has a trivial cycle of the form a ≻ a.
Since this relationship cannot belong to R≻ (which is acyclic by assumption), it should be the result of an inherited action. Therefore a should have a superclass, say sp, for which there is an action sp ≻ sb, and for this action to be inheritable to a, it should also hold that a ≤ sb. Therefore it should hold a ≤ sb and a ≤ sp. However, since ≤ is a tree, sb and sp cannot be incomparable (i.e. it cannot be sb || sp), therefore it should either be a ≤ sb ≤ sp or a ≤ sp ≤ sb. We reached the conclusion that there exists an action sp ≻ sb and either sb ≤ sp or sp ≤ sb. This is exactly what the proposition states.

Figure 3.10: Relative Inherited Preferences and Conflicts Examples (panels (i), (ii) and (iii))

Cycle of the form e ≺ e′ ≺ e. A relationship e ≺ e′ can belong to Re either because it belongs to R≻, or due to an action a ≺ b to whose active scope the relationship e ≺ e′ belongs. In the latter case it should be e ≤ a and e′ ≤ b. Analogously, a relationship e′ ≺ e can belong to Re either because it belongs to R≻, or due to an action c ≺ d to whose active scope the relationship e′ ≺ e belongs. In the latter case, it should be e′ ≤ c and e ≤ d (illustrated in Figure 3.10 (iii)). However, since ≤ is a tree it cannot be a || d nor b || c. Therefore we can have one of the following four cases (also illustrated in Figure 3.11):

(i) a ≤ d and b ≤ c
(ii) a ≤ d and c ≤ b
(iii) d ≤ a and b ≤ c
(iv) d ≤ a and c ≤ b

We cannot be in case (i) because in that case e ≻ e′ would not be in the active scope of c ≺ d (that would contradict one of our hypotheses). Similarly, we cannot be in case (iv) because in that case e ≺ e′ would not be in the active scope of a ≺ b. So only (ii) and (iii) can hold. Notice that we reached the exact conditions that the proposition states.

⋄

Figure 3.11: Examples of Cycles of the Form e ≺ e′ ≺ e (cases (i), (ii), (iii) and (iv))

Based on the above proposition, below we describe an algorithmic method for identifying such problems without having to expand R≻, i.e. without having to compute Re. For each pair of statements (i.e. for each pair of relationships in R≻) we check whether the condition of Proposition 4 holds. This means that we need to check the proposition |R≻|(|R≻|−1)/2 times. To check the proposition once, we have to check whether four ≤ relationships hold. If the transitive closure of ≤ is stored, then this can be checked fast (one scan, or even faster if indexes exist). If the transitively induced ≤ relationships are not stored, then we can check whether t ≤ t′ by applying the reachability algorithm, with cost analogous to the average depth of the taxonomy. If however the taxonomy has been labeled (e.g. using Agrawal et al. (1989)), then we can check whether t ≤ t′ in O(1).

At application level, the detected invalidities can be managed in various ways. For instance, we can inform the user and ask for a revision of the preferences or for a resolution of the ambiguity. Alternatively one could consider the preference invalid and thus ignore it, or "cut" the inheritance at some points (e.g. at the points of conflicts), or employ other conflict resolution rules (e.g. the closer in the ≤ hierarchy prevails, or the more recent action prevails, etc.). All these are application-specific issues that go beyond the focus of this thesis.

Obviously (B, ⊑) contains relationships only between preferences of the same kind (i.e. Best/Worst and Relative). Therefore, in the first step of the algorithm, where we compute (B, ⊑), we first calculate the actions' refinement preorder for Best/Worst preference actions, then for Relative preference actions, and finally we return the union of these relationships.

Returning to the preference-based order and the actions b1-b5 given at the beginning of this section, we can apply Alg. PrefOrder as it is (assuming the scope defined as in this section). Specifically, to produce the induced ordering we have to pass to Alg. Apply, through R≻, all active scopes of the actions in B. Figure 3.12 shows the transitive reduction (i.e. the Hasse Diagram) of the relation R≻ for the preferences over the Manufacturer attribute. The derived bucket order by Alg. PrefOrder in our example is:

⟨{BMW}, {Asian, Toyota, Lexus}, {European}, {Kia}, {Fiat}⟩

and its restriction on the leaves of the taxonomy is:

⟨{BMW}, {Toyota, Lexus}, {Kia}, {Fiat}⟩

which captures the intuition.
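The derivation of this bucket order can be sketched in Python as a repeated removal of sources, which at the same time acts as the cycle check mentioned earlier. The sketch assumes that, when only relative preferences are present, the ordering is obtained by repeatedly emitting the elements that are not dominated by any remaining element; the exact behaviour of Alg. Apply is defined elsewhere, and the function name below is illustrative. The relation is the union of the active scopes of Table 3.3.

```python
def source_removal(elements, prefers):
    """Peel off, block by block, the elements with no remaining dominator.

    prefers is a set of (better, worse) pairs. A ValueError is raised when
    no source exists, which signals a cycle (an invalid preference relation).
    """
    remaining = set(elements)
    buckets = []
    while remaining:
        dominated = {w for (b, w) in prefers if b in remaining and w in remaining}
        sources = remaining - dominated
        if not sources:
            raise ValueError("cyclic preference relation")
        buckets.append(sources)
        remaining -= sources
    return buckets


# Union of the active scopes of b1-b5 (Table 3.3).
R = {("Asian", "European"), ("Asian", "Fiat"), ("Toyota", "European"),
     ("Toyota", "Fiat"), ("Lexus", "European"), ("Lexus", "Fiat"),
     ("European", "Kia"), ("BMW", "Kia"), ("BMW", "Asian"),
     ("BMW", "Toyota"), ("BMW", "Lexus"), ("Kia", "Fiat"), ("Toyota", "Kia")}
TERMS = {"Asian", "European", "BMW", "Fiat", "Kia", "Toyota", "Lexus"}
LEAVES = {"BMW", "Fiat", "Kia", "Toyota", "Lexus"}

order = source_removal(TERMS, R)
leaf_order = [block & LEAVES for block in order if block & LEAVES]
print(order)
print(leaf_order)
```

Under these assumptions the blocks coincide with the bucket order given above, and filtering each block to the leaves gives its restriction.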

Figure 3.12: Hasse Diagram of the Relation R for the Manufacturer Attribute

3.3.5 Preferences over Hierarchical Set-Valued Attributes

In case we have set-valued attributes over hierarchically organized value domains, we can again exploit inheritance to order the sets. In particular, consider the scope and active scope as defined earlier, in a way that captures relative preferences. We can apply Alg. PrefOrder up to line 8 (i.e. just before calling the algorithm Apply), and then apply the algorithm described in Section 3.3.2 (based on the relation ≻{} ), to derive the final bucket order. The steps are sketched in more detail in Alg. 5.

3.4 Multi-Facet Preferences

Here we describe the case where we have actions that concern more than one facet. The user can define a preference separately for each facet (using one or more actions) and then compose them using the Priority or Pareto operators (Pareto Optimal is a subcase of Pareto), or a composition of these operators.


Algorithm 5 PrefOrderSetValued(E, B, Policy)
Input: the set of elements E (E is a family of sets), the set of actions B, and Policy for inactive elements
Output: a bucket order L′ over E
1: // As in Alg. 4:
2: Compute the scopes of the actions in B and form (B, ⊑)
3: Use (B, ⊑) to compute the active scopes of the actions in B
4: Use the active scopes to expand the set B to a set B′
5: (B, W, R≻) ← Parse(B′)
6: // As in Alg. 3:
7: Rbw ← {(b, w) | b ∈ B, w ∈ W}
8: R ← Rbw ∪ R≻
9: R ← Closure_transitivity(R) // Addition of the transitively induced links
10: Compute ≻{} based on wins and support as in Alg. 3
11: L ← SourceRemoval(≻{})
12: I ← E \ dom(≻{}) // I is the set of inactive elements
13: L′ ← addInactiveElements(L, I, Policy)
14: return L′

3.4.1 Prioritized Composition

Prioritized composition (Kießling (2002)) of two preference relations P1 and P2, denoted by P1 ▷ P2, meaning that P1 has higher priority than P2, is defined as:

x ≻_{P1▷P2} y iff x1 ≻_{P1} y1 ∨ (x1 = y1 ∧ x2 ≻_{P2} y2)

Let B_i and B_j be two sets of object-scoped actions. Suppose the user has defined B_i ▷ B_j, and let A be the current object set (the focus). The ordering of A with respect to B_i ▷ B_j is derived by ordering each block defined by the preference B_i, using the preferences in B_j. The exact steps are given in Alg. MFPriority. At Step 1 we derive the blocks defined by the preference B_i. At Step 2 we order the elements of each block derived from the first step, using the actions in B_j. Finally, at Step 3 we just put these blocks in the order specified by Step 1.

Algorithm 6 MFPriority(A, B_i, B_j)
Input: the objects of current focus A, the actions B_i for facet Fi, and the actions B_j for facet Fj
Output: a bucket order L of A corresponding to B_i ▷ B_j
1: We call Alg. PrefOrder(A, B_i) and let L = ⟨A1, ..., AM⟩ be the produced bucket order, where M is the number of blocks returned.
2: For each block Am of L (1 ≤ m ≤ M) where |Am| > 1, we call PrefOrder(Am, B_j), returning a bucket order Lm = ⟨Am1, ..., Amz⟩.
3: We replace each block Am of L with its bucket order Lm, and this yields the final bucket order L = ⟨L1, ..., LM⟩.

Let us now denote with o1 ∼ o2 that two objects are indifferent based on the relation R≻, i.e. that neither o1 ≻ o2 nor o2 ≻ o1 holds. A refinement of the indifference relation associated to a preference relation R≻ is to consider objects o1, o2 as equivalent, o1 ≈ o2 (footnote 5), if o1 ∼ o2 and for all o ∈ Obj such that o1 ≺ o or o ≺ o1, it is o2 ≺ o or o ≺ o2 respectively, and vice versa. If o1 ∼ o2 and o1 ≉ o2 we say that objects o1 and o2 are incomparable (Ciaccia and Torlone (2011)).

Footnote 5: Another symbol used for equivalence in the bibliography is ≡.

It is easy to see that the produced bucket order interprets prioritized composition (▷) as:

x ≻_{P1▷P2} y iff x1 ≻_{P1} y1 ∨ (x1 ∼_{P1} y1 ∧ x2 ≻_{P2} y2)

where x1 ∼_{P1} y1 means that x1 and y1 are in the same block in the bucket order produced by P1. This means that the relative ordering of the blocks defined by P1 is preserved, and this policy is aligned with what the user expects to see. This is the prioritized composition described in Chomicki (2003), which is referred to as triangle composition in Ross (2007).

A refinement of the above is to use equivalence (≈) instead of indifference (∼) (see Section 3.2). This refinement can be made, since in our case if o1 ∼ o2 then for all o ∈ Obj such that o1 ≺ o or o ≺ o1, it is o2 ≺ o or o ≺ o2 respectively, and vice versa (our algorithms provide a bucket order for all elements, since they also consider inactive elements). As a result, the produced bucket order interprets prioritized composition (▷) as follows:

x ≻_{P1▷P2} y iff x1 ≻_{P1} y1 ∨ (x1 ≈_{P1} y1 ∧ x2 ≻_{P2} y2)

The above algorithm can be straightforwardly generalized to more than two facets. For example, assume that the user has defined:

B_Loc ▷ B_Manuf ▷ B_Price

Moreover, assume the actions B_Loc = {Best(Crete), Worst(Chania)}, B_Manuf = {Best(European), Worst(Italian), Best(Ferrari)} and B_Price = {price min}, and suppose that the current focus A consists of the following tuples:

Id | Location | Manuf. | Price | ...
L | Heraklion | Lancia | 10 | ...
B | Chania | BMW | 20 | ...
A1 | Athens | Audi | 20 | ...
A2 | Athens | Audi | 21 | ...
F1 | Heraklion | Ferrari | 100 | ...
F2 | Rethymno | Ferrari | 80 | ...

The constituent and final bucket orders are shown below (for the composed preferences we use nesting to make clear how each block was derived):

L_{B_Loc} = ⟨{L, F1, F2}, {B}, {A1, A2}⟩
L_{B_Manuf} = ⟨{B, A1, A2, F1, F2}, {L}⟩
L_{B_Price} = ⟨{L}, {B, A1}, {A2}, {F2}, {F1}⟩
L_{B_Loc ▷ B_Manuf} = ⟨⟨{F1, F2}, {L}⟩, {B}, {A1, A2}⟩
L_{B_Loc ▷ B_Manuf ▷ B_Price} = ⟨⟨⟨{F2}, {F1}⟩, {L}⟩, {B}, ⟨{A1}, {A2}⟩⟩ = ⟨F2, F1, L, B, A1, A2⟩
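The nesting above can be reproduced with a small Python sketch of the prioritized composition. As a simplifying assumption, the sketch does not call PrefOrder again on each block (as Step 2 of Alg. MFPriority does); instead it restricts the already computed bucket order of the lower-priority facet to the objects of the block. The function names are illustrative, and the three constituent bucket orders are the ones listed above.

```python
def restrict(order, block):
    """Keep only the objects of `block`, preserving the block sequence of `order`."""
    return [b & block for b in order if b & block]


def prioritized(orders):
    """Refine the first bucket order with each following one, in priority order."""
    result = list(orders[0])
    for lower in orders[1:]:
        refined = []
        for block in result:
            # Each block is internally ordered by the next facet (Step 2),
            # and the outer block sequence is preserved (Step 3).
            refined.extend(restrict(lower, block))
        result = refined
    return result


L_LOC = [{"L", "F1", "F2"}, {"B"}, {"A1", "A2"}]
L_MANUF = [{"B", "A1", "A2", "F1", "F2"}, {"L"}]
L_PRICE = [{"L"}, {"B", "A1"}, {"A2"}, {"F2"}, {"F1"}]

print(prioritized([L_LOC, L_MANUF]))
print(prioritized([L_LOC, L_MANUF, L_PRICE]))
```

With these inputs the first call gives the flattened form of the order for B_Loc ▷ B_Manuf and the second the linear order F2, F1, L, B, A1, A2.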

Note that the prioritized composition method (and algorithm) specified above does not adopt the ceteris paribus semantics, since it does not require equality of values. Higher priority implies preference over all other attributes, and therefore it adopts the totalitarian semantics. Totalitarian semantics are too strong and can lead to cyclic preferences when several comparative preference statements are dealt with (Neves and Kaci (2010)). In our framework we always compose facets either by Priority or Pareto composition, or by a combination of them, to avoid this kind of cycle. We can even assume a default behaviour of automatic facet priority drift, based on the interaction of the user with the facets. The second prioritized facet assumes totalitarian semantics for each block of the bucket order returned by ordering the elements based on the most prioritized facet, the third prioritized facet assumes totalitarian semantics per sub-block of the previous bucket order, etc. For example in the previous case, since the Location facet is prioritized over the Manufacturer facet, and Heraklion ≻ Chania, the Lancia will be preferred to the BMW, although Italian cars are not preferred over other European cars.

3.4.2 Pareto Composition and BMO-set

The Pareto composition (Kießling (2002)) assumes that the preferences expressed over different facets are equally important. Typically, the Pareto composition of two preference relations P1 and P2, denoted by P1 ⊗ P2, is defined as:

x ≻_{P1⊗P2} y iff (x1 ≻_{P1} y1 ∧ (x2 = y2 ∨ x2 ≻_{P2} y2)) ∨ (x2 ≻_{P2} y2 ∧ (x1 = y1 ∨ x1 ≻_{P1} y1))

The winnow operator (Chomicki (2003)), also called Pareto optimal or Best operator (Torlone and Ciaccia (2002)), selects the maximal elements of the preference order defined using the Pareto composition (i.e. the BMO-set). There are many algorithms for the winnow operator, like BNL described in Börzsönyi et al. (2001) or SFS described in Chomicki et al. (2003). The winnow operator is also implicit in skyline queries, which support only LOWEST and HIGHEST preferences based on the Preference Algebra described in Kießling (2002). Methods for calculating skylines over partially ordered data have also started to emerge, as in Zhang et al. (2010).

Let B_i and B_j be two sets of object-scoped actions. Suppose the user has defined B_i ⊗ B_j, and let A be the current object set (the focus). Let us denote with A_BMO the BMO-set of the focus A. The exact steps for computing the Pareto composition are given in Alg. MFPareto.

Algorithm 7 MFPareto(A, B_i, B_j)
Input: the objects of current focus A, the actions B_i for facet Fi, and the actions B_j for facet Fj
Output: a bucket order L of A corresponding to B_i ⊗ B_j
1: We call Alg. PrefOrder(A, B_i) and Alg. PrefOrder(A, B_j) for facets Fi and Fj, and let Li = ⟨Ai1, ..., Aim⟩ be the produced bucket order for facet Fi and Lj = ⟨Aj1, ..., Ajn⟩ for facet Fj, where m and n are the numbers of blocks returned for each facet respectively.
2: while the bucket orders Li and Lj are not empty do
3: Get the maximal elements of each bucket order, i.e. Aimax and Ajmax
4: Check which objects in Aimax and Ajmax are not dominated by other objects in the Lj and Li bucket orders respectively. These objects belong to the current BMO-set A_BMOcurrent
5: Append A_BMOcurrent to the returned bucket order L
6: Remove the objects in A_BMOcurrent from bucket orders Li and Lj
7: return bucket order L

Initially, we derive the bucket orders defined by the preference actions B_i and B_j. Then we get the maximal elements from each bucket order (i.e. the objects of the current BMO-set are included in them) and test which objects are not dominated by others (by checking the bucket order of the other preference action). These objects are part of the current BMO-set, and are removed from the initial bucket orders Li and Lj. Then we continue computing the next BMO-set of the remaining objects. Notice that if we are interested only in the Pareto optimal, i.e. the winnow operator, we need to find the BMO-set only once. One can easily see that the produced bucket order interprets Pareto composition (⊗) as:

x ≻_{P1⊗P2} y iff (x1 ≻_{P1} y1 ∧ (x2 ∼_{P2} y2 ∨ x2 ≻_{P2} y2)) ∨ (x2 ≻_{P2} y2 ∧ (x1 ∼_{P1} y1 ∨ x1 ≻_{P1} y1))

where x1 ∼_{P1} y1 means that x1 and y1 are in the same block in the bucket order produced by P1. Again, in our case indifference means equivalence, so the finally produced bucket order is interpreted as:

x ≻_{P1⊗P2} y iff (x1 ≻_{P1} y1 ∧ (x2 ≈_{P2} y2 ∨ x2 ≻_{P2} y2)) ∨ (x2 ≻_{P2} y2 ∧ (x1 ≈_{P1} y1 ∨ x1 ≻_{P1} y1))

The above algorithm can be straightforwardly generalized to more than two facets. For example, assume that the user has defined:

B_Loc ⊗ B_Manuf ⊗ B_Price

Moreover, assume the actions B_Loc = {Best(Crete), Worst(Chania)}, B_Manuf = {Best(European), Worst(Italian), Best(Ferrari)} and B_Price = {price min}, and suppose that the current focus A consists of the following tuples:

Id | Location | Manuf. | Price | ...
L | Heraklion | Lancia | 10 | ...
B | Chania | BMW | 20 | ...
A1 | Athens | Audi | 20 | ...
A2 | Athens | Audi | 21 | ...
F1 | Heraklion | Ferrari | 100 | ...
F2 | Rethymno | Ferrari | 80 | ...

The constituent bucket orders are shown below:

L_{B_Loc} = ⟨{L, F1, F2}, {B}, {A1, A2}⟩
L_{B_Manuf} = ⟨{B, A1, A2, F1, F2}, {L}⟩
L_{B_Price} = ⟨{L}, {B, A1}, {A2}, {F2}, {F1}⟩


From the above we can see that A1 dominates A2, since A1 is less expensive and has the same preference regarding Location and Manufacturer. In addition, A1 is dominated by B, since they have the same price and the manufacturer is equally preferred, but B is located in Chania, which is preferred over the inactive Athens. Finally, F1 is dominated by F2, since F2 is less expensive. Then, the final bucket order returned by the algorithm is:

L_{B_Loc ⊗ B_Manuf ⊗ B_Price} = ⟨{L, F2}, {F1, B}, {A1}, {A2}⟩

The Pareto optimal set (i.e. the result of the winnow operator or skyline operator) is A_BMO = {L, F2}, which is the maximal block of the above bucket order. Pareto composition assumes ceteris paribus semantics. Recall that ceteris paribus semantics means that if o1 ≻ o2 for a specific attribute, then o1 is preferred to o2 provided that for all the other attributes o1 and o2 are "equal" (i.e. the objects are equivalent). Furthermore, we expand the ceteris paribus semantics by accepting that if o1 ≻ o2 for a specific attribute attr1, then o1 and o2 are at least "equal" for all other attributes. So we also accept o1 to be preferred to o2 for another attribute attrN instead of being "equal". Here we assume that "all else equal" is captured by the o1 ≈ o2 (equivalence) operator (i.e. o1 and o2 are in the same bucket for a specific attribute). For example in the previous case, A1 is preferred to A2 since it is less expensive and all the remaining attributes are the same. Furthermore, L is preferred to F2, since Heraklion ≻ Rethymno, and L is less expensive than F2.

3.4.3 Combination of Priority and Pareto Compositions

In addition, we can provide combinations of the Priority and Pareto compositions. For example, assume that L1 is the bucket order returned by Alg. MFPriority (i.e. we have a composition of type P11 ▷ P12 ▷ ... ▷ P1k) and that L2 is the bucket order returned by Alg. MFPareto (i.e. a composition of type P21 ⊗ P22 ⊗ ... ⊗ P2l). Then, we can combine the previous bucket orders, using either Priority or Pareto composition, by calling Alg. MFPriority or MFPareto respectively (combining also their semantics). There are works, like the one described in Neves and Kaci (2010), that provide a combination of Priority and ceteris paribus semantics. In this case, instead of calculating the bucket orders (the first step of each algorithm), we can pass the already computed bucket orders L1 and L2 to the appropriate algorithm.

In this way we can calculate Priority compositions of the type:

(P11 ⊗ P12 ⊗ ... ⊗ P1k) ▷ (P21 ▷ P22 ▷ ... ▷ P2l)
(P11 ▷ P12 ▷ ... ▷ P1k) ▷ (P21 ⊗ P22 ⊗ ... ⊗ P2l)
(P11 ⊗ P12 ⊗ ... ⊗ P1k) ▷ (P21 ⊗ P22 ⊗ ... ⊗ P2l)

Respectively, we can calculate Pareto compositions of the type:

(P11 ▷ P12 ▷ ... ▷ P1k) ⊗ (P21 ▷ P22 ▷ ... ▷ P2l)
(P11 ▷ P12 ▷ ... ▷ P1k) ⊗ (P21 ⊗ P22 ⊗ ... ⊗ P2l)
(P11 ⊗ P12 ⊗ ... ⊗ P1k) ⊗ (P21 ▷ P22 ▷ ... ▷ P2l)

Compositions of the type:

(P11 ▷ P12 ▷ ... ▷ P1k) ▷ (P21 ▷ P22 ▷ ... ▷ P2l)
(P11 ⊗ P12 ⊗ ... ⊗ P1k) ⊗ (P21 ⊗ P22 ⊗ ... ⊗ P2l)

would return results equivalent to the compositions:

(P11 ▷ P12 ▷ ... ▷ P1k ▷ P21 ▷ P22 ▷ ... ▷ P2l)
(P11 ⊗ P12 ⊗ ... ⊗ P1k ⊗ P21 ⊗ P22 ⊗ ... ⊗ P2l)

respectively, since according to Kießling (2002) Priority and Pareto compositions are associative. As an example, assume that the user has defined:

(B_Loc ▷ B_Manuf) ⊗ B_Price

Moreover, assume the actions B_Loc = {Best(Crete), Worst(Chania)}, B_Manuf = {Best(European), Worst(Italian), Best(Ferrari)} and B_Price = {price min}. Finally, suppose that the current focus A consists again of the following tuples:

Id | Location | Manuf. | Price | ...
L | Heraklion | Lancia | 10 | ...
B | Chania | BMW | 20 | ...
A1 | Athens | Audi | 20 | ...
A2 | Athens | Audi | 21 | ...
F1 | Heraklion | Ferrari | 100 | ...
F2 | Rethymno | Ferrari | 80 | ...

The constituent and final bucket orders are shown below.

L_{B_Loc} = ⟨{L, F1, F2}, {B}, {A1, A2}⟩
L_{B_Manuf} = ⟨{B, A1, A2, F1, F2}, {L}⟩
L_{B_Price} = ⟨{L}, {B, A1}, {A2}, {F2}, {F1}⟩
L_{B_Loc ▷ B_Manuf} = ⟨⟨{F1, F2}, {L}⟩, {B}, {A1, A2}⟩
L_{(B_Loc ▷ B_Manuf) ⊗ B_Price} = ⟨{L, F2}, {F1, B}, {A1}, {A2}⟩
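The last step of this example can be sketched in Python as a layered computation of BMO-sets. The sketch is one possible reading under stated assumptions and is not the literal Alg. MFPareto: an object x dominates y when x is in the same or a better block than y in every input bucket order and in a strictly better block in at least one of them, and the undominated objects among the remaining ones are peeled off repeatedly. The function names are illustrative; the two inputs are the flattened order for B_Loc ▷ B_Manuf and the order for B_Price listed above.

```python
def block_ranks(order):
    """Map each object to the position of its block (0 is the most preferred)."""
    return {obj: i for i, block in enumerate(order) for obj in block}


def dominates(x, y, rank_maps):
    """x dominates y: never in a worse block, and in a better block at least once."""
    never_worse = all(r[x] <= r[y] for r in rank_maps)
    better_once = any(r[x] < r[y] for r in rank_maps)
    return never_worse and better_once


def pareto_layers(objects, orders):
    """Repeatedly extract the BMO-set (undominated objects) of what remains."""
    rank_maps = [block_ranks(o) for o in orders]
    remaining = set(objects)
    layers = []
    while remaining:
        bmo = {x for x in remaining
               if not any(dominates(y, x, rank_maps) for y in remaining)}
        layers.append(bmo)
        remaining -= bmo
    return layers


L_LOC_MANUF = [{"F1", "F2"}, {"L"}, {"B"}, {"A1", "A2"}]  # flattened Loc ▷ Manuf
L_PRICE = [{"L"}, {"B", "A1"}, {"A2"}, {"F2"}, {"F1"}]
FOCUS = {"L", "B", "A1", "A2", "F1", "F2"}

layers = pareto_layers(FOCUS, [L_LOC_MANUF, L_PRICE])
print(layers)
```

Under these assumptions the first layer is the BMO-set {L, F2}, and the sequence of layers coincides with the final bucket order listed above for (B_Loc ▷ B_Manuf) ⊗ B_Price.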

In case the user had defined the opposite combination: (B_Loc ⊗ B_Manuf) ▷ B_Price, then the constituent and final bucket orders would be:

L_{B_Loc ⊗ B_Manuf} = ⟨{F1, F2}, {B}, {A2, A1}, {L}⟩
L_{B_Price} = ⟨{L}, {B, A1}, {A2}, {F2}, {F1}⟩
L_{(B_Loc ⊗ B_Manuf) ▷ B_Price} = ⟨{F2}, {F1}, {B}, {A1}, {A2}, {L}⟩

3.5 A Complete Example

This section provides a complete example that clarifies the semantics of preference statements. Consider the following set of preference actions:

b1: Best(Europe)
b2: Worst(Italian)
b3: Porsche ≻ Ferrari
b4: Fiat ≻ Korean
b5: Japanese ≻ French

The scope-based ordering of the actions is shown in Fig. 3.13, where the left diagram concerns the Best/Worst actions, while the right one concerns the relative preference actions. The scopes and active scopes of the actions are shown in Table 3.4.

Figure 3.13: Scope Based Ordering of Actions (Left for Best/Worst Actions, Right for Relative Preference Actions): Complete Example

It follows that Alg. Apply will be called with the following parameters:

Param | Param value
B | European, German, Audi, Bmw, Porsche, French, Citroen, Peugeot
W | Italian, Lancia, Ferrari, Fiat, Lamborghini
R≻ | Porsche ≻ Ferrari, Fiat ≻ Korean, Fiat ≻ Kia, Japanese ≻ French, Japanese ≻ Citroen, Japanese ≻ Peugeot, Toyota ≻ French, Toyota ≻ Citroen, Toyota ≻ Peugeot, Lexus ≻ French, Lexus ≻ Citroen, Lexus ≻ Peugeot

The diagram of Rbw is shown in Figure 3.14, while the diagram of R≻ is shown in Figure 3.15. The final diagram of R is shown in Figure 3.16. For reasons of space names are abbreviated. The returned bucket order, assuming all these actions are term-scoped is:

⟨{E, G, A, B, Po, J, T, Lx}, {Fr, C, Pe}, {I, Lmb, Fe, Fi, La}, {Ko, Ki}⟩

The bucket order over the leaves of the taxonomy (i.e. car manufacturers) is:

⟨{A, B, Po, T, Lx}, {C, Pe}, {Lmb, Fe, Fi, La}, {Ki}⟩

Table 3.4: Complete Example: Scopes and Active Scopes

action | scope / expansion | active scope
b1 | European, German, Audi, BMW, Porsche, French, Citroen, Peugeot, Italian, Lancia, Ferrari, Fiat, Lamborghini | European, German, Audi, Bmw, Porsche, French, Citroen, Peugeot
b2 | Italian, Lancia, Ferrari, Fiat, Lamborghini | Italian, Lancia, Ferrari, Fiat, Lamborghini
b3 | Porsche ≻ Ferrari | Porsche ≻ Ferrari
b4 | Fiat ≻ Korean, Fiat ≻ Kia | Fiat ≻ Korean, Fiat ≻ Kia
b5 | Japanese ≻ French, Japanese ≻ Citroen, Japanese ≻ Peugeot, Toyota ≻ French, Toyota ≻ Citroen, Toyota ≻ Peugeot, Lexus ≻ French, Lexus ≻ Citroen, Lexus ≻ Peugeot | Japanese ≻ French, Japanese ≻ Citroen, Japanese ≻ Peugeot, Toyota ≻ French, Toyota ≻ Citroen, Toyota ≻ Peugeot, Lexus ≻ French, Lexus ≻ Citroen, Lexus ≻ Peugeot

Figure 3.14: Hasse Diagram for the Relation Rbw: Complete Example

Figure 3.15: Hasse Diagram for the Relation R≻: Complete Example

Now suppose an object-relational database (i.e. a database that supports multi-valued attributes) containing the tuples shown in Table 3.5.


Figure 3.16: Hasse Diagram for the Relation R: Complete Example

Table 3.5: Tuples in Database: Complete Example

Id    Manufacturer    Price    Accessories
C     Citroen         10       {DVD}
B     BMW             20       {ABS, AT}
A1    Audi            20       {ABS, MT, DVD}
A2    Audi            21       {ABS}
F1    Ferrari         100      {ESP, MT}
F2    Ferrari         80       {ESP, ABS, MT}
P     Porsche         150      {ESP}
F3    Fiat            5        {}
K     Kia             12       {DVD}
T     Toyota          20       {ABS, AT, ESP, DVD}

Then, if we apply the manufacturers ordering to the specific objects in the table, we get: $L_{B^{Manuf.}} = \langle \{B, A_1, A_2, P_1, T\}, \{C\}, \{F_1, F_2, F_3\}, \{K\} \rangle$
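The step from a bucket order over manufacturers to a bucket order over tuples can be illustrated with a minimal Python sketch. The function name `lift_to_objects` and the dictionary encoding of Table 3.5 are assumptions made for illustration only; they are not part of the framework's algorithms.

```python
def lift_to_objects(term_buckets, term_of):
    """Map a bucket order over terms to a bucket order over objects.

    term_buckets: list of sets of terms, most preferred first.
    term_of: dict from object id to its (single) term.
    Buckets that end up without objects are dropped.
    """
    object_buckets = []
    for bucket in term_buckets:
        objects = {obj for obj, term in term_of.items() if term in bucket}
        if objects:
            object_buckets.append(objects)
    return object_buckets


# Bucket order over the leaves of the taxonomy, with names spelled out.
manufacturer_order = [
    {"Audi", "BMW", "Porsche", "Toyota", "Lexus"},
    {"Citroen", "Peugeot"},
    {"Lamborghini", "Ferrari", "Fiat", "Lancia"},
    {"Kia"},
]
# Manufacturer column of Table 3.5 (the Porsche tuple has Id P there).
manufacturer_of = {
    "C": "Citroen", "B": "BMW", "A1": "Audi", "A2": "Audi",
    "F1": "Ferrari", "F2": "Ferrari", "P": "Porsche",
    "F3": "Fiat", "K": "Kia", "T": "Toyota",
}
object_order = lift_to_objects(manufacturer_order, manufacturer_of)
print(object_order)
```

Each object simply inherits the bucket of its manufacturer, which yields the four buckets given above.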

Now consider the following three preference actions over the attribute Accessories:


b1: Best(ABS)
b2: Worst(DVD)
b3: AT ≻ MT

These actions define the following preference relation R:

ABS    AT
 |      |
DVD    MT

Suppose that we have to order the values that appear in the attribute Accessories of the tuples in Table 3.5 according to preference, i.e. we want to order the set:

{ {}, {DVD}, {ABS}, {ESP}, {ABS, AT}, {ESP, MT}, {ABS, MT, DVD}, {ESP, ABS, MT}, {ABS, AT, ESP, DVD} }

w(s, s′)/w(s′, s)    {}   {ABS}  {DVD}  {ESP}  {ABS, AT}  {ESP, MT}  {ABS, MT, DVD}  {ESP, ABS, MT}  {ABS, AT, ESP, DVD}  all
{}                   0/0  0/0    0/0    0/0    0/0        0/0        0/0             0/0             0/0                  0/0
{ABS}                0/0  0/0    1/0    0/0    0/0        0/0        1/0             0/0             1/0                  3/0
{DVD}                0/0  0/1    0/0    0/0    0/1        0/0        0/1             0/1             0/1                  0/5
{ESP}                0/0  0/0    0/0    0/0    0/0        0/0        0/0             0/0             0/0                  0/0
{ABS, AT}            0/0  0/0    1/0    0/0    0/0        1/0        2/0             1/0             1/0                  5/0
{ESP, MT}            0/0  0/0    0/0    0/0    0/1        0/0        0/0             0/0             0/1                  0/2
{ABS, MT, DVD}       0/0  0/1    1/0    0/0    0/2        0/0        1/1             0/1             1/2                  1/4
{ESP, ABS, MT}       0/0  0/0    1/0    0/0    0/1        0/0        1/0             0/0             2/1                  4/2
{ABS, AT, ESP, DVD}  0/0  0/1    1/0    0/0    0/1        1/0        2/1             1/2             1/1                  3/3

The ordering of these values according to the MoreWins rule (i.e. Def. 3 in Section 3.3.2) is shown in the Hasse diagram of Figure 3.17. We can resolve the ties by using the MoreGoodLessBad rule (i.e. Def. 4). Specifically, Support({}) = −1, Support({ABS}) = 0, Support({DVD}) = −1, Support({ESP}) = −1, Support({ABS, AT}) = 0, Support({ESP, MT}) = −2, Support({ABS, MT, DVD}) = −2, Support({ESP, ABS, MT}) = −2, and finally Support({ABS, AT, ESP, DVD}) = −2. As a result, for the empty set {} we have {} ≻ {ESP, MT}, {} ≻ {ABS, MT, DVD}, {} ≻ {ESP, ABS, MT} and {} ≻ {ABS, AT, ESP, DVD}. For {ABS} we have {ABS} ≻ {}, {ABS} ≻ {ESP}, {ABS} ≻ {ESP, MT} and {ABS} ≻ {ESP, ABS, MT}, while for {DVD} we have {DVD} ≻ {ESP, MT}. Finally, for {ABS, AT} we have {ABS, AT} ≻ {ESP}, while for {ESP} we have {ESP} ≻ {ESP, MT}, {ESP} ≻ {ABS, MT, DVD}, {ESP} ≻ {ESP, ABS, MT} and {ESP} ≻ {ABS, AT, ESP, DVD}. The ordering of these values according to the MoreGoodLessBad rule (i.e. Def. 4) is shown in the Hasse diagram of Figure 3.18.

Figure 3.17: Hasse Diagram for Ordering Multi-Valued Attributes According to MoreWins Rule: Complete Example

Figure 3.18: Hasse Diagram for Ordering Multi-Valued Attributes According to MoreGoodLessBad Rule: Complete Example


After running topological sorting we get the following final bucket order over the sets:

⟨ {{ABS}, {ABS, AT}}, {{ESP}, {}}, {{ABS, AT, ESP, DVD}, {ESP, ABS, MT}}, {ABS, MT, DVD}, {DVD}, {ESP, MT} ⟩

If we assume that the preference actions expressed over the tuples of Table 3.5 are object-scoped, then the final bucket ordering is: $L_{B^{Access.}} = \langle \{A_2, B\}, \{P_1, F_3\}, \{T, F_2\}, \{A_1\}, \{K, C\}, \{F_1\} \rangle$

Suppose that we also want cars to be sorted according to their price in ascending order, i.e. the order of the cars in Table 3.5 is $L_{B^{Price}} = \langle \{F_3\}, \{C\}, \{K\}, \{T, B, A_1\}, \{A_2\}, \{F_2\}, \{F_1\}, \{P_1\} \rangle$
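As a small illustration, an ascending sort with ties can be turned into a bucket order by grouping equal values. The helper name `bucket_order_by_value` is hypothetical; the prices are those of Table 3.5.

```python
from itertools import groupby


def bucket_order_by_value(value_of):
    """Group objects with equal values into one bucket, cheapest first."""
    ranked = sorted(value_of.items(), key=lambda pair: pair[1])
    return [
        {obj for obj, _ in group}
        for _, group in groupby(ranked, key=lambda pair: pair[1])
    ]


# Price column of Table 3.5 (the Porsche tuple has Id P there).
price = {
    "C": 10, "B": 20, "A1": 20, "A2": 21, "F1": 100,
    "F2": 80, "P": 150, "F3": 5, "K": 12, "T": 20,
}
price_order = bucket_order_by_value(price)
print(price_order)
```

The three cars priced at 20 share one bucket, which is why the price order has eight buckets for ten tuples.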

Now consider that $(B^{Manufacturer} \otimes B^{Price}) \triangleright B^{Accessories}$. As a result, according to the previous bucket orders we have: $L_{B^{Manuf.} \otimes B^{Price}} = \langle \{F_3, B, A_1, T\}, \{C, A_2\}, \{K, P_1\}, \{F_2\}, \{F_1\} \rangle$

and $L_{(B^{Manuf.} \otimes B^{Price}) \triangleright B^{Accessories}} = \langle \{\{B\}, \{F_3\}, \{T\}, \{A_1\}\}, \{\{A_2\}, \{C\}\}, \{\{P_1\}, \{K\}\}, \{F_2\}, \{F_1\} \rangle$


Chapter 4

Complexity and Optimizations

Contents

4.1 Computational Complexity
4.2 Optimizations for Deriving the Preference-based Order
    4.2.1 An Algorithm based on the Focal Object Set
    4.2.2 Optimizations for Capturing Set-Valued Attributes and Top-K Requirements
4.3 Optimizations for Multi-Facet Preferences
    4.3.1 Prioritized Composition
    4.3.2 Pareto Composition
    4.3.3 Combination of Priority and Pareto Compositions

First, in Section 4.1, we discuss the computational complexity of the algorithms presented in the previous sections. Then, in Section 4.2, we introduce more efficient algorithms for object ordering, while in Section 4.2.2 we focus on algorithms for set-valued facets, which can also be used for evaluating the top-K elements of the object order. Finally, in Section 4.3 we discuss some optimizations for multi-facet preferences.


4.1 Computational Complexity

Alg. 1 (Apply) In the worst case all elements of $E$ are involved and the most expensive task is that of topological sorting. Topological sorting is in $O(|E| + |R|)$; since $|R| \le |E|^2$, w.r.t. $E$ we can say that it is in $O(|E|^2)$. If the actions are object-scoped, i.e. $E$ corresponds to $Obj$, then the complexity of Apply is in $O(|Obj|^2)$.
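The $O(|E| + |R|)$ bound can be made concrete with a minimal sketch of topological sorting by repeated source removal, which emits one bucket per round. This is only an illustration under stated assumptions: the function name `source_removal` is hypothetical, the relation is assumed acyclic, and the Best/Worst sets and the policy for inactive elements that Alg. Apply also handles are ignored here.

```python
from collections import defaultdict


def source_removal(elements, prefers):
    """Return a bucket order (list of sets) for an acyclic relation.

    prefers: iterable of (better, worse) pairs over `elements`.
    Every element and every pair is touched a constant number of times.
    """
    indegree = {e: 0 for e in elements}
    worse_than = defaultdict(set)
    for better, worse in set(prefers):
        worse_than[better].add(worse)
        indegree[worse] += 1

    buckets = []
    current = {e for e, degree in indegree.items() if degree == 0}
    while current:
        buckets.append(current)
        following = set()
        for e in current:
            for w in worse_than[e]:
                indegree[w] -= 1
                if indegree[w] == 0:
                    following.add(w)
        current = following
    return buckets


# The relation R of the accessories example: ABS over DVD, AT over MT.
accessory_buckets = source_removal(
    ["ABS", "AT", "DVD", "MT"], [("ABS", "DVD"), ("AT", "MT")]
)
# A chain given together with its transitive pair.
chain_buckets = source_removal(
    ["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")]
)
print(accessory_buckets, chain_buckets)
```

An element enters a bucket only once all elements preferred to it have been emitted, so the transitive pair in the chain does not change the result.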

Alg. 3 (ApplyOverFamiliesOfSets) Suppose $E$ is a family of sets of terms over $T_i$. The computation of the closure at line (3) is in $O(|T_i|^3)$. Then we have to perform $|E|^2$ computations of wins (between all pairs of elements of $E$). Since computing $wins(s, s')$ needs $O(|s|^2)$ steps, the cost for computing all wins is in $O(|E|^2 \cdot avgSetSize^2)$, where $avgSetSize$ is the average size of the sets in $E$. Note that for some of the pairs we may have to compute the support of the involved sets. Since for computing the support of one atomic element the cost is $|T_i|$, the computation of $Support(s)$ is in $O(|s||E|)$. Altogether, the computation of wins and support is in $O(|T_i|^3 + |E|^2 \cdot avgSetSize^2)$. As regards the size of $E$, for a facet with $|T_i|$ values we can have at most $2^{|T_i|}$ sets, therefore $|E| \le 2^{|T_i|}$. However $|E|$ cannot be bigger than $|A|$, therefore we can write $|E| \le \min(2^{|T_i|}, |A|)$.

Alg. 4 (PrefOrder) Let us now elaborate on the computational complexity of PrefOrder and suppose that all actions in $B$ are object-scoped, i.e. $E$ corresponds to $Obj$. Line 2 requires computing the scopes of all actions in $B$. The computation of the scope of an action depends on $|Obj|$, and the scope can be $|Obj|$ in size, i.e. it is in $O(|B| \cdot |Obj|)$. Line 3 requires $|B|^2$ comparisons of sets, where each set can be $|Obj|$ in size, i.e. it is in $O(|B|^2 |Obj|)$. Line 5 requires computing the active scopes and this depends on $|B|$ and $|Obj|$, i.e. it is in $O(|B||Obj|)$. Line 6 requires first computing the parameters $B$, $W$ and then running Alg. Apply. The cost of the latter is in $O(|Obj|^2)$ as discussed earlier. It follows that the overall cost of Alg. PrefOrder is in $O(|Obj|(|Obj| + |B|^2))$.


Alg. 6 (MFPriority) Consider the algorithm MFPriority of Section 3.4.1. Assume that we have $k$ facets, i.e. we have to order the elements according to a prioritized composition of actions over each facet $(B^1, B^2, \ldots, B^k)$. Let us describe the complexity for only two facets. In that case we have to apply Alg. PrefOrder with cost $O(|Obj|(|Obj| + |B^i|^2))$, and then, for each block of the produced bucket order, to call Alg. PrefOrder with actions $B^j$. The cost of each such call is in $O(|Obj|(|Obj| + |B^j|^2))$. Overall we can say that the cost is $O(|Obj|(|Obj| + |B|^2))$, where $|B| = |B^1| + \ldots + |B^k|$. Now the cost of the algorithm for $k$ facets is in $O(|Obj|(k|Obj| + |B|^2))$.

Alg. 7 (MFPareto) Consider the algorithm MFPareto of Section 3.4.2. Assume that we have $k$ facets, i.e. we have to order the elements according to a Pareto composition of actions over each facet $(B^1, B^2, \ldots, B^k)$. Let us describe the complexity for only two facets. In that case we have to apply Alg. PrefOrder twice with cost $O(|Obj|(|Obj| + |B^i|^2) + |Obj|(|Obj| + |B^j|^2))$. The set of objects in the two maximal buckets can, in the worst case, be the whole set of objects (i.e. $|Obj|$). Then we have to check, for each object in the maximal blocks of the two returned bucket orders, whether it is dominated w.r.t. any of the two criteria (as described by the preference actions ordering objects). This can be done by running existing skyline algorithms like BNL (Börzsönyi et al. (2001)), which has a cost of $O(|Obj|^2)$. In the worst case (i.e. only one element is not dominated in each run) we have to repeat this for $|Obj|$ objects, i.e. the cost for finding the Pareto composition is in $O(|Obj|^3)$. Overall we can say that the cost is $O(|Obj|(|Obj|^2 + |B|^2))$. Now the cost of the algorithm for $k$ facets is in $O(|Obj|(k|Obj|^2 + |B|^2))$, where $|B| = |B^1| + \ldots + |B^k|$. If we only calculate the Pareto optimal set (i.e. the skyline) then the cost is in $O(|Obj|(k|Obj| + |B|^2))$.

Combination of Pareto and Priority Regarding the combination of Pareto and Priority as described in Section 3.4.3, the complexity will be in the worst case in $O(|Obj|(k|Obj|^2 + |B|^2))$, which is the complexity of the Pareto composition (i.e. the most expensive composition).


4.2 Optimizations for Deriving the Preference-based Order

Facet and Zoom Point Ordering. Since the set of facets $F = \{F_1, \ldots, F_k\}$ is usually small, the computation of $\succ_F$ is not expected to be expensive and we can use the proposed algorithms straightforwardly. The same is true for ordering the zoom points of each facet (as $|T_i|$ is usually small). Also note that we do not have to order the entire $T_i$ but only the "active" terms (i.e. the zoom points $Z_i(ctx)$ and $ZR_i(ctx)$ as defined in Table 2.1), which are subsets of $T_i$.

Object Ordering. Let us now focus on object ordering. If $|Obj|$ (and thus every $|A|$) is small, then we could again apply the proposed algorithm straightforwardly. If $|Obj|$ is high then $|A|$ can be high too. In such cases we propose exploiting the benefits of adopting the FDT approach, i.e. the fast convergence to small result sets with a few clicks. Convergence is discussed in detail (and quantified) in Section 6.2. This means that an acceptable and feasible policy is to order the set $A$ according to preference only if $|A|$ is below a given threshold (say a few hundred). For these reasons, below we present an algorithm, Alg. PrefOrderOpt, which is an optimized version of Alg. PrefOrder, and whose complexity does not depend on $|Obj|$, but only on $|A|$ and $|B|$, so it can be applied to large information bases. We could call this algorithm focus-based. An alternative algorithm, which can be beneficial in cases where $A$ is large, is given in Section 4.2.2.

4.2.1 An Algorithm based on the Focal Object Set

Alg. PrefOrderOpt takes as input the set $A$ to be ranked, which we can assume is not big (due to the fast convergence of FDT). First we present some auxiliary functions and the main idea (ignoring the case of relative preferences and set-valued attributes), and then the full algorithm.

We can start from the observation that if we have a function that checks whether $b_1 \sqsubseteq b_2$ holds, where $b_1$ and $b_2$ are actions, then we can form the relation $\sqsubseteq$. An algorithm that implements such a function, denoted by CheckSubScopeOf($b_1$, $b_2$), is given below. The key point is that we can decide whether $b_1 \sqsubseteq b_2$ holds without having to compute the scopes of these actions. Instead we can base our approach on the definition of action scopes (Table 3.1). Specifically, if the anchor of $b_1$ is not empty, while the one of $b_2$ is empty (e.g. order all facets lexicographically), then it returns True. The remaining cases follow the general rule that terms are more refined than facets. In the case of two term-anchored actions whose terms are $\le$-related, the actions are $\sqsubseteq$-related too (see line 6). If furthermore labeling is used (e.g. Agrawal et al. (1989)), which is a good choice in such applications, then the cost of this function is always in $O(1)$.

1: function CheckSubScopeOf($b_1$, $b_2$): Boolean
2:   if ($b_1.anchor \neq \epsilon$) ∧ ($b_2.anchor = \epsilon$) then
3:     return True
4:   if ($b_1.anchor = \langle t_i \rangle$) ∧ ($b_2.anchor = \langle F_j \rangle$) then
5:     return True
6:   if ($b_1.anchor = \langle t_i \rangle$) ∧ ($b_2.anchor = \langle t_j \rangle$) ∧ ($t_i \le t_j$) then
7:     return True
8:   return False
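To illustrate why labeling makes the check constant-time, the following Python sketch assigns depth-first intervals to the terms of a taxonomy and then mirrors the three cases of CheckSubScopeOf. It is a sketch under assumptions, not the labeling scheme of Agrawal et al. (1989): the taxonomy is assumed to be a tree, and the `Anchor` class, `label_taxonomy` and the example terms are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def label_taxonomy(children, root):
    """Give each term a (start, end) interval by depth-first traversal.

    A term t is narrower than or equal to u exactly when the interval
    of u contains the interval of t (tree-shaped taxonomy assumed).
    """
    labels = {}
    counter = 0

    def visit(term):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(term, []):
            visit(child)
        labels[term] = (start, counter - 1)

    visit(root)
    return labels


def is_narrower_or_equal(labels, t, u):
    (t_start, t_end), (u_start, u_end) = labels[t], labels[u]
    return u_start <= t_start and t_end <= u_end


@dataclass(frozen=True)
class Anchor:
    kind: str                   # "none", "facet" or "term"
    name: Optional[str] = None


def check_sub_scope_of(a1, a2, labels):
    """Mirror of the three cases of the pseudocode, on anchors only."""
    if a1.kind != "none" and a2.kind == "none":
        return True
    if a1.kind == "term" and a2.kind == "facet":
        return True
    if a1.kind == "term" and a2.kind == "term":
        return is_narrower_or_equal(labels, a1.name, a2.name)
    return False


children = {
    "European": ["German", "French"],
    "German": ["Audi", "BMW"],
    "French": ["Citroen"],
}
labels = label_taxonomy(children, "European")
print(check_sub_scope_of(Anchor("term", "Audi"), Anchor("term", "German"), labels))
```

After the one-off labeling pass, each subsumption test is two integer comparisons, which is the $O(1)$ behaviour the text relies on.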

For defining the intended algorithm we also need a boolean function IsInScope(o, b) that returns True if o belongs to the scope of b. This function can be implemented as follows.

1: function IsInScope($o$, $b$): Boolean
2:   if $b.scopeType$ = "object order" then
3:     if $b.anchor$ = "facet $F_i$" then
4:       if $D(o) \cap T_i \neq \emptyset$ then return True
5:       else return False
6:     else if $b.anchor$ = "term $t_j$" then
7:       if $t_j \in \bar{D}(o)$ then return True
8:       else return False
9:   return False

The main cost of IsInScope($o$, $b$) is the cost required to check whether a term is narrower than another (line 7 requires checking if $t_j$ is broader than a term assigned to $o$, i.e. if $t_j \ge t'_j$ where $t'_j \in D(o)$), so its cost is $O(|R_{\le}|)$, where $|R_{\le}|$ denotes the number of relationships of the taxonomic relation. If labeling is used (e.g. Agrawal et al. (1989)) then this cost is $O(1)$. Assume now that the average number of terms that are directly assigned to an object $o$ is denoted by $avgD_{\le}$. Then the final cost of IsInScope($o$, $b$) is in $O(avgD_{\le})$.

We can now present the optimized version of Alg. PrefOrder, which is Alg. PrefOrderOpt shown below. It takes as input two parameters, an object set $A$, and a set of actions $B$ (the latter is one of the $k+2$ sets of actions). Part (1) includes the optimized version of lines (2-3) of PrefOrder, and Part (2) includes the optimized version of lines (5-6) of PrefOrder. We can see that the algorithm never computes the scope of any action, and this is the key point for applying it to large information bases (in the sense that its computational complexity does not depend on $|Obj|$). Instead, it checks whether elements of $E$ (recall that $E$ has been reduced through clicks) belong to the scopes of actions.

Algorithm 8 PrefOrderOpt($E$, $B$, $Policy$)
Input: the set of elements $E$, the set of actions $B$, and $Policy$ for inactive elements
Output: a bucket order over $E$
 1: /** Part (1): Computation of $(B, \sqsubseteq)$ */
 2: $Visited \leftarrow \emptyset$
 3: $R_{\sqsubseteq} \leftarrow \emptyset$   // $R_{\sqsubseteq}$ corresponds to $\sqsubseteq$
 4: for each $b \in B$ do
 5:   for each $b' \in B \setminus Visited$ do
 6:     if CheckSubScopeOf($b$, $b'$) then
 7:       $R_{\sqsubseteq} \leftarrow R_{\sqsubseteq} \cup \{(b \sqsubseteq b')\}$
 8:     else if CheckSubScopeOf($b'$, $b$) then
 9:       $R_{\sqsubseteq} \leftarrow R_{\sqsubseteq} \cup \{(b' \sqsubseteq b)\}$
10:     $Visited \leftarrow Visited \cup \{b\}$
11:   endfor
12: endfor
13:
14: /** Part (2): Efficient Computation of Act. Scopes */
15: for each $b \in B$ do
16:   $C(b) \leftarrow$ direct children of $b$ wrt $R_{\sqsubseteq}$
17:   $ActiveScope[b] \leftarrow \{e \in E \mid$ IsInScope($e$, $b$) ∧
18:       ($\forall c \in C(b)$ it holds IsInScope($e$, $c$) = False)$\}$
19: endfor
20:
21: Use the active scopes to expand the set $B$ to a set $B'$
22: /** Part (3): Derivation of the final bucket order */
23: $(B, W, R_{\succ}) \leftarrow$ Parse($B'$)
24: return Apply($E$, $B$, $W$, $R_{\succ}$, $Policy$)   // call to Alg. 1
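The second part of the algorithm can be sketched in Python as follows. The sketch is deliberately narrow and rests on several assumptions that are not in the text: all actions are term-anchored and object-scoped, the taxonomy is a tree given as a hypothetical `parent` map, and broader terms are found by walking parent pointers rather than by labeling. The names `is_in_scope` and `active_scopes` and the sample data are illustrative only.

```python
def ancestors_or_self(term, parent):
    """The term together with all its broader terms (tree assumed)."""
    found = set()
    while term is not None:
        found.add(term)
        term = parent.get(term)
    return found


def is_in_scope(object_terms, anchor_term, parent):
    """True if the anchor equals or is broader than a term of the object."""
    return any(anchor_term in ancestors_or_self(t, parent) for t in object_terms)


def strictly_narrower(t, u, parent):
    return t != u and u in ancestors_or_self(t, parent)


def active_scopes(objects, anchors, parent):
    """objects: object id -> set of directly assigned terms.
    anchors: action id -> anchor term.
    Returns action id -> objects in its scope but in no child's scope.
    """
    scopes = {}
    for b, b_term in anchors.items():
        below = [c for c, c_term in anchors.items()
                 if strictly_narrower(c_term, b_term, parent)]
        # Direct children: actions below b with no other action in between.
        children = [c for c in below
                    if not any(strictly_narrower(anchors[c], anchors[d], parent)
                               for d in below)]
        scopes[b] = {
            o for o, terms in objects.items()
            if is_in_scope(terms, b_term, parent)
            and not any(is_in_scope(terms, anchors[c], parent) for c in children)
        }
    return scopes


# Hypothetical fragment of the car taxonomy and three focus objects.
parent = {"German": "European", "Italian": "European",
          "Audi": "German", "Fiat": "Italian"}
objects = {"A1": {"Audi"}, "F3": {"Fiat"}, "K": {"Kia"}}
anchors = {"b1": "European", "b2": "Italian"}
scopes = active_scopes(objects, anchors, parent)
print(scopes)
```

Only the objects of the current focus are tested against the anchors, so no scope is ever materialized over the whole information base; the Italian car falls to the more refined action, as in Table 3.4.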

Regarding its complexity, suppose that the taxonomy of each facet is labeled. The cost of the first part of the algorithm is in $O(|B|^2)$. Note that as long as the user does not submit a new action, $(B, \sqsubseteq)$ can be preserved and reused when the user changes his focus (so $O(|B|^2)$ is paid once). The second part of the algorithm has $|B|$ iterations. Assuming labeling, the cost of each iteration is $avgD_{\le} \cdot |A| \cdot (1 + avgC_{\sqsubseteq})$, where $avgC_{\sqsubseteq}$ is the average number of direct children of an action w.r.t. $\sqsubseteq$. It follows that the cost of the second part is

$avgD_{\le} \cdot |B| \cdot (|A| \cdot (1 + avgC_{\sqsubseteq})) = avgD_{\le} \cdot |B| \cdot |A| + avgD_{\le} \cdot |A| \cdot (|B| \cdot avgC_{\sqsubseteq}) = avgD_{\le} \cdot |A| \cdot (|B| + |{\sqsubseteq}|)$.

It holds that $|{\sqsubseteq}| \le |B|^2$, and as a result the cost of the second part is in $O(avgD_{\le} \cdot |A| \cdot |B|^2)$. The cost of the last part of the algorithm is the cost of Alg. Apply, which in our context is expressed as $O(|A|^2)$.

Table 4.1: PrefOrderOpt Changes for Capturing Relative Preferences Over Hierarchically Organized Values

CheckSubScopeOf, line 6:
  Let $b_1.anchor = (e_i, e_j)$ and $b_2.anchor = (e'_i, e'_j)$. We have to write:
  $((e_i \le e'_i) \wedge (e_j \le e'_j)) \vee ((e_j \le e'_i) \wedge (e_i \le e'_j))$

Alg. PrefOrderOpt, lines (17-18):
  $ActiveScope[b] \leftarrow \{(e, e') \in E \times E \mid IsInScope(e, e', b) \wedge (\forall c \in C(b)$ it holds $IsInScope(e, e', c) = False)\}$

IsInScope($o$, $o'$, $b$):
  Let $b.anchor = (t_i, t_j)$. We have to write:
  $((t_i \in \bar{D}(o)) \wedge (t_j \in \bar{D}(o'))) \vee ((t_j \in \bar{D}(o)) \wedge (t_i \in \bar{D}(o')))$

Changes for Capturing also Relative Preferences The optimized algorithm Alg. PrefOrderOpt can be easily adapted to also handle relative preferences over hierarchically organized values (as defined in Section 3.3.4). Specifically, we just have to make the changes shown in Table 4.1.

Table 4.2: Complexity for Non-Optimized and Optimized Alg. PrefOrder and PrefOrderOpt

Part      Alg. PrefOrder               Alg. PrefOrderOpt                   Alg. PrefOrderOpt Relative
Part 1    $O(|Obj|(|Obj| + |B|^2))$    $O(|B|^2)$                          $O(|B|^2)$
Part 2    $O(|Obj||B|)$                $O(|A||B|^2 avgD_{\le})$            $O(|A|^2 |B|^2 avgD_{\le})$
Part 3    $O(|Obj|^2)$                 $O(|A|^2)$                          $O(|A|^2)$
Total     $O(|Obj|(|Obj| + |B|^2))$    $O(|A|(|A| + |B|^2 avgD_{\le}))$    $O(|A|^2 |B|^2 avgD_{\le})$

Regarding complexity, if labeling is available, then CheckSubScopeOf remains in $O(1)$ and as a result part one remains in $O(|B|^2)$. The function IsInScope($o$, $o'$, $b$) requires 4 checks of the form $t_j \in \bar{D}(o')$. Again, if $avgD_{\le}$ denotes the average number of terms that are directly assigned to an object $o \in Obj$, then these checks cost $O(avgD_{\le})$ time. In the revised lines 17-18 of Alg. PrefOrderOpt the cost of each iteration is higher (in place of $|A|$ we now have $|A|^2$). Therefore the cost of the second part of the algorithm is now in $O(|A|^2 |B|^2 avgD_{\le})$ (cf. Table 4.2).

Synopsis. Table 4.2 summarizes the complexities of the 3 different parts of the algorithm, for the non-optimized and optimized versions of the algorithm. The key point is that the complexity of the optimized algorithm is independent of $|Obj|$.

4.2.2 Optimizations for Capturing Set-Valued Attributes and Top-K Requirements

Here we provide an optimized algorithm for ordering a set of objects $A$ for the case where (i) we have relative preferences over a facet whose values are hierarchically organized, and (ii) the object descriptions according to that facet are set-valued. The reason for describing this case separately is that IsInScope was defined without considering set-valued attributes (note, however, that a plain vanilla algorithm was given in Section 3.3.5).

Let $F_i$ be the facet whose terms are hierarchically organized and suppose that the object descriptions are set-valued at that facet. We start by assuming that the relation $\succ_{\{\}}$ over sets of terms of facet $F_i$ has been computed; this includes inheritance resolution (computation of active scopes) and the computation of wins and Support if needed (as described in Section 3.3.2). The idea of the algorithm is the following: (a) for the objects in $A$ collect their descriptions w.r.t. $F_i$ (let $Z$ be this set), (b) compute the restriction of $\succ_{\{\}}$ on $Z$, (c) apply topological sorting on $Z$ based on $\succ_{\{\}}$, and (d) from the blocks of $Z$ derive the blocks of the objects.

Algorithm 9 PrefOrderSetValuedOpt($A$, $\succ_{\{\}}$)
Input: $A$, an order $\succ_{\{\}}$ over a set-valued attribute with values in $F_i$.
Output: Ordering of $A$ w.r.t. $\succ_{\{\}}$
 1: $Z = \{D_i(o) \mid o \in A\}$
 2: $R = {\succ_{\{\}}}_{|Z}$   // restriction of $\succ_{\{\}}$ on $Z$
 3: $L \leftarrow$ SourceRemoval($R$)
 4: $OL \leftarrow \emptyset$
 5: for each block $b$ of $L$ do
 6:   for each term set $s$ in $b$ do
 7:     $ob = I(s) \cap A$   // $ob$ is the corresponding object block
 8:     append $ob$ to $OL$
 9:   append to $OL$ a block separator
10: return $OL$

The exact steps of the algorithm are given in Alg. 9. In line (1) we compute $Z$, which is the family of sets of terms of $F_i$ that occur in $A$. As stated earlier, we assume that the relation $\succ_{\{\}}$ over all values that occur in $F_i$ has been defined (as in lines 1-10 of Alg. 3). Line (2) sets $R$ to be the restriction of $\succ_{\{\}}$ on $Z$. Subsequently we apply topological sorting and we obtain $L$. Next, we start consuming $L$ starting from the first block. Note that a block can contain one or more term sets. For each such set $s$ we scan $A$ and let $ob$ be the objects that have this value. The elements in $ob$ are appended to $OL$, which is the order of objects. This continues until all blocks of $L$ have been consumed.

To compute $\succ_{\{\}}$ we can follow lines (1)-(10) of Alg. 3, and according to Section 4.1 the complexity for this is in $O(|T_i|^3 + |E|^2 \cdot avgSetSize^2)$. As regards the size of $E$, for a facet with $|T_i|$ values we can have at most $2^{|T_i|}$ sets (i.e. the size of $P(T_i)$), therefore $|E| \le 2^{|T_i|}$. However $|E|$ cannot be bigger than $|A|$, therefore we can write $|E| \le \min(2^{|T_i|}, |A|)$. One policy is to compute ${\succ_{\{\}}}_{|Z}$ when needed. Another is to compute $\succ_{\{\}}$ over all distinct sets that occur for that facet (and to update it each time the user issues a preference action that concerns that order), to avoid recomputing it while the user restricts the set $A$. At run-time we just have to take its restriction on $Z$. We favor this policy in the given algorithm.
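The four steps of Alg. 9 can be sketched in Python as below. This is an illustrative sketch, not the thesis implementation: the relation over term sets is assumed to be given as a list of (better, worse) pairs of frozensets, the simple `source_removal` helper rescans the pairs once per block (so it is not linear), and the sample data uses only two preference pairs taken from the complete example of Section 3.5.

```python
def source_removal(nodes, pairs):
    """Blocks of an acyclic relation; pairs are (better, worse) over nodes."""
    indegree = {n: 0 for n in nodes}
    for _, worse in pairs:
        indegree[worse] += 1
    blocks, remaining = [], set(nodes)
    while remaining:
        sources = {n for n in remaining if indegree[n] == 0}
        if not sources:
            raise ValueError("preference relation contains a cycle")
        blocks.append(sources)
        remaining -= sources
        for better, worse in pairs:
            if better in sources:
                indegree[worse] -= 1
    return blocks


def pref_order_set_valued(description_of, set_pairs):
    """description_of: object id -> frozenset of terms, for the objects of A."""
    z = set(description_of.values())                              # line 1
    r = [(s, t) for s, t in set_pairs if s in z and t in z]       # line 2
    blocks = source_removal(z, r)                                 # line 3
    object_order = []                                             # line 4
    for block in blocks:                                          # lines 5-9
        object_order.append(
            {o for o, d in description_of.items() if d in block})
    return object_order


ABS, DVD, ESP_MT = frozenset({"ABS"}), frozenset({"DVD"}), frozenset({"ESP", "MT"})
description_of = {"A2": ABS, "K": DVD, "C": DVD, "F1": ESP_MT}
set_pairs = [(ABS, DVD), (DVD, ESP_MT)]
order = pref_order_set_valued(description_of, set_pairs)
print(order)
```

Objects with the same description (here K and C) always land in the same block, since the ordering is decided on the term sets and only then mapped back to objects.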

Generalization and Top-K Algorithm Note that Alg. 9 essentially corresponds to the following approach: first order the terms and from their ordering derive the object ordering. Now suppose that we are not in the context of a set-valued attribute. If instead of passing the parameter $\succ_{\{\}}$ we pass an ordering over the values of $T_i$, then a question that arises is whether we could use this algorithm, instead of Alg. 8, to produce the object order, and in which cases that algorithm would be beneficial.

Let us approach this question from the computational complexity perspective. Suppose the case of relative preferences over a $T_i$. Instead of $\succ_{\{\}}$, we have to pass as parameter the ordering over the values of $T_i$. To compute this ordering we can use Alg. 8 where, instead of ordering the objects in $A$, we order the terms of $T_i$. In this case, and according to Table 4.2, the cost of this step is in $O(|T_i|^2 |B|^2 avgD_{\le})$. Note that it is not necessary to compute $Z$ or the restriction of the preference relation on $Z$ (lines 1-2 of Alg. 9), in the sense that the final answer will be correct in any case due to the intersections with $A$ at line 7. The computation of $Z$ can be beneficial if $Z$ is much smaller than $T_i$ (in that case we will have to compute $I(s)$ for fewer $s$'s). Also note that the way $A$ is defined can be exploited for further optimizations. For instance, if $A$ has been defined intensionally (by one query), then we may already know the set $Z$ without having to scan the set $A$. Line (3) requires topological sorting, whose cost is in $O(|T_i|^2)$. The subsequent loop will have at most $|T_i|$ iterations (in particular $|Z|$), and the cost of each iteration is that of the operation $I(s) \cap A$. The operation $I(s) \cap A$ can be implemented in various ways, based on the sizes of the operands (and the data structures that are in place). E.g. if $A$ is small it is better to scan $A$ to select those objects whose $D_i$ description equals $s$, and in this case the cost of an operation $I(s) \cap A$ is in $O(|A| \cdot avgD_{\le})$. On the other hand, if $A$ is big and $I(s)$ is small, then it is better to compute and scan $I(s)$ and then delete from this set those elements which are not in $A$. In this case the cost of an operation $I(s) \cap A$, if we assume direct access to the elements of $I(s)$ and the ability to perform binary search for lookups in $A$, is $O(|I(s)| \log |A|)$. It follows that the cost of the loop is in $O(|T_i| \cdot \min(|A| \cdot avgD_{\le}, |I(s)| \cdot \log |A|))$. Overall, the cost of Alg. 9 for single-valued facets, including the cost for computing the preference relation to be passed to this algorithm, is in $O(|T_i|^2 |B|^2 + |T_i| \cdot \min(|A| \cdot avgD_{\le}, |I(s)| \cdot \log |A|))$. Recall that the cost of Alg. 8 (according to Table 4.2) is in $O(|A|^2 \cdot |B|^2 \cdot avgD_{\le})$.

One benefit of Alg. 9 is that it can be more efficient than Alg. 8 if $A$ is large and $T_i$ is small. This is evident also from their complexities: Alg. 9 will have the cost $O(|T_i|^2 |B|^2 + |T_i| |I(s)| \log |A|)$, while Alg. 8 will have the cost $O(|A|^2 \cdot |B|^2 \cdot avgD_{\le})$. Note that cases where $|A|$ can be very large may occur at application level. For instance, consider the case of a user who has expressed a number of object-scoped actions and, instead of restricting $A$, would like to directly get the most preferred objects. The user wants to bypass the information thinning process, probably because he believes that his preference actions are enough for bringing the most desired objects to the top positions of the returned answer. It is not hard to see that in this scenario both the plain algorithm (Alg. 4) and Alg. 8 are prohibitively expensive. Alg. 9 will be more efficient, but it will still order the entire $Obj$.

In our opinion, the assumption that the user has expressed a detailed and complete description of his preferences is not very realistic (recall the discussion at the end of Section 2 and the DiFEPreKO Hypothesis that will be discussed in Section 6.3). Nevertheless, if we want to support such scenarios then a possible direction is to devise an appropriate top-k algorithm. Top-k algorithms for preference-aware queries have been proposed (e.g. Georgiadis et al. (2008); Stefanidis et al. (2010); Spyratos et al. (2011)), however they are appropriate for plain relational sources, meaning that hierarchically organized values or set-valued attributes are not supported. Notice, however, that Alg. 9 can be slightly changed to become a top-K algorithm. Specifically, we consume blocks of $L$ until $OL$ has reached $K$ objects. With this we complete the discussion of the main cases where the adoption of Alg. 9 is beneficial.

On the other hand, Alg. 8 can be faster than Alg. 9 if $T_i$ is large in comparison to $A$ (e.g. suppose $T_i$ is a thesaurus and $A$ is a set of a few tens of objects). Also note that another merit of Alg. 8 is that it can be straightforwardly extended to accommodate object-anchored preference actions, or other multi-facet preferences, due to its "scope-based" approach.
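The top-K variation of Alg. 9 can be illustrated with a short Python sketch that consumes blocks lazily and stops as soon as K objects have been collected. The names `object_blocks` and `top_k` are hypothetical; the bucket order over term sets and the accessory descriptions are those of the complete example in Section 3.5 (Table 3.5).

```python
def object_blocks(set_blocks, description_of):
    """Lazily yield, for each block of term sets, the objects described by it."""
    for block in set_blocks:
        yield {o for o, d in description_of.items() if d in block}


def top_k(set_blocks, description_of, k):
    """Consume blocks in preference order until at least k objects are found."""
    result, count = [], 0
    for objects in object_blocks(set_blocks, description_of):
        if not objects:
            continue
        result.append(objects)
        count += len(objects)
        if count >= k:
            break          # later blocks are never materialized
    return result


fs = frozenset
set_blocks = [
    [fs({"ABS"}), fs({"ABS", "AT"})],
    [fs({"ESP"}), fs()],
    [fs({"ABS", "AT", "ESP", "DVD"}), fs({"ESP", "ABS", "MT"})],
    [fs({"ABS", "MT", "DVD"})],
    [fs({"DVD"})],
    [fs({"ESP", "MT"})],
]
accessories_of = {
    "C": fs({"DVD"}), "B": fs({"ABS", "AT"}), "A1": fs({"ABS", "MT", "DVD"}),
    "A2": fs({"ABS"}), "F1": fs({"ESP", "MT"}), "F2": fs({"ESP", "ABS", "MT"}),
    "P": fs({"ESP"}), "F3": fs(), "K": fs({"DVD"}),
    "T": fs({"ABS", "AT", "ESP", "DVD"}),
}
top_three = top_k(set_blocks, accessories_of, 3)
print(top_three)
```

Because whole blocks are returned, the answer may contain more than K objects when the last consumed block has ties; how such ties are cut is left open here.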

4.3 Optimizations for Multi-Facet Preferences

4.3.1 Prioritized Composition

According to Section 4.1, the cost of MFPriority (presented in Section 3.4) is in $O(|Obj|(k|Obj| + |B|^2))$ for $k$ facets. Let us now suppose that in algorithm MFPriority we use PrefOrderOpt instead of PrefOrder. The cost of MFPriority in that case is in $O(|A|(k|A| + |B|^2))$. Analogously, one could adopt Alg. 9 and calculate accordingly the complexity of MFPriority.

Now we will introduce an alternative approach for supporting what we call efficient priority-driftage. We refer to the scenario where the user changes priorities with one click, and we want the new ordering of objects to appear instantly. To begin with, as long as the user does not submit an action, each $(B^i, \sqsubseteq)$ can be kept stored and reused. Now suppose the user is inspecting an answer set $A$ and he changes facets (i.e. he clicks on one facet) just for changing the priorities. Specifically, suppose the user has already specified $B^i \triangleright B^j$, meaning that both $L_{B^i}$ and $L_{B^i \triangleright B^j}$ have already been computed (according to MFPriority). Suppose that the user now clicks on $F_j$ just for changing the prioritized multi-facet preference to $B^j \triangleright B^i$. According to the approach presented in Section 3.4, the application of MFPriority will first compute $L_{B^j}$ and finally it will produce $L_{B^j \triangleright B^i}$. The key idea of the alternative algorithm is that we can avoid calling Alg. PrefOrder for each block of $L_{B^j}$ at Step 2 of Alg. MFPriority. Specifically, we will show that from $L_{B^i}$ and $L_{B^j}$ we can compute $L_{B^j \triangleright B^i}$.

It can be easily proved that the first blocks of $L_{B^j \triangleright B^i}$ are the restriction of $L_{B^i}$ on the objects of the first block, say $A^j_1$, of $L_{B^j}$. This means that the first blocks of $L_{B^j \triangleright B^i}$ can be obtained by scanning $L_{B^i}$ once and deleting (skipping) each object encountered that does not belong to $A^j_1$. In the example of Section 3.4, the first two blocks of $L_{B^{Loc} \triangleright B^{Manuf}}$ (i.e. the blocks {F1, F2}, {L}) can be obtained by replacing the first block of $L_{B^{Loc}}$ (i.e. {L, F1, F2}) by what is left after scanning $L_{B^{Manuf}}$ and ignoring the elements that do not belong to {L, F1, F2}. Since $L_{B^{Manuf}}$ = ⟨{B, A1, A2, F1, F2}, {L}⟩ we will get ⟨{F1, F2}, {L}⟩.

The cost of this approach, assuming two facets, is in $O(|A|^2)$, since we have to scan $|A|$ elements and for each one of them perform a lookup in a set that consists of at most $|A|$ elements. If we have $k$ facets then the cost is in $O((k - 1) \cdot |A|^2)$. Notice that this cost is independent of the number of user preference actions $B^i$ for each facet, assuming that the user does not submit new actions. However this approach requires keeping in memory $L_{B^1}, \cdots, L_{B^k}$. Each has at most $|A|$ objects (according to the suggested scenario), therefore the main memory cost is $k \cdot |A|$, where $k$ is the number of facets. To summarize, an alternative to the policy of Alg. MFPriority is to compute and store the bucket order $L_{B^i}$ for each $1 \le i \le k$. Then any prioritized composition of these sets of actions can be obtained by the method just described. The cost of priority driftage in this case does not depend on the number of preference actions, but requires hosting $k|A|$ objects in main memory.

Top-K Prioritized Composition. Now suppose that the user wants (or the user screen has room for) only the top-P hits, where $P$ is a positive integer. We can exploit this constraint to speed up the process. In particular, from the bucket order of the first in priority $B^i$, we can get the minimum number of blocks whose summed cardinality is greater than or equal to $P$ (if this is possible, i.e. if $P \le |A|$). For instance, if $P = 4$ in our example then we will get only the first 2 blocks of $L_{B^{Loc}}$.
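The restriction idea behind priority driftage can be sketched in a few lines of Python. The function name `prioritized_from_stored` is hypothetical, both stored bucket orders are assumed to range over the same answer set, and the second block of the location order below is an invented placeholder, since only its first block is quoted in the text.

```python
def prioritized_from_stored(primary, secondary):
    """Bucket order of (primary over secondary) from two stored bucket orders.

    Each block of the higher-priority order is replaced by the blocks of the
    lower-priority order restricted to the objects of that block.
    """
    result = []
    for block in primary:
        for secondary_block in secondary:
            kept = secondary_block & block
            if kept:
                result.append(kept)
    return result


# First block of the location order as quoted in the text; the second
# block is a hypothetical completion for the remaining objects.
location_order = [{"L", "F1", "F2"}, {"B", "A1", "A2"}]
manufacturer_order = [{"B", "A1", "A2", "F1", "F2"}, {"L"}]
combined = prioritized_from_stored(location_order, manufacturer_order)
print(combined)
```

No preference action is re-evaluated: the new order is obtained purely by scanning the stored bucket orders, which is why the cost does not depend on the number of actions.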

4.3.2 Pareto Composition

According to Section 4.1, the cost of MFPareto (presented in Section 3.4.2) is in $O(|Obj|(k|Obj|^2 + |B|^2))$ for $k$ facets. Let us now suppose that in algorithm MFPareto we use PrefOrderOpt instead of PrefOrder. The cost of MFPareto in that case is in $O(|A|(k|A|^2 + |B|^2))$. Analogously, one could adopt Alg. 9 and calculate accordingly the complexity of MFPareto.

4.3.3 Combination of Priority and Pareto Compositions

Using the optimizations described in Sections 4.3.1 and 4.3.2, the cost of the combination of the two algorithms is in $O(|A|(k|A|^2 + |B|^2))$.


Chapter 5

Applicability and the System Hippalus

Contents

5.1 Application in Searching
    5.1.1 Case: Web Searching
    5.1.2 Case: Relational Databases
    5.1.3 Case: RDF Bases
5.2 Hippalus: A Preference Enriched Faceted Exploratory System
    5.2.1 Software Architecture
    5.2.2 Visualization and User Interface
    5.2.3 Interaction Example

The objective of this chapter is to elaborate on the feasibility of the proposed interaction and preference framework over different application domains. Furthermore, it presents the design and implementation of Hippalus, a prototype system that realises the preference-enriched FDT. In more detail, Section 5.1 elaborates on the applicability of the proposed approach over FDT-based WSEs, relational databases and RDF/S respectively. Finally, Section 5.2 describes the Hippalus system.


5.1 Application in Searching

Searching is a process that can be applied over a number of different application domains. Here we elaborate on the feasibility of the proposed interaction and preference framework over WSEs, relational databases and RDF/S respectively.

Figure 5.1: Processes of Web Searching and Exploratory Web Searching

The left part of Figure 5.1 shows (with a traditional WSE in mind) how search is performed today. On the other hand, the right part of Figure 5.1 and Figure 5.2 showcase the proposed approaches for exploratory and preference based searching. The same processes can be applied over the relational databases and the Semantic Web domains, by submitting instead of a free text query the appropriate SQL or SPARQL queries and applying FDT and the proposed preference framework over the results.


Figure 5.2: Process of Exploratory Web Searching Enhanced with Preference Actions

5.1.1 Case: Web Searching

One application domain of the proposed approach is that of Web searching. Commonly, the various static metadata that are available to a search engine (e.g. domain, language, date, filetype, etc.) are exploited only through the advanced (form-based) search facilities that some WSEs offer (and users rarely use). An approach that exploits such metadata by adopting the interaction scheme of FDT exploration was first proposed and analyzed in Papadakos et al. (2009a). The proposed process for exploratory web searching is sketched in the right column of Figure 5.1. Specifically, the process consists of the following steps:

• The user submits a free text query which he assumes corresponds to his specific information need
• The system computes a ranked set of pages, documents, items
• Available static metadata are then loaded to the system for these items (i.e. date, language, filetype, domain, etc.)
• Small excerpts (i.e. snippets) of the top-K results, which can be produced in real-time, are then computed
• Based on the previously computed top-K snippets, we can mine dynamic metadata, by using a clustering or entity mining algorithm
• Then the FDT interaction scheme is applied, by calculating for each facet the corresponding values and count numbers (i.e. static and dynamic metadata and their values)
• The system visualizes the available information regarding the facets, terms and objects
• The user can explore the information space by restricting his focus
• The user can explore the next top-K results, by dynamically mining metadata from the next top-K snippets
• Finally, if the user is not satisfied with the results returned by the initial query, he submits a new query

The previous process can be enriched with preference expression over the facets, zoom-points and objects. Figure 5.2 depicts the process in more detail. The difference from the previous process is that now the user can express preferences over the visualized facets, zoom-points and objects. Then the system computes and presents to the user the most preferred facets, zoom-points and objects, according to the expressed user preferences.
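The facet-count and focus-restriction steps of the process above can be sketched as follows. This is only an illustration over hypothetical static metadata; the records, facet names and function names are assumptions and do not reflect the implementation of any actual WSE.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical static metadata for the ranked answer of a free text query.
results: List[Dict[str, str]] = [
    {"url": "a.example/1", "language": "en", "filetype": "html", "domain": "org"},
    {"url": "b.example/2", "language": "en", "filetype": "pdf", "domain": "edu"},
    {"url": "c.example/3", "language": "el", "filetype": "html", "domain": "gr"},
    {"url": "d.example/4", "language": "en", "filetype": "html", "domain": "edu"},
]
FACETS = ["language", "filetype", "domain"]


def facet_counts(focus: List[Dict[str, str]], facets: List[str]) -> Dict[str, Counter]:
    """For each facet, count how many objects of the focus carry each value."""
    return {facet: Counter(obj[facet] for obj in focus) for facet in facets}


def restrict(focus: List[Dict[str, str]], facet: str, value: str) -> List[Dict[str, str]]:
    """Restrict the focus to the objects that have the clicked value."""
    return [obj for obj in focus if obj[facet] == value]


counts = facet_counts(results, FACETS)
focus = restrict(results, "language", "en")
counts_after = facet_counts(focus, FACETS)
```

After the restriction the counts are recomputed over the new focus only, so values that no longer occur (here the domain gr) disappear from the facets.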


Since the first two steps of the above process correspond to the query-and-answer interaction that current WSEs offer (which is stateless), what we propose essentially extends and completes the current interaction. Note that the FDT interaction scheme has already been implemented over the Mitos WSE¹. Figure 5.3 shows the GUI of that engine. This figure shows 5 facets and their values and counts for the user-submitted query “library”. Specifically, one facet is dynamically mined (i.e. By clustering) while the remaining 4 facets are based on static metadata (By domain, By date, By filetype, By language). To the best of our knowledge, no other WSE offered the same kind of information/interaction at that time. A complete presentation that also includes an incremental algorithm for speeding up the interaction and the results of a user study is available in Papadakos et al. (2012a).

Figure 5.3: Mitos GUI for Exploratory Web Searching. Annotated regions: A (objects of focus); a facet based on dynamic metadata extracted from the top-k resources; facets based on static metadata; zoom points.

¹ Under development by the Department of Computer Science of the University of Crete and FORTH-ICS (http://groogle.csd.uoc.gr:8080/mitos/).

Regarding preferences, the default operation mode of Mitos (and of most FDT search engines) is captured by the following actions:

    facets order: lexicographic min
    terms order: count max
    objects order: Relevance value max

This indicates that the language presented in Section 3 can capture the default behaviour of various systems. In order to extend the exploratory web searching process with preferences we have to add the two additional steps of the process depicted in Figure 5.2. This functionality can be provided in the same way as described later on in Section 5.2.2.
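One way to read the three default actions is as plain sort orders. The sketch below assumes that "lexicographic min" means ascending alphabetical order and that "count max" and "Relevance value max" mean descending order by count and by relevance score; the facet names, counts and scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical facets with term counts, and relevance scores of objects.
term_counts: Dict[str, Dict[str, int]] = {
    "By filetype": {"html": 40, "pdf": 12, "doc": 3},
    "By domain": {"org": 20, "edu": 25, "com": 10},
}
relevance: Dict[str, float] = {"page1": 0.42, "page2": 0.91, "page3": 0.67}

# facets order: lexicographic min (facet names in ascending alphabetical order)
facets_in_order: List[str] = sorted(term_counts)

# terms order: count max (within each facet, terms with the highest count first)
terms_in_order: Dict[str, List[Tuple[str, int]]] = {
    facet: sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for facet, counts in term_counts.items()
}

# objects order: Relevance value max (objects by descending relevance score)
objects_in_order: List[str] = sorted(relevance, key=relevance.get, reverse=True)
```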

5.1.2 Case: Relational Databases

The interaction scheme of FDT can also be applied over relational databases (i.e. over a single table or over the results of a query expressed in SQL). If we want to explore data that are not stored in one table, then we can exploit the view mechanism that relational databases provide: a view is actually a named SQL query, over which other queries can be formulated as if it were a table of the database. Specifically, we can define a view comprising attributes from different relations (tables), and its definition may include joins and various other descriptions. Subsequently, we can apply the FDT interaction scheme over the contents of this view (i.e. over its answer), by assuming that each attribute of the view is a facet, and that the set of distinct values that appear in that attribute corresponds to the terms of this facet. The tuples in the answer of that view are the objects. This is the “straightforward” approach for applying FDTs and preference-based browsing over relational sources. Let us now compare this method with Preference SQL at a syntactical level, i.e. the usability of our method in comparison to using Preference SQL directly. Suppose the following table Car:

    Id    Manufacturer    Color
    o1    VW              Silver
    o2    Ferrari         Red
    o3    Fiat            Yellow
    o4    BMW             Silver
    o5    Kia             Silver
    o6    Lexus           Silver
    o7    Toyota          Silver
    o8    Kia             Silver
    o9    Fiat            Red
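As an illustration of the view-based approach, the following sketch loads the Car table into an in-memory SQLite database, defines a view over it, and derives the facets and zoom-point counts with one GROUP BY query per attribute. SQLite, the view name `CarView` and the helper `zoom_points` are assumptions made only for this example; a real view would typically join several tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Car (Id TEXT PRIMARY KEY, Manufacturer TEXT, Color TEXT)")
conn.executemany(
    "INSERT INTO Car VALUES (?, ?, ?)",
    [
        ("o1", "VW", "Silver"), ("o2", "Ferrari", "Red"), ("o3", "Fiat", "Yellow"),
        ("o4", "BMW", "Silver"), ("o5", "Kia", "Silver"), ("o6", "Lexus", "Silver"),
        ("o7", "Toyota", "Silver"), ("o8", "Kia", "Silver"), ("o9", "Fiat", "Red"),
    ],
)
# A view is a named query; this one is trivial, but it could join several tables.
conn.execute("CREATE VIEW CarView AS SELECT Id, Manufacturer, Color FROM Car")

FACETS = ["Manufacturer", "Color"]  # each non-key attribute of the view is a facet


def zoom_points(connection, view, facets):
    """Return {facet: {term: count}} over the distinct values of each attribute."""
    result = {}
    for facet in facets:
        # Facet and view names come from the fixed list above, not from user input.
        rows = connection.execute(
            f"SELECT {facet}, COUNT(*) FROM {view} GROUP BY {facet}"
        ).fetchall()
        result[facet] = dict(rows)
    return result


car_facets = zoom_points(conn, "CarView", FACETS)
```

Each tuple of the view is an object, and the counts next to each term are the numbers a faceted interface would display as zoom-point counts.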


Consider now a user that wants to buy a car and assume that he prefers European cars to any other cars. He also likes Lexus equally to European cars. Finally, Fiat and Kia are his least preferred brands. The above preferences can easily be expressed using our preference language. Specifically, they can be expressed as preference actions that are anchored to terms, with objects as their scopeType. Such a preference could also be expressed using Preference SQL (Kießling et al. (2011a)). Preference SQL returns the BMO set, i.e. the Pareto optimal set. The query could have the following format, assuming that the user knows all the distinct manufacturers that can be stored in the database:

    SELECT * FROM CAR
    PREFERRING Manufacturer EXPLICIT (
        'Kia' < 'Toyota', 'Kia' < 'Ferrari', 'Kia' < 'Lancia', 'Kia' < 'Citroen',
        'Kia' < 'Peugeot', 'Kia' < 'BMW', 'Kia' < 'VW', 'Kia' < 'Lexus',
        'Fiat' < 'Toyota', 'Fiat' < 'Honda', 'Fiat' < 'Ferrari', 'Fiat' < 'Lancia',
        'Fiat' < 'Citroen', 'Fiat' < 'Peugeot', 'Fiat' < 'BMW', 'Fiat' < 'VW',
        'Fiat' < 'Lexus',
        'Toyota' < 'Ferrari', 'Toyota' < 'Lancia', 'Toyota' < 'Citroen',
        'Toyota' < 'Peugeot', 'Toyota' < 'BMW', 'Toyota' < 'VW', 'Toyota' < 'Lexus')

In this query the user explicitly defines the preference relation over all available manufacturers. An alternative and simpler query is to provide a layered form of the user preferences:

    SELECT * FROM CAR
    PREFERRING Manufacturer LAYERED (
        ('Ferrari', 'Lancia', 'Peugeot', 'Citroen', 'BMW', 'VW', 'Lexus'),
        ('Toyota'),
        ('Kia', 'Fiat'))

These queries will return the following bucket order: ⟨{o1, o2, o4, o6}⟩. The first query is too complex for a plain user and presupposes knowledge of the schema and the values stored in the database. The second query is simpler, but again presupposes that the user is able to construct the appropriate layers of the preference relation and that he knows the available stored values and database schema. Furthermore, such a query must be given in one shot. If the user changes his preferences over time or by exploring the available objects, then he must submit a reformulated query. If the user were exploring this table using the interaction scheme of FDT, he would get the facets and zoom-points shown in Figure 5.4. Subsequently, he would be able to express his preferences interactively (by clicking on values and selecting the desired action). Furthermore, he would express his preferences gradually, until the point that he gets a list of results that satisfies him.

Figure 5.4: Facets and Zoom-Points of Running Example
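The effect of the layered preference over the Car table can be reproduced with a small sketch. It is not an implementation of Preference SQL; it only groups the nine cars of the running example by the layer of their manufacturer, and the function name `layered_bucket_order` is hypothetical.

```python
from typing import Dict, List, Set

# The Car table of the running example: id -> manufacturer.
CARS: Dict[str, str] = {
    "o1": "VW", "o2": "Ferrari", "o3": "Fiat", "o4": "BMW", "o5": "Kia",
    "o6": "Lexus", "o7": "Toyota", "o8": "Kia", "o9": "Fiat",
}
# The layers of the second query, most preferred layer first.
LAYERS: List[Set[str]] = [
    {"Ferrari", "Lancia", "Peugeot", "Citroen", "BMW", "VW", "Lexus"},
    {"Toyota"},
    {"Kia", "Fiat"},
]


def layered_bucket_order(cars: Dict[str, str], layers: List[Set[str]]) -> List[List[str]]:
    """Group car ids into blocks, one block per non-empty layer, best layer first."""
    blocks = []
    for layer in layers:
        block = sorted(car_id for car_id, maker in cars.items() if maker in layer)
        if block:
            blocks.append(block)
    return blocks


bucket_order = layered_bucket_order(CARS, LAYERS)
best_matches = bucket_order[0]  # the block that a best-matches-only answer would contain
```

The first block contains o1, o2, o4 and o6, which agrees with the answer given in the text; the remaining blocks show the less preferred cars that a best-matches-only answer does not return.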

5.1.3 Case: RDF Bases

Recently, the amount of data published on the public Semantic Web has exploded, especially in the form of Linked Data² (Bizer et al. (2009)). Specifically, by September 2011 available datasets had grown to 31 billion RDF triples, interlinked by around 504 million RDF links³. The interaction scheme of FDT can also be applied over the Semantic Web and there are already several browsers that provide FDT over RDF. Examples include /facet (Hildebrand et al. (2006)), Ontogator (Mäkelä et al. (2006)) and BrowseRDF (Oren et al. (2006)). RDF/S sources actually follow a structurally object-oriented model. The structuring of information assumed by FDTs is simpler: objects described by attributes whose values may be hierarchically organized, meaning that associations between objects of the same or different types are not assumed. This implies that for applying FDT over RDF/S sources one has to decide the part (in its original native form or transformed) of the RDF/S source that should be explored according to FDT. One way to do this is to follow the approach described for relational sources. Specifically, we can specify the desired part (or the desired transformation) by a SPARQL query. Then we can apply FDT over the results of this query. Since the structure of the results is actually a relational table, we can apply FDT exploration as in relational tables.

² Wikipedia defines Linked Data as a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.
³ http://en.wikipedia.org/wiki/Linked_data


An alternative way, which does not use a query but instead specifies the part of the source that should be explored, is described below. Moreover, this method can exploit the subClassOf relationships. Specifically, subClassOf relationships are treated as hierarchically organized values. In this case, the objects of interest (i.e. the set Obj) can be defined by selecting one class of the source: all direct and indirect instances of this class constitute the set Obj. For instance, assuming the case of Figure 5.5, we can define that the objects of interest are the instances of the class Vehicle. As facets we can consider the properties that start from or point to the above class. Moreover, the class hierarchies (of Vehicle, Location, Manufacturer) are exploited. Specifically, in this example it is like having three facets: type, whose values are the hierarchy of Vehicle; madeBy, whose values are the instances of Manufacturer, organized hierarchically through the subclasses of that class; and locatedIn, whose values are the instances of Location, organized hierarchically through the subclasses of that class. More expressive exploration models, which exploit the full structuring of RDF/S sources (even its fuzzy extensions (Manolis and Tzitzikas (2011))), go beyond the scope of this work.
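The definition of Obj as the direct and indirect instances of a selected class can be sketched with a simple traversal of the subClassOf hierarchy. The class names below Vehicle, the instance identifiers and the two dictionaries are hypothetical stand-ins for an RDF/S source; no RDF library is used.

```python
from typing import Dict, List, Set

# Hypothetical fragment of an RDF/S source.
SUBCLASSES: Dict[str, List[str]] = {   # class -> direct subclasses
    "Vehicle": ["Car", "Truck"],
    "Car": ["Jeep"],
}
INSTANCES: Dict[str, List[str]] = {    # class -> direct instances
    "Car": ["v1"],
    "Jeep": ["v2", "v3"],
    "Truck": ["v4"],
    "Manufacturer": ["m1"],
}


def descendants(cls: str, subclasses: Dict[str, List[str]]) -> Set[str]:
    """Return cls together with all of its direct and indirect subclasses."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclasses.get(current, []))
    return seen


def objects_of_interest(cls, subclasses, instances) -> Set[str]:
    """Obj: every direct or indirect instance of the selected class."""
    return {obj for c in descendants(cls, subclasses) for obj in instances.get(c, [])}


obj = objects_of_interest("Vehicle", SUBCLASSES, INSTANCES)
```

Selecting Vehicle yields the four vehicle instances, including those typed only under the indirect subclass, while the Manufacturer instance is left out.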

Figure 5.5: Example of RDF/S


5.2 Hippalus: A Preference Enriched Faceted Exploratory System

To demonstrate the feasibility of our approach and to identify possible difficulties or other issues related to implementation and application, we have designed and implemented a proof-of-concept prototype, named Hippalus. The logo of Hippalus is an ancient Greek boat with the ≻ symbol of preferences as its sail⁴. This system was used for the user study described in Section 6.5.

5.2.1 Software Architecture

Instead of starting from scratch, we have decided to design and build Hippalus over RDF/S sources and RDF/S managing software. Specifically, we have implemented the proposed preference framework over a prototype for browsing and exploring RDF sources⁵, based on the model described in Manolis and Tzitzikas (2011), apart from the aspect of fuzziness. Hippalus uses Jena⁶, which is a Java framework for building Semantic Web applications. The architecture of the system and its components is given in Figure 5.6. The user submits his preferences through HTML5 context menus, which are then translated to statements of the preference language described in Section 3.1. These statements are then sent to the servlet-based server through HTTP requests. The server checks the validity of the received requests and analyzes them using a parser of the preference language described in Chapter 3. If the action is valid, it is passed as input to the appropriate preference algorithm (as described in Chapters 3 and 4). To query the underlying RDF information base, we use Jena through a Data Manager component, which abstracts the details of this particular component. Finally, the computed preference relation, and therefore the preference bucket order, is sent to the State component, which in turn updates the UI through an HTTP response.

⁴ Hippalus was a Greek navigator and merchant who probably lived in the 1st century BCE. He is credited with having discovered the direct route from the Red Sea to India over the Indian Ocean by plotting the scheme of the sea and the correct location of the trade ports along the Indian coast, and by taking advantage of the monsoon wind.
⁵ The information base that feeds Hippalus is represented in RDF/S, using a schema adequate for representing objects described according to dimensions with hierarchically organized values.
⁶ http://jena.apache.org/


Figure 5.6: System Architecture

Figure 5.7: The RDF Knowledge Base. Labels visible in the figure include the classes Manufacturer, Vehicle_Type, Drive_System and Transmission, and values such as European, American, U.K., U.S.A., Aston_Martin, Car, Truck, Jeep, 2-Wheel_Drive, All-Wheel_Drive, Manual and Semi-automatic.


5.2.2 Visualization and User Interface

Regarding visualization and FDT, one has to decide where and how to visualize: the focus (current object set), the facets, the zoom points (and their count information), the intensional description of the current state, and finally the information related to preferences. These are the main decisions. The most widely adopted approach or policy (evidenced by the UI design of global systems like booking.com) is to use a left bar for the facets and the corresponding zoom points, the right area for the scrollable list of objects in the focus, and a small top area for the description of the current state. For each of these elements, various visual elements can be used. A thorough description is available in Chapter 4 of the book by Sacco and Tzitzikas (2009). In our case we have to decide where to show the preference-related information and actions, since this has not been supported by any system so far.

Regarding preference actions, one approach is to provide the preference-related actions through right-click activated pop-up menus. This policy does not require allocating permanent screen space for these actions. However, the user should be aware that these options exist. The design of the preference actions includes actions that are anchored to one element, and this makes the right-click activated actions straightforward. However, the proposed preference-based framework also supports actions that concern two elements (i.e. relative preferences like German ≻ Italian). Regarding the way the description of the current state is shown to the user, the user should be able to view not only the intensional description of his current state, but also the accumulated preferences that he has formulated. Finally, the user should be able to store and load his preferences, since exploration is a process that extends over time.
Based on the above requirements, we have designed a Web application that offers exploration services for a set of objects described using several dimensions, where all this information is represented in RDF. In this case we map facets and terms from FDT to classes and subclasses respectively. The preference actions are offered through HTML5 context menus⁷ and AJAX, which are enacted by right clicking in the browser window. The user is able to order classes, subclasses and objects using best, worst, prefer to actions (i.e. relative preferences), around to actions (over a specific value), or actions that order them lexicographically, or based on their values or count values. Furthermore, he can compose object-related preferences, using Priority, Pareto, Pareto Optimal, and Combination⁸ compositions, by selecting the appropriate composition mode and selecting classes through the classes' context menus. The default composition is Combination. Regarding objects, since their number can be very large, the user is able to define a threshold, so that preferences are applied only when the number of objects is reduced under this threshold⁹. Options and parameters regarding the system functionality can be set through a drop-down menu (i.e. simple or full support of preference menus, threshold, evaluation parameters, load-save etc.).

Figure 5.8: Hippalus: The Main Page of Hippalus. (a) Shows the Area where Facets and Terms are Displayed, (b) the Ranked Objects Area, (c) the Preference Actions History and Composition Tool, (d) ‘Interesting Objects’ Tool (i.e. Like a Shopping Cart) and (e) the Object Restriction History

⁷ Available only in Firefox 8 and later.
⁸ Order according to priorities if defined. The rest of the actions use Pareto composition and are the least prioritized.
⁹ The user can reduce the number of objects (simple menus support only actions affecting objects), by navigating over the classes, subclasses, and objects and restricting his focus.

5.2.3 Interaction Example

For demonstration purposes as well as for the needs of the user study (described in detail in Section 6.5) we have constructed an information base about cars. Each car is described using classes like Manufacturer and Drive_System, which are hierarchically organized, while the rest, like Vehicle_Type, are flat (as shown in Figure 5.7). In this figure, continuous arrows denote subClassOf relationships while dashed arrows denote typeOf relationships. The information base contains 50 cars, indexed under 23 classes and 85 subclasses. Here we describe a more complete scenario demonstrating how hard and soft constraints can be specified by the user, in an easy and gradual manner. It also aims at making clear the merits of the underlying preference framework (preference inheritance and scope-based conflict resolution). A video showcasing this scenario is available online¹⁰.

Figure 5.8 shows the main page of Hippalus over the collection of 50 cars. Specifically, part (a) shows the attributes and their values (which can be hierarchically organized), accompanied by the number of their occurrences, where the user can restrict his focus or express preferences anchored to them. Part (b) depicts the objects area, which is ranked according to preference, part (c) shows the preference actions history and composition tool, part (d) displays the ‘Interesting Objects’ tool (i.e. like a shopping cart) and finally, part (e) shows the object restriction history. Figure 5.9 shows that one can expand broad values, like Asian (from the attribute Manufacturer), and that by clicking on the value Korean the focus is restricted to three Korean cars. Notice that the left bar has been updated, i.e. only the values that appear in the restricted set are presented (all attributes have count up to 3). With additional clicks the user can further reduce the focus, e.g. from the attribute Fuel Type we can see that one of the cars consumes Diesel and two cars Gasoline.
By clicking on Gasoline we see these two cars, and by moving the mouse over one of them the user gets its “Object Card” showing all attributes of that car. At the bottom right frame the user can see the history of his clicks and can undo any click.

¹⁰ http://www.youtube.com/watch?v=Cah-z7KmlXc

Figure 5.9: Hippalus: Value Expansion - Object Restriction (annotations: object restriction, mouse over)

Figure 5.10: Hippalus: Expression of Relative Preference Korean ≻ European

Preferences are activated through right-click menus. Suppose we cancel all clicks and assume that we want to express that we prefer Korean cars to European ones. This means that we do not want to see only Korean cars; we just want them ranked higher than European ones. This is shown in Figure 5.11.a (top), where we see that the user now gets a linear list of blocks of equally preferred objects: the first contains Korean cars, the next one European (thanks to inheritance the user does not have to say anything about German, Italian, French, etc.). It is important that preferences can be expressed incrementally, and at any point during the interaction, e.g. suppose that the user also prefers prices around 12,000. He can use the action around 12,090 as shown in Figure 5.11.a (bottom). We can see that the object order now becomes more refined (the figure shows 14 blocks). Notice that the first block contains one Korean (Hyundai) and one Fiat. This happens because both of his preference actions have the same priority (and the Fiat is closer to 12090). If the user wants to give higher priority to one preference he can use the right frame dedicated to this. Figure 5.11(b) shows the object order obtained after expressing that the preferences on manufacturers have higher priority than the preferences over prices. At any time the user can click on a value from a facet to restrict the current focus, which is now a preference-based list of cars. For instance, if the user wants to see only cars having two doors, he can click on 2 in the attribute Doors. We can see that now he gets only 8 cars, which are ranked according to his preferences so far. The user could cancel this extra restriction from the object restriction history. In general the user can combine object restriction (or relaxation) actions and preference actions in any order. Figure 5.14 shows the ranked list of objects, after restricting our focus only to cars that have 2 doors. The two previous preference actions (i.e. Korean ≻ European and price around 12090) are used for the final preference ranking of objects. The composition of preference actions is shown in Figure 5.12. Specifically, we have created two priority levels, by pressing the ‘Add Priority Level’ button. Then we defined the desired priority order, by dragging and dropping facets to the appropriate priority levels.
As the user changes the priority levels, for example by placing Manufacturer in the Level 1 priorities and Price in the Level 2 priorities, the system calculates on the fly the new order of objects. Notice how refined the object ranking is, because of the second preference action that orders the cars around the price 12090. If we reverse the priorities, the object order changes (Figure 5.13). The default composition mode is the Combination mode. This mode, shown in Figure 5.14, behaves like Pareto if no priorities are defined.
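The difference between equally prioritized preferences and prioritized ones can be illustrated with a simplified sketch. It is not the algorithm of Chapters 3 and 4: the four cars and their prices are hypothetical, only Korean and European cars are included so that Korean ≻ European orders all of them, and the blocks are obtained by repeatedly removing the objects that no remaining object is better than.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical cars: name -> (origin, price).
CARS: Dict[str, Tuple[str, int]] = {
    "Hyundai": ("Korean", 13000),
    "Kia": ("Korean", 16000),
    "Fiat": ("European", 12100),
    "BMW": ("European", 30000),
}
ORIGIN_RANK = {"Korean": 0, "European": 1}  # Korean ≻ European, lower is better
TARGET_PRICE = 12090                        # the "around 12090" action


def scores(name: str) -> Tuple[int, int]:
    """Per-preference scores (manufacturer, price); lower means more preferred."""
    origin, price = CARS[name]
    return (ORIGIN_RANK[origin], abs(price - TARGET_PRICE))


def pareto_better(a: str, b: str) -> bool:
    """a is at least as good as b on every preference and strictly better on one."""
    sa, sb = scores(a), scores(b)
    return all(x <= y for x, y in zip(sa, sb)) and sa != sb


def priority_better(a: str, b: str) -> bool:
    """Manufacturer decides first; price only breaks manufacturer ties."""
    return scores(a) < scores(b)


def bucket_order(names: Iterable[str], better: Callable[[str, str], bool]) -> List[List[str]]:
    """Repeatedly peel off the objects that no remaining object is better than."""
    remaining = set(names)
    blocks = []
    while remaining:
        block = sorted(n for n in remaining if not any(better(m, n) for m in remaining))
        blocks.append(block)
        remaining -= set(block)
    return blocks


pareto_blocks = bucket_order(CARS, pareto_better)
priority_blocks = bucket_order(CARS, priority_better)
```

Under equal priority the first block holds both the Hyundai and the Fiat, because neither is better on both preferences, which mirrors the behaviour described for Figure 5.11.a. Once Manufacturer is prioritized over Price, both Korean cars precede the Fiat.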

Figure 5.11: Hippalus: (a) Expressing Preferences, (b) Object Restrictions after Preference Expression

Figure 5.12: Hippalus: Composition of Preference Actions. Manufacturer Prioritized to Price

Figure 5.13: Hippalus: Composition of Preference Actions. Price Prioritized to Manufacturer

Figure 5.14: Hippalus: Composition of Preference Actions. Default Combination Mode

Figure 5.15: Hippalus: Restricted Focus with Preferences Applied

Chapter 6

Evaluation

Contents

6.1 Evaluation Approaches & Metrics
    6.1.1 Metrics for Exploratory Search
    6.1.2 Metrics Related to the Proposed Interaction Scheme
6.2 Theoretical Analysis of the Number of User Decisions and Effort in FDT
6.3 DiFEPreKO Hypothesis
    6.3.1 Analytical Comparison
    6.3.2 User Study
6.4 Evaluation of Various Exploration Approaches
6.5 Evaluation of Hippalus System
6.6 Evaluation Conclusion

The objective of this chapter is to elaborate on how the proposed preference-based interaction scheme could be evaluated. Specifically, Section 6.1 reviews the related work, identifies the various metrics and evaluation approaches that have been proposed or used and are related to our interaction scheme, and proposes new metrics for decision making. Subsequently, Section 6.2 studies theoretically the convergence of FDT-based UIs and the required user effort with or without preference actions. Afterwards, Section 6.3 introduces, and evaluates through a simple experiment, a hypothesis stating that without the ability to explore the existing choices, the expression of preferences can be time-consuming and result in incomplete preferences. Furthermore, Sections 6.4 and 6.5 discuss two user-based evaluations. The first one shows the effectiveness of FDT for exploratory tasks, while the second one evaluates the proposed preference-based scheme over Hippalus and discusses the results of the evaluation. Finally, Section 6.6 concludes this chapter.

6.1 Evaluation Approaches & Metrics

Here we discuss a number of exploratory search metrics and we identify metrics that are relevant to our proposed preference-based interaction scheme. Furthermore, we study theoretically the convergence of FDT-based UIs and the required user effort with or without preference actions.

6.1.1 Metrics for Exploratory Search

One characteristic of any ES approach is that it is session-based. By session-based we refer to a dialogue between the user and the system such that the response of the system (e.g. answer, branch shown) does not depend only on the current user request (e.g. query, click) but also on his previous requests and the session history in general. Furthermore, according to Marchionini (2006) ES is recall-oriented. As a result, standard single-query metrics like the traditional Precision and Recall metrics, or Instance Recall (Over (1997)), which allows multiple queries per session (rewarding the number of distinct relevant answers identified in a session of a given length), are inadequate for evaluating session-based information tasks. The evaluation of systems offering session-based IIR is difficult, because of the complexity of data relationships, the diversity of displayed data, and the interactive nature of exploratory search, along with the perceptual and cognitive abilities offered. Such systems rely heavily on users' ability to identify and act on exploration opportunities, as described in White et al. (2007). Discussions of available methods and metrics for evaluating experimental UIs for web searching are provided in the works of Käki and Aula (2008) and Kelly et al. (2009). Kanoulas et al. (2011b) give an overview of data collections and metrics for the evaluation of session-based IR¹. Finally, a recent survey regarding the evaluation of web retrieval effectiveness is provided in Carterette et al. (2012). According to it, we can group evaluation methods and metrics in three different categories. The first one includes traditional metrics that do not make any assumption regarding the user. The second one uses simple user models and, finally, the third uses advanced user models. Figure 6.1 shows the different groups of metrics, which are described below.

¹ The 2012 session can be visited at http://ir.cis.udel.edu/sessions/index.html

First Group of Metrics: No User Model

The first group of metrics assumes binary relevance (i.e. a document is either relevant or not) and is based on sets of documents and not ranked lists. Specifically, it includes the traditional metrics of Precision and Recall from the Cranfield studies and their combinations, like F-measure and Average Precision. Average Precision is the most widely used metric in IR.

Second Group of Metrics: Simple User Model

The second group includes metrics that make simple assumptions about user behaviour. For example, the Expected Search Length metric described in Cooper (1968) assumes that the user walks down a ranked list of documents and observes every document until a stopping point. This is the point where he satisfies his need. This metric uses a cost function for each visited document, based on the relevance of the document. In addition, Robertson (2008) demonstrates that Precision and Recall can be redefined using the above user model, by defining the utility of each document. In this case, Precision becomes a measure of utility and Average Precision becomes an expectation of utility over a number of browsing decisions. Furthermore, Robertson et al. (2010) proposed the Graded Average Precision (GAP), a new measure that redefines Average Precision by taking into consideration different relevance grades. Specifically, this metric assumes that the user regards as relevant the documents that have a relevance value over a specific threshold.
The Rank Biased Precision (RBP ) measure, described in Moffat and Zobel (2008), tries to incorporate the user’s persistence to examine a certain number of documents in the results list (e.g. the user looks only the first result, or the top ten results). However, this measure does not take into account the quality of the answer. Another relative metric is the Discounted Cumulated Gain (DCG) and its variations. The best known is the normalized discounted cumulative gain (nDCG), described in Järvelin and Kekäläinen (2002), which uses a graded relevance scale of documents and measures the usefulness or gain of a document, based on its position in the result list. The gain is accumulated from the top of the list to

110

Chapter 6. Evaluation

Figure 6.1: Available IR Metrics

the bottom, with the gain of each result discounted at lower ranks. The result is normalized by dividing with the DCG of the ideal ranked answer set. Schuth and Marx (2011) suggest an adaptation of nDCG for FDT-based information systems. Specifically, they are interested in which facet-value pairs will be

6.1. Evaluation Approaches & Metrics


presented to the user. They also propose nrDCG, which is a recursive version of DCG. In the same manner, Expected Reciprocal Rank (ERR), described in Chapelle et al. (2009), tries to overcome a problem of DCG and RBP, namely that a document in a specific position always contributes the same gain, by taking into account the quality of the response of the system. It is a popular measure for tasks that return a single relevant document and is based on the cascade user model. This model assumes a user that accumulates utility by stepping down the ranked list and decides whether to continue browsing based on the accumulated utility. Yilmaz et al. (2010) propose the EBU metric, a metric similar to ERR, which uses the same cascade user model but in addition takes into consideration the effect of document snippets.

Third Group of Metrics: Advanced User Model

The third category includes metrics that use more advanced user models. We can consider two different subfamilies for these metrics. The first includes metrics that take into consideration novelty and diversity. Examples include subtopic generalizations of recall and precision as described in Zhai et al. (2003), where the user gets utility from each different topic that was retrieved. In addition, the intent-aware family of measures described in Agrawal et al. (2009) assumes that there is a probability distribution over subtopics. In Clarke et al. (2008), the α-nDCG metric takes duplicate text into account by penalizing it. Finally, Chapelle et al. (2011) describe an intent-aware ERR, computed as a weighted average of ERR over intents.

The second subfamily includes the metrics that are session-based. This subfamily includes nsDCG, a variant of nDCG for sessions, which is described in Järvelin et al. (2008) and incorporates a cost for each query reformulation. Furthermore, the work described in Yang and Lad (2009) proposes a theoretical probabilistic framework that takes into consideration the user interactions over multi-session ordered lists, in order to evaluate and optimize information distillation². The associated user model is a user that steps down a list until a point where he reformulates his query and begins again from the new ranked list. Finally, the recent work of Kanoulas et al. (2011a) generalizes the traditional measures of IR, such as Precision, Recall and Average Precision, to a session. These metrics assume that a user steps down a ranked list until a point where he either reformulates his query or abandons the search.

² Information Distillation is an emerging area of research, which focuses on the effective combination of ad-hoc IR, novelty detection and adaptive filtering.


Other Evaluation Approaches

In addition to all the above, the work of Kules et al. (2009) examined the interaction with a faceted online library catalogue and found that facets are very important in exploratory processes. Azzopardi (2009) represents the usage of an ES as a stream of documents and studies the performance of such systems based on time and usage. Kules and Capra (2008) discuss ways to create exploratory tasks for faceted search UIs. Wilson and Schraefel (2007) propose a method for evaluating exploratory search by blending IR frameworks with HCI design. Works that use statistical methods such as factor analysis (FA) and structural equation models (SEM) in order to examine the interrelationships between multiple evaluation criteria are described in Toms et al. (2005) and O'Brien et al. (2008) respectively. Finally, Carterette et al. (2011) simulate user behaviour by using 'click' data and a Bayes procedure.

6.1.2 Metrics Related to the Proposed Interaction Scheme

Regarding the evaluation of our preference-based interaction scheme, we will consider both non-session-based and session-based metrics, which can be measured at each step of the interaction. We include both kinds of metrics so that we can conclude how each user action affects the result set and the user task respectively.

Non-session-based Metrics

The following non-session-based metrics could be beneficial for the evaluation of our approach:

Average Precision. It is one of the most commonly used metrics, since it takes into consideration both precision and recall. It is calculated by the following formula:

$$AP = \sum_{i=1}^{n} p(i)\,\delta r(i) \qquad (6.1)$$

where p(i) is the precision at position i of the search results, δr(i) is the difference in recall between position i − 1 and position i, and n is the number of objects in the result set.

normalized Discounted Cumulative Gain - nDCG. Discounted Cumulative Gain is a metric described in Järvelin and Kekäläinen (2002), which promotes systems that return relevant documents near the top of the answer set and penalizes systems that return relevant documents at the bottom of the answer set. It is calculated by the following formula for the position k:

$$DCG_k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \qquad (6.2)$$

where rel_i is the relevance of document i and rel_i ∈ [0, 1]. The normalized DCG (nDCG) at position r is calculated by dividing DCG_r by IDCG_r, which is the ideal DCG_r value (i.e. the value obtained when the documents are returned in the optimal order). Specifically,

$$nDCG_r = \frac{DCG_r}{IDCG_r} \qquad (6.3)$$
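Equations 6.1 to 6.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, relevance for Average Precision is taken as binary, and every relevant document is assumed to appear somewhere in the ranked list (so that recall reaches 1 at the end of the list).

```python
import math
from typing import Sequence


def average_precision(relevant: Sequence[bool]) -> float:
    """Equation 6.1: sum over positions of p(i) * delta_r(i).

    Assumption of this sketch: all relevant documents are in the list.
    """
    total_relevant = sum(relevant)
    if total_relevant == 0:
        return 0.0
    ap, hits = 0.0, 0
    for i, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            precision_at_i = hits / i
            # Recall only changes at positions that hold a relevant document.
            delta_recall = 1 / total_relevant
            ap += precision_at_i * delta_recall
    return ap


def dcg(rels: Sequence[float], k: int) -> float:
    """Equation 6.2: gain 2^rel - 1, discounted by log2(i + 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))


def ndcg(rels: Sequence[float], r: int) -> float:
    """Equation 6.3: DCG_r divided by the DCG_r of the ideal ordering."""
    ideal = dcg(sorted(rels, reverse=True), r)
    return dcg(rels, r) / ideal if ideal > 0 else 0.0
```

For instance, `dcg([1, 0, 1], 3)` adds a gain of 1 at the first position and a gain of 1 discounted by log2(4) = 2 at the third position.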

normalized Discounted Cumulative Gain - nDCG for FDT. An adaptation of nDCG for FDT-based information systems, which takes into consideration the facet-value pairs, is described in detail in Schuth and Marx (2011). This metric focuses on two aspects: (a) prefer facet-values that would return a lot of relevant documents high in the result list and (b) prefer facet-values that would return relevant documents that have not been seen through earlier facet-values.

normalized recursive Discounted Cumulative Gain - nrDCG for FDT. A recursive version of nDCG for FDT is also proposed in Schuth and Marx (2011). Such a metric could be very useful for suggesting the top-K most valuable facet-values to the user when the display area is limited (e.g. on mobile devices). Furthermore, these metrics could provide the default ordering of facets and their values (in addition to the lexicographic, value and count based orderings).

normalized Expected Reciprocal Rank - nERR. This is a metric that takes into consideration the usability of the documents in the answer set and is described in detail in Chapelle et al. (2009). This metric is calculated by the following equation:

$$ERR = \sum_{r=1}^{n} \frac{1}{r}\, P(\text{user stops at position } r) \qquad (6.4)$$

where P(user stops at position r) is the probability that the user stops searching after the document in position r. This probability is calculated by the following equation:

$$P(\text{user stops at position } r) = \prod_{i=1}^{r-1} \bigl(1 - R(rel_i)\bigr)\, R(rel_r) \qquad (6.5)$$

where R(rel_i) is the probability that document i satisfies the user. In more detail, R(rel) is calculated by the following equation:

$$R(rel) = \frac{2^{rel} - 1}{2^{max\ rel}} \qquad (6.6)$$

where max rel is the maximum relevance score. While there is no justification for using this formula (as with the gain function of DCG), the values could be inferred from logged user data. ERR can be normalized (nERR) by dividing by the maximum ERR for a specific query.

Session-based Metrics

Session-based metrics include:

Session-based Precision, Recall and Average Precision. These metrics extend the classic precision, recall and average precision metrics to sessions. They are described in detail in Kanoulas et al. (2011a).

normalized session Discounted Cumulative Gain - nsDCG. Järvelin et al. (2008) extend DCG and nDCG to a session. This specific metric also takes into consideration the number of queries. The bigger the number of queries, or the number of interactions in the case of exploratory systems, the smaller the value of the metric. Specifically, nsDCG is calculated by:

$$nsDCG(q) = \frac{(1 + \log_{bq} q)^{-1} \cdot DCG}{IDCG} \qquad (6.7)$$

where q is the number of the query or user interaction and 1 < bq < 1000.
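Equations 6.4 to 6.7 can be illustrated with the following minimal Python sketch. The function names are hypothetical, relevance grades are assumed to be integers between 0 and the maximum grade, and the normalization of ERR is assumed here to use the ideal reordering of the same list of documents.

```python
import math
from typing import Sequence


def satisfaction_probability(rel: int, max_rel: int) -> float:
    """Equation 6.6: R(rel) = (2^rel - 1) / 2^max_rel."""
    return (2 ** rel - 1) / 2 ** max_rel


def err(rels: Sequence[int], max_rel: int) -> float:
    """Equations 6.4 and 6.5: expected reciprocal rank (cascade user model)."""
    score = 0.0
    p_reach = 1.0  # probability that no earlier document satisfied the user
    for r, rel in enumerate(rels, start=1):
        p_satisfied = satisfaction_probability(rel, max_rel)
        score += p_reach * p_satisfied / r
        p_reach *= 1.0 - p_satisfied
    return score


def nerr(rels: Sequence[int], max_rel: int) -> float:
    """ERR divided by the ERR of the ideal ordering (assumption of this sketch)."""
    ideal = err(sorted(rels, reverse=True), max_rel)
    return err(rels, max_rel) / ideal if ideal > 0 else 0.0


def nsdcg(dcg_value: float, idcg_value: float, q: int, bq: float = 2.0) -> float:
    """Equation 6.7: DCG of the q-th query, discounted by (1 + log_bq q)."""
    return dcg_value / ((1 + math.log(q, bq)) * idcg_value)
```

With `bq = 2`, the first query of a session is not discounted (the logarithm of 1 is 0), while the second query contributes half of its normalized DCG.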

6.2 Theoretical Analysis of the Number of User Decisions and Effort in FDT

Here we try to measure the number of choices that a user has to make in order to reach (through exploratory browsing) the desired object, assuming that all objects are described by one or more hierarchically organized attributes. Specifically, we theoretically discuss the convergence of FDT-based UIs and the required user effort with and without preference actions.

Convergence of FDT Exploration

The algorithm presented in Section 4.2 (Alg. PrefOrderOpt) is based on the assumption that the focus A can be reduced very fast in an FDT-based interaction. In this section we report an analysis that justifies this claim. Consider one taxonomy having the form of a complete and balanced tree of depth d and degree b. Let n be the number of objects in the information base (which are indexed by that tree). In that case b ∗ d is the number of choices a user has to see in order to reach (select) a particular leaf (i.e. the number of terms whose label the user has to read if he starts from the root of the tree), and d is the number of decisions (i.e. clicks) he has to make. If we want each object to have a distinct description (assuming that each object is classified to one leaf of the taxonomy), then the tree must have n leaves (b^d = n), and this means that:

$$b \cdot d = b \cdot \log_b n = \sqrt[d]{n} \cdot d \qquad (6.8)$$
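The degree that minimizes the product in equation 6.8 can be checked with a short calculation, sketched here by treating b as a real variable and n as a fixed constant (the function name f is introduced only for this illustration):

```latex
\begin{align*}
f(b) &= b \cdot \log_b n = \frac{b \ln n}{\ln b}, \\
f'(b) &= \ln n \cdot \frac{\ln b - 1}{(\ln b)^2} = 0
  \;\Longrightarrow\; \ln b = 1 \;\Longrightarrow\; b = e, \\
\frac{f(2)}{\ln n} &= \frac{2}{\ln 2} \approx 2.885, \qquad
\frac{f(3)}{\ln n} = \frac{3}{\ln 3} \approx 2.731, \qquad
\frac{f(4)}{\ln n} = \frac{4}{\ln 4} \approx 2.885 .
\end{align*}
```

Since f'(b) is negative for b < e and positive for b > e, the stationary point is a minimum, and among the integer degrees b = 3 gives the smallest product.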

The real-valued degree b that minimizes the product b ∗ d is Euler's number e, so let us assume that b = 3 is the most beneficial degree. If each leaf should index 10 objects, then for n = 10^11 objects we need 10^10 descriptions. Assuming b = 3 we get b ∗ d = 3 ∗ log_3 10^10 ≅ 3 ∗ 21 = 63 choices, and d ≅ 21 clicks.

Now suppose that we have k facets. Finding the desired description requires selecting one leaf from each T_i. As there are k facets, and we must select one leaf from each one of them, the overall displayed choices are obtained by multiplying by k. Since we have k facets, we can obtain the n distinct descriptions if each facet has $\sqrt[k]{n}$ leaves (since their cartesian product yields n distinct descriptions). In this case, the depth d of a facet equals $\log_b \sqrt[k]{n}$, and the degree b is $\sqrt[d]{\sqrt[k]{n}} = \sqrt[d \cdot k]{n}$. It follows that the overall displayed choices are:

$$b \cdot d \cdot k = b \cdot \log_b(\sqrt[k]{n}) \cdot k = b \cdot \log_b(n^{1/k}) \cdot k = b \cdot \log_b(n^{k/k}) = b \cdot \log_b n \quad \text{(independent of } d\text{)}$$

$$b \cdot d \cdot k = \sqrt[d \cdot k]{n} \cdot d \cdot k \quad \text{(independent of } b\text{)}$$


and the number of clicks required is k ∗ d. Some indicative values of these parameters are shown in Table 6.1. From the last row we can see that, in order to select the desired 10 objects from a peta-sized (∼ 10^15) information base, the user has to make 30 clicks.

Table 6.1: Choices and Number of Clicks

n/10                       k    b   d   Num. of Choices (b∗d∗k)   Num. of Clicks (k∗d)
531.441 (∼ 10^6)           3    3   4   36                        12
3.486.784.401 (∼ 10^11)    5    3   4   60                        20
∼ 10^15                    10   3   3   90                        30
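The rows of Table 6.1 can be reproduced with a small Python sketch. The names `Effort` and `exploration_effort` are hypothetical; the sketch assumes k complete and balanced facets of degree b and depth d, so that the number of distinct descriptions (the n/10 column) is (b^d)^k.

```python
from typing import NamedTuple


class Effort(NamedTuple):
    descriptions: int  # distinct leaf combinations, (b^d)^k
    choices: int       # terms the user has to read, b * d * k
    clicks: int        # decisions the user has to make, k * d


def exploration_effort(b: int, d: int, k: int) -> Effort:
    """Effort of reaching one description over k plain (non-dynamic) facets."""
    return Effort(descriptions=(b ** d) ** k, choices=b * d * k, clicks=k * d)


# The three (k, b, d) configurations of Table 6.1.
for k, b, d in [(3, 3, 4), (5, 3, 4), (10, 3, 3)]:
    print(k, b, d, exploration_effort(b, d, k))
```

The three configurations yield 3^12 = 531441, 3^20 = 3486784401 and 3^30 descriptions respectively, matching the first column of the table.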

In the previous analysis we have considered plain faceted taxonomies, not dynamic ones. According to FDT, during the interaction process the only displayed terms are those whose addition to the current selection yields a conjunction having a non-empty extension. So, although the number of clicks will not be reduced, the number of choices (i.e. the number of terms the user has to read) will be smaller, since each displayed term will not have all of its b children active. From this small example we can see the potential of FDT for rapidly reducing very big information spaces. We should also mention that the analysis in Sacco (2006b) shows that 3 zoom operations on leaf terms are sufficient to reduce an information base of 10^7 objects described by a taxonomy with 10^3 terms to an average of 10 objects.

Plain FDT versus FDT with Preferences w.r.t. User Effort

Note that term-scoped preferences (i.e. those that order the terms of a facet according to the user's preferences) make the aforementioned choices less laborious, since the more desired options are shown first. Specifically, if we assume that a preference relation for each facet has been defined, and we assume that the most preferred choice from each facet is prompted first (and it is unique), then the cost of the required decisions is not b ∗ d ∗ k but 1 ∗ d ∗ k, since the user just clicks on the first choice without having to look at the rest of the choices.

Returning to the context of the car selection use case, if we assume that each of the 7 billion people living on this planet sells one car, then for n = 10^9 objects we need 10^8 descriptions if we want to reach a block comprising 10 cars. Assuming k = 10 and degree b = 3 we get that b ∗ d ∗ k = b ∗ log_3 10^8 ≅ 3 ∗ 17 = 51 choices have to be displayed (using plain faceted taxonomies), and certainly fewer than 51 using dynamic taxonomies. If we assume that preferences have been defined for each of the k = 10 facets, then the choices are reduced to 17, which is equal to the number of required clicks.

6.3 Difficulty of Formulating Effective Preferences without Knowing the Options (DiFEPreKO) Hypothesis Evaluation Through a User Study

In this section we introduce the Difficulty of Formulating Effective Preferences without Knowing the Options (DiFEPreKO) hypothesis:

Hypothesis. Without the ability to view and explore the existing choices, the expression of preferences is time-consuming and in most cases results in incomplete preferences (i.e. preferences that are not sufficient for selecting the most desired option from a particular set of choices).

Initially, we provide an analytical comparison between extensional and intentional preferences regarding effort, completeness and correctness. Afterwards, we describe the user study conducted for evaluating the DiFEPreKO hypothesis. We present the results and conduct a statistical significance test to check whether our results could be due to chance.

6.3.1 Analytical Comparison

Let A1, ..., Ak be the k attributes that are used for describing the choices, and let dom(Ai) denote the set of values that Ai can take (for each i = 1..k). Let V be the cartesian product of the domains of the attributes, i.e. V = dom(A1) × ... × dom(Ak). We can consider that a complete (over V) intentionally specified preference aims at defining a linear order of the elements of V. Let ip denote an intentionally specified preference and let ≻ip denote the linear order of V that ip defines. Now consider a specific set of choices S (S ⊆ V). We can consider that a complete extensional preference over a set of choices S aims at defining a linear order of the elements of S. Let ep denote the preference specification and let ≻ep denote the linear order of S that ep defines.

Completeness and Correctness of Intentional Specified Preferences


Consider that we have an S, the user has defined an ≻ep , and suppose that we consider ≻ep correct. We could say that an ip is correct and complete with respect to ≻ep , if the restriction of ≻ip on S is equal to ≻ep . However note that since in decision tasks humans mainly have to select the most desired element (the hotel to book, the car to buy, the place for holidays), and not to order the entire list of available options, in our user study (described afterwards), we will consider and compare only the first and the second most preferred elements, i.e. only the two most preferred elements according to ≻ep and ≻ip .
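The correctness and completeness condition above can be sketched as follows. This is a minimal illustration, assuming that each linear order is represented as a Python list from the most to the least preferred element; the function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def restrict(order: Sequence[Hashable], choices: Set[Hashable]) -> List[Hashable]:
    """Restriction of a linear order (best first) to the elements of `choices`."""
    return [x for x in order if x in choices]


def is_correct_and_complete(ip_order: Sequence[Hashable],
                            ep_order: Sequence[Hashable]) -> bool:
    """True if the restriction of the ip order on S equals the ep order,
    where S is the set of elements ordered by ep."""
    return restrict(ip_order, set(ep_order)) == list(ep_order)


def agrees_on_top(ip_order: Sequence[Hashable],
                  ep_order: Sequence[Hashable], top: int = 2) -> bool:
    """Relaxed check: compare only the `top` most preferred elements."""
    return restrict(ip_order, set(ep_order))[:top] == list(ep_order)[:top]
```

The relaxed check mirrors the decision to compare only the first and second most preferred elements instead of the whole ordering.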

Effort Required for Expressing Complete Preferences

We could quantify, in a very rough way, the effort required for expressing complete intentional preferences with the amount |V|. Similarly, we could quantify, in a very rough way, the effort required for expressing extensional preferences with the amount |S|. For instance, if we have only one attribute that takes two values, and only two objects, then |V| = |S| = 2, meaning that it is equally laborious to express preferences intentionally or extensionally. If on the other hand we have 10 attributes, each having 10 possible values, and two objects, then |S| = 2 while |V| = 10^10, indicating that it is much more laborious to express preferences intentionally than extensionally in this case.

The above specified costs do not aim at being accurate; they aim to capture the main point. One could easily refine the costs according to various aspects, for instance according to the type of the attribute values. Specifically, for an attribute Ai we can define Cost(Ai) = |dom(Ai)| if it is categorical; otherwise (i.e. if the domain is arithmetic) we can define Cost(Ai) = 1. The latter is because for arithmetic attributes (e.g. horsePower, fuelConsumption, price) the user commonly just has to express whether he prefers the highest values, the lowest values, or those around a specific value, hence he does not have to inspect the available values. In contrast, for categorical attributes (e.g. bodyType, brand, color), the user has to express his preference on the specific values of the attributes. Based on the above perspective, the cost for specifying complete intentional preferences could then be defined as Cost(A) = Cost(A1) ∗ ... ∗ Cost(Ak) (note that Cost(A) ≤ |V|).
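The refined cost can be sketched as follows. The attribute names come from the examples in the text, but the domains and their values are hypothetical and serve only to illustrate the computation of |V| and Cost(A).

```python
from math import prod
from typing import Dict, List, Tuple

# Hypothetical attribute catalogue: name -> (domain values, is_categorical).
attributes: Dict[str, Tuple[List, bool]] = {
    "bodyType": (["sedan", "hatchback", "suv"], True),
    "color": (["red", "yellow", "black", "white"], True),
    "price": (list(range(5000, 30001, 1000)), False),  # arithmetic domain
}


def domain_size(attrs: Dict[str, Tuple[List, bool]]) -> int:
    """|V|: size of the cartesian product of all attribute domains."""
    return prod(len(domain) for domain, _ in attrs.values())


def intentional_cost(attrs: Dict[str, Tuple[List, bool]]) -> int:
    """Cost(A): |dom(Ai)| for categorical attributes, 1 for arithmetic ones."""
    return prod(len(domain) if categorical else 1
                for domain, categorical in attrs.values())
```

In this illustrative catalogue the arithmetic attribute contributes a factor of 1 to Cost(A) but a factor equal to its domain size to |V|, which is why Cost(A) never exceeds |V|.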

6.3.2 User Study

In this user study 30 persons participated, 18 male and 12 female, from 7 countries and aged between 22 and 75 years. All of the participants had at least secondary education, while most of them had an MSc or PhD degree³. The experiment had two steps, Step 1 and Step 2.

Step 1

In the first step, all participants were asked to express their preferences, according to the following:

Suppose that you have (it is obligatory) to change your car. You have to select and buy a new one, which you will use for the next 5 years, and of course you will have to pay it. Please express your preferences on paper. This paper will be handed to a different person who has at his disposal a limited collection of available cars. This person will select one for you based on the available cars and the preferences that you expressed. You have 30 minutes at most to express your preferences. You are free to express them in any form you like, e.g. in natural language text (e.g. I prefer a car with an engine volume between 1200 and 1400 cc), by providing an ordering of the firms according to your preference (e.g. Japanese, European or BMW, Audi), by specifying the preferred (ideal) price, etc. Other characteristics could include year, body type, engine volume, power, max speed, acceleration, fuel consumption, weight, fuel type, price, trunk, etc. Please measure how much time you spent on this exercise and give us the paper.

Step 2

Immediately after completing the first step, participants continued with the second step, so that their preferences would not change in the meantime. In this step, users were given a list of cars and were asked to identify which car was ideal for them in order to buy it. In total, the list consisted of 50 cars and is shown in Fig. 6.2 and Fig. 6.3. Again, users were asked to measure how much time they spent on finding the ideal car.

Results

Subsequently, we checked if the paper-written preferences of Step 1 would allow someone to obtain the car selected in Step 2.
³ Nine of the persons that participated in this evaluation were participants of the First MUMIA Training Summer School "Building Next Generation Search Systems" (http://www.mumia-network.eu/index.php/training-school-2012), Olympiada, Chalkidiki, Greece.

Figure 6.2: Evaluation Step B: Users Select a Car from the List (1st page)

Figure 6.3: Evaluation Step B: Users Select a Car from the List (2nd page)

If the answer is "YES", then it means that the preference expression on paper

was complete and sufficient in order to select the ideal car. If the answer is "NO", then the conclusion would be that they did not manage to express their preferences in a sufficient way, in order to get the most desired car from the small list of available cars. In addition, we compared the times users spent in both steps.

In order to check the results, a broker was given the forms of Step 1, with the expressed preferences of each one of the participants. Subsequently, he was asked to select the ideal and the second ideal car from the list of Step 2, based on the participants' preferences. Users' preferences were divided in two categories: Specific and General. Specific preferences are preferences that use specific values of the attributes' domains (e.g. 'I prefer red to yellow cars' or 'I want a car with a displacement between 1200cc and 1400cc'). General preferences are preferences that do not use specific values (e.g. 'I want a cheap car' or 'I want a car that does not pollute the environment'). The broker used a number of criteria in order to select the most ideal car for a specific user, based on the user's preferences expressed in Step 1. The following criteria, ordered according to significance, were used for ranking the objects of our collection of cars:

1. Specific Preferences Criterion (SPC). Initially we only consider the Specific preferences applicable to our collection. If the expressed preferences are prioritized (e.g. 'Firstly I prefer a car that costs less than 10000 Euros, secondly a car with a displacement between 1200cc and 1400cc, etc.'), then cars are ranked according to the most prioritized preference, then according to the second most prioritized preference, etc. This is the Prioritized composition we discussed in Section 3.4.1. In the case that the user did not provide any priority order, preferences were considered equal. When a number of preferences have the same priority, then Pareto composition is used as it was described in Section 3.4.2. Lastly, in case of ties, the final bucket order is derived by ordering the cars of each bucket according to the number of wins per preference, like the rules described in Section 3.3.2.

2. General Preferences Criterion (GPC). If there are still ties when deriving the most ideal and second ideal car, based on the ordering of cars created by the previous step, we take advantage of the General preferences, in case they can be applied to our collection. Specifically, the broker transformed each General preference to an ordering of cars. For example, the preference 'I want a cheap car' means ordering the cars of each bucket according to their price. Again the same criteria (Priority composition, Pareto composition and the wins rule) were used to derive the ideal and the second ideal car.

3. Broker Assumption Criterion (BAC). Finally, in the very few cases of a second tie, the broker was free to use his own assumptions, like preferring most of the times the cheaper one, or based on his own opinion about manufacturers' reliability. The above assumptions were used in our evaluation and were sufficient for the small number of cases in which we had a 2nd tie (i.e. a tie in the General preferences).

An indicative example of the results table is shown in Table 6.2. In the first columns, the table stores information regarding the user. The Step 1 column holds information for Step 1, specifically the number of Specific preferences (SP), the number of General preferences (GP), and the total number of preferences (TP), which are 3, 8 and 11 respectively in our case. Furthermore, in this specific example, the user spent 20 minutes to express his preferences (Time). The Step 2 column holds information regarding the second step of the evaluation. Specifically, according to our example, the user selected from the list of cars the car with id 46 (CID), after searching the list of cars for only 3 min. (Time). Additionally, there is a grade for this specific car. This grade indicates the priority of the Specific preferences expressed by the user and is based on the number of Specific preferences this car satisfies. Specifically, the preference grade PG_iz of a preference P_i for car c_z can take the following values:

• ✓ (i.e. the preference P_i is satisfied for this car)
• − (i.e. the preference P_i is not satisfied for this car)
• ◦ (i.e. the preference P_i is not applicable for this car). These are the inactive elements, which in this case we have considered as worse than elements that satisfy this preference, but better than elements that do not satisfy this preference.
• a number, if this car satisfies the corresponding value from an ordered set of preference values (i.e. 1 for the first, 2 for the second, etc.).
For example, if Fiat ≻ Audi and Audi ≻ Mercedes, then a car made by Fiat would have a value of 1 for this preference, Audi would have a value of 2, etc., and the rest of the cars made by other manufacturers would have a value of 0; these are inactive elements and are considered as the worst in this case. When there was no preference priority, all preferences were considered equal, i.e. the grades would be {PG_1z, ..., PG_nz} for a specific car c_z. On the other hand, if P_1 is prioritized over P_2, which is prioritized over all the other preferences for car c_z, then the grade would be {{PG_1z}, {PG_2z}, {PG_3z, ..., PG_nz}}. In our example, P_1 is prioritized over P_2 and P_3, which have equal priority. Notice that for space reasons, we provide only grades regarding the SPC criterion.

Table 6.2: Example of Hypothesis Evaluation Results

User Information     Id: 16   Age: 23   Gen.: F   Educ.: MSc   Country: Greece
Step 1               SP: 3   GP: 8   TP: 11   Time: 20 m.
Step 2               CID: 46   Grade: {{0}, {−✓}}   Time: 3 m.
Broker, Ideal        CID: 43   Grade: {{1}, {✓✓}}
Broker, 2nd Ideal    CID: 28   Grade: {{2}, {✓◦}}
Results, 1st vs 2nd  wins in: SPC
Results, General     Br vs Us wins in: SPC   NIIP: 0   IIPR: 0%   NUP: 5   UPR: 45.4%   S1: −   S2: −

The next columns concern the broker, and specifically the first and second ideal cars that he proposes. Again we store the car ids (CID) and their corresponding grades. The next column (1st vs 2nd) describes in which step of the broker's criteria process the ideal car was preferred over the 2nd ideal car, and takes a value from {SPC, GPC, BAC}. Finally, the last seven columns give us an overview of the results. The first, Br vs Us, describes in which step of the broker's criteria process his ideal car was preferred over the user's selected ideal car, and again takes a value from {SPC, GPC, BAC}. The next column, Number of Intentional Inconsistent Preferences (NIIP), holds the number of preferences, expressed in Step 1 by the user, that were intentionally overridden when the user selected the ideal car from the collection (e.g. choosing a car with a displacement of 1500cc when he had expressed a preference for an engine of less than 1400cc). Intentional Inconsistent Preferences Percentage (IIPP; shown as IIPR in the table) holds the percentage of preferences that were overridden over the number of Specific preferences. Number of Unused Preferences (NUP) holds the number of preferences that were not used, since they were not applicable in our collection, and Unused Preferences Percentage (UPP; shown as UPR in the table) holds the percentage of preferences that were not used over the whole number of preferences that the user expressed. Finally, (S1) marks if the user selected the same car as


the ideal car proposed by the broker and (S2) marks if the user selected the same car as the second ideal car proposed by the broker.

The results of the evaluation are shown in Table 6.3. The results show that only 6 out of the 30 participants (20%) were able to select the ideal car in Step 2, according to their expressed preferences in Step 1. This supports our initial hypothesis that without exploring the existing choices, the expression of preferences results in incomplete preferences. In addition, when the user did not select the ideal car according to his preferences, the broker's ideal car beat the user-selected car, according to the preferences expressed in Step 1, in 79.16% of the cases during the Specific preferences criterion phase (SPC), in 16.67% during the General preferences criterion phase (GPC), and in only 4.16% during the broker's assumption phase (BAC). If we also take into consideration the 2nd ideal car proposed by the broker, then the number of participants rises to 10 (33.3%). Notice though that 75% of the 2nd ideal cars lost to the ideal car during the SPC phase. This means that the ideal car is clearly preferred to the 2nd ideal one, according to the Specific preferences expressed by the user. The remaining 25% of these cars lost during the BAC phase. Regarding the (1st vs 2nd) column, we can see that in 40% of the cases the broker was able to discriminate the ideal from the 2nd ideal car during the SPC phase, in 36.6% during the GPC phase and in 23.3% during the broker-biased BAC phase.

Furthermore, we can see that participants spent on average 10 minutes in Step 1 (the worst case was a user that used the whole 30-minute time slot). Users tried to take into consideration every aspect they could imagine, since a test collection with the viable choices was not available. As a result the process of preference expression was time-consuming and also led to a number of inconsistent preferences.
Specifically, each participant expressed on average 9.73 preferences, of which 6.7 were Specific preferences and the remaining 3.06 were General preferences. Of the 6.7 Specific preferences, 1.54 were inconsistent (i.e. the user finally selected an ideal car that did not satisfy them). On average, 3.2 preferences per user were not applicable to our collection, and as a result were not used. This result shows that users spend a lot of time expressing preferences that are either not consistent with their final decision or not applicable for selecting the ideal car.

On the other hand, participants spent on average 4 minutes to find the ideal car from the list of 50 cars in Step 2. We can argue that in the case of a list of thousands of cars, participants would have to spend a lot more time in order to find the ideal car. In this case we could exploit available information thinning approaches and the proposed preference framework.

Another important conclusion is that only 2 out of the 30 participants provided a prioritized list of preferences. This might explain the differences between the ideal car users selected and the ideal car that was picked by the broker, since for example the price is one of the most important factors when purchasing a car. Finally, we can conclude that even though most of the participants have a high educational level, they were not able to provide the appropriate preferences that could lead to the ideal car for them. They spent on average 10 minutes in order to provide preferences that were either not applicable or overridden when they chose the ideal car from the test collection.

Statistical Significance Test

We also conducted a statistical significance test to check whether our results could be due to chance. In our evaluation test we have dichotomous data, where each individual in the sample is classified in one of two categories. The first category is the individuals who expressed preferences that can lead to the ideal car for them (C_Ideal) and the second category is the individuals who expressed preferences that could not lead to the ideal car (C_Non-Ideal). A suitable statistical test in our case is a one-tailed (lower-tailed) binomial significance test, since we have dichotomous data, observations are independent from each other, probabilities of success and failure are constant across trials and the critical region falls at one end of the possible values (Griffiths (2009)). Our null hypothesis H0 is:

Null Hypothesis (H0). More than half of the users expressed their preferences without exploring available cars, and were returned the ideal car for them from a car collection.

Then the alternative hypothesis H1 is:

Alternative Hypothesis (H1). Less than half of the users expressed their preferences without exploring available cars, and were returned the ideal car for them from a car collection.
In our case, we want the user selected car id (cidu ) (i.e. the ideal car from the car collection for the user) and the first ideal car id selected by the broker (cidb ) to be the same (cidu = cidb ), for more than half of the cases.

Table 6.3: Results of the hypothesis evaluation. Column labels: User Information (Id, Age, Gen., Educ., Country); Step 1 (SP, GP, TP, Time, CID, Grade); Step 2 (Time, CID, Grade); Broker Ideal (CID, Grade); Broker 2nd Ideal (CID, Grade); wins in 1st vs 2nd; wins in Br vs Us; General Results (NIIP, IIPR, NUP, UPR, S1, S2). Rows: users 1 to 30, followed by Average Values.

6.3. DiFEPreKO Hypothesis

If Y is the number of successes in n trials, then the probability of getting exactly y successes in n trials is given by the binomial distribution (Griffiths (2009)):

\[ P(Y = y) = \binom{n}{y} \, p^{y} (1 - p)^{n - y} = \frac{n!}{y! \, (n - y)!} \, p^{y} q^{n - y} \]

where p is the probability of success and q = 1 − p the probability of failure. In our case n = 30, p = 0.5 and q = 0.5. We want to compute the probability that 6 or fewer participants are successful in finding the appropriate car for them:

\[ P(Y \le 6) = \sum_{i=0}^{6} P(Y = i) \]

which will provide the p-value (the probability of obtaining a test statistic at least as extreme as the one that was actually observed). Figure 6.4 shows the probabilities of the binomial distribution for different numbers of successes, the cumulative distribution function and the Type I error area (i.e. falsely rejecting a true null hypothesis).

Figure 6.4: Probabilities and Distribution Function of the Binomial Distribution

Regarding the significance level, according to Wasserman (2004) an α value of 0.05, which is commonly used in the literature, provides strong evidence against H0, while a value of less than 0.01 provides very strong evidence against H0. In our case we used an α value of 0.01. The α value determines the risk of a Type I error. We used the R language [4] to calculate the above probability. Specifically, we executed the command: binom.test(6, 30, 0.5, alternative="less") which returned a p-value = 0.0007155 ≤ α = 0.01. As a result we have very strong evidence against H0. So we can reject the null hypothesis H0 and conclude that: "Less than half of the users can express their preferences without exploring available cars, in such a way that they can be returned the ideal car for them from a car collection". Furthermore, if we also consider the second ideal cars, where the number of successes is 10, then we can execute the command: binom.test(10, 30, 0.5, alternative="less") which returned p-value = 0.04957 ≥ α = 0.01. As a result, with a significance level α = 0.01 we cannot reject H0. But if we relax our significance level to α = 0.05, then p-value = 0.04957 ≤ α = 0.05. As a result we have strong evidence (instead of the very strong evidence that is provided by α = 0.01) to reject H0 and accept H1.

[4] R is an open source programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. (http://www.r-project.org/)
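The lower-tail probabilities used in this test can be cross-checked without R. The following is a minimal sketch in Python, assuming only the values stated in the text (n = 30, p = 0.5, and 6 or 10 successes); the function name binomial_lower_tail is illustrative and is not part of the original analysis.

```python
from math import comb


def binomial_lower_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(Y <= successes) for Y ~ Binomial(trials, p)."""
    q = 1.0 - p
    return sum(
        comb(trials, i) * p**i * q ** (trials - i)
        for i in range(successes + 1)
    )


# First ideal car only: 6 successes out of 30 participants.
p_first = binomial_lower_tail(6, 30)
# First or second ideal car: 10 successes out of 30 participants.
p_second = binomial_lower_tail(10, 30)

# Compare each tail probability with the two significance levels in the text.
print(f"p_first  = {p_first:.7f}, below 0.01: {p_first <= 0.01}")
print(f"p_second below 0.01: {p_second <= 0.01}, below 0.05: {p_second <= 0.05}")
```

Under these assumptions the first tail probability falls below α = 0.01, while the second falls between 0.01 and 0.05, which is the same pattern of decisions as the one described above.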

6.4 Evaluation of Various Exploration Approaches

We conducted a comparative evaluation between a) an interface providing FDT over static metadata, b) an interface providing a clustering algorithm, and c) a combination of FDT with both static and dynamically mined metadata (i.e. clustering). The purpose of this evaluation was to prove the effectiveness of the FDT scheme over other exploratory schemes like clustering. Thirteen users participated in the evaluation, with ages ranging from 20 to 30, 61.5% males and 38.5% females. We can distinguish two groups: the advanced group consisting of 3 users and the regular one consisting of 10 users. The advanced users had prior experience in using clustering and multidimensional browsing services, while the regular ones had not. We specified four information needs (or tasks) of exploratory nature, and for the first three we specified three variations for each. According to Lindgaard and Chattratichart (2007), a large number of tasks can improve usability test performance. All tasks were refined using the task refinement steps described in Kules and Capra (2008):
• The task descriptions should include words or semantically close terms that are values of a facet.
• By using keywords of the task description, the user should not be able to complete the task by using the first 10 results of the answer set (otherwise the task would be too easy and would not require exploratory search).
• The facets should be useful without having to click the "show more" link of a facet.
In the described comparative approach the idea is to let users compare a number of different systems and rank them according to the following criteria:

Log Data Analysis During the evaluation we logged and counted for each user: (a) the number of submitted queries and (b) the number of clicked zoom points (by facet).

Task Completeness We measured the average percentage of the correct URLs that users found (both regular and advanced users) for the evaluation tasks in each user interface, out of the total number of correct URLs in our testbed.

User Preference To identify the most preferred interface (for regular and advanced users) we aggregated the preference rankings for each task using Plurality Ranking (i.e. we counted the first positions) and Borda ranking (Borda (1781)) (i.e. we summed the positions of each interface in all rankings). In a Plurality column, the higher a value is, the better (i.e. the more first positions it got), while in a Borda column the lower a value is, the better. The rows marked with All show the sum of the values of all tasks. The best values for each task are marked in bold.

User Satisfaction Users ranked the interfaces based on their satisfaction, and we aggregated the satisfaction rankings for each task again using Plurality Ranking and Borda ranking.

User Friendliness Users ranked the interfaces based on their user friendliness, and we aggregated the rankings for each task again using Plurality Ranking and Borda ranking.

The results, which are described in detail in Papadakos et al. (2012a), showed that the FDT-based approaches were the most preferred. Specifically, users, both advanced and regular ones, were able to achieve a significantly higher degree of task completeness with the FDT-based approaches than with the plain clustering one. Furthermore, they submitted the fewest queries with the FDT interfaces (advanced users made more than 50% fewer queries with FDT). The plain clustering interface was the least preferred for 58.3% of the advanced users and for 65% of the regular ones, while the (c) interface was the most preferred one for the advanced users. For regular users there was a tie between (a) and (c). Finally, regarding satisfaction, 55% of the advanced users were highly satisfied by (c), while 50% of the regular users were satisfied by (a). Only 16.6% of the advanced users and 12.5% of the regular users were highly satisfied by the plain clustering interface. Furthermore, a statistical analysis was conducted, where the upper and lower limits with 95% confidence were computed. For regular users, we made the following observations:
• Only 5% of the regular users, with a ±9.2 error, were not satisfied by the FDT interface (A)
• Only 5% of the regular users, with a ±15.91 error, have low preference for the combined interface (C)
• Only 12.5% of the regular users, with a ±20 error, find the clustering interface highly satisfactory
• Only 12.5% of the regular users, with a ±20 error, have low satisfaction for the combined interface (C)
For the advanced users, we did not reach a clear conclusion due to the large errors.

Summarizing the results of the evaluation, the UI providing the FDT interaction scheme over static metadata was the most preferred UI for regular users. On the other hand, advanced users preferred by a small margin an FDT UI which in addition provided dynamic metadata through a clustering algorithm. In any case, the browsing interaction scheme of FDT, with or without dynamic metadata, was preferred, provided better user satisfaction, and resulted in a higher degree of task completeness with fewer queries than other browsing interaction schemes (i.e. plain clustering).
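The two aggregation schemes used for the preference, satisfaction and friendliness rankings in this section can be sketched as follows. This is a minimal illustration in Python, assuming each user ranking is a list of interface labels ordered from best to worst; the sample rankings and the function names are hypothetical and are not data from the evaluation.

```python
def plurality(rankings):
    """Count how many times each interface is ranked first (higher is better)."""
    counts = {item: 0 for ranking in rankings for item in ranking}
    for ranking in rankings:
        counts[ranking[0]] += 1
    return counts


def borda_positions(rankings):
    """Sum the 1-based positions of each interface over all rankings (lower is better)."""
    totals = {}
    for ranking in rankings:
        for position, item in enumerate(ranking, start=1):
            totals[item] = totals.get(item, 0) + position
    return totals


# Hypothetical rankings of interfaces (a), (b) and (c) by three users.
sample = [["a", "c", "b"], ["c", "a", "b"], ["a", "b", "c"]]
print(plurality(sample))        # first-place counts per interface
print(borda_positions(sample))  # summed positions per interface
```

With these sample rankings, interface (a) obtains the highest Plurality value and the lowest Borda sum, so both schemes would mark it as the best one.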

6.5 Evaluation of Hippalus System

We evaluated the Hippalus system over two different user groups, plain users and expert users. These two groups were asked to complete a number of tasks over two different UIs:
• a) UI1: Hippalus system with exploration and browsing capabilities only (preference functionality was disabled)
• b) UI2: Hippalus system with exploration and browsing capabilities and preference functionality enabled
We compared the above interfaces with respect to ease of use, ease of learning, usefulness, user preference, user satisfaction, and task accomplishment. Furthermore, we wanted to examine how users really used the above interfaces and for this reason we conducted a log analysis (usage-based evaluation). Finally, for each user action we calculated a number of the metrics that were described in Section 6.1.2, so that we could evaluate each user action. Figure 6.5 depicts the steps of our evaluation process.

Figure 6.5: Comparative Evaluation Process

Participants In this study, 26 persons [5], males and females of varying age (i.e. between 23 and 43 years) and expertise (i.e. from tertiary education to PhD level), participated. We formed two groups. The first group, named plain users, consisted of 20 regular users, while the second one, expert users, consisted of 6 people with prior experience in using multi-dimensional services and preferences. Before starting the evaluation, all participants were given a simple tutorial of 15 minutes [6]. Specifically, users were initially given a description of the information base (domain, attributes). In the next five minutes the interactive process of information thinning was described to them, and finally the rest of the tutorial demonstrated the preference actions by showing specific examples. Users were allowed to get acquainted with the UI and complete a number of simple tasks.

[5] According to Faulkner (2003), 10 evaluators are enough for getting more than 82% of the usability problems of a user interface (at least in their experiments).

[6] A video of the tutorial is available at http://www.youtube.com/watch?v=Cah-z7KmlXc

| Attribute | Users percentage |
|---|---|
| Price | 90% (27/30) |
| Manufacturer | 90% (27/30) |
| Engine Volume | 80% (24/30) |
| Body Type | 73.3% (22/30) |
| Fuel type | 53.3% (16/30) |
| Power | 43.3% (13/30) |
| Consumption city | 43.3% (13/30) |
| Consumption national | 43.3% (13/30) |
| Trunk | 40% (12/30) |
| Year | 36.7% (11/30) |
| Number of doors | 33.3% (10/30) |
| Max Speed | 16.7% (5/30) |
| Torque | 13.3% (4/30) |
| Acceleration | 13.3% (4/30) |
| Drive System | 6.7% (2/30) |

Table 6.4: Percentages of the 30 Users that Expressed a Preference Over a Valid Attribute

Information Base We used an information base of 50 cars, indexed under a large number of classes and subclasses. Specifically, there is a total of 23 classes and 85 subclasses. Some of them are hierarchically organized, like Manufacturer, and some others are flat, like Vehicle Type (an example is shown in Figure 5.5).

Tasks A question during the design of the user tasks was which attributes to use. So, in order to provide representative tasks, we designed them on top of attributes for which real users expressed their preferences (these user preferences were collected in the evaluation described previously in Section 6.3). Specifically, Table 6.4 shows the percentages of the 30 users that participated in the evaluation described in Section 6.3 who expressed a preference over an attribute which is valid in our information base. Users expressed preferences for a total of 15 attributes that appear in our information base (and for a number of other attributes valid in our collection base, like color, ABS, etc.). In order to create the tasks of this evaluation we identified the most important attributes for which users expressed their preferences. Specifically, we only considered those with a percentage bigger than 50% (i.e. price, manufacturer, engine volume, body type and fuel) for the design of the tasks. Notice that for the hierarchically organized values of attribute Manufacturer, a number of users expressed preferences of the form "Audi is better than BMW", while others expressed preferences like "Japanese are better than European, which are better than American and Korean", so we tried to capture both kinds in our task descriptions. Finally, in this specific evaluation we make the assumption that the user does not change his preference criteria as he is exploring the available choices.

We created two variations of equal [7] tasks for the plain users evaluation and for the expert users evaluation. Each task used prioritized preference actions in its first subtask and Pareto composition in its second one. Tasks for plain users were designed on top of only 3 criteria. The tasks for the expert users were more difficult and complicated, since they used 6 different criteria. Specifically, the tasks that users completed were the following:

[7] In our context, equal tasks are tasks that consist of the same kind of preference actions and criteria.

Plain User-Based Evaluation

Task A You are supposed to buy a new car, which you will select through the Hippalus system. In order to identify the best or the set of best cars, you have to consider the following criteria:
a) Engine Volume: You would like a car with an engine volume around 1200cc.
b) Price: You are willing to pay around 10000 Euros.
c) Manufacturers: Generally, you prefer European to Korean, and German manufacturers to other European. You consider Japanese cars better than Korean ones.
• Subtask 1: Which are the best cars according to the above description, if you consider that a) and b) are equally important and the most important criteria for you, followed by the criterion c)?
• Subtask 2: Which are the best cars according to the above description, if you consider that all of the 3 criteria are equally important?

Task B You are supposed to buy a new car, which you will select through the Hippalus system. In order to identify the best or the set of best cars, you have to consider the following criteria:
a) Engine Volume: You would like a car with an engine volume around 1600cc.
b) Price: You are willing to pay around 14000 Euros.
c) Manufacturers: Generally, you prefer Asian manufacturers to European. From European you prefer German. Finally, European are better than American.
• Subtask 1: Which are the best cars according to the above description, if you consider that a) and b) are equally important and the most important criteria for you, followed by c)?
• Subtask 2: Which are the best cars according to the above description, if you consider that all of the 3 criteria are equally important?

Expert-Based Evaluation

Task A You are supposed to buy a new car, which you will select through the Hippalus system. In order to identify the best or the set of best cars, you have to consider the following criteria:
a) Engine Volume: You would like a car with an engine volume around 1200cc.
b) Price: You are willing to pay around 10000 Euros.
c) Manufacturers: Generally, you prefer European manufacturers to American, and German to other European manufacturers. You consider Japanese cars better than Korean ones.
d) Body Type: You want a car with a hatchback body type.
e) Fuel type: Fuel should be gasoline and not diesel.
f) Year: You prefer a modern car, i.e. a car from a recent year.
• Subtask A.1: Which are the best cars according to the above description, if you consider that a), b) and c) are equally important and the most important criteria for you, while d) is more important than e), which is more important than f)?
• Subtask A.2: Which are the best cars according to the above description, if you consider that all of the 6 criteria are equally important?

Task B You are supposed to buy a new car, which you will select through the Hippalus system. In order to identify the best or the set of best cars, you have to consider the following criteria:
a) Engine Volume: You would like a car with an engine volume around 1400cc.
b) Price: You are willing to pay around 14000 Euros.
c) Manufacturers: Generally, you prefer European manufacturers to Asian and German to other European. Finally, you prefer Japanese to Korean.
d) Body Type: You do not want a car with a body type of a minivan.
e) Fuel type: Fuel type should be diesel.
f) Doors: You prefer a car with 5 doors instead of 3.
• Subtask B.1: Which are the best cars according to the above description, if you consider that a), b) and c) are equally important and the most important criteria for you, while d) is more important than e), which is more important than f)?
• Subtask B.2: Which are the best cars according to the above description, if you consider that all of the 6 criteria are equally important?
We used rotation and counterbalancing in order to control for order effects and to increase the chance that results can be attributed to the experimental treatments and conditions (Kelly (2009)). Specifically, we used a Graeco-Latin Square Design, rotating both the order of tasks and the order in which subjects experience the interfaces. We created 4 user groups, UGP1, UGP2, UGP3 and UGP4 [8]. Each group completed the tasks as shown in Table 6.5, where column headings represent points in time and order and the rows represent subjects [9].

| Users | Time 1 | Time 2 |
|---|---|---|
| UGP1 | UI1: TaskA1, TaskB2 | UI2: TaskA2, TaskB1 |
| UGP2 | UI1: TaskA2, TaskB1 | UI2: TaskA1, TaskB2 |
| UGP3 | UI2: TaskA1, TaskB2 | UI1: TaskA2, TaskB1 |
| UGP4 | UI2: TaskA2, TaskB1 | UI1: TaskA1, TaskB2 |

Table 6.5: Graeco-Latin Square Design

[8] Unfortunately we only formed two groups of expert users, UGE1 and UGE2, since only 6 experts were available.

[9] For expert users, due to their low number, we only rotated the interfaces.

Evaluation Users were asked to evaluate two different UIs over the Hippalus system, using the previously described tasks. In the first UI (UI1), preference actions were disabled. As a result, in order to complete the aforementioned tasks, they browsed the car collection and used the available information thinning functionality (selection of appropriate facets and terms to restrict their focus). The second UI (UI2), in addition to the information thinning functionality described previously, provided the preference actions proposed in this thesis through context menus. For both UIs, the users provided the set of cars which they believed fulfilled the needs of each task. For each task, an expert user provided the ordering of the collection according to preference. The order was a bucket order, meaning that two cars can be incomparable (i.e. equally preferred). Users provided scores for the two exploratory systems regarding Ease of use, Ease of learning, Usefulness, Preference and Satisfaction using a psychometric Likert scale. We calculated Effectiveness (Task completeness) and Efficiency (Time to complete a task) using the logged data.

Main Results We gathered a number of interesting results from this evaluation. The main results can be summarized as follows:
• All plain users preferred the preference UI over the non-preference UI. Specifically, 75% of the 20 plain users preferred the preference UI very strongly, 20% strongly and only 5% strongly enough. In addition, all 6 expert users preferred the preference UI, 50% of them very strongly and the other 50% strongly.
• The preference-enabled UI allowed the users to successfully complete all the tasks, on average in less than a third of the time and with a third of the user interactions compared to the plain FDT UI.
• None of the users was able to successfully complete both of the tasks with the plain UI (only 1 expert user and 2 plain users successfully completed one of the two tasks they were assigned using UI1).
• As a result we verify the conclusions of the theoretical user effort analysis, since the preference-based UI helped the users to find the desired results in less time and with fewer actions and fewer decisions.

Fine Grained Results Here we discuss the results of this evaluation in more detail. Specifically, Figure 6.6 (a) depicts the aggregated results according to Plurality (i.e. how many times each UI was ranked first) and Borda (i.e. the total score each UI gathered from all users and tasks) regarding Ease of Use, while Figure 6.6 (b) depicts them for Usefulness, Figure 6.6 (c) for Preference and Figure 6.6 (d) for Satisfaction. Scores are given for both plain and expert users. It is easy to see that for each one of the above criteria, UI2 (i.e. the UI with preferences) was ranked almost always first for both expert and plain users. There were a number of ties between the two UIs, especially in the case of plain users (e.g. 14 ties regarding Satisfaction). Notice though that the fewest ties (i.e. the most wins for UI2) occur for the Preference criterion, where UI2 is a clear winner. Regarding the total scores of each UI according to Borda, for plain users, UI2 scored on average almost always 1/3 more than UI1. Specifically, UI2 reached almost 9/10 of the top score (200 in the case of plain users) a system could score. On the other hand, expert users gave slightly lower rankings to UI2, which in this case reached 3/4 of the top score (60 in the case of expert users). Again, UI2 was a clear winner over all criteria, while UI1 reached 1/2 of the top score.

Figure 6.6: Plurality and Borda results for (a) Ease of Use, (b) Usefulness, (c) Preference and (d) Satisfaction.

Table 6.6 reports the average, max and min timings and actions for each user group of both plain and expert users. From the results it is obvious that the timings and numbers of user actions of UI2 (i.e. the UI with preferences) are much smaller than the ones gathered using UI1 (i.e. the UI without preferences). Furthermore, we can see that the deviations of the min and max actions and timings of UI2 from the average ones are also much smaller than the respective deviations of UI1 [10]. In addition, the timings and numbers of interactions for UI2 are bigger for expert users than for plain users, since the expert users had to express many more preference actions. In more detail, Table 6.7 reports the average, max and min timings and actions over all tasks and for each task, for both UIs. Let us first discuss the average timings and user actions for all tasks, for both plain and expert users. It is obvious that UI2 is much more efficient in terms of timings and interactions for both user groups. Specifically, plain users on average were almost 3 times more efficient with UI2 than with UI1 for both timings and user actions. On the other hand, expert users were on average more than 3.3 times more efficient and made half the interactions with UI2 compared to UI1 [11].

[10] Notice that since a number of users were checking the correctness of the preferred cars returned by Hippalus for UI2, the timings and numbers of user actions reported here should be bigger than the results that would be gathered from users that are confident about Hippalus.

[11] It seems that expert users were more conservative regarding their interactions with the system.

| Group | UI | Task | Time Avg (sec) | Time Max (sec) | Time Min (sec) | Actions Avg | Actions Max | Actions Min |
|---|---|---|---|---|---|---|---|---|
| Plain UGP1 | UI1 | A1 | 688.71 | 1071.53 | 270.75 | 108.2 | 176 | 71 |
| Plain UGP1 | UI1 | B2 | 681.00 | 1104.38 | 347.41 | 86.2 | 122 | 50 |
| Plain UGP1 | UI2 | A2 | 379.40 | 803.70 | 146.19 | 43 | 58 | 31 |
| Plain UGP1 | UI2 | B1 | 223.74 | 409.92 | 130.22 | 38.8 | 49 | 25 |
| Plain UGP2 | UI2 | A2 | 220.06 | 357.92 | 136.49 | 41.8 | 75 | 32 |
| Plain UGP2 | UI2 | B1 | 226.52 | 367.29 | 151.04 | 39.2 | 61 | 31 |
| Plain UGP2 | UI1 | A1 | 596.15 | 1438.11 | 294.34 | 100.2 | 165 | 53 |
| Plain UGP2 | UI1 | B2 | 705.22 | 1744.77 | 178.76 | 100.8 | 167 | 43 |
| Plain UGP3 | UI2 | A1 | 284.58 | 347.59 | 202.60 | 36.2 | 45 | 29 |
| Plain UGP3 | UI2 | B2 | 154.47 | 189.43 | 121.29 | 33.2 | 42 | 25 |
| Plain UGP3 | UI1 | A2 | 878.51 | 1431.88 | 500.75 | 136.6 | 297 | 73 |
| Plain UGP3 | UI1 | B1 | 554.29 | 881.39 | 256.45 | 87.8 | 110 | 51 |
| Plain UGP4 | UI1 | A2 | 949.86 | 1824.65 | 480.35 | 116.4 | 166 | 77 |
| Plain UGP4 | UI1 | B1 | 631.56 | 1062.11 | 119.93 | 125.2 | 205 | 19 |
| Plain UGP4 | UI2 | A1 | 243.77 | 391.87 | 146.22 | 37.8 | 52 | 33 |
| Plain UGP4 | UI2 | B2 | 133.99 | 188.46 | 75.82 | 33 | 36 | 27 |
| Expert UGE1 | UI1 | A1 | 862.46 | 1416.10 | 441.72 | 142.33 | 246 | 59 |
| Expert UGE1 | UI1 | B2 | 1020.00 | 1636.02 | 394.92 | 125.33 | 157 | 70 |
| Expert UGE1 | UI2 | A2 | 192.47 | 260.92 | 61.67 | 47.33 | 58 | 27 |
| Expert UGE1 | UI2 | B1 | 308.85 | 355.73 | 282.89 | 57.33 | 64 | 50 |
| Expert UGE2 | UI2 | B2 | 280.99 | 346.04 | 185.60 | 69.33 | 75 | 58 |
| Expert UGE2 | UI2 | A1 | 365.20 | 580.59 | 246.26 | 70.66 | 85 | 42 |
| Expert UGE2 | UI1 | A2 | 1083.08 | 1530.93 | 853.40 | 148.33 | 194 | 57 |
| Expert UGE2 | UI1 | B1 | 842.79 | 1447.35 | 434.22 | 92.66 | 100 | 78 |

Table 6.6: Plain and Expert Users Average, Max and Min Timings and User Actions for each Task for both UIs per each User Group

For plain users, the task that seems to have benefited the most from the preference interaction was Task B2, since the speedup for timings was 4.80x. In contrast, the speedup for user actions was almost the same for all tasks. Furthermore, notice that for Task B1 there was a plain user that completed the task quickly and with only 19 interactions [12]. Regarding expert users, Task A2 benefited the most from the preference interaction, since the speedup was 5.62x for timings and 3.13x for user actions. On the other tasks the speedup for user actions was much smaller. The above results are summarized in Figure 6.7 (a). Finally, the preference-based approach gave better average values for each metric used during the session of each exploratory task. Specifically, none of the users was able to successfully complete both of the tasks with the plain UI (UI1). In contrast, all the users successfully completed all the tasks with UI2, a result that highlights the user-friendliness and efficiency of the proposed interaction scheme.

[12] The correct answer of this task included 4 cars and this user found only one of them.
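The speedup figures quoted in this discussion appear to be the ratio of the UI1 measurement to the corresponding UI2 measurement. That reading is an assumption, although it reproduces the reported all-task averages for plain users to within rounding. A minimal sketch in Python, with the illustrative function name speedup and the input values copied from Table 6.7:

```python
def speedup(baseline: float, improved: float) -> float:
    """Ratio of the baseline (UI1) measurement to the improved (UI2) one."""
    if improved <= 0:
        raise ValueError("the improved measurement must be positive")
    return baseline / improved


# All-task averages for plain users, as reported in Table 6.7.
plain_time = speedup(710.66, 233.32)    # seconds with UI1 vs UI2
plain_actions = speedup(107.67, 37.87)  # user actions with UI1 vs UI2

print(f"time speedup: {plain_time:.2f}x, action speedup: {plain_actions:.2f}x")
```

A ratio above 1 means that UI2 needed less time or fewer actions than UI1, while a ratio below 1 (as in the Min row of plain Task B1) means the opposite.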

6.5. Evaluation of Hippalus System

141

Plain users

| Task | Statistic | UI1 (sec) | UI2 (sec) | Speedup | UI1 (act.) | UI2 (act.) | Speedup |
|---|---|---|---|---|---|---|---|
| All Tasks | Average | 710.66 | 233.32 | 3.04x | 107.67 | 37.87 | 2.84x |
| Task A1 | Average | 642.43 | 264.17 | 2.43x | 104.2 | 37 | 2.81x |
| Task A1 | Max | 1438.11 | 391.87 | 3.66x | 176 | 52 | 3.38x |
| Task A1 | Min | 270.75 | 146.22 | 1.85x | 53 | 29 | 1.82x |
| Task A2 | Average | 914.18 | 299.73 | 3.04x | 126.5 | 42.4 | 2.98x |
| Task A2 | Max | 1824.65 | 803.702 | 2.27x | 297 | 75 | 3.96x |
| Task A2 | Min | 480.35 | 136.49 | 3.51x | 73 | 31 | 2.35x |
| Task B1 | Average | 592.93 | 225.13 | 2.63x | 106.5 | 39 | 2.73x |
| Task B1 | Max | 1062.11 | 409.92 | 2.59x | 205 | 39.2 | 5.22x |
| Task B1 | Min | 119.93 | 130.22 | 0.92x | 19 | 38.8 | 0.48x |
| Task B2 | Average | 693.11 | 144.23 | 4.80x | 93.5 | 33.1 | 2.82x |
| Task B2 | Max | 1744.77 | 189.43 | 9.21x | 167 | 42 | 3.97x |
| Task B2 | Min | 178.76 | 75.82 | 2.35x | 43 | 25 | 1.72x |

Expert users

| Task | Statistic | UI1 (sec) | UI2 (sec) | Speedup | UI1 (act.) | UI2 (act.) | Speedup |
|---|---|---|---|---|---|---|---|
| All Tasks | Average | 952.08 | 286.88 | 3.32x | 127.17 | 61.17 | 2.08x |
| Task A1 | Average | 862.46 | 365.20 | 2.36x | 142.33 | 70.66 | 2.01x |
| Task A1 | Max | 1416.11 | 580.60 | 2.44x | 246 | 85 | 2.89x |
| Task A1 | Min | 441.72 | 246.27 | 1.79x | 59 | 42 | 1.40x |
| Task A2 | Average | 1083.08 | 192.47 | 5.63x | 148.33 | 47.33 | 3.13x |
| Task A2 | Max | 1530.94 | 260.92 | 5.87x | 194 | 58 | 3.35x |
| Task A2 | Min | 853.41 | 61.68 | 13.84x | 57 | 27 | 2.11x |
| Task B1 | Average | 842.79 | 308.85 | 2.73x | 92.67 | 57.33 | 1.62x |
| Task B1 | Max | 1447.35 | 355.73 | 4.07x | 100 | 64 | 1.56x |
| Task B1 | Min | 434.22 | 282.90 | 1.53x | 78 | 50 | 1.56x |
| Task B2 | Average | 1020.00 | 281.0 | 3.00x | 125.33 | 69.33 | 1.34x |
| Task B2 | Max | 1636.03 | 346.05 | 4.73x | 157 | 75 | 1.33x |
| Task B2 | Min | 394.92 | 185.61 | 2.13x | 70 | 58 | 1.34x |

Table 6.7: Plain and Expert Users Average, Max and Min Timings and User Actions per each Task and All Tasks for both UIs
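The Speedup columns of Table 6.7 appear to be the ratio of the UI1 value to the UI2 value in the same row, so a value above 1 means that the preference-enriched UI2 was faster (or needed fewer actions), while a value below 1 (e.g. the Min row of plain Task B1) means the opposite. The following minimal Python sketch illustrates that reading with a few values copied from the table. The function name is illustrative, and the ratio interpretation is an assumption inferred from the numbers rather than something stated in the text.

```python
def speedup(ui1_value: float, ui2_value: float) -> float:
    """Ratio of the plain UI (UI1) measurement to the preference-enriched UI (UI2) one."""
    if ui2_value <= 0:
        raise ValueError("UI2 measurement must be positive")
    return ui1_value / ui2_value


# (label, UI1, UI2) triples copied from the plain-user rows of Table 6.7
rows = [
    ("Task A1 average time (sec)", 642.43, 264.17),
    ("Task B2 average time (sec)", 693.11, 144.23),
    ("Task B1 minimum time (sec)", 119.93, 130.22),
]

for label, ui1, ui2 in rows:
    print(f"{label}: {speedup(ui1, ui2):.2f}x")
```

A recomputed ratio may differ from the printed one in the last digit, since the rounding convention used for the table is not stated.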

Notice that one expert user and two plain users managed to successfully complete one of the two tasks they were assigned using UI1. These outcomes are reflected in the Recall, Precision and Average Precision metrics reported in Table 6.8. On average over all tasks, average precision improved by 2.30x for plain users and by 3.49x for expert users. Task B2 seems to be the most difficult task for both plain and expert users, since the biggest gains in all three metrics were observed here (i.e. regarding average precision, more than 3.62x improvement for plain users and 6.36x improvement for expert users). Furthermore, the improvements per metric with UI2 were higher for expert users, since their tasks were much more complicated and involved more criteria than the tasks of the plain users. As a result, although experts, these users achieved lower values with UI1 for almost all metrics. The above results are summarized in Figure 6.7 (b). Since the two approaches differ so markedly on the basic metrics of Recall, Precision and Average Precision, we did not evaluate the other, more refined metrics described in Section 6.1.

Figure 6.7: Average Values in Last Step of Each Task. (a) Timings (T) and Actions (A); (b) Recall (R), Precision (P) and Average Precision (AP)

6.6 Evaluation Conclusion

In this chapter we discussed a number of evaluation metrics and approaches for exploratory search and selected those that apply to our preference-based approach. In addition, the theoretical analysis of user effort in FDT interaction schemes, described in Section 6.2, shows the benefits of the FDT interaction (i.e. a small number of interactions and decisions). Specifically, that section provided an example where a user could find the desired 10 objects in a peta-sized collection with only 30 clicks (the number of decisions being 90). The extension of this study to the proposed preference-enriched scheme, assuming that the user has expressed a preference relation for each facet and that the most preferred choice is prompted first, shows that the number of decisions is reduced to the number of clicks (i.e. to 30 for a peta-sized collection).
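The figures quoted above are consistent with a simple model that is not restated in this section and is therefore only an assumption here: each click selects one of b = 3 equally sized zoom points, so every click shrinks the focus by a factor of 3 and costs 3 decisions in the plain scheme, but a single decision when the most preferred choice is prompted first. A minimal Python sketch under that assumption follows. The function name and the branching factor are illustrative and are not taken from Section 6.2.

```python
def clicks_to_reach(collection_size: int, target_size: int, branching: int) -> int:
    """Count clicks until the focus holds at most target_size objects,
    assuming every click keeps 1/branching of the current focus."""
    clicks = 0
    focus = collection_size
    while focus > target_size:
        focus = -(-focus // branching)  # ceiling division on integers
        clicks += 1
    return clicks


PETA = 10 ** 15
BRANCHING = 3  # assumed number of zoom points inspected per click

clicks = clicks_to_reach(PETA, 10, BRANCHING)
plain_decisions = clicks * BRANCHING   # every offered zoom point is examined
preference_decisions = clicks * 1      # the preferred zoom point is shown first

print(clicks, plain_decisions, preference_decisions)
```

Under these assumptions the sketch reproduces the 30 clicks, 90 decisions and 30 decisions mentioned in the text.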

| Task | Metric | Plain UI1 | Plain UI2 | Plain Improv. | Expert UI1 | Expert UI2 | Expert Improv. |
|---|---|---|---|---|---|---|---|
| All Tasks | Recall | 0.56 | 1 | 1.7x | 0.52 | 1 | 1.92x |
| All Tasks | Precision | 0.61 | 1 | 1.62x | 0.43 | 1 | 2.31x |
| All Tasks | Average Precision | 0.433 | 1 | 2.30x | 0.28 | 1 | 3.49x |
| Task A1 | Recall | 0.63 | 1 | 1.57x | 0.66 | 1 | 1.5x |
| Task A1 | Precision | 0.48 | 1 | 2.06x | 0.44 | 1 | 2.25x |
| Task A1 | Average Precision | 0.42 | 1 | 2.33x | 0.44 | 1 | 2.25x |
| Task A2 | Recall | 0.55 | 1 | 1.81x | 0.33 | 1 | 3x |
| Task A2 | Precision | 0.80 | 1 | 1.24x | 0.53 | 1 | 1.87x |
| Task A2 | Average Precision | 0.54 | 1 | 1.84x | 0.26 | 1 | 3.75x |
| Task B1 | Recall | 0.61 | 1 | 1.62x | 0.83 | 1 | 1.2x |
| Task B1 | Precision | 0.77 | 1 | 1.28x | 0.38 | 1 | 2.57x |
| Task B1 | Average Precision | 0.487 | 1 | 2.049x | 0.277 | 1 | 3.601x |
| Task B2 | Recall | 0.46 | 1 | 2.14x | 0.25 | 1 | 4x |
| Task B2 | Precision | 0.39 | 1 | 2.53x | 0.36 | 1 | 2.76x |
| Task B2 | Average Precision | 0.27 | 1 | 3.62x | 0.15 | 1 | 6.36x |

Table 6.8: Plain and Expert Users Recall, Precision and Average Precision Metrics per each and all Tasks for both UIs
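To make the three metrics of Table 6.8 concrete, the following minimal Python sketch computes them with the textbook definitions of Recall, Precision and (non-interpolated) Average Precision. The metric definitions used in the thesis are given in Section 6.1 and may differ in detail, so this is an illustration under that assumption. The car identifiers are hypothetical; the scenario only mirrors the case mentioned earlier of a task whose correct answer has 4 cars, of which a user finds one.

```python
from typing import List, Set


def recall(returned: List[str], relevant: Set[str]) -> float:
    """Fraction of the relevant objects that were returned."""
    hits = sum(1 for obj in returned if obj in relevant)
    return hits / len(relevant)


def precision(returned: List[str], relevant: Set[str]) -> float:
    """Fraction of the returned objects that are relevant."""
    if not returned:
        return 0.0
    hits = sum(1 for obj in returned if obj in relevant)
    return hits / len(returned)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank holding a relevant object,
    normalised by the total number of relevant objects."""
    hits = 0
    total = 0.0
    for position, obj in enumerate(ranked, start=1):
        if obj in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


# Hypothetical identifiers: four correct cars, one of them found.
relevant = {"car1", "car2", "car3", "car4"}
partial = ["car1", "car9", "car8"]
complete = ["car1", "car2", "car3", "car4"]

print(recall(partial, relevant), precision(partial, relevant),
      average_precision(partial, relevant))
print(recall(complete, relevant), precision(complete, relevant),
      average_precision(complete, relevant))
```

Read this way, an "Improv." entry is the UI2 value divided by the UI1 value, e.g. a recall of 1 against 0.25 gives the 4x reported for expert users in Task B2.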

Subsequently, we formulated a hypothesis expressing the Difficulty of Formulating Effective Preferences without Knowing the Options (DiFEPreKO). The conducted user study showed that, without the ability to explore the existing choices, the expression of preferences is time-consuming and in most cases results in incomplete preferences. Specifically, we found that only 20% of the users were able to identify the ideal car from a list of cars according to their previously expressed preferences (the percentage rises to 33% if we also consider the second ideal car). Furthermore, users expressed preferences that were inconsistent with their final decision (23% of the preferences). The statistical analysis of the results provides strong evidence against the formulated null hypothesis, and we can conclude that “without exploring available cars only less than half of the users can express their preferences in a way sufficient for returning the ideal car for them from a car collection”.

We also conducted two comparative user studies, one for evaluating the FDT per se and another for the proposed preference-enriched FDT interaction. The first one was conducted over the Mitos WSE and evaluated a number of different exploratory interfaces. This evaluation showed that the UI that supported the FDT interaction scheme over static metadata was the most preferred UI among the regular users. On the other hand, advanced users preferred by a small margin the FDT UI which, in addition to the static metadata, also uses dynamic metadata through a clustering algorithm. In any case, the browsing interaction scheme of FDT, with or without dynamic metadata, was preferred, provided better user satisfaction, and resulted in a higher degree of task completeness with fewer queries than the other browsing interaction schemes (i.e. plain clustering).

Finally, the second user-based comparative evaluation, which we conducted over the Hippalus system, showed that 100% of the users (both expert and plain ones) preferred the preference-based UI, a result that was supported by each distinct qualitative result. The preference-enabled UI allowed users to successfully complete all the tasks, on average in less than a third of the time and with a third of the user interactions compared to the plain FDT UI. Furthermore, none of the users was able to successfully complete both of the tasks with the plain UI (one expert user and two plain users successfully completed one of the two tasks they were assigned using UI1). As a result we verify the conclusions of the theoretical user effort analysis, since the preference-based UI helps users to find the desired results in less time and with fewer actions and fewer decisions. Finally, the preference-based approach gave better average values for each metric during the session of each exploratory task.

Chapter 7

Conclusion and Future Research

Contents

7.1 Synopsis of Contributions
7.2 Directions for Future Work and Research

7.1 Synopsis of Contributions

In this thesis we motivated the need for real-time preference elicitation and we introduced a language (including its syntax, semantics and GUI-level exploitation methods) for enriching the interaction scheme of Faceted and Dynamic Taxonomies (FDT) with preference elicitation and preference-based interaction. Key aspects of the proposed approach include the support of hierarchically organized values, the support of set-valued attributes, and the incremental preference specification mode with the scope-based method for resolving conflicts. In addition, the rapid reduction of the information space that is possible with FDT makes preference-based ordering feasible on large information bases, since the introduced algorithms for producing the preference order are independent of the size of the information base; they depend on the size of the focus and on the number of preference actions enacted by the user. Furthermore, we provided a top-k variation of the algorithm suitable for the case where the size of the focus is big.

To demonstrate the feasibility of our approach and to identify possible difficulties or other issues related to implementation and application, we designed and implemented a proof-of-concept prototype, the Hippalus system. This system provides exploration services over RDF information bases and supports the introduced preference framework through HTML 5 context menus. Specifically, the user is able to order classes, subclasses and objects, and can compose object-related preferences using Priority, Pareto and Pareto Optimal compositions.

We provided a theoretical analysis of user effort in FDT interaction schemes, plain and preference-enabled ones, that suggests the effectiveness of the proposed approach with respect to the interaction and decision cost. In addition, we formulated the Difficulty of Formulating Effective Preferences without Knowing the Options (DiFEPreKO) hypothesis, and the conducted user study showed that without the ability to explore the existing choices, the expression of preferences is time-consuming and in most cases results in incomplete preferences.

Finally, we conducted two comparative user studies, one for evaluating the FDT per se and another for the proposed preference-enriched FDT interaction. The first one was conducted over the Mitos WSE and suggested that the browsing interaction scheme of FDT, with or without dynamic metadata, was preferred, provided better user satisfaction, and resulted in a higher degree of task completeness with fewer queries than the other browsing interaction schemes (i.e. plain clustering). The second one, conducted over the Hippalus system, showed that 100% of the users (both expert and plain ones) preferred the preference-based UI. In more detail, this UI allowed users to successfully complete all the tasks, on average in less than a third of the time and with a third of the user interactions compared to the plain FDT UI, as reported in Chapter 6. Furthermore, none of the users was able to successfully complete both of their assigned tasks with the plain UI. As a result we verify the conclusions of the theoretical user effort analysis, since the preference-based UI helps users to find the desired results in less time and with fewer actions and fewer decisions.
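The Priority and Pareto compositions and the top-k ordering mentioned in this synopsis can be illustrated with a small Python sketch. It uses the definitions that are common in the preference literature (Pareto: at least as good on every facet and strictly better on one; Priority: the first facet decides and later ones break ties) and a naive quadratic block extraction over the focus only. It is a sketch under these assumptions and not the algorithm or the language semantics of the thesis. The facets, ranks and car identifiers are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

Car = Dict[str, str]
Better = Callable[[Car, Car], bool]

# Hypothetical per-facet preferences: a smaller rank means more preferred.
RANKS = {
    "fuel": {"hybrid": 0, "diesel": 1, "petrol": 2},
    "maker": {"european": 0, "asian": 1},
}
FACETS = ("fuel", "maker")


def rank(car: Car, facet: str) -> int:
    return RANKS[facet][car[facet]]


def pareto_better(a: Car, b: Car, facets: Sequence[str] = FACETS) -> bool:
    """a is at least as good as b on every facet and strictly better on one."""
    pairs = [(rank(a, f), rank(b, f)) for f in facets]
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def priority_better(a: Car, b: Car, facets: Sequence[str] = FACETS) -> bool:
    """The first facet decides; later facets only break ties."""
    return [rank(a, f) for f in facets] < [rank(b, f) for f in facets]


def bucket_order(focus: List[Car], better: Better,
                 k: Optional[int] = None) -> List[List[Car]]:
    """Split the focus into blocks of mutually undominated objects, best first.
    With k set, stop once at least k objects have been emitted (top-k).
    Assumes `better` is a strict partial order, so every block is non-empty."""
    remaining = list(focus)
    blocks: List[List[Car]] = []
    emitted = 0
    while remaining and (k is None or emitted < k):
        block = [a for a in remaining
                 if not any(better(b, a) for b in remaining)]
        blocks.append(block)
        emitted += len(block)
        remaining = [a for a in remaining if a not in block]
    return blocks


def names(blocks: List[List[Car]]) -> List[List[str]]:
    return [[car["name"] for car in block] for block in blocks]


CARS = [
    {"name": "c1", "fuel": "hybrid", "maker": "asian"},
    {"name": "c2", "fuel": "diesel", "maker": "european"},
    {"name": "c3", "fuel": "petrol", "maker": "asian"},
    {"name": "c4", "fuel": "diesel", "maker": "asian"},
]

print(names(bucket_order(CARS, pareto_better)))
print(names(bucket_order(CARS, priority_better)))
print(names(bucket_order(CARS, pareto_better, k=2)))
```

The cost of the loop depends only on the number of objects in the focus, which is the property the synopsis attributes to the introduced algorithms; the top-k variant simply stops extracting blocks early.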

7.2 Directions for Future Work and Research

There are several issues that are worth further work and research. Regarding applicability, it is worth developing wrappers that can feed Hippalus (synchronously or asynchronously) with the results of queries from web search engines (e.g. at least those that are OpenSearch-compatible), database sources, SPARQL queries, etc. The availability of such wrappers can lead to a generic client of search services that brings the benefits of the Hippalus system to a plethora of users. Furthermore, up to now Hippalus does not support multi-valued attributes.


Regarding the interaction model, we have not identified any substantial need for change or advancement, which is also supported by the results of the user study. As far as the algorithmic part is concerned, in this thesis we strongly suggest a process that combines information thinning with preference actions, since apart from giving users the overview required for decision making, it also significantly reduces the computational effort for deriving the preference order. It is still interesting, however, to investigate optimizations for the case where the current answer is very big, i.e. to further research the direction described in Section 4. Finally, considering the structure of the information space (either of the information corpus or of the search results), i.e. objects described according to a multidimensional space with hierarchically organized values, one possible future direction is to consider more complex structures, for example objects described with values accompanied by numbers expressing various quality aspects such as accuracy (Powley and Dale 2007), specificity (Tzitzikas et al. 2013), certainty (Webber et al. 2012), trust, authority and popularity (Kazai and Milic-Frayling 2008), etc. We could then investigate the required advancements of both the interaction model and the preference framework.


References

Abel, F., Celik, I., and Siehndel, P. 2011. “Towards a Framework for Adaptive Faceted Search on Twitter”. In Procs of the International Workshop on Dynamic and Adaptive Hypertext (DAH’11), ACM Hypertext, Eindhoven, The Netherlands.

Agrawal, R., Borgida, A., and Jagadish, H. 1989. “Efficient Management of Transitive Relationships in Large Data and Knowledge Bases”. ACM SIGMOD Record 18, 2, 253–262.

Agrawal, R., Gollapudi, S., Halverson, A., and Ieong, S. 2009. “Diversifying Search Results”. In Procs of the Second ACM International Conference on Web Search and Data Mining (WSDM’09). ACM, New York, NY, USA, 5–14.

Agrawal, R. and Wimmers, E. L. 2000. “A Framework for Expressing and Combining Preferences”. In Procs of the 2000 ACM SIGMOD international conference on Management of data (SIGMOD ’00). ACM, New York, NY, USA, 297–306.

Andreka, H., Ryan, M., and Schobbens, P.-Y. 2002. “Operators and Laws for Combining Preference Relations”. Journal of Logic and Computation 12, 1, 13–53.

Azzopardi, L. 2009. “Usage Based Effectiveness Measures: Monitoring Application Performance in Information Retrieval”. In Procs of the 18th ACM Conference on Information and Knowledge Management (CIKM’09). ACM, New York, NY, USA, 631–640.

Balke, W.-T. and Güntzer, U. 2004. “Multi-Objective Query Processing for Database Systems”. In Procs of the Thirtieth International Conference on Very large Data Bases (VLDB’04). VLDB Endowment, 936–947.

Barrett, R. and Salles, M. 2006. “Social Choice with Fuzzy Preferences”. Economics Working Paper Archive (University of Rennes 1 & University of Caen), Center for Research in Economics and Management (CREM), University of Rennes 1, University of Caen and CNRS.

Basu, C., Hirsh, H., and Cohen, W. W. 1998. “Recommendation as Classification: Using Social and Content-Based Information in Recommendation”. In Procs of the Fifteenth National Conference on Artificial Intelligence (AAAI/IAAI’98). 714–720.

Becker, C. and Bizer, C. 2009. “Exploring the Geospatial Semantic Web with DBpedia Mobile”. Web Semantics: Science, Services and Agents on the World Wide Web 7, 4, 278–286.

Ben-Yitzhak, O., Golbandi, N., Har’El, N., Lempel, R., Neumann, A., Ofek-Koifman, S., Sheinwald, D., Shekita, E., Sznajder, B., and Yogev, S. 2008. “Beyond Basic Faceted Search”. In Procs of the International Conference on Web Search and Web Data Mining, (WSDM’08). Palo Alto, California, USA, 33–44.

Binshtok, M., Brafman, R. I., Shimony, S. E., Martin, A., and Boutilier, C. 2007. “Computing Optimal Subsets”. In Procs of the 22nd National Conference on Artificial Intelligence - Volume 2 (AAAI’07). AAAI Press, 1231–1236.

Bizer, C., Heath, T., and Berners-Lee, T. 2009. “Linked Data - The Story So Far”. International Journal of Semantic Web Information Systems 5, 3, 1–22.

Borda, J. C. 1781. “Memoire sur les Elections au Scrutin”. Histoire de l’Academie Royale des Sciences, Paris.

Bot, R. S. and Wu, Y. B. 2004. “Improving Document Representations Using Relevance Feedback: The RFA Algorithm”. In Procs of the 13th ACM International Conference on Information and Knowledge Management (CIKM’04). Washington, USA.

Boutilier, C., Brafman, R. I., Domshlak, C., Hoos, H. H., and Poole, D. 2004. “CP-nets: A Tool for Representing and Reasoning with Conditional Ceteris Paribus Preference Statements”. Journal Of Artificial Intelligence Research 21, 135–191.

Brafman, R. I., Domshlak, C., Shimony, S. E., and Silver, Y. 2006. “Preferences Over Sets”. In Procs of the 21st National Conference on Artificial Intelligence - Volume 2 (AAAI’06). AAAI Press, 1101–1106.

Braziunas, D. 2006. “Computational Approaches to Preference Elicitation”. Tech. rep., Department of Computer Science, University of Toronto.

Breese, J., Heckerman, D., and Kadie, C. 1998. “Empirical Analysis of Predictive Algorithms for Collaborative Filtering”. In Procs of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98). Morgan Kaufmann, San Francisco, CA, 43–52.

Burges, C. J. C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and Hullender, G. N. 2005. “Learning to Rank Using Gradient Descent”. In Procs of the 22nd international conference on Machine learning (ICML’05). 89–96.

Byström, K. and Järvelin, K. 1995. “Task Complexity Affects Information Seeking and Use”. In Information Processing and Management. 191–213.

Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. “The Skyline Operator”. In Procs of the 17th International Conference on Data Engineering (ICDE’01). 421–430.

Callan, J. 1996. “Document Filtering with Inference Networks”. In Procs of the 19th Annual International Conference on Research and Development in Information Retrieval (SIGIR’96). New York, NY, USA, 262–269.

Carpineto, C., Osiński, S., Romano, G., and Weiss, D. 2009. “A Survey of Web Clustering Engines”. ACM Computing Surveys 41, 3, 17:1–17:38.

Carterette, B., Kanoulas, E., and Yilmaz, E. 2011. “Simulating Simple User Behavior for System Effectiveness Evaluation”. In Procs of the 20th ACM International Conference on Information and Knowledge Management (CIKM’11). ACM, New York, NY, USA, 611–620.

Carterette, B., Kanoulas, E., and Yilmaz, E. 2012. “Evaluating Web Retrieval Effectiveness”. In Web Search Engine Research, D. Lewandowski, Ed. Emerald Books, 105–137.

Chakrabarti, K., Chaudhuri, S., and Hwang, S. 2004. “Automatic Categorization of Query Results”. Procs of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD’04), 755–766.

Chan, C.-Y., Jagadish, H. V., Tan, K.-L., Tung, A. K. H., and Zhang, Z. 2006a. “Finding k-Dominant Skylines in High Dimensional Space”. In Procs of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD’06). ACM, New York, NY, USA, 503–514.

Chan, C.-Y., Jagadish, H. V., Tan, K.-L., Tung, A. K. H., and Zhang, Z. 2006b. “On High Dimensional Skylines”. In Procs of the 10th International Conference on Advances in Database Technology (EDBT’06). Springer-Verlag, Berlin, Heidelberg, 478–495.

Chang, K. C. and Hwang, S. 2002. “Minimal Probing: Supporting Expensive Predicates for Top-k Queries”. In Procs of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD’02). 346–357.

Chapelle, O., Ji, S., Liao, C., Velipasaoglu, E., Lai, L., and Wu, S.-L. 2011. “Intent-Based Diversification of Web Search Results: Metrics and Algorithms”. Information Retrieval 14, 6, 572–592.

Chapelle, O., Metzler, D., Zhang, Y., and Grinspan, P. 2009. “Expected Reciprocal Rank for Graded Relevance”. In Procs of the 18th ACM Conference on Information and Knowledge Management (CIKM’09). 621–630.

Chaudhuri, S. and Gravano, L. 1999. “Evaluating Top-k Selection Queries”. In Procs of 25th International Conference on Very Large Data Bases (VLDB’99). 397–410.

Chen, G. and Kotz, D. 2000. “A Survey of Context-Aware Mobile Computing Research”. Tech. rep., Hanover, NH, USA.

Chen, L. and Pu, P. 2004. “Survey of Preference Elicitation Methods”. Tech. rep., Swiss Federal Institute of Technology in Lausanne (EPFL).

Choi, J., Kim, M., and Raghavan, V. V. 2001. “Adaptive Feedback Methods in an Extended Boolean Model”. In ACM SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval. New Orleans, LA.

Chomicki, J. 2003. “Preference Formulas in Relational Queries”. ACM Transactions on Database Systems 28, 4, 427–466.

Chomicki, J. 2007. “Database Querying Under Changing Preferences”. Annals of Mathematics and Artificial Intelligence 50, 1-2, 79–109.

Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. 2003. “Skyline with Presorting”. Procs of Data Engineering, International Conference (ICDE’03), 717–719.

Chowdhury, S., Gibb, F., and Landoni, M. 2011. “Uncertainty in Information Seeking and Retrieval: A Study in an Academic Environment”. Information Processing & Management 47, 2, 157–175.

Ciaccia, P. and Torlone, R. 2011. “Modeling the Propagation of User Preferences”. In Procs of the 30th International Conference on Conceptual Modeling (ER’11). 304–317.

Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., and MacKinnon, I. 2008. “Novelty and Diversity in Information Retrieval Evaluation”. In Procs of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’08). ACM, New York, NY, USA, 659–666.

Cohen, W. W., Schapire, R. E., and Singer, Y. 1999. “Learning to Order Things”. Journal of Artificial Intelligence Research 10, 243–270.

Cooper, W. S. 1968. “Expected Search Length: A Single Measure of Retrieval Effectiveness Based on the Weak Ordering Action of Retrieval Systems”. In American Documentation. 30–41.

Crawford, D. E. 2006. “Supporting Exploratory Search”. Communications of ACM 49, 4.

Croft, B. W. and Lafferty, J., Eds. 2003. “Language Modeling for Information Retrieval”. The Information Retrieval Series, vol. 13. Springer.

Dakka, W., Ipeirotis, P., and Wood, K. R. 2005. “Automatic Construction of Multifaceted Browsing Interfaces”. In Procs of the 14th ACM International Conference on Information and Knowledge Management (CIKM ’05). New York, NY, USA, 768–775.

Dash, D., Rao, J., Megiddo, N., Ailamaki, A., and Lohman, G. 2008. “Dynamic Faceted Search for Discovery-Driven Analysis”. In Procs of CIKM.

Delgado, J. and Ishii, N. 1999. “Memory-Based Weighted-Majority Prediction for Recommender Systems”.

desJardins, M., Eaton, E., and Wagstaff, K. L. 2006. “Learning User Preferences for Sets of Objects”. In Procs of the 23rd International Conference on Machine Learning (ICML ’06). ACM, New York, NY, USA, 273–280.

desJardins, M. and Wagstaff, K. 2005. “DD-PREF: A Language for Expressing Preferences Over Sets”. In Procs of the 20th national conference on Artificial intelligence (AAAI’05). 620–626.

Doyle, J. 2004. “Prospects for Preferences”. Computational Intelligence 20, 2, 111–136.

Fafalios, P., Kitsos, I., Marketakis, Y., Baldassarre, C., Salampasis, M., and Tzitzikas, Y. 2012a. “Web Searching with Entity Mining at Query Time”. In Procs of the 5th Information Retrieval Facility Conference (IRFC’05). Vienna, Austria.

Fafalios, P., Kitsos, I., and Tzitzikas, Y. 2012b. “Scalable, Flexible and Generic Instant Overview Search”. In Procs of the 21st international conference companion on World Wide Web. ACM, 333–336.

Fafalios, P., Salampasis, M., and Tzitzikas, Y. 2013. “Exploratory Patent Search with Faceted Search and Configurable Entity Mining”. In Procs of the 1st International Workshop on Integrating IR technologies for Professional Search (ECIR 2013 Workshop). Moscow, Russia.

Fafalios, P. and Tzitzikas, Y. 2013. “X-ENS: Semantic Enrichment of Web Search Results at Real-Time”. In Procs of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’13 Demo paper). Dublin, Ireland.

Faulkner, L. 2003. “Beyond the Five-User Assumption: Benefits of Increased Sample Sizes in Usability Testing”. Behavior Research Methods, Instruments & Computers 35, 3, 379–383.

Ferré, S. and Hermann, A. 2012. “Reconciling Faceted Search and Query Languages for the Semantic Web”. International Journal of Metadata, Semantics and Ontologies 7, 1, 37–54.

Fishburn, P. 1970. “Utility Theory for Decision Making”. Wiley, New York.

Fishburn, P. 1999. “Preference Structures and their Numerical Representations”. Theoretical Computer Science 217, 359–383.

Gadanho, S. C. and Lhuillier, N. 2007. “Addressing Uncertainty in Implicit Preferences”. In Procs of the 2007 ACM Conference on Recommender Systems (RecSys ’07). ACM, New York, NY, USA, 97–104.

Georgiadis, P., Kapantaidakis, I., Christophides, V., Nguer, E. M., and Spyratos, N. 2008. “Efficient Rewriting Algorithms for Preference Queries”. In Procs of the 24th International Conference on Data Engineering (ICDE’08).

Golfarelli, M., Rizzi, S., and Biondi, P. 2011. “myOLAP: An Approach to Express and Evaluate OLAP Preferences”. IEEE Transactions on Knowledge and Data Engineering 23, 7, 1050–1064.

Griffiths, D. 2009. “Head First Statistics”. Head first. O’Reilly, Sebastopol, CA.

Hansson, S. O. 2001. “Preference Logic”. In Handbook of Philosophical Logic, D. Gabbay and F. Guenthner, Eds. Vol. 4. Kluwer, Chapter 4, 319–393.

Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., and Yee, K.-P. 2002. “Finding the Flow in Web Site Search”. Communications of ACM 45, 9, 42–49.

Hearst, M. and Pedersen, J. 1996. “Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results”. In Procs of the 19th Annual International ACM Conference on Research and Development in Information Retrieval, (SIGIR’96). Zurich, Switzerland, 76–84.

Hearst, M. A. 2006. “Clustering versus Faceted Categories for Information Exploration”. Communications of the ACM 49, 4, 59–61.

Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J. 1999. “An Algorithmic Framework for Performing Collaborative Filtering”. In Procs of the 22nd Annual International ACM conference on Research and Development in Information Retrieval (SIGIR’99). ACM Press, New York, NY, USA, 230–237.

Hildebrand, M., van Ossenbruggen, J., and Hardman, L. 2006. “/facet: A Browser for Heterogeneous Semantic Web Repositories”. In Procs of International Semantic Web Conference, (ISWC’06). Athens, GA, USA, 272–285.

Hofmann, T. and Puzicha, J. 1999. “Latent Class Models for Collaborative Filtering”. In Procs of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI’99). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 688–693.

Hyvönen, E., Mäkelä, E., Salminen, M., Valo, A., Viljanen, K., Saarela, S., Junnila, M., and Kettula, S. 2005. “MuseumFinland – Finnish Museums on the Semantic Web”. Journal of Web Semantics 3, 2, 25.

Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2004a. “Supporting Top-k Join Queries in Relational Databases”. VLDB Journal 13, 3, 207–221.

Ilyas, I. F., Shah, R., Aref, W. G., Vitter, J. S., and Elmagarmid, A. K. 2004b. “Rank-Aware Query Optimization”. In Procs of the 2004 ACM SIGMOD international conference on Management of data (SIGMOD ’04). 203–214.

Inan, H. 2006. “Search Analytics: A Guide to Analyzing and Optimizing Website Search Engines”. Book Surge Publishing.

Järvelin, K. and Kekäläinen, J. 2002. “Cumulated Gain-Based Evaluation of IR Techniques”. ACM Transactions on Information Systems 20, 4, 422–446.

Järvelin, K., Price, S. L., Delcambre, L. M. L., and Nielsen, M. L. 2008. “Discounted Cumulated Gain Based Evaluation of Multiple-Query IR Sessions”. In European Conference on Information Retrieval (ECIR’08). 4–15.

Jin, R., Chai, J. Y., and Si, L. 2004. “An Automatic Weighting Scheme for Collaborative Filtering”. In Procs of the 27th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR’04). ACM Press, 337–344.

Kahn, A. B. 1962. “Topological Sorting of Large Networks”. Communications of the ACM 5, 11, 558–562.

Käki, M. and Aula, A. 2008. “Controlling the Complexity in Comparing Search User Interfaces via User Studies”. Information Processing & Management 44, 1, 82–91.

Kanoulas, E., Carterette, B., Clough, P., and Sanderson, M. 2011a. “Evaluating Multi-Query Sessions”. In Procs of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’11). 1053–1062.

Kanoulas, E., Carterette, B., Clough, P., and Sanderson, M. 2011b. “Session Track 2011 Overview”. In Procs of the Twentieth Text REtrieval Conference Procs (TREC 2011). National Institute of Standards and Technology.

Karlson, A. K., Robertson, G. G., Robbins, D. C., Czerwinski, M. P., and Smith, G. R. 2006. “FaThumb: a Facet-Based Interface for Mobile Search”. In Procs of the Conference on Human Factors in Computing Systems, (CHI’06). New York, NY, USA, 711–720.

Kashyap, A., Hristidis, V., and Petropoulos, M. 2010. “FACeTOR: Cost-Driven Exploration of Faceted Query Results”. In Procs of the 19th ACM international conference on Information and knowledge management (CIKM ’10). ACM, New York, NY, USA, 719–728.

Kazai, G. and Milic-Frayling, N. 2008. “Trust, Authority and Popularity in Social Information Retrieval”. In Procs of the 17th ACM conference on Information and knowledge management (CIKM’08). ACM, New York, NY, USA, 1503–1504.

Keeney, R. L. and Raiffa, H. 1976. “Decisions with Multiple Objectives: Preferences and Value Tradeoffs”. John Wiley & Sons.

Kelly, D. 2009. “Methods for Evaluating Interactive Information Retrieval Systems with Users”. Foundations and Trends in Information Retrieval 3, 1-2, 1–224.

Kelly, D. and Belkin, N. J. 2001. “Reading Time, Scrolling and Interaction: Exploring Implicit Sources of User Preferences for Relevance Feedback”. In Procs of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’01). ACM, New York, NY, USA, 408–409.

Kelly, D., Dumais, S., and Pedersen, J. O. 2009. “Evaluation Challenges and Directions for Information-Seeking Support Systems”. Computer 42, 3, 60–66.

Kelly, D. and Teevan, J. 2003. “Implicit Feedback for Inferring User Preference: a Bibliography”. SIGIR Forum 37, 2, 18–28.

Kießling, W. 2002. “Foundations of Preferences in Database Systems”. In Procs of the 28th International Conference on Very Large Data Bases (VLDB’02). VLDB Endowment, 311–322.

Kießling, W., Endres, M., and Wenzel, F. 2011a. “The Preference SQL System - An Overview”. IEEE Data Engineering Bulletin 34, 2, 11–18.

Kießling, W., Hafenrichter, B., Fischer, S., and Holland, S. 2001. “Preference XPATH: A Query Language for E-Commerce”. In Wirtschaftsinformatik, H. U. Buhl, A. Huther, and B. Reitwiesner, Eds. Physica Verlag / Springer, 32.

Kießling, W. and Köstler, G. 2002. “Preference SQL - Design, Implementation, Experiences”. In Procs of the 28th International Conference on Very Large Data Bases (VLDB’02). Hong Kong, China, 990–1001.

Kießling, W., Soutschek, M., Huhn, A., Roocks, P., Endres, M., Mandl, S., Wenzel, F., and Zelend, A. 2011b. “Context-Aware Preference Search for Outdoor Activity Platforms”. Tech. rep., Institut für Informatik, Universität Augsburg, Augsburg, Germany. November.

Kitsos, I., Magoutis, K., and Tzitzikas, Y. 2013. “Scalable Entity-Based Summarization of Web Search Results Using MapReduce”. Distributed and Parallel Databases.

Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., and Riedl, J. 1997. “GroupLens: Applying Collaborative Filtering to Usenet News”. Communications of the ACM 40, 3, 77–87.

Kopidaki, S., Papadakos, P., and Tzitzikas, Y. 2009. “STC+ and NM-STC: Two Novel Online Results Clustering Methods for Web Searching”. In Procs of the 10th International Conference on Web Information Systems Engineering (WISE’09).

Koren, J., Zhang, Y., and Liu, X. 2008. “Personalized Interactive Faceted Search”. In Procs of the 17th International Conference on World Wide Web (WWW’08). WWW, 477–486.

Korfhage, R. R. 1997. “Information Storage and Retrieval”. John Wiley & Sons.

Kossmann, D., Ramsak, F., and Rost, S. 2002. “Shooting Stars in the Sky: An Online Algorithm for Skyline Queries”. In Procs of the 28th International Conference on Very large Data Bases (VLDB’02). 275–286.

Koutrika, G. and Ioannidis, Y. 2005. “Personalized Queries under a Generalized Preference Model”. In Procs of the 21st International Conference on Data Engineering (ICDE ’05). IEEE Computer Society, Washington, DC, USA, 841–852.

Koutrika, G. and Ioannidis, Y. E. 2004. “Personalization of Queries in Database Systems”. In Procs of the 20th International Conference on Data Engineering (ICDE ’04). 597–608.

Kules, B. and Capra, R. 2008. “Creating Exploratory Tasks for a Faceted Search Interface”. In Workshop on Computer Interaction and Information Retrieval, (HCIR’08 Workshop). 18–21.

Kules, B., Capra, R., Banta, M., and Sierra, T. 2009. “What do Exploratory Searchers Look at in a Faceted Search Interface?”. In Procs of the 9th ACM/IEEE-CS joint conference on Digital libraries (JCDL’09). 313–322.

Le Phuoc, D., Parreira, J. X., Reynolds, V., and Hauswirth, M. 2010. “RDF On the Go: RDF Storage and Query Processor for Mobile Devices”. In Procs of the 9th International Semantic Web Conference (ISWC’10 Posters&Demos).

Lee, J., You, G.-w., and Hwang, S.-w. 2009. “Personalized Top-k Skyline Queries in High-Dimensional Space”. Information Systems 34, 1, 45–61.


Levandoski, J. J., Mokbel, M. F., and Khalefa, M. E. 2010. “FlexPref: A Framework for Extensible Preference Evaluation in Database Systems”. In Procs of the 26th International Conference on Data Engineering (ICDE’10). 828–839.
Lewis, D. D. 2001. “Applying Support Vector Machines to the TREC-2001 Batch Filtering and Routing Tasks”. In Text Retrieval Conference (TREC-10). 286–292.
Li, G., Feng, J., Zhou, X., and Wang, J. 2011. “Providing Built-In Keyword Search Capabilities in RDBMS”. The VLDB Journal 20, 1–19.
Lichtenstein, S. and Slovic, P. 2006. “The Construction of Preference” Thirteenth Ed. Cambridge University Press.
Lin, X., Yuan, Y., Zhang, Q., and Zhang, Y. 2007. “Selecting Stars: the k Most Representative Skyline Operator”. In Procs of the 23rd International Conference on Data Engineering (ICDE’07).
Linden, G., Hanks, S., and Lesh, N. 1997. “Interactive Assessment of User Preference Models: The Automated Travel Assistant”. In Procs of the Sixth International Conference of User Modeling (UM’97), C. P. A. Jameson and C. Tasso, Eds. Springer Wien, 67–78.
Lindgaard, G. and Chattratichart, J. 2007. “Usability Testing: What Have we Overlooked?”. In Procs of the SIGCHI Conference on Human Factors in Computing Systems (CHI’07). ACM, New York, NY, USA, 1415–1424.
Liu, T.-Y. 2011. “Learning to Rank for Information Retrieval”. Springer.
Mäkelä, E., Hyvönen, E., and Saarela, S. 2006. “Ontogator - A Semantic View-Based Search Engine Service for Web Applications”. In Procs of International Semantic Web Conference (ISWC’06). Athens, GA, USA, 847–860.
Mäkelä, E., Viljanen, K., Lindgren, P., Laukkanen, M., and Hyvönen, E. 2005. “Semantic Yellow Page Service Discovery: The Veturi Portal”. Poster paper at International Semantic Web Conference (ISWC’05), Galway, Ireland.
Manolis, N. and Tzitzikas, Y. 2011. “Interactive Exploration of Fuzzy RDF Knowledge Bases”. In Procs of the 8th Extended Semantic Web Conference (ESWC’11). Heraklion, Greece.


Marchionini, G. 2006. “Exploratory Search: From Finding to Understanding”. Communications of the ACM 49, 4, 41–46.
Meij, E., Mika, P., and Zaragoza, H. 2009. “An Evaluation of Entity and Frequency Based Query Completion Methods”. In Procs of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’09). ACM, 678–679.
Melville, P., Mooney, R. J., and Nagarajan, R. 2001. “Content-Boosted Collaborative Filtering”. In Procs of the 2001 SIGIR Workshop on Recommender Systems (SIGIR’01 Workshop).
Moffat, A. and Zobel, J. 2008. “Rank-Biased Precision for Measurement of Retrieval Effectiveness”. ACM Transactions on Information Systems 27, 1, 2:1–2:27.
Neumann, G. and Schmeier, S. 2012. “Exploratory Search on the Mobile Web”. In 4th International Conference on Agents and Artificial Intelligence (ICAART 2012). SciTePress, 110–119.
Neves, R. D. S. and Kaci, S. 2010. “Combining Totalitarian and Ceteris Paribus Semantics in Database Preference Queries”. Logic Journal of the IGPL 18, 3, 464–483.
O’Brien, H. L., Toms, E. G., Kelloway, K., and Kelly, E. 2008. “Developing and Evaluating a Reliable Measure of User Engagement”. 45, 1, 1–10.
Oren, E., Delbru, R., and Decker, S. 2006. “Extending Faceted Navigation for RDF Data”. In Procs of the 5th International Semantic Web Conference (ISWC’06). Athens, GA, USA, 559–572.
Over, P. 1997. “TREC-7 Interactive Track Report”. In Procs of Text REtrieval Conference (TREC’97). 57–64.
Papadakos, P. 2009. “Exploratory Web Searching with Dynamic Taxonomies, Results Clustering and Visualization”. In Procs of the 13th European Conference on Digital Libraries Doctoral Consortium (ECDL’09 DC). Corfu, Greece. http://www.ieee-tcdl.org/Bulletin/v6n1/Papadakos/papadakos.html.
Papadakos, P., Armenatzoglou, N., Kopidaki, S., and Tzitzikas, Y. 2012a. “On Exploiting Static and Dynamically Mined Metadata for Exploratory Web Searching”. Knowledge and Information Systems 30, 3, 493–525.


Papadakos, P., Kopidaki, S., Armenatzoglou, N., and Tzitzikas, Y. 2009a. “Exploratory Web Searching with Dynamic Taxonomies and Results Clustering”. In Procs of the 13th European Conference on Digital Libraries (ECDL’09).
Papadakos, P., Kopidaki, S., Armenatzoglou, N., and Tzitzikas, Y. 2009b. “Exploratory Web Searching with Dynamic Taxonomies and Results Clustering”. In Procs of the 8th Hellenic Data Management Symposium (HDMS’09).
Papadakos, P., Theoharis, Y., Marketakis, Y., Armenatzoglou, N., and Tzitzikas, Y. 2008a. “Mitos: Design and Evaluation of a DBMS-Based Web Search Engine”. In Procs of the 12th Pan-Hellenic Conference on Informatics (PCI’08). Greece.
Papadakos, P., Theoharis, Y., Marketakis, Y., Armenatzoglou, N., and Tzitzikas, Y. 2009c. “Object-Relational Database Representations for Text Indexing”. CoRR abs/0906.3112.
Papadakos, P., Tzitzikas, Y., and Zafeiri, D. 2012b. “An Interactive Exploratory System with Real-Time Preference Elicitation”. In Procs of the 13th International Conference on Web Information Systems Engineering (WISE’12 Demo Paper).
Papadakos, P., Vasiliadis, G., Theoharis, Y., Armenatzoglou, N., Kopidaki, S., Marketakis, Y., Daskalakis, M., Karamaroudis, K., Linardakis, G., Makrydakis, G., Papathanasiou, V., Sardis, L., Tsialiamanis, P., Troullinou, G., Vandikas, K., Velegrakis, D., and Tzitzikas, Y. 2008b. “The Anatomy of Mitos Web Search Engine”. CoRR, Information Retrieval abs/0803.2220. Available at http://arxiv.org/abs/0803.2220.
Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2005. “Progressive Skyline Computation in Database Systems”. ACM Transactions on Database Systems 30, 1, 41–82.
Peintner, B., Viappiani, P., and Yorke-Smith, N. 2008. “Preferences in Interactive Systems: Technical Challenges and Case Studies”. AI Magazine 29, 4, 13–24.
Pitkow, J., Schutze, H., Cass, T., Cooley, R., Turnbull, D., Edmonds, A., Adar, E., and Breuel, T. 2002. “Personalized Search: A Contextual Computing Approach may Prove a Breakthrough in Personalized Search Efficiency”. Communications of the ACM 45, 9, 50–55.


Pound, J., Paparizos, S., and Tsaparas, P. 2011. “Facet Discovery for Structured Web Search: a Query-Log Mining Approach”. In Procs of the 2011 International Conference on Management of Data (SIGMOD’11). ACM, New York, NY, USA, 169–180.
Powley, B. and Dale, R. 2007. “Evidence-Based Information Extraction for High-Accuracy Citation Extraction and Author Name Recognition”. In Procs of the 8th RIAO International Conference on Large-Scale Semantic Access to Content.
Pu, P. and Chen, L. 2008. “User-Involved Preference Elicitation for Product Search and Recommender Systems”. AI Magazine 29, 4, 93–103.
Rashid, A. M., Albert, I., Cosley, D., Lam, S. K., Mcnee, S. M., Konstan, J. A., and Riedl, J. 2002. “Getting to Know You: Learning New User Preferences in Recommender Systems”. In Procs of the 7th International Conference on Intelligent User Interfaces (IUI’02). ACM Press, New York, NY, USA, 127–134.
Reisner, P. 1981. “Human Factors Studies of Database Query Languages: A Survey and Assessment”. ACM Computing Surveys 13, 1, 13–31.
Robertson, S. 2008. “A New Interpretation of Average Precision”. In Procs of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’08). ACM, New York, NY, USA, 689–690.
Robertson, S. E. and Jones, S. K. 1976. “Relevance Weighting of Search Terms”. Journal of the American Society for Information Science 27, 3, 129–146.
Robertson, S. E., Kanoulas, E., and Yilmaz, E. 2010. “Extending Average Precision to Graded Relevance Judgments”. In Procs of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’10). ACM, New York, NY, USA, 603–610.
Rocchio, J. 1971. “Relevance Feedback in Information Retrieval”. In The SMART Retrieval System, G. Salton, Ed. Prentice Hall, Englewood Cliffs, NJ, 313–323.
Rose, D. E. and Levinson, D. 2004. “Understanding User Goals in Web Search”. In Procs of the 13th International Conference on World Wide Web (WWW’04). ACM, New York, NY, USA, 13–19.


Ross, K. A. 2007. “On the Adequacy of Partial Orders for Preference Composition”. Tech. rep., In DBRank Workshop.
Rossi, F., Venable, K. B., and Walsh, T. 2008. “Preferences in constraint satisfaction and optimization”. AI Magazine 28, 4.
Roy, S. B. and Das, G. 2009. “TRANS: Top-k Implementation Techniques of Minimum Effort Driven Faceted Search For Databases”. In Procs of the 15th International Conference on Management of Data (COMAD’09), S. Chawla, K. Karlapalem, and V. Pudi, Eds. Computer Society of India.
Roy, S. B., Wang, H., Das, G., Nambiar, U., and Mohania, M. 2008. “Minimum-Effort Driven Dynamic Faceted Search in Structured Databases”. In Procs of the 17th ACM Conference on Information and Knowledge Management (CIKM’08). New York, NY, USA, 13–22.
Ruotsalo, T., Athukorala, K., Glowacka, D., Konyushkova, K., Oulasvirta, A., Kaipiainen, S., Kaski, S., and Jacucci, G. 2013a. “Supporting Exploratory Search Tasks with Interactive User Modelling”. In Procs of ASIST 2013, the 76th ASIS&T Annual Meeting.
Ruotsalo, T., Peltonen, J., Eugster, M. J., Głowacka, D., Konyushkova, K., Athukorala, K., Kosunen, I., Reijonen, A., Myllymäki, P., Jacucci, G., et al. 2013b. “Directing Exploratory Search with Interactive Intent Modeling”. In Procs of the 22nd ACM International Conference on Information and Knowledge Management (CIKM’13).
Sacco, G. 2006a. “Some Research Results in Dynamic Taxonomy and Faceted Search Systems”. In Procs of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval Workshop on Faceted Search (SIGIR’06).
Sacco, G. M. 2006b. “Analysis and Validation of Information Access Through Mono, Multidimensional and Dynamic Taxonomies”. In Flexible Query Answering Systems, 7th International Conference (FQAS’06). 659–670.
Sacco, G. M. and Tzitzikas, Y., Eds. 2009. “Dynamic Taxonomies and Faceted Search: Theory, Practice and Experience”. Springer.
Schafer, J. B., Konstan, J. A., and Riedl, J. 2001. “E-Commerce Recommendation Applications”. Data Mining and Knowledge Discovery 5, 1-2, 115–153.


Scherer, K. R. 2005. “What are Emotions? And How can They be Measured?”. Social Science Information 44, 695–729.
Schraefel, M. C., Karam, M., and Zhao, S. 2003. “mSpace: Interaction Design for User-Determined, Adaptable Domain Exploration in Hypermedia”. In Procs of Workshop on Adaptive Hypermedia and Adaptive Web Based Systems (AH’03). Nottingham, UK, 217–235.
Schuth, A. and Marx, M. 2011. “Evaluation Methods for Rankings of Facetvalues for Faceted Search”. In Procs of the Second International Conference on Multilingual and Multimodal Information Access Evaluation (CLEF’11). Springer-Verlag, Berlin, Heidelberg, 131–136.
Shawe-Taylor, J., Cancedda, N., Cesa-Bianchi, N., Conconi, A., Gentile, C., Goutte, C., Graepel, T., Li, Y., and Renders, J.-M. 2002. “Kernel Methods for Document Filtering”. In The Eleventh Text Retrieval Conference (TREC 2002), E. Voorhees and L. P. Buckland, Eds. Vol. NIST Special Publication 500-251. Department of Commerce, National Institute of Standards and Technology.
Shokouhi, M. 2013. “Learning to Personalize Query Auto-Completion”. In Procs of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’13). ACM, New York, NY, USA, 103–112.
Shokouhi, M. and Radinsky, K. 2012. “Time-Sensitive Query Auto-Completion”. In Procs of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’12). ACM, New York, NY, USA, 601–610.
Spyratos, N., Sugibuchi, T., and Yang, J. 2011. “Personalizing Queries over Large Data Tables”. In Procs of the 15th East-European Conference on Advances in Databases and Information Systems (ADBIS 2011). Vienna, Austria.
Stefanidis, K., Drosou, M., and Pitoura, E. 2010. “PerK: Personalized Keyword Search in Relational Databases through Preferences”. In Procs of the 14th International Conference on Advances in Database Technology (EDBT’10). 585–596.
Stefanidis, K., Koutrika, G., and Pitoura, E. 2011a. “A Survey on Representation, Composition and Application of Preferences in Database Systems”. ACM Transactions on Database Systems 36, 19:1–19:45.


Stefanidis, K., Pitoura, E., and Vassiliadis, P. 2011b. “Managing Contextual Preferences”. Information Systems 36, 8, 1158–1180.
Tao, Y., Ding, L., Lin, X., and Pei, J. 2009. “Distance-Based Representative Skyline”. In Procs of the 2009 IEEE International Conference on Data Engineering (ICDE’09). IEEE Computer Society, Washington, DC, USA, 892–903.
Toms, E. G., O’Brien, H. L., Kopak, R. W., and Freund, L. 2005. “Searching for Relevance in the Relevance of Search”. In Procs of the 5th International Conference on Conceptions of Library and Information Sciences (CoLIS’05). 59–78.
Torlone, R. and Ciaccia, P. 2002. “Which are my Preferred Items?”. In Workshop on Recommendation and Personalization in eCommerce, RPEC-2002. Malaga, Spain, 217–225.
Tvarožek, M. 2006. “Personalized Navigation in the Semantic Web”. In Procs of the 4th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (AH’06) (2006-06-27), V. P. Wade, H. Ashman, and B. Smyth, Eds. Lecture Notes in Computer Science Series, vol. 4018. Springer, 467–472.
Tvarožek, M., Barla, M., Frivolt, G., Tomša, M., and Bieliková, M. 2008. “Improving Semantic Search Via Integrated Personalized Faceted and Visual Graph Navigation”. In Procs of the 34th Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM’08) (2008-01-09). Lecture Notes in Computer Science Series, vol. 4910. Springer, 778–789.
Tvarožek, M. and Bieliková, M. 2007a. “Adaptive Faceted Browser for Navigation in Open Information Spaces”. In Procs of the 16th International Conference on World Wide Web (WWW’07). ACM, New York, NY, USA, 1311–1312.
Tvarožek, M. and Bieliková, M. 2007b. “Personalized Faceted Browsing for Digital Libraries”. In Procs of the 11th European Conference on Digital Libraries (ECDL’07). 485–488.
Tvarožek, M. and Bieliková, M. 2007c. “Personalized Faceted Navigation for Multimedia Collections”. In Procs of the Second International Workshop on Semantic Media Adaptation and Personalization (SMAP’07). IEEE Computer Society, Washington, DC, USA, 104–109.
Tvarožek, M. and Bieliková, M. 2007d. “Personalized Faceted Navigation in the Semantic Web”. Web Engineering, 511–515.


Tzitzikas, Y., Armenatzoglou, N., and Papadakos, P. Sept. 3, 2008. “FleXplorer: A Framework for Providing Faceted and Dynamic Taxonomy-based Information Exploration”. In Procs of 20th International Database and Expert Systems Application Workshop FIND’2008 (DEXA’08 FIND Workshop). Torino, Italy, 212–216.
Tzitzikas, Y., Kampouraki, M., and Analyti, A. 2013. “Curating the Specificity of Ontological Descriptions under Ontology Evolution”. Journal on Data Semantics, 1–32.
Tzitzikas, Y. and Papadakos, P. 2013. “Interactive Exploration of Multi-Dimensional and Hierarchical Information Spaces with Real-Time Preference Elicitation”. Fundamenta Informaticae 122, 4, 357–399.
Vee, E., Shanmugasundaram, J., and Amer-Yahia, S. 2009. “Efficient Computation of Diverse Query Results”. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 32, 4, 57–64.
Wagner, A., Ladwig, G., and Tran, T. 2011. “Browsing-Oriented Semantic Faceted Search”. In Procs of 22nd International Conference on the Database and Expert Systems Applications (DEXA’11). 303–319.
Wagstaff, K. L., desJardins, M., and Eaton, E. 2010. “Modelling and Learning User Preferences Over Sets”. Journal of Experimental & Theoretical Artificial Intelligence 22, 237–268.
Wang, J., de Vries, A. P., and Reinders, M. J. T. 2006. “Unifying User-Based and Item-Based Collaborative Filtering Approaches by Similarity Fusion”. In Procs of the 29th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR’06). ACM Press, New York, NY, USA, 501–508.
Wasserman, L. 2004. “All of Statistics: A Concise Course in Statistical Inference”.
Webber, W., Chandar, P., and Carterette, B. 2012. “Alternative Assessor Disagreement and Retrieval Depth”. In Procs of the 21st ACM international conference on Information and knowledge management (CIKM’12). ACM, New York, NY, USA, 125–134.
Wellman, M. P. and Doyle, J. 1991. “Preferential Semantics for Goals”. In Procs of the 9th National Conference on Artificial Intelligence (AAAI’91). 698–703.
White, R. W., Bennett, P. N., and Dumais, S. T. 2010. “Predicting Short-Term Interests Using Activity-Based Search Context”. In Procs of the 19th ACM International Conference on Information and Knowledge Management (CIKM’10). ACM, New York, NY, USA, 1009–1018.


White, R. W., Drucker, S. M., Marchionini, G., Hearst, M. A., and Schraefel, M. C. 2007. “Exploratory Search and HCI: Designing and Evaluating Interfaces to Support Exploratory Search Interaction”. In Procs of the Extended Abstracts on Human Factors in Computing Systems (CHI’07 EA), M. B. Rosson and D. J. Gilmore, Eds. ACM, 2877–2880.
Wilson, M. L. and Schraefel, M. C. 2007. “Bridging the Gap: Using IR Models for Evaluating Exploratory Search Interfaces”. In Workshop on Exploratory Search and HCI (SIGCHI’2007). ACM.
Xia, T., Zhang, D., and Tao, Y. 2008. “On Skylining with Flexible Dominance Relation”. In Procs of the 2008 IEEE 24th International Conference on Data Engineering (ICDE’08). IEEE Computer Society, Washington, DC, USA, 1397–1399.
Yang, Y. and Lad, A. 2009. “Modeling Expected Utility of Multi-session Information Distillation”. In Procs of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory (ICTIR’09). Springer-Verlag, Berlin, Heidelberg, 164–175.
Yang, Y., Yoo, S., Zhang, J., and Kisiel, B. 2005. “Robustness of Adaptive Filtering Methods in a Cross-Benchmark Evaluation”. In Procs of the 28th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR’05). ACM, New York, NY, USA, 98–105.
Yee, K., Swearingen, K., Li, K., and Hearst, M. 2003. “Faceted Metadata for Image Search and Browsing”. Procs of the SIGCHI Conference on Human Factors in Computing Systems (CHI’03), 401–408.
Yilmaz, E., Shokouhi, M., Craswell, N., and Robertson, S. 2010. “Expected Browsing Utility for Web Search Evaluation”. In Procs of the 19th ACM international conference on Information and knowledge management (CIKM’10). 1561–1564.
Yiu, M. L. and Mamoulis, N. 2007. “Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data”. In Procs of the 33rd International Conference on Very Large Data Bases (VLDB’07). VLDB Endowment, 483–494.
Yu, K., Tresp, V., and Yu, S. 2004. “A Non-Parametric Hierarchical Bayesian Framework for Information Filtering”. In Procs of the 27th annual International ACM Conference on Research and Development in Information Retrieval (SIGIR’04). ACM, New York, NY, USA, 353–360.


Zamir, O. and Etzioni, O. 1998. “Web Document Clustering: A Feasibility Demonstration”. In Procs of the 21st Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR’98). Melbourne, Australia, 46–54.
Zha, H., Zheng, Z., Fu, H., and Sun, G. 2006. “Incorporating Query Difference for Learning Retrieval Functions in World Wide Web Search”. In Procs of the 15th ACM International Conference on Information and Knowledge Management (CIKM’06). ACM, New York, NY, USA, 307–316.
Zhai, C. and Lafferty, J. 2006. “A Risk Minimization Framework for Information Retrieval”. Information Processing and Management 42, 31–55.
Zhai, C. X., Cohen, W. W., and Lafferty, J. 2003. “Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval”. In Procs of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’03). ACM, New York, NY, USA, 10–17.
Zhang, S., Mamoulis, N., Cheung, D. W., and Kao, B. 2010. “Efficient Skyline Evaluation Over Partially Ordered Domains”. Procs of VLDB Endowment 3, 1-2, 1255–1266.
Zhang, X. and Chomicki, J. 2011. “Preference Queries Over Sets”. In Procs of the 27th International Conference on Data Engineering (ICDE’11). 1019–1030.
Zhang, Y. and Koren, J. 2007. “Efficient Bayesian Hierarchical User Modeling for Recommendation System”. In Procs of the 30th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR’07). ACM, New York, NY, USA, 47–54.
Zigoris, P. and Zhang, Y. 2006. “Bayesian Adaptive User Profiling with Explicit & Implicit Feedback”. In Procs of the 15th ACM international Conference on Information and Knowledge Management (CIKM’06). ACM, New York, NY, USA, 397–404.

Appendix A

Complete Syntax of Preference Language

In this section we give the complete syntax of the language described in Section 3.1.

⟨stmt⟩ ::= ⟨scopeType⟩⟨spec⟩
    | facets order : prefer facet ⟨Fi⟩ to ⟨Fj⟩
    | terms order : prefer term ⟨ti⟩ to ⟨tj⟩
    | objects order : prefer term ⟨ti⟩ to ⟨tj⟩
    | objects order : Pareto ⟨setOfFacets⟩
    | objects order : ParetoOptimal ⟨setOfFacets⟩
    | objects order : Priority ⟨orderOfFacets⟩
    | objects order : Combinational ⟨bucketOrderOfFacets⟩

⟨scopeType⟩ ::= facets order : | terms order : | objects order :

⟨spec⟩ ::= ⟨anchor⟩⟨rankSpec⟩

⟨anchor⟩ ::= facet ⟨Fi⟩
    | term ⟨tj⟩
    | object ⟨ok⟩
    | ϵ        // the empty string

⟨rankSpec⟩ ::= {lexicographic | count | value | indexedBy} {min | max}
    | best | worst
    | use scoreFunction ⟨score()⟩ {min | max}

⟨nonEmptyFacetElems⟩ ::= ⟨Fi⟩ {“,” ⟨Fj⟩}

⟨setOfFacets⟩ ::= “{” ⟨nonEmptyFacetElems⟩ “}”

⟨orderOfFacets⟩ ::= “<” ⟨nonEmptyFacetElems⟩ “>”

⟨bucketOrderOfFacets⟩ ::= “<” ⟨setOfFacets⟩ “>”
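As an illustration of how the composition productions of the grammar can be read, the following is a minimal Python sketch of a recognizer for the four "objects order :" forms that take a facet collection (Pareto, ParetoOptimal, Priority, Combinational). It is not the parser used in this work. The function name parse_composition, the constant names, and the choice of plain identifiers as facet names are assumptions made only for this example, since the grammar does not fix the lexical form of ⟨Fi⟩.

```python
import re

# Assumption: a facet name is a plain identifier; the grammar leaves this open.
FACET = r"[A-Za-z_]\w*"
# <nonEmptyFacetElems>: one facet followed by zero or more ", facet"
ELEMS = FACET + r"(?:\s*,\s*" + FACET + r")*"
# <setOfFacets>, <orderOfFacets>, <bucketOrderOfFacets> as printed in the grammar
SET_OF_FACETS = r"\{\s*" + ELEMS + r"\s*\}"
ORDER_OF_FACETS = r"<\s*" + ELEMS + r"\s*>"
BUCKET_ORDER_OF_FACETS = r"<\s*" + SET_OF_FACETS + r"\s*>"

# Which argument form each composition keyword expects.
ARGUMENT_FORM = {
    "Pareto": SET_OF_FACETS,
    "ParetoOptimal": SET_OF_FACETS,
    "Priority": ORDER_OF_FACETS,
    "Combinational": BUCKET_ORDER_OF_FACETS,
}

STATEMENT = re.compile(r"\s*objects\s+order\s*:\s*(\w+)\s*(.*?)\s*")


def parse_composition(statement):
    """Return (keyword, [facet names]) or None if the statement is not one
    of the four composition forms."""
    match = STATEMENT.fullmatch(statement)
    if match is None:
        return None
    keyword, argument = match.groups()
    form = ARGUMENT_FORM.get(keyword)
    if form is None or re.fullmatch(form, argument) is None:
        return None
    return keyword, re.findall(FACET, argument)
```

For instance, parse_composition("objects order : Priority <Price, Brand>") yields the keyword together with the facets in priority order, while a Pareto statement given an angle-bracket argument is rejected because Pareto expects a set of facets.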

Appendix B

Binary Relations

Here we list several typical properties of binary relations. A binary relation R over a set S is called:

• reflexive, if ∀a ∈ S, aRa
• irreflexive, if ∀a ∈ S, ¬(aRa)
• symmetric, if ∀a, b ∈ S, aRb ⇒ bRa
• asymmetric, if ∀a, b ∈ S, aRb ⇒ ¬(bRa)
• antisymmetric, if ∀a, b ∈ S, (aRb ∧ bRa) ⇒ a = b
• transitive, if ∀a, b, c ∈ S, (aRb ∧ bRc) ⇒ (aRc)
• negatively transitive, if ∀a, b, c ∈ S, (¬(aRb) ∧ ¬(bRc)) ⇒ ¬(aRc)
• connected (strongly complete or total), if ∀a, b ∈ S, (aRb) ∨ (bRa) ∨ (a = b)

The above properties are not independent. Asymmetry implies irreflexivity, while irreflexivity and transitivity imply asymmetry. Based on its properties a binary relation is characterized as follows:

• A binary relation is a preorder or quasi-order, if it is reflexive and transitive. If it is in addition antisymmetric, then it is a partial order.
• A binary relation is a strict partial order (or irreflexive partial order), if it is irreflexive, asymmetric and transitive.
• A binary relation is a total order, if it is a strict partial order and it is also connected.
• A binary relation is a weak order, if it is a negatively transitive strict partial order.
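The definitions above can be checked mechanically on small finite relations. The following Python sketch is only an illustration under stated assumptions: a relation R is represented as a Python set of pairs over a finite set S, and all function names (is_reflexive, is_weak_order, all_relations, and so on) are hypothetical helpers introduced for this example.

```python
from itertools import product


def is_reflexive(S, R):
    return all((a, a) in R for a in S)


def is_irreflexive(S, R):
    return all((a, a) not in R for a in S)


def is_symmetric(S, R):
    return all((b, a) in R for (a, b) in R)


def is_asymmetric(S, R):
    return all((b, a) not in R for (a, b) in R)


def is_antisymmetric(S, R):
    return all(a == b for (a, b) in R if (b, a) in R)


def is_transitive(S, R):
    return all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)


def is_negatively_transitive(S, R):
    return all((a, c) not in R
               for a, b, c in product(S, repeat=3)
               if (a, b) not in R and (b, c) not in R)


def is_connected(S, R):
    return all((a, b) in R or (b, a) in R or a == b
               for a, b in product(S, repeat=2))


def is_preorder(S, R):
    return is_reflexive(S, R) and is_transitive(S, R)


def is_partial_order(S, R):
    return is_preorder(S, R) and is_antisymmetric(S, R)


def is_strict_partial_order(S, R):
    return is_irreflexive(S, R) and is_asymmetric(S, R) and is_transitive(S, R)


def is_total_order(S, R):
    # "Total order" in the sense of this appendix: connected strict partial order.
    return is_strict_partial_order(S, R) and is_connected(S, R)


def is_weak_order(S, R):
    return is_strict_partial_order(S, R) and is_negatively_transitive(S, R)


def all_relations(S):
    """Enumerate every binary relation over S (2^(|S|^2) of them)."""
    pairs = list(product(S, repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        yield {p for p, keep in zip(pairs, bits) if keep}


# Example relations over a three-element set.
S = {1, 2, 3}
LT = {(a, b) for a, b in product(S, repeat=2) if a < b}   # strict "less than"
LE = {(a, b) for a, b in product(S, repeat=2) if a <= b}  # "less or equal"
```

With these helpers, LT satisfies is_total_order and is_weak_order, LE satisfies is_partial_order, and enumerating all_relations(S) for a three-element S (512 relations) offers a brute-force check of the two dependencies stated above on that small case: every asymmetric relation is irreflexive, and every irreflexive and transitive relation is asymmetric.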

Appendix C

Acronyms

AI Artificial Intelligence
DB Database
BMO Best Matches Only
DiFEPreKO Difficulty of Formulating Effective Preferences without Knowing the Options
ES Exploratory Search
FDT Faceted and Dynamic Taxonomies
HCI Human Computer Interaction
IIR Interactive Information Retrieval
IIPP Intentional Inconsistent Preferences Percentage
IR Information Retrieval
IS Information System
NIIP Number of Intentional Inconsistent Preferences
NUP Number of Unused Preferences
UI User Interface
UPP Unused Preferences Percentage
WSE Web Search Engine
