Fig. 20. ETKS, ATKS efficiency versus k and distribution. (a) CPU time versus k. (b) CPU time versus distribution.
Fig. 18. ETKS, ATKS efficiency versus user and record cardinality. (a) CPU time versus |U|. (b) CPU time versus |R|.
Fig. 21. Alternative implementations: CPU time versus user cardinality. (a) Exact skylines. (b) Approximate skylines.
exact counterparts. 2S-ASCopt achieves a noticeable improvement over ASCopt because, after the first scan, the buffer Skyl holds relatively fewer records than in the exact solution. Fig. 16 measures the effect of the optimizations as a function of the score cardinality. The results are consistent with those of Fig. 15: ESCopt is the best choice for exact computation, whereas 2S-ASCopt is the winner for approximate computation. Both algorithms scale well as the number of scores increases. Finally, we analyze the impact of the sampling parameters of ASC. Fig. 17a (resp., 17b) illustrates the overhead as a function of the error parameter ε (resp., confidence parameter δ). As expected, the CPU cost of ASC is proportional to both 1/ε^2 and ln(1/δ).
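To make the dependence on the two sampling parameters concrete, the following minimal sketch computes a sample size from the standard Hoeffding inequality. This bound is an assumption chosen for illustration, not necessarily the one used by ASC; the function name `hoeffding_sample_size` and the arguments `eps` (error parameter) and `delta` (confidence parameter) are hypothetical. It shows the same qualitative behavior reported above: the number of samples grows quadratically in 1/eps and only logarithmically in 1/delta.

```python
import math


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Number of samples so that the empirical mean of values in [0, 1]
    deviates from the true mean by more than eps with probability at
    most delta (two-sided Hoeffding bound; an illustrative assumption,
    not the paper's exact formula)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    # n >= ln(2 / delta) / (2 * eps^2)
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    # Halving eps roughly quadruples the sample size,
    # while shrinking delta tenfold adds only a constant number of samples.
    for eps, delta in [(0.1, 0.05), (0.05, 0.05), (0.1, 0.005)]:
        print(eps, delta, hoeffding_sample_size(eps, delta))
```

Under this bound, tightening the error parameter is far more expensive than tightening the confidence parameter, which matches the trends described for Fig. 17.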
7.3.2 ETKS versus ATKS
In this section, we evaluate the efficiency of top-k algorithms for CFS, again using the parameters of Tables 2 and 3. Both ETKS and ATKS are implemented as discussed in Section 6 and use all optimizations of Section 4.2. The default value of k is 20. Fig. 18a presents the impact of the user cardinality on the efficiency of ETKS and ATKS. Similar to Fig. 12a, the CPU time of ETKS grows with |U|, while ATKS is insensitive to |U|. According to Fig. 18b, the CPU time of ETKS decreases as the record cardinality grows because there are fewer scores per record. Figs. 19a and 19b present the CPU time with respect to the score cardinality and the dimensionality, respectively. The results are consistent with those in Fig. 13, and ATKS has a great advantage over ETKS, especially when more scores (resp., higher values of d) are considered. Fig. 20a analyzes the effect of k, which controls the number of records returned to the users. As expected, the performance of ETKS degrades with k. On the other hand, the impact of k on ATKS is negligible because its cost is dominated by the sampling probability computation, which is independent of k. Fig. 20b summarizes the experimental results with three different distributions. Note that for anticorrelated data, ATKS reduces the computation cost by almost two orders of magnitude. Next, we evaluate the effect of optimizations. Since the two-scans paradigm is not applicable to top-k algorithms, in the following we only consider the four scenarios ETKSopt, ETKSbasic, ATKSopt, and ATKSbasic, where the optimized versions apply prepruning, score preordering, and record preordering. Fig. 21a (resp., Fig. 21b) illustrates the cost as a function of |U| for the exact (resp., approximate) solution. ETKSopt and ATKSopt outperform ETKSbasic and ATKSbasic,
respectively, by at least one order of magnitude. Similar performance gains are observed when varying the number of records, scores, and dimensions. In summary, prepruning, score preordering, and record preordering significantly decrease the cost of skyline computation under all settings (exact/approximate, threshold/top-k). On the other hand, the two-scan paradigm is beneficial only for approximate skylines under the conventional (i.e., threshold) model.
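The top-k variant evaluated in this section returns the k least dominated records. The following sketch illustrates that idea under strong simplifying assumptions: each record carries a single deterministic score vector and higher scores are better, whereas in CFS the scores come from users with similar scoring patterns and dominance is personalized. The names `dominates` and `top_k_least_dominated` are hypothetical, and the quadratic scan corresponds to none of the optimized algorithms (ETKS, ATKS) discussed here.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every dimension and
    strictly better on at least one (higher is better, by assumption)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def top_k_least_dominated(
    records: List[Sequence[float]], k: int
) -> List[Tuple[int, int]]:
    """Return (record index, dominator count) for the k records that are
    dominated by the fewest other records; ties are broken by index."""
    counts = [sum(dominates(other, rec) for other in records) for rec in records]
    order = sorted(range(len(records)), key=lambda i: (counts[i], i))
    return [(i, counts[i]) for i in order[:k]]


if __name__ == "__main__":
    data = [(5, 5), (4, 4), (5, 3), (1, 1)]
    print(top_k_least_dominated(data, 2))
```

Because every record is compared against every other record, the cost of this naive version does not depend on k; the paper's exact algorithm behaves differently, and its performance degrades with k as reported for Fig. 20a.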
This paper proposes collaborative filtering skyline (CFS), a general framework that generates a personalized skyline for each active user based on scores of other users with similar scoring patterns. The personalized skyline includes objects that are good on certain aspects, and eliminates the ones that are not interesting on any attribute combination. CFS permits the distinction of scoring patterns and selection criteria, i.e., two users are given diverse choices even if their scoring patterns are identical, which is not possible in conventional collaborative filtering. We first develop an algorithm and several optimizations for exact skyline computation. Then, we propose an approximate solution, based on sampling, that provides confidence guarantees. Furthermore, we present top-k algorithms for personalized skyline, which contains the k least dominated records. Finally, we evaluate the effectiveness and efficiency of our methods through extensive experiments.
Ilaria Bartolini received the graduate degree in computer science in 1997 and the PhD degree in electronic and computer engineering from the University of Bologna, Italy, in 2002. She is currently an assistant professor with the DEIS Department of the University of Bologna. In 1998, she spent six months at the Centrum voor Wiskunde en Informatica (CWI) in Amsterdam as a junior researcher. In 2004, she was a visiting researcher for three months at the New Jersey Institute of Technology (NJIT) in Newark. From January 2008 to April 2008, she was visiting the Hong Kong University of Science and Technology (HKUST). Her current research mainly focuses on learning of user preferences, similarity and preference query processing in large databases, collaborative filtering, and retrieval and browsing of multimedia data collections. She has published about 30 papers in major international journals (including the IEEE Transactions on Pattern Analysis and Machine Intelligence, ACM Transactions on Database Systems, Data & Knowledge Engineering, Knowledge and Information Systems, and Multimedia Tools and Applications) and conferences (including the VLDB, ICDE, PKDD, and CIKM). She has served on the program committees of several international conferences and workshops. She is a member of the ACM SIGMOD, the IEEE, and the IEEE Computer Society.
Zhenjie Zhang received the BS degree from the Department of Computer Science and Engineering, Fudan University in 2004 and the PhD degree from the School of Computing, National University of Singapore in 2010. He is currently with the Advanced Digital Sciences Center, Illinois at Singapore. He was a visiting student of the Hong Kong University of Science and Technology in 2008, and a visiting scholar of AT&T Shannon Lab in 2009. His research interests cover a variety of topics including clustering analysis, skyline query processing, nonmetric indexing, and game theory. He serves as a PC member in the VLDB 2010 and the KDD 2010.

Dimitris Papadias is a professor of computer science and engineering, Hong Kong University of Science and Technology. Before joining HKUST in 1997, he worked and studied at the German National Research Center for Information Technology (GMD), the National Center for Geographic Information and Analysis (NCGIA, Maine), the University of California at San Diego, the Technical University of Vienna, the National Technical University of Athens, Queen’s University, Canada, and University of Patras, Greece. He has published extensively and been involved in the program committees of all major database conferences, including the SIGMOD, the VLDB, and the ICDE. He serves or has served on the editorial boards of the VLDB Journal, the IEEE Transactions on Knowledge and Data Engineering, and Information Systems.
