Information Retrieval Seminar
Universität des Saarlandes, Saarbrücken, Wintersemester 2005/06
Dr. Holger Bast, Ingmar Weber, Debapriyo Majumdar
Optimizing Fagin’s TA algorithms (OFTA)
Daniel Dumitriu & Silvana Solomon
1. Fagin’s top-k algorithms are designed for generic aggregation functions; the only requirement is that the function be monotone. For specific functions, these algorithms may perform unnecessary operations. Our project will assess whether the algorithms can be tweaked for two specific aggregation functions (to be chosen from max, max1 + max2 and median) in such a way that they perform better than the generic versions, and will compare the time required and the memory consumption. The first part will consist of a closer study of the literature on the use of particular aggregation functions in these top-k algorithms and their possible extensions. The second part will be dedicated to the analysis, design and implementation of the chosen algorithm versions, followed by testing and interpretation of the results.

2. Project phases
   (a) Literature study (4 days)
       i. R. Fagin, A. Lotem and M. Naor, Optimal aggregation algorithms for middleware, Journal of Computer and System Sciences 66, p. 614–656, 2003. (7 hours)
       ii. N. Mamoulis, K.H. Cheng, M.L. Yiu and D.W. Cheung, Efficient Aggregation of Ranked Inputs, Proceedings of the 22nd International Conference on Data Engineering, Atlanta, April 2006. (4 hours)
       iii. C.A. Lang, Y.-C. Chang and J.R. Smith, Making the Threshold Algorithm Access Cost Aware, IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 10, p. 1297–1301, October 2004. (3 hours)
       iv. G. Weikum, lecture notes for Information Retrieval and Data Mining, Universität des Saarlandes, Saarbrücken, Wintersemester 2005/06. (4 hours)
   (b) Application (4 days)
       i. Analysis and design of algorithms specially tweaked for our chosen functions (7 hours)
       ii. Implementation of the original TA and NRA algorithms using sum as the aggregation function (5 hours)
       iii. Implementation (6 hours)
       iv. Construction of the queries to be used in tests (on data taken from the collections Drawing and JulesVerne) (1 hour)
       v. Testing and collecting the results: time, memory (2 hours)
       vi. Analysis and interpretation of the results (3 hours)

3. Deliverables
   (a) Presentation and summary of the project
   (b) The application code
   (c) Structured tables (possibly graphs) with the results obtained
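
For orientation, the generic TA that phase 2(b)ii refers to can be sketched roughly as follows. This is a minimal illustration and not the project's implementation: it assumes that every object has a grade in every list, that the lists fit in memory as dictionaries, and that random access is a plain dictionary lookup. The names threshold_algorithm, lists, k and aggregate are hypothetical and chosen only for this example.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def threshold_algorithm(
    lists: Sequence[Dict[str, float]],
    k: int,
    aggregate: Callable[[Sequence[float]], float],
) -> Tuple[List[Tuple[str, float]], int]:
    """Sketch of TA for a monotone aggregation function.

    Assumption: every object appears in every list.
    Returns the top-k (object, score) pairs and the sorted-access depth reached.
    """
    # Simulate sorted access: each list ordered by descending grade.
    ranked_lists = [
        sorted(grades.items(), key=lambda kv: kv[1], reverse=True)
        for grades in lists
    ]
    n = len(ranked_lists[0])
    scores: Dict[str, float] = {}

    for depth in range(n):
        last_seen = []
        for ranked in ranked_lists:
            obj, grade = ranked[depth]
            last_seen.append(grade)
            if obj not in scores:
                # Random access: fetch the object's grade in every list.
                scores[obj] = aggregate([grades[obj] for grades in lists])

        # Threshold: aggregate of the grades last seen under sorted access.
        threshold = aggregate(last_seen)
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

        # Stop once k seen objects score at least the threshold.
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1

    # Lists exhausted: every object has been seen.
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]), n


# Made-up example data: two lists grading four objects.
lists = [
    {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.3},
    {"a": 0.2, "b": 0.7, "c": 0.9, "d": 0.4},
]

top_sum, depth_sum = threshold_algorithm(lists, 1, sum)
top_max, depth_max = threshold_algorithm(lists, 1, max)
print(top_sum, depth_sum)
print(top_max, depth_max)
```

On this toy input the sketch needs two rounds of sorted access for sum but only one for max, which hints at the kind of saving a function-specific variant might exploit; whether this carries over to real collections is exactly what the tests are meant to show.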