Asynchronous iterative methods for the effective computation of PageRank

Efstratios Gallopoulos Patras University, GREECE
[email protected]
Giorgos Kollias Patras University, GREECE
[email protected]
Daniel Szyld Temple University, USA
[email protected]
Abstract

Iterative algorithms are the building blocks of important scientific computations. However, their semantics-preserving implementation on modern distributed computing platforms introduces synchronization phases between cooperating tasks. These phases increase overall idle time and place tight upper bounds on performance. A drastic measure is to eliminate these phases entirely: each processor computes at its own pace, consuming whatever data is currently and locally available. Such an asynchronous computational model has been actively pursued since the late 1960s [5], with key contributions by a relatively small but strong group of researchers; cf. the key references [4, 7]. It is fair to say, however, that most contributions have addressed the theoretical convergence questions posed by these asynchronous iterative schemes. Until recently, the actual performance of these methods was overshadowed by technology trends: tightly coupled, homogeneous parallel machines with specially tuned interconnects could keep synchronization overheads to a minimum and would easily outperform asynchronous methods. Very recent results, however, appear to offer greater promise for these methods (e.g. [1, 2, 3]).

In this presentation we report on the use of iterative schemes for what has been called “the world’s largest matrix computation” [8], namely Google’s PageRank algorithm as described in [9], which is a key component of Google’s search engine. We concentrate on asynchronous schemes, motivated by the size of the problem and the type of computational systems that are candidates for its implementation: large-scale, loosely coupled, heterogeneous systems (PCs with various operating systems and characteristics) ‘glued’ together with commodity interconnects (LANs, the Internet), as well as P2P and Grid platforms [6, 10]. We have implemented an asynchronous version of the PageRank algorithm and experimented with it on a Beowulf cluster.
The implementation consists of Java components assembled by Python scripts and can be seamlessly ported to larger-scale heterogeneous environments. Results are encouraging: the asynchronous computation of PageRank seems to be faster than the classic synchronous one, and it adapts dynamically to scarce resources. Topics such as the convergence properties of the resulting asynchronous scheme, strategies for termination detection, and techniques for implementing asynchronism are also discussed, together with future directions.
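The contrast between the synchronous and asynchronous schemes can be illustrated with a minimal Python sketch. It is not the Java/Python implementation reported here, only a single-process simulation under stated assumptions: a toy six-page graph with no dangling pages, a damping factor of 0.85, workers that each own a block of pages and iterate on a private, possibly stale copy of the rank vector, and a hypothetical `send_every` parameter that controls how often a worker shares its block. All function and variable names are illustrative.

```python
import random

# Toy web graph (an assumption for illustration): LINKS[j] lists the pages
# that page j links to. Every page has at least one out-link.
LINKS = [[1, 2], [2], [0], [0, 2, 4], [5], [3]]
# Hypothetical partition of the pages among three workers.
BLOCKS = [[0, 1], [2, 3], [4, 5]]


def pagerank_sync(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Synchronous iteration: every page is updated from the same iterate."""
    n = len(links)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - alpha) / n] * n
        for j, outs in enumerate(links):
            share = alpha * x[j] / len(outs)
            for i in outs:
                new[i] += share
        change = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if change < tol:
            break
    return x


def pagerank_async(links, blocks, alpha=0.85, steps=20000, send_every=3, seed=0):
    """Simulated asynchronous iteration with stale data and no barriers."""
    n = len(links)
    in_links = [[] for _ in range(n)]
    for j, outs in enumerate(links):
        for i in outs:
            in_links[i].append(j)

    rng = random.Random(seed)
    views = [[1.0 / n] * n for _ in blocks]  # one private copy per worker
    updates_done = [0] * len(blocks)

    for _ in range(steps):
        w = rng.randrange(len(blocks))  # workers advance at uneven pace
        view = views[w]
        # Update only the pages this worker owns, using whatever values
        # (possibly outdated) are in its private copy.
        fresh = {
            i: (1.0 - alpha) / n
            + alpha * sum(view[j] / len(links[j]) for j in in_links[i])
            for i in blocks[w]
        }
        for i, value in fresh.items():
            view[i] = value
        updates_done[w] += 1
        # Share the block only occasionally; no worker ever waits.
        if updates_done[w] % send_every == 0:
            for other in views:
                for i in blocks[w]:
                    other[i] = view[i]

    # Assemble the result from each block's owner.
    result = [0.0] * n
    for w, block in enumerate(blocks):
        for i in block:
            result[i] = views[w][i]
    return result


sync_ranks = pagerank_sync(LINKS)
async_ranks = pagerank_async(LINKS, BLOCKS)
print("synchronous :", [round(r, 6) for r in sync_ranks])
print("asynchronous:", [round(r, 6) for r in async_ranks])
```

In this sketch both schemes approach the same fixed point of x = alpha*P*x + (1 - alpha)/n, even though the simulated workers never synchronize and routinely compute with outdated values. A real deployment would also need a termination detection strategy, which the sketch sidesteps by running a fixed number of steps.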
References

[1] J.M. Bahi, S. Contassot-Vivier, R. Couturier, and F. Vernier. A Decentralized Convergence Detection Algorithm for Asynchronous Parallel Iterative Algorithms. IEEE Trans. Parallel Distrib. Syst., 16:4–13, 2005.
[2] J.M. Bahi, S. Domas, and K. Mazouzi. Combination of Java and Asynchronism for the Grid: a Comparative Study Based on a Parallel Power Method. In 18th Int’l. Parallel and Distributed Processing Symposium (IPDPS ’04), pages 158a, 8 pages, Santa Fe, USA, April 2004. IEEE.
[3] J.M. Bahi, S. Domas, and K. Mazouzi. Jace: A Java Environment for Distributed Asynchronous Iterative Computations. In EUROMICRO-PDP’04, pages 350–357. IEEE, 2004.
[4] D.P. Bertsekas and J.N. Tsitsiklis. Parallel and Distributed Computation. Prentice Hall, Englewood Cliffs, NJ, 1989.
[5] D. Chazan and W.L. Miranker. Chaotic relaxation. Linear Algebra Appl., 2:199–222, 1969.
[6] I. Foster and C. Kesselman, editors. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann - Elsevier, San Francisco, 2004.
[7] A. Frommer and D.B. Szyld. On asynchronous iterations. J. Comput. Appl. Math., 123:201–216, 2000.
[8] C. Moler. The world’s largest matrix computation. Matlab News and Notes, Oct. 2002.
[9] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Univ., 1998.
[10] K. Sankaralingam, S. Sethumadhavan, and J.C. Browne. Distributed Pagerank for P2P Systems. In 12th Int’l. Symposium on High Performance Distributed Computing, 2003.