Simple, List-Based Parallel Programming with Transparent Load Balancing

Jorge Buenabad-Chávez1, Miguel A. Castro-García1,2, and Graciela Román-Alonso2

1 Sección de Computación, Centro de Investigación y de Estudios Avanzados del IPN, Ap. Postal 14-740, D.F. 07360, México
[email protected]
2 Departamento de Ing. Eléctrica, Universidad Autónoma Metropolitana, Izt., Ap. Postal 55-534, D.F. 09340, México
{mcas, grac}@xanum.uam.mx
Abstract. We present a data-list management library that both simplifies parallel programming and balances the workload transparently to the programmer. We present its use with an application that dynamically generates data, as applications based on search trees do. In such applications, processing a data item can unpredictably generate new items to process. Without load balancing, these applications are likely to imbalance the workload across processing nodes, resulting in poor performance. We present experimental results on the performance of our library using a Linux PC cluster.
1 Introduction
Today the ubiquity of PC clusters and the availability of a number of parallel programming libraries have made parallelism readily accessible. Programming with message passing or shared memory is still widely used because of its general applicability, despite the need for the programmer to divide the workload, assign it to the processors, and specify communication for processors to share data or synchronise their tasks [1, 6, 8, 11].

Shared abstract data-types (SADTs) are shared objects (such as queues) that appear local to each processor; concurrent access to them is transparent to the programmer through the SADTs' relevant functions (such as enqueue and dequeue) [7]. Hence the programmer does not need to divide and assign the workload, nor to specify communication. However, the programmer is responsible for representing the application data as one or more of the available shared objects.

Skeletons are pre-packaged parallel algorithmic forms, or parallel code pattern functions. The programmer has only to assemble the appropriate skeletons to solve the problem at hand. However, not all practical problems can be simply represented with the skeletons proposed so far, and some skeletons require a relatively large number of parameters, complicating their use [4, 5, 10].
Thanks to CONACyT for the institutional support.
R. Wyrzykowski et al. (Eds.): PPAM 2005, LNCS 3911, pp. 920–927, 2006. c Springer-Verlag Berlin Heidelberg 2006
In this paper we present a data list management library (DLML) for parallel programming on clusters. DLML is useful for developing parallel applications whose data can be organised into a list and where the processing of each list item does not depend on other list items. The programming is almost sequential and, at run time, load balancing takes place transparently to the programmer.

DLML lists are similar to SADT queues [7] both in organisation and use (we describe DLML use in this paper). SADT queues follow a FIFO policy on inserting/deleting items; DLML lists follow no order. The motivation for DLML was to provide transparent load balancing to parallel applications running on clusters. SADT queues were designed first for parallel programming on shared-memory platforms, and later for distributed-memory platforms; for the latter, load balancing was also integrated.

An earlier version of DLML was designed to help improve the execution time of a parallel evolutionary algorithm [3]. Another version was tested with a divide-and-conquer algorithm [9]. These two kinds of algorithms manage a total amount of data that is known in advance. The version presented in this paper is simpler to use, uses a more efficient load balancing policy based on distributed workpools, and can be used with applications that dynamically generate data in unpredictable ways.

With our library, the data list must be accessed using typical list functions, such as get and insert, provided by our environment. Although the use of these functions is typical, internally they operate on several distributed lists, one in each processing node. When a processor finds its local list empty, it requests more data from remote processing nodes, thus balancing the workload dynamically and transparently to the programmer.

In Section 2 we present the Non-Attacking Queens problem as an example of applications that dynamically generate data. In Section 3 we show this application using our data list-management library.
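To make the programming model concrete, the following Python sketch mimics the get/insert loop over a data list. The names (`DLMLList`, `get`, `insert`, `run`) are illustrative stand-ins, not DLML's actual API, and the distributed behaviour (transparently requesting work from remote nodes when the local list empties) is only noted in a comment; a single local list stands in for the distributed lists.

```python
class DLMLList:
    """Stand-in for a distributed data list; here, a plain local list."""

    def __init__(self, items=()):
        self._items = list(items)

    def insert(self, item):
        # Newly generated data goes back into the list.
        self._items.append(item)

    def get(self):
        # In DLML, an empty local list transparently triggers a request
        # for work from remote nodes; in this sketch it just signals
        # global exhaustion by returning None.
        return self._items.pop() if self._items else None


def run(initial, process):
    """Almost-sequential driver loop: get an item, process it, insert
    any new items that processing generated."""
    lst = DLMLList(initial)
    results = []
    while (item := lst.get()) is not None:
        out, children = process(item)   # processing may create new items
        if out is not None:
            results.append(out)
        for child in children:
            lst.insert(child)
    return results


# Trivial example: processing item x emits x and generates x-1 until 0.
print(sorted(run([3], lambda x: (x, [x - 1] if x > 0 else []))))
```

Note that the driver never mentions processors or messages; in DLML the same access pattern is what allows the library to redistribute list items across nodes behind the scenes.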
In Section 4 we present the organisation of the load balancing policy of our library. In Section 5 we discuss empirical results on the performance advantage of using our programming environment. We offer some conclusions and describe future work in Section 6.
2 The Non-Attacking Queens Problem
In this section we present the Non-Attacking Queens (NAQ) problem [2] as a sample application that dynamically generates data organised into a list. It consists of finding all ways of placing N queens on an N × N chessboard so that no queen attacks any other. The search space of NAQ can be modelled with a search tree of degree N. We implemented an iterative version of NAQ that manages this tree as a list which, at any time, contains only leaves representing possible solutions still to explore. For N = 4, Figure 1 shows the only two solutions (left), the search space (middle), and part of its list-management trace (right) under our list-management library. A queen position can be specified with two coordinates in a two-dimensional array. However, in our implementation, a queen placed on row i (1
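The list-based search described above can be sketched as follows. This is an illustrative Python reimplementation, not the paper's code: each list item is a partial placement (a tuple giving the column of the queen on each filled row, matching the one-coordinate-per-row representation the text describes), and the worklist holds only the frontier leaves still to explore.

```python
from collections import deque


def n_queens_solutions(n):
    """Enumerate all non-attacking placements of n queens by managing
    the search tree as a list of partial placements (tree leaves)."""
    work = deque([()])          # start from the empty placement (root)
    solutions = []
    while work:
        placement = work.pop()  # retrieval order does not matter
        row = len(placement)
        if row == n:            # all rows filled: a complete solution
            solutions.append(placement)
            continue
        for col in range(n):
            # Keep the child only if the new queen attacks no earlier
            # queen: different column and different diagonal.
            if all(col != c and abs(col - c) != row - r
                   for r, c in enumerate(placement)):
                work.append(placement + (col,))
    return solutions


print(len(n_queens_solutions(4)))  # 2 solutions for N = 4
print(len(n_queens_solutions(8)))  # 92 solutions for N = 8
```

Processing one placement may generate up to N new placements, which illustrates the unpredictable data generation that motivates dynamic load balancing: some subtrees are pruned immediately while others expand for many levels.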