Parallel Programming Environment for Cluster Computing

Viet D. Tran, Ladislav Hluchy, Giang T. Nguyen
Institute of Informatics, Slovak Academy of Sciences
Email: [email protected]

Abstract

In this paper, we present a new model for parallel program development for cluster computing called Data Driven Graph (DDG). DDG automatically analyzes data dependence among tasks, synchronizes data, generates task graphs and schedules. Programming in DDG is easy and reliable; most of the work is done automatically by DDG, which not only minimizes the amount of work done by programmers but also removes the most frequent errors, such as race conditions and deadlocks. The integrated scheduler makes parallel programs in DDG run efficiently. Our experiments demonstrate the simplicity and efficiency of programs written in DDG.

1. Introduction

Parallel programming is one of the largest problems in cluster computing. Problem decomposition, data dependence analysis, communication, synchronization, race conditions, deadlocks and many other problems that do not exist in sequential programming make parallel programming much harder. Existing message-passing libraries like PVM or MPI do not allow compilers to detect programming errors in communication at compilation time, so removing such errors requires a large amount of time for testing and debugging.

[Fig. 1. Parallel program development in DDG: problem decomposition, data dependence analysis, DAG generation, data synchronization, scheduling, and coding tasks.]

Fig. 1 shows the steps of parallel program development. First, the program is divided into a set of tasks. This step requires a lot of analysis and thinking but little manual work. Next, programmers have to determine the data dependence among tasks, which is the basis for adding communication and synchronization routines in the next step. These steps not only require a lot of manual work from programmers, but are also the largest potential source of errors in parallel programming. Furthermore, most of the possible errors in these steps cannot be detected by compilers and cause runtime errors. Runtime errors are well known to be difficult to remove, especially when current parallel debuggers are not perfect. Data dependence analysis is also the basis for generating the task graphs that are used by schedulers [1]. Generating a task graph manually from a large parallel program is exhausting work, if not impossible. Some scheduling tools provide graphical environments for drawing task graphs, or use functional languages for describing task graphs, but they have little practical meaning: programmers simply want to write their programs in their traditional high-level languages (C/C++, Fortran). As a result, this step is often absent in parallel programming. However, omitting this step makes scheduling inapplicable, and the performance of the parallel program is not optimized.

2. Data Driven Graph

The basic idea of Data Driven Graph (DDG) is that all the “dirty” work in data dependence analysis, data synchronization, task graph generation and scheduling (the steps with a dark background in Fig. 1) is done automatically by computers, not by programmers. This not only frees programmers from exhausting work, but also removes most of the potential errors in parallel programming. The program decomposition step is left to programmers because it requires a lot of intelligence and little manual work. Coding tasks, of course, cannot be done automatically.

In order to obtain the data dependence among tasks automatically, DDG has to know, for each task, which variables the task uses and which variables it modifies. This could be done by tracing the code of the task; however, tracing is time-consuming, and since many C/C++ programmers use pointers to data in their programs, it is very difficult to determine by tracing which data a pointer refers to. Therefore, in DDG each task must declare which variables it uses and modifies.

Data dependence is also the basis for data synchronization. As discussed above, data synchronization is a source of potential errors in parallel programs, and programmers spend a large part of their time synchronizing data and testing and debugging communication errors. It is therefore desirable that data synchronization be done automatically, so that programmers can concentrate on coding tasks.

The data use declaration of a task has to be consistent with its code. However, the code of a task changes during parallel program development, so a separate data use declaration is unwelcome. In DDG, the code of a task is implemented as a function/procedure in the program, and the input and output data of the task are referred to in its code as formal parameters. The real variables are passed to the code as parameters during task creation. By wrapping the task creation routine, DDG can determine which data the task uses without a separate declaration.

[Fig. 2. Internal structures of DDG: two processors connected by an interconnection network; on each processor a task management module holds task objects (Task 1-4, with codes work1-work4) and data items a-d.]

Fig. 2 shows the internal structures of DDG. Each task is represented by a task object containing pointers to its code (a function in the program) and to its data. All communication and synchronization are managed automatically by DDG. For example, in Fig. 2, when task 1 has finished, data a becomes available, so task 2 can start. Data a is also sent to the second processor for task 3. A task graph in directed acyclic graph (DAG) format can easily be extracted from the data

structures in Fig. 2 by connecting the task that writes to a datum to the tasks that read it. DDG has an integrated scheduler that schedules the task graph. A task can be moved from one processor to another by sending the task object of the task (not its code or data) to the target processor. The overhead of DDG is as follows:

T_DDGoverhead / T_Total = C / T_average

where T_DDGoverhead is the processor time used by DDG, T_Total is the total processor time used by the program, C is a constant, and T_average is the average execution time of the tasks (also called the grain size). The overhead of DDG is negligible if the grain size of the tasks is large enough.

3. Conclusion

Data Driven Graph, a new model for parallel program development, provides a new approach to parallel programming in message-passing systems with integrated scheduling. The DDG API allows programmers to write robust, efficient parallel programs with minimal difficulty.

Acknowledgment

This work was supported by the Slovak Scientific Grant Agency under Research Project No. 2/7186/20.

References

[1] B. Shirazi et al. (eds.): Scheduling and Load Balancing in Parallel and Distributed Systems. IEEE Computer Society Press, 1995.

Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER’00) 0-7695-0896-0/00 $ 17.00 © 2000 IEEE