Developing Parallel Programs in a Graph-Based Environment

Guido Wirtz
FB Elektrotechnik und Informatik, Universität–GHS–Siegen, 57068 Siegen, Germany
[email protected]

Explicit parallel programming requires a change from a single linear flow of control to complicated, non-linear systems of parallel processes related via complex interdependencies. We argue that traditional, text-based parallel programming languages are not the best choice for describing parallel systems. Almost all interesting aspects of a parallel system are hidden in the linear textual description and have to be made visible by introducing additional non-linear formalisms when explaining and/or visualizing a concrete run of a parallel program. To overcome this problem, the graph-based language Meander is built up by combining textual sequential program fragments (in C) with a specification graph describing the parallel aspects of a message-passing paradigm such as process creation, synchronization and data exchange between processes.

1. Introduction

Parallel programming is characterized by a growing set of parallel architectures, paradigms and programming languages. The question of how best to support a programmer in designing and implementing a parallel program is still an important topic of research. This is especially true when trying to utilize distributed-memory machines. For many application areas, explicit parallel programming using some sort of message passing is still essential for gaining efficiency. Unfortunately, explicit parallel programming is more complex than programming in a sequential paradigm: the user has to perform more steps when developing a parallel application (algorithm and/or data partitioning, coordination of complex systems, mapping); more importantly, parallelism is hard to understand because there is no longer a linearly ordered flow of control, which raises problems such as determinacy, proper termination and poor testability. We argue that parallel languages which are based entirely on textual representations are not the best choice for describing parallelism.
The main drawbacks stem from the fact that textual representations are always written down in some sequential order which hides the parallel structure of a program. The inadequacy of purely textual programming becomes evident when communicating a parallel program to a colleague or reading a textbook on parallel programming: even people who have studied parallelism for years resort to drawing when explaining a specific parallel situation or mechanism. Moreover, the evolving set of tools supporting parallel programming (debugging, visualization) is especially useful precisely because it exploits graphical methods.
We propose the use of graphical languages to specify parallel programs, where parts of a graph may or may not be connected, depending on the existence of logical relationships (e.g. causal ordering, communication). Although graphical methods are central in our approach, we are far from using graphics for all parts of a parallel program: purely sequential parts without communication and parallel control should be formulated in the old-fashioned way. Hence, we use a hybrid approach integrating textual and graphical representations into one language. The rule of distinction between graphical and textual representation is: specify all parallel aspects graphically, and the rest is done using plain text in ANSI-C. A sequential program fragment is not allowed to contain parallel statements (e.g. communication), and thus the graphical part of our language has to be as powerful as an imperative programming language. Such a hybrid language provides the ideal basis for an integrated programming environment: the same graph which is used for drawing a sketch of the planned process system is used for coding and for the visualization of program behaviour. This uniformity of formalisms is one of the major benefits of the Meander approach.

There exist many tools which support some steps in the development of explicit parallel programs (for a recent overview for network-based systems cf. [15]). In particular, the area of visualizing a concrete run of a program has been tackled in many approaches, e.g. [8], [10], or [2]. In contrast to Meander, almost all approaches start from the textually coded parallel program, give no direct support for core program development and share one common drawback: the representation used to develop a program is completely different from that of the tools which are needed to understand the program. Moreover, most of these tools put their focus on performance measurement, not on better program understanding.
The need for graphical representations during all steps of parallel program development has been recognized by some authors for years (cf. [7]). Some approaches use graphical formalisms close to that of Meander. In [17], a graphical basis for visualization is obtained by program analysis; no support is given for program development. The Petri-net based work of [5] focuses on performance prediction. PFG [13] models control flow as well as data access in a graphical manner. The control-flow part seems to be close to our approach, although PFG works on shared data and hence uses a different underlying computation model. The Schedule environment [4] is explicitly dedicated to the development of large-scale numerical programs but restricted to shared-memory machines and based on Fortran. The Enterprise system [3] tries to exploit parallelism through intensive use of the metaphor of a business company and provides a diagrammatic language for the specification of a built-in set of interaction patterns between parallel components. The PVM-based [14] HeNCE tool [1] offers a graphical interface for describing coarse-grained parallel PVM tasks. Nodes are connected via edges describing dependencies between the data produced in the nodes, and contain subroutine calls as well as input/output declarations in order to specify which data are to be imported/exported in a single node. Graphical patterns for defining conditionals, loops and pipes ease the specification of process systems. The Meander language seems to be more appropriate for describing fine-grained parallel programs but is able to handle coarse-grained systems as well. In
HeNCE, the incorporation of graphical constructs, node code and function calls works on three distinct language levels, whereas Meander combines the textual and graphical level in a more concise manner.

Besides the benefits of graphical formalisms, one also has to cope with the problem of graphical complexity: the naive usage of graphical constructs at a rather detailed abstraction level can become tedious when dealing with programs of real-life size. Plain graphs consisting of hundreds of nodes and edges are hard to survey and display, which, in the worst case, may lead to the same problems in understanding a program as in the textual case. Hence, comfortable abstraction methods, support for process systems consisting of uniform processes and regular communication patterns, as well as a modular and flexible design strategy are essential to provide a really useful programming environment. Fortunately, such mechanisms can be built on top of graphs structured by a fixed graph syntax in an almost canonical manner.

In the rest of this paper, we give a sketch of the Meander language as well as an example program in section 2, discuss how to manage complex graphical specifications in section 3, provide an overview of the current functionality of our system (section 4) and close with some comments on future plans w.r.t. Meander.

2. The Meander Language

The sequential basis of our language is ANSI-C. All parallel aspects are formulated by a specification graph. A (plain) complete specification program consists of three parts:

1. The specification graph describes the global structure of a parallel process system by means of a finite, directed, loosely connected graph built up from a fixed set of graph fragments and 3 disjoint types of edges (causal, sync, async).

2. The annotation function defines an appropriate sequential code fragment for each node of the specification graph (executable statements, storage manipulation or expressions).

3.
The global base environment holds all parts of a specification which are not directly executable, i.e. typedefs and function definitions.

Only the union of all three parts is interpreted as a specification program and hence a semantical object. A valid graph fragment is a subgraph which is embedded into its host graph by at most one incoming and one outgoing arc of type causal. Edges of this type model the sequential flow of control and data. All specification graphs are constructed via the combination of graph fragments, identifying incoming and outgoing arcs in a direction-respecting manner. The graph syntax is simple: each correct specification graph is strictly hierarchical. This implies that all paths forked at the same create-child (do, alt) join at the same wait-child (end-do, end-alt), respectively (e.g. nodes 103–104 in Fig. 1). Sequential code is integrated by annotating a valid code fragment, i.e. a non-empty list of C-constructs forming a complete block structure, to each seq-node. Parallelism is introduced by means of a cc/wc (create-child/wait-child) graph fragment. Between these nodes there has to be (i) exactly one graph fragment which is embedded
Figure 1. Screenshot of a master-plus-8-worker system organized as a ring

by causal arcs and (ii) at least one additional graph fragment embedded via the process creation (causal) arcs. Fragment (i) specifies the amount of work to be done by the process which executes the process creation; each fragment (ii) specifies an additional process. All these fragments are executed in parallel, and the creating process is delayed after executing its fragment until all created processes have terminated. Each maximal chain of graph fragments reachable by causal arcs only constitutes a sequential process of its own.
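The cc/wc semantics just described can be approximated in plain C; the sketch below uses POSIX fork()/wait() purely as a stand-in for Meander's process creation, and the function name cc_wc_sketch is ours, not part of Meander or MEAL:

```c
#include <sys/wait.h>
#include <unistd.h>

/* Sketch only: cc/wc approximated by POSIX fork()/wait().
   Returns the number of children that were created and reaped. */
int cc_wc_sketch(int nchildren)
{
    for (int i = 0; i < nchildren; i++) {
        if (fork() == 0) {
            /* fragment (ii): the body of one created child process */
            _exit(0);
        }
    }
    /* fragment (i): the creating process executes its own fragment here */
    int done = 0;
    while (wait(NULL) > 0)  /* wc: block until every created child has terminated */
        done++;
    return done;
}
```

As in Meander, the creating process first runs its own fragment and is then delayed at the wc point until every created process has terminated.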
Communication is represented explicitly via special snd/rcv-nodes and directed edges starting (ending) at the sending (receiving) node, respectively. Each communication node is annotated by a reference to the data which are to be sent (or where data are to be received) and the size of the message. Communication edges may be either of type sync or async: a sync edge blocks both nodes (and hence the processes) until communication takes place; an async edge blocks only the rcv-node iff the snd-node has not been executed yet. Many-to-one connections are permitted iff all communication partners reside in one process and in each execution only one of the edges causes a communication event (no broadcast). We use explicit communication edges because communication between two processes introduces additional constraints on the execution order of the entire system.

Conditionals and loops are similar to Dijkstra's guarded commands as used in CSP [9] but locally deterministic through the introduction of unique priorities. Each alternative of an alt or do consists of a special guard-node controlling the execution of the following sequential graph fragment. Boolean conditions, snd/rcv nodes as well as combinations of these types are permitted in guards. Constructs of this type are essential for describing reactive processes acting only if specific messages arrive, sampling of messages from more than one process without fixing a concrete order of arrival, multi-waits etc.

The concept of environment is settled on the global base environment which is available at the start node of each process. The environment inside a sequential code fragment is the same as in C and propagated along the causal edges. The do/alt subgraphs introduce local blocks. Each process started at a cc node by means of a process creation edge starts with its own local name space. There is no environment sharing or copying between processes. Fig.
1 shows a typical situation when developing a parallel program using the Meander graph editor. The toolbox (bottom of Fig. 1) provides access to the various graph fragment types and edges as well as context menus which are used to define or edit annotations. The process system consists of a global cc (103) starting 8 additional processes; the left-most path describes the amount of work which has to be done by the process executing the cc. The additional processes (X-marked path) are of almost the same structure: a do-loop ruled by a rcv-guard (11) waits for a message from the neighbour to the left, does some amount of work in the seq node (13) and sends the result to the right neighbour (14). The amount of work done in a single seq node may be enormous or trivial; it is always represented by a single node. Splitting of nodes is only needed if communication is required between sequential computing steps. Each loop is continued until no further communication is possible for the rcv-guard; this mechanism is handled by the system's distributed termination service. Only the master's loop is controlled by a simple boolean guard (91) which determines the number of iterations. If this guard becomes false, the do-loop is left and distributed termination starts its work: the neighbour to the right is informed that no more communication is possible through the communication edge (4,12); this terminates the loop (1-5), which informs loop (11-15) and so on. When all processes have terminated properly, the wc construct (104) can be finished in the master process and the entire system terminates.
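The ring behaviour and its termination chain can also be sketched in plain C. This is an illustration under our own assumptions, not Meander's generated code: POSIX pipes stand in for communication edges, and end-of-file on a pipe whose write ends are all closed models the "no further communication possible" signal of the distributed termination service; the name ring_run is ours.

```c
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS 3

/* Master sends `rounds` zero-valued tokens into the ring; each worker
   relays tokens, adding 1 as its share of "work", until read() reports
   EOF, then exits, which propagates the shutdown to the next neighbour. */
int ring_run(int rounds)
{
    int p[WORKERS + 1][2];              /* pipe w feeds process w; pipe WORKERS feeds the master */
    for (int i = 0; i <= WORKERS; i++)
        if (pipe(p[i]) != 0) return -1;

    for (int w = 0; w < WORKERS; w++) {
        if (fork() == 0) {              /* worker w: a rcv-guarded do-loop */
            for (int i = 0; i <= WORKERS; i++) {
                if (i != w) close(p[i][0]);      /* keep only the edges this */
                if (i != w + 1) close(p[i][1]);  /* worker actually uses     */
            }
            int v;
            while (read(p[w][0], &v, sizeof v) == (ssize_t)sizeof v) {
                v += 1;                          /* seq node: this worker's work */
                write(p[w + 1][1], &v, sizeof v);
            }
            _exit(0);                   /* loop left: next neighbour sees EOF */
        }
    }
    for (int i = 0; i <= WORKERS; i++) { /* master keeps p[0] write, p[WORKERS] read */
        if (i != WORKERS) close(p[i][0]);
        if (i != 0) close(p[i][1]);
    }
    int sum = 0;
    for (int r = 0; r < rounds; r++) {  /* boolean guard: number of iterations */
        int v = 0;
        write(p[0][1], &v, sizeof v);
        read(p[WORKERS][0], &v, sizeof v);
        sum += v;                       /* each token gathers one increment per worker */
    }
    close(p[0][1]);                     /* guard false: start the termination chain */
    while (wait(NULL) > 0) ;            /* wc: all workers have terminated */
    close(p[WORKERS][0]);
    return sum;
}
```

Closing the master's write end plays the role of the edge (4,12) notification: worker after worker sees EOF, leaves its loop and exits, until the wc point can complete.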
3. Managing Graphical Complexity

Parallel programs consisting of a moderate number of processes of moderate size, or having a rather simple communication structure, are easily developed in the plain Meander language. It is possible to display the entire graph in a screen window while maintaining readability of textual annotations as well as a clear view of the connection structure. Meander should, however, also be useful for programming complex parallel systems as well as data-parallel systems with lots (up to hundreds) of similar processes. When dealing with such systems, the resulting graphs tend to become unreadable and poorly structured. Therefore, plain graphical specifications without additional support for handling graphical complexity fail to deliver their intended benefit of being easy to understand.

A general approach for managing complexity is based on focus: a programmer should be able to look at a specific situation in a program without being distracted by lots of additional information he is not interested in at that specific time. Change of focus can be supported in Meander in at least two distinct fashions: (i) on the representational level of annotated graphs and (ii) by means of additional language constructs. We have developed concepts for both because they are useful w.r.t. different aspects. The simplest way (which is already implemented) uses the concept of a logical canvas where the user navigates with the window through parts of the canvas, scales the specification graph etc. Instead of unconstrained folding we propose a so-called syntax-constrained folding which is based on our notion of graph fragments: a complex hierarchical fragment may be collapsed into one single node inheriting all external communication edges from inner nodes.
This can be done for sequential constructs as well as at the level of processes by collapsing maximal chains into single nodes, which yields a task graph (also used during program transformation and analysis).

Whereas the former methods only ease the way a program can be understood, additional support for the structured development of specification graphs is needed. A graph module concept providing interface definition and checking w.r.t. consistent use, and the incorporation of already defined or standard modules (e.g. buffers, I/O-interfaces), can be based on the existing graph fragment concept (using local declaration environments). Such a concept is of great help because the programmer can concentrate on newly developed parts and interfaces whereas already known components are hidden in module nodes. Moreover, the re-use of program parts is ruled by a strict module discipline.

In order to deal with data-parallel programs or programs utilizing a high number of similar processes with a simple uniform communication structure, process replication for a built-in set of standard topologies (grid and torus in 1 or more dimensions, regular n-trees), supported by index meta-variables and edge annotations which bridge the gap between communication lines and replicated process indices, seems to be very helpful. Besides the easier way to specify such systems, it also avoids generating a separate source main program for each replicated process, which reduces compilation time. Concepts for Meander w.r.t. these questions have already been developed; the implementation, which complicates graph layout, analysis, code management as well as program transformation, is under way.
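To illustrate how index meta-variables can bridge the gap between communication lines and replicated process indices, consider processes replicated on a rows x cols torus and numbered row-major from 0. The helper functions below are hypothetical, written by us for illustration; Meander's actual edge annotations may differ:

```c
/* Hypothetical helpers: an edge annotation such as "send to the east
   neighbour" reduces to index arithmetic over the replicated processes. */
int torus_east(int id, int cols)
{
    int r = id / cols, c = id % cols;
    return r * cols + (c + 1) % cols;    /* wrap around within the row */
}

int torus_south(int id, int rows, int cols)
{
    int r = id / cols, c = id % cols;
    return ((r + 1) % rows) * cols + c;  /* wrap around within the column */
}
```

With such annotations, one generic process body plus the index arithmetic suffices, instead of a separate specification (and source program) per replica.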
4. System Overview

The Meander programming environment (Fig. 2) supports the graphical specification of parallel programs, the annotation and incremental analysis of sequential code fragments (using lcc [6] as the starting point), a structural analysis of graphs w.r.t. deadlocks, as well as the automatic generation of executable code for Transputer systems. Code generation is supported by the Meander Library (MEAL). It provides a number of units to organize process systems as well as the communication services which are not directly present in the target software architecture (e.g. mechanisms for communication handling and distributed termination implemented on top of the socket level of Helios [12]).

Figure 2. Meander system overview (components: Graph Editor, Program Editor, Analyse, Transformation, Mapping, Compile/Link/Load, MEAL, MEAL-vis, Visualization, trace; front end: DecStation-5000/240, Ultrix, X11R4, Dec-C++, C, lcc, InterViews-3.1; back end: Sun-IPX, Parsytec MC-1 BBK-S4, T805, Helios, C, NFS)

Based on tracefiles generated at runtime, a concrete run of a program can be visualized offline (controlled by a recorder-like interface) by animating the original specification graph. All interactive components use a common user interface which is controlled by the Graph Editor (see Fig. 1). Apart from the back-end, all parts are hosted on DecStations (using C++, X11R5 and the InterViews-3.1 toolkit [11]). Communication between the front- and back-end (a Sun–Transputer interface) is done via NFS-mounted file systems.

5. Conclusions

Experiments with the already functioning prototype show that our strategy of mixing textual and graphical formalisms is a promising way to go. At the current state of the system, enhancing the prototype and adding functionality is of primary interest. In order to provide a really useful tool for real-life programming, the concepts for managing graphical complexity sketched in section 3 have to be incorporated into the prototype. The ongoing port to PVM-based networks will, besides Transputers, provide a broad range of target machines for our system.
Due to limited space, only a rough outline of the Meander concepts was possible. Some details w.r.t. the use of graphical mechanisms during all development phases of a program may be found in [16].
REFERENCES

1. Beguelin, A., Dongarra, J., Geist, G., Manchek, R., and Sunderam, V. Graphical development tools for network-based concurrent supercomputing. In Proc. of Supercomputing 91, Albuquerque, NM (August 1991), pp. 435–444.
2. Bemmerl, T., and Braun, P. Visualization of message passing parallel programs. In CONPAR92 – VAPP V, LNCS 634, Lyon, France (Sep 1992), B. J. et al., Ed., pp. 79–90.
3. Chan, E., et al. Enterprise: An interactive graphical programming environment for distributed software development. Tech. Rep. TR 91-17, Univ. of Alberta, Edmonton, Alberta, Canada, September 1991.
4. Dongarra, J., Sorensen, D., and Brewer, O. Tools to aid in the design, implementation, and understanding of algorithms for parallel processors. In Software for Parallel Computers (Jan 1992), R. H. Perrot, Ed., UNICOM Applied Information Technology, pp. 195–220.
5. Ferscha, A. A Petri net approach for performance oriented parallel program design. Journal of Parallel and Distributed Computing 15, 4 (August 1992), 188–206.
6. Fraser, C. W., and Hanson, D. R. A code generation interface for ANSI C. Software—Practice & Experience 21, 9 (September 1991), 963–988.
7. Glinert, E. P. Visual Programming Environments – Paradigms and Systems. IEEE Society Press, Los Alamitos, CA, 1990.
8. Heath, M. Visual animation of parallel algorithms for matrix computations. In Proc. Fifth Distributed Memory Conference (April 1990), D. Walker and Q. Stout, Eds., IEEE, pp. 1213–1222.
9. Hoare, C. A. R. Communicating Sequential Processes. Prentice Hall, Englewood Cliffs, N.J., USA, 1985.
10. Hollingsworth, J., Irvin, B., and Miller, B. P. IPS User's Guide Version 5.0. Univ. of Wisconsin-Madison, September 1992.
11. Linton, M. A., Vlissides, J. M., and Calder, P. R. Composing user interfaces with InterViews. Computer 22, 2 (Feb 1989), 8–22.
12. Perihelion Software Ltd. The Helios Operating System. Prentice-Hall, Englewood Cliffs, N.J., 1989.
13. Stotts, P. D. The PFG language: Visual programming for concurrent computation. In Proc. Int. Conference on Parallel Processing (1988), pp. 72–79.
14. Sunderam, V. S. PVM: A framework for parallel distributed computing. Concurrency: Practice and Experience 2, 4 (December 1990).
15. Turcotte, L. H. A survey of software environments for exploiting networked computing resources. Tech. Rep. MSU-EIRS-ERC-93-2, Mississippi State U., Starkville, MS, February 1993.
16. Wirtz, G. A visual approach for developing, understanding and analyzing parallel programs. In Proc. Int. Symp. on Visual Programming, Bergen, Norway (August 1993), E. Glinert, Ed., IEEE.
17. Zernik, D., Snir, M., and Malki, D. Using visualization tools to understand concurrency. IEEE Software 9, 5 (May 1992), 87–92.