A PARALLEL DEVELOPMENT ENVIRONMENT FOR THE IPSC HYPERCUBE
Thomas Bemmerl, Ralf Gebhart, Peter Ginzinger, Thomas Ludwig*
The paper describes a new development environment for parallel computers, which was designed and implemented on the iPSC hypercube. The development system improves the programmer's productivity during the design, implementation and testing of parallel programs. The development environment consists of three subsystems supporting the programmer in different phases of the software development lifecycle for parallel machines. A concurrent runtime library offers high-level parallel programming concepts for designing parallel programs at a high abstraction level. A remote facility supports the programmer during generation, compilation, mapping, configuration and loading. The most sophisticated subsystem of the development environment is a high-level, object-oriented concurrent debugging and monitoring system. Apart from the functionality and implementation of the several subsystems, the paper focuses on the portable and integrated design concept of the development environment.
MOTIVATION AND STATE OF THE ART

Many computer scientists propose parallel computing as the only way to further increase execution speed and fault tolerance. This is the reason for the intensified research efforts in the field of parallel processing during the last years. But most of these research projects focus only on hardware aspects. This concentration on multiprocessor hardware has led to a lack of software development support for parallel computers. In particular, few research activities in the last years have concentrated on adequate and practical development tools for parallel programs. This fact was illustrated by Papadopoulos from MIT with the sentence: "It appears to be easier to build parallel machines than to use them" (Papadopoulos (5)). Today, programs for many parallel machines are written in a high-level programming language and are based on the objects and system calls of the concurrent operating system of each processor node. In addition, software for many parallel computers is developed in a so-called host/target environment: the programs are developed with cross-compilers on a host computer and downloaded for execution into the target system. The host is usually a conventional workstation connected to the target multiprocessor via parallel busses or LANs. Many supercomputers from industry and academia are programmed this way, for example the iPSC and iPSC/2 from Intel, the Cosmic Cube, the NCUBE, the Pringle Parallel Computer, the Mark II/III, the PASM, the Megacluster, the SUPRENUM machine, etc. In addition, even multiprocessors with native compilers include central development nodes from which programs are downloaded into the processor elements; a "logical" host/target environment therefore exists at least within this class of multiprocessors. In many cases, the host computers within these environments are connected via communication networks to other workstations of the parallel computing laboratory.
As already mentioned, the parallel machine available at the authors' laboratory is also used in a network-based host/target environment. In addition, the programming model of the iPSC is implemented by the NX operating system and is based on parallel processes and message passing between them. But, together with a lot of other multiprocessor prototypes from university and industry, the iPSC offers only a very rudimentary programming and development environment. Most of the development environments available for multiprocessors have the following drawbacks:
• The available parallel development environments offer only low-level programming models consisting of concurrent processes and message-passing system calls. No higher abstraction concepts for process synchronization and communication are at hand. In addition, the programmer has to keep in mind the physical or logical architecture of the underlying parallel hardware when using the abstraction concepts of the programming model; this means that the programmer has to deal with physical or logical node and process identifiers.
• A second field in which available development systems are very weak is configuration, program management, mapping and integration into networks (Eichholz (4), Pratt (6)). Normally no tools are at hand for the automatic generation of the several application subsystems, and none for their mapping onto the nodes of the multiprocessor target system. Also, most development tools are only usable from the host computer attached to the multiprocessor; only a few of them offer integration into networks and remote access from other workstations.
• The weightiest drawback of state-of-the-art development environments is that they contain no tools for observing the dynamic behavior of parallel computers and their software during runtime (4), (6). In particular, no adequate tools are available for debugging, testing, performance measurement (optimization) and visualization (animation) of parallel program execution.

* Technical University Munich, Department of Computer Science, P.O.B. 202422, D-8000 München, FRG
These drawbacks of existing development environments motivated the design and implementation of the parallel development system presented in the rest of the paper. Within this research project we especially intended to address the three classes of drawbacks explained above. Therefore our new development environment consists of the following three subsystems:
• A concurrent runtime library offering a high-level parallel programming model.
• A remote generation and configuration management system based on UNIX and TCP/IP facilities.
• A graphics-based and menu-driven high-level test, debugging and monitoring environment.

The features, design concepts and implementation details of the three subsystems are presented in the following sections in the order mentioned.

A CONCURRENT RUNTIME LIBRARY FOR THE IPSC

A node operating system (NX) exists for the hypercube. This operating system is based on the capabilities of the node processor (protection, task switching). Parts of the operating system are linked to the application and downloaded to the cube. With NX it is possible to specify communicating sequential processes at load time and to communicate with other nodes. But the major disadvantages of NX cannot be overlooked:
• It is not possible to create and delete processes dynamically at runtime within the application.
• There are no parallel abstraction concepts higher than sequential processes for the programmer to deal with.
• The creation of transparent programs is not possible because the architecture is visible.
• In case of runtime errors only little information is given to the programmer.
Concerning these drawbacks, our goal was to offer the programmer better and more comfortable objects and operations for communicating between processes and for synchronizing them. Therefore we overlaid the NX operating system with the kernel RMK (Realtime Multitasking Kernel), which was designed at the authors' laboratory. This kernel uses a priority-driven scheduling algorithm and is based on coroutines as known from Modula-2. Apart from these features, RMK offers a minimum set of three basic objects together with the usual operations: tasks (processes which can be dynamically created and deleted), semaphores for the synchronization of tasks, and mailboxes for communication between tasks. We chose these objects because they are familiar to nearly all programmers who build programs in a multitasking environment. For interprocessor communication we still use the send and receive calls of NX. In a further project, mailboxes and semaphores will be made global (known to every task in every node); in that approach the low-level send and receive concepts of NX are hidden from the application programmer. A program example based on the programming model of the new environment on the iPSC hypercube is shown in figure 1.

RMK is implemented as a runtime library which is linked to the application program. The kernel is mainly written in C; only a few primitive routines, which are hardware dependent, are written in assembler. Because RMK is linked to the application, only the kernel routines actually used are added to the application program. Some procedures of RMK were modified to support a high-level debugger, which is described in section four of this paper. The implementation hierarchy of RMK is illustrated in figure 2. Using the new kernel RMK in combination with NX, it is now easier for the hypercube programmer to handle bigger and more complex applications.
REMOTE CONFIGURATION MANAGEMENT

One of the main objectives was to use the iPSC in a network-based host/target environment. The host is usually a powerful UNIX workstation with comfortable development tools for the target system. Programming the iPSC this way offers the following advantages:
• Access to the cube in a network environment.
• Usage of the comfortable tools of the host.
• Debugging the iPSC with the powerful graphical features of the parallel debugger described in section four.

In our existing configuration the host is a VAXstation II with Ultrix as operating system and X Windows as graphic interface. The iPSC (cube and cube manager) is regarded as the target. Host and target are connected in an Ethernet-based network via the TCP/IP communication protocol. The software for the iPSC is developed on the host and downloaded into the target for testing and debugging. Figure 3 shows this network-based host/target configuration.

For a comfortable and efficient handling of the whole system we offer some powerful concepts which enable the programmer to concentrate on his problem, namely the development of parallel programs, instead of wasting his time with low-priority chores. In particular, these concepts deal with the management of source and object files, the automatic generation of the application, the mapping of the application to the cube, and remote access to the cube.

First of all, the management of the source and object files will be discussed. Suppose you want to write an application for the iPSC. The program will typically consist of several independent parts, called components in the following, which can be executed on different virtual nodes. The whole application is mapped onto a hierarchical directory structure in the file system of the host, and each component is placed in its own subdirectory. For the implementation of this concept we defined a so-called configuration file, in which the process structure of the application, together with its components and the corresponding mapping onto the hypercube architecture, is specified. The configuration file can be written with an editor and must be placed in the root directory of the application.
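The paper does not show the configuration file's actual syntax, so the fragment below is an invented illustration of the kind of information such a file holds: the components of an application, their subdirectories, and their static mapping onto virtual nodes of the cube.

```
# hypothetical configuration file (syntax invented for illustration)
application  solver
component  master    dir master/   node 0
component  worker_a  dir worker/   node 1
component  worker_b  dir worker/   node 2
```

From a specification like this, the directory tree and the per-directory makefiles can be derived mechanically, which is what makes the automatic generation described below possible.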
This offers the possibility of easily changing the focus of interest by changing the configuration file. For each component, the mapping to the virtual nodes of the cube must also be specified; this mapping, however, is a static one. Figure 4 gives an example of such a directory hierarchy.

With a specified configuration file it is possible to generate the directory structure and the makefiles for the application automatically. The management of the source and object files can then easily be done via the make utility already known from UNIX. Therefore, a makefile is placed in each directory of the whole application. You can generate the whole application with the makefile of the root directory, or single components with the makefiles in the specific subdirectories. In each makefile the necessary options for the generation of the appropriate files are specified. Because there is no adequate compiler for the 80286 on the host (VAXstation II), the compiler of the cube manager has to be used. This concept introduces the problem that all source and object files are stored on the host but the compiler runs on the target. Therefore all necessary files have to be transferred temporarily to the target. The commands for this remote compilation are also issued via the makefiles, but the process is transparent for the user.

In addition to the tasks already mentioned, the configuration file is used in the following phases of program development: remote loading and starting of the application, and remote testing and debugging. For the implementation of these remote concepts only standard UNIX tools such as shell scripts, make, lex and awk were used. This makes the retargeting of the remote facilities to other UNIX workstations very simple.

THE PARALLEL TEST AND DEBUGGING ENVIRONMENT REALBUG

In every programming environment there should exist tools for testing and debugging application programs.
This fact is especially true for parallel machines because of their complex dynamic behavior during runtime. For our parallel programming environment we decided to design a debugger with the following features: REALBUG is high-level-language oriented (C), interactive and node oriented, offers multitasking support, and can deal with the objects of the kernel (RMK). Furthermore, it is split into two parts for node software and host software, offers a menu- and graphics-driven human interface, supports different monitoring techniques, and is easily expandable because of its modularity.

In our parallel programming environment we use the debugger in the following way: after compiling and loading the application onto the nodes of the iPSC, the debugger can be activated; it knows all relevant information about the mapping and the symbols of the application program. This mapping and symbol information is extracted from the configuration file (see section 3) and from the symbol table generated by the compiler. The user then selects a node on which the debugger operates for the next commands. Of course he can change his focus of interest and switch to another node. After switching to a new node, all results of the debugger which concern the part of the application on the old node are hidden from the user; if the user switches back to the old node, the results concerning this node become visible again. After focusing on a particular node, the debugger offers the following classes of commands: monitoring program execution, controlling program execution, inspecting the state of the application, modifying the state of the program, and various miscellaneous commands. For monitoring program execution, the debugger allows the user to specify predicates about the executing program. That means one can stop the running program (BREAK) or log interesting events (TRACE). Three groups of events are offered:
• Control flow: BREAK or TRACE if the processor executes a specified statement or procedure.
• Data flow: BREAK or TRACE if a specified value is assigned to a specific variable.
• Concurrency: BREAK or TRACE if there is an interprocess communication, or tasks are created (deleted), or semaphores are requested (released).

The debugger is split into two parts, located on the host and on the target system (see Bemmerl (1) and (2)). The host parts run on a workstation with graphics capabilities, which makes the menu- and graphics-driven human interface possible. All symbol tables of the application are located on the host. High-level-language commands are transformed into low-level commands (hardware and compiler dependent), which are then sent to the cube. The target parts of the debugger are, of course, located on the nodes of the hypercube. Two hidden monitor tasks run on every node in addition to the application tasks; they are activated by events or by commands from the host. These two debugging tasks are hardware and compiler dependent (e.g. for the data predicates) and have to be changed if a new compiler or hardware is used.

SUMMARY

It has to be mentioned that the parallel development environment presented was not written from scratch. The development system is an extension of a programming environment for multitasking single-processor systems, available at the authors' laboratory since 1986. The adaptation and implementation of the development environment for the iPSC hypercube was finished in fall 1988. Over the last half year we have gained experience with the adequacy of the tools in several practical courses and in-house projects at our laboratory. These experiences influence the design and implementation of further tools for parallel systems. At the moment we are extending the development environment with tools for performance measurement, visualization and dynamic load balancing.
In addition, we plan to retarget and adapt the development environment to other parallel computers as a feasibility demonstration of the portable design concept of the tool environment.

REFERENCES

1. Beier, H.-J., and Bemmerl, T., 1988, "Software Monitoring of Parallel Programs", Proc. of CONPAR '88, Manchester, UK.
2. Bemmerl, T., 1986, "Realtime High Level Debugging in Host/Target Environments", Proc. of EUROMICRO Symp. on Microarchitectures, Developments and Applications, Venice, 387.
3. Bemmerl, T., Huber, F., and Stampfl, R., 1988, Microprocessors & Microsystems, 190.
4. Eichholz, S., 1987, "Parallel Programming with ParMod", Proc. of IEEE Int. Conf. on Parallel Processing, St. Charles, USA.
5. Papadopoulos, 1987, "The New Dataflow Architecture being built at MIT", Proc. of MIT/ZTI-Symposium on Very High Parallel Architectures, Siemens AG, Munich.
6. Pratt, T.W., 1987, "The PISCES 2 Parallel Programming Environment", Proc. of IEEE Int. Conf. on Parallel Processing, 439.
7. Segall, Z., and Rudolph, L., 1985, IEEE Software, 22.
Figure 1: Programming Model (tasks on nodes i and j communicating via a mailbox, layered over RMK and the NX system calls)
Figure 2: Implementation Hierarchy (application, RMK, parts of NX / NX, hardware)
Figure 3: Host/Target Environment (host: VAXstation II with Ultrix; target: cube and XENIX 286 cube manager; other workstations on the network)
Figure 4: Mapping of Application (modules mapped onto the cube)