A Distributed Object-Oriented Framework for Tool Development
John M. Kewley and Radu Prodan
Centro Svizzero di Calcolo Scientifico (CSCS), Via Cantonale, CH-6928 Manno, Switzerland
email: [email protected], [email protected]
(Radu Prodan is a PhD student at the University of Basel.)
Abstract

In recent years there has been a substantial increase in the availability and quality of software engineering tools; such tools are invaluable in ensuring program correctness and identifying performance problems. The majority of these, however, do not interoperate and are available only on a limited set of platforms. We analyse such deficiencies and propose an extensible architecture for a distributed software engineering tool framework using CORBA object-oriented technology. The resulting framework provides a unified interface for parallel, distributed and single-processor systems, facilitates tool development, promotes tool interoperability, and can be extended by the integration of new tools and services. This flexibility is demonstrated by the specification of an extension to support the MPI programming paradigm and by a wide selection of tools that have been built using the system.
1 Introduction

As applications get larger and more complex, the use of software tools becomes vital for identifying performance problems and detecting program defects. There is now a wide choice of software tools available to developers on most development platforms. These tools, however, are typically only available on a small subset of platforms and do not interoperate. The porting of such tools is problematic due to the prevalence of platform dependencies: there may be reliance on the target machine architecture, the operating system and/or the compiler. Since there are no tool standards, each tool comes with its own functionality and graphical user interface. While the offered functionality may be similar, the tools can be substantially different. Moving from one system to another will usually require the tedious and time-consuming process of relearning the tools.

The development process is further complicated by the necessity for programmers to provide different compile and/or link options according to the tool that will be used to assess the behaviour of their software. This can be time consuming since, when a problem is found during the running of performance tests, its diagnosis will typically entail some recompilation (after disabling optimisation and enabling debugging, trace and/or memory
testing options). With such a substantial change to the object code, fixing the problem can be time-consuming, since its symptoms may become harder to reproduce. Such tools have no way of interoperating, since they must be used with effectively different executables. Integrated tool environments (e.g., [1]), which combine the functionality of several tools, do offer some degree of tool interworking. They are, however, closed to extension and limited to the proprietary facilities provided.

The FIRST project (Framework for Interoperable Resources, Services and Tools; supported by grant 21-49550.96 from the Swiss National Science Foundation), a collaboration between the University of Basel and the Swiss Center for Scientific Computing, defines an extensible framework (Sec. 2) based on the Common Object Request Broker Architecture (CORBA) for distributed portable software tools. This object-oriented tool framework supports an open set of high-level tool services (Sec. 2.3) which provide a machine-independent interface in the CORBA interface definition language (IDL). The framework enables portable, language-independent tool development. Several simple portable tools (Sec. 2.4) have been developed to verify the framework, and additional ones, based on the existing services, are planned. Their interoperability is achieved by the interaction of the tool services, which can serve multiple clients (i.e., end-user tools) and manipulate common application processes simultaneously. The use of CORBA for communication between the various distributed system components, together with the object-oriented approach, gives the system a high degree of extensibility (Sec. 2.3.2 and Sec. 3). The use of standard components (i.e., CORBA, the Portable Timing Routines and Dyninst) maximises its portability across UNIX platforms (Sec. 2.1).

This paper describes in detail the work of the FIRST project, building on and extending the ideas outlined in [9]. The generic framework is presented along with a wide selection of tools which make use of its facilities. The environment components are detailed carefully, as they are crucial to the framework's goals of environment extensibility and tool interoperability. An extension to support the MPI programming paradigm is described to demonstrate how FIRST achieves these goals.
2 Architectural Model

2.1 Approach

The FIRST approach follows the client-server tool model [4], in which platform dependencies are confined to the server and the clients are highly portable. The monitoring system [5] is the platform-dependent part; it attaches to an application and extracts run-time information using an instrumentation library. This library, which implements a portable standard API, confines the platform dependencies to the monitoring system, thereby increasing portability. Figure 1 shows the basic FIRST architectural model, which utilises CORBA for its inter-object communication.

The objective of the FIRST project is not to develop tools, but to provide a tool framework offering a set of generic, high-level tool services which enable simple, fast and effective tool development. Within FIRST, tools are clients which make use of the provided tool services. The issue of tool interoperability is resolved by ensuring that the FIRST services can interoperate, concurrently serving requests from multiple clients,
with multiple services manipulating the same application process. The power no longer resides in a monolithic front-end tool; instead it is distributed among many tools, with their interoperability providing additional synergy. Additionally, we do not attempt to provide a complete set of such services, but rather one that is extensible and open to further service integration and tool development.
[Figure 1. The Basic FIRST Architectural Model. Tools (Tool 1 ... Tool T) form the Tool Layer (TL); the Application Manager and the Aggregator form the Tool Services Layer (TSL); one Process Manager per machine (Machine 1 ... Machine M), each controlling the local application processes (Process 1 ... Process P/Q), forms the Process Monitoring Layer (PML). The diagram distinguishes control flow from data flow.]
The tool services are implemented as distributed CORBA objects which may serve multiple arbitrary clients, in particular the tools. These tool services publish their interfaces in the platform-independent CORBA interface definition language (IDL). This separation between specification and implementation makes the distributed components language-independent and significantly increases their extensibility.
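To make this separation concrete: an IDL interface compiles to language-specific stubs and skeletons, so a tool written against the specification never sees the implementation. The fragment below is a hedged sketch of what such a generated contract might look like on the C++ side; the operation names are our own illustration, not the project's actual IDL.

    // Minimal sketch (hypothetical names): the abstract C++ class an IDL
    // compiler might emit for a FIRST-style tool service. Tools program
    // against this specification; the implementation lives elsewhere.
    #include <string>

    class ToolService {
    public:
        virtual ~ToolService() {}
        // Insert a call counter at every entry of the named function and
        // return an identifier for later removal of the instrumentation.
        virtual long addCounter(const std::string& function) = 0;
        // Remove previously inserted instrumentation to minimise intrusion.
        virtual void removeInstrumentation(long id) = 0;
    };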
2.2 The Process Monitoring Layer (PML)

The Process Monitoring Layer interacts directly with application processes to add instrumentation which collects run-time information. The dynamic instrumentation library Dyninst, developed at the Universities of Wisconsin-Madison [6] and Maryland [3], is used by FIRST for adding run-time instrumentation. This has the advantage of permitting tools to attach to processes that are already running (the process does not have to be started under the control of the tool) and to insert instrumentation into processes without having to restart them (the process is only stopped for the minimum time needed to inject the code snippets). Additionally, this can be done for any binary executable regardless of which compilation or link options were used to build it (even one that is optimised, stripped, or proprietary with no source code available), and with minimal intrusion, since instrumentation is added only when and where needed. Since Dyninst can only perform instrumentation on a single processor, and in any case to avoid bottlenecks in a parallel system, there must be one Process Manager for each processor on which the application is running.
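As an illustration of the attach-and-inject style this enables, the sketch below counts entries to a function in an already running process. It follows the DyninstAPI programmer's guide [3] in spirit; exact class names and signatures vary between Dyninst releases, so treat it as a sketch rather than the PM's actual code.

    // Sketch: attach to a running process and count calls to compute().
    #include "BPatch.h"   // DyninstAPI; headers and names vary by release

    void countCalls(const char *path, int pid) {
        BPatch bpatch;

        // Attach without restarting; the process keeps its original binary.
        BPatch_thread *app = bpatch.attachProcess(path, pid);
        BPatch_image *image = app->getImage();

        // Locate the target function and its entry points.
        BPatch_Vector<BPatch_function *> funcs;
        image->findFunction("compute", funcs);
        BPatch_Vector<BPatch_point *> *entries = funcs[0]->findPoint(BPatch_entry);

        // Allocate a counter in the application's address space and build
        // the snippet: counter = counter + 1.
        BPatch_variableExpr *counter = app->malloc(*image->findType("int"));
        BPatch_arithExpr incr(BPatch_assign, *counter,
            BPatch_arithExpr(BPatch_plus, *counter, BPatch_constExpr(1)));

        // Inject; the process is paused only while the snippet is inserted.
        app->insertSnippet(incr, *entries);
        app->continueExecution();
    }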
2.2.1 Process Manager (PM)

The PM is a lightweight process in charge of controlling and instrumenting application processes on a single host. It serves requests automatically, and there is no direct communication between PMs. The only exception is the MPI [8] extension (Sec. 3), in which one PM, in order to create a remote process, delegates the responsibility for performing this task to the PM on the corresponding host. A PM is implemented by four threads and supports services in the five categories shown in Fig. 2 and described below.
[Figure 2. The Process Manager Architecture. Tools and the Aggregator invoke the PM's Information, Manipulation, Notification, Trace, and Data Collection Services; the PM controls the application process via signals and the Run-Time Instrumentation library (RTI), with counters and timers held in shared memory and trace data written out for collection.]
Information Services provide static and dynamic information about the running application. Static information includes the object code structure of a process, while dynamic information includes the current value of a given variable.

Manipulation Services provide facilities for inserting additional instructions into a process's object code so that information about its execution may be gathered. This manipulation of running processes is achieved using the Run-Time Instrumentation library (RTI), a shared library dynamically loaded by the PM into the (unmodified) application at run-time. The PM creates instrumentation objects, with which it manipulates the execution of an application process by injecting the corresponding instrumentation into the already running process. The instrumentation objects can be of the following types:

- Counters, characterised by the instrumentation point to which they are attached, the run-time phase in which they were started (i.e., start time), and their value;
- Timers, characterised by their type (i.e., wall, user or system), a set of start/resume points, a set of stop points, a phase, and a current value; counters and timers are collectively known as performance data, which is stored in memory shared between the PM and the application process;
- Traces, characterised by the instrumentation point which, when reached, generates the data (via the Trace Service);
- Notifications, which report (via the Notification Service; see below) important events, such as the reaching of an instrumentation point, a change in the application's status (i.e., stop, resume, exit), the reaching of a breakpoint, the creation of a new process, or the loading of a shared library.

When instrumentation is requested, the PM first checks whether it already exists inside the process as a result of previous activity. If so, it reuses it, thus avoiding instrumentation redundancy and minimising intrusion (also known as the probe effect).
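This reuse check amounts to a lookup keyed on what instrumentation was requested and where. The sketch below is our own illustration of the idea, not the PM's actual data structure:

    // Hypothetical sketch of the PM's instrumentation-reuse check: a request
    // is identified by its instrumentation point and kind; an existing object
    // is handed back instead of injecting a second, redundant probe.
    #include <map>
    #include <string>
    #include <utility>

    class InstrCache {
        // (point, kind) -> id of the instrumentation already in the process
        std::map<std::pair<std::string, std::string>, long> existing;
    public:
        long getOrCreate(const std::string& point, const std::string& kind,
                         long (*inject)(const std::string&, const std::string&)) {
            std::pair<std::string, std::string> key(point, kind);
            std::map<std::pair<std::string, std::string>, long>::iterator it =
                existing.find(key);
            if (it != existing.end())
                return it->second;          // reuse: no new snippet, no extra intrusion
            long id = inject(point, kind);  // first request: inject the snippet
            existing[key] = id;
            return id;
        }
    };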
Notification Services detect certain events during the application's execution and inform the tools when they occur. They use callbacks from the PM directly to the tool (which must implement the service as a CORBA object). There are four types of notification handled by the framework: arrival at an instrumentation point, loading/unloading of a shared library, process fork/exit, and status change (i.e., process resumed, stopped, terminated, detached).

Trace Services generate lightweight trace data associated with certain instrumentation points. As trace data will in general be infeasibly large to store in shared memory, it is periodically appended to a file (fifo), from where it can be collected by the PM (albeit less efficiently than performance data).

Data Collection Services sample performance data (i.e., counters and timers) from the application processes and forward it either to the Aggregator (Sec. 2.3.2) or to the tool directly, using an asynchronous (oneway) CORBA callback.

The PM is the only platform-dependent part of the system. By using standard software components to handle the majority of the platform dependencies, a high degree of portability is achieved. CORBA provides portable middleware and some degree of language independence. The dynamic instrumentation library Dyninst [3] provides a standard API on a number of platforms for inserting code into an already running program. The Portable Timing Routines (PTR) produced by the Parallel Tools Consortium provide an efficient and portable timing interface [10]. The remaining platform dependencies are handled using UNIX system calls (shared memory, pipes, and signals). The PM was initially developed on SPARC-Solaris 2.6, and the port to DIGITAL Alpha was achieved without problems. Ports to MIPS Irix and x86 Linux are planned and will further test the portability.
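To round off the picture of the PM's services, here is a sketch of the tool-side notification callback described above. The type and event names are hypothetical stand-ins for the project's IDL, which is not reproduced here; in FIRST this would be a CORBA object whose method the PM invokes when an event occurs.

    // Hypothetical tool-side notification callback (sketch only).
    #include <iostream>

    enum NotifEvent {
        POINT_REACHED,     // arrival at an instrumentation point
        LIBRARY_EVENT,     // loading/unloading of a shared library
        FORK_OR_EXIT,      // process fork/exit
        STATUS_CHANGE      // resumed, stopped, terminated, detached
    };

    class Notif {
    public:
        virtual ~Notif() {}
        // Invoked by the PM when an event occurs in process `pid`.
        virtual void notify(NotifEvent event, long pid) {
            std::cout << "event " << event << " in process " << pid << std::endl;
        }
    };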
2.3 The Tool Services Layer (TSL)

The FIRST approach focuses on the design and development of tool services, rather than tools. These tool services are:

- platform-independent, enabling portable tool development;
- independent of programming languages/libraries, allowing them to apply to a wide class of applications and parallel programming models;
- open for extension, permitting new services to be integrated or existing services to be specialised;
- able to instrument common application processes and serve multiple arbitrary clients, thereby allowing tools to interoperate.

There are two types of tool service: Standard Services, which are fundamental to tool development and are grouped within the Application Manager (AM), and Auxiliary Services, which are not required by all tools but may facilitate the development of certain types of tool. Auxiliary services include the Aggregator (Sec. 2.3.2) and the extension for the MPI parallel programming paradigm (Sec. 3).
2.3.1 Application Manager (AM)

Unlike the PM, which defines low-level instrumentation abstractions, the AM defines high-level tool services which provide the following functionality:
- Information Functions are based on the PM's Information Services and include the retrieval of the application's object code structure or the inspection of the value of a variable.
- Performance Metrics are based on the PM's Manipulation Services, but operate at a higher level of abstraction: e.g., counting the number of function calls, computing a function's execution time, or counting the number of bytes passed in a function parameter. From these one can build more useful metric specialisations, such as the number of messages sent, the number of I/O operations, the time spent in communication, the time spent in I/O operations, and the number of bytes sent/received/involved in I/O operations (see the MPI extension, Sec. 3).
- Function Trace requests that a system, user or library function's entries, exits and calls be logged.
- Breakpoints request the insertion of normal or conditional breakpoints.
- Notifications request that the tool be notified when certain PM-defined events occur in the application.
2.3.2 The Aggregator (Agg)

When dealing with parallel applications, the first step in processing the collected performance data frequently requires aggregation, to summarise the data for better understanding. To meet this requirement, an additional service, the Aggregator (Agg), was created. The Aggregator is a CORBA service that takes large amounts of data and, through the use of a chosen aggregation function, reduces it to more manageable quantities. It can be used independently, since it has no dependency upon other FIRST components. The Aggregator supports aggregation over time or across processors and provides a variety of functions, including mean, total, variance, count, max and min. Other aggregations will be implemented as required; due to the object-oriented approach taken, additional aggregations are relatively trivial to implement. The use of the Aggregator is optional: a simple debugger, for instance, is unlikely to make use of its facilities, whereas a performance monitor collecting large quantities of data probably would.
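The claim that new aggregations are cheap to add follows directly from the object-oriented structure: each aggregation is one small subclass of a common interface. A hedged sketch (ours, not the Aggregator's actual classes):

    // Sketch: one interface, one small subclass per aggregation function.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    class AggFunction {
    public:
        virtual ~AggFunction() {}
        // Reduce one batch of samples (over time or across processors).
        virtual double reduce(const std::vector<double>& s) const = 0;
    };

    class Total : public AggFunction {
    public:
        double reduce(const std::vector<double>& s) const {
            double t = 0.0;
            for (std::size_t i = 0; i < s.size(); ++i) t += s[i];
            return t;
        }
    };

    class Mean : public AggFunction {
    public:
        double reduce(const std::vector<double>& s) const {
            return s.empty() ? 0.0 : Total().reduce(s) / s.size();
        }
    };

    // Adding max, min, variance, count, ... means adding one class each;
    // nothing else in the service changes.
    class Max : public AggFunction {
    public:
        double reduce(const std::vector<double>& s) const {
            return s.empty() ? 0.0 : *std::max_element(s.begin(), s.end());
        }
    };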
2.4 The Tool Layer (TL)

The objective of the FIRST project is to produce a tool framework together with some exemplar tools that demonstrate the applicability of the approach. The production of richly featured software tools is therefore out of scope. Instead, a set of simple, easy-to-use tools has been developed. The intention is that they are easy to use, similar in look and feel, portable across different platforms, and interoperable, and yet cover a variety of basic profiling and debugging scenarios that a software developer could encounter.
[Figure 3. The Tool Structure. A tool uses the Application Manager's Information, Manipulation and Notification Services, together with Data Collection & Visualisation Services fed by the Aggregator and the Process Managers.]
Figure 3 shows the general structure of a FIRST tool. The high-level services available at the TSL layer and the use of CORBA to bind objects and exploit their services make the development of tools comparatively simple. The tool must perform the following tasks:
1. Initialise the ORB;
2. Connect to the AM, using the CORBA Naming Service;
3. Attach to or create the required application;
4. Create a Notif CORBA object to handle notifications from the application process;
5. Create a DataCol CORBA object to handle the run-time data provided by the Aggregator (or directly by the PM's Data Collector);
6. Instrument the application as needed, using the high-level manipulation services available at the AM level; pass as a parameter the DataCol object to which performance/trace data is to be sent; tell the AM of any data aggregation that is required;
7. Remove instrumentation when no longer needed, to minimise intrusion;
8. Terminate or detach from the application and quit.

(A minimal client skeleton following these steps is sketched at the end of this subsection.)

To demonstrate the flexibility of the framework and to highlight the ease with which FIRST tools can be written, the following tools have been implemented:

- Object Code Browser (OCB): a graphical tool with the following functionality: create/attach processes, spawn a process over a number of nodes, create an MPI application, display the object code structure of an SPMD application, and select resources (i.e., processes, modules and functions) on which subsequent tools will operate.
- 1st top (inspired by the UNIX administration tool top): dynamically displays the top n functions in terms of the number of times they were called, or in terms of execution time.
- 1st trace: traces the functions executed by the application as it runs. It can be used for user functions, system functions (in the style of the UNIX software tool truss), or a subset of both.
- 1st prof: like the UNIX tool prof, times and counts function calls.
- 1st cov: like the UNIX tool tcov, produces a test coverage analysis. There are two versions of this tool: one counts how many times each instrumentation point is reached; the other simply marks each instrumentation point when hit and immediately removes the corresponding instrumentation snippets (thus reducing the probe effect).
- 1st debug: a simple command-line debugger (in the style of gdb or dbx) that can control processes, extract the object structure of the application, view variables, add breakpoints, and replace function calls (useful in performance steering for changing algorithms).
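As promised above, the skeleton below walks through the eight steps for a hypothetical profiling tool. All FIRST-side names (AppManager, functionTime, aggregate, and so on) are illustrative stand-ins for the project's IDL, and the CORBA boilerplate (POA activation, Naming Service resolution, supporting declarations) is elided; this is a sketch of the shape of a tool, not working FIRST code.

    // Hypothetical FIRST tool skeleton following steps 1-8 above.
    int main(int argc, char **argv) {
        // 1. Initialise the ORB.
        CORBA::ORB_var orb = CORBA::ORB_init(argc, argv);

        // 2. Connect to the AM via the CORBA Naming Service (stand-in call).
        AppManager_var am = resolveAppManager(orb);

        // 3. Attach to (or create) the required application.
        Application_var app = am->attach("host", /* pid = */ 1234);

        // 4./5. Create the callback objects: Notif receives events,
        //       DataCol receives sampled counters and timers.
        NotifImpl notif;
        DataColImpl dataCol;

        // 6. Instrument: time compute(), with per-process values averaged
        //    by the Aggregator and delivered to dataCol.
        long id = am->functionTime(app, "compute", dataCol._this());
        am->aggregate(id, "mean");

        // ... let the application run; display the data as it arrives ...

        // 7. Remove instrumentation when no longer needed.
        am->removeInstrumentation(id);

        // 8. Detach from the application and quit.
        am->detach(app);
        orb->destroy();
        return 0;
    }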
3 Extension for MPI Applications

3.1 Motivation

FIRST was originally designed as a generic tool environment, not specialised for a particular parallel programming paradigm. Parallel and distributed applications are seen as a collection of processes running on a set of machines and cooperating by exchanging messages using special message-passing libraries. The generic metrics available at the AM level can then be instantiated for the corresponding communication library functions to enable application tuning at the communication layer (Sec. 2.3). Despite the opportunity to provide unified views of various abstractions, integration of different parallel and distributed programming models inevitably requires further environment extensions, as each standard has its own limitations. For example, the MPI standard provides a standard means of communication between processes, but does not standardise the creation of parallel/distributed applications. To overcome this lack of specification, a number of MPI-1 implementations adopted a common approach and defined the mpirun command, which runs instances of an MPI program over a set of machines. However, the specification is implementation-dependent and is constrained to the SPMD programming paradigm. For FIRST, a universal MPI application start-up therefore becomes impossible at present.

(The MPI-2 standard [7] attempts to rectify this omission by specifying a new command, called mpiexec in order to avoid confusion with the existing practice of using the non-standard mpirun, as a recommendation rather than a requirement. However, the specification is incomplete and has been kept to minimal functionality; this is, according to the MPI Forum, because the range of possible environments is so wide (e.g., there may not even be a command line interface) that MPI cannot mandate such a mechanism.)

In order to validate the concepts, the MPICH [2] implementation was chosen for the following reasons: it is widely and freely available, has been ported to a number of platforms, and offers a rich mpirun command which provides the functionality needed for running and manipulating MPI applications under Dyninst control.
3.2 Application Start-up

The main technical problem to solve for controlling an MPI application with FIRST is the identification of the processes involved. This requires information about the host, path and process id of each process, so that they can be either created or attached to. The tool requests that the AM create an MPI application by invoking the mpirun command (as specified by the MPI implementation). The AM chooses a master host from the list of machines available and forwards the request to the PM running on that host, which will coordinate the activities that follow. The PM adds the -t flag (for testing) to the command's arguments and executes it. As a result, it receives what the mpirun command would normally have executed, i.e., a list of the machines with the associated MPI processes to be started. The last element in the list is the command that has to be run first on the master host; this command will then spawn the other processes above it in the list. The PM adds the -p4norem flag to the command, which prevents the master process from starting the slave processes and instead makes it list the commands the user must run manually. The master command is run either on the local host, or on a remote host if -nolocal is specified. The PM captures the master's output (via a pipe), from where it reads how to run the slave processes under its control. The PM creates a remote process by binding to the PM running on the corresponding host; it then delegates to it the task of creating and managing the process (the only case of direct communication between PM servers).

The effectiveness of the FIRST approach is highlighted by its original solution to the following buffered I/O problem: since the master buffers its output when using printf, and the PM redirects that output to a pipe, no output can be received until the buffer is explicitly flushed. Rather than modifying the source code to flush the buffer and then rebuilding the whole MPICH library, the PM forces the flush at run-time by dynamically inserting a call to fflush(stdout) immediately after the offending printf call site. This way, the implementation works with an original, unmodified MPICH library.
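In Dyninst terms, the forced flush can be sketched as follows. Note one deliberate simplification: the paper inserts fflush(stdout), but resolving the process's stdout object inside a snippet is implementation-specific, so this sketch calls fflush(0), which by the C standard flushes all open output streams and serves the same purpose here. `app` and `image` are assumed to be set up as in Sec. 2.2, and `printfSite` stands for the already-located call point of the offending printf.

    // Sketch: inject a flush immediately after the master's printf call site,
    // so its buffered output reaches the PM's pipe without rebuilding MPICH.
    BPatch_Vector<BPatch_function *> flushFns;
    image->findFunction("fflush", flushFns);

    // Build the call fflush(NULL): a null stream argument flushes every
    // open output stream, sidestepping the need to reference stdout itself.
    BPatch_Vector<BPatch_snippet *> args;
    BPatch_constExpr nullStream(0);
    args.push_back(&nullStream);
    BPatch_funcCallExpr flushCall(*flushFns[0], args);

    // Insert after the call point; the MPICH library itself stays unmodified.
    app->insertSnippet(flushCall, *printfSite, BPatch_callAfter);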
[Figure 4. Running an MPI(CH) Application. The AM forwards a create-MPI request to the Process Manager on the master machine (Machine 1), which runs the master command; to run a slave command, it binds to the Process Manager on the slave machine (Machine 2) and delegates the creation, and the slave process acknowledges its creation to the master.]
Both master and slave processes must be resumed so that the above can complete as described (since, in MPI_Init, the slaves acknowledge their creation to the master). As most of the tools require that the application be halted immediately after it is run, a breakpoint is inserted at the end of the MPI_Init function of each process to ensure this. A call to PMPI_Comm_rank is dynamically inserted before this breakpoint in order to retrieve each MPI process identifier (within the MPI_COMM_WORLD communicator). Dynamic instrumentation of the new MPI_Comm_spawn and MPI_Comm_spawn_multiple routines, required by the MPMD programming model and defined in the MPI-2 standard, will allow newly created MPI processes to be discovered and instrumented. These are not yet implemented by MPICH, however, and remain future work.
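In Dyninst terms this step can be pictured as below; again a sketch only, since signatures vary between releases, and the construction of the PMPI_Comm_rank arguments (which must reference MPI_COMM_WORLD and the rank variable inside the process) is elided as implementation-specific.

    // Sketch: at the end of MPI_Init, first record the process's rank,
    // then halt it with a breakpoint so tools find it stopped.
    BPatch_Vector<BPatch_function *> inits, rankFns;
    image->findFunction("MPI_Init", inits);
    image->findFunction("PMPI_Comm_rank", rankFns);
    BPatch_Vector<BPatch_point *> *exits = inits[0]->findPoint(BPatch_exit);

    BPatch_Vector<BPatch_snippet *> args;   // MPI_COMM_WORLD, &rank (elided)
    BPatch_funcCallExpr getRank(*rankFns[0], args);
    BPatch_breakPointExpr stop;             // the breakpoint snippet (cf. Fig. 5)

    BPatch_Vector<BPatch_snippet *> seq;
    seq.push_back(&getRank);
    seq.push_back(&stop);
    app->insertSnippet(BPatch_sequence(seq), *exits);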
3.3 Profiling

The dynamic instrumentation technology enables the framework to profile MPI library calls with ease. The profiling and tracing operations available at the AM (and PM) level can be used for MPI library calls in the same way that they are used for user-defined (or system) functions. Therefore, the profiling interface defined by the MPI standard is of no use to FIRST: it is sufficient to apply the profiling and tracing operations to the PMPI_-prefixed calls directly, without using the MPI_-prefixed wrappers. Furthermore, apart from MPI application creation, which unfortunately is not fully standardised (Sec. 3.2), all the metrics and tools developed can also be used for MPI applications. A new metric to calculate the size of a parameter passed to a function has also been added (Fig. 5); this can be used to determine the communication message size.
4 Conclusions

The paper presents a distributed object-oriented framework for tool development, which has a number of results and benefits. An open set of high-level tool services, which significantly ease tool development, has been defined. The definition of these services in CORBA IDL, separating the specification from the implementation, makes the distributed system components (in particular the tools) language-independent. The interface to the services is platform-independent, thus enabling portable tool development. The tool services can be used concurrently by multiple clients, enabling tool interoperability. By using the FIRST framework, a wide set of interoperable performance and debugging tools has been implemented.
[Figure 5. Dynamic MPI Library Profiling. The Process Manager allocates (mallocs) the profiling variables in the MPI process's address space, shared with the PM, and injects snippets around the proprietary implementations of PMPI_Init and PMPI_Send. Reconstructed listing:]

    /* Variables allocated by the Process Manager in the MPI process: */
    int rank, calls = 0, bytes = 0, datasize;
    Timer timer;

    int PMPI_Init(int argc, char **argv) {
        /* ... proprietary implementation ... */
        /* inserted snippet: record the rank, then halt the process */
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        BPatch_breakPointExpr();
    }

    int PMPI_Send(void *buff, int count, MPI_Datatype datatype,
                  int dest, int tag, MPI_Comm comm) {
        /* inserted snippet: count the call and the bytes sent, start timing */
        calls++;
        PMPI_Type_size(datatype, &datasize);
        bytes += count * datasize;
        timer.start();
        /* ... proprietary implementation ... */
        /* inserted snippet: stop timing */
        timer.stop();
    }
References

[1] Christian Clémençon, Akiyoshi Endo, Josef Fritscher, Andreas Müller, Roland Rühl, and Brian J. N. Wylie. Annai: An Integrated Parallel Programming Environment for Multicomputers. In Amr Zaky and Ted Lewis, editors, Tools and Environments for Parallel and Distributed Systems, volume 2 of Kluwer International Series in Software Engineering, chapter 2, pages 33-59. Kluwer Academic Publishers, February 1996.
[2] William Gropp, Ewing Lusk, Nathan Doss, and Anthony Skjellum. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard. User's guide, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois 60439-4844, USA, 1997.
[3] Jeffrey K. Hollingsworth and Bryan Buck. DyninstAPI Programmer's Guide. Manual, University of Maryland, College Park, MD 20742, September 1998.
[4] Robert Hood. The p2d2 Project: Building a Portable Distributed Debugger. In Proceedings of the 1st SIGMETRICS Symposium on Parallel and Distributed Tools (SPDT'96, Philadelphia, PA, USA). ACM Press, May 1996.
[5] Thomas Ludwig and Roland Wismüller. OMIS 2.0 - A Universal Interface for Monitoring Systems. In M. Bubak, J. Dongarra, and J. Wasniewski, editors, Proceedings of the 4th European PVM/MPI Users' Group Meeting, pages 267-276. Springer Verlag, May 1997.
[6] Barton P. Miller, R. Bruce Irvin, Mark D. Callaghan, Jonathan M. Cargille, Jeffrey K. Hollingsworth, Karen L. Karavanic, Krishna Kunchithapadam, and Tia Newhall. The Paradyn Parallel Performance Measurement Tool. IEEE Computer, 28(11):37-46, November 1995.
[7] MPIF. MPI-2: Extensions to the Message-Passing Interface. Technical report, The Message Passing Interface Forum, University of Tennessee, Knoxville, Tennessee, July 1997. http://www.mpi-forum.org/.
[8] MPIF (Message Passing Interface Forum). MPI: A Message-Passing Interface Standard. International Journal of Supercomputer Applications, 8(3&4):157-416, 1994. http://www.mpi-forum.org/.
[9] Radu Prodan and John M. Kewley. FIRST: A Framework for Interoperable Resources, Services, and Tools. In H. R. Arabnia, editor, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (Las Vegas, Nevada, USA), volume 4. CSREA Press, June 1999.
[10] The Parallel Tools Consortium. Portable Timing Routines. http://www.ptools.org/projects/ptr/.