PDG : A Process-Level Debugger for Concurrent Programs in the GRAPE Parallel Programming Environment
Chris Caerts, Rudy Lauwereins and J.A. Peperstraete
Katholieke Universiteit Leuven, E.S.A.T. Laboratory
Kard. Mercierlaan 94, B-3001 Heverlee, Belgium
Tel: +32-16-22.09.31  Fax: +32-16-22.18.55
Abstract
In this paper, we describe the process-level debugger of GRAPE, our hierarchical graphical programming environment for concurrent programs. Its unique feature is that it clearly separates the identification of erroneous processes, which we call process-level debugging, from the exact localisation of the bug at the source level. This divide-and-conquer approach is essential for debugging complex parallel programs quickly and systematically. Our process-level debugging approach is based on an animation of the program's behaviour on its hierarchical graphical representations. The graphical views used reflect the programmer's mental picture of the actual application. Hierarchy allows us to employ a top-down debugging approach in which we successively refine the search space by zooming in on suspect processes, without trial and error. During animation, a debugging kernel implementing a record-replay mechanism guarantees reproducible program behaviour.
1. Introduction
Debugging concurrent programs is considerably more complex than traditional sequential debugging. A first reason is that multiple threads of control (processes) execute concurrently. Consequently, we must identify not only the statement (or instruction) that causes erroneous behaviour, but also, and first of all, the process that contains this statement. This is what we call process-level debugging.
The fact that processes interact (asynchronously) with each other further complicates the debugging task. Indeed, this not only gives rise to a new class of bugs (such as races and deadlocks), but also causes faults to spread rapidly, resulting in the so-called domino effect. Therefore, if we observe that a process does not behave the way it should, this misbehaviour may be caused by the process having received incorrect data rather than by a bug in the process itself.
Another major problem one encounters while debugging concurrent programs is that asynchronous interactions, whether intended or not, make it difficult to guarantee reproducible program behaviour. This hinders the recreation of error conditions during debugging, since any interference may significantly change non-deterministic program behaviour [1][2][3]. In this paper, we present a debugging tool that tackles all of the above-mentioned problems. By animating the interactions between processes, it provides a means to identify erroneous processes quickly and systematically. The ability to identify anomalous behaviour is considerably enhanced
by the use of hierarchical application-specific graphical program representations, and by a data-visualisation mechanism that makes it easy to display the data exchanged between processes in a (user-specified) appropriate way.
In section two, we give an overview of parallel debugging techniques and demonstrate that complex parallel programs can only be debugged with a process-level debugger. Section three briefly describes the GRAPE programming environment, which provides the hierarchical graphical program representations needed for our process-level debugging approach, described in section four. In section five, we illustrate our approach with a sample debugging session. Section six describes our run-time distributed debugging kernel, which implements a record-replay mechanism, and section seven concludes.
2. Overview of parallel debugging techniques
In this section, we give a short overview of parallel debugging techniques. In section 2.1 we start with source-level debuggers, and show that they can be used efficiently only if a small number of suspect processes has already been identified. For this identification task, however, it is advisable to abstract from the exact inner workings of processes and to focus on higher-level information : the interactions between processes. This is done with what we call process-level (or interaction-level) debuggers. In section 2.2, we give an overview of this class of debuggers and indicate where PDG differs from similar tools.
2.1. Source-level debuggers
Source-level debuggers, such as [4][5][6], are very effective for detecting and fixing bugs in a (single) process because they allow one to examine its state information and data values while following its thread of control. Their effectiveness, however, strongly depends on the appropriateness of the selection of the process under study.
No assistance is given for this selection, however, so one is forced to tackle this problem in an ad-hoc manner : one can either sprinkle the processes with debug statements, or rely on one's intuition to select suspect processes. Either way, suspect processes are likely to turn out, after closer examination, to be affected by the domino effect : they do not contain a bug, but misbehave because they received incorrect data from other processes. In this case, one can start all over again by selecting another suspect process, for which the debugger gives no hints, or one can try to backtrack to the process that actually contains the bug by traversing anomalous interactions (e.g. interactions that transfer wrong data or are blocked) in reverse order. This means one has to shift one's focus of attention from process to process until one eventually reaches the process that actually contains the bug. Clearly, many irrelevant processes and interactions (i.e. ones merely affected by the domino effect) may have to be examined this way (see figure 1). In every process one encounters, one must examine both its calculations, to check whether it contains the bug, and its interactions, to determine to which process the focus of attention should shift next. This tedious task can be alleviated by representing syntactical entities graphically [7][8]. This conveniently introduces hierarchy at the source level. This hierarchy, however, just like library calls and subroutines, is confined to a single process.
Figure 1 : With source-level debuggers, the identification of erroneous processes requires a tedious backtracking of erroneous interactions. This way, many irrelevant processes, affected by the domino effect, have to be examined.
2.2. Process-level debuggers
Process-level debuggers abstract from the inner workings of processes, which are treated as black boxes that interact with each other. This black-box representation focuses attention on the interactions between processes. It then becomes easy to introduce the time aspect by animating these interactions. This way, one can identify the first process that generates erroneous data, or the first processes that fail to communicate with each other. This solves the domino effect problem : the earliest anomaly in time cannot be the consequence of an earlier anomaly, so the process responsible for it must contain a bug.
In addition, processes that functionally belong together can be combined into virtual processes. Unlike at the source level, the hierarchy introduced this way makes it possible to gradually reduce the search space, so one is no longer confronted with all processes and all interactions at once. On each hierarchical level, only a limited number of interactions between a limited number of processes must be examined. This way, it is possible to zoom in on the erroneous (virtual) process quickly and systematically (see figure 2).
We can distinguish between process-level debuggers that animate on abstract standard views [9][10], and debuggers that use views that reflect the actual structure of the algorithm at hand [11][12]. The former have the advantage of removing the need to build application-specific graphical representations : whatever the algorithm, the same graphical representation can be used. Often it consists of an x-axis representing time, with processes depicted as horizontal bands (colour or hatching indicating status), and interactions between processes depicted as vertical arrows (see figure 3.a). Because of this generality, however, their views bear little resemblance to the actual program structure. This makes them poorly suited for debugging complex interactions (for instance in distributed control or real-time applications). They are, however, very suitable for applications containing fork/join parallelism. Debuggers that use application-specific graphical views, such as our PDG debugger (see figure 3.b), are mostly integrated into a graphical programming environment. Their (hierarchical) graphical representations are constructed by the programmer himself while developing his program and directly reflect the programmer's mental picture of the application. Re-using them for the animation considerably facilitates the identification of anomalous behaviour.
Figure 2 : With process-level debuggers, erroneous processes can be identified by examining the interactions between the processes that are depicted as hierarchical black boxes.
Figure 3 : Generic (a) and application-specific (b) view on interactions between processes.
The distinctive feature of PDG [11] is its all-encompassing approach. Thanks to its integration in the GRAPE programming environment [13], hierarchical graphical representations that reflect the programmer's mental picture of the application at hand are readily available. Its animation tool not only visualises program behaviour, but also acts as a powerful graphical front-end from which it is possible to zoom in on virtual processes, set breakpoints at, or single-step through, interactions, etc. To do so, it interacts with a run-time debugging kernel implementing a record-replay mechanism. The animation tool and the debugging kernel are coupled with a powerful data-visualisation tool that makes it possible to visualise interaction data in a (user-specified) appropriate way by simply pointing at and clicking on the corresponding connection in the graphical view. Both the animation front-end and the debugging kernel were developed with portability in mind and are available on multiple platforms.
3. The GRAPE programming environment
GRAPE, our GRAphical Programming Environment, integrates all tools for program development, from specification and code generation up to monitoring and debugging [13]. It supports the hierarchical mixed graphical-textual specification of parallel programs (see figure 4).
On the highest abstraction level, a program is composed of coarse-grain building blocks (virtual processes) containing ports (terminals) via which they interact with each other. This is depicted graphically by connecting the appropriate terminals of the respective blocks. These graphical representations reveal important characteristics of the application (such as pipelined operation or split/merge parallelism). Each of these graphical blocks can be elaborated further on lower abstraction levels : by zooming in on a block, we can specify out of which finer-grain blocks it is composed and how these interact with each other. This zooming in can be repeated until the complexity becomes manageable. At this point, we switch to a textual editor to specify the actual source code of the process in a conventional programming language (or turn to a hardware implementation).
Developing parallel programs this way has many advantages. First of all, successive refinement is an effective means to master complexity. Moreover, graphical representations appear to be a natural way to express the behaviour of algorithms composed of communicating processes. Re-using these graphical views for debugging purposes considerably facilitates the interpretation of the animation results because they reflect the mental picture the programmer has in mind. Therefore, deviations from the intended behaviour can readily be detected.
Figure 4 : Hierarchical mixed graphical-textual program representation in GRAPE : zooming in on virtual process G reveals that it is composed of (virtual) processes G1..G5. Zooming in on process G2 brings us to the source level.
4. PDG debugging approach
PDG assists the programmer in the first stage of debugging, when little or nothing is known about the source of the bug. It offers the programmer a powerful framework to identify suspect processes systematically, rather than by intuition and assumptions. For this, an animation of the interactions between processes is used. Animation is used because it tackles the domino effect problem by introducing the time aspect, which makes it possible to identify the first process that behaves anomalously. This process clearly must contain a bug. Attention is focused on interactions between processes because this allows us to detect not only communication and synchronisation errors, but also calculation errors, by examining the input/output relationship of processes (e.g. by examining the values of the data exchanged between processes). Therefore, no source-level information is required to identify erroneous processes, provided we know how processes should behave and can detect deviations from this intended behaviour. To facilitate this, three special features are introduced :
1. The animation is performed on application-specific graphical representations that are constructed by the programmer himself during program development. Therefore, they clearly indicate what a particular process is supposed to do (e.g. edge detection) and how processes are supposed to interact (e.g. continuously take input from two processes, then output to a third process). Erroneous behaviour can be detected much more easily this way than with abstract standard views (see figure 3).
2. One can easily specify which events are of interest (e.g. waiting for data, testing of a firing rule, busy) and how these particular events should be visualised (using colour, hatching, line style, geometry, text, icons, ...). It suffices to modify a text file called the Display Description File (see figure 5.a); no reprogramming is needed.
3. A powerful data-visualisation mechanism is provided that visualises interaction data in the way most appropriate for the programmer. He can specify this for each interaction between (virtual) processes by associating, during program development, a visualisation tag with each graphical connection. This way, complex or huge data sets can be examined easily. Similar capabilities are offered by [14][15][16][17]. Our data-visualisation tool differs from these, however, by its 'open' approach and its ease of use : it offers a skeleton into which commercial or third-party visualisation packages can be incorporated (see figure 5.b).
(a) Extract from Display Description File.
    ...
    TERMINAL
        shape ...
        color red    (term $== "waiting") ||
                     (term $== "communicating")
              green  (term $== "idle")
        ...
    CONNECTION
        ...
        color red    (conn $== "active")
              green  (conn $== "idle")
        ...
    ...
(b) Sample data-visualisation skeleton.
    #include "datatype.h"   /* contains all definitions */

    int main(int argc, char **argv)
    {
        /* local variable definitions */
        /* initialisation of connection with kernel and data */
        for (;;) {
            recv(from_kernel, &tag, sizeof(int));
            recv(from_kernel, &datasize, sizeof(int));
            switch (tag) {
            case FLOATVAL : {
                recv(from_kernel, &floatval, datasize);
                disp_float(floatval);
                break;
            }
            case I_IMAGE : {
                recv(from_kernel, (char*)i_im, datasize);
                display_i(dev, i_im, " ", 40, 40);
                break;
            }
            default : /* Error Handling */
                ...
            }
        }
    }
Figure 5 : By providing a user-adaptable Display Description File and a powerful data-visualisation mechanism, PDG considerably enhances the ability to detect anomalies.
If hierarchical representations are available, we can gradually reduce the search space. In this case, we initially focus attention on the interactions between the (small number of) highest-level virtual processes of the application. Once a suspect virtual process has been identified, we focus attention on the interactions between its component (virtual) processes. All processes and interactions contained in the other highest-level virtual processes are irrelevant and need not be examined. This way, we can zoom in on erroneous processes quickly and systematically.
5. Sample debugging session
In this section, we illustrate our debugging approach by means of a sample application consisting of three functional units (see figure 6) : an Image-Enhancer unit, an Edge-Detector unit and an Object-Recognizer unit. To keep up with the incoming data stream, these three modules operate in a pipelined fashion. The Edge-Detector unit consists of two concurrent Sobel operators (one for the x-axis and one for the y-axis) followed by a squaring unit, after which the results for the two axes are combined. We can express this graphically by zooming in on the Edge-Detector unit. Each Sobel operator can in turn be elaborated as a pipeline consisting of a differentiation followed by a smoothing. Further speedup could have been achieved by exploiting data parallelism, but we did not do so in order not to overload this example. The Image-Enhancer and Object-Recognizer units could also have been elaborated graphically, but this is not relevant to our discussion.
When we run this application, providing it with a stream of input images, we observe that no objects are recognised by the Object-Recognizer unit. This might indicate that something is wrong with this unit, and we might be tempted to start looking for the bug there. This is somewhat speculative, however. Therefore, we start by examining the interactions between the virtual processes at the highest abstraction level, and try to refine the search space systematically without making hasty assumptions.
Figure 6 : Hierarchical top-down exploration of a sample application. Derivator_Y and Smoother_X are source-level processes and cannot be elaborated graphically.
To this end, we set breakpoints at the inputs of the virtual processes Edge-Detector and Object-Recognizer. This is done by clicking on the graphical representations of these terminals in the animation front-end, which instructs the distributed debugging kernel to set the breakpoints. When the kernel acknowledges that the breakpoints have been set, this is visualised in the animation front-end by a grey circle at the respective terminals. Next, we can instruct the kernel to single-step through interactions (visible as connections between graphical blocks in the animation front-end). The kernel informs the animation front-end about the occurrence of any event of interest, as specified in the Display Description File, and then blocks the execution of the application until we tell it to proceed. How these significant events must be visualised has also been specified in the Display Description File. For instance, in this example, when both the sending and the receiving process of a given interaction are ready, this is visualised by colouring the corresponding terminals black and by thickening the corresponding connection. If a breakpoint is set on one of the terminals involved in the interaction, this is visualised by colouring the breakpoint circle at the respective terminal black. In this case, the kernel blocks the execution of the application until we instruct it to resume. While the application is blocked at a breakpoint, we can examine the data involved in that interaction by simply pointing at and clicking on the corresponding terminal. This instructs the kernel to pass the data, together with the connection's visualisation tag, to the data-visualisation program, which visualises the data appropriately.
Figure 7 : Examining the data exchanged between the virtual processes at the highest abstraction level, we see that something must be wrong in Edge-Detector. It gets a neat image as input, but produces a blurry output image.
Figure 8 : Examining the exchanged data at the breakpoint between SOB_Y and Square, we see that SOB_Y produces a 'rippled' image instead of an image with a dark border at the upper boundaries of objects and a light border at their lower boundaries, which is characteristic of a 'derived' image.
Figure 9 : Examining the exchanged data at the breakpoint between Derivator_Y and Smoother_X, which are on the lowest graphical level, we are able to identify the source-level process containing the bug (i.e. Derivator_Y).
Figure 10 : After the bug has been fixed, the Edge-Detector behaves as expected. It produces an image with clearly marked edges.
If we check the input/output relationships of the Image-Enhancer and the Edge-Detector units this way (see figure 7), we see that the Image-Enhancer functions correctly : it delivers a neat image to the Edge-Detector. The latter, however, produces a blurry image without distinct edges. This clearly indicates that the Edge-Detector unit is the prime suspect, not the Image-Enhancer or
the Object-Recognizer units. Therefore, we decide to zoom in on the Edge-Detector unit. Examining the partly processed images at the interactions between its component processes reveals that there must be a problem in process SOB_Y (see figure 8). Zooming in on process SOB_Y, we see that the result produced by Derivator_Y is incorrect (see figure 9). Since Derivator_Y is a source-level process, we cannot elaborate it graphically. Instead, we use a source-level debugger to examine it carefully. This eventually reveals the bug : all calculations were performed on byte values instead of integer values, causing overflow. After correcting this error, we can easily verify that the Edge-Detector now generates images with clear edges (see figure 10). Further observation shows that the Object-Recognizer now also recognises the objects presented to it. Although the interactions between processes are quite simple, this example illustrates the power of our approach : one is guided to the erroneous process systematically by zooming in on suspect processes, and on each abstraction level only a small number of interactions must be examined.
6. Debugging Kernel
Because it is impossible to record all information that might be needed in the debugging stage, debugging usually requires multiple re-executions of the application, during which one successively gains more information about the bug. This is even more true with a hierarchical debugging approach. Therefore, a record-replay mechanism is needed to guarantee reproducible program behaviour for non-deterministic concurrent programs [18][19]. To accomplish this, we developed a run-time distributed debugging kernel. This debugging kernel (one per processor) consists of two concurrent threads (see figure 11).
Figure 11 : The run-time distributed debugging kernel, one per processor, guarantees reproducible program behaviour by implementing a record-replay mechanism. Each kernel is composed of a Process-Server and a Kernel-Server.
One thread, the Process-Server, is responsible for the communication between the kernel and the application processes, while the other, the Kernel-Server, deals with the communication between kernels on different processors. The kernel can operate in three modes : run mode, record mode and replay mode. In run mode, the kernel merely acts as a code-loading and routing engine. In record mode, the kernel logs the relative order of interactions between processes in a trace buffer. To minimise interference, only the relative order of interactions is recorded, not the actual data involved. For coarse-grain applications, interference is small since it is limited to writing five integers in local memory (identification of the source and destination process, the send and receive terminal, plus an optional time stamp). Only when the trace buffer becomes full and must be flushed to a trace file is the interference significant. We are currently developing an optimal trace mechanism that minimises the size of the generated trace by detecting asynchronous behaviour at run time.
Besides the order of interactions, system events (such as the starting and finishing of processes or the overflow of system buffers) can also be recorded, as well as user-defined events (such as accessing a file or entering a critical routine). If non-optimal tracing has been done, the generated trace file can be parsed and visualised by the animation front-end, so that the interactions between processes can be examined without re-executing the application.
In replay mode, the kernel uses the trace file to impose an equivalent order on the interactions between processes. This guarantees reproducible program behaviour, even if the program contains asynchronous interactions. During these re-executions, the run-time kernel signals the occurrence of selected events to the animation front-end so that they can be visualised on the graphical program representation. In addition, the kernel can be instructed via the animation front-end to single-step through interactions or to set breakpoints at interactions. When a breakpoint is reached, which is visualised on the graphical representations, one can examine the data involved in the interaction by simply pointing at and clicking on the corresponding connection. This instructs the kernel to send the data, together with its visualisation tag, to the data-visualisation tool.
Our kernel generates one local trace file for each processor. By merging all local trace files while preserving the "Happened Before" relationships as defined in [20], we obtain a global trace file. This global trace file contains a partially ordered list of events.
The list is not totally ordered since, in general, events can be permuted and still represent an equivalent execution, as long as the existing "Happened Before" relationships are respected. In the same way, it is possible to split this global trace file again into partial trace files according to another assignment of processes to processors : these partial trace files still represent an equivalent execution. This means that it is possible to execute the program in record mode on an expensive massively parallel machine, and debug it by executing it in replay mode on a smaller, cheaper machine. This makes it possible to debug huge programs cost-effectively. For this purpose, we took special care to make our kernel as portable as possible. It currently runs on a MEIKO transputer machine and on SUN workstations.
7. Conclusion
In this paper, we described a process-level debugging tool that makes it possible to determine quickly and systematically which process (or interaction between processes) is responsible for the erroneous behaviour of a concurrent program. Our approach is based on an animation of program behaviour on its hierarchical graphical representations. The graphical views used reflect algorithmic aspects of the application at hand, thereby facilitating the detection of anomalies. This ability is further enhanced by a powerful data-visualisation mechanism that makes it easy to visualise the data exchanged between processes. Hierarchy makes it possible to reduce the search space gradually and keeps complexity manageable at all times. A run-time debugging kernel guarantees reproducible program behaviour by implementing a record-replay mechanism. Re-executions can be controlled conveniently from a graphical user interface.
Acknowledgements
This work is partly sponsored by the Belgian Interuniversity Pole of Attraction IUAP-50. Rudy Lauwereins is Senior Research Associate of the Belgian National Fund for Scientific Research. K.U.Leuven-ESAT is a member of the DSP Valley network.
References
[1] W.H. Cheung, J.P. Black, E. Manning, "A Framework for Distributed Debugging", IEEE Software, January 1990, pp. 106-115.
[2] J. Gait, "A Probe Effect in Concurrent Programs", Software - Practice and Experience, Vol. 16(3), March 1986, pp. 225-233.
[3] F. Baiardi, N.D. Francesco, G. Vaglini, "Development of a Debugger for a Concurrent Language", IEEE Transactions on Software Engineering, Vol. 12(4), April 1986, pp. 547-553.
[4] E. Adams, S.S. Muchnick, "DBX Tool : a Window-Based Symbolic Debugger for Sun Workstations", Software - Practice and Experience, July 1986, pp. 653-669.
[5] J. Rooney, "ANABEL : an Environment for Creating Concurrent Programs", Proc. of the EWPC'92 European Workshop on Parallel Computing, March 1992.
[6] Texas Instruments, "TMS320C4x C Source Debugger User's Guide", April 1992.
[7] F. Mourlin, E. Cournarie, "A Graphical Environment for OCCAM Programming", Applications of Transputers 1, IOS, 1990, pp. 252-261.
[8] S. Stepney, "Graphical Representation of Activity, Interconnection and Loading", Proc. 7th Occam User Group and Int. Workshop on Parallel Programming of Transputer-Based Machines, IOS, Amsterdam, The Netherlands, 1987.
[9] R.H. Halstead, D.A. Kranz, P.G. Sobalvarro, "MulTVision : A Tool for Visualising Parallel Program Executions", ACM/ONR Workshop on Parallel and Distributed Debugging, 1991, pp. 237-239.
[10] C.C. Charlton, A.J. Eaton, D. Jackson, "A Visualisation System for the Interactive Debugging and Validation of Concurrent Programs", ACM/ONR Workshop on Parallel and Distributed Debugging, 1991, pp. 219-221.
[11] C. Caerts, R. Lauwereins, J.A. Peperstraete, "A Powerful High-Level Debugger for Parallel Programs", Proc. of the First ACPC Conference, Salzburg, Austria, September/October 1991, Lecture Notes in Computer Science, Springer-Verlag, Berlin, 1992, pp. 54-64.
[12] T.G. Lewis, R. Currey, J. Liu, "Data Parallel Program Design", Proc. of the First ACPC Conference, Salzburg, Austria, September/October 1991, Lecture Notes in Computer Science, Springer-Verlag, Berlin, 1992, pp. 37-53.
[13] M. Engels, R. Lauwereins, J.A. Peperstraete, "Rapid Prototyping for DSP Systems with Multiprocessors", IEEE Design & Test of Computers, June 1991, pp. 52-62.
[14] M.H. Brown, R. Sedgewick, "Techniques for Algorithm Animation", IEEE Software, January 1985, pp. 28-39.
[15] M.K. Ponamgi, W. Hseush, G.E. Kaiser, "Debugging Multithreaded Programs with MPD", IEEE Software, May 1991, pp. 37-43.
[16] S. Isoda, T. Shimomura, Y. Ono, "Linked-List Visualisations for Debugging", IEEE Software, May 1991, pp. 44-51.
[17] D. Socha, M. Bailey, N. Notkin, "VOYEUR : Graphical Views of Parallel Programs", Proc. of the ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, May 1988, pp. 206-215.
[18] T.J. LeBlanc, J.M. Mellor-Crummey, "Debugging Parallel Programs with Instant Replay", IEEE Transactions on Computers, Vol. 36(4), April 1987, pp. 471-482.
[19] S.H. Jones, R.H. Barkan, L.D. Wittie, "BugNet : a Real-Time Distributed Debugging System", Proc. of the ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging, May 1988.
[20] L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM, Vol. 21(7), July 1978, pp. 558-565.