Programming Language Array Constructs For Parallel Relative Debugging

Greg Watson, David Abramson
School of Computer Science and Software Engineering
Monash University
Wellington Road, Clayton, VIC 3168
[email protected] [email protected]

5 May, 1998

Abstract

Relative debugging is a technique which addresses the problem of debugging programs developed using evolutionary software techniques. Recent developments allow relative debugging to be used on programs that have been ported from serial to parallel architectures. Such programs may change significantly in the porting process, and this raises many issues for the debugging process. This paper examines the issues of array data and code transformations that occur in these situations, and proposes an algebra for expressing the changes to the data representation. A series of programming language constructs is defined, implemented as extensions to the command language of our existing debugger, GUARD95, that allow the comparison of data between two otherwise different programs.
1 Introduction
Relative debugging is a technique that allows a user to compare data between two executing programs [2][4][12]. It was devised to aid the testing and debugging of programs that are either modified in some way or ported to other computer platforms. Whilst traditional debuggers force the programmer to understand the expected state and internal operation of a program, relative debugging makes it possible to trace errors by comparing the contents of data structures between programs at run time. In this way, the programmer is less concerned with the actual state of the program, and more concerned with finding when and where differences between the old and new codes occur.

The original implementation of relative debugging, called GUARD95, only allowed the programs to run on sequential platforms [12]. The most recent implementation, GUARD97, uses an enhanced dataflow-based execution mechanism [3], which makes it possible to run the programs on parallel computers. This is of significant interest, since errors are often introduced when a program is ported from a sequential platform to a parallel one.

An additional problem with GUARD95 was that it did not contain any significant mechanism for manipulating array structures, and thus the debugger could only handle the comparison of array data and associated code segments if the structure did not change significantly between the sequential and the parallel program. Unfortunately, data structures are often altered when the program is parallelised, because the code may need to be transformed to better suit the parallel platform. Thus, a major shortcoming of GUARD95 is that it is difficult to compare data when the new program is modified substantially.

In this paper we address the issue of data and code transformation that occurs when a program is parallelised. In particular, we address the transformation of array data structures and propose an algebra for expressing changes to the data representation.
We also define a set of extensions to the GUARD95 command language which make it possible to compare data between programs when the following common transformations occur:
• the data is decomposed for distribution across the memories of a MIMD computer;
• the shape and number of indices of an array are altered;
• the order of the array indices is permuted;
• a scalar is promoted to an array structure in the parallel code; and
• multiple loops are fused or split.
These transformations are currently being implemented in GUARD97.

The paper begins with a brief discussion of relative debugging and the GUARD95 command language. It then describes some of the important transformations that occur when a code is parallelised, and defines these transformations formally. The paper then proposes new debugger array constructs which make it possible to describe the transformations, and subsequently to compare data between two otherwise different programs.
2 Relative Debugging
Unlike most conventional debuggers, a relative debugger controls two programs concurrently. It is implemented using a client-server architecture so that the debugger may be running on a different computer system than either of the programs being tested. One program, usually the original version, is referred to as the reference code, and is assumed to operate correctly. Our experience with relative debugging has been that it is extremely effective for locating errors quickly. A number of case studies are reported in the literature [1][2][4][5].

A relative debugger implements a superset of normal debugger commands. It provides a user with mechanisms for controlling the execution of a process and examining its state. More importantly, it contains commands that allow a user to compare data between two programs. Data comparisons can be performed through either an imperative scheme or a declarative scheme.

Imperative comparisons can be performed explicitly by the user, provided the two programs under control are stopped at breakpoints. The imperative compare command behaves like a conventional debugger command such as print; however, it names data structures in two processes instead of one. For example, the following command:

    compare program1::A = program2::B

compares the data from array A in program1 with array B in program2. If they differ, then the differences are reported. The compare command requires a fairly high degree of interaction by the user because the two programs must be executed and halted before the data can be extracted.

An alternative way of comparing data is to use the declarative assert command. Assert takes a pair of data structure names, program names and line numbers, and compares the contents of the data structures only when the programs have reached their respective line numbers.
The following command:

    assert program1::A@123 = program2::B@456

compares the data from array A in program1 with array B in program2 when program1 reaches line 123 and program2 reaches line 456.

It is beyond the scope of this paper to describe the way that relative debugging can be implemented. In essence, the assert commands are compiled into a dataflow graph, which is then executed under the control of an interpreter. Dataflow nodes only fire when the programs reach breakpoints, and the comparison is performed only when both data structures are available. The scheme is better described in [4]. In a number of other papers we have illustrated the way that a relative debugger can be used to find the location of errors [1][5].
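The firing rule described above can be illustrated with a minimal Python sketch. It is not the GUARD implementation: the names compare, AssertNode and arrive, and the tolerance parameter tol, are hypothetical and chosen only for illustration. The node holds a snapshot from each program and performs the comparison only once both snapshots are available.

```python
def compare(ref, test, tol=0.0):
    """Return (index, ref_value, test_value) for every element that differs."""
    if len(ref) != len(test):
        raise ValueError("arrays have different lengths")
    return [(i, a, b) for i, (a, b) in enumerate(zip(ref, test))
            if abs(a - b) > tol]


class AssertNode:
    """Hypothetical two-input dataflow node: fires only when both inputs exist."""

    SIDES = ("ref", "test")

    def __init__(self, tol=0.0):
        self.tol = tol
        self.slots = {}

    def arrive(self, side, data):
        """Record a snapshot taken when one program reaches its line number."""
        if side not in self.SIDES:
            raise ValueError("side must be 'ref' or 'test'")
        self.slots[side] = list(data)
        if len(self.slots) < 2:
            return None  # the other program has not reached its line yet
        diffs = compare(self.slots["ref"], self.slots["test"], self.tol)
        self.slots.clear()  # ready for the next pair of snapshots
        return diffs
```

In this sketch the first call to arrive returns None, and the second call, from the other program, returns the list of differences (empty when the arrays agree).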
3 Transformation Types
As discussed in the introduction, when traditional serial or vector codes are transformed for execution on a parallel computer, it is often necessary to re-organise the key data structures and associated code. For example, if the parallel platform has physically distributed memories, then the data must be partitioned and allocated to the individual processors. It is often necessary to change the shape of the arrays and alter the order of the indexes. In some cases, the loop structures themselves must be altered.

The type of transformation used is particularly important in relative debugging. In order to compare data structures between the serial and parallel versions of a program, it must be possible to replicate or reverse, within the debugger itself, the data transformation that has been performed. This section attempts to categorise the transformations used in mapping arrays from a serial to a parallel implementation. It also examines the implications these transformations have in the context of relative debugging.

3.1 Data Parallel Decomposition
Except for extremely coarse-grained or parameterised models, some form of decomposition of arrays for parallelisation is necessary in order to distribute data to the individual processors. In the case of data parallel languages, this decomposition is usually block or cyclic decomposition, or a combination of the two, and is handled automatically by the language runtime system. Hand coded distributed memory implementations may resort to much more complex algorithms, and are beyond the scope of this paper. In many cases, particularly where processor pool sizes are dynamic, the exact partitioning is not normally known until runtime. Relative debugging allows the comparison of data between serial and parallel codes. As shown in Figure 1, some mechanism must be provided to duplicate the decomposition that has occurred.
Figure 1: Duplication of Data Distribution for Relative Debugging
[Diagram labels: serial data; data distribution; parallel data on P0, P1, P2; data comparison (compare) against P0, P1, P2.]
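As an illustration of duplicating a decomposition, the following Python sketch applies a block or cyclic distribution to the serial data and compares the result with the parallel data one processor at a time. The function names are hypothetical, and the convention that the leading blocks absorb any remainder is an assumption; a real data parallel runtime may partition differently.

```python
def block_decompose(data, nprocs):
    """Split data into nprocs contiguous blocks; leading blocks take the remainder."""
    base, extra = divmod(len(data), nprocs)
    parts, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        parts.append(data[start:start + size])
        start += size
    return parts


def cyclic_decompose(data, nprocs):
    """Deal elements out round-robin: element i goes to processor i % nprocs."""
    return [data[p::nprocs] for p in range(nprocs)]


def compare_distributed(serial, parallel_parts, decompose):
    """Decompose the serial data the same way, then compare per processor.

    Returns (processor, local_index, serial_value, parallel_value) tuples.
    """
    expected = decompose(serial, len(parallel_parts))
    diffs = []
    for p, (exp, got) in enumerate(zip(expected, parallel_parts)):
        if len(exp) != len(got):
            raise ValueError(f"processor {p}: local sizes differ")
        diffs.extend((p, i, a, b)
                     for i, (a, b) in enumerate(zip(exp, got)) if a != b)
    return diffs
```

Because the partitioning may only be known at runtime, the sketch takes the processor count from the parallel data rather than fixing it in advance.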
3.2 Shape Transformation
Many codes require reorganisation of data structures for parallelisation. Data may be structured to exploit a particular vector architecture and this data may need to be organised into an array to allow parallelisation across an index. Other codes may vary the shape of arrays to exploit processor performance improvements, for load balancing purposes, or for better cache utilisation [5]. In a similar manner to data decomposition, relative debugging must also provide a shape transformation mechanism to duplicate transformations that have been applied to the data.
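A simple case of shape transformation can be sketched in Python as follows, assuming that a one-dimensional serial array has been reorganised into a two-dimensional array in row-major order. The names reshape and compare_reshaped are hypothetical, and other element orderings are equally possible in practice.

```python
def reshape(flat, shape):
    """Arrange a flat list as nested lists of the given (rows, cols) shape."""
    rows, cols = shape
    if rows * cols != len(flat):
        raise ValueError("shape does not match the number of elements")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def compare_reshaped(serial_flat, parallel_2d):
    """Reshape the serial data to the parallel shape and report differences.

    Returns ((row, col), serial_value, parallel_value) tuples.
    """
    rows, cols = len(parallel_2d), len(parallel_2d[0])
    expected = reshape(serial_flat, (rows, cols))
    return [((r, c), expected[r][c], parallel_2d[r][c])
            for r in range(rows) for c in range(cols)
            if expected[r][c] != parallel_2d[r][c]]
```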
3.3 Index Permutation
Vector code will structure arrays to suit vectorisation hardware, but this may not be appropriate for a parallel architecture. Permutation of array indices is also often required to improve data locality and exploit the processor cache. In addition, permutation may also result from language differences, such as between C and Fortran. Comparison of arrays in these situations using relative debugging requires an index permutation mechanism.
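For the two-dimensional case, index permutation amounts to comparing element (i, j) of one array with element (j, i) of the other, as might arise when the same array is declared in C and in Fortran. The Python sketch below uses hypothetical function names and assumes rectangular, non-empty arrays.

```python
def permute_indices(array_2d):
    """Swap the two indices: result[j][i] == array_2d[i][j]."""
    return [list(col) for col in zip(*array_2d)]


def compare_permuted(serial, parallel):
    """Compare serial[i][j] with parallel[j][i].

    Returns ((i, j), serial_value, parallel_value) tuples in serial indices.
    """
    expected = permute_indices(serial)
    if len(expected) != len(parallel) or any(
            len(e) != len(g) for e, g in zip(expected, parallel)):
        raise ValueError("shapes do not match after permutation")
    return [((i, j), serial[i][j], parallel[j][i])
            for i in range(len(serial)) for j in range(len(serial[0]))
            if serial[i][j] != parallel[j][i]]
```

Reporting differences in the serial program's index order is a design choice of the sketch; a debugger could equally report both index tuples.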
3.4 Variable Promotion
Many serial codes use scalars to store intermediate values in a computation. These values often become arrays when the code is converted to a data parallel language. Figure 2 shows code fragments indicating how a temporary variable in the C code might be promoted to an array written in ZPL, a data parallel language developed at the University of Washington [9]. In ZPL, the array is a fundamental data type and can be used in expressions in a similar manner to scalar values. Arrays are defined in terms of regions, which are similar to sub-arrays in Fortran 90 [6].

C Code
ZPL Code
128 for (i=0; i