J. of Parallel and Distributed Computing, Vol. 18, No. 2, pp.252-257, June 1993

Data Parallel Program Visualizations from Formal Specifications*

Mark Vincent LaPolla†
Meta-Software, Campbell, CA 95008

Joseph L. Sharnowski and Betty H.C. Cheng
Department of Computer Science, Michigan State University, East Lansing, MI 48824

AND

Kevin Anderson
Lockheed Advanced Computing Lab., Research and Development Division, Palo Alto, CA 94304

* This work is supported in part by the NSF Grant CCR-9209873.
† This author was at Lockheed Advanced Computing Laboratory, Research and Development Division, when this project was done.

Proposed Abbreviated Form of the Title:

Data Parallel Prog. Visualizations from Formal Specs.

Address all correspondence to:

Betty H.C. Cheng
Department of Computer Science
Michigan State University
A714 Wells Hall
East Lansing, MI 48824-1027
PH: (517) 355-8344
FAX: (517) 336-1061
email: [email protected]

Abstract

As software is used increasingly to control critical systems, program correctness becomes paramount. Correctness is particularly important for parallel and distributed programs since the effects of a single programming error may be magnified greatly due to the parallelism. Using formal specification languages to describe programs minimizes the number of errors that would have been introduced using informal approaches to software development. This paper describes a project that combines formal specifications and visualizations to assist in testing and debugging a parallel program. The Lockheed Integrated Visualization Environment (LIVE) system is a graphical rule-based environment that aids a programmer in the creation of visualizations that depict the behavior of executing programs. The choice of visualization corresponds to the type of programs visualized and the type of errors expected. In order to facilitate the decision-making process, LIVE uses the formal specifications of the program's data structures to guide the visualization generation process.

List of Symbols

"ell" is l
"one" is 1
"oh" is O
"zero" is 0
Uppercase "kay" is K
Lowercase "kay" is k

1 Introduction

Debugging a program consists of executing the program with particular test data that produces an effect or behavior, interpreting that effect, and relating the effect back to the error in the program or its specification. Trace statements have been effective in debugging serial programs since the serial organization of the trace closely matches the serial flow of computation. Tracing is not suitable for the parallel case, however, due to two factors: the serial trace does not effectively portray parallelism and an immense amount of data is needed to analyze a computation. Instead, the debugging of massively parallel programs may be accomplished using visualizations that encode and display large amounts of data [1, 2]. For visualizations to be effective as debugging aids, programmers must be able to easily create the appropriate debugging visualization, where the choice of visualization depends on the type of error being investigated.

This paper describes a project that combines formal specifications and visualizations to address the debugging task. Specifically, a method for deriving appropriate parametric debugging visualizations from formal specifications is presented. The Lockheed Integrated Visualization Environment (LIVE) system is a cooperative, graphical rule-based environment for testing and debugging massively parallel programs [2]. LIVE helps a programmer create parametric visualizations, called visualization templates, that display program behavior when later instantiated by data from an executing program. LIVE uses the formal specifications of the subject program's data structures to guide the generation of the visualization templates. The Larch specification language [3] is used to describe the data structures, where Larch is a two-tiered specification language that provides an algebraic specification language for specifying data abstractions and a predicate logic-based specification language for describing behavioral properties of procedural abstractions. The cell model [2] is used as an intermediate graphical model for representing the data structure specifications. In order to further assist a programmer in the creation of visualizations, LIVE uses an interactive cooperative computer-aided design (CCAD) paradigm [1] that allows a programmer to quickly modify the visualization template(s).

A large variety of systems have been developed to help support the task of debugging parallel programs. (For an extensive bibliography of such systems, see [4].) The particular approach of combining formal specifications and visualizations to address the parallel debugging task has also been considered by Roman and Cox, where their technique is to use a formal mapping from the shared dataspace paradigm to appropriate visualizations [5]. The shared dataspace paradigm is one in which processes have access to a common content-addressable data structure (typically a set of tuples) whose components may be examined, inserted, and deleted. The representation of the dataspace as a set of tuples permits formal mappings to be established between the dataspace and the animation space. The basis of the approach by Roman and Cox is to use these formal mappings for the generation of application-specific visualizations. The major shortcoming of this approach is that it is limited to the shared dataspace paradigm.

The remainder of the paper is organized as follows: The Larch specification language is described in Section 2. Section 3 discusses the LIVE system. Section 4 describes how Larch specifications are used to generate visualizations. An example illustrating the use of Larch-based visualizations is described in Section 5. Concluding remarks and future work are discussed in Section 6.

2 Larch

Using formal specification languages to represent the purpose of a piece of software has many advantages. First, the formal specifications minimize the number of errors that would have been introduced using informal approaches to software development [6, 7]. Second, due to the well-defined syntax and semantics of formal specification languages, many types of manipulation are amenable to automation. Finally, a formal specification can be used to give an abstract view of the intent of a program, where implementation details are hidden.

The Larch Project [3] is developing tools and techniques to aid in the production of formal specifications. Each Larch specification has components written in two languages: one designed for a specific programming language (interface) and the other common to all programming languages (shared). The Larch Shared Language (LSL) is an algebraic specification language that allows a programmer to formally specify data abstractions, known as traits, that are independent of program state and programming language and are reusable among different specifications. The Larch Interface Language (LIL) is used to specify what can be observed about the behavior of components written in a particular programming language. Currently, only the LSL specifications are used in the derivation of visualizations; future investigations will address how LIL specifications can be used.

Figure 1 contains a Larch trait for a stack abstract data type (ADT). The introduces clause is a header for the portion of a trait that contains the list of operators (function identifiers). The operators for the stack are {}, push, top, pop, size, and isEmpty. Each operator is immediately followed by its signature that specifies the sorts (types)¹ of the operator's domain and range. The range is always a single sort, but the domain may consist of zero or more sorts.

¹ The term "sort" is used in order to avoid confusion with the similar concept "type" from programming languages.

    Stack (E, C) : trait
      includes Natural (N)
      introduces
        {} : → C
        push : E, C → C
        top : C → E
        pop : C → C
        size : C → N
        isEmpty : C → Bool
      asserts
        C generated by {}, push
        ∀ e : E, stk : C
          top(push(e, stk)) == e
          pop(push(e, stk)) == stk
          size({}) == 0
          size(push(e, stk)) == size(stk) + 1
          isEmpty(stk) == stk = {}
      implies
        OrderedContainer (push for insert, top for head, pop for tail)
        C partitioned by top, pop, size
        converts top, pop, size
          exempting top({}), pop({})

Figure 1: Stack trait

The generated by clause asserts that the set of operators immediately following it (constructors) may be used to generate all possible values of the sort M defined by the trait. For the case of the stack trait (Figure 1), the constructors for C are {} and push.

The information delimited by the implies section is not currently used in the visualization derivation and, thus, will not be discussed here (see [3] for further details).
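To make the stack axioms concrete, the following Python sketch models values of sort C as tuples built only from the two constructors and checks each equation of the asserts clause on sample values. The function names mirror the operators of the trait; the tuple representation, the helper check_axioms, and the decision to raise an error for top({}) and pop({}) (which the trait leaves unspecified) are assumptions made here for illustration and are not part of Larch or LIVE.

```python
EMPTY = ()  # stands for the constructor {}


def push(e, stk):
    """Constructor push : E, C -> C."""
    return (e,) + stk


def top(stk):
    """top : C -> E. The trait exempts top({}), so this sketch raises."""
    if stk == EMPTY:
        raise ValueError("top({}) is left unspecified by the trait")
    return stk[0]


def pop(stk):
    """pop : C -> C. The trait exempts pop({}), so this sketch raises."""
    if stk == EMPTY:
        raise ValueError("pop({}) is left unspecified by the trait")
    return stk[1:]


def size(stk):
    """size : C -> N."""
    return len(stk)


def is_empty(stk):
    """isEmpty : C -> Bool."""
    return stk == EMPTY


def check_axioms(e, stk):
    """Evaluate the five equations of the asserts clause for one e and one stk."""
    return (
        top(push(e, stk)) == e
        and pop(push(e, stk)) == stk
        and size(EMPTY) == 0
        and size(push(e, stk)) == size(stk) + 1
        and is_empty(stk) == (stk == EMPTY)
    )
```

Because every stack value is reachable from EMPTY by repeated push, checking the equations on values built this way exercises exactly the values that the generated by clause describes.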

3 The LIVE System

The LIVE system is divided into two components: the monitoring-rendering system and the Visualization Design Environment (VDE) [1, 2]. The monitoring-rendering system determines programmer-specified points in subject programs to be visualized and handles the rendering of the appropriate visualizations. The VDE component supports the creation of visualization templates based on data structure properties as well as user input. The cell model used to guide the creation of visualization templates is discussed in the next part of this section. Details of the VDE component are given in Section 3.2.

3.1 The Cell Model

In accordance with Marks' [8] approach, three components for the automatic creation of graphics are recognized: the real-world system, the graphical model, and the visualization. Figure 2 gives a pictorial representation of the relationship between the three components. There is a mapping from the real-world system to the graphical model, and another mapping from the graphical model to the visualization. For this project, the real-world system is the program specification and the graphical model is the cell model. The visualization is represented by a visualization template, which is used to display program behavior when later instantiated by data from an executing program.

    Real-World System  →  Graphical Model  →  Visualization

Figure 2: Three components for the automatic creation of graphics

The cell model consists of two components, the dimension of the data structure and the description of cells composing the data structure. The cell description is decomposed into the structure of the values within the cell (complexity) and the individual type of each value. The dimension of the data structure may be 0-D for a single value, 1-D for one-dimensional data types such as arrays, lists, and stacks, 2-D for matrices, and so on. The complexity of a cell may range from single-valued variables to composite data structures made up of several components such as tuples. The type of a value within a cell may be integer, real, character, or any other simple data type.

For example, a two-dimensional array of vector forces may be represented with cells that each contain three single-valued variables, two variables for a two-dimensional direction of force and one variable for magnitude. The complexity of this data structure consists of the three single-valued variables, where the types of those variables are floating-point. The dimension component of the cell model is two for this example.

Complexity determines the type of graphical primitives used in the visualization. For a single-valued variable, a color scale or the relative position of a pixel could be used. In order to represent a double-valued variable, the length and color of a rectangle or the position and color of a pixel could be used, and so on.

Dimension determines the manner of rendering and partially determines the geometric relationship between cells. A zero-dimensional ADT could be visualized by printing it. A two-dimensional data structure suggests a two-dimensional array of graphical primitives. A three-dimensional data structure could be viewed using volume visualization, or by using an array of two-dimensional arrays.
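The cell model can be pictured as a small record with a dimension and a per-cell description. The Python sketch below is one possible encoding, written here only for illustration: the class name CellModel, its field names, and the simplification of complexity to a count of values per cell are assumptions, not structures taken from LIVE.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Marker for a data structure with no computable dimensional ordering.
DIMENSIONLESS = "DIMENSIONLESS"


@dataclass(frozen=True)
class CellModel:
    """Hypothetical record for the cell model: dimension plus cell description."""

    dimension: Union[int, str]    # 0, 1, 2, ... or DIMENSIONLESS
    value_types: Tuple[str, ...]  # the simple type of each value within a cell

    @property
    def complexity(self) -> int:
        """Simplified complexity: how many single values one cell holds."""
        return len(self.value_types)


# The vector-force example from the text: a 2-D array whose cells hold two
# direction components and one magnitude, all floating-point.
force_field = CellModel(dimension=2, value_types=("float", "float", "float"))
```

Reducing complexity to a count loses any nesting inside a cell, so a fuller encoding would need a richer description for composite cells such as tuples of tuples.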

3.2 VDE

VDE provides two modes for visualization construction: "automatic" and "manual". When in automatic mode, VDE reads in the specifications of a program's data structures and produces nascent visualization templates based on the specifications, using the cell model as an intermediate graphical model. The mapping from specifications to the cell model is discussed in the next section. Graphical design rules are used to map from the cell model to the visualization templates. If multiple rules are applicable for a particular value, VDE will generate all the possibilities and then allow the programmer to select the desired one via its manual mode. For example, both a color scale and the relative length of a bar are suitable choices for graphically depicting a cell of an ADT that has complexity equal to a single-valued variable. The graphical design rules would generate both possibilities, and the programmer may then select the desired one in VDE's menu. The manual mode of VDE permits the programmer to further refine a visualization template by applying or reapplying graphical design rules or by directly editing the characteristics of the visualization template using an editor [9].
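The interplay between the two modes can be sketched as a rule table plus a selection step. In the Python below, the table DESIGN_RULES and the functions automatic_mode and manual_mode are hypothetical names introduced for this sketch; only the listed graphical primitives come from the text, and the real rule base of VDE is not described in enough detail here to reproduce.

```python
# Candidate graphical primitives keyed by cell complexity (values per cell).
# The entries are the examples named in Sections 3.1 and 3.2.
DESIGN_RULES = {
    1: [
        "color scale",
        "relative length of a bar",
        "relative position of a pixel",
    ],
    2: [
        "length and color of a rectangle",
        "position and color of a pixel",
    ],
}


def automatic_mode(complexity):
    """Return every primitive whose rule applies to the given complexity."""
    return list(DESIGN_RULES.get(complexity, []))


def manual_mode(candidates, choice):
    """Return the candidate the programmer picked, as a menu selection would."""
    if choice not in candidates:
        raise ValueError(f"{choice!r} is not among the generated candidates")
    return choice


candidates = automatic_mode(1)
selected = manual_mode(candidates, "color scale")
```

Keeping generation and selection separate mirrors the cooperative design described above: the system proposes every applicable option and the programmer makes the final choice.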

4 Mapping Specifications to the Cell Model

This section defines a mapping from Larch specifications to the cell model. The task is to obtain the dimension, complexity, and type information for a data structure from its Larch specification. Figure 3 contains the technique that maps information found in a trait to the dimension component of the cell model. The tasks of determining the type and complexity of the data type are currently being investigated.

INPUT: The trait that defines the sort M corresponding to the ADT.

OUTPUT: The dimension of the ADT, where the value DIMENSIONLESS indicates that the data structure lacks a dimensional ordering computable by this technique.

PROCEDURE:

1. Determine the set of sorts found in the domains of the constructors' signatures.
2. If the set of sorts found in step 1 is empty, then return dimension equal to zero and stop.
3. Count the maximum number of times that the sort M appears as a domain sort within a constructor's signature.
4. If the result from step 3 is greater than one, then return dimension equal to DIMENSIONLESS and stop.
5. Count the maximum number of domain sorts within any of the constructors' signatures.
6. If the result from step 5 is greater than or equal to three, then return dimension equal to the count minus two and stop.
7. If the set found in step 1 contains a sort not equal to M and that sort appears as the range for one of the signatures, then return dimension equal to one and stop.
8. Return dimension equal to DIMENSIONLESS and stop.

Figure 3: Technique for finding the dimension of an ADT from its trait

Step 2 of the procedure tests for the case in which there exists a single constructor of the form op : → M for the sort M, where the domain of the signature is empty. This situation occurs when the data structure can only be assigned a single value, in which case its dimension is zero. An example of this type of data structure is the Boolean value true.

Step 4 of the procedure tests for the existence of a constructor for which a "new" value of the data structure is generated by joining two or more "old" values; for instance, the insertion of a tree at the leaf of another tree. The computation of the dimension for this case is beyond the current scope of our technique, and so, consequently, the procedure returns a value of DIMENSIONLESS.

    Array2 (A, DX, DY, E) : trait
      introduces
        {} : → A
        bind : A, DX, DY, E → A
        apply : A, DX, DY → E
        defined : A, DX, DY → Bool
      asserts
        A generated by {}, bind
        A partitioned by apply, defined
        ∀ a : A, di, d1i, d2i : DX, dj, d1j, d2j : DY, e : E
          apply(bind(a, d2i, d2j, e), d1i, d1j) ==
            if d1i = d2i ∧ d1j = d2j then e else apply(a, d1i, d1j)
          ¬defined({}, di, dj)
          defined(bind(a, d2i, d2j, e), d1i, d1j) ==
            (d1i = d2i ∧ d1j = d2j) ∨ defined(a, d1i, d1j)
      implies
        converts apply, defined
          exempting ∀ di : DX, dj : DY apply({}, di, dj)

Figure 4: Two-dimensional array trait

Step 6 of the procedure tests for the existence of indices. Two is subtracted from the total count of domain sorts in a signature in order to account for one occurrence of M, plus one occurrence of a sort representing an element to be added to M. Only one of the sorts is an element to be added since the operators have been constructed in an abstract fashion. (If it were possible to add two different types of elements to the data structure, then two different constructor operators would have been used.) The other sorts in the domain of this particular constructor are thus indices for placement of the element, and so the number of these other sorts is equal to the dimension of the data structure. For example, for the trait of a two-dimensional array, shown in Figure 4, step 5 of the procedure determines the maximum number of domain sorts within a constructor to be four for the case of the constructor bind. The procedure will thus return the expected value of two.

Step 7 of the procedure tests if there is an ordering imposed on the data structure, such that the cells of the data structure may be depicted in a one-dimensional layout. In order to determine if there is such an ordering, we use the rule that if one of the non-constructor operators permits access to a particular element within the container, then that operator imposes an ordering. For example, the Larch trait for a stack in Figure 1 indicates that the top operator returns a particular instance of E. Step 7 of the procedure would, accordingly, return a dimension equal to one for the stack ADT.

Finally, step 8 of the procedure handles the case in which no ordering was found for the ADT. Specifically, the procedure returns dimension equal to DIMENSIONLESS.

The process is made more robust by interpreting its results within a CCAD paradigm. That is, if a result is ambiguous (e.g., the case when DIMENSIONLESS is returned by the technique), then LIVE will use the available information to generate all the visualizations that may potentially be suitable and allow the programmer to choose the appropriate visualization.
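The eight steps of Figure 3 are compact enough to transcribe directly. The Python below is an illustrative sketch rather than code from LIVE: the Signature class, the dimension function, and the dictionary encoding of a trait's operators are assumptions introduced here. The two sample traits encode only the signatures of the stack (Figure 1) and the two-dimensional array (Figure 4).

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple, Union

DIMENSIONLESS = "DIMENSIONLESS"


@dataclass(frozen=True)
class Signature:
    """An operator signature: the domain sorts and the single range sort."""

    domain: Tuple[str, ...]
    range_: str


def dimension(
    sort: str,
    operators: Dict[str, Signature],
    constructors: Sequence[str],
) -> Union[int, str]:
    """Apply steps 1-8 of Figure 3 to the trait that defines `sort`."""
    ctor_sigs = [operators[name] for name in constructors]

    # Step 1: sorts found in the domains of the constructors' signatures.
    domain_sorts = {s for sig in ctor_sigs for s in sig.domain}

    # Step 2: no domain sorts at all means a single value, dimension zero.
    if not domain_sorts:
        return 0

    # Steps 3-4: a constructor that joins two or more old values of `sort`.
    if max(sig.domain.count(sort) for sig in ctor_sigs) > 1:
        return DIMENSIONLESS

    # Steps 5-6: the widest constructor; sorts beyond `sort` and the element
    # are taken to be indices.
    widest = max(len(sig.domain) for sig in ctor_sigs)
    if widest >= 3:
        return widest - 2

    # Step 7: some operator returns a sort, other than `sort` itself, that
    # also occurs in a constructor domain, which imposes a 1-D ordering.
    ranges = {sig.range_ for sig in operators.values()}
    if any(s != sort and s in ranges for s in domain_sorts):
        return 1

    # Step 8: no ordering found.
    return DIMENSIONLESS


stack = {
    "{}": Signature((), "C"),
    "push": Signature(("E", "C"), "C"),
    "top": Signature(("C",), "E"),
    "pop": Signature(("C",), "C"),
    "size": Signature(("C",), "N"),
    "isEmpty": Signature(("C",), "Bool"),
}

array2 = {
    "{}": Signature((), "A"),
    "bind": Signature(("A", "DX", "DY", "E"), "A"),
    "apply": Signature(("A", "DX", "DY"), "E"),
    "defined": Signature(("A", "DX", "DY"), "Bool"),
}

stack_dim = dimension("C", stack, ["{}", "push"])     # step 7 applies
array_dim = dimension("A", array2, ["{}", "bind"])    # step 6 applies
```

With these encodings the sketch reproduces the values discussed in the text: one for the stack and two for the two-dimensional array. A trait whose constructor takes two values of its own sort, such as a tree built by joining subtrees, falls out at step 4 as DIMENSIONLESS.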

5 An Example Application

This section describes how LIVE-generated visualizations based on Larch specifications are used to detect an error in the implementation of a column-sorting routine for the assignment problem. The assignment problem is identical to the (weighted) bipartite graph matching problem, which is defined as: G = (V, E) is an undirected bipartite graph such that V can be partitioned into two disjoint sets S and T and all edges have one end point in S and the other in T. A set M of edges is a matching for G if each vertex in G is the end point of at most one edge in M. In the weighted bipartite matching, a weight w(i, j) is associated with each edge (i, j), and the cost of a matching is the sum of the weights of its edges [10]. For this problem, the maximum matching is the one with the minimum cost. For example, the arcs, represented by ones in the match matrix, shown in Figure 5, produce the maximum match with the minimum cost for the corresponding weight matrix.

[Figure 5 panels: Weight Matrix; Match Matrix]

Figure 5: Maximum match with the minimum cost
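The definition of matching cost can be checked on a small case. The Python sketch below computes the cost of a match matrix against a weight matrix and finds a minimum-cost perfect matching by exhaustive search. The 3 x 3 weights are hypothetical values chosen for this illustration, the function names are introduced here, and the brute-force search merely stands in for the parallel algorithm of [10], which is not reproduced.

```python
from itertools import permutations


def matching_cost(weights, match):
    """Sum the weights w(i, j) of the edges marked by ones in the match matrix."""
    return sum(
        w * m
        for w_row, m_row in zip(weights, match)
        for w, m in zip(w_row, m_row)
    )


def min_cost_matching(weights):
    """Try every row-to-column assignment and keep the cheapest (brute force)."""
    n = len(weights)
    best = min(
        permutations(range(n)),
        key=lambda cols: sum(weights[i][cols[i]] for i in range(n)),
    )
    # Encode the assignment as a 0/1 match matrix.
    return [[1 if best[i] == j else 0 for j in range(n)] for i in range(n)]


# Hypothetical weight matrix, not the one shown in Figure 5.
weights = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
match = min_cost_matching(weights)
cost = matching_cost(weights, match)
```

For these weights the cheapest assignment pairs row 0 with column 1, row 1 with column 0, and row 2 with column 2, for a total cost of 5. Exhaustive search grows factorially with the matrix size, so it is useful only for checking small examples.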

A C/Paris [11] implementation of the algorithm was adapted for the case of Single Instruction Multiple Data (SIMD) machines.² A critical component of the algorithm is the sorting routine. The strategy of this routine is to sort columns by the row position of matches, where the first row equals 0, the second row equals 1, and so on. The row positions of the matched arcs are used to label the columns of a sort matrix S, where zeros are placed in the positions that do not contain a match. The preprocessing of the sort matrix S involves spreading the row numbers of the matching arcs along the columns. Next, infinity values, termed infinites, are substituted for zeros. An example is shown in Figure 6. The columns of this sort matrix are then sorted in ascending order. The weights matrix and the match matrix are sorted based on how S is sorted.

² See [10] for details of this algorithm and the results obtained from its implementation.

[Figure 6 panels: Match Matrix; Sort Matrix Before Preprocessing; Sort Matrix After Preprocessing]

Figure 6: Construction of the sort matrix from the match matrix

The underlying data structure for a matrix is a two-dimensional array, and so the trait in Figure 4 applies. Using a combination of the automatic and manual modes of VDE, the programmer is able to generate visualizations for this problem such as those in Figures 7 and 8, where a color scale is used as the graphical primitive for cells organized in a 2-D grid arrangement.

An error is detected in the visualizations of the match matrix³ in Figures 7 and 8, where red represents a match and blue represents a non-match. More specifically, as the sorting progresses, the red dot in the first row moves in the opposite direction of the other red dots (Figure 7) until it eventually ends up at the far right end of the matched columns upon completion of the sorting (Figure 8). Since the columns are sorted according to match row positions, this particular red dot should instead migrate to the first column.

³ These visualizations are for a 128 x 128 matrix.

From our earlier discussion, we recall that a column without a match will contain all zeros and, thus, be used to set the corresponding column of the sort matrix to infinites. The sorting routine will position these columns to the right of the columns containing a match, as is shown in Figure 8. Noting that the misbehaving column is sorted similarly to the columns that do not contain a match, we hypothesize that the corresponding column in the sort matrix has been incorrectly filled with infinites. (Visualizations of the sort matrix were used to confirm this hypothesis, as discussed in [12].) Backtracking in the procedure to find an explanation for the error, we also recall that row indices start at zero. Since any column of the match matrix containing all zeros will trigger its corresponding column in the sort matrix to be set to infinites, it is concluded that the bug was caused by infinites being incorrectly substituted for the zeroth row position.

Figure 7: A visualization of column sorting for the implementation of the assignment algorithm

Figure 8: The visualization of the match matrix after the misbehaving column sort
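The reasoning that leads to the bug can be replayed on a small match matrix. The Python sketch below is a serial reconstruction under stated assumptions, not the C/Paris routine: each column of the sort matrix is collapsed to a single key (every entry of a preprocessed column holds the same value), the function names are introduced here, the 3 x 4 match matrix is hypothetical, and the corrected variant shows only one possible repair, since the fix actually applied is not described in the text.

```python
import math

INFINITE = math.inf  # stands for the "infinites" substituted for zeros


def column_keys_buggy(match):
    """Label each column with the row position of its match, using 0 for
    'no match', then replace zeros by infinites. A match in row 0 also
    yields the label 0, so it is wrongly treated as 'no match'."""
    keys = []
    for column in zip(*match):
        label = sum(row * bit for row, bit in enumerate(column))
        keys.append(INFINITE if label == 0 else label)
    return keys


def column_keys_fixed(match):
    """One possible repair: test for the presence of a match explicitly
    instead of overloading the value 0."""
    keys = []
    for column in zip(*match):
        rows = [row for row, bit in enumerate(column) if bit == 1]
        keys.append(rows[0] if rows else INFINITE)
    return keys


def sorted_column_order(keys):
    """Column indices in ascending key order (stable for equal keys)."""
    return sorted(range(len(keys)), key=lambda j: keys[j])


# Hypothetical match matrix: column 2 is matched in row 0, column 1 is unmatched.
match = [
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
buggy_order = sorted_column_order(column_keys_buggy(match))
fixed_order = sorted_column_order(column_keys_fixed(match))
```

In the buggy variant, column 2 receives an infinite key and is placed to the right of the matched columns, alongside the unmatched column, which is the behavior seen in Figure 8. In the corrected variant the same column sorts first, as the text says it should.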

6 Conclusions

This paper has described a method for semi-automatically generating parametric visualizations for data parallel programs from Larch [3] ADT specifications. An intermediate graphical model, called the cell model [2], and a mapping from the cell model to the visualization templates, has been presented. Finally, a technique that derives dimension information from Larch specifications and translates it to the cell model has been described. Future research will concentrate on expanding the cell model and extending the technique for obtaining cell model information from Larch ADT specifications.

Acknowledgements

The authors wish to thank the anonymous reviewers for their detailed comments, and the Northeast Parallel Architectures Center (NPAC) of Syracuse University for the use of their computational resources.

References

[1] M. Friedell, M. V. LaPolla, S. Kochhar, S. Sistare, and J. Juda, "Visualizing the behavior of massively parallel programs," in Supercomputing '91, Nov. 1991.

[2] M. V. LaPolla, "Towards a theory of abstractions and visualizations for debugging massively parallel programs," in Hawaii Intl. Conf. on System Sciences-25, Jan. 1992.

[3] J. Guttag, J. J. Horning, and J. M. Wing, "Larch in five easy pieces," tech. rep., Digital Equipment Corporation, Systems Research Center, July 1985.

[4] S. Utter and C. M. Pancake, "A bibliography of parallel debuggers," SIGPLAN Notices, vol. 24, no. 11, pp. 29-42, 1989.

[5] G. Roman and K. C. Cox, "A declarative approach to visualizing concurrent computations," IEEE Computer, pp. 25-36, Oct. 1989.

[6] B. H. C. Cheng, "Synthesis of procedural abstractions from formal specifications," in Proc. of COMPSAC'91, (Tokyo, Japan), Sept. 1991.

[7] J. M. Wing, "A specifier's introduction to formal methods," IEEE Computer, vol. 23, pp. 8-24, Sept. 1990.

[8] J. Marks, "A formal specification scheme for network diagrams that facilitates automated design," Journal of Visual Languages and Computing, vol. 2, no. 4, pp. 395-414, 1991.

[9] M. Friedell, S. Kochhar, M. LaPolla, and J. Marks, "Integrated software, process, algorithm, and application visualization," J. of Visualization and Computer Animation, 1992.

[10] M. Brady, K. Jung, M. LaPolla, H. Nguyen, R. Raghavan, and R. Subramonian, "The assignment problem on parallel architectures," in The 1st DIMACS Intl. Algorithm Implementation Challenge: Problem Definitions and Specifications, 1992.

[11] Thinking Machines Corp., Cambridge, Massachusetts, Paris Reference Manual, version 6.0 ed., Feb. 1991.

[12] M. V. LaPolla, J. L. Sharnowski, B. H. Cheng, and K. Anderson, "Using formal specifications to generate visualizations of data parallelism," Tech. Rep. MSU-CPS92-05, Michigan State University, July 1992.

Author Biographies

MARK V. LAPOLLA received his B.A. and M.A. degrees in linguistics from SUNY Stony Brook in 1982 and 1983, respectively, and has an M.A. pending in theoretical linguistics from the University of Texas. He was at Lockheed Research and Development Division from 1987 to 1992 in various labs, including the Software Technology Lab, the Advanced Computing Lab, and the Artificial Intelligence Center. Prior to that, he was with the University of Texas Artificial Intelligence Lab. He is currently at Meta-Software in Campbell, California. His research interests include software engineering, user interface design, and visualization design for debugging software and scientific applications, as well as research in cognition and consciousness.

JOSEPH L. SHARNOWSKI received the B.S. degree in electrical engineering from GMI Engineering & Management Institute in 1988, and the M.S. degree in computer science from Michigan State University in 1991. He is currently a Ph.D. candidate in the Department of Computer Science at Michigan State University. His research interests include the specification of parallel software, program visualization, parallel program debugging, and programming environments for parallel computers.

BETTY H. C. CHENG is an Assistant Professor in the Department of Computer Science at Michigan State University. She received her B.S. from Northwestern University and her M.S. and Ph.D. degrees from the University of Illinois at Champaign-Urbana in 1987 and 1990, respectively. She conducts research in the areas of formal methods applied to software engineering and parallel computing, software development environments, object-oriented development techniques, and logic programming.

KEVIN ANDERSON received a B.A. degree in Economics from the University of California at Los Angeles in 1985 and a B.S. degree in Computer Engineering from San Jose State University in 1990. Currently, he is a senior associate scientist in the R & D division of Lockheed Missiles and Space Company. Part of his research includes working on underwater acoustic signal detection and classification using the Wavelet transform. This research includes developing new graphical displays for extracting useful information from various transforms. Other research interests include graphics programming, computer architectures, and synthesizer design. Past projects include the design and construction of a hybrid music synthesizer along with a graphical user interface to a host computer.

Figure Captions

Figure 1: Stack trait
Figure 2: Three components for the automatic creation of graphics
Figure 3: Technique for finding the dimension of an ADT from its trait
Figure 4: Two-dimensional array trait
Figure 5: Maximum match with the minimum cost
Figure 6: Construction of the sort matrix from the match matrix
Figure 7: A visualization of column sorting for the implementation of the assignment algorithm
Figure 8: The visualization of the match matrix after the misbehaving column sort

Footnotes

From the cover page:

* This work is supported in part by the NSF Grant CCR-9209873.
† This author was at Lockheed Advanced Computing Laboratory, Research and Development Division, when this project was done.

From the body of the paper:

¹ The term "sort" is used in order to avoid confusion with the similar concept "type" from programming languages.
² See [10] for details of this algorithm and the results obtained from its implementation.
³ These visualizations are for a 128 x 128 matrix.
