Evaluation and Comparison of Program Slicing Tools
by Tommy Hoffner
Abstract
This report presents an evaluation and comparison of implementations of program slicing, a technique for extracting parts of computer programs by tracing the programs’ control and data flow related to some data item. The technique has found uses in application areas such as debugging, program integration and testing, where it is useful to be able to focus on the relevant parts of large programs. Static program slicing, which is a compile-time version of the analysis, was first introduced in 1982, whereas run-time based dynamic slicing systems appeared around 1988. However, there has previously not been any comprehensive evaluation of the state of the art regarding slicing system implementations. This report is an attempt to partially fill that need by evaluating five implementations. Not surprisingly, it was observed that dynamic slicing systems often give smaller and more precise slices than static slicing systems, since in the dynamic case the actual flow of control is known. All systems can be regarded as first-generation systems, in that they were mainly developed to support research. They have some performance problems and in several cases support rather small language subsets. Since there are no established criteria yet for how to evaluate program slicing, this report also discusses how to compare slicing tools for different application areas and formulates a method for doing so. The method is then used to evaluate the tools that provide program slicing, with a particular emphasis on Mariam Kamkar’s dynamic interprocedural program slicing tool.
1 Introduction
Program slicing is a way of handling program complexity. It helps the user to focus on relevant parts of the code by using the program’s control and data flow to filter out a subset of the program that preserves some relevant behaviour of the original program. One of the most common application areas is debugging, but slicing is also used for program integration, testing, maintenance and more. Program slicing can, for instance, be used to locate bugs: the erroneous program is sliced with respect to some manifestation of an error, and the resulting slice will contain all parts of the program that could have caused the error. Program slicing has been an active research area since Weiser coined the term in the early eighties [Wei82, Wei84]. However, there are only a few research implementations available, and they provide several different approaches to program slicing. This paper compares those implementations and tries to indicate their usefulness for debugging of real programs.
1.1 What is program slicing
An example of program slicing is shown in Figure 1. Assume that we use slicing to locate a bug that is responsible for erroneous output in the first output statement. A slice with respect to the erroneous output is shown in a bold typeface.
PROGRAM Example(input,output);
BEGIN
  a:= 3;
  b:= 4;
  IF a=3 THEN
  BEGIN
    a:= a + 3;
    b:= b + 7;
  END;
  writeln(a);
  writeln(b);
END.
Figure 1. An introductory example of program slicing.
For this example it is quite easy to realize that, if variable a does not have the right value when it is written out, then the faulty part of the program must be in the slice. This is certainly true if the bug is caused by a faulty calculation. It can be argued whether the slice contains the bug if it is caused by an omitted statement, but even in this case the slice will help the user since it shows which parts of the program can actually influence the erroneous output.
1.2 Contributions
This report concerns the evaluation of tools which provide program slicing. The aim is to compare different measures for the slicing systems in order to draw conclusions about the usability of the different approaches. The only other evaluations of program slicing tools that I have found are an evaluation of the usability of slicing in a debugging process [Lyl84] and some informal evaluations made in order to justify the classification of different approaches [Kam91, Ven91]. Most of the different approaches have been theoretically evaluated to motivate the different algorithms. This report is an attempt to compare slicing tools. To do so, a method for
comparing implementations is formulated. Though the available systems cover many different aspects of slicing, this evaluation will emphasize the differences between static and dynamic slicing. Since this evaluation was made while working together with Kamkar on implementing her approach, some of the results used here can also be found in her dissertation [Kam93]. Her evaluation of the algorithms concentrates on comparing her approach to others, and is based on measurements and examples used in previous work in the area. I have, on the other hand, tried to formulate a method for evaluating program slicing implementations, on which my evaluation is based.
1.3 Report overview
To the author’s knowledge, there exist no other formal comparisons with the same aims as this evaluation. Nor is there any described method for doing so. The report is therefore divided into two parts:
• Formulating a method for evaluating program slicing tools.
• Performing the actual evaluation.
The report also includes an introductory part and an overview of the tools that are considered for the evaluation. The report is therefore structured in the following way:
• Section 2 gives the theoretical framework. It consists of a short introduction to slicing and defines the terminology used for describing the different implementations.
• Section 3 discusses implementation characteristics. The section starts by formulating what characterizes a good slicing tool. Then it continues by describing a way to measure these characteristics and the problems that are involved.
• Section 4 defines a method for evaluating slicing implementations.
• Section 5 describes how the method is applied to the available implementations. The method is adapted to suit the selection of implementations and the aim of the evaluation.
• Section 6 presents the slicing tools that are used in the evaluation. It contains a short description of the different tools and the environment in which they are used.
• Section 7 is about the test data that is used for the evaluation. The section presents the criteria used for selection and the construction of the test data set. There is also a description of the larger test programs and the transformations made to get the different slicing tools to handle the programs.
• Section 8 summarizes and discusses the obtained measures.
• Section 9 discusses the limitations that were noticed while working with the slicing tools.
• Section 10 presents the conclusions drawn from the evaluation and discusses their consequences.
All versions of the test programs are included in the appendix.
2 Theoretical framework
This section gives a brief introduction to program slicing. It also defines the terminology that will be used in the rest of the report. Since the different descriptions to some extent use different terminology, this section also aims at defining a terminology which is applicable to all slicing algorithms and implementations. The section ends by defining a set of classification aspects that can be used to classify slicing algorithms and implementations.
2.1 Graphs
The concepts used here follow the general outline that can be found in most basic compiler course literature [ASU86]. In connection with program slicing two kinds of graph are used: flow graphs and dependence graphs.
PROGRAM NextExample(input,output);
BEGIN
  a:= 3;
  IF a=3 THEN
    proc(a)
  ELSE
    proc(7);
END.
Figure 2. An example of a program.
2.1.1 Flow graphs
A flow graph is a directed graph where the vertices represent program statements. Flow graphs can be used to represent the control and/or data flow of a program. Control-flow graphs (CFGs) in particular are of interest in program slicing. A CFG is a flow graph with one unique entry vertex and one unique exit vertex [Kam93, FOW87]. It models the flow of control in a program so that each path from the entry vertex to the exit vertex represents a possible execution of the program [FOW87]. The term flow graph is sometimes used for control-flow graphs [ASU86, Kam93]. Since other kinds of flow graphs [OO84] exist, this report will use the term CFG. The program in Figure 2 can be described by the CFG in Figure 3.
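[Figure: control-flow graph with the vertices begin, a:= 3, if a=3, proc(a), proc(7) and end. The edge from if a=3 to proc(a) is labelled true and the edge to proc(7) is labelled false; all edges denote control flow.]
Figure 3. A CFG representing the program in Figure 2.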
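The statement that each entry-to-exit path of a CFG represents a possible execution can be illustrated with a small sketch. The dictionary encoding and the function name paths are assumptions made for this illustration only; they are not taken from any of the evaluated tools.

```python
# Hypothetical encoding of the CFG in Figure 3: each vertex maps to its successors.
CFG = {
    "begin": ["a:= 3"],
    "a:= 3": ["if a=3"],
    "if a=3": ["proc(a)", "proc(7)"],  # true branch first, then false branch
    "proc(a)": ["end"],
    "proc(7)": ["end"],
    "end": [],
}

def paths(graph, vertex, exit_vertex):
    """Yield every path from vertex to exit_vertex (this graph has no cycles)."""
    if vertex == exit_vertex:
        yield [vertex]
        return
    for successor in graph[vertex]:
        for rest in paths(graph, successor, exit_vertex):
            yield [vertex] + rest

all_paths = list(paths(CFG, "begin", "end"))
for path in all_paths:
    print(" -> ".join(path))
```

The two paths found correspond to the two branches of the IF statement. A CFG with loops would need a different traversal, since the number of paths is then unbounded.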
2.1.2 Dependence graphs
A program dependence graph (PDG) is a directed graph where the statements and control predicates in a program are represented by vertices and the data and control dependences
are represented by edges [FOW87]. There is also a distinguished vertex called the entry vertex and there might be vertices representing initial definition and final use of variables [HRB88]. A data dependence exists between two statements whenever a variable appearing in one statement may have an incorrect value if the order of the two statements is reversed. A control dependence exists between a statement and a control predicate if the value of the predicate controls the execution of the statement [HRB88]. The edges from a predicate (and from the entry vertex) are labelled with the conditions for which there is a dependence. The program described in Figures 2 and 3 can be represented by the program dependence graph (PDG) shown in Figure 4.
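[Figure: program dependence graph with the vertices Entry, a:= 3, a=3, proc(a) and proc(7). Control dependence edges are labelled true or false; data dependence edges are drawn in a separate style.]
Figure 4. A PDG of the code in Figure 2.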
A dynamic dependence graph is a dependence graph where every executed statement occurs once for every time it is executed. The parts of a program that are not executed are therefore not in the graph. Statements that are executed many times are, on the other hand, represented by many vertices. A reduced dynamic dependence graph is a dynamic dependence graph where multiple occurrences of a statement are represented by more than one vertex only if the executions have different transitive dependences [Agr91]. The PDG can be augmented with other kinds of edges to represent other dependences. To represent a program with procedures we need special kinds of edges to represent the interprocedural dependences. A system dependence graph (SDG) is an extension of a PDG that has special edges to represent these dependences [HRB88].
2.1.3 Execution trees
A program execution tree, or program activation tree, is a tree representing a specific execution of a program [ASU86, Kam93]. Each vertex in the tree represents an activation of a procedure and the root vertex represents the activation of the main procedure. When a procedure is activated a new vertex is generated and it becomes a child of the vertex that represents the calling procedure. The resulting tree will have one vertex for each activation of a procedure, ordered from left to right in chronological order. An execution of a program that consists of a main procedure that calls procedure proc twice can be represented by the tree outlined in Figure 5.
[Figure: execution tree with root vertex Main and two child vertices, proc(7) and proc(8).]
Figure 5. An example of an execution tree.
2.2 Program slicing
Program slicing is a method for decomposing programs into slices or segments. The decomposition is made with respect to a set of variables and a statement in the program. This pair is called a slicing criterion, denoted C = <i, V>, where i denotes a statement and V a set of variables occurring in the program [Wei84]. For some of the algorithms, the variables in V have to occur in the statement i [HRB88, Kam93]. The slice consists of all statements that might influence the values of the variables in V at the given statement i in the program [HRB88]. Another way of seeing it is that a slice of
a program will show the same behaviour as the original program at the statement in the slicing criterion [Wei84]. These definitions imply that there exist many slices for a given program and slicing criterion. The whole program is one of the possible slices, though there probably exist smaller slices. The smallest slice for a given program and slicing criterion is called a minimal slice. The intended use of the slice imposes slightly different definitions of a slice and therefore different notions of what is supposed to be a minimal slice. Even if there were a universal definition of what a slice is, there can be no algorithm that finds a minimal slice for an arbitrary program, since that would be equivalent to solving the halting problem [Wei84], which is undecidable [Sal85]. Algorithms for finding program slices therefore impose some restrictions on the programs that can be sliced and on what is assumed to be a minimal slice. For the rest of this discussion we will assume that the programs that are to be sliced terminate.
Figure 6 is a trivial example of a (minimal) slice. Suppose we notice that the variable b does not have the correct value after the execution of the statement b:=a+3. To locate the bug we slice the program with the slicing criterion C = . In this case the bug must be in the last expression or in the values that are used to compute the value of that expression. The only statement that influences the values of the variables that are used in the last expression is the initial assignment to the variable a. The program slice will thus consist of these two statements. The part of the program that is included in the slice is set in bold typeface in the figure.
PROGRAM AnotherExample(input,output);
BEGIN
  a:= 3;
  b:= 7;
  write(b);
  b:= a + 3;
END.
Figure 6. A program slice for C=.
Extensive descriptions of slicing algorithms can be found in papers by Weiser [Wei84], Korel [KL88, KL90], Horwitz, Reps and Binkley [HRB88, HRB90], Agrawal [AH90, Agr91] and Kamkar [KSF92, Kam93].
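As a minimal sketch of the idea, and not a rendering of any of the cited algorithms, the backward slice of the straight-line program in Figure 6 can be computed by walking backwards from the criterion statement and collecting the definitions of the variables that are still needed. The tuple encoding of statements and the name backward_slice are assumptions introduced for this illustration; control flow and procedures are deliberately left out.

```python
# Figure 6 encoded as (text, defined variable, used variables); straight-line code only.
PROGRAM = [
    ("a:= 3", "a", set()),
    ("b:= 7", "b", set()),
    ("write(b)", None, {"b"}),
    ("b:= a + 3", "b", {"a"}),
]

def backward_slice(program, index):
    """Return the indices of the statements that can influence statement `index`.

    The slicing point is taken to be after the statement, so the statement
    itself is part of the slice.
    """
    in_slice = {index}
    needed = set(program[index][2])  # variables whose values still need explaining
    for i in range(index - 1, -1, -1):
        _, defined, used = program[i]
        if defined in needed:
            in_slice.add(i)
            needed.discard(defined)  # earlier definitions of it are overwritten
            needed |= used
    return sorted(in_slice)

slice_text = [PROGRAM[i][0] for i in backward_slice(PROGRAM, 3)]
print(slice_text)
```

For the statement b:=a+3 the sketch selects the initial assignment to a and the statement itself, which agrees with the two statements named in the discussion of Figure 6.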
2.3 Classification aspects for algorithms
The algorithms are made for different applications and use different kinds of information. In order to compare them, their distinguishing features have to be classified. There have been several attempts with incompatible results to classify slicing algorithms for comparison and evaluation. The classification presented here is derived from classifications made by Kamkar [Kam93] and Venkatesh [Ven91]. Some aspects that are relevant for classifying the implementations have been added.
2.3.1 Slicing variable
As mentioned in section 2.2 the algorithm can constrain the variables in V to those that occur in i, while other algorithms allow an arbitrary set of the program’s variables to be in V. The slicing criterion C = will only be meaningful in the latter case.
2.3.2 Type of slice
The algorithms can be classified according to whether the generated program slice is executable or not. If the algorithm always produces an executable slice, it is said to generate a partially equivalent program; otherwise it is said to produce a set of statements [Kam93]. Whether the declarations and other nonexecutable parts of the program are included in the slice is of no concern for this aspect. Whether these parts are included is only a question of how the algorithm’s result is mapped back to the code, by the tool or manually. According to Horwitz et al. all algorithms for finding intraprocedural slices (see section 2.3.4) will generate partially equivalent programs [HRB88]. Venkatesh, on the other hand, claims that both types of slices can be generated by intraprocedural slicers [Ven91]. My test cases in the appendix support Venkatesh’s view. The example in Figure 6 (and all the other examples) shows a program slice that is a partially equivalent program.
2.3.3 Slicing point
When we consider the execution of a program, we do not use any model that describes what happens inside a single statement. We only model the situation between the statements. We therefore introduce the notion of a point in the program, that is, the position in the program for which the slicing is considered. We can have the slicing point immediately before or immediately after the statement in the criterion. If we have the slicing point immediately before the statement in C, the statement i in C will not be a part of the computed slice. Some of the algorithms that define the slicing point to be before the statement will include i in the presented slice anyway [Lyl84].
The example in Figure 6 assumes that the slice is taken with respect to the situation after the imagined execution of the statement in C. If we consider the slicing point to be before the imagined execution of the statement in C, then the value of the variable b is only influenced by the initial assignment of b at the beginning of the program. Figure 7 shows an example of a slice where the slicing point is considered to be immediately before the statement in C.
Whether the slicing point is considered to be before or after the statement in the criterion affects which slices can be generated. If we, for instance, consider the slicing point to be after the statement, we will have problems expressing criteria that capture the program point where two execution paths join. Suppose that we want a slice directly before b:=a+3 in the program in Figure 7, i.e. which statements influence the value of b immediately before we reach the statement. If we consider the program point to be after the statement in C, there is no slicing criterion that captures that point for all possible executions.
PROGRAM YetAnotherExample(input,output);
BEGIN
  b:= 7;
  IF b>3 THEN
    a:= 3
  ELSE
    a:= 4;
  b:= a + 3;
END.
Figure 7. A program slice for C= to illustrate program slicing when the slicing point is before b:=a+3. Notice that the statement b:=a+3; is included in the slice.
How restrictive the definition of the slicing point is depends on the other classification aspects, such as the set of allowed variables in the criterion.
2.3.4 Scope of slice
The previous definition of program slicing does not take side effects due to procedure calls into account. It assumes that called procedures never modify their parameters. Nor does it consider side effects due to aliasing or global variables. Algorithms with these limitations can only be used to slice one procedure at a time. Algorithms that work according to this definition are classified as intraprocedural. If the slicing algorithm can handle slicing across procedure boundaries it is called interprocedural.
2.3.5 Slicing direction
Originally, a program slice was defined as the part of the program that might have influenced the variable at the statement in the slice criterion [Wei84], a so-called backward slice. The slices in Figure 6 and Figure 7 are examples of backward slices. Later extensions of the definition also include forward slices, which means slices that include all statements which might be influenced by the variable’s value at the statement in the criterion [HRB88]. A forward slice for C= will include the initial assignment of the variable a and all statements that might be affected by a’s value, as shown in Figure 8.
PROGRAM AnotherExample(input,output);
BEGIN
  a:= 3;
  b:= 7;
  write(b);
  b:= a + 3;
END.
Figure 8. A program slice for C= to illustrate forward slicing.
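A forward slice can be sketched in the same spirit by propagating the set of affected variables forwards from the criterion statement. The encoding and the name forward_slice are assumptions made for this illustration; only straight-line code is handled, and the criterion is assumed to be the initial assignment to a, as described in the text.

```python
# The program of Figure 8 encoded as (text, defined variable, used variables).
PROGRAM = [
    ("a:= 3", "a", set()),
    ("b:= 7", "b", set()),
    ("write(b)", None, {"b"}),
    ("b:= a + 3", "b", {"a"}),
]

def forward_slice(program, index):
    """Return the indices of the statements that may be affected by statement `index`."""
    in_slice = {index}
    affected = {program[index][1]}  # variables carrying the value defined at `index`
    for i in range(index + 1, len(program)):
        _, defined, used = program[i]
        if used & affected:
            in_slice.add(i)
            if defined is not None:
                affected.add(defined)  # the influence spreads to the defined variable
        elif defined in affected:
            affected.discard(defined)  # overwritten with an unaffected value
    return sorted(in_slice)

slice_text = [PROGRAM[i][0] for i in forward_slice(PROGRAM, 0)]
print(slice_text)
```

Starting from the assignment to a, the sketch selects that assignment and b:=a+3, while b:=7 and write(b) are left out because they never see a value derived from a.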
2.3.6 Abstraction level
There exist several possible abstraction levels at which a slicing tool can work. The ones that will be used here are statement and procedure level abstraction. If an algorithm works at the statement level it means that the units that are considered for inclusion in the slice are statements. Procedure level abstraction means that the algorithm will decide if whole procedures should be included in the slice or not.
2.3.7 Type of information
Another aspect is whether the slice can be computed at compile time, i.e. whether the slicing algorithm uses only statically available information, or whether it also uses run-time information from an actual execution. The first kind is called static slicing, the second dynamic slicing. The methods that use dynamic information only consider the statements from one specific execution for inclusion in the slice. A slice based on the statically available information can be found in Figure 9 and a slice of the same program based on the run-time information can be found in Figure 10.
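The difference can be made concrete with a small sketch for the program used in Figures 9 and 10. The dependence sets below are written by hand for this one program, and all names (STATEMENTS, STATIC_DEPS, dynamic_deps, run) are assumptions for the illustration. A real tool would derive the static dependences by analysis and the dynamic ones from an instrumented execution.

```python
# Statement -> (defined variable, used variables, controlling predicate).
STATEMENTS = {
    "read(a)": ("a", set(), None),
    "b:= 7": ("b", set(), None),
    "IF a > 3": (None, {"a"}, None),
    "b:= 4": ("b", set(), "IF a > 3"),
    "b:= b - 1": ("b", {"b"}, "IF a > 3"),
    "write(b)": (None, {"b"}, None),
}

# Hand-written static data and control dependences, valid for every input.
STATIC_DEPS = {
    "read(a)": set(),
    "b:= 7": set(),
    "IF a > 3": {"read(a)"},
    "b:= 4": {"IF a > 3"},
    "b:= b - 1": {"IF a > 3", "b:= 7"},
    "write(b)": {"b:= 4", "b:= b - 1"},
}

def reachable(deps, start):
    """Collect every statement reachable from `start` along dependence edges."""
    seen, work = set(), [start]
    while work:
        statement = work.pop()
        if statement not in seen:
            seen.add(statement)
            work.extend(deps[statement])
    return seen

def run(a):
    """Return the statements executed for input a, in order (the program has no loops)."""
    branch = "b:= 4" if a > 3 else "b:= b - 1"
    return ["read(a)", "b:= 7", "IF a > 3", branch, "write(b)"]

def dynamic_deps(trace):
    """Dependences that actually occurred during one execution."""
    last_definition, deps = {}, {}
    for statement in trace:
        defined, used, predicate = STATEMENTS[statement]
        deps[statement] = {last_definition[v] for v in used if v in last_definition}
        if predicate is not None:
            deps[statement].add(predicate)
        if defined is not None:
            last_definition[defined] = statement
    return deps

static_slice = reachable(STATIC_DEPS, "write(b)")
dynamic_slice = reachable(dynamic_deps(run(5)), "write(b)")
print(len(static_slice), sorted(dynamic_slice))
```

For the write statement, the static slice contains all six statements, whereas the dynamic slice for input 5 leaves out b:=7 and the ELSE branch, since neither contributed to the value written in that execution.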
2.3.8 Computing method
Next we consider the different approaches to computing slices. The original method for computing slices was defined by Weiser [Wei84] through solving dataflow equations. Later there have been definitions based on different dependence graph representations of the programs [OO84, HRB88]. They compute slices by traversing dependence graphs such as PDGs. These approaches are called graph reachability in dependence graphs.
2.4 Classification aspects for implementations
The implementations are basically classified according to their algorithms. But to get a good picture of what the implementations can handle there are three more classification aspects to consider. They are implementation dependent and will therefore help to classify the actual implementations.
2.4.1 Application area
The usability of a given slice is dependent on the application area for which it is intended. Slices containing only executed statements from a particular execution are not of much use for deciding whether two versions of a program can be merged, but they are sufficient for locating a bug. The most explored application areas are program debugging, maintenance, integration and testing.
2.4.2 Language
The language for which the slicing tool is intended gives a good picture of what can be expected of the tool. For a comparison to be possible, the languages have to be fairly alike. This report only deals with algorithms and implementations for imperative languages such as FORTRAN, C and Pascal.
2.4.3 Output format
Partly depending on the application area, the tools present the computed slice in different ways. This might influence the feasibility of comparing the computed slices. Even if it were feasible to transform the outputs to a canonical format for comparison, it might be too time consuming to do so.
2.5 Summary of classification aspects
The definitions made in this section are summarized in Table 1.
Table 1: Classification aspects
Aspects               Categories
Slicing variables     Only those in the slicing statement    All variables
Type of result        Partially equivalent program           Set of statements
Slicing point         Before the statement                   After the statement
Scope of slice        Interprocedural                        Intraprocedural
Slicing direction     Forward                                Backward
Abstraction level     Statement                              Procedure
Type of information   Static                                 Dynamic
Computing method      Solving dataflow equations             Graph reachability in dependence graphs
Application area      Debugging, testing, program integration, etc.
Language              Several imperative languages
Output format         Code, graph, execution tree etc.
3 Slicing implementation characteristics
This section discusses what characterizes a good slicing tool and on what grounds it is possible to measure those characteristics. A method which uses these measures will be presented in the next section. I want to do the evaluation from a pragmatic point of view. The aim is to evaluate the resulting slices, not the complete tool or the user’s productivity when using the tool. Since the tools are for research use, we will not try to measure how efficiently a particular tool implements the algorithms. First we have to define the characteristics. Intuitively the tool should:
• Generate correct slices
• Generate relevant slices
• Not restrict the use of the source language
• Be time and space efficient
To decide how to evaluate the different implementations the criteria above will be discussed one by one below.
3.1 Correct slices
A slice is incorrect if it excludes something that should have been in the minimal slice. Theoretically correctness should be easy to evaluate since there exist only two alternatives: the slice is either correct or not. There are however some problems to consider. First of all, the algorithms introduce restrictions on the source code that is used as input, for instance that the program has to terminate. Secondly, what is a correct slice depends on the purpose for which the algorithm was defined. A program slice that contains all relevant parts of a program for a specific input will be useful for locating a bug, but it might be too small for an application area where we have to consider all possible inputs.
It is not feasible to verify that the tools are correct in an absolute sense, since this would require testing all combinations of language constructs and data types. The only feasible way is to verify that the slicing tool handles the language constructs and data types in simple test cases. To eliminate influence from constructs other than the ones we want to test, each test case should use as few constructs as possible. We will therefore assume that the tool computes correct slices for a given construct if the test cases that contain the construct give correct slices.
3.2 Relevant slices
By generating relevant slices I mean that program slices should help the user to focus on relevant parts of the program. Since the main intent with program slicing is to break down the complexity of the programs, the best slice for a given program and slicing criterion should be the smallest correct slice. A slice efficiency factor could be measured by observing how close to minimal slices the tool’s results get. As mentioned earlier the definition of minimal slice depends on the intended use. This means that the intended application area will influence the measure. To exemplify this we can consider the program in Figure 9. Assume that it does not produce the expected answer. To locate the faulty behaviour we slice the program with C=. With static slicing the slice will be the entire program as shown in Figure 9. If we use dynamic slicing, assuming that the faulty behaviour was noticed for input 5, then the slice would only consist of the statements that are shown in Figure 10.
PROGRAM StillAnotherExample(input,output);
BEGIN
  read(a);
  b:= 7;
  IF a > 3 THEN
    b:= 4
  ELSE
    b:= b - 1;
  write(b);
END.
Figure 9. A slice for C= using static information.
PROGRAM StillAnotherExample(input,output);
BEGIN
  read(a);
  b:= 7;
  IF a > 3 THEN
    b:= 4
  ELSE
    b:= b - 1;
  write(b);
END.
Figure 10. A slice for C= using runtime information from an execution with the input value 5.
Both slices are minimal in some sense, but the second slice is more useful for locating the bug. If the intention is to find all statements whose modification could affect the value
of the variable b at the write-statement, then the slice shown in Figure 9 is the only slice that contains all those statements. The only thing we can do is to measure the reduction of the program and assume that the smallest slice is the best. The program/slice ratio will measure the slice efficiency as long as the tools generate correct slices for the application area under consideration.
The unit of measure depends on the abstraction level. If the tools work at the procedure level then we can count the number of procedures or we can count the total number of statements in the included procedures. The latter should be avoided since it gives a misleading measure for procedures that have only a small number of their statements in the slice. The easiest way to measure the size of the program slice would be to count the number of lines or characters, as long as the tool presents code as output. For the purpose of comparison this is only useful if the tools slice programs written in languages with roughly equivalent syntax. It will not work if we want to compare tools that slice languages with different features, such as Pascal versus some declaration-free language. To compare such languages we have to find another unit of measure. If the languages are closely related then the number of statements or the number of procedures could be used to measure program size. An alternative is to measure the size of the abstract syntax trees if the tools use comparable node types, but the cost of measuring size would then be considerably higher.
It is not obvious how to grade two slices of equal size with different contents. The notion of a minimal slice leads to the conclusion that if we have two correct slices with different contents, then the differing statements are not required to be in the slice. If they were required, then the slice that does not include them would be incorrect, which contradicts the assumption that both slices are correct. Therefore a smaller correct slice must exist that does not include the differing statements. Assuming both slices are correct, it is therefore of no interest to try to grade slices of the same size.
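As a small worked example of the program/slice ratio at the statement level, consider the program of Figures 9 and 10. The function name program_slice_ratio is an assumption for this illustration, and the statement counts assume that the IF predicate is counted as one statement and that declarations and BEGIN/END are not counted.

```python
def program_slice_ratio(program_size, slice_size):
    """Return program size divided by slice size, both counted in the same unit."""
    if slice_size <= 0 or slice_size > program_size:
        raise ValueError("a slice must be a non-empty part of the program")
    return program_size / slice_size

# Six statements in the program: read, assignment, IF predicate,
# two branch assignments and write.
static_ratio = program_slice_ratio(6, 6)   # the static slice is the whole program
dynamic_ratio = program_slice_ratio(6, 4)  # the dynamic slice for input 5
print(static_ratio, dynamic_ratio)
```

A ratio of 1.0 means that slicing gave no reduction at all; larger values mean a smaller slice. The ratio is only meaningful when both slices are correct for the application area under consideration.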
3.3 Restrictions on the language
Most of the algorithms impose some restrictions on the programs that are to be sliced. There are for instance very few algorithms that can handle aliasing due to pointers. The actual implementations of the algorithms will almost always further restrict the programs that are to be sliced. In order to do a fair evaluation these restrictions have to be measured or at least described. In order to evaluate the restrictions we have to determine which restrictions to consider. The direct approach would be to only consider the restrictions mentioned in the algorithms or by the implementors. Another approach is to consider everything that can be expressed in some reference language but not by the tool as a restriction. A drawback of this approach is that the restrictions will depend on the choice of reference language. A third approach is to consider everything that at least one of the implementations can handle as a restriction for the tools that can not handle it. In order to objectively compare different restrictions we would have to grade them according to how much they limit the use of the tools. How limiting a restriction is depends on the language, the application area for the code, programming style etc. If we do not want to evaluate with respect to a certain intended use of the language we will have to be content with describing the restrictions without an absolute grading.
3.4 Efficiency
Since this evaluation is focused on the implementations it would be unfair to evaluate the efficiency only by comparing the time and space complexities of the algorithms. On the other hand there are problems with evaluating the efficiency of the implementations. The measurements cannot be conclusive whatever approach we choose.
We are only interested in the time and space used for computing slices. If the tools can be used for other tasks than computing slices, the influence of these functions on the time and space requirements should preferably be excluded from the measurements. Unfortunately it is nontrivial to decide what is actually needed to accomplish the slicing functionality. A tool can, for instance, use two data fields to hold some information that could be squeezed into one if program slicing was the only intended use of the tool. Those unnecessary parts that can be distinguished should however be excluded from the measurements.
The complexity measures that can be found in connection with the algorithms are almost always worst case complexities and they use different units for the size of the input. The complexity of dynamic algorithms depends on the number of executed statements, while that of static algorithms does not. If we want a common unit of measure we would have to make the complexity measure independent of the number of times each statement is executed. If we assume the worst case, each statement can be executed an arbitrary number of times. Since this is never the case (for terminating programs) we would get unjust measurements for the dynamic algorithms.
From a pragmatic point of view the average complexities are of more interest than the worst case complexities. Since average complexities are fairly hard to compute, an alternative is to interpolate a complexity function from test data. This will only be an approximation, and it requires a fairly large number of test programs, but it is still preferable.
The two interesting efficiency measures are time and space requirements. These two measures will be further analysed below.
3.4.1 Time efficiency
The easiest way to measure time requirements would be to measure the execution time for the tool. This could however be a bad measure since the tool may invoke functions that are only needed for its other tasks. To get a better measure we would need the source code for the tool so that we could measure the execution time for the relevant parts with gprof [Sun90] or an equivalent tool. If we decide to measure a complete invocation of the tool we have to specify what we mean by an invocation. In order to define this we will have to look closer at what is actually done by a slicing tool. The computation of slices can be viewed as being composed of different phases. For static slicing tools these phases are the preparation of the code and the computing of the actual slice. What is performed in each phase can differ for different algorithms and different implementations. The time requirements for static slicing tools consist of two parts: one for preparing each program and one for computing each slice. The total cost for a slice consequently depends on how many slices are computed for each prepared program. A measure for time requirements would therefore have to take the intended number of slices per program into account. In the case of dynamic program slicing we also have to take the compile and the run time into account. It is not obvious how much of it should be attributed to the cost of computing the slice. The easiest alternative to measure would be to consider the total compile and run time to be the cost of computing the slice. We can also choose to attribute only the increase in compile and run time to the cost of computing slices.
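The dependence of the cost per slice on the number of slices computed per prepared program can be illustrated with a small sketch. The function name and all figures are hypothetical and only show the arithmetic. For dynamic tools the preparation term is assumed to include whatever share of the compile and run time is attributed to slicing.

```python
def cost_per_slice(prepare_time, slice_time, slices_per_program):
    """Average time per slice when one preparation is shared by several slices."""
    if slices_per_program < 1:
        raise ValueError("at least one slice must be computed")
    total = prepare_time + slices_per_program * slice_time
    return total / slices_per_program


# Hypothetical figures: 60 s to prepare a program, 5 s to compute each slice.
for n in (1, 10, 100):
    print(n, cost_per_slice(60.0, 5.0, n))
```

With a single slice per preparation the preparation time dominates. As the number of slices grows, the average approaches the pure slicing time, which is why the intended number of slices per program has to be stated together with the measurement.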
3.4.2 Space efficiency
Space requirements for slicing consist of two parts. One part is the memory space allocated during execution because of the size of the program that is to be sliced. The other part is file space needed for temporary files or extended source files. Unlike the time requirements, the memory space used by a tool has no easily measurable manifestation. An approximation of the maximal memory space required could be obtained through system calls, or separate programs like pstat [Sun90]. If we have access to the tools' source code we could annotate it to trace the allocation (and deallocation) of memory. Another approach is to measure intermediate formats, i.e. the attributed graphs. The unit of measure should then be the number of vertices if they are all of one fixed size. If the implementation includes edges with information then their number and size should be included as well. If the tools can present the generated graphs and other space consuming data structures we can measure this. For the dynamic slicing tools the amount of run time trace information will have to be added to the total space requirements. The instrumentation of the program by the dynamic slicing tools will also increase the space requirements. The easiest way to measure trace and instrumentation is to measure the sizes of the files in bytes since this will be proportional to the number of trace points and added statements respectively. Since we do not attempt to measure how space efficient the compilers are, we will not try to measure the extra space used to compile the instrumented programs. The rate at which the compiler's space requirements increase will be proportional to the rate at which the program size increases, so we can use this as an indication of how much extra space is required by the compiler.
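The space measures discussed above can be collected with very little machinery. The following is a minimal sketch, assuming that the tool reports its vertex and edge counts and that trace files and instrumented sources are ordinary files. All function names and the per-vertex and per-edge sizes are hypothetical.

```python
import os


def graph_space(num_vertices, vertex_size, num_edges=0, edge_size=0):
    """Approximate graph size in bytes, assuming fixed-size vertices and edges."""
    return num_vertices * vertex_size + num_edges * edge_size


def file_space(paths):
    """Total size in bytes of trace files or other temporary files."""
    return sum(os.path.getsize(path) for path in paths)


def instrumentation_growth(original_path, instrumented_path):
    """Size of the instrumented source relative to the original source."""
    return os.path.getsize(instrumented_path) / os.path.getsize(original_path)


# Hypothetical graph: 1000 vertices of 16 bytes and 2500 edges of 8 bytes.
print(graph_space(1000, 16, 2500, 8))
```

The growth ratio is the figure that can serve as an indication of the extra space needed by the compiler, since that space is not measured directly.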
3.5 Other considerations
In order to make it as easy as possible to measure, the language subsets used in the evaluation should consist only of those constructs that have counterparts in all the recognized languages. Tools that have been constructed for large languages, and that may therefore use more complex algorithms, can give inferior measures compared to tools that only implement the evaluated language. The only way to take this into account is to explicitly state the size of the recognized language compared to the allowed subset.
3.6 Conclusions
All the characteristics described in this section can to some extent be measured. The conclusions that can be drawn from these measures depend on how much effort we are willing to put into measuring and on the classification aspects that differentiate the tools. The quality of the evaluation will be highly influenced by how many of the following classification aspects fall into different categories:
• Abstraction level. It is hard to compare the computed slices of a tool that only works with procedures with one that works with statements.
• Scope of slice. It is hard to create test cases that are equivalent for interprocedural and intraprocedural tools.
• Output format. It can be very expensive to obtain measurements of the size of the computed slice if the output formats are substantially different.
• Intended application area. If the intended application areas differ then it may not be clear what should be regarded as a correct slice in the evaluation. There can also be problems with finding equivalent example programs due to application area dependent restrictions.
• Type of information. It is hard to define easily obtainable efficiency measures to compare tools that use static and runtime information respectively.
• Language. It will be unfair to compare the efficiency of tools that handle languages of different size.
The more of the classification aspects that differentiate the tools, the harder it will be to compare them. It is for instance infeasible to compare an intraprocedural tool that works at the statement level with an interprocedural tool that works at the procedure level. The intended use of the tool is the most critical classification aspect since this will define what is a relevant slice as well as influence the other aspects. There is no way to compare slicing tools without any presumptions and get results like "tool A is better at program slicing than tool B". The only possible goal for an evaluation of slicing tools in general is to evaluate under certain presumptions such as the intended application area, the abstraction level etc. Even under these presumptions it will be hard to make any claims about the general usability of the tool.
4 A method for evaluating slicing tools
This section presents a general method for evaluating slicing tools. The method describes how to measure the characteristics described in the previous section. My aim is to formulate a method for the evaluation of slicing tools in a black-box fashion. The method will only consider aspects visible to the user. I will assume that the user wants to compare different tools to each other. As far as possible the measures should therefore be formulated in a way that makes the obtained measurements usable for comparison of different tools. Characteristics which are not feasible to measure in this way will instead be described. To do a fair comparison according to the previous section we will have to compare slicing tools intended for the same application area and for similar languages. This will be assumed in this evaluation model. If the tools that we want to evaluate do not have too many differing classification aspects then the evaluation should concentrate on the following points:
• Measuring the program/slice ratio for different slices
• Time and space requirements
• Discussions about the restrictions
To make this method usable even if the application areas differ, the method will also contain simpler measures that can give a crude notion of the relation between the tools. The method contains five steps, which the rest of the section will discuss further:
• Classifying the tools
• Verifying a language subset
• Constructing test programs
• Extracting measurements
• Analyzing limitations
4.1 Classifying the tools
First we have to classify the tools according to the aspects defined in section 2.5. If many aspects differ we will have to consider using the simpler measures in the rest of the evaluation.
4.2 Verifying a language subset
Since the tools only recognize small or toy languages and since the descriptions of these languages are often vague, the constructs that the tools can handle should be verified. Since there are features that are fairly hard to verify, the verification should concentrate on the features that will be used in the rest of the evaluation.
4.3 Constructing test programs
All test programs should solve real-world problems. By real-world problems I mean that the programs should originally have been constructed to solve some real problem, not only to be used for benchmarking. The reason I do not use synthetic benchmarks is that it is difficult to define what characterizes a typical program. It can also be hard to find large programs that can easily be modified to use only the language subsets the tools can handle. The difficulty of finding suitable programs does, however, also reduce the risk of choosing programs that favour one of the tools. If the complexity measure is to be interpolated in the way that has been outlined in section 3.4, programs of different sizes should be selected. One of the problems with finding suitable large programs is that they tend to use a large number of language constructs, data types, library functions etc. The programs will have to be modified to use only the verified subset. Since we want to use programs that solve real-world problems we should also choose programs that need as little modification as possible. If for instance the tools handle I/O badly then I/O intensive programs should be avoided. If the tools are intended for different languages then the modified programs will have to be constructed for each language. The modifications should be as few and small as possible to make it possible to verify that the modified programs are equivalent to the original ones.
4.4 Extracting measurements
The quantitative measures should be obtained from test runs using the test programs described above. According to the previous section there are three different measures of interest: reduction of code size, time complexity and space complexity. The following sections describe how they can be obtained.
4.4.1 Slice quality
As stated in section 3.2, we can measure slice quality by measuring the size of the sliced program compared to the original size. The tools' output formats will determine which unit of measure we can use. If one of the tools only works at the procedure level we will count the number of procedures. If all the tools work at the statement level the most impartial measure would be the number of vertices in the abstract syntax tree, CFG or PDG. Second best would be to measure the number of statements, if the languages are reasonably alike. If the tools handle similar languages and if the output formats are alike then some trivial measure like code size in bytes could be used, although we have to be careful with how the output is formatted, how many comments are included and how language specific features are expressed. If the tools work at different abstraction levels, the highest used abstraction level will decide the unit of measure. Since the procedure level tools do not make any claims about the intraprocedural dependences it would be wrong to let them influence the measure. Since we assume that the intended use is debugging we will always try to transform the result to a subset of the original program. If a tool uses dynamic information and the output is a slice in an execution tree we will have to map it back to the original program, since we will only be interested in which statements are candidates for containing an error. We will not be interested in which invocations are candidates, especially not if the program runs through a large number of loops or makes a large number of recursive calls.
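The mapping from a slice over an execution back to the original program, followed by the size comparison, can be sketched as follows. The representation of a dynamic slice as (unit, occurrence) pairs and both function names are assumptions made for the illustration. The ratio is taken here as slice size divided by program size.

```python
def project_to_units(dynamic_slice):
    """Map a slice over an execution back to units of the source program.

    dynamic_slice is an iterable of (unit_id, occurrence) pairs, where a unit
    is a statement or a procedure. The occurrence number is dropped so that
    each unit is counted once, however many times it was executed.
    """
    return {unit for unit, _occurrence in dynamic_slice}


def slice_ratio(slice_units, program_units):
    """Fraction of the program's units that are included in the slice."""
    program_units = set(program_units)
    if not program_units:
        raise ValueError("empty program")
    return len(set(slice_units) & program_units) / len(program_units)


# Hypothetical program with ten statements; statement 3 is executed in a loop.
program = range(1, 11)
dynamic_slice = [(3, 1), (3, 2), (3, 3), (5, 1), (7, 2)]
print(slice_ratio(project_to_units(dynamic_slice), program))
```

Here three occurrences of statement 3 collapse into one candidate statement, so the slice covers three of the ten statements regardless of how many iterations the loop made.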
4.4.2 Complexity
The most pragmatic approach would be to interpolate the complexity from the values obtained during slicing of the test programs. For this we would need a number of test programs of varying size. If the algorithms' complexity is known then we can use it to estimate the number of test programs needed to calculate our complexity function. If the number of available test programs is large enough to allow us to select some of them, then we should select them so that we have control over parameters such as programming style, i.e. select programs so that these parameters do not differentiate the programs used. The input unit for the complexity function should be as easily collectable as possible, like the number of source code statements or the number of procedures.
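One simple way to interpolate a complexity function from measurements is sketched below. It assumes that the cost follows a power law, cost = c * size^k, and fits c and k by least squares on logarithmic scales. The power-law model, the function name and the data are assumptions made for the illustration; the method itself does not prescribe a particular model.

```python
import math


def fit_power_law(sizes, costs):
    """Least-squares fit of cost = c * size**k in log-log space; returns (c, k)."""
    if len(sizes) != len(costs) or len(sizes) < 2:
        raise ValueError("need at least two (size, cost) pairs")
    xs = [math.log(size) for size in sizes]
    ys = [math.log(cost) for cost in costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("program sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                        # slope: the estimated exponent
    c = math.exp(mean_y - k * mean_x)    # intercept: the constant factor
    return c, k


# Hypothetical measurements: program size in statements, slicing time in seconds.
sizes = [100, 200, 400, 800]
costs = [0.001 * n ** 2 for n in sizes]
c, k = fit_power_law(sizes, costs)
print(round(c, 6), round(k, 3))
```

With only a handful of test programs the fitted exponent is a rough indication at best, which is why programs of clearly different sizes are needed.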
Time complexity
If we have access to the source code for the tools, we should use a tool like gprof [Sun90] to obtain the execution time for each procedure, assuming that we can identify the procedures needed for slicing. If this is not feasible we will have to measure the execution of a whole session with the tool. Since the measure has to be made with an actual user situation in mind, we will assume that the intended use is debugging sessions that follow the work cycle compilation–execution–slicing–modification. We will also assume that only one slice will be computed for every preparation of a program. If the above approaches are unusable there is always the possibility of measuring the time with an external clock. Since this will not be reproducible on time-sharing systems, even fewer conclusions can be drawn from such measures.
Space complexity
We will measure the number of vertices in the internal graphs if this information is available. If the sizes of the vertices are known then this should be included in the measure as well. If the above measures are not obtainable we will have to measure the space used by the program with a separate program like pstat [Sun90]. With this approach it can however be hard to find out when the space usage reaches its maximum.
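Measuring a whole invocation from the outside can be sketched as below. The sketch assumes that the tool can be started from a command line and runs to completion without user interaction, which does not hold for the interactive tools. The command shown is a placeholder and not one of the evaluated tools. Repeating the run and reporting both minimum and median is one way of reducing the influence of other load on a time-sharing system.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repetitions=5):
    """Run a command several times and return the wall-clock times in seconds."""
    times = []
    for _ in range(repetitions):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


# Placeholder command; a real measurement would invoke the slicing tool instead.
samples = time_command([sys.executable, "-c", "pass"], repetitions=3)
print("min %.3f s, median %.3f s" % (min(samples), statistics.median(samples)))
```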
4.5 Analyzing limitations
The limitations will mainly be the ones that are observed during the other parts of the evaluation. They will be found because of failed verification tests, obvious restrictions due to the classification aspects and other observed limitations found while using the tools.
The limitations can either be restrictions on what the tool can handle or deficiencies that make the tool generate larger slices than necessary. Unfortunately, which of these two kinds a particular limitation belongs to is to some extent tool dependent, so this is not a good grouping for the evaluation. The limitations can generally not be compared to each other since that would require some kind of scale to measure the limitations by. This part of the evaluation will therefore have to be informal. The evaluation will not discuss all limitations since it is far from obvious what should be viewed as a limitation. It will concentrate on the differences between the tools. There will probably be a difference between what the algorithms can handle and what has actually been implemented, since all existing slicing tools are for demonstration purposes only. Although I aim at evaluating implementations it is interesting to see if these restrictions are introduced by the implementors or if they are a consequence of the underlying algorithm. Each restriction should be described as extensively as possible and exemplified. The descriptions should also contain a short discussion of how the restriction in question will influence the results from the tools.
5 Applying the method
This section contains a short outline of how the set of available tools affects the application of the method. There is also an adaptation of the method in view of the selection of tools. The five tools that were used in the evaluation are: Kamkar’s slicing tool, Spyder, WPIS, FOCUS and Schatz’ slicing tool. They will be described in the next section.
5.1 General adaptation
Since the available implementations have so few aspects in common, it was not meaningful to try to compare them all to each other. Even if it were possible to construct measures that capture all of the relevant aspects of the different implementations, these measures would be useless. The reason for this is that these measures would depend on so many aspects that we would not be able to draw any useful conclusions from them. Since the emphasis in this report lies on Kamkar's tool, I have concentrated on how the other implementations could be compared to it. I also assumed that all the tools are to be used for locating bugs manifested during some specific execution of the program. It was not feasible to compare the slice efficiency of Kamkar's tool to that of the intraprocedural tools. This is due to the fact that Kamkar's tool works at the procedure level abstraction and the intraprocedural ones do not handle slicing outside procedure boundaries. The only way would have been to transform test programs to a suitable form, either by changing each statement into a procedure or, the other way round, by integrating all procedures into one monolithic procedure. The first alternative would have been unfair to Kamkar's tool due to the extra work done to handle procedures. Both alternatives would have significantly restricted the test programs that could have been used, since the programs would only have been allowed to call each procedure once in order not to duplicate code, and recursion would not have been allowed. The only tools that could be compared to Kamkar's are therefore Spyder and WPIS. FOCUS and Schatz' tool are only included in this report to show which tools were available for the evaluation. There was also the problem of the intended application area since WPIS is intended for program integration. All slices made by WPIS are formally correct slices for debugging, but slices from Kamkar's tool and from Spyder are not acceptable slices for program integration since they are only valid for a specific execution. Since I evaluated the tools as if they were intended for debugging there was no problem with the correctness of the slices. Tools intended for other application areas have however probably had a disadvantage in the evaluation.
5.2 Modified method
Because of the problems with the available tools as discussed above and the time limit for doing the evaluation, the method was adapted as described below.
5.2.1 Classification
This was done as part of the description of the five tools.
5.2.2 Verifying a language subset
The verification was focused on the language subset that was needed for the rest of the evaluation. This was necessary since there was no detailed description of which language constructs the tools can handle. All test programs used to verify the language subsets will be available in the appendix to this report.
5.2.3 Constructing test programs
The efficiency of the slicing tools has been measured by running nontrivial test programs. The test programs were selected so that they had to be modified as little as possible to be handled by all of the tools. The deficiencies of some of the slicing tools in handling I/O and dynamic data structures limited the number of possible test programs. Because of the limited language subsets, it has been hard to find real-world programs that could be transformed so that they could be handled by the tools. This evaluation therefore used every larger test program that could be found.
5.2.4 Extracting measures
There was very little common ground for measuring the available tools. The measurements taken in this evaluation are mainly taken to exemplify the previously defined method.
Slice quality
The slices which were computed should be slices that could be useful in a debugging session. The size has been measured in number of procedures, since this is the only unit supported by all of Spyder, WPIS and Kamkar's tool. Spyder does not however mark the procedure declarations to indicate that a procedure is included in the slice. Therefore all procedures containing marked statements are considered to be in the slice.
Space and time requirements
In this part I will not try to measure the complexity since I have too few test programs to obtain measures from. The measures can only give a hint about the systems' complexity and show which measures can be obtained this way. Because of the experimental nature of most of the tools, this comparison could give a totally different result than what would have been the case if we had compared the algorithms or some commercial implementations. WPIS is partly described in a high level specification language. It is therefore not obvious which parts are needed for computing slices. It is even harder to figure out where to put system calls in order to measure execution time and trace memory allocation. Kamkar's tool and Spyder are built on existing compilers. Therefore they use data structures and functions already implemented in these compilers even though they may not be the most efficient for the task. I have only measured the space requirements by counting the number of vertices, since this is what the tools support. I assume that the vertices are of similar size in order to be able to draw any conclusions from the measures. The time requirements have been measured with a normal clock to give an idea of how useful the tool is in a working situation. This measure was broken down into three parts: time for preparation of each session, time for preparing each source file and time for computing each slice. For the tools that use dynamic information I have also measured the enlargement of the test program due to instrumentation and the size of the execution trace when the trace is needed to pass information between different programs.
5.2.5 Analyzing limitations
This has been the largest part of the evaluation since we are comparing tools with few classification aspects in common. The evaluation tries to emphasize the aspects that differ. There is no attempt to summarize the restrictions to decide which tool is to be considered the least limited one.
6 The slicing tools
For the purpose of this evaluation we were able to obtain five tools that implement program slicing. The ones we found were: Kamkar’s tool, Spyder, the Wisconsin Program Integration System, FOCUS and Schatz’ tool. In addition to these five I am only aware of two other tools: Weiser’s tool which was not available and Korel’s tool which we have not yet had time to evaluate. The rest of this section describes all five above-mentioned tools and the algorithms behind them in the light of the definitions made in Section 2. It also contains information regarding the environment in which they are used and how they are used. Last in this section is a summary of the classification according to the aspects in section 2.5.
6.1 Kamkar’s slicing tool
The tool was implemented primarily to evaluate the algorithms defined by Kamkar [Kam93]. The intended application area is debugging and testing. The implementation uses parts of the Distributed Incremental Compiling Environment, DICE [Fri84], and is currently being integrated with the Generalized Algorithmic Debugging Technique/Tool, GADT [Sha91]. It is interprocedural and the only dynamic slicing tool in the evaluation. It is also the only slicing tool that works at the procedure level abstraction. It presents the slice as a tree consisting of the procedure invocations that are included.
6.1.1 Algorithm
The algorithm is based on graph reachability in dependence graphs. The code that is to be sliced is executed and traced to generate an execution tree with dependence information. A temporary dependence graph is constructed for each procedure. The information from the temporary graph is transferred to a summary graph. The temporary graph is erased as soon as the information has been transferred. The slice is generated by traversing the execution tree with respect to the transferred dependences. The implementation can presently only do backward slices at the procedure level.
6.1.2 Environment
The implementation uses the front end of DICE to instrument the program that is to be sliced. Currently it uses SUN’s Pascal compiler to compile the instrumented code and to integrate it with DICE. GADT’s graphical interface is used to present the slices. The tool consists of an instrumentation part, a graph building part and a traversing/slicing part, all integrated with DICE.
6.1.3 Target language
The tool handles a subset of Pascal. The subset is restricted to character and integer types, which can be scalars, vectors or matrices. The only recognized program units are procedures (i.e. no functions). It also contains if-then-else- and while-statements. It recognizes read- and write-statements. Side effects such as the use of global variables are allowed. The verified subset can be found in the appendix.
6.1.4 Handling
The code that is to be sliced is parsed and instrumented with tracing code by an extended DICE. It is then compiled and linked with DICE to run in the same address space. The program is run from within DICE to generate an execution tree which can be presented by GADT's graphical interface. The slicing criterion is taken from the execution tree and consists of a unique procedure number, a variable name and some type information (see section 6.1.5). The resulting slice is then graphically presented. The resulting slice can be sliced again to present a smaller part of the program. A sample session is shown in Figure 11. The left window shows the interaction with the user, the graphical window on top shows the full execution tree and the graphical window below shows the resulting slice.
6.1.5 Special considerations
The slicing criterion differs from what has been previously described in two respects. Firstly, since we work at the procedure level, we can only set the point in the slicing criterion to be at a call site. Secondly, since the slicing tool uses dynamic information, the criterion has to distinguish different invocations of a procedure. In theory this is done by extending the variable in the criterion with a specific value of that variable. In this tool the different invocations are identified by specifying a node in the execution tree and a variable mentioned at that node. The tool asks whether the variable is scalar or not and what kind of parameter it is: in-parameter, out-parameter, global variable etc. This implementation is intended for algorithmic debugging. Therefore it will only consider the procedures that are children of the procedure in the criterion for inclusion in the slice. The graphical interface assumes that the resulting slice is a tree.
Figure 11. A sample session with Kamkar’s tool. The execution tree is generated for input 2 and the slice is computed for the variable a[3] with respect to the whole program.
6.2 Spyder
Spyder is a debugging tool that uses several techniques to assist the user in locating bugs. The slicing part of the tool is described in Agrawal’s PhD thesis [Agr91]. The implementation is based on GNU’s C compiler gcc [Sta89b] and debugger gdb [Sta89a]. It is interprocedural and it can perform both static and dynamic slicing. This evaluation will however only use the exact dynamic slicing mode.
6.2.1 Algorithm
The algorithm is based on graph reachability in dependence graphs. The code that is to be sliced is instrumented and compiled to an executable program. The program is executed and its I/O is recorded. This information is then used by the debugger to execute the program and generate a dynamic dependence graph. The slice is finally constructed by traversing the dependence graph. The implementation can only do backward slices at the statement level.
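The idea of building a dynamic dependence graph from a recorded execution and then slicing it by backward traversal can be sketched as follows. This is not Spyder's implementation. It is a minimal illustration that tracks data dependences only, leaves out control dependences, and assumes a hypothetical trace format in which each executed statement lists the variables it defines and uses.

```python
def dynamic_backward_slice(trace, criterion_var):
    """Statements that the final value of criterion_var depends on.

    trace is a list of (statement_id, defined_vars, used_vars) tuples in
    execution order. Only data dependences are tracked in this sketch.
    """
    last_def = {}     # variable -> index of the trace entry that last defined it
    depends_on = []   # per trace entry: indices of the entries it depends on
    for index, (_stmt, defs, uses) in enumerate(trace):
        depends_on.append({last_def[v] for v in uses if v in last_def})
        for v in defs:
            last_def[v] = index

    if criterion_var not in last_def:
        return set()
    worklist = [last_def[criterion_var]]
    visited = set()
    while worklist:
        index = worklist.pop()
        if index in visited:
            continue
        visited.add(index)
        worklist.extend(depends_on[index])
    # Map trace entries back to source statements.
    return {trace[i][0] for i in visited}


# Hypothetical trace of a five-statement straight-line program.
trace = [
    (1, {"a"}, set()),        # a := input
    (2, {"b"}, set()),        # b := 2
    (3, {"c"}, {"a"}),        # c := a + 1
    (4, {"d"}, {"b"}),        # d := b * 2
    (5, {"c"}, {"c", "a"}),   # c := c + a
]
print(sorted(dynamic_backward_slice(trace, "c")))
```

Statements 2 and 4 are left out of the slice for `c` because nothing on the executed path from them reaches `c`.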
6.2.2 Environment
The basic tools are modified versions of gcc and gdb. There is also a program that organizes the generation of testcases called tcgen. A graphical interface has been added to gdb to present the slices in. Every program that is to be sliced is stored in a separate directory together with its dependence graph, testcase information and mapping between graph and source code.
Figure 12. A sample session from Spyder. The right window shows part of the slice obtained by slicing the calculator. The slice is computed for C=.
6.2.3 Target language
The tool handles almost all of ANSI C according to Agrawal. It can handle all control structures including while-, for- and if-then-else-statements. All basic compound datatypes can be used except function pointers. The only recognized calls are procedure calls. The only recognized library procedures are printf and scanf. Side effects such as the use of global variables are allowed. The verified subset can be found in the appendix.
6.2.4 Handling
The code that is to be sliced is compiled with the modified gcc. The result of the compilation is a directory containing the executable program, the dependence graph and other files. The testcase generation tool tcgen is then run to add files to the program's directory. The actual slicing is then performed in the debugger. The debugger works as a normal debugger but provides the additional functionality of computing static slices. To get dynamic slices, one of the dynamic slicing versions is selected and a testcase is loaded. The program is then run by the debugger to generate the execution trace needed to construct the dynamic dependence graph. The slicing point is always the point where the execution has been stopped. The slicing variable is selected directly from the code presented on the screen. The resulting slice is then graphically presented by highlighting the code in the slice. A sample session is shown in Figure 12. The left window shows the two initial steps of compiling and generating a testcase, while the right window shows part of the selected slice.
6.2.5 Special considerations
A slice can be taken at any step in the execution of the testcase. This can for instance be used to take a slice before the execution of a statement. Different invocations of a procedure can be sliced upon by stopping the execution just after the procedure was called. Due to a trivial mapping from dependence graph to source code, the vertices have to be listed by the tool in the same order as the matching statements are written in the source code. This is generally no problem except when dealing with mutual recursion through a chain of procedures. One of the test programs contains such a chain of procedures. The problem can however be avoided by changing the order in which the procedures are defined to an order that is, to the user, obscure. This implementation is intended for debugging. It is for instance possible to step backwards through the program and generate a slice depending on a lesser part of the execution history.
6.3 Wisconsin Program Integration System
This implementation is version 1.0 of an experimental application for program integration. Program slicing is only one of the functions available for examining programs written in a small declaration-free Pascal-like language. The implementation makes slices at the statement level abstraction and can handle procedures. It can do both forward and backward slicing.
6.3.1 Algorithm
The algorithm is based on graph reachability in a system dependence graph (SDG). An SDG is a PDG extended with features to represent dependences between procedures. The slice is constructed by following the transitive dependences. An attribute grammar is used to define a structure editor and to annotate the code. The annotations are used to build an SDG. An external package is used to handle the graph and to perform the slicing.
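Following the transitive dependences in either direction amounts to a graph traversal. The sketch below illustrates the idea on a plain edge list and is not the WPIS implementation. It is a simplification in that a plain traversal does not keep different calling contexts apart, which a slicer working on a full SDG has to deal with. The function name and the edge representation are assumptions.

```python
from collections import defaultdict


def slice_graph(edges, start, direction="backward"):
    """Vertices reachable from start along dependence edges.

    edges is an iterable of (source, target) pairs meaning that target
    depends on source. A backward slice follows the edges against their
    direction, a forward slice follows them along their direction.
    """
    if direction not in ("backward", "forward"):
        raise ValueError("direction must be 'backward' or 'forward'")
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)
        predecessors[target].add(source)
    neighbours = predecessors if direction == "backward" else successors

    reached = set()
    worklist = [start]
    while worklist:
        vertex = worklist.pop()
        if vertex in reached:
            continue
        reached.add(vertex)
        worklist.extend(neighbours[vertex])
    return reached


# Hypothetical dependence edges between six statements.
edges = [(1, 3), (2, 4), (3, 5), (1, 5), (5, 6)]
print(sorted(slice_graph(edges, 5, "backward")))
print(sorted(slice_graph(edges, 1, "forward")))
```

The backward slice from statement 5 collects everything that can influence it, while the forward slice from statement 1 collects everything it can influence.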
6.3.2 Environment
The system is built on top of the Synthesizer Generator [RT89]. An attribute grammar is used to define a structure editor and to annotate the code. The annotations are then used to build a system dependence graph. An external package is then used to handle the graph. The system has a window-based environment with different windows for the different functions. All results are shown directly in the editor window.
6.3.3 Target language
The language is a small, declaration-free, Pascal-like language. It contains while-, for- and if-then-else-statements. Scalars and arrays of arbitrary dimension are allowed, although all variables are assumed to be numerical. Variables whose names start with the letter g are considered to be global. The only recognized calls are procedure calls. Nested procedures are not supported. The language has an output-statement that takes an integer or a variable as parameter. The verified part of the language can be found in the appendix.
6.3.4 Handling
The program reads a text file with a program, parses it and builds an SDG. The slice point is selected on the screen and the slicing direction is selected from a menu window. The resulting slice is presented both in a separate window and indicated by a different typeface in the original code. It is possible to save the slice to a file. An example session is shown in Figure 13. The main window shows the original program with the slice indicated in italic typeface. The SliceView window shows only the resulting slice and the Slice window is used to select the desired slicing parameters.
Figure 13. A sample session from the Wisconsin Program Integration System. The slice in the right window is computed for C=.
6.3.5 Special considerations
The tool allows the user to read and write files in three different formats. Only the text format has been used in this evaluation even though it does not allow comments. The other formats have been avoided because they may cause inconsistencies due to bugs in the tool. The slicing criterion is selected by marking a part of the source code. All variables that occur in the selected code are considered to be in the slicing criterion. The statement that ends closest to the selected part is used as the statement in the slicing criterion. WPIS will have problems if we use a global variable (a variable starting with the letter g) as a formal parameter to a procedure since this combination is undefined in the current implementation.
6.4 Schatz’ slicing tool
This implementation is a research implementation based on work by Ottenstein and Ottenstein on PDGs [OO84]. It is built to work with PAT, which is an interactive FORTRAN parallelizing assistant tool [Smi88, SA89]. It has a very simple user interface and works on a graph representation of the code. The tool is intraprocedural and uses only static information. It can only do backward slicing.
6.4.1 Algorithm
The algorithm is based on graph reachability in PDGs. The program that is to be sliced is parsed to generate an intermediate format that is a CFG. The CFG is sequentialized and transformed to a PDG. The slice is constructed by following the dependences in the PDG.
6.4.2 Environment
Schatz’ tool uses PAT's front end to parse the program. There is a utility that reads the parsed program and builds a PDG. The PDG is then printed in textual form on the screen. There is another program that, apart from the above, also takes a slicing criterion as input and slices the PDG before printing it on the screen. The tool also generates files which can be used to visualize the graphs.
6.4.3 Target language
The tool works on FORTRAN. The accepted FORTRAN subset is not explicitly defined. It can handle several source files from netlib [DG87]. Subroutine calls are assumed to be free of side effects and are not expected to modify their parameters. The verified subset can be found in the appendix. The parser and presumably the slicing tool can handle some parallel constructs but these are not used in this evaluation.
6.4.4 Handling
The code is first parsed to generate a file with an intermediate representation of the code. Next the PDG building program is used to build and display the graph to get references for the slice criterion. After that the slicing tool is run with the intermediate file and the slicing criterion as input. A sample session is shown in Figure 14. The left window shows part of the PDG that is the result of the PDG building program. The right window shows the output from the tool where the first part is a list of all vertices in the PDG and the second part is a list of the vertices in the resulting slice.
6.4.5 Special considerations
The tool also generates files that are intended for use with a visualization tool. We had no access to this tool. The data format for the visualization tool is so simple that it could quite easily be made to work with other visualization tools. A node number in the PDG and a variable name are used as slicing criterion, both given as arguments to the slicing tool.
Figure 14. A sample session from Schatz’ tool. The slice in the right window is computed for variable a with respect to the statement represented by node DN@0007 (a:=a) in the program language_test.f.
6.5 FOCUS
The implementation is an updated version of a debugging tool called FOCUS [Lyl84]. The tool uses a method called dicing to locate bugs. The implementation is intraprocedural and uses only static information. It is the only tool in this evaluation that computes slices by solving dataflow equations. According to the author the tool should be able to compute interprocedural slices as well. The tool is distributed with a program that does interprocedural analysis, but the current tool cannot make use of the interprocedural information. Apart from normal slicing, the tool can perform what Lyle calls dicing [Lyl84].
6.5.1 Algorithm
The algorithm solves dataflow equations on a CFG. It annotates each node in the graph with the set of variables that are active at that point in the program, and iterates until the active sets no longer change. The slice is then extracted from the graph by taking all statements that have variables in common with the active set of the slicing criterion.
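As a rough illustration of slicing by iterating dataflow equations, the following Python sketch propagates active variable sets backwards over a small CFG until nothing changes. It follows the general style of Weiser's equations and is not the FOCUS source code. The node encoding (defs, refs, succ, ctrl), the helper node and the function dataflow_slice are hypothetical, the controlling branch of each statement is supplied by hand, and the criterion statement is always kept in the slice, as in the classification listings in the appendix.

```python
def node(defs, refs, succ, ctrl=None):
    """Describe one CFG node (hypothetical encoding for this sketch)."""
    return {"defs": set(defs), "refs": set(refs), "succ": succ, "ctrl": ctrl}


def dataflow_slice(nodes, crit_node, crit_vars):
    """Return the node ids in the slice taken BEFORE executing crit_node."""
    active = {n: set() for n in nodes}      # active variables just before n
    active[crit_node] |= set(crit_vars)
    in_slice = {crit_node}
    changed = True
    while changed:                          # iterate until a fixpoint is reached
        changed = False
        for n, info in nodes.items():
            new_active = set(active[n])
            for m in info["succ"]:
                # Variables active after n stay active unless n defines them.
                new_active |= active[m] - info["defs"]
                if info["defs"] & active[m]:
                    # n defines an active variable: keep n, activate what it reads.
                    new_active |= info["refs"]
                    if n not in in_slice:
                        in_slice.add(n)
                        changed = True
            if new_active != active[n]:
                active[n] = new_active
                changed = True
        for n in list(in_slice):
            branch = nodes[n]["ctrl"]
            if branch is None:
                continue
            # A branch controlling a sliced statement is itself in the slice,
            # and the variables it tests become active.
            if branch not in in_slice:
                in_slice.add(branch)
                changed = True
            if not nodes[branch]["refs"] <= active[branch]:
                active[branch] |= nodes[branch]["refs"]
                changed = True
    return in_slice


# x = 5; y = 0; while (x > 3) { x = x - 1; y = 7; } y = y;
LOOP = {
    1: node(["x"], [], [2]),
    2: node(["y"], [], [3]),
    3: node([], ["x"], [4, 6]),
    4: node(["x"], ["x"], [5], ctrl=3),
    5: node(["y"], [], [3], ctrl=3),
    6: node(["y"], ["y"], []),
}

# a = 5; b = 3; b = a + 4;
STRAIGHT = {
    1: node(["a"], [], [2]),
    2: node(["b"], [], [3]),
    3: node(["b"], ["a"], []),
}

print(sorted(dataflow_slice(LOOP, 6, {"y"})))
print(sorted(dataflow_slice(STRAIGHT, 3, {"b"})))
```

For the straight-line program the sketch selects b = 3 and b = a + 4, which agrees with the result recorded for FOCUS in the first classification test.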
6.5.2 Environment
The system consists of a parser that generates an intermediate format. The intermediate format and the source file are then read into the main program that builds the CFG. Finally the slice is constructed by the main program and mapped back to the source code. The slice is shown in a different typeface in the same window as the original program.
6.5.3 Target language
The implementation is intended for programs written in C or FORTRAN. The only available parser however is a C-parser. The accepted language is not explicitly defined. There is no support for pointers or procedure calls. The verified subset can be found in the appendix.
6.5.4 Handling
The code that is to be sliced is first processed by a script that parses the program and generates a file with an intermediate representation. The intermediate file is also used by a number of utility programs. Thereafter the slice program is invoked. The tool displays the code and a menu of available variables. To generate a slice, the variable in the slicing criterion is selected from the menu. The statement in the criterion is then selected from the part of the window that displays the code. A sample session is shown in Figure 15. The main window is on the left with the sliced program. The variables that can be sliced on are presented at the top of the window. The other window shows the trace output that is always generated.
Figure 15. A sample session from FOCUS. The slice in the window is computed for C=
6.5.5 Special considerations
This tool gives no indication that it may be unable to slice a program. It always produces some output no matter what the code looks like and how the slicing point is selected. The slicing criterion consists of one identifier selected from a menu, and a line that is selected with the mouse from the code.
6.6 Summary of implementations
Table 2: Summary of implementations
Aspect | Kamkar’s | Spyder | WPIS | Schatz’ | FOCUS
Abstraction level | Procedure | Statement | Statement | Flow graph | Statement
Scope of slice | Interprocedural | Interprocedural | Interprocedural | Intraprocedural | Intraprocedural
Type of result | Set of statements | Set of statements | Partially equivalent program | Partially equivalent program | Partially equivalent program
Intended application | Debugging, testing | Debugging | Program integration | Not stated by implementor | Debugging
Slicing direction | Backward | Backward | Forward & backward | Backward | Backward
Type of information | Dynamic | Dynamic | Static | Static | Static
Approach | Graph reachability | Graph reachability | Graph reachability | Graph reachability | Solves dataflow equations
Slice taken before or after the statement | — (1) | — (2) | After (3) | After | Before
Language | Pascal subset | C | Pascal-like language | FORTRAN subset | C subset
Special considerations for the slicing variables | Have to be referenced at the slicing vertex (4). | Does not have to be referenced. | Have to be referenced at the slicing statement. | Have to be referenced at the slicing statement. | Does not have to be referenced.
1. Since we only slice on call sites and can select if we want to slice on an incoming or an outgoing variable, both variants can be obtained.
2. Both variants can be obtained by making the slice before or after the execution of the statement.
3. We can select slices computed on incoming edges (parameters) to get something similar to before execution of procedure calls.
4. A modified global variable can be used in the criterion since it will be listed in the procedure's node.
7 Construction of test data
The test data set consists of two parts. The first part is a set of basic test cases. These test cases have been made to verify the different classification aspects, the used language subsets and other conceivable limitations. The other part consists of complete programs that are used to measure the efficiency and the time and space complexities. These programs are not specially developed for the evaluation.
7.1 Basic test cases
The basic tests have three objectives. The first objective is to verify the classification in the previous section. The aspects that will be verified are whether the tool produces an equivalent program or just a set of statements and whether the slice is taken before or after the execution of the statement in the criterion. To avoid problems with the last aspect, the rest of the tests have a statement like x:=x to be used in the slicing criterion. The two test cases concerned with determining whether the tools produce partially equivalent programs or just sets of statements are based on examples by Kamkar [Kam93] and Venkatesh [Ven91]. If the tool fails any of the two test cases then it produces sets of statements. However, the test cases cannot be used to show that the tools always produce equivalent programs. The second objective is to verify the language constructs used for the rest of the evaluation. It is not an attempt to verify all language constructs that the tool can handle. To do that would involve testing a large number of combinations of constructs and that would generate an enormous set of tests, especially for FOCUS, which parses a large part of C. The goal for these test cases is to verify a common language subset that can be used by the large test programs. The third objective is to find differences between the tools. This is done by running test cases with known problems for different slicing algorithms. These tests are not expected to be exhaustive.
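The before/after aspect of the first objective can be made concrete with a small Python sketch. It handles only straight-line assignments, the function straight_line_slice and the tuple encoding of statements are hypothetical, and, as in the classification listings in the appendix, the criterion statement itself is listed in both variants.

```python
def straight_line_slice(program, index, variable, after):
    """Slice a list of assignments on (statement index, variable).

    program is a list of (assigned_variable, used_variables, text) tuples.
    If after is True the slice is taken after executing the criterion
    statement, otherwise before executing it.
    """
    wanted = {variable}     # variables whose defining statement is still sought
    chosen = {index}        # the criterion statement is always listed
    start = index if after else index - 1
    for i in range(start, -1, -1):
        target, used, _ = program[i]
        if target in wanted:
            # This statement defines a wanted variable: keep it and look
            # for the definitions of the variables it reads instead.
            wanted.remove(target)
            wanted.update(used)
            chosen.add(i)
    return [program[i][2] for i in sorted(chosen)]


# The program of the first classification test; criterion on b at the last statement.
PROGRAM = [
    ("a", [], "a := 5"),
    ("b", [], "b := 3"),
    ("b", ["a"], "b := a + 4"),
]

print(straight_line_slice(PROGRAM, 2, "b", after=True))
print(straight_line_slice(PROGRAM, 2, "b", after=False))
```

A tool that reports a := 5 and b := a + 4 takes the slice after the statement in the criterion, while a tool that reports b := 3 and b := a + 4 takes it before.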
7.2 Complete programs
To get good measures the test programs should have been written to solve real-world problems and the slicing criterion should be relevant for locating a possible bug. Unfortunately the restricted common language subset limits the set of usable test programs. We have to avoid test programs with the following constructs since they would make large modifications necessary:
• Pointers
• Advanced data structures (complex combinations of records and arrays)
• Advanced floating point arithmetic or string manipulations
There are also restrictions on what kind of test programs it is meaningful to slice. The following demand on the output also has to be fulfilled:
• Easily identifiable results, i.e. no test programs with graphic output or continuous results like word processing. This kind of output would make it hard to narrow down a faulty behavior and to locate an erroneous statement to slice on.
The only nontrivial programs I have found that fulfil these criteria are both taken from education. The programs are a calculator used as an example in a textbook and a recursive descent parser that was used as a practical exercise in a compiler construction course. The two programs had to be transformed to only use the verified constructs and abstractions. To make the results comparable, all transformations were made on all versions of the programs even if they only were needed for some of the tools. Although the transformations below may not be valid in general, they will preserve the semantics for the test programs. The following transformations were made:
• Simplification of data structures. This includes splitting records, extending subtypes, exchanging sets with arrays and converting floating point variables to integers. This also leads to replacing some operators like Pascal’s in and functions like abs with user-defined procedures.
• Replacing enumerated data types with integers. This is also done for booleans.
• Replacing constants with instantiated variables.
• Replacing global variables (and the replaced constants) where they were used as parameters to procedures. In most cases this was done by propagating the “constants”.
• Flattening the code so that all procedures are defined at the same level, like in C. This also includes making variables global if they are used in more than one procedure.
• Transforming all functions to procedures. Here the C version has to use pointers to return result variables.
• Using verified control structures only. Replacing for-loops with while-loops, repeat-until-loops with while-loops, case-statements with if-then-else-statements.
• Removing gotos. This is done by first transforming global gotos to local gotos and a boolean flag to control the flow of the program. Next the local gotos are transformed to an if-statement and a flag. (See Shahmehri [Sha91] for further details)
• Restricting arrays to only use integer indices. In the C version of the programs dummy elements were added to avoid changing indices.
• Transforming assignments to whole arrays to explicit assignments to every element in the array. In some cases this is done in procedures.
• Removing expressions, including unary minus, from indices and parameters, i.e. transforming x[a-1] to aminus:=a-1; x[aminus].
• Assignments of results of comparisons to boolean variables have been replaced with if-statements.
• Transforming all I/O to use stdin/stdout instead of files.
7.2.1 Recursive descent parser
The recursive descent parser was used in a compiler course some years ago. It is a basic parser that parses assignment statements and prints them in postorder notation. The program was originally about 350 lines of code. To make the program larger I have extended it with a flag to control the trace output, and the grammar has also been extended to recognize function calls. The resulting grammar is shown in Figure 16. Another noticeable modification of the program is the removal of the file-I/O and the global gotos. In order to remove tests for end of file, the letter q has been introduced to indicate end of input. The original code and the transformed versions can be found in the appendix.
::= ':='
::= '*' ! '/' !
::= '+' ! '-' !
::= ';' !
::= '(' ')' ! ! '(' ')'
::= ',' !
::= 'identifier'
Figure 16. Grammar for the language accepted by the recursive descent parser.
A sample session is shown in Figure 17. The user’s input in infix notation is shown in bold typeface and the recursive descent parser’s output in postfix notation is shown in normal typeface (together with the shell’s prompt).
sen1% recdes
a:=b+ c /(d-e);
a b c d e - / + :=
ab:= cd + de
ab cd de + :=
q
sen1%
Figure 17. An example session with the parser. Observe the letter q used to indicate end of input.
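For readers unfamiliar with the technique, the behaviour in Figure 17 can be reproduced with a short recursive descent parser. The Python sketch below is not the test program from the appendix: the method names assignment, expression, term, factor and variable are borrowed from the procedure names in Table 3, the grammar is a simplified assumption without function calls, and operators are parsed with loops (left-associative) instead of recursion.

```python
import re

# One token: ':=', an identifier, or a single operator/punctuation character.
TOKEN = re.compile(r"\s*(:=|[A-Za-z][A-Za-z0-9]*|[-+*/();])")


def tokenize(text):
    """Split an input line into tokens, skipping blanks."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per nonterminal; each appends its postfix output to self.out."""

    def __init__(self, tokens):
        self.tokens, self.pos, self.out = tokens, 0, []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def assignment(self):
        self.variable()
        self.take(":=")
        self.expression()
        self.out.append(":=")

    def expression(self):
        self.term()
        while self.peek() in ("+", "-"):
            operator = self.take()
            self.term()
            self.out.append(operator)   # operator is emitted after its operands

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            operator = self.take()
            self.factor()
            self.out.append(operator)

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            self.expression()
            self.take(")")
        else:
            self.variable()

    def variable(self):
        token = self.take()
        if not token[0].isalpha():
            raise SyntaxError(f"identifier expected, got {token!r}")
        self.out.append(token)


def to_postfix(line):
    """Translate one assignment statement from infix to postfix notation."""
    parser = Parser(tokenize(line))
    parser.assignment()
    if parser.peek() == ";":
        parser.take(";")
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return " ".join(parser.out)


print(to_postfix("a:=b+ c /(d-e);"))
print(to_postfix("ab:= cd + de"))
```

The one-procedure-per-nonterminal structure is what makes such a parser a convenient slicing test program: each procedure calls the others along the grammar, so the call graph contains both chains and recursion.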
7.2.2 Calculator
The calculator has been taken from Haraldsson’s book “Programmering i Pascal” [Har85]. The program is a simple calculator for infix expressions consisting of about 300 lines of code. Since the language subset used does not include floating point numbers, the input has been simplified to only allow integers. The output has also been simplified in this respect, but not as far as possible since that would have reduced the number of procedures in the program. The I/O has been simplified not to use pointers, get or eoln since the tools do not recognize any built-in procedures. Since the program used nesting of procedures, some local variables had to be made global. All I/O has been translated into English. Finally the new procedure myin was made recursive instead of iterative to avoid a problem with Kamkar’s tool. The original code and the two transformed versions can be found in the appendix. A sample session with the modified program is shown in Figure 18. The user’s input is in bold typeface.
sen1% calculator
: 4+5;
9
: 4*9/5 + 1 ;
8
: end;
exit from calculator
sen1%
Figure 18. An example session using the calculator.
8 Extracting measurements
This section summarizes the measurements taken on the tools. It follows the method described in the previous sections as far as the available time and the tools allowed. The aim was to measure the interprocedural tools using the two larger test programs that were described in Section 7. Kamkar’s tool only considers procedures that are children, in the execution tree, of the procedure in the criterion for inclusion in the slice. It is therefore not feasible to define more than one justifiable slicing criterion that is relevant for all three tools and the intended application area. The small number of test programs makes it impossible to interpolate any usable complexity measures. The measurements below are therefore only an example of the measurements that can be collected when evaluating slicing tools according to the previously defined method. Observe that the built-in procedures are not included in the measure since they cannot be responsible for the erroneous behavior that we use the slicing tool to locate.
8.1 Slice quality
Recursive descent parser
In order to define a reasonable slicing criterion the program has been augmented with a statement in the main procedure that references the variable we want to use in the slicing criterion. Since WPIS requires that the variable is referenced in the statement in the criterion, this makes it possible to use the same slicing criterion for all three programs. Thus the criterion C = with the input a:=b results in the slices summarized in Table 3. The size of the slice is 86% of the original program for WPIS, 50% for Kamkar’s tool and 29% for Spyder. From this example it can be seen that all three slices exclude the trace procedures since no other part of the program depends on them. In the dynamic case it is basically because they are not executed. The errmess procedure can be excluded from the dynamic slices since it is not executed. The rest of the difference between WPIS and Kamkar’s tool is due to the fact that the dynamic slice can trace the actual execution, while the static slice has to take all possible execution paths into account. For this program there would have been no significant improvement for the static slice if WPIS could have handled array subscripts or if it could have evaluated the predicates. The difference between Kamkar’s tool and Spyder is that Spyder does not reflect the flow of control since it only includes procedures that actually modify the variables in the active set and/or pass those variables as parameters.
Table 3: Results for recursive descent parser Execution tree
WPIS (Static)
Kamkar’s tool (Dynamic)
Spyder (Dynamic)
main
X
X
X
X
X
getchar
X
X
X
X
X
scan
X
X
X
X
X
initialize
X
X
X
X
tracein
X
traceout
X
errmess
X
variable
X
exprlist
X
factor
X
term
X
expression
X
assignment
X
assignlist
X
Reachable procedures
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
Procedure
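Summary | Reachable procedures | Execution tree | WPIS (Static) | Kamkar’s tool (Dynamic) | Spyder (Dynamic)
Sum | 14 | 10 | 12 | 7 | 4
Size ratio for slice | — | — | 0.86 | 0.50 | 0.29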
Calculator
The program is sliced with the criterion C = for input 3. A summary of the resulting slice is presented in Table 4. The size of the slice is 87% of the original program for WPIS and 57% for Kamkar’s tool and Spyder. From this example it can be seen that all three slices exclude the procedures that format the output since the variable value is not affected by those procedures. The push, pop, top, calculate, prio and width procedures can be excluded from the dynamic slice since they are not executed. The rest of the difference between the static and the dynamic tools has the same causes as those mentioned above. Handling of array subscripts and evaluation of control predicates would not have improved the static slice in this case either.
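The size ratios quoted for both programs appear to be the number of procedures in the slice divided by the number of reachable procedures, rounded to two decimals. The following Python sketch recomputes them from the sums in Tables 3 and 4; the function name size_ratio and the dictionary layout are only illustrative.

```python
def size_ratio(procedures_in_slice, reachable_procedures):
    """Fraction of the reachable procedures that end up in the slice."""
    return round(procedures_in_slice / reachable_procedures, 2)


# Sums taken from Table 3 (14 reachable procedures) and Table 4 (23 reachable).
PARSER = {"WPIS": 12, "Kamkar": 7, "Spyder": 4}
CALCULATOR = {"WPIS": 20, "Kamkar": 13, "Spyder": 13}

for tool, count in PARSER.items():
    print("parser", tool, size_ratio(count, 14))
for tool, count in CALCULATOR.items():
    print("calculator", tool, size_ratio(count, 23))
```

Since the measure counts procedures and not statements, it treats a procedure as included in the slice as soon as any part of it is included.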
8.2 Time and space requirements
The performance of the dynamic tool depends on the input. All measures in this section have been obtained for the input a:=b and 3 respectively if no other input is specified. Since the interesting time measures only are parts of an execution for some of the tools, the time measures have to be obtained from an external clock even though this means that the measures will not be reproducible. The time requirements for the tools running on a SPARC-station 10/20 are described in Table 5. Since none of the tools are explicitly built for the work cycle this evaluation has assumed, the time measures should only be seen as an indication of the cost for computing
Table 4: Results for calculator Execution tree
WPIS (Static)
Kamkar’s tool (Dynamic)
Spyder (Dynamic)
main
X
X
X
X
X
initvars
X
X
X
X
X
myin
X
X
X
X
X
copytorow
X
X
X
X
X
copytoarr
X
X
X
X
X
exitexpr
X
X
X
X
X
getnextsymbol
X
X
X
X
X
readexpr
X
X
X
X
X
push
X
pop
X
empty
X
top
X
val
X
makereal
X X
calculate
X
operand prio
X
X
X X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X X
X
X
X
operator parsexpr
X
X
X
X
X
X
X
X
X
X
X
myabs
X
pushresstack
Reachable procedures
Procedure
width
X
printvalue
X
X
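Summary | Reachable procedures | Execution tree | WPIS (Static) | Kamkar’s tool (Dynamic) | Spyder (Dynamic)
Sum | 23 | 17 | 20 | 13 | 13
Size ratio for slice | — | — | 0.87 | 0.57 | 0.57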
the slices. The measure is not a measure of the turnaround time for a debugging session. A large part of the cost for WPIS can be attributed to the fact that its PDG-building is described in the attribute grammar formalism that the Synthesizer Generator uses. The space requirement measures are acquired with the built-in debugging facilities. The measures are summarized in Table 6. The maximal space requirement for Kamkar’s tool is smaller than the sum of the sizes of the summary graph and the temporary graphs, since the tool removes the temporary graphs as soon as the corresponding procedure ends. Since the size of the dynamic graph depends on the actual execution, we will get different results
Table 5: Execution time for the tools
WPIS
Kamkar’s tool
Calculator Spyder
WPIS
For each session:
(a:=5; b:=a+4;), slice AFTER executing statement.
*}
a := 5;
b := 3;
b := a + 4 tnemmoc;
{*
 * Classification 2. Checks if the slicer produces set of statements
 * or a partly equivalent program. Example according to M Kamkar.
 *
 * C= => (x:=5; y:=0; while(x>3)do x:=x-1; y:=7; od; y:=y)
 *
 * Possible discrepancies: Doesn’t include x:=x-1; => only set of
 * statements.
 *
 * Result: Produces executable programs.
 *}
x := 5;
y := 0;
while (x > 3) do
  x := x - 1;
  y := 7
od;
y := y tnemmoc;
(*
 * Classification 3. Checks if slicer produces set of statements or a
 * partly equivalent program. Example according to G A Venkatesh.
 *
 * C= => (test3(c); set10(h); inc(i));
 *
 * Possible defects: Doesn’t include inc(i); => only set of statements.
 *
 * Result: Produces set of statements.
 *)
c:= 0;
test3(c);
end.
94/04/10 20:16:37
classification_test.WPIS.txt
2
94/04/10 20:22:35
classtest.f
1
program classification
{*
 * classification 3. Checks if the slicer produces set of statements or
 * a partly equivalent program. Example according to G A Venkatesh.
 *
 * C= =>
 * (i:=0; while(i only set of
 * statements.
 *
 * Result: Produces executable programs.
 *}
i := 0;
while (i < 10) do
  if (1 = 0) then h := 10 fi;
  i := i + 1
od;
h := h tnemmoc tnemmoc
end clasification_test
c
c Test program to help classify the slicer.
c FORTRAN-version for Emmi Schatz Slicer.
c
c The different parts of the test:
c   The slicing criterion definition.
c   Slice a set of statements or equivalent program.
c
c The slicing criterion C is written as:
c   C= => (list of statements in the slice)
c Tommy Hoffner 1993-05-21
c
c
      integer a, b
      integer x, y
      integer i, h
c
c Classification 1. Check the definition of the slicing criterion.
c
c C=
c
c C => (b=3 b=a+4) => The slice is taken BEFORE executing
c                     the statement in the criterion.
c C => (a=5 b=a+4) => The slice is taken AFTER executing
c                     the statement in the criterion.
c
c Result: C => (a=5 b=a+4), slice AFTER executing statement.
c
      a = 5
      b = 3
      b = a + 4
c
c Classification 2. Checks if slicer produces set of statements or a
c partly equivalent program. Example according to M Kamkar.
c
c Irrelevant for this slicer since it is impossible to rewrite the
c test for it. If the while-statement is rewritten as if and gotos
c the slicer doesn’t handle it and if it is rewritten as a do-loop
c the resulting code will not include the relevant parts for the test.
c
c Result: None
c
c Classification 3. Checks if slicer produces set of statements or a
c partly equivalent program. Example according to G A Venkatesh.
c
c Result: None (See test 2.)
c
      stop
      end
94/10/21 17:38:34
classification_test.SPYDER.c
/*
 * Test program to help classify the slicer.
 * C-version for Agrawals Slicer (only exact dynamic analysis).
 *
 * The different parts of the test:
 *   The slicing criterion definition.
 *   Slice a set of statements or equivalent program.
 *
 * The slicing criterion C is written as:
 *   C= => (list of statements in the slice)
 * Tommy Hoffner 1994-10-21
 *
 */
main()
{
  int a, b;
  int x, y;
  int i, h;

  /*
   * Classification 1. Check the definition of the slicing criterion.
   *
   * C=
   *
   * C => (b=3; b=a+4;) => The slice is taken BEFORE executing
   *                       the statement in the criterion.
   * C => (a=5; b=a+4;) => The slice is taken AFTER executing
   *                       the statement in the criterion.
   *
   * Result: Both variants obtainable depending on how long the execution
   *         has progressed. The second variant used for the rest of the
   *         test.
   */
  a = 5;
  b = 3;
  b = a + 4;

  /*
   * Classification 2. Checks if slicer produces set of statements or a
   * partly equivalent program. Example according to M Kamkar.
   *
   * C= => (x=5; y=0; while(x>3) x=x-1; y=7; y=y)
   *
   * Possible defects: Doesn’t include x=x-1; => only set of statements.
   *
   * Result: Produces executable programs (as long as the loop is executed
   *         two times or more, which it is in this case).
   */
  x = 5;
  y = 0;
  while (x > 3) {
    x = x - 1;
    y = 7;
  }
  y = y;

  /*
   * Classification 3. Checks if slicer produces set of statements or a
   * partly equivalent program. Example according to G A Venkatesh.
   *
   * C= => (i=0; while(i only set of statements.
   *
   * Result: Produces set of statements.
   */
  i = 0;
  while (i < 10) {
    if (i == 0)
      h = 10;
    i = i + 1;
  }
  h = h;
}
93/05/12 19:36:58
classification_test.FOCUS.c
/*
 * Test program to help classify the slicer.
 * C-version for Lyles Slicer.
 *
 * The different parts of the test:
 *   The slicing criterion definition.
 *   Slice a set of statements or equivalent program.
 *
 * The slicing criterion C is written as:
 *   C= => (list of statements in the slice)
 * Tommy Hoffner 1993-05-12
 *
 */
main()
{
  int a, b;
  int x, y;
  int i, h;

  /*
   * Classification 1. Check the definition of the slicing criterion.
   *
   * C=
   *
   * C => (b=3; b=a+4;) => The slice is taken BEFORE executing
   *                       the statement in the criterion.
   * C => (a=5; b=a+4;) => The slice is taken AFTER executing
   *                       the statement in the criterion.
   *
   * Result: C => (b=3; b=a+4;), slice BEFORE executing statement.
   */
  a = 5;
  b = 3;
  b = a + 4;

  /*
   * Classification 2. Checks if slicer produces set of statements or a
   * partly equivalent program. Example according to M Kamkar.
   *
   * C= => (x=5; y=0; while(x>3) x=x-1; y=7; y=y)
   *
   * Possible defects: Doesn’t include x=x-1; => only set of statements.
   *
   * Result: Produces executable programs.
   */
  x = 5;
  y = 0;
  while (x > 3) {
    x = x - 1;
    y = 7;
  }
  y = y;

  /*
   * Classification 3. Checks if slicer produces set of statements or a
   * partly equivalent program. Example according to G A Venkatesh.
   *
   * C= => (i=0; while(i only set of statements.
   *
   * Result: Produces executable programs.
   */
  i = 0;
  while (i < 10) {
    if (i == 0)
      h = 10;
    i = i + 1;
  }
  h = h;
}
94/04/10 20:25:02
language_test.p
1
program instr_object(input,output); (* * Test routines to check what language the slicing programs can handle. * pascal-version for Kamkars Slicer. * * The different parts of the test: * If statements. * While statements. * Input and Output statements. * Scalar arrays (indexies). * * The slicing criterion C is written as: * C= => (list of statements in the slice) * Tommy Hoffner 1993-05-11 * *)
procedure test_if_else_1(var x:integer); var y:integer; begin y := 4; if (y > 4) then dec(y) else set1(x); end; (* part of test 3. Se test desription below. *) procedure another_set1(var x:integer); begin x := 1; end;
type my_arr = array[0..3] of integer;
procedure another_inc(var x:integer); begin x := x + 1; end;
var a,x,d,w,i,q: integer; k,m: my_arr;
procedure test_if_else_2(var d:integer); var c:integer; begin c := 3; if (d 3) then inc(a); end;
(* part of test 4. Se test desription below. *) procedure another_dec(var x:integer); begin x := x - 1; end;
(* part of test 2. Se test desription below. *) procedure dec(var x:integer); begin x := x - 1; end;
procedure set7(var x:integer); begin x := 7; end;
procedure set1(var x:integer); begin x := 1; end;
procedure test_while(var w:integer); var z:integer; begin z := 5; while (z > 3) do begin another_dec(z); set7(w); end; end;
(* part of test 5. Se test desription below. *) procedure set5(var x:integer); begin x := 5; end;
begin (* * Language 1. Checks that the slicer handles if-statements. * * C= => (test_if(a); inc(a)) * * Possible discrepancies: None known. * * Result: Ok *)
(* part of test 6. Se test desription below. *) procedure another_set5(var x:integer); begin x := 5; end; procedure test_read(var q:integer); begin another_set5(q); read(q); end;
a:=3; test_if(a); (* * Language 2. Checks that the slicer includes for if-else-statements. * * C= => (test_if_else_1(x); set1(x);) * * Possible discrepancies: None known. * * Result: Ok *)
(* part of test 7. Se test desription below. *) procedure set3(var x:integer); begin x := 3; end; procedure one_more_set5(var x:integer); begin x := 5; end;
x := 3; test_if_else_1(x);
procedure test_arr_1(var k:my_arr); begin set3(k[0]); one_more_set5(k[1]); end;
(* * Language 3. Checks that the slicer includes for if-else-statements. * * C= => (test_if_else_2(d);) * * Possible discrepancies: None known. * * Result: Ok since another_set1(d) is not executed and can therefore * not be in the resulting slice. *)
(* part of test 8. Se test desription below. *) procedure set_nr_0(var k:my_arr); begin k[0] := 3; end; procedure set_nr_n(var k:my_arr; n:integer); begin k[n] := 5; end;
procedure test_arr_2(var m:my_arr); var j: integer; begin j := 1; set_nr_0(m); set_nr_n(m,j); end;
procedure test_write(var i:integer); begin set5(i); write(i); end;
94/04/10 20:25:02
language_test.p
d := 4; test_if_else_2(d);
5
94/04/10 20:25:02
(* * Language 4. Checks that the slicer handles for while-statements. * * C= => (test_while(w); another_dec(z); set7(w);) * * Possible discrepancies: Slice doesn’t include another_dec(z);. * * Result: Ok *)
language_test.p
6
test_arr_1(k); (* * Language 8. Checks if the slicer can handle arrayindexies. * * C= => (test_arr_2(m); set_nr_1(m, j);) * The slicing variable has to be given as m[1] due to the handling of * the slicer. * * Possible discrepancies: * C => (test_arr_2(m); set_nr_0(m););) => The slicer handles arrays * as scalars. * C => (test_arr_2(m);set_nr_0(m);set_nr_1(m, j);) => The slicer * handles arrays, but it ignores indexies. * * Result: Ok *)
w := 1; test_while(w); (* * Language 5. Checks how slicer handles output-statement parameters. * * C= =>(test_write(i); set5(i);) * * Possible discrepancies: Slice includes output-statements, i e output * may have side effects. * * Result: Only references output-parameters, i e printf-statement * not included in slice *)
test_arr_2(m); end.
test_write(i); (* * Language 6. Checks how slicer handles input-statement parameters. * * C= =>(test_read(q); read(q);) * * Possible discrepancies: Slice doesn’t recognize input-statements, * output not expected to have side effects. * * Result: Recognizes input-statement. *) test_read(q); (* * Language 7. Checks if the slicer can handle arrays. * * C= => (test_arr_1(k); set3(k[0]);) * * Possible descrepancies: * C=> (test_arr_1(k); one_more_set5(k[1]); ) => * The slicer handles arrays as scalars. * C=> (test_arr_1(k); set3(k[0]); one_more_set5(k[1]); ) => * The slicer handles arrays, but it ignores indexies. * Result: Ok *)
94/04/10 20:27:34
language_test.WPIS.txt
procedure langauge()
begin
{*
 * Test routines to check what language the slicing programs can handle.
 * WPIS-version for Wisconsin Program-Integration System.
 *
 * The different parts of the test:
 * If statement
 * While statement
 * Input and output statements.
 * Arrays (indices).
 *
 * The slicing criterion C is written as:
 * C= => (list of statements in the slice)
 * Tommy Hoffner 1993-05-18
 *}

{*
 * Language 1. Checks that the slicer handles if-statements.
 *
 * C= => (a:=3; b:=4 if(b>3) then a:=a+1 fi; a:=a)
 *
 * Possible discrepancies: None known.
 *
 * Result: Ok
 *}
a := 3;
b := 4;
if (b > 3) then
  a := a + 1
fi;
a := a tnemmoc;

{*
 * Language 2. Checks that the slicer handles if-else-statements.
 *
 * C= => (x:=3; y:=4; if(y>3) then else x:=1; x:=x)
 *
 * Possible discrepancies: None known.
 *
 * Result: Ok
 *}
x := 3;
y := 4;
if (y > 3) then
  y := y - 1
else
  x := 1
fi;
x := x tnemmoc;
{*
 * Language 3. Checks that the slicer handles if-else-statements.
 *
 * C= => (d:=4; if(d (z:=5; w:=1; while(z>3) do z:=z-1; w:=7; od; w:=w)
 *
 * Possible discrepancies: Slice doesn’t include z:=z-1.
 *
 * Result: Ok
 *}
z := 5;
w := 1;
while (z > 3) do
  z := z - 1;
  w := 7
od;
w := w tnemmoc;

{*
 * Language 5. Checks how the slicer handles output-statements.
 *
 * C= => (i:=5; i:=i)
 *
 * Possible discrepancies: Slice includes output-statement, i e
 * output may have side effects.
 *
 * Result: Only references output-parameters, i e output-statement
 * not included in slice.
 *}
i := 5;
output(i);
i := i tnemmoc;
{*
 * Language 6. Checks how the slicer handles input-statements.
 *
 * C= => (input(q), q:=q)
 *
 * Possible discrepancies: Slice doesn’t recognize input-statement,
 * i.e. input not expected to have side effects.
 *
 * Result: There is no input-statement defined in the language.
 *}
tnemmoc;

{*
 * Language 7. Checks if the slicer recognizes arrays.
 *
 * C= => (k[0]:=3; k[0]:=k[0];)
 *
 * Possible discrepancies:
 * C => (k[1]:=5; k[0]:=k[0]) => The slicer doesn’t recognize arrays.
 * C => (k[0]:=3; k[1]:=5; k[0]:=k[0]) => The slicer recognizes
 * arrays but ignores indices.
 *
 * Result: It recognizes arrays but ignores indices (second alternative).
 *
 *}
k[0] := 3;
k[1] := 5;
k[0] := k[0] tnemmoc;

{*
 * Language 8. Checks if the slicer can handle array indices.
 *
 * C= => (j:=1; m[j]:=5; m[j]:=m[j];)
 *
 * Possible discrepancies:
 * C => (j:=1; m[j]:=5; m[j]:=m[j]) =>
 * The slicer doesn’t recognize arrays.
 * C => (j:=1; m[0]:=3; m[j]:=5; m[j]:=m[j]) =>
 * The slicer recognizes arrays but ignores indices.
 *
 * Result: It recognizes arrays but ignores indices (second alternative).
 *
 *}
j := 1;
m[0] := 3;
m[j] := 5;
m[j] := m[j] tnemmoc
tnemmoc
end langauge

94/04/10 20:31:32
language_test.f
      program language
c     Test routines to check what language the slicing programs can handle.
c     FORTRAN-version for Emmi Schatz Slicer.
c
c     The different parts of the test:
c     If statements.
c     While statements.
c     Input and Output statements.
c     Arrays (indices).
c
c     The slicing criterion C is written as:
c     C= => (list of statements in the slice)
c     Tommy Hoffner 1993-05-18
c
      integer a, b
      integer x, y
      integer c, d
      integer z, w
      integer i
      integer q
      integer k(0:1)
      integer j, m(0:1)
c
c     Language 1. Checks that the slicer handles if-statements.
c
c     C= => (a=3 b=4 if(b.gt.3) then a=a+1 a=a)
c
c     Possible discrepancies: None known.
c
c     Result: Ok
c
      a = 3
      b = 4
      if (b.gt.3) then
         a = a + 1
      endif
      a = a
c
c     Language 2. Checks that the slicer handles if-else-statements.
c
c     C= => (x=3 y=4 if(y.gt.3) else x=1 x=x)
c
c     Possible discrepancies: None known.
c
c     Result: Ok
c
      x = 3
      y = 4
      if (y.gt.3) then
         y = y - 1
      else
         x = 1
      endif
      x = x
c
c     Language 3. Checks that the slicer handles if-else-statements.
c
c     C= => (d=4 if(d.lt.3) then d=1 c=c)
c
c     Possible discrepancies: None known.
c
c     Result: Ok
c
      c = 3
      d = 4
      if (d.lt.3) then
         d = 1
      else
         c = c + 1
      endif
      d = d
c
c     Language 4. Checks that the slicer handles while-statements.
c     For FORTRAN it is only possible to test do-loops even though
c     they are less general than while-loops. Since the decrement is
c     implicit there are fewer possible discrepancies.
c
c     C= => (w=1 do z=5,3,-1 w=7 w=w;)
c
c     Possible discrepancies: Slice doesn’t include z=z-1;.
c
c     Result: (w=1 do z=5,3,-1 w=7 w=w;).
c
      w = 1
      do 40 z=5,3,-1
         w = 7
 40   continue
      w = w
c
c     Language 5. Checks how the slicer handles output-statement parameters.
c
c     C= =>(i=5; i=i;)
c
c     Possible discrepancies: Slice includes output-statements, i.e. output
c     may have side effects.
c
c     Result: Only references output-parameters, i.e. write-statement
c     not included in slice
c
      i = 5
      write(*,*) i
      i = i
c
c     Language 6. Checks how the slicer handles input-statement parameters.
c
c     C= =>(read(*,*)q; q=q;)
c
c     Possible discrepancies: Slice doesn’t recognize input-statements,
c     output not expected to have side effects.
c
c     Result: Doesn’t recognize input-statement.
c
      q = 5
      read(*,*) q
      q = q
c
c     Language 7. Checks if the slicer can handle arrays.
c
c     C= => (k(0)=3; k(0)=k(0)+4;)
c
c     Possible discrepancies:
c     C => (k(1)=5; k(0)=k(0);) => The slicer handles arrays as scalars.
c     C => (k(0)=3; k(1)=5; k(0)=k(0);) => The slicer handles arrays,
c     but it ignores indices.
c
c     Result: Doesn’t recognize arrays (first alternative).
c     (The slicer only accepts k as slicing variable).
c
      k(0) = 3
      k(1) = 5
      k(0) = k(0)
c
c     Language 8. Checks if the slicer can handle array indices.
c
c     C= => (j=1;m(0)=3; m(1)=5; m(j)=m(j);)
c
c     Possible discrepancies:
c     C => (j=1; m(1)=5; m(j)=m(j);)=> The slicer handles arrays as scalars.
c     C => (j=1; m(0)=3; m(1)=5; m(j)=m(j);) => The slicer handles arrays,
c     but it ignores indices.
c
c     Result: (j=1 m(j)=5 m(j)=m(j)) It doesn’t recognize arrays but
c     recognizes indices as references.
c     (The slicer only accepts k as slicing variable).
c
      j = 1
      m(0) = 3
      m(j) = 5
      m(j) = m(j)
      stop
      end

94/10/21 17:47:37
language_test.SPYDER.c
/*
 * Test routines to check what language the slicing programs can handle.
 * C-version for Agrawal’s slicer.
 *
 * The different parts of the test:
 * If statements.
 * While statements.
 * Input and Output statements. Test 9.
 * Scalar arrays (indices). Test 2.
 *
 * The slicing criterion C is written as:
 * C= => (list of statements in the slice)
 * Tommy Hoffner 1994-10-19
 *
 */

/*
 * Language 1. Checks that the slicer handles if-statements.
 *
 * C= => (a=3; b=4; if(b>3) a=a+1; a=a;)
 *
 * Possible discrepancies: None known.
 *
 * Result: Ok
 */
a = 3;
b = 4;
if (b > 3)
    a = a + 1;
a = a;

/*
 * Language 2. Checks that the slicer handles if-else-statements.
 *
 * C= => (x=3; y=4; if(y>3) else x=1; x=x;)
 *
 * Possible discrepancies: None known.
 *
 * Result: Only (x=3; x=x;) included due to actual execution.
 */
x = 3;
y = 4;
if (y > 3)
    y = y - 1;
else
    x = 1;
x = x;

/*
 * Language 3. Checks that the slicer handles if-else-statements.
 *
 * C= => (d=4; if(d (z=5; w=1; while(z>3) z=z-1; w=7; w=w;)
 *
 * Possible discrepancies: Slice doesn’t include z=z-1;.
 *
 * Result: Doesn’t include w=1; due to tracing of actual execution.
 */
z = 5;
w = 1;
while (z > 3) {
    z = z - 1;
    w = 7;
}
w = w;

/*
 * Language 5. Checks how the slicer handles output-statement parameters.
 *
 * C= =>(i=5; i=i;)
 *
 * Possible discrepancies: Slice includes output-statements, i.e. output
 * may have side effects.
 *
 * Result: Ok
 */
i = 5;
printf("%d\n", i);
i = i;

/*
 * Language 6. Checks how the slicer handles input-statement parameters.
 *
 * C= =>(scanf("%d",&q); q=q;)
 *
 * Possible discrepancies: Slice doesn’t recognize input-statements,
 * output not expected to have side effects.
 *
 * Result: Ok
 */
q = 5;
scanf("%d", &q);
q = q;

/*
 * Language 7. Checks if the slicer can handle arrays.
 *
 * C= => (k[0]=3; k[0]=k[0];)
 *
 * Possible discrepancies:
 * C => (k[1]=5; k[0]=k[0];) => The slicer handles arrays as scalars.
 * C => (k[0]=3; k[1]=5; k[0]=k[0];) => The slicer handles arrays,
 * but it ignores indices.
 *
 * Result: Ok
 */
k[0] = 3;
k[1] = 5;
k[0] = k[0];

/*
 * Language 8. Checks if the slicer can handle array indices.
 *
 * C= => (j=1; m[j]=5; m[j]=m[j];)
 *
 * Possible discrepancies:
 * C => (j=1; m[j]=5; m[j]=m[j];)=> The slicer handles arrays as scalars.
 * C => (j=1; m[0]=3; m[j]=5; m[j]=m[j];) => The slicer handles arrays,
 * but it ignores indices.
 *
 * Result: Ok
 */
j = 1;
m[0] = 3;
m[j] = 5;
m[j] = m[j];
}

93/05/12 19:41:07
language_test.FOCUS.c
/*
 * Test routines to check what language the slicing programs can handle.
 * C-version for Lyle’s slicer.
 *
 * The different parts of the test:
 * If statements.
 * While statements.
 * Input and Output statements. Test 9.
 * Scalar arrays (indices). Test 2.
 *
 * The slicing criterion C is written as:
 * C= => (list of statements in the slice)
 * Tommy Hoffner 1993-05-12
 *
 */
main()
{
int a, b, x, y, c, d, z, w, i, q, k[2], j, m[2];

/*
 * Language 1. Checks that the slicer handles if-statements.
 *
 * C= => (a=3; b=4; if(b>3) a=a+1; a=a;)
 *
 * Possible discrepancies: None known.
 *
 * Result: Ok
 */
a = 3;
b = 4;
if (b > 3)
    a = a + 1;
a = a;

/*
 * Language 2. Checks that the slicer handles if-else-statements.
 *
 * C= => (x=3; y=4; if(y>3) else x=1; x=x;)
 *
 * Possible discrepancies: None known.
 *
 * Result: Ok
 */
x = 3;
y = 4;
if (y > 3)
    y = y - 1;
else
    x = 1;
x = x;

/*
 * Language 3. Checks that the slicer handles if-else-statements.
 *
 * C= => (d=4; if(d (k[0]=3; k[0]=k[0]+4;)
 *
 * Possible discrepancies:
 * C => (k[1]=5; k[0]=k[0];) => The slicer handles arrays as scalars.
 * C => (k[0]=3; k[1]=5; k[0]=k[0];) => The slicer handles arrays,
 * but it ignores indices.
 *
 * Result: It recognizes arrays but ignores indices (second alternative).
 * (The slicer only accepts k as slicing variable).
 */

/*
 * Language 4. Checks that the slicer handles while-statements.
 *
 * C= => (z=5; w=1; while(z>3) z=z-1; w=7; w=w;)
 *
 * Possible discrepancies: Slice doesn’t include z=z-1;.
 *
 * Result: Ok
 */
z = 5;
w = 1;
while (z > 3) {
    z = z - 1;
    w = 7;
}
w = w;

/*
 * Language 5. Checks how the slicer handles output-statement parameters.
 *
 * C= =>(i=5; i=i;)
 *
 * Possible discrepancies: Slice includes output-statements, i.e. output
 * may have side effects.
 *
 * Result: Only references output-parameters, i.e. printf-statement
 * not included in slice
 */
i = 5;
printf("%d\n", i);
i = i;

/*
 * Language 6. Checks how the slicer handles input-statement parameters.
 *
 * C= =>(scanf("%d\n",q); q=q;)
 *
 * Possible discrepancies: Slice doesn’t recognize input-statements,
 * output not expected to have side effects.
 *
 * Result: Doesn’t recognize input-statement.
 */
k[0] = 3;
k[1] = 5;
k[0] = k[0];

/*
 * Language 8. Checks if the slicer can handle array indices.
 *
 * C= => (j=1; m[j]=5; m[j]=m[j];)
 *
 * Possible discrepancies:
 * C => (j=1; m[j]=5; m[j]=m[j];)=> The slicer handles arrays as scalars.
 * C => (j=1; m[0]=3; m[j]=5; m[j]=m[j];) => The slicer handles arrays,
 * but it ignores indices.
 *
 * Result: It recognizes arrays but ignores indices (second alternative).
 * (The slicer only accepts k as slicing variable).
 */
j = 1;
m[0] = 3;
m[j] = 5;
m[j] = m[j];
}
93/08/06 13:14:23
interproc.p
program instr_object(input,output);
(*
 * Test routines to check what the slicing program can handle.
 * Interprocedural slicing. Pascal-version for Kamkar’s slicer.
 *
 * Test how the slicer handles different uses of the
 * parameters and aliasing.
 *
 * The slicing criterion is written as:
 * C = => (list of statements in the slice)
 *
 * Tommy Hoffner 1993-08-06
 *)

var a,b,c,x, globx, globy: integer;

procedure helpproc(var x:integer; var y:integer);
begin
  if x = 3 then y := x;
end;

procedure set2(var x:integer);
begin
  x := 2
end;

procedure proc(var x: integer;var y: integer;var z: integer);
begin
  if (x >= 3) then
  begin
    helpproc(x,z);
    set2(y)
  end
end;

procedure add3(var x:integer);
begin
  x := x + 3
end;

procedure add4(var x:integer);
begin
  x := x + 4
end;

procedure aliashelpproc(var x:integer;var y:integer);
begin
  add3(x);
  add4(y)
end;

procedure aliasproc(var x:integer);
begin
  aliashelpproc(x,x);
end;

procedure globhelpproc;
begin
  globx:=3;
  globy:=4
end;

procedure globparamproc(var x: integer);
begin
  x:= x + 3
end;

procedure globproc;
begin
  globhelpproc;
  globparamproc(globy);
end;

begin
  (*
   * Interprocedural 1. Tests how slicer handles referenced parameters.
   *
   * C = => (proc(a,b,x))
   *
   * Result: Ok
   *)
  (*
   * Interprocedural 2. Tests how slicer handles defined parameters.
   *
   * C = => (proc(a,b,x),set2(y))
   *
   * Result: Ok
   *)
  (*
   * Interprocedural 3. Tests how slicer handles multiple levels.
   *
   * C = => (proc(a,b,x),helpproc(x,z))
   *
   * Result: Ok
   *)
  a := 3;
  b := 7;
  x := 5;
  proc(a, b, x);

  (*
   * Interprocedural 4. Tests how slicer handles aliasing due to
   * parameters.
   * Test assumes "pascal-semantics" for parameters.
   *
   * C = =>
   * (aliasproc(c),aliashelpproc(x,x),add3(x),add4(y))
   *
   * Result: Ok
   *)
  aliasproc(c);

  (*
   * Interprocedural 5. Tests how slicer handles modification of
   * global variables.
   *
   * C = => (globproc,globhelpproc)
   *
   * Result: Ok
   *)
  (*
   * Interprocedural 6. Tests how slicer handles global variables
   * as parameters.
   *
   * C = => (globproc,globhelpproc,globparamproc(globy))
   *
   * Result: Ok
   *)
  globx:=1;
  globy:=2;
  globproc;
end.

94/12/05 14:48:26
interproc.SPYDER.c
/*
 * Test routines to check what the slicing program can handle.
 * Interprocedural slicing. C-version for Agrawal’s slicer.
 *
 * Test how the slicer handles different uses of the
 * parameters and aliasing.
 *
 * The slicing criterion is written as:
 * C = => (list of statements in the slice)
 *
 * Tommy Hoffner 1994-10-19
 */
void helpproc(x, y)
int *x, *y;
{
    if (*x == 3)
        *y = *x;
}

void set2(x)
int *x;
{
    *x = 2;
}

void proc(x, y, z)
int *x, *y, *z;
{
    if (*x >= 3) {
        helpproc(x,z);
        set2(y);
    }
}

void add3(x)
int *x;
{
    *x = *x + 3;
}

void add4(x)
int *x;
{
    *x = *x + 4;
}

void aliashelpproc(x, y)
int *x, *y;
{
    add3(x);
    add4(y);
}

void aliasproc(x)
int *x;
{
    aliashelpproc(x,x);
}

int globx, globy;

void globhelpproc()
{
    globx=3;
    globy=4;
}

void globparamproc(x)
int *x;
{
    *x= *x + 3;
}

void globproc()
{
    globhelpproc();
    globparamproc(&globy);
}

main()
{
    int a,b,c,x;

    /*
     * Interprocedural 1. Tests how slicer handles referenced parameters.
     *
     * C = => (proc(&a,&b,&x))
     *
     * Result: Ok
     */
    /*
     * Interprocedural 2. Tests how slicer handles defined parameters.
     *
     * C = => (proc(&a,&b,&x),set2(y))
     *
     * Result: Ok
     */
    /*
     * Interprocedural 3. Tests how slicer handles multiple levels.
     *
     * C = => (proc(&a,&b,&x),helpproc(x,z))
     *
     * Result: Ok
     */
    a = 3;
    b = 7;
    x = 5;
    proc(&a, &b, &x);

    /*
     * Interprocedural 4. Tests how slicer handles aliasing due to
     * parameters.
     * Test uses pointers to mimic "pascal-semantics" for parameters.
     *
     * C = =>
     * (aliasproc(&c),aliashelpproc(x,x),add3(x),add4(y))
     *
     * Result: Ok
     */
    aliasproc(&c);

    /*
     * Interprocedural 5. Tests how slicer handles modification of
     * global variables.
     *
     * C = => (globproc(),globhelpproc())
     *
     * Result: C => (globhelpproc()) misses globproc since globx not
     * referenced in globproc
     */
    /*
     * Interprocedural 6. Tests how slicer handles global variables
     * as parameters.
     *
     * C = => (globproc(),globhelpproc(),globparamproc(&globy))
     *
     * Result: Ok
     */
    globx = 1;
    globy = 2;
    globproc();
}

94/04/10 20:32:02
interproc.WPIS.txt
procedure Interproc()
begin
{*
 * Test routines to check what the slicing tool can handle.
 * Interprocedural slicing. WPIS version for Wisconsin Program
 * Integration System.
 *
 * Tests how the slicer handles different uses of the
 * parameters and aliasing.
 *
 * The slicing criterion is written as:
 * C ==> (list of procedures included in the slice)
 *
 * Tommy Hoffner 1993-08-06
 *}

{*
 * Interprocedural 1. Tests how slicer handles referenced parameters.
 *
 * C = => (Interproc())
 *
 * Result: Ok
 *
 * Interprocedural 2. Tests how slicer handles defined parameters.
 *
 * C = (Interproc(), proc(x,y), set2(y))
 *
 * Result: Ok
 *
 * Interprocedural 3. Tests how slicer handles multiple levels.
 *
 * C = (Interproc(), proc(x,y), helpproc(x,z))
 *
 * Result: Ok
 *}
a := 3;
b := 7;
x := 5;
proc(a, b, x);
a := a tnemmoc;

{*
 * Interprocedural 4. Tests how slicer handles aliasing due to
 * parameters.
 * Test assumes "pascal-semantics" for parameters.
 *
 * C =
 * (Interproc(), aliasproc(c), aliashelpproc(x,x),add3(x),add4(y))
 *
 * Result: Ok
 *}
aliasproc(c) tnemmoc;

{*
 * Interprocedural 5. Tests how slicer handles modification of global
 * variables.
 * C = (Interproc(), globproc(), globhelpproc())
 *
 * Result: Ok
 *
 * Interprocedural 6. Tests how slicer handles global variables as
 * parameters.
 * C =
 * (Interproc(), globproc(), globhelpproc(),globparamproc(globy))
 *
 * Result: Ok
 *}
globx := 1;
globy := 2;
globproc();
globx := globx;
globy := globy tnemmoc
tnemmoc
end Interproc;

procedure proc(x, y, z)
begin
  if (x >= 3) then
    helpproc(x, z);
    set2(y)
  fi
end proc;

procedure helpproc(x, y)
begin
  if (x = 3) then
    y := x
  fi
end helpproc;

procedure set2(x)
begin
  x := 2
end set2;

procedure aliasproc(x)
begin
  aliashelpproc(x, x)
end aliasproc;

procedure aliashelpproc(x, y)
begin
  add3(x);
  add4(y)
end aliashelpproc;

procedure add3(x)
begin
  x := x + 3
end add3;

procedure add4(x)
begin
  x := x + 4
end add4;

procedure globproc()
begin
  globparamproc(globy);
  globhelpproc()
end globproc;

procedure globhelpproc()
begin
  globx := 3;
  globy := globy + 4
end globhelpproc;

procedure globparamproc(x)
begin
  x := x + 1
end globparamproc

93/02/05 17:44:26
recdes_extend.p
program instr_object(input, output);
(* RECURSIVE DESCENT PARSER *)

type
  terminal = integer;
  idname = array [1..10] of char;
  sy = array [1..12] of char;
  myboolean= integer;

var
  (* added flag for debugging *)
  gdebug: myboolean;

  (* constants *)
  gmxidln: integer;
  gmytrue, gmyfalse: myboolean;

  (* constant to represent the vocabulary *)
  gerrsy, gendmark, gplusop, gminusop, gtimesop, gdivop, gassop,
  glpar, grpar, gsemi, gcomma, gident, gasslist, gassignsy, gexprsy,
  gtermsy, gfactorsy, gvariabsy, gexprlistsy: integer;

  gvoc: array [0..18] of sy;   (* VOCABULARY *)

  gtoken: terminal;            (* LAST TOKEN READ *)
  gid: idname;                 (* LAST IDENTIFIER *)
  gidlen: integer;             (* LENGTH OF IDENTIFIER *)
  gch: char;                   (* LAST CHARACTER *)
  gindent: integer;            (* INDENTATION OF TRACING *)

procedure getchar;
(* Get next character from source file.
   Return value in:
     "gch"  Character *)
begin
  read(gch);
end; { getchar }

procedure scan;
var word: sy;
(* Get next symbol and classify it.
   Return value in:
     "gtoken"  Classification (index in "gvoc")
     "gid"     Name of identifier
     "gidlen"  Length of identifier, max. "gmxidln"
     "gch"     Next character, from "getchar" *)
begin (* scan *)
  while (gch = ' ') do getchar;
  if (gch = 'q') then gtoken := gendmark
  else if gch=';' then begin gtoken := gsemi; getchar end
  else if gch='+' then begin gtoken := gplusop; getchar end
  else if gch='-' then begin gtoken := gminusop; getchar end
  else if gch='*' then begin gtoken := gtimesop; getchar end
  else if gch='/' then begin gtoken := gdivop; getchar end
  else if gch='(' then begin gtoken := glpar; getchar end
  else if gch=')' then begin gtoken := grpar; getchar end
  else if gch=',' then begin gtoken := gcomma; getchar end
  else if gch=':' then
  begin
    getchar;
    if gch = '=' then begin gtoken := gassop; getchar end
    else gtoken := gerrsy
  end
  else if ((gch >= 'a') and (gch ='r') and (gch ='a') and (gch