November 25, 1997
Precise slices of block-structured programs with goto statements
Arun Lakhotia and Jean-Christophe Deprez
The Center for Advanced Computer Studies
University of Southwestern Louisiana
Lafayette, LA 70504
(318) 482-6766, -5791 (Fax)
[email protected] Revision history:
Abstract
This article presents an algorithm for creating precise, static, executable slices for programs containing arbitrary jump statements. The slices are precise in that they do not contain any more action statements (assignments and predicates) than those included in a non-executable slice created by the backward transitive closure of a program dependence graph. While similar in approach to Choi & Ferrante’s algorithm, the algorithm presented is applicable to a syntactically richer language and creates slices that do not contain dead goto statements. Consistent with Weiser’s observation, to construct precise slices this algorithm does not just eliminate statements; it may also transform some statements. While such transformations may not be acceptable in some applications, the slicing algorithm presented is well suited for reengineering or restructuring legacy systems.
1 Introduction
A slice of a program with respect to a program point p is the set of statements of the program that might affect the behavior of the program observed at p; the program point p is said to be its slicing criterion*. Program slicing has several applications, such as system generation, debugging, verifying requirements [Wei79], program integration [HPR89], restructuring [KCK94], and testing [HD95], to name a few.

* Our definition of a slice is slightly different from the original definition introduced by Weiser [Wei84]. The relation between the two definitions and a survey of slicing algorithms may be found elsewhere [BG96, Kam95, Tip95].

Working Draft
Do not circulate or quote

The definition of a slice as a set of statements of the original program is satisfactory for applications such as debugging. However, if program slicing is used to extract specialized functions [RT96], or for restructuring functions [KCK94], or for program integration [HPR89], the slice should be a syntactically complete unit (a function or a program) that can be compiled separately and executed.

Algorithms that identify the set of statements belonging to a slice may be classified in two categories [Tip95]: data-flow equation based, the progeny of Weiser’s algorithm [Wei84], and program dependence graph (PDG) based, the progeny of Ottenstein & Ottenstein’s algorithm [OO84]. Both these classes of algorithms include in a slice only the statements that either alter or use variables of a program. Since goto statements do not alter or use the variables of a program, they are not included in a slice. To create slices that are syntactically complete units, henceforth referred to as executable slices, one must also identify the goto statements belonging to a slice.

We present an algorithm for program slicing that computes precise, executable slices for programs containing any type of static, local goto statements. Like previous efforts [Agr94, BH93, CF97, CF94, HD96, Lyl84, Gal90], our algorithm is restricted to programs with goto statements whose targets are static (as opposed to dynamic targets determined at runtime based on the value of some variable) and that are local, i.e., do not jump across procedure boundaries. The slices we produce are precise in that they do not include any more action statements (assignments and predicates) than those identified by a backward transitive closure of the PDG. The precision is achieved without trading off efficiency as compared to the PDG-based slicing algorithm. Our slicing algorithm consists of the following three steps:
1. Identify the set of action statements in the slice. This may be computed using the PDG-based closure algorithm.
2. Identify the set of goto statements that should be included in the slice. The goto statements to be included are based upon the action statements included and excluded in Step 1.
3. Transform the original abstract syntax tree (AST) to create a new AST representing the program slice.

The three separate steps (one for identifying the action statements in the slice, one for identifying the goto statements in the slice, and one for creating the AST) imply that our algorithm is not really restricted to the PDG-based closure algorithm. A descendant of Weiser’s algorithm could equally well be used in Step 1. This is further made possible by the fact that the computations in Step 2 do not require the PDG; they depend only on the control flow graph (CFG). We choose the PDG-based closure algorithm in Step 1 only as a matter of personal preference.

In the last step, to compute precise slices we do not just eliminate statements; we also replace branch conditions by new goto statements. Our use of transformations other than statement elimination conflicts with the de facto definition accepted by the community that a program slice is a statement-elimination-only transformation. However, the use of transformations other than elimination is consistent with Weiser’s observation that precise “source language slicing requires transformations beyond statement deletion” [Wei79, page 6]. While such transformations may be unacceptable in certain applications, they are acceptable for system generation, program restructuring, and extracting reusable components.

The algorithm presented in this paper is similar in precision to the algorithms of Choi & Ferrante [CF94] and Harman & Danicic [HD96]. Our algorithm is an improvement upon Choi & Ferrante’s algorithm in that (a) it is applicable to a syntactically richer language and (b) the slices it creates do not have dead goto statements.
Our algorithm is more efficient than Harman & Danicic’s algorithm and gives precise slices for a larger class of programs (for the same language), as discussed later.

The rest of the paper is organized as follows. In the next section we compare our algorithm with previous works and present examples illustrating the improvement achieved by our algorithm. In Section 3 we describe the program representations and notations used in our algorithm. We present our algorithm in two parts: the base algorithm, presented in Section 4, and the optimized algorithm, presented in Section 6. The base algorithm is simplified for the purpose of proving its correctness, which is the subject of Section 5. The optimized algorithm uses meaning-preserving transformations to improve upon the results of the base algorithm. Section 7 contains our concluding remarks and is followed by the references.
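Step 1 of the three-step outline in the introduction, the backward transitive closure over dependence edges, can be sketched as a simple worklist traversal. The Python below is only an illustration under stated assumptions, not the authors' implementation: the dictionary encoding of the dependence graph, the function name `backward_closure`, and the toy statement labels are all hypothetical and introduced here for the example.

```python
from collections import deque


def backward_closure(depends_on, criterion):
    """Collect every statement reachable from `criterion` by following
    dependence edges backward (from a statement to what it depends on).

    `depends_on` is an assumed encoding: it maps a statement label to the
    set of labels it is data- or control-dependent on.
    """
    in_slice = {criterion}
    worklist = deque([criterion])
    while worklist:
        node = worklist.popleft()
        for pred in depends_on.get(node, ()):
            if pred not in in_slice:
                in_slice.add(pred)
                worklist.append(pred)
    return in_slice


# Hypothetical toy program (labels are illustrative only):
#   s1: x := 1
#   s2: y := 2
#   s3: if (x > 0)
#   s4:     z := y
#   s5: w := 5
#   s6: print(z)
TOY_DEPENDENCES = {
    "s3": {"s1"},        # the predicate uses x
    "s4": {"s2", "s3"},  # uses y and is controlled by s3
    "s6": {"s4"},        # uses z
}

pdg_slice = backward_closure(TOY_DEPENDENCES, "s6")
print(sorted(pdg_slice))  # s5 is absent: nothing in the slice depends on it
```

Only action statements appear in the result. Jump statements neither define nor use variables, so a closure of this kind never selects them, which is why the later steps are needed to obtain an executable slice.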
2 Related works
This section places our work in the context of previous works. We limit our comparison to algorithms that create static, backward slices [Ven91] of procedural programs containing jump statements. The slices constructed are backward in that they identify statements affecting the values of the slicing criterion. The slices are static in that they include statements that may affect the slicing criterion on some execution. The relevant algorithms [Agr94, BH93, CF97, CF94, HD96, Lyl84, Gal90] are compared along three dimensions: their scope, their approach, and the precision of their results. The scope covers issues such as the intended application context for which the algorithm is proposed and the choice of programming language. The approach covers the algorithm strategy.
2.1 Comparison of scope and approach
All of the relevant algorithms, as well as ours, are restricted to programs containing static, local goto statements. Lyle [Lyl84] and Gallagher [Gal90] have extended Weiser’s algorithm to identify such goto statements. Agrawal [Agr94], Ball & Horwitz [BH93], Choi & Ferrante [CF94], Harman & Danicic [HD96], and Cifuentes & Fraboulet [CF97] have proposed PDG-based algorithms for the same problem. Choi & Ferrante have proposed two algorithms, referred to here as Algorithm I and Algorithm II, that differ in their approach and precision. Lyle’s and Gallagher’s algorithms, intended for aiding debugging and maintenance, are limited to identifying the set of goto statements in a slice. They do not create executable slices. The other algorithms, all of them PDG-based, create executable slices.
A slicing algorithm is called elimination-only if it creates executable slices by simply eliminating statements from the original program. Algorithm I of Choi & Ferrante and the algorithms of Agrawal, Ball & Horwitz, and Harman & Danicic are elimination-only. If an algorithm uses transformations other than statement elimination, we call it a transformation-based algorithm. Choi & Ferrante’s Algorithm II and our algorithm are transformation-based.

Choi & Ferrante have based their algorithm on a flat language, that is, a language without nested statements. While this language is sufficient to model the control flow of any procedural language, it does not highlight issues related to the rich syntax of a block-structured language. Assembly language programs, the subject of Cifuentes & Fraboulet’s algorithm, represent the syntactically “poor” extreme in languages. These programs are flat and devoid of syntactic boundaries between procedure or function units. All the other algorithms have been proposed for block-structured languages, permitting nested if-then-else and while-do statements.

Our algorithm has another similarity with Gallagher’s algorithm, Choi & Ferrante’s Algorithm II, and Harman & Danicic’s algorithm. These algorithms separate the task of finding the action statements in a slice from the task of finding the goto statements in a slice. The other algorithms do not have such a clean separation. Either they perform both tasks in one step [BH93] or they iterate between the two tasks [Agr94]. Our algorithm further separates the task of transforming the AST to create a new program, a step delineated by Ball & Horwitz’s algorithm [BH93] but not by the other algorithms. Our statement slicing algorithm, being based on the standard semantics of goto statements, does not satisfy the assumption of Ball & Horwitz’s AST pruning algorithm. When a statement is not in a slice we cannot simply delete the complete subtree rooted at its node. Instead, we replace the deleted block statement by the result of recursively pruning its children subtrees.

2.2 Precision improvement by our algorithm
We use the PDG-slice, the set of action statements in a slice identified by the PDG-based closure algorithm, as a benchmark to evaluate the precision improvement offered by our algorithm. The precision
x := 1;
x := 1;
x := 1;
y := 1; L1:
y := 1;
if (y