
An Interval-Based Approach to Exhaustive and Incremental Interprocedural Data-Flow Analysis

MICHAEL BURKE
IBM T.J. Watson Research Center

We reformulate interval analysis so that it can be applied to any monotone data-flow problem, including the nonfast problems of flow-insensitive interprocedural analysis. We then develop an incremental interval analysis technique that can be applied to the same class of problems. When applied to flow-insensitive interprocedural data-flow problems, the resulting algorithms are simple, practical, and efficient. With a single update, the incremental algorithm can accommodate any sequence of program changes that does not alter the structure of the program call graph. It can also accommodate a large class of structural changes. For alias analysis, we develop an incremental algorithm that obtains the exact solution as computed by an exhaustive algorithm. Finally, we develop a transitive closure algorithm that is particularly well suited to the very sparse matrices associated with the problems we address.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors-compilers, optimization; F.3.3 [Logics and Meanings of Programs]: Studies of Program Constructs

General Terms: Algorithms, Design, Theory

1. INTRODUCTION

The transformations performed by an optimizing compiler are based on an identification of the variables that are modified or used by the individual statements of the program. Where a statement contains a procedure call, this identification is problematic: Exact modification and use information for a procedure call requires knowledge of the execution paths taken within the procedures directly or indirectly invoked via the call site, as well as within the procedures that (directly or indirectly) invoked the calling procedure. These paths are generally not known at compile time. However, an analysis of events occurring on at least one execution path (may information) or of those occurring on all paths (must information) does not depend on the path actually taken and can be computed at compile time. Information collected in this manner with respect to a procedure call (or procedure) is referred to as summary information [2, 10, 11, 20, 22, 44].

Author's address: IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1990 ACM 0164-0925/90/0700-0341$01.50
ACM Transactions on Programming Languages and Systems, Vol. 12, No. 3, July 1990, Pages 341-395.

One class of may-summary information is the result of flow-insensitive analysis [8, 9], which ignores intraprocedural control flow structures, assuming that each statement of a procedure is executed along at least one path through the procedure. Two such may-summary sets for a call site s that are generally of use to an optimizing compiler can be produced by a flow-insensitive analysis (footnote 1):

(1) MOD(s): the set of variables that are possibly modified by execution of s; and
(2) REF(s): the set of variables that are possibly used in the course of executing s.

These two sets (and summary information in general) are interprocedural in that they pertain to facts determined across procedure boundaries. Facts that are determined entirely within procedure boundaries are intraprocedural.

We now illustrate how the absence of summary information may impede optimizations in the neighborhood of a call site. Suppose that no call-site summary information is available and that the following sequence of statements occurs in a program:

    X = A+B*C
    CALL P(. . .)
    Y = A+B*C

If A, B, or C is passed-by-reference to or otherwise accessible from P, then it must be assumed that each is possibly modified by it. Thus, the expression A+B*C is not “available” at the statement assigning to Y, although in fact the call might never modify any of these variables. In a similar manner, the movement of loop invariant expressions out of a loop is impeded by the absence of summary information for a procedure call occurring in the loop.

Within the context of compiling for a machine that executes instructions in parallel, such as a vector machine or a multiprocessor [1, 6, 42, 45], the loss of execution efficiency inflicted by the absence of summary information is magnified [12]. In testing for the semantic validity of transforming a sequential loop to a sequence of vector statements, data dependences within the loop are analyzed [5, 7, 15, 42, 66]. A recurrence holds where the data dependences among a set of statements form a cycle; where the recurrence cannot be broken, the statements participating in it cannot be executed in vector mode.

Footnote 1. MOD and REF need only include variables that are “passed” at the call site s (Section 2). Furthermore, we follow the terminology of Banning [8, 9], who also defines USE(s) as the set of variables that are used prior to redefinition along at least one execution path of s. USE is a flow-sensitive may-summary set. Algorithms for computing USE are presented in [1], [8], [9], [17], [44], and [48].

Consider the following loop:

    DO I=1,100
      A(I) = B(I) + C(I)
      CALL P( )
    ENDDO

Where A, B, or C is accessible from P, it must be assumed that each invocation of P, and thus each iteration of the loop, both modifies and uses every element of each array. The resulting cycle of data dependences between the two statements of the loop disallows vectorization. Similarly, if a function invocation within a loop is to be executed as a function subprogram that passes and returns vector arguments, information about its effects on variables must be available.

Within the context of compiling for a multiprocessor, a loop transformation in which each iteration of a loop is converted into a process is discussed in [6], [27], and [45]. Where loop-carried dependences hold among the statements of the loop, some statements of process (k + I) (for some I > 0) cannot execute until the kth process has completed executing certain statements; once again, the degree of parallelism achieved may be inhibited by data dependences. Here, too, a procedure call within a loop may result in a strictly sequential execution of it in a case where summary information would allow some degree of concurrency. In a multiprocessing environment that executes some procedure calls as processes [1], summary information could supply the required data dependence information among calls.

We now consider some of the reasons that optimizing compilers have generally not computed interprocedural information:

-The potential benefits of interprocedural information are not well understood and perhaps are not generally pronounced in the sequential context [46]. Some languages render certain forms of interprocedural information irrelevant, for example, by prohibiting aliasing or requiring an explicit specification of which formal parameters are possibly modified. As discussed above, this potential seems to be greater in a parallel context.
-Where large programs are developed and maintained, a separate compilation facility, which allows for the independent development, testing, and compilation of external procedures, is of critical importance. This is in conflict with the analysis and use of interprocedural information. Where a procedure P is altered, the compiler must recompute interprocedural information if it is to be used as a basis for optimization of P. Even if one elects not to utilize interprocedural information for a separate compilation, it remains possible that a change to one procedure may alter interprocedural information that was utilized in the optimization of another procedure, possibly invalidating that previous compilation. When information from previous compilations is not available and the program is altered, an exhaustive algorithm, that is, one that views the program as an entirely new input, is required to recompute interprocedural information. Even if interprocedural information is not recomputed, the possibility of invalidating a previous compilation necessitates the recompilation of the entire program.

The latter difficulty is alleviated once a program development environment allows the retention of information from previous compilations. Issues arising in the design of an interprocedurally optimizing compiler that is embedded in such a programming environment are specifically addressed in [1], [4], [20], [26], and [54]. Program information, including interprocedural information determined in previous compilations, can be maintained in a database that the compiler can access and update. This update of interprocedural information to reflect changes in the program is via an incremental algorithm, that is, one that updates the solution from a previous analysis by recalculating only the affected part of the solution. Thus, interprocedural information can be kept current without a reanalysis of the entire program each time a procedure is modified. After a change to a program’s interprocedural information, it is desirable to recompile only procedures with previous compilations that have been invalidated or that could be improved by the change. Within the context of a program-development environment, techniques have been developed to determine, with varying degrees of conservativeness, which procedures must be recompiled [17].

This paper is an augmented and improved version of [13]. Section 2 develops the program model in terms of which MOD and REF are computed. Section 3 reviews relevant graph terminology. Section 4 reviews the previous work that has influenced our exhaustive algorithm for computing MOD and REF. This algorithm is presented in Sections 5-12. It utilizes the elimination data-flow technique, interval analysis, which we describe in Section 7. The asymptotic complexity of the exhaustive algorithm is analyzed in Section 13.
Given the above considerations regarding a suitable programming environment for optimizations based on interprocedural information, in Section 14 we develop an incremental version of the algorithm. In Section 15 we consider, among other implementation issues, the program-database information required by the incremental algorithm.

2. THE PROGRAM MODEL

We extend Banning’s block-structure model of program structure and execution [8, 9] so that a program consists of a set of one or more external procedures, where an external procedure is one that is not contained (declared) within another procedure but that may contain procedures nested within it. One of the external procedures is the main procedure. A system procedure that invokes the main procedure is presumed; the execution of the program commences with this invocation. Recursion is allowed: A procedure may directly or indirectly invoke itself.

The static block structure of a program may be represented as a tree, where the root represents the system procedure and where the external procedures of the program are its direct descendants. The variables declared directly within a procedure are local to the procedure, while the variables declared in the ancestors of a procedure are global to it. The set of variables global to procedure P is denoted GLOBAL(P). Among the local variables of a procedure P are zero or more formal parameters: The set of such variables in P is denoted FORMAL(P). A variable that is either local or global with respect to a procedure P is known to P. An external variable is one that is global to all the procedures of a program: It is local to the system procedure.

We modify Banning’s model to allow for “hiding” of global variables. The local variables of a procedure are visible to it; its global variables that are not hidden from it are also visible. The specific mechanism for hiding is irrelevant to our analysis, so we do not assume one. One mechanism provided by block-structured languages for hiding a global variable is the declaration of a local variable of the same name.

Our model for procedural interaction is essentially the same as Banning’s. A statement that invokes a procedure is referred to as a call site. It designates a called procedure, which must be visible to the procedure containing the call site (the calling procedure). For each formal parameter of the called procedure, the call site must designate an argument that is associated with it. An argument may be a reference argument, which is a variable that is visible to the calling procedure and is passed-by-reference to its corresponding formal parameter. When the call site is invoked, a formal parameter that is associated with a reference argument assumes the same address in memory as the argument.

Procedures interact at call sites through reference arguments and also through variables that are global to the called procedure. Thus, a call site s is said to pass a variable X to a variable Y if and only if variable Y is the same variable as X and is global to the called procedure, or X is passed-by-reference to Y. The candidates for MOD(s) and REF(s) are the variables that are passed at s. A call chain is a sequence of call sites such that, at each site, the called procedure contains the next call. Where there is a call chain $s_1, s_2, \ldots, s_n$ such that each $s_i$ passes $X_i$ to $X_{i+1}$, $X_1$ is bound to $X_{n+1}$ along the corresponding path in the call graph.
For a call site s in procedure P, we do not exclude hidden (with respect to P) variables from MOD(s) or REF(s). In the context of parallel execution on multiple processors, the effects of a call on hidden variables must be determined [15, 36]. In this context there is a case in which the effect of a call on a variable that is not even known to P must be determined. Where the called procedure Q has a local, static variable X that has an “upward-exposed” use with respect to Q (e.g., as in a random-number generator that retains the value of its most recent number), at least one of the definitions of X reaches the use from one invocation of Q to the next. Such a data dependence is relevant to parallelization at the call site: X should be included in both MOD(s) and REF(s). In terms of our program model, such a variable should be regarded as global to Q.

3. GRAPH TERMINOLOGY

We now state some terminology that we will find useful. A flow multigraph G = (V, E, r) is a finite set V of nodes that includes a distinguished start node r and a finite multiset E of edges. An edge is an ordered pair (v, w) of nodes; v is the source of the edge, and w, its target. Where the edge (v, w) is in E, we say that v is a predecessor of w and w a successor of v. A sequence of edges $(v_1, v_2), (v_2, v_3), \ldots, (v_{n-1}, v_n)$ in G is a path from $v_1$ to $v_n$. There is an empty path from any node to itself. We denote a path either by listing its sequence of edges or by listing the nodes it contains, enclosed in angular brackets: $\langle v_1 v_2 v_3 \ldots v_{n-1} v_n \rangle$. A suffix of $\langle v_1 v_2 \ldots v_n \rangle$ is a path $\langle v_i v_{i+1} \ldots v_{n-1} v_n \rangle$, where $1 \le i \le n$. If there is a path from $v_i$ to $v_j$, we say that $v_i$ reaches $v_j$ or that $v_j$ is reachable from $v_i$. Every node in G is reachable from r, and r is not the target node of any edge in E. For a path $p_1 = \langle v_1 v_2 \ldots v_{m-1} v_m \rangle$ and a path $p_2 = \langle v_m v_{m+1} \ldots v_{n-1} v_n \rangle$, we may concatenate $p_2$ to $p_1$ to form the path $p_1 \cdot p_2 = \langle v_1 v_2 \ldots v_{m-1} v_m v_{m+1} \ldots v_{n-1} v_n \rangle$. A cycle is a path for which $v_1 = v_n$. A path is simple if all nodes on the path, except possibly the first and last, are distinct.

A region R of G is a multigraph whose nodes are in G and such that an edge (m, n) of G is in R if and only if m and n are both in R. A node m is an entry node of R if there is an edge (v, m) of G such that v is not in R. We denote a region by listing the set of nodes it contains. A path is internal to a region if each of its edges belongs to the region. A region is strongly connected if every node in it is reachable from every other node. Following [61], we denote sets of paths by regular expressions, where the regular expression operators of $\cup$, $\cdot$, and $*$ correspond to the union, concatenation, and reflexive transitive closure of sets of paths, respectively.

A depth-first spanning tree (dfst) T of G can be generated by starting at r and traversing G in depth-first fashion, numbering the nodes in increasing order as they are visited during the search [59] (footnote 2). The nodes are then said to be numbered in preorder. An edge (m, n) in G is classified as follows (footnote 3):

-a tree edge if it is in T;
-a forward edge if it is not in T and if n is a proper descendant of m in T;
-a back edge if m is a descendant of n in T; and
-a cross edge if neither m nor n is a descendant of the other in T.

Where (l, h) is a back edge, h is a header node, and l is a latch node. The interval order of the nodes of G is the order in which they are visited by a reverse postorder traversal. Where, for the ith node in interval order, NUMBER(node) is assigned the integer i, the resulting assignment is such that tree, cross, and forward edges (v, w) satisfy NUMBER(v) < NUMBER(w).
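As an illustration of the definitions above (not part of the original presentation), the following Python sketch classifies the edges of a small flow graph by depth-first search and assigns NUMBER in reverse postorder. The function name `classify_edges`, the adjacency-dictionary representation, and the sample graph are assumptions made for this example; parallel edges of a multigraph are treated as a single edge.

```python
def classify_edges(succ, root):
    """Depth-first search over the adjacency dict `succ`, starting at `root`.

    Returns (pre, number, kinds): preorder numbers, interval-order
    (reverse postorder) numbers, and a label for each edge (m, n).
    """
    pre, post, kinds = {}, {}, {}
    on_path = set()            # nodes whose visit is still in progress
    counters = {"pre": 0, "post": 0}

    def visit(m):
        counters["pre"] += 1
        pre[m] = counters["pre"]
        on_path.add(m)
        for n in succ.get(m, ()):
            if (m, n) in kinds:        # parallel edge: already represented
                continue
            if n not in pre:
                kinds[(m, n)] = "tree"
                visit(n)
            elif n in on_path:         # m is a descendant of n in the tree
                kinds[(m, n)] = "back"
            elif pre[m] < pre[n]:      # n is a finished proper descendant of m
                kinds[(m, n)] = "forward"
            else:                      # neither is a descendant of the other
                kinds[(m, n)] = "cross"
        on_path.discard(m)
        counters["post"] += 1
        post[m] = counters["post"]

    visit(root)
    total = len(post)
    number = {v: total - post[v] + 1 for v in post}   # reverse postorder
    return pre, number, kinds


# Hypothetical reducible graph: loop {a, c} with header a and latch c.
succ = {"r": ["a", "b"], "a": ["c", "d"], "b": ["d"], "c": ["a", "d"], "d": []}
pre, number, kinds = classify_edges(succ, "r")
```

In this sample, (c, a) is the only back edge, and every other edge (v, w) satisfies NUMBER(v) < NUMBER(w), as the text states.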
If G is acyclic, then interval order is a topological ordering (footnote 4). A path $\langle v_1 v_2 \ldots v_n \rangle$ whose sequence of nodes is in interval order is a forward path.

The call graph of a program is a flow multigraph for which each procedure is uniquely represented by a single node and each call site by a unique edge. The start node represents the system procedure. The node representing procedure P shall be referred to as node P. The edge (P, Q) represents a call site in P that invokes Q. By the definition of flow multigraph, it is assumed that every node in the call graph is reachable from the system procedure.

Footnote 2. This search has generally been defined for directed graphs. With respect to the depth-first search, a multigraph should be regarded as a graph: Where there is more than one edge (v, w), one single edge (v, w) is chosen to represent them. Furthermore, a reducible flow graph G has a unique dfst T [34].
Footnote 3. The classification of an edge in G as a tree, forward, back, or cross edge is according to the classification of the edge representing it.
Footnote 4. The numbering of nodes here is equivalent to Tarjan’s “s-numbering” [60].

4. THE EXHAUSTIVE ALGORITHM: PREVIOUS WORK

In this and the ensuing section, we focus on the MOD computation. The REF computation has a direct and obvious analogue. Our algorithm for computing MOD builds upon a decomposition of the MOD problem developed by Banning [8, 9] and by Cooper and Kennedy [20, 22].

Banning divides the MOD computation into two components: “potential alias” analysis and a “side-effect” computation that ignores aliasing effects. Two variables are aliased to each other when, at a given point in the execution of a program, the same storage location can be accessed through a reference to either of them. Banning confines his analysis of aliases to the interprocedural aliases generated by the reference argument mechanism, by which a procedure call can cause two distinct variables to refer to the same location in the called procedure. Within a framework that factors aliasing out of the side-effect computation, Banning develops an algorithm that computes MOD more efficiently than previous work and is precise up to symbolic computation [8].

SITEMOD consists of the set of variables possibly modified by a call site s independently of the alias relation holding in the program. MOD(s) for a call site s is computed by determining SITEMOD and factoring in the alias relations holding at that point (footnote 5). For each call site in the program, SITEMOD is computed from summary information that is generated with respect to the invoked procedure. The DIRECTMOD set of a procedure P is the set of variables modified strictly within P (the effects of any called procedure and of aliasing are excluded), so that DIRECTMOD information is intraprocedural. The PROCMOD set (footnote 6) for a procedure P is the set of global variables and formal parameters of P that belong to DIRECTMOD(P) or to SITEMOD(s) for a call site s in P. Given the PROCMOD set for a procedure P, the SITEMOD set for any invocation of P can be calculated by examining the call site’s passing of variables to the members of PROCMOD(P). For example, if PROCMOD(P) = {A, C, X}, where the formal parameter list for P is (A, B, C) and X is a global, then the statement CALL P(D, E, F) has {D, F, X} as its SITEMOD set.
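The mapping from a PROCMOD set to the SITEMOD set of a particular invocation can be sketched in a few lines of Python. This is only an illustration of the example above: the function name `sitemod` and the list-based representation of parameter lists are assumptions, every argument is taken to be a reference argument that is a simple variable, and aliasing is ignored, as it is in the SITEMOD definition.

```python
def sitemod(procmod, formals, arguments):
    """SITEMOD for one call site, given PROCMOD of the called procedure.

    procmod   -- set of formals and globals possibly modified by the callee
    formals   -- formal parameter list of the callee, in order
    arguments -- reference arguments at the call site, in the same order
    """
    binding = dict(zip(formals, arguments))   # formal -> variable passed to it
    result = set()
    for var in procmod:
        # A modified formal contributes the argument bound to it;
        # a modified global is passed to itself by the call site.
        result.add(binding.get(var, var))
    return result


# The example from the text: PROCMOD(P) = {A, C, X}, formals (A, B, C),
# X a global, and the statement CALL P(D, E, F).
example = sitemod({"A", "C", "X"}, ["A", "B", "C"], ["D", "E", "F"])
```

For the call in the text, `example` is the set containing D, F, and X.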
Banning formulates the computation of the PROCMOD sets of the procedures in a program as a “data-flow problem” that can be solved by either an iterative [35, 41, 49] or an elimination (Section 6) technique, but is not fast in the sense of Graham and Wegman [31] and therefore also not rapid in the sense of Kam and Ullman [37] (Appendix B) (footnote 7).

Footnote 5. For dependence analysis, Banning formulates a technique that is more precise than comparing the MOD sets of the relevant statements [9]. His test determines whether either the relevant SITEMOD sets have an element in common or one of the sets contains an element that is aliased to an element of the other.
Footnote 6. SITEMOD, DIRECTMOD, and PROCMOD are termed DMOD, IMOD, and GMOD, respectively, by Banning.
Footnote 7. Banning [8] demonstrates that the Graham-Wegman elimination technique is applicable by formulating MOD as an information propagation problem in the sense of Graham and Wegman. Significantly, he replaces the monotonicity condition by the stronger condition of distributivity, resulting in a framework that is similar to Tarjan’s continuous data-flow framework [61]. An iterative data-flow algorithm for solving Banning’s formulation is presented in [20].

Cooper and Kennedy identify the source of nonfastness in Banning’s framework as the tracking of formal parameter bindings. They also observe that the PROCMOD computation for formal parameters is independent of the PROCMOD computation for globals, except in the case where a global is passed as a reference argument. This case introduces an alias in the called procedure between the global and the formal parameter to which it is passed. If such aliasing information is eventually factored in, the PROCMOD computation can be divided into two components by separating the analysis of global variables from that of formal parameters (footnote 8).

Having separated the FORMAL-PROCMOD computation from the GLOBAL-PROCMOD computation, Cooper and Kennedy decompose the former into an (interprocedural) formal bound set computation and an (intraprocedural) FORMAL-DIRECTMOD computation. The set of all formal parameters in the program that are bound to the formal parameter A along some path in the call graph comprises the formal bound set of A, which we denote BOUND(A) (footnote 9). The formal bound set computation determines BOUND(A) for each formal parameter A in the program. Given the formal bound sets and FORMAL-DIRECTMOD sets, FORMAL-PROCMOD sets can be computed, for each formal A in FORMAL-DIRECTMOD, by including each element belonging to BOUND(A).

Having isolated the formal bound set computation (the nonfast component of PROCMOD), Cooper and Kennedy formulate it as a continuous data-flow framework in the sense of Tarjan [61] (Appendix A.2). As such, it can be solved by elimination data-flow techniques and serves as a more practical basis for this approach than Banning’s formulation. Cooper and Kennedy [23] develop a solution to the bound set computation based on Tarjan’s elimination data-flow algorithm [62].

Cooper and Kennedy later developed an algorithm for the formal bound set computation, which is based on the program’s binding multigraph [24]. This graph contains an edge for each passing of one formal parameter to another. This algorithm also analyzes formal parameters and global variables separately.
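To make the role of BOUND(A) concrete, the sketch below computes formal bound sets by plain backward reachability over "formal X is passed to formal Y" pairs and then collects the formals affected by a set of directly modified formals. It is a naive illustration under assumed names (`bound_sets`, `modified_formals`, and formals written as strings such as "Q.B"); it is neither the Cooper and Kennedy algorithm nor the interval-based algorithm developed in this paper, and it returns one flat set of formals rather than per-procedure FORMAL-PROCMOD sets.

```python
def bound_sets(formals, passes):
    """BOUND(A) for every formal A.

    passes -- iterable of (x, y) pairs: formal x is passed by reference
              to formal y at some call site.
    """
    passed_from = {a: set() for a in formals}
    for x, y in passes:
        passed_from[y].add(x)

    bound = {}
    for a in formals:
        seen = {a}                 # A is bound to itself along the empty path
        stack = [a]
        while stack:
            y = stack.pop()
            for x in passed_from[y]:
                if x not in seen:  # x reaches a through a chain of bindings
                    seen.add(x)
                    stack.append(x)
        bound[a] = seen
    return bound


def modified_formals(bound, formal_directmod):
    """Union of BOUND(A) over the directly modified formals A."""
    result = set()
    for a in formal_directmod:
        result |= bound[a]
    return result


# Hypothetical program: Q(A, B) calls R(B), and R(C) assigns to C.
formals = ["Q.A", "Q.B", "R.C"]
bound = bound_sets(formals, [("Q.B", "R.C")])
affected = modified_formals(bound, {"R.C"})
```

Here BOUND(R.C) contains both R.C and Q.B, so a direct modification of C in R also marks the formal B of Q as possibly modified.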
The complexity of the algorithm is O(n + e) logical operations (where n is the number of nodes and e is the number of edges in the binding multigraph). On the assumption that for programs the average number of formal parameters in a procedure and the average number of actual parameters at a call site are both bounded by a constant (i.e., do not grow with program size), the size of the binding multigraph is within a constant factor of the size of the call graph (in terms of both edges and nodes). Without empirical evidence, it is difficult to judge whether these assumptions are correct. If one concedes the constant bound, it remains difficult to estimate the size of the constant factor.

The algorithms developed here are based on the call multigraph, not on the binding multigraph, and are related to the earlier work of Cooper and Kennedy. We shall not further discuss the work of Cooper and Kennedy that is based on the binding multigraph.

Footnote 8. Given this separation, however, the factoring in of aliasing information must be adjusted. We address this issue in Section 10.2.
Footnote 9. A is regarded as bound to itself along the empty path, so BOUND(A) includes A.

5. THE MOD COMPUTATION

The preceding section described the reduction, by Banning and by Cooper and Kennedy, of the MOD computation into essentially three interprocedural subcomputations: alias analysis, the formal bound set computation, and the GLOBAL-PROCMOD computation. Consider the GLOBAL-PROCMOD computation. It is analogous to the FORMAL-PROCMOD computation in that the program variables potentially affected by the presence of a global in DIRECTMOD(P) are those that are directly or indirectly passed to it (recall that a variable that is global to the called procedure is passed by the call site).

We extend the above decomposition of the MOD computation one step further by dividing the GLOBAL-PROCMOD computation into a GLOBAL-DIRECTMOD computation and a computation of global variable binding patterns. This decomposition is analogous to the Cooper and Kennedy decomposition of the FORMAL-PROCMOD computation. This division of the GLOBAL-PROCMOD computation provides a framework in which the exhaustive computation can treat globals and formals in a uniform manner. Furthermore, it completes the separation of the inter- and the intraprocedural components of the PROCMOD computation, which is advantageous in an incremental context.

We now formalize our notion of global binding patterns. A procedure P is a member of the reaching-procedure set RP(X, Q) associated with a (global variable, procedure) pair (X, Q) iff X is bound to itself along a call chain originating at P and terminating at Q. For a global variable X in DIRECTMOD(Q), X is in the GLOBAL-PROCMOD set of all those procedures in RP(X, Q) with respect to which it is global. For each procedure Q in the program, the reaching-procedure computation determines RP(X, Q) for all variables X global to Q. We decompose the GLOBAL-PROCMOD computation, then, into the GLOBAL-DIRECTMOD and reaching-procedure computations.

With our decomposition of the GLOBAL-PROCMOD computation, the MOD computation has now essentially been reduced to three interprocedural subcomputations: alias analysis, the formal bound set computation, and the reaching-procedure computation. Figure 1 summarizes the resulting MOD decomposition.

Fig. 1. The MOD decomposition.

    MOD
      SITEMOD
        FORMAL-PROCMOD
          Bound Sets
          FORMAL-DIRECTMOD
        GLOBAL-PROCMOD
          Reaching-Procedure Sets
          GLOBAL-DIRECTMOD
      Aliases
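The reaching-procedure definition can be read as a backward search over call sites: a call site passes a global X to itself only when X is global to the called procedure. The Python sketch below follows that reading with a simple worklist. The names `reaching_procedures` and `is_global`, the edge-list call graph, and the sample program are assumptions for illustration; only nonempty call chains are considered, the implicit system procedure is left out of the sample for brevity, and this is not the interval-based algorithm developed later in the paper.

```python
def reaching_procedures(calls, is_global, x, q):
    """RP(x, q): procedures P such that x is bound to itself along a
    nonempty call chain originating at P and terminating at q.

    calls     -- list of (caller, callee) pairs, one per call site
    is_global -- function (variable, procedure) -> bool
    """
    callers = {}
    for caller, callee in calls:
        callers.setdefault(callee, set()).add(caller)

    result = set()
    expanded = set()
    worklist = [q]
    while worklist:
        callee = worklist.pop()
        # A call site passes x to itself only if x is global to the callee.
        if callee in expanded or not is_global(x, callee):
            continue
        expanded.add(callee)
        for caller in callers.get(callee, ()):
            result.add(caller)
            worklist.append(caller)
    return result


# Hypothetical program: P calls Q, Q calls R. X is external (global to all
# three); Y is declared in Q and is therefore global only to R.
calls = [("P", "Q"), ("Q", "R")]
scope = {"X": {"P", "Q", "R"}, "Y": {"R"}}

def is_global(var, proc):
    return proc in scope[var]

rp_x = reaching_procedures(calls, is_global, "X", "R")
rp_y = reaching_procedures(calls, is_global, "Y", "R")
```

Under these assumptions RP(X, R) contains P and Q, while RP(Y, R) contains only Q, because the call site in P does not pass Y to itself.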
We now illustrate the decomposition of the MOD computation with the example of Figure 2. An intraprocedural analysis of R determines that X ∈ GLOBAL-DIRECTMOD(R) and C ∈ FORMAL-DIRECTMOD(R). Since RP(X, R) = {P, Q}, X belongs to the GLOBAL-PROCMOD sets for P, Q, and R. Since the formal B is bound to the formal C, B ∈ FORMAL-PROCMOD(Q) (as well as belonging to FORMAL-PROCMOD(R)). Given the PROCMOD sets for R and Q, it is easily determined that the SITEMOD set for the call to R is {B, X} and the SITEMOD set for the call to Q is {D, X}. Alias analysis determines that A and B are potentially aliased (Section 10) in Q due to the call site in P. Factoring alias relations into SITEMOD sets to yield MOD sets, the MOD set for the call to R is {A, B, X}. The MOD set for the call to Q is identical to its SITEMOD set.

Fig. 2. Example program.

    EXTERNAL X
    PROCEDURE P
      CALL Q(D,D)
    END MAIN
    PROCEDURE Q(A,B)
      CALL R(B)
    END Q
    SUBROUTINE R(C)
      X =
      C =
    END R

In Section 12 we summarize the algorithm for combining the solutions to the aliasing and the formal- and global-binding problems to yield the MOD solution.

We apply the interval analysis data-flow technique (Section 7.2) to the computation of both formal bound sets and reaching-procedure sets. Interval analysis has served as a basis for a simple and efficient implementation of intraprocedural data-flow analysis [29]. An important criterion here for the techniques used to compute the exhaustive solution is that they also serve as a practical basis for the computation of incremental solutions. Interval analysis is a suitable basis for incremental data-flow analysis [52, 56], even where changes are made to the structure of the flow multigraph [13, 54], and as such provides the basis for an incremental computation of MOD.

As classically formulated [3, 58], interval analysis can only be applied to fast problems (footnote 10). However, Rosen [49, 51] has shown that in principle all elimination data-flow algorithms (a family that includes interval analysis) are applicable to the same class of problems (footnote 11). Here we extend the formulation of interval analysis to accommodate nonfast problems, allowing its application to the formal bound set computation.

Footnote 10. Schwartz and Sharir formulate the domain as even more narrow than this: See Appendix A.
Footnote 11. This class of problems is equivalent to Tarjan’s monotone data-flow framework (Appendix A.2).

6. DATA-FLOW PROBLEMS

In the realm of data-flow problems, a flow graph G = (V, E, r) represents the structure of a procedure or program. A node in G represents a segment of code, and an edge represents a possible transfer of control between such segments.

Data-flow analysis entails determining, for each node, facts that hold at that point regardless of the actual path of execution. We use a semilattice L to represent the universe of possible program facts. A semilattice is a nonempty set L having an idempotent, commutative, and associative meet operation $\wedge$. We denote the facts associated with a node n by INFO(n), which is thus a mapping from V to L. For a meet semilattice, we define a partial ordering $\le$ such that $x \le y \iff x \wedge y = x$. Where $x \le y$, we say that y is higher in the semilattice than x. A meet semilattice is complete where every nonempty subset X of L has a greatest lower bound $\bigwedge X$ with respect to $\le$. If $X = \{x_1, x_2, \ldots, x_n\}$, $\bigwedge X = x_1 \wedge x_2 \wedge \cdots \wedge x_n$. We use $\bot$ to denote $\bigwedge L$. Where an element y exists such that $x \wedge y = x$ for all x in L, we denote y as $\top$.

A meet semilattice naturally represents must information, that is, information that holds at a node if a certain condition is satisfied along all paths to it. A join semilattice [47, 65] more naturally represents may information (where a fact holds if a certain condition is satisfied by some paths to that point). A join semilattice is a meet semilattice “turned upside down”; least upper bounds are considered instead of greatest lower bounds. The two structures are essentially equivalent in that a join-oriented application can always be reduced to a meet-oriented application, and vice versa [49]. The use of meet semilattices is more common in the literature. The applications considered by this paper are formulated as meet semilattices.

To represent the effect of the program on the universe of facts, we associate each edge e with a data-flow function $f_e$ such that if fact x is true at the source node of e then fact $f_e(x)$ is true at the target node of e. The function thus describes the data-flow effect of the edge. We extend the association of data-flow functions to paths by defining $f_p(x) = (f_{e_k} \circ f_{e_{k-1}} \circ \cdots \circ f_{e_1})(x)$ for $p = e_1, e_2, \ldots, e_k$.
For the empty path $\lambda$, $f_\lambda = \iota$, the identity function. To extend further the association of data-flow functions to sets of paths, we define function meet: $(f \wedge g)(x) = f(x) \wedge g(x)$. The data-flow effect of the set of all paths $p$ from $u$ to $v$ is described by the function $\bigwedge \{ f_p \mid p \text{ is a path from } u \text{ to } v \}$. For a set of paths $p_1$ from $u$ to $v$ and a set of paths $p_2$ from $v$ to $w$, the data-flow effect of the set of paths $p_1 \cdot p_2$ is described by the function $f_{p_2} \circ f_{p_1}$.

There may be an infinite number of paths between two nodes. Consider a node $u$ that participates in a cycle, and let $f$ represent the effect of all simple cycles that originate at $u$. The effect of all paths (including the empty path) that originate at $u$ is described by the function $f^*(x) = \bigwedge \{ f^i(x) \mid i \geq 0 \}$.

A data-flow framework $(L, F)$ is a complete semilattice $L$ with meet operator $\wedge$ and a class of functions $F \subseteq \{ f : L \to L \}$. A data-flow problem is a data-flow framework $(L, F)$, a flow graph $G = (V, E, r)$, and a mapping from $E$ to $F$. We want to compute its meet-over-all-paths (MOP) solution, which is the mapping mop from $V$ to $L$ given by $mop(v) = \bigwedge \{ f_p(\bot) \mid p \text{ is a path from } r \text{ to } v \}$. An elimination data-flow algorithm (interval analysis is in this family), in finding or approximating the MOP, summarizes the data-flow effects of certain paths between certain pairs of nodes [49]. This approach is only possible where $F$ satisfies properties that are described in Appendix A.2.

For simplicity, we have limited our formulation of data-flow problems to forward problems. All applications considered in this paper define forward problems. Our formulation of data-flow problems can easily be extended to accommodate backward data-flow problems [43].
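The ordering convention above can be made concrete with a small sketch. It assumes a powerset semilattice whose meet is set union, which is the shape of the lattices used later in the paper; the names `meet`, `leq`, and `glb` are illustrative and not from the paper.

```python
from functools import reduce

def meet(x: frozenset, y: frozenset) -> frozenset:
    """Meet for a semilattice of sets in which meet is set union (an assumption)."""
    return x | y

def leq(x: frozenset, y: frozenset) -> bool:
    """x <= y holds exactly when x meet y == x."""
    return meet(x, y) == x

def glb(elements):
    """Greatest lower bound of a nonempty finite collection: x1 meet x2 meet ... meet xn."""
    return reduce(meet, elements)

small = frozenset({"A"})
large = frozenset({"A", "B"})
# With union as meet, the larger set is the LOWER element.
print(leq(large, small), leq(small, large))
```

With union as the meet, larger sets sit lower in the semilattice, so the element consisting of everything plays the role of $\bot$.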


7. INTERVALS AND INTERVAL ANALYSIS

The interval analysis data-flow technique presented in this section assumes that the flow multigraph $G$ under consideration is reducible [32, 60].¹² Irreducible graphs are considered in Section 11. It is also assumed in this section that a data-flow problem satisfies the conditions that allow interval analysis to compute its meet-over-all-paths solution. We refer to this computation as solving the problem. The assumed conditions, which are met by all applications considered in this paper, are stated in Appendix A.2.

7.1 Definitions

For a back edge $(m, n)$ in $G$, the nodes and edges belonging to forward paths from $n$ to $m$, along with the edge $(m, n)$, form the strongly connected region STR($n$, $m$). Consider the set $B(h)$ of back edges whose target node is $h$: Each back edge $(l, h)$ in $B(h)$ defines the strongly connected region STR($h$, $l$). The union of the strongly connected regions defined by the edges in $B(h)$ is the interval region whose header node is $h$. The header $h$ is the only entry node of the interval; that is, $h$ "dominates" every other node of the interval.¹³ This definition is essentially equivalent to those formulated by Graham and Wegman and by Schwartz and Sharir; it differs from the Allen and Cocke definition, which does not require an interval to be strongly connected.

Intervals may be "nested": Interval $I$ may be a subinterval of (a region of) interval $J$. When denoting an interval that contains a subinterval, the nodes of the subinterval will be enclosed in brackets. An interval may contain arbitrarily many subintervals. An interval that is not a subinterval is an outermost interval. An interval that does not contain any subintervals is an innermost interval. An outermost interval and its subintervals constitute an interval nest. The nesting level of an interval is one plus the number of intervals it is contained within. The depth of $G$ is the maximum nesting level among the intervals of $G$.
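The definition of an interval as a union of strongly connected regions can be sketched as follows. This is a minimal illustration, not the efficient interval-finding algorithm the paper cites: it assumes the edges have already been classified into forward and back edges (for instance by a depth-first search), and the seven-node graph and the function names are hypothetical.

```python
from collections import defaultdict

def reachable(start, edges):
    """Nodes reachable from start (including start) along the given directed edges."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def intervals(forward_edges, back_edges):
    """Map each header h to the node set of the interval whose header is h."""
    reverse_forward = [(v, u) for u, v in forward_edges]
    result = defaultdict(set)
    for latch, header in back_edges:
        # STR(header, latch): nodes lying on forward paths from header to latch.
        from_header = reachable(header, forward_edges)
        to_latch = reachable(latch, reverse_forward)
        result[header] |= from_header & to_latch
    return dict(result)

# Hypothetical reducible graph; edge classification is assumed to be given.
forward = [("p", "q"), ("p", "r"), ("q", "s"), ("r", "s"),
           ("r", "t"), ("s", "t"), ("t", "u"), ("u", "v")]
back = [("s", "p"), ("v", "u"), ("v", "t")]
found = intervals(forward, back)
print(found)
```

In this graph the interval headed by `u` is nested inside the interval headed by `t`, while the interval headed by `p` is a separate outermost interval.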
A node in $G$ (such as $r$) that is not contained within any interval of $G$ is a singleton node. An exit edge with respect to an interval $I$ is an edge $(m, n)$ such that $m$ belongs to $I$ and $n$ does not. We refer to such a node $m$ as an exit node. A path from the header $h$ to an exit node $m$ is an entrance-to-exit path with respect to $I$. The set of all forward paths within an interval from the header $h$ to a node $n$ will be denoted as $(h \to n)$. The set of all paths within an interval from the header $h$ to a node $n$ will be denoted as $(h..n)$. An efficient algorithm for finding intervals is described in [!%I.

7.2 The Interval Analysis Technique

Where a graph $G$ is acyclic, a data-flow problem defined for it can be solved in the course of a single topological-order traversal.¹⁴ When processing node $n$, each of its predecessor nodes has already been solved for. For each such $p$ in the predecessor set, the function $f_{(p,n)}$ is applied to the solution at $p$. Performing the lattice meet operation over the outputs of these function applications yields the solution at $n$. In the case of an acyclic graph, then, the data-flow problem is solved by a forward propagation through $G$. See Figure 3 for a precise statement of this algorithm.

Consider a graph $G$ that contains cycles but is reducible. To determine the solution at a node $n$, the effect of all paths from the start node $r$ to $n$ must be propagated to $n$. Where a cycle occurs on such a path, the first (and last) node visited in the cycle must be a header. Any traversal of a back edge completes such a cycle. Thus, a path from $r$ to $n$ can be decomposed into a sequence of subpaths such that each subpath is either a cycle originating with a header node or a forward path. (A cycle originating at the header of an outer interval may contain any number of cycles within its subintervals.) Once the data-flow effects of cycles in $G$ that originate at header nodes are accounted for, the data-flow problem for $G$ has essentially been reduced to the acyclic case and can be solved in the course of a single interval-order traversal.

Interval analysis takes place in two steps: the elimination phase and the propagation phase. The elimination phase determines the data-flow effects of all cycles in $G$ that originate at header nodes. The propagation phase then solves for each node of $G$ in the course of an interval-order traversal.

7.2.1 The Elimination Phase. For each interval $I$, the elimination phase evaluates the data-flow function $f_I^*$ associated with the set of all paths within the interval that originate at the header. The intervals of each interval nest of $G$ are processed by the elimination phase in an innermost to outermost order.

¹² Reducibility has been formulated for graphs, not multigraphs. In this context the multigraph is to be regarded as a graph in the same manner as we have indicated with respect to depth-first search.

¹³ This observation, referred to as "Theorem 2" in [60], follows from one of Hecht and Ullman's structural characterizations of reducible flow graphs [34].

¹⁴ Recall that we have confined our discussion to forward data-flow problems. Kennedy [39] has shown that interval analysis as formulated by Allen and Cocke can be applied to backward data-flow problems. Using essentially the same formalism for data flow that we have developed here, Marlowe [43] shows that elimination algorithms in general can be applied to backward data-flow problems.
This can be accomplished by processing each interval as its header node is encountered in the course of either a reverse preorder or a postorder traversal of $G$. After an interval $I$ has been processed, it is "reduced" to its header node $h$ and is represented in this form in the processing of its containing interval. To reduce an interval $I$ with header $h$ is to (1) delete all edges $(u, w)$ such that $u$ and $w$ are contained in $I$; (2) replace each exit edge $(u, w)$ in $I$ by the edge $(h, w)$; and (3) delete all nodes contained in $I$ except $h$. The replacement edge $(h, w)$ is a virtual edge of $G$. For $h$ to represent $I$ in the reduced graph, the function $f_{(h,w)}$ associated with each virtual edge $(h, w)$ must incorporate the data-flow effect of all paths within $I$ from $h$ to $w$.

We now describe the elimination-phase processing of an interval.

(1) The interval is traversed in interval order. As each node $n$ of the interval is visited, the function $f_{(h \to n)}$ associated with the set of all forward paths from $h$ to $n$ is evaluated. (This evaluation is similar to the forward propagation of information through an acyclic graph.) Node $h$ is visited first: The only forward path to it from $h$ is of length zero, so $f_{(h \to h)}$ is the identity function $\iota$. As a node $n$ (other than $h$) is visited, for each of its predecessor nodes $p$, $f_{(h \to p)}$ has already been evaluated. Composing each such $f_{(h \to p)}$ with $f_{(p,n)}$ and forming the function meet of the resulting functions yields the function $f_{(h \to n)}$:

    $f_{(h \to n)} = \bigwedge \{ f_{(p,n)} \circ f_{(h \to p)} \mid p \in PRED(n) \}$.

(2) The internal closure $f_I$ of the interval is the data-flow function representing the set of all simple cycles within the interval originating at the header. Having computed the forward path function $f_{(h \to l)}$ for each latch node $l$, the internal closure can be computed:

    $f_I = \bigwedge \{ f_{(l,h)} \circ f_{(h \to l)} \mid l \text{ is a latch in } I \}$.

(3) The reflexive, transitive closure of $f_I$, which we denote $f_I^*$ and refer to as the closure of the interval, is computed.

(4) For each exit node $z$, the function $f_{(h..z)}$ is computed:

    $f_{(h..z)} = f_{(h \to z)} \circ f_I^*$.

This evaluation is based on the observation that any path within $I$ from $h$ to $z$ can be decomposed into a cycle (or empty path) within $I$ originating at $h$, followed by a forward path from $h$ to $z$. In reducing $I$ to $h$, each exit edge $(z, n)$ is replaced by the edge $(h, n)$, whose associated function is evaluated as

    $f_{(h,n)} = f_{(z,n)} \circ f_{(h..z)}$.

    /* The nodes in G are numbered from 1 to |V| such that i is the number
       of the node that is ith in topological order. INFO(i) represents the
       lattice element associated with node i. The set PRED(i) consists of
       the predecessors in G of node i. */

    /* Initialize the start node. */
    INFO(1) := $\bot$

    /* Perform topological traversal. */
    do i := 2 to |V|
        INFO(i) := $\bigwedge$ { $f_{(p,i)}$(INFO(p)) | p $\in$ PRED(i) }
    enddo

Fig. 3. Solving a data-flow problem for an acyclic graph.
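The acyclic forward propagation of Figure 3 can be sketched in a few lines. The sketch assumes nodes are already numbered in topological order and that every node other than the start node has at least one predecessor; the lattice used in the demonstration (sets of facts with intersection as meet and the empty set as $\bot$) and the helper `gen` are illustrative assumptions, not part of the paper.

```python
from functools import reduce

def solve_acyclic(num_nodes, preds, edge_fn, bottom, meet):
    """Forward propagation over an acyclic graph whose nodes 1..num_nodes
    are numbered in topological order (node 1 is the start node)."""
    info = {1: bottom}
    for i in range(2, num_nodes + 1):
        # Apply each incoming edge function to the predecessor's solution,
        # then take the meet of the outputs.
        outputs = [edge_fn[(p, i)](info[p]) for p in preds[i]]
        info[i] = reduce(meet, outputs)
    return info

def gen(*facts):
    """Hypothetical edge function that adds the given facts to its input."""
    added = frozenset(facts)
    return lambda x: x | added

# Diamond-shaped graph: 1 -> 2 -> 4 and 1 -> 3 -> 4.
preds = {2: [1], 3: [1], 4: [2, 3]}
edge_fn = {(1, 2): gen("a"), (1, 3): gen("a", "b"),
           (2, 4): gen(), (3, 4): gen()}
info = solve_acyclic(4, preds, edge_fn, frozenset(), lambda x, y: x & y)
print(info[4])
```

Because the meet here is intersection, only the fact established along both paths to node 4 survives there.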

At the conclusion of the elimination phase, $G$ has been transformed into an acyclic graph $G'$, each node of which either is a singleton in the original graph or represents an outermost interval. For each interval $I$, the closure $f_I^*$ has been evaluated and associated with its header.

7.2.2 The Propagation Phase. Given the function associated with each edge and each header node of the original graph $G$, the propagation phase solves for each node in the course of a single interval-order traversal. Information is simply forward propagated, except that when visiting a header node $h$ the effects of the cycles within its interval are factored in by applying $f_I^*$ to INFO($h$) after meeting the outputs from its predecessors.¹⁵ See Figure 4¹⁶ for a precise statement of the propagation phase of interval analysis.

    /* The nodes in G are numbered from 1 to |V| such that i is the number
       of the node that is ith in interval order. INFO(i) represents the
       lattice element associated with node i. The set PRED(i) consists of
       those predecessors of node i in G that precede node i in interval
       order. IS-HEADER(n) returns true if and only if node n is a header
       node. $f_n^*$ is the closure of the interval whose header is n. */

    /* Initialize INFO, setting start node to $\bot$. */
    INFO(1) := $\bot$
    do i := 2 to |V|
        INFO(i) := $\top$
    enddo

    /* Perform interval-order traversal. */
    do i := 2 to |V|
        INFO(i) := INFO(i) $\wedge$ ($\bigwedge$ { $f_{(p,i)}$(INFO(p)) | p $\in$ PRED(i) })
        if IS-HEADER(i) then INFO(i) := INFO(i) $\wedge$ $f_i^*$(INFO(i))
    enddo

Fig. 4. The propagation phase of interval analysis.

8. THE FORMAL BOUND SET COMPUTATION

For each procedure in the program, the formal bound set BOUND($X$) of each formal parameter $X$ is to be computed. Consider the program in Figure 5. In the figure a dash in an argument list indicates a variable other than a formal parameter. By the call to Q in P, P1 is bound to Q1 and P2 to Q3. Q1 is bound to S1 by the call to S in Q, so P1 is also bound to S1 (by the call graph path (PQS)). Similarly, Q3 is bound to S3 by the call to S in Q, so P2 is also bound to S3 along the path (PQS). Along the path (PRS), P3 and R1 are bound to S2 and to S3. The matrix FP-BOUND-CLOSURE (for which the column for formal parameter $X$ represents the formal bound set for $X$) is given in Figure 6. A one in position $(i, j)$ indicates that the formal parameter of row $i$ is bound to the formal parameter of column $j$.

    PROCEDURE P(P1,P2,P3)
        ...
        CALL Q(P1,-,P2)
        CALL R(P3)
        ...
    END P

    PROCEDURE Q(Q1,Q2,Q3)
        ...
        CALL S(Q1,Q2,Q3)
        ...
    END Q

    PROCEDURE R(R1)
        ...
        CALL S(-,R1,R1)
        ...
    END R

    PROCEDURE S(S1,S2,S3)
        ...
    END S

Fig. 5. Program example for formal bound set computation.

¹⁵ Its predecessors include the latch nodes of the interval, where at this point INFO for each such node is still $\top$.

¹⁶ In Figure 4, the existence of $\top$ is assumed. If $\top$ does not exist for the semilattice, the nodes must be initialized to some other value that is sufficiently high in the lattice.

The formal bound set computation is a data-flow problem. The flow multigraph $G = (V, E, r)$ is the program's call graph; the start node $r$ is the system procedure. We now formulate the associated semilattice as a complete meet semilattice with a $\top$. Let FORMALS be the set whose elements are the $n$ formal parameters of the program. $L$ consists of the set of all $n$-tuples whose elements are nonempty subsets of FORMALS. Where $x = (s_1, \ldots, s_n)$ and $y = (t_1, \ldots, t_n)$, $x \wedge y = (s_1 \cup t_1, \ldots, s_i \cup t_i, \ldots, s_n \cup t_n)$. This semilattice is complete, in that every nonempty subset $X = \{x_1, x_2, \ldots, x_n\}$ of $L$ has a greatest lower bound $\bigwedge X$ given by $x_1 \wedge x_2 \wedge \cdots \wedge x_n$ (or by $x_1$ if $X$ contains a single tuple). $\bigwedge L$, or $\bot$, is the $n$-tuple for which each element is the set of all the program's formal parameters. $\top$ is the $n$-tuple for which the $i$th element is the set containing the $i$th formal parameter as its only element.

The edge function associated with a call site in procedure P that invokes procedure Q represents its by-reference passing of P's formal parameters to Q's formal parameters. Where procedure P with formal parameter list (A, B, C) contains the call site

    CALL Q(A, C)

and the formal parameter list of Q is (D, E) (Example 1), then the function $f_{(P,Q)}$, given input

    (..., BOUND(A), BOUND(B), BOUND(C), ..., BOUND(D), BOUND(E), ...),

produces output

    (..., BOUND(A), BOUND(B), BOUND(C), ..., BOUND(A) $\cup$ BOUND(D), BOUND(C) $\cup$ BOUND(E), ...).
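The edge function of Example 1 can be sketched as an operation on a dictionary of bound sets. Representing the call site as a set of (caller formal, callee formal) pairs is an assumption of this sketch, as are the function and variable names.

```python
def apply_edge(bound, passed):
    """Apply a call-site edge function to a map from formal to bound set.
    `passed` holds (caller_formal, callee_formal) pairs for the call site."""
    out = {formal: set(s) for formal, s in bound.items()}
    for caller_formal, callee_formal in passed:
        # BOUND(callee) becomes BOUND(caller) union BOUND(callee);
        # the union reads the INPUT tuple, as in the definition.
        out[callee_formal] |= bound[caller_formal]
    return out

# Example 1: P(A, B, C) contains CALL Q(A, C), and Q has formals (D, E).
# Each bound set starts as the singleton containing the formal itself.
bound = {x: {x} for x in "ABCDE"}
result = apply_edge(bound, {("A", "D"), ("C", "E")})
print(result["D"], result["E"])
```

The bound sets of the caller's formals are left unchanged, and B, which is not passed, contributes nothing to the callee.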

In general, where a formal parameter A is passed to a formal parameter B at a call site $s$, applying $f_s$ to the input sets BOUND(A) and BOUND(B) associated with A and B, respectively, generates BOUND(A) $\cup$ BOUND(B) as the output set that is associated with B. The class of functions $F$ meets the conditions required for solving the problem by interval analysis as we formulate it (Appendix A.2).

Fig. 6. FP-BOUND-CLOSURE for the program of Figure 5. (Bit matrix with rows and columns P1, P2, P3, Q1, Q2, Q3, R1, S1, S2, S3.)

We now apply interval analysis to the formal bound set computation. The elimination phase summarizes the data-flow effects of certain sets of paths. This entails computing the composition and meet of pairs of functions, and computing the reflexive, transitive closure of functions. We first develop a representational scheme for the data-flow functions of our application.

A call site $s$ defines a relation $PASS_s$, where $(A, B) \in PASS_s$ iff A is a formal parameter of the calling procedure that $s$ passes to the formal parameter B of the called procedure. This relation determines the function corresponding to the call site and can be used to represent it. We term this relation the descriptor of the function. $PASS_{(P,Q)}$ for Example 1 is the set of ordered pairs $\{(A, D), (C, E)\}$. The function associated with a set of paths $p$ from node $m$ to node $n$ can similarly be represented by a relation $PASS_p$, where $(A, B) \in PASS_p$ iff B is a parameter of the procedure represented by $n$ and A is a formal parameter that is bound to B along a suffix of a path in $p$. The range of $PASS_p$ is the set of formal parameters in $n$. The domain of $PASS_p$ is the set of formal parameters belonging to a procedure whose node belongs to one of the paths in $p$.¹⁷

Given our representation of functions in terms of descriptor relations, the required elimination-phase operations on functions are performed via their corresponding descriptors. We now consider functional composition, meet, and reflexive, transitive closure in terms of our functional descriptors.

First we consider composition. Suppose there is a call sequence in which procedure P passes formal parameter A to the formal B of Q, and procedure Q passes B to the formal C of R. $PASS_{(P,Q)}$ is $\{(A, B)\}$, and $PASS_{(Q,R)}$ is $\{(B, C)\}$. We wish to compute the descriptor $PASS_{(PQR)}$ for $f_{(PQR)}$. Performing the relational composition $PASS_{(Q,R)} \circ PASS_{(P,Q)}$ yields $\{(A, C)\}$, which is not the desired relation: It indicates the parameters that are propagated to R from P but does not account for those propagated from Q.
If the elements of $PASS_{(Q,R)}$ are "combined" with the elements of the composition, however, the correct descriptor $\{(A, C), (B, C)\}$ results. The combine operation, which we denote by $\oplus$, is similar to but not the same as a relational union, whose input relations have identical domains and ranges. We define $\oplus$ as a binary operator whose input relations $R_1$ and $R_2$ are defined over (possibly disjoint) domains $D_1$ and $D_2$ and the identical range RNG. The output of $\oplus$ is the relation $R$ defined over the domain $D_1 \cup D_2$ and the range RNG, such that

    $(x, y) \in R \iff (x, y) \in R_1 \text{ or } (x, y) \in R_2$.

¹⁷ Node $n$ is an exception: Its formal parameters are included only if one of the paths includes $n$ without terminating at $n$.
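The composition and combine steps for the P, Q, R call sequence can be sketched with descriptors held as Python sets of pairs. The function names `compose` and `combine` are illustrative; the relations are the ones given in the text.

```python
def compose(second, first):
    """Relational composition (second o first): (a, c) is included when
    (a, b) is in `first` and (b, c) is in `second`."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}

def combine(r1, r2):
    """Combine two descriptors that share a range: a pair belongs to the
    result when it belongs to either input."""
    return set(r1) | set(r2)

pass_pq = {("A", "B")}   # P passes A to the formal B of Q
pass_qr = {("B", "C")}   # Q passes B to the formal C of R

composed = compose(pass_qr, pass_pq)    # bindings that originate in P only
pass_pqr = combine(composed, pass_qr)   # also keeps the bindings from Q
print(composed, pass_pqr)
```

Composition alone loses the binding that starts in Q, which is why the combine step is needed to obtain the descriptor for the whole path.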

In general, where $PASS_p$ represents a set of paths terminating at node $n$ and $PASS_q$ represents a set of paths originating at $n$, the descriptor representing the concatenation of the sets of paths is computed as $(PASS_q \circ PASS_p) \oplus PASS_q$. We refer to this operation as lozenge and denote it as $PASS_q \lozenge PASS_p$. In the context of a lozenge operation, the combine operation is defined over disjoint domains.

We now consider function meet. Given that $PASS_p$ and $PASS_q$ represent sets of paths from $m$ to $n$, the descriptor associated with the union of these path sets is arrived at by combining $PASS_p$ with $PASS_q$. Suppose that, in addition to the above call sequence, by another call sequence P passes A to the formal D of S and S passes D to the formal C of R. The descriptor for this set of two paths is computed as $PASS_{(PQR)} \oplus PASS_{(PSR)}$. Thus, $\{(A, C), (B, C)\} \oplus \{(A, C), (D, C)\}$ yields the meet descriptor $\{(A, C), (B, C), (D, C)\}$.

We now consider the reflexive, transitive closure of a function. Suppose that procedure P with formal parameter list (A, B, C) invokes procedure Q that has parameter list (D, E, F) by the statement

    CALL Q(X, A, B)

and Q invokes P by the statement

    CALL P(D, E, F)

The descriptor PASS for the function $f$ representing the composite effect of the two calls is $\{(D, A), (A, B), (E, B), (B, C), (F, C)\}$. We compute the descriptor for $f^+$ by computing the transitive closure $PASS^+$ of PASS: $\{(D, A), (A, B), (D, B), (E, B), (A, C), (B, C), (D, C), (E, C), (F, C)\}$. $PASS^*$ represents $f^*$.¹⁸

8.1 Formal Bound Set Computation Example

Consider the program of Figure 7. The program call graph, as partitioned into intervals, is drawn in Figure 8. Figure 9 gives the edge functions for the program's call sites. For an edge of the original graph, these functions are determined by the call site's pattern with respect to passing formal parameters as reference arguments. The functions for virtual edges are computed in the course of the elimination phase of interval analysis.

    PROCEDURE P(P1,P2,P3)
        ...
        CALL Q(P1,-,P2)
        CALL R(P3)
        ...
    END P

    PROCEDURE Q(Q1,Q2,Q3)
        ...
        CALL S(Q1,Q2,Q3)
        ...
    END Q

    PROCEDURE R(R1)
        ...
        CALL S(-,R1,R1)
        CALL T(-,R1)
        ...
    END R

    PROCEDURE S(S1,S2,S3)
        ...
        CALL P(-,S1,S3)
        CALL T(S3,-)
        ...
    END S

    PROCEDURE T(T1,T2)
        ...
        CALL U(T2,-,-)
        ...
    END T

    PROCEDURE U(U1,U2,U3)
        ...
        CALL V(-,U1)
        ...
    END U

    PROCEDURE V(V1,V2)
        ...
        CALL T(-,V2)
        CALL U(-,-,V2)
        ...
    END V

Fig. 7. Program example with recursion.

Fig. 8. Interval partitioning of the call graph.

    EDGE                      PASS
    (P,Q)                     { (P1,Q1), (P2,Q3) }
    (P,R)                     { (P3,R1) }
    (Q,S)                     { (Q1,S1), (Q2,S2), (Q3,S3) }
    (R,S)                     { (R1,S2), (R1,S3) }
    (S,P)                     { (S1,P2), (S3,P3) }
    (R,T)                     { (R1,T2) }
    (S,T)                     { (S3,T1) }
    (T,U)                     { (T2,U1) }
    (U,V)                     { (U1,V2) }
    (V,U)                     { (V2,U3) }
    (V,T)                     { (V2,T2) }
    (U,T) (replacing (V,T))   { (U1,T2), (V2,T2) }
    (P,T) (replacing (S,T))   { (P1,T1), (P2,T1), (P3,T1), (Q1,T1), (Q3,T1), (R1,T1), (S1,T1), (S3,T1) }
    (P,T) (replacing (R,T))   { (P1,T2), (P2,T2), (P3,T2), (Q1,T2), (Q3,T2), (R1,T2), (S1,T2), (S3,T2) }

Fig. 9. Edge-function descriptors.

For each interval $I$, the elimination phase determines the descriptor $PASS^*(I)$ that represents the closure of $I$. We denote the descriptor representing the internal closure of $I$ as $PASS_I(I)$. The elimination phase processes the intervals in an inner to outer order.

$PASS_{(V,U)} \lozenge PASS_{(U,V)}$ yields the descriptor $PASS_I(\{U, V\})$ for the internal closure of $\{U, V\}$. Figure 10a gives $PASS^*(\{U, V\})$ ($PASS^+(\{U, V\}) = PASS_I(\{U, V\})$). The descriptor $PASS_{(U..V)}$ is evaluated as $PASS_{(U,V)} \lozenge PASS^*(\{U, V\})$, yielding $\{(U1, V2)\}$ (the same relation as $PASS_{(U,V)}$). The interval is collapsed into U, with the edge (U, T) replacing (V, T) and associated with the descriptor formed by lozenging $PASS_{(V,T)}$ and $PASS_{(U..V)}$ (Figure 9). When emphasizing that node U now represents the interval $\{U, V\}$, we shall denote it as [U, V].

The next interval to be processed is $\{T, [U, V]\}$. Lozenging the descriptors for $f_{(U,T)}$ and $f_{(T,U)}$ yields $PASS_I(\{T, [U, V]\})$, from which $PASS^*(\{T, [U, V]\})$ is calculated (Figure 10b).

The next interval processed is $\{P, Q, R, S\}$. The descriptors for $f_{(Q,S)}$ and $f_{(P,Q)}$ are lozenged to yield $PASS_{(PQS)}$ (Figure 10c). The descriptors for $f_{(R,S)}$ and $f_{(P,R)}$ are lozenged to yield $PASS_{(PRS)}$ (Figure 10c). $PASS_{(P \to S)}$ is determined by combining $PASS_{(PQS)}$ with $PASS_{(PRS)}$. Figure 10c shows the resulting relation. Next $PASS_{(S,P)}$ and $PASS_{(P \to S)}$ are lozenged to yield $PASS_I(\{P, Q, R, S\})$ (Figure 11). Then $PASS^*(\{P, Q, R, S\})$ is computed (Figure 12). By lozenging the path descriptors for forward paths to exit nodes with $PASS^*(\{P, Q, R, S\})$, the entrance-to-exit path descriptors for the interval are determined (see Figure 10d). The interval can now be collapsed into a single node. The edge (P, T) replaces (S, T) and is associated with the result of $PASS_{(S,T)} \lozenge PASS_{(P..S)}$ (see Figure 9). Another edge (P, T) replaces (R, T) and is associated with the result of $PASS_{(R,T)} \lozenge PASS_{(P..R)}$ (Figure 9).

All intervals have now been processed, and so the elimination phase has been completed. The propagation phase now applies the edge functions of the call graph and the closure functions calculated during the elimination phase in the course of an interval-order traversal of the original graph. To apply an edge function $f_{(P,Q)}$ and to meet the result with INFO(Q), BOUND(Y) is augmented to include each element of BOUND(X) for each element (X, Y) in $PASS_{(P,Q)}$.¹⁹

¹⁸ We are representing the identity function $\iota$, which is associated with the empty path, by the identity relation (recall that $f^* = \iota \wedge f^+$). Our representation of $\iota$ corresponds to our regarding a formal parameter as bound to itself along the empty path. Applying $\iota$ to the lattice element (Abound, Bbound, Cbound) associated with P yields (Abound $\cup$ {A}, Bbound $\cup$ {B}, Cbound $\cup$ {C}). The above mapping is truly an identity mapping only where A, B, and C are assumed to already belong to the sets Abound, Bbound, and Cbound, respectively. Accordingly, $\top$ for the formal bound set semilattice is ({A}, {B}, {C}).

¹⁹ A function application and the accompanying meet operation are both accomplished here by the indicated union operations.
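The first two elimination steps of this example can be replayed mechanically. The sketch below assumes descriptors are sets of pairs and that the reflexive part of a closure is the identity relation on the header's formals; the helper names are illustrative.

```python
def compose(second, first):
    """Relational composition (second o first)."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}

def lozenge(q, p):
    """PASS_q lozenge PASS_p = (PASS_q o PASS_p) combined with PASS_q."""
    return compose(q, p) | set(q)

def star(rel, header_formals):
    """Reflexive, transitive closure; reflexive over the header's formals."""
    closure = set(rel)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            break
        closure |= extra
    return closure | {(x, x) for x in header_formals}

# Edge descriptors taken from the call sites of procedures T, U, and V.
pass_tu, pass_uv = {("T2", "U1")}, {("U1", "V2")}
pass_vu, pass_vt = {("V2", "U3")}, {("V2", "T2")}

# Interval {U, V}: internal closure, closure, and entrance-to-exit path.
internal_uv = lozenge(pass_vu, pass_uv)
star_uv = star(internal_uv, ["U1", "U2", "U3"])
path_u_to_v = lozenge(pass_uv, star_uv)
pass_ut = lozenge(pass_vt, path_u_to_v)   # virtual edge (U,T) replacing (V,T)

# Interval {T, [U, V]}.
internal_t = lozenge(pass_ut, pass_tu)
star_t = star(internal_t, ["T1", "T2"])
print(sorted(pass_ut), sorted(star_t))
```

The virtual edge computed here binds both U1 and V2 to T2, and the closure for the interval headed by T adds only the identity pairs for T1 and T2.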

(a) $PASS^*(\{U, V\})$

          U1  U2  U3
    U1     1   0   1
    U2     0   1   0
    U3     0   0   1
    V1     0   0   0
    V2     0   0   1

(b) $PASS^*(\{T, [U, V]\})$

          T1  T2
    T1     1   0
    T2     0   1
    U1     0   1
    U2     0   0
    U3     0   0
    V1     0   0
    V2     0   1

(c) $PASS_{(PQS)}$

          S1  S2  S3
    P1     1   0   0
    P2     0   0   1
    P3     0   0   0
    Q1     1   0   0
    Q2     0   1   0
    Q3     0   0   1

    $PASS_{(PRS)}$

          S1  S2  S3
    P1     0   0   0
    P2     0   0   0
    P3     0   1   1
    R1     0   1   1

    $PASS_{(P \to S)}$

          S1  S2  S3
    P1     1   0   0
    P2     0   0   1
    P3     0   1   1
    Q1     1   0   0
    Q2     0   1   0
    Q3     0   0   1
    R1     0   1   1

(d) $PASS_{(P..S)}$

          S1  S2  S3
    P1     1   1   1
    P2     0   1   1
    P3     0   1   1
    Q1     1   1   1
    Q2     0   1   0
    Q3     0   1   1
    R1     0   1   1
    S1     0   1   1
    S2     0   0   0
    S3     0   1   1

    $PASS_{(P..R)}$

          R1
    P1     1
    P2     1
    P3     1
    Q1     1
    Q2     0
    Q3     1
    R1     1
    S1     1
    S2     0
    S3     1

Fig. 10. Path descriptors.

          P1  P2  P3
    P1     0   1   0
    P2     0   0   1
    P3     0   0   1
    Q1     0   1   0
    Q2     0   0   0
    Q3     0   0   1
    R1     0   0   1
    S1     0   1   0
    S2     0   0   0
    S3     0   0   1

Fig. 11. Internal closure of {P, Q, R, S}.

          P1  P2  P3
    P1     1   1   1
    P2     0   1   1
    P3     0   0   1
    Q1     0   1   1
    Q2     0   0   0
    Q3     0   0   1
    R1     0   0   1
    S1     0   1   1
    S2     0   0   0
    S3     0   0   1

Fig. 12. Closure of {P, Q, R, S}.

The bound set associated with each formal is initially the set with the formal itself as its only member. Bound set information may be represented by a bit matrix FP-BOUND-CLOSURE of size fpnum by fpnum, where fpnum is the number of program formal parameters and where a "1" in the row for parameter A and the column for parameter B indicates that A is bound to B. In accordance with the initialization of the lattice elements to $\top$, every element of FP-BOUND-CLOSURE not on the main diagonal is initially "0" and those elements on the main diagonal are initially "1".

Applying $f^*(\{P, Q, R, S\})$ to the initial bound sets associated with procedure P yields the bound sets for P's formals given in Figure 13. Applying $f_{(P,Q)}$ and $f_{(P,R)}$ to these sets yields the bound sets for Q and R given in Figure 13. Applying $f_{(Q,S)}$ yields the bound sets shown in Figure 14. Applying $f_{(R,S)}$ and unioning with the above bound sets yield the bound sets for the parameters of S given in Figure 13. Applying $f_{(R,T)}$ and $f_{(S,T)}$ and unioning the outputs yield the bound sets shown in Figure 15. Applying $f^*(\{T, [U, V]\})$ (Figure 10b) to the parameters of T yields the additional bindings of U1 and V2 to T2. Applying $f_{(T,U)}$ yields the bound set for U1 given in Figure 13. Since U1 is bound to U3 by the transitive closure of $\{U, V\}$, applying $f^*(\{U, V\})$ (Figure 10a) propagates the bound set of U1 to the bound set of U3 (see Figure 13). Applying $f_{(U,V)}$ propagates bindings to V2 as shown in Figure 13. Figure 13 gives the solution FP-BOUND-CLOSURE for this example.
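As a cross-check on the walkthrough, the bound sets can also be obtained by brute force. The sketch below is not interval analysis: it simply iterates the direct call-site bindings of Figure 7 to a fixed point, starting each bound set at the formal itself. The function name and the encoding of bindings as pairs are assumptions of the sketch.

```python
def bound_sets(bindings, formals):
    """Iterate BOUND(b) := BOUND(b) union BOUND(a) over every direct binding
    (a, b) until nothing changes. A plain fixed-point iteration, used here
    only to cross-check results obtained by interval analysis."""
    bound = {f: {f} for f in formals}
    changed = True
    while changed:
        changed = False
        for a, b in bindings:
            before = len(bound[b])
            bound[b] |= bound[a]
            changed = changed or len(bound[b]) != before
    return bound

formals = ["P1", "P2", "P3", "Q1", "Q2", "Q3", "R1", "S1", "S2", "S3",
           "T1", "T2", "U1", "U2", "U3", "V1", "V2"]
# Direct bindings read off the call sites of Figure 7.
bindings = [("P1", "Q1"), ("P2", "Q3"), ("P3", "R1"),
            ("Q1", "S1"), ("Q2", "S2"), ("Q3", "S3"),
            ("R1", "S2"), ("R1", "S3"), ("R1", "T2"),
            ("S1", "P2"), ("S3", "P3"), ("S3", "T1"),
            ("T2", "U1"), ("U1", "V2"), ("V2", "T2"), ("V2", "U3")]
bound = bound_sets(bindings, formals)
print(sorted(bound["T1"]))
```

The set computed for T1 is the Figure 15 set plus T1 itself, and the set for T2 additionally contains U1 and V2, in agreement with the effect of applying the closure of the interval headed by T.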


Fig. 13. FP-BOUND-CLOSURE for the program of Figure 7. (Bit matrix with rows and columns P1, P2, P3, Q1, Q2, Q3, R1, S1, S2, S3, T1, T2, U1, U2, U3, V1, V2.)

    Parameter   Bound Set
    S1          {Q1, P1}
    S2          {Q2}
    S3          {P1, P2, Q1, Q3, S1}

Fig. 14. Bound sets for procedure S.

    Parameter   Bound Set
    T1          {P1, P2, P3, Q1, Q3, R1, S1, S3}
    T2          {P1, P2, P3, Q1, Q3, R1, S1, S3}

Fig. 15. Bound sets for procedure T.

9. THE PROCMOD COMPUTATION FOR GLOBALS

PROCMOD for globals can be computed in a manner analogous to PROCMOD for formal parameters: Given the DIRECTMOD sets and call-site binding patterns, the PROCMOD sets can immediately be calculated. X is bound to itself by a call chain iff it is passed at each call. For a global variable X in the DIRECTMOD set of procedure P, X is to be included in GLOBAL-PROCMOD(Q) for any procedure Q (with respect to which it is global) that reaches P with respect to X: that is, Q for which there is some call chain from Q to P by which X is bound to itself.

9.1 The Reaching-Procedure Computation

The set of reaching procedures associated with each (global variable, procedure) pair (X, P) in the program is to be computed. The reaching-procedure computation is a data-flow problem to which interval analysis as formulated in Section 7.2 can be applied. We now formulate the associated semilattice as a complete semilattice with a $\top$. Where the program contains $n$ (global variable, procedure) pairs, each lattice element is an $n$-tuple whose elements are subsets of the set of procedures in the program. The meet operation, applied to lattice elements

    $[S_1, S_2, \ldots, S_n]$ and $[T_1, T_2, \ldots, T_n]$,

produces the lattice element $[S_1 \cup T_1, S_2 \cup T_2, \ldots, S_n \cup T_n]$. $\top$ is the $n$-tuple for which each element is the set containing the associated procedure as its only element.²⁰ $\bot$ is the $n$-tuple for which each element is the set containing all procedures in the program.

The edge function associated with a call site $s$ in procedure P that invokes procedure Q represents the passing at $s$ of each global variable in Q to itself. Where procedure P invokes Q that has global variables X, Y, and Z, the function $f_{(P,Q)}$, given input

    [..., RP(X, P), RP(Y, P), RP(Z, P), ..., RP(X, Q), RP(Y, Q), RP(Z, Q), ...],

produces output²¹

    [..., RP(X, P), RP(Y, P), RP(Z, P), ..., RP(X, Q) $\cup$ RP(X, P), RP(Y, Q) $\cup$ RP(Y, P), RP(Z, Q) $\cup$ RP(Z, P), ...].

The representation of the edge and path functions is straightforward. The action of an edge function is determined by the set of variables global to the called procedure. Where the called procedure has global variables $X_1, X_2, \ldots, X_n$ and P is the calling procedure, the mapping $\{[X_1, \{P\}], [X_2, \{P\}], \ldots, [X_n, \{P\}]\}$, the descriptor of the edge function, represents it.²² The function representing a set of paths $p$ from node $m$ to node $n$ can be described by a mapping of each global variable X of $n$ to its set of reaching procedures with respect to $p$: P is such a procedure iff there is a path suffix in $p$, initiating at P, along which X is bound to itself. We term this mapping the descriptor of the path function.

We now consider the determination of the edge function descriptors for paths and sets of paths. First we consider function composition. Consider a call sequence in which procedure P calls procedure Q that has global variable Y, and procedure Q calls procedure R that has global variables Y and Z. The descriptor for (P, Q) is $\{[Y, \{P\}]\}$. The descriptor for (Q, R) is $\{[Y, \{Q\}], [Z, \{Q\}]\}$. The descriptor for the path (PQR) consists of the mapping of each global X in R to the union of the sets associated with X by the two edge descriptors. Where a global X in R is not global in Q, the set of procedures associated with X in Q is evaluated as the empty set. Thus, the descriptor for (PQR) is $\{[Y, \{P, Q\}], [Z, \{Q\}]\}$.

The descriptor for a function meet is computed, for each global X, by unioning the sets of procedures associated with X by the descriptors for input functions.

²⁰ This assignment to $\top$ is consistent with regarding globals as bound to themselves with respect to P along the empty path at P.

²¹ Where V is not global to P, there is no set in the input tuple that is associated with (V, P), and RP(V, P) is evaluated as the empty set.

²² The set of global variables of the called procedure is sufficient to represent the edge function. The descriptor defined here is consistent with our definition for path functions and is assumed for a clearer exposition.
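The descriptor operations for the reaching-procedure computation can be sketched with dictionaries that map each global variable to a set of procedures. The function names are illustrative. `compose_edges` follows the two-edge rule stated in the text, and the second path, through a hypothetical procedure S whose globals are Y and Z, is included only to exercise the meet.

```python
def edge_descriptor(caller, callee_globals):
    """Descriptor of a call edge: each global of the callee maps to {caller}."""
    return {x: {caller} for x in callee_globals}

def compose_edges(first, second, target_globals):
    """Descriptor for the path formed by edge `first` followed by edge
    `second`: each global of the final procedure maps to the union of the
    sets given by the two edge descriptors (a missing entry counts as empty)."""
    return {x: first.get(x, set()) | second.get(x, set()) for x in target_globals}

def meet_descriptors(d1, d2):
    """Function meet: union the procedure sets global by global."""
    return {x: d1.get(x, set()) | d2.get(x, set()) for x in set(d1) | set(d2)}

# P calls Q (global Y); Q calls R (globals Y and Z).
d_pqr = compose_edges(edge_descriptor("P", ["Y"]),
                      edge_descriptor("Q", ["Y", "Z"]), ["Y", "Z"])
# Hypothetical second path: P calls S (globals Y and Z); S calls R.
d_psr = compose_edges(edge_descriptor("P", ["Y", "Z"]),
                      edge_descriptor("S", ["Y", "Z"]), ["Y", "Z"])
both = meet_descriptors(d_pqr, d_psr)
print(d_pqr, both)
```

Z picks up only Q along the first path because Z is not global in Q, so the edge from P contributes the empty set for it.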


Suppose that, in addition to the above call sequence, procedure P calls procedure S that has global variables Y and Z, and procedure S calls procedure R. The descriptor for (PQR) is $\{[Y, \{P, Q\}], [Z, \{Q\}]\}$, and the descriptor for (PSR) is $\{[Y, \{P, S\}], [Z, \{P, S\}]\}$. The descriptor for (PQR) $\wedge$ (PSR) is $\{[Y, \{P, Q, S\}], [Z, \{P, Q, S\}]\}$. The reaching-procedure computation is fast, so the closure operation is trivial.

An external variable is global to all procedures of a program, so where P (directly or indirectly) calls Q, the external variable X is bound to itself along this chain of calls. The set of reaching procedures associated with (X, P), then, is the set of procedures that reach P in the call graph. Thus, the reaching-procedure sets for external variables are given by the transitive closure of the call graph and can be computed separately. This computation can be formulated as a data-flow problem.

A formal parameter A of procedure P is global with respect to any procedure nested within P. The separate computations of PROCMOD for globals and PROCMOD for formals seem problematic in that a formal parameter of procedure P is global to the procedures nested within P. For example, suppose that procedure Q is nested within P and that A $\in$ GLOBAL-DIRECTMOD(Q) and A $\in$ FORMAL(P). A must belong to FORMAL-PROCMOD(P) for the SITEMOD computation to be correct. The proper adjustment can easily be made when computing DIRECTMOD sets.²³

Consider procedure P with procedure Q nested within it. We assume that Q is in the call graph and thus is reachable from the start node. Any invocation of Q must be within the scope of P, since Q is visible only within that scope. Any path from the start node to Q, then, has as a suffix a path from P to Q along which each procedure is within the scope of P. A formal parameter A of P is implicitly passed as a global at each call site of such a suffix. A modification of A anywhere along the suffix is a modification of the formal parameter. Since such a path from P to Q must exist, the required adjustment to formal parameter modification information does not depend on the call graph; lexical scoping information is sufficient. As DIRECTMOD sets are computed through a scan of the program text, the necessary adjustments can be performed.
A modification of a global in Q results in its placement in GLOBAL-DIRECTMOD(Q). If it is a formal parameter for an outer procedure P, it is also added to FORMAL-DIRECTMOD(P).

9.1.1 Global Variable Mechanisms in PL/I and FORTRAN. The PL/I external attribute for variables and the FORTRAN common block are mechanisms for declaring external variables. In PL/I an external variable X is hidden from an external procedure P simply by not declaring X with the external attribute in P. Similarly, in FORTRAN an external (“common”) variable is hidden from a routine P by not declaring its associated common block in P.24 FORTRAN is not block-structured. With block structure removed from our model, no variable other than an external is global to any procedure. Thus, in FORTRAN the GLOBAL-PROCMOD computation essentially reduces to computing the transitive closure of the call graph.

23 Cooper and Kennedy formulate equivalent accommodations of nesting [23, 24] in their frameworks.
24 A FORTRAN common variable may assume different names in different procedures. Where the variable is an array, these name associations may be complex.
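As a rough illustration of this reduction, the sketch below computes, for every procedure, the set of procedures that reach it. It assumes the call graph has already been made acyclic (for example, by collapsing cycles into single nodes) and visits nodes once in topological order. The name `reach_sets` and the dictionary-of-lists graph encoding are assumptions made for this example, and the resulting sets do not include the procedure itself.

```python
from collections import deque
from typing import Dict, List, Set


def reach_sets(calls: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each procedure of an acyclic call graph to the procedures reaching it."""
    nodes = set(calls) | {q for callees in calls.values() for q in callees}
    indegree = {n: 0 for n in nodes}
    for callees in calls.values():
        for q in callees:
            indegree[q] += 1

    reach: Dict[str, Set[str]] = {n: set() for n in nodes}
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while ready:
        p = ready.popleft()
        visited += 1
        for q in calls.get(p, []):
            # The edge (p, q) contributes p and everything that reaches p.
            reach[q] |= {p} | reach[p]
            indegree[q] -= 1
            if indegree[q] == 0:
                ready.append(q)
    if visited != len(nodes):
        raise ValueError("call graph has a cycle; collapse it first")
    return reach


example = reach_sets({"main": ["P", "S"], "P": ["Q"], "Q": ["R"], "S": ["R"]})
print(sorted(example["R"]))
```

Each edge is processed exactly once, so the number of set unions is proportional to the number of edges of the acyclic graph.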

9.1.2 Computing the Call Graph’s Transitive Closure. We have observed that reaching-procedure sets for external variables are given by the transitive closure of the call graph. Every interval in G forms an equivalence class with respect to reachability. Once a call graph G has been decomposed into intervals, then, the computation of its transitive closure reduces to computing the transitive closure of the acyclic graph formed by collapsing each outermost interval of G into a single node. The transitive closure of an acyclic graph can be computed in the course of a single topological-order traversal, since each node is reachable from (and only from) its predecessors and the nodes reaching its predecessors.

The transitive-closure computation can be formulated as a continuous data-flow problem, but here most of this framework is reduced to triviality. The information to be associated with a node n is the set of nodes that reach it. The set of nodes associated with n at a given point in the data-flow calculation is referred to as REACH(n). The function f(P,Q) outputs {P} ∪ REACH(P). The lattice meet operation is union, and the computation is fast.

In the context of the transitive-closure computation, the call multigraph can be represented by a relation R such that (P, Q) ∈ R iff there is an edge from P to Q in G. Eve and Kurki-Suonio [28] present an algorithm for computing the transitive closure of an arbitrary relation based on Tarjan’s algorithm for finding the strongly connected components of a directed graph [59]. This algorithm, like the data-flow analysis described above, performs O(e) union operations on inputs of length n. Zadeck [67] has developed a data-flow technique, based on Tarjan’s algorithm for identifying the strongly connected regions of a graph, that can be applied to fast data-flow problems for which each member of a strongly connected region has the same solution. The transitive-closure computation satisfies these conditions; applying Zadeck’s technique to it is equivalent to applying Eve and Kurki-Suonio’s algorithm.

10. ALIAS ANALYSIS

The variables X and Y are said to be aliased when, at a given point in the execution of a program, the same storage location can be accessed through a reference to either of them. In that alias relations pertain to associations between variable names and storage locations, they are defined with respect to a scope: a portion of a program over which a name is visible and is associated with a particular storage location. In our model the scope of a variable is the procedure in which it is declared (the scope of an external variable is the entire program). It is not difficult to extend the techniques presented here to include mechanisms (such as PL/I begin blocks) for defining scopes within procedures.

Aliases arise in a number of ways. Our concern here is with aliasing effects due to reference parameters. This interprocedural form of aliasing depends on a runtime binding of formal parameter names to storage locations. The alias relations generated with respect to a procedure P are dynamically determined by the sequence of calls that culminates in its invocation. There are as many such sequences as there are paths in the call graph from the system procedure node to node P. We consider the determination of potential aliases. Where there exists a chain of calls that results in the aliasing of two variables X and Y in P, the variables are said to be potential aliases in P. Such an (unordered) alias pair is designated by (X, Y). We denote the set of all potential alias pairs in P as ALIAS(P). The problem, then, is to compute ALIAS(P) for each procedure in the program.

10.1 The Exhaustive Computation of Aliases

Cooper [21] draws a useful distinction between the “introduction” and “propagation” of aliases at call sites. When the same variable is passed at a call site to two different variables X and Y, then the alias pair (X, Y) has been introduced to the called procedure by the call site. When two aliased variables are passed at a call site, the variables to which they are passed are aliased in the called procedure: This latter alias pair has been propagated by the call site. Thus, where A and B are aliased and a call site passes A to C and D and passes B to E, the call site propagates the aliases (C, E) and (D, E) and introduces the alias (C, D). Recall that a variable can be passed either as a reference parameter or through being global to the called procedure. Alias introduction, then, occurs when

(1) the same variable is passed-by-reference at more than one parameter position. The corresponding formal parameters of the called procedure are then aliased.
(2) a variable that is global to the called procedure is passed to it as a reference parameter. In the called procedure, the global variable and the corresponding formal parameter are aliased.
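The distinction between introduced and propagated pairs at a single call site can be sketched as follows. This is an illustrative model only: the helper names (`bindings_at_call`, `introduced`, `propagated`) are hypothetical, a call site is assumed to be given as a list of (actual, formal) reference bindings, and a global of the called procedure is modeled as a variable passed to itself under the same name.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

AliasPair = FrozenSet[str]
Binding = Tuple[str, str]  # (variable in caller, variable in callee)


def bindings_at_call(ref_args: List[Binding], callee_globals: Iterable[str]) -> List[Binding]:
    """Reference-parameter bindings plus globals, which are passed to themselves."""
    return list(ref_args) + [(g, g) for g in callee_globals]


def introduced(bindings: List[Binding]) -> Set[AliasPair]:
    """Pairs created because one caller variable is bound to two callee variables."""
    out: Set[AliasPair] = set()
    for (a1, c1), (a2, c2) in combinations(bindings, 2):
        if a1 == a2 and c1 != c2:
            out.add(frozenset((c1, c2)))
    return out


def propagated(bindings: List[Binding], caller_aliases: Set[AliasPair]) -> Set[AliasPair]:
    """Pairs created because two already-aliased caller variables are both passed."""
    out: Set[AliasPair] = set()
    for (a1, c1), (a2, c2) in combinations(bindings, 2):
        if a1 != a2 and frozenset((a1, a2)) in caller_aliases and c1 != c2:
            out.add(frozenset((c1, c2)))
    return out


# A and B are aliased; the call passes A to C and D, and B to E.
site = bindings_at_call([("A", "C"), ("A", "D"), ("B", "E")], [])
new_pairs = introduced(site)
carried_pairs = propagated(site, {frozenset(("A", "B"))})
```

For the call site in the text, `new_pairs` holds the pair (C, D) and `carried_pairs` holds (C, E) and (D, E). Case (2) arises when a global appears both in `ref_args` and in `callee_globals`.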

The potential-alias computation can be performed in a manner analogous to the SITEMOD computation. The role analogous to DIRECTMOD sets is played by the sets of introduced alias pairs, which form the initial, immediate subset of the set of all potentially aliased pairs. The alias data-flow problem is, starting with these pairs, to generate all pairs of potential aliases. The introduced aliases, like the direct modifications, are propagated by call-site bindings, but here the bindings hold between pairs of variables. Where variables X and Y are passed to A and B at a single call site, the pair (X, Y) may be regarded as being passed to the pair (A, B). Where X is bound to A and Y to B by the same path in the call graph, (X, Y) is pair-bound to (A, B) along that path. Where X is bound to A and Y to B, it does not necessarily follow that (X, Y) is pair-bound to (A, B), as the bindings must occur along the same path: Pair-binding information cannot be extracted from bound set information.

The potential-alias computation can be decomposed into the computation of introduced alias pairs and the pair-bound computation. To describe the pair-bound computation, we first define relevant pairs. A relevant pair (X, Y) of variables is such that X and Y are distinct and, for some procedure P, each belongs to either FORMAL(P) or GLOBAL(P). The pair-bound set of a relevant pair (X, Y) is the set of all relevant pairs that are pair-bound to (X, Y) along some path in the call graph. The pair-bound computation determines PAIRBOUND(X, Y) for each relevant pair (X, Y) in the program. It can be performed by interval analysis in essentially the same manner as the formal bound set computation. Given the pair-bound sets for relevant pairs and the initial set of introduced pairs, it is easy to compute the set of all potential aliases. But a difficulty here is that global and formal parameter bindings cannot be treated separately, as the pair-bound sets of (formal, global) pairs cannot be computed from a separate consideration of formal and global bindings. Where the procedure represented by the header of an interval has k formal parameters and l globals, the matrix representing its bound-pair relations in the standard manner has size (k + l)^2 by (k + l)^2, so computing its reflexive, transitive closure using Warshall’s algorithm [64] has time cost of order (k + l)^6, where (k + l) is not necessarily small. A sparse-matrix representation and a transitive-closure algorithm based on it would lessen the cost of the closure computation considerably in practice (Section 15.1).

The bound-pair computation meets the formal requirements of our technique (Appendix A.2), but other algorithms may be less costly in practice, depending on the characteristics of the program (in particular, the density of the bound-pair relations). Alternative exhaustive algorithms include Banning’s, which determines potential alias pairs through a depth-first traversal of the call graph, Cooper’s [21], which determines potential alias pairs via iterative data-flow analysis, and Cooper and Kennedy’s [25], which is based on the program’s binding multigraph.

10.2 Factoring in Alias Information

Banning’s algorithm decomposes the MOD computation into SITEMOD and alias analysis. For a call site s, aliasing is ignored as SITEMOD is computed; factoring the alias relations holding in the procedure containing s into SITEMOD(s) then produces MOD(s). That is, where X ∈ SITEMOD(s) for a call site s in P, X and each variable Y such that (X, Y) ∈ ALIAS(P) belong to MOD(s). Due to our separate treatment of global and formal parameter bindings, factoring in alias information in this manner becomes insufficient.25 The following example illustrates this.
Suppose that procedure Q calls procedure R, passing a variable X that is global to both Q and R by reference to the formal parameter A. Suppose also that A belongs to PROCMOD(R). This call introduces an alias between X and A in R, so X is possibly modified in R. X would be included in SITEMOD(Q,R), since it is passed-by-reference to A and since A is in PROCMOD(R). But X would not be included in PROCMOD(Q), since the PROCMOD sets are computed directly from DIRECTMOD sets and since X does not belong to DIRECTMOD(R) (or PROCMOD(R)). Suppose also that procedure P calls procedure Q: X would not be included in SITEMOD(P,Q), since it is not in PROCMOD(Q), and thus would erroneously be excluded from MOD(P,Q).

The remedy suggested by Burke [13] is to factor alias relations into the DIRECTMOD sets: For every variable X in the DIRECTMOD set of a procedure, all variables potentially aliased with X in that procedure are included in the same DIRECTMOD set. This solution is correct in that any variable belonging to the MOD set of a call site due to an alias relation will be recognized as such. In the above example, X would be included in DIRECTMOD(R) and thus in PROCMOD(R), PROCMOD(Q), and SITEMOD(P,Q). However, this solution results in more conservative MOD information than is determined by Banning’s algorithm. Suppose in the above example that R is also called by Q2, that X is also global to Q2, and that this call does not pass X to A. Including X in the DIRECTMOD and PROCMOD sets of R would result in the inclusion of X in MOD(Q2,R), but X cannot be modified by the call. The source of difficulty is that the alias between X and A holds along the path in the call graph to R from Q, but not along the other path (from Q2). To determine a solution as precise as Banning’s, this path-specific information must be incorporated into the factoring in of alias relations.26

Recording path-specific alias information requires an inexpensive modification of the alias analysis algorithm. When an alias (A, X) is introduced to procedure Q by a call site in P, A is a formal in Q, and X is global to both P and Q, the representation of the alias pair in Q must include the associated procedure P. The alias (A, X) may be introduced to Q under these conditions by call sites in more than one procedure; thus, each introduced alias holding between a formal and a global is associated with a set PROCS((A, X)) of procedures. When a potential alias (A, X) of the form described above is introduced, PROCS((A, X)) is augmented to include the procedure containing the relevant call site.

Given alias information that has been augmented in the above manner, aliases can be factored in as precisely as with Banning’s algorithm. Assume that FORMAL-PROCMOD and FORMAL-DIRECTMOD have already been computed, ignoring aliasing as Banning does (except when accommodating nested procedures in the manner described below). Prior to factoring aliasing directly into SITEMOD sets, we first factor aliasing effects into GLOBAL-PROCMOD sets. When the alias pair (A, X) in Q is associated with PROCS((A, X)) and A belongs to FORMAL-PROCMOD(Q), then X is added to GLOBAL-PROCMOD(P) for each procedure P in PROCS((A, X)).

25 Cooper and Kennedy do not take this observation into account in [22]. They correct their algorithm with an adjustment that does not affect its time bound in [23].
For each such pair (X, P), each procedure R in RP(X, P) must also be considered. If X is global to R, then it must be added (if not already present) to GLOBAL-PROCMOD(R).27 Once GLOBAL-PROCMOD sets have been augmented in this manner (see Figure 16 for an exact description), SITEMOD sets can be determined from PROCMOD sets, and alias relations can then be factored in to produce accurate MOD sets. In the above example, the set associated with (A, X) in R would include Q but not Q2. Thus, X would be added to GLOBAL-PROCMOD(Q) but not to GLOBAL-PROCMOD(Q2).

We have not yet considered the interaction of interprocedural aliasing with procedure nesting. We discussed the accommodation of nesting in Section 9.1. During the construction of DIRECTMOD sets, where X ∈ GLOBAL-DIRECTMOD(Q), X is added to FORMAL-DIRECTMOD(P) if there is a procedure P containing Q such that X ∈ FORMAL(P). Suppose that A ∈ FORMAL-DIRECTMOD(Q) and (A, X) ∈ ALIAS(Q), where X is global in Q and a formal in P. In this case, too, X is to be included in FORMAL-DIRECTMOD(P). The DIRECTMOD computation, then, for each element A ∈ FORMAL-DIRECTMOD(Q), must consider each global X such that (A, X) ∈ ALIAS(Q). For each such X, if X ∈ FORMAL(P) for P containing Q, X is added to FORMAL-DIRECTMOD(P). Aliasing sets must be computed prior to DIRECTMOD sets. A formal description of the computation of DIRECTMOD sets is provided in Section 12.

26 The paths under consideration here are paths in the call graph: Our algorithm remains flow-insensitive.
27 An equivalent adjustment was pointed out to the author by M. Carroll and B. Ryder (private communication, 1986).

    /* PROCS((A,X)) designates the set of procedures associated with an introduced alias pair (A,X). */
    /* Form the set of introduced alias pairs that possibly necessitate an adjustment to GLOBAL-PROCMOD. */
    INTERESTING-ALIASES := {(A,X) | PROCS((A,X)) ≠ ∅}
    /* Process relevant alias pairs. */
    for each alias pair (A,X) ∈ INTERESTING-ALIASES
        /* Generate possible additions to GLOBAL-PROCMOD */
        for each procedure P ∈ PROCS((A,X))
            global-closure(X,P)
        endfor
    endfor

    procedure global-closure(X,P)
        for each procedure Q ∈ RP(X,P)
            if X ∈ GLOBAL(Q) then add X to GLOBAL-PROCMOD(Q)
        endfor
    end global-closure

Fig. 16. Factoring aliasing into GLOBAL-PROCMOD information.

11. IRREDUCIBLE CALL GRAPHS

Program call graphs are not always reducible. The intuition underlying the belief that most control flow graphs are reducible does not extend to program call graphs. For example, a program to perform recursive descent parsing may have an irreducible call graph. An irreducible call graph contains at least one strongly connected region that has more than one entry point. We consider the smallest single-entry region containing such a multientry strongly connected region, abandoning the restriction that regions be strongly connected. We term such a region improper.

Terms that we have defined for interval regions have analogous definitions for improper regions. An exit edge with respect to an improper region R is an edge (u, v) such that u belongs to R and v does not. We refer to such a node u as an exit node. A path from the entry node m of R to an exit node u is an entrance-to-exit path with respect to R. The set of all paths within R from the entry m to a node n will be denoted as (m..n). An improper region, like an interval region, is summarized in terms of path functions and collapsed into a single node during the elimination phase.

Recall that in processing an interval I we associate f T with the header h and apply this function to INFO(h) during the propagation phase. One may process the other nodes of the interval in the same manner as the header, associating f(h..n) with each node n during the elimination phase (f T is f(h..h)) and, in the propagation phase, applying each function to the same input to which f T is applied (i.e., INFO(h) just prior to processing the interval). The interval analysis algorithm formulated in Section 7 in effect optimizes this process by taking advantage of the single-entry property of intervals.

For an improper region R, the entry node m does not belong to a strongly connected region within R, so f(m..m) is 1. For every other node n in R, the function f(m..n) is calculated during the elimination phase and applied as described above during the propagation phase. During the elimination phase, the multientry strongly connected regions contained within the improper region R with entry node m must be repeatedly traversed until the function associated with each node n stabilizes. We denote the function associated with a node n in R at a particular point in this computation as f(n). At the point of stabilization, f(n) is f(m..n) for each node n. As with intervals, the nodes of the region are processed in interval order (reverse postorder) in the course of a traversal. The evaluation of f(n) as each node n (other than the entry node m) is visited is similar to the function evaluation that takes place for interval region nodes. The difference here is that as a node is visited it has predecessor nodes that succeed it in interval order. For such a predecessor node p, the function f(p) associated with p from the previous iteration is used (for the first iteration, f(p) has been set to ⊥). Having computed f(m..n) for each node n of R, the entrance-to-exit path functions are used to summarize R, and it is collapsed into a single node as with interval regions. In the propagation phase, it is not necessary to iterate through an improper region, as the path functions associated with its nodes by the elimination phase can be used to determine the solution at each node.

Schwartz and Sharir [58], in their formulation of interval analysis, handle irreducible regions in a manner similar to that described here. The improper region that they associate with an irreducible region IR is the smallest single-entry, strongly connected region containing IR. Thus, the improper region they identify is always strongly connected and is at least as large as ours. Our more fine-grained interval decomposition is advantageous in the incremental context.

The call-graph transitive-closure algorithm based on Tarjan’s algorithm for determining strongly connected regions (see Section 9.1.2) does not require reducibility, as it is unaffected by whether a strongly connected region is single- or multientry.

12. SUMMARY OF THE MOD COMPUTATION

We now summarize our exhaustive algorithm for computing MOD for each call site of a program:

(1) Construct the call graph.
(2) Compute ALIAS(P) for each procedure P (Section 10.1).
(3) Compute FORMAL- and GLOBAL-DIRECTMOD sets by scanning each procedure, accommodating procedure nesting as described in Section 10.2 (Figure 17).
(4) Compute formal bound sets and reaching-procedure sets by interval data-flow analysis.
(5) Compute PROCMOD sets from DIRECTMOD sets:
    -Compute GLOBAL-PROCMOD sets from GLOBAL-DIRECTMOD and reaching-procedure sets (Figure 18).



    for each procedure P
        for each formal or global V modified locally within P
            if V ∈ FORMAL(P) then
                begin
                    add V to FORMAL-DIRECTMOD(P)
                    for each W such that (V,W) ∈ ALIAS(P)
                        if W ∈ FORMAL(PROC) (for some PROC containing P)
                            then add W to FORMAL-DIRECTMOD(PROC)
                        endif
                    endfor
                end
            else /* V is global */
                begin
                    add V to GLOBAL-DIRECTMOD(P)
                    if V ∈ FORMAL(PROC) (for some PROC containing P)
                        then add V to FORMAL-DIRECTMOD(PROC)
                    endif
                end
            endif
        endfor
    endfor

Fig. 17. FORMAL- and GLOBAL-DIRECTMOD computations.
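The DIRECTMOD scan of Figure 17 might be transcribed into runnable form roughly as follows. This is a sketch under stated assumptions, not the author's implementation: variable names are taken to be unique across the program, lexical nesting is supplied as a hypothetical `parent` map, ALIAS(P) is supplied as a set of name pairs, and the function name `direct_mod` is invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def direct_mod(
    modified: Dict[str, Iterable[str]],       # procedure -> variables modified locally
    formals: Dict[str, Set[str]],             # procedure -> FORMAL(P)
    parent: Dict[str, Optional[str]],         # procedure -> lexically enclosing procedure
    alias: Dict[str, Set[Tuple[str, str]]],   # procedure -> ALIAS(P) as unordered pairs
):
    formal_dm = defaultdict(set)
    global_dm = defaultdict(set)

    def credit_outer(v: str, p: str) -> None:
        # Add v to FORMAL-DIRECTMOD of every enclosing procedure that declares it.
        outer = parent.get(p)
        while outer is not None:
            if v in formals.get(outer, set()):
                formal_dm[outer].add(v)
            outer = parent.get(outer)

    for p, variables in modified.items():
        for v in variables:
            if v in formals.get(p, set()):
                formal_dm[p].add(v)
                # Aliases of a modified formal may be formals of an outer procedure.
                for a, b in alias.get(p, set()):
                    if v in (a, b):
                        credit_outer(b if a == v else a, p)
            else:  # v is global to p
                global_dm[p].add(v)
                credit_outer(v, p)
    return dict(formal_dm), dict(global_dm)


# Q is nested in P; Q modifies its formal B (aliased with P's formal A) and global G.
formal_sets, global_sets = direct_mod(
    modified={"Q": ["B", "G"]},
    formals={"P": {"A"}, "Q": {"B"}},
    parent={"Q": "P", "P": None},
    alias={"Q": {("B", "A")}},
)
```

In this small case, A lands in the FORMAL-DIRECTMOD set of P purely through the alias with B, which is the nesting adjustment the figure describes.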

    for each procedure P
        for each global X ∈ DIRECTMOD(P)
            global-closure(X,P)
        endfor
    endfor

    procedure global-closure(X,P)
        for each procedure Q ∈ RP(X,P)
            if X ∈ GLOBAL(Q) then add X to GLOBAL-PROCMOD(Q)
        endfor
    end global-closure

Fig. 18. GLOBAL-PROCMOD computation.
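A minimal sketch of the same closure in executable form is given below. It assumes the reaching-procedure sets are already available as a dictionary keyed by (variable, procedure) and that each set RP(X, P) contains P itself; the function name `global_procmod` and the data layout are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def global_procmod(
    global_directmod: Dict[str, Set[str]],   # procedure -> GLOBAL-DIRECTMOD(P)
    rp: Dict[Tuple[str, str], Set[str]],     # (X, P) -> RP(X, P), assumed to contain P
    globals_of: Dict[str, Set[str]],         # procedure -> GLOBAL(Q)
) -> Dict[str, Set[str]]:
    procmod = defaultdict(set)
    for p, variables in global_directmod.items():
        for x in variables:
            # global-closure(X, P): credit every reaching procedure that sees X.
            for q in rp.get((x, p), set()):
                if x in globals_of.get(q, set()):
                    procmod[q].add(x)
    return dict(procmod)


# R modifies Y directly; R, Q, and P reach R with respect to Y, but Y is not global to P.
result = global_procmod(
    global_directmod={"R": {"Y"}},
    rp={("Y", "R"): {"R", "Q", "P"}},
    globals_of={"R": {"Y", "Z"}, "Q": {"Y"}, "P": set()},
)
```

The membership test on `globals_of` is what keeps Y out of the set for P in this example.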

    -Compute FORMAL-PROCMOD sets from FORMAL-DIRECTMOD and formal bound sets (Figure 19).
    The above computations of GLOBAL-PROCMOD and FORMAL-PROCMOD are independent of each other and thus can be performed in either order.
(6) Augment GLOBAL-PROCMOD sets to accommodate aliasing as indicated in Figure 16 of Section 10.2.
(7) Compute SITEMOD for each call site s from the PROCMOD sets of the called procedure Q:
    X ∈ SITEMOD(s) iff X ∈ GLOBAL-PROCMOD(Q) or X is passed-by-reference to A and A ∈ FORMAL-PROCMOD(Q).

    for each procedure P
        for each formal A ∈ DIRECTMOD(P)
            formal-closure(A)
        endfor
    endfor

    procedure formal-closure(A)
        for each formal B ∈ BOUND(A) [where B ∈ FORMAL(Q)]
            add B to FORMAL-PROCMOD(Q)
        endfor
    end formal-closure

Fig. 19. FORMAL-PROCMOD computation.
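The formal closure of Figure 19 and the SITEMOD membership rule of step (7) can be sketched together as follows. The sketch assumes that the bound sets are precomputed, that each formal belongs to exactly one procedure (recorded in a hypothetical `owner` map), and that a modified formal should itself be credited even if BOUND does not list it. The function names and the call-site encoding are illustrative choices, not part of the paper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def formal_procmod(
    formal_directmod: Dict[str, Set[str]],   # procedure -> FORMAL-DIRECTMOD(P)
    bound: Dict[str, Set[str]],              # formal A -> BOUND(A)
    owner: Dict[str, str],                   # formal B -> procedure Q with B in FORMAL(Q)
) -> Dict[str, Set[str]]:
    procmod = defaultdict(set)
    for variables in formal_directmod.values():
        for a in variables:
            # formal-closure(A); A itself is included by assumption.
            for b in bound.get(a, set()) | {a}:
                procmod[owner[b]].add(b)
    return dict(procmod)


def site_mod(
    ref_args: List[Tuple[str, str]],         # (actual X, formal A) reference bindings
    callee: str,
    global_pm: Dict[str, Set[str]],
    formal_pm: Dict[str, Set[str]],
) -> Set[str]:
    """X is in SITEMOD(s) iff X is in GLOBAL-PROCMOD(Q), or X is passed to a modified formal."""
    result = set(global_pm.get(callee, set()))
    for actual, formal in ref_args:
        if formal in formal_pm.get(callee, set()):
            result.add(actual)
    return result


# R modifies its formal A; formal B of Q is bound to A.
fpm = formal_procmod({"R": {"A"}}, {"A": {"B"}}, {"A": "R", "B": "Q"})
mods = site_mod([("X", "A")], "R", {"R": {"Y"}}, fpm)
```

Here the call site passing X to A yields both Y (a modified global of R) and X (passed to the modified formal A).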

(8) Compute MOD for each call site s by factoring the alias relations holding in the calling procedure into SITEMOD(s), as indicated in Section 10.2.

13. COMPLEXITY OF THE EXHAUSTIVE ALGORITHM

First we consider the time complexity of interval analysis in general. Then we consider the time complexity of the exhaustive algorithm presented in Sections 8-13 for determining MOD and REF information for the call sites of a program.

13.1 Complexity of Interval Analysis

13.1.1 Reducible Graphs. The interval-construction phase, for a reducible graph G with e edges, has a time bound O(e α(e)) [58, 60], where α is the functional inverse of Ackermann’s function. We now bound the number of data-flow operations performed by interval analysis. First we consider the number of virtual edges generated by the elimination phase of interval analysis. Since only the exit edges of an interval are duplicated and each of these can be duplicated at most a number of times equal to the deepest nesting d of intervals, the number of virtual edges is at most d e. Let e# denote the total number of graph and virtual edges. We have shown that e# is O(d e).

In the elimination phase, the operations performed are the meet and composition of pairs of functions, and the transitive closure of functions. Each edge participates in a function meet at most once, so the number of meets performed is bounded by e#. The number of transitive-closure operations is equal to the number of intervals and thus is bounded by the number of nodes n in the graph. For each exit edge of an interval, a function composition is performed twice. Thus, this operation is performed O(d e) times. The total number of operations performed in the elimination phase, then, is O(d e).

In the propagation phase, a function is applied once for each edge of G and once for each interval, and so the number of function applications is bounded by e + n. Each function application is accompanied by a lattice meet. The number of operations performed during the propagation phase, then, is O(e), and the total number of operations performed by interval data-flow analysis is O(d e). The cost of these operations depends on the application.


13.1.2 Irreducible Graphs. In the irreducible case, the time bound of the interval-construction algorithm of [58] is increased by a factor of r, where r is the number of interval and improper regions of G. The data-flow analysis has the time bound of the iterative algorithm used to process irreducible regions. In iterative data-flow analysis, each edge visit involves a function application. For rapid problems, the bound on the number of edge visits for iterative analysis is O(d' e), where d' is the loop-connectedness parameter, the largest number of back edges on any cycle-free path of G [37].28 However, neither the SITEMOD nor the potential-alias computation is rapid; in the ensuing section, we consider the cost of iterative analysis of these problems.

13.2 Complexity of the Application

We consider the time cost of our exhaustive algorithms for interprocedural analysis. Except where otherwise stated, n denotes the number of nodes and e the number of edges in the call graph. The most costly components of the SITEMOD computation are the formal bound set and reaching-procedure computations. We also consider the cost of the potential-alias computation.

First we consider the cost of the formal bound set computation. Where the call graph is reducible, the number of data-flow operations performed is O(d e). Function meet is performed by a combine operation that requires fp set-union operations, where fp is the number of formal parameters of the relevant procedure. The size of the sets being unioned is at most the number of formal parameters in the relevant interval. In the propagation phase, each function application and the accompanying lattice meet are accomplished by p union operations, where p is the number of formal parameters in the called procedure. The size of the sets being unioned is at most the number of formal parameters in the program. These set-union operations can be performed in time linear in the size of the sets with respect to a sparse representation (Section 15.1). Function composition requires a bit matrix composition of an l-by-m matrix with an m-by-n matrix, where l, m, and n each represent the number of formal parameters of a relevant procedure.29 The matrices will generally be sparse; the sparse-matrix composition algorithm described in Section 15.1 performs in time dependent on the number of matrix entries (see that section for a more detailed analysis). The closure computation for an interval requires performing the transitive closure of an n-by-m matrix, where n is the number of formal parameters of the header procedure and m is the number of formal parameters in the interval. The sparse-matrix transitive-closure computation of Section 15.1 performs in time dependent on the number of entries in the matrix (see that section for analysis).

The bound on the number of call-graph edge visits for an iterative analysis of the formal bound sets of an irreducible region is O(d' e fpmax), where d' represents the region’s loop-connectedness parameter, e its number of edges, and fpmax the largest number of formal parameters for any procedure in the region. Each edge visit involves a function application. A function application requires p set-union operations, where p is the number of formal parameters in the called procedure and where the size of the sets being unioned is at most the number of formal parameters in the region.

We now consider the cost of the reaching-procedure computation. Where the graph is reducible, the number of data-flow operations performed is O(d e). Function meet requires g set-union operations, where g is the number of variables global to the relevant procedure. The size of the sets being unioned is at most the number of procedures in the relevant interval. Function composition has the same cost as function meet. The computation is fast, so no closure operations are performed. In the propagation phase, each function application and its accompanying lattice meet require g union operations, where g is the number of variables global to the relevant procedure and where the size of the sets being unioned is at most the number of procedures in the program.

There is no basis for assuming that the reaching-procedure sets will generally be sparse. However, the maximum size of the sets is limited by the number of procedures in an interval for some operations and the number of procedures in the program for others. A bit-vector representation, allowing a bit-vector union, would generally be the most suitable for the reaching-procedure sets. The number of variables g global to a procedure can be quite large. However, the reaching-procedure computation can consider the external variables of the program separately from other global variables and solve for them in an efficient manner (see Section 9.1.2). This would, in general, significantly diminish the number of global variables included in the reaching-procedure computation.

For an irreducible region of e edges, the reaching-procedure computation is rapid, so the bound on the number of call-graph edge visits is O(d' e). Each edge visit involves a function application. A function application requires g set-union operations, where g is the number of variables global to the called procedure and where the size of the sets being unioned is at most the number of procedures in the region.

The time cost of the potential-alias computation, using interval analysis, is the same as for the formal bound computation, except the matrices are larger (Section 10.1). Once again, the sparse-matrix representation and the sparse-matrix operations described in Section 15.1 should be utilized. The time cost of the potential-alias computation developed in [25] is O(n^2 + n e), based on certain assumptions about the growth of problem parameters with respect to the length of a program.

28 For irreducible graphs, d' is dependent on which dfst is chosen.
29 As described in Section 8, function composition is accomplished by a lozenge operation that includes a relational composition and a combine operation applied to relations with disjoint domains. In this context, the combine operation is performed implicitly by our representation (i.e., has zero cost).

13.3 An Asymptotically Faster Elimination Algorithm

Empirical studies of the depth of control flow graphs have shown that d is small, rarely exceeding 7 [53]. The author is not aware of any empirical studies of program call-graph structures for a language that supports recursion, but deep nesting of these graphs would seem to be unusual. For graphs structured like the “shell graph” of [40], the depth of the graph grows proportionally with the number of nodes. The number of operations performed by interval analysis for such a graph of n nodes is O(n e). A shell graph seems an unlikely structure for a call graph, but it should be noted that there are elimination data-flow algorithms that are asymptotically superior to interval analysis. Tarjan [62] has formulated an elimination data-flow algorithm that maintains an auxiliary forest data structure; its running time is dominated by the forest manipulation operations. By a sophisticated off-line method that preprocesses the entire sequence of forest manipulation operations, a bound of O(e α(e, n)) data-flow operations can be achieved.30 Cooper and Kennedy [23] apply this algorithm to the bound set computation and analyze the resulting complexity. Tarjan’s method, which applies path compression to balanced trees, has proved too complicated to be practically applied to data-flow problems.31 Tarjan’s recommendation for practical applications is a simpler method (but still more complicated than interval analysis) that uses path compression without maintaining balanced trees to obtain an O(e log n) bound on the number of data-flow operations [62].

The difficulty with Tarjan’s path compression technique in our context is that it prevents his algorithm from providing a practical basis for an incremental algorithm. Given a change to the propagation function associated with an edge, updating the solution would require redoing any calculation dependent on that edge function that had been performed after the edge’s elimination. An incremental data-flow algorithm would have to “undo” compression by “backing up” to the point at which the edge had been eliminated. The algorithm could only propagate a change incrementally, then, if the entire sequence of path compressions, along with the partial solution at each point, were maintained.

Ryder and Paull [53, 56] formulate incremental versions of Allen-Cocke interval analysis and of Hecht and Ullman’s asymptotically superior (O(e log e)) data-flow algorithm that is based on balanced 2-3 trees [33, 63]. They find that the incremental version of the Hecht-Ullman algorithm is unduly complicated and has a worse time bound than its exhaustive version.
They too observe that an incremental algorithm based on an auxiliary data structure is hindered by having to save sufficient information about the structure to re-create its intermediate stages during elimination. Rosen [50] has also observed that traditional exhaustive computational complexity bounds are poor predictors of incremental performance. In the ensuing section, we consider interval analysis as a basis for an incremental data-flow algorithm that accommodates a large class of edge insertions and deletions along with any changes to the propagation functions associated with edges.

14. INCREMENTAL COMPUTATION OF MOD

Since our interprocedural analysis algorithm is intended for use in a program development environment, it is important to develop an incremental version of it. An incremental determination of altered interprocedural facts could provide an efficient basis for incorporating interprocedural information in the optimization of an altered procedure and for limiting the recompilation of other (unaltered) procedures.

30 The number of data-flow operations performed is proportional to the number of forest operations.
31 An application of path compression replaces a path from a leaf node to the root of its tree by a single edge that is associated with the composition of the functions along that path.

We distinguish among three kinds of program changes:

(1) A Type A change alters DIRECTMOD and/or DIRECTREF information.
(2) A Type B change adds and/or deletes reference arguments at a call site.
(3) A Type C change adds or deletes a call site.

14.1 Type A Changes

For a set of changes that are all of Type A, binding relationships are unaffected. There is no need to repeat data-flow analysis, then, so long as the bound sets and reaching-procedure sets from the previous solution are still available. Only the DIRECTMOD and DIRECTREF sets have been altered. To compute the updated PROCMOD (PROCREF) sets, the new DIRECTMOD (DIRECTREF) sets are applied to the bound sets and reaching-procedure sets.32 Interprocedural aliasing patterns cannot be altered by a Type A modification.

Cooper and Kennedy [22] update MOD in this manner for Type A changes to FORMAL-DIRECTMOD sets. Where A is added to DIRECTMOD(P), any formal parameter Q.B in BOUND(A) must be included in PROCMOD(Q). Where A is deleted from DIRECTMOD(P), an element Q.B of BOUND(A) remains in PROCMOD(Q) iff some formal R.C to which it is bound belongs to DIRECTMOD(R). Not only the bound set of a formal but also the set of formals to which it is bound (its bound-to set) must be available. Bound-to sets are represented by the rows of FP-BOUND-CLOSURE.

A change to FORMAL-PROCMOD can impact the factoring in of aliasing information to GLOBAL-PROCMOD sets. If a formal B that participates in an introduced alias that meets the conditions described in Section 10.2 is added to FORMAL-PROCMOD, then this alias is to be processed as described in Figure 16 of Section 10.2. Where formals that participate in such an alias are deleted from FORMAL-PROCMOD, it is easiest to simply factor in alias relations “from scratch.” In this case step 6 of the exhaustive algorithm (Section 12) would be repeated.
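The Type A update rule for formal parameters can be illustrated with a small Python sketch. It is not the paper's implementation: the function names, the use of Python sets in place of bit matrices, and the flattening of the per-procedure DIRECTMOD and PROCMOD sets into single sets of qualified formal names such as "Q.B" are all assumptions made for illustration. Here bound[a] stands for BOUND(a), assumed reflexive, and bound_to[q] stands for the bound-to set of q.

```python
def invert(bound):
    """Derive bound-to sets (closure rows) from bound sets (closure columns)."""
    bound_to = {formal: set() for formal in bound}
    for target, sources in bound.items():
        for source in sources:
            bound_to.setdefault(source, set()).add(target)
    return bound_to


def add_to_directmod(a, direct_mod, proc_mod, bound):
    """Formal a is now directly modified: every formal bound to a is modified."""
    direct_mod.add(a)
    proc_mod |= bound[a]  # bound[a] contains a itself (reflexive closure)


def delete_from_directmod(a, direct_mod, proc_mod, bound, bound_to):
    """Formal a is no longer directly modified: recheck each formal bound to a."""
    direct_mod.discard(a)
    for q in bound[a]:
        # q stays in PROCMOD iff some formal it is bound to is still
        # directly modified.
        if not (bound_to[q] & direct_mod):
            proc_mod.discard(q)
```

No data-flow analysis is repeated in either direction; the deletion case is the one that needs the bound-to sets, which is why both the columns and the rows of the closure must be retained.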
Given our division of the GLOBAL-PROCMOD computation into the GLOBAL-DIRECTMOD computation and the reaching-procedure computation, the above approach can also be applied to Type A changes to GLOBAL-DIRECTMOD sets. Computing the impact of deletions from GLOBAL-DIRECTMOD on GLOBAL-PROCMOD requires that reached-procedure sets be maintained. Where X is global to procedure P, Q ∈ REACHED-PROC(X, P) iff there is some call chain from P to Q along which X is bound to itself. This is analogous to the need for bound-to sets in computing the impact of deletions from FORMAL-DIRECTMOD. Just as with bound and bound-to sets, reaching- and reached-procedure sets can be represented by a single matrix, where each column represents RP(X, P) for some procedure P and where each row represents REACHED-PROC(X, Q) for some procedure Q.

32 Where nested procedures are supported, a change to DIRECTMOD can result in changes to FORMAL-DIRECTMOD for a containing procedure (step 3 of the exhaustive algorithm). In the case of an addition to DIRECTMOD, any such additions are easily (incrementally) computed in accordance with step 3. In the case of a deletion from DIRECTMOD, if there are any possible resulting deletions from FORMAL-DIRECTMOD for a containing procedure, it is simplest to recompute FORMAL-DIRECTMOD for such a procedure.

14.2 Type B Changes

A change to the reference arguments at a call site possibly alters the associated data-flow functions that represent its formal and global bindings. The impact of this change on the binding solutions throughout the program is to be determined.

Cooper and Kennedy address Type B changes with respect to the addition or deletion of formals as reference arguments at a call site. Rather than incrementally solving the formal bound set computation as a data-flow problem over the call graph, they use FP-BOUND-CLOSURE (which originally is determined by exhaustive data-flow analysis) as a basis for incremental updates. FP-BOUND-CLOSURE is the reflexive, transitive closure of the matrix FP-BOUND that represents the immediate formal parameter bindings of the program. That is, FP-BOUND(i, j) = ‘1’ iff formal parameter i is passed to formal parameter j by some call site of the program. Thus, maintaining FP-BOUND-CLOSURE as immediate formal parameter bindings are altered is equivalent to updating the reflexive, transitive closure of a bit matrix as changes occur to it.

Where a binding is added to FP-BOUND, updating FP-BOUND-CLOSURE is straightforward. Determining the impact of a deletion is more difficult. Cooper and Kennedy suggest that deletions be handled by removing the deleted immediate bindings from FP-BOUND-CLOSURE and then by applying a version of the incremental iterative data-flow algorithm described in [20] to determine the other bindings that should be removed.33 Their incremental iterative algorithm, however, in restarting at the previous solution, arrives at a fixed point without necessarily discovering all deleted bindings [16, 57]: The solution arrived at is not always the desired (maximum fixed-point) solution as computed by the exhaustive iterative algorithm.34 Where immediate bindings are deleted at a call site, this approach could thus result in more conservative MOD information than an exhaustive analysis would produce.

Ryder and Paull, using interval analysis as formulated by Allen and Cocke as a basis, have developed an incremental version of interval analysis [52, 56] that can be applied to Type B changes.35 The applications considered by Ryder and Paull are intraprocedural data-flow problems, so the graph under consideration is an intraprocedural control flow graph. They use the Allen and Cocke definition of an interval, where it is not required that the interval be a strongly connected region. The data-flow analysis model on which Ryder and Paull base their algorithm represents data-flow propagation information by a system of linear equations rather than by edge functions. Despite these differences, the basic approach of the Ryder and Paull incremental interval analysis algorithm can be employed here. We develop an incremental version of our own formulation of interval analysis in a manner analogous to the Ryder and Paull incrementalization of Allen and Cocke interval analysis and then apply our algorithm to Type B changes.

33 It would also be determined which of the deleted immediate bindings still belong to FP-BOUND-CLOSURE.
34 Since this application is a continuous data-flow framework (Appendix A.1), the maximum fixed-point solution is identical to the meet-over-all-paths solution.
35 Applying their algorithm requires the formulation of a “loop-breaking rule” that would accommodate the nonfastness of this application.

14.2.1 Incremental Interval Analysis. Here we formulate an incremental version of our interval analysis algorithm. It can be applied to the same domain of problems as the exhaustive version (Appendix A). In particular, we apply it to the formal bound set computation. The summary information associated with an interval I consists of the closure function f*_I and, for each exit node n, the entrance-to-exit function f_(h..n). The incremental algorithm requires as input the flow graph and its interval structure, the solution previously holding at each node, and the previous summary information for each interval. The set of edges whose functions have been altered and the edge function that is now associated with each edge are also required inputs.

The incremental interval analysis algorithm, like the exhaustive algorithm, proceeds in two phases: elimination and propagation. The elimination phase, as before, produces summary information for each interval. Those intervals immediately containing edges whose functions have been altered are processed in an inner-to-outer order (either postorder or reverse preorder). The summary information for each such interval I is recomputed. The processing of I utilizes information from the previous solution by using the previous summary information for the unchanged subintervals of I, rather than recomputing it. If f*_I has been altered, then the header of I is added to a worklist that drives the propagation phase. If, for some exit node n, f_(h..n) has been altered, then at least one of the virtual-edge functions for the immediately containing interval J may have been altered, so summary information for J must also be recomputed. (If J also immediately contains an altered edge function, the new function is, of course, used in computing its new summary information.)
In this manner, the elimination phase processes the intervals containing the change outward until either the summary information associated with the exit nodes of a containing interval is unchanged or there are no further containing intervals.

The propagation phase then applies the new edge and closure functions as in the exhaustive algorithm, but only at nodes that are potentially affected by the program changes. A worklist of these potentially affected nodes is maintained in interval order. The worklist has been initialized by the elimination phase, during which any node that is the target of an edge (excluding virtual edges) with an altered edge function and any header of an interval with an altered closure function have been added. The propagation phase consists of a loop that repeatedly deletes the first element from the worklist and processes it, until the worklist is empty. Processing a node involves the same processing as in the propagation phase of the exhaustive algorithm, but in addition, a comparison is made between the new solution and the previous one. Where the solution at the node has changed, the successors of the node that also succeed it in interval order are added to the worklist (which must remain sorted in interval order to guarantee an interval-order processing of nodes). Only nodes at which the solution has potentially been altered are processed, and (as in the exhaustive algorithm) no node is processed more than once.
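The control structure of the two phases can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the paper's code: the names eliminate and propagate, the parent and depth maps describing interval nesting, and the callbacks recompute_summary and recompute are hypothetical stand-ins for the problem-specific function computations. The callback recompute_summary(I) is assumed to return a pair of flags saying whether the closure function of I and any of its entrance-to-exit functions changed.

```python
import heapq


def eliminate(altered, parent, depth, recompute_summary):
    """Recompute interval summaries from the altered intervals outward.

    altered: intervals that immediately contain an altered edge function.
    parent:  interval -> immediately containing interval, or None.
    depth:   interval -> nesting depth (outermost interval has depth 0).
    Returns the intervals visited and those whose closure function changed.
    """
    pending = set(altered)
    visited, closure_changed_in = [], []
    while pending:
        # Innermost pending interval first, giving an inner-to-outer order.
        interval = max(pending, key=lambda i: depth[i])
        pending.remove(interval)
        closure_changed, exits_changed = recompute_summary(interval)
        visited.append(interval)
        if closure_changed:
            closure_changed_in.append(interval)  # its header seeds the worklist
        if exits_changed and parent[interval] is not None:
            pending.add(parent[interval])  # a virtual edge of the parent changed
    return visited, closure_changed_in


def propagate(seeds, order, succs, recompute, solution):
    """Process potentially affected nodes once each, in interval order."""
    heap = [(order[n], n) for n in set(seeds)]
    heapq.heapify(heap)
    queued = set(seeds)
    processed = []
    while heap:
        _, node = heapq.heappop(heap)
        queued.discard(node)
        new_value = recompute(node, solution)
        processed.append(node)
        if new_value != solution[node]:
            solution[node] = new_value
            for succ in succs[node]:
                # Only successors that also follow the node in interval order.
                if order[succ] > order[node] and succ not in queued:
                    queued.add(succ)
                    heapq.heappush(heap, (order[succ], succ))
    return processed
```

Because a node only ever enqueues nodes later in interval order, and the heap always yields the earliest queued node, no node can be processed twice. The caller is expected to seed the worklist with the headers of the intervals returned by eliminate together with the targets of the altered (non-virtual) edges.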

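Fig. 20. Previous summary information. (a) Descriptor of the closure function: a matrix with rows U1, U2, U3, V1, V2 and columns U1, U2, U3 [entries not legible]. (b) Descriptor of the entrance-to-exit function: with the same row order, column V1 = (0, 0, 0, 1, 0) and column V2 = (1, 0, 0, 0, 1).

Fig. 21. Updated summary information. (a) Descriptor of the closure function: rows U1, U2, U3, V1, V2 and columns U1, U2, U3 [entries not legible]. (b) Descriptor of the entrance-to-exit function: column V1 = (0, 0, 0, 1, 0) and column V2 = (1, 0, 0, 0, 1).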

The number of data-flow operations required by this general incremental algorithm is on the order of the number of edges (including virtual edges) immediately contained in affected intervals plus the number of edges immediately contained in at most one unaffected outer interval.

14.2.2 Application to Incremental Formal Bound Set Computation. We now apply our general incremental algorithm for Type B changes to the formal bound set computation. The summary information for each interval I is represented by the descriptors for the closure function f*_I and for the entrance-to-exit function f_(h..n) for each exit node n. Consider the interval (U, V) of the example of Section 8. The descriptor for f*_(U, V) is as in Figure 20a. The only exit node of the interval is V. The descriptor for f_(U..V) is as in Figure 20b. Suppose that the invocation of U in V were altered from

CALL U(-, -, V2)

to the following:

CALL U(-, V1, V2)

The modification, then, is that V1 is now passed to U2. This addition results in the summary information shown in Figure 21. The only change to summary information is the addition of the binding of V1 to U2 to the closure. Due to this addition, the node U is placed on the worklist for the propagation phase. Because the entrance-to-exit summary information is unaltered, there is no need to recompute summary information for the containing interval, and the elimination phase has been completed.

The incremental propagation phase can be optimized when the only changes to binding information are additions. In recomputing the solution at a node, only the new bindings from its (altered) predecessors need be propagated to it, as the others are already present.36 In this case the only node on the worklist is U (no new bindings are propagated to it from any of the nodes preceding it in interval order). Applying the updated closure function associated with U results in the addition of V1 to the formal bound set of U2. The node V is then placed on the worklist. Since U2 is not bound to any formal of V by the call to V, this new binding at U does not result in any new bindings to the solution at V, and the propagation phase has been completed.

36 In an analogous manner, the incremental elimination phase can be optimized to evaluate separately the functions for new bindings and then to unite them with the previous functions. This approach would require, however, the availability of additional previous information with respect to an interval: its internal closure and its forward entrance-to-exit functions.

Fig. 22. Updated summary information. (a) Descriptor of the closure function: rows U1, U2, U3, V1, V2 and columns U1, U2, U3 [entries not legible]. (b) Descriptor of the entrance-to-exit function: with the same row order, column V1 = (0, 0, 0, 1, 0) and column V2 = (1, 0, 0, 1, 1).

Fig. 23. Previous and updated summary information, with rows T1, T2, U1, U2, U3, V1, V2. (a) Previous: column T1 = (1, 0, 0, 0, 0, 0, 0) and column T2 = (0, 1, 1, 0, 0, 0, 1). (b) Updated: column T1 = (1, 0, 0, 0, 0, 0, 0) and column T2 = (0, 1, 1, 0, 0, 1, 1).

Suppose that instead of the above modification the invocation of U in V is altered as follows:

CALL U(V1, -, V2)

A new binding (of V1 to U1) has been created. The resulting summary information for interval (U, V] is shown in Figure 22. In this case, then, not only the closure but also entrance-to-exit summary information has been altered. In addition to placing U on the worklist for the propagation phase, the elimination phase must recompute summary information for the containing interval (T, [U, V]). This interval does not have any exit edges, so its summary information consists solely of its closure function. The descriptor of this function from the previous solution is as given in Figure 23a. For an outermost interval, previous summary information is immediately available from the previous FP-BOUND-CLOSURE. The summary information for (T, [U, V]) is now computed as shown in Figure 23b. Since the closure contains the additional binding of V1 to T2, the node T is now placed on the worklist (prior to U, since it precedes U in interval order). The propagation phase then commences. The application of the new closure function at T propagates V1 to BOUND(T2). In this case node U has a predecessor that has been altered. Prior to the application of its closure function, then, V1 is propagated to BOUND(U1). Applying the closure also propagates V1 to U1. V1 is then propagated to BOUND(V2).

Even where the elimination and propagation phases need only examine a single interval, new bindings may be propagated to nodes of this interval throughout the graph. For example, suppose that the original call site were altered to the following:

CALL U(-, V2, V2)

The binding of V2 to U2, then, has been added. This results in the addition to f*_(U, V] of the binding of U1 to U2. During the propagation phase, applying the closure at U now propagates U1 and its bound set (which includes most formals in the program: see Figure 6) to BOUND(U2).

We now consider an example in which a binding is deleted. Suppose that the call to P in procedure S were altered to the following:

CALL P(-, -, S3)

Thus, the binding of S1 to P2 has been deleted. As a result, P1, Q1, and S1 are no longer bound to P2 by the internal closure of (P, Q, R, S] and are no longer bound to P3 by the closure of (P, Q, R, S]. The previous closure for this interval is given by the submatrix consisting of the first 10 rows and first 10 columns of FP-BOUND-CLOSURE. The closure is now as shown in Figure 24. The impact on the closure is that S1 is no longer bound to any (other) formal, Q1 is bound only to S1, and P1 is bound only to Q1 and S1. The propagation phase would in this case examine every node in the call graph, as they all have been altered. The alterations are with respect to the bound-to sets for P1, Q1, and S1, which are now as described above.

To simplify the exposition, the above examples have all been of a single addition or a single deletion. Our incremental algorithm can update with respect to any combination or number of Type B changes; for example, it can accommodate the above change to the call to P in combination with any of the above changes to the call to U.

With respect to the application of our incremental algorithm to the formal bound set computation, we have observed that

- the required summary information, for outermost intervals, is present in the previous FP-BOUND-CLOSURE solutions; and
- the algorithm can be optimized when all changes are additions.

The updating of reaching-procedure sets in response to Type B changes to global bindings can be performed in the same manner as we have described for formal parameter bindings.
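Section 14.2 notes that updating FP-BOUND-CLOSURE is straightforward when a binding is added. One standard way to do this, shown here as a hedged Python sketch rather than as the paper's method, is to connect every formal already bound to the source with every formal the target is already bound to. The function name add_binding and the dictionary-of-sets representation (closure[i] is the bound-to set of formal i, assumed reflexive and transitively closed) are assumptions made for illustration; the formal names in the usage note are likewise hypothetical.

```python
def add_binding(closure, i, j):
    """Add the immediate binding i -> j to a reflexive, transitive closure.

    closure maps each formal to the set of formals it is bound to
    (a row of the closure matrix). Returns the set of newly added pairs.
    """
    if j in closure[i]:
        return set()  # already implied by existing bindings
    # Every formal bound to i (including i itself, by reflexivity) ...
    sources = [x for x in closure if i in closure[x]]
    # ... becomes bound to everything j is bound to (including j itself).
    targets = set(closure[j])
    added = set()
    for x in sources:
        new_targets = targets - closure[x]
        closure[x] |= new_targets
        added |= {(x, y) for y in new_targets}
    return added
```

For example, with a bound to b and c bound to d, adding the binding b -> c yields the new pairs (a, c), (a, d), (b, c), and (b, d). Returning only the added pairs fits the optimization noted above for the case in which all changes are additions, since only new bindings then need to be propagated. Deletions cannot be handled this way, which is the case the incremental interval analysis addresses.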

14.3 Type C Changes

A Type C change alters the structure of the call graph by adding or removing a procedure call. The Ryder-Paull algorithm does not accommodate changes to the structure of the graph being analyzed. The basic incremental interval analysis technique of determining the impact of a change by analyzing containing intervals outward as far as necessary and then propagating new information forward where necessary, however, can be applied to most edge insertions and deletions. We now apply our formulation of incremental interval analysis to Type C changes. We simplify this section by not considering edge changes that render the call graph irreducible. However, it is not difficult to update the region structure of the call graph in these cases, using techniques similar to those described below for updating interval regions. In Section 14.8 we describe the processing of improper regions by our incremental data-flow analysis.

Fig. 24. Updated closure. [10 x 10 matrix over the formals P1, P2, P3, Q1, Q2, Q3, R1, S1, S2, S3; entries not legible]

Our accommodation of Type C changes requires a representation of the interval nesting structure of the graph. The interval tree T’ (termed the interval dependency tree in [55]) corresponding to a reducible graph G represents the interval nesting of G. The nodes of T’ are the nodes of G that are not singleton nodes; the edges reflect the interval structure of G. The parent of node n in T’ is the header of the interval immediately containing n in G. Each header node is the parent in T’ of the nodes immediately contained in the interval that it heads.

An inserted edge falls into one of the following categories, where ancestry and descendancy relationships refer to the dfst T:

(1) The target is a proper descendant of the source (a tree or forward edge has been inserted).
(2) The target is neither an ancestor nor a descendant of the source, and succeeds it in interval order (a cross edge has been inserted).
(3) The target is an ancestor of the source (a back edge has been inserted).
(4) The target is neither an ancestor nor a descendant of the source, and precedes it in interval order.

For cases (1) and (2), the target node must be an immediate descendant of an ancestor of its source with respect to the interval tree T’, or the graph becomes irreducible [19]. For such an inserted edge, the call graph’s interval structure remains unaltered, and the bindings of the inserted edge can be processed in the same manner as the newly introduced bindings at an edge.

In case (3), a back edge is introduced. The back edge is processed as in the interval finding algorithm of [58]. By traversing predecessor chains from the source node, the strongly connected region determined by the back edge is identified. Either the target is a new header (case (a) below) and thus a new interval has been formed, or the target was already a header (case (b)).

(a) A new interval has been formed. Its summary information is computed (using the previous summary information for any subintervals it may contain).


Summary information for its immediately containing interval is then computed, and so on outward as with Type B changes. Forward propagation then takes place also as described for Type B changes. (b) The interval associated with the target node of the introduced back edge may or may not have been enlarged (this can readily be determined during the predecessor-chain traversal). Where the interval has been enlarged, containing intervals that do not already contain the added subregion are enlarged to include it. In either case, summary information for the interval associated with the new back edge is determined. Any new bindings associated with the summary information are processed in the same manner as for Type B changes. Case (4) is problematic. Such an edge is similar to a cross edge, but runs in the “wrong” direction. The resulting graph may or may not be reducible. Adding the dashed edge to the graph of Figure 25a renders it irreducible. The graph of Figure 25b remains reducible with the addition of the dashed edge, but its dfst must be altered. The edge (P, R) becomes a forward edge, and the new edge (Q, R) is a tree edge. The reducible case here does not lend itself to the incremental treatment we have described, as the dfst must be rebuilt and the interval order of the nodes is altered. We now consider edge deletions. It is assumed that the deletion does not “disconnect” the graph (it remains a flow graph). The deletion of a cross or forward edge does not affect the graph’s interval structure and can be processed in essentially the same manner as the deletion of bindings at an edge. Where a back edge is removed, either it is the lone back edge of its target node or others remain. In the former case, the target no longer heads an interval. Summary information for containing intervals must be recomputed as far outward as necessary, followed by forward propagation where necessary. In the latter case, one of several back edges to a header has been removed. 
Each back edge defines a strongly connected region; the interval defined by a header consists of the union of the regions defined by its back edges. The interval that results from the removal of a back edge may readily be determined, given the back-edge-region correspondences that were computed as the original interval was constructed. The new interval region may be smaller. If so, it must be checked whether the containing region is now smaller also (this too may readily be determined), and so on as far outward as necessary. In any case summary information for the intervals containing the deleted edge must be recomputed as far outward as necessary, followed by forward propagation where necessary.

The deletion of a tree edge (u, w) may occur, even under the assumption that the call graph remains a flow graph. Where prior to the deletion the graph has multiple (u, w) edges, the deletion is not problematic. But, where the lone (u, w) edge is removed, the dfst, and thus the interval order of the nodes, is altered, and our incremental approach cannot be applied.

Fig. 25. Problematic edge-insertion examples. [Panels (a) and (b): graphs with the inserted edge shown dashed.]

14.4 Update MOD

For Type B and Type C changes, we have presented the updating of formal- and global-binding information. We must also consider the updating of PROCMOD and PROCREF sets in response to changes in binding information. Finally, we must also consider the updating of MOD (REF) information in response to changes to PROCMOD (PROCREF).

The impact of a change to bound set information on PROCMOD (PROCREF) sets can be determined in essentially the same manner as described for changes to DIRECTMOD (DIRECTREF) sets. Where a new binding of a formal P.A to a formal Q.B is introduced and where Q.B is in DIRECTMOD(Q), P.A now belongs to PROCMOD(P). Where an old binding of P.A to Q.B no longer holds and Q.B is in DIRECTMOD(Q), then the membership of P.A in PROCMOD(P) must be recomputed: P.A remains in PROCMOD(P) iff some formal R.C to which it is bound belongs to DIRECTMOD(R). Given changes to FORMAL-PROCMOD sets, the adjustment of GLOBAL-PROCMOD sets as required by certain introduced aliases must be performed as described in Section 14.1.

Updating MOD and REF information requires updating potential-alias information in addition to updating PROCMOD (PROCREF). A Type B or Type C change can affect interprocedural alias patterns. Incremental interval analysis could serve as the basis for an incremental algorithm for computing potential aliases. In the ensuing section, we develop an alternative algorithm for this computation. Having computed the changes to aliasing information, their impact on PROCMOD (PROCREF) must be determined.37 When an introduced alias is added and meets the conditions described in Section 10.2, it must be processed as described in Figure 16. When an introduced alias is deleted, it is easiest to factor alias relations into GLOBAL-PROCMOD (step 6 of the exhaustive algorithm) “from scratch.” Having computed the modifications to PROCMOD and potential-alias sets, the corresponding adjustments to MOD sets can be determined in an inexpensive and straightforward manner.

14.5 Incremental Alias Analysis

Cooper [20] and Ghodssi [30] have developed similar incremental alias analysis algorithms based on iterative data-flow analysis.
Both algorithms require modification, however, to accommodate deleted bindings properly in the presence of cycles in the call graph. Where a binding that has either generated or propagated an alias pair (A, B) in procedure P is deleted, their algorithms determine whether the alias still holds in P by a check of whether there remains a call to P that propagates or generates the alias. But such a check is insufficient in the presence of a cycle. Suppose the deleted edge had generated (A, B) in P, that a call in P to Q had propagated (A, B) to (X, Y) in Q, and that a call to P by Q had propagated the alias pair (X, Y) back to (A, B). Their algorithms will determine that the alias (A, B) still holds, since the call of P by Q apparently still propagates (X, Y) to (A, B) (the alias pair (X, Y) will also not be deleted).

We now propose a correction to the above difficulty. Cooper’s distinction between the introduction and propagation of aliases is useful here. For an interprocedural alias (X, Y), there exists at least one alias chain (X_1, Y_1), (X_2, Y_2), ..., (X_n = X, Y_n = Y), where n ≥ 1, such that (X_1, Y_1) is an introduced alias pair and (X_{i+1}, Y_{i+1}) is propagated by (X_i, Y_i), for 1 ≤ i ≤ n - 1. Where a call-graph edge is deleted, the alias (A, B) still holds only if there is at least one alias chain terminating in (A, B) such that no member of the chain is introduced or propagated by the deleted edge. A depth-first search for such a chain would start at the target node of the deleted edge and proceed in the reverse direction of the call graph. This search considers only those bindings relevant to the pair (A, B) and considers only the reverse of those paths that had previously produced the alias. Where it is determined that an alias pair has been deleted, any alias pair that it propagates must be placed on a worklist as a candidate for deletion and processed in the same manner as described above.

37 Where nested procedures are supported, a change to aliasing information in procedure P can result in changes to FORMAL-DIRECTMOD for a containing procedure (step 3 of the exhaustive algorithm). In the case of an added alias, any such additions are easily (incrementally) computed in accordance with step 3. In the case of a deleted alias, if there are any possible resulting deletions from FORMAL-DIRECTMOD for a containing procedure, it is simplest to recompute FORMAL-DIRECTMOD for such a procedure. An alteration of FORMAL-DIRECTMOD must be handled as described in Section 14.1.

14.6 Incremental Updating of External Variable Bindings

We have seen that the determination of reaching procedures for external variables is equivalent to the computation of the call graph’s transitive closure (Section 9.1.1). For a language without block structure, such as FORTRAN, all global variables are externals, and so the reaching-procedure computation for all globals reduces to computing the call graph’s transitive closure.
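To make the reduction concrete, the following Python sketch computes the transitive closure of a small call graph by a search from each procedure. It is only an illustration of the closure itself: the function name call_graph_closure and the adjacency-list input are assumptions, and it implements neither the sparse-matrix algorithm of Section 15 nor Zadeck's technique.

```python
def call_graph_closure(calls):
    """Transitive closure of a call graph given as caller -> list of callees.

    reach[p] is the set of procedures reachable from p by one or more calls.
    Reading the same relation by column (all p with q in reach[p]) gives
    the procedures from which q can be reached.
    """
    reach = {}
    for p in calls:
        seen = set()
        stack = list(calls[p])
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(calls.get(q, ()))
        reach[p] = seen
    return reach
```

The search does not depend on the call graph being reducible, and a procedure appears in its own set exactly when it lies on a cycle of calls.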
We have also observed that this computation satisfies the conditions specified by Zadeck [67] for application of his data-flow technique. As such, Zadeck’s incremental data-flow technique can be applied. As with his exhaustive technique, it is not required that the call graph be reducible.

14.7 A Related Incremental Algorithm for Type C Changes

Ryder and Carroll [54] adopt the MOD decomposition and the interval analysis framework as described here (and in [13]) for incrementally updating MOD, and address our difficulties with incrementally accommodating changes that alter the dfst. Their algorithm does not rely on the dfst as we do to define the interval order of nodes and to classify edge insertions and deletions. They classify each edge in the call graph according to the relation between its source and target nodes in the interval tree. In their interval tree, the children of a parent are ordered left-to-right in topological order. When a Type C change occurs, they use the call graph (as augmented by its virtual edges) to update the interval tree in terms of interval membership, interval nesting, and topological order. Their incremental data-flow analysis is then driven by the structure of the interval tree. Their algorithm incrementally accommodates all Type C changes that result in a reducible graph and detects when an irreducibility has been introduced.

14.8 Irreducible Graphs

Our general incremental algorithm processes improper regions in essentially the same manner as interval regions, with the only differences reflecting their different treatment by the exhaustive algorithm (Section 11). The elimination-phase processing of an improper region requires iterating through its nodes until a fixed point is reached in the computation of the function F(n) associated with each node n. Our incremental algorithm computes these functions “from scratch” in the recomputation of an improper region’s summary information (except for use of previous summary information for interval or improper subregions). In the propagation phase, our incremental algorithm (like the exhaustive algorithm) processes interval and improper regions in essentially the same manner.

15. IMPLEMENTATION CONSIDERATIONS

15.1 Sparse-Matrix Representation and Transitive-Closure Algorithm

In the course of the formal bound set computation, the operations of union, composition, and transitive closure are performed on bit matrices that represent the parameter-binding effects of edges and paths of the call graph. These matrices will generally be sparse, as a given formal parameter typically is either bound or bound-to only a few of the program’s formals. A sparse-matrix representation is desirable from the viewpoint of space efficiency, as well as time efficiency with respect to the operations of union, composition, and, especially, transitive closure. We assume here a standard representation of a sparse matrix, where each nonempty row and column is represented by an ordered, linked list of nodes. Each node corresponds to a “1” entry and contains fields for row and column numbers as well as for row and column links.

We now consider the space requirements for the formal bound set computation. In the elimination-phase processing of an interval $I$, all computed path functions can in practice be represented by an underlying bound set matrix of size $fp_I$ by $fp_I$, where $fp_I$ denotes the number of formal parameters belonging to the procedures in $I$. One can in practice use the single matrix FP-BOUND-CLOSURE to represent path functions during the elimination phase and bound set solutions during the propagation phase, since any binding that is determined with respect to a path in the elimination phase must also belong to the solution found by the propagation phase. (The elimination phase would then be regarded as initializing FP-BOUND-CLOSURE for the propagation phase.) For the exhaustive solution, then, only the single matrix FP-BOUND-CLOSURE is required. In the incremental context, however, summary information for each nonoutermost interval must be maintained, and so FP-BOUND-CLOSURE must be complemented by smaller matrices representing summary information for each such interval.
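A minimal sketch of such a representation, together with the union and composition operations, might look as follows. It is an illustration under stated assumptions rather than the implementation used here: Python lists stand in for the ordered linked lists, each “1” entry is recorded once in its row list and once in its column list instead of in a single node carrying both links, and the names `SparseBitMatrix`, `merge_ordered`, `union`, and `compose` are hypothetical. Union merges corresponding ordered lists, and composition pairs each nonempty column of the first matrix with the matching row of the second.

```python
class SparseBitMatrix:
    """Sparse bit matrix: only the "1" entries are stored."""

    def __init__(self, entries=()):
        self.rows = {}  # row index -> ordered list of column indices
        self.cols = {}  # column index -> ordered list of row indices
        unique = set(entries)
        for r, c in sorted(unique):
            self.rows.setdefault(r, []).append(c)
        for r, c in sorted(unique, key=lambda e: (e[1], e[0])):
            self.cols.setdefault(c, []).append(r)

    def entries(self):
        """Return all "1" entries as (row, column) pairs in row-major order."""
        return [(r, c) for r in sorted(self.rows) for c in self.rows[r]]


def merge_ordered(a, b):
    """Merge two ordered lists, dropping duplicates, in one linear pass."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            out.append(a[i])
            i += 1
        elif a[i] > b[j]:
            out.append(b[j])
            j += 1
        else:  # entry present in both lists: keep one copy
            out.append(a[i])
            i += 1
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def union(m1, m2):
    """Union of two matrices: merge corresponding row lists and column lists."""
    result = SparseBitMatrix()
    for r in m1.rows.keys() | m2.rows.keys():
        result.rows[r] = merge_ordered(m1.rows.get(r, []), m2.rows.get(r, []))
    for c in m1.cols.keys() | m2.cols.keys():
        result.cols[c] = merge_ordered(m1.cols.get(c, []), m2.cols.get(c, []))
    return result


def compose(m1, m2):
    """For each column A of m1 and each entry (A, B) in row A of m2,
    emit (Ai, B) for every entry (Ai, A) in that column."""
    pairs = set()
    for a, sources in m1.cols.items():
        for b in m2.rows.get(a, []):  # an empty row in m2 contributes nothing
            for ai in sources:
                pairs.add((ai, b))
    return SparseBitMatrix(pairs)


# Hypothetical bindings, with formals numbered 1..5.
m1 = SparseBitMatrix([(1, 2), (3, 2)])
m2 = SparseBitMatrix([(2, 4), (2, 5)])
print(union(m1, m2).entries())
print(compose(m1, m2).entries())
```

Keeping both row and column lists costs extra space per entry, but it lets composition walk a column of the first operand and a row of the second without scanning either matrix.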
These matrices should all be represented by a sparse representation such as the one described above. The operations of union, composition, and transitive closure can be performed in time that depends on the number of nodes in the sparse-matrix representation. The union of two matrices results from merging their corresponding lists of rows and columns. In that the lists are ordered, the merge operation for a given row or column requires time proportional to the size of the resulting list. Thus, the time cost of the union operation is linear with respect to the number of elements in the resulting sparse matrix.

We now consider the sparse-matrix evaluation of $R_1 \circ R_2$, where $R_1$ is represented by $M_1$ and $R_2$ by $M_2$. A (nonempty) column in $M_1$ corresponding to a formal $A$ has elements $(A_1, A), (A_2, A), \ldots, (A_n, A)$. Each such column is to be processed. Where the row for $A$ in $M_2$ has elements $(A, B_1), (A, B_2), \ldots, (A, B_m)$, the $m$-times-$n$ elements of the form $(A_i, B_j)$ (where $1 \le i \le n$ and $1 \le j \le m$) are added to the output matrix $M$. (If the row corresponding to $A$ in $M_2$ is empty, then nothing is done.) Where the solution matrix $M$ has $s$ elements and the column or row in $M$ of largest size has $k$ elements, then the bound on the time cost of this operation is $O(k \cdot s)$.

We now consider the evaluation of $R^+$, where $R$ is represented by $M$ and $M^+$ is the output matrix. The relation $R$ determines a directed graph $DG$ whose nodes are the elements in the domain (and range) of $R$ and where $(a, b) \in DG \iff (a, b) \in R$. For expository purposes, here we regard $M$ as representing $DG$. The pair $(a, b)$ belongs to $R^+$ (and so $M^+(a, b)$ = “1”) iff there is a nontrivial path from $a$ to $b$ in $DG$. As in Section 9.1.2, the transitive closure of a directed graph $DG$ is to be computed, only here $DG$ is not a flow multigraph and generally $|E|$