Data-Flow Analysis Framework in
Wizard++
Peiyi Tang and John N. Zigman
Technical Report SC-MC-9605 December 1, 1995
Peiyi Tang
Department of Mathematics and Computing, The University of Southern Queensland, Toowoomba QLD 4350, Australia
John N. Zigman
Department of Computer Science, The Australian National University, Canberra ACT 0200, Australia
Abstract
Data-flow analysis problems such as reaching definition, common expression and constant propagation have strong similarities. They can be modeled by an abstract framework, and each individual problem can be treated as a particular instantiation of it. We have built an abstract data-flow framework in Wizard++ and used it to solve many intraprocedural as well as interprocedural data-flow problems. In this report, we describe the user interface of this abstract data-flow analysis framework to enable you to write code for any particular data-flow analysis in Wizard++.
This work was supported in part by the Australian Research Council under Grant No. A49232251
1 Introduction
Data-flow analysis problems such as reaching definition, common expression and constant propagation have strong similarities. They can be modeled by an abstract framework, and each individual problem can be treated as a particular instantiation of it. This was discovered long ago by Kildall [1].

Wizard++ is an experimental parallel compiler currently being developed on top of Sage++, a generic compiler front-end from Indiana University, USA. When we started to design Wizard++, Sage++ had very few data-flow analysis modules, and those it had were incomplete. We needed to add many data-flow analysis modules, both intraprocedural and interprocedural, to Wizard++ to make it an interprocedural parallel compiler. At the early stage of the design, we decided to build an abstract data-flow framework to unify the implementation of all the data-flow analysis modules. The reasons behind this decision are as follows:

- With modern object-oriented programming languages like C++ that support parametric polymorphism, the implementation of an abstract data-flow analysis framework becomes much easier.
- With the abstract data-flow framework built, the implementation of interprocedural and intraprocedural data-flow analysis modules requires much less coding and testing.
- Using the abstract data-flow framework helps to unify the concepts of individual problems and makes Wizard++ more reliable and robust.

We have built the abstract data-flow framework in Wizard++. Our experience has shown that the decision to build it was right. We have used this framework to implement many data-flow modules, a few of which are listed below:

- reaching definition for use-definition and definition-use chains
- intraprocedural and interprocedural constant propagation
- interprocedural scalar REF and MOD analysis
- interprocedural array regular section analysis

The abstract data-flow analysis framework saved us a lot of time and effort in coding these modules. The code is more reliable due to the simple interface with the framework. More evidence will be shown when we describe these modules in other technical reports.

In this report, we concentrate on the data-flow analysis framework itself. We give a brief introduction to the theory of the data-flow framework in section 2. In section 3, we describe the implementation of the abstract data-flow analysis framework in Wizard++. In section 4, we give an example of using the abstract data-flow framework to write the code for a particular data-flow problem: bound set analysis. In the final section, we summarize what you need to do to use the data-flow framework. The purpose of this paper is to enable you to write code for any data-flow analysis using the abstract data-flow framework in Wizard++.
2 Data-Flow Analysis Framework
Given a program represented as a graph, the goal of a data-flow analysis problem is to find the final information about some aspect of the program at each of the nodes, assuming that the control of the program flows along the edges of the graph. The information sought is represented as an element of a semilattice. The effect of an edge is represented as a function from the semilattice to itself.
2.1 Semilattice
A semilattice is a nonempty set $L$ with an idempotent, commutative, and associative meet operator $\sqcap$. That is, $\forall a, b, c \in L$, we have $a \sqcap a = a$, $a \sqcap b = b \sqcap a$, and $a \sqcap (b \sqcap c) = (a \sqcap b) \sqcap c$. We define a relation $\le$ among the elements of the semilattice $L$ such that $a \le b \Leftrightarrow a = a \sqcap b$. It can be proved that the relation $\le$ is a partial ordering, i.e. $\le$ is reflexive, antisymmetric and transitive.

The element $b \in L$ is denoted as $\top$ (called "top") if and only if for all $a \in L$ we have $a \le b$ (or $a = a \sqcap b$). The element $c \in L$ is denoted as $\bot$ (called "bottom") if and only if $c = x_1 \sqcap \cdots \sqcap x_n$, where $L = \{x_1, \ldots, x_n\}$ ($c$ is one of $x_1, \ldots, x_n$). This is equivalent to $c \le a$ for all $a \in L$.

A simple example of a semilattice is shown in Figure 1. The lattice $L$ is the power set of the set $S = \{x, y, z\}$, i.e.

$$L = 2^S = \{\emptyset, \{x\}, \{y\}, \{z\}, \{x, y\}, \{y, z\}, \{x, z\}, \{x, y, z\}\}$$

The meet operator is the intersection $\cap$ between the sets in $L$. The partial ordering derived from the meet operator is the relation "subset of" $\subseteq$, because for any $A, B \in L$, $A \subseteq B \Leftrightarrow A = A \cap B$. Figure 1 shows the minimum acyclic graph of this partial ordering.
            {x, y, z}
    {x,y}     {x,z}     {y,z}
     {x}       {y}       {z}
               {}
Figure 1. An example of lattice
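The power-set example can be checked mechanically. The following is a minimal sketch, not part of Wizard++: it assumes each subset of S = {x, y, z} is encoded as a 3-bit mask, and the names Subset, meet, leq and checkSemilatticeLaws are hypothetical. It verifies the three semilattice laws and the top and bottom properties over all eight elements.

```cpp
#include <cassert>

// Hypothetical encoding: each subset of S = {x, y, z} is a 3-bit mask.
typedef unsigned Subset;
const Subset X = 1u, Y = 2u, Z = 4u;
const Subset TOP_SET = X | Y | Z;   // {x, y, z}
const Subset BOTTOM_SET = 0u;       // {}

// The meet operator is set intersection.
inline Subset meet(Subset a, Subset b) { return a & b; }

// Derived partial ordering: a <= b  iff  a = a meet b.
inline bool leq(Subset a, Subset b) { return meet(a, b) == a; }

// Exhaustively check idempotence, commutativity and associativity,
// then check that TOP_SET and BOTTOM_SET bound every element.
bool checkSemilatticeLaws() {
    for (Subset a = 0; a <= TOP_SET; ++a)
        for (Subset b = 0; b <= TOP_SET; ++b)
            for (Subset c = 0; c <= TOP_SET; ++c) {
                if (meet(a, a) != a) return false;
                if (meet(a, b) != meet(b, a)) return false;
                if (meet(a, meet(b, c)) != meet(meet(a, b), c)) return false;
            }
    for (Subset a = 0; a <= TOP_SET; ++a)
        if (!leq(a, TOP_SET) || !leq(BOTTOM_SET, a)) return false;
    return true;
}
```

With this encoding, leq coincides with the subset relation, which is the ordering drawn in Figure 1.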
2.2 Iterative Data-Flow Analysis
The flow of a program is represented by a directed graph $G = (N, E, r)$, where $N$ is the set of nodes, $E$ the set of edges between the nodes, and $r$ the starting node, which does not have any incoming edges. The goal of a data-flow analysis is to find some program facts on each of the nodes. These facts are represented as elements of some semilattice $L$. In other words, the goal is to find a mapping $\mathrm{INFO} : N \to L$.

The typical approach of data-flow analysis is to start with an initial mapping $\mathrm{INFO} : N \to L$ such that $\mathrm{INFO}(v) = \top$ for all $v \in N$. Each $e \in E$ is associated with a function $f_e : L \to L$. Given a node $v$, let $\mathrm{Pred}(v)$ be the set of all predecessors of $v$ in the graph, i.e. $u \in \mathrm{Pred}(v)$ if and only if there is an edge $u \to v$ in $E$. Assume that $\mathrm{Pred}(v) = \{u_1, \ldots, u_n\}$. The new lattice value of node $v$ is calculated by

$$\mathrm{INFO}_{new}(v) = \mathrm{INFO}_{old}(v) \sqcap f_{u_1 \to v}(\mathrm{INFO}_{old}(u_1)) \sqcap \cdots \sqcap f_{u_n \to v}(\mathrm{INFO}_{old}(u_n))$$

The calculation continues until the lattice values for all nodes of the graph have stabilized; the final lattice value of each node $v$ is the value we seek. This style of data-flow analysis is called iterative forward analysis.

There are also problems that need backward data-flow analysis, in which the direction of the information flow is opposite to that of the edges of the flow graph. In backward analysis, the formula to calculate the new lattice value of a node $v$ is:

$$\mathrm{INFO}_{new}(v) = \mathrm{INFO}_{old}(v) \sqcap f_{v \to w_1}(\mathrm{INFO}_{old}(w_1)) \sqcap \cdots \sqcap f_{v \to w_p}(\mathrm{INFO}_{old}(w_p))$$

where $w_1, \ldots, w_p$ are the successors of node $v$ in the graph. Here the function $f_{v \to w_i}$ captures the effect of the lattice value of $w_i$ on that of $v$ along the edge $v \to w_i$.
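The forward formula can be sketched as a small generic solver. This is an illustrative sketch only, not the Wizard++ implementation: it uses std::vector instead of WzSet, and the names Bits, Mask, Edge, solveForward and solveExample are hypothetical. The lattice type only needs a meet operator^ and operator==, and the function type only needs operator(), in the spirit of the framework described in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lattice: subsets of a small universe as bit masks, meet = intersection.
struct Bits {
    unsigned v;
    explicit Bits(unsigned x = ~0u) : v(x) {}
    Bits operator^(const Bits &o) const { return Bits(v & o.v); }
    bool operator==(const Bits &o) const { return v == o.v; }
};

// Hypothetical edge function: keeps only the bits present in `keep`.
struct Mask {
    unsigned keep;
    explicit Mask(unsigned k = ~0u) : keep(k) {}
    Bits operator()(const Bits &x) const { return Bits(x.v & keep); }
};

// An edge u -> v together with its associated function.
template <class F>
struct Edge {
    std::size_t from, to;
    F f;
    Edge(std::size_t a, std::size_t b, const F &fn) : from(a), to(b), f(fn) {}
};

// Iterative forward analysis: every node starts at top, and each round
// computes INFO_new(v) = INFO_old(v) meet f_{u->v}(INFO_old(u)) over all
// incoming edges, until no value changes.
template <class L, class F>
std::vector<L> solveForward(std::size_t numNodes,
                            const std::vector<Edge<F> > &edges,
                            const L &top) {
    std::vector<L> info(numNodes, top);
    bool changed = true;
    while (changed) {
        std::vector<L> next(info);
        for (std::size_t i = 0; i < edges.size(); ++i) {
            const Edge<F> &e = edges[i];
            next[e.to] = next[e.to] ^ e.f(info[e.from]);
        }
        changed = false;
        for (std::size_t v = 0; v < numNodes; ++v)
            if (!(next[v] == info[v])) changed = true;
        info = next;
    }
    return info;
}

// A four-node graph with a cycle 1 -> 3 -> 1, over a 3-bit universe.
std::vector<Bits> solveExample() {
    std::vector<Edge<Mask> > edges;
    edges.push_back(Edge<Mask>(0, 1, Mask(3u)));  // removes bit 4
    edges.push_back(Edge<Mask>(0, 2, Mask(6u)));  // removes bit 1
    edges.push_back(Edge<Mask>(1, 3, Mask(7u)));  // identity
    edges.push_back(Edge<Mask>(2, 3, Mask(7u)));  // identity
    edges.push_back(Edge<Mask>(3, 1, Mask(7u)));  // identity (back edge)
    return solveForward<Bits, Mask>(4, edges, Bits(7u));
}
```

Because each new value is the meet of the old value with further terms, values only move down the lattice, so the loop terminates on any finite semilattice.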
2.3 Interval-Based Data-Flow Analysis
The iterative data-flow analysis described in section 2.2 is one way to perform data-flow analysis. If the graph is reducible, we can use an interval-based data-flow algorithm to solve the same problem. A graph is reducible if it becomes acyclic after its back edges are deleted. The abstract data-flow framework implemented in Wizard++ can use both iterative and interval-based algorithms. The interval-based algorithm used in Wizard++ is from M. Burke's paper [2]. Since the reverse graph of a reducible graph is not necessarily reducible, the interval-based analysis is applicable only to forward data-flow analysis problems. At the time of this writing, we have implemented only the iterative and interval-based forward algorithms. We will extend our data-flow framework to include iterative backward analysis.
2.4 Data-Flow Framework and Data-Flow Problem
In forward data-flow analysis, the function associated with each edge represents the change of the lattice value from the source node to the destination node. The meet operations between the incoming edges represent the joint effect of the flows on the destination node. Let the set of all functions from $L$ to $L$ be denoted as $\{f : L \to L\}$. The functions associated with graph edges form only a subset of it, i.e. $F \subseteq \{f : L \to L\}$. The data-flow framework is the two-tuple $(L, F)$. Given a data-flow framework $(L, F)$, a data-flow problem is the combination of:

- the data-flow framework $(L, F)$,
- the graph $G = (N, E, r)$ representing the program, and
- a mapping from $E$ to $F$.
3 Implementation of Abstract Data-Flow Framework
The abstract data-flow framework is implemented in Wizard++ as a template class called WzDataFlow. Part of the definition of the WzDataFlow class is as follows:

    template <class L, class F, class V, class E>
    class WzDataFlow {
    public:
        WzDataFlow();
        WzDataFlow(enum WzDFMethod, WzSet &, WzSet &, V &, WzSet &, L iV = L(0));
        WzDataFlow(WzDataFlow &);
        ~WzDataFlow();
        WzDataFlow& operator=(WzDataFlow &);
        enum WzDFMethod method();
        V origin();
        WzSet edges();
        WzSet nodes();
        WzSet info();
    protected:
        ...
        enum WzDFMethod methodVal;
        int             empty;
        V               originNode;
        WzSet           nodeSet;
        WzSet           functionSet;
        WzSet           edgeSet;
        WzSet           dfstSet;
        WzSet           intervalSet;
        WzSet           treeSet;
        WzSet           infoSet;
        L               initValue;
    };
3.1 Template Arguments L, F, V and E
There are four template arguments, L, F, V and E, as indicated in template <class L, class F, class V, class E>.
class L - lattice object type; it contains:

    L();                      default constructor
    L(0);                     construct top
    L(L &);                   replicating constructor
    L operator^(L &);         meet operator
    int operator==(L &);      equivalent
    int operator!=(L &);      not equivalent
    L& operator=(L &);        assignment
    L& operator=(0);          construct top

class F - data-flow function object

    F();                      default constructor
    F(F &);                   replicating constructor
    L operator()(L &);        function application
    int operator==(F &);      equivalent
    int operator!=(F &);      not equivalent
    F& operator=(F &);        function assignment

class V - node class

    V();                      default constructor
    V(V &);                   replicating constructor
    int operator==(V &);      equivalent
    int operator!=(V &);      not equivalent
    V& operator=(V &);        assignment

class E - edge object

    E();                      default constructor
    E(E &);                   replicating constructor
    E(V,V);                   edge 1st -> 2nd
    V from();                 source node
    V to();                   destination node
    int operator==(E &);      equivalent
    int operator!=(E &);      not equivalent
    E& operator=(E &);        assignment
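To make the requirements on L and F concrete, here is a minimal sketch of a pair of classes that provide the listed members. It is an assumption-laden illustration, not code from Wizard++: ConstL is a hypothetical flat lattice of the kind used for constant propagation (top, a single constant, or bottom) and AddF is a hypothetical function that adds a fixed amount to a known constant. The sketch takes arguments by const reference, whereas the interface above uses plain references, and it accepts only 0 as the integer argument.

```cpp
#include <cassert>

// Hypothetical lattice class: top, one known constant, or bottom.
class ConstL {
public:
    enum Kind { KTop, KConst, KBottom };

    ConstL() : kind(KTop), value(0) {}                       // default: top
    ConstL(int i) : kind(KTop), value(0) { assert(i == 0); } // L(0): top
    ConstL(const ConstL &o) : kind(o.kind), value(o.value) {}

    static ConstL constant(long c) { ConstL r; r.kind = KConst; r.value = c; return r; }
    static ConstL bottom() { ConstL r; r.kind = KBottom; return r; }

    // Meet: top is the identity, equal constants stay, anything else is bottom.
    ConstL operator^(const ConstL &o) const {
        if (kind == KTop) return o;
        if (o.kind == KTop) return *this;
        if (kind == KConst && o.kind == KConst && value == o.value) return *this;
        return bottom();
    }
    int operator==(const ConstL &o) const {
        return kind == o.kind && (kind != KConst || value == o.value);
    }
    int operator!=(const ConstL &o) const { return !(*this == o); }
    ConstL &operator=(const ConstL &o) { kind = o.kind; value = o.value; return *this; }
    ConstL &operator=(int i) { assert(i == 0); kind = KTop; value = 0; return *this; }

    bool isConstant() const { return kind == KConst; }
    long constantValue() const { return value; }

private:
    Kind kind;
    long value;
};

// Hypothetical function class: adds k to a known constant, leaves top and bottom alone.
class AddF {
public:
    AddF() : k(0) {}
    explicit AddF(long amount) : k(amount) {}
    AddF(const AddF &o) : k(o.k) {}
    ConstL operator()(const ConstL &x) const {
        return x.isConstant() ? ConstL::constant(x.constantValue() + k) : x;
    }
    int operator==(const AddF &o) const { return k == o.k; }
    int operator!=(const AddF &o) const { return k != o.k; }
    AddF &operator=(const AddF &o) { k = o.k; return *this; }

private:
    long k;
};
```

Keeping the meet inside the lattice class and the edge effect inside the function class is what lets one solver serve different problems: only these two small classes change from one analysis to the next.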
3.2 Constructors
Constructor WzDataFlow() creates an empty object, while constructor WzDataFlow(WzDataFlow &) simply copies all the data members of the argument to the newly created object. The data-flow analysis is carried out when constructing a real WzDataFlow object by using the constructor WzDataFlow(enum WzDFMethod, WzSet &, WzSet &, V &, WzSet &, L iV = L(0)).

The first argument, of enumerated type WzDFMethod, can have two values: INTERVAL or ITERATIVE. If it is INTERVAL, the constructor first checks whether the graph is reducible. If the graph is reducible, the constructor uses the interval-based algorithm to perform the data-flow analysis; otherwise it uses the iterative algorithm. If the argument is ITERATIVE, the constructor uses the iterative algorithm straightaway. The second argument of type WzSet is the set of the nodes of the graph. The third argument of type WzSet is the set of the edges of the graph. The fourth argument of type V is the origin node of the graph. The last argument iV of type L is the initial lattice value for all the nodes before the data-flow analysis starts. In Wizard++ we use the top lattice value for all the data-flow analysis problems.
3.3 Access Functions
After construction, the data-flow information can be accessed by two functions:

    WzSet nodes();
    WzSet info();

Function nodes() returns the set of the nodes in depth-first search reverse postorder, which is the same as depth-first backward preorder. Function info() returns the set of lattice values of the corresponding nodes. The order of the lattice values returned by info() is the same as the order of the nodes returned by nodes().
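For readers unfamiliar with the ordering, the following sketch shows one common way to compute a depth-first reverse postorder. It is not the Wizard++ code: the adjacency-list representation and the names Adjacency, dfsPostorder, reversePostorder and callGraphExample are hypothetical, and the sample graph is the call graph of the example in section 4 with nodes numbered foo=0, P=1, Q=2, R=3, S=4.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph representation: succ[v] lists the successors of node v.
typedef std::vector<std::vector<std::size_t> > Adjacency;

// Depth-first search that appends each node after all of its
// unvisited successors have been finished (postorder).
static void dfsPostorder(const Adjacency &succ, std::size_t v,
                         std::vector<bool> &seen,
                         std::vector<std::size_t> &post) {
    seen[v] = true;
    for (std::size_t i = 0; i < succ[v].size(); ++i) {
        std::size_t w = succ[v][i];
        if (!seen[w]) dfsPostorder(succ, w, seen, post);
    }
    post.push_back(v);
}

// Reverse postorder of the nodes reachable from `origin`.
std::vector<std::size_t> reversePostorder(const Adjacency &succ,
                                          std::size_t origin) {
    std::vector<bool> seen(succ.size(), false);
    std::vector<std::size_t> order;
    dfsPostorder(succ, origin, seen, order);
    std::reverse(order.begin(), order.end());
    return order;
}

// Call graph of the example program: foo->P, P->Q, P->R, Q->S, R->S.
Adjacency callGraphExample() {
    Adjacency succ(5);
    succ[0].push_back(1);
    succ[1].push_back(2);
    succ[1].push_back(3);
    succ[2].push_back(4);
    succ[3].push_back(4);
    return succ;
}
```

In an acyclic graph this order places every node before its successors, so a forward analysis that visits nodes in this order sees each predecessor's value before it is needed.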
4 Example
In this section, we present an example to show how to use the data-flow analysis framework in Wizard++. The data-flow problem we are going to solve is to find the bindings of each formal parameter of a Fortran program. Consider the Fortran program in Figure 2.

          program foo
          external P,Q,R,S
          call P(1,2,3)
          end

          subroutine P(P1,P2,P3)
          call Q(P1,2,P2)
          call R(P3)
          end

          subroutine S(S1,S2,S3)
          print*,'stuff'
          end

          subroutine Q(Q1,Q2,Q3)
          call S(Q1,Q2,Q3)
          end

          subroutine R(R1)
          call S(1,R1,R1)
          end
Figure 2. An example of Fortran program

The call graph of this program is shown in Figure 3. Since P calls Q with formal P1 as the argument for formal Q1, we say that formal P1 is bound to formal Q1 and the bound set of Q1 contains P1. Similarly, the bound set of Q3 contains P2. For the same reason, Q1 is bound to S1 due to the call site Q → S. Since P1 is bound to Q1 and Q1 is bound to S1, P1 should also be bound to S1. In other words, the bound set of S1 contains P1 and Q1.

Let us look at the bound set of S3. Following the path P → Q → S, it can be concluded that the bound set of S3 should include Q3 and P2. The path P → R → S indicates that the bound set of S3 should include R1 and P3. Therefore, the ultimate bound set of S3 should be {Q3, P2} ∪ {R1, P3}.
        foo
         |
         P
        / \
       Q   R
        \ /
         S
Figure 3. Call graph of a program

The bound set analysis is one of the interprocedural analyses needed to compute the REF and MOD information (a detailed description of the problem can be found in [2]). For example, if S3 were modified in S, it can be derived that P2 and P3 may be modified at the call sites P → Q and P → R, respectively, since the bound set of S3 includes P2 and P3.

The bound set analysis is a forward data-flow problem. In order to use the data-flow framework to solve this problem, we need to define the particular lattice and function classes for L and F, respectively. The graph needed for this problem is the call graph of the program. Wizard++ has a module for building call graphs, and the node class for V and the edge class for E of the call graph have already been defined. See [wizard++ call graph 1995] for the details.
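The bound sets claimed for the program in Figure 2 can be reproduced with a short worked example. This sketch deliberately does not use WzDataFlow: it propagates directly over formal-to-formal bindings read off the call sites, rather than over the call graph with one lattice value per procedure, and all names in it (FormalSet, BoundSets, Bindings, exampleBindings, computeBoundSets) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::set<std::string> FormalSet;
typedef std::map<std::string, FormalSet> BoundSets;
// A binding (a, b) means: formal a of the caller is passed for formal b of the callee.
typedef std::vector<std::pair<std::string, std::string> > Bindings;

// Bindings read off the call sites of the program in Figure 2.
// Constant actual arguments (1, 2, 3) bind no formal and are omitted.
Bindings exampleBindings() {
    Bindings b;
    b.push_back(std::make_pair(std::string("P1"), std::string("Q1"))); // call Q(P1,2,P2)
    b.push_back(std::make_pair(std::string("P2"), std::string("Q3")));
    b.push_back(std::make_pair(std::string("P3"), std::string("R1"))); // call R(P3)
    b.push_back(std::make_pair(std::string("Q1"), std::string("S1"))); // call S(Q1,Q2,Q3)
    b.push_back(std::make_pair(std::string("Q2"), std::string("S2")));
    b.push_back(std::make_pair(std::string("Q3"), std::string("S3")));
    b.push_back(std::make_pair(std::string("R1"), std::string("S2"))); // call S(1,R1,R1)
    b.push_back(std::make_pair(std::string("R1"), std::string("S3")));
    return b;
}

// Propagate until nothing changes: the bound set of the callee formal
// absorbs the caller formal and everything already bound to it.
BoundSets computeBoundSets(const Bindings &bindings) {
    BoundSets bound;
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < bindings.size(); ++i) {
            FormalSet incoming = bound[bindings[i].first];  // copy of the source's bound set
            incoming.insert(bindings[i].first);
            FormalSet &target = bound[bindings[i].second];
            std::size_t before = target.size();
            target.insert(incoming.begin(), incoming.end());
            if (target.size() != before) changed = true;
        }
    }
    return bound;
}
```

Running computeBoundSets(exampleBindings()) gives S3 the bound set {P2, P3, Q3, R1} and S1 the bound set {P1, Q1}, matching the sets derived by hand above.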
4.1 Bound Set Lattice Class
The definition of the lattice class for the bound set problem is as follows:

    class WzBoundL {
    public:
        WzBoundL();                         // construct top
        WzBoundL(int i);                    // construct top when i=0
        WzBoundL(WzBoundL &WzBoundL);       // construct copy
        WzBoundL(WzSet &s);                 // construct initial bindings
        ~WzBoundL();

        WzBoundL operator^(WzBoundL &);     // meet operator
        int operator==(WzBoundL &);         // a equal to b?
        int operator!=(WzBoundL &);         // a not equal to b?
        WzBoundL &operator=(WzBoundL &);    // assign b to a
        WzBoundL &operator=(int i);         // if 0, then initialise to top

        WzSet v;                            // binding storage
    };
The lattice value is represented as a set of sets of formals. In Wizard++, the abstract syntax tree is provided by Sage++. Formals of Fortran functions and subroutines are represented as pointers to SgSymbol. The only data member of WzBoundL, v, is a set of sets of SgSymbol pointers, each of which corresponds to the bound set of a particular formal. The order of these bound sets in v is the same as in allFormals, a static global variable which contains all the formals of the program.
The meet operator of WzBoundL is implemented as:

    WzBoundL WzBoundL::operator^(WzBoundL &l)
    {
        WzBoundL r(0);
        int j;
        assert(+v == +(l.v));
        for(j=0;j" identifier()