Efficiently Computing φ-Nodes On-The-Fly

RON K. CYTRON
Washington University in St. Louis
and
JEANNE FERRANTE
University of California at San Diego

Recently, Static Single-Assignment Form and Sparse Evaluation Graphs have been advanced for the efficient solution of program optimization problems. Each method is provided with an initial set of flow graph nodes that inherently affect a problem's solution. Other relevant nodes are those where disparate solutions must combine. These so-called φ-nodes are exactly the iterated dominance frontiers of the initial set of nodes. Previously, φ-nodes were found by computing the full set of dominance frontiers, a process that could take worst-case quadratic time with respect to the input flow graph. In this article we present an almost-linear algorithm for determining exactly the same set of φ-nodes.
Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs and Features—control structures; D.3.4 [Programming Languages]: Processors—compilers; optimization; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems—computations on discrete structures; G.2.2 [Discrete Mathematics]: Graph Theory—graph algorithms; I.1.2 [Algebraic Manipulation]: Algorithms—analysis of algorithms; I.2.2 [Artificial Intelligence]: Automatic Programming—program transformation

General Terms: Algorithms, Languages, Performance, Theory

Additional Key Words and Phrases: Static Single-Assignment (SSA) form

1. MOTIVATION AND BACKGROUND

Static Single-Assignment (SSA) form [Cytron et al. 1991] and the more general Sparse Evaluation Graphs (SEG) [Choi et al. 1991] have emerged as an efficient mechanism for solving compile-time optimization problems via data flow analysis [Alpern et al. 1988; Choi et al. 1993; 1994; Cytron et al. 1986; Rosen et al. 1988; Wegman and Zadeck 1991]. Given an input data flow framework, the SEG construction
algorithm distills from the flow graph a set of relevant nodes. These nodes are then appropriately interconnected with edges suitable for evaluating the data flow solutions. Similarly, SSA form identifies those variable definitions and

A preliminary version of this article appeared in Proceedings of the 6th Annual Workshop on Languages and Compilers for Parallel Computing, 1993. At that time, both authors were Research Staff Members at the IBM T. J. Watson Research Center, Yorktown Heights, New York. This work was funded in part by the National Science Foundation under grant CCR-9402883.
Authors' addresses: Ron K. Cytron, Department of Computer Science, Washington University, St. Louis, MO 63130-4899; email: cytron@cs.wustl.edu; Jeanne Ferrante, Department of Computer Science and Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0114; email: ferrante@cs.ucsd.edu.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1995 ACM 0164-0925/95/0500-0487 $03.50
ACM Transactions on Programming Languages and Systems, Vol. 17, No. 3, May 1995, Pages 487-506.
references that are relevant to solving certain data flow problems. The SEG and SSA algorithms take the following structures as input:

Flowgraph. We call the flow graph representation Gf, with nodes Nf, edges Ef, and root node Entry; Gf is connected. If (X, Y) ∈ Ef, then we write X ∈ Pred(Y) and Y ∈ Succ(X).

Depth-first numbering. Let dfn(X) be the number of X in a depth-first search of Gf, with 1 ≤ dfn(X) ≤ |Nf|. Similarly, we define vertex(k) to be the node X with dfn(X) = k. We call the associated depth-first spanning tree DFST.

Dominator tree. We say that X dominates Y, written X ≫ Y, if X appears on every path in Gf from Entry to Y; domination is both reflexive and transitive. We say that X strictly dominates Y if X ≫ Y and X ≠ Y. Each node X has a unique immediate dominator, written idom(X), such that idom(X) strictly dominates X and every W that strictly dominates X satisfies W ≫ idom(X). The mapping idom(X) serves as the parent of X in the flow graph's dominator tree. (An example flow graph and its dominator tree are shown in Figure 1.)

Initial sparse nodes. Nα ⊆ Nf is an initial set of flow graph nodes associated with a data flow framework. For an SEG, such nodes essentially represent nonidentity transference in the data flow problem; for SSA, such nodes contain definitions of variables.

The SEG and SSA algorithms produce the following structures as output:

Sparse nodes. Nφ, with Nα ⊆ Nφ ⊆ Nf, which we compute as a property of each node. Previous SEG and SSA construction algorithms compute Nφ as the φ-function nodes, as follows:

(1) The algorithm precomputes the dominance frontier

    DF(X) = { Z | (∃ (Y, Z) ∈ Ef) X ≫ Y and X does not strictly dominate Z }.

In other words, X dominates a predecessor of Z without strictly dominating Z itself.
(2) The algorithm computes the iterated dominance frontier DF+(Nα) as the limit of the sequence DF(Nα), DF(Nα ∪ DF(Nα)), . . . .
(3) The sparse nodes are then Nφ = Nα ∪ DF+(Nα), obtained by a worklist over the precomputed dominance frontiers.
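The definitions above can be made concrete with a small sketch. The graph encoding, function names, and the naive iterative dominator computation below are illustrative assumptions, not the article's method:

```python
# Illustrative sketch: dfn numbering and dominators for a small flow graph
# given as a dict of successor lists.
def depth_first_numbering(succ, root):
    """Assign dfn(X) in preorder of a depth-first search from the root."""
    dfn, order = {}, []
    def dfs(x):
        dfn[x] = len(order) + 1
        order.append(x)
        for y in succ.get(x, []):
            if y not in dfn:
                dfs(y)
    dfs(root)
    return dfn

def dominators(succ, root):
    """Dom(X) = {X} union intersection of Dom(P) over preds P (fixpoint)."""
    nodes = set(succ) | {y for ys in succ.values() for y in ys}
    pred = {n: set() for n in nodes}
    for x, ys in succ.items():
        for y in ys:
            pred[y].add(x)
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n])) if pred[n] else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def idom(dom, root):
    """idom(X) is the strict dominator of X closest to X."""
    res = {}
    for n in dom:
        if n == root:
            continue
        strict = dom[n] - {n}
        # the closest strict dominator has the largest dominator set
        res[n] = max(strict, key=lambda w: len(dom[w]))
    return res
```

The `idom` helper exploits the fact that a node's strict dominators form a chain, so the closest one is the one with the largest dominator set.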
dominators have decreasing depth-first numbers. The second case is when X and Y are siblings in the dominator tree. No order is determined by the "typical case," since here both nodes have the same immediate dominator. We define the equidominates of a node X as those nodes with the same immediate dominator as X:

    equidom(X) = { Y | idom(Y) = idom(X) }.

For example, in Figure 1, equidom(A) = {A, V, W, X, Y, Z}. More generally, the noun equidominates refers to any such set of nodes. Our algorithm essentially partitions equidominates into blocks of nodes that are in each other's iterated dominance frontiers, but without the expense of explicitly
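The iterated dominance frontier DF+ that this partition refers to can be computed with the classic worklist method that the article improves on; the sketch below assumes the per-node dominance frontiers DF are already available and is illustrative only:

```python
# Illustrative sketch (the quadratic baseline, not this article's algorithm):
# worklist computation of the iterated dominance frontier DF+(initial).
def iterated_df(df, initial):
    """df: dict node -> set of dominance-frontier nodes; initial: the set Na."""
    result = set()
    worklist = list(initial)
    on_list = set(initial)
    while worklist:
        x = worklist.pop()
        for z in df.get(x, ()):
            if z not in result:
                result.add(z)
                if z not in on_list:      # each node enters the worklist once
                    on_list.add(z)
                    worklist.append(z)
    return result
```

Because `df` must be fully precomputed and its total size can be quadratic in the flow graph, this baseline motivates the on-the-fly approach.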
NodeCount ← 0
foreach (X ∈ Nf) do
    T(X) ← false ;
    Φ(X) ← false ;
    Map(X) ← X
od
Cnum ← 0
for n = N downto 1 do
    foreach ({Z | dfn(idom(Z)) = n}) do
        if (Cnum(Z) = 0) then
            call Visit(Z)
        fi
    od
od
for n = N downto 1 do
    foreach ({Z | dfn(idom(Z)) = n}) do
        call Finish(Z)
    od
od
end

Function FindSnode(Y, P) : node
for (X = Y) repeat (X = idom(X)) do
    if (T(Map(X)) or (idom(X) = P)) then
        return (Map(X))
    fi
od
end

Procedure IncludeNode(X)
T(X) ← true
end

Procedure Finish(Z)
foreach ((Y, Z) ∈ Ef | Y ≠ idom(Z)) do
    s ← FindSnode(Y, idom(Z))
    if (T(s)) then
        Φ(Z) ← true
    fi
od
if (T(Map(Z))) then
    call IncludeNode(Z)
fi
end

Fig. 3. Sparse graph node determination (simple).
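A minimal sketch of the dominator-tree ascent performed by FindSnode, using dictionaries for idom, T, and Map (an illustrative encoding, not the paper's code):

```python
# Illustrative sketch of the simple FindSnode: ascend the dominator tree from
# Y, returning the representative Map(X) of the first node X that is either
# already marked (T) or immediately dominated by P = idom(Z).
def find_snode(y, p, idom, t, map_):
    x = y
    while True:
        # P must be a dominator-tree ancestor of Y for this ascent to stop
        if t.get(map_.get(x, x), False) or idom[x] == p:
            return map_.get(x, x)
        x = idom[x]
```

As written, each call may walk a dominator-tree path of length O(N); Section 4 replaces this walk with path compression.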
computing iterated dominance frontiers. It is the dominance frontier relation between equidominates, as induced by the control flow edges, that determines membership in Nφ: a node of a strongly connected component (SCC) of this relation is in Nφ iff one of the equidominates in its SCC is. Our algorithm determines the SCCs of the dominance frontier relation between equidominates and finds a representative node, Map(X), for each SCC; membership in Nφ of all nodes in the same SCC is then determined by their respective representative. The topological order of the SCCs is the correct order to follow. Our algorithm uses a well-known algorithm [Aho et al. 1974] to find the strongly connected components of the dominance frontier graph (in which an edge from
strongly edge
from
Efficiently Computing Procedure
On-The-Fly
.
493
VZsit(Z)
LL(Z)
G Cnum(Z)
call
+Nodes
G
+ +NodeCount
push(Z)
foreach
((Y, Z)
: &f
\ Y # idorn(Z))
s +- FindSnode(Y, if (idom(s)
call
do
idom(Z))
then
# zdom(Z)) InctudeNode(Z)
else = O) then
if (Cnurn(s) call
Visit(s)
LL(Z)
+- rnzn(LL(Z),
LL(s))
else if (( Cnurn(s) LL(Z)
< Cnum(Z))
and
+- mzn(LL(Z),
OnStack(s))
then
Cnum(s))
ti fi if (T(s)
not
and
call
then
OnStack(s))
IncludeNode(Z)
fi fi od if (LL(Z)
= Cnum(Z))
then
repeat Q 4-
~0~()
Nfap(Q)
i-- Z
if (Q c N.) call
then
IncludeNode(Q)
fi if (T(Q))
then
call
IncludeNode(Z)
fl until
(Q = Z)
fi end Figure
Y to Z implies Z ∈ DF(Y)). To do this, the algorithm uses auxiliary data structures in the manner of [Aho et al. 1974]. Briefly, Cnum(X) is the depth-first number assigned to X by procedure Visit (Cnum(X) is 0 initially), as opposed to dfn(X), which is assigned by the depth-first search of the flow graph. LL(X) refers to the Cnum number of the root ("component") of the SCC containing X; the roots of the SCCs are found by this depth-first visit, in reverse topological order.

We now give an overview of the main algorithm. The main procedure initializes the essential data structures and then visits each node in reverse depth-first order of immediate dominators, calling Visit for each node not already visited (by a recursive call from Visit). After all nodes have been visited, the main procedure calls Finish on each node to determine membership in Nφ.
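The Cnum/LL bookkeeping is the standard strongly-connected-component machinery; a generic, self-contained sketch of that machinery (illustrative names, recursive form) is:

```python
# Illustrative sketch of the SCC machinery that Visit adapts: cnum plays the
# role of the visitation number, low the "LL" lowlink, and a stack holds the
# current component until its root is found.
def sccs(succ):
    cnum, low, stack, on_stack, out = {}, {}, [], set(), []
    counter = [0]
    def visit(z):
        counter[0] += 1
        cnum[z] = low[z] = counter[0]
        stack.append(z); on_stack.add(z)
        for s in succ.get(z, ()):
            if s not in cnum:
                visit(s)
                low[z] = min(low[z], low[s])
            elif s in on_stack and cnum[s] < cnum[z]:
                low[z] = min(low[z], cnum[s])
        if low[z] == cnum[z]:            # z is the root of an SCC
            comp = []
            while True:
                q = stack.pop(); on_stack.discard(q)
                comp.append(q)
                if q == z:
                    break
            out.append(set(comp))
    for n in list(succ):
        if n not in cnum:
            visit(n)
    return out
```

Visit differs from this generic routine in that it traverses dominance-frontier information discovered on the fly (via FindSnode) rather than explicit edges, and it marks Nφ membership as components are popped.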
The heart of our algorithm is the procedure Visit, with a call to Finish for each node to determine its membership in Nφ. Central to the correctness of our entire algorithm is the implementation of FindSnode(). The behavior of FindSnode(Y, P), as presented in Figure 3, is that of a search on the dominator tree: the function ascends the dominator tree from Y, considering each node on the path from Y up to the node immediately below P, and returns the representative Map(X) of the first node X already determined to be in DF+ (as recorded by T(Map(X))), or else the representative of the node immediately below P. The correctness proof in Section 3 is based on the node-ordering property (Corollary 3.8) that dfn(idom(X)) ≥ dfn(idom(Z)) for the nodes X considered by this search with Z ∈ DF(X), ensuring that nodes whose immediate dominators have higher depth-first numbers have already been processed. The algorithm as presented in Figures 3 and 4 is a straightforward, albeit potentially inefficient, version: the time taken by FindSnode() as written can be quadratic. To obtain the almost-linear time bound, we modify FindSnode() in Section 4, while simultaneously ensuring that the correctness properties still hold.

We now illustrate how the algorithm works on the flow graph shown in Figure 1, assuming Nα = {B}. Visit is called on each node that has not already been visited by some recursive call (depth-first numbers can be determined by consulting Figure 1), and the loop in Visit considers each predecessor of the visited node in turn.

—When Visit works on D, no action is taken by the loop, and Map(D) is set to D.
—When Visit works on C, no action is taken, since C's only predecessor E has already been processed.
—When Visit works on Y, FindSnode returns B; since B and Y are not equidominates, Y is placed in Nφ.
—Visit is called recursively on W. In this invocation, the loop considers the predecessors of W. For A, FindSnode returns A; since A is not on the stack, nothing happens. Visit is then called recursively for V, where FindSnode returns W, which is already on the stack; Map(V) becomes W.
—For X, FindSnode returns X; since X and Y are equidominates and Y has already been placed, X is placed in Nφ. When the component is popped, Map(X) and Map(W) are both set to W, and since X ∈ Nφ, W is placed in Nφ.
—When the loop considers Z, nodes W, X, D, and V have each already been visited, and the predecessors D and Y of Z are considered:
—For D, FindSnode returns B, so Z is placed in Nφ.
—For Y, FindSnode returns Y, which is already in Nφ.

In contrast, previous methods [Choi et al. 1991; Cytron et al. 1991] would have constructed all of the dominance frontier sets shown in Figure 2, which are asymptotically quadratic in size, and would have computed the iterated dominance frontier of B, namely {E, B, V, W, X, Y, Z}, through a worklist, placing φ-nodes as nodes are visited.
3. CORRECTNESS

The structure of this section is as follows:

(1) The observations about dominator trees, dominance frontiers, and flow graphs are proven separately, in Theorems 3.5 and 3.6 and the node-ordering properties presented as Corollaries 3.7 and 3.8; these results may be useful in other contexts.
(2) The theorems that establish correct placement of φ-nodes by the algorithm are as follows:
—Lemma 3.11: The Map() structure correctly relates a set of equidominates.
—Lemma 3.16: Two nodes have the same Map() value only if one is in the iterated dominance frontier of the other.
—Theorem 3.18: The nodes identified by the algorithm as representing a set of equidominates are correctly placed in Nφ.
—Theorem 3.19: The placement of φ-nodes is correct.

Relying on Theorems 3.18 and 3.19, when the algorithm terminates, the φ-nodes of each set of equidominates have been correctly identified. Theorem 3.20 is then easily established: the algorithm correctly identifies the sparse nodes.

LEMMA 3.1. (Y, Z) ∈ Ef ⟹ idom(Z) ≫ Y.
PROOF. Suppose not. Then there exists a path P from Entry to Y that does not contain idom(Z). If path P is extended by the edge (Y, Z), we obtain a path from Entry to Z that does not contain idom(Z), so idom(Z) cannot dominate Z, a contradiction. ∎

COROLLARY 3.2. (Y, Z) ∈ Ef ∧ Y ≠ idom(Z) ⟹ dfn(idom(Z)) ≤ dfn(idom(Y)).

PROOF. By Lemma 3.1, idom(Z) ≫ Y. Since Y ≠ idom(Z), idom(Z) strictly dominates Y, and therefore idom(Z) ≫ idom(Y). Dominators are ancestors in the depth-first spanning tree DFST, and ancestors in DFST have smaller depth-first numbers. ∎

COROLLARY 3.3. (Y, Z) ∈ Ef ⟹ idom(Z) is an ancestor of Y in DFST.

PROOF. By Lemma 3.1, idom(Z) ≫ Y, and every dominator of Y is an ancestor of Y in DFST (Corollary 3.4). ∎

COROLLARY 3.4. X ≫ Y ⟹ X is an ancestor of Y in DFST.

PROOF. If X did not appear on the DFST path from Entry to Y, that path would reach Y while excluding X, and X could not dominate Y. ∎

THEOREM 3.5. Z ∈ DF(X) ⟹ idom(Z) ≫ X, and thus dfn(X) ≥ dfn(idom(Z)).

PROOF. Since Z ∈ DF(X), there exists some Y ∈ Preds(Z) such that X ≫ Y, but X does not strictly dominate Z. By Lemma 3.1, idom(Z) ≫ Y. Both X and idom(Z) dominate Y, so either idom(Z) ≫ X or X ≫ idom(Z). If X strictly dominated idom(Z), then X would strictly dominate Z, which excludes Z from DF(X), a contradiction. Therefore idom(Z) ≫ X, and by Corollary 3.4, dfn(X) ≥ dfn(idom(Z)). ∎

THEOREM 3.6. If X ≫ Y, (Y, Z) ∈ Ef, idom(Z) ≫ X, and X ≠ idom(Z), then Z ∈ DF(X).

PROOF. Since X ≫ Y and (Y, Z) ∈ Ef, it suffices to show that X cannot strictly dominate Z. Suppose X strictly dominates Z; then X dominates idom(Z). But idom(Z) ≫ X and X ≠ idom(Z) imply that idom(Z) strictly dominates X, contradicting the antisymmetry of domination. Hence Z ∈ DF(X), proving the theorem. ∎

COROLLARY 3.7. Z ∈ DF(X) iff X lies on the dominator tree path from some Y ∈ Preds(Z) with Y ≠ idom(Z) up to but excluding idom(Z).

PROOF. One direction follows from Theorem 3.5 and the definition of DF; the other follows from Theorem 3.6. ∎
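Lemma 3.1 can be spot-checked mechanically on a small flow graph. The helper below uses a naive set-based dominator computation and is illustrative only; it asserts that idom(Z) dominates Y for every edge (Y, Z):

```python
# Illustrative check of Lemma 3.1: for every edge (Y, Z) of a small flow
# graph, the immediate dominator of Z must dominate Y.
def dom_sets(succ, root):
    nodes = set(succ) | {y for ys in succ.values() for y in ys}
    pred = {n: [] for n in nodes}
    for x, ys in succ.items():
        for y in ys:
            pred[y].append(x)
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def check_lemma_3_1(succ, root):
    dom = dom_sets(succ, root)
    for y, zs in succ.items():
        for z in zs:
            if z == root:
                continue
            strict = dom[z] - {z}
            idom_z = max(strict, key=lambda w: len(dom[w]))
            assert idom_z in dom[y], (y, z)   # idom(Z) dominates Y
    return True
```

The check assumes a connected flow graph in which every non-root node has a predecessor.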
COROLLARY 3.8. Z ∈ DF(X) ∧ X ≠ idom(Z) ⟹ dfn(idom(X)) ≥ dfn(idom(Z)).

PROOF. By Theorem 3.5, idom(Z) ≫ X; since X ≠ idom(Z), idom(Z) strictly dominates X, and therefore idom(Z) ≫ idom(X). Ancestors in DFST have smaller depth-first numbers, proving the corollary. ∎

The following lemmas formalize properties of our algorithm that follow directly from inspection of our algorithm; we omit those proofs.

LEMMA 3.9. Visit is called exactly once for each node in Flowgraph.

LEMMA 3.10. During all calls to Visit, each node is pushed and popped exactly once.

PROOF. By Lemma 3.9, Visit(Z) is invoked exactly once, so Z is pushed exactly once. We need to show that Z is popped exactly once. If Z belongs to a strongly connected component represented by H = Map(Z) with H ≠ Z, then Z is popped by the iteration of Visit in which H is popped; otherwise, Z is popped by the invocation Visit(Z) itself. In either case Z is popped exactly once. ∎

LEMMA 3.11. idom(Map(X)) = idom(X).

PROOF. Since initially Map(X) = X, the lemma holds at the start of the algorithm. Map(X) is set only in the loop at step ~ of Visit, where all nodes on the stack are equidominates (Lemma 3.14); thus idom(Map(X)) = idom(X) is maintained. ∎

LEMMA 3.12. At step ~, node s (referenced at step ~) cannot have already been pushed and popped.

PROOF. By the predicate at step ~, either s has not been pushed yet, in which case Cnum(s) = 0, or s has been pushed and is still on the stack; otherwise IncludeNode(s) would already have occurred before step ~ is reached. Thus s has not been pushed and popped before step ~. ∎

COROLLARY 3.13. Any invocation of IncludeNode(s) at step ~ occurs while s is on the stack or before s has been pushed, never after s has been popped.

LEMMA 3.14. OnStack(X) and OnStack(Y) ⟹ idom(X) = idom(Y).

LEMMA 3.15. At step ~, FindSnode() returns s such that s = Map(S) and Z ∈ DF(S).

PROOF. Follows from inspection of FindSnode() and the definition of dominance frontiers. ∎

LEMMA 3.16. Map(X) = Map(Y) ⟹ Y ∈ DF+(X) or X = Y.

PROOF. Follows from the initialization (Map(X) = X), Lemma 3.15, and the observation that Map(X) represents the strongly connected component containing X. ∎

COROLLARY 3.17. s ∈ Nα ⟹ Map(s) ∈ Nφ.
THEOREM 3.18. T(X) ⟺ X ∈ Nφ.

PROOF. We actually prove a stronger result, by backward induction on n (following the iteration order of the main loop of our algorithm): after the steps taken for the set {X | dfn(idom(X)) = n}, T(X) ⟺ X ∈ Nφ holds for each such X. We prove the two implications separately.

(⟸) Base Case. Node vertex(N) is childless in the depth-first spanning tree and cannot dominate any node, so this case is trivially satisfied.

Inductive Step. T(Map(X)) can be set only in procedure IncludeNode. Consider the steps of Visit(Z) that call IncludeNode. By Lemma 3.15, FindSnode must have returned s = Map(S) with Z ∈ DF(S), so that Z ∈ DF+(s). From Lemma 3.11, and noting that IncludeNode(Z) is called at step ~ only when idom(s) ≠ idom(Z), Corollary 3.8 implies dfn(idom(s)) > dfn(idom(Z)). Applying the induction hypothesis IHA(k) for N ≥ k ≥ n + 1 to s, we obtain T(s) ⟹ s ∈ Nφ; since Z ∈ DF+(s) and s ∈ Nφ, we have Z ∈ Nφ.

(⟹) To accommodate steps ~, ~, and ~, whose effects emerge as nodes leave the stack, we define the predicate Popped(k), which is true when exactly k nodes have been popped. Each node is pushed and popped exactly once (Lemma 3.10). We name the nodes x1, x2, . . . , xL according to the order in which they emerge from the stack: node x1 is the first node popped; node x(i+1) is popped immediately after node xi; node xL is the last node popped. We now prove the induction hypothesis

    IHB(n) ≡ Popped(n) ⟹ (T(X) ⟹ X ∈ Nφ).

Base Case. Prior to the popping of x1, OnStack(x1) holds. By Lemma 3.12, step ~ cannot affect any of the xi, and by Lemma 3.14 and Corollary 3.13, step ~ requires s to be an already popped node, so it too cannot affect any of the xi.

Inductive Step. Suppose Popped(n) with n > 0, and consider how T(Y) can have been set for the node popped at step ~. By Lemma 3.12 and Corollary 3.13, the node s involved must already have been popped, and dfn(idom(Y)) ≤ dfn(idom(X)). There are two cases:

—X ∈ Map(Nα): step ~ sets T(X) true when some y ∈ Nα with X = Map(y) is popped; by Corollary 3.17, X ∈ Nφ.
—Step ~ sets T(Y) true, where Y = Map(y) and y ∈ DF(X) with T(X) already holding. By the hypothesis IHB(n − 1), T(X) ⟹ X ∈ Nφ; then y ∈ DF+(X) places y in Nφ, and by Lemma 3.16, Y ∈ Nφ.

Thus both implications hold. ∎
THEOREM 3.19. After the call to Finish(Z), Φ(Z) ⟺ Z ∈ Nφ.

PROOF. By construction in Finish() and Theorem 3.18,

    Φ(Z) ⟺ ∃ (Y, Z) ∈ Ef, Y ≠ idom(Z), with s = FindSnode(Y, idom(Z)) and s ∈ Nφ.

But by construction in FindSnode(), Z ∈ DF(S) for s = Map(S); thus Φ(Z) ⟺ Z ∈ ⋃ DF+(s) over s ∈ Nφ, and this holds ⟺ Z ∈ Nφ, by definition of Nφ. ∎

THEOREM 3.20. After all calls to Finish(), T(X) ⟺ X ∈ Nφ.

PROOF. X ∈ Nφ ⟺ Map(X) ∈ Nφ (by Corollary 3.17 and Lemma 3.16) ⟺ T(Map(X)) (by Theorem 3.18). But in Finish, T(Map(X)) ⟺ T(X). ∎
4. COMPLEXITY

In this section, we first show that our algorithm is O(N + E + T), where N is the number of nodes and E the number of edges in the input flowgraph, and T is the total time for all calls to FindSnode. Unfortunately, T is not linear for FindSnode as written. We then present a faster version of our algorithm, using an almost-linear implementation of FindSnode, and show that the correctness results still hold. Since the total time for the faster FindSnode is O(Eα(E)), we obtain our desired almost-linear complexity bound.

4.1 Analysis of the Initial Algorithm

THEOREM 4.1.1. The algorithm of Figure 3 is O(N + E + T), where N is the number of nodes and E the number of edges in the input flowgraph, and T is the total time for all calls to FindSnode.

PROOF. The algorithm consists of

—an initialization phase,
—a phase where Visit is called once for each node, and
—a call to Finish for each node.

We analyze the complexity of each of these phases. The initialization phase is O(N). Consider Visit(Z): there is a constant amount of work inside Visit, not counting the loop over predecessor edges or any calls to FindSnode. Since Visit is called only once for each node, the loop over predecessor edges is executed O(E) times over all calls to Visit, and the loop over the stack contents performs a constant amount of work for each node popped, and each node is popped only once. The calls to Finish likewise contribute O(E) work plus the calls to FindSnode. Summing over all nodes, we obtain the desired result. ∎

We now analyze the asymptotic behavior of FindSnode() as written. Each invocation of FindSnode(Y, P) ascends a path of the dominator tree from Y, and could visit O(N) nodes before reaching the node whose immediate dominator is P. Consider a flow graph whose dominator tree is a simple path from Entry: each of the O(E) calls to FindSnode in Figure 3 could take O(N) work, so the overall behavior could be O(N × N).
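The quadratic behavior of the unmodified FindSnode is easy to exhibit on a chain-shaped dominator tree (an illustrative construction; the paper's worst-case example is a ladder graph):

```python
# Illustrative sketch: on a dominator-tree chain of depth N, repeated
# FindSnode-style ascents from the deepest node perform O(N) work each,
# so N calls cost O(N^2) total steps.
def ascent_steps(idom, y, p):
    """Count nodes visited while ascending from y until idom(x) == p."""
    steps, x = 1, y
    while idom[x] != p:
        x = idom[x]
        steps += 1
    return steps

N = 100
idom = {i: i - 1 for i in range(1, N + 1)}   # chain 0 <- 1 <- ... <- N
total = sum(ascent_steps(idom, N, 0) for _ in range(N))
# total grows as N * N
```

Path compression (Section 4.2) collapses these repeated ascents, which is what brings the total down to O(Eα(E)).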
4.2 Faster Algorithm Using Path Compression

To obtain the faster result, we use a path-compression algorithm due to Tarjan [1979]: we rewrite certain parts of our algorithm to use the instructions Link(), Eval(), and Update(), which maintain a label associated with each node of the dominator tree:

(1) We initialize the label of each node X to be −dfn(X); the link of each node is initially Λ.
(2) Link(Y, idom(Y)): the link of node Y now becomes its immediate dominator idom(Y). Links are inserted at steps ~ and ~ of the algorithm in Figure 5.
(3) Eval(Y): a path-compressing search that ascends the links established by the Link() instruction, returning the node of maximum label on the link path from Y.
(4) Update(X, dfn(X)): the label information associated with X is updated to dfn(X). This instruction is issued whenever a node X is added to the sparse graph, that is, whenever X is included in set Nφ.

We redefine function FindSnode() by:

Function FindSnode(Y, P) : node
    return (Map(Eval(Y)))
end

We now state and prove the properties required of the path-compressing version of our algorithm. Proper use of the Link() and Update() operations requires that they are applied to nodes in a manner consistent with their definition [Tarjan 1979]: each node is Link()ed at most once, and Eval() ascends the unique "link-path" above its argument.

LEMMA 4.2.1. At steps ~ and ~, link(Z) = Λ just prior to applying Link().

PROOF. Steps ~ and ~ initialize link(X) = Λ for each node X ∈ Nf. Since each node Z has a unique immediate dominator idom(Z) and a unique representative Map(Z), and since n is a strictly decreasing sequence at steps ~ and ~, each node is Link()ed at most once. ∎
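The Link/Eval/Update discipline can be sketched with a simplified, uncompressed Eval that walks the link path each time; it is functionally faithful to the description above (maximum-label node on the link path) but omits the path compression that yields the O(Eα(E)) bound. Names and encoding are illustrative assumptions:

```python
# Illustrative, simplified sketch of the Link/Eval/Update primitives.
class LinkEval:
    def __init__(self, nodes, dfn):
        self.link = {x: None for x in nodes}      # link(X) = Lambda initially
        self.label = {x: -dfn[x] for x in nodes}  # Label(X) = -dfn(X)

    def do_link(self, y, target):                 # Link(Y, target)
        assert self.link[y] is None               # each node linked at most once
        self.link[y] = target

    def update(self, x, value):                   # Update(X, dfn(X))
        self.label[x] = value

    def eval(self, y):                            # Eval(Y), uncompressed walk
        best, x = y, y
        while self.link[x] is not None:
            x = self.link[x]
            if self.label[x] > self.label[best]:
                best = x
        return best
```

Because included nodes carry positive labels dfn(X) and excluded nodes negative labels −dfn(X), the maximum-label node is either the included node closest to Y or, failing that, the highest node on the link path, matching the two cases of Lemma 4.2.3.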
NodeCount ← 0
foreach (X ∈ Nf) do
    T(X) ← false ;
    Φ(X) ← false ;
    Map(X) ← X
od
foreach (X ∈ Nf) do
    link(X) ← Λ ;
    Label(X) ← −dfn(X)
od
Cnum ← 0
for n = N downto 1 do
    foreach ({Z | dfn(idom(Z)) = n}) do
        if (Cnum(Z) = 0) then
            call Visit(Z)
        fi
    od
    foreach ({Z | dfn(idom(Z)) = n}) do
        if (Z = Map(Z)) then
            call Link(Z, idom(Z))
        else
            call Link(Z, Map(Z))
        fi
    od
od
foreach (X ∈ Nf) do
    link(X) ← Λ ;
    Label(X) ← −dfn(X)
od
for n = N downto 1 do
    foreach ({Z | dfn(idom(Z)) = n}) do
        call Finish(Z)
    od
    foreach ({Z | dfn(idom(Z)) = n}) do
        if (Z = Map(Z)) then
            call Link(Z, idom(Z))
        else
            call Link(Z, Map(Z))
        fi
    od
od

Procedure IncludeNode(X)
T(X) ← true
call Update(X, dfn(X))
end

Fig. 5. The faster algorithm.

LEMMA 4.2.2. When Update(X, dfn(X)) is issued, link(X) = Λ.

PROOF. The procedure IncludeNode(), which invokes Update(X, dfn(X)), is invoked only from Visit(). Since the Link step for iteration n has not yet executed for any node considered by Visit() in that iteration, each such node has link(X) = Λ. ∎

We now show that the path-compressing version produces the same output as its slower version.
LEMMA 4.2.3. As invoked during the course of their respective algorithms, each implementation of FindSnode(Y, P) returns the same answers.

PROOF.
—By inspection, FindSnode(Y, P) of Figure 3 begins at node Y, searching the dominator tree path from Y up to but excluding P. The function returns Map(X), where X is the ancestor of Y closest to Y whose representative is already in the sparse graph; if there is no such ancestor, the returned node is the representative of the node on the path immediately below P.
—We now argue that Eval(Y) exactly simulates this behavior. First, notice that each node X begins with label −dfn(X); the label is changed to dfn(X) if X is included in the sparse graph. Each member of a strongly connected component is linked to its representative node, while the representative is linked to its immediate dominator; thus the links established prior to FindSnode(Y, P) cover the path between Y and P, excluding P. There are two cases to consider:
(1) If some node on the path between Y and P (excluding P) has been included in the sparse graph, then its label is positive, and therefore higher than the label of any node not included. Since depth-first numbers increase down the path, the node of maximum label returned by Eval(Y) will be the included node closest to Y. Applying Map() once, its representative is returned.
(2) If no node on the path has been included in the sparse graph, then every node on the path carries its negative label −dfn(X), and the node of maximum label is the node of minimum depth-first number: the node of the path just below P. Applying Map() once, its representative is returned.
Thus, the two versions compute the same results. ∎

THEOREM 4.2.4. The asymptotic complexity of the path-compressing version of our algorithm is almost linear.

PROOF. The algorithm calls FindSnode() O(E) times, and each call performs one Eval(). By Tarjan [1979], a sequence of O(E) Link(), Eval(), and Update() operations takes O(Eα(E)) time. The proof thus follows from Theorem 4.1.1. ∎
5. EXPERIMENTS

Although we have described flow graphs where the worst-case quadratic behavior of the usual algorithm [Cytron et al. 1991] does occur, previous experience has indicated that this behavior is not expected in practice. In this section, we present results from experiments on real programs intended to answer the following questions:
Fig. 6. Speedup of our algorithm vs. execution time (in milliseconds) of the usual algorithm.

(1) How does the performance of our algorithm compare with the usual algorithm on typical programs?
(2) How big must a flow graph be before our algorithm beats the usual algorithm?

In an experiment that compares the speed of our algorithm (with the necessary path-compression) against the speed of the usual algorithm [Cytron et al. 1991], we performed the potentially quadratic part of SSA construction (i.e., determining where φ-functions must be placed, given the initial set of nodes) on 139 Fortran procedures. Most of these procedures were taken from the Perfect benchmark suite [Berry et al. 1988] (Ocean, Spice, QCD), the Eispack library [Smith et al. 1976], and Linpack [Dongarra et al. 1979]; they represent the expected behavior of the algorithms on typical programs. The experiments were run on a SparcStation 10. In Figure 6 we compare the speedup obtained by our algorithm with the execution time (in milliseconds) of the usual algorithm.

Most of these runs took under 1 second of execution time, and the usual algorithm did not exhibit any worst-case behavior: the expected behavior of both algorithms in practice is linear, and both algorithms are fast on typical programs. Though we have not implemented the asymptotically more efficient balanced (toward constant) path-compression, the simple path-compression represented in these runs behaved linearly in practice.

Finally, we also experimented with a series of increasingly taller "ladder" graphs of the form shown in Figure 1, where worst-case behavior is expected. Our algorithm demonstrated better performance when these
of in-
worst-case when
these
Efficiently Computing ladder
graphs
time
had
10 or more
Our
algorithm
anyway.
graph
of 75 rungs,
taking
rungs,
though
is twice
smaller
as fast
20 milliseconds
@Nodes On-The-Fly graphs
as the
while
take
usual
the
.
scant
algorithm
usual
algorithm
to the
usual
505
execution
for
a ladder
took
40 milli-
seconds. In summary, —linear
performance
test —a
comparison
Thus,
performance
better
evidence
In this
article
puting
dominance
form.
some
indicates
bound
artificially
comparable
although
shows our median
of 3;
generated
expected
cases.
performance
using
our
associated
with
cases,
Future
work will
6. CONCLUSIONS AND FUTURE WORK

In this article we have developed a simple algorithm for determining the conditions under which a node is a φ-node when constructing SSA form or Sparse Evaluation Graphs. By directly determining whether a node Y is in the iterated dominance frontier of a given node X, rather than constructing dominance frontiers and determining φ-nodes by iterating through them [Cytron et al. 1991], we have eliminated the potentially costly step of computing dominance frontiers, whose total size has a worst-case bound of O(N²). Our preliminary experimental results compare the performance of our algorithm with that of the usual algorithm: performance is comparable for the same cases as the usual algorithm, and the artificially generated worst case exhibited a factor-of-2 degradation of the usual algorithm relative to ours.

Recently, Sreedhar and Gao [1994] have developed an elegant algorithm that determines φ-nodes in linear time. Their algorithm uses a remarkably simple "DJ-graph" structure instead of the balanced path-compression structure of our algorithm, and computes φ-nodes in the usual "forward" manner. Though our algorithm may be slower in some cases, it may be more useful in situations that include backward inclusion properties of dominance frontiers (given a node Y, determining the nodes X such that Y is in DF(X)), and in situations where the set of φ-nodes must be incrementally updated, as when may-alias information is incorporated into SSA form [Cytron and Gershbein 1993], a process that introduces new φ-nodes in response to nodes already included in the SSA form. Future work will include determining how our algorithm behaves in such situations.
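For contrast with the direct approach, the usual placement method iterates precomputed dominance frontiers with a worklist. A minimal sketch of that iteration (our own Python; the DF sets are supplied directly for a small diamond-shaped CFG rather than computed):

```python
# Sketch: phi-placement by iterating dominance frontiers with a worklist.
# DF sets are given directly for a diamond CFG: entry -> A; A -> B, C;
# B -> D; C -> D.  Only B and C have a non-empty frontier, namely {D}.
DF = {"entry": set(), "A": set(), "B": {"D"}, "C": {"D"}, "D": set()}

def place_phis(df, assign_nodes):
    """Iterated dominance frontier DF+ of the assignment nodes."""
    phis, work = set(), list(assign_nodes)
    while work:
        x = work.pop()
        for y in df[x]:
            if y not in phis:
                phis.add(y)       # y needs a phi-function
                work.append(y)    # ...which is itself a new assignment
    return phis

# A variable assigned in both arms of the diamond merges at D:
print(place_phis(DF, {"B", "C"}))  # {'D'}
```

The worklist re-enqueues each placed φ-node because a φ-node is itself a new assignment; DF+ is the fixed point of this process, and the direct approach answers the membership question without materializing the DF sets at all.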
Acknowledgements

We thank Dirk Grunwald and Harini Srinivasan for their suggestions on a preliminary version. We are grateful to our referees, especially the one who found a mistake in our proofs.
REFERENCES

AHO, A., HOPCROFT, J., AND ULLMAN, J. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass.

AHO, A., SETHI, R., AND ULLMAN, J. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Mass.

ALPERN, B., WEGMAN, M. N., AND ZADECK, F. K. 1988. Detecting equality of variables in programs. In Conference Record of the 15th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 1-11.

BERRY, M., CHEN, D., KOSS, P., KUCK, D., LO, S., PANG, Y., ROLOFF, R., SAMEH, A., CLEMENTI, E., CHIN, S., SCHNEIDER, D., FOX, G., MESSINA, P., WALKER, D., HSIUNG, C., SCHWARZMEIER, J., LUE, K., ORSZAG, S., SEIDL, F., JOHNSON, O., SWANSON, R., GOODRUM, R., AND MARTIN, J. 1988. The Perfect club benchmarks: Effective performance evaluation of supercomputers. Tech. Rep. 827, Center for Supercomputing Research and Development, Univ. of Illinois, Urbana, Ill.

CHOI, J.-D., BURKE, M., AND CARINI, P. 1993. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In Conference Record of the 20th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 232-245.

CHOI, J.-D., CYTRON, R., AND FERRANTE, J. 1991. Automatic construction of sparse data flow evaluation graphs. In Conference Record of the 18th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 55-66.

CHOI, J.-D., CYTRON, R., AND FERRANTE, J. 1994. On the efficient engineering of ambitious program analysis. IEEE Trans. Softw. Eng. 20, 2, 105-114.

CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R. L. 1990. Introduction to Algorithms. MIT Press, Cambridge, Mass.

CYTRON, R., FERRANTE, J., ROSEN, B. K., WEGMAN, M. N., AND ZADECK, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13, 4 (Oct.), 451-490.

CYTRON, R. AND GERSHBEIN, R. 1993. Efficient accommodation of may-alias information in SSA form. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 36-45.

CYTRON, R., LOWRY, A., AND ZADECK, F. K. 1986. Code motion of control structures in high-level languages. In Conference Record of the 13th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 70-85.

DONGARRA, J. J., BUNCH, J., MOLER, C. B., AND STEWART, G. W. 1979. Linpack Users' Guide. SIAM, Philadelphia, Pa.

JOHNSON, R. AND PINGALI, K. 1993. Dependence-based program analysis. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 78-89.

MARLOWE, T. J. 1989. Data flow analysis and incremental iteration. Ph.D. thesis, Dept. of Computer Science, Rutgers Univ., New Brunswick, N.J.

ROSEN, B. K., WEGMAN, M. N., AND ZADECK, F. K. 1988. Global value numbers and redundant computations. In Conference Record of the 15th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 12-27.

SMITH, B. T., BOYLE, J. M., DONGARRA, J. J., GARBOW, B. S., IKEBE, Y., KLEMA, V. C., AND MOLER, C. B. 1976. Matrix Eigensystem Routines: Eispack Guide. Springer-Verlag, Berlin.

SREEDHAR, V. AND GAO, G. 1994. Computing φ-nodes in linear time using DJ-graphs. Tech. Rep. ACAPS Memo 75, School of Computer Science, McGill Univ., Montreal, Quebec, Canada.

TARJAN, R. 1979. Applications of path compression on balanced trees. J. ACM 26, 4 (Oct.), 690-715.

WEGMAN, M. N. AND ZADECK, F. K. 1991. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. 13, 2 (Apr.), 181-210.

Received October 1993; revised October 1994; accepted November 1994