Efficiently Computing φ-Nodes On-The-Fly

RON K. CYTRON
Washington University in St. Louis
and
JEANNE FERRANTE
University of California at San Diego

Recently, Static Single-Assignment (SSA) form and Sparse Evaluation Graphs have been advanced for the efficient solution of program optimization problems. Each method is provided initially with a set of flow graph nodes that inherently affect a problem's solution. Other relevant nodes are those where disparate solutions to the problem must combine. Previously, these so-called φ-nodes were found by computing the iterated dominance frontiers of the initial set of nodes, a process that could take worst-case quadratic time with respect to the input flow graph. In this article we present an almost-linear algorithm for determining exactly the same set of φ-nodes.

Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs and Features—control structures; D.3.4 [Programming Languages]: Processors—compilers; optimization; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems—computations on discrete structures; G.2.2 [Discrete Mathematics]: Graph Theory—graph algorithms; I.1.2 [Algebraic Manipulation]: Algorithms—analysis of algorithms; I.2.2 [Artificial Intelligence]: Automatic Programming—program transformation

General Terms: Algorithms, Languages, Performance, Theory

Additional Key Words and Phrases: Static Single-Assignment (SSA) form

1. MOTIVATION AND BACKGROUND

Static Single-Assignment (SSA) form [Cytron et al. 1991] and the more general

Sparse Evaluation Graphs (SEG) [Choi et al. 1991] have emerged as an efficient mechanism for solving compile-time optimization problems via data flow analysis [Alpern et al. 1988; Choi et al. 1993; 1994; Cytron et al. 1986; Rosen et al. 1988; Wegman and Zadeck 1991]. Given an input data flow framework, the SEG construction

algorithm distills the flow graph into a set of relevant nodes. These nodes are then appropriately interconnected with edges suitable for evaluating data flow solutions. Similarly, SSA form identifies those variable

A preliminary version of this article appeared in the Proceedings of the 6th Annual Workshop on Languages and Compilers for Parallel Computing, 1993. At that time, both authors were Research Staff Members at the IBM T. J. Watson Research Center, Yorktown Heights, New York. Ron K. Cytron is funded by the National Science Foundation under grant CCR-9402883.
Authors' addresses: R. K. Cytron, Department of Computer Science, Washington University, St. Louis, MO 63130-4899; email: cytron@cs.wustl.edu; J. Ferrante, Department of Computer Science and Engineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0114; email: ferrante@cs.ucsd.edu.
© 1995 ACM 0164-0925/95/0500-0487 $03.50

ACM Transactions on Programming Languages and Systems, Vol. 17, No. 3, May 1995, Pages 487–506.

488 · Ron K. Cytron and Jeanne Ferrante

references that are relevant to solving certain data flow problems. The SEG and SSA algorithms take the following structures as input: Graph

Flowgraph. A flow graph Gf has nodes Nf, edges Ef, and root node Entry; we assume Gf is connected. If (X, Y) ∈ Ef, then we write X ∈ Pred(Y) and Y ∈ Succ(X).

Depth-first numbering. Let dfn(X) be the number of X in a depth-first search of Gf, 1 ≤ dfn(X) ≤ |Nf|, with DFST the associated depth-first spanning tree.

Dominator tree. We say that X dominates Y, written X ≫ Y, if X appears on every path of the flow graph from Entry to Y; domination is both reflexive and transitive. We say that X strictly dominates Y if X ≫ Y and X ≠ Y. Each node X other than Entry has a unique immediate dominator idom(X): idom(X) strictly dominates X, and every W that strictly dominates X satisfies W ≫ idom(X). The immediate dominator mapping serves as the parent relation of the flow graph's dominator tree. (An example flow graph and its dominator tree are shown in Figure 1.)

Initial sparse nodes. Nα ⊆ Nf, an initial set of flow graph nodes, provided as a property of the input data flow framework. For SSA, such nodes contain definitions of variables; for an SEG, such nodes represent essentially nonidentity transference in the framework.

The output of the SEG and SSA construction algorithms is the set Nφ of φ-function nodes, which we compute exactly. Previous SEG and SSA construction algorithms proceed as follows:

(1) The algorithm precomputes the dominance frontier of each node X:

        DF(X) = { Z | (∃(Y, Z) ∈ Ef) X ≫ Y and X does not strictly dominate Z }.

    In other words, X dominates some predecessor of Z, but X does not strictly dominate Z itself. The dominance frontier is associated with the flow graph representation.

(2) The algorithm computes the iterated dominance frontier DF+(Nα) of the initial set of nodes.

(3) Each node of Nα ∪ DF+(Nα) is placed in the sparse graph, with DF+(Nα) exactly the set Nφ of φ-nodes.

In the first case, dominators have decreasing depth-first numbers. The second case is when X and Y are siblings in the dominator tree. No order is determined by the "typical case," since here both nodes have the same immediate dominator. We define the equidominates of a node X as those nodes with the same immediate dominator as X:

        equidom(X) = { Y | idom(Y) = idom(X) }.

For example, in Figure 1, equidom(A) = { A, V, W, X, Y, Z }. More generally, the noun equidominates refers to any such set of nodes. Our algorithm essentially partitions equidominates into blocks of nodes that are in each other's iterated dominance frontiers, but without the expense of explicitly computing dominance frontiers.
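The equidominate classes can be read directly off the immediate-dominator mapping. The following minimal Python sketch (node names and the `equidominates` helper are our own, not the paper's) groups nodes by their idom:

```python
from collections import defaultdict

def equidominates(idom):
    """Group nodes into equidominate classes, i.e. nodes sharing an
    immediate dominator.  `idom` maps each non-root node to its
    immediate dominator."""
    classes = defaultdict(set)
    for node, parent in idom.items():
        classes[parent].add(node)
    # equidom(X) is the class containing X
    return {node: classes[parent] for node, parent in idom.items()}

# Hypothetical dominator tree: R immediately dominates A, V, W;
# A immediately dominates B.
idom = {"A": "R", "V": "R", "W": "R", "B": "A"}
eq = equidominates(idom)
assert eq["A"] == {"A", "V", "W"}
assert eq["B"] == {"B"}
```

A single pass over the idom mapping suffices, so grouping equidominates is linear in the number of nodes.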


Algorithm: Sparse node determination (simple)

    NodeCount ← 0
    foreach (X ∈ Nf) do
        T(X) ← false ; φ(X) ← false ; Map(X) ← X
    od
    Cnum ← 0
    for n = N downto 1 do
        foreach ({ Z | dfn(idom(Z)) = n }) do
            if (Cnum(Z) = 0) then call Visit(Z) fi
        od
    od
    for n = N downto 1 do
        foreach ({ Z | dfn(idom(Z)) = n }) do
            call Finish(Z)
        od
    od
    end

    Function FindSnode(Y, P) : node
    for (X = Y) repeat (X = idom(X)) do
        if (T(Map(X)) or (idom(X) = P)) then
            return (Map(X))
        fi
    od
    end

    Procedure IncludeNode(X)
    T(X) ← true
    end

    Procedure Finish(Z)
    foreach ((Y, Z) ∈ Ef | Y ≠ idom(Z)) do
        s ← FindSnode(Y, idom(Z))
        if (T(s)) then φ(Z) ← true fi
    od
    if (T(Map(Z))) then call IncludeNode(Z) fi
    end

Fig. 3. Sparse node determination (simple).

It is the iterated dominance frontier relation between equidominates that determines membership in the sparse node set. If the equidominates are not connected by control flow edges, then no dominance frontier relation holds between them; if control flow edges are present, then the dominance frontier relation between equidominates is determined by those edges. Our algorithm determines which equidominates are in the same strongly connected component (SCC) of the dominance frontier relation DF+, and for each node X finds a representative node of its SCC, Map(X). A node is then placed in Nφ iff one of the nodes of its SCC is. The membership of all nodes in SCCs is determined here in reverse topological order, which is the correct order to follow.

Our algorithm uses a well-known algorithm [Aho et al. 1974] to find the strongly connected components of the dominance frontier graph (in which an edge from
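The underlying SCC computation is the textbook one. The Python sketch below shows only the representative-finding part (the analogue of Map), not the paper's interleaved Visit, which additionally decides sparse-node membership while the SCCs are found; node names are hypothetical:

```python
def tarjan_scc(succ):
    """Textbook Tarjan SCC computation: returns a map from each node to a
    representative of its strongly connected component.  Components are
    completed (popped) in reverse topological order, which is the order
    the sparse-node algorithm needs.  `succ` maps a node to its
    successors in the (dominance frontier) graph."""
    cnum, low, rep = {}, {}, {}
    stack, on_stack, counter = [], set(), [0]

    def visit(z):
        counter[0] += 1
        cnum[z] = low[z] = counter[0]
        stack.append(z); on_stack.add(z)
        for s in succ.get(z, ()):
            if s not in cnum:
                visit(s)
                low[z] = min(low[z], low[s])
            elif s in on_stack:
                low[z] = min(low[z], cnum[s])
        if low[z] == cnum[z]:            # z is the root of an SCC
            while True:
                q = stack.pop(); on_stack.discard(q)
                rep[q] = z               # z plays the role of Map(q)
                if q == z:
                    break

    for node in succ:
        if node not in cnum:
            visit(node)
    return rep

rep = tarjan_scc({"X": ["Y"], "Y": ["X"], "Z": ["X"]})
assert rep["X"] == rep["Y"]   # X and Y form one SCC
assert rep["Z"] != rep["X"]   # Z is its own component
```

Each node and edge is handled a constant number of times, so the SCC phase itself is linear in the size of the graph it is given.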

    Procedure Visit(Z)
    LL(Z) ← Cnum(Z) ← ++NodeCount
    call push(Z)
    foreach ((Y, Z) ∈ Ef | Y ≠ idom(Z)) do
        s ← FindSnode(Y, idom(Z))
        if (idom(s) ≠ idom(Z)) then
            call IncludeNode(Z)
        else
            if (Cnum(s) = 0) then
                call Visit(s)
                LL(Z) ← min(LL(Z), LL(s))
            else
                if ((Cnum(s) < Cnum(Z)) and OnStack(s)) then
                    LL(Z) ← min(LL(Z), Cnum(s))
                fi
            fi
            if (T(s) and not OnStack(s)) then
                call IncludeNode(Z)
            fi
        fi
    od
    if (LL(Z) = Cnum(Z)) then
        repeat
            Q ← pop()
            Map(Q) ← Z
            if (Q ∈ Nα) then call IncludeNode(Q) fi
            if (T(Q)) then call IncludeNode(Z) fi
        until (Q = Z)
    fi
    end

Fig. 4. Procedure Visit.

Y to Z implies Z ∈ DF(Y)). To do this, the algorithm uses auxiliary data structures in the manner of [Aho et al. 1974]. Briefly, Cnum(X) is the number assigned to X by procedure Visit in this depth-first visit of the dominance frontier graph (0 initially), as opposed to dfn(X), which is assigned by depth-first search of the flow graph. LL(X) refers to the "lowlink" of X, used to determine whether a node is a root of an SCC; a stack holds the nodes of SCCs not yet completed, and OnStack(X) tells whether X is currently on that stack.

We now give an overview of the main algorithm. The main procedure initializes the data structures and then visits each node in reverse depth-first order of immediate dominators, calling Visit for each node not already visited (by a recursive call from Visit). Procedure Visit essentially finds the SCCs of the dominance frontier graph in this order. After all SCCs have been determined, the main procedure calls Finish for each node to determine membership in Nφ.

494 · Ron K. Cytron and Jeanne Ferrante

Central to the success of our algorithm is the function FindSnode(Y, P), the heart of our implementation. For each control flow edge (Y, Z) of the input graph with Y ≠ idom(Z), the algorithm calls FindSnode(Y, idom(Z)): the dominator tree is searched, ascending from node Y toward P = idom(Z), for the first node already determined to be in DF+ of the sparse nodes (accomplished directly by consulting T and Map, which keep track of the nodes placed in the sparse graph by calls to IncludeNode). The search thus considers each node on the dominator tree path from Y, rather than the actual flow graph predecessors of Z.

The correctness of our algorithm (Section 3) relies on the following property of FindSnode: each node X considered by a call FindSnode(Y, idom(Z)) satisfies dfn(idom(X)) ≥ dfn(idom(Z)). Our correctness proofs in Section 3 are based on the algorithm as presented in Figures 3 and 4; although this version is correct, its behavior may be quadratic rather than almost-linear. To obtain the faster time bound, we modify FindSnode as discussed in Section 4.

We now illustrate the behavior of the algorithm on the flow graph of Figure 1, assuming Nα = { B }. Loop ❑ considers nodes in order of decreasing depth-first number of their immediate dominators; for the nodes immediately dominated by B (depth-first number 10 in Figure 1), Visit is called in turn on each node not already visited.

—When Visit works on D, loop ❑ considers the predecessors of D. Since none of the relevant nodes have been processed, no steps are taken, and Map(D) is set to D.
—When Visit works on Y, FindSnode returns B; since B and Y are not equidominates, Y is placed in Nφ by ❑.
—When Visit works on X, FindSnode returns Y for the edge from Y; since X and Y are equidominates and Y has already been placed in Nφ, X is placed in Nφ. Loop ❑ then considers node W, on which Visit is called recursively.

Efficiently Computing φ-Nodes On-The-Fly · 495

—For W, FindSnode returns X, which is already on the stack, so nothing happens. Visit is then called recursively for V, where FindSnode again returns a node already visited and on the stack, so nothing happens. When the component is popped, Map(V) and Map(X) are both set to W. Since X ∈ Nφ, W is placed in Nφ.
—For A, FindSnode returns A; since A is not in Nφ, nothing happens, and Map(A) becomes A.

When loop ❑ considers Z, Visit is called for Z; nodes W, X, D, and V have each already been visited, and the predecessors D and Y of Z are considered:
—For D, FindSnode returns B, so Z is placed in Nφ.
—For Y, FindSnode returns Y, which has already been placed in Nφ.
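The dominator-tree ascent performed by FindSnode can be sketched as plain code. This is a simplified stand-in for the figure's pseudocode (hypothetical node names, and without the on-the-fly marking that Visit performs), kept only to make the walk explicit:

```python
def find_snode(y, p, idom, marked, rep):
    """Sketch of FindSnode(Y, P): walk up the dominator tree from Y,
    returning the representative (rep, playing the role of Map) of the
    first node whose representative is already marked (T), or stopping
    once the walk reaches a child of P.  Without path compression this
    walk is what makes the simple algorithm quadratic in the worst case."""
    x = y
    while True:
        if marked.get(rep.get(x, x), False) or idom.get(x) == p:
            return rep.get(x, x)
        x = idom[x]

# Hypothetical dominator-tree chain R -> A -> B -> C.
idom = {"C": "B", "B": "A", "A": "R"}
rep = {}                   # every node is its own representative here
marked = {"B": True}       # B already known to be in the sparse graph
assert find_snode("C", "R", idom, marked, rep) == "B"
```

Each call may climb the whole dominator-tree path, which motivates the path-compressed variant of Section 4.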

In contrast, previous methods [Choi et al. 1991; Cytron et al. 1991] would have constructed the dominance frontier sets of all nodes, shown in Figure 2. Those sets are not only asymptotically quadratic in size, but the iterated dominance frontier of Nα would also have been computed through a worklist of the nodes visited, namely { E, B, V, W, X, Y, Z }.

3. CORRECTNESS

The node-ordering observations of Section 2 are proven in Theorems 3.5, 3.6, and 3.7, and the correctness of our node ordering follows as Corollary 3.8. These results about dominator trees, depth-first spanning trees, and dominance frontiers are presented separately, since they may be useful in different contexts. Relying on them, Theorems 3.18, 3.19, and 3.20 then establish that the algorithm of Section 2 computes the correct set of φ-nodes. The main steps of the proof are the following:

—Lemma 3.11: The Map structure relates only equidominates.
—Lemma 3.16: Two nodes have the same Map value only if one is in the iterated dominance frontier of the other; a node's Map value thus identifies a representative of a set of equidominates in the same strongly connected component.
—Theorem 3.18: The nodes in the iterated dominance frontiers of the sparse nodes are correctly identified.
—Theorem 3.19: When the algorithm terminates, the φ-nodes have been correctly identified.

Theorem 3.20 is then easily established: the algorithm correctly identifies the sparse nodes.

LEMMA 3.1. (Y, Z) ∈ Ef ⟹ idom(Z) ≫ Y.
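The prior worklist construction being contrasted here can be sketched directly from its definition; the frontier sets and node names below are hypothetical, and `phi_nodes` is our own name for the standard iterated-dominance-frontier worklist:

```python
def phi_nodes(df, n_alpha):
    """Place phi-nodes at the iterated dominance frontier DF+ of the
    initial nodes, via the standard worklist.  `df` maps each node to
    its dominance frontier set; `n_alpha` is the initial node set."""
    phi, worklist = set(), list(n_alpha)
    while worklist:
        x = worklist.pop()
        for z in df.get(x, ()):
            if z not in phi:
                phi.add(z)
                worklist.append(z)  # DF+ iterates: z's frontier matters too
    return phi

# Hypothetical frontiers: DF(B) = {Y}, DF(Y) = {Y, Z}.
df = {"B": {"Y"}, "Y": {"Y", "Z"}}
assert phi_nodes(df, {"B"}) == {"Y", "Z"}
```

The worklist itself is cheap, but it consumes precomputed DF sets, and it is those sets that can be quadratic in the size of the flow graph.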


PROOF. Suppose not. Then there exists a path P : Root →* Y that does not contain idom(Z). If path P is extended to Z by the edge (Y, Z), the resulting path still does not contain idom(Z), a contradiction. ∎

COROLLARY 3.2. (Y, Z) ∈ Ef ⟹ dfn(Y) ≥ dfn(idom(Z)).

PROOF. By Lemma 3.1, idom(Z) ≫ Y, and a dominator of Y is an ancestor of Y in DFST. ∎

COROLLARY 3.3. (Y, Z) ∈ Ef and Y ≠ idom(Z) ⟹ dfn(idom(Y)) ≥ dfn(idom(Z)).

PROOF. By Lemma 3.1 we obtain idom(Z) ≫ Y; since Y ≠ idom(Z), idom(Z) strictly dominates Y, and thus either idom(Z) = idom(Y) or idom(Z) strictly dominates idom(Y). In either case idom(Z) is an ancestor of idom(Y) in DFST. ∎

THEOREM 3.4. Z ∈ DF(X) ⟹ idom(Z) ≫ X and idom(Z) ≠ X.

PROOF. Since Z ∈ DF(X), there exists an edge (Y, Z) ∈ Ef such that X ≫ Y, and X cannot strictly dominate Z; in particular X ≠ idom(Z). By Lemma 3.1, idom(Z) ≫ Y. Since X and idom(Z) both dominate Y, one must dominate the other. If X strictly dominated idom(Z), then X would strictly dominate Z, a contradiction. Therefore idom(Z) ≫ X. ∎

THEOREM 3.5. Z ∈ DF(X) ⟹ dfn(X) ≥ dfn(idom(Z)).

PROOF. By Theorem 3.4, idom(Z) ≫ X, and a dominator of X is an ancestor of X in DFST. ∎

THEOREM 3.6. Suppose Z ∈ DF(X) and idom(Z) ≠ idom(X). Then dfn(idom(X)) > dfn(idom(Z)).

PROOF. By Theorem 3.4, idom(Z) ≫ X and idom(Z) ≠ X, so idom(Z) strictly dominates X and hence idom(Z) ≫ idom(X). Since idom(Z) ≠ idom(X), idom(X) is a proper descendant of idom(Z) in DFST, and therefore has a higher depth-first number, proving the theorem. ∎

THEOREM 3.7. Suppose X ≫ Y and Z ∈ DF(Y). Then either Z ∈ DF(X) or X strictly dominates Z.

PROOF. Since Z ∈ DF(Y), there exists an edge (W, Z) ∈ Ef with Y ≫ W. Then X ≫ W, since X ≫ Y and domination is transitive. If X does not strictly dominate Z, then by definition Z ∈ DF(X), proving the theorem. ∎

COROLLARY 3.8. Each node X considered by a call FindSnode(Y, idom(Z)) satisfies dfn(idom(X)) ≥ dfn(idom(Z)).

PROOF. Each such X lies on the dominator tree path from Y to idom(Z), excluding idom(Z) itself, so idom(Z) ≫ X. The argument of Corollary 3.3 then yields dfn(idom(X)) ≥ dfn(idom(Z)). ∎

With these properties in place, the following lemmas formalize those properties of our algorithm that follow directly from inspection of the algorithm.

algorithm. LEMMA

3.9.

LEMMA

3.10.

is called exactly

Visit During

all

calls

once for each node in Flowgraph

to Visit,

each

node

is pushed

and popped

exactly

once. PROOF.

exactly then

By

once.

Lemma

We

Z is popped

Z belongs

by

the

3.11.

Y

PROOF.

Since

initially

algorithm.

=

X

are

popped

set

stack,

call,

Z is pushed

If Z

pushed.

by H,

H

Map(Z),

=

Otherwise, # Z,

in which



= idom(X). the

is only

off the

Z was

represented

idom(Y)

Map(X),

=

Map(X)

on this once.

H is pushed.

in which =+

once; exactly

in which

component

iteration

Map(X)

exactly

Z is popped of Visit

connected

Otherwise,

equidominates

that

invocation

by the

LEMMA

is invoked

to show

to a strongly

case Z is popped

Visit(Z)

3.9,

need

lemma

holds

during

In this

the

case,

at

loop

the

at

we have

start

step

of

~,

the

when

idom(IMap(X)

) =



idom(X). LEMMA

At step ~,

3.12.

node s (referenced

at step ~)

of

s cannot

has already

been pushed

and popped. PROOF. Thus, s has

By

s either not

before

been

step

~

the

predicate

has

not

pushed

then

Crtum(s)

node or

=

Therefore,

Any

3.13.

~, yet,

O, but

s has

invocation

else

of

s has then

been

be

been step

on

~

pushed

stack

pushed would

and

IncludeNode(s)

at

and have

~. If

pushed

s



popped.

must

step

popped.

already

have

oc-

at step ~.

LEMMA

3.14.

OnStack(X)

LEMMA

3.15.

At

PROOF.

frontiers. LEMMA

PROOF. servation

x.

pushed

is reached.

COROLLARY

curred

step

been

Follows

and OnStack(Y)

step ~,

from

FindSnodeo

==+ idom(X)

returns

of FindSnodeo

inspection

= idom(Y).

s I s = Map(S),

Z 6 DF(S).

and the definition

of dominance

•l

Map(X)

3.16.

Follows that

from

Map(X)

= Map(Y)

==+ Y e DF+(X)

initialization represents

(Map(X) the

strongly

=

X),

or X = Y, Lemma

connected

3.15,

component

and

the

ob-

containing

❑ COROLLARY ACM

3.17.

s e N.

Transactions

+

Map(s)

on Programming

G Na. Langnages

and Systems,

Vol. 17, No. 3, May

1995.


THEOREM 3.18. T(X) ⟺ X ∈ Nα ∪ DF+(Nα).

PROOF. We actually prove a stronger result, by backward induction on n (following the progression of our algorithm over the sets { X | dfn(idom(X)) = n }):

IHA(n) ≡ T(X) ⟺ X ∈ Nα ∪ DF+(Nα), for each X with dfn(idom(X)) ≥ n.

Base Case. The node whose depth-first number is N is potentially childless in the depth-first spanning tree, so the set { X | dfn(idom(X)) = N } may be empty, and this case is trivially satisfied.

Inductive Step. T(X) can be set only in procedure IncludeNode, so we prove the two implications separately, considering the steps that can set T.

(⟹) Consider step ❑ of Visit(Z), whose predicate requires idom(s) ≠ idom(Z). By Lemma 3.15, FindSnode must have returned s = Map(S) with Z ∈ DF(S). From Lemma 3.11 and Corollary 3.8, and noting idom(s) ≠ idom(Z), we have dfn(idom(s)) > dfn(idom(Z)). Applying IHA(k) with n + 1 ≤ k ≤ N, we obtain T(s) ⟹ s ∈ Nα ∪ DF+(Nα), and thus Z ∈ DF+(Nα), so the call to IncludeNode at this step is correct.

Steps ❑, ❑, and ❑ are applied when a strongly connected component { Z1, Z2, ..., ZL }, all of whose members have the same immediate dominator, is popped. We name these nodes according to the order in which they emerge from the stack: node x1 is the first popped and node xL is the last. Accordingly, we define the predicate Popped(k), which is true when exactly the nodes x1, ..., xk have been popped, and prove the auxiliary hypothesis:

IHB(k) ≡ Popped(k) ∧ T(xi) ⟹ xi ∈ Nα ∪ DF+(Nα).

Base Case. Prior to the popping of x1, no node of the component has been popped, so steps ❑ and ❑ cannot affect any of the xi.

Inductive Step. By Lemma 3.14, all nodes currently on the stack are equidominates; by Lemma 3.12, step ❑ requires s to be an already pushed node, and the steps that set T do so correctly by IHB(k).

Efficiently Computing φ-Nodes On-The-Fly · 499

Step ❑. By Corollary 3.13 and Lemma 3.12, s has already been popped when this step applies, so IHB yields s ∈ Nα ∪ DF+(Nα), and the call to IncludeNode is again correct.

(⟸) We now show that X ∈ Nα ∪ DF+(Nα) ⟹ T(X). Suppose y ∈ DF(X) with Y = Map(y) and n > 0. There are the following cases:

—dfn(idom(y)) > dfn(idom(X)). By the induction hypothesis, T(X) holds before y is processed, so the call to FindSnode at step ❑ will return X; step ❑ then sets T(Y) true.
—dfn(idom(y)) = dfn(idom(X)). Two cases: either step ❑ sets T(y) true directly, or, when y is popped at step ❑, step ❑ sets T(y) true. In either case, when y is popped, step ❑ sets T(Y) true.


—X = Y. Then T(X) and T(Y) are set together.

Thus both implications hold. ∎

THEOREM 3.19. After the call to Finish(Z), φ(Z) ⟺ Z ∈ Nφ.

PROOF. By construction in Finish and in FindSnode,

φ(Z) ⟺ ∃(Y, Z) ∈ Ef, Y ≠ idom(Z), s = FindSnode(Y, idom(Z)), T(s),

and by Theorem 3.18, T(s) holds iff s ∈ Nα ∪ DF+(Nα). But Z ∈ DF(s) for such s, so φ(Z) holds iff Z ∈ DF+(Nα), which by definition of Nφ holds iff Z ∈ Nφ. ∎

THEOREM 3.20. After the calls to Finish, X ∈ Nα ∪ DF+(Nα) ⟺ T(X).

PROOF. X ∈ Nα ∪ DF+(Nα) ⟺ Map(X) ∈ Nα ∪ DF+(Nα) (by Corollary 3.17 and Lemma 3.16). Map(X) ∈ Nα ∪ DF+(Nα) ⟺ T(Map(X)) (by Theorem 3.18). But T(Map(X)) ⟺ T(X), by construction in Finish. ∎

4. COMPLEXITY

In this section, we first show that the total time for our algorithm is O(N + E + T), where N is the number of nodes and E the number of edges in the input flowgraph, and T is the total time for all calls to FindSnode. Unfortunately, T is not linear for our algorithm as written. We then show a faster version of our algorithm using path compression, with total complexity O(Eα(E)); the correctness results still hold, obtaining our desired almost-linear bound.

4.1 Analysis of Initial Algorithm

THEOREM 4.1.1. The algorithm of Figure 3 is O(N + E + T), where N is the number of nodes and E the number of edges in the input flowgraph, and T is the total time for all calls to FindSnode.

PROOF. The algorithm consists of
—an initialization phase,
—a phase where Visit is called exactly once for each node, and
—a call to Finish for each node.

We analyze the complexity of each of these phases. The initialization phase is O(N). Consider Visit(Z): outside any loop there is a constant amount of work, not counting the calls to FindSnode. The loop starting at step ❑ iterates over the predecessor edges of Z; over all calls to Visit, this loop is

Efficiently Computing φ-Nodes On-The-Fly · 501

executed O(E) times. The loop starting at step ❑ pops the stack contents; since each node is pushed exactly once, over all nodes this work is O(N). Finally, each invocation of Finish performs a constant amount of work for each predecessor edge of its node, plus the calls to FindSnode, so this work is O(E). Summing over all phases, and accounting for the O(E) calls to FindSnode separately, we obtain the desired result. ∎

We now analyze the cost of the calls to FindSnode in the algorithm of Figure 3. Each invocation of FindSnode(Y, P) ascends a path in the dominator tree, visiting each node from Y up to (but excluding) P, and could require work proportional to the depth of the dominator tree. For a simple flow graph whose dominator tree is a path from Entry, the O(E) calls to FindSnode could execute O(N²) work overall (in general, O(E × N)). As such, the asymptotic behavior of this version of our algorithm is not almost-linear.

4.2 Faster Algorithm

Using a path-compression algorithm due to Tarjan [1979], we rewrite certain parts of our algorithm to use the following instructions:

Link(Y, idom(Y)). Node Y now becomes included in the path-compressing search; the links established by this instruction initially ascend the dominator tree, linking node Y to its immediate dominator.

Eval(Y). The label associated with each node X is initially −dfn(X). Eval(Y) ascends the link path from Y, returning the node of maximum label.

Update(X, dfn(X)). This instruction is issued whenever a node X is added to the sparse graph; the label information associated with X is updated to dfn(X).

Changes to the slower version of the algorithm, whose path-compressing version is shown in Figure 5, are obtained as follows:

(1) We initialize link(X) and the label of each node at steps ❑ and ❑.
(2) Links are inserted at steps ❑ and ❑.
(3) Any node X added to the sparse graph at steps ❑ and ❑ also has its label updated, by the call to Update in IncludeNode.
(4) We redefine function FindSnode by:

    Function FindSnode(Y, P) : node
    return (Map(Eval(Y)))
    end
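The semantics of Link and Eval can be sketched without path compression. The helper names below are our own, and this naive version makes each Eval linear in the link-path length rather than amortized α(E); it shows only the maximum-label semantics the redefined FindSnode depends on:

```python
def make_forest():
    """Naive stand-ins for Tarjan-style Link/Eval: Link(y, x) makes x
    the link-parent of y; Eval(y) returns the node of maximum label on
    the link path from y to its root.  Tarjan's structure adds path
    compression for the almost-linear bound; this version only
    demonstrates the semantics."""
    parent, label = {}, {}

    def link(y, x):
        parent[y] = x

    def eval_(y):
        best = y
        while y in parent:                      # ascend the link path
            y = parent[y]
            if label.get(y, 0) > label.get(best, 0):
                best = y
        return best

    return link, eval_, label

link, eval_, label = make_forest()
label.update({"A": 3, "B": 1, "C": 2})          # hypothetical labels
link("C", "B"); link("B", "A")
assert eval_("C") == "A"   # A carries the maximum label on C's path
```

In the paper's setting, an unmarked node keeps a negative label −dfn(X) and a node added to the sparse graph gets the positive label dfn(X), so the maximum-label node found by Eval is exactly the node the slower FindSnode would stop at.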

We now state and prove required properties of the path-compressing version of our algorithm.

4.2.1 Proper Use of Path-Compressing Instructions. We first show that the Link(), Eval(), and Update() operations are used in a manner consistent with their definition [Tarjan 1979].

LEMMA 4.2.1. At steps ❑ and ❑, link(Z) = ⊥ just prior to invoking Link.

PROOF. Steps ❑ and ❑ initialize link(X) = ⊥ for each node X ∈ Nf. Since node Z has a unique representative Map(Z) and a unique immediate dominator idom(Z), and since the main loop applies Link to each node at most once, link(Z) = ⊥ just prior to applying Link at steps ❑ and ❑. Moreover, since n is a strictly decreasing sequence in the main loop, the labels along any link path are consistent with the definition of a "link-path" [Tarjan 1979]. ∎


    NodeCount ← 0
    foreach (X ∈ Nf) do
        T(X) ← false ; φ(X) ← false ; Map(X) ← X
    od
    foreach (X ∈ Nf) do
        link(X) ← ⊥ ; Label(X) ← −dfn(X)
    od
    Cnum ← 0
    for n = N downto 1 do
        foreach ({ Z | dfn(idom(Z)) = n }) do
            if (Cnum(Z) = 0) then call Visit(Z) fi
        od
        foreach ({ Z | dfn(idom(Z)) = n }) do
            if (Z = Map(Z)) then
                call Link(Z, idom(Z))
            else
                call Link(Z, Map(Z))
            fi
        od
    od
    foreach (X ∈ Nf) do
        link(X) ← ⊥ ; Label(X) ← −dfn(X)
    od
    for n = N downto 1 do
        foreach ({ Z | dfn(idom(Z)) = n }) do
            call Finish(Z)
        od
        foreach ({ Z | dfn(idom(Z)) = n }) do
            if (Z = Map(Z)) then
                call Link(Z, idom(Z))
            else
                call Link(Z, Map(Z))
            fi
        od
    od

    Procedure IncludeNode(X)
    T(X) ← true
    call Update(X, dfn(X))
    end

Fig. 5. The faster algorithm.

LEMMA 4.2.2. When Update(X, dfn(X)) is invoked at step ❑, node X has not yet executed its link: link(X) = ⊥.

PROOF. The procedure IncludeNode, which produces the call to Update(X, dfn(X)), is invoked only from Visit and Finish, each of which processes only nodes considered at the current step of the main loop; such a node has not yet had Link applied to it, so link(X) = ⊥. ∎

Correctness. We now show that the path-compressing version of the algorithm produces the same output as its slower version.

Efficiently Computing φ-Nodes On-The-Fly · 503

LEMMA 4.2.3. As invoked during the course of their respective algorithms, each implementation of FindSnode(Y, P) returns the same answers.

PROOF.
—By inspection, the slower version of FindSnode(Y, P) (Figure 3) returns Map(X), where X is the ancestor of Y (possibly Y itself) closest to Y such that T(Map(X)) holds or idom(X) = P.
—We now argue that the faster version exhibits exactly this behavior. First, notice that each node X begins with label −dfn(X); its label can be changed to dfn(X) only once X is included in the sparse graph, by step ❑. Each node is linked either to its immediate dominator or to its representative Map(X). There are two cases to consider:
(1) If some node on the dominator tree path between Y and P (excluding P) has already been included in the sparse graph, then its label is positive. Eval(Y) ascends the link path from Y and returns the node of maximum label, which will be the included node of maximum depth-first number: the one closest to Y. Applying Map() once, the representative of that node is returned.
(2) If no node on the path between Y and P (excluding P) has been included in the sparse graph, then every label on the path is negative, and the node of maximum label is the node of minimum depth-first number: the highest node on the path, whose immediate dominator is P. Applying Map() once, its representative is returned.
Thus, the two versions compute the same results. ∎

of the path-compressing

version

of our

linear.

4.2.4.

PROOF.

the

containing

Performance.

Theorem

in the

and

representative

link

label,

ensures

two

algorithm

each

to dfn(iMap(X)).

since

be labeled

returns

component

Thus,

the

the

negative

node,

is already

that

must

Eva/(Y) imum

again

link

to their

of maximum

P,

in the

node

ensures

node

numbered

node

representative

no

simulates

for any node

to that

node

is depth-first

such

considers

is considered,

function

at steps

Each

graph,

changed

the

excluding

returns

(2) If

path

returns

but

again

X

then

Eval(Y)

node

~

to be df n(X).

sparse

the

otherwise

are linked

~

then

at

to its dominator.

is in the

ing

and

nodes

included

established

Llap(X),

is linked

but

to

#

P)

of links

Y and

to P.

FindSnode(Y,

First,

Thus,

graph,

prior

at node

P. As each node

is already

sparse

of Y just

3 begins

node

if Map(X) in the

that

that

of Figure

excluding

Map(X)

is already

X

P)

up to but

0(~)

Tarjan

algorithm

calls

O(Ea(E))

FindSnodeo.

to

[1979].

takes

The

time. proof

thus

follows

from

El
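The almost-linear bound rests on Tarjan-style Link/Eval with path compression. The following Python sketch illustrates that machinery under the label convention used above (names are invented for illustration; this is not the article's implementation): Link attaches a node beneath an ancestor in a virtual forest, and Eval returns the maximum label on the path to the forest root, compressing the path as it goes.

```python
# Illustrative sketch (not the paper's code) of the Link/Eval machinery
# behind the O(E alpha(E)) bound: Link grows a virtual forest, and
# Eval(v) returns the maximum label on the path from v to its forest
# root, compressing the path as it goes (Tarjan 1979).

class LinkEval:
    def __init__(self, n):
        self.ancestor = list(range(n))  # forest parent; ancestor[v] == v means root
        self.label = [0] * n

    def set_label(self, v, lab):
        self.label[v] = lab

    def link(self, v, parent):
        # Attach the tree rooted at v beneath parent.
        self.ancestor[v] = parent

    def eval(self, v):
        # Maximum label on the path v .. root, with path compression:
        # every node on the path is re-pointed at the root, and its
        # label is folded to the max over its old ancestors, so later
        # queries touch each edge only a few times overall.
        path = []
        while self.ancestor[v] != v:
            path.append(v)
            v = self.ancestor[v]
        best = self.label[v]            # v is now the root
        for u in reversed(path):        # fold from the root downward
            best = max(best, self.label[u])
            self.label[u] = best
            self.ancestor[u] = v
        return best
```

With labels initialized to −dfn(X) and raised to dfn(Map(X)) when a node's representative is included, Eval then prefers included nodes, matching the case analysis in the proof of Lemma 4.2.3.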

5. EXPERIMENTS

Although we have described flow graphs where the worst-case quadratic behavior of the standard algorithms does occur [Cytron et al. 1991], previous experiments have indicated that this behavior is not expected in practice. In this section, we present results from experiments on real programs, intended to answer the following questions:


504 · Ron K. Cytron and Jeanne Ferrante

Fig. 6. Speedup of our algorithm vs. execution time of the usual algorithm.

(1) How does the performance of our algorithm compare with the performance of the usual algorithm on typical programs?

(2) How big must a contrived flow graph be before our asymptotically faster algorithm beats the usual algorithm?

As an experiment, we placed φ-functions in 139 Fortran procedures taken from the Perfect club benchmark suite [Berry et al. 1988] (Ocean, QCD), the Linpack library [Dongarra et al. 1979], Eispack [Smith et al. 1976], Spice, and a simple compression program. These procedures represent the usual suite of programs for which SSA form might be constructed. In Figure 6 we compare the speed of our algorithm (with path-compression) against the usual algorithm [Cytron et al. 1991]; both algorithms placed φ-functions in exactly the same locations. The execution times (in milliseconds) were obtained on a SparcStation 10 and reflect only the time necessary to compute the location of φ-functions. Because the algorithms are fast on this collection of programs, each of these runs took under 1 second; the runs did not exhibit any worst-case behavior of the usual algorithm, and the execution time of our algorithm was linear in practice, as expected. Though we have not implemented the theoretically more efficient balanced version of path compression [Tarjan 1979], improvements (toward a smaller constant) might be gained by doing so.

We also experimented with a series of increasingly taller "ladder" graphs of the form shown in Figure 1, where worst-case quadratic behavior of the usual algorithm is expected.
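The cost such graphs impose on the usual algorithm shows up directly in the dominance frontiers themselves. The sketch below is illustrative only: its graph family (a chain whose last node branches back to every earlier node) is a simple stand-in for the ladder graphs of Figure 1, not a reproduction of them, but it likewise makes the total frontier size quadratic, so any algorithm that materializes all frontiers does quadratic work.

```python
# Illustrative only: a graph family (not Figure 1's ladders) whose total
# dominance-frontier size is quadratic in the number of nodes.

def total_df_size(n):
    # Nodes 0..n: a chain 0 -> 1 -> ... -> n, plus back edges n -> i
    # for i = 2..n (the last node conditionally re-enters every
    # earlier point).
    preds = {i: [] for i in range(n + 1)}
    for i in range(n):
        preds[i + 1].append(i)
    for i in range(2, n + 1):
        preds[i].append(n)
    # On this chain-shaped graph the immediate dominator of i is i - 1.
    idom = {i: max(i - 1, 0) for i in range(n + 1)}
    # Standard "runner" computation of dominance frontiers: for each
    # join point y, walk each predecessor up to idom(y), adding y to
    # the frontier of every node passed along the way.
    df = {i: set() for i in range(n + 1)}
    for y, ps in preds.items():
        if len(ps) >= 2:
            for p in ps:
                runner = p
                while runner != idom[y]:
                    df[runner].add(y)
                    runner = idom[runner]
    return sum(len(s) for s in df.values())
```

For n = 75, the size of the largest ladder reported below, the frontiers already hold 2775 entries, while the flow graph itself has fewer than 2n edges.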

Our algorithm demonstrated better performance when these ladder graphs had 10 or more rungs, though smaller graphs take scant execution time anyway: for a ladder graph of 75 rungs, our algorithm is twice as fast as the usual algorithm, taking 20 milliseconds while the usual algorithm took 40 milliseconds.

In summary, our preliminary experiments show:

—linear performance of our algorithm in practice, comparable (within a median factor of 3) to the usual algorithm on a test suite of real programs;
—better performance than the usual algorithm on artificially generated graphs that exhibit its worst-case quadratic bound.

Thus, although our algorithm can be slower in some cases, the evidence indicates better performance using our algorithm precisely where the usual algorithm meets its expected worst cases.

6. CONCLUSIONS AND FUTURE WORK

In this article we have developed a simple algorithm that determines whether a given node Y is a φ-node, by directly determining whether Y is in the iterated dominance frontier DF⁺(X) of some node X in a given initial set of flow graph nodes. Our algorithm thereby eliminates the potentially costly, worst-case O(N²), step of computing dominance frontiers and iterating through them [Cytron et al. 1991], and instead computes the φ-nodes in almost-linear time. We have shown how to incorporate the algorithm into the construction of SSA form and Sparse Evaluation Graphs (SEGs), where it obtains the same results as the usual algorithm. Our preliminary experimental evidence compares the two: performance is comparable in many cases, with at worst a factor-of-2 degradation for the same cases as the usual algorithm, while our algorithm is asymptotically faster on the graphs that provoke the usual algorithm's worst-case behavior.

Recently, Sreedhar and Gao [1994] have developed an elegant algorithm that determines φ-nodes in linear time using a new structure called the "DJ-graph". Their algorithm computes the iterated dominance frontier of a given node directly and is remarkably simple. Because our algorithm includes nodes in a "forward" manner, it may be more useful in situations where φ-nodes must be determined incrementally, such as when may-alias information is updated during SSA construction [Cytron and Gershbein 1993]. Future work will include a comparison with the DJ-graph approach.
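For contrast, the usual two-step placement that our algorithm avoids can be sketched compactly: precompute dominance frontiers for every node, then take the iterated dominance frontier of the definition sites. The Python below is an illustrative sketch using the standard join-point "runner" formulation, not this article's (or Cytron et al.'s) actual code; the tiny diamond CFG is likewise invented for the example.

```python
# The "usual" two-step placement: (1) dominance frontiers for every
# node, (2) phi-nodes = iterated dominance frontier of the def sites.

def dominance_frontiers(succ, idom):
    # DF(x) = { y : x dominates a predecessor of y,
    #           but x does not strictly dominate y }
    df = {n: set() for n in succ}
    for y in succ:
        preds = [p for p in succ if y in succ[p]]
        if len(preds) >= 2:               # frontiers arise at join points
            for p in preds:
                runner = p
                while runner != idom[y]:  # walk up to, but not past, idom(y)
                    df[runner].add(y)
                    runner = idom[runner]
    return df

def place_phis(df, def_sites):
    # Iterated dominance frontier DF+ of the def sites, by worklist:
    # a phi-node is itself a new definition, so it is re-enqueued.
    phis, work = set(), list(def_sites)
    while work:
        x = work.pop()
        for y in df[x]:
            if y not in phis:
                phis.add(y)
                work.append(y)
    return phis

# A diamond CFG: A -> {B, C}, B -> D, C -> D; A immediately dominates all.
succ = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
idom = {"A": "A", "B": "A", "C": "A", "D": "A"}
```

A variable defined in B needs a φ-node exactly at the join D: place_phis(dominance_frontiers(succ, idom), {"B"}) yields {"D"}. The df mapping that the first step materializes for every node is the structure whose worst-case quadratic size our algorithm never builds.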

Acknowledgements

We thank Dirk Grunwald and Harini Srinivasan for their suggestions with a preliminary version. We are grateful to our referees, especially the one who found a mistake in our proofs.

REFERENCES

AHO, A., HOPCROFT, J., AND ULLMAN, J. 1974. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass.

AHO, A., SETHI, R., AND ULLMAN, J. 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Mass.

ALPERN, B., WEGMAN, M. N., AND ZADECK, F. K. 1988. Detecting equality of variables in programs. In Conference Record of the 15th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 1-11.

BERRY, M., CHEN, D., KOSS, P., KUCK, D., LO, S., PANG, Y., ROLOFF, R., SAMEH, A., CLEMENTI, E., CHIN, S., SCHNEIDER, D., FOX, G., MESSINA, P., WALKER, D., HSIUNG, C., SCHWARZMEIER, J., LUE, K., ORSZAG, S., SEIDL, F., JOHNSON, O., SWANSON, G., GOODRUM, R., AND MARTIN, J. 1988. The Perfect club benchmarks: Effective performance evaluation of supercomputers. Tech. Rep. 827, Center for Supercomputing Research and Development, Univ. of Illinois, Urbana, Ill.

CHOI, J.-D., BURKE, M., AND CARINI, P. 1993. Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects. In Conference Record of the 20th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 232-245.

CHOI, J.-D., CYTRON, R., AND FERRANTE, J. 1991. Automatic construction of sparse data flow evaluation graphs. In Conference Record of the 18th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 55-66.

CHOI, J.-D., CYTRON, R., AND FERRANTE, J. 1994. On the efficient engineering of ambitious program analysis. IEEE Trans. Softw. Eng. 20, 2, 105-114.

CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R. L. 1990. Introduction to Algorithms. MIT Press, Cambridge, Mass.

CYTRON, R. AND GERSHBEIN, R. 1993. Efficient accommodation of may-alias information in SSA form. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 36-45.

CYTRON, R., FERRANTE, J., ROSEN, B. K., WEGMAN, M. N., AND ZADECK, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13, 4 (Oct.), 451-490.

CYTRON, R., LOWRY, A., AND ZADECK, F. K. 1986. Code motion of control structures in high-level languages. In Conference Record of the 13th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 70-85.

DONGARRA, J. J., BUNCH, J., MOLER, C. B., AND STEWART, G. W. 1979. Linpack Users' Guide. SIAM, Philadelphia, Pa.

JOHNSON, R. AND PINGALI, K. 1993. Dependence-based program analysis. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 78-89.

MARLOWE, T. 1989. Data flow analysis and incremental iteration. Ph.D. thesis, Dept. of Computer Science, Rutgers Univ., New Brunswick, N.J.

ROSEN, B. K., WEGMAN, M. N., AND ZADECK, F. K. 1988. Global value numbers and redundant computations. In Conference Record of the 15th Annual ACM Symposium on Principles of Programming Languages. ACM, New York, 12-27.

SMITH, B. T., BOYLE, J. M., DONGARRA, J. J., GARBOW, B. S., IKEBE, Y., KLEMA, V. C., AND MOLER, C. B. 1976. Matrix Eigensystem Routines-Eispack Guide. Springer-Verlag, Berlin.

SREEDHAR, V. AND GAO, G. 1994. Computing φ-nodes in linear time using DJ-graphs. Tech. Rep. ACAPS Memo 75, School of Computer Science, McGill Univ., Montreal, Quebec, Canada.

TARJAN, R. 1979. Applications of path compression on balanced trees. J. ACM 26, 4 (Oct.), 690-715.

WEGMAN, M. N. AND ZADECK, F. K. 1991. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. 13, 2 (Apr.), 181-210.

Received October 1993; revised October 1994; accepted November 1994
