Document not found! Please try again

Abstract Syntax Tree

13 downloads 6349 Views 567KB Size Report
Abstract. Syntax. Tree. Control. Flow. Graph. Object. Code. CMSC 631. 3. Abstract Syntax ... Abstract Syntax Tree Example ... Assignments x := y op z or x := op z.
Compiler Structure Source Code

CMSC 631 — Program Analysis and Understanding

Abstract Syntax Tree

Control Flow Graph

Object Code

• Source code parsed to produce AST • AST transformed to CFG

Data Flow Analysis

• Data flow analysis operates on control flow graph (and other intermediate representations) CMSC 631

Abstract Syntax Tree (AST)

Abstract Syntax Tree Example x := a + b;

• Programs are written in text ■ ■

I.e., sequences of characters

y := a * b;

Awkward to work with

while (y > a) { a := a + 1;

• First step: Convert to structured representation ■

2

Use lexer (like flex) to recognize tokens

x := a + b }

Program :=

...

x

while

+ a

> b

y

Block a

:= a

... +

a

1

- Sequences of characters that make words in the language ■

Use parser (like bison) to group words structurally - And, often, to produce AST

CMSC 631

3

CMSC 631

4

ASTs

Disadvantages of ASTs

• ASTs are abstract ■



• AST has many similar forms

They don’t contain all information in the program



E.g., for, while, repeat...until

- E.g., spacing, comments, brackets, parentheses



E.g., if, ?:, switch

Any ambiguity has been resolved - E.g., a + b + c produces the same AST as (a + b) + c

• Expressions in AST may be complex, nested ■

• For more info, see CMSC 430 ■

In this class, we will generally begin at the AST level

• Want simpler representation for analysis ■

CMSC 631

(42 * y) + (z > 5 ? 12 * z : z + 20)

5

Control-Flow Graph (CFG)

...at least, for dataflow analysis

CMSC 631

6

Control-Flow Graph Example x := a + b;

• A directed graph where ■

Each node represents a statement

y := a * b;



Edges represent control flow

while (y > a) {

x := a + b

y := a * b

a := a + 1; x := a + b

• Statements may be ■

Assignments x := y op z or x := op z



Copy statements x := y



Branches goto L or if x relop y goto L



etc.

CMSC 631

y>a

} a := a + 1

x := a + b

7

CMSC 631

8

Variations on CFGs

Control-Flow Graph w/Basic Blocks

• We usually don’t include declarations (e.g., int x;) ■

But there’s usually something in the implementation

• May want a unique entry and exit node ■

Won’t matter for the examples we give

CMSC 631

• But more complicated to explain, so... ■ 9

Graph Example with Entry and Exit

a := a + 1; x := a + b } • All nodes without a (normal) predecessor should be pointed to by entry •All nodes without a successor should point to exit CMSC 631

We’ll use single-statement blocks in lecture today

CMSC 631

10

CFG vs. AST • CFGs are much simpler than ASTs

entry

y := a * b; while (y > a) {

y>a

• Can lead to more efficient implementations

A sequence of instructions with no branches into or out of the block

x := a + b;

x := a + b y := a * b

a := a + 1 x := a + b

• May group statements into basic blocks ■

x := a + b; y := a * b; while (y > a + b) { a := a + 1; x := a + b }



Fewer forms, less redundancy, only simple expressions

x := a + b

• But...AST is a more faithful representation

y := a * b y>a a := a + 1

exit



CFGs introduce temporaries



Lose block structure of program

• So for AST,

x := a + b

11



Easier to report error + other messages



Easier to explain to programmer



Easier to unparse to produce readable code

CMSC 631

12

Data Flow Analysis

Available Expressions

• A framework for proving facts about programs

• An expression e is available at program point p if

• Reasons about lots of little facts



e is computed on every path to p, and



the value of e has not changed since the last time e was computed on the paths to p

• Little or no interaction between facts ■

Works best on properties about how program computes

• Optimization ■

• Based on all paths through program ■

- (At least, if it’s still in a register somewhere)

Including infeasible paths

CMSC 631

13

Data Flow Facts • Is expression e available? • Facts: ■

a + b is available



a * b is available



a + 1 is available

CMSC 631

14

Gen and Kill • What is the effect of each statement on the set of facts?

entry x := a + b

Gen

Kill

y := a * b

a := a + 1

entry x := a + b

Stmt

y := a * b

y>a

exit

x := a + b

CMSC 631

If an expression is available, need not be recomputed

x := a + b

a+b

y := a * b

a*b

a := a + 1

15

CMSC 631

y>a a := a + 1 a + 1, a + b, a*b

exit

x := a + b

16

Computing Available Expressions

Terminology • A joint point is a program point where two branches meet

entry x := a + b

{a + b}

• Available expressions is a forward must problem

y := a * b

{a + b, a * b}

{a + b}

y>a

{a + b, a * b}

a := a + 1

Ø

{a + b} exit



Must = At join point, property must hold on all paths that are joined

{a + b}

CMSC 631

17

Data Flow Equations ■

succ(s) = { immediate successor statements of s }



pred(s) = { immediate predecessor statements of s}



In(s) = program point just before executing s



Out(s) = program point just after executing s

s

18

• A variable v is live at program point p if ■

v will be used on some execution path originating from p...



before v is overwritten

• Optimization

pred(s) Out(s )

• Out(s) = Gen(s)

CMSC 631

Liveness Analysis

• Let s be a statement

CMSC 631

Forward = Data flow from in to out

x := a + b

{a + b}

• In(s) =





If a variable is not live, no need to keep it in a register



If variable is dead at assignment, can eliminate assignment

(In(s) - Kill(s)) 19

CMSC 631

20

Data Flow Equations

Gen and Kill

• Available expressions is a forward must analysis ■

Data flow propagate in same dir as CFG edges



Expr is available only if available on all paths

• Liveness is a backward may problem To know if variable live, need to look at future uses



Variable is live if used on some path s

• In(s) = Gen(s)

Stmt

Gen

Kill

x := a + b

a, b

x

x := a + b

y := a * b

y>a



• Out(s) =

• What is the effect of each statement on the set of facts?

y := a * b

a, b

y>a

a, y

a := a + 1

a

y a := a + 1

succ(s) In(s )

x := a + b

a

(Out(s) - Kill(s))

CMSC 631

21

Computing Live Variables

CMSC 631

22

Very Busy Expressions • An expression e is very busy at point p if

{a, b} x := a + b



{x, a, b}

On every path from p, expression e is evaluated before the value of e is changed

y := a * b

• Optimization

{x,y,y,a,a}b} {x, y>a



{y, a, b} a := a + 1

• What kind of problem?

{y, a, b} x := a + b

{x, {x,y,y,a,a}b} CMSC 631

Can hoist very busy expression computation

{x}

23



Forward or backward?



May or must?

CMSC 631

backward must 24

Reaching Definitions

Space of Data Flow Analyses

• A definition of a variable v is an assignment to v

May

• A definition of variable v reaches point p if ■

There is no intervening assignment to v

Forward Backward

• Also called def-use information

Must

Reaching Available definitions expressions Live variables

Very busy expressions

• Most data flow analyses can be classified this way • What kind of problem? ■

Forward or backward?



May or must?



forward

• Lots of literature on data flow analysis

may

CMSC 631

25

Data Flow Facts and Lattices

a+b, a*b





“top”

■ ■

a+b, a+1

a*b, a+1

26

• A partial order is a pair (P, ≤) such that

Example: Available expressions a+b, a*b, a+1

CMSC 631

Partial Orders

• Typically, data flow facts form a lattice ■

A few don’t fit: bidirectional analysis



≤⊆P ×P

≤ is reflexive: x ≤ x

≤ is anti-symmetric: x ≤ y and y ≤ x ⇒ x = y ≤ is transitive: x ≤ y and y ≤ z ⇒ x ≤ z

a+b a*b

a+1

(none)

CMSC 631



“bottom” 27

CMSC 631

28

Lattices

Lattices (cont’d)

• A partial order is a lattice if � and � are defined

• A finite partial order is a lattice if meet and join exist for every pair of elements

on any set:

• A lattice has unique elements ⊥ and � such that

� is the meet or greatest lower bound operation: x � y ≤ x and x � y ≤ y if z ≤ x and z ≤ y, then z ≤ x � y ■ � is the join or least upper bound operation: x ≤ x � y and y ≤ x � y if x ≤ z and y ≤ z, then x � y ≤ z ■

CMSC 631

■ ■

• A partial order is a complete lattice if meet and join are defined on any set S ! P 30

• A function f on a partial order is monotonic if (Top - Kill(s))

x ≤ y ⇒ f (x) ≤ f (y)

(worklist)

• Easy to check that operations to compute In and Out are monotonic

repeat Take s from W In(s) := s pred(s) Out(s )



In(s) := s



temp := Gen(s)

(In(s) - Kill(s))

if (temp != Out(s)) { Out(s) := temp W := W succ(s)

pred(s) Out(s )

a function fs(In(s))

(In(s) - Kill(s))

• Putting these two together, ■

} until W = CMSC 631

CMSC 631

Monotonicity

Out(s) = Top for all statements s

temp := Gen(s)

x��=�

x ≤ y iff x � y = y

Forward Must Data Flow Algorithm

W := { all statements }

x��=x

• In a lattice, x ≤ y iff x � y = x

29

// Slight acceleration: Could set Out(s) = Gen(s)

x�⊥=x

x�⊥=⊥

31

CMSC 631

� temp := fs (�s� ∈pred(s) Out(s ))

32

Useful Lattices

Termination

• (2S, ) forms a lattice for any set S ■

• We know the algorithm terminates because

2S is the powerset of S (set of all subsets)

• If (S, ≤) is a lattice, so is (S, ≥) ■

The lattice has finite height



The operations to compute In and Out are monotonic



On every iteration, we remove a statement from the worklist and/or move down the lattice

I.e., lattices can be flipped

• The lattice for constant propagation 1

2

⊤ 3

...

CMSC 631

33

Forward Data Flow, Again Out(s) = Top

34

• Available expressions

(worklist)

repeat Take s from W temp := fs(⊓s pred(s) Out(s ))

CMSC 631

Lattices (P, ≤)

for all statements s

W := { all statements }

(fs monotonic transfer fn)



P = sets of expressions



S1 ⊓ S2 = S1



Top = set of all expressions

S2

• Reaching Definitions

if (temp != Out(s)) { Out(s) := temp W := W succ(s) } until W =

CMSC 631



35



P = set of definitions (assignment statements)



S1 ⊓ S2 = S1



Top = empty set

CMSC 631

S2

36

Fixpoints

Lattices (P, ≤), cont’d

• We always start with Top

• Live variables



Every expression is available, no defns reach this point



P = sets of variables



Most optimistic assumption



S1 ⊓ S2 = S1 ∪ S2



Strongest possible hypothesis



Top = empty set

- = true of fewest number of states

• Very busy expressions

• Revise as we encounter contradictions ■

Always move down in the lattice (with meet)

• Result: A greatest fixpoint

CMSC 631

37

Forward vs. Backward

! if (temp != In(s)) { ! ! In(s) := temp ! ! W := W pred(s)

! } until W =

! } until W =

P = set of expressions



S1 ⊓ S2 = S1



Top = set of all expressions

S2

CMSC 631

38

Termination Revisited

Out(s) = Top for all s In(s) = Top for all s W := { all statements } W := { all statements } repeat repeat ! Take s from W ! Take s from W ! temp := fs(⊓s pred(s) Out(s )) ! temp := fs(⊓s succ(s) In(s )) ! if (temp != Out(s)) { ! ! Out(s) := temp ! ! W := W succ(s)



• How many times can we apply this step: temp := fs(⊓s ! ■

pred(s) Out(s ))

if (temp != Out(s)) { ... }

Claim: Out(s) only shrinks - Proof: Out(s) starts out as top - So temp must be ≤ than Top after first step - Assume Out(s ) shrinks for all predecessors s of s - Then ⊓s

pred(s) Out(s ) shrinks

- Since fs monotonic, fs(⊓s CMSC 631

39

CMSC 631

pred(s) Out(s )) shrinks 40

Termination Revisited (cont’d)

Least vs. Greatest Fixpoints

• A descending chain in a lattice is a sequence ■

• Dataflow tradition: Start with Top, use meet

x0 ⊐x1 ⊐x2 ⊐...



- complete meet semilattice = meets defined for any set - finite height ensures termination

• The height of a lattice is the length of the longest descending chain in the lattice ■

• Then, dataflow must terminate in O(nk) time ■

n = # of statements in program



k = height of lattice



assumes meet operation takes O(1) time

CMSC 631

To do this, we need a meet semilattice with top

Computes greatest fixpoint

• Denotational semantics tradition: Start with Bottom, use join ■

41

Distributive Data Flow Problems

Computes least fixpoint

CMSC 631

42

Benefit of Distributivity

• By monotonicity, we also have

• Joins lose no information

f (x � y) ≤ f (x) � f (y)

f

g

h

• A function f is distributive if f (x � y) = f (x) � f (y)

k

k(h(f (�) � g(�))) = k(h(f (�)) � h(g(�))) = k(h(f (�))) � k(h(g(�))) CMSC 631

43

CMSC 631

44

Accuracy of Data Flow Analysis

What Problems are Distributive?

• Ideally, we would like to compute the meet over all paths (MOP) solution:

• Analyses of how the program computes ■

Live variables



Let fs be the transfer function for statement s



Available expressions



If p is a path {s1, ..., sn}, let fp = fn;...;f1



Reaching definitions



Let path(s) be the set of paths from the entry to s



Very busy expressions

MOP(s) = �p∈path(s) fp (�)

• All Gen/Kill problems are distributive

• If a data flow problem is distributive, then solving the data flow equations in the standard way yields the MOP solution CMSC 631

45

A Non-Distributive Example

CMSC 631

Practical Implementation

• Constant propagation

• Data flow facts = assertions that are true or false at a program point

x := 1

x := 2

y := 2

y := 1

• Represent set of facts as bit vector ■

Facti represented by bit i



Intersection = bitwise and, union = bitwise or, etc

z := x + y

• In general, analysis of what the program computes in not distributive

• “Only” a constant factor speedup ■

CMSC 631

46

47

CMSC 631

But very useful in practice 48

Basic Blocks

Order Matters

• A basic block is a sequence of statements s.t.

• Assume forward data flow problem



No statement except the last in a branch



Let G = (V, E) be the CFG



There are no branches to any statement in the block except the first



Let k be the height of the lattice

• If G acyclic, visit in topological order

• In practical data flow implementations, ■



Compute Gen/Kill for each basic block

• Running time O(|E|)

- Compose transfer functions ■

Store only In/Out for each basic block



Typical basic block ~5 statements

CMSC 631



49

Order Matters — Cycles

• Let Q = max # back edges on cycle-free path Nesting depth



Back edge is from node to ancestor on DFS tree

50



The order of statements is taken into account



I.e., we keep track of facts per program point

• Alternative: Flow-insensitive analysis

• Then if ∀x.f (x) ≤ x (sufficient, but not necessary) ■

CMSC 631

• Data flow analysis is flow-sensitive

Order from depth-first search



No matter what size the lattice

Flow-Sensitivity

• If G has cycles, visit in reverse postorder ■

Visit head before tail of edge



Analysis the same regardless of statement order



Standard example: types

Running time is O((Q + 1)|E|)

- /* x : int */ x := ... /* x : int */

- Note direction of req’t depends on top vs. bottom CMSC 631

51

CMSC 631

52

Terminology Review

Another Approach: Elimination

• Must vs. May ■

• Recall in practice, one transfer function per basic block

(Not always followed in literature)

• Forwards vs. Backwards • Why not generalize this idea beyond a basic block?

• Flow-sensitive vs. Flow-insensitive • Distributive vs. Non-distributive

CMSC 631

53

Lattices of Functions



“Collapse” larger constructs into smaller ones, combining data flow equations



Eventually program collapsed into a single node!



“Expand out” back to original constructs, rebuilding information

CMSC 631

54

Elimination Methods: Conditionals

• Let (P, ≤) be a lattice If

• Let M be the set of monotonic functions on P

IfThenElse

• Define f ≤f g if for all x, f(x) ≤ g(x)

Then

z

• Define the function f ⊓ g as ■

z

(f ⊓ g) (x) = f(x) ⊓ g(x)

fite = (fthen ◦ fif ) " (felse ◦ fif ) Out(if) = fif (In(ite))) Out(then) = (fthen ◦ fif )(In(ite))) Out(else) = (felse ◦ fif )(In(ite)))

• Claim: (M, ≤f) forms a lattice CMSC 631

Else

55

CMSC 631

56

Elimination Methods: Loops

Elimination Methods: Loops (cont’d) • Let f i = f o f o ... o f (i times)

Head



While

f 0 = id

• Let

Body z

z

g(j) = !i∈[0..j] (fhead ◦ fbody )i ◦ fhead

• Need to compute limit as j goes to infinity fwhile

= fhead ! fhead ◦ fbody ◦ fhead ! fhead ◦ fbody ◦ fhead ◦ fbody ◦ fhead ! · · ·

CMSC 631

57

Height of Function Lattice What is height of lattice of monotonic functions?



Claim: finite (see homework)

Does such a thing exist?

• Observe: g(j+1) ≤ g(j)

CMSC 631

58

Non-Reducible Flow Graphs

• Assume underlying lattice (P, ≤) has finite height ■



• Therefore, g(j) converges

• Elimination methods usually only applied to reducible flow graphs ■

Ones that can be collapsed



Standard constructs yield only reducible flow graphs

• Unrestricted goto can yield non-reducible graphs A

CMSC 631

59

CMSC 631

B

C

z

w

60

Comments

Data Flow Analysis and Functions

• Can also do backwards elimination ■

• What happens at a function call?

Not quite as nice (regions are usually single entry but often not single exit)



• For bit-vector problems, elimination efficient ■

• In practice, only analyze one procedure at a time

Easy to compose functions, compute meet, etc.

• Consequences

• Elimination originally seemed like it might be faster than iteration ■

Not really the case

CMSC 631

Lots of proposed solutions in data flow analysis literature

61

More Terminology



Call to function kills all data flow facts



May be able to improve depending on language, e.g., function call may not affect locals

CMSC 631

62

Data Flow Analysis and The Heap

• An analysis that models only a single function at a time is intraprocedural • An analysis that takes multiple functions into account is interprocedural • An analysis that takes the whole program into account is...guess?

• Data Flow is good at analyzing local variables ■

But what about values stored in the heap?



Not modeled in traditional data flow

• In practice: *x := e ■

Assume all data flow facts killed (!)



Or, assume write through x may affect any variable whose address has been taken

• Note: global analysis means “more than one basic block,” but still within a function

• In general, hard to analyze pointers CMSC 631

63

CMSC 631

64

Data Flow Analysis and Optimization

•LCLint - Evans et al. (UVa)

• Moore’s Law: Hardware advances double computing power every 18 months.

•METAL - Engler et al. (Stanford, now Coverity) •ESP - Das et al. (MSR)

• Proebsting’s Law: Compiler advances double computing power every 18 years. ■

DF Analysis and Defect Detection

•FindBugs - Hovemeyer, Pugh (Maryland) ■ For

Not so much bang for the buck!

Java. The first three are for C.

•Many other one-shot projects

CMSC 631

65

■ Memory

leak detection

■ Security

vulnerability checking (tainting, info. leaks)

CMSC 631

66

Suggest Documents