The Jalapeño Dynamic Optimizing Compiler for Java™

Michael G. Burke    Jong-Deok Choi    Stephen Fink    David Grove    Michael Hind
Vivek Sarkar    Mauricio J. Serrano    V. C. Sreedhar    Harini Srinivasan    John Whaley

IBM Thomas J. Watson Research Center
P.O. Box 704, Yorktown Heights, NY 10598

Abstract

The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine (JVM), a new JVM designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the Jalapeño Optimizing Compiler and the implementation results we have obtained thus far. To the best of our knowledge, this is the first dynamic optimizing compiler for Java that is being used in a JVM with a compile-only approach to program execution.

1 Introduction

This paper describes the design of the Jalapeño Optimizing Compiler and the implementation results obtained thus far. The Jalapeño Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new JVM for Java™ servers that is being built at the IBM T. J. Watson Research Center. The Jalapeño project was initiated in December 1997 and is still work in progress. The Jalapeño JVM is designed to deliver high performance and scalability of Java applications on SMP server machines.

Footnote: Java is a trademark or registered trademark of Sun Microsystems, Inc.

A distinguishing feature of the Jalapeño JVM is that it takes a compile-only approach to program execution. Instead of providing both an interpreter and a JIT compiler as in other JVMs, bytecodes are always translated to machine code before they are executed. The Jalapeño JVM has three different compilers to provide such translation: an optimizing compiler (the subject of this paper), a "quick" compiler that performs a low level of code optimization (primarily register allocation), and a "baseline" compiler that mimics the stack machine of the JVM specification [29]. The compile-only approach makes it easier to mix execution of unoptimized and optimized compiled methods in the Jalapeño JVM, compared to mixing interpreted and JIT-compiled execution as in other JVMs.

Some previous implementations of Java (e.g., [9, 25, 22, 18]) have relied on static compilation and have therefore disallowed certain features of the language, such as dynamic class loading. In contrast, the Jalapeño JVM supports all features of Java, and the Jalapeño Optimizing Compiler is a fully dynamic compiler that is invoked as the application executes.


A primary goal of the Jalapeño Optimizing Compiler is to generate the best possible code for the selected methods for a given compile-time budget. In addition, its optimizations must deliver significant performance improvements while correctly preserving Java semantics with respect to exceptions, garbage collection, and threads. Reducing the cost of synchronization and other thread primitives is especially important for achieving scalable performance on SMP servers. Finally, it should be possible to retarget the compiler to a variety of hardware platforms. Building a dynamic optimizing compiler that achieves all of these goals is a major challenge.

Footnote: Though dynamic compilation is the default mode for the Jalapeño Optimizing Compiler, the same infrastructure can be used to support a hybrid of static and dynamic compilation, as discussed in Section 2.

The rest of the paper is organized as follows. Section 2 begins by describing the context for this work, the Jalapeño Virtual Machine. Section 3 outlines the high-level structure of the Jalapeño Optimizing Compiler. Section 4 describes key features of the intermediate representation (IR) used in the Jalapeño Optimizing Compiler. Sections 5 and 6 describe the front-end and the back-end of the compiler, respectively: how Java bytecodes are translated into an optimized high-level IR (HIR), and how the HIR is lowered and translated into optimized machine code accompanied by exception tables and GC stack-maps. Section 7 describes flow-insensitive optimizations that exploit a (mostly) single-assignment property of variables in the IR. Section 8 describes our framework for inlining method calls. Section 9 presents the current implementation status (as of March 1999) and the performance results obtained thus far. Section 10 describes optimizations that are in progress as extensions to the current implementation, including two interprocedural optimizations: escape analysis and interprocedural optimization of register saves and restores. Finally, Section 11 discusses related work, and Section 12 contains our conclusions.


2 The Jalapeño Virtual Machine

The Jalapeño Virtual Machine comprises several subsystems: a dynamic class loader, a dynamic linker, an object allocator, a garbage collector, a thread scheduler, a profiler (an on-line measurement system), support for debugging, and three compilers (the baseline compiler, the quick compiler, and the optimizing compiler). Figure 1 shows the context of the Jalapeño Optimizing Compiler within the Jalapeño Virtual Machine.

[Figure 1: Context of the Jalapeño Optimizing Compiler: the Jalapeño Virtual Machine. The baseline/quick compilers produce unoptimized code; the On-Line Measurements subsystem, the Controller, and the Optimizing Compiler form the adaptive optimization system that produces optimized code for selected methods.]

Although the Jalapeño JVM is written in Java, it is self-bootstrapping; that is, it does not need to run on top of another JVM. All of its subsystems (including the compilers and the garbage collector) are implemented in Java and run alongside the Java application. To support bootstrapping, the Jalapeño Optimizing Compiler, whose default mode of operation is dynamic compilation, can also function as a static compiler (as shown in Figure 2): it compiles selected methods of the Jalapeño JVM and stores the resulting binary code in a boot image from which the JVM starts. Similarly, the optimizing compiler could compile selected methods of a user application and store them in a boot image tailored to that application; when doing so, it essentially functions as a static compiler, and the compile-time budget is less important than in dynamic compilation. In addition, the front-end of the optimizing compiler (see Section 5) can be used as a static bytecode-to-bytecode optimizer that saves the optimized program as a stream of bytecodes for later execution. Since the optimizing compiler is itself written in Java, the Jalapeño JVM can use it to optimize its own code.

Footnote: Another project at IBM is using the Jalapeño Optimizing Compiler as the foundation for a static bytecode optimizer.

[Figure 2: The Jalapeño Optimizing Compiler functioning as a static compiler to build a boot image with opt-compiled code.]

Since threads in Java are objects, the Jalapeño JVM creates a thread object for each Java thread; one of the fields of this object contains a reference to the thread's stack. The stack consists of a sequence of variable-size stack frames, one per method invocation, which are chained together by "dynamic links". Another feature of the Jalapeño object model is that each object has a two-word header: a pointer to a type information block, and a status word used for hashing, locking, and garbage collection.

The Jalapeño memory management subsystem consists of an object allocator and a garbage collector. It supports a variety of type-accurate garbage collection algorithms (generational and non-generational, copying and non-copying), to allow experimentation to determine which collector is best suited for SMP execution of multithreaded Java programs.

The baseline compiler is fully functional; it is used to validate the other compilers and as the default compiler for testing and debugging. Dynamic linking of classes that are not loaded until run time is handled via backpatching after the class is loaded.

In its default, dynamic mode, the Jalapeño Optimizing Compiler is invoked on an automatically selected set of methods while an application is running. It is designed to be the key component of the Jalapeño Adaptive Optimization System, which also includes an On-Line Measurements (OLM) subsystem and a Controller subsystem, as shown in Figure 1. (The OLM and Controller subsystems are currently under development.) The OLM subsystem monitors the performance of individual methods in the application at run time, using a combination of hardware performance information and software sampling of method calls, and maintains the profiling information in a context-sensitive manner in a Calling Context Graph (CCG), similar to the Calling Context Tree introduced in [3]. The Controller subsystem is invoked when the OLM subsystem detects that a certain performance threshold has been reached. The controller uses the CCG and its associated profile information to build an "optimization plan" that describes which methods the optimizing compiler should compile and with what optimization levels. The optimizing compiler then compiles the selected methods according to this plan, and the OLM subsystem continues monitoring individual methods, including those already optimized, to trigger further optimization passes as needed. The overall system is thus adaptive and dynamic.

3

Bytecode Translation

our conclusions.

The

run

as extensions

and interprocedural

11 discusses

class loader,

that

two interprocedural

interprocedural

saves and restores, nally,

10 describes

compiler The OLM

methods,

in-

Class Files (Bytecode)

binary

code in a “boot

compiler

could

application A,

and store

to the application. would

When sible

the

Jalapefio

Compiler

(as shown

in

Compiler

functions

88

the best

budget.

The

when

compiler

Optimiz-

or as a static

the internal

Compiler

when

stfucture

compiling

level, the Jalapeiio

of the Jalapeiio

a single

Optimizing

method.

Compiler

front-end (described in Section back-end (described in Section 6).

4

Intermediate

This Profile Information

section

outlines

some essential representation

IR better

we target.

matches

Thus,

optimizations,

motion

and code transformation.

An instruction

as well



Hardware

pmametezs

constants, key

OptimizedMIR

1

lntermed‘ate

MIR = Machine-specific

Rqxesentation

fmmmediate

register.

Binary Code

These

piler

3: Internal

structure

for compiling

a single

eluding to trigger

those already further

by saving

execution. cedure, methods

to represent

and

out-of-bounds operators

etc.

Java

A

byte-

to implement exceptions,

operators

additional

by the optimizing

binary

flow

JVM,

currently

bootstrapping compiles

and stores

end basic graph

(see Section

pro-

‘Another Optimizing optimizer.

selected

the resulting

131

stream blocks.

(CFG)

project Compiler

basic

calls

e.g.,

to test for null array

accesses

facilitate

re-

optimisa-

The

blocks

and

in

instruc-

trap the

are constructed

the HIR instruction

The IR also includes at IBM is using an the foundation

delimited

and ENDEBLOCK

basic

generating

blocks,

and potential

of the procedure

of BC2IR’s 5.1).

into

by LABEL

In our IR, method

byproduct

com-

code in a file for later

of Jalapefio’s

compiler

the Jalapefio

not

compiler,

as a static

are grouped

the instruction tions.

can also work

the generated as part

Com-

passes as needed.

compiler

In fact,

Optimizing

method

optimized

the optimizing from

of Jalapefio

optimization

The optimizing piler

run-time

common represents

types,

operators

common

and

HIR

of an

tion. Instructions

Figure

of separate

dereferences

in code

which

signatures,

and BOUNDS-CHECK

spectively.

machine-

consisting

are also operands method

a

(a generalization

[l])

operand,

the Jalapefio

checks for several

pointer

flexibility

of operunds. The most

There

between

NULL-CHECK

--------+--------I

as greater

register

targets,

is the addition

explicit

Representation

Bottom-Up Ravrite System

BURS =

is the

branch

difference

codes LIR = Low-level

effective

code

IR,

architectures

more

in our IR is an n-tuple

of operand

a symbolic ,

the load-store

and three-address

operator and some number type

of the register-

to a stack-based

it enables

specific

of quadruples

features

(IR) used by the Jalapeiio

Compared

Compiler.

register-based that

consists

Representation

based intermediate Optimizing

At

5) and an

of an optimizer optimizer

compile-

the Jalapefio

as a static

pos-

0ptimizer.s

3 shows

the highest

compiler

tailored compiler

generate

compile-time

functions

Optimizing

as a static

a user

image

compiler

it must

is less important

bytecode-to-bytecode

Optimizing

boot

optimizing from

so, the optimizing

Optimizing

compiler,

code for a given

Figure The

in a custom as a static

Jalapefio

ing Compiler

2:

the

methods

doing

function

dynamic

time budget

Figure

Similarly,

selected

2).

a pure

Boot Image with Opt-compiled Code

them

When

essentially

Figure

image*.

also compile

sites do control as a stream

space for the caching

the front-end for building

of the Jalapeiio a static bytecode

class
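As a rough illustration of the n-tuple instruction format described above, the following Java sketch (with assumed names; this is not the actual Jalapeño class hierarchy) models an instruction as an operator plus a list of operands, and shows a field access guarded by an explicit NULL_CHECK.

    // Minimal sketch of the n-tuple IR instruction format (illustrative names only).
    import java.util.List;

    enum Operator { NULL_CHECK, BOUNDS_CHECK, GETFIELD, FLOAT_MUL, FLOAT_ADD, LABEL, END_BBLOCK }

    interface Operand {}
    record RegisterOperand(String symbolicName, String type) implements Operand {}
    record FieldOperand(String declaringClass, String fieldName) implements Operand {}

    /** An instruction is an operator plus some number of operands (result listed first here). */
    record Instruction(Operator op, List<Operand> operands) {
      @Override public String toString() { return op + " " + operands; }
    }

    class HirExample {
      public static void main(String[] args) {
        RegisterOperand a  = new RegisterOperand("l0", "A");      // local holding the object reference
        RegisterOperand t5 = new RegisterOperand("t5", "float");  // temporary receiving the field value

        // The null check is a separate, explicit instruction, so later phases may
        // move or eliminate it independently of the field access itself.
        Instruction check = new Instruction(Operator.NULL_CHECK, List.of(a));
        Instruction load  = new Instruction(Operator.GETFIELD,
                                            List.of(t5, a, new FieldOperand("A", "f1")));
        System.out.println(check);
        System.out.println(load);
      }
    }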

5 Front-end

The front-end of the Jalapeño Optimizing Compiler translates Java bytecodes into an optimized HIR. It consists of two parts: (1) the BC2IR algorithm, which translates bytecodes into HIR, constructs the control flow graph, and performs on-the-fly analyses and optimizations during the translation; and (2) additional optimizations performed on the HIR after BC2IR. Section 5.1 gives an overview of the BC2IR algorithm, and Section 5.2 summarizes the on-the-fly optimizations; examples of the HIR optimizations performed after BC2IR can be found in [38].

5.1 The BC2IR Algorithm

Figure 4 shows an overview of the BC2IR algorithm. The algorithm maintains a worklist, the basic block set (BBSet), of basic blocks (BBs) whose HIR remains to be generated. Initially, the BBSet contains the basic block that begins at bytecode 0 and a basic block for each exception handler entry. The main loop selects a candidate BB from the BBSet; an inner abstract interpretation loop then interprets the bytecodes of the selected BB, generating HIR instructions as a byproduct.

[Figure 4: Overview of the BC2IR algorithm: a main loop that selects basic blocks from the BBSet, and an abstract interpretation loop that processes the bytecodes of the selected block, generating HIR, updating the symbolic state, and adding successor basic blocks to the BBSet as they are discovered.]

The abstract interpretation maintains a symbolic state consisting of abstract values for the stack operands and local variables. The initial state of a BB is the symbolic state at the start of that BB. For each bytecode interpreted, HIR is generated and the symbolic state is updated; BC2IR exploits the fact that, for bytecode that passes Java verification, the types of the stack operands and local variables at each program point can always be determined. When the end of a BB is reached, the current symbolic state is propagated to the successor basic blocks; new BBs may be added to the BBSet, and the CFG is updated. At a control flow join, the values of stack operands and local variables may differ on different incoming paths, so the states arriving at the join are merged with an element-wise meet operation. If the merged state does not match the initial state that was used when a BB's HIR was generated, that HIR must be regenerated. In particular, the initial state of a BB that is the target of a backward branch may turn out to be incorrect because of as-of-yet-unseen control flow, in which case its HIR is regenerated later. When a branch into the middle of an already-generated BB is encountered, the BB is split at the branch target.

To minimize the number of times HIR is generated for a BB, a simple greedy heuristic is used for selecting BBs in the main loop: the BB with the lowest starting bytecode index is chosen. This ordering ensures that, except for loops, all of a BB's predecessors are generated before the BB itself; in other words, the BBs are effectively processed in topological order whenever the control flow graph is reducible. The greedy ordering works well in practice.

Footnote: The optimal order for basic block generation, the one that minimizes the number of regenerations, is a topological order (ignoring the back edges). However, because BC2IR computes the control flow graph in the same pass, it cannot compute the optimal order a priori.

Example: Figure 5 shows an example Java program, and Figure 6 shows the HIR that BC2IR generates for its method foo. The number in the first column of each HIR instruction is the index of the bytecode from which the instruction was generated; several HIR instructions may share the same bytecode index (for example, a null check and the field access generated from a single getfield bytecode).

    class t1 {
      static float foo(A a, B b, float c1, float c3) {
        float c2 = c1/c3;
        return(c1*a.f1 + c2*a.f2 + c3*b.f1);
      }
    }

    Figure 5: An example Java program.

[Figure 6: HIR of method foo(). Each line shows the index of the originating bytecode, the HIR operator (float_div, null_check, getfield_unresolved, getfield, float_mul, float_add, return), and its operands.]

In this example, class B had been loaded when method foo was compiled, but class A had not. As a result, the HIR instructions for accessing the fields of class A (bytecode indices 7 and 14 in Figure 6) are getfield_unresolved, while the HIR instruction for accessing a field of class B (bytecode index 21) is a regular getfield. Also notice that, as a result of BC2IR's on-the-fly optimizations, there is only one null_check of the reference a even though both accesses to fields of class A require one: after the first check, the symbolic state records that a is non-null, so the second check is eliminated.
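The following Java sketch (illustrative only; it omits state merging details, exception handlers, and block splitting, and the helper types are assumed) outlines the worklist structure of BC2IR described above: repeatedly pick the pending basic block with the lowest starting bytecode index, abstractly interpret its bytecodes while generating HIR, and propagate the resulting symbolic state to successors.

    // Skeleton of the BC2IR main loop (illustrative; not Jalapeño's implementation).
    import java.util.PriorityQueue;

    class BC2IRSketch {
      static class BasicBlock implements Comparable<BasicBlock> {
        int startBytecodeIndex;
        SymbolicState initialState;               // abstract stack operands + locals at entry
        public int compareTo(BasicBlock o) {      // greedy heuristic: lowest bytecode index first
          return Integer.compare(startBytecodeIndex, o.startBytecodeIndex);
        }
      }
      static class SymbolicState { /* abstract values of stack operands and local variables */ }

      void translate(BasicBlock entry) {
        PriorityQueue<BasicBlock> bbSet = new PriorityQueue<>();
        bbSet.add(entry);                          // plus one block per exception handler entry
        while (!bbSet.isEmpty()) {
          BasicBlock bb = bbSet.poll();
          SymbolicState state = copy(bb.initialState);
          for (int bci = bb.startBytecodeIndex; !endOfBlock(bci); bci = next(bci)) {
            emitHIR(bci, state);                   // generate HIR and update the symbolic state
          }
          for (BasicBlock succ : successors(bb)) {
            // Merge the outgoing state into the successor; if its initial state changes,
            // the successor's HIR must be (re)generated, so put it back on the worklist.
            if (mergeInto(succ, state)) bbSet.add(succ);
          }
        }
      }

      // Helpers assumed for the sketch:
      SymbolicState copy(SymbolicState s) { return s; }
      boolean endOfBlock(int bci) { return true; }
      int next(int bci) { return bci + 1; }
      void emitHIR(int bci, SymbolicState s) {}
      java.util.List<BasicBlock> successors(BasicBlock b) { return java.util.List.of(); }
      boolean mergeInto(BasicBlock succ, SymbolicState out) { return false; }
    }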

5.2 On-the-Fly Analyses and Optimizations

To illustrate the on-the-fly optimizations performed during BC2IR, we consider copy propagation as an example. Java bytecode often contains instruction sequences that perform a calculation and then store the result into a local variable (see Figure 7). A simple on-the-fly copy propagation can eliminate most of the resulting unnecessary temporaries: when storing from a temporary into a local variable, BC2IR inspects the most recently generated instruction; if its result is the same temporary, the instruction is modified to write its value directly to the local variable instead. Other optimizations, such as constant propagation, dead code elimination, register renaming for local variables, and method inlining, are also performed during the translation process. Further details are provided in [38].

    Java bytecode     Generated IR                Generated IR
                      (optimization off)          (optimization on)
    -------------     ------------------------    ------------------------
    iload  x          INT_ADD  t_int, x_int, 5    INT_ADD  y_int, x_int, 5
    iconst 5          INT_MOVE y_int, t_int
    iadd
    istore y

    Figure 7: Example of an on-the-fly optimization (copy propagation) performed
    during BC2IR; x_int, y_int, and t_int are the symbolic registers for locals
    x and y and temporary t.
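A minimal sketch of the check described above (illustrative names; the real BC2IR operates on its internal instruction representation): when an istore is translated, the most recently generated instruction is examined, and if it defines the temporary being stored, its result operand is simply redirected to the local.

    // Illustrative on-the-fly copy propagation at istore-translation time.
    class CopyPropSketch {
      static class Instr {
        String op;
        String result;           // symbolic register defined by this instruction
        String[] uses;
        Instr(String op, String result, String... uses) {
          this.op = op; this.result = result; this.uses = uses;
        }
      }

      Instr lastGenerated;       // most recently generated HIR instruction

      /** Translate "istore local" when the value on the model stack is temporary 'temp'. */
      void translateIstore(String temp, String local) {
        if (lastGenerated != null && temp.equals(lastGenerated.result)) {
          // The calculation just generated feeds straight into the store:
          // write the local directly and drop the temporary (no INT_MOVE is emitted).
          lastGenerated.result = local;
        } else {
          lastGenerated = new Instr("INT_MOVE", local, temp);   // fall back to an explicit move
        }
      }
    }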

6 Back-end

The back-end of the Jalapeño Optimizing Compiler lowers the HIR produced by the front-end to lower-level representations, performs further optimizations, and generates machine code. This section describes the lowering of the IR, dependence graph construction, BURS-based instruction selection, register allocation, and final assembly.

6.1 Lowering of the IR

After the high-level analyses and optimizations are performed, the HIR is lowered to the low-level IR (LIR). In contrast to the HIR, whose operators closely match the Java bytecodes, the LIR expands instructions into operations that are specific to the implementation mechanisms of the Jalapeño JVM, such as object layouts or parameter-passing conventions. For example, a method invocation is expressed in the HIR as a single instruction that closely matches the corresponding invokevirtual/invokestatic bytecode; it is lowered (i.e., converted) into a multiple-instruction LIR sequence that invokes the method based on the virtual-function-table layout. These multiple LIR operations expose more opportunities for low-level optimizations.

Example: Figure 8 shows the LIR for method foo of the example in Figure 5. The labels (n1) through (n12) on the far right of each instruction indicate the corresponding node in the data dependence graph shown in Figure 9. Note that the regular getfield of the field of class B has been lowered to a float_load at a known field offset, while the getfield_unresolved instructions for the fields of class A remain in the LIR.

[Figure 8: LIR of method foo(). The first column is the bytecode index; the labels (n1) through (n12) at the far right identify the corresponding nodes of the dependence graph in Figure 9.]
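To make the virtual-call expansion concrete, here is an illustrative sketch (not Jalapeño's actual object layout, offsets, or LIR mnemonics) of how a single HIR call instruction could expand into an LIR sequence that goes through the receiver's type information block and virtual function table.

    // Illustrative expansion of one HIR virtual call into an LIR-like sequence
    // (names and offsets are made up for the example).
    import java.util.List;

    class CallLoweringSketch {
      /** Lower "t_result = CALL_VIRTUAL receiver, methodIndex, args" into LIR-like operations. */
      static List<String> lowerVirtualCall(String result, String receiver, int methodIndex) {
        int tibOffset = 0;                 // assumed offset of the type-information-block pointer
        int slotOffset = methodIndex * 4;  // assumed size of a virtual-function-table slot
        return List.of(
            "null_check  " + receiver,                                   // explicit exception check
            "load  t_tib   <- [" + receiver + " + " + tibOffset + "]",   // fetch the TIB pointer
            "load  t_code  <- [t_tib + " + slotOffset + "]",             // fetch the method's code address
            "call  " + result + " <- t_code(" + receiver + ", ...)"      // indirect call through the table
        );
      }

      public static void main(String[] args) {
        lowerVirtualCall("t9", "l0", 3).forEach(System.out::println);
      }
    }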

6.2 Dependence Graph Construction

Before BURS-based code generation (Section 6.3), we construct a dependence graph for each basic block of the LIR. The dependence graph contains register true/anti/output dependences, memory true/anti/output dependences, and control dependences, as well as edges that model synchronization and exception constraints. The current implementation makes conservative assumptions about memory alias information. Synchronization constraints are modeled by introducing synchronization dependence edges between synchronization operations (monitor_enter and monitor_exit) and memory operations; these edges prevent code motion of memory operations across synchronization points. Java exception semantics [29] is modeled by exception dependence edges, which connect the different exception points in a basic block. Exception dependence edges are also added between register write operations of local variables and exception points in the basic block if the method has catch blocks, so that exception handlers observe the correct values of local variables. In addition, a register-true dependence edge is created between an instruction that tests for an exception (such as null_check) and an instruction that depends on the result of the test (such as getfield). This precise modeling of dependence constraints allows us to perform more aggressive code motion and optimization than would otherwise be legal.

Example: Figure 9 shows the dependence graph for the single basic block in method foo() of Figure 5, constructed from the LIR of Figure 8. There are no memory dependence edges in this example because the basic block contains only loads and no stores, and we do not currently model load-load dependences. Exception dependence edges between register writes of local variables and exception points need not be added, because method foo() does not have catch blocks.

Footnote: The addition of load-load memory dependences will be necessary to correctly support the Java memory model for multithreaded programs that contain data races.

[Figure 9: Dependence graph of the basic block in method foo(). The nodes (n1) through (n12) correspond to the LIR instructions of Figure 8; the edges include register true dependences and exception dependences.]
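The ordering constraint among exception points can be captured with a simple scan. The sketch below (illustrative only, assuming each instruction carries a flag marking potential exception points) adds an exception dependence edge between each pair of consecutive exception points in a basic block.

    // Illustrative construction of exception dependence edges within one basic block.
    import java.util.ArrayList;
    import java.util.List;

    class ExceptionEdgeSketch {
      static class Node {
        String op;
        boolean isExceptionPoint;              // e.g., null_check, bounds_check, call, ...
        List<Node> deps = new ArrayList<>();   // nodes that must execute before this one
        Node(String op, boolean pei) { this.op = op; this.isExceptionPoint = pei; }
      }

      /** Chain exception points in program order so they cannot be reordered. */
      static void addExceptionEdges(List<Node> blockInProgramOrder) {
        Node lastExceptionPoint = null;
        for (Node n : blockInProgramOrder) {
          if (n.isExceptionPoint) {
            if (lastExceptionPoint != null) {
              n.deps.add(lastExceptionPoint);  // exception dependence edge
            }
            lastExceptionPoint = n;
          }
        }
      }
    }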

6.3 BURS-Based Retargetable Code Generation

In this section, we address the problem of using tree-pattern-matching systems to perform retargetable code generation after code optimization in the Jalapeño Optimizing Compiler [33]. Our solution is based on partitioning the basic block dependence graph (defined in Section 6.2) into trees that can be given as input to a BURS-based (Bottom-Up Rewrite System) tree-pattern-matching system [15]. Unlike previous approaches to partitioning DAGs for tree-pattern-matching code generation (e.g., [17]), our approach considers the partitioning problem in the presence of memory and exception dependences, not just register-true dependences; we have defined legality constraints for this partitioning and developed a partitioning algorithm that incorporates these constraints.

Figure 10 illustrates the approach with a simple example for the PowerPC. The input LIR consists of five instructions:

    move  r2 = r0
    not   r3 = r1
    and   r4 = r2, r3
    cmp   r5 = r4, 0
    if    r5, !=, LBL

The dependence graph of this block is partitioned into trees, and a BURS grammar for the target architecture is used to find a least-cost cover of each tree. Each grammar rule has an associated cost, in this case the number of instructions that the rule will generate; for example, rule 2, which matches a register move, has zero cost, so the pattern matching also eliminates unnecessary register moves, i.e., it performs coalescing. Although rules 3, 4, 5, and 6 could be used to parse the tree, the BURS system selects rules 1, 2, and 7 as the ones with the least cost to cover the tree. Once the least-cost cover is found, the corresponding MIR instructions are emitted: only two PowerPC instructions are emitted for the five input LIR instructions.

[Figure 10: Example of tree pattern matching for the PowerPC. The left side shows the input LIR and the tree built from its dependence graph (an IF of a CMP of an AND whose operands are a register and a NOT); the right side shows the relevant BURS grammar rules (RULE, PATTERN, COST) and the two PowerPC instructions that are emitted.]

Example: Figure 11 shows the MIR for method foo of Figure 5, as generated by our BURS code generator.

[Figure 11: MIR of method foo() with symbolic registers, as produced by BURS instruction selection. PowerPC operators (ppc_fdivs, ppc_fmuls, ppc_fadds, ppc_lfs) replace the corresponding LIR operators, while the getfield_unresolved instructions remain.]

6.4 Register Allocation

The MIR produced by the BURS code generator contains two kinds of symbolic registers: temporaries, obtained from the stack simulation performed during bytecode translation, and symbolic registers corresponding to Java locals. The register allocator maps these symbolic registers onto the physical registers of the target machine. Our register allocator framework supports different allocation schemes, according to the compile time available for a method; the allocator currently employed is a linear scan register allocator [32]. The linear scan algorithm is not based on graph coloring; it allocates physical registers in a single pass over the live ranges of the symbolic registers, which keeps allocation time low, an important property for a dynamic compiler.

Example: Figure 12 shows the MIR of method foo after register allocation, with physical PowerPC registers assigned and with the method prologue and epilogue added.

[Figure 12: MIR of method foo() with physical registers, including the method prologue and epilogue.]
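For reference, the following is a minimal sketch of the classic linear scan allocation loop in the spirit of [32]; it is not Jalapeño's implementation, assumes precomputed live intervals, and uses the simplest spill heuristic (spill the active interval that ends last).

    // Minimal linear scan register allocation over precomputed live intervals (illustrative).
    import java.util.*;

    class LinearScanSketch {
      static class Interval {
        final String symbolicReg;
        final int start, end;          // first and last instruction index where the register is live
        int physicalReg = -1;          // -1 means spilled
        Interval(String r, int s, int e) { symbolicReg = r; start = s; end = e; }
      }

      static void allocate(List<Interval> intervals, int numPhysicalRegs) {
        intervals.sort(Comparator.comparingInt(i -> i.start));            // by increasing start point
        TreeSet<Interval> active =
            new TreeSet<>(Comparator.comparingInt((Interval i) -> i.end)
                                    .thenComparing(i -> i.symbolicReg));  // by increasing end point
        Deque<Integer> freeRegs = new ArrayDeque<>();
        for (int r = 0; r < numPhysicalRegs; r++) freeRegs.add(r);

        for (Interval cur : intervals) {
          // Expire intervals that ended before the current one starts, freeing their registers.
          for (Iterator<Interval> it = active.iterator(); it.hasNext(); ) {
            Interval old = it.next();
            if (old.end >= cur.start) break;
            it.remove();
            freeRegs.add(old.physicalReg);
          }
          if (!freeRegs.isEmpty()) {
            cur.physicalReg = freeRegs.poll();
            active.add(cur);
          } else {
            Interval spill = active.isEmpty() ? null : active.last();  // active interval ending last
            if (spill != null && spill.end > cur.end) {
              cur.physicalReg = spill.physicalReg;   // steal its register for the current interval
              spill.physicalReg = -1;                // the stolen-from interval is spilled
              active.remove(spill);
              active.add(cur);
            }
            // otherwise the current interval itself remains spilled (physicalReg == -1)
          }
        }
      }
    }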

6.5 Final Assembly

The final phase of the Jalapeño Optimizing Compiler is the assembly phase, which emits the binary code of an opt-compiled method. The assembly phase also finalizes the exception table and the GC stack-maps of the method, translating the information collected during compilation into the form used by the Jalapeño run-time system.

Exception tables and GC stack-maps. Since the code and activation records of a method may be generated by any of the three Jalapeño compilers (baseline, quick, and optimizing), the Jalapeño JVM defines a common interface among the compilers and the run-time system; the garbage collector and the exception handling mechanism are unaware of which compiler generated a particular method. Each compiler is responsible for providing, for every method it compiles, the exception table and GC stack-map information required by this interface, making type-accurate garbage collection possible.

When a garbage collection occurs, the garbage collector needs a mapping, for each active method, describing which registers and stack locations (register spills) hold references (object pointers). As these locations vary among program points, a different map could be generated for each point. However, since garbage collection can only occur at certain predefined points, called GC points, maps are generated only for these points.

The exception table of an opt-compiled method is analogous to the exception table stored in the class file, but it must be maintained through optimization and lowering: it is initially expressed in terms of the HIR instructions, and the information is updated as high-level optimizations modify the HIR, as the HIR is converted to the LIR, and as the LIR is converted to the machine-specific MIR, so that the final table refers to locations in the generated machine code.
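As an illustration of the stack-map idea (this is not Jalapeño's actual encoding), the sketch below records, for each GC point identified by its machine-code offset, which physical registers and spill slots hold object references; at collection time the collector consults the map for the method and offset found in each stack frame.

    // Illustrative GC stack-map: reference locations recorded per GC point (machine-code offset).
    import java.util.*;

    class GCMapSketch {
      static class ReferenceMap {
        final BitSet registersHoldingRefs = new BitSet();   // physical register numbers
        final BitSet spillSlotsHoldingRefs = new BitSet();  // stack-frame spill slot indices
      }

      /** Maps a GC point (offset of the instruction in the compiled method) to its reference map. */
      private final Map<Integer, ReferenceMap> gcPointMaps = new HashMap<>();

      void recordGCPoint(int codeOffset, int[] refRegisters, int[] refSpillSlots) {
        ReferenceMap map = new ReferenceMap();
        for (int r : refRegisters)  map.registersHoldingRefs.set(r);
        for (int s : refSpillSlots) map.spillSlotsHoldingRefs.set(s);
        gcPointMaps.put(codeOffset, map);
      }

      /** Called by the collector while scanning a stack frame stopped at a GC point. */
      ReferenceMap mapAt(int codeOffset) {
        return gcPointMaps.get(codeOffset);   // null would mean "not a GC point"
      }
    }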

7 Flow-Insensitive Optimizations