The Jalapeño Dynamic Optimizing Compiler for Java™

Michael G. Burke    Jong-Deok Choi    Stephen Fink    David Grove    Michael Hind
Vivek Sarkar    Mauricio J. Serrano    V. C. Sreedhar    Harini Srinivasan    John Whaley

IBM Thomas J. Watson Research Center
P.O. Box 704, Yorktown Heights, NY 10598
Abstract

The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new Java Virtual Machine (JVM) designed to support efficient and scalable execution of Java applications on SMP server machines. This paper describes the design of the Jalapeño Optimizing Compiler, and the implementation results that we have obtained thus far. To the best of our knowledge, this is the first dynamic optimizing compiler for Java that is being used in a JVM with a compile-only approach to program execution.
1 Introduction

The Jalapeño Dynamic Optimizing Compiler is a key component of the Jalapeño Virtual Machine, a new JVM being built at the IBM T. J. Watson Research Center. A distinguishing feature of the Jalapeño JVM is that it takes a compile-only approach to program execution. Instead of providing both an interpreter and a JIT compiler as in other JVMs, bytecodes are always translated to machine code before they are executed. The Jalapeño JVM has three different compilers to provide such translation: an optimizing compiler for computationally intensive methods (the subject of this paper), a "quick" compiler that performs a low level of code optimization (primarily register allocation), and a "baseline" compiler that mimics the stack machine of the JVM specification [29]. The mix of unoptimized and optimized compiled code in the Jalapeño JVM is thus analogous to the mix of interpreted and JIT-compiled execution in other JVMs; a compile-only approach makes it easier to mix execution of unoptimized and optimized methods.

A primary goal of the Jalapeño JVM is to deliver high performance and scalability of Java applications on SMP server machines. The goal of the Jalapeño Optimizing Compiler is to generate the best possible code for the selected methods within a given compile-time budget. In addition, its optimizations must deliver significant performance improvements while correctly preserving the semantics of Java with respect to features such as exceptions, garbage collection, and threads [24]. Reducing the cost of synchronization and other thread primitives is especially important for achieving scalable performance on SMP servers. It should also be possible to retarget the optimizing compiler to a variety of hardware platforms. Building a dynamic optimizing compiler that achieves all these goals is a major challenge.

Some previous implementations of Java (e.g., [9, 25, 22, 18]) have relied on static compilation, and have therefore disallowed certain features of Java such as dynamic class loading. In contrast, the Jalapeño JVM supports all such features, and the Jalapeño Optimizing Compiler is a fully integrated dynamic compiler in the Jalapeño JVM.¹ To the best of our knowledge, this is the first dynamic optimizing compiler for Java that is being used in a JVM with a compile-only approach to program execution.

The Jalapeño project was initiated in December 1997 at the IBM T. J. Watson Research Center and is still work-in-progress. This paper describes the design of the Jalapeño Optimizing Compiler, and the implementation results that we have obtained thus far.

The rest of the paper is organized as follows. Section 2 describes the context for this work, the Jalapeño Virtual Machine. Section 3 outlines the high-level structure of the Jalapeño Optimizing Compiler. Section 4 describes the intermediate representation (IR) used in the Jalapeño Optimizing Compiler. Sections 5 and 6 describe the front-end and the back-end, respectively, of the Jalapeño Optimizing Compiler: the front-end translates bytecodes into an optimized high-level IR (HIR), and the back-end lowers and translates the HIR into machine code accompanied by exception tables and GC stack-maps. Section 7 describes how exception tables and GC stack-maps interface the optimizing compiler with the rest of the JVM. Section 8 describes our framework for flow-insensitive optimizations of register variables. Section 9 describes our framework for inlining method calls. Section 10 describes optimizations that are in progress for the current implementation (as of March 1999), including optimizations based on static single assignment form, elimination of unnecessary register saves and restores, and escape analysis. Section 11 discusses related work, and Section 12 contains our conclusions.

™Trademark or registered trademark of Sun Microsystems, Inc.

¹Though dynamic compilation is the default mode for the Jalapeño Optimizing Compiler, the same infrastructure can be used to support a hybrid of static and dynamic compilation, as discussed in Section 2.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. JAVA'99 San Francisco California USA. Copyright ACM 1999 1-58113-161-5/99/06…$5.00.
2 The Jalapeño Virtual Machine

The Jalapeño Virtual Machine is a new JVM being developed at IBM Research, designed for high performance and scalability of Java applications on SMP servers. Among its subsystems are a dynamic class loader, a dynamic linker, an object allocator, a garbage collector, a thread scheduler, a profiler, three compilers, and run-time support for features such as debugging and experimentation.

Although the Jalapeño JVM is implemented in Java, it is self-bootstrapping; i.e., it does not need to run on top of another JVM. All of its subsystems (including the compilers and the garbage collector) are implemented in Java and run alongside the Java application. Since the optimizing compiler is itself written in Java, the JVM can use it to self-optimize.

The Jalapeño Optimizing Compiler is the key component of the Jalapeño Adaptive Optimization System, which also includes an On-Line Measurements (OLM) subsystem and a Controller subsystem. (The OLM and Controller subsystems are currently under development.) The OLM subsystem monitors the performance of individual methods in the application by using a combination of hardware performance monitor information and software sampling. Context-sensitive profiling information is maintained in a Calling Context Graph (CCG), similar to the Calling Context Tree introduced in [3]. The Controller subsystem is invoked when the OLM subsystem detects that a certain performance threshold has been reached; the controller uses the CCG to build an "optimization plan" that describes which methods the optimizing compiler should compile, and with what optimization levels. After the selected methods are compiled, the OLM subsystem continues monitoring individual methods, including those already optimized, so that further recompilation can be triggered as needed. The overall system is thus both dynamic and adaptive. Figure 1 shows how the Jalapeño Optimizing Compiler fits into the Jalapeño Adaptive Optimization System.
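The controller's role described above — accumulate profile information and emit an optimization plan once a method crosses a performance threshold — can be sketched as follows. This is an illustrative toy, not Jalapeño code: the class and method names are invented, and a simple per-method sample counter stands in for the CCG.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy controller sketch (names invented): a sampler counts method samples;
// when a method crosses the threshold, the controller adds it to an
// "optimization plan" of methods the optimizing compiler should recompile.
public class AdaptiveController {
    private final Map<String, Integer> samples = new HashMap<>();
    private final int threshold;
    private final List<String> plan = new ArrayList<>();

    public AdaptiveController(int threshold) {
        this.threshold = threshold;
    }

    // Called by the (simulated) on-line measurement subsystem on each sample.
    public void sample(String method) {
        int n = samples.merge(method, 1, Integer::sum);
        if (n == threshold) {
            plan.add(method);   // hot: schedule for optimizing compilation
        }
    }

    public List<String> optimizationPlan() {
        return plan;
    }
}
```

A real controller would weigh optimization levels against the compile-time budget; this sketch only shows the threshold-triggered selection.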
The Jalapeño JVM is designed for SMP execution and is multithreaded. It creates a distinct object for each Java thread; one of the fields of this thread object holds a reference to the thread's stack, which consists of a sequence of variable-size stack frames, one per method invocation, chained together by "dynamic links". Another distinguishing feature is Jalapeño's object model: each object has a two-word header, consisting of a reference to a type information block and a status word used for locking, hashing, and garbage collection.

Figure 1: Context of the Jalapeño Optimizing Compiler.
In the Jalapeño JVM, memory management consists of an object allocator and a garbage collector. The JVM supports type-accurate garbage collection, and a variety of type-accurate garbage collectors (generational and non-generational, copying and non-copying) have been implemented, to allow experimentation and measurement to determine which collection algorithm will be best suited for Java programs on SMP servers. The class loader loads classes dynamically; for classes that have not yet been loaded, dynamic linking is supported via backpatching of compiled code after the class is loaded. The run-time system provides support for features such as exception handling and dynamic type checking.

Of the three Jalapeño compilers, the baseline compiler was implemented first. It is used to validate the other compilers and for testing, and serves as the default compiler until the quick compiler is fully functional.
3 Structure of the Jalapeño Optimizing Compiler

The Jalapeño Optimizing Compiler is a dynamic compiler: it is invoked on an automatically selected set of methods while an application is running. To deliver high performance to the application, it must generate the best possible code for the selected methods within a given compile-time budget.

Though dynamic compilation is its default mode, the Jalapeño Optimizing Compiler can also function as a static compiler, compiling methods ahead of time and storing the resulting binary code in a "boot image" (as shown in Figure 2). For example, the optimizing compiler could compile selected methods of an application A and store them in a custom boot image for A; dynamic compilation would then be used for the remaining methods when the application runs. When functioning as a static compiler, compile-time cost is less important. In fact, the Jalapeño JVM itself is bootstrapped from a boot image that contains opt-compiled code for selected JVM methods.²

Figure 2: The Jalapeño Optimizing Compiler used as a static compiler, translating class files (bytecode) into a boot image with opt-compiled code.

At the highest level, the Jalapeño Optimizing Compiler consists of an optimizer front-end (described in Section 5) and an optimizer back-end (described in Section 6). Figure 3 shows the internal structure of the Jalapeño Optimizing Compiler for compiling a single method.

²Another project at IBM is using the front-end of the Jalapeño Optimizing Compiler as the foundation for building a static bytecode-to-bytecode optimizer.
4 Intermediate Representation

This section outlines some essential features of the intermediate representation (IR) used by the Jalapeño Optimizing Compiler. Compared to a stack-based IR, a register-based IR better matches the load-store register-based architectures that we target. Thus, it enables more effective code optimizations, as well as greater flexibility in code motion and code transformation.

An instruction in our IR is an n-tuple (a generalization of quadruples and three-address code [1]), consisting of an operator and some number of operands. The most common type of operand is the register operand, which represents a symbolic register. There are also operands to represent constants, branch targets, method signatures, types, etc.

A key feature of the IR is the addition of operators that implement explicit checks for several common run-time exceptions, e.g., NULL_CHECK and BOUNDS_CHECK operators to test for null pointer dereferences and out-of-bounds array accesses, respectively. These explicit check operators facilitate optimization.

Instructions are grouped into basic blocks, delimited in the instruction stream by LABEL and END_BBLOCK instructions. The IR also includes the control flow graph (CFG) of the method, and space for the caching of optional auxiliary information, such as reaching-definition sets, a data dependence graph, or an encoding of the procedure's loop nesting structure.

As compilation proceeds, the IR exists at three levels, as shown in Figure 3: a high-level IR (HIR), a low-level IR (LIR), and a machine-specific IR (MIR).

Figure 3: Internal structure of the Jalapeño Optimizing Compiler for compiling a single method. (HIR = High-level Intermediate Representation; LIR = Low-level Intermediate Representation; MIR = Machine-specific Intermediate Representation; BURS = Bottom-Up Rewrite System.)
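As a concrete illustration of the n-tuple instruction form, here is a minimal sketch in Java. The operand kinds and the float_mul operator follow the description above, but these classes are invented for illustration — they are not Jalapeño's actual IR classes.

```java
import java.util.List;

// Minimal sketch of an n-tuple IR instruction: an operator plus operands.
// Operand kinds (register, constant) are illustrative, not Jalapeño's.
public class IRDemo {
    interface Operand {}
    record Register(String name) implements Operand {        // symbolic register
        public String toString() { return name; }
    }
    record Constant(int value) implements Operand {          // constant operand
        public String toString() { return Integer.toString(value); }
    }
    record Instruction(String operator, Operand result, List<Operand> uses) {
        public String toString() {
            StringBuilder sb = new StringBuilder(operator + " " + result + " =");
            for (Operand u : uses) sb.append(" ").append(u);
            return sb.toString();
        }
    }

    public static String example() {
        // an HIR-style instruction: t6 = l1 * t5
        Instruction i = new Instruction("float_mul",
                new Register("t6"), List.of(new Register("l1"), new Register("t5")));
        return i.toString();
    }
}
```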
5 Front-end: Translating Java Bytecodes to HIR

The front-end of the Jalapeño Optimizing Compiler translates Java bytecodes that pass Java verification into an optimized HIR. It contains two parts: (1) the BC2IR algorithm, which translates bytecodes into HIR and performs on-the-fly optimizations during the translation, and (2) additional optimizations performed on the HIR after BC2IR. The BC2IR algorithm is summarized in Section 5.1, and examples of its on-the-fly optimizations appear in Section 5.2.

5.1 The BC2IR Algorithm

Figure 4 shows an overview of the BC2IR algorithm. The algorithm contains two parts: (1) the Main Loop, which selects a basic block (BB) from a worklist of BBs (the BBSet), and (2) the Abstract Interpretation Loop, which abstractly interprets the bytecodes within the selected BB. The algorithm maintains a symbolic state during the translation process, consisting of abstract values of stack operands and local variables. The initial state of a BB is the symbolic state at the start of execution of that BB. Initially, the BBSet contains the BBs whose initial states are known (for example, the first BB of the method, whose initial state has an empty stack).

Figure 4: Overview of BC2IR.

The Main Loop selects from the BBSet a candidate BB whose initial state is fully known and for which no HIR has been generated thus far. For each bytecode in the selected BB, the Abstract Interpretation Loop abstractly interprets the bytecode: HIR is generated, the symbolic state is updated, and new BBs with their initial states may be added to the BBSet. At a control-flow join, the values of stack operands may differ on different incoming paths, but their types must match [38]; an element-wise meet operation is used on the stack operands to update the symbolic state at the join. When a branch whose target lies in the middle of an already-generated BB is encountered, that BB is split at the target point. If the stack is not empty at the start of the split BB, the BB must be regenerated, because the initial state of the split BB may be incorrect.

To minimize the number of times HIR is generated for a BB, a simple greedy ordering heuristic is used in the Main Loop: among the candidate BBs, the one with the lowest starting bytecode index is chosen. This heuristic exploits two properties of bytecode produced by current Java compilers: except for certain loops, basic blocks appear in topological order, and the control flow graph is reducible. For programs compiled with current Java compilers, this greedy ordering matches the optimal ordering in practice.³

Example: Figure 5 shows an example Java program, and Figure 6 shows the HIR for method foo generated by BC2IR. The number in the first column of each HIR instruction is the index of the bytecode from which the instruction was generated. Variables prefixed l and t are virtual registers for local variables and temporaries, respectively. When foo was compiled, class B had been resolved, but not class A. As a result, the getfield operations accessing the fields of class A (bytecode indices 7 and 14) are getfield_unresolved operations, while the operation accessing a field of class B (bytecode index 21) is a regular getfield. Also notice that, as a byproduct of BC2IR's on-the-fly optimizations, there is only one null_check of parameter a: a single null_check covers both getfield operations on a.

Figure 5: An example Java program:

    class t1 {
      static float foo(A a, B b, float c1, float c3) {
        float c2 = c1/c3;
        return(c1*a.f1 + c2*a.f2 + c3*b.f1);
      }
    }

Figure 6: HIR of method foo(), annotated with bytecode indices.

³The optimal order for basic block generation, i.e., the one that minimizes the number of regenerations, is a topological order (ignoring the back edges). However, because BC2IR computes the control flow graph in the same pass, it cannot compute the optimal order a priori.
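The element-wise meet on stack operands at a control-flow join can be sketched as follows. This is an illustrative toy, not BC2IR's code: abstract values are represented as plain strings, and a mismatch between the two incoming states is assumed to produce a fresh temporary that each predecessor must define before branching to the join.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the element-wise meet of two symbolic stack states at a
// control-flow join (representation invented). Where both predecessors push
// the same abstract value, that value is kept; where they differ, the merged
// slot becomes a fresh temporary.
public class StackMeet {
    public static List<String> meet(List<String> s1, List<String> s2) {
        if (s1.size() != s2.size())
            throw new IllegalStateException("verified bytecode: stack depths must match");
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < s1.size(); i++) {
            String a = s1.get(i), b = s2.get(i);
            merged.add(a.equals(b) ? a : "t_join" + i);  // fresh temp on mismatch
        }
        return merged;
    }
}
```

The real algorithm also meets the types of the operands (which must match, per the verifier) and the abstract values of local variables; this sketch shows only the stack slots.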
5.2 On-the-Fly Analyses

To illustrate the on-the-fly optimizations performed during BC2IR, consider copy propagation as an example. Java bytecode often contains sequences that perform a calculation and then store the result into a local variable (see Figure 7). A simple on-the-fly copy propagation eliminates most of these unnecessary temporaries. When storing from a temporary into a local variable, BC2IR inspects the most recently generated instruction: if its result is that same temporary, the instruction is modified to write its value directly to the local variable instead. Other optimizations, such as constant propagation, dead code elimination, register renaming for local variables, and method inlining, are also performed on the fly; further details are provided in [38].

Figure 7: Example of on-the-fly copy propagation during BC2IR:

    Original bytecode    Generated IR              Generated IR
                         (optimization off)        (optimization on)
    -----------------    -----------------------   -----------------------
    iload x              INT_ADD  t_int, x_int, 5  INT_ADD y_int, x_int, 5
    iconst 5             INT_MOVE y_int, t_int
    iadd
    istore y
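The store-into-local rewrite described above can be sketched as follows. This is a toy translator for the four-bytecode sequence of Figure 7 (the tuple representation is invented): each emitted instruction is an array of the form [operator, result, operands...], and on istore the translator redirects the result of the most recently generated instruction instead of emitting a move.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy sketch of BC2IR-style translation with on-the-fly copy propagation.
public class CopyPropagation {
    public static List<String[]> translate(List<String> bytecodes) {
        Deque<String> stack = new ArrayDeque<>();  // symbolic operand stack
        List<String[]> hir = new ArrayList<>();
        int temp = 0;
        for (String bc : bytecodes) {
            String[] p = bc.split(" ");
            switch (p[0]) {
                case "iload", "iconst" -> stack.push(p[1]);  // push name or constant
                case "iadd" -> {
                    String b = stack.pop(), a = stack.pop(), t = "t" + temp++;
                    hir.add(new String[]{"INT_ADD", t, a, b});
                    stack.push(t);
                }
                case "istore" -> {
                    String v = stack.pop();
                    String[] last = hir.isEmpty() ? null : hir.get(hir.size() - 1);
                    if (last != null && last[1].equals(v)) {
                        last[1] = p[1];   // redirect result: no INT_MOVE needed
                    } else {
                        hir.add(new String[]{"INT_MOVE", p[1], v});
                    }
                }
            }
        }
        return hir;
    }
}
```

Translating iload x; iconst 5; iadd; istore y yields the single instruction INT_ADD y, x, 5, matching the "optimization on" column of Figure 7.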
6 Back-end

In this section, we describe the back-end of the Jalapeño Optimizing Compiler: the lowering of HIR to LIR, dependence graph construction, BURS-based code generation, register allocation, and final assembly.

6.1 Lowering HIR to LIR

After high-level analyses and optimizations are performed, HIR is lowered to low-level IR (LIR). In contrast to HIR, LIR expands instructions into operations that are specific to the Jalapeño JVM implementation, such as object layouts or the parameter-passing mechanisms of the Jalapeño JVM. For example, HIR operations that invoke the methods of an object or of a class consist of a single instruction, closely matching the corresponding bytecode instructions such as invokevirtual and invokestatic. These single-instruction HIR operations are lowered (i.e., converted) into multiple-instruction LIR operations that invoke the methods based on the virtual-function-table layout. The multiple LIR operations expose more opportunities for low-level optimizations; as a result, the LIR of a method typically contains more instructions than its HIR.

Example: Figure 8 shows the LIR for method foo of the example in Figure 5. The labels at the far right of each LIR instruction indicate the corresponding node in the data dependence graph, shown in Figure 9.

Figure 8: LIR of method foo(). Each instruction is labeled with its node, (n1) through (n12), in the dependence graph of Figure 9.
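A sketch of the kind of expansion performed when lowering a virtual call follows. The LIR operation names, the TIB_OFFSET placeholder, and the temporary names are invented for illustration; they are not Jalapeño's actual operators or layout constants.

```java
import java.util.List;

// Hypothetical lowering sketch: one HIR-level virtual call expands into the
// multi-instruction LIR sequence implied by virtual-function-table dispatch.
// Operation names and offsets are invented placeholders.
public class LowerCall {
    public static List<String> lowerVirtualCall(String receiver, String method, int vtableOffset) {
        return List.of(
            "null_check " + receiver,                          // receiver may be null
            "t_tib = load [" + receiver + " + TIB_OFFSET]",    // type information block
            "t_fn  = load [t_tib + " + vtableOffset + "]",     // function pointer
            "call t_fn, " + receiver + "   ; " + method        // indirect call
        );
    }
}
```

Once expanded, the loads and the null check become ordinary LIR operations that later phases can reorder or eliminate, which is the point of the lowering.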
6.2 Dependence Graph Construction

We construct a dependence graph for each basic block; it is used during BURS-based code generation (Section 6.3). The dependence graph captures register true/anti/output dependences, memory true/anti/output dependences, and control dependences. The current implementation of memory dependences makes conservative assumptions about alias information. Synchronization constraints are modeled by synchronization dependence edges, which connect synchronization operations (monitor_enter and monitor_exit) with memory operations; these edges prevent code motion of memory operations across synchronization points. Java exception semantics [29] is modeled by exception dependence edges, which connect different exception points in a basic block. Exception dependence edges are also added between register write operations of local variables and exception points in the basic block; such edges need not be added if the corresponding method does not contain catch blocks. This precise modeling of dependence constraints enables more aggressive code generation.

Example: Figure 9 shows the dependence graph for the single basic block in method foo() of Figure 5, constructed from the LIR in Figure 8. The graph contains register-true dependence edges, exception dependence edges, and a control dependence edge from the first instruction to the last instruction in the basic block. There are no memory dependence edges, because the basic block contains only loads and no stores, and we do not currently model load-load input dependences.⁴ An exception dependence edge is created between an instruction that tests for an exception (such as null_check) and an instruction that depends on the result of the test (such as getfield).

Figure 9: Dependence graph of the basic block in method foo().

⁴The addition of load-load memory dependences will be necessary to correctly support the Java memory model for multithreaded programs that contain data races.
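The register true-dependence portion of the dependence graph can be sketched as follows. This is a simplified toy: the instruction encoding is invented, and the memory, control, synchronization, and exception dependences described above are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch: register true-dependence edges within one basic block.
// Each instruction is encoded as "def = op use1 use2 ..."; an edge i -> j is
// added when instruction j reads a register most recently written by i.
public class DepGraph {
    public static List<int[]> trueDeps(List<String> block) {
        Map<String, Integer> lastDef = new HashMap<>(); // register -> defining instr
        List<int[]> edges = new ArrayList<>();
        for (int j = 0; j < block.size(); j++) {
            String[] p = block.get(j).split("[ =]+");   // p[0]=def, p[1]=op, rest=uses
            for (int k = 2; k < p.length; k++) {
                Integer i = lastDef.get(p[k]);
                if (i != null) edges.add(new int[]{i, j});
            }
            lastDef.put(p[0], j);
        }
        return edges;
    }
}
```

Applied to a block like the one in Figure 10 (a move, a not, an and, and a compare), this produces the chain of true dependences that the tree partitioner then consumes.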
6.3 BURS-Based Retargetable Code Generation

In this section, we address the problem of using tree-pattern-matching systems to perform retargetable code generation after code optimization in the Jalapeño Optimizing Compiler [33]. Our solution is based on partitioning the dependence graph of a basic block (defined in Section 6.2) into trees that can be given as input to a BURS-based (Bottom-Up Rewrite System) tree-pattern-matching system [15]. Unlike previous approaches to partitioning DAGs for tree-pattern-matching (e.g., [17]), our approach considers the partitioning problem in the presence of memory and exception dependences, not just register-true dependences. We have defined legality constraints for this partitioning, and have developed a partitioning algorithm that incorporates code duplication.

Figure 10 shows a simple example of this approach for the PowerPC. The dependence graph is partitioned into trees before pattern matching. Pattern matching is then performed on each tree using a grammar, relevant rules of which are shown in the figure; each rule has an associated cost, in this case the number of instructions that it will generate. Rule 2 has a zero cost, because it is used to eliminate unnecessary register moves, i.e., for coalescing. Although rules 1, 3, 4, 5, and 6 could be used to parse the tree, rules 1, 2, and 7 are selected as the ones with the least cost to cover the tree. Once the rules are selected, MIR code is emitted according to the selected rules: in this example, only two PowerPC instructions are emitted for the five input LIR instructions.

Figure 10: Example of tree-pattern-matching for the PowerPC. The input LIR moves r0 into r2, complements r1 into r3, computes r4 = r2 AND r3, compares r4 against zero, and branches to LBL if the result is nonzero.
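The least-cost covering described above can be sketched as a bottom-up cost computation over one expression tree. The grammar below is a toy modeled loosely on the Figure 10 example (the rule set and costs are invented for illustration): a fused AND-with-complement pattern lets the whole AND(r0, NOT(r1)) tree be covered by a single instruction instead of two.

```java
// Toy BURS-style cost computation over an expression tree, with an invented
// grammar:
//   reg: REGISTER              cost 0
//   reg: NOT(reg)              cost 1
//   reg: AND(reg, reg)         cost 1
//   reg: AND(reg, NOT(reg))    cost 1   // fused and-with-complement pattern
public class Burs {
    record Node(String op, Node left, Node right) {}

    // minimum number of instructions needed to compute the tree into a register
    public static int minCost(Node n) {
        switch (n.op) {
            case "REGISTER": return 0;
            case "NOT":      return 1 + minCost(n.left);
            case "AND": {
                int plain = 1 + minCost(n.left) + minCost(n.right);
                int fused = n.right.op().equals("NOT")
                        ? 1 + minCost(n.left) + minCost(n.right.left()) // fused rule
                        : Integer.MAX_VALUE;
                return Math.min(plain, fused);
            }
            default: throw new IllegalArgumentException(n.op);
        }
    }
}
```

A real BURS generator precomputes these minimum costs for every nonterminal at table-construction time rather than recursing at compile time; the sketch only shows why the fused pattern wins.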
6.4 Register Allocation

The MIR obtained from BURS code generation contains two types of symbolic registers: temporaries, obtained from converting the stack simulation, and locals, obtained from Java locals. The register allocator maps these symbolic registers onto the physical registers of the target machine, inserting spill code when the symbolic registers cannot all be accommodated. The allocator is structured as a framework that can support different allocation schemes, chosen according to the available compile-time budget. We currently employ a linear scan register allocator [32], which assigns registers in a single pass over the live intervals of a method.

Example: Figure 11 shows the MIR for method foo of Figure 5, obtained after BURS code generation; its operands are still symbolic registers. Figure 12 in Section 6.5 shows the corresponding MIR after register allocation and final assembly.

Figure 11: MIR of method foo() with symbolic registers, annotated with bytecode indices.
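A minimal linear scan sketch in the spirit of [32] follows (the data layout and spill policy are simplified and invented): intervals arrive sorted by start; intervals that have ended release their registers; and when no register is free, the active interval that ends last is spilled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy linear scan register allocator: maps interval names to "R0", "R1", ...
// or "SPILL". Intervals must be pre-sorted by start point.
public class LinearScan {
    record Interval(String name, int start, int end) {}

    public static Map<String, String> allocate(List<Interval> sorted, int numRegs) {
        Map<String, String> assignment = new HashMap<>();
        List<Interval> active = new ArrayList<>();      // kept sorted by end point
        List<String> free = new ArrayList<>();
        for (int r = 0; r < numRegs; r++) free.add("R" + r);
        for (Interval cur : sorted) {
            // expire intervals that ended before cur starts, freeing their registers
            active.removeIf(a -> {
                if (a.end() < cur.start()) { free.add(assignment.get(a.name())); return true; }
                return false;
            });
            if (free.isEmpty()) {
                Interval victim = active.get(active.size() - 1); // furthest end point
                if (victim.end() > cur.end()) {
                    assignment.put(cur.name(), assignment.remove(victim.name()));
                    assignment.put(victim.name(), "SPILL");      // spill the long interval
                    active.remove(victim);
                    insertByEnd(active, cur);
                } else {
                    assignment.put(cur.name(), "SPILL");         // spill cur itself
                }
            } else {
                assignment.put(cur.name(), free.remove(0));
                insertByEnd(active, cur);
            }
        }
        return assignment;
    }

    private static void insertByEnd(List<Interval> active, Interval iv) {
        int i = 0;
        while (i < active.size() && active.get(i).end() <= iv.end()) i++;
        active.add(i, iv);
    }
}
```

The furthest-end spill heuristic is the one from the linear scan literature; a production allocator would also rewrite the instruction stream with loads and stores for spilled intervals.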
6.5 Final Assembly

The final phase of the Jalapeño Optimizing Compiler is the assembly phase, which emits binary executable code from the optimized MIR into the machine code array of an opt-compiled method. The assembly phase also finalizes the exception table and the GC stack-maps of the method (Section 7), expressing their entries in terms of machine code offsets.

Example: Figure 12 shows the MIR for method foo() with physical registers, after register allocation and the insertion of the method prologue and epilogue.

Figure 12: MIR of method foo() with physical registers.
7 Exception Tables and GC Stack-Maps

Methods in the Jalapeño JVM may be compiled by different compilers (baseline, quick, and optimizing), and activation records for methods compiled by different compilers are interleaved on the run-time stack. The Jalapeño JVM therefore defines a common interface that each compiler implements for run-time services such as exception handling and garbage collection.

For exception handling, each compiler generates an exception table for every method it compiles, analogous to the exception table in the class file but specified in terms of machine code offsets rather than bytecode indices. When an exception is thrown, the run-time system uses this table to determine whether, and where, the exception will be caught, leaving the exception-handling mechanism unaware of which compiler compiled the method.

Similarly, the Jalapeño JVM's type-accurate garbage collector needs to determine, for each active stack frame, which registers and stack locations (including register spills) hold references (object pointers). As these locations vary among compilers, and are further modified as optimizations are applied, a different map could in principle be needed for every program point. However, since garbage collection can only occur at certain predefined points, called GC points, maps are only generated for these points. Each compiler provides its GC stack-map information through the common interface, making the garbage collector unaware of which compiler compiled a method. In the optimizing compiler, GC map information is first associated with GC points in the HIR, and is updated as the HIR is converted to LIR, as optimizations are applied, and as the LIR is converted to MIR.
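The per-GC-point map can be sketched as a table keyed by machine code offset. The representation below is invented for illustration; Jalapeño's actual maps are compactly encoded, not hash tables of strings.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy sketch of a per-method GC stack-map table: reference-holding locations
// are recorded only at predefined GC points, keyed by machine code offset.
public class GcMaps {
    private final Map<Integer, Set<String>> maps = new TreeMap<>();

    // called by the compiler while emitting code for a GC point
    public void recordGcPoint(int codeOffset, Set<String> refLocations) {
        maps.put(codeOffset, refLocations);
    }

    // called by the collector for a suspended frame; null means the offset is
    // not a GC point (the collector never suspends a thread elsewhere)
    public Set<String> referencesAt(int codeOffset) {
        return maps.get(codeOffset);
    }
}
```

Because lookups happen only at GC points, the table stays small even for heavily optimized methods, which is what makes the type-accurate scheme affordable.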
8 Flow-Insensitive Optimizations