Jikes RVM 2.4.5 on a 3.2 GHz Pentium 4
Replay methodology [Blackburn et al. '06]: deterministic run
  1st iteration: compilation + application run
  2nd iteration: application run only
Measurement
  Accuracy: overlap accuracy [Arnold & Grove '05]
  Overhead: 1st iteration, which includes call graph correction
  Performance: 2nd iteration, which is application-only
SPECjvm98 and DaCapo benchmarks
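Overlap accuracy compares a measured DCG against the perfect DCG by normalizing each graph's edge weights to fractions of its total and summing, over the edges the two graphs share, the smaller of the two fractions. The sketch below follows that reading of [Arnold & Grove '05]; treat the exact formula as an assumption, and note that the names `normalize` and `overlap_accuracy` are illustrative, not taken from Jikes RVM.

```python
from typing import Dict, Tuple

# A DCG edge is (caller, callee); its weight is a call count or a sample count.
Edge = Tuple[str, str]


def normalize(dcg: Dict[Edge, float]) -> Dict[Edge, float]:
    """Express each edge weight as a fraction of the graph's total weight."""
    total = sum(dcg.values())
    if total == 0:
        return {edge: 0.0 for edge in dcg}
    return {edge: weight / total for edge, weight in dcg.items()}


def overlap_accuracy(perfect: Dict[Edge, float], measured: Dict[Edge, float]) -> float:
    """Sum, over edges present in both graphs, of the smaller normalized weight.

    Returns a value in [0, 1]; multiply by 100 to get a percentage.
    Edges missing from either graph contribute nothing. 1.0 means the
    measured DCG has exactly the perfect DCG's edge distribution."""
    p = normalize(perfect)
    m = normalize(measured)
    return sum(min(p[edge], m[edge]) for edge in p.keys() & m.keys())


# Hypothetical profiles: the measured DCG has far fewer samples but the same 3:1 split.
perfect = {("main", "f"): 300, ("main", "g"): 100}
measured = {("main", "f"): 30, ("main", "g"): 10}
print(overlap_accuracy(perfect, measured))
```

Because the metric works on normalized weights, a sampled profile can score 100% with far fewer samples than real calls, as long as the relative edge frequencies match.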
Accuracy
[Figure: Accuracy (%) per benchmark, 0 to 100, for no correction, static FDOM correction, and dynamic basic block profile correction. Benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, antlr, bloat, fop, hsqldb, jython, luindex, ipsixql, jbb, and Average.]
Overhead
[Figure: Normalized execution time per benchmark (y-axis 0.90 to 1.04) for static FDOM correction and dynamic basic block profile correction. Benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, antlr, bloat, fop, hsqldb, jython, luindex, ipsixql, jbb, and Average.]
Inlining performance
[Figure: Normalized execution time per benchmark (y-axis 0.80 to 1.05) for static FDOM correction, dynamic basic block profile correction, and a perfect DCG. Benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, antlr, bloat, fop, hsqldb, jython, luindex, ipsixql, jbb, and Average.]
Baseline: profile-guided inlining with default call graph sampling
Summary
The CFG constraint improves DCG accuracy.
Inlining has been tuned for an inaccurate call graph.
Advantages:
  Can easily be combined with other DCG profiling techniques
  Minimal overhead, incurred only during compilation
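One simple way to picture a CFG constraint on the DCG: a call site executes exactly as often as the basic block that contains it, so a basic block profile fixes the total weight of the edges leaving each call site. The sketch below is a hypothetical illustration of that idea, not the talk's correction algorithm: it keeps the sampled split among a site's targets and rescales the site total to the block count. The data model (`CallEdge` as a `(call_site_id, callee)` pair) and the function name `correct_with_block_counts` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical data model: an edge is (call_site_id, callee_method).
CallEdge = Tuple[str, str]


def correct_with_block_counts(
    sampled: Dict[CallEdge, float],
    site_block_count: Dict[str, int],
) -> Dict[CallEdge, float]:
    """Rescale sampled edge weights so the edges leaving each call site sum to
    the execution count of the basic block that contains that site.

    Sampled weights are used only to split a site's count among its targets
    (e.g. a virtual call). A site that executed but was never sampled has no
    known target, so this sketch cannot add edges for it."""
    site_total: Dict[str, float] = defaultdict(float)
    for (site, _callee), weight in sampled.items():
        site_total[site] += weight

    corrected: Dict[CallEdge, float] = {}
    for (site, callee), weight in sampled.items():
        count = site_block_count.get(site)
        if count is None or site_total[site] == 0:
            # No CFG constraint available: keep the sampled weight.
            corrected[(site, callee)] = weight
        else:
            # Preserve the sampled split, but force the site total to the block count.
            corrected[(site, callee)] = count * weight / site_total[site]
    return corrected


# Monomorphic sites a@1 -> b and a@2 -> c, plus a polymorphic site s -> {x, y}.
sampled = {("a@1", "b"): 1_000, ("a@2", "c"): 500, ("s", "x"): 30, ("s", "y"): 10}
blocks = {"a@1": 5_000, "a@2": 10_000, "s": 100}
print(correct_with_block_counts(sampled, blocks))
```

With these hypothetical block counts, the monomorphic edges become 5,000 and 10,000, restoring c as the hotter callee, and the polymorphic site's 100 executions are split 75/25 following its 3:1 sample ratio.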
Future work
More inter-procedural optimizations that exploit a high-accuracy DCG
Questions and comments
Thank you!
Timing bias misleads the optimizer
DCGPerfect: a calls b 5,000 times and c 10,000 times
DCGSample (sampling with timing bias): a→b receives 1,000 samples, a→c receives 500 samples
Edge frequencies were reversed!
Inlining decision: the inliner may inline b instead of c
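The reversal can be checked with a little arithmetic. Normalized, DCGPerfect gives a→b one third of the calls and a→c two thirds, while DCGSample gives a→b two thirds and a→c one third, so the overlap is min(1/3, 2/3) + min(2/3, 1/3) = 2/3, about 67%. The sketch below reproduces this; the assignment of counts to edges is read from the slide's diagram, and `hottest_callee` is a hypothetical stand-in for an inliner that favors the heaviest edge, not Jikes RVM's actual heuristic.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (caller, callee)

# Edge weights from the slide: exact call counts vs. timer-biased samples.
dcg_perfect: Dict[Edge, int] = {("a", "b"): 5_000, ("a", "c"): 10_000}
dcg_sample: Dict[Edge, int] = {("a", "b"): 1_000, ("a", "c"): 500}


def hottest_callee(dcg: Dict[Edge, int], caller: str) -> str:
    """Return the callee of `caller` with the largest edge weight
    (hypothetical 'inline the hottest edge' heuristic)."""
    out_edges = {callee: w for (src, callee), w in dcg.items() if src == caller}
    return max(out_edges, key=out_edges.get)


def fractions(dcg: Dict[Edge, int]) -> Dict[Edge, float]:
    """Normalize edge weights to fractions of the total."""
    total = sum(dcg.values())
    return {edge: w / total for edge, w in dcg.items()}


def overlap(perfect: Dict[Edge, int], sample: Dict[Edge, int]) -> float:
    """Overlap accuracy: sum of the smaller normalized weight per shared edge."""
    p, s = fractions(perfect), fractions(sample)
    return sum(min(p[e], s[e]) for e in p.keys() & s.keys())


print(hottest_callee(dcg_perfect, "a"))  # c
print(hottest_callee(dcg_sample, "a"))   # b
print(round(overlap(dcg_perfect, dcg_sample), 3))  # 0.667
```

An inliner driven by DCGSample would pick b, even though c is called twice as often.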
Call graph profiling in an online optimization system
[Diagram: source program (e.g., Java bytecode) → compile & instrument → machine code → dynamic call graph → online optimization system]
Profiling and the program run at the same time.
Profiling overhead must be minimized.
Corollary: profiling accuracy is sacrificed.