SysteMorph: An SoC Framework for Adaptive Dynamic Optimization Systems
Hamid Noori†, Kazuhito Eshima†, Yousuke Fujii†, Makoto Yoshida††, Takeshi Soga††, Norifumi Yoshimatsu††, Takanori Hayashida† & Kazuaki Murakami†
†Department of Informatics, Kyushu University
††Fukuoka Industry, Science & Technology Foundation
E-mail: †[email protected], ††[email protected]
Abstract
This research investigates a possible architecture for adaptive dynamic optimization systems. In this architecture, the running application is monitored and the frequently executed parts of its code are detected. These parts are then optimized for the architecture of an embedded hardware accelerator, and the binary code is rewritten so that it can use the accelerator. To validate the concept, we prototyped on the IPFlex DAP/DNA-HP. The proposed architecture for SoC implementation uses a dynamic software pipelining technique for optimization and a simplified 8-way VLIW as the accelerator. Preliminary performance evaluations show a speedup.
ERSA'05
Concept of SysteMorph
[Figure: the system before and after optimization. Blocks shown in both views: Target Programs, ISA, SysteMorph Software, Processor (Instruction Execution), Profiler, Programmable Engine, SmartHardware. Labeled steps: Online Profiling, Hints for Optimization, Adaptive HW/ISA/SW Co-optimization, Binary Rewriting, ISA Redefinition, Hardware Reconfiguration, Mode Control.]
SysteMorph Synthesis Flow on DAP/DNA-HP
[Figure: synthesis flow and target hardware.]
• Input: Hot path information
• Flow for synthesis: Generate Control/data flow graph, Optimization, Renaming, Mapping, Place and Route, Timing matching
• Flow for instruction generation: Generate instructions to control DNA, Generate instructions to configure DNA
• Output: Code for DNA configuration
• Hardware blocks: DAP (32bit RISC Processor), RISC Core, Instruction Cache, Data Cache, BUS Controller, DNA (Dynamic reconfigurable Logic), DNA Matrix, DNA Buffer, Configuration memory, External Memory
Execution Flow on DAP/DNA-HP
[Figure: execution flow before and after optimization.]
• Application Program (before optimization): Ld, Add, Cmp, Br, Add. A branch (taken or not taken) raises a Branch Exception and control jumps to the SysteMorph Program.
• SysteMorph Program: Online Profile. If the hot path is undetected, jump back to the application. If a hot path is detected, run Online Synthesis (generate the DNA control instructions and write the DNA configuration instructions), then Binary Rewriting.
• Application Program (after optimization): instructions to configure DNA, instructions to activate DNA and then wait for completion, and a jump; the remaining slots read Br, NOP, NOP, Br, Add.
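The rewriting step above can be illustrated with a small sketch. It assumes the program is a list of mnemonic strings and that the hot path occupies a contiguous index range; the `CFG_DNA` and `RUN_DNA_WAIT` mnemonics and the function name are hypothetical stand-ins for "configure DNA" and "activate DNA and wait for completion", not the DAP/DNA-HP instruction set.

```python
from typing import List

NOP = "NOP"

def rewrite_hot_path(program: List[str], start: int, end: int, config_id: int) -> List[str]:
    """Replace program[start:end] with a stub that hands the hot path to the accelerator.

    The stub is padded with NOPs so the program keeps its length and the
    addresses of all other instructions stay valid.
    """
    stub = [
        f"CFG_DNA {config_id}",  # hypothetical: load the accelerator configuration
        "RUN_DNA_WAIT",          # hypothetical: activate the accelerator, wait for completion
        f"JMP {end}",            # resume at the first instruction after the hot path
    ]
    region = end - start
    if region < len(stub):
        raise ValueError("hot path is too short to hold the stub")
    patched = stub + [NOP] * (region - len(stub))
    return program[:start] + patched + program[end:]

# Usage: the first five instructions form the (assumed) hot path.
before = ["Ld", "Add", "Cmp", "Br", "Add", "Mul", "St"]
after = rewrite_hot_path(before, start=0, end=5, config_id=7)
```

Keeping the rewritten region the same length is one simple design choice; it avoids relocating branch targets elsewhere in the binary.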
Branch History Method
Atop: head address of basic block A
Abottom: end address of basic block A

Branch History Table
Branch Address    Branch Target Address    Number of Branches
Abottom           Btop                     300
Abottom           Ctop                     200
...               ...                      ...

• When the number of backward branches exceeds a threshold, the hot path is searched for.
• The path that is taken more frequently is chosen as the hot path.

[Figure: control flow graph of basic blocks A to G annotated with branch counts between 100 and 400, with the hot path highlighted.]
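The selection rule above can be sketched as a greedy walk over branch counts. This is a minimal illustration, not the authors' implementation: it assumes the table has already been reduced to (source block, target block) edge counts, treats every branch back to the loop head as a backward branch, and uses illustrative counts (only the A to B and A to C values come from the table).

```python
from typing import Dict, List, Tuple

# (source block, target block) -> number of times the branch was taken
BranchCounts = Dict[Tuple[str, str], int]

def find_hot_path(counts: BranchCounts, head: str, threshold: int) -> List[str]:
    """Return the hot path starting at `head`, or [] if the loop is not hot enough."""
    # Assumption: backward branches are the edges that return to the loop head.
    backward = sum(n for (_, dst), n in counts.items() if dst == head)
    if backward <= threshold:
        return []
    path = [head]
    current = head
    while True:
        successors = [(n, dst) for (src, dst), n in counts.items() if src == current]
        if not successors:
            break
        # Follow the most frequently taken successor edge.
        _, nxt = max(successors, key=lambda s: s[0])
        if nxt in path:  # the walk has closed the loop
            break
        path.append(nxt)
        current = nxt
    return path

counts = {
    ("A", "B"): 300, ("A", "C"): 200,   # from the Branch History Table
    ("B", "D"): 300, ("C", "D"): 200,   # illustrative values
    ("D", "A"): 500,                    # illustrative backward branch
}
hot = find_hot_path(counts, head="A", threshold=100)  # ["A", "B", "D"]
```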
SysteMorph with VLIW accelerator
• CPU
  • Simple RISC processor
• VLIW based accelerator
  • Instruction RAM
  • Compute intensive execution unit
• H/W profiler assist: stores branch history information
  • Branch Instruction Address
  • Branch Target Address
  • Number of Taken/Not Taken
[Figure: block diagram with CPU, Profiler Assist, VLIW based Execution Unit, Instruction RAM, Instruction Cache, Data Cache, Bus Controller, and External Memory.]
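A software model of the profiler assist may help make the three stored fields concrete. The sketch below is an assumption-laden illustration: the class names, the fixed capacity, and the drop-when-full policy are hypothetical, and only the three fields (branch instruction address, branch target address, taken/not-taken counts) come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class BranchEntry:
    branch_addr: int    # Branch Instruction Address
    target_addr: int    # Branch Target Address
    taken: int = 0      # Number of Taken
    not_taken: int = 0  # Number of Not Taken

class BranchHistoryTable:
    """Hypothetical model of the H/W profiler assist table."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity  # assumed fixed number of hardware entries
        self.entries: Dict[int, BranchEntry] = {}

    def record(self, branch_addr: int, target_addr: int, taken: bool) -> bool:
        """Count one executed branch; return False if the table is full."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            if len(self.entries) >= self.capacity:
                return False  # assumed policy: drop branches that do not fit
            entry = BranchEntry(branch_addr, target_addr)
            self.entries[branch_addr] = entry
        if taken:
            entry.taken += 1
        else:
            entry.not_taken += 1
        return True

    def hot_backward_branches(self, threshold: int) -> List[BranchEntry]:
        """Backward branches (target below branch address) taken more than `threshold` times."""
        return [e for e in self.entries.values()
                if e.target_addr < e.branch_addr and e.taken > threshold]

table = BranchHistoryTable(capacity=2)
for _ in range(3):
    table.record(0x120, 0x100, taken=True)    # loop back edge
table.record(0x110, 0x130, taken=False)       # forward branch, not taken
```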
Dynamic Trace Based Software Pipelining
[Figure: left, the original loop as a control flow graph of blocks A to J with branches If(1) and If(2) and the hot path marked; right, the same loop after applying dynamic software pipelining, with compensation code added.]
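For readers unfamiliar with software pipelining, the sketch below shows the basic idea on a straight-line loop body: each step overlaps the store of iteration k-2, the compute of iteration k-1, and the load of iteration k. It is a generic illustration under simplifying assumptions (three single-step stages, no branches), so it does not model the trace selection or the compensation code of the dynamic trace based variant shown in the figure. The `compute` function is a hypothetical loop body.

```python
from typing import List

def compute(x: int) -> int:
    return 2 * x + 1  # hypothetical loop body

def original_loop(a: List[int]) -> List[int]:
    out = [0] * len(a)
    for i in range(len(a)):
        x = a[i]         # load stage
        y = compute(x)   # compute stage
        out[i] = y       # store stage
    return out

def pipelined_loop(a: List[int]) -> List[int]:
    n = len(a)
    out = [0] * n
    loaded = 0    # value handed from the load stage to the compute stage
    computed = 0  # value handed from the compute stage to the store stage
    # Steps 0 and 1 act as the prologue, steps n and n+1 as the epilogue.
    for k in range(n + 2):
        # Consumers run before producers, so each stage reads the value
        # produced in the previous step, as parallel stages would.
        if 0 <= k - 2 < n:
            out[k - 2] = computed       # store for iteration k-2
        if 0 <= k - 1 < n:
            computed = compute(loaded)  # compute for iteration k-1
        if k < n:
            loaded = a[k]               # load for iteration k
    return out
```

On hardware with parallel issue slots, the three stages inside one step can execute in the same cycle, which is where the speedup comes from.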
Hot Paths
[Figure: chart per benchmark with two axes: Total number of Instructions in Hot Paths (0 to 3500) and Number of Hot Paths (0 to 120).]
Benchmarks:
• automotive: basicmath, bitcount, qsort, susan_corners, susan_edge, susan_smoothing
• consumer: jpeg_encode, jpeg_decode, lame3.96, tiff2bw, tiff2rgba, tiffdither, tiffmedian, typeset
• office: ghostscript, stringsearch
• network: dijkstra, patricia
• security: blowfish_encode, blowfish_decode, rijndael_encode, rijndael_decode, sha
• telecom: adpcm_encode, adpcm_decode, crc32, fft
          N (way)   Int   FP   Ld/St   Br
Exp. 1    4
Exp. 2    8         3     1    3       1
Exp. 2    12        5     1    5       1

[Figure: speedup (axis 0 to 3.5) for sha, crc32, susan_smoothing, tiff2bw, cjpeg, 181.mcf, mpegenc, and 164.gzip.]
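To see how the slot mix of each configuration bounds performance, the sketch below computes a lower bound on cycles for a block of operations when dependencies are ignored. It assumes the values listed for the 8-way and 12-way configurations are Int, FP, Ld/St, and Br slots in that order, that every operation takes one cycle, and that the baseline issues one operation per cycle; the operation counts are made up for illustration.

```python
import math
from typing import Dict

# Assumed slot mixes, read from the 8-way and 12-way configurations.
SLOTS_8WAY = {"Int": 3, "FP": 1, "Ld/St": 3, "Br": 1}
SLOTS_12WAY = {"Int": 5, "FP": 1, "Ld/St": 5, "Br": 1}

def min_cycles(op_counts: Dict[str, int], slots: Dict[str, int]) -> int:
    """Lower bound on cycles when only the per-class slot counts limit issue."""
    for op_class in op_counts:
        if op_class not in slots:
            raise KeyError(f"no slot for operation class {op_class!r}")
    return max(math.ceil(op_counts.get(c, 0) / slots[c]) for c in slots)

def speedup_bound(op_counts: Dict[str, int], slots: Dict[str, int]) -> float:
    """Upper bound on speedup over a one-operation-per-cycle baseline."""
    total = sum(op_counts.values())
    return total / min_cycles(op_counts, slots)

# Hypothetical hot path: 12 integer ops, 6 loads/stores, 2 branches.
ops = {"Int": 12, "Ld/St": 6, "Br": 2}
cycles_8 = min_cycles(ops, SLOTS_8WAY)    # limited by Int: ceil(12 / 3) = 4
cycles_12 = min_cycles(ops, SLOTS_12WAY)  # limited by Int: ceil(12 / 5) = 3
```

Real schedules also have to respect data dependencies, so measured speedups would sit at or below this bound.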
Preliminary Performance Evaluation
[Figure: speedup per benchmark (axis 1 to 4) for the 8-way and 12-way configurations, with two series: No Dependency and Considering Dependency.]
Benchmarks: basicmath, bitcount, qsort, susan_corners, susan_edge, susan_smoothing, jpeg_decode, jpeg_encode, lame3.96, tiff2bw, tiff2rgba, tiffdither, tiffmedian, h264_decode_fb001, dijkstra, patricia, stringsearch, blowfish_decode, blowfish_encode, rijndael_decode, rijndael_encode, sha, adpcm_decode, adpcm_encode, crc32, fft