SysteMorph: An SoC Framework for Adaptive Dynamic Optimization ...

1 downloads 39 Views 103KB Size Report
Dynamic Optimization Systems. Hamid Noori †, Kazuhito ... utilizes dynamic software pipelining technique for optimization and a simplified 8-way VLIW as the ...
SysteMorph: An SoC Framework for Adaptive Dynamic Optimization Systems Hamid Noori †, Kazuhito Eshima†, Yousuke Fujii†, Makoto Yoshida††, Takeshi Soga††, Norifumi Yoshimatsu††, Takanori Hayashida† & Kazuaki Murakami† †Department of Informatics, Kyushu University ††Fukuoka Industry, Science & Technology Foundation E-mail: †[email protected], ††[email protected]

Abstract z

This research investigates a possible architecture to adaptive dynamic optimization systems. In this architecture, the running application is monitored and high frequently executed parts of the code are detected. Then, these parts of code are optimized, according to the architecture of embedded hardware accelerator. To be able to use the hardware accelerator, the new binary code is rewritten. IPFlex DAP/DNA-HP was the target of our prototyping to validate the concept. The proposed architecture for SoC implementation utilizes dynamic software pipelining technique for optimization and a simplified 8-way VLIW as the accelerator. Some preliminary performance evaluations show speedup.

ERSA'05

2

Concept of SysteMorph Before Optimization

After Optimization

Online Profiling

Target Programs

ISA

Hints for Optimization

Adaptive HW/ISA/SW Co-optimization

Binary Rewriting

Target Programs

SysteMorph Software

SysteMorph Software

Instruction Execution

Instruction Execution Processor

Profiler

Processor

Programmable Engine

SmartHardware

ISA Redefinition

ERSA'05

Profiler Programmable Engine

Hardware Reconfiguration

Mode Control

3

SysteMorph Synthesis Flow on DAP/DNA-HP Hot path information

DNA (Dynamic reconfigurable Logic)

Renaming

Place and Route

Generate instructions to control DNA

Code for DNA configuration

Flow for instruction generation

Generate instructions to configure DNA

RISC Core

Configuration memory

Mapping

DNA Buffer Data Cache

Timing matching

BUS Controller Instruction Cache

Generate Control/data flow graph

Flow for synthesis

Optimization

External Memory

DNA Matrix

DAP (32bit RISC Processor) ERSA'05

4

Execution Flow on DAP/DNA-HP Application Program (before optimization) Ld Add Cmp Br Add

Application Program (After Optimization)

Branch taken Branch Not taken

Branch Exception

jump

Hot path Undetected

Instructions Activate DNA to control and then waiting for completion configure DNA jump

Br NOP NOP Br Add

Generate DNA instruction to control and write DNA configuration instruction.

SysteMorph Program

Online Profile

Hot path detected

Online Synthesis

ERSA'05

Binary Rewriting 5

Branch History Method „

Atop : Head Address of Basic Block A Abottom:End Address of Basic Block A

Branch History Table Branch Address

Branch Target Address

Number of Branch

Abottom

Btop

300

Abottom

Ctop

200

・・・

・・・

・・・

Number of Backward Branch > Threshold

„ Find Hot Path

Hot Path 400 A 300

ERSA'05

200

B 300

D

100

E 100 200 F

400

200 C

300

300

A path which has more frequently taken is chosen as a Hot Path.

100

G 100 6

200

SysteMorph with VLIW accelerator External Memory

Instruction Cache

Instruction RAM

z

CPU

• z

VLIW based accelerator

• •

VLIW based Execution Unit

Simple RISC processor

Instruction RAM Compute intensive execution unit

CPU Profiler Assist

Data Cache Bus Controller

z

H/W profiler assist Store Branch history information

• • • ERSA'05

Branch Target Address Branch Instruction Address Number of Taken/Not Taken

7

Dynamic Trace Based Software Pipelining Hot Path A

A

If(1)

If(1)

Compensation code C

D

D

F

E

If(2)

G

B

C

B

H

I

Applying Software Pipelining

J

J

Original loop

G

F

If(2)

I

H

E

After applying dynamic software pipelining ERSA'05

8

0

ERSA'05 9

telecom_fft

telecom_crc32

telecom_adpcm_decode

telecom_adpcm_encode

security_sha

security_rijndael_decode

security_rijndael_encode

security_blowfish_decode

security_blowfish_encode

2000

network_patricia

network_dijkstra

office_stringsearch

office_ghostscript

consumer_typeset

consumer_tiffmedian

consumer_tiffdither

consumer_tiff2rgba

consumer_tiff2bw

consumer_lame3.96

consumer_jpeg_decode

consumer_jpeg_encode

automotive_susan_smoothing

automotive_susan_edge

automotive_susan_corners

automotive_qsort

automotive_bitcount

automotive_basicmath

Hot Paths

3500 120

3000 100

2500

Total number of Instructions in Hot Paths Number of Hot Paths

80

60

1500

1000 40

500 20

0

Exp. 1 4

Exp. 2 8 3 1 3 1

Exp. 2 12 5 1 5 1

ERSA'05 sh a

cr c3 2

su sa n_ sm oo th ing

tif f2 bw

1 speedup

Br

cj pe g

3 Ld/St

18 1. m cf

FP

m pe ge nc

Int

16 4. gz ip

N (way)

3.5 3 2.5 2 1.5 1 0.5 0

10

fft

crc32

adpcm_encode

adpcm_decode

sha

rijndael_encode

rijndael_decode

blowfish_encode

blowfish_decode

stringsearch

patricia

dijkstra

h264_decode_fb001

tiffmedian

tiffdither

tiff2rgba

tiff2bw

lame3.96

jpeg_encode

jpeg_decode

susan_smoothing

susan_edge

susan_corners

qsort

bitcount

basicmath

speedup

Preliminary Performance Evaluation 4

No Dependency Considering Dependency

3

2

1

8-way 12-way