Code Generation for a Dual Instruction Set Processor Based on ...

5 downloads 3979 Views 70KB Size Report
Sep 24, 2003 - code size and execution time. ▫ More fine-grained approach is required. • Mode switches must be explicitly handled by code generation. A. B. C.
Code Generation for a Dual Instruction Set Processor Based on Selective Code Transformation

Sheayun Lee, Jaejin Lee, Sang Lyul Min, Jason Hiser, and Jack W. Davidson Seoul National University & University of Virginia Sep. 24, 2003 SCOPES 2003, Wien, Austria

Motivation ƒ Code size is often a critical constraint in cost-sensitive embedded systems.

• Due to a limited amount of available memory ƒ Dual instruction set processors provide an effective mechanism to trade performance off for a small code size.

• Ex. Thumb programs are typically 30% smaller and run 25% slower than their ARM counterparts. Sheayun Lee - SCOPES 2003

1

Dual Instruction Set Processors ƒ Full instruction set • Normal instructions, all operations available • Relatively rich addressing modes • Full set of general purpose registers available ƒ Reduced instruction set • Typically a subset of the full instruction set • Shorter instructions (usually half – 16 bits) • Limited operations and addressing modes • Only a subset of GP registers made visible ƒ Examples • ARM/Thumb, MIPS 32/16-TinyRISC, ARC Tangent, … Sheayun Lee - SCOPES 2003

2

Comparison of the two ISA ƒ Full instruction set

• Larger code size, faster execution • Used for time-critical functions ƒ ƒ

Real-time data processing Interrupt/exception handlers

ƒ Reduced instruction set

• Smaller code size, slower execution • Used for non-time-critical functions ƒ ƒ

User interface functions Bookkeeping code

Sheayun Lee - SCOPES 2003

3

Coarse-Grained Approach ƒ Current tool support from ADS (ARM Developer Suite)

• Instruction set mode specified at the module (file) level ƒ

Provided as a command-line switch to the compiler

• Linker support to handle the dynamic mode •

transitions Called ARM/Thumb interworking

Sheayun Lee - SCOPES 2003

4

Fine-Grained Approach A

ƒ Observation • Function-level or modulelevel approaches are too coarse-grained for enabling a flexible tradeoff between code size and execution time.

ƒ More fine-grained approach is required. • Mode switches must be explicitly handled by code generation. Sheayun Lee - SCOPES 2003

B

C

D

E

F 5

Our Objective ƒ Given a program, generate code that

• maximizes performance ƒ

in terms of execution time

• while satisfying code size requirement ƒ

given as an upper bound on the instruction space

• by selectively using the two different instruction sets for different program parts in a single program (even inside a single function) ƒ

taking the mode switching overhead into account

Sheayun Lee - SCOPES 2003

6

Intuitive Solution ƒ Which instruction set should be used for which part of a given program?

• Use the full instruction set for ƒ ƒ

time-critical portion of the code (frequently executed) thereby maximizing the performance

• Use the reduced instruction set for ƒ ƒ

all the other parts (infrequently executed) thereby minimizing the code size

Sheayun Lee - SCOPES 2003

7

Selective Code Transformation Full FullISA ISA program program

Code size diff Exec time diff

Constraint on total code size

Try translating every block into full ISA Profile Source Source program program

Reduced ReducedISA ISA program program

Compile in reduced ISA

Frequency information

Decision on Heuristic which blocks selection of to translate blocks to into ARM mode translate

Transformer: selective transformation Sheayun Lee - SCOPES 2003

Mixed Mixed mode mode program program 8

Program Notations ƒ A program is given by P = , where • V = {vi | i = 1, 2, …, n} • E = {eij = | there exists a control flow vi → vj} ƒ Code size and execution time variables for each basic block v Instruction set

Reduced

Full

Code size

sR(v)

sF(v)

Execution time

tR(v)

tF(v)

Sheayun Lee - SCOPES 2003

9

Formal Problem Description ƒ Find an assignment of instruction set for each basic block that maximizes performance while satisfying code size constraint. f : V → {α , β }  α if vi is to be compiled into full instructions f (vi ) =   β if vi is to be compiled into reduced instructions

ƒ Such an assignment partitions the set of blocks into two disjoint subsets. • F = {v | f(v) = α } • R = {v | f(v) = β } Sheayun Lee - SCOPES 2003

10

Code Size and Execution Time Code size overhead due to mode switches

Total code size

S = ∑ s F (v ) + ∑ s R (v ) + s v∈F

*

v∈R

∆ t = ∑ cV (v) × (t R (v) − t F (v)) − t

*

v∈F

Execution time savings Execution frequency of block v Sheayun Lee - SCOPES 2003

Execution time overhead due to mode switches

11

Mode Switching Overhead Set of edges along which mode switches should occur

E * = {eij ∈ E | (vi ∈ F ∧ v j ∈ R) ∨ (vi ∈ R ∧ v j ∈ F )} Execution frequency of edge e Total code size overhead Total execution time overhead

s * = os × E * t * = ot × ∑ cE (e) e∈E *

Sheayun Lee - SCOPES 2003

12

Optimization Problem Given a program P = , find an assignment f : V → {α, β} such that it maximizes ∆ t = ∑ cV (v) × (t R (v) − t F (v)) − ot × ∑ cE (e) v∈F

e∈E *

while satisfying S = ∑ s F ( v ) + ∑ s R ( v ) + os × E * ≤ U s v∈F

v∈R

Upper bound on the maximum code size for the whole program Sheayun Lee - SCOPES 2003

13

Approximation Method ƒ Path-based approach to selective code transformation • Based on intraprocedural acyclic subpaths ƒ

Acyclic subpaths capture the set of basic blocks that are executed together.

• Enumerate the acyclic subpaths and associate with each of them ƒ

Cost (increase of code size) ƒ Benefit (reduction in execution time)

• A greedy heuristic incrementally selects one path at a time, whose blocks are to be transformed. Sheayun Lee - SCOPES 2003

14

Path-Based Cost-Benefit Model c( p) =

∑ (s

v∈V ( p ) ∩ R

Cost of transforming blocks on path p

F

(

(v) − sR (v)) + os × E M ( p ) − E m ( p )

Set of edges where mode switch instructions are newly introduced

)

Set of edges where previously existing mode switch instructions are removed

   b( p ) = ∑ (cV (v) × (t R (v) − t F (v))) − ot × ∑ cE (e) − ∑ cE (e)   M  m v∈V ( p ) ∩ R e ∈ E ( p ) e ∈ E ( p )   Benefit from transforming blocks on path p Sheayun Lee - SCOPES 2003

Set of blocks on path p that are currently in the reduced instruction set mode 15

Selection Algorithm: Greedy Heuristic B ← U s – SR R←V F←∅ P ← {intraprocedural acyclic subpaths} do {

for each p ∈ P calculate r (p) = b (p) / c (p) select p ∈ P with maximum r (p) with c (p) ≤ B B ← B – c (p) F ← F ∪ V (p) R ← R – V (p) P ← P – { p | V (p) ∩ R = ∅ } } while (B ≥ minp∈P{ c (p) } ∧ minp∈P{ c (p) } ≥ 0 ∧ R ≠ ∅)

Sheayun Lee - SCOPES 2003

16

Implementation ƒ Based on vpo (very portable optimizer), targeted for ARM/Thumb architecture

• RTL representation of programs • Instruction selection mechanism based on peephole optimization ƒ

One or more Thumb RTL statement can be combined to form a single ARM RTL statement.

• Automatic insertion of mode switching instructions ƒ

Based on control-flow analysis

Sheayun Lee - SCOPES 2003

17

Experiments ƒ Test programs are taken from • MiBench • MediaBench ƒ Six different versions of code are generated for each of the test programs. • With different code size limits ƒ Each test program is run on an evaluation board for execution time measurement. • XScale core-based PXA250 processor • Execution times are measured by using the gettimeofday () system call. Sheayun Lee - SCOPES 2003

18

Results sha

G.721.encode

1.8

1.8

1.6

1.6

1.4

T 20% 40% 60% 80% A' A

1.2 1.0 0.8 0.6 0.4

1.4 1.2 1.0 0.8 0.6 0.4

0.2

0.2

0.0

0.0 code size

execution time

Sheayun Lee - SCOPES 2003

T 20% 40% 60% 80% A' A

code size

execution time

19

Conclusions ƒ Code generation framework that can exploit dual instruction sets

• Path-based cost-benefit model • Heuristic selection of instruction set mode for •

each basic block Automatic handling of dynamic mode switches

Sheayun Lee - SCOPES 2003

20

Future Work ƒ Efficient post-pass register allocation algorithm

• Exploit all the available registers in the • •

transformed code sections Need to precisely analyze the impact of allocating each program variable to a register Further enhance the performance of the resulting mixed instruction set mode program

Sheayun Lee - SCOPES 2003

21

Suggest Documents