An Integrated Approach to Retargetable Code Generation

(ILP), which provides an integrated approach to several traditionally separate subproblems in code generation. We not only have a uni-. 0-8186-5785-5/94.
An Integrated Approach to Retargetable Code Generation

Tom Wilson, Gary Grewal, Ben Halley, Dilip Banerji

VLSI-CAD Group, University of Guelph, Guelph, Ontario, Canada N1G 2W1

challenge

fied model of the problem,

instruction

compilers

set

of

because

processors

instruction

erful

(ILP)

model.

methodology

for

for

ILP for

modeling

code

on data;

Introduction

1

data

creasingly

Set wide

Processors

use in

of their

inherent

capable

of generating

matically such

compilers

monly

ular

chip

application,

for

which

tion

available

different

chip

We have

developed that

a variety

model

the

teger

linear

grated problems

code chips,

constrained compilers

a difficult to

task

for

mention

poorly

prepared another,

any

The

one

a family

entire

code

program

approach

both

very

purpose (ILP),

quality

problem

which We not

provides only

Most

have

$03.00 © 1994 IEEE

compilers register

variable

conflicts

in-

example.

Special

signment

techniques

sub-

when

compiling

a uni-

code

generation

might

registers, in

the

purpose code

methods,

a

an address

is

an operand on yet

assignment

such

have focused

process, inspire Most

as [5-8],

stressed on

live

[1-4]

for

heuristic

the major chips.

that

to handle.

schemes

ISP

streams

parallelism

have

on

another.

operand

be able

registers

registers.

which

such

and provide for

of

used

and

been

to engender

operation

assignment

We an inte-

the

has

tends

to obtain

being several

from

purpose

as an

separate

among

markedly

general


parallelism

used

operand

in

different

instruction-level

ensemble

style,

cycle,

the

code

architectures.

traditionally

generation.

and a work-

high

ISP

generation

to several

in code

a methodology

friendlier parallelism

on one with

differs

of

a

programming

conventional

can generate

of special

and

have

less

ex-

generating

research

with

be used

memory,

memory,

published

by

are quite

ones may

to data

registers

machines

instruction-level

pipelined

certain

different

relevant

of the toward

genera-

are make

And

to imple-

is surrounded

capabilities

program

Much

The

is often

from

reg-

memory;

er, possibly

writing

of capabilities!

parallelism

hardware

Only for

the from

address

to data

ALU

whose

combinations

oriented

a partic-

architecture,

features

not

(DSP)

specialized.

etc.

preparing

The

of registers

a constant

access

count

branch.

constants

addresses,

com-

itself;

program

operations,

tracting

Furthermore,

enough”

and

ALU

with

for the next

the

number

varied

is often are

this

to retargetable

of data to or from

a register

instruction

a value

a conditional

a small

designs.

ing prototype for

that

chips

approach

movement

loading

updating

for

consequence

optimizing

ISP

but

processing

~tjust

architectural

quality

special

to market,

to optimize

The

or heavily

Awkward

of high

of the

has

function.

an unconventional

ISP

signal

designed

it often

the

dra-

requirements.

is specially

perform

suited.

digital

would

of code.

with

ment

available.

quality

real-time

and

because

concurrent

of the

isters

in-

Compilers

chips

time

applications for

attendant

if the to

the

these

generally

is the high

by

finding

versatility.

product’s

are not

employed

with

for

are

products,

and

code

their

One problem demanded

commercial

flexibility

hasten

(ISPs)

such integrated

memory;

a field Instruction

to important We believe

We have concentrated on ISP chips that are specially designed for DSP applications. The chips in question have a single ALU that can do a multiply and accumulate in a single instruction cycle. Although the instruction repertoire is limited, the instruction words are long, permitting several things to occur in parallel during each cycle. These include: one ALU operation

of ISPs.

a variety

adapted

architecture.

code generation.

a pow-

high-quality

but one which is sufficiently

it can be easily

in the target

is the first

level par-

provides

generating

that

variations

(ISPs)

allelism, small numbers of registers, and highly specialized register capabilities. Many traditionally separate subproblems in code generation have been unified and jointly optimized within a single integer linear programming

Banerji

Group

abstract purpose

Dilip

Generation

of Guelph

Abstract Special

Code

as-

bottleneck published

approach

the

problem ally

by solving

employing

extensive

a succession

heuristics.

peephole

unoptimized

of subproblems,

One

approach

optimization

generated

to make

code.

a code

generator

that

lem

at once

to obtain

an integrated

that

thrives

on special

the

In contrast,

have

considers

purpose

as sets of registers

usu-

edges

[9, 10] uses most

we not

the

solution,

gives

but

one

registers.

Code

Generation

which

model registers

instructions important

idea

that

The

2.1 We

Subproblems

begin

generated The

with

a data

by the front

following

that flow

are

graph

Included

(DFG),

which

end of an appropriate

subproblems

must

be handled

is

1. Map

combinations

onto

more

of

inclusive

multiply

and

generic

machine

DFG

use

operations

instructions,

such

live

value

must

be consumed

Another

add;

tions 2. Schedule them

operations

to specific

3. Assign when

data

on

control

functional (and

steps

and

bind

units;

address)

“extra”

alternative

values

to

registers

into

ber

of registers,

ily

store

values

exceeds

spills

introduce

certain

5. Introduce

of live values

which

to

resolve

registers

and

temporar-

copies

problems

with

the

in memory;

register-to-register

points

the num-

values

with that

at

special around

loops;

are required concerns

Similarly,

across

control

block

generic

registers

boundaries

and

7. Correctly

around

compact

(highly

parallel)

imum 2.2

consistently loops;

the

individual

machine

number

of final

Important

components

instructions

into

of

the

patterns

with

a min-

instructions.

Concepts

candidate

registers

in

patterns

that

ated

Model

code

Our Our

solution

gram.

Thus

tioned of any

is to

and

trade

Although tecture,

model the

of

off issues the

reflect model

the

the

any

arise

realities

solution

within

the

in

an

can

re-

address its

various

spill

result.

terms,

archi-

supports example,

certain

use at another. at

one

point points useful

have

subsume

cannot

which

overlap

edges

require

edges

elements

in the

implication some

and

gener-

regions one

requires that

feature

plus

op-

be adcurrent

reserving

an ad-

the array

using

autoincre-

of the

code,

use of the

point

selection using

access

could

base

or by

reference

Similarly,

among

array

its

of access,

at

of in-

instruc-

may

Operations,

and traversing

register

groups machine

potentially

(recomputing

point

Within

register


by

design.

active.

For

either

final

as final

solution

ex-

addressing,

operation

chosen

of only

Another

relate

affect

design.

also

Another

such

final

model

associated

application

of a particular in general

ment.

a final they

represent

at each

are

in

are called

dress register

criteria

constraints

an integrated

constraints

is expressed

the

offset

that

copies

code

inclusive

be chosen

patterns

and

the

which

objects.

dressed

menwhose

correctness

all

tional

pro-

inequalities,

ILP,

that

linear

subproblems

the

providing

actual

integer

linear

Since

by

thus

an

of the

specify

solution.

together

on all

terms

effect

“subproblems”, of the

based

model

in

feasible

considered late

we

above

combined

is

other,

generated

in the

DFG

For

characteristics

appear

not

code.

or register

of array

to more

in-

represent

are inserted

solution.

generic

chosen

each

spills

op-

DFG

final

modes

patterns,

same

the

in the

must

contains

which

on overall

for a correct

or may

several it;

of

may The

This

The

operations

where

operations

tions.

DFG

use in

alternative

the

DFG

other

to be handled

choose.

or copy

appear

one of which

onto units).

operations,

depending

if they

our

may

for

spill

They

structions, 6. Assign

and

ample

purpose

wrap

edges

one

is gen-

way.

is that

at places

solution.

exactly

certain

uniform

other

as sequentially

registers

possibilities

DFG

same

regis-

order;

operations)

purpose

ILP

be useful,

the

functional

the

optional

the

might

before

(like

same

some

(like

idea

which

example,

possible;

4. In case the number

basic

from

cludes

and

regis-

conventional

the

in

the

objects

special

allows

to

register

is essentially

in a systematic

as

mapped

that

notion

approach the

When-

must

resources

we

support.

mutual

are

non-sharable

and patterns can

and

they

other

one

sets of

view.

ter,

This

What

conflict”

edges

scheduling

This

of a scheduling

DFG

erated.

way

in-

DFG.

of registers,

machine

variable

two

compiler.

generator:

“live

ever

by a code

the

abandoned

in favor

DFG

inclusive

operations, the

is the

have

stresses

incompatibility,

of

numbers

that

We

certain

retargetability.

the

for various

assignment.

for more

parts

its inherent

depicting

view

be used

represent

cover

to do is change

An

Integrated

could that

allowable

ter 2

and

the

needs

prob-

that

“patterns”

structions

of

only

entire

and

related

the

same

is assignment

to a set of edges in the data

flow

might

imply

of an optional spill

code

at

value. of the graph

same

(DFG).

One

application

where and -

one

involves

edge

another

represents

(recentering

the same

the

To support constraints

edges, a loop,

general,

the

ability

blocks

when

and

edges

assign

flexibility,

the ILP

are dynamically

on the

code

blocks

current

ILP

contains

enabled

by

and

values

of solution

There

variables.

that

operation

given

a compiler

generic

path

has

also

lected.

been These

which

can

DFG

of data

with

and

Registers

a single An

and

such

If certain certain

we data

specific

then

within

the

makes

set

of

ducing trol

blocks

operations control

data

movement, in turn,

in-





t as a

using

the

but

the

be included, in

one

attempt

correctness

list

and

criteria

summarizes

conveys

the

for

what

general

any

the

con-

strategy

of

a basic

DFG

operation

can be included

in at most

pattern;

a DFG

operation

is active

pattern

and

if it is not

if it is met

covered

by

by at least

one

to the design

and

edge;

an edge is active totally

if it is essential

within

an active

an edge is inactive

pattern;

if it is totally

within

an active

pattern;
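The covering criteria listed above (a basic DFG operation can be included in at most one pattern; an edge is inactive if it is totally within an active pattern) can be illustrated with a very small 0-1 selection problem. The following is only a sketch under stated assumptions, not the model from the paper: the DFG, the pattern names (`mac_a`, `mac_b`, `dual_add`), and the cost (one instruction per active pattern plus one per uncovered operation) are hypothetical, and exhaustive enumeration stands in for an ILP solver.

```python
from itertools import product

# Hypothetical DFG: two multiplies feeding a chain of two adds.
OPS = ["mul1", "add1", "mul2", "add2"]
EDGES = [("mul1", "add1"), ("add1", "add2"), ("mul2", "add2")]

# Hypothetical combining patterns: name -> set of DFG operations covered.
PATTERNS = {
    "mac_a": {"mul1", "add1"},     # multiply and add
    "mac_b": {"mul2", "add2"},     # multiply and add
    "dual_add": {"add1", "add2"},  # overlaps both MAC patterns
}


def solve(ops, edges, patterns):
    """Enumerate every 0-1 assignment of pattern variables and keep the cheapest.

    Feasibility: each operation is covered by at most one active pattern.
    Cost (an assumption): active patterns plus operations left uncovered.
    """
    names = sorted(patterns)
    best = None
    for bits in product((0, 1), repeat=len(names)):
        chosen = [n for n, b in zip(names, bits) if b]
        # Reject assignments where two active patterns cover the same operation.
        if any(sum(op in patterns[n] for n in chosen) > 1 for op in ops):
            continue
        covered = set().union(*(patterns[n] for n in chosen))
        cost = len(chosen) + sum(op not in covered for op in ops)
        if best is None or cost < best[0]:
            # An edge stays active unless both ends lie inside one active pattern.
            active_edges = [
                e for e in edges
                if not any(e[0] in patterns[n] and e[1] in patterns[n] for n in chosen)
            ]
            best = (cost, chosen, active_edges)
    return best


cost, chosen, active_edges = solve(OPS, EDGES, PATTERNS)
print(cost, chosen, active_edges)
```

In this toy instance the two multiply-and-add patterns are selected together, and only the edge between the two adds remains active and would need a register. A real formulation would express the same conditions as linear inequalities over 0-1 variables so that scheduling and register assignment can be optimized jointly.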

to us to



by each

address

certain tive

cal-

edge

sets are

(because

either

all

active

they

belong

to the

sets

may

have

they

represent

or all

inac-

same

alternative

most

one

implementation);

recognition

Such



possibilities

by appropriate

certain

edge

member

scheduling

(because

at

active

alternative

im-

to a register

that

plementations);

no

distinction

designs. must

between

A DFG

This

merges.

to

in

such

a logically can

be

depict Additional

single-

spanning

be structured within

nodes and



model.

blocks.

also

“optimum”

are the

guarantee

active

that

allocated

and

can an

following

an active

only

allows

permits

The

one active

56001

required

in resources.

remain

“dummy”

branches

been

solution,

dur-

model

assumption

This,

multiblock

control

that



be updated

use with

this

practice

in

between

Motorola

for in

in treats

model.

not

already

be circumvented

model and

The

of resources

conflict

The eral

has

family

options block

ma-

can occur

may

equivalent approach

any

number

longer.

constraints

straints our

capable

as autoincrement,

considered

time

so-

and

required

alternative

to find

be somewhat

or accumulators.

restricted

This

operation.

of any potential

add.

for

the

ILP

Presumably

t, is prespecified,

within

parameters

times

solution.

edges.

ALU

is

The

the feasible

function.

seeks a minimum

cost

The

the entire

of data

assume

memory.

manipulation,

culation

can

by the

and

operation

Other

se-

and

steps,

any

function:

running

operations

developed

addressing

are

memories,

the

specifically

seeks

min(t)

an architecture.

memory-resident

here.

of executing

fastest

no objective

the

patterns

nodes

execution.

registers

a particular

to

covering

registers such

objective

replaced

is multiply

ALU

ways,

instruction

exemplifies

regis-

Inter-block

addressed

ways

solution

steps

been

DFG

was

control

and

instructions

transfers

operand

standard

ing regular

same

Model

and

of control formed

variable

handle

non-pipelined

used for memory

in certain

must

here

the

basic

two

requires

has produced

it subsumes

operation

one or two

memory

data

example

of

simplest

other.

correctly

yet

single

is chosen,

presented only

not

into

manipulation.

parallel

know

but

it – both

with

use of the

is not

block

on inter-block

edge segments.

as such

are The

to any

appropriate to

of primitive

A classic

model

chines

ternal

identified

a pattern

within

The

are

A set of potential

be combined

such

have

example,

are groups

architecture. When

that

for

widths.

front-end

operations

sequences

architecture,

data

of

Assumptions

that

through

relevant

traversing

be represented

connecting

Overview

3.2

several

the number

Operational

a DFG

between

Values

or disabled,

Model

We assume

either

movement

correctly 3.1

operations

points.

or be associated

lution

The

may

ter on logically

model. 3

certain

merge

boundaries

the correct

several

keep

branch

same

to

enables

constraints

at once.

such that

“cyclic”

leaving

– perhaps

edges

of control

considered

depending

In

to different

interconnection

up

a value

a value

loop.

register

are being

linking

represents



sev-

points

by where

node

edges

must

be assigned

to an allowable

register

set;

a way

acceptable

done

active belongs



edges trol

introcon-

ordering


that

represent

block,

between

around

a loop,

cept

a copied

for

the same

must

control use

value);

value, blocks,

the

same

within or

a con-

wrapping

register

(ex-



active

edges

tween

the

active



operations

in the unit,

DFG,

edges the

same

register,

quential

the

not

but

must

related

by

same

in some

are

order

are

use that

functional

and

of the

same

assigned

register

to

the

in some

se-

Variables

and

following

symbols

index

conflicting

over

ALUs

V

potentially

conflicting

over

registers

V

that: or

not

erations.

covering

i, j, h, k

operation

If

7’

Membership

a value

of nodes nodes

it connects,

are used

tifying

edges

The lected

within

the

activated

the

conveyed

final

two

auxiliary

ILP,

are defined

pairs

besides

to

the

be

step

and

others for

register

as-

results

are

some

of which

of which

must

(y,t,z,x,u),

internal

use

type intgr

t

by

intgr

final

step

opi =

total

if 1, opi activated

xij     0-1   if 1, edge (i, j) activated

uij,r   0-1   if 1, edge (i, j) uses register r

0-1 0-1

if 1, ~i

must

operations

(nodes

operations

that

precede

of the

could

opj

not

constores

otherwise

conflict.

sets exclusive

mutually

dependent

requiring

same

one

edge

may

contain

alternatives for

activation

register

of other

of Cc identify ter.

Such

successive though

whose

are

wrap

It

must

that often

iterations.

also the

require

They

in the

patterns

that

cover

opi

patterns

that

cover

edge

~j

registers

suitable

for

edge

are sets of operation

an

activation set.

The

same

segments to

convey

are logically

separate

contain

the

interlock a loop

be actiall require

implies

from

all

that

may

activation

around

physically

A.

set

members

edges

edges

that

each

a set of edges status.

of a number

ments

from

activation

The

Constraints

sets regis-

or seg-

a value

to

“connected”,

model.

most

one combining

k

Detail

pattern

may

cover

any

operation:

From opj

follow opj objects:

r

ing

last

set of alternative

any

edge must

be either

pattern. by

member

of some

nal

design

has its

xij

(2).

=

Such

A= and must

or be covered and

either

by a pattern.

the

one

by a combinedges

an

has no possible

1 from

A=, exactly

or covered

non-alternative

constraint

alternative

edges,

activated

Ordinary,

handled

is not

DFG)

appear

Bi B~j

following

are

will

may

in which

mutually

edge

edge

are

also

is the

sole

appear An covering

in the fiedge

that

pattern

outset,

(i, j) (Lj)eA~

(i, j) Each

The

(i) opi

operations

possibly

must qij

PEB