Synthesis of systolic arrays by equation transformations - Application ...


Synthesis of systolic arrays by equation transformations*

Catherine DEZAN (IRISA and ENST Bretagne)
Eric GAUTRIN, Hervé LE VERGE, Patrice QUINTON, Yannick SAOUTER (IRISA)

Abstract

Synthesis of systolic arrays, from formal specifications down to a chip, can be done using recurrence equations. The ALPHA DU CENTAUR environment that we present here implements such a design trajectory. Programs, written in the ALPHA language, are rewritten using formal transformations (space-time reindexing, pipelining, control signal generation, etc.), and finally translated into a form suited to conventional VLSI design tools. We present the principle of systolic synthesis using ALPHA DU CENTAUR, and we describe a systolic correlator that has been completely designed following this method.

Introduction

The idea of systolic arrays was primarily introduced by H.T. Kung and C.E. Leiserson [...]

[...] To this end, the Hamming distance $h_i$ is calculated between $r$ and every subsequence $e_i, \ldots, e_{i+N-1}$ of $e$. The output of the algorithm is a binary signal $s_i$, equal to 1 if the distance is greater than 3, and equal to 0 otherwise. These computations are described by the following equations:

$$h_i = \sum_{j=1}^{N} \left( r_j \oplus e_{i+j-1} \right) \qquad (3)$$

$$s_i = (h_i > 3) \qquad (4)$$

where $\oplus$ is the exclusive-or of two binary values. Equation (3) can be rewritten as a decreasing recurrence on the index $j$ by introducing a new variable $H$ indexed by $i$ and $j$. Equation (3) becomes

$$H(i,j) = \begin{cases} 0 & \text{if } j = N+1 \\ H(i,j+1) + \left( r_j \oplus e_{i+j-1} \right) & \text{if } 1 \le j \le N \end{cases}$$

$$h_i = H(i,1).$$
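The equivalence between the direct sum of equation (3) and the decreasing recurrence on j can be checked numerically. The following Python sketch is only an illustration: the function names are hypothetical, sequences are stored as 0-based Python lists while the comments keep the 1-based indices of the equations, and N is taken to be the length of r.

```python
def hamming_direct(r, e, i):
    """Equation (3): h_i = sum for j = 1..N of (r_j xor e_{i+j-1})."""
    n = len(r)
    # 1-based r_j is r[j - 1]; 1-based e_{i+j-1} is e[i + j - 2].
    return sum(r[j - 1] ^ e[i + j - 2] for j in range(1, n + 1))


def hamming_recurrence(r, e, i):
    """H(i, N+1) = 0; H(i, j) = H(i, j+1) + (r_j xor e_{i+j-1}); h_i = H(i, 1)."""
    n = len(r)
    h = 0  # H(i, N+1)
    for j in range(n, 0, -1):  # j = N, N-1, ..., 1
        h = h + (r[j - 1] ^ e[i + j - 2])
    return h  # H(i, 1)


def correlate(r, e):
    """Equation (4): s_i = (h_i > 3) for every i whose window fits in e."""
    n = len(r)
    return [int(hamming_recurrence(r, e, i) > 3)
            for i in range(1, len(e) - n + 2)]
```

For instance, `correlate([1, 0, 1, 1], [0, 1, 0, 0, 1, 0, 1, 1])` returns `[1, 0, 0, 0, 0]`: only the first window differs from r in all four positions.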

The architecture that we have in mind is a linear and unidirectional array composed of N identical cells whose model is shown in figure 1. The computation of $h_i$ is performed by successive accumulations. Each cell computes the exclusive-or between a bit of r and one of e, and adds the result to the intermediate value H input from its left neighbor. When H becomes greater than 3, the s signal is forced to 1 by the cell. Therefore, H can be encoded on two bits, H0 and H1. The r value is memorized by a register loaded from an input cell and controlled by the load signal L. The initial ALPHA program which follows is derived from these specifications:

International Conference on Application Specific Array Processors

Figure 1: Schematic of one cell of the target systolic architecture.

system correlator (e : {i | 1 <= i} of boolean;
                   r : {j | 1 <= j <= 4} of boolean)
       returns    (s : {i | 1 <= i} of boolean);
var
  H0, H1, S : {i,j | 1 <= i; 1 <= j <= 5} of boolean;
  INC       : {i,j | 1 <= i; 1 <= j <= 4} of boolean;
let
  INC = r.(i,j -> j) # e.(i,j -> i+j-1);
  H0 = case
         {i,j | i >= 1; j = 5} : false.(i,j ->);
         {i,j | i >= 1; j >= 1; j <= 4} : H0.(i,j -> i,j+1) ∨ INC;
       esac;
  H1 = case
         {i,j | i >= 1; j = 5} : false.(i,j ->);
         {i,j | i >= 1; j >= 1; j <= 4} : H1.(i,j -> i,j+1) # (INC ∧ H0.(i,j -> i,j+1));
       esac;
  S  = case
         {i,j | i >= 1; j = 5} : false.(i,j ->);
         {i,j | i >= 1; j >= 1; j <= 4} : S.(i,j -> i,j+1) ∨ INC ∧ H0.(i,j -> i,j+1) ∧ H1.(i,j -> i,j+1);
       esac;
  s = S.(i -> i,1);
tel;
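The cell behaviour described in the prose (exclusive-or of one bit of r with one bit of e, accumulation into a value H kept on two bits, and an s signal forced to 1 once H would exceed 3) can be sketched in Python as follows. This is a minimal behavioural model written from that description, under the assumption that the two-bit accumulator simply wraps modulo 4; it is not a line-by-line transcription of the ALPHA listing, and the names `cell` and `array_output` are hypothetical.

```python
def cell(h, s, r_bit, e_bit):
    """One cell: add the mismatch bit to the 2-bit value h and keep s sticky."""
    inc = r_bit ^ e_bit          # 1 when the two bits differ
    if inc and h == 3:           # H would become greater than 3
        s = 1                    # s is forced to 1 and stays at 1
    h = (h + inc) & 0b11         # H is kept on two bits
    return h, s


def array_output(r, window):
    """Chain N cells over one window e_i..e_{i+N-1}; returns the final s."""
    h, s = 0, 0                  # boundary values entering the array
    # The recurrence runs from j = N down to j = 1.
    for r_bit, e_bit in zip(reversed(r), reversed(window)):
        h, s = cell(h, s, r_bit, e_bit)
    return s
```

Under these assumptions, `array_output(r, window)` equals 1 exactly when r and the window differ in more than 3 positions, which is the specification of equation (4).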

Synthesis and Verification

The first transformations, which we do not show here, apply a pipeline to signal r and an input/output pipeline to signal e. Then, the space-time reindexing (i,j -> t = i - j + 8, p = j) has to be applied to all variables. (It is fair to mention that the determination of this reindexing can be done fully automatically by dependency analysis, but is not yet integrated in the ALPHA DU CENTAUR environment. However, the change of basis, once the reindexing is given, is implemented.) The effect of this reindexing is shown on variable H0:

Declaration:

var
  ...
  H0 : {t,p | p >= 1; 5 >= p; t + p >= 9} of boolean;

Equation:

let
  ...
  H0 = case
         {t,p | t >= 4; 5 = p} : false.(t,p ->);
         {t,p | p >= 1; 4 >= p; t + p >= 9} : H0.(t,p -> t-1,p+1) ∨ INC.(t,p -> t,p);
       esac;
  ...
tel;

The next step is the decomposition of this equation into a set of ALPHA-0 equations. This is done by using the substitution mechanism, which introduces two new variables H0CON and H0REG, as shown below:

var
  ...
  H0REG : {t,p | t >= 5; 4 >= p; p >= 1} of boolean;
  H0CON : {t,p | t >= 4; 4 >= p; p >= 1} of boolean;
  H0    : {t,p | t >= 4; 5 >= p; p >= 1} of boolean;
  ...
let
  ...
  H0REG = H0CON.(t,p -> t-1,p);
  H0CON = H0.(t,p -> t,p+1);
  H0 = case
         {t,p | t >= 4; 5 = p} : false.(t,p ->);
         {t,p | t >= 5; 4 >= p; p >= 1} : H0REG ∨ INC.(t,p -> t,p);
       esac;
  ...
tel;

Each one of these new equations can be directly interpreted as hardware components:


Cell Number   Core Area   Chip Area   Simulated Frequency
4             0.58 mm²    3.87 mm²    69.5 MHz
16            2.38 mm²    7.51 mm²    65.5 MHz

Table 1: Area and frequency of the systolic correlator chips

File  Display  Edit  Selections  ALPHA

system Example (X : {i | 1<=i<=3} of integer) returns (s : integer);
var
  sum : {i | 0<=i<=3} of integer;
let
  sum = case
          {i | i=0} : 0.(i->);
          {i | i>0} : X + sum.(i->i-1);
        esac;
  s = (case
          {i | i=0} : 0.(i->);
          {i | i>0} : X + sum.(i->i-1);
       esac).(->3);
tel;

Figure 2: Edit window of ALPHA DU CENTAUR

• signal H0CON of processor i is to be connected to signal H0 of processor i+1,

• H0REG is the output of a register with input H0CON,

• H0 is:
  - for processors 1 to 4, the output of a logical or gate between H0REG and INC,
  - for processor 5, connected to false. (Processor 5 can be interpreted as the hardware which initializes data for the boundary cell of the array.)
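The reindexing and the hardware reading of the three equations can be cross-checked with a small Python sketch. It evaluates the low bit both in the original (i, j) indices and through the register, wire and or-gate equations in the (t, p) indices, and verifies that the two agree under t = i - j + 8, p = j. All function names are hypothetical, INC in (t, p) space is assumed to be the original INC evaluated at i = t + p - 8, j = p, and only points coming from the original domain (i >= 1) are visited.

```python
N = 4  # computing cells 1..4; processor N + 1 = 5 is the boundary cell


def inc(r, e, i, j):
    """INC(i, j) = r_j xor e_{i+j-1} (1-based indices, 0-based lists)."""
    return r[j - 1] ^ e[i + j - 2]


def h0_ij(r, e, i, j):
    """Original equation: false at j = 5, else H0(i, j+1) or INC(i, j)."""
    if j == N + 1:
        return 0
    return h0_ij(r, e, i, j + 1) | inc(r, e, i, j)


def to_space_time(i, j):
    """Space-time reindexing (i, j) -> (t, p) = (i - j + 8, j)."""
    return i - j + 8, j


def h0(r, e, t, p):
    """Processor 5 is tied to false; processors 1..4 use an or gate."""
    if p == N + 1:
        return 0
    return h0reg(r, e, t, p) | inc(r, e, t + p - 8, p)


def h0reg(r, e, t, p):
    """Register: output at time t is its input H0CON at time t - 1."""
    return h0con(r, e, t - 1, p)


def h0con(r, e, t, p):
    """Wire: H0CON of processor p is H0 of processor p + 1."""
    return h0(r, e, t, p + 1)


def check(r, e):
    """Compare both formulations on every point of the original domain."""
    for i in range(1, len(e) - N + 2):
        for j in range(1, N + 2):
            t, p = to_space_time(i, j)
            assert t + p >= 9 and 1 <= p <= 5        # reindexed domain of H0
            if j <= N:
                # dependence (i, j) -> (i, j+1) becomes (t, p) -> (t-1, p+1)
                assert to_space_time(i, j + 1) == (t - 1, p + 1)
            assert h0(r, e, t, p) == h0_ij(r, e, i, j)
    return True
```

The dependence on (t - 1, p + 1) is what makes the design systolic: each value needed at time t in processor p was produced one clock cycle earlier in the neighbouring processor p + 1.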

Following these principles, two versions of the correlator, one with 4 processors and the other with 16, were automatically produced in a 1.5 µm CMOS technology using SOLO 1400. Table 1 summarizes the features of these designs.

Conclusion

The current version of ALPHA DU CENTAUR is a menu-driven program transformation environment. Figure 2 shows an example of an edit window containing the program (2). This editor is syntax oriented. In order to apply a transformation, the user selects an expression in the program and a menu entry. If the transformation is legal, the program is almost immediately modified. The transformations make intensive use of convex calculation routines, which are implemented using Chernikova's algorithm [FQ88]. Extensions to the environment are planned, in particular partitioning methods [Bu] and translation to VHDL.

Acknowledgement

The authors would like to thank their partners of the Esprit Basic Research Action NANA for many useful and very stimulating discussions on regular array synthesis.

References

[BCD*87] P. Borras, D. Clément, Th. Despeyroux, J. Incerpi, G. Kahn, B. Lang, and V. Pascual. CENTAUR: the System. Technical Report 777, INRIA, December 1987.

[Bu] J. Bu. Systematic Design of Regular VLSI Processor Arrays. PhD thesis, Delft University of Technology, Delft, The Netherlands, May.

[Che86] M.C. Chen. A design methodology for synthesizing parallel algorithms and architectures. Journal of Parallel and Distributed Computing, 461-491, December 1986.

[CS84] P.R. Cappello and ...
