Real time graphical simulation of systolic arrays

3 downloads 0 Views 384KB Size Report
Department of Electrical Engineering. Portland Slate ... or register contents in the form of animated schematics or timing diagrams. [121, [151. [161. ... of systolic arrays - is proposed below. The idea ... script 6le. or changed at the run time. AU visual ... codc and many sample script filcs are available in Appendix C to [9]. 171.
REAL TIME GRAPHICAL SIMULATION OF SYSTOLIC ARRAYS Hai Van Dinh Le Marek A. Perbwski Department of Electrical Engineering Portland Slate University P.O. Box 751 Portland. Oregon 97207 tel. (503)464-3806 x 23. ABSTRACT 'Ihis papex presents a new concept in visual simulation Systolic Arrays Graphical Simulator, SAGS - that allows for the visualization of the data and control flow in various systolic arrays on the monitor sawn. The systolic architecture is consmrted interactively from various basic cells and arrays. The simulator has successfully been used to simulate several wellknown and new architccturcs and has even proven itself to be useful in finding and correcting an e m in a well publishized algorithm for general pvpose maaix computations that is currently being realized in hardware and can have potential applications that rquire e x m e reliability.

1. INTRODUCTION Simulation techniques play an i m p o m t role in &sign analysis, correctness verification, performance evaluation, and test g e n e d o n f a digital systems. Many simulators are widely avrulable f a the university and industrial user communities. Recently a new generation of simulators is becoming popular that permit the USQ to visualize the data flow,logic levels, or register contents in the form of animated schematics or timing diagrams [121, [151. [161. [17l. U81. U91, [U)]. PI]. The arrival and wide acceptance of systolic pmcessor~[ll, [21, MI, [51, [ 6 ] ,together with the difficulty involved in designing as well as the problems concerning the analysis of systolic algorithms, call for new kinds of CAD tools for their synthesis at many levels, optimization, verification, and simulation. It would be very advantageous if these tools were visual and combined the properties of the tools from [12], [I51 - [21] with interactive editing ("schematic capture") and the modification of systolic architectures of various sues, types of arrays, and cells linked in many configurations. Perhaps visual simulators will be the first tools of this type. Such a tool - the visual simulator of systolic arrays - is proposed below. The idea behind such a tool Seems very natural and perhaps comes from the originator of systolic arrays, H. T Kung, who would overlap and shift s e v d overhead transparencies m illusuate data flows in systolic anays. With the exception of [I]], howevu, w e have not been able to identify such tools and we are also not aof anyone using them in practical applications. The classical method of displaying the simulation data and the ndts is totally useless for systolic arrays. A s y s m designer must be able tb observe the movement of every piece of data as it traverses thrwgh the m y at all tunes to verify whether a given algorithm is correctly mapped into the corresponding array architecture. He must also be able to see the results from the operations performed on each piece of data by any of the cells. Furthermore, for debugging plrposes, he must be able to lodc into the registers of every cell at any one time and see the values of all the control signals present in that cell. In short, he must have the most detailed view of the entire system that may consist of many arrays composed of many types of cells This pexspective must be available to him with the accuracy of single pulses during the entire simulation process. To meet the above requirements. a new breed of simulator - a systdic array simulator - has been developed and built by us to aid a hardware or software designer in the taslr of designing. evaluating, verifying, debugging, and reaching systolic systems. It was deemed essential that this simulator should be graphics based, hence its name Systolic Arrays Graphical Simulator. or SAGS for short

2. SYSTOLIC ARRAYS GRAPHICAL SIMULATOR From the vuy beginning. SAGS was designed to simulate the systolic systrms of any confgurarion. These configuratlon are specified to SAGS by mesllS of scriptfrles. A script file contains all of the vital information about the system: its n u m k of m y $ , their types and sizes,the way thcy are linked together. and the microprosrams used in w h cell. A script file also specifies when and where the input data and the cona01 signals should be fed into, and the outplt dam uken from, the systun. SAGS allows for systems with multiple inpuf cmtrol, and output data arcams. Each input M control meam is stored into ASCII files prior to being accessed by SAGS. Similarly. SAGS outputs are written into ASCII files. During the simulation run time, SAGS consistenlty and successively e x a m all of the steps involved in a problem, showing the results from each step on the screen. This is r e f d to as a mulrisrep mode of execution. It can be stopped and restarted at any time. Alternatively. SAGS can single-step through the problem, allowing a more detailed inspection of the temporary results, which is refared to as a single-stepped mode ofuccw'on. Switching beoveur these two modes can be easily accomplished at any time by typing the appmphw command in at the keyboard. Visually. SAGS allows for all of the arrays of the system to be sten on the monitor srrcen as long as cach army has a reasonable n u m b of cells. the real estate of the monitor screen is limited, arrays can be overlapped. 'Ibis m a n s that one from the hdcground can be brought into the forepund to be scrutinized at any the. In addition. individual anays can be intexdvely positioned anywhen? on the screen to match the systcmschematic closely. SAGS allows for an anay to be viewed in two different ways: a real perspccave. with the array and its cells apparing smalln and therefore containing less infomation, and a full pcrspecrive in which the cells show all their recontents. The perspective of an array can be specified in the script 6le.or changed at the run time. AU visual changes made to a system COnfigraatiOn at run time c m be morricd back to the script file for reuse. A stmu bar on the top of the screen displays additional information such ILE the c k k ~ l c pnumber. r h t~otal execution time, and the array a c d y being s e l d The USQ can measun an execution time for a cell from their miaocodes This time unit is used as a step which allows for the evaluation and the compxison of the speed of various algorithm variants Other propaties of secondary importance like, for example, the shape of the cells or the array color, are also implunented. SAGS has been used for debugging, verification and performance evaluation of various algorithms. It aids in experimenting with new designs, for example, in comparing various systolic architectures for a systolic algorithm or in playing with different cell algorithms for the same architecture such as Gausian elimination and Givens rotations. SAGS was developed on an IBM Personal Computer, running the MS DOS opzating system. It was written in Turbo Pascal [13]. a dialect of the Pascal programming language as descn'bed by Winh and Jensen in [141. Its current vcrsion limits the architecture to 15 arrays using the T u r k Graphir Toolboxm, but SAGS itself can handle an unlimited number of arrays depending on the memory available. Each cell can have two inputs and two outputs on each of its four sides. Then is practically no resmction concaaing the n u m b of inputs and outputs from the architecture or with respea U) the number of different types of cells and arrays. The SAGS socodc and many sample script filcs are available in Appendix C to [9].

171 ISCAS '89

CH2692-2/89/0000-0171$1.00 0 1989 IEEE

They are also in ASCII format on floppy disks and are available on request to intuested parties. To produce an executable SAGS copy, two software packages are need& a copy of the JXSW TW~W~ m c d "compiler (version 3.0) and a copy of the Turbo Grophir Toolbox" (version 1.07). both available from Borland International. Inc. Since SAGS is graphics-based, a video card with bit-mapped graphics capabilities is needed to run the program. Because computer graphics and simulations are floating-pointintensive applications, the use of a numeric co-processor is highly recommended. For SAGS to ralie advantage of the numeric co-processor, it must be compiled using a compiler version that supports 8087 floating point mathematical routines.

3. ONE EXAMPLE OF SAGS APPLICATION SAGS was created to help in verifying and debugging systolic processors for linear algebra applications, particularly for various implementations of Faddeev algorithm DI. Many archimtwes,including those from [2]. [4]. [6],[7land [81 have been simulatedin [91.Indeed it was. during the use of SAGS, while simulating the Nash implementation of Faddeev's algorithm [6]. [7l,that the mentioned bug in its boundary cell microprogram was 'discovered and identilied. In h e Appendix A, six snapshots from a series of 13 will illuswte the simulation of Nash's systolic design. Each snapshot is a SAGS scrcen output for one execution step. To appreciate the usefulness of SAGS in finding and correcting the mor. l d us 6rst explain the Nash's architecture which is shown in Fig. 1. It is based on the Faddeev algorithm. This is a g e n d purpose algorithm useful for a wide class of I .a& operations and especially suited for systolic implementation. It calculates the value of CX + D, given AX = B, where.A, B, C, and D are known ma&iics of d e r n, and X is an unknown manix. The problem is formulated as

[a.

--%-

-c If by some means, a suitable linear combination of the rows of A and B is found and added to the rows of -C and D as follow

if

x!"= 0 beg t n

--3k

-'

CO",

-C+M

then

= 0 r - 0 S,Ul

whae W specifies the approPriate linear combination such that only zuua appear in the lower left hand quadrant then the lower right hand quadrant will become manix E = CX + D. This is because annihilating -C quires W = CA-' sothatD + WB = CA-'B,and since AX = B,D + WB = D + C X The elegance and simplicity of the algorithm is apparent when one notes that to carry it out it is only necessary to annul the lower left hand quadrant by applying a suitable matrix triangularization procedure to the left side of (1) while extending the operation to its right side. This can be done using Gaussian elimination. To improve the numerical stability of the Faddeev algorithm, Nash suggested a modification to it by repking the Gaussian eiimination procedure used to aiangulante matrix A of (1) with orthogonal h g u larizatwn. The m y cowisrs of two typts of cells: tk boundaty celk (represented by circles) and the inrrmal cells (represmted by squms). Esch c d su)rts a micropgram. Changing t k micropmgrams of cells will allow the array to execute different procedures. For clarity, it is useful to divide Nash's algorithm into a m p h a s c procedure. In the first phase, A is mangularired by a series of Givens rotatwns (simultaneously applied to B ); in the second phase, the diagonal elements of the resulting triangular matrix are used as pivoting elements in the Gaucinn eliminolion procedure on C and D, where columns of C will be z d out and D wiU become the result. Note that for the Gaussian procedure to work properly. it is necessary that these pivoting elements be non-zao. Hence, the requirement that A be a full rank.i.e. at least one of irs square submamces of order n has a non-zero determinant Nash's systolic implementation. shown in Fig. 1, consists of a uiangular array and its right extensioa a square array. The triangular array. based on Kung's dcsign in [4]for orthogonal viongularizafion. performs Givens rotations on A (first phase) and ordinary Gaussian elimination on C (second

end else begin ?

- fi" -

CO,,

s,,,

r/f

* Xl,'*t

r = 7

end

DELAY CELL :

The cell's execution codes areshown i n ~ i g m c 2 Funhum=, the added prossing of ordinary Gaunirn elimination requires the extra ccdes shown in Figure 3.

172

Department ot ELecmcal and ComputerEnpeenng and The coordinated Science Laboratory

Nash's array computing (2). Only steps 2.3.4,5, 1'2, and 13 are shorn h ' n the series of 13, but these p i c m clearly illustrate the sequence of evens leading up to the erroneous results. (lks a w n views have been overlapped slightly so that they will fit on to a single page).

BDUSCARY CELL

4. CONCLUSION

L.

;cnpororiiy u n u s e d n-blt

We have presented a new approach to simulate systolic pessm. A visual simulator allows for the visual analysis of data flows in systolic armys of various configurauons. This approach allows for the. visualization of the behavior of such systems and is, therefore, also an excellent system design educational tool. It proved to be very useful in describing and analyzing several well-known and new architectures. A microcode error has been identified and comcted in a well-known algorithm thanks to this simulator.

bus

Fig. 3 The square array simply extends the corresponding pmcessing to B and D and thus only consistsof square cells. "he input data flow involves feeding A and B through the system fmtn the top with cells executing the micropm grams CodCS shown in Figure2 on each incoming row. Thisconesponds to the 6rst phase of the modified algorithm. Noolce that the r e q u i d skewmg of the data flow is paformed by a triangular anay of delay cells (represented by rectangles) above the system. The second phase is accomplished by a similar flow of C and D. Only this time the cells execute the m i c m p r o p m shown in Figure 3 on the data elements and the multing matrix will appear row by row coming h m the boaom of the square array. These output rows are saaightened back to normal by another hiangular array of delay cells below the square array. The input data flow can be contiguous, i.e. matrices A and B and then C and D can enter the array without any intermpb'on in between.

5. LITERATURE

H. T. Kung,"Why Systolic Architectures?",IEEE Computer Magazine, Vol. 15,NO. 1, Jan~ary1982 pp. 37 46. H. T. Kung. and C. E. hiserson, "Systolic Arrays (for VLSI)".S w s e M o n i r Proc. 1978,SIAM. 1979.pp. 256 - 282. D. K. Faddecv, and V. N. Fad&eva, Computational Methods of Linear Algebra, W. H. Frr~manand Company. 1%3. pp. 150 - 158. W.W. Gentleman, and H. T. Kung, "Matrix Triangulanzatlon by Systolic Arrays", Proc. SPIE - The Intemmionol Society of Optical Engineering. Vol. 298,1981,pp. 19 - 26. H. T. Kung. "Systolic Array for Orthogonal Triangularization", Proc. SPIE, San Diego, CA, 1981,pp. 19 - 26. J. G. Nash. "A Systolic/Cellular Computer Architecture for Linear Algebraic Operations", Proc. 1985 IEEE International Conference on Robotics and Automation. March 1985,pp. 779 - 784. I. G. Nash, and S. Hansa~,"Modified Faddeev Algorithm for Matrix Manipulation",Proc. SPIE. Vol. 495,August 1984,pp. 39 - 46. H. Y. H. Chuang. and G. He, "A Versatile Systolic Array For A4aaix Compumtions". The Intcnrotional Symposium on Computer Architec-

-

Although Nash's modification of F a d k v ' s algorithm is mathematically sound, its systolic implementation, unfortunately, contains some serious deficiencies. For instance, it is possible for the array to produce erroneous results, as illusuated by the following example. Supposewe have a linear system AX = B of order n = 3 where.X is an unknown matrix. and anem more enmes in column 1 of mahix A are z m , in this case, ozl:

twe. 1985,pp. 315

- 322.

V. H. D.Le. "A New Genaal purpoSe Systolic Array for Matrix Compu&tim". M. S. Thesis, m e n t of Electrical Engineering, Portlbd State University, P.O.Box 751,Portland, Oregon 97207,April 1988.

Since the determinant of A, A (A) I 9,is a non-zero, A k full Thus, guaranteeing that a solution to the system exists and that it is unique with xI= 1.33,xz = -0.67and x3 = 1.67. When A is fed into the m y show in Figure 1 during the second step, however, the boundary cell in row 1, co!umn 1 will clear its r register (it had a , I = 1 stored prior to this) because azl = 0. This effectively transforms A into another &x, say E, whose entries are identical to A's except for e l l . which is zero, a d all further processine will be on the resulting linear system.

[lo] V. H.D. Le, and M. A. Perkowski, "A New General purpoSe Systolic Array for Maaix Computation", submitted to IEEE TC, 1988 [ll] H. HellWagner, and T. Muller, "LISAS - Loops-implemented systolic army simulator". Proc. of VLSI & Computers Conference. COMPEURO '87,IEEE,1987. Simu[E] Caominh Tuan, and U A. Perkowski, "LogiclRegister-Transfer lator with Visualization", Record of Northconl86, p a p r 11.3, Scpt u n k 30 -October 2,1986,S &. Washington. [13] TlabowscalMantd". [14] Wirrh. and Jensen, "Pascal User Manual and Rcport". In this case. sinceA (E)= 4 is m-m,E is also f u l l m k and thaG [15] G.P.Brown, R.T. Carling. C.F. H m t , D.A. Kramlich, and P. Souza fore the procedure is completed successfully, but withx, = 3.x2 = 4 andx, = "Program Visualization: Graphical Support for Software Develop-1, which is the solution to (3) instead of (2). ment",Compurer.pp. Z7-35,August 1985. The cause of the above a m W. be traced to a bug in ttie m i c r o p [ 161 D.D. Hill,"Edisim: A Graphical Simulator Interface for LSI Design", gram in the boundary cell. As Figure 2 reveals, this microprogram has a line IEEE Trans. CAD ICs, Vol. CAD-2, NO. 2, pp. 57 - 61,April 1983. of code r=O [17] RJ.K. Jacob, "A State Transition Diagram Language for Visual Prowhich always clears the content of regism r whenever x, = 0. In fact, if at gramming", Computer, pp. 51 - 59,August 1985. any time during processing the boundary cell of a row i w i v e s a z e m [18] B. Meiamed. RJ.T. Morris, "Visual Simulation: "he Performance valuedx, homanintanalcellofrowi-1,anemneousresultwillappear Analysis Workstation",Compuler. Vol. 18,No.8,pp. 87 - 94,August when the processing ends. Thus. to ccmct the problem, this line should be 1985. removed. The c m t e d Nash algorithm has not been published to our A. Sugimoto et al. "VEGA" A Visual Modeling Language for Digital [19] knowledge and we do not thinlr we wwld have been able to 6nd and c ~ m a C i u i t Design",Proc. IEEE ICCD'84. pp. 807 - 812,1984. this bug without SAGS. For the plrpose of veri6m1,.we tint found the correct s a h & ~ ~ ! [20] A. Sugimoto, " "VEGA"A Visual Modeling Language for Digital Sys1 tuns", IEEE Design ond Tesr. pp. 38 - 45,June 1984. examples (2) and 0)manually and thcn by Using O k algorithms W1-g theFaddecvalgorithmwithGivaurotations:xl=1.33,r2=-0.67.x3=1.67. ![211 A S u ~ . S . A b e , M . K l n o d a a n d Y . K a t o , " A n O b j c c t - O r i e n t e d Now that the correct result is hown. the reader is encouraged to examine the Visual Simulata for Micropogram Development". Proc. 23-rd DAC, series of snapshots in Appendix A which shows the grsphics simulation Of pp. 138 144, June 29 July 2, Las vegss, 1986. ~

173

...J

,r

I

Department ot ELecmcal and ComputerEnpeenng and The coordinated Science Laboratory

Suggest Documents