Computer Aided Implementation of Complex Algorithms on DSP's Using Automatic Scaling

Korina KASSAPOGLOU* and Martin VETTERLI**

* Ecole Polytechnique Fédérale de Lausanne, 16, Chemin de Bellerive, CH-1007 Lausanne, Switzerland
** Center for Telecommunications Research, Columbia University, New York City, NY 10027

Abstract. A methodology for transforming complex floating-point algorithms into correct fixed-point DSP programs is presented. In particular, an automatic scaling scheme leading to overflow-free programs is described. Depending on the application, the scaling generated automatically may either fit perfectly, or be modified in order to substantially improve accuracy. Each case is illustrated by an example (FFT and Recursive Least-Squares algorithms).

1. Introduction

Most current digital signal processors (DSP's) use fixed-point arithmetic, thus leading to scaling and precision problems when implementing complex algorithms. Therefore, development methodologies are required in order to cope with the problems encountered [1,2].

A computer-aided implementation method is briefly presented in this paper. Starting with a floating-point complex-structured algorithm and following a stepwise refinement approach, the corresponding optimized DSP code is obtained. Special attention is given to the scaling assignment issue. A matrix expression that controls the evolution of the scaling throughout an algorithm is first constructed. It is then used to determine scale factors automatically, with respect to an overflow-free configuration.

The implementation method is briefly described in Section 2. In Section 3, we show how to control the scaling throughout an algorithm and then derive the automatic scaling scheme. Two different applications are described in Section 4: an FFT algorithm is successfully transformed into assembler using the results of Section 3, while the scaling obtained automatically is modified in the case of a recursive least-squares algorithm (RLS) in order to increase its accuracy (thus leading to potential overflow risks). Note that the FFT is an FIR algorithm, while the RLS is an IIR one, where variables can vary substantially.

2. The methodology

The proposed Computer Aided Implementation method is a top-down step-wise refinement approach to efficient implementation of complex structured digital signal processing algorithms into machine code for a signal processor. The main steps are shown in Figure 1.

Starting with a floating-point program that describes the algorithm, the first transformation simplifies its structural complexity. Any recursion is then removed, while external information controls code linearization, which is introduced for execution speed purposes. In order to structure computations in the way a processor executes them, all arithmetic expressions are decomposed into simple operations that involve only two operands and assign the output to a variable. The next and quite important step performs the floating- to fixed-point conversion, according to the theoretical basis presented in Section 3. The final step translates the fixed-point Pascal code into the assembler of the target processor, mapping the algorithm into a series of macros, most of which have already been optimized and are available in a library.

At each stage of the methodology, the resulting program is error-free and executable. Additional external information required at each step must be validated using crosschecks. Any input program is required to be compilable, so that syntax tests are never needed. The whole development is performed in a high level language (Pascal) and is processor-independent until the last transformation.
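The decomposition of arithmetic expressions into two-operand operations can be illustrated with a small sketch. The code below is not the tool described in this paper (which operates on Pascal programs); it is a minimal Python illustration, and the function name `decompose` and the temporary names `t1`, `t2`, ... are hypothetical.

```python
import ast

# Operators handled by this sketch (an assumption: plain arithmetic only).
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def decompose(expr):
    """Split an arithmetic expression into two-operand assignments.

    Returns (steps, result) where steps is a list of strings such as
    't1 := a * b' and result is the name holding the final value.
    """
    steps = []

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = visit(node.left)
            right = visit(node.right)
            temp = f"t{len(steps) + 1}"
            steps.append(f"{temp} := {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    result = visit(ast.parse(expr, mode="eval").body)
    return steps, result


steps, result = decompose("a * b + c * (d - e)")
for line in steps:
    print(line)
```

Each emitted line involves exactly two operands and one output variable, which is the form assumed by the scaling analysis of Section 3.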

3. The automatic scaling scheme

3.1 Notation

Let x be a real number. We adopt the following representation:

$$x = X \cdot 2^{sf(X)} \qquad (1)$$

where $X$ is a signed number of magnitude less than 1, and $2^{sf(X)}$ is the scale factor of x ($sf(X)$ is referred to below as the scale exponent). Since fixed-point arithmetic deals with integers, $X$ is advantageously factored as:

$$X = \bar{X} \cdot 2^{-[nb(X)-1]} \qquad (2)$$

where $nb(X)$ denotes the wordlength used and $\bar{X}$ is an integer.

3.2 Scaling after an arithmetic operation

Before introducing the scaling mapping of an algorithm, it is imperative to clearly express the scale exponents of any operation output.

Truncation. If $l$ least significant bits (LSB) and $m$ most significant bits (MSB) are truncated from $\bar{X}$, the integer representation of x according to (1)-(2), the latter will be represented by:

$$x = \bar{Y} \cdot 2^{-[nb(Y)-1]} \cdot 2^{sf(Y)} \qquad (3)$$

with $\bar{Y} = \bar{X} / 2^{l}$, unless an overflow occurred, and

$$nb(Y) = nb(X) - l - m, \qquad sf(Y) = sf(X) - m \qquad (4)$$

Note that rounding can always be expressed as the addition of half the least significant bit, followed by a truncation.

Multiplication (c = ab). The product holds a priori on $nb(A)+nb(B)-1$ bits. If $l$ LSB and $m$ MSB are truncated from it,

$$\bar{P} = \bar{A}\bar{B} / 2^{l} \quad \text{(if no overflow when truncating)}$$

$$nb(P) = nb(A) + nb(B) - 1 - l - m$$

$$sf(P) = sf(A) + sf(B) - m \qquad (5)$$

Addition (/Subtraction) (c = a+b). When adding, scale factors of operands must be equal. If this is not the case, they must be adjusted using shifts: the variable with the smaller scale factor will be shifted right, so that no overflow is caused. Assuming that A has more bits than B, the sum holds on $nb(A)+1$ bits. According to the adopted notation, operands are obviously left-aligned. Let $u = sf(A) - sf(B)$, and let $l$ LSB and $m$ MSB be truncated, so that $nb(C) = nb(A) + 1 - l - m$.

If $u \ge 0$,

$$\bar{C} = \left(\bar{A} + \bar{B} \cdot 2^{[nb(A)-nb(B)]} \cdot 2^{-u}\right) / 2^{l}$$

$$sf(C) = sf(A) + 1 - m = sf(A)/2 + [sf(B)+u]/2 + 1 - m \qquad (6a)$$

If $u < 0$,

$$\bar{C} = \left(\bar{A} \cdot 2^{u} + \bar{B} \cdot 2^{[nb(A)-nb(B)]}\right) / 2^{l}$$

$$sf(C) = sf(B) + 1 - m = [sf(A)-u]/2 + sf(B)/2 + 1 - m \qquad (6b)$$

Division (c = a/b). Division is treated in a slightly different way. First, the wordlength of a division output depends on the desired accuracy and thus on the algorithm used. Secondly, in the case where A is always inferior to B, $\bar{A}$ may be left-shifted [equation (7) missing]. In general, the division output is given by [equation (8) missing], and thus

$$sf(C) = sf(A) - sf(B) - m \qquad (9)$$
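The rules (4), (5) and (6) can be exercised numerically. The following Python sketch is an illustration only: the function names are hypothetical, the wordlength of a sum is taken as nb(A)+1-l-m (assuming nb(A) is at least nb(B)), and the m dropped MSBs are assumed to be redundant sign bits so that no overflow occurs.

```python
from fractions import Fraction


def value(xbar, nb, sf):
    """Real value x = xbar * 2^-(nb-1) * 2^sf, following (1)-(2)."""
    return Fraction(xbar) * Fraction(2) ** (sf - (nb - 1))


def truncate(xbar, nb, sf, l=0, m=0):
    """Drop l LSBs and m MSBs, following (3)-(4).

    Assumes the m dropped MSBs carry no information (no overflow).
    Returns the new (integer, wordlength, scale exponent) triple.
    """
    return xbar >> l, nb - l - m, sf - m


def product_format(nb_a, sf_a, nb_b, sf_b, l=0, m=0):
    """(wordlength, scale exponent) of a product, following (5)."""
    return nb_a + nb_b - 1 - l - m, sf_a + sf_b - m


def sum_format(nb_a, sf_a, sf_b, l=0, m=0):
    """(wordlength, scale exponent) of a sum, following (6a)-(6b).

    Assumes nb(A) >= nb(B); the larger scale exponent wins.
    """
    return nb_a + 1 - l - m, max(sf_a, sf_b) + 1 - m


# A 16-bit integer 96 with scale exponent 3; its 4 low bits are zero,
# so dropping them loses nothing and the value must be preserved.
x = (96, 16, 3)
y = truncate(*x, l=4, m=2)
print(y, value(*x) == value(*y))
```

Taking the larger of the two scale exponents in `sum_format` is equivalent to (6a)-(6b), since max(sf(A), sf(B)) = [sf(A)+sf(B)]/2 + |u|/2 with u = sf(A) - sf(B).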

Equations (5), (6) and (9) give the scale exponent of any operation output. In the rest of this section, we assume that we are dealing with an algorithm, thus with a set of operations. The main idea is to group all such successive equations within a single expression and to get each obtained scale exponent in terms of the scale exponents of input variables only.

3.3 Evolution of scaling throughout an algorithm

We are given an algorithm having n input variables $x_i$ and containing p operations. We assume that each operation involves only two variables and produces a new one. Let S be a (n+p)-vector containing the scale exponents of all n+p variables. The latter are ordered as follows: first the n input variables, then the p operation outputs in the same order as they appear in the algorithm. The vector S is given by:

$$S - D\,|E\,S| = A\,[sf(X_i)] - B\,[m_j] + C \qquad (10)$$

where $[sf(X_i)]$ contains the scale exponents of input variables only, and $[m_j]$ gives the number of MSB truncated after each operation (and thus represents a kind of overflow-risk index).
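To make equation (10) concrete, consider a one-operation algorithm c = a + b (n = 2, p = 1). The matrices in the Python sketch below were written by hand from (6a)-(6b) for this illustration; they are an assumption and not a transcription of Table 1. The name `solve_scaling` is hypothetical, and the solver assumes that line k of the correction term only involves elements of S with a smaller index.

```python
def solve_scaling(A, B, C, D, E, sf_in, m):
    """Solve S - D|E S| = A sf_in - B m + C line by line (equation (10))."""
    rows = len(A)
    # Right-hand side of (10), one entry per variable.
    rhs = [
        sum(A[k][i] * sf_in[i] for i in range(len(sf_in)))
        - sum(B[k][j] * m[j] for j in range(len(m)))
        + C[k]
        for k in range(rows)
    ]
    S = [0] * rows
    for k in range(rows):
        # Correction for line k, using only the elements of S already known.
        corr = sum(
            D[k][j] * abs(sum(E[j][i] * S[i] for i in range(k)))
            for j in range(len(m))
        )
        S[k] = rhs[k] + corr
    return S


# Hand-built matrices for c = a + b, with S = [sf(a), sf(b), sf(c)]:
# sf(c) = sf(a)/2 + sf(b)/2 + |sf(a) - sf(b)|/2 + 1 - m
A = [[1, 0], [0, 1], [0.5, 0.5]]
B = [[0], [0], [1]]
C = [0, 0, 1]
D = [[0], [0], [0.5]]
E = [[1, -1, 0]]

S0 = solve_scaling(A, B, C, D, E, sf_in=[3, 0], m=[0])
S1 = solve_scaling(A, B, C, D, E, sf_in=[3, 0], m=[1])
print(S0, S1)
```

In this example the absolute value |sf(a) - sf(b)| is what makes the result independent of which operand has to be shifted.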

A, B, C, D and E are matrices (/vectors) that depend on the algorithmic structure only. (Dimensions: (n+p)xn, (n+p)xp, (n+p)x1, (n+p)xp and (p)x(n+p).) They are constructed using five functions f, g, h, l and o. Functions f, g, h and l map each of the n+p variables to, respectively, an n-vector, a p-vector, a real and a p-vector. Function o maps each of the p operation outputs to a (n+p)-vector. Table 1 shows how these functions are constructed using the rules given in paragraph 3.2, and how matrices A, B, C, D and E are defined.

The right hand side of equation (10) contains contributions of, respectively, scale exponents of input variables, numbers ($m_j$) of MSB truncated (or lost) after each operation j, and some constants that depend on the kind of operations. The correction $D\,|E\,S|$ upon S on the left hand side is non zero if and only if additions with scaling incompatibility are encountered. When adjusting this scaling, no overflow may occur. Therefore, a comparison of two scale exponents is needed to indicate which of the two operands is to be shifted. This may create a dependence of E upon S, which is avoided by using the absolute values.

It is important to note that, by construction, matrices B, D and E are lower triangular. In addition, E has zeros along its diagonal. Therefore, equation (10) is solved in a straightforward way, if one starts at line 1 and proceeds line by line: the correction $D\,|E\,S|$ at line k uses the first k-1 elements of S only.

3.4 Automatically generated scaling

Scale exponents of S are actually parametrized by the p integers $m_j$, since matrices A, B, C, D and E are completely known once equation (10) is established and scale exponents of input variables are specified.

It is obvious that, if we wish an overflow-free configuration, none of the most significant bits may be truncated. Thus, all $m_j$'s must be set to zero:

$$m_j = 0 \quad \text{for } 1 \le j \le p \qquad (11)$$

This gives a solution $S_0$ to equation (10). As mentioned in the previous paragraph, this solution is easily obtained if one proceeds line by line.

In the case of algorithms where variables have small dynamic ranges, and provided that the input variables are represented with maximum accuracy, the solution $S_0$ is quite satisfactory. It realizes a nice trade-off between two basic requirements, namely good accuracy and correct operation of the algorithm (no overflow, for instance).

In the case of algorithms where dynamic ranges of variables vary substantially, the solution obtained with (11) may be too severe, deteriorating the accuracy. If some statistics on variable ranges are available, one may introduce a limited number of low-probability overflows, thus improving the accuracy. This means that some values $m_j$ are modified. The new resulting values of S can be obtained incrementally, using:

$$S - D\,|E\,S| = [S_0 - D\,|E\,S_0|] - B\,[\Delta m_j] \qquad (12)$$

3.5 How to use these results

When generating the Pascal program that supports fixed-point computation simulations according to Figure 1, all that is needed at operation j is $m_j$ and the jth line of $E\,S$. If this is non zero, then a shift must precede this operation. The resulting scale exponent will inform one how to read the contents of the corresponding memory cell. If decisions upon integers $m_j$ are not satisfactory, one may change them and only solve the last equation (12).

4. Applications

The automatic scaling scheme was applied successfully to a 16-point FFT algorithm [3]. This is an FIR algorithm, and thus there are no constraints on output variable representations. Assuming that the input signal is normalized, the scale exponents of initial signal samples are equal to zero. Since coefficients are less than 1, their scale exponent is also zero. Solving equation (10) using the automatic scaling scheme (11), we determined that each stage increased the scale exponents of samples by 1. After the four required FFT stages, scale exponents of samples were equal to 4. The obtained FFT operates correctly, with good accuracy. (Actually, the accuracy can be improved by using a more detailed analysis taking into account that a DFT is a multidimensional rotation.)

The automatic scaling scheme was also applied to an IIR algorithm, namely a recursive least-squares identification one. When updating system parameters, one uses a covariance matrix P, which is also updated recursively. In order to get convergence, matrix P must be initialized to a high value. We initialized it to a value near 100 and fixed its input scale exponent at 7. After the scheme (11) was applied, the parameter scale exponent increased by 16, while that of matrix P rose by 18 after one recursion!


However, we know that once convergence is reached, parameter values do not vary much and values of P decrease. This means that there exist link equations between the variables and, of course, they are not taken into account by (10)-(11). Based on elementary statistics upon variable ranges, we assigned values greater than zero to the $m_j$ and obtained the same scaling as in [4]. Scale exponents of recursive variables thus remain constant through recursions. Fixed-point simulations confirmed that, in reality, overflow probability is very small.
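The behaviour reported in this section can be reproduced qualitatively by propagating scale exponents operation by operation, using rules (5) and (6). The Python sketch below is an illustration under simplifying assumptions: a stage is modelled as one multiplication by a coefficient followed by one addition (a stand-in for an FFT butterfly, not the 16-point FFT of [3]), and the names `propagate`, `ops` and `cur` are hypothetical.

```python
def propagate(input_sf, operations, m=None):
    """Scale exponents of all variables: inputs first, then outputs.

    operations: list of (kind, i, j), kind in {"mul", "add"}, where i and j
    are indices of earlier variables. m[k] is the number of MSBs dropped
    after operation k; all zeros is the overflow-free choice (11).
    """
    m = m or [0] * len(operations)
    s = list(input_sf)
    for (kind, i, j), mk in zip(operations, m):
        if kind == "mul":
            s.append(s[i] + s[j] - mk)          # rule (5)
        elif kind == "add":
            s.append(max(s[i], s[j]) + 1 - mk)  # rules (6a)-(6b)
        else:
            raise ValueError(kind)
    return s


# Variable 0 is a normalized sample, variable 1 a coefficient (both sf = 0).
ops, cur = [], 0
for _ in range(4):                  # four stages
    prod = 2 + len(ops)             # index of the product about to be created
    ops.append(("mul", cur, 1))     # sample times coefficient
    ops.append(("add", cur, prod))  # accumulate with the sample
    cur = prod + 1                  # index of the stage output

overflow_free = propagate([0, 0], ops)
relaxed = propagate([0, 0], ops, m=[0, 1] * 4)  # drop one MSB after each add
print(overflow_free[cur], relaxed[cur])
```

With all $m_j$ set to zero the scale exponent grows by one per stage; dropping one MSB after every addition keeps it constant, at the price of an overflow risk that has to be justified by statistics on the variable ranges.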

5. Conclusions

In this paper, we concentrated on the scaling issue encountered when an algorithm is implemented in fixed-point arithmetic. Equations that express the scale factor of a variable, which is the output of a simple operation, are presented. They lead to a matrix expression that controls the scaling throughout an algebraic algorithm: scale exponents of all variables are obtained in terms only of the scale exponents of input variables and the numbers of most significant bits truncated (or lost) after every operation. An automatic scaling scheme, with respect to an overflow-free configuration, is then derived. It is obvious that direct manipulation of these equations is quite tedious. On the contrary, the context of a Computer Aided Implementation method is greatly attractive. We are in the process of developing programs to support this. Finally, the automatic scaling scheme is applied to two examples: an FIR algorithm (FFT) and an IIR one (Recursive Least-Squares).

References

[1] L.R. Morris, "Automatic Generation of Time Efficient Digital Signal Processing Software", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-25, pp. 74-78, Feb. 1977.
[2] H. Hanselmann, "A Concept for Mostly Automatic Implementation of Control Algorithms", IEEE Computer Aided Control System Design Symposium, Arlington, Virginia, Sept. 1986.
[3] M. Vetterli, E. Debourse, M. Kardan, "Fast Fourier Transforms on the TMS 320 Signal Processor", Proceedings des Journées d'Electronique 1985, Lausanne, Suisse, Oct. 1985.
[4] K. Kassapoglou, P. Hulliger, "Implementation of Recursive Least Squares Identification Algorithms on the TMS 320", Proceedings of the EUSIPCO-86 Conference, The Hague, Sept. 1986.
[5] K. Kassapoglou, "Control on Scaling throughout an Algorithm Leading to an Automatic Scaling Scheme", Internal Report of the Laboratoire d'Informatique Technique, Ecole Polytechnique Fédérale de Lausanne, Switzerland.

Figure 1. [Figure not reproduced; it shows the main steps of the methodology described in Section 2.]

Table 1. Construction of matrices A, B, C, D and E. [Table body not recoverable.]
Notes: ' stands for transpose, and $e_{k,n}$ for the kth column of the n-dimensional unity matrix. * assuming that wordlengths are already decided.
