An Arithmetic Unit - IEEE Computer Society


IEEE TRANSACTIONS ON COMPUTERS, VOL. C-32, NO. 4, APRIL 1983

Finite Precision Rational Arithmetic: An Arithmetic Unit

PETER KORNERUP AND DAVID W. MATULA

Abstract: The foundations of an arithmetic unit performing the add, subtract, multiply, and divide operations on rational operands are developed. The unit uses the classical Euclidean algorithm as one unified algorithm for all the arithmetic operations, including rounding. Binary implementations are discussed, based on techniques known from SRT division, and utilizing ripple-free borrow-save and carry-save addition. Average time behavior is investigated.

Index Terms: Borrow-save and carry-save addition, Euclidean algorithm, fixed-slash and floating-slash representations, rational numbers, SRT-division.

I. INTRODUCTION

THIS paper develops the foundations and discusses possible implementations of a unified arithmetic unit, performing the standard arithmetic operations on rational operands. The fundamental idea underlying the unit is that the classical Euclidean algorithm can be used to compute the partial quotients of the continued fraction expansion of a rational number p/q. The Euclidean algorithm can be utilized, while "breaking down" one operand p/q to determine its partial quotients, to concurrently build a sequence of approximations of $p/q \circ r/s$ for some operator $\circ \in \{+, -, \times, \div\}$ and another operand r/s.

In a previous paper [9] the fixed- and floating-slash representations of rational numbers, and the underlying number systems, were discussed. For the purpose of this presentation it is sufficient to notice that in the fixed-slash representation it is assumed that a fraction p/q can be represented in a computer word in a sign-magnitude radix representation:

    | s |        p        |        q        |

where the fields allocated for p and q are of equal size. To provide a greater range of representable rational values, it is convenient to have the boundary between the fields allocated for the numerator and the denominator "float," thus giving the floating-slash fraction representation:

    | s | k |      p      /      q      |

where the k-field is used to provide the slash position in some suitable representation. The slash position may be considered equivalent to the exponent of a standard floating-point representation, and may even be allowed to move "outside the word," thus providing an even further extended range (by suitable default interpretation of the empty p or q field).

Manuscript received April 27, 1982; revised October 14, 1982. This work was supported in part by the National Science Foundation under Grant MCS 8012704 and by the Danish Natural Science Research Council under Grant 11-2001. P. Kornerup is with the Department of Computer Science, Aarhus University, Aarhus, Denmark. D. W. Matula is with the Department of Computer Science and Engineering, Southern Methodist University, Dallas, TX 75275.

As only a finite number of representable rational values can be provided in any such representation, it is necessary to supplement the number representations with a rounding algorithm, in order to form a finite precision number system, and hence to define a finite precision rational arithmetic. Fortunately, there is a "natural" rounding procedure, which is termed mediant rounding [9]. This procedure truncates the continued fraction expansion at the "last representable" convergent, which in classical number theory is called "best rational approximation." This last representable convergent is of course chosen so as to fit the particular fixed- or floating-slash representation.

In Section II of this paper, the mediant rounding is formally defined, and it is shown how it can be implemented using the Euclidean algorithm. In Section III the foundations of the arithmetic unit are developed. It is shown that the partial quotients of any operand p/q define useful transformations on a given 2 × 2 matrix. With initializations of that matrix depending on another operand r/s and the particular arithmetic operation $\circ$ to be performed, the transformed matrices provide successive approximations to the result $p/q \circ r/s$. A number of observations concerning the operation of the arithmetic unit are highlighted in the text. For binary implementations an alternative to the standard (canonical) continued fraction expansion is more advantageous. In Section IV it is shown that signed continued fractions can be used to formally treat those modifications of the Euclidean algorithm that result when techniques known from fast division algorithms are incorporated.
Borrow-save (signed digit) representation is analysed, and it is discussed how this, and carry-save representation, can be used internally to avoid carry propagation during the add/subtract operations in the implementation of the unit. In Section V the complexity of the algorithm is discussed. Through simulations it is demonstrated that the average execution time of the unit is very close to the time required for an ordinary divide operation of equivalent accuracy.

Rational numbers and arithmetic have an inherent naturalness in many applications. Exact finite representation of "simple" rational numbers like 2/3, in contrast to the approximation of such numbers in floating-point systems, provides ample evidence in support of finite precision rational arithmetic. As the integers are included and handled in a natural (and efficient) way, one arithmetic unit can support, in a unified manner, exact arithmetic on a range of integers and "simple" rational numbers, together with a finite precision arithmetic utilizing rational approximations of the real numbers. In large scale integration such a unit could be built and used as an "add-on" arithmetic unit (possibly for a microprocessor), thus providing the user with powerful arithmetic capabilities, utilizing the most natural number system.

0018-9340/83/0400-0378$01.00 © 1983 IEEE

II. MEDIANT ROUNDING

The purpose of this section is to establish the foundations for an implementation of the standard arithmetic operators upon operands from certain sets of representable rational numbers, e.g., the fixed-slash or floating-slash number systems. This section reviews the necessary terminology and notation introduced in [9]. In dealing with the formal properties of such fraction number systems we will, for convenience in this section, limit the treatment to the region of nonnegative reals, noting that the number systems under consideration use a sign-magnitude representation.

Formally a fraction, denoted $\frac{p}{q}$ or p/q, is an ordered pair composed of a nonnegative integer numerator p and a nonnegative integer denominator q, which are not both zero. The quotient of p/q is the rational number determined by the ratio of p to q for $q \ne 0$, and is taken to be positive infinity when q = 0. The numerator and denominator of an irreducible fraction must have a greatest common divisor (gcd) of unity, other fractions being termed reducible. Two fractions are equal, denoted p/q = r/s, iff qr = ps (p/q = r/s does not necessarily imply identical numerators and denominators).

In order to introduce fraction number systems, we say that the fraction p/q is simpler than the fraction r/s if and only if $p \le r$, $q \le s$, and at least one inequality is strict. We then define the simple chain F to be a finite ordered set

$$F = \left\{ \frac{0}{1} = \frac{p^{(1)}}{q^{(1)}},\ \frac{p^{(2)}}{q^{(2)}},\ \ldots,\ \frac{p^{(k-1)}}{q^{(k-1)}},\ \frac{p^{(k)}}{q^{(k)}} = \frac{1}{0} \right\}$$

of irreducible fractions ordered by monotone increasing numeric value of their quotients, where all irreducible fractions simpler than any member of the chain are also in the chain. Thus, F is closed under the simpler than relation over irreducible fractions. Thus

$$F = \left\{ \frac{0}{1}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{1}{1}, \frac{3}{2}, \frac{2}{1}, \frac{3}{1}, \frac{1}{0} \right\}$$

is a simple chain; however,

$$F = \left\{ \frac{0}{1}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{1}{1}, \frac{3}{2}, \frac{2}{1}, \frac{1}{0} \right\}$$

is not a simple chain, since $3/1 \notin F$ is simpler than $3/2 \in F$.

Any reasonable computer word format characterizing a fraction number system, that provides for a separate representation of the numerator and denominator, would almost certainly allow any fraction simpler than some representable fraction to also be representable. Thus analysis of the properties of simple chains should allow us to determine the properties attainable via any such characterization of a finite precision fraction number system. The fixed-slash and floating-slash number systems are thus specific examples determining simple chains.

From the point of view of computer architecture, the unary operators absolute value, negate, invert and the dyadic algebraic operators add, subtract, multiply, divide, and any necessary rounding must be convenient to implement in an efficient manner on any operands chosen from the finite precision number system. From the complementary point of view of approximate real arithmetic, the result of an arithmetic operation must (when approximation is necessary) yield in some sense a best approximation to the true result on the same operands. It is obvious that the unary operators are always exact for fixed-slash and floating-slash arithmetic systems. Furthermore, all dyadic operators, through elementary integer arithmetic on numerators and denominators, yield exact results representable in higher precision fixed-slash and floating-slash number systems. Thus, the fundamental issue underlying the structure of approximate rational arithmetic is the mathematical nature and implementation feasibility of a rounding algorithm which rounds fractions with relatively large numerators and denominators to simpler fractions that are good approximations.

A canonical choice for the rounding is available with foundations derived from the theory of continued fractions and best rational approximation. Utilizing the notation $[a_0, a_1, a_2, \ldots]$ for the continued fraction expansion

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}}$$

where the partial quotients $a_i$ are assumed to be integral, any nonnegative rational number p/q has a finite expansion

$$\frac{p}{q} = [a_0, a_1, \ldots, a_m]$$

which is unique (canonical) with the added requirements $a_0 \ge 0$; $a_i \ge 1$, $i = 1, \ldots, m-1$; and $a_m \ge 2$ where $m \ge 1$. The truncated continued fractions

$$\frac{p_i}{q_i} = [a_0, a_1, \ldots, a_i], \quad i = 0, 1, \ldots, m$$

yield rational numbers which form a sequence of continued fraction approximations of p/q (called the convergents) whose properties can be found in classical material on continued fractions, e.g., [3, ch. X] or [4]. Combining the definition of $p_i/q_i$ with the fact that the partial quotients $a_i$ of the canonical expansion $[a_0, a_1, \ldots, a_m]$ are the quotients implicitly computed in the classical Euclidean

gcd-algorithm applied to p and q, the following algorithm may be used to determine the convergents $p_i/q_i$.

Algorithm EC (Euclidean Convergent Algorithm)

For any $p \ge 0$, $q \ge 1$, this algorithm computes in irreducible fraction form the convergents $p_i/q_i$ of the canonical continued fraction $p/q = [a_0, a_1, \ldots, a_m]$.

    b_{-2} = p;  p_{-2} = 0;  q_{-2} = 1;
    b_{-1} = q;  p_{-1} = 1;  q_{-1} = 0.

For $i = 0, 1, \ldots$, while $b_{i-1} \ne 0$, determine $a_i$ as the quotient and $b_i$ as the nonnegative remainder of the division of $b_{i-2}$ by $b_{i-1}$, so

$$b_i = -b_{i-1} \cdot a_i + b_{i-2},$$

and compute

$$p_i = p_{i-1} \cdot a_i + p_{i-2}, \qquad q_i = q_{i-1} \cdot a_i + q_{i-2}. \quad \Box$$

Example 1: To illustrate Algorithm EC let us apply it to p/q = 277/642. The computations are most conveniently recorded in tabular form:

      i     b_i    a_i    p_i    q_i
     -2     277             0      1
     -1     642             1      0
      0     277      0      0      1
      1      88      2      1      2
      2      13      3      3      7
      3      10      6     19     44
      4       3      1     22     51
      5       1      3     85    197
      6       0      3    277    642

From the table it is noticed that gcd(277, 642) = 1 (= $b_5$) and that the irreducible fractions

$$\frac{0}{1}, \frac{1}{2}, \frac{3}{7}, \frac{19}{44}, \frac{22}{51}, \frac{85}{197}, \frac{277}{642}$$

are the convergents of 277/642. □

For any rational $x = p/q = [a_0, a_1, \ldots, a_m]$ or irrational $x = [a_0, a_1, \ldots]$, the convergents $p_0/q_0, p_1/q_1, p_2/q_2, \ldots$ represent successively more accurate approximations to x, where every convergent is simpler than any subsequent convergent. Thus for any simple chain F there is a largest indexed convergent $p_i/q_i \in F$ such that no subsequent convergent is in F, and this fact provides the basis for a natural rounding procedure.

Let $\pm F$ be the set of all signed fractions corresponding to the fractions of a simple chain F. For any simple chain F, the mapping $\Phi_F$: Reals $\to \pm F$ is now defined for every real number x. When $p_0/q_0, p_1/q_1, p_2/q_2, \ldots$ are the convergents of $|x|$, $\Phi_F$ is defined by

$$\Phi_F(x) = \begin{cases} p_m/q_m & \text{if } x = p_m/q_m \in F, \\ p_i/q_i & \text{if } x > 0,\ p_i/q_i \in F,\ p_{i+1}/q_{i+1} \notin F, \\ -\Phi_F(-x) & \text{if } x < 0. \end{cases}$$

Suppose, for example, we wish to round 277/642 to a fraction limited to two decimal digits in numerator and denominator. From Example 1 we obtain the "rounded value"

$$\Phi_F\left(\frac{277}{642}\right) = \frac{22}{51}$$

with a relative error of about 0.0002.

Theorem 1: For any simple chain F the mapping $\Phi_F$: Reals $\to \pm F$ satisfies the following three properties for all real x, y.

i) Monotonic: $x \le y \Rightarrow \Phi_F(x) \le \Phi_F(y)$,
ii) Antisymmetric: $\Phi_F(-x) = -\Phi_F(x)$,
iii) Fixed points: $|x| = p/q \in F \Rightarrow \Phi_F(x) = x$. □

Theorem 1 is stated without proof, noting that the proof may be obtained from standard results in the theory of continued fractions. Note also that the properties stated in Theorem 1 are those characterizing an "optimal rounding" as defined in [8].

If p/q, p'/q' are consecutive fractions of the simple chain F, it follows from Theorem 1 that there is a real number y with $p/q < y \le p'/q'$ such that $\Phi_F(x) = p/q$ for $p/q \le x < y$ and $\Phi_F(x) = p'/q'$ for $y < x \le p'/q'$, where $\Phi_F(y)$ must be one of p/q, p'/q'. The following theorem states that this "splitting point" y is always the mediant (p + p')/(q + q').

Theorem 2: Let p/q, p'/q', p''/q'' be consecutive fractions of a simple chain F. Then

$$\Phi_F(x) = \frac{p'}{q'} \quad \text{for } \frac{p + p'}{q + q'} < x < \frac{p' + p''}{q' + q''},$$

and the only other values mapped to p'/q' are determined by

$$\Phi_F\left(\frac{p + p'}{q + q'}\right) = \frac{p'}{q'} \ \text{ if } \frac{p'}{q'} \text{ is simpler than } \frac{p}{q}, \qquad \Phi_F\left(\frac{p' + p''}{q' + q''}\right) = \frac{p'}{q'} \ \text{ if } \frac{p'}{q'} \text{ is simpler than } \frac{p''}{q''}. \quad \Box$$

Again we simply note without proof that Theorem 2 follows in a straightforward manner from the theory of continued fractions. In view of the prominence of the mediant as the splitting point, we have termed the mapping $\Phi_F$: Reals $\to \pm F$ mediant rounding to $\pm F$.
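Algorithm EC and the rounding of 277/642 discussed above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `convergents`, `mediant_round` and the predicate `fits` are assumptions introduced here, the simple chain is modelled as "numerator and denominator below 100", and returning `None` when not even the first convergent fits is a choice of this sketch rather than something specified in the text.

```python
def convergents(p, q):
    """Algorithm EC sketch: yield the convergents (p_i, q_i) of p/q, p >= 0, q >= 1."""
    b2, b1 = p, q        # b_{i-2}, b_{i-1}
    p2, q2 = 0, 1        # p_{-2}, q_{-2}
    p1, q1 = 1, 0        # p_{-1}, q_{-1}
    while b1 != 0:
        a, b = divmod(b2, b1)          # partial quotient a_i, remainder b_i
        p2, p1 = p1, p1 * a + p2       # p_i = p_{i-1} * a_i + p_{i-2}
        q2, q1 = q1, q1 * a + q2       # q_i = q_{i-1} * a_i + q_{i-2}
        b2, b1 = b1, b
        yield p1, q1


def mediant_round(p, q, fits):
    """Return the last convergent of p/q accepted by fits (hypothetical predicate).

    Assumes fits describes a simple chain, so that once a convergent is
    rejected all later (less simple) convergents are rejected as well.
    """
    best = None                        # sketch-only choice: None if nothing fits
    for pi, qi in convergents(p, q):
        if not fits(pi, qi):
            break
        best = (pi, qi)
    return best


# Two decimal digits in numerator and denominator, as in the example above.
two_digits = lambda n, d: n < 100 and d < 100
print(list(convergents(277, 642)))
print(mediant_round(277, 642, two_digits))   # (22, 51)
```

The generator stops as soon as the remainder becomes zero, so the last pair it yields is p/q itself in reduced form.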

III. PERFORMING ARITHMETIC

Given a particular simple chain F, and the corresponding mediant rounding $\Phi = \Phi_F$ derived from F, it is possible to define finite precision rational arithmetic upon operands from $\pm F$. Utilizing integer arithmetic and the mapping $\Phi$, we may thus define finite precision rational operators corresponding to the dyadic operators $\times$, $\div$, $+$, $-$ as

$$\frac{p}{q} \otimes \frac{r}{s} = \Phi\left(\frac{pr}{qs}\right), \qquad \frac{p}{q} \oslash \frac{r}{s} = \Phi\left(\frac{ps}{qr}\right),$$

$$\frac{p}{q} \oplus \frac{r}{s} = \Phi\left(\frac{ps + qr}{qs}\right), \qquad \frac{p}{q} \ominus \frac{r}{s} = \Phi\left(\frac{ps - qr}{qs}\right).$$

However, it turns out that the above mentioned integer expressions in operand numerators and denominators need not be computed as integer expressions utilizing the normal algebraic rules for fraction arithmetic, but can be computed implicitly by the Euclidean algorithm. To see this notice that the initial matrix of Algorithm EC

$$\begin{pmatrix} p_{-2} & q_{-2} \\ p_{-1} & q_{-1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

may be replaced by an arbitrary 2 × 2 seed matrix

$$\begin{pmatrix} u_{-2} & v_{-2} \\ u_{-1} & v_{-1} \end{pmatrix} = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$

to yield a sequence of pairs $(u_i, v_i)$ instead of the $(p_i, q_i)$ pairs, where $u_i$ and $v_i$ are then each linear combinations of $p_i$ and $q_i$.

Further, note that Algorithm EC works as well if p and/or q are negative, $q \ne 0$, by choosing the remainder $b_i$ of the same sign as the dividend. Also if $q = 0$, $p \ne 0$, then $p_{-1}/q_{-1} = 1/0$ may be considered the only convergent.

Algorithm EAA (Euclidean Arithmetic Algorithm)

For any (p, q), $q \ne 0$ and (a, b, c, d) this algorithm computes $(u_i, v_i)$ such that $u_i/v_i = f(p_i/q_i)$, where $p_i/q_i$ are the convergents of p/q, and f is the bilinear form $f(x) = (a + bx)/(c + dx)$.

    b_{-2} = p;  u_{-2} = a;  v_{-2} = c;
    b_{-1} = q;  u_{-1} = b;  v_{-1} = d.

For $i = 0, 1, \ldots$, while $b_{i-1} \ne 0$ determine $a_i$ as the quotient and $b_i$ as the remainder (of the same sign as $b_{i-2}$) of the division of $b_{i-2}$ by $b_{i-1}$, so

$$b_i = -b_{i-1} \cdot a_i + b_{i-2}$$

and compute

$$u_i = u_{i-1} \cdot a_i + u_{i-2}, \qquad v_i = v_{i-1} \cdot a_i + v_{i-2}. \quad \Box$$

Proof of the following lemma is immediate by induction on $u_i$ and $v_i$ as linear combinations of $p_i$ and $q_i$. See [5, p. 360, and p. 602] for related developments and novel extensions due to R. W. Gosper.

Lemma 1: The $(u_i, v_i)$, $i = 0, 1, \ldots, m$ determined by Algorithm EAA with seed

$$\begin{pmatrix} a & c \\ b & d \end{pmatrix}$$

satisfies the following relations

$$u_i = a q_i + b p_i, \qquad v_i = c q_i + d p_i$$

where $p_i/q_i$, $i = 0, 1, \ldots, m$ are the convergents of p/q. Also, if

$$f(x) = \frac{a + bx}{c + dx}$$

then

$$f(p_i/q_i) = u_i/v_i. \quad \Box$$

Observe that although $p_i/q_i$ is in reduced form, $u_i/v_i$ as computed by EAA need not be so. Also, note that the $u_i/v_i$ do not in general form a sequence of convergents of f(p/q). However,

$$\frac{u_i}{v_i} - \frac{u_{i-1}}{v_{i-1}} = \frac{u_i v_{i-1} - v_i u_{i-1}}{v_i v_{i-1}} = \frac{(ad - bc)(q_i p_{i-1} - p_i q_{i-1})}{v_i v_{i-1}} = (-1)^i \frac{ad - bc}{v_i v_{i-1}}$$

where the numerator (ad - bc) is the determinant of the seed matrix. If f has no pole in the closed interval $[p_i/q_i, p_{i+1}/q_{i+1}]$ (or equivalently $v_i$ and $v_{i+1}$ are nonzero of the same sign) then

$$\left| \frac{u_i}{v_i} - f\left(\frac{p}{q}\right) \right| \le \left| \frac{u_i}{v_i} - \frac{u_{i+1}}{v_{i+1}} \right| = \frac{|ad - bc|}{v_i v_{i+1}} \quad \text{for } i < m.$$

We may now pick particular seed matrices to realize the basic arithmetic operations. These will correspond to the "Curried" operations of LISP, i.e., they combine an operator with an operand, the operand being a rational number.

Theorem 3: Algorithm EAA, when seeded with the following matrices,

$$\begin{pmatrix} r & s \\ s & 0 \end{pmatrix}, \quad \begin{pmatrix} -r & s \\ s & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & s \\ r & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & r \\ s & 0 \end{pmatrix}$$

implements the Curried operations

    Add r/s, Sub r/s, Mult r/s and Div r/s

respectively, applied to the operand p/q.

Proof: With

$$\begin{pmatrix} a & c \\ b & d \end{pmatrix} = \begin{pmatrix} r & s \\ s & 0 \end{pmatrix}$$

it follows that

$$u_i = r q_i + s p_i \quad \text{and} \quad v_i = s q_i,$$

hence

$$\frac{u_i}{v_i} = \frac{r q_i + s p_i}{s q_i} = \frac{r}{s} + \frac{p_i}{q_i}.$$

The other three cases follow similarly. □

Example 2: To illustrate the "arithmetic capability" of Algorithm EAA, let us compute

$$\frac{371}{243} - \frac{26}{17} = \frac{-11}{4131}.$$

Combining the subtract operator with the operand 26/17 yields the seed matrix:

$$\begin{pmatrix} -26 & 17 \\ 17 & 0 \end{pmatrix}$$

and the computations of the EAA algorithm are recorded in the following table:

      i     b_i    a_i    u_i    v_i
     -2     371           -26     17
     -1     243            17      0
      0     128      1     -9     17
      1     115      1      8     17
      2      13      1     -1     34
      3      11      8      0    289
      4       2      1     -1    323
      5       1      5     -5   1904
      6       0      2    -11   4131

yielding the correct result. Notice furthermore that

$$\frac{u_3}{v_3} = \frac{0}{289} = 0$$

is caused by the fact that 26/17 happens to be a convergent of 371/243 (cf. Observation 2 below). □

Observation 1: (Convergent Computation.) Applying an identity operator (e.g., add 0/1 or mult 1/1) produces the sequence $u_i = p_i$, $v_i = q_i$, i.e., performs Algorithm EC. □

Observation 2: (Fraction Algebra.) The computed $u_i$ and $v_i$ are the very same numerators and denominators that would have been obtained using standard arithmetic rules on $p_i/q_i$ and r/s. □

Observation 3: (Chain Computation.) p/q need not be in reduced form, as only its partial quotients will affect the computation of $u_i$ and $v_i$. p/q may thus be the (possibly unreduced) result of a previous arithmetic operation. □

Theorem 3, together with the previous three observations, form the key ideas for an arithmetic unit working on rational operands. Picture the arithmetic unit as containing 6 registers organized and initialized the following way:

$$\begin{pmatrix} p & a & c \\ q & b & d \end{pmatrix}$$

Consider p, q representing an operand (the rational number p/q) being the contents of one "accumulator" (out of possibly many). Given a second operand r/s (say fetched from a memory in fixed- or floating-slash packed format), the seed matrix entries a, b, c, d may be formed to correspond to any specific Curried arithmetic operator, as specified in Theorem 3.

Now let the arithmetic unit perform Algorithm EAA on the rows of the above register set-up, subtracting the appropriate multiple of q from p, while adding the same multiples of b and d to a and c, respectively. Discarding the top row, and "sliding up" the old bottom row together with the newly formed row, the new register contents are defined.

After the ith step of Algorithm EAA, the contents of the 6 registers are thus:

$$\begin{pmatrix} b_{i-1} & u_{i-1} & v_{i-1} \\ b_i & u_i & v_i \end{pmatrix}$$

in the notation of the definition of the algorithm. Execute the algorithm until either $b_i = 0$, or until an attempt is made to compute a $u_{i+1}$ or $v_{i+1}$ whose magnitude exceeds the capacity of the registers. The result of the algorithm is then defined to be the pair $u_i$, $v_i$, which may now be written into the registers originally containing p and q, thus forming the new accumulator contents.

When the rational number in an accumulator is to be stored away in packed format, an identity operator seed matrix is used in Algorithm EAA, and the algorithm is stopped with the last values of u and v which will fit the packed representation, thus realizing mediant rounding.

Observation 4: (Precision.) Let F be a rational number system (e.g., a particular fixed- or floating-slash system). Any given p/q can be rounded into F by the mediant rounding $\Phi_F$, say $\Phi_F(p/q) = p_k/q_k$ (the kth convergent of p/q). Assume the arithmetic unit is capable of holding u's and v's as large as the numerator and denominator of the result of any dyadic arithmetic operator ($+$, $-$, $\times$, $\div$), applied to any pair of operands from F.

When the arithmetic unit is initialized with p/q, and seeded with $r/s \in F$ and any operator, the resulting u/v will be the exact result of the operator applied to r/s and some convergent $p_j/q_j$ of p/q, where $j \ge k$. That is, the unit will compute a result u/v such that

$$\left| \frac{u}{v} - \left( \frac{p}{q} \circ \frac{r}{s} \right) \right| \le \left| \left( \Phi_F\left(\frac{p}{q}\right) \circ \frac{r}{s} \right) - \left( \frac{p}{q} \circ \frac{r}{s} \right) \right|$$

where $\circ$ is the operator applied. □

Assuming that the unit is supplied with "double-length" registers the accumulator is capable of holding intermediate results in an "extended precision" which may be utilized in a subsequent arithmetic operation. The previous observation states that "at least single precision" will be utilized in the next operation, however the amount of "extra precision" (i.e., how many extra partial quotients will be extracted from the accumulator contents) will depend on the magnitude of the numerator and denominator of the next operand (i.e., how simple the next operand is).

If it is found desirable that the rounding of the previous (temporary) result (implicitly taking place during a subsequent arithmetic operation) should conform to the rounding $\Phi$ into the storage representation, then the arithmetic unit could be supplied with four extra registers, which when seeded with a unit matrix, could be used to terminate the algorithm at the index k yielding $u_k/v_k$ as a function of the convergent $p_k/q_k$.

Also given that the unit needs registers large enough to hold results which correspond to "double precision," one might ask whether there is any way that say two "normal precision" words could be used for storing a "double precision" result. The following observation provides a partial answer to this question.
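As an illustration of Algorithm EAA with the seed matrices of Theorem 3, the following Python sketch reproduces the subtraction 371/243 - 26/17 of Example 2. The names `trunc_divmod`, `seed` and `eaa` are hypothetical helpers introduced only for this sketch; unbounded Python integers stand in for the registers, so register overflow and mediant rounding are deliberately not modelled.

```python
def trunc_divmod(n, m):
    """Quotient and remainder of n / m, the remainder taking the sign of n."""
    quo = abs(n) // abs(m)
    if (n < 0) != (m < 0):
        quo = -quo
    return quo, n - quo * m


def seed(op, r, s):
    """Seed entries (a, b, c, d) for the Curried operators of Theorem 3."""
    return {
        "add": (r, s, s, 0),     # f(x) = (r + s x) / s
        "sub": (-r, s, s, 0),    # f(x) = (-r + s x) / s
        "mul": (0, r, s, 0),     # f(x) = (r x) / s
        "div": (0, s, r, 0),     # f(x) = (s x) / r
    }[op]


def eaa(p, q, a, b, c, d):
    """Algorithm EAA sketch: yield (u_i, v_i) with u_i/v_i = f(p_i/q_i), q != 0."""
    b2, b1 = p, q                # b_{i-2}, b_{i-1}
    u2, v2 = a, c                # u_{-2}, v_{-2}
    u1, v1 = b, d                # u_{-1}, v_{-1}
    while b1 != 0:
        ai, bi = trunc_divmod(b2, b1)
        u2, u1 = u1, u1 * ai + u2    # u_i = u_{i-1} * a_i + u_{i-2}
        v2, v1 = v1, v1 * ai + v2    # v_i = v_{i-1} * a_i + v_{i-2}
        b2, b1 = b1, bi
        yield u1, v1


steps = list(eaa(371, 243, *seed("sub", 26, 17)))
print(steps[-1])   # (-11, 4131)
```

The intermediate pairs in `steps` match the $u_i$, $v_i$ columns of the table in Example 2, including the pair (0, 289) at step 3.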

Observation 5: (Multiple Precision Representation.) When the result of a mediant rounding of some "accumulator contents" has been built up as described in Observation 1, the $(u_i, v_i)$ pair which has been packed away represents the "most significant part" of the original p/q. However, the partial remainders $b_i$ and $b_{i+1}$ contain all the information needed to compute the "remaining" quotients of the expansion of p/q. Hence, if after storing $(u_i, v_i)$ away, the unit is seeded again with the identity matrix, the unit can go on computing a rounded value of $b_i/b_{i+1}$, and so on until all information from the original p/q has been extracted in the form of "packed" rational numbers. The original p/q can later be restored (in reduced form) from its components, by seeding the unit with the identity matrix, and successively loading components into the "p" and "q" registers and breaking them down into quotients, while building up (u, v) pairs. □

The previous observation only provides a sketch of handling multiple precision operands. There are some problems concerning very large quotient values, whose solution requires floating-slash representations allowing the slash to move "outside" the word, thus providing scaled representations. We will not discuss the details here, but raise another obvious question.

Since the integers form a subset of the rationals, the arithmetic unit as described handles integers in a natural way, but what about efficiency? Say the unit is to divide two integers i and j, represented as i/1 and j/1. The unit will then be initialized as follows:

$$\begin{pmatrix} i & 0 & j \\ 1 & 1 & 0 \end{pmatrix}$$

and the unit will divide 1 into i, while multiplying 1 by the quotient found (i). By shifting out the top row and adding the new from below, the result is

$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & i & j \end{pmatrix}$$

which is fine, but it was a cumbersome way to get i/j. Also the add and subtract operations on integers look inefficient, but fortunately there are some tricks, as noted in the following observations.

Observation 6: When the arithmetic unit is seeded and loaded with values

$$\begin{pmatrix} p & a & c \\ q & b & 0 \end{pmatrix}$$

such that $|q| = |b|$, then the following initialization:

$$\begin{pmatrix} b & a & c \\ q & p & 0 \end{pmatrix}$$

will yield the same result of Algorithm EAA (in one cycle), except possibly for common factors of $u_i$ and $v_i$ in the $(u_i, v_i)$ pairs. □

Observation 7: When the arithmetic unit is loaded and seeded with values

$$\begin{pmatrix} p & 0 & c \\ q & b & 0 \end{pmatrix}$$

such that $|p| = |c|$, then the following initialization:

$$\begin{pmatrix} p & 0 & q \\ c & b & 0 \end{pmatrix}$$

will yield the same result of Algorithm EAA (in one cycle) except possibly for common factors of $u_i$ and $v_i$ in the $(u_i, v_i)$ pairs. □

IV. BINARY IMPLEMENTATIONS

As noted in the previous section, in an implementation of Algorithm EAA it is only necessary to store two consecutive values of the $b_i$'s, $u_i$'s and $v_i$'s, and it is not necessary to record the quotient values $a_i$, nor even to have these exist explicitly.

Now, a shift/subtract binary implementation of Algorithm EAA will be described, based on the idea that bits of the binary representation of $a_i$ (the quotient of $b_{i-2}$ and $b_{i-1}$) can be determined and immediately used to accumulate multiples of $u_{i-1}$ and $v_{i-1}$, thus building up $u_i$ and $v_i$ respectively. Utilizing hardware parallelism, the subtractions of the division and the additions involved in the multiplications can be executed in true parallel.

For the implementation of Algorithm EAA we will use a binary, two's complement representation of the integers. Let us assume the existence of 6 registers, named P, Q, A, B, C, and D corresponding to their initializations in EAA. The registers are pairwise connected to an add/subtract circuitry, controlled by the signs of the contents of the P and Q registers. Furthermore, a shift-register K is used to keep track of any unalignment of the contents of registers. The capacity (size) of the registers is for the moment left unspecified.

Fig. 1. Hardware for Euclidean arithmetic algorithm.

For simplicity we will, in the following binary implementation of Algorithm EAA, assume that the P and Q registers have been initialized with nonnegative values p and q, respectively.

Algorithm BSSE (Binary Shift/Subtract Euclidean Algorithm)

Given initializations p, q, a, b, c, d of the registers P, Q, A, B, C, D such that $p \ge 0$, $q \ge 0$, $(p, q) \ne (0, 0)$, this algorithm computes $(u_i, v_i)$ such that

$$\frac{u_i}{v_i} = f\left(\frac{p_i}{q_i}\right) = \frac{a q_i + b p_i}{c q_i + d p_i}$$

where $p_i/q_i$ are the convergents of p/q. K is an auxiliary register initialized to 1.

    while {P and Q not normalized} do {leftshift P and Q};
    while Q ≠ 0 do
    begin
      while {Q not normalized} do {leftshift Q, B, D and K};
      loop
        P' := P − Q and A' := A + B and C' := C + D;
        if P' ≥ 0 then
          begin P := P' and A := A' and C := C' end;

        exitloop if K = 1;
        {leftshift P and rightshift B, D and K}
      end;
      swap (P, Q) and swap (A, B) and swap (C, D);
    end. □

Possible parallelism has been expressed using "and" as the separator between statements, as opposed to ";" for sequential execution.

Since the algorithm implicitly computes partial quotients of the canonical expansion $p/q = [a_0, a_1, \ldots, a_m]$, the outer loop can be terminated according to other criteria than Q = 0, and the values $u_i$, $v_i$ found in the registers B and D will correspond to

$$\frac{u_i}{v_i} = \frac{a q_i + b p_i}{c q_i + d p_i}$$

for the corresponding convergent $p_i/q_i$, $0 \le i \le m$.

Observe that the previous algorithm in the implicit computation of the partial quotients mimics the classical restoring division process (albeit by using trial subtractions instead of actually restoring). Also notice that properly encoded, the individual steps of the inner normalization loop, together with the right shifts and the successful subtract operations, provide the LCF-representation [11] of $p/q = [a_0, a_1, \ldots, a_m]$.

Combining techniques known from the SRT-division process [2], including those of nonrestoring division, we obtain the algorithm given below. In this algorithm we will encounter some problems due to redundancy, however, not as in the SRT method due to the problem of the redundant digit set {-1, 0, 1} being used, but caused by the fact that the algorithm does not always choose remainders of the correct sign. Hence the algorithm does not compute $(u_i, v_i)$ pairs corresponding to canonical convergents; however, the final (u, v) pair will be correct.

Algorithm BNE (Binary Normalized/Nonrestoring Euclidean Algorithm)

Given initializations p, q, a, b, c, d of the registers P, Q, A, B, C, D such that $(p, q) \ne (0, 0)$, this algorithm terminates with values u and v in the registers B and D, satisfying

$$u = a q' + b p', \qquad v = c q' + d p'$$

where p'/q' = p/q, and gcd(p', q') = 1. K is an auxiliary register initialized to 1.

    while {P and Q not normalized} do {leftshift P and Q};
    while Q ≠ 0 do
    begin
      while {Q not normalized} do {leftshift Q, B, D and K};
      loop
        while {P not normalized} do
        begin
          exitloop if K = 1;
          {leftshift P and rightshift B, D and K}
        end;
        if sign (P) = sign (Q)
          then P := P − Q and A := A + B and C := C + D
          else P := P + Q and A := A − B and C := C − D;
      end;
      swap (P, Q) and swap (A, B) and swap (C, D);
    end. □

The algorithm could be "patched" to compute the sequence of canonical convergents, by checking the sign of the remainder (in P) when the exit from the inner loop is to take place, and possibly correcting with an extra add/subtract operation. However, it turns out that not doing so increases the speed of the algorithm significantly, and since we only want canonical convergents upon termination, it is sufficient (and beneficial) only to "patch up" at the end of the algorithm.

A heuristic argument for the increase in speed is that Algorithm BNE makes an attempt (not always successful) to minimize the absolute value of the remainder, instead of choosing the one of correct sign. If the correct remainder happens to be close to the divisor and is chosen, then the next partial quotient will be unity. If, however, the minimal remainder is chosen, the next quotient will have an absolute value of at least 2. Since the growth of the numerators and denominators is governed by the magnitude of the partial quotients, the growth rate is greater when quotients of absolute value unity are avoided (and they happen to be very frequent in canonical expansions, see [5] and [10]).

Algorithm BNE computes pairs $u_i$, $v_i$ corresponding to "convergents" (defined in the usual way) based on a signed continued fraction expansion

$$\frac{p}{q} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_m}}}$$

where the partial quotients $a_i$ are arbitrary integers, except that $a_i \ne 0$ and $[a_i, a_{i+1}, \ldots, a_m] \ne 0$ for $1 \le i \le m$. The sequence of $a_i$'s used in Algorithm BNE in computing $(u_i, v_i)$, i.e., determining the convergents $p_i/q_i$, are the partial quotients of such a signed continued fraction. To see this, and to analyze the sequence of convergents computed by Algorithm BNE, we need the following lemma.

Lemma 2: The sequence of partial quotients $a_i$, implicitly computed by Algorithm BNE, satisfies the following.

i) $|a_i| \ge 1$ for $1 \le$