digits sequentially, least significant digit first, by the standard way of obtaining a radix representation and progressing in the direction of the carry propagation.

frame buffer architectures. In particular, while conventional frame buffer architectures permit parallel access to consecutive pixels on a single scan line, the multiaccess frame buffer discussed here permits parallel access to constant-area rectangles; included in this set of rectangles are both horizontal and vertical line segments. The area of the rectangles is determined by the number of parallel memory units used in the frame buffer design. The cost of the increased level of parallelism in the frame buffer system is the additional hardware necessary to perform the more complex routing and addressing computations that accompany the noninterleaved address mappings.


IEEE TRANSACTIONS ON COMPUTERS, VOL. 43, NO. 5, MAY 1994

Digit-Set Conversions: Generalizations and Applications

Peter Kornerup

Abstract-The problem of digit set conversion for fixed radix is investigated for the case of converting into a nonredundant, as well as into a redundant, digit set. Conversions may be from very general digit sets, and cover as special cases multiplier recodings, additions, and certain multiplications. We generalize known algorithms for conversions into nonredundant digit sets, as well as apply conversion to generalize the $O(\log n)$ time algorithm for conditional sum addition using parallel prefix computation, and a comparison is made with standard carry-lookahead techniques. Examples on multioperand addition are used to illustrate the generality of this approach. $O(1)$ time algorithms for converting into redundant digit sets are generalized based on a very simple lemma, which provides a framework for all conversions into redundant digit sets. Applications in multiplier recoding and partial product accumulation are used here as exemplifications.

Index Terms-Computer arithmetic, conditional sum addition, digit set conversion, multiplier recoding, nonredundant and redundant representation, on-the-fly conversion, parallel prefix computation.

I. INTRODUCTION

Conversion of a number from one radix representation into another plays an important role in computing, and in particular in many implementations of basic arithmetic functions like addition, subtraction, multiplication, and division. We shall here investigate only the problem of digit set conversion for a fixed base, i.e., given a radix representation of a base $\beta$ number over some digit set, find a radix representation of the same number in the same base, but using another digit set. One purpose of this paper is to point out how a number of the "tricks of the trade" in computer arithmetic are actually just particular instances of digit set conversions. By analyzing the problem in a very general context, we provide results and derive algorithms that generalize known results, and demonstrate through some examples how they may be applied. In particular, in many cases we allow the digit sets to be very general sets of integers, for example, not necessarily contiguous sets. This also implies that the set of carries that can be handled goes beyond the usual $\{0, 1\}$. Some of the results and examples presented are probably well known to people who have designed and built systems, but possibly in a more limited and specific context. Another goal is to demonstrate that arithmetic algorithms and their properties can be analyzed at the digit level; the actual encoding of digit values is only of concern when designing the actual circuitry. Although there are notable examples of digit-level designs, arithmetic algorithms are very often described without a clear separation between the algorithm and its logic implementation. This is not to imply that the encoding is unimportant, but it cannot change the fundamental properties of the algorithms.
In complexity terminology, the encoding chosen can only change the time and area by constant factors, but it is as important as layout and floor planning are in VLSI implementations.

Manuscript received May 6, 1992; revised December 1, 1992. This work was supported by the Danish Natural Science Council, Grants 11-8243 and 5.21.08.02.

The author is with the Department of Mathematics and Computer Science, Odense University, Denmark. IEEE Log Number 9213779.

0018-9340/94$04.00 © 1994 IEEE


In Section II we discuss conversions into nonredundant digit sets. The first result is a formal proof of an obvious observation: for any conversion from a digit set $D$ into a nonredundant digit set $E$, with $D \ne E$, there exist situations where the most significant digit of the result depends on the least significant digit of the number being converted; hence such conversions must take time $\Omega(\log n)$. After introducing the concept of a conversion mapping, we generalize the algorithm for "on-the-fly" conversion [1] for converting in linear time, most significant digit first. We then derive an $O(\log n)$-time algorithm for conversion based on parallel prefix computation as in [2], [3]. It turns out to be equivalent to the well-known conditional sum adder [4] when applied to the problem of dyadic addition, but it may be employed in many other situations; for example, it is demonstrated how this conversion may also be applied to multioperand addition, where the set of carries is larger. It is thus demonstrated that conditional sum addition can be generalized to multioperand addition, but it is also shown that this approach is in general not competitive with the traditional approach, where multioperand addition is performed in a redundant digit set, which is then later converted into a nonredundant digit set.

Section III deals with conversions between systems with redundant digit sets, for the purpose of reducing the redundancy index. It is well known that addition in redundant digit sets can be performed with limited carry ripple, hence in constant time [5]. A simple lemma here provides the foundation for all such conversions into a particular contiguous, redundant digit set, providing bounds for the carry values and describing the set of digits that can be converted. Conversion of a radix polynomial can then be performed in a digit-parallel process in time $O(1)$, where the generated carry is absorbed with no further carry ripple.
It is observed that one such conversion can reduce the redundancy index by a factor of $\beta$, the base of the system. Repeated applications then allow for conversions where further reduction of the redundancy index is needed. It follows that, in general, conversion into a redundant digit set can be performed in time $O(\log_\beta \delta)$, where $\delta = \max_{d \in D} |d|$ and $D$ is the source digit set. Some applications are then discussed, including addition, multiplier recoding, and partial product accumulation.

Following Matula [6], in this paper we use a somewhat formal notation of radix polynomials as representations of numbers, to distinguish them from the values represented. Given an integer radix or base $\beta \ge 2$, a radix $\beta$ polynomial is an expression of the form:

$$P = \sum_{i=\ell}^{m} d_i[\beta]^i, \qquad (1.1)$$

where the digits $d_i$ are integers, $d_i \in \mathbb{Z}$, belonging to some digit set $D$ that is finite and such that $0 \in D$. The square brackets $[\,]$ around $\beta$ are used to distinguish the radix polynomial $P$ in (1.1) from its real value, which is denoted $\|P\|$ and can be expressed as:

$$\|P\| = \sum_{i=\ell}^{m} d_i\beta^i. \qquad (1.2)$$
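To make the distinction between a radix polynomial and its value concrete, the following minimal Python sketch evaluates $\|P\|$ as in (1.2). It assumes, for illustration only, that a polynomial is stored as a dictionary mapping each exponent $i$ to its digit $d_i$; the function name `radix_value` is hypothetical. Exact rational arithmetic (`fractions.Fraction`) is used because negative exponents give rational values.

```python
from fractions import Fraction


def radix_value(poly: dict, beta: int) -> Fraction:
    """Value ||P|| = sum of d_i * beta**i of a radix polynomial.

    `poly` maps each exponent i (possibly negative) to its digit d_i;
    this dictionary encoding is an illustrative assumption.
    """
    if beta < 2:
        raise ValueError("the radix must satisfy beta >= 2")
    return sum((d * Fraction(beta) ** i for i, d in poly.items()), Fraction(0))


# Two different radix 2 polynomials over D = {0, 1, 2} with the same value 4:
# 2[2]^1 and 1[2]^2.
print(radix_value({1: 2}, 2), radix_value({2: 1}, 2))
# A radix 3 polynomial over {-1, 0, 1} with a fractional part:
# 1[3]^2 + (-1)[3]^0 + 1[3]^-1, whose value is 25/3.
print(radix_value({2: 1, 0: -1, -1: 1}, 3))
```

Here `{1: 2}` and `{2: 1}` are distinct polynomials that represent the same value, which is exactly the distinction the bracket notation $[\beta]$ is meant to preserve.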

This allows us to discuss different representations of the same number or value. The representations (1.1) are assumed to have a finite number of terms, hence their values (1.2) are rational numbers. In examples we also use string representations, using ordinary symbols for digit values, including the overbar to denote negative digit values. Since we discuss digit set conversions, the base $\beta$ remains fixed, and we denote the set of base $\beta$ radix polynomials over some digit set $D$ by $\mathcal{P}[\beta, D]$, i.e.,

$$\mathcal{P}[\beta, D] = \Big\{ P \;\Big|\; P = \sum_{i=\ell}^{m} d_i[\beta]^i,\ d_i \in D \Big\}.$$

Occasionally we shall also use polynomials containing a single term, so a radix polynomial $M \in \mathcal{P}[\beta, D]$ of the form $M = d[\beta]^i$ will be called a radix $\beta$ monomial of order $i$. Also we may define the fixed-point radix polynomials as a set of the form:

$$\Big\{ P \in \mathcal{P}[\beta, D] \;\Big|\; P = \sum_{i=\ell}^{m} d_i[\beta]^i \Big\}$$

for some fixed values of $\ell$ and $m$, and the set of integer radix polynomials:

$$\mathcal{P}_I[\beta, D] = \Big\{ P \in \mathcal{P}[\beta, D] \;\Big|\; P = \sum_{i=0}^{m} d_i[\beta]^i,\ 0 \le m \Big\}.$$


0. Now consider a family of polynomials $\{Q(j)\}_{j \ge 0}$, $Q(j) \in \mathcal{P}_I[\beta, E]$, chosen uniquely such that $\|Q(j)\| = j|e_1|$; then $d_n(Q(0)) = 0$. If $d_n(Q(k)) \ne 0$, let $j_0 = k$; otherwise, assume that $d_n(Q(k)) = 0$ and consider the sequence $\{Q(j)\}_{j = k+1, k+2, \ldots}$. Since the corresponding set of fixed-point radix polynomials over $E$ is finite, there must be a first $j = j_0$ for which $d_n(Q(j_0)) \ne 0$, whereas $d_n(Q(j)) = 0$ for $0 \le j < j_0$. For the $j_0$ chosen, let $Q_1$ and $Q_2$ be formed from $Q(j_0 - 1)$ and $Q(j_0)$, respectively, such that $d_n(Q_1) = 0$ whereas $d_n(Q_2) \ne 0$.

Since $D$ is complete for base $\beta$, there exists a $P \in \mathcal{P}_I[\beta, D]$ such that $\|P\| = j_0$. With

we then have $\|Q_1\| = \|P_1\|$ and $\|Q_2\| = \|P_2\|$. For $k < 0$ just consider the sequence $\{Q(j)\}_{j = -1, -2, \ldots}$, but otherwise proceed as above.

From this result it is seen that any conversion into a nonredundant digit set must take time $\Omega(\log n)$. For a conversion from $\mathcal{P}[\beta, D]$ into $\mathcal{P}[\beta, E]$, where $E$ is assumed complete and nonredundant, any digit $d \in D$ can be rewritten uniquely as $d = c\beta + e$ with $e \in E$ and $c$ belonging to some carry set $C$. In general an incoming carry has to be added before conversion, so a conversion mapping $\alpha$ is

$$\alpha: C \times D \to C \times E,$$

defined along with $C$ such that for all $(c', d) \in C \times D$ there exists $(c, e) \in C \times E$ such that

$$c' + d = c\beta + e. \qquad (2.1)$$
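The introduction states that the $O(\log n)$ conversion into a nonredundant digit set is obtained by parallel prefix computation and generalizes conditional sum addition. One way to see this, sketched below under our own assumptions, is to treat each position $i$ as a carry-transfer function $c' \mapsto c$ read off the conversion mapping $\alpha(c', d_i)$ of (2.1); these functions compose associatively, so all carries follow from a prefix scan, and each tabulated function is a "conditional" result for every possible carry-in. The names `carry_mapping` and `prefix_convert` are hypothetical, $E$ is assumed nonredundant (one digit per residue class modulo $\beta$), and the log-depth scan is a simple recursive-doubling (Hillis-Steele) network simulated sequentially, not the specific networks of [2] or [3].

```python
def carry_mapping(beta, D, E):
    """Conversion mapping (2.1) as a dict alpha[(c_in, d)] = (c_out, e), together
    with the sorted carry set C obtained by closing {0} under alpha.
    Assumes E is nonredundant for base beta (one digit per residue class)."""
    rep = {e % beta: e for e in E}
    if len(rep) != beta or len(E) != beta:
        raise ValueError("E must be nonredundant for base beta")
    alpha, C, todo = {}, {0}, [0]
    while todo:
        c_in = todo.pop()
        for d in D:
            e = rep[(c_in + d) % beta]
            c_out = (c_in + d - e) // beta
            alpha[c_in, d] = (c_out, e)
            if c_out not in C:
                C.add(c_out)
                todo.append(c_out)
    return alpha, sorted(C)


def prefix_convert(digits, beta, D, E):
    """Convert digits (least significant first) over D into E.

    Returns (E-digits, final carry). Position i contributes the carry-transfer
    function f_i(c) = first component of alpha(c, d_i), tabulated for every
    carry-in c in C. An inclusive prefix scan computes f_i o ... o f_0 for all i;
    each round of the while-loop corresponds to one parallel step.
    """
    alpha, C = carry_mapping(beta, D, E)
    idx = {c: j for j, c in enumerate(C)}
    pre = [tuple(alpha[c, d][0] for c in C) for d in digits]
    k = 1
    while k < len(pre):  # ceil(log2 n) rounds of recursive doubling
        # new pre[i] = pre[i] o pre[i - k]: the lower block is applied first
        pre = [p if i < k else tuple(p[idx[h]] for h in pre[i - k])
               for i, p in enumerate(pre)]
        k *= 2
    carry_in = [0] + [p[idx[0]] for p in pre]  # carries into positions 0..n
    out = [alpha[carry_in[i], d][1] for i, d in enumerate(digits)]
    return out, carry_in[-1]


# Carry-save to binary: 2 + 1*2 + 0*4 + 2*8 + 1*16 = 36 = 100100 in binary.
print(prefix_convert([2, 1, 0, 2, 1], 2, range(3), (0, 1)))
```

For three-operand binary addition, i.e., column sums in $\{0, 1, 2, 3\}$ converted into $\{0, 1\}$, `carry_mapping(2, range(4), (0, 1))` discovers the carry set $\{0, 1, 2\}$, illustrating the larger carry set that arises in multioperand addition.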

A conversion mapping is most conveniently described as a table that can be constructed from $D$ and $E$ in an iterative process, also building up the carry set $C$. Initially $C$ only contains 0, and new members of $C$ are added when a new value of $c$ in (2.1) is encountered when rewriting $c' + d$ for some previously known value of $c' \in C$.

Example: With $\beta = 3$, $D = \{0, 1, 2\}$, and $E = \{-1, 0, 1\}$, the following table for the conversion mapping $\alpha$ is constructed by row, starting with a row for $c = 0$ and later adding a row for $c = 1$ (each entry is the pair $(c, e)$ with $c' + d = 3c + e$):

c' \ d    0         1          2
0         (0, 0)    (0, 1)     (1, -1)
1         (0, 1)    (1, -1)    (1, 0)
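The iterative, row-by-row construction of a conversion mapping described above can be sketched in Python as follows. This is a minimal sketch assuming $E$ is nonredundant for base $\beta$ (exactly one digit per residue class), so each entry $(c, e)$ is unique; `conversion_table` is a hypothetical name, and rows are produced in breadth-first order starting from carry 0, mirroring the description in the text.

```python
from collections import deque


def conversion_table(beta, D, E):
    """Conversion mapping alpha: C x D -> C x E of (2.1), built row by row.

    Returns {c_in: {d: (c_out, e)}} with c_in + d == beta * c_out + e, e in E.
    Rows appear in the order their carries are discovered, starting from 0.
    Assumes E is nonredundant: exactly one digit in each residue class mod beta.
    """
    rep = {e % beta: e for e in E}          # residue class -> its unique digit
    if len(rep) != beta or len(E) != beta:
        raise ValueError("E must be nonredundant for base beta")
    table, queue, known = {}, deque([0]), {0}
    while queue:                             # terminates: carries stay bounded
        c_in = queue.popleft()
        row = {}
        for d in sorted(D):
            total = c_in + d
            e = rep[total % beta]
            c_out = (total - e) // beta      # exact: total - e is divisible by beta
            row[d] = (c_out, e)
            if c_out not in known:           # a new carry value: schedule its row
                known.add(c_out)
                queue.append(c_out)
        table[c_in] = row
    return table


# The example from the text: beta = 3, D = {0, 1, 2}, E = {-1, 0, 1}.
for c_in, row in conversion_table(3, {0, 1, 2}, {-1, 0, 1}).items():
    print(f"c'={c_in}:", ", ".join(f"d={d} -> {row[d]}" for d in row))
```

The printed rows give the conversion table for the example in the text. With $\beta = 2$, $D = \{0, 1, 2, 3\}$, and $E = \{0, 1\}$, the same procedure adds a third row, for carry 2.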

if the conversion can take place in parallel with the process generating the digits. Such a process is called on-the-fly conversion [1], and basically consists in updating several radix polynomials in $\mathcal{P}[\beta, E]$ (the target system) whenever a new digit becomes known. Each polynomial represents a correct prefix of the result, corresponding to an assumed value of the carry-in, i.e., there is one polynomial for each member of the carry set $C$. The following is a generalization of results in [1].

Theorem 2.2: Let $P = \sum_{i=1}^{m} d_i[\beta]^i \in \mathcal{P}[\beta, D]$, and let $C$ be the carry set for the conversion mapping $\alpha: C \times D \to C \times E$, where $E$ is complete for base $\beta$. For all $c \in C$ define $Q^{c}_{m+1} \in \mathcal{P}[\beta, E]$ such that $\|Q^{c}_{m+1}\| = c\beta^{m+1}$, and for $k = m, m-1, \ldots, 1$ and for all $c \in C$ let $Q^{c}_{k}$ be defined by $Q^{c}_{k} = Q^{c'}_{k+1} + e[\beta]^k$, where $(c', e) = \alpha(c, d_k)$.

But $D_S$ need not be contiguous, and $D$ is not necessarily the smallest digit set enclosing $D_S$, so there might be other choices of $(u, v)$ and $(r, s)$ also satisfying (3.4) that yield a better actual reduction of redundancy. But the lemma may be applied recursively, so conversion into a redundant system from any source system with a finite digit set can be performed truly in parallel in a constant number of levels ($O(1)$ time). The number of levels required depends on the redundancy index $\rho = |D| - \beta$ of the source digit set $D$. Note that each application of the lemma changes the redundancy index by a factor $\beta$ if $v - u = (s - r + 1) - \beta$. This is formalized in the following theorem, which we state without proof.

Theorem 3.6: Let $D_S$ be a digit set for base $\beta \ge 2$ with $\delta = \max_{d \in D_S} |d|$. Then for any contiguous digit set $E$ which is redundant and complete for base $\beta$, and for all $P \in \mathcal{P}[\beta, D_S]$, $P$ can be converted to a polynomial $Q \in \mathcal{P}[\beta, E]$ with $\|Q\| = \|P\|$ in time $O(\log_\beta \delta)$.
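The recursion of Theorem 2.2 can be sketched for the familiar special case of converting a carry-save number ($\beta = 2$, $D = \{0, 1, 2\}$) into binary ($E = \{0, 1\}$), with digits arriving most significant first. This is a minimal sketch under our own assumptions: the hard-coded table `ALPHA` is the conversion mapping (2.1) for this case, `INITIAL` holds an $E$-representation of each carry $c \in C = \{0, 1\}$ playing the role of $Q^{c}_{m+1}$, and the function name `on_the_fly` is hypothetical. Lists are copied for clarity, whereas a hardware implementation would load registers in parallel.

```python
# Conversion mapping (2.1) for beta = 2, D = {0, 1, 2}, E = {0, 1}:
# ALPHA[(c, d)] = (c_out, e) with c + d == 2 * c_out + e.
ALPHA = {(0, 0): (0, 0), (0, 1): (0, 1), (0, 2): (1, 0),
         (1, 0): (0, 1), (1, 1): (1, 0), (1, 2): (1, 1)}

# Q^c_{m+1}: an E-representation (most significant first) of the carry value c,
# which ends up positioned just above the converted digits.
INITIAL = {0: [], 1: [1]}


def on_the_fly(digits_msd_first, alpha=ALPHA, initial=INITIAL):
    """Return the E-digits (most significant first) of the converted number.

    Q[c] is the converted prefix assuming carry c will enter from below the
    digits seen so far; every Q[c] is updated when a new digit d_k arrives.
    """
    Q = {c: list(q) for c, q in initial.items()}
    for d in digits_msd_first:
        # Q^c_k = Q^{c'}_{k+1} extended by digit e, where (c', e) = alpha(c, d_k)
        Q = {c: Q[alpha[c, d][0]] + [alpha[c, d][1]] for c in Q}
    return Q[0]  # no carry enters the least significant position


# 1[2]^3 + 2[2]^2 + 0[2]^1 + 2[2]^0 has value 18, i.e. 10010 in binary.
print(on_the_fly([1, 2, 0, 2]))
```

The call prints `[1, 0, 0, 1, 0]`. Only one table lookup per candidate is needed per incoming digit, which is why the conversion keeps pace with a most-significant-first digit generator.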


The use of redundant digit sets is probably best known in the context of constant-time addition, as in “carry-save addition,” or based on the classical paper of Avizienis [5], and more recently studied in [13]. The present analysis based on conversion may be applied to derive and analyze algorithms for addition in redundant digit sets. For example, for the addition of two operands from $D = \{d \mid r \le d \le s\}$ we need a conversion from $D^+ = \{d \mid 2r \le d \le 2s\}$ back into $D$, assuming $D$ is redundant, i.e., $s - r + 1 > \beta$. If the redundancy index of $D$ is $\rho_D = (s - r + 1) - \beta$, then the redundancy index of $D^+$ is $\rho_{D^+} = \beta + 2\rho_D - 1$. If conversion from $D^+$ to $D$ is to take place as a single process of (sum) digit rewriting and absorption, then, since one such conversion can reduce the redundancy index by at most a factor of $\beta$, it is necessary that

$$\beta\rho_D \ge \rho_{D^+} = \beta + 2\rho_D - 1, \quad\text{i.e.,}\quad (\beta - 2)\rho_D \ge \beta - 1,$$
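The necessary condition above can be cross-checked by brute force. The sketch below assumes the one-level rewriting scheme used in the recoding example of this section: each source digit $d$ is split as $d = \beta c + t$ with carry $c \in \{u, \ldots, v\}$ and intermediate digit $t \in \{r - u, \ldots, s - v\}$, after which adding the incoming carry lands in $D = \{r, \ldots, s\}$. This is our reading of that scheme, not a transcription of Theorem 3.4, and the function names are hypothetical. For a contiguous source set it suffices to compare end points, provided the intermediate set has at least $\beta$ elements.

```python
def one_level_carries(beta, src_lo, src_hi, r, s):
    """Return a carry range (u, v), u <= 0 <= v, that converts every digit of
    {src_lo..src_hi} into {r..s} in one carry-free level, or None if none exists.

    Assumed scheme: d = beta*c + t with u <= c <= v and r - u <= t <= s - v;
    then t plus any incoming carry in [u, v] lies in [r, s].
    """
    for u in range(-(s - r), 1):
        for v in range(0, s - r + 1):
            t_lo, t_hi = r - u, s - v
            if t_hi - t_lo + 1 < beta:  # too few intermediate digits: gaps appear
                continue
            if beta * u + t_lo <= src_lo and src_hi <= beta * v + t_hi:
                return u, v
    return None


def addition_in_one_level(beta, rho):
    """Can D+ = {2r..2s} be converted back into some D = {r..s}, r <= 0 <= s,
    with |D| = beta + rho, in a single level?"""
    size = beta + rho
    for r in range(-(size - 1), 1):
        s = r + size - 1
        if s >= 0 and one_level_carries(beta, 2 * r, 2 * s, r, s) is not None:
            return True
    return False


for beta in (2, 3, 4, 8):
    print(beta, [rho for rho in range(1, 6) if addition_in_one_level(beta, rho)])
```

Under these assumptions the search finds no single-level solution for $\beta = 2$, and for $\beta \ge 3$ it succeeds exactly when $\rho_D \ge 2$, which agrees with the inequality $(\beta - 2)\rho_D \ge \beta - 1$.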

so we can conclude that $\beta$ must be at least 3 and $\rho_D$ at least 2. As is known, for $\beta = 2$ two levels of conversion are necessary if the digit set $D$ is $\{0, 1, 2\}$ or $\{-1, 0, 1\}$, and as confirmed above it is impossible to perform addition for $\beta = 2$ with one level of conversion. It is now straightforward to design very general adders, where the operands are from distinct and very general (even noncontiguous) digit sets, and the result is in a different digit set. The set theory of [9] may then again be used by decomposition to design an implementation of the adder, in particular for higher radixes. We shall not pursue addition further here, but will conclude with a few examples of digit set conversions for some other applications, the first covering multiplier recoding.

In the implementation of multiplication it is often advantageous to convert the multiplier from base 2 to a higher radix, say $\beta = 4$ or 8, using the appropriate balanced, minimally redundant digit set. Here we shall assume that the multiplier is in redundant carry-save form, possibly the result of some previous arithmetic operation.

Example (Conversion from Redundant Carry-Save 2's Complement into Minimally Redundant, Balanced Base 4): Grouping two base 2 digits from the digit set $\{0, 1, 2\}$ we get the base 4 digit set $\{0, 1, 2, 3, 4, 5, 6\}$. With $u = 0$, $v = 1$ we may choose $r = -1$ and $s = 3$ in Theorem 3.4, so we first map into the digit set $\{-1, 0, 1, 2, 3\}$ using $\{-1, 0, 1, 2\}$ as the intermediate digit set with carries in $\{0, 1\}$. Through another application of Theorem 3.4 we map from $\{-1, 0, 1, 2, 3\}$ into $\{-2, -1, 0, 1, 2\}$ using $\{-2, -1, 0, 1\}$ as the intermediate and $\{0, 1\}$ as the carry digit set. We may illustrate the conversion as follows:

$$\{0,1,2\} \times \{0,1,2\} \;\to\; \{0, \ldots, 6\} \;\to\; \{0,1\} \times \{-1,0,1,2\} \;\to\; \{-1, \ldots, 3\} \;\to\; \{0,1\} \times \{-2,-1,0,1\} \;\to\; \{-2, \ldots, 2\}$$

Note that we cannot perform this conversion in one step. Since the target system digit set is minimally redundant, the only possibilities for $(u, v)$ are $(0, 1)$ and $(-1, 0)$, and obviously the last choice will not work. We could, however, choose $r = -2$ in the first conversion, which would change the condition for generating the carry to $d_i \ge 2$. This may be simpler to test for, in particular since the intermediate digit set then becomes $\{-2, -1, 0, 1, 2\}$ and hence redundant, which implies that the condition could be relaxed so that the carry may be generated when $d_i = 2$, but it need not be generated in this case. Hence testing whether the right-hand neighbor carry-save digit (base 2) is nonzero then becomes a sufficient condition for generating the carry. The testing for carry generation in the next conversion will depend on the actual coding of the intermediate digit set. Assuming we have an odd number of digits to convert, the most significant position maps trivially into the target system digit set. There is no problem with the position in 2's complement having negative weight. For an even number of digits, the most significant two positions combine into the digit set $\{-4, -3, \ldots, 2\}$, which with outgoing carry in $\{-1, 0\}$ and incoming carry in $\{0, 1\}$ again maps into $\{-1, 0, 1, 2, 3\}$ in the first conversion. □

As a final example we show how to systematically derive digit sets to be used in a class of multipliers, where accumulation of partial products is to be performed in a redundant digit set. The same principles may be used when designing other types of multipliers, e.g., tree-structured or modular multipliers, and for any base.

Example (Partial Product Accumulation, Systolic Multiplier Cell, Base $\beta$): Consider the multiplication of two operands in $\mathcal{P}[\beta, D]$ with $D = \{d \mid r \le d \le s\}$, $r \le 0 \le s$, and $s - r + 1 = \beta$, i.e., $D$ is assumed to be a nonredundant digit set for base $\beta$. The partial products are then radix polynomials whose digits are products $d_1 \cdot d_2$ with $d_1, d_2 \in D$, so the partial products are in $\mathcal{P}[\beta, D^*]$ where

$$D^* = \{a \mid rs \le a \le \max(r^2, s^2)\}.$$

Partial products can then be accumulated to form the complete product in $\mathcal{P}[\beta, F]$, where $F$ may be chosen as a digit set $F = \{f \mid k \le f \le l\}$ with $l - k + 1 > \beta$, so that $F$ is redundant for base $\beta$. In each step of the accumulation a partial product is then to be added to the previous sum, so we must convert from $\mathcal{P}[\beta, E]$, where

$$E = \{e \mid k + rs \le e \le l + \max(r^2, s^2)\},$$

into $\mathcal{P}[\beta, F]$. Applying Theorem 3.4, we must choose a carry set $C = \{c \mid u \le c \le v\}$ by selecting $u$ and $v$ to satisfy $k \le u \le 0 \le v \le l$ together with the bounds on $u$ and $v$ required by Theorem 3.4, where we can note that these bounds are independent of $k$ and $l$. To simplify this conversion we may choose $r, s$ such that we minimize the cardinalities of $E$ and $C$, which we obtain by minimizing the cardinality of $D^*$; hence we choose

$$r = -\lfloor \beta/2 \rfloor,\ s = \lfloor (\beta - 1)/2 \rfloor \quad\text{or}\quad r = -\lfloor (\beta - 1)/2 \rfloor,\ s = \lfloor \beta/2 \rfloor,$$

so $D$ is balanced for $\beta$ odd and "almost balanced" for $\beta$ even. Finally, having chosen the digit set $D$ and the carry set $C$, we have to choose the digit set $F$, i.e., choose $k$ and $l$, which must satisfy $k \le u \le 0 \le v \le l$ and $v - u \le (l - k + 1) - \beta$, hence

$$l - k \ge \beta + v - u - 1.$$
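The two-level recoding of the carry-save multiplier example can be sketched as follows. Assumptions (ours, for illustration): digits are stored least significant first, the carry-save operand is treated as unsigned (the negative-weight two's complement position discussed above is ignored), and the carry rules $d \ge 3$ (first level) and $d \ge 2$ (second level) are read off the intermediate sets $\{-1, 0, 1, 2\}$ and $\{-2, -1, 0, 1\}$ of the example. The function names are hypothetical. Each level computes all outgoing carries independently of one another, so no carry ripples.

```python
def convert_level(digits, beta, carry_rule):
    """One carry-free conversion level on digits stored least significant first.

    Every position splits its digit as d = beta*c + t using only its own digit,
    then adds the carry arriving from the position below; the last outgoing
    carry becomes a new most significant digit.
    """
    carries = [carry_rule(d) for d in digits]            # independent per position
    inter = [d - beta * c for d, c in zip(digits, carries)]
    low = [0] + carries[:-1]                              # carry into each position
    return [t + c for t, c in zip(inter, low)] + carries[-1:]


def recode_carry_save(cs_digits):
    """Carry-save digits in {0, 1, 2} (LSB first) -> base 4 digits in {-2, ..., 2}."""
    padded = cs_digits + [0] * (len(cs_digits) % 2)
    pairs = [padded[i] + 2 * padded[i + 1] for i in range(0, len(padded), 2)]  # {0..6}
    level1 = convert_level(pairs, 4, lambda d: 1 if d >= 3 else 0)    # -> {-1..3}
    return convert_level(level1, 4, lambda d: 1 if d >= 2 else 0)     # -> {-2..2}


print(recode_carry_save([2, 2, 2, 2]))  # 30 = -2 + 0*4 + 2*16 + 0*64
```

The call prints `[-2, 0, 2, 0]`. Because each `carry_rule` inspects a single digit, both levels have constant depth regardless of the operand length, which is the point of converting into a redundant digit set.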


For $\beta = 4$ we can choose:

$$r = -2,\ s = 1 \;\Rightarrow\; D = \{-2, -1, 0, 1\},\quad D^* = \{-2, -1, 0, 1, 2, 3, 4\}$$
$$u = -1,\ v = 1 \;\Rightarrow\; C = \{-1, 0, 1\}$$
$$k = -3,\ l = 2 \;\Rightarrow\; F = \{-3, -2, -1, 0, 1, 2\},\quad E = \{-5, -4, \ldots, 6\}$$

and the conversion in the accumulation of partial products can then be performed as

$$\{-5, -4, \ldots, 6\} \;\to\; \{-1, 0, 1\} \times \{-2, -1, 0, 1\} \;\to\; \{-3, -2, -1, 0, 1, 2\}.$$

For a radix-4 systolic multiplier, each cell can then be designed to add a product of two digits from $\{-2, -1, 0, 1\}$ to a partial sum digit from $\{-3, -2, -1, 0, 1, 2\}$, absorbing and producing a carry from $\{-1, 0, 1\}$, with the resulting sum digit again in $\{-3, -2, -1, 0, 1, 2\}$. □

IV. CONCLUSION

The literature on computer arithmetic abounds with examples where digit set conversions are used implicitly in algorithms for performing standard arithmetic operations, but most often without this being realized or at least expressed. This also means that the same problem has been investigated and solved repeatedly. In this paper we have analyzed digit set conversion in its own right, providing, hopefully, a coherent view of conversions into nonredundant as well as into redundant digit sets, and then shown how these fairly general results may be applied in a variety of different arithmetic algorithms. It is our belief that this approach is beneficial because it allows a systematic treatment of a problem that often may be isolated as a subproblem in the context of some algorithm design situation.

ACKNOWLEDGMENT

The author acknowledges constructive comments from the referees and fruitful discussions with D. Matula concerning the proof of Theorem 2.1 and his notation from [6] as used here.

REFERENCES

[1] M. D. Ercegovac and T. Lang, “On-the-fly conversion of redundant into conventional representations,” IEEE Trans. Comput., vol. C-36, no. 7, pp. 895-897, 1987.
[2] R. E. Ladner and M. J. Fischer, “Parallel prefix computation,” J. ACM, vol. 27, no. 4, pp. 831-838, 1980.
[3] R. P. Brent and H. T. Kung, “A regular layout for parallel adders,” IEEE Trans. Comput., vol. C-31, no. 3, pp. 260-264, 1982.
[4] J. Sklansky, “Conditional-sum addition logic,” IRE Trans. Electron. Comput., vol. EC-9, pp. 226-231, 1960.
[5] A. Avizienis, “Signed-digit number representations for fast parallel arithmetic,” IRE Trans. Electron. Comput., vol. EC-10, pp. 389-400, 1961.
[6] D. W. Matula, “Radix arithmetic: Digital algorithms for computer architecture,” in Applied Computation Theory: Analysis, Design, Modeling, Raymond T. Yeh, Ed. Englewood Cliffs, NJ: Prentice-Hall, ch. 9, pp. 374-448.
[7] T. Lynch and E. Swartzlander, “A spanning tree carry lookahead adder,” IEEE Trans. Comput., vol. C-41, no. 8, pp. 931-939, 1992.
[8] K. Efe, “Multi-operand addition with conditional sum logic,” in Proc. 5th IEEE Symp. Computer Arithmetic, 1981, pp. 251-255.
[9] T. M. Carter and J. E. Robertson, “The set theory of arithmetic decomposition,” IEEE Trans. Comput., vol. 39, pp. 993-1005, 1990.
[10] A. D. Booth, “A signed binary multiplication technique,” Q. J. Mech. Appl. Math., vol. 4, pp. 236-240, 1951.
[11] G. W. Reitwiesner, “Binary arithmetic,” in Advances in Computers, vol. 1, NY: Academic Press, 1960, pp. 261-265.
[12] O. L. MacSorley, “High-speed arithmetic in binary computers,” Proc. IRE, vol. 49, pp. 67-91, 1961.
[13] B. Parhami, “Generalized signed-digit number systems: A unifying framework for redundant number representations,” IEEE Trans. Comput., vol. C-39, no. 1, pp. 89-98, 1990.
[14] H. Sam and A. Gupta, “A generalized multibit recoding of two's complement binary numbers and its proof with application in multiplier implementations,” IEEE Trans. Comput., vol. 39, pp. 1006-1015, 1990.
[15] S. Vassiliadis, E. M. Schwartz, and D. J. Hanrahan, “A general proof for overlapped multiple-bit scanning multiplications,” IEEE Trans. Comput., vol. 38, pp. 172-183, 1989.

Decomposition of {0, 1}-Matrices

R. Swaminathan and D. Veeramani

Abstract-A simple decomposition of an $r \times c$ {0, 1}-matrix is defined in terms of a collection of disjoint submatrices obtained by deleting a “minimal” set of columns. In general, the number of such simple decompositions is $O(2^r)$. A class of matrices, namely, vertex-tree graphic, is defined, and it is shown that the number of simple decompositions of a vertex-tree graphic matrix is at most $r - 1$. Finally, the relevance of simple decomposition to the well-known problem of cluster formation on {0, 1}-matrices is uncovered, and an $O(r^2 c)$ time algorithm is given to solve this problem for vertex-tree graphic matrices.

Index Terms-Cluster decomposition, cluster-formation problem, disconnecting set, edge-tree graphic matrix, vertex-tree graphic matrix, simple decomposition.

I. INTRODUCTION

This brief contribution deals with decomposing {0, 1}-matrices into two or more disjoint submatrices by deletion of a “suitable” set of columns. It is shown that the number of ways a {0, 1}-matrix can be decomposed is an exponential function of the number of its rows. Subsequently, a special class of matrices is introduced, and it is proved that the number of ways these special matrices can be decomposed is a linear function of the number of their rows. Results obtained on this decomposition are then used to solve the cluster-formation problem (on the previously mentioned special class of {0, 1}-matrices), which deals with the formation of a “block-diagonal” structure by permuting rows and columns so that the number of 1's common to two or more of these blocks is minimized. In this brief contribution, for convenience, trees are equated with their edge sets. In addition, all the matrices are assumed to have at least one 1-entry in each row and column.

This is not a

Manuscript received May 21, 1992; revised December 16, 1992.

R. Swaminathan is with the Department of Computer Science, University of Cincinnati, Cincinnati, OH 45221. D. Veeramani is with the Department of Industrial Engineering, University of Wisconsin, Madison, WI 53706. IEEE Log Number 9213782.
