IEEE Circuits and Devices Magazine - CiteSeerX

Employing Alternative Number Systems to Reduce Power Dissipation in Portable Devices and High-Performance Systems

© 1999 ARTVILLE, LLC.

T. Stouraitis and V. Paliouras

Power dissipation has evolved into an instrumental design optimization objective due to the growing demand for portable electronics equipment as well as due to excessive heat generation in high-performance systems. In the former case, low-power techniques are employed to prolong battery life, while in the latter case, low-power techniques are required to mitigate the reliability problems that may arise. The dominant component of power dissipation for well-designed CMOS circuits is dynamic (switching) power dissipation, given as [1]

$$P = a\, C_L\, f\, V_{dd}^2, \qquad (1)$$

where a is the activity factor, $C_L$ is the switching capacitance, f is the clock frequency, and $V_{dd}$ is the supply voltage. A variety of design techniques are commonly employed to reduce the factors of Eq. (1), without degrading system performance. As slower circuits tend to dissipate less power, the low-power design problem can be seen as an attempt to achieve a specified system performance by employing slow components.

The reduction of the various factors that determine power dissipation is sought at all levels of the design abstraction. In particular, techniques for power dissipation reduction at higher design abstraction levels aim to reduce the computational load and the number of memory accesses required to perform a certain task as well as to introduce parallelism and pipelining in the system [2]. At the circuit and process levels, minimal feature size circuits are preferred, capable of operating at minimal supply voltages, while leakage currents and device threshold voltages are minimized.

The choice of the number system (i.e., the way numbers are represented in a digital system) can reduce power dissipation, since the number system has an effect on several levels of the design abstraction. In particular, the appropriate selection of the number system can reduce power dissipation, because it can reduce:
1. the number of the operations;
2. the strength of the operators; and
3. the activity of the data.

A particular choice of number system can reduce the number of the actual operations required to accomplish certain computational tasks; therefore, it can reduce the computational load of an application. Furthermore, both data activity and the strength of the operators are influenced by the choice of the number system. Finally, power dissipation can be reduced by using low-power arithmetic circuit architectures. Again, the possible architectures are determined by the number system.

Several authors address the issue of low-power arithmetic; the bulk of the work in this field is on the definition of new low-power circuit-level architectures [3] or the identification among existing architectures of those that dissipate minimal power for the basic operations, such as addition and multiplication [4]. Also, comparisons of number representations, such as sign-magnitude and two’s-complement systems [5], in terms of underlying bit activity have been reported [6].

In this article we focus on two alternative number systems that are quite different from the conventional linear number representations, namely the logarithmic number system (LNS) and the residue number system (RNS). Both have recently attracted the interest of researchers for their low-power properties. We address aspects of the conventional arithmetic representations and the impact of logarithmic arithmetic on power dissipation, and discuss the low-power aspects of residue arithmetic.

CIRCUITS & DEVICES ■ JULY 2001 ■ 8755-3996/01/$10.00 ©2001 IEEE
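As a rough illustration of how the factors of Eq. (1) trade off, the following sketch evaluates the switching-power expression at a made-up operating point. The function name and all numeric values are assumptions chosen for illustration; none of them come from the article.

```python
def dynamic_power(activity, c_load, freq, vdd):
    """Eq. (1): P = a * C_L * f * Vdd^2 (watts when inputs are in SI units)."""
    return activity * c_load * freq * vdd ** 2

# Hypothetical operating point (illustrative values only).
baseline = dynamic_power(activity=0.5, c_load=1e-9, freq=50e6, vdd=3.3)

# Halving the supply voltage alone reduces power by a factor of four,
# because Vdd enters Eq. (1) quadratically.
half_vdd = dynamic_power(activity=0.5, c_load=1e-9, freq=50e6, vdd=1.65)

# Halving the activity factor alone reduces power by a factor of two.
half_activity = dynamic_power(activity=0.25, c_load=1e-9, freq=50e6, vdd=3.3)

print(baseline / half_vdd, baseline / half_activity)
```

The quadratic dependence on the supply voltage is why techniques that allow voltage scaling tend to pay off more than those that only reduce activity or capacitance.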


Conventional Arithmetic Representations

Parhami [5] offers an overview of low-power techniques for arithmetic circuits. Common techniques for low-power logic design can be applied to arithmetic circuits as well [5]. Such techniques are based on the following guidelines:
1. avoid wasted power: minimize glitching and do not clock idle modules;
2. barely meet performance requirements, since slower circuits dissipate less power; and
3. minimize signal activity by properly encoding data.

In some cases, wasted power can be reduced severalfold by minimizing the computational load of a particular task. The appropriate selection of the number system will be shown below to reduce the computational load in certain tasks.

Callaway and Swartzlander [7] have focused on low-power arithmetic at the gate level; they have characterized several adder and multiplier architectures in terms of power dissipation. They offer area, time, and power dissipation measures for various architectures and word lengths. In terms of minimal power dissipation for 16-bit adders, the constant-width carry-skip adder emerges as the optimal choice. However, minimal absolute power dissipation may not be the optimization objective in a design. In most cases, a more complex criterion, the power-delay product, is more applicable, because it describes the combined effect of reducing power dissipation at the cost of increasing circuit delay. Returning to the 16-bit adder example, the power-delay product criterion points to a different topology as the optimal solution, namely the variable-width carry-skip adder [7]. This example demonstrates that no single choice of architecture is optimal for every design situation. Instead, the design specifications (expressed as area, time, and power dissipation constraints) should be met, while minimizing an appropriate cost function. A similar discussion for multipliers of word sizes between 8 and 32 bits reveals that Wallace and Dadda architectures outperform array multipliers for low-power operation [7].

Figure 1. The organization of an (n + 1)-bit LNS digital word: the zero flag z in bit n, the sign s in bit n − 1, and $x = \log_b |X|$ in the remaining bits down to bit 0.

Bit activity is another factor that affects power dissipation and depends on the number system selection. It has been shown that the probabilistic distribution of the input signals largely affects the performance of the number representation in terms of bit activity. Landman and Rabaey demonstrate this effect by introducing the dual-bit type (DBT) method for modeling the bit activity in a data word [6], assuming two’s-complement and sign-magnitude representations. While the sign-magnitude representation is found to exhibit less bit activity than two’s-complement coding, a general conclusion on the power dissipation behavior cannot be drawn, since the complexity of the corresponding processing circuitry is different. Since sign-magnitude arithmetic requires more complicated adders and subtractors than two’s-complement arithmetic, the increased activity of the latter can be compensated from a power dissipation viewpoint.
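The difference in bit activity between the two representations can be seen on a tiny, deliberately extreme input. The sketch below counts low-to-high transitions for a sequence that alternates between +1 and −1 in 8-bit words. The helper names and the input sequence are illustrative assumptions; this is not the DBT model of [6], only a direct count.

```python
def twos_complement(value, bits):
    """Return the two's-complement bit pattern of value as an unsigned int."""
    return value & ((1 << bits) - 1)

def sign_magnitude(value, bits):
    """Return the sign-magnitude bit pattern (the MSB is the sign bit)."""
    sign = 1 << (bits - 1) if value < 0 else 0
    return sign | abs(value)

def low_to_high_transitions(words):
    """Count bits that switch from 0 to 1 between consecutive words."""
    return sum(bin(~prev & curr).count("1")
               for prev, curr in zip(words, words[1:]))

# Small values that change sign on every sample (strongly anti-correlated).
samples = [1, -1, 1, -1, 1]
tc = low_to_high_transitions([twos_complement(v, 8) for v in samples])
sm = low_to_high_transitions([sign_magnitude(v, 8) for v in samples])
print(tc, sm)
```

In two’s complement every sign change of a small value flips the whole sign-extension field, whereas in sign-magnitude only the sign bit toggles. As the text notes, this activity advantage does not by itself settle the power comparison, because the sign-magnitude datapath is more complex.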

The Logarithmic Number System

The LNS [8] has been employed in the design of low-power DSP devices, such as a digital hearing aid by Morley et al. [9]. More recently, Sacha and Irwin report that LNS can reduce power dissipation in adaptive filtering processors [10].

LNS Basics

The LNS maps a linear number X to a triplet as follows

$$X \xrightarrow{\text{LNS}} (z, s, x = \log_b |X|), \qquad (2)$$

where z is a single-bit flag which, when asserted, denotes that X is zero, s is the sign of X, and b is the base of the logarithmic representation. The organization of an LNS word is shown in Fig. 1. Mapping (2) is of practical interest because it can simplify certain arithmetic operations; i.e., it reduces the strength of the operators. For example, due to the properties of the logarithm function, the multiplication of two linear numbers $X = b^x$ and $Y = b^y$ is reduced to the addition of their logarithmic images, x and y. The basic arithmetic operations and their LNS counterparts are summarized in Table 1.

In order to utilize the benefits of LNS, a conversion overhead is required in most cases. Conversion circuitry is required to perform the forward LNS mapping [Eq. (2)] and the inverse mapping of the logarithmic results to linear numbers, defined as

$$X = (1 - z)(-1)^s b^x. \qquad (3)$$

It is noted that mappings (2) and (3) are required in the case that an LNS processor receives as input and transmits as output linear data in digital format. Since all arithmetic operations can be performed in the logarithmic domain, only an initial conversion is imposed; therefore, as the amount of processing implemented in LNS grows, the conversion overhead contribution to power dissipation becomes negligible since it remains constant.

In stand-alone DSP systems a different approach is possible. The LNS forward and inverse mapping overhead can be mitigated by employing logarithmic A/D and D/A converters, instead of linear converters followed by corresponding digital conversion circuitry. Such an approach has been adopted by Morley et al. in the design of a digital hearing-aid processor [9].
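The forward mapping of Eq. (2), the inverse mapping of Eq. (3), and the reduction of multiplication to addition can be sketched in a few lines. This is a floating-point model under stated assumptions: the function names are hypothetical, the logarithm is kept as a Python float rather than a fixed-point word, and the sign of a product is taken as the XOR of the operand signs.

```python
import math

def lns_encode(value, base=2.0):
    """Forward mapping of Eq. (2): return (z, s, x) with x = log_b |X|."""
    if value == 0:
        return (1, 0, 0.0)          # zero flag set; s and x are don't-cares
    return (0, 1 if value < 0 else 0, math.log(abs(value), base))

def lns_decode(word, base=2.0):
    """Inverse mapping of Eq. (3): X = (1 - z) * (-1)^s * b^x."""
    z, s, x = word
    return (1 - z) * (-1) ** s * base ** x

def lns_multiply(a, b):
    """Multiply two LNS words: add the logarithms, XOR the sign bits."""
    za, sa, xa = a
    zb, sb, xb = b
    if za or zb:                    # a zero operand gives a zero product
        return (1, 0, 0.0)
    return (0, sa ^ sb, xa + xb)

product = lns_decode(lns_multiply(lns_encode(-6.0), lns_encode(3.5)))
print(product)                      # close to -21.0, up to rounding
```

The only arithmetic performed on the magnitudes inside `lns_multiply` is one addition, which is the operator strength reduction the text refers to.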


LNS and Power Dissipation

LNS is applicable for low-power design because it reduces
1. the strength of certain arithmetic operators; and
2. the bit activity.

The operator strength reduction by LNS reduces the switching capacitance; i.e., it reduces the $C_L$ factor of Eq. (1). Sacha and Irwin have studied the impact of the number system choice on the QRD-RLS algorithm [10]. They have compared the amount of switched capacitance per algorithm iteration for several implementations of QRD-RLS, each using a particular arithmetic, namely CORDIC, floating-point, fixed-point, and LNS. A performance comparison of the various implementations reveals that LNS offers accuracy comparable to floating-point, but at only a fraction of the switched capacitance per iteration of the algorithm. The reduction of average switched capacitance due to LNS stems from the simplification of basic arithmetic operations, shown in Table 1.

However, LNS can affect power dissipation in an additional way: through the bit activity; i.e., the a factor of Eq. (1). A design parameter that is often neglected despite playing a key role in the performance of an LNS-based processor is the base of the logarithm b [11, 12], as demonstrated in Fig. 2. The choice of base has a substantial impact on the average bit activity. Figure 2 shows activity per bit position; i.e., the probability of a transition from “low” to “high” in a particular bit position, for a two’s-complement word and several LNS words, each of a different base b. It can be seen that departing from the traditional choice b = 2 can substantially reduce the signal activity in comparison to the two’s-complement representation. The input data are sampled from a zero-mean Gaussian process with a correlation factor ρ = −0.99, similar to the derivation of the DBT model.

Since multiplication-additions are important in DSP applications, the power requirements of an LNS and a linear fixed-point adder-multiplier have been compared. Paliouras and Stouraitis report that approximately a two-times reduction in power dissipation is possible for operations with word sizes of 8 to 14 bits. Given a sufficient number of multiplication-additions, the LNS implementation becomes more efficient from the low-power dissipation viewpoint, even when a constant conversion overhead is taken into consideration.
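A multiplication-addition carried out entirely in the logarithmic domain can be sketched as follows, combining the multiplication and addition rules of Table 1. The sketch is restricted to positive operands and uses floating-point logarithms; the function name `lns_mac` and the sample data are assumptions made for illustration. The article does not describe how the term log_b(1 + b^(v − u)) is realized in hardware, so it is computed directly here.

```python
import math

def lns_mac(acc, x, y, base=2.0):
    """Return log_b(b^acc + b^x * b^y): one multiply-accumulate step.

    The product costs a single addition (x + y). The accumulation uses the
    Table 1 addition rule z = u + log_b(1 + b^(v - u)) with u >= v.
    """
    prod = x + y
    hi, lo = max(acc, prod), min(acc, prod)
    return hi + math.log(1.0 + base ** (lo - hi), base)

# Hypothetical positive filter coefficients and samples.
coeffs = [0.5, 2.0, 1.5]
samples = [4.0, 3.0, 2.0]
logs = [(math.log2(c), math.log2(s)) for c, s in zip(coeffs, samples)]

acc = logs[0][0] + logs[0][1]        # the first product initialises the sum
for x, y in logs[1:]:
    acc = lns_mac(acc, x, y)

result = 2.0 ** acc                  # close to 0.5*4 + 2*3 + 1.5*2 = 11
print(result)
```

Only the inputs and the final result cross the linear/logarithmic boundary, which mirrors the point that the conversion overhead stays constant while the number of multiplication-additions grows.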

The Residue Number System

The RNS [13] has recently been shown to offer significant power-dissipation savings in the design of signal processing architectures for FIR filters [14] and frequency synthesizers [15]. It is shown that RNS can even reduce the computation load in complex-number processing, thus providing savings at the algorithmic level.

RNS Basics

The RNS maps an integer X to an M-tuple of residues $x_i$,

$$X \xrightarrow{\text{RNS}} \{x_1, x_2, \ldots, x_M\}, \qquad (4)$$

where

$$x_i = \langle X \rangle_{m_i}, \qquad (5)$$

$\langle \cdot \rangle_{m_i}$ denotes the mod $m_i$ operation, and $m_i$ is a member of the set of co-prime integers $\{m_1, m_2, \ldots, m_M\}$, called moduli. Co-prime integers have the property that $\gcd(m_i, m_j) = 1$, $i \neq j$. The modulo operation $\langle X \rangle_m$ returns the integer remainder of the integer division X div m; i.e., a number k with $0 \le k < m$ such that $X = m \cdot l + k$, where l is an integer.

Table 1. Basic Linear Arithmetic Operations and their LNS Counterparts

Linear Operation                                      Logarithmic Operation
$Z = XY = b^x b^y = b^{x+y}$                          $z = \log_b Z = x + y$
$Z = X/Y = b^x / b^y = b^{x-y}$                       $z = x - y$
$Z = \sqrt[m]{X} = \sqrt[m]{b^x} = b^{x/m}$           $z = x/m$, m integer
$Z = X^m = (b^x)^m$                                   $z = mx$, m integer
$Z = X + Y = b^x + b^y = b^x (1 + b^{y-x})$           $z = x + \log_b (1 + b^{y-x})$
$Z = X - Y = b^x - b^y = b^x (1 - b^{y-x})$           $z = x + \log_b (1 - b^{y-x})$

Figure 2. Probability $p_{0\to1}$ per bit position (bits 15 down to 0) for two’s-complement and LNS encoding (b = 1.5, b = 1.3, and b = 1.1) for ρ = −0.99.

RNS is of interest because basic arithmetic operations can be performed in a digit-parallel carry-free manner; i.e.,

$$z_i = \langle x_i \circ y_i \rangle_{m_i}, \qquad (6)$$

where $i = 1, 2, \ldots, M$, and the symbol $\circ$ stands for addition, subtraction, or multiplication. Every integer in the range $0 \le X < \prod_{i=1}^{M} m_i$ has a unique RNS representation. Inverse conversion is accomplished by means of the Chinese Remainder Theorem (CRT) or mixed-radix conversion [16].

The basic architecture of an RNS processor in comparison to a binary counterpart is depicted in Fig. 3. Figure 3 shows that the word length n of the binary counterpart is partitioned into M subwords, the residues, which can be processed independently and are of word length significantly smaller than n. The ith residue channel performs arithmetic modulo $m_i$. Conceptually, RNS introduces a subword-level parallelism into an algorithm; therefore, its hardware implementation can enjoy the low-power benefits of parallel architectures [2].
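The channel-wise arithmetic of Eq. (6) and the CRT-based inverse conversion can be sketched as below. The moduli (7, 11, 13) and the function names are assumptions picked for illustration; the modular inverse relies on the three-argument `pow` with a negative exponent, which requires Python 3.8 or later.

```python
import operator
from math import gcd, prod

MODULI = (7, 11, 13)   # hypothetical pairwise co-prime moduli; range is 1001

def to_rns(value, moduli=MODULI):
    """Forward conversion: one residue per modulus."""
    return tuple(value % m for m in moduli)

def rns_op(x, y, op, moduli=MODULI):
    """Apply +, - or * independently in every residue channel (Eq. (6))."""
    return tuple(op(a, b) % m for a, b, m in zip(x, y, moduli))

def from_rns(residues, moduli=MODULI):
    """Inverse conversion by the Chinese Remainder Theorem."""
    total = prod(moduli)
    value = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        value += r * partial * pow(partial, -1, m)   # modular inverse mod m
    return value % total

# The moduli must be pairwise co-prime for the representation to be unique.
assert all(gcd(a, b) == 1 for i, a in enumerate(MODULI) for b in MODULI[i + 1:])

x, y = to_rns(23), to_rns(41)
print(from_rns(rns_op(x, y, operator.add)))   # 64
print(from_rns(rns_op(x, y, operator.mul)))   # 943
```

No information passes between the channels inside `rns_op`, which is the carry-free property; only `from_rns` combines them. Results are meaningful as long as they stay within the dynamic range of 1001 for this choice of moduli.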

Figure 3. Structure of a binary architecture (a) and the corresponding RNS processor (b). Labels in the figure: n-bit binary datapath in (a); forward converter, residue channels mod $m_1$, mod $m_2$, ..., mod $m_M$ of n/M bits each, and inverse converter in (b).

RNS and Power Dissipation

Freking and Parhi have studied the power dissipation of FIR filter architectures that employ RNS. They report that RNS can reduce power dissipation since it reduces [14]:
1. the hardware cost;
2. the switching activity; and
3. the supply voltage.

By employing the binary-like RNS filter structures by Ibrahim [17], Freking and Parhi report that RNS reduces the bit activity by up to 38% in (4 × 4)-bit multipliers. As the critical path in an RNS architecture increases logarithmically with the equivalent binary word length, RNS can tolerate a larger reduction in the supply voltage than the corresponding binary architecture while achieving a particular delay specification. To demonstrate the overall impact of the RNS on the power budget of an FIR filter, Freking and Parhi report that a filter unit with 16-bit coefficients and 32-bit dynamic range, operating at 50 MHz, dissipates 26.2 mW on average for a two’s-complement implementation, while the RNS equivalent architecture dissipates 3.8 mW. Hence, power dissipation reduction becomes more significant as the number of filter taps increases, and a three-times reduction is possible for filters with more than 100 taps.

A different approach to low-power RNS is proposed by Chren. Chren suggests one-hot encoding of the residues in an RNS-based architecture, thus defining one-hot RNS (OHR) [15]. Instead of encoding a residue value $x_i$ in a conventional positional notation, an (m − 1)-bit word is employed. In this word, the assertion of the ith bit denotes the residue value $x_i$. The one-hot approach allows for further reducing bit activity and power-delay products using residue arithmetic. OHR is found to require simple circuits for processing. The power reduction is rendered possible since all basic operations (i.e., addition/subtraction and multiplication) and the RNS-specific operations of scaling (i.e., division by constant), modulus conversion, and index computation are performed using transposition of bit lines and barrel shifters. The performance of the obtained residue architectures is demonstrated through the design of a direct digital frequency synthesizer, which exhibits a power-delay product reduction of 85% over the conventional approach [15].

RNS Signal Activity for Gaussian Input

In the following, the bit activity in an RNS architecture with positionally encoded residues is experimentally studied for the encoding of 8-bit data using the base {2, 151}, which provides a linear FXP dynamic range of approximately 8.24 bits. Assuming data sampled from a Gaussian process, the bit assertion activities of the particular RNS, an 8-bit sign-magnitude, and an 8-bit two’s-complement system are measured and compared. The results are depicted in Figs. 4-6 for 100 Monte Carlo runs. It is observed that RNS performs better than two’s-complement representation for anti-correlated data and slightly worse than sign-magnitude and two’s-complement representations for uncorrelated and correlated sequences.

Figure 4. Number of low-to-high transitions, assuming strongly anti-correlated (ρ = −0.99) Gaussian data, for two’s-complement (TC), RNS, and sign-magnitude (SM) number systems for 100 Monte Carlo runs.

Figure 5. Number of low-to-high transitions, assuming uncorrelated (ρ = 0) Gaussian data.

Figure 6. Number of low-to-high transitions, assuming strongly correlated (ρ = 0.99) Gaussian data.

The Quadratic RNS

Residue arithmetic can be exploited to reduce the number of real operations required to perform complex-number multiplication. This is achieved by employing an extension of RNS, the quadratic RNS (QRNS) [16]. The direct complex-number multiplication can be performed as

$$p = (a + jb)(c + jd) \qquad (7)$$
$$\;\;= (ac - bd) + j(bc + da), \qquad (8)$$

where j is the imaginary unit (i.e., $\sqrt{-1}$), and a, b, c, and d are real numbers. Parhami [5] shows a different technique to reduce the number of multiplications to three by performing five additions or subtractions with an extra computational step. According to this technique, the complex product is computed as

$$p = [c(a + b) - b(c + d)] + j[c(a + b) - a(c - d)], \qquad (9)$$

where the common term c(a + b) is initially computed.

In case the moduli are primes of the form $m_i = 4k + 1$, a QRNS mapping can be established, such that the residue pair of the real and imaginary part modulo $m_i$ can be mapped to a quadratic residue pair as

$$(a_i, b_i) \xrightarrow{\text{QRNS}} (q_i, q_i^*), \qquad (10)$$

where $q_i$ and $q_i^*$ are the quadratic images of the pair $(a_i, b_i)$. The quadratic images are obtained as

$$q_i = \langle a_i + j b_i \rangle_{m_i}, \qquad (11)$$
$$q_i^* = \langle a_i - j b_i \rangle_{m_i}, \qquad (12)$$

where j is the solution of

$$\langle j^2 + 1 \rangle_{m_i} = 0, \qquad (13)$$

while the mapping is inverted as

$$a_i = \langle 2^{-1} (q_i + q_i^*) \rangle_{m_i}, \qquad (14)$$
$$b_i = \langle 2^{-1} j^{-1} (q_i - q_i^*) \rangle_{m_i}. \qquad (15)$$

The quadratic mapping is of practical importance because it removes the dependence of the real and imaginary parts of a complex product on both the real and imaginary parts of both operands, as shown by Eq. (8). In other words, it eliminates the cross-product terms. Therefore, by exploiting the QRNS, the complex product $\{(qp_i, qp_i^*) \mid i = 1, 2, \ldots, N\}$ of two QRNS-encoded complex numbers, $\{(qa_i, qa_i^*) \mid i = 1, 2, \ldots, N\}$ and $\{(qb_i, qb_i^*) \mid i = 1, 2, \ldots, N\}$, can be evaluated as the direct product of the corresponding quadratic images; i.e.,

$$qp_i = \langle qa_i \cdot qb_i \rangle_{m_i}, \qquad (16)$$
$$qp_i^* = \langle qa_i^* \cdot qb_i^* \rangle_{m_i}. \qquad (17)$$

Equations (16) and (17) show that a complex multiplication requires only two residue multiplications instead of four multiplications, an addition, and a subtraction. Therefore, by paying an initial cost for conversion, a significant computational complexity reduction can be achieved by the QRNS mapping, which is directly translated to power savings.
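The QRNS mapping, the two-multiplication product, and the inverse mapping can be checked numerically for a single residue channel. The sketch below assumes the modulus 13 (a prime of the form 4k + 1) and j = 5, which satisfies j² ≡ −1 (mod 13); both values and all function names are illustrative choices, not taken from the article. The result is compared against the four-multiplication form of the complex product reduced modulo 13.

```python
M = 13          # hypothetical prime modulus of the form 4k + 1
J = 5           # 5 * 5 = 25 = -1 (mod 13), so J is a square root of -1

def qrns_encode(a, b, m=M, j=J):
    """Map the residue pair (a, b) to its quadratic images (q, q*)."""
    return ((a + j * b) % m, (a - j * b) % m)

def qrns_multiply(p, q, m=M):
    """Complex product in QRNS: two independent residue multiplications."""
    return ((p[0] * q[0]) % m, (p[1] * q[1]) % m)

def qrns_decode(q, m=M, j=J):
    """Recover (a, b) from the quadratic images (q, q*)."""
    inv2 = pow(2, -1, m)            # modular inverses (Python 3.8+)
    invj = pow(j, -1, m)
    a = (inv2 * (q[0] + q[1])) % m
    b = (inv2 * invj * (q[0] - q[1])) % m
    return (a, b)

def direct_multiply(x, y, m=M):
    """Reference: (ac - bd) + j(bc + da), reduced modulo m."""
    (a, b), (c, d) = x, y
    return ((a * c - b * d) % m, (b * c + d * a) % m)

x, y = (3, 4), (2, 5)
via_qrns = qrns_decode(qrns_multiply(qrns_encode(*x), qrns_encode(*y)))
print(via_qrns, direct_multiply(x, y))
```

In a full QRNS processor the same computation would run in every channel, with a different j for each modulus; a single channel is enough to see that the cross-product terms disappear.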

Consider the Monte Carlo runs of the following experiment. Assuming that the real and imaginary parts of the factors of a complex product are taken from two Gaussian random processes, the total bit activity in the intermediate results is measured for the complex product evaluation. Specifically, 10-bit sign-magnitude and 10-bit two’s-complement operations are compared to QRNS operations that cover a dynamic range in excess of 20 bits. Ten Monte Carlo runs, each of 1000 samples, compose the experiment, which is repeated for uncorrelated (ρ = 0), correlated (ρ = 0.99), and anti-correlated (ρ = −0.99) Gaussian data; results are shown in Figs. 7-9, respectively. Even though QRNS provides a significantly larger dynamic range, it can be seen that the bit activity is reduced approximately two times.

Figure 7. Number of low-to-high transitions for complex-number multiplication, assuming uncorrelated (ρ = 0) Gaussian operands, for two’s-complement (TC), sign-magnitude (SM), and QRNS.

Figure 8. Number of low-to-high transitions for complex-number multiplication, assuming strongly correlated (ρ = 0.99) Gaussian operands.

D’Amora et al. have compared the implementation of a direct-form complex FIR filter with its QRNS counterpart [18]. They report that, for a particular throughput rate, the QRNS-based implementation requires half the area and a third of the power dissipation of the conventional implementation. The conventional implementation is assumed to utilize the four-multiplication scheme for complex-number multiplication, while the QRNS implementation exploits the index transform.

The index transform reduces a modulo-m multiplication to a modulo-(m − 1) addition, for m prime, resembling the reduction of multiplication to addition by LNS. An integer root ρ can be determined, such that the residues $r \in [1, m)$ can be written as

$$r = \langle \rho^n \rangle_m, \qquad (18)$$

and the multiplication of the residues can be reduced to addition modulo (m − 1) of the indices, which correspond to the residues to be multiplied. Therefore, the modulo product p of two residues, $r_1$ and $r_2$, is

$$p = \langle r_1 r_2 \rangle_m = \langle \rho^{n_1} \rho^{n_2} \rangle_m = \langle \rho^{n_1 + n_2} \rangle_m = \langle \rho^n \rangle_m, \qquad (19)$$

where

$$n = \langle n_1 + n_2 \rangle_{m-1}. \qquad (20)$$

Hence, modulo multiplication can be performed as residue addition, preceded and followed by a mapping of the operands to their indices and of the result to the residue. These mappings are commonly implemented as table look-ups [16]. The QRNS can exploit the index transform because the utilized moduli need to be prime. Hence, in the case of DSP architectures such as FIR filters, the coefficients can be directly stored in index-residue form; thus the strength of each multiplication can be further reduced, since the determination of the corresponding indices is not repeated for every residue multiplication. The significant power dissipation savings reported by D’Amora et al. assume the utilization of the index transform for residue multiplication [18].
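The index transform of Eqs. (18)-(20) can be sketched with two small look-up tables. The modulus 13 and the root 2 are assumptions chosen for illustration (2 generates all nonzero residues modulo 13), and the zero residue, which has no index, is handled as a special case in this sketch.

```python
M = 13      # hypothetical prime modulus
ROOT = 2    # 2 is a primitive root modulo 13

# Look-up tables: index -> residue (Eq. (18)) and residue -> index.
ANTILOG = [pow(ROOT, n, M) for n in range(M - 1)]
INDEX = {r: n for n, r in enumerate(ANTILOG)}

def index_multiply(r1, r2):
    """Multiply residues via addition of indices modulo m - 1 (Eq. (20))."""
    if r1 == 0 or r2 == 0:
        return 0                      # zero has no index; handled separately
    n = (INDEX[r1] + INDEX[r2]) % (M - 1)
    return ANTILOG[n]

print(index_multiply(7, 9), (7 * 9) % M)   # both are 11
```

For a filter with fixed coefficients, the `INDEX` look-up of each coefficient could be done once and stored, which corresponds to the index-residue storage mentioned in the text.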

Figure 9. Number of low-to-high transitions for complex-number multiplication, assuming strongly anti-correlated (ρ = −0.99) Gaussian operands.

Conclusions

Recent advances in computer arithmetic offer interesting alternative solutions for low-power design. Depending on an assortment of factors that need to be considered, such as signal statistics, computational load, type of arithmetic operations, accuracy, and dynamic range, it is worth evaluating the LNS or the RNS for hardware implementations of computationally intensive tasks. The choice of arithmetic can lead to substantial power savings. It affects several levels of the design abstraction since it can reduce the number of operations, the signal activity, and the strength of the operators. The impact of the arithmetic in a digital system is not limited to the definition of the architecture of arithmetic circuits.

Thanos Stouraitis received a B.S. in physics and an M.S. in electronic automation from the University of Athens, Greece, in 1979 and 1981, respectively; an M.S. in electrical engineering from the University of Cincinnati in 1983; and the Ph.D. degree from the University of Florida in 1986. He was awarded the Outstanding Ph.D. Dissertation award of the University of Florida and a Certificate of Appreciation by the IEEE Circuits and Systems Society in 1997. He is a professor of electrical and computer engineering at the University of Patras, Greece. He has served on the faculty of the University of Florida and the Ohio State University. He has published two books, several book chapters, and approximately 30 journal and 70 conference papers in the areas of computer architecture, computer arithmetic, VLSI signal and image processing, and low-power processing. He serves on the IEEE Circuits and Systems Society’s technical committee on VLSI Systems and Applications and the digital signal processing and the multimedia systems committees (e-mail: thanos@ee.upatras.gr).

Vassilis Paliouras received the Diploma in electrical engineering in 1992 and the Ph.D. degree in electrical engineering in 1999, from the Electrical and Computer Engineering Department, University of Patras, Greece. He works as a researcher at the VLSI Design Laboratory, ECE Dept., while teaching microprocessor-based system design at the Computer Engineering and Informatics Department, both at the University of Patras. His research interests include computer arithmetic algorithms and circuits, microprocessor architecture, and VLSI signal processing, areas where he has published more than 30 conference and journal articles. Dr. Paliouras received the MEDCHIP VLSI Design Award in 1997. He is also the recipient of the 2000 IEEE Circuits and Systems Society Guillemin-Cauer Award. He is a Member of ACM, SIAM, and the Technical Chamber of Greece.

References

[1] A.P. Chandrakasan, S. Sheng, and R. Brodersen, “Low-power CMOS digital design,” IEEE J. Solid-State Circuits, vol. 27, pp. 473-484, Apr. 1992.
[2] J.M. Rabaey and M. Pedram, Low Power Design Methodologies. Boston, MA: Kluwer, 1996.
[3] K.K. Parhi, “Low-energy CSMT carry generators and binary adders,” IEEE Trans. VLSI Syst., vol. 7, pp. 450-462, Dec. 1999.
[4] T.K. Callaway and E.E. Swartzlander, Jr., “Power-delay characteristics of CMOS multipliers,” in Proc. 13th Symp. Computer Arithmetic (ARITH13), Asilomar, USA, July 1997, pp. 26-32.
[5] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. New York: Oxford Univ. Press, 2000.
[6] P.E. Landman and J.M. Rabaey, “Architectural power analysis: The dual bit type method,” IEEE Trans. VLSI Syst., vol. 3, pp. 173-187, June 1995.
[7] T.K. Callaway and E.E. Swartzlander, “Low power arithmetic components,” in Low Power Design Methodologies, J.M. Rabaey and M. Pedram, Eds. Boston, MA: Kluwer, 1996.
[8] E. Swartzlander and A. Alexopoulos, “The sign/logarithm number system,” IEEE Trans. Computers, vol. 24, pp. 1238-1242, Dec. 1975.
[9] R.E. Morley, Jr., G.L. Engel, T.J. Sullivan, and S.M. Natarajan, “VLSI based design of a battery-operated digital hearing aid,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 2512-2515, 1988.
[10] J.R. Sacha and M.J. Irwin, “The logarithmic number system for strength reduction in adaptive filtering,” in Proc. Int. Symp. Low-Power Electronics and Design (ISLPED’98), Monterey, CA, 1998, pp. 256-261.
[11] V. Paliouras and T. Stouraitis, “Signal activity and power consumption reduction using the Logarithmic Number System,” in Proc. 2001 IEEE Int. Symp. Circuits and Systems (ISCAS), vol. 2, pp. II-653-II-656, 2001.
[12] V. Paliouras and T. Stouraitis, “Low-power properties of the Logarithmic Number System,” in Proc. 15th Symp. Computer Arithmetic (ARITH15), 2001.
[13] N. Szabó and R. Tanaka, Residue Arithmetic and its Applications to Computer Technology. New York: McGraw-Hill, 1967.
[14] W.L. Freking and K.K. Parhi, “Low-power FIR digital filters using residue arithmetic,” in Proc. 31st Asilomar Conference on Signals, Systems, and Computers, vol. 1, pp. 739-743, 1997.
[15] W.A. Chren, Jr., “One-hot residue coding for low delay-power product CMOS design,” IEEE Trans. Circuits Syst. II, vol. 45, pp. 303-313, March 1998.
[16] M.A. Soderstrand, W.K. Jenkins, G.A. Jullien, and F.J. Taylor, Residue Number Arithmetic: Modern Applications in Digital Signal Processing. Piscataway, NJ: IEEE Press, 1986.
[17] M.K. Ibrahim, “Novel digital filter implementations using hybrid RNS-binary arithmetic,” Signal Processing, vol. 40, no. 2-3, pp. 287-294, 1994.
[18] A. D’Amora, A. Nannarelli, M. Re, and G.C. Cardarilli, “Reducing power dissipation in complex digital filters by using the Quadratic Residue Number System,” in Proc. 34th Asilomar Conference on Signals, Systems, and Computers, 2000.