Low-Complexity Bit-Parallel Systolic Multipliers over GF(2^m)

2006 IEEE International Conference on Systems, Man, and Cybernetics

Chiou-Yng Lee (1), Chin-Chin Chen (2), Yuan-Ho Chen (1) and Erl-Huei Lu (2)

(1) Lunghwa University of Science and Technology, Email: [email protected]
(2) The Department of Electrical Engineering, Chang Gung University

Abstract

Recently, cryptographic applications based on finite fields have attracted much interest. This paper presents two new multiplication algorithms over the finite field GF(2^m), called the time-dependent and time-independent algorithms, which employ an interleaved conventional multiplication and a folded technique. The proposed algorithms permit efficient realization of bit-parallel multiplication using iterative arrays. The results show that the proposed time-dependent and time-independent multipliers save about 38% and 54% space complexity, respectively, compared to the traditional multipliers.

Key words: Systolic Array, Folded Technique, Primitive Polynomial, Interleaved Conventional Multiplication

1. Introduction

Many important applications, such as error-correcting codes [1], cryptography [2], digital signal processing [3,4], switching theory [5], and pseudorandom number generation [6], are based on arithmetic operations over GF(2^m). An efficient finite field multiplier is therefore needed. Among GF(2^m) arithmetic operations, addition is simply the bitwise XOR of the corresponding vectors. Multiplication, however, is time-consuming and requires complicated circuitry. Thus, an efficient multiplier with low circuit complexity, high throughput and short computation delay is required.

Typically, bit-parallel multipliers over GF(2^m) are classified as non-systolic and systolic. Various non-systolic architectures have been proposed, such as those of Fan-Dai [7] and Guo-Wang [8]. These architectures rely on three types of polynomials (all-one polynomials, equally-spaced polynomials, and trinomials) to implement low-complexity bit-parallel multipliers, but the designs are irregular. For high-speed data communication systems over a large field, such architectures are unsuited to implementing non-invertible decoding algorithms such as the step-by-step Reed-Solomon decoding algorithm [9]. For efficient VLSI implementation, systolic architectures are fundamentally suited to rapid computation and rely on regular circuitry to perform arithmetic operations over finite fields GF(2^m). The polynomial basis is usually used for implementing parallel and serial systolic multipliers [10-13]. The multipliers mentioned above share two characteristics: they use Horner's rule to obtain a regular circuit, and they do not depend on a particular choice of m for GF(2^m).

Generally, array algorithms are classified as least-significant-bit-first (LSB-first) and most-significant-bit-first (MSB-first) schemes. For computing exponentiation over GF(2^m), the LSB-first algorithms involve both basic modular multiplication and squaring computations. Consequently, such multipliers require a large area and incur a large latency overhead when fully pipelined. To reduce the circuit complexity, Lee et al. [14] employed the inner product operation to implement efficient systolic multipliers with low-latency and low-complexity architectures for fields defined by all-one and equally-spaced polynomials. Unfortunately, irreducible all-one polynomials are very rare. For example, for m ≤ 100, the values of m for which an all-one polynomial of degree m is irreducible are only 2, 4, 10, 12, 18, 28, 36, 52, 58, 60, 66, 82, and 100. To solve this problem, Lee [13,15] proposed low-complexity bit-parallel systolic multipliers for the class of GF(2^m) fields generated by irreducible trinomials x^m + x^n + 1. Lee et al. [16] showed that a bit-parallel systolic Montgomery multiplier for trinomials can be derived from a binomial-based multiplier. Such multipliers have low-complexity bit-parallel systolic architectures for some classes of GF(2^m), but are unsuited to a general field GF(2^m).

To overcome this limitation, this paper uses the MSB-first scheme to present time-dependent and time-independent multiplication algorithms based on an interleaved conventional multiplication and a folded technique. Based on the proposed algorithms, two new bit-parallel systolic multipliers over GF(2^m) are presented. Both multipliers exhibit lower hardware complexity than the existing systolic multipliers in [10,11]. Thus, our systolic arrays may also serve as building blocks for architectures that compute exponentiation and inversion. Analysis shows that the proposed time-dependent and time-independent multipliers save about 38% and 54% space complexity, respectively, compared to the traditional multipliers for a general field GF(2^m).

This paper is organized as follows. In Section 2, the polynomial basis multiplication algorithm over GF(2^m) is briefly reviewed. The proposed time-dependent and time-independent multiplication algorithms are introduced in Sections 3 and 4, respectively. In Section 5, we analyze the time and space complexity of the proposed multipliers and the related works. Finally, some conclusions are given in Section 6.

2. Conventional Polynomial-Basis Multiplication Algorithm over GF(2^m)

For any positive integer m, it is possible to extend the Galois field GF(2) to a field with 2^m elements. This field is an extension field of GF(2) and is denoted by GF(2^m) [2]. In GF(2^m) there exists an element that generates all non-zero elements of the field. Let x be such a generator; the irreducible polynomial of which x is a root is called the generator (basis) polynomial. It has the general form F(x) = x^m + f_{m-1}x^{m-1} + f_{m-2}x^{m-2} + ... + f_0, with f_i ∈ GF(2) (i = 0, 1, ..., m-1). In this work only primitive polynomials are considered as generator polynomials. Each element A(x) in GF(2^m) can be represented by a polynomial a_{m-1}x^{m-1} + a_{m-2}x^{m-2} + ... + a_0, where a_i ∈ GF(2) for 0 ≤ i ≤ m-1. The set {a_{m-1}, a_{m-2}, ..., a_0} can be viewed as a vector that represents the element A(x). Each element in GF(2^m) has a unique representation as a linear combination of the polynomial basis elements. Since x is a root of F(x), the multiplication of two elements in GF(2^m) is uniquely determined by

$$
x^m = \sum_{i=0}^{m-1} f_i x^i, \qquad f_i \in GF(2). \tag{1}
$$

Namely, multiplication in GF(2^m) can be performed as polynomial multiplication modulo F(x) on field elements represented as polynomials of degree m-1 or less. Let A(x) and B(x) be arbitrary elements of the finite field GF(2^m), where a primitive (hence irreducible) polynomial F(x) of degree m over GF(2) generates the field. If C(x) is the result of A(x)B(x) mod F(x), then

A(x) = a_{m-1}x^{m-1} + ... + a_1 x + a_0
B(x) = b_{m-1}x^{m-1} + ... + b_1 x + b_0
C(x) = c_{m-1}x^{m-1} + ... + c_1 x + c_0
F(x) = x^m + f_{m-1}x^{m-1} + ... + f_1 x + f_0

where m is a positive integer and all coefficients f_i, a_i, b_i and c_i (0 ≤ i ≤ m-1) are in GF(2). Using the MSB-first algorithm, the product C(x) = A(x)B(x) can be obtained by

$$
\begin{aligned}
C(x) &= A(x)B(x) \bmod F(x) \\
     &= (b_0 A(x) \bmod F(x)) + (b_1 A(x)x \bmod F(x)) + \cdots + (b_{m-1}A(x)x^{m-1} \bmod F(x)) \\
     &= (\cdots((b_{m-1}A(x) \bmod F(x))x + b_{m-2}A(x) \bmod F(x))x + \cdots + b_1 A(x) \bmod F(x))x + b_0 A(x) \bmod F(x)
\end{aligned}
\tag{2}
$$

Assuming that T_i = t_{i,m-1}x^{m-1} + t_{i,m-2}x^{m-2} + ... + t_{i,1}x + t_{i,0} is the intermediate partial result of the multiplication of A(x) and B(x), the multiplication can be described by the following algorithm:

Algorithm 1: (The conventional multiplication algorithm over GF(2^m))
INPUT: A(x), B(x), F(x)
OUTPUT: C(x) = A(x)B(x) mod F(x)
1. T_0 = 0
2. for i = 1 to m do begin
   2.1 T_i = T_{i-1}x + b_{m-i}A(x) mod F(x)
   end
3. return T_m

As stated above, the major operations within the algorithm are multiply-by-x, generate-current-partial-product and accumulate-to-previous-result. With the bit-level representation, the intermediate product T_i in Step 2.1 can be represented by

$$
\begin{aligned}
T_i &= t_{i-1,m-2}x^{m-1} + \cdots + t_{i-1,1}x^2 + t_{i-1,0}x + t_{i-1,m-1}(f_{m-1}x^{m-1} + \cdots + f_1 x + f_0) + b_{m-i}(a_{m-1}x^{m-1} + \cdots + a_1 x + a_0) \\
    &= t_{i,m-1}x^{m-1} + \cdots + t_{i,1}x + t_{i,0}
\end{aligned}
\tag{3}
$$

where

$$
t_{i,j} =
\begin{cases}
t_{i-1,j-1} + t_{i-1,m-1}f_j + b_{m-i}a_j & \text{for } 1 \le j \le m-1 \\
t_{i-1,m-1}f_j + b_{m-i}a_j & \text{for } j = 0
\end{cases}
$$

As a result, using the recursive operations in the above equations, Wang and Lin [11] applied the unidirectional data flow concept to propose a parallel systolic multiplier over GF(2^m). The circuit consists of m^2 identical cells, each composed of two 2-input AND gates, one 3-input XOR gate and seven 1-bit latches. The latency is 3m clock cycles.

3. Proposed Bit-Parallel Systolic Multiplier over GF(2^m)

Let A(x) = a_0 + a_1 x + a_2 x^2 + ... + a_{m-1}x^{m-1}, B(x) = b_0 + b_1 x + b_2 x^2 + ... + b_{m-1}x^{m-1} and C(x) = c_0 + c_1 x + c_2 x^2 + ... + c_{m-1}x^{m-1} be any three elements in GF(2^m) with a primitive polynomial F(x) of degree m. Assume that F(x) = x^m + P(x), where P(x) = p_{m-1}x^{m-1} + ... + p_1 x + p_0. Then, we have

$$
x^m = P(x) = \sum_{i=0}^{m-1} p_i x^i \tag{4}
$$

From Algorithm 1, the multiplication requires m iterations of T_i = T_{i-1}x + b_{m-i}A(x) mod F(x). Computing T_i can also be expressed as

$$
\begin{aligned}
T_i &= T_{i-1}x + b_{m-i}A(x) \bmod F(x) \\
    &= t_{i-1,0}x + t_{i-1,1}x^2 + \cdots + t_{i-1,m-2}x^{m-1} + t_{i-1,m-1}x^m + b_{m-i}A(x) \bmod F(x) \\
    &= (T_{i-1}x \bmod x^m) + t_{i-1,m-1}P(x) + b_{m-i}A(x)
\end{aligned}
\tag{5}
$$

where T_{i-1}x mod x^m = t_{i-1,m-2}x^{m-1} + ... + t_{i-1,1}x^2 + t_{i-1,0}x. Applying the interleaved conventional multiplication to Eq. (5) yields Algorithm 2.
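As an illustration, Algorithm 1 and the bit-level recurrence of Eq. (3) can be sketched in Python. This is a minimal software model rather than the systolic hardware. It assumes that a polynomial over GF(2) is stored as an integer whose bit j holds the coefficient of x^j; the names to_bits, mul_msb_first and mul_bit_level are hypothetical helpers introduced only for this sketch.

```python
def to_bits(value: int, width: int) -> list[int]:
    """Return [v_0, ..., v_{width-1}], the coefficients of x^0 .. x^(width-1)."""
    return [(value >> j) & 1 for j in range(width)]


def mul_msb_first(a: int, b: int, f: int, m: int) -> int:
    """Algorithm 1: T_i = T_{i-1} x + b_{m-i} A(x) mod F(x), for i = 1..m.

    f must include the x^m term, e.g. 0b100101 for x^5 + x^2 + 1.
    """
    t = 0
    for i in range(1, m + 1):
        t <<= 1                     # multiply-by-x
        if (b >> (m - i)) & 1:      # generate and accumulate b_{m-i} A(x)
            t ^= a
        if (t >> m) & 1:            # an x^m term appeared: reduce modulo F(x)
            t ^= f
    return t


def mul_bit_level(a: int, b: int, f: int, m: int) -> int:
    """Same product, computed coefficient by coefficient as in Eq. (3)."""
    a_bits, b_bits, f_bits = to_bits(a, m), to_bits(b, m), to_bits(f, m)
    t = [0] * m
    for i in range(1, m + 1):
        msb = t[m - 1]              # t_{i-1,m-1}
        b_bit = b_bits[m - i]       # b_{m-i}
        # t_{i,j} = t_{i-1,j-1} + t_{i-1,m-1} f_j + b_{m-i} a_j (first term absent for j = 0)
        t = [
            (t[j - 1] if j >= 1 else 0) ^ (msb & f_bits[j]) ^ (b_bit & a_bits[j])
            for j in range(m)
        ]
    return sum(bit << j for j, bit in enumerate(t))
```

With the operands used later in Example 1, that is A(x) = x^3 + x + 1 (0b01011), B(x) = x^4 + x^3 + x (0b11010) and F(x) = x^5 + x^2 + 1 (0b100101), both functions return 0b00101, i.e. x^2 + 1.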

Algorithm 2: (The proposed time-dependent multiplication algorithm)
INPUT: A(x), B(x), and P(x)
OUTPUT: C(x) = A(x)B(x)
1. T_0 = 0
2. for i = 1 to m do begin
   2.1 T_i = T_{i-1}x mod x^m + b_{m-i}A(x)
   2.2 T_i = T_i + t_{i-1,m-1}P(x)
   end
3. return T_m

Observing Algorithm 2, each loop operation is decomposed into two interleaved conventional multiplication operations. Steps 2.1 and 2.2 both have the form of the simple equation

$$
Q(x) = Q(x) + g_i Y(x) \tag{6}
$$

where g_i ∈ GF(2) and Q(x) and Y(x) are intermediate polynomials of degree at most m-1. The two elements A(x) and P(x) appear in every loop operation. Applying the folded technique [17], each loop operation is realized as shown in Fig. 1. The ith loop operation proceeds as follows.
(1) In the first cycle, b_{m-i}, A(x) and T_{i-1}x mod x^m are switched into the circuit to compute T_i = b_{m-i}A(x) + T_{i-1}x mod x^m. The result T_i is stored in the delay element.
(2) In the next cycle, t_{i-1,m-1}, P(x) and T_i are switched into the circuit to compute T_i = t_{i-1,m-1}P(x) + T_i. The intermediate result T_i is output, which completes the ith loop computation.

Fig. 1. The detailed graph of Eq. (6)

As a result, each loop operation occupies 2 clock cycles. In general, the data on the input of the folded realization is assumed to be valid for 2N cycles before changing, where N is the number of loop operations executed on a single functional unit in hardware. For efficient VLSI implementation of Algorithm 2 as a bit-parallel systolic array, assume that the two elements A(x) and P(x) can be pipelined into the processing array to perform the computations of Eq. (6). Fig. 2 shows an array realizing the recursion given in Algorithm 2. It consists of m×m cells and m 2×1 switches (SW1). Each cell, as shown in Fig. 3, incorporates one 2-input XOR gate, one 2-input AND gate, five 1-bit latches and one 2×2 switch (SW2). Since each cell performs the operations of Fig. 1 for each loop computation, the constraint of 2 clock cycles per cell still holds. SW2 in Fig. 3 performs the following functions:
for the (2i+0)th clock cycle: v4 = v3 and v2 = v1
for the (2i+1)th clock cycle: v2 = v4

The SW1 device selects one of two signals, k and b_{m-i}, onto a single line in the ith row of cells. The latency of the proposed multiplier is 4m clock cycles.

Fig. 2. The proposed time-dependent bit-parallel systolic multiplier over GF(2^m)

Fig. 3. The detailed circuit of the U-cell
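The two-cycle structure of Algorithm 2 can be illustrated with a small Python model. It is only a behavioral sketch under stated assumptions: polynomials are stored as integers with bit j holding the coefficient of x^j, fold_unit is a hypothetical name for the shared operation of Eq. (6), and the cycle counter models the two uses of that unit per loop, not the 4m-cycle latency of the systolic array.

```python
def fold_unit(q: int, g: int, y: int) -> int:
    """Eq. (6): Q(x) = Q(x) + g * Y(x) over GF(2), with g a single bit."""
    return q ^ y if g else q


def mul_time_dependent(a: int, b: int, p: int, m: int) -> tuple[int, int]:
    """Algorithm 2; returns (product, number of fold_unit cycles used)."""
    mask = (1 << m) - 1
    t = 0
    cycles = 0
    for i in range(1, m + 1):
        msb = (t >> (m - 1)) & 1          # t_{i-1,m-1}, saved before the shift
        shifted = (t << 1) & mask         # T_{i-1} x mod x^m
        # cycle 2i+0 (step 2.1): switch in b_{m-i} and A(x)
        t = fold_unit(shifted, (b >> (m - i)) & 1, a)
        # cycle 2i+1 (step 2.2): switch in t_{i-1,m-1} and P(x)
        t = fold_unit(t, msb, p)
        cycles += 2
    return t, cycles
```

For A(x) = x^3 + x + 1, B(x) = x^4 + x^3 + x and P(x) = x^2 + 1 with m = 5, the call mul_time_dependent(0b01011, 0b11010, 0b00101, 5) returns (0b00101, 10): the same unit is used twice per loop, which is why step 2.2 must wait for step 2.1.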

4. Modified Bit-Parallel Systolic Multiplier over GF(2^m)

In the previous section, the proposed multiplier over GF(2^m) using the folded technique and the interleaved conventional multiplication method was presented. In Algorithm 2, the two steps (2.1 and 2.2) of each loop computation are time-dependent. Therefore, each cell of Fig. 2 requires SW2 to handle the two steps of each loop computation. To solve this problem, this section introduces the time-independent multiplication algorithm.

Let T_i in step 2.1 of Algorithm 1 be represented as the sum of C_i and D_i, that is, T_i = C_i + D_i. The ith loop operation can be re-expressed as

$$
\begin{aligned}
T_i &= T_{i-1}x + b_{m-i}A(x) \bmod F(x) \\
    &= ((C_{i-1} + D_{i-1})x \bmod x^m) + t_{i-1,m-1}P(x) + b_{m-i}A(x) \\
    &= (C_{i-1}x \bmod x^m + b_{m-i}A(x)) + (D_{i-1}x \bmod x^m + kP(x)) \\
    &= C_i + D_i
\end{aligned}
\tag{7}
$$

where k = c_{i-1,m-1} + d_{i-1,m-1}. Therefore, the multiplication algorithm can be modified as follows.

Algorithm 3: (The proposed time-independent multiplication algorithm)
INPUT: A(x), B(x), and P(x)
OUTPUT: C(x) = A(x)B(x)
1. initial step
   1.1 C_0 = 0
   1.2 D_0 = 0
2. for i = 1 to m do begin
   2.1 k = c_{i-1,m-1} + d_{i-1,m-1}
   2.2 C_i = C_{i-1}x (mod x^m) + b_{m-i}A(x)
   2.3 D_i = D_{i-1}x (mod x^m) + kP(x)
   end
3. C_m = C_m + D_m
4. return C_m

Observing Algorithm 3, the two computations in the ith loop operation are time-independent, that is, the outcome of step 2.2 is not used in step 2.3. The following example uses Algorithm 3 to verify the correctness of the multiplication.

Example 1: Let B(x) = x^4 + x^3 + x and A(x) = x^3 + x + 1 be two elements of the field GF(2^5), where the field is constructed from F(x) = x^5 + x^2 + 1, so that P(x) = x^2 + 1. In the initial step, C_0 = 0 and D_0 = 0. Table 1, based on Algorithm 3, shows the results of A(x)B(x) in each loop. After the five loop computations, the sum of C_5 and D_5 gives A(x)B(x) = x^2 + 1.

Table 1. The behavior results of A(x)B(x) over GF(2^5)

                C_i                 D_i
loop   k    c0 c1 c2 c3 c4     d0 d1 d2 d3 d4
0      0    0  0  0  0  0      0  0  0  0  0
1      0    1  1  0  1  0      0  0  0  0  0
2      0    1  0  1  1  1      0  0  0  0  0
3      1    0  1  0  1  1      1  0  1  0  0
4      1    1  1  1  1  1      1  1  1  1  0
5      1    0  1  1  1  1      1  1  0  1  1

For clarity, the finite field GF(2^m) generated by a primitive polynomial is used as an example to illustrate the systolic multiplier. Let A(x) = a_{m-1}x^{m-1} + ... + a_1 x + a_0 and B(x) = b_{m-1}x^{m-1} + ... + b_1 x + b_0 be elements of GF(2^m). According to Algorithm 3, Fig. 4 shows the modified parallel-in parallel-out systolic multiplier, which follows the configuration of Fig. 2. The architecture is composed of m^2 V-cells, m Q-cells, and m W-cells. Each V_{i,j} cell (Fig. 5) includes one 2-input AND gate, one 2-input XOR gate and four 1-bit latches to perform two interleaved conventional multiplication operations. Each W_i cell (Fig. 6) includes one 2-input XOR gate and two 1-bit latches to perform the final calculation in step 3 of Algorithm 3. Each Q_j cell (Fig. 7) includes one 2-input XOR gate, one SW3 and one 1-bit latch to combine the two signals b_{m-i} and k into a single wire. The latency of the multiplier is only 3m clock cycles. The critical propagation delay of each cell is the total delay of one 2-input AND gate, one 2-input XOR gate and one 1-bit latch. Given Algorithm 3, the modified multiplier shown in Fig. 4 has the following properties:
(1) The two signals a_i and p_i in the V-cell are combined into a single wire.
(2) The two signals b_{m-i} and k in the V-cell are combined into a single wire.
(3) Since steps 2.2 and 2.3 are time-independent, the V-cell uses one 2-input AND gate, one 2-input XOR gate and four 1-bit latches to perform two interleaved conventional multiplication operations.

Fig. 4. The time-independent bit-parallel systolic multiplier over GF(2^m)

Fig. 5. The detailed circuit for the V-cell
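Algorithm 3 and the loop-by-loop values of Example 1 can be reproduced with a short Python sketch. It is a software model only, assuming polynomials are stored as integers with bit j holding the coefficient of x^j; the function name mul_time_independent and the trace format (loop, k, C_i, D_i) are choices made for this illustration.

```python
def mul_time_independent(a: int, b: int, p: int, m: int):
    """Algorithm 3; returns (product, trace), trace rows are (loop, k, C_i, D_i)."""
    mask = (1 << m) - 1
    c = d = 0                                   # C_0 = D_0 = 0
    trace = [(0, 0, c, d)]
    for i in range(1, m + 1):
        # step 2.1: k = c_{i-1,m-1} + d_{i-1,m-1}
        k = ((c >> (m - 1)) ^ (d >> (m - 1))) & 1
        # steps 2.2 and 2.3 read only values from loop i-1, so neither waits for the other
        c_next = ((c << 1) & mask) ^ (a if (b >> (m - i)) & 1 else 0)
        d_next = ((d << 1) & mask) ^ (p if k else 0)
        c, d = c_next, d_next
        trace.append((i, k, c, d))
    return c ^ d, trace                         # step 3: C_m + D_m


if __name__ == "__main__":
    # Operands of Example 1: A = x^3 + x + 1, B = x^4 + x^3 + x, P = x^2 + 1, m = 5
    product, trace = mul_time_independent(0b01011, 0b11010, 0b00101, 5)
    for loop, k, c_i, d_i in trace:
        c_bits = " ".join(str((c_i >> j) & 1) for j in range(5))   # c0 .. c4
        d_bits = " ".join(str((d_i >> j) & 1) for j in range(5))   # d0 .. d4
        print(loop, k, c_bits, " ", d_bits)
    print(bin(product))
```

Run on the operands of Example 1, the sketch ends with C_5 = 0b11110 and D_5 = 0b11011, whose sum is 0b00101, i.e. x^2 + 1, in agreement with Table 1.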

Fig. 6. The detailed circuit for the W-cell

Fig. 7. The detailed circuit for the Q-cell

5. Complexity

This paper has presented two multiplication algorithms for implementing two new bit-parallel systolic multipliers over GF(2^m). Both algorithms follow the MSB-first scheme and build on the interleaved conventional multiplication and the folded technique. In Algorithm 2, the proposed multiplier has a time-dependence problem: two clock cycles in each cell are needed to execute one loop iteration of Algorithm 2 (steps 2.1 and 2.2), so the first multiplier has 50% throughput. To solve this problem, the modified multiplier in Fig. 4 uses the time-independent method to reduce the space and time complexity of the first multiplier. Applying the time-independent multiplication algorithm (Algorithm 3), the time-independent multiplier achieves 100% throughput, and its circuit complexity is lower than that of the time-dependent multiplier, as shown in Table 2. A circuit comparison between the proposed multipliers and the existing multipliers is given in Tables 2 and 3. As shown in Table 2, the latency of the modified multiplier is the same as that of Yeh's and Wang-Lin's multipliers [10,11]. The hardware complexity of the proposed multipliers, which use the interleaved conventional multiplication, is lower than that of the traditional multipliers [10,11]. Thus, the proposed finite field multiplication algorithms are more effective than most other bit-parallel systolic multipliers for a general field GF(2^m).

A polynomial x^m + x^n + 1 over GF(2) is called a trinomial of degree m, where m > n > 1. Applying the properties of trinomials, Lee [13] proposed a low-complexity bit-parallel systolic multiplier using the LSB-first scheme. Compared with Lee's multiplier, the modified multiplier has the same circuit complexity in each cell, but its latency exceeds that of Lee's multiplier by 2m-1 cycles. Seroussi [18] demonstrated that, up to m = 10000, 5148 irreducible trinomials were found. Stahnke [19] showed that primitive trinomials exist for slightly over one half of the m values. Therefore, Lee's multiplier is unsuited to a general field GF(2^m), whereas our multipliers have no such restriction.

6. Conclusions

In this paper, time-dependent and time-independent multiplication algorithms based on an interleaved conventional multiplication and a folded technique were presented. The proposed algorithms can be easily employed to realize two new bit-parallel systolic multipliers. From Table 3, the results show that the proposed time-dependent and time-independent multipliers save about 38% and 54% space complexity, respectively, compared to Wang-Lin's multiplier [11] and Yeh's multiplier [10]. The low-complexity bit-parallel systolic multipliers suggested in [13,14,15,16] apply only to some classes of GF(2^m) and are therefore unsuited to a general field GF(2^m); our multipliers have no such restriction. Moreover, the proposed architectures are well suited to VLSI systems because of their regular interconnection patterns, modular structures and inherent parallelism, and are suitable for applications such as smart cards, mobile phones and other portable devices with limited space.

References

[1] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, Amsterdam: North-Holland, 1977.
[2] R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their Applications, New York: Cambridge Univ. Press, 1994.
[3] R.E. Blahut, Fast Algorithms for Digital Signal Processing, Reading, Mass.: Addison-Wesley, 1985.
[4] I.S. Reed and T.K. Truong, "The Use of Finite Fields to Compute Convolutions," IEEE Trans. Information Theory, Vol. IT-21, No. 2, pp. 208-213, March 1975.
[5] B. Benjauthrit and I.S. Reed, "Galois Switching Functions and Their Applications," IEEE Trans. Computers, Vol. 25, pp. 78-86, Jan. 1976.
[6] C.C. Wang and D. Pei, "A VLSI Design for Computing Exponentiation in GF(2^m) and Its Application to Generate Pseudorandom Number Sequences," IEEE Trans. Computers, Vol. 39, No. 2, pp. 258-262, Feb. 1990.
[7] H. Fan and Y. Dai, "Fast Bit-Parallel GF(2^n) Multiplier for All Trinomials," IEEE Trans. Computers, Vol. 54, No. 4, pp. 485-490, April 2005.
[8] J.H. Guo and C.L. Wang, "A Low-Complexity Power-Sum Circuit for GF(2^m) and Its Applications," IEEE Trans. Circuits and Systems II, Vol. 47, No. 10, pp. 1091-1097, Oct. 2000.
[9] E.H. Lu, C.Y. Lee, and R.L. Tsai, "Decoding Algorithm for DEC RS Codes," Electronics Letters, Vol. 36, No. 6, pp. 546-548, March 2000.
[10] C.S. Yeh, I.S. Reed, and T.K. Truong, "Systolic Multipliers for Finite Fields GF(2^m)," IEEE Trans. Computers, Vol. 33, No. 4, pp. 357-360, Apr. 1984.
[11] C.L. Wang and J.L. Lin, "Systolic Array Implementation of Multipliers for GF(2^m)," IEEE Trans. Circuits and Systems II, Vol. 38, No. 7, pp. 796-800, July 1991.
[12] B.B. Zhou, "A New Bit-Serial Systolic Multiplier Over GF(2^m)," IEEE Trans. Computers, Vol. 37, No. 6, pp. 749-751, June 1988.
[13] C.Y. Lee, "Low-Complexity Bit-Parallel Systolic Multiplier Over GF(2^m) Using Irreducible Trinomials," IEE Computers and Digital Techniques, Vol. 144, No. 1, pp. 39-42, Jan. 2003.
[14] C.Y. Lee, E.H. Lu, and J.Y. Lee, "Bit-Parallel Systolic Multipliers for GF(2^m) Fields Defined by All-One and Equally-Spaced Polynomials," IEEE Trans. Computers, Vol. 50, No. 5, pp. 385-393, May 2001.
[15] C.Y. Lee, "Low-Latency Bit-Parallel Systolic Multiplier for Irreducible x^m + x^n + 1 With gcd(m,n)=1," IEICE Trans. Fundamentals, Vol. E86-A, No. 11, pp. 2844-2852, Nov. 2003.
[16] C.Y. Lee, J.S. Horng, I.C. Jou and E.H. Lu, "Low-Complexity Bit-Parallel Systolic Montgomery Multipliers for Special Classes of GF(2^m)," to appear in IEEE Trans. Computers.
[17] K. Parhi, VLSI Signal Processing Systems: Design and Implementation, John Wiley & Sons, 1999.
[18] G. Seroussi, "Table of Low-Weight Binary Irreducible Polynomials," Visual Computing Dept., Hewlett Packard Laboratories, Aug. 1998. Available at: http://www.hpl.hp.com/techreports/98/HPL-98-135.html.
[19] W. Stahnke, "Primitive Binary Polynomials," Math. Comp., Vol. 27, pp. 977-980, 1973.

Table 2. A comparison between the proposed multipliers and existing multipliers over GF(2^m)

Multiplier              Yeh et al. [10]   Wang-Lin [11]   Lee [13]        Fig. 3    Fig. 4
Generating polynomial   General           General         Trinomials      General   General
Number of cells         m^2               m^2             U: m^2, V: m    m^2       V: m^2, W: m, Q: m
Basic cell                                                U / V                     V / W / Q
  2-input XOR           2                 0               1 / 1           1         1 / 1 / 1
  3-input XOR           0                 1               0 / 0           0         0 / 0 / 0
  2-input AND           2                 2               1 / 0           1         1 / 0 / 0
  1-bit latch           7                 7               4 / 1           5         4 / 1 / 2
  switch                0                 0               0 / 0           1         0 / 1 / 0
Throughput              100%              100%            100%            50%       100%
Delay time per cell     TA+TX             TA+T3X          TA+TX           TA+TX     TA+TX
Algorithm               MSB               MSB             LSB             MSB       MSB
Latency (cycles)        3m                3m              2m-1            4m        3m

Table 3. Details of gate counts

Multipliers       Generating polynomial   AND gates   XOR gates   1-bit latches   Switch gates   Total gates
Yeh et al. [10]   General form            2m^2        2m^2        7m^2            0              11m^2
Wang-Lin [11]     General form            2m^2        2m^2        7m^2            0              11m^2
Lee [13]          Trinomials              m^2         m^2+m       4m^2+m          0              6m^2+2m
Fig. 2            General form            m^2         m^2         5m^2            m^2+m          8m^2+m
Fig. 4            General form            m^2         m^2+2m      4m^2+3m         m              6m^2+6m
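To evaluate the entries of Table 3 for a concrete field size, the per-component formulas can be written as small functions. The sketch below transcribes the table as given; the dictionary names GATE_COUNTS and TABLE_TOTALS and the function total_gates are hypothetical, and every gate type is counted with equal weight, which is an assumption of this illustration.

```python
# Component counts from Table 3 as functions of the field size m.
GATE_COUNTS = {
    "Yeh et al. [10]": lambda m: {"AND": 2 * m * m, "XOR": 2 * m * m, "latch": 7 * m * m, "switch": 0},
    "Wang-Lin [11]": lambda m: {"AND": 2 * m * m, "XOR": 2 * m * m, "latch": 7 * m * m, "switch": 0},
    "Lee [13]": lambda m: {"AND": m * m, "XOR": m * m + m, "latch": 4 * m * m + m, "switch": 0},
    "Fig. 2": lambda m: {"AND": m * m, "XOR": m * m, "latch": 5 * m * m, "switch": m * m + m},
    "Fig. 4": lambda m: {"AND": m * m, "XOR": m * m + 2 * m, "latch": 4 * m * m + 3 * m, "switch": m},
}

# "Total gates" column of Table 3, kept separately so the two can be cross-checked.
TABLE_TOTALS = {
    "Yeh et al. [10]": lambda m: 11 * m * m,
    "Wang-Lin [11]": lambda m: 11 * m * m,
    "Lee [13]": lambda m: 6 * m * m + 2 * m,
    "Fig. 2": lambda m: 8 * m * m + m,
    "Fig. 4": lambda m: 6 * m * m + 6 * m,
}


def total_gates(design: str, m: int) -> int:
    """Sum the AND, XOR, latch and switch counts of one design for a given m."""
    return sum(GATE_COUNTS[design](m).values())


if __name__ == "__main__":
    m = 8
    for design in GATE_COUNTS:
        print(design, GATE_COUNTS[design](m), total_gates(design, m))
```

For every design the component sum agrees with the total column; for example, with m = 8 the Fig. 4 array totals 6·64 + 48 = 432 gates.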
