Low-complexity bit-parallel dual basis multipliers ... - Semantic Scholar

5 downloads 2059 Views 395KB Size Report
New bit-parallel dual basis multipliers using the modified BoothХs algorithm are ...... the BachelorХs degree (1986) in Medical Engineering and the M.S. degree.
Computers and Electrical Engineering 31 (2005) 444–459 www.elsevier.com/locate/compeleceng

Low-complexity bit-parallel dual basis multipliers using the modified BoothÕs algorithm Chiou-Yng Lee

a,1

, Che Wun Chiou

b,*

, Jim-Min Lin

c,2

a

b

Department of Computer Information and Network Engineering, Lunghwa University of Science and Technology, Taoyuan County 333, Taiwan, ROC Department of Computer Science and Information Engineering, Ching Yun University, 229 Chien-Hsin Road, Chung-Li 320, Taiwan, ROC c Department of Information Engineering and Computer Science, Feng Chia University, Taichung City 407, Taiwan, ROC Received 25 August 2004; received in revised form 1 July 2005; accepted 7 September 2005 Available online 2 December 2005

Abstract New bit-parallel dual basis multipliers using the modified BoothÕs algorithm are presented. Due to the advantage of the modified BoothÕs algorithm, two bits are processed in parallel for reduction of both space and time complexities. A multiplexer-based structure has been proposed for realization of the proposed multiplication algorithm. We have shown that our multiplier saves about 9% space complexity as compared to other existing multipliers if the generating polynomial is trinomial or all one polynomial. Furthermore, the proposed multiplier is faster than existing multipliers.  2005 Elsevier Ltd. All rights reserved. Keywords: Dual basis; Finite field multiplication; BoothÕs algorithm; Trinomials; Polynomial basis; All one polynomial; Equally spaced polynomial

*

Corresponding author. Tel.: +886 3 4581196x3700; fax: +886 3 4588924. E-mail addresses: [email protected] (C.-Y. Lee), [email protected] (C.W. Chiou), [email protected] (J.-M. Lin). 1 Tel.: +886 3 4245215; fax: +886 3 4244168. 2 Tel.: +886 4 24517250x3754; fax: +886 4 24516101. 0045-7906/$ - see front matter  2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.compeleceng.2005.09.002

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

445

1. Introduction In recent years, finite (Galois) fields GF(2m) have found many applications in areas of error correcting code [1], cryptography [2], digital signal processing [3,4], switching theory [5], pseudorandom number generation [6], encoding of Reed–Solomon codes [7], and solving the Wiener–Hopf equation [8]. Multiplication is the most critical operation in finite field arithmetic operations. Other time-consuming finite field arithmetic operations such as exponentiation, division, and multiplicative inversion can be carried out by repeated multiplications. Hence, there is a need to have fast multiplication architecture with low complexity. The representation of the field elements plays a crucial role in the efficiency of the architectures for the arithmetic operations. There are three popular bases, polynomial (canonical or standard) basis (PB), normal basis (NB), and dual basis (DB) for representation of finite fields. Different representation has its distinct advantage. The polynomial basis architectures [9–19] have low design complexities and their sizes are easily extended to meet various applications due to their simplicity, regularity, and modularity in architecture. The major advantage of the normal basis [20–27] is that the squaring of an element is computed by a cyclic shift of the binary representation. As compared with other two former bases, the dual basis multipliers [7,8,28–34] require less chip area. Berlekamp [7] firstly employed the dual basis of the polynomial basis for implementation of bit-serial field multiplication. BerlekampÕs bit-serial field multiplier efficiently realizes the multiplication operation by using a linear feedback shift register in the implementation of the Reed–Solomon encoder. Several serial systolic multipliers based on BerlekampÕs multiplication scheme have been proposed [33,34]. The complexity of the dual basis multipliers is heavily relied on the choice of the primitive irreducible polynomial which generates the field. Therefore, Wang and Blake [33] firstly derived a bit-serial dual basis multiplier with an irreducible trinomial to give low space and time complexities. Furthermore, Morii et al. [8] have found that the weakly dual basis is also available for bit-serial multiplication. A self-dual normal basis multiplier was presented by Wang [31] with very low complexity. Fenn et al. [30] developed a bit-parallel version of such dual basis multipliers using general dual basis. Wu et al. [28] further reduced the space complexity of the multiplier in [30] by reusing some partial sums. The key idea of the ‘‘divide-and-conquer’’ algorithm was firstly introduced by Booth [35]. In this algorithm, intermediate results are always in a redundant form of two numbers. The modified BoothÕs algorithm in [36] further reduces the partial products to the half. At each iteration cycle, two bits of the multiplier are processed in parallel. Pekmestzi [37] proposed a multiplexer-based array multiplier by using the modified BoothÕs algorithm. Applying the modified BoothÕs algorithm, we will present a dual basis finite field multiplier which is mainly constructed by multiplexers rather than conventional AND and XOR gates. The proposed multiplier saves both space and time complexities while comparing with other existing dual basis multipliers. The organization of this paper is as follows. In Section 2, we introduce the mathematic background about the finite field arithmetic. In Section 3, the proposed multiplication algorithm using the modified BoothÕs algorithm is derived. Examples are described in Section 4. Both space and time complexities of various bit-parallel multipliers over GF(2m) are discussed in Section 5. Finally, a brief conclusion is made in Section 6.

446

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

2. Preliminaries Let GF(2m) be a finite field of 2m elements. GF(2m) is an extension field of the ground field GF(2). Let a be a root of an irreducible primitive polynomial P ðxÞ ¼ p0 þ p1 x1 þ p2 x2 þ    þ pm1 xm1 þ xm of degree m over GF(2). Let / = {1, a, a2, a3, . . . , am1} be a polynomial basis of GF(2m). An element B 2 GF(2m) is represented by B ¼ b0 þ b1 a þ b2 a2 þ    þ bm1 am1 ; where bi 2 {0, 1} for all 0 6 i 6 m  1. For c 2 GF(2m), the trace Tr(c) of c over GF(2) is given by TrðcÞ ¼

m1 X

i

c2 .

i¼0

The following properties for trace function are hold: (a) Trðc þ kÞ ¼ TrðcÞ þ TrðkÞ for all c; k 2 GFð2m Þ, (b) TrðecÞ ¼ eTrðcÞ for e 2 GFð2Þ and c 2 GFð2m Þ. The trace function is a linear function from GF(2m) to GF(2). Let / = {1, a, a2, a3, . . . , am1} and w = {w0, w1, w2, w3, . . . , wm1} be bases for GF(2m), and c 2 GF(2m), c 5 0. Then, both / and w bases are said be weakly dual to each other with respect to c if for all 0 6 i; j 6 m  1;  1 if i ¼ j; Trðcai wj Þ ¼ 0 if i 6¼ j. Fenn et al. [30] have showed that any polynomial basis has at least one dual basis with respect to the trace function. For example, one dual basis of the polynomial basis {1, a, a2, a3, a4} with the irreducible polynomial P(x) = 1 + x2 + x5 is {a, 1, a4, a3, a2}. For an element A 2 GF(2m), A is represented in both / and w bases by A¼

m1 X i¼0

ai ai ¼

m1 X

aj wj ;

j¼0

where ai and aj are the coordinates of A with respect to the bases / and w, respectively. It may be necessary to perform a basis conversion between the polynomial basis and its dual basis. As the above representations for an element A 2 GF(2m), the jth coordinate with respect to the basis w is given by ! m1 m1 X X ai ai ¼ ai Trðcaiþj Þ. ð1Þ aj ¼ Trðcaj AÞ ¼ Tr caj i¼0

i¼0

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

447

3. The proposed multiplexed-based multiplier Assume that m is even in this study and the result can be easily extended to an odd m. Let two elements A and B in GF(2m), and B, A are represented in the polynomial basis / and its dual basis w, respectively, as follows: A¼

m1 X

aj wj ;

j¼0



m1 X

bi ai ;

i¼0

where aj and bi 2 GF(2) for 0 6 i, j 6 m  1. The product C of A and B is computed as follows: A  B ¼ C; where C is represented in the dual basis w by C¼

m1 X

cj wj ;

j¼0

for all 0 6 j 6 m  1; cj 2 GFð2m Þ. The coefficient cj of C is computed by cj ¼ Trðcaj CÞ ¼ Trðcaj ABÞ ¼ Tr caj A

m1 X i¼0

! bi ai

¼

m1 X

ðbi Trðcaiþj AÞÞ.

Based on Eq. (2), the product C can be computed in the following form 32 3 2  3 2 c0 b0 Trðca1 AÞ Trðca2 AÞ . . . Trðcam1 AÞ Trðca0 AÞ 76 7 6 c 7 6 1 2 3 m Trðca AÞ Trðca AÞ . . . Trðca AÞ 76 b1 7 6 1 7 6 Trðca AÞ 76 7 6  7 6 2 3 4 7 6 7 6 Trðca AÞ 6 Trðca AÞ Trðca AÞ . . . Trðcamþ1 AÞ 7 76 b2 7 ¼ 6 c2 7. 6 76 7 6 7 6 ... ... ... ... ... 54 . . . 5 4 . . . 5 4 cm1 bm1 Trðcam1 AÞ Trðcam AÞ Trðcamþ1 AÞ . . . Trðca2m2 AÞ Let amþd ¼ Trðcamþd AÞ for 2  a1 a2 a0 6 a a2 a3 6 1 6  6 a2 a3 a4 6 6 a a4 a5 6 3 6 6 ... ... ... 6 6   4 am2 am1 am am1

am

amþ1

ð2Þ

i¼0

0 6 d 6 m  2, Eq. (3) based on Eq. (1) can be rewritten by 32 3 2  3 a3 . . . am2 am1 c0 b0 7 6    76 a4 . . . am1 am 76 b1 7 6 c1 7 7 76 7 6  7    7 7 6 6 a5 ... am amþ1 76 b2 7 6 c2 7 7 7 6 7 6 a6 . . . amþ1 amþ2 7 76 b3 7 ¼ 6 c3 7. 76 7 6 7 7 6 7 6 ... ... ... ... 7 76 . . . 7 6 . . . 7 76 7 6  7    amþ1 . . . a2m4 a2m3 54 bm2 5 4 cm2 5 amþ2 . . . a2m3 a2m2 cm1 bm1

ð3Þ

ð4Þ

448

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

Assume that the function fi,j,u,v is defined as follows: fi;j;u;v ¼ ai bu  aj bv . The function fi,j,u,v can also be represented by fi;j;u;v ¼ ai  aj  0 _ ai  aj  bv _ ai  aj  bu _ ai  aj  ðbu  bv Þ. Assuming that the results of fi,j,u,v can be determined depending on the values of the logical variables ai and aj . The case of fi,j,u,v requires computation when ai and aj have a value 1, i.e., fi,j,u,v = bu  bv. Hence, the computed result of fi,j,u,v can be selected from the set f0; bu ; bv ; bu  bv g by the values of ai and aj . Notably, an 4-to-1 MUX is used to realize the function fi,j,u,v and is shown in Fig. 1. Using fi;j;u;v Õs, the above equation is rewritten as 32 3 2  3 2 c0 f2;3;2;3 . . . fm2;m1;m2;m1 f0;1;0;1 1 7 6 7 6 f 6 f3;4;2;3 ... fm1;m;m2;m1 76 1 7 6 c1 7 7 6 1;2;0;1 76 7 6  7 6 7 6 7 6 f2;3;0;1 6 f4;5;2;3 ... fm;mþ1;m2;m1 76 1 7 6 c2 7 7 6 76 7 6 7 6 f5;6;2;3 . . . fmþ1;mþ2;m2;m1 76 1 7 ¼ 6 c3 7. ð5Þ 6 f3;4;0;1 76 7 6 7 6 76 . . . 7 6 . . . 7 6 ... ... ... ... 76 7 6 7 6 76 7 6 7 6 4 fm2;m1;0;1 fm;mþ1;2;3 . . . f2m4;2m3;m2;m1 54 1 5 4 cm2 5 cm1 fm1;m;0;1 fmþ1;mþ2;2;3 . . . f2m3;2m2;m2;m1 1 In other words, the coefficient of C is given by ci

¼

m1 bX c 2

f2jþi;2jþiþ1;2j;2jþ1 ;

for all 0 6 i 6 m  1.

ð6Þ

j¼0

As stated above, Fig. 2 shows the function block for the proposed multiplication algorithm. The function block ‘‘TRACE’’ performs the following trace functions to do pre-computing basis conversion: amþd ¼ Trðcamþd AÞ;

for 0 6 d 6 m  2.

The ‘‘Pre-XOR’’ function responses for the XOR functions as follows: bi  biþ1 ;

for i ¼ 0; 2; 4; . . . ; m  2.

Fig. 1. An 4-to-1 MUX for realizing fi,j,u,v.

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

449

Fig. 2. Function block for computing C = A · B.

The function block ‘‘MUX ARRAY’’ computes the function at the left side of Eq. (5) and its detail circuit is shown in Fig. 3. The function block ‘‘SUM’’ consists of binary XOR trees and performs final result C based on Eq. (6). If m is an odd number, one zero-column is added to make the matrix A in Eq. (4) with an even number of columns and one zero-row is inserted to the matrix B. Thus, Eq. (4) for an odd m is rewritten as 3 2 3 b0 2  2  3      a1 a2 a3 . . . am2 am1 0 6 a0 c 7 b1 7 6 0 7 7 6 a 6      a2 a3 a4 . . . am1 am 0 76 7 6 c 7 6 1 76 b2 7 6 1 7 6       7 7 6 6 a2 6 a3 a4 a5 ... am amþ1 0 7 76 b 7 6 c 2 7 6 3 7 7 6 a 6 6       7 a4 a5 a6 . . . amþ1 amþ2 0 76 ð7Þ 7 ¼ 6 c 7. 6 3 76 . . . 7 6 3 7 6 7 6 ... 7 6 ... 6 ... ... ... ... ... ... 07 76 b 7 7 6 6 76 m2 7 6  7 6       a a a a . . . a a 0 c 56 7 4 m2 5 4 m2 m1 m mþ1 2m4 2m3 4 bm1 5 am1 am amþ1 amþ2 . . . a2m3 a2m2 0 cm1 0 Based on Eq. (7), Eq. (5) becomes 2 f0;1;0;1 f2;3;2;3 . . . fm3;m2;m3;m2 6 f f3;4;2;3 . . . fm2;m1;m3;m2 6 1;2;0;1 6 6 f2;3;0;1 f4;5;2;3 ... fm1;m;m3;m2 6 6 f5;6;2;3 ... fm;mþ1;m3;m2 6 f3;4;0;1 6 6 ... ... ... ... 6 6 4 fm2;m1;0;1 fm;mþ1;2;3 . . . f2m5;2m4;m3;m2 fm1;m;0;1

fmþ1;mþ2;2;3

...

f2m4;2m3;m3;m2

fm1;1;m1;1

32

1

3

2

c0

3

6 7 6 7 fm;1;m1;1 7 76 1 7 6 c1 7 76 7 6  7 6 7 6 7 fmþ1;1;m1;1 7 76 1 7 6 c 2 7 76 7 6 7 fmþ2;1;m1;1 76 1 7 ¼ 6 c3 7. 76 7 6 7 76 . . . 7 6 . . . 7 ... 76 7 6 7 76 7 6 7 f2m3;1;m1;1 54 1 5 4 cm2 5 cm1 f2m2;1;m1;1 1

ð8Þ

450

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

Fig. 3. The detail circuit of the function block ‘‘MUX ARRAY’’ in Fig. 2.

The symbol ‘‘1’’ in the subscripts of fi;j;u;v Õs indicates a logical value ‘‘0’’ applied to the corresponding input.

4. Examples Two examples are used to describe our algorithm and will be depicted in the following paragraphs. 4.1. Trinomial An irreducible polynomial P(x) of degree m is termed as a trinomial polynomial if it can be formed as 1 + xn + xm with m > n > 0. A trinomial polynomial P(x) = 1 + x + x4 is as an example to illustrate our algorithm. Let a be a root of P(x). The basis / = {1, a, a2, a3} is a polynomial basis and is assumed in this example. Let the basis w = {w0, w1, w2, w3} be its dual basis. Suppose A and B are represented in the basis w and /, respectively, and are expressed by A ¼ a0 w0 þ a1 w1 þ a2 w2 þ a3 w;

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

451

and B ¼ b0 þ b1 a þ b2 a2 þ b3 a3 . The product C = A · B is computed by 32 3 2  3 2  c0 a0 a1 a2 a3 b0   4 7 7 6 6 6 a a2 a3 Trðca AÞ 76 b1 7 6 c1 7 7 6 1 76 7 ¼ 6 7. 6  4 a2 a3 Trðca4 AÞ Trðca5 AÞ 54 b2 5 4 c2 5 a3

Trðca4 AÞ

Trðca5 AÞ

Trðca6 AÞ

ð9Þ

c3

b3

Trace values in the above equation are given as follows: a4 ¼ Trðca4 AÞ ¼ Trðcða þ 1ÞAÞ ¼ TrðcaAÞ þ TrðcAÞ ¼ a1 þ a0 ; a5 ¼ Trðca5 AÞ ¼ Trðcaa4 AÞ ¼ Trðcaða þ 1ÞAÞ ¼ Trðca2 AÞ þ TrðcaAÞ ¼ a2 þ a1 ; a6 ¼ Trðca6 AÞ ¼ Trðca2 a4 AÞ ¼ Trðca2 ða þ 1ÞAÞ ¼ Trðca3 A þ ca2 AÞ ¼ a3 þ a2 . Thus, Eq. 2  a0 6 a 6 1 6  4 a2 a3

(9) is rewritten as a3

32

3

2

a2

a2 a3

a3 a1 þ a0

6 7 6 7 a1 þ a0 7 76 b1 7 6 c1 7 7 6 7 ¼ 6 7. a2 þ a1 54 b2 5 4 c2 5

a1 þ a0

a2 þ a1

a3 þ a2

b0

b3

c0

3

a1

ð10Þ

c3

Replacing with functions fi;j;u;v Õs, Eq. (9) or Eq. (10) is written as 3 2 2 3 c0 f0;1;0;1 f2;3;2;3   6 7 7 6f 6 1;2;0;1 f3;4;2;3 7 1 6c 7 ¼ 6 1 7. 7 6 4 f2;3;0;1 f4;5;2;3 5 1 4 c2 5 c3 f3;4;0;1 f5;6;2;3

ð11Þ

The multiplexer-based structure for implementation of Eq. (11) is shown in Fig. 4. 4.2. All one polynomial An irreducible polynomial P ðxÞ ¼ p0 þ p1 x þ    þ pm2 xm2 þ pm1 xm1 þ jxm is called all one polynomial (AOP) of degree m if pi = 1 for i = 0, 1, 2, . . . , m  1 [14]. We use the AOP P ðxÞ ¼ 1 þ x þ x2 þ x3 þ x4 as an example to describe the proposed algorithm. Let a be the root of P(x). The polynomial basis is / = {1, a, a2, a3}. Let the basis w = {w0, w1, w2, w3} be its dual basis. If A and B are elements in GF(24) generated by P(x) = 1 + x + x2 + x3 + x4 and are represented by A ¼ a0 w0 þ a1 w1 þ a2 w2 þ a3 w3 ; and B ¼ b0 þ b1 a þ b2 a2 þ b3 a3 .

452

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

Fig. 4. A multiplexer-based multiplier for the trinomial P(x) = 1 + x + x4.

The product C of 2  a0 a1 a2 6 a a  a 6 1 2 3 6   4 a2 a3 a4 a3

a4

a5

A and B is given by 32 3 2  3 c0 a3 b0  76 7 6 a4 76 b1 7 6 c1 7 7 6 7 ¼ 6  7. 7 a 5 54 b 2 5 4 c 2 5 a6

ð12Þ

c3

b3

Based on the property of a5 = 1, the trace values in the above equation are computed by a4 ¼ Trðca4 AÞ ¼ Trðcð1 þ a þ a2 þ a3 ÞAÞ ¼ TrðcAÞ þ TrðcaAÞ þ Trðca2 AÞ þ Trðca3 AÞ ¼ a0 þ a1 þ a2 þ a3 ;   a5 ¼ Tr ca5 A ¼ TrðcAÞ ¼ a0 ; a6 ¼ Trðca6 AÞ ¼ Trðcaa5 AÞ ¼ TrðcaAÞ ¼ a1 . Thus, we have the following equation: 2

a0

6 a 6 1 6  4 a2 a3

a3

32

3

2

a2

a2

a3

a3

a4

6 7 6 7 a4 7 76 b1 7 6 c1 7 76 7 ¼ 6 7. a0 54 b2 5 4 c2 5

a4

a0

a1

b0

b3

c0

3

a1

c3

ð13Þ

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

453

Fig. 5. A multiplexer-based multiplier for the AOP P(x) = 1 + x + x2 + x3 + x4.

Eq. (13) can be decomposed into 2  32 3 2 3 2 3 c0 a0 a1 a2 a3 0 b0 6 a a a 0 76 b 7 6 a b 7 6 c 7 6 1 76 1 7 6 4 3 7 6 1 7 2 3 6  76 7 þ 6 7 ¼ 6 7.   4 a2 a3 0 a0 54 b2 5 4 a4 b2 5 4 c2 5 a3

0

a0

a1

b3

a4 b1

ð14Þ

c3

Using fi,j,u,v representation, the above equation is expressed by 2 3 2 3 2 3 c0 0 f0;1;0;1 f2;3;2;3   6  7 6 7 6 f 7 6 1;2;0;1 f3;1;2;3 7 1 6 a b3 7 6 c 7 þ 6 4 7 ¼ 6 1 7. 6 7 4 f2;3;0;1 f1;0;2;3 5 1 4 a4 b2 5 4 c2 5 c3 f3;1;0;1 f0;1;2;3 a4 b1

ð15Þ

The multiplexer-based implementation of Eq. (15) is shown in Fig. 5.

5. Complexity analysis Firstly, we use an example with the generating polynomial P(x) = 1 + x + xm to describe how to derive the space and time complexities. Table 1 shows the space and time complexities of each function block in Fig. 2 for such a generating polynomial P(x) = 1 + x + xm.

454

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

Table 1 Space and time complexities for each operation of the dual basis multiplier with P(x) = 1 + x + xm Item

Operation

Space complexity

Delay

1 2 3

TRACE Pre-XOR MUX ARRAY

4

SUM

(m m 1) XORs 2 XORs   m  m2 4-to-1 MUXs   m  m2 XORs       m  m2 þ m þ m2  1 XORs and   m  m2 4-to-1 MUXs

1 XOR gate delay 1 XOR gate delay  m 2 4-to-1 MUX gate delays    log2 m2 XOR gate delays     log2 m2 þ 2 XOR gate delays and m 2 4-to-1 MUX gate delays

Total

In general, the space complexity of the proposed multiplication algorithm for the generating polynomial with k terms requires l mm lmm þ ðk  2Þm þ  k þ 2 2  input XOR gates; and m 2m 2 lm m 4-to-1 Multiplexers. 2 However, some multiplexers can be replaced by AND gates to reduce complexity for some special generating polynomials such as AOP. Comparisons of the space complexity for various bit-parallel multipliers over GF(2m) for trinomial and all one polynomial are listed in Tables 2A and 2B, respectively. We will take into the transistor count using a standard CMOS VLSI realization. In the CMOS VLSI technology, 2-input AND, 2-input XOR and 4-to-1 MUX are composed of 6, 6 and 16 transistors, respectively [37–40]. Tables 3A and 3B show the total number of transistors for various bit-parallel multipliers. As the trinomial of the form P(x) = 1 + xn + xm is employed for the generating polynomial, our multiplexer-based multiplication algorithm saves about 9% space complexity while comparing with other existing multipliers. When the generating polyno-

Table 2A Comparison of space-complexity of bit-parallel multipliers in GF(2m) using an irreducible trinomial Multipliers

Basis

#AND

#XOR

#MUX

PB DB DB

m2 m2 0

m21 m2 1   m  m2 þ 1  1

0 0m

P ðxÞ ¼ 1 þ xn þ xm ; 1 < n < m2 Wu [19] PB Wu et al. [28] DB Presented here DB

m2 m2 0

m2  1 m2 1   m  m2 þ 1  1

0 0m

m2 m2 0

m2  m2 m2 m2   m  m2  12

0 0 m

m

P(x) = 1 + x + x , m > 2 Wu [19] Wu et al. [28] Presented here

n

m

P ðxÞ ¼ 1 þ x þ x ; n ¼ Wu [19] Wu et al. [28] Presented here

m 2

PB DB DB

2

2

2

m

m

m

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

455

Table 2B Comparison of space-complexity of bit-parallel multipliers in GF(2m) using an irreducible AOP Multipliers Itoh–Tsujii [14] Wu–Hasan [29] Presented here

Basis PB DB DB

#AND 2

m + 2m + 1 m2 m1

#XOR 2

m + 2m m2  1   ðm þ 3Þ m2  2

#MUX 0 0  m m2

Table 3A Comparison of the total number of transistors of various bit-parallel multipliers in GF(2m) using an irreducible trinomial Multipliers

Basis

Generating polynomial

#Transistors

P(x) = 1 + x + xm, m > 2 Wu [19] Wu et al. [28] Presented here

PB DB DB

Trinomial Trinomial Trinomial

12m2  6 12m2  6 22m m2 þ 6m  6

P ðxÞ ¼ 1 þ xn þ xm ; 1 < n < m2 Wu [19] PB Wu et al. [28] DB Presented here DB

Trinomial Trinomial Trinomial

12m2  6 12m2  6 22m m2 þ 6m  6

P ðxÞ ¼ 1 þ xn þ xm ; n ¼ m2 Wu [19] Wu et al. [28] Presented here

Trinomial Trinomial Trinomial

12m2  6 12m2  3m 22m m2  3m

PB DB DB

Table 3B Comparison of the total number of transistors of various bit-parallel multipliers in GF(2m) using an irreducible AOP Multipliers

Basis

Generating polynomial

#Transistors

Itoh–Tsujii [14] Wu–Hasan [29] Presented here

PB DB DB

AOP AOP AOP

12m2 + 24m + 6 12m2  6 22m m2 þ 15m  8

mial is all one polynomial, our proposed multiplier still saves about 9% space complexity while comparing with existing bit-parallel multipliers. Tables 4A and 4B show comparisons of the time complexities of various bit-parallel multipliers over GF(2m). The symbols TA, TX and TM denote the gate delays of 2-input AND, 2-input XOR and 4-to-1 MUX, respectively. Our proposed multiplexer-based multiplier is little faster than existing multipliers. With different m values, Tables 5A and 5B show the comparing results of space complexity of various bit-parallel multipliers with trinomials and AOPs, respectively. Tables 6A and 6B depict the comparing results of time complexity of various bit-parallel multipliers with trinomials and AOPs, respectively, if real circuits such as M74HC86 (STMicroelectronics, XOR gate, tPD = 12 ns (TYP.)) [41], M74HC08 (STMicroelectronics, AND gate, tPD = 7 ns (TYP.)) [42], and M74HC153 (STMicroelectronics, Multiplexer, tPD = 16 ns (TYP.)) [43] are employed.

456

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

Table 4A Comparison of time complexity of various bit-parallel multipliers in GF(2m) using an irreducible trinomial Multipliers

Basis

Generating polynomial

Propagation delay

PB DB DB

Trinomial Trinomial Trinomial

T A þ ðdlog2 ðm  2Þe þ 2ÞT X T A þ ðdlog2 me þ 1ÞT X T M þ ðdlog2 ðm2 Þe þ 1ÞT X

P ðxÞ ¼ 1 þ xn þ xm ; 1 < n < m2 Wu [19] PB Wu et al. [28] DB Presented here DB

Trinomial Trinomial Trinomial

T A þ ðdlog2 ðm  1Þe þ 2ÞT X T A þ ðdlog2 dmþk1 2 ee þ 2ÞT X T M þ ðdlog2 ðm2 Þe þ 1ÞT X

P ðxÞ ¼ 1 þ xn þ xm ; n ¼ m2 Wu [19] Wu et al. [28] Presented here

Trinomial Trinomial Trinomial

T A þ ðdlog2 ðm  1Þe þ 1ÞT X     T A þ log2 m þ 2 m4 TX  m  T M þ log2 2 þ 1 T X

m

P(x) = 1 + x + x , m > 2 Wu [19] Wu et al. [28] Presented here

PB DB DB

Table 4B Comparison of time complexity of various bit-parallel multipliers in GF(2m) using an irreducible AOP Multipliers

Basis

Generating polynomial

Propagation delay

Itoh–Tsujii [14] Wu–Hasan [29] Presented here

PB DB DB

AOP AOP AOP

T A þ dlog2 m þ log2 ðm þ 2ÞeT X T A þ ðdlog  2 ðm  2Þe þ1ÞT X T M þ dlog2 ðmþ1 2 Þe þ 1 T X

Table 5A Comparison of space complexity for various bit-parallel multipliers in GF(2m) using an irreducible trinomial with different m values Multipliers

m = 21

m = 41

m = 81

m = 161

m = 321

m = 641

Wu [19] Wu et al. [28] Presented here Improvement

5286 5286 5202 1.6%

20,166 20,166 19,182 5.1%

78,726 78,726 73,542 7.0%

311,046 311,046 287,862 8.1%

1,236,486 1,236,486 1,138,902 8.6%

4,930,566 4,930,566 4,530,582 8.8%

Note: Unit: transistor.

Table 5B Comparison of space complexity for various bit-parallel multipliers in GF(2m) using an irreducible AOP with different m values Multipliers

m = 18

m = 36

m = 82

m = 178

m = 348

Itoh–Tsujii [14] Wu–Hasan [29] Presented here Improvement

4326 3882 3826 1.5%

16,422 15,546 14,788 5.1%

82,662 80,682 75,186 7.3%

384,486 380,202 351,186 8.3%

1,461,606 1,453,242 1,337,356 8.6%

Note: Unit: transistor.

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

457

Table 6A Comparison of time complexity for various bit-parallel multipliers in GF(2m) using an irreducible trinomial with different m values Multipliers

m = 18

m = 36

m = 82

m = 178

m = 321

m = 641

Wu [19] Wu et al. [28] Presented here

91 79 76

103 91 88

115 103 100

127 115 112

139 127 124

151 139 136

Note: Unit = ns.

Table 6B Comparison of time complexity for various bit-parallel multipliers in GF(2m) using an irreducible AOP with different m values Multipliers

m = 12

m = 28

m = 52

m = 100

m = 178

m = 348

Itoh–Tsujii [14] Wu–Hasan [29] Presented here

103 67 64

127 79 76

151 91 88

175 103 100

187 115 112

211 127 124

Note: Unit = ns.

6. Conclusions We have presented new bit-parallel dual basis multipliers using the modified BoothÕs algorithm. The proposed multiplier inherits the advantage of the modified BoothÕs algorithm and then reduces both space and time complexities. A multiplexer-based structure has been proposed for realization of the proposed multiplication algorithm. We have shown that our multiplier saves about 9% space complexity as compared to other existing multipliers if the generating polynomial is trinomial or all one polynomial. Furthermore, the proposed multiplier is little faster than other existing multipliers. Acknowledgments The authors would like to thank anonymous referees and the editor for carefully reading the paper and for their great help in improving the paper. References [1] MacWilliams FJ, Sloane NJA. The theory of error-correcting codes. Amsterdam: North-Holland; 1977. [2] Lidl R, Niederreiter H. Introduction to finite fields and their applications. New York: Cambridge University Press; 1994. [3] Blahut RE. Fast algorithms for digital signal processing. Reading, MA: Addison-Wesley; 1985. [4] Reed IS, Truong TK. The use of finite fields to compute convolutions. IEEE Trans Inform Theory 1975; IT-21(2):208–13. [5] Benjauthrit B, Reed IS. Galois switching functions and their applications. IEEE Trans Comput 1976; C-25(January):78–86.

458

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

[6] Wang CC, Pei D. A VLSI design for computing exponentiation in GF(2m) and its application to generate pseudorandom number sequences. IEEE Trans Comput 1990;39(2):258–62. [7] Berlekamp ER. Bit-serial Reed–Solomon encoder. IEEE Trans Inform Theory 1982;IT-28(November):869–74. [8] Morii M, Kasahara M, Whiting DL. Efficient bit-serial multiplication and the discrete-time Wiener–Hopf equation over finite fields. IEEE Trans Inform Theory 1989;35(6):1177–83. [9] Bartee TC, Schneider DJ. Computation with finite fields. Inform Comput 1963;6(March):79–98. [10] Mastrovito ED. VLSI architectures for multiplication over finite field GF(2m). In: Mora, T, editor. Applied algebra, algebraic algorithms, and error-correcting codes, Proc sixth int conf, AAECC-6, Rome, July 1988. p. 297– 309. [11] Mastrovito ED. VLSI architectures for computations in Galois fields. PhD thesis, Linko¨ping University, Department of Electrical Engineering, Linko¨ping, Sweden, 1991. [12] Koc¸ C ¸ K, Sunar B. Low-complexity bit-parallel canonical and normal basis multipliers for a class of finite fields. IEEE Trans Comput 1998;47(3):353–6. [13] Lee CY. Low complexity bit-parallel systolic multiplier over GF (2m) using irreducible trinomials. IEE Proc Comput Digit Tech 2003;150(1):39–42. [14] Itoh T, Tsujii S. Structure of parallel multipliers for a class of fields GF (2m). Inform Comput 1989;83:21–40. [15] Hasan MA, Wang M, Bhargava VK. Modular construction of low complexity parallel multipliers for a class of finite fields GF (2m). IEEE Trans Comput 1992;41(8):962–71. [16] Lee CY, Lu EH, Lee JY. Bit-parallel systolic multipliers for GF (2m) fields defined by all-one and equally-spaced polynomials. IEEE Trans Comput 2001;50(5):385–93. [17] Paar C. A new architecture for a parallel finite field multiplier with low complexity based on composite fields. IEEE Trans Comput 1996;45(7):856–61. [18] Paar C, Fleischmann P, Roelse P. Efficient multiplier architectures for Galois Fields GF (24n). IEEE Trans Comput 1998;47(2):162–70. [19] Wu H. Bit-parallel finite field multiplier and squarer using polynomial basis. IEEE Trans Comput 2002;51(7):750–8. [20] Massey JL, Omura JK. Computational method and apparatus for finite field arithmetic. US Patent Number 4,587,627, May 1986. [21] Wang CC, Truong TK, Shao HM, Deutsch LJ, Omura JK, Reed IS. VLSI architectures for computing multiplications and inverses in GF (2m). IEEE Trans Comput 1985;C-34(8):709–17. [22] Reyhani-Masoleh A, Hasan MA. A new construction of Massey–Omura parallel multiplier over GF (2m). IEEE Trans Comput 2002;51(5):511–20. [23] Reyhani-Masoleh A, Hasan MA. Fast normal basis multiplication using general purpose processors. IEEE Trans Comput 2003;52(11):1379–90. [24] Oh S, Kim CH, Lim J, Cheon DH. Efficient normal basis multipliers in composite fields. IEEE Trans Comput 2000;49(10):1133–8. [25] Fan H, Dai Y. Key function of normal basis multipliers in GF (2n). Electron Lett 2002;38(23):1431–2. [26] Sunar B, Koc¸ C ¸ K. An efficient optimal normal basis type II multiplier. IEEE Trans Comput 2001;50(1):83–7. [27] Takagi N, Yoshiki J-I, Takagi K. A fast algorithm for multiplicative inversion in GF (2m) using normal basis. IEEE Trans Comput 2001;50(5):394–8. [28] Wu H, Hasan MA, Blake IF. New low-complexity bit-parallel finite field multipliers using weakly dual bases. IEEE Trans Comput 1998;47(11):1223–34. [29] Wu H, Hasan MA. Low complexity bit-parallel multipliers for a class of finite fields. IEEE Trans Comput 1998;47(8):883–7. [30] Fenn STJ, Benaissa M, Taylor D. GF(2m) multiplication and division over the dual basis. IEEE Trans Comput 1996;45(3):319–27. [31] Wang CC. An algorithm to design finite field multipliers using a self-dual normal basis. IEEE Trans Comput 1989;38(10):1457–9. [32] Fenn STJ, Benaissa M, Taylor D. Dual basis systolic multipliers for GF(2m). IEE Proc Comput Digit Tech 1997;144(1):43–6. [33] Wang M, Blake IF. Bit serial multiplication in finite fields. SIAM J Disc Math 1990;3(1):140–8.

C.-Y. Lee et al. / Computers and Electrical Engineering 31 (2005) 444–459

459

[34] Diab M, Poli A. New bit-serial systolic multiplier for GF(2m) using irreducible trinomials. Electron Lett 1991;27(13):1183–4. [35] Booth A. A signed binary multiplication technique. Quart J Mech Appl Math 1951;4:236–40. [36] MacSorley L. High speed arithmetic in binary computers. Proc IRE 1961;49(January). [37] Pekmestzi KZ. Multiplexer-based array multipliers. IEEE Trans Comput 1999;48(1):15–23. [38] Baker RJ, Li HW, Boyce DE. CMOS—circuit, design, layout, and simulation. New York: IEEE Press; 1998. [39] Kang SM, Leblebici Y. CMOS digital integrated circuits—analysis and design. McGraw-Hill; 1999. [40] Byun G-Y, Kim H-S. Low-complexity multiplexer-based multiplier of GF (2m). IEICE Trans Inform Syst 2003;E86-D(12):2684–90. [41] Available from: http://www.st.com/stonline/books/pdf/docs/2006.pdf. [42] Available from: http://www.st.com/stonline/books/pdf/docs/1885.pdf. [43] Available from: http://www.st.com/stonline/books/pdf/docs/1905.pdf. Chiou-Yng Lee received the BachelorÕs degree (1986) in Medical Engineering and the M.S. degree in Electronic Engineering (1992), both from the Chung Yuan University, Taiwan, and the Ph.D. degree in Electrical Engineering from Chang Gung University, Taiwan, in 2001. From 1988 to now, he was a research associate with Chunghwa Telecommunication Laboratory in Taiwan. He joined the department of project planning. He taught those related field courses at Ching-Yun Technology University. He is currently as an assistant professor of Department of Computer Information and Network Engineering in Lunghwa University of Science and Technology. His research interests include computations in finite fields, error-control coding, signal processing, and digital transmission system. Besides, he is a member of the IEEE and the IEEE Computer society. He is also an honor member of Phi Tao Phi in 2001.

Che Wun Chiou received his B.S. degree in Electronic Engineering from Chung Yuan Christian University in 1982, the M.S. degree and the Ph.D. degree in Electrical Engineering from National Cheng Kung University in 1984 and 1989, respectively. From 1990 to 2000, he was with the Chung Shan Institute of Science and Technology in Taiwan. He joined the Department of Electronic Engineering and the Department of Computer Science and Information Engineering, Ching Yun University in 2000 and 2005, respectively. He is currently as Dean of Division of Continuing Education in Ching Yun University. His current research interests include faulttolerant computing, computer arithmetic, parallel processing, and cryptography.

Jim-Min Lin was born on March 5, 1963 in Taipei, Taiwan. He received the B.S. degree in Engineering Science and the M.S. and the Ph.D. degrees in Electrical Engineering, all from National Cheng Kung University, Tainan, Taiwan, in 1985, 1987, and 1992, respectively. Since February 1993, he has been an Associate Professor at the Department of Information Engineering and Computer Science, Feng Chia University, Taichung City, Taiwan. His research interests include Operating Systems, Software Integration/Reuse, Embedded Systems, Software Agent Technology, and Testable Design.

Suggest Documents