BASED ON MODIFIED BOOTH'S ALGORITHM

International Journal of Computers and Applications, Vol. 29, No. 4, 2007

LOW-COMPLEXITY BIT-PARALLEL MULTIPLIERS FOR A CLASS OF GF(2^m) BASED ON MODIFIED BOOTH'S ALGORITHM
C.-Y. Lee∗ and C.W. Chiou∗∗

∗ Department of Computer Information and Network Engineering, Lunghwa University of Science and Technology, Taoyuan County 333, Taiwan, ROC
∗∗ Department of Computer Science and Information Engineering, Ching Yun University, Taiwan 320, ROC; e-mail: cwchiou@cyu.edu.tw
Recommended by Dr. Sunil R. Das (paper no. 202-1934)

Abstract
This investigation presents an effective algorithm for computing multiplication over a class of GF(2^m) based on both irreducible all one polynomials (AOPs) and equally spaced polynomials (ESPs). The proposed AOP-based multiplier uses the modified Booth's algorithm to build a new multiplexer-based bit-parallel multiplier that is simple and modular, properties that are important for VLSI hardware implementation. The multiplier requires (m/4)(m + 1) MUX4×1 and (1.5m^2 + 0.5m − 1) XOR gates. Its time delay is not greater than T_M + (3 + log2(m/4))T_X, where T_M and T_X are the time delays of a MUX4×1 and a 2-input XOR gate, respectively. For certain degrees, an irreducible ESP of high degree can be obtained from a corresponding irreducible AOP of relatively low degree. Using subword parallel processing, the proposed low-degree AOP-based multiplier can therefore also be used to realize high-degree ESP-based multipliers.

Key Words
Bit-parallel multiplier, modified Booth's algorithm, finite field, AOP, ESP, subword parallel processing

1. Introduction
Finite (Galois) field arithmetic has two main areas of application:
- channel coding, such as the widespread Reed–Solomon codes;
- public-key cryptography [1–3].

Both channel codes and cryptographic algorithms use the finite field GF(2^m), but the field orders they need differ dramatically. Channel codes are typically restricted to field elements represented by up to eight bits. Public-key algorithms rely on field sizes of several hundred bits.

The majority of publications concentrate on finite field architectures for relatively small fields suitable for implementing channel codes. Cryptographic applications [4, 5], such as the Diffie–Hellman key exchange algorithm, are based on discrete exponentiation over GF(2^m). Exponentiation over GF(2^m) based on Fermat's theorem is performed by the repeated multiply-square algorithm. To achieve a high level of security, many private- and public-key algorithms that rely on computations in GF(2^m) require large field sizes, some as large as GF(2^2000). Hence, the efficient design of multiplication algorithms and architectures is a major research objective.

A polynomial x^m + · · · + x^2 + x + 1 over GF(2) is called an all one polynomial (AOP) of degree m. It can be shown that an AOP of degree m is irreducible, and hence yields an AOP basis of GF(2^m), if and only if m + 1 is a prime and 2 is a primitive root modulo m + 1. Using the property of the AOP of degree m, i.e., α^{m+1} = 1 for a root α, each element of GF(2^m) can be represented in a non-conventional basis having m + 1 basis elements. Many efficient architectures for multiplication in GF(2^m) that use this representation, with elements constructed from an irreducible AOP, have been studied [6–9]:
- Fenn et al. [6] used a linear feedback shift register architecture to suggest two types of bit-serial multipliers with efficient modularity and space complexity.
- Koc and Sunar [7] presented a low-complexity bit-parallel canonical basis multiplier and extended it to a bit-parallel normal basis multiplier. Their structure requires m^2 AND gates and m^2 − 1 XOR gates.
- Wu and Hasan [8] proposed two multipliers based on the weakly dual basis.
- Lee et al. [9] used an inner-product algorithm to implement a bit-parallel systolic multiplier.
- Lee et al. [10] also presented a convolution-based systolic multiplier. This architecture has a total latency of (m + 1)(T_AND + T_XOR).

For certain degrees, an irreducible equally spaced polynomial (ESP) of high degree can be obtained from a corresponding irreducible AOP of very low degree. With the field elements represented in a polynomial basis, many multipliers, such as those in [7–11], extend an AOP-based multiplier of degree m to an ESP-based multiplier of degree m(m + 1)^i, i = 1, 2, .... As m increases, an ESP-based multiplier becomes more complicated. Using modular construction, two existing architectures [12, 13] realize high-degree ESP-based multipliers with a small-degree AOP-based multiplier. The design concepts of these two architectures are in fact similar to the subword parallel concept for general-purpose processors, first introduced in [14] for multimedia applications. Various AOP-based multipliers [9, 10, 13] use traditional XOR and AND gates to obtain efficient bit-parallel systolic architectures.

On the other hand, the key idea of the "divide-and-conquer" methodology was first introduced by Booth [15]. In this technique, intermediate results are always represented in a redundant form of two integer numbers. In 1999, Pekmestzi [16] presented a multiplexer-based array multiplier whose complexity was efficiently reduced by the modified Booth's algorithm. Starting from the basic idea of the modified Booth's algorithm, this article alternates 4 × 1 multiplexers and XOR gates to implement a multiplier for GF(2^m) whose elements are constructed from an irreducible AOP of degree m. The proposed AOP-based multiplier architecture therefore differs from conventional AOP-based multipliers [9, 10, 13]. Moreover, using the basic concept of subword parallel processing, the article also presents an ESP-based multiplier for large fields built from the AOP-based multiplier for small fields. The ESP-based multiplier over GF(2^{mr}) consists of only r AOP-based multipliers.

The rest of the paper is organized as follows:
- Section 2 describes the representation of the elements of GF(2^m) generated by an irreducible AOP.
- Section 3 proposes the new multiplexer-based multiplication algorithm based on the modified Booth's algorithm.
- Section 4 uses the proposed AOP-based multiplier to construct an ESP-based multiplier of degree m × r.
- Section 5 draws conclusions.

2. Finite Field Representation
It is assumed that the reader is familiar with the basic concepts of finite fields, which are covered in detail in [1, 2]. GF(2^m) contains 2^m elements and is an extension field of the ground field GF(2) = {0, 1}.

Let α be a root of a (primitive) irreducible polynomial $P(x) = p_0 + p_1x + \cdots + p_{m-1}x^{m-1} + x^m$ over GF(2). Since $1, \alpha, \alpha^2, \ldots, \alpha^{m-1}$ are linearly independent over GF(2), each element A of GF(2^m) can be uniquely represented as $A = a_0 + a_1\alpha + \cdots + a_{m-1}\alpha^{m-1}$ with $a_i \in GF(2)$. The set $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$ is called a polynomial basis of GF(2^m).

A polynomial of the form $P(x) = p_0 + p_1x + p_2x^2 + \cdots + p_mx^m$ over GF(2) is called an AOP of degree m if $p_i = 1$ for $i = 0, 1, \ldots, m$ [11]. It has been shown that an AOP is irreducible if and only if m + 1 is a prime and 2 is a generator of the multiplicative group of GF(m + 1) [11]. For m ≤ 100, the values of m for which an AOP of degree m is irreducible are 2, 4, 10, 12, 18, 28, 36, 52, 58, 60, 66, 82 and 100.

Suppose that α is a root of an irreducible AOP of degree m. Then any element A of GF(2^m) can be represented as $A = a_0 + a_1\alpha + \cdots + a_{m-1}\alpha^{m-1}$, where $a_i \in GF(2)$ for $0 \le i \le m-1$. Because $\alpha^m = 1 + \alpha + \cdots + \alpha^{m-1}$, and hence $\alpha^{m+1} = 1$, the element A can also be represented as $A = A_0 + A_1\alpha + \cdots + A_{m-1}\alpha^{m-1} + A_m\alpha^m$, where $a_i = A_i + A_m$ for $0 \le i \le m-1$. Thus, an element A ∈ GF(2^m) has two different representations when the set $\{1, \alpha, \alpha^2, \ldots, \alpha^m\}$ is used to represent it. This set is called the AOP basis associated with the canonical basis $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$. In the AOP basis, the following properties hold for any elements A, B ∈ GF(2^m) [9]:
(1) $\alpha^{m+1} = 1$.
(2) $A\alpha^i = A^{(i)}$, where $A^{(i)}$ denotes the element obtained by cyclically shifting A by i positions to the right.
(3) $A\alpha^{-i} = A^{(-i)}$, where $A^{(-i)}$ denotes the element obtained by cyclically shifting A by i positions to the left.
(4) The inner product of $A^{(i)}$ and $B^{(-i)}$, denoted $A^{(i)}B^{(-i)}$, is given by
$$A^{(i)}B^{(-i)} = \sum_{j=0}^{m} a_{\langle j-i\rangle}\, b_{\langle j+i\rangle}\, \alpha^{2j},$$
where $\langle x\rangle$ denotes x mod (m + 1).
(5) $$AB = \sum_{i=0}^{m} A^{(i)}B^{(-i)}.$$
Property (5) holds because m + 1 is odd. For every pair of indices (p, q), the conditions $p = \langle j-i\rangle$ and $q = \langle j+i\rangle$ are met by exactly one pair (i, j), namely the one with $2j \equiv p + q$ and $2i \equiv q - p \pmod{m+1}$.

By these properties, the product C = AB involves two operations: cyclic shifting and inner product. For simplicity, GF(2^4) is used to illustrate AOP-based multiplication. Let $A = a_0 + a_1\alpha + a_2\alpha^2 + a_3\alpha^3 + a_4\alpha^4$ and $B = b_0 + b_1\alpha + b_2\alpha^2 + b_3\alpha^3 + b_4\alpha^4$ be any two elements of GF(2^4). Let $C = c_0 + c_1\alpha + c_2\alpha^2 + c_3\alpha^3 + c_4\alpha^4$ be their product. The product C can then be computed using property (5), as shown in Table 1.
Table 1 The Product AB over GF(2^4)

Term            1          α^2        α^4        α^6 (= α)   α^8 (= α^3)
A^(0)B^(0)      a_0 b_0    a_1 b_1    a_2 b_2    a_3 b_3     a_4 b_4
A^(1)B^(−1)     a_4 b_1    a_0 b_2    a_1 b_3    a_2 b_4     a_3 b_0
A^(2)B^(−2)     a_3 b_2    a_4 b_3    a_0 b_4    a_1 b_0     a_2 b_1
A^(3)B^(−3)     a_2 b_3    a_3 b_4    a_4 b_0    a_0 b_1     a_1 b_2
A^(4)B^(−4)     a_1 b_4    a_2 b_0    a_3 b_1    a_4 b_2     a_0 b_3
Sum (XOR)       c_0        c_2        c_4        c_1         c_3
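Properties (2)–(5) and Table 1 can be checked numerically. The Python sketch below is our own illustration, not the authors' circuit. It makes three steps:
- it represents an element of GF(2^m) as a list of m canonical-basis bits and pads it with a_m = 0 to obtain the AOP-basis form;
- it accumulates each inner product A^(i)B^(−i) of property (4) into the coefficient of α^{2j}, which is index 2j mod (m + 1);
- it folds the α^m coefficient back using α^m = 1 + α + · · · + α^{m−1}.

The helper names (aop_mul_inner_product, aop_mul_reference, self_check) are hypothetical. The reference routine is ordinary schoolbook multiplication modulo the AOP.

```python
import random


def aop_mul_reference(x: int, y: int, m: int) -> int:
    """Schoolbook product of x and y modulo the AOP 1 + z + ... + z^m (bit k = coefficient of z^k)."""
    p = (1 << (m + 1)) - 1
    prod = 0
    while y:
        if y & 1:
            prod ^= x
        y >>= 1
        x <<= 1
    while prod and prod.bit_length() - 1 >= m:
        prod ^= p << (prod.bit_length() - 1 - m)
    return prod


def aop_mul_inner_product(a_bits, b_bits):
    """C = sum_i A^(i) B^(-i) (property (5)), followed by reduction to the canonical basis."""
    m = len(a_bits)
    n = m + 1
    a = list(a_bits) + [0]  # AOP basis {1, alpha, ..., alpha^m} with a_m = 0
    b = list(b_bits) + [0]  # b_m = 0
    ext = [0] * n  # coefficients of 1, alpha, ..., alpha^m
    for i in range(n):  # one row of Table 1: A^(i) B^(-i)
        for j in range(n):
            ext[(2 * j) % n] ^= a[(j - i) % n] & b[(j + i) % n]
    # alpha^m = 1 + alpha + ... + alpha^(m-1): add the alpha^m coefficient to every position
    return [ext[k] ^ ext[m] for k in range(m)]


def to_bits(x: int, m: int):
    return [(x >> k) & 1 for k in range(m)]


def from_bits(bits) -> int:
    return sum(bit << k for k, bit in enumerate(bits))


def self_check(m: int, trials: int = 200, seed: int = 1) -> bool:
    """Compare the inner-product form with the schoolbook reference on random operands."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.getrandbits(m), rng.getrandbits(m)
        if from_bits(aop_mul_inner_product(to_bits(x, m), to_bits(y, m))) != aop_mul_reference(x, y, m):
            return False
    return True


# GF(2^4): alpha * alpha^3 = alpha^4 = 1 + alpha + alpha^2 + alpha^3
print(aop_mul_inner_product([0, 1, 0, 0], [0, 0, 0, 1]))  # [1, 1, 1, 1]
print(all(self_check(m) for m in (2, 4, 10, 12)))
```

The loop over i mirrors the rows of Table 1, and the index 2j mod (m + 1) mirrors its column headings 1, α^2, α^4, α^6 = α, α^8 = α^3. The sketch relies on m + 1 being odd, which holds for every irreducible AOP.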

Notably, the coefficients a_i and b_i are arranged symmetrically in the (m + 1) × (m + 1) array: each a_i and each b_i appears exactly once in every row and every column.

The computation of C = AB above shows that the AOP-based multiplier suits a parallel-in parallel-out bit-parallel systolic architecture, as seen in [9]. That multiplier is constructed from (m + 1)^2 identical cells, each consisting of one 2-input AND gate, one 2-input XOR gate and three 1-bit latches, and its latency is m + 1 clock cycles. The major feature of this inner-product-based AOP multiplier is a low-complexity bit-parallel systolic architecture, compared with traditional multipliers for a general field GF(2^m). Using the same five properties, Lee et al. [10] proved that AB^2 + C computations with an MSB-first algorithm can be implemented by a low-complexity bit-parallel systolic architecture.

A polynomial of the form g(x) = 1 + x^r + x^{2r} + · · · + x^{nr} is called an r-ESP of degree nr. Lee et al. [13] pointed out that ESP-based multipliers can be constructed from an AOP-based multiplier. Their multipliers rely on the two operations above, cyclic shifting and inner product, to achieve efficient systolic array architectures. However, the AOP basis is a non-conventional basis having m + 1 basis elements in GF(2^m), so such multipliers require extra logic to convert results to an ordinary basis. In terms of space complexity, such AOP-based multipliers require (m + 1)^2 XOR gates, (m + 1)^2 AND gates and 3(m + 1)^2 1-bit latches.

3. Proposed AOP-Based Multiplier Over GF(2^m) Using Modified Booth's Algorithm
The major advantage of the modified Booth's algorithm is that two bits are processed in parallel. This section therefore uses 4-to-1 multiplexers and XOR gates to build an efficient multiplexer-based bit-parallel multiplier based on the modified Booth's algorithm.

3.1 Algorithm
With the AOP basis representation, assume that two elements A and B in GF(2^m) are represented by
$$A = a_0 + a_1\alpha + \cdots + a_{m-1}\alpha^{m-1} + a_m\alpha^m, \qquad B = b_0 + b_1\alpha + \cdots + b_{m-1}\alpha^{m-1} + b_m\alpha^m,$$
where $a_m = b_m = 0$. Applying Lee–Lu–Lee's multiplication algorithm [9], i.e., $C = AB = \sum_{i=0}^{m} A^{(i)}B^{(-i)}$, the product C can be rewritten as $C = \sum_{i=0}^{m} c_i\alpha^{2i}$ with
$$c_i = \sum_{j=0}^{m} a_{\langle i-j\rangle} b_{\langle i+j\rangle} = a_i b_i + \sum_{j=1}^{m/2} a_{\langle i-j\rangle} b_{\langle i+j\rangle} + \sum_{j=m/2+1}^{m} a_{\langle i-j\rangle} b_{\langle i+j\rangle}.$$
Since m + 1 is an odd prime, m is even and m/2 is an integer.

Setting $j = -k + m + 1$ in the third term on the right-hand side gives
$$c_i = a_i b_i + \sum_{j=1}^{m/2} a_{\langle i-j\rangle} b_{\langle i+j\rangle} + \sum_{k=1}^{m/2} a_{\langle i+k\rangle} b_{\langle i-k\rangle}.$$
Next, renaming k as j in the third term, the coefficient $c_i$ can be expressed as
$$c_i = a_i b_i + \sum_{j=1}^{m/2}\left(a_{\langle i-j\rangle} b_{\langle i+j\rangle} + a_{\langle i+j\rangle} b_{\langle i-j\rangle}\right). \qquad (1)$$
The term in parentheses can also be written as
$$a_{\langle i-j\rangle} b_{\langle i+j\rangle} + a_{\langle i+j\rangle} b_{\langle i-j\rangle} = (a_{\langle i-j\rangle} + a_{\langle i+j\rangle})(b_{\langle i-j\rangle} + b_{\langle i+j\rangle}) + a_{\langle i-j\rangle} b_{\langle i-j\rangle} + a_{\langle i+j\rangle} b_{\langle i+j\rangle}. \qquad (2)$$
The indices $\langle i-j\rangle$ and $\langle i+j\rangle$ for $j = 1, \ldots, m/2$, together with i itself, run over 0, 1, ..., m exactly once. Therefore
$$c_i = \sum_{j=1}^{m/2} (a_{\langle i-j\rangle} + a_{\langle i+j\rangle})(b_{\langle i-j\rangle} + b_{\langle i+j\rangle}) + \sum_{j=0}^{m} a_j b_j. \qquad (3)$$

Equation (3) shows that the term $\sum_{j=0}^{m} a_j b_j$ appears in every coefficient $c_i$. As i runs over 0, ..., m, the exponent 2i mod (m + 1) runs over all residues. This common term is therefore multiplied by $\sum_{i=0}^{m}\alpha^{2i} = 1 + \alpha + \cdots + \alpha^m = 0$. It can thus be discarded when the product C is reduced modulo the AOP, as a multiplication in GF(2^m) requires. The product C can then be expressed as
$$C = \sum_{i=0}^{m}\sum_{j=1}^{m/2} (a_{\langle i-j\rangle} + a_{\langle i+j\rangle})(b_{\langle i-j\rangle} + b_{\langle i+j\rangle})\,\alpha^{2i}. \qquad (4)$$

The key idea of Booth's algorithm [15] is that intermediate results are always kept in a redundant form of two integer numbers. To apply the algorithm, define
$$\bar a_{x,y} = a_x + a_y, \qquad \bar b_{x,y} = b_x + b_y.$$
As $a_m = b_m = 0$, it follows that
$$\bar a_{x,y} = \begin{cases} a_x & \text{for } y = m,\\ a_y & \text{for } x = m,\end{cases} \qquad \bar b_{x,y} = \begin{cases} b_x & \text{for } y = m,\\ b_y & \text{for } x = m.\end{cases}$$
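The rearrangement in (1)–(4) can be checked coefficient by coefficient. The Python sketch below is our own illustration, and its function names are hypothetical:
- coeffs_direct evaluates $c_i = \sum_j a_{\langle i-j\rangle} b_{\langle i+j\rangle}$, the coefficient of α^{2i} before reduction;
- coeffs_paired evaluates the right-hand side of (3), optionally without the common term $\sum_j a_j b_j$;
- reduce_alpha_2i places each c_i at α^{2i mod (m+1)} and folds α^m back.

The check confirms (3) exactly. It also confirms that dropping the common term, as in (4), leaves the reduced product unchanged.

```python
import random


def coeffs_direct(a, b):
    """c_i = sum_{j=0}^{m} a_<i-j> b_<i+j> (coefficient of alpha^(2i) before reduction)."""
    n = len(a)
    return [sum(a[(i - j) % n] & b[(i + j) % n] for j in range(n)) % 2 for i in range(n)]


def coeffs_paired(a, b, keep_common_term=True):
    """Right-hand side of (3); with keep_common_term=False, the coefficients used in (4)."""
    n = len(a)
    m = n - 1
    common = sum(x & y for x, y in zip(a, b)) % 2  # sum_{j=0}^{m} a_j b_j
    out = []
    for i in range(n):
        c = 0
        for j in range(1, m // 2 + 1):
            c ^= (a[(i - j) % n] ^ a[(i + j) % n]) & (b[(i - j) % n] ^ b[(i + j) % n])
        out.append(c ^ common if keep_common_term else c)
    return out


def reduce_alpha_2i(c):
    """Place c_i at alpha^(2i mod (m+1)), then apply alpha^m = 1 + alpha + ... + alpha^(m-1)."""
    n = len(c)
    m = n - 1
    ext = [0] * n
    for i, ci in enumerate(c):
        ext[(2 * i) % n] = ci  # 2i mod (m+1) is a permutation because m + 1 is odd
    return [ext[k] ^ ext[m] for k in range(m)]


def check_identities(m, trials=200, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.getrandbits(1) for _ in range(m)] + [0]  # a_m = 0
        b = [rng.getrandbits(1) for _ in range(m)] + [0]  # b_m = 0
        direct = coeffs_direct(a, b)
        if direct != coeffs_paired(a, b):  # equation (3)
            return False
        if reduce_alpha_2i(direct) != reduce_alpha_2i(coeffs_paired(a, b, False)):  # equation (4)
            return False
    return True


print(all(check_identities(m) for m in (2, 4, 10, 12, 18)))
```

Dropping the common term flips every unreduced coefficient by the same bit. The reduction adds the α^m coefficient to all others, so a uniform flip cancels; this is the software counterpart of $\sum_{i}\alpha^{2i} = 0$.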

Thus, the product C can be simplified to
$$C = \sum_{i=0}^{m} c_i\,\alpha^{2i}, \qquad (5)$$
where
$$c_i = \begin{cases} \displaystyle\sum_{j=1}^{\lfloor m/4\rfloor} d_{i,j} + \bar a_{\langle i-m/2\rangle,\langle i+m/2\rangle}\,\bar b_{\langle i-m/2\rangle,\langle i+m/2\rangle} & \text{for } m/2 \text{ odd},\\[6pt] \displaystyle\sum_{j=1}^{m/4} d_{i,j} & \text{for } m/2 \text{ even},\end{cases}$$
and
$$d_{i,j} = \bar a_{\langle i-2j+1\rangle,\langle i+2j-1\rangle}\,\bar b_{\langle i-2j+1\rangle,\langle i+2j-1\rangle} + \bar a_{\langle i-2j\rangle,\langle i+2j\rangle}\,\bar b_{\langle i-2j\rangle,\langle i+2j\rangle}.$$
The value of $d_{i,j}$ depends only on the two logical variables $\bar a_{\langle i-2j+1\rangle,\langle i+2j-1\rangle}$ and $\bar a_{\langle i-2j\rangle,\langle i+2j\rangle}$, as shown in Table 2. An actual computation is needed only when both variables equal 1, in which case $d_{i,j} = \bar b_{\langle i-2j+1\rangle,\langle i+2j-1\rangle} + \bar b_{\langle i-2j\rangle,\langle i+2j\rangle}$. Hence, $d_{i,j}$ can be selected from 0, $\bar b_{\langle i-2j+1\rangle,\langle i+2j-1\rangle}$, $\bar b_{\langle i-2j\rangle,\langle i+2j\rangle}$ and their sum by the values of the two $\bar a$ variables.

Table 2 Result of d_{i,j}
ā_{⟨i−2j+1⟩,⟨i+2j−1⟩} | ā_{⟨i−2j⟩,⟨i+2j⟩} | d_{i,j}
1 | 1 | b̄_{⟨i−2j+1⟩,⟨i+2j−1⟩} + b̄_{⟨i−2j⟩,⟨i+2j⟩}
1 | 0 | b̄_{⟨i−2j+1⟩,⟨i+2j−1⟩}
0 | 1 | b̄_{⟨i−2j⟩,⟨i+2j⟩}
0 | 0 | 0

Finally, to perform a multiplication in GF(2^m), the product C must be reduced modulo the irreducible AOP. Let $C = \sum_{j=0}^{m-1} c_j\alpha^j$ be the result reduced modulo the irreducible AOP. In (5), the term with i = m/2 is $c_{m/2}\alpha^m$, and $\alpha^m = 1 + \alpha + \cdots + \alpha^{m-1}$. Every other $\alpha^{2i}$ equals $\alpha^{2i}$ or $\alpha^{2i-m-1}$. We therefore have
$$c_j = c_i + c_{m/2}, \qquad j = \begin{cases} 2i & \text{for } 0 \le i \le m/2 - 1,\\ 2i - m - 1 & \text{for } m/2 + 1 \le i \le m,\end{cases}$$
where the $c_i$ on the right-hand side are the coefficients given by (5).

3.2 Hardware Implementation
Following the discussion above, Fig. 1 shows the architecture of a parallel-in parallel-out multiplier over GF(2^m) whose elements are represented with a root of an irreducible AOP of degree m. The multiplier is composed of two main operational modules: the polynomial multiplication module and the modulo reduction module. The polynomial multiplication module performs the first step of the field multiplication and can be built directly from (5) by the following steps:
(1) Generate $\bar a_{i,j} = a_i + a_j$ and $\bar b_{i,j} = b_i + b_j$, for 0 ≤ i ≤ m and 0 ≤ j ≤ m.
(2) Generate $m_{i,j} = \bar b_{\langle i-2j+1\rangle,\langle i+2j-1\rangle} + \bar b_{\langle i-2j\rangle,\langle i+2j\rangle}$, for 0 ≤ i ≤ m and 1 ≤ j ≤ m/4.
(3) Generate $d_{i,j}$, for 0 ≤ i ≤ m and 1 ≤ j ≤ m/4.
(4) Generate $c_i$, for 0 ≤ i ≤ m.
Steps 1 and 2 are carried out by XOR arrays, step 3 by one MUX4×1 array, and step 4 by one XOR tree. Finally, the product C must be reduced modulo the irreducible AOP; the modulo reduction module can be implemented with m XOR gates. The gate count and time delay of each step are given in Table 3. The time delay of the proposed parallel multiplier is not greater than T_M + (3 + log2(m/4))T_X, where T_M and T_X are the time delays of a MUX4×1 and an XOR gate, respectively. The proposed bit-parallel multiplier requires (1.5m^2 + 0.5m − 1) 2-input XOR gates and (m/4)(m + 1) MUX4×1.

Figure 1. The proposed AOP-based bit-parallel multiplier architecture.

Example 1. Assume that $A = a_0 + a_1\alpha + a_2\alpha^2 + a_3\alpha^3$ and $B = b_0 + b_1\alpha + b_2\alpha^2 + b_3\alpha^3$ are two elements of GF(2^4). Let $C = c_0 + c_1\alpha + c_2\alpha^2 + c_3\alpha^3$ denote their product. The product C = AB is computed in the following steps:
(1) First, the sums $a_x + a_y$ and $b_x + b_y$ are computed to generate $\bar a_{x,y}$ and $\bar b_{x,y}$ (recall that a_4 = b_4 = 0):
ā_{1,4} = a_1, ā_{2,0} = a_2 + a_0, ā_{3,1} = a_1 + a_3, ā_{4,2} = a_2, ā_{0,3} = a_0 + a_3,
ā_{2,3} = a_2 + a_3, ā_{3,4} = a_3, ā_{4,0} = a_0, ā_{0,1} = a_1 + a_0, ā_{1,2} = a_1 + a_2,
b̄_{1,4} = b_1, b̄_{2,0} = b_2 + b_0, b̄_{3,1} = b_1 + b_3, b̄_{4,2} = b_2, b̄_{0,3} = b_0 + b_3,
b̄_{2,3} = b_2 + b_3, b̄_{3,4} = b_3, b̄_{4,0} = b_0, b̄_{0,1} = b_1 + b_0, b̄_{1,2} = b_1 + b_2.
(2) Then, the terms m_{i,j} for j = 0 and i = 0, 1, 2, 3, 4 are generated:
m_{0,0} = b̄_{1,4} + b̄_{2,3}, m_{1,0} = b̄_{2,0} + b̄_{3,4}, m_{2,0} = b̄_{3,1} + b̄_{4,0}, m_{3,0} = b̄_{4,2} + b̄_{0,1}, m_{4,0} = b̄_{0,3} + b̄_{1,2}.
(3) Next, the MUX outputs are selected:
d_{0,0} is selected from 0, b̄_{2,3}, b̄_{1,4} and m_{0,0} by the values of ā_{1,4} and ā_{2,3};
d_{1,0} is selected from 0, b̄_{2,0}, b̄_{3,4} and m_{1,0} by the values of ā_{2,0} and ā_{3,4};
d_{2,0} is selected from 0, b̄_{3,1}, b̄_{4,0} and m_{2,0} by the values of ā_{3,1} and ā_{4,0};
d_{3,0} is selected from 0, b̄_{4,2}, b̄_{0,1} and m_{3,0} by the values of ā_{4,2} and ā_{0,1};
d_{4,0} is selected from 0, b̄_{0,3}, b̄_{1,2} and m_{4,0} by the values of ā_{0,3} and ā_{1,2}.
(4) Finally, the result $C = c_0 + c_1\alpha + c_2\alpha^2 + c_3\alpha^3$ is obtained as:
c_0 = d_{0,0} + d_{2,0}, c_1 = d_{3,0} + d_{2,0}, c_2 = d_{1,0} + d_{2,0}, c_3 = d_{4,0} + d_{2,0}.
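Steps (1)–(4), Table 2 and the modulo reduction can be traced in software. The Python sketch below is a bit-level model under our own assumptions, not the authors' netlist:
- mux4 is a hypothetical helper standing in for a MUX4×1 whose select lines are the two $\bar a$ variables of Table 2;
- for m/2 odd, the leftover product term of (5) is formed here with a plain AND, because the text does not say how that term is realized;
- our group index j starts at 1, where Example 1 writes j = 0.

The model reproduces Example 1, for instance c_1 = d_{3,0} + d_{2,0}. It is compared against schoolbook multiplication modulo the AOP.

```python
import random


def aop_mul_reference(x: int, y: int, m: int) -> int:
    """Schoolbook product modulo the AOP 1 + z + ... + z^m (bit k = coefficient of z^k)."""
    p = (1 << (m + 1)) - 1
    prod = 0
    while y:
        if y & 1:
            prod ^= x
        y >>= 1
        x <<= 1
    while prod and prod.bit_length() - 1 >= m:
        prod ^= p << (prod.bit_length() - 1 - m)
    return prod


def mux4(s1, s0, in11, in10, in01, in00):
    """Model of a MUX4x1: the select bits (s1, s0) pick one data input, as in Table 2."""
    return (in11, in10, in01, in00)[3 - (2 * s1 + s0)]


def booth_aop_mul(a_bits, b_bits):
    """Multiplexer-based product in GF(2^m) following (5), Table 2 and the AOP reduction."""
    m = len(a_bits)
    n, half = m + 1, m // 2
    a = list(a_bits) + [0]  # a_m = 0
    b = list(b_bits) + [0]  # b_m = 0

    def abar(i, k):  # step (1): a_<i-k> + a_<i+k>
        return a[(i - k) % n] ^ a[(i + k) % n]

    def bbar(i, k):  # step (1): b_<i-k> + b_<i+k>
        return b[(i - k) % n] ^ b[(i + k) % n]

    c = []
    for i in range(n):
        ci = 0
        for j in range(1, half // 2 + 1):  # j = 1, ..., floor(m/4)
            p, q = 2 * j - 1, 2 * j
            m_ij = bbar(i, p) ^ bbar(i, q)  # step (2)
            d_ij = mux4(abar(i, p), abar(i, q), m_ij, bbar(i, p), bbar(i, q), 0)  # step (3)
            ci ^= d_ij  # step (4): XOR tree
        if half % 2 == 1:  # m/2 odd: leftover term of (5), AND gate assumed here
            ci ^= abar(i, half) & bbar(i, half)
        c.append(ci)
    # Modulo reduction: c_j = c_i + c_{m/2}, j = 2i (i < m/2) or j = 2i - m - 1 (i > m/2)
    out = [0] * m
    for i in range(n):
        if i != half:
            out[2 * i if i < half else 2 * i - m - 1] = c[i] ^ c[half]
    return out


def to_bits(x, m):
    return [(x >> k) & 1 for k in range(m)]


def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))


def self_check(m, trials=200, seed=3):
    """Compare booth_aop_mul with the schoolbook reference on random operands."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.getrandbits(m), rng.getrandbits(m)
        if from_bits(booth_aop_mul(to_bits(x, m), to_bits(y, m))) != aop_mul_reference(x, y, m):
            return False
    return True


print(booth_aop_mul([0, 1, 0, 0], [0, 0, 0, 1]))  # alpha * alpha^3 in GF(2^4): [1, 1, 1, 1]
print(all(self_check(m) for m in (2, 4, 10, 12, 18, 28)))
```

The index map in the final loop is the reduction rule above: for m = 4 it sends i = 0, 1, 3, 4 to j = 0, 2, 1, 3 and adds c_2 = c_{m/2} to each. This is exactly the pattern c_0 = d_{0,0} + d_{2,0}, ..., c_3 = d_{4,0} + d_{2,0} of Example 1.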

Table 3 Computation Time and Gate Counts of the Bit-Parallel AOP-Based Multiplier with Reduced Time Delay
Accumulated time delay | Available signals | MUX4×1 | XOR
T_X | ā_{i,j} = a_i + a_j and b̄_{i,j} = b_i + b_j, for 0 ≤ i ≤ m and 0 ≤ j ≤ m | 0 | m^2
2T_X | m_{i,j} = b̄_{⟨i−2j+1⟩,⟨i+2j−1⟩} + b̄_{⟨i−2j⟩,⟨i+2j⟩}, for 0 ≤ i ≤ m and 1 ≤ j ≤ m/4 | 0 | (m + 1)m/4
2T_X + T_M | d_{i,j} | (m/4)(m + 1) | 0
T_M + (2 + log2(m/4))T_X | C = Σ_{i=0}^{m} c_i α^{2i}, with c_i = Σ_{j=1}^{m/4} d_{i,j} | 0 | (m/4 − 1)(m + 1)
T_M + (3 + log2(m/4))T_X | c_j = c_i + c_{m/2}, with j = 2i for 0 ≤ i ≤ m/2 − 1 and j = 2i − m − 1 for m/2 + 1 ≤ i ≤ m | 0 | m

Figure 2. The bit-parallel AOP-based multiplier over GF(2^4).

Fig. 2 shows the proposed bit-parallel multiplier over GF(2^4) of Example 1. In recent years, many bit-parallel architectures have been proposed for multiplication in GF(2^m) whose elements are constructed from an irreducible AOP of degree m [7–9, 11]. For example, the Koc–Sunar multiplier [7] has a time delay of at most T_A + (2 + log2(m − 1))T_X and requires m^2 2-input AND gates and m^2 − 1 2-input XOR gates. Table 4 compares various AOP-based bit-parallel multipliers and shows that the proposed multiplexer-based multiplier has lower complexity than the others. The transistor count based on the standard CMOS VLSI realization [17] is employed for comparison.

Because the AOP basis representation is attractive, the multiplier in [9] used an inner-product operation to construct a bit-parallel systolic multiplier. That circuit, however, uses a non-conventional basis having m + 1 basis elements in GF(2^m), so extra logic is needed to convert the extended basis to an ordinary basis. Our multiplier has no such problem.

Table 4 Comparing Polynomial Basis Multipliers with Generating AOPs
Multiplier | XOR gates | MUX4×1 | AND gates | Delay
Itoh–Tsujii [11] | m^2 + 2m | 0 | m^2 + 2m + 1 | T_A + (log2 m + log2(m + 2))T_X
Wu–Hasan [8] | m^2 + 2m − 2 | 0 | m^2 | T_A + (m + log2(m − 1))T_X
Koc–Sunar [7] | m^2 − 1 | 0 | m^2 | T_A + (2 + log2(m − 1))T_X
Lee et al. [9] | (m + 1)^2 | 0 | (m + 1)^2 | (m + 1)(T_A + T_X)
Fig. 1 (proposed) | 1.5m^2 + 0.5m − 1 | (m/4)(m + 1) | 0 | T_M + (3 + log2(m/4))T_X
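As a consistency check on Tables 3 and 4, the per-step XOR counts of Table 3 should add up to the closed form 1.5m^2 + 0.5m − 1.

The short Python sketch below performs that sum with exact fractions. The helper names are ours. It assumes that 4 divides m, as the closed forms implicitly do. It also prints, side by side:
- the MUX4×1 count (m/4)(m + 1);
- the Koc–Sunar [7] gate counts taken from Table 4.

The degrees used are those irreducible-AOP degrees m ≤ 100 listed in Section 2 that are multiples of 4.

```python
from fractions import Fraction


def table3_xor_steps(m: int):
    """XOR gates per row of Table 3 (steps 1-4 and the modulo reduction)."""
    q = Fraction(m, 4)
    return [m * m, (m + 1) * q, 0, (q - 1) * (m + 1), m]


def proposed_xor_total(m: int) -> Fraction:
    """Closed form from Table 4: 1.5 m^2 + 0.5 m - 1."""
    return Fraction(3, 2) * m * m + Fraction(1, 2) * m - 1


def proposed_mux_total(m: int) -> Fraction:
    """MUX4x1 count from Table 4: (m/4)(m + 1)."""
    return Fraction(m, 4) * (m + 1)


# Irreducible-AOP degrees m <= 100 (Section 2) that are multiples of 4.
for m in (4, 12, 28, 36, 52, 60, 100):
    assert sum(table3_xor_steps(m)) == proposed_xor_total(m)
    print(f"m={m:3d}  proposed: {proposed_xor_total(m)} XOR + {proposed_mux_total(m)} MUX4x1"
          f"  | Koc-Sunar [7]: {m * m} AND + {m * m - 1} XOR")
```

Algebraically, m^2 + m(m + 1)/4 + (m/4 − 1)(m + 1) + m = 1.5m^2 + 0.5m − 1, so the assertion holds for every m. For m = 4 both sides equal 25.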

4. ESP-Based Multiplier Using the Subword Parallel Processing
A polynomial of the form $g(x) = 1 + x^r + \cdots + x^{(m-1)r} + x^{mr}$ is called an r-ESP of degree mr. Let $g(x) = p(x^r)$; then p(x) is an AOP of degree m. It has been shown [9] that, if p(x) is an irreducible AOP, the r-ESP g(x) is irreducible if and only if:
- $r = (m+1)^j$ for some j ≥ 1; and
- $2^m \not\equiv 1 \pmod{(m+1)^2}$.

For mr ≤ 100, the pairs (mr, r) for which an r-ESP of degree mr is irreducible are (6, 3), (18, 9), (20, 5), (54, 27) and (100, 25).

Now, suppose that α is a root of the irreducible r-ESP of degree mr. Since $(x^r + 1)g(x) = x^{(m+1)r} + 1$, we have $\alpha^{mr+r} = 1$. An element A of GF(2^{mr}) can then be represented as $A = a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_{mr+r-1}\alpha^{mr+r-1}$ over GF(2), using the extended basis $\{1, \alpha, \alpha^2, \ldots, \alpha^{mr+r-1}\}$. A can be decomposed as
$$A = \sum_{k=0}^{r-1} A_k\alpha^k, \qquad A_k = \sum_{i=0}^{m} a_{ir+k}\alpha^{ir}, \qquad a_{mr+k} = 0. \qquad (6)$$
$A_k$ is called a subword of the element A in GF(2^{mr}); that is, A is decomposed into r subwords, each defined as in (6). First, the subword $A_k$ is multiplied by $\alpha^r$:
$$A_k\cdot\alpha^r = \alpha^r(a_k + a_{r+k}\alpha^r + \cdots + a_{mr+k}\alpha^{mr}) = a_k\alpha^r + a_{r+k}\alpha^{2r} + \cdots + a_{mr+k}\alpha^{(m+1)r}.$$
As $\alpha^{mr+r} = 1$, this becomes
$$A_k\cdot\alpha^r = a_{mr+k} + a_k\alpha^r + \cdots + a_{(m-1)r+k}\alpha^{mr}. \qquad (7)$$
Equation (7) shows that $A_k\cdot\alpha^r$ is obtained by cyclically shifting the coefficients of $A_k$ by r positions (one subword position) to the right. Fig. 3 shows the connection diagram of the circuit switch unit that computes $A_k\cdot\alpha^r$. To simplify the notation, define $\tilde A_k = A_k\cdot\alpha^r$.

Figure 3. The circuit switch unit.

Assume that $A_k = a_k + a_{r+k}\alpha^r + \cdots + a_{mr+k}\alpha^{mr}$ and $B_h = b_h + b_{r+h}\alpha^r + \cdots + b_{mr+h}\alpha^{mr}$ are subwords of A and B, respectively, where α is a root of the irreducible r-ESP of degree mr. Applying (7), the product of two subwords can be written as
$$\begin{aligned} A_kB_h &= (a_k + a_{r+k}\alpha^r + \cdots + a_{mr+k}\alpha^{mr})(b_h + b_{r+h}\alpha^r + \cdots + b_{mr+h}\alpha^{mr})\\ &= b_h(a_k + a_{r+k}\alpha^r + \cdots + a_{mr+k}\alpha^{mr}) + b_{r+h}(a_k + a_{r+k}\alpha^r + \cdots + a_{mr+k}\alpha^{mr})\alpha^r + \cdots\\ &\quad + b_{mr+h}(a_k + a_{r+k}\alpha^r + \cdots + a_{mr+k}\alpha^{mr})\alpha^{mr}\\ &= b_h(a_k + a_{r+k}\alpha^r + \cdots + a_{mr+k}\alpha^{mr}) + b_{r+h}(a_{mr+k} + a_k\alpha^r + \cdots + a_{(m-1)r+k}\alpha^{mr}) + \cdots\\ &\quad + b_{mr+h}(a_{r+k} + a_{2r+k}\alpha^r + \cdots + a_k\alpha^{mr}). \end{aligned}$$
The subword product $A_kB_h$ therefore has the same structure as AOP-based multiplication, with $\alpha^r$ playing the role of the AOP root since $(\alpha^r)^{m+1} = 1$. Each multiplication of two subwords can thus be performed by the proposed AOP-based multiplier, and these multipliers together implement the ESP-based multiplier.

Now, given elements A and B in GF(2^{mr}), let $C = \sum_{i=0}^{r-1} C_i\alpha^i$ denote the product of A and B, i.e.,
$$C = C_0 + C_1\alpha + \cdots + C_{r-1}\alpha^{r-1} = (A_0 + A_1\alpha + \cdots + A_{r-1}\alpha^{r-1})(B_0 + B_1\alpha + \cdots + B_{r-1}\alpha^{r-1}),$$
where $A_k = \sum_{j=0}^{m} a_{jr+k}\alpha^{jr}$, $B_h = \sum_{j=0}^{m} b_{jr+h}\alpha^{jr}$ and $C_i = \sum_{j=0}^{m-1} c_{jr+i}\alpha^{jr}$. Then
$$\begin{aligned} C &= B_0(A_0 + A_1\alpha + \cdots + A_{r-1}\alpha^{r-1}) + B_1(A_0 + A_1\alpha + \cdots + A_{r-1}\alpha^{r-1})\alpha + \cdots\\ &\quad + B_{r-1}(A_0 + A_1\alpha + \cdots + A_{r-1}\alpha^{r-1})\alpha^{r-1}\\ &= B_0(A_0 + A_1\alpha + \cdots + A_{r-1}\alpha^{r-1}) + B_1(\tilde A_{r-1} + A_0\alpha + \cdots + A_{r-2}\alpha^{r-1}) + \cdots\\ &\quad + B_{r-1}(\tilde A_1 + \tilde A_2\alpha + \cdots + \tilde A_{r-1}\alpha^{r-2} + A_0\alpha^{r-1})\\ &= C_0 + C_1\alpha + \cdots + C_{r-1}\alpha^{r-1}. \end{aligned} \qquad (8)$$
According to (8), the multiplication of two elements in GF(2^{mr}) can be expressed as the following matrix-vector product:
$$\begin{bmatrix} A_0 & \tilde A_{r-1} & \tilde A_{r-2} & \cdots & \tilde A_1\\ A_1 & A_0 & \tilde A_{r-1} & \cdots & \tilde A_2\\ A_2 & A_1 & A_0 & \cdots & \tilde A_3\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ A_{r-1} & A_{r-2} & A_{r-3} & \cdots & A_0 \end{bmatrix} \begin{bmatrix} B_0\\ B_1\\ B_2\\ \vdots\\ B_{r-1} \end{bmatrix} = \begin{bmatrix} C_0\\ C_1\\ C_2\\ \vdots\\ C_{r-1} \end{bmatrix}. \qquad (9)$$
Here, the r × r matrix in (9) is called the matrix Q. To compute the product C = AB, assume that two linear feedback shift registers D and E each have r block registers, each consisting of m + 1 bit registers. Register D produces the columns of Q. The two shift registers are defined by
$$D_{0,t+1} = D_{r-1,t}\cdot\alpha^r, \qquad D_{j,t+1} = D_{j-1,t} \ \ (1 \le j \le r-1),$$
$$E_{j,t+1} = E_{j+1,t} \ \ (0 \le j \le r-2), \qquad E_{r-1,t+1} = E_{0,t},$$
where $D_{j,t}$ denotes the content of position j of register D after t clock cycles. The initial values of D and E are
$$D_{0,0} = A_0,\ D_{1,0} = A_1,\ \ldots,\ D_{r-1,0} = A_{r-1}; \qquad E_{0,0} = B_0,\ E_{1,0} = B_1,\ \ldots,\ E_{r-1,0} = B_{r-1}.$$
After t clock cycles, register D holds column t of Q and $E_{0,t} = B_t$. Thus, the product C can be represented as $C = \sum_{i=0}^{r-1} C_i\alpha^i$, where
$$C_i = \sum_{t=0}^{r-1} D_{i,t}\cdot E_{0,t}.$$
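The subword decomposition (6), the shift property (7) and the register schedule for the matrix Q in (9) can be simulated end to end. The Python sketch below is our own illustration, and its helper names are hypothetical. It makes these assumptions:
- Register D rotates downward, with the circuit switch unit (multiplication by α^r) applied to the block that wraps around.
- Register E rotates upward, so that E_{0,t} = B_t.
- For simplicity, each subword product is computed with property (5) rather than with the multiplexer circuit of Section 3.
- The result is compared with schoolbook multiplication modulo the r-ESP for (m, r) = (2, 3), (4, 5) and (2, 9).

```python
import random


def esp_mul_reference(x: int, y: int, m: int, r: int) -> int:
    """Schoolbook product modulo the r-ESP g(z) = 1 + z^r + ... + z^(mr)."""
    g = sum(1 << (i * r) for i in range(m + 1))
    deg = m * r
    prod = 0
    while y:
        if y & 1:
            prod ^= x
        y >>= 1
        x <<= 1
    while prod and prod.bit_length() - 1 >= deg:
        prod ^= g << (prod.bit_length() - 1 - deg)
    return prod


def subword_mul(u, v):
    """Product of two subwords in beta = alpha^r with beta^(m+1) = 1, via property (5)."""
    n = len(u)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(2 * j) % n] ^= u[(j - i) % n] & v[(j + i) % n]
    return out


def times_alpha_r(sub):
    """Equation (7): multiplying a subword by alpha^r is a one-position cyclic shift (Fig. 3)."""
    return [sub[-1]] + sub[:-1]


def esp_mul_subwords(a_bits, b_bits, m: int, r: int):
    """C = AB in GF(2^(mr)) via the matrix Q of (9), generated by registers D and E."""
    A = [[a_bits[i * r + k] if i < m else 0 for i in range(m + 1)] for k in range(r)]  # (6)
    B = [[b_bits[i * r + k] if i < m else 0 for i in range(m + 1)] for k in range(r)]
    D = [sub[:] for sub in A]  # D_{j,0} = A_j
    E = [sub[:] for sub in B]  # E_{j,0} = B_j
    C = [[0] * (m + 1) for _ in range(r)]
    for _t in range(r):  # C_i = sum_t D_{i,t} * E_{0,t}
        for i in range(r):
            C[i] = [x ^ y for x, y in zip(C[i], subword_mul(D[i], E[0]))]
        D = [times_alpha_r(D[r - 1])] + D[:-1]  # D_{0,t+1} = D_{r-1,t} alpha^r, D_{j,t+1} = D_{j-1,t}
        E = E[1:] + E[:1]  # E_{j,t+1} = E_{j+1,t}, E_{r-1,t+1} = E_{0,t}
    # alpha^(mr+k) = alpha^k (1 + alpha^r + ... + alpha^((m-1)r)): fold the top coefficient of each subword
    out = [0] * (m * r)
    for k in range(r):
        for i in range(m):
            out[i * r + k] = C[k][i] ^ C[k][m]
    return out


def to_bits(x, n):
    return [(x >> k) & 1 for k in range(n)]


def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))


def self_check(m, r, trials=100, seed=5):
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.getrandbits(m * r), rng.getrandbits(m * r)
        got = from_bits(esp_mul_subwords(to_bits(x, m * r), to_bits(y, m * r), m, r))
        if got != esp_mul_reference(x, y, m, r):
            return False
    return True


print(all(self_check(m, r) for m, r in ((2, 3), (4, 5), (2, 9))))
```

Only r subword multipliers are active at each step, one per row of Q. After r clock cycles every column of Q has been paired with the matching B_t, which is the subword-parallel schedule described above.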

Fig. 4 shows the proposed ESP-based multiplier for computing multiplication in GF(2^m). Here m denotes the degree of the r-ESP, so each subword multiplier is an AOP-based multiplier of degree m/r. The multiplier is composed of:
- two linear feedback shift registers;
- r AOP-based multipliers;
- a linear array.

Fig. 5 shows the detailed circuit of the basic cell U_i. In the initial step, all 1-bit latches in each basic cell U_i are reset to zero.

For computing C = AB in GF(2^m), the proposed ESP-based multiplier requires (1.5m^2/r + 1.5m − r) XOR gates, (3m + 2r) 1-bit latches and (m/4)(m/r + 1) MUX4×1. Its time delay is not greater than r(T_M + (3 + log2(m/(4r)))T_X). Several bit-parallel multipliers have been reported for multiplication in GF(2^m) [11–13]. Table 5 compares them with the proposed multiplier; it shows that the proposed ESP-based multiplier has much lower hardware complexity than the existing multipliers. Implementation issues have also been discussed, especially the reduction of the space complexity incurred by the ESP-based multiplier.

Figure 4. The bit-parallel ESP-based multiplier over GF(2^{mr}).

Figure 5. The detailed circuit of the U-cell. (Note: the latch symbol in the figure denotes a 1-bit latch.)

Table 5 Comparing Polynomial Basis Multipliers with Generating ESPs of Degree m
Multiplier | XOR gates | MUXs | AND gates | Delay
Itoh–Tsujii [11] | (m + r)^2 − r | 0 | (m + r)^2 | T_A + (log2 m + log2(m + r + 1))T_X
Hasan et al. [12] | m^2 + m − r | 0 | m^2 | T_A + (m/r + log2 m)T_X
Lee et al. [9] | (m + r)^2 | 0 | (m + r)^2 | (m + r)(T_A + T_X)
Fig. 4 (proposed) | 1.5m^2/r + 1.5m − r | (m/4)(m/r + 1) | 0 | r(T_M + (3 + log2(m/(4r)))T_X)

5. Conclusions
This investigation has presented a new multiplexer-based bit-parallel multiplier for GF(2^m) whose elements are generated by irreducible AOPs. The proposed architecture alternates 4 × 1 MUXs and XOR gates to implement an efficient bit-parallel multiplier. For certain degrees of GF(2^m), an irreducible ESP of high degree can be obtained from a corresponding irreducible AOP of very low degree. Using subword parallel processing, we have shown that the AOP-based multiplier for a small field can be used to construct all corresponding ESP-based multipliers for larger fields. This expansion method is regular and simple, and thus provides a convenient way to design parallel multipliers for very large finite fields. As m increases, the proposed ESP-based multiplier has lower complexity than the others in Table 5. Moreover, both multipliers are simple and modular, and thus well suited to VLSI implementation.

Acknowledgements
The authors would like to thank the anonymous referees and the editor for carefully reading the paper and for their great help in improving it. This work was supported in part by the National Science Council of the Republic of China under grant numbers NSC 95-2221-E-262-014 and NSC 95-2221-E-231-019.

References
[1] E.R. Berlekamp, Algebraic coding theory (New York: McGraw-Hill, 1968).
[2] D.E.R. Denning, Cryptography and data security (Reading, MA: Addison-Wesley, 1983).
[3] M.Y. Rhee, Cryptography and secure communications (Singapore: McGraw-Hill, 1994).
[4] A. Menezes, P. van Oorschot, & S. Vanstone, Handbook of applied cryptography (Boca Raton, FL: CRC Press, 1997).
[5] W. Diffie & M. Hellman, New directions in cryptography, IEEE Transactions on Information Theory, IT-22, 1976, 644–654.
[6] S.T.J. Fenn, M.G. Parker, M. Benaissa, & D. Taylor, Bit-serial multiplication in GF(2^m) using irreducible all-one polynomials, IEE Proceedings-Computers and Digital Techniques, 144, 1997, 391–393.
[7] C.K. Koc & B. Sunar, Low-complexity bit-parallel canonical and normal basis multipliers for a class of finite fields, IEEE Transactions on Computers, 47, March 1998, 353–356.
[8] H. Wu & M.A. Hasan, Low-complexity bit-parallel multipliers for a class of finite fields, IEEE Transactions on Computers, 47(8), August 1998, 883–887.
[9] C.Y. Lee, E.H. Lu, & J.Y. Lee, Bit-parallel systolic multipliers for GF(2^m) fields defined by all-one and equally-spaced polynomials, IEEE Transactions on Computers, 50, May 2001, 385–393.
[10] C.Y. Lee, E.H. Lu, & L.F. Sun, Low-complexity bit-parallel systolic architectures for computing AB^2 + C in a class of finite field GF(2^m), IEEE Transactions on Circuits and Systems, Part II, 48, May 2001, 519–523.
[11] T. Itoh & S. Tsujii, Structure of parallel multipliers for a class of fields GF(2^m), Information and Computation, 83, 1989, 21–40.
[12] M.A. Hasan, M.Z. Wang, & V.K. Bhargava, Modular construction of low complexity parallel multipliers for a class of finite fields GF(2^m), IEEE Transactions on Computers, 41(8), August 1992, 962–971.
[13] C.Y. Lee, E.H. Lu, & J.Y. Lee, Bit-parallel systolic modular multipliers for a class of GF(2^m), 15th IEEE Symp. Computer Arithmetic (ARITH-15), Vail, Colorado, USA, June 2001, 51–58.
[14] R.B. Lee, Subword parallelism with MAX-2, IEEE Micro, 16, 1996, 51–59.
[15] A.D. Booth, A signed binary multiplication technique, Quarterly Journal of Mechanics and Applied Mathematics, 4, 1951, 236–240.
[16] K.Z. Pekmestzi, Multiplexer-based array multipliers, IEEE Transactions on Computers, 48(1), January 1999, 15–23.
[17] S.M. Kang & Y. Leblebici, CMOS digital integrated circuits: analysis and design (McGraw-Hill, 1999).

Biographies

Chiou-Yng Lee received his Bachelor's degree in medical engineering (1986) and his M.S. degree in electronic engineering (1992), both from Chung Yuan Christian University, Taiwan. He received his Ph.D. in electrical engineering from Chang Gung University, Taiwan, in 2001. From 1988 to 2005, he was with the Chunghwa Telecommunication Laboratory in Taiwan. Since 2005, he has been with the Department of Computer Information and Network Engineering, Lunghwa University of Science and Technology. His research interests include computations in finite fields, error-control coding, signal processing and digital transmission systems. He is a member of the IEEE and was elected to the Phi Tau Phi honor society in 2001.

Che Wun Chiou received his B.S. degree in Electronic Engineering from Chung Yuan Christian University in 1982. He received his M.S. and Ph.D. degrees in Electrical Engineering from National Cheng Kung University in 1984 and 1989, respectively. From 1990 to 2000, he was with the Chung Shan Institute of Science and Technology in Taiwan. He joined the Department of Electronic Engineering, Ching Yun University, in 2000. He is currently Dean of Electrical Engineering and Computer Science and Professor in the Department of Computer Science and Information Engineering at Ching Yun University. His current research interests include fault-tolerant computing, computer arithmetic, parallel processing and cryptography.
