Compact Bit-Parallel Systolic Montgomery Multiplication ... - CiteSeerX

Compact Bit-Parallel Systolic Montgomery Multiplication over GF(2^m) Generated by Trinomials

Chiou-Yng Lee (1), Chin-Chin Chen (2) and Erl-Huei Lu (2)

(1) Lunghwa University of Science and Technology, Email: [email protected]
(2) The Department of Electrical Engineering, Chang Gung University

Abstract: This paper presents a scalable and systolic Montgomery algorithm in GF(2^m) using the Hankel matrix-vector representation. The hardware architectures derived from this algorithm are low-complexity bit-parallel systolic multipliers for trinomials. The results reveal that the proposed multiplier saves approximately 36% space complexity compared to an existing systolic Montgomery multiplier for trinomials. Moreover, the proposed architectures have the features of regularity, modularity, and local interconnection. Accordingly, they are well suited for VLSI implementation.

Keywords: Bit-Parallel Systolic Multiplier, Hankel Matrix-Vector, Trinomial

1. Introduction

Finite field arithmetic operations, especially over the binary field GF(2^m), have been widely adopted in cryptography and error-control codes. In particular, two public-key cryptography schemes, elliptic and hyperelliptic curve cryptosystems [1], require arithmetic operations to be performed in a finite field. Both software implementations and hardware architectures for GF(2^m) arithmetic have been studied extensively. The performance of a cryptosystem is primarily determined by how efficiently the finite field [2] arithmetic operations, e.g., addition, multiplication and inversion, are implemented. Inversion can be carried out by a repeated multiplication-and-squaring algorithm. Therefore, to reduce the complexity of elliptic curve cryptosystems, efficient architectures for multiplication over GF(2^m) are desirable.

To improve the performance of modular integer multiplication, the Montgomery multiplication algorithm, which requires no division operation, was originally proposed by P.L. Montgomery [3]. Its benefit is that it restructures the multiplication so that the modular adjustment depends on the least significant digits, rather than on the most significant digits as in conventional modular integer multiplication. Thus, the algorithm replaces division operations with simple addition and shifting operations. The Montgomery multiplication algorithm in [4] likewise bases the modular reduction on the least significant digit rather than the most significant digit.

To date, several modular multiplication algorithms and architectures for GF(2^m) based on the Montgomery multiplication concept have been proposed [5,6]. Lee et al. [7] developed a transformation method that converts an AOP-basis multiplier circuit [8] into a bit-parallel systolic Montgomery multiplier for trinomials. Being easy to implement in both software and hardware, the Montgomery multiplication algorithm is very attractive in cryptography, especially in elliptic curve cryptosystems.

In this paper, a new Montgomery multiplication algorithm over GF(2^m) is presented that employs the Hankel matrix-vector representation. When the field is constructed from an irreducible trinomial, it is shown that the Montgomery multiplication can be decomposed into two Hankel matrix-vector multiplications. A low-complexity bit-parallel systolic multiplier derived from this algorithm is also presented. The results reveal that the proposed multiplier saves approximately 36% space complexity compared to existing multipliers for trinomials [7,10].

2. The conventional Montgomery multiplication over GF(2^m)

Let GF(2^m) be a finite field of $2^m$ elements. GF(2^m) is a vector space of dimension m over GF(2), so a set of m linearly independent vectors can be chosen to serve as a basis of representation. Let $P(x)=p_0+p_1x+p_2x^2+\dots+p_mx^m$ be an irreducible primitive polynomial of degree m over GF(2), where $p_0=p_m=1$. Any element A(x) of GF(2^m) can be represented in the polynomial basis as $A(x)=a_0+a_1x+a_2x^2+\dots+a_{m-1}x^{m-1}$.

Let A(x), B(x), C(x) be three elements in GF(2^m). The Montgomery multiplication computes $C(x)=A(x)\cdot B(x)\cdot R^{-1}(x) \bmod P(x)$, where R(x) satisfies $\gcd(R(x),P(x))=1$. $R(x)=x^k$ is commonly chosen as the Montgomery factor, because reduction modulo $x^k$ simply discards the terms of degree k and above, and division by $x^k$ is just a right shift of the polynomial by k places. Since P(x) and R(x) are relatively prime, two polynomials $R^{-1}(x)$ and $P'(x)$ exist such that $R(x)\cdot R^{-1}(x)+P(x)\cdot P'(x)=1$. The Montgomery multiplication is then computed as follows:

Step 1. $H(x)=A(x)B(x)$.
Step 2. $U(x)=H(x)\cdot P'(x) \bmod R(x)$.
Step 3. $C(x)=(H(x)+U(x)\cdot P(x))/R(x) \bmod P(x)$.

As stated above, an efficient multiplier architecture can be obtained if R(x) is properly chosen according to the irreducible polynomial P(x). For example, if the field is generated by a trinomial $P(x)=x^m+x^k+1$, then the choice $R(x)=x^k$ facilitates the implementation of a bit-parallel systolic multiplier, as seen in [7].
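Steps 1 to 3 can be sketched in software as follows. This is a minimal illustration rather than the hardware algorithm of the paper: polynomials over GF(2) are stored as Python integers (bit i holds the coefficient of x^i), and the helper names `clmul`, `polymod`, `inv_mod_xk` and `montgomery_mul` are hypothetical names introduced only for this example.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x]; bit i of an int is the coefficient of x^i."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def polymod(a: int, p: int) -> int:
    """Remainder of a modulo p in GF(2)[x]."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


def inv_mod_xk(p: int, k: int) -> int:
    """P'(x) with P(x) * P'(x) = 1 mod x^k; requires the constant term p_0 = 1."""
    inv = 1
    for i in range(1, k):
        # If bit i of P * inv is set, adding x^i to inv clears it (because p_0 = 1).
        if (clmul(p, inv) >> i) & 1:
            inv |= 1 << i
    return inv


def montgomery_mul(a: int, b: int, p: int, k: int) -> int:
    """A(x) * B(x) * x^(-k) mod P(x), following Steps 1 to 3 with R(x) = x^k."""
    mask = (1 << k) - 1
    h = clmul(a, b)                                # Step 1: H = A * B
    u = clmul(h & mask, inv_mod_xk(p, k)) & mask   # Step 2: U = H * P' mod x^k
    return polymod((h ^ clmul(u, p)) >> k, p)      # Step 3: (H + U * P) / x^k mod P


P = 0b100101  # x^5 + x^2 + 1
print(bin(montgomery_mul(1, 1, P, 2)))  # x^(-2) mod P(x), prints 0b1001
```

For the trinomial x^5 + x^2 + 1 with R(x) = x^2, P(x) reduces to 1 modulo R(x), so P'(x) = 1 and Step 2 only keeps the two lowest coefficients of H(x).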

3. The proposed bit-parallel systolic Montgomery multiplier over GF(2^m) for all trinomials

The proposed Montgomery multiplication algorithm is described first. The architecture based on this algorithm is then developed, followed by a complexity analysis.

3.1 Algorithm

Let $A(x)=a_{m-1}x^{m-1}+\dots+a_1x+a_0$ and $B(x)=b_{m-1}x^{m-1}+\dots+b_1x+b_0$ be two elements in GF(2^m), where the field is constructed from an irreducible trinomial $P(x)=x^m+x^n+1$ over GF(2). Let the intermediate product $T(x)=t_{2m-2}x^{2m-2}+\dots+t_1x+t_0$ be the ordinary polynomial product of A(x) and B(x), and write it as $T(x)=T_1+T_2x^n+T_3x^{m+n}$, where

$$T_1 = t_0 + t_1x + \dots + t_{n-1}x^{n-1},$$
$$T_2 = t_n + t_{n+1}x + \dots + t_{m+n-1}x^{m-1},$$
$$T_3 = t_{m+n} + t_{m+n+1}x + \dots + t_{2m-2}x^{m-n-2}.$$

Let the Montgomery parameter be $R(x)=x^n$. The Montgomery multiplication of A(x) and B(x) can be rewritten as

$$\begin{aligned}
C(x) &= A(x)B(x)x^{-n} \bmod (x^m+x^n+1) \\
&= \frac{T_1+T_2x^n+T_3x^{m+n}+T_1(x^m+x^n+1)}{x^n} \\
&= T_2+T_3x^m+T_1(x^{m-n}+1) \\
&= T_2+T_3+T_1x^{m-n}+(T_1+T_3x^n) \\
&= C_0+C_1,
\end{aligned}$$

where the fourth line uses $x^m \equiv x^n+1 \pmod{P(x)}$, $C_0=T_2+T_3+T_1x^{m-n}=c_{0,0}+c_{0,1}x+\dots+c_{0,m-1}x^{m-1}$, and $C_1=T_1+T_3x^n=c_{1,0}+c_{1,1}x+\dots+c_{1,m-2}x^{m-2}$.
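The decomposition C(x) = C_0 + C_1 can be checked numerically with a short sketch. It assumes polynomials are stored as Python integers (bit i is the coefficient of x^i); the function names are hypothetical and the split of T(x) into T_1, T_2, T_3 follows the definitions above.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def polymod(a: int, p: int) -> int:
    """Remainder of a modulo p in GF(2)[x]."""
    dp = p.bit_length() - 1
    while a.bit_length() - 1 >= dp:
        a ^= p << (a.bit_length() - 1 - dp)
    return a


def montgomery_trinomial(a: int, b: int, m: int, n: int) -> int:
    """A(x) * B(x) * x^(-n) mod (x^m + x^n + 1), computed as C0 + C1."""
    t = clmul(a, b)                     # T(x), degree at most 2m - 2
    t1 = t & ((1 << n) - 1)             # T1: t_0 .. t_{n-1}
    t2 = (t >> n) & ((1 << m) - 1)      # T2: t_n .. t_{m+n-1}
    t3 = t >> (m + n)                   # T3: t_{m+n} .. t_{2m-2}
    c0 = t2 ^ t3 ^ (t1 << (m - n))      # C0 = T2 + T3 + T1 * x^(m-n)
    c1 = t1 ^ (t3 << n)                 # C1 = T1 + T3 * x^n
    return c0 ^ c1


def check(m: int, n: int) -> bool:
    """Verify C(x) * x^n = A(x) * B(x) mod P(x) for every pair of field elements."""
    p = (1 << m) | (1 << n) | 1
    return all(
        polymod(clmul(montgomery_trinomial(a, b, m, n), 1 << n), p)
        == polymod(clmul(a, b), p)
        for a in range(1 << m)
        for b in range(1 << m)
    )


print(check(5, 2))  # exhaustive check for x^5 + x^2 + 1
```

No explicit reduction modulo P(x) appears in `montgomery_trinomial`: both C_0 and C_1 already have degree below m, which is the point of the rewriting.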

Definition 1. An m×m matrix H is called a Hankel matrix if it satisfies the relation H(p,q) = H(p-1,q+1), for 1≤p, q …

… = Q(i) ⊙ A mod 2 }
4 return C = [c_0, c_1, …, c_{m-1}].

Fig. 2. Bit-parallel systolic Montgomery multiplier architecture for all trinomials (blocks: Hankel matrix addition, 2m-1 bits; bit-parallel systolic Hankel multiplier; signals: A(x), B(x), C(x)).

3.2 Architecture


Given irreducible trinomials, the Montgomery multiplication can be decomposed into two Hankel matrix-vector multiplications, as shown in Eq. (3). In Algorithm 1, Step 1 initializes two Hankel vectors, both of which are built solely from the coefficients of B(x). Step 2 performs a straightforward matrix addition to obtain H = K_0 + K_1, requiring a total of m-1 XOR gates. As in Example 2, one has the two Hankel vectors

$$K_0 = [b_0, b_1, b_2, b_3, b_4, b_0, b_1, b_2, b_3],$$
$$K_1 = [b_3, b_4, 0, 0, 0, 0, 0, b_0, b_1].$$

The matrix addition H = K_0 + K_1 can be represented by the vector $H = [h_0, h_1, h_2, h_3, h_4, h_5, h_6, h_7, h_8] = [b_0+b_3,\ b_1+b_4,\ b_2,\ b_3,\ b_4,\ b_0,\ b_1,\ b_2+b_0,\ b_3+b_1]$, as shown in Fig. 1. Step 3 is the final step, which completes the Montgomery multiplication for the field GF(2^m) generated by an irreducible trinomial by means of a Hankel matrix-vector multiplication. Therefore, the proposed Montgomery multiplier can be decomposed into two modules, a matrix addition circuit and a Hankel multiplier, as shown in Fig. 2. Fig. 3 shows the proposed Hankel multiplier, which consists of m×m U-cells. Each U-cell is composed of one AND gate, one XOR gate and two 1-bit latches, as shown in Fig. 4. The Montgomery multiplication requires 2m-1 clock cycles in total.
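The two Hankel vectors above can be exercised with a small software sketch. Since Example 2 is not reproduced here, several things are assumptions: the parameters m = 5, n = 2 and P(x) = x^5 + x^2 + 1 are inferred from the shape of K_0 and K_1, and the pairing of row i with the coefficients a_{m-1-j} and the output position c_{(i+2) mod 5} is one reading of the cell labels in Fig. 3. The sketch checks this reading against a direct computation; all function names are hypothetical.

```python
M, N = 5, 2            # assumed example parameters: P(x) = x^5 + x^2 + 1, R(x) = x^2
P = 0b100101


def hankel_vector(b):
    """H = K0 + K1 over GF(2); b[i] is the coefficient of x^i in B(x)."""
    k0 = [b[0], b[1], b[2], b[3], b[4], b[0], b[1], b[2], b[3]]
    k1 = [b[3], b[4], 0, 0, 0, 0, 0, b[0], b[1]]
    return [x ^ y for x, y in zip(k0, k1)]


def hankel_montgomery(a, b):
    """Hankel matrix-vector product; row i uses h_i .. h_{i+4} (assumed mapping)."""
    h = hankel_vector(b)
    c = [0] * M
    for i in range(M):
        acc = 0
        for j in range(M):
            acc ^= h[i + j] & a[M - 1 - j]   # one AND and one XOR, as in a U-cell
        c[(i + N) % M] = acc                 # rows deliver c2, c3, c4, c0, c1
    return c


def to_bits(v):
    return [(v >> i) & 1 for i in range(M)]


def to_int(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def reference(a, b):
    """Direct A * B * x^(-2) mod P(x) on integers, using x^(-2) = x^3 + 1."""
    prod = 0
    for i in range(M):                       # carry-less A * B
        if (b >> i) & 1:
            prod ^= a << i
    t, r = 0b1001, 0
    while t:                                 # carry-less multiply by x^3 + 1
        if t & 1:
            r ^= prod
        prod <<= 1
        t >>= 1
    while r.bit_length() > M:                # reduce modulo P(x)
        r ^= P << (r.bit_length() - 1 - M)
    return r


ok = all(
    to_int(hankel_montgomery(to_bits(a), to_bits(b))) == reference(a, b)
    for a in range(32)
    for b in range(32)
)
print(ok)
```

Under these assumptions the Hankel vector needs exactly four XOR operations (h_0, h_1, h_7, h_8), which agrees with the m-1 XOR gates stated for Step 2.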

Fig. 3. The bit-parallel systolic Hankel multiplier (5×5 array of U-cells U00 to U44; inputs a0 to a4 and h0 to h8; outputs c0 to c4).

Fig. 4. The detailed circuit of the U-cell (inputs a_{m-i} and h_{i+j}).

3.3 Complexity

Recently, a bit-parallel systolic Montgomery multiplier using the matrix-vector approach was proposed by Lee et al. in [7]. However, that circuit has a drawback: a trinomial-based multiplier is not flexible. To realize a low-complexity systolic array, its multiplication algorithm is based on a fixed irreducible trinomial of degree m. Accordingly, the architecture must be redesigned for each field GF(2^m) defined by a different type of irreducible trinomial. The proposed multiplier overcomes this problem.

The complexity of the proposed multiplier and of Lee et al.'s multiplier is estimated using the transistor count of a standard CMOS VLSI realization. The basic logic gates, namely the 2-input XOR, 2-input AND, 1×2 SW, and 1-bit latch, are assumed to consist of 6, 6, 6 and 8 transistors, respectively [9]. Table 1 compares the multipliers in terms of the number of transistors. For large m, the space complexity of the proposed multiplier is about 36% lower than that of the multipliers in [7,10].
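The transistor totals can be recomputed from the gate counts with a short sketch. It uses only the per-gate costs quoted above (6 for AND, 6 for XOR, 8 for a 1-bit latch) and the gate counts listed in Table 1 for the proposed multiplier (Fig. 2) and for the multiplier of [10]; the function names are hypothetical, and m = 233 is an arbitrary field size chosen for illustration.

```python
COST = {"and": 6, "xor": 6, "latch": 8}   # transistors per gate, as quoted from [9]


def transistors(n_and: int, n_xor: int, n_latch: int) -> int:
    """Total transistor count for the given numbers of gates and latches."""
    return COST["and"] * n_and + COST["xor"] * n_xor + COST["latch"] * n_latch


def proposed(m: int) -> int:
    """Row 'Fig. 2' of Table 1: m^2 AND, m^2+m-1 XOR, 2m^2 latches."""
    return transistors(m * m, m * m + m - 1, 2 * m * m)


def lee_2003(m: int) -> int:
    """Row 'Lee's [10]' of Table 1: m^2 AND, m^2+m-1 XOR, 4m^2+2m-2 latches."""
    return transistors(m * m, m * m + m - 1, 4 * m * m + 2 * m - 2)


def saving(m: int) -> float:
    """Relative transistor saving of the proposed multiplier versus [10]."""
    return 1.0 - proposed(m) / lee_2003(m)


m = 233
print(proposed(m), lee_2003(m), round(saving(m), 4))
```

The two functions reproduce the closed forms 28m^2+6m-6 and 44m^2+22m-22 from Table 1, and for growing m the saving approaches 1 - 28/44, which is roughly 36%.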

4. Conclusions

This study develops a new way to realize bit-parallel systolic Montgomery multipliers over GF(2^m) based on Hankel matrix-vector multiplication. When the field is constructed from an irreducible trinomial, it is shown that the Montgomery multiplication can be decomposed into two Hankel matrix-vector multiplications. Compared with the multipliers in [7,10], the proposed architecture can save up to 36% space complexity while maintaining approximately the same single-data processing performance.

References

[1] N. Koblitz, "Elliptic curve cryptosystems," Math. Computation, Vol. 48, pp. 203-209, 1987.
[2] R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their Applications, New York: Cambridge Univ. Press, 1994.
[3] P.L. Montgomery, "Modular multiplication without trial division," Math. Comp., Vol. 44, pp. 519-521, 1985.
[4] Ç.K. Koç and T. Acar, "Montgomery multiplication in GF(2^k)," Designs, Codes, and Cryptography, Vol. 14, pp. 57-69, 1998.
[5] A.A.A. Gutuba and A.F. Tenca, "Efficient scalable VLSI architecture for Montgomery inversion in GF(p)," INTEGRATION, the VLSI journal, Vol. 37, pp. 103-120, 2004.
[6] C.W. Chiou, C.Y. Lee, A.W. Deng and J.M. Lin, "Efficient VLSI implementation for Montgomery multiplication in GF(2^m)," to appear in Tamkang Journal of Science and Engineering, 2006.
[7] C.Y. Lee, J.S. Horng, and I.C. Jou, "Low-complexity bit-parallel systolic Montgomery multipliers for special classes of GF(2^m)," IEEE Trans. Computers, Vol. 54, pp. 1061-1070, 2005.
[8] C.Y. Lee, E.H. Lu, and J.Y. Lee, "Bit-parallel systolic multipliers for GF(2^m) fields defined by all-one and equally-spaced polynomials," IEEE Trans. Computers, Vol. 50, No. 5, pp. 385-393, May 2001.
[9] S.M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits: Analysis and Design, McGraw-Hill, 1999.
[10] C.Y. Lee, "Low-complexity bit-parallel systolic multiplier over GF(2^m) using irreducible trinomials," IEE Proceedings - Computers and Digital Techniques, Vol. 150, pp. 39-42, 2003.

Table 1. A comparison of bit-parallel systolic Montgomery multipliers for trinomials

Multipliers              # AND   # XOR          Latches      Latency   Transistors
Fig. 2                   m^2     m^2+m-1        2m^2         2m-1      28m^2+6m-6
[7] for x^m+x^(m-1)+1    m^2     1.5m^2+m       4m^2         m+1       47m^2+8m
[7] for x^m+x+1          m^2     1.5m^2+0.5m    4m^2         m+1       47m^2+5m
Lee's [10]               m^2     m^2+m-1        4m^2+2m-2    2m-1      44m^2+22m-22
