Compact Bit-Parallel Systolic Montgomery Multiplication Over GF(2m) Generated by Trinomials Chiou-Yng Lee1, Chin-Chin Chen2 and Erl-Huei Lu2 1
Lunghwa University of Science and Technology, Email:
[email protected] 2 The Department of Electrical Engineering, Chang Gung University
Abstract--This
paper presents a scalable and systolic Montgomery’s algorithm in GF(2m) using the Hankel matrixvector representation. The hardware architectures derived from this algorithm represents low-complexity bit-parallel systolic multipliers with trinomials. The results reveal that our proposed multiplier saves approximately 36% space complexity as compared to an existing systolic Montgomery multiplier for trinomials. Moreover, the proposed architectures have the features of regularity, modularity, and local interconnect ability. Accordingly, they are well suited for VLSI implementation.
Keywords: Bit-Parallel Systolic Multiplier, Hankel MatrixVector, Trinomial
1. Introduction Finite field arithmetic operations, especially for the binary field GF(2m), have been widely adopted in cryptography and error-control codes. In particular, two public-key cryptography schemes, elliptic and hyperelliptic curve cryptosystems [1], require arithmetic operations to be performed in finite field. Both approaches of software implementations and hardware architectures for the finite field GF(2m) have been studied extensively. In the finite field [2], the performance of a cryptosystem is primarily determined by an efficient implementation of the arithmetic operations, e.g., addition, multiplication and inversion. Inversion can be carried out just using repeated multiplication-squaring algorithm. Therefore, to reduce the complexity of elliptic curve cryptosystems, efficient architectures for multiplication over GF(2m) are desirable. To improve the performance of modular integer multiplications, the Montgomery multiplication algorithm without division operation was originally proposed by P.L. Montgomery [3]. The benefit of the Montgomery multiplication algorithm is that it restructures the multiplication operations such that the modular adjustment will depend on the least significant digits rather than the most significant digits in conventional modular integer multiplications. Thus, the algorithm replaces division operations with simple addition and shifting operations. Furthermore, the Montgomery multiplication algorithm [4] applies the modular reduction based on the least significant
digit rather than the most significant digit in conventional modular multiplication algorithms. Up to date, several modular multiplication algorithms and architectures for the field GF(2m) which are based on the Montgomery multiplication concept have been proposed in [5,6]. Lee et al. [7] developed a transformation method converting the circuit from AOP-basis multiplier [8] into a bit-parallel systolic Montgomery multiplier for trinomials. With the feature of easy implementation in both software and hardware, the Montgomery multiplication algorithm is very attractive in cryptography, especially in elliptic curve cryptosystems. By employing the Hankel matrix-vector representation, the new Montgomery multiplication algorithm over GF(2m) is presented. As the field is constructed by irreducible trinomials, one shows that the Montgomery multiplication can be decomposed into two Hankel matrixvector multiplications. The low-complexity bit-parallel systolic multiplier derived from this algorithm is also invented. The results reveal that the proposed multiplier saves approximately 36% space complexity as compared to an existing multipliers for trinomials [7,10].
2. The conventional Montgomery multiplication over GF(2m) Let GF(2m) be a finite field of 2m elements. GF(2m) is a vector space over GF(2) of dimension m. A set of m linearly independent vectors is chosen to serve as the basis of representation. Let P(x)=p0+p1x+p2x2+… +pmxm of degree m over GF(2) denote an irreducible primitive polynomial, where p0=pm=1. Any element A(x)∈GF(2m) can be represented with the following polynomial basis representations: A(x)=a0+a1x+a2x2+… +am-1xm-1 . Let A(x),B(x),C(x) be three elements in GF(2m). The Montgomery multiplication efficiently computes C(x)=A(x)⋅B(x)⋅R-1(x) mod P(x), where R(x) satisfies such that gcd(R(x),P(x))=1. Generally, R(x)=xk is commonly chosen as the Montgomery factor, because the reduction modulo xk makes the terms of order larger than k for the remainder operation can be negligible, and the division by xk is just to
shift the polynomial to the right by k places for the division. Since P(x) and R(x) are relatively prime to each other, two polynomials R-1(x) and P’(x) exist with the characteristic that R(x)⋅R-1(x) +P(x)⋅P’(x)=1. Thus, the computation algorithm of the Montgomery multiplication is achieved as follows: Step 1. H(x)=A(x)B(x) . Step 2. U(x)=H(x)⋅P’(x) mod R(x) . Step 3. C(x)=(H(x)+U(x)⋅P(x))/R(x) mod P(x) . As stated above, it is found that efficient multiplier architecture can be obtained if R(x) is properly chosen according to the irreducible polynomial P(x). For example, if the field is generated with a trinomial P(x)=xm+xk+1, then the choice of R(x)=xk turns out to facilitate the implementation of a bit-parallel systolic multiplier, as seen in [7].
3.
The proposed bit-parallel systolic Montgomery multiplier over GF(2m) for all trinomials
The proposed Montgomery multiplication algorithm is firstly depicted. Then the proposed architecture based on this proposed algorithm is developed. The complexity analysis is also made. 3.1 Algorithm Let A(x)=am-1xm-1+…+a1x+a0 and B(x)=bm-1xm-1+ … +b1x+b0 be two elements in GF(2m), where the field is constructed from an irreducible polynomial P(x)=xm+xn+1 over GF(2). Assume that the intermediate product T(x)=t2m-2x2m2 +…+t1x+t0 is the general multiplication of A(x) and B(x). Assume that the intermediate product T(x) is represented by T(x)=T1+ T2xn+T3xm+n, where
T1 = t 0 + t1 x + ... + t n −1 x n−1 , T2 = t n + t n+1 x + ... + t m + n −1 x m −1 , T 3= t m + n −1 + t m + n +1 x + ... + t 2 m − 2 x m − n − 2 . Let the Montgomery parameter R(x) be chosen by using R(x)=xn. The Mongomery multiplication of A(x) and B(x) can be rewritten by C ( x) = A( x) B( x) x − n mod( x m + x n + 1) T + T2 x n + T3 x m + n + T1 ( x m + x n + 1) = 1 xn = T2 + T3 x m + T1 ( x m −n + 1) = T2 + T3 + T1 x m − n + (T1 + T3 x n ) = C 0 + C1 ,
where C 0 = T2 + T3 + T1 x m− n = c0,0 + c0,1 x + " + c 0, m−1 x m−1 , and C1 = T1 + T3 x n = c1,0 + c1,1 x + " + c1,m− 2 x m− 2 .
Definition 1. An m×m matrix H is called a Hankel matrix if it satisfies the relation H(p,q)=H(p-1,q+1), for 1≤p, q = Q (i ) ⊙ A mod 2 } 4 return C = [c0 , c1 ," , c m−1 ] .
B(x)
A(x)
Hankel matrix addition 2m-1 bit
Bit-parallel systolic Hankel multiplier
C(x)
Fig.2. Bit-parallel systolic Montgomery multiplier architecture for all trinomials a0
h0
a1 1
D
U00
c2
h1
a2 2
1
D
D
U01
U02
h2
a3
2
D
D
3
h3
a4
3
D
4
D
U03
4
U04
3.2 Architecture
D
h5
6
h6
5
Given irreducible trinomials, the Montgomery multiplication can be decomposed into two Hankel matrixvector multiplications, as shown in Eq.(3). Observing Algorithm 1, Step 1 identically initializes two Hankel vectors. Both vectors are permuted only by the element B(x). Step 2 performs a straightforward matrix addition to obtain H=K0+K1 requiring a total of m-1 XOR gates. As in Example 2, one has two Hankel vectors, K 0 =[b0,b1,b2,b3,b4,b0,b1,b2,b3]
K1 = [b3,b4, 0,0,0,0,0,b0,b1]. The matrix addition H=K0+K1 can be represented by the vector H =[h0,h1,h2,h3, h4,h5,h6,h7,h8]= [b0+b3,b1+b4,b2,b3,b4,b0,b1,b2+b0,b3+b1], as shown in Fig. 1. Step 3 is the final step to achieve the Montgomery multiplication for the field GF(2m) generated by irreducible trinomial with Hankel matrix-vector multiplication. Therefore, the proposed Montgomery multiplier could be decomposed into two modules, a matrix addition circuit and a Hankel multiplier, as shown in Fig. 2. Fig. 3 shows the proposed Hankel multiplier incorporating of m×m U-cells. Each U-cell is composed of one AND gate, one XOR gate and two 1-bit latches, as shown in Fig.4. Therefore, the Montgomery multiplication requires 2m-1 clock cycles in total. b0
h4 D
b1
b2
b3
c3 D1
U10
U11
U12
U13
U14 D
c4 D2
U20
U21
U22
U23
U24 7
h7
8
h8
D
c0 D3
U30
U31
U32
U33
U34 D
c1
4
D
U40
U41
U42
U43
U44
Fig.3. The bit-parallel systolic Hankel multiplier am-i
hi+j
1
D
c 1
D
b4
Fig. 4. The detailed circuit of the U-cell
h0
h1
h2
h3
h4
h5
h6
h7
h8
3.3 Complexity Recently, a bit-parallel systolic Montgomery multiplier using the matrix-vector approach has been proposed by Lee et
al. in [7]. However, a problem was encountered in this circuit. A trinomial-based multiplier is not flexible. Owing to realize a low-complexity systolic array, the multiplication algorithm is based on the fixed irreducible trinomial of degree m. Accordingly, the architecture should be redesigned to yield the finite field GF(2m) established from irreducible trinomials of various types. The proposed multiplier does overcome such a problem. The complexity of the proposed multiplier and Lee et al.’s multiplier is estimated. The transistor count based on the standard CMOS VLSI realization is employed for comparison. Therefore, some basic logic gates: 2-input XOR, 2-input AND, 1×2 SW, and 1-bit latch are composed of 6, 6, 6 and 8 transistors, respectively [9]. Table 1 illustrates the comparison of two multipliers with the number of transistors. As m is large, the space complexity of the proposed multiplier is about 36% lower than both multipliers [10,7].
4. Conclusions This study develops a new way to realize bit-parallel systolic Montgomery multipliers over GF(2m) under an Hankel matrix-vector multiplication. As the field is constructed from irreducible trinomials, it shows that the Montgomery multiplication can be decomposed into two Hankel matrix-vector multiplications. Compared with both multipliers [10,7], the proposed architecture can save up to 36% space complexity while maintaining approximate single data processing performance.
References [1] N. Kobliz , “Elliptic Curve Cryptography,” Math. Computation, Vol. 48, pp. 203-209, 1987. [2] Lidl, R. and Niederreiter, H., Introduction to Finite Fields and Their Applications, New York: Cambridge Univ. Press, 1994. [3] P.L. Montgomery, "Modular multiplication without trial division," Math. Comp., Vol. 44, pp. 519-521, 1985. [4] Ç.K. Koç and T. Acar, "Montgomery multiplication in GF(2k)," Designs, Codes, and Cryptography, Vol. 14, pp. 57-69, 1998. [5] A.A.A. Gutuba and A.F.Tenca, "Efficient scalable VLSI architecture for Montgomery inversion in GF(p)," INTEGRATION, the VLSI journal, Vol.37, pp.103–120, 2004 [6] C.W. Chiou, C.Y. Lee, A.W. Deng and J.M. Lin, "Efficient VLSI implementation for Montgomery multiplication in GF(2m)," to appear in Tamkang Journal of Science and Engineering, 2006 [7] C.Y. Lee, J.S. Horng, and I.C. Jou, "Low-complexity bit-parallel systolic Montgomery multipliers for special classes of GF(2m)," IEEE Trans. Computers, Vol. 54, pp. 1061-1070, 2005. [8] C.Y. Lee, E.H. Lu, and J.Y. Lee, "Bit-parallel systolic multipliers for GF(2m) fields defined by all-one and equally-spaced polynomials," IEEE Trans. Computers, Vol.50, No. 5, ,pp. 385393, May 2001. [9] S.M. Kang and Y. Leblebici, CMOS Digital Integrated CircuitsAnalysis and Design, McGraw-Hill, 1999. [10] C.Y. Lee, "Low-complexity bit-parallel systolic multiplier over GF(2m) using irreducible trinomials," IEE Proceedings Computer Digital Technology, Vol. 150, pp. 39-42, 2003.
Table 1. A comparison of bit-parallel systolic Montgomery multipliers for trinomials Multipliers # AND # XOR Latches Latency transistors Fig. 2 m2 m2+m-1 2m2 2m-1 28m2 +6m-6 [7] for xm+xm-1+1 m2 1.5m2+m 4m2 m+1 47m2+8m m 2 2 2 [7] for x +x+1 m 1.5m +0.5m 4m m+1 47m2+5m 2 2 2 Lee’s [10] m m + m-1 4m +2m-2 2m-1 44 m2+22m-22