Efficient VLSI Implementation for Montgomery Multiplication in GF(2m)

3 downloads 183 Views 637KB Size Report
Jan 9, 2006 - rithm [24], and Digital Signature Standard [25], modular multiplication is ... word-size processing units, their multiplier can handle operands of ...
Tamkang Journal of Science and Engineering, Vol. 9, No 4, pp. 365-372 (2006)

365

Efficient VLSI Implementation for Montgomery Multiplication in GF(2m) Che-Wun Chiou1*, Chiou-Yng Lee2, An-Wen Deng3 and Jim-Min Lin4 1

Department of Computer Science and Information Engineering, Ching Yun University, Chung-Li, Taiwan 320, R.O.C. 2 Department of Computer Information and Network Engineering, Lung Hwa University of Science & Technology, Taoyuan, Taiwan 333, R.O.C. 3 Department of Information Management, Ching Yun University, Chung-Li, Taiwan 320, R.O.C. 4 Department of Information Engineering and Computer Science, Feng Chia University, Taichung City, Taiwan 407, R.O.C.

Abstract The Montgomery multiplication algorithm without division operations is popular both in prime field GF(p) and Finite field GF(2m). However, the Montgomery multiplication algorithm has the time-dependent problem. We will present a time-independent Montgomery multiplication algorithm. The results show that our proposed time-independent Montgomery multiplication algorithm not only saves about 50% time complexity but also saves about 11% space complexity as compared to the traditional Montgomery multiplication algorithm. Our proposed systolic array Montgomery multiplier has simplicity, regularity, modularity, and concurrency, and is very suitable for VLSI implementation. Key Words: Finite Field, Cryptography, ECC, Montgomery Multiplication, Parallel Processing

1. Introduction Recently, finite field arithmetic operations in GF(2m) were frequently desired in coding theory [1], cryptography [2], digital signal processing [3,4], switching theory [5], and pseudorandom number generation [6]. There are three popular types of bases over finite fields; polynomial basis (PB) [7-15], normal basis (NB) [16,17], and dual basis (DB) [18,19]. Among the finite field arithmetic operations, multiplication is the most important, complex and time consuming. In general, other complex operations, such as exponentiation, inversion and division, can be used with Fermat’s theorem to perform the iterative multiplication operations, especially for cryptographic systems. With the advantages of low design complexity, simplicity, regularity, and modularity in architecture, the polynomial basis multipliers are widely used, *Corresponding author. E-mail: [email protected]

producing efficient VLSI multiplier implementations. For most public key cryptosystems such as RSA algorithm [20], Elliptic Curve Cryptography (ECC) [21, 22], NTRU [23], Diffie-Hellman key exchange algorithm [24], and Digital Signature Standard [25], modular multiplication is the most important arithmetic operation. Therefore, there has been growing interest and considerable research activity to develop fast algorithms and architectures of modular multiplications. The Montgomery multiplication algorithm originally proposed in [26] is an efficient method to implement modular multiplication operations in prime fields GF(p) for many public-key cryptosystems such as RSA and ECC. The benefits of using Montgomery multiplication algorithm are: The first, it does not require division operations. The second, it performs the reduction operation depending on the least significant digit rather than the most significant digit used in traditional modular multiplication algorithms. Thus, it eliminates the carry-propagation problem. Up to

366

Che-Wun Chiou et al.

date, many modular multiplication algorithms and architectures based on the Montgomery multiplication concept have been proposed [27-33]. The major drawbacks of the Montgomery multiplication algorithm are that it needs some pre-computation constants and the basis conversion of the result. Koc and Acar [34] has proven that the Montgomery multiplication algorithm is also accurate in GF(2m) fields. Wu [35] proposed an efficient architecture for bitparallel Montgomery multiplier and squarer in GF(2m) fields generated with irreducible trinomials. Fournaris and Koufopavlou [36] presented both folded and pipelined architectures for the Montgomery multiplication in GF(2m). Bajard et al. [37] provided a Montgomery multiplier over GF(pm) other than GF(2m). In 2000, E. SavaS et al. [38] firstly introduced an unified multiplier for GF(P) and GF(2m) which uses Montgomery multiplication algorithm for both fields. Based on the pipeline structure of word-size processing units, their multiplier can handle operands of any size. However, the pre-computed constants and transformations are needed for the multipliers using the Montgomery algorithm. Goodman and Chandrakasan [39] proposed a reconfigurable processor for a full suite of arithmetic operations over prime and binary extension fields by using reprogrammable logic cells. The Montgomery algorithm is employed for computing multiplication in GF(P) by this public-key cryptography processor. The multiplication in GF(2m) utilizes the iterative MSB-first scheme and interleaved modular reduction technique. In [40], Großschädl published a unified bit-serial multiplier which does not require any pre-computations and transformations. Its major disadvantage is that the low performance in GF(P) which is caused by the reduction algorithm and the redundant-to-binary conversion. Wolkerstorfer [41] invented a unified arithmetic unit for both fields with interleaved modular reduction in addition and multiplication. Its arithmetic unit allows operation at high clock frequency by using redundant number representation and efficient modular reduction. Savas et al. [42] proposed two unified multipliers for both fields which are scalable and offer faster computation of multiplication. Their multipliers are also based on the Montgomery algorithm and provided high-radix design for low-power and high-performance applications. Satoh and Takano [43] introduced a scalable dual-field processor for Elliptic Curve Cryptograph by using the Mont-

gomery multiplication algorithm and the Wallace tree. Because ECC requires multiplications in both prime field GF(p) and finite field GF(2m). Thus, the Montgomery multiplication algorithm which is available for both prime and finite fields is attractive for ECC. Therefore, many researchers have focused on the research topic of unified architectures for both GF(p) and GF(2m) fields [38-44]. Both multiplication-addition and modular operations for each iteration of the Montgomery multiplication algorithm are required. Both multiplication-addition and modular operations are time-dependent and then they are time-consuming. To overcome this problem, in this study, we will present a time-independent Montgomery multiplication algorithm over GF(2m) fields. The remainder of this article is organized as follows. Section 2 briefly reviews the traditional Montgomery multiplier. Section 3 presents the proposed time-independent Montgomery multiplier. A brief conclusion is made in the final section, Section 4.

2. Preliminaries The finite field arithmetic operations in GF(2m) were frequently desired in coding theory, cryptography, digital signal processing, switching theory, and pseudorandom number generation. The elements of a GF(2m) field can be represented by three popular types of bases: polynomial basis (PB) [7-15], normal basis (NB) [16,17,45], and dual basis (DB) [18,19]. With the benefits of low design complexity, simplicity, regularity, and modularity in architecture, the polynomial basis representations are widely used due to produce efficient VLSI implementations. Let a(x) and b(x) be elements in GF(2m) generated by an irreducible polynomial f(x) of degree m. The set {x0, x1, x2, …, xm-1} is called the polynomial basis. Three terms a(x), b(x) and f(x) are expressed as follows: a ( x) = a0 x 0 + a1 x1 + a2 x 2 + ... + am -1 x m -1 , b( x) = b0 x 0 + b1 x1 + b2 x 2 + ... + bm -1 x m -1 , and f (x ) = 1 + f1 x + f 2 x 2 + ... + f

m -1 x

m -1

+ xm ,

where ai ,bi , and f j Î GF( 2 ), 0 £ i £ m - 1 and 1 £ j £ m - 1.

Let r(x) be a chosen element in GF(2m) and gcd(f(x), r(x)) = 1. Instead of computing c(x) = a(x) ´ b(x) mod f(x)

Efficient VLSI Implementation for Montgomery Multiplication in GF(2m)

in conventional modular multiplication algorithms, the Montgomery multiplication algorithm computes c(x) = a(x) ´ b(x) ´ r-1(x) mod f(x), where r-1(x) is the inverse of r(x). By choosing the value r(x), the Montgomery multiplication algorithm becomes less complex and gives efficient hardware implementation. A popular selection of r(x) is r(x) = xm. The Montgomery multiplication can be given by c(x) = a(x) ´ b(x) ´ x-m mod f(x) = (a0 + a1x + a2x2 + … + am-1xm-1) ´ b(x) ´ x-m mod f(x) = ((...(((a0b(x)x-1 + a1b(x))x-1 + a2b(x))x-1 + a3b(x))x-1 + ...)x-1 + am-1b(x))x-1

(1)

The iteration algorithm can be used for computing Eq.(1). Let h(x) represent the computing result of the last iteration. The operation in next iteration has the following general form: h (x )´ x -1 + ai b (x ),

where the residue representation of h (x ) is

h0 + h1 x1 + h2 x 2 + ... + hm -1 x m -1

(2)

for all hi Î {0,1} and 0 £ i £ m-1.

Since the result must be modulated by f(x), thus f(x)=0 and we have f ( x) = 1 + f1 x1 + f 2 x 2 + ... + f m -1 x m -1 + f m x m = 0, and multiply both sides by x -1 , x -1 + f1 + f 2 x1 + ... + f m -1 x m - 2 + f m x m -1 = 0,

(3)

thus x -1 = f1 + f 2 x1 + ... + f m -1 x m - 2 + f m x m -1.

367

Based on Eqs. (1) and (4), the traditional bit-level Montgomery multiplication algorithm in [34] is given below: Algorithm-A: (Bit-Level Montgomery Multiplication Algorithm) Input a(x), b(x), f(x) Output c(x): = a(x)b(x)x-m mod f(x) S1: c(x): = 0 S2: For i = 0 To m-1 Do Begin S3: c(x): = c(x) + aib(x) S4: c(x): = c(x) + c0f(x) S5: c(x): = c(x)/x S6: End S7: Return c(x) In VLSI designs, systolic architectures are fundamentally suited to rapid computation and depend on regular circuitry to perform arithmetic operations over GF(2m). Figure 1 shows the semi-systolic array implementation of the above algorithm. The detailed circuit of each unit cell Ui,j is depicted in Figure 2. For the m ´ m multiplication, the latency of this semi-systolic multiplication array requires m clock cycles. Each clock cycle takes delays of two 2-input AND gates, two 2-input XOR gates, and two 1-bit latches. In the bit-level Montgomery multiplication algorithm, the variable c0 in the statement S4 is the bit-0 of c(x) after computing the statement S3. Therefore, statements S3 and S4 must be executed in order. In other words, they are time-dependent. Such time-dependent statements are time-consuming because the value m is very huge, for example, 512 bits to 1024 bits or more, as the Montgomery multiplication algorithm is applied to public-key cryptosystems. To alleviate this problem, a fast algorithm for executing both statements S3 and S4 in parallel will be presented and discussed in the next section.

Using Eq.(3), h(x) ´ x-1 is given by

3. Time-Independent Montgomery Multiplication Algorithm

h( x) ´ x -1 = (h0 + h1 x1 + h2 x 2 + ... + hm -1 x m -1 ) ´ x -1 = h0 x

-1

+ h1 + h2 x + ... + hm -1 x 1

m-2

= h0 ( f1 + f 2 x1 + ... + f m -1 x m - 2 + f m x m -1 ) + h1 + h2 x1 + ... + hm -1 x m - 2 = (h1 + h0 f1 ) + (h2 + h0 f 2 ) x1 + (h3 + h0 f3 ) x 2 + ... +(hm -1 + h0 f m -1 ) x m - 2 + h0 f m x m -1

(4)

As aforementioned, the execution of the statement S4 in the Algorithm-A is depended on the result c0 in the statement S3. Thus, if the statement S3 in the AlgorithmA is pre-computed one step ahead, both statements S3 and S4 can be executed in parallel. Therefore, to overcome the time-dependent problem in the Algorithm-A, the time-independent algorithm is described in the fol-

368

Che-Wun Chiou et al.

Figure 1. The semi-systolic array for the m m bit-level Montgomery multiplier.

S6: End S7: Return c(x)

Figure 2. The detailed circuit of the cell Ui,j in Figure 1.

lowing algorithm. Algorithm-B: (Bit-Level Time-Independent Montgomery Multiplication Algorithm) Input a(x), b(x), f(x) Output c(x): = a(x)b(x)x-m mod f(x) S1: c(x): = 0, d0: = a0b0 S2: For i = 0 To m-1 Do Begin S3: c(x): = c(x)/x Parallel Begin S4: c(x): = c(x) + aib(x) + dif(x) S5: di+1: = ai+1b0 + dif1 + aib1 Parallel End

In the above time-independent algorithm, both statements S4 and S5 are executed in parallel. The hardware implementation of this time-independent algorithm using semi-systolic array is shown in Figure 3. The detailed circuits of both cells Vi,j and Wi are depicted in Figure 4. These V cells are responsible for executing both statements S3 and S4 in the Algorithm-B. These W cells have charge of performing the statement S5 in the AlgorithmB. Due to characteristics of finite fields GF(2m), the operator “+” represents the mod 2 operation. In other words, the symbol “+” in both Algorithm-A and -B represents the simple logical XOR operation. The latency of this semi-systolic multiplication array with m ´ m size also needs m + 1 clock cycles, but each clock cycle only takes one 2-input AND gate delay, one 3-input XOR gate delay, and one 1-bit latch delay. Comparisons of time and space complexities of both the traditional Montgomery and the proposed timeindependent Montgomery algorithms are listed in Table 1. Let D and W denote the unit gate delay and number of transistors corresponding to one level of a logic circuit, respectively. Jeon et al. [46] has pointed out the following results: Delay of 2-input AND gate = 2.4D, Delay of 2-input XOR gate = 4.2D

Efficient VLSI Implementation for Montgomery Multiplication in GF(2m)

369

Figure 3. The semi-systolic array for the time-independent Montgomery multiplier.

Figure 4. The detailed circuits of the cell Vi,j and Wi in Figure 3.

Table 1. Comparisons of time and space complexities Multipliers

Traditional Montgomery Multiplier (Fig. 1) [34]

Time-Independent Montgomery Multiplier (Fig. 3)

Space Complexity

2-input AND gate 2-input XOR gate 3-input XOR gate 1-bit Latch Total Transistors

2m2 2m2 0 4m2 72m2W

2m2 + 3m 0 m2 + m 3m2 + 4m (64m2 + 78m)W

Time Complexity

2-input AND gate delay 2-input XOR gate delay 3-input XOR gate delay 1-bit Latch Total Delays (unit delays)

2m 2m 0 2m 16mD

m+1 0 m+1 m+1 (8m + 8)D

370

Che-Wun Chiou et al.

Delay of 3-input XOR gate = 4.2D, Delay of 1-bit Latch = 1.4D Transistors of 2-input AND gate = 6W, Transistors of 2-input XOR gate = 14W Transistors of 3-input XOR gate = 28W, Transistors of 1-bit Latch = 8W. The proposed time-independent Montgomery multiplier in Figure 3 requires the total number of (64m2 + 78m)W (= (2m2 + 3m) ´ 6W + (m2 + m) ´ 28W + (3m2 + 4m) ´ 8W) transistors and takes (8m + 8)D (= (m + 1) ´ 2.4D + (m + 1) ´ 4.2D + (m + 1) ´ 1.4D) delays. Our proposed time-independent Montgomery multiplication algorithm takes (8m + 8)D time complexity while the traditional Montgomery multiplication algorithm requires 16mD. Furthermore, the space complexity of our proposed time-independent Montgomery multiplication algorithm is (64m2 + 78m)W while that of the ordinary Montgomery multiplication algorithm is 72m2W. As a result, our proposed time-independent Montgomery multiplier saves both about 50% time complexity and 89% space complexity as compared to the traditional Montgomery multiplier.

4. Conclusion We have shown that the conventional Montgomery multiplication algorithm has the time-dependent problem. To overcome this problem, we have presented the timeindependent Montgomery multiplication algorithm. Our proposed time-independent Montgomery multiplication algorithm saves about 50% (@ (16mD - 8mD)/16mD = 50%) time complexity as compared to the traditional Montgomery multiplication algorithm. Moreover, our proposed time-independent Montgomery multiplication algorithm only saves about 11% (@ (72m2 - 64m2)/72m2 @ 11%) space complexity while comparing with the traditional Montgomery multiplication algorithm. The major drawback of our proposed Montgomery multiplier is that it requires two different cells which increase the process complexity.

Acknowledgments The authors would like to thank anonymous referees and the editor for carefully reading the paper and for their

great help in improving the paper.

References [1] MacWilliams, F. J. and Sloane, N. J. A., The Theory of Error-Correcting Codes, Amsterdam: North-Holland (1977). [2] Lidl, R. and Niederreiter, H., Introduction to Finite Fields and Their Applications, New York: Cambridge Univ. Press, U.S.A. (1994). [3] Blahut, R. E., Fast Algorithms for Digital Signal Processing, Reading, Mass.: Addison-Wesley (1985). [4] Reed, I. S. and Truong, T. K., “The Use of Finite Fields to Compute Convolutions,” IEEE Trans. Information Theory, Vol. IT-21, pp. 208-213, (1975). [5] Benjauthrit, B. and Reed, I. S., “Galois Switching Functions and Their Applications,” IEEE Trans. Computers, Vol. C-25, pp. 78-86 (1976). [6] Wang, C. C. and Pei, D., “A VLSI Design for Computing Exponentiation in GF(2m) and Its Application to Generate Pseudorandom Number Sequences,” IEEE Trans. Computers, Vol. 39, pp. 258-262 (1990). [7] Bartee, T. C. and Schneider, D. J., “Computation with Finite Fields,” Information and Computing, Vol. 6, pp. 79-98 (1963). [8] Mastrovito, E. D., “VLSI Architectures for Multiplication over Finite Field GF(2m),” Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, Proc. Sixth Int’l Conf., AAECC-6, T. Mora, ed., Rome, pp. 297-309 (1988). [9] Koç, Ç. K. and Sunar, B., “Low-complexity Bit-parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields,” IEEE Trans. Computers, Vol. 47, pp. 353-356 (1998). [10] Lee, C.-Y., “Low Complexity Bit-parallel Systolic Multiplier over GF(2m) Using Irreducible Trinomials,” IEE Proc.-Comput. Digit. Tech., Vol. 150, pp. 39-42 (2003). [11] Itoh, T. and Tsujii, S., “Structure of Parallel Multipliers for a Class of Fields GF(2m),” Information and Computation, Vol. 83, pp. 21-40 (1989). [12] Hasan, M. A., Wang, M. and Bhargava, V. K., “Modular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields GF(2m),” IEEE Trans. Computers, Vol. 41, pp. 962-971(1992). [13] Lee, C. Y., Lu, E. H. and Lee, J. Y., “Bit-parallel Sys-

Efficient VLSI Implementation for Montgomery Multiplication in GF(2m)

tolic Multipliers for GF(2m) Fields Defined by All-one and Equally-spaced Polynomials,” IEEE Trans. Computers, Vol. 50, pp. 385-393 (2001). [14] Paar, C., “A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields,” IEEE Trans. Computers, Vol. 45, pp. 856-861 (1996). [15] Chiou, C. W., Lin, L. C., Chou, F. H. and Shu, S. F., “Low Complexity Finite Field Multiplier Using Irreducible Trinomials,” Electronics Letters, Vol. 39, pp. 1709-1711 (2003). [16] Massey, J. L. and Omura, J. K., “Computational Method and Apparatus for Finite Field Arithmetic,” U.S. Patent Number 4,587,627, May (1986). [17] Reyhani-Masoleh, A. and Hasan, M. A., “A New Construction of Massey-Omura Parallel Multiplier over GF(2m),” IEEE Trans. Computers, Vol. 51, pp. 511520 (2002). [18] Berlekamp, E. R., “Bit-serial Reed-Solomon Encoders,” IEEE Trans. Information Theory, Vol. IT-28, pp. 869-874 (1982). [19] Wu, H. M., Hasan, A. and Blake, I. F., “New Lowcomplexity Bit-parallel Finite Field Multipliers Using Weakly Dual Bases,” IEEE Trans. Computers, Vol. 47, pp. 1223-1234 (1998). [20] Rivest, R. L., Shamir, A. and Adleman, L., “A Method for Obtaining Digital Signatures and Public-key Cryptosystems,” Comm. ACM, Vol. 21, pp. 120-126 (1978). [21] Kobltiz, N., “Elliptic Curve Cryptosystems,” Math. Computation, Vol. 48, pp. 203-209 (1987). [22] Miller, V. S., “Use of Elliptic Curves in Cryptography,” Advances in Cryptology - CRYPTO ¢85 Proceedings, Springer Verlag, pp. 417-429 (1986). [23] Hoffstein, J., Pipher, J. and Silverman, J. H., “NTRU: A Ring Based Public Key Cryptosystem,” Proc. Algorithmic Number Theory: Third Int’l Symp. (ANTS 3), J.P. Buhler, ed., pp. 267-288 (1998). [24] Diffie, W. and Hellman, M. E., “New Directions in Cryptography,” IEEE Trans. Information Theory, Vol. 22, pp. 644-654 (1976). [25] National Institute for Standards and Technology, Digital Signature Standard (DSS). FIPS PUB 186-2, (2000). [26] Montgomery, P. L., “Modular Multiplication Without Trial Division”, Math. Computation, Vol. 44, pp. 519-

371

521 (1985). [27] Nibouche, O., Bouridane, A. and Nibouche, M., “Architectures for Montgomery’s Multiplication,” IEE Proc.-Comput. Digit. Tech., Vol. 150, pp. 361-368 (2003). [28] Walter, C. D., “Systolic Modular Multiplication,” IEEE Trans. Computers, Vol. 42, pp. 376-378 (1993). [29] O’Rourke, C. and Sunar, B., “Achieving NTRU with Montgomery Multiplication,” IEEE Trans. Computers, Vol. 52, pp. 440-448 (2003). [30] Kornerup, P., “A Systolic, Linear-array Multiplier for a Class of Right-shift Algorithms,” IEEE Trans. Computers, Vol. 43, pp. 892-898 (1994). [31] Walter, C. D., “Montgomery Exponentiation Needs No Final Subtractions,” Electronics Letters, Vol. 35, pp. 1831-1832 (1999). [32] Bium, T. and Paar, C., “High-radix Montgomery Modular Exponentiation on Reconfigurable Hardware,” IEEE Trans. Computers, Vol. 50, pp. 759-764 (2001). [33] Koc, K. C. and Acar, T., “Analyzing and Comparing Montgomery Multiplication Algorithms,” IEEE Micro, Vol. 16, pp. 26-33 (1996). [34] Koc, C. K. and Acar, T., “Montgomery Multiplication in GF(2k),” Designs, Codes, and Cryptography, Vol. 14, pp. 57-69 (1998). [35] Wu, H., “Montgomery Multiplier and Squarer in GF(2m),” CHES 2000, LNCS 1965, pp. 264-276 (2000). [36] Fournaris, A. P. and Koufopavlou, O., “GF(2k) Multipliers Based on Montgomery Multiplication Algorithm,” Proc. of the IEEE International Symposium on Circuits and Systems, Vol. II, pp. 849-852 (2004). [37] Bajard, J. C., Imbert, L., Negre, C. and Plantard, T., “Efficient Multiplication in GF(pk) for Elliptic Curve Cryptography,” IEEE Symp. Computer Arithmetic, pp. 181-187 (2003). [38] SavaS, E., Tenca, A. F. and Koç, Ç. K., “A Scalable and Unified Multiplier Architecture for Finite Fields GF(p) and GF(2m),” CHES 2000, LNCS 1965, pp. 277-292 (2000). [39] Goodman, J. and Chandrakasan, A. P., “An Energyefficient Reconfigurable Public-key Cryptography Processor,” IEEE Journal of Solid-State Circuits, Vol. 36, pp. 1808-1820 (2001). [40] Großschädl, J., “A Bit-serial Unified Multiplier Architecture for Finite Fields GF(p) and GF(2m),” CHES

372

Che-Wun Chiou et al.

2001, LNCS 2162, pp. 202-219 (2001). [41] Wolkerstorfer, J., “Dual-field Arithmetic Unit for GF(p) and GF(2m),” CHES 2002, LNCS 2523, pp. 500-514 (2003). [42] Savas, E., Tenca, A. F., Çiftçibasi, M. E. and Koç, Ç. K., “Multiplier Architectures for GF(p) and GF(2n),” IEE Proceedings-Computers and Digital Technology, Vol. 151, pp.147-160 (2004). [43] Satoh, A. and Takano, K., “A Scalable Dual-field Elliptic Curve Cryptographic Processor,” IEEE Trans. Computers, Vol. 52, pp. 449-460 (2003). [44] Gutub, A. A.-A., Tenca, A. F., Savas, E. and Koc, C. K., “Scalable Unified Hardware to Compute Mont-

gomery Inverse in GF(p) and GF(2n),” CHES 2002, LNCS 2523, pp.484-499 (2003). [45] Massey, J. L. and Omura, J. K., “Computational Method and Apparatus for Finite Field Arithmetic,” U.S. Patent Number 4,587,627 (1986). [46] Jeon, J.-C., Kim, H.-S. and Lee, H.-M., “Bit-serial AB2 Multiplier Using Modified Inner Product,” Journal of Information Science and Engineering, Vol. 18, pp. 507-518 (2002).

Manuscript Received: Sep. 12, 2005 Accepted: Jan. 9, 2006

Suggest Documents