An Area-Efficient GF(2^m) MSD Multiplier based on an MSB Multiplier for Elliptic Curve LSI

Ryuta Nara, Kazunori Shimizu, Shunitsu Kohara, Nozomu Togawa, Masao Yanagisawa and Tatsuo Ohtsuki

The Department of Computer Science and Engineering, Waseda University 3-4-1 Okubo, Shinjuku, Tokyo, 169-8555, Japan Tel: +81-3-3209-3211 (Ext. 5775), Fax: +81-3-3208-7439 E-mail: [email protected]

Abstract: In this paper, we propose an MSD (most significant digit) multiplier over GF(2^m) based on an MSB (most significant bit) multiplier. The proposed multiplier is built by connecting D bit-level operations in series, where D is the digit size. In each digit operation, the “left shift and reduction” operation is performed serially for each of the D bits. Because the registers that store intermediate results hold only m bits regardless of the digit size D, the register size is smaller than in conventional digit-serial multipliers. We also implemented an ECC LSI using the proposed MSD multiplier.

1. Introduction
Elliptic curve cryptosystems (ECCs) are public-key cryptosystems that were proposed independently in [2] and [5]. Currently, the most popular public-key cryptosystem is RSA [6], but ECC is said to offer higher security than RSA for the same key size. For example, the security level of 160-bit ECC is equivalent to that of 1024-bit RSA, so ECC requires a key only about 1/6 the size of an RSA key. Thus we can expect an ECC-based public-key cryptosystem LSI to be implemented with less area, lower power dissipation, and higher processing throughput than one for RSA.

ECC operations are performed using finite field arithmetic. The finite fields used in ECC are of two types: prime fields GF(p) and binary fields GF(2^m). The hardware complexity of binary field arithmetic is lower than that of prime field arithmetic, since binary field addition requires no carry operation and can be implemented with a single XOR gate per bit. The critical path delay of binary field operations can be much smaller than that of prime field operations.

Since over 90 percent of ECC operations are multiplications, improving the multiplication algorithm and its dedicated multiplier is essential for ECC LSIs. Bit-serial multiplication is a basic algorithm with the simplest architecture, but its throughput is too low for use in ECC LSIs. Digit-serial multiplication extends bit-serial multiplication so that several bits are processed within one cycle. In digit-serial multiplication, we can trade off area against performance by changing the digit size D processed within one cycle. In that sense, digit-serial multiplication is more flexible than bit-serial multiplication. Also, since the number of operations within one cycle is usually higher for digit-serial multiplication than for bit-serial multiplication, digit-serial multiplication can achieve high performance while maintaining a low clock frequency and low power [7]. In addition, Ref. [3] proposed a digit-serial multiplier which increases the number of accumulators (adder and register) and decreases the critical path delay.

In this paper, we propose an MSD (most significant digit) multiplier over GF(2^m) based on an MSB (most significant bit) multiplier. This architecture is built by connecting D bit-level operations in series, where D is the digit size. In each digit operation, the “left shift and reduction” operation is performed serially for each of the D bits. Because the registers that store intermediate results hold only m bits regardless of the digit size D, we can reduce the register size compared to conventional digit-serial multipliers.
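The bit-serial baseline described above can be sketched in software as follows. This is a minimal illustration rather than the authors' hardware description: the function names `gf_add`, `shift_reduce` and `msb_multiply` are hypothetical, and the default reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 is the one specified for the B-163 field in FIPS 186-2 (the paper itself does not print it).

```python
# Field elements of GF(2^m) are held as Python integers; bit i is the
# coefficient of x^i. All names here are illustrative assumptions.
M = 163
# Reduction polynomial of the NIST B-163 field: x^163 + x^7 + x^6 + x^3 + 1
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_add(a, b):
    """Addition in GF(2^m): carry-free, one XOR per bit."""
    return a ^ b


def shift_reduce(c, m=M, f=F):
    """Left shift by one bit (multiply by x), then reduce modulo f."""
    c <<= 1
    if (c >> m) & 1:      # a term x^m appeared, cancel it with f
        c ^= f
    return c


def msb_multiply(a, b, m=M, f=F):
    """Bit-serial multiplication, most significant bit of b first.

    One loop iteration corresponds to one clock cycle of a bit-serial
    multiplier, so m iterations are needed.
    """
    c = 0
    for i in range(m - 1, -1, -1):
        c = shift_reduce(c, m, f)
        if (b >> i) & 1:
            c = gf_add(c, a)
    return c
```

The intermediate value `c` never exceeds m bits here, which is the property the proposed digit-serial design aims to keep.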

2. MSD Multiplication
An MSD multiplication algorithm is one kind of digit-serial multiplication algorithm. A digit-serial multiplication algorithm multiplies m-bit data by D-bit data at each clock cycle. D is greater than one and is called the digit size. In digit-serial multiplication, the number of digits is ⌈m/D⌉. Digit-serial multiplication is grouped into two types, just as bit-serial multiplication is. An MSD multiplication algorithm [7] begins calculation from the most significant digit, and the result of multiplying m-bit data by D-bit data is stored in a shift register. An LSD multiplication algorithm [3],[7] begins calculation from the least significant digit, and a shift register stores the multiplicand A. It also needs another register to store intermediate results. Note that our proposed multiplier is based on an MSD multiplication algorithm.
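As a rough software model of the MSD scheme described in this section, the sketch below splits the multiplier operand into ⌈m/D⌉ digits and processes them from the most significant digit down. The helper names (`split_digits`, `clmul`, `reduce_poly`, `msd_multiply`) are assumptions made for illustration and do not come from the paper or from [7].

```python
def split_digits(b, m, d):
    """Split an m-bit value into ceil(m/d) digits, most significant first."""
    n = -(-m // d)                      # integer ceiling of m/d
    mask = (1 << d) - 1
    return [(b >> (d * k)) & mask for k in range(n - 1, -1, -1)]


def clmul(a, digit):
    """Carry-less product of a and a D-bit digit (polynomials over GF(2))."""
    acc, j = 0, 0
    while digit:
        if digit & 1:
            acc ^= a << j
        digit >>= 1
        j += 1
    return acc


def reduce_poly(c, m, f):
    """Reduce c modulo the degree-m polynomial f."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= f << (i - m)
    return c


def msd_multiply(a, b, m, f, d):
    """Digit-serial multiplication, most significant digit of b first."""
    c = 0
    for digit in split_digits(b, m, d):
        # One digit per clock cycle: C <- (C * x^D + A * digit) mod f.
        # The value before reduction can be up to m + D bits wide.
        c = reduce_poly((c << d) ^ clmul(a, digit), m, f)
    return c
```

For example, with m = 163 and D = 4 the operand is split into 41 digits, the last of which is padded with zero bits at the top.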

3. An MSD Multiplier based on an MSB Multiplier
While an LSD multiplication algorithm requires extra registers to store the multiplicand and intermediate results, an MSD multiplication algorithm requires no extra registers, although its critical path delay may be larger. The total number of clock cycles for an MSD multiplication algorithm is less than that for an LSD multiplication algorithm. If we design an MSD multiplier carefully, the number of clock cycles can decrease faster than the critical path delay increases as the digit size D is increased. Based on the above discussion, we propose an MSD multiplier based on an MSB multiplier (Fig. 1). The proposed MSD multiplier can be implemented simply by connecting D bit-level operations in series, so that the registers for storing intermediate results hold only m bits regardless of D. Moreover, we explain that our multiplier can have a smaller computation time (= clock cycles * delay) than conventional digit-serial multipliers.

Figure 1. Block diagram of the proposed MSD multiplier (m=163, D=4).

Table 1. Estimation of FF, Number of clock cycles and delay

3.1 Proposed MSD Multiplier Architecture
The proposed MSD multiplier can be implemented by connecting D bit-level operations in series. In each digit operation, the “left shift and reduction” operation is performed serially for each bit of the D-bit data. Because the registers for storing intermediate results hold only m bits regardless of the digit size D, we can reduce the register size compared to conventional digit-serial multipliers (Fig. 2(b)). In [3],[7], the intermediate data between the accumulator and the reduction calculator requires (m + D) bits, as shown in Fig. 2(a), since the accumulator is separated from the reduction calculator. These designs also require one more clock cycle per multiplication than the proposed multiplier.
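The chaining idea can be modelled in software as shown below. This is a behavioural sketch under stated assumptions, not the authors' Verilog-HDL: the names `msb_step`, `msd_cycle` and `proposed_msd_multiply` are hypothetical, and `F163` is the B-163 reduction polynomial from FIPS 186-2. Each call to `msd_cycle` stands for one clock cycle in which D bit-level "left shift and reduction" steps are applied back to back, so the value handed to the register is always at most m bits wide.

```python
# B-163 reduction polynomial x^163 + x^7 + x^6 + x^3 + 1 (assumed here)
F163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def msb_step(c, a, bit, m, f):
    """One bit-level step: left shift, reduce, then conditionally add a."""
    c <<= 1
    if (c >> m) & 1:
        c ^= f
    if bit:
        c ^= a
    return c                      # fits in m bits when c and a do


def msd_cycle(c, a, digit, d, m, f):
    """One clock cycle: D bit-level steps connected in series."""
    for j in range(d - 1, -1, -1):
        c = msb_step(c, a, (digit >> j) & 1, m, f)
    return c


def proposed_msd_multiply(a, b, m, f, d):
    """Return (product, clock cycles) for the chained MSD scheme."""
    n = -(-m // d)                # ceil(m/d) digits
    c, cycles = 0, 0
    for k in range(n - 1, -1, -1):
        digit = (b >> (d * k)) & ((1 << d) - 1)
        c = msd_cycle(c, a, digit, d, m, f)
        assert c < (1 << m)       # intermediate result needs only m bits
        cycles += 1
    return c, cycles
```

With m = 163 and D = 4 this model takes 41 cycles, and with D = 1 it degenerates to the bit-serial MSB multiplier. The loop in `msd_cycle` is purely sequential; how the corresponding combinational logic is arranged in hardware is outside what this sketch shows.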


Figure 3. Area vs digit size D.

(a) Conventional MSD multiplier. (b) Proposed MSD multiplier. Figure 2. MSD multiplier core architecture.

3.2 Estimation of Area and Delay
As shown in Fig. 2(b), our MSD multiplier has an m-bit register, while [3],[7] have an (m+D)-bit register to store intermediate data. The number of clock cycles needed to multiply m-bit data by m-bit data is ⌈m/D⌉ in our proposed MSD multiplier, while it is ⌈m/D⌉+1 in [3],[7], since our MSD multiplier performs the shift and reduction operations within one clock cycle. By constructing an XOR binary tree, we can obtain a delay proportional to D in our MSD multiplier. Table 1 summarizes the number of FFs, the number of clock cycles, and the delay of our MSD multiplier and those of [7].
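The two quantities stated in the text of this section can be tabulated with a few lines of code. The function name `estimate` and the dictionary layout are illustrative assumptions. Only the intermediate-data register width and the cycle count are modelled, since the contents of Table 1 are not reproduced here.

```python
def estimate(m, d):
    """Intermediate register width and clock cycles per multiplication.

    Uses only the figures stated in the text: m bits and ceil(m/d) cycles
    for the proposed design, m + d bits and ceil(m/d) + 1 cycles for the
    conventional designs.
    """
    n = -(-m // d)                           # ceil(m/d)
    return {
        "proposed": {"register_bits": m, "cycles": n},
        "conventional": {"register_bits": m + d, "cycles": n + 1},
    }


if __name__ == "__main__":
    for d in (2, 4, 8, 16, 32, 48):
        e = estimate(163, d)
        print(d, e["proposed"], e["conventional"])
```

The register saving is D bits and the cycle saving is one cycle per multiplication, so both become relatively more significant as D grows and the cycle count ⌈m/D⌉ shrinks.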

Figure 4. Critical path delay vs digit size D.

4. Results and Implementation of ECC LSI
The technology libraries we use are the VDEC ROHM 0.35 µm CMOS technology libraries. We have synthesized the proposed MSD multiplier using FIPS 186-2 parameters [1]. We wrote the RTL description of the proposed MSD multiplier in Verilog-HDL and synthesized it using Design Compiler W-2004.12SP2. Fig. 3 shows the required area for the proposed MSD multiplier. As shown in the figure, the required area is proportional to the digit size D. Fig. 4 shows the critical path delay of the proposed MSD multiplier. The critical path delay is almost proportional to log D.

We have implemented an elliptic curve cryptosystem on an ASIC using the proposed MSD multiplier. We use an elliptic curve cryptosystem over GF(2^163) based on the B-163 curve of FIPS 186-2 [1]. We use the LD-Montgomery algorithm [4] for the scalar multiplication of our ECC. The digit size D of the multiplier is 48 bits. Fig. 5 shows the block diagram and the chip photograph of the implemented ECC LSI, whose clock frequency is 50 MHz and whose computation time is 0.115 ms for ECC processing.
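As a quick arithmetic check on the figures quoted above, the following sketch converts them into cycle counts. It uses only the stated values (m = 163, D = 48, 50 MHz, 0.115 ms); the variable names are illustrative, and no breakdown of how the total cycles are spent is given in the text, so none is assumed.

```python
def ceil_div(n, d):
    """Integer ceiling of n / d."""
    return -(-n // d)


M = 163           # field size in bits (B-163)
D = 48            # digit size of the implemented multiplier
CLOCK_MHZ = 50    # reported clock frequency
TIME_US = 115     # reported computation time, 0.115 ms = 115 microseconds

# Cycles per field multiplication for the proposed multiplier: ceil(163/48)
cycles_per_mult = ceil_div(M, D)

# MHz times microseconds gives clock cycles for the whole ECC processing
total_cycles = CLOCK_MHZ * TIME_US

print(cycles_per_mult, total_cycles)
```

Under these figures one field multiplication takes 4 clock cycles and the full ECC processing corresponds to 5750 clock cycles.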

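Acknowledgements
This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc. and Cadence Design Systems, Inc. The VLSI chip in this study has been fabricated in the chip fabrication program of VDEC, the University of Tokyo in collaboration with Rohm Corporation and Toppan Printing Corporation.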

(a) Block diagram. (b) Chip photograph. Figure 5. Implementation of ECC LSI.

References
[1] Digital Signature Standard (DSS), Federal Information Processing Standards Publication 186-2, National Institute of Standards and Technology, 2000.
[2] N. Koblitz, “Elliptic curve cryptosystems,” Math. Computation, vol. 48, pp. 203-209, 1987.
[3] S. Kumar, T. Wollinger and C. Paar, “Optimum digit serial GF(2^m) multipliers for curve-based cryptography,” IEEE Transactions on Computers, vol. 55, no. 10, pp. 1306-1311, October 2006.
[4] J. López and R. Dahab, “Fast multiplication on elliptic curves over GF(2^m) without precomputation,” Cryptographic Hardware and Embedded Systems - CHES'99, Springer-Verlag, Lecture Notes in Computer Science 1717, pp. 316-327, August 1999.
[5] V. Miller, “Uses of elliptic curves in cryptography,” Advances in Cryptology, Proc. Crypto '85, H. C. Williams, ed., pp. 417-426, 1986.
[6] R. L. Rivest, A. Shamir and L. Adleman, “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems,” Comm. ACM, vol. 21, no. 2, pp. 120-126, February 1978.
[7] L. Song and K. K. Parhi, “Low Energy Digit-Serial/Parallel Finite Field Multipliers,” Journal of VLSI Signal Processing, vol. 19, no. 2, pp. 149-166, June 1998.
