FPGA implementation of multiplication algorithms for ECC - IEEE Xplore

FPGA Implementation of Multiplication Algorithms for ECC

Ravi Kishore Kodali, Lakshmi Boppana, AV Saikiran and Chandana N. Amanchi
Department of Electronics and Communication Engineering, National Institute of Technology, Warangal
Warangal, 506004, India

Abstract: Various cryptographic techniques use finite field multiplication, so an efficient implementation of finite field multiplication is essential. In particular, elliptic curve cryptography (ECC), which provides high security with shorter key lengths, requires many multiplications during the encryption and decryption phases. It is therefore imperative to choose a multiplier that is fast and consumes few resources. Many algorithms have been proposed in the literature for the implementation of finite field multiplication. This paper discusses three different multiplication algorithms: Sunar-Koc, Karatsuba and Booth. These have been implemented using the Xilinx Virtex-7 family XC7V2000T-1FLG1925 FPGA device, and a comparison in terms of time and resource utilization is presented. All three algorithms have been implemented for key lengths of 194, 233 and 384 bits. Similar key lengths are used in ECC.

Keywords: ECC, multiplication, cryptography

978-1-4799-8792-4/15/$31.00 ©2015 IEEE

I. INTRODUCTION

In elliptic curve cryptography (ECC), encryption and decryption are performed through elliptic curve point operations. These operations include point addition, point doubling and scalar point multiplication. Scalar point multiplication consumes most of the resources and is also the most time-consuming operation. Finite field multiplication is performed many times within a scalar point multiplication. Hence, an efficient algorithm has to be chosen for multiplication.

Consider a binary finite field, $GF(2^m)$, where $m$ is the number of bits. Elements of the binary field can be considered as polynomials of degree at most $(m-1)$. The binary string $(a_{m-1}, a_{m-2}, \ldots, a_1, a_0)$ can be expressed as the polynomial $a_{m-1}X^{m-1} + a_{m-2}X^{m-2} + \cdots + a_1X + a_0$. Finite field multiplication differs from ordinary multiplication in that the result must also lie in the field. The product of two elements of $GF(2^m)$ may exceed $m$ bits. In such a case, the result is brought back to $m$ bits by reduction modulo an irreducible polynomial of degree $m$. The irreducible polynomial is chosen to be a trinomial or a pentanomial so that the polynomial reduction is efficient.

Sunar-Koc, Karatsuba and Booth algorithms are some of the techniques proposed for finite field multiplication. The Sunar-Koc algorithm involves the conversion of the numbers into an optimal normal basis of type II. The multiplication is then performed and the result is converted back [1]. The Karatsuba algorithm is based on a divide and conquer approach. The product of two elements of size $m$ bits is calculated using three $\frac{m}{2}$-bit multiplications [2]. By using this technique recursively, resource utilization can be reduced. The Booth algorithm calculates the product by examining adjacent pairs of bits of the multiplier and accordingly adding the multiplicand to, or subtracting it from, the partial product.

These three multiplication algorithms have been implemented using a field programmable gate array (FPGA) for three different key lengths: 194, 233 and 384 bits. The irreducible polynomials [3] used are:
194 bits: $x^{194} + x^{87} + 1$
233 bits: $x^{233} + x^{74} + 1$
384 bits: $x^{384} + x^{12} + x^{3} + x^{2} + 1$
A comparison of the algorithms is also made with respect to hardware resource utilization and time.

The rest of the paper is organized as follows: Section II presents the literature review. Sections III, IV and V discuss the Sunar-Koc, Karatsuba and Booth algorithms, respectively. Results and comparison are provided in Section VI. The last section concludes the work.

II. LITERATURE REVIEW

A design for elliptic curve operations is proposed in [4]. Elliptic curve point operations and binary field arithmetic are discussed. Algorithms for scalar point multiplication, point addition, multiplication and inversion are given. Hardware implementations of a point addition processor and a point multiplication processor are presented. A high performance ECC processor for $GF(2^{163})$ is presented in [5]. The Lopez-Dahab algorithm, which uses a projective coordinate system, is used for point multiplication. Point addition and doubling are performed with a uniform addressing technique. Arithmetic units for coordinate conversion, point addition and doubling are presented. The importance of scalar point multiplication in ECC implementation is discussed in [6]. The sliding window method is applied to perform the scalar point multiplication, with window sizes ranging from 3 to 15. The results are compared with the binary and NAF methods.
A fast elliptic curve point multiplication is presented in [7] using a mixed coordinate system based width-w NAF multiplication algorithm. A cost analysis is carried out, showing that its cost is much lower than that of the width-w and NAF methods. The Sunar-Koc multiplication algorithm is proposed in [8]. It is shown that its implementation requires $1.5(m^2 - m)$ XOR gates for $GF(2^m)$. A detailed discussion on the implementation of the Itoh-Tsujii inversion algorithm (ITIA) over $GF(2^{233})$ is given in [1], where the Sunar-Koc multiplier is utilized for the multiplication. Implementations of the Sunar-Koc multiplier on an FPGA and on WSN nodes are discussed in [9]. Algorithms are given for the different phases of Sunar-Koc multiplication and a block diagram showing the phases is presented. The Sunar-Koc multiplication is implemented for key lengths of 174, 194 and 233 bits and a comparison is presented.

The original Karatsuba method is proposed and implemented in [10]. The recursive Karatsuba-Ofman multiplier is implemented in [2], where two new architectures, one sequential and one parallel, are proposed and implemented on a Spartan-3 FPGA device. A low complexity Karatsuba multiplier is implemented in [11] by transforming the multiplication operands. The space complexity of the proposed algorithm is compared with the padded Karatsuba algorithm (KA), simple recursive KA and hybrid KA for key lengths of 163, 233 and 283 bits. A hardware accelerator for polynomial multiplication in extended Galois fields is implemented in [12] by applying the Karatsuba method iteratively. The working of the Booth multiplication algorithm is explained with an example. A low power Booth multiplier is implemented using a recoding technique in [13]. A power efficient Booth multiplier is implemented in [14].

III. SUNAR-KOC MULTIPLIER

The Itoh-Tsujii inversion algorithm (ITIA) requires squaring and multiplication operations while carrying out an inversion operation. The squaring operation is a simple cyclic shift in a normal basis. The multiplication can be performed efficiently using the Sunar-Koc algorithm.
In this algorithm, the numbers are first converted into the shifted form of the canonical basis (optimal normal basis of type II) and then the product is computed. The product is then converted back into the normal basis. This involves three steps [9]:
1) Conversion of the normal basis into its equivalent shifted form of canonical basis.
2) Computation of the product of the numbers.
3) Conversion of the resultant product into its normal basis.

The conversion of the normal basis into its equivalent shifted form of canonical basis is as follows, with $p = 2m + 1$:
• If 2 is primitive modulo $p$, the set of powers of 2 modulo $p$, $\{2, 2^2, 2^3, \ldots, 2^{2m-1}, 2^{2m}\} \pmod{p}$, is equivalent to $\{1, 2, 3, \ldots, 2m-1, 2m\}$. Therefore, the elements $\gamma^{2^i} + \gamma^{-2^i}$ of the normal basis can take the form $\gamma^{j} + \gamma^{-j}$ for $j \in [1, 2m]$. The powers can be brought into the range $[1, m]$ by writing the elements $\gamma^{j} + \gamma^{-j}$ as $\gamma^{(2m+1)-j} + \gamma^{-(2m+1)+j}$ for $j \ge m + 1$.
• If the multiplicative order of 2 modulo $p$ is $m$, then the powers of 2 modulo $p$, $2, 2^2, 2^3, \ldots, 2^{m-1}, 2^{m}$, are $m$ distinct integers in $[1, 2m]$. The powers are then brought into the range $[1, m]$ as given in equation (1).

$$ j = 2^i \pmod{p} \quad \text{if } 2^i \pmod{p} \in [1, m] \tag{1a} $$

$$ j = (2m + 1) - 2^i \pmod{p} \quad \text{if } 2^i \pmod{p} \in [m + 1, 2m] \tag{1b} $$

The second step involves multiplication of the numbers in the shifted form of canonical basis. Let us consider two numbers $X, Y \in GF(2^m)$. They can be expressed in basis $N$ as given in equation (2).

$$ X = \sum_{i=1}^{m} x_i \beta_i = \sum_{i=1}^{m} x_i (\gamma^{i} + \gamma^{-i}) \tag{2a} $$

$$ Y = \sum_{j=1}^{m} y_j \beta_j = \sum_{j=1}^{m} y_j (\gamma^{j} + \gamma^{-j}) \tag{2b} $$

[Figure: block diagram. X and Y (canonical basis, m bits each) pass through basis converters to their shifted form, the shifted operands are multiplied to give P, and P passes through a basis converter back to the canonical basis.]

Fig. 1: Sunar-Koc Multiplication
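The index mapping of equation (1), on which the basis converters of Fig. 1 rely, can be sketched in a few lines of Python. This is only an illustrative software sketch, not the hardware description used in the paper: the function name `onb2_index_map` is hypothetical, and the code assumes that $p = 2m + 1$ is prime and that 2 is either primitive modulo $p$ or has multiplicative order $m$.

```python
def onb2_index_map(m):
    """Return [j_1, ..., j_m], where j_i is the shifted-basis index of the
    normal-basis element gamma^(2^i) + gamma^(-2^i), per equations (1a)/(1b).

    Assumption: p = 2m + 1 is prime and 2 is primitive mod p or has order m.
    """
    p = 2 * m + 1
    mapping = []
    for i in range(1, m + 1):
        r = pow(2, i, p)              # 2^i mod p, a value in [1, 2m]
        if r <= m:
            mapping.append(r)         # equation (1a)
        else:
            mapping.append(p - r)     # equation (1b): (2m + 1) - r
    return mapping


if __name__ == "__main__":
    # Small case m = 5 (p = 11, where 2 is primitive modulo 11).
    print(onb2_index_map(5))
    # For m = 233 (p = 467) the mapping is a permutation of 1..233,
    # so the conversion is a pure rewiring of coefficient positions.
    print(sorted(onb2_index_map(233)) == list(range(1, 234)))
```

Because the mapping is a permutation, the conversion between the two bases in this sketch needs no arithmetic, only a reordering of the coefficients.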

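2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)

$$
\begin{aligned}
Z = X \cdot Y &= \Big(\sum_{i=1}^{m} x_i \beta_i\Big) \cdot \Big(\sum_{j=1}^{m} y_j \beta_j\Big) \\
&= \Big(\sum_{i=1}^{m} x_i (\gamma^{i} + \gamma^{-i})\Big) \cdot \Big(\sum_{j=1}^{m} y_j (\gamma^{j} + \gamma^{-j})\Big) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{m} x_i y_j (\gamma^{(i-j)} + \gamma^{-(i-j)}) + \sum_{i=1}^{m} \sum_{j=1}^{m} x_i y_j (\gamma^{(i+j)} + \gamma^{-(i+j)}) \\
&= Z_1 + Z_2
\end{aligned}
\tag{3}
$$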
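Equation (3) can be checked in software with a short Python sketch. The code below is an illustration under stated assumptions rather than the authors' implementation: the names `fold` and `shifted_basis_multiply` are hypothetical, operands are coefficient lists in the shifted basis $\beta_i = \gamma^{i} + \gamma^{-i}$ (list position $i-1$ holds $x_i$), exponents above $m$ are folded back into $[1, m]$ using $\gamma^{2m+1} = 1$, and the difference term is skipped for $i = j$ because $\gamma^{0} + \gamma^{0} = 0$ over $GF(2)$.

```python
def fold(k, m):
    """Map an exponent k in [1, 2m] into [1, m] using gamma^(2m+1) = 1,
    i.e. beta_k = beta_(2m+1-k) for k > m."""
    return k if k <= m else 2 * m + 1 - k


def shifted_basis_multiply(x, y):
    """Multiply two GF(2^m) elements given as coefficient lists in the
    shifted basis beta_1..beta_m, following equation (3):
    beta_i * beta_j = beta_|i-j| + beta_(i+j)."""
    m = len(x)
    z = [0] * (m + 1)                      # z[1..m]; z[0] is unused
    for i in range(1, m + 1):
        if not x[i - 1]:
            continue
        for j in range(1, m + 1):
            if not y[j - 1]:
                continue
            if i != j:                     # difference term (vanishes for i = j)
                z[abs(i - j)] ^= 1
            z[fold(i + j, m)] ^= 1         # sum term, folded into [1, m]
    return z[1:]


if __name__ == "__main__":
    m = 5                                  # p = 2m + 1 = 11
    one = [1] * m                          # beta_1 + ... + beta_m equals 1
    x = [1, 0, 1, 1, 0]
    print(shifted_basis_multiply(x, one) == x)
```

The all-ones vector acts as the multiplicative identity because the sum of all basis elements equals 1, which gives a quick sanity check of the folding rule.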

$Z_1$ and $Z_2$ can be further expressed by

$$
Z_1 = \sum_{i=1}^{m} \sum_{j=1}^{m} x_i y_j (\gamma^{(i-j)} + \gamma^{-(i-j)})
= \sum_{\substack{1 \le i,j \le m \\ i \ne j}} x_i y_j (\gamma^{(i-j)} + \gamma^{-(i-j)})
\tag{3a}
$$

$$
\begin{aligned}
Z_2 &= \sum_{i=1}^{m} \sum_{j=1}^{m} x_i y_j (\gamma^{(i+j)} + \gamma^{-(i+j)}) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{m-i} x_i y_j (\gamma^{(i+j)} + \gamma^{-(i+j)}) + \sum_{i=1}^{m} \sum_{j=m-i+1}^{m} x_i y_j (\gamma^{(i+j)} + \gamma^{-(i+j)}) \\
&= W_1 + W_2
\end{aligned}
\tag{3b}
$$

Hence, the product $Z$ can be calculated as $Z = Z_1 + W_1 + W_2$ [1]. The third step involves the conversion of the product back into its normal basis. The Sunar-Koc multiplication uses $1.5(m^2 - m)$ XOR gates. Figure 1 shows the block diagram of the Sunar-Koc multiplier.

IV. KARATSUBA MULTIPLIER

The Karatsuba multiplier is based on a divide and conquer approach. Consider two numbers $X, Y \in GF(2^m)$. These numbers can be represented in binary form as $(x_{m-1}, x_{m-2}, \ldots, x_1, x_0)$ and $(y_{m-1}, y_{m-2}, \ldots, y_1, y_0)$, respectively. For the Karatsuba algorithm, each number is divided into two equal parts: the most significant $\frac{m}{2}$ bits as one part and the least significant $\frac{m}{2}$ bits as the other. The product of $X$ and $Y$ can then be calculated by performing three $\frac{m}{2}$-bit multiplications and a few shift and addition operations.

$$
\begin{aligned}
X &= \sum_{i=0}^{m-1} x_i \beta^{i}
= \sum_{i=0}^{(m/2)-1} x_i \beta^{i} + \sum_{i=m/2}^{m-1} x_i \beta^{i} \\
&= \sum_{i=0}^{(m/2)-1} x_i \beta^{i} + \beta^{m/2} \sum_{i=0}^{(m/2)-1} x_{i+m/2} \beta^{i}
= X_0 + \beta^{m/2} X_1
\end{aligned}
\tag{3c}
$$

$$
\begin{aligned}
Y &= \sum_{i=0}^{m-1} y_i \beta^{i}
= \sum_{i=0}^{(m/2)-1} y_i \beta^{i} + \sum_{i=m/2}^{m-1} y_i \beta^{i} \\
&= \sum_{i=0}^{(m/2)-1} y_i \beta^{i} + \beta^{m/2} \sum_{i=0}^{(m/2)-1} y_{i+m/2} \beta^{i}
= Y_0 + \beta^{m/2} Y_1
\end{aligned}
\tag{3d}
$$

$$
\begin{aligned}
X \cdot Y &= X_0 Y_0 + X_1 Y_1 \beta^{m} + (X_0 Y_1 + X_1 Y_0) \beta^{m/2} \\
&= X_0 Y_0 + X_1 Y_1 \beta^{m} + ((X_0 + X_1)(Y_0 + Y_1) - X_0 Y_0 - X_1 Y_1) \beta^{m/2} \\
&= Z_0 + Z_1 \beta^{m/2} + Z_2 \beta^{m}
\end{aligned}
\tag{4}
$$

Here $Z_0 = X_0 Y_0$, $Z_2 = X_1 Y_1$ and $Z_1 = (X_0 + X_1)(Y_0 + Y_1) - X_0 Y_0 - X_1 Y_1$. In a binary field, both addition and subtraction are bitwise XOR operations, so only the three products $X_0 Y_0$, $X_1 Y_1$ and $(X_0 + X_1)(Y_0 + Y_1)$ need multipliers.

Algorithm 1 Karatsuba Multiplier
INPUT: X and Y of degree (m − 1).
OUTPUT: P = X.Y
1. half_m = (m + 1)/2; half_m_odd = m/2;
2. if m is even then
       a ← x; b ← y;
   else if m is odd then
       a ← '0'&x; b ← '0'&y;
   end if
3. for i in 0 to half_m-1 do
       sum_a(i) ← a(i) xor a(i + half_m);
       sum_b(i) ← b(i) xor b(i + half_m);
   end for
4. a0b0 := karatsuba_m_by_2(a(half_m-1 downto 0), b(half_m-1 downto 0));
   a1b1 := karatsuba_m_by_2(a(m-1 downto half_m), b(m-1 downto half_m));
   a2b2 := karatsuba_m_by_2(sum_a, sum_b);
5.
   5.1 P(half_m-1 downto 0) := a0b0(half_m-1 downto 0);
   5.2 for i in 0 to half_m-2 do
           P(half_m + i) := a2b2(i) xor a1b1(i) xor a0b0(i) xor a0b0(i + half_m);
       end for
   5.3 P(2*half_m-1) := a2b2(half_m-1) xor a0b0(half_m-1) xor a1b1(half_m-1);
   5.4 for i in half_m to 2*half_m_odd-2 do
           P(half_m + i) := a2b2(i) xor a0b0(i) xor a1b1(i) xor a1b1(i - half_m);
       end for
   5.5 if m is odd then
           P(3*half_m-3) := a2b2(2*half_m-3) xor a0b0(2*half_m-3) xor a1b1(half_m-3);
           P(3*half_m-2) := a2b2(2*half_m-2) xor a0b0(2*half_m-2) xor a1b1(half_m-2);
       end if
   5.6 P(3*half_m-1) := a1b1(half_m-1);
   5.7 P(2*m-2 downto 3*half_m) := a1b1(2*half_m_odd-2 downto half_m);
6. Return P;

The Karatsuba algorithm can be applied recursively for better resource utilization. $X_0$, $X_1$, $Y_0$ and $Y_1$ can be further divided into lower-order polynomials for lower-order multiplications. The Karatsuba multiplier for polynomials of degree $(m-1)$ is shown in Algorithm 1, where karatsuba_m_by_2 is the multiplier for the $\frac{m}{2}$-bit halves. The polynomials are divided further until the operand size becomes 1 bit, with the Karatsuba algorithm applied recursively at each level.

[Figure: block diagram. A Karatsuba_m block with m-bit inputs X and Y and 2m-bit output P is built from Karatsuba_(m/2) blocks, each of which is built from Karatsuba_(m/4) blocks, and so on down to 1-bit multipliers.]

Fig. 2: Recursive Karatsuba Multiplication

The product obtained from Algorithm 1 has more than $m$ bits. It should be reduced to $m$ bits using the irreducible polynomial. The polynomials are divided as shown below:
$GF(2^{194})$: 194 → 98 → 50 → 26 → 14 → 8 → 4 → 2 → 1
$GF(2^{233})$: 233 → 118 → 60 → 30 → 16 → 8 → 4 → 2 → 1
$GF(2^{384})$: 384 → 192 → 96 → 48 → 24 → 12 → 6 → 4 → 2 → 1
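The operand widths listed above can be reproduced with a short Python sketch. The rule used here is inferred from the listed sequences and is an assumption, not something stated in the text: each operand is halved (rounding up), and an odd half larger than 1 is padded by one bit to the next even width. The function name `split_sizes` is hypothetical.

```python
def split_sizes(m):
    """Return the operand widths visited by the recursive splitting.

    Assumed rule (inferred from the listed sequences): halve, rounding up,
    then pad an odd width greater than 1 to the next even width.
    """
    sizes = [m]
    while m > 1:
        m = (m + 1) // 2          # width of each half, rounding up
        if m > 1 and m % 2 == 1:
            m += 1                # pad odd widths with a leading zero bit
        sizes.append(m)
    return sizes


if __name__ == "__main__":
    for bits in (194, 233, 384):
        print(" -> ".join(str(s) for s in split_sizes(bits)))
```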

The Karatsuba algorithm can be implemented using either a parallel architecture or a sequential architecture. The parallel architecture is faster but consumes more resources, whereas the sequential architecture is slower but consumes fewer resources. The architecture should be chosen according to whether speed or resource usage is the more important consideration.
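As a software illustration of equation (4) and of the final reduction step, the recursion can be sketched in Python with polynomials over $GF(2)$ stored as integers (bit $i$ holds the coefficient of $x^i$). This is a minimal sketch under stated assumptions, not the VHDL design evaluated in this paper: the names `clmul`, `karatsuba_gf2` and `reduce_gf2` are hypothetical, the split point is $\lceil m/2 \rceil$ without the even-width padding, and the modulus is the 233-bit trinomial $x^{233} + x^{74} + 1$.

```python
import random


def clmul(a, b):
    """Schoolbook carry-less product in GF(2)[x], used only as a reference."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def karatsuba_gf2(a, b, m):
    """Product of two polynomials of at most m bits using three half-size
    products, as in equation (4); addition and subtraction are both XOR."""
    if m == 1:
        return a & b                         # 1-bit multiplier (AND gate)
    h = (m + 1) // 2                         # width of the low half
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h                # X = X0 + x^h * X1
    b0, b1 = b & mask, b >> h                # Y = Y0 + x^h * Y1
    z0 = karatsuba_gf2(a0, b0, h)            # X0 * Y0
    z2 = karatsuba_gf2(a1, b1, h)            # X1 * Y1
    z1 = karatsuba_gf2(a0 ^ a1, b0 ^ b1, h) ^ z0 ^ z2
    return z0 ^ (z1 << h) ^ (z2 << (2 * h))


def reduce_gf2(p, f):
    """Reduce polynomial p modulo f (degree m), clearing high bits one by one."""
    m = f.bit_length() - 1
    while p.bit_length() > m:
        p ^= f << (p.bit_length() - 1 - m)
    return p


if __name__ == "__main__":
    M = 233
    F = (1 << 233) | (1 << 74) | 1           # x^233 + x^74 + 1
    rng = random.Random(0)
    x, y = rng.getrandbits(M), rng.getrandbits(M)
    full = karatsuba_gf2(x, y, M)
    assert full == clmul(x, y)               # agrees with the schoolbook product
    print(reduce_gf2(full, F).bit_length() <= M)
```

A hardware design would normally stop the recursion at a wider base case and exploit the sparse trinomial in the reduction; the bit-by-bit loop here is chosen for clarity only.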

Figure 2 shows the block diagram of recursive Karatsuba multiplication.

V. BOOTH MULTIPLIER

Let X and Y be the multiplicand and the multiplier, respectively, and let P be the product of X and Y. The product is initialised to zero. Booth multiplication can then be implemented by repeatedly adding the multiplicand to, or subtracting it from, the product according to the bits of the multiplier, followed by an arithmetic right shift. Let X, Y, B : std_logic_vector(m − 1 downto 0), A : std_logic_vector(2m downto 0), P : std_logic_vector(2m − 1 downto 0). Algorithm 2 gives the steps in Booth multiplication. In every iteration, the last two bits of the multiplier are examined. If the bits are "10", the multiplicand is subtracted from the product. If the bits are "01", the multiplicand is added to the product. If the bits are "00" or "11", no operation is performed. After this step, both the product and the multiplier are right shifted. This process is repeated m times. At the end of the process, a 2m-bit product, i.e. a polynomial of degree up to (2m − 1), is obtained. This product should be reduced to m bits using the irreducible polynomial.

Algorithm 2 Booth algorithm
INPUT: X and Y
OUTPUT: P = X.Y
1. A ← zero;
2. A(m downto 1) ← X;
3. for i in 0 to m-1 do
       if A(1) = 1 and A(0) = 0 then
           B ← A(2m downto m+1);
           A(2m downto m+1) ← (B - Y);
       else if A(1) = 0 and A(0) = 1 then
           B ← A(2m downto m+1);
           A(2m downto m+1) ← (B + Y);
       end if
       A(2m-1 downto 0) ← A(2m downto 1);
   end for
4. P = A(2m downto 1);

VI. IMPLEMENTATION RESULTS

All three multiplication algorithms have been implemented using the Xilinx Virtex-7 family XC7V2000T-1FLG1925 FPGA device. The key lengths considered are 194, 233 and 384 bits, and the corresponding simulation results are shown in Figures 3, 4 and 5, respectively. The synthesis results are given in Tables I, II and III. From the synthesis results, it can be observed that the Booth algorithm has the highest resource consumption and is the slowest of the three algorithms. The Karatsuba algorithm has low resource consumption and moderate speed. The Sunar-Koc algorithm is the fastest, with moderate resource consumption. If speed is the important factor for the design, the Sunar-Koc algorithm may be chosen for ECC. If resource consumption is the main consideration, the Karatsuba algorithm is the best option for the implementation of ECC.

Fig. 3: Simulation results of Sunar-Koc algorithm. (a) Key length = 194. (b) Key length = 233. (c) Key length = 384.

Fig. 4: Simulation results of Karatsuba algorithm

Fig. 5: Simulation results of Booth algorithm

TABLE I: FPGA Synthesis results for GF(2^194)

Logic Utilization                   Available   Sunar-Koc (Used)   Karatsuba (Used)   Booth (Used)
No. of LUTs                         1221600     52827              10294              112940
No. of IOs                          1200        582                582                582
Maximum combinational path delay                3.767 ns           8.173 ns           239.918 ns

TABLE II: FPGA Synthesis results for GF(2^233)

Logic Utilization                   Sunar-Koc (Used)   Karatsuba (Used)   Booth (Used)
No. of LUTs                         76055              13308              162905
No. of IOs                          699                699                699
Maximum combinational path delay    3.877 ns           8.125 ns           287.708 ns

TABLE III: FPGA Synthesis results for GF(2^384)

Logic Utilization                   Sunar-Koc (Used)   Karatsuba (Used)   Booth (Used)
No. of LUTs                         42705              30145              442443
No. of IOs                          1152               1152               1152
Maximum combinational path delay    4.194 ns           9.365 ns           473.311 ns

VII. CONCLUSIONS

The Sunar-Koc, Karatsuba and Booth algorithms have been implemented using the Xilinx Virtex-7 family XC7V2000T-1FLG1925 FPGA device for three different key lengths: 194, 233 and 384 bits. Similar key sizes are normally used in ECC. A resource utilization and performance comparison of the three algorithms is presented. The Sunar-Koc multiplier is the fastest and the Karatsuba multiplier consumes the fewest resources. Elliptic curve cryptography can be implemented with the Karatsuba multiplier when low resource usage is required.
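As a supplementary illustration of the Booth procedure of Section V (Algorithm 2), the register-level steps can be modelled in Python with plain integers. This is a sketch under stated assumptions rather than the synthesized design: the function name `booth_multiply` is hypothetical, the operands are treated as m-bit two's complement integers, Y is assumed not to be the most negative m-bit value (the m-bit upper field would otherwise overflow), and the product is read from bits 2m down to 1 of the register A because bit 0 only holds the auxiliary Booth bit.

```python
def booth_multiply(x, y, m):
    """Signed product of two m-bit two's complement integers using the
    add/subtract-and-shift loop of the Booth algorithm.

    Register A has 2m + 1 bits: [upper m bits | operand X | auxiliary bit].
    Assumption: y is not -2**(m - 1).
    """
    mask_m = (1 << m) - 1
    low_mask = (1 << (m + 1)) - 1
    a = (x & mask_m) << 1                     # A(m downto 1) <- X, A(0) <- 0
    for _ in range(m):
        pair = a & 0b11                       # bits A(1), A(0)
        upper = a >> (m + 1)                  # A(2m downto m+1)
        if pair == 0b10:
            upper = (upper - y) & mask_m      # subtract Y from the upper field
        elif pair == 0b01:
            upper = (upper + y) & mask_m      # add Y to the upper field
        a = (upper << (m + 1)) | (a & low_mask)
        sign = a >> (2 * m)                   # keep A(2m) for the arithmetic shift
        a = (a >> 1) | (sign << (2 * m))
    p = a >> 1                                # A(2m downto 1), 2m bits
    if p >> (2 * m - 1):                      # interpret as a signed value
        p -= 1 << (2 * m)
    return p


if __name__ == "__main__":
    print(booth_multiply(3, 5, 4))
    print(booth_multiply(-8, -7, 4))
```

The model performs signed integer multiplication only; the reduction of the result by the irreducible polynomial mentioned in Section V is not part of this sketch.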

REFERENCES

[1] Q. Deng, X. Bai, L. Guo, and Y. Wang, “A fast hardware implementation of multiplicative inversion in GF(2^m),” in 2009 Asia Pacific Conference on Postgraduate Research in Microelectronics & Electronics (PrimeAsia), 2009, pp. 472–475.
[2] E.-H. Wajih, M. Mohsen, Z. Medien, and B. Belgacem, “Efficient hardware architecture of recursive Karatsuba-Ofman multiplier,” in Design and Technology of Integrated Systems in Nanoscale Era, 2008. DTIS 2008. 3rd International Conference on. IEEE, 2008, pp. 1–6.
[3] D. Hankerson, A. Menezes, and S. Vanstone, Guide to Elliptic Curve Cryptography. New York: Springer-Verlag, 2004.
[4] M. Amara and A. Siad, “Hardware implementation of arithmetic for elliptic curve cryptosystems over GF(2^m),” in Internet Security (WorldCIS), 2011 World Congress on, Feb 2011, pp. 73–78.
[5] H. M. Choi, C. P. Hong, and C. H. Kim, “High performance elliptic curve cryptographic processor over GF(2^163),” in Electronic Design, Test and Applications, 2008. DELTA 2008. 4th IEEE International Symposium on. IEEE, 2008, pp. 290–295.
[6] R. Kodali and H. Budwal, “High performance scalar multiplication for ECC,” in Computer Communication and Informatics (ICCCI), 2013 International Conference on, Jan 2013, pp. 1–4.
[7] R. Kodali, S. Karanam, K. Patel, and H. Budwal, “Fast elliptic curve point multiplication for WSNs,” in TENCON Spring Conference, 2013 IEEE, April 2013, pp. 194–198.
[8] B. Sunar and C. K. Koç, “An efficient optimal normal basis type II multiplier,” IEEE Transactions on Computers, vol. 50, no. 1, pp. 83–87, 2001.
[9] R. K. Kodali, P. Gomatam, and L. Boppana, “Implementations of Sunar-Koc multiplier using FPGA platform and WSN node,” in TENCON 2013 - 2013 IEEE Region 10 Conference (31194). IEEE, 2013, pp. 1–4.
[10] A. Karatsuba and Y. Ofman, “Multiplication of multidigit numbers on automata,” in Soviet Physics Doklady, vol. 7, 1963, p. 595.
[11] Z. Ge, G. Shou, Y. Hu, and Z. Guo, “Design of low complexity GF(2^m) multiplier based on Karatsuba algorithm,” in Communication Technology (ICCT), 2011 IEEE 13th International Conference on. IEEE, 2011, pp. 1018–1022.
[12] Z. Dyka and P. Langendoerfer, “Area efficient hardware implementation of elliptic curve cryptography by iteratively applying Karatsuba’s method,” in Design, Automation and Test in Europe, 2005. Proceedings. IEEE, 2005, pp. 70–75.
[13] A. Prabhu and V. Elakya, “Design of modified low power Booth multiplier,” in Computing, Communication and Applications (ICCCA), 2012 International Conference on. IEEE, 2012, pp. 1–6.
[14] S.-R. Kuang and J.-P. Wang, “Design of power-efficient configurable Booth multiplier,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 57, no. 3, pp. 568–580, 2010.

