Low Latency Elliptic Curve Cryptography ... - Semantic Scholar

Low Latency Elliptic Curve Cryptography Accelerators for NIST Curves over Binary Fields

Chang Shu, Kris Gaj
ECE Department, George Mason University
4400 University Drive, Fairfax, VA 22030-4444, USA
{cshu, kgaj}@gmu.edu

Tarek El-Ghazawi
ECE Department, The George Washington University
801 22nd Street NW, Washington DC, 20052 USA
[email protected]

Abstract

We designed hardware accelerators on a Xilinx XCV2000E FPGA to speed up scalar multiplication on the elliptic curves recommended by NIST over GF(2^163) and GF(2^233), in polynomial basis representation. Linear-feedback shift registers (LFSRs) are exploited in the most-significant-digit-serial (MSD) multipliers in order to improve design efficiency. We adopt the scalar multiplication algorithm devised by López and Dahab [4]. We demonstrate how this algorithm can be implemented using multiple multipliers working in parallel, and we select the optimal parameters for these multipliers. The accelerators run around 3 times faster than the best previously reported hardware implementation, by Gura et al. [1] at CHES 2002, when ported to the same device, the Xilinx Virtex XCV2000E.

1. Introduction

Over the last 20 years, Elliptic Curve Cryptography (ECC) has evolved from a mere curiosity into a mature and secure family of public key cryptosystems used in practical applications. Several implementations of ECC over GF(2^n) have been developed and reported in the literature [1, 5]. Most of these ECC accelerators are composed of an arithmetic unit (AU) and a controller. In such an architecture, a single field multiplier in the AU performs all field multiplications. In our approach, all ECC operations are implemented as independent arithmetic units, with no resource sharing. As a result, multiple GF(2^n) multipliers can work in parallel, and a substantial improvement in speed can be achieved. Additionally, the complicated data path containing many levels of bit-wide multiplexers has been simplified in order to reduce the minimum clock period. The design has been ported to the same device as Gura et al.'s design, namely the Xilinx XCV2000E-FG680-7, and performance comparisons are presented.

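2. López-Dahab algorithm

Let $P$, $P_1$, and $P_2$ be points on the curve $E$ such that $P_2 = P_1 + P$. Let the affine x-coordinate of $P_i$ be represented by $X_i / Z_i$, for $i \in \{1, 2\}$. The projective X-coordinates (see footnote 1) of $2P_i$ and $P_3 = P_1 + P_2$ can be represented as follows:

$$
\begin{aligned}
X(2P_i) &= X_i^4 + b \cdot Z_i^4 \\
Z(2P_i) &= Z_i^2 \cdot X_i^2 \\
X(P_1 + P_2) &= x \cdot Z_3 + (X_1 \cdot Z_2) \cdot (X_2 \cdot Z_1) \\
Z(P_1 + P_2) &= (X_1 \cdot Z_2 + X_2 \cdot Z_1)^2
\end{aligned}
\tag{1}
$$

Here $x$ is the affine x-coordinate of $P$, $b$ is the curve coefficient, and $Z_3 = Z(P_1 + P_2)$.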

Due to the page limit, we list only the formulae for point addition and point doubling. More details of the algorithm can be found in Reference [4]. According to Equation (1), point addition and point doubling do not depend on each other's results, so they can be performed in parallel. A coordinate transformation needs to be performed as the last step.
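The formulae of Equation (1) can be exercised in software with a short sketch of the x-only ladder. This is an illustration under stated assumptions, not the hardware design: the field is GF(2^163) with the NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1, the default b = 1 is the coefficient of the Koblitz curve K-163, inversion uses plain exponentiation, and all function names are hypothetical. The adder and doubler are called one after the other here, whereas in hardware they can run side by side because both read the same inputs.

```python
# Sketch: x-only scalar multiplication built from Equation (1).
# Assumptions: GF(2^163) with the NIST pentanomial; b = 1 (K-163) by default.

N = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # x^163 + x^7 + x^6 + x^3 + 1


def gf_mul(a, b):
    """Bit-serial multiplication in GF(2^N), reducing at every step."""
    r = 0
    for i in reversed(range(N)):
        r <<= 1
        if r >> N:
            r ^= F
        if (b >> i) & 1:
            r ^= a
    return r


def gf_sqr(a):
    return gf_mul(a, a)


def gf_inv(a):
    """a^(2^N - 2), computed as the product of a^(2^i) for i = 1 .. N-1."""
    r, t = 1, a
    for _ in range(N - 1):
        t = gf_sqr(t)
        r = gf_mul(r, t)
    return r


def point_add(x, X1, Z1, X2, Z2):
    """X and Z of P1 + P2, where x is the affine x-coordinate of P = P2 - P1."""
    t1 = gf_mul(X1, Z2)
    t2 = gf_mul(X2, Z1)
    Z3 = gf_sqr(t1 ^ t2)
    X3 = gf_mul(x, Z3) ^ gf_mul(t1, t2)
    return X3, Z3


def point_double(b, X, Z):
    """X and Z of 2P."""
    X2, Z2 = gf_sqr(X), gf_sqr(Z)
    return gf_sqr(X2) ^ gf_mul(b, gf_sqr(Z2)), gf_mul(X2, Z2)


def ladder_x(k, x, b=1):
    """Affine x-coordinate of k*P for k >= 1, given the x-coordinate of P."""
    X1, Z1 = x, 1                    # P
    X2, Z2 = point_double(b, x, 1)   # 2P
    for bit in bin(k)[3:]:           # scalar bits after the leading 1
        if bit == "1":
            X1, Z1 = point_add(x, X1, Z1, X2, Z2)
            X2, Z2 = point_double(b, X2, Z2)
        else:
            X2, Z2 = point_add(x, X2, Z2, X1, Z1)
            X1, Z1 = point_double(b, X1, Z1)
    return gf_mul(X1, gf_inv(Z1))    # coordinate transformation: X / Z
```

A simple consistency check is that multiplying by 7 and then by 5 gives the same x-coordinate as multiplying by 35, for example `ladder_x(5, ladder_x(7, 2)) == ladder_x(35, 2)`. The sketch does not handle the point at infinity or scalars that are multiples of the point order.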

3. Field arithmetic

Addition in a binary Galois field is trivial (a bitwise XOR). If a trinomial or pentanomial can be chosen as the field polynomial, squaring can be implemented very efficiently using XOR gates. Multiplication is the most important field operation and must be implemented with high efficiency. We present a new architecture aimed at low wire density via hardwired XORs. In our MSD serial multipliers (see Figure 1), a reduction is performed at each iteration to keep the partial product size at n instead of n + d, where d is the digit size, which makes the design easier for EDA tools to place and route. For multiplicative inversion, we adopted the Itoh-Tsujii method [3].

Footnote 1: The projective Y-coordinate does not need to be computed at the intermediate stages; it can be retrieved from the X and Z coordinates of the final result.
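The squaring and inversion steps described above can be sketched in software as follows. This is a minimal illustration, assuming GF(2^163) with the NIST pentanomial x^163 + x^7 + x^6 + x^3 + 1 in polynomial basis; the function names are hypothetical, and the addition chain shown (driven by the binary expansion of n - 1) is one common way to organise Itoh-Tsujii inversion, not necessarily the schedule used in the accelerator.

```python
# Sketch: squaring by bit spreading and Itoh-Tsujii inversion in GF(2^163).
# Assumption: NIST reduction polynomial, polynomial basis.

N = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_reduce(c):
    """Reduce a polynomial of degree up to 2N-2 modulo f(x)."""
    for deg in range(c.bit_length() - 1, N - 1, -1):
        if (c >> deg) & 1:
            c ^= F << (deg - N)
    return c


def gf_mul(a, b):
    """Carry-less shift-and-XOR product followed by reduction."""
    c = 0
    while b:
        if b & 1:
            c ^= a
        a <<= 1
        b >>= 1
    return gf_reduce(c)


def gf_sqr(a):
    """Squaring is linear over GF(2): spread the bits, then reduce."""
    s = 0
    for i in range(N):
        if (a >> i) & 1:
            s |= 1 << (2 * i)
    return gf_reduce(s)


def itoh_tsujii_inv(a):
    """Return (inverse of a, number of multiplications used) for nonzero a."""
    beta, k, mults = a, 1, 0          # invariant: beta = a^(2^k - 1)
    for bit in bin(N - 1)[3:]:        # bits of n - 1 after the leading 1
        t = beta
        for _ in range(k):            # t = beta^(2^k), squarings only
            t = gf_sqr(t)
        beta, k, mults = gf_mul(t, beta), 2 * k, mults + 1
        if bit == "1":
            beta, k, mults = gf_mul(gf_sqr(beta), a), k + 1, mults + 1
    return gf_sqr(beta), mults        # a^(2^n - 2) = a^(-1)
```

For n = 163 this chain uses 9 field multiplications and 162 squarings, which is why cheap squaring matters for the inverter.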

[Figure 1: block diagram for digit size 4. The operand a(x) = (a0, ..., a162) is shifted by 4 each cycle to supply the digit d(x) = (d3, d2, d1, d0). The digit bits select among b(x), x·b(x) mod f(x), x^2·b(x) mod f(x), and x^3·b(x) mod f(x), each 163 bits wide. The results are accumulated in the 163-bit register c0 ... c162 (flip-flops clocked by clk), which also shifts by 4 each cycle.]

Figure 1. Digit-serial multiplier over GF(2^163)
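A software model of the most-significant-digit-first multiplier may help in reading Figure 1. It is a sketch under assumptions: GF(2^163) with the NIST pentanomial, hypothetical function names, and a loop per clock cycle instead of parallel logic. The accumulator is reduced after every shift, so it never grows beyond n bits, as described in Section 3.

```python
# Sketch: MSD digit-serial multiplication in GF(2^163) with digit size d.
# Assumption: NIST reduction polynomial x^163 + x^7 + x^6 + x^3 + 1.

N = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def xtime(v):
    """Multiply by x modulo f(x): one LFSR step."""
    v <<= 1
    return v ^ F if v >> N else v


def msd_mul(a, b, d=4):
    """Return a(x) * b(x) mod f(x), consuming d bits of a per iteration."""
    # Precompute b(x), x*b(x), ..., x^(d-1)*b(x), all reduced modulo f(x).
    rows = [b]
    for _ in range(d - 1):
        rows.append(xtime(rows[-1]))

    c = 0
    num_digits = -(-N // d)                 # ceil(N / d) iterations
    for i in reversed(range(num_digits)):   # most significant digit first
        for _ in range(d):                  # c <- c * x^d mod f(x)
            c = xtime(c)
        digit = (a >> (i * d)) & ((1 << d) - 1)
        for j in range(d):                  # AND-XOR array: add selected rows
            if (digit >> j) & 1:
                c ^= rows[j]
    return c
```

In this model a multiplication takes ceil(n/d) iterations: for n = 163 that is 41 iterations with d = 4, 21 with d = 8, and 6 with d = 32.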

4. FPGA implementation and results

Field multipliers are not shared among the point adder, point doubler, and coordinate transformer (Figure 2), which avoids a complicated data path that would have a negative effect on timing and routing. The digit sizes chosen for the six multipliers are listed in Table 1. Performance comparisons with the accelerators developed by Gura et al. [1] are provided in Table 2; both designs are ported to the same FPGA device, the Xilinx XCV2000E-FG680-7.

[Figure 2: block diagram of the Elliptic Curve Cryptography Accelerator. Blocks shown: Point Doubler, Point Adder, Coordinate Transformer, Inverter, Multiple Squarer. Components shown: multipliers mul_1 to mul_6 and squarers sqr_1 to sqr_7.]

Figure 2. The diagram of the ECC accelerator

Table 1. Digit size of multipliers

Multiplier    GF(2^163)    GF(2^233)
mul_1         32           32
mul_2         32           32
mul_3         32           32
mul_4         8            8
mul_5         8            8
mul_6         8            8

Table 2. Performance comparisons with the results of Gura et al.

                 Gura et al. [1]         Our design (d = 32)
Field size n     163        233          163        233
FFs              6,442      NA           7,467      10,632
LUTs             19,508     NA           25,763     35,800
f (MHz)          66.5       66.5         68.9       67.9
Latency (µs)     143        225          48         89

5. Conclusions

Our accelerator runs about three times as fast as the accelerator designed by Gura et al. for GF(2^163) (48 µs versus 143 µs) and about 2.5 times as fast for GF(2^233) (89 µs versus 225 µs), with comparable resource utilization. The speed is achieved through efficient field arithmetic, the top-level algorithm, and a rational partitioning of the design. In particular, LFSRs are exploited together with AND-XOR arrays in order to further optimize the design. The choice of the word length (digit size) of the multipliers is also a significant contribution to the efficiency of the design.
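The speedup figures can be rechecked directly from the latency and clock values reported in Table 2. The short calculation below is only arithmetic on those published numbers; the cycle counts it derives (clock frequency times latency) are an inference and are not stated in the paper, and the variable and function names are hypothetical.

```python
# Sketch: speedup and implied cycle counts from the Table 2 values.
# Each entry is (clock frequency in MHz, latency in microseconds).

gura = {163: (66.5, 143), 233: (66.5, 225)}
ours = {163: (68.9, 48), 233: (67.9, 89)}


def speedup(n):
    """Latency of Gura et al. divided by the latency of this design."""
    return gura[n][1] / ours[n][1]


def cycles(design, n):
    """Implied clock cycles per scalar multiplication (MHz * µs = cycles)."""
    f_mhz, latency_us = design[n]
    return f_mhz * latency_us


for n in (163, 233):
    print(f"n = {n}: speedup {speedup(n):.2f}x, "
          f"about {cycles(ours, n):.0f} cycles per scalar multiplication")
```

Because the two designs run at nearly the same clock frequency, the latency gain comes almost entirely from needing fewer cycles, which is consistent with running several multipliers in parallel.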

6. References

[1] N. Gura et al. An end-to-end systems approach to elliptic curve cryptography. In CHES '02, pages 349–365, 2003.
[2] FIPS-186-2, Digital Signature Standard.
[3] T. Itoh and S. Tsujii. A fast algorithm for computing multiplicative inverses in GF(2^m) using normal bases. Inf. Comput., 78(3):171–177, 1988.
[4] J. López and R. Dahab. Fast multiplication on elliptic curves over GF(2^m) without precomputation. In CHES '99, pages 316–327, 1999.
[5] G. Orlando and C. Paar. A high performance reconfigurable elliptic curve processor for GF(2^m). In CHES '00, pages 41–56, 2000.
