Parallel Algorithm for Computing Simultaneous Inversions with Application to Elliptic Curve Scalar Multiplication

Palash Sarkar, Pradeep Kumar Mishra and Rana Barua

Abstract: Montgomery's trick is a well-known technique for performing simultaneous inversions of several field elements. However, this technique is a strictly sequential algorithm. Here we introduce a parallel algorithm for performing simultaneous inversions of several finite field elements. The algorithm uses a binary tree and can perform inversions of $2^r$ elements using $3 \times 2^{r-1}$ multipliers in $(r + 1)$ multiplication rounds and one inversion round. We also describe how to modify the algorithm when fewer multipliers are available. This parallel algorithm is used to obtain a new parallel algorithm for elliptic curve scalar multiplication using a fixed base point. The scalar multiplication algorithm is resistant against simple power analysis (SPA) and can be implemented with different numbers of multipliers (2, 4, 8, ...). Results show that an implementation with 2 multipliers can lead to an almost 40% speed-up over the previously best known sequential SPA-resistant algorithm.

Keywords: Parallel algorithm, Montgomery's trick, simultaneous inversion, elliptic curve cryptosystems, scalar multiplication.

I. INTRODUCTION

Finite fields play an important part in coding and cryptography. The basic operations on finite fields are addition, multiplication and inversion. Of these, addition is the cheapest and inversion is the costliest. The cost of an inversion can be as high as thirty (or more) times the cost of a multiplication for a prime field [6], whereas the cost of an addition is usually negligible compared to that of a multiplication. In many situations, the requirement is to compute the inverses of several field elements (for example, in the SSL handshake scheme [9]). Montgomery's trick (see for example [9]) is an elegant technique for the simultaneous computation of the inverses of several field elements. Using this trick it is possible to compute the inverses of $n$ elements using $3(n - 1)$ multiplications and one inversion (see Section II-A). However, Montgomery's trick is a strictly sequential algorithm with little scope for parallelism.

In this paper, we introduce a new algorithm for the simultaneous computation of the inverses of several field elements. A sequential execution of this algorithm requires more multiplications than Montgomery's trick. On the other hand, our algorithm is nicely parallelizable and can be implemented quite easily using a binary tree. We show that the inverses of $2^r$ elements can be computed using $3 \times 2^{r-1}$ multipliers in $(r + 1)$ multiplication rounds and one inversion round. When $r$ is moderately large, the availability of $3 \times 2^{r-1}$ multipliers can be costly. In such a situation, we show how to implement the algorithm using a fixed number (2, 4, 8, ...) of multipliers.

Elliptic curve scalar multiplication is the operation of computing $mP$, where $m$ is a positive integer and $P$ is a point on the curve. This is the basic operation on which elliptic curve cryptography is built. Consequently, there has been a tremendous amount of research on obtaining efficient algorithms for different situations. The new book [3] provides an excellent discussion of this area. One particularly important issue is resistance against side-channel attacks based on power analysis, which are of two types: simple power analysis (SPA) and differential power analysis (DPA).

The second part of our paper is devoted to obtaining a new parallel algorithm for elliptic curve scalar multiplication. The algorithm is SPA resistant and is applicable to the situation where the base point $P$ is fixed. (The algorithm can be made DPA resistant using generic techniques, though we do not describe them here.) The main idea behind the algorithm is to combine the new inverse computation algorithm with a table look-up and a binary-tree-based technique. The algorithm can be implemented using a fixed number (2, 4, 8, ...) of multipliers. Results show that over prime fields (where inversion is relatively costly), using 2 multipliers leads to an almost 40% speed-up over the best previously known sequential algorithm.

We would like to point out that the current work does not address the problem of efficiently computing the inverse of a single element. This problem has been studied extensively in the literature (see for example [10]). We are concerned with replacing more than one inversion with a single inversion and some multiplications. Thus our approach is complementary to the problem of obtaining an efficient algorithm for inversion, i.e., any efficient algorithm for inversion can be used to invert the single element that is required by our algorithm.
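To make the idea of a tree-structured, parallelizable simultaneous inversion concrete, here is a minimal Python sketch (Python 3.8+ for `pow(x, -1, p)`). It is not the paper's algorithm, whose details are given in Section III; it is a hypothetical scheme chosen only because it has the same overall shape: for $n = 2^r$ nonzero elements modulo a prime $p$ it uses $r + 1$ rounds of mutually independent multiplications plus one inversion. Each subtree keeps the product of its leaves and, for every leaf, the product of the other leaves in that subtree. The function name `tree_batch_inverse` and its return format are illustrative assumptions.

```python
from typing import List, Tuple


def tree_batch_inverse(xs: List[int], p: int) -> Tuple[List[int], List[int]]:
    """Hypothetical tree-structured simultaneous inversion modulo a prime p.

    Returns (inverses, mults_per_round). Multiplications counted in the same
    round depend only on results of earlier rounds, so they could run on
    separate multipliers in parallel.
    """
    n = len(xs)
    assert n > 0 and n & (n - 1) == 0, "this sketch assumes n = 2^r"
    # One block per subtree: (product of its leaves, list holding, for each
    # leaf, the product of the *other* leaves in the same subtree).
    blocks = [(x % p, [1]) for x in xs]
    rounds: List[int] = []
    while len(blocks) > 1:  # r up-sweep rounds, one per tree level
        count = 0
        merged = []
        for (prod_l, oth_l), (prod_r, oth_r) in zip(blocks[0::2], blocks[1::2]):
            if len(oth_l) == 1:
                # Merging two leaves: each leaf's "others" product is just
                # its sibling, so no multiplication is needed here.
                others = [prod_r, prod_l]
            else:
                # Left leaves also absorb the right subtree's product and
                # vice versa: one multiplication per leaf.
                others = [o * prod_r % p for o in oth_l] + [o * prod_l % p for o in oth_r]
                count += len(others)
            merged.append((prod_l * prod_r % p, others))
            count += 1  # product of the two subtrees
        blocks = merged
        rounds.append(count)
    total, others = blocks[0]
    total_inv = pow(total, -1, p)  # the single inversion round
    # Final round: inverse of leaf j = (product of all leaves)^-1 * (product of the others).
    inverses = [total_inv * o % p for o in others]
    rounds.append(n)
    return inverses, rounds
```

For $n = 8$ the per-round counts are 4, 10, 9 and 8 multiplications, so the busiest round stays within $3 \times 2^{r-1} = 12$ multipliers. For $r \ge 2$ this toy scheme performs more multiplications in total than Montgomery's trick, trading work for depth.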

II. BACKGROUND

A. Montgomery's Trick

In Montgomery's trick (see for example [9]), inverses are computed as follows. Let $x_1, \ldots, x_n$ be the elements to be inverted. Set $a_1 = x_1$ and, for $i = 2, \ldots, n$, compute $a_i = a_{i-1} x_i$. Then invert $a_n$ and compute $x_n^{-1} = a_n^{-1} a_{n-1}$ (this holds because $a_n = a_{n-1} x_n$). Now, for $i = n-1, n-2, \ldots, 2$, compute $a_i^{-1} = x_{i+1} a_{i+1}^{-1}$ and $x_i^{-1} = a_i^{-1} a_{i-1}$. Finally compute $x_1^{-1} = a_1^{-1} = x_2 a_2^{-1}$. This procedure provides $x_1^{-1}, \ldots, x_n^{-1}$ using a total of $3(n - 1)$ multiplications and one inversion.

The authors are with the Cryptology Research Group, Applied Statistics Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta, Pin 700 108, INDIA. E-mail: {palash, pradeep-t, rana}@isical.ac.in.
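The procedure of Section II-A translates almost line for line into code. Below is a minimal Python sketch (Python 3.8+ for `pow(a, -1, p)`), assuming a prime modulus $p$ and a nonempty list of nonzero inputs; the function name `montgomery_batch_inverse` is illustrative. It also counts multiplications so that the $3(n - 1)$ figure can be checked.

```python
from typing import List, Tuple


def montgomery_batch_inverse(xs: List[int], p: int) -> Tuple[List[int], int]:
    """Montgomery's trick: invert every element of xs mod p with one inversion.

    Returns (inverses, number_of_multiplications). Indices are 0-based here.
    """
    n = len(xs)
    mults = 0
    # Forward pass: a[i] holds the prefix product xs[0] * ... * xs[i].
    a = [xs[0] % p]
    for x in xs[1:]:
        a.append(a[-1] * x % p)
        mults += 1
    inv = [0] * n
    a_inv = pow(a[-1], -1, p)  # the single inversion: (x_1 * ... * x_n)^-1
    # Backward pass from the last element down to the second one;
    # at the start of each iteration a_inv equals a[i]^-1.
    for i in range(n - 1, 0, -1):
        inv[i] = a_inv * a[i - 1] % p  # xs[i]^-1 = a[i]^-1 * a[i-1]
        a_inv = a_inv * xs[i] % p      # a[i-1]^-1 = a[i]^-1 * xs[i]
        mults += 2
    inv[0] = a_inv  # xs[0]^-1 = a[0]^-1
    return inv, mults
```

Every step of the backward pass needs the result of the previous one, which is exactly the sequential dependency the paper sets out to remove.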

0-7803-8294-3/04/$20.00 ©2004 IEEE


TABLE I
ECADD AND ECDBL ALGORITHMS

Algorithm ECADD
Input: $P = (x_1, y_1)$, $Q = (x_2, y_2)$. Output: $P + Q = (x_3, y_3)$.
A1. $t_1 = (x_2 - x_1)^{-1}$
A2. $\lambda = t_1 \cdot (y_2 - y_1)$
A3. $x_3 = \lambda^2 - x_1 - x_2$
A4. $y_3 = \lambda \cdot (x_1 - x_3) - y_1$
A5. return $(x_3, y_3)$

Algorithm ECDBL
Input: $P = (x_1, y_1)$. Output: $2P = (x_4, y_4)$.
D1. $T_1 = (2y_1)^{-1}$
D2. $T_2 = 3x_1^2 + a$
D3. $\lambda = T_1 \cdot T_2$
D4. $x_4 = \lambda^2 - 2x_1$
D5. $y_4 = \lambda \cdot (x_1 - x_4) - y_1$
D6. return $(x_4, y_4)$

B. Elliptic Curve Preliminaries

Elliptic curve cryptography has a wide literature. We refer the reader to the excellent book [3] for details. Here we only describe the essentials that we will require. An elliptic curve point is represented using a pair of finite field elements. Addition and doubling of points are performed as shown in Table I (for the case where the characteristic of the field is greater than 3). Note that in the addition algorithm we assume $P \neq Q$ (more precisely $x_1 \neq x_2$, i.e., $P \neq \pm Q$, so that $x_2 - x_1$ is invertible). Let $[i]$, $[m]$ and $[s]$ be the times required for one inversion, one multiplication and one squaring in the underlying field, respectively. Then ECADD has complexity $1[i] + 2[m] + 1[s]$ and ECDBL has complexity $1[i] + 2[m] + 2[s]$. In the current work, we do not distinguish between a multiplication and a squaring. This may not be a realistic assumption when the underlying field is represented using a normal basis; in such a situation, squaring is virtually free. However, for standard (or polynomial) basis representation the cost of a squaring is nearly equal to that of a multiplication. Hence we consider only standard basis representation of the finite field.
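As an illustration of Table I, the following Python sketch implements ECADD and ECDBL in affine coordinates over a small prime field (Python 3.8+ for `pow(x, -1, p)`). The toy curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ and the point $(3, 6)$ are arbitrary choices for demonstration, not parameters from the paper, and the code is not hardened against side channels. Each call performs exactly one field inversion, which is why sharing one inversion across many simultaneous point operations pays off.

```python
from typing import Tuple

Point = Tuple[int, int]

# Toy parameters (illustrative only): y^2 = x^3 + A*x + B over F_P_MOD.
P_MOD = 97
A, B = 2, 3


def on_curve(pt: Point) -> bool:
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def ec_add(p1: Point, p2: Point) -> Point:
    """Algorithm ECADD (affine); requires x1 != x2, i.e. P != +-Q."""
    (x1, y1), (x2, y2) = p1, p2
    t1 = pow((x2 - x1) % P_MOD, -1, P_MOD)  # A1: the one inversion
    lam = t1 * (y2 - y1) % P_MOD            # A2: multiplication
    x3 = (lam * lam - x1 - x2) % P_MOD      # A3: squaring
    y3 = (lam * (x1 - x3) - y1) % P_MOD     # A4: multiplication
    return x3, y3


def ec_dbl(p1: Point) -> Point:
    """Algorithm ECDBL (affine); requires y1 != 0."""
    x1, y1 = p1
    t1 = pow(2 * y1 % P_MOD, -1, P_MOD)     # D1: the one inversion
    t2 = (3 * x1 * x1 + A) % P_MOD          # D2: squaring (times a small constant)
    lam = t1 * t2 % P_MOD                   # D3: multiplication
    x4 = (lam * lam - 2 * x1) % P_MOD       # D4: squaring
    y4 = (lam * (x1 - x4) - y1) % P_MOD     # D5: multiplication
    return x4, y4
```

For example, `ec_dbl((3, 6))` returns `(80, 10)`, and computing $4P$ as `ec_dbl(ec_dbl(P))` or as `ec_add(ec_add(ec_dbl(P), P), P)` gives the same point.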

III. A NEW ALGORITHM FOR COMPUTING SIMULTANEOUS INVERSES

Suppose we wish to compute the inverses of the field elements $a_0, \ldots, a_{n-1}$. For simplicity, let us assume $n = 2^r$ for some $r$. Consider the full binary tree $T$ with levels numbered $0$ to $r$. At the $j$-th node of the $i$-th level of the tree ($0 \le i \le r$; $0$ ...

... $2$ be a positive integer. We express $m$ in base $2^w$. Let $m = c_0 + c_1 2^w + \cdots + c_{t-1} 2^{w(t-1)}$, where each $c_j \in \{0, \ldots, 2^w - 1\}$ and $t = 2^r$ for some $r$. Then $mP = c_0 P + c_1 2^w P + \cdots + c_{t-1} 2^{w(t-1)} P$. For all $j$ with $0 \le j \le t - 1$ we precompute $2^{jw} P$ and store it in a table $T[\,]$, so that $T[j] = 2^{jw} P$ for $0 \le j \le t - 1$. This table is used to simultaneously compute $c_0 P, c_1 2^w P, \ldots, c_{t-1} 2^{w(t-1)} P$ using the right-to-left binary method. Finally we add them to obtain $mP$. Let the $n$-bit binary representation of $m$ be $m_{n-1} \ldots m_0$. Note that $t = \lceil n/w \rceil$. We express each $c_j$ in binary, i.e., we write $c_j = c_j^{(0)} + c_j^{(1)} 2 + \cdots + c_j^{(w-1)} 2^{w-1}$, where $c_j^{(i)} = m_{wj+i}$.

Algorithm PAR-SC
Input: $c_j^{(i)}$ for $0 \le i \le w - 1$, $0 \le j \le t - 1$; table $T[\,]$.
Output: $mP$.
1. For $j = 0$ to $t - 1$ do

TABLE III
NUMBER OF ROUNDS REQUIRED FOR A 160-BIT SCALAR (NOTE: $2^r = 160/w$)

w \ 2^p (multipliers)    2             4            8
5                        (1426, 10)    (726, 10)    (381, 10)
10                       (1402, 14)    (718, 14)    (383, 14)
20                       (1306, 23)    (679, 23)    (377, 23)

   $R_j = T[j]$; $Q_j^{(c_j^{(0)})} = R_j$; $Q_j^{(1 - c_j^{(0)})} = 0$;
2. $(R_0, \ldots, R_{t-1}) = t$-ECDBL$(R_0, \ldots, R_{t-1})$;
3. For $i = 1$ to $w - 2$ do
   for all $0 \le j \le t - 1$ simultaneously, using one call to $t$-ECADDBL: $Q_j^{(c_j^{(i)})} = Q_j^{(c_j^{(i)})} + R_j$ and $R_j = 2R_j$;
4. For all $0 \le j \le t - 1$ simultaneously, using one call to $t$-ECADD: $Q_j^{(c_j^{(w-1)})} = Q_j^{(c_j^{(w-1)})} + R_j$;
5. Let $res = t$-ADD$(Q_0^{(1)}, \ldots, Q_{t-1}^{(1)})$;
6. Return $(res)$.

Note that the amount of computation is independent of the values of the $c_j$'s. Hence the algorithm is SPA resistant.

Algorithm $t$-ADD takes as input $t$ points and computes their sum.

Algorithm $2^r$-ADD
Input: $P_0, P_1, \ldots, P_{2^r - 1}$.
Output: $P_0 + P_1 + \cdots + P_{2^r - 1}$.
1. For $i = 1$ to $r$ do
2.   $k = 2^{r-i}$;
3.   for all $0 \le j \le k - 1$ simultaneously, using one call to $k$-ECADD: $P_{j 2^i} = P_{j 2^i} + P_{j 2^i + 2^{i-1}}$;
4. return $P_0$.

Algorithm $2^r$-ADD invokes $2^i$-ECADD for $0 \le i \le r - 1$. Using the cost of $2^i$-ECADD, we obtain the cost of $2^r$-ADD to be a total of $\sum_{i=0}^{r-1} \left(3\lceil 2^i/2^p \rceil + P_1(i, p)\right)$ multiplication rounds and $r$ inversion rounds. Thus we can now obtain the number of rounds required to compute Algorithm PAR-SC.

Proposition 5: Using $2^p$ multipliers, Algorithm PAR-SC requires $(r + w)$ inversion rounds and
$$(w + 5)\lceil 2^r/2^p \rceil + 2P_1(r, p) + \sum_{i=0}^{r-1}\left(3\lceil 2^i/2^p \rceil + P_1(i, p)\right) + (w - 2)\left(3\lceil 2^{r+1}/2^p \rceil + P_1(r + 1, p)\right)$$
multiplication rounds to complete the scalar multiplication.

Proof: Algorithm PAR-SC invokes $2^r$-ECDBL, $2^r$-ECADD and $2^r$-ADD once each. Further, it invokes $2^r$-ECADDBL a total of $(w - 2)$ times. Adding up all the costs gives us the required result. ∎

V. RESULTS

In this section, we present some results for typical situations. For elliptic curve cryptosystems, a group order of 160 bits is considered to be secure. We consider various window sizes (values of $w$) and different numbers of multipliers ($2^p$), and in each case we calculate the number of multiplication and inversion rounds required by Algorithm PAR-SC. The results are summarized in Table III. Each entry of Table III is a pair of the form $(a, b)$, where $a$ is the number of multiplication rounds and $b$ is the number of inversion rounds. The cost of an inversion over prime ($> 3$) fields can be thirty or more times the cost of a multiplication [2], [6]. If we assume that an inversion is equivalent to 30 multiplications, then for window size $w = 5$ and using two multipliers, we will require around $1426 + 30 \times 10 = 1726$ multiplication rounds. Izu, Takagi and Möller [5] have reported that the best scalar multiplication algorithm for elliptic curves over fields of characteristic more than 3 takes 2429 multiplications, excluding additions and assuming that the costs of multiplication and squaring are the same. Our algorithm does the same computation in 1726 multiplication rounds with 2 multipliers, which is a 40% speed-up over the sequential algorithm ($2429/1726 \approx 1.41$).

REFERENCES

[1] J.-S. Coron. Resistance against Differential Power Analysis for Elliptic Curve Cryptosystems. In CHES 1999, pp. 292-302.
[2] K. Fong, D. Hankerson, J. López and A. Menezes. Field Inversion and Point Halving Revisited. Technical Report CORR 2003-18, Department of Combinatorics and Optimization, University of Waterloo, Canada, 2003.
[3] D. Hankerson, A. Menezes and S. Vanstone. Guide to Elliptic Curve Cryptography. To appear.
[4] T. Izu and T. Takagi. A Fast Parallel Elliptic Curve Multiplication Resistant against Side-Channel Attacks. Technical Report CORR 2002-03, University of Waterloo, 2002. Available at http://www.cacr.math.uwaterloo.ca
[5] T. Izu, B. Möller and T. Takagi. Improved Elliptic Curve Multiplication Methods Resistant against Side Channel Attacks. In Proceedings of Indocrypt 2002, LNCS 2551, pp. 296-313, Springer-Verlag.
[6] A. J. Menezes, P. C. van Oorschot and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1997.
[7] P. Montgomery. Speeding the Pollard and Elliptic Curve Methods of Factorization. Math. Comp., vol. 48, pp. 243-264, 1987.
[8] K. Okeya and K. Sakurai. Efficient Elliptic Curve Cryptosystems from a Scalar Multiplication Algorithm with Recovery of the y-Coordinate on a Montgomery-Form Elliptic Curve. In CHES 2001, LNCS 2162, pp. 126-141, Springer-Verlag, 2001.
[9] H. Shacham and D. Boneh. Improving SSL Handshake Performance via Batching. In CT-RSA 2001, LNCS 2020, Springer-Verlag, 2001.
[10] N. Sklavos, K. Papadomanolakis, P. Kitsos and O. Koufopavlou. Euclidean Algorithm VLSI Implementations. In Proceedings of the 9th IEEE International Conference on Electronics, Circuits and Systems (ICECS'02), Vol. II, pp. 557-560, Croatia, September 15-18, 2002.
