An Efficient High Performance Scalar Multiplication Method with Resistance against Timing Attacks Turki F. Al-Somani1 and Alaaeldin Amin2 1 Computer Engineering Department, Umm Al-Qura University, Makkah, Saudi Arabia 2 Computer Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia, E-mail:
[email protected] [email protected] Abstract This paper presents an efficient high performance elliptic curve scalar multiplication method with resistance against Timing Attacks. The main idea of the proposed method is to control the main scalar multiplication loop such that either a single point addition is performed or a number of consecutive point doublings that take the same time taken by a single point addition is performed. The proposed method works with both binary-encoded as well as -AF-encoded private keys with -AF encoding yielding higher performance. It requires no extra fake computations and its time complexity is less than other recently reported countermeasures, especially when parallel multipliers are used.
1. Introduction Elliptic Curve Cryptosystems (ECCs) have been recently attracting increased attention [1]. The ability to use smaller key sizes and the computationally more efficient ECC algorithms compared to those used in earlier public key cryptosystems such as RSA [2] and ElGamal [3] are two main reasons why ECCs are becoming more popular. They are considered particularly suitable for implementation on smart cards or mobile devices. Because of the physical characteristics of such devices and their use in potentially hostile environments, Side Channel Attacks (SCA) [4] [5] on such devices are considered serious threats. SCA seek to break the security of these devices through observing their power consumption trace or computations timing. SCA are usually divided into two types. The first type, Simple Side Channel Attack (SSCA), which is based on a single observation of power consumption or the consumed time, while the second type, Differential Side Channel Attack (DSCA)
combines SSCA with an error-correcting technique using statistical analysis [4]. SCA have been demonstrated against implementations of many cryptosystems, utilizing timing, power consumption, electromagnetic radiation, etc [4 - 21]. Public key cryptosystems have proved the most vulnerable to timing attacks because they typically perform lengthy mathematical operations, the running time of which depends directly on the data due to branch statements. Timing attacks on ECCs and countermeasures have been reported in [13, 22 – 24]. Schepers in [13] applied Timing Attacks on a software implementation of an ECC written in C++. The work in [13] showed that inserting fake computations can be discovered by Timing Attacks at the commitment stage [25]. Chevallier-Mames et al. [22] proposed side-channel atomicity as an efficient countermeasure against SSCA. Side-channel atomicity involves nearly no computational overhead to resist against SSCA. It splits the elliptic curve point operations into atomic blocks, which are indistinguishable from each other. Hence, side-channel atomicity is considered as an inexpensive countermeasure that does not leak out any data regarding which operation is being performed. Hodjat et. al. in [23] proposed a new scalar multiplication algorithm that inspects three bits at a time to resist against Timing Attacks. The proposed work in [23] defined a new dataflow for point addition and point doubling. However, the proposed work in [23] uses two parallel field multipliers and a single field squarer. Recently, Al-Somani and Ibrahim in [24], presented an elliptic curve cryptoprocessor with resistance against Timing Attacks. The proposed cryptoprocessor uses three parallel multipliers to increase both the performance and the immunity against Timing Attacks. This paper presents an efficient high performance elliptic curve scalar multiplication method with
resistance against timing attacks. The rest of this paper is organized as follows: Section 2 gives a brief introduction to ECCs. Section 3 describes the proposed scalar multiplication method. Section 4 analyzes the security of the proposed method and provides a comparison between the proposed method and other recent countermeasures. Finally, Section 5 concludes this work.
2. Elliptic Curve Preliminaries Elliptic curve cryptosystems, originally proposed by Niel Koblitz and Victor Miller in 1985 [1], are considered as viable alternative to RSA cryptosystems but with much shorter key sizes. An ECC with key size of 128-256 bits has been shown to offer equal security to that of RSA with key size of 1-2K bits [1]. To date, no significant breakthroughs have been reported in determining weaknesses in the ECCs, which are based on the discrete logarithm problem over points on an elliptic curve. An elliptic curve E over the finite field GF(p) defined by the parameters a, b ∈ GF(p) with p > 3, consists of the set of points P = (x, y), where x, y ∈ GF(p), that satisfy the equation: y2 = x3 + ax + b (1) where a, b ∈ GF(p) and 4a3 + 27b2 ≠ 0 mod p, together with the point at infinity O which is the additive identity of the group [1]. The number of points #E on an elliptic curve over a finite field GF(q=pm) is defined by Hasse’s theorem [26]. The set of discrete points on an elliptic curve forms an abelian group, whose group operation is known as point addition. Elliptic curve point addition is defined according to the “chord-tangent process”. Point addition over GF(p) is described as follows: Let P and Q be two distinct points on an elliptic curve E defined over GF(p) with Q ≠ -P (Q is not the additive inverse of P). The addition of the two points P and Q is the point R (R = P + Q), where R is the additive inverse of S, with S being the third point on E intercepted by the straight line through points P and Q. The additive inverse of a point P = (x, y) ∈ Ε is the point – P = (x, – y) which is the reflection of the point P with respect to the x-axis on E. When P = Q and P ≠ – P the addition of P and Q is the point R (R = 2P), where R is the additive inverse of S with S being the third point on E intercepted by the straight line tangent to the curve at point P. This operation is referred to as point doubling. The finite field GF(2m) is of particular importance in cryptography since it leads to efficient hardware implementations. Elements of the field are represented in terms of a basis. Most implementations use either a
Polynomial Basis or a Normal Basis [27]. Let GF(2m) be a finite field of characteristic two. A nonsupersingular elliptic curve E over GF(2m) is defined to be the set of solutions (x, y) ∈ GF(2m) × GF(2m) to the equation, y2 + xy = x3 + ax2 + b (2) ∈ where a and b GF(2m), b ≠ 0, together with the point at infinity. Adding a point P on the elliptic curve E to itself a number of times (k) is known as the scalar product (kP) of point P by the scalar k. Scalar multiplication is a basic operation for ECCs. The scalar multiplication operation (kP) yields a point on the elliptic curve which is the result of adding point P to itself k times. Several scalar multiplication methods have been proposed in the literature [29]. Computing kP can be done using a straightforward binary method, so called the doubleand-add method, based on the binary expression of the multiplier k. Computing kP using the binary method is shown below: Algorithm 1: Binary Algorithm (most-to-least) 1. input P, k 2. Q ← P 3. for i from m-2 downto 0 do 3.1. Q ← 2Q 3.2. if ki = 1 then Q ← Q + P 4. end for 5. output Q Depending on the type of coordinate system used for field operations, point addition operations require double or triple the time taken by point doubling operations [24]. Thus, for the binary algorithm (Algorithm 1), the time taken by single loop iteration will be considerably different depending on the value of the bit being inspected (ki). This iteration time dependence on the bit value of the scalar has allowed timing attacks [5] to infer the value of the private scalar k. To guard against timing attacks, the double-and-addalways algorithm has been proposed where each iteration performs both a doubling operation and an add operation irrespective of the value of ki [9]. In such a case, the value of point addition is only committed when ki = 1 and discarded otherwise. Whereas the binary method requires m point doublings and an average of m point additions, the double-and-add2
always method requires m point doublings and m point additions. Non-Adjacent-Form (NAF) [30] reduces the average number of point additions to (m/3). In NAF, signed digit representation are used such that the scalar multiplier’s coefficient ki ∈ {0, ±1}. Algorithm 2
shows the NAF binary scalar multiplication algorithm. Algorithm 2 inspects the bits of the scalar multiplier k, if the inspected bit ki = 0, only point doubling is performed. If, however, the inspected bit ki ≠ 0, both point doubling and addition/subtraction are performed. Algorithm 2: Binary *AF Algorithm 1. input P, NAF(k) 2. Q ← P 3. for i from m-2 to 0 do 3.1 Q ← 2Q 3.2 if ki = 1 then Q ← Q + P 3.3 if ki = -1 then Q ← Q – P 4. output Q
3. Proposed Scalar Multiplication Method In this section, the proposed scalar multiplication method is presented. The main idea of the proposed method is to ensure that the main loop of the scalar multiplication algorithm require the same time in each iteration. This can be achieved by observing that the required time to compute point addition in projective coordinates is from 2 to 3 times the required time to perform point doubling [28]. Accordingly, the proposed method either performs a single point addition or performs a number of consecutive point doublings that require the same time taken by a single point addition. However, to make these consecutive point doublings appear exactly the same as a single point addition, extra dummy field operations might be needed either within the point addition operation or within point doubling operation. This, of course, depends on the proper selection of the coordinate system. The proposed method exploits the absence of interdependency between point doubling and point addition within the least-to-most version of the binary algorithm (see Algorithm 3). For a given iteration, the point addition (Step 4.1) and point doubling (Step 4.2) operations, in Algorithm 3, are independent. Accordingly, several consecutive point doublings can be performed and stored to be added later, if needed, to the accumulation point. Algorithm 3: Binary Algorithm (least-to-most) 1. input P, k 2. R←P 3. Q←O 4. for i from 0 to m-1 do 4.1. if ki = 1 then Q ← Q + R 4.2. R ← 2R 5. end for 6. output Q
The pseudo code of the proposed scalar multiplication method is given in Algorithm 4 with the assumption that a single point addition takes the same time taken two point doublings. It is worth noting that Algorithm 4 works for both binary-encoded as well as NAF-encoded private key (k). Using a NAF-encoded key improves the overall average performance of the scalar multiplication. In Algorithm 4, Step 2 is the initialization step where two registers are initialized with the base point P and a precomputed point 2P. Algorithm 4 also needs two flags (flag0 & flag1) and an index (i) to control the execution of point operations. These flags and the index are initialized at Step 2 as well. The main scalar multiplication loop starts at Step 3. Within Step 3, two bits of the scalar multiplier k are processed and the index is updated. Step 3.1 inspects the ith bit of the private scalar (ki), while step 3.2 inspects the the (i+1)th bit ki+1. If the key bit being inspected (ki or ki+1) equals 1 (ī), a point addition (subtraction) is performed while no operation is performed if it equals 0. Step 3.3 performs two consecutive doublings regardless of the value of the inspected bits ki and ki+1. In general, each loop iteration takes several passes to be executed and the index i updated. The number of passes in an iteration depends on the value of the private scalar bits being inspected (ki ki+1). If ki ki+1 = 00, only one pass is needed to execute the two doubling operations of step 3.3. If ki ki+1 = 01 (10), two passes are needed; one to execute Step 3.1 (3.2) with one point addition or subtraction, and another to execute the two doubling operations of step 3.3. If ki ki+1 = 11, three passes are performed; one to execute Step 3.1 with one point addition or subtraction, another to execute Step 3.2 with one point addition or subtraction, and a third pass to execute the two doubling operations of steps 3.3. In a given iteration, the passes to be executed are controlled by the value of the two flags flag0 and flag1. Thus, any given pass performs either a single point addition (subtraction) or equivalently two point doublings. At the end of each iteration, the flags are cleared and the index i is incremented by 2. If the required time to perform point addition, however, is 3 times that of point doubling, Algorithm may be used. In any given pass of the main scalar multiplication loop (Step 3), only one point addition or equivalently 3 consecutive doublings may be performed. Accordingly, any pass of the main loop in Algorithms 4 and 5 should take the same time irrespective of the values of private scalar bit being inspected. Choice Of either of the two algorithms
depends on the adopted coordinate system for field operations. Table 1 shows an example of the execution of the proposed method using Algorithm 4. In Table 1, the scalar multiplier k is (100010 1 0)NAF = (134)10. The following series of point operations are required to compute kP: (ADD-DBL-DBL)-(ADD-DBL-DBL)(DBL-DBL)-(ADD-DBL-DBL). Table 2 shows an example of the execution of the proposed method using Algorithm 5 with a scalar multiplier k = (1010 1 0)NAF = (38)10. Algorithm 4: The Proposed Algorithm assuming that 2DBLs= 1ADD 1. input P, k 2. R1 ← P, R2 ← 2P, Q ← O, km = 0, flag0 = False , flag1 = False, i= 0 3. While (i < m) do 3.1. if (not flag0) then 3.1.1. flag0 = True 3.1.2. if (ki ≠ 0) then Q ← Q + ki R1 3.2. Else if (not flag1) then 3.2.1. flag1 = True 3.2.2. if (ki+1 ≠ 0) then Q ← Q + ki+1 R2 3.3. Else 3.3.1. i= i + 2 3.3.2. if i < m then 3.3.2.1. R1 ← 2 R2 3.3.2.2. R2 ← 2 R1 3.3.2.3. flag0 = False , flag1 = False 4. output Q Algorithm 5: The Proposed Algorithm assuming that 3DBLs= 1ADD 1. input P, k 2. R1 ← P, R2 ← 2P, R4 ← 2 R2, Q ← O, km = km+1 = 0, flag0 = False , flag1 = False , flag2 = False, i= 0 3. While (i < m) do 3.1. if (not flag0) then 3.1.1. flag0 = True 3.1.2. if (ki ≠ 0) then Q ← Q + ki R1 3.2. Else if (not flag1) then 3.2.1. flag1 = True 3.2.2. if (ki+1 ≠ 0) then Q ← Q + ki+1 R2 3.3. Else if (not flag2) then 3.3.1. flag2 = True 3.3.2. if (ki+2 ≠ 0) then Q ← Q + ki+2 R4 3.4. Else 3.4.1. i= i + 3 3.4.2. if i < m then 3.4.2.1. R1 ← 2 R4 3.4.2.2. R2 ← 2 R1
3.4.2.3. 3.4.2.4. 4.
R4 ← 2 R2 flag0 = False , flag1 = False flag2 = False
output Q
4. Security & Performance Analysis Although the execution of point operations vary according to the inspected key bits, the adopted resistance measures do not allow the attacker to associate these operations with a specific key bit position. Because only one point operation is performed in any iteration pass within the scalar multiplication loop. The proposed method either performs an addition or an equivalent consecutive number of doubles in any iteration pass. This renders the proposed method secure against Timing Attacks. The time complexity of the proposed method requires on the average [(m) DBL + ( m ) ADD] 3
operations with no added fake computations resulting in an efficient implementations in terms of both space and time complexities. To compare the proposed method to other recently proposed countermeasures, e.g. [22], [23] and [24], we assume GF(2m) field with normal basis representation [27]. Further, the Lopez-Dahab projective coordinate system [33] with 14 and 5 field multiplications needed to compute point addition and point doubling respectively is used. In addition, the bit-serial MasseyOmura multiplier [32] is used since it works for both optimal normal basis type I and type II [27]. Being bitserial, m clock cycles are required to perform a field multiplication operation. Since Side-Channel Atomicity [22] is a general technique that can be used with any finite field, basis or projective coordinate system, the above assumptions are used to compare the proposed method with SideChannel Atomicity. Table 3 compares the time complexity of the method proposed here to those of other recently reported countermeasures [22-24]. The designs reported in [23] & [24], however, use parallel field multipliers. To compare to these designs, the datapath of Lopez-Dahab projective coordinate system is used as reported in [31] where point addition and point doubling require 7 and 3 multiplications respectively if two parallel multipliers are used, and only 5 and 3 multiplications if three multipliers are used. The results in Table 3 show that the performance of the proposed method is either better or comparable to the other countermeasures. Figure 1 shows the comparison results with different number of key bits. Clearly, the depicted results show that the proposed
method provides better performance than other countermeasures when parallel multipliers are used.
5. Conclusion An efficient high performance elliptic curve scalar multiplication method with resistance against Timing Attacks has been proposed. The main idea of the proposed method is to control the main scalar multiplication such that either a single point addition is performed or a number of consecutive point doublings that take the same time taken by a single point addition operation. The proposed method requires no extra fake computations and its time complexity outperforms recently reported timing attacks countermeasures, particularly when parallel multipliers are used.
Acknowledgments The authors would like to acknowledge the support of Umm Al-Qura University (UQU) and the support of King Fahd University of Petroleum & Minerals (KFUPM).
References [1]
Koblitz, N.: ‘Elliptic curve cryptosystems’. Mathematics of Computation, 1987, vol. 48, pp. 203-209. [2] Rivest, R., Shamir, A. and L. Adleman.: ‘A method for obtaining digital signatures and public key cryptosystems’. Communications of the ACM, 1978, Vol. 21, No.2, pp. 120126. [3] El Gamal, T.: ‘A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms’. Advances in Cryptology: Proceedings of CRYPTO 84, 1985, Springer Verlag, pp. 10-18. [4] Kocher, P., Jaffe, J. and Jun, B.: ‘Differential power analysis’. CRYPTO '99, 1999, LNCS 1666, pp. 388-397. [5] Kocher, P.: ‘Timing Attacks on Implementations of DiffeHellman, RSA, DSS, and Other Systems’. CRYPTO '96, 1996, LNCS 1109, pp. 104-113. [6] Dhem, J, Koeune, F., Leroux, P., Mestre, P., Quisquater, J., and Willems, J.: ‘A practical implementation of the timing attack’. In CARDIS, 1998, pp. 167–182. [7] Koeune, F. and Quisquater, J.: ‘A timing attack against Rijndael’. Technical Report CG-1999/1, June 1999. [8] Handschuh, H. and Heys, H.: ‘A Timing Attack on RC5’, Lecture Notes in Computer Science 1556: Selected Areas in Cryptography - SAC '98, 1999, Springer-Verlag, pp. 306-318. [9] Coron, J.: ‘Resistance against differential power analysis for elliptic curve cryptosystems’. In Cryptographic Hardware and Embedded Systems - CHES’99, 1999, LNCS 1717, SpringerVerlag, pp. 292–302. [10] Schindler, W.: ‘A timing attack against RSA with the chinese remainder theorem’. In CHES 2000, 2000, pp. 109–124. [11] Schindler, W., Koeune, F., and Quisquater, J.: ‘Unleashing the full power of timing attack’. Technical Report CG-2001/3, 2001. [12] Gandolfi, K., Mourtel, C, and Olivier, F.: ‘Electromagnetic analysis: Concrete results’. In CHES 2001, 2001, pp. 251-261. [13] Schepers, D.: ‘Timing Attacks for Cryptosystems Based on Elliptic Curves’. Diploma Thesis, 2002, Technische Universität Darmstadt, Germany.
[14] Schindler, W.: ‘A combined timing and power attack’. LNCS 2274, 2002, pp.263–279. [15] Schindler, W.: ‘Optimized timing attacks against public key cryptosystems’. Statistics and Decisions 20, 2002, pp.191– 210. [16] Brumley, D., and Boneh, D.: ‘Remote timing attacks are practical’. Computer Networks 48(5), 2005, pp.701-716. [17] Bernstein, D.: ‘Cache-timing attacks on AES’. April 2005. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf. [18] O'Hanlan, M. and Tonge, A.: ‘Investigation of cache timing attacks on AES’. School of Computing, Dublin City University, 2005. [19] Hoggins, C., Alessandro, C., Kinniment, D. and Yakovlev, A.: ‘Securing On-Chip Operations against Timing Attacks’. School of Electrical, Electronic & Computer Engineering, University of Newcastle Upon Tyne, Technical Report Series, NCL-EECE-MSD-TR-2005-108, 2005. [20] Furlong, M. and Heys, H.: ‘A Timing Attack on the CIKS-1 Block Cipher’, Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2005), May 2005, Saskatoon, Saskatchewan. [21] Bonneau, J. and Mironov, I.: ‘Cache-Collision Timing Attacks Against AES’. LNCS(4249), 2006, pp. 201-215. [22] Chevallier-Mames, B., Ciet, M. and Joye, M.: ‘Low-Cost Solutions for Preventing Simple Side-Channel Analysis: SideChannel Atomicity’, In IEEE Transactions on Computers, June 2004, Volume 53 (6), pp. 760-768. [23] Hodjat, A., Hwang, D. and Verbauwhede, I.: ‘A scalable and high performance elliptic curve processor with resistance to timing attacks,’ Proc. IEEE International Conference on Information Technology (ITCC 2005), April 2005, pp. 538543. [24] Al-Somani, T. and Ibrahim, M.: ‘High Performance Elliptic Curve GF(2m) Cryptoprocessor Secure Against Timing Attacks’, International Journal of Computer Science and Network Security (IJCSNS), 2006, Vol. 6, No. 1B. [25] Coron, J.: ‘Resistance against differential power analysis for elliptic curve cryptosystems’. In Cryptographic Hardware and Embedded Systems - CHES’99, 1999, LNCS 1717, SpringerVerlag, pp. 292–302. [26] McEliece, R.: ‘Finite Fields for Computer Scientists and Engineers’. 1987, Kluwer Academic Publishers. [27] Lidl, R. and Niederreiter, H.: ‘Introduction to finite fields and their applications’. 1994, Cambridge University Press, Cambridge, UK, revised edition. [28] Cohen, H., Ono, T. and Miyaji, A.: ‘Efficient Elliptic Curve Exponentiation Using Mixed Coordinates’. In Advances in Cryptology -SIACRYPT '98, 1998, K. Ohta nd D. Pei, Eds., vol. 1514 of Lecture Notes in Computer Science, pp. 51-65. [29] Gordon, D.: ‘A Survey of Fast Exponentiation Methods’. Journal of Algorithms, 1998, pp. 129-146. [30] Joye, M. and Tymen, C.: ‘Compact Encoding of Non-Adjacent Forms with Applications to Elliptic Curve Cryptography’, Public Key Cryptography, 2001, vol. 1992 of Lecture Notes, in Computer Science, pp. 353-364, Springer-Verlag. [31] Al-Somani, T., Ibrahim, M. and Gutub, A.: ‘High Performance Elliptic Curve GF(2m) Crypto-Processor’, Information Technology Journal, 2006, 5(4), pp. 742-748. [32] Omura, J. and Massey, J.: ‘Computational method and apparatus for finite field arithmetic’. U.S. Patent Number 4,587,627, May 1986. [33] Lopez, J. and Dahab, R.: ‘Improved Algorithms for Elliptic Curve Arithmetic in GF(2n)’. SAC'98, 1998, LNCS Springer Verlag, pp. 201-212.
i 0
Table 1: An example of the proposed method using Algorithm 4. Q Step flag flag ki ki+ R1 R2 Point Operations 2 0 F F P 2P O 1 3.1 0 T F P 2P P 1
2
4
6
3.2
T
T
0
3.3
F T T F T T F T T F
F F T F F T F F T F
0
1 1
0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 1 1 1
3.1 3.2 3.3 3.1 3.2 3.3 3.1 3.2 3.3
Step 2
0
3.1
T
F
F
0
3.2
T
T
F
0
3.3
T
T
T
0
3.4 3.1 3.2 3.3 3.4
F T T T F
F F T T F
F F F T F
0 1 1 1 1
-2P
ADD
8P 8P 8P 32P 32P 32P 128P 128P 128P 512P
-2P -2P 6P 6P 6P 6P 6P 6P 134P 134P
2 DBL ADD 2 DBL 2 DBL ADD 2 DBL
1 1 1 1 0 0 0 0
0
P
2P
4P
O
-
0
P
2P
4P
-2P
ADD
0
P
2P
4P
-2P
0 1 1 1 1
8P 8P 8P 8P 64P
16P 16P 16P 16P 128P
32P 32P 32P 32P 256P
-2P 6P 6P 38P 38P
3 DBL ADD ADD 3 DBL
Table 3: Comparisons between the proposed method and recent countermeasures. # Multipliers Time Complexity Proposed Method Time Complexity 1 {[5*m + 14*(m/3)] * m} {[5*m + 14*(m/3)] * m} 2 {[(18*(m+3) + (m+3)/2 +1)] * (m/3)} {[3*m + 7*(m/3)] * m} 3 {[4*m + 4*(m/3)] * m} {[3*m + 5*(m/3)] * m}
Time Complexity (Clock Cycles)
Ref. [22] [23] [24]
4P 4P 4P 16P 16P 16P 64P 64P 64P 256P
2P
Table 2: An example of the proposed method using Algorithm 5. flag flag flag ki ki+ ki+ Q R1 R2 R4 Point F F F 0 0 P 2P 4P O 1
i
3
P
450000 400000 350000
Chevallier-Mames et. al. [22]
300000
Hodjat et. al. [23]
250000
Al-Somani & Ibrahim [24]
200000
Propsed (1 Multiplier)
150000
Propsed (2 Multipliers)
100000
Propsed (3 Multipliers)
50000 0 160
170
180
190
200
No. of key bits (m)
Figure1: Comparisons results with different number of key bits.