
PHYSICAL SIDE-CHANNEL ATTACKS ON CRYPTOGRAPHIC SYSTEMS

N. P. SMART

Abstract. We survey a number of attacks on cryptographic systems which depend on measuring physical characteristics of such systems whilst a given cryptographic operation is carried out. Such measurements could include the time needed to perform certain operations, the power consumed or any electromagnetic radiation produced. As such the physical measurement is producing a side-channel for the cryptographic system which leaks information about the internal secret key. We also describe a number of the countermeasures that have been proposed in the literature.

1. Background

Modern cryptography is about ensuring the integrity, confidentiality and authenticity of digital communications. As such it has a large number of applications, from e-commerce on the Internet through to charging mechanisms for pay-TV. As more and more devices become network aware they also become potential weak links in the chain. Hence cryptographic techniques are now being embedded into devices such as smartcards, mobile phones and PDAs. This poses a number of problems, since the cryptographic modules holding the secret keys are no longer maintained in secure vaults inside large corporations.

For a cryptographic system to remain secure it is imperative that the secret keys it uses to perform the required security services are not revealed in any way. The fact that secret keys are now embedded into a number of devices means that the hardware becomes an attractive target for hackers. For example, if one could determine the keys which encrypt digital television transmissions, then one could create decoders and sell them on the black market. In addition, if one could determine the keys which protect a number of stored-value smartcards (which hold an electronic representation of cash) then one could essentially "print money". Since cryptographic algorithms themselves have been studied for a long time by a large number of experts, hackers are more likely to try to attack the hardware and system within which the cryptographic unit is housed.

A particularly worrying class of attacks has been developed in the last few years by P. Kocher and colleagues at Cryptography Research Inc., see [21] and [22]. In these attacks a number of physical measurements of the cryptographic unit are made, for example power consumption, computing time or EMF radiations. These are made over a large number of encryption or signature operations and then, using statistical techniques, the secret key embedded inside the cryptographic unit is uncovered.
These attacks work because there is a correlation between the physical measurements taken at different points during the computation and the internal state of the processing device, which is itself related to the secret key. For example, when data is loaded from memory the memory bus will have to carry the value of the data, which will take a certain amount of power depending on the data value. Since the load instruction always happens at the same point within the computation one can produce correlations between various runs of the application, eventually giving away the secret of the smartcard.

The three main techniques developed by Kocher et al. are timing attacks, simple power analysis (SPA) and differential power analysis (DPA). It is DPA which provides the most powerful method of attack, yet it can be mounted using very few resources.

In this paper we survey the recent work in this area. In Section 2 we recap some of the algorithms used to implement the cryptographic primitives. This is to enable the description of the various attacks and defences later on. In Section 3 we describe the simple variants of some of the attacks, whilst in Section 4 we discuss the more powerful differential variants. In Section 5 we cover some new areas of classical cryptanalysis which mean that once a small proportion of the secret bits are determined one can recover the rest using mathematical techniques, thereby increasing the power of attacks like DPA. Starting from Section 6 we explain the various defences which have been proposed.

This is an extended version of the printed version of the paper.

2. Cryptographic Primitives

Cryptographic algorithms are essentially divided into two types: symmetric (secret key) or asymmetric (public key). The types of operations performed vary significantly between the two types of algorithm. In symmetric systems a variety of bit and byte operations are performed over a number of rounds. Such systems are usually very fast and allow the encryption of bulk data. Asymmetric systems on the other hand usually use some form of exponentiation algorithm in a finite Abelian group, either the integers modulo a large number, as in RSA [31], or the group of points on an elliptic curve, as in EC-DSA, see [20] and [29].
Asymmetric systems are generally slow and are often used to determine or encrypt session keys for later use by bulk ciphers. One of the most important applications of asymmetric cryptography is in the production of digital signatures. For the basic notions from cryptography we refer to [26]. In what follows we will concentrate on the points of interest for our discussion.

2.1. Symmetric Ciphers. Symmetric ciphers, such as DES [14], use a single secret key that both parties need to know to both send and receive communications. They usually work via a set of complicated bit and byte operations plus the use of lookup tables. Symmetric ciphers often work in two phases. In the first phase the key is preprocessed. This often involves expanding the key into various round keys to be used in different rounds of the second phase. In the second phase the data is encrypted/decrypted by applying simple, keyed, transformations over a number of rounds.

For example in DES the key scheduling phase converts the input 56 bit key, $K$, into sixteen 48 bit round keys, $K_i$. The bits of each 48 bit round key are a fixed permutation of a choice of 48 bits of the original 56 bit key. Each round key is divided into eight six bit subkeys, each of which helps to form a data dependent index into one of eight lookup tables called S-boxes. Then sixteen rounds of a Feistel cipher are applied. In a Feistel round the message on input to round $i$ is split into a left and right portion, denoted $L_{i-1}$ and $R_{i-1}$. In each round these are operated on via a function of the form
\[ L_i = R_{i-1}, \qquad R_i = L_{i-1} \oplus f(R_{i-1}, K_i). \]
For DES the function $f$ is implemented by simple byte manipulation operations and a set of table lookups, with the two portions $L_i$ and $R_i$ always being 32 bits long. The function $f$ for DES is computed via the outline of Algorithm 1.

Algorithm 1.
1. The message portion $R_{i-1}$ is first expanded to 48 bits and then xor'ed with the round key $K_i$.
2. The resulting 48 bit object is then divided into eight chunks each of six bits in length.
3. These eight chunks are used to index eight different lookup tables, called S-boxes, each of which returns a four bit nibble.
4. The eight nibbles are then recombined via a bit permutation into the 32 bit output of the function $f$.

Note that we have simplified the above discussion of DES, having only concentrated on what is important for our future discussion.
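The Feistel structure described above can be illustrated with a short sketch. It is not DES: the round function `toy_f`, the 32 bit round keys and the constants are hypothetical stand-ins chosen only to show the shape of the round equations and why decryption does not require $f$ to be invertible.

```python
MASK32 = 0xFFFFFFFF

def toy_f(r, k):
    """Hypothetical keyed round function on 32-bit words (NOT the DES f)."""
    x = (r ^ k) & MASK32
    x = ((x << 5) | (x >> 27)) & MASK32        # rotate left by 5 bits
    return (x * 0x9E3779B1 + k) & MASK32       # arbitrary mixing step

def feistel_encrypt(left, right, round_keys):
    """Apply L_i = R_{i-1}, R_i = L_{i-1} xor f(R_{i-1}, K_i) for each key."""
    for k in round_keys:
        left, right = right, left ^ toy_f(right, k)
    return left, right

def feistel_decrypt(left, right, round_keys):
    """Undo the rounds in reverse order; f is only ever evaluated forwards."""
    for k in reversed(round_keys):
        left, right = right ^ toy_f(left, k), left
    return left, right

if __name__ == "__main__":
    keys = [(0x0F1E2D3C ^ (i * 0x01010101)) & MASK32 for i in range(16)]
    c = feistel_encrypt(0x01234567, 0x89ABCDEF, keys)
    print(feistel_decrypt(*c, keys) == (0x01234567, 0x89ABCDEF))
```

In real DES the round function follows Algorithm 1 (expansion, xor with $K_i$, S-box lookups, permutation); the S-box lookups are the data dependent step that matters for the attacks discussed later.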

2.2. Asymmetric Ciphers. Public key cryptography is usually based on some form of exponentiation algorithm in a finite Abelian group, $G$. That is, given $g \in G$ and $x \in \mathbb{Z}$ the main computational task is to compute
\[ h = g^x, \]
either for a secret $x$ or a secret $g$ depending on the cryptographic primitive in use. The basic algorithm used to perform exponentiation is the binary method. Although this method is slower than others, it has the advantage of using very little memory and so is often preferred in constrained computational devices such as smartcards.

Algorithm 2 (Binary Exponentiation).
1. Label the bits of $x$ as $[x_{d-1}, \ldots, x_0]$, with $x_0$ being the least significant bit.
2. Set $h = 1$.
3. For $i = d-1$ to $0$
   (a) $h = h^2$
   (b) If $x_i = 1$ then $h = h \cdot g$.

Algorithm 2 is referred to as a left-right exponentiation algorithm since the exponent's bits are processed from the most significant down to the least significant. If the exponent's bits are processed from the least significant up to the most significant we refer to the algorithm as a right-left algorithm.

A more advanced technique, which requires more storage, is to use one of the various window techniques, see [26]. In a window method the bits of the exponent are processed in groups rather than individually as above. However, window techniques only apply to left-right exponentiation algorithms.

Algorithm 3 ($m$-ary Window Method).
1. We first precompute $g_i = g^i$ for $0 \le i \le 2^m - 1$.
2. Write, where $l \approx (\log_2 x)/m \approx d/m$,
   \[ x = \sum_{i=0}^{l-1} z_i 2^{mi}. \]
3. Set $h = 1$.
4. For $i = l-1$ to $0$
   (a) $h = h^{2^m}$
   (b) $h = h \cdot g_{z_i}$.

There are many other such methods, some of which only apply to certain groups. For example signed window methods can be applied for elliptic curve cryptosystems. But all such methods can be considered essentially equivalent for our purposes. We now discuss how such exponentiation techniques apply to the most popular cryptosystems:

RSA. In RSA the group $G$ is the multiplicative group of integers modulo $N$, where $N = pq$. For both RSA decryption and signature generation the basic operation is to compute
\[ M = C^d \pmod{N} \]
for some secret exponent $d$. In practice the owner of $d$ also knows $p$ and $q$, so the actual computation is often carried out by the Chinese Remainder Theorem (CRT). The CRT is used to recover $M$ after computing
\[ M_p = C^{d \bmod (p-1)} \pmod{p}, \qquad M_q = C^{d \bmod (q-1)} \pmod{q}. \]
Notice that for RSA the basic operation is to exponentiate many elements with a fixed secret exponent; this makes RSA particularly vulnerable to the attacks described in this paper. However, as we shall later see, certain algorithmic changes can help mitigate this vulnerability.

Diffie-Hellman. Diffie-Hellman key exchange is where two entities, $A$ and $B$, produce two key pairs $(a, g^a)$ and $(b, g^b)$ and combine them into a shared secret key
\[ K = g^{ab}. \]
This can be done in one of two ways:
1. Ephemeral Diffie-Hellman: Both $a$ and $b$ are secret random elements on each run of the protocol.
2. Static Diffie-Hellman: Only $a$ is chosen at random. Hence $B$ always computes $h^b$ for, essentially, random group elements $h$ and secret integer $b$.
In this last case the goal is to recover $B$'s permanent static secret, $b$. Just as for RSA the use of a fixed exponent in the static variant can produce vulnerabilities.

DSA style systems. In these systems one always computes $g^k$ for some (per-message) random secret integer $k$, called the ephemeral key. But one also computes the solution $s$ to an equation of the form
\[ s = (H(M) + xr)/k \pmod{q}, \]
where $H$ is a cryptographic hash function, $q$ is fixed and known, $M$ and $r$ are also known, but $x$ is a permanent secret and $k$ is the integer used to perform the prior exponentiation. In these systems our goal is to recover the permanent secret $x$, even though this has nothing to do with the exponent $k$.
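The exponentiation methods of this section (Algorithms 2 and 3) and the CRT computation for RSA can be sketched as follows, working in the integers modulo $n$. This is a minimal illustration under our own assumptions: the function names are hypothetical, the operation log returned by `binary_exp` is added purely to make the order of squarings and multiplications visible, and the modular inverse via `pow(q, -1, p)` needs Python 3.8 or later.

```python
def binary_exp(g, x, n):
    """Left-right binary method; also returns the operation sequence
    ('S' = square, 'M' = multiply) that the loop performs."""
    h, ops = 1, []
    for bit in bin(x)[2:]:              # most significant bit first
        h = (h * h) % n
        ops.append("S")
        if bit == "1":
            h = (h * g) % n
            ops.append("M")
    return h, "".join(ops)

def window_exp(g, x, n, m=4):
    """m-ary window method: process the exponent m bits at a time."""
    table = [1] * (1 << m)              # table[i] = g^i mod n
    for i in range(1, 1 << m):
        table[i] = (table[i - 1] * g) % n
    digits = []                         # z_i, least significant first
    while x > 0:
        digits.append(x & ((1 << m) - 1))
        x >>= m
    h = 1
    for z in reversed(digits):
        for _ in range(m):              # h = h^(2^m)
            h = (h * h) % n
        h = (h * table[z]) % n          # h = h * g_{z_i}
    return h

def rsa_crt_decrypt(c, d, p, q):
    """Compute C^d mod pq from the two half-size exponentiations."""
    mp, _ = binary_exp(c % p, d % (p - 1), p)
    mq, _ = binary_exp(c % q, d % (q - 1), q)
    q_inv = pow(q, -1, p)               # recombination step (Garner's formula)
    return mq + q * (((mp - mq) * q_inv) % p)

if __name__ == "__main__":
    print(binary_exp(3, 0b10110010001, 10007))
    print(window_exp(3, 0b10110010001, 10007))
    c = pow(65, 17, 61 * 53)
    print(rsa_crt_decrypt(c, 2753, 61, 53))
```

Note that `window_exp` as written always performs the multiplication, even when $z_i = 0$, whereas `binary_exp` multiplies only for the one bits of the exponent, so its operation sequence depends directly on the secret.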



Diffie-Hellman and DSA type systems come in two flavours. The first type, which we shall call conventional, uses as a group the integers modulo a large prime $p$. The second type, based on elliptic curves, uses the group of points on an elliptic curve, see [5] for further details on elliptic curve based systems. Given a point $P$ the exponentiation step, for elliptic curves, is to compute $[x]P$ for an integer $x$, where $[x]P$ means the result of adding $P$ to itself $x$ times. This is done in exactly the same ways as computing $g^x$ in a multiplicative group.

3. Simple Attacks

An elementary form of physical side channel attack is to simply measure some physical characteristic of the device whilst it is carrying out the cryptographic operation. Kocher et al. [22] describe such a technique based on the power consumption of the device; the resulting attack they dub Simple Power Analysis (SPA). Given a fine enough resolution one can see the different operations being carried out within the device. For example in an exponentiation algorithm the squaring operations could have a different power output to the general multiplication operations. Hence if one can determine the exact order in which a sequence of square and multiply operations is carried out then one can sometimes recover the key. For example if we notice the sequence of square and multiply operations was
   SMSSMSMSSSMSSSSM
then, assuming the binary method of exponentiation was used, we know that the bits of the exponent are given by
   10110010001.
In principle such an attack can be mounted on a system which only uses ephemeral exponents; however this requires more accurate measurements, since one only has one chance of measuring the side channel information for a given ephemeral exponent.

It appears at first sight that this simple analysis only applies to the binary method of exponentiation. But often, with window techniques, the multiply is skipped when $z_i = 0$. If this can be detected then we obtain $m$ bits of the exponent in one step. In addition the $2^m$ possible multiplies in step 4 (b) of the earlier window algorithm can have different power consumption traces. Even determining 5 bits of a number of ephemeral exponents in the DSA algorithm can lead to a complete break of the cryptographic system, using techniques such as those in [18].

This idea of determining the internal state of a computer from some physical measurement such as power consumption is not new. In the early days of computing, operators often could determine the output of a computation when the output device was broken by listening to the noise the computer was making.

4. Differential Attacks

We describe two types of attack which require multiple data points to be taken across a number of runs. The first attack uses the physical side channel of the time required to perform a given operation, the second attack uses the power consumption over a number of runs. Both proceed by utilising statistical differences in the data given certain guesses for the bits.

4.1. Timing Analysis. Again we use the exponentiation algorithms as an example with which to explain the attack. Now however we require that the exponentiation is carried out using a fixed exponent for a number of different (or the same) bases. This typically happens in RSA type systems, where one can ask the smartcard to decrypt the same message a number of times, the time taken to do so being recorded each time.

We assume that the method used is a window method, with windows of length $m$. In the following we shall use the notation of our previous description of the window algorithm. The following can easily be adjusted for use with any exponentiation algorithm. The method we shall explain is that given by Kocher [21]; however we note that in [15] a slightly different statistical methodology is given which is akin to the differential power analysis we shall consider later on.

As a precomputation stage one computes the average time, $t_i$, it takes the device to compute a multiplication by $g_i$, for $i = 0, \ldots, 2^m - 1$. We also need to determine the average time, $s$, it takes to compute the powering by $2^m$. Then given a window pattern $\{z_i\}$, $i = 0, \ldots, l-1$, we expect the time to compute the exponentiation is given by

\[ T_j = \sum_{i=0}^{l-1} (s + t_{z_i}) + e_j, \]

where $e_j$ is some measurement error, for the $j$th experiment. Now assume we know $z_0, \ldots, z_{b-1}$ and we want to compute $z_b$. Then for the $2^m$ choices for $z_b$ we compute

\[ G_b(z_b) = \sum_{i=0}^{b} (s + t_{z_i}). \]

Then we subtract $G_b(z_b)$ from each of our observed timings $T_j$ to obtain

\[ H_{b,j}(z_b) = \sum_{i=b+1}^{l-1} (s + t_{z_i}) + e_j. \]

As explained in [21], if we compute the variance $V_b(z_b)$ of $H_{b,j}(z_b)$ over all $j$ samples, then the choice of $z_b$ for which $V_b(z_b)$ is smallest should be the correct value of $z_b$. Assuming this value of $z_b$ is correct we can then iterate the method, using the same timing measurements, to obtain $z_{b+1}$.

It should be noted that certain types of arithmetic, such as Montgomery Arithmetic [30], reduce the variations in the timing measurements, hence a more complicated analysis is needed. See [15] for a report on an attack on an RSA system which used Montgomery Arithmetic.

It is also harder to apply this analysis when the RSA implementation uses the CRT improvement mentioned earlier. A chosen ciphertext attack using timing measurements on RSA systems which use the CRT optimisation is explained in [21], as follows: First one chooses ciphertexts $C$ which have size around $\sqrt{N}$, then one tries to detect the timing differences in the precomputation of
\[ C \pmod{p} \quad \text{and} \quad C \pmod{q}. \]
The idea is to detect whether $C$ is larger or smaller than $p$ or $q$. Repeating this a number of times reveals the upper bits of $p$ and/or $q$. Once the upper half of the bits of $p$ (that is, a number of bits equal to a quarter of the bit length of $N$) has been determined, recent techniques of Coppersmith [11] can be used to find $p$ and $q$, and hence the secret key. Schindler [32] points out that such an attack is not very practical; he goes on to give a practical attack based on the powering algorithm used in a system using CRT and Montgomery arithmetic.

A similar technique is also proposed in [21] to use timing analysis on DSA type systems. Since for DSA one always uses an ephemeral key, the attack on the exponentiation algorithm described above will not work. Instead one uses the signing equation
\[ s = (H(M) + xr)/k \pmod{q}. \]
Since $H(M) + xr$ is usually computed before the multiplication by $k^{-1}$, this can be a point where one can use timing analysis techniques to try to recover the bits of $x$.

In [17] a timing attack on DES is described which can recover the Hamming weight of the key from the key scheduling part of the DES algorithm. Two implementations of DES are examined on a DOS based computer. From the experiments it appears that the time to compute the key schedule is linearly dependent on the Hamming weight of the original key.

4.2. Differential Power Analysis. A more interesting attack is provided by differential power analysis. The basic idea is that the power consumption of a device is dependent on the actual data being operated on. For example an eight bit register with three bits set should require different power than the same register with five bits set. Such considerations apply to other physically measurable quantities as well as power consumption, but in the following we shall solely discuss power consumption.

At first sight it would appear that it is the Hamming weight of data which affects the power and not the exact data. However, we need to take into account effects on the power when the internal state changes.
For example consider the following two state changes between a byte with Hamming weight three and one with Hamming weight five.
\[ 01010001 \to 10101110, \qquad 01010001 \to 11110001. \]
The first requires eight state changes of the transistors holding the bits, whilst the second only requires two state changes. Such differences will also cause power consumption variations.

The basic idea of differential power analysis (DPA) is to look at the power trace over a number of runs of the device and then to find correlations between the different runs which depend on a guess for the key. If the key guess is correct then one will obtain a high correlation; if the guess is not correct then they will not be correlated.

We shall now explain how DPA works, see [22] or [28] for more details. Let $E(M, K)$ denote the encryption function we are analysing. We compute the power trace for a number of plaintext/ciphertext pairs
\[ C_i = E(M_i, K). \]
(We may not know one of $M_i$ or $C_i$, but that is immaterial for our discussion.)
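The two state changes between bytes of Hamming weight three and five given earlier can be checked with a few lines. The helper names below are our own; the number of transistor state changes is modelled simply as the Hamming weight of the xor of the old and new values, which is an idealisation of a real register.

```python
def hamming_weight(v):
    """Number of one bits in the integer v."""
    return bin(v).count("1")

def state_changes(old, new):
    """Number of bit positions that flip when a register goes from old to new."""
    return hamming_weight(old ^ new)

if __name__ == "__main__":
    start = 0b01010001
    for end in (0b10101110, 0b11110001):
        print(format(start, "08b"), "->", format(end, "08b"),
              "weights", hamming_weight(start), hamming_weight(end),
              "flips", state_changes(start, end))
```

Both target bytes have the same Hamming weight, so a model based only on the weight of the value written would treat the two transitions as identical, while a model based on bit flips separates them.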

[Figure 1. DPA attack on DES. Ten traces, labelled "res.000" to "res.009".]

We then make a guess for a certain set of bits of $K$, and deduce from $M_i$, $C_i$ and the guessed bits some internal variable which depends on the guessed bits. The idea is that the internal variable we are trying to determine actually occurs in the algorithm and so will have an effect on the power trace. This internal variable computation is referred to as the DPA selector function,
\[ D(C_i, M_i, K_s) \in [0, \ldots, 2^h - 1], \]
where $K_s$ is our guess for $s$ bits of $K$ and the output takes one of $2^h$ values. Suppose we have $m$ power traces, represented by $T_i[j]$, which gives the power consumption on trace $i$ at time $j$. We divide the traces into $2^h$ sets, $S_k$, depending on the value of $D(C_i, M_i, K_s)$ for our guess of $K_s$. Then the average of each such set is computed at each time interval, $x[j]_k$; in addition the average, $z[j]$, of all samples is also computed. The DPA statistic is then computed

\[ D[j] = \sum_{k=0}^{2^h - 1} (z[j] - x[j]_k)^2 \]

and plotted. Assuming the guess for $K_s$ is correct, and the actual value of $D(C_i, M_i, K_s)$ affects the power consumption, one would expect to obtain a peak in the graph of $D[j]$ at the point in time where the power becomes affected by the value of $D(C_i, M_i, K_s)$. On the other hand if the guess is incorrect then the sets $S_k$ should be independent and we obtain no correlation, hence we expect no peaks to be seen.

We ran such an experiment using a simulated processor using the DES algorithm. As the DPA selector function we took a guess, $K_s$, for six bits of the last round's key and then computed four bits of the previous round's value of $R$ from the value of $C_i$. The results of this simulation for ten of the sixty four guesses can be seen in Figures 1 and 2, where one can clearly see that the trace which corresponds to the correct key gives a marked peak.
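The DPA statistic can be tried out on a toy simulation. The sketch below is not the DES experiment reported above: it assumes a hypothetical device with a 4 bit key which looks up a 4 bit S-box (the table of the PRESENT cipher, used only as a small nonlinear example) and whose power at one sample, `LEAK_TIME`, depends on a single bit of the S-box output plus Gaussian noise. The selector takes $h = 1$, and all names and parameters are our own assumptions.

```python
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
TRACE_LEN = 20     # samples per simulated trace
LEAK_TIME = 7      # sample at which the S-box output is handled

def selector(m, key_guess):
    """Selector function with h = 1: bit 1 of SBOX[m xor guess]."""
    return (SBOX[m ^ key_guess] >> 1) & 1

def simulate_traces(key, n_traces, noise=0.5, seed=1):
    """Simulated traces T_i[j]; plaintexts cycle through all 16 values."""
    rng = random.Random(seed)
    msgs, traces = [], []
    for i in range(n_traces):
        m = i % 16
        t = [rng.gauss(0.0, noise) for _ in range(TRACE_LEN)]
        t[LEAK_TIME] += selector(m, key)      # data dependent power
        msgs.append(m)
        traces.append(t)
    return msgs, traces

def dpa_statistic(msgs, traces, key_guess):
    """D[j] = sum over sets S_k of (z[j] - x[j]_k)^2."""
    n = len(traces)
    z = [sum(t[j] for t in traces) / n for j in range(TRACE_LEN)]
    sets = [[], []]
    for m, t in zip(msgs, traces):
        sets[selector(m, key_guess)].append(t)
    d = [0.0] * TRACE_LEN
    for s in sets:
        if not s:
            continue
        for j in range(TRACE_LEN):
            x = sum(t[j] for t in s) / len(s)
            d[j] += (z[j] - x) ** 2
    return d

def recover_key(msgs, traces):
    """Return the key guess whose DPA statistic has the highest peak."""
    peaks = {g: max(dpa_statistic(msgs, traces, g)) for g in range(16)}
    return max(peaks, key=peaks.get)

if __name__ == "__main__":
    msgs, traces = simulate_traces(key=0xB, n_traces=1600)
    print(hex(recover_key(msgs, traces)))
```

The single-bit leakage is a deliberate simplification. If the simulated power instead depended on the Hamming weight of the whole S-box output, some incorrect guesses for this small table would also give sizeable peaks, so the clean separation seen here should not be read as typical of real devices.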


[Figure 2. DPA attack on DES, view of the last four rounds. Ten traces, labelled "res.000" to "res.009".]

Note that if we did not know $C_i$ but we knew $M_i$ then a similar technique could be applied to the first round, instead of the last. It is also possible to run a similar attack using knowledge of neither $C_i$ nor $M_i$, in which case one would be essentially attacking the key schedule part of the algorithm. See [4] for a discussion on attacking the key schedule of both DES and the AES candidates. See also [9] for a discussion of DPA on the AES candidates. This difference between knowing or not knowing $C_i$ or $M_i$ is referred to in the cryptographic community as a known ciphertext or known plaintext attack.

It should be noted that standard theoretical security models do not apply when considering the types of attacks considered in this paper. For example it is usually stated that using a plaintext aware encryption function, such as OAEP [3], eliminates the possibility of mounting a chosen ciphertext attack (at least in the random oracle model of computation). However, one can still mount a DPA attack on an OAEP scheme by sending the device completely random (and incorrect) ciphertexts. The device will respond that the ciphertext is invalid, but one can still measure the power and time needed for the device to determine this.

Similar techniques can be applied to modular exponentiation algorithms. For example, suppose we know that the output of
\[ h = g^x \]
is computed via a left-right binary method; then the intermediate value computed in the last loop of the binary algorithm will be
\[ q = h \quad \text{or} \quad q = h/g. \]
Assuming a "standard" bit representation for the group elements within the algorithm, we then know the two possible representations for the intermediate value. We can then apply DPA with the selector function which picks, say, the least significant bit of $q$. This will hopefully allow us to determine the least significant bit of $x$. This process can then be repeated to reveal further bits of $x$.
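The two candidates for the intermediate value $q$ in the last loop of the left-right binary method are easy to compute from the known output. The sketch below works in the integers modulo a prime $p$; the function name is hypothetical and `pow(g, -1, p)` assumes Python 3.8 or later.

```python
def last_round_candidates(h, g, p):
    """Value held just after the final squaring, indexed by the guessed bit x_0:
    q = h if x_0 = 0, and q = h/g if x_0 = 1."""
    g_inv = pow(g, -1, p)
    return {0: h % p, 1: (h * g_inv) % p}

if __name__ == "__main__":
    p, g = 10007, 3
    for x in (1424, 1425):
        h = pow(g, x, p)
        true_q = pow(pow(g, x >> 1, p), 2, p)   # what the device really squared to
        cands = last_round_candidates(h, g, p)
        print(x & 1, cands[x & 1] == true_q)
```

A DPA selector would then take, for example, the least significant bit of each candidate and check which guess for $x_0$ produces a peak.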


As mentioned previously the above statistical technique can be used in the case of timing analysis, where one replaces the power trace $T_i[j]$ by a single time measurement $T_j$.

Messerges et al. [27] mention three types of attack scenario on exponentiation systems using DPA. These different scenarios apply to different types of system being broken, hence the distinction can be quite important for a specific physical device.
1. SEMD: Single exponent, multiple data. Here the attacker needs the device to exponentiate many random messages with both the secret exponent and an exponent known to the attacker. The philosophy is that the attacker determines the secret by determining where the secret exponent differs from the known exponent bit by bit. Messerges et al. claim that this attack requires about 20000 exponentiations to recover each bit of the exponent.
2. MESD: Multiple exponent, single data. In this attack the attacker needs the device to exponentiate a single message to both the secret exponent and many known exponents of the attacker's choosing. Hence the device must be a very general one, but the attack is now easy, since the authors claim that they require about 200 exponentiations per bit of secret exponent.
3. ZEMD: Zero exponent, multiple data. This is probably the most realistic attack since the device is only required to perform exponentiations by a single secret exponent on multiple pieces of data. However, the exact exponentiation algorithm used by the device must be known to the attacker. In this case the authors claim they require about 200 exponentiations per bit of secret exponent.

The basic idea of DPA can be summarized in the following quote from a paper of Coron and Goubin [13], which they dub the Fundamental Hypothesis.

"There exists an intermediate variable, that appears during the computation of the algorithm, such that knowing a few key bits (in practice less than 32 bits) allows us to decide whether two inputs (resp. two outputs) give or not the same value for this variable."

5. Partial Key Exposure

One does not need to use timing and power analysis to recover the full key in any system. Often it is possible to recover the whole key using classical techniques from only a small amount of extra information. Such methods are said to utilise partial key exposure.

For example in [4] a method is given to recover a DES key from the Hamming weights of the six subkeys used in each of the sixteen rounds. There are 96 such subkeys in all and the Hamming weight of each one gives rise to a linear equation amongst the 56 bits of the original key. Hence it is not surprising that one can recover the full key from this information.

For RSA a number of interesting results are known on partial key exposure, see [6] for a survey of this and other matters related to RSA. The most important result in this field is the already mentioned result of Coppersmith [11]. Coppersmith's result states that if the most (or least) significant half of the bits of one of the prime factors of the modulus is known, that is a number of bits equal to a quarter of the bit length of the modulus, then one can actually factor the modulus completely.



Related to this, and of interest in attacks on the exponentiation techniques, are the results of Boneh, Durfee and Frankel [7]. The main one is that for small public exponent, if one knows a quarter of the least significant bits of the private exponent, $d$, then one can recover $d$. In more detail they prove

Theorem 1. Suppose $N = pq$, $\log_2(N) = n$ and $1 \le e, d \le (p-1)(q-1) = \phi(N)$ satisfy $ed \equiv 1 \pmod{\phi(N)}$. Then
1. Given the $n/4$ least significant bits of $d$ one can compute all of $d$ in time polynomial in $n$ and $e$.
2. Suppose $e$ is a prime in the range $[2^t, \ldots, 2^{t+1}]$ with $n/4 \le t \le n/2$. Then given the $t$ most significant bits of $d$ one can compute $d$ in time polynomial in $n$.
3. Suppose $e \in [2^t, \ldots, 2^{t+1}]$ is a product of at most $r$ distinct primes with $n/4 \le t \le n/2$. Given the factorisation of $e$ and the $t$ most significant bits of $d$ one can compute $d$ in time polynomial in $n$ and $2^r$.
4. Suppose $e \in [2^t, \ldots, 2^{t+1}]$ with $0 \le t \le n/2$ and suppose $d > \epsilon N$ for some $\epsilon > 0$. Then given the $n - t$ most significant bits of $d$ one can recover $d$ in time polynomial in $n$ and $1/\epsilon$.

It should also be noted that for small public exponent around half of the most significant bits of the private exponent $d$ are leaked automatically by computing $\lfloor (kN + 1)/e \rfloor$ for some integer $0 < k \le e$.

For DSA systems Howgrave-Graham and Smart [18] show that if $t$ bits of the ephemeral keys are known for just over $n/t$ messages, then one can probably recover all of the ephemeral keys. They show that this method actually works, in a very short amount of time, if one knows around eight successive bits out of 30 ephemeral keys each of 160 bits in length.

6. Defences

Following Kocher's papers a number of people have started to examine this problem and propose solutions. Goubin and Patarin [19] give three possible general strategies to combat DPA type attacks, apart from the obvious defence of physical shielding:
1. Introduce random timing shifts so as to decorrelate the output traces on individual runs.
2. Replace critical assembler instructions with ones whose signature is hard to analyse, or re-engineer the crucial circuitry which performs the arithmetic operations or memory transfers.
3. Make algorithmic changes to the cryptographic primitives under consideration.
The first two such approaches are of a hardware nature, which we discuss in Section 7, whilst the last approach is at the level of the algorithm. In Section 8 we discuss a number of algorithmic changes which have been proposed by various authors to different cryptographic primitives.

7. Masking the Internal State

As mentioned previously there are two main ways of attempting to defeat these attacks using hardware type techniques: namely by using instructions/circuitry which produce very little side-channel leakage or by introducing temporal disturbances.

7.1. Special Instructions/Circuitry. By using technologies which reduce the overall power consumption and the relative variation of the power trace one can help to mitigate the effect of DPA. In addition physical shielding can also help reduce the effect of DPA; for example Shamir [33] proposes decorrelating the power supply from a smart card by using capacitors as power isolation elements.

A hardware approach that has been studied in various papers is the application of balanced architectures, for example by balancing the Hamming weights of the operands, see [22] and [23]. One technique is to use a redundant data representation which represents each datum in a form with a constant Hamming weight. For example to represent the nibble $x$ we could use the pair $\{x, \bar{x}\}$, where $\bar{x}$ represents the complement of $x$; for example 1011 is represented as
   10110100.
One needs to take care when altering memory locations, buses, registers etc. that all data is cleared before the new entry is written, since otherwise one still could obtain an information leakage, see [25]. For example if $x$ represents the initial state and $y$ the final state then information about the quantity $x \oplus y$ will possibly be leaked by power consumption, even though $x$ and $y$ have constant Hamming weight, e.g.

   x          y          x xor y    wt(x xor y)
   01101001   01111000   00010001   2
   01101001   00001111   01100110   4
   01101001   10010110   11111111   8

Hence during state changes it is important to, for example, set to zero all the bits before writing the new value. This means that each state change has a constant Hamming weight difference. For example in the first example above we use the two state changes
\[ 01101001 \to 00000000 \to 01111000. \]
In addition one can also design balanced circuitry which operates on the balanced operands considered above.
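The constant Hamming weight representation, and the effect of clearing a register before each write, can be checked with a short sketch. The function names are our own, and the count of bit flips is an idealised stand-in for the power consumed by a state change.

```python
def balanced_encode(nibble):
    """Represent a 4-bit value x as the byte formed from x and its complement."""
    return (nibble << 4) | (~nibble & 0xF)

def hamming_weight(v):
    """Number of one bits in the integer v."""
    return bin(v).count("1")

def transition_flips(old, new, clear_first=False):
    """Bit flips when a register changes from old to new.
    With clear_first the register goes old -> 0 -> new."""
    if clear_first:
        return hamming_weight(old) + hamming_weight(new)
    return hamming_weight(old ^ new)

if __name__ == "__main__":
    x = balanced_encode(0b0110)
    for value in (0b0111, 0b0000, 0b1001):
        y = balanced_encode(value)
        print(format(x, "08b"), format(y, "08b"),
              transition_flips(x, y), transition_flips(x, y, clear_first=True))
```

The direct transitions give 2, 4 and 8 flips, matching the table in the text, whereas with clearing every transition costs the same eight flips regardless of the data.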
Such balanced circuitry could include the standard logical operations of and, or and xor, and in addition more complicated operations such as integer addition or multiplication.

7.2. Temporal Disturbances. An attack such as DPA essentially makes use of the correlation between the power consumption on successive runs of the cryptographic operation. Such a correlation happens because computing devices are essentially deterministic in nature. To reduce this correlation, and hence the chance that DPA will work, one can introduce temporal disturbances into the cryptographic operation. There are two main ways of producing the required temporal misalignments:
(i) Introducing random clock signals.
(ii) Introducing randomness into the execution order.
Kömmerling and Kuhn [24] mention various techniques which introduce a certain amount of temporal disturbance into the processor, for example randomised clocking, which puts an element of non-determinism into the instruction cycle. They state, however, that this does not provide enough of a defence, since attacks can use cross correlation techniques to remove the effect of the randomised clock.
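The effect of temporal misalignment on trace averaging can be seen in a very small simulation. The model below is an assumption made purely for illustration: each trace is flat apart from a unit spike at a hypothetical sample `SPIKE_AT`, and the countermeasure delays the spike by a uniformly random number of samples.

```python
import random

TRACE_LEN = 64
SPIKE_AT = 20      # hypothetical sample index of the data dependent spike

def make_trace(shift):
    """A trace that is zero except for a unit spike delayed by `shift` samples."""
    t = [0.0] * TRACE_LEN
    t[SPIKE_AT + shift] = 1.0
    return t

def average(traces):
    """Pointwise average of a list of traces."""
    n = len(traces)
    return [sum(t[j] for t in traces) / n for j in range(TRACE_LEN)]

def peak_after_averaging(n_traces, max_shift, seed=0):
    """Height of the largest peak once the traces have been averaged."""
    rng = random.Random(seed)
    traces = [make_trace(rng.randint(0, max_shift)) for _ in range(n_traces)]
    return max(average(traces))

if __name__ == "__main__":
    print(peak_after_averaging(400, 0))    # aligned traces
    print(peak_after_averaging(400, 7))    # spike delayed by 0..7 samples
```

With eight possible delays the averaged peak drops to roughly an eighth of its aligned height, so more traces are needed to see it. An attacker who realigns the traces first, for instance by cross correlation as noted above, recovers the full peak, which is why such shifts alone are considered insufficient.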



Kocher et al. [22] mention that randomising execution order can help defeat DPA, but can also lead to other problems if not done carefully. As an example of this latter technique Kömmerling and Kuhn mention the idea of randomised multi-threading at an instruction level. They describe this with a set of essentially `shadow' registers. The auxiliary threads, they state, could then execute random encryptions, in the hope of masking the correct encryption operation. This has its drawbacks, since the processor is required to perform tasks in addition to the desired computation, increasing computational costs considerably.

Chari et al. [8] mention a number of countermeasures to DPA-type attacks, including the creation of balanced hardware, clock cycles of varying lengths and randomised execution sequences. They mention that for randomised execution sequences to be effective the randomisation needs to be done extensively. For example, they mention that if only the table lookups in each DES round are randomised then one can still perform DPA by taking around eight times as much data. In addition no mechanism is provided which would enable aggressive randomised execution.

Hence for randomised order of execution to work it needs to be done in a highly aggressive manner, which would preclude the type of local randomisation implied by the descriptions above. The randomised multi-threading idea is close to a solution but suffers from increased CPU time and requires a more complex processor with separate banks of registers, one for each thread. In addition a possible attack on such systems using a so-called integration technique is described in [10].

8. Algorithmic Alterations

Another way to defeat attacks such as DPA is to slightly alter the cryptographic algorithms so that, for example, the secret keys are not actually used within the main operation.
If the secrets are not actually used within the computation then the secret will not contribute to the power consumption or time required. There are three main proposals for algorithmic alterations.
1. Implementing balanced algorithms.
2. Information Blinding: Where the secret is hidden via some other quantity.
3. Information Splitting: Where the internal state is split into a number of other states.

8.1. Balanced Algorithms. By balanced algorithms we mean algorithms which are designed so that the operations performed depend only slightly on the input data. Many public key algorithms can be implemented in a balanced way, and such a defence is probably the easiest to implement. However, as we shall see, they only provide a limited amount of protection.

An obvious way of preventing SPA style attacks is described by Coron [12] in the context of elliptic curve cryptosystems. An SPA attack against an exponentiation algorithm uses the fact that when a zero bit is encountered in the exponent a multiplication operation is not performed. A simple idea is therefore to compute the same functions no matter what the bit is.

Algorithm 4 (Balanced Exponentiation).
1. Label the bits of $x$ as $[x_{d-1}, \ldots, x_0]$.
2. Set $h = 1$.
3. For $i = d-1$ to $0$:
   (a) $h_0 = h^2$.
   (b) $h_1 = g h_0$.
   (c) $h = h_{x_i}$.

However, Coron points out that the above is still weak with respect to DPA, since, for example, the processing for the second bit of the exponent will be on a value of $h$ which is correlated with the first bit, i.e. either $h = 1$ or $h = g$ on the second pass through the loop. Hence, by successively guessing the bits and looking for correlations, the exponent can be recovered. Clearly both this way of resisting SPA and the DPA attack on it can be applied to any exponentiation algorithm; see, for example, [16] for a variant of the above algorithm as applied to ECC systems based on Koblitz curves.

8.2. Information Blinding. Kocher et al. [21] and [22] recommend using a level of blinding, especially when applied to algorithms such as RSA. This again increases the computing time needed to implement the operation and could also modify the original cryptographic primitive in ways which could lead to other weaknesses being uncovered. This is a popular approach which is mentioned by a number of authors, and in private communications with the current author.

We again take as our example the exponentiation technique used in public key cryptography. The easiest way to blind the exponent is to add a random multiple of the group order onto $x$ before one computes the exponentiation, i.e. one uses the equation
$$g^{x + r \cdot \#G} = g^x.$$
This has the disadvantage of increasing the required amount of processing time. This could be quite pronounced since one needs to choose $r$ from a large set so as to reduce the ability of using DPA along with an exhaustive search for $r$.

Since the DPA attack requires us to guess internal states, one approach would be to randomise (or blind) the base $g$; this is more applicable in discrete logarithm based systems than in those based on the RSA assumption. This technique needs to be done carefully since we require the computation of $g^x$ and not some random element raised to the power of $x$.
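The balanced exponentiation of Algorithm 4 and the exponent blinding identity can be sketched together in Python. As an assumption made purely for illustration, the group $G$ is taken to be the multiplicative group modulo a prime $p$, so that $\#G = p - 1$; the function names and the 32-bit size of $r$ are likewise hypothetical. Python integer arithmetic and tuple indexing are not constant time, so the sketch shows the structure of the computation rather than a leakage-free implementation.

```python
import secrets


def balanced_pow(g: int, x: int, p: int, d: int) -> int:
    """Algorithm 4: process the d low bits of x, always squaring and multiplying."""
    h = 1
    for i in range(d - 1, -1, -1):
        h0 = (h * h) % p              # (a) always square
        h1 = (g * h0) % p             # (b) always multiply
        h = (h0, h1)[(x >> i) & 1]    # (c) keep the value selected by bit x_i
    return h


def blinded_pow(g: int, x: int, p: int, blind_bits: int = 32) -> int:
    """Compute g^x mod p via the blinded exponent x + r * #G."""
    group_order = p - 1                   # #G for (Z/pZ)^*, assuming p is prime
    r = secrets.randbits(blind_bits)      # fresh random multiplier on each call
    blinded = x + r * group_order
    return balanced_pow(g, blinded, p, blinded.bit_length())


if __name__ == "__main__":
    p = 2**61 - 1                         # a Mersenne prime, used as a toy modulus
    g, x = 3, 0x1234567
    print(blinded_pow(g, x, p) == pow(g, x, p))
```

The blinded exponent is roughly `blind_bits` longer than $x$, which makes the extra processing time mentioned above visible as additional loop iterations.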
One approach is to multiply $g$ by some random element $r$ for which we know $s = r^x$. We can then compute
$$g^x = s^{-1} (g r)^x.$$
A simple way to achieve this is described by Coron [12] in the context of elliptic curve systems.

A particularly interesting approach to blinding, which only applies to elliptic curve systems, is to use projective coordinates, as described by Coron [12]. In elliptic curve systems there are as many ways of representing $g$ as there are elements in the finite field, $\mathbb{F}_q$, over which the elliptic curve is defined. Typically one has $q > 2^{160}$. Usually one starts with a point in affine coordinates $P = (X, Y, 1)$ where $X, Y \in \mathbb{F}_q$ (in elliptic curve systems one replaces $g$ by a point $P$; this is only a notational convention). However, one could also write $P = (x, y, z)$ where $z$ is a non-zero random element of $\mathbb{F}_q$ and one defines
$$x = X z^2 \quad \text{and} \quad y = Y z^3.$$
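Both blinding ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the group for base blinding is taken to be the multiplicative group modulo a toy prime `P_MOD`, all function names are hypothetical, and the pair $(r, s)$ is computed here with a plain exponentiation, whereas in practice it would be precomputed and refreshed without exponentiating by $x$ again. The projective part only randomises and recovers a point; it does not implement the curve arithmetic.

```python
import secrets

P_MOD = 2**61 - 1  # toy prime modulus; (Z/pZ)^* is an illustrative choice of G


def make_blinding_pair(x: int, p: int = P_MOD) -> tuple:
    """Return (r, s) with s = r^x mod p for a random r in [2, p-1]."""
    r = secrets.randbelow(p - 2) + 2
    return r, pow(r, x, p)


def base_blinded_pow(g: int, x: int, r: int, s: int, p: int = P_MOD) -> int:
    """Compute g^x as s^{-1} * (g*r)^x mod p; the exponentiation sees g*r, not g."""
    masked = pow((g * r) % p, x, p)
    return (pow(s, -1, p) * masked) % p


def randomise_point(X: int, Y: int, q: int) -> tuple:
    """Map affine (X, Y) to a random representative (X z^2, Y z^3, z) over F_q."""
    z = secrets.randbelow(q - 1) + 1      # non-zero random element of F_q
    return (X * z * z) % q, (Y * pow(z, 3, q)) % q, z


def to_affine(x: int, y: int, z: int, q: int) -> tuple:
    """Recover (X, Y) = (x / z^2, y / z^3) from the projective triple."""
    z_inv = pow(z, -1, q)
    return (x * z_inv * z_inv) % q, (y * pow(z_inv, 3, q)) % q


if __name__ == "__main__":
    x_secret = 0x1F2E3D4C5B
    r, s = make_blinding_pair(x_secret)
    print(base_blinded_pow(5, x_secret, r, s) == pow(5, x_secret, P_MOD))

    # (3, 6) lies on the toy curve y^2 = x^3 + 2x + 3 over F_97.
    x, y, z = randomise_point(3, 6, 97)
    print((x, y, z), "->", to_affine(x, y, z, 97))
```

Each call to `randomise_point` yields a different triple for the same point, which is the property that hides the internal bits of $P$ from the DPA selector function. The modular inverse via `pow(z, -1, q)` requires Python 3.8 or later.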



When one then computes $[m]P$ (the elliptic curve equivalent of $g^m$) the internal bits of $P$ are blinded. This defence against DPA can be implemented with very little loss in performance.

A novel way of blinding the exponent in an exponentiation algorithm is proposed by Messerges et al. [27]. They propose that one should use two exponentiation algorithms at once, the first being the standard left-right method and the second a right-left method. When an exponentiation is carried out, a random bit position is chosen in the exponent. The algorithm then performs two exponentiations, one going left-right from this bit position and the other going right-left from it. Note that one needs to be careful that the changeover from a left-right to a right-left method cannot be detected, since that may allow some information to be deduced about $x$. Since window methods cannot be used with right-left algorithms, this implies that the left-right method should not use window techniques either.

One way to help defeat the previously mentioned timing attack on the DES key schedule is proposed in [17]. Since the time to perform the key schedule seems to depend on the Hamming weight of the key, the idea is to blind the Hamming weight as follows. Suppose we wish to compute the key schedule for the 56 bit DES key $K$. First one chooses a second key $K'$ of length 56 bits by setting $K' = K$ and then flipping half the bits of $K'$ which are set to one and half of those which are set to zero. Then the Hamming weights of $K'$ and $K \oplus K'$ are both equal to 28. Now run the key schedule algorithm on $K'$ and $K \oplus K'$ to obtain the round keys $K'_1, \ldots, K'_{16}$ and $\bar{K}_1, \ldots, \bar{K}_{16}$. This should protect against the timing attack since the algorithm is always run on data with constant Hamming weight. Because the DES key schedule only selects and permutes key bits, it is linear with respect to $\oplus$, and the actual key schedule can then be derived from
$$K_i = K'_i \oplus \bar{K}_i.$$

8.3. Information Splitting. Chari et al. [8] propose to mask the internal bits of data used by algorithms. They do this by splitting the bits into bit shares and processing the bit shares in such a way that on recombining we obtain the correct result. In this way the target bits for the DPA selector function are not exposed internally to the processor and so will hopefully have no effect on the power trace.

Goubin and Patarin [19] suggest essentially the same method, but give a number of examples which we shall discuss below. They propose splitting the operands into two and essentially duplicating the work load. This can lead to at least a doubling in the computing resources needed to perform the cryptographic operation, and hence may not be a suitable defence for CPU intensive applications or in environments where processing power is limited.

Goubin and Patarin give the following general description of their method. Each intermediate value, $V$, in the algorithm is split into a set of $k$ variables, $V_1, \ldots, V_k$. This set is chosen, along with a function $f$, such that:
(i) The set $\{V_1, \ldots, V_k\}$ allows us to obtain $V$, via $V = f(V_1, \ldots, V_k)$.
(ii) No proper subset of $\{V_1, \ldots, V_k\}$ will allow us to obtain $V$.
(iii) All the required transforms to be performed on the $V_i$ during the algorithm can be done without resorting to the computation of $V$.

They give a number of possible ways of implementing this for the DES algorithm, using functions which are linear or quadratic bijections on a number of binary variables. The main computational operations of DES are kept consistent via appropriate modification of the S-boxes.

They propose a similar method for modular exponentiation, which they explain in the context of the RSA algorithm. We shall present it in terms of a general exponentiation which can apply to an arbitrary protocol. The following information splitting idea is based on the associative and commutative nature of multiplication in an Abelian group. We wish to compute $g^x$ for some known element $g$ and some secret exponent $x$. We first split $g$ into two random shares $g_1$ and $g_2$ so that
$$g = g_1 g_2.$$
This is easy to perform since we compute a random element $g_1 \in G$ and then set $g_2 = g_1^{-1} g$. We then compute $g_1^x$ and $g_2^x$, the final result being $g^x = (g_1^x)(g_2^x)$. Hence in the above general notation we are using
$$V = f(V_1, V_2) = V_1 V_2.$$

Algorithm 5 (Split Exponentiation).
1. Set $z_1 = 1$.
2. Set $z_2 = 1$.
3. For $i = d-1$ to $0$:
   (a) $z_1 = z_1^2$.
   (b) $z_2 = z_2^2$.
   (c) If $x_i = 1$ then
       (i) $z_1 = z_1 g_1$.
       (ii) $z_2 = z_2 g_2$.
4. Output $z = z_1 z_2$.

Clearly the above can be applied to any exponentiation algorithm. However, it will not protect against SPA attacks (for the binary method) but will help protect against DPA and timing attacks. Hence this splitting idea could be used in conjunction with the balanced exponentiation algorithms considered earlier.

8.4. Conclusion. We have explained how power analysis and timing analysis can be applied to a number of cryptographic algorithms. Standard attacks based on partial key exposure for various systems mean that a little physical side-channel leakage can go a long way. In addition we have described a number of the proposed defences that have been given in the open literature. There are probably a number of other techniques which are proprietary and which are not available for public discussion. It would appear that a number of algorithmic changes need to be made to standard implementation techniques for cryptographic protocols.
All of these carry a large computational cost in terms of CPU time, memory or additional silicon. The main hardware based defences only aim to reduce the signal to noise ratio available to the attacker and hence cannot, even in theory, fully defeat such attacks, although in practice they will stop an attacker with limited resources. Hence neither the algorithmic nor the hardware defences are a complete solution, and more research is likely to be needed in the coming years.



Finally we close by noting that public key protocols which use ephemeral exponents, such as DSA and ephemeral Diffie-Hellman and their elliptic curve variants, appear to offer more protection than systems which use a fixed exponent, such as RSA.

References

[1] R. Anderson and M. Kuhn. Tamper resistance - a cautionary note. In The Second USENIX Workshop on Electronic Commerce Proceedings, pp 1-11, Oakland, California, November 18-21, 1996.
[2] R. Anderson and M. Kuhn. Low cost attacks on tamper resistant devices. In Security Protocols, Springer LNCS 1361, pp 125-136, 1997.
[3] M. Bellare and P. Rogaway. Optimal asymmetric encryption. In Advances in Cryptology, EUROCRYPT '94, Springer LNCS 950, pp 92-111, 1994.
[4] E. Biham and A. Shamir. Power analysis of the key scheduling of the AES candidates. Second Advanced Encryption Standard Candidate Conference, Rome, March 1999.
[5] I.F. Blake, G. Seroussi and N.P. Smart. Elliptic curves in cryptography. Cambridge University Press, 1999.
[6] D. Boneh. Twenty years of attacks on the RSA cryptosystem. Notices of the American Mathematical Society, 46, 203-213, 1999.
[7] D. Boneh, G. Durfee and Y. Frankel. An attack on RSA given a small fraction of the private key bits. In Advances in Cryptology, AsiaCrypt '98, Springer LNCS 1514, pp 25-34, 1998.
[8] S. Chari, C.S. Jutla, J.R. Rao and P. Rohatgi. Towards sound approaches to counteract power-analysis attacks. In Advances in Cryptology, CRYPTO '99, Springer LNCS 1666, pp 398-412, 1999.
[9] S. Chari, C.S. Jutla, J.R. Rao and P. Rohatgi. A cautionary note regarding evaluation of AES candidates on smartcards. Second Advanced Encryption Standard Candidate Conference, Rome, March 1999.
[10] C. Clavier, J.-S. Coron and N. Dabbous. Differential power analysis in the presence of hardware countermeasures. To appear CHES 2000.
[11] D. Coppersmith. Small solutions to polynomial equations, and low exponent RSA vulnerabilities. Journal of Cryptology, 10, 233-260, 1997.
[12] J.-S. Coron. Resistance against differential power analysis for elliptic curve cryptosystems. In CHES 1999, Springer LNCS 1717, pp 292-302, 1999.
[13] J.-S. Coron and L. Goubin. On boolean and arithmetic masking against differential power analysis. To appear CHES 2000.
[14] FIPS 46. Data Encryption Standard. NIST, 1977. Revised as FIPS 46-1:1988; FIPS 46-2:1993.
[15] J.-F. Dhem, K. Koeune, P.-A. Leroux, P. Mestre, J.-J. Quisquater and J.-L. Williams. A practical implementation of the timing attack. In proceedings of CARDIS 98, or Université Catholique de Louvain, Crypto Group Technical Report.
[16] M.A. Hasan. Power analysis attacks and algorithmic alterations to their countermeasures for Koblitz curve cryptosystems. To appear CHES 2000.
[17] A. Hevia and M. Kiwi. Strength of two data encryption standard implementations under timing attacks. ACM Transactions on Information and System Security, 2, 416-437, 1999.
[18] N.A. Howgrave-Graham and N.P. Smart. Lattice attacks on digital signature schemes. To appear Designs, Codes and Cryptography.
[19] L. Goubin and J. Patarin. DES and differential power analysis. The "duplication method". In CHES 1999, Springer LNCS 1717, pp 158-172, 1999.
[20] N. Koblitz. Elliptic curve cryptosystems. Math. Comp., 48, 203-209, 1987.
[21] P. Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS and other systems. In Advances in Cryptology, CRYPTO '96, Springer LNCS 1109, pp 104-113, 1996.
[22] P. Kocher, J. Jaffe and B. Jun. Differential power analysis. In Advances in Cryptology, CRYPTO '99, Springer LNCS 1666, pp 388-397, 1999.
[23] P. Kocher, J. Jaffe and B. Jun. Balanced cryptographic computational method and apparatus for leak minimization in smartcards and other cryptosystems. World Patent No. WO9967766, 1999.
[24] O. Kömmerling and M. Kuhn. Design principles for tamper-resistant smartcard processors. In USENIX Workshop on Smartcard Technology, Chicago, Illinois, USA, May 10-11, 1999.


[25] R. Mayer-Sommer. Smartly analyzing the simplicity and the power of simple power analysis on smart cards. To appear CHES 2000.
[26] A.J. Menezes, P.C. van Oorschot and S.A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996.
[27] T.S. Messerges, E.A. Dabbish and R.H. Sloan. Power analysis attacks of modular exponentiation in smartcards. In CHES 1999, Springer LNCS 1717, pp 144-157, 1999.
[28] T.S. Messerges, E.A. Dabbish and R.H. Sloan. Investigations of power analysis attacks on smartcards. In Proceedings of USENIX Workshop on Smartcard Technologies, pp 151-161, 1999.
[29] V. Miller. Use of elliptic curves in cryptography. In Advances in Cryptology, CRYPTO '85, Springer LNCS 218, pp 417-426, 1986.
[30] P.L. Montgomery. Modular multiplication without trial division. Math. Comp., 44, 519-521, 1985.
[31] R. Rivest, A. Shamir and L. Adleman. Cryptographic communications system and method. US Patent 4,405,829, 1983.
[32] W. Schindler. A timing attack against RSA with Chinese Remainder Theorem. To appear CHES 2000.
[33] A. Shamir. Protecting smart cards from passive power analysis with detached power supplies. To appear CHES 2000.

Computer Science Department, Woodland Road, University of Bristol, BS8 1UB, UK

E-mail address :

[email protected]