Linear Regression Attack with F-test: A New SCARE Technique for Secret Block Ciphers

Si Gao¹,², Hua Chen¹(✉), Wenling Wu¹, Limin Fan¹, Jingyi Feng¹,², and Xiangliang Ma¹,²

¹ Trusted Computing and Information Assurance Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, People’s Republic of China
{gaosi,chenhua,wwl,fanlimin,fengjingyi,maxiangliang}@tca.iscas.ac.cn
² University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
Abstract. The past ten years have seen tremendous progress in the uptake of side channel analysis in various applications. Among them, Side Channel Analysis for Reverse Engineering (SCARE) is an especially fruitful area. Taking the side channel leakage into account, SCARE efficiently recovers secret ciphers in a non-destructive and non-intrusive manner. Unfortunately, most previous works focus on customizing SCARE for a certain type of cipher or implementation. In this paper, we ask whether the attacker can loosen these restrictions and reverse secret block ciphers in a more general manner. To this end, we propose a SCARE based on Linear Regression Attack (LRA), which simultaneously detects and analyzes the power leakages of the secret encryption process. Compared with the previous SCAREs, our approach uses less a priori knowledge and covers more block cipher instances in a completely non-profiled manner. Moreover, we present a complete SCARE flow with realistic power measurements of an unprotected software implementation. From traces in which the encryption rounds can barely be recognized, our experiments demonstrate how the underlying cipher can be recovered step-by-step. Although our approach still has some limitations, we believe it can serve as an alternative tool for reverse engineering in the future.
Keywords: Linear Regression Attack · SCARE · F-test

1 Introduction
Over the past decades, Side Channel Attacks (SCA) have posed a major threat to many cryptographic implementations. As a powerful tool, SCA also shows great potential in many non-key-recovery applications, including Side Channel Analysis for Reverse Engineering (SCARE). In general, reversing a secret cipher through cryptanalysis is quite difficult. With side channel leakage, things become much easier. Successful SCAREs have been proposed for many block ciphers, including DES-like ciphers [1–3], AES-like ciphers [4] and general SPN ciphers [5].

© Springer International Publishing AG 2016. S. Foresti and G. Persiano (Eds.): CANS 2016, LNCS 10052, pp. 3–18, 2016. DOI: 10.1007/978-3-319-48965-0_1

Despite the tremendous progress in the literature, getting SCARE out of the lab is not an easy task. Most previous SCARE techniques, explicitly stated or not, have a few limitations on their target ciphers or implementations. Guilley et al.’s Sbox recovery is the only SCARE that has been verified with realistic measurements [3]. As their attack implicitly assumes the diffusion layer is a known bit-permutation, it only applies to DES-like ciphers. Other attacks rely on theoretical simulations [4,6,7] or measurement-aided simulations [5], which makes it hard to predict their actual performance in practice. In addition, most attacks rely on the “collision-detection” technique, which means the attacker has to find the leakages of the same Sbox computation (preferably in the first round) to build templates. This requirement imposes further restrictions on the target cipher as well as the implementation.

Our Contribution. In this paper, we aim to extend the previous SCARE techniques with Linear Regression Attack (LRA) [8]. Compared with other power analyses, the advantage of LRA lies in its flexibility in the regression model. With the full basis, LRA detects any relevant power leakage, just like NICV [9]. Meanwhile, LRA can also perform regressions with different models, verifying various conjectures about the secret cipher. It is well known that the commonly used evaluation measure in LRA, the coefficient of determination ($R^2$), increases with the number of regressors [10]. In this paper, we suggest using the F-test to fairly compare different models and reveal some inherent cryptographic operations. In SCARE, such an attack further recovers the secret linear components, as well as the inputs of the Sboxes. Compared with the previous SCAREs, our approach has three advantages: first, it works in a general framework which covers many common structures (SPN, Feistel, generalized Feistel, etc.).
Second, our attack requires less a priori knowledge about the target cipher or its implementation. In our attack, the attacker does not have to know in advance things like the size of the Sboxes, the accurate location of each Sbox computation on the trace or the order of the permutation computation. Last but not least, our approach is completely non-profiled. This means our attack works even if all the Sboxes in the encryption process are completely different, whereas all previous collision-based SCAREs fail due to the lack of valid templates. We have verified our attack with power leakages from an unprotected software implementation of DES. Our experiments present the complete SCARE flow in detail, demonstrating how our LRA-based SCARE helps to determine the secret cipher step-by-step.
2 Preliminaries

2.1 Previous SCARE Techniques
So far, most SCARE studies focus on block ciphers. As modern block ciphers usually contain non-linear (confusion) layers and linear (diffusion) layers, in the following, we discuss these two cases separately.
Sbox Recovery. Confusion layers often consist of several small components, called Substitution Boxes (Sboxes). For Sbox recovery, two types of SCAREs exist:

– Collision-based SCARE [4–6]. As a prevalent tool in SCARE [4–6], the collision attack exploits the similarity between the leakages from sequential computations of the same Sbox. Although marked as a non-profiled attack, collision attacks share exactly the same routine as the Template Attack (TA) [11]. The only difference lies in the profiling stage, where collision attacks use other sequentially-implemented Sbox computations as the profiling trace set [12]. Since the leakages of the exact same Sbox computation are not always available, this “online profiling” stage imposes restrictions on the implementations as well as the target ciphers. For instance, if the target cipher is DES, the attacker cannot build templates with the first round’s Sboxes, due to the secret expansion transformation E. As DES uses 8 different Sboxes, finding collisions within the first round [5] is also impossible. Besides, collision attacks usually require the accurate points of interest to build effective templates. Without any a priori knowledge, finding the accurate points of interest is not an easy task in practice. As a result, none of the previous collision-based SCAREs validated their attack with realistic experiments.

– Guilley et al.’s Sbox Recovery [3]. In 2010, Guilley et al. proposed an Sbox recovery technique based on 1-bit CPA. As a nominal distinguisher, 1-bit CPA does not require a priori knowledge about the leakage model or the accurate points of interest. To our knowledge, this is the only SCARE that has been verified with realistic hardware implementations (DPAContest v1). However, in order to focus on one single output bit, the authors use an “output mask”. Technically speaking, this means the attacker needs to find which bit in the right register should store the guessed bit, as well as the last value of this register (according to the Hamming Distance (HD) model). In other words, these masks implicitly assume the attacker already knows that the diffusion layer is a bit-permutation and that the underlying cipher uses the Feistel structure.

Linear Component Recovery. To our knowledge, Daudigny et al.’s DES recovery is the only SCARE devoted to the diffusion layer. Unfortunately, their work relies heavily on the specific implementation [1]. Specifically, in the permutation recovery, the authors assume the corresponding state is computed from the most to the least significant bit, and use the time order of all bits as the permutation table. If the implementation uses any other order, their SCARE fails. Other attacks recover linear components from the Sboxes’ power consumption. In collision-based SCAREs, the linear part is treated as a secret matrix, which can be determined from a large number of collision equations [5]. In this case, recovering the linear components shares the same preconditions, as long as the unknown linear part does not hinder the Sbox recovery.

2.2 Linear Regression Attack
In 2005, Schindler et al. proposed the Stochastic Attack [13] as an efficient alternative to the Template Attack [11]. With the coefficient of determination ($R^2$), Doget et al. further developed a non-profiled key-recovery attack [8]. In some papers [8,14], this extension is noted as “Linear Regression Attack” (LRA).

A typical LRA works as follows: if the attacker wishes to recover a secret key byte $k$, he can measure the power consumption of some key-related operations in the encryption process. Denote the $n$-bit intermediate state as $x$; the data-dependent power leakage can be written as $L(x)$, where $L$ stands for the leakage function. Since the encryption algorithm is given, with any key guess $\hat{k}$, the attacker can compute the corresponding intermediate state $x_{\hat{k}}$. As the leakage $L(x)$ only relates to the correct intermediate state $x_k$, comparing $L(x)$ with all $x_{\hat{k}}$ gives a clue for the correct key. Specifically, the attacker chooses a $t$-length regression basis $G_b = \{x^{b_1}, x^{b_2}, \ldots, x^{b_t}\}$, where $b_i \in \mathbb{F}_2^n$ and $x^u = \prod_{i=1}^{n} x_i^{u_i}$ ($x_i$ is the $i$-th bit of $x$ and $u_i$ is the $i$-th bit of $u$). With $N$ measurements $l$ and a key guess $\hat{k}$, the leakage function can be estimated as

$$\hat{L}_{\hat{k}}(x_{\hat{k}}) = \beta_0 + \beta_1 x_{\hat{k}}^{b_1} + \beta_2 x_{\hat{k}}^{b_2} + \ldots + \beta_t x_{\hat{k}}^{b_t},$$

where

$$A_{\hat{k}} = \begin{pmatrix} x_{\hat{k}}^{b_1}(1) & \cdots & x_{\hat{k}}^{b_t}(1) \\ \vdots & \ddots & \vdots \\ x_{\hat{k}}^{b_1}(N) & \cdots & x_{\hat{k}}^{b_t}(N) \end{pmatrix}, \qquad \beta_{\hat{k}} = \left(A_{\hat{k}}^{T} A_{\hat{k}}\right)^{-1} A_{\hat{k}}^{T} \, (l(1), \ldots, l(N))^{T},$$

$l(i)$ is the $i$-th measurement and $x(i)$ is the corresponding intermediate state. If the attacker uses a valid assumption about $L(x)$ (i.e. chooses a valid $G_b$), only the correct key guess gives a valid regression. Thus, the attacker can use the coefficient of determination ($R^2$) as a distinguisher [8]:

$$R_{\hat{k}}^2 = 1 - \frac{\sum_{i=1}^{N} \left( l(i) - \hat{L}_{\hat{k}}(x(i)) \right)^2}{\sum_{i=1}^{N} \left( l(i) - \bar{l} \right)^2}, \qquad k = \arg\max_{\hat{k}} R_{\hat{k}}^2.$$
Theoretically speaking, $R^2$ provides a measure of how well the observed outcomes are replicated by the model, as the proportion of total variation of outcomes explained by the model [10]. Since the regression with the wrong intermediate state cannot effectively explain the variance, key guesses with higher $R^2$ are more likely to be correct.
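The key-recovery procedure above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the setup used in this paper: the target is a toy 4-bit Sbox (the PRESENT Sbox), the leakage is a noise-free Hamming weight of the Sbox output, the basis is the linear basis of the output bits plus a constant column, and all function names (`solve`, `r_squared`, `lra_recover_key`) are hypothetical helpers written for this example.

```python
# Toy 4-bit Sbox (PRESENT), an assumption for illustration only.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def solve(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def r_squared(rows, leakages):
    """Least-squares fit of leakages on the design rows; returns R^2."""
    t = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(t)] for i in range(t)]
    atl = [sum(r[i] * l for r, l in zip(rows, leakages)) for i in range(t)]
    beta = solve(ata, atl)  # beta = (A^T A)^{-1} A^T l
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean = sum(leakages) / len(leakages)
    rss = sum((l - f) ** 2 for l, f in zip(leakages, fitted))
    tss = sum((l - mean) ** 2 for l in leakages)
    return 1 - rss / tss

def linear_basis(state, n_bits=4):
    """Constant term followed by the individual bits of the state."""
    return [1] + [(state >> i) & 1 for i in range(n_bits)]

def lra_recover_key(plaintexts, leakages):
    """Return the key guess with the highest R^2, plus all scores."""
    scores = {}
    for guess in range(16):
        rows = [linear_basis(TOY_SBOX[p ^ guess]) for p in plaintexts]
        scores[guess] = r_squared(rows, leakages)
    return max(scores, key=scores.get), scores

# Simulated, noise-free measurements for a secret key nibble.
secret = 0x9
plaintexts = list(range(16)) * 2
leakages = [bin(TOY_SBOX[p ^ secret]).count("1") for p in plaintexts]
best, scores = lra_recover_key(plaintexts, leakages)
```

With real traces the leakage is noisy and one such regression would be run per sample point; the sketch only shows how the distinguisher ranks key guesses.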
3 LRA with F-test: A Useful Tool
Although LRA is a powerful key-recovery attack, directly applying it in SCARE gives poor results. Unlike SCA, SCARE usually needs to compare different models. Unfortunately, $R^2$ is not suitable for this task. In this section, we perform the F-test to compare LRA results from different regression models. Although not explicitly stated, Whitnall’s stepwise regression uses the same technique [14]. Here, we take one step further and discuss how the F-test can help us in the field of reverse engineering.

3.1 Motivation
In regression, $R^2$ is a statistical measure of how well the regression approximates the real data points. However, $R^2$ alone cannot be used as a meaningful comparison of models with different numbers of independent variables. As a matter of fact, $R^2$ spuriously increases when extra explanatory variables are added to the model. In this case, it is hard to tell whether the new model is more effective than the old one. This problem seldom affects LRA in a key-recovery scenario: in most block ciphers, the secret key only affects the value of the explanatory variables. Since all the key guesses share the same regression model, the highest $R^2$ indicates the best regression. In SCARE, the story is completely different: as SCARE’s target involves the regression model itself, using LRA in SCARE will inevitably face the problem of comparing different regression models.

3.2 F-test with Nested Model
A well-known solution for this problem is to introduce an F-test between two models [15]. In statistics, two models are “nested” if one model (the full model $M_2$) contains all the terms of the other (the restricted model $M_1$), and at least one additional term. To determine whether the restricted model is adequate, we can test the following hypothesis

H0: the restricted model is adequate
H1: the full model is better

with the F statistic

$$F = \frac{RSS_1 - RSS_2}{RSS_2} \cdot \frac{N - p_2 + 1}{p_2 - p_1} \sim F(p_2 - p_1,\; N - p_2 + 1),$$

where $p_1$ ($p_2$) stands for the number of explanatory variables in $M_1$ ($M_2$), $RSS_1$ ($RSS_2$) represents the residual sum of squares, and $N$ is the number of measurements. Following the notations in Sect. 2, the residual sum of squares (RSS) is defined as

$$RSS_j = \sum_{i} \left( l(i) - \hat{L}_j(x(i)) \right)^2.$$
The null hypothesis is rejected if this statistic is greater than the critical value of the F-distribution for some desired false-rejection probability α.
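The statistic above translates directly into code. The sketch below is illustrative only: the function names are hypothetical, the degrees of freedom follow the formula as written in this section, and the critical value is left as an input because the Python standard library has no F-distribution (a statistics package could supply it, for instance via `scipy.stats.f.ppf(1 - alpha, df_num, df_den)`).

```python
def f_statistic(rss_restricted, rss_full, p_restricted, p_full, n_samples):
    """F statistic for two nested models, as defined in this section.

    Returns the statistic together with its numerator and denominator
    degrees of freedom (p2 - p1 and N - p2 + 1).
    """
    df_num = p_full - p_restricted
    df_den = n_samples - p_full + 1
    f_value = (rss_restricted - rss_full) / rss_full * df_den / df_num
    return f_value, df_num, df_den

def rejects_null(f_value, critical_value):
    """H0 (the restricted model is adequate) is rejected for large F."""
    return f_value > critical_value

# Made-up numbers for illustration: RSS halves when two terms are added.
f_value, df_num, df_den = f_statistic(10.0, 5.0, 1, 3, 102)
```

Here `f_value` is compared against the critical value of F(`df_num`, `df_den`) for the chosen false-rejection probability α.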
3.3 Applications in SCARE
In SCAREs, LRA with F-test can help us verify various conjectures. For instance, consider the case where we wish to decide whether a regression model can explain the variance of the power measurements. Given a false-rejection probability α, the F-test determines whether the regression is valid, considering both the sample size $N$ and the number of explanatory variables. Specifically, let $M_0$ denote the model that contains only the constant term (the restricted model), while $M_1$ is the tested regression model (the full model). If the F-test above rejects H0 with high confidence, the power measurements are somehow related to the model $M_1$. This test helps us distinguish whether the resultant $R^2$ represents a valid regression or the consequence of random noise. In the following, this test is denoted as the ValidTest.

Another interesting application is to separate parallel signals from signals that actually “mix” together in the cryptographic computations. Suppose we have some intermediate state $x$ and the corresponding power leakage $l$, and wish to determine whether $l$ comes from $x$ itself or from some cryptographic computations of $x$. Throughout this paper, we assume the majority of the power leakage follows the weighted Hamming Weight model, where $L(x) = \beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n$. Take the two-bit $x = \{x_0, x_1\}$ as a toy example: following the weighted Hamming Weight model, the power leakage can be written as $L(x) = \beta_0 + \beta_1 x_0 + \beta_2 x_1$. If some cryptographic computations (e.g. XOR) occur, the expression of $L(x)$ also contains $\beta_3 x_0 x_1$ (over the reals, $x_0 \oplus x_1 = x_0 + x_1 - 2 x_0 x_1$). Thus, the following hypothesis test applies:

H0: $M_0$ with regression basis $\{1, x_0, x_1\}$ is adequate
H1: $M_1$ with regression basis $\{1, x_0, x_1, x_0 x_1\}$ is better

If the F-test accepts H0 with high confidence, we can conclude that $x_0$ and $x_1$ are simply implemented in parallel. Otherwise, it suggests there might be some cryptographic operations performed with both $x_0$ and $x_1$.

Similarly, for a $d$-bit group $\{x_1, x_2, \ldots, x_d\}$, if we wish to test whether the $i$-th bit of $x$ ($x_i$) mixes with other bits, we can use the following hypothesis test:

H0: $M_0$ with regression basis $G_0 = \{x^u \mid u \in \mathbb{F}_2^d \wedge u_i = 0\} \cup \{x_i\}$ is adequate
H1: $M_1$ with regression basis $G_1 = \{x^u \mid u \in \mathbb{F}_2^d\}$ is better

As this test aims to prune irrelevant bits, in the following sections, we denote this test as the PruningTest.
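To make the two bases of the PruningTest concrete, the following sketch enumerates their exponent vectors and evaluates the monomials $x^u$ for a given bit group. It is only an illustration: the function names are hypothetical, and bit positions are counted from 0 here, whereas the text counts from 1.

```python
from itertools import product

def monomial(bits, u):
    """Evaluate x^u = prod_i x_i^{u_i} for a tuple of bits and exponents u."""
    value = 1
    for x_i, u_i in zip(bits, u):
        if u_i:
            value *= x_i
    return value

def full_basis(d):
    """Exponent vectors of G1 = {x^u | u in F_2^d} (u = 0 is the constant)."""
    return list(product((0, 1), repeat=d))

def pruning_basis(d, i):
    """Exponent vectors of G0 = {x^u | u_i = 0} united with {x_i}."""
    restricted = [u for u in full_basis(d) if u[i] == 0]
    single = tuple(1 if j == i else 0 for j in range(d))
    return restricted + [single]

def design_row(bits, basis):
    """One row of the regression matrix for the given basis."""
    return [monomial(bits, u) for u in basis]

# Toy two-bit example from the text: basis {1, x1, x0, x0*x1} in product order.
row = design_row((1, 0), full_basis(2))
```

For a $d$-bit group the full basis has $2^d$ terms and the restricted basis $2^{d-1} + 1$, so the F-test compares models that differ by $2^{d-1} - 1$ terms.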
4 A Realistic LRA-Based SCARE
This section further explains how our LRA with F-test helps to reveal the secret cryptographic components. For this purpose, we chose an unprotected software implementation of DES as our target. The power consumption was measured with a LeCroy WaveRunner 610Zi oscilloscope at a sampling rate of 20 MSa/s. The entire trace set contains 20 000 traces, with 80 000 samples covering the first 3 rounds. As the power consumption of an unprotected software implementation can be easily exploited, in our experiments, we only use the first 2 000 traces. Throughout this section, we assume the attacker does not know the underlying cipher (DES) or the specific implementation.
4.1 Generalized Structure of the Target Cipher
In order to formally define a general flow for SCARE, we start our discussion by proposing a generalized structure that covers most common block ciphers. Many previous SCAREs assume their target ciphers use either the Substitution-Permutation Network (SPN) or the standard Feistel structure. Although those choices are quite popular, with LRA, we can do better.
Fig. 1. Structure overview
In Fig. 1(a), P0 and P1 represent linear operations, while S stands for the non-linear operation. It is not hard to see that the standard SPN (Fig. 1(b)) and Feistel structure (Fig. 1(c)) can be regarded as special cases of this generalized scheme. Many other schemes, including the generalized Feistel structure, can similarly be expressed by the generalized structure in Fig. 1(a). It is worth mentioning that in a few cases, Fig. 1(a) may not correspond to a full encryption round: if the round function uses more than one confusion layer, it should be expressed as multiple rounds in Fig. 1(a). As we can see in Fig. 1(d), our target cipher DES fits this scheme perfectly.

Secret key in SCARE. In most SCAREs, the secret key is simply regarded as a part of the secret cipher. Specifically, if the secret key $k$ is added before an Sbox $S$, SCARE can only recover an equivalent Sbox $S'$ where $S'(x) = S(x \oplus k)$. Similar equivalence holds if $k$ is added at other positions. In the following, we simply ignore the secret key and recover it as a part of the secret Sboxes.
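The equivalence between a keyed Sbox and an unkeyed equivalent Sbox can be checked with a toy example. The Sbox and key below are arbitrary illustrative choices (a 4-bit Sbox rather than a DES Sbox), and `equivalent_sbox` is a hypothetical helper name.

```python
# Toy 4-bit Sbox and key nibble, chosen only for illustration.
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
KEY = 0x9

def equivalent_sbox(sbox, key):
    """Table of S'(x) = S(x XOR key): the key is folded into the Sbox."""
    return [sbox[x ^ key] for x in range(len(sbox))]

s_prime = equivalent_sbox(TOY_SBOX, KEY)

# An observer who only sees inputs and outputs cannot tell the keyed
# Sbox apart from the unkeyed table s_prime.
outputs_keyed = [TOY_SBOX[x ^ KEY] for x in range(16)]
outputs_equivalent = [s_prime[x] for x in range(16)]
```

Since `s_prime` is again a permutation, recovering it is as useful to the attacker as recovering the pair ($S$, $k$).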
4.2 Preparation
Before any reverse engineering, the attacker first observes the measured traces and tries to learn some basic facts about the secret encryption procedure. In our experiments, the attacker can easily identify three repetitive patterns on the trace, which correspond to the first three encryption rounds. However, locating each cryptographic operation on the trace is much harder. Indeed, without any a priori knowledge, the attacker cannot even infer the number of Sboxes with confidence. Due to the length limit, we omit the measured trace figures here: interested readers can find these figures in the full version of this paper.

4.3 Step 1: Recovering P0
Let $n$ denote the block length. Assume P0 has $m_0$ independent output bits; the operation of P0 can be written as

$$(y_1, y_2, \ldots, y_{m_0})^{T} = P_0 \, (x_1, x_2, \ldots, x_n)^{T},$$

where $P_0$ is a binary matrix. Our goal is to determine each $y_q$, which can be written as a linear combination of $\{x_1, x_2, \ldots, x_n\}$. We can also remove all $x_i$ with coefficient 0 and simply write

$$y_q = \bigoplus_{j} x_{q_j},$$

where $x_q = (x_{q_1}, x_{q_2}, \ldots, x_{q_d})$ represents the $d$ input bits with coefficient 1. Thus, recovering P0 is equivalent to finding $x_q$ from $\{x_1, x_2, \ldots, x_n\}$.

Given an input bit group guess $\tilde{x}_q$, we can fit the leakage from the Sboxes’ input (P0’s output) with full basis LRA. With some false-rejection probability α, the ValidTest shows whether there is a connection between the power leakage and $\tilde{x}_q$. If there is, $\tilde{x}_q$ can express some $y_q$. Meanwhile, $\tilde{x}_q$ may still involve some irrelevant input bits. The PruningTest finds the input bits that do not appear in the expression of $y_q$. If both tests reject H0, we can conclude that $\tilde{x}_q$ is the exact relevant input for some $y_q$. The detailed procedure is presented in Algorithm 1. Note that the LRA in the ValidTest uses the constant basis $G_0 = \{1\}$ and the full basis $G_1 = \{\tilde{x}_q^u \mid u \in \mathbb{F}_2^d\}$, while the LRA in the $i$-th PruningTest uses $G_2 = \{\tilde{x}_q^u \mid u \in \mathbb{F}_2^d \wedge u_i = 0\} \cup \{x_{q_i}\}$ and the full basis $G_1 = \{\tilde{x}_q^u \mid u \in \mathbb{F}_2^d\}$.

Algorithm 1. LRA based SCARE test
procedure SCAREtest($\tilde{x}_q$)
    [pr1, $R^2$] = ValidTest($\tilde{x}_q$)            ▷ Test whether $\tilde{x}_q$ can explain the power variance
    if pr1 > 1 − α then
        for i = 1 to d do
            pr2[i] = PruningTest(i, $\tilde{x}_q$)     ▷ Test if $x_{q_i}$ is relevant
            if pr2[i] < 1 − α then
                return “Error 2”                       ▷ $\tilde{x}_q$ contains irrelevant bit $x_{q_i}$
        return min(pr2[1..d]) $R^2$
    else
        return “Error 1”                               ▷ $\tilde{x}_q$ cannot explain the power variance

Theoretically, Algorithm 1 only succeeds when the target state $y_q$ is related to every single bit in $\tilde{x}_q$. According to our discussion above, XORing all bits in $\tilde{x}_q$ together gives us a candidate for $y_q$. Thus, the attacker can perform one last ValidTest with this candidate bit: if this bit does lead to a valid regression, we have found some $y_q$. This test blocks out many undesirable cases, such as non-linear leakages or $\tilde{x}_q$ expressing more than one $y_q$.

With Algorithm 1 identifying the correct input bits, all output bits can be found by simply enumerating all possible input guesses $\tilde{x}_q$. Considering the implementation cost, designers tend to choose a lightweight matrix as the diffusion layer. Thus, the size of the correlated bit group ($d$) is more likely to be a small number. To this end, the enumeration starts with the smaller group guesses (smaller $d$) and moves towards the larger ones (larger $d$). As $m_0$ cannot be efficiently determined in advance, the attacker must abort the enumeration whenever he believes he has found enough $y_q$. Assuming $y_q$ contains at most $d$ bits of $x$, the enumeration above takes $C_n^d$ LRAs to find P0. For $d \ge 8$, this approach becomes too expensive.

Optimization. Clearly, Algorithm 1 returns two types of errors: Error 2 suggests that $\tilde{x}_q$ contains an irrelevant bit $x_{q_i}$, whereas Error 1 means that $\tilde{x}_q$ cannot form a valid regression. As the first case limits the expression of $y_q$ to a smaller range, we can build a more efficient version of this attack. Suppose we choose a $d_g$-bit group guess where $d_g > d$; Algorithm 1 verifies whether it causes a valid regression with the ValidTest. If it does, as the PruningTest gives clues about which bit is irrelevant, finding the exact input should be easy. In this case, we wish to find the minimal set of $d_g$-bit groups that covers all possible $d$-bit groups. This problem is equivalent to finding the covering set of a hypergraph. According to Rödl’s conclusion [16], as $n \to \infty$,

$$M(n, d_g, d) \to \frac{C_n^{d}}{C_{d_g}^{d}}.$$

Thus, if the attacker estimates that the expressions of all $y_q$ contain at most $d$ input bits, enumerating all $d_g$-bit group guesses above gives all $y_q$. Dan Gordon’s web site provides some known covering sets [17]. Note that this trick should only be applied when $d$ is large, as the covering problem of a hypergraph is quite complicated itself. For clarity, we present the pseudo-code of this optimization in Algorithm 2.

Experiments. Considering that P0 is the first cryptographic operation in Fig. 1(a), in our experiments, we have tested our attack with the first half of the first round’s trace. With α = 0.01 %, only 32 bits pass our ValidTest. Since P0’s output involves half of the plaintext bits, an experienced attacker may guess that P0 is a bit permutation. Table 1 lists our P0 recovery with various numbers of traces. According to the IP transformation in DES, our P0 recovery gives a 100 % accurate result with 2000 traces. With 500 traces, our recovery gives one Type II error (“false negative”), which means one of P0’s output bits is filtered out.
Algorithm 2. Linear Component Recovery: the optimized approach
Require: n-bit input list $x = \{x_1, x_2, \ldots, x_n\}$, guessed length $d$ and a parameter $d_g > d$
procedure LinearRecovery
    List = ∅
    for each $d_g$-bit group $\tilde{x}_q \subset x$ in the covering set do
        result = SCAREtest($\tilde{x}_q$)                  ▷ Test if $\tilde{x}_q$ is the expression of some $y_i$
        if result = “Error 1” then
            continue
        else
            Remove the extra bits using the PruningTest
            candidate = $\bigoplus_j x_{q_j}$
            if candidate passes SCAREtest then             ▷ Make sure $\bigoplus_j x_{q_j}$ is valid
                List = List ∪ $\tilde{x}_q$                ▷ Add candidate as an output bit
    return List
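To get a feeling for how much the covering-set optimization saves, the two enumeration costs can be compared numerically. This is a rough sketch with hypothetical function names: the second function evaluates the asymptotic estimate $C_n^d / C_{d_g}^d$ quoted from [16], rounded up, which is not necessarily the size of a covering that is actually known for given parameters.

```python
from math import comb

def exhaustive_lra_count(n, d):
    """Number of d-bit groups among n input bits, i.e. C(n, d) LRAs."""
    return comb(n, d)

def covering_estimate(n, d_g, d):
    """Asymptotic estimate C(n, d) / C(d_g, d) of the covering size, rounded up."""
    return -(-comb(n, d) // comb(d_g, d))

# Example parameters chosen for illustration: a 64-bit block, d = 2, d_g = 4.
plain_cost = exhaustive_lra_count(64, 2)
covered_cost = covering_estimate(64, 4, 2)
```

Each $d_g$-bit guess needs a larger regression basis than a $d$-bit guess, so the number of groups is only one side of the trade-off mentioned in the text.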
Interestingly, our recovery did not report any Type I error (“false positive”), which means there was no incorrect bit in the recovered P0 output.

Table 1. Recovering P0 in the first round

α        Number of traces   Recovered bits   Correct bits   Type I error   Type II error
0.01 %   500                31               31             0              1
0.01 %   2000               32               32             0              0
Since the power measurements do not contain any information about the bit order of P0, here we can only retrieve P0 up to its bit-permutation equivalent. This sets the stage for our next step.

4.4 Step 2: Recovering S1
As mentioned before, after the first step, we do not have the inputs of each Sbox. In order to further recover the secret Sboxes, we have to find the actual input of each Sbox first. We can perform a similar attack to obtain the Sboxes’ inputs. The only difference lies in our leakage choice: here we choose the leakage of the Sboxes’ outputs instead. Typically, if an Sbox is cryptographically strong, LRA should not predict its output unless the guessed input group $\tilde{x}_q$ contains all of its inputs. To this end, the attack procedure is exactly the same as Step 1, except for the last “XOR test”. Although the trivial enumeration works for most popular Sbox sizes, it is worth mentioning that there is a trick that can significantly speed up this process.
Property 1. Let $y = S(x)$ denote the Sbox computation and $l(x)$ denote the corresponding power leakage. Suppose $l(x)$ follows the weighted Hamming Weight model of $y$. As the sample size $N \to \infty$, LRA with the full basis of $x$ satisfies

$$\lim_{N \to \infty} R_f^2 = \frac{1}{1 + \frac{1}{SNR}},$$

where SNR is the Signal-to-Noise Ratio. For most commonly used $S$, if one bit $x_i$ is removed from the regression basis, LRA with the partial basis of $x$ satisfies

$$\lim_{N \to \infty} R_p^2 \approx \frac{1}{2} \cdot \frac{1}{1 + \frac{1}{SNR}}.$$

Proof sketch. Without loss of generality, we first limit the output of $S(x)$ to 1 bit. In this case, the leakage can be written as $l(x) = \beta_1 S(x) + \beta_0 + n$, where $n$ represents the independent Gaussian noise with mean 0 and variance $\sigma^2$. It is not hard to see that

$$\lim_{N \to \infty} R_f^2 = 1 - \frac{\sigma^2}{SNR\,\sigma^2 + \sigma^2} = \frac{1}{1 + \frac{1}{SNR}}.$$

Let $x' = \{x_1, x_2, \ldots, x_n\} \setminus \{x_i\}$; any $S(x)$ can be written as $S_0(x')(1 - x_i) + S_1(x')\,x_i$. If the target Sbox is cryptographically strong, without too much bias, we can assume $S_0(x')$ and $S_1(x')$ are nearly independent from one another¹. Since LRA with $x'$ combines the points $x \mid x_i = 0$ and $x \mid x_i = 1$, according to the Least Square Regression, the resultant point $l(x') \approx 0.5\,\beta_1 (S_0(x') + S_1(x'))$. Thus,

$$\lim_{N \to \infty} R_p^2 \approx 1 - \frac{\sum_{x} \left( \frac{1}{2} \beta_1 \left( S_0(x') - S_1(x') \right) \right)^2 + N \sigma^2}{N \left( \frac{1}{4} \beta_1^2 + \sigma^2 \right)} = \frac{1}{2} \cdot \frac{1}{1 + \frac{1}{SNR}}.$$

As the output bits of an Sbox should be independent from one another, this proof sketch can easily be extended to the multi-bit case.
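As a quick numerical illustration of Property 1, the two limits can be evaluated for a couple of SNR values. The values below are arbitrary choices for illustration, not measurements from our experiments.

$$SNR = 1: \quad \lim_{N \to \infty} R_f^2 = \frac{1}{1 + 1} = 0.5, \qquad \lim_{N \to \infty} R_p^2 \approx \frac{1}{2} \cdot 0.5 = 0.25$$

$$SNR = 0.1: \quad \lim_{N \to \infty} R_f^2 = \frac{1}{1 + 10} = \frac{1}{11} \approx 0.091, \qquad \lim_{N \to \infty} R_p^2 \approx \frac{1}{22} \approx 0.045$$

In both cases, removing one input bit roughly halves the explained variance but does not remove it.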
Optimization. This property suggests that if a whole Sbox’s $R^2$ is high enough, part of its input has a smaller, yet still significant $R^2$. Suppose an Sbox involves $s$ input bits; most of its partial inputs (proper subsets) have a larger $R^2$ than other irrelevant guesses. To this end, we can add some constraints in the enumeration, especially for $(s-1)$-bit and $(s-2)$-bit groups. For instance, if the ValidTest and PruningTest of a whole Sbox use significance level α = 0.01 %, we can loosen the restriction on the partial inputs with lower significance levels. In our experiments, the significance level α is set to {1 %, 0.1 %, 0.01 %} for input length $\{s-2, s-1, s\}$, respectively. Such constraints efficiently filter out many unnecessary guesses, improving the overall performance significantly. For clarity, we present the pseudo-code of this optimization in Algorithm 3.

¹ If $x_i$ only appears in the linear terms or does not appear at all, the $R^2$ above might be biased. However, considering the other output bits, the overall bias should be small. For a cryptographically strong Sbox, $x_i$ should appear in the non-linear terms of at least one output bit.
Algorithm 3. Sbox Recovery: the optimized approach
Require: n-bit input list $x = \{x_1, x_2, \ldots, x_n\}$, guessed length $s$
procedure SboxRecovery
    List = all possible groups with length $s - 3$
    for $s' = s - 2$ to $s$ do
        for each $(s' - 1)$-bit group $\tilde{x}_q \in$ List do
            Remove $\tilde{x}_q$ from List
            Add a new bit into $\tilde{x}_q$                       ▷ Generate new input groups with length $s'$
            result = SCAREtest($\tilde{x}_q$) with significance level $c[s']$
            if result ≠ “Error” then
                List = List ∪ $\tilde{x}_q$                        ▷ Add all possible subsets with length $s'$
    for each $s$-bit group $\tilde{x}_q \in$ List do               ▷ Test the remaining groups in List
        SuperSet = all $(s + 1)$-bit supersets of $\tilde{x}_q$
        if a superset passes SCAREtest then                        ▷ $\tilde{x}_q$ is a proper subset
            Remove $\tilde{x}_q$ from List
    return List
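The growing phase of Algorithm 3 can be sketched as follows. This is a simplified illustration, not the implementation used in our experiments: the statistical test is replaced by a caller-supplied function `passes_test` (hypothetical), and the final superset check of Algorithm 3 is omitted. In the toy run, a group passes whenever it is a subset of one of two hidden 6-bit Sbox inputs, mimicking the behaviour suggested by Property 1.

```python
from itertools import combinations

def extend_groups(survivors, n):
    """All groups obtained by adding one unused bit index to a surviving group."""
    extended = set()
    for group in survivors:
        for bit in range(n):
            if bit not in group:
                extended.add(frozenset(group | {bit}))
    return extended

def staged_enumeration(n, s, passes_test, levels):
    """Grow groups from s-3 bits up to s bits, keeping those that pass the test.

    `levels[size]` is the significance level used for groups of that size.
    """
    current = {frozenset(c) for c in combinations(range(n), s - 3)}
    for size in range(s - 2, s + 1):
        candidates = extend_groups(current, n)
        current = {g for g in candidates if passes_test(g, levels[size])}
    return current

# Toy stand-in for the statistical test (the level argument is ignored here).
HIDDEN_INPUTS = [frozenset(range(0, 6)), frozenset(range(6, 12))]

def toy_test(group, level):
    return any(group <= hidden for hidden in HIDDEN_INPUTS)

# Significance levels for sizes s-2, s-1, s as in the text (1 %, 0.1 %, 0.01 %).
found = staged_enumeration(12, 6, toy_test,
                           {4: 0.01, 5: 0.001, 6: 0.0001})
```

Because irrelevant groups are discarded at each size, only extensions of promising partial inputs are ever tested at the full size $s$.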
Determine the size of the Sbox. If the size of the Sbox ($s$) is not given in advance, the attacker needs to find it through a few trial-and-error procedures. Specifically, let $s'$ denote the size of the guessed input group $\tilde{x}_q$. If $s' > s$, the PruningTest tells us there are irrelevant bits in $\tilde{x}_q$. On the other hand, if $s' < s$, according to Property 1, $\tilde{x}_q$ should be a valid group with lower significance level. To make sure $\tilde{x}_q$ is a proper subset, we can test all the supersets of $\tilde{x}_q$ with size $s' + 1$. If one superset is also a valid group with higher significance level, we know for sure that $\tilde{x}_q$ is a proper subset of the Sbox’s input and $s' < s$.

Experiments. One major advantage of our approach is that it does not require the actual points of interest. In theory, as our analysis only considers the power consumption of each Sbox’s output, our Sbox recovery should only use the power leakages of the Sbox’s output. However, considering that our PruningTest removes the valid regressions caused by the “parallel effect”, the power consumption of the Sboxes’ inputs (or P0’s output) should be automatically discarded. For this reason, in our experiments, our test runs 100 times with all sample points in the first round. Since we do not know the order of P0’s output, each attempt uses a random order and returns a list of corresponding input bits. Table 2 lists all the correct Sbox input groups with their success rates, as well as the incorrect groups that our attack returned. With 2000 traces, our LRA-based SCARE always returns the correct Sbox input with 100 % accuracy, except for S5. The remaining 5 cases returned a result list containing only 7 correct Sbox inputs. This is caused by our constraints on the enumeration procedure: with certain orders of P0’s output, our constraints may filter out the correct partial input group of S5. If the attacker uses only 500 traces, as discussed in Sect. 4.3, one of the output bits of P0 is missing. As a consequence, our attack in this section cannot find the corresponding Sbox inputs (S5). Meanwhile, with such a limited trace set, our attack also returns some incorrect groups.
Table 2. Recovering the input of S

No                Success rate (N=500)   Success rate (N=2000)
S0                100/100                100/100
S1                100/100                100/100
S2                90/100                 100/100
S3                100/100                100/100
S4                65/100                 100/100
S5                0/100                  95/100
S6                86/100                 100/100
S7                30/100                 100/100
Incorrect Group   17                     0
Unlike the linear case, for Sboxes, we cannot determine the actual expression from the input bits alone. As our attack already finds each Sbox’s input, the attacker can pick several points of interest using NICV (with the recovered Sbox inputs, rather than the plaintexts). Since both the Sbox inputs and the accurate points of interest are then available, collision attacks can further recover this Sbox. The details of the collision attacks are out of the scope of this paper. Interested readers can find this part in Rivain et al.’s paper [5].

4.5 Step 3: Recovering P1
As a linear transformation, P1’s recovery follows exactly the same routine as P0. The input bits include the n-bit plaintext as well as all the output bits of S. Our target leakage comes from the Sboxes’ input (or P0’s output) in the second round. In our experiments, our test directly runs through all the sample points in the second round. With α = 0.01 % and N = 2000, our attack returns a list of 32 valid 2-bit candidates, each of whose XOR forms one of P1’s output bits. As we can see in Table 3, a smaller trace set increases both Type I errors and Type II errors.

Table 3. Recovering P1 with different numbers of traces

α        Number of traces   Candidates   Correct bits   Type I error   Type II error
0.01 %   500                36           31             5              1
0.01 %   2000               32           32             0              0

4.6 The Complete Attack
Although the attack is presented step by step, we would like to stress that it still needs manual intervention. Considering the enormous space of all possible secret
block ciphers, the information that power traces provide is not enough to determine all the details. As a result, all SCAREs require some empirical intervention, whether by guessing the structure or guessing the input size of certain components (e.g. Sboxes). Our attack here is no exception: in both Step 1 and Step 3, the attacker needs to decide whether he has found all the output bits. Note that this does not suggest our attack is inferior to the previous SCAREs: most previous works directly assume the attacker already knows those parameters (e.g. the cryptographic structure, the size of the Sbox, the output size of the permutation, etc.). Indeed, most SCAs today still require some manual intervention in the preparation stage, whether by identifying the encryption rounds or removing some outlier traces. We believe our SCARE should be regarded as a handy tool for experienced attackers, rather than an automatic attack. In addition, we did not attempt to cover all possible block ciphers with our SCARE. Considering the enormous space of all possible secret block ciphers, we believe it makes more sense to focus on the most common designs: arbitrary algorithms with exotic features usually require ad-hoc solutions, which are out of the scope of this paper.
5 Discussion
In the last section, we proposed a general LRA-based SCARE and verified it with realistic power leakages. Specifically, our analysis uses a quite general structure, which covers most common block ciphers. Unlike the collision attacks [5], our SCARE is not hindered by the 8 different Sboxes and the Expansion E in DES. In addition, in our analysis, the attacker does not have to accurately locate each Sbox on the leakage trace.

Leakage model. The major limitation of our approach is that, in theory, it only works with linear leakage. This is indeed an inherent drawback: in LRA, the secret recovery relies on the fact that the attacker can decide whether the corresponding regression function looks like the correct leakage function. If the leakage function contains non-linear terms, the attacker cannot decide whether the non-linear terms come from the leakage function or the cryptographic operation. Collision attacks do not face this problem, since they use an “online profiling” stage to characterize the leakage function [5]. This is actually an inevitable tradeoff: without any assumption on the leakage function, non-profiled SCA cannot successfully attack any bijective cryptographic operation [14]. Nonetheless, our LRA-based SCARE still works when the leakage function can be approximated as a linear function. As LRA with a linear basis already gives a good regression in that case, adding non-linear terms cannot provide a significantly better regression.

The significance level α. The significance level α plays an important role in our LRA-based SCARE. α helps to decide whether an increment of R² should be regarded as the consequence of a better regression model or as negligible noise. In our paper, we simply choose a common significance level (α = 0.01 %) in the hypothesis testing. This α works well in our experiments in Sect. 4. For other
implementations, α = 0.01 % may not always be a good choice. As α depends on the specific leakage features, the attacker may have to test several common values and estimate which recovery is more likely to be correct.

Parallel or Hardware Implementations. Theoretically, as our PruningTest automatically removes the “parallel effect”, our LRA-based SCARE should also work for parallel implementations. In our experience, however, LRA-based SCARE can learn only some information from parallel implementations, and the result is far from satisfying. Indeed, most previous SCAREs explicitly assume the underlying implementation is sequential. In addition, it might be interesting to ask whether our attack can be extended to hardware implementations, with the Hamming Distance (HD) model. The problem with the HD model is that it involves the state of the last round. For SCARE, learning the last state means the attacker has to learn the specific implementation code as well as the underlying data-path. Considering the context of SCARE, we believe it makes more sense to avoid such an assumption. However, if the last state is already given, our attack works in exactly the same way.
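To make the last remark concrete, the following Python sketch (an illustration under assumptions, not part of the paper's experiments) shows why a known previous state is enough under the HD model: the HD leakage of a register update is the Hamming weight of the XOR of the two states, so it is linear in the bits of that XOR. The function names and the 8-bit example values are hypothetical.

```python
def hamming_weight(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def hd_leakage(previous_state, current_state):
    """Hamming Distance leakage: number of register bits that toggle."""
    return hamming_weight(previous_state ^ current_state)


def transition_bits(previous_state, current_state, width):
    """Bits of previous XOR current, least significant bit first.

    Assumption: when the previous state is known, these bits can be used as
    regressors in place of the bits of the current state.
    """
    delta = previous_state ^ current_state
    return [(delta >> i) & 1 for i in range(width)]


# Hypothetical 8-bit register update.
prev, cur = 0b10110010, 0b01110110
bits = transition_bits(prev, cur, 8)

# The HD leakage is a linear function (unit weights) of the transition bits.
assert hd_leakage(prev, cur) == sum(bits)
```

Without the previous state, the attacker cannot form these transition bits, which is the obstacle described above.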
6 Conclusion
Despite the various SCARE techniques in the literature, recovering a secret cipher in practice is not an easy task. In fact, most previous SCAREs have some limitations on their target ciphers or implementations. In this paper, we propose a new SCARE technique based on Linear Regression Attack (LRA). Specifically, in order to fairly compare different regression models, we perform an F-test against the regression results. LRA with F-test helps us successfully recover linear components as well as the Sboxes’ inputs, without much a priori knowledge about the underlying cipher or its implementation. Compared with the previous SCAREs, our approach uses less a priori knowledge and covers more block cipher instances in a completely non-profiled manner. We have verified our attack with real-life measurements from an unprotected software implementation of DES. Experiments confirm that our attack works well with realistic measurements, extracting valuable information for experienced attackers. Although our approach still has some limitations, we believe it can serve as an alternative tool for reverse engineering in the future.

Acknowledgements. We would like to thank the anonymous reviewers for providing valuable comments. This work is supported by the National Basic Research Program of China (No. 2013CB338002) and the National Natural Science Foundation of China (No. 61272476, 61672509 and 61232009).
References

1. Daudigny, R., Ledig, H., Muller, F., Valette, F.: SCARE of the DES. In: Ioannidis, J., Keromytis, A., Yung, M. (eds.) ACNS 2005. LNCS, vol. 3531, pp. 393–406. Springer, Heidelberg (2005). doi:10.1007/11496137_27
2. Réal, D., Dubois, V., Guilloux, A.-M., Valette, F., Drissi, M.: SCARE of an unknown hardware Feistel implementation. In: Grimaud, G., Standaert, F.-X. (eds.) CARDIS 2008. LNCS, vol. 5189, pp. 218–227. Springer, Heidelberg (2008). doi:10.1007/978-3-540-85893-5_16
3. Guilley, S., Sauvage, L., Micolod, J., Réal, D., Valette, F.: Defeating any secret cryptography with SCARE attacks. In: Abdalla, M., Barreto, P.S.L.M. (eds.) LATINCRYPT 2010. LNCS, vol. 6212, pp. 273–293. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14712-8_17
4. Clavier, C., Isorez, Q., Wurcker, A.: Complete SCARE of AES-Like block ciphers by chosen plaintext collision power analysis. In: Paul, G., Vaudenay, S. (eds.) INDOCRYPT 2013. LNCS, vol. 8250, pp. 116–135. Springer, Heidelberg (2013). doi:10.1007/978-3-319-03515-4_8
5. Rivain, M., Roche, T.: SCARE of secret ciphers with SPN structures. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013. LNCS, vol. 8269, pp. 526–544. Springer, Heidelberg (2013). doi:10.1007/978-3-642-42033-7_27
6. Clavier, C.: An improved SCARE cryptanalysis against a secret A3/A8 GSM algorithm. In: McDaniel, P., Gupta, S.K. (eds.) ICISS 2007. LNCS, vol. 4812, pp. 143–155. Springer, Heidelberg (2007). doi:10.1007/978-3-540-77086-2_11
7. Novak, R.: Side-channel attack on substitution blocks. In: Zhou, J., Yung, M., Han, Y. (eds.) ACNS 2003. LNCS, vol. 2846, pp. 307–318. Springer, Heidelberg (2003). doi:10.1007/978-3-540-45203-4_24
8. Doget, J., Prouff, E., Rivain, M., Standaert, F.X.: Univariate side channel attacks and leakage modeling. J. Crypt. Eng. 1(2), 123–144 (2011)
9. Bhasin, S., Danger, J.L., Guilley, S., Najm, Z.: NICV: normalized inter-class variance for detection of side-channel leakage. In: 2014 International Symposium on Electromagnetic Compatibility, Tokyo (EMC 2014/Tokyo), pp. 310–313 (2014)
10. Wiki: Coefficient of determination. http://en.wikipedia.org/wiki/Coefficient_of_determination
11. Chari, S., Rao, J.R., Rohatgi, P.: Template attacks. In: Kaliski, B.S., Koç, K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg (2003). doi:10.1007/3-540-36400-5_3
12. Gérard, B., Standaert, F.X.: Unified and optimized linear collision attacks and their application in a non-profiled setting: extended version. J. Crypt. Eng. 3(1), 45–58 (2013)
13. Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005). doi:10.1007/11545262_3
14. Whitnall, C., Oswald, E., Standaert, F.-X.: The myth of generic DPA...and the magic of learning. In: Benaloh, J. (ed.) CT-RSA 2014. LNCS, vol. 8366, pp. 183–205. Springer, Heidelberg (2014). doi:10.1007/978-3-319-04852-9_10
15. Allen, M.P.: Understanding Regression Analysis. Springer Science & Business Media, New York (1997)
16. Frankl, P., Rödl, V.: Near perfect coverings in graphs and hypergraphs. Eur. J. Comb. 6(4), 317–326 (1985)
17. Gordon, D.: La Jolla Covering Repository. https://www.ccrwest.org/cover.html