Toward the True Random Cipher: On Expected Linear Probability Values for SPNs with Randomly Selected S-Boxes∗

Liam Keliher†    Henk Meijer‡    Stafford Tavares§

Abstract

A block cipher, which is an important cryptographic primitive, is a bijective mapping from {0, 1}^N to {0, 1}^N (N is called the block size), parameterized by a key. In the true random cipher, each key results in a distinct mapping, and every mapping is realized by some key—this is generally taken to be the ideal cipher model. We consider a fundamental block cipher architecture called a substitution-permutation network (SPN). Specifically, we investigate expected linear probability (ELP) values for SPNs, which are the basis for a powerful attack called linear cryptanalysis. We show that if the substitution components (s-boxes) of an SPN are randomly selected, then the expected value of any ELP entry converges to the corresponding value for the true random cipher as the number of encryption rounds is increased. This gives quantitative support to the claim that the SPN structure is a practical approximation of the true random cipher.

Keywords: cryptography, block ciphers, substitution-permutation networks, expected linear probability, true random cipher

1 Introduction

Block ciphers are an important class of cryptographic algorithms, often used for the efficient encryption of large volumes of information. They can serve as cryptographic primitives in larger security frameworks, for example, the systems used to conduct secure e-commerce over the Internet. A block cipher is a bijective mapping from {0, 1}^N to {0, 1}^N (N is called the block size), parameterized by a bitstring called a key, denoted k. Typically k is secret, known only to the communicating parties. Common block sizes are 64 and 128 bits. The input to a block cipher is called a plaintext, and the output is called a ciphertext.

The substitution-permutation network (SPN) [6, 8, 24] is a fundamental block cipher architecture based on Shannon's principles of confusion and diffusion [22]. These principles are implemented through substitution and linear transformation (LT), respectively. Recently, SPNs have been the focus of increased attention, due in part to the selection of the SPN Rijndael as the U.S. Government Advanced Encryption Standard (AES) [7].

∗ This work was funded in part by Communications and Information Technology Ontario (CITO), and by the Natural Sciences and Engineering Research Council (NSERC), Canada.
† Department of Mathematics and Computer Science, Mount Allison University, Canada.
‡ School of Computing, Queen's University, Canada.
§ Department of Electrical and Computer Engineering, Queen's University, Canada.

We consider SPNs in which the substitution components (s-boxes) are selected independently from the uniform distribution on the set of all possible s-boxes (the LT component remains fixed). The ideal cipher model is generally taken to be the true random cipher [19], in which each key value results in a distinct bijective mapping from {0, 1}^N to {0, 1}^N, and each such mapping is realized by some key. It is desirable to quantify aspects of a block cipher's behavior which approximate that of the true random cipher. In this work we consider expected linear probability (ELP) values, which are the basis for a powerful attack called linear cryptanalysis (LC). Our goal is to investigate the relationship between expected ELP values for an SPN with randomly selected s-boxes (where the outer expectation is over all SPNs generated by this random selection of s-boxes) and the corresponding values for the true random cipher.

The contributions of this work are twofold. First, we derive a general formula for the expected value of a fixed ELP entry over all SPNs with randomly selected s-boxes. This formula can be applied for any choice of the LT component of the SPN. Second, we compute this formula for an SPN with a practical block size and a specific well-known LT, and demonstrate that the resulting values converge, as the number of encryption rounds is increased, to what would be expected if the SPN were replaced by the true random cipher. We conjecture that this convergence can be shown analytically.

Conventions

The Hamming weight of a binary vector x is written wt(x). If Z is a random variable, E[Z] denotes the expected value of Z. We use #A to indicate the number of elements in the set A.

2 Substitution-Permutation Networks

An SPN encrypts a plaintext through a series of R simpler encryption steps called rounds. The input to round r (1 ≤ r ≤ R) is first bitwise XOR'd with an N-bit subkey, denoted k^r, which is typically derived from the key, k, via a separate key-scheduling algorithm. The substitution stage then partitions the resulting vector into M subblocks of size n (N = Mn), which become the inputs to a row of bijective n × n substitution boxes (s-boxes)—bijective mappings from {0, 1}^n to {0, 1}^n. Finally, the permutation stage applies an invertible linear transformation (LT) to the output of the s-boxes (classically, this was a bitwise permutation). Often the permutation stage is omitted from the last round, as its presence there adds no cryptographic strength. A final subkey, k^{R+1}, is XOR'd with the output of round R to form the ciphertext. Figure 1 depicts an example SPN with N = 16, M = n = 4, and R = 3.

We assume the most general situation for the key, namely, that k is an independent key [2]: a concatenation of (R + 1) subkeys chosen independently from the uniform distribution on {0, 1}^N—symbolically, k = ⟨k^1, k^2, . . . , k^{R+1}⟩, i.e., k ∈ {0, 1}^{N(R+1)}.

Definition 2.1. Let K denote the set of all independent keys. From the above, K has the uniform distribution.
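The round structure described above is easy to prototype. The following is a minimal sketch (all parameter and component choices are hypothetical; in particular, the bit rotation used as the LT is an arbitrary invertible stand-in, not a diffusion layer from the paper) of a three-round SPN with randomly selected s-boxes and an independent key:

```python
import random

N, M, n, R = 16, 4, 4, 3          # block size, s-boxes per round, s-box width, rounds
rng = random.Random(1)

# One row of M randomly selected bijective n x n s-boxes per round
sboxes = [[rng.sample(range(2**n), 2**n) for _ in range(M)] for _ in range(R)]
# Independent key: R + 1 subkeys drawn uniformly from {0,1}^N
subkeys = [rng.randrange(2**N) for _ in range(R + 1)]

def lt(x):
    # Placeholder invertible LT: rotate the 16-bit state left by 5 bits
    # (a toy stand-in for a real diffusion layer)
    return ((x << 5) | (x >> (N - 5))) & (2**N - 1)

def encrypt(pt):
    x = pt
    for r in range(R):
        x ^= subkeys[r]                                # subkey mixing
        y = 0
        for i in range(M):                             # substitution stage
            block = (x >> (n * (M - 1 - i))) & (2**n - 1)
            y |= sboxes[r][i][block] << (n * (M - 1 - i))
        x = lt(y) if r < R - 1 else y                  # LT omitted from last round
    return x ^ subkeys[R]                              # final subkey

ct = encrypt(0x1234)                                   # some 16-bit ciphertext
```

Since every component (subkey XOR, bijective s-boxes, invertible LT) is a bijection, the whole mapping is a permutation of {0, 1}^16.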

[Figure 1 shows the plaintext entering round 1, each round consisting of subkey XOR (k^1, k^2, k^3), a row of s-boxes, and an invertible linear transformation (omitted in round 3), followed by XOR with k^4 to produce the ciphertext.]

Figure 1: SPN with N = 16, M = n = 4, R = 3

3 Linear Probability

Definition 3.1. Suppose B : {0, 1}^d → {0, 1}^d is a bijective mapping. Let a, b ∈ {0, 1}^d be fixed, and let X ∈ {0, 1}^d be a uniformly distributed random variable. The linear probability LP(a, b) is defined as

    LP(a, b) = (2 · Prob_X{a • X = b • B(X)} − 1)^2 ,    (1)

where • denotes the inner product over GF(2). If B is parameterized by a key, k, we write LP(a, b; k), and the expected LP (ELP) is defined as

    ELP(a, b) = E[LP(a, b; K)] ,

where K is a random variable uniformly distributed over the relevant space of keys.

Note that LP values lie in the interval [0, 1] (hence the term "probability," although technically these are not probability values). A nonzero LP value signifies a correlation between the input and output of B, with a higher value indicating a stronger correlation (in fact, LP(a, b) is the square of the entry [a, b] in the correlation matrix for B [5]). The values a/b in Definition 3.1 are referred to as input/output masks. For our purposes, the bijective mapping B may be an s-box, a single encryption round, or a sequence of consecutive encryption rounds. The following lemma derives immediately from Parseval's Theorem [18].
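Equation (1) can be evaluated directly by exhausting all inputs X. A minimal sketch (the 4-bit s-box below is an arbitrary illustrative permutation, not one from the paper):

```python
def lp(sbox, a, b, n):
    """LP(a, b) of an n-bit bijection per equation (1), by exhausting all inputs."""
    matches = sum(1 for x in range(2**n)
                  if bin(a & x).count("1") % 2 == bin(b & sbox[x]).count("1") % 2)
    return (2 * matches / 2**n - 1) ** 2

# The identity mapping correlates every mask with itself perfectly: LP(a, a) = 1
identity = list(range(16))
print(lp(identity, 0b1011, 0b1011, 4))   # 1.0

# An arbitrary 4-bit permutation; rows of its LP table sum to 1 (Parseval)
sbox = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
        0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]
row_sum = sum(lp(sbox, 0b0001, b, 4) for b in range(16))
```

The row-sum check is exactly the identity stated in the lemma that follows.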

Lemma 3.2. Let B : {0, 1}^d → {0, 1}^d be a bijective mapping parameterized by a key, k, and let a, b ∈ {0, 1}^d. Then

    Σ_{x∈{0,1}^d} LP(a, x; k) = Σ_{x∈{0,1}^d} LP(x, b; k) = 1

and

    Σ_{x∈{0,1}^d} ELP(a, x) = Σ_{x∈{0,1}^d} ELP(x, b) = 1 .

4 Linear Cryptanalysis of Markov Ciphers

Linear cryptanalysis (LC) is an attack on block ciphers which was introduced by Matsui in 1993 [16]. It is a known-plaintext attack, which means that the attacker requires a number of plaintexts and their corresponding ciphertexts in order to proceed (the method by which these are obtained is outside the scope of this work). It will be useful to consider LC in the most general context, namely that of Markov ciphers [15].

4.1 Markov Ciphers

Let E : {0, 1}^N → {0, 1}^N be an R-round cipher, for which round r is given by the function y = ε_r(x; k^r) (x ∈ {0, 1}^N is the round input, and k^r ∈ {0, 1}^N is the round-r subkey). Then E is a Markov cipher with respect to the XOR group operation (⊕) on {0, 1}^N if, for 1 ≤ r ≤ R, and any x, ∆x, ∆y ∈ {0, 1}^N,

    Prob_K{ε_r(x; K) ⊕ ε_r(x ⊕ ∆x; K) = ∆y} = Prob_{K,X}{ε_r(X; K) ⊕ ε_r(X ⊕ ∆x; K) = ∆y} ,    (2)

where X and K are independent random variables uniformly distributed over {0, 1}^N and K (the set of all independent keys), respectively. That is, the probability over the key that a fixed input difference produces a fixed output difference is independent of the round input. Note that it is easy to show that an SPN with fixed s-boxes is a Markov cipher.

Remark 4.1. The term "Markov cipher" derives from the following. Consider an R-round cipher for which (2) holds. Let ∆y ∈ {0, 1}^N \ 0 be fixed, let Y_0 ∈ {0, 1}^N be a uniformly distributed random variable, and let Y_0* = Y_0 ⊕ ∆y. If we define Y_r (resp. Y_r*) to be the random variable representing the result of the encryption of Y_0 (resp. Y_0*) through rounds 1 . . . r (1 ≤ r ≤ R), and if Z_i = Y_i ⊕ Y_i*, then Z_0, Z_1, . . . , Z_R is a Markov chain, where probabilities are computed over Y_0 and over the uniform distribution of independent keys [15].

Remark 4.2. The material in the remainder of Section 4 applies to any Markov cipher. Although we are dealing with LC, which does not involve XOR differences between encryption inputs or outputs (unlike differential cryptanalysis [3], which is considered the dual of LC), the relevance of the Markov property given in (2) is via an interesting connection between linear probability and differential probability (see, for example, equations (3) and (4) in [23]).
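For a keyed round of the form ε(x; k) = S(x ⊕ k), property (2) holds exactly, since averaging over the subkey decouples the difference behavior from the round input. A small exhaustive check, using an arbitrary 4-bit s-box as a one-subblock "round" (all concrete choices here are illustrative):

```python
SIZE = 16
S = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
     0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]   # arbitrary 4-bit s-box

def round_fn(x, k):
    return S[x ^ k]          # one keyed "round": subkey XOR, then substitution

def diff_prob(x, dx, dy):
    # Probability over the subkey that input difference dx gives output difference dy
    hits = sum(1 for k in range(SIZE)
               if round_fn(x, k) ^ round_fn(x ^ dx, k) == dy)
    return hits / SIZE

dx, dy = 0b0011, 0b1010
probs = {diff_prob(x, dx, dy) for x in range(SIZE)}
# The probability is the same for every round input x, as property (2) requires
assert len(probs) == 1
```

The underlying reason is the change of variables u = x ⊕ k: as k ranges uniformly, u is uniform regardless of x.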

4.2 Linear Cryptanalysis

Matsui [16] introduced two versions of linear cryptanalysis; the more powerful version is known as Algorithm 2 (Algorithm 1 yields only a single subkey bit). Algorithm 2 can be used to extract (pieces of) the round-1 subkey, k^1. Once k^1 is known, round 1 can be stripped off, and LC can be reapplied to obtain k^2, and so on. To carry out Algorithm 2, the attacker seeks input/output masks a, b ∈ {0, 1}^N for the keyed bijective mapping consisting of rounds 2 . . . R, for which LP(a, b; k) is maximal. Given these values, the attack proceeds as in Figure 2.

  1. Obtain N_L known ⟨plaintext, ciphertext⟩ pairs: ⟨p_1, c_1⟩, ⟨p_2, c_2⟩, . . . , ⟨p_{N_L}, c_{N_L}⟩
  2. Guess k^1 = k̂. Encrypt each p_i through round 1 to obtain x_i
  3. If a • x_i = b • c_i, then increment counter µ(k̂)
  4. Choose the k̂ which maximizes (2 · µ(k̂) − N_L)^2

Figure 2: Summary of linear cryptanalysis (Algorithm 2)

The probability that Algorithm 2 will determine the correct value of k^1 increases as the number of known ⟨plaintext, ciphertext⟩ pairs, N_L, is increased. The value N_L is called the data complexity of the attack—this is what the attacker wants to minimize. Given an assumption about the behavior of round-1 output [16], Matsui shows that if

    N_L = c / LP(a, b; k) ,

then Algorithm 2 has the success rates in Table 1, for various values of the constant c. Note that this is the same as Table 3 in [16], except that the constant values differ by a factor of 4, since Matsui uses bias values, not LP values. (The corresponding table in [12] has an error, in that the constants have not been multiplied by 4 to reflect the use of LP values.)
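Why the counter statistic of Figure 2 works can be seen on a toy two-round cipher with single-s-box (4-bit) rounds. The sketch below (s-box, subkeys, and structure are all hypothetical illustrative choices) checks that, with the full codebook available, the normalized statistic (2µ(k̂) − N_L)²/N_L² for the correct round-1 guess equals the LP of the approximated round exactly, since subkey XORs only flip signs that vanish upon squaring:

```python
SIZE = 16
S = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
     0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]    # arbitrary 4-bit s-box

def parity(x):
    return bin(x).count("1") & 1

def lp(a, b):
    m = sum(1 for x in range(SIZE) if parity(a & x) == parity(b & S[x]))
    return (2 * m / SIZE - 1) ** 2

# Best masks for the round being approximated (round 2)
a, b = max(((u, v) for u in range(1, SIZE) for v in range(1, SIZE)),
           key=lambda m: lp(*m))

k1, k2, k3 = 0xB, 0x3, 0x6                       # hypothetical secret subkeys
def encrypt(p):
    return S[S[p ^ k1] ^ k2] ^ k3

pairs = [(p, encrypt(p)) for p in range(SIZE)]   # full codebook as known pairs
NL = len(pairs)

def counter(kh):                                 # the counter mu(k-hat) of Figure 2
    return sum(1 for p, c in pairs
               if parity(a & S[p ^ kh]) == parity(b & c))

stat = (2 * counter(k1) - NL) ** 2 / NL ** 2
assert abs(stat - lp(a, b)) < 1e-12
```

In a real attack N_L is far smaller than the codebook, and the statistic is only close to the LP value on average, which is where the success-rate table below comes in.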

    c             8       16      32      64
    Success rate  48.6%   78.5%   96.7%   99.9%

Table 1: Success rates for LC Algorithm 2

4.2.1 Notational Issues

Above, we have discussed input and output masks and the associated LP values for rounds 2 . . . R of an R-round cipher. It is useful to consider these and other related concepts as applying to any T ≥ 2 consecutive "core" rounds (we say that these are the rounds being approximated). For Algorithm 2 as outlined above, T = R − 1, and so what we will call the "first round," or "round 1," is actually round 2 of the cipher. We use superscripts when we are dealing with individual rounds, so LP^t(a, b; k^t) and ELP^t(a, b) are LP and ELP values, respectively, for round t (1 ≤ t ≤ T). On the other hand, we use t as a subscript to refer to values which apply to the first t rounds as a unit, so, for example, ELP_t(a, b) is an ELP value over rounds 1 . . . t. Note that whenever we view rounds 1 . . . t as a unit, we will assume that the LT is omitted from the last (t-th) round—this simplifies certain arguments.

4.3 Linear Characteristics

For fixed a, b ∈ {0, 1}^N, direct computation of LP_T(a, b; k) for T core rounds is generally infeasible, first because it requires encrypting all N-bit vectors through rounds 1 . . . T, and second because of the dependence on an unknown key. The latter difficulty is usually handled by working instead with the expected value ELP_T(a, b). The data complexity of Algorithm 2 for masks a and b is now taken to be

    N_L = c / ELP_T(a, b) .    (3)

The implicit assumption is that LP_T(a, b; k) is approximately equal to ELP_T(a, b) for almost all values of k (this derives from the Hypothesis of Stochastic Equivalence in [15]).

The problem of computational complexity is usually treated by approximating ELP_T(a, b) through the use of linear characteristics (or simply characteristics). A T-round characteristic is a (T + 1)-tuple Ω = ⟨a^1, a^2, . . . , a^T, a^{T+1}⟩, where each a^t ∈ {0, 1}^N. We view a^t and a^{t+1} as input and output masks, respectively, for round t.

Definition 4.3. Let Ω = ⟨a^1, a^2, . . . , a^T, a^{T+1}⟩ be a T-round characteristic. The linear characteristic probability (LCP) and expected LCP (ELCP) of Ω are defined as

    LCP(Ω; k) = Π_{t=1}^{T} LP^t(a^t, a^{t+1}; k^t)

    ELCP(Ω) = Π_{t=1}^{T} ELP^t(a^t, a^{t+1}) .

4.4 Choosing the Best Characteristic

In carrying out LC, the attacker typically runs an algorithm to find the T-round characteristic, Ω, for which ELCP(Ω) is maximal; such a characteristic (not necessarily unique) is called the best characteristic [17]. If Ω = ⟨a^1, a^2, . . . , a^T, a^{T+1}⟩, and if the input and output masks used in Algorithm 2 are taken to be a = a^1 and b = a^{T+1}, respectively, then ELP_T(a, b) (used to determine N_L in (3)) is approximated by

    ELP_T(a, b) ≈ ELCP(Ω) .    (4)

The approximation in (4) has been widely used to evaluate the security of block ciphers against LC [8, 11]. Knudsen calls a block cipher practically secure if the data complexity determined by this method is prohibitive [14]. However, by introducing the concept of linear hulls, Nyberg demonstrated that the above approach can underestimate the success of LC [20].

4.5 Linear Hulls

Definition 4.4 (Nyberg [20]). Given masks a, b ∈ {0, 1}^N, the corresponding linear hull, denoted ALH(a, b), is the set of all T-round characteristics (for the T rounds under consideration) having a as the input mask for round 1 and b as the output mask for round T, i.e., all characteristics of the form

    Ω = ⟨a, a^2, a^3, . . . , a^T, b⟩ .

Remark 4.5. Nyberg used the term approximate linear hull, hence the abbreviation ALH.

Theorem 4.6 (Nyberg [20]). Let a, b ∈ {0, 1}^N. Then

    ELP_T(a, b) = Σ_{Ω∈ALH(a,b)} ELCP(Ω) .    (5)

Remark 4.7. It follows immediately from Theorem 4.6 that (4) does not hold in general, since ELP_T(a, b) is seen to be equal to a sum of terms ELCP(Ω) over a set of characteristics, and therefore, in general, the ELCP of any characteristic will be strictly less than the corresponding ELP value. This is referred to as the linear hull effect. An important consequence is that an attacker may overestimate the number of ⟨plaintext, ciphertext⟩ pairs required for a given success rate.
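For a key-alternating construction with independent round keys, the hull sum in (5) telescopes: summing the product of per-round ELP values over all intermediate masks is exactly a matrix-product entry of the per-round ELP matrices. A toy two-round check over 4 bits (s-boxes and masks are arbitrary illustrative choices; the averaging over the round-2 subkey is what kills the cross terms):

```python
SIZE = 16    # d = 4

def parity(x):
    return bin(x).count("1") & 1

def lp(f, a, b):
    m = sum(1 for x in range(SIZE) if parity(a & x) == parity(b & f[x]))
    return (2 * m / SIZE - 1) ** 2

# Two arbitrary 4-bit s-boxes standing in for two key-alternating rounds
S1 = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE, 0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]
S2 = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8, 0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

a, b = 0b1001, 0b0111          # arbitrary nonzero outer masks

# Right-hand side of (5): one ELCP term per intermediate mask c,
# i.e. a matrix-product entry of the per-round ELP matrices
hull = sum(lp(S1, a, c) * lp(S2, c, b) for c in range(SIZE))

# Left-hand side: average LP of the two-round composition over the round-2 subkey
elp2 = sum(lp([S2[S1[x] ^ k2] for x in range(SIZE)], a, b)
           for k2 in range(SIZE)) / SIZE

assert abs(elp2 - hull) < 1e-12
```

Note that the single best term lp(S1, a, c)·lp(S2, c, b) is in general strictly smaller than the full sum, which is the linear hull effect of Remark 4.7 in miniature.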

4.6 Maximum Average Linear Hull Probability

An SPN is considered to be provably secure against LC if the maximum ELP,

    max_{a,b∈{0,1}^N\0} ELP_T(a, b) ,    (6)

is sufficiently small that the resulting data complexity is prohibitive for any conceivable attacker. (For Algorithm 2 as described above, this must hold for T = R − 1. Since variations of LC can be used to attack the first and last SPN rounds simultaneously, it may also be important that the data complexity remain prohibitive for T = R − 2.) The value in (6) is also called the maximum average linear hull probability (MALHP) [12].

5 SPN-Specific Considerations

In the current section, we present results related to LC which are specific to the SPN structure. Note that where matrix multiplication is involved, we view all vectors as column vectors. Also, if M is a matrix, M′ denotes the transpose of M.

Lemma 5.1. Consider T core SPN rounds. Let 1 ≤ t ≤ T, and a, b, k^t ∈ {0, 1}^N. Then LP^t(a, b; k^t) is independent of k^t, and therefore

    LP^t(a, b; k^t) = ELP^t(a, b) .

Proof. Follows from the change of variables x̂ = x ⊕ k^t when evaluating (1).

Corollary 5.2. Let Ω be a T-round characteristic for an SPN. Then LCP(Ω) = ELCP(Ω).

Definition 5.3. Let L denote the N-bit LT of the SPN represented as a binary N × N matrix, i.e., if x, y ∈ {0, 1}^N are the input and output, respectively, for the LT, then y = Lx.

Lemma 5.4 ([5]). If b ∈ {0, 1}^N and a = L′b, then a • x = b • y for all N-bit inputs, x, to the LT, and corresponding outputs, y (i.e., if b is an output mask for the LT, then a = L′b is the (unique) corresponding input mask).

It follows from Lemma 5.4 that if a^t and a^{t+1} are input and output masks for round t, respectively, then the resulting input and output masks for the substitution stage of round t are a^t and b^t = L′a^{t+1}, respectively. Further, a^t and b^t determine input and output masks for each s-box in round t. Let the masks for S_i^t be denoted a_i^t and b_i^t, for 1 ≤ i ≤ M (we number s-boxes from left to right). Then from Matsui's Piling-up Lemma [16] and Lemma 5.1,

    ELP^t(a^t, a^{t+1}) = Π_{i=1}^{M} LP^{S_i^t}(a_i^t, b_i^t) ,    (7)

where LP^{S_i^t}(·) denotes an LP value for S_i^t.

From the above, any characteristic Ω ∈ ALH(a, b) determines an input and an output mask for each s-box in rounds 1 . . . T. If this yields at least one s-box for which the input mask is zero and the output mask is nonzero, or vice versa, the linear probability associated with that s-box will trivially be 0, and therefore ELCP(Ω) = 0 by (7) and Definition 4.3. In this case, Ω does not contribute to the right-hand side of (5) in Theorem 4.6. We exclude such characteristics from consideration via the following definition.

Definition 5.5. For a, b ∈ {0, 1}^N, let ALH(a, b)* consist of the elements Ω ∈ ALH(a, b) such that for each s-box in rounds 1 . . . T, the input and output masks determined by Ω for that s-box are either both zero or both nonzero.

Definition 5.6 ([2]). Any T-round characteristic, Ω, determines an input and an output mask for each s-box in rounds 1 . . . T. Those s-boxes having nonzero input and output masks are called active.

Definition 5.7. Let v be an input or an output mask for the substitution stage of round t. Then the active s-boxes in round t can be determined from v (without knowing the corresponding output or input mask, respectively). We define γ_v to be the M-bit vector which encodes the pattern of active s-boxes: γ_v = γ_1 γ_2 . . . γ_M, where γ_i = 1 if the i-th s-box is active, and γ_i = 0 otherwise, for 1 ≤ i ≤ M.

Definition 5.8 ([12]). Let γ, γ̂ ∈ {0, 1}^M. Then

    W[γ, γ̂] = #{ y ∈ {0, 1}^N : γ_x = γ, γ_y = γ̂, where x = L′y } .

Remark 5.9. Informally, the value W[γ, γ̂] represents the number of ways the LT can "connect" a pattern of active s-boxes in one round (γ) to a pattern of active s-boxes in the next round (γ̂).

Remark 5.10. It is easy to see that W[0, 0] = 1, and if γ ∈ {0, 1}^M \ 0, then W[γ, 0] = W[0, γ] = 0.
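Definition 5.8 can be evaluated by exhaustion when the block size is tiny. A sketch with two 2-bit subblocks and a toy bit-permutation LT (the particular matrix is an arbitrary illustrative choice, not the paper's LT):

```python
from itertools import product

M = n = 2
N = 4                         # two 2-bit subblocks

# Toy invertible LT over GF(2): the bit permutation swapping the two middle bits
L = [[1, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]

def mul(A, v):                # matrix-vector product over GF(2)
    return tuple(sum(A[i][j] & v[j] for j in range(N)) & 1 for i in range(N))

Lt = [[L[j][i] for j in range(N)] for i in range(N)]     # transpose L'

def gamma(v):                 # pattern of active s-boxes (Definition 5.7)
    return tuple(1 if any(v[i*n:(i+1)*n]) else 0 for i in range(M))

def W(g, ghat):               # Definition 5.8, by exhausting all N-bit vectors y
    return sum(1 for y in product((0, 1), repeat=N)
               if gamma(mul(Lt, y)) == g and gamma(y) == ghat)

assert W((0, 0), (0, 0)) == 1                        # Remark 5.10
assert W((1, 0), (0, 0)) == 0 == W((0, 0), (1, 0))   # Remark 5.10
```

Since each y contributes to exactly one (γ, γ̂) pair, the W entries always sum to 2^N, a cheap consistency check.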

6 Expected ELP Values over all SPNs

Nyberg's result (Theorem 4.6) gives an exact expression for an expected linear probability value (over all keys), for an SPN with fixed s-boxes. However, because it involves a sum over a (generally) very large set of characteristics, this value appears to be difficult to compute exactly (except for extremely small "toy" examples). By adopting an SPN model in which the s-boxes are selected uniformly and independently from the set of all bijective n × n s-boxes, we are able to obtain an exact expression for the expected ELP value, where the outer expectation is over all SPNs generated by this random selection of s-boxes. Further, as we will see in Section 7, it is feasible to compute this expression for certain SPNs with large block sizes. The main result of this section is Theorem 6.12. In what follows, we assume that n, N, and T ≥ 2 are fixed, and that the LT is fixed.

Definition 6.1. Let SPN denote the set of SPNs generated by selecting each s-box uniformly and independently from the set of all bijective n × n s-boxes.

Remark 6.2. It follows that all SPNs in SPN are equally probable, i.e., SPN has the uniform distribution.

6.1 Distribution of LP Values for Randomly Selected S-boxes

Given nonzero input and output masks for an active s-box, S, if S varies over the set of all bijective n × n s-boxes, the corresponding LP value will also vary. The resulting distribution is given in the following lemma.

Lemma 6.3 ([21, 25]). Let S be a bijective n × n s-box (n ≥ 2), and let α, β ∈ {0, 1}^n \ 0 be fixed. If S varies uniformly over the set of all bijective n × n s-boxes, then the resulting distribution of values LP(α, β) is given by the following ordered pairs of the form ⟨LP, probability⟩, where C(m, j) denotes the binomial coefficient:

    ⟨ 0, C(2^{n−1}, 2^{n−2})^2 / C(2^n, 2^{n−1}) ⟩  ∪  { ⟨ q^2 / 2^{2n−4}, 2 · C(2^{n−1}, 2^{n−2}+q)^2 / C(2^n, 2^{n−1}) ⟩ : 1 ≤ q ≤ 2^{n−2} } .

The distribution of Lemma 6.3 is plotted in Figure 3 for n = 8 (with a log10 scale on the y-axis).
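For n = 2 the lemma can be checked exhaustively over all 4! = 24 bijective 2-bit s-boxes (the mask choice below is arbitrary):

```python
from itertools import permutations
from math import comb

SIZE = 4                      # n = 2
alpha = beta = 0b01           # arbitrary fixed nonzero masks

def parity(x):
    return bin(x).count("1") & 1

def lp(sbox):
    m = sum(1 for x in range(SIZE) if parity(alpha & x) == parity(beta & sbox[x]))
    return (2 * m / SIZE - 1) ** 2

# Exact distribution of LP(alpha, beta) over all bijective 2-bit s-boxes
counts = {}
for sbox in permutations(range(SIZE)):
    v = lp(sbox)
    counts[v] = counts.get(v, 0) + 1
dist = {v: c / 24 for v, c in counts.items()}

# Lemma 6.3 with n = 2: LP = 0 with probability C(2,1)^2/C(4,2) = 2/3,
# and LP = q^2/2^0 = 1 (q = 1) with probability 2*C(2,2)^2/C(4,2) = 1/3
assert abs(dist[0.0] - comb(2, 1)**2 / comb(4, 2)) < 1e-12
assert abs(dist[1.0] - 2 * comb(2, 2)**2 / comb(4, 2)) < 1e-12
```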

The next two lemmas, due to the authors, give the expected LP value over all s-boxes for one active s-box, and for multiple active s-boxes (i.e., for a characteristic), respectively.

Lemma 6.4. Let Z be a random variable which has the distribution of Lemma 6.3. Then

    E[Z] = 1 / (2^n − 1) .

[Figure 3 omitted: plot of the distribution of Lemma 6.3 for n = 8, with LP value on the x-axis and log10(probability) on the y-axis.]

Figure 3: Distribution of LP values for randomly selected 8 × 8 s-box

Proof. The distribution of Lemma 6.3 consists essentially of the squares of the terms in the hypergeometric distribution [13]. The result follows from the second moment of this distribution.

Lemma 6.5. Let a, b ∈ {0, 1}^N \ 0 be fixed, and let Ω ∈ ALH(a, b)*. Let A(Ω) be the number of s-boxes made active by Ω. Let Z be a random variable taking the value ELCP(Ω) for each SPN in SPN. Then

    E[Z] = (1 / (2^n − 1))^{A(Ω)} .

Proof. Enumerate the s-boxes made active by Ω as S_1, S_2, . . . , S_{A(Ω)} (order is unimportant). It follows that Ω determines nonzero input and output masks for each S_s (1 ≤ s ≤ A(Ω)). Let Z_s be a random variable representing the LP value for S_s. The distribution of Z_s is given by Lemma 6.3 (insofar as Z_s is concerned, varying uniformly over SPN is the same as varying uniformly over all choices of bijective n × n s-boxes for S_s), and from Lemma 6.4 we have

    E[Z_s] = 1 / (2^n − 1) .

Since we vary uniformly over SPN, the Z_s are independent. From Definition 4.3 and (7), we have Z = Z_1 · Z_2 · · · Z_{A(Ω)}, and from the independence of the Z_s,

    E[Z] = E[Z_1] · E[Z_2] · · · E[Z_{A(Ω)}] = (1 / (2^n − 1))^{A(Ω)} .
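Lemma 6.4 can also be checked by Monte Carlo sampling for a larger s-box width. The sketch below (masks, seed, and trial count are arbitrary choices) averages LP over random bijective 4-bit s-boxes and compares against 1/(2^4 − 1) = 1/15:

```python
import random

n = 4
SIZE = 2**n
rng = random.Random(2024)
alpha, beta = 0b1010, 0b0110      # arbitrary fixed nonzero masks

def parity(x):
    return bin(x).count("1") & 1

def lp(sbox):
    m = sum(1 for x in range(SIZE) if parity(alpha & x) == parity(beta & sbox[x]))
    return (2 * m / SIZE - 1) ** 2

trials = 5000
mean = sum(lp(rng.sample(range(SIZE), SIZE)) for _ in range(trials)) / trials

# Lemma 6.4: the expected LP over random bijective 4-bit s-boxes is 1/15, about 0.0667;
# the sample mean should land well within 0.01 of it for this many trials
assert abs(mean - 1 / (2**n - 1)) < 0.01
```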

6.2 Counting Characteristics

The elements of ALH(a, b)* can be grouped according to the number of s-boxes they make active.

Definition 6.6. For fixed masks a, b ∈ {0, 1}^N, let C_{a,b}[A] denote the number of characteristics in ALH(a, b)* which make A s-boxes active.

Remark 6.7. If Ω ∈ ALH(a, b)*, then the s-boxes which Ω makes active in rounds 1 and T are given by γ_a and γ_b, respectively (recall Definition 5.7), and therefore are the same for all Ω ∈ ALH(a, b)*.

Remark 6.8. If a and b are both nonzero, and if Ω ∈ ALH(a, b)*, then Ω activates at least one s-box in each of rounds 2 . . . (T − 1).

In what follows we will assume that a and b are nonzero (so that Remark 6.8 applies). Clearly the numbers of s-boxes made active in round 1 and round T are wt(γ_a) and wt(γ_b), respectively. If A(Ω) is the total number of s-boxes made active by Ω ∈ ALH(a, b)*, then A(Ω) is minimized by an Ω which activates one s-box in each of rounds 2 . . . (T − 1), and is maximized by an Ω which activates all s-boxes in rounds 2 . . . (T − 1). Therefore, we are interested in the values C_{a,b}[A] for A_min ≤ A ≤ A_max, where

    A_min = wt(γ_a) + wt(γ_b) + (T − 2)
    A_max = wt(γ_a) + wt(γ_b) + M(T − 2) .

6.2.1 Recursive Formulation for C_{a,b}[A]

We can think of "constructing" a characteristic in ALH(a, b)* in a round-by-round fashion. In each round t (2 ≤ t ≤ T − 1), we select the s-boxes to be made active, ensuring that we are able to "connect" these active s-boxes to the active s-boxes in the previous round (see Definition 5.8 and Remark 5.9), i.e., that the relevant W[ ] entry is nonzero. For t = T − 1, we also need to be able to connect to the active s-boxes in round T. This suggests a recursive formulation for C_{a,b}[A].

Definition 6.9. For t ≥ 2 and γ, γ̂ ∈ {0, 1}^M, let D[t, A, γ, γ̂] denote the number of characteristics over rounds 1 . . . t which make a total of A s-boxes active, with the restriction that all the characteristics have the same input mask for round 1, which makes active the pattern of s-boxes given by γ, and the same output mask for round t, which makes active the pattern of s-boxes given by γ̂.

Remark 6.10. Although we refer to fixed input and output masks in Definition 6.9, they need not be specified, as the number of characteristics denoted by D[t, A, γ, γ̂] depends only on γ and γ̂.

It follows from the above that

    C_{a,b}[A] = D[T, A, γ_a, γ_b] .

Theorem 6.11. Let t ≥ 2, γ, γ̂ ∈ {0, 1}^M, and A_min ≤ A ≤ A_max, where

    A_min = wt(γ) + wt(γ̂) + (t − 2)
    A_max = wt(γ) + wt(γ̂) + M(t − 2) .

If t = 2, then

    D[t, A, γ, γ̂] = W[γ, γ̂] if A = wt(γ) + wt(γ̂), and 0 otherwise.

If 3 ≤ t ≤ T, then

    D[t, A, γ, γ̂] = Σ_{w=w_min}^{w_max} Σ_{γ̄∈{0,1}^M : wt(γ̄)=w} W[γ̄, γ̂] · D[t − 1, A − wt(γ̂), γ, γ̄] ,    (8)

where

    w_min = max{1, A − wt(γ) − wt(γ̂) − M(t − 3)}
    w_max = min{M, A − wt(γ) − wt(γ̂) − (t − 3)} .

Proof. If t = 2, then A_min = A_max = wt(γ) + wt(γ̂), and clearly D[2, wt(γ) + wt(γ̂), γ, γ̂] is equal to the number of ways the active s-boxes in round 1 can be connected to the active s-boxes in round 2, that is,

    D[2, wt(γ) + wt(γ̂), γ, γ̂] = W[γ, γ̂] .

It follows that if A ≠ wt(γ) + wt(γ̂), then D[2, A, γ, γ̂] = 0.

If 3 ≤ t ≤ T, then each of the t-round characteristics counted by D[t, A, γ, γ̂] can be viewed as consisting of a (t − 1)-round characteristic for rounds 1 . . . (t − 1) which makes (A − wt(γ̂)) s-boxes active, concatenated with a final mask which activates s-boxes in round t according to the pattern γ̂. If γ̄ is the pattern of active s-boxes in round (t − 1), then D[t − 1, A − wt(γ̂), γ, γ̄] is the number of such (t − 1)-round characteristics. It follows that D[t, A, γ, γ̂] is given by a sum of terms of the form

    W[γ̄, γ̂] · D[t − 1, A − wt(γ̂), γ, γ̄] ,

where the summation is over all γ̄ ∈ {0, 1}^M, with certain restrictions on the value wt(γ̄). Since at least one s-box must be active in each of rounds 2 . . . (t − 1) (Remark 6.8), we have wt(γ̄) ≥ 1. Now (A − wt(γ) − wt(γ̂)) active s-boxes must be "distributed" among rounds 2 . . . (t − 1). If A is sufficiently large, then all the s-boxes in rounds 2 . . . (t − 2) can be made active, forcing the remaining (A − wt(γ) − wt(γ̂) − M(t − 3)) active s-boxes to be located in round (t − 1). It follows that

    wt(γ̄) ≥ w_min = max{1, A − wt(γ) − wt(γ̂) − M(t − 3)} .

A similar argument shows that w_max is the correct upper bound for the outer summation in (8).

6.3 Main Result

We now state and prove our main result. Note that since we are dealing with an expectation over SPN, we augment our notation with subscripts as appropriate to indicate when there is a dependence on the underlying SPN. We use spn to denote a fixed value in SPN, and SPN to denote a random variable over SPN.

Theorem 6.12. Let T ≥ 2, and a, b ∈ {0, 1}^N \ 0. If

    A_min = wt(γ_a) + wt(γ_b) + (T − 2)
    A_max = wt(γ_a) + wt(γ_b) + M(T − 2) ,

then

    E_SPN[ELP(a, b)] = Σ_{A=A_min}^{A_max} C_{a,b}[A] · (1 / (2^n − 1))^A .

Proof. We make use of Theorem 4.6, with ALH(a, b)* in place of ALH(a, b), based on the discussion preceding Definition 5.5. For Ω ∈ ALH(a, b)*, let A(Ω) be the number of s-boxes made active by Ω. Then

    E_SPN[ELP(a, b)] = (1 / #SPN) Σ_{spn∈SPN} ELP_spn(a, b)
                     = (1 / #SPN) Σ_{spn∈SPN} Σ_{Ω∈ALH(a,b)*} ELCP_spn(Ω)    (9)
                     = Σ_{Ω∈ALH(a,b)*} [ (1 / #SPN) Σ_{spn∈SPN} ELCP_spn(Ω) ]
                     = Σ_{Ω∈ALH(a,b)*} (1 / (2^n − 1))^{A(Ω)}    (10)
                     = Σ_{A=A_min}^{A_max} C_{a,b}[A] · (1 / (2^n − 1))^A ,    (11)

where (9) follows from Theorem 4.6, (10) follows from Lemma 6.5, and (11) is obtained by grouping characteristics in ALH(a, b)* according to the numbers of s-boxes they make active.

Remark 6.13. We can rewrite E_SPN[ELP(a, b)] as E_{SPN,K}[LP(a, b; K)], the expectation of LP(a, b; K) over SPN and K. It is not hard to show that for each selection of s-boxes and each independent key, there is a selection of s-boxes which, together with the all-zero key (or, in general, any fixed key), produces an equivalent SPN (i.e., the same mapping from plaintexts to ciphertexts). Moreover, varying uniformly over SPN and K is equivalent to fixing the all-zero key and varying uniformly over SPN. Therefore

    E_SPN[ELP(a, b)] = E_SPN[LP(a, b; 0)] .

We did not need this fact in the proof of Theorem 6.12, but it will be useful below.

7 Example SPN Structure

In this section we apply the theory of the preceding sections to a specific SPN structure. We consider SPNs in which M = n (therefore N = n^2; hence these are called square SPNs), and in which the LT is the permutation of Kam and Davida [10]. This permutation connects output bit j of s-box i in round t to input bit i of s-box j in round (t + 1) (numbering proceeds from left to right, beginning at 1). Figure 4 gives an example of such an SPN for the parameters M = n = 4 (N = 16) and R = 3.

[Figure 4 omitted: three-round SPN on a 16-bit plaintext, with subkey XORs k^1 . . . k^4, rows of s-boxes, and the Kam-Davida bit permutation between rounds.]

Figure 4: SPN with M = n = 4 (N = 16), R = 3, and permutation of Kam and Davida

7.1 Evaluating C_{a,b}[A]

The main task in computing E_SPN[ELP(a, b)] using Theorem 6.12 is evaluating the terms C_{a,b}[A], and the challenge here is obtaining the values W[ ]. The expression in Definition 5.8 can be applied directly, but this requires processing all N-bit vectors through the LT, which is prohibitive for common block sizes such as 64 bits. However, the symmetry of the Kam-Davida permutation makes derivation of the values W[ ] straightforward, as we show in the next lemma. (Note that Remark 5.10 takes care of the trivial cases.)

Lemma 7.1. Consider an SPN in which M = n, and in which the LT is the permutation of Kam and Davida [10]. Let γ, γ̂ ∈ {0, 1}^M \ 0, and let p = wt(γ) and q = wt(γ̂). Then

    W[γ, γ̂] = Σ_{i=1}^{q} (−1)^{q−i} C(q, i) (2^i − 1)^p ,

where C(q, i) denotes the binomial coefficient.

Proof. Let 1 ≤ t ≤ (T − 1), let P be a fixed set of p active s-boxes in round t, and let Q be a fixed set of q active s-boxes in round (t + 1). We want to determine the number of ways we can connect P to Q through the LT, i.e., we want to determine the number of output masks for round t which activate exactly the s-boxes in P, and which are transformed by the LT into input masks for round (t + 1) which activate exactly the s-boxes in Q.

Let 1 ≤ i ≤ q, and suppose we "mark" i of the s-boxes in Q. Let c_i be the number of output masks for round t which activate exactly the s-boxes in P, and which, when transformed by the LT into input masks for round (t + 1), activate some subset of the i marked s-boxes. Note that each s-box in round t has exactly one output "wire" connecting it to any given s-box in round (t + 1). So if S ∈ P, an output mask counted by c_i can have 1's or 0's on the i wires connecting S to the i marked s-boxes in round (t + 1), and 0's on the remaining (n − i) output wires for S. Therefore, there are (2^i − 1) possible n-bit output masks for S (the all-zero mask is not allowed, since S is active). Since this same argument applies for each s-box in P, we have

    c_i = (2^i − 1)^p .

Noting that there are C(q, i) ways to mark i s-boxes in Q, and applying the inclusion-exclusion principle, we get

    W[γ, γ̂] = Σ_{i=1}^{q} (−1)^{q−i} C(q, i) c_i = Σ_{i=1}^{q} (−1)^{q−i} C(q, i) (2^i − 1)^p .

Remark 7.2. Note that in Lemma 7.1, W[γ, γ̂] depends only on the number of s-boxes made active by γ and by γ̂, not on the specific patterns of active s-boxes.
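The count in Lemma 7.1 is easy to check mechanically. Below is a minimal sketch (ours, not from the paper): `W(p, q)` implements the formula, and a brute-force enumeration of all round-t output masks for M = n = 3 confirms it, assuming the wiring property stated in the proof (output bit j of s-box i feeds s-box j of the next round).

```python
from itertools import product
from math import comb

def W(p, q):
    """W[gamma, gamma_hat] from Lemma 7.1: number of round-t output masks
    activating exactly a fixed set of p s-boxes whose LT images activate
    exactly a fixed set of q s-boxes in round t+1."""
    return sum((-1) ** (q - i) * comb(q, i) * (2**i - 1) ** p
               for i in range(1, q + 1))

# Brute-force check for M = n = 3: m[i][j] is output bit j of s-box i,
# which (by the wiring property in the proof) feeds s-box j in round t+1.
n = 3
def count(P, Q):
    total = 0
    for bits in product((0, 1), repeat=n * n):
        m = [bits[i * n:(i + 1) * n] for i in range(n)]
        active_out = {i for i in range(n) if any(m[i])}              # active round-t boxes
        active_in = {j for j in range(n) if any(m[i][j] for i in range(n))}  # active round-(t+1) boxes
        if active_out == P and active_in == Q:
            total += 1
    return total

assert count({0, 1}, {0}) == W(2, 1)       # == 1
assert count({0}, {0, 2}) == W(1, 2)       # == 1
assert count({0, 1}, {0, 1}) == W(2, 2)    # == 7
```

As a further sanity check, every mask activating exactly p s-boxes maps to exactly one active-box pattern in the next round, so summing W over all patterns of each weight recovers the total: `sum(comb(n, q) * W(p, q) for q in range(1, n + 1)) == (2**n - 1)**p`.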

7.2 Computational Results

For the example SPN structure given above, we used Theorem 6.12 to compute E_SPN[ELP(a, b)] for a range of parameters: 2 ≤ n ≤ 10, 2 ≤ T ≤ 16, and 1 ≤ wt(γ_a), wt(γ_b) ≤ M = n. Note that for this structure, the result of Theorem 6.12 does not depend on the specific choices of a and b, but only on wt(γ_a) and wt(γ_b) (this follows from Remark 7.2). In Figure 5 we plot values of E_SPN[ELP(a, b)] for M = n = 4 and 2 ≤ T ≤ 16, in the case that wt(γ_a) = wt(γ_b) = 1 (with a log10 scale on the y-axis). On the same graph we also plot experimental values, obtained as follows. For fixed masks a = D000 (hex) and b = 0050 (hex) (which satisfy wt(γ_a) = wt(γ_b) = 1), and for each value of T, we generated 1000 SPNs at random, and for each SPN we computed LP_SPN(a, b; 0) directly from (1). (We fixed the all-zero key, making use of Remark 6.13.) We then plotted the average of these 1000 LP values. Note that the experimental values correspond well to the theoretical values. We also observe the apparent convergence of the theoretical and experimental values to some limiting value; this observation led us to consider the true random cipher (initially this work had a different direction). From Lemma 6.4, we know that ELP(a, b) for the true random cipher is

given by

    1/(2^N − 1) = 1/(2^16 − 1) ≈ 1.526 × 10^{−5},

[Figure 5: E_SPN[ELP(a, b)] for M = n = 4 and a = D000 (hex), b = 0050 (hex). The plot shows log10(expected ELP) against the number of rounds being approximated (T), 2 ≤ T ≤ 16, with curves for the Theorem 6.12 values, the experimental values, and the true random cipher.]

and log10(1.526 × 10^{−5}) ≈ −4.82. This value is plotted in Figure 5, and indeed appears to be the limit approached by the theoretical and experimental curves.

In Figure 6 we plot E_SPN[ELP(a, b)] for a 64-bit block size (M = n = 8). Again we see a strong correspondence between what appears to be a limiting value for E_SPN[ELP(a, b)] and the value of ELP(a, b) for the true random cipher, namely

    1/(2^64 − 1) ≈ 5.421 × 10^{−20}

(log10(5.421 × 10^{−20}) ≈ −19.27 is plotted in Figure 6). The above leads us to the following conjecture.

Conjecture 7.3. Consider an SPN with the LT of Kam and Davida [10]. Let a, b ∈ {0,1}^N \ 0. Then

    lim_{T→∞} E_SPN[ELP(a, b)] = 1/(2^N − 1).
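The limiting values quoted above follow from direct arithmetic; a quick check (ours, not part of the paper):

```python
from math import log10

# For the true random cipher, ELP(a, b) = 1/(2^N - 1) for any nonzero masks a, b.
for N in (16, 64):
    elp = 1 / (2**N - 1)
    print(f"N = {N}: ELP = {elp:.4g}, log10 = {log10(elp):.2f}")
# → N = 16: ELP = 1.526e-05, log10 = -4.82
# → N = 64: ELP = 5.421e-20, log10 = -19.27
```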

7.3 Generalized Conjecture

Based on preliminary computations for SPNs that have LTs other than the Kam and Davida permutation, we believe that the convergence expressed in Conjecture 7.3 holds for a much larger class of LTs, which we call high-level-complete (or HL-complete). We need the following well-known definition.

[Figure 6: E_SPN[ELP(a, b)] for M = n = 8 and wt(γ_a) = wt(γ_b) = 1. The plot shows log10(expected ELP) against the number of rounds being approximated (T), 2 ≤ T ≤ 16, with curves for the Theorem 6.12 values and the true random cipher.]

Definition 7.4 ([10]). Let F : {0,1}^d → {0,1}^d. Then F is called complete if every output bit depends on every input bit. Formally, for all 1 ≤ i, j ≤ d, there exist x, y ∈ {0,1}^d such that x and y differ in exactly bit i, and F(x) and F(y) differ in at least bit j.

Remark 7.5. It is not hard to show that if F is linear and invertible, then F cannot be complete. However, since the linear functions that concern us are the LT components of SPNs, we give a modified definition that captures the property we want.

Definition 7.6. The linear transformation component of an SPN is called high-level-complete, or HL-complete, if, when all the s-boxes of the SPN are complete, the SPN itself is complete after some number of rounds.

Remark 7.7. It is not hard to see that the permutation of Kam and Davida is HL-complete: for complete s-boxes, it yields a complete SPN after two rounds (with the LT present only in the first round).

We now give our generalized conjecture.

Conjecture 7.8. Consider an SPN whose LT is HL-complete. Let a, b ∈ {0,1}^N \ 0. Then

    lim_{T→∞} E_SPN[ELP(a, b)] = 1/(2^N − 1).
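Completeness in the sense of Definition 7.4 can be verified exhaustively for small d. A sketch (the helper and the example functions are ours, for illustration only):

```python
def is_complete(F, d):
    """Definition 7.4: F: {0,1}^d -> {0,1}^d is complete if every output
    bit depends on every input bit. Exhaustive check over all 2^d inputs."""
    # deps[i][j] records that output bit j was observed to depend on input bit i
    deps = [[False] * d for _ in range(d)]
    for x in range(2**d):
        fx = F(x)
        for i in range(d):
            diff = fx ^ F(x ^ (1 << i))  # output bits affected by flipping input bit i
            for j in range(d):
                if diff & (1 << j):
                    deps[i][j] = True
    return all(all(row) for row in deps)

# An invertible linear map is never complete (Remark 7.5):
assert not is_complete(lambda x: x, 3)            # identity on 3 bits
# A (nonlinear) complete function on 3 bits: flipping any input bit of
# x = 0 changes every output bit.
assert is_complete(lambda x: 7 if x else 0, 3)
```

The check costs d · 2^d evaluations of F, so it is only practical for individual s-boxes or toy SPNs, not for full block sizes.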

8 Conclusion

We have investigated a fundamental block cipher architecture, the substitution-permutation network (SPN), which is based on Shannon's principles of confusion and diffusion [22]. SPNs represent an increasingly important class of ciphers, as evidenced by the recent adoption of the SPN Rijndael as the U.S. Government Advanced Encryption Standard (AES) [7]. There is a growing body of research to support the cryptographic strength of the SPN structure [1, 4, 8, 9, 12, 24], and it is in this direction that we have conducted our work.

In the analysis of block ciphers, the true random cipher [19] is generally taken to be the ideal cipher model. We consider expected linear probability (ELP) values (important in the powerful attack called linear cryptanalysis [16]) for SPNs in which the s-boxes are selected uniformly and independently from the set of all bijective s-boxes. We derive an exact expression for the expected ELP value over all SPNs generated by this method. This expression depends implicitly on the choice of linear transformation (LT), and is a function of the number of rounds of encryption.

For a particular choice of LT, namely that of Kam and Davida [10], we evaluated our expression for a variety of parameters. First, we found that experimentally generating SPNs and computing the average LP directly yielded values which corresponded well to those predicted by our expression. More importantly, both the theoretical and the experimental values appear to approach a limiting value (as the number of encryption rounds is increased), namely the ELP value for the true random cipher. This gives quantitative support to the claim that an SPN with sufficiently many rounds of encryption approximates the true random cipher. Moreover, we conjecture that this convergence can be shown analytically.

Acknowledgments

The authors would like to thank Amr Youssef (Cairo University) for reading and commenting on an earlier draft of this work.

References

[1] C.M. Adams, A formal and practical design procedure for substitution-permutation network cryptosystems, Ph.D. Thesis, Queen's University, Kingston, Canada, 1990.
[2] E. Biham, On Matsui's linear cryptanalysis, Advances in Cryptology—EUROCRYPT'94, LNCS 950, Springer-Verlag, pp. 341–355, 1995.
[3] E. Biham and A. Shamir, Differential cryptanalysis of DES-like cryptosystems, Journal of Cryptology, Vol. 4, No. 1, pp. 3–72, 1991.
[4] Z.G. Chen and S.E. Tavares, Towards provable security of substitution-permutation encryption networks, Fifth Annual International Workshop on Selected Areas in Cryptography (SAC'98), LNCS 1556, Springer-Verlag, pp. 43–56, 1999.
[5] J. Daemen, R. Govaerts, and J. Vandewalle, Correlation matrices, Fast Software Encryption: Second International Workshop, LNCS 1008, Springer-Verlag, pp. 275–285, 1995.
[6] H. Feistel, Cryptography and computer privacy, Scientific American, Vol. 228, No. 5, pp. 15–23, May 1973.
[7] FIPS 197, Advanced Encryption Standard (AES), Federal Information Processing Standards Publication 197, U.S. Department of Commerce, National Institute of Standards and Technology, Information Technology Laboratory, Gaithersburg, Maryland, 2001.
[8] H.M. Heys and S.E. Tavares, Substitution-permutation networks resistant to differential and linear cryptanalysis, Journal of Cryptology, Vol. 9, No. 1, pp. 1–19, 1996.
[9] S. Hong, S. Lee, J. Lim, J. Sung, and D. Cheon, Provable security against differential and linear cryptanalysis for the SPN structure, Fast Software Encryption (FSE 2000), LNCS 1978, Springer-Verlag, pp. 273–283, 2001.

[10] J.B. Kam and G.I. Davida, Structured design of substitution-permutation encryption networks, IEEE Transactions on Computers, Vol. C-28, No. 10, pp. 747–753, October 1979.
[11] L. Keliher, H. Meijer, and S. Tavares, Modeling linear characteristics of substitution-permutation networks, Sixth Annual International Workshop on Selected Areas in Cryptography (SAC'99), LNCS 1758, Springer-Verlag, pp. 78–91, 2000.
[12] L. Keliher, H. Meijer, and S. Tavares, New method for upper bounding the maximum average linear hull probability for SPNs, Advances in Cryptology—EUROCRYPT 2001, LNCS 2045, Springer-Verlag, pp. 420–436, 2001.
[13] M.G. Kendall, The Advanced Theory of Statistics, Volume I, Charles Griffin & Company Limited, 1943.
[14] L.R. Knudsen, Practically secure Feistel ciphers, Fast Software Encryption, LNCS 809, Springer-Verlag, pp. 211–221, 1994.
[15] X. Lai, J. Massey, and S. Murphy, Markov ciphers and differential cryptanalysis, Advances in Cryptology—EUROCRYPT'91, LNCS 547, Springer-Verlag, pp. 17–38, 1991.
[16] M. Matsui, Linear cryptanalysis method for DES cipher, Advances in Cryptology—EUROCRYPT'93, LNCS 765, Springer-Verlag, pp. 386–397, 1994.
[17] M. Matsui, On correlation between the order of s-boxes and the strength of DES, Advances in Cryptology—EUROCRYPT'94, LNCS 950, Springer-Verlag, pp. 366–375, 1995.
[18] W. Meier and O. Staffelbach, Nonlinearity criteria for cryptographic functions, Advances in Cryptology—EUROCRYPT'89, LNCS 434, Springer-Verlag, pp. 549–562, 1990.
[19] A.J. Menezes, P.C. van Oorschot, and S.A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1996.
[20] K. Nyberg, Linear approximation of block ciphers, Advances in Cryptology—EUROCRYPT'94, LNCS 950, Springer-Verlag, pp. 439–444, 1995.

[21] L. O'Connor, Properties of linear approximation tables, Fast Software Encryption: Second International Workshop, LNCS 1008, Springer-Verlag, pp. 131–136, 1995.
[22] C.E. Shannon, Communication theory of secrecy systems, Bell System Technical Journal, Vol. 28, No. 4, pp. 656–715, 1949.
[23] S. Vaudenay, On the security of CS-Cipher, Fast Software Encryption (FSE'99), LNCS 1636, Springer-Verlag, pp. 260–274, 1999.
[24] A.M. Youssef, Analysis and design of block ciphers, Ph.D. Thesis, Queen's University, Kingston, Canada, 1997.
[25] A.M. Youssef and S.E. Tavares, Resistance of balanced s-boxes to linear and differential cryptanalysis, Information Processing Letters, Vol. 56, pp. 249–252, 1995.
