The SAC Test: A New Randomness Test, with Some Applications to PRNG Analysis

Julio C. Hernandez (1), José María Sierra (2), and Andre Seznec (1)

(1) INRIA-IRISA, CAPS TEAM, Campus de Beaulieu, 35042 Rennes, France
{jcesar, seznec}@irisa.fr, http://www.irisa.fr/caps
(2) Carlos III University, 28911 Leganés, Madrid, Spain
[email protected]
Abstract. A new statistical test for randomness, the SAC test, is presented, together with its results over some well-known generators from the literature. These results are analyzed, and some possible applications of the test are detailed, such as measuring the strength of cryptographic primitives (block ciphers, stream ciphers, and pseudorandom number generators), especially during the design and analysis phase. Finally, the source code for the SAC test is provided, which demonstrates some of its other advantages: it is easy to implement and very fast, making it well suited for practical applications.
1 Introduction

The problem of randomness testing or, alternatively, of assessing the quality of different pseudorandom number generators is becoming increasingly crucial, especially for assuring the safety of communications [1]. Public key cryptography, key management, and digital signatures are now in wide use, and all rely on the existence of secure methods for generating random numbers; in the presence of a bad random bit generator, they suffer a great decrease in security [2]. This is not to mention the need for random numbers in scientific computing, including Monte Carlo simulations, probabilistic algorithms, and VLSI testing, to name a few.

A number of classic tests are presented in [3], but these are considered outdated and not very powerful nowadays, because many obviously weak random generators can pass all of them. Alternatively, the Diehard battery of tests [4] was considered the most complete and powerful battery of tests, inspiring some others like [5], until the same authors published a new battery of tests [6] that is claimed to be better than Diehard. These tests, together with some other classical ones, are implemented in [7].
A. Laganà et al. (Eds.): ICCSA 2004, LNCS 3043, pp. 960–967, 2004. © Springer-Verlag Berlin Heidelberg 2004
However, despite the number and power of the different test batteries, no universal test for randomness exists (at least in an applicable, practical form), and some tests previously purported to be universal [2] are not (see the discussion in [1,8]). That is why it is worthwhile to continue devising new types of randomness tests that measure new aspects of randomness. It is important that these tests are both easy to implement (so that they can easily be included in existing test batteries) and efficient (so that new aspects of randomness can be examined without significantly increasing the demand for time or computation).

1.1 The Strict Avalanche Criterion

The Strict Avalanche Criterion was originally presented in [9] as a generalization of the avalanche effect [10], although the idea is somehow already present, if not formulated in concrete terms, in different early works in the field of cryptography [11]. It was devised for measuring the amount of non-linearity of substitution boxes, a key component of many block ciphers. There have been many extensions and generalizations of this idea, but all of them essentially try to abstract the same notion. The avalanche effect tries to reflect, to some extent, the intuitive idea of high non-linearity: a very small difference in the input produces a large change in the output, thus an avalanche of changes. Mathematically:
∀x, y | H(x, y) = 1 :   Average(H(F(x), F(y))) = n/2
So if F is to have the avalanche effect, the Hamming distance between the outputs of a random input vector and of one generated by randomly flipping one of its bits should be, on average, n/2. That is, a minimum input change (one single bit) is amplified and produces, on average, a maximum output change (half of the bits). This definition also tries to abstract the more general concept of independence of the output from the input (hence our proposal and its applicability to measuring the quality of PRNGs). Although it is clear that this independence is impossible to achieve (a given input vector always produces the same output through F), the ideal F will resemble a perfect random function, where inputs and outputs are statistically unrelated. Any such F (or any good PRNG) would have a perfect avalanche effect. In fact, we will use an even more demanding property, called the Strict Avalanche Criterion, which in particular implies the avalanche effect and could be mathematically described as:
∀x, y | H(x, y) = 1 :   H(F(x), F(y)) ≈ B(1/2, n)
It is interesting to observe that this implies the avalanche effect, because the mean of a Binomial distribution with parameters 1/2 and n is n/2, and that the proximity of an observed distribution to a given theoretical distribution (in this case a B(1/2, n)) can easily be measured by means of a chi-square goodness-of-fit test. That is exactly the procedure we will follow.
2 The SAC Test
The general approach is inspired by the definitions presented above, but must be slightly changed to reflect the fact that we do not have an F function, nor inputs or outputs, and in particular that we do not have inputs in which we can change a single bit to measure the effect of this change on the output. We will assume that the function F is the transition function that accepts and produces bitstreams of length n (n taking values 8, 16, 32, 64, 128, etc.). It is clear that in this case we cannot change the inputs but, in any case, we can measure the Hamming distance between the inputs and the outputs and verify whether this Hamming distance is compatible with the notion of independence and lack of correlation, and thus with the theoretical optimal distribution. For this, we perform a chi-square goodness-of-fit test to accept or reject the null hypothesis of F having a perfect avalanche effect. That is, for the bitstream w0 w1 w2 w3 w4 w5 w6 … wi wi+1 …, we consider that w1 = F(w0), w2 = F(w1), …, wi+1 = F(wi), and we consequently measure the avalanche properties of F through the values H(w0, w1), H(w1, w2), …, H(wi, wi+1), where H(x, y) represents the Hamming distance between the values x, y ∈ {0,1}^n. We collect these values and check whether their distribution is statistically consistent with the theoretical one, which should be that of a B(1/2, n). For this we perform a classical chi-square test. We repeat this process for different values of |wi|, in this paper from 8 to 128 bits.
2.1 Expected Values

The last phase of any of the tests described in the last section consists of a chi-square goodness-of-fit test (with the classical correction of not counting deviations in bins where the expected number of observations is less than 5.0), where the bit length of the word under examination (8, 16, 32, 64, 128, …) coincides with the number of degrees of freedom (the Hamming distance between two vectors x, y ∈ {0,1}^z can take z+1 values, which implies z degrees of freedom). Thus, it is easy to perform the test at different significance levels, using the values shown in Table 1.
Table 1. Values for the usual significance levels of a chi-square statistical distribution with different numbers of degrees of freedom

Degrees of freedom    α=0.05      α=0.01
8                     15.5073     20.0902
16                    26.2962     31.9999
32                    46.1942     53.4857
64                    83.6752     93.2168
128                   155.4047    168.1332
2.2 Results

We present in Table 2 the results obtained with the SAC test, using different lengths from 8 to 128 bits, over a number of well-known pseudorandom number generators, as implemented in [7]. We mark the results that have a corresponding p-value less than 0.01 and thus represent a failure of the generator to pass the test.
Table 2. Results obtained by different versions of the SAC test over some well-known generators in the literature. Statistics with an associated p-value below 0.01 signal generators that do not pass the given test (see the Result row)

D.o.f.    cong      rand      lrand48   shr3     shr3plus   fib      kiss
8         485.6     337.6     401.4     8.70     10.27      251.3    8.35
16        460.0     348.9     409.4     11.49    22.31      208.8    9.79
32        459.5     352.7     423.1     17.27    13.89      186.0    25.78
64        41.1      655.4     676.8     36.10    41.08      69.8     24.44
128       1348.6    1471.9    1277.7    53.36    41.47      190.5    28.38
Result    Fail      Fail      Fail      Pass     Pass       Fail     Pass
In Table 3 we present the results of a recent battery of statistical tests [7], which includes some tests that are, in their authors' words, difficult to pass. We apply this battery over the same pseudorandom number generators that appear in Table 2, for comparison's sake. We can, in this way, conclude that the SAC test is somehow more powerful than the Frequency test, because the congruential generator does not pass the SAC tests while it passes the Frequency test, all the other results being equal.
Analogously, we can conclude that the SAC test is more powerful, at least from the point of view of this set of PRNGs, than the GCD dist test (the GCD test is divided into two parts, the GCD dist and the GCD steps tests, offering a p-value for each of them, the latter being the one whose p-value corresponds to the p-value of the general GCD test). This is because the GCD dist test can be passed by both rand (the C random generator) and lrand48 (the generator of the drand48 family), while these two generators fail the SAC tests, all the other results being equal. Similarly, and continuing with this comparison, it seems that the Collision test is more powerful than the SAC test because, all the rest of the results being equal, it is able to point out failures in the output of the shr3 generator where the SAC test is not. The comparison between the Gorilla test and the SAC test is, finally, less clear, because the SAC test performs both better (rand fails the SAC test while it passes the Gorilla test) and worse (shr3 passes the SAC test while it fails the Gorilla test).

Table 3. p-values obtained by the battery of tests in [7] over the same well-known generators. A p-value of 1.0000 signals that the generator does not pass the given test

Test        cong      rand      lrand48   shr3      shr3plus   fib       kiss
Frequency   0.3423    1.0000    1.0000    0.6437    0.2484     1.0000    0.4506
GCD steps   1.0000    1.0000    1.0000    0.1873    0.8488     1.0000    0.3991
GCD dist    1.0000    0.4153    0.5801    0.2985    0.6718     1.0000    0.5207
Bday        1.0000    1.0000    1.0000    0.3386    0.2580     1.0000    0.4212
Gorilla     1.0000    0.9370    1.0000    1.0000    0.9870     1.0000    0.3259
Collision   1.0000    0.9941    1.0000    1.0000    0.9790     1.0000    0.1554
3 Conclusions and Future Work

We have presented a new test of randomness for pseudorandom number generators, the SAC test, after justifying the need for new tests of this kind to cope with the increasing demands related to random and pseudorandom number generation. Additionally, we have shown that this new test is (at least from the point of view of a battery of well-known, classical generators) more powerful than some other widely used randomness tests at pointing out deficiencies in some PRNGs. This said, we should stress that the conclusions about the power of the different tests examined should be seen simply as an example of the SAC test's usefulness, not at all as a definitive proof of its supremacy over any other randomness test. We sincerely think that the SAC test is a powerful new tool for testing the quality of PRNGs, and even of cryptographic primitives (which, after all, was the motivation that originated it), especially during the design phase, for avoiding, minimizing or, at least, limiting pitfalls
and mistakes. The SAC test offers some other advantages over other kinds of randomness tests, namely its ability to look for bad properties in the output beyond the 32-bit limit that seems to be the scope of many modern tests and, furthermore, the virtue of doing this efficiently, without requiring much computational power or time. There are two main lines for improving the work and, possibly, the results presented here:

1) Increasing the length of the bitstreams examined to 256, 512, 1024, 2048, etc., and also the number of explored lengths. This is quite straightforward to do, and only requires some more lines of code and, perhaps, a better C implementation of the probabilities associated with B(1/2, n) when n is large. All these problems seem easy to circumvent, for example, by implementing the test in a high-level mathematics-oriented language such as Mathematica or Maple. Also, it seems clear that the higher the number of different bit lengths examined (and, thus, the number of tests or sub-tests), the better the power of these tests for distinguishing bad generators. In the implementation used, the SAC tests were limited to analyzing the first 320000 bits of the respective outputs of the given generators. Increasing this limit seems like a quite natural next step. Also, increasing the number of different lengths studied (not only 8, 16, 32 but also 9, 10, 11, etc.) could be interesting, because it can be done at nearly no computational cost and may reveal weaknesses that could be hidden when studying other lengths.

2) Proving the independence of the SAC test from other classical tests. This could be done in a fashion similar to that used in [1], where many different statistical tests were included in a battery and there existed the need of assuring their mutual independence (or, analogously, the absence of redundancies).
This was achieved by generating many random sequences and observing the p-values generated by each of the tests, then studying the results in search of correlations. Although this task is computationally expensive, performing it against many generators could, in a way, show that the SAC test provides useful and independent information about the output of a PRNG, thus giving the SAC test a complete justification. We have reasons to believe that this is precisely the case, as the different sub-tests within the SAC test (i.e., for different bit lengths) do not seem to be obviously related. A clear example is shown in Table 2 where, for the same PRNG, for example cong, all the other sub-tests of the SAC point out that it fails miserably (negligible p-value), while SAC-64, with a statistic of 41.1, seems to be quite satisfied with the observed output. Similar results are obtained over the Fibonacci generator. This lack of inner auto-correlation, although far from proving it, seems to indicate that each SAC-n test is essentially different, and thus that some test of the SAC-n family is likely to differ from the rest of the already known tests.
References

1. Rukhin, A. L.: Testing Randomness: A Suite of Statistical Procedures. SIAM Journal on Theory of Probability and Its Applications, vol. 45, 2000
2. Maurer, U. M.: A Universal Statistical Test for Random Bit Generators. In: Menezes, A. J., Vanstone, S. A. (eds.): Advances in Cryptology – CRYPTO '90, pp. 409–420. Springer-Verlag, New York, 1991
3. Knuth, D. E.: The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Addison-Wesley, 3rd edition, 1997
4. Marsaglia, G.: Diehard: A Battery of Tests for Randomness. http://stat.fsu.edu/~geo/diehard.html, 1996
5. Soto, J.: Statistical Testing of Random Number Generators. In: Proceedings of the 22nd National Information Systems Security Conference, 1999
6. Marsaglia, G., Tsang, W. W.: Some Difficult-to-Pass Tests of Randomness. Journal of Statistical Software, vol. 7, issue 3, 2002
7. Center for Information Security and Cryptography (CISC): Library of Tests for Random Number Generators. http://www.csis.hku.hk/cisc/download/idetect/
8. Coron, J. S., Naccache, D.: An Accurate Evaluation of Maurer's Universal Test. In: Proceedings of SAC '98, Lecture Notes in Computer Science. Springer-Verlag, 1998
9. Forré, R.: The Strict Avalanche Criterion: Spectral Properties of Boolean Functions and an Extended Definition. In: Goldwasser, S. (ed.): Advances in Cryptology – CRYPTO '88, LNCS vol. 403, pp. 450–468. Springer-Verlag, 1990
10. Webster, A. F., Tavares, S. E.: On the Design of S-Boxes. In: Advances in Cryptology – CRYPTO '85, pp. 523–534, 1985
11. Feistel, H.: Cryptography and Computer Privacy. Scientific American, 228(5): 15–23, 1973
Appendix: C Source Code of a Basic Implementation of the SAC Test

The C code of the SAC test for n = 8, following the procedure of Sect. 2:

#include <math.h>

/* Returns the chi-square statistic of the observed Hamming distances
   between consecutive generator outputs against a B(1/2, 8) distribution. */
double SAC8_test(unsigned long (*rng)())
{
    const unsigned long m = 10000;
    unsigned long i, j, xor32, xor8;
    static unsigned long alea[10000];
    unsigned long hamming8count[9] = {0};
    /* 256 * P(H = k) for H ~ B(1/2, 8), i.e. the coefficients C(8, k) */
    const double c8[9] = {1, 8, 28, 56, 70, 56, 28, 8, 1};
    double expected, chi8 = 0.0;

    for (i = 0; i < m; i++)
        alea[i] = rng() & 0xFFFFFFFFUL;

    /* Hamming distance between consecutive outputs, one byte at a time */
    for (i = 0; i + 1 < m; i++) {
        xor32 = alea[i] ^ alea[i + 1];
        for (j = 0; j < 4; j++) {
            xor8 = (xor32 >> (8 * j)) & 0xFF;
            unsigned int h = 0;
            while (xor8) { h += xor8 & 1; xor8 >>= 1; }
            hamming8count[h]++;
        }
    }

    /* chi-square test, skipping bins with an expected count below 5.0
       (the correction described in Sect. 2.1) */
    for (i = 0; i <= 8; i++) {
        expected = 4.0 * (m - 1) * c8[i] / 256.0;
        if (expected >= 5.0) {
            double d = (double)hamming8count[i] - expected;
            chi8 += d * d / expected;
        }
    }
    return chi8;
}