True Random Number Generator in Altera ACEX Devices
Miloš Drutarovský 1, Viktor Fischer 2 and Rastislav Lukáč 1
1 Department of Electronics and Multimedia Communications, Technical University of Košice, Park Komenského 13, 041 20 Košice, Slovak Republic, {Milos.Drutarovsky, lukacr}@tuke.sk
2 Laboratoire Traitement du Signal et Instrumentation, Unité Mixte de Recherche CNRS 5516, Université Jean Monnet, Saint-Etienne, France
Abstract
The paper introduces an analog phase-locked loop (PLL) based true random number generator (TRNG) implemented as an IP core in a digital Altera Field Programmable Logic Device (FPLD). A new, simple method of randomness extraction from an on-chip generated low-jitter clock signal is presented. The proposed TRNG is implemented in a low-cost Altera ACEX FPLD and tailored for embedded “System on a programmable chip” (SOPC) cryptographic applications. The quality of the generated true random numbers is confirmed by passing standard NIST statistical tests. The possibility of including the proposed TRNG in a SOPC design significantly increases the system security of embedded cryptographic hardware.
1. Introduction
Random numbers are employed today in both numerical simulations and cryptography. Current cryptography requires good true random numbers. Almost all cryptographic protocols require the generation and use of secret values that must be unknown to attackers [1]. For example, TRNGs are required to generate public/private key pairs for asymmetric (public key) algorithms including RSA, DSA and Diffie-Hellman. Keys for symmetric and hybrid cryptosystems are also generated randomly. Unfortunately, computers are not able to generate true random numbers, as they are deterministic systems. Numerical pseudo-random generators rely on complexity, and their standalone use in cryptography, for example to generate keys, is inadvisable. The only way to get true random numbers, and hence true security for cryptosystems, is to build a generator based on a random physical phenomenon.
In recent years, cryptography has gained increasing attention from both chip vendors and end users. One consequence of this trend has been the growing importance of embedded cryptographic hardware. Current high-density FPLDs provide an alternative hardware platform even for system-level integration of embedded symmetric [2] and asymmetric [3] algorithms, but not for a high-quality TRNG. Most hardware TRNGs follow unpredictable natural processes, such as thermal (resistance or shot) noise or nuclear decay. Such TRNGs are not compatible with modern FPLDs and cannot provide a SOPC solution. The fact that a TRNG cannot be implemented inside the FPLD represents a significant security and system disadvantage in embedded cryptographic applications.
This paper describes the implementation of a new analog PLL based generator that uses on-chip resources of low-cost Altera ACEX FPLDs [4]. The proposed method reliably extracts intrinsic randomness from low-jitter clock signals synthesized by the reduced on-chip ACEX analog PLL. The TRNG is developed as an Intellectual Property (IP) building block and provides significantly higher system-level security for complete embedded cryptographic SOPC designs.
The paper starts with an overview of the basic features of the reduced analog PLL circuits available in Altera ACEX and some APEX FPLDs. In Section 3 we explain the basic principle of the proposed true random noise extraction from a PLL generated low-jitter clock signal. Section 4 describes the experimental hardware implementation of our method and the speed/area results. Results of the statistical evaluation of the proposed TRNG output are presented in Section 5. Finally, concluding remarks are given in Section 6.
2. Analog PLL in Altera FPLD
Current FPLDs often use on-chip PLLs to increase performance and to provide on-chip clock-frequency synthesis. There are two fundamental approaches to implementing a PLL in an FPLD: one uses a digital PLL, or DLL (e.g. in XILINX Virtex FPLDs [5]), and the second one uses a true analog PLL (e.g. in Altera ACEX [4] and APEX [6] FPLDs). Both approaches have advantages and disadvantages, but analog PLLs seem to be a better solution for a cryptographic TRNG design.
In analog PLLs, nondeterministic noise causes the internal voltage controlled oscillator (VCO) to fluctuate in frequency. The internal control circuitry adjusts the VCO back to the specified frequency, but this change is seen as clock jitter. Other frequency fluctuations are caused by variations of supply voltage and temperature and by a noisy environment. Altera tries to minimize the clock jitter; for example, the typical intrinsic jitter of the analog PLL in an APEX FPLD has a 1-sigma value σjit = 15 ps (under the Gaussian approximation, the peak-to-peak value is approximately tJITTER = 6σjit) for an F = 100 MHz synthesized clock signal [7]. In [5] it was shown that the clock jitter in an APEX FPLD is significantly higher when internal FPLD flip-flops are switching, but the true intrinsic jitter is always present and is included in the overall jitter.
To support high-speed designs, -1 and -2 speed grade ACEX 1K devices offer ClockLock and ClockBoost circuitry containing a phase-locked loop enabling 1× and 2× clock multiplication. Although there are no detailed ACEX jitter values available in the literature, the maximum peak-to-peak jitter on a ClockBoost generated clock is tJITTER = 250 ps [4] if the input clock stability is 100 ps (a common value for a typical external oscillator). Since APEX and ACEX devices use a similar VCO, it is expected that the PLL parameters from [5] and [7] are also applicable to the jitter behavior of ACEX devices with an integrated PLL.
3. Principle of True Random Noise Extraction
The basic principle behind our method is to extract the randomness from the jitter of the clock signal synthesized in the embedded analog PLL. The jitter is detected by sampling a reference (clock) signal using a correlated (clock) signal synthesized in the PLL. The difficulty is that the reference signal has to be sampled near the edges influenced by the jitter. The exact sampling position is controlled by inserting several delay elements with fixed and sufficiently small delays. If the overall delay (the sum of the delays of all delay elements) is larger than the drift caused by temperature and voltage changes, and if the jitter is larger than the delay of one element, we can guarantee the reliability of the method. The rest of the paper demonstrates that such operation is possible. In the future we would like to evaluate the real parameters of the jitter.
In the ACEX FPLDs the smallest delay is obtainable between carry-in and carry-out in the logic cell. A simplified timing model of a logic cell is depicted in Fig. 1.

[Fig. 1. ACEX 1K Simplified Logic Cell Timing Model. Signals: CARRY-IN, CONTROL-IN, DATA-OUT, CARRY-OUT; delays: tCLUT, tSU, tH, tC, tCICO, tLABCARRY]
The delays in Fig. 1 have the following meaning:
tCLUT – Look-up-table (LUT) delay for carry-in
tSU – Logic cell (LC) register setup time for data
tH – LC register hold time for data
tC – LC register control signal delay
tCICO – Carry-in to carry-out delay
tLABCARRY – Routing delay for the carry-out signal of an LC driving the carry-in signal of a different LC in a different Logic Array Block (LAB)
Since the carry-in to carry-out delays within a LAB and between LABs are very important for our method, we have analyzed the differences between the various family members more closely. It is interesting that the parameters from the Altera data sheets and the timing analysis results differ quite significantly (see Table 1). We have taken the parameters obtained from the MaxPlus2 [8] version 10.1 timing analyzer as the basis for our method. Note that Table 1 does not contain speed version -3 of the ACEX devices, because the PLL feature is included only in versions -1 and -2. A simplified block diagram of the true random number generator (TRNG) is depicted in Fig. 2.
[Fig. 2. Block diagram of the TRNG. Blocks and signals: CLK, PLLx2, divider :2, CLJ, chain of tCICO delay elements each feeding a D flip-flop clocked by CLK, XOR output xXOR(nTCLK), Decimator (DEC), output x(nTDEC)]
Table 1. ACEX 1K timing model parameters

                                      ACEX 1K data sheets    Timing analysis
Device    tCLUT   tSU    tH     tC    tCICO   tLABCARRY      tCICO   tLABCARRY
1K10-1    500     500    900    500   200     100            100     300
1K10-2    600     600    1100   600   200     100            100     400
1K30-1    500     500    900    500   200     100            100     300
1K30-2    600     600    1100   600   200     100            100     400
1K50-1    500     500    900    500   200     100            200     100
1K50-2    600     600    1100   600   200     100            200     100
1K100-1   500     500    900    500   200     100            100     500
1K100-2   600     600    1100   600   200     100            100     700
In the internal analog phase-locked loop (PLL), the global clock signal CLK is multiplied by 2 using the Altera ClkLock macrofunction with the boost parameter 2×. To bring the global signal into the local logic array, it is divided by 2 in the next stage. The obtained signal CLJ enters the logic cell through the carry input and passes across the cell to the carry-out output after the delay tCICO. The outputs of the delay chain enter D flip-flops, where they are sampled using the clock signal CLK. If the jitter is comparable with the tCICO delay, the output of at least one D flip-flop will be influenced by this random jitter (see Fig. 3).
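The sampling scheme can be illustrated with a small Monte Carlo sketch in Python. It is only a behavioural model under stated assumptions, not the hardware description used in the paper: a single CLJ edge with Gaussian jitter is assumed, the nominal edge offset of -110 ps is hypothetical, the 100 ps tap delay and the 15 ps sigma are borrowed from figures quoted in the text, and all function names are invented for illustration.

```python
import random


def sample_delay_chain(offset_ps, sigma_ps, t_cico_ps=100.0, n_taps=4, rng=random):
    """Return the D flip-flop outputs captured at one CLK edge (time 0).

    Assumed model: the CLJ rising edge arrives at offset_ps plus Gaussian
    jitter; tap k delays it by k * t_cico_ps. A flip-flop captures 1 if
    its delayed edge has already arrived when CLK samples, else 0.
    """
    edge_ps = offset_ps + rng.gauss(0.0, sigma_ps)
    return [1 if edge_ps + k * t_cico_ps <= 0.0 else 0 for k in range(n_taps)]


def xor_bits(bits):
    """XOR all flip-flop outputs into a single bit."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the sketch is repeatable
    n = 20000
    samples = [sample_delay_chain(-110.0, 15.0, rng=rng) for _ in range(n)]
    for k in range(4):
        ones = sum(s[k] for s in samples)
        print(f"tap {k}: fraction of ones = {ones / n:.3f}")
```

In this model only the tap whose delayed edge falls inside the jitter zone produces a varying (and biased) output; the other taps are effectively constant, which is why the chain has to be long enough to cover the drift of the edge position.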
[Fig. 3. TRNG waveforms: CLK, CLJ and the delayed flip-flop inputs D1, D2, D3, spaced by tCICO, with the jitter zone marked]

From the example in Fig. 3 it is clear that the jitter (gray zone) will very probably influence the output of D flip-flop two, but the probability that it will affect flip-flops one and three is significantly lower. Clearly, the waveforms depicted in Fig. 3 are idealized. Under real conditions flip-flop two produces an output signal that is statistically biased, so the probabilities of zeros and ones can be quite different. Moreover, the actual probability can vary with time and temperature, so the delay chain must be longer than the maximum possible drift. One common way to reduce statistical bias is to XOR bits together and use the decimation shown in Fig. 2. This method improves the bias at the expense of decreasing the output bit-rate by the factor DEC according to

$$x(nT_{DEC}) = x_{XOR}(nT_{DEC}) \oplus x_{XOR}(nT_{DEC} - T_{CLK}) \oplus \dots \oplus x_{XOR}(nT_{DEC} - (DEC - 1)T_{CLK}) \qquad (1)$$

where TDEC = TCLK · DEC. If the decimator input samples xXOR(nTCLK) are statistically independent, the decimated output sequence x(nTDEC) must quickly converge to an unbiased binary sequence that is uncorrelated (this is only a necessary condition for statistical independence, but it can be easily tested, so we used it as the first test of our TRNG, described in Section 5). Since the binary stream xXOR(nTCLK) is influenced by the analog part of the PLL, we can expect that the obtained values will be statistically independent (this is valid for tCICO ≈ tJITTER) and that the proposed method reliably extracts the randomness caused by analog PLL noise.

4. Experimental Hardware Implementation
The TRNG was implemented using the standard Altera megafunction for embedded PLL configuration. There are two problems related to implementing the random number generator in an FPLD:
- the function of the generator cannot be verified by simulation (jitter is not simulated),
- since the detection of the jitter is based on a repetition of small signal delays using a carry chain, placement and routing have an important impact on the generator operation (e.g., to guarantee correct operation of the generator, the D flip-flops have to be implemented in the same logic array block as the carry chain delay).
Correct operation of the design therefore has to be ensured by resource locking (assignments), and it has to be verified and tested in real hardware. The generator blocks have been designed using both the Altera Hardware Description Language (AHDL) and VHDL. Since the implementation is hardware-specific, it seems more practical to use AHDL instead of VHDL (at least for the jitter detector block), because AHDL is closer to the hardware and the implementation can be better controlled at a low level (assignments of hardware elements). The FPLD resource requirements of the proposed TRNG block, as well as of the supporting logic (FIFO, control logic) of the experimental hardware implementation, are shown in Table 2. The first two columns show the resource requirements (in Logic Cells) and the maximum supported frequency for the generator as presented in Fig. 2. The second two columns give the number of LCs and the maximum frequency of the complete circuit, including a 32-bit wide 1024-word FIFO and a data bus controller. The presented results have been obtained using Altera MaxPlus II v. 10.1 [8].
Table 2. FPLD resource requirements

            Generator only      Generator+FIFO
Device      LCs    f [MHz]      LCs    f [MHz]
EP1K50-1    22     178          221    128
EP1K50-2    22     149          221    117

To measure the real performance of the proposed TRNG, a custom PCI development board with an Altera ACEX FPLD and a PLX9052 PCI target has been used. The board features the PLL-capable ACEX EP1K50-2 with one on-chip analog PLL. The external clock source was 33.33 MHz, the on-chip synthesized clock was FCLK = 33.33 × 2 = 66.66 MHz, and DEC = 512. The values x(nTDEC), generated with the bit-rate FCLK / (2 DEC) ≈ 65098 bits/s, were converted to 32-bit words and saved in the 1024-word FIFO. The FIFO size together with the bit-rate ensures that no bit is lost during data transfer to the PC and data saving on the hard disk.

5. Statistical Evaluation of TRNG Output
A potential problem with the XOR decimation technique used for bias removal is that XOR decimation should be applied only to statistically independent bits. In fact, the XOR reduces any statistical bias, but it amplifies the correlation between bits. Our XOR decimator XORs together (the delay line has 4 elements) NXOR = DEC × 4 = 512 × 4 = 2048 bits. About Nbit ≈ DEC = 512 of the bits contributing to one output bit x(nTDEC) are influenced by the nondeterministic jitter, but their exact positions are generally unknown and potentially time-varying. Under the ideal assumption (statistically independent biased jitter values) the output signal x(nTDEC) must converge to an unbiased binary sequence with the probability of 1's and 0's equal to 1/2. Fig. 4 shows the evolution of the mean value of the signal x(nTDEC)

$$Mean(k) = \frac{1}{k} \sum_{n=1}^{k} x(nT_{DEC}) \qquad (2)$$

for a 1-Gigabit record (k = 1, …, 2^30) of TRNG output. The mean value converges approximately to the value 0.5003 and clearly shows that there is a small difference from an ideal TRNG. This difference is caused by a certain small correlation (or more complex statistical dependency) between jitter values. Based on this result we can conclude that this small dependency is generally no problem for the generation of typical cryptographic keys with lengths from hundreds to several thousand bits.

[Fig. 4. Mean value evolution of signal x(nTDEC). Horizontal axis: Mbits (0 to 1200); vertical axis: Mean Value (0.4992 to 0.5004)]
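The XOR decimation of Eq. (1) and the running mean of Eq. (2) can be sketched in a few lines of Python. This is an illustrative software model rather than the hardware decimator: the function names are invented, the input bits are assumed to be statistically independent, and the closed-form expression in `xor_prob_one` is the standard result for the XOR of n independent bits with identical bias, not a formula taken from the paper.

```python
import random


def xor_decimate(bits, dec):
    """XOR each non-overlapping group of `dec` consecutive input bits
    into one output bit, as in Eq. (1). Incomplete tail groups are dropped."""
    out = []
    for start in range(0, len(bits) - dec + 1, dec):
        acc = 0
        for b in bits[start:start + dec]:
            acc ^= b
        out.append(acc)
    return out


def running_mean(bits):
    """Mean(k) = (1/k) * sum of the first k bits, for k = 1..len(bits), as in Eq. (2)."""
    means, total = [], 0
    for k, b in enumerate(bits, start=1):
        total += b
        means.append(total / k)
    return means


def xor_prob_one(p, n):
    """Probability that the XOR of n independent bits equals 1 when each
    bit is 1 with probability p (assumes independence)."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** n)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the sketch is repeatable
    biased = [1 if rng.random() < 0.75 else 0 for _ in range(16 * 4000)]
    decimated = xor_decimate(biased, 16)
    print("input mean :", running_mean(biased)[-1])
    print("output mean:", running_mean(decimated)[-1])
    print("predicted P(1) after XOR of 16 bits:", xor_prob_one(0.75, 16))
```

The closed-form probability shows why the bias of independent bits vanishes quickly with the decimation factor, while any dependency between the input bits is not covered by this model and has to be checked on the measured data.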
Only the decimated output x(nTDEC) is directly available and can be extensively tested. In order to check the correlation of the decimated output x(nTDEC) we have applied the autocorrelation test. This test measures the correlation between bits at the distance d, computed as [1]

$$A(d) = \sum_{n=0}^{L-d-1} x(nT_{DEC}) \oplus x((n+d)T_{DEC}) \qquad (3)$$

where {x(nTDEC), n = 0, 1, …, L − 1} is a binary sequence of L decimated bits. Applying this function to L = 2^30 ≈ 10^9 bits with 1 ≤ d ≤ 2048, we have found no particular correlation, and all points of the normalized test statistic [1]

$$X = \frac{2\left(A(d) - \frac{L-d}{2}\right)}{\sqrt{L-d}} \qquad (4)$$

(with L the sequence length, denoted n in [1]) approximately follow the normal N(0,1) distribution (shown in Fig. 5), as expected for an ideal TRNG.

[Fig. 5. Histogram of 2048 values of test statistic X. Horizontal axis: X (-4 to 4); vertical axis: count (0 to 200)]
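A direct, unoptimized Python transcription of Eqs. (3) and (4) is given below as a sketch. The function names are hypothetical, and a list of 0/1 integers stands in for the decimated bit stream; for a 2^30-bit record a vectorized or bit-packed implementation would be needed in practice.

```python
import math


def autocorrelation_count(bits, d):
    """A(d) of Eq. (3): number of positions n in 0..L-d-1 at which
    bits[n] differs from bits[n + d]."""
    return sum(bits[n] ^ bits[n + d] for n in range(len(bits) - d))


def autocorrelation_statistic(bits, d):
    """Normalized statistic X of Eq. (4); approximately N(0,1) for an
    ideal random sequence when L - d is large."""
    pairs = len(bits) - d  # L - d compared bit pairs
    a = autocorrelation_count(bits, d)
    return 2.0 * (a - pairs / 2.0) / math.sqrt(pairs)


if __name__ == "__main__":
    # A strictly alternating sequence is maximally correlated at every shift.
    alternating = [0, 1] * 8
    for d in (1, 2, 3):
        print(d, autocorrelation_count(alternating, d),
              autocorrelation_statistic(alternating, d))
```

For the alternating test pattern the statistic is far from zero at every shift, which is the behaviour the test is meant to flag; for the TRNG output the paper reports values consistent with N(0,1).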
The result of this test confirms that the necessary condition of an uncorrelated decimator output is fulfilled, and it makes sense to use more sophisticated statistical tests to evaluate the randomness of the TRNG output (or, more precisely, its difference from an ideal TRNG output, since randomness cannot be proven). A large number of general statistical tests for randomness have been proposed, such as the DIEHARD specification [9], FIPS 140-2 [10] and the NIST statistical test suite [11]. The NIST statistical test suite appears to be the most comprehensive test package currently available. Our NIST statistical tests were performed on 1 Gigabit of continuous TRNG output acquired from the experimental hardware and followed the testing strategy, general recommendations and result interpretation described in [11]. We used a set of m = 1024 1-Megabit sequences produced by the generator and evaluated the set of P-values at a significance level α = 0.01. A subset of the most representative results is shown in Table 3. The number of acceptable sequences was within the expected confidence intervals [11] for all performed tests, and the P-values were uniformly distributed over the interval (0,1). Similarly, FIPS tests [10] performed on 20000-bit sequences and DIEHARD tests [9] performed on 80-Megabit sequences did not reveal any additional statistical deviation from an ideal TRNG.
Table 3. NIST test results (uniformity of P-values, proportion of passing sequences)

Statistical Test    C1   C2   C3   C4   C5   C6   C7   C8   C9   C10  P-Value   Proportion
Frequency           120  115  86   92   100  120  100  92   96   103  0.166187  0.9883
Block-Freq.         97   82   110  104  105  113  109  104  99   101  0.665675  0.9873
Cusum               121  96   101  92   108  95   115  109  86   101  0.321175  0.9902
Runs                110  96   102  100  118  98   100  94   100  106  0.871719  0.9912
Long-Run            99   112  85   90   108  118  101  100  103  108  0.476118  0.9883
Rank                80   102  94   122  107  105  111  94   101  108  0.248537  0.9951
FFT                 98   120  107  109  109  102  94   100  96   89   0.614674  0.9951
Periodic-Template   120  101  78   103  102  127  93   107  97   96   0.055223  0.9824
Universal           97   103  98   96   91   122  118  104  100  95   0.442722  0.9902
Apen                115  106  97   103  119  99   100  103  95   87   0.562029  0.9805
Serial              107  105  77   115  101  107  109  109  110  84   0.156493  0.9854
Lempel-Ziv          111  77   128  126  87   82   99   114  110  90   0.000683  0.9863
Linear-Complexity   101  107  115  113  84   109  99   82   127  87   0.024372  0.9941
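The uniformity P-value in Table 3 can be approximately reproduced from the bin counts C1 to C10. The sketch below assumes the chi-square procedure described in the NIST test suite documentation: the P-values of the m sequences are counted in ten equal sub-intervals of (0,1), and the uniformity P-value is the upper regularized incomplete gamma function Q(9/2, χ²/2). The function names are hypothetical, and Q(9/2, x) is evaluated with a closed form for half-integer arguments so that only the Python standard library is needed.

```python
import math


def upper_gamma_q_9_2(x):
    """Regularized upper incomplete gamma function Q(9/2, x).

    Uses Q(1/2, x) = erfc(sqrt(x)) and the recurrence
    Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
    """
    q = math.erfc(math.sqrt(x))
    for j in range(4):
        a = j + 0.5
        q += math.exp(-x) * x ** a / math.gamma(a + 1.0)
    return q


def uniformity_p_value(bin_counts):
    """Chi-square uniformity P-value for ten bin counts (9 degrees of freedom)."""
    if len(bin_counts) != 10:
        raise ValueError("this sketch supports exactly ten bins")
    expected = sum(bin_counts) / 10.0
    chi2 = sum((c - expected) ** 2 / expected for c in bin_counts)
    return upper_gamma_q_9_2(chi2 / 2.0)


if __name__ == "__main__":
    frequency_row = [120, 115, 86, 92, 100, 120, 100, 92, 96, 103]
    lempel_ziv_row = [111, 77, 128, 126, 87, 82, 99, 114, 110, 90]
    print("Frequency :", uniformity_p_value(frequency_row))
    print("Lempel-Ziv:", uniformity_p_value(lempel_ziv_row))
```

With the Frequency row the sketch gives a value of roughly 0.17, close to but not identical to the tabulated 0.166187; the small residual presumably reflects implementation details of the test suite version used. The NIST documentation treats a uniformity P-value of at least 0.0001 as consistent with uniformly distributed P-values.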
6. Conclusions
Current high-density FPLDs provide a hardware platform even for system-level integration of embedded SOPC cryptographic applications. Cryptographic IP building blocks, if available, allow electronic systems manufacturers to build, easily and quickly on a single chip, the same functionality that previously consumed several chips or even an entire printed circuit board. We have proposed a new method of true random number generation in embedded SOPC designs. The randomness of the generated sequence has been extensively tested. We believe that intrinsic analog PLL noise is a good source of true randomness, and at least for 1-Megabit sequences our TRNG is not distinguishable from an ideal TRNG. The proposed IP block is an implementation of a TRNG in a commercially available low-cost Altera FPLD. The generator has been tested on two different hardware modules (PCI cards) with similar results. Nevertheless, we would like to further evaluate its voltage and temperature dependence and its repeatability. If the statistical parameters of the generator are not suitable for certain applications, it is still possible to implement a TRNG in more powerful Altera FPLDs (e.g. APEX FPLDs [6]). These FPLDs include more complex PLL blocks that allow the use of more complex clock configurations. Such a solution has been proposed in [12], and its stability and repeatability are guaranteed by the (more sophisticated) principle used. The proposed generator is very cheap, uses a very small amount of FPLD resources, and is fast enough for typical embedded cryptographic applications. The advantage of our solution lies in the fact that the proposed IP block, together with e.g. symmetric and asymmetric algorithms, can fit into one FPLD chip and significantly increase the system security of an embedded cryptographic SOPC system. This solution is currently in development and will be presented in a future paper.
References
[1] A.J. Menezes, P.C. van Oorschot, S.A. Vanstone, “Handbook of Applied Cryptography”, CRC Press, New York, 1997.
[2] V. Fischer, M. Drutarovský, “Two Methods of Rijndael Implementation in Reconfigurable Hardware”, Proceedings of the Workshop on Cryptographic Hardware and Embedded Systems CHES’2001, Paris, May 2001, pp. 81-96.
[3] V. Fischer, M. Drutarovský, “Scalable RSA Processor in Reconfigurable Hardware - a SoC Building Block”, Proceedings of XVI Conference on Design of Circuits and Integrated Systems - DCIS 2001, November 2001, Porto, Portugal, pp. 327-332.
[4] “ACEX 1K Programmable Logic Family”, Data Sheet, September 2001, ver. 3.3, pp. 1-86, http://www.altera.com.
[5] “Superior jitter management with DLLs”, Virtex Tech Topic VTT013 (v1.1), February 1, 2001, pp. 1-4, http://www.xilinx.com.
[6] “APEX 20K Programmable Logic Device Family”, Data Sheet, February 2002, ver. 4.3, pp. 1-116, http://www.altera.com.
[7] “Jitter comparison analysis: APEX 20KE PLL vs. Virtex-E DLL”, Technical Brief 70, January 2001, ver. 1.1, pp. 1-7, http://www.altera.com.
[8] “MAX+PLUS II Programmable Logic Development System”, Altera Inc., http://www.altera.com.
[9] G. Marsaglia, “DIEHARD: a battery of tests of randomness”, http://stat.fsu.edu/~geo/diehard.html.
[10] “Security requirements for cryptographic modules”, Federal Information Processing Standards Publication 140-2, U.S. Department of Commerce/NIST, 1999, http://www.nist.gov.
[11] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, S. Leigh, M. Levenson, M. Vangel, D. Banks, A. Heckert, J. Dray, S. Vo, “A statistical test suite for random and pseudorandom number generators for cryptographic applications”, NIST Special Publication 800-22, May 15, 2001, pp. 1-153, http://www.nist.gov.
[12] V. Fischer, M. Drutarovský, “True Random Number Generator Embedded in Reconfigurable Hardware”, Proceedings of the Workshop on Cryptographic Hardware and Embedded Systems - CHES’2002, San Francisco, August 2002.