Random numbers are needed in particular for key generation, authentication protocols, zero-knowledge protocols, padding, in many digital signature schemes ...
SIMPLE PLL-BASED TRUE RANDOM NUMBER GENERATOR FOR EMBEDDED DIGITAL SYSTEMS Viktor Fischer*, Miloš Drutarovský**, Martin Šimka** and Frédéric Celle* * ** Laboratoire Traitement du Signal et Department of Electronics and Instrumentation, Unité Mixte de Multimedia Communications, Recherche CNRS 5516, Technical University of Košice Université Jean Monnet Park Komenského 13, 10, rue Badouin, 04120 Košice, Slovak Republic 42 000 Saint-Etienne, France {Milos.Drutarovsky, Martin.Simka} {fischer, celle}@univ-st-etienne.fr @tuke.sk Abstract. The paper presents a simple true random number generator (TRNG) which can be embedded in digital Application Specific Integrated Circuits (ASICs) and Field Programmable Logic Devices (FPLDs). As a source of randomness, it uses on-chip noise generated in the internal analog phaselocked loop (PLL) circuitry. In contrast with traditionally used free running oscillators, it uses a novel method of randomness extraction based on two rationally related synthesized clock signals. The generator has been developed for embedded cryptographic applications, where it significantly increases the system security, but it can be used in a wide range of applications. The quality of TRNG output is confirmed by applying special statistical tests, which pass even for high output bit-rates of several hundreds of kilobits per second.
1 Introduction Producing unpredictable, i.e. irreproducible uniformly distributed, random number sequences on hardware is one of the central issues of cryptographic software and hardware implementations. Random numbers are needed in particular for key generation, authentication protocols, zero-knowledge protocols, padding, in many digital signature schemes and even in some encryption algorithms. In all these applications, security greatly depends on the quality of the source of random numbers. TRNGs can be produced using any non-deterministic process. The most typical implementations use quantum mechanics [1] which can produce up to one Mbit per second. Radioactivity can also provide true random bits [2] but at a much lower rate of several hundreds of bits per second. Many other hardware generators exist. They usually exploit the thermal noise (resistance or shoot) in electronic devices. In the well-known Intel Random Number Generator [3] a slow clock signal modulated by a thermal noise is sampled using fast clock signal from a free running oscillator. The drift between the two clocks provides the source of random bits with an average rate of 75 Kbits per second.
Our aim was to find a solution completely embedded in a digital circuit. These circuits are well suited for implementation of so called pseudo-random number generators. However, pseudo-random number generators cannot provide sufficient security, because they are predictable. Digital circuits include only limited sources of randomness, e. g. metastability, frequency of a free-running oscillator, clock jitter, etc. Usually, reliable generators based on the metastability and/or frequency instability are difficult to implement and they are not secure enough for cryptographic applications. The entropy increase is not sufficiently high in some cases, so the output of generator can be predicted [4]. In another case [5] off-chip components are required what decreases a cryptographic security of the implementation. In contrast with these methods, in [6] we have proposed a novel method of randomness extraction based on two rationally related synthesized clock signals. It was shown that it is perfectly suited for modern FPLDs with the internal analog phase-locked loop (PLL) circuitry (e.g. Apex or Stratix FPLDs from Altera). The generator provided random data with only small deviations from ideal one. In this paper we present a modified version of our generator [6] having simplified structure and higher data rate. It is based on an extended knowledge of jitter features obtained by measurements in real conditions on Altera development boards.
2 The PLL-synthesized clock jitter 2.1 Analog PLLs embedded in digital circuits New digital VLSI circuits use advanced clock generation and distribution circuitry based on embedded analog PLLs [7], [8], [9], [10]. A simplified block diagram of one analog based PLL block available in advanced digital circuits is depicted in Figure 1. Each PLL block can provide at least one synthesized clock signal with frequency FOUT : FOUT = FIN
K m = FIN M n× k KD
(1)
where FIN is the frequency of the external input clock source. Reference-, feedback- and post-divider values n, m and k can vary from one to several hundreds in FPGAs [9], [10], or to several thousands in ASICs [8]. :n Input Clock FIN
Phase Frequency Detector
:m
Charge Pump & Loop Filter ClockShift Circuitry
VCO
:k Output Clock m n×k
FOUT = FIN
Figure 1: Simplified block diagram of an embedded PLL circuitry
2.2 Jitter of PLL synthesized clock signals In analog PLLs, a noise causes the Voltage Controlled Oscillator (VCO) to fluctuate in frequency. Other frequency fluctuations are caused by variations of supply voltage, temperature and by a noisy environment. The internal control circuitry adjusts the VCO
back to the specified frequency, but a certain part of the fluctuations caused by nondeterministic noise cannot be compensated for and is seen as a clock jitter. The size of the intrinsic jitter depends on the quality factor Q of the VCO, on the bandwidth of the Loop Filter and on the so-called pattern jitter introduced by the Phase Frequency Detector. It is often given in peak-to-peak value or 1-sigma (or RMS) value. 1sigma value of the jitter ( σ jit ) depends on the technology and the configuration of the PLL and it can range from 3.5 ps to 10 ps for ASICs [8] up to 50 ps for FPGAs [9], [10]. Since the technology of the PLL and the quality of the VCO is usually defined, a user can change the output jitter by modification of divider values and filter bandwidth. For example the jitter in an Apex FPLD has 1-sigma value of σ jit ≈ 15.9 ps for a FOUT = 66.6 MHz clock signal and multiplication factor 2 [11]. These results were acquired under “ideal conditions”, with only a minimal amount of FPLD resources occupied and minimal input/output activities. Our last measurements (cf. in subsection 2.3) show that the clock jitter in an Apex FPLD is significantly higher (about 140 ps ) for higher multiplication factors and when internal FPLD flip-flops are switching on different clock frequencies. 2.3 Jitter size measurement Since the knowledge of the jitter size and its statistical features is crucial for correct settings of the TRNG parameters, several measurements have been made for different configurations of the PLL (different values K M and K D ). We used Agilent Infiniium DCA 86100B wide bandwidth oscilloscope and the Nios development board [12] with Apex EP20K200 device. The 1-sigma value of the jitter measured at the output of the PLL has achieved values σ jit ≈ 32 ps for FIN = 33.3 MHz and K M K D = 2 1 . For ratios K M K D = 3 2, 4 3,5 4,6 5… the 1-sigma value was in the range σ jit ≈ 47 − 64 ps and the jitter has approximately Gaussian distribution as it is illustrated in Figure 2a) for configuration K M K D = 6 5 . However, for K M K D = 157 48 used in [6], the jitter was σ jit ≈ 140 ps and it exhibited two peaks with a total size of 600 ps (peak-to-peak). The obtained jitter distribution is depicted in Figure 2b) and it is compatible with a reference measurement made by Xilinx on Altera FPLDs [11], where a two-peak jitter distribution has been documented. Since the jitter included in the clock signal in real conditions was significantly higher than that documented by Altera (note, that Altera has used the same kind of development board, only K M and K D parameters were different), we could significantly reduce the generator complexity. As far as K M and K D parameters are chosen properly, the proposed method is insensitive to the jitter distribution (see section 3 and [6]).
a) Figure 2: The jitter probability distribution for a) K M
b) K D = 6 5 b) K M K D = 157 48
3 Randomness extraction from the PLL-generated clock jitter The basic principle behind our method is to extract the randomness from the jitter of the clock signal synthesized in the embedded analog PLL. The jitter is detected by the sampling of a reference (clock) signal using a rationally related (clock) signal synthesized in the onchip analog PLL. The fundamental problem lies in the fact that the reference signal has to be sampled near the edges influenced by the jitter. The basic structure of the random bitstream generator is depicted in Figure 3. CLK
PLL
CLJ
D
Q
CLK
D
Q
q(nTCLK)
Decimator (NKD)
x(nNTQ)
CLK
Figure 3: Basic structure of random bitstream generator using PLL-synthesized low-jitter clock signal
Because there is a probability that the first flip-flop could become metastable, the second flip-flop is cascaded. In case the first flip-flop produces a metastable output, it can resolve until its output is clocked by the second flip-flop. This flip-flops connection does not assure that only stable signal is clocked, but the probability that the output q(nTCLK ) will get a valid state is much higher [13]. Let CLJ be an on-chip PLL-synthesized rectangular clock waveform with the frequency (2) where CLK is a reference clock signal and parameters K M and K D defined in (1) are related to the PLL structure. Signal CLJ is sampled into the D flip-flop using a clock signal with frequency FCLK . There are K D rising edges of CLK signal and 2 K M edges (rising and falling) of CLJ waveform during time period FCLJ = FCLK K M / K D
TQ = K DTCLK = K M TCLJ
(3)
It has been shown in [6] that if K M and K D are relative primes, the set of samples create an equidistant set of values with the distance step d=
TCLK T GCD (2 K M , K D ) = CLJ GCD (2 K M , K D ) 2KM 2K D
(4)
where GCD means Greatest Common Divisor. It has been shown that the worst-case distance between the two closest edges of CLK and CLJ during the period TQ is given as MAX (∆Tmin ) = d / 2
If K M , K D and FCLJ are chosen so, that
σ jit > MAX (∆Tmin )
(5) (6)
we can guarantee that during TQ the sampling edge of CLK will fall at least once into the edge zone of CLJ (the edge zone means the time interval around the edge with a width smaller than σ jit ). Therefore during the period TQ , K D values of CLJ will be sampled into the first D flip-flop and at least one of them will statistically depend on the random jitter, so the output value q(nTCLK ) of the second flip-flop will be nondeterministic. In [6] we used delay elements to increase the probability of overlapping of CLK and CLJ edge zones.
Thanks to the measurements we got the information about the jitter value, and also about the probability of edges overlapping that both are higher as expected. Therefore the delay line is not needed anymore. The decimated output signal x (nTQ ) = q (nTQ ) ⊕ q (nTQ − TCLK ) … ⊕ q (nTQ − ( K D −1)TCLK )
(7)
which is generated at the output of an Exclusive-OR (XOR)-based decimator as a bit-wise addition modulo 2 ( ⊕ ) of K D samples q (.) sampled with the frequency FCLK , will be nondeterministic, too. It can be seen that in reference to [6] we have changed basic structure of the generator in several ways: - because of higher jitter value, which has been approved by precise jitter measurements we could replace the delay line and the bank of flip-flops by a single flip-flop, - the metastability behavior of the signal q(nTCLK ) and thus of the generator in general was improved by addition of the second flip-flop, - we have validated that if condition (6) is fulfilled, the speed of the generator can be increased by reducing the decimation factor to NK D , where N = 1 is number of TQ periods.
4 TRNG implementation We have validated our simplified structure of the random bitstream generator using Altera analog PLLs embedded in Apex E family. We have used an evaluation board with a PC Card interface. The Apex EP20K160 device has included generator, 16x128-bit FIFO, PC Card interface and a custom logic. As the best option a 2-PLL configuration (shown in Figure 4) with only one input clock signal has been chosen. Synthesized clock signals CLK and CLJ are not fed out from the FPLD (in design presented in [6] one synthesized signal has been fed out of the device and again reused in FPLD, what is definitely less secure). Therefore they cannot be manipulated separately without a circuit reconfiguration. This fact is very important for cryptographic applications, because it significantly improves overall system security. APEX EP20K160E-2X
Serial output 454 kbits/s fCLJ (96,4 MHz)
clk1
Random bitstream generator
clk0
fCLK (95 MHz)
PLL4
Oscil 40 MHz
CLK4p
S/P Conv.
FIFO 16x128
inclk
PLL2 clk1
CLK2p (NC)
inclk clk0
Custom logic & PC Card interface
CLKLK_FB2p (NC) CLKLK_OUT2p (NC)
Figure 4: Block diagram of the experimental PC card with Apex EP20K160 ETC144-2x
Multiplication and division factors for individual output signals were selected as follows (the first number represents the PLL index and the second number the output index): clk 20 = 40 × 53 22 = 96.36 (MHz ) clk 40 = 40 ×19 38 = 20 (MHz) clk 41 = 40 ×19 8 = 95(MHz )
Therefore MAX (∆Tmin ) ≈ 12 ps (see equations (4) and (5)). This value is much smaller than measured jitter value σ jit (see section 2.3). The TRNG was implemented for N = 1 and so RTRNG ≈ 45400 bit/s (about six times more than in [6]). This was done intentionally in order to check generated output bitstream under more stringent conditions. The FPLD resource requirements of the proposed TRNG blocks as well as supporting logic (FIFO and control logic) are shown in Table 1. The first four columns show resource requirements (Logic Cells, Embedded System Blocks (ESB), PLL outputs and Global Clocks used) for the generator. The other four columns give requirements of the complete circuit including FIFO buffer and data bus controller. Presented results have been obtained using Altera Quartus II v. 2.2. Table 1: TRNG resource requirements
LC 19
TRNG only ESB PLL GCLK 0 2 2
TRNG + FIFO and logic LC ESB PLL GCLK 129 2 2 3
While 19 logic cells represent less than 1% of available cells, the project uses 100% of available PLLs (two PLLs). Since each PLL contains two clock outputs, two clock outputs remain available (one at each PLL) and with some restrictions, they can be used inside the application area (e. g. in our case for the PC Card interface). Internal usage of PLL clock signals represents no security problem as these signals cannot be externally manipulated.
5 Statistical evaluation of TRNG There are some well-documented general statistical tests that can be used to look for deviations from an ideal TRNG [14]. A good TRNG should pass all kinds of tests. 5.1 Results of the NIST tests Our NIST statistical tests were performed on 1 Gigabit of continuous TRNG output records (for N = 1 ) and followed the testing strategy, general recommendations and result interpretation described in [14]. We have used a set of 1024 1-Megabit sequences produced by the generator and we have evaluated the set of P -values at a significance level α = 0.01 . We have got similar results as in [6] (with the exception of FFT test, see below) so we can conclude that there are no detectable differences for ensemble of 1024 1-Megabit records. Results of these tests are not included in the paper due to space limitation. It was claimed in [6] that some 1-Gbit TRNG records did not pass NIST FFT tests. During evaluation of proposed new TRNG implementation we have found errors in NIST FFT test formulation (confirmed also in [15]). After correction of these errors in NIST package all the FFT tests pass without any problems for all tested (1-Gbit) records. To
prove even further the quality of the generator, we have applied also very strict Frequency (Monobit) tests for very long records. 5.2 Frequency (Monobit) test of very long record The most common statistical test of TRNGs is the Frequency (Monobit) test [14]. Good TRNGs should provide independent binary (Bernoulli) random variables 0 and 1 with the same probability. For a sequence of independent identically distributed Bernoulli random variables x (nNK D ) we can define variable S n = X 1 + … + X n , where X n = 2 x (nNK D ) −1 are antipodaly encoded values {−1,1} . By the classic De Moivre-Laplace theorem [14], for a sufficiently large number of trials, the distribution of the normalized binomial sum sn = Sn / n is closely approximated by a standard normal distribution N (0,1) and, roughly said, S n < 3 n for almost all n (note that we used extremely long TRNG record in order to detect also very small deviations). Results for 74-Gbit record (bottom curve, for decimation factor N = 1 ) and 37-Gbit record (upper curve, for N = 2 ) are shown in Figure 5. Small deviation for N = 1 is visible. After decimation by factor 2, deviation is within expected values (marked as parabola with peak point in 0), but it remains visible. This is the only deviation from an ideal TRNG currently known to us. On the other hand, the bias decreasing using the XOR decimator is applicable only for independent bits. The fact that we obtained lower bias after decimation gives us indirect evidence that output bits are independent and very close to the perfectly random bits. 1
x 10
Cummulated "1" and "0" Difference
6
0.5
Difference
0
b)
-0.5
-1
-1.5
a)
-2
-2.5
-3
0
1
2
3 4 5 Record Length
6
7
8 x 10
10
Figure 5: Results of frequency tests of very long TRNG output: a) 74 Gbits ( N = 1 ) b) 37 Gbits ( N = 2 )
6 Conclusions In this paper we have described and evaluated a simplified method of true random bitstream generation inside digital VLSI circuits. The design of the TRNG and the method of randomness extraction guarantee that the output depends on a physical undeterministic
internal random process. The randomness of the sequence of numbers has been extensively tested and only extremely small differences from an ideal TRNG have been detected. The proposed solution is very cheap, it uses a very small number of logic resources and it is faster than comparable methods. Although the functionality of the proposed solution has been demonstrated for Altera Apex family, the same principle can be used for all recent high-performance ASICs or FPLDs that include an on-chip reconfigurable analog PLL for clock synthesis. In the future we plan a deeper research of the jitter and its measurement, as its statistical features directly affect the statistical features of the output random sequence.
Acknowledgments This work has been done in the frame of the project CryptArchi included in the French national program ACI Cryptologie and Slovak scientific project VEGA 1/1057/04.
References [1] T. Jennewein, U. Achleitner, G. Weihs, H. Weinfurter, A. Zeilinger, “A fast and compact quantum random number generator”, Rev. Sci. Inst. 71, 1675-1680 (2000) [2] J. Walker, “Hotbits: Genuine random numbers, generated by radioactive decay”, online at www.fourmilab.ch/hotbits [3] B. Jun, P. Kocher, "The INTEL Random Number Generator", Cryptography Research, Inc., White Paper prepared for Intel Corporation, April 1999, pp.1-8, www.intel.com [4] M. Dichtl, “How to Predict the Output of a Hardware Random Number Generator”, In C.D. Walter, C. K. Koc, Ch. Paar (Eds.): CHES 2003, LNCS 2779, Springer, Berlin, 2003, pp.181-188 [5] K.H. Tsoi, K.H. Leung, P.H. Leong, “Compact FPGA-based True and Pseudo Random Number Generators”, Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), California USA, 2003, pp.51-61 [6] V. Fischer, M. Drutarovský, “True Random Number Generator Embedded in Reconfigurable Hardware”, In B.S.Kaliski Jr. et al. (Eds.): CHES 2002, LNCS 2523, Springer, Berlin, 2003, pp.415-430 [7] J. G. Maneatis, “Selecting PLLs for ASIC Applications Requires Tradeoffs”, Planet Analog Magazine 9/2003, www.planetanalog.com [8] “XpressArray High Density 0.18um Structured ASIC”, WEB page of the AMI Semiconductors Company, www.amis.com [9] “Using the ClockLock & ClockBoost PLL Features in Apex Devices”, Altera Application Note 115, v.2.3, May 2002, pp.1-55, www.altera.com [10] “Using PLLs in Stratix Devices”, Altera Application Note 200, v.1.0, February 2002, pp.1-70, www.altera.com [11] “Superior Jitter Management with DLLs”, Virtech Tech Topic VTT013 (v.1.2), January 21, 2003, pp.1-6, www.xilinx.com [12] “Nios Embedded Processor Development Board”, Altera Data Sheet, v.2.1, April 2002, pp.1-22, www.altera.com/nios [13] “Metastability in Altera Devices”, Altera Application Note 42, v.4.0, May 1999, pp.1-10, www.altera.com [14] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, S. Leigh, M. Levenson, M. Vangel, D. Banks, A. Heckert, J. Dray, S. Vo, “A statistical test suite for random and pseudorandom number generators for cryptographic applications“, NIST Special Publication 800-22, May 15, 2001, pp.1-153, http://csrc.nist.gov/rng/ [15] S. Kim, K. Umeno, and A. Hasegawa, “Corrections of the NIST Statistical Test Suite for Randomness”, Cryptology ePrint Archive, Report 2004/018, January 26, 2004, http://eprint.iacr.org/2004/018.pdf