
High Performance True Random Number Generator in Altera Stratix FPLDs

Viktor Fischer¹, Miloš Drutarovský², Martin Šimka², and Nathalie Bochard¹

¹ Laboratoire Traitement du Signal et Instrumentation, Unité Mixte de Recherche CNRS 5516, Université Jean Monnet, 10, rue Barrouin, 42000 Saint-Etienne, France
{Fischer,Nathalie.Bochard}@univ-st-etienne.fr

² Department of Electronics and Multimedia Communications, Technical University of Košice, Park Komenského 13, 04120 Košice, Slovakia
{Milos.Drutarovsky,Martin.Simka}@tuke.sk

Abstract. The paper presents a high performance True Random Number Generator (TRNG) embedded in Altera Stratix Field Programmable Logic Devices (FPLDs). As a source of randomness, an on-chip noise generated in the internal analog Phase-Locked Loop (PLL) circuitry is used. In contrast with traditionally used free running oscillators, it uses and extends a recently developed method of randomness extraction based on two rationally related clock signals. Although it was developed for the Stratix family, the principle can be easily employed in other digital devices containing analog PLLs. We use the large flexibility of PLLs embedded in Stratix family to demonstrate the relationship between PLL and TRNG configuration, the quality of output random bit-stream, and the speed of the generator. The quality of TRNG output is confirmed by applying statistical tests, which pass also for a high-speed version of the generator giving up to 1M random bits per second. The generator developed for cryptographic applications helps to increase the system security, but it can also be used in a wide range of other applications.

1 Introduction

The issue of random number generation is becoming crucial for the implementation of cryptographic systems in Field Programmable Logic Devices (FPLDs). Random numbers are needed in particular for key generation, authentication protocols, zero-knowledge protocols, padding, in many digital signature schemes, and even in some encryption algorithms [1]. In all these applications, security greatly depends on the quality of the source of randomness. The quality of the generated numbers is assessed by statistical tests. In addition to good statistical properties of the obtained numbers, the output of a generator used in cryptography must be unpredictable. For this reason, pseudo-random generators, although easily implementable in FPLDs, are not suitable for cryptographic applications.

It is well known that most attacks are directed at the implementations of cryptographic algorithms and not at the algorithms themselves. This means that special attention should be paid to avoiding weaknesses that help the attacker to break a system. Our aim was to find a solution that can be embedded completely in a modern FPLD. Digital circuits of modern FPLDs include only limited sources of randomness, e.g. metastability, frequencies of free-running oscillators, clock jitter, etc. Usually, reliable and fast True Random Number Generators (TRNGs) based on metastability are very difficult to implement. Free-running oscillators are typically used in known FPLD-based TRNGs [2, 3]. In principle, TRNGs based on free-running oscillators and the intrinsic jitter contained in digital circuits can be used without any additional FPLD resources. However, the actual implementation in [3] uses off-chip components, which generally decrease the cryptographic security of the implementation. The implementation in [2] requires very careful placement of ring oscillator pairs embedded in a Xilinx FPLD. It can provide random bits at speeds up to 0.5 Mbit/s with good statistical characteristics.

In [4] we proposed a novel method of randomness extraction based on two rationally related synthesized stable clock signals. It was shown that the method is well suited for modern FPLDs with internal analog Phase-Locked Loop (PLL) circuitry (e.g. Apex, Cyclone or Stratix FPLDs from Altera [5, 6]). In this paper we present a deeper analysis of the possible generator configurations. In addition, we describe a detailed methodology for the design of the TRNG, so that the reader gets a complete overview of how to set the parameters of the TRNG for given requirements. We use the large flexibility of PLLs embedded in Stratix FPLDs to demonstrate the relationship between PLL and TRNG configurations, the quality of the output random bit-stream, and the speed of the generator. Although the TRNG was developed for the Altera Stratix family of devices, the principle can be easily employed in other digital devices containing analog PLLs.
The paper is organised as follows: in Section 2 we describe the PLL circuitry as the source of random jitter. Section 3 is dedicated to the basic principle of our TRNG. In Section 4 we present a general design methodology and propose several TRNG configurations for evaluation. In Section 5 we show the experimental results and discuss the features and possible advantages and limitations of the proposed configurations. Finally, Section 6 presents conclusions and perspectives.

2 PLL Blocks Embedded in Stratix FPGAs

Recent ASICs and FPLDs generate clock frequencies using PLL circuits that multiply the frequency of an external low-frequency crystal by an order of magnitude. The analog variant of the PLL implemented in Altera FPLDs offers a source of unpredictable randomness applicable in cryptography. Each PLL block can provide at least one synthesized clock signal with frequency $F_{OUT}$ [5]:

$$F_{OUT} = F_{IN}\,\frac{m}{n \times k} = F_{IN}\,\frac{K_M}{K_D} \qquad (1)$$

where $F_{IN}$ is the frequency of the external input clock source. The Altera Stratix devices include two types of PLLs:

Fast PLL (FPLL): Stratix devices include up to 8 FPLLs. The FPLLs offer general-purpose clock management with multiplication and phase shifting. The multiplication is simplified in comparison to the EPLL and uses only m/k scaling factors with a range from 1 to 32 [5]. Depending on m, the input frequency can vary (for speed grade -5) from 15 to 717 MHz, the output frequency from 9.4 to 420 MHz, and the frequency of the Voltage Controlled Oscillator (VCO) from 300 to 1000 MHz.

Enhanced PLL (EPLL): Compared to FPLLs, EPLLs have additional configurable features such as external feedback, configurable bandwidth, run-time reconfiguration, etc. They also have an enhanced range of parameters. For a speed grade -5 device, the input frequency can vary from 3 to 684 MHz, the output frequency from 9.4 to 420 MHz and the frequency of the VCO from 300 to 800 MHz. Reference-, feedback- and post-divider values n, m and k can vary from 1 to 512 (1024 for k) with a 50% duty cycle [5].

2.1 Jitter Generated in Stratix PLLs

In analog PLLs, various noise sources cause the internal VCO to fluctuate in frequency. The internal control circuitry adjusts the VCO back to the specified frequency and this change is seen as jitter. Under ideal conditions, the jitter is caused only by analog (non-deterministic) internal noise sources, and is referred to as intrinsic jitter. Other possible frequency fluctuations are caused by variations of supply voltage, temperature, external interference through the power and ground lines, or by the internal noisy environment generated by internal FPLD circuits [7]. The size of the intrinsic jitter depends on the quality factor Q of the VCO, on the bandwidth of the loop filter and on the so-called pattern jitter introduced by the phase frequency detector. The intrinsic jitter is often given as a peak-to-peak value or a 1-sigma (RMS) value. The 1-sigma value of the jitter ($\sigma_{jit}$) depends on the technology and the configuration of the PLL and it can range up to 100 ps [5, 6]. Since the technology of the PLL and the quality of the VCO are usually fixed, a user can change the output jitter directly by modifying the scaling factors (for FPLL and EPLL) and the filter bandwidth (only for EPLL), but also indirectly by the design of the board (separation of the analog and digital ground, filtering of the analog power supply, etc.). Since the size of the jitter is very important for our method, we needed to measure it for various PLL configurations. To reduce the subjectivity of the board design strategy, we selected the Altera DSP Development board with a Stratix EP1S25F780C5 device [8] for jitter measurements and TRNG implementation. The jitter was measured in the same way as in [9], using an Agilent Infiniium DCA 86100B wide-bandwidth oscilloscope. We found that, in comparison to the Nios board with APEX [10] (used as a reference in [4]), the jitter is significantly smaller.
For example, for the FPLL and the ratio 12/7 the jitter achieves a 1-sigma value of about 10 ps (see Figure 1(a)) and for the EPLL and the ratio 139/133 the 1-sigma value of the jitter is about 16 ps (see Figure 1(b)). Note that this value depends on the PLL settings and the type of power supply filter included on the development board, but is never lower than the internal intrinsic jitter of the FPLD.

Fig. 1. Jitter of the clock signal (horizontal scale: 200 ps/div): (a) FPLL, (b) EPLL

[Figure 2: block diagram with input clock CLI, PLL1 and PLL2, clock signals CLK and CLJ, D flip-flops D1 ... DN with delay elements ∆1, ∆2, ..., the sampled signal q(nT_CLK), and a decimator producing the output x(nT_Q)]

Fig. 2. Basic structure of the TRNG

3 Principle of the PLL-based TRNG

The basic principle behind our method is to extract the randomness from the jitter of the clock signal synthesized in the embedded analog PLL. The jitter is detected by sampling a reference (clock) signal using a rationally related (clock) signal synthesized in the on-chip analog PLL. The fundamental problem lies in the fact that the reference signal has to be sampled near the edges influenced by the jitter. The basic structure of the random bitstream generator is depicted in Figure 2. Let CLJ be an on-chip PLL-synthesized rectangular clock waveform with the frequency

$$F_{CLJ} = F_{CLK}\,\frac{K_M}{K_D} \qquad (2)$$

where CLK is a reference clock signal and the parameters $K_M$ and $K_D$ defined in (1) are related to the PLL structure. Signal CLJ is sampled into the D flip-flop using a clock signal with frequency $F_{CLK}$. There are $K_D$ rising edges of the CLK signal and $2K_M$ edges (rising and falling) of the CLJ waveform during the time period

$$T_Q = K_D\,T_{CLK} = K_M\,T_{CLJ}. \qquad (3)$$

It has been shown in [4] that if $K_M$ and $K_D$ are relatively prime, the set of samples forms an equidistant set of values. The worst-case distance between the two closest edges of CLK and CLJ during the period $T_Q$ is given as

$$\mathrm{MAX}(\Delta T_{min}) = \frac{T_{CLK}}{4K_M}\,\mathrm{GCD}(2K_M, K_D) = \frac{T_{CLJ}}{4K_D}\,\mathrm{GCD}(2K_M, K_D) \qquad (4)$$

where GCD means Greatest Common Divisor. If $K_M$, $K_D$, and $F_{CLJ}$ are chosen so that

$$\sigma_{jit} > \mathrm{MAX}(\Delta T_{min}) \qquad (5)$$

we can guarantee that during $T_Q$ the sampling edge of CLK will fall at least once into the edge zone of CLJ (the edge zone means the time interval around the edge with a width smaller than $\sigma_{jit}$). Therefore, during the period $T_Q$, $K_D$ values of CLJ will be sampled into the D flip-flop and at least one of them will statistically depend on the random jitter, so the output value $q(nT_{CLK})$ of the flip-flop will be nondeterministic. In [4] we used delay elements to increase the probability of overlapping of CLK and CLJ edge zones. In [9] we showed that the delay line is not needed for known values of jitter, when $\sigma_{jit} \gg \mathrm{MAX}(\Delta T_{min})$. The decimated output signal

$$x(nT_Q) = q(nT_Q) \oplus q(nT_Q - T_{CLK}) \oplus \dots \oplus q(nT_Q - (K_D - 1)T_{CLK}), \qquad (6)$$

which is generated at the output of an Exclusive-OR (XOR)-based decimator [11] as a bit-wise addition modulo 2 ($\oplus$) of samples $q(\cdot)$ sampled with the frequency $F_{CLK}$, will be nondeterministic, too.
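As a minimal sketch of relations (3), (4) and (6), the following Python code compares the closed-form value of MAX(∆Tmin) with a brute-force evaluation of the CLK sampling positions, and models the XOR decimator on a list of already sampled bits. It is not part of the design described in the paper (which is implemented in hardware); the function names are illustrative assumptions and the time unit is arbitrary.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from operator import xor


def max_dt_min(km, kd, t_clj):
    """Closed form of equation (4), expressed through T_CLJ."""
    return Fraction(t_clj) * gcd(2 * km, kd) / (4 * kd)


def max_dt_min_brute(km, kd, t_clj):
    """Brute-force counterpart of equation (4).

    The K_D rising edges of CLK within T_Q are reduced modulo T_CLJ/2
    (the spacing of CLJ edges). For the least favourable phase offset a
    CLJ edge sits in the middle of the largest gap between neighbouring
    sampling positions, so the result is half of that gap.
    """
    half = Fraction(t_clj) / 2           # CLJ edges repeat every T_CLJ/2
    t_clk = Fraction(t_clj) * km / kd    # equation (3): K_D*T_CLK = K_M*T_CLJ
    phases = sorted({(i * t_clk) % half for i in range(kd)})
    gaps = [b - a for a, b in zip(phases, phases[1:])]
    gaps.append(half - phases[-1] + phases[0])  # wrap-around gap
    return max(gaps) / 2


def xor_decimate(q, kd):
    """Equation (6): XOR each block of K_D consecutive samples into one bit.

    Trailing samples that do not fill a complete block are dropped.
    """
    usable = len(q) - len(q) % kd
    return [reduce(xor, q[i:i + kd]) for i in range(0, usable, kd)]


if __name__ == "__main__":
    for km, kd in [(144, 175), (212, 207), (3, 4)]:
        assert max_dt_min(km, kd, 1) == max_dt_min_brute(km, kd, 1)
    print(xor_decimate([1, 0, 1, 1, 0, 0], 3))  # [0, 1]
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the two computations can be compared for equality without floating-point tolerance.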

4 TRNG Architectures Embedded in Stratix FPGAs

As can be seen in Figure 2, the TRNG can be designed using one or two PLLs, depending on the position of the switch. Our implementation strategy was to get the fastest and best-quality generator using a minimum amount of resources (PLLs). Since the Stratix family contains two types of PLLs, several configurations are possible. Although the most economical solution would be based on the use of one FPLL (since there are four FPLLs in the selected device), the multiplication and division factors of a single FPLL cannot fulfil the implementation condition (5). However, the extended range of parameters of the EPLL enables one to build a single-PLL TRNG. For this reason, the following four architectures of the TRNG implemented in Stratix devices are possible:

1. Two FPLLs (referenced further as configuration A)
2. One FPLL and one EPLL (configuration B)
3. One EPLL (configuration C)
4. Two EPLLs (configuration D)

To follow our implementation strategy, we have analyzed the influence of the individual parameters of the PLLs on the output bit rate and on the sensitivity to the jitter expressed through the parameter $\mathrm{MAX}(\Delta T_{min})$ defined in (4). Next, we present relations between the TRNG parameters that are important in the TRNG design (note that for a single-PLL configuration, $M_{CLK}$ and $D_{CLK}$ are equal to one). The PLLs' output frequencies can be expressed as:

$$F_{CLK} = \frac{M_{CLK}}{D_{CLK}}\,F_{CLI} \qquad (7)$$

$$F_{CLJ} = \frac{M_{CLJ}}{D_{CLJ}}\,F_{CLI} = \frac{M_{CLJ}\,D_{CLK}}{D_{CLJ}\,M_{CLK}}\,F_{CLK} = \frac{K_M}{K_D}\,F_{CLK}. \qquad (8)$$

Since the TRNG requires at least

$$\mathrm{MAX}(\Delta T_{min}) \approx \sigma_{jit} \qquad (9)$$

or better

$$\mathrm{MAX}(\Delta T_{min}) \ll \sigma_{jit}, \qquad (10)$$

the first practical design condition is (see equation 4):

$$\mathrm{GCD}(2K_M, K_D) = 1. \qquad (11)$$

If condition (10) is not fulfilled, the quality of the random bitstream output can be enhanced to some extent by the use of the delay elements and D flip-flops depicted in dashed lines in Figure 2. Now, let us characterize the relationship between the jitter and the output bitrate of the TRNG. For the sensitivity to the jitter we get:

$$F_{CLI}\,\mathrm{MAX}(\Delta T_{min}) = \frac{1}{4\,M_{CLK}\,M_{CLJ}}, \qquad (12)$$

so decreasing $\mathrm{MAX}(\Delta T_{min})$ for fixed $F_{CLI}$ requires maximization of $M_{CLK}$ and $M_{CLJ}$. Coefficients $D_{CLK}$ and $D_{CLJ}$ have no influence on it. For the output bitrate $R = 1/T_Q = F_{CLK}/K_D$ we get the condition

$$R = \frac{F_{CLI}}{D_{CLK}\,D_{CLJ}} \qquad (13)$$

so increasing $R$ for fixed $F_{CLI}$ requires minimization of $D_{CLK}$ and $D_{CLJ}$. Of course, optimization of (12) and (13) cannot be done independently. There are system limits expressed by the condition

$$R = 4\,F_{CLK}\,F_{CLJ}\,\mathrm{MAX}(\Delta T_{min}). \qquad (14)$$

The application of the presented analysis of the TRNG design will be illustrated by several implementation examples given in the following section.
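The design relations (4), (7), (8), (11) and (14) can be sketched as a small Python helper that derives K_M, K_D, MAX(∆Tmin) and R from the PLL coefficients. Two things in this sketch are assumptions rather than statements from the text: the input frequency F_CLI = 80 MHz (chosen because it reproduces the R and MAX(∆Tmin) values listed for configurations A and C in Table 1) and the mapping of Table 1's PLL1 to the CLJ path and PLL2 to the CLK path (chosen because it reproduces the listed final K_M and K_D). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import gcd


@dataclass(frozen=True)
class TrngDesign:
    km: int               # final K_M (reduced)
    kd: int               # final K_D (reduced)
    f_clk: Fraction       # Hz
    f_clj: Fraction       # Hz
    max_dt_min: Fraction  # seconds
    bitrate: Fraction     # bits per second
    gcd_ok: bool          # design condition (11)


def design(f_cli, m_clk, d_clk, m_clj, d_clj):
    """Evaluate a one- or two-PLL configuration (use m_clk = d_clk = 1 for one PLL)."""
    f_cli = Fraction(f_cli)
    f_clk = f_cli * m_clk / d_clk                   # equation (7)
    f_clj = f_cli * m_clj / d_clj                   # equation (8)
    ratio = Fraction(m_clj * d_clk, d_clj * m_clk)  # K_M / K_D in lowest terms
    km, kd = ratio.numerator, ratio.denominator
    g = gcd(2 * km, kd)
    max_dt = g / (4 * km * f_clk)                   # equation (4): T_CLK/(4 K_M) * GCD
    bitrate = f_clk / kd                            # R = 1/T_Q = F_CLK / K_D
    return TrngDesign(km, kd, f_clk, f_clj, max_dt, bitrate, g == 1)


if __name__ == "__main__":
    F_CLI = 80_000_000  # assumed input frequency, not stated in the text
    configs = {"A": (25, 12, 12, 7), "C": (1, 1, 212, 207)}
    for name, (m_clk, d_clk, m_clj, d_clj) in configs.items():
        d = design(F_CLI, m_clk, d_clk, m_clj, d_clj)
        # System limit (14) holds whenever condition (11) is met.
        assert not d.gcd_ok or d.bitrate == 4 * d.f_clk * d.f_clj * d.max_dt_min
        print(name, d.km, d.kd,
              f"{float(d.max_dt_min) * 1e12:.1f} ps",
              f"{float(d.bitrate) / 1e3:.1f} kb/s")
```

The helper evaluates MAX(∆Tmin) through the general form (4) rather than the simplified form (12), so it stays valid when the ratio of coefficients can be reduced or when condition (11) is violated.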

5 Experimental Results

The TRNG architectures presented in Section 4 were tested on an Altera DSP board with a Stratix EP1S25F780C5 FPLD [8]. Acquired bits were transmitted to the PC through a parallel port. The complete TRNG design, including a 1024 x 8-bit FIFO and a parallel interface controller, needs up to 120 Logic Elements (LEs) out of about 25,000 LEs available in the device. The signal CLK was used as a clock signal for the control logic and was therefore limited to about 250 MHz (although the output frequency of the PLL can be higher). The TRNG architectures were described in VHDL and implemented using the Altera Quartus II development system, version 3.0 SP2. Because the jitter depends on an analog process, the real TRNG output cannot be simulated. In order to test the basic quality of different versions of TRNGs, we evaluated the following parameters (all of them were computed for a record length of N = 1,000,000 bits):

1. Bias, computed as

$$\mathrm{bias} = E[b(n)] - 0.5 = E[b] - 0.5 \cong \frac{N_1}{N} - 0.5 \qquad (15)$$

where $N_1$ is the number of $b(n) = 1$ for $n = 0, 1, \dots, N-1$. For a good TRNG, the bias should converge to 0 (with deviation $\approx \pm 3/\sqrt{N}$).

2. Maximal autocorrelation coefficient, computed as

$$\rho_{max} = \max\{|\mathrm{corr}(b_k)|,\; k = 1, 2, \dots, 100\} \qquad (16)$$

where

$$\mathrm{corr}(b_k) = \mathrm{corr}(b(n), b(n-k)) = \frac{E\left[\{b(n) - E[b(n)]\}\{b(n-k) - E[b(n-k)]\}\right]}{\sqrt{\mathrm{var}(b(n))\,\mathrm{var}(b(n-k))}} \qquad (17)$$

$$\mathrm{var}(b(n)) = \mathrm{var}(b) = E[\{b - E[b]\}^2] = E[b]\{1 - E[b]\} \qquad (18)$$

Based on [1, 11] it can be shown that for a good TRNG (with bias → 0) and a finite record length N, the scaled coefficient $\sqrt{N}\,\mathrm{corr}(b_k)$ approximately follows the standard normal distribution $\mathcal{N}(0, 1)$ and the following condition should be fulfilled (the value $\chi = 2.576$ follows from $P(X > \chi) = \alpha/2 = 0.01/2$ for the $\mathcal{N}(0, 1)$ distribution):

$$\rho_{max} \rightarrow \frac{2.576}{\sqrt{N}} = 0.002576 \qquad (19)$$

3. Standard FIPS140-2 statistical tests [12] that analyze 20,000-bit records and define thresholds to assess TRNG randomness. The FIPS140-2 tests include the Monobit, Poker, Runs and Long runs tests [1, 13]. We analyzed 100 sequences for each tested TRNG architecture and evaluated the relative number ($t_M$, $t_P$, $t_R$, $t_L$) of sequences that passed each test. A good TRNG should pass all FIPS140-2 tests so that $t_{NIST} = t_M\,t_P\,t_R\,t_L = 1$.
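The first two quality parameters can be sketched in plain Python as follows. This is an illustrative implementation of equations (15) to (19) under the assumption that the expectations are estimated by sample means over the record; it is not the evaluation code used for the reported results, and the function names are hypothetical. The FIPS140-2 tests are not reproduced here.

```python
import random
from math import sqrt


def bias(bits):
    """Equation (15): estimate of E[b] minus 0.5."""
    return sum(bits) / len(bits) - 0.5


def autocorr(bits, k):
    """Equations (17)-(18): correlation between b(n) and b(n-k)."""
    n = len(bits)
    mean = sum(bits) / n
    var = mean * (1.0 - mean)  # equation (18), valid for 0/1 values
    if var == 0.0:
        raise ValueError("constant sequence: correlation is undefined")
    cov = sum((bits[i] - mean) * (bits[i - k] - mean) for i in range(k, n)) / (n - k)
    return cov / var


def rho_max(bits, max_lag=100):
    """Equation (16): largest absolute autocorrelation for lags 1..max_lag."""
    return max(abs(autocorr(bits, k)) for k in range(1, max_lag + 1))


def bias_tolerance(n):
    """Expected deviation of the bias for a good generator: 3/sqrt(N)."""
    return 3.0 / sqrt(n)


def rho_threshold(n, chi=2.576):
    """Equation (19): chi/sqrt(N), i.e. 0.002576 for N = 1,000,000."""
    return chi / sqrt(n)


if __name__ == "__main__":
    # Pseudo-random bits stand in for TRNG output, purely for demonstration.
    rng = random.Random(2004)
    record = [rng.getrandbits(1) for _ in range(20_000)]
    print("bias    :", bias(record), "tolerance:", bias_tolerance(len(record)))
    print("rho_max :", rho_max(record, 20), "threshold:", rho_threshold(len(record)))
```

For records of 1,000,000 bits and 100 lags, a vectorized implementation would be preferable to these pure-Python loops; the loops are kept here only to mirror the formulas term by term.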

Table 1 includes parameters and results for selected TRNG architectures. As could be expected, the best output bitrate and quality (expressed through the bias, ρmax and tNIST) are obtained using a TRNG configuration with two EPLLs. Since the use of the delay line (from Figure 2) could hide quality differences between configurations, it has not been used to generate the results.

Table 1. Configuration parameters and quality evaluation of the tested TRNGs

Conf.  PLL1             PLL2             Final       MAX(∆Tmin)  R        σjit  Bias    ρmax   tNIST
       Type  KM   KD    Type  KM  KD     KM   KD     [ps]        [kb/s]   [ps]
A      Fast  12   7     Fast  25  12     144  175    10.4        952.4    10    -0.358  0.043  0
B      Enh.  43   7     Fast  25  12     516  175    2.9         952.4    23    0.054   0.023  0
C      Enh.  212  207   -     1   1      212  207    14.7        386.5    12    -0.003  0.012  0.96
D      Enh.  43   7     Enh.  31  10     430  217    2.3         1142.9   13    0.002   0.003  1

To emphasize the effect of the number of delay elements on the quality of the generated bitstream, we have chosen a lower quality TRNG (configuration A). The results presented in Table 2 show that if more than two elements are used, the bias and correlation coefficient are significantly reduced. Statistical parameters expressed through the parameter tNIST are less stable (below seven delay elements), because MAX(∆Tmin) in configuration A and the jitter have comparable size.

Table 2. Quality evaluation of configuration A for different number of delay elements

# of elements  Bias    ρmax    tNIST
0              -0.358  0.0433  0
1              0.175   0.011   0
2              0.024   0.002   0.007
3              0.030   0.003   0
4              -0.001  0.002   1
5              -0.021  0.003   0.014
6              -0.027  0.002   0
7              0.000   0.002   1
8              0.007   0.003   0.98
9              0.000   0.003   1

The influence of the parameter MAX(∆Tmin) (sensitivity to the jitter) on the quality of the generated bitstream can be seen in Table 3. We use configuration A with eight delay elements as a reference, but by changing the multiplication and division factors of both FPLLs we obtain various sensitivities and speeds of the generator. It can be seen that, in spite of the use of the delay line, the quality of the output bitstream is lower if MAX(∆Tmin) is bigger than the jitter (see the last two lines of Table 3). We can conclude that the best performance TRNG in the Stratix family can be obtained using two EPLLs. Usage of the delay elements can further improve the quality of the output. The final speed of the generator (more than 1 Mbit/s) is much higher than that presented in [4], while the quality remains comparable.

Table 3. Quality evaluation of configuration A with eight delay elements for different multiplication and division coefficients

Conf.  PLL1            PLL2            Final       MAX(∆Tmin)  R        σjit  Bias    ρmax   tNIST
       Type  KM  KD    Type  KM  KD    KM   KD     [ps]        [kb/s]   [ps]
A1     Fast  12  7     Fast  25  12    144  175    10.4        952.4    10    0.000   0.002  1
A2     Fast  12  7     Fast  23  12    120  161    11.3        1142.9   12    0.002   0.003  1
A3     Fast  12  7     Fast  17  12    72   119    15.3        1904.8   11    -0.007  0.003  0.98
A4     Fast  12  7     Fast  11  12    60   77     23.7        2285.7   14    0.133   0.032  0
A5     Fast  10  7     Fast  9   12    50   63     34.7        2285.7   15    -0.144  0.003  0

In order to demonstrate the quality of the proposed TRNG, we performed stricter statistical tests for the best version of the TRNG (configuration D with eight delay elements). There are some well-documented general statistical tests that can be used to look for small deviations from an ideal TRNG [12], [13]. A very good TRNG should pass many of these tests. We performed testing with the NIST test suite [12] including the latest known corrections [14]. Our NIST statistical tests were performed on 1 Gigabit of continuous TRNG output records and followed the testing strategy, general recommendations, and result interpretation described in [12]. We used a set of 1024 1-Megabit sequences produced by the generator and evaluated the set of P-values at a significance level α = 0.01. We did not find any detectable deviations for the ensemble of 1024 1-Megabit records. Results of these tests are not included in the paper due to space limitations.
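One part of the result interpretation recommended in [12] is the proportion criterion: for m tested sequences, the fraction of sequences with a P-value of at least α is expected to lie within (1 − α) ± 3·sqrt(α(1 − α)/m). The short Python sketch below evaluates that interval for the ensemble size used here. It assumes that this criterion is among those applied; the text does not state which criteria were evaluated, and the function names are hypothetical.

```python
from math import sqrt


def proportion_interval(num_sequences, alpha=0.01):
    """Acceptable range for the proportion of passing sequences (NIST SP 800-22)."""
    p_hat = 1.0 - alpha
    half_width = 3.0 * sqrt(p_hat * alpha / num_sequences)
    return p_hat - half_width, p_hat + half_width


def passing_proportion(p_values, alpha=0.01):
    """Fraction of sequences whose P-value is at least alpha."""
    return sum(1 for p in p_values if p >= alpha) / len(p_values)


if __name__ == "__main__":
    low, high = proportion_interval(1024)
    print(f"acceptable proportion for 1024 sequences: {low:.4f} .. {high:.4f}")
```

With 1024 sequences and α = 0.01 the lower bound is about 0.9807, so roughly 1004 of the 1024 sequences need to pass a given test.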

6 Conclusions

In this paper we have described the methodology and design of high performance PLL-based true random number generators embedded in modern FPLDs. We used the large flexibility of the analog PLLs embedded in the new Altera Stratix FPGA family to demonstrate the relationship between PLL parameters and TRNG configuration, the quality of the output random bit-stream, and the speed of the generator. The high quality of TRNG output was confirmed by applying special statistical tests, which are passed even for the high-speed version of the generator delivering more than 1M random bits per second. For the first time it was experimentally confirmed that delay-line elements can improve the quality of TRNG output if PLL jitter is very small.

The proposed solution is very cheap, uses few logic resources and is faster than comparable methods. Although the functionality of the proposed solution has been demonstrated for the Altera Stratix family, the same principle and design methodology can be used for all recent high-performance ASICs or FPLDs that include an on-chip reconfigurable analog PLL. The generator developed for embedded cryptographic applications helps to increase the system security, but it can also be used in a wide range of other applications.

Acknowledgments

This work was done in the framework of the project CryptArchi included in the French national program ACI Cryptologie (project number CR/02 2 0041) and the project VEGA 1/1057/04.

References

1. Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, New York (1996)
2. Kohlbrenner, P., Gaj, K.: An embedded true random number generator for FPGAs. In: Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, ACM Press (2004) 71–78
3. Tsoi, K., Leung, K., Leong, P.: Compact FPGA-based true and pseudo random number generators. In: Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM). (2003) 51–61
4. Fischer, V., Drutarovský, M.: True Random Number Generator Embedded in Reconfigurable Hardware. In Kaliski, Jr., B.S., Koç, Ç.K., Paar, C., eds.: Workshop on Cryptographic Hardware and Embedded Systems – CHES 2002. Volume 2523 of LNCS., Berlin, Germany, Springer-Verlag (2002) 415–430
5. Stratix Device Handbook: Volume 2, Chapter 1, Using General-Purpose PLLs in Stratix & Stratix GX Devices, v.2.2 (2003)
6. Altera Application Note 115: Using the ClockLock & ClockBoost PLL Features in Apex Devices, v.2.3 (2002)
7. Xilinx: Superior Jitter Management with DLLs, Virtex Tech Topic VTT013, v.1.2 (2003)
8. Altera Data Sheet: Stratix EP1S25 DSP Development Board, v.1.4 (2003)
9. Fischer, V., Drutarovský, M., Šimka, M., Celle, F.: Simple PLL-based True Random Number Generator for Embedded Digital Systems. In: Proceedings of IEEE Design and Diagnostics of Electronic Circuits and Systems Workshop – DDECS 2004, Stará Lesná, Slovakia (2004) 129–136
10. Altera Data Sheet: Nios Embedded Processor Development Board (APEX device), v.2.2 (2003)
11. Davies, R.B.: Exclusive OR (XOR) and hardware random number generators (2002)
12. Rukhin, A., et al.: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications. (NIST Special Publication 800-22) (revised May 15, 2002)
13. NIST FIPS PUB 140-2: Security Requirements for Cryptographic Modules (2001)
14. Kim, S., Umeno, K., Hasegawa, A.: Corrections of the NIST statistical test suite for randomness. Cryptology ePrint Archive, Report 2004/018 (2004)
