2013 13th International Conference on Computer-Aided Design and Computer Graphics
Design and Implementation of a Delay-based PUF for FPGA IP Protection Jiliang Zhang† , Qiang Wu† , Yongqiang Lyu‡ , Qiang Zhou‡ , Yici Cai‡ , Yaping Lin† and Gang Qu§ † College of Information Science and Engineering, Hunan University, Changsha, China Email: {hnu.zjl, wuqiang2000}@gmail.com ‡ RIIT&TNList&DCST, Tsinghua University, Beijing, China Email: {luyq, zhouqiang, caiyc}@tsinghua.edu.cn § Department of Electrical and Computer Engineering, University of Maryland, College Park, USA Email:
[email protected]
a challenge-response system that is completely the same as another. Owing to this unique property, PUFs are expected to be widely applied in areas such as integrated circuit Intellectual Property (IP) protection [1, 2], digital rights management, and key generation [4]. Field-programmable gate arrays (FPGAs) are programmable chips that users can configure to implement digital circuits of arbitrary functionality. Thanks to their reconfigurability [5], continuous improvement in quality (such as timing, area and power) and decreasing production cost, FPGAs have been widely used in computing acceleration, communication and other areas. Hence, low-cost, highly reliable and highly secure implementations of PUFs on FPGAs will enable many FPGA applications in security-related areas. For example, recent research indicates that unclonable fingerprints generated by on-chip PUFs, combined with the finite state machines of sequential circuits, can effectively address the vulnerability of FPGA IP cores to illegal duplication and distribution [20, 24]. In this paper, we first introduce related work on PUFs. We then describe our design and implementation of a delay-based PUF on 28nm FPGA devices, followed by the experimental data and the conclusion.
Abstract-A Physical Unclonable Function (PUF) makes use of the uncontrollable process variations that arise during IC production to generate a unique signature for each IC. It has wide application in security, such as FPGA Intellectual Property (IP) protection, key generation and digital rights management. The Ring Oscillator (RO) based PUF and the Arbiter-based PUF are the most popular PUFs, but they are not specifically designed for FPGAs. The RO-based PUF incurs high resource overhead while yielding fewer challenge-response pairs, and requires "hard macros" to be implemented on an FPGA. The Arbiter-based PUF has low resource overhead, but its structure is hard to map onto an FPGA. Anderson's PUF can address these weaknesses of current Arbiter-based and RO-based PUFs. However, it cannot be directly implemented on new-generation FPGAs and therefore has a scalability issue. To address these problems, this paper presents a delay-based PUF that uses the intrinsic structure of the FPGA (look-up table and multiplexer). The proposed delay-based PUF is completely realized on 28nm FPGAs. The experimental results show its high uniqueness and reliability. Moreover, we test the proposed PUF at high temperature, and the results show that it remains usable. Finally, the prospect of the proposed PUF in FPGA IP protection is discussed.
Keywords-physical unclonable functions (PUFs); FPGAs; intellectual property (IP) protection; IP cores; fabrication variation; watermarking; fingerprint; hardware security; EDA
I. INTRODUCTION
II. RELATED WORK
Due to the randomness of the manufacturing process, many aspects of chip production cannot be fully controlled. For each logic gate, the threshold voltage and gate oxide thickness will not be exactly the same, which gives rise to fabrication variation: chips built under the same conditions differ from one another. Although such differences are not significant enough to affect the function and performance of circuits, they can be used to distinguish between two chips, and we can take advantage of them in security design. A Physical Unclonable Function (PUF) is a physical system that exploits these differences. When a PUF is given a challenge, it produces a response. However, it is very difficult to predict the response (output) without physical access to the system. Even with very sophisticated manufacturing equipment, it is impossible to build
978-1-4799-2576-6/13 $31.00 © 2013 IEEE DOI 10.1109/CADGraphics.2013.22
The concept of the PUF was first proposed by Pappu [6]. It was followed by other kinds of PUF, which fall into four categories [7]: Non-electronic PUFs, Analog electronic PUFs, Memory-based intrinsic PUFs and Delay-based intrinsic PUFs. The Delay-based intrinsic PUF is currently the most active research topic. Non-electronic PUFs mainly include the Optical PUF [6], Paper PUF [8] and CD PUF [9]. Analog electronic PUFs mainly include the Coating PUF [10], LC-PUF (similar to Coating PUFs) [11], Threshold voltage-based PUF [12] and Impedance-based PUF [13]. Memory-based intrinsic PUFs mainly include the Static RAM PUF (SRAM PUF) [1] and Butterfly PUF [2]. The SRAM PUF consists of a large number of memory units.
based on a delay loop (ring oscillator) to generate random bit strings. An RO is a simple circuit that oscillates at a particular frequency, which cannot be predicted because of manufacturing variation and other uncertain factors. This kind of PUF generates an output of logic-0 or logic-1 by comparing the frequencies of two selected oscillators. n oscillators can produce n ∗ log(n) bits of information entropy. Figure 2 illustrates a simple RO-based PUF with two 5-stage ring oscillators. The two ROs within the dashed box in Figure 2 must be identical in design, so that any frequency difference between them is caused by random manufacturing variation. Recently, researchers have proposed many methods to enhance the security of Arbiter-based and RO-based PUFs [15, 19]. The uniqueness of the Arbiter-based PUF implemented on an FPGA is only 1.05% [7], because the Arbiter-based PUF relies on a symmetric layout and is hence difficult to implement on FPGAs [21]. In contrast, an RO-based PUF does not require high symmetry and is therefore easy to implement on FPGAs [21]. However, an RO-based PUF consumes more resources than an Arbiter-based PUF, produces a limited number of challenge-response pairs (CRPs), and needs hard macros to fix the routing [22]. Anderson's PUF [25] can address these weaknesses of current Arbiter-based and RO-based PUFs. However, it cannot be directly implemented on new-generation FPGAs and therefore has a scalability issue. To overcome these weaknesses of mainstream PUFs, this paper designs and implements a delay-based PUF on 28nm FPGAs. The PUF makes use of the underlying structure (look-up table and multiplexer) of an FPGA.
Compared with the previously proposed delay-based intrinsic PUFs, this PUF has the following merits: 1) good scalability; 2) low hardware consumption; 3) good portability without the need of hard macros to fix routing so that the PUF can be easily embedded into other designs; 4) high reliability.
Figure 1. The structure of Arbiter-based PUF
Figure 2. The structure of RO-based PUF
A memory unit is formed of two cross-coupled inverters with two stable states that are normally represented by 0 and 1. Because not all FPGAs support memories that do not need initialization, the SRAM PUF is not suitable for all types of FPGA. To resolve this problem, Guajardo et al. proposed an improved SRAM PUF, named the Butterfly PUF. While cells in the SRAM PUF are based on cross-coupled inverters, the Butterfly PUF uses an unstable cross-coupled circuit in which each inverter is replaced with a latch or flip-flop. The latches of the Butterfly PUF, as circuits for storing information, can be cleared (the output is 0) or preset (the output is 1), which brings the advantage that no power-up is required to take a measurement. Therefore, the Butterfly PUF is applicable to all types of FPGA. Delay-based intrinsic PUFs mainly include the Ring Oscillator (RO) based PUF [15, 22, 23] and the Arbiter-based PUF [4]. Lim et al. proposed the Arbiter-based PUF in [4]. Its basic structure is shown in Figure 1. Two parallel n-stage multiplexer chains share the input port, and their outputs are connected to the data input D and the clock input of a flip-flop, respectively. The input port is driven by a step signal, and the select ports of the multiplexer chains form the challenge input bits b0 ∼ bn. Bit bi determines whether the step signals entering the i-th stage continue straight along their own multiplexer chains or cross over. The challenge bits and the delay difference between the two parallel multiplexer chains determine whether the step signal first reaches the flip-flop data input D, in which case logic-1 is latched, or the clock input, in which case logic-0 is latched. The latched value serves as a 1-bit PUF signature. The Arbiter-based PUF requires a symmetric layout and is hence difficult to implement on FPGAs [7]. Suh et al. proposed a RO-based PUF [23] which is
III. DESIGN AND IMPLEMENTATION OF THE PUF
A. The Principle of PUF Design
The design of the PUF is based on the principle that unclonable chip signatures can be generated from fabrication variations. The main idea of this paper is to design a circuit that generates logic-0 or logic-1 according to fabrication variations and then to obtain an n-bit binary chip signature by instantiating the circuit n times. The core circuit of the PUF design is shown in Figure 3. The 16-bit shift register (LUT) A is initialized to 0101010101010101 (0x5555), and the 16-bit shift register B is initialized to the bitwise complement of the initial value of A, i.e. 1010101010101010 (0xAAAA). In this way, the output ports (OUT) of shift registers A and B, both driven by the same clock signal CLK, output the sequences 0101010101010101 and 1010101010101010 in sync, respectively. Then input ports
from logic-1 to logic-0, the delay of shift register A and the multiplexer it drives is shorter than the delay of shift register B and the multiplexer it drives. If this case occurs, when shift register B has shifted, i.e., signal N1 has already changed from logic-0 to logic-1, the select pin of the top multiplexer is still logic-1 since shift register A has not made the shift yet. Therefore, signal N2 becomes logic-1, as N1 does. Once shift register A makes the shift, the select input of the top multiplexer becomes logic-0 and N2 returns to logic-0. According to the analysis above, the delay of shift register A and the multiplexer it drives differs from the delay of shift register B and the multiplexer it drives. This delay difference determines whether there will be a glitch (a short positive spike) on N2 and how wide the spike is. If the delay difference is negligible, no positive spike appears on N2; when the delay difference is large, a positive spike appears on N2. The larger the delay difference, the wider the spike. In this paper, the value of an individual bit of the PUF signature is determined by whether there is a spike on signal N2. As shown in Figure 3, we connect N2 to the PRE port of a flip-flop that is initialized to 0 and feed the output port Q of the flip-flop back to its input port D. As long as a positive spike on N2 reaches the PRE port of the flip-flop, the output of the flip-flop is logic-1, which gives a 1-bit PUF signature of logic-1. Conversely, if no glitch reaches the PRE port, the flip-flop outputs logic-0, and the 1-bit PUF signature is logic-0 as well. Note that if the positive spike on signal N2 in Figure 3 is too narrow, it will be “filtered” out before reaching the PRE port of the flip-flop.
This “filtering” occurs because the resistance and capacitance along the path act as a filter that attenuates the positive spike. To prevent the positive spike from being “filtered” out when it is too narrow, and to prevent the vast majority of flip-flops from being set to 1 when it is too wide, we should adjust the width of the positive spike to a reasonable range. In the experiments, the delay of signal N1 is adjusted by changing the relative distance between the two selected multiplexers in a carry chain. If the distance between the two multiplexers connected to the shift registers increases, i.e., the multiplexer chain between them becomes longer, the delay of N1 becomes longer as well. We can obtain a longer multiplexer chain (longer than 4) by connecting the output port of the multiplexer at the top of one carry chain to input port (1) at the bottom of another carry chain. Experiments show that the PUF achieves the best performance when 5 multiplexers are placed between the two multiplexers that the shift registers drive (making the length of the whole multiplexer chain 7). In this configuration, on average, 28.75 bits of the 64-bit signature have value 1, i.e., 44.9% of the 64 bits are 1, close to the ideal value of 50%. If the multiplexer chain length is reduced by 1, the percentage will be less
Figure 3. An overview of the PUF design
(IN) of the two shift registers are fed continuously so that the outputs of shift registers A and B continue with the same sequences after the initial 16 output cycles. The output ports OUT of the two shift registers are connected to the two 2-to-1 multiplexers, respectively. Input port (0) of each of the two 2-to-1 multiplexers is connected to logic-0; input port (1) of the bottom multiplexer is connected to logic-1, and its output port (signal N1) is connected to input port (1) of the top multiplexer. The timing behavior of the circuit shown in Figure 3 is as follows. Initially, the output of shift register A is logic-0; therefore, signal N2 is logic-0. Next, at the rising edge of the clock, the output of shift register A (OUTA) changes from logic-0 to logic-1. Meanwhile, the output of shift register B (OUTB) changes from logic-1 to logic-0. Although shift register A and the multiplexer it drives are designed to be identical in structure and connection to shift register B and the multiplexer it drives, the delays of these two parts of the circuit differ because of inevitable fabrication variations. In this paper, we exploit this property to generate chip signatures. In Figure 3, signal N2 ideally stays at logic-0: when shift register A outputs logic-0 and shift register B outputs logic-1, N2 is logic-0; when shift register A outputs logic-1 and shift register B outputs logic-0, N2 remains logic-0. There are two cases worth highlighting in which N2 becomes logic-1. In one case, when the output of shift register A shifts from logic-0 to logic-1 and the output of shift register B shifts from logic-1 to logic-0, the delay of shift register A and the multiplexer it drives is shorter than the delay of shift register B and the multiplexer it drives.
In this case, when shift register A has shifted from logic-0 to logic-1, namely the select pin of the top multiplexer has become logic-1, signals N1 and N2 are both logic-1. N1 does not fall to logic-0 until shift register B makes its shift. In the other case, when the output of shift register A shifts from logic-0 to logic-1 and the output of shift register B shifts
Figure 4. Generating 1 bit of a PUF signature on a 28nm Xilinx FPGA
To solve this problem, this paper uses two LUTs of a SLICEM to implement the two 16-bit shift registers of the PUF, 2-to-1 multiplexers in the carry chain to implement the multiplexers of the PUF, and any one of the 8 flip-flops to latch the 1-bit PUF signature. Figure 4 shows how to implement a PUF with a 1-bit signature on a 28nm Xilinx Zynq-7000 XC7Z020 FPGA. The dotted line represents the direction of data flow. In the structure of the Xilinx Zynq-7000 XC7Z020 FPGA, the x-coordinates of all SLICEMs are even numbers. Every SLICE whose x-coordinate is greater than a SLICEM's x-coordinate by 1 and whose y-coordinate equals that SLICEM's y-coordinate is a SLICEL. We use SLICEMs to implement the shift registers and SLICELs to implement the multiplexer chains. Two vertically adjacent CLBs generate 1 bit of a PUF signature. In this way, 4 SLICEs generate 1 bit of a PUF signature, and the flip-flop required for latching the PUF bit can be chosen arbitrarily from the 8 flip-flops of a SLICEL. In this paper, we design and implement a delay-based PUF on new-generation 28nm FPGAs in such a way that the corresponding software development tool can automatically carry out synthesis, placement and routing without manual intervention and without hard macros to fix the routing, leading to good portability and scalability.
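The glitch mechanism described in Section III-A can be illustrated with a small timing model. The sketch below is not the authors' circuit simulation; it assumes an idealized top multiplexer for which N2 is logic-1 exactly while both the select signal from shift register A and signal N1 are logic-1, a single delay value per path, and a hypothetical threshold `min_pulse` standing in for the "filtering" of narrow spikes. All names and units are illustrative.

```python
def n2_pulse_width(delay_a, delay_b, a_rising):
    """Width of the positive spike on N2 for one clock edge (idealized model).

    delay_a: delay from the clock edge to the top multiplexer's select input
             (shift register A and the multiplexer it drives).
    delay_b: delay from the clock edge to signal N1
             (shift register B, its multiplexer and the chain in between).
    a_rising: True when OUT_A goes 0 -> 1 and OUT_B goes 1 -> 0 on this edge,
              False for the opposite transition.
    """
    if a_rising:
        # Select becomes 1 at delay_a while N1 stays 1 until delay_b.
        return max(0.0, delay_b - delay_a)
    # N1 becomes 1 at delay_b while select stays 1 until delay_a.
    return max(0.0, delay_a - delay_b)


def puf_bit(delay_a, delay_b, min_pulse):
    """Return 1 if a spike wider than min_pulse would preset the flip-flop."""
    widths = (
        n2_pulse_width(delay_a, delay_b, a_rising=True),
        n2_pulse_width(delay_a, delay_b, a_rising=False),
    )
    return int(max(widths) > min_pulse)
```

In this model a longer multiplexer chain corresponds to a larger `delay_b`, which widens the spike and raises the share of bits latched as 1, in line with the tuning behavior reported above.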
IV. EXPERIMENTAL RESULTS AND ANALYSIS
A. Experimental Design
Generating 1-bit of PUF signatures on a 28nm Xilinx FPGA
In this paper, we design and implement 64-bit PUF signatures on a ZedBoard development board (Zynq-7000 XC7Z020 FPGA). The FPGA is divided into 16 areas, and a 64-bit PUF is implemented in each of them. The range constraint (RLOC_RANGE statements) supported by the Xilinx Integration Development Kit is used to place a design in a designated region. Note that the fabrication variation between two different FPGAs is generally greater than the fabrication variation between two areas of the same FPGA. Hence, if the PUF signatures obtained in different regions of the same FPGA are unique enough, the PUF signatures implemented on different FPGAs will have good uniqueness as well. Therefore, we test the uniqueness and reliability of the PUF by implementing 16 PUFs on the same FPGA. The eight LEDs on the ZedBoard are used to display 8 bits of a PUF signature: a lit LED means logic-1, and an unlit LED means logic-0. Additionally, the eight switches on the ZedBoard are used to determine which 8 bits of a 64-bit PUF signature are displayed by the 8 LEDs. For each PUF, the LEDs display 8 bits eight times, from the highest 8 bits to the lowest 8 bits. Each display is recorded in order to obtain a complete record of the 64-bit PUF signature. The program that processes the raw experimental data was implemented in C.
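The read-out procedure above (eight LED readings per PUF, highest 8 bits first) amounts to concatenating eight bytes into one 64-bit value. The following is a minimal sketch of that step in Python rather than the authors' C program, which is not reproduced in the paper. The function name `assemble_signature` is hypothetical, and the sketch assumes each reading has already been transcribed as an integer in the range 0 to 255.

```python
def assemble_signature(led_readings):
    """Combine eight 8-bit LED readings into one 64-bit signature.

    led_readings: sequence of eight integers (0..255), ordered from the
                  highest 8 bits of the signature to the lowest 8 bits.
    """
    if len(led_readings) != 8:
        raise ValueError("expected eight 8-bit readings")
    signature = 0
    for reading in led_readings:
        if not 0 <= reading <= 0xFF:
            raise ValueError("each reading must fit in 8 bits")
        # Shift the bits collected so far and append the next byte.
        signature = (signature << 8) | reading
    return signature
```

For example, `assemble_signature([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0])` yields `0x123456789ABCDEF0`.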
than 10%; if the multiplexer chain is lengthened by 1, the percentage will be around 65%. Once a LUT of a SLICEM on a Virtex-5 FPGA is configured as a shift register, input port (0) of the carry-chain multiplexer connected to the shift register can still be connected via DX to an independent logic-0 to meet the design requirements. Hence, the shift register and the multiplexer chain can be laid out in the same SLICE. However, the SLICE structure of the new-generation Xilinx FPGAs (including the Virtex-7, Kintex-7 and Artix-7 series and the Zynq-7000 series used in this paper) differs from that of Virtex-5 and other earlier FPGAs. For example, in a SLICEM on a Zynq-7000 FPGA, path O5 and path DX are the paths that input port (0) of a multiplexer in the carry chain can select, and when the LUT in the corresponding position is used as a shift register, these paths serve as the input and output ports of the shift register. (They are difficult to draw in the figure; readers can view them with the FPGA Editor.) Hence, the design requirement that data input port (0) of a multiplexer be connected to a signal that is always logic-0 cannot be satisfied. Therefore, Anderson's PUF [25] cannot be directly implemented on the new-generation FPGAs.
Figure 5. Frequency distribution of the Hamming distances between any two of 16 PUF signatures
Figure 6. Frequency distribution histogram of Hamming distances between PUF signatures of the same PUF generated in the two experiments
B. The Overhead of the PUF
long as the distribution of ‘0’ and ‘1’ in a PUF signature is uneven, the performance of the PUF will be affected to some extent. The ideal uniformity of a PUF is 50%, namely ‘0’ and ‘1’ occur with equal probability for each bit in a PUF signature. We can modulate the signal delay by changing the multiplexer chain length, which adjusts the probability of ‘0’ and ‘1’ for each PUF bit and thereby adjusts the uniformity. This is the major advantage of the PUF presented in this paper. Through these adjustments, we can obtain uniformity close to the ideal value. The experiment shows that, among the 64 bits of the signature generated by each PUF, at most 33 bits and at least 20 bits are ‘1’; on average 28.75 bits, namely 44.9% of the bits, are ‘1’, which is close to the ideal uniformity of 50%. 3) Reliability: Reliability is used to evaluate the stability of PUF signatures generated by the same challenge in repeated experiments. Ideally, PUF signatures should remain the same under the same challenge over multiple observations. In practice, a variety of environmental conditions, such as temperature, voltage, and device aging, may change the circuit delay and cause PUF signatures to vary. For a PUF with high reliability, the difference between any two signatures generated under the same challenge in repeated experiments should be slight. The paper uses the following formula to evaluate the reliability of PUFs:
Sixteen 64-bit PUFs use 680 of the 53,200 LUTs (1.3%) of a Zynq-7000 XC7Z020 FPGA, 129 of the 17,400 LUTs (0.7%) that can be used as memories or shift registers, and 278 of the 13,300 SLICEs (2.1%). These results show that the PUF implementation in this paper incurs low overhead.
C. Performance Analysis
1) Uniqueness: The paper evaluates PUF uniqueness with Hamming distances. For a pair of PUFs Pi and Pj (i ≠ j) whose signatures are n bits long, the average Hamming distance between any two of the k PUFs implemented in different areas of the FPGA is calculated as follows.
$$u = \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \frac{HD(P_i, P_j)}{n} \times 100\% \qquad (1)$$
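Equation (1) is the mean pairwise Hamming distance over all k(k−1)/2 pairs, normalized by the signature length. A minimal sketch of the computation is given below, assuming signatures are stored as Python integers; the function names are illustrative and are not taken from the authors' C program.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def uniqueness(signatures, n_bits):
    """Average pairwise Hamming distance in percent, as in Eq. (1).

    signatures: list of k signatures stored as integers (k >= 2).
    n_bits: number of bits n in each signature.
    """
    pairs = list(combinations(signatures, 2))  # k(k-1)/2 unordered pairs
    if not pairs:
        raise ValueError("need at least two signatures")
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n_bits)
```

With k = 16 signatures, `combinations` produces 120 pairs, matching the number of Hamming distances analyzed in the experiment.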
In this way, we obtain a total of (16 ∗ 15)/2 = 120 Hamming distances. Figure 5 is the histogram of their frequency distribution. If each of the 16 signatures is unique and logic-0s and logic-1s are evenly distributed in each signature, then the theoretical expectation of the Hamming distance between any two of them is 50% of 64 bits, namely 32. In the experiment, the maximum Hamming distance between any two of the 16 PUF signatures is 45; the minimum is 18. The average of all 120 HDs is 31.75 (49.6% of the PUF signature bits), very close to the ideal expected value of 32. As shown in Figure 5, these HDs are concentrated around the ideal expected value of 32. This experiment demonstrates the good uniqueness of the signatures generated by the PUF proposed in the paper. 2) Uniformity: Uniformity discussed here concerns the distribution of ‘0’ and ‘1’ in a PUF signature. The uniformity of the signatures generated by a PUF is associated with the performance of the PUF. In the extreme case, if all bits of every signature generated by a PUF are ‘0’ or all are ‘1’, then the signatures generated by the PUF are identical and by no means unique. As
$$u = \frac{1}{x} \sum_{y=1}^{x} \frac{HD(R_i, R_{i,y})}{n} \times 100\% \qquad (2)$$
where x is the number of samples; n is the number of bits of a signature generated by a PUF; Ri,y is the y-th sample of Ri. We recorded the CRPs of the 16 PUFs after they were implemented on the FPGA. A week later, we recorded the CRPs of the 16 PUFs again. By calculating the Hamming distance between the signatures generated by each PUF in the two experiments, we can obtain the differences between these
Figure 7. Temperatures of the FPGA chip in experiments under normal conditions
Figure 8. Temperatures of the FPGA chip after being heated by an electric hair dryer
Figure 9. Frequency distribution histogram of Hamming distances between PUF signatures generated at high and normal temperatures
signatures and analyze the reliability of the corresponding PUF. The frequency distribution histogram of these Hamming distances is shown in Figure 6. Across the two experiments, the Hamming distances range from a minimum of 0 to a maximum of 6. The average is 2.38 (only 3.7% of 64 bits). In particular, for 62.5% of the PUFs, the Hamming distance between the signatures generated in the two experiments is less than or equal to 2. Meanwhile, comparing Figure 5 and Figure 6, readers can see a “Gap” between 9 and 17 that separates the Hamming distances of signatures generated by different PUFs from the Hamming distances of signatures generated by the same PUF in repeated experiments. The “Gap” implies that the signatures generated by the PUF can identify a particular PUF. Therefore, these experimental results show that the PUF has high reliability. 4) The impact of high temperature on reliability: Circuit delay is affected by temperature, and the PUF designed in this paper generates signatures from delay differences caused by random manufacturing variation. Therefore, the PUF signatures are very likely to change with temperature. We expect the PUF design to retain good uniqueness and high reliability at high temperature. We read and monitor the temperature of the FPGA chip with the Xilinx ChipScope tool. The experiments described in the previous section were conducted at a room temperature of about 15 ◦ C, with good heat dissipation and on an FPGA experimental board running only the PUFs. With Xilinx ChipScope we measured the FPGA chip temperature shown in Figure 7, which was about 40 ◦ C. We then raised the temperature of the FPGA chip by blowing an electric hair dryer against the chip on the experimental board, in order to simulate a high-temperature state that may be caused by heavy workload or environmental factors. We held the temperature of the FPGA chip at around 70 ◦ C, monitored with Xilinx ChipScope, as shown in Figure 8.
As the chip temperature was maintained at about 70 ◦ C, signatures generated by a PUF were being recorded. By
Figure 10. Frequency distribution histogram of Hamming distances between any two signatures generated by 16 PUFs at high temperatures
calculating the Hamming distance between a PUF signature generated at high temperature and the one generated at normal temperature, we can analyze the degree of temperature influence and assess the performance of this PUF. Figure 9 is the frequency distribution histogram of the 16 Hamming distances. The maximum Hamming distance between a signature generated by a PUF at high temperature and a signature generated by the same PUF at normal temperature is 9; the minimum is 2; the average is 5.50 (8.6% of the 64 bit positions). For 81.25% of the PUFs, the Hamming distance between the signatures generated at high and normal temperatures falls within the range 3 to 8. Comparing Figure 9 and Figure 6, we can observe that a temperature difference (between 40 ◦ C and 70 ◦ C) affects the reliability of the PUF to a certain extent. The average Hamming distance increases from 2.38 to 5.50, and the maximum Hamming distance from 6 to 9. However, comparing Figure 9 and Figure 5, we can still see a “Gap” (Hamming distances 12 to 17) between Hamming distances of the signatures generated by different PUFs and Hamming distances of the signatures generated by the same PUF
the increasingly serious problem of IP infringement. IP cores with modular design can be easily copied or sold by a third party without reverse engineering, bringing serious losses to IP vendors and reducing the market share of their products. Therefore, how to protect FPGA IP cores effectively has become an urgent issue [16]. Currently, FPGA IP protection technologies can be divided into three categories: 1) signature-based schemes; 2) encryption-based schemes; 3) PUF-based binding schemes. A signature-based scheme represents IP ownership by embedding an encrypted signature, i.e. a watermark, into the IP. After an owner's signature is embedded into an IP, all chips produced from it bear the same watermark. When an intellectual property dispute occurs, the owner can ask a trusted third party to recover the signature from the stolen IP, which can effectively address the infringement issue. Although watermarking technology has been widely researched [16, 17, 18], it has a big limitation: watermarking is a passive IP protection technology. It cannot actively prevent an IP from being illegally duplicated, distributed, and integrated into an SoC. Moreover, an FPGA IP is essentially a bitfile core. A watermark embedded in the file is more likely to be tampered with or overwritten than one in an ASIC, making FPGA IP protection more difficult. Therefore, users are in urgent need of an IP protection technology with active defense. An encryption-based scheme is a kind of “quasi-active” IP protection technology. Its activeness is reflected in the fact that encrypted IP cores that are illegally copied and used will not work without correct decryption. The scheme encrypts the configuration bit stream and then loads it into an FPGA [3]. However, for SRAM-based FPGAs, the FPGA design is stored as a bit stream in an external memory (EEPROM). Once the FPGA is powered up, the bit stream is loaded to configure the FPGA. During loading, it is very easy to clone the bit stream by wiretapping [18].
Moreover, encryption-based methods have the following main disadvantages [20, 24]: 1) commercial encryption-based binding schemes can only protect a single IP core; 2) most encryption-based schemes require a trusted third party to take part in the protocol; 3) encryption-based schemes have a serious key management problem. They require secure ROM or Flash memory to store FPGA-specific cryptographic keys, and therefore these methods are expensive and vulnerable to side-channel attacks [14]. Therefore, the encryption-based scheme is an unsatisfactory technique for preventing IP cores from being sold to unauthorized users. The PUF-based binding scheme combines PUF-generated unclonable signatures with the finite state machines of sequential circuits to “actively” restrict an IP to running on particular hardware platforms, while supporting commercial payment models similar to pay-per-use [20, 24]. The basic theoretical framework of the scheme was first proposed in [20], then the authors of which conducted a further
Table I. Performance comparisons with previous PUFs
PUF                  Uniqueness   Reliability
Optical PUF [6]      49.79%       25.25%
CD PUF [9]           54%          8%
Arbiter PUF [4]      1.05%        0.3%
RO PUF [23]          46.15%       0.48%
SRAM PUF [2]         43.16%       3.8%
Anderson PUF [25]    48%          3.6%
Mills PUF [26]       47.8%        6.8%
Our PUF              49.6%        3.7%
in different temperatures. Thus, although temperature changes affect reliability, the PUFs retain good uniqueness at temperatures from 40 ◦ C to 70 ◦ C. In conclusion, the PUFs presented in this paper still work effectively at high temperatures. In addition, we also compute the Hamming distances between any two signatures generated by the 16 PUFs at high temperatures and plot a frequency distribution histogram of the 120 Hamming distances, as shown in Figure 10. The maximum of the 120 Hamming distances is 40; the minimum is 14; the average is 29.65 (46.3% of the 64 bit positions). At high temperatures, the maximum number of bits with value 1 in a PUF signature is 31; the minimum is 14; the average is 23.50, i.e., only 36.7% of the bits of a PUF signature are ‘1’, far from the ideal 50%. However, we can adjust the length of the multiplexer chain to tune the uniformity of PUF signatures and improve the uniqueness of the PUF design. Moreover, PUF signature differences due to environmental causes can be corrected by CRC [4].
D. Performance Comparisons with Previous PUFs
Table I compares the performance of the PUF designed and implemented in this paper with other kinds of PUFs. Uniqueness and reliability are used to evaluate the performance of PUFs. The ideal value of uniqueness is 50%: the closer the uniqueness is to 50%, the better. The ideal value of reliability is 0: the closer the reliability is to 0, the more reliable the corresponding PUF. The experimental data of the compared PUFs in Table I are taken from reference [7]. Table I shows that the uniqueness of our proposed PUF reaches 49.6%, very close to the ideal value of 50% and better than most of the other kinds of PUFs. As for reliability, the PUF designed in this paper achieves 3.7%, also better than most of the other kinds of PUFs.
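The reliability and uniformity percentages quoted in this section follow directly from Eq. (2) and from counting the ‘1’ bits of a signature. The sketch below shows both calculations under the assumption that signatures are stored as Python integers; the function names are illustrative and not part of the original tooling.

```python
def reliability(reference, samples, n_bits):
    """Average Hamming distance, in percent, between a reference signature
    and x repeated samples of the same PUF, as in Eq. (2).
    Lower values mean a more stable PUF; the ideal value is 0.
    """
    if not samples:
        raise ValueError("need at least one sample")
    total = sum(bin(reference ^ s).count("1") for s in samples)
    return 100.0 * total / (len(samples) * n_bits)


def uniformity(signature, n_bits):
    """Percentage of bits with value 1 in a signature (ideal: 50%)."""
    return 100.0 * bin(signature).count("1") / n_bits
```

For 64-bit signatures, an average distance of 2.38 bits corresponds to 2.38/64 ≈ 3.7%, and an average of 23.50 ones corresponds to 23.50/64 ≈ 36.7%, matching the figures reported above.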
The proposed PUF has four advantages: 1) good scalability; 2) low FPGA resource overhead; 3) no need for hard macros; and 4) high portability.

V. PUFS FOR FPGA IP PROTECTION

With the reusable design methodology prevailing in the field of integrated circuit design, the IC design industry is facing
research on the scheme in [24]. The scheme is currently the first non-encrypted FPGA hardware IP binding scheme that can potentially address the drawbacks of the signature-based and encryption-based schemes mentioned above. PUFs not only have attractive application prospects in FPGA intellectual property protection but are also widely applicable to key generation, digital rights management and other security areas. Therefore, low-cost, highly stable and highly secure PUF designs and implementations will drive the development of related fields.
VI. CONCLUSION AND FUTURE WORK

The PUF proposed in this paper takes full advantage of the logic unit structure and the placement and routing structure of FPGAs, and hence is suitable for implementation on FPGAs. Compared to previous PUF designs, it can be implemented on FPGAs with good scalability. We successfully implemented the PUF on a 28nm FPGA. Experimental results show that the PUF design has satisfactory uniqueness, uniformity and reliability. In the future, we intend to explore methods to further optimize the hardware and improve PUF reliability. We also plan to build a prototype of the PUF-based non-encrypted binding scheme using the PUF designed in this paper and to assess the performance of the prototype system.

ACKNOWLEDGMENT

This work was supported by the Postgraduate Research and Innovation Project of Hunan Province of China under Grant No. CX2012B142, the National Significant Science and Technology Projects of China under Grant No. 2013ZX01039001-002-003, the National Natural Science Foundation of China under Grant No. 61228204, and the Young Teacher Development Program of Hunan University.

REFERENCES

[1] J. Guajardo, S. Kumar, G. Schrijen, et al., “FPGA intrinsic PUFs and their use for IP protection,” in Proc. CHES, Vienna, Austria, 2007: 63-80.
[2] S. Kumar, J. Guajardo, R. Maes, et al., “The butterfly PUF: protecting IP on every FPGA,” in Proc. HOST, Anaheim, USA, 2008: 67-70.
[3] R. Maes, D. Schellekens, I. Verbauwhede, “A pay-per-use licensing scheme for hardware IP cores in recent SRAM-FPGAs,” IEEE Transactions on Information Forensics and Security, 2012, 7(1): 98-108.
[4] D. Lim, J. W. Lee, B. Gassend, et al., “Extracting secret keys from integrated circuits,” IEEE Transactions on VLSI, 2005, 13(10): 1200-1205.
[5] J. Zhang, Q. Wu, J. Chen, “Research on design method of dynamic partial reconfigurable system,” Journal of Software Engineering, 2012, 6(2): 21-30.
[6] R. Pappu, Physical one-way functions, Massachusetts Ave, Cambridge, USA: MIT, 2001.
[7] R. Maes, I. Verbauwhede, “Physically unclonable functions: A study on the state of the art and future research directions,” in Towards Hardware-Intrinsic Security, ser. Information Security and Cryptography, Springer, 2010: 3-37.
[8] P. Bulens, F. X. Standaert, J. J. Quisquater, “How to strongly link data and its medium: the paper case,” IET Information Security, 2010, 4(3): 125-136.
[9] G. Hammouri, A. Dana, B. Sunar, “CDs have fingerprints too,” in Proc. CHES, Lausanne, Switzerland, 2009: 348-362.
[10] P. Tuyls, G. J. Schrijen, B. Škorić, et al., “Read-proof hardware from protective coatings,” in Proc. CHES, Yokohama, Japan, 2006: 369-383.
[11] J. Guajardo, B. Škorić, P. Tuyls, et al., “Anti-counterfeiting, key distribution, and key storage in an ambient world via physical unclonable functions,” Information Systems Frontiers, 2009, 11(1): 19-41.
[12] K. Lofstrom, W. R. Daasch, D. Taylor, “IC identification circuit using device mismatch,” in Proc. ISSCC, San Francisco, USA, 2000: 372-373.
[13] R. Helinski, D. Acharyya, J. Plusquellic, “A physical unclonable function defined using power distribution system equivalent resistance variation,” in Proc. DAC, San Francisco, USA, 2009: 676-681.
[14] A. Moradi, D. Oswald, C. Paar, et al., “Side-channel attacks on the bitstream encryption mechanism of Altera Stratix II: facilitating black-box analysis using software reverse-engineering,” in Proc. FPGA, 2013: 91-100.
[15] C. Yin, G. Qu, Q. Zhou, “Design and Implementation of a Group-based RO PUF,” in Proc. DATE, Grenoble, France, 2013: 416-421.
[16] J. Zhang, Y. Lin, Q. Wu, et al., “Watermarking FPGA bitfile for intellectual property protection,” Radioengineering, 2012, 21(2): 764-771.
[17] J. Zhang, Y. Lin, W. Che, et al., “Efficient verification of IP watermarks in FPGA designs through lookup table content extracting,” IEICE Electronics Express, 2012, 9(22): 1735-1741.
[18] J. Zhang, Y. Lin, Y. Lyu, et al., “A chaotic-based publicly verifiable FPGA IP watermark detection scheme,” SCIENTIA SINICA Informationis, 2013, 43(9): 1096-1110.
[19] M. Majzoobi, F. Koushanfar, M. Potkonjak, “Testing techniques for hardware security,” in Proc. ITC, Santa Clara, USA, 2008: 1-10.
[20] J. Zhang, Y. Lin, Y. Lyu, et al., “Binding hardware IPs to specific FPGA device via inter-twining the PUF response with the FSM of sequential circuits,” in Proc. 21st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Seattle, USA, 2013: 227.
[21] S. Morozov, A. Maiti, P. Schaumont, “An analysis of delay based PUF implementations on FPGA,” in Proc. ARC, 2010: 382-387.
[22] A. Maiti, P. Schaumont, “Improved ring oscillator PUF: an FPGA friendly secure primitive,” Journal of Cryptology, 2011, 4(2): 375-397.
[23] G. Suh, S. Devadas, “Physical unclonable functions for device authentication and secret key generation,” in Proc. DAC, San Diego, USA, 2007: 9-14.
[24] J. Zhang, Y. Lin, Y. Lyu, et al., “FPGA IP protection by binding finite state machine to physical unclonable functions,” in Proc. 23rd International Conference on Field Programmable Logic and Applications (FPL), Porto, Portugal, 2013.
[25] J. H. Anderson, “A PUF design for secure FPGA-based embedded systems,” in Proc. ASPDAC, Taipei, Taiwan, 2010: 1-6.
[26] A. Mills, “Design and evaluation of a delay-based FPGA physically unclonable function,” in Proc. ICCD, Montreal, Canada, 2012: 143-146.