Implementing Memory Protection Primitives on Reconfigurable Hardware

Brett Brotherton and Nick Callegari
University of California, Santa Barbara
Department of Electrical and Computer Engineering
Santa Barbara, CA 93106
{bbrother,nick callegari}@ece.ucsb.edu

Ted Huffmire
University of California, Santa Barbara
Department of Computer Science
Santa Barbara, CA 93106
[email protected]

Abstract

The extremely high cost of custom ASIC fabrication makes FPGAs an attractive alternative for deploying custom hardware. Embedded systems based on reconfigurable hardware integrate many functions onto a single device. Since embedded designers often have no choice but to use soft IP cores obtained from third parties, the cores operate at different trust levels, resulting in mixed-trust designs. The goal of this project is to evaluate recently proposed protection primitives for reconfigurable hardware by building a real embedded system with several cores on a single FPGA. Overcoming the practical problems of integrating multiple cores together with security mechanisms will help us to develop realistic security policy specifications that drive enforcement mechanisms on embedded systems.

1 Introduction

Reconfigurable hardware, such as a Field Programmable Gate Array (FPGA), provides an attractive alternative to costly custom ASIC fabrication for deploying custom hardware. While ASIC fabrication requires very high nonrecurring engineering (NRE) costs, an SRAM-based FPGA can be "programmed" after fabrication to implement virtually any circuit, and the configuration can be updated an unlimited number of times at no cost. In addition, the performance gap between FPGAs and ASICs is narrowing. Although FPGAs are slower than ASICs, FPGAs can be fabricated using the latest deep sub-micron process technology, while many ASIC companies are forced to use more affordable technology that is several generations behind. Because they provide a useful balance between performance, cost, and flexibility, many critical embedded systems use FPGAs as their primary source of computation. For example, the aerospace industry relies on FPGAs to control everything from the Joint Strike Fighter to the Mars Rover. We are now seeing an explosion of reconfigurable hardware based designs in everything from face recognition systems [24], to wireless networks [27], to intrusion detection systems [10], to supercomputers [1]. In fact, it is estimated that in 2005 alone over 80,000 commercial FPGA design projects were started [21].

Since major IC manufacturers outsource most of their operations to a variety of countries [22], the theft of IP from a foundry is a serious concern. FPGAs provide a viable solution to this problem, since the sensitive IP is not loaded onto the device until after it has been manufactured and delivered, making it harder for an adversary to target a specific application or user. In addition, device attacks are difficult on an FPGA because the intellectual property is lost when the device is powered off. Modern FPGAs use bitstream encryption and other methods to protect the intellectual property once it is loaded onto the FPGA or an external memory.

Although FPGAs are currently fielded in critical applications that are part of the national infrastructure, the development of security primitives for FPGAs is just beginning. Reconfigurable systems are typically cobbled together from a collection of existing modules (called cores) in order to save both time and money. Cost pressures often force designers to obtain cores from third parties, resulting in mixed-trust designs. The goal of this paper is to evaluate recently proposed security primitives for reconfigurable hardware [9] [8] by building an embedded system consisting of multiple cores on a single FPGA. Our testing platform will help us to better understand how these security primitives interact with cores on a real system. Overcoming the problems of integrating several cores together with reconfigurable protection primitives will provide an opportunity to develop more realistic security policy specifications that drive the enforcement mechanisms on embedded systems.
In addition to designing a security policy for the system, we will build a programmatic interface for user applications to communicate with the device. Since security primitives
on embedded devices need to be efficient, we also wish to ensure that the combined system achieves efficient memory system performance.
2 Related Work

2.1 IP Theft
Most of the work relating to FPGA security targets the problem of preventing the theft of intellectual property and securely uploading bitstreams in the field. Since such theft directly impacts their bottom line, industry has already developed several techniques to combat the theft of FPGA IP, such as encryption [2] [12] [13], fingerprinting [16], and watermarking [17]. However, establishing a root of trust on a fielded device is challenging because it requires a decryption key to be incorporated into the finished product. Some FPGAs can be remotely updated in the field, and industry has devised secure hardware update channels that use authentication mechanisms to prevent a subverted bitstream from being uploaded [7] [6]. These techniques were developed to prevent an attacker from uploading a malicious design that causes unintended functionality. Even worse, the malicious design could physically destroy the FPGA by causing the device to short-circuit [5].
2.2 Reconfigurable Protection
Figure 1 shows several different strategies for providing protection for embedded systems. Ideally, every application runs on its own dedicated device, but this is clearly inefficient from a cost perspective. In contrast to strictly physical protection, separation kernels [26] [11] [19] use software virtualization to prevent applications from interfering with each other, but they come with the overhead of software and can only run on general-purpose processors. Huffmire et al. proposed a third approach, reconfigurable protection [9], which uses a reconfigurable reference monitor to enforce the legal sharing of memory among cores. A memory access policy is expressed in a specialized language, and a compiler translates this policy directly into a circuit that enforces it. The circuit is then loaded onto the FPGA along with the cores. Huffmire et al. proposed another protection primitive, moats and drawbridges [8], which exploits the spatial nature of computation on FPGAs to provide strong isolation of cores. A moat surrounds a core with a channel in which routing is disabled. In addition to isolating cores, moats can be used to isolate the reference monitor and provide tamper resistance. Drawbridges use static analysis of the bitstream to ensure that only specified connections between cores can be established. Drawbridges can also be used to ensure that the reference monitor cannot be bypassed and is always invoked.
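As a purely illustrative sketch of the check a compiled reference monitor performs, the following models a stateless memory access policy as a set of permitted (module, op, range) access descriptors. The module names, address ranges, and policy entries below are invented for this example and are not taken from [9]:

```python
# Illustrative sketch only: models the logical check that a compiled
# reference monitor performs in hardware. All names and ranges are
# hypothetical.
RANGES = {
    "Range1": (0x28000000, 0x280007FF),  # hypothetical AES region
    "Range3": (0x24000000, 0x24FFFFFF),  # hypothetical DRAM region
}

# A stateless policy: the set of permitted (module, op, range) descriptors.
POLICY = {
    ("Module1", "r", "Range1"), ("Module1", "w", "Range1"),
    ("Module2", "r", "Range3"), ("Module2", "w", "Range3"),
}

def legal(module, op, addr):
    """Grant the access iff it falls in a range the policy permits."""
    for range_id, (lo, hi) in RANGES.items():
        if lo <= addr <= hi:
            return (module, op, range_id) in POLICY
    return False  # deny accesses outside every known range
```

In hardware, the range lookup and the descriptor check happen in parallel in a single cycle; the loop above is only a software stand-in for that logic.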
There appears to be little other work on the specifics of managing FPGA resources in a secure manner. Chien and Byun have perhaps the closest work, addressing the safety and protection concerns of enhancing a CMOS processor with reconfigurable logic [3]. Their design achieves process isolation by providing a reconfigurable virtual machine to each process, and their architecture uses hardwired TLBs to check all memory accesses. Our work could be used in conjunction with theirs, using soft-processor cores on top of commercial off-the-shelf FPGAs rather than a custom silicon platform. In fact, we believe one of the strong points of our work is that it may provide a viable implementation path for those who require a custom secure architecture, for example execute-only memory [20] or virtual secure co-processing [18]. Gogniat et al. propose a method of embedded system design that implements security primitives such as AES encryption on an FPGA, which is one component of a secure embedded system containing memory, I/O, CPU, and other ASIC components [4]. Their Security Primitive Controller (SPC), which is separate from the FPGA, can dynamically modify these primitives at runtime in response to the detection of abnormal activity (attacks). In this work, the reconfigurable nature of the FPGA is used to adapt a crypto core to situational concerns, although the focus is on using an FPGA to efficiently thwart system-level attacks rather than on chip-level concerns. Indeed, FPGAs are a natural platform for many cryptographic functions because of the large number of bit-level operations required in modern block ciphers. However, while there is a great deal of work on exploiting FPGAs to speed up cryptographic or intrusion detection primitives, systems researchers are only now starting to realize the security ramifications of building systems around reconfigurable hardware.
2.3 Covert Channels, Direct Channels, and Trap Doors
Although moats provide physical isolation of cores, it is possible that cores could still communicate via a covert channel. Exploitation of a covert channel results in the unintended flow of information between cores. Covert channels work via an internal shared resource, such as power consumption, processor activity, disk usage, error conditions, or temperature of the device [29] [25]. For example, a power analysis attack could extract the cryptographic keys used by an encryption core [15]. Classical covert channel analysis involves articulating all shared resources on chip, identifying the sharing points, determining whether each shared resource is exploitable, determining the bandwidth of the covert channel, and determining whether remedial action can be taken [14] [23]. Storage channels can be mitigated
by partitioning the resources, while timing channels can be mitigated with sequential access. Examples of remedial action include decreasing the bandwidth (e.g., by introducing artificial spikes (noise) in resource usage [28]) or closing the channel. Unfortunately, an adversary can extract a signal from the noise, given sufficient resources [23]. Of course, the techniques we are evaluating in this paper [9] [8] are primarily about restricting the opportunity for direct channels and trap doors [30]; the memory protection scheme proposed in [9] is one example. Without any memory protection, a core can leak secret data by writing the data directly to memory. Another example of a direct channel is a tap that connects two cores. An unintentional tap is a direct channel established by accident: for example, the place-and-route tool's optimization strategy may interleave the wires of two cores.

Figure 1. Alternative strategies for providing protection on embedded systems. From a security standpoint, a system with multiple applications could allocate a dedicated physical device for each application, but economic realities force designers to integrate multiple applications onto a single device. Separation kernels use virtualization to prevent applications from interfering with each other, but they come with the overhead of software and are therefore restricted to general-purpose processor based systems. The goal of this project is to evaluate a recently proposed approach to providing protection for FPGA based embedded systems that uses a reconfigurable reference monitor to enforce the legal sharing of memory among cores.
3 System Architecture

3.1 Approach
Figure 4 shows the architecture of our system. A reference monitor/arbiter sits between the MicroBlaze processors and the on-chip peripheral bus (OPB). Shared external memory (DRAM), the AES Core, the RS-232 interface, and
the Ethernet interface are also connected to the bus. In addition to integrating the Ethernet core into the system, we have designed software to communicate over TCP with the processor. We have the ability to send data together with the desired operation to the device and receive either encrypted or decrypted data depending on the request. We have also modified the serial code to work with the new file format. We can now receive and process files over both serial and Ethernet connections. In addition, we have configured our two-processor system to run applications simultaneously.
3.2 Progress
In order to integrate the reference monitor with the OPB, we first integrated the reference monitor with the OPB block ram controller and ensured that it functions correctly with low latency and overhead. Then, we tackled the more difficult problem of integrating the reference monitor with the OPB. We can now regulate access to any of the slave processors on the bus, and our modifications only add one cycle to the latency.
3.3 Future Work
The remaining action items for this part of the project include ensuring that the reference monitor and security policy perform as intended when integrated with the system. In addition, we will test the MicroBlaze software with the new file sending application.

Figure 2. Since the placement of the reference monitor affects memory bus performance, this paper explores various design alternatives. In the figure on the left, a memory access must pass through the reference monitor (RM) before going to memory. In the figure on the right, the reference monitor (RM) snoops on the bus, and a buffer (B) stores the data until the access is approved. An arbiter prevents the bus from being accessed by more than one module at a time.
Figure 3. The inputs to the enforcement module are the module ID, op, and address. The range ID is determined by performing a parallel search over all ranges, similar to a content addressable memory (CAM). The module ID, op, and range ID together form an access descriptor, which is the input to the state machine logic. The output is a single bit: either grant or deny the access.

4 Security Policy
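The CAM-style parallel range search described in Figure 3 can be sketched in software: each range is stored as a ternary bit pattern whose 'X' bits are don't-cares, and an address matches when its remaining bits agree. The pattern below mirrors the first row shown in Figure 3; the helper function itself is our own illustration, not the actual hardware:

```python
# Sketch of the CAM-style parallel range search from Figure 3. In the
# hardware, every pattern is matched against the address simultaneously;
# here we just compile one pattern into a (mask, value) check.
def make_matcher(pattern):
    """Compile a 32-bit ternary pattern ('X' = don't care) into a check."""
    bits = pattern.replace(" ", "")
    mask = int("".join("0" if b == "X" else "1" for b in bits), 2)
    value = int("".join("0" if b == "X" else b for b in bits), 2)
    return lambda addr: (addr & mask) == value

# Range 1 from Figure 3: matches addresses 0x08E7B008 through 0x08E7B00F.
range1 = make_matcher("0000 1000 1110 0111 1011 0000 0000 1XXX")
```

Because each pattern reduces to a mask-and-compare, ranges aligned on power-of-two boundaries need only a single CAM entry, which is why the policy compiler favors such ranges.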
Figure 4. System architecture.

Our reference monitor will enforce a stateful memory access policy. There are three states: one for the case in which Processor1 (or Module1) has access to the AES Core, one for the case in which Processor2 (or Module2) has access to the AES Core, and one for the case in which neither has access to the AES Core. A processor obtains access to the AES core by writing to a specific control word (Control Word 1), and a processor relinquishes access to the AES core by writing to another specific control word (Control Word 2). Therefore, the transitions between states occur when one of the processors writes to one of these specified control words. In addition to permitting temporal sharing of the AES Core, the policy isolates the two MicroBlaze processors
such that Processor1 and the RS-232 data are in a separate isolation domain from Processor2 and the Ethernet data. Since each component of our system is assigned a specific address range, our reference monitor is well-suited for enforcing a resource sharing policy. We specify the policy for our system as follows:

Range0 → [0x41400000,0x4140ffff]; (Debug)
Range1 → [0x28000000,0x28000777]; (AES1)
Range2 → [0x28000800,0x28000fff]; (AES2)
Range3 → [0x24000000,0x24777777]; (DRAM1)
Range4 → [0x24800000,0x24ffffff]; (DRAM2)
Range5 → [0x40600000,0x4060ffff]; (RS-232)
Range6 → [0x40c00000,0x40c0ffff]; (Ethernet)
Range7 → [0x28000004,0x28000007]; (Ctrl Word1)
Range8 → [0x28000008,0x2800000f]; (Ctrl Word2)
Range9 → [0x28000000,0x28000003]; (Ctrl WordAES)
Access0 → {Module1,rw,Range5} | {Module2,rw,Range6} | {Module1,rw,Range3} | {Module2,rw,Range4} | {Module1,rw,Range0} | {Module2,rw,Range0};
Access1 → Access0 | {Module1,rw,Range1} | {Module1,rw,Range9};
Access2 → Access0 | {Module2,rw,Range2} | {Module2,rw,Range9};
Trigger1 → {Module1,w,Range7};
Trigger2 → {Module1,w,Range8};
Trigger3 → {Module2,w,Range7};
Trigger4 → {Module2,w,Range8};
Expr1 → Access0 | Trigger3 Access2* Trigger4;
Expr2 → Access1 | Trigger2 Expr1* Trigger1;
Expr3 → Expr1* Trigger1 Expr2*;
Policy → Expr1* | Expr1* Trigger3 Access2* | Expr3 Trigger2 Expr1* Trigger3 Access2* | Expr3 Trigger2 Expr1* | Expr3 | ;

Figure 5 shows the DFA that enforces this policy. From this policy, we automatically generate a Verilog hardware description of the reference monitor.
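To make the stateful portion concrete, the following sketch simulates the three-state sharing discipline in software. It is a deliberate simplification of the DFA enforcing the policy above (a single merged AES region, and the Debug/DRAM/serial rules omitted), so treat it as an illustration rather than a literal translation:

```python
# Hedged sketch of the stateful policy: a tiny state machine tracking
# which processor (if either) currently holds the AES core. Range
# bounds follow the policy text; the logic is a simplification of the
# DFA, not a literal translation.
RANGE7 = (0x28000004, 0x28000007)  # Ctrl Word1: acquire the AES core
RANGE8 = (0x28000008, 0x2800000F)  # Ctrl Word2: release the AES core
AES    = (0x28000000, 0x28000FFF)  # AES regions (Range1/2/9, merged)

def in_range(addr, r):
    return r[0] <= addr <= r[1]

class AesSharingMonitor:
    def __init__(self):
        self.holder = None  # None, "Module1", or "Module2"

    def access(self, module, op, addr):
        """Return True to grant the access, False to deny it."""
        if op == "w" and in_range(addr, RANGE7):  # acquire attempt
            if self.holder is None:
                self.holder = module
            return self.holder == module
        if op == "w" and in_range(addr, RANGE8):  # release attempt
            if self.holder == module:
                self.holder = None
                return True
            return False
        if in_range(addr, AES):  # AES data: current holder only
            return self.holder == module
        return True  # all other ranges: left to the stateless rules
```

The hardware version keeps the same state in a handful of flip-flops and makes the grant/deny decision combinationally, which is how the reference monitor adds only a single cycle of latency.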
Figure 5. A stateful policy for our system.

5 Communication Interface

5.1 Goal

Currently, we are using HyperTerminal to connect to the AES core via the serial connection. We have tested the AES core using a 128-bit key by manually parsing the data into 32-bit lines. Our goal is to incorporate a programmatic interface that allows a user application to send data and keys to the device and receive the result programmatically. This will enable applications to provide a user interface that lets the user select a data file and a key file and specify where to save the file containing the result.

5.2 Progress

We have implemented a user interface in C++ to provide more functionality and user friendliness. The user can specify the desired operation, the input file, and the key file. Currently, the output is sent to a default location. To achieve modular functionality, we implemented the serial socket code that allows the user to connect to the Xilinx board. Functions listen to the board and write the encrypted or decrypted data to a text file.

5.3 Future Work

The remaining action items for this part of the project include enhancing the user interface to handle multiple forms of I/O (both serial and Ethernet). Each type of I/O will be assigned to a specific processor core. We will also allow the user to specify which connection method to use when communicating with the device.
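As an illustration of what such a programmatic interface might look like, the sketch below frames a request for the board. The opcodes, field widths, and byte order are all assumptions made for this example; the actual wire format our software uses is not specified here:

```python
import struct

# Hypothetical wire format for the programmatic interface. Every field
# below (opcodes, length widths, byte order) is an assumption for
# illustration, not the format actually used by our software.
OP_ENCRYPT, OP_DECRYPT = 0, 1

def build_request(op, key, data):
    """Frame a request: 1-byte op | 2-byte key length | key |
    4-byte data length | data (lengths big-endian)."""
    if op not in (OP_ENCRYPT, OP_DECRYPT):
        raise ValueError("unknown operation")
    return (struct.pack(">BH", op, len(key)) + key
            + struct.pack(">I", len(data)) + data)

# A client would then open a TCP socket to the board and send the frame:
#   sock.sendall(build_request(OP_ENCRYPT, key_bytes, plaintext))
```

Carrying explicit length fields lets the same framing work over both the serial and Ethernet paths, since neither transport preserves message boundaries on its own.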
References

[1] U. Bondhugula, A. Devulapalli, J. Fernando, P. Wyckoff, and P. Sadayappan. Parallel FPGA-based all-pairs shortest-paths in a directed graph. In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS '06), April 2006.
[2] L. Bossuet, G. Gogniat, and W. Burleson. Dynamically configurable security for SRAM FPGA bitstreams. In Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS '04), Santa Fe, NM, April 2004.
[3] A. Chien and J. Byun. Safe and protected execution for the Morph/AMRM reconfigurable processor. In Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines, Napa, CA, April 1999.
[4] G. Gogniat, T. Wolf, and W. Burleson. Reconfigurable security support for embedded systems. In Proceedings of the 39th Hawaii International Conference on System Sciences, 2006.
[5] I. Hadzic, S. Udani, and J. Smith. FPGA viruses. In Proceedings of the Ninth International Workshop on Field-Programmable Logic and Applications (FPL '99), Glasgow, UK, August 1999.
[6] S. Harper and P. Athanas. A security policy based upon hardware encryption. In Proceedings of the 37th Hawaii International Conference on System Sciences, 2004.
[7] S. Harper, R. Fong, and P. Athanas. A versatile framework for FPGA field updates: An application of partial self-reconfiguration. In Proceedings of the 14th IEEE International Workshop on Rapid System Prototyping, June 2003.
[8] T. Huffmire, B. Brotherton, G. Wang, T. Sherwood, R. Kastner, T. Levin, T. Nguyen, and C. Irvine. Moats and drawbridges: An isolation primitive for reconfigurable hardware based systems. In Proceedings of the 2007 IEEE Symposium on Security and Privacy, Oakland, CA, USA, May 2007.
[9] T. Huffmire, S. Prasad, T. Sherwood, and R. Kastner. Policy-driven memory protection for reconfigurable systems. In Proceedings of the European Symposium on Research in Computer Security (ESORICS), Hamburg, Germany, September 2006.
[10] B. Hutchings, R. Franklin, and D. Carver. Assisting network intrusion detection with reconfigurable hardware. In Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '02), 2002.
[11] C. Irvine, T. Levin, T. Nguyen, and G. Dinolt. The trusted computing exemplar project. In Proceedings of the 5th IEEE Systems, Man and Cybernetics Information Assurance Workshop, pages 109-115, West Point, NY, June 2004.
[12] T. Kean. Secure configuration of field programmable gate arrays. In Proceedings of the 11th International Conference on Field Programmable Logic and Applications (FPL '01), Belfast, UK, August 2001.
[13] T. Kean. Cryptographic rights management of FPGA intellectual property cores. In Tenth ACM International Symposium on Field-Programmable Gate Arrays (FPGA '02), Monterey, CA, February 2002.
[14] R. Kemmerer. Shared resource matrix methodology: An approach to identifying storage and timing channels. ACM Transactions on Computer Systems, 1983.
[15] P. Kocher, J. Jaffe, and B. Jun. Differential power analysis. In Advances in Cryptology (CRYPTO '99), August 1999.
[16] J. Lach, W. Mangione-Smith, and M. Potkonjak. FPGA fingerprinting techniques for protecting intellectual property. In Proceedings of the 1999 IEEE Custom Integrated Circuits Conference, San Diego, CA, May 1999.
[17] J. Lach, W. Mangione-Smith, and M. Potkonjak. Robust FPGA intellectual property protection through multiple small watermarks. In Proceedings of the 36th ACM/IEEE Conference on Design Automation (DAC '99), New Orleans, LA, June 1999.
[18] R. B. Lee, P. C. S. Kwan, J. P. McGregor, J. Dwoskin, and Z. Wang. Architecture for protecting critical secrets in microprocessors. In Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), pages 2-13, June 2005.
[19] T. E. Levin, C. E. Irvine, and T. D. Nguyen. A least privilege model for static separation kernels. Technical Report NPS-CS-05003, Naval Postgraduate School, 2004.
[20] D. Lie, C. Thekkath, M. Mitchell, P. Lincoln, D. Boneh, J. Mitchell, and M. Horowitz. Architectural support for copy and tamper resistant software. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), Cambridge, MA, November 2000.
[21] D. McGrath. Gartner Dataquest analyst gives ASIC, FPGA markets clean bill of health. EE Times, 13 June 2005.
[22] R. Milanowski and M. Maurer. Outsourcing poses unique challenges for the U.S. military-electronics community. August/September 2006.
[23] J. Millen. Covert channel capacity. In Proceedings of the 1987 IEEE Symposium on Security and Privacy, Oakland, CA, USA, April 1987.
[24] H. Ngo, R. Gottumukkal, and V. Asari. A flexible and efficient hardware architecture for real-time face recognition based on Eigenface. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2005.
[25] C. Percival. Cache missing for fun and profit. In BSDCan 2005, Ottawa, Ontario, Canada, 2005.
[26] J. Rushby. A trusted computing base for embedded systems. In Proceedings of the 7th DoD/NBS Computer Security Conference, pages 294-311, September 1984.
[27] B. Salefski and L. Caglar. Reconfigurable computing in wireless. In Proceedings of the Design Automation Conference (DAC), 2001.
[28] H. Saputra, N. Vijaykrishnan, M. Kandemir, M. Irwin, R. Brooks, S. Kim, and W. Zhang. Masking the energy behavior of DES encryption. In IEEE Design Automation and Test in Europe (DATE '03), 2003.
[29] F. Standaert, L. Oldenzeel, D. Samyde, and J. Quisquater. Power analysis of FPGAs: How practical is the attack? Field-Programmable Logic and Applications, 2778(2003):701-711, September 2003.
[30] K. Thompson. Reflections on trusting trust. Communications of the ACM, 27(8), 1984.