An FPGA-based Key-Store for Improving the Dependability of Security Services A. Cilardo1 , A. Mazzeo1 , L. Romano1 , and G. P. Saggese1 University of Naples Federico II, Via Claudio 21, 80125 Napoli, Italy {acilardo, mazzeo, lrom, saggese}@unina.it

Abstract. A key-store is a facility for storing sensitive information, most typically the keys of a cryptographic application which provides a security service. In this paper, we present a hardware implemented key-store, which allows secure storage and high performance retrieval of RSA keys. Since RSA is the most widely adopted standard for cryptographic keys, our key-store can be effectively used to improve the dependability of a wide class of security services. The device is implemented on top of a Commercial Off The Shelf (COTS) programmable hardware board, namely a Celoxica RC1000 mounting a Xilinx Virtex-E 2000 FPGA part. We describe the architecture of the hardware device, i.e. individual functional blocks and their interactions, and the technical solutions we adopted for maximizing performance. We then illustrate the organization of the device driver, which enforces authentication mechanisms, i.e., accomplishes the task of making the key-store device available only to trusted software modules. Finally, we evaluate the security and performance gain which can be achieved by integrating our device in a case study system. The killer application used for the experimental evaluation is an Attribute Authority (AA) server, i.e. a trusted application that uses cryptographic techniques to certify authorization information to a large community of principals. It is thus of foremost importance that reliable authorization information be made available in a timely fashion.

1

Introduction

In recent years, we have been witnessing the proliferation of Web oriented software applications, consisting of autonomous software components which interact over the network in a fully automated way, towards the provision of crucial functions. In this context, by crucial functions we mean functions whose failure may result in major economic impact and/or loss of human lives. Emerging e-commerce, health-care, and transportation systems are just a few examples of such web applications, but many others are possible. For these applications to work properly, it is of foremost importance that a whole set of services, collectively called infrastructure services, be provided in a reliable and timely fashion. Typical infrastructure services are directory, discovery, and security. This paper focuses on the dependability of security services. Dependability is an umbrella term which encompasses a variety of system attributes, such as reliability, availability, performance, safety, and security [1]. In this context, by dependability we mean security and performance. Since the implementation of many security services – such as authentication, confidentiality, authorization, non-repudiation, and timestamping – relies on cryptography, cryptographic systems are critical applications

both from a security and a performance viewpoint. To meet the stringent dependability requirements of such applications – and ultimately of security services – it is thus crucial that secure and high-performance solutions be provided to store and access sensitive information, i.e. cryptographic keys, used for both symmetric-key and public-key cryptography. Existing solutions can be grouped in four main categories (or a combination thereof):

1. Off-line key – The key is stored on a dedicated server (key server), which is kept off-line. Only authorized human operators can access the server, for executing cryptographic procedures. This solution is adopted, to the best of the authors' knowledge, by several commercial certificate servers, such as [2], [3], and [4];
2. Distributed key – In these schemes, the key is spread over a set of servers, i.e. different shares of the key are stored on each of the servers, and threshold cryptography [11] is used to construct the output of the cryptographic procedure. This solution is adopted, among others, by the Cornell On-line Certification Authority (COCA) system [12], and by the Intrusion Tolerance via Threshold Cryptography (ITTC) project [13];
3. Negotiated key – Parties negotiate a key dynamically, using some kind of protocol which exploits the inherent difficulty of solving a mathematical problem with partial knowledge of the problem's solution. The Diffie-Hellman [10] protocol is one of the most widely known among such schemes. In order for these schemes to work, it is essential that parties be confident about mutual identities (authentication). Again, this is achieved via yet another cryptographic procedure, and related key. Commercial protocols, such as Secure Socket Layer (SSL) [14] and its newer version Transport Layer Security (TLS) [15], incorporate both authentication and key negotiation facilities;
4. Encrypted key – The key is stored in a file on a disk of the machine hosting the cryptographic application. In order to protect the confidentiality of the key, this file is encrypted via yet another cryptographic procedure (and related key or pass-phrase). This solution is adopted by most cryptographic packages, such as OpenSSL [9], and implementations of the Java Cryptography Extension (JCE) [5].

The above described techniques provide only partial solutions to the problem of high-performance and high-security key access, with individual alternatives being characterized by different trade-offs. The first solution (off-line key) offers good security, since an air-gap is the best measure against security breaches, but it is also characterized by extremely poor performance (i.e. a very high latency in key access). The second solution (distributed key) exhibits interesting properties of intrusion tolerance, since it allows the security service to survive breaches in a subset of the servers. However, this comes at a high cost in terms of performance, due to the inherent overhead of deploying threshold cryptography over a distributed platform. Although the latency of key accesses is likely to be lower than in the off-line key solution, it might still be unacceptable under many circumstances. The third solution (negotiated key) has a performance problem (the negotiation phase takes time) and a security problem (authentication keys still need secure storage). Finally, the fourth solution (encrypted key) has the best performance, as compared to the others. However, its security is poor, due to two main reasons. First, the key might

be stolen quite easily. Research conducted in [6] demonstrates that multiple copies of a cryptographic key are present in a system which uses the encrypted key solution. For example, the key may appear in an Operating System swap file which contains intermediate state of a previous signing session, or in a backup file created automatically by the Operating System at fixed intervals, or on a damaged disk sector which is not considered part of the file system. The authors also demonstrate that the attacker can locate such copies in a few minutes, even in systems with gigabytes of disk space. Once the attacker has stolen the key, he¹ can conduct off-line cryptanalytic attacks undisturbed. Second, there is still an open issue regarding secure storage and retrieval of the key or pass-phrase which is used to encrypt the primary key. In this paper, we propose a solution which allows secure storage of cryptographic keys while providing high-performance key access. The solution relies on a custom hardware device implementing a tamper resistant key-store, which is made available only to trusted applications by means of a security enhanced device driver. Our key-store is a low-cost facility which may be used – as an alternative to or in combination with the above discussed solutions – to allow reliable and timely delivery of security services. The device is implemented on top of a Commercial Off The Shelf (COTS) programmable hardware board, namely a Celoxica RC1000 board mounting a Xilinx Virtex-E 2000 FPGA part. The driver has been developed and tested on version 2.4.18-3 of the Linux kernel, but it should work fine on any version newer than 2.2.
As far as security is concerned, we show that the device effectively increases system resilience to security attacks at different levels, and in particular at the user level (an intruder user application is active in the system), at the root level (an intruder launches a shell with root privileges), and at the physical level (an intruder has gained physical access to the system). As far as performance is concerned, we provide experimental results showing that our key-store dramatically cuts down on the latency of key access. A performance comparison is made to the encrypted key solution, which is – among the above discussed alternatives – the one which provides fastest access to cryptographic keys. The speed-up factor achieved on a dual processor server equipped with a hardware RAID controller – if the encrypted key is already in the disk cache – ranges between 1.29 and 2.77 for key lengths ranging between 512 and 4096 bit (experiments on different set-ups indicate that much larger values would be achieved on machines with lower disk I/O bandwidth and less computing power). If the encrypted key is not in the disk cache, the speed-up is between one and two orders of magnitude. We explicitly note that, as is apparent from the area occupation data reported in Section 2, the key-store uses a small percentage of the FPGA cells suitable for implementing logical functions, namely LUT cells. Consequently, it would be entirely possible to use the unoccupied space to implement additional security facilities. In particular, we have already implemented on the same FPGA device a crypto-processor executing the RSA algorithm [20], [21]. We do not describe here the architecture of the crypto-processor due to lack of space. Nevertheless, we explicitly note that

¹ Throughout the paper he and she are used interchangeably.

the authentication mechanisms proposed in this paper would be perfectly suited to enforce authentication also in cases where the FPGA board provides additional security functions. The rest of the paper is organized as follows. Section 2 describes the architecture and the implementation of the hardware device. Section 3 illustrates the organization of the device driver, which accomplishes the task of making the key-store available only to trusted software modules. Section 4 describes the application used as a case study, namely an Attribute Authority (AA) server. In Section 5, we evaluate the security and performance gain which can be achieved by using our device, in the context of the case study application. Section 6 concludes the paper with some final remarks.

2

Key Store Hardware Architecture and Implementation

In this Section we describe the internal structure of the FPGA-based key-store device. A prototypical version of the key-store has been designed on a standard 32-bit PCI card, namely a Celoxica RC1000 card [7]. This board hosts a Xilinx Virtex2000E FPGA and a set of four 2MB SRAM memory banks. The Virtex2000E FPGA consists of interconnection circuitry and an array of Configurable Logic Blocks (CLBs), each including four 4-input Look-Up Tables (LUTs), four flip-flops (FFs), and associated carry chain logic. A simplified schematic of the key-store is depicted in Figure 1.

[Figure 1 shows the Store, consisting of the N, D, and Qinv memory blocks with their address (Addr1-Addr3), enable (EnN, EnD, EnQINV), and W/R signals, and the Controller, consisting of an FSM and a demultiplexer driven by the Start, Reset, KeyID, and W/R inputs, connected through a 32-bit data bus and RAM control signals to the RC1000 board RAM.]

Fig. 1. The FPGA part of the hardware key-store module.

The system consists of two parts: the Controller and the Store. In order to load data from or store data to the FPGA, the RAM of the RC1000 board is used as a buffer, since the board does not allow direct access to the FPGA. Hence, when a key is to be loaded into the key-store, a DMA transfer is activated from central memory to the Celoxica board RAM. Then, the Controller block on the FPGA moves the data from the board RAM to the FPGA RAM (the actual key-store). Since the Celoxica board has an internal 32-bit bus between the RAM blocks and the FPGA part, and the PCI bus is a 32-bit bus too, the FPGA system works serially on 32-bit words. The key-store can hold up to M RSA [22] keys. The actual value of the parameter M is chosen at design time, before the synthesis step is performed (we chose M = 4). Individual components of each key are defined according to PKCS#1 [8] and include the p, q, N, E, D, dP, dQ, and qInv quantities. Such components are stored in order in the memory

blocks of Figure 1, i.e. individual components of the RSA key are mapped to specific physical memory blocks. Hardware requirements of our implementation are reported in Table 1, in terms of the number and the percentage of FPGA basic building blocks, i.e. LUT, FF, and BRAM, used to implement the Controller part and the Store part of the key-store device for different values of the key-length. Percentages are computed with respect to the total number of elementary cells of a specific type available on the FPGA (namely LUT, FF, and BRAM).

Key-length  Controller       Store                 Total
            LUT    FF        LUT   FF   BRAM       LUT          FF           BRAM
512b        335    145       67    2    3          402 (1.05%)  147 (0.38%)  3 (1.88%)
1024b       335    145       70    4    6          405 (1.05%)  149 (0.39%)  6 (3.75%)
2048b       335    145       76    8    12         411 (1.07%)  153 (0.40%)  12 (7.50%)
4096b       335    145       494   5    23         829 (2.16%)  150 (0.39%)  23 (14.38%)

Table 1. Area requirements for implementing the Controller and the Store part of the key-store.

It is worth noting that the key-store uses only a very small percentage (a few percent) of the total FPGA resources. The maximum frequency of the overall system is 39 MHz. This limit is imposed by the maximum cycle rate that the on-board RAM can sustain.

3

Device Driver Architecture and Implementation

We developed a customized device driver to protect the key-store from unauthorized use. The driver embeds mechanisms for ensuring that only authorized applications are granted access to the key-store device. The driver is compiled statically into the kernel and dynamic loading of modules is not enabled. This has two fundamental advantages. First, the driver is actually part of the running kernel. As such, its text and data structures are loaded in kernel space, and thus benefit from kernel space protection mechanisms. Second, it is not possible to load modules into the running kernel (this is pictorially represented in Figure 2, where the unavailability of the insmod and rmmod facilities is emphasized). In particular, it is impossible for a potential attacker to load a malicious device driver module in order to circumvent the access control mechanisms enforced by the running kernel. This section describes the architecture of the driver and provides a thorough treatment of development issues. In our implementation, the driver has been designed for Linux [24]. In particular, the driver has been developed and tested on version 2.4.18-3 of the Linux kernel, but it should work fine on any version newer than 2.2. In the design and implementation of the driver, we made extensive use of information and techniques described in [23]. The overall architecture of the driver is depicted in Figure 2, where, for the sake of simplicity, only one trusted application is considered. In the following, we explain

how such a trusted application can use the key-store, and how other applications are prevented from doing so. By trusted application we mean a software program for which the following conditions hold:

– the system administrator can justifiably trust that the program does not contain malicious code or code which can (easily) be exploited in a malicious way (examples of such programs are COTS software components developed by a trusted vendor, and digitally signed with the vendor's private key);
– it has been launched by the system administrator in person via a protected interface, i.e. one supporting strong authentication of the operator (this typically relies on the use of a tamper resistant personal cryptographic token, such as a crypto-card).

We assume that the trusted application consists of multiple threads belonging to a statically allocated thread pool. In other words, we do not allow the trusted application to dynamically spawn new threads. This is not a limitation, since the key-store device is intended for use by performance critical applications and such applications typically allocate threads statically, to avoid the inherent overhead of dynamic thread allocation. Since, as we already mentioned, the driver is compiled and

[Figure 2 shows, in kernel space, the KS_Structures (the TAI array, holding couples such as TAI0 = (2037, 32022) and TAI1 = (2038, 32714), and the key_desc lists) together with the KS_Functions; the Admin Interface exposes reset_KS_Struct(), add_key(), delete_key(), add_TAI(), and del_TAI(), while the User Interface maps open(), read(), write(), io_ctl(), and close() onto open_KeyStore(), read_key(), store_key(), and close_KeyStore(); the insmod and rmmod facilities are shown as unavailable.]

Fig. 2. Overall scheme of the device driver with Admin and User APIs.

statically linked to the running kernel, it is entirely in kernel space, as illustrated in Figure 2. It provides two programming interfaces: the Admin Interface and the User Interface. The Admin Interface allows the system administrator to manipulate the KS Structures. These kernel level data structures form the basis of the access control mechanisms enforced by the driver when applications try to use the device. The main contents of KS Structures are:

– tai – This is an array containing the information used to authenticate the threads of the trusted application. Each thread is identified by its Task Authentication Info (TAI), i.e. by the couple (PID, start time), where PID is a 32-bit sequentially assigned Process IDentifier and start time is derived from the value of the kernel variable jiffies, a 32-bit unsigned integer storing the number of elapsed ticks since the system was started [24].

– key desc – This is an array of linked lists of key descriptors. A key descriptor is an identifier for a specific key in the key-store. If multiple threads are to be granted access to the same key, the same value for a key descriptor must be present in the corresponding lists. As an example, in Figure 2 threads (2037, 32022) and (2038, 32714) can both access key number 2.
– Key Format – This structure (not shown in the figure) specifies key characteristics such as the key type and the key size of individual keys. The key type supported is PKCS RSA version 2.1. The key size specifies the key length in bits (for PKCS RSA keys, it ranges from 512 to 4096 bits).

Functions provided by the Admin Interface are: reset KS struct (resets the contents of KS Structures); add key (allocates a new entry in the key desc structure and stores a key descriptor); delete key (deletes an entry in the key desc structure); add TAI (stores a TAI in the first unused location of the TAI array); del TAI (deletes a TAI from the TAI array). Since the Admin Interface plays a crucial role for the security of the system, it is protected via strong authentication of the operator. In our implementation, this relies on a smart-card reader, but integration of more sophisticated mechanisms (such as biometric devices, and the like) is straightforward. Once the KS Structures have been properly filled, the trusted application can use the User Interface to access the key-store. This interface consists of four I/O functions, namely open KeyStore, store key, read key, and close KeyStore, which provide a high-level view of the key-store device, and an additional function, list keys (not shown in the figure), which returns the list of keys belonging to a thread. User Interface functions call lower level I/O functions, implemented in the KS Functions, to actually perform the requested actions on the device.

4

Case Study Application

The application we used as a case study is an Attribute Authority (AA) server. An AA is a trusted third party that provides authorization services – i.e. uses cryptographic techniques to certify authorization information – to a potentially large community of principals. By authorization information it is generally meant certified information about principals' roles and rights. Users are most typically components of a distributed software infrastructure. We briefly describe the architecture of the system (i.e. the basic functional units and their interactions), and the phases required for issuing an attribute certificate (AC). The process of issuing an AC entails the following steps (main success scenario), as illustrated in Figure 3:

1. The Registration Authority (RA), which is a trusted client, sends an AC request to the AA server. The format of the AC request is not described in detail, since it is not relevant in this context;
2. The Front End module of the AA server receives the request and performs authentication and integrity checks (i.e. it checks that the request was indeed sent by the RA and that it was not tampered with). The check is done by verifying the RA digital signature embedded in each request. This prevents ACs from being issued for

requests which do not come from the RA. To verify the signature, the Front End module uses the Public Key of the RA. Upon success of the signature verification, the AC request is handed over to the Verification module;
3. The Signing module creates an AC, containing the authorization information to be certified. The AC is signed with the private key of the AA. The implementation of the Signing Module is based on a modified version of the OpenSSL [9] cryptographic engine, which allows the generation of ACs. The module comes in two flavors. Both variants generate the RSA signature in software. However, while one version of the module reads the private key from disk in encrypted form and decrypts it (in software), the other version reads the key from the FPGA-based key-store. This is pictorially represented in Figure 3 by the switching element on top of the sign operation;
4. The Verification module performs a whole set of additional checks. Verification procedures implemented by the module are not detailed here, since they are not relevant in this context.

[Figure 3 shows the Registration Authority (RA) signing an AC request with the RA Private Key and sending it over the network to the Attribute Authority (AA); the Front End checks the RA Signature against the RA Public Key, the Signing module signs the assembled AC with the AA Private Key, read either from disk (in encrypted form, then decrypted) or from the FPGA key-store, and the Verification module compares the results, emitting the AC with its Serial Number if they are identical, or a rejection otherwise.]

Fig. 3. Flow for the issuing of an AC.

We explicitly note that if the FPGA-based key-store is used, any temporary data related to the key is deleted right after the signing operation, as described in Section 3. Conversely, as emphasized in [6], most commercial grade security applications do not take such a security measure, which results in a major security vulnerability.

5

Dependability Analysis

This Section contains results about security and performance evaluation of the key-store device. As far as security is concerned, we conduct a speculation-based analysis which demonstrates that the proposed approach provides the device with the same level of protection as non-swappable kernel memory. Consequently, the device can effectively resist security attacks at different levels, and in particular at the user level, at the root level, and at the physical level. As far as performance is concerned, we provide experimental results showing that our key-store dramatically cuts down on the latency of key access, as compared to the encrypted-key solution. The speed-up factor achieved on a dual processor server equipped with a hardware RAID controller – if the encrypted key is already in the disk cache – ranges between 1.29 and 2.77 for a value of the key length ranging between

512 and 4096 bit (experiments on different set-ups indicate that much larger values would be achieved on machines with lower disk I/O bandwidth and less computing power). If the encrypted key is not in the disk cache, the speed-up is between one and two orders of magnitude.

5.1

Security Gain

In this Section, we discuss the security improvements coming with the adoption of the key-store device. An attack is considered to be successful if the intruder steals the private key (in plain text). This scenario represents a total break of the system, since the intruder would be able to impersonate the key owner. In our case study, the intruder would be able to impersonate the AA Server, and thus generate fake ACs. Since security is a wide and complex issue, which encompasses all levels of a computing system, it is widely accepted by security professionals that no single measure can cope with all kinds of attacks. A well known motto says that the security of a system is only as strong as the weakest link of the entire security chain. Based on this observation, in this Section we consider attacks and discuss countermeasures only for those levels where the availability of a tamper resistant hardware device can help improve security². We assume that such precautions have been taken. Consequently, we do not consider attacks at the personnel level, such as bribing the system administrator. We assume that the personnel is trusted and that the identity of human operators is verified via strong authentication techniques, such as smart-cards, biometric devices, and the like. Nor do we consider network level attacks in detail, such as Denial Of Service (DOS) attacks, which are also beyond the scope of applicability of a tamper resistant security device. In conclusion, we focus on attacks at the following levels: user level (a shell with user level privileges is active in the system), root level (a shell with root privileges is opened), and physical level (an intruder has gained physical access to the system).
Resilience to User Level attacks

There is a large number of vulnerabilities which can be exploited to launch malicious code on a remote host (such as buffer overflow exploits [16]) and an equally large number of countermeasures which can be taken to eliminate such vulnerabilities (such as the use of safe dynamic libraries [17]). We do not discuss attack techniques and patches in detail here, since this is beyond the scope of this work. As far as user level attacks are concerned, we simply assume that the intruder has somehow launched a shell with user level privileges and wants to access the private key. One possible attack is for the intruder to write code issuing calls to the key-store device to read the key. In this case, the intruder program would not be allowed to open the device, since the open routine in the running kernel would detect a call from a user process with a non-registered TAI.

² Some precautions must be taken in order for an FPGA to be a tamper-resistant device. Typical measures are disabling the read-back functionality and using a back-up battery for holding the FPGA configuration [26].

Another possible technique is for the attacker to try to impersonate one of the threads of the trusted application. This entails generating an impostor thread with the same TAI as an authorized one, i.e. with the same PID and the same creation time (TAI components were introduced in Section 3). As far as the PID is concerned, the attacker might spawn new processes until one is found whose PID matches the PID of one of the threads of the trusted application. This technique is often referred to as a loop attack, since it exploits the fact that the PID of a newly created process is the PID of the previously created process incremented by one, which results in PIDs forming a periodic sequence whose period is determined by the maximum value used by the kernel internal counter. In current Linux kernels, such a counter is a 32-bit integer. This would lead to the huge value of 2^32 before the counter is reset. However, for compatibility with traditional Unix systems and legacy software developed for 16-bit hardware platforms, the maximum PID number allowed on Linux is 32767. More details about this can be found in chapter 3 of [24]. With these numbers, an impostor process with the desired PID can be generated in less than ten seconds. However, in order for the attack to be successful, the impostor thread must also have the same creation time. Since jiffies is a 32-bit unsigned integer, it returns to 0 about 497 days after the system has been booted. As a consequence, synchronizing a loop attack procedure with kernel jiffies to obtain a specified TAI is virtually impossible. In conclusion, user level attacks are unable to break the system.

Resilience to Root Level attacks

We now consider root level attacks, i.e. attacks by an intruder who has launched a root shell. Again, attacks based on writing impostor code or impersonating an authorized thread would fail, for the same reasons which were illustrated in Section 5.1.
Another technique, which is only possible if the intruder has root privileges, would be for the attacker to write a malicious device driver. However, since dynamic loading of kernel modules has been disabled at compile time, this approach is not feasible. In conclusion, an attacker is left with three possible approaches to break the system: 1) tampering with the KS Structures data structures; 2) modifying the code of the current macro; and 3) modifying the contents of the PID and start time fields of the process descriptor. All the above mentioned actions entail writing to the kernel memory space. This is not possible with any program or tool running in user mode (we explicitly note that having acquired root privileges does not mean that the processor is running in supervisor mode, but simply that it is executing instructions on behalf of a privileged user). In particular, as far as debuggers are concerned, the attacker would at most be able to use a debugger in read-only mode on the running kernel, i.e. she would not be allowed to change values or set break-points. Moreover, since the key-store device driver has been compiled and statically linked to the rest of the kernel code, and since the wise administrator has (most probably) not enabled the production of debugging information during compilation, using a debugger on the running kernel is not possible at all. For more details, please refer to the documentation shipped with the Linux source code about use of the gdb debugger on kernel code. The only way for an attacker to tamper with the data structures and/or code which enforces security controls is thus to write malicious code performing one of the three actions described above and somehow have the processor execute such code while in

supervisor mode. To the best of our knowledge, there are no easy ways to accomplish this task. One possibility is to use the features provided by some Linux pseudo-devices, such as /dev/mem and /dev/kmem, which provide some level of access to physical/virtual memory. Another possibility is to bring the processor into supervisor mode and have it execute impostor code. The task is even harder if one has to achieve this goal without crashing/hanging the running kernel. In fact, it should be emphasized that, in order for the attack to be of any use, all attack activities should go undetected. As a result, avoiding system crashes/hangs is a key requisite for the attack to be really successful, since it is quite unrealistic, for a critical service, that a system crash/hang would go undetected. For instance, the AA server of our case study is continuously monitored, both directly (by the professionals of the technical staff) and indirectly (by RA clients, which continuously issue AC requests). Under these circumstances, a system crash or hang would most likely be detected, and countermeasures (such as invalidating the stolen key) would be taken. We can thus conclude that: i) breaking the system using this technique appears to be a hard task, and ii) potential system breaks have a very low probability of going undetected.

Resilience to Physical Level attacks

In this scenario, an intruder has somehow gained physical access to the system. Trivial attacks, such as read-back attacks (i.e., using the JTAG ports to read the FPGA configuration), are not feasible since, as already mentioned, we assume that basic protection measures, and in particular disabling the read-back functionality, have been taken. Thus, there are two possible techniques for the intruder to steal the key. One technique is to probe the system bus while the key-store is sending the key to an application (the key is in plain text when it travels on the bus).
Leaving aside the case of an impostor application successfully issuing a read operation to the key-store (which was discussed in Section 5.1), the attacker is left with two possibilities: one is to generate fake service requests to an authorized application, the other is to monitor the system bus during normal system operation. With regard to our case study, the former scenario is virtually impossible, since the only requests that are served (i.e. which result in the generation of a signature) come from authorized clients, which sign their requests with their own private keys. This scenario would thus be a client impersonation problem, not a break of the server. As such, it is not analyzed here. The latter attack is indeed feasible. A possible countermeasure would be to have the key-store put the key on the bus in encrypted form. This would entail integrating an encryption block in the key-store device, for instance an Advanced Encryption Standard (AES) [25] block, and having the key-store device share a secret key with the trusted application, for encryption and decryption of the travelling information. However, the security gain would come at a cost, both in terms of silicon area (resources would have to be allocated on the FPGA to implement the encryption block and to store the additional key) and in terms of performance (encryption and decryption routines would have to be executed by the key-store device and by the software application). While the former is negligible for modern FPGAs, which have plenty of gates and pipelining capabilities, the latter can be prohibitive, since the time

penalty for software encryption and decryption can be relatively large. We decided not to use bus encryption since it is quite unlikely, especially for a critical service, that an intruder gets a chance to probe the system bus undisturbed, i.e. without the attack being detected. A second technique is to reverse engineer the configuration of the FPGA. This entails: i) reading the FPGA configuration, and ii) interpreting it. The first task is indeed very hard, especially if measures have been taken, such as avoiding external copies of the FPGA configuration (e.g. on a ROM). This is typically done by loading the bit-stream into the FPGA and using a battery to hold the configuration. In that case, the only way for an intruder to retrieve the configuration data is to probe the internal circuitry of the FPGA. This means that the attacker would have to open up the package undisturbed, and probe millions of points at undocumented positions without damaging the device. Even if we assume that the attacker successfully reads the configuration of the FPGA, she would still have to perform the second task, i.e. the reverse engineering of the configuration data. Major FPGA vendors, such as Xilinx and Actel, assure that it is virtually impossible to interpret and/or modify the bit-stream of an FPGA, since the irregular row and column pattern of the hierarchical interconnection network exacerbates the inherent complexity of the reverse engineering process [18], [19]. We can then conclude that breaking the system with a physical level attack appears to be an extremely hard task, and that potential system breaks have a very high probability of being detected.

5.2 Performance Gain

In this Section, we evaluate the performance improvement which can be achieved using the key-store, with respect to the encrypted-key solution. We measured the performance gain in terms of key read time, i.e. the time needed to have the key available in plain text. In the following, TKS is the time to read the key from the key-store, and TDISK is the time to retrieve the encrypted key from disk and decrypt it. In order to measure TKS and TDISK, we compiled the AA server described in Section 4, linking the variant of the cryptographic engine of the Signing module which reads the key from the key-store and the variant which reads the key from disk in encrypted form and decrypts it in software, respectively. The two variants were run on a high-end server, namely a Dell PowerEdge 1400SC with two 1400 MHz Pentium III processors, running a Red Hat Linux 2.4.18-3 kernel with dual-processor support. This system is very fast both at disk access (it has a hardware RAID 3 controller equipped with 32 MBytes of disk RAM) and at executing computations, which makes it quite an unfair set-up for the key-store based variant of the system. This is especially true for larger values of the key size (which are those typically used in real-world applications): in fact, for reads larger than 500 bytes, the dominating factor in disk read time is bandwidth, for which RAID performance is definitely good. We decided to use this set-up to present a worst-case performance evaluation study for our key-store. Experiments on different set-ups indicated that even larger gains would be achieved on machines with lower disk I/O bandwidth and less computing power. The time required to retrieve a key of length ranging from 512 to 4096 bits, on the AA server with and without the key-store facility, is reported in Figure 4.

Fig. 4. Time to retrieve the key with and without the key-store.
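A measurement harness in the spirit of this comparison can be sketched as follows. The two retrieval routines are stubs standing in for the real variants, with hypothetical sleep times in place of actual disk and bus latencies; only the structure of the measurement is meaningful.

```python
import time

def read_key_encrypted_from_disk() -> bytes:
    # Stub for the encrypted-key variant: disk read plus software decryption.
    time.sleep(0.002)            # placeholder for I/O and decryption time
    return b"\x00" * 512         # 4096-bit key

def read_key_from_keystore() -> bytes:
    # Stub for the key-store variant: DMA transfer over the system bus.
    time.sleep(0.0005)           # placeholder for bus-transfer time
    return b"\x00" * 512

def mean_time(fn, runs: int = 10) -> float:
    # Average wall-clock time of fn over several runs.
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

t_disk = mean_time(read_key_encrypted_from_disk)   # T_DISK
t_ks = mean_time(read_key_from_keystore)           # T_KS
gain = t_disk / t_ks  # performance gain, as defined in the text
print(f"T_DISK = {t_disk * 1e6:.0f} us, T_KS = {t_ks * 1e6:.0f} us, gain = {gain:.2f}")
```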

Results show that the speed-up is heavily influenced by the state of the system. More precisely, it depends on whether the file containing the encrypted key is found in the disk cache or not. If this file is found in the disk cache, the performance gain (defined as the ratio between TDISK and TKS) ranges from 1.29 to 2.78 for a key size ranging from 512 to 4096 bits (comparison between the two curves in the microseconds region of Figure 4). The shape of the curves can be explained by the following observations. For smaller values of the key length, the main contribution to TDISK is the time to read the encrypted key from disk, while for larger values of the key length, the dominating contribution becomes the time to decrypt the key, which is proportional to the key length and to the inverse of the CPU speed. If the file containing the encrypted key is not in the disk cache, the performance gain is dramatic. Measurements indicate a gain of one order of magnitude in the best case (data is already under the disk heads, i.e. seek time and rotational latency are zero), and of about two orders of magnitude in the worst case (data is one full spin away from the disk heads). A few comments are in order about the individual contributions to TKS. The analysis clearly indicates that the real performance bottleneck is the bus; in the following, we motivate this conclusion. Some contributions to TKS (such as the time for initializing the key-store, and the time to set up a DMA operation between the main processor and the Celoxica on-board RAM) are constant, i.e. they do not depend on the key length. Other contributions (such as the time needed to actually transfer data between the main memory and the Celoxica on-board RAM, and between the Celoxica on-board RAM and the key-store on the FPGA part) instead increase almost linearly with the key length (and proportionally to the inverse of the system bus speed). Since the bandwidth of memory transfers between the main memory and the Celoxica on-board RAM is relatively low (read transfer is 65 Mbit/sec, as measured on the experimental platform), we can conclude that the bus is the performance bottleneck.
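The constant-plus-linear behaviour of TKS can be made concrete with a small model. Only the 65 Mbit/sec read bandwidth comes from the measurements above; the fixed set-up cost is a purely hypothetical figure used for illustration.

```python
BUS_READ_BW = 65e6   # bit/s, measured main-memory <-> on-board RAM read bandwidth
T_SETUP = 50e-6      # s, hypothetical constant cost (key-store init + DMA set-up)

def t_ks_model(key_bits: int) -> float:
    # Constant set-up cost plus a bus-transfer time linear in the key length.
    return T_SETUP + key_bits / BUS_READ_BW

for bits in (512, 1024, 2048, 4096):
    print(f"{bits:4d}-bit key: {t_ks_model(bits) * 1e6:6.1f} us")
```

Under these assumptions the transfer term for a 4096-bit key (about 63 us) already exceeds the fixed cost, so doubling the key length roughly doubles the read time, matching the almost-linear growth observed.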

6 Conclusions

This paper has proposed a hardware-based approach for reliable and timely delivery of security services. The approach relies on a custom hardware device implementing a tamper-resistant RSA key-store, which is made available only to trusted applications by means of a security-enhanced device driver. The key-store is implemented on top of a Commercial Off The Shelf (COTS) programmable hardware board, namely a Celoxica RC1000 mounting a Xilinx Virtex-E 2000 FPGA part. We described the architecture of the hardware device and illustrated the organization of the device driver, which accomplishes the task of making the key-store device available only to trusted software modules. Finally, we evaluated the security and performance gain. As far as security is concerned, we showed that the device can effectively resist attacks at different levels, in particular at the user level (an intruder user application is active in the system), at the root level (a shell with root privileges is opened by an intruder), and at the physical level (an intruder has gained physical access to the system). As far as performance is concerned, we provided experimental results showing that our key-store dramatically cuts down the latency of key access.

Acknowledgements

The authors are grateful to Gianluca D’Ardia for his precious help in programming the FPGA and coding the driver, and to Claudio Basile, Zbigniew Kalbarczyk, Steve Lumetta, and Jun Xu for fruitful technical discussions. This work was supported in part by the Italian National Research Council (CNR), by Ministero dell’Istruzione, dell’Universita’ e della Ricerca (MIUR), by the Consorzio Interuniversitario Nazionale per l’Informatica (CINI), and by Regione Campania, within the framework of the following projects: SP1 Sicurezza dei documenti elettronici, Gestione in sicurezza dei flussi documentali associati ad applicazioni di commercio elettronico, Centri Regionali di Competenza ICT, and Telemedicina.

References

1. J. C. Laprie, “Dependable Computing and Fault Tolerance: Concepts and Terminology”, in Proc. of the 15th International Symposium on Fault-Tolerant Computing, IEEE Computer Society, pp. 2-11, Ann Arbor, MI, 1985.
2. RSA Keon Certificate Authority, RSA web site, http://www.rsasecurity.com/products/keon/datasheets/dskeoncertificateauth.html
3. Baltimore UniCERT Attribute Certificate Server, Baltimore web site, www.baltimore.co.kr/downloads/pdf/baltimoreunicertextendedacs.pdf
4. VeriSign Public Key Infrastructure, VeriSign web site, http://www.verisign.com/products/onsite/index.html
5. SUN web site, http://java.sun.com/products/jce/
6. A. Shamir and N. van Someren, “Playing Hide and Seek with Stored Keys”, Financial Cryptography 1999.
7. Celoxica Ltd web site, datasheets available at: http://www.celoxica.com/technical library/files/CELMRKDATRC1000RC1000
8. RSA Laboratories, “PKCS #1 v2.1: RSA Cryptography Standard”, Draft 2, January 2001.
9. The OpenSSL Project, http://www.openssl.org
10. W. Diffie and M. E. Hellman, “New Directions in Cryptography”, IEEE Transactions on Information Theory, vol. IT-22, no. 6, Nov. 1976, pp. 644-654.
11. Y. Desmedt, “Threshold Cryptography”, European Transactions on Telecommunications, 5(4):449-457, July-August 1994.
12. L. Zhou, F. B. Schneider, and R. van Renesse, “COCA: A Secure Distributed Online Certification Authority”, ACM Transactions on Computer Systems, vol. 20, no. 4, November 2002, pp. 329-368.
13. D. Boneh, M. Malkin, and T. Wu, “Building Intrusion Tolerant Applications”, Proc. of the DARPA Information Survivability Conference and Exposition 2000 (DISCEX ’00), vol. 1, pp. 74-87.
14. Netscape SSL Specification 3.0, November 1996, available at http://wp.netscape.com/eng/ssl3/index.html
15. The TLS Protocol Version 1.1, October 2002, available at http://www.ietf.org/internet-drafts/draft-ietf-tls-rfc2246-bis-02.txt
16. C. Cowan, P. Wagle, C. Pu, S. Beattie, and J. Walpole, “Buffer Overflows: Attacks and Defenses for the Vulnerability of the Decade”, Proc. of the DARPA Information Survivability Conference and Exposition 2000 (DISCEX ’00), vol. 2, pp. 119-129.
17. N. Singh and T. Tsai, “Libsafe: Transparent System-wide Protection against Buffer Overflow Attacks”, Proc. of the International Conference on Dependable Systems and Networks (DSN ’02), p. 541.
18. Xilinx, “Configuration Issues: Power-up, Volatility, Security, Battery Back-up”, Application Note XAPP 092, 1997.
19. QuickLogic, QuickNote #57: “High-Level Design Security with QuickLogic Devices”, 1997.
20. A. Mazzeo, N. Mazzocca, L. Romano, and G. P. Saggese, “FPGA-based Implementation of a Serial RSA Processor”, Proc. of the Design, Automation and Test in Europe (DATE) Conference 2003, pp. 582-587.
21. A. Cilardo, A. Mazzeo, L. Romano, and G. P. Saggese, “Carry-Save Montgomery Modular Exponentiation on Reconfigurable Hardware”, Proc. of the Design, Automation and Test in Europe (DATE) Conference 2004.
22. R. L. Rivest, A. Shamir, and L. Adleman, “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Commun. ACM, vol. 21, pp. 120-126, 1978.
23. A. Rubini and J. Corbet, “Linux Device Drivers”, O’Reilly, 2nd edition, June 2001.
24. D. P. Bovet and M. Cesati, “Understanding the Linux Kernel”, O’Reilly, October 2000.
25. National Institute of Standards and Technology (NIST), Federal Information Processing Standards (FIPS) Publication 197, “Advanced Encryption Standard (AES)”, NIST/U.S. Department of Commerce, Nov. 2001.
26. T. Wollinger, J. Guajardo, and C. Paar, “Cryptography on FPGAs: State of the Art of Implementations and Attacks”, to appear in ACM Trans. on Embedded Computing Systems (TECS), 2003.
