Cryptographic Key Protection Module in Hardware for the Need2Know System

Scott Fields
Electrical & Computer Engineering
University of Tennessee
Knoxville, TN 37996-2100
[email protected]

Don Bouldin
Electrical & Computer Engineering
University of Tennessee
Knoxville, TN 37996-2100
[email protected]

Abstract: Traditional public key cryptographic methods provide access control to sensitive data by allowing the message sender to grant a single recipient permission to read the encrypted message. The Need2Know® system (N2K) improves upon these methods by providing role-based access control. N2K defines data access permissions similar to those of a multi-user file system, but with N2K, access is strongly enforced through cryptographic standards. Since custom hardware can efficiently implement many cryptographic algorithms and can also provide additional security, N2K stands to benefit greatly from a hardware implementation. To this end, the main N2K algorithm, the Key Protection Module (KPM), was specified in VHDL. The design was built and tested incrementally: first the KPM controller was specified, then cryptographic sub-modules for random number generation, Elliptic Curve Cryptography (ECC) encryption and decryption, Advanced Encryption Standard (AES) encryption and decryption, AES Key Wrap, and Secure Hash Algorithm (SHA) were added. Both RTL simulation and formal verification were used during the design cycle. Finally, the design was implemented successfully in an FPGA. This is the first N2K implementation in hardware, and it provides an accelerated and secured alternative to the software-based system. A hardware implementation is a necessary step toward highly secure and flexible deployments of the N2K system.

I.

INTRODUCTION

A. Public Key Cryptography Overview

Since its introduction, public key cryptography has become a widely used and preferred method for exchanging confidential information. The basic application of public key cryptography for encrypting a message is as follows: (1) a potential recipient generates a public/private key pair and distributes the public key, (2) a sender encrypts a message with the recipient's public key and transmits the message, and (3) the recipient receives the encrypted message and decrypts it with his/her private key. Under this scheme, message access is granted on a per-user basis: the message is readable only by the intended recipient. The scheme can be further enhanced if the recipient knows the sender's public key. If the sender encrypts some known value (a hash of the message, for instance) with his/her private key, then the recipient can verify the sender's identity by decrypting the known value using the sender's public key. Even with this enhancement, access control is still on a per-user basis.

B. Need2Know Overview

A user-based scheme is ill-suited to more complex communication among groups of people, so it has limited use for large organizations or coalitions of organizations. With this need in mind, the proprietary public key cryptography system Need2Know (N2K) has been specified. N2K defines data access permissions similar to those of a multi-user file system: read/write and read-only permissions can be assigned to individuals or groups, and permissions can be time-sensitive. The organizational structure is centrally managed, and access is strongly enforced through the use of the cryptographic standards listed in Table I.

TABLE I.

N2K-EMPLOYED CRYPTOGRAPHIC STANDARDS

Algorithm                                          Standard
DRBG: Deterministic Random Bit Generator           ANS X9.63 A.4.1
SHA-512: Secure Hash Algorithm                     FIPS 180
KDF: Key Derivation Function (uses SHA-512)        ANS X9.63 5.6.3
AES-256: Advanced Encryption Standard, 256-bit     FIPS 197
AESKW: AES Key Wrap (uses AES-256)                 FIPS 197
ECDH: Elliptic Curve Diffie-Hellman                ANS X9.63
DSA: Digital Signature Algorithm (uses SHA-512)    FIPS 186-2, ANS X9.62
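
To make the KDF entry in Table I more concrete, the following is a minimal software sketch of an X9.63-style key derivation function built on SHA-512, following the construction as it is publicly described (hash of the shared secret, a 32-bit big-endian counter, and optional shared information). It is an illustration only and is not taken from the N2K specification: the function name, the example shared secret, and the role label used as shared information are all hypothetical, and the paper does not state what N2K supplies as shared information.

```python
import hashlib


def x963_kdf_sha512(shared_secret: bytes, key_len: int, shared_info: bytes = b"") -> bytes:
    """X9.63-style KDF sketch: concatenate SHA-512(Z || counter || SharedInfo)
    for counter = 1, 2, ... and truncate to key_len bytes."""
    if key_len < 0:
        raise ValueError("key_len must be non-negative")
    hash_len = hashlib.sha512().digest_size  # 64 bytes for SHA-512
    blocks = []
    counter = 1
    while len(blocks) * hash_len < key_len:
        # The counter is encoded as a 4-byte big-endian integer.
        data = shared_secret + counter.to_bytes(4, "big") + shared_info
        blocks.append(hashlib.sha512(data).digest())
        counter += 1
    return b"".join(blocks)[:key_len]


if __name__ == "__main__":
    # Hypothetical stand-in for an ECDH shared secret; a real one comes from
    # the elliptic curve key agreement, not from a fixed byte pattern.
    z = bytes(range(32))
    # Derive a 256-bit value of the size a KEK for AES-256 key wrap would need.
    kek = x963_kdf_sha512(z, 32, b"role:example")  # role label is hypothetical
    print(len(kek), kek.hex())
```

A 32-byte output needs only one SHA-512 invocation, which is consistent with the later observation that the key derivation steps involve relatively few hash operations.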

II.

KEY PROTECTION MODULE

A. Processing Steps

N2K data are processed by the Key Protection Module (KPM), which uses the aforementioned cryptographic standards to control the creation of and access to secure data. A simple outline of the encryption process follows:

1) A valid symmetric key (K) and initialization vector (IV) are generated randomly using DRBG.
2) Ephemeral private keys corresponding to the recipient roles are generated using DRBG.
3) Public keys are calculated for each ephemeral private key using ECDH.
4) For each role, a key encryption key (KEK) is derived using KDF and ECDH.
5) For each role, K is wrapped with a KEK using AESKW.
6) The data is encrypted with K and IV using AES-256.
7) The encrypted data, digital signature, and metadata are packaged.
8) The package is signed with the user's private key using DSA.

Decryption is the same process in reverse: if the recipient has the correct role credentials, he/she can unwrap the working key K and decrypt the message, verifying the sender and message contents in the process.

B. Suitability for Hardware Implementation

The KPM is well suited to a hardware implementation for a number of reasons. (1) The cryptographic algorithms that it uses are known to be computationally intensive, and it has been shown repeatedly that hardware implementations can outperform their software equivalents [2]. (2) Processing confidential information is safer in a dedicated-memory environment. (3) N2K access points will be designed for deployment in power-conscious settings, and dedicated hardware is generally more power-efficient than software.

This initial implementation targets FPGAs rather than an ASIC since FPGA platform solutions offer rapid prototyping and low initial cost. Because the design is described in VHDL, it can be re-targeted to a fixed ASIC at any time in the future in order to realize the potential savings in power, delay, area, and cost.

III.
HARDWARE SPECIFICATION OF A CRYPTOGRAPHIC CO-PROCESSOR

In this implementation, the KPM functions as a cryptographic co-processor that communicates with its host CPU via shared dual-port RAM. To begin processing, the host CPU first writes control signals and data into the RAM. Next, the KPM detects that the write has completed and processes the data. When processing of the current data is complete, the KPM writes out its status signals and processed data. Finally, the CPU reads the KPM results, and the process is repeated. For some message lengths, the dual-port RAM cannot accommodate the entire N2K message, so dual-port RAM transfers can occur several times before the message is completely processed.

A. KPM Structure

The KPM is the crossroads of the implementation, as shown in Fig. 1. For description in VHDL, the KPM is split into two entities. The first entity is purely structural, binding the various modules together. Direct interconnects between the cryptographic modules and the KPM are used rather than a shared bus. The chief reason for avoiding a bus was that several legacy modules would have needed modification to support one. Also, while a bus would facilitate design reuse, it would increase latency unless an extremely wide data path were used.

The second entity describes the RTL control logic. The KPM fetches and decodes control words loaded into dual-port RAM by the host CPU. Depending on the instruction, the KPM then enters one of three processing states: encryption, decryption, or Built-In Self Test (BIST).

B. Encryption and Decryption

Encryption and decryption follow the steps outlined in Section II-A. Due to the sequential nature of the algorithm and concerns about area, the implementation instantiates each cryptographic module only once. Parallel processing on different input blocks is precluded by the sequential nature of the KPM. However, pipelining is possible in some cases, when both (a) successive states do not require the same module and (b) the function inputs have been previously generated. Steps 6-8 meet this requirement, and they stand to be a major bottleneck in the KPM. These steps encrypt and sign the message, which can theoretically range from 0 to 2^128 bits.
Additionally, they encompass the data passing with the host CPU, which can suffer from high latency. Thus, these relatively slow operations are pipelined: message blocks are read from dual-port RAM, the data is encrypted (using AES-256), the package is constructed in dual-port RAM, the hash value is generated (using SHA-512), and requests for more data are sent to the host CPU. The pipelining scheme needs to be unusually flexible, since the AES-256 module is multi-cycle and both the RAM reads/writes and the SHA-512 operations are themselves pipelined and take different (sometimes variable) numbers of cycles to complete.

When multiple recipient roles are specified, steps 4 and 5 also meet the pipelining requirement. Using a one-stage pipeline, step 4 could begin generating a new KEK while step 5 wraps K with the previously generated KEK. The implementation, however, does not take advantage of this. While steps 6-8 are a potential bottleneck, steps 4 and 5 generally consist of relatively few hash and encrypt operations, so no effort was made to pipeline them.

[Figure 1 block labels: Host CPU, KPM, DRBG, SHA-512, KDF, AES-256, AESKW, ECDH, DSA]

Figure 1. The KPM manages various cryptographic modules to process requests from the host CPU.

C. Built-In Self Test

The BIST mode simulates encrypt and decrypt requests from the host CPU. Initially, it loads test data into dual-port RAM, and then it begins the encryption process. Upon completion, the results in RAM are compared to internally stored “golden” outputs, and any discrepancies are noted. Generated keys, encrypted results, and hash results are tested in this manner, and then the process is repeated to test decryption. When the BIST completes, it returns the status of its various modules to the host CPU.

D. Cryptographic Modules

There are a number of implementation trade-offs to be considered when choosing cryptographic modules. Two of the most computationally intensive algorithms, and therefore two of the modules most affected by these trade-offs, are AES-256 and SHA-512. The AES-256 module in this implementation is a non-pipelined, LUT-based loop architecture for encrypt/decrypt.
It requires 17 cycles for an encryption or decryption, and there is an additional penalty for the key scheduler whenever a new key is loaded for decryption. This design was previously prototyped at 33 MHz on a Xilinx Virtex-E FPGA to achieve 30 MB/s. While the looped architecture is not extremely fast, it saves area compared to pipelined or unrolled versions. Optimizations to increase the throughput are well documented in the literature [3], but are not used in this implementation.

The SHA-512 module is pipelined to accept 1024 bits of input at 64 bits per cycle. It takes a total of 80 cycles per 1024 bits, and postprocessing can add another 80 or 160 cycles to the final word. The module has been previously prototyped at 66 MHz on a Xilinx Virtex-E to achieve 104 MB/s.

IV.

RESULTS

A. Verification Results

The design was built incrementally: first the KPM controller was specified without the cryptographic modules. Functional simulations were run on the standalone controller using Mentor Graphics ModelSim to test basic functionality, and formal verification was performed using Cadence FormalCheck to check for corner-case errors and deadlocks. Once the KPM tests were satisfactory, the functional cryptographic modules were similarly developed and tested as standalone units. After a module was tested, it was integrated into the whole KPM and the original tests were rerun to ensure that the system was still correct. Incremental verification has included the DRBG, SHA-512, KDF, AES-256, and AESKW modules. At the time of this paper's submission, the full ECDH and DSA modules have yet to be incorporated into the KPM. While the KPM currently holds stub modules for these units, the modules are non-functional, preventing a full-system functional test of the design.

B. Implementation Results

The implementation has targeted two platforms: one based on the Xilinx Virtex-E and the other based on the Xilinx Virtex-II Pro. Logic usage, as reported by Synplicity Synplify Pro, is shown in Table II. The Virtex-E platform used for testing was the Pilchard Reconfigurable Computing Platform [4], so these results include interface logic for communicating with the host CPU over an SDRAM memory bus. Similarly, the Virtex-II Pro platform was a PCI-based solution from Amirix [5], and these results include interface logic for a memory controller and IBM CoreConnect bus.

TABLE II.

DESIGN IMPLEMENTATION RESULTS ARE SHOWN FOR THE VIRTEX-E AND VIRTEX-II PRO

KPM
              Virtex-E (XCV1000E)       Virtex-II Pro (XC2VP30)
Flip Flops    4,725 of 24,576 (19%)     4,759 of 27,392 (17%)
LUTs          6,939 of 24,576 (28%)     7,201 of 27,392 (26%)

KPM + Cryptographic Modules
              Virtex-E (XCV1000E)       Virtex-II Pro (XC2VP30)
Flip Flops    10,133 of 24,576 (41%)    10,135 of 27,392 (37%)
LUTs          15,761 of 24,576 (64%)    15,373 of 27,392 (56%)
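
The reported numbers can be cross-checked with simple arithmetic. The sketch below recomputes the utilization percentages in Table II, estimates how much logic the second configuration adds over the KPM alone, and derives peak throughput from the cycle counts and clock rates quoted for the AES-256 and SHA-512 modules in Section III-D. The function and variable names are hypothetical. The throughput estimate assumes back-to-back blocks with no key-schedule or postprocessing overhead and 1 MB = 10^6 bytes, and attributing the logic difference entirely to the cryptographic modules is also an assumption, since synthesis may optimize across module boundaries.

```python
# Table II values copied from the paper: (used, available) per resource.
TABLE_II = {
    "Virtex-E (XCV1000E)": {
        "KPM": {"Flip Flops": (4725, 24576), "LUTs": (6939, 24576)},
        "KPM + Cryptographic Modules": {"Flip Flops": (10133, 24576), "LUTs": (15761, 24576)},
    },
    "Virtex-II Pro (XC2VP30)": {
        "KPM": {"Flip Flops": (4759, 27392), "LUTs": (7201, 27392)},
        "KPM + Cryptographic Modules": {"Flip Flops": (10135, 27392), "LUTs": (15373, 27392)},
    },
}


def utilization_percent(used, available):
    """Device utilization rounded to the nearest whole percent."""
    return round(100 * used / available)


def added_by_crypto_modules(device, resource):
    """Difference between the two Table II configurations for one resource."""
    kpm_used, _ = TABLE_II[device]["KPM"][resource]
    total_used, _ = TABLE_II[device]["KPM + Cryptographic Modules"][resource]
    return total_used - kpm_used


def peak_throughput_mb_per_s(block_bits, cycles_per_block, clock_hz):
    """Upper bound: one block every cycles_per_block cycles, 1 MB = 10**6 bytes."""
    return (block_bits / 8) * clock_hz / cycles_per_block / 1e6


if __name__ == "__main__":
    for device, configs in TABLE_II.items():
        for config, resources in configs.items():
            for resource, (used, available) in resources.items():
                pct = utilization_percent(used, available)
                print(f"{device} | {config} | {resource}: {pct}%")
        for resource in ("Flip Flops", "LUTs"):
            print(f"{device} | added {resource}: {added_by_crypto_modules(device, resource)}")
    # AES-256: 128-bit block, 17 cycles, 33 MHz. SHA-512: 1024-bit block, 80 cycles, 66 MHz.
    print(f"AES-256 peak: {peak_throughput_mb_per_s(128, 17, 33e6):.2f} MB/s")
    print(f"SHA-512 peak: {peak_throughput_mb_per_s(1024, 80, 66e6):.2f} MB/s")
```

Under these assumptions the recomputed percentages match Table II, and the peak estimates (about 31 MB/s for AES-256 and about 106 MB/s for SHA-512) land slightly above the quoted 30 MB/s and 104 MB/s. The paper does not explain the small difference.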

V.

CONCLUSIONS

This work is the first implementation of the N2K KPM in hardware. The KPM has been functionally and formally verified, and most of its supporting cryptographic modules have been implemented and incorporated into the design. Implementation results have been reported for two FPGA families. This work provides an accelerated alternative to the software-based system, and its development is a necessary step toward highly secure and flexible deployments of the N2K cryptographic role-based access control system.

VI.

FUTURE WORK

A number of optimizations exist for the various cryptographic modules, and future improvements on the KPM will take these into account. In addition, it would be useful to convert the KPM co-processor into a re-usable IP block for incorporation in larger System-on-Chip designs. In this effort, supporting logic will need to be developed to implement the N2K technique fully. Finally, since the KPM is intended to be used in insecure environments, its resistance to physical attack needs to be assessed and possibly improved with a tamper-resistant package.

ACKNOWLEDGMENTS

The authors would like to thank A. Miller and S. Carrithers of the University of Tennessee for their shared authorship of several cryptographic modules and Ersin Domangue of InfoAssure for his explanation of the KPM specifications. This work was partially supported by the Office of Naval Research grant number N00014-04-1-0562 via the National Center for Advanced Secure Systems Research.

REFERENCES

[1] “InfoAssure – Products / Need2Know,” [Online]. Available: http://www.infoassure.net/need2know.html.

[2] J. Nechvatal, et al., “Report on the Development of the Advanced Encryption Standard (AES),” October 2, 2000. [Online]. Available: http://csrc.nist.gov/CryptoToolkit/aes/round2/r2report.pdf.

[3] F.X. Standaert, G. Rouvroy, J.J. Quisquater and J.D. Legat, “Efficient Implementation of Rijndael Encryption in Reconfigurable Hardware: Improvements and Design Tradeoffs,” Proceedings of the 2003 Cryptographic Hardware and Embedded Systems (CHES) Conference, Sept., 2003, pp. 334-350.

[4] P.H.W. Leong, M.P. Leong, O.Y.H. Cheung, T. Tung, C.M. Kwok, M.Y. Wong and K.H. Lee, “Pilchard - A Reconfigurable Computing Platform with Memory Slot Interface,” Proceedings of the 2001 IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), April, 2001, pp. 170-179.

[5] “Amirix AP100 Platform FPGA Development Board,” [Online]. Available: http://www.amirix.com/products.
