A HIGH PERFORMANCE FPGA IMPLEMENTATION OF DES

Máire McLoone and John V McCanny
DSiP™ Laboratories, School of Electrical and Electronic Engineering,
The Queen's University of Belfast, Northern Ireland.
Email: {Maire.McLoone, J.McCanny}@ee.qub.ac.uk

Abstract: FPGAs have proven to be very effective and efficient devices on which to implement encryption algorithms. They perform at much faster data-rates and provide better security than equivalent software implementations. They also provide more flexibility than ASIC implementations. This paper presents a high performance silicon intellectual property (IP) core for the Data Encryption Standard (DES) encryption algorithm. The 16-stage pipelined DES design runs at an encryption rate of 3.87 Gbits/sec using Xilinx Virtex FPGA technology, making this the fastest single-chip DES FPGA implementation reported to date. This result is 28 times faster than software implementations.

Keywords: FPGA, ASIC, Intellectual Property (IP), pipelined

1 INTRODUCTION
The Data Encryption Standard (DES) algorithm is the most widely used encryption algorithm and has been a federal standard since 1977. Although soon to be replaced by the Advanced Encryption Standard (AES) algorithm, DES will remain in the public domain for a number of years. It provides a basis for comparison for new algorithms and is also used in IPSec protocols, ATM cell encryption, the Secure Sockets Layer (SSL) protocol and in TripleDES, which was adopted to improve on DES in the X9.17 and ISO 8732 standards [1,2].

A 16-stage pipelined DES algorithm hardware implementation is outlined in this paper. It allows 16 data blocks to be processed simultaneously, resulting in a substantial gain in speed. It also supports the use of a different key every clock cycle, thus improving overall security, since users are not restricted to using the same key during any one session of data transfer. The design is implemented on Xilinx Virtex FPGA technology. The advantages of implementing cryptographic algorithms on FPGAs include algorithm agility, where the same FPGA can be reprogrammed at run time to support different algorithms; scalable security, through different versions of the same algorithm (DES and TripleDES); and alterable architecture parameters, where features such as variable s-boxes or different modes of operation can be realised [3].

Section 2 of this paper discusses relevant background work. Section 3 provides a brief description of the Data Encryption Standard algorithm. In Section 4, the design procedure and performance of the pipelined DES design are described. A performance evaluation of pipelined DES algorithm architectures is carried out in Section 5. Finally, concluding remarks are made in Section 6.

0-7803-6488-0/00/$10.00 © 2000 IEEE

2 RELATED WORK
Leonard and Mangione-Smith [4] published the first paper on FPGA implementations of the DES algorithm in 1997. Their fastest implementation achieves a data-rate of 3.3 Mbytes/sec on a Xilinx XC4000 series device. However, the design does not support decryption, and each key must be precomputed before it can be used in the device. A single-chip implementation of DES on a Xilinx XC4000 series device has been described by Wong et al. [5]. Their design achieves an encryption speed of 26.7 Mbits/sec. Kaps and Paar [3] carried out extensive research on high-speed FPGA architectures for the DES algorithm. Among other designs, they consider a four-stage pipelined design, which achieves a data-rate of 402.7 Mbits/sec on the XC4028EX device. Recently, Patterson [6] presented a paper describing a DES encryption implementation using JBits on a Virtex XCV150-6 FPGA. JBits provides a Java-based Application Programming Interface (API) for the run-time creation and modification of the configuration bitstream, which allows dynamic circuit specialization based on a specific key and mode [6]. In this implementation, the key schedule is computed in software and forms part of the bitstream. Therefore, all key input and sub-key generation circuitry is removed and, when pipelined, the result is a design with a throughput of 10.7 Gbits/sec. A free-DES core [7] also exists, which uses a 16-stage pipelined DES design implemented on a Virtex XCV400-6 device and achieves a throughput of 3.05 Gbits/sec. The Sandia National Laboratories (SNL) DES ASIC implementation is the fastest known DES implementation, capable of running at 10 Gbits/sec. The ASIC was fabricated in static 0.6 micron CMOS technology [8]. The fastest publicly available DES software implementation encrypts at 15 Mbits/sec on a 200 MHz Pentium [9]. However, a paper by Eli Biham [10] outlines a DES implementation which achieves a throughput of 137 Mbits/sec on a 300 MHz Alpha 8400 processor.
A recent paper by Elbirt and Paar [11] illustrated a fully pipelined Serpent algorithm implementation (Serpent is a candidate for the AES). The design performed at an encryption rate 9 times faster than implementations involving iterative looping and loop unrolling. Table 1 below summarises the performance of some of the implementations outlined above and, for comparison purposes, the specifications for a number of commercial hardware implementations of the DES algorithm are given in Table 2.


Table 1. Specifications for Recent DES FPGA Implementations

Authors                   Device Used   CLB Slices   System Clock   Data Rate (Mbytes/sec)
Leonard, Mangione-Smith   XC4013        520          6.6 MHz        3.3
Wong, Wark, Dawson        XC4020E       438          10 MHz         3.34
Kaps, Paar                XC4028EX      741          39.7 MHz       50.33
Patterson                 XCV150        1584         168 MHz        1344

Table 2. Specifications for Commercial DES Hardware Implementations

Manufacturer      Device Used   CLBs   System Clock   Data Rate (Mbytes/sec)
VLSI Technology   VM009         -      33 MHz         14
Memec             XCV4000E      316    43 MHz         21.5
CAST Inc.         XCV150-6      255    101 MHz        50.5
Free-DES          XCV400        5263   47.7 MHz       363.9

3 DATA ENCRYPTION STANDARD (DES) ALGORITHM
DES is a private key (symmetric) algorithm. An outline of DES is shown in Fig. 1. It is a block cipher operating on 64-bit blocks of plaintext using a 64-bit key. Every 8th bit of the 64-bit key is used for parity checking and is otherwise ignored. After an initial permutation, the 64-bit input is split into a right half (R0) and a left half (L0), each 32 bits in length. DES has 16 iterations, or rounds. In each round a function, f, is performed in which the data is combined with a 48-bit sub-key obtained by permuting bits of the key. After the 16th iteration, the right (R16) and left (L16) halves are concatenated as R16L16, and a final permutation, which is the inverse of the initial permutation, completes the algorithm.
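The round structure described above can be illustrated with a short Python sketch. It is a generic Feistel skeleton rather than a DES implementation: the initial and final permutations are omitted, and `toy_f` is a hypothetical stand-in for the real function f. It does show why the halves are output as R16L16: with that ordering, running the same structure with the sub-keys in reverse order decrypts, and f itself never needs to be invertible.

```python
from typing import Callable, List

MASK32 = 0xFFFFFFFF

def feistel_encrypt(block64: int, subkeys: List[int],
                    f: Callable[[int, int], int]) -> int:
    """16-round DES-style Feistel structure (initial/final permutations omitted)."""
    left, right = (block64 >> 32) & MASK32, block64 & MASK32   # L0, R0
    for k in subkeys:                                           # rounds 1..16
        # L_i = R_{i-1},  R_i = L_{i-1} XOR f(R_{i-1}, K_i)
        left, right = right, left ^ f(right, k)
    return (right << 32) | left                                 # concatenate as R16 L16

def feistel_decrypt(block64: int, subkeys: List[int],
                    f: Callable[[int, int], int]) -> int:
    # The same structure with the sub-keys applied in reverse order undoes encryption.
    return feistel_encrypt(block64, list(reversed(subkeys)), f)

def toy_f(r: int, k: int) -> int:
    # HYPOTHETICAL stand-in for DES function f: NOT the real expansion/s-box/P function.
    return ((r * 0x9E3779B1) ^ k) & MASK32

keys = [(0x1234567 * (i + 1)) & 0xFFFFFFFFFFFF for i in range(16)]  # 48-bit sub-keys
ct = feistel_encrypt(0x0123456789ABCDEF, keys, toy_f)
assert feistel_decrypt(ct, keys, toy_f) == 0x0123456789ABCDEF
```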

3.1 Function f of the DES Algorithm
The function f of the DES algorithm is made up of four operations. Firstly, the 32-bit right half of the permuted plaintext, R0, is expanded to 48 bits and then XORed with a 48-bit sub-key, K1. The result is fed into eight substitution boxes (s-boxes), which transform the 48-bit input into a 32-bit output; each s-box maps 6 input bits to 4 output bits. Finally, a straight permutation (P-permutation) is performed, the output of which is XORed with the initial left half, L0, to obtain the new right half, R1. The original right half, R0, becomes the new left half, L1. This is outlined in Fig. 2.
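The expansion and key-mixing steps of function f can be sketched as follows. The E table is generated from its regular structure (each 6-bit group takes four consecutive bits of R plus one neighbouring bit on either side, wrapping from bit 32 to bit 1), which reproduces the E bit-selection table of FIPS 46-3. The function names are illustrative, and the s-box substitution and P-permutation are left out.

```python
from typing import List

def expansion_table() -> List[int]:
    """DES E bit-selection table (1-indexed positions, bit 1 = MSB of R)."""
    return [((4 * j + k - 1) % 32) + 1 for j in range(8) for k in range(6)]

E = expansion_table()

def expand(r32: int) -> int:
    """Expand a 32-bit right half to 48 bits by selecting bits listed in E."""
    out = 0
    for pos in E:
        out = (out << 1) | ((r32 >> (32 - pos)) & 1)
    return out

def sbox_inputs(r32: int, subkey48: int) -> List[int]:
    """XOR the expanded half with the sub-key and split into eight 6-bit s-box inputs."""
    x = expand(r32) ^ subkey48
    return [(x >> (42 - 6 * i)) & 0x3F for i in range(8)]

print(E[:12])  # prints [32, 1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
```

Because 16 of the 32 input bits feed two s-boxes each, the expansion is pure wiring in hardware and costs no logic.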


[Fig. 1. Outline of DES Encryption Algorithm: plaintext, 16 rounds of function f with sub-keys K1 to K16 on 32-bit halves [31:0], concatenation R16L16, final permutation, ciphertext.]

[Fig. 2. Function f of the DES Algorithm: R[0:31] passes through the expansion permutation, is XORed with Ki[47:0], passes through the S-boxes and the P-permutation, and is XORed with L[0:31].]


4 PIPELINING THE DES ALGORITHM
The iterative nature of the DES algorithm makes it ideally suited to pipelining. The DES algorithm implementation presented in this paper is based on the Electronic Codebook (ECB) mode. Although the ECB mode is less secure than other modes of operation, it is commonly used and its operation can be pipelined [12]. The fully pipelined DES implementation can also operate in counter mode. Counter mode is a simplification of Output Feedback (OFB) mode and involves updating the input plaintext block as a counter, $I_{i+1} = I_i + 1$, rather than using feedback. Hence, ciphertext block i is not required in order to encrypt plaintext block i+1 [13]. Counter mode provides more security than ECB mode, but operating in either mode involves trading some security for high throughput. In order to pipeline the algorithm, the function f block must be instantiated 16 times. Registers are then placed at the left and right outputs of each function f block to allow the data to be sequenced. It is also necessary to delay the sub-keys entering each block. This is achieved by the addition of a skew, which delays the individual sub-keys by the required amount. The structure of the pipelined DES is shown in Fig. 3.
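A minimal sketch of the counter-mode input sequence, following the description above. The wrap-around modulo 2^64 is an assumption (the paper does not say how overflow is handled), and `counter_inputs` is a hypothetical helper name. Because no input depends on an earlier ciphertext, every input block is known in advance and one can be issued to the pipeline on each clock cycle, whereas a feedback mode would have to wait for the previous block to leave all 16 stages.

```python
from typing import List

BLOCK_MASK = (1 << 64) - 1   # 64-bit DES block

def counter_inputs(initial: int, count: int) -> List[int]:
    """Input blocks I_0, I_1, ... with I_{i+1} = I_i + 1 (assumed mod 2^64).

    Each block is computed from the counter alone, never from a previous
    ciphertext, so all of them are available before encryption starts.
    """
    return [(initial + i) & BLOCK_MASK for i in range(count)]

print(counter_inputs(5, 3))  # prints [5, 6, 7]
```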

4.1 S-boxes
A study by Haskins [14] indicates that the use of ROM blocks provides the most efficient implementation for the s-boxes of the DES algorithm. On the Virtex FPGA device, the two LUTs within a slice can be combined to create a 32 x 1-bit synchronous RAM [15], which is initialised with the s-box contents and used to implement the s-boxes. Each s-box maps a 6-bit input to a 4-bit output (64 x 4 = 256 bits), so eight 32 x 1-bit RAM blocks are required for each s-box.
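The arithmetic behind the eight RAM blocks can be checked in software. The sketch below, using the published DES s-box S1, splits a 64-entry x 4-bit table into eight 32 x 1-bit tables and confirms that reading them reproduces the s-box for every input. Which input bit selects between each pair of 32 x 1-bit RAMs is not stated in the paper; here the outer bit b1 is assumed, and the helper names are hypothetical.

```python
from typing import List

# DES s-box S1 (FIPS 46-3): 4 rows x 16 columns of 4-bit outputs.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox_lookup(table: List[List[int]], x6: int) -> int:
    """Standard DES indexing: row = outer bits b1 b6, column = inner bits b2..b5."""
    row = ((x6 >> 4) & 0b10) | (x6 & 1)
    col = (x6 >> 1) & 0xF
    return table[row][col]

def to_ram_blocks(table: List[List[int]]) -> List[List[List[int]]]:
    """Split a 64 x 4-bit s-box into eight 32 x 1-bit memories.

    rams[bit][half] stores output bit `bit` for inputs whose b1 equals `half`
    (assumed split); the remaining five input bits form the 32-entry address.
    """
    return [[[(sbox_lookup(table, (half << 5) | addr) >> bit) & 1
              for addr in range(32)]
             for half in range(2)]
            for bit in range(4)]

def ram_lookup(rams: List[List[List[int]]], x6: int) -> int:
    half, addr = x6 >> 5, x6 & 0x1F   # b1 selects one of each pair of RAMs
    return sum(rams[bit][half][addr] << bit for bit in range(4))

rams = to_ram_blocks(S1)
print(sbox_lookup(S1, 0b011011), ram_lookup(rams, 0b011011))  # prints 5 5
```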

4.2 Key Procedure
There are two methods of implementing the DES key procedure. The initial step in both methods is to remove the parity check bits from the 64-bit key. Every 8th bit is used for parity checking, leaving 56 bits. A different 48-bit sub-key is then generated for each of the 16 rounds of DES. In the first method, the sub-keys are determined by first splitting the 56 bits into two 28-bit halves. Both halves are then rotated left by either one or two bits, depending on the round number, and the 48-bit sub-key is selected from the result. In the second method, the resulting 48-bit permutation for each round is hard-coded. This is the method used in the design outlined in this paper. It eliminates the need for shift logic and thus results in a faster implementation.
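The equivalence of the two key procedures can be illustrated as follows. This sketch drops the parity bits and compares per-round rotation (the first method) against precomputed index tables that fold the cumulative rotation into fixed wiring (the second method). PC-1 is omitted and `SELECT48` is a hypothetical stand-in for the PC-2 selection table, so the sub-keys produced are not real DES sub-keys; only the rotation amounts are taken from the standard.

```python
from typing import List

# Per-round left-rotation amounts of the two 28-bit halves (FIPS 46-3).
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

# HYPOTHETICAL 48-of-56 selection used in place of DES's PC-2 table;
# substitute the real PC-2 (and PC-1) from FIPS 46-3 for actual sub-keys.
SELECT48 = [i for i in range(56) if i % 7 != 6]

def strip_parity(key64: int) -> List[int]:
    """Return the 56 key bits (MSB first), dropping every 8th (parity) bit."""
    bits = [(key64 >> (63 - i)) & 1 for i in range(64)]
    return [b for i, b in enumerate(bits) if i % 8 != 7]

def subkeys_iterative(bits56: List[int]) -> List[List[int]]:
    """Method 1: rotate both 28-bit halves each round, then select 48 bits."""
    c, d = bits56[:28], bits56[28:]
    out = []
    for s in SHIFTS:
        c, d = c[s:] + c[:s], d[s:] + d[:s]   # circular left shift by s
        cd = c + d
        out.append([cd[i] for i in SELECT48])
    return out

def wiring_tables() -> List[List[int]]:
    """Method 2: fold the cumulative rotation into one fixed index table per round."""
    tables, total = [], 0
    for s in SHIFTS:
        total += s
        rotated = ([(j + total) % 28 for j in range(28)] +
                   [28 + (j + total) % 28 for j in range(28)])
        tables.append([rotated[i] for i in SELECT48])
    return tables

def subkeys_hardwired(bits56: List[int], tables: List[List[int]]) -> List[List[int]]:
    """Each sub-key bit is a direct copy of one key bit, so no shift logic is needed."""
    return [[bits56[i] for i in table] for table in tables]

bits = strip_parity(0x133457799BBCDFF1)
assert subkeys_hardwired(bits, wiring_tables()) == subkeys_iterative(bits)
```

Since the rotations total 28, the halves return to their starting position after round 16, which is why the last wiring table is just the selection itself.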


4.3 Skew Design
For the 16-stage pipelined DES design, it is necessary to control the time at which the sub-keys are available to each function f block. This is accomplished by the addition of a skew that delays the individual sub-keys by the required amount. The skew consists of a ‘dffarray’ sub-component, which generates a sequence of latches as required. The code for this sub-component is outlined in Fig. 4. The ‘skew’ component generates an array of latches of varying lengths. It uses the ‘dffarray’ sub-component to produce the correct number of latches required at each round of the DES algorithm. The code for the ‘skew’ component is shown in Fig. 5. One latch is generated to delay the second sub-key, two latches are generated to delay the third sub-key, and so on. (It is not necessary to delay the first sub-key.)
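A cycle-level Python model, assuming one round per pipeline stage and a new block with its own 16 sub-keys issued every cycle, shows how the skew lines up each sub-key with its block. It illustrates the timing only and is not the authors' VHDL; `toy_f` is a hypothetical stand-in for function f, and the initial and final permutations are omitted. The (j+1)th sub-key passes through a delay line of j registers, so round j+1 receives it in the same cycle as the block it belongs to, and the outputs match a non-pipelined computation even when the key changes on every block.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

MASK32 = 0xFFFFFFFF
ROUNDS = 16

def toy_f(r: int, k: int) -> int:
    # HYPOTHETICAL stand-in for DES function f (not the real s-box function).
    return ((r * 0x9E3779B1) ^ k) & MASK32

def reference(block: int, subkeys: List[int]) -> int:
    """Non-pipelined 16-round Feistel computation (IP/FP omitted)."""
    left, right = block >> 32, block & MASK32
    for k in subkeys:
        left, right = right, left ^ toy_f(right, k)
    return (right << 32) | left

def simulate_pipeline(blocks: List[int],
                      schedules: List[List[int]]) -> List[Tuple[int, int]]:
    """Issue one block (with its own 16 sub-keys) per cycle; return (cycle, output)."""
    # Skew: delay line j holds j registers (the first sub-key is not delayed).
    skew: List[Deque[Optional[int]]] = [deque([None] * j) for j in range(ROUNDS)]
    # stage[j] is the register on the L/R outputs of round j + 1.
    stage: List[Optional[Tuple[int, int]]] = [None] * ROUNDS
    outputs: List[Tuple[int, int]] = []
    for cycle in range(len(blocks) + ROUNDS - 1):
        issuing = cycle < len(blocks)
        sched = schedules[cycle] if issuing else [None] * ROUNDS
        keys_now = []
        for j in range(ROUNDS):
            skew[j].append(sched[j])          # shift the delay line by one register
            keys_now.append(skew[j].popleft())
        incoming = (blocks[cycle] >> 32, blocks[cycle] & MASK32) if issuing else None
        new_stage: List[Optional[Tuple[int, int]]] = [None] * ROUNDS
        for j in range(ROUNDS):
            src = incoming if j == 0 else stage[j - 1]
            if src is not None:
                left, right = src
                new_stage[j] = (right, left ^ toy_f(right, keys_now[j]))
        stage = new_stage                     # all registers update on the same edge
        if stage[-1] is not None:
            left, right = stage[-1]
            outputs.append((cycle, (right << 32) | left))
    return outputs
```

With this convention, the block issued in cycle t leaves the last stage at the end of cycle t + 15, i.e. after 16 clock cycles.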

[Fig. 3. Structure of Pipelined DES Algorithm Design: cascaded function f blocks operating on L and R, with sub-keys K1, K2, K3, ... supplied by the key core through the skew.]

The pipelined DES design is large, as it contains 16 instantiations of the function f component; hence the targeted device is the largest in the Virtex family, the XCV1000. The design uses 6446 CLB slices, which is 52% of the CLB slices available on this device, and 188 of the 404 IOBs (46%). This design uses a system clock of 60.5 MHz. Because the design is pipelined, one 64-bit block is completed every clock cycle, so the bit rate is 64 times the clock frequency and the data-rate achieved is 3.87 Gbits/sec (484 Mbytes/sec). In this design, data blocks can be accepted every clock cycle and, after an initial delay of 16 clock cycles, the respective encrypted/decrypted data blocks appear on consecutive clock cycles. The design also supports a key change at full speed, i.e. a different key can be used with every block of plaintext.
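The throughput and latency figures above follow from simple arithmetic, sketched below with illustrative function names. The iterative case (one round per clock cycle, 16 cycles per block) is included only for contrast; the 50.5 Mbytes/sec listed in Table 2 for the CAST core at 101 MHz is consistent with such a design, although its architecture is not stated here.

```python
def pipelined_bit_rate(clock_hz: float, block_bits: int = 64) -> float:
    """Fully pipelined: once the pipeline is full, one block completes per cycle."""
    return clock_hz * block_bits

def iterative_bit_rate(clock_hz: float, block_bits: int = 64,
                       cycles_per_block: int = 16) -> float:
    """Iterative (one round per cycle): one block every 16 cycles."""
    return clock_hz * block_bits / cycles_per_block

def cycles_for_blocks(n_blocks: int, stages: int = 16) -> int:
    """First result after `stages` cycles, then one result on each following cycle."""
    return stages + n_blocks - 1

rate = pipelined_bit_rate(60.5e6)
print(f"{rate / 1e9:.2f} Gbits/sec, {rate / 8 / 1e6:.0f} Mbytes/sec")
# prints: 3.87 Gbits/sec, 484 Mbytes/sec
```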


Fig. 4. Code for the ‘dffarray’ sub-component

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
use work.TYPES.all;

-- KEY is of type std_logic_vector(0 to 47), declared in package TYPES
entity dffarray is
  generic (Depth : POSITIVE);
  port (Keyin      : in  KEY;
        clk, reset : in  std_logic;
        D_Key      : out KEY);
end dffarray;

architecture synth of dffarray is
  -- Component declarations: 48-bit register with reset
  component dff1
    port (D          : in  std_logic_vector(0 to 47);
          clk, reset : in  std_logic;
          Q          : out std_logic_vector(0 to 47));
  end component;

  type MIDKEY is array (0 to Depth) of KEY;
  signal A : MIDKEY;   -- outputs of the chained registers
begin
  G1: for I in 0 to Depth generate
  begin
    -- the first register is fed from the key input
    S1: if I = 0 generate
      FF1: dff1 port map (D => Keyin, clk => clk, reset => reset, Q => A(I));
    end generate S1;
    -- each later register is fed from the previous one
    S2: if I /= 0 generate
      FF2: dff1 port map (D => A(I-1), clk => clk, reset => reset, Q => A(I));
    end generate S2;
  end generate G1;
  -- ...
end synth;

Fig. 5. Code for the ‘skew’ component (fragment)

  -- ...
  ... D_Key ... I) port map (Keyin => SkewKeyin(I+1), clk => clk, reset => reset,
                             D_Key => SkewD_Key(I+1));
  end generate G2;
  SkewD_Key(0) ...
