IEEE ISIE 2006, July 9-12, 2006, Montreal, Quebec, Canada
Designing an HMAC-Hash Unit on FPGAs Using Handel-C
Esam Khan, M. Watheq El-Kharashi, Fayez Gebali, and Mostafa Abd-El-Barr
Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada
Emails: {ekhan, watheq, fayez, mabdelba}@ece.uvic.ca

Abstract- In this paper, we use an emerging system design methodology to design a reconfigurable HMAC-hash unit. This methodology maps a design described in a high-level language, Handel-C, directly onto FPGA platforms. The Handel-C approach narrows the gap between performance and flexibility and therefore reduces the risk of translating a high-level prototype into HDLs. It provides a high degree of flexibility from two viewpoints: the level of abstraction of the language and the reconfigurability of the hardware. We consider a detailed case study: a reconfigurable HMAC-hash unit that implements six standard hash functions: MD5, SHA-1, RIPEMD-160, HMAC-MD5, HMAC-SHA-1, and HMAC-RIPEMD-160. Using Handel-C, we enhanced the performance of the unit by applying pipelining, parallelism, and reconfigurability. The resulting HMAC-hash unit architecture is faster than most previously designed units. At the same time, the area cost of placing the six standard algorithms on the same hardware core is kept as low as possible.

I. INTRODUCTION
At present, there are several methodologies for embedded system design. At one extreme, a project is coded as a software program that runs on a general-purpose processor (GPP). This method provides the highest flexibility, because new standards can be accommodated simply by changing the software. However, its performance is poor compared with the other design options. At the other extreme, an application is implemented in dedicated hardware (an ASIC). While ASICs provide the highest performance, an implemented architecture is difficult to modify to accommodate new standards, since doing so requires implementing a new architecture [1]. Between these two extremes, programmable processors and reconfigurable hardware fill the gap between performance and flexibility. The technology commonly used for reconfigurable hardware is the Field Programmable Gate Array (FPGA) [1]. Fig. 1 places these options on the design spectrum, showing their relative performance and flexibility. According to a survey conducted by Celoxica [2], new hardware implementations are moving from ASICs towards FPGAs. This is mainly due to hardware reconfigurability and to the low NRE (Non-Recurring Engineering) and EDA (Electronic Design Automation) costs of FPGA implementations. Although ASICs are still better in terms of performance and unit price, FPGAs have recently been improving in both respects. A typical embedded system design methodology prototypes a design in a high-level language, such as C, and then translates it manually into HDL code. This process is time-consuming and error-prone [3]. Furthermore, efficient HDL designs require the details of the target platform to be known in advance. In other words, HDLs are lower level than high-level languages, which reduces the flexibility of a prototype written in them. A new design approach based on high-level languages is emerging. Its aims are to narrow the gap between performance and flexibility, to shorten design time, and to reduce the risk of errors introduced when a high-level prototype is translated into HDLs. Several languages are used for this purpose [4], and Handel-C is among the most prominent of them [5].

Fig. 1. Comparing different design options in terms of performance and flexibility. [Figure: design options placed by flexibility and performance; labels include programmable processors and FPGA.]

1-4244-0497-5/06/$20.00 © 2006 IEEE

Handel-C is a high-level language that can be mapped directly onto FPGAs. In addition, it has constructs, such as pipelining and parallelism, that improve the performance of a design. Because Handel-C is not device-specific, a design can be synthesized for different FPGA chips. This adds flexibility and allows the design to move to faster devices.

The Handel-C methodology described in this paper offers a high degree of flexibility from two viewpoints: the level of abstraction of the language and the reconfigurability of the hardware. Like any other high-level language, Handel-C code can easily be modified for updates and new standards, and it enables design space exploration. FPGAs, in turn, can easily be reprogrammed for new updates and standards. Besides narrowing the gap between performance and flexibility, this approach provides a fast time-to-market (TTM).
This paper considers a detailed case study: a reconfigurable HMAC-hash unit (described in our previous work [6]). Our HMAC-hash unit consists of two parts. The first is a hash unit designed using a unified hash algorithm to implement MD5, SHA-1, and RIPEMD-160. The second part implements the HMAC algorithm. The HMAC-hash unit is reconfigurable at run time to select one of six algorithms: MD5, SHA-1, RIPEMD-160, HMAC-MD5, HMAC-SHA-1, and HMAC-RIPEMD-160.

Authorized licensed use limited to: UMM AL QURA UNIVERSITY. Downloaded on January 2, 2010 at 04:35 from IEEE Xplore. Restrictions apply.

This paper is organized as follows. Handel-C is briefly described in Section II. The design flow using this methodology is described in Section III. The HMAC-hash unit is described in Section IV, and the application of the Handel-C design methodology to this unit is discussed in Section V. Section VI compares our design with similar work. Finally, Section VII concludes the paper.
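The HMAC part of the unit builds on the keyed-hash construction of FIPS 198 [18]: HMAC(K, m) = H((K0 ⊕ opad) || H((K0 ⊕ ipad) || m)). Here K0 is the key padded with zeros to the 512-bit block size shared by MD5, SHA-1, and RIPEMD-160, and ipad and opad are the bytes 0x36 and 0x5c repeated across the block. The C sketch below is a software model, not the Handel-C hardware, and covers only the key-preparation step. The function name hmac_pad_keys is hypothetical, and the sketch assumes a key of at most 64 bytes. Longer keys are first hashed to form K0, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Block size of MD5, SHA-1 and RIPEMD-160: 512 bits = 64 bytes. */
#define HMAC_BLOCK_BYTES 64

/*
 * Hypothetical helper: builds K0 ^ ipad and K0 ^ opad as defined in FIPS 198.
 * Assumption: key_len <= 64. Longer keys would first be hashed (K0 = H(K)),
 * a step this sketch leaves out.
 */
static void hmac_pad_keys(const uint8_t *key, size_t key_len,
                          uint8_t inner[HMAC_BLOCK_BYTES],
                          uint8_t outer[HMAC_BLOCK_BYTES])
{
    uint8_t k0[HMAC_BLOCK_BYTES] = {0};   /* K0: key followed by zero bytes */

    assert(key_len <= HMAC_BLOCK_BYTES);
    if (key_len > 0)
        memcpy(k0, key, key_len);

    for (size_t i = 0; i < HMAC_BLOCK_BYTES; i++) {
        inner[i] = (uint8_t)(k0[i] ^ 0x36u);   /* ipad byte */
        outer[i] = (uint8_t)(k0[i] ^ 0x5cu);   /* opad byte */
    }
}
```

For the inner hash, the inner padded block is placed in front of the message. For the outer hash, the outer padded block is placed in front of the inner digest. Both are single 512-bit blocks, so each costs one extra block of hashing.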
II. HANDEL-C

Handel-C [5] is a high-level language based on ANSI-C. It is designed to enable direct compilation of programs into FPGA logic or RTL descriptions. It allows designers to import algorithms written in C and use them as a rapid path to hardware implementation and optimization. The advantages of Handel-C over conventional C are that it is fully synthesizable and that it supports parallelism. Most of the constructs available in ANSI-C are also available in Handel-C. In addition, Handel-C adds constructs that simplify hardware design and the direct mapping of a design onto an FPGA. Fig. 2 shows the constructs shared by ANSI-C and Handel-C and the constructs added by Handel-C.
if (a == 0) a++; else delay;
The delay statement does nothing for one clock cycle. It is inserted to balance execution time between the two branches.
Fig. 2. [Figure: constructs of ANSI-C and Handel-C. Recoverable labels: preprocessors; pointers; functions; arithmetic operators, e.g., +, -, *, /, %; bitwise logical operators, e.g., &, |, ^; side effects, e.g., x = i++ * j--;; run-time recursion; floating point; enhanced bit manipulation; parallelism (par{}); RAM, ROM, and MPRAM.]

B. Parallelism

By default, Handel-C executes statements sequentially. However, Handel-C allows parallel execution through the keyword par.

Example 2: The following sequential block takes three clock cycles, whereas the parallel block takes only one cycle.

// Sequential Block
seq
{
  a = 1;
  b = 2;
  c = 3;
}

// Parallel Block
par
{
  a = 1;
  b = 2;
  c = 3;
}
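The timing in Example 2 follows from Handel-C's rule that every assignment, and every delay, takes exactly one clock cycle. A seq block therefore costs the sum of its statements. A par block starts all of its branches together and finishes with the longest one. Inside par, every right-hand side reads the register values from before the clock edge, so par { a = b; b = a; } swaps a and b. The C sketch below is a software model of these two rules, not Handel-C itself. The type regs_t and all function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical register file holding the variables of Example 2. */
typedef struct { int a, b, c; } regs_t;

/* seq: branches run one after another, so their cycle counts add up. */
static int seq_cycles(const int *branch_cycles, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += branch_cycles[i];
    return total;
}

/* par: branches start together; the block ends when the longest one ends. */
static int par_cycles(const int *branch_cycles, int n)
{
    assert(n > 0);
    int longest = branch_cycles[0];
    for (int i = 1; i < n; i++)
        if (branch_cycles[i] > longest)
            longest = branch_cycles[i];
    return longest;
}

/*
 * Hypothetical model of one clock cycle of par { a = b; b = a; }: both
 * right-hand sides use the current values, then all registers update together.
 */
static void par_swap_cycle(regs_t *r)
{
    regs_t next = *r;   /* next-state values, committed at the clock edge */
    next.a = r->b;
    next.b = r->a;
    *r = next;
}
```

With three one-cycle assignments, seq_cycles returns 3 and par_cycles returns 1, which matches Example 2. Written as two ordinary sequential C statements, the swap would instead copy b into both registers. That is why the model computes the whole next state before committing it.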
Example 3: The following code replicates the statement a[i] = b[i];

seq(i=0; i ...
...

x = x >> y; // shift x by two bits to the right, x becomes 0b0011
y--;
x = x << y; // shift x by one bit to the left, x becomes 0b0110
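Handel-C variables have explicit bit widths. Bits shifted out of a variable are therefore lost, and the take (<-) and drop (\\) operators keep or discard a variable's least significant bits. The C sketch below models these operators with masks, and all helper names are hypothetical. The text preserves only the shift results 0b0011 and 0b0110, so the sketch assumes that x in the shift example is 4 bits wide, starts at 0b1100, and that y = 2. For the take-and-drop example that follows, it assumes that y takes the 4 least significant bits of x = 0xC27 and z drops the 8 least significant bits. These amounts were chosen so that both results fit in 4 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Software model of Handel-C bit widths (hypothetical helper names).
 * Example values used with these helpers (x = 0b1100, y = 2, take 4 bits,
 * drop 8 bits) are assumptions, not taken from the text.
 */

/* Mask that keeps the low `width` bits, modelling a `width`-bit variable. */
static uint32_t width_mask(unsigned width)
{
    assert(width >= 1 && width <= 32);
    return width == 32 ? UINT32_MAX : ((UINT32_C(1) << width) - 1u);
}

/* x << n on a `width`-bit variable: bits shifted past the MSB are lost. */
static uint32_t shl_w(uint32_t x, unsigned n, unsigned width)
{
    assert(n < 32);
    return (x << n) & width_mask(width);
}

/* x >> n on a `width`-bit variable: zeros enter from the left. */
static uint32_t shr_w(uint32_t x, unsigned n, unsigned width)
{
    assert(n < 32);
    return (x & width_mask(width)) >> n;
}

/* x <- n (take): the n least significant bits of x. */
static uint32_t take_lsb(uint32_t x, unsigned n)
{
    return x & width_mask(n);
}

/* x \\ n (drop): x without its n least significant bits (width - n bits wide). */
static uint32_t drop_lsb(uint32_t x, unsigned n, unsigned width)
{
    assert(n < width);
    return (x & width_mask(width)) >> n;
}
```

With width 4, shr_w and shl_w reproduce 0b0011 and then 0b0110. Under the stated assumptions, take_lsb(0xC27, 4) gives 0x7 and drop_lsb(0xC27, 8, 12) gives 0xC. The mask distinguishes these operations from plain C shifts on a 32-bit integer, which would keep the bits that a 4-bit register discards.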
2) Take and drop bits: Let x be a 12-bit variable, and let y and z be two 4-bit variables.

x = 0xC27; // initialize x to the hexadecimal value C27
y = x ...
...

while(i != n)
par
{
  processing(i);
  seq
  {
    seq(j=0; j!=16; j++) delay; // delay for 16 clock cycles to free the required registers
    i++;
    preprocessing(i); // preprocessing of next block
    if(i == n-1)
      padding(); // padding of last block
...
else
  padding(); // padding if # of blocks = 1
processing(i); // processing of the last block
where n is the number of 512-bit blocks in the message to be hashed (counting starts from 0), and i is initialized to 0. After the first block of a multi-block message has been preprocessed, the processing stage starts processing the current block i. While a block is being processed, the preprocessing stage prepares the next block. To do this, we use another set of sixteen 32-bit registers, R0 to R15, whose values are copied from X0 to X15, respectively. The inner seq block repeats the delay statement 16 times to ensure that the X registers have been copied and are free. Padding is applied to the last block, which is also the first block when n = 0. Finally, the last block is processed. Preprocessing takes much less time than processing. The preprocessing stage therefore stalls until the processing stage finishes the current block and starts processing the next one. It then preprocesses the following block, if there is one. The par construct in the code above guarantees this behavior. An initialization step is executed in parallel with the preprocessing of the first block. After all message blocks have been processed, a completion stage prepares the hash value for output. These two stages are not shown in the code above.

VI. COMPARISON

In this section, we compare the results of our Handel-C design of the HMAC-hash unit with those of hash units designed using other methodologies. Table I summarizes the comparison. Our design is the only one in the group under study that uses Handel-C. The other designs used different methodologies, e.g., VHDL. The table also shows that our design outperforms most of the other designs in speed. The only exceptions are Selimis et al. [23] and Kitsos et al. [25]. However, each of these two designs includes only one hash function, whereas ours integrates six.
The extra area required by our design is mainly due to the number of hash functions it incorporates. Our design includes six hash algorithms, whereas the other designs include at most four. The table also shows that the performance of our Handel-C-based design is either comparable to or better than that of designs based on other methodologies.
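The speed claim above can be checked directly against the maximum frequencies in Table I. The C sketch below, whose names are hypothetical, lists the designs whose reported maximum frequency exceeds the 44.1 MHz of the proposed design. Using the values as read from Table I, only Selimis et al. [23] at 82 MHz and Kitsos et al. [25] at 47 MHz qualify. Like the text, the check compares frequency only, and the designs target different FPGA families.

```c
#include <assert.h>
#include <stddef.h>

/* Maximum frequencies (MHz) of the other designs, as listed in Table I. */
typedef struct {
    const char *design;
    double fmax_mhz;
} design_fmax_t;

static const design_fmax_t table1_fmax[] = {
    {"Ng [19]", 26.66},
    {"Wang [20]", 21.96},
    {"Kang [21]", 18.0},
    {"Dominikus [22]", 42.9},
    {"Selimis [23]", 82.0},
    {"McLoone [24]", 24.2},
    {"Kitsos [25]", 47.0},
};

/* Maximum frequency of the proposed Handel-C design (Table I). */
#define PROPOSED_FMAX_MHZ 44.1

/*
 * Hypothetical helper: stores in `out` (capacity `cap`) pointers to the
 * entries whose maximum frequency is strictly higher than `ref_mhz`, in
 * table order, and returns how many it stored.
 */
static size_t faster_designs(double ref_mhz, const design_fmax_t **out, size_t cap)
{
    size_t count = 0;
    size_t n = sizeof table1_fmax / sizeof table1_fmax[0];

    assert(out != NULL || cap == 0);
    for (size_t i = 0; i < n && count < cap; i++)
        if (table1_fmax[i].fmax_mhz > ref_mhz)
            out[count++] = &table1_fmax[i];
    return count;
}
```

Calling faster_designs(PROPOSED_FMAX_MHZ, out, 8) stores two entries, Selimis [23] followed by Kitsos [25]. This is consistent with the two exceptions named in the text.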
TABLE I
COMPARISON WITH OTHER DESIGN METHODOLOGIES

Design           Methodology*  HMAC included?  Hash functions**  FPGA vendor  FPGA chip   Area cost (LUTs)  Max. frequency (MHz)
Ng [19]          VHDL          No              M, R              Altera       EPF10K50    1,964             26.66
Wang [20]        -             Yes             M, S              Altera       EP20K1000   5,329             21.96
Kang [21]        -             No              M, S, H           Altera       EP20K1000   10,573            18.0
Dominikus [22]   VHDL          No              M, S, R, S256     Xilinx       XCV300E     4,493             42.9
Selimis [23]     -             Yes             S                 Xilinx       XCV50       1,593             82
McLoone [24]     -             Yes             S                 Xilinx       XCV1000E    14,494***         24.2
Kitsos [25]      VHDL          No              S                 Xilinx       XCV300      5,112             47
Proposed design  Handel-C      Yes             M, S, R           Xilinx       XC2V4000    12,970            44.1

* "-" means not mentioned.
** M = MD5, S = SHA-1, R = RIPEMD-160, H = HAS-160 (The Hash function Algorithm Standard), S256 = SHA-256.
*** This number includes the area cost of the HMAC-SHA-1 core and an encryption core.

VII. CONCLUSION

In this paper, we applied an emerging design methodology to design a reconfigurable HMAC-hash unit. This methodology uses the Handel-C language to describe a design and map it directly onto FPGA platforms. It reduces the risk of errors inherent in the traditional design flow, which prototypes a design in a high-level language and then translates it into HDLs. In addition, it increases the flexibility of a design while enhancing its performance. Using this methodology, we increased the performance of the HMAC-hash unit by applying parallelism and pipelining. We showed that the Handel-C methodology can achieve higher performance than previous work. Moreover, the time required to design, implement, and test the unit with this methodology is reasonably low compared with the time required by other design approaches.

REFERENCES

[1] K. Compton and S. Hauck, "Reconfigurable computing: A survey of systems and software," ACM Computing Surveys, vol. 34, no. 2, pp. 171-210, June 2002.
[2] Celoxica Limited. (2003, Dec.) Survey of system design trends. [Online]. Available: http://www.celoxica.com/techlib/files/CELW040216Z08-256.pdf
[3] Celoxica Limited. (2002, Aug.) Handel-C language overview. Product Brief. [Online]. Available: http://www.celoxica.com/techlib/files/CEL-W0307171KDD-47.pdf
[4] B. Holland, M. Vacas, V. Aggarwal, R. DeVille, I. Troxel, and A. D. George, "Survey of C-based application mapping tools for reconfigurable computing," in Proceedings of the 8th Annual MAPLD International Conference (MAPLD 2005), Sept. 2005.
[5] Celoxica Limited. (2003) Handel-C language reference manual. [Online]. Available: http://www.celoxica.com/techlib/files/CELW030811132Q-60.pdf
[6] E. Khan, M. W. El-Kharashi, F. Gebali, and M. Abd-El-Barr, "A reconfigurable hardware unit for the HMAC algorithm," in Proceedings of the ITI 3rd International Conference on Information & Communication Technology (ICICT 2005), Cairo, Egypt, Dec. 2005, pp. 861-874.
[7] Celoxica Limited. (2005, July) DK software product description for version 4.0. [Online]. Available: http://www.celoxica.com/support/articles/521/CELENGSPDDKDK4.0-Software-Product-Description-01001.pdf
[8] Xilinx. (2005) Xilinx ISE 7 software manuals and help. [Online]. Available: http://toolbox.xilinx.com/docsan/xilinx7/books/manuals.pdf
[9] H. Eriksson and J. Josefsson, "Evaluation of Handel-C," M.Sc. Thesis, Chalmers University of Technology, Gothenburg, Sweden, Nov. 1999. [Online]. Available: http://www.etek.chalmers.se/ e5he/handelc.pdf
[10] J. Lokier. Building custom processors using Handel-C. [Online]. Available: http://www.celoxica.com/techlib/files/CEL-W0307171HF8-4.pdf
[11] Celoxica Limited. (2002, Aug.) Handel-C for hardware design. White Paper. [Online]. Available: http://www.celoxica.com/techlib/files/CELW030717IL48-63.pdf
[12] Celoxica Limited. (2002, Aug.) Introducing software paradigms to hardware design. White Paper. [Online]. Available: http://www.celoxica.com/techlib/files/CEL-W030717IL56-64.pdf
[13] C. Sullivan and M. Saini, "Software-compiled system design optimizes Xilinx programmable systems," Xcell Journal, no. 46, 2003. [Online]. Available: http://www.xilinx.com/publications/xcellonline/xcell-46/ xc pdf/xc-celoxica46.pdf
[14] P. Kocher, R. Lee, G. McGra, A. Raghunathan, and S. Ravi, "DES developed in Handel-C," in London Communication Symposium (LCS 2002), Sept. 2002. [Online]. Available: http://www.ee.ucl.ac.uk/lcs/papers2002/LCS057.pdf
[15] S. M. Loo, B. E. Wells, N. Freije, and J. Kulick, "Handel-C for rapid prototyping of VLSI coprocessors for real time systems," in Proceedings of the 34th Southeastern Symposium on System Theory, Mar. 2002, pp. 6-10. [Online]. Available: http://www.eng.uah.edu/-smloo/ssst2002.pdf
[16] T. Stoecklein and J. Baesig. Handel-C - an effective method for designing FPGAs (and ASICs). [Online]. Available: http://www.celoxica.com/techlib/files/CEL-WO307171HTM-16.pdf
[17] E. Khan, M. W. El-Kharashi, F. Gebali, and M. Abd-El-Barr, "An FPGA design of a unified hash engine for IPSec authentication," in the 5th International Workshop on System-on-Chip for Real-Time Applications (IWSOC'05), Banff, Alberta, Canada, July 2005, pp. 450-453.
[18] NIST. (2002, Mar.) The keyed-hash message authentication code (HMAC). FIPS PUB 198. [Online]. Available: http://csrc.nist.gov/publications/fips/fips198/fips-198a.pdf
[19] C.-W. Ng, T.-S. Ng, and K.-W. Yip, "A unified architecture of MD5 and RIPEMD-160 hash algorithms," in Proceedings of the 2004 International Symposium on Circuits and Systems, ISCAS '04, May 2004, pp. 23-26.
[20] M.-Y. Wang, C.-P. Su, C.-T. Huang, and C.-W. Wu, "An HMAC processor with integrated SHA-1 and MD5 algorithms," in Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC 2004, Jan. 2004, pp. 456-458.
[21] Y. K. Kang, D. W. Kim, T. W. Kwon, and J. R. Choi, "An efficient implementation of hash function processor for IPSec," in Proceedings of the 2002 IEEE Asia-Pacific Conference on ASIC, Taipei, Taiwan, Aug. 2002, pp. 93-96.
[22] S. Dominikus, "A hardware implementation of MD4-family hash algorithms," in Proceedings of the 9th International Conference on Electronics, Circuits and Systems, Sept. 2002, pp. 1143-1146.
[23] G. Selimis, N. Sklavos, and O. Koufopavlou, "VLSI implementation of the keyed-hash message authentication code for the wireless application protocol," in Proceedings of the 10th IEEE International Conference on Electronics, Circuits and Systems, ICECS 2003, Dec. 2003, pp. 24-27.
[24] M. McLoone and J. V. McCanny, "A single-chip IPSec cryptographic processor," in Proceedings of the IEEE Workshop on Signal Processing Systems, SIPS 2002, Oct. 2002, pp. 133-138.
[25] P. Kitsos, N. Sklavos, and O. Koufopavlou, "An efficient implementation of the digital signature algorithm," in Proceedings of the 9th International Conference on Electronics, Circuits and Systems, 2002, vol. 3, Sept. 2002, pp. 1151-1154.