SIMULACIÓN Y APLICACIONES RECIENTES PARA CIENCIA Y TECNOLOGÍA. Y. González, E. Dávila, V. Duarte, M. Candal, O. Pelliccioni, J. Darias, M. Cerrolaza (Editors). © 2016 SVMNI. All rights reserved
DESIGN A GENERATION KEYS MODULE FOR R.S.A ENCRYPTION USING FPGA
Orellana, Rafael; Lacruz, Jesús
Departamento de Electrónica y Comunicaciones, Universidad de Los Andes. Mérida-Venezuela
Coronel, María
Departamento de Circuitos y Medidas, Universidad de Los Andes. Mérida-Venezuela
Abstract. Cryptography is the art of keeping messages secure. The R.S.A algorithm is a high-quality, secure, asymmetric key algorithm used to provide data protection services. This algorithm requires two different keys, one for encryption (public key) and the other for decryption (private key). The computational load depends on the key size in bits, and the algorithm is typically implemented in software. This paper presents the design and logic synthesis of a key generation module for the R.S.A algorithm using a hardware description language, specifically Verilog. The design includes a functional and a structural specification. The functional specification gives the port list description and the functional core that provides the basic functionality of the module. The structural description shows the proposed architecture for the R.S.A keys generation module. The data-path shows the implemented sub-modules and the interconnections between them. A modular exponentiation submodule is used to test candidate prime numbers with the Fermat test. The keys of the R.S.A algorithm are computed using the Euclidean extended algorithm and stored in two registers. Random numbers used in the algorithm are generated with linear feedback shift registers. The module is parametrized to generate keys of 16, 64 and 128 bits. The correct data flow is ensured by the control unit, implemented as a finite state machine. A test bench is designed to check the functionality of the R.S.A keys generation module, simulated using Verilog and Xilinx ISE 14.7. The tests show the computed public and private keys and correct encryption and decryption for the different key sizes. For the logic synthesis, a Spartan-3E FPGA kit board is used with a clock frequency of 50 MHz. The final results characterize the proposed architecture in terms of timing and area and show the advantages of using a parameterizable design.
Keywords: Cryptography, R.S.A, Public key, Private Key, FPGA
Memorias del Congreso Internacional de Métodos Numéricos en Ingeniería y Ciencias Aplicadas, CIMENICS
1. INTRODUCTION
Programmable hardware structures, especially Field Programmable Gate Array (FPGA) devices, are useful for implementing prototypes, offering high-performance hardware at a reduced cost in comparison to Application Specific Integrated Circuits (ASIC). Currently, FPGA devices are composed of complex functional blocks that are useful for implementing complex algorithms [1]. In cryptography applications, the algorithm can be modified at the hardware level while still obtaining good performance, and the design can be connected to high-speed peripheral devices, providing special functions to systems with microprocessors that don't have these features.
Cryptography plays an important role in the security of data. It enables us to store sensitive information or transmit it across insecure networks so that unauthorized persons cannot read it. Encryption algorithms can be classified into two groups: symmetric key algorithms (private key algorithms) and asymmetric key algorithms (public key algorithms) [2]. An asymmetric key algorithm requires two different keys, one for encryption and the other for decryption. Today, the Rivest-Shamir-Adleman (R.S.A) algorithm is the most widely accepted and implemented public key cryptosystem. It is based on two different keys, one key for encryption (public key) and a different but related key for decryption (private key) [3]. However, the R.S.A algorithm has a large computational load, since it operates on large integers (typically thousands of bits long).
Several works have addressed the hardware implementation of the R.S.A encryption algorithm. A hardware implementation of the R.S.A encryption scheme was proposed by Deng Yuliang and Mao Zhigang in [4], where they use the Montgomery algorithm for modular multiplication. A similar approach was used by C.N. Zhang and Y. Xu in [5]. That design focuses on the implementation of an R.S.A cryptographic processor using a Bit-Serial Systolic Algorithm. Another work proposed the modeling of an R.S.A public key encryption/decryption system for 128-bit key sizes using an FPGA [6]. All of these works were implemented assuming that the public key and the private key are already known.
This paper presents an architecture to implement a key generation module for the R.S.A algorithm using an FPGA. It uses a modular exponentiation module to test the prime numbers of the algorithm with the Fermat test. The R.S.A keys are computed using the Euclidean extended algorithm. Random numbers used in the algorithm are generated with linear feedback shift registers (LFSR). The module is parametrized to generate keys of 16, 64 and 128 bits. The correct data flow is ensured by the control unit, implemented as a finite state machine.
2. R.S.A ALGORITHM
R.S.A is a block-based encryption algorithm. This means that both the plain text and the cipher text are numbers between 0 and (n-1). A message longer than log2(n) bits is divided into segments of appropriate length, called blocks, which are encrypted one by one. In addition, as a public key cryptographic algorithm, it is based on a mathematically related key pair formed by a public key and a private key [7]. The R.S.A algorithm is summarized in three main steps: keys generation, encryption and decryption.
2.1 Keys generation
In this step the private and public keys are generated. Figure 1 shows the flow diagram used to calculate them.
Figure 1- Flow diagram to calculate R.S.A keys.
Key generation for R.S.A starts with the selection of two prime numbers (p and q), which are then multiplied to produce the publicly visible modulus n. The strength of the R.S.A algorithm is based on the difficulty of factoring n to discover the original prime numbers. Hence, the larger these primes are, the harder the factorization problem becomes. Then the Euler function φ(n) = (p-1)(q-1) is calculated. Next, an integer E that is relatively prime to φ(n) is randomly chosen as the public key. It must satisfy that the Greatest Common Divisor (GCD) between them is equal to 1, and the public key must lie in the range from 1 to φ(n). The private key, D, is generated by finding the multiplicative inverse of E modulo φ(n) using the Euclidean extended algorithm (EEA).
2.2 Encryption
In the R.S.A algorithm both the plain text (M) and the cipher text (C) are blocks with length less than log2(n) [7]. In encryption, the cipher text is generated by Eq. (1).
$$C = M^{E} \bmod n \qquad (1)$$
Here mod is the modulus operator applied to the exponentiation M^E and the number n.
2.3 Decryption
In the R.S.A algorithm the clear message is recovered using the private key D by applying Eq. (2), where mod is the modulus operator applied to C^D and the number n.
$$M = C^{D} \bmod n \qquad (2)$$
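The three steps above can be illustrated in software with a minimal sketch. The primes 61 and 53, the candidate list and the function names are illustrative assumptions and are not taken from the proposed hardware; the candidate list stands in for the random public keys that the design obtains from an LFSR.

```python
from math import gcd


def generate_keys(p, q, e_candidates):
    """Follow the key generation flow: n, phi(n), first valid E, then D."""
    n = p * q
    phi = (p - 1) * (q - 1)
    for e in e_candidates:
        # E must lie strictly between 1 and phi(n) and be relatively prime to phi(n).
        if 1 < e < phi and gcd(e, phi) == 1:
            d = pow(e, -1, phi)  # multiplicative inverse of E modulo phi(n), Python 3.8+
            return n, e, d
    raise ValueError("no valid public key among the candidates")


def encrypt(m, e, n):
    """Eq. (1): C = M^E mod n."""
    return pow(m, e, n)


def decrypt(c, d, n):
    """Eq. (2): M = C^D mod n."""
    return pow(c, d, n)


# Toy values (assumed for illustration only): 15 and 16 are rejected, 17 is accepted.
n, e, d = generate_keys(61, 53, e_candidates=[15, 16, 17])
c = encrypt(65, e, n)
m = decrypt(c, d, n)
print(n, e, d, c, m)
```

With such small primes the example is only didactic; it shows how the rejected candidates correspond to the GCD condition of the flow diagram.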
The next section presents the design of a hardware module that implements the key generation step of the R.S.A algorithm.
3. R.S.A KEYS GENERATION MODULE DESIGN
The R.S.A keys generation module is designed to obtain the public key and the private key. It is parametrized for key sizes of N bits (16 bits, 64 bits and 128 bits). The description of the port interface is shown in Table 1.
Table 1. Port interfaces of R.S.A keys generation module
Port Name | Direction | Size (bits) | Description
clk_i | Input | 1 | Clock control signal. All signal timings are related to the rising edge of clk_i.
reset_i | Input | 1 | Asynchronous reset signal. It is active LOW and resets the whole module.
start_i | Input | 1 | Used to start the operation of the module. When HIGH, the module calculates the R.S.A keys. LOW indicates that the module does not generate any key.
public_key | Output | N | Public key generated using the R.S.A algorithm. The size is parametrized for 16 bits, 64 bits and 128 bits.
n_number | Output | N | Product of the two prime numbers generated using the R.S.A algorithm. The size is parametrized for 16 bits, 64 bits and 128 bits.
valid_eu | Output | 1 | Flag to indicate that the module has generated valid keys.
3.1 Data-path design
Figure 2 shows the data-path proposed for the R.S.A keys generation module. Control signals come from the control unit to ensure correct data flow.
Figure 2- Data-path of R.S.A keys generation module.
Two LFSR modules are used in the design to generate random numbers. The first one generates random candidates that are tested for primality. A modular exponentiation module is used to implement the Fermat test [8]. When two prime numbers have been found, they are stored in two registers (p_number and q_number). Then the Euler function and the product of the prime numbers are calculated and stored in registers (fi_number and n_number). The second LFSR is used to generate random public keys for the Euclidean extended module. When a random public key satisfies the conditions of the Euclidean algorithm, the public and private keys are calculated and stored in registers (public_key and private_key). The flag valid_eu indicates that the public key (n_number and public_key) and the private key have been computed successfully.
LFSR modules design. A linear feedback shift register is used to generate random numbers. In theory, an N-bit LFSR can generate a pseudo-random sequence of length 2^N - 1 before repeating [9]. This module is parametrized to implement an N-bit LFSR for the random public key and an (N/2)-bit LFSR for the generation of prime numbers. Combinational logic using exclusive OR gates, AND gates and a shift operator is implemented in the feedback loop of the LFSR. The structure of the N-bit LFSR is shown in Figure 3.
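The 2^N - 1 sequence length can be checked with a small software model. The sketch below assumes a 16-bit Fibonacci LFSR with feedback taps at bits 16, 14, 13 and 11, a commonly used maximal-length configuration; the taps and the seed are assumptions, since the feedback polynomial of the proposed module is not stated in the text.

```python
def lfsr16_step(state):
    """One shift of a 16-bit Fibonacci LFSR (assumed taps: 16, 14, 13, 11)."""
    # XOR of the tapped bits produces the new most significant bit.
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_period(seed=0xACE1):
    """Count the shifts needed to return to the (non-zero) seed."""
    state = lfsr16_step(seed)
    count = 1
    while state != seed:
        state = lfsr16_step(state)
        count += 1
    return count


period = lfsr16_period()
print(period)
```

The all-zero state is excluded because XOR feedback can never leave it, which is why the maximum length is 2^N - 1 and not 2^N.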
Figure 3- LFSR module for N-bits
Modular exponentiation module. Figure 4 shows the hardware structure used for the modular exponentiation module.
Figure 4- Modular exponentiation module
Modular exponentiation for large numbers is considerably difficult to compute. However, this operation can be simplified into a series of modular multiplications and squaring operations [8][10]. In this algorithm the bits of the exponent are scanned either from Left to Right (LR) or from Right to Left (RL). In the LR method, which is the one commonly used, the partial result is squared for every scanned bit; if the scanned bit is logic "one", the partial result is additionally multiplied by the base number "A". This operation is performed k times, where "k" is the modulus length [10][11]. The modulus operation is implemented using restoring hardware dividers [12].
Euclidean extended module. The Euclidean extended algorithm is an extension of the Euclidean algorithm that computes the GCD of two integers (a and b) together with the coefficients (x and z) shown in Eq. (3). When GCD(a,b) is equal to 1, the coefficient z represents the multiplicative inverse of b modulo a [13].
$$a\,x + b\,z = \mathrm{GCD}(a,b) \qquad (3)$$
This module takes as inputs the Euler function and a random public key generated by the LFSR. When the conditions on the public key are satisfied, the private key is computed as the multiplicative inverse of the random public key and stored in a register. Figure 5 shows the pseudo code of the Euclidean extended algorithm implemented in hardware.
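As a software counterpart to Eq. (3), the following minimal sketch computes the coefficients iteratively and derives the private key from them. It is a textbook formulation under assumed names (extended_euclid, private_key) and toy inputs, and it does not reproduce the hardware pseudo code.

```python
def extended_euclid(a, b):
    """Return (g, x, z) such that a*x + b*z == g == GCD(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_z, z = 0, 1
    while r != 0:
        quotient = old_r // r
        # Each update keeps the invariants a*old_x + b*old_z == old_r and a*x + b*z == r.
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_z, z = z, old_z - quotient * z
    return old_r, old_x, old_z


def private_key(phi, e):
    """Inverse of e modulo phi, or None when GCD(phi, e) != 1."""
    g, _, z = extended_euclid(phi, e)
    if g != 1:
        return None  # candidate rejected; a new random public key would be requested
    return z % phi   # bring a possibly negative coefficient into the range 0..phi-1


# Toy inputs assumed for illustration: phi(n) = 3120 and two public key candidates.
print(extended_euclid(3120, 17))   # coefficients satisfying Eq. (3)
print(private_key(3120, 17))       # valid candidate
print(private_key(3120, 15))       # rejected candidate, GCD is 15
```

The final reduction modulo phi matters because the coefficient z can be negative, while a key register stores an unsigned value.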
Figure 5- Euclidean extended algorithm to compute the private key
3.2 Control unit design
The control unit is implemented as a finite state machine (FSM) that controls the data flow shown in Figure 2. A state diagram of the FSM is shown in Figure 6. Table 2 shows the port interface of the designed FSM. When the input reset_i is asserted, all registers and modules of the data-path go to their initial conditions and the FSM stays in the IDLE state. When reset_i is deasserted, the input start_i starts the operation of the module. The GENERATE_PRIMES state is used to calculate two prime numbers. An internal counter is implemented to check whether two prime numbers have been generated using the Fermat test in the data-path, and their product is then stored in a register. Next, in the CALCULATE_EULER state the Euler function is stored and then used to calculate the public and private keys. The ENABLE_LFSR_KEY state is used to generate a random number to be used as a possible public key, and the CHECK_KEY state verifies whether the random number lies within the range of possible values for the public key. Next, the GENERATE_KEYS state enables the module that calculates the public and private keys using the Euclidean extended algorithm. Finally, in the END state the keys are stored in registers and a flag is set HIGH to indicate that the task has ended.
Figure 6- State diagram of Control Unit FSM
Table 2. Port interface of Control Unit FSM
Port Name | Direction | Size (bits) | Description
result | Input | N | Indicates the result of the modular exponentiation. It is used with the Fermat test.
done_exp | Input | 1 | When HIGH, indicates the end of the modular exponentiation.
start_primes | Input | 1 | Used to enable the FSM to calculate prime numbers.
counter | Input | 2 | Indicates when two prime numbers have been computed using the Fermat test.
fi_n_reg | Input | N | Register used to store the Euler function.
random_key | Input | N | Random number used as a possible public key in the Euclidean extended algorithm.
valid_eu | Input | 1 | When HIGH, indicates that a valid private key has been computed.
done_eu | Input | 1 | When HIGH, indicates the end of the task of the Euclidean extended module.
enable_prime | Output | 1 | When HIGH, enables the LFSR for the random numbers used to compute prime numbers.
enable_exp | Output | 1 | When HIGH, enables the modular exponentiation module.
enable_key | Output | 1 | When HIGH, enables the LFSR for the random numbers used as possible public keys.
enable_eu | Output | 1 | When HIGH, enables the Euclidean extended module.
store_fi_n | Output | 1 | When HIGH, stores the Euler function in a register.
valid_prime | Output | 1 | When HIGH, indicates that a prime number has been computed.
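The state sequence described for the control unit can be sketched as a next-state function. The state names come from the text, but every transition condition below is an assumption made for illustration (for example, that counter equal to 2 means both primes were found and that a rejected key returns to ENABLE_LFSR_KEY). The reset behaviour and the output signals are not modelled.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    GENERATE_PRIMES = auto()
    CALCULATE_EULER = auto()
    ENABLE_LFSR_KEY = auto()
    CHECK_KEY = auto()
    GENERATE_KEYS = auto()
    END = auto()


def next_state(state, start=False, counter=0, fi_n_reg=0,
               random_key=0, done_eu=False, valid_eu=False):
    """Return the next FSM state; all transition conditions here are assumed."""
    if state is State.IDLE:
        return State.GENERATE_PRIMES if start else State.IDLE
    if state is State.GENERATE_PRIMES:
        # counter == 2 is taken to mean that both primes passed the Fermat test.
        return State.CALCULATE_EULER if counter == 2 else State.GENERATE_PRIMES
    if state is State.CALCULATE_EULER:
        return State.ENABLE_LFSR_KEY
    if state is State.ENABLE_LFSR_KEY:
        return State.CHECK_KEY
    if state is State.CHECK_KEY:
        in_range = 1 < random_key < fi_n_reg
        return State.GENERATE_KEYS if in_range else State.ENABLE_LFSR_KEY
    if state is State.GENERATE_KEYS:
        if not done_eu:
            return State.GENERATE_KEYS  # wait for the Euclidean extended module
        return State.END if valid_eu else State.ENABLE_LFSR_KEY
    return State.END


# Walk through one successful run using the 16-bit values reported in the results.
trace = [State.IDLE]
stimuli = [
    dict(start=True),
    dict(counter=2),
    dict(),
    dict(),
    dict(random_key=0x05B5, fi_n_reg=0x1720),
    dict(done_eu=True, valid_eu=True),
]
for inputs in stimuli:
    trace.append(next_state(trace[-1], **inputs))
print([s.name for s in trace])
```

Keeping the next-state logic as a pure function of the current state and inputs mirrors the usual two-process FSM coding style in Verilog, where this logic would sit in a combinational block.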
4. RESULTS
This section shows the prime numbers computed using the Fermat test, the Euler function number and the performance of the Euclidean extended module when calculating the public key and the private key. The data-path of all modules, the control unit and the test bench are described in Verilog and simulated using Xilinx ISE 14.7. A Spartan-3E board is used to synthesize the code. Results are shown as hexadecimal numbers.
4.1 Prime numbers and Euler function simulation
The Fermat test is implemented using the modular exponentiation module. Figure 7 shows the test bench simulation for the 128-bit size, computing the prime numbers (p_number and q_number), the Euler function number (fi_n_reg) and the product of the prime numbers (n_number). Table 3 presents these values for the 16-bit and 64-bit sizes.
Figure 7- Time diagram for test bench to calculate prime numbers
Table 3. Prime numbers calculated for 16-bits and 64-bits size
N-bits size | p_number | q_number | n_number | fi_n_reg
16 | 95 | 29 | 17DD | 1720
64 | 6CE74E9D | 4EBD374D | 21715B557FBF6039 | 21715B54C43ADA50
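The 16-bit row of Table 3 can be cross-checked in software with a Fermat test built on left-to-right square-and-multiply exponentiation. This is a minimal sketch: the function names and the witness bases (2, 3, 5, 7) are assumptions, since the bases used by the hardware are not stated, and the Fermat test remains a probabilistic check that some composite numbers can pass.

```python
def mod_exp_lr(base, exponent, modulus):
    """Left-to-right binary (square-and-multiply) modular exponentiation."""
    result = 1
    for bit in bin(exponent)[2:]:            # scan exponent bits from MSB to LSB
        result = (result * result) % modulus  # square for every scanned bit
        if bit == "1":
            result = (result * base) % modulus  # extra multiply when the bit is one
    return result


def fermat_test(candidate, witnesses=(2, 3, 5, 7)):
    """Return True if candidate passes a^(candidate-1) mod candidate == 1 for each witness."""
    if candidate < 2:
        return False
    for a in witnesses:
        if a % candidate == 0:
            continue  # the witness is a multiple of the candidate, skip it
        if mod_exp_lr(a, candidate - 1, candidate) != 1:
            return False
    return True


# 16-bit row of Table 3 (hexadecimal values).
p_number, q_number = 0x95, 0x29
n_number = p_number * q_number
fi_n_reg = (p_number - 1) * (q_number - 1)
print(fermat_test(p_number), fermat_test(q_number), hex(n_number), hex(fi_n_reg))
```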
4.2 Euclidean extended module simulation
Figure 8 shows the test bench for the 128-bit public key and private key using the Euclidean extended algorithm. When the GCD of the fi_n_reg and random_key registers is 1, a private key is calculated and the valid_eu flag goes HIGH. The done_eu flag goes HIGH to indicate that the task of the Euclidean extended module is finished. Table 4 shows the 16-bit and 64-bit keys calculated.
Figure 8- Time diagram for test bench to calculate public and private key
Table 4. Public and private key calculated for 16-bits and 64-bits size
N-bits size | Public key | Private key
16 | 05B5 | 02BD
64 | 0F07C000FDDA8A25 | 0AA9A4E7CB35E33D
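The 16-bit keys of Table 4 can be checked against the 16-bit values of Table 3 with a few lines of software. The sketch below verifies that the two keys are inverses modulo the Euler function and runs one encryption/decryption round trip; the message value is an arbitrary assumption.

```python
from math import gcd

# 16-bit entries of Table 3 and Table 4 (hexadecimal values).
n_number, fi_n_reg = 0x17DD, 0x1720
public_key, private_key = 0x05B5, 0x02BD

# The public key must be relatively prime to the Euler function.
coprime = gcd(public_key, fi_n_reg) == 1
# The private key must be the multiplicative inverse of the public key.
keys_are_inverse = (public_key * private_key) % fi_n_reg == 1

message = 0x0123                                  # arbitrary block, smaller than n_number
cipher = pow(message, public_key, n_number)       # Eq. (1)
recovered = pow(cipher, private_key, n_number)    # Eq. (2)
print(coprime, keys_are_inverse, recovered == message)
```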
4.3 Synthesis results
Table 5 shows the area (slices) and timing (maximum clock frequency) summary with the values estimated by synthesis for the R.S.A keys generation module. The FPGA chip used is a Xilinx Spartan-3E xc3s500e-5fg320 with a clock frequency of 50 MHz.
Table 5. Summary with estimated values for R.S.A keys generation module
N-bits size | Number of Slices | Number of Slice Flip Flops | Maximum Frequency
16 | 858 | 1103 | 79.18 MHz
64 | 7628 | 4167 | 40.09 MHz
128 | 30998 | 7917 | 28.49 MHz
It is clear that as the number of bits increases, the maximum achievable frequency decreases, because obtaining the prime numbers and the encryption/decryption keys requires more complex logic in the proposed hardware architecture. For the 16-bit and 64-bit key sizes the number of slices allows the implementation of this module on the selected Xilinx FPGA; however, for the 128-bit key size the slices used exceed those available, so it is necessary to select another FPGA chip.
5. CONCLUSIONS
In this paper a hardware architecture to generate the public and private keys of the R.S.A algorithm is presented. A fast modular exponentiation module is used to implement a primality test, in this case the Fermat test, to obtain two prime numbers. The Euclidean extended algorithm is implemented in hardware to solve a Diophantine equation involving the GCD of the Euler function number and the public key, where the multiplicative inverse of the public key represents the private key. The R.S.A keys generation module is parametrized for key sizes of 16, 64 and 128 bits. The proposed hardware architecture is implemented in Verilog targeting a Xilinx Spartan-3E xc3s500e-5fg320. The whole design is tested using the Xilinx ISE 14.7 tool, showing the correct functionality to obtain the prime numbers, the Euler function number and the R.S.A keys. The synthesis results show that the 128-bit implementation requires more than 100% of the resources of the selected FPGA board, so an FPGA with more resources is necessary for its hardware implementation; the 64-bit and 16-bit key sizes, however, are implemented satisfactorily using the selected kit board. Nowadays, for security reasons, commercial implementations require key sizes of 1024 and 2048 bits, so the parameterizable nature of the proposed hardware architecture can be used to implement them on advanced FPGA kit boards with specialized blocks that reduce area and improve the maximum frequency.
Acknowledgements
We would like to express our special thanks and appreciation to the Electronics and Communications Department of the Electrical Engineering School at the University of Los Andes for their support throughout this work.
REFERENCES
[1] Huffmire, T., Irvine, C., Nguyen, T., Levin, T., Kastner, R., & Sherwood, T., FPGA Updates and Programmability, pp. 87-96. Springer Science+Business Media, 2010.
[2] Schneier, B., Applied Cryptography: Protocols, Algorithms, and Source Code in C, pp. 200-210. John Wiley & Sons, 1996.
[3] Rivest, R., Shamir, A., & Adleman, L., A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Communications of the ACM, vol. 21, n. 2, pp. 120-126, 1978.
[4] Deng, Y., Mao, Z., & Ye, Y., Implementation of RSA Crypto-Processor Based on Montgomery Algorithm. Fifth International Conference on Solid-State and Integrated Circuit Technology, pp. 524-526, 1998.
[5] Zhang, C., Xu, Y., & Wu, C., A Bit-Serial Systolic Algorithm and VLSI Implementation for RSA. IEEE Communications, Computers and Signal Processing, vol. 2, pp. 523-526, 1997.
[6] Sushanta, K., & Manoranjan, P., FPGA Implementation of RSA Encryption System. International Journal of Computer Applications, vol. 19, n. 9, pp. 10-12, 2011.
[7] Prasu, G., Malabika, B., & Biswas, M., Hardware Implementation of TDES Crypto System with On Chip Verification in FPGA. Journal of Telecommunications, vol. 1, n. 1, pp. 113-117, 2010.
[8] Henk, C., & Sushil, J., Encyclopedia of Cryptography and Security, pp. 455-456. Springer, 2011.
[9] Chiranth, E., Chakravarthy, H., Nagamohanareddy, P., Umesh, T., & Chethan, M., Implementation of RSA Cryptosystem Using Verilog. International Journal of Scientific & Engineering Research, vol. 2, n. 5, pp. 1-7, 2011.
[10] Muhammad, I., Mamun, B., Reaz, H., & Sazzad, H., FPGA Implementation of RSA Encryption Engine with Flexible Key Size. International Journal of Communication, vol. 1, n. 3, pp. 107-113, 2007.
[11] Vibhor, G., & Aruna, V., Architectural Analysis of RSA Crypto System on FPGA. International Journal of Computer Applications, vol. 26, n. 8, pp. 30-34, 2011.
[12] Soderquist, P., & Leeser, M., Division and Square Root: Choosing the Right Implementation. IEEE Micro, vol. 17, n. 4, pp. 56-66, 1997.
[13] Cormen, T., Leiserson, C., Rivest, R., & Stein, C., Introduction to Algorithms, pp. 859-861. MIT Press and McGraw-Hill, 2001.