Jour of Adv Research in Dynamical & Control Systems, Vol. 10, 03-Special Issue, 2018
FPGA Based Schonhage Strassen Integer Multiplication Algorithm
Joseph Anthony Prathap, Associate Professor, Vardhaman College of Engineering, Shamshabad, Telangana.
Sarvigari Akhila Reddy, Vardhaman College of Engineering, Shamshabad, Telangana.
Vattam Shravya, Vardhaman College of Engineering, Shamshabad, Telangana.
Madugula Aparna, Vardhaman College of Engineering, Shamshabad, Telangana.
Moghal Raval Cheruvu Nagma, Vardhaman College of Engineering, Shamshabad, Telangana.
Abstract--- This paper presents integer multiplication using the Schonhage Strassen method. The Schonhage Strassen Algorithm is based on the finite fast Fourier transform, referred to as the Number Theoretic Transform. The Number Theoretic Transform is advantageous over the complex-number Discrete Fourier Transform because round-off error is eliminated in the multiplication algorithm. Conventionally, integer multiplication is performed using the Karatsuba algorithm, the Toom Cook algorithm, the Booth algorithm or Robertson’s algorithm. The Schonhage Strassen Algorithm is suitable for large integers, on the order of 10,000 to 40,000 decimal digits. In this work, the proposed Schonhage Strassen Algorithm based integer multiplication is developed in VHDL. The real-time validation of the proposed Schonhage Strassen Algorithm based integer multiplication is carried out using Xilinx Spartan FPGA devices. The speed, area and power are evaluated and compared across different Xilinx FPGA families.
Keywords--- Integer Multiplication, Number Theoretic Transform, Schonhage Strassen Algorithm, Field Programmable Gate Array.
I. Introduction
The multiplication operation is widely used in real-time applications, and the design of the multiplier is pivotal in digital design. Almost all digital multiplication algorithms exhibit a trade-off among speed, area and power. Although many multiplication algorithms exist, such as the Karatsuba algorithm, the Toom Cook algorithm, Fourier Transform based algorithms and polynomial multiplication, their use in real time involves trade-offs with respect to area, speed, power and cost. Over the years, the fast Karatsuba multiplication algorithm has evolved and been enhanced considerably. The Karatsuba-Comba multiplier based on FPGA delivers low latency at the expense of area, owing to the complexity of the hardware design [1]. The coprocessor based integer multiplier exploits parallelism within the Karatsuba algorithm and shortens the critical path [2]. The Comba multiplication targeted at the Xilinx Virtex 7 FPGA aids the Fully Homomorphic Encryption (FHE) algorithm [3]. A two-fold speed-up is achieved by using multiple-precision arithmetic (MPA) for two integers of 210 bits [4]. The shift and add multiplication algorithm shows a substantial improvement in maximum operating frequency [5]. The Urdhva Tiryagbhyam algorithm of the Vedic multiplier exhibits high performance compared to the DFT multiplication method [6]. Each multiplication algorithm has its merits and demerits depending on the application. The squaring and multiplication method achieves the desired level of security in number theory based cryptosystems [7]. The Schonhage Strassen Algorithm (SSA) with the MapReduce model presents an efficient parallel multiplier for terabit integers [8]. Multiplication algorithms implemented in digital controllers have proved acceptable. The modular multiplication implemented using FPGA reduces the area with a slight increase in the critical path of a few modules in the design [9].
The Xilinx Spartan 3 FPGA implementation of multiplication algorithms shows less delay with the Booth multiplier and less hardware with Robertson’s algorithm [10]. The real-time FPGA implementation of matrix-vector multiplication demonstrates its importance in video processing, computer graphics, image and DSP applications [11].
ISSN 1943-023X Received: 5 Feb 2018/Accepted: 15 Mar 2018
In this work, the Schonhage Strassen Algorithm based integer multiplication is designed in VHDL and implemented in real time on a Xilinx Spartan FPGA. The SSA method uses the modulus based Fourier Transform referred to as the “Number Theoretic Transform (NTT) algorithm”. The NTT algorithm assumes all values to be non-negative integers. The VHDL code for the proposed work follows the structural style of modelling. The real-time implementation of the SSA based multiplication is performed on the Xilinx Spartan 3A DSP FPGA device. The next section explains the proposed method in detail.
II. Schonhage Strassen Algorithm based Integer Multiplication
The multiplication based on the SSA method is performed using the finite FFT algorithm, which operates on integers throughout the evaluation. The procedure for integer multiplication using the SSA is as follows:
a) The number of digits for the multiplication is fixed as 8.
b) If an operand has fewer than 8 digits, its sequence is zero padded at the end to 8 values.
c) The two formatted 8-valued sequences are the inputs to the NTT algorithm; their length is the order of the sequence, denoted “n”.
d) The value of “n” must be a non-negative integer.
e) The NTT algorithm is based on the modulus operator. In the NTT algorithm, all values of the input sequence have to lie within the range 0 to M, where M is the selected modulus, i.e., 1 ≤ n ≤ M.
f) The working modulus in the NTT algorithm (the modulus written as M in the other steps) is formulated as
\[ N = kn + 1 \tag{1} \]
where k is an integer > 1 and n is the order of the sequence.
g) The primitive nth root of unity of the n-point FFT is replaced in the NTT algorithm by
\[ \omega = g^{k} \bmod N \tag{2} \]
where g is the generator. The generator value “g” is chosen based on the following conditions:
   1) A candidate value, say ‘a’, is assumed for ‘g’.
   2) For the prime number N, the value N-1 is factorised as a product of two values, say x and y.
   3) The candidate “a” is then checked by
\[ a^{N-1} \bmod M = 1 \tag{3} \]
\[ a^{x} \bmod M \neq 1 \quad \text{and} \quad a^{y} \bmod M \neq 1 \tag{4} \]
h) After the values of ‘N’, ‘M’ and ‘a’ are finalised, the convolution based on the NTT algorithm is evaluated.
i) The two 8-value input sequences are fed through the 8 × 8 matrix of the NTT algorithm, which is given below:
\[
\begin{pmatrix}
(W_8^0)^0 & (W_8^0)^1 & (W_8^0)^2 & (W_8^0)^3 & (W_8^0)^4 & (W_8^0)^5 & (W_8^0)^6 & (W_8^0)^7 \\
(W_8^1)^0 & (W_8^1)^1 & (W_8^1)^2 & (W_8^1)^3 & (W_8^1)^4 & (W_8^1)^5 & (W_8^1)^6 & (W_8^1)^7 \\
(W_8^2)^0 & (W_8^2)^1 & (W_8^2)^2 & (W_8^2)^3 & (W_8^2)^4 & (W_8^2)^5 & (W_8^2)^6 & (W_8^2)^7 \\
(W_8^3)^0 & (W_8^3)^1 & (W_8^3)^2 & (W_8^3)^3 & (W_8^3)^4 & (W_8^3)^5 & (W_8^3)^6 & (W_8^3)^7 \\
(W_8^4)^0 & (W_8^4)^1 & (W_8^4)^2 & (W_8^4)^3 & (W_8^4)^4 & (W_8^4)^5 & (W_8^4)^6 & (W_8^4)^7 \\
(W_8^5)^0 & (W_8^5)^1 & (W_8^5)^2 & (W_8^5)^3 & (W_8^5)^4 & (W_8^5)^5 & (W_8^5)^6 & (W_8^5)^7 \\
(W_8^6)^0 & (W_8^6)^1 & (W_8^6)^2 & (W_8^6)^3 & (W_8^6)^4 & (W_8^6)^5 & (W_8^6)^6 & (W_8^6)^7 \\
(W_8^7)^0 & (W_8^7)^1 & (W_8^7)^2 & (W_8^7)^3 & (W_8^7)^4 & (W_8^7)^5 & (W_8^7)^6 & (W_8^7)^7
\end{pmatrix}
\tag{5}
\]
j) The transformed values of the two sequences are multiplied point-wise, and the resulting 8 values are fed to the IFFT based NTT algorithm, which uses the following 8 × 8 matrix:
\[
\begin{pmatrix}
(W_8^0)^{-0} & (W_8^0)^{-1} & (W_8^0)^{-2} & (W_8^0)^{-3} & (W_8^0)^{-4} & (W_8^0)^{-5} & (W_8^0)^{-6} & (W_8^0)^{-7} \\
(W_8^1)^{-0} & (W_8^1)^{-1} & (W_8^1)^{-2} & (W_8^1)^{-3} & (W_8^1)^{-4} & (W_8^1)^{-5} & (W_8^1)^{-6} & (W_8^1)^{-7} \\
(W_8^2)^{-0} & (W_8^2)^{-1} & (W_8^2)^{-2} & (W_8^2)^{-3} & (W_8^2)^{-4} & (W_8^2)^{-5} & (W_8^2)^{-6} & (W_8^2)^{-7} \\
(W_8^3)^{-0} & (W_8^3)^{-1} & (W_8^3)^{-2} & (W_8^3)^{-3} & (W_8^3)^{-4} & (W_8^3)^{-5} & (W_8^3)^{-6} & (W_8^3)^{-7} \\
(W_8^4)^{-0} & (W_8^4)^{-1} & (W_8^4)^{-2} & (W_8^4)^{-3} & (W_8^4)^{-4} & (W_8^4)^{-5} & (W_8^4)^{-6} & (W_8^4)^{-7} \\
(W_8^5)^{-0} & (W_8^5)^{-1} & (W_8^5)^{-2} & (W_8^5)^{-3} & (W_8^5)^{-4} & (W_8^5)^{-5} & (W_8^5)^{-6} & (W_8^5)^{-7} \\
(W_8^6)^{-0} & (W_8^6)^{-1} & (W_8^6)^{-2} & (W_8^6)^{-3} & (W_8^6)^{-4} & (W_8^6)^{-5} & (W_8^6)^{-6} & (W_8^6)^{-7} \\
(W_8^7)^{-0} & (W_8^7)^{-1} & (W_8^7)^{-2} & (W_8^7)^{-3} & (W_8^7)^{-4} & (W_8^7)^{-5} & (W_8^7)^{-6} & (W_8^7)^{-7}
\end{pmatrix}
\tag{6}
\]
k) The convolved outputs of the 8-point input sequences are shifted and added to obtain the desired product of the integers.
l) The flowchart for the proposed NTT method is shown in Fig. 1.
The flowchart depicts the SSA based integer multiplication. Integer values of 5 digits are converted to 8-digit input sequences by zero padding. The two 5-digit integers are represented as A={a1,a2,a3,a4,a5,a6,a7,a8} and B={b1,b2,b3,b4,b5,b6,b7,b8}. The NTT based FFT is applied to A and B individually, using the 8 × 8 matrix representation. The values in the matrices are fixed according to the modulus function. The recursive point-wise multiplication is performed between the transformed values of A and B. The resulting values are subjected to the IFFT, which is again based on the NTT. The 8 output values are weighted by the corresponding powers of 10 and added. The result is the desired product of the integer multiplication.
[Fig. 1 flowchart, in order: integer inputs A = {a1,a2,a3,a4,a5,a6,a7,a8} and B = {b1,b2,b3,b4,b5,b6,b7,b8}, taken as A[0:7] and B[0:7]; the FFT 8 × 8 NTT matrix is applied to each of the two integer inputs A and B; recursive point-wise multiplication Y[0:7] = A[0:7] * B[0:7]; the value of Y[0:7] is fed through the IFFT based 8 × 8 NTT matrix; orientation of the Y[0:7] values using the powers of 10; desired integer product output.]
Fig. 1: Flowchart of the SSA Based Integer Multiplication
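The procedure of this section can be sketched in software as follows. This is a minimal Python sketch under stated assumptions, not the authors' VHDL design: it uses M = 337 and W = 85 as reported in the Results section, orders the digits least significant first, and applies the scaling by 1/n modulo M that an inverse NTT requires (the text does not state this step explicitly). All function names are hypothetical.

```python
# Illustrative sketch only: M = 337 and W = 85 are the values quoted in the
# Results section; every function name here is hypothetical.
M = 337        # prime modulus, 337 = 42 * 8 + 1
W = 85         # primitive 8th root of unity modulo 337
N_POINTS = 8   # transform length n


def to_digits(value):
    """Decimal digits of value, least significant first, zero padded to 8."""
    digits = [int(ch) for ch in reversed(str(value))]
    return digits + [0] * (N_POINTS - len(digits))


def ntt(seq, root):
    """Matrix form of the transform: X[i] = sum_j seq[j] * root**(i*j) mod M."""
    return [sum(seq[j] * pow(root, i * j, M) for j in range(N_POINTS)) % M
            for i in range(N_POINTS)]


def intt(seq, root):
    """Inverse transform: negative powers of root, then scaling by 1/n mod M."""
    inv_root = pow(root, M - 2, M)    # modular inverse via Fermat (M is prime)
    inv_n = pow(N_POINTS, M - 2, M)
    return [(x * inv_n) % M for x in ntt(seq, inv_root)]


def ssa_multiply(a, b):
    """Multiply two non-negative integers through an 8-point NTT."""
    # The cyclic convolution equals the true digit convolution only when the
    # linear convolution has at most 8 coefficients (no wrap-around).
    if len(str(a)) + len(str(b)) - 1 > N_POINTS:
        raise ValueError("operands too long for an 8-point transform")
    fa = ntt(to_digits(a), W)
    fb = ntt(to_digits(b), W)
    fy = [(x * y) % M for x, y in zip(fa, fb)]   # point-wise product
    coeffs = intt(fy, W)                         # convolution coefficients
    # Shift and add: weighting coefficient i by 10**i also resolves carries.
    return sum(c * 10 ** i for i, c in enumerate(coeffs))


print(ssa_multiply(3145, 7436))   # 23386220
```

Under the length check above, the shorter operand has at most 4 digits, so each convolution coefficient is at most 4 × 81 = 324, which stays below the modulus 337 and is therefore recovered exactly. Longer operands would need a larger transform length and modulus.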
III. Results and Discussions
The input values A[0:7] and B[0:7] of the two input sequences are verified with five different combinations. The prime number 337 is fixed as the modulus value ‘M’, and the generator value is taken as W=85. The inputs A=03145 and B=07436 give Product = 23386220. Similarly, A=03145 and B=27436 give Product = 86286220; A=05678 and B=21234 give Product = 120566652; A=00275 and B=82117 give Product = 22582175; and A=05678 and B=01234 give Product = 7006652. The outputs obtained are exact for all five input sets. The ModelSim simulation output of the five input sets for the proposed SSA method is given in Fig. 2. The real-time implementation of the SSA based integer multiplication is evaluated using Xilinx FPGA family devices. Fig. 3 shows the RTL view with the detailed schematic of the proposed method. The FPGA devices Xilinx Spartan 3A DSP, QPro Virtex6 Hi-Real and Xilinx Spartan 6 are used to compare the proposed work with respect to area and power. The device utilization charts for the proposed method on these FPGAs are presented in Table 1, Table 2 and Table 3. Tables 1-3 indicate that the Xilinx Spartan 3A DSP FPGA has lower component usage than the other FPGAs (for example, 46 DSP blocks against 64). The Xilinx Spartan 6 FPGA consumes less power with the proposed method, as depicted in Table 6. Table 4 and Table 5 show the power analysis of the proposed method using the Xilinx Spartan 3A DSP and QPro Virtex6 Hi-Real respectively. Table 7 depicts the timing analysis of the proposed method using the Xilinx FPGA devices. The delay of the proposed SSA based integer multiplication is lowest on the Virtex 6 FPGA device, with a maximum path delay of 34.043 ns.
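The modulus and root quoted at the start of this section (M = 337, W = 85) can be checked against equations (1) to (4). The sketch below is an illustration with hypothetical helper names. It uses the standard, stricter form of the generator test, which checks every prime factor q of N - 1 instead of a single two-factor split, and it assumes g = 10 purely as one generator that reproduces W = 85; the paper does not state which generator was used.

```python
# Illustrative check; helper names are hypothetical, and g = 10 is an assumed
# generator (the paper does not say which g produced W = 85).
M = 337        # modulus from the text
N_POINTS = 8   # transform length n
W = 85         # root of unity from the text


def prime_factors(value):
    """Distinct prime factors by trial division (fine for small values)."""
    factors, p = set(), 2
    while p * p <= value:
        while value % p == 0:
            factors.add(p)
            value //= p
        p += 1
    if value > 1:
        factors.add(value)
    return factors


def is_generator(a, modulus):
    """True if a generates the multiplicative group modulo a prime modulus."""
    order = modulus - 1
    return a % modulus != 0 and all(
        pow(a, order // q, modulus) != 1 for q in prime_factors(order))


def is_primitive_root_of_unity(w, n, modulus):
    """True if w has multiplicative order exactly n modulo the prime modulus."""
    return pow(w, n, modulus) == 1 and all(
        pow(w, n // q, modulus) != 1 for q in prime_factors(n))


k = (M - 1) // N_POINTS                                # equation (1): M = k*n + 1
print(k, is_primitive_root_of_unity(W, N_POINTS, M))   # 42 True
print(is_generator(10, M), pow(10, k, M))              # True 85, equation (2)
```

Note that W = 85 itself is not a generator of the full multiplicative group: it has order 8, which is exactly what an 8-point transform needs.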
Fig. 2: Simulation Output of the Proposed SSA based Integer Multiplication
Fig. 3: RTL View of the Proposed SSA based Integer Multiplication
Table 1: Device Utilization Chart for the Proposed Method Using Xilinx Spartan 3A DSP

Device Utilization Summary

| Logic Utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Latches | 56 | 33,280 | 1% |
| Number of 4 input LUTs | 2,338 | 33,280 | 7% |
| Number of occupied Slices | 1,363 | 16,640 | 8% |
| Number of Slices containing only related logic | 1,363 | 1,363 | 100% |
| Number of Slices containing unrelated logic | 0 | 1,363 | 0% |
| Total Number of 4 input LUTs | 2,447 | 33,280 | 7% |
| Number used as logic | 2,338 | | |
| Number used as a route-thru | 109 | | |
| Number of bonded IOBs | 272 | 519 | 52% |
| Number of DSP48As | 46 | 84 | 54% |
| Average Fanout of Non-Clock Nets | 1.71 | | |
Table 2: Device Utilization Chart for the Proposed Method Using QPro Virtex6 Hi-Real FPGA

Device Utilization Summary

| Slice Logic Utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Registers | 292 | 160,000 | 1% |
| Number of Slice LUTs | 1,681 | 80,000 | 2% |
| Number of occupied Slices | 552 | 20,000 | 2% |
| Number of LUT Flip Flop pairs used | 1,685 | | |
| Number with an unused Flip Flop | 1,393 | 1,685 | 82% |
| Number with an unused LUT | 4 | 1,685 | 1% |
| Number of fully used LUT-FF pairs | 288 | 1,685 | 17% |
| Number of slice register sites lost to control set restrictions | 40 | 160,000 | 1% |
| Number of bonded IOBs | 272 | 400 | 68% |
| Number of DSP48E1s | 64 | 480 | 13% |
| Average Fanout of Non-Clock Nets | 1.92 | | |
Table 3: Device Utilization Chart for the Proposed Method Using Xilinx Spartan 6 FPGA

Device Utilization Summary

| Slice Logic Utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of Slice Registers | 292 | 93,296 | 1% |
| Number of Slice LUTs | 1,693 | 46,648 | 3% |
| Number of occupied Slices | 590 | 11,662 | 5% |
| Number of LUT Flip Flop pairs used | 1,700 | | |
| Number with an unused Flip Flop | 1,408 | 1,700 | 82% |
| Number with an unused LUT | 7 | 1,700 | 1% |
| Number of fully used LUT-FF pairs | 285 | 1,700 | 16% |
| Number of slice register sites lost to control set restrictions | 40 | 93,296 | 1% |
| Number of bonded IOBs | 272 | 328 | 82% |
| Number of DSP48A1s | 64 | 132 | 48% |
| Average Fanout of Non-Clock Nets | 1.99 | | |
Table 4: Power Analysis for the Proposed Method Using Xilinx Spartan 3A DSP FPGA
Table 5: Power Analysis for the Proposed Method Using Xilinx QPro Virtex6 Hi-Real FPGA
Table 6: Power Analysis for the Proposed Method Using Xilinx Spartan 6 FPGA
Table 7: Timing Analysis of the Proposed Method Using the Xilinx FPGA devices

| Methods | SPARTAN 3A DSP | SPARTAN 6 | VIRTEX 6 |
|---|---|---|---|
| Max Path Delay | 73.278ns | 72.817ns | 34.043ns |
| Number of paths | 14834722814318 | 12589833543128 | 12589833543128 |
| Number of destination ports | 7 | 10 | 10 |
| Memory Utilized | 324028 KB | 119568 KB | 129668 KB |
| Total Real Time to MAP | 18 secs | 56 secs | 1 mins 36 secs |
| Total Real Time to PAR | 1 mins | 57 secs | 3 mins 14 secs |

IV. Conclusion
The real-time implementation of the proposed integer multiplication using the Schonhage Strassen Algorithm is feasible. The area, delay and power are evaluated and compared across Xilinx FPGA devices, namely Spartan 3A DSP, Spartan 6 and Virtex 6. The area utilized on the Xilinx Spartan 3A DSP FPGA is minimal. The power consumption of the proposed method implemented on Spartan 6 is as low as 0.064 W. The Virtex 6 FPGA shows the least delay for the proposed method. The proposed integer multiplication can be extended to the FPGA implementation of SSA based floating point multiplication.
References
[1] Rafferty, C., O’Neill, M. and Hanley, N. Evaluation of large integer multiplication methods on hardware. IEEE Transactions on Computers 66 (8) (2017) 1369-1382.
[2] San, I. and At, N. On increasing the computational efficiency of long integer multiplication on FPGA. IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2012, 1149-1154.
[3] Moore, C., Hanley, N., McAllister, J., O’Neill, M., O’Sullivan, E. and Cao, X. Targeting FPGA DSP slices for a large integer multiplier for integer based FHE. International Conference on Financial Cryptography and Data Security, 2013, 226-237.
[4] Rudnicki, K. and Stefański, T.P. FPGA implementation of the multiplication operation in multiple-precision arithmetic. MIXDES-24th International Conference Mixed Design of Integrated Circuits and Systems, 2017, 271-275.
[5] Pathan, A. and Memon, T.D. An optimised 3×3 shift and add multiplier on FPGA. 14th International Bhurban Conference on Applied Sciences and Technology (IBCAST), 2017, 346-350.
[6] Savithry, J. and Krishna Prakash, N. FPGA Implementation of DFT Processor using Vedic Multiplier. International Journal of Pure and Applied Mathematics 118 (10) (2018) 51-56.
[7] Jahani, S., Samsudin, A. and Subramanian, K.G. Efficient big integer multiplication and squaring algorithms for cryptographic applications. Journal of Applied Mathematics (2014).
[8] Sze, T.W. Schönhage-Strassen algorithm with MapReduce for multiplying terabit integers. Proceedings of the International Workshop on Symbolic-Numeric Computation, 2011, 54-62.
[9] Beguenane, R., Beuchat, J.L., Muller, J.M. and Simard, S. Modular multiplication of large integers on FPGA. Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, 2005, 1361-1365.
[10] Mishra, R.S., Gour, P. and Soni, B.B. Design and Implements of Booth and Robertson’s multipliers algorithm on FPGA. International Journal of Engineering Research and Applications 1 (3) (2011) 905-910.
[11] Qasim, S.M., Telba, A.A. and AlMazroo, A.Y. FPGA design and implementation of matrix multiplier architectures for image and signal processing applications. International Journal of Computer Science and Network Security 10 (2) (2010) 168-176.
Author Name and Affiliation
Dr. Joseph Anthony Prathap was born in 1981 in Puducherry. He obtained his B.E. [Electronics and Communication] and M.Tech. [VLSI Design] degrees in 2003 and 2007 respectively from Sathyabama University, and his Ph.D. in FPGA based Power Converters in 2017 from Annamalai University. He has 11 years of service in teaching and research. He is currently Associate Professor in the Department of Electronics and Communication Engineering at Vardhaman College of Engineering, Shamshabad, Telangana, India. His research interests include VLSI design, development of digital switch patterns, FPGA control techniques for power converters and photovoltaic power electronics converters.