2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT)
FPGA Implementation of IEEE-754 Floating Point Karatsuba Multiplier
Ravi Kishore Kodali, Satya Kesav Gundabathula and Lakshmi Boppana
Department of Electronics and Communication Engineering, National Institute of Technology, Warangal, WARANGAL, INDIA
[email protected]

Abstract—
Floating point arithmetic, specifically multiplication, is a widely used computational operation in many scientific and signal processing applications. In general, the IEEE-754 single-precision multiplier requires a 24 x 24 mantissa multiplication and the double-precision multiplier requires a large 53 x 53 mantissa multiplication (the hidden bit included) to obtain the final result. This multiplication limits both the area and the performance of the operation. Many multiplication algorithms have been developed over the past decades. In this paper, two popular algorithms, namely the Booth and the Karatsuba (non-recursive and recursive) multipliers, have been implemented and their performance compared. The algorithms have been implemented on a uniform reconfigurable FPGA platform, providing a comparison of the FPGA resources utilized and the execution speeds. The recursive Karatsuba multiplier is the best performing among them.
Keywords—Floating Point multiplication; Karatsuba multiplier; FPGA
I. INTRODUCTION

Embedded systems are designed for different kinds of functionality, most of which ultimately require the manipulation of real-valued data. These data are stored as floating point numbers in memory, which is limited, so floating point computations must be approximated; the rules governing these approximations are known as floating point arithmetic [1]. In floating point arithmetic, the multiplication operation occurs more frequently than the others, so floating point multiplication plays a major role in the design and implementation of a floating point processor [2]. Computational speed and hardware utilization [3] are the two important criteria when choosing an algorithm for the implementation of floating point multipliers [4]. FPGAs offer high performance and very high operating speeds using the limited amount of logic resources and IP cores available on the device. They are commonly used in digital signal processing, communications engineering, and very high speed computing systems such as supercomputers. This work presents an efficient FPGA implementation of floating point multiplication, namely IEEE-754 single and double precision floating point
multiplication. Two different algorithms have been chosen for this purpose. The rest of the paper is organized as follows: Section II provides a literature survey, Section III presents an overview of floating point multiplication and the algorithms used, Section IV describes the hardware implementation, Section V presents the simulation and experimental results, and the final section concludes the work.

II. LITERATURE SURVEY
Optimizing the operational speed of the multiplier is the main concern in the design of a floating point arithmetic processor. The 24 x 24 and 53 x 53 mantissa multiplications, when performed using the traditional multiplication approach, utilize a large amount of hardware resources and incur a long computation delay. The hardware utilization and timing delay can be reduced by using Booth's algorithm for the mantissa multiplication [5], as detailed in [6] and [7]. The timing delay and power dissipation are further reduced by using a carry save adder scheme, a high-speed CMOS full adder and a modified carry select adder, as given in [8]. Several other algorithms exist that serve the purpose of optimizing floating point multipliers. The Karatsuba algorithm, defined for the multiplication of long integers, is one of the fastest and best known. A survey of the strengths and weaknesses of the Booth and Karatsuba algorithms is presented in [9], which concludes that Karatsuba multiplication has a shorter signal propagation time and that long number multiplication is more suitably implemented using Karatsuba's algorithm than Booth's algorithm [10]. The implementation of a floating point multiplier using the Karatsuba algorithm is very efficient, as presented in [11]. When the algorithm is applied recursively [12] until it reaches the multiplication of 2-bit or 3-bit numbers, the use of higher order multiplier blocks is avoided and the implementation becomes simple and efficient in terms of area [13]. Hence the recursive Karatsuba algorithm is chosen for the implementation of the floating point multiplier on an FPGA platform, and a comparison is made with the two aforementioned algorithms.
III. OVERVIEW OF FLOATING POINT MULTIPLICATION AND THE ALGORITHMS USED

The IEEE-754 floating point number format is as follows. For single precision, a number consists of a 1-bit sign, an 8-bit biased exponent (bias 127) and a 23-bit mantissa (fraction), 32 bits in total. For double precision, it consists of a 1-bit sign, an 11-bit biased exponent (bias 1023) and a 52-bit mantissa, 64 bits in total.
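As a point of reference (this sketch is not part of the original paper), the single precision fields can be extracted from the 32-bit word as follows in Python:

```python
import struct

def unpack_single(x):
    """Illustrative only: extract the IEEE-754 single precision fields of a float."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]   # 32-bit pattern
    sign     = (bits >> 31) & 0x1          # 1 sign bit
    exponent = (bits >> 23) & 0xFF         # 8 exponent bits (biased by 127)
    mantissa = bits & 0x7FFFFF             # 23 fraction bits (hidden leading 1 not stored)
    return sign, exponent, mantissa

print(unpack_single(6.5))   # (0, 129, 5242880): 6.5 = +1.625 x 2^(129-127)
```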
Any floating point operation implementation generally needs to compute the sign, exponent and mantissa fields separately and to combine them after normalization. The basic steps involved in the computation are described in Algorithm 1.
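Algorithm 1 is not reproduced here; the following Python sketch illustrates the usual steps (sign, exponent and mantissa handled separately, then normalized). It is an assumption-laden simplification: truncation instead of rounding and no special-case (zero, infinity, NaN) handling.

```python
def fp_multiply(s1, e1, m1, s2, e2, m2, frac_bits=23, bias=127):
    """Illustrative sketch of the basic floating point multiplication steps.
    m1, m2 are mantissas with the hidden bit already prepended (frac_bits+1 bits)."""
    sign = s1 ^ s2                              # sign: XOR of the input signs
    exponent = e1 + e2 - bias                   # exponent: add biased exponents, remove one bias
    product = m1 * m2                           # (frac_bits+1) x (frac_bits+1) multiplication
    # Normalize: the product has up to 2*(frac_bits+1) bits; keep the top frac_bits+1.
    if product >> (2 * frac_bits + 1):          # product value in [2, 4): shift right once
        product >>= 1
        exponent += 1
    mantissa = (product >> frac_bits) & ((1 << (frac_bits + 1)) - 1)  # truncate (no rounding)
    return sign, exponent, mantissa
```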
A. Booth Multiplication Algorithm

Multiplication of two signed binary numbers (in 2's complement form) can be achieved through Booth's algorithm. It has the advantage that the time taken for the multiplication depends on the number of 1's in the multiplier: the fewer the 1's, the faster the multiplication. The radix-2 Booth algorithm examines two bits of the multiplier at a time. Let $Q$ denote the multiplier, $Q_n$ and $Q_{n-1}$ denote the LSBs of the multiplier in the current and previous cycles respectively, and $M$ denote the multiplicand. The rules of the algorithm are given in Table I.
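As an illustration (not the paper's VHDL), a minimal Python sketch of the radix-2 Booth recoding loop described above:

```python
def booth_multiply(multiplicand, multiplier, n):
    """Radix-2 Booth multiplication of two n-bit two's complement integers.
    Returns the 2n-bit two's complement product as a Python int."""
    mask = (1 << n) - 1
    A, Q, Q_1 = 0, multiplier & mask, 0         # accumulator, multiplier register, Q(-1)
    M = multiplicand & mask
    for _ in range(n):
        pair = ((Q & 1) << 1) | Q_1
        if pair == 0b01:                        # 01: add multiplicand
            A = (A + M) & mask
        elif pair == 0b10:                      # 10: subtract multiplicand
            A = (A - M) & mask
        # 00 and 11: no operation, shift only
        Q_1 = Q & 1                             # arithmetic right shift of [A, Q, Q(-1)]
        Q = ((Q >> 1) | ((A & 1) << (n - 1))) & mask
        A = ((A >> 1) | (A & (1 << (n - 1)))) & mask   # sign-extend A
    product = (A << n) | Q
    if product & (1 << (2 * n - 1)):            # interpret as signed 2n-bit value
        product -= 1 << (2 * n)
    return product

assert booth_multiply(-3, 7, 8) == -21
```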
B. Non-Recursive Karatsuba Multiplication Algorithm

Let $p = (u_1 u_2 \ldots u_n)_b$ and $q = (v_1 v_2 \ldots v_n)_b$, where $n = 2k$. Then we can write $p$ and $q$ in the form
$$p = p_1 b^{n/2} + p_0, \qquad q = q_1 b^{n/2} + q_0,$$
so we have
$$p \times q = p_1 q_1\, b^{n} + (p_1 q_0 + p_0 q_1)\, b^{n/2} + p_0 q_0. \qquad (1)$$
In 1963, Karatsuba transformed equation (1) into equation (2):
$$p \times q = r_1\, b^{n} + (r_2 - r_1 - r_0)\, b^{n/2} + r_0, \qquad (2)$$
where $r_0 = p_0 q_0$, $r_1 = p_1 q_1$ and $r_2 = (p_1 + p_0)(q_1 + q_0)$. If $n = 2$, equation (2) requires three multiplications and four addition/subtraction base operations. If $n > 2$, the same equation reduces the problem of multiplying two length-$n$ ($n = 2k$) integers to three multiplications of length-$n/2$ numbers, namely $p_0 q_0$, $p_1 q_1$ and $(p_1 + p_0)(q_1 + q_0)$, plus two additions of length-$n/2$ numbers, two additions of length-$n$ numbers and two subtractions of length-$n$ numbers.
Fig. 1: Block diagram of the non-recursive Karatsuba multiplier
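For concreteness, a small worked example of equation (2) in base $b = 10$ with $n = 2$ (added here for illustration, not taken from the paper):
$$p = 47,\; q = 83:\quad r_0 = 7 \times 3 = 21,\quad r_1 = 4 \times 8 = 32,\quad r_2 = (4+7)(8+3) = 121,$$
$$p \times q = r_1 \cdot 10^{2} + (r_2 - r_1 - r_0)\cdot 10 + r_0 = 3200 + 680 + 21 = 3901.$$
Only three one-digit multiplications are needed instead of four.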
C. Recursive Karatsuba Multiplication Algorithm

We can obtain the same product as above by applying the divide and conquer method recursively. Let $T(n)$ be the computation time of the multiplication $p \times q$; we can easily obtain the recursion
$$T(n) = 3\,T(n/2) + cn, \qquad T(1) = c,$$
so we get
$$T(n) = O\!\left(n^{\log_2 3}\right) \approx O\!\left(n^{1.585}\right).$$

TABLE I: Booth's multiplier recoding rules

  Q_n   Q_{n-1}   Operation
   0       0      No operation; arithmetic shift right
   0       1      Add M to the accumulator; shift right
   1       0      Subtract M from the accumulator; shift right
   1       1      No operation; arithmetic shift right
Hence, recursive Karatsuba is more efficient than normal Karatsuba, which is demonstrated by the hardware implementation on the FPGA.

IV. ALGORITHM IMPLEMENTATION
In this section, the details of the floating point multiplier design using the Booth, Karatsuba and recursive Karatsuba algorithms are discussed. The implementation targets a Virtex-6 FPGA using the Xilinx ISE 10.1 platform. The Booth multiplication implementation is given in Algorithm 2.
In the Karatsuba multiplication for single precision, x and y denote the 24-bit operands formed by the hidden bit 1 and the 23-bit mantissa of each floating point input. The 24-bit operands are divided into two 12-bit vectors each, and the subsequent operations of the algorithm are carried out using a 12 x 12 Booth multiplier instead of a traditional multiplier, in order to optimize the area and delay. This implementation is demonstrated in Algorithm 3.
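A minimal Python sketch of this one-level decomposition follows (illustrative only; the function name `karatsuba_1level` is hypothetical, the paper's version is in VHDL and uses a 12 x 12 Booth multiplier block for the base products):

```python
def karatsuba_1level(x, y, n=24):
    """One-level (non-recursive) Karatsuba multiplication of two n-bit mantissas.
    n is assumed even; for single precision n = 24 (hidden bit + 23-bit fraction)."""
    half = n // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask          # split each operand into upper/lower halves
    y1, y0 = y >> half, y & mask
    r1 = x1 * y1                          # in hardware: 12 x 12 Booth multiplier
    r0 = x0 * y0                          # in hardware: 12 x 12 Booth multiplier
    r2 = (x1 + x0) * (y1 + y0)            # third base multiplication (13 x 13 bits)
    return (r1 << n) + ((r2 - r1 - r0) << half) + r0

m1 = (1 << 23) | 0x400000                 # mantissa 1.5 with hidden bit
m2 = (1 << 23) | 0x200000                 # mantissa 1.25 with hidden bit
assert karatsuba_1level(m1, m2) == m1 * m2
```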
For double precision, the Karatsuba multiplication is implemented in the same way as the single precision case except for the length of the operands; an extra zero bit is prepended to the operands to make the sequence length even so that bifurcation is possible. The hidden bit 1 together with the 52-bit mantissa constitutes the 53-bit operands. The implementation is given in Algorithm 4.
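Using the same hypothetical helper as above, the double precision case only changes the operand handling (a sketch, assuming zero padding to 54 bits before the split):

```python
def karatsuba_double(mx, my):
    """Sketch: one-level Karatsuba for 53-bit mantissas (hidden bit included).
    The operands are treated as 54-bit values (zero-padded) so they split evenly."""
    return karatsuba_1level(mx, my, n=54)   # two 27-bit halves per operand

mx = (1 << 52) | (1 << 50)                  # 53-bit operand: 1.25 with hidden bit
my = (1 << 52) | (1 << 51)                  # 53-bit operand: 1.5 with hidden bit
assert karatsuba_double(mx, my) == mx * my
```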
In the recursive case of Karatsuba multiplication for single precision, each 24-bit operand (the hidden bit 1 along with the 23-bit mantissa) is split recursively until 3-bit sequences are obtained, i.e., 24-bit into two 12-bit, 12-bit into two 6-bit, and finally 6-bit into two 3-bit vectors. These 3-bit values are multiplied directly instead of using a 12 x 12 multiplier as in the non-recursive Karatsuba case, so that the area utilization and time delay are considerably reduced. The implementation is given in Algorithm 5. For the double precision format, multiplication using the recursive Karatsuba algorithm is achieved by first pre-pending three 0's to the 53-bit operands (one hidden bit and the 52-bit mantissa) and then bifurcating the resulting 56-bit sequence recursively until 7-bit vectors are obtained, so that a 7 x 7 multiplier can be used for the processing. The same is given in Algorithm 6.
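A compact Python sketch of the recursion (illustrative, with a hypothetical `karatsuba_recursive` function; the base width 3 corresponds to the single precision case described above, while the paper's hardware uses dedicated small multiplier blocks at the base):

```python
def karatsuba_recursive(x, y, n, base=3):
    """Recursive Karatsuba multiplication of two n-bit operands.
    The recursion stops at 'base' bits, where a direct multiplication is used."""
    if n <= base:
        return x * y                       # in hardware: a small 3x3 (or 7x7) multiplier block
    half = n // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask
    y1, y0 = y >> half, y & mask
    r1 = karatsuba_recursive(x1, y1, half, base)
    r0 = karatsuba_recursive(x0, y0, half, base)
    r2 = karatsuba_recursive(x1 + x0, y1 + y0, half + 1, base)   # sums may carry one extra bit
    return (r1 << (2 * half)) + ((r2 - r1 - r0) << half) + r0

# Single precision: 24-bit operands, 3-bit base multipliers
a = (1 << 23) | 0x12345
b = (1 << 23) | 0x54321
assert karatsuba_recursive(a, b, 24) == a * b
```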
TABLE II: Comparison of algorithms for single precision

TABLE III: Comparison of algorithms for double precision
V. RESULTS AND SIMULATION
We have implemented these algorithms for both single and double precision using VHDL. The designs have been synthesized and routed on a Virtex-6 FPGA target using Xilinx ISE, and the simulation results have been analyzed in ModelSim-SE. The hardware utilization and performance for all the proposed implementations and for the Xilinx core are shown in Table II and Table III
respectively. All the hardware resource estimates were obtained after the place and route stage of FPGA synthesis. The simulation results for single and double precision are shown in Figures 2a and 2b. The device utilization summary and the macro statistics show that the Karatsuba algorithm consumes less hardware than Booth's algorithm, and that the recursive Karatsuba provides much better utilization than both the non-recursive Karatsuba and the Booth algorithm, for both the single and double precision formats.
Fig. 2: Simulation results. (a) Single precision IEEE-754 simulation results. (b) Double precision IEEE-754 simulation results.
The number of bonded inputs and outputs does not depend on the algorithm used. The timing summary also shows that the Karatsuba multiplier is faster than Booth's and that the recursive Karatsuba multiplier is the fastest among the three algorithms.

VI. CONCLUSIONS

In this work, three floating point multiplication algorithms, namely Booth, normal (non-recursive) Karatsuba and recursive Karatsuba, have been implemented on an FPGA and a performance comparison has been carried out. Two main criteria, FPGA resources and processing speed, are used in evaluating the performance. As can be seen from the results, the recursive Karatsuba algorithm performs better than the normal Karatsuba and Booth algorithms: it utilizes the fewest FPGA resources while its speed is relatively high. Hence recursive Karatsuba is the best of the three algorithms.

REFERENCES
[1] Y. Weijian and W. Meiying, "Research of the integer four arithmetic operations replace floating point four arithmetic operations algorithms in the C++," in Computer and Communication Technologies in Agriculture Engineering (CCTAE), 2010 International Conference on, vol. 1, June 2010, pp. 349-352.
[2] T. Rodolfo, N. L. V. Calazans, and F. Moraes, "Floating point hardware for embedded processors in FPGAs: Design space exploration for performance and area," in Reconfigurable Computing and FPGAs, 2009. ReConFig '09. International Conference on, Dec 2009, pp. 24-29.
[3] M. Kumar Jaiswal and R. C. C. Cheung, "Area-efficient FPGA implementation of quadruple precision floating point multiplier," in Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International, May 2012, pp. 376-382.
[4] M. Kumar Jaiswal and N. Chandrachoodan, "Efficient implementation of IEEE double precision floating-point multiplier on FPGA," in Industrial and Information Systems, 2008. ICIIS 2008. IEEE Region 10 and the Third International Conference on, Dec 2008, pp. 1-4.
[5] N. Besli and R. Deshmukh, "A 54x54 bit multiplier with a new redundant binary Booth's encoding," in Electrical and Computer Engineering, 2002. IEEE CCECE 2002. Canadian Conference on, vol. 2, 2002, pp. 597-602.
[6] A. Inoue, R. Ohe, S. Kashiwakura, S. Mitarai, T. Tsuru, T. Izawa, and G. Goto, "A 4.1 ns compact 54x54 b multiplier utilizing sign select Booth encoders," in Solid-State Circuits Conference, 1997. Digest of Technical Papers. 43rd ISSCC., 1997 IEEE International, Feb 1997, pp. 416-417.
[7] G. Renxi, Z. Shangjun, Z. Hainan, M. Xiaobi, G. Wenying, X. Lingling, and H. Yang, "Hardware implementation of a high speed floating point multiplier based on FPGA," in Computer Science Education, 2009. ICCSE '09. 4th International Conference on, July 2009, pp. 1902-1906.
[8] M. Uya, K. Kaneko, and T. Yasui, "A CMOS floating point multiplier," Solid-State Circuits, IEEE Journal of, vol. 19, no. 5, pp. 697-702, Oct 1984.
[9] N. Nedjah and L. de Macedo Mourelle, "A review of modular multiplication methods and respective hardware implementations," Informatica, vol. 30, no. 1, 2006.
[10] M. Machhout, M. Zeghid, W. El Hadj Youssef, B. Bouallegue, A. Baganne, and R. Tourki, "Efficient large numbers Karatsuba-Ofman multiplier designs for embedded systems," International Journal of Electronics, Circuits & Systems, vol. 3, no. 1, 2009.
[11] M. K. Jaiswal and R. C. Cheung, "VLSI implementation of double-precision floating-point multiplier using Karatsuba technique," Circuits, Systems, and Signal Processing, vol. 32, no. 1, pp. 15-27, 2013.
[12] N. Nedjah and L. de Macedo Mourelle, "A reconfigurable recursive and efficient hardware for Karatsuba-Ofman's multiplication algorithm," in Control Applications, 2003. CCA 2003. Proceedings of 2003 IEEE Conference on, vol. 2, June 2003, pp. 1076-1081.
[13] E.-H. Wajih, M. Mohsen, Z. Medien, and B. Belgacem, "Efficient hardware architecture of recursive Karatsuba-Ofman multiplier," in Design and Technology of Integrated Systems in Nanoscale Era, 2008. DTIS 2008. 3rd International Conference on, March 2008, pp. 1-6.