Optimized Floating Point Square-root

Umesh Satpute

Kalyani Bhole

Sushanta Reang

Instrumentation and Control Engineering, College of Engineering Pune, Pune 411005 (all authors)
Email: [email protected]

Abstract: In the present digital world, fast and resource-optimized execution of basic mathematical operations such as multiplication, division, and square root plays an important role. Many algorithms require the calculation of a square root. After addition, subtraction, multiplication, and division, square root is the most important mathematical operation. This paper therefore presents a fast, resource-optimized floating-point square-root algorithm. Three algorithms, 1) the non-restoring algorithm, 2) the IEEE 754 floating-point square-root algorithm, and 3) the logarithmic square-root algorithm, are implemented on a Xilinx Spartan 3E and compared for resource utilization and execution clocks. The comparison shows that the IEEE 754 floating-point square-root algorithm performs best, with a throughput of 50 MSPS while consuming 60% fewer resources than the logarithmic square-root algorithm.

Keywords: FPGA, IEEE 754 floating point, logarithmic.

978-1-5386-2459-3/$31.00 ©2018 IEEE

I. INTRODUCTION

In today's digital world, fast and resource-optimized execution of basic mathematical operations such as multiplication, division, and square root plays an important role. Many algorithms require a square-root calculation to reach their final result. After addition, subtraction, multiplication, and division, square root is the most important mathematical operation. The square-root algorithm is a crucial factor in digital signal processing: computations such as RMS (root mean square), the magnitudes of fast Fourier transform (FFT) results, and many other applications include square-root calculations. Square-root calculation is considerably more complicated in structure than the other basic operations. In certain applications it has to be carried out frequently and requires a substantial dynamic range. The need for fast execution and a wide dynamic range prompts the use of floating-point or logarithmic number systems.

Kalyani Bhole and Prateek Singh [1] presented an optimized floating-point arithmetic unit that performs addition, subtraction, multiplication, and division on floating-point numbers. These operations follow the IEEE 754 floating-point standard and were implemented on a Spartan 3E XC3S500E FPGA board. That paper is referred to here only for fixed-to-floating-point conversion. M. Franke, A. Th. Schwarzbacher, M. Brutscheck, and St. Becker [2] proposed only fixed-point square-root algorithms, restoring as well as non-restoring, and their implementation. These algorithms are compared in terms of LUTs, power consumption, and hardware design guidelines. Anuja Nanhe, Gaurav Gawali, Shashank Ahire, and K. Sivasankaran [3] introduced a pipelined architecture to implement only 8-bit fixed- and floating-point square root on a field-programmable gate array (FPGA) using a modified non-restoring square-root algorithm. The algorithm was optimized by eliminating a number of elements without compromising the precision of the square root and the remainder, and it was implemented on an Altera Cyclone II FPGA, but it required more power even for 8 bits. Arpita Jena and Siba Kumar Panda [4] only discussed different square-root algorithms, such as the Newton method, the Babylonian method, the digit-recurrence method, the restoring method, and the non-restoring method, along with their problems and advantages for efficient VLSI signal-processing applications. The authors state that a non-restoring pipelined square-root architecture is effective because it can be implemented with the fewest hardware resources and is best suited to FPGA implementation; however, it was discussed only theoretically and not implemented in practice. N. Ramya Rani, V. Subbiah, and L. Sivakumar [5] focused on the efficient design of logarithmic floating-point multiplication and division, implemented on Xilinx Spartan and Virtex FPGA devices; that work does not address the square-root algorithm. IEEE 754 floating-point conversion is covered by [6].

This paper presents an optimized floating-point square root using the non-restoring square-root algorithm in its two variants, series and parallel, and the IEEE 754 algorithm, and compares them with the logarithmic algorithm. All of these algorithms are evaluated by resource-power analysis, i.e., LUTs, slices, power utilization, and throughput.

II. METHOD

Several methods are used for calculating the floating-point square root; the proposed work includes the following three: a) the non-restoring algorithm, b) the IEEE 754 floating-point square root, and c) the logarithmic algorithm. These methods are explained in the following subsections.

A. Non-Restoring method

The non-restoring algorithm is one of the oldest techniques used for mathematical computations such as division and square root. This method is based on


two's complement representation. The non-restoring method is now widely preferred over other square-root extraction methods because of several advantages. It requires only a limited number of arithmetic operations, which reduces calculation time and keeps the computation simple, so it avoids the computational complexity of other methods. Unlike the restoring method, the remainder does not need to be stored. It can be implemented with the fewest hardware resources and is well suited to FPGA implementation; the hardware implementation is very simple. This algorithm can also be used for various divider architectures for efficient VLSI signal-processing applications, as proposed by A. Sahu and Siba K. Panda [9]. Using the same method, Siba K. Panda et al. designed an efficient pipelined square-root architecture [10] for a variety of VLSI signal-processing applications. The steps of the non-restoring square-root algorithm are as follows:

Step 1: Start.
Step 2: Represent the number so that both the integer part and the fractional part have an even number of bits, by prepending a zero to the integer part and appending a zero to the fractional part where needed. Call this representation the radicand, M. Divide the radicand into groups of two bits, starting at the binary point and moving in both directions.
Step 3: Beginning on the left (most significant side), select the first group of two bits.
Step 4: Subtract 01 from the first group. If the borrow is zero, the result is positive and the quotient bit is 1; otherwise it is 0.
Step 5: Append the next group of two bits to the remainder. Form the subtrahend by appending 01 to the quotient bits obtained so far, and subtract it from this remainder.
Step 6: If the result of the subtraction is negative, keep the previous remainder unchanged and take the quotient bit as 0; otherwise write the difference as the new remainder and take the quotient bit as 1.
Step 7: Repeat Steps 5 and 6 until the last group of two bits has been processed.
Step 8: End.

1) Example: The flow of the non-restoring algorithm is explained with an example. Suppose we want the square root of 150, whose binary representation is 8'b10010110. The radicand M is therefore 8'b10010110. As per Step 2, it already has an even number of bits, so no zero needs to be added. Divide the radicand into groups of two bits: 10, 01, 01, 10. Beginning from the MSB, select the first group, 10, and subtract 01 from it. The borrow is zero, so the result is positive and the quotient bit is 1; the remainder is 01. Append the next two bits, 01, making the remainder 0101. Subtract 101 from the remainder. The borrow is again zero, so the quotient bit is 1 and the remainder becomes 0000. Append the next two bits to form the remainder 000001. Subtracting 1101 from it requires a borrow, so the quotient bit is 0 and the remainder stays 000001. For the last iteration, append the next two bits to give 00000110. Subtracting 11001 from it again requires a borrow, so the quotient bit is 0 and the final remainder is 00000110. The result is therefore 4'b1100, i.e., 12, as shown in Fig. 1.
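The steps above can be mirrored in a few lines of software. The following Python sketch is only an illustration and is not the Verilog design evaluated in this paper; the function name `isqrt_pairs` and its `bits` parameter are hypothetical. It processes the radicand two bits at a time and uses the standard trial subtrahend, namely the quotient so far with 01 appended.

```python
def isqrt_pairs(m, bits=8):
    """Digit-by-digit binary square root, two radicand bits per step.

    Returns (root, remainder) for an unsigned integer m < 2**bits.
    """
    if bits % 2:
        bits += 1  # Step 2: pad to an even number of bits
    root, rem = 0, 0
    for shift in range(bits - 2, -1, -2):
        # Step 5: append the next group of two bits to the remainder
        rem = (rem << 2) | ((m >> shift) & 0b11)
        # Trial subtrahend: quotient so far with 01 appended
        trial = (root << 2) | 1
        if rem >= trial:
            # Step 6, no borrow: keep the difference, quotient bit is 1
            rem -= trial
            root = (root << 1) | 1
        else:
            # Step 6, borrow: keep the old remainder, quotient bit is 0
            root <<= 1
    return root, rem


print(isqrt_pairs(150))  # (12, 6)
```

For the worked example, the call returns root 12 and remainder 6, which matches 12 × 12 + 6 = 150.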

Fig. 1. Non-restoring algorithm

B. IEEE 754 floating point

$$A = (-1)^{S}\,(1 + \text{Fraction}) \times 2^{E-B}$$

The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point computation established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). Only single precision is considered in this paper. The format has the following components: 1. Sign bit (S): 0 represents non-negative and 1 represents negative. 2. Exponent (E). 3. Fraction (mantissa, or significand). 4. Bias (B).

TABLE I
FIXED TO FLOATING POINT CONVERSION

Sr. no   Component   Single precision (bits)   Double precision (bits)
1        Sign        1                         1
2        Exponent    8                         11
3        Fraction    23                        52
4        Bias        127                       1023
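To make the single-precision layout in Table I concrete, the following Python sketch splits a value into its sign, exponent, and fraction fields and then evaluates A = (-1)^S (1 + Fraction) 2^(E-B) from those fields. It uses only the standard `struct` module; the helper names `decode_single` and `rebuild` are hypothetical and are not part of the paper's implementation. Only normalized numbers are handled.

```python
import struct


def decode_single(value):
    """Return (S, E, fraction bits) of value rounded to IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent E
    fraction = bits & 0x7FFFFF        # 23-bit fraction field
    return sign, exponent, fraction


def rebuild(sign, exponent, fraction, bias=127):
    """Evaluate (-1)^S * (1 + Fraction) * 2^(E - B) for a normalized number."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - bias)


s, e, f = decode_single(150.0)
print(s, e, hex(f), rebuild(s, e, f))  # 0 134 0x160000 150.0
```

Here 150 = 1.171875 × 2^7, so the stored exponent is 7 + 127 = 134.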

TABLE II
FIXED TO FLOATING POINT CONVERSION

a  b  c  d  e  f  g  h    x  y  z
0  0  0  0  0  0  0  1    0  0  0
0  0  0  0  0  0  1  x    0  0  1
0  0  0  0  0  1  x  x    0  1  0
0  0  0  0  1  x  x  x    0  1  1
0  0  0  1  x  x  x  x    1  0  0
0  0  1  x  x  x  x  x    1  0  1
0  1  x  x  x  x  x  x    1  1  0
1  x  x  x  x  x  x  x    1  1  1

International Conference on Communication, Computing and Internet of Things (IC3IoT)


Fig. 3. Logarithmic method algorithm
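The idea behind a logarithmic square root can be illustrated numerically: in the log domain, a square root is a halving of the logarithm, and in fixed point that halving is a one-bit right shift. The Python sketch below shows this identity only; it is not the flow chart of Fig. 3 or the authors' Verilog design. The name `lns_sqrt` and the constant `FRAC_BITS` are hypothetical, and `math.log2` stands in for whatever log and antilog conversion the hardware uses.

```python
import math

FRAC_BITS = 16  # assumed fixed-point precision of the logarithm


def lns_sqrt(x):
    """Square root through the log domain: sqrt(x) = 2 ** (log2(x) / 2)."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Forward conversion: log2(x) as a fixed-point integer
    log_fixed = round(math.log2(x) * (1 << FRAC_BITS))
    # Halving the logarithm is an arithmetic right shift by one bit
    half = log_fixed >> 1
    # Inverse conversion (antilog)
    return 2.0 ** (half / (1 << FRAC_BITS))


print(lns_sqrt(4.0))  # 2.0
```

The accuracy of such a scheme is set by the precision of the log and antilog conversions, which is where a log-LUT design spends most of its resources.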

Fig. 4. Non-Restoring method simulation

Fig. 2. IEEE Floating-Point square root
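The flow chart of Fig. 2 is not reproduced in the text, so the following Python sketch shows only one common way to take a single-precision square root and should not be read as the authors' exact design. It makes the exponent even, halves it, and takes an integer square root of the significand. The function name `float_sqrt_single` is hypothetical, `math.isqrt` stands in for the non-restoring integer square-root stage, and the result is truncated rather than rounded.

```python
import math
import struct


def float_sqrt_single(x):
    """Square root of a positive normalized single-precision value (truncated)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    biased = (bits >> 23) & 0xFF
    if bits >> 31 or biased in (0, 255):
        raise ValueError("sketch handles positive normalized inputs only")
    e = biased - 127                       # unbiased exponent
    sig = (bits & 0x7FFFFF) | (1 << 23)    # 24-bit significand 1.f
    if e % 2:                              # odd exponent: fold one factor of 2
        sig <<= 1                          # into the significand
        e -= 1
    # sig / 2^23 lies in [1, 4), so its root lies in [1, 2).
    # isqrt(sig << 23) is that root scaled by 2^23 (a 24-bit integer).
    root = math.isqrt(sig << 23)
    out = ((e // 2 + 127) << 23) | (root & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", out))[0]


print(float_sqrt_single(4.0))   # 2.0
print(float_sqrt_single(2.25))  # 1.5
```

Because the significand root already lies in [1, 2), no renormalization shift is needed after the integer square root.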

1) Fixed to floating point conversion: For any floating-point arithmetic operation, fixed-to-floating-point conversion is the basic first step. To optimize both resources and execution time, the conversion is implemented using logic gates. A direct digital relation between the fixed-point exponent and the number of shifts required saves execution time, and its implementation in terms of logic gates saves resources. Table II shows the relation between the fixed-point exponent and the number of shifts.
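Table II has the form of an 8-to-3 priority encoder: the output xyz is the position of the most significant 1 among the inputs a to h. The Python sketch below models that behaviour and uses it to build single-precision fields from an 8-bit unsigned integer. It assumes that a is the most significant input bit and that the encoder output is added to the bias to give the exponent; both are assumptions, and the function names are hypothetical.

```python
def leading_one_position(byte):
    """Priority encoder as in Table II: index (0..7) of the most significant 1."""
    for pos in range(7, -1, -1):
        if byte & (1 << pos):
            return pos
    raise ValueError("input must be non-zero")


def fixed_to_single(n, bias=127):
    """Convert an 8-bit unsigned integer to (sign, exponent, 23-bit fraction)."""
    pos = leading_one_position(n)
    exponent = pos + bias
    # Drop the hidden leading 1, then left-align the rest in the fraction field
    fraction = (n ^ (1 << pos)) << (23 - pos)
    return 0, exponent, fraction


print(fixed_to_single(150))  # (0, 134, 1441792)
```

For 150 (8'b10010110) the leading one is at position 7, so the exponent field is 134 and the remaining bits 0010110 become the top of the fraction.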

C. Logarithmic method

The dynamic range of floating point is very high compared with integer values, at the cost of greater complexity than fixed point. Logarithmic number systems (LNS) provide a range and precision similar to floating point but may have some advantages in complexity over floating point for certain applications such as signal processing. Let X be the number in floating-point format. The square root of X is found as per the flow chart in Fig. 3.

III. SIMULATION AND RESULT

A. Non-Restoring method

In the simulation process, a program is first written in Project Navigator (ISE 14.7). Next, in System Generator (configured with MATLAB), blocks such as the Xilinx Blockset, Constant, Gateway In, Gateway Out, Black Box, and Display are placed, as shown in Fig. 4. The design requires calculation of the integer square root using the non-restoring algorithm. The algorithm is written in Verilog in Project Navigator (ISE 14.7) and synthesized; System Generator then creates the simulation file from the saved Project Navigator file. The simulation is run to check the inputs, outputs, and clock, and Project Navigator is used to check the slices, LUTs, throughput, and power utilization.

B. IEEE-754 method

The same process is followed in this simulation. A program is first written in Project Navigator (ISE 14.7); next, in System Generator (configured with MATLAB), blocks such as the Xilinx Blockset, Constant, Gateway In, Gateway Out, Black Box, and Display are placed, as shown in Fig. 5, which gives the simulation and result of the IEEE 754 floating-point square root. In this simulation the floating-point square root is calculated using the IEEE 754 algorithm. The algorithm is written in Verilog in Project Navigator (ISE 14.7) and synthesized; System Generator then creates the simulation file from the saved Project Navigator file. The simulation is then run and the output checked. The output is a hexadecimal number, which is validated using a manual calculation and an online converter [6]. Output verification is the last step.


C. Logarithmic method

The same process is followed in this simulation. A program is first written in Project Navigator (ISE 14.7); next, in System Generator (configured with MATLAB), blocks such as the Xilinx Blockset, Constant, Gateway In, Gateway Out, Black Box, and Display are placed, as shown in Fig. 6. In this simulation the floating-point square root is calculated; the algorithm is written in Verilog in Project Navigator (ISE 14.7) and synthesized, and System Generator then creates the simulation file from the saved Project Navigator file. The simulation is then run and the output checked. The output is a hexadecimal number, which is validated using a manual calculation and an online converter [7]. Output verification is the last step.

Fig. 5. IEEE-754 method simulation

Fig. 6. Logarithmic method

IV. RESOURCE-POWER ANALYSIS

A. Non-Restoring floating point square-root algorithm resource-power analysis

In an FPGA, resource-power analysis depends on factors such as latency, LUTs and slices, power utilization, and throughput. Latency is defined as the number of clock cycles needed to produce all outputs. Throughput is the rate at which the system can process inputs, measured as the number of samples per second. Power utilization is the amount of power consumed at a particular temperature. On an FPGA, the program can be written in two ways: series and parallel.

TABLE III
NON-RESTORING FLOATING POINT SQUARE-ROOT ALGORITHM RESOURCE-POWER ANALYSIS

S. no   Parameter                      Series (8 bit)         Parallel (8 bit)        Series (24 bit)          Parallel (24 bit)
1       Latency                        1                      4                       1                        12
2       LUTs and Slices                LUT = 12; Slice = 06   LUT = 48; Slice = 124   LUT = 484; Slice = 243   LUT = 160; Slice = 94
3       Power utilization (at 30 °C)   0.210 W                0.210 W                 0.218 W                  0.218 W
4       Throughput                     50 MSPS                12.5 MSPS               50 MSPS                  4.166 MSPS

Table III gives the resource-power analysis of the non-restoring square-root algorithm for the series 8-bit and 24-bit designs and for the parallel 8-bit and 24-bit designs. The series method requires fewer resources, such as LUTs and slices, and has lower latency and higher throughput than the parallel method. Overall, the series method is faster than the parallel method and also requires fewer resources, so it is the more efficient of the two.

B. IEEE-754 floating point square-root algorithm resource-power analysis

TABLE IV
IEEE-754 FLOATING POINT SQUARE-ROOT ALGORITHM RESOURCE-POWER ANALYSIS

Sr. no   Parameter           IEEE-754
1        Latency             1
2        LUTs and Slices     LUTs = 84, Slices = 141
3        Power utilization   0.218 W (ambient temp. 30 °C)
4        Throughput          50 MSPS

As shown in Table IV, the IEEE-754 floating-point square-root algorithm is written using the series method. Its latency is one, and it requires fewer LUTs and slices than the other methods. Its execution speed is 50% greater, with a throughput of 50 MSPS. Power utilization is 0.218 W at an ambient temperature of 30 °C.

C. Logarithmic floating point square-root algorithm resource-power analysis

TABLE V
RESOURCE-POWER ANALYSIS OF LNS SQUARE-ROOT ALGORITHM

Sr. no   Parameter           LNS
1        Latency             2
2        LUTs and Slices     Total LUTs = 216, Total Slices = 116
3        Power utilization   0.210 W (ambient temp. 30 °C)
4        Throughput          25 MSPS
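The throughput figures in Tables III to V are consistent with one result being produced every `latency` clock cycles at a 50 MHz clock. The clock frequency is not stated in the text, so the value below is an assumption, and the helper name `throughput_msps` is hypothetical. The latency values are copied from the tables.

```python
CLOCK_MHZ = 50.0  # assumed clock rate; not stated in the text


def throughput_msps(latency_cycles, clock_mhz=CLOCK_MHZ):
    """Throughput in MSPS when one result takes latency_cycles clock cycles."""
    return clock_mhz / latency_cycles


# Latency values copied from Tables III, IV, and V
latencies = {
    "Non-restoring, series (8 bit)": 1,
    "Non-restoring, parallel (8 bit)": 4,
    "Non-restoring, series (24 bit)": 1,
    "Non-restoring, parallel (24 bit)": 12,
    "IEEE-754": 1,
    "LNS": 2,
}

for name, cycles in latencies.items():
    print(f"{name}: {throughput_msps(cycles):.3f} MSPS")
```

Under this assumption the sketch gives 50, 12.5, 50, 4.167, 50, and 25 MSPS; Table III lists the 24-bit parallel value truncated as 4.166 MSPS.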


As shown in Table V, the logarithmic floating-point square root requires more resources than the IEEE 754 algorithm, and its execution speed is lower.

V. CONCLUSION

This paper presented an optimized floating-point square root. The main goals are high execution speed and lower use of all other resources compared with the papers in the literature survey, together with a shorter execution time. The square root is widely used in digital signal processing and image processing. The paper also presents the design of a log-LUT floating-point multiplication and division unit. The comparison shows that the IEEE 754 floating-point square-root algorithm executes with a throughput of 50 MSPS while consuming 60% fewer resources than the logarithmic square-root algorithm. The IEEE-754 algorithm requires 50% more resources than LNS. The throughput of the non-restoring parallel algorithm is 75% lower than that of the serial algorithm. The proposed strategy was used to successfully implement an FPGA-based unsigned 32-bit and 64-bit binary square root.

REFERENCES

[1] Kalyani Bhole and Prateek Singh, "Optimized Floating Point Arithmetic Unit," 2014 IEEE India Conference.
[2] M. Franke, A. Th. Schwarzbacher, M. Brutscheck, and St. Becker, "Implementation of Different Square Root Algorithms," CIICT, 2007.
[3] Anuja Nanhe, Gaurav Gawali, Shashank Ahire, and K. Sivasankaran, "Implementation of Fixed and Floating Point Square Root Using Nonrestoring Algorithm on FPGA," IJCEE, October 2013.
[4] Arpita Jena and Siba Kumar Panda, "Revision of Various Square-Root Algorithms for Efficient VLSI Signal Processing Applications," IOSR Journal of Electronics and Communication Engineering (IOSR-JECE).
[5] N. Ramya Rani, V. Subbiah, and L. Sivakumar, "Design of Logarithm Based Floating Point Multiplication and Division on FPGA," ARPN Journal of Engineering and Applied Sciences, January 2016.
[6] http://www.hschmidt.net/FloatConverter/IEEE754.html
[7] https://baseconvert.com/
