Proceedings of IEEE International Conference of Science, Technology, Engineering and Management (ICSTEM’ 17)

A High Speed and Low Complexity Modified Splitter based Parallel Multiplier for the MAC unit of DSPs

Arvind Chakrapani

R. Rekha

Department of ECE, Karpagam College of Engineering, Coimbatore, INDIA. [email protected]

Department of ECE, Karpagam College of Engineering, Coimbatore, INDIA. [email protected]

Abstract - Modern digital signal processors (DSPs) rely on fast binary multipliers to realize high speed circuits for broadband applications. This paper proposes a high speed and low complexity modified splitter based parallel multiplier (Mod-SBPM) for both signed and unsigned numbers. The proposed Mod-SBPM reduces the computational complexity compared to the splitter based parallel multiplier (SBPM) by replacing the multiplication operation with adders while generating the partial products. The synthesis report shows that Mod-SBPM is more efficient than SBPM and the Booth multiplier in terms of hardware requirements, including the number of slices and look up tables (LUTs). The synthesis result for the 8 x 8 Mod-SBPM shows that the critical path delay is about 70.5 % of that of SBPM for unsigned numbers and about 70.4 % for signed numbers.

Index Terms - Binary multiplier, Parallel multiplier, Splitter based parallel multiplier, Booth multiplier.

I. INTRODUCTION
Emerging technologies demand high performance DSPs to meet the requirements of the high speed circuits employed for offering broadband services. Multipliers play a vital role in the design of the multiplier and accumulator (MAC) unit in digital signal processors and advanced microcontrollers. Most advanced communication systems employ signal processing operations such as linear filtering, convolution, the discrete Fourier transform (DFT) and the discrete cosine transform (DCT), and binary multiplication is mandatory in all of these operations. The parallel multiplier employed therefore has to be effective in terms of speed, power consumption, computational complexity and layout area. In signal processing applications, multiplication time is the dominant factor in determining the instruction cycle of a DSP chip. Moreover, researchers have in the past suggested designing multipliers that combine the positive traits of different architectures. Hence the design of parallel multipliers with high speed and low complexity is a potential area of research.

II. BINARY MULTIPLIERS
This section discusses the binary multipliers existing in the literature. The Wallace tree multiplier [1] is a fast multiplier which produces n partial products when n-bit numbers are employed as the multiplier input. A Wallace tree is a tree of carry save adders. The computation gets complicated when the multiplier input exceeds 32 bits because more adders are required, which also slows down the multiplication process. Low power and high speed VLSI can be implemented with different multiplication algorithms such as the Booth algorithm (BA) and the modified Booth algorithm (MBA), which are designed to reduce the number of partial products. Even though the Booth multiplier [2] produces n/2 partial products for n inputs, it involves more computational complexity due to the 2’s complement operation and the addition of overlapped partial products to generate the multiplier result. Erle et al. [3] employed binary carry-save addition for an efficient non-iterative pipelined implementation, based on a Verilog register transfer level model, to achieve better latency and throughput performance when compared to area. Vazquez et al. [4] suggested an architecture for a parallel multiplier which generates reduced partial products based on a multi-operand tree structure. Khan et al. [5] introduced an energy efficient 16-bit multiplier architecture employing the Booth algorithm; the input data has to be processed in its 2’s complement form, and that work ignored the hardware requirements for realizing the multiplier architecture. Rao and Dubey [6] proposed a high speed multiplier using the modified Booth algorithm and a compression adder for high speed arithmetic circuits. Baba and Rajaramesh [7] proposed a parallel multiplier in which half the partial products are generated using a modified Booth encoder and an additional partial product is produced by extending the sign bit.
Carry save adders (CSA) and carry look ahead adders are used as building blocks to speed up the multiplication operation. Arvind Chakrapani et al. [8] proposed a splitter based parallel multiplier (SBPM) which is designed to reduce the number of partial products to n/2 for n multiplier inputs. Even though the number of partial products and the delay are reduced, computational complexity still arises in the partial product generation, which uses binary multiplication. In order to avoid this complexity and to improve the speed of the multiplier, a modified splitter based architecture is proposed to perform binary multiplication by means of addition operations.

March 3-4, 2017 @ KIT-Kalaignarkarunanidhi Institute of Technology

III. MODIFIED SPLITTER BASED BINARY MULTIPLIERS
This section proposes a modified splitter based parallel multiplier (Mod-SBPM), which is a hardware efficient multiplier for both unsigned and signed numbers when compared to SBPM and the Booth multiplier. Mod-SBPM is proposed to reduce the computational complexity of SBPM by removing the need for binary multiplication when generating the partial products, and it also provides the result faster than the existing multipliers. The block representation of Mod-SBPM is presented in Fig. 1. It comprises four functional units, namely the multiplier splitter, the partial product generator (PPG), the sign detector and the Wallace tree adder (WTA). The PPG is the main building block of Mod-SBPM and comprises three sub blocks, namely the adder partial product generator (APPG), the partial product array (PPA) and a 4 to 1 multiplexer. The internal block diagram of the PPG is illustrated in Fig. 2. This block receives both the multiplier and the multiplicand bits as input, and the partial products are generated based on the pair of multiplier bits. Binary multiplication and 2’s complement representation of the multiplicand bits are not required in the proposed method, which leads to low computational complexity and delay. In the block diagram of Mod-SBPM, n denotes the number of bits in the multiplier and the multiplicand. The multiplicand (A) is given directly as the input of the PPA, which is designed to produce the feasible partial products based on the multiplier splitter.
The multiplier splitter splits the multiplier into pairs of non-overlapping bits, starting from the least significant bit (LSB). Each pair acts as the selection lines (B1, B0) of the 4 to 1 multiplexer. While calculating the partial products, the multiplicand is scaled by the decimal value of the selection lines. The conventional Wallace tree adder is used only to add the partial products. In the Wallace tree multiplier (WTM) each bit of the multiplier is multiplied with the multiplicand, so an n x n multiplication in WTM yields n partial products, whereas in the proposed Mod-SBPM n/2 partial products are generated without binary multiplication, based on the selection table given in Table 1.

Table 1. Selection table for partial product computation in Mod-SBPM

B1 | B0 | Partial product
0 | 0 | No operation
0 | 1 | Multiplicand
1 | 0 | Left shift the multiplicand once
1 | 1 | 3 x multiplicand (computed optimally using 2 half adders and (N-2) full adders)
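The 3 x multiplicand entry of Table 1 is the only candidate that needs adders. A minimal Python sketch of such a tripler is given here, assuming it is formed as A + 2A with one half adder at bit 1, full adders at bits 2 to N-1 and one half adder at bit N, which matches the stated count of 2 half adders and (N-2) full adders. The names half_adder, full_adder and tripler are illustrative and are not taken from the authors' Verilog code.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def tripler(a, n):
    """Compute T = 3*A for an n-bit unsigned A as A + (A << 1), bit by bit.

    Assumed wiring (inferred from the adder count, not from the authors' RTL):
    bit 0 passes through, bit 1 and bit n use half adders,
    bits 2..n-1 use full adders.
    """
    if n < 2 or not 0 <= a < (1 << n):
        raise ValueError("a must be an unsigned n-bit value with n >= 2")
    bits = [(a >> i) & 1 for i in range(n)]
    t = [bits[0]]                              # T0 = A0, no adder needed
    s, carry = half_adder(bits[1], bits[0])    # first half adder
    t.append(s)
    for i in range(2, n):                      # (n - 2) full adders
        s, carry = full_adder(bits[i], bits[i - 1], carry)
        t.append(s)
    s, carry = half_adder(bits[n - 1], carry)  # second half adder
    t.extend([s, carry])                       # top bit is the final carry
    return sum(bit << i for i, bit in enumerate(t))
```

For an 8-bit multiplicand the result needs 10 bits; for example, tripler(255, 8) gives 765.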

When the pair of bits is ‘00’, the partial product is set to zero. For ‘01’, the multiplicand itself is assigned to the partial product. For ‘10’, the multiplicand is scaled by 2, which is done by left shifting its binary representation by one position. For ‘11’, the multiplicand is scaled by 3 using the tripler block and the result is stored in T. The partial product is chosen based on the selection table.
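The selection rule above can be sketched behaviourally in Python. This is only an illustration under stated assumptions: plain integer addition stands in for the Wallace tree adder, the signed case is handled in sign-magnitude form (the sign detector is assumed to compare the operand signs), and the names select_partial_product, mod_sbpm_unsigned and mod_sbpm_signed are hypothetical.

```python
def select_partial_product(multiplicand, pair):
    """Model of the 4 to 1 multiplexer: pair is the 2-bit value (B1, B0)."""
    candidates = (
        0,                                    # 00: no operation
        multiplicand,                         # 01: multiplicand
        multiplicand << 1,                    # 10: left shift once
        multiplicand + (multiplicand << 1),   # 11: tripler output T
    )
    return candidates[pair]


def mod_sbpm_unsigned(a, b, n=8):
    """Multiply two unsigned n-bit numbers using n/2 partial products."""
    if n % 2 or not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("expects unsigned n-bit operands with even n")
    product = 0
    for k in range(n // 2):
        pair = (b >> (2 * k)) & 0b11          # next non-overlapping bit pair
        partial = select_partial_product(a, pair)
        product += partial << (2 * k)         # weight of the k-th pair is 4**k
    return product


def mod_sbpm_signed(a, b, n=8):
    """Assumed sign-magnitude handling: multiply magnitudes, then set sign."""
    negative = (a < 0) != (b < 0)
    magnitude = mod_sbpm_unsigned(abs(a), abs(b), n)
    return -magnitude if negative else magnitude
```

For example, the multiplier 11 (binary 00001011) splits into the pairs 11, 10, 00 and 00 from the LSB, so mod_sbpm_unsigned(13, 11) adds 3 x 13 and (2 x 13) shifted by two positions and returns 143.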

Fig. 1. Block Diagram of modified splitter based parallel multiplier

The steps followed in Mod-SBPM are summarized below.
(i) Express the given numbers in binary form.
(ii) Partition the multiplier into pairs of non-overlapping bits from the LSB to act as selection lines.
(iii) Generate the four candidate products based on the selection table.
(iv) Based on the selection lines, select one product as the partial product.
(v) Place the partial product in the Wallace tree adder appropriately.
(vi) If all N/2 partial products have been placed in the WTA, go to step (vii); otherwise go to step (ii) and select the next pair of multiplier bits.
(vii) Add the partial products using the Wallace tree adder.
(viii) Deduce the sign bit for the resulting product.

IV. SIMULATION AND SYNTHESIS RESULT ANALYSIS

The simulation results of the proposed Mod-SBPM for unsigned and signed numbers are given in Fig. 3 and Fig. 4 respectively. The proposed Mod-SBPM is coded in Verilog hardware description language (Verilog HDL) for simulation and is synthesized using the Xilinx ISE tool. Based on the synthesis reports presented in Tables 2 and 3, the number of slices and LUTs required by Mod-SBPM and its delay are lower than those of SBPM and the Booth multiplier.
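The delay entries of Tables 2 and 3 can be compared with a few lines of Python. The sketch below uses only the delay values reported in the synthesis tables; the dictionary layout and the helper name compare are illustrative.

```python
# Critical path delays in ns, copied from Tables 2 and 3.
DELAY_NS = {
    "unsigned": {"Booth": 41.428, "SBPM": 34.595, "Mod-SBPM": 24.374},
    "signed": {"Booth": 47.635, "SBPM": 34.639, "Mod-SBPM": 24.374},
}


def compare(kind, reference):
    """Return (saving in ns, Mod-SBPM delay as a percentage of reference)."""
    ref = DELAY_NS[kind][reference]
    mod = DELAY_NS[kind]["Mod-SBPM"]
    return ref - mod, 100.0 * mod / ref


for kind in DELAY_NS:
    for reference in ("Booth", "SBPM"):
        saving, percent = compare(kind, reference)
        print(f"{kind:8s} vs {reference:5s}: "
              f"{saving:6.3f} ns faster, {percent:4.1f} % of reference delay")
```

With these figures the Mod-SBPM delay is about 70.5 % (unsigned) and 70.4 % (signed) of the SBPM delay, and about 58.8 % and 51.2 % of the Booth multiplier delay.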


(Figure labels: multiplicand bits A0 to A(N-1); half adders and full adders producing the tripler outputs T0 to T(N-1); partial product array; 4 to 1 multiplexer with inputs X0 to X3, selected by a pair of bits from B; output: partial product.)

Fig. 2. Block Diagram of partial product generator used in Mod-SBPM

Table 2. Synthesis report for unsigned multipliers

Device utilization | Booth multiplier | Splitter based parallel multiplier | Modified splitter based parallel multiplier
Multiplexers | 10-bit 8:1 MUX (5) | 10-bit 4:1 MUX (4) | 10-bit 4:1 MUX (4)
Total XORs | 58 | 42 | 65
1-bit XOR2 | 27 | 12 | 18
1-bit XOR3 | 31 | 30 | 47
No. of slices | 88 out of 960 (9 %) | 71 out of 960 (7 %) | 69 out of 960 (7 %)
No. of 4 input LUTs | 160 out of 1920 (8 %) | 131 out of 1920 (6 %) | 127 out of 1920 (6 %)
No. of IOs | 34 | 34 | 33
Delay | 41.428 ns | 34.595 ns | 24.374 ns

Fig. 3. Simulation output of the proposed Mod-SBPM for unsigned numbers

Fig. 4. Simulation output of the proposed Mod-SBPM for signed numbers

Table 3. Synthesis report for signed multipliers

Device utilization | Booth multiplier | Splitter based parallel multiplier | Modified splitter based parallel multiplier
Multiplexers | 1-bit 4:1 MUX (16), 10-bit 8:1 MUX (5) | 10-bit 4:1 MUX (4) | 10-bit 4:1 MUX (4)
Total XORs | 59 | 43 | 66
1-bit XOR2 | 35 | 13 | 19
1-bit XOR3 | 24 | 30 | 47
No. of slices | 108 out of 960 (11 %) | 71 out of 960 (7 %) | 69 out of 960 (7 %)
No. of 4 input LUTs | 203 out of 1920 (10 %) | 131 out of 1920 (6 %) | 127 out of 1920 (6 %)
No. of IOs | 37 | 37 | 36
Delay | 47.635 ns | 34.639 ns | 24.374 ns

used in MAC unit of a digital signal processor and it involves a single processor cycle to fetch, execute the instruction and also to store the result [8]. Moreover the proposed high speed architecture of Mod-SBPM involves computationally efficient hardware.


V. CONCLUSION
The proposed Mod-SBPM is designed and synthesized to produce the product faster and to overcome the limitations of other basic multipliers. The performance analysis shows that the hardware requirement for implementing the proposed Mod-SBPM, for both signed and unsigned numbers, is less than that of SBPM, the Booth multiplier and the Wallace tree multiplier. The synthesis report also shows that the delay of Mod-SBPM is about 10.22 ns and 10.265 ns lower than that of SBPM for unsigned and signed numbers respectively. Even though Mod-SBPM is simulated for an 8 x 8 multiplier, it can be extended to parallel multipliers with an arbitrary number of input bits. Hence the low complexity, hardware efficient Mod-SBPM is a viable method for designing high speed multipliers for digital signal processors.

References
[1] Wallace, C.S.: A suggestion for a fast multiplier. IEEE Transactions on Electronic Computers 13(1), 14–17 (1964)
[2] Booth, A.D.: A signed binary multiplication technique. Quarterly Journal of Mechanics and Applied Mathematics 4(2), 236–240 (1951)
[3] Erle, M.A., Hickmann, B.J., Schulte, M.J.: Decimal floating-point multiplication. IEEE Transactions on Computers 58(7), 902–916 (2009)
[4] Vazquez, A., Antelo, E., Montuschi, P.: Improved design of high-performance parallel decimal multipliers. IEEE Transactions on Computers 59(5), 679–693 (2010)
[5] Khan, M.Z.A., Saleem, H., Afzal, S., Naseem, J.: An Efficient 16-Bit Multiplier based on Booth Algorithm. International Journal of Advancements in Research and Technology 1(6), 16–18 (2012)
[6] Rao, M.J., Dubey, S.: A high speed wallace tree multiplier using modified booth algorithm for fast arithmetic circuits. IOSR Journal of Electronics and Communication Engineering 3(1), 07–11 (2012)
[7] Baba, S.K., Rajaramesh, D.: Design and implementation of advanced modified booth encoding multiplier. International Journal of Engineering and Science Invention 2(8), 60–68 (2013)
[8] Arvind Chakrapani, Elanchezian, T., Karthikeyan, G., Divya, N., Kabilarasan, K., Chinchu Joseph: A Low Complexity Splitter Based Parallel Multiplier for DSP Applications. Proceedings of the National Academy of Sciences, India, 85(2), 277–281 (2015)