2014 International Conference on Signal Processing and Integrated Networks (SPIN)

An Efficient Hardware Based MAC Design in Digital Filters with Complex Numbers

Mohamed Asan Basiri M
Department of Computer Science and Engineering, Indian Institute of Information Technology Design and Manufacturing Kancheepuram, Chennai 600127, Email: [email protected]

Noor Mahammad Sk
Department of Computer Science and Engineering, Indian Institute of Information Technology Design and Manufacturing Kancheepuram, Chennai 600127, Email: [email protected]

Abstract—This paper proposes a novel fixed point complex number multiply accumulate (MAC) circuit for real time digital signal processing applications. The proposed architecture is built around a multiplier-cum-accumulator, which can be used either as a multiplier or as a MAC. The previous MAC result is added as one of the partial products of the current multiplication, so the depth of the multiplier-cum-accumulator unit remains O(log2 n) when it is based on a Wallace tree multiplier and O(n) when it is based on a Braun multiplier. Hence the separate accumulator with depth O(log2 n) can be avoided. The performance results show that the proposed architecture performs better than the conventional fixed point complex number MAC. Without pipelining, the proposed architecture achieves an improvement of 32.4% with the Wallace tree multiplier and 19.1% with the Braun multiplier using a 45 nm technology library. With pipelining, the same architecture achieves an improvement of 14.6% with the Wallace tree multiplier and 12.2% with the Braun multiplier.

Index Terms—Carry look ahead adder, Complex number arithmetic, DSP processor, FIR filter, Multiply accumulate circuit.

978-1-4799-2866-8/14/$31.00 ©2014 IEEE

I. INTRODUCTION

In general, digital signal processors perform digital signal processing operations such as convolution, correlation, transforms and filtering. All of these operations take the form of multiplication followed by repeated addition, so the multiply accumulate circuit (MAC) is the heart of a digital signal processor. The general digital finite impulse response (FIR) filter [1] is represented by (1), where x[n] and y[n] are the input and output signal sequences respectively, h[n] is the filter impulse response and N is the length of the filter. The signal sequences can be represented as fixed or floating point complex numbers. Complex numbers play a vital role in electronics and digital signal processing (DSP) because they are an easy way to represent and manipulate real world sinusoidal waveforms. Signal attributes such as amplitude and phase are revealed more easily by complex numbers than by real numbers. For instance, complex numbers are used in the fast Fourier transform (FFT) [2].

$$y[n] = \sum_{k=0}^{N-1} x[n-k]\, h[k] \qquad (1)$$
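As an illustration of how (1) reduces to repeated multiply accumulate steps, the following minimal Python sketch computes one output sample per call. The function name `fir_output`, the sample values and the zero padding outside the input record are assumptions made for this example, and Python's built-in complex type stands in for the fixed point complex hardware discussed in the paper.

```python
def fir_output(x, h, n):
    """Return y[n] = sum_{k=0}^{N-1} x[n-k] * h[k], one MAC step per filter tap."""
    acc = 0 + 0j  # accumulator register, cleared before each output sample
    for k in range(len(h)):
        if 0 <= n - k < len(x):  # assumption: samples outside the record are zero
            acc += x[n - k] * h[k]  # multiply, then accumulate
    return acc


# Hypothetical complex input samples and filter coefficients.
x = [1 + 1j, 2 - 1j, 0 + 3j]
h = [1 + 0j, 0 + 1j]
y = [fir_output(x, h, n) for n in range(len(x) + len(h) - 1)]
```

Each pass through the loop body is one complex MAC operation, which is the operation the rest of the paper implements in hardware.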

The basic blocks of a MAC are shown in Fig. 1, where the inputs A and B are multiplied and the product is added to the previous MAC result. If A and B are n bits wide, the product is 2n bits wide. To avoid overflow during accumulation, the accumulation register is therefore given k extra bits beyond the 2n bits of the product.

Fig. 1. Basic blocks of MAC (blocks shown: inputs A and B, multiplier, addition, register, result).
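The paper does not state how k is chosen. A standard bound, given here as a sketch rather than as the authors' rule, is that summing N unsigned products of two n-bit magnitudes needs at most 2n + ceil(log2 N) bits, so k guard bits allow 2^k accumulations without overflow. The function names below are hypothetical.

```python
def accumulator_width(n_bits, num_products):
    """Bits sufficient to hold the sum of num_products unsigned n x n bit products."""
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 1, using integers only.
    guard_bits = (num_products - 1).bit_length()
    return 2 * n_bits + guard_bits


def max_safe_accumulations(n_bits, acc_bits):
    """Number of products that can be summed in acc_bits bits without overflow."""
    return 2 ** (acc_bits - 2 * n_bits)


# Widths quoted in the paper: 32-bit operands with a 96-bit accumulator,
# 11-bit operands with a 36-bit result, 5-bit operands with a 16-bit result.
examples = [max_safe_accumulations(32, 96),
            max_safe_accumulations(11, 36),
            max_safe_accumulations(5, 16)]
```

Under this bound, the 96-bit accumulator used for the 32-bit design leaves 32 guard bits, the 36-bit result of the 11-bit design leaves 14, and the 16-bit result of the 5-bit design leaves 6.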

The multiplier is the part of the MAC that can be designed in many ways. Array multipliers and Wallace tree [3] multipliers are the popular choices in hardware implementations. The Wallace tree multiplier has a time complexity of O(log2 n). Array multipliers fall into two categories: ripple carry array multipliers and carry save array multipliers. The ripple carry array multiplier has a time complexity of O(n^2). Carry save array multipliers, such as the Braun multiplier [4] and the Baugh-Wooley [5] multiplier, have a time complexity of O(n). The second part of the MAC is the accumulator, which can be built from, for example, a ripple carry adder or a carry look ahead adder. The ripple carry adder has a time complexity of O(n). The recursive doubling based carry look ahead adder has a time complexity of Θ(log2 n).

II. THE LITERATURE REVIEW

The paper [6] shows a fixed point MAC design using a carry save array multiplier. The drawback of this approach is that a result is produced only at every k-th clock cycle in a k stage pipelined system. The paper [7] explains a fixed point MAC design using an array multiplier with a two stage pipeline, in which the carry and sum from the last carry save stage of the multiplier are sent to the first stage along with the partial result. The papers [8] and [9] explain reconfigurable MAC architectures, where a full precision (32-bit) MAC is used to perform two half precision (16-bit) MAC operations or four quarter precision (8-bit) MAC operations. The paper [10] explains the twin precision based double throughput fixed point MAC, where one full precision MAC is used to perform two half precision MACs. The paper [11] shows how to reduce the critical depth of the MAC by dividing the N-bit accumulator into two N/2-bit adders, so that the accumulation is done in two cycles. The paper [12] shows the basic complex number multiplier structure. FIR filter design using distributed arithmetic is explained in [13], where multiple MAC operations are performed in distributed hardware with look up table based multiplication and the results are combined by a tree based adder.

Fig. 2. Basic FIR filter structure with complex number values (labels shown: input x[n] = (a + jb), coefficients h[n] = (c + jd), unit delays z^-1, MAC, output y[n]).

Figure 2 shows the basic multiple constant multiplication based FIR filter structure with complex number filter coefficients (h[n]) and input signal sample values (x[n]), where the dotted rectangle marks the MAC unit.

A. Contribution of this paper

In general, a MAC operation is performed as a multiplication followed by an accumulation, so the depth of the MAC circuit depends on both the multiplier and the accumulator. In the proposed architecture, the accumulation is performed along with the multiplication (multiplication-cum-accumulation): the previous MAC result is added along with the partial products of the current multiplication. Hence the separate accumulator circuit is avoided. Fig. 3 shows the proposed architecture of the MAC. In this paper, a fixed point complex number (FPCN) multiply accumulate circuit is proposed using Wallace tree and Braun multipliers, with and without pipelining. The experimental results of the proposed architecture are compared with the conventional fixed point complex number MAC architecture.

Fig. 3. Proposed architecture of MAC (blocks shown: inputs A and B, multiplier-cum-accumulator, register, result).

The rest of the paper is organized as follows. Section III presents the proposed architecture for the fixed point complex number MAC. Design modeling, implementation and results are given in Section IV, followed by the conclusion in Section V.

III. THE PROPOSED ARCHITECTURE OF FIXED POINT COMPLEX NUMBER MAC

In general, a digital signal can be represented as an amplitude (r) with a phase (θ), which can be written as a complex number $z = r\angle\theta = x + jy$, where $r = \sqrt{x^2 + y^2}$ and $\theta = \tan^{-1}(y/x)$. The two complex numbers are represented as (P + jQ) and (R + jS), where P = {Ps, Pm}, Q = {Qs, Qm}, R = {Rs, Rm} and S = {Ss, Sm}. Here the suffix s denotes the sign bit and the suffix m denotes the binary number (magnitude). Complex number multiplication can be done in several ways. Let the real part of the product of the two complex numbers be X and the imaginary part be Y, where X = {Xs, Xm} and Y = {Ys, Ym}.

$$X + jY = (P + jQ)(R + jS) \qquad (2)$$

$$X + jY = \{(P - Q)S + (R - S)P\} + j\{(P - Q)S + Q(R + S)\} \qquad (3)$$

$$X + jY = \{PR - QS\} + j\{PS + QR\} \qquad (4)$$

According to (3), FPCN multiplication requires three fixed point multipliers and five fixed point adders, and the multiplications can start only after (P − Q), (R − S) and (R + S) have been computed. According to (4), FPCN multiplication requires four fixed point multipliers and two fixed point adders, and the multiplications can start immediately. Because of the one additional multiplier, the hardware area for (4) is higher than for (3), but the depth of (4) is smaller than that of (3) by log2 n. The proposed architecture therefore follows (4).

Fig. 4. Conventional fixed point complex number multiplier-cum-accumulator (blocks shown: four fixed point multipliers, a fixed point subtractor, a fixed point adder and two fixed point accumulators; inputs {Ps,Pm}, {Qs,Qm}, {Rs,Rm}, {Ss,Sm}; outputs X = P.R − Q.S as {Xs,Xm} and Y = P.S + Q.R as {Ys,Ym}).

The conventional fixed point complex number multiplier-cum-accumulator is shown in Fig. 4, where four fixed point multipliers and four fixed point adders are used. The proposed architecture uses two fixed point multiplier-cum-accumulators, two fixed point multipliers and two fixed point adders. Fig. 5 shows the proposed fixed point complex number multiplier-cum-accumulator. One adder depth is thus avoided in the proposed architecture. In the figures of this paper, a shaded box represents multiplication-cum-accumulation.
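The two complex multiplication forms in (3) and (4) can be checked against each other with a short Python sketch. The function names are hypothetical and plain integers stand in for the fixed point operands. The sketch only shows that both forms give the same X and Y with different operation counts; it does not model circuit depth or area.

```python
def cmul_three_mult(P, Q, R, S):
    """Form (3): three multiplications and five additions/subtractions."""
    shared = (P - Q) * S      # multiplication 1, reused by both parts
    X = shared + (R - S) * P  # multiplication 2
    Y = shared + Q * (R + S)  # multiplication 3
    return X, Y


def cmul_four_mult(P, Q, R, S):
    """Form (4): four multiplications and two additions/subtractions."""
    X = P * R - Q * S
    Y = P * S + Q * R
    return X, Y


# (3 - 2j) * (5 + 7j) evaluated with both forms.
result_three = cmul_three_mult(3, -2, 5, 7)
result_four = cmul_four_mult(3, -2, 5, 7)
```

In form (3) the three multiplications have to wait for the pre-additions (P − Q), (R − S) and (R + S), whereas in form (4) all four multiplications can start at once, which is the trade-off described above.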

Fig. 5. Proposed fixed point complex number multiplier-cum-accumulator (blocks shown: two fixed point multipliers, two fixed point multiplier-cum-accumulators, a fixed point subtractor and a fixed point adder; inputs {Ps,Pm}, {Qs,Qm}, {Rs,Rm}, {Ss,Sm}; outputs X = P.R − Q.S as {Xs,Xm} and Y = P.S + Q.R as {Ys,Ym}).

A. Radix-2 fixed point multiplier-cum-accumulator

The fixed point multiplier-cum-accumulator circuit consists of three parts: partial product generation, carry save addition and sign calculation. Fig. 6 shows the 11-bit fixed point multiplier-cum-accumulator using a Wallace tree multiplier. The two fixed point input operands are {As, A} and {Bs, B}, where A and B are 11-bit binary numbers and the suffix s denotes the sign bit. If As (Bs) = 0, the number is treated as positive, otherwise as negative. The result of the MAC operation is {Rs, R}, where R is 36 bits wide. The multiplication result is 22 bits wide, and in the accumulation step 14 extra bits are appended at the msb side of the multiplication result to avoid overflow. The previous MAC result is treated as one of the partial products. Fig. 7 shows the 11-bit fixed point MAC using the Wallace structure with a four stage pipeline. The square boxes labelled csa1, csa2, csa3, ... represent carry save adders, each with a depth of Θ(1). Fig. 8 shows the 5-bit fixed point multiplier-cum-accumulator using a Braun multiplier, where the final accumulator register is 16 bits wide, and Fig. 9 shows the 5-bit fixed point MAC using the Braun structure with a four stage pipeline. In the pipelined system, one more csa is used to add the sum and carry from the last csa stage of the multiplier to F. The square boxes labelled HA and FA represent half adders and full adders respectively. The horizontal dark line represents pipelining. The previous MAC result, which is sent as one of the partial products, is denoted {Fs, F}. The shaded square box represents the adder that adds the feedback signal. The implementation results are for a 32-bit fixed point MAC with a 96-bit wide output, where 32 extra bits are appended at the msb side of the 64-bit multiplication result. The last stage of the multiplier structure is a recursive doubling based carry look ahead adder (CLA) with a depth of Θ(log2 n). The carry output (c) and the sum (s) from the CLA, together with (As xor Bs xor Fs), are used for the sign adjustment of the result.

Fig. 6. The proposed 11-bit fixed point MAC using Wallace tree multiplier (radix-2) without pipeline (blocks shown: partial products generation p0...p10 from {As, A[10:0]} and {Bs, B[10:0]}, carry save adders, CLA with output {c, s[35:0]}, feedback {Fs, F[35:0]}, sign adjustment, MAC result {Rs, R[35:0]}).

Fig. 7. The proposed 11-bit fixed point MAC using Wallace tree multiplier (radix-2) with four stage pipeline.
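The idea of treating the previous MAC result as one more partial product can be sketched in Python as follows. This is a behavioural illustration under stated assumptions, not the authors' Verilog: rows are compressed three at a time in list order rather than with the exact csa wiring of Figs. 6 to 9, the sign rule is one reading of the labels in those figures (R = s, R = (~s)+1, Rs = ~Fs), and the two's complement of the feedback is written as 2^acc_bits − f so that f = 0 is handled. All function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole words: a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word


def mac_magnitude(a, b, f, n_bits):
    """Return a*b + f, with f injected as one extra partial product row."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    rows.append(f)  # the previous MAC result joins the partial products
    while len(rows) > 2:  # each pass models one layer of carry save adders
        full = 3 * (len(rows) // 3)
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])  # leftover rows pass through unchanged
        rows = nxt
    return rows[0] + rows[1]  # final carry propagating add (the CLA's role)


def mac_sign_magnitude(a_sign, a, b_sign, b, f_sign, f, n_bits, acc_bits):
    """One MAC step on sign-magnitude operands; returns (r_sign, r)."""
    mask = (1 << acc_bits) - 1
    if (a_sign ^ b_sign ^ f_sign) == 0:
        # Product and previous result have the same sign: magnitudes add.
        return f_sign, mac_magnitude(a, b, f, n_bits) & mask
    # Signs differ (assumed rule): feed back the two's complement of f.
    total = mac_magnitude(a, b, (1 << acc_bits) - f, n_bits)
    carry, s = (total >> acc_bits) & 1, total & mask
    if carry:  # |a*b| >= |f|, so the product's sign wins
        return f_sign ^ 1, s
    return f_sign, ((~s) + 1) & mask  # |a*b| < |f|, so f keeps its sign
```

For example, `mac_sign_magnitude(0, 7, 1, 3, 0, 10, 5, 16)` accumulates the product 7 × (−3) onto a previous result of +10 and returns sign 1 with magnitude 11. No separate accumulator adder appears in the code: the only carry propagating addition is the one at the end of `mac_magnitude`.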

Theorem 1 gives the depth of the proposed fixed point Wallace tree and Braun multiplier-cum-accumulators. The same depth is achieved when the proposed unit is used in the fixed point complex number MAC. Hence, the total circuit depth of the proposed system is reduced by Θ(log2 n), which is the depth of the extra accumulator that follows the multiplier in a conventional MAC. Table I shows the number of pipeline stages involved in the 32-bit fixed point complex number MAC with a 96-bit accumulator.

Theorem 1. The circuit depth of the proposed MAC using the Wallace tree and the Braun multiplier-cum-accumulator is O(log2 n) and O(n) respectively.
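To give a feel for the Wallace tree part of Theorem 1, the sketch below counts how many layers of 3:2 carry save adders are needed to reduce a given number of rows to two. It assumes a simple reduction in which every complete group of three rows is compressed in each layer; the function name is hypothetical and the exact csa arrangement in the paper's figures may differ.

```python
def csa_levels(num_rows):
    """Layers of 3:2 carry save adders needed to reduce num_rows rows to 2."""
    levels = 0
    while num_rows > 2:
        # Every full group of 3 rows becomes 2 rows; leftover rows pass through.
        num_rows = 2 * (num_rows // 3) + num_rows % 3
        levels += 1
    return levels


# n partial products alone versus n partial products plus the fed back result.
levels_11 = (csa_levels(11), csa_levels(11 + 1))
levels_32 = (csa_levels(32), csa_levels(32 + 1))
```

Under this counting, adding the previous MAC result as an (n + 1)-th row never costs more than one extra layer, and for n = 11 and n = 32 it costs none, so the carry save depth stays logarithmic in n.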

Proof: The final result of the n-bit Wallace tree/Braun multiplier-cum-accumulator is fed back as a partial product for the purpose of accumulation, as shown in Fig. 6 and Fig. 8 respectively, so the total number of partial products including the previous MAC result is (n + 1). This avoids the use of a separate carry look ahead adder for accumulation, since the CLA in the last stage of the Wallace tree/Braun multiplier acts as the accumulator. Hence, we call this structure a Wallace tree/Braun based multiplier-cum-accumulator. The total circuit depth is Θ(1) + Θ(log2 n) + Θ(log2 (2n − 1)) = O(log2 n) for the Wallace tree multiplier-cum-accumulator and Θ(1) + Θ(n) + Θ(log2 n) = O(n) for the Braun multiplier-cum-accumulator, where Θ(1) is the depth of partial product generation, Θ(log2 n) is the depth of the Wallace structure, Θ(log2 (2n − 1)) is the depth of the CLA in the last stage of the Wallace tree multiplier-cum-accumulator, Θ(n) is the depth of the Braun structure and Θ(log2 n) is the depth of the CLA in the last stage of the Braun multiplier-cum-accumulator. Hence, the total depth is reduced by Θ(log2 n).

Fig. 8. The proposed 5-bit fixed point MAC using Braun multiplier (radix-2) without pipeline (blocks shown: partial products generation p0[4:0] to p4[4:0] from {As, A[4:0]} and {Bs, B[4:0]}, half adders (HA) and full adders (FA), carry look ahead adder with output {c, s[15:0]}, feedback {Fs, F[15:0]}, sign adjustment, MAC result {Rs, R[15:0]}).

IV. DESIGN MODELING, IMPLEMENTATIONS AND RESULTS

The whole system is modeled in Verilog HDL. The Verilog HDL models are simulated and verified using the Xilinx ISE simulator. The timing, area and power analysis of the implementation is done with the Cadence 6.1 ASIC design tool. All the designs are implemented in different technology nodes: the slow normal.lib library is used for 180 nm, tcbn90gbwp7tlvttc0d88 ccs.lib for 90 nm and tcbn45gsbwpbc088 ccs.lib for 45 nm. As the technology shrinks, the delay, area and net power are reduced. The proposed architecture is compared with the conventional fixed point complex number MAC. Without pipelining, the proposed architecture achieves an improvement of 32.4% in the Wallace tree based and 19.1% in the Braun multiplier based complex number MAC using the 45 nm technology library. With pipelining, the same architecture achieves an improvement of 14.6% in the Wallace tree based and 12.2% in the Braun multiplier based complex number MAC.

TABLE I
NUMBER OF PIPELINE STAGES FOR 32-BIT FIXED POINT COMPLEX NUMBER MAC WITH 96-BIT ACCUMULATOR

System | No. of pipeline stages
Conventional 32 bit fixed point complex number MAC using Wallace tree multiplier | 3
Conventional 32 bit fixed point complex number MAC using Braun multiplier | 5
Proposed 32 bit fixed point complex number MAC using Wallace tree multiplier | 3
Proposed 32 bit fixed point complex number MAC using Braun multiplier | 4
32 bit fixed point complex number MAC using [7] | 2

The worst path delay comparison between the conventional and proposed complex number MACs without pipelining is shown in Fig. 10, where W_conv and B_conv represent the conventional Wallace tree and Braun multiplier based fixed point complex number MACs respectively. Similarly, W_pro and B_pro represent the proposed Wallace tree and Braun multiplier based fixed point complex number MACs respectively. Fig. 11 shows the total cycle delay comparison between the conventional and proposed complex number MACs with pipelining. The total cycle delay is computed by multiplying the number of pipeline stages by the worst path delay of the pipelined system. The inverse of the worst path delay gives the operating frequency.

Fig. 9. The proposed 5-bit fixed point MAC using Braun multiplier (radix-2) with pipeline.

Fig. 10. Worst path delay (ps) comparison for complex number MAC using Wallace tree/Braun multiplier without pipeline (x-axis: technology, 180 nm, 90 nm and 45 nm; y-axis: worst path delay (ps); series: W_conv, W_pro, B_conv, B_pro).

Fig. 11. Total cycle delay (ps) comparison for complex number MAC using Wallace tree/Braun multiplier with pipeline (x-axis: technology, 180 nm, 90 nm and 45 nm; y-axis: total cycle delay (ps); series: W_conv, W_pro, B_conv, B_pro, [7]).

The proposed and conventional radix-2 pipelined architectures are compared with [7], where a radix-4 array multiplier based MAC with a 2 stage pipeline is used. In [7], the previous MAC result is added at the first carry save stage of the multiplier-cum-accumulator, so only a 2 stage pipeline is possible. In the proposed scheme, the previous MAC result is added at the last carry save stage of the multiplier-cum-accumulator, so an n stage pipeline is possible, which tends to increase the operating frequency. The proposed system gives an operating frequency of 147.8 GHz using the Wallace tree multiplier with pipelining in the 45 nm process, whereas it gives 117.2 GHz using the Braun multiplier with pipelining. Figs. 12 and 13 show the net power comparison between the conventional and proposed complex number MACs without and with pipelining respectively. Figs. 14 and 15 show the total area comparison between the conventional and proposed complex number MACs without and with pipelining respectively. However, the proposed technique is designed only for time optimization, and hence there is no significant difference between the proposed and conventional systems in terms of power or area.
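The timing relations used in this section (total cycle delay as the number of pipeline stages times the worst path delay, and operating frequency as the inverse of the worst path delay) can be written as a small Python sketch. The function names and the delay values are hypothetical and are not measurements from the paper. The `improvement_percent` formula is an assumed definition of the reported improvement, since the paper does not spell it out.

```python
def total_cycle_delay_ps(num_stages, worst_path_delay_ps):
    """Total cycle delay: number of pipeline stages times worst path delay."""
    return num_stages * worst_path_delay_ps


def operating_frequency_hz(worst_path_delay_ps):
    """Operating frequency as the inverse of the worst path delay."""
    return 1.0 / (worst_path_delay_ps * 1e-12)


def improvement_percent(conventional_delay, proposed_delay):
    """Assumed definition: relative delay reduction over the conventional design."""
    return 100.0 * (conventional_delay - proposed_delay) / conventional_delay


# Hypothetical example: a 3 stage pipeline with a 1000 ps worst path delay.
example_total = total_cycle_delay_ps(3, 1000.0)
example_freq = operating_frequency_hz(1000.0)
```

With these made-up numbers the total cycle delay is 3000 ps and the operating frequency is about 1 GHz.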

Fig. 12. Net power (nW) comparison for complex number MAC using Wallace tree/Braun multiplier without pipeline (x-axis: technology, 180 nm, 90 nm and 45 nm; y-axis: net power (nW); series: W_conv, W_pro, B_conv, B_pro).

Fig. 13. Net power (nW) comparison for complex number MAC using Wallace tree/Braun multiplier with pipeline (x-axis: technology, 180 nm, 90 nm and 45 nm; y-axis: net power (nW); series: W_conv, W_pro, B_conv, B_pro, [7]).

Fig. 14. Total area (μm^2) comparison for complex number MAC using Wallace tree/Braun multiplier without pipeline (x-axis: technology, 180 nm, 90 nm and 45 nm; y-axis: total area (μm^2); series: W_conv, W_pro, B_conv, B_pro).

Fig. 15. Total area (μm^2) comparison for complex number MAC using Wallace tree/Braun multiplier with pipeline (x-axis: technology, 180 nm, 90 nm and 45 nm; y-axis: total area (μm^2); series: W_conv, W_pro, B_conv, B_pro, [7]).

V. CONCLUSION

In this paper, a high performance 32-bit radix-2 fixed point complex number MAC is proposed, where the real and imaginary parts are computed by sending the previous MAC result as one of the partial products to the present multiplication. The depth of the MAC is therefore equal to the depth of the multiplier, and the separate accumulator circuit is avoided. The experimental results show that the proposed fixed point complex number MAC performs better than the conventional fixed point complex number MAC.

REFERENCES

[1] Kenny Johansson, Oscar Gustafsson, Linda S. DeBrunner, and Lars Wanhammar, “Minimum Adder Depth Multiple Constant Multiplication Algorithm for Low Power FIR Filter”, IEEE International Symposium on Circuits and Systems, page(s) 1439-1442, May 2011.
[2] Steven W. Smith, “The Scientist and Engineer's Guide to Digital Signal Processing”, California Technical Publishing, page(s) 551-566, 1997.
[3] C. S. Wallace, “A suggestion for a fast multiplier”, IEEE Transactions on Electronic Computers, vol. EC-13, no. 1, pp. 14-17, Feb. 1964.
[4] C.M. Jones, S.S. Dlay, and R.G. Naguib, “Berger check prediction for concurrent error detection in the Braun array multiplier”, IEEE International Conference on Electronics, Circuits and Systems, vol. 1, pp. 81-84, Oct. 1996.
[5] Sjalander M and Larsson-Edefors P, “High-Speed and Low-Power Multipliers Using the Baugh-Wooley Algorithm and HPM Reduction Tree”, IEEE International Conference on Electronics, Circuits and Systems, page(s) 33-36, Sep. 2008.
[6] F. Elguibaly, “A fast parallel multiplier accumulator using the modified Booth algorithm”, IEEE Transactions on Circuits Systems, vol. 27, no. 9, pp. 902-908, Sep. 2000.
[7] Young-Ho Seo and Dong Wook Kim, “A new VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm”, IEEE Transactions on VLSI systems, vol. 18, no. 2, Feb. 2010.
[8] Li-Rong Wang, Yi-Wei Chiu, Chia-Lin Hu, Ming-Hsien Tu, Shyh-Jye Jou and Chung-Len Lee, “A Reconfigurable MAC Architecture Implemented with Mixed-Vt Standard Cell Library”, IEEE International Symposium on Circuits and Systems, page(s) 3426-3429, May 2008.
[9] GUO Yuan, LI Shaokang, WANG Yiyu and ZOU Lianying, “Implementation of a High Performance Subword Parallelism 64-Bit IMAC for Multimedia Servicey”, International Conference on Computer Engineering and Technology, page(s) 272-276, April 2010.
[10] Tung Thanh Hoang, Magnus Sjalander and Per Larsson-Edefors, “Double Throughput Multiply-Accumulate Unit for Flex Core Processor Enhancements”, IEEE International Symposium on Parallel Distributed Processing, page(s) 1-7, May 2009.
[11] Zicari P, Perri S, Corsonello P and Cocorullo G, “An optimized adder accumulator for high speed MACs”, IEEE international conference on ASIC, vol. 2, page(s) 757-760, Oct. 2005.
[12] Rizalafande Che Ismail and Razaidi Hussin, “High Performance Complex Number Multiplier Using Booth-Wallace Algorithm”, IEEE International Conference on Semiconductor Electronics, page(s) 786-790, Oct. 2006.
[13] Martin Kumm, Konrad Moller and Peter Zipf, “Dynamically Reconfigurable FIR Filter Architectures with Fast Reconfiguration”, International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC), page(s) 1-8, July 2013.