Lecture 18: Datapath Functional Units

21 downloads 128 Views 311KB Size Report
18: Datapath Functional Units. 2. CMOS VLSI Design. CMOS VLSI Design 4th Ed . Outline. ❑ Comparators. ❑ Shifters. ❑ Multi-input Adders. ❑ Multipliers ...
Lecture 18: Datapath Functional Units

Outline ‰ ‰ ‰ ‰

Comparators Shifters Multi-input Adders Multipliers

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

2

Comparators ‰ ‰ ‰ ‰

0’s detector: A = 00…000 1’s detector: A = 11…111 Equality comparator: A = B Magnitude comparator: A < B

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

3

1’s & 0’s Detectors ‰ 1’s detector: N-input AND gate ‰ 0’s detector: NOTs + 1’s detector (N-input NOR) A7 A6 A5 A4 A3 A2

A3 A2 allones

allzeros

A1 A0

A1 A0 A7 A6 A5 A4 A3 A2

allones

A1 A0

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

4

Equality Comparator ‰ Check if each bit is equal (XNOR, aka equality gate) ‰ 1’s detect on bitwise equality B[3] A[3] B[2] A[2]

A=B

B[1] A[1] B[0] A[0]

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

5

Magnitude Comparator ‰ Compute B – A and look at sign ‰ B – A = B + ~A + 1 ‰ For unsigned numbers, carry out is sign bit A≤ B C B3

A≥ B

N

A3 B2 A2 B1

Z

A=B

A1 B0 A0

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

6

Signed vs. Unsigned ‰ For signed numbers, comparison is harder – C: carry out – – – –

Z: zero (all bits of A – B are 0) N: negative (MSB of result) V: overflow (inputs had different signs, output sign ≠ B) S: N xor V (sign of result)

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

7

Shifters ‰ Logical Shift: – Shifts number left or right and fills with 0’s • 1011 LSR 1 = 0101 1011 LSL1 = 0110 ‰ Arithmetic Shift: – Shifts number left or right. Rt shift sign extends • 1011 ASR1 = 1101 1011 ASL1 = 0110 ‰ Rotate: – Shifts number left or right and fills with lost bits • 1011 ROR1 = 1101 1011 ROL1 = 0111

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

8

Funnel Shifter ‰ A funnel shifter can do all six types of shifts ‰ Selects N-bit field Y from 2N–1-bit input – Shift by k bits (0 ≤ k < N) – Logically involves N N:1 multiplexers

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

9

Funnel Source Generator Rotate Right Logical Right Arithmetic Right Rotate Left Logical/Arithmetic Left

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

10

Array Funnel Shifter ‰ N N-input multiplexers – Use 1-of-N hot select signals for shift amount – nMOS pass transistor design (Vt drops!) k[1:0]

left

Inverters & Decoder s3

s2

s1

s0

Y3 Y2

Z6

Y1

Z5

Y0 Z4

Z3

Z2

Z1

18: Datapath Functional Units

Z0

CMOS VLSI Design 4th Ed.

11

Logarithmic Funnel Shifter ‰ Log N stages of 2-input muxes – No select decoding needed

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

12

32-bit Logarithmic Funnel ‰ Wider multiplexers reduce delay and power ‰ Operands > 32 bits introduce datapath irregularity

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

13

Barrel Shifter ‰ Barrel shifters perform right rotations using wraparound wires. ‰ Left rotations are right rotations by N – k = k + 1 bits. ‰ Shifts are rotations with the end bits masked off.

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

14

Logarithmic Barrel Shifter

Right shift only

Right/Left shift 18: Datapath Functional Units

Right/Left Shift & Rotate CMOS VLSI Design 4th Ed.

15

32-bit Logarithmic Barrel ‰ Datapath never wider than 32 bits ‰ First stage preshifts by 1 to handle left shifts

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

16

Multi-input Adders ‰ Suppose we want to add k N-bit words – Ex: 0001 + 0111 + 1101 + 0010 = 10111 ‰ Straightforward solution: k-1 N-input CPAs – Large and slow 0001 0111 1101 0010 + 1000 + 10101 + 10111

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

17

Carry Save Addition ‰ A full adder sums 3 inputs and produces 2 outputs – Carry output has twice weight of sum output ‰ N full adders in parallel are called carry save adder – Produce N sums and N carry outs X4

C4 S4

Y4 Z4 X3

Y3 Z3 X2

C3 S3

C2 S2

Y2 Z2 X1

Y1 Z1

C1 S1

XN...1 YN...1 ZN...1 n-bit CSA CN...1

18: Datapath Functional Units

SN...1

CMOS VLSI Design 4th Ed.

18

CSA Application ‰ Use k-2 stages of CSAs – Keep result in carry-save redundant form ‰ Final CPA computes actual result 0001 0001 0111 1101 0010 0111 +1101 1011 4-bit CSA 0101_ 0101_ 1011 0101_ 1011 5-bit CSA +0010 01010_ 00011 00011 01010_ + 01010_ 10111 + 00011 10111

18: Datapath Functional Units

X Y Z S C X Y Z S C A B S

CMOS VLSI Design 4th Ed.

19

Multiplication ‰ Example:

1100 : 1210 0101 : 510 1100 0000 1100 0000 00111100 : 6010

multiplicand multiplier partial products product

‰ M x N-bit multiplication – Produce N M-bit partial products – Sum these to produce M+N-bit product

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

20

General Form ‰ Multiplicand: ‰ Multiplier:

N −1 N −1 M −1 ⎛ M −1 ⎞ ⎛ ⎞ P = ⎜ ∑ y j 2 j ⎟ ⎜ ∑ xi 2i ⎟ = ∑ ∑ xi y j 2i + j ⎠ i =0 j =0 ⎝ j =0 ⎠ ⎝ i =0

‰ Product:

p11

Y = (yM-1, yM-2, …, y1, y0) X = (xN-1, xN-2, …, x1, x0)

y5

y4

y3

y2

y1

y0

x5

x4

x3

x2

x1

x0

x0y5

x0y4

x0y3

x0y2

x0y1

x0y0

x1y5

x1y4

x1y3

x1y2

x1y1

x1y0

x2y5

x2y4

x2y3

x2y2

x2y1

x2y0

x3y5

x3y4

x3y3

x3y2

x3y1

x3y0

x4y5

x4y4

x4y3

x4y2

x4y1

x4y0

x5y5

x5y4

x5y3

x5y2

x5y1

x5y0

p10

p9

p8

p7

p6

p5

18: Datapath Functional Units

p4

p3

p2

multiplicand multiplier

partial products

p1

p0

CMOS VLSI Design 4th Ed.

product

21

Dot Diagram ‰ Each dot represents a bit x0

partial products multiplier x x15

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

22

Array Multiplier y3

y2

y1

y0

x0

x1

CSA Array

x2

x3

CPA

p7

p6

p5

p4

p3

p2

p1

p0

A B critical path

Sin A Cin

B

Sin

B

Cin

Cout

= Cout Cout

A

Sout

18: Datapath Functional Units

A

=

Cin Sout

Cout

B Cin

Sout Sout

CMOS VLSI Design 4th Ed.

23

Rectangular Array ‰ Squash array to fit rectangular floorplan y3

y2

y1

y0

x0

p0

x1

p1

x2

p2

x3

p3

p7

18: Datapath Functional Units

p6

p5

p4

CMOS VLSI Design 4th Ed.

24

Fewer Partial Products ‰ Array multiplier requires N partial products ‰ If we looked at groups of r bits, we could form N/r partial products. – Faster and smaller? – Called radix-2r encoding ‰ Ex: r = 2: look at pairs of bits – Form partial products of 0, Y, 2Y, 3Y – First three are easy, but 3Y requires adder /

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

25

Booth Encoding ‰ Instead of 3Y, try –Y, then increment next partial product to add 4Y ‰ Similarly, for 2Y, try –2Y + 4Y in next partial product

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

26

Booth Hardware ‰ Booth encoder generates control lines for each PP – Booth selectors choose PP bits

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

27

Sign Extension ‰ Partial products can be negative – Require sign extension, which is cumbersome – High fanout on most significant bit s s s s s s s

s s s s s s s

s s s s s s

s s s s s s

s s s s s

s s s s s

s s s s

s s s s s s s s s s s s s s s s

s s s s s

PP0 PP1 PP2

s s s

0 x-1 x0

PP3 PP4

multiplier x

s s s s s s s s

PP5 PP6 PP7 PP8

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

x15 0 x16 0 x17

28

Simplified Sign Ext. ‰ Sign bits are either all 0’s or all 1’s – Note that all 0’s is all 1’s + 1 in proper column – Use this to reduce loading on MSB s 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 s 1 1 1 1 1 s 1 1 1 s 1

s s s s s s s s

PP0 PP1 PP2 PP3 PP4 PP5 PP6 PP7 PP8

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

29

Even Simpler Sign Ext. ‰ No need to add all the 1’s in hardware – Precompute the answer! s s s 1 s

s s

1 s s

1 s s

1 s s

1 s s

1 s s

s s

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

PP0 PP1 PP2 PP3 PP4 PP5 PP6 PP7 PP8

30

Advanced Multiplication ‰ Signed vs. unsigned inputs ‰ Higher radix Booth encoding ‰ Array vs. tree CSA networks

18: Datapath Functional Units

CMOS VLSI Design 4th Ed.

31