18: Datapath Functional Units. 2. CMOS VLSI Design. CMOS VLSI Design 4th Ed
. Outline. ❑ Comparators. ❑ Shifters. ❑ Multi-input Adders. ❑ Multipliers ...
Lecture 18: Datapath Functional Units
Outline
Comparators Shifters Multi-input Adders Multipliers
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
2
Comparators
0’s detector: A = 00…000 1’s detector: A = 11…111 Equality comparator: A = B Magnitude comparator: A < B
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
3
1’s & 0’s Detectors 1’s detector: N-input AND gate 0’s detector: NOTs + 1’s detector (N-input NOR) A7 A6 A5 A4 A3 A2
A3 A2 allones
allzeros
A1 A0
A1 A0 A7 A6 A5 A4 A3 A2
allones
A1 A0
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
4
Equality Comparator Check if each bit is equal (XNOR, aka equality gate) 1’s detect on bitwise equality B[3] A[3] B[2] A[2]
A=B
B[1] A[1] B[0] A[0]
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
5
Magnitude Comparator Compute B – A and look at sign B – A = B + ~A + 1 For unsigned numbers, carry out is sign bit A≤ B C B3
A≥ B
N
A3 B2 A2 B1
Z
A=B
A1 B0 A0
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
6
Signed vs. Unsigned For signed numbers, comparison is harder – C: carry out – – – –
Z: zero (all bits of A – B are 0) N: negative (MSB of result) V: overflow (inputs had different signs, output sign ≠ B) S: N xor V (sign of result)
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
7
Shifters Logical Shift: – Shifts number left or right and fills with 0’s • 1011 LSR 1 = 0101 1011 LSL1 = 0110 Arithmetic Shift: – Shifts number left or right. Rt shift sign extends • 1011 ASR1 = 1101 1011 ASL1 = 0110 Rotate: – Shifts number left or right and fills with lost bits • 1011 ROR1 = 1101 1011 ROL1 = 0111
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
8
Funnel Shifter A funnel shifter can do all six types of shifts Selects N-bit field Y from 2N–1-bit input – Shift by k bits (0 ≤ k < N) – Logically involves N N:1 multiplexers
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
9
Funnel Source Generator Rotate Right Logical Right Arithmetic Right Rotate Left Logical/Arithmetic Left
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
10
Array Funnel Shifter N N-input multiplexers – Use 1-of-N hot select signals for shift amount – nMOS pass transistor design (Vt drops!) k[1:0]
left
Inverters & Decoder s3
s2
s1
s0
Y3 Y2
Z6
Y1
Z5
Y0 Z4
Z3
Z2
Z1
18: Datapath Functional Units
Z0
CMOS VLSI Design 4th Ed.
11
Logarithmic Funnel Shifter Log N stages of 2-input muxes – No select decoding needed
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
12
32-bit Logarithmic Funnel Wider multiplexers reduce delay and power Operands > 32 bits introduce datapath irregularity
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
13
Barrel Shifter Barrel shifters perform right rotations using wraparound wires. Left rotations are right rotations by N – k = k + 1 bits. Shifts are rotations with the end bits masked off.
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
14
Logarithmic Barrel Shifter
Right shift only
Right/Left shift 18: Datapath Functional Units
Right/Left Shift & Rotate CMOS VLSI Design 4th Ed.
15
32-bit Logarithmic Barrel Datapath never wider than 32 bits First stage preshifts by 1 to handle left shifts
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
16
Multi-input Adders Suppose we want to add k N-bit words – Ex: 0001 + 0111 + 1101 + 0010 = 10111 Straightforward solution: k-1 N-input CPAs – Large and slow 0001 0111 1101 0010 + 1000 + 10101 + 10111
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
17
Carry Save Addition A full adder sums 3 inputs and produces 2 outputs – Carry output has twice weight of sum output N full adders in parallel are called carry save adder – Produce N sums and N carry outs X4
C4 S4
Y4 Z4 X3
Y3 Z3 X2
C3 S3
C2 S2
Y2 Z2 X1
Y1 Z1
C1 S1
XN...1 YN...1 ZN...1 n-bit CSA CN...1
18: Datapath Functional Units
SN...1
CMOS VLSI Design 4th Ed.
18
CSA Application Use k-2 stages of CSAs – Keep result in carry-save redundant form Final CPA computes actual result 0001 0001 0111 1101 0010 0111 +1101 1011 4-bit CSA 0101_ 0101_ 1011 0101_ 1011 5-bit CSA +0010 01010_ 00011 00011 01010_ + 01010_ 10111 + 00011 10111
18: Datapath Functional Units
X Y Z S C X Y Z S C A B S
CMOS VLSI Design 4th Ed.
19
Multiplication Example:
1100 : 1210 0101 : 510 1100 0000 1100 0000 00111100 : 6010
multiplicand multiplier partial products product
M x N-bit multiplication – Produce N M-bit partial products – Sum these to produce M+N-bit product
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
20
General Form Multiplicand: Multiplier:
N −1 N −1 M −1 ⎛ M −1 ⎞ ⎛ ⎞ P = ⎜ ∑ y j 2 j ⎟ ⎜ ∑ xi 2i ⎟ = ∑ ∑ xi y j 2i + j ⎠ i =0 j =0 ⎝ j =0 ⎠ ⎝ i =0
Product:
p11
Y = (yM-1, yM-2, …, y1, y0) X = (xN-1, xN-2, …, x1, x0)
y5
y4
y3
y2
y1
y0
x5
x4
x3
x2
x1
x0
x0y5
x0y4
x0y3
x0y2
x0y1
x0y0
x1y5
x1y4
x1y3
x1y2
x1y1
x1y0
x2y5
x2y4
x2y3
x2y2
x2y1
x2y0
x3y5
x3y4
x3y3
x3y2
x3y1
x3y0
x4y5
x4y4
x4y3
x4y2
x4y1
x4y0
x5y5
x5y4
x5y3
x5y2
x5y1
x5y0
p10
p9
p8
p7
p6
p5
18: Datapath Functional Units
p4
p3
p2
multiplicand multiplier
partial products
p1
p0
CMOS VLSI Design 4th Ed.
product
21
Dot Diagram Each dot represents a bit x0
partial products multiplier x x15
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
22
Array Multiplier y3
y2
y1
y0
x0
x1
CSA Array
x2
x3
CPA
p7
p6
p5
p4
p3
p2
p1
p0
A B critical path
Sin A Cin
B
Sin
B
Cin
Cout
= Cout Cout
A
Sout
18: Datapath Functional Units
A
=
Cin Sout
Cout
B Cin
Sout Sout
CMOS VLSI Design 4th Ed.
23
Rectangular Array Squash array to fit rectangular floorplan y3
y2
y1
y0
x0
p0
x1
p1
x2
p2
x3
p3
p7
18: Datapath Functional Units
p6
p5
p4
CMOS VLSI Design 4th Ed.
24
Fewer Partial Products Array multiplier requires N partial products If we looked at groups of r bits, we could form N/r partial products. – Faster and smaller? – Called radix-2r encoding Ex: r = 2: look at pairs of bits – Form partial products of 0, Y, 2Y, 3Y – First three are easy, but 3Y requires adder /
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
25
Booth Encoding Instead of 3Y, try –Y, then increment next partial product to add 4Y Similarly, for 2Y, try –2Y + 4Y in next partial product
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
26
Booth Hardware Booth encoder generates control lines for each PP – Booth selectors choose PP bits
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
27
Sign Extension Partial products can be negative – Require sign extension, which is cumbersome – High fanout on most significant bit s s s s s s s
s s s s s s s
s s s s s s
s s s s s s
s s s s s
s s s s s
s s s s
s s s s s s s s s s s s s s s s
s s s s s
PP0 PP1 PP2
s s s
0 x-1 x0
PP3 PP4
multiplier x
s s s s s s s s
PP5 PP6 PP7 PP8
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
x15 0 x16 0 x17
28
Simplified Sign Ext. Sign bits are either all 0’s or all 1’s – Note that all 0’s is all 1’s + 1 in proper column – Use this to reduce loading on MSB s 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 1 1 s 1 1 1 1 1 1 1 s 1 1 1 1 1 s 1 1 1 s 1
s s s s s s s s
PP0 PP1 PP2 PP3 PP4 PP5 PP6 PP7 PP8
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
29
Even Simpler Sign Ext. No need to add all the 1’s in hardware – Precompute the answer! s s s 1 s
s s
1 s s
1 s s
1 s s
1 s s
1 s s
s s
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
PP0 PP1 PP2 PP3 PP4 PP5 PP6 PP7 PP8
30
Advanced Multiplication Signed vs. unsigned inputs Higher radix Booth encoding Array vs. tree CSA networks
18: Datapath Functional Units
CMOS VLSI Design 4th Ed.
31