
A Low-PDP and Low-Area Repeater Using Passive CTLE for On-Chip Interconnects
Ming-Shuan Chen, Mau-Chung Frank Chang, Chih-Kong Ken Yang
University of California, Los Angeles, CA

Fig. 1 Proposed on-chip link architecture, channel cross-sectional view, and channel frequency response (magnitude in dB versus frequency in GHz; BW = 1.02GHz without EQ, BW = 2.54GHz with EQ).

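Fig. 2 Proposed passive-CTLE-based repeater design. Schematic labels: Vin, CC (CMOM and CNMOS), RC, RB, CL, VX, Vbias, INV1, INV2, Istatic, and a shared bias generator with CD; an additional RB/RC branch is shown for DC-unbalanced data. Annotated values: RB 2.7kΩ, RC 2.7kΩ; capacitances 57f and 4.5f. Frequency-response sketches: Vin (pole p1), VX/Vin (gains ADC and AAC, zero z1, pole p2), and VX (gain Atot, with z1 cancelling p1).

$A_{DC} = \dfrac{R_B}{R_C + R_B}$, $A_{AC} = \dfrac{C_C}{C_L + C_C}$, $A_{tot} = A_{DC} \cdot A_{AC}$

$z_1 = \dfrac{1}{R_C C_C}$, $p_2 = \dfrac{z_1}{A_{DC}}$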

Abstract
This paper presents an improved repeater circuit that preserves the advantages of the inverter repeater and achieves lower power, delay, and area by applying proper equalization. Designed and measured in 65nm CMOS technology, the proposed repeater achieves a 44% lower power-delay product (PDP) while occupying 46% lower area.

I. Introduction
The presence of multiple cores, shared on-chip memories, and specialized functional units in large digital integrated systems leads to the use of repeaters for wide data busses that pass data between blocks. To maintain high performance, high-throughput, low-latency, and low-power on-chip links are needed to transmit across highly dissipative wire channels. The low-pass frequency response of the wires disperses the pulse response and induces inter-symbol interference (ISI). The most commonly used approach to handle this problem is to insert inverters as repeaters to split the large RC length into smaller segments with sufficient bandwidth [1].

Many circuits that reduce the number of repeaters and the power have been proposed in the literature [2-4]. Most of these approaches eliminate repeating stages to reduce routing complexity and to achieve low power consumption. Clocked comparators have been proposed at the receiver in order to recover the low-swing, equalized data. These approaches have not been widely adopted since the latency is often limited by the clocking constraint. An added advantage of inverter repeaters is their compatibility with standard-cell design methodology. Also, the single-ended signaling enables higher throughput. This paper presents a repeater circuit with built-in equalization that preserves the advantages of an inverter repeater while achieving 44% lower PDP and 46% lower area.

II. Link Architecture and Circuit Implementation
The proposed on-chip link and the cross-sectional view of the target channel are depicted in Fig. 1. The objective is to transmit 4Gb/s data through single-ended, ground-shielded M4 wires across a 5mm length with a total width of 1um per channel. When using inverter repeaters, 5 stages are required for minimized delay. Without repeaters, the channel introduces 17.2dB of loss at 2GHz.

The proposed design utilizes a passive continuous-time linear equalizer (CTLE) as the receiver front-end. To recover the equalized signal back to full swing, an additional inverter, INV1, is inserted between the CTLE and the wire driver INV2. This inverter adds a modest delay but maintains proper logic polarity. Because it requires no clock, this simple approach directly replaces inverter repeaters while requiring fewer repeating stages, which leads to lower power dissipation. For the 5mm channel, we apply two repeating stages, each driving a 2.5mm segment that has 6.9dB of loss at 2GHz. The channel response is shown in Fig. 1, which demonstrates a bandwidth of 1.0GHz. The equalizer is designed to compensate for the channel loss and extend the bandwidth from 1.0 to 2.5GHz. The added bandwidth further reduces latency.

Fig. 3 (Left) INV1 static power under different pre-emphasis and (Right) power consumption under different activity factor (power in uW versus activity factor; inverter repeater versus EQed repeater).

Fig. 2 shows the design of the passive CTLE, which consists of a resistive divider (formed by RB and RC) that attenuates low-frequency signals and a coupling capacitor CC that passes high-frequency signals. The coupling capacitor CC is implemented by stacking a metal-oxide-metal capacitor (CMOM) on top of an NMOS capacitor (CNMOS) to minimize area. The dc level of the CTLE output is biased by Vbias, which is generated by a shared bias generator consisting of a decoupling capacitor (CD) and feedback inverters that consume little static current (30uW/link). For non-dc-balanced data, an additional inverter followed by resistors RB and RC is required for biasing.

The frequency responses of the unequalized channel (Vin), the passive CTLE (VX/Vin), and the overall channel (VX) are plotted in Fig. 2. The passive CTLE inserts a zero, z1, to cancel out the channel-induced dominant pole, p1. The bandwidth is extended to the frequency of the CTLE-induced second pole, p2, which is the product of z1 and the inverse of the dc gain. Ideally, the bandwidth can be further improved by choosing a low dc gain.

One of the key advantages of this design is that it leverages INV1 as a single-ended receiving amplifier. By reducing the voltage swing by only 6.6dB

Fig. 4 The architecture of prototype IC with 16 parallel channels

Fig. 5 Measured eye diagrams of unequalized (VunEQ, before EQ) and equalized (VEQ, after EQ) outputs after a 2.5mm channel (both at 30mV/div, 100ps/div)

Fig. 6 (Left) Measured eye diagrams of input (VY) and output (VZ) and, (Right) measured BERT bathtub curve (bit error rate from 10^-4 to 10^-14 versus sampling clock phase in UI; annotation: 0.48UI)

Fig. 7 (Left) Simulated and measured PDP under different voltage swings and, (Right) measured PDP under different supply voltages

III. Measurement Results
To verify the proposed circuits, a prototype that transmits 16 x 4Gb/s data over 16 parallel 5mm wires is implemented in 65nm CMOS technology. The transceiver structure is shown in Fig. 4. The required input sequence for testing is generated by a 16b, 2^7-1 PRBS generator. To verify the correctness of the transceived data, a bank of quarter-rate samplers operating at 1GHz is incorporated, which retimes the data and sends it to the 64b, 1GHz parallel PRBS checker for BER measurement. Additional channels are included to measure the delay, power, and signal waveform. A channel using inverter repeaters is included for comparison.

Fig. 5 illustrates the eye diagrams of the unequalized (VunEQ) and equalized (VEQ) signals after a 2.5mm channel. Fig. 6 shows the voltage waveforms of the input (VY) and output (VZ) of a channel and the BER bathtub curve. The higher peak-to-peak jitter is due to residual ISI and can be completely removed by either reducing the segment length by 15% or lowering the data rate by 28%. Fig. 7 shows the measured PDP of the proposed link under different voltage swings (VEQ) and the comparison between the two approaches under different supply voltages. The measured power and delay are 968uW and 286ps, respectively. When compared to using inverter repeaters, the power, delay, and PDP are 29%, 22%, and 44% lower, respectively. Fig. 8 shows the die micrograph and the layout and compares the energy efficiency per length and throughput with other state-of-the-art papers. The performance of this work and that of other state-of-the-art papers is summarized in Table I.
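The figures of merit quoted in this section can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the inputs are the measured power, delay, data rate, length, and percentage reductions stated in the text.

```python
# Link-level figures of merit from the measured numbers quoted in Section III.
# Helper names are hypothetical; only the numeric inputs come from the text.

def pdp_fj(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in fJ (1 uW * 1 ps = 1e-18 J = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3


def energy_per_bit_per_mm(power_uw: float, rate_gbps: float, length_mm: float) -> float:
    """Energy efficiency per length in fJ/b/mm (1 uW / 1 Gb/s = 1 fJ/b)."""
    return power_uw / rate_gbps / length_mm


def combined_reduction(power_reduction: float, delay_reduction: float) -> float:
    """Fractional PDP reduction implied by fractional power and delay reductions."""
    return 1.0 - (1.0 - power_reduction) * (1.0 - delay_reduction)


# Measured values for the proposed link: 968uW, 286ps, 4Gb/s over 5mm.
proposed_pdp = pdp_fj(power_uw=968.0, delay_ps=286.0)
proposed_eff = energy_per_bit_per_mm(power_uw=968.0, rate_gbps=4.0, length_mm=5.0)

# Reported reductions versus the inverter-repeater link: 29% power, 22% delay.
pdp_saving = combined_reduction(power_reduction=0.29, delay_reduction=0.22)

print(f"PDP: {proposed_pdp:.1f} fJ")
print(f"Energy efficiency per length: {proposed_eff:.1f} fJ/b/mm")
print(f"Implied PDP reduction: {pdp_saving:.1%}")
```

The per-length efficiency comes out at 48.4fJ/b/mm, the value listed for the proposed repeater in Table I. Because the remaining fractions multiply (0.71 x 0.78), the 29% and 22% reductions combine to a PDP reduction of roughly 44%.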

(Fig. 4 block labels: 16b PRBS Gen.; 1-GHz Quarter-Rate Samplers; 64b PRBS Checker + BERT; 4 x 16b = 64Gb/s.)

(~0.45X), the structure dissipates little static current. Fig. 3 (left) illustrates the power consumption of INV1 under different pre-emphasis, showing the increase in power with lower voltage swing (lower ADC). This trade-off limits the bandwidth improvement. We choose ADC to be roughly -6dB to cover sufficient PVT variation. Fig. 3 (right) compares the power consumption of the inverter repeater and the proposed repeater under different activity factors (α). The proposed repeater consumes less power when α is higher than 0.054.

This passive CTLE has several advantages over other types of equalizers. First, since the channel response can be accurately extracted and modeled, a single, accurately placed zero can equalize the channel as effectively as an FFE or DFE. Second, the CTLE in combination with the receiver amplifier consumes little power (47uW) for equalization. Third, the design is robust to PVT variation since an incomplete pole-zero cancellation only results in slightly higher ISI or peaking. Simulation shows that varying the z1 frequency by ±30% impacts the eye opening by less than 7%. Finally, Fig. 8 shows that while the CTLE repeater occupies 15% larger area per stage compared to inverter repeaters, it requires only 2 stages, which leads to 46% lower area.
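The CTLE gains and corner frequencies can be estimated from the expressions annotated in Fig. 2, taken here as ADC = RB/(RC+RB), AAC = CC/(CL+CC), Atot = ADC*AAC, z1 = 1/(RC*CC), and p2 = z1/ADC. The sketch below rests on several assumptions. Pairing the 57f and 4.5f annotations with CC and CL respectively is a guess from the label order. The lumped network (RC in parallel with CC in series, RB in parallel with CL in shunt) is inferred from the gain expressions. All function names are hypothetical.

```python
import math

# Component values annotated in Fig. 2. ASSUMPTION: 57f is CC and 4.5f is CL;
# the extracted figure labels do not make this pairing explicit.
R_B = 2.7e3    # ohm
R_C = 2.7e3    # ohm
C_C = 57e-15   # farad
C_L = 4.5e-15  # farad


def ctle_parameters(r_b, r_c, c_c, c_l):
    """Gains and corner frequencies from the expressions listed in Fig. 2."""
    a_dc = r_b / (r_c + r_b)                    # low-frequency (resistive) gain
    a_ac = c_c / (c_l + c_c)                    # high-frequency (capacitive) gain
    z1_hz = 1.0 / (2.0 * math.pi * r_c * c_c)   # z1 = 1/(RC*CC), converted to Hz
    return {
        "a_dc": a_dc,
        "a_ac": a_ac,
        "a_tot": a_dc * a_ac,
        "z1_hz": z1_hz,
        "p2_hz": z1_hz / a_dc,                  # p2 = z1/ADC as annotated
    }


def ctle_gain(f_hz, r_b, r_c, c_c, c_l):
    """|VX/Vin| of the ASSUMED network: (RC || CC) in series, (RB || CL) in shunt."""
    s = 2j * math.pi * f_hz
    z_series = r_c / (1.0 + s * r_c * c_c)
    z_shunt = r_b / (1.0 + s * r_b * c_l)
    return abs(z_shunt / (z_series + z_shunt))


def to_db(ratio):
    """Voltage ratio expressed in dB."""
    return 20.0 * math.log10(ratio)


params = ctle_parameters(R_B, R_C, C_C, C_L)
print(f"ADC  = {to_db(params['a_dc']):.2f} dB")
print(f"Atot = {to_db(params['a_tot']):.2f} dB")
print(f"z1   = {params['z1_hz'] / 1e9:.2f} GHz")
print(f"p2   = {params['p2_hz'] / 1e9:.2f} GHz")
```

Under these assumptions, ADC is about -6dB, Atot is close to the quoted swing reduction of roughly 0.45X, and z1 falls near the 1GHz channel bandwidth it is meant to cancel. Evaluating `ctle_gain` at very low and very high frequency returns ADC and AAC respectively, which is a quick consistency check on the assumed topology.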

Fig. 8 (Left) Die micrograph, layout of the proposed equalizer and, (Right) comparison of energy efficiency per length (fJ/b/mm) and throughput (Gb/s/um) of state-of-the-art on-chip links. Area annotations: proposed repeater, 57.2 x 2 = 114.4 um2; inverter repeater, 49.5 x 5 = 247.2 um2.

Table I Performance Summary

                                          ISSCC’09 [2]   ISSCC’12 [3]   ISSCC’13 [4]   This work             This work
                                                                                       (Inverter Repeater)   (Proposed Repeater)
Technology                                90nm           65nm           65nm           65nm                  65nm
Channel Data Rate                         4Gb/s          10Gb/s         3Gb/s          4Gb/s                 4Gb/s
Length                                    10mm           6mm            10mm           5mm                   5mm
Throughput (Gb/s/um)                      2              2.56           0.75           4                     4
Energy Efficiency per Length (fJ/b/mm)    35.6           174            9.5            67.6                  48.4
Delay                                     250ps          100ps          333ps          365.4ps               285.8ps
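Table I normalizes energy by link length, so the energy per bit of each link is the listed efficiency multiplied by its length. The sketch below is only an illustration of that conversion. The function name and the tuple layout are hypothetical, and the column labels follow the venue years of the reference list.

```python
# Rows of Table I used here: (label, energy efficiency in fJ/b/mm, length in mm).
TABLE_I = [
    ("ISSCC'09 [2]", 35.6, 10.0),
    ("ISSCC'12 [3]", 174.0, 6.0),
    ("ISSCC'13 [4]", 9.5, 10.0),
    ("This work, inverter repeater", 67.6, 5.0),
    ("This work, proposed repeater", 48.4, 5.0),
]


def energy_per_bit_fj(efficiency_fj_per_b_per_mm: float, length_mm: float) -> float:
    """Energy per transmitted bit over the full link length, in fJ/b."""
    return efficiency_fj_per_b_per_mm * length_mm


for label, efficiency, length in TABLE_I:
    print(f"{label}: {energy_per_bit_fj(efficiency, length):.0f} fJ/b over {length:.0f} mm")
```

For [2] and [4] the products are 356fJ/b and 95fJ/b, which agree with the figures in the titles of those references. For this work the same conversion gives 242fJ/b for the proposed repeater over the 5mm link.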

References
[1] R. Ho et al., "The Future of Wires," Proceedings of the IEEE, 2001.
[2] B. Kim et al., "A 4Gb/s/ch 356fJ/b 10mm Equalized On-Chip Interconnect with Nonlinear Charge-Injecting Transmit Filter and Transimpedance Receiver in 90nm CMOS," ISSCC, 2009.
[3] D. Walter et al., "A Source-Synchronous 90Gb/s Capacitively Driven Serial On-Chip Link Over 6mm in 65nm CMOS," ISSCC, 2012.
[4] S.-K. Lee et al., "A 95fJ/b Current-Mode Transceiver for 10mm On-Chip Interconnect," ISSCC, 2013.