A CMOS 3.2 Gb/s Serial Link Transceiver, Using PWM and PAM Scheme

Khayrollah Hadidi
Department of Electrical Engineering, Urmia University, Urmia 53154, Iran
Kh.hadidi@urmia.ac.ir

Nooshin Ghaderi
Department of Electrical Engineering, Urmia University, Urmia 53154, Iran
n.ghaderi@urmia.ac.ir

Abstract—In this paper, a 3.2 Gb/s serial link transceiver in a 0.35 µm CMOS technology is presented. The link uses a new multilevel pulse-width and pulse-amplitude modulation technique. Because PWM is used, the clock is embedded in the encoded signal, so a conventional PLL in the receiver can easily extract the clock from the incoming data stream. The multilevel PAM scheme also reduces the symbol rate compared with a conventional 2-PAM system. In the proposed architecture the minimum pulse width is 4Tb/3, where Tb is the bit period. Because this is wider than the pulse width Tb of conventional NRZ signaling, ISI is reduced. The multiphase output of a three-stage ring oscillator VCO in the PLL is used to modulate and demodulate the signal. A new charge pump circuit is also introduced to decrease the mismatch between the up and down paths. The jitter of the recovered clock is 13.9 ps at 800 MHz. The transmitter and receiver power consumption is 200 mW and 35 mW, respectively.

I. INTRODUCTION
With the rapid progress in semiconductor technologies, demand for bandwidth in serial links has been increasing. Increasing the bus bandwidth, however, increases the pin count and enlarges the chip area. The concept of transferring multiple bits in each symbol through modulation techniques has therefore been proposed. One such technique is pulse amplitude modulation (PAM), described in [1]; transmitters using PAM signaling have reached data rates of several gigabits per second. Another way of merging analog and digital information in one data stream is pulse modulation, including pulse-width modulation (PWM), pulse-phase modulation (PPM) and pulse-density modulation (PDM). PPM was described in [2], and a PWM scheme was presented in [8]. All of these combine the data and clock channels in a single channel to reduce the pin count.
The binary data are encoded into pulses with different widths while ensuring a periodic rising edge during each period. Thus, the clock signal can be easily recovered in the receiver using a simple phase-locked loop (PLL). In [3] and [4], the data and clock channels were merged into a single channel using both PWM and PAM (PWAM) schemes. In this paper, we propose a new PWAM method that uses different voltage levels in different time slots. The first bit is carried by the PWM modulator and the other three bits by the PAM modulator. In the proposed architecture, three time slots are adequate for 4 bits, so the minimum pulse width is 4Tb/3, compared with 4Tb/5 in [3], [4]. The wider pulse reduces ISI. Furthermore, because there are fewer time slots, fewer clock phases are needed and the ring oscillator can be built with fewer stages, so the operating frequency can increase while the power dissipation decreases. The voltage in each slot is chosen between two values according to the PWM modulator output, which simplifies the design of the DAC and ADC.

This paper is organized as follows. The new 4-bit PWAM signaling concept is presented in Section II. The PLL architecture is described in Section III. Sections IV and V describe the implementation of the transmitter and receiver, respectively. Conclusions are given in Section VI.

978-1-4244-3896-9/09/$25.00 ©2009 IEEE

II. PROPOSED PWAM SIGNALING STRUCTURE
Fig. 1 shows the proposed PWAM transmitter and receiver scheme. In Fig. 1(a), a 4-bit PWAM transmitter transmits the merged data and clock across a channel, and in Fig. 1(b) the receiver recovers the data and clock. Fig. 2 shows the proposed PWAM signaling scheme, which combines a two-level PWM for the first bit with a four-level PAM format for the other three bits. The PWAM-encoded signal not only achieves a high data rate thanks to the PAM format, but also reduces the pin count and allows easy clock recovery thanks to the embedded PWM function.
Fig. 1. Proposed PWAM transceiver: (a) transmitter (PLL generating Ck1 to Ck6 from Tx_Ck, PWM modulator, PAM modulator), (b) receiver (PLL, PWAM demodulator).
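As a quick check of the timing figures quoted in the Introduction, the following sketch computes the symbol rate, the bit period Tb and the slot width for the proposed three-slot scheme and for a five-slot scheme. The data rate and the number of bits per symbol are taken from the paper; the five-slot count for [3], [4] is an assumption inferred from the quoted 4Tb/5 pulse width, and all variable names are illustrative.

```python
from fractions import Fraction

DATA_RATE_BPS = 3_200_000_000  # 3.2 Gb/s, from the abstract
BITS_PER_SYMBOL = 4            # one PWM bit plus three PAM bits
PS_PER_SECOND = 10**12

def slot_width_in_tb(bits_per_symbol: int, slots_per_symbol: int) -> Fraction:
    """Width of one time slot, in units of the bit period Tb."""
    return Fraction(bits_per_symbol, slots_per_symbol)

symbol_rate = DATA_RATE_BPS // BITS_PER_SYMBOL   # symbols per second
tb_ps = Fraction(PS_PER_SECOND, DATA_RATE_BPS)   # bit period in picoseconds

proposed = slot_width_in_tb(BITS_PER_SYMBOL, 3)  # three slots (this work)
earlier = slot_width_in_tb(BITS_PER_SYMBOL, 5)   # ASSUMPTION: five slots in [3], [4]

print(f"symbol rate      : {symbol_rate / 1e6:.0f} MS/s")
print(f"bit period Tb    : {float(tb_ps):.1f} ps")
print(f"proposed slot    : {proposed} Tb = {float(proposed * tb_ps):.1f} ps")
print(f"slot in [3], [4] : {earlier} Tb = {float(earlier * tb_ps):.1f} ps")
```

Under these assumptions the slot is about 417 ps wide in the proposed scheme, against 250 ps for a five-slot scheme and 312.5 ps for plain NRZ at the same data rate.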
The PWAM transmitter consists of a 1-bit PWM modulator and a 3-bit PAM modulator. The PWM-encoded signal has pulses with two different widths: the pulse width is quantized into two levels to represent Tx_Bit3. The PAM modulator then converts Tx_Bit2, Tx_Bit1 and Tx_Bit0 into the PWAM-encoded signal. In the proposed PWAM, different time slots can carry different voltage levels. The level in each slot is chosen according to TX_PWM, the output of the PWM modulator block, and to Tx_Bit2, Tx_Bit1 and Tx_Bit0. If TX_PWM is one during a given slot, the signal amplitude is VR3 or VR4, depending on the data bit assigned to that slot; if TX_PWM is zero during that slot, the amplitude is VR1 or VR2. Tx_Bit2 determines the output amplitude in time slot 1, Tx_Bit1 in time slot 2 and Tx_Bit0 in time slot 3.
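The level-selection rule described above can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, the TX_PWM pattern per slot follows the duty cycle given in Section IV.A (high for one slot when Tx_Bit3 = 0, for two slots when Tx_Bit3 = 1), and the choice of which data-bit value selects which level inside each pair is an assumption, since the text only states that the pair is chosen by TX_PWM.

```python
from itertools import product

def tx_pwm_per_slot(tx_bit3: int) -> tuple:
    """TX_PWM in time slots 1, 2, 3: high for one slot if Tx_Bit3 = 0, two if 1."""
    return (1, tx_bit3, 0)

def slot_level(tx_pwm: int, data_bit: int) -> str:
    """Pick one of the two levels in the pair selected by TX_PWM.

    ASSUMPTION: a data bit of 1 selects VR4 (TX_PWM = 1) or VR1 (TX_PWM = 0).
    """
    if tx_pwm:
        return "VR4" if data_bit else "VR3"
    return "VR1" if data_bit else "VR2"

def encode(tx_bit3: int, tx_bit2: int, tx_bit1: int, tx_bit0: int) -> tuple:
    """Return the level label used in each of the three time slots."""
    pwm = tx_pwm_per_slot(tx_bit3)
    data = (tx_bit2, tx_bit1, tx_bit0)  # slot 1, slot 2, slot 3
    return tuple(slot_level(p, d) for p, d in zip(pwm, data))

for bits in product((0, 1), repeat=4):
    print("".join(map(str, bits)), encode(*bits))
```

With this mapping, the 16 input codes produce 16 distinct three-slot patterns, which is what makes the symbol decodable at the receiver.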
Fig. 2. Proposed 4-bit PWAM waveform (voltage levels VR1 to VR4 in time slots 1 to 3, for input codes 0000 to 1111).

III. PLL DESIGN
The PLL synchronizes the ck1 output of the VCO with the incoming data. As shown in Fig. 3, our design is a conventional charge-pump PLL consisting of a phase/frequency detector, a charge pump, a low-pass filter (LPF) and a three-stage differential VCO [5], [6] that generates six clock phases, which are used to modulate and demodulate the data in the following blocks. The tuning range of the VCO is 500 MHz to 1.4 GHz. In our modulation, binary data is always transmitted in RZ format and therefore lacks the properties of NRZ data that make clock and data recovery difficult, so the sequential phase/frequency detector [7] is suitable. A common nonideality in charge pumps is the mismatch between the currents of the PMOS and NMOS transistors that implement the positive and negative current pumps. The proposed differential charge pump circuit, shown in Fig. 4, solves this problem: it uses the same transistor for the up and down signals and therefore has less mismatch than previous topologies. Simulation results show that the acquisition time is 60 ns and the PLL jitter is 1 ps at 800 MHz.

Fig. 3. PLL scheme (phase detector, charge pump, LPF and three-stage VCO with outputs Ck1 to Ck6).

Fig. 4. New design for the charge pump circuit.

IV. TRANSMITTER DESIGN
The transmitter block is shown in Fig. 1(a). It contains two main building blocks: a PWM modulator and a PAM modulator. A PLL provides six clock phases that are used to produce the output signal. Ck1 is synchronized with Tx_Ck.
A. PWM Modulator
The PWM signal is produced according to the value of Tx_Bit3. The output duty cycle is (n+1)/3, where n = 0, 1 is the value of Tx_Bit3. As can be seen in Fig. 5, the TX_PWM signal must go high at the rising edge of ck1 and return to zero at a time that depends on Tx_Bit3: at the rising edge of ck3 if Tx_Bit3 = 0, or at the rising edge of ck5 if Tx_Bit3 = 1. Fig. 6 shows the implementation of the PWM modulator. TX_PWM goes high when both ck5 and ck1 are high (the rising edge of ck1) and returns to zero at the rising edge of ck3 or ck5, according to the value of Tx_Bit3.
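The duty-cycle expression can be checked from the clock-phase edges alone. The sketch below assumes ideal, equally spaced phases with the rising edge of ck1 at the start of the period, which is the usual idealization for a three-stage differential ring oscillator; the helper names are illustrative and not taken from the paper.

```python
from fractions import Fraction

N_PHASES = 6                                         # ck1..ck6 from the ring VCO
PERIOD_PS = Fraction(10**12, 800_000_000)            # 800 MHz clock -> 1250 ps

def rising_edge(k: int) -> Fraction:
    """Rising edge of ck<k> as a fraction of the period.

    ASSUMPTION: ideal, equally spaced phases with ck1 rising at 0.
    """
    return Fraction(k - 1, N_PHASES)

def tx_pwm_duty(tx_bit3: int) -> Fraction:
    """TX_PWM rises with ck1 and falls with ck3 (bit = 0) or ck5 (bit = 1)."""
    fall = rising_edge(5 if tx_bit3 else 3)
    return fall - rising_edge(1)

def tx_pwm(t: Fraction, tx_bit3: int) -> int:
    """TX_PWM value at time t, with t expressed in clock periods."""
    return int(t % 1 < tx_pwm_duty(tx_bit3))

for n in (0, 1):
    duty = tx_pwm_duty(n)
    print(f"Tx_Bit3={n}: duty={duty}, pulse width={float(duty * PERIOD_PS):.1f} ps")
```

Under this idealization the duty cycle comes out as 1/3 for Tx_Bit3 = 0 and 2/3 for Tx_Bit3 = 1, matching (n+1)/3.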
Fig. 5. TX_PWM signal with respect to the six clock phases.

Fig. 6. PWM modulator.

B. Interface
The interface circuit used in this transceiver is composed of a current-mode open-drain transmitter with on-chip termination resistors tied to Vdd [8], followed by a level shifter, as shown in Fig. 7. A shielded twisted pair (STP) is used for differential signaling. The current source is set to 4 mA and the on-chip termination resistors R are 50 Ω, so the maximum drop across a termination resistor is 4 mA × 50 Ω = 0.2 V and the differential voltage levels range from Vdd to Vdd − 0.2 V. The received differential signal is first level shifted by the source follower circuit [9].

Fig. 7. Interface circuit (R1 = R2 = 50 Ω termination to Vdd, 4 mA current source, level shifter).

C. PAM Block
In the previous section, the proper pulse duration (TX_PWM) was produced according to the value of Tx_Bit3. The next step is to produce the proper magnitude in each of the three time slots according to the values of Tx_Bit2, Tx_Bit1 and Tx_Bit0. First, one Tx_Ck period must be divided into three sections, or time slots. These time slots can be obtained from the six clock phases of the ring oscillator (Fig. 5):

\[
\begin{aligned}
\text{Area1} &= \text{Ck1} \,\&\, \text{Ck6} \\
\text{Area2} &= \text{Ck3} \,\&\, \text{Ck2} \\
\text{Area3} &= \text{Ck5} \,\&\, \text{Ck4}
\end{aligned}
\tag{1}
\]

The next step is to produce three pulses (Pulse1, Pulse2 and Pulse3) in the three time slots (Area1, Area2 and Area3) according to the values of the remaining three input bits (Tx_Bit2, Tx_Bit1 and Tx_Bit0). These three pulses must be produced as follows:

\[
\begin{aligned}
\text{Pulse1} &= 1 \ \text{if}\ \text{Area1} = 1 \,\&\, \text{Tx\_Bit2} = 1 \\
\text{Pulse2} &= 1 \ \text{if}\ \text{Area2} = 1 \,\&\, \text{Tx\_Bit1} = 1 \\
\text{Pulse3} &= 1 \ \text{if}\ \text{Area3} = 1 \,\&\, \text{Tx\_Bit0} = 1
\end{aligned}
\tag{2}
\]

The circuit implementation of Pulse1 is shown in Fig. 8. Since Pulse1 = 1 if Area1 = 1 and Tx_Bit2 = 1, it follows that Pulse1 = 1 if (Ck1 & Ck6) = 1 and Tx_Bit2 = 1.

Fig. 8. Pulse1 generator circuit.

Pulse1, Pulse2 and Pulse3 are used as control signals for the current sources, as shown in Fig. 9. The output voltage is given by:

\[
\begin{aligned}
\text{time slot 1:}\quad & \text{TxPWM} = 1 \rightarrow V_{(\mathrm{out},\,\overline{\mathrm{out}})} = R_1\,(I + \text{Tx\_Bit2} \times 2I) \\
\text{time slot 2:}\quad & \begin{cases}
\text{if TxPWM} = 1 \rightarrow V_{(\mathrm{out},\,\overline{\mathrm{out}})} = R_1\,(I + \text{Tx\_Bit1} \times 2I) \\
\text{if TxPWM} = 0 \rightarrow V_{(\mathrm{out},\,\overline{\mathrm{out}})} = -R_1\,(I + \text{Tx\_Bit1} \times 2I)
\end{cases} \\
\text{time slot 3:}\quad & \text{TxPWM} = 0 \rightarrow V_{(\mathrm{out},\,\overline{\mathrm{out}})} = -R_1\,(I + \text{Tx\_Bit0} \times 2I)
\end{aligned}
\]

Fig. 9. PAM block.

The simulated output waveform is shown in Fig. 10.
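The slot logic of (1) and (2) and the resulting output level can be illustrated with a behavioral model. This is a sketch under stated assumptions, not the circuit itself: the clock phases are ideal 50% duty-cycle square waves spaced 60 degrees apart, the period is sampled at one-degree resolution, the output is expressed in units of R1·I, and TX_PWM is rebuilt from the areas using the duty cycle of Section IV.A. All function names are hypothetical.

```python
def ck(k: int, step: int) -> int:
    """ck<k> at a sample index (degrees): high for 180 degrees, delayed by (k-1)*60.

    ASSUMPTION: ideal, equally spaced phases with 50% duty cycle.
    """
    return int((step - (k - 1) * 60) % 360 < 180)

def areas(step: int) -> tuple:
    """Equation (1): the three time-slot indicators."""
    return (ck(1, step) & ck(6, step),
            ck(3, step) & ck(2, step),
            ck(5, step) & ck(4, step))

def pulses(step: int, tx_bit2: int, tx_bit1: int, tx_bit0: int) -> tuple:
    """Equation (2): each pulse is its area gated by the matching data bit."""
    a1, a2, a3 = areas(step)
    return (a1 & tx_bit2, a2 & tx_bit1, a3 & tx_bit0)

def v_out(step: int, tx_bit3: int, tx_bit2: int, tx_bit1: int, tx_bit0: int) -> int:
    """Differential output in units of R1*I, following the output-voltage equations."""
    a1, a2, _ = areas(step)
    tx_pwm = a1 | (a2 & tx_bit3)          # high for 1/3 or 2/3 of the period
    pulse = max(pulses(step, tx_bit2, tx_bit1, tx_bit0))  # at most one is active
    magnitude = 1 + 2 * pulse             # I or 3I through R1
    return magnitude if tx_pwm else -magnitude

# Sample the middle of each slot for one example code (Tx_Bit3..Tx_Bit0 = 0101).
print([v_out(step, 0, 1, 0, 1) for step in (60, 180, 300)])
```

In this model exactly one area is active at any instant, so the three slots tile the clock period, and the four possible output values are ±1 and ±3 in units of R1·I.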
Fig. 10. Simulation results for the PWAM output.

V. RECEIVER DESIGN
A. Preamplifier Stage
The received signal must be amplified before any further processing. The amplifier is a capacitively degenerated differential pair with one zero and two poles, which can improve the linearity of the stage [11].

B. PLL Block
In the PLL block, the rising edge of ck1 from the VCO is synchronized with the rising edge of the data, so the data in each time slot can be sampled with three clock phases (ck2, ck4 and ck6).

C. PWAM Demodulator
The block diagram of the proposed PWAM demodulator is shown in Fig. 11. The data sampled at ck2, D(ck2), lies in the first time slot and is always positive, so if D+(ck2) > Vref+ and D−(ck2) < Vref− then Tx_Bit2 = 1, else Tx_Bit2 = 0. D(ck6), which lies in the last time slot, is always negative, so if D+(ck6) < Vref− and D−(ck6) > Vref+ then Tx_Bit0 = 1, else Tx_Bit0 = 0. D(ck4) lies in the second time slot and may be positive or negative. If this value is positive, that is, D+(ck4) > 0 and D−(ck4) < 0, then Tx_Bit3 = 1, else Tx_Bit3 = 0. If D+(ck4) > Vref+ and D−(ck4) < Vref−, or if D+(ck4) < Vref− and D−(ck4) > Vref+, then Tx_Bit1 = 1, else Tx_Bit1 = 0. CML logic is used to implement the OR gate in order to achieve high speed [10]. Fig. 12 shows the comparator used in the PWAM demodulator. Simulation results of the proposed PWAM demodulator for Tx_Bit3 and Tx_Bit1, which are obtained from the ck4 sample, are shown in Fig. 13.
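The decision rules above can be sketched as a behavioral model and checked with a round trip over all 16 codes. Several things here are assumptions made only for illustration: signals are measured relative to the common-mode level, amplitudes are in units of R1·I so that the ideal differential levels are ±1 and ±3, the references Vref+ and Vref− are placed midway between the inner and outer single-ended levels, and the function names are hypothetical.

```python
from itertools import product

VREF_P, VREF_M = 1.0, -1.0  # ASSUMPTION: midway between +/-0.5 and +/-1.5 single-ended

def split(v: float) -> tuple:
    """Differential value v -> (D+, D-) around the common-mode level."""
    return (v / 2, -v / 2)

def is_high(d: tuple) -> bool:
    """Outer positive level: D+ > Vref+ and D- < Vref-."""
    return d[0] > VREF_P and d[1] < VREF_M

def is_low(d: tuple) -> bool:
    """Outer negative level: D+ < Vref- and D- > Vref+."""
    return d[0] < VREF_M and d[1] > VREF_P

def demodulate(d_ck2: tuple, d_ck4: tuple, d_ck6: tuple) -> tuple:
    """Return (Tx_Bit3, Tx_Bit2, Tx_Bit1, Tx_Bit0) from the three samples."""
    tx_bit2 = int(is_high(d_ck2))                     # slot 1 is always positive
    tx_bit0 = int(is_low(d_ck6))                      # slot 3 is always negative
    tx_bit3 = int(d_ck4[0] > 0 and d_ck4[1] < 0)      # sign of slot 2
    tx_bit1 = int(is_high(d_ck4) or is_low(d_ck4))    # OR of the two comparators
    return (tx_bit3, tx_bit2, tx_bit1, tx_bit0)

def ideal_samples(tx_bit3: int, tx_bit2: int, tx_bit1: int, tx_bit0: int) -> tuple:
    """Ideal transmitted levels at ck2, ck4, ck6 (noise-free channel, illustrative)."""
    slot2_sign = 1 if tx_bit3 else -1
    return (split(1 + 2 * tx_bit2),
            split(slot2_sign * (1 + 2 * tx_bit1)),
            split(-(1 + 2 * tx_bit0)))

ok = all(demodulate(*ideal_samples(*bits)) == bits
         for bits in product((0, 1), repeat=4))
print("round trip over all 16 codes:", ok)
```

The second slot is the only one that needs both a sign decision and two magnitude comparators combined by an OR, which corresponds to the OR gate mentioned in the text.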
Fig. 11. Block diagram of the proposed PWAM demodulator.

Fig. 12. Comparator used in the PWAM demodulator.

Fig. 13. Simulation results of the demodulator for Tx_Bit3 and Tx_Bit1.

VI. CONCLUSION
In this paper, a serial link transceiver in a 0.35 µm CMOS process is presented. The link uses a new combination of PWM and PAM techniques. With the proposed PWAM, the pulse width is larger than in the conventional NRZ (2-PAM) case, so ISI is reduced. Because PWM is used, the clock component is embedded in the encoded signal, so a conventional PLL in the receiver can easily extract the clock from the incoming data stream. The PAM scheme reduces the symbol rate compared with a conventional 2-PAM system. The symbol rate reduction lowers not only the ISI in the channel but also the maximum required on-chip clock frequency. The multiphase output of the three-stage ring oscillator VCO in the PLL is used to modulate and demodulate the signal. The voltage in each slot is chosen between two values according to the PWM modulator output, which simplifies the design of the DAC and ADC. In the transceiver, the symbol rate is 800 MS/s and the equivalent data rate is 3.2 Gb/s.

REFERENCES
[1] R. Farjad-Rad, C.-K. Yang, M. A. Horowitz, and T. H. Lee, "A 0.3-µm CMOS 8-Gb/s 4-PAM serial link transceiver," IEEE J. Solid-State Circuits, vol. 35, no. 5, pp. 757–764, May 2000.
[2] K. Nogami and A. El Gamal, "A CMOS 160-Mb/s phase modulation I/O interface circuit," in ISSCC Dig. Tech. Papers, Feb. 1994, pp. 108–109.
[3] C. Y. Yang and Y. Lee, "A PWM and PAM signaling hybrid technology for serial-link transceivers," IEEE Transactions on Instrumentation and Measurement, 2008.
[4] C. Y. Yang and Y. Lee, "A 0.18-µm CMOS 1-Gb/s serial link transceiver by using PWM and PAM techniques," in Proc. IEEE Int. Symp. Circuits Syst., May 2005.
[5] R. Zhang and G. S. La Rue, "Fast acquisition clock and data recovery circuit with low jitter," IEEE J. Solid-State Circuits, vol. 41, no. 5, May 2006.
[6] J. Lee and B. Kim, "A low noise fast-lock phase-locked loop with adaptive bandwidth control," IEEE J. Solid-State Circuits, vol. 35, no. 8, pp. 1137–1145, Aug. 2000.
[7] J. Savoj and B. Razavi, High-Speed CMOS Circuits for Optical Receivers. Kluwer Academic Publishers, 2001.
[8] W. H. Chen, G. K. Dehng, J. W. Chen, and S. I. Liu, "A CMOS 400-Mb/s serial link for AS-memory systems using a PWM scheme," IEEE J. Solid-State Circuits, vol. 36, no. 10, Oct. 2001.
[9] Kh. Hadidi, J. Sobhi, A. Hasankhan, D. Muramatsu, and T. Matsumoto, "A novel linear CMOS buffer," IEEE, 1998.
[10] L. Li, S. Raghavendran, and D. T. Comer, "CMOS current mode logic gates for high-speed applications," 12th NASA Symposium on VLSI Design, Coeur d'Alene, Idaho, USA, Oct. 2005.
[11] S. Gondi and B. Razavi, "Equalization and clock and data recovery techniques for 10-Gb/s CMOS serial-link receivers," IEEE J. Solid-State Circuits, vol. 42, no. 9, Sep. 2007.