A NOVEL LOW-LATENCY PARALLEL ARCHITECTURE FOR DIGITAL PLL WITH APPLICATION TO ULTRA-HIGH SPEED CARRIER RECOVERY SYSTEMS

Pablo Gianni, Hugo S. Carrer, Graciela Corral-Briones, and Mario R. Hueda

Laboratorio de Comunicaciones Digitales - Universidad Nacional de Córdoba - CONICET
Av. Vélez Sarsfield 1611 - Córdoba (X5016GCA) - Argentina
Emails: [email protected], [email protected]

ABSTRACT

This paper introduces a new low latency parallel processing digital carrier recovery (CR) architecture suitable for ultra-high speed intradyne coherent optical receivers (e.g., ≥ 100 Gb/s). The proposed parallel scheme builds upon a novel digital phase locked loop (DPLL) architecture, which breaks the bottleneck of the feedback path. This avoids the high latency that parallel processing introduces in the feedback loop of traditional DPLLs. Numerical results show that the bandwidth and the capture range of the new parallel DPLL are close to those achieved by a serial DPLL. This behavior makes the proposed low latency parallel DPLL architecture well suited for implementing high speed CR systems in both ASIC and FPGA platforms.

1. INTRODUCTION

Coherent detection based receivers with electronic dispersion compensation (EDC) are being considered for next generation optical fiber transmission systems (e.g., 100 Gigabits per second (Gb/s) and beyond) [1, 2]. Unlike in intensity modulation direct detection (IM/DD) schemes [3], in coherent detection receivers with EDC it is possible to completely compensate with zero penalty the main fiber channel impairments [1] (i.e., chromatic dispersion (CD) and polarization mode dispersion (PMD) [4]). In particular, intradyne detection is preferred over the alternative heterodyne or homodyne architectures because it replaces complex optical phase-locked loops (PLLs) by more robust and easier to implement digital carrier recovery (CR) techniques.

The main challenges for digital carrier recovery in fiber optic receivers are the high carrier frequency offset typical of intradyne optical detectors, and the large phase noise of typical lasers used as transmitters and local oscillators [5, 6]. Both challenges can be overcome with a high-bandwidth carrier recovery system. However, another challenge is that parallel processing implementations, necessary to achieve ≥ 100 Gb/s throughput, introduce high latency in the feedback loop of traditional digital PLLs (DPLLs), which limits the achievable bandwidth and consequently the capture range and phase noise tracking capabilities of the receiver. Although feedforward carrier recovery based on the Viterbi and Viterbi (VV) algorithm overcomes some of the latency-related limitations [6], traditional decision directed DPLLs [7] offer advantages in some aspects of the operation of CR, for example, tracking of the high amplitude and high frequency sinusoidal carrier frequency jitter experienced by typical lasers. Therefore, an optimal carrier recovery may involve a combination of the VV algorithm with a traditional decision directed DPLL. This motivates the interest in low-latency DPLL implementations suitable for parallel processing.

A traditional PLL is often modeled as a linear filter, an assumption that is useful to compute the small signal transfer function [8, 7]. However, the PLL is actually a nonlinear filter. Unfortunately, this precludes the use of the unfolding techniques discussed by Parhi in [9], which are applicable only to strictly linear filters. Therefore, a different approach to reduce the latency of the PLL parallel implementation must be considered.

In this work we introduce a new low latency parallel processing digital carrier recovery architecture suitable for ultra-high speed intradyne coherent optical receivers. The proposed approach takes as much hardware as possible out of the feedback loop in order to simplify the loop and reduce latency. Then, the bottleneck of the critical PLL feedback path is broken by using a novel approximation to the DPLL computation. Simulation results show a capture range and bandwidth close to those achieved by serial DPLLs [8]. The proposed low latency DPLL architecture enables the efficient parallel implementation of high-speed CR systems in both FPGA and ASIC devices.

This paper is organized as follows. Section 2 introduces the new DPLL computation and describes parallel implementation architectures. Section 3 presents simulation results while conclusions are drawn in Section 4.

This paper has been supported in part by the ANPCyT (PICT20081256), MINCyT - Córdoba (PID2008), and Fundación Tarpuy.

978-1-4244-8848-3/11/$26.00 ©2011 IEEE

2. NEW APPROXIMATION TO DPLL COMPUTATION

2.1. First-Order DPLL

Figure 1 shows a simplified block diagram of the coherent receiver with electronic dispersion compensation. Without loss of generality we consider quadrature phase-shift keying (QPSK) modulation [7]. Then, the sample at the EDC output can be expressed as

$$r_n = a_n e^{j\alpha_n} + z_n, \tag{1}$$

where $a_n \in \{\pm 1 \pm j\}$ is the transmit QPSK symbol; $\alpha_n$ is the total phase noise, which includes the effects of the laser phase noise, carrier frequency offset, and laser phase jitter. Component $z_n$ represents the amplified spontaneous emission (ASE) noise sample, which is modeled as a white complex Gaussian random variable with power $\sigma^2$ [1]. The EDC output signal (1) can be rewritten as

$$r_n = s_n e^{j\theta_n}, \tag{2}$$

where $s_n$ and $\theta_n$ are the magnitude and the phase of the complex sample $r_n$, respectively. In QPSK modulation systems, the symbol information is contained in the phase of $r_n$. The received phase $\theta_n$ can be expressed as

$$\theta_n = \zeta_n + \Omega_c n + \phi_n, \tag{3}$$

where $\zeta_n \in \{\pm\pi/4, \pm 3\pi/4\}$ is the phase of the transmit QPSK symbol $a_n$, and $\Omega_c$ is the angular carrier frequency offset given by $\Omega_c = 2\pi T f_c$, where $f_c$ and $T$ are the carrier frequency offset and the symbol duration, respectively. Component $\phi_n$ is the total phase noise given by

$$\phi_n = \phi_n^{(\mathrm{laser})} + \phi_n^{(\mathrm{ASE})} + \phi_n^{(\mathrm{jitter})}. \tag{4}$$

Note that $\phi_n$ includes the contribution of the laser phase noise ($\phi_n^{(\mathrm{laser})}$), ASE generated phase noise ($\phi_n^{(\mathrm{ASE})}$), and laser phase jitter ($\phi_n^{(\mathrm{jitter})}$).

Fig. 1. Simplified block diagram of the coherent receiver with electronic dispersion compensation (EDC).

In a decision directed CR loop (see Fig. 2), the symbol information is first removed [7]. In QPSK modulation receivers, this operation can be easily carried out in the phase domain as follows:

$$\tilde{\phi}_n = (\theta_n)_{\pi/2}, \tag{5}$$

where $(\cdot)_M$ denotes modulus $M$. In the absence of phase noise and carrier frequency offset (i.e., $\phi_n = 0$ and $\Omega_c = 0$), notice that $\tilde{\phi}_n = \pi/4 \;\forall n$. The residual phase $\tilde{\phi}_n$ is filtered by the first-order PLL. The phase at the output of the numerical control oscillator (NCO) results [7]

$$\psi_n = \psi_{n-1} + K_p \epsilon_n, \tag{6}$$

where $K_p$ is the proportional gain of the first-order PLL loop filter, and

$$\epsilon_n = \left(\tilde{\phi}_n - \psi_{n-1}\right)_{\pi/2} - \frac{\pi}{4} \tag{7}$$

is the phase error. For QPSK modulation, note that $\epsilon_n \in [-\pi/4, +\pi/4]$ (e.g., $\epsilon_n = 0$ when $\phi_n = \psi_{n-1}$ and $\Omega_c = 0$).

Fig. 2. Block diagram of a decision-directed first-order serial DPLL.

Similarly, it is possible to show that

$$\psi_{n+1} = \psi_n + K_p \epsilon_{n+1} = \psi_{n-1} + K_p \epsilon_n + K_p \epsilon_{n+1}, \tag{8}$$

where

$$\epsilon_{n+1} = \left(\tilde{\phi}_{n+1} - \psi_{n-1} - K_p \epsilon_n\right)_{\pi/2} - \frac{\pi}{4}. \tag{9}$$

Notice that the nonlinear operation $(\cdot)_{\pi/2}$ precludes the use of the unfolding techniques for parallel processing [9] (this situation is similar to the one found in [10]). When the carrier frequency offset is very small (i.e., $\Omega_c \ll 1$) and the bandwidth of the loop is low to moderate such that $K_p \ll 1$, the term $K_p \epsilon_n$ in (9) can be neglected. Thus, the phase error results

$$\epsilon_{n+1} \approx \left(\tilde{\phi}_{n+1} - \psi_{n-1}\right)_{\pi/2} - \frac{\pi}{4}, \tag{10}$$

therefore

$$\psi_{n+1} \approx \psi_{n-1} + K_p\left(\tilde{\phi}_n - \psi_{n-1}\right)_{\pi/2} + K_p\left(\tilde{\phi}_{n+1} - \psi_{n-1}\right)_{\pi/2} - 2K_p\frac{\pi}{4}.$$

Generalizing, we can get

$$\psi_{n+m} \approx \psi_{n-1} + K_p \sum_{k=0}^{m}\left(\tilde{\phi}_{n+k} - \psi_{n-1}\right)_{\pi/2} - \frac{\pi}{4}(m+1)K_p, \qquad m \geq 0. \tag{11}$$
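To make the difference between the serial recursion (6)-(7) and the block approximation (11) concrete, the following Python sketch computes both on the same wrapped phase sequence. It is only an illustrative model under stated assumptions: the function names, the test signal (a constant 0.2 rad phase offset plus weak Gaussian phase noise), and the choice $K_p = 2^{-4}$ with blocks of 16 symbols are hypothetical and are not taken from the authors' simulator.

```python
import math
import random

HALF_PI = math.pi / 2
QUARTER_PI = math.pi / 4


def phase_error(phi_tilde, psi_prev):
    """Phase error of eq. (7): wrap modulo pi/2, then remove the pi/4 bias."""
    return (phi_tilde - psi_prev) % HALF_PI - QUARTER_PI


def serial_dpll(phi_tilde, kp, psi_init=0.0):
    """Serial first-order DPLL, eqs. (6)-(7): one NCO update per symbol."""
    psi, out = psi_init, []
    for ph in phi_tilde:
        psi = psi + kp * phase_error(ph, psi)
        out.append(psi)
    return out


def block_dpll(phi_tilde, kp, block, psi_init=0.0):
    """Block approximation of eq. (11).

    Every error inside a block is measured against psi_{n-1}, the NCO phase
    left by the previous block, so the errors do not depend on each other.
    """
    psi_prev, out = psi_init, []
    for start in range(0, len(phi_tilde), block):
        acc = 0.0
        for ph in phi_tilde[start:start + block]:
            acc += phase_error(ph, psi_prev)   # running sum over k = 0..m
            out.append(psi_prev + kp * acc)    # psi_{n+m} of eq. (11)
        psi_prev = out[-1]
    return out


# Hypothetical test signal: QPSK symbol phases, 0.2 rad offset, weak noise.
random.seed(0)
zetas = (QUARTER_PI, -QUARTER_PI, 3 * QUARTER_PI, -3 * QUARTER_PI)
theta = [random.choice(zetas) + 0.2 + random.gauss(0.0, 0.02) for _ in range(2000)]
phi_tilde = [t % HALF_PI for t in theta]       # eq. (5)

psi_serial = serial_dpll(phi_tilde, 2 ** -4)
psi_block = block_dpll(phi_tilde, 2 ** -4, 16)
gap = max(abs(a - b) for a, b in zip(psi_serial[1000:], psi_block[1000:]))
print("largest serial/block NCO phase difference after settling:", gap)
```

With a block length of 1 the two functions perform the same arithmetic, since (11) with $m = 0$ is exactly (6)-(7); longer blocks trade some accuracy for the removal of the symbol-by-symbol dependence.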

A low latency parallel implementation of the first-order DPLL can be easily derived from (11). Let $P$ be the parallelization factor. Figs. 3 and 4 show the architecture of the low latency parallel first-order DPLL. Block "$W_k$" ($k = 0, 1, \ldots, P-1$) uses a fast adder (e.g., a Wallace tree and a carry save adder [9]) to quickly calculate the NCO output. Furthermore, the gain $K_p$ is assumed to be a power of 2 (i.e., $K_p = 2^{-N_p}$ with $N_p$ being a positive integer). This way, multiplications by the proportional gain $K_p$ are reduced to simple bit shift operations.

Fig. 3. Low latency parallel architecture for the first-order DPLL.

Fig. 4. Implementation of the low latency parallel first-order DPLL.

2.2. Second-Order DPLL

For a second-order DPLL, the NCO output is given by [7]

$$\psi_n = \psi_{n-1} + K_p \epsilon_n + K_i \bar{\epsilon}_{n-1}, \tag{12}$$

where $K_i$ is the integral gain while

$$\bar{\epsilon}_{n-1} = \sum_{k=0}^{n-1} \epsilon_k, \tag{13}$$

is the accumulated phase error with

$$\epsilon_k = \left(\tilde{\phi}_k - \psi_{k-1}\right)_{\pi/2} - \frac{\pi}{4}. \tag{14}$$

Similarly, it is possible to show that

$$\psi_{n+1} = \psi_n + K_p \epsilon_{n+1} + K_i \bar{\epsilon}_n = \psi_{n-1} + K_p(\epsilon_n + \epsilon_{n+1}) + K_i(\bar{\epsilon}_{n-1} + \bar{\epsilon}_n), \tag{15}$$

where

$$\epsilon_{n+1} = \left(\tilde{\phi}_{n+1} - \psi_{n-1} - K_p \epsilon_n - K_i \bar{\epsilon}_{n-1}\right)_{\pi/2} - \frac{\pi}{4}. \tag{16}$$

For the type-II second-order DPLL, the steady-state error is zero (i.e., $\lim_{n\to\infty} \epsilon_n = 0$) [7]. Thus, assuming that the bandwidth of the loop is low to moderate such that $K_p \ll 1$, the contribution of the term $K_p \epsilon_n$ can be neglected; therefore the phase error (16) results

$$\epsilon_{n+1} \approx \left(\tilde{\phi}_{n+1} - \psi_{n-1} - K_i \bar{\epsilon}_{n-1}\right)_{\pi/2} - \frac{\pi}{4}. \tag{17}$$

Furthermore, since the accumulated phase error varies slowly with time (i.e., $\bar{\epsilon}_n \approx \bar{\epsilon}_{n-1}$), from (15) and (17) we can obtain

$$\psi_{n+1} \approx \psi_{n-1} + \sum_{k=0}^{1} K_p\left(\tilde{\phi}_{n+k} - \psi_{n-1} - kK_i \bar{\epsilon}_{n-1}\right)_{\pi/2} + 2\left(K_i \bar{\epsilon}_{n-1} - K_p\frac{\pi}{4}\right). \tag{18}$$

The good accuracy of (17) and (18) will be verified by computer simulations in the next section. Following a similar analysis, it is possible to derive

$$\psi_{n+m} \approx \psi_{n-1} + \sum_{k=0}^{m} K_p\left(\tilde{\phi}_{n+k} - \psi_{n-1} - kK_i \bar{\epsilon}_{n-1}\right)_{\pi/2} + (m+1)\left(K_i \bar{\epsilon}_{n-1} - K_p\frac{\pi}{4}\right), \qquad m \geq 0. \tag{19}$$

A low latency parallel architecture to implement the second-order DPLL can be obtained from (19).

2.3. Modified Second-Order DPLL

Maximum clock frequency of complex digital signal processors in state of the art 40nm CMOS technology is limited to less than 1GHz. Thus, the computational load and bit resolution required to carry out the different operations in (19) could be difficult to implement in multigigabit per second data rate receivers with current CMOS technology. In order to simplify the implementation of (19), consider the block diagrams of the DPLL shown in Fig. 5. Note that the second-order DPLL can be considered as two separated feedback loops: the proportional and integral loops. Thus, the NCO output (19) can be rewritten as

$$\psi_{n+m} = \psi^{(p)}_{n+m} + \psi^{(i)}_{n+m}, \qquad m \geq 0, \tag{20}$$

where $\psi^{(p)}_{n+m}$ and $\psi^{(i)}_{n+m}$ are the NCO components due to the proportional and integral paths, respectively (see Fig. 5).

Fig. 5. Block diagrams of the decision-directed second-order serial DPLL.

From (19), it is simple to show that

$$\psi^{(p)}_{n+m} \approx \psi^{(p)}_{n-1} + \sum_{k=0}^{m} K_p\left(\tilde{\phi}_{n+k} - \psi_{n-1} - kK_i \bar{\epsilon}_{n-1}\right)_{\pi/2} - \frac{\pi}{4}(m+1)K_p. \tag{21}$$

Since

$$\left(\tilde{\phi}_{n+k} - \psi_{n-1} - kK_i \bar{\epsilon}_{n-1}\right)_{\pi/2} = \left(\tilde{\phi}_{n+k} - \psi^{(p)}_{n-1} - \psi^{(i)}_{n-1} - kK_i \bar{\epsilon}_{n-1}\right)_{\pi/2} = \left(\left(\theta_{n+k} - \psi^{(i)}_{n-1} - kK_i \bar{\epsilon}_{n-1}\right)_{\pi/2} - \psi^{(p)}_{n-1}\right)_{\pi/2}, \tag{22}$$

eq. (21) can be rewritten as

$$\psi^{(p)}_{n+m} \approx \psi^{(p)}_{n-1} + K_p \sum_{k=0}^{m}\left(\hat{\phi}_{n+k} - \psi^{(p)}_{n-1}\right)_{\pi/2} - \frac{\pi}{4}(m+1)K_p, \tag{23}$$

where

$$\hat{\phi}_{n+k} = \left(\theta_{n+k} - \psi^{(i)}_{n-1} - kK_i \bar{\epsilon}_{n-1}\right)_{\pi/2}. \tag{24}$$

Notice that (23) reduces to the first-order DPLL computation given by (11), therefore its parallel implementation can be achieved as shown in Fig. 4.

On the other hand, from (19), (22), and Fig. 5 we can also derive the NCO component due to the integral path:

$$\psi^{(i)}_{n+m} \approx \psi^{(i)}_{n-1} + (m+1)K_i \bar{\epsilon}_{n-1}. \tag{25}$$

Based on (14), (20), (24), and (25), the accumulated phase error can be evaluated as

$$\bar{\epsilon}_{n+m} = \sum_{k=0}^{n+m} \hat{\epsilon}_k = \bar{\epsilon}_{n-1} + \sum_{k=n}^{n+m} \hat{\epsilon}_k, \tag{26}$$

with

$$\hat{\epsilon}_k = \left(\tilde{\phi}_k - \psi^{(p)}_{k-1} - \psi^{(i)}_{k-1}\right)_{\pi/2} - \frac{\pi}{4} = \left(\hat{\phi}_k - \psi^{(p)}_{k-1}\right)_{\pi/2} - \frac{\pi}{4}. \tag{27}$$

Thus, a parallel implementation of the type II second-order DPLL can be easily achieved as depicted in Fig. 6. Term $L = lP$, with $l$ being a positive integer, represents the latency required to compute all the operations of the integral path (e.g., computation of the phase errors (27)). Since the latency in this path is not as critical as in the proportional loop, its effect on the DPLL performance will be negligible, as we will show in the next section. Similarly to $K_p$, the integral gain $K_i$ is assumed to be a power of 2 (i.e., $K_i = 2^{-N_i}$ with $N_i$ being a positive integer). Finally, it is important to note that all additions are modulus $2\pi$.

Fig. 6. Low latency parallel implementation of a second-order DPLL.

3. SIMULATION RESULTS
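As a rough numerical illustration of the modified second-order computation (20)-(27), the Python sketch below compares a serial second-order DPLL, eqs. (12)-(14), with a block version that splits the NCO into proportional and integral components. Several things here are assumptions made only for the sketch: the function names are hypothetical, the integral-path latency $L$ is set to zero, the input is a noise-free QPSK phase ramp with a 20 MHz offset at 10 Gs/s, and the gains are borrowed from Table 1. It is not the authors' simulation setup.

```python
import math

HALF_PI = math.pi / 2
QUARTER_PI = math.pi / 4


def wrapped_error(phase, reference):
    """(phase - reference) modulo pi/2, minus the pi/4 bias, as in eq. (14)."""
    return (phase - reference) % HALF_PI - QUARTER_PI


def serial_dpll2(theta, kp, ki):
    """Serial second-order DPLL, eqs. (12)-(14)."""
    psi, acc, out = 0.0, 0.0, []
    for th in theta:
        err = wrapped_error(th, psi)        # eq. (14); theta and phi~ agree mod pi/2
        psi = psi + kp * err + ki * acc     # eq. (12), acc holds eps_bar_{n-1}
        acc += err                          # eq. (13)
        out.append(psi)
    return out


def parallel_dpll2(theta, kp, ki, block):
    """Block second-order DPLL following eqs. (20)-(27), with latency L = 0."""
    psi_p, psi_i, acc, out = 0.0, 0.0, 0.0, []
    for start in range(0, len(theta), block):
        psi_prev = psi_p + psi_i            # psi_{n-1}, eq. (20)
        p_sum, new_acc = 0.0, acc
        cur_p, cur_i = psi_p, psi_i
        for k, th in enumerate(theta[start:start + block]):
            new_acc += wrapped_error(th, psi_prev)              # eqs. (26)-(27)
            phi_hat = (th - psi_i - k * ki * acc) % HALF_PI     # eq. (24)
            p_sum += wrapped_error(phi_hat, psi_p)              # summand of eq. (23)
            cur_p = psi_p + kp * p_sum                          # eq. (23)
            cur_i = psi_i + (k + 1) * ki * acc                  # eq. (25)
            psi_prev = cur_p + cur_i                            # eq. (20)
            out.append(psi_prev)
        psi_p, psi_i, acc = cur_p, cur_i, new_acc
    return out


def last_error(theta, psi):
    """Phase error seen by the final symbol against the previous NCO output."""
    return wrapped_error(theta[-1], psi[-2])


def ramp(n_symbols, omega):
    """Noise-free received phase of eq. (3): QPSK symbol phase plus omega * n."""
    zetas = (QUARTER_PI, -QUARTER_PI, 3 * QUARTER_PI, -3 * QUARTER_PI)
    return [zetas[n % 4] + omega * n for n in range(n_symbols)]


omega = 2 * math.pi * 20e6 / 10e9           # assumed 20 MHz offset at 10 Gs/s
theta = ramp(4000, omega)
print("serial residual error:  ", last_error(theta, serial_dpll2(theta, 0.12, 0.001)))
print("parallel residual error:", last_error(theta, parallel_dpll2(theta, 2 ** -4, 2 ** -7, 16)))
```

In this sketch the accumulated error is built from errors measured against the actual NCO outputs, as in (27), rather than against the block-start phase; this is the reading of Section 2.3 assumed here, and it is what keeps the integral path from simply reusing the approximate errors of (19).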

Next we evaluate the effectiveness of the proposed low latency parallel DPLL architecture. We use QPSK modulation on a nondispersive noisy channel with $P = 16$, $1/T = 10$ Giga-symbols per second (Gs/s), and latency $L = 32$ symbols. The signal-to-noise ratio (SNR), with $\mathrm{SNR} = 2/\sigma^2$ (see eq. (1)), at a given bit-error-rate (BER) is also used as a measure of the goodness of the proposed CR loop. Two different second-order DPLLs were simulated for comparison purposes: the serial DPLL (S-DPLL) and the proposed low latency parallel DPLL architecture (P-DPLL) shown in Fig. 6.

Table 1. DPLL Parameters

DPLL        Parallelism    Kp          Ki          Processing Rate
Serial      1              0.12        0.001       10 GHz
Proposed    16             $2^{-4}$    $2^{-7}$    625 MHz

The frequency responses for both DPLLs are depicted in Fig. 7. The loop filter gains were selected in order to get ∼ 200 MHz loop bandwidth and 0.3 dB peaking (see Table 1). For the optical system considered here, these values of bandwidth and peaking provide a good tradeoff between capture range and the residual phase noise power at the input of the slicer (see Fig. 1).

Fig. 7. Frequency response of the DPLLs. (Plot: magnitude [dB] versus frequency [Hz]; curves: serial DPLL and low latency parallel DPLL.)

The capture range is analyzed in Fig. 8. We plot the SNR required to achieve BER = $10^{-3}$ for different values of the carrier frequency offset $f_c$ (see eq. (3)). As can be observed, the capture range for the P-DPLL is ∼ 1 GHz, which is close to the maximum theoretical frequency offset value for QPSK given by $1/8T = 1.25$ GHz [11].

Fig. 8. Capture range of the serial and low latency parallel DPLL. (Plot: SNR at BER = 1e−3 [dB] versus frequency offset [GHz]; curves: serial DPLL and low latency parallel DPLL.)

Finally, Fig. 9 investigates the behavior of the DPLLs in the presence of sinusoidal frequency jitter given by (see eq. (4))

$$\phi_n^{(\mathrm{jitter})} = \frac{A_j}{f_j}\sin\left(2\pi T f_j n\right), \tag{28}$$

where $A_j$ and $f_j$ are the amplitude and frequency of the sinusoidal frequency jitter, respectively. The jitter frequency was set to $f_j = 1$ MHz and the amplitude $A_j$ was swept from 0 to 100 MHz. It can be noted that the proposed low latency parallel DPLL can track the sinusoidal jitter with an SNR degradation ≤ 0.3 dB with respect to the serial DPLL.

Fig. 9. SNR penalty versus sinusoidal frequency jitter amplitude $A_j$ with $f_j = 1$ MHz. (Plot: SNR penalty at BER = 1e−3 [dB] versus jitter amplitude [MHz]; curves: serial DPLL and low latency parallel DPLL.)

4. CONCLUSIONS

A new DPLL based carrier recovery architecture for high speed optical coherent receivers has been introduced in this paper. The proposed parallel scheme builds upon a novel DPLL computation, which breaks the bottleneck of the feedback path. We have shown a novel approach that leads to a simple parallel implementation. Furthermore, it has been shown that the new parallel DPLL with $P = 16$ can provide a bandwidth and capture range similar to those achieved by the serial DPLL.

5. REFERENCES

[1] D. E. Crivelli, H. S. Carrer, and M. R. Hueda, “Adaptive digital equalization in the presence of chromatic dispersion, PMD, and phase noise in coherent fiber optic systems,” Globecom’04, Dec. 2004, paper SP08-3.

[2] M. Kuschnerov et al., “DSP for coherent single carrier receivers,” J. Lightw. Technol., vol. 27, no. 16, pp. 3614–3622, Aug. 2009.

[3] O. E. Agazzi, M. R. Hueda, H. S. Carrer, and D. E. Crivelli, “Maximum likelihood sequence estimation in dispersive optical channels,” J. Lightw. Technol., vol. 23, no. 2, pp. 749–763, Feb. 2005.

[4] G. P. Agrawal, Fiber-Optic Communication Systems. Wiley-Interscience, 1997.

[5] K. Pyawanno et al., “Fast and automatic frequency control for coherent receivers,” ECOC, Sep. 2009, paper 7.3.1.

[6] M. Taylor, “Phase estimation methods for optical coherent detection using digital signal processing,” J. Lightw. Technol., vol. 27, no. 7, pp. 901–914, 2009.

[7] E. A. Lee and D. G. Messerschmitt, Digital Communication, 1st ed. KAP, 1992.

[8] F. M. Gardner, Phaselock Techniques, 3rd ed. Wiley-Interscience, Jul. 2005.

[9] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. Wiley-Interscience, Jan. 1999.

[10] M. Thompson, “Low-latency, high-speed numerically controlled oscillator using progression-of-states technique,” Solid-State Circuits, IEEE Journal of, vol. 27, no. 1, pp. 113–117, 1992. [Online]. Available: 10.1109/4.109564

[11] D. Messerschmitt, “Frequency detectors for PLL acquisition for timing and carrier recovery,” IEEE Trans. Commun., vol. 27, no. 9, pp. 1288–1295, Sep. 1979.
