A Scalable and Hardware-Efficient Architecture for Digitally Adaptive Electronic Dispersion Compensation

Daniel Efinger*(a), Stefan Payer(a), Halmo Fischer(b)

(a) Institute of Telecommunications, University of Stuttgart, Pfaffenwaldring 47, D-70569 Stuttgart; (b) Agilent Technology R&D and Marketing GmbH & Co. KG, Digital Photonic Test (DPT), Herrenbergerstr. 130, D-71034 Böblingen

ABSTRACT

We present a novel hardware architecture for digitally adaptive feed-forward equalization (FFE) suitable for compensating the inter-symbol interference (ISI) caused by chromatic dispersion (CD) and time-varying polarization mode dispersion (PMD) in intensity modulated optical links with direct detection (IM/DD). Existing analog tapped delay lines for realizing the equalization filter at a bit rate of 40 Gbit/s commonly use external manual or random dithering approaches for tap weight adjustment [1,2]. While manual tap weight adjustment is impractical for systems with randomly time-varying behavior, random dithering of the tap weights to find the optimal setup shows adaptation times above 1 s, which exceeds the time scale of the measured PMD variations in installed fibers [3] (~10 ms) by far. Our solution follows a completely digital implementation approach and can be scaled to various bit rates using distributed arithmetic (DA) and parallelization techniques. The digital adaptation unit, which employs a simplified Least-Mean-Square-Algorithm (LMS) [4], is implemented directly together with the FFE. Measurements in our hardware-in-the-loop testbed with a Virtex-II field programmable gate array (FPGA) from Xilinx have demonstrated that it is able to track time-varying optical channels well within 1 ms at a bit rate of 10.7 Gbit/s.

Keywords: fiber optic links, intensity modulation with direct detection, electronic dispersion compensation, feedforward equalizer, distributed arithmetic, adaptive equalization, field programmable gate arrays

1. INTRODUCTION

Since service providers have started the rollout of triple play and enhanced multimedia services, it has become more and more obvious that the transmission techniques and the capacity of present backbone and metropolitan area networks will no longer satisfy future requirements. However, the installed fiber infrastructure could offer more potential. The limitation is mainly due to the static nature of present optical links and the simple intensity modulated transmission schemes with direct detection (IM/DD) deployed so far. One direction in research to overcome this limitation is to adapt well-established Ethernet technologies to fiber optic links (100 Gbit/s Ethernet) and to increase the net bit rate in order to exploit the already installed fiber infrastructure more efficiently. Both the migration to Ethernet and the increase of the bit rates up to 40 - 100 Gbit/s will challenge the underlying transmission technologies as well as the related measurement and monitoring equipment. So far, static compensation of chromatic dispersion (CD) using dispersion compensating fibers (DCF) was common in IM/DD systems with additional wavelength division multiplexing (WDM) transmission. The effect of the time-varying polarization mode dispersion (PMD) was negligible at bit rates up to 10 Gbit/s per wavelength channel. Now, if bit rates are to be increased to 40 - 100 Gbit/s, time-varying transmission impairments like PMD or changes in link characteristics due to prospective dynamic optical switching and routing cannot be neglected anymore, and new approaches in optical transmission techniques have to be taken. New optical transmission schemes should be capable of adaptively compensating time-varying effects [1,2,4,5] or should try to increase spectral efficiency while maintaining robustness against them [5-7]. This trend will also penetrate into the aggregation network domain where low installation cost and economy of scale are major drivers.
Optical compensation devices require high mechanical precision and are difficult to adjust dynamically, which leads to high manufacturing and operational costs. Electronic dispersion compensation (EDC) therefore serves as an enabling physical layer technology for upgrading optical links to bit rates of 40 - 100 Gbit/s and for the ambition of introducing dynamic control into optical networks. Additionally, CMOS based manufacturing is very mature and cheap compared to pure optical technologies, and adaptation to changing channel characteristics is much easier using digitally implemented algorithms. Nevertheless, the rate at which such devices have to be clocked is still challenging, and some parallelization and simplification techniques have to be applied. That is why, especially for short reach links in metro and fiber to the curb, building or home (FTTX) scenarios, IM/DD and electronic (post-) compensation by feed-forward equalizers (FFE) and decision feedback equalizers (DFE) are still economic alternatives compared to maximum likelihood sequence estimation (MLSE) or the most recently discussed orthogonal frequency division multiplexing (OFDM).

This paper is organized as follows: in section 2 we present the basic system model including the theoretical foundations of applying an adaptive FFE for EDC. Since our implementation of the adaptation unit depends on some findings which have been investigated in earlier work [4], we included a short review. Section 3 deals with the detailed implementation issues, introducing the principle of distributed arithmetic (DA) [8], and picks up the idea of parallelizing the FFE [9] to meet the clock frequencies of available digital hardware. At the end of this section, we describe the implementation of the adaptation unit using a simplified Least-Mean-Square-Algorithm (LMS) [4]. A hardware-in-the-loop simulation with a prototype FFE implemented on a field programmable gate array (FPGA) demonstrates in section 4 that the parallel, adaptive FFE is able to prevent an optical link with time-varying channel characteristics from outage.

* Contact: phone +49 711 685-67937; fax +49 711 685-67929; www.inue.uni-stuttgart.de

2. SYSTEM MODEL AND THEORETICAL FOUNDATIONS OF EDC USING AN ADAPTIVE FFE

2.1 Optical transmission system

Fig. 1 indicates how we modeled and simulated a single wavelength optical communication link. The bits $b_k$ are taken at a rate $R_b = 1/T$ out of a De Bruijn bit sequence (DBBS), which is a well-known pseudo random bit sequence (PRBS) of maximal length complemented by an additional zero bit appended to the longest run of zeros. The bits $b_k$ are fed to an impulse shaping device whose impulse response has a time-domain raised cosine shape, leading to a non-return-to-zero (NRZ) driving signal for the external optical modulator. Since the impulse shape fulfills the Nyquist criterion at the discrete time instants $t = kT$ with $k \in \mathbb{Z}$, there is no inter-symbol interference (ISI) present in the transmit signal. The electro-optical conversion is performed by a Mach-Zehnder modulator (MZM) whose electrical input is driven by the signal of the impulse shaper and whose optical input is connected to a laser diode operated in continuous wave (CW) mode. The modulated optical signal enters the standard single mode fiber (SSMF) where it is affected by CD and first-order PMD. The dispersion coefficient representing CD of an SSMF is given by $D = 17$ ps/(nm·km). Usually, the accumulated or residual dispersion $r_d = D \cdot l$ in ps/nm, with $l$ being the fiber length, is used to evaluate the performance in terms of the optical signal-to-noise ratio (OSNR). First-order PMD can be characterized by the differential group delay (DGD) $\Delta\tau$ between x- and y-polarization. An equal power splitting ratio of $\gamma = 0.5$ among the two directions of polarization is assumed, which causes the largest signal distortion as a worst-case scenario. In general, both CD and general order PMD are linear distortions in the optical domain. Due to the direct detection, however, they become nonlinear distortions in the electrical part of the receiver. Besides its advantages of simplicity and low cost, this is the reason why the FFE suffers from suboptimal compensation performance. Only first-order PMD remains a linear distortion in the electrical part of the receiver, since it is just a time shift of the signal in both polarizations [10].
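As a quick numerical illustration of the accumulated dispersion, the 63.7 km SSMF span used for the measurements in section 4.2 can be plugged into $r_d = D \cdot l$. The rounding to full ps/nm is our own:

```latex
r_d = D \cdot l
    = 17\,\frac{\mathrm{ps}}{\mathrm{nm}\cdot\mathrm{km}} \cdot 63.7\,\mathrm{km}
    = 1082.9\,\frac{\mathrm{ps}}{\mathrm{nm}}
    \approx 1083\,\frac{\mathrm{ps}}{\mathrm{nm}}
```

This agrees with the value of 1083 ps/nm quoted for that span.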

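[Figure 1 (block diagram): PRBS source ($b_k$), NRZ impulse shaper, external modulator (MZM) fed by a CW laser diode (LD), SSMF, additive noise $n(t)$ (ASE of the optical amplifier), optical band-pass filter $H_{opt}$, photodiode ($|\cdot|^2$), electrical low-pass filter $H_{el}$, sampling with period $T$ ($x(t) \to x_k$), FFE equalizer with output $\hat{b}_k$.]

Figure 1. Model of the IM/DD optical transmission system.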

Since we assume that the amplified spontaneous emission (ASE) noise of the optical amplifier is dominating, we place just one additive white Gaussian noise (AWGN) source in the optical domain. It is worth mentioning that, due to optical filtering, square-law detection in the photodiode and electrical filtering, the noise statistics turn into a noncentral $\chi^2$-distribution. Due to square-law detection the noise becomes signal dependent, i.e. a transmitted “1” is more affected by noise than a “0” in IM/DD systems. Finally, the analog signal is sampled at a rate of $1/T$, digitized and fed to the FFE.

2.2 EDC using an FFE

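In Fig. 2 a sample block diagram of an FFE with $M = 3$ coefficients is depicted. In this configuration, EDC consists just of a linear discrete-time transversal filter followed by a decision device whose output is the estimate $\hat{b}_k$ for the transmitted bit $b_k$.

[Figure 2 (block diagram): tapped delay line with delay elements $T$ holding $x_k$, $x_{k-1}$, $x_{k-2}$; coefficients $c_0$, $c_1$, $c_2$; summed output $y_k$; decision device with output $\hat{b}_k$; error signal $e_k$ formed with respect to $b_k$.]

Figure 2. FFE with indication of coefficient adjustment by MMSE: $\min_{\mathbf{c}} E[|e_k|^2]$.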

Using boldface lowercase for vector notation, boldface uppercase for matrix notation and $(\cdot)^T$ for transposition, the filtering operation between the input sample vector $\mathbf{x}_k = (x_k, \dots, x_{k-(M-1)})^T$ and the equalizer coefficients $\mathbf{c} = (c_0, \dots, c_{M-1})^T$ can be written as an inner product:

$$y_k = \mathbf{c}^T \mathbf{x}_k \qquad (1)$$

If we assume that the channel and noise characteristics are known in advance and constant over time, it is possible to initialize the filter coefficients to fixed values. It just remains to apply an appropriate criterion for calculating them. The minimum mean squared error (MMSE) criterion is a common approach to determine the equalizer setup for stationary channels and white noise characteristics since it not only accounts for ISI but also includes optimal noise rejection properties. As indicated in Fig. 2, it aims at minimizing the difference between a given signal level, which is often chosen to be the same as the transmitted signal level representing $b_k$, and the equalizer output $y_k$. During our investigations of suitable adaptation algorithms [4] we could prove that the MMSE criterion is also well suited for IM/DD systems where the optical noise follows a noncentral $\chi^2$-distribution after square-law detection. With $E[\cdot]$ being the expectation operator, the MMSE approach can be formulated as

$$\mathbf{c} = \arg\min_{\mathbf{c}} E[|e_k|^2] \qquad (2)$$

which leads to a convex optimization problem in $\mathbf{c} = (c_0, \dots, c_{M-1})^T$. This property will be important in the following subsection when adaptive equalization is introduced. The solution to the MMSE approach is given by [11]

$$\mathbf{c} = \mathbf{R}_{xx}^{-1} \mathbf{p}_{bx} \qquad (3)$$

with $\mathbf{R}_{xx} = E[\mathbf{x}_k \mathbf{x}_k^T]$ being the autocorrelation matrix of the input samples and $\mathbf{p}_{bx} = E[b_k \mathbf{x}_k]$ being the cross-correlation vector between the input samples and the corresponding transmitted bit, which serves as a reference signal here. In Fig. 3 the MMSE approach and its intent are illustrated again by an eye diagram at the output of an analog equalizer evaluated at discrete-time sampling instants. Minimizing $E[|e_k|^2]$ tries to reopen the eye towards the desired signal levels representing “1” and “0”, respectively, and thus makes the samples $y_k$ less sensitive to noise influence.
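To make eq. (3) concrete, the following minimal sketch estimates $\mathbf{R}_{xx}$ and $\mathbf{p}_{bx}$ by sample averages over a known bit sequence and solves the resulting linear system. It is an illustration only: the function names `mmse_coefficients` and `solve_linear`, the use of time averages in place of the expectation, and the toy channel (a pure attenuation by 0.5 without noise) are assumptions that do not come from the paper.

```python
def solve_linear(A, y):
    """Solve A c = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * c[j] for j in range(i + 1, n))
        c[i] = (aug[i][n] - tail) / aug[i][i]
    return c


def mmse_coefficients(x, b, M):
    """Estimate c = R_xx^{-1} p_bx (eq. 3) from samples x and reference bits b."""
    R = [[0.0] * M for _ in range(M)]
    p = [0.0] * M
    count = 0
    for k in range(M - 1, len(x)):
        xk = [x[k - m] for m in range(M)]  # (x_k, ..., x_{k-(M-1)})
        for i in range(M):
            p[i] += b[k] * xk[i]
            for j in range(M):
                R[i][j] += xk[i] * xk[j]
        count += 1
    R = [[v / count for v in row] for row in R]
    p = [v / count for v in p]
    return solve_linear(R, p)


# Toy channel (assumption): the received samples are the bits attenuated by 0.5,
# so the ideal equalizer simply scales by 2 and ignores the delayed samples.
bits = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0]
samples = [0.5 * v for v in bits]
coefficients = mmse_coefficients(samples, bits, M=3)
```

For a real IM/DD channel the samples would of course carry ISI and signal-dependent noise, and the resulting coefficients would not be this simple.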

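[Figure 3 (eye diagram): equalizer output $y(t)$, or $y_k$ at the sampling instant, versus time $t$; the error $e_k$ is the distance between the sample and the desired level “1” or “0” at the sampling instant.]

Figure 3. Eye diagram for illustration of MMSE criterion at discrete-time sampling instants.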

2.3 Adaptive FFE using a simplified LMS-Algorithm

As already mentioned in section 2.2, the mean squared error (MSE) $E[|e_k|^2]$ is a quadratic convex function in the equalizer coefficients $\mathbf{c} = (c_0, \dots, c_{M-1})^T$. It is known from mathematical theory that a strictly convex function has a unique minimum, and the MSE $E[|e_k|^2]$ with respect to $\mathbf{c} = (c_0, \dots, c_{M-1})^T$ can be viewed as a multidimensional hypercone around the minimum point (cf. Fig. 4). At each point $\mathbf{c} = (c_0, \dots, c_{M-1})^T$ of this hypercone the direction to the minimum point is given by the negative gradient vector evaluated at $\mathbf{c} = (c_0, \dots, c_{M-1})^T$. These properties are exploited in formulating the LMS-Algorithm as an iterative solution to the MMSE approach. If we introduce an additional time index $k$ to account for the time dependence of the coefficients, i.e. $\mathbf{c} = (c_0, \dots, c_{M-1})^T$ turns into $\mathbf{c}_k = (c_{0,k}, \dots, c_{M-1,k})^T$, we may start at an initial setup $\mathbf{c}_0 = (c_{0,0}, \dots, c_{M-1,0})^T$. The iterative mathematical rule of the LMS-Algorithm [11] to determine the coefficients $\mathbf{c}_{k+1} = (c_{0,k+1}, \dots, c_{M-1,k+1})^T$ at time instant $k+1$ from $\mathbf{c}_k = (c_{0,k}, \dots, c_{M-1,k})^T$ at time instant $k$ can then be written as

$$\mathbf{c}_{k+1} = \mathbf{c}_k + \mu\, e_k\, \mathbf{x}_k \qquad (4)$$

with $\mu$ being a scaling or step size factor to control the speed of convergence towards the optimum MMSE solution $\mathbf{c}_{MMSE}$. In eq. (4) we assume that the instantaneous error signal $e_k$ is defined by $e_k = \hat{b}_k - y_k$. Note that using the estimate $\hat{b}_k$ for the determination of the instantaneous error signal $e_k$ is called the decision-directed equalization mode, and it only works if the equalizer output produces some correct decisions, i.e. the eye diagram must not be completely closed, even with the initial setup $\mathbf{c}_0 = (c_{0,0}, \dots, c_{M-1,0})^T$ at time $k = 0$.
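One decision-directed iteration of eq. (4) can be sketched as follows. The signal levels 0 and 1 for the two bit values, the decision threshold of 0.5 and the function names are assumptions made for this illustration; the paper does not specify them at this point.

```python
def ffe_output(c, xk):
    """Equalizer output y_k = c^T x_k (eq. 1)."""
    return sum(cm * xm for cm, xm in zip(c, xk))


def decide(y, threshold=0.5):
    """Hard decision; levels 0/1 and the threshold 0.5 are assumptions."""
    return 1 if y >= threshold else 0


def lms_step(c, xk, mu, threshold=0.5):
    """One decision-directed LMS iteration: c_{k+1} = c_k + mu * e_k * x_k."""
    y = ffe_output(c, xk)
    b_hat = decide(y, threshold)
    e = b_hat - y  # e_k = b_hat_k - y_k
    c_next = [cm + mu * e * xm for cm, xm in zip(c, xk)]
    return c_next, y, b_hat, e


# Hypothetical example: one update with M = 3 coefficients.
c0 = [1.0, 0.0, 0.0]
xk = [0.8, 0.1, 0.0]
c1, y, b_hat, e = lms_step(c0, xk, mu=0.5)
```

In this example the sample is decided as “1”, and re-evaluating the same input vector with the updated coefficients gives an output closer to the level 1 than before, which is the intended effect of a single step.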

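[Figure 4: left, surface plot of $E[|e_k|^2]$ over the coefficients $c_0$ and $c_1$; right, contour plot in the $(c_0, c_1)$ plane showing the LMS iterates $(c_{0,k}, c_{1,k})$ moving from the initial setup $\mathbf{c}_0$ towards the optimum solution $\mathbf{c}_{MMSE}$.]

Figure 4. Example hypercone of $E[|e_k|^2]$ for $M = 2$ (left) and corresponding contour plot with indication of LMS iterations (right).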

The iterations of the LMS-Algorithm are also illustrated in Fig. 4 by a contour plot. Since the term $e_k \mathbf{x}_k$ in eq. (4) is a so-called stochastic gradient vector instead of a real gradient of $E[|e_k|^2]$ (the expectation operator $E[\cdot]$ is just omitted [11]), it is not necessarily orthogonal to the contour lines. It is also worth mentioning that the optimum MMSE solution $\mathbf{c}_{MMSE}$ cannot be reached exactly since there always exists a random walk around it. The effect of this random walk can be controlled by choosing the step size factor $\mu$ appropriately. A high value for $\mu$ leads to fast convergence but has the disadvantage of a large random walk, whereas a small value for $\mu$ leads to slow convergence with less deviation from $\mathbf{c}_{MMSE}$. Thus, a trade-off between speed of convergence and accuracy has to be made.

Although the LMS-Algorithm in eq. (4) has a rather simple structure, its application in a fixed-point digital signal processing unit poses several challenges. In particular, the required multiplications among $\mu$, the elements of $\mathbf{x}_k = (x_k, \dots, x_{k-(M-1)})^T$ and $e_k$ represent a bottleneck to the execution speed of the adaptation loop. In an earlier work [4] we have already investigated the performance of simplified LMS-Algorithms in a time-varying optical IM/DD system. The basic idea was to coarsely quantize the elements of $\mathbf{x}_k$ and $e_k$ in order to replace their multiplication by simple sign correlations which are easy to implement by comparators. We found that the so-called Error-Sign-LMS-Algorithm with Threshold demonstrated similar adaptation speed and accuracy as the original LMS-Algorithm at less computational complexity. If we denote the threshold for the error magnitude $|e_k|$ by $T_e$, we can write for the update equation of the Error-Sign-LMS-Algorithm with Threshold:

$$\mathbf{c}_{k+1} = \mathbf{c}_k + \begin{cases} 0 & \text{for } |e_k| < T_e \\ \mu\, \mathrm{sign}(e_k)\, \mathbf{x}_k & \text{for } |e_k| \ge T_e \end{cases} \qquad (5)$$

In order to explain the necessity of introducing a threshold $T_e$ for the error magnitude, we have to take a closer look at the update equations. From eqs. (4) and (5) it becomes obvious that the scaling by the step size factor $\mu$ and the magnitude of the error signal $e_k$ only affect the length of the stochastic gradient vector $e_k \mathbf{x}_k$, whereas the input sample vector $\mathbf{x}_k$ itself also determines the direction of change. The illustrations in Fig. 4 further reveal that the closer the current equalizer setup lies to the optimum MMSE solution $\mathbf{c}_{MMSE}$, the smaller the magnitude of $e_k$ becomes. Thus, if we just applied the sign of the error signal $e_k$ without the threshold $T_e$ near the optimum $\mathbf{c}_{MMSE}$ (cf. eq. (5)), every update would have the full length $\mu \|\mathbf{x}_k\|$ regardless of how small the error is, so we would actually raise the error magnitude and produce a severe random walk away from the optimum. The algorithm in eq. (5) prevents this deficiency of overweighting small error signal magnitudes.
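The update rule of eq. (5) can be written down compactly. The sketch below is an illustration with hypothetical function names and example values; it only mirrors the case distinction of eq. (5) and does not model fixed-point effects.

```python
def sign(v):
    """Return -1, 0 or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def error_sign_lms_step(c, xk, e, mu, Te):
    """Error-Sign-LMS with Threshold, eq. (5).

    The coefficients stay unchanged while |e| < Te; otherwise only the
    sign of the error enters the update.
    """
    if abs(e) < Te:
        return list(c)
    s = sign(e)
    return [cm + mu * s * xm for cm, xm in zip(c, xk)]


# Hypothetical example values.
c = [1.0, 0.0]
xk = [2.0, 4.0]
updated = error_sign_lms_step(c, xk, e=-0.3, mu=0.25, Te=0.1)   # error above threshold
frozen = error_sign_lms_step(c, xk, e=0.05, mu=0.25, Te=0.1)    # error below threshold
```

Compared to eq. (4), the product of $e_k$ with each sample disappears; what remains is a conditional addition or subtraction of the scaled input samples.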

3. EFFICIENT DIGITAL FPGA IMPLEMENTATION OF A PARALLEL, ADAPTIVE FFE

In this section we make the step from discrete-time continuous-valued equalization to digital equalization which is suitable for a fixed-point number implementation on an FPGA. We have developed a dedicated structure of the FFE which is best suited to the architecture of the FPGA.

3.1 DA filters

Direct implementation of the multiplications required in the filtering and coefficient update operations is not very efficient on an FPGA with its flip-flop (FF) and look-up table (LUT) structure. DA, which is a bit-level rearrangement of the inner product for calculating the equalizer output $y_k$, is very suitable for an FPGA implementation of an FFE when the number of coefficients $M$ is small [1,2,8].

By using a resolution of $B$ bits for the input samples $x_{k-m}$, $m = (0, \dots, M-1)$, we may address the individual bit positions by the notation $x_{k-m}[i] \in \{0,1\}$, $i = (B-1, \dots, 0)$. Applying this binary notation to eq. (1), we arrive at:

$$y_k = c_0 \left( x_k[B-1] \cdot 2^{B-1} + \dots + x_k[0] \cdot 2^0 \right) + \dots + c_{M-1} \left( x_{k-(M-1)}[B-1] \cdot 2^{B-1} + \dots + x_{k-(M-1)}[0] \cdot 2^0 \right) \qquad (6)$$

Note that, for the moment, we omit the discrete time index $k$ in the coefficient notation (cf. section 2.2). Now, if we rearrange this result according to the powers of 2, we get:

$$\begin{aligned}
y_k &= \left( c_0 x_k[B-1] + \dots + c_{M-1} x_{k-(M-1)}[B-1] \right) \cdot 2^{B-1} + \dots + \left( c_0 x_k[0] + \dots + c_{M-1} x_{k-(M-1)}[0] \right) \cdot 2^0 \\
&= \begin{pmatrix} c_0 \\ \vdots \\ c_{M-1} \end{pmatrix}^T \begin{pmatrix} x_k[B-1] \\ \vdots \\ x_{k-(M-1)}[B-1] \end{pmatrix} \cdot 2^{B-1} + \dots + \begin{pmatrix} c_0 \\ \vdots \\ c_{M-1} \end{pmatrix}^T \begin{pmatrix} x_k[0] \\ \vdots \\ x_{k-(M-1)}[0] \end{pmatrix} \cdot 2^0
\end{aligned} \qquad (7)$$

Since $x_{k-m}[i] \in \{0,1\}$, we can define a bit vector $\mathbf{x}[i] = (x_k[i] \dots x_{k-(M-1)}[i])^T$ for each bit position $i = (B-1, \dots, 0)$ in the final result of eq. (7). The bit vectors $\mathbf{x}[i]$ can be used as binary addresses to an LUT which stores all $2^M$ possible results of the inner product $\mathbf{c}^T \mathbf{x}[i]$, i.e. the LUT stores all distinct sums among the coefficients. Reading out the LUT contents, shifting them according to their bit position (i.e. multiplication with a power of 2) and the final addition leads to the DA filter structure depicted in Fig. 5 as an example with $M = 3$ and $B = 5$. When we implement the DA filter we can further exploit the fact that the DA LUTs can be used by all bit positions in parallel. This can be done by distributing the LUT entries with multiplexer structures towards the summing nodes (cf. Fig. 5).
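The DA principle can be checked with a few lines of code. The sketch below assumes unsigned $B$-bit integer samples and integer (fixed-point) coefficients; the coefficient values and function names are hypothetical, while the three 5-bit sample values are those that can be read from the bit vectors of the example in Fig. 5 ($x_k = 11101_2$, $x_{k-1} = 00101_2$, $x_{k-2} = 10100_2$). Address bit $m$ of the LUT selects coefficient $c_m$, so that address $(001)$ holds $c_0$ and address $(011)$ holds $c_0 + c_1$.

```python
def build_lut(c):
    """Return the 2**M LUT entries; address bit m selects coefficient c[m]."""
    M = len(c)
    return [sum(c[m] for m in range(M) if (addr >> m) & 1)
            for addr in range(2 ** M)]


def da_filter(lut, xk, B):
    """Compute y_k from unsigned B-bit samples xk = (x_k, ..., x_{k-(M-1)}).

    For each bit position i the i-th bits of all samples form an LUT
    address; the LUT output is weighted by 2**i and accumulated (eq. 7).
    """
    y = 0
    for i in range(B):
        addr = 0
        for m, sample in enumerate(xk):
            addr |= ((sample >> i) & 1) << m
        y += lut[addr] * (1 << i)
    return y


# M = 3, B = 5 as in Fig. 5; the coefficient values are hypothetical.
c = [3, -1, 2]
xk = [0b11101, 0b00101, 0b10100]
lut = build_lut(c)
y_da = da_filter(lut, xk, B=5)
y_direct = sum(cm * xm for cm, xm in zip(c, xk))  # reference: eq. (1)
```

Both `y_da` and `y_direct` evaluate the same inner product; the DA version needs only LUT reads, shifts and additions, which is why it maps well onto the LUT fabric of an FPGA as long as $2^M$ stays small.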

[Figure 5 (example with $M = 3$, $B = 5$): the 5-bit input samples, e.g. $(x_k[B-1] \dots x_k[0])_2 = (11101)_2$, are split into the 3-bit vectors $\mathbf{x}[0] = (1\,1\,0)^T$, $\mathbf{x}[1] = (0\,0\,0)^T$, $\mathbf{x}[2] = (1\,1\,1)^T$, $\mathbf{x}[3] = (1\,0\,0)^T$, $\mathbf{x}[4] = (1\,0\,1)^T$ (LUT address generation); the LUT holds the contents $0$, $c_0$, $c_1$, $c_0 + c_1$, ..., $c_0 + c_1 + c_2$ at the addresses $(000)^T$, $(001)^T$, $(010)^T$, $(011)^T$, ..., $(111)^T$; the LUT outputs are weighted by powers of 2 and summed to $y_k$.]

Figure 5. Principle of DA filtering.

3.2 Parallel equalizer implementation

The serial equalizer depicted in Fig. 2 would have to be operated at a clock frequency as high as the nominal line or bit rate $R_b$. For optical communications with bit rates in the range of 40 - 100 Gbit/s, implementing digital circuits that support clock frequencies in this regime is very challenging and costly; at the moment, it is hardly possible with available silicon and mask technology. Therefore, we need an alternative implementation which allows operation at feasible clock frequencies and at manageable cost and complexity. For this purpose, we have developed and presented an approach for parallelization of the FFE structure [9]. The parallel FFE can be configured in such a way that a trade-off between clock frequency and chip area consumption can be made. Fig. 6 illustrates the idea of parallelization, showing a parallel FFE built up of $P$ equivalent single FFE macros. The degree of parallelization $P$ reduces the clock frequency to $1/(PT)$ at the expense of placing $P$ single FFE macros. Each single FFE macro consists of one DA filter as described in the previous subsection. The input signal $x(t)$ is sampled and A/D-converted in parallel before the generated samples $x_{kP-j}$, $j = (0, \dots, P-1)$, enter the parallel FFE. The single FFE blocks are connected appropriately to the input samples, and equalization is performed in each single FFE in parallel, which also produces a parallel bunch of estimates for the transmitted bits. Furthermore, several single FFEs may share a common adaptation unit since we employ equivalent FFE macro blocks and the input samples of all single FFEs are affected by the same channel conditions and noise statistics.
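The equivalence between the serial and the parallel structure can be illustrated in software. In the sketch below each pass of the outer loop stands for one slow clock cycle of period $PT$ in which $P$ identical filter macros work on shifted windows of the input samples. The function names and the forward block indexing are our own assumptions (the paper indexes the samples of one cycle as $x_{kP-j}$), and the plain inner product stands in for the DA filter of each macro.

```python
def serial_ffe(x, c):
    """Reference: one output per sample period T."""
    M = len(c)
    return [sum(c[m] * x[k - m] for m in range(M))
            for k in range(M - 1, len(x))]


def parallel_ffe(x, c, P):
    """P filter macros produce P outputs per slow clock cycle (period P*T)."""
    M = len(c)
    outputs = []
    k = M - 1
    while k + P <= len(x):
        # Macro j of this cycle works on the window ending at sample k + j.
        cycle = [sum(c[m] * x[k + j - m] for m in range(M)) for j in range(P)]
        outputs.extend(cycle)
        k += P
    return outputs


# Hypothetical example: 11 samples, M = 3 coefficients, P = 3 macros.
x = list(range(1, 12))
c = [1, 2, 3]
y_serial = serial_ffe(x, c)
y_parallel = parallel_ffe(x, c, P=3)
```

Since all macros use the same coefficients, the two output sequences coincide; the parallel version merely trades clock frequency for $P$ copies of the filter logic.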

[Figure 6 (block diagram): the input $x(t)$ is sampled into $P$ parallel streams $x_{kP}, \dots, x_{kP-(P-1)}$, each clocked with period $PT$; equalizers #0 to #(P-1) each apply the coefficients $c_0, c_1, \dots, c_{M-1}$ to their own window of $M$ consecutive samples and deliver the estimates for $b_{kP}, \dots, b_{kP-(P-1)}$.]

Figure 6. Structure of parallel equalizer.

3.3 Implementation of the adaptation unit

The adaptation unit implements the Error-Sign-LMS-Algorithm with Threshold presented in subsection 2.3. First, when we just consider one element of the vectorial adaptation algorithm in eq. (5), we may write for the update of an arbitrary coefficient $c_m$, $m = (0, \dots, M-1)$:

$$c_{m,k+1} = c_{m,k} + \begin{cases} 0 & \text{for } |e_k| < T_e \\ \mu\, \mathrm{sign}(e_k)\, x_{k-m} & \text{for } |e_k| \ge T_e \end{cases} \qquad (8)$$

Note that the discrete-time index $k$ is added again to the coefficients to emphasize the temporal dependency in an adaptive equalization context. Now, recalling the DA structure from subsection 3.1, we actually have to update the LUT contents, which consist of all distinct sums among the coefficients. Let $(X_0 \dots X_{M-1})^T$ with the binary elements $X_m \in \{0,1\}$, $m = (0, \dots, M-1)$, be the address of an arbitrary LUT entry; then its content is given by $c_{0,k} X_0 + \dots + c_{M-1,k} X_{M-1}$. Applying eq. (8) to each coefficient gives us the update equation for the LUT content having the address $(X_0 \dots X_{M-1})^T$:

$$\begin{aligned}
c_{0,k+1} X_0 + \dots + c_{M-1,k+1} X_{M-1}
&= \left( c_{0,k} + \begin{cases} 0 & \text{for } |e_k| < T_e \\ \mu\, \mathrm{sign}(e_k)\, x_k & \text{for } |e_k| \ge T_e \end{cases} \right) X_0 + \dots + \left( c_{M-1,k} + \begin{cases} 0 & \text{for } |e_k| < T_e \\ \mu\, \mathrm{sign}(e_k)\, x_{k-(M-1)} & \text{for } |e_k| \ge T_e \end{cases} \right) X_{M-1} \\
&= c_{0,k} X_0 + \dots + c_{M-1,k} X_{M-1} + \begin{cases} 0 & \text{for } |e_k| < T_e \\ \mu\, \mathrm{sign}(e_k) \left( x_k X_0 + \dots + x_{k-(M-1)} X_{M-1} \right) & \text{for } |e_k| \ge T_e \end{cases}
\end{aligned} \qquad (9)$$

A closer look at the final result of eq. (9) reveals that we can also avoid direct multiplications in the adaptation unit itself. Since the $X_m \in \{0,1\}$, $m = (0, \dots, M-1)$, are binary, the products containing them are easy to evaluate. If $|e_k| \ge T_e$, we just have to sum up all input samples $x_{k-m}$, $m = (0, \dots, M-1)$, for which the corresponding address bit $X_m \neq 0$. Afterwards, the sign of the error signal $e_k$ has to be evaluated and the calculated sum of input samples is negated if the sign is negative. The scaling by the step size factor $\mu$ is performed by a simple shift operation if $\mu$ is chosen to be a power of 2.

The actual implementation of the adaptation unit is aligned to the processing flow of the FFE as shown in the upper part of Fig. 7. We employed pipelining to implement the DA filter operations. In the first clock cycle we read the corresponding LUT contents depending on the input sample vector $\mathbf{x}_k = (x_k, \dots, x_{k-(M-1)})^T$. Then we do the shifted summations in the next one, and finally we make the decision.
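The LUT update of eq. (9) can be sketched as follows and compared against rebuilding the LUT from coefficients updated via eq. (8). The function names and numerical values are hypothetical; $\mu$ is given as a floating-point power of 2 here, whereas a fixed-point implementation would realize the scaling as a shift. The address convention (address bit $m$ corresponds to $X_m$) is an assumption of this sketch.

```python
def sign(v):
    """Return -1, 0 or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def build_lut(c):
    """LUT content at address (X_0 ... X_{M-1}) is the sum of c_m with X_m = 1."""
    M = len(c)
    return [sum(c[m] for m in range(M) if (addr >> m) & 1)
            for addr in range(2 ** M)]


def lut_update(lut, xk, e, mu, Te):
    """Update every LUT entry according to the final line of eq. (9)."""
    if abs(e) < Te:
        return list(lut)
    M = len(xk)
    s = sign(e)
    updated = []
    for addr, content in enumerate(lut):
        # Sum of the input samples whose address bit X_m is set.
        sample_sum = sum(xk[m] for m in range(M) if (addr >> m) & 1)
        # Negate for a negative error, scale by mu (a shift for mu = 2**-n).
        updated.append(content + mu * s * sample_sum)
    return updated


# Hypothetical example with M = 3.
c = [0.5, 0.25, -0.125]
xk = [4, 2, 6]
e, mu, Te = -0.3, 0.125, 0.1

lut_next = lut_update(build_lut(c), xk, e, mu, Te)
# Reference: update the coefficients with eq. (8), then rebuild the LUT.
c_next = [cm + mu * sign(e) * xm for cm, xm in zip(c, xk)]
lut_reference = build_lut(c_next)
```

`lut_next` and `lut_reference` contain the same values, which shows that the adaptation can operate on the LUT contents directly without ever storing the individual coefficients.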

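[Figure 7 (processing flow): equalizer (top): $\mathbf{x}_k$, LUT read, summation ($y_k$), decision ($\hat{b}_k$); adaptation unit (bottom): $\mathbf{x}_k$, summation, negation, update from $\mathbf{c}_k$ to $\mathbf{c}_{k+1}$ using $y_k$ and $\hat{b}_k$.]

Figure 7. Processing flow of the FFE (top) and of the adaptation unit (bottom).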

The basic schematic of the adaptation unit is illustrated in Fig. 8. It is best explained with the aid of Fig. 7 because the steps necessary for adaptation can be pipelined accordingly. In the first pipelining stage, all possible sums of input samples are calculated with respect to the different LUT addresses and shifted according to the step size factor $\mu$. Since we would have to wait for the equalizer output $y_k$ and the decided bit $\hat{b}_k$ for the remaining processing tasks, we avoid idle time and perform the negation of all shifted sums in advance. So we are prepared for both cases of the error sign evaluation and just have to choose the corresponding result. In the final stage, when the equalizer output $y_k$ and the decided bit $\hat{b}_k$ are available, we can determine the magnitude and sign of the error signal. At this point it is worth taking a closer look at the error evaluation part. The decision output of the equalizer is a single bit. Hence, we first have to transform the bit into its corresponding fixed-point signal level (denoted as “upper eye ref.” and “lower eye ref.” in Fig. 8). After that we are able to compare this signal level to the equalizer output $y_k$ and determine whether the error magnitude $|e_k|$ is below or above the threshold $T_e$. Depending on the result of this comparison, we choose the precalculated LUT update values and add them to the contents of the LUTs.

[Figure 8 (schematic): summation stage forming the sums of the input samples $x_k$, $x_{k-1}$, $x_{k-2}$ for the LUT addresses $(100)^T$, $(010)^T$, $(001)^T$, $(110)^T$, $(101)^T$, $(011)^T$ and $(111)^T$; negation stage; registers (D flip-flops) between the pipeline stages; error evaluation block comparing $y_k$ with the upper or lower eye reference selected by $\hat{b}_k$; the selected update values are added to the LUT contents. Signal widths are labeled $B$, $B_{sum}$, $B_y$ and $B_c$.]

Figure 8. Schematic of the adaptation unit.

The processing flow involves 4 pipelining stages, i.e. it lasts 5 clock cycles until the equalizer output of the corresponding input samples becomes available in the output latches of the equalizer core. The same holds for the updated LUT contents, which are also available right after 5 clock cycles. Since new LUT updates must be based on the current coefficient setup according to eq. (8), the update procedure is only started every fifth clock cycle to account for this fact.

4. SYNTHESIS USING A VIRTEX-II FPGA AND PERFORMANCE RESULTS

4.1 Synthesis results and hardware-in-the-loop setup

We synthesized and tested the parallel, adaptive FFE design on a Virtex-II FPGA evaluation board from Hunt Engineering [12] (cf. Fig. 9) whose clock frequency was fixed to 100 MHz. Under this constraint, our best practical synthesis resulted in placing $P = 107$ single FFE macros with $M = 3$ coefficients in parallel at a logic utilization of 77%. This leads to a throughput of 10.7 Gbit/s. Thus, one could think of a link providing a net bit rate of $R_B = 10$ Gbit/s plus an additional forward error correction (FEC) overhead of about 7%. The timing conditions allowed us to employ only one global adaptation unit which distributes the updated LUT contents to all 107 single FFEs. Restrictions concerning real-time data delivery were imposed by the low-speed peripheral interface of the Virtex-II FPGA and the evaluation board. The interface to the evaluation board is realized via a universal serial bus (USB) connection to the PC where the optical transmission system is simulated (cf. Fig. 9). That is why we also had to implement the serial-to-parallel (S/P) conversion for the input to the equalizer core and the parallel-to-serial (P/S) conversion from the output to the PC on the FPGA. Some logic resources had to be dedicated to this task. Nevertheless, we can assert that the equalizer core itself provides the throughput of 10.7 Gbit/s and that our prototyping testbed is suitable to demonstrate the feasibility of our implementation approaches. Using recent FPGA devices [7] or aiming towards a full-custom integrated circuit design could circumvent these restrictions.

[Figure 9: left, FPGA on carrier board; right, simulation chain: PC + simulation software, IN-FIFO, S/P conversion, equalizer, P/S conversion, OUT-FIFO, all except the PC located on the FPGA.]

Figure 9. Virtex-II FPGA on carrier board (left) and schematic of the hardware-in-the-loop simulation chain (right).
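The throughput and the corresponding net bit rate follow from simple arithmetic, assuming that each single FFE macro delivers one equalized bit per clock cycle (cf. the clock frequency $1/(PT)$ in subsection 3.2). The symbols $R_{\text{line}}$ and $f_{\text{clk}}$ are introduced here only for this illustration:

```latex
\begin{aligned}
R_{\text{line}} &= P \cdot f_{\text{clk}} = 107 \cdot 100\,\mathrm{MHz} = 10.7\,\mathrm{Gbit/s}, \\
R_{\text{line}} &= (1 + 0.07) \cdot R_B = 1.07 \cdot 10\,\mathrm{Gbit/s} = 10.7\,\mathrm{Gbit/s}.
\end{aligned}
```

The first line is the equalizer throughput, the second line the line rate of a 10 Gbit/s link with 7% FEC overhead; both coincide.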

4.2 Performance results

Fig. 10 shows the behavior of the parallel, adaptive FFE with respect to its coefficients and a temporally windowed bit error ratio (BER) under time-varying channel conditions. The equalized system is compared to a simple threshold receiver. The static part of the optical channel consists of a 63.7 km SSMF span causing a CD of $r_d = 1083$ ps/nm. The first-order PMD is periodically switched from $\Delta\tau = 0$ ps to $\Delta\tau = 50$ ps and vice versa within a period of 200 µs. In the time slots where only CD is present on the optical link, the threshold receiver also operates below a BER of 1.0e-3, which is generally required before FEC evaluation. But if additional first-order PMD disturbs the signal, the BER of the threshold receiver rises far above the level of 1.0e-3, whereas the equalized system returns to around 1.0e-3 after a few µs of adaptation time and thus prevents system outage by quick reaction to changes in link characteristics.

[Figure 10: left, coefficients $c_0$, $c_1$, $c_2$ (axis range -40 to 140) versus $t$ [µs] from 0 to 1600 µs, with the changes of channel conditions marked; right, temporal BER (logarithmic axis, 1e-5 to 1e-1) versus $t$ [µs] from 0 to 1600 µs for the threshold receiver and the FFE with 3 coefficients, with the changes of channel conditions marked.]

Figure 10. Temporal behaviour of the coefficients (left) and the temporal BER (right) for the parallel, adaptive FFE.

Since the adaptation time of our solution scales directly with clock frequency and/or parallelization, it is obvious that if our implementation were applied to a system with $R_B = 40$ Gbit/s, we would be able to adapt the FFE even below 1 ms. Compared to the externally adjusted analog equalization filters reported so far [1,2], with adaptation times in the range of seconds, our digital approach offers a great improvement.

5. CONCLUSION

We have developed and verified a scalable and hardware-efficient FFE architecture which allows efficient digitally adaptive EDC. Most notably, this adaptive design shows much faster adaptation than similar designs known in this research field so far. Our dedicated approach is able to track time-varying channel conditions well within 1 ms, which is far below the time scale of the measured PMD variations in installed fibers [3] (~10 ms). Taking recent advances in speed and logic resources of FPGA technology into account, the achieved throughput of 10.7 Gbit/s on a Virtex-II could be further increased. Although the realization by an FFE is suboptimal concerning its compensation properties in an optical IM/DD link, our solution might be very attractive in those network domains where low cost and economy of scale are important.

REFERENCES

1. Nakamura, M., Nosaka, H., Ida, M., Kurishima, K. and Tokumitsu, M., “Electrical PMD equalizer ICs for a 40Gbit/s transmission,” Optical Fiber Communication Conference (OFC), Los Angeles, CA, February 2004, paper TuG4
2. Franz, B., Rosener, D., Dischler, R., Buchali, F., Junginger, B., Meister, T.F. and Aufinger, K., “43 Gbit/s SiGe Based Electronic Equalizer for PMD and Chromatic Dispersion Mitigation,” European Conference on Optical Communications (ECOC), Glasgow, Scotland, September 2005, vol. 3, pp. 333 - 334
3. Bülow, H., Baumert, W., Schmuck, H., Mohr, F., Schulz, T., Kuppers, F. and Weiershausen, W., “Measurement of the Maximum Speed of PMD Fluctuation in Installed Field Fiber,” Optical Fiber Communications Conference (OFC), San Diego, CA, USA, February 1999, vol. 2, pp. 83 - 85
4. Efinger, D. and Speidel, J., “Investigation of Fast and Efficient Adaptation Algorithms for Linear Transversal and Decision-Feedback Equalizers in High-Bitrate Optical Communication Systems,” 9. ITG Fachtagung Photonische Netze, Leipzig, Germany, April 2008
5. Freckmann, T. and Speidel, J., “Viterbi equalizer with analytically calculated branch metrics for optical ASK and DBPSK,” IEEE Photonics Technology Letters, January 2006, vol. 18, no. 1, pp. 277 - 279
6. Ohm, M., “Multilevel optical modulation formats with direct detection,” Dissertation, Universität Stuttgart, Shaker-Verlag, 2006
7. Buchali, F., Dischler, R., Klekamp, A., Bernhard, M. and Efinger, D., “Realisation of a real-time 12.1 Gb/s optical OFDM transmitter and its application in a 109 Gb/s transmission system with coherent reception,” European Conference on Optical Communications (ECOC), Vienna, Austria, September 2009
8. White, S.A., “Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review,” IEEE ASSP Magazine, July 1989
9. Efinger, D. and Speidel, J., “A Parallel Equalizer for High-Speed Electronic Dispersion Compensation,” European Conference on Optical Communications (ECOC), Berlin, Germany, September 2007
10. Otte, S., “Nachrichtentheoretische Modellierung und elektronische Entzerrung hochbitratiger optischer Übertragungssysteme,” Dissertation, Christian-Albrechts-Universität zu Kiel, Shaker-Verlag, 2003
11. Haykin, S., “Adaptive Filter Theory,” Prentice Hall, 3rd edition, 1996
12. http://www.hunteng.co.uk/info/virtex2.htm
