A SOFT-INPUT ADAPTIVE EQUALIZER ALGORITHM

Todd K. Moon∗, Daniel J. Monroe†, Aleksey Orekhov‡, and Jacob H. Gunther§
Electrical and Computer Engineering Department, Utah State University, Logan, UT 84332-4120
∗ Information Dynamics Laboratory and Electrical and Computer Engineering Dept., Utah State University, Logan, UT 84332, [email protected]. This work was supported in part by the 2009 NSA-REU at USU. † Bradley University. ‡ Cooper Union. § Utah State University.

ABSTRACT

Equalization of digital communication signals in a modern setting often has available probability distributions as desired inputs, rather than simply the desired symbols. Such data may be available, for example, in a turbo equalization setting. This paper presents an adaptive equalizer which takes probability distributions as inputs and trains its output to match the desired distributions, where the output distribution is obtained as an a posteriori probability computation based on the FIR filter output. Two training criteria are examined: the Euclidean mean squared error between the output distribution and the desired distribution, and the relative entropy between these distributions. Both LMS and RLS adaptation methods are developed.

1. INTRODUCTION

Consider an adaptive filter employed as an FIR adaptive equalizer in a digital communication setting, with input x[t] ∈ C, filter state x_t ∈ C^L, filter coefficients h[t] ∈ C^L, and filter output y[t] ∈ C. The desired input d[t] of an equalizer is conventionally represented as a sequence of points from the signal constellation. That is, given a signal constellation C = {q_0, q_1, ..., q_{M−1}} ⊂ C having M points, the desired signal at each time is d[t] ∈ C. Conventionally, this desired signal may be viewed as being transmitted over a perfect side channel to the equalizer at the receiver; in practice it is often obtained from training signals prepended to the data payload of a packet. By contrast, in this paper we contemplate that a probability mass distribution over the points in the constellation is available, so that d[t] ∈ R^M is the "soft" probability vector

d[t] = [P(q[t] = q_0), P(q[t] = q_1), ..., P(q[t] = q_{M−1})]^T,
[Figure 1: block diagram with blocks "channel", "ADF h", "φ", and "dist"; signals q[t], y[t], φ_t, d[t]; channel noise n[t] ∼ N(0, σ²) and desired-path noise n_d[t] ∼ N(0, σ_d²).]

Fig. 1. Conceptual block diagram. The φ function computes posterior probabilities for comparison with the soft desired probabilities d[t].
where q[t] ∈ C is the transmitted constellation point at time t. This desired signal may be viewed as the result of the training data being passed over a noisy side channel to the receiver, as portrayed in Figure 1. (Removing the blocks shown in dashed lines and replacing the connections with a scalar error e[t] = d[t] − y[t] yields the standard LMS algorithm, but with a noisy desired signal.) The soft-input setting arises naturally, for example, in a turbo-equalizer/decoder setting (see, e.g., [1, 2, 3, 4]), where a following decoder emits soft outputs which are then fed back to the adaptive equalizer. As a model in this setting, the variance of the noisy desired data, σ_d², is likely to be less than the variance of the channel noise, σ². The noisy desired signal setting may also be applicable to a quasi-blind setting.

One approach to equalization with soft inputs is to slice the probabilities into symbols (i.e., employ MAP detection) and utilize a conventional adaptive filter. Such hard decoding prior to equalization, however, reduces the information available to the equalizer, and is more prone to instability when the symbols fed back are incorrectly decoded. It is natural in this setting, then, to seek a soft-input equalizer. The BCJR (or APP) algorithm (see, e.g., [5, 6]) is a natural (and optimal) soft-input/soft-output equalizer that could be used. However, it suffers from complexity exponential in the channel length, and so may be unsuitable for many applications. The soft-input adaptive filter has complexity only linear in the equalizer length, and so is more computationally suitable for some applications, at the expense of some performance as compared with the APP algorithm. Our development will show that, like the BCJR algorithm, it is possible to incorporate prior probabilities on the symbols, a feature not shared with the conventional hard-input LMS adaptive filter. Another benefit of this equalizer is that it can track changes in the channel over the course of the data packet, a capability difficult to achieve with the BCJR algorithm. These algorithms may be considered as adaptive approximations to the minimum mean-squared error equalizers presented in [7].
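The paper does not prescribe how the soft desired vectors d[t] are generated. For simulation purposes, one plausible construction (our own illustrative assumption, not taken from the paper) is to pass the training symbol through a complex Gaussian side channel with variance σ_d² and form the posterior over the constellation:

```python
import numpy as np

def soft_desired(d_noisy, constellation, sigma_d2):
    """Posterior P(q | d_noisy) over the constellation for a noisy training
    observation, assuming complex Gaussian side-channel noise with variance
    sigma_d2 and equal priors (an illustrative model only)."""
    logp = -np.abs(d_noisy - constellation) ** 2 / sigma_d2
    p = np.exp(logp - logp.max())          # subtract max for numerical stability
    return p / p.sum()

# Example: a QPSK training symbol observed through the noisy side channel
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
sigma_d2 = 0.025
noise = np.sqrt(sigma_d2 / 2) * (np.random.randn() + 1j * np.random.randn())
d_t = soft_desired(qpsk[0] + noise, qpsk, sigma_d2)
print(d_t)   # soft probability vector concentrated on the transmitted symbol
```

With a clean side channel (σ_d² → 0) this degenerates to a one-hot vector, recovering the conventional hard training signal.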
2. DEVELOPMENT

The filter output is y[t] = h^H x_t. Given the filter output y[t], the a posteriori probability (APP) of a symbol q_i ∈ C at time t is

P(q[t] = q_i | y[t]) = p(y[t] | q[t] = q_i) P(q[t] = q_i) / Σ_{j=0}^{M−1} p(y[t] | q[t] = q_j) P(q[t] = q_j) ≜ φ_i(y[t]) ≜ φ_i[t],

where p(y[t] | q[t] = q_i) is the likelihood of the observation y[t] when q_i is the transmitted symbol. It is interesting to observe that this explicitly incorporates the prior probabilities. The vector of output probabilities

φ_t = [φ_0(y[t]), ..., φ_{M−1}(y[t])]^T = [φ_0(h^H x_t), ..., φ_{M−1}(h^H x_t)]^T ≜ [φ_0[t], ..., φ_{M−1}[t]]^T

is the soft symbol output of the adaptive filter. We consider, then, that the adaptive filter consists of an FIR processing stage, as is conventional, followed by the APP calculator.

In the case of complex zero-mean Gaussian noise, the equalized observation y[t] can be written as y[t] = q + n[t] for some signal constellation point q ∈ C, where n[t] is complex noise, n[t] ∼ N(0, σ²). Then

φ_i(y) = exp[−(1/σ²)|y − q_i|²] P(q_i) / Σ_{j=0}^{M−1} exp[−(1/σ²)|y − q_j|²] P(q_j).    (1)

In the case of equal prior probabilities (or ignoring the priors) and constant-magnitude signals, significant simplifications are possible:

φ_i(y) = exp[(2/σ²) Re(y* q_i)] / Σ_{j=0}^{M−1} exp[(2/σ²) Re(y* q_j)]
       = 1 / (1 + Σ_{j≠i} exp[−(2/σ²) Re(y*(q_i − q_j))]).    (2)

For convenience, throughout the remainder of this development we will deal with this simplified form. The derivative of (2) is

∂φ_i(y)/∂y = (1/σ²) Σ_{j≠i} (q_i − q_j)* exp[−(2/σ²) Re(y*(q_i − q_j))] / (1 + Σ_{j≠i} exp[−(2/σ²) Re(y*(q_i − q_j))])².
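As a concrete illustration of the APP stage, the following sketch evaluates the simplified posteriors (2) and their derivative for a given filter output y. This is our own illustrative code, not from the paper; the function names and the max-subtraction for numerical stability are our choices.

```python
import numpy as np

def soft_posteriors(y, constellation, sigma2):
    """phi_i(y) of eq. (2): posterior symbol probabilities for filter output y,
    assuming equal priors and constant-magnitude symbols."""
    logp = (2.0 / sigma2) * np.real(np.conj(y) * constellation)
    p = np.exp(logp - logp.max())          # stabilized softmax over the constellation
    return p / p.sum()

def soft_posterior_derivs(y, constellation, sigma2):
    """Element-wise derivative d phi_i / dy of eq. (2)."""
    q = constellation
    diffs = q[:, None] - q[None, :]                              # q_i - q_j
    e = np.exp(-(2.0 / sigma2) * np.real(np.conj(y) * diffs))
    num = (1.0 / sigma2) * np.sum(np.conj(diffs) * e, axis=1)    # j = i term is zero
    den = np.sum(e, axis=1) ** 2                                 # (1 + sum_{j != i} ...)^2
    return num / den
```

These two functions are reused in the adaptation sketches below.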
In what follows, we consider the (Euclidean) mean-squared error criterion and a Kullback-Leibler criterion as optimization criteria, and express adaptation rules in both the stochastic gradient descent (as in LMS adaptive filters) and recursive least-squares (RLS) formulations.

2.1. Stochastic Gradient Descent Adaptation

The first cost function represents the desire to minimize the mean Euclidean distance between a desired vector of probabilities d_t and the probability vector produced by the filter,

J(h) = E ||d[t] − φ_t||².

The gradient of this function is

∂J(h)/∂h* = −2 E[e_t^T φ'_t x_t],

where e_t = d[t] − φ_t is the probability error vector, φ'_t = [φ'_0(y[t]), ..., φ'_{M−1}(y[t])]^T is the element-by-element derivative of φ_t, and h* is the conjugate of the filter coefficient vector h. The conventional stochastic gradient approximation is now invoked, ∂J(h)/∂h* ≈ −2 e_t^T φ'_t x_t. Filter coefficients are updated by gradient descent,

h[t+1] = h[t] + μ e_t^T φ'_t x_t.    (3)

The update rule is thus similar to an LMS adaptive filter, except for the presence of the factor φ'_t. We refer to this as the LMS-E (for Euclidean) algorithm.

We may also contemplate adaptation with respect to the Kullback-Leibler distance. In this case, the cost function is the scaled relative entropy (Kullback-Leibler distance) between the probability distributions d = d_t and φ = φ_t:

J(h) = D(d_t || φ_t) = Σ_{i=0}^{M−1} d_i[t] log( d_i[t] / φ_i[t] ).

It is straightforward to show that

∂J(h)/∂h_j* = −Σ_{i=0}^{M−1} d_i[t] φ'_i(y[t]) x_j / φ_i(y[t]),

so that

∂J(h)/∂h* = −ε_t^T φ'_t x_t,

where ε_t is the probability ratio vector with components

ε_i = d_i[t] / φ_i(y[t]).

Filter coefficients are updated by gradient descent,

h[t+1] = h[t] + μ_KL ε_t^T φ'_t x_t.    (4)

The similarity of (3) and (4) is interesting: the only difference is that in one case the error is formed as a difference, while in the other it is formed as a ratio. Another observation is that (4) represents an exact gradient, so that the stochastic gradient approximation is not employed.
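A minimal sketch of one adaptation step under either criterion is given below, reusing soft_posteriors and soft_posterior_derivs from the sketch above; the function name and the criterion switch are our own illustrative choices.

```python
import numpy as np

def lms_soft_update(h, x_t, d_t, constellation, sigma2, mu, criterion="euclidean"):
    """One soft-input stochastic-gradient step: LMS-E update (3) or,
    with criterion="kl", the LMS-KL update (4)."""
    y = np.vdot(h, x_t)                                 # y[t] = h^H x_t
    phi = soft_posteriors(y, constellation, sigma2)
    dphi = soft_posterior_derivs(y, constellation, sigma2)
    if criterion == "euclidean":
        weight = d_t - phi                              # probability error e_t, eq. (3)
    else:
        weight = d_t / phi                              # probability ratio eps_t, eq. (4)
                                                        # (in practice phi may need a small floor)
    return h + mu * (weight @ dphi) * x_t               # h[t+1] = h[t] + mu * weight^T phi'_t x_t
```

In a training loop this function is applied once per received sample, with d_t either a one-hot vector (conventional training) or a soft probability vector.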
2.2. Recursive Least-Squares Adaptation

We can also consider adaptation using a least squares approach. Again consider the simplified form (2), which can be written as

φ_i(y) = f_i(y) / g(y),

so that

φ'_i(y) = f'_i(y)/g(y) − (f_i(y)/g(y)) (g'(y)/g(y)).

The function f_i(y) is exponential, so f'_i(y) = k_i(y) f_i(y) for some function k_i(y), so

φ'_i(y) = φ_i(y) [ k_i(y) − g'(y)/g(y) ].    (5)

Taking the derivative of the numerator of (2),

k_i(y) = (1/σ²) q_i*.    (6)

Also,

(g'/g)(y) = (1/σ²) Σ_{l=0}^{M−1} q_l* / ( Σ_j exp[−(2/σ²) Re{y*(q_l − q_j)}] ).    (7)

Setting up our RLS structure will require (7) to be linear in y. Figure 2 shows the real and imaginary parts of (g'/g)(y) for a QPSK constellation. For symmetric constellations, the real part is a monotonically increasing function of Re(y) and the imaginary part is a monotonically decreasing function of Im(y). A linearization about the origin yields

(g'/g)(y) ≈ (η/σ²) y*,

where

η = (2/(M σ²)) Σ_l Re(q_l) q_l* = (1/σ²) × { 1 for QPSK, 2 for BPSK }.

[Figure 2: real(g'/g) and imag(g'/g) plotted against Real(y) and Imag(y) for a QPSK constellation.]

Fig. 2. Re((g'/g)(y)) and Im((g'/g)(y)).

From (5) and (6) we obtain

φ'_i(y) ≈ (1/σ²) φ_i(y) (q_i* − η y*).

Using y[t] = h^H x_t, the approximate derivative is

∂φ_i(y[t])/∂h* = (∂φ_i(y[t])/∂y[t]) (∂y[t]/∂h*) ≈ (1/σ²) φ_i(y[t]) (q_i* − η x_t^H h) x_t.
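The linearization can be checked numerically. The following sketch (our own, assuming the simplified model (2), a unit-magnitude QPSK constellation, and σ² = 1) estimates the slope of Re((g'/g)(y)) along the real axis at the origin by a central difference; by the approximation above this slope should be close to η/σ².

```python
import numpy as np

def g_ratio(y, constellation, sigma2):
    """(g'/g)(y) for g(y) = sum_l exp[(2/sigma2) Re(y* q_l)], the denominator of (2)."""
    w = np.exp((2.0 / sigma2) * np.real(np.conj(y) * constellation))
    return (1.0 / sigma2) * np.sum(np.conj(constellation) * w) / np.sum(w)

qpsk = np.array([1, 1j, -1, -1j])
sigma2 = 1.0
delta = 1e-4
# Finite-difference slope of Re(g'/g) along the real axis at the origin;
# for a symmetric constellation this estimates eta / sigma2.
slope = np.real(g_ratio(delta, qpsk, sigma2) - g_ratio(-delta, qpsk, sigma2)) / (2 * delta)
print(slope)
```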
The Euclidean least-squares criterion

J(h) = Σ_{t=1}^{N} ||d_t − φ_t||²

has the approximate gradient

∂J(h)/∂h* = Σ_{t=1}^{N} Σ_i e_i[t] φ_i(y[t]) (1/σ²) [ η x_t x_t^H h − x_t q_i* ].

Setting this gradient to zero and solving, we obtain the normal equations

Σ_{t=1}^{N} Σ_i η e_i[t] φ_i(y[t]) x_t x_t^H h = Σ_{t=1}^{N} Σ_i e_i[t] q_i* φ_i(y[t]) x_t.    (8)

To simplify the form of the equation, let Q = diag(q_0, q_1, ..., q_{M−1}). Also, let η e_t^T φ_t = γ[t] and e_t^T Q* φ_t = β[t], simplifying (8) to

Σ_{t=1}^{N} γ[t] x_t x_t^H h = Σ_{t=1}^{N} β[t] x_t.

This is readily expressible using the conventional recursive update based on the matrix inversion lemma. Let

R_N^{−1} = Σ_{t=1}^{N} γ[t] x_t x_t^H   and   r_N = Σ_{t=1}^{N} β[t] x_t,

so that R_N^{−1} h = r_N, which has the solution h_N = R_N r_N.    (9)

R_N^{−1} and r_N can be recursively updated with a forgetting factor ρ as

R_{N+1}^{−1} = ρ R_N^{−1} + γ[N+1] x_{N+1} x_{N+1}^H,
r_{N+1} = ρ r_N + β[N+1] x_{N+1}.

Using the matrix inversion lemma, the recursive update for R_{N+1} can be computed as

R_{N+1} = ( ρ R_N^{−1} + γ[N+1] x_{N+1} x_{N+1}^H )^{−1}
        = (1/ρ) [ R_N − R_N x_{N+1} x_{N+1}^H R_N / ( ρ/γ[N+1] + x_{N+1}^H R_N x_{N+1} ) ].    (10)

Let

K_{N+1} = R_N x_{N+1} / ( ρ/γ[N+1] + x_{N+1}^H R_N x_{N+1} ).    (11)

Then (10) becomes

R_{N+1} = (1/ρ) [ R_N − K_{N+1} x_{N+1}^H R_N ].

From (9), the coefficient update rule becomes

h_{N+1} = R_{N+1} r_{N+1} = h_N + K_{N+1} ( β[N+1]/γ[N+1] − y*[N+1] ).    (12)

In summary, the following steps implement the soft-input RLS algorithm with the least-squares criterion (one pass through these steps is sketched in code below):

• Compute the filter output y[t+1] = h_t^H x_{t+1}.
• Compute the error e_{t+1} = d_{t+1} − φ_{t+1}.
• Use (11) to compute the Kalman gain vector.
• Update the filter coefficients using (12).
• Update the R_t matrix with (10).
• Repeat.
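A minimal sketch of one pass through these steps, reusing soft_posteriors from the earlier sketch. The function name, argument order, and the choice to carry R (the inverse of the weighted correlation sum, as in (9)) as state are our own; the paper specifies only the recursions (10)–(12), and no numerical safeguards are added here.

```python
import numpy as np

def rls_soft_update(h, R, x_new, d_new, constellation, sigma2, eta, rho):
    """One step of the soft-input RLS-E recursion, eqs. (10)-(12).
    R plays the role of R_N in (9); rho is the forgetting factor and
    eta the linearization constant."""
    y = np.vdot(h, x_new)                                   # filter output y[N+1] = h^H x
    phi = soft_posteriors(y, constellation, sigma2)
    e = d_new - phi                                         # probability error e[N+1]
    gamma = eta * (e @ phi)                                 # gamma[N+1] = eta e^T phi
    beta = e @ (np.conj(constellation) * phi)               # beta[N+1]  = e^T Q* phi
    Rx = R @ x_new
    K = Rx / (rho / gamma + np.vdot(x_new, Rx))             # Kalman gain vector, eq. (11)
    h_new = h + K * (beta / gamma - np.conj(y))             # coefficient update, eq. (12)
    R_new = (R - np.outer(K, np.conj(x_new)) @ R) / rho     # R update, eq. (10)
    return h_new, R_new
```

As with conventional RLS, R would typically be initialized to a scaled identity matrix and h to a vector of zeros (or a rough initial equalizer estimate).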
Now consider the Kullback-Leibler criterion in a least-squares setting, with the error defined as ε_i[t] = d_i[t]/φ_i[t]. Proceeding similarly as before,

∂J(h)/∂h* = −Σ_t Σ_i ε_i[t] φ_i(y[t]) (1/σ²) [ x_t x_t^H h − x_t q_i* ].

Setting this gradient to zero yields the normal equations

Σ_{t=1}^{N} Σ_i ε_i[t] φ_i(y[t]) x_t x_t^H h = Σ_{t=1}^{N} Σ_i ε_i[t] q_i* φ_i(y[t]) x_t,

or, vectorizing,

Σ_{t=1}^{N} ε_t^T φ_t x_t x_t^H h = Σ_{t=1}^{N} ε_t^T Q* φ_t x_t.

The rest of the RLS-KL algorithm proceeds as in the Euclidean case.

[Figure 3: channel magnitude response (dB) versus normalized frequency (×π rad/sample).]

Fig. 3. Test channel magnitude response.

[Figure 4: squared error versus training epoch for LMS with clean d, LMS with noisy d, and soft-in LMS-E.]

Fig. 4. Error measures for conventional and soft-in LMS.

3. SOME RESULTS
The adaptive filters were tested in an equalization setting, using a 15-tap complex channel with the frequency response shown in Figure 3. Figure 4 shows the average squared error for σ² = 0.05 (Eb/N0 = 15 dB) for a 50-tap adaptive equalizer, where σ_d² is 3 dB less than σ², μ = 0.01, and results are averaged over 1000 independent runs. The green trace shows conventional LMS. The red trace shows the convergence of LMS when the desired signal is corrupted, but the algorithm is otherwise standard LMS. The cyan line shows the soft-input LMS with the Euclidean error measure. It should be recognized that this is actually a different error measure, being the Euclidean measure ||d_t − φ_t||², not |y[t] − d[t]|². But notice that this error measure continues to decrease long after the squared error flattens out. The reason for the ongoing decrease becomes apparent in Figure 5, which shows the trajectories of the filter coefficients for the LMS and soft-in LMS-E algorithms. The soft-input filter continues to adapt, pushing toward ever more certain probabilities, instead of simply attempting to match the symbol exactly.

Figure 6 shows the distance measure for the soft-input LMS-KL. To achieve stability, we have set μ_KL = μ/10 = 0.001. As before, the filter coefficients continue to adapt, driving the error measure ever smaller.

Figure 7 shows convergence results for RLS, with some comparisons against LMS (μ = 0.01). Again, the noisy desired data results in higher error, and again the soft-input RLS continues to drive its error measure lower. Convergence is not as direct, as Figure 8 shows: there is a period during which the coefficients vary wildly before settling down to their correct values.
[Figure 5: |H| coefficient trajectories versus training epoch for LMS and soft LMS-E.]

Fig. 5. Adaptive filter coefficients for LMS and soft-in LMS-E.

[Figure 6: KL error versus training epoch, and |H| coefficient trajectories for soft LMS-KL.]

Fig. 6. KL distance and adaptive filter coefficients.

[Figure 7: error versus training epoch for LMS (clean d), RLS (clean d), RLS (noisy d), and soft RLS-E.]

Fig. 7. Error measures for RLS, noisy RLS, and soft RLS-E, compared with LMS.

[Figure 8: |H| coefficient trajectories versus training epoch for RLS and soft RLS-E.]

Fig. 8. Coefficient adaptation for RLS and RLS-E.

4. REFERENCES

[1] M. Tüchler, R. Koetter, and A. C. Singer, "Turbo Equalization: Principles and New Results," IEEE Trans. Comm., vol. 50, pp. 754–767, May 2002.

[2] R. Koetter, A. C. Singer, and M. Tüchler, "Turbo equalization," IEEE Signal Processing Magazine, pp. 67–80, Jan. 2004.

[3] M. Tüchler, A. C. Singer, and R. Koetter, "Minimum mean squared error equalization using a priori information," IEEE Trans. Signal Processing, vol. 50, pp. 673–683, Mar. 2002.

[4] L. Hanzo, T. Liew, and B. Yeap, Turbo Coding, Turbo Equalization and Space-Time Coding for Transmission Over Fading Channels. West Sussex, England: Wiley, 2002.

[5] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate," IEEE Trans. Info. Theory, vol. 20, pp. 284–287, Mar. 1974.

[6] T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms. Wiley, 2005.

[7] M. Tüchler, A. C. Singer, and R. Koetter, "Minimum Mean Squared Error Equalization Using A Priori Information," IEEE Trans. Signal Processing, vol. 50, pp. 673–683, Mar. 2002.