IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 55, NO. 7, JULY 2007
Transactions Papers
Soft-Input Soft-Output Equalizers for Turbo Receivers: A Statistical Physics Perspective
Mauri Nissilä and Subbarayan Pasupathy, Fellow, IEEE
Abstract—Many algorithms in signal processing and digital communications must compute the probabilities of hidden state variables given the observations, i.e., solve the inference problem, and must also estimate the model parameters. Such an inference and estimation problem is encountered, e.g., in adaptive turbo equalization/demodulation, where soft information about the transmitted data symbols has to be inferred in the presence of channel uncertainty, given the received signal samples and a priori information provided by the decoder. An exact inference algorithm computes the a posteriori probability (APP) values for all transmitted symbols, but the computation of APPs is known to be an NP-hard problem, which renders this approach computationally prohibitive in most cases. In this paper, we show how many of the well-known low-complexity soft-input soft-output (SISO) equalizers, including the channel-matched filter-based linear SISO equalizers and minimum mean square error (MMSE) SISO equalizers, as well as the expectation-maximization (EM) algorithm-based SISO demodulators in the presence of the Rayleigh fading channel, can be formulated as solutions to a variational optimization problem. Variational optimization is a well-established methodology for low-complexity inference and estimation, originating from statistical physics. Importantly, the imposed variational optimization framework provides an interesting link between the APP demodulators and the linear SISO equalizers. Moreover, it provides a new set of insights into the structure and performance of these widely celebrated linear SISO equalizers while suggesting their fine tuning as well.
Index Terms—Linear equalizers, mean field inference, turbo receivers, variational optimization.
I. INTRODUCTION
IN THE COMBAT against the detrimental effects of intersymbol interference (ISI), turbo receivers/equalizers have gained an enormous burst of research interest in recent years. They are based on the likelihood processing where the equalization and decoding tasks are iteratively combined on the same set of received data and where feedback information
Paper approved by B. Hochward, the Editor for Communication and Coding Theory of the IEEE Communications Society. Manuscript received October 15, 2004; revised August 30, 2006. This paper was presented in part at the 17th Annual IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications, September 2006. M. Nissilä was with VTT Technical Research Center of Finland, Oulu FIN-90571, Finland. He is now with Nokia Siemens Networks, Oulu FIN-90630, Finland. S. Pasupathy is with the Department of Electrical and Computer Engineering, University of Toronto, ON M5S 3G4, Canada. Digital Object Identifier 10.1109/TCOMM.2007.900609
from the decoder is incorporated into the equalization process [1]. In essence, the soft-input soft-output (SISO) equalizer/demodulator and the SISO decoder exchange probabilistic information on the data symbols, most preferably in the form of symbol a posteriori probabilities (APPs), with each other. The computation of exact APPs is called an inference problem, and it is known to be an NP-hard problem. Therefore, various approaches to reduce the computational complexity of the turbo equalizers in both the single antenna and multiple antenna channels have appeared in recent years, as demonstrated by [2]–[13] without any attempt to make the list exhaustive. Specifically, the prefiltering of the received signal before APP computation was proposed as a method for complexity reduction in [7], whereas in [6] the prefilter was tailored for M-BCJR algorithm. In [5], it was shown that the sum-product algorithm operating on a loopy factor graph (LFG), where the data symbols form a set of variable nodes, converges to a good approximation of the exact APPs as long as the factor graph has a girth of at least six. Furthermore, linear SISO equalizers consisting of forward and feedback finite impulse response filters jointly optimized by using the minimum mean square error (MMSE) criterion were proposed for single antenna systems in [2]–[4], [14] and for the multiple-input multiple-output (MIMO) channels in [8], [9]. In [2], the filter coefficients were computed under the assumption that perfect a priori information on transmitted symbols is available, whereas in other cases the estimated a priori information obtained from the decoder is taken into account in computation of the filter coefficients. The other techniques to reduce the computational complexity of the turbo receivers include the list sphere decoding [13], the semidefinite relaxation of the maximum-likelihood (ML) detection problem [10], or sequential Monte Carlo methodology [11], [12]. 
The marginalization of some global function is a very generic problem in turbo processing, and is frequently encountered in many other research fields as well, such as in neural computation, artificial intelligence, and machine learning. Therefore, the development of approximate methods for the computation of marginals, i.e., approximate inference, has been a subject for active research within these seemingly different fields for years [15]–[19]. One of the most popular approaches for the approximate inference is the variational optimization, a technique having its origin in the statistical physics [20]. The main aim of this paper is to show that many of the well-known linear SISO equalizers, like the channel-matched
0090-6778/$25.00 © 2007 IEEE
NISSILÄ AND PASUPATHY: SOFT INPUT SOFT OUTPUT EQUALIZERS
filter-based linear SISO equalizers [2] and the linear MMSE SISO equalizers [3], as well as the joint equalization/demodulation and soft decision directed (SDD) channel estimation algorithms [21] can be interpreted as specific instances of the generic variational free energy minimization (VFEM) algorithm. Importantly, the imposed variational optimization framework seems to provide an interesting link between the “optimal” APP demodulator and the linear SISO equalizers, and for its own part, tries to characterize the intimate relationship between them. In addition, it provides a new set of insights into the structure and performance of these algorithms, and suggests some small improvements on them as well. But the formulation of the linear SISO equalizers as solutions to the variational optimization problem may have even more far-reaching implications. It essentially supplements the earlier contributions, which relate the standard turbo-decoding algorithms to the generic belief propagation algorithm (BPA) operating on the LFG [22] and show that any stationary point of the BPA operating on the LFG can be interpreted as a solution to the constrained free energy minimization problem [17], [23]. This paper is structured as follows. In Section II, the baseband signal model is presented, and the problem definition and short review of the variational optimization are given. In Section III, three existing examples of SISO equalizers/demodulators are reinterpreted using a statistical physics perspective. In Section IV, discussion and some numerical performance results are provided, and the conclusions are finally drawn in Section V.
II. SYSTEM DESCRIPTIONS
A. Baseband Signal Model
In this paper, we consider the transmission of a block of coded, linearly modulated data symbols over a frequency-selective channel. Without loss of generality, we assume a single-antenna transmission system throughout the paper. Using discrete-time baseband system modeling with symbol-rate sampling, the received signal vector can be expressed as

$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n} \tag{1}$$

where $\mathbf{H}$ denotes the channel convolution matrix, $\mathbf{s}$ denotes a column vector of transmitted data symbols, and $\mathbf{n}$ denotes a column vector of independent and identically distributed complex Gaussian noise samples with zero mean and variance $\sigma_n^2$. Each data symbol $s_k$ in $\mathbf{s}$ takes values in the discrete space $\mathcal{S} = \{\alpha_1, \ldots, \alpha_P\}$, where $P$ denotes the number of complex constellation points of the symbol mapper.

B. Turbo Receiver Principles
An optimal (lossless) decoding for the coded signal in the presence of an ISI channel is obtained as $\hat{\mathbf{s}} = \arg\max_{\mathbf{s}} p(\mathbf{s} \mid \mathbf{r})$, where $p(\mathbf{s} \mid \mathbf{r})$ represents the a posteriori distribution of the codewords given the vector $\mathbf{r}$:

$$p(\mathbf{s} \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid \mathbf{s})\, I_C(\mathbf{s})}{\sum_{\mathbf{s}} p(\mathbf{r} \mid \mathbf{s})\, I_C(\mathbf{s})}. \tag{2}$$

In (2), $p(\mathbf{r} \mid \mathbf{s})$ denotes the probability density function of $\mathbf{r}$ given the vector $\mathbf{s}$, and $I_C(\mathbf{s})$ denotes the indicator function for all symbol vectors $\mathbf{s}$ that satisfy the code constraints $C$. The optimal receiver is very complex, since the symbol sequence probabilities have to be computed for all permissible sequences, taking values in the Cartesian product space $\mathcal{S}^K$, where $K$ denotes the length of the data burst. In turbo processing, the complexity issue is therefore addressed in two different ways: 1) by iterative processing, where the symbol sequence distribution is updated iteratively in separate demodulation and decoding blocks that exchange probabilistic information with each other in the form of extrinsic information; and 2) by assuming that the symbol probabilities at the output of each block are independent in distinct symbol intervals. In fact, the validity of this independence assumption eventually determines how well the turbo receiver performs with respect to the optimal receiver. So, the basic problem in SISO demodulation, as well as in SISO decoding, is to find an approximate factorization of the symbol sequence distribution such that the approximation error due to the factorization is minimized with respect to some appropriate distance measure. An intuitively appealing choice is to approximate the symbol sequence distribution with the product of marginal distributions, and this is what is commonly used in turbo processing. This implies that the SISO demodulator approximates the symbol sequence distribution as

$$p(\mathbf{s} \mid \mathbf{r}) \approx p_r(\mathbf{s}) = \frac{1}{p(\mathbf{r})}\, p(\mathbf{r} \mid \mathbf{s})\, p(\mathbf{s}) \approx \prod_k p_r(s_k). \tag{3}$$

Here, $p(\mathbf{s})$ denotes the a priori distribution of $\mathbf{s}$, which, in accordance with the turbo principle, is postulated to be of product form as $p(\mathbf{s}) = \prod_k \lambda_{2,k}(s_k)$, where $\lambda_{2,k}(s_k)$ represents the extrinsic information on $s_k$ delivered by the channel decoder via the interleaver. The marginal distribution $p_r(s_k)$ is computed as $p_r(s_k) = \sum_{\mathbf{s} \setminus s_k} p_r(\mathbf{s})$, where the summation is performed over all symbols in $\mathbf{s}$ except the symbol $s_k$ ("summary for $s_k$"). A theoretical justification for the marginalization in the light of information divergence is given by Lemma 1.

Lemma 1: The product of marginals can be interpreted as an optimal factorization in the sense of minimizing the Kullback–Leibler (KL) divergence between the object distribution $p_r(\mathbf{s})$ and any fully factorized distribution

$$\prod_k p_r(s_k) = \arg\min_{Q(\mathbf{s})} D\big(p_r(\mathbf{s}) \,\|\, Q(\mathbf{s})\big) \tag{4}$$

where $Q(\mathbf{s})$ is constrained to be a product of arbitrary symbol probabilities $q_k(s_k)$, i.e., $Q(\mathbf{s}) = \prod_k q_k(s_k)$. The KL divergence between the distributions $p_r(\mathbf{s})$ and $Q(\mathbf{s})$ is defined as

$$D\big(p_r(\mathbf{s}) \,\|\, Q(\mathbf{s})\big) = \sum_{\mathbf{s}} p_r(\mathbf{s}) \ln \frac{p_r(\mathbf{s})}{Q(\mathbf{s})}. \tag{5}$$

Proof: See Appendix.
The KL divergence can be regarded as a distance measure between two distributions, although it is generally asymmetric, i.e., $D(p_r \,\|\, Q) \neq D(Q \,\|\, p_r)$. The symbol APPs $p_r(s_k)$ are efficiently computed with the standard forward-backward
processing (BCJR) algorithm [24], but the complexity grows exponentially as a function of the channel memory length $L$. A pertinent question at this point is whether the minimization of $D(Q(\mathbf{s}) \,\|\, p_r(\mathbf{s}))$ produces any feasible, less complex solution. The answer is in the affirmative, as will be shown shortly. Meanwhile, however, we will discuss how the KL divergence $D(Q(\mathbf{s}) \,\|\, p_r(\mathbf{s}))$ relates to the statistical physics concept called variational free energy.

C. Variational Free Energy Minimization
The distribution $p_r(\mathbf{s})$ can be expressed in the form of a Boltzmann distribution as

$$p_r(\mathbf{s}) = \frac{1}{Z}\, e^{-\frac{1}{T} E(\mathbf{s})} \tag{6}$$

where $E(\mathbf{s})$, called the energy of the sequence $\mathbf{s}$, is the negative posterior log-likelihood scaled by $T$ (up to an additive constant) and is given as

$$E(\mathbf{s}) = -\mathbf{r}^H \mathbf{H}\mathbf{s} - \mathbf{s}^H \mathbf{H}^H \mathbf{r} + \mathbf{s}^H \mathbf{H}^H \mathbf{H}\mathbf{s} - T \ln p(\mathbf{s}). \tag{7}$$
Furthermore, $Z$ is a normalizing constant (called the partition function in statistical physics) and the temperature $T$ corresponds to the variance $\sigma_n^2$. The Helmholtz free energy, defined as $F_{\mathrm{Helmholtz}} = -\ln Z$, is a fundamentally important quantity in statistical physics, since it can be used to characterize the underlying physical system [20]. Physicists have, therefore, developed a wide range of techniques to calculate the Helmholtz free energy either exactly or approximately. The most important approximate methods are Monte Carlo probabilistic resampling algorithms and deterministic variational inference algorithms. Before describing the variational inference algorithms, we first consider the problem of computing the Helmholtz free energy as a variational optimization problem. For that purpose, we introduce a variational ("trial") probability distribution $Q(\mathbf{s})$, for which we define a variational free energy as $F(Q) = U(Q) - H(Q)$, where $U(Q)$ is a variational average energy defined as $U(Q) = \frac{1}{T} \sum_{\mathbf{s}} Q(\mathbf{s}) E(\mathbf{s})$ and $H(Q)$ is the entropy of $Q(\mathbf{s})$ given as $H(Q) = -\sum_{\mathbf{s}} Q(\mathbf{s}) \ln Q(\mathbf{s})$. Using routine calculation it can be shown (see [17]) that

$$F(Q) = -\ln Z + D\big(Q(\mathbf{s}) \,\|\, p_r(\mathbf{s})\big). \tag{8}$$
Since $D(Q \,\|\, p_r)$ is always nonnegative, and is zero if and only if $Q(\mathbf{s}) = p_r(\mathbf{s})$, it immediately follows that the variational free energy $F(Q)$ gives an upper bound for the Helmholtz free energy, and the target distribution $p_r(\mathbf{s})$ can be interpreted as the distribution that minimizes the variational free energy, i.e., $p_r(\mathbf{s}) = \arg\min_{Q(\mathbf{s})} F(Q)$. Importantly, the treatment of the inference problem as an optimization problem via variational free energy minimization (VFEM) suggests some fruitful approaches to obtain approximate inference algorithms. For example, the key element of the mean field approximation is to upper-bound $F_{\mathrm{Helmholtz}}$ by minimizing $F(Q)$ over a restricted class of probability distributions. In particular, the fully factorized form of $Q(\mathbf{s})$ leads to what is often called a naive mean field solution [18]. The computational tractability of the VFEM approach is largely due to the fact that the computation of the KL divergence $D(Q \,\|\, p_r)$ entails taking the expectation under the variational distribution $Q$, instead of under the distribution $p_r$ as in (5).
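As a sanity check on (8), the following minimal sketch evaluates both sides by brute-force enumeration on a toy problem. It is not taken from the paper: it assumes real-valued BPSK symbols, a hypothetical three-symbol burst, and illustrative values for `H`, `r`, and `T`, and the function names are invented for this example. The energy is the real-valued analogue of (7).

```python
import itertools
import math

def energy(s, r, H, T, prior):
    """Real-valued analogue of (7): E(s) = -2 r^T H s + ||H s||^2 - T ln p(s)."""
    Hs = [sum(row[k] * s[k] for k in range(len(s))) for row in H]
    cross = sum(ri * xi for ri, xi in zip(r, Hs))
    quad = sum(xi * xi for xi in Hs)
    log_prior = sum(math.log(prior[k][sk]) for k, sk in enumerate(s))
    return -2.0 * cross + quad - T * log_prior

def free_energy_check(r, H, T, prior, q):
    """Return (F(Q), -ln Z, D(Q || p_r)) by enumerating all BPSK sequences."""
    K = len(prior)
    seqs = list(itertools.product((-1, 1), repeat=K))
    E = {s: energy(s, r, H, T, prior) for s in seqs}
    Z = sum(math.exp(-E[s] / T) for s in seqs)            # partition function
    p_r = {s: math.exp(-E[s] / T) / Z for s in seqs}      # Boltzmann form (6)
    Q = {s: math.prod(q[k][sk] for k, sk in enumerate(s)) for s in seqs}
    U = sum(Q[s] * E[s] for s in seqs) / T                # variational average energy
    entropy = -sum(Q[s] * math.log(Q[s]) for s in seqs)   # entropy of Q
    kl = sum(Q[s] * math.log(Q[s] / p_r[s]) for s in seqs)
    return U - entropy, -math.log(Z), kl

# Illustrative (assumed) values: 3 symbols through the taps [0.407, 0.815, 0.407].
H = [[0.407, 0.0, 0.0],
     [0.815, 0.407, 0.0],
     [0.407, 0.815, 0.407],
     [0.0, 0.407, 0.815],
     [0.0, 0.0, 0.407]]
r = [0.5, 0.3, 0.9, -0.2, -0.4]
T = 0.5
prior = [{-1: 0.5, 1: 0.5}] * 3                                  # uniform a priori information
q = [{-1: 0.2, 1: 0.8}, {-1: 0.6, 1: 0.4}, {-1: 0.7, 1: 0.3}]    # arbitrary trial Q

F, neg_log_Z, kl = free_energy_check(r, H, T, prior, q)
print(F, neg_log_Z + kl)   # the two numbers agree up to rounding
```

Because the KL term is nonnegative, the same run also illustrates that $F(Q)$ upper-bounds $-\ln Z$ for any trial distribution. Enumeration is only feasible here because the burst is tiny; it is the exponential growth in $K$ that motivates the restricted families of $Q$ discussed above.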
III. FORMULATION OF EXAMPLE EQUALIZERS AS INSTANCES OF VFEM ALGORITHM
A. Matched-Filter (MF)-Based Linear SISO Equalizers
A new interpretation for the MF-based linear SISO equalizers, proposed earlier by several authors (e.g., [2], [3], [25]), using the statistical physics perspective is given by the following proposition.

Proposition 1: Consider the minimization of $F(Q)$ subject to the constraint that the variational distribution $Q(\mathbf{s})$ is fully factorized. The fixed point equations for solving this VFEM problem result in the output distribution of the MF-based linear SISO equalizer, defined as $Q_{\mathrm{MF}}(\mathbf{s}) = \prod_k q_k(s_k)$, where

$$q_k(s_k) = \frac{1}{\gamma_k}\, \lambda_{2,k}(s_k) \exp\!\left(-\frac{E_k}{T}\, |s_k - \hat{s}_k|^2\right). \tag{9}$$

Here, $\gamma_k$ is a normalizing constant and $\hat{s}_k$ denotes the output of the linear SISO equalizer given as

$$\hat{s}_k = \frac{1}{E_k}\, \mathbf{e}_k^H \mathbf{H}^H (\mathbf{r} - \mathbf{H}\bar{\mathbf{s}} + \mathbf{H}\mathbf{e}_k \bar{s}_k) = \mathbf{c}_{k,\mathrm{MF}}^H (\mathbf{r} - \mathbf{H}\bar{\mathbf{s}} + \mathbf{H}\mathbf{e}_k \bar{s}_k) \tag{10}$$

where $E_k = \mathbf{e}_k^H \mathbf{H}^H \mathbf{H} \mathbf{e}_k$ denotes the energy of the symbol $s_k$ at the receiver, $\mathbf{e}_k$ denotes the $k$th unit vector, and the $k$th soft symbol $\bar{s}_k$ is given as

$$\bar{s}_k = E_{q_k}[s_k] = \sum_{\alpha_i \in \mathcal{S}} \alpha_i\, q_k(s_k = \alpha_i) \tag{11}$$
where $E_{q_k}[\cdot]$ denotes expectation under the distribution $q_k$.
Proof: See Appendix.
Note that, according to the turbo principle, the extrinsic output distribution of the MF-based linear SISO equalizer is defined as $\Lambda_1(\mathbf{s}) = \prod_k \lambda_{1,k}(s_k)$, where $\lambda_{1,k}(s_k) = q_k(s_k)/\lambda_{2,k}(s_k)$. Thus, the soft output of the standard MF-based linear SISO equalizer can be interpreted as the mean field approximation to the objective function $p_r(\mathbf{s})$. Importantly, this novel interpretation provides new insight and suggests some approaches for fine-tuning the existing algorithm. In particular, the following points are worth noting.
1) The soft symbol estimates, which are used in the soft interference cancellation, are computed by exploiting the full APP information on the channel symbols. The full APP information is the sum of the information induced by the coding, the estimate of which is provided by the channel decoder in the form of extrinsic information $\lambda_2$, and the ISI-channel-induced information embodied by the soft output of the equalizer, i.e., $\ln q_k = \ln \lambda_{2,k} + \ln \lambda_{1,k}$. This is in contrast to most of the earlier contributions dealing with the MF-based linear SISO equalizers, where only the extrinsic output of the decoder is used for the soft interference cancellation. An exception is [26], where a similar remark concerning the use of full APP information in the context of MF-based linear SISO equalizers was made using ad hoc reasoning.
2) The variational free energy $F(Q)$ is not a convex function of $q_k$, which entails the possible existence of multiple local minima to which the mean field algorithm can converge. The risk of being trapped in a bad local minimum can, however, be reduced by using the mean
field annealing technique [27]. A key idea in the mean field annealing technique is to vary the value of the temperature $T$ as the iterations evolve, starting from some high temperature value and then cooling the system down as the iterations proceed.
3) The mean field approximation is also notorious because it overestimates the entropy of the variational distribution. This is due to the fact that the actual probabilistic interactions between the neighboring symbols are approximated by their average influence (mean field) on any specific symbol, i.e., the symbols are assumed to be statistically independent under the variational distribution, as shown by the factor graph description in Fig. 1. Therefore, an improvement may be obtained either by defining a more sophisticated structure for the trial distribution $Q$ or by modifying the variational free energy term $F(Q)$ itself so that the probabilistic interactions between symbols can be taken into account in a more sophisticated way. Perhaps the most successful examples of the latter approach are the Bethe method, leading to loopy belief propagation [17], and the Kikuchi approximation [17], [28].

Fig. 1. Factor graph for $p_r(\mathbf{s})$ using: (a) exact factorization; and (b) mean field factorization.

B. Linear MMSE SISO Equalizers
The linear MMSE SISO equalizers have been proposed earlier by several authors [3], [4], [8], [9], [29]. In the following proposition, a statistical physics interpretation of these well-known equalizers is provided by applying the VFEM technique. To this end, the requirement that the data symbols take values only in the discrete space is relaxed, and instead they are assumed to take on values in the continuous complex-valued space, i.e., $\mathbf{s} \in \mathbb{C}^K$. In addition, the data symbols are regarded as random variables with an independent Gaussian a priori probability density function (pdf), i.e., $p(\mathbf{s}) = \mathcal{CN}(\mathbf{s}, \bar{\mathbf{s}}, \mathbf{V})$, where $\bar{\mathbf{s}}$ denotes a mean vector and $\mathbf{V}$ denotes a diagonal covariance matrix with $v_k$ being the $k$th diagonal element.

Proposition 2: Consider the minimization of $F(Q)$ subject to the constraint that the variational distribution is $Q(\mathbf{s}) = \mathcal{CN}(\mathbf{s}, \hat{\mathbf{s}}, \boldsymbol{\Sigma})$, where the mean vector $\hat{\mathbf{s}}$ and the diagonal covariance matrix $\boldsymbol{\Sigma}$ (with the $k$th diagonal element denoted as $\sigma_{s_k}^2$) are the unknown parameters to be optimized. This minimization results in the output distribution of the linear MMSE SISO equalizer, defined as $Q_{\mathrm{MMSE}}(\mathbf{s}) = \mathcal{CN}(\mathbf{s}, \hat{\mathbf{s}}, \boldsymbol{\Sigma})$, where

$$\hat{\mathbf{s}} = \bar{\mathbf{s}} + \mathbf{V}\mathbf{H}^H \left(\mathbf{H}\mathbf{V}\mathbf{H}^H + \sigma_n^2 \mathbf{I}\right)^{-1} (\mathbf{r} - \mathbf{H}\bar{\mathbf{s}}) \tag{12}$$

$$\sigma_{s_k}^2 = \frac{\sigma_n^2}{\mathbf{e}_k^H \left(\mathbf{H}^H \mathbf{H} + \sigma_n^2 \mathbf{V}^{-1}\right) \mathbf{e}_k}. \tag{13}$$

Proof: See Appendix.
As in the previous case, the standard turbo processing entails the exchange of extrinsic information between the equalizer and decoder blocks. In particular, the extrinsic output distribution of the MMSE SISO equalizer is given as $\Lambda_1(\mathbf{s}) = \prod_k \lambda_{1,k}(s_k)$, where $\lambda_{1,k}(s_k) = \mathcal{CN}\big(s_k, \hat{s}_k^{(\mathrm{extr})}, (\sigma_{s_k}^2)^{(\mathrm{extr})}\big)$ can be obtained from $Q_{\mathrm{MMSE}}(\mathbf{s})$ by setting $\bar{s}_k = 0$ and $v_k = 1$. Specifically

$$\hat{s}_k^{(\mathrm{extr})} = \mathbf{e}_k^H \mathbf{V}\mathbf{H}^H \left(\mathbf{H}\mathbf{V}\mathbf{H}^H + \sigma_n^2 \mathbf{I} + (1 - v_k)\, \mathbf{H}\mathbf{e}_k \mathbf{e}_k^H \mathbf{H}^H\right)^{-1} (\mathbf{r} - \mathbf{H}\bar{\mathbf{s}} + \mathbf{H}\mathbf{e}_k \bar{s}_k) \tag{14}$$

$$\left(\sigma_{s_k}^2\right)^{(\mathrm{extr})} = \frac{\sigma_n^2}{\mathbf{e}_k^H \mathbf{H}^H \mathbf{H} \mathbf{e}_k + \sigma_n^2} = \frac{\sigma_n^2}{E_k + \sigma_n^2} = \frac{1}{\mathrm{SNR}_k + 1} \tag{15}$$

where $\mathrm{SNR}_k = E_k / \sigma_n^2$ denotes the signal-to-noise ratio at time $k$. So, the familiar MMSE SISO equalizer can be interpreted as an instance of what we refer to as the Bayesian mean field (BMF) approximation. Again, this novel interpretation yields some new insight into this familiar problem, as stated by the following remarks.
1) The Hessian of $F(Q)$ with respect to the mean vector [i.e., $\partial^2 F(Q) / \partial \hat{\mathbf{s}}\, \partial \hat{\mathbf{s}}^H$] and with respect to the variance vector $\mathrm{diag}(\boldsymbol{\Sigma})$ [i.e., $\partial^2 F(Q) / \partial\, \mathrm{diag}(\boldsymbol{\Sigma})\, \partial\, \mathrm{diag}(\boldsymbol{\Sigma})^H$] is positive semidefinite, implying that $F(Q)$ is convex in terms of both the mean vector and the variance vector. Due to convexity, the variational free energy in this case has only one minimum, which is a global minimum.
2) It can be easily verified by exploiting the joint Gaussianity of $\mathbf{r}$ and $\mathbf{s}$ that

$$\arg\min_{\hat{\mathbf{s}}, \boldsymbol{\Sigma}} D\big(\mathcal{CN}(\mathbf{s}, \hat{\mathbf{s}}, \boldsymbol{\Sigma}) \,\|\, p_r(\mathbf{s})\big) = \arg\min_{\hat{\mathbf{s}}, \boldsymbol{\Sigma}} D\big(p_r(\mathbf{s}) \,\|\, \mathcal{CN}(\mathbf{s}, \hat{\mathbf{s}}, \boldsymbol{\Sigma})\big). \tag{16}$$
So, from (4) and (16) we notice that the output pdf of the MMSE SISO equalizer can be expressed as a product of marginals of the target distribution $p_r(\mathbf{s})$, i.e., $Q_{\mathrm{MMSE}}(\mathbf{s}) = \prod_k p_r(s_k)$, where $p_r(s_k) = \int_{\mathbf{s}_{\setminus k}} p_r(\mathbf{s})\, d\mathbf{s}_{\setminus k}$.
3) The Bayesian mean field approach postulates the existence of the a priori pdf of the data vector $\mathbf{s}$, $p(\mathbf{s})$. But how should this a priori pdf be specified? At the first iteration, a reasonable choice is the Gaussian pdf with zero mean and unit variance, i.e., $p(\mathbf{s}) = \mathcal{CN}(\mathbf{s}, \mathbf{0}, \mathbf{I})$, leading to the standard MMSE equalizer. At later iterations, an intuitively appealing choice for the a priori pdf is the Gaussian pdf whose mean vector $\bar{\mathbf{s}}$ and covariance matrix $\mathbf{V}$ are computed by exploiting the extrinsic information obtained as feedback from the channel decoder, i.e., $\bar{s}_k = E_{\lambda_{2,k}}[s_k]$ and $v_k = E_{\lambda_{2,k}}[|s_k|^2] - |\bar{s}_k|^2$. Moreover, a computationally attractive implementation is obtained by setting $\mathbf{V} = (1/K) \sum_k v_k \times \mathbf{I}$ and assuming that the channel matrix is cyclic (e.g., via the use of cyclic-prefixed transmission) [14]. It may, however, be advantageous from the performance point of view to calculate the mean and covariance of $\mathcal{CN}(\mathbf{s}, \bar{\mathbf{s}}, \mathbf{V})$ in terms of the full symbol APP information, instead of only exploiting the extrinsic information.
4) Unlike what was previously proposed in the literature (see [3], [29]), the optimal (from the VFEM point of view) variance of the extrinsic output of the MMSE equalizer is exclusively determined by the instantaneous SNR, and the variance of the residual interference need not be taken into account.

C. Symbol-By-Symbol APP Demodulators in the Presence of Rayleigh Fading Channel
For the purpose of this example, the received signal samples are described by the following state-space model:

$$\mathbf{h}_k = \mathbf{A}\mathbf{h}_{k-1} + \mathbf{G}\mathbf{w}_k \tag{17}$$

$$r_k = \mathbf{s}_k^T \mathbf{h}_k + n_k. \tag{18}$$

Here, the symbol vector at time $k$ is defined as $\mathbf{s}_k = [s_k, \ldots, s_{k-L}]^T$, where $L$ denotes the length of the channel memory, and $\mathbf{h}_k$ denotes the channel impulse response at time $k$.
The model matrices $\mathbf{A}$ and $\mathbf{G}$ embody the statistical properties of the fading channel. In addition, $n_k$ denotes a complex Gaussian receiver noise sample at time $k$, and $\mathbf{w}_k$ denotes a sample of vector process noise that is independent of the receiver noise and is distributed according to $\mathcal{CN}(\mathbf{w}_k, \mathbf{0}, \mathbf{I})$. The exact symbol APPs are computed as

$$p_r(s_k) = \sum_{\mathbf{s} \setminus s_k} \int_{\Theta} p(\mathbf{s}, \Theta \mid \mathbf{r})\, d\Theta \tag{19}$$

where $\Theta = \{\mathbf{h}_1, \ldots, \mathbf{h}_K\}$. The complexity of computing exact symbol APPs scales exponentially in the sequence length $K$, since no trellis processing is applicable in this case. An approximate method for computing the symbol APPs in the presence of an unknown Rayleigh fading channel is given as

$$p_r(s_k) \approx p(s_k \mid \mathbf{r}, \hat{\Theta}_{\mathrm{MAP}}) = \sum_{\mathbf{s} \setminus s_k} p(\mathbf{s} \mid \mathbf{r}, \hat{\Theta}_{\mathrm{MAP}}) \tag{20}$$
where the sequence MAP estimate for the channel is defined as $\hat{\Theta}_{\mathrm{MAP}} = \arg\max_{\Theta} p(\Theta \mid \mathbf{r})$. A low-complexity iterative algorithm for computing (20) can be obtained via the Bayesian EM algorithm, where the data symbols are regarded as the hidden data, and the data symbols and received samples jointly form the complete data specification [21]. This Bayesian EM algorithm can be reinterpreted by using the VFEM framework, as shown by the following proposition.

Proposition 3: The Bayesian EM algorithm for computing (20) can be interpreted as an instance of the VFEM algorithm where $F(Q) \propto D\big(Q(\mathbf{s}, \Theta) \,\|\, p(\mathbf{r}, \mathbf{s}, \Theta)\big)$, and the trial distribution $Q(\mathbf{s}, \Theta)$ is assumed to take a factorized form $Q(\mathbf{s}, \Theta) = Q(\mathbf{s}) Q(\Theta)$. In addition, $Q(\Theta)$ is assumed to be of the form $Q(\Theta) = \prod_k \delta(\mathbf{h}_k - \hat{\mathbf{h}}_k)$, where $\delta(\mathbf{h}_k - \hat{\mathbf{h}}_k)$ denotes a vector Dirac delta function which has the properties $\int_{\mathbf{h}_k} \delta(\mathbf{h}_k - \hat{\mathbf{h}}_k) f(\mathbf{h}_k)\, d\mathbf{h}_k = f(\hat{\mathbf{h}}_k)$ and $\delta(\mathbf{0}) = 1$.
Proof: The variational free energy $F(Q)$ can be expanded as
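$$\begin{aligned}
F(Q) \propto{} & -\sum_{\mathbf{s}} Q(\mathbf{s}) \int_{\Theta} \prod_k \delta(\mathbf{h}_k - \hat{\mathbf{h}}_k) \log p(\mathbf{r}, \mathbf{s} \mid \Theta)\, d\Theta - \int_{\Theta} \prod_k \delta(\mathbf{h}_k - \hat{\mathbf{h}}_k) \log p(\Theta)\, d\Theta \\
& + \sum_k \int_{\mathbf{h}_k} \delta(\mathbf{h}_k - \hat{\mathbf{h}}_k) \log \delta(\mathbf{h}_k - \hat{\mathbf{h}}_k)\, d\mathbf{h}_k + \sum_{\mathbf{s}} Q(\mathbf{s}) \log Q(\mathbf{s}) \\
={} & -\sum_{\mathbf{s}} Q(\mathbf{s}) \sum_k \log p(r_k, \mathbf{s}_k \mid \hat{\mathbf{h}}_k) - \sum_k \log p(\hat{\mathbf{h}}_k \mid \hat{\mathbf{h}}_{k-1}) + \sum_{\mathbf{s}} Q(\mathbf{s}) \log Q(\mathbf{s}).
\end{aligned} \tag{21}$$

Minimizing $F(Q)$ alternately with respect to $Q(\mathbf{s})$ and $Q(\Theta)$, while keeping the other fixed, yields the Bayesian EM algorithm, given at the $i$th iteration as

$$\text{E-step: Compute } p(s_k \mid \mathbf{r}, \hat{\Theta}^{(i-1)}) \quad \forall k \tag{22}$$

$$\text{M-step: } \hat{\Theta}^{(i)} = \arg\max_{\Theta} E\big[\log p(\mathbf{r}, \mathbf{s}, \Theta) \mid \mathbf{r}, \hat{\Theta}^{(i-1)}\big]. \tag{23}$$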
As shown in [21], the Bayesian EM algorithm (22) and (23) can be implemented by iteratively interconnecting the SDD Kalman smoother and the BCJR algorithm. Importantly, the VFEM framework provides a direct link between the exact symbol APPs and the approximate symbol APPs obtained via the Bayesian EM algorithm. On the other hand, given that $p(\mathbf{s}, \Theta \mid \mathbf{r}) = p(\mathbf{s} \mid \mathbf{r}, \Theta)\, p(\Theta \mid \mathbf{r})$, the Bayesian EM algorithm can also be interpreted so that the full description of our knowledge about $\Theta$, $p(\Theta \mid \mathbf{r})$, is being approximated by a delta function, i.e., a pdf that is concentrated on $\hat{\Theta}$. From this perspective, any other approximating distribution $Q(\Theta)$, no matter how inaccurate it may be, can be expected to be an improvement on the train of spikes produced by the standard Bayesian EM algorithm. For example, a simple Gaussian approximation may
give an improvement, but further exploration of that is beyond the scope of this paper.
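The alternating minimization behind (22) and (23) can be illustrated with a deliberately simplified sketch. It is not the SDD Kalman smoother/BCJR implementation of [21]: it assumes a static, real-valued, single-tap channel $h$ with a Gaussian prior $N(m_0, v_0)$, BPSK symbols, and real Gaussian noise, so that both steps have closed forms. All names and numbers below are hypothetical. The E-step updates $Q(\mathbf{s})$ for the current point estimate of the channel, and the M-step moves the delta-function $Q(\Theta)$ to the maximizer of the expected complete-data log-likelihood.

```python
import math

def bayesian_em_scalar(r, prior_plus, noise_var, m0, v0, h_init, n_iter=20):
    """Toy Bayesian EM for r_k = h * s_k + n_k with s_k in {-1, +1}.

    prior_plus[k] is the a priori probability that s_k = +1.
    Returns the MAP-style channel estimate and the soft symbols E_Q[s_k].
    """
    K = len(r)
    h_hat = h_init
    s_bar = [0.0] * K
    for _ in range(n_iter):
        # E-step: symbol posteriors given the current channel point estimate.
        for k in range(K):
            llr = (math.log(prior_plus[k] / (1.0 - prior_plus[k]))
                   + 2.0 * r[k] * h_hat / noise_var)
            s_bar[k] = math.tanh(0.5 * llr)
        # M-step: maximize E_Q[log p(r, s, h)] over h (quadratic in h, since s_k^2 = 1).
        num = sum(rk * sk for rk, sk in zip(r, s_bar)) / noise_var + m0 / v0
        den = K / noise_var + 1.0 / v0
        h_hat = num / den
    return h_hat, s_bar

# Assumed toy data: true channel 0.8, fixed small noise values, two pilot symbols.
s_true = [1, 1, -1, 1, -1, -1, 1, -1]
noise = [0.05, -0.10, 0.08, -0.03, 0.10, -0.06, 0.02, -0.04]
r = [0.8 * s + n for s, n in zip(s_true, noise)]
prior_plus = [0.999, 0.999] + [0.5] * 6      # first two symbols act as pilots

h_hat, s_bar = bayesian_em_scalar(r, prior_plus, noise_var=0.1,
                                  m0=0.0, v0=1.0, h_init=0.1)
decisions = [1 if x > 0 else -1 for x in s_bar]
print(h_hat, decisions)
```

The two pilot-like priors are included because, with uniform priors on every symbol, this toy model cannot distinguish $(h, \mathbf{s})$ from $(-h, -\mathbf{s})$. Replacing the point estimate `h_hat` by a Gaussian with nonzero variance would be one way to explore the Gaussian $Q(\Theta)$ mentioned above, which is outside the scope of this sketch.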
IV. DISCUSSION AND NUMERICAL RESULTS
As shown in previous sections, both the demodulators computing the exact symbol APPs and the linear SISO equalizers can be interpreted as devices that approximate the joint symbol sequence distribution by a fully factorized distribution such that the information loss is minimized in some sense. So, it seems that the basic difference between these equalizers is not so much in the amount of structure they model, but rather in the measure of information loss they minimize. An aim of this section is to discuss, mostly on an intuitive level, the insight we can gain into the convergence properties of these equalizers simply by examining the properties of the information divergences they try to minimize. The mean field approximation is characterized by the fact that it finds a single, most massive mode, excluding the less plausible modes of the target distribution. This is because minimizing the KL divergence $D(Q(\mathbf{s}) \,\|\, p_r(\mathbf{s}))$ forces $Q(\mathbf{s})$ toward zero in the regions where the target distribution $p_r(\mathbf{s})$ has vanishing density. In contrast, minimizing $D(p_r(\mathbf{s}) \,\|\, Q(\mathbf{s}))$ will favor distributions $Q(\mathbf{s})$ that try to cover all modes of $p_r(\mathbf{s})$, although this involves assigning high probability under $Q(\mathbf{s})$ to areas of low probability under $p_r(\mathbf{s})$ [30]. For this reason, the KL divergence $D(p_r(\mathbf{s}) \,\|\, Q(\mathbf{s}))$ is termed the "inclusive" divergence in [30]. By analogy, we call the KL divergence $D(Q(\mathbf{s}) \,\|\, p_r(\mathbf{s}))$ the "exclusive" divergence by virtue of the fact that it tends to exclude all but the most massive mode of the target distribution. The relevant question is now which of these two KL divergences is more favorable from the turbo equalizer point of view. Obviously, it is better to try to cover all modes of the target distribution. This statement is, in fact, backed by numerous simulation results found in the literature that clearly show the superiority of the APP SISO demodulators over the MF-based linear equalizers.
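The difference between the inclusive and exclusive divergences can be made concrete with a small numerical sketch. It uses a hypothetical bimodal target over two binary symbols (the probabilities are invented for illustration, not taken from the paper) and finds the best fully factorized $Q$ under each divergence by an exhaustive grid search.

```python
import math

# Hypothetical bimodal target over (s1, s2) in {-1, +1}^2: two strong modes
# at (+1, +1) and (-1, -1), and little mass on the mixed states.
P = {(1, 1): 0.55, (-1, -1): 0.40, (1, -1): 0.025, (-1, 1): 0.025}

def product_q(a, b):
    """Fully factorized Q with q1(+1) = a and q2(+1) = b."""
    q1 = {1: a, -1: 1.0 - a}
    q2 = {1: b, -1: 1.0 - b}
    return {s: q1[s[0]] * q2[s[1]] for s in P}

def kl(p, q):
    """KL divergence D(p || q) over the four joint states."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p if p[s] > 0.0)

def grid_argmin(objective, step=0.005):
    """Exhaustive search over (a, b) on an open grid inside (0, 1)."""
    grid = [i * step for i in range(1, round(1.0 / step))]
    return min(((a, b) for a in grid for b in grid),
               key=lambda ab: objective(product_q(*ab)))

inclusive = grid_argmin(lambda q: kl(P, q))   # minimizes D(p_r || Q)
exclusive = grid_argmin(lambda q: kl(q, P))   # minimizes D(Q || p_r)

def mixed_mass(ab):
    """Probability that Q assigns to the two low-probability mixed states."""
    q = product_q(*ab)
    return q[(1, -1)] + q[(-1, 1)]

print("inclusive:", inclusive, mixed_mass(inclusive))
print("exclusive:", exclusive, mixed_mass(exclusive))
```

In this example the inclusive minimizer coincides (up to the grid resolution) with the product of marginals, as Lemma 1 states, and it spreads a large share of its mass over the two unlikely mixed states in order to cover both modes. The exclusive minimizer instead concentrates on the heavier mode at $(+1, +1)$, which is the mode-seeking behavior attributed to the mean field approximation above.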
The MF-based linear SISO equalizers can easily get trapped by a "wrong" mode of the target distribution, which is manifested by the error floor in their performance. This danger is naturally highest at the first iteration of the turbo equalizer, whereas at later iterations, the reliability information from the decoder typically makes the probability mass in $p_r(\mathbf{s})$ concentrate in one mode. Therefore, the simple MF-based linear SISO equalizer is a viable alternative under these conditions. While the MF-based linear equalizer seems to have difficulties at early iterations, particularly in severe ISI channels, the MMSE equalizer that is based on the BMF approximation seems to do fine. This is because the postulated Gaussian a priori pdf of the data symbols essentially makes the target distribution $p_r(\mathbf{s})$ also Gaussian and hence unimodal, so that the free energy is convex, which greatly facilitates the search for the global minimum. Based on the above reasoning, it is a good strategy to apply the BCJR or linear MMSE equalization at the early iterations of the turbo processing, and then, after initial convergence, to switch to the low-complexity MF-based equalizer. In essence,
Fig. 2. BER versus $E_b/N_0$ for the QPSK-modulated system in the fixed Proakis-B channel.
this gives a theoretical justification for the earlier experimental results presented in [3], [25].
In this section, the performance of the turbo receivers that use either the APP SISO demodulator or the linear MMSE SISO equalizer as a building block is evaluated by computer simulations in terms of bit error rate (BER) and frame error rate (FER) versus $E_b/N_0$ ($E_b$ denotes the average signal energy per bit, and $N_0$ denotes the spectral density of the receiver thermal noise), with the aim of studying the different ways of defining the soft input of the MMSE SISO equalizer. The simulated matched-filter bound (MFB) is also given as a lower bound on the error rate of the investigated receivers. In the simulations, information bits were first coded with the rate-1/2 (64-state) convolutional code, then interleaved with a pseudorandom bit interleaver, and finally modulated into QPSK symbols. The data symbols were then collected into frames with the following structure: each frame was composed of six cyclic-prefixed data blocks of 512 symbols each and two cyclic-prefixed training blocks (one block at each end of the frame) of 64 symbols each. The length of the cyclic prefix was 20 symbols. The extension of the data and training blocks with the cyclic prefix effectively separates the blocks in time, and also enables the use of frequency domain equalization (FDE). The channel, in this simulation case, was the fixed (Proakis-B) channel with the impulse response $\mathbf{h} = [0.407\ 0.815\ 0.407]^T$, and the receiver assumed perfect channel state information (CSI). In Figs. 2 and 3, the performance results in terms of BER and FER versus $E_b/N_0$ are presented. The abbreviation TFDE refers to the turbo detection scheme where the MMSE-FDE SISO equalizer [with the covariance matrix $\mathbf{V}$ set to $\mathbf{V} = (1/K) \sum_k v_k \times \mathbf{I}$] and the SISO decoder are iteratively interconnected.
In addition, the abbreviation fullAPP refers to the turbo detection scheme where the a priori pdf $p(\mathbf{s}) = \mathcal{CN}(\mathbf{s}, \bar{\mathbf{s}}, \mathbf{V})$ is defined in terms of full APP information about the symbols, whereas the abbreviation extrINFO refers to the
case where $p(\mathbf{s})$ is defined solely in terms of extrinsic information obtained from the decoder. Furthermore, APP demod refers to the turbo detection scheme where the symbol APPs are computed exactly. Both performance figures support the theory in that the full APP information should be exploited in the soft interference cancellation, even when the MMSE SISO equalizer is used (for the MF-based linear SISO equalizer, similar simulation results were reported earlier in [26]).

Fig. 3. FER versus $E_b/N_0$ for the QPSK-modulated system in the fixed Proakis-B channel.

V. CONCLUSION
In this paper, we revisited the familiar problem of turbo equalization, looking at the problem from the statistical physics perspective. In particular, we introduced a powerful statistical physics tool called the VFEM, and as an example of the potential of this tool, we showed that the familiar linear SISO equalizers as well as the Bayesian EM-based SISO demodulator in the unknown channel case can be reinterpreted as instances of the VFEM algorithm. Importantly, this novel reinterpretation provides new insight into the structure and performance of these known algorithms, and may give an impetus for new designs.

APPENDIX
B. Proof of Proposition 1
The variational free energy $F(Q)$ under the constraint $Q(\mathbf{s}) = \prod_k q_k(s_k)$ can be expressed as

$$
\begin{aligned}
F(Q) &= \frac{1}{T}\sum_{\mathbf{s}} Q(\mathbf{s})E(\mathbf{s}) + \sum_{\mathbf{s}} Q(\mathbf{s})\ln Q(\mathbf{s}) - \ln Z \\
&= \frac{1}{T}\sum_{\mathbf{s}}\prod_{l} q_l(s_l)\,\|\mathbf{r}-\mathbf{H}\mathbf{s}\|^2 - \sum_{l} q_l(s_l)\ln\lambda_{2,l} + \sum_{l} q_l(s_l)\ln q_l(s_l) + C \qquad (24)
\end{aligned}
$$
where $C$ is a constant. The fixed-point equations for the minimum of $F(Q)$ are obtained by finding the zero-gradient point of $F(Q)$ with respect to each factor of $Q(\mathbf{s})$. The gradient with respect to $q_k$ is obtained as

$$
\frac{\partial}{\partial q_k}F(Q) = \frac{1}{T}\sum_{\mathbf{s}}\prod_{l\neq k} q_l(s_l)\,\|\mathbf{r}-\mathbf{H}\mathbf{s}\|^2 - \ln\lambda_{2,k} + \ln q_k(s_k) + 1. \qquad (25)
$$
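By setting (25) to zero, the distribution $q_k(s_k)$ is obtained as

$$
q_k(s_k) = \frac{1}{\gamma_k}\,\lambda_{2,k}\, e^{-\frac{1}{T}\sum_{\mathbf{s}}\prod_{l\neq k} q_l(s_l)\,\|\mathbf{r}-\mathbf{H}\mathbf{s}\|^2} = \frac{1}{\gamma_k}\,\lambda_{2,k}\, e^{-\frac{1}{T}E_k(s_k)}. \qquad (26)
$$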
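By exploiting the decoupling properties of the fully factorized variational trial distribution $Q(\mathbf{s})$, we obtain

$$
\begin{aligned}
E_k(s_k) &= \sum_{\mathbf{s}}\prod_{l\neq k} q_l(s_l)\,\|\mathbf{r}-\mathbf{H}\mathbf{s}\|^2 \\
&= \mathbf{r}^H\mathbf{r} - \langle\mathbf{s}\rangle^H_{Q\setminus k}\mathbf{H}^H\mathbf{r} - \mathbf{r}^H\mathbf{H}\langle\mathbf{s}\rangle_{Q\setminus k} + \mathrm{tr}\big\{\langle\mathbf{s}\mathbf{s}^H\rangle_{Q\setminus k}\mathbf{H}^H\mathbf{H}\big\} \\
&= \|\mathbf{r}-\mathbf{H}\bar{\mathbf{s}} + \mathbf{H}\mathbf{e}_k(\bar{s}_k - s_k)\|^2 + C
\end{aligned}
$$

where $\langle\cdot\rangle_{Q\setminus k}$ denotes the expectation under the distribution $\prod_{l\neq k} q_l(s_l)$, tr denotes the trace of a matrix, $C$ is a constant that is independent of $s_k$, and $\bar{s}_k$ denotes the soft symbol value defined in (11). By expanding (27) and incorporating the terms that are independent of $s_k$ into the normalizing constant $\gamma_k$, we obtain (9)–(11) after straightforward elaboration.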
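The fixed-point equations (26) can be iterated numerically. The following Python sketch is a toy illustration rather than the authors' implementation: it assumes BPSK symbols, a hypothetical two-tap channel, a noiseless observation, uniform a priori probabilities in place of $\lambda_{2,k}$, a hand-picked temperature `T`, and sequential (symbol-by-symbol) updates. The function name `mean_field_siso` and all variable names are likewise assumptions.

```python
import numpy as np

def mean_field_siso(r, H, alphabet, prior, T, n_iter=5):
    """Iterate q_k(s_k) proportional to prior_k(s_k) * exp(-E_k(s_k) / T)."""
    N = H.shape[1]
    q = prior / prior.sum(axis=1, keepdims=True)  # beliefs, shape (N, M)
    s_bar = q @ alphabet                          # soft symbol values
    for _ in range(n_iter):
        for k in range(N):
            # Residual after soft cancellation of all symbols except k.
            res = r - H @ s_bar + H[:, k] * s_bar[k]
            # E_k(s_k) up to a constant: ||res - h_k s_k||^2 per candidate.
            diff = res[:, None] - np.outer(H[:, k], alphabet)
            E = np.sum(np.abs(diff) ** 2, axis=0)
            # Subtracting E.min() only rescales the normalizer gamma_k.
            w = prior[k] * np.exp(-(E - E.min()) / T)
            q[k] = w / w.sum()
            s_bar[k] = q[k] @ alphabet
    return q, s_bar

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
alphabet = np.array([1.0, -1.0])
N = 8
s = rng.choice(alphabet, size=N)
h = np.array([1.0, 0.3])
H = np.zeros((N + len(h) - 1, N))
for k in range(N):
    H[k:k + len(h), k] = h          # convolution (Toeplitz) matrix
r = H @ s                           # noiseless received vector

prior = np.full((N, 2), 0.5)
q, s_bar = mean_field_siso(r, H, alphabet, prior, T=0.5)
decisions = alphabet[np.argmax(q, axis=1)]
print(np.array_equal(decisions, s))
```

Each update uses the residual $\mathbf{r}-\mathbf{H}\bar{\mathbf{s}}+\mathbf{H}\mathbf{e}_k\bar{s}_k$, i.e., soft interference cancellation with the current soft symbols of all other positions, which is the structure the proof attributes to $E_k(s_k)$.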
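A. Proof of Lemma 1
A solution to the minimization in (4) is obtained by considering the difference

$$
D\Big(p_r(\mathbf{s})\,\Big\|\,\prod_k q_k(s_k)\Big) - D\Big(p_r(\mathbf{s})\,\Big\|\,\prod_k p_r(s_k)\Big) = \sum_{\mathbf{s}} p_r(\mathbf{s})\ln\frac{\prod_k p_r(s_k)}{\prod_k q_k(s_k)} = \sum_k D\big(p_r(s_k)\,\|\,q_k(s_k)\big) \ge 0.
$$

C. Proof of Proposition 2
The variational free energy in this case can be formulated as

$$
\begin{aligned}
F(Q) = F(\hat{\mathbf{s}}, \mathbf{\Sigma}) &= -\ln\det\mathbf{\Sigma} + \frac{1}{\sigma_n^2}\int_{\mathbf{s}} Q(\mathbf{s})\,(\mathbf{r}^H\mathbf{r} - \mathbf{r}^H\mathbf{H}\mathbf{s} - \mathbf{s}^H\mathbf{H}^H\mathbf{r} + \mathbf{s}^H\mathbf{H}^H\mathbf{H}\mathbf{s})\,d\mathbf{s} \\
&\quad + \int_{\mathbf{s}} Q(\mathbf{s})\big((\mathbf{s}-\bar{\mathbf{s}})^H\mathbf{V}^{-1}(\mathbf{s}-\bar{\mathbf{s}})\big)\,d\mathbf{s} + C \\
&= -\ln\det\mathbf{\Sigma} - \frac{1}{\sigma_n^2}\Big[-\mathrm{tr}\big\{\mathbf{\Sigma}\big(\mathbf{H}^H\mathbf{H} + \sigma_n^2\mathbf{V}^{-1}\big)\big\} + \big(\mathbf{H}^H\mathbf{r} + \sigma_n^2\mathbf{V}^{-1}\bar{\mathbf{s}}\big)^H\hat{\mathbf{s}} \\
&\quad + \hat{\mathbf{s}}^H\big(\mathbf{H}^H\mathbf{r} + \sigma_n^2\mathbf{V}^{-1}\bar{\mathbf{s}}\big) - \hat{\mathbf{s}}^H\big(\mathbf{H}^H\mathbf{H} + \sigma_n^2\mathbf{V}^{-1}\big)\hat{\mathbf{s}}\Big] + C. \qquad (27)
\end{aligned}
$$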
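The step from the integral form of $F(\hat{\mathbf{s}}, \mathbf{\Sigma})$ to its closed form rests on standard Gaussian moment identities. As a sketch, assuming the trial distribution is the complex Gaussian $Q(\mathbf{s}) = \mathcal{CN}(\mathbf{s}, \hat{\mathbf{s}}, \mathbf{\Sigma})$ (which the parametrization $F(\hat{\mathbf{s}}, \mathbf{\Sigma})$ suggests), for any Hermitian matrix $\mathbf{A}$ and vector $\mathbf{b}$:

$$
\begin{aligned}
\int_{\mathbf{s}} Q(\mathbf{s})\,\mathbf{b}^H\mathbf{s}\,d\mathbf{s} &= \mathbf{b}^H\hat{\mathbf{s}}, \\
\int_{\mathbf{s}} Q(\mathbf{s})\,\mathbf{s}^H\mathbf{A}\mathbf{s}\,d\mathbf{s} &= \hat{\mathbf{s}}^H\mathbf{A}\hat{\mathbf{s}} + \mathrm{tr}\{\mathbf{\Sigma}\mathbf{A}\}, \\
\int_{\mathbf{s}} Q(\mathbf{s})\ln Q(\mathbf{s})\,d\mathbf{s} &= -\ln\det\mathbf{\Sigma} + \text{const}.
\end{aligned}
$$

Applying the second identity with $\mathbf{A} = \mathbf{H}^H\mathbf{H}$ and with $\mathbf{A} = \mathbf{V}^{-1}$ produces the trace term $\mathrm{tr}\{\mathbf{\Sigma}(\mathbf{H}^H\mathbf{H} + \sigma_n^2\mathbf{V}^{-1})\}/\sigma_n^2$ and the quadratic term in $\hat{\mathbf{s}}$, while the terms $\mathbf{r}^H\mathbf{r}/\sigma_n^2$ and $\bar{\mathbf{s}}^H\mathbf{V}^{-1}\bar{\mathbf{s}}$ do not depend on $\hat{\mathbf{s}}$ or $\mathbf{\Sigma}$ and are absorbed into $C$.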
Looking for the zero-gradient point of $F(\hat{\mathbf{s}}, \mathbf{\Sigma})$ with respect to $\hat{\mathbf{s}}$ and using standard matrix manipulations, we find that

$$
\begin{aligned}
\hat{\mathbf{s}} &= \big(\mathbf{H}^H\mathbf{H} + \sigma_n^2\mathbf{V}^{-1}\big)^{-1}\big(\mathbf{H}^H\mathbf{r} + \sigma_n^2\mathbf{V}^{-1}\bar{\mathbf{s}}\big) \\
&= \bar{\mathbf{s}} + \mathbf{V}\mathbf{H}^H\big(\mathbf{H}\mathbf{V}\mathbf{H}^H + \sigma_n^2\mathbf{I}\big)^{-1}(\mathbf{r} - \mathbf{H}\bar{\mathbf{s}}). \qquad (28)
\end{aligned}
$$

Similarly, by finding the zero-gradient point of $F(\hat{\mathbf{s}}, \mathbf{\Sigma})$ with respect to $\sigma^2_{s_k}$, we arrive at (13).

REFERENCES
[1] C. Douillard, M. Jézéquel, C. Berrou, A. Picard, P. Didier, and A. Glavieux, “Iterative correction of intersymbol interference: Turbo-equalization,” Eur. Trans. Telecommun. (ETT), vol. 6, pp. 507–511, Sep./Oct. 1995.
[2] C. Laot, A. Glavieux, and J. Labat, “Turbo equalization: Adaptive equalization and channel decoding jointly optimized,” IEEE J. Select. Areas Commun., vol. 19, pp. 1744–1752, Sep. 2001.
[3] M. Tüchler, R. Koetter, and A. C. Singer, “Turbo equalization: Principles and new results,” IEEE Trans. Commun., vol. 50, pp. 754–767, May 2002.
[4] D. Reynolds and X. Wang, “Low-complexity turbo-equalization for diversity channels,” Signal Process., vol. 81, pp. 989–995, May 2001.
[5] G. Colavolpe and G. Germi, “On the application of factor graphs and the sum-product algorithm to ISI channels,” IEEE Trans. Commun., vol. 53, pp. 818–825, May 2005.
[6] C. Fragouli, N. Al-Dhahir, S. N. Diggavi, and W. Turin, “Prefiltered space-time M-BCJR equalizer for frequency-selective channels,” IEEE Trans. Commun., vol. 50, pp. 742–753, May 2002.
[7] G. Bauch and N. Al-Dhahir, “Reduced-complexity space-time turbo-equalization for frequency-selective MIMO channels,” IEEE Trans. Wireless Commun., vol. 1, pp. 819–828, Oct. 2002.
[8] M. Sellathurai and S. Haykin, “TURBO-BLAST for wireless communications: Theory and experiments,” IEEE Trans. Signal Process., vol. 50, pp. 2538–2546, Oct. 2002.
[9] T. Abe and T. Matsumoto, “Space-time turbo equalization in frequency-selective MIMO channels,” IEEE Trans. Veh. Technol., vol. 52, pp. 469–475, May 2003.
[10] B. Steingrimsson, Z. Luo, and K. M. Wong, “Soft quasi-maximum-likelihood detection for multiple-antenna wireless channels,” IEEE Trans. Signal Process., vol. 51, pp. 2710–2719, Nov. 2003.
[11] B. Dong, X. Wang, and A. Doucet, “A new class of soft MIMO demodulation algorithms,” IEEE Trans. Signal Process., vol. 51, pp. 2752–2763, Nov. 2003.
[12] D. Guo and X. Wang, “Blind detection in MIMO systems via sequential Monte Carlo,” IEEE J. Select. Areas Commun., vol. 21, pp. 464–473, Apr. 2003.
[13] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, pp. 389–399, Mar. 2003.
[14] D. Wang, J. Hua, X. Gao, X. You, M. Weckele, and E. Costa, “Turbo detection and decoding for single-carrier block transmission systems,” in Proc. IEEE PIMRC 2004, pp. 1163–1167.
[15] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, “An introduction to variational methods for graphical models,” Mach. Learn., vol. 37, pp. 183–233, 1999.
[16] M. I. Jordan and Y. Weiss, “Probabilistic inference in graphical models,” in Handbook of Neural Networks and Brain Theory, M. Arbib, Ed. Cambridge: MIT Press, 2002.
[17] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Constructing free-energy approximations and generalized belief propagation algorithms,” IEEE Trans. Inf. Theory, vol. 51, pp. 2282–2312, Jul. 2005.
[18] J. S. Yedidia, “An idiosyncratic journey beyond mean field theory,” in Advanced Mean Field Methods, Theory and Practice, M. Opper and D. Saad, Eds. Cambridge, MA: MIT Press, 2001.
[19] R. J. McEliece and M. Yildirim, “Belief propagation on partially ordered sets,” in Mathematical Systems Theory in Biology, Communications, Computation, and Finance, D. Gilliam and J. Rosenthal, Eds. New York: Springer-Verlag, 2002, pp. 275–300.
[20] D. Chandler, Introduction to Modern Statistical Mechanics. London, U.K.: Oxford Univ. Press, 1987.
[21] M. Nissilä and S. Pasupathy, “Adaptive Bayesian and EM-based detectors for frequency-selective fading channels,” IEEE Trans. Commun., vol. 51, pp. 1325–1336, Aug. 2003.
[22] R. J. McEliece, D. J. C. MacKay, and J. F. Cheng, “Turbo decoding as an instance of Pearl’s ‘belief propagation’ algorithm,” IEEE J. Select. Areas Commun., vol. 16, pp. 140–152, Feb. 1998.
[23] T. Heskes, “Stable fixed points of loopy belief propagation are minima of the Bethe free energy,” Adv. Neural Inf. Process. Syst., vol. 15, pp. 343–350, 2003.
[24] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inf. Theory, vol. 20, pp. 284–287, Mar. 1974.
[25] H. Omori, T. Asai, and T. Matsumoto, “A matched filter approximation for SC/MMSE iterative equalizers,” IEEE Commun. Lett., vol. 5, pp. 310–312, Jul. 2001.
[26] F. Vogelbruch and S. Haar, “Improved soft ISI cancellation for turbo equalization using full soft output channel decoder’s information,” in Proc. IEEE GLOBECOM 2003, vol. 3, pp. 1736–1740.
[27] G. L. Bilbro, W. E. Snyder, S. J. Garnier, and J. W. Gault, “Mean field annealing: A formalism for constructing GNC-like algorithms,” IEEE Trans. Neural Netw., vol. 3, pp. 131–138, Jan. 1992.
[28] P. Pakzad and V. Anantharam, “Kikuchi approximation method for joint decoding of LDPC codes and partial-response channels,” IEEE Trans. Commun., vol. 54, pp. 1149–1153, Jul. 2006.
[29] M. Tüchler, A. C. Singer, and R. Koetter, “Minimum mean squared error equalization using a priori information,” IEEE Trans. Signal Process., vol. 50, pp. 673–683, Mar. 2002.
[30] B. J. Frey, R. Patrascu, T. S. Jaakkola, and J. Moran, “Sequentially fitting ‘inclusive’ trees for inference in noisy-OR networks,” Adv. Neural Inf. Process. Syst., vol. 13, pp. 493–499, 2000.
Mauri Nissilä was born in Oulu, Finland, in 1968. He received the M.Sc. (Eng) degree (with honors) from the University of Oulu, Oulu, Finland, in 1994. He is currently working toward the Ph.D. (Dr.Tech.) degree at the same university. From 1994 to 1998, he was a Research Engineer with Nokia Mobile Phones, Oulu, Finland. From 1998 to 2005, he was with VTT Electronics, Oulu, as a Research Scientist. In 2005, he joined Nokia Networks, Oulu, where he is currently working as a Specialist. During 1999–2000, he was a Visiting Research Scientist at the University of Toronto, Toronto, Canada. His current research interests include digital communications and statistical signal processing.
Subbarayan Pasupathy was born in Chennai, Tamil Nadu, India. He received the B.E. degree in telecommunications from the University of Madras, Chennai, the M.Tech. degree in electrical engineering from the Indian Institute of Technology-Madras, Chennai, and the M.Phil. and Ph.D. degrees in engineering and applied science from Yale University, New Haven, CT. Currently, he is a Professor in the Department of Electrical and Computer Engineering at the University of Toronto, Toronto, Canada, where he has been a faculty member since 1972. His research interests over the last 30 years include statistical communication theory, signal processing, and their applications to digital communications. He has served as the Chairman of the Communications Group, the University of Toronto, where he was also the Associate Chairman of the Department of Electrical Engineering. From 1980 to 1983, he was an Associate Editor for the Canadian Electrical Engineering Journal. Prof. Pasupathy is a Registered Professional Engineer in the Province of Ontario, Canada. He served as an Editor of Data Communications and Modulation for the IEEE TRANSACTIONS ON COMMUNICATIONS (1982–1989), as a Technical Associate Editor for the IEEE COMMUNICATIONS MAGAZINE (1979–1982), and wrote a regular humour column entitled “Light Traffic” for the IEEE COMMUNICATIONS MAGAZINE (1984–1998). He won the Canadian Award in Telecommunications from the Canadian Society of Information Theory in 2003, and was elected a Fellow of the Engineering Society of Canada in 2004.