Linearly Approximated Log-MAP Algorithms for Turbo Decoding

Jung-Fu Cheng
Advanced Development and Research, Ericsson Inc.
7001 Development Dr., RTP, NC 27709, USA
[email protected]

Tony Ottosson
Department of Signals and Systems, Chalmers University of Technology
SE-412 96, Göteborg, Sweden
[email protected]

Abstract

Different floating-point and fixed-point implementation issues of the Log-MAP algorithm for decoding turbo codes are studied. A theoretical framework is proposed to explain why Log-MAP turbo decoders are more tolerant of overestimation of the SNR than of underestimation. A linear approximation (the Log-Lin algorithm) that allows for efficient ASIC implementations and results in negligible performance loss in floating-point models is proposed. It is further shown that a fixed-point implementation of this Log-Lin algorithm, including clipping and quantization of the received samples to 6 bits, results in a loss of performance of less than 0.05 dB.

I. Introduction

The introduction of parallel concatenation [1] opened a whole new way of constructing powerful random-like codes from simple constituent codes. Using a low-complexity iterative decoding algorithm based on the principle of belief propagation [2], these parallel concatenated convolutional (turbo) codes have been demonstrated to achieve near Shannon limit performance on the AWGN and fully interleaved Rayleigh fading channels. At the heart of this iterative decoding algorithm is a soft-input soft-output module that is capable of computing the a posteriori likelihood of the constituent codes. The original maximum a posteriori (MAP) algorithm [3] is unsuitable for practical implementations because of the required multiplication and exponential operations. By formulating this algorithm in the logarithmic domain [4–7], the multiplications become additions and the exponentials disappear. The additions, however, become the soft combining operation

MAX∗_i(λi) ≜ log( ∑_i e^(λi) ).
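For readers who prefer code to notation, the soft combining operation is exactly a log-sum-exp. The following minimal Python sketch (our own illustration; the function name is ours, not the paper's) makes the "soft maximum" interpretation explicit:

```python
import math

def max_star(lams):
    """Soft combining operation: MAX*_i(lam_i) = log(sum_i exp(lam_i))."""
    return math.log(sum(math.exp(l) for l in lams))

# MAX* behaves as a "soft" maximum: it is bounded below by max(lams)
# and exceeds it by at most log(len(lams)).
```

This soft-maximum view is what makes the hardware discussion below natural: the operation is almost a plain max, plus a small correction.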

Since this operation can be decomposed into a recursive parallel or a sequential form [8], e.g.,

MAX∗(λ1, λ2, λ3, λ4) = MAX∗(MAX∗(λ1, λ2), MAX∗(λ3, λ4))   (1)
                     = MAX∗(MAX∗(MAX∗(λ1, λ2), λ3), λ4),   (2)

it suffices to find an efficient implementation of MAX∗(λ1, λ2). Applying the Jacobian logarithm, this can be expressed as [8]

MAX∗(λ1, λ2) = max(λ1, λ2) + log(1 + e^(−|λ1 − λ2|)) ≜ max(λ1, λ2) + fMAP(|λ1 − λ2|).

That is, the MAX∗ operation is equivalent to finding the maximum of the two inputs and then adding a correction term fMAP(|λ1 − λ2|). The resulting Log-MAP algorithm thus resembles a two-pass Viterbi recursion [7]: one going forward and one going backward. At each time step of the individual recursion, the branch metrics are added to the state metrics, but, instead of simply taking the maximum of the incoming metrics to a state, the new state metrics are computed by a MAX∗ operation.

In Section II, we analyze the interplay between this correction function and the scaling of soft values [6]. Using a modified expression of the MAX∗ operator, we are able to give a more complete characterization of turbo decoding performance under signal-to-noise ratio (SNR) mismatch [9].

Much research effort has been directed at finding good low-complexity implementations of the correction function fMAP(z), which is illustrated in Fig. 3. Most notably, performance close to that of the exact implementation has been reported using look-up tables of 8–16 values [4, 5]. Another approach, the Log-Max algorithm [4, 7], sets the correction term to zero to obtain an extremely simple implementation at the expense of around 0.5 dB loss in coding gain. However, we find these two approaches not well suited for the emerging global third generation mobile communication systems (IMT-2000) based on wideband code division multiple access (CDMA). First, because of the intrinsically interference-limited nature of CDMA,

the system capacity is directly linked to the operating SNR for a particular quality of service requirement. For certain setups, a loss of 0.5 dB of coding gain could amount to a 10% loss in capacity. Hence it is preferable to use more accurate implementations for better performance. Second, to support a wide spectrum of data rates and the capability of interleaving over multiple time intervals, the turbo decoder will be operated over quite a wide range of SNRs. Since a fixed-point ASIC implementation of the turbo decoder is likely to be the only viable solution to support enough decoding iterations for the highest data rate (2 Mbps) services in IMT-2000, the look-up table approach could turn out to be quite cumbersome, as will be discussed in Section III. In Section IV, we shall present the advantages of simple functional approximations to the correction term that are more amenable to ASIC implementation. In particular, we will refine a linear approximation idea first touched upon in the Appendix of [5]. The new MAX∗ operator can be implemented with simple logic and can be easily adapted to different operating SNRs and quantization parameters. This new algorithm, which we will refer to as Log-Lin, can be cost-effectively realized with a highly pipelined architecture.

The analysis in the following is illustrated by simulation results for rate 1/3 Wideband CDMA turbo codes [10]. The two constituent encoders, separated by a prunable prime interleaver, use the same generator polynomial given by

g(D) = [ 1   (1 + D + D³)/(1 + D² + D³) ].

To terminate the trellis, six additional tail bits are generated by each constituent encoder. Hence, for an input block of N bits, there are 3N + 12 coded bits. The proposed Log-Lin algorithm is found to achieve the same performance as the optimal Log-MAP algorithm under various simulation models. Finally, our conclusions are presented in Section V.

II. Floating-Point Model

The floating-point model of the channel and receiver structure is illustrated in Fig. 1. The coded bits ct are modulated into ±A = ±√(R·Eb), where R is the code rate and Eb is the average energy per bit. The signal is multiplied by a complex fading factor ht = at·e^(jθt). For an AWGN channel, ht ≜ 1 and, hence, at = 1 and θt = 0. For an independent Rayleigh fading channel, θt is a random phase rotation uniformly distributed in [0, 2π) and the envelope at is a Rayleigh random variable with unit second moment, i.e., the probability density function of at is p(at) = 2at·e^(−at²), for at ≥ 0.
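To make the channel model concrete, here is a small Python sketch of one channel use under the model above. This is our own illustrative helper, not code from the paper: the function name, the Eb = 1 normalization, and the inverse-CDF draw of the Rayleigh envelope are all our assumptions.

```python
import math
import random

def channel(c, R, EbN0_dB, rayleigh=False, rng=random):
    """One channel use of the floating-point model: BPSK bit c in {0, 1},
    code rate R, Eb/N0 in dB.  Returns the scaled soft value Lc * r_t
    after perfect coherent detection (multiplication by h_t*)."""
    Eb = 1.0                                   # energy normalization (assumed)
    A = math.sqrt(R * Eb)                      # BPSK amplitude +/-A
    N0 = Eb / 10 ** (EbN0_dB / 10.0)
    sigma = math.sqrt(N0 / 2.0)                # per-dimension noise std
    s = A if c else -A
    # Rayleigh envelope with unit second moment: a^2 ~ Exp(1)
    a = math.sqrt(-math.log(1.0 - rng.random())) if rayleigh else 1.0
    # After multiplying by h_t^*: a^2 * s plus noise of std a * sigma
    r = a * a * s + a * sigma * rng.gauss(0.0, 1.0)
    Lc = 2.0 * A / sigma ** 2                  # scaling of Fig. 1
    return Lc * r
```

The per-dimension noise variance σ² = N0/2 follows from the SNR relation below; at high Eb/N0 the sign of the output matches the transmitted bit almost surely.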

[Figure 1: Floating-point receiver structure. The modulated signal ±A is multiplied by ht, corrupted by AWGN of variance 2σ², de-rotated by h∗t, and the real part rt is scaled by Lc = 2A/σ² to produce the decoder input λt.]

A zero-mean complex Gaussian random variable with variance 2σ² = N0 is added to the faded signal, where N0/2 is the double-sided noise power spectral density. The receiver multiplies the signal by the conjugate of the perfect channel estimate h∗t. The average signal-to-noise ratio is therefore

SNR = E[at²] · A²/σ² = A²/σ² = 2R·Eb/N0.

The soft values are then scaled by Lc = 2A/σ² before being fed into the turbo decoder. Intuitively, the requirement to scale the soft values comes from the observation that the a posteriori likelihood of the information bit ut can be decomposed into three terms [4–6]:

Λ(ut) = log [ Pr{ut = +1 | r} / Pr{ut = −1 | r} ] = V(ut) + Lc·rt + W(ut),

where V(ut) and W(ut) are the a priori and the extrinsic information of bit ut, respectively. Scaling thus ensures the compatibility of these terms and the correctness of the soft information exchange between the constituent decoders. From an implementation point of view, however, the nonlinear correction function fMAP is the only operation in the Log-MAP algorithm that needs the exact magnitude information. The other two operations (max and addition) are scale-invariant. This implies that the soft values do not need to be scaled for a Log-Max decoder, since the correction term is not used at all. A turbo decoder based on the Log-Max decoder is therefore robust against any SNR mismatch problems. This is illustrated in Fig. 2 for a turbo code of block length N = 640 on the AWGN channel. The Log-MAP turbo decoding algorithm, on the other hand, is sensitive to scaling errors, as reported in [9]. However, the peculiar behavior of being more tolerant of overestimation of the SNR than of underestimation has not been explained. As the following analysis shows, this unbalanced reaction is mainly caused by the nonlinearity of the correction function. Realizing that rt = (1/Lc)·λt, the exact same Log-MAP, and hence the turbo decoding, algorithms can actually

operate on un-scaled soft values rt by using a modified soft combiner

MAX∗(r1, r2) ≜ (1/Lc)·MAX∗(λ1, λ2) = max(r1, r2) + (1/Lc)·fMAP(Lc·|r1 − r2|).
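Numerically, the modified combiner on un-scaled values agrees exactly with scaling first and dividing afterwards. A short sketch (our own function names) verifies the identity:

```python
import math

def max_star(l1, l2):
    """Two-input MAX* on scaled values, via the Jacobian logarithm."""
    return max(l1, l2) + math.log1p(math.exp(-abs(l1 - l2)))

def max_star_unscaled(r1, r2, Lc):
    """Modified combiner operating directly on un-scaled soft values."""
    return max(r1, r2) + math.log1p(math.exp(-Lc * abs(r1 - r2))) / Lc
```

For any Lc > 0 the two routes coincide, which is what allows the decoder to work on rt directly.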

[Figure 2: Performance sensitivities of the Log-MAP turbo decoder on the AWGN channel, for true Eb/N0 values of 0.75, 1.05, and 1.35 dB. The frame size is N = 640 and 8 iterations are assumed.]

Let z = |r1 − r2| and suppose the estimated scaling factor is L̂c = δLc. Referring to Fig. 3, we can observe how the error in L̂c propagates through the algorithm:

– For the case of overestimation, we have δ > 1 and therefore fMAP(L̂c·z) < fMAP(Lc·z). The former is further scaled down by dividing by δLc. Overscaling the soft values thus means de-emphasizing the correction term. Therefore, the performance of the Log-MAP decoder will converge to that of the Log-Max decoder in the limit of unbounded overscaling. This convergence can be observed at higher estimation errors towards the right-hand side of Fig. 2.

– For the case of underestimation, we have δ < 1. fMAP(L̂c·z) will not only be larger than fMAP(Lc·z) but will also be further amplified by dividing by δLc. Because the function fMAP(·) grows faster when its argument is small, the erroneous correction term is overly emphasized. This adds a significant amount of noise to the soft information and leads to a large loss of performance, as shown towards the left-hand side of Fig. 2.
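The asymmetry can be checked numerically. Below is a small sketch (our own naming; the specific δ and z values are arbitrary) of the effective correction term applied to un-scaled values when Lc is estimated as δLc:

```python
import math

def f_map(z):
    """Correction function f_MAP(z) = log(1 + exp(-z))."""
    return math.log1p(math.exp(-z))

def correction(delta, Lc, z):
    """Effective correction on un-scaled values under the estimate delta*Lc."""
    return f_map(delta * Lc * z) / (delta * Lc)

Lc, z = 2.0, 0.5
true = correction(1.0, Lc, z)
over = correction(4.0, Lc, z)     # overestimation: correction de-emphasized
under = correction(0.25, Lc, z)   # underestimation: correction over-emphasized
```

The error from underestimating by a factor of 4 is several times larger than the error from overestimating by the same factor, matching the unbalanced curves of Fig. 2.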

III. Fixed-Point Model

We consider a q-bit uniform quantization front end for the fixed-point turbo decoder, as illustrated in Fig. 4. The quantizer is modeled as scaling the received signal by G = 2^(q−1)/(MA) and then rounding the result to the nearest nominal integer from the set [−2^(q−1), 2^(q−1) − 1]. Effectively, the received signals are clipped at M times their average amplitude and are quantized with a step size of MA/2^(q−1). For a given number of bits q, the parameter M can thus be used to control how these bits are allocated between dynamic range (clipping) and precision (step size).

The number of bits q used for soft value representation is a very important design parameter for fixed-point turbo decoders. It affects the hardware complexity of implementing the algorithm and is also directly linked to the total buffer size required for storing the soft values. Under the constraint of not losing significant performance, this number should therefore be minimized. As in the case of convolutional codes [11], however, the performance of turbo codes is strongly affected by the clipping level multiplier M even for a fixed q. Using the criterion of maximizing the channel cutoff rate, [12] finds the optimal M to be a nonlinear function of the operating SNR.

Using the same technique as in the last section, the fixed-point soft combining operator MAX∗(x1, x2) should approximate

max(x1, x2) + (G/Lc)·fMAP( (Lc/G)·|x1 − x2| ).   (3)
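The quantizer just described admits a direct sketch (our own helper name; the exact rounding mode is an assumption, since the text only specifies rounding to the nearest integer):

```python
def quantize(r, q, M, A):
    """q-bit uniform quantizer: scale by G = 2^(q-1) / (M*A), round to the
    nearest integer, and clip to the range [-2^(q-1), 2^(q-1) - 1]."""
    G = 2 ** (q - 1) / (M * A)
    x = round(G * r)
    lo, hi = -2 ** (q - 1), 2 ** (q - 1) - 1
    return max(lo, min(hi, x))
```

With q = 6 and M = 3, as used in Section IV, any input beyond three times the average amplitude saturates at the extreme code words while the step size stays at MA/2^(q−1).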

[Figure 3: The correction function fMAP(z) = log(1 + e^(−z)) and two linear approximations, fLin1(z) = max((2.77 − z)/4, 0) and fLin2(z) = max((3.20 − z)/8, 0).]

[Figure 4: Fixed-point receiver structure. The received signal rt is scaled by G = 2^(q−1)/rmax = 2^(q−1)/(MA) and passed through a q-bit A/D to produce the decoder input xt.]

Note that both G and Lc are functions of the SNR. To support the wide range of operating SNRs in IMT-2000, multiple look-up tables would thus need to be implemented accordingly. Furthermore, these look-up tables will be accessed at very high speed in the turbo decoder. It might therefore be costly to implement such high-speed random-access memory hardware. Consequently, it is customary [7] to implement only a single physical MAX∗(x1, x2) operator. The computation of MAX∗_i(xi) would then rely on the sequential decomposition of the operation, as the one shown in Eq. (2). This approach creates a long critical path in the system. As a result, the system needs to run at a higher clock rate to support a satisfactory number of decoding iterations for high data rate services.
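The critical-path argument can be illustrated in software. The sketch below (our own illustration, using the exact floating-point MAX∗ rather than a table-based one) evaluates the tree decomposition of Eq. (1), which needs only about log2(S) dependent stages instead of the S − 1 of the sequential form:

```python
import math

def max_star(a, b):
    """Two-input MAX* via the Jacobian logarithm."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def tree_max_star(vals):
    """Parallel (tree) decomposition of Eq. (1): pair up values at each
    level, so the number of dependent stages is ceil(log2(S))."""
    vals = list(vals)
    while len(vals) > 1:
        nxt = [max_star(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:                     # odd element passes through
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]
```

Since the two-input MAX∗ is exact here, the tree and sequential orders both agree with log(∑ e^(λi)) up to rounding error; with approximated correction terms the two orders differ slightly, which is the numerical effect noted in Section IV.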

IV. Log-Lin Algorithms

To avoid the drawbacks mentioned in the last section, we propose using simple operational approximations to the correction function. The goal is to make the MAX∗ module simple enough that it can be cost-effectively duplicated to form a highly pipelined implementation of MAX∗_i(xi) using the recursive parallel decomposition (Eq. (1)), as the one shown in Fig. 5. For the case of S inputs to the MAX∗ operation, the parallel implementation reduces the length of the critical path to log2 S from (S − 1) for the sequential implementation. In numerical simulations, the parallel decomposition shows slightly better performance than the sequential approach, which seems to be a result of better numerical properties.

As shown in Fig. 3, we propose the following linear approximation to the correction function:

fLin(z) = max( (c − z)/D, 0 ),

where D can be 4 or 8 and c is a parameter that can be optimized for performance. Following Eq. (3), the soft combining operation can be implemented in the fixed-point domain as

MAX∗Lin(x1, x2) = max(x1, x2) + (G/Lc)·max( (c − (Lc/G)·|x1 − x2|)/D, 0 )
                = max(x1, x2) + max( (C − |x1 − x2|)/D, 0 ),

where C = ⌊(G/Lc)·c⌋. Since dividing by 4 or 8 is simply a shift of the bit positions by 2 or 3, the MAX∗Lin operation can be implemented with simple logic circuits, as the one shown in Fig. 6. The algorithm can be easily adapted to different operating SNRs, because the effect of the quantization gain and the requirement to scale the soft values are combined into a single control parameter C.
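In integer arithmetic the whole MAX∗Lin operation reduces to a subtraction, a shift, and two max selections. A minimal Python sketch (our own; Python's arithmetic right shift stands in for the hardware shifter, and the final clamp at zero mirrors the sign-detection/multiplexer path of Fig. 6):

```python
def max_star_lin(x1, x2, C, shift=2):
    """Fixed-point MAX*_Lin with D = 2**shift (shift = 2 for D = 4,
    shift = 3 for D = 8); C = floor(G*c/Lc) is the single control
    parameter that absorbs the scaling and quantization gains."""
    corr = (C - abs(x1 - x2)) >> shift        # (C - |x1 - x2|) / D
    return max(x1, x2) + max(corr, 0)         # clamp negative corrections to 0
```

When the inputs differ by more than C the correction vanishes and the operator degenerates to a plain max, exactly as fLin does for large z.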

We first compare the performance of the proposed algorithm with that of the optimal Log-MAP algorithm under the floating-point model. Four test cases are considered, based on the two frame sizes (N = 640 and N = 2560) and the two channel models (AWGN and independent Rayleigh fading). For the linearly approximated decoder, we set D = 4 and c = 4 ln 2. Fig. 7 plots the frame error rates (FER) of these test cases after 8 decoding iterations using the alternative algorithms. The proposed Log-Lin algorithm is found to achieve identical performance to the optimal Log-MAP algorithm.

Next we consider a quantization front end that clips the soft values at M = 3 times the average amplitude and uses q = 6 bits to represent the results [12]. For the internal storage of extrinsic information and state metrics, qe bits are used by the turbo decoder. In our simulation, these internal numbers are simply limited to within [−2^(qe−1), 2^(qe−1) − 1]. Since the extrinsic information tends to grow with the number of decoding iterations, qe needs to be larger than q. However, as shown in Fig. 7, the FERs of the fixed-point Log-Lin decoder using qe = q + 2 are less than 0.05 dB inferior to those of the corresponding floating-point models for the four test cases considered. This is quite surprising, since the extrinsic information can easily grow to an order of magnitude larger than the input soft values over the iterations. Indeed, most of the extrinsic information in our simulation is found to be saturated after a few iterations. We also compared the performance curves to those using qe = q + 3 and found no discernible improvements.
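The internal limiting used in these simulations is just a clamp to the signed qe-bit range; a one-line sketch (our own helper name):

```python
def saturate(v, qe):
    """Limit an internal metric or extrinsic value to the signed qe-bit
    range [-2^(qe-1), 2^(qe-1) - 1]."""
    lo, hi = -2 ** (qe - 1), 2 ** (qe - 1) - 1
    return max(lo, min(hi, v))
```

With q = 6 input bits and qe = q + 2 = 8 internal bits, extrinsic values are pinned to [−128, 127] once they saturate.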

[Figure 5: Pipelined implementation of MAX∗(x1, . . . , x8) as a binary tree of two-input MAX∗ modules.]

[Figure 6: Conceptual implementation of the simplified MAX∗ operation with D = 4, built from subtract, add, sign-detection, and multiplexer blocks operating on qe-bit values.]

V. Conclusions

The Log-MAP algorithm is a logarithmic-domain description of the MAP algorithm that can be expressed as a max operation plus an additive correction term. Neglecting the correction term, we get the Log-Max algorithm. In interference-limited systems like WCDMA, however, this approximation may result in a capacity loss of as much as 10%, and hence we also need to include the correction term (or an approximation thereof). The drawback of including the correction term is that it makes the algorithm sensitive to SNR mismatch. In our analysis we showed why this is true and also explained why, for the Log-MAP algorithm, overestimation of the SNR is more tolerable than underestimation. Since the correction term is a nonlinear function, it is typically approximated by a set of look-up tables (one for each SNR region). For ASIC implementation, however, this implies a need for high-speed memory. Instead, we proposed using a linear approximation (the Log-Lin algorithm) of the correction term. Simulations indicate that this Log-Lin algorithm achieves the same performance as the Log-MAP algorithm, and that a fixed-point implementation of the Log-Lin algorithm with optimized clipping and quantization to 6 bits results in a loss of performance of less than 0.05 dB.

[Figure 7: Performance comparison of the floating-point Log-MAP, floating-point Log-Lin, and fixed-point Log-Lin decoders on the AWGN and independent Rayleigh fading channels, for frame sizes N = 640 and N = 2560. Eight decoding iterations are assumed.]

Acknowledgement

The authors are grateful to Michael Breschel of Ericsson Inc. for assisting with the fixed-point simulation codes.

References

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: turbo-codes,” Proc. IEEE Int. Conf. Commun. ’93, pp. 1064–1070, May 1993.

[2] D. J. C. MacKay, R. J. McEliece, and J.-F. Cheng, “Turbo decoding as an instance of Pearl’s ‘belief propagation’ algorithm,” IEEE J. Sel. Areas Commun., vol. 16, pp. 140–152, Feb. 1998.

[3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inform. Theory, vol. 20, pp. 284–287, Mar. 1974.

[4] P. Robertson, E. Villebrun, and P. Hoeher, “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain,” Proc. IEEE Int. Conf. Commun. ’95, pp. 1009–1013, June 1995.

[5] S. Benedetto, G. Montorsi, D. Divsalar, and F. Pollara, “Soft-output decoding algorithms in iterative decoding of turbo codes,” The Telecommun. and Data Acquisition Progress Report, Jet Propulsion Laboratory, CIT, vol. 42-124, pp. 63–87, Feb. 1996.

[6] J. Hagenauer, E. Offer, and L. Papke, “Iterative decoding of binary block and convolutional codes,” IEEE Trans. Inform. Theory, vol. 42, pp. 429–445, 1996.

[7] A. J. Viterbi, “An intuitive justification and a simplified implementation for the MAP decoder for convolutional codes,” IEEE J. Sel. Areas Commun., vol. 16, pp. 260–264, Feb. 1998.

[8] J. Erfanian, S. Pasupathy, and G. Gulak, “Reduced complexity symbol detectors with parallel structures for ISI channels,” IEEE Trans. Commun., vol. 42, pp. 1661–1671, Feb./Mar./Apr. 1994.

[9] T. A. Summers and S. G. Wilson, “SNR mismatch and online estimation in turbo decoding,” IEEE Trans. Commun., vol. 46, pp. 421–423, Apr. 1998.

[10] Third Generation Partnership Project (3GPP), Technical Specification 25.212: Multiplexing and Channel Coding (Frequency Division Duplex Mode), ver. 3.0.0, Oct. 1999.

[11] I. M. Onyszchuk, K.-M. Cheung, and O. Collins, “Quantization loss in convolutional decoding,” IEEE Trans. Commun., vol. 41, pp. 261–265, Feb. 1993.

[12] G. Jeong and D. Hsia, “Optimal quantization for soft-decision turbo decoder,” Proc. IEEE Veh. Techn. Conf. ’99 Fall, pp. 1620–1624, Sep. 1999.