RadioVetenskap och Kommunikation 99
Karlskrona 14 - 17 juni 1999
Timing Synchronization with Turbo Codes on AWGN Channels
Bartosz Mielczarek (1), Arne Svensson (2)
(1) Department of Signals and Systems, Chalmers University of Technology, SE-412 96 Göteborg, Sweden. PH: +46 31 772 1763, FAX: +46 31 772 1748. E-mail: [email protected].
(2) Department of Signals and Systems, Chalmers University of Technology, SE-412 96 Göteborg, Sweden. PH: +46 31 772 1751, FAX: +46 31 772 1748. E-mail: [email protected].
Abstract
This paper presents the behaviour of a turbo coding scheme when the symbol timing is not perfectly known. An algorithm for timing synchronization is presented together with the theoretical Cramer-Rao bound and the performance of an NDA ML synchronizer. It is shown that synchronization can be achieved using the soft bit outputs of the turbo decoder, without the need for complex separate synchronizers ahead of the turbo decoder.

1 Introduction
Due to cell planning schemes and battery conservation in portable receivers, future wireless communication systems may have to operate at very low signal-to-noise ratios (SNR). This requirement calls for very powerful codes capable of operating at low SNR, and for good schemes for synchronizing the frequency, phase and timing of the modulated signal. The most interesting candidates are turbo codes, an approach to channel coding introduced in 1993 that can operate very close to the Shannon limit. The behaviour of such codes is usually tested under the assumption of perfect channel knowledge and perfect synchronization. This paper looks at the effects of timing mismatch on the bit error probability of turbo codes and proposes a simple timing recovery algorithm.

2 Turbo Codes
Turbo codes were first introduced in [1] and have been the subject of intense research ever since. They belong to the class of parallel concatenated codes; the structure of a typical turbo encoder is shown in figure 1. It consists of two recursive systematic convolutional (RSC) encoders separated by a pseudo-random interleaver (INT). The output of the encoders can be punctured in order to raise the code rate, that is, to reduce the redundancy.

Figure 1: Structure of a turbo encoder. [Block diagram: input u_k; blocks RSC1, INT, RSC2, PUNCT.; outputs u_k and x_k^p.]

The simplicity of the encoder is counterbalanced, however, by a much more complicated decoder structure, shown in figure 2. A typical turbo decoder uses two separate Soft-In/Soft-Out MAP (Maximum A Posteriori) decoders, connected with the proper bit reordering by two interleavers (INT) and one deinterleaver (DEINT). The sequence from the encoder is modelled as $X_k = (u_k, x_k^p)$ (with puncturing, rate 1/2), where $u_k$ is the systematic bit and $x_k^p$ is the corresponding parity bit. The received samples are given by $Y_k = (y_k^s, y_k^p)$, where $y_k^s = u_k + n_k^s$ and $y_k^p = x_k^p + n_k^p$. Here $n_k^s$ and $n_k^p$ are independent samples of two white Gaussian processes with variance $\sigma_0^2$.

Figure 2: Structure of a turbo decoder. [Block diagram: inputs y_k^s, y_k^{p1}, y_k^{p2}; blocks MAP1, INT, MAP2, INT, DEINT.]
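The encoder structure and channel model described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the generator polynomials (octal 31 as feedback and octal 27 as feedforward, borrowed from the (031,027) code used later in the paper, with that assignment being a guess), the random interleaver, the alternating puncturing pattern, the 0 -> +1 bit mapping and all function names are hypothetical, and trellis termination is omitted.

```python
import numpy as np

def rsc_parity(bits, fb=0o31, ff=0o27, memory=4):
    """Parity stream of a recursive systematic convolutional encoder.

    fb / ff are the feedback and feedforward polynomials; the most
    significant bit is the tap on the current input (assumed convention).
    """
    state = [0] * memory                      # shift register, newest first
    fb_taps = [(fb >> (memory - i)) & 1 for i in range(memory + 1)]
    ff_taps = [(ff >> (memory - i)) & 1 for i in range(memory + 1)]
    parity = []
    for u in bits:
        # Recursive part: input XOR feedback taps on the register contents.
        a = int(u)
        for tap, s in zip(fb_taps[1:], state):
            a ^= tap & s
        # Feedforward part: taps on the new register input and old contents.
        p = ff_taps[0] & a
        for tap, s in zip(ff_taps[1:], state):
            p ^= tap & s
        parity.append(p)
        state = [a] + state[:-1]
    return np.array(parity, dtype=int)

def turbo_encode(bits, perm):
    """Rate-1/2 punctured turbo encoder: returns (systematic, parity) as +/-1."""
    bits = np.asarray(bits, dtype=int)
    p1 = rsc_parity(bits)                     # RSC1 sees the original order
    p2 = rsc_parity(bits[perm])               # RSC2 sees the interleaved order
    k = np.arange(len(bits))
    parity = np.where(k % 2 == 0, p1, p2)     # keep parity bits alternately
    return 1 - 2 * bits, 1 - 2 * parity       # map 0 -> +1, 1 -> -1

def awgn(x, sigma0, rng):
    """y_k = x_k + n_k with independent Gaussian noise of variance sigma0**2."""
    return x + sigma0 * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
N = 256                                       # interleaver size used in the paper
u = rng.integers(0, 2, N)
perm = rng.permutation(N)
xs, xp = turbo_encode(u, perm)
ys, yp = awgn(xs, 0.8, rng), awgn(xp, 0.8, rng)
```

The pair `(ys, yp)` plays the role of the received sequence $Y_k = (y_k^s, y_k^p)$; the noise level 0.8 is an arbitrary example value.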
The most widely used MAP algorithm is the recursive BCJR algorithm, which allows relatively easy calculation of the log-likelihood ratio (LLR) (see [2]) as

$$L(u_k \mid Y) = \log \frac{P(u_k = +1 \mid Y)}{P(u_k = -1 \mid Y)} \tag{1}$$

where $Y$ is the whole received sequence. The MAP algorithm must be provided with the exact signal and noise power in order to give optimum performance. The most significant feature of the turbo decoder is the feedback loop, which uses the soft bit output of one decoder as additional a priori information for the second decoder. The algorithm works iteratively, i.e. the quality of decoding improves with consecutive iterations. There is, however, a threshold after which additional runs of the algorithm do not decrease the number of errors.
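To make equation (1) concrete, the following sketch evaluates the LLR in the simplest possible case: a single antipodal symbol u = ±1 observed in Gaussian noise with equal priors, where the posterior ratio reduces to 2y/σ². This is not the BCJR recursion, which additionally exploits the code trellis, and the function names are hypothetical. The last line hints at why the decoder needs the correct noise variance: assuming a variance that is too large scales every soft value down.

```python
import math

def llr_from_posterior(p_plus):
    """Equation (1): log of P(u=+1|Y) / P(u=-1|Y) for a binary symbol."""
    return math.log(p_plus / (1.0 - p_plus))

def channel_llr(y, sigma2):
    """LLR of u in {+1,-1} from y = u + n, n ~ N(0, sigma2), equal priors."""
    return 2.0 * y / sigma2

def posterior_plus(y, sigma2):
    """P(u=+1 | y) computed directly from the two Gaussian likelihoods."""
    like_plus = math.exp(-(y - 1.0) ** 2 / (2.0 * sigma2))
    like_minus = math.exp(-(y + 1.0) ** 2 / (2.0 * sigma2))
    return like_plus / (like_plus + like_minus)

y, sigma2 = 0.3, 0.5
exact = llr_from_posterior(posterior_plus(y, sigma2))
closed_form = channel_llr(y, sigma2)
mismatched = channel_llr(y, 2.0 * sigma2)   # decoder assumes twice the true variance
```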
3 Background on synchronization
Good synchronization is essential in digital wireless communication. The incoming HF signal needs to be downconverted to a lower frequency and sampled properly in order to minimize intersymbol interference (ISI) [6]. Such conversion requires exact knowledge of the carrier frequency, carrier phase and symbol period. Any mismatch in those parameters may lead to severe degradation of system performance and an increased bit error rate. The synchronization problem becomes more difficult at low symbol SNR $\gamma$, as shown by the modified Cramer-Rao bound for the timing error variance, given by

$$\mathrm{var}\{\varepsilon - \varepsilon_0\} \ge \frac{1}{2 \cdot N \cdot \gamma \cdot \xi} \tag{2}$$

where $N$ is the number of symbols and $\xi$ is the normalized mean square bandwidth of the pulse spectrum. The $\xi$ parameter for the root-raised cosine pulse used in this paper is given by

$$\xi = \frac{\pi^2}{3} + \beta^2 (\pi^2 - 8) \tag{3}$$

where $\beta$ is the roll-off factor (see [5]). The synchronization problem is even more complicated when no training sequences may be used to recover the timing position. In such a scheme, only Non Data Aided (NDA) estimation schemes are applicable. In this paper we concentrate only on the NDA approach.

4 Degradation due to timing errors
In order to assess the degradation of performance caused by a non-optimal sampling point, we tested a turbo coding scheme with different timing offsets. The following simulations were conducted with the (031,027) half-rate turbo code with an interleaver size of N=256. The number of decoding iterations was set to 10. The signal was transmitted on an AWGN channel with QPSK modulation and root-raised cosine (β=0.3) pulse shaping. All BER calculations were done using data from 10000 transmitted blocks.

In figure 3, we show the performance of a turbo coding scheme when the symbol timing varies according to a Gaussian distribution around the optimum value. It is clear that even a moderate timing error variance causes severe degradation of the BER. There are two main reasons for this behaviour. First, sampling at the wrong position introduces additional ISI and reduces the power of the useful signal (see next section), thus decreasing the effective SNR significantly. Second, the channel and signal parameters provided to the MAP decoder become obsolete, and this mismatch makes the decoder commit more decoding errors than usual (see [3]). For comparison, the figure also shows reference curves for a perfectly synchronized turbo decoder and for a decoder whose timing error variance is given by the Cramer-Rao bound (calculated using (2) and (3)). It must be noted that the bound provided by the Cramer-Rao inequality is rather loose in this SNR region and is not likely to be approached even with the most advanced synchronization techniques [4].

Figure 3: Bit error rate of the turbo code caused by symbol timing offset (variance is normalized by the symbol period length). [Plot: BER versus Eb/N0 (dB); curves: ideal, CR, var=0.01, var=0.03, var=0.05, var=0.07, var=0.1.]
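The bound of equations (2) and (3) is easy to evaluate numerically. The sketch below does so for the parameters quoted in the simulations (β = 0.3, and N = 256 taken as the number of symbols). Treating the dB value directly as the symbol SNR γ is a simplifying assumption made here for illustration only, and the function names are hypothetical.

```python
import math

def xi_rrc(beta):
    """Equation (3): normalized mean square bandwidth for roll-off beta."""
    return math.pi ** 2 / 3.0 + beta ** 2 * (math.pi ** 2 - 8.0)

def mcrb_timing(n_symbols, gamma, beta):
    """Equation (2): lower bound on the normalized timing error variance."""
    return 1.0 / (2.0 * n_symbols * gamma * xi_rrc(beta))

for snr_db in (0.0, 1.0, 2.0, 3.0):
    gamma = 10.0 ** (snr_db / 10.0)          # dB -> linear symbol SNR
    print(f"{snr_db:.0f} dB: var >= {mcrb_timing(256, gamma, 0.3):.3e}")
```

The bound falls as 1/γ and 1/N, which is why short blocks at low SNR are the hard case for timing recovery.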
5 Soft bits output
The soft bit output of the MAP algorithm is usually used only for the final decision on the transmitted bit. It may, however, be used for other purposes, including synchronization. In figure 4 we present the behaviour of the soft
bits calculated by the LLR function given by equation (1) for different time and phase offsets. The soft bits produced by the decoder over a decoding block were squared, added together and finally normalized. It can be clearly seen that the maximum lies exactly at the point of perfect synchronization. In order to approximate the soft bit function along the time offset axis we model the received signal as

$$y_k = a_{RC}(\varepsilon)\, u_k + n_k^{ISI} \tag{4}$$

where $a_{RC}(\varepsilon)$ is a scaling factor depending on the shape of the pulse and the timing offset, and $n_k^{ISI}$ is a sample of an approximately Gaussian process. Its variance is not constant and depends on the timing offset as $\sigma_n^2(\varepsilon) = \sigma_0^2 + \sigma_{ISI}^2(\varepsilon) = \sigma_0^2 + 1.8049\,\varepsilon^2$, where $\sigma_{ISI}^2(\varepsilon)$ is the variance of the ISI-induced noise. The numerical value was obtained empirically from simulations for an RC pulse with β=0.3. By analysing equation (1) and conducting simulations it can be shown that the average squared LLR value remains almost constant for timing offsets at which the power of the ISI together with the thermal noise is approximately equal to or larger than the power of the signal. In our case these offsets are given by $|\varepsilon| > 0.32 T_s$. For offsets $|\varepsilon| < 0.32 T_s$ the average squared output of the LLR grows approximately linearly to reach the maximum at $\varepsilon = 0$. This behaviour allows us to use a simple form of gradient search, which we present in section 7.

Figure 4: Normalized sum of squared soft bits of the turbo decoder after one iteration. [Surface plot over time offset (ε/T) and phase offset (φ/π).]

6 Receiver structure
The proposed receiver structure (figure 5) attempts to move the signal processing to the digital domain as early as possible. The sampling is done by a free-running sampler and the sampled signal (generally not synchronized) is fed to the digital matched filter (DMF). It is then processed by the interpolator unit (INTP), which tries to recover the proper values of the signal at the optimum sampling point by means of linear interpolation (or any other similar scheme). The signal is then fed to the turbo decoder (TURBO), which generates the soft bits used by the timing estimator to produce an estimate of the symbol timing.

Figure 5: Digital receiver structure with a turbo decoder. [Block diagram: analog part (Downconverter); digital part (DMF, INTP, TURBO, TIMING ESTIM.).]

7 Timing estimator
In figure 7 we show the proposed timing estimator, based on the soft bit output of a turbo decoder processing non-synchronized samples. The timing estimator works under the following assumptions:
1. Sampling is done exactly four times per symbol. The packet is short enough for this to hold, even if the sampling clock has a small drift.
2. The phase is perfectly known prior to the timing recovery. The phase recovery algorithm is not specified in this paper and will be treated separately in the near future.
3. The initial $\varepsilon \in (-0.5 T_s, 0.5 T_s)$, where $T_s$ is the symbol period. In a practical system one can expect some means of external crude synchronization, e.g. an additional signalling channel.

The algorithm works as follows: after sampling and filtering of the signal by the matched filter, two sets of samples (separated by one symbol period) given by

$$z_1^{s,p}((k - 0.5) T_s + \varepsilon T_s), \qquad z_2^{s,p}((k + 0.5) T_s + \varepsilon T_s) \tag{5}$$

form vectors $Y_1 = (z_1^s, z_1^p)$ and $Y_2 = (z_2^s, z_2^p)$, which are fed independently to the turbo decoder, which performs the standard decoding procedure for one iteration. The soft bits generated for both sets are processed as

$$\aleph = \sum_{k=0}^{N-1} \left( L^2(u_k \mid Y_1) - L^2(u_k \mid Y_2) \right) \tag{6}$$
forming a metric $\aleph$. The metric is subsequently fed to a lookup table (figure 8), which produces the timing estimate $\hat\varepsilon_1$ used by the interpolator to modify the set of samples using linear interpolation. New sets of samples

$$\tilde z_1^{s,p}((k - 0.25) T_s + (\varepsilon - \hat\varepsilon_1) T_s), \qquad \tilde z_2^{s,p}((k + 0.25) T_s + (\varepsilon - \hat\varepsilon_1) T_s) \tag{7}$$

are used for the second iteration. Note that the samples are now separated by only half a symbol period. The same procedure is repeated to obtain a new estimate of the time offset, $\hat\varepsilon_2$, and the samples are modified again. In the next step the actual decoding iteration is conducted using a single set of samples $\tilde z^{s,p}(k T_s + (\varepsilon - \hat\varepsilon_1 - \hat\varepsilon_2) T_s)$, and the whole procedure is repeated, but only the samples separated by half a symbol period are used in subsequent synchronization iterations.

The table lookup functions, which map the timing error to the output of equation (6), are obtained from equation (1) and figure 4. They are monotonic, although the first one (TS metric) is quite flat in the region close to the optimum sampling point. This causes slow convergence of the timing estimate once it is close to the optimum. This is the reason for using the other function, which has no such problem, provided that $\varepsilon \in (-0.25 T_s, 0.25 T_s)$.
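One synchronization step, that is, the metric of equation (6) followed by the table lookup and the linear interpolation, can be sketched as below. The paper derives its lookup tables from equation (1) and figure 4 and does not list them numerically, so the table used here is a made-up monotonic placeholder. The function names, the normalization of the metric and the sign convention of the sample shift are likewise assumptions.

```python
import numpy as np

def timing_metric(llr_1, llr_2):
    """Equation (6): sum over the block of L^2(u_k|Y1) - L^2(u_k|Y2)."""
    llr_1 = np.asarray(llr_1, dtype=float)
    llr_2 = np.asarray(llr_2, dtype=float)
    return float(np.sum(llr_1 ** 2 - llr_2 ** 2))

def lookup_offset(metric, table_metric, table_eps):
    """Invert a monotonically increasing metric-versus-offset table."""
    # np.interp clamps to the table ends, which keeps the estimate bounded.
    return float(np.interp(metric, table_metric, table_eps))

def shift_samples(samples, shift, sps=4):
    """Linearly interpolate samples (sps per symbol) at times t + shift*T."""
    samples = np.asarray(samples, dtype=float)
    n = np.arange(len(samples), dtype=float)
    # Positions beyond the last sample are clamped to the edge value.
    return np.interp(n + shift * sps, n, samples)

# Hypothetical table: normalized metric rising linearly over (-0.5, 0.5).
table_eps = np.linspace(-0.5, 0.5, 21)
table_metric = 2.0 * table_eps

eps_hat = lookup_offset(0.4, table_metric, table_eps)
ramp = np.arange(8.0)
shifted = shift_samples(ramp, 0.25)           # a quarter symbol = one sample
```

A real table would be flatter near zero for the one-symbol spacing, which is exactly the slow-convergence issue that motivates the half-symbol spacing in later iterations.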
Figure 8: Lookup table functions for SNR=3dB. [Plot: metric versus ε/T; curves: TS metric, 1/2 TS metric.]
8 Performance
The performance of the proposed timing recovery algorithm was tested and compared with the traditional ML timing estimator described in [4]. Such an ML synchronizer uses a bank of matched filters (in this case 50 filters) with different delays, corresponding to different timing offsets. After the initial sampling, the digital signal is fed to those filters, which produces sample sequences with different delays. These sequences are then downsampled, and the resulting values are squared and added over the block, forming a target function. The timing estimate is chosen to be the value of $\varepsilon$ that maximizes the target function, as given by

$$\hat\varepsilon = \arg\max_{\varepsilon} \sum_{k=0}^{N-1} \left| z(k T_s + \varepsilon T_s) \right|^2 \tag{8}$$

Using $\hat\varepsilon$, the linear interpolator finds new values of the incoming samples, which in turn are used by the decoder. The variances of the timing estimate produced by the proposed algorithm (MAP) for the first three iterations and by the reference ML algorithm are shown in figure 9. The timing error variance of the proposed MAP algorithm (in the third iteration) is lower than that of the ML estimate for SNR larger than approximately 1dB. Unfortunately, after the third iteration the synchronization gains are negligible. It is also worth noting that the variance itself does not determine whether one scheme will outperform the other; we will show that the shape of the error pdf is also important. The superiority of the MAP algorithm is confirmed in figure 10, which shows the actual BER of the turbo code used together with the synchronization units described above.
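The reference NDA ML estimator of equation (8) amounts to a grid search over candidate offsets. The sketch below assumes that the bank of delayed matched filters has already produced one symbol-rate sequence per candidate offset (one row per filter). This array layout and the function name are assumptions made for illustration, and the toy data are synthetic rather than taken from the paper.

```python
import numpy as np

def ml_timing_estimate(candidates, filter_outputs):
    """Equation (8): pick the offset whose sequence has the largest energy.

    candidates     -- 1-D array of trial offsets (in symbol periods)
    filter_outputs -- 2-D array, row i holds z(k*Ts + candidates[i]*Ts)
    """
    z = np.asarray(filter_outputs)
    target = np.sum(np.abs(z) ** 2, axis=1)   # squared and added over the block
    return float(np.asarray(candidates)[np.argmax(target)])

# Toy example with 50 candidate delays, matching the size of the filter bank.
rng = np.random.default_rng(1)
candidates = np.linspace(-0.5, 0.5, 50, endpoint=False)
true_offset = candidates[30]
# Synthetic amplitude that peaks when the trial offset matches the true one.
gain = 1.0 - (candidates - true_offset) ** 2
symbols = rng.choice([-1.0, 1.0], size=256)
outputs = gain[:, None] * symbols[None, :]

eps_hat = ml_timing_estimate(candidates, outputs)
```

The resolution of such an estimator is limited by the grid spacing, here 1/50 of a symbol period.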
Figure 6: Example of non-perfect sampling (ε = 0.1). The solid lines represent the samples used in the first iteration and the dashed lines the samples used in the second iteration, after the first, rough estimation of the timing offset. [Plot: pulse amplitude versus t/Ts.]

Figure 7: Timing recovery unit. [Block diagram: DMF, INTP, TURBO1 and TURBO2 each followed by Σ(·)², LOOKUP TABLE.]
The MAP synchronizer with 3 synchronization iterations and 10 decoding iterations achieves better results than the ML synchronizer in the region of SNR>2dB. Moreover, the ML synchronizer seems to reach a performance floor, and its curve becomes much flatter than the one corresponding to the MAP synchronizer. It is not clear why the MAP scheme is inferior to the ML scheme at lower SNR. The loss occurs, however, in the region of BER>0.01, which is hardly ever used.

The actual timing error distributions for the three consecutive iterations of the MAP scheme and for the ML scheme can be seen in figure 11. The ML error distribution curve is narrower than the MAP one, but there are also large errors, which occur when the initial sampling offset is close to the maximum of half a symbol period and the ML algorithm produces an estimate of the same absolute value but the opposite sign. Those large errors are therefore not cycle slips; they do not cause the loss of a symbol. The MAP synchronizer produces cruder estimates in terms of the pdf, but they are limited to the range $\varepsilon \in (-0.2 T_s, 0.2 T_s)$. There are no large errors that cause a significant increase of the BER.
Figure 9: Timing error variances. [Plot: timing error variance versus Eb/N0 (dB); curves: ML, MAP - 1 iter., MAP - 2 iter., MAP - 3 iter.]
9 Conclusions
The joint synchronization and decoding of turbo codes seems to give good results in the low SNR region. Even the simple solution presented in this paper outperforms the traditional ML method and reduces hardware complexity by making use of the existing turbo decoder structure and by avoiding the transmission of pilot symbols.

Acknowledgement: This work has been funded by the Center for Personal Computing and Communication (PCC).
Figure 10: BER for differently synchronized turbo codes. [Plot: BER versus Eb/N0 (dB); curves: MAP, ML, ideal, CR, var=0.01, var=0.05, var=0.1.]
Figure 11: Timing error distributions (Eb/N0=3dB). [Plot: distribution versus ε/T; curves: ML, MAP - 1 iter., MAP - 2 iter., MAP - 3 iter.]

References
[1] Berrou, C., Glavieux, A., Thitimajshima, P., "Near Shannon limit error-correcting coding and decoding: turbo codes," ICC 1993, pp. 1064-1070.
[2] Barbulescu, S.A., Iterative Decoding of Turbo Codes and Other Concatenated Codes, PhD Dissertation, University of South Australia, 1996.
[3] Summers, T.A., Wilson, S.G., "SNR Mismatch and Online Estimation in Turbo Decoding," IEEE Transactions on Communications, vol. 46, 1998, pp. 421-423.
[4] Meyr, H., Moeneclaey, M., Fechtel, S.A., Digital Communication Receivers, Wiley, 1998.
[5] Mengali, U., D'Andrea, A.N., Synchronization Techniques for Digital Receivers, Plenum Press, 1997.
[6] Proakis, J.G., Digital Communications, McGraw-Hill, 1995.