Fast Header Decoding Using Turbo Codes on Rayleigh Fading Channels
Bartosz Mielczarek, Arne Svensson
Chalmers University of Technology, Department of Signals and Systems, Communication Systems Group
SE-412 96 Göteborg, Sweden
PH: +46 31 772 1763, FAX: +46 31 772 1748
{Bartosz.Mielczarek,Arne.Svensson}@s2.chalmers.se

Abstract
In this paper we present a modification of a typical turbo code that allows the header bits to be decoded much faster than the other bits. This gives the decoder the possibility to discard data packets not addressed to it, and saves time and power since fewer iterations need to be performed to reliably decode the header. The proposed algorithm frequently terminates the first trellis of the turbo encoder by inserting tail bits into the data sequence. The bits surrounding the termination sequences have a much lower bit error rate and higher reliability than the other bits and can be used to store header information. Moreover, a tailored interleaver and puncturer design can increase the reliability of the header decoding even further. The algorithm is shown to decode the header bits at a bit error rate of approximately 0.002 (at Eb/N0 = 3 dB) after only one iteration. In addition, the proposed algorithm has lower memory requirements than the non-modified turbo decoding algorithm.

1. Introduction
Turbo codes are one of the most promising coding techniques for very noisy channels. Their exceptional performance allows transmitting the signal with very low power, which is one of the most crucial demands in future mobile systems. There exist, however, a number of problems with the practical use of turbo codes, and one of their major drawbacks is relatively high complexity. The decoding of turbo codes is iterative, i.e., the quality of the decoded bits improves after each iteration. While this allows for a significant reduction of the decoder complexity, it creates problems for certain applications demanding fast, reliable decoding with low bit error rate (BER). One example is multiuser packet transmission. In such systems, each packet is marked with a so-called header which identifies the sender and the receiver of the packet. After receiving a packet, the decoder reads the header first and decides whether the packet was addressed to it or not. The problem with turbo-coded systems can be easily identified here: if the header is turbo encoded, in most cases more than one iteration is needed to decode it reliably. If there are many users in the system, many packets will not be intended for a particular receiver, and their decoding will waste power and cause significant delays.

A straightforward solution to the problem is to encode the header separately and not use the turbo decoder to identify the receiver's address. Unfortunately, this causes new problems. Firstly, the header code must be very strong to operate in the same signal-to-noise ratio (defined here as SNR = Eb/N0) region as the actual turbo code with which the data is encoded. This means that the rate of such a header code must be very low, which reduces the bandwidth efficiency. Another solution is to transmit the header bits with higher energy than the data. This, however, reduces the power efficiency of the system. In addition, since the header bits are outside the turbo code structure, both these solutions reduce the turbo code interleaver length (for fixed delay), which impairs the BER performance (see [1]).

A promising solution is to use a compact turbo-coded packet for all types of bits: header, data and pilot. Then the length of the codeword can span the whole packet, there is no need to use different codes for different parts of the packet, and additional code properties can be used to estimate the channel (see [5]). The quick decoding of the header calls for a special error-protection scheme where the header bits are decoded much faster than the other bits. Although similar to typical unequal error-protection schemes, such a code does not guarantee that the final bit error rate will be different for different bits. Instead, at the beginning of the process, some bits will be decoded much faster and more reliably. Our solution takes advantage of the fact that the bits located close to the beginning and the end of the codeword are usually better protected.
This is due to the fact that the decoder starts from the known all-zero states. Since in a non-modified code only a few bits have this property, we propose a frequent first-trellis termination which increases the number of well-protected bits. After a fixed number of data bits, the first code trellis is terminated by tail bits, and the process continues until the whole codeword is created. Moreover, we modify the interleaver and puncturer structures to further decrease the BER values for the selected bits.

Figure 1: The classical structure of a parallel turbo encoder

2. System Model
The system analysed in this paper consists of three blocks: turbo encoder, Rayleigh fading channel and turbo decoder. A brief description of these blocks follows.

2.1 Turbo encoder
The architecture of the typical parallel turbo encoder is shown in Figure 1. It consists of two identical recursive systematic convolutional (RSC) encoders, connected in parallel through a pseudo-random interleaver. The stream of N input bits $u_k$ is fed to the first encoder without any modifications and is randomly interleaved for the second encoder. The parity bits $x_k^{p,1}$ and $x_k^{p,2}$ produced by the encoders are then modulated and transmitted together with the systematic bits (denoted also as $x_k^s$). The natural rate of this code is 1/3 (one systematic bit and two parity bits per data bit), but it is easily increased by puncturing some of the parity bits.
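The encoder structure above can be sketched in a few lines of Python. The generator polynomials $(7,5)_{oct}$ (feedback $1+D+D^2$, feedforward $1+D^2$) and the tiny block length are illustrative choices for this sketch, not the parameters used in the paper:

```python
import random

def rsc_encode(bits):
    """Rate-1/2 RSC component encoder with feedback 1+D+D^2 and
    feedforward 1+D^2 (generators (7,5)_oct -- illustrative choice)."""
    s1 = s2 = 0  # shift-register state
    parity = []
    for u in bits:
        fb = u ^ s1 ^ s2        # recursive feedback node (1 + D + D^2)
        parity.append(fb ^ s2)  # feedforward tap (1 + D^2)
        s1, s2 = fb, s1
    return parity

def turbo_encode(u, interleaver):
    """Classical rate-1/3 parallel turbo encoder: the systematic stream
    plus one parity stream per component RSC encoder."""
    p1 = rsc_encode(u)
    p2 = rsc_encode([u[i] for i in interleaver])
    return u, p1, p2

N = 8
u = [random.randint(0, 1) for _ in range(N)]
pi = list(range(N))
random.shuffle(pi)
xs, xp1, xp2 = turbo_encode(u, pi)
assert len(xs) == len(xp1) == len(xp2) == N  # natural rate 1/3
```

Puncturing some of `xp1`/`xp2` before transmission raises the rate, exactly as described in the text.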

2.2 Channel model
The streams of systematic and parity bits are interleaved prior to transmission over the channel and BPSK modulated with symbol energy $E_C$. The demodulated and deinterleaved code symbols $y_k$ experience Rayleigh fading and are corrupted by white Gaussian noise. We assume ideal interleaving, which implies that the fading amplitudes are uncorrelated (the channel has no memory). Moreover, we assume that the receiver has ideal channel state information (CSI) and is perfectly synchronized. The received code symbols are then represented as

$$y_k = a_k x_k + n_k \quad (1)$$

where $a_k$ is a fading amplitude, $x_k \in \{-\sqrt{E_C}, +\sqrt{E_C}\}$ and $n_k$ is white Gaussian noise with $E[n_k^2] = \sigma_0^2$.

2.3 Turbo decoder
The turbo decoder consists of two a-posteriori probability (APP) decoders which decode each component code separately and exchange reliability information about the decoded bits. APP decoders usually use the BCJR-MAP decoding algorithm [2], which computes the soft bits using the log-likelihood ratio (LLR)

$$L(\hat{u}_k) = \log \frac{\sum_{S^+} \tilde{\alpha}_{k-1}(s')\,\gamma_k(s',s)\,\tilde{\beta}_k(s)}{\sum_{S^-} \tilde{\alpha}_{k-1}(s')\,\gamma_k(s',s)\,\tilde{\beta}_k(s)} \quad (2)$$

where $L(\hat{u}_k)$ is the value of the soft decoded bit $k$, and $\tilde{\alpha}_k(s')$ and $\tilde{\beta}_k(s')$ are the recursively calculated probabilities of arriving at state $s'$, computed from the start and the end of the trellis, respectively. Finally, $\gamma_k(s',s)$ is the probability of the transition between states $s'$ and $s$, which (for the first iteration) is given by

$$\gamma_k(s',s) \sim \exp\!\left(\frac{1}{2} u_k L_{12}^{e}(u_k)\right) \cdot \exp\!\left(\frac{1}{2} L_c \left(a_k^s y_k^s u_k + a_k^p y_k^p x_k^p\right)\right) \quad (3)$$

where $L_{12}^{e}(u_k)$ is the extrinsic information about bit $k$ passed from the first APP decoder to the second one, $L_c = 4E_C/N_0$ is the channel reliability factor, $y_k^s$ and $y_k^p$ are the noisy received values of the systematic and parity bits, respectively, and finally $a_k^s$ and $a_k^p$ are the fading amplitudes of the systematic and parity bits, respectively. The numerator of the MAP equation includes all the transitions caused by $u_k = +1$ ($S^+$) and the denominator the transitions caused by $u_k = -1$ ($S^-$). The BCJR decoding is performed in each component APP decoder and the extrinsic soft-bit information is exchanged between them in a fashion reminiscent of the turbo-engine principle (hence the name of the code). As a result, consecutive iterations improve the BER until the saturation point (usually after 10-20 iterations).

3. Decoding with the BCJR algorithm
Looking at equation (2), one can see that the summations over the transitions involve the probabilities $\tilde{\alpha}_k(s')$ and $\tilde{\beta}_k(s')$ of arriving at the states of the trellis. Obviously, especially at early iterations, the probabilities for all states will not differ much, reflecting the uncertainty of the decoding process (see [6]). However, the bits transmitted in the beginning of the trellis are usually better protected, since the first state of the trellis is always specified (usually the all-zero state), resulting in $\tilde{\alpha}_0(0) = 1$, $\tilde{\alpha}_0(s' \neq 0) = 0$. For the first transmitted bit $u_1$, equation (2) reduces to

$$L(\hat{u}_1) = \log\!\left(\frac{\gamma_1(0,0)}{\gamma_1(0,1)}\right) + \log\!\left(\frac{\tilde{\beta}_1(0)}{\tilde{\beta}_1(1)}\right) \quad (4)$$

Figure 2: The multiple trellis termination principle
Figure 3: Packet division principle

Equation (4) holds since there are only two states, 0 and 1, which can be reached from the starting state 0. The following bits will suffer from higher uncertainty due to the larger number of states. Equation (4) reveals that there are basically three issues that need to be addressed in order to provide better protection of the header bits:
a) increasing the number of bits with at least one adjacent well-defined state,
b) increasing the ratio $\gamma_k(0,0)/\gamma_k(0,1)$,
c) increasing the ratio of state probabilities $\tilde{\beta}_k(0)/\tilde{\beta}_k(1)$.
A simple solution to all the above problems will be presented in the following sections.
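The forward/backward recursions behind equation (2) can be sketched for a terminated trellis as follows. The toy 2-state trellis and its hand-picked transition metrics are purely illustrative, and the recursion is kept in the probability domain without the normalization or log-domain arithmetic a practical decoder would use:

```python
import math

def bcjr_llr(gamma, S_plus, S_minus, n_states):
    """Minimal BCJR forward/backward recursion in the probability domain.
    gamma[k][(s_prev, s)] is the transition metric for trellis step k;
    S_plus[k] / S_minus[k] list the transitions caused by u_k = +1 / -1."""
    K = len(gamma)
    # forward recursion: alpha_0 is fixed by the known all-zero start state
    alpha = [[0.0] * n_states for _ in range(K + 1)]
    alpha[0][0] = 1.0
    for k in range(K):
        for (sp, s), g in gamma[k].items():
            alpha[k + 1][s] += alpha[k][sp] * g
    # backward recursion: a terminated trellis also ends in state 0
    beta = [[0.0] * n_states for _ in range(K + 1)]
    beta[K][0] = 1.0
    for k in range(K - 1, -1, -1):
        for (sp, s), g in gamma[k].items():
            beta[k][sp] += g * beta[k + 1][s]
    # LLR of equation (2)
    llr = []
    for k in range(K):
        num = sum(alpha[k][sp] * g * beta[k + 1][s]
                  for (sp, s), g in gamma[k].items() if (sp, s) in S_plus[k])
        den = sum(alpha[k][sp] * g * beta[k + 1][s]
                  for (sp, s), g in gamma[k].items() if (sp, s) in S_minus[k])
        llr.append(math.log(num / den))
    return llr

# toy 2-state, length-2 terminated trellis with hand-picked metrics
gamma = [{(0, 0): 0.9, (0, 1): 0.1},
         {(0, 0): 0.8, (1, 0): 0.2}]
S_plus = [{(0, 1)}, {(1, 0)}]    # transitions caused by u_k = +1
S_minus = [{(0, 0)}, {(0, 0)}]   # transitions caused by u_k = -1
llr = bcjr_llr(gamma, S_plus, S_minus, n_states=2)
```

Because both the start and the end state are pinned to zero, the decisions near the termination points come out with large magnitude, which is exactly the effect the proposed scheme exploits.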

4. Multiple trellis termination
Since in a non-modified turbo code only the first few bits are better protected, we propose a multiple first-RSC-trellis termination scheme which increases the number of bits close to a known state. Figure 2 shows the principle. The incoming data sequence is divided into n subblocks of fixed length. The data bits in each subblock are transmitted and the first code trellis is terminated by inserting tail bits. This operation splits one long trellis of the first encoder into a set of shorter trellises carrying data bits and tail bits. The new code has the following properties:
a) the code rate is reduced due to the insertion of the tail bits, which carry no useful information,
b) the number of starting bits is now n,
c) the decoding of the first trellis is simplified due to the shorter trellises.
The modified rate of the code is given as

$$r = r_0\left(1 - \frac{nK}{N}\right) \quad (5)$$

where $r_0$ is the original rate of the code, $K$ is the memory of the code and $N$ is the interleaver length. Even though the proposed code modification changes the codeword Hamming weight distribution, there is no significant BER performance loss (see [5]). Moreover, one can see additional advantages such as rate compatibility and improved synchronization. Having increased the number of bits with the property given by equation (2) to n, the remaining question is how to divide the block into subblocks and how to design the interleaver and puncturer so that the ratios in equation (4) will be as large as possible.
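Equation (5) is straightforward to evaluate. The parameter values below are illustrative assumptions (a rate-1/3 mother code, n = 10 terminations, K = 3 tail bits per termination, N = 1000), not results from the paper:

```python
def modified_rate(r0, n, K, N):
    """Equation (5): code rate after n trellis terminations, each costing
    K tail bits, within an interleaver block of N bits."""
    return r0 * (1 - n * K / N)

# illustrative numbers: the rate loss from 10 terminations of a
# memory-3 code in a 1000-bit block is only 3 %
print(round(modified_rate(1/3, 10, 3, 1000), 4))  # → 0.3233
```

The formula makes the trade-off explicit: the rate penalty grows linearly with the number of terminations n, so the subblock length controls how much protection is bought per lost rate.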

5. Packet division
Since terminating the trellis increases the reliability of bit detection close to the termination point, a straightforward way to increase the ratios $\tilde{\beta}_k(0)/\tilde{\beta}_k(1)$ is to make sure that the kth bit is close to another termination point on the right (the $\tilde{\beta}_k(s')$ probabilities are calculated backwards). This can be achieved by inserting the header bits between two trellis terminations, as can be seen in figure 3. After sending the data, the first trellis is terminated by sending a proper tail sequence. Then some of the header bits are encoded and the trellis is once again terminated. This process continues until the whole codeword is formed. In this way, the header bits are 'wedged' between the all-zero state on the left and the all-zero state on the right (even though there are some tail bit transitions in this case). This will lead to a better reliability of decoding, as we will see in the following sections.
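The packet-division principle can be sketched as a simple layout helper. The function name, the subblock sizes and the `'T'` placeholder for tail bits are illustrative assumptions; in a real encoder the tail bits depend on the encoder state at the termination point:

```python
def divide_packet(payload, header, n, memory=3):
    """Sketch of the packet division of figure 3: after each data subblock
    the first trellis is terminated by `memory` tail bits, then a few
    header bits are sent and the trellis is terminated again, wedging the
    header between two known all-zero states. Layout helper only."""
    per_block = len(payload) // n
    hdr_per_block = len(header) // n
    seq = []
    for i in range(n):
        seq += payload[i * per_block:(i + 1) * per_block]
        seq += ['T'] * memory   # terminate the first trellis
        seq += header[i * hdr_per_block:(i + 1) * hdr_per_block]
        seq += ['T'] * memory   # terminate again: header is now 'wedged'
    return seq

seq = divide_packet(['d'] * 20, ['h'] * 10, n=5, memory=3)
# every header bit now sits between two termination points
```

Each `'h'` run lands between two `'T'` runs, mirroring the "wedged" structure described above.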

6. Extrinsic information
The above discussion dealt with the first component decoder at the first iteration, which means that no extrinsic information was available and proper trellis termination was possible. Decoding of the second component code is more difficult to analyse, due to the presence of the interleaver and the extrinsic 'a priori' information generated by the first decoder. This means that equation (4) can no longer be used, since the header bits will generally be interleaved to random positions. However, as equation (3) shows, providing the second decoder with high extrinsic information $L_{12}^{e}(u_k)$ can improve the decoding quality even though no specific states lie close to the header bits. In other words, if we map high extrinsic-information bits to the transitions surrounding the header bits, the surrounding states should also be detected with good reliability. Figure 4 shows the average extrinsic information $L_{12}^{e}(u_k)$ generated by the first APP decoder (with the generator polynomials $(031,027)_{oct}$ (K=4), N=1000 bits, 10 header bits grouped in 5 regions (n=10) with 2 bits in each of them, r=1/3 and $E_b/N_0$ = 3 dB).

Figure 4: Absolute extrinsic information $|L_{12}^{e}(u_k)|$ passed from the first to the second decoder as a function of the position k in the code block (block length 1000, with 10 header bits distributed in pairs every 200 bits), shown for the original and the modified puncturing patterns.

One can clearly see that the reliability of bit decoding is much higher for the header and tail bits. This suggests that proper interleaving of those bits can improve the decoding in the second APP decoder. Note that without multiple trellis termination, there is no way of knowing beforehand where the high extrinsic-information bits are located.

7. Interleaver design
Figure 5 shows the principle of the improved interleaver construction. Firstly, the number m of high-reliability bits assigned to each interleaved header bit is chosen (in our example, we assign one such bit on each side of each header bit, which gives m=2). Secondly, a subset of nm bits with high extrinsic-information values is chosen; in our example, we chose the tail bits following each header transmission. Thirdly, m random bits from this subset are chosen under special constraints and interleaved to surround each interleaved header bit. The two constraints are:
a) no surrounding high-reliability bit can be interleaved to the vicinity of the same header bit it originated from;
b) for each interleaved header bit, all m interleaved high-reliability bits must originate from different header bits.

Figure 5: Interleaver design principle

Figure 6: The puncturing principle (note that the parity bits $x^{p,2}$ are shown after proper deinterleaving)

The above constraints ensure that the correlation between the interleaved extrinsic bits is very low, which increases the decoding performance (see [4]).
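The constrained assignment of section 7 can be sketched as a greedy random picker. The pool layout and all names here are illustrative assumptions, not the paper's exact construction:

```python
import random

def assign_surrounding_bits(header_ids, tail_pools, m=2, seed=0):
    """Sketch of the constrained interleaver assignment: each interleaved
    header bit gets m high-reliability (tail) bits as neighbours such that
    (a) no tail bit lands next to the header bit of its own subblock and
    (b) the m neighbours of one header bit come from m different subblocks.
    tail_pools[i] lists the tail-bit indices following header bit i."""
    rng = random.Random(seed)
    assignment = {}
    for h in header_ids:
        donors = [i for i in header_ids if i != h]   # constraint (a)
        chosen_blocks = rng.sample(donors, m)        # constraint (b): distinct
        assignment[h] = [rng.choice(tail_pools[i]) for i in chosen_blocks]
    return assignment

# 5 header regions, each followed by 3 tail bits (indices 100*i .. 100*i+2)
pools = {i: [100 * i + j for j in range(3)] for i in range(5)}
a = assign_surrounding_bits(list(range(5)), pools, m=2)
```

Keeping the donors distinct and foreign to the target header bit is what keeps the extrinsic values surrounding each header bit nearly uncorrelated.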

8. Puncturing
When the rate of a turbo code is to be increased, a common solution is to puncture some of the parity bits, which reduces the total number of code bits. In this paper, without any significant loss of generality, we look into a rate-1/2 punctured code, where every second parity bit from each component encoder is erased. Puncturing must be designed in conjunction with the interleaver, since it may happen that some bits are better protected than others. For example, when an even bit position is interleaved into an odd position, both parity bits will be transmitted (since puncturing leaves first parity bits on even positions and second parity bits on odd positions). This means, however, that another bit will have both its parity bits punctured (see [3]) and its protection will be much weaker. Figure 6 shows two steps of the puncturing pattern design (for a rate-1/2 code and header bits located at even positions). Firstly, each header bit is interleaved to an odd position. This ensures that their second parity bits will not be punctured. Secondly, the typical puncturing pattern (where the first parity bit is punctured for odd positions and the second parity bit is punctured for even positions) is used with the following modifications:
a) In the tail bits following the header, all second parity bits $x^{p,2}$ are punctured and all first parity bits $x^{p,1}$ are kept.
b) The first parity bit for the second header bit is kept; instead, the second parity bit for the last tail bit is punctured (see figure 6).
The above construction keeps the rate of the code unchanged and guarantees that the first APP decoder will have good statistics about all the bits adjacent to the header bits and will generate high-reliability information. The second decoder will thus be able to recover from the missing parity bits for the tail and provide low-BER estimates. The above method can easily be modified to any code rate. The basic rules are:
a) leave all the parity bits for the header bits;
b) if puncturing of tail bits is necessary, keep as many first parity bits as possible.
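The baseline and modified puncturing rules can be sketched as keep-masks over the two parity streams. This is a simplified illustration under stated assumptions: it covers the baseline alternation, the tail rule and the header rule, but omits the individual bit swaps of figure 6 that keep the overall rate exactly unchanged:

```python
def puncture_masks(N, header_pos, tail_pos):
    """Sketch of the rate-1/2 puncturing of section 8. Baseline pattern:
    keep first parity bits on even positions and second parity bits on
    odd positions. Tail bits following a header keep all first parity
    bits and lose all second parity bits; header bits (interleaved to
    odd positions) keep their second parity bits. Simplified helper."""
    keep_p1 = [k % 2 == 0 for k in range(N)]
    keep_p2 = [k % 2 == 1 for k in range(N)]
    for k in tail_pos:            # rule (a): favour first parity for tails
        keep_p1[k], keep_p2[k] = True, False
    for k in header_pos:          # header bits sit on odd positions, so
        keep_p2[k] = True         # their second parity bits survive
    return keep_p1, keep_p2
```

For example, `puncture_masks(10, header_pos=[3], tail_pos=[4, 5])` keeps both tail first-parity bits while dropping their second-parity bits, as rule (a) requires.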

9. Additional systematic bits
If the above modification cannot achieve the desired header BER, additional protection can be gained by repeating the systematic header bits (again at the cost of a decreased code rate). Equation (3) now becomes

$$\gamma_k(s',s) \sim \exp\!\left(\frac{1}{2} u_k L_{12}^{e}(u_k)\right) \cdot \exp\!\left(\frac{1}{2} L_c \left(u_k \sum_{l=0}^{L-1} a_{k,l}^s y_{k,l}^s + a_k^p y_k^p x_k^p\right)\right) \quad (6)$$

where $L$ is the number of repeated systematic bits for the given header bit $u_k$, and $a_{k,l}^s$ and $y_{k,l}^s$ are the fading amplitude and received code symbol of the lth repeated systematic bit. The reason for retransmitting the systematic bits instead of the parity bits is that they are used in both component APP decoders, as opposed to the different parity bits. This method, however, should be treated as a last resort since it does not incorporate the additional bits into the code structure.

10. Algorithm performance
The proposed code modification was tested by simulation. The parameters described in section 6 were used in all simulations (we also tested the code with rate 1/2). 100000 random data packets were transmitted independently and decoded using our method. The performance of the proposed algorithm for rate 1/2 is shown in figure 7. It is obvious that the selected header bits are detected very reliably after the first iteration. The actual bit error rate for the header bits is 0.005, while the average bit error rate for the whole block is 0.08. For the rate-1/3 code, the corresponding values are 0.002 and 0.03, respectively. Obviously, the decoder will be able to identify the header bits faster in this case as compared to the typical case where all the bits have the same level of protection. This should result in significantly reduced delay and power needed for packet identification.

Figure 7: A performance example of the proposed scheme for Eb/N0 = 3 dB, rate 1/2, 100000 transmitted blocks (the thick black line shows the average BER for the whole block).

11. Conclusions
We show that a relatively simple modification of the code can provide very effective and fast header decoding, reducing the delay and power consumption at the decoder. The proposed algorithm can be easily tailored to any code and any desired protection level of the header bits (also by including additional systematic bits if necessary). Moreover, no additional circuits are necessary in its implementation. Since the proposed code structure will have a different weight spectrum than the classical code, a deeper and more general analysis concerning its asymptotic performance and convergence properties is necessary. It is also possible that another interleaver design (for example, using more high-reliability bits to surround the interleaved header bits) can push the performance limits even further. These problems will be the subject of future work.

Acknowledgement
This work was supported by the Swedish Foundation for Strategic Research under the Personal Computing and Communication grant.

References
[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: turbo codes," Proc. ICC 1993, pp. 1064-1070.
[2] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Theory, pp. 284-287, Mar. 1974.
[3] S. A. Barbulescu, Iterative Decoding of Turbo Codes and Other Concatenated Codes, PhD dissertation, University of South Australia, 1996.
[4] J. Hokfelt, On the Design of Turbo Codes, PhD dissertation, Lund University, Lund, Sweden, 2000.
[5] B. Mielczarek and A. Svensson, "Joint adaptive rate turbo decoding and synchronization on Rayleigh fading channels," Proc. VTC 2001, Rhodes, Greece, pp. 1886-1890.
[6] B. Mielczarek, Synchronization in Turbo Coded Systems, Licentiate thesis, Chalmers University of Technology, Göteborg, Sweden, 2000.