
Designing good permutations for turbo codes: towards a single model

C. Berrou, Y. Saouter, C. Douillard, S. Kerouédan, and M. Jézéquel
GET/ENST Bretagne/PRACOM, Unité CNRS 2658, BP 832, 29285 BREST Cedex, FRANCE. Phone: (+33) 229 001306, Fax: (+33) 229 001184. E-mail: [email protected]

Abstract: The design of the turbo code internal permutation is crucial for the minimum Hamming distance (MHD) and the related achievable asymptotic gain. After setting out the permutation issues in a synthetic way, this paper presents a generic model which yields large MHDs for turbo codes and also offers wide possibilities of natural parallelism in the decoder architecture.

Keywords: turbo code; permutation; interleaving; minimum Hamming distance; parallelism.

I. INTRODUCTION

Obtaining near-capacity performance with a compound error-correction code is a twofold problem, whose two aspects are largely independent: first, no information must be wasted in the decoder, and second, the minimum Hamming distance (MHD) must be sufficient. The former is necessary to enable the decoder to remove errors at signal-to-noise ratios (SNRs) as low as possible, while the latter guarantees a steep decrease of the residual bit error rate (BER) at the decoder output when the SNR increases.

Turbo codes [1], and other multidimensional codes, are able to offer quasi-optimum performance for two reasons. First, turbo decoding is devised in such a way that no information is wasted or left partially unused: both component decoders benefit from the whole redundancy, thanks to the message-passing procedure. Second, multiplicities are low, that is, codewords at minimum distance are few, fewer by several orders of magnitude than those of algebraic codes (Reed-Solomon, BCH, ...) for instance. But these two properties do not guarantee that, near the theoretical limit, turbo and turbo-like codes are able to achieve low error rates. A sufficient value of the MHD is required to meet this objective.

The MHD of a turbo code (Fig. 1) depends both on the parallel concatenated component codes and on the permutation, whose role is to minimize the probability that the component decoders of C1 and C2 both have little redundant information with which to decide in favour of a codeword. This paper proposes a generic model which leads to large MHDs for turbo codes and also offers wide possibilities of natural parallelism in the decoder architecture. Before justifying and presenting this model, let us recall or lay down some basic concepts that will be used in the next sections:

• Turbo codes, as well as all codes used in practice, are linear. It is then possible to choose a reference transmitted sequence, which is naturally the all-zero sequence, and to consider non-zero codewords with minimum weight as possible competitors for the decoder.

• A return to zero (RTZ) [2,3] sequence is any finite input sequence which makes a recursive systematic convolutional (RSC) encoder leave state 0 and then return to this state. Using the D operator, and denoting G(D) the recursivity generator of the encoder ($G(D) = 1 + D + D^3$ for polynomial 15, in Fig. 1), all RTZ sequences are multiples of G(D), and vice versa.

IEEE Communications Society

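Figure 1. 8-state classical turbo code (polynomials 15, 13). [Block diagram: the data give the systematic output X and feed encoder C1 (redundancy Y1) directly and encoder C2 (redundancy Y2) through the permutation Π.]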

• The most elegant and powerful way to transform convolutional codes into block codes is circular termination. The circularity principle, that is, encoding in such a way that the final state of the register is equal to the initial one, allows each component convolutional code to become a perfect block code, without any need for additional information (no tail bits) and without any side effect. While the former point is of interest only for short blocks, the latter is crucial with regard to MHDs for all sizes. Circular termination makes it possible to give all the bits the same level of protection, including those at the beginning and at the end of the block. All bits being uniformly encoded, and all of them benefiting from the whole set of redundancy, there is no particular insidious effect that would drastically reduce the MHD. Circular termination therefore considerably simplifies the search for good permutations. One restriction on the use of the circularity principle is that the block length must not be a multiple of the period of the pseudo-random generator from which the circular RSC (CRSC) encoder is derived. For instance, when adopting polynomial 15 for the encoder (Fig. 1), the block size cannot be a multiple of 7 (unless stuffing bits are used).
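The last two concepts can be checked on a small model. The following Python sketch is only an illustration under stated assumptions: it takes polynomial 15 as the recursion ($1 + D + D^3$, as stated above) and polynomial 13 as the redundancy ($1 + D^2 + D^3$, an assumed tap convention read from the caption of Fig. 1). The function names are hypothetical. It tests whether an input is an RTZ sequence, whether it is a multiple of G(D), and finds the circulation state by brute force over the 8 states, which is not how a practical encoder would compute it.

```python
from itertools import product

def rsc_step(state, d):
    """One step of an 8-state RSC encoder (assumed taps: recursion 15, parity 13)."""
    s1, s2, s3 = state
    a = d ^ s1 ^ s3          # recursion 1 + D + D^3: taps on D and D^3
    y = a ^ s2 ^ s3          # redundancy 1 + D^2 + D^3
    return (a, s1, s2), y

def final_state(data, state=(0, 0, 0)):
    """State of the register after encoding all bits of data."""
    for d in data:
        state, _ = rsc_step(state, d)
    return state

def is_rtz(data):
    """True if a non-zero input drives the encoder from state 0 back to state 0."""
    return any(data) and final_state(data) == (0, 0, 0)

def divisible_by_g(data, g=(1, 1, 0, 1)):
    """Division over GF(2); data[i] and g[i] are the coefficients of D^i."""
    r = list(data)
    for i in range(len(r) - 1, len(g) - 2, -1):   # cancel the highest degree first
        if r[i]:
            for t, gt in enumerate(g):
                r[i - (len(g) - 1) + t] ^= gt
    return not any(r)

def circulation_states(data):
    """All states s such that encoding data from s ends in s (brute force)."""
    return [s for s in product((0, 1), repeat=3) if final_state(data, s) == s]
```

With this model, an 8-bit block has exactly one circulation state, whereas a 7-bit block has either none or eight, which is the restriction on block sizes that are multiples of the period.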


0-7803-8533-0/04/$20.00 (c) 2004 IEEE

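Figure 2. Rectangular (a) and circular (b) permutation. [Diagram: (a) k = M.N bits written linewise into M rows and N columns and read columnwise; (b) addresses i = 0 to k-1 placed on a circle and read with skip P (i = 0, P, 2.P, ..., wrapping to i = P - (k mod P)).]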

II. REGULAR PERMUTATION

For a long time, regular permutation has been seen exclusively as rectangular (linewise writing in an ad hoc memory and columnwise reading, Fig. 2(a)). When CRSC codes are used as component codes of a turbo code, circular permutation, based on congruence properties, is more appropriate. Circular permutation, for blocks of k information bits (Fig. 2(b)), is devised as follows. After writing the data in a linear memory, with address i (0 ≤ i ≤ k - 1), the block is likened to a circle, both extremities of the block (i = 0 and i = k - 1) then being contiguous. The data are read out such that the jth datum read was written at the position i given by:

$i = \Pi(j) = P j \bmod k$    (1)

where the skip value P is an integer, prime with k. We define the total spatial distance (or span) S(j1, j2) as the sum of the two spatial distances, before and after permutation, for a given pair of positions j1 and j2:

$S(j_1, j_2) = f(j_1, j_2) + f(\Pi(j_1), \Pi(j_2))$    (2)

where

$f(u, v) = \min\{|u - v|,\ k - |u - v|\}$    (3)

Finally, we denote Smin the minimum value of S(j1, j2), for all possible pairs j1 and j2:

$S_{\min} = \min_{j_1, j_2} \{S(j_1, j_2)\}$    (4)

With regular permutation, the value of P which maximizes Smin is

$P_0 = \sqrt{2k}$    (5)

with the condition:

$k = \frac{P_0}{2} \bmod P_0$    (6)

which gives

$S_{\min} = P_0 = \sqrt{2k}$    (7)

In practice, to satisfy the criterion of maximum total spatial distance, P is chosen as an integer close to P0, and prime with k.

Almost regular permutation (ARP)

The dilemma in the design of a good permutation for turbo codes lies in the need to obtain a sufficient minimum distance for two distinct classes of codewords, which require totally conflicting treatment [4]. The first class contains all codewords made up of a single (irreducible) RTZ sequence, and a good permutation for this class is as regular as possible. The second class encompasses all codewords made up of compound RTZ sequences, and non-uniformity (controlled disorder) has to be introduced in the permutation function to obtain a large minimum distance.

Fig. 3 illustrates the situation, showing the example of a rate-1/3 turbo code, using component binary encoders with code memory ν = 3 and periodicity $L = 2^{\nu} - 1 = 7$. For better visual understanding, and although circular permutation was previously said to be more appropriate, the block of k bits is seen as a rectangle with M rows and N columns with $M \approx N \approx \sqrt{k}$, and regular permutation is used, that is, data are written linewise and read columnwise.

Case (a) depicts a situation where encoder C1 (the horizontal one) is fed by an RTZ sequence with input weight w = 2. Redundancy Y1 delivered by this encoder is poor, but redundancy Y2 produced by encoder C2 (the vertical one) is very informative for this pattern, which is also an RTZ sequence, but whose span is 7.M instead of 7. The associated minimum distance would be around 7.M/2, which is a large minimum distance for typical values of k. With respect to this w = 2 case, the code is said to be "good" because dmin tends to infinity when k tends to infinity (according to a law in $O(\sqrt{k})$).

Case (b) deals with a weight-3 RTZ sequence. Again, whereas the contribution of redundancy Y1 is not high for this pattern, redundancy Y2 gives relevant information over a large span, of length 3.M. The conclusions are the same as in the above case.

Case (c) shows two examples of sequences with weights w = 6 and w = 9, which are RTZ sequences for both encoders C1 and C2. They are obtained by a combination of two or three minimal-length RTZ sequences. The set of redundancies is limited and depends neither on M nor on N. These patterns are typical of codewords that limit the minimum distance of a turbo code when a regular permutation is used.

Figure 3. Some possible RTZ (Return To Zero) sequences for both encoders C1 and C2, with $G(D) = 1 + D + D^3$ (period 7). (a) with input weight w = 2; (b) with w = 3; (c) with w = 6 or 9. [Diagram: regular rectangular permutation on k = M.N bits (M rows, N columns, linewise writing, columnwise reading) feeding C1 and C2; example patterns 00...001000000100...00 for (a), 1101 for (b), and rectangular arrangements of 1101 blocks for (c).]

In order to "break" rectangular patterns, some disorder has to be introduced into the permutation rule, while ensuring that the good properties of regular permutation, with respect to irreducible RTZs, are not lost. The almost regular permutation (ARP) we propose is based on faint vectorial fluctuations around the locations given by regular permutation, as depicted in Fig. 4. The number of different vectors used in the ARP is called the disorder cycle and is denoted C. C must be a divisor of k. The components of the fluctuation vectors are small compared to M and to N. Data are still written linewise, but reading is performed columnwise taking into account the small vectorial deviation at each location. With circular permutation, the visual representation of ARP is not easy, but the equations are simple. Equation (1) is modified as:

$i = \Pi(j) = P j + Q(j) \bmod k$    (8)

with

$Q(j) = A(j) P + B(j)$    (9)

The positive integers A(j) and B(j) are periodic with period C, and their values are fairly small compared to P and k/P (say less than 20%). Because the properties of a circular permutation are invariant under any rotation, we can choose Q(0) = 0, and a given ARP is then defined by 2(C - 1) integers, besides P. In practice, for short and medium blocks (say k < 2000), a cycle of 4 is sufficient and the permutation requires 7 parameters, at most, to be determined. In fact, experience shows that having only one non-zero value for A(j) is sufficient. For long blocks, larger cycles are necessary, but this can be done simply, as explained below.

Figure 4. Almost regular permutation with cycle 4 (4 different fluctuation vectors).

A sufficient, but not necessary, condition to ensure that any ARP defined by (8) exists (that is, that Π is a one-to-one mapping) is that all As and Bs are multiples of C. We adopt this mild condition, which allows us to rewrite (9) as:

$Q(j) = C(\alpha(j) P + \beta(j))$    (10)

where α(j) and β(j) are small integers, generally from 0 up to 8, and are periodic with period C.

Two typical sets of Q values, with cycle C = 4 and α = 0 or 1, are given below:

if j = 0 mod 4, then Q = 0
if j = 1 mod 4, then Q = 4P + 4β1
if j = 2 mod 4, then Q = 4β2            (11)
if j = 3 mod 4, then Q = 4P + 4β3

if j = 0 mod 4, then Q = 0
if j = 1 mod 4, then Q = 4β1
if j = 2 mod 4, then Q = 4P + 4β2       (12)
if j = 3 mod 4, then Q = 4P + 4β3

These models, which require the knowledge of only 4 parameters (P, β1, β2 and β3), were validated on different 8- or 16-state, binary or duo-binary turbo codes. In particular, the turbo code permutation of the DVB-RCS [5] and DVB-RCT [6] standards, which was devised by the authors of this article, was inspired by relations (11). The use of duo-binary instead of binary codes does not require any change in the permutation model, except that k, the number of information bits, has to be replaced by N = k/2, the number of couples, in the search for parameters P and {β}, in relations (8) and (10).
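For the regular circular permutation, relations (1) to (4) translate almost directly into code. The sketch below is a minimal illustration, not an optimized tool: `s_min` searches all pairs exhaustively, which costs O(k^2), and `best_skip` with its `radius` argument is a hypothetical helper that simply tries the integers near $\sqrt{2k}$ that are prime with k, in the spirit of the practical rule given after relation (7).

```python
from math import gcd, isqrt

def circ_dist(u, v, k):
    """Distance f(u, v) of relation (3) on a circle of k positions."""
    d = abs(u - v)
    return min(d, k - d)

def regular_permutation(k, P):
    """Relation (1): the j-th datum read was written at address P*j mod k."""
    if gcd(P, k) != 1:
        raise ValueError("P must be prime with k")
    return [(P * j) % k for j in range(k)]

def s_min(perm):
    """Relation (4): minimum total spatial distance over all pairs."""
    k = len(perm)
    return min(circ_dist(j1, j2, k) + circ_dist(perm[j1], perm[j2], k)
               for j1 in range(k) for j2 in range(j1 + 1, k))

def best_skip(k, radius=4):
    """Hypothetical helper: best P among integers close to sqrt(2k), prime with k."""
    p0 = isqrt(2 * k)
    candidates = [P for P in range(max(2, p0 - radius), p0 + radius + 1)
                  if gcd(P, k) == 1]
    return max(candidates, key=lambda P: s_min(regular_permutation(k, P)))
```

On a toy block of k = 8 bits, the skip P = 3 reaches a span of 4, which equals $\sqrt{2k}$, while the identity (P = 1) only reaches 2.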

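The ARP law of relation (8), with the cycle-4 offsets of relations (11) or (12), can be sketched in the same way. The function names and the `law` argument are assumptions made for this illustration. The check `is_bijection` verifies the existence condition numerically: since every Q(j) is a multiple of 4 and k is a multiple of 4, Π(j) mod 4 equals P.j mod 4, which is what makes Π one-to-one.

```python
from math import gcd

def arp_offsets(P, betas, law=12):
    """Offsets Q for j = 0, 1, 2, 3 mod 4, following relation (11) or (12)."""
    b1, b2, b3 = betas
    if law == 11:
        return [0, 4 * P + 4 * b1, 4 * b2, 4 * P + 4 * b3]
    if law == 12:
        return [0, 4 * b1, 4 * P + 4 * b2, 4 * P + 4 * b3]
    raise ValueError("law must be 11 or 12")

def arp_permutation(k, P, betas, law=12):
    """Relation (8) with cycle C = 4: Pi(j) = P*j + Q(j) mod k."""
    if k % 4 != 0 or gcd(P, k) != 1:
        raise ValueError("k must be a multiple of 4 and P prime with k")
    Q = arp_offsets(P, betas, law)
    return [(P * j + Q[j % 4]) % k for j in range(k)]

def is_bijection(perm):
    """True if every address 0..k-1 is produced exactly once."""
    return sorted(perm) == list(range(len(perm)))

# Duo-binary example quoted in the results: k = 408 bits, hence 204 couples,
# with P = 43 and beta1 = beta2 = beta3 = 1 in relation (12).
perm = arp_permutation(204, 43, (1, 1, 1), law=12)
```

For the duo-binary case, the permutation acts on couples, so the length passed to the function is N = k/2 rather than k.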

For long blocks, larger values of C may be necessary. For instance, if k is a multiple of 3, a cycle value of 12 can be simply introduced, by adding to relation (11) or (12) the following rule:

if j / 3 = 1 mod 2, then Q is replaced by Q + 12δ1P + 12δ2    (13)

Parameters δ1 and δ2 can be chosen once and for all, for all block sizes (for instance δ1 = δ2 = 1).

Crozier et al. have recently proposed a permutation model, called dithered relative prime (DRP) permutation [7-8], whose key ideas are very similar to those of ARP, but which leads to somewhat different equations.

III. STRATEGY TO FIND PARAMETERS

The strategy to find the parameters of the permutation is as follows:

step 1. Select the degree of disorder in the permutation law, typically 4 for short and medium blocks, 8 or 12 for long blocks.

step 2. Find possible sets of parameters P, {β}, {δ} which give a large value of Smin, as defined by (4).

step 3. Among the previous sets of parameter values, select the ones that maximize the RTZ spatial distance (RTZ span):

$S_{RTZ,\min} = \min_{j_1, j_2} \{S_{RTZ}(j_1, j_2)\}$

according to relations (2) to (4), for any value of j1 and j2 within the information block, and such that there is a possible RTZ sequence of weight 2 starting at place j1 (resp. j2) and ending at place j2 (resp. j1), and another RTZ sequence of weight 2 starting at place Π(j1) (resp. Π(j2)) and ending at place Π(j2) (resp. Π(j1)).

step 4. Check minimum Hamming distances, using for instance the Error Impulse Method (EIM) [9]. Run simulations for crucial cases.

IV. NATURAL PARALLELISM

The kind of parallelism proposed, that is, the possibility of using several processors without increasing the memory requirements, is inherent to the permutation law¹. Let us consider the permutation defined by (8) and (10), with a cycle C = 4 for instance, k then being a multiple of 4. Because the permutation cycle is 4, the congruences of j and Π(j), modulo 4, are periodic. A parallelism with degree 4 is then possible, for decoding in both natural and permuted order, according to the technique depicted in Fig. 5. Four Soft-In/Soft-Out (SISO) (forward or backward) processors are assigned to the four quadrants of the circle under process. At the same time, the four units deal with data that are located at places corresponding to the four possible congruences.

¹ The trivial parallelism related to the natural decomposition of turbo processing in backward/forward and first/second encoding is not addressed here.

Figure 5. The circle under process is divided into four quadrants, with four associated SISO (forward or backward) processors, allowing a parallelism degree of 4. [Diagram: Processor 1 starts at i or j = 0, Processor 2 at k/4 + 1, Processor 3 at k/2 + 2, Processor 4 at 3k/4 + 3.]

For instance, at the beginning of the process, the first processor deals with data located at an address with congruence 0, the second with congruence 1, and so on. At the next clock period, the first processor handles data at an address with congruence 1, the second one at an address with congruence 2, etc. So, through a barrel shifter which directs the four processors towards four distinct memory pages corresponding to the four congruences, a parallelism with degree 4 is feasible.

For any value of C, larger parallelism with degree pC is possible, where p is any integer, provided that k is also a multiple of p. Indeed, C being the cycle of Π, any multiple pC of C is also a cycle of the permutation, provided that pC is a divisor of k. Under this condition, j mod pC and Π(j) mod pC are periodic over the circles of length k, and a parallelism of degree pC is feasible. For instance, parallelism with degree 128 is workable for k = 4096, a multiple of 128.

V. SOME RESULTS

Figure 6 provides the MHDs that were obtained using (12) as the permutation law (cycle 4) of a duo-binary 16-state turbo code, for different block sizes and coding rates. These MHDs were estimated using the EIM, and are compared to the so-called matched MHDs (MMHDs), which are the minimum Hamming distances required to obtain a given FER performance (here, FER = 10^-7) at theoretical limits [10]. That is, the MMHD is exactly the MHD which is needed to avoid "flattening" when seeking a given error rate at the lowest possible signal-to-noise ratio. We can notice that the MHDs that the ARP permutation model yields are not far from the FER = 10^-7 MMHDs, except for short blocks and high coding rates. For these cases, an increase of the MHDs by 2 would be welcome.
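Returning to the congruence argument of Section IV, the absence of memory conflicts can be checked numerically. The sketch below makes several assumptions that are not stated in the paper: the four processors start at the addresses labelled in Fig. 5 (0, k/4 + 1, k/2 + 2, 3k/4 + 3), k is a multiple of 16 so that these four addresses have distinct residues modulo 4, and the parameters k = 64, P = 11, β1 = β2 = β3 = 1 are chosen only for illustration. The function names are hypothetical.

```python
def arp12(k, P, b1, b2, b3):
    """Relation (8) with the offsets of relation (12), cycle 4."""
    Q = [0, 4 * b1, 4 * P + 4 * b2, 4 * P + 4 * b3]
    return [(P * j + Q[j % 4]) % k for j in range(k)]

def conflict_free(perm, starts, steps, pages=4):
    """True if, at every clock tick, the processors address distinct memory
    pages (address mod pages) in both natural and permuted order."""
    k = len(perm)
    for t in range(steps):
        js = [(s + t) % k for s in starts]
        natural = {j % pages for j in js}
        permuted = {perm[j] % pages for j in js}
        if len(natural) < len(starts) or len(permuted) < len(starts):
            return False
    return True

k = 64                                        # illustrative size, multiple of 16
perm = arp12(k, 11, 1, 1, 1)                  # illustrative parameters
starts = [q * (k // 4) + q for q in range(4)] # start addresses read from Fig. 5
ok = conflict_free(perm, starts, steps=k // 4)
```

Under these assumptions `ok` is true, whereas start addresses that share the same residue modulo 4, such as 0, 16, 32 and 48, collide on the same memory page at every tick.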


Figure 6. Comparing matched minimum Hamming distances for QPSK and FER = 10^-7 (lines), and distances obtained with the duo-binary 16-state turbo code (dots), estimated with the Error Impulse Method (EIM).

Figure 7 shows the performance of a binary 16-state turbo code: k = 5472, R = 1/3 (cycle 12, according to (12) and (13), with P = 97, β1 = 6, β2 = 4, β3 = 6, δ1 = δ2 = 1, dmin = 67), and of a duo-binary 16-state turbo code: k = 408, R = 1/2 (cycle 4, according to (12), with P = 43, β1 = 1, β2 = 1, β3 = 1, dmin = 19).

Figure 7. Performance (frame error rate) of binary and duo-binary turbo codes, with almost regular permutation (ARP). See main text for parameters. Max-log-MAP decoding with 8 iterations, 6-bit input quantization. [Plot: Frame Error Rate (10^-1 down to 10^-8) versus Eb/N0 (dB, 0 to 3); curves: QPSK, 16-state binary, R = 1/3, k = 5472 bits; QPSK, 16-state duo-binary, R = 1/2, k = 408 bits.]

VI. CONCLUSION

Designing good permutations for low error rate turbo codes is a very arduous problem. In this article, we have proposed a single and simple model, based on the concept of almost regular permutation. This model requires the search for a limited number of parameters, typically 4 when considering relations (11) or (12) as basic models with cycle 4, suitable for short or medium blocks. Because these parameters are strongly dependent on the two component codes and on the error patterns they produce, we are not yet able to provide a systematic way to calculate them. However, we have proposed an empirical procedure to find possible sets of appropriate parameters. This model, and the way to find its numerical parameters, were validated by numerous experiments. Furthermore, the simplicity of the proposed model leads to straightforward solutions for high-degree parallelism, through the possible multiplication of SISO processors, without having to increase the intrinsic and extrinsic data memories. Finally, the permutation law, thanks again to its simplicity, does not require any ROM to store the permuted addresses. These addresses are easily calculated on the fly by simple recurrent additions.

REFERENCES

[1] C. Berrou, A. Glavieux and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: turbo-codes", Proc. of IEEE ICC '93, Geneva, pp. 1064-1070, May 1993.
[2] R. Podemski, W. Holubowicz, C. Berrou and G. Battail, "Hamming distance spectra of turbo-codes", Ann. Télécomm., Tome 50, N° 9-10, pp. 790-797, Sept-Oct. 1995.
[3] C. Berrou and A. Glavieux, "Turbo codes", in the Encyclopedia of Telecommunications, Wiley, New York.
[4] C. Berrou, "Turbo codes: some simple ideas for efficient communications", ESA DSP '2001, Lisbon, Oct. 2001, and ESA TTC '2001, Noordwijk, The Netherlands, Oct. 2001.
[5] DVB, "Interaction channel for satellite distribution systems", ETSI EN 301 790, V1.2.2, pp. 21-24, Dec. 2000.
[6] DVB, "Interaction channel for digital terrestrial television", ETSI EN 301 958, V1.1.1, pp. 28-30, Aug. 2001.
[7] S. Crozier, J. Lodge, P. Guinand and A. Hunt, "Performance of turbo codes with relatively prime and golden interleaving strategies", Proc. of 6th Int'l Mobile Satellite Conf., pp. 268-275, Ottawa, Canada, June 1999.
[8] S. Crozier and P. Guinand, "Distance upper bounds and true minimum distance results for turbo-codes with DRP interleavers", Proc. of 3rd Int'l Symposium on Turbo Codes & Related Topics, pp. 169-172, Brest, France, Sept. 2003.
[9] C. Berrou, S. Vaton, M. Jézéquel and C. Douillard, "Computing the minimum distances of linear codes by the error impulse method", Proc. of GLOBECOM'02, Taipei, Taiwan, Nov. 2002.
[10] C. Berrou, E.A. Maury and H. Gonzalez, "Which minimum Hamming distance do we really need?", Proc. of 3rd Int'l Symposium on Turbo Codes & Related Topics, pp. 141-148, Brest, France, Sept. 2003.
