
A Turbo Co/Decoder implementation for next generation DVB-S2

S. Benedetto (1), T. Botticchio (2), P. Burzigotti (2), R. Degaudenzi (3), A. Martinez (3), G. Montorsi (1), F. Richichi (2) and P. Tabacco (4)

(1) DELEN, Politecnico di Torino, Corso Duca degli Abruzzi 24. Phone: +39 011 5644099, Fax: +39 011 5644000, [email protected], [email protected]

(2) Space Engineering S.p.A., via dei Berio 91, 00155 Roma, Italy. Phone: +39 06 225951, Fax: +39 06 2280739, [email protected], [email protected], [email protected]

(3) TOS-ETC, ESA/ESTEC, Keplerlaan 1, Postbus 299, 2200 AG Noordwijk (The Netherlands). Phone: +31 715654227, Fax: +31 715654596, [email protected]; Phone: +31 715654943, Fax: +31 715654596, [email protected]

(4) DSP SYSTEMS, via dell'Orsa Maggiore 21, 00144 Roma, Italy, [email protected]

Abstract
This work describes the algorithmic design and shows the performance of a Turbo decoder aimed at low Eb/No working points (near the Shannon limit). It was purposely designed to work, following a pragmatic approach, with the QPSK, 8PSK, 16APSK and 16QAM constellations. The Turbo Codec is a pure Turbo code implementation that does not need an additional Reed-Solomon outer codec.

1 Overview
Turbo codes, first introduced in 1993, are a way to construct concatenated codes able to achieve near Shannon-limit error correction in terms of BER. Large interleavers and iterative feedback decoding make these results reachable with a relatively simple structure. Turbo codes were first presented in parallel form, made up of two RSCs (Recursive Systematic Codes) and an interleaver. Later studies opened up different choices in the number, type and concatenation of the coders, giving solutions with different performance and complexity (in both the encoding and the decoding phase). Today a serial concatenation seems to allow better performance, yielding lower error floors than parallel concatenated convolutional codes (PCCC). Serially concatenated codes also usually employ constituent encoders with lower complexity (fewer states), leading to a remarkably simpler decoding algorithm.

In order to improve the bandwidth efficiency it is also possible to cascade a high-level modulator, as an alternative to traditional TCM (trellis-coded modulation). Figure 1 depicts a general scheme for a codulator (coder and modulator). In our case study the encoder is a Turbo encoder, the source produces equally probable binary symbols (0/1), and the modulator is QPSK, 8PSK, 16APSK or 16QAM.

Figure 1: Codulator scheme

The quantities shown in Figure 1 are related by the following relationships:

$R_s$ : transmission rate of the source
$R_e = R_s / r_c$ : encoded stream rate
$R_m = R_e / M = R_s / (M r_c) = R_s / r$ : baud rate
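The relationships above can be checked with a short numerical sketch. It assumes that M denotes the number of coded bits carried by each channel symbol (the text does not define M explicitly), and the input values are illustrative, not taken from the paper.

```python
from fractions import Fraction


def codulator_rates(source_rate, code_rate, bits_per_symbol):
    """Return (encoded stream rate R_e, baud rate R_m) for a source rate R_s.

    bits_per_symbol plays the role of M and is assumed to be the number of
    coded bits mapped onto one channel symbol.
    """
    encoded_rate = source_rate / code_rate        # R_e = R_s / r_c
    baud_rate = encoded_rate / bits_per_symbol    # R_m = R_e / M
    return encoded_rate, baud_rate


# Illustrative values: 2 Mbit/s source, rate-2/3 code, 8PSK (3 bits per symbol).
r_e, r_m = codulator_rates(Fraction(2_000_000), Fraction(2, 3), 3)
print(r_e, r_m)  # 3000000 1000000
```

Exact fractions are used so that code rates such as 2/3 do not introduce rounding noise in the result.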

The parameter rc characterizes the encoder in terms of bandwidth and error-correcting capability; it indicates how much redundancy is introduced.

2 Logic Structure
The presence of a multilevel modulation involves a first choice about the turbo encoder: both "pragmatic" and "ad-hoc" encoders are found in the literature. The latter refers to a turbo coder designed for a known modulation, with a direct mapping of encoder outputs to the constellation points. Given a source rate and the code rate rc, the modulation and the mapping rule are selected together to obtain the prescribed channel symbol rate Rm. A "pragmatic" solution, instead, is completely independent of the modulation and hence of Rm, which gives greater flexibility: the code rate and the modulation can be fixed independently, and the correct channel symbol rate is achieved with an appropriate puncturing pattern. Sources with different rates are therefore accepted without heavy modification. The present Turbo design is aimed to work with different modulations (QPSK, 8PSK, 16APSK, 16QAM), achieving a number of spectral efficiencies, while only the puncturer has to be modified. The price is a slightly lower performance.

In the present paper we refer to a "pragmatic" SCCC (Serially Concatenated Convolutional Code) solution, characterized by an outer 16-state convolutional coder, a puncturer, an interleaver and an inner differential encoder: a pure Turbo code that needs no concatenated code (such as Reed-Solomon).

The main purpose of the interleaver is to break up error patterns. After the corrections in one dimension (first decoder), the remaining errors should be spread so that they become correctable error patterns in the second dimension (second decoder). The error correction capability of the turbo code then approaches that of a product code.
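To illustrate how a puncturing pattern changes the code rate without touching the encoder, the following minimal sketch deletes coded bits according to a periodic mask. The pattern shown is hypothetical and is not the one used in the paper, which does not list its puncturing patterns.

```python
from fractions import Fraction

# Hypothetical pattern: keep 3 of every 4 coded bits (1 = keep, 0 = delete).
HYPOTHETICAL_PATTERN = (1, 1, 1, 0)


def puncture(coded_bits, pattern):
    """Keep coded_bits[i] only where the periodic pattern holds a 1."""
    return [b for i, b in enumerate(coded_bits) if pattern[i % len(pattern)]]


def punctured_rate(mother_rate, pattern):
    """Code rate after puncturing a code of rate mother_rate."""
    return mother_rate * Fraction(len(pattern), sum(pattern))


coded = [0, 1, 1, 0, 1, 1, 0, 0]               # output of a rate-1/2 encoder
print(puncture(coded, HYPOTHETICAL_PATTERN))   # [0, 1, 1, 1, 1, 0]
print(punctured_rate(Fraction(1, 2), HYPOTHETICAL_PATTERN))  # 2/3
```

With this mask a rate-1/2 mother code becomes a rate-2/3 code; other rates follow by changing only the mask, which is the flexibility the pragmatic approach relies on.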

[Figure 2 block diagram; labels: input register (size K), outer (convolutional) encoder (rate 1/2), puncturer, interleaver, inner (differential) encoder (rate 1/1), row-column interleaver, modulation.]

Figure 2: Codulator scheme.
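A row-column interleaver such as the one labelled in Figure 2 can be sketched as follows: symbols are written into a matrix row by row and read out column by column. The matrix dimensions below are hypothetical; the paper does not state them.

```python
def row_column_interleave(symbols, rows, cols):
    """Write row by row into a rows x cols matrix, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def row_column_deinterleave(symbols, rows, cols):
    """Inverse operation: restore the original row-by-row order."""
    if len(symbols) != rows * cols:
        raise ValueError("block length must equal rows * cols")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]


block = list(range(6))
mixed = row_column_interleave(block, rows=2, cols=3)
print(mixed)                                  # [0, 3, 1, 4, 2, 5]
print(row_column_deinterleave(mixed, 2, 3))   # [0, 1, 2, 3, 4, 5]
```

Symbols that were adjacent before interleaving end up `rows` positions apart, so a burst of channel errors is spread over the block after de-interleaving.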

The Turbo code is made of two serially concatenated convolutional codes, with 16 states (outer code) and 2 states (inner code). A triangular, asymmetric S-random interleaver with a fixed length of 12000 bits is used. The structure for the supported coding rates and modulations is shown in Table 1.

Table 1. Different modulation parameters design.

Modulation          FEC rate [r]   Spectral efficiency (bps/Hz)   Channel symbols per block [F]   Information bits per block [K]
QPSK                1/2            0.994                          6000                            5996
QPSK                3/4            1.491                          6000                            8996
8-PSK               2/3            1.988                          4000                            7996
8-PSK               5/6            2.486                          4000                            9996
16-APSK (16-QAM)    3/4            2.983                          3000                            8996
16-APSK (16-QAM)    7/8            3.480                          3000                            10496
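The block sizes in Table 1 can be cross-checked against the 12000-bit interleaver length. In the sketch below the bits-per-symbol values (2, 3, 4) follow from the constellation sizes. The 4-bit offset between K and 12000·r is an observation from the table, and reading it as the termination of the 16-state outer code is an assumption the paper does not state.

```python
from fractions import Fraction

INTERLEAVER_BITS = 12000   # interleaver length given in the text
ASSUMED_TAIL_BITS = 4      # assumption: termination bits of the 16-state outer code

# (modulation, bits per symbol, FEC rate r, channel symbols F, information bits K)
TABLE_1 = [
    ("QPSK", 2, Fraction(1, 2), 6000, 5996),
    ("QPSK", 2, Fraction(3, 4), 6000, 8996),
    ("8-PSK", 3, Fraction(2, 3), 4000, 7996),
    ("8-PSK", 3, Fraction(5, 6), 4000, 9996),
    ("16-APSK", 4, Fraction(3, 4), 3000, 8996),
    ("16-APSK", 4, Fraction(7, 8), 3000, 10496),
]


def row_is_consistent(bits_per_symbol, rate, symbols, info_bits):
    """Check F * bits_per_symbol == 12000 and K == 12000 * r - 4."""
    coded_bits = bits_per_symbol * symbols
    return (coded_bits == INTERLEAVER_BITS
            and info_bits == coded_bits * rate - ASSUMED_TAIL_BITS)


for name, m, rate, f, k in TABLE_1:
    print(name, rate, row_is_consistent(m, rate, f, k))
```

Every row satisfies both relations, which suggests that each block fills the interleaver exactly regardless of the modulation.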

Table 2. Target Eb/N0 for different spectral efficiencies with 8PSK modulation.

Code rate   Spectral efficiency   Eb/N0 standard [dB]   Capacity   Const. capacity   1 dB worse than const. capacity   Target Eb/N0 [dB] (2 dB over standard)
2/3         2                     6.45                  2.77       3.14              4.14                              4.45
3/4         2.25                  7.33                  3.67       4.04              5.04                              5.33
4/5         2.4                   7.75                  4.29       4.65              5.65                              5.75

In order to set the performance specifications for the project, we evaluated the Eb/N0 (in dB) required to obtain a "quasi error free" information flow at the input of the MPEG multiplexer. We based this on the DVB standard document [2], but since in our case the spectral efficiencies differ from those of the DVB standard (there is no Reed-Solomon encoder), we interpolated linearly the values reported in [2]. An example is summarized in Table 2, where the 3rd column gives the Eb/N0 required by the standard, the 4th column the Eb/N0 implied by capacity, the 5th the block-size constrained bound for interleaver size 12,000 and BER = 10^-10, the 6th the values at 1 dB from the constrained capacity with block size 12,000, and the 7th the agreed Eb/N0 objectives, which yield a 2 dB gain with respect to the standard.
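The last two columns of Table 2 follow from the others by simple offsets, as the short sketch below shows. The function and variable names are illustrative; the numbers are those of Table 2.

```python
from decimal import Decimal

# Columns 3 (standard) and 5 (constrained capacity) of Table 2, in dB.
TABLE_2 = {
    "2/3": (Decimal("6.45"), Decimal("3.14")),
    "3/4": (Decimal("7.33"), Decimal("4.04")),
    "4/5": (Decimal("7.75"), Decimal("4.65")),
}


def derived_columns(standard_db, constrained_db):
    """Return (column 6, column 7) of Table 2."""
    one_db_from_bound = constrained_db + 1   # 1 dB worse than constrained capacity
    target = standard_db - 2                 # 2 dB gain over the standard
    return one_db_from_bound, target


for rate, (standard, constrained) in TABLE_2.items():
    bound_plus_1, target = derived_columns(standard, constrained)
    # The last value is the margin left between target and constrained bound.
    print(rate, bound_plus_1, target, target - constrained)
```

Decimal arithmetic is used so that the printed values match the two-digit entries of the table exactly.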

3 Encoder
This section reports the constituent encoders chosen for the Turbo code. The outer encoder is the first that the bits meet, while the inner encoder sits just before the channel interleaver.

3.1 Inner differential encoder

[Figure 3 diagram: encoder with one delay element (z^-1), r = 1/1, k0 = 1, n0 = 1, states S = {S0, S1}, input U = {U0}, output C = {C0}; trellis section from t = k to t = k+1 with branches for U = 0 and U = 1.]

Figure 3. 2-states Systematic Encoder for rate 1/1.
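A two-state, rate-1/1 differential encoder is commonly realized as an accumulator, in which each output bit is the XOR of the current input and the previous output. The sketch below assumes that form; the paper gives the encoder only as a diagram, so the exact tap arrangement is an assumption.

```python
def differential_encode(bits, initial_state=0):
    """Accumulator: c_k = u_k XOR c_(k-1); the output is also the next state."""
    state = initial_state
    coded = []
    for u in bits:
        state ^= u
        coded.append(state)
    return coded


def differential_decode(coded, initial_state=0):
    """Hard-decision inverse: u_k = c_k XOR c_(k-1)."""
    previous = initial_state
    bits = []
    for c in coded:
        bits.append(c ^ previous)
        previous = c
    return bits


message = [1, 0, 1, 1, 0]
coded = differential_encode(message)
print(coded)                        # [1, 1, 0, 1, 1]
print(differential_decode(coded))   # [1, 0, 1, 1, 0]
```

The single memory bit gives the two states S0 and S1, and one output bit per input bit gives the rate 1/1.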

The choice of such a simple inner encoder is made in order to minimize the error floor during decoding. It can be shown that the outer code rate (ro) must be as low as possible and, consequently, the inner code rate (ri) as high as possible, since rc = ri * ro. Because the inner code rate is 1/1, this code cannot be systematic.

3.2 Outer convolutional encoder
The scheme in Figure 4 represents the recursive convolutional encoder. It is a 16-state machine with a code rate fixed to 1/2.

[Figure 4 diagram: r = 1/2; input U0, outputs C0 and C1, four delay elements (z^-1).]

Figure 4. 16 states Recursive Systematic Encoder for rate 1/2.
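The structure of a 16-state, rate-1/2 recursive systematic encoder can be sketched as below. The tap positions are hypothetical (feedback 1 + D + D^4, feedforward 1 + D^2 + D^3 + D^4); the generator polynomials actually used in the design are not recoverable from the text.

```python
# Hypothetical taps, NOT taken from the paper.
FEEDBACK_TAPS = (1, 0, 0, 1)     # taps on delay elements 1..4 -> 1 + D + D^4
FORWARD_TAPS = (1, 0, 1, 1, 1)   # tap on the feedback sum, then on delays 1..4


def rsc_encode(bits, feedback=FEEDBACK_TAPS, forward=FORWARD_TAPS):
    """Return a list of (systematic bit, parity bit) pairs."""
    state = (0, 0, 0, 0)         # four delay elements -> 2**4 = 16 states
    output = []
    for u in bits:
        a = u                    # recursive sum entering the shift register
        for tap, s in zip(feedback, state):
            a ^= tap & s
        parity = forward[0] & a
        for tap, s in zip(forward[1:], state):
            parity ^= tap & s
        output.append((u, parity))
        state = (a,) + state[:3]
    return output


print(rsc_encode([1, 0, 0, 0]))  # [(1, 1), (0, 1), (0, 0), (0, 1)]
```

The first element of each pair reproduces the input (systematic part), while the parity keeps responding after a single 1 at the input, which is the recursive behaviour.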

4 Decoder
A number of decoding algorithms are known in the literature, but only ML (Maximum Likelihood) and MAP (Maximum A Posteriori) are implemented in practice because of their performance; both have been studied for use in turbo decoders. The ML algorithm, also known as the Viterbi algorithm, minimizes the probability of code word error but not the probability of single symbol error. During the decoding process only the probabilities of the most likely sequences are calculated and compared to select the survivor for the next step. No information about the individual symbols is calculated, only about the whole sequence. Although solutions such as SOVA have been proposed to modify and improve the original ML algorithm, making possible the exchange of "soft" information about single symbol probabilities, they remain sub-optimal.

The MAP algorithm, instead, is more complicated (and therefore has a higher implementation complexity); it aims to minimize the symbol error probability by maximizing the a posteriori probability of each symbol. The basic feature of turbo decoders is the feedback of the information produced during the process. A complete decoding is carried out after a number of decoding iterations on the same block of input symbols.

[Figure 5 block diagram; blocks: SISO inner (differential), de-interleaver, depuncturer, SISO outer (convolutional), puncturer, interleaver. The input is labelled "From soft channel interleaver"; signal widths are marked as 6-bit or 8-bit quantization; one port is marked "NOT used". Signals: P(C2,I), P(C2,O), P(U2,I), P(U2,O), P(C1,I), P(C1,O), P(U1,I), P(U1,O). Legend: P(Ci,I/O) = a priori probability of coded bits, input/output of the i-th SISO; P(Ui,I/O) = a priori probability of information bits, input/output of the i-th SISO.]

Figure 5: SISO interconnection to exchange extrinsic information.

The constituent blocks, labeled SISO (Soft Input - Soft Output), recursively exchange information with one another so that the knowledge about the information bits grows at each pass. No hard decision is made during the decoding process; instead "soft" information, that is a "weighted" decision, is calculated. The SISO block accepts as input the a priori probabilities of information bits and coded bits from a previous estimation and outputs their updated versions toward the next block. The APP algorithm is based on the calculation of the probabilities associated with the trellis. In particular, the purpose of the decoder is to allow a decision based on the comparison between the probability that a bit is 1 and the probability that it is 0. That comparison can be obtained in one shot by modifying the original algorithm so that it propagates log-likelihood ratio (LLR, see footnote 2) values.
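As a minimal illustration of why a single log-likelihood ratio carries the whole comparison, the sketch below converts a bit probability into an LLR, using the definition log(P(u = 1) / P(u = 0)), and takes the decision from its sign. The function names are illustrative and not part of the described implementation.

```python
import math


def llr(p_one):
    """Log-likelihood ratio log(P(u = 1) / P(u = 0)) for 0 < p_one < 1."""
    return math.log(p_one / (1.0 - p_one))


def probability_of_one(llr_value):
    """Inverse mapping: recover P(u = 1) from an LLR."""
    return 1.0 / (1.0 + math.exp(-llr_value))


def hard_decision(llr_value):
    """Decide 1 when P(u = 1) > P(u = 0), i.e. when the LLR is positive."""
    return 1 if llr_value > 0 else 0


for p in (0.1, 0.5, 0.9):
    value = llr(p)
    print(p, round(value, 3), hard_decision(value))
```

The sign of the LLR gives the decision and its magnitude gives the reliability, so one number per bit replaces the pair of probabilities.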

5 Implementation
The Turbo Codec was implemented using the architecture and the fast prototyping system shown below. The implementation reached a maximum rate of 2 Mbps with 7 iterations. By changing the coder rate, up to 30 iterations could be run at 400 kbps.

Footnote 2: log-likelihood ratio, defined as $\log\left(\frac{P(u = 1)}{P(u = 0)}\right)$

[Figure 6 block diagram; labels: RAM 1, RAM 2, RAM 3, RAM α, multiplexers (MUX), interleaver blocks πI and πO, INI; signals c0, c1, π(c), αk, β1, β2, llr(c0), llr(c1); bus widths of 8, 27 and 192 bits.]

Figure 6. SISO memory architecture.

[Figure 7 block diagram; labels: ACS-PE, MAX-MIN, MAX, MUX, LUT, CORR, MSB, Δ; signals πk(c) (9 bits), αk(s), βk(s), αk+1(s), βk-1(s) (12 bits).]

Figure 7. ACS Processor Element (ACS-PE).
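The MAX, LUT and CORR labels of Figure 7 are consistent with the log-domain "max*" operation often used in APP decoders, where log(e^a + e^b) is computed as max(a, b) plus a small correction read from a look-up table. The sketch below illustrates that operation only; the table size and step are hypothetical, and the paper does not describe the internal arithmetic of the ACS-PE.

```python
import math

LUT_STEP = 0.5   # hypothetical quantization step of |a - b|
LUT_SIZE = 8     # hypothetical number of table entries
CORRECTION_LUT = [math.log1p(math.exp(-i * LUT_STEP)) for i in range(LUT_SIZE)]


def max_star_exact(a, b):
    """Exact log(e^a + e^b), written as a maximum plus a correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_lut(a, b):
    """Table-based approximation: correction is zero outside the table."""
    index = int(abs(a - b) / LUT_STEP)
    correction = CORRECTION_LUT[index] if index < LUT_SIZE else 0.0
    return max(a, b) + correction


for a, b in ((0.0, 0.0), (1.0, 2.3), (-3.0, 4.0)):
    print(a, b, round(max_star_exact(a, b), 4), round(max_star_lut(a, b), 4))
```

In such a scheme an add-compare-select unit would apply this operation to the path metrics entering each trellis state; a finer table trades memory for accuracy.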

[Figure 8 block diagram; recoverable labels: SHARK6 BOARD; ALTERA DSP BUILDER PROFESSIONAL DEVELOPMENT BOARD; FPGA 40 MHz ALTERA EP20K1500E1X; FPGA1, FPGA3, FPGA5, FPGA6; Connection FPGA (Frame Sync); Soft Demodulator; SISO + Interleaver + De-Interleaver; PC LPT1 interface; DSP 6 SHARC; UART interface; BER Meter; External PC; RAM and SIMM modules; clk 20 MHz; prog bus (16-bit data, 16-bit address, 3 csb, rdb, wrb); signals I,Q (5 bits), IQ_vaild, fs_sync, start_read, llr (24 bits), llr_valid, llr_ready, decoded_out, decoded_out_valid, prog_wrb, prog_rdb, prog_csb, prog_addr, prog_data; memory ranges MS1 0x480000 - 0x480FFF, MS1 0x481000 - 0x481FFF, MS1 0x482000 - 0x482FFF; MS0; MS 2.]

Figure 8. Shark6 board.

6 Performances
The Turbo Codec performance measured from the HW demonstrator is shown for rates 2/3 and 4/5 in terms of BER, FER and PER.

[Plot "BER/EbNo 2/3": measured BER (logarithmic scale) versus EbNo.]

Figure 9. Rate 2/3 performances

[Plot "STATISTICS 2/3": FER, PER and vPER (logarithmic scale) versus EbNo.]

Figure 10. Rate 2/3 performances

[Plot "BER/EbNo 4/5": measured BER (logarithmic scale) versus EbNo.]

Figure 11. Rate 4/5 performances

[Plot "STATISTICS 4/5": FER, PER and vPER (logarithmic scale) versus EbNo.]

Figure 12. Rate 4/5 performances

7 Conclusion
The designed and implemented Turbo Codec targets performance close to the Shannon limit, using the same pragmatic approach for the QPSK, 8-PSK, 16-APSK and 16-QAM digital modulations. The Codec has been implemented using off-the-shelf HW fast prototyping systems and could be used for next generation DVB-S2 applications, with an overall 2 dB improvement with respect to DVB-S1 (first generation).

8 List of Acronyms
APP: A Posteriori Probability
RSC: Recursive Systematic Code
SCCC: Serially Concatenated Convolutional Code
SISO: Soft Input Soft Output
SOVA: Soft Output Viterbi Algorithm

9 References
[1] S. Benedetto, D. Divsalar, G. Montorsi, F. Pollara, "A Soft-Input Soft-Output APP Module for Iterative Decoding of Concatenated Codes", IEEE Communication Letters, Vol. 1, No. 1, January 1997.
[2] Digital Video Broadcasting (DVB): Framing structure, channel coding and modulation for Digital Satellite News Gathering (DSNG) and other contribution applications by satellite. European Telecommunication Standard, prEN 301 210, January 1998.