Joint Factor Graph Detection for LDPC and STBC Coded MIMO Systems: A New Framework

Jianxiao Yang, Charbel Abdel Nour, Charlotte Langlais
Department of Electronics, Institut TELECOM-TELECOM Bretagne, Brest, France
[email protected]

Abstract—By modeling the MIMO channel as a factor graph (FG), the STBC decoder, the bit interleaver and the LDPC decoder can be combined into a larger joint factor graph (JFG). Extrinsic information transfer (EXIT) charts and bit error rate (BER) results show that the JFG enables a shuffled decoding schedule that outperforms the classical serial iterative detection while reducing complexity.

Keywords-joint factor graph (JFG) detection; low-density parity-check (LDPC); multiple-input multiple-output (MIMO); space-time block codes (STBC)
I. INTRODUCTION
MIMO systems promise to significantly increase channel capacity and spectral efficiency [1]. Despite the existence of several MIMO detectors [2]-[11], receiver complexity and latency remain the main drawbacks of coded MIMO. Detection based on the belief propagation (BP) algorithm provides error rate performance comparable to maximum likelihood (ML) detection while allowing greater flexibility in coded system design. It becomes particularly appealing when the MIMO scheme is concatenated with an outer LDPC code.

To improve performance, extrinsic information can be exchanged between the MIMO detector and the forward error correcting (FEC) decoder, the so-called “turbo framework”. In this framework, the FEC decoder and the MIMO STBC decoder exchange information only after each of them has completed its local decoding over the whole frame. The resulting latency can be prohibitively high, outweighing the increase in throughput provided by the introduction of MIMO. Previous works [3],[10],[11] did address a joint factor graph (JFG) for MIMO detection and channel decoding, but limited their analysis to a turbo schedule.

In this paper, we propose joint channel decoding and MIMO detection based on the shuffled BP algorithm over the JFG. EXIT charts and simulation results show that the proposed algorithm offers a decoding schedule superior to the turbo schedule in both performance and complexity.

The paper is organized as follows. After the system model is introduced in Section II, the proposed JFG and the associated BP algorithm for detection and decoding are described in Section III. Section IV is dedicated to the different schedulings over the JFG and, in particular, to the proposed shuffled scheduling. Section V compares the different algorithms and schedulings using EXIT chart analysis and simulation results. Section VI concludes the paper.

Manuscript received April 15, 2010.
This work was supported in part by the ANR MOCAMIMODYN project.
II. SYSTEM MODEL
A. MIMO Transceiver System
Notation with a bar, like $\bar{s}$, denotes complex variables. Consider an LDPC codeword $b_n$, $n = 1, 2, \ldots, N$, which is constrained by $N - K$ LDPC checks $p_q$, $q = 1, 2, \ldots, (N - K)$. After interleaving, $\log_2(M_C)$ bits are mapped to a complex symbol $\bar{s}$ belonging to an $M_C$-ary constellation. $L$ complex symbols are then grouped in an $L$-dimensional complex transmitted signal $\bar{S} = [\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_L]^T$, where each $\bar{s}_l$ belongs to the $M_C$-ary constellation.

Consider a MIMO system with $N_T$ transmit and $N_R$ receive antennas with fading constant over $D$ consecutive channel uses. The vector

$$\bar{C}(\bar{S}) = \left[\bar{c}_{1,1}(\bar{S}), \bar{c}_{2,1}(\bar{S}), \ldots, \bar{c}_{N_T,1}(\bar{S}), \ldots, \bar{c}_{1,D}(\bar{S}), \bar{c}_{2,D}(\bar{S}), \ldots, \bar{c}_{N_T,D}(\bar{S})\right]^T$$

is the space-time block coded symbol vector of size $(N_T \cdot D) \times 1$, and the element $\bar{c}_{i,t}(\bar{S})$ is the linear transformation of the $L$ mapped symbols transmitted at transmit antenna $i$ and time $t$ ($i = 1, 2, \ldots, N_T$; $t = 1, 2, \ldots, D$). The complex received signal is then given by:

$$\bar{Y} = \bar{H} \cdot \bar{C}(\bar{S}) + \bar{W} \tag{1}$$

where $\bar{Y} = \left[\bar{y}_{1,1}, \bar{y}_{2,1}, \ldots, \bar{y}_{N_R,1}, \ldots, \bar{y}_{1,t}, \bar{y}_{2,t}, \ldots, \bar{y}_{N_R,t}, \ldots, \bar{y}_{1,D}, \bar{y}_{2,D}, \ldots, \bar{y}_{N_R,D}\right]^T$ is the received vector of size $(N_R \cdot D) \times 1$ and $\bar{y}_{j,t}$ represents the received symbol at antenna $j$ and time $t$ ($j = 1, 2, \ldots, N_R$; $t = 1, 2, \ldots, D$). $\bar{H} = \mathrm{diag}\left[\bar{H}^{[1]}, \ldots, \bar{H}^{[D]}\right]$ represents the flat fading channel matrix of size $(N_R \cdot D) \times (N_T \cdot D)$ over $D$ consecutive channel uses, and $\bar{h}_{j,i}(u)$ is the complex fading coefficient from transmit antenna $i$ to receive antenna $j$ at time $u$. $\bar{W}$ denotes the complex zero-mean AWGN vector with variance $\sigma_W^2$, of size $(N_R \cdot D) \times 1$.

Equivalently, by decomposing each complex component into its real and imaginary components, the real-valued representation of the system can be written as

$$Y = H \cdot C \cdot S + W = H_{equiv} \cdot S + W \tag{2}$$

where $H_{equiv} = H \cdot C$, of size $(2 N_R D) \times (2L)$, is the decomposed channel fading matrix.
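The step from the complex model (1) to the real-valued model (2) can be illustrated with a minimal sketch. The 2×2 complex matrix `A` below is a hypothetical stand-in for the product of channel and STBC matrices, and the stacking order (all real parts first, then all imaginary parts) is an assumption, since the text does not fix one. A complex matrix with R rows and L columns becomes a real matrix with 2R rows and 2L columns, in line with the size stated for $H_{equiv}$.

```python
# Sketch: real-valued decomposition of a complex linear model y = A s (noise omitted).
# A and s are hypothetical illustration values, not taken from the paper.

def real_matrix(a):
    """Map a complex matrix (list of rows) to the block form [[Re, -Im], [Im, Re]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in a]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in a]
    return top + bottom

def real_vector(v):
    """Stack the real parts on top of the imaginary parts."""
    return [z.real for z in v] + [z.imag for z in v]

def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Hypothetical stand-in for the complex product of channel and STBC matrices.
A = [[0.8 + 0.3j, -0.2 + 0.5j],
     [0.1 - 0.7j, 0.6 + 0.4j]]
s = [1 + 1j, -1 + 1j]  # two unnormalized QPSK symbols

y_complex = matvec(A, s)                         # complex model
y_real = matvec(real_matrix(A), real_vector(s))  # equivalent real model
```

Both computations describe the same observation: `y_real` holds the real parts of `y_complex` followed by its imaginary parts.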
III. JOINT FACTOR GRAPH FOR MIMO DETECTION AND CHANNEL DECODING

A. Joint factor graph
The joint factor graph for the MIMO detection and the LDPC decoding is given in Fig. 1. The connections between the received symbols and the STBC symbols depend on the MIMO channel. The connections between the STBC symbols and the mapped symbols depend on the STBC used. The interleaver is represented by a connection matrix between the mapped symbols and the LDPC variables.

[Figure 1: factor graph linking the received symbol nodes $y_{1,1}, y_{2,1}, y_{3,1}, \ldots, y_{N_R,1}$ to the STBC symbol nodes $c_{1,1}(S), c_{2,1}(S), c_{3,1}(S), \ldots, c_{N_T,1}(S)$.]
Figure 1. The joint FG representation of the detector and the decoder (example given for $N_T = N_R = 4$, $D = 1$, Hadamard STBC, see Section V).

B. BP algorithm for the MIMO detection
At each received symbol node, incoming messages from all the connected LDPC bit nodes and other connected received symbol nodes are exploited as a priori information; the detector computes extrinsic information for all the connected LDPC bit nodes. At the $t$-th iteration, the message sent from the $j$-th received complex symbol $y_j$ of $Y$ to the $i$-th connected bit $b_i$ is given by

$$L^{(t)}_{y_j \to b_i} = \log \left( \frac{\sum_{b_m \in s,\, b_i = 0} p(y_j \mid s, H_{equiv}) \Pr(s \setminus b_i)}{\sum_{b_m \in s,\, b_i = 1} p(y_j \mid s, H_{equiv}) \Pr(s \setminus b_i)} \right) \tag{3}$$

where $j = 1, 2, \ldots, (N_R D)$, $i = 1, 2, \ldots, (L \cdot \log_2(M_C))$,

$$p(y_j \mid s, H_{equiv}) = \frac{1}{\pi \sigma_w^2} \exp \left( -\frac{1}{\sigma_w^2} \Big| y_j - \sum_i h_{j,i} \cdot s_i \Big|^2 \right) \tag{4}$$

and $\Pr(s \setminus b_i)$ represents the a priori information excluding the a priori information of $b_i$. To simplify the computation of (3), the max-log approximation $\log(e^x + e^y) \approx \max(x, y)$ is used and therefore (3) can be simplified as:

$$L^{(t)}_{y_j \to b_i} \approx \max_{s:\, b_i = 0} \left\{ -\frac{1}{\sigma_w^2} \Big| y_j - \sum_i h_{j,i} \cdot s_i \Big|^2 + \sum_{k \neq j} L^{(t)}_{y_k \to b_i} + \sum_{b_m \in s,\, m \neq i} L^{(t-1)}_{P \to b_m} \right\} - \max_{s:\, b_i = 1} \left\{ -\frac{1}{\sigma_w^2} \Big| y_j - \sum_i h_{j,i} \cdot s_i \Big|^2 + \sum_{k \neq j} L^{(t)}_{y_k \to b_i} + \sum_{b_m \in s,\, m \neq i} L^{(t-1)}_{P \to b_m} \right\} \tag{5}$$

where $L^{(t-1)}_{P \to b}$ represents the extrinsic information from the LDPC check nodes in (7) at the previous iteration and $\sum_{k \neq j} L^{(t)}_{y_k \to b_i}$ denotes the extrinsic information from the other received symbol nodes. At the $t$-th iteration, the message sent from the bit $b_i$ to the connected received symbol $y_j$ is:

$$L^{(t)}_{b_i \to y_j} = \sum_{k=1,\, k \neq j}^{N_R D} L^{(t)}_{y_k \to b_i} + L^{(t-1)}_{P \to b_i} = L^{(t)}_{b_i} - L^{(t)}_{y_j \to b_i} \tag{6}$$

where $L^{(t)}_{b_i}$ is the complete information of bit $b_i$:

$$L^{(t)}_{b_i} = L^{(t)}_{Y \to b_i} + L^{(t-1)}_{P \to b_i} = \sum_{k=1}^{N_R D} L^{(t)}_{y_k \to b_i} + \sum_{k \in P(b_i)} L^{(t-1)}_{p_k \to b_i} \tag{7}$$

Then the a posteriori information of $b_i$ is updated by:

$$L^{(t)}_{b_i} \leftarrow L^{(t)}_{b_i} - L^{(t-1)}_{y_j \to b_i} + L^{(t)}_{y_j \to b_i} \tag{8}$$

C. BP algorithm for the LDPC decoder
The horizontal and vertical shuffle decoding schedules (HSS and VSS) [12]-[14] provide faster convergence than the flooding schedule (FS). The HSS updates the information of the check nodes serially (row-wise), while the VSS updates the variable nodes serially (column-wise). In order to take advantage of shuffle decoding and avoid important delays, the VSS decoding schedule has been employed. Therefore, after finishing detection over variable node $b_i$, the following decoding steps are applied:

• Compute the extrinsic information from the connected check nodes $P(b_i)$ to the variable node $b_i$ by

$$L^{(t)}_{p_j \to b_i} = \alpha^{(t)}_{p_j} \, \mathrm{sgn}\left(L^{(t-1)}_{b_i \to p_j}\right) \phi\left(\beta^{(t)}_{p_j} - \phi\left(\left|L^{(t-1)}_{b_i \to p_j}\right|\right)\right) \tag{9}$$

where $\alpha^{(t)}_{p_j} = \prod_{k \in B(p_j)} \mathrm{sgn}\left(L^{(t)}_{b_k \to p_j}\right)$ and $\beta^{(t)}_{p_j} = \sum_{k \in B(p_j)} \phi\left(\left|L^{(t)}_{b_k \to p_j}\right|\right)$ represent the check node messages;

• Update the variable information $L^{(t)}_{b_i}$ and the a priori information for each check node $p_j \in P(b_i)$ connected to $b_i$ with

$$L^{(t)}_{b_i} = L^{(t)}_{Y \to b_i} + L^{(t)}_{P \to b_i} = L^{(t)}_{Y \to b_i} + \sum_{k \in P(b_i)} L^{(t)}_{p_k \to b_i} \tag{10}$$

$$L^{(t)}_{b_i \to p_j} \leftarrow L^{(t)}_{b_i} - L^{(t)}_{p_j \to b_i} \tag{11}$$

where $L^{(t)}_{Y \to b_i}$ represents the detection extrinsic information in (7);

• Update the information of the check nodes connected to $b_i$, i.e., those in $P(b_i)$, and the edge information $L^{(t)}_{b_i \to p_j}$ using

$$\alpha^{(t)}_{p_j} = \alpha^{(t)}_{p_j} \, \mathrm{sgn}\left(L^{(t-1)}_{b_i \to p_j}\right) \mathrm{sgn}\left(L^{(t)}_{b_i \to p_j}\right) \tag{12}$$

$$\beta^{(t)}_{p_j} = \beta^{(t)}_{p_j} - \phi\left(\left|L^{(t-1)}_{b_i \to p_j}\right|\right) + \phi\left(\left|L^{(t)}_{b_i \to p_j}\right|\right) \tag{13}$$

$$L^{(t-1)}_{b_i \to p_j} \leftarrow L^{(t)}_{b_i \to p_j} \tag{14}$$

An appealing feature of the VSS scheduling is the possibility of applying an inherent stopping criterion for the iterative decoding, controlled by the signs of the check nodes $\{\alpha^{(t)}_{p_j}\}$. If all the check node constraints are satisfied, decoding can terminate automatically.

IV. SCHEDULING FOR THE BP ALGORITHM OVER THE JFG
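Both schedules discussed in this section run the LDPC part with the vertical shuffle updates (9) and (12)-(14), so it may help to see how the running check-node state is maintained. The following is a minimal sketch, assuming the common definition φ(x) = −log tanh(x/2), which the text does not spell out, and hypothetical message values on a single degree-4 check node. The helper names are illustrative only.

```python
import math

def phi(x):
    """Assumed definition: phi(x) = -log(tanh(x / 2)), clamped for numerical safety."""
    x = min(max(x, 1e-12), 30.0)
    return -math.log(math.tanh(x / 2.0))

def sgn(x):
    return 1.0 if x >= 0 else -1.0

def check_state(msgs):
    """Recompute (alpha, beta) of one check node from all incoming messages."""
    alpha, beta = 1.0, 0.0
    for m in msgs:
        alpha *= sgn(m)
        beta += phi(abs(m))
    return alpha, beta

def check_to_var(alpha, beta, old_msg):
    """Eq. (9): extrinsic message to one variable, excluding its own contribution."""
    return alpha * sgn(old_msg) * phi(beta - phi(abs(old_msg)))

def update_state(alpha, beta, old_msg, new_msg):
    """Eqs. (12)-(13): swap the old contribution of one edge for the new one."""
    alpha = alpha * sgn(old_msg) * sgn(new_msg)
    beta = beta - phi(abs(old_msg)) + phi(abs(new_msg))
    return alpha, beta

# Hypothetical variable-to-check messages entering one check node.
msgs = [1.2, -0.7, 2.5, 0.4]
alpha, beta = check_state(msgs)
extrinsic_0 = check_to_var(alpha, beta, msgs[0])

new_msg = -1.9  # the variable on edge 0 sends a fresh message
alpha, beta = update_state(alpha, beta, msgs[0], new_msg)
msgs[0] = new_msg  # eq. (14): memorize the new edge message
```

Under the assumed φ, `extrinsic_0` coincides with the usual tanh-rule message computed from the three other edges, and the incrementally updated `(alpha, beta)` matches a full recomputation, which is what allows the variable nodes to be processed one at a time.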
First, we briefly review the conventional turbo scheduling applied to the JFG. Second, we describe the proposed vertical shuffle scheduling for the joint MIMO detection and LDPC decoding. Note that in both cases the LDPC decoder uses vertical shuffle scheduling.

A. Turbo scheduling over the JFG
The JFG depicted in Fig. 1 can be split into two disjoint parts, the MIMO nodes on the one hand and the LDPC nodes on the other hand, so that the turbo schedule of the BP algorithm can be applied. For the MIMO detection, the BP algorithm is run over the MIMO nodes. Several internal iterations are traditionally needed before soft bit information is output. These outputs pass through the interleaving connection matrix and are sent to the LDPC decoder. Applying the BP algorithm, the LDPC decoder also requires some internal iterations before outputting extrinsic information. The interleaved extrinsic information then serves as a priori information for the subsequent MIMO detection. Thus, before any information is exchanged between the MIMO detector and the LDPC decoder, the whole frame needs to be processed by each element. The turbo scheduling over the JFG is similar to the well-known flooding schedule (FS) for LDPC decoding. We propose to apply a faster iterative process, namely shuffle scheduling, to the JFG.

B. Vertical shuffle scheduling over the JFG
The principle of shuffle decoding is not to wait for the complete frame to be processed by the MIMO detector before going through the LDPC decoder. The MIMO detection is performed by the BP algorithm over the vector Y corresponding to one ST block. The LDPC variables connected to the elements of Y are then updated by the BP algorithm, after which a new ST block is considered. Because some LDPC variables may already have been updated, a priori information can be available for the MIMO detection of this new block. Shuffle decoding thus allows a better information exchange between the MIMO detector and the LDPC decoder.
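The difference between the two schedules is essentially one of ordering, which a toy example can make concrete. The sketch below lists the order of operations only; the three ST blocks and their mapping to LDPC variable indices are hypothetical, and no actual message passing is performed.

```python
# Toy JFG: each ST block lists the LDPC variable indices its bits map to.
# This interleaver is hypothetical; real frames are far larger.
blocks = {0: [0, 3], 1: [1, 4], 2: [2, 5]}

def turbo_schedule(blocks):
    """Detect every ST block of the frame first, then update every LDPC variable."""
    ops = [("detect", b) for b in blocks]
    ops += [("decode", v) for b in blocks for v in blocks[b]]
    return ops

def shuffle_schedule(blocks):
    """Detect one ST block, immediately update its LDPC variables, then move on."""
    ops = []
    for b, variables in blocks.items():
        ops.append(("detect", b))
        ops += [("decode", v) for v in variables]
    return ops

def first_decode_position(ops):
    """Index of the first LDPC update, i.e. when fresh a priori can first exist."""
    return next(i for i, (kind, _) in enumerate(ops) if kind == "decode")

turbo_ops = turbo_schedule(blocks)
shuffle_ops = shuffle_schedule(blocks)
```

Both lists contain the same operations. In the shuffled order the first LDPC update happens right after the first ST block, so later blocks can already benefit from updated variables within the same pass over the frame.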
The advantages of such a scheduling are a lower decoding latency, thanks to a reduced number of iterations, and/or better BER performance. Moreover, for a better information exchange, bit nodes from the same STBC block should avoid connecting to the same LDPC check node. Interleaver (connection matrix) design rules that maximize the girth of the overall JFG are the subject of our future research. While particularly appealing in the case of BP-based detection, the proposed algorithm is also applicable to other detection methods such as ML detection.

The joint decoding and MIMO detection shuffled BP algorithm is summarized as follows.

For the received MIMO symbols $\{y_j \mid y_j \in Y,\ j = 1, \ldots, N_R D\}$, the detector applies the following steps:
• Compute the a priori information $\{L^{(t)}_{b_i \to y_j}\}$ of the bits connected to the received symbol $y_j$ of $Y$ by (6);
• Compute the extrinsic information $\{L^{(t)}_{y_j \to b_i}\}$ of the bits connected to $y_j$ by (5);
• Update the a posteriori information $L^{(t)}_{b_i}$ for $b_i$ by (8).

For the LDPC variables $\{b_i \mid i = 1, \ldots, L \cdot \log_2(M_C)\}$ connected to $Y$, the decoder repeats the following steps:
• Compute the extrinsic information $L^{(t)}_{p_j \to b_i}$ from the connected check nodes $p_j \in P(b_i)$ by (9);
• Update the variable node information $L^{(t)}_{b_i}$ by (10) and update the a priori information $\{L^{(t)}_{b_i \to p_j}\}$ for every check node $p_j \in P(b_i)$ connected to $b_i$ by (11);
• Update the check node information $\alpha^{(t)}_{p_j}$ and $\beta^{(t)}_{p_j}$ by (12) and (13), then memorize the new information $\{L^{(t)}_{b_i \to p_j}\}$.

V. EXIT CHART ANALYSIS AND SIMULATIONS
We consider a 4×4 MIMO system. Simulations were carried out for the rate-1/2 LDPC code ($N = 16200$) specified in DVB-T2 [15], QPSK modulation and an STBC based on the Hadamard construction, with $D = 1$, given by

$$C = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$$
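To make this configuration more tangible, the sketch below forms one row of $H_{equiv} = H \cdot C$ for the Hadamard matrix above and evaluates a brute-force max-log bit LLR in the spirit of (5). Several simplifications are assumptions made for illustration only: the fading gains in `h_row` are hypothetical and real-valued, each real dimension carries one bit with the mapping 0 → +1 and 1 → −1, and a priori LLRs enter the metric as 0.5·s·L, which is one common convention rather than necessarily the one used in the paper.

```python
from itertools import product

# Hadamard STBC matrix from the text (D = 1, four transmit antennas).
C = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]

def equivalent_row(h_row, stbc):
    """One row of H_equiv = H * C, for a single receive antenna."""
    n = len(stbc[0])
    return [sum(h_row[i] * stbc[i][l] for i in range(len(h_row))) for l in range(n)]

def predicted(row, s):
    """Noise-free observation for the candidate symbol vector s."""
    return sum(h * x for h, x in zip(row, s))

def max_log_llr(y, row, sigma2, bit_index, prior=None):
    """Max-log LLR of one bit by exhaustive search over all candidates.
    Assumed mapping: bit 0 -> +1, bit 1 -> -1. The prior of the detected bit is excluded."""
    prior = prior or [0.0] * len(row)
    best = {0: float("-inf"), 1: float("-inf")}
    for bits in product((0, 1), repeat=len(row)):
        s = [1 - 2 * b for b in bits]
        metric = -((y - predicted(row, s)) ** 2) / sigma2
        metric += sum(0.5 * s[m] * prior[m] for m in range(len(row)) if m != bit_index)
        best[bits[bit_index]] = max(best[bits[bit_index]], metric)
    return best[0] - best[1]

h_row = [0.9, -0.4, 0.3, 1.1]  # hypothetical real fading gains
row = equivalent_row(h_row, C)
tx_bits = [0, 1, 1, 0]
y = predicted(row, [1 - 2 * b for b in tx_bits])  # noise-free observation
llrs = [max_log_llr(y, row, sigma2=0.5, bit_index=i) for i in range(4)]
```

With a noise-free observation and no a priori information, each LLR has the sign of the transmitted bit (positive for 0, negative for 1). The exhaustive search grows exponentially with the number of bits per observation, which is why only a short vector is used here.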
A fast flat fading Rayleigh channel is considered. Turbo and shuffle schedulings were explored. In order to assess the performance of the BP MIMO detector, ML MIMO detection is kept as a reference.

The EXIT chart is a fast and insightful means of characterizing the iterative behavior of a system [16]. However, the vertical shuffle scheduling allows partial information exchange inside the graph during the detection and/or decoding of a frame, which cannot be tracked by a conventional EXIT chart. We propose to divide the analysis into two steps. First, the EXIT chart analysis of the MIMO detector alone helps to compare the different algorithms and schedulings. Second, a modified EXIT chart analysis of the LDPC decoder featuring decoding trajectories is proposed to evaluate the impact of the different schedulings.

A. Decoding trajectories of the MIMO detector
The mutual information (MI) $I_E$ at the output of the MIMO detector is a function of the channel SNR and of the a priori information $I_A$ at the input, such that $I_E = T(I_A, Y) = T(I_A, E_b/N_0)$. The generation of $I_A$ depends on the considered scheduling. In the case of turbo scheduling, a “virtual channel” with a Gaussian distribution is used to generate $I_A$ and to mimic the decoder output with detector iterations [10]. In the case of vertical shuffle scheduling over the JFG, the virtual channel at the input of the detector cannot be used since the mutual information evolves along with the frame detection. The LDPC decoding is therefore simulated, with the check node input $I^{(i)}_{VAR \to CHK}$ replaced by a “virtual channel”, and $I_A$ is the output of the decoder.

Fig. 2 shows an EXIT chart comparison of different detection algorithms and schedulings. The turbo BP approaches the turbo ML performance after 6 detector iterations [10]. It is interesting to note that when no inner detector iteration is performed, the turbo BP algorithm greatly degrades performance. However, with a shuffle scheduling, the BP curve joins the ML one with increasing $I_A$ at the input. This predicts that BP with shuffle scheduling over the JFG allows fewer detector iterations without inducing an important performance penalty. Note also that despite the higher EXIT chart curve for the turbo BP detection with 6 iterations, the latter performs worse than ML in BER simulations. This type of behavior was already observed when the max-log approximation is used [17]. It was explained by an over-estimation of the reliability with respect to ML detection.

[Figure 2: $I_E$ versus $I_A$; curves: Turbo BP (det=1), Turbo BP (det=6), Shuffle BP (det=1), Turbo ML (det=1).]
Figure 2. Decoding trajectories of the MIMO detector for different detection algorithms and schedulings for Eb/N0 = 2.2 dB.

B. Decoding trajectories of the LDPC decoder
Conventional EXIT chart analysis cannot be performed in the case of BP shuffled decoding. However, the variations of mutual information can be tracked thanks to decoding trajectories. The objective is to show how the types of schedulings and detection algorithms impact the decoding state of the LDPC decoder along the iterations. To explain what a decoding trajectory is, Fig. 3 depicts the decoding trajectory of the turbo BP algorithm, where the LDPC decoder is based on vertical shuffled scheduling with 6 iterations and the MIMO detector is based on BP with 6 iterations. The MIMO detector and the LDPC decoder are simulated. Mutual information is tracked along the iterations. $I_{VAR \to CHK}$ is the mutual information of the variable nodes, while $I_{CHK \to VAR}$ is the mutual information of the check nodes. The mutual information $I_{VAR \to CHK}$ is given by

$$I^{(i)}_{VAR \to CHK} = T\left(I^{(i)}_{CHK \to VAR}, I_E\right)$$

with $I^{(0)}_{CHK \to VAR} = 0$, $I^{(0)}_{VAR \to CHK} = I_E$, and $i$ denotes the LDPC decoder iteration number. The dashed black curve (CHK → VAR) corresponds to the check node decoding state such that $I^{(i)}_{CHK \to VAR} = T\left(I^{(i-1)}_{VAR \to CHK}\right)$. For comparison, the zoomed window depicts the decoding trajectories for the vertical and flooding schedules of the LDPC decoder. For the FS, the conventional steps of the EXIT chart can be drawn. Note that 2 iterations of FS give approximately the same MI as one iteration of VSS.

The first point of the turbo BP is the initial decoding state after 6 detector iterations (vertical line). The second one corresponds to the new decoding state after 6 LDPC iterations (VSS). At the third one, information goes back to the detector. The vertical gain represents the improvement in terms of mutual information after 6 detector iterations. The process continues until the (1,1) point. Thus each point represents a half global iteration. At the second point, after 6 LDPC iterations, no improvement can be achieved with more LDPC iterations since the CHK → VAR curve is reached. Going back to the detector helps to open the tunnel between the two curves. Since two types of iterations, global and inner, are possible, iteration profiling is needed to guarantee an efficient receiver design. Efficient profiles are the ones requiring the minimum total number of iterations for a target performance.

[Figure 3: $I_{VAR \to CHK}$ versus $I_{CHK \to VAR}$, with zoomed window; curves: CHK->VAR, Turbo BP VSS (det=6,dec=6), Non Turbo BP VSS (det=6,dec=100), Non Turbo BP FS (det=6,dec=100).]
Figure 3. Decoding trajectories of the LDPC decoder for the turbo BP algorithm.

Fig. 4 depicts the decoding trajectories of the LDPC decoder for different schedulings and MIMO detection algorithms. The decoding states are shown only at the end of one global iteration to allow a comparison between shuffled and turbo scheduling for the JFG. The intermediate points after MIMO detection, depicted for the turbo BP in Fig. 3, are not drawn. The zoomed window of Fig. 4 allows a comparison in terms of number of iterations. A segment between two consecutive decoding states of the turbo ML (det=1, dec=6) is taken as reference (1 global iteration). The turbo BP (det=6, dec=6) requires 4 global iterations to achieve the same MI. The shuffled BP detection (det=1, dec=1) requires only 5 global iterations, and the shuffled ML detection (det=1, dec=1) needs only 4 global iterations.

[Figure 4: $I_{VAR \to CHK}$ versus $I_{CHK \to VAR}$, with zoomed window; curves: LDPC CHK, Shuffle BP (det=1,dec=1), Shuffle ML (det=1,dec=1), Turbo BP (det=6,dec=6), Turbo ML (det=1,dec=6).]
Figure 4. Decoding trajectories of the LDPC decoder for the different schedulings for Eb/N0 = 2.2 dB.

In Fig. 5, glb denotes global iterations between the decoder and the detector. For the sake of a fair comparison, 60 LDPC decoding iterations are used for all schemes. Additional performance loss is however observed for the BP algorithm due to the over-estimation problem. The best performance is observed for the shuffled ML detection. Reducing its total number of iterations to 35 (by 41%) still allows it to show the same performance as a turbo scheduling with the same detection type. The price to pay for going from ML to BP detection is only 0.2 dB. The shuffled scheduling in this case shows performance similar to that of the turbo scheduling even after reducing the number of iterations to 22 (reduction by 63%). Note that without iterations between the decoder and the detector, an important performance penalty of 1.25 dB can be observed even for ML detection.

[Figure 5: BER versus SNR (dB); curves: ML (det=1,dec=60), Turbo BP (glb=10,det=6,dec=6), Turbo BP (glb=60,det=1,dec=1), Shuffle BP (glb=22,det=1,dec=1), Shuffle BP (glb=60,det=1,dec=1), Turbo ML (glb=10,det=1,dec=6), Shuffle ML (glb=35,det=1,dec=1), Shuffle ML (glb=60,det=1,dec=1).]
Figure 5. BER comparison of shuffled and turbo schedulings using ML or BP detectors for different iteration profiles.

VI. CONCLUSIONS
We have proposed to apply a shuffled BP algorithm to the JFG of a coded MIMO system. A modified EXIT chart analysis based on decoding trajectories has been introduced to assess decoding efficiency in terms of receiver iterations. BER results show the superiority of the proposed scheme in terms of performance and receiver efficiency. Moreover, the shuffle scheduling has shown high tolerance for simplified detection while being particularly appealing for a hardware implementation. Future works include a joint LDPC, STBC and bit interleaver factor graph optimization in terms of girth and a study of reduced complexity detection schemes.

REFERENCES
[1] G.J. Foschini and M.J. Gans, “On the limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Commun., no. 6, pp. 315-335, 1998.
[2] G.J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multielement antennas,” Bell Labs Tech. J., vol. 1, pp. 41-59, 1996.
[3] S. Gounai and T. Ohtsuki, “Convergence Acceleration of Iterative Signal Detection for MIMO System with Belief Propagation,” GLOBECOM '06, pp. 1-5, Nov. 27 2006-Dec. 1, 2006.
[4] B. Steingrimsson, Z.-Q. Luo, and K. Wong, “Soft quasi-maximum-likelihood detection for multiple-antenna wireless channels,” IEEE Trans. on Signal Processing, vol. 51, pp. 2710-2719, Nov. 2003.
[5] B. Hassibi and H. Vikalo, “On the sphere-decoding algorithm I. Expected complexity,” IEEE Trans. on Signal Processing, vol. 53, pp. 2806-2818, Aug. 2005.
[6] Z. Guo and P. Nilsson, “Algorithm and implementation of the K-best sphere decoding for MIMO detection,” IEEE J. Select. Areas Commun., vol. 24, pp. 491-502, Mar. 2006.
[7] H. Artes, D. Seethaler, and F. Hlawatsch, “Efficient detection algorithms for MIMO channels: a geometrical approach to approximate ML detection,” IEEE Trans. on Signal Processing, vol. 51, pp. 2808-2820, Nov. 2003.
[8] J. Luo, K.R. Pattipati, P.K. Willett, and F. Hasegawa, “Near-optimal multiuser detection in synchronous CDMA using probabilistic data association,” IEEE Commun. Lett., vol. 5, pp. 361-363, Sept. 2001.
[9] R-R. Chen, R. Peng, A. Ashikhmin, and B. Farhang-Boroujeny, “Approaching MIMO capacity using bitwise Markov Chain Monte Carlo detection,” IEEE Trans. on Commun., vol. 58, pp. 423-428, Feb. 2010.
[10] J. Hu and T.M. Duman, “Graph-Based Detector for BLAST Architecture,” ICC '07, pp. 1018-1023, 24-28 June 2007.
[11] S. Bavarian and J. Cavers, “A new framework for soft decision equalization in frequency selective MIMO channels,” IEEE Trans. on Commun., vol. 57, pp. 415-422, Feb. 2009.
[12] M. Mansour and N. Shanbhag, “Turbo decoder architectures for low-density parity-check codes,” in GLOBECOM'02, pp. 1383-1388, Nov. 17-21, 2002.
[13] F. Guilloud, “Generic architectures for LDPC codes decoding,” Ph.D. dissertation, Telecom Paris, Paris, France, July 2004.
[14] J. Zhang and M. Fossorier, “Shuffled iterative decoding,” IEEE Trans. on Commun., vol. 53, pp. 209-213, Feb. 2005.
[15] ETSI EN302755, “Digital Video Broadcasting (DVB); Frame structure, channel coding and modulation for a second generation digital terrestrial television broadcasting system (DVB-T2),” v1.1.1, Sept. 2009.
[16] S.T. Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codes for modulation and detection,” IEEE Trans. on Commun., vol. 52, pp. 670-678, April 2004.
[17] G. Lechner and J. Sayir, “Improved Sum-Min Decoding of LDPC Codes,” ISITA2004, Parma, Italy, October 10-13, 2004.