An Efficient Decoding Algorithm for Cycle-free Convolutional Codes and its Applications

Jing Li, Krishna R. Narayanan and Costas N. Georghiades
Department of Electrical Engineering, Texas A&M University, College Station, TX 77843-3128
E-mail: {jingli, krishna, georghia}@ee.tamu.edu

Abstract— This paper proposes an efficient graph-based sum-product algorithm for decoding the 1/(1+D^n) code, whose Tanner graph is cycle-free. A rigorous proof is given which shows that the proposed algorithm is equivalent to MAP decoding implementing the BCJR algorithm, but with an order of magnitude less complexity. In doing so, the paper presents an explicit example which confirms the claim that the sum-product algorithm is optimal on cycle-free graphs. A parallel realization is then discussed and shown to resemble LDPC decoding. The paper further proposes a min-sum algorithm which is equivalent to the max-log-MAP algorithm. Prospective applications which can take advantage of the proposed decoding algorithms are discussed and simulations are provided.
I. Introduction

The discovery and rediscovery of low density parity check (LDPC) codes have aroused great interest in the study of code graphs and the associated decoding algorithms. Tanner graphs, factor graphs, Bayesian networks and code geometry have been investigated and shown to offer an umbrella under which many different codes, including Hamming codes, convolutional codes, turbo codes, LDPC codes and product codes, can be unified [1]-[5]. A code graph structures the constraints governing the code in a way that aids the derivation of efficient decoding algorithms. Graph-based decoding, as an instance of Pearl's belief propagation algorithm [6], has been shown to be an efficient generic algorithm through its successful application to LDPC codes, but has been much less favored for convolutional codes. One major reason is that a convolutional code graph is usually multiply connected, i.e., it contains cycles when edge directions are ignored. Hence, it is not obvious how probability propagation should properly be conducted, nor is any close approximation to optimal decoding guaranteed.

This paper studies the graph-based decoding of a special type of convolutional code, 1/(1+D), whose code graph is cycle-free, and extends it to 1/(1+D^n). An efficient sum-product decoding is proposed and shown to be equivalent to MAP (maximum a posteriori) decoding implementing the BCJR algorithm. To cater to hardware implementation, a parallel realization is then proposed and shown to resemble LDPC decoding. In doing so, we unify convolutional codes with LDPC codes from a broader structural viewpoint. To cater to applications where complexity is judiciously restricted, we further propose an approximation of the sum-product algorithm, namely the min-sum algorithm, and show that it is equivalent to the max-log-MAP algorithm. Prospective applications which can take advantage of the proposed algorithms are then discussed, including serial turbo codes, repeat accumulate codes [8] [9] and product accumulate codes [10]. Complexity analysis and simulations are also presented.

The basic idea of this paper is implied in Tanner's pioneering work of 1981 [1], including the fundamental idea of graph representation and the generic iterative probabilistic decoding algorithm, which has since been investigated by Frey, Forney, Koetter, Luby, Richardson, Urbanke, Wiberg, et al. [2]-[7]. But this paper explicitly connects the trellis-based BCJR algorithm, LDPC decoding and sum-product decoding through the case study of the 1/(1+D^n) code. While preparing the paper, we became aware of Wiberg's dissertation [5] and papers by Forney [3] and Frey [4] et al., where the sum-product algorithm is developed as a generalization of trellis decoding, LDPC decoding and a variety of other algorithms in signal processing and digital communications. These independent works show that, in a general setting, sum-product decoding is potentially optimal on cycle-free trellis codes, but no examples are provided. The contribution of this work is to present a concrete example that confirms the above claim and, by providing an efficient alternative to the BCJR algorithm, to save complexity in a variety of applications where the 1/(1+D^n) code is used.

The paper is organized as follows. Section 2 presents the sum-product, the parallel sum-product and the min-sum algorithms for decoding the 1/(1+D) code. Proof is given to show their equivalence to MAP, LDPC and Max-log-MAP decoding, respectively. Section 3 analyzes the complexity, extends the algorithms to 1/(1+D^n), and discusses some prospective applications where they could prove useful. Section 4 concludes the paper.

II. Efficient Decoding Algorithms for 1/(1+D)

A. Sum-Product Decoding Algorithm

The rate-1 convolutional code 1/(1+D) is typically decoded using a 2-state MAP decoder implementing the BCJR algorithm. However, a more computationally efficient approach is possible, which exploits the code graph and implements the sum-product algorithm. As shown in Fig. 1(b), the 1/(1+D) code can be described using a cycle-free Tanner graph which presents the relation y_i = x_i ⊕ y_{i-1} (⊕ denotes modulo-2 addition). It is a bipartite graph where nodes are divided into two classes, bits and checks, with edges interconnecting them. A check presents a constraint on the code such that all bits connecting to it should sum (modulo 2) to zero.

The key to sum-product decoding is to specify a proper order in which messages are passed, since this affects both the final performance and the rate of convergence to a good decoding solution. In this section, we propose a
serial update schedule where messages are passed forward from bit 1 to bit N, then backward from bit N to bit 1, and finally combined and passed out. Fig. 2 illustrates the message flow.

The fundamental rule to be observed in a sum-product algorithm is partial independence, i.e., no message should be fed back to its source. In other words, the outbound message along an edge should make use of the messages from all other sources except the inbound message along this very edge. For example, the message (in log-likelihood ratio (LLR) form) of bit y_i consists of L_{ch}(y_i) from the channel, L_{ef}(y_i) from check i and L_{eb}(y_i) from check i+1 (see Fig. 2(a)). Since L_{ef}(y_i) came from check i, it should be excluded when y_i passes (extrinsic) information to check i. Likewise, for bit y_{i-1} to function in check i, the information content L_{eb}(y_{i-1}) should not be used. Hence, the message associated with check i to be sent to bit x_i at the kth turbo iteration, denoted as L_e^{(k)}(x_i), is computed by:

L_e^{(k)}(x_i) = \left[L_{ch}(y_{i-1}) + L_{ef}^{(k)}(y_{i-1})\right] \boxplus \left[L_{ch}(y_i) + L_{eb}^{(k)}(y_i)\right],    (1)

where L_{ch}(y_i) = \log\frac{\Pr(r_i|y_i=0)}{\Pr(r_i|y_i=1)} denotes the message obtained from the channel upon receiving the noise-corrupted signal r_i, L_{ef}(y_i) denotes the message passed "forward" to bit y_i from the sequence of bits/checks before the ith position, and L_{eb}(y_i) denotes the message passed "backward" to bit y_i from the sequence of bits/checks after the ith position. Superscript (k) denotes the kth turbo iteration and subscript i denotes the ith bit/check. (Footnote 1: The decoding of 1/(1+D) alone does not involve any iteration, only a single pass of "forward", "backward" and "out" messages (see Fig. 2), in which case superscript (k) can be omitted. However, 1/(1+D) by itself does not serve much purpose; since it is primarily used as an inner code in a serial concatenation with turbo decoding, superscript (k) is used to indicate the kth round of turbo iteration.) The operation \boxplus refers to a "check" operation and is mathematically expressed as:

\gamma = \alpha \boxplus \beta \iff \gamma = \log\frac{1 + e^{\alpha+\beta}}{e^{\alpha} + e^{\beta}}    (2)
\iff \tanh\frac{\gamma}{2} = \tanh\frac{\alpha}{2} \cdot \tanh\frac{\beta}{2}.    (3)

Following the rule of partial independence, L_{ef}(y_i) and L_{eb}(y_i), which correspond to the messages passed forward and backward along the code graph, can be calculated using (see Fig. 2(b,c) for illustration):

L_{ef}^{(k)}(y_i) = L_o^{(k-1)}(x_i) \boxplus \left[L_{ch}(y_{i-1}) + L_{ef}^{(k)}(y_{i-1})\right],    (4)
L_{eb}^{(k)}(y_i) = L_o^{(k-1)}(x_{i+1}) \boxplus \left[L_{ch}(y_{i+1}) + L_{eb}^{(k)}(y_{i+1})\right],    (5)

where L_o(x_i) denotes the a priori information of bit x_i. Specifically, in a serially concatenated system where 1/(1+D) acts as the inner code, L_o^{(k)}(x_i), k>0, contains the message obtained from the outer code after the kth turbo iteration. Apparently, L_o^{(0)}(x_i) = 0 for all i, since nothing is fed back from the outer code in the first iteration. From Fig. 2, we can see that the boundary conditions of (4) and (5) are:

L_{ef}^{(k)}(y_1) = L_o^{(k-1)}(x_1) \boxplus \infty = L_o^{(k-1)}(x_1),    (6)
L_{eb}^{(k)}(y_N) = 0.    (7)
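To make the schedule concrete, the following sketch implements one serial pass of (1) and (4)-(7). It is a minimal illustration under our own naming conventions (boxplus, decode_accumulator, etc., are not from the paper); the stable form used inside boxplus() is the sign-min-plus-correction identity that reappears in (23) below.

```python
import numpy as np

def boxplus(a, b):
    """Exact check operation (2), log((1 + e^(a+b)) / (e^a + e^b)),
    in numerically stable sign-min-plus-correction form."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return (np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))
            + np.log1p(np.exp(-np.abs(a + b)))
            - np.log1p(np.exp(-np.abs(a - b))))

def decode_accumulator(L_ch, L_o):
    """One serial sum-product pass for 1/(1+D), implementing (1) and (4)-(7).
    L_ch[i]: channel LLR of coded bit y_i; L_o[i]: a priori LLR of data bit x_i.
    Returns the extrinsic LLRs L_e(x_i) to be passed out (e.g., to an outer code)."""
    N = len(L_ch)
    L_ef, L_eb = np.zeros(N), np.zeros(N)
    L_ef[0] = L_o[0]                          # boundary condition (6): y_0 = 0 is known
    for i in range(1, N):                     # forward pass (4)
        L_ef[i] = boxplus(L_o[i], L_ch[i - 1] + L_ef[i - 1])
    L_eb[N - 1] = 0.0                         # boundary condition (7)
    for i in range(N - 2, -1, -1):            # backward pass (5)
        L_eb[i] = boxplus(L_o[i + 1], L_ch[i + 1] + L_eb[i + 1])
    L_e = np.empty(N)                         # combine and pass out, per (1)
    L_e[0] = L_ch[0] + L_eb[0]                # the first check involves x_1 and y_1 only
    L_e[1:] = boxplus(L_ch[:-1] + L_ef[:-1], L_ch[1:] + L_eb[1:])
    return L_e
```

In the first turbo iteration one would call decode_accumulator(L_ch, np.zeros(N)), consistent with L_o^{(0)}(x_i) = 0.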
Treating all bits as a time series, it is obvious that the outbound message at the present time instant i, L_e^{(k)}(x_i), has utilized all dependence among the past and the future (through L_{ef}(y_{i-1}) and L_{eb}(y_i)) without looping back the same information (see Fig. 2(a)). Hence this tends to be a best-effort solution. In the next subsection, we present a rigorous proof of its optimality. In doing so we confirm Wiberg's claim that it is possible to find an efficient updating order (where each intermediate "cost function" is calculated only once) in a cycle-free structure like a trellis, and yet obtain optimal performance [5].

B. Optimality of the Sum-Product Decoding

We show the optimality of the proposed sum-product decoding by showing its equivalence to the BCJR algorithm for per-symbol maximum a posteriori probability decoding [11]. Due to the limit on space, we skip a basic introduction to the BCJR algorithm; interested readers are referred to [11] [12] [13]. We use x_t, y_t, s_t and r_t to represent the data bit, the coded bit, the (binary) modulated signal (transmitted over the channel) and the received signal (noise-corrupted), respectively. Their relations are illustrated as follows:

x_t \xrightarrow{\,y_t = y_{t-1} \oplus x_t\,} y_t \in \{0,1\} \xrightarrow{\,\text{BPSK}\,} s_t \in \{\pm 1\} \xrightarrow{\,+\text{noise}\,} r_t.    (8)
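As a small illustration of the chain in (8), the sketch below encodes, modulates and computes the channel LLRs L_ch(y_t); it assumes the BPSK mapping 0 → +1, 1 → −1 and an AWGN channel with noise standard deviation sigma (the paper does not fix these details, and the names are ours).

```python
import numpy as np

def encode_and_observe(x, sigma, rng=np.random.default_rng(0)):
    """1/(1+D) encoder y_t = y_{t-1} XOR x_t with y_0 = 0, BPSK, AWGN, per (8)."""
    y = np.bitwise_xor.accumulate(np.asarray(x, dtype=np.int64))
    s = 1.0 - 2.0 * y                     # assumed mapping: 0 -> +1, 1 -> -1
    r = s + sigma * rng.standard_normal(len(s))
    L_ch = 2.0 * r / sigma**2             # log Pr(r|y=0)/Pr(r|y=1) under this mapping
    return y, r, L_ch
```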
The following definitions and notations are needed in the discussion:
• Pr(S_t = m) — the probability that the decoder is in state m at time instant t (m ∈ {0,1} in the 2-state case);
• r_i^j = (r_i, r_{i+1}, ..., r_j) — the received sequence;
• α_t(m) = Pr(S_t = m, r_1^t) — the forward path metric;
• β_t(m) = Pr(r_{t+1}^N | S_t = m) — the backward path metric;
• γ_t(m', m) = Pr(S_t = m, r_t | S_{t-1} = m') — the branch metric;
• Λ_t = \log\frac{\Pr(x_t=0|r_1^N)}{\Pr(x_t=1|r_1^N)} — the output LLR of bit x_t.
The branch metrics of the 1/(1+D) code are given by (see the trellis in Fig. 1(a)):
\gamma_t(0,0) = \Pr(x_t=0)\Pr(r_t|y_t=0),    (9)
\gamma_t(0,1) = \Pr(x_t=1)\Pr(r_t|y_t=1),    (10)
\gamma_t(1,0) = \Pr(x_t=1)\Pr(r_t|y_t=0),    (11)
\gamma_t(1,1) = \Pr(x_t=0)\Pr(r_t|y_t=1).    (12)
In the BCJR algorithm, the forward metric is computed recursively using:

\begin{bmatrix} \alpha_t(0) \\ \alpha_t(1) \end{bmatrix} = \begin{bmatrix} \alpha_{t-1}(0)\gamma_t(0,0) + \alpha_{t-1}(1)\gamma_t(1,0) \\ \alpha_{t-1}(0)\gamma_t(0,1) + \alpha_{t-1}(1)\gamma_t(1,1) \end{bmatrix}.    (13)

Substituting (9)-(12) into (13), and dividing both the numerator and the denominator by \alpha_{t-1}(1)\Pr(x_t=1)\Pr(r_t|y_t=1), we get:

\frac{\alpha_t(0)}{\alpha_t(1)} = \frac{\left(\frac{\alpha_{t-1}(0)}{\alpha_{t-1}(1)}\cdot\frac{\Pr(x_t=0)}{\Pr(x_t=1)} + 1\right)\cdot\frac{\Pr(r_t|y_t=0)}{\Pr(r_t|y_t=1)}}{\frac{\alpha_{t-1}(0)}{\alpha_{t-1}(1)} + \frac{\Pr(x_t=0)}{\Pr(x_t=1)}}.    (14)
Define:

\bar{\alpha}_t := \log\frac{\alpha_t(0)}{\alpha_t(1)},    (15)
L_{ch}(y_t) := \log\frac{\Pr(r_t|y_t=0)}{\Pr(r_t|y_t=1)},    (16)
L_o(x_t) := \log\frac{\Pr(x_t=0)}{\Pr(x_t=1)}.    (17)

Taking the logarithm on both sides of (14), we get:

\bar{\alpha}_t = \log\frac{e^{\bar{\alpha}_{t-1}} \cdot e^{L_o(x_t)} + 1}{e^{\bar{\alpha}_{t-1}} + e^{L_o(x_t)}} + L_{ch}(y_t) = \left(\bar{\alpha}_{t-1} \boxplus L_o(x_t)\right) + L_{ch}(y_t).    (18)

Likewise, in the backward recursion we have:

\bar{\beta}_t := \log\frac{\beta_t(0)}{\beta_t(1)} = \log\frac{\gamma_{t+1}(0,0)\beta_{t+1}(0) + \gamma_{t+1}(0,1)\beta_{t+1}(1)}{\gamma_{t+1}(1,0)\beta_{t+1}(0) + \gamma_{t+1}(1,1)\beta_{t+1}(1)}
 = \log\frac{e^{L_o(x_{t+1})} \cdot e^{L_{ch}(y_{t+1})+\bar{\beta}_{t+1}} + 1}{e^{L_o(x_{t+1})} + e^{L_{ch}(y_{t+1})+\bar{\beta}_{t+1}}}
 = L_o(x_{t+1}) \boxplus \left(L_{ch}(y_{t+1}) + \bar{\beta}_{t+1}\right).    (19)

Finally, using

\Lambda_t = \log\frac{\Pr(x_t=0|r_1^N)}{\Pr(x_t=1|r_1^N)} = \log\frac{\sum_{(m',m):\,x_t=0} \alpha_{t-1}(m')\gamma_t(m',m)\beta_t(m)}{\sum_{(m',m):\,x_t=1} \alpha_{t-1}(m')\gamma_t(m',m)\beta_t(m)},

we compute the output (extrinsic) information:

\Lambda_t = \log\frac{\alpha_{t-1}(0)\Pr(r_t|y_t=0)\beta_t(0) + \alpha_{t-1}(1)\Pr(r_t|y_t=1)\beta_t(1)}{\alpha_{t-1}(0)\Pr(r_t|y_t=1)\beta_t(1) + \alpha_{t-1}(1)\Pr(r_t|y_t=0)\beta_t(0)}
 = \log\frac{e^{\bar{\alpha}_{t-1}} \cdot e^{L_{ch}(y_t)+\bar{\beta}_t} + 1}{e^{\bar{\alpha}_{t-1}} + e^{L_{ch}(y_t)+\bar{\beta}_t}}
 = \bar{\alpha}_{t-1} \boxplus \left(L_{ch}(y_t) + \bar{\beta}_t\right).    (20)

For clarity, Tab. II summarizes the above results from the BCJR algorithm and compares them with the sum-product algorithm described in the previous section. It is easy to see that the two algorithms perform exactly the same operations, with \bar{\alpha}_t = L_{ch}(y_t) + L_{ef}(y_t), \bar{\beta}_t = L_{eb}(y_t) and \Lambda_t = L_e(x_t). Hence, the sum-product decoding of 1/(1+D) presents an efficient alternative to the conventional BCJR algorithm.
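The claimed equivalence is also easy to check numerically. The sketch below (ours, reusing boxplus() and decode_accumulator() from Section II-A) implements the log-ratio BCJR recursions (18)-(20) directly; with the start state y_0 = 0 (so that ᾱ_0 = +∞) and an unterminated trellis (β_N(0) = β_N(1)), its output matches the message-passing decoder to machine precision.

```python
import numpy as np
# boxplus() and decode_accumulator() as defined in the sketch of Section II-A

def bcjr_extrinsic(L_ch, L_o):
    """Log-ratio BCJR for the 2-state 1/(1+D) trellis, per (18)-(20)."""
    N = len(L_ch)
    a = np.empty(N)                        # a[t] holds alpha_bar at time t+1 (0-indexed)
    a[0] = L_o[0] + L_ch[0]                # boxplus(+inf, L_o) = L_o at the start state
    for t in range(1, N):                  # forward recursion (18)
        a[t] = boxplus(a[t - 1], L_o[t]) + L_ch[t]
    b = np.empty(N)
    b[N - 1] = 0.0                         # unterminated trellis: beta_N(0) = beta_N(1)
    for t in range(N - 2, -1, -1):         # backward recursion (19)
        b[t] = boxplus(L_o[t + 1], L_ch[t + 1] + b[t + 1])
    lam = np.empty(N)                      # extrinsic LLR (20)
    lam[0] = L_ch[0] + b[0]
    lam[1:] = boxplus(a[:-1], L_ch[1:] + b[1:])
    return lam

rng = np.random.default_rng(1)
L_ch, L_o = rng.normal(size=64), rng.normal(size=64)
assert np.allclose(bcjr_extrinsic(L_ch, L_o), decode_accumulator(L_ch, L_o))
```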
C. Parallel Sum-Product Algorithm and Its Relation to LDPC Decoding

Parallelization is highly welcome in hardware implementation, where one processing unit can be assigned to each bit/check such that all check-to-bit and bit-to-check updates can be processed simultaneously. In the above sum-product algorithm, the serial procedure includes the forward pass (4), where L_{ef}^{(k)}(y_{i+1}) is computed using L_{ef}^{(k)}(y_i), and the backward pass (5), where L_{eb}^{(k)}(y_{i-1}) is computed using L_{eb}^{(k)}(y_i). To parallelize the procedure, it is convenient to adopt a batch-mode update instead of a sequential one; that is, the messages of the (i+1)th or the (i-1)th bit need not depend on the message of the ith bit of the current iteration. Rather, they can resort to the old messages from the previous iteration. Thus, (4) and (5) can be rewritten as:

L_{ef}^{(k)}(y_i) = L_o^{(k-1)}(x_i) \boxplus \left[L_{ch}(y_{i-1}) + L_{ef}^{(k-1)}(y_{i-1})\right],    (21)
L_{eb}^{(k)}(y_i) = L_o^{(k-1)}(x_{i+1}) \boxplus \left[L_{ch}(y_{i+1}) + L_{eb}^{(k-1)}(y_{i+1})\right].    (22)

Fig. 1(c) presents the 1/(1+D) code in sparse parity check matrix form. It can be seen that the above parallelization essentially makes the decoder behave like an LDPC decoder, which performs a batch mode of bit-to-check and check-to-bit updates. We would like to mention that this parallel sum-product algorithm is what is used in the decoding of irregular repeat accumulate codes [8] [9].
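A minimal sketch of one batch-mode update per (21)-(22), again reusing boxplus() from Section II-A (names ours). The boundary messages L_ef(y_1) and L_eb(y_N) simply keep their values from (6)-(7):

```python
import numpy as np
# boxplus() as defined in the sketch of Section II-A

def parallel_update(L_ch, L_o, L_ef, L_eb):
    """One flooding (batch-mode) iteration per (21)-(22): every new message is
    computed from the previous iteration's messages only, as in LDPC decoding."""
    new_ef, new_eb = L_ef.copy(), L_eb.copy()                # boundaries carry over
    new_ef[1:] = boxplus(L_o[1:], L_ch[:-1] + L_ef[:-1])     # (21)
    new_eb[:-1] = boxplus(L_o[1:], L_ch[1:] + L_eb[1:])      # (22)
    return new_ef, new_eb
```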
D. Min-Sum Algorithm and Its Relation to Max-log-MAP

Observe that the main complexity of the sum-product algorithm comes from the \boxplus operation. A straightforward implementation of \boxplus requires 1 addition and 3 table lookups (assuming \log\tanh(\frac{x}{2}) and its inverse are implemented via table lookup). To cater to systems where complexity is the major issue, an approximation can be made to replace \boxplus with a simple sign-min operation:

\gamma = \alpha \boxplus \beta = \log\frac{1+e^{\alpha+\beta}}{e^{\alpha}+e^{\beta}}
 = \mathrm{sign}(\alpha)\,\mathrm{sign}(\beta)\,\min(|\alpha|,|\beta|) + \log\frac{1+e^{-|\alpha+\beta|}}{1+e^{-|\alpha-\beta|}}
 \approx \mathrm{sign}(\alpha)\,\mathrm{sign}(\beta)\,\min(|\alpha|,|\beta|).    (23)

A considerable amount of complexity can be saved using this approximation, and the sum-product algorithm is then reduced to the min-sum algorithm. It is interesting to note that, just as the sum-product algorithm is equivalent to the MAP (log-MAP) algorithm, this min-sum algorithm is equivalent to the Max-log-MAP algorithm. To see this, recall that the Max-log-MAP algorithm approximates the log-MAP algorithm through [13]: \log(e^{\alpha} + e^{\beta}) \approx \max(\alpha, \beta), which directly translates to:

\gamma = \alpha \boxplus \beta = \log(e^{0}+e^{\alpha+\beta}) - \log(e^{\alpha}+e^{\beta})
 \approx \max(0, \alpha+\beta) - \max(\alpha, \beta)
 = \mathrm{sign}(\alpha)\,\mathrm{sign}(\beta)\,\min(|\alpha|,|\beta|).    (24)
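In code, the min-sum approximation simply drops the two correction terms of the exact boxplus() given earlier; substituting this function into the serial or parallel sketches turns them into min-sum decoders, mirroring the Max-log-MAP relation above (naming ours):

```python
import numpy as np

def boxplus_minsum(a, b):
    """Min-sum check operation per (23)-(24): keep sign(a)*sign(b)*min(|a|,|b|),
    drop the log(1+e^-|a+b|) - log(1+e^-|a-b|) correction of the exact form."""
    return np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))
```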
III. Applications

A. Complexity

Tab. I compares the complexity of the log-MAP, the Max-log-MAP [13], the sum-product and the min-sum algorithms
for decoding 1/(1+D). As can be seen, the sum-product algorithm requires less than 1/5 the complexity of the log-MAP algorithm, and the min-sum algorithm requires only about 1/8 that of the Max-log-MAP algorithm, both of which are very impressive savings. Further, the proposed algorithms can be conveniently extended to 1/(1+D^n) codes, which have very similar code graphs. Fig. 3 presents the code graph of a 1/(1+D^2) code. From the graph perspective, 1/(1+D^n) and 1/(1+D) are essentially the same code: since y_i = x_i ⊕ y_{i-n}, the Tanner graph of 1/(1+D^n) decomposes into n disjoint 1/(1+D) chains, and the former is like the latter operating on an n-segmented block.
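Under this reading (consistent with Fig. 3; the helper name is ours), a 1/(1+D^n) decoder can reuse the 1/(1+D) routine from Section II-A on each subsampled chain:

```python
import numpy as np
# decode_accumulator() as defined in the sketch of Section II-A

def decode_accumulator_n(L_ch, L_o, n):
    """Decode 1/(1+D^n) as n interleaved, independent 1/(1+D) chains (cf. Fig. 3):
    position i belongs to the chain of its residue i mod n, since y_i = x_i XOR y_{i-n}."""
    L_e = np.empty(len(L_ch))
    for j in range(n):
        L_e[j::n] = decode_accumulator(L_ch[j::n], L_o[j::n])
    return L_e
```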
B. Applications and Simulations
The 1/(1+D) code has been widely used in serially concatenated systems to achieve an interleaving gain without a reduction in code rate. Prospective applications which use the 1/(1+D) code include serial turbo codes, regular/irregular repeat accumulate (RA) codes [8] [9] and product accumulate (PA) codes [10], among others. The proposed algorithms are expected to be of great practical interest to these systems.

We evaluate the performance of the above algorithms with product accumulate codes [10]. Product accumulate codes are a class of interleaved serially concatenated codes where the inner code is 1/(1+D) and the outer code is a parallel concatenation of two single parity check (SPC) codes [10]. Like turbo and repeat accumulate codes, product accumulate codes have shown performance impressively close to capacity.

We first compare the performance of product accumulate codes using the serial and the parallel sum-product decoding. In general, serial update leads to faster convergence, and perceivably better performance, than parallel update. Fig. 4 tests an (8K,4K) product accumulate code, with performance evaluated at 5, 15 and 30 iterations. We see that the difference between the two approaches is only about 0.1 dB at a bit error rate of 10^-5, which is almost negligible. Hence, parallel sum-product decoding serves as a promising candidate for hardware implementation.

Fig. 5 compares the performance of a (2K,1K) product accumulate code using the sum-product and the min-sum decoding, with performance evaluated at 5, 10, 15 and 20 iterations. At all of these iteration counts, the min-sum decoding incurs only about a 0.2 dB loss. Hence, the min-sum algorithm provides a good tradeoff between performance and complexity, which is very appealing for simple, low-cost systems.
TABLE I
Complexity of log-MAP, Max-log-MAP, Sum-Product and Min-Sum Decoding for the 1/(1+D) Code (operations per bit)

Oper     | log-MAP | Max-log-MAP | sum-product | min-sum
add      |   39    |     31      |      5      |    2
min/max  |    8    |      8      |      0      |    3
lookup   |    8    |      0      |      5      |    0
IV. Conclusion

An efficient sum-product algorithm is proposed for decoding 1/(1+D^n) codes, which is shown to be both simple and optimal. Its parallel realization and low-complexity approximation are discussed. Prospective applications which can take advantage of the proposed algorithms are also investigated. The proposed algorithms are expected to be of great interest in theory and practice alike.
References

[1] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inform. Theory, vol. IT-27, pp. 533-547, Sept. 1981.
[2] B. J. Frey, Graphical Models for Machine Learning and Digital Communication, The MIT Press, Cambridge, MA, 1998.
[3] G. D. Forney, Jr., "Codes on graphs: normal realizations," IEEE Trans. Inform. Theory, vol. 47, pp. 520-548, Feb. 2001.
[4] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, pp. 498-519, Feb. 2001.
[5] N. Wiberg, Codes and Decoding on General Graphs, doctoral dissertation, Linköping University, Sweden, 1996.
[6] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance of Pearl's 'belief propagation' algorithm," IEEE J. Select. Areas Commun., vol. 16, pp. 140-152, Feb. 1998.
[7] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, "Analysis of low density codes and improved designs using irregular graphs," Proc. ACM Symp. Theory of Computing, pp. 249-258, 1998.
[8] D. Divsalar, H. Jin, and R. J. McEliece, "Coding theorems for 'turbo-like' codes," Proc. 1998 Allerton Conf. Commun. and Control, pp. 201-210, Sept. 1998.
[9] H. Jin, A. Khandekar, and R. McEliece, "Irregular repeat-accumulate codes," Proc. 2nd Intl. Symp. on Turbo Codes and Related Topics, Brest, France, Sept. 2000.
[10] J. Li, K. R. Narayanan, and C. N. Georghiades, "A class of linear-complexity, soft-decodable, high-rate, 'good' codes: construction, properties and performance," Proc. IEEE Intl. Symp. Inform. Theory, Washington, DC, June 2001.
[11] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. IT-20, pp. 284-287, Mar. 1974.
[12] W. E. Ryan, "A turbo code tutorial," http://www.ece.arizona.edu/~ryan/
[13] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain," Proc. Intl. Conf. Commun., Seattle, WA, 1995.
Fig. 1. Structure of the 1/(1+D) code. (a) Trellis, with branches labeled x_t/y_t (from state 0: 0/0 and 1/1; from state 1: 1/0 and 0/1); (b) Tanner graph; (c) sparse parity check matrix form.
TABLE II
Summary of Sum-Product and MAP Decoding

              | BCJR                                              | Sum-product
forward       | ᾱ_t = (ᾱ_{t-1} ⊞ L_o(x_t)) + L_ch(y_t)            | L_ef(y_t) = (L_ef(y_{t-1}) + L_ch(y_{t-1})) ⊞ L_o(x_t)
backward      | β̄_t = (β̄_{t+1} + L_ch(y_{t+1})) ⊞ L_o(x_{t+1})   | L_eb(y_t) = (L_eb(y_{t+1}) + L_ch(y_{t+1})) ⊞ L_o(x_{t+1})
extrinsic LLR | Λ_t = ᾱ_{t-1} ⊞ (β̄_t + L_ch(y_t))                | L_e(x_t) = (L_ef(y_{t-1}) + L_ch(y_{t-1})) ⊞ (L_eb(y_t) + L_ch(y_t))
Fig. 2. Sum-product decoding of the 1/(1+D) code. (a) Message flow; (b) forward pass; (c) backward pass.

Fig. 3. Tanner graph of the 1/(1+D^2) code.

Fig. 4. Sum-product decoding, serial vs. parallel: BER vs. Eb/No (dB) of a product accumulate code at 5, 15 and 30 iterations.

Fig. 5. Sum-product vs. min-sum decoding: BER vs. Eb/No (dB) of a product accumulate code at 5, 10, 15 and 20 iterations.