

IEEE COMMUNICATIONS LETTERS, VOL. 6, NO. 7, JULY 2002

A Parallel MAP Algorithm for Low Latency Turbo Decoding

Seokhyun Yoon, Student Member, IEEE, and Yeheskel Bar-Ness, Fellow, IEEE

Manuscript received February 26, 2002. This work was supported in part by the New Jersey Center for Wireless Telecommunications (NJCWT) and by the National Science Foundation (NSF) under CCR-0085846. The authors are with the Center for Communication and Signal Processing Research, New Jersey Institute of Technology, Newark, University Heights, NJ 07102 USA (e-mail: [email protected]; [email protected]). Publisher Item Identifier 10.1109/LCOMM.2002.801310.

Abstract—To reduce the computational decoding delay of turbo codes, we propose a parallel algorithm for maximum a posteriori (MAP) decoders. We divide a whole noisy codeword into sub-blocks and use multiple processors to perform sub-block MAP decoding in parallel. Unlike the previously proposed approach based on sub-block overlapping, we utilize the forward and backward variables computed in the previous iteration to provide boundary distributions for each sub-block MAP decoder. Our scheme achieves asymptotically optimal performance in the sense that the BER is the same as that of the regular turbo decoder.

Index Terms—Belief propagation, low latency, parallel decoding, turbo codes.

I. INTRODUCTION

Fig. 1. A simplified diagram of a turbo decoder.

IN TURBO CODING, convolutional codes are used as constituent codes to obtain a large coding gain. Correspondingly, the decoding algorithm employs some type of recursive scheme, such as symbol-by-symbol maximum a posteriori (MAP) decoding [1], [2] (also known as the forward-backward algorithm) or its logarithmic versions [3], in which the variables are computed recursively. Moreover, since turbo decoding is an iterative algorithm, the decoding delay may not be acceptable.

Let $T_f$ be the time duration of a whole codeword and $T_c$ be the decoding computation time. In convolutional codes, the decoding delay can be far less than $T_f$ if one uses a sliding window algorithm as described in [4]. Turbo codes being block codes, the decoding delay cannot be less than $T_f$: for a block code, the decoding process can start only after the reception of a whole codeword, so the decoding delay is the sum of $T_f$ and $T_c$. To reduce the decoding delay, we may use a turbo code with a short frame size at the expense of performance degradation. This is a plausible option in low data rate systems, because $T_f$ is the dominant factor in the decoding delay. In high data rate systems, however, such as multi-megabit-per-second systems, $T_c$ dominates and must be reduced.

There are several approaches to reducing $T_c$ in turbo decoding, including parallel schemes with multiple processors, as in [5]-[7]. In [5] and [6], the whole trellis is divided into multiple overlapped sub-blocks, and the same MAP decoder as in the regular turbo decoder is used for each sub-block. In [7], on the other hand, a sectionalized trellis is divided into sub-trellises, and multiple processors are used in parallel to compute the branch metrics in each sub-trellis. That work may seem related to ours but is, in fact, entirely different in principle.

In this letter, we propose a parallel MAP decoding scheme similar to those of [5] and [6]. However, instead of using overlapping, we utilize the forward and backward variables that were computed in the previous iteration as intermediate boundary distributions for each sub-block MAP decoder. In the structure of [6], each sub-block MAP decoder in fact utilizes only partial observations and hence is sub-optimal unless a reasonable overlapping depth is used, whereas the proposed scheme utilizes all the observations through message passing, as will be discussed in Section III.

II. FORWARD–BACKWARD ALGORITHM FOR TURBO DECODING

Consider a turbo code with two rate-1/2 systematic convolutional codes as its constituent codes, whose simplified decoder structure is depicted in Fig. 1. MAP 1 and MAP 2 are the MAP decoders for constituent codes 1 and 2, respectively. One iteration of turbo decoding comprises MAP 1, interleaving, MAP 2, and deinterleaving, in this order. MAP decoding includes the computation of the forward variables, the backward variables, and the extrinsic part of the likelihood ratio. A problem with this structure is that MAP decoding is a recursive process and the whole decoding must be repeated many times (typically five to ten iterations), causing a large delay.

Let $\alpha_k^{(i)}(s)$ and $\beta_k^{(i)}(s)$ be the forward and backward variables for state $s$ at the $k$th trellis stage of the $i$th constituent code, with $N$ trellis stages in total. For a detailed description of turbo decoding and MAP decoding, see [1] or [2]. In MAP decoding, the forward and backward variables are computed recursively. For the former, starting from an initial distribution $\alpha_0^{(i)}(s)$, $\alpha_k^{(i)}(s)$ is computed recursively from the distribution of the previous variable $\alpha_{k-1}^{(i)}(s)$ and the channel inputs for the $k$th trellis stage. Similarly, for the latter, starting from a final distribution $\beta_N^{(i)}(s)$, $\beta_k^{(i)}(s)$ is computed from $\beta_{k+1}^{(i)}(s)$ and the related observations.
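For illustration, the following is a minimal sketch of the two recursions in the probability domain. The precomputed branch-metric array `gamma` and the normalization step are our own illustrative choices, not notation from [1] or [2]; a complete MAP decoder also combines these variables with the a priori information to form the likelihood ratios.

```python
import numpy as np

def forward_backward(gamma):
    """Minimal forward-backward recursion (sketch).

    gamma[k, s, sp]: hypothetical precomputed branch metric for the
    transition from state s to state sp at trellis stage k, k = 0..N-1.
    Returns alpha[k, s] and beta[k, s] for k = 0..N.
    """
    N, S, _ = gamma.shape
    alpha = np.zeros((N + 1, S))
    beta = np.zeros((N + 1, S))
    alpha[0, 0] = 1.0   # initial distribution: trellis starts in state 0
    beta[N, 0] = 1.0    # final distribution: tail bits force state 0

    # Forward: alpha_k follows from alpha_{k-1} and the stage-k metrics.
    for k in range(1, N + 1):
        alpha[k] = alpha[k - 1] @ gamma[k - 1]
        alpha[k] /= alpha[k].sum()   # normalize to prevent underflow

    # Backward: beta_k follows from beta_{k+1} and the stage-k metrics.
    for k in range(N - 1, -1, -1):
        beta[k] = gamma[k] @ beta[k + 1]
        beta[k] /= beta[k].sum()

    return alpha, beta
```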



Fig. 2. A turbo decoder using the parallel MAP algorithm. All the sub-block MAP decoders, each shown as a box, operate in parallel. Each starts its forward-backward recursion with the boundary distributions computed in the previous iteration.

Due to the iterative and recursive nature of the decoding algorithm, the decoding computation time can be very large. To reduce the decoding time, a parallel MAP scheme was proposed in [6], where the whole trellis is divided into sub-blocks that are processed in parallel. However, when each sub-block MAP decoder is implemented separately in parallel, appropriate boundary distributions are not available. Hence, in [6], overlapping between neighboring sub-blocks is used; i.e., the $m$th sub-block MAP decoder starts its forward recursion from stage $mL - D$ and its backward recursion from stage $(m+1)L + D$, where $L$ is the sub-block length and $D$ is an integer representing the overlapping depth. Therefore, each sub-block MAP decoder processes $L + 2D$ trellis stages, i.e., it requires $2D$ additional computations per sub-block to provide appropriate boundary distributions at stages $mL$ and $(m+1)L$. Furthermore, the algorithm is sub-optimal unless it uses a reasonable overlapping depth.
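The boundary handling of [6] can be sketched as follows; the helper names and the uniform warm-up distributions are our reading of that scheme, not code from [6].

```python
import numpy as np

def overlapped_subblock_bounds(N, L, D, m, S):
    """Sketch of the overlap scheme of [6] for the m-th sub-block.

    The forward recursion starts D stages before the sub-block and the
    backward recursion D stages after it, seeded with uniform
    distributions, so each sub-block processes L + 2*D trellis stages
    (2*D more than the sub-block itself).
    """
    start = max(0, m * L - D)          # forward warm-up begins here
    stop = min(N, (m + 1) * L + D)     # backward warm-up begins here
    alpha_seed = np.full(S, 1.0 / S)   # no boundary information is
    beta_seed = np.full(S, 1.0 / S)    # available, so start uniform
    # The decoder recurses over [start, stop) and keeps only the stages
    # inside [m*L, (m+1)*L); the warm-up results are discarded.
    return start, stop, alpha_seed, beta_seed
```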

III. PROPOSED ALGORITHM AND ITS RELATION TO THE BELIEF PROPAGATION PARADIGM

A turbo decoder that employs our proposed parallel scheme is shown in Fig. 2. As in [6], we divide the noisy codeword of a constituent convolutional code of length $N$ into $M$ sub-blocks of length $L = N/M$ trellis stages. However, instead of using overlapping, we utilize the forward and backward variables of the adjacent sub-blocks that were computed in the previous iteration to provide appropriate boundary distributions for each sub-block MAP decoder; i.e., the $m$th sub-block MAP decoder starts its forward recursion with $\tilde{\alpha}_{mL}^{(i)}(s)$ and its backward recursion with $\tilde{\beta}_{(m+1)L}^{(i)}(s)$, where the tilde denotes values computed in the previous iteration. All sub-block MAP decoders perform the computation simultaneously, and hence the proposed algorithm reduces the decoding computation time exactly by a factor of $M$.

Parallel MAP Decoding

We can summarize the proposed algorithm as follows. For the $m$th sub-block, use $\tilde{\alpha}_{mL}^{(i)}(s)$ and $\tilde{\beta}_{(m+1)L}^{(i)}(s)$ as initial distributions in the corresponding sub-block MAP recursions to compute $\alpha_k^{(i)}(s)$ for $k = mL+1, \ldots, (m+1)L$ and $\beta_k^{(i)}(s)$ for $k = (m+1)L-1, \ldots, mL$, recursively in that order. We then use the resulting forward and backward variables to compute the extrinsic information for $k = mL, \ldots, (m+1)L-1$. Finally, the boundary distributions $\alpha_{(m+1)L}^{(i)}(s)$ and $\beta_{mL}^{(i)}(s)$ are passed to the neighboring sub-blocks to be used as initial distributions in the next iteration.
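A minimal sketch of one such iteration for one constituent code, under the same hypothetical `gamma` convention as above (extrinsic computation omitted): every pass of the loop depends only on boundary values from the previous iteration, so the $M$ passes can run on $M$ processors simultaneously.

```python
import numpy as np

def parallel_map_iteration(gamma, alpha_bnd, beta_bnd, L):
    """One iteration of the proposed parallel MAP algorithm (sketch).

    gamma[k, s, sp]: branch metrics for all N = M*L trellis stages.
    alpha_bnd[m], beta_bnd[m]: boundary distributions at sub-block edge
    m*L, as computed in the previous iteration.
    """
    M = gamma.shape[0] // L
    new_alpha = alpha_bnd.copy()   # the fixed edges alpha_bnd[0] and
    new_beta = beta_bnd.copy()     # beta_bnd[M] are never overwritten
    for m in range(M):   # independent passes: run on M processors
        g = gamma[m * L:(m + 1) * L]
        # Forward recursion seeded by the left neighbor's old alpha.
        a = alpha_bnd[m].copy()
        for k in range(L):
            a = a @ g[k]
            a /= a.sum()
        new_alpha[m + 1] = a       # message to the right neighbor
        # Backward recursion seeded by the right neighbor's old beta.
        b = beta_bnd[m + 1].copy()
        for k in range(L - 1, -1, -1):
            b = g[k] @ b
            b /= b.sum()
        new_beta[m] = b            # message to the left neighbor
        # Extrinsic information for stages m*L .. (m+1)*L - 1 would be
        # computed here from the intermediate alpha/beta values.
    return new_alpha, new_beta
```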

We emphasize, as reported in [8], that iterative turbo decoding can be described as a "belief propagation" algorithm, in which the likelihood ratios (beliefs) on the transmitted information bits are updated iteratively by exchanging information between nodes in a graph. Unfortunately, the convergence of the belief propagation algorithm on a loopy network has not yet been proven. However, as mentioned in [8], it inspires various other strategies for turbo decoding, according to how the message passing in the graphical model is coordinated. In fact, immediate application of the belief propagation algorithm to the "hidden Markov chain" described in [8] results in our parallel MAP decoder with sub-block length $L$ equal to 1. As in the belief propagation paradigm, the $m$th sub-block decoder of the $i$th constituent code ($i = 1$ or 2) takes the messages $\tilde{\alpha}_{mL}^{(i)}(s)$ and $\tilde{\beta}_{(m+1)L}^{(i)}(s)$ from its neighbors and the a priori information vector of its sub-block from the other constituent decoder; it then produces updated information, sending $\alpha_{(m+1)L}^{(i)}(s)$ and $\beta_{mL}^{(i)}(s)$ back to its neighbors and the extrinsic information vector of its sub-block to the other constituent decoder. For simplicity, we omitted the systematic part in Fig. 2. The white arrows in Fig. 2 represent edge parameters that, once initialized, remain fixed until the entire decoding is done. Through the message passing between sub-blocks, the proposed algorithm can utilize the entire observation to finally obtain optimal values of the likelihood ratios. In [6], since the sub-blocks do not communicate with each other, the intermediate boundary distributions require additional neighborhood observations, which were provided by overlapping the sub-blocks.

With the configuration in Fig. 2, we obtain at least two advantages over the previous parallel MAP scheme of [6]. First, there are no additional computations due to overlapping. Second, by communicating with their neighbors, the sub-block MAP decoders can converge to optimal values, since information in one sub-block propagates through the entire network by message passing.


Initialization

Before proceeding with iterative decoding, the input messages to each sub-block decoder must be initialized properly. An example of such an initialization is as follows.

1) Initialize permanently the edge boundary distributions $\tilde{\alpha}_0^{(i)}(s) = \tilde{\beta}_N^{(i)}(s) = \delta_{s,0}$, where $\delta_{s,0}$ is the Kronecker delta function.
2) Initialize temporarily the intermediate boundary distributions $\tilde{\alpha}_{mL}^{(i)}(s) = \tilde{\beta}_{mL}^{(i)}(s) = 1/S$ for all $s$ and $m = 1, \ldots, M-1$, where $S$ is the number of states of the convolutional code used.
3) Initialize the a priori log-likelihood ratios to zero for all information bits.

We assumed that the constituent convolutional codes start and terminate at state zero with appropriate tail bits. When implementing in parallel, we have no information on the intermediate initial distributions in the first iteration, so we choose the initialization in 2).
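A sketch of this initialization with our own array names ($S$ states, $M$ sub-blocks, $N$ information bits per constituent code), assuming log-domain a priori values as in the Max-Log MAP simulations of Section IV:

```python
import numpy as np

def initialize_messages(M, S, N):
    """Initialization sketch for the M sub-block decoders of one
    constituent code (array names are ours, not from the letter)."""
    # Step 2: intermediate boundary distributions start uniform (1/S),
    # since nothing is known about the internal states yet.
    alpha_bnd = np.full((M + 1, S), 1.0 / S)
    beta_bnd = np.full((M + 1, S), 1.0 / S)
    # Step 1: the outer edges are Kronecker deltas at state zero, because
    # the code starts and terminates at state zero; these remain fixed.
    alpha_bnd[0] = np.eye(S)[0]
    beta_bnd[M] = np.eye(S)[0]
    # Step 3: a priori log-likelihood ratios start at zero.
    llr_prior = np.zeros(N)
    return alpha_bnd, beta_bnd, llr_prior
```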

IV. SIMULATION RESULTS

Fig. 3. Comparison of the BER convergence of the parallel scheme with sub-block lengths of 8, 16, 32, 128, and 8192 (equivalent to the regular turbo decoder). Averaged over 1200 trials.

In Fig. 3, we plotted the bit error probabilities of the proposed scheme at each iteration step, simulated with sub-block lengths of 8, 16, 32, 128, and 8192 information bits; a sub-block length of 8192 is equivalent to regular turbo coding. At each trial, the interleaver was chosen at random. The simulations were performed with the Max-Log MAP algorithm.

Since the proposed parallel MAP algorithm is used to reduce the computational delay, we also have to take the convergence speed into account. If the proposed algorithm requires more iterations than the regular turbo decoder, the net delay reduction may not be as large as expected. In the figure we note the following. First, the final BER after convergence is almost the same as that of the regular turbo decoder, regardless of the sub-block length. Second, the longer the sub-block length, the faster the convergence. In this simulation, the performance with a sub-block length of 128 shows almost the same convergence speed as that of regular turbo decoding. This means that we obtain a delay reduction by exactly a factor of $M$ if we use a sub-block length of 128 or more for the turbo code used.
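For instance, with the simulated frame of $N = 8192$ information bits and a sub-block length of $L = 128$, we get $M = N/L = 8192/128 = 64$ sub-block decoders running in parallel, so the per-iteration computation time drops from $T_c$ to roughly $T_c/64$ (assuming, as a simplification, that the computation time scales inversely with the number of processors).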

The reason that we obtain faster convergence with a longer sub-block is as follows. Although the convergence of the belief propagation algorithm on a loopy network has not yet been proven, it can be intuitively inferred that a good initial boundary distribution accelerates convergence. Clearly, at the first iteration, the internal forward and backward variables computed early in the recursion of a sub-block MAP decoder will be unreliable, since at the initial iteration we have no prior knowledge of the intermediate boundary distributions. However, if the sub-block length is large enough, the values produced at the final recursions of each sub-block can be reliable enough. For example, for the $m$th sub-block MAP of length $L$, the third forward variable $\alpha_{mL+3}^{(i)}(s)$ can utilize only three observations, while $\alpha_{(m+1)L}^{(i)}(s)$ utilizes all the $L$ observations that belong to the $m$th sub-block. Since the soft information is transferred through the recursive computation, even in the first iteration we obtain quite reliable information on the boundary distributions at the final recursion of a sub-block MAP decoding.

V. CONCLUDING REMARKS

To increase the decoding speed, we proposed a parallel MAP algorithm. Instead of using overlaps, we utilize the forward and backward variables computed in the previous iteration to provide appropriate boundary distributions. Simulation results show that the scheme converges asymptotically to the optimal performance at almost the same convergence rate as the regular turbo decoder, provided that each sub-block is reasonably long. Compared with a regular turbo decoder, the scheme requires only a small amount of additional memory to store the intermediate boundary distributions, even though the algorithm employs multiple processors. Moreover, the modularity of the proposed algorithm makes hardware implementation easy: it is a concatenated structure of identical sub-block MAP decoders, each performing exactly the same operation as a regular MAP decoder.

ACKNOWLEDGMENT

The authors thank the reviewers of this letter and the Associate Editor for their efforts in providing constructive criticism, especially regarding the clear definition of the decoding delay. It certainly made this letter more understandable.

REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correction coding and decoding: Turbo-codes," in Proc. ICC'93, Geneva, Switzerland, May 1993, pp. 1064-1070.
[2] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, pp. 429-445, Mar. 1996.
[3] P. Robertson, E. Villebrun, and P. Hoeher, "A comparison of optimal and sub-optimal decoding algorithms operating in the log domain," in Proc. ICC'95, Seattle, WA, June 1995, pp. 1009-1013.
[4] A. J. Viterbi, "An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes," IEEE J. Select. Areas Commun., vol. 16, pp. 260-264, Feb. 1998.
[5] S. A. Barbulescu, "Iterative decoding of turbo codes and other concatenated codes," Ph.D. dissertation, Univ. of South Australia, 1996.
[6] J. Hsu and C. Wang, "A parallel decoding scheme for turbo codes," in Proc. ISCAS'98, vol. 4, June 1998, pp. 445-448.
[7] Y. Lin, S. Lin, and M. Fossorier, "MAP algorithm for decoding linear block codes based on sectionalized trellis diagrams," in Proc. GlobeCom'98, Sydney, Australia, Nov. 1998, pp. ???-???.
[8] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance of Pearl's belief propagation algorithm," IEEE J. Select. Areas Commun., vol. 16, pp. 140-152, Feb. 1998.