An Improved Stochastic Decoding Algorithm of LTE Turbo Codes
Jiao Xianjun
Radio Systems Lab, Nokia Research Center, Beijing, 100176, P.R. China
Abstract. Turbo codes are widely used in many communication systems, and their decoding algorithms have been studied intensively. Most decoding algorithms are Max-Log-MAP based and perform arithmetic in the logarithmic domain. In recent years, stochastic turbo decoding has been proposed as a potentially highly parallel scheme. Stochastic decoding operates in the probability domain, and probability computations are done by logical operations on stochastic bit streams. In this paper, stochastic decoding is improved in three ways: first, an OR gate is used for probability addition instead of a multiplexer or a Taylor expansion in the exponential domain, exploiting the data distribution of the turbo decoding process; second, a direct intra-parallel-streams barrel shifter is used to break the correlation of division results; third, flip-flops are initialized randomly instead of with zeros to shorten the warm-up time of the decoding stages. Simulations show that these improvements reduce decoding complexity: they cost fewer hardware resources and fewer decoding cycles, and therefore less decoding time. Stochastic decoding also shows scalable and flexible characteristics in the Software Defined Radio (SDR) scenario.
Keywords: Stochastic decoding; turbo codes; LTE; Software Defined Radio; Data parallelism
1 Introduction
Turbo decoding has been studied thoroughly in recent years, and most methods are based on Max-Log-MAP. Max-Log-MAP converts probability multiplications into additions and maximum operations on Log-Likelihood Ratios (LLRs) [1, 2], because an adder is simpler than a multiplier. Stochastic turbo decoding keeps the original form of the probability calculation but still avoids arithmetic multipliers. It converts each probability into a stochastic bit stream, in which the probability is carried by the occurrence of ‘1’s, so that probability computations can be done by logical operations on the streams. Suppose a probability $P$ is represented by a stochastic bit stream in which ‘1’ occurs with probability $P$. Then two probabilities $P_1$ and $P_2$ can be multiplied by passing their bit streams through an AND gate: if the two streams are independent, it is easy to show that ‘1’ occurs in the output stream of the AND gate with probability $P_1 P_2$. Addition, normalization and division can also be done by operating on stochastic bit streams [3, 4]. In [5, 6], a fully stochastic decoder of LTE turbo codes is proposed. The rest of the paper is organized in five parts: the probability-domain algorithm for decoding turbo codes is described in section 2; stochastic decoding and the improvement methods are described in section 3; simulation results are given in section 4; considerations for SDR are described in section 5; and section 6 concludes the paper.
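The AND-gate multiplication described above can be checked with a short simulation. This is a minimal sketch, assuming the two streams are generated independently (Python's `random` module stands in for hardware random number generators); `bernoulli_stream` and `stochastic_multiply` are hypothetical helper names.

```python
import random

def bernoulli_stream(p, n, rng):
    """Stochastic bit stream of length n: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stochastic_multiply(p1, p2, n=200_000, seed=1):
    """Estimate p1 * p2 by ANDing two independent streams bit by bit."""
    rng = random.Random(seed)
    s1 = bernoulli_stream(p1, n, rng)
    s2 = bernoulli_stream(p2, n, rng)
    ones = sum(a & b for a, b in zip(s1, s2))  # AND gate, then count the '1's
    return ones / n

estimate = stochastic_multiply(0.3, 0.6)
print(f"AND-gate estimate {estimate:.4f}, exact {0.3 * 0.6:.4f}")
```

The estimate only approaches $P_1 P_2$ as the stream length grows; with correlated streams the AND output would no longer equal the product.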
2 Probability decoding of turbo codes
Fig. 1. Structure of turbo decoder
Fig. 1 depicts the block diagram of the probability-domain turbo decoder. For each MAP decoder, let $u_k$ and $c_k$ denote the systematic and parity bit variables at trellis step $k$. The channel output probabilities of the systematic and parity bits are denoted by $P_c(u_k)$ and $P_c(c_k)$; the a priori systematic bit probability from the other decoder is denoted by $P_a(u_k)$; the extrinsic systematic bit probability passed to the other decoder (as its a priori input) is denoted by $P_e(u_k)$; and the posterior systematic bit probability used for hard decision is denoted by $P_o(u_k)$.

Branch metric:
$$\gamma_k(s', s) = P_a(u_k = i)\,P_c(u_k = i)\,P_c(c_k = j)$$
The metric is for the transition from state $s'$ to state $s$. The pair $(i, j)$ is the concrete input (systematic bit) and output (parity bit) of that state transition in the trellis; for LTE binary turbo codes, $i, j \in \{0, 1\}$.

State metric: the forward metric $\alpha_k(s)$ and the backward metric $\beta_{k-1}(s')$ are calculated from the states that can be connected by a transition:
$$\alpha_k(s) = \sum_{s'} \alpha_{k-1}(s')\,\gamma_k(s', s), \qquad \beta_{k-1}(s') = \sum_{s} \beta_k(s)\,\gamma_k(s', s)$$

Extrinsic probability: the extrinsic systematic bit probability is calculated from all state transition pairs $(s', s)$ that generate systematic bit $i$ and parity bit $j$:
$$P_e(u_k = i) = \sum_{(s', s):\,u_k = i} \alpha_{k-1}(s')\,P_c(c_k = j)\,\beta_k(s)$$
Notice that the right-hand side excludes the external input $P_a(u_k)$ and the native channel term $P_c(u_k)$, so that only new information is passed to the other decoder.

Posterior probability: full information is produced from the new (extrinsic) information $P_e$, the external input $P_a$ and the native channel term $P_c$:
$$P_o(u_k = i) = P_e(u_k = i)\,P_a(u_k = i)\,P_c(u_k = i)$$

Notice that normalization should be performed to ensure
$$\sum_{s} \alpha_k(s) = 1, \quad \sum_{s} \beta_k(s) = 1, \quad \sum_{i \in \{0,1\}} P_e(u_k = i) = 1, \quad \sum_{i \in \{0,1\}} P_o(u_k = i) = 1.$$
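The recursions above can be prototyped in floating point before any stochastic mapping. The sketch below is a hypothetical reference model, not the paper's test bench: it assumes the LTE constituent encoder with feedback polynomial $1 + D^2 + D^3$ and parity polynomial $1 + D + D^3$, starts the trellis in state 0, leaves the block end unterminated (uniform $\beta$), and passes every probability as its value for bit 1. Names such as `map_decode` and the channel reliability of 0.9 are illustrative.

```python
import random

def lte_trellis():
    """All 16 transitions (s_prev, s_next, u, c) of the 8-state LTE constituent
    encoder: feedback 1 + D^2 + D^3, parity 1 + D + D^3."""
    trans = []
    for s in range(8):
        r1, r2, r3 = (s >> 2) & 1, (s >> 1) & 1, s & 1   # delays D, D^2, D^3
        for u in (0, 1):
            a = u ^ r2 ^ r3                               # feedback bit
            c = a ^ r1 ^ r3                               # parity bit
            trans.append((s, (a << 2) | (r1 << 1) | r2, u, c))
    return trans

def encode(bits, trans):
    """Parity bits of one constituent encoder, starting from state 0."""
    step = {(s, u): (n, c) for s, n, u, c in trans}
    s, parity = 0, []
    for u in bits:
        s, c = step[(s, u)]
        parity.append(c)
    return parity

def normalize(v):
    total = sum(v)
    return [x / total for x in v]

def pr(p1, bit):
    """Probability of `bit`, given the probability p1 that the bit is 1."""
    return p1 if bit else 1.0 - p1

def map_decode(pc_u, pc_c, pa_u, trans):
    """Probability-domain MAP; returns extrinsic and posterior P(u_k = 1)."""
    K = len(pc_u)
    gamma = [{(s, n): pr(pa_u[k], u) * pr(pc_u[k], u) * pr(pc_c[k], c)
              for s, n, u, c in trans} for k in range(K)]
    alpha = [[1.0] + [0.0] * 7]                    # start in state 0
    for k in range(K):                             # forward recursion
        a = [0.0] * 8
        for s, n, u, c in trans:
            a[n] += alpha[k][s] * gamma[k][(s, n)]
        alpha.append(normalize(a))
    beta = [None] * K + [[1.0 / 8] * 8]            # unterminated end: uniform
    for k in range(K - 1, -1, -1):                 # backward recursion
        b = [0.0] * 8
        for s, n, u, c in trans:
            b[s] += beta[k + 1][n] * gamma[k][(s, n)]
        beta[k] = normalize(b)
    ext, post = [], []
    for k in range(K):                             # alpha[k], beta[k+1] surround branch k
        e = [0.0, 0.0]
        for s, n, u, c in trans:                   # no P_a and no P_c(u) here
            e[u] += alpha[k][s] * pr(pc_c[k], c) * beta[k + 1][n]
        e = normalize(e)
        o = normalize([e[i] * pr(pa_u[k], i) * pr(pc_u[k], i) for i in (0, 1)])
        ext.append(e[1])
        post.append(o[1])
    return ext, post

trans = lte_trellis()
rng = random.Random(7)
bits = [rng.randint(0, 1) for _ in range(20)]
parity = encode(bits, trans)
rel = 0.9               # hypothetical channel reliability P(observed == sent)
pc_u = [rel if b else 1 - rel for b in bits]
pc_c = [rel if c else 1 - rel for c in parity]
pa_u = [0.5] * len(bits)                           # no a priori information yet
ext, post = map_decode(pc_u, pc_c, pa_u, trans)
decoded = [1 if p > 0.5 else 0 for p in post]
print("hard decisions match input:", decoded == bits)
```

A stochastic decoder replaces each product with an AND gate and each normalization with the circuits discussed in section 3; a floating-point model like this is only a convenient reference for checking them.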
3 Stochastic decoding
A fully stochastic turbo decoder is presented in [5, 6]. Fig. 2 depicts the MAP decoder structure, which is very similar to that in [5, 6].
Fig. 2. Structure of MAP decoder
Unlike a Max-Log-MAP decoder, the stochastic decoder in Fig. 2 can run fully in parallel. That means if each stage of the two sub-decoders in Fig. 1 is implemented as a hardware block, every part of the whole turbo decoder can run every clock; in other words, all components run concurrently. The concept of an iteration number disappears, because the two sub-decoders can exchange extrinsic information every clock. Therefore, decoding cycles (DCs), one DC being one clock cycle of the stochastic decoder, are used in place of iterations. Converting a probability to a bit stream can be done by a random number generator and a comparator; details can be found in [3, 4]. To obtain the probability value for a hard decision, a counter can count the number of ‘1’s in the stream. Many cycles have to run to collect enough ‘1’s if reasonable precision is wanted. If multiple streams run concurrently, enough ‘1’s are reached in fewer decoding cycles [8]. From section 2, the calculations involved in each stage of the MAP decoder are multiplication and normalization, and normalization involves addition and division. Multiplication can be done by an AND gate [3], which is already the simplest way, so the improvement methods mainly target normalization. Traditional stochastic addition and division are depicted in Fig. 3 (also found in [3]).
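The probability-to-stream conversion and the hard-decision counter can be sketched in Python. This is a minimal model under stated assumptions: each lane compares an 8-bit random number with the quantized probability, and a plain counter sums the ‘1’s over all lanes and cycles; the width, cycle count, bit width and function names are hypothetical.

```python
import random

def comparator_stream(p, width, cycles, rng, nbits=8):
    """Convert probability p into `cycles` words of `width` parallel stream bits.
    Each bit is 1 when an nbits-wide random number is below the quantized p."""
    threshold = round(p * (1 << nbits))          # quantized comparator input
    return [[1 if rng.getrandbits(nbits) < threshold else 0 for _ in range(width)]
            for _ in range(cycles)]

def counter_estimate(words):
    """Counter for the hard decision: fraction of '1's over all lanes and cycles."""
    total = sum(len(w) for w in words)
    return sum(sum(w) for w in words) / total

rng = random.Random(3)
p_post = 0.62                                    # hypothetical posterior P(u_k = 1)
for width in (1, 32):
    est = counter_estimate(comparator_stream(p_post, width, 64, rng))
    print(f"width {width:2d}, 64 cycles: estimate {est:.3f}, decision {int(est > 0.5)}")
```

With 32 lanes the counter observes 32 times as many stream bits per decoding cycle as with a single lane, which is why parallel streams reach a given precision in fewer cycles.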
Fig. 3. (a) Stochastic adding by multiplexer. (b) Stochastic division by JK flip-flop
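The JK flip-flop divider of Fig. 3 (b) can be modeled behaviorally. This sketch assumes independent J and K input streams; `jk_divide` is a hypothetical name. In steady state, the flow from 0 to 1 balances the flow from 1 to 0, $P(Q{=}0)\,p_J = P(Q{=}1)\,p_K$, which gives $P(Q{=}1) = p_J/(p_J + p_K)$.

```python
import random

def jk_divide(p_j, p_k, cycles=200_000, seed=5):
    """JK flip-flop driven by independent streams J (prob p_j) and K (prob p_k).
    J=1,K=0 sets; J=0,K=1 resets; J=K=1 toggles; J=K=0 holds.
    Returns the fraction of '1's at Q, which approaches p_j / (p_j + p_k)."""
    rng = random.Random(seed)
    q, ones = 0, 0
    for _ in range(cycles):
        j = rng.random() < p_j
        k = rng.random() < p_k
        q = int((j and not q) or (not k and q))   # Q+ = J*~Q + ~K*Q
        ones += q
    return ones / cycles

print(f"JK estimate {jk_divide(0.3, 0.1):.4f}, exact {0.3 / (0.3 + 0.1):.4f}")
```

Because Q keeps its value whenever J = K = 0, consecutive output bits are correlated, which is the latching effect that motivates the permutation methods of section 3.2.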
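The JK flip-flop actually normalizes two stochastic streams: with J driven by a stream of probability $P_1$ and K by a stream of probability $P_2$, the output has probability $P_1/(P_1 + P_2)$. If more than two streams need normalization, the K input should be driven by the sum of the remaining streams.
3.1 Improvement method 1: Adding by OR gate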
Because multiplexer-based addition needs too many cycles, [5] proposed exponential and logarithmic transformations that convert addition into multiplication. The detailed structure can be found in [7]. The exponential and logarithmic functions are computed by Taylor expansion; see Fig. 4.
Fig. 4. (a) Principle of exponential-domain computation (Fig. 3 in [5]). (b) 1st-order Taylor expansion circuit. (c) 2nd-order Taylor expansion circuit. (d) 3rd-order Taylor expansion circuit (Fig. 3 in [7]).
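In this paper, addition is done by an OR gate instead of the prior-art circuits used in turbo decoding. If two independent stochastic streams, one with probability $P_1$ and the other with probability $P_2$, are passed through an OR gate, it is easy to show that the output stream has probability $P_1 + P_2 - P_1 P_2$. Consider the normalized error of this result over the real sum $P_1 + P_2$:
$$e = \frac{(P_1 + P_2) - (P_1 + P_2 - P_1 P_2)}{P_1 + P_2} = \frac{P_1 P_2}{P_1 + P_2}.$$
The error can be rewritten as
$$e = (P_1 + P_2)\,\frac{P_1/P_2}{(1 + P_1/P_2)^2},$$
so for $P_1 + P_2 \le 1$ it is bounded by a function of the ratio $P_1/P_2$ only. Fig. 5 plots the error versus $10\log_{10}(P_1/P_2)$.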
[Plot: normalized error versus difference of P1 and P2; y axis: normalized error, log scale from 10^-3 to 10^0; x axis: 10*log10(P1/P2), from -30 to 30]
Fig. 5. Normalized error analysis of OR gate result over real adding
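The error expression can be tabulated directly. In this minimal sketch, `or_output` and `normalized_error` are hypothetical helper names, the OR-gate formula assumes independent input streams, and $P_1 + P_2 = 1$ is chosen so that the error depends only on the ratio $r = P_1/P_2$, where it equals $r/(1+r)^2$.

```python
def or_output(p1, p2):
    """Probability of '1' at an OR gate fed by independent streams."""
    return p1 + p2 - p1 * p2

def normalized_error(p1, p2):
    """Relative error of the OR-gate result with respect to the real sum."""
    exact = p1 + p2
    return (exact - or_output(p1, p2)) / exact

# With p1 + p2 = 1 the error depends only on the ratio r = p1 / p2.
for db in (0, 10, 20, 30):
    r = 10 ** (db / 10)
    p1, p2 = r / (1 + r), 1 / (1 + r)
    print(f"10*log10(P1/P2) = {db:2d} dB -> normalized error {normalized_error(p1, p2):.4f}")
```

The printed error falls from 0.25 at 0 dB to about 0.001 at 30 dB, so a large ratio between the two inputs makes the OR gate a close approximation of addition.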
From Fig. 5, the larger the difference between $P_1$ and $P_2$, the smaller the error. This is the basic reason why an OR gate can be used for addition in turbo decoding: large differences occur with high probability among the metrics of one stage. Although only the two-stream case is analyzed, it extends easily to more streams, which is the situation of eight state metrics per stage in LTE turbo codes. Fig. 6 plots the Cumulative Distribution Function (CDF) of the ratio between the maximum and the minimum state metric per stage before normalization. The simulation conditions are the first iteration, code length 6144 bits and Eb/N0 = 0.6 dB.
[Plot: empirical CDF F(x), from 0 to 1, versus x = max(state metric)/min(state metric) in one stage before normalization, from 0 to 100; code length 6144 bits; Eb/N0 = 0.6 dB; first iteration]
Fig. 6. CDF of differences of state metrics in one stage before normalization
From Fig. 6, 70% of the ratios exceed 10, and 90% exceed 5. The residual error is limited by normalization and by code redundancy through the iteration process. Simulation results in section 4 show that the error has little impact on performance.
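For the multi-stream case, an $N$-input OR gate fed by independent streams outputs $1 - \prod_i (1 - P_i)$. The sketch below compares two hypothetical sets of eight state metrics with the same total: one flat, one dominated by a single metric. It illustrates the trend described for Fig. 5 (a dominant metric makes the OR result closer to the true sum); the function names and metric values are illustrative only.

```python
from math import prod

def or_sum(ps):
    """Output probability of a multi-input OR gate fed by independent streams."""
    return 1.0 - prod(1.0 - p for p in ps)

def relative_error(ps):
    """Relative error of the OR-gate result with respect to the true sum."""
    exact = sum(ps)
    return (exact - or_sum(ps)) / exact

# Two hypothetical sets of 8 state metrics with the same total (0.4):
flat = [0.05] * 8                        # all metrics equal
spread = [0.3] + [0.1 / 7] * 7           # one metric about 21 times the others
for name, ps in (("flat", flat), ("spread", spread)):
    print(f"{name:6s}: sum {sum(ps):.3f}, OR {or_sum(ps):.3f}, "
          f"relative error {relative_error(ps):.3f}")
```

The flat set gives a relative error of about 0.16 and the spread set about 0.08, so the error roughly halves when one metric dominates, even though the totals are equal.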
An OR gate is much simpler than the circuits in Fig. 3 (a) and Fig. 4. Because it is combinational logic, it has no warm-up delay, so it decreases the number of decoding cycles; in terms of decoding cycles it is clearly superior to schemes with registers.
3.2 Improvement method 2: direct inter streams permutation
To overcome the latching problem of JK flip-flop based division, [10] and [8] describe the EM memory method for a single bit stream and the barrel shifter method for multi-bit streams; see Fig. 7.
Fig. 7. (a) EM memory method for single bit stream (Fig. 6 (a) in [10]). (b) Barrel shifter method for multi bits stream (Fig. 5 (b) in [8]).
In [10] and [8], when the JK flip-flop is in the hold state, the output stream randomly selects a bit either from a history memory of newly generated bits (known as EM memory [10]) or from another stream. Unlike these schemes with selection logic, in this paper the multiple streams are permuted directly by a barrel shifter whose shift length steps every clock. Simulation shows that this works, because all streams have basically the same probability. Fig. 8 depicts the direct permutation method, which is simpler than the circuits in Fig. 7.
Fig. 8. Direct intra streams barrel shifter
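A behavioral sketch of parallel JK dividers whose states are permuted by a barrel shifter every clock. The text does not specify how Fig. 8 wires the shifter, so rotating the vector of flip-flop states by a shift that increases by one per clock is an assumption, as are the function name and parameters. The sketch only checks that the permutation preserves the output probability; the decorrelation benefit appears in downstream stages, which are not modeled here.

```python
import random

def jk_bank_with_rotation(p_j, p_k, width=32, cycles=5000, rotate=True, seed=9):
    """`width` parallel JK flip-flop dividers. After each clock the vector of
    flip-flop states is rotated by a shift that steps by one every clock
    (an assumed model of the direct barrel shifter). Returns the fraction of '1's."""
    rng = random.Random(seed)
    q = [0] * width
    ones = 0
    for t in range(cycles):
        for i in range(width):
            j = rng.random() < p_j
            k = rng.random() < p_k
            q[i] = int((j and not q[i]) or (not k and q[i]))   # Q+ = J*~Q + ~K*Q
        ones += sum(q)
        if rotate:
            shift = (t + 1) % width          # stepping shift length
            q = q[shift:] + q[:shift]        # barrel shift across lanes
    return ones / (width * cycles)

print(f"with rotation:    {jk_bank_with_rotation(0.3, 0.1):.4f}")
print(f"without rotation: {jk_bank_with_rotation(0.3, 0.1, rotate=False):.4f}")
print(f"exact:            {0.3 / (0.3 + 0.1):.4f}")
```

Since every lane sees statistically identical inputs, moving a held state to another lane does not change the marginal probability, which agrees with the observation that the direct permutation works because all streams have basically the same probability.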
3.3 Improvement method 3: random initialization of flip-flop
In traditional hardware design, flip-flops are always initialized with zeros. This is not the best option for stochastic computation, because stochastic computation relies on frequently changing bit events to achieve enough precision. With zero initialization, it takes many cycles to flush those zeros out of every stage. If flip-flops are initialized with random bits, the warm-up process and the overall decoding time are shortened.
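A toy comparison of zero and random initialization, using a bank of JK dividers as a stand-in for one decoding stage. This is an assumption for illustration; the real decoder chains many stages, and zeros must be flushed from each of them. The function name, the target probabilities and the 4-clock window are hypothetical.

```python
import random

def warmup_bias(init, p_j=0.3, p_k=0.1, width=1024, clocks=4, seed=2):
    """Average output probability of `width` JK dividers over the first `clocks`
    clock cycles, minus the steady-state value p_j / (p_j + p_k).
    init='zeros' clears every flip-flop; init='random' loads random bits."""
    rng = random.Random(seed)
    if init == "zeros":
        q = [0] * width
    else:
        q = [rng.getrandbits(1) for _ in range(width)]
    ones = 0
    for _ in range(clocks):
        for i in range(width):
            j = rng.random() < p_j
            k = rng.random() < p_k
            q[i] = int((j and not q[i]) or (not k and q[i]))   # Q+ = J*~Q + ~K*Q
        ones += sum(q)
    return ones / (width * clocks) - p_j / (p_j + p_k)

for init in ("zeros", "random"):
    print(f"{init:6s} init: bias over first 4 clocks {warmup_bias(init):+.3f}")
```

With these parameters the initial error shrinks by a factor of 0.6 per clock, so starting from random bits (error about 0.25) instead of zeros (error 0.75) leaves roughly a third of the early bias.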
4 Simulation results
A C-language LTE turbo code test bench is constructed to test the three improvement methods for stochastic decoding. The traditional Max-Log-MAP algorithm is also simulated as a baseline. The shortest information length of 40 bits is chosen for rapid verification. The code rate is 1/3, which results in 3*40+12=132 bits sent to the channel. For stochastic decoding, a 32-bit-wide stream is used. Fig. 9 plots the simulation results.
[Plot: LTE turbo decoding performance; information 40 bits; tail 12 bits; rate 1/3. BLER (log scale, 10^-3 to 10^0) versus Eb/N0 (dB), 2 to 3.5. Curves: 1. zeros init, 32-bit stream, 192 DCs; 2. zeros init, 32-bit stream, 256 DCs; 3. zeros init, 32-bit stream, 320 DCs; 4. random init, 32-bit stream, 192 DCs; 5. Max-Log-MAP, 8 iterations]
Fig. 9. Stochastic decoding BLER performance
Fig. 9 shows that the OR gate and the direct barrel shifter work correctly. Randomly initialized flip-flops (curve 4) need fewer decoding cycles than zero-initialized flip-flops (curves 1 to 3): random initialization with 192 decoding cycles (curve 4) performs better than zero initialization with 320 decoding cycles (curve 3). With the improvement methods, 192 decoding cycles (curve 4) achieve a gap of less than 0.25 dB to Max-Log-MAP decoding (curve 5), whereas 1024 decoding cycles are needed in [6] to achieve the same performance. The number of decoding cycles is thus much lower than in prior art.
5 Considerations in the SDR scenario
In the SDR scenario, implementing algorithms with a high degree of parallelism is key to utilizing processor resources. Data parallelism has become more and more important in parallel computation. There is much research on implementing the Max-Log-MAP algorithm in parallel form [9]. When Max-Log-MAP is implemented by several parallel sub-decoders [9], there are some limitations:
1. The number of sub-decoders cannot be too large, or decoding performance degrades because the code segment in each sub-decoder becomes too short. The degree of parallelism is therefore limited.
2. The first and last sub-decoders differ slightly from the others, so it is not an ideal data-parallel case. The ideal case is many identical decoders processing different data.
3. If the number of decoders (the degree of parallelism) changes, the parameters of the overall decoder and of the sub-decoders have to be changed accordingly.
Stochastic decoding does not have these drawbacks. Because of the native character of probability, if a quicker result is desired, more identical decoders/processes can be created to produce more random events per unit time, which is ideal data parallelism. If resources are limited, fewer decoders/streams can be used to collect enough random events over a longer time. Stochastic decoding therefore scales well: every parallel decoder keeps the same implementation no matter how the degree of parallelism changes.
6 Conclusion
In this paper, three improvement methods are proposed for stochastic decoding of LTE turbo codes: using an OR gate for addition, a direct intra-streams barrel shifter, and random initialization of flip-flops. Simulation results show that these methods decrease hardware complexity and increase time efficiency by more than five times (192 decoding cycles versus the 1024 needed in [6]). Stochastic decoding is also well suited to data parallelism in the Software Defined Radio scenario.
References
1. J. Vogt and A. Finger, "Improving the max-log-MAP turbo decoder," Electronics Letters, vol. 36, pp. 1937-1939, Nov. 2000.
2. P. Salmela, "Implementations of Baseband Functions for Digital Receivers," Doctoral Thesis, Tampere University of Technology, Aug. 2009.
3. S. Sharifi Tehrani, S. Mannor and W. J. Gross, "Survey of Stochastic Computation on Factor Graphs," ISMVL'07, pp. 54-54, 13-16 May 2007.
4. B. D. Brown and H. C. Card, "Stochastic Neural Computation I: Computational Elements," vol. 50, no. 9, pp. 891-905, Sept. 2001.
5. Q. T. Dong, M. Arzel, C. Jégo and W. J. Gross, "Stochastic Decoding of Turbo Codes," IEEE Transactions on Signal Processing, vol. 58, no. 12, pp. 6421-6425, Dec. 2010.
6. Q. T. Dong, M. Arzel and C. Jégo, "Design and FPGA Implementation of Stochastic Turbo Decoder," IEEE 9th NEWCAS, pp. 21-24, 26-29 June 2011.
7. C. L. Janer, J. M. Quero, J. G. Ortega and L. G. Franquelo, "Fully Parallel Stochastic Computation Architecture," IEEE Transactions on Signal Processing, vol. 44, no. 8, pp. 2110-2117, Aug. 1996.
8. M. Arzel, C. Lahuec, C. Jégo, W. J. Gross and Y. Bruned, "Stochastic Multiple Stream Decoding of Cortex Codes," IEEE Transactions on Signal Processing, vol. 59, no. 7, pp. 3486-3491, July 2011.
9. Y. Sun and J. R. Cavallaro, "Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advance turbo decoder," Integration, the VLSI Journal, vol. 44, pp. 305-315, Sept. 2011.
10. S. Sharifi Tehrani, S. Mannor and W. J. Gross, "Fully Parallel Stochastic LDPC Decoders," IEEE Transactions on Signal Processing, vol. 56, no. 11, pp. 5692-5703, Nov. 2008.