A Memory Efficient Belief Propagation Decoder for Polar ... - IEEE Xplore

COMMUNICATION IC

A Memory Efficient Belief Propagation Decoder for Polar Codes

Sha Jin1, Liu Xing1, Wang Zhongfeng2, Zeng Xiaoyang3
1 School of Electronic Science and Engineering, Nanjing University, Nanjing 210046, China
2 Broadcom Corp., 5300 California Avenue, Irvine, CA 92617, USA
3 State Key Laboratory of ASIC & System, Fudan University, Shanghai, China

Abstract: Polar codes have become increasingly popular recently because of their capacity-achieving property. In this paper, a memory-efficient stage-combined belief propagation (BP) decoder design for polar codes is presented. First, we briefly review the conventional BP decoding algorithm. Then a stage-combined BP decoding algorithm, which combines two adjacent stages into one, and the corresponding belief message updating rules are introduced. Based on this stage-combined decoding algorithm, a memory-efficient polar BP decoder is designed. The demonstrated decoder design achieves a 50% reduction in memory and decoding latency at the cost of some combinational logic complexity overhead. The proposed decoder is synthesized under TSMC 45nm Low Power CMOS technology. It achieves 0.96 Gb/s throughput with 14.2 mm² area when the code length is $N=2^{16}$, reducing the decoder area by 51.5% compared with the conventional decoder design.

Keywords: polar codes; belief propagation; stage-combined; memory-efficient; implementation

China Communications • May 2015

In this paper, a novel belief propagation (BP) decoder design for polar codes is presented. A new technique, stage-combination, is proposed to combine two adjacent stages into one, so that the number of stages and the memory requirement of the decoder are halved. By applying path searching and node merging, the corresponding belief message updating rules are developed.

I. INTRODUCTION

Polar codes, discovered by Arıkan [1], are a major breakthrough in coding theory. They are the first known family of error correction codes achieving the Shannon capacity for binary-input discrete memoryless channels. Thanks to their low coding complexity and capacity-achieving property [1], polar codes are a potential candidate for error correction in next-generation communication systems.

The two main decoding algorithms for polar codes are successive cancellation (SC) decoding and belief propagation (BP) decoding [2]. The SC algorithm decodes bits serially and has low decoding complexity, but it suffers from high decoding latency. Several works have been proposed to reduce the latency of SC decoding, for example [3]-[5]. A simplified successive cancellation (SSC) decoding algorithm is given in [3], which reduces the decoding latency by simplifying the decoding process of rate-one and rate-zero constituent codes. In [4], a pre-computation approach is exploited to reduce the latency of the SC decoder by pre-computing the two possibilities of future bit likelihoods. About 50% latency reduction can be achieved with this approach. In [5], an improved version of the SSC decoding algorithm, called maximum likelihood SSC (ML-SSC), is proposed to increase the throughput of the SC decoder. However, all of these still suffer from the inherent serial schedule of SC decoding.

Compared with the SC algorithm, the BP algorithm decodes much faster due to its inherent high parallelism, and it is more popular for decoder implementation. The authors of [6] proposed a different schedule for soft decoding of polar codes. The authors of [7] increased the efficiency of the polar BP decoder by optimizing the hardware architecture of the computational blocks, and they also studied early stopping criteria to reduce the number of iterations of the BP decoder [8]. By applying early stopping techniques, the energy dissipation and decoding latency of the polar BP decoder can be reduced. A flexible FPGA (Field Programmable Gate Array) implementation of a BP decoder for polar codes is presented in [9], which supports any code rate and multiple code lengths. The latest efficient implementation of a polar BP decoder is proposed in [10], which presents two kinds of architectures: the single-column architecture, which reduces the memory requirement by sharing the L and R memory, and the double-column architecture, which uses twice as many processing elements per clock cycle and further reduces the decoding latency.

In this paper, a memory-efficient stage-combined BP decoder is presented. Based on the factor graph of polar codes, a stage-combined BP decoding algorithm is introduced first. The conventional factor graph of polar BP decoding is based on the 2-left-port 2-right-port basic computational block (2x2 BCB). With the introduction of a 4-left-port 4-right-port basic computational block (4x4 BCB), two adjacent stages of the conventional BP decoder are combined into one stage. This stage-combined decoding scheme halves the number of stages and thus leads to a significant reduction in decoding latency and memory requirement. Simulation results show that the proposed algorithm performs slightly better than the conventional BP. Based on this new algorithm, a memory-efficient BP decoder is implemented to demonstrate its efficiency. The scale number selection and quantization scheme are analyzed in detail. Then the overall architecture of the proposed decoder and synthesis results are presented. Under 45nm CMOS technology, the proposed decoders are synthesized and compared with the conventional ones. The decoder area is significantly reduced and the area efficiency is improved.

The rest of this paper is organized as follows. Section II briefly reviews polar codes and BP decoding algorithm. The stage-combined decoding algorithm is introduced in Section III. Hardware architecture of the proposed decoder and synthesis results are given in Section IV. Some conclusions are drawn in Section V.

II. REVIEW OF BP DECODER

2.1 Polar codes

Polar codes can be defined with parameters $(N, K, A, u_{A^c})$ [1], where $N = 2^n$ is the codeword length, $K$ is the number of information bits, $A$ is the set of information bit indices, and $u_{A^c}$ represents the frozen bits. The process of polar coding consists of three steps. First, the source vector $u_1^N = (u_1, u_2, \ldots, u_{N-1}, u_N)$ is encoded into a codeword $x_1^N = (x_1, x_2, \ldots, x_{N-1}, x_N)$ as follows:

$$x_1^N = u_1^N G_N = u_1^N B_N F^{\otimes n}, \tag{1}$$

where $G_N$ is the generator matrix, $B_N$ is a bit-reversal permutation matrix, and $F^{\otimes n}$ is defined as follows:

$$F^{\otimes n} = F \otimes F^{\otimes (n-1)}, \tag{2}$$

where $F = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. Second, $x_1^N$ is sent over the channels $W_N$, and the channel output $y_1^N = (y_1, y_2, \ldots, y_{N-1}, y_N)$ is received. Finally, given the knowledge of the channel output $y_1^N$, $A$ and $u_{A^c}$, the decoder generates an estimate $\hat{u}_1^N$ of $u_1^N$.
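As a minimal sketch of Eq. (1) and Eq. (2), the following NumPy code builds the n-fold Kronecker power of F by the recursion in Eq. (2) and applies the bit-reversal permutation as an index reordering. The function names (`kron_power`, `bit_reversal`, `polar_encode`) are illustrative and do not come from the paper; choosing the information set and the frozen bit values is left to the caller.

```python
import numpy as np

F = np.array([[1, 0], [1, 1]], dtype=np.int64)  # kernel F used in Eq. (2)

def kron_power(n):
    """Return the n-fold Kronecker power of F via F^(n) = F (x) F^(n-1)."""
    G = np.array([[1]], dtype=np.int64)
    for _ in range(n):
        G = np.kron(F, G)
    return G

def bit_reversal(n):
    """Indices 0 .. 2^n - 1 with their n-bit binary representation reversed."""
    return np.array([int(format(i, "b").zfill(n)[::-1], 2) for i in range(1 << n)])

def polar_encode(u):
    """Encode u (length N = 2^n, entries 0/1) as x = u B_N F^(n) over GF(2)."""
    u = np.asarray(u, dtype=np.int64)
    n = len(u).bit_length() - 1
    assert len(u) == 1 << n, "code length must be a power of two"
    permuted = u[bit_reversal(n)]          # row vector u times B_N
    return (permuted @ kron_power(n)) % 2  # arithmetic modulo 2

print(polar_encode([0, 1, 0, 0]))
```

Applying the permutation by indexing avoids building the N x N matrix for the bit reversal; the dense Kronecker power is kept only because it mirrors Eq. (2) directly and is practical for small N.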

2.2 The conventional BP decoding algorithm

The process of polar coding (encoding and decoding) can be represented by a factor graph [1] using Eq. (1). Fig.1 shows the factor graph of polar codes with $N=8$. As shown in Fig.1, the factor graph is divided into $n=\log_2 N$ stages. Each stage consists of $N/2$ 2-left-port 2-right-port basic computational blocks (2x2 BCBs). Fig.2 shows the factor graph of the 2x2 BCB. The 2x2 BCB has four ports, whose nodes are labeled with integer pairs $(i,j)$, $0 \le i \le n-1$, $1 \le j \le N/2$. Here the parameter $i$ indicates the stage number and the parameter $j$ indicates the node number in the column.

Similar to LDPC codes, BP decoding of polar codes is the process of passing messages iteratively through the factor graph. In the decoder, each node is associated with two types of messages: right-to-left messages $L$ and left-to-right messages $R$. Each message is represented by a log-likelihood ratio (LLR) and is initialized as follows. The leftmost source vector nodes in Fig.1 are initialized with zero or positive infinity according to Eq. (4), and the rightmost codeword nodes are initialized with the channel output LLRs according to Eq. (5). All other nodes are initialized with zero.

$$R_{1,j} = \begin{cases} 0 & \text{if } j \in A \\ \infty & \text{if } j \in A^c \end{cases} \tag{4}$$

$$L_{n+1,j} = \ln \frac{P(y_j \mid x_j = 0)}{P(y_j \mid x_j = 1)} \tag{5}$$

In BP decoding, messages are passed iteratively from left to right and then from right to left through the factor graph. Each 2x2 BCB computes the L and R messages as follows:

$$\begin{aligned}
L_{i,j} &= g(L_{i+1,2j-1},\; L_{i+1,2j} + R_{i,j+N/2}) \\
L_{i,j+N/2} &= g(R_{i,j},\; L_{i+1,2j-1}) + L_{i+1,2j} \\
R_{i+1,2j-1} &= g(R_{i,j},\; L_{i+1,2j} + R_{i,j+N/2}) \\
R_{i+1,2j} &= g(R_{i,j},\; L_{i+1,2j-1}) + R_{i,j+N/2}
\end{aligned} \tag{6}$$

where $g(x,y) = \ln\cosh((x+y)/2) - \ln\cosh((x-y)/2)$, which can be simplified [11-13] as

$$g(x,y) \approx 0.9 \cdot \mathrm{sign}(x)\,\mathrm{sign}(y)\min(|x|,|y|). \tag{7}$$

After $t$ iterations, $\hat{u}_1^N$ can be decided at the leftmost nodes as follows:

$$\hat{u}_j = \begin{cases} 0 & \text{if } R_{1,j} \ge 0 \\ 1 & \text{otherwise} \end{cases} \tag{8}$$
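To make Eq. (6) and Eq. (7) concrete, here is a small sketch of the exact function g, its scaled min-sum approximation, and one update of a single 2x2 BCB. The function names and the port names a, b, c, d are illustrative assumptions, not notation from the paper; only the formulas follow the text.

```python
import math

def g_exact(x, y):
    """Exact g(x, y) = ln cosh((x + y) / 2) - ln cosh((x - y) / 2); finite inputs only."""
    return math.log(math.cosh((x + y) / 2)) - math.log(math.cosh((x - y) / 2))

def g_minsum(x, y, scale=0.9):
    """Scaled min-sum approximation of g from Eq. (7)."""
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    return scale * sign * min(abs(x), abs(y))

def bcb_2x2(R_a, R_b, L_c, L_d, g=g_minsum):
    """One update of a 2x2 BCB following Eq. (6).

    Ports: a = (i, j), b = (i, j + N/2), c = (i + 1, 2j - 1), d = (i + 1, 2j).
    Returns the outgoing messages (L_a, L_b, R_c, R_d).
    """
    L_a = g(L_c, L_d + R_b)
    L_b = g(R_a, L_c) + L_d
    R_c = g(R_a, L_d + R_b)
    R_d = g(R_a, L_c) + R_b
    return L_a, L_b, R_c, R_d

# A frozen left port (R_a = +inf) next to an information port (R_b = 0).
print(bcb_2x2(math.inf, 0.0, 2.0, -3.0))
print(g_exact(1.0, 2.0), g_minsum(1.0, 2.0))
```

The min-sum form accepts the infinite LLR of a frozen node from Eq. (4) without special handling, whereas the exact form overflows for large arguments, which is one practical reason to prefer Eq. (7).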

Fig.1 Factor graph of the polar codes with N=8 (Stage 1 to Stage 3; left nodes from top to bottom: u1, u5, u3, u7, u2, u6, u4, u8; right nodes: x1 to x8)

Fig.2 Factor graph of 2x2 BCB (left ports (i, j) and (i, j+N/2); right ports (i+1, 2j-1) and (i+1, 2j); message labels R and L)

Fig.3 Factor graph of 4x4 BCB (left ports (i, j), (i, j+N/4), (i, j+N/2), (i, j+3N/4); right ports (i+1, 4j-3), (i+1, 4j-2), (i+1, 4j-1), (i+1, 4j))

III. THE STAGE-COMBINED BP DECODING ALGORITHM

In this section, the stage-combined BP decoding algorithm is introduced, and some simulation results are given to show its performance.

3.1 4x4 BCB

When two adjacent stages in the factor graph are combined into one stage, a 4x4 BCB is formed. Its factor graph is shown in Fig.3 [12][13]. It is constructed from four 2x2 BCBs. As shown in Fig.3, the 4x4 BCB contains 8 node ports. The four left nodes are $a(i, j)$, $b(i, j+N/4)$, $c(i, j+N/2)$, $d(i, j+3N/4)$, and the four right nodes are $e(i+1, 4j-3)$, $f(i+1, 4j-2)$, $g(i+1, 4j-1)$, $h(i+1, 4j)$, with $0 \le i \le n/2-1$ and $1 \le j \le N/4$. From this factor graph, the L messages transmitted from right to left for nodes a, b, c, d can be computed as follows:

$$\begin{aligned}
L_a &= g(L_e,\; g(R_b, L_g) + g(L_f, R_c) + g(R_d + L_h,\; g(R_b, R_c) + g(L_f, L_g))) \\
L_b &= g(L_f,\; R_d + L_h + g(R_c, L_g)) + g(L_e,\; g(R_a,\; L_g + g(R_c, R_d + L_h))) \\
L_c &= g(L_g,\; R_d + L_h + g(R_b, L_f)) + g(L_e,\; g(R_a,\; L_f + g(R_b, R_d + L_h))) \\
L_d &= L_h + g(R_c, L_g) + g(R_b, L_f) + g(L_e,\; g(R_a,\; g(R_b, R_c) + g(L_f, L_g)))
\end{aligned} \tag{9}$$

Due to symmetry, the R messages transmitted from left to right have the same computational forms:

$$\begin{aligned}
R_e &= g(R_a,\; g(R_b, L_g) + g(L_f, R_c) + g(R_d + L_h,\; g(R_b, R_c) + g(L_f, L_g))) \\
R_f &= g(R_b,\; R_d + L_h + g(R_c, L_g)) + g(R_a,\; g(L_e,\; R_c + g(L_g, R_d + L_h))) \\
R_g &= g(R_c,\; R_d + L_h + g(R_b, L_f)) + g(R_a,\; g(L_e,\; R_b + g(L_f, R_d + L_h))) \\
R_h &= R_d + g(R_c, L_g) + g(R_b, L_f) + g(R_a,\; g(L_e,\; g(R_b, R_c) + g(L_f, L_g)))
\end{aligned} \tag{10}$$
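The right-to-left update of Eq. (9) can be sketched in a few lines and checked on noiseless inputs. The parity relations used in the check (e = a XOR b XOR c XOR d, f = b XOR d, g = c XOR d, h = d) are our reading of two cascaded 2x2 BCB stages and are not stated explicitly in the text; the function names and the LLR magnitude of 10 are likewise illustrative assumptions.

```python
import itertools
import math

def g(x, y, scale=0.9):
    """Scaled min-sum g from Eq. (7)."""
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)
    return scale * sign * min(abs(x), abs(y))

def bcb4x4_left(Ra, Rb, Rc, Rd, Le, Lf, Lg, Lh):
    """Eq. (9): messages (L_a, L_b, L_c, L_d) sent to the left ports of a 4x4 BCB."""
    dh = Rd + Lh                  # term R_d + L_h shared by several expressions
    bc = g(Rb, Rc) + g(Lf, Lg)    # term g(R_b, R_c) + g(L_f, L_g) shared by L_a and L_d
    La = g(Le, g(Rb, Lg) + g(Lf, Rc) + g(dh, bc))
    Lb = g(Lf, dh + g(Rc, Lg)) + g(Le, g(Ra, Lg + g(Rc, dh)))
    Lc = g(Lg, dh + g(Rb, Lf)) + g(Le, g(Ra, Lf + g(Rb, dh)))
    Ld = Lh + g(Rc, Lg) + g(Rb, Lf) + g(Le, g(Ra, bc))
    return La, Lb, Lc, Ld

def llr(bit, mag=10.0):
    """Noiseless LLR: positive for bit 0, negative for bit 1, as in Eq. (5)."""
    return mag if bit == 0 else -mag

def check_hard_decisions():
    """With no prior (all R = 0), the signs of L_a..L_d should recover a..d."""
    for a, b, c, d in itertools.product((0, 1), repeat=4):
        e, f, g_bit, h = a ^ b ^ c ^ d, b ^ d, c ^ d, d  # assumed parity relations
        L = bcb4x4_left(0.0, 0.0, 0.0, 0.0, llr(e), llr(f), llr(g_bit), llr(h))
        if tuple(0 if v >= 0 else 1 for v in L) != (a, b, c, d):
            return False
    return True

print(check_hard_decisions())
```

Factoring out the two shared terms mirrors the reuse of identical operations that a hardware PE can exploit.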

3.2 Factor graph based on 4x4 BCB

By replacing the original 2x2 BCB with the new 4x4 BCB, a new factor graph can be derived as shown in Fig.4, where the code length is $N=16$. The conventional factor graph based on 2x2 BCBs contains $n=\log_2 N$ stages. With the introduction of the 4x4 BCB and stage combination, the number of stages is reduced to $n/2=\log_4 N$. The decoding schedule of the stage-combined BP decoding algorithm remains the same as the conventional one. Since one clock cycle is allocated to each stage during the decoding procedure, the conventional BP decoding costs $2(\log_2 N - 1)$ clock cycles per iteration. By applying the 4x4 BCB, the decoding latency is reduced to $(\log_2 N - 2)$ clock cycles per iteration, so the number of clock cycles is roughly halved. In addition, since $N$ messages in each stage need to be stored, the memory requirement is reduced by half as well. Therefore, the stage-combined BP algorithm not only reduces the decoding latency but also lowers the memory requirement.

Fig.4 Factor graph of polar codes based on 4x4 BCBs, N=16 (Stage 1 and Stage 2, each with four 4x4 BCBs; left nodes from top to bottom: u1, u9, u5, u13, u3, u11, u7, u15, u2, u10, u6, u14, u4, u12, u8, u16; right nodes: x1 to x16)
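The two latency expressions can be compared numerically with a short sketch. It assumes that n is even and that both expressions follow the same pattern of 2(stages - 1) cycles per iteration, which is our reading of the two formulas in the text; the function name is illustrative.

```python
def cycles_per_iteration(n, combined):
    """Clock cycles per BP iteration for N = 2**n, one clock cycle per stage.

    Conventional: n stages, 2 * (n - 1) cycles.
    Stage-combined: n / 2 stages, 2 * (n / 2 - 1) = n - 2 cycles (n assumed even).
    """
    stages = n // 2 if combined else n
    return 2 * (stages - 1)

for n in (10, 12, 14, 16, 18):
    conv = cycles_per_iteration(n, combined=False)
    prop = cycles_per_iteration(n, combined=True)
    print(f"N=2^{n}: {conv} -> {prop} cycles per iteration, saving {1 - prop / conv:.1%}")
```

Because (n - 2) is slightly less than half of 2(n - 1), the saving in cycle count is a little above 50% and approaches 50% as n grows.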

3.3 Simulation results

In order to evaluate the error correction performance of the proposed decoding algorithm, Monte Carlo simulation results over the AWGN channel are provided in Fig.5. The code length is 1024 and the iteration number is set to 15. As shown in Fig.5, the proposed decoding algorithm performs similarly to the conventional one in the low SNR region, and it even outperforms the conventional algorithm in the high SNR region. This is because the conventional BP algorithm is not optimal, and the proposed stage-combined algorithm gives more accurate belief computations.

Fig.5 Performance comparison of proposed BP decoding and conventional BP decoding for polar codes (error rates versus Eb/No in dB; curves: BER proposed, FER proposed, BER conventional, FER conventional)

IV. IMPLEMENTATION

In this section, the stage-combined BP algorithm is implemented. The scale number and quantization scheme are analyzed first, then the detailed architecture of the proposed decoder is presented. Finally, a comparison with the conventional BP decoder is given.

4.1 Scale number and quantization selection

For better hardware implementation, the scale number 0.9 in Eq. (7) needs to be adjusted. Here, the hardware-friendly numbers (1-1/8) and (1-1/16) are selected as two candidates for the scale number. As shown in Fig.6, decoding with scale numbers 0.875 and 0.9375 performs similarly to the original number 0.9 in the high SNR region. In addition, the performance of scale number 0.875 is much better than that of the other two in the low SNR region. Therefore, 0.875 is selected as the scale number for hardware implementation.

After the scale number is decided, it is important to choose an efficient quantization scheme for a better tradeoff between hardware consumption and decoding performance. LLRs in the decoding process are expressed in sign-magnitude form, with one sign bit and several magnitude bits. Fig.7 shows the comparison of different quantization schemes. As shown in the figure, the 6-bit quantization scheme has approximately 0.2 dB decoding degradation relative to the unquantized scheme. The degradations of 7-bit and 8-bit quantization are negligible. Therefore, 7 bits is a good word length choice for hardware implementation. All these simulations are based on uniform quantization. In order to further shorten the word length, non-uniform quantization could also be explored.
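The scale number and quantization choices of Section 4.1 can be illustrated with a short sketch. The quantization step of 0.5 and the function names are hypothetical, since the text does not state the step size; the subtraction of a shifted copy is one possible way to realize a factor of the form (1 - 1/2^k), not necessarily the circuit used by the authors.

```python
def quantize_sign_magnitude(llr, bits=7, step=0.5):
    """Uniformly quantize an LLR to (sign, magnitude) with bits - 1 magnitude bits.

    step is a hypothetical quantization step; large values saturate.
    """
    max_mag = (1 << (bits - 1)) - 1              # 63 for a 7-bit word
    mag = min(int(round(abs(llr) / step)), max_mag)
    sign = 1 if llr < 0 else 0
    return sign, mag

def scale_shift(mag, k):
    """Approximate mag * (1 - 1/2**k) on a non-negative integer: mag - (mag >> k)."""
    return mag - (mag >> k)

print(quantize_sign_magnitude(-3.2))            # a small negative LLR
print(quantize_sign_magnitude(100.0))           # saturates at the largest magnitude
print(scale_shift(16, 3), scale_shift(16, 4))   # factors 0.875 and 0.9375
```

Both candidate scale numbers need only one shift and one subtraction per magnitude, which is what makes them hardware friendly compared with a general multiplication by 0.9.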

Fig.6 Performance comparison of different scale numbers (BER versus Eb/N0 in dB; curves: 0.875, 0.9375, 0.9)

Fig.7 Performance comparison of different quantization schemes (BER versus Eb/No in dB; curves: 6 bits, 7 bits, 8 bits, float)

4.2 Hardware architecture

The overall architecture of the stage-combined BP decoder is shown in Fig.8, where the code length is $2^{10}$. The top-level architecture includes six modules. The processing element (PE) array computes the L and R messages with 256 4x4 BCBs. The data right and data left modules are two multiplexers that select the inputs for the processing elements. The two routers perform message permutations between stages. Since the factor graph of 1024-bit polar codes based on the 4x4 BCB contains 5 stages, the message memory requirement is 4 × 7 × 1024 bits = 28 Kb.

Fig.8 Top-level architecture of the proposed decoder, N=1024 (blocks: LLR and 0/1 inputs, data left multiplexer, PE array with 256 4x4 BCBs, data right multiplexer, router 1, router 2, register file with 4096 7-bit registers, out)

The PE module design, which updates the L or R messages in a 4x4 BCB, is shown in Fig.9. The computations it handles are Eq. (9) and Eq. (10). There are only two types of basic operations in this module: the Type I "g" operation and the Type II "+" operation. The detailed architectures of the four independent outputs are shown in Fig.9. The label i (j, k) in each processing unit indicates the i-th g/+ operation with inputs j and k. The blocks with red edges are operations with the same inputs, so they can be reused. Therefore, 16 "g" operation units and 13 "+" operation units are instantiated in the PE. The critical path is illustrated with the dashed blue line in Fig.9 (b); it includes three "+" operations and two "g" operations.

The detailed architectures of the "g" and "+" operations are shown in Fig.10. Since the LLRs in the decoder are stored in sign-magnitude form, they need to be converted to 2's complement format before they are added, and they go through an inverse conversion after the adder. The Type I module implements Eq. (7) with scale number 0.875. The scale unit in Fig.10 (a) is implemented with a simple shifter and adder.
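A behavioral sketch of the two operation types on sign-magnitude words may help in following the description above. Each LLR is modeled as a (sign, magnitude) pair with 6 magnitude bits; the saturation in `c2s` and all function names are assumptions of this sketch, since the text does not describe overflow handling.

```python
def type1_g(x, y):
    """Type I: xor of signs, compare-select of magnitudes, then scale by 0.875."""
    sign = x[0] ^ y[0]
    mag = min(x[1], y[1])
    return sign, mag - (mag >> 3)     # shift and subtract approximates 0.875 * mag

def s2c(word):
    """Sign-magnitude pair to a signed integer (2's complement value)."""
    sign, mag = word
    return -mag if sign else mag

def c2s(value, bits=7):
    """Signed integer back to sign-magnitude, saturating to bits - 1 magnitude bits."""
    max_mag = (1 << (bits - 1)) - 1
    return (1 if value < 0 else 0), min(abs(value), max_mag)

def type2_add(x, y, bits=7):
    """Type II: convert both inputs, add, and convert the sum back."""
    return c2s(s2c(x) + s2c(y), bits)

print(type1_g((1, 16), (0, 40)))
print(type2_add((0, 5), (1, 9)))
print(type2_add((0, 60), (0, 10)))
```

Keeping messages in sign-magnitude form makes the Type I unit trivial (one xor and one comparator), at the price of two format conversions around every Type II adder.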

4.3 Synthesis results

To demonstrate the efficiency of the proposed stage-combined BP decoder architecture, several polar decoder examples are designed and synthesized under TSMC 45nm Low Power CMOS technology. The conventional polar BP decoder architecture was proposed in [2] and also presented in [10] as the single-column architecture. It is also designed and synthesized under the same technology to give a fair comparison. The design is partially parallel, and the number of PEs (4x4 BCBs) is set to 256, which means 1024 nodes are processed concurrently. For the conventional design, this number is 512 because the conventional PE is based on the 2x2 BCB. PEs are reused during decoding to reduce the combinational area.

Table I shows the comparison results under different code lengths. Due to the PE reuse, the combinational area remains the same across code lengths. The sequential area, which is mainly used for message memory storage, becomes the dominant part of the decoder as the code length increases. Since the number of stages is halved by stage combination, the message memory of the proposed decoder is reduced by about 50%. Compared to the conventional design, the proposed decoder reduces the total decoder area by 18.4%, 45.5%, 51.5%, and 52.7% when the code length is $2^{12}$, $2^{14}$, $2^{16}$, and $2^{18}$, respectively. The critical path of the conventional design consists of one "g" and one "+" operation, so the critical path of the proposed decoder increases from 2.05ns to 5.07ns. In [10] the authors proposed a compact register file design technique to decrease the message memory area, with which the critical path is dominated by the memory access time. The same technique can be applied here, and the gap in critical path would be further reduced. In general, the proposed decoder achieves 50% memory and 50% decoding latency reduction at the cost of some PE complexity overhead.
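The area reductions quoted above can be recomputed from the total area column of Table I, and the throughput column can be approximated with a simple model. The model is an assumption of this sketch rather than a formula given by the authors: it takes 15 iterations (the value used in Section 3.3), the critical paths 5.07ns and 2.05ns as clock periods, and N/1024 clock cycles per stage because 1024 nodes are processed concurrently.

```python
# Table I: exponent n of the code length 2**n -> (proposed, conventional)
TOTAL_AREA_MM2 = {10: (0.747, 0.446), 12: (1.231, 1.509), 14: (3.53, 6.477),
                  16: (14.194, 29.293), 18: (62.578, 132.269)}
THROUGHPUT_MBPS = {10: (1683, 1850), 12: (1346, 1514), 14: (1122, 1280),
                   16: (962, 1110), 18: (842, 979)}

def area_reduction(n):
    """Relative total-area saving of the proposed decoder for N = 2**n."""
    proposed, conventional = TOTAL_AREA_MM2[n]
    return 1 - proposed / conventional

def throughput_mbps(n, combined, iterations=15, nodes_per_cycle=1024):
    """Assumed model: N bits per (cycles per iteration * iterations * clock period)."""
    t_clk = 5.07e-9 if combined else 2.05e-9      # critical paths from the text
    stages = n // 2 if combined else n
    cycles = 2 * (stages - 1) * ((1 << n) // nodes_per_cycle)
    return (1 << n) / (cycles * iterations * t_clk) / 1e6

for n in (12, 14, 16, 18):
    print(f"N=2^{n}: area reduction {area_reduction(n):.1%}")
for n in sorted(THROUGHPUT_MBPS):
    print(n, round(throughput_mbps(n, True)), round(throughput_mbps(n, False)))
```

Under these assumptions the model lands within about 1 Mb/s of every throughput entry in Table I, which suggests that the lower throughput of the proposed design comes entirely from its longer critical path.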

Fig.9 The detailed architecture of 4x4 BCB, panels (a) to (d) for outputs out1 to out4 (inputs in1 to in8; Type I units g1 to g16 and Type II units +1 to +13; critical path marked in panel (b))

Fig.10 The detailed architectures of Type I and Type II modules in 4x4 BCB: (a) Type I (inputs sign(x), sign(y), mag(x), mag(y); blocks: xor, comp, select, scale 0.875; outputs sign, mag); (b) Type II (inputs sign(x), mag(x), sign(y), mag(y); blocks: s2c, adder, c2s; outputs sign, mag)

Table I Comparison of proposed polar decoder with conventional polar decoder over different code lengths

Code length | Sequential area [mm²]   | Combinational area [mm²] | Total area [mm²]        | Throughput [Mb/s]
            | proposed | conventional | proposed | conventional  | proposed | conventional | proposed | conventional
2^10        | 0.121    | 0.273        | 0.626    | 0.173         | 0.747    | 0.446        | 1683     | 1850
2^12        | 0.605    | 1.336        | 0.626    | 0.173         | 1.231    | 1.509        | 1346     | 1514
2^14        | 2.904    | 6.304        | 0.626    | 0.173         | 3.53     | 6.477        | 1122     | 1280
2^16        | 13.568   | 29.12        | 0.626    | 0.173         | 14.194   | 29.293       | 962      | 1110
2^18        | 61.952   | 132.096      | 0.626    | 0.173         | 62.578   | 132.269      | 842      | 979

V. CONCLUSION

This paper presented a memory-efficient stage-combined BP decoder design for polar codes. Based on the factor graph of polar codes, a stage-combined BP algorithm that halves the number of stages was introduced first. Simulation results show that this decoding algorithm performs slightly better than the conventional BP. Hardware design results show that the proposed decoder achieves a 50% memory reduction over the conventional design and significantly reduces the decoder area as the code length increases. In general, the proposed decoder achieves 50% memory and 50% decoding latency reduction at the cost of some PE complexity overhead.

ACKNOWLEDGEMENTS

This work was jointly supported by the National Natural Science Foundation of China under Grants No. 61370040 and 61006018, the Fundamental Research Funds for the Central Universities, the Priority Academic Program Development of Jiangsu Higher Education Institutions, and the Open Project of the State Key Laboratory of ASIC & System (Fudan University) 12KF006.

References

[1] E. Arıkan, "Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. on Inf. Theory, vol. 55, no. 7, pp. 3051-3073, July 2009.
[2] E. Arıkan, "A performance comparison of polar codes and Reed-Muller codes," IEEE Commun. Lett., vol. 12, no. 6, pp. 447-449, June 2008.
[3] A. A.-Yazdi and F. R. Kschischang, "A simplified successive cancellation decoder for polar codes," IEEE Commun. Lett., vol. 15, no. 12, pp. 1378-1380, December 2011.
[4] C. Zhang and K. K. Parhi, "Low-latency sequential and overlapped architectures for successive cancellation polar decoder," IEEE Trans. on Signal Proc., vol. 61, pp. 2429-2441, 2013.
[5] G. Sarkis and W. J. Gross, "Increasing the throughput of polar decoders," IEEE Commun. Lett., vol. 17, no. 4, pp. 725-728, April 2013.
[6] "Low-complexity soft-output decoding of polar codes," IEEE JSAC, vol. 32, no. 5, pp. 958-966, May 2014.
[7] B. Yuan and K. K. Parhi, "Architecture optimizations for BP polar decoders," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2654-2658, May 2013.
[8] B. Yuan and K. K. Parhi, "Early stopping criteria for energy-efficient low-latency belief-propagation polar code decoders," IEEE Trans. on Signal Proc., vol. 62, no. 24, pp. 6496-6506, Dec. 2014.
[9] A. Pamuk, "An FPGA implementation architecture for decoding of polar codes," in Proc. 8th International Symposium on Wireless Communication Systems (ISWCS), pp. 437-441, Nov. 2011.
[10] Y. Park, Y. Tao, S. Sun, and Z. Zhang, "A 4.68 Gb/s belief propagation polar decoder with bit-splitting register file," in IEEE Symposium on VLSI Circuits, June 2014.
[11] M. P. C. Fossorier, M. Mihaljevic, and H. Imai, "Reduced complexity iterative decoding of low-density parity check codes based on belief propagation," IEEE Trans. on Commun., vol. 47, no. 5, pp. 673-680, May 1999.
[12] F. R. Kschischang, B. J. Frey, and H. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. on Inf. Theory, vol. 47, no. 2, pp. 498-519, Feb. 2001.
[13] G. D. Forney Jr., "Codes on graphs: normal realizations," IEEE Trans. on Inf. Theory, vol. 47, no. 2, pp. 520-548, Feb. 2001.

Biographies

Sha Jin received the B.S. degree in physics in 2002 and the Ph.D. degree in microelectronics in 2007, both from Nanjing University, Nanjing, China. From 2007 to 2008, he worked for OmniVision Technologies, Inc. as an ASIC design engineer. Since 2008, he has been with the School of EE at Nanjing University, China, as an associate professor. His research is focused on VLSI architectures and integrated circuit (IC) design for communications, coding theory applications, and image signal processing.

Liu Xing received the B.S. degree in physics from Nanjing University, Nanjing, China, in 2012. He is currently working toward the M.S. degree at the School of EE, Nanjing University. His research interests are in the design of efficient algorithms and implementations for decoding error-correcting codes, in particular polar codes.

Wang Zhongfeng received the B.E. and M.S. degrees in automation from Tsinghua University, Beijing, China, in 1988 and 1990, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, in 2000. He is now a Senior Principal Scientist with Broadcom Corporation, CA, USA. His current research interests are in the area of VLSI design for digital communication systems.

Zeng Xiaoyang received the B.S. degree from Xiangtan University, Xiangtan, China, in 1992, and the Ph.D. degree from Changchun Institute of Optics, Fine Mechanics, and Physics, Chinese Academy of Sciences, Changchun, China, in 2001. In 2003, he joined the State Key Lab of ASIC and System, Fudan University, as an Associate Professor, where he is currently a Full Professor and the Director. His research interests include information security chip design, system-on-chip platforms, and VLSI implementation of digital signal processing and communication systems.

