Machine Intelligence in Forward Error Correction Decoding
Supervisors: Dr. Hugo Tullberg (Ericsson Research), Prof. Ragnar Thobaben (KTH)
Navneet Agrawal
Master Thesis presentation
Acknowledgment

Dr. Hugo Tullberg, Supervisor
Maria Edvardsson, Manager
The entire Ericsson Research team
Special mention: Mathias Andersson, Nicolas Seyvet, Vidit Saxena, and Maria Fresia
Contents

Background: Channel coding, Factor graphs, Sum-Product Algorithm
Neural Network Decoder: Design and Analysis
Experiments and Results
Conclusions and Future Work
Background

Communication System
Introduction to Coding Theory
Factor graphs
Decoding using the Sum-Product Algorithm
Background
Communication System
Communication System Model

Source:      b ∈ {0, 1}^k
Encoder:     s = b^T ⊗ G,  s ∈ C,  G ∈ {0, 1}^{k×n}
Modulator:   y_i = (−1)^{s_i}
Channel:     r = y + n,  n_i ∼ N(0, σ_n²)
Demodulator: p(r_i | y_i)
Decoder:     ŷ = argmax_{y : s ∈ C} p(r | y)
Sink:        b̂ = ŝ ⊗ G^{−1}
Background
Channel Coding Theory
Channel Coding Basics

Channel coding adds redundancy in order to recover information lost during transmission.
A linear block code C(n, k) forms n linear combinations of k information bits, where n > k.
The rate of a code is the amount of information per transmitted bit, r = k/n.
The Hamming distance d_H between two codewords u, v ∈ C is the number of positions at which u differs from v.
The minimum distance d_min is the smallest Hamming distance between any two codewords u, v ∈ C, u ≠ v.
The maximum number of errors that a code C(n, k) can correct is t ≤ ⌊(d_min − 1)/2⌋.
The information bits are encoded by taking the product with the generator matrix G of size [k × n]: y = b^T ⊗ G.
The rows of the parity check matrix H (size [n − k, n]) form a basis for the dual code C⊥ = {v ∈ GF(2)^n : y · v^T = 0, ∀ y ∈ C}, so that H ⊗ y^T = 0^T, ∀ y ∈ C.
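To make these definitions concrete, here is a minimal sketch (not the thesis code) of encoding with a generator matrix over GF(2), checking the parity-check condition and computing a Hamming distance. The systematic (7,4) Hamming pair G, H below is only an illustrative example.

```python
import numpy as np

# Illustrative systematic (7,4) Hamming code: G = [I_k | P], H = [P^T | I_{n-k}],
# so that G H^T = 0 over GF(2). Any valid generator / parity-check pair works.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=int)
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=int)

def encode(b, G):
    """Encode k information bits b into an n-bit codeword over GF(2)."""
    return b.dot(G) % 2

def hamming_distance(u, v):
    """Number of positions at which two codewords differ."""
    return int(np.sum(u != v))

b = np.array([1, 0, 1, 1])
y = encode(b, G)
assert not np.any(H.dot(y) % 2)          # every codeword satisfies H y^T = 0
print("codeword:", y, "rate:", float(G.shape[0]) / G.shape[1])
print("d_H to all-zero codeword:", hamming_distance(y, np.zeros(7, dtype=int)))
```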
Background
Channel Coding Theory
Maximum Likelihood Decoding

The Maximum Likelihood (ML) decoding solution for bit y_i is given by

  ŷ_i^ML = argmax_{y_i : s^T ∈ C}  ∑_{∼y_i} ( ∏_{j=1}^{n} p(r_j | y_j) ) 1_{s ∈ C},

where 1_{f} is the code membership function, which is 1 if f is true and 0 otherwise.

The ML decoding problem is NP-hard, with complexity increasing exponentially with the length of the code: k_c · 2^{min(n−k, k)}.

The code membership function 1_{s ∈ C} has a factorized form,

  1_{s ∈ C} = ∏_{k=1}^{K} 1_{∑ s_k = 0},

where s_k denotes the subset of bits of s which must satisfy the parity check ∑ s_k = 0.
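As a sketch of this rule (and of why the exponential complexity matters), the following brute-force decoder scores every one of the 2^k codewords under an AWGN likelihood with BPSK mapping y = (−1)^s; the toy (6,3) generator matrix and noise level are illustrative only.

```python
import numpy as np
from itertools import product

def ml_decode(r, G, sigma):
    """Exhaustive ML decoding: evaluate p(r|y) for every codeword s = bG and
    return the best one. Complexity is 2^k, intractable for long codes."""
    k, n = G.shape
    best, best_score = None, -np.inf
    for bits in product([0, 1], repeat=k):
        s = np.array(bits).dot(G) % 2             # candidate codeword
        y = (-1.0) ** s                           # BPSK mapping
        log_lik = -np.sum((r - y) ** 2) / (2 * sigma ** 2)
        if log_lik > best_score:
            best, best_score = s, log_lik
    return best

# Toy (6,3) code, for illustration only.
G = np.array([[1, 0, 0, 1, 1, 0],
              [0, 1, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1]], dtype=int)
sigma = 0.7
s_true = np.array([1, 0, 1]).dot(G) % 2
r = (-1.0) ** s_true + sigma * np.random.randn(6)
print("transmitted:", s_true, "decoded:", ml_decode(r, G, sigma))
```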
Background
Factor Graphs
Factor graphs

A systematic method to apply the distributive law (ab + ac = a(b + c)).

Example: consider a function f with factorization

  f(x1, x2, x3, x4, x5) = f1(x1, x5) f2(x1, x4) f3(x2, x3, x4) f4(x4),

where each variable xi ∈ {a1, . . . , aq}, aj ∈ R. The marginal f(x1) can be computed by summing out the other variables:

  f(x1) = ∑_{∼x1} f1(x1, x5) f2(x1, x4) f3(x2, x3, x4) f4(x4)                              (marginal of products)
        = [ ∑_{x5} f1(x1, x5) ] [ ∑_{x4} f2(x1, x4) f4(x4) ( ∑_{x2,x3} f3(x2, x3, x4) ) ]  (product of marginals)

The marginal of products requires 4q^5 sums and multiplications, while the product of marginals requires only 2q^2 + 6q^4.

Figure: Factor graph of the function f(x1, x2, x3, x4, x5).
Figure: Bipartite graph representation of f.
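A small numerical check of this rearrangement, with random factor tables over an alphabet of size q (all names and values below are illustrative):

```python
import numpy as np

np.random.seed(0)
q = 3
# Factor tables indexed as f1(x1,x5), f2(x1,x4), f3(x2,x3,x4), f4(x4).
f1 = np.random.rand(q, q)
f2 = np.random.rand(q, q)
f3 = np.random.rand(q, q, q)
f4 = np.random.rand(q)

# Marginal of products: build the full joint over (x1,...,x5) and sum out x2..x5.
joint = np.einsum('ae,ad,bcd,d->abcde', f1, f2, f3, f4)
marg_direct = joint.sum(axis=(1, 2, 3, 4))

# Product of marginals: push each sum inside, as the factor graph suggests.
m5 = f1.sum(axis=1)                        # sum_x5 f1(x1,x5)
m23 = f3.sum(axis=(0, 1))                  # sum_{x2,x3} f3(x2,x3,x4)
m4 = (f2 * (f4 * m23)).sum(axis=1)         # sum_x4 f2(x1,x4) f4(x4) (...)
marg_factored = m5 * m4

assert np.allclose(marg_direct, marg_factored)
print(marg_direct)                         # f(x1), identical for both orderings
```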
Background
Factor Graphs
Marginalization via message passing

Messages µ(x) are passed along the edges of the graph: the Sum-Product Algorithm (SPA).

⋆ Summary: factor node (f_j) to variable node (x_i),

  µ_{f_j → x_i}(x_i) = ∑_{∼{x_i}} ( f_j ∏_{x_k ∈ n(f_j)\{x_i}} µ_{x_k → f_j}(x_k) ),

⋆ Product: variable node (x_i) to factor node (f_j),

  µ_{x_i → f_j}(x_i) = ∏_{f_k ∈ n(x_i)\{f_j}} µ_{f_k → x_i}(x_i).

Figure: Message passing on the factor graph of f from the previous slide.
Background
Decoding using Sum Product Algorithm
Decoding using the Sum-Product Algorithm

Tanner graph ⇒ parity check matrix H_{[n−k, n]}:
• {v_1, . . . , v_n} variable nodes
• {c_1, . . . , c_{n−k}} check (factor) nodes
• An edge between v_i and c_j if H[j, i] = 1
• Messages are the log-likelihood ratios (LLRs),

  γ_{v_i} = ln( p(r_{v_i} | y_{v_i} = +1) / p(r_{v_i} | y_{v_i} = −1) ) = 2 r_{v_i} / σ².

Hamming (7,4) code:

  H = [ 1 0 1 1 1 0 0
        1 1 1 0 0 1 0
        0 1 1 1 0 0 1 ]

Summary, check to variable node:

  µ_{c_j → v_i} = ∑_{∼{v_i}} f_{c_j} ∏_{v_k ∈ n(c_j)\v_i} µ_{v_k → c_j}(v_k)
  ⇒ γ_{c_j → v_i} = 2 tanh⁻¹( ∏_{v_k ∈ n(c_j)\v_i} tanh( γ_{v_k → c_j} / 2 ) )

Product, variable to check node:

  µ_{v_i → c_j} = ∏_{c_k ∈ n(v_i)\c_j} µ_{c_k → v_i}
  ⇒ γ_{v_i → c_j} = γ_{v_i} + ∑_{c_k ∈ n(v_i)\c_j} γ_{c_k → v_i}

Figure: Tanner graph of the (7,4) Hamming code.
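A compact sketch of these two update rules in the LLR domain, written as a plain NumPy belief-propagation decoder (not the thesis implementation); the (7,4) Hamming parity-check matrix and noise level in the usage example are illustrative.

```python
import numpy as np

def spa_decode(H, r, sigma, n_iter=5):
    """LLR-domain sum-product decoding over the Tanner graph of H.
    Messages are stored as (n-k) x n arrays, nonzero only where H == 1."""
    llr = 2.0 * r / sigma ** 2                       # channel LLRs gamma_v
    gamma_vc = H * llr                               # first v->c messages = channel LLRs
    for _ in range(n_iter):
        # Check-to-variable: 2 atanh of the product of tanh(gamma/2) over n(c_j)\v_i.
        t = np.where(H == 1, np.tanh(np.clip(gamma_vc, -20, 20) / 2.0), 1.0)
        row_prod = np.prod(t, axis=1, keepdims=True)
        extrinsic = row_prod / np.where(np.abs(t) < 1e-12, 1e-12, t)   # exclude v_i itself
        gamma_cv = H * 2.0 * np.arctanh(np.clip(extrinsic, -0.999999, 0.999999))
        # Variable-to-check: channel LLR plus all incoming check messages except c_j.
        total = llr + gamma_cv.sum(axis=0)
        gamma_vc = H * (total - gamma_cv)
        # Hard decision and syndrome check (early stopping).
        s_hat = (total < 0).astype(int)
        if not np.any(H.dot(s_hat) % 2):
            break
    return s_hat

# Example: a noisy all-zero codeword of a (7,4) Hamming code (H is illustrative).
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
sigma = 0.8
r = np.ones(7) + sigma * np.random.randn(7)          # BPSK of the all-zero codeword is all +1
print(spa_decode(H, r, sigma))
```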
Background
Decoding using Sum Product Algorithm
Tanner graphs with cycles - Good and Bad

Good codes have cycles:
• Without cycles → at least ((2r − 1)/2) · n codewords with Hamming weight 2.

SPA performs poorly with cycles:
• It does not perform MAP decoding, since the factors are not independent.
• Performance worsens for codes with many small-girth cycles, such as BCH or polar codes.
• The iterative nature increases the latency of the decoder.

Figure: SPA over a Tanner graph with cycles. Nodes v0 and v2 form a cycle with c0 and c1. The information received by node v0 contains information from v2, and in the next iteration v2 will receive its own information back from v0 via c1. There is no exact expression for the marginalization of v0 or v2.
Neural Network Decoders

Related Work and Contributions
NND Design and Operations
Hyper-Parameters
Neural Network Decoders
Related work
Related work

Sub-optimal decoding:
• Iterative decoding: Belief Propagation / SPA.
  R. Tanner, A Recursive Approach to Low Complexity Codes, 1981; Hagenauer et al., Iterative Decoding of Binary Block and Convolutional Codes, 1996.
• Linear programming relaxation.
  J. Feldman et al., Using Linear Programming to Decode Binary Linear Codes, 2005.

Data-driven decoding of structured codes:
• Functional similarity of error-correcting codes with neural networks.
  J. Bruck and M. Blaum, Neural Networks, Error-Correcting Codes, and Polynomials over the Binary n-Cube, 1989.
• Feed-forward neural network decoders.
  El-Khamy et al., Soft Decision Decoding of Block Codes Using Artificial Neural Network, 1995.
• Hopfield network decoders.
  Esposito et al., A Neural Network for Error Correcting Decoding of Binary Linear Codes, 1994.
• Random neural network decoder.
  El-Khamy et al., Random Neural Network Decoder for Error Correcting Codes, 1999.
• Learning the structure of linear codes using a deep neural network.
  Gruber et al., On Deep Learning-Based Channel Decoding, 2017.

Neural Network Decoder (NND): SPA-based implementation for data-aided learning.
  Nachmani et al., Learning to Decode Linear Codes Using Deep Learning, 2016.
Neural Network Decoders
Contributions
• Elucidate the design and construction of the NND.
• Analysis of various parameters affecting the training and online performance of the NND.
• Introduction of a new loss metric that improves online performance.
• Deeper insight into the working of the NND based on the learned weights' distribution.
• Analysis of the NND's performance on different families and sizes of linear block codes.
Neural Network Decoders
NND Design and Operations
SPA over an Unrolled Tanner Graph

Figure: Unrolled Tanner graph for 2 iterations of SPA for the (7,4) Hamming code.
Neural Network Decoders
NND Design and Operations
SPA using a Neural Network over the Edges of the Tanner Graph

Figure: Network graph of the SPA-NN, with nodes corresponding to the edges of the Tanner graph.
Neural Network Decoders
NND Design and Operations
Neural Network Decoder with Learnable Weights

Figure: Dashed lines show the connections carrying the learnable weights (W_i2o, W_e2o, W_o2e, W_e2x).
Neural Network Decoders
NND Design and Operations
Neural Network Decoder - Operations

Odd layer:

  X_i = tanh( ½ ( W_i2o^T ⊗ L + W_e2o^T ⊗ X_{i−1} ) ),

where L is the channel information and X_{i−1} is the output of the previous even layer.

Even layer:

  X_i = 2 tanh⁻¹( X*_{i−1} ),

where X*_{i−1} is obtained by applying matrix transformations to the previous odd-layer output, equivalent to taking the product of the elements of X_{i−1} corresponding to W_o2e = 1 along each row.

Output layer:

  L̂_i = L + W_e2x^T ⊗ X_{i−1}
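A minimal NumPy sketch of one SPA iteration of the NND written with these three operations; the masked weight matrices W_i2o (n x ne), W_e2o (ne x ne), W_o2e (ne x ne) and W_e2x (ne x n) are assumed to be given in the 0/1 (or trained) form described above, and all names are illustrative.

```python
import numpy as np

def odd_layer(L, X_prev, W_i2o, W_e2o):
    """Odd (variable-to-check) layer: combine channel LLRs L with the previous
    even-layer output X_prev through the (possibly learnable) weight matrices."""
    return np.tanh(0.5 * (W_i2o.T.dot(L) + W_e2o.T.dot(X_prev)))

def even_layer(X_odd, W_o2e):
    """Even (check-to-variable) layer: for each output edge, multiply the odd-layer
    outputs selected by W_o2e == 1 along the row, then apply 2 * atanh."""
    prod = np.array([np.prod(X_odd[row == 1]) for row in W_o2e])
    return 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))

def output_layer(L, X_even, W_e2x):
    """Output (marginalization) layer: channel LLRs plus weighted edge messages."""
    return L + W_e2x.T.dot(X_even)
```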
Neural Network Decoders
NND Design and Operations
Computational Complexity

Operation    | Multiplications    | Nodes
Input layer  | -                  | n
Odd layer    | ne(ne + n + 1)     | ne
Even layer   | ne(2ne + 1)        | ne
Output layer | ne·n               | n
Total        | 3ne² + 2ne(n + 1)  | 2(n + ne)

Table: Number of multiplications and nodes for one SPA iteration in the NND.

Figure: Comparison of graph size (bars) and total number of multiplications (line) required by the NND for different codes, built for 5 SPA iterations.
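A small helper, following the table above, for estimating the cost of the NND for a given code; the (n, ne) pairs are the ones listed later in the decoding-results table, and the 5-iteration totals are a simple multiple that ignores input/output-layer edge cases.

```python
def nnd_complexity(n, n_e, n_iter=5):
    """Multiplications and nodes per the complexity table, times n_iter."""
    mults_per_iter = 3 * n_e ** 2 + 2 * n_e * (n + 1)
    nodes_per_iter = 2 * (n + n_e)
    return n_iter * mults_per_iter, n_iter * nodes_per_iter

codes = {"(32,16) polar": (32, 192), "(32,24) polar": (32, 128),
         "(128,64) polar": (128, 1792), "(63,45) BCH": (63, 432),
         "(96,48) LDPC": (96, 296)}
for name, (n, n_e) in sorted(codes.items()):
    mults, nodes = nnd_complexity(n, n_e)
    print("{0}: {1} multiplications, {2} nodes (5 SPA iterations)".format(name, mults, nodes))
```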
Neural Network Decoders
Hyper-Parameters
Hyper-parameters for Training the NND

Class        | Parameter                        | Value                              | Typical
Design       | Code                             | (n, k) code type                   | (32,16) polar
             | Parity Check Matrix              | Binary matrix                      | -
             | Number of SPA iterations         | Integer                            | 5
             | Network Architecture             | FF or RNN                          | RNN
Weights      | Train input weights (W̃_i2o)      | True or False                      | False
             | Train output weights (W̃_e2x)     | True or False                      | False
             | Weights Initialization           | Random or Fixed                    | Fixed
Optimization | Loss Function                    | Cross-entropy, Syndrome or Energy  | Cross-entropy
             | Loss Function type               | Single or Multiple                 | Multiple
             | Optimizer                        | RMSProp                            | RMSProp
             | Learning Rate                    | float (< 1.0)                      | 0.001
Training     | Training codeword type           | 0 or random                        | 0
             | SNR Training (dB)                | float or array                     | [2.0]
             | SNR validation (dB)              | float or array                     | {−3, −2, . . . , 9}
             | Training Batch length            | Integer                            | 120
             | Validation Batch length          | Integer                            | 600
             | Total training epochs            | Integer                            | 2^18
             | Validate after n epochs          | Integer                            | 500
             | LLR Clipping                     | Integer                            | 20

Table: Parameters required to set up the NND for training; a typical example of parameter settings is provided for reference.
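For concreteness, the "Typical" column above can be collected into a configuration dictionary like the following; the field names are purely illustrative, not the thesis API.

```python
# Illustrative training configuration mirroring the "Typical" column above.
nnd_config = {
    "code": "(32,16) polar",
    "spa_iterations": 5,
    "architecture": "RNN",             # recurrent: weights shared across iterations
    "train_input_weights": False,      # W_i2o kept fixed
    "train_output_weights": False,     # W_e2x kept fixed
    "weight_init": "fixed",            # start out identical to plain SPA
    "loss": "cross-entropy",
    "loss_type": "multi",              # loss attached to every output layer
    "optimizer": "RMSProp",
    "learning_rate": 1e-3,
    "training_codewords": "all-zero",
    "snr_train_db": [2.0],
    "snr_validation_db": list(range(-3, 10)),
    "batch_len_train": 120,
    "batch_len_validation": 600,
    "total_epochs": 2 ** 18,
    "validate_every": 500,
    "llr_clip": 20,
}
```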
Neural Network Decoders
Hyper-Parameters
Common Parameters

• Code C(n, k): the NND design and performance are specific to the Tanner graph of the code.
• LLR clipping: tanh⁻¹ causes sudden explosions in LLR values, so values are clipped to [−20, 20].
• Weight settings:
  – Selection: the even-to-odd layer weights (W_e2o) are mandatory; the other weights (W_i2o, W_e2x) can be ignored.
  – Initialization: weights are initialized with fixed values, making the NND perform identically to the SPA.
  – Quantization: 32-bit floating point numbers are used.
• Optimizer: adaptive stochastic gradient descent optimizer RMSProp.
• Learning rate: varies depending on the size of the network and the number of learnable parameters.
• Training codewords: training using "all-zero" codewords is allowed due to the symmetry property of the SPA.¹

¹ Definition 4.81 in Richardson and Urbanke, Modern Coding Theory, Cambridge University Press.
Neural Network Decoders
Hyper-Parameters
Number of SPA Iterations

• The NND graph is designed for a fixed number of SPA iterations (n_i).
• The size of the graph and the number of learnable parameters grow with n_i.
• The performance improvement with increasing n_i is not linear.
• Deeper networks are harder to train.
Neural Network Decoders
Hyper-Parameters
Network Architecture

The NND is characterized by a set of operations involving learnable weights W in each iteration.

Feed-forward architecture:
• Learnable parameters are trained separately for each iteration.
• Leads to a higher degree of freedom for the model.

Recurrent architecture:
• Learnable parameters are shared across iterations.
• Leads to constraints on the learnable parameters and regularization of the model.
• The NND performs the same operations in each iteration, similar to the SPA.
Neural Network Decoders
Hyper-Parameters
Loss Function - Cross-Entropy

Output layer loss:

  L_f^CE(p, y) = −(1/N) ∑_{n=1}^{N} [ y(n) log(1 − p(n)) + (1 − y(n)) log p(n) ],

where p(n) is the estimated probability of y(n) = 0 obtained from the final output layer, and y is the binary vector of the target codeword.

Multi-loss:

  L_m^CE(p, y) = −(1/(N L)) ∑_{l=2,4,...}^{2L} ∑_{n=1}^{N} [ y(n) log(1 − p(l, n)) + (1 − y(n)) log p(l, n) ],

where p(l, n) is the network output probability of the nth bit at the (l + 1)th output layer.
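A NumPy sketch of both variants (the thesis implementation is in TensorFlow; this simply writes out the formulas, with p interpreted as the probability of a 0 bit as defined above):

```python
import numpy as np

def cross_entropy_loss(p, y, eps=1e-12):
    """Final-layer loss: p[n] is the estimated probability that bit n is 0,
    y[n] is the target bit."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(1.0 - p) + (1.0 - y) * np.log(p))

def multiloss_cross_entropy(p_layers, y, eps=1e-12):
    """Multi-loss: average the same loss over every (even) output layer.
    p_layers has shape (L, N): one row of bit probabilities per output layer."""
    return float(np.mean([cross_entropy_loss(p_l, y, eps) for p_l in p_layers]))
```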
Neural Network Decoders
Hyper-Parameters
Loss Function - Cross-Entropy with Syndrome Check

Output layer loss:

  L_f^SC(p, y) = −(1/N) ∑_{n=1}^{N} [ y(n) log(1 − p(n)) + (1 − y(n)) log p(n) ],

where p(n) is the network output probability of the nth bit at the first output layer at which the syndrome check is satisfied.

Multi-loss:

  L_m^SC(p, y) = −(1/(M N)) ∑_{l=2,4,...}^{2M} ∑_{n=1}^{N} [ y(n) log(1 − p(l, n)) + (1 − y(n)) log p(l, n) ],

where p(l, n) is the network output probability of the nth bit at the (l + 1)th output layer. If the syndrome check is satisfied at layer 0 < k < 2L, then 2M = k; otherwise 2M = 2L.
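A sketch of the syndrome-gated variant: the loss is accumulated only up to the first output layer whose hard decision satisfies the syndrome check H ŝ^T = 0 (again NumPy and illustrative only, following the convention that p is the probability of a 0 bit):

```python
import numpy as np

def syndrome_satisfied(p, H):
    """Hard-decide bits from probabilities (p = Pr[bit = 0]) and check H s^T = 0."""
    s_hat = (p < 0.5).astype(int)            # p < 0.5 means bit 1 is more likely
    return not np.any(H.dot(s_hat) % 2)

def multiloss_with_syndrome_check(p_layers, y, H, eps=1e-12):
    """Average the cross-entropy over output layers up to (and including) the first
    layer that passes the syndrome check; use all layers if none passes."""
    stop = len(p_layers)
    for l, p_l in enumerate(p_layers):
        if syndrome_satisfied(p_l, H):
            stop = l + 1
            break
    losses = []
    for p_l in p_layers[:stop]:
        p_c = np.clip(p_l, eps, 1.0 - eps)
        losses.append(-np.mean(y * np.log(1.0 - p_c) + (1.0 - y) * np.log(p_c)))
    return float(np.mean(losses))
```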
Neural Network Decoders
Hyper-Parameters
Cross-Entropy Loss - With and Without Syndrome Check

Figure: BLER performance comparison for the cross-entropy loss with and without the syndrome check.
Neural Network Decoders
Hyper-Parameters
Loss Function - Energy-Based Loss

Output layer loss:

  L_f^E(p, y) = −(1/N) ∑_{n=1}^{N} (1 − 2p(n)) (−1)^{y(n)},

where p(n) is the network output probability of the nth bit at the final output layer.

Multi-loss:

  L_m^E(p, y) = −(1/(M N)) ∑_{l=2,4,...}^{2M} ∑_{n=1}^{N} (1 − 2p(l, n)) (−1)^{y(n)},

where p(l, n) is the network output probability of the nth bit at the (l + 1)th output layer.
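The corresponding NumPy sketch; as written, the loss decreases when the soft bit value (1 − 2p) has the same sign as the target symbol (−1)^y (illustrative only, not the thesis code):

```python
import numpy as np

def energy_loss(p, y):
    """Energy-based final-layer loss: correlation of the soft bit (1 - 2p)
    with the target sign (-1)^y, negated so that alignment lowers the loss."""
    return -float(np.mean((1.0 - 2.0 * p) * ((-1.0) ** y)))

def multiloss_energy(p_layers, y):
    """Average the energy loss over all (even) output layers."""
    return float(np.mean([energy_loss(p_l, y) for p_l in p_layers]))
```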
Neural Network Decoders
Hyper-Parameters
Cross Entropy vs Energy based Loss
Figure: Comparison of Cross entropy and Energy based loss functions for LLR output of a single bit, given the target bit y = 0.
Neural Network Decoders
Hyper-Parameters
Cross Entropy vs Energy based Loss
Figure: BER performance comparison of the energy-based and cross-entropy loss functions for the (32,16) polar code.
Neural Network Decoders
Hyper-Parameters
SNR of the Training Data

• The NND should perform optimally during online operation, with data generated at any SNR value.
• Training at extreme SNR values is problematic: at low SNR there are too many errors, at high SNR too few, which restricts learning of patterns in the code structure.
• Training data can be created using (1) a fixed SNR, or (2) a range of SNRs.
Neural Network Decoders
Hyper-Parameters
Comparison of SNR values for training
The Normalized Validation Score (NVS) is calculated for a training SNR value ζt by averaging over the ratio of BER for Neural Network Decoder (NND) and SPA, evaluated for a validation data set generated with a set of SNR values ρv,s , s = {1, . . . , S} using the network trained at SNR ζt . 1 ∑ BERNND (ζt , ρv,s ) S BERSPA (ρv,s ) S
NVS(ζt ) =
s=1
Figure: Comparison of SNR values for training (32,16) polar code.
Navneet Agrawal
Master Thesis presentation
31 / 52
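A direct transcription of the NVS definition; the BER arrays would come from simulation, and the numbers below are placeholders rather than measured values.

```python
import numpy as np

def normalized_validation_score(ber_nnd, ber_spa):
    """NVS for one training SNR: mean over validation SNRs of BER_NND / BER_SPA."""
    return float(np.mean(np.asarray(ber_nnd) / np.asarray(ber_spa)))

# Placeholder BERs at validation SNRs -3, ..., 9 dB (13 points, not measured values).
ber_spa = np.array([2e-1, 1e-1, 5e-2, 2e-2, 8e-3, 3e-3, 1e-3,
                    3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6])
ber_nnd = 0.8 * ber_spa                                 # hypothetical 20% BER reduction
print(normalized_validation_score(ber_nnd, ber_spa))    # < 1 means the NND beats the SPA
```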
Neural Network Decoders
Hyper-Parameters
Training Length and Stopping Criteria

The training length of the NND depends on the number of learnable parameters and the learning rate.

Problems with long training:
• Deviation from the global optimum due to a high learning rate.
• Overfitting the training data, which leads to poor online performance.

Solution: early stopping. The NND trains and validates its performance for a specific number of epochs, and then selects the state of the NND that gave the best performance on the validation set.
Experiments and Results

Simulation and Software Implementation
Edge Weight Analysis
Decoding Results
Experiments and Results
Experimental Setup
Simulation and Software Implementation

• Programming language: Python 2.7 (+ NumPy 1.13)
• Neural networks: TensorFlow 1.2 (+ Python API)
• SPA decoder: open-source implementation by Radford M. Neal

Workflow:
1. Start: gather parameters, program configuration.
2. Initialize: NND TensorFlow graph, communication system.
3. Training/test process: start a TF session; loop over (1) train n batches and (2) validate the model, until (3) the stopping criterion is met; then save the trained weights.
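The training/test process above, written as a schematic train/validate loop with the early-stopping criterion described earlier; the callables and the dummy score history are placeholders, not the thesis API.

```python
def train_with_early_stopping(train_step, validate, total_epochs, validate_every, patience):
    """Schematic loop: train, periodically validate, keep the best state, and stop
    once the validation score has not improved for `patience` epochs.
    train_step() trains one batch; validate() returns a score to minimize (e.g. BER)."""
    best_score, best_epoch, best_state = float("inf"), 0, None
    for epoch in range(1, total_epochs + 1):
        state = train_step()
        if epoch % validate_every == 0:
            score = validate()
            if score < best_score:
                best_score, best_epoch, best_state = score, epoch, state
            elif epoch - best_epoch >= patience:
                break                                   # early stopping
    return best_state, best_score

# Toy usage with dummy callables (no real NND involved).
history = iter([0.10, 0.07, 0.05, 0.06, 0.06, 0.07, 0.08])
best_state, best_score = train_with_early_stopping(
    train_step=lambda: "weights-snapshot",
    validate=lambda: next(history, 0.09),
    total_epochs=3500, validate_every=500, patience=1500)
print(best_state, best_score)
```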
Experiments and Results
Edge Weight Analysis
Learned Weight Analysis

Figure: Learned weight distribution over the edges of the Tanner graph of the (7,4) Hamming code.
Experiments and Results
Edge Weight Analysis
Learned Weight Analysis

Tree-structured (7,4) code:

  H = [ 1 0 0 1 0 0 0
        1 1 0 0 1 0 0
        0 0 1 0 1 1 1 ]

Figure: Learned weight distribution over the edges for the tree-structured (7,4) code.
Experiments and Results
Edge Weight Analysis
Evolution of Weights in Consecutive Layers

Figure: Evolution of weights across layers for (a) 20 SPA iterations and (b) 5 SPA iterations, for different NND architectures.
Experiments and Results
Decoding Results
Decoding Results

Table: Codes evaluated for their decoding performance with the NND.

Code           | Rate | dmin | ne   | NND vs SPA²
(32,16) polar  | 0.5  | 8    | 192  | 4.0 dB
(32,24) polar  | 0.75 | 3    | 128  | 2.0 dB
(128,64) polar | 0.5  | 10   | 1792 | 3.0 dB
(63,45) BCH    | 0.71 | -    | 432  | 1.5 dB
(96,48) LDPC   | 0.5  | -    | 296  | -

² Performance at SNR 6.0 dB.
Experiments and Results
Decoding Results
(32, 16) polar code
Experiments and Results
Decoding Results
(32, 16) polar code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Experiments and Results
Decoding Results
Constrained vs Fully-connected NND
(a) Constrained. (b) Fully connected.
Experiments and Results
Decoding Results
(32, 24) polar code
Experiments and Results
Decoding Results
(32, 24) polar code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Experiments and Results
Decoding Results
(128, 64) polar code
Experiments and Results
Decoding Results
(128, 64) polar code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Experiments and Results
Decoding Results
(63, 45) BCH code
Experiments and Results
Decoding Results
(63, 45) BCH code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Experiments and Results
Decoding Results
(96, 48) LDPC code
Experiments and Results
Decoding Results
(96, 48) LDPC code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Conclusions and Future Work
Conclusions
• The presented algorithm, the NND, implements the SPA using a neural network, benefiting from properties of both neural networks and the SPA.
• The NND shows promising results for High Density Parity Check (HDPC) codes such as BCH or polar codes.
• The network learns to assign complementary weights to edges forming cycles, mitigating the degradation that cycles cause in the SPA.
• The selection of various hyper-parameters affects the NND's online performance. The presented analysis helps in selecting appropriate hyper-parameters for any family or size of code.
• A carefully designed loss function may improve the NND's performance, especially when the training data represents only a small subset of the real data.
• The NND is incapable of achieving the ML decoding threshold due to its restricted design and operations (owing to the SPA).
• The complexity of the NND grows with the density of the parity check matrix and the number of SPA iterations, which makes it difficult to implement the NND for medium- to long-length HDPC codes.
Conclusions and Future Work
Future Work
• Code design: codes designed to perform well with the NND.
• Generalized NND: better inference using message passing on the broader family of systems represented by factor graphs.
• Hyper-parameter optimization: Bayesian optimization of the neural network hyper-parameters.
• Intelligent receiver: combined channel synchronizer and decoder as two separate neural networks trained to perform optimally in tandem.
• Quantization of NND weights to fewer bits, enabling fast physical-layer implementation.
Conclusions and Future Work
Future Work
Thank you for your time.
Questions?