Machine Intelligence in Forward Error Correction Decoding
Supervisors: Dr. Hugo Tullberg (Ericsson Research), Prof. Ragnar Thobaben (KTH)
Navneet Agrawal
Master Thesis presentation
Acknowledgment

Dr. Hugo Tullberg, Supervisor
Maria Edvardsson, Manager
The entire Ericsson Research team
Special mention: Mathias Andersson, Nicolas Seyvet, Vidit Saxena, and Maria Fresia
Contents

Background: Channel coding, Factor graphs, Sum-Product Algorithm
Neural Network Decoder: Design and Analysis
Experiments and Results
Conclusions and Future Work
Background

Communication System
Introduction to Coding Theory
Factor graphs
Decoding using the Sum-Product Algorithm
Background
Communication System
Communication System Model

Source:      b ∈ {0, 1}^k
Encoder:     s = b^T ⊗ G,  s ∈ C,  G ∈ {0, 1}^{k×n}
Modulator:   y_i = (−1)^{s_i}
Channel:     r = y + n,  n_i ∼ N(0, σ_n²)
Demodulator: p(r_i | y_i)
Decoder:     ŷ = argmax_{y : s ∈ C} p(r | y)
Sink:        b̂ = ŝ ⊗ G^{−1}
Background
Channel Coding Theory
Channel Coding Basics

Channel coding adds redundancy in order to recover information lost during transmission.
A linear block code C(n, k) forms n linear combinations of k information bits, where n > k.
The rate of a code is the amount of information per transmitted bit, r = k/n.
The Hamming distance d_H between two codewords u, v ∈ C is the number of positions at which u differs from v.
The minimum distance d_min is the smallest Hamming distance between any two codewords u, v ∈ C, u ≠ v.
The maximum number of errors that a code C(n, k) can correct is t ≤ ⌊(d_min − 1)/2⌋.
The information bits are encoded by taking the product with the generator matrix G of size [k × n]: y = b^T ⊗ G.
The rows of the parity check matrix H (size [n − k, n]) form a basis for the dual code C⊥ = {v ∈ GF(2)^n : y · v^T = 0, ∀ y ∈ C}, so that H ⊗ y^T = 0^T, ∀ y ∈ C.
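To make these definitions concrete, here is a minimal sketch (not the thesis code) of encoding with a generator matrix over GF(2), checking the parity-check condition and computing a Hamming distance. The systematic (7,4) Hamming pair G, H below is only an illustrative example.

```python
import numpy as np

# Illustrative systematic (7,4) Hamming code: G = [I_k | P], H = [P^T | I_{n-k}],
# so that G H^T = 0 over GF(2). Any valid generator / parity-check pair works.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=int)
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]], dtype=int)

def encode(b, G):
    """Encode k information bits b into an n-bit codeword over GF(2)."""
    return b.dot(G) % 2

def hamming_distance(u, v):
    """Number of positions at which two codewords differ."""
    return int(np.sum(u != v))

b = np.array([1, 0, 1, 1])
y = encode(b, G)
assert not np.any(H.dot(y) % 2)          # every codeword satisfies H y^T = 0
print("codeword:", y, "rate:", float(G.shape[0]) / G.shape[1])
print("d_H to all-zero codeword:", hamming_distance(y, np.zeros(7, dtype=int)))
```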
Background
Channel Coding Theory
Maximum Likelihood Decoding

The Maximum Likelihood (ML) decoding solution for bit y_i is given by

  ŷ_i^ML = argmax_{y_i : s^T ∈ C}  ∑_{∼y_i} ( ∏_{j=1}^{n} p(r_j | y_j) ) 1_{s ∈ C},

where 1_{f} is the code membership function, which is 1 if f is true and 0 otherwise.

The ML decoding problem is NP-hard, with complexity increasing exponentially with the length of the code: k_c · 2^{min(n−k, k)}.

The code membership function 1_{s ∈ C} has a factorized form,

  1_{s ∈ C} = ∏_{k=1}^{K} 1_{∑ s_k = 0},

where s_k denotes the subset of bits of s which must satisfy the parity check ∑ s_k = 0.
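As a sketch of this rule (and of why the exponential complexity matters), the following brute-force decoder scores every one of the 2^k codewords under an AWGN likelihood with BPSK mapping y = (−1)^s; the toy (6,3) generator matrix and noise level are illustrative only.

```python
import numpy as np
from itertools import product

def ml_decode(r, G, sigma):
    """Exhaustive ML decoding: evaluate p(r|y) for every codeword s = bG and
    return the best one. Complexity is 2^k, intractable for long codes."""
    k, n = G.shape
    best, best_score = None, -np.inf
    for bits in product([0, 1], repeat=k):
        s = np.array(bits).dot(G) % 2             # candidate codeword
        y = (-1.0) ** s                           # BPSK mapping
        log_lik = -np.sum((r - y) ** 2) / (2 * sigma ** 2)
        if log_lik > best_score:
            best, best_score = s, log_lik
    return best

# Toy (6,3) code, for illustration only.
G = np.array([[1, 0, 0, 1, 1, 0],
              [0, 1, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1]], dtype=int)
sigma = 0.7
s_true = np.array([1, 0, 1]).dot(G) % 2
r = (-1.0) ** s_true + sigma * np.random.randn(6)
print("transmitted:", s_true, "decoded:", ml_decode(r, G, sigma))
```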
Background
Factor Graphs
Factor graphs

A systematic method to apply the distributive law (ab + ac = a(b + c)).

Example: consider a function f with factorization

  f(x1, x2, x3, x4, x5) = f1(x1, x5) f2(x1, x4) f3(x2, x3, x4) f4(x4),

where each variable xi ∈ {a1, . . . , aq}, aj ∈ R. The marginal f(x1) can be computed by summing out the other variables:

  f(x1) = ∑_{∼x1} f1(x1, x5) f2(x1, x4) f3(x2, x3, x4) f4(x4)                              (marginal of products)
        = [ ∑_{x5} f1(x1, x5) ] [ ∑_{x4} f2(x1, x4) f4(x4) ( ∑_{x2,x3} f3(x2, x3, x4) ) ]  (product of marginals)

The marginal of products requires 4q^5 sums and multiplications, while the product of marginals requires only 2q^2 + 6q^4.

Figure: Factor graph of the function f(x1, x2, x3, x4, x5).
Figure: Bipartite graph representation of f.
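A small numerical check of this rearrangement, with random factor tables over an alphabet of size q (all names and values below are illustrative):

```python
import numpy as np

np.random.seed(0)
q = 3
# Factor tables indexed as f1(x1,x5), f2(x1,x4), f3(x2,x3,x4), f4(x4).
f1 = np.random.rand(q, q)
f2 = np.random.rand(q, q)
f3 = np.random.rand(q, q, q)
f4 = np.random.rand(q)

# Marginal of products: build the full joint over (x1,...,x5) and sum out x2..x5.
joint = np.einsum('ae,ad,bcd,d->abcde', f1, f2, f3, f4)
marg_direct = joint.sum(axis=(1, 2, 3, 4))

# Product of marginals: push each sum inside, as the factor graph suggests.
m5 = f1.sum(axis=1)                        # sum_x5 f1(x1,x5)
m23 = f3.sum(axis=(0, 1))                  # sum_{x2,x3} f3(x2,x3,x4)
m4 = (f2 * (f4 * m23)).sum(axis=1)         # sum_x4 f2(x1,x4) f4(x4) (...)
marg_factored = m5 * m4

assert np.allclose(marg_direct, marg_factored)
print(marg_direct)                         # f(x1), identical for both orderings
```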
Background
Factor Graphs
Marginalization via message passing

Messages µ(x) are passed along the edges of the graph: the Sum-Product Algorithm (SPA).

⋆ Summary: factor node (f_j) to variable node (x_i),

  µ_{f_j → x_i}(x_i) = ∑_{∼{x_i}} ( f_j ∏_{x_k ∈ n(f_j)\{x_i}} µ_{x_k → f_j}(x_k) ),

⋆ Product: variable node (x_i) to factor node (f_j),

  µ_{x_i → f_j}(x_i) = ∏_{f_k ∈ n(x_i)\{f_j}} µ_{f_k → x_i}(x_i).

Figure: Message passing on the factor graph of f from the previous slide.
Background
Decoding using Sum Product Algorithm
Decoding using the Sum-Product Algorithm

Tanner graph ⇒ parity check matrix H_{[n−k, n]}:
• {v_1, . . . , v_n} variable nodes
• {c_1, . . . , c_{n−k}} check (factor) nodes
• An edge between v_i and c_j if H[j, i] = 1
• Messages are the log-likelihood ratios (LLRs),

  γ_{v_i} = ln( p(r_{v_i} | y_{v_i} = +1) / p(r_{v_i} | y_{v_i} = −1) ) = 2 r_{v_i} / σ².

Hamming (7,4) code:

  H = [ 1 0 1 1 1 0 0
        1 1 1 0 0 1 0
        0 1 1 1 0 0 1 ]

Summary, check to variable node:

  µ_{c_j → v_i} = ∑_{∼{v_i}} f_{c_j} ∏_{v_k ∈ n(c_j)\v_i} µ_{v_k → c_j}(v_k)
  ⇒ γ_{c_j → v_i} = 2 tanh⁻¹( ∏_{v_k ∈ n(c_j)\v_i} tanh( γ_{v_k → c_j} / 2 ) )

Product, variable to check node:

  µ_{v_i → c_j} = ∏_{c_k ∈ n(v_i)\c_j} µ_{c_k → v_i}
  ⇒ γ_{v_i → c_j} = γ_{v_i} + ∑_{c_k ∈ n(v_i)\c_j} γ_{c_k → v_i}

Figure: Tanner graph of the (7,4) Hamming code.
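A compact sketch of these two update rules in the LLR domain, written as a plain NumPy belief-propagation decoder (not the thesis implementation); the (7,4) Hamming parity-check matrix and noise level in the usage example are illustrative.

```python
import numpy as np

def spa_decode(H, r, sigma, n_iter=5):
    """LLR-domain sum-product decoding over the Tanner graph of H.
    Messages are stored as (n-k) x n arrays, nonzero only where H == 1."""
    llr = 2.0 * r / sigma ** 2                       # channel LLRs gamma_v
    gamma_vc = H * llr                               # first v->c messages = channel LLRs
    for _ in range(n_iter):
        # Check-to-variable: 2 atanh of the product of tanh(gamma/2) over n(c_j)\v_i.
        t = np.where(H == 1, np.tanh(np.clip(gamma_vc, -20, 20) / 2.0), 1.0)
        row_prod = np.prod(t, axis=1, keepdims=True)
        extrinsic = row_prod / np.where(np.abs(t) < 1e-12, 1e-12, t)   # exclude v_i itself
        gamma_cv = H * 2.0 * np.arctanh(np.clip(extrinsic, -0.999999, 0.999999))
        # Variable-to-check: channel LLR plus all incoming check messages except c_j.
        total = llr + gamma_cv.sum(axis=0)
        gamma_vc = H * (total - gamma_cv)
        # Hard decision and syndrome check (early stopping).
        s_hat = (total < 0).astype(int)
        if not np.any(H.dot(s_hat) % 2):
            break
    return s_hat

# Example: a noisy all-zero codeword of a (7,4) Hamming code (H is illustrative).
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
sigma = 0.8
r = np.ones(7) + sigma * np.random.randn(7)          # BPSK of the all-zero codeword is all +1
print(spa_decode(H, r, sigma))
```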
Background
Decoding using Sum Product Algorithm
Tanner graphs with cycles - Good and Bad

Good codes have cycles:
• Without cycles → at least ((2r − 1)/2) · n codewords with Hamming weight 2.

SPA performs poorly with cycles:
• It does not perform MAP decoding, since the factors are not independent.
• Performance worsens for codes with many small-girth cycles, such as BCH or polar codes.
• The iterative nature increases the latency of the decoder.

Figure: SPA over a Tanner graph with cycles. Nodes v0 and v2 form a cycle with c0 and c1. The information received by node v0 contains information from v2, and in the next iteration v2 will receive its own information back from v0 via c1. There is no exact expression for the marginalization of v0 or v2.
Neural Network Decoders

Related Work and Contributions
NND Design and Operations
Hyper-Parameters
Neural Network Decoders
Related work
Related work

Sub-optimal decoding:
• Iterative decoding: Belief Propagation / SPA.
  R. Tanner, A Recursive Approach to Low Complexity Codes, 1981; Hagenauer et al., Iterative Decoding of Binary Block and Convolutional Codes, 1996.
• Linear programming relaxation.
  J. Feldman et al., Using Linear Programming to Decode Binary Linear Codes, 2005.

Data-driven decoding of structured codes:
• Functional similarity of error-correcting codes with neural networks.
  J. Bruck and M. Blaum, Neural Networks, Error-Correcting Codes, and Polynomials over the Binary n-Cube, 1989.
• Feed-forward neural network decoders.
  El-Khamy et al., Soft Decision Decoding of Block Codes Using Artificial Neural Network, 1995.
• Hopfield network decoders.
  Esposito et al., A Neural Network for Error Correcting Decoding of Binary Linear Codes, 1994.
• Random neural network decoder.
  El-Khamy et al., Random Neural Network Decoder for Error Correcting Codes, 1999.
• Learning the structure of linear codes using a deep neural network.
  Gruber et al., On Deep Learning-Based Channel Decoding, 2017.

Neural Network Decoder (NND): SPA-based implementation for data-aided learning.
  Nachmani et al., Learning to Decode Linear Codes Using Deep Learning, 2016.
Neural Network Decoders
Contributions
• Elucidate the design and construction of the NND.
• Analysis of various parameters affecting the training and online performance of the NND.
• Introduction of a new loss metric that improves online performance.
• Deeper insight into the working of the NND based on the learned weights' distribution.
• Analysis of the NND's performance on different families and sizes of linear block codes.
Neural Network Decoders
NND Design and Operations
SPA over an Unrolled Tanner Graph

Figure: Unrolled Tanner graph for 2 iterations of SPA for the (7,4) Hamming code.
Neural Network Decoders
NND Design and Operations
SPA using a Neural Network over the Edges of the Tanner Graph

Figure: Network graph of the SPA-NN, with nodes corresponding to the edges of the Tanner graph.
Neural Network Decoders
NND Design and Operations
Neural Network Decoder with Learnable Weights

Figure: Dashed lines show the connections carrying the learnable weights (W_i2o, W_e2o, W_o2e, W_e2x).
Neural Network Decoders
NND Design and Operations
Neural Network Decoder - Operations

Odd layer:

  X_i = tanh( ½ ( W_i2o^T ⊗ L + W_e2o^T ⊗ X_{i−1} ) ),

where L is the channel information and X_{i−1} is the output of the previous even layer.

Even layer:

  X_i = 2 tanh⁻¹( X*_{i−1} ),

where X*_{i−1} is obtained by applying matrix transformations to the previous odd-layer output, equivalent to taking the product of the elements of X_{i−1} corresponding to W_o2e = 1 along each row.

Output layer:

  L̂_i = L + W_e2x^T ⊗ X_{i−1}
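A minimal NumPy sketch of one SPA iteration of the NND written with these three operations; the masked weight matrices W_i2o (n x ne), W_e2o (ne x ne), W_o2e (ne x ne) and W_e2x (ne x n) are assumed to be given in the 0/1 (or trained) form described above, and all names are illustrative.

```python
import numpy as np

def odd_layer(L, X_prev, W_i2o, W_e2o):
    """Odd (variable-to-check) layer: combine channel LLRs L with the previous
    even-layer output X_prev through the (possibly learnable) weight matrices."""
    return np.tanh(0.5 * (W_i2o.T.dot(L) + W_e2o.T.dot(X_prev)))

def even_layer(X_odd, W_o2e):
    """Even (check-to-variable) layer: for each output edge, multiply the odd-layer
    outputs selected by W_o2e == 1 along the row, then apply 2 * atanh."""
    prod = np.array([np.prod(X_odd[row == 1]) for row in W_o2e])
    return 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))

def output_layer(L, X_even, W_e2x):
    """Output (marginalization) layer: channel LLRs plus weighted edge messages."""
    return L + W_e2x.T.dot(X_even)
```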
Neural Network Decoders
NND Design and Operations
Computational Complexity

Operation    | Multiplications    | Nodes
Input layer  | -                  | n
Odd layer    | ne(ne + n + 1)     | ne
Even layer   | ne(2ne + 1)        | ne
Output layer | ne·n               | n
Total        | 3ne² + 2ne(n + 1)  | 2(n + ne)

Table: Number of multiplications and nodes for one SPA iteration in the NND.

Figure: Comparison of graph size (bars) and total number of multiplications (line) required by the NND for different codes, built for 5 SPA iterations.
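A small helper, following the table above, for estimating the cost of the NND for a given code; the (n, ne) pairs are the ones listed later in the decoding-results table, and the 5-iteration totals are a simple multiple that ignores input/output-layer edge cases.

```python
def nnd_complexity(n, n_e, n_iter=5):
    """Multiplications and nodes per the complexity table, times n_iter."""
    mults_per_iter = 3 * n_e ** 2 + 2 * n_e * (n + 1)
    nodes_per_iter = 2 * (n + n_e)
    return n_iter * mults_per_iter, n_iter * nodes_per_iter

codes = {"(32,16) polar": (32, 192), "(32,24) polar": (32, 128),
         "(128,64) polar": (128, 1792), "(63,45) BCH": (63, 432),
         "(96,48) LDPC": (96, 296)}
for name, (n, n_e) in sorted(codes.items()):
    mults, nodes = nnd_complexity(n, n_e)
    print("{0}: {1} multiplications, {2} nodes (5 SPA iterations)".format(name, mults, nodes))
```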
Neural Network Decoders
Hyper-Parameters
Hyper-parameters for Training the NND

Class        | Parameter                        | Value                              | Typical
Design       | Code                             | (n, k) code type                   | (32,16) polar
             | Parity Check Matrix              | Binary matrix                      | -
             | Number of SPA iterations         | Integer                            | 5
             | Network Architecture             | FF or RNN                          | RNN
Weights      | Train input weights (W̃_i2o)      | True or False                      | False
             | Train output weights (W̃_e2x)     | True or False                      | False
             | Weights Initialization           | Random or Fixed                    | Fixed
Optimization | Loss Function                    | Cross-entropy, Syndrome or Energy  | Cross-entropy
             | Loss Function type               | Single or Multiple                 | Multiple
             | Optimizer                        | RMSProp                            | RMSProp
             | Learning Rate                    | float (< 1.0)                      | 0.001
Training     | Training codeword type           | 0 or random                        | 0
             | SNR Training (dB)                | float or array                     | [2.0]
             | SNR validation (dB)              | float or array                     | {−3, −2, . . . , 9}
             | Training Batch length            | Integer                            | 120
             | Validation Batch length          | Integer                            | 600
             | Total training epochs            | Integer                            | 2^18
             | Validate after n epochs          | Integer                            | 500
             | LLR Clipping                     | Integer                            | 20

Table: Parameters required to set up the NND for training; a typical example of parameter settings is provided for reference.
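For concreteness, the "Typical" column above can be collected into a configuration dictionary like the following; the field names are purely illustrative, not the thesis API.

```python
# Illustrative training configuration mirroring the "Typical" column above.
nnd_config = {
    "code": "(32,16) polar",
    "spa_iterations": 5,
    "architecture": "RNN",             # recurrent: weights shared across iterations
    "train_input_weights": False,      # W_i2o kept fixed
    "train_output_weights": False,     # W_e2x kept fixed
    "weight_init": "fixed",            # start out identical to plain SPA
    "loss": "cross-entropy",
    "loss_type": "multi",              # loss attached to every output layer
    "optimizer": "RMSProp",
    "learning_rate": 1e-3,
    "training_codewords": "all-zero",
    "snr_train_db": [2.0],
    "snr_validation_db": list(range(-3, 10)),
    "batch_len_train": 120,
    "batch_len_validation": 600,
    "total_epochs": 2 ** 18,
    "validate_every": 500,
    "llr_clip": 20,
}
```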
Neural Network Decoders
Hyper-Parameters
Common Parameters

• Code C(n, k): the NND design and performance are specific to the Tanner graph of the code.
• LLR clipping: tanh⁻¹ causes sudden explosions in LLR values, so values are clipped to [−20, 20].
• Weight settings:
  – Selection: the even-to-odd layer weights (W_e2o) are mandatory; the other weights (W_i2o, W_e2x) can be ignored.
  – Initialization: weights are initialized with fixed values, making the NND perform identically to the SPA.
  – Quantization: 32-bit floating point numbers are used.
• Optimizer: adaptive stochastic gradient descent optimizer RMSProp.
• Learning rate: varies depending on the size of the network and the number of learnable parameters.
• Training codewords: training using "all-zero" codewords is allowed due to the symmetry property of the SPA.¹

¹ Definition 4.81 in Richardson and Urbanke, Modern Coding Theory, Cambridge University Press.
Neural Network Decoders
Hyper-Parameters
Number of SPA Iterations

• The NND graph is designed for a fixed number of SPA iterations (n_i).
• The size of the graph and the number of learnable parameters grow with n_i.
• The performance improvement with increasing n_i is not linear.
• Deeper networks are harder to train.
Neural Network Decoders
Hyper-Parameters
Network Architecture

The NND is characterized by a set of operations involving learnable weights W in each iteration.

Feed-forward architecture:
• Learnable parameters are trained separately for each iteration.
• Leads to a higher degree of freedom for the model.

Recurrent architecture:
• Learnable parameters are shared across iterations.
• Leads to constraints on the learnable parameters and regularization of the model.
• The NND performs the same operations in each iteration, similar to the SPA.
Neural Network Decoders
Hyper-Parameters
Loss Function - Cross-Entropy

Output layer loss:

  L_f^CE(p, y) = −(1/N) ∑_{n=1}^{N} [ y(n) log(1 − p(n)) + (1 − y(n)) log p(n) ],

where p(n) is the estimated probability of y(n) = 0 obtained from the final output layer, and y is the binary vector of the target codeword.

Multi-loss:

  L_m^CE(p, y) = −(1/(N L)) ∑_{l=2,4,...}^{2L} ∑_{n=1}^{N} [ y(n) log(1 − p(l, n)) + (1 − y(n)) log p(l, n) ],

where p(l, n) is the network output probability of the nth bit at the (l + 1)th output layer.
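A NumPy sketch of both variants (the thesis implementation is in TensorFlow; this simply writes out the formulas, with p interpreted as the probability of a 0 bit as defined above):

```python
import numpy as np

def cross_entropy_loss(p, y, eps=1e-12):
    """Final-layer loss: p[n] is the estimated probability that bit n is 0,
    y[n] is the target bit."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(1.0 - p) + (1.0 - y) * np.log(p))

def multiloss_cross_entropy(p_layers, y, eps=1e-12):
    """Multi-loss: average the same loss over every (even) output layer.
    p_layers has shape (L, N): one row of bit probabilities per output layer."""
    return float(np.mean([cross_entropy_loss(p_l, y, eps) for p_l in p_layers]))
```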
Neural Network Decoders
Hyper-Parameters
Loss Function - Cross-Entropy with Syndrome Check

Output layer loss:

  L_f^SC(p, y) = −(1/N) ∑_{n=1}^{N} [ y(n) log(1 − p(n)) + (1 − y(n)) log p(n) ],

where p(n) is the network output probability of the nth bit at the first output layer at which the syndrome check is satisfied.

Multi-loss:

  L_m^SC(p, y) = −(1/(M N)) ∑_{l=2,4,...}^{2M} ∑_{n=1}^{N} [ y(n) log(1 − p(l, n)) + (1 − y(n)) log p(l, n) ],

where p(l, n) is the network output probability of the nth bit at the (l + 1)th output layer. If the syndrome check is satisfied at layer 0 < k < 2L, then 2M = k; otherwise 2M = 2L.
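A sketch of the syndrome-gated variant: the loss is accumulated only up to the first output layer whose hard decision satisfies the syndrome check H ŝ^T = 0 (again NumPy and illustrative only, following the convention that p is the probability of a 0 bit):

```python
import numpy as np

def syndrome_satisfied(p, H):
    """Hard-decide bits from probabilities (p = Pr[bit = 0]) and check H s^T = 0."""
    s_hat = (p < 0.5).astype(int)            # p < 0.5 means bit 1 is more likely
    return not np.any(H.dot(s_hat) % 2)

def multiloss_with_syndrome_check(p_layers, y, H, eps=1e-12):
    """Average the cross-entropy over output layers up to (and including) the first
    layer that passes the syndrome check; use all layers if none passes."""
    stop = len(p_layers)
    for l, p_l in enumerate(p_layers):
        if syndrome_satisfied(p_l, H):
            stop = l + 1
            break
    losses = []
    for p_l in p_layers[:stop]:
        p_c = np.clip(p_l, eps, 1.0 - eps)
        losses.append(-np.mean(y * np.log(1.0 - p_c) + (1.0 - y) * np.log(p_c)))
    return float(np.mean(losses))
```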
Neural Network Decoders
Hyper-Parameters
Cross-Entropy Loss - With and Without Syndrome Check

Figure: BLER performance comparison for the cross-entropy loss with and without the syndrome check.
Neural Network Decoders
Hyper-Parameters
Loss Function - Energy-Based Loss

Output layer loss:

  L_f^E(p, y) = −(1/N) ∑_{n=1}^{N} (1 − 2p(n)) (−1)^{y(n)},

where p(n) is the network output probability of the nth bit at the final output layer.

Multi-loss:

  L_m^E(p, y) = −(1/(M N)) ∑_{l=2,4,...}^{2M} ∑_{n=1}^{N} (1 − 2p(l, n)) (−1)^{y(n)},

where p(l, n) is the network output probability of the nth bit at the (l + 1)th output layer.
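The corresponding NumPy sketch; as written, the loss decreases when the soft bit value (1 − 2p) has the same sign as the target symbol (−1)^y (illustrative only, not the thesis code):

```python
import numpy as np

def energy_loss(p, y):
    """Energy-based final-layer loss: correlation of the soft bit (1 - 2p)
    with the target sign (-1)^y, negated so that alignment lowers the loss."""
    return -float(np.mean((1.0 - 2.0 * p) * ((-1.0) ** y)))

def multiloss_energy(p_layers, y):
    """Average the energy loss over all (even) output layers."""
    return float(np.mean([energy_loss(p_l, y) for p_l in p_layers]))
```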
Neural Network Decoders
Hyper-Parameters
Cross Entropy vs Energy based Loss
Figure: Comparison of Cross entropy and Energy based loss functions for LLR output of a single bit, given the target bit y = 0.
Neural Network Decoders
Hyper-Parameters
Cross Entropy vs Energy based Loss
Figure: BER performance comparison of the energy-based and cross-entropy loss functions for the (32,16) polar code.
Neural Network Decoders
Hyper-Parameters
SNR of the Training Data

• The NND should perform optimally during online operation, with data generated at any SNR value.
• Training at extreme SNR values is problematic: at low SNR there are too many errors, at high SNR too few, which restricts learning of patterns in the code structure.
• Training data can be created using (1) a fixed SNR, or (2) a range of SNRs.
Neural Network Decoders
Hyper-Parameters
Comparison of SNR values for training
The Normalized Validation Score (NVS) is calculated for a training SNR value ζt by averaging over the ratio of BER for Neural Network Decoder (NND) and SPA, evaluated for a validation data set generated with a set of SNR values ρv,s , s = {1, . . . , S} using the network trained at SNR ζt . 1 ∑ BERNND (ζt , ρv,s ) S BERSPA (ρv,s ) S
NVS(ζt ) =
s=1
Figure: Comparison of SNR values for training (32,16) polar code.
Navneet Agrawal
Master Thesis presentation
31 / 52
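A direct transcription of the NVS definition; the BER arrays would come from simulation, and the numbers below are placeholders rather than measured values.

```python
import numpy as np

def normalized_validation_score(ber_nnd, ber_spa):
    """NVS for one training SNR: mean over validation SNRs of BER_NND / BER_SPA."""
    return float(np.mean(np.asarray(ber_nnd) / np.asarray(ber_spa)))

# Placeholder BERs at validation SNRs -3, ..., 9 dB (13 points, not measured values).
ber_spa = np.array([2e-1, 1e-1, 5e-2, 2e-2, 8e-3, 3e-3, 1e-3,
                    3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6])
ber_nnd = 0.8 * ber_spa                                 # hypothetical 20% BER reduction
print(normalized_validation_score(ber_nnd, ber_spa))    # < 1 means the NND beats the SPA
```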
Neural Network Decoders
Hyper-Parameters
Training Length and Stopping Criteria

The training length of the NND depends on the number of learnable parameters and the learning rate.

Problems with long training:
• Deviation from the global optimum due to a high learning rate.
• Overfitting the training data, which leads to poor online performance.

Solution: early stopping. The NND trains and validates its performance for a specific number of epochs, and then selects the state of the NND that gave the best performance on the validation set.
Experiments and Results

Simulation and Software Implementation
Edge Weight Analysis
Decoding Results
Experiments and Results
Experimental Setup
Simulation and Software Implementation

• Programming language: Python 2.7 (+ NumPy 1.13)
• Neural networks: TensorFlow 1.2 (+ Python API)
• SPA decoder: open-source implementation by Radford M. Neal

Workflow:
1. Start: gather parameters, program configuration.
2. Initialize: NND TensorFlow graph, communication system.
3. Training/test process: start a TF session; loop over (1) train n batches and (2) validate the model, until (3) the stopping criterion is met; then save the trained weights.
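The training/test process above, written as a schematic train/validate loop with the early-stopping criterion described earlier; the callables and the dummy score history are placeholders, not the thesis API.

```python
def train_with_early_stopping(train_step, validate, total_epochs, validate_every, patience):
    """Schematic loop: train, periodically validate, keep the best state, and stop
    once the validation score has not improved for `patience` epochs.
    train_step() trains one batch; validate() returns a score to minimize (e.g. BER)."""
    best_score, best_epoch, best_state = float("inf"), 0, None
    for epoch in range(1, total_epochs + 1):
        state = train_step()
        if epoch % validate_every == 0:
            score = validate()
            if score < best_score:
                best_score, best_epoch, best_state = score, epoch, state
            elif epoch - best_epoch >= patience:
                break                                   # early stopping
    return best_state, best_score

# Toy usage with dummy callables (no real NND involved).
history = iter([0.10, 0.07, 0.05, 0.06, 0.06, 0.07, 0.08])
best_state, best_score = train_with_early_stopping(
    train_step=lambda: "weights-snapshot",
    validate=lambda: next(history, 0.09),
    total_epochs=3500, validate_every=500, patience=1500)
print(best_state, best_score)
```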
Experiments and Results
Edge Weight Analysis
Learned Weight Analysis

Figure: Learned weight distribution over the edges of the Tanner graph of the (7,4) Hamming code.
Experiments and Results
Edge Weight Analysis
Learned Weight Analysis

Tree-structured (7,4) code:

  H = [ 1 0 0 1 0 0 0
        1 1 0 0 1 0 0
        0 0 1 0 1 1 1 ]

Figure: Learned weight distribution over the edges for the tree-structured (7,4) code.
Experiments and Results
Edge Weight Analysis
Evolution of Weights in Consecutive Layers

Figure: Evolution of weights across layers for (a) 20 SPA iterations and (b) 5 SPA iterations, for different NND architectures.
Experiments and Results
Decoding Results
Decoding Results

Table: Codes evaluated for their decoding performance with the NND.

Code           | Rate | dmin | ne   | NND vs SPA²
(32,16) polar  | 0.5  | 8    | 192  | 4.0 dB
(32,24) polar  | 0.75 | 3    | 128  | 2.0 dB
(128,64) polar | 0.5  | 10   | 1792 | 3.0 dB
(63,45) BCH    | 0.71 | -    | 432  | 1.5 dB
(96,48) LDPC   | 0.5  | -    | 296  | -

² Performance at SNR 6.0 dB.
Experiments and Results
Decoding Results
(32, 16) polar code
Experiments and Results
Decoding Results
(32, 16) polar code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Experiments and Results
Decoding Results
Constrained vs Fully-connected NND
(a) Constrained. (b) Fully connected.
Experiments and Results
Decoding Results
(32, 24) polar code
Experiments and Results
Decoding Results
(32, 24) polar code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Experiments and Results
Decoding Results
(128, 64) polar code
Experiments and Results
Decoding Results
(128, 64) polar code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Experiments and Results
Decoding Results
(63, 45) BCH code
Experiments and Results
Decoding Results
(63, 45) BCH code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Experiments and Results
Decoding Results
(96, 48) LDPC code
Experiments and Results
Decoding Results
(96, 48) LDPC code
(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.
Conclusions and Future Work
Conclusions
• The presented algorithm, the NND, implements the SPA using a neural network, benefiting from properties of both neural networks and the SPA.
• The NND shows promising results for High Density Parity Check (HDPC) codes such as BCH or polar codes.
• The network learns to assign complementary weights to edges forming cycles, mitigating the degradation that cycles cause in the SPA.
• The selection of various hyper-parameters affects the NND's online performance. The presented analysis helps in selecting appropriate hyper-parameters for any family or size of code.
• A carefully designed loss function may improve the NND's performance, especially when the training data represents only a small subset of the real data.
• The NND is incapable of achieving the ML decoding threshold due to its restricted design and operations (owing to the SPA).
• The complexity of the NND grows with the density of the parity check matrix and the number of SPA iterations, which makes it difficult to implement the NND for medium- to long-length HDPC codes.
Conclusions and Future Work
Future Work
• Code design: codes designed to perform well with the NND.
• Generalized NND: better inference using message passing on the broader family of systems represented by factor graphs.
• Hyper-parameter optimization: Bayesian optimization of the neural network hyper-parameters.
• Intelligent receiver: combined channel synchronizer and decoder as two separate neural networks trained to perform optimally in tandem.
• Quantization of NND weights to fewer bits, enabling fast physical-layer implementation.
Conclusions and Future Work
Future Work
Thank you for your time.
Questions?