Machine Intelligence in Forward Error Correction Decoding

Supervisors: Dr. Hugo Tullberg (Ericsson Research), Prof. Ragnar Thobaben (KTH)

Navneet Agrawal

Master Thesis presentation


Acknowledgment

Dr. Hugo Tullberg, Supervisor
Maria Edvardsson, Manager
The entire Ericsson Research team
Special mention: Mathias Andersson, Nicolas Seyvet, Vidit Saxena, and Maria Fresia


Contents

Background - channel coding, factor graphs, the Sum-Product Algorithm
Neural Network Decoder - design and analysis
Experiments and results
Conclusions and future work


Background

Background Introduction

Communication system
Introduction to coding theory
Factor graphs
Decoding using the Sum-Product Algorithm


Background

Communication System

Communication System Model

Source:      b ∈ {0, 1}^k
Encoder:     s = b^T ⊗ G,  s ∈ C,  G ∈ {0, 1}^{k×n}
Modulator:   y_i = (−1)^{s_i}
Channel:     r = y + n,  n_i ∼ N(0, σ_n²)
Demodulator: p(r_i | y_i)
Decoder:     ŷ = argmax_{y : s ∈ C} p(r | y)
Sink:        b̂ = ŝ ⊗ G^{−1}

Background

Channel Coding Theory

Channel Coding Basics

Channel coding adds redundancy in order to recover information lost during transmission.
A linear block code C(n, k) forms n linear combinations of k information bits, where n > k.
The rate of a code is the amount of information per transmitted bit, r = k/n.
The Hamming distance d_H between two codewords u, v ∈ C is the number of positions at which u differs from v.
The minimum distance d_min is the minimum Hamming distance between any two distinct codewords u, v ∈ C, u ≠ v.
The maximum number of errors that a code C(n, k) is guaranteed to correct is t ≤ ⌊(d_min − 1)/2⌋.
The information bits are encoded by multiplication with the generator matrix G of size [k × n]: s = b^T ⊗ G.
The rows of the parity-check matrix H (size [(n − k) × n]) form a basis of the dual code C⊥ = {v ∈ GF(2)^n : s · v^T = 0, ∀ s ∈ C}, so that H ⊗ s^T = 0^T, ∀ s ∈ C.

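These definitions are easy to check numerically. Below is a minimal NumPy sketch (ours, not the thesis code) that builds an assumed systematic generator/parity-check pair for the (7,4) Hamming code, encodes a message, and verifies the syndrome.

```python
import numpy as np

# Assumed systematic (7,4) Hamming code: G = [I_k | P], H = [P^T | I_{n-k}]
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P])      # generator matrix, size [k x n]
H = np.hstack([P.T, np.eye(3, dtype=int)])    # parity-check matrix, size [(n-k) x n]

b = np.array([1, 0, 1, 1])                    # k information bits
s = b @ G % 2                                 # codeword s = b^T G over GF(2)

print("codeword :", s)
print("syndrome :", H @ s % 2)                # all-zero syndrome <=> s is a valid codeword
print("rate r =", G.shape[0] / G.shape[1])    # 4/7; d_min = 3, so t <= 1 error is corrected
```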

Background

Channel Coding Theory

Maximum Likelihood Decoding

The Maximum Likelihood (ML) decoding solution for bit y_i is

ŷ_i^ML = argmax_{y_i : s ∈ C} Σ_{∼y_i} ( Π_{j=1}^{n} p(r_j | y_j) ) 1_{s∈C},

where 1_{f} is the code-membership (indicator) function, equal to 1 if f is true and 0 otherwise.

The ML decoding problem is NP-hard, with complexity growing exponentially with the length of the code, on the order of k_c 2^{min(n−k, k)}.

The code-membership function 1_{s∈C} has the factorized form

1_{s∈C} = Π_{k=1}^{K} 1_{Σ s_k = 0},

where s_k denotes the subset of bits of s that must satisfy the parity check Σ s_k = 0.

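For intuition, a brute-force ML decoder is only a few lines (a sketch under the BPSK/AWGN model of the system slide, where maximizing p(r|y) is equivalent to minimizing the Euclidean distance between r and the modulated codeword). It enumerates all 2^k messages, which is exactly the exponential cost quoted above, so it is only feasible for very short codes such as the (7,4) example; G can be, e.g., the systematic generator matrix from the previous sketch.

```python
import itertools
import numpy as np

def ml_decode(r, G):
    """Brute-force ML decoding: score every codeword, return the one closest to r."""
    k, n = G.shape
    best_dist, best_s = np.inf, None
    for bits in itertools.product([0, 1], repeat=k):   # 2^k candidate messages
        s = np.array(bits) @ G % 2                     # candidate codeword
        y = (-1.0) ** s                                # BPSK modulation y = (-1)^s
        dist = np.sum((r - y) ** 2)                    # AWGN: max p(r|y) <=> min distance
        if dist < best_dist:
            best_dist, best_s = dist, s
    return best_s
```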

Background

Factor Graphs

Factor Graphs

A factor graph is a systematic way to apply the distributive law (ab + ac = a(b + c)).

Example: consider a function f with the factorization

f(x_1, x_2, x_3, x_4, x_5) = f_1(x_1, x_5) f_2(x_1, x_4) f_3(x_2, x_3, x_4) f_4(x_4),

where each variable x_i ∈ {a_1, . . . , a_q}, a_j ∈ ℝ. The marginal f(x_1) is computed by summing out the other variables:

f(x_1) = Σ_{∼x_1} f(x_1, x_2, x_3, x_4, x_5)   (marginal of products)
       = [ Σ_{x_5} f_1(x_1, x_5) ] [ Σ_{x_4} f_2(x_1, x_4) f_4(x_4) ( Σ_{x_2,x_3} f_3(x_2, x_3, x_4) ) ]   (product of marginals)

The marginal-of-products form requires 4q^5 sums and multiplications, while the product-of-marginals form requires only 2q^2 + 6q^4.

Figure: Factor graph of the function f(x_1, x_2, x_3, x_4, x_5).
Figure: Bipartite graph representation of f.

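The operation-count argument can be checked numerically. The sketch below (ours) evaluates the marginal f(x_1) both ways for random factor tables over an alphabet of size q and confirms that the two orderings agree.

```python
import numpy as np

q = 4
rng = np.random.default_rng(1)
# factor tables, indexed as f1[x1, x5], f2[x1, x4], f3[x2, x3, x4], f4[x4]
f1 = rng.random((q, q))
f2 = rng.random((q, q))
f3 = rng.random((q, q, q))
f4 = rng.random(q)

# marginal of products: sum the full product over x2..x5 (O(q^5) work)
direct = np.einsum('ae,ad,bcd,d->a', f1, f2, f3, f4)

# product of marginals: push the sums inside the factorization (distributive law)
g = f3.sum(axis=(0, 1))              # sum_{x2,x3} f3(x2,x3,x4)          -> g(x4)
h = f2 @ (f4 * g)                    # sum_{x4} f2(x1,x4) f4(x4) g(x4)   -> h(x1)
factored = f1.sum(axis=1) * h        # [sum_{x5} f1(x1,x5)] * h(x1)

print(np.allclose(direct, factored))  # True: both orderings give the same marginal
```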

Background

Factor Graphs

Marginalization via Message Passing

Messages µ(x) are passed along the edges of the graph.

Sum-Product Algorithm (SPA):

⋆ Summary, factor node f_j to variable node x_i:

µ_{f_j → x_i}(x_i) = Σ_{∼{x_i}} ( f_j(n(f_j)) Π_{x_k ∈ n(f_j)\{x_i}} µ_{x_k → f_j}(x_k) ),

⋆ Product, variable node x_i to factor node f_j:

µ_{x_i → f_j}(x_i) = Π_{f_k ∈ n(x_i)\{f_j}} µ_{f_k → x_i}(x_i).

Figure: Message passing on the factor graph of f.

Background

Decoding using Sum Product Algorithm

Decoding using the Sum-Product Algorithm

Tanner graph, built from the parity-check matrix H of size [(n − k) × n]:
• {v_1, . . . , v_n} variable nodes
• {c_1, . . . , c_{n−k}} check (factor) nodes
• an edge between v_i and c_j if H[j, i] = 1
• messages are the log-likelihood ratios (LLRs),
  γ_{v_i} = ln( p(r_{v_i} | y_{v_i} = +1) / p(r_{v_i} | y_{v_i} = −1) ) = 2 r_{v_i} / σ².

Hamming [7,4] code:

        1 0 1 1 1 0 0
  H =   1 1 1 0 0 1 0
        0 1 1 1 0 0 1

Summary, check to variable node:

µ_{c_j → v_i} = Σ_{∼{v_i}} f_{c_j} Π_{v_k ∈ n(c_j)\{v_i}} µ_{v_k → c_j}(v_k)
⇒ γ_{c_j → v_i} = 2 tanh^{−1} ( Π_{v_k ∈ n(c_j)\{v_i}} tanh( γ_{v_k → c_j} / 2 ) ).

Product, variable to check node:

µ_{v_i → c_j} = Π_{c_k ∈ n(v_i)\{c_j}} µ_{c_k → v_i}
⇒ γ_{v_i → c_j} = γ_{v_i} + Σ_{c_k ∈ n(v_i)\{c_j}} γ_{c_k → v_i}.

Figure: Tanner graph of the (7,4) Hamming code.

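A compact LLR-domain implementation of these two update rules (a sketch, not the thesis code), using the (7,4) Hamming parity-check matrix above, the BPSK/AWGN model from the system slide, and message clipping to keep tanh⁻¹ finite.

```python
import numpy as np

H = np.array([[1, 0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])   # (7,4) Hamming parity-check matrix from the slide above

def spa_decode(r, sigma2, H, n_iter=5):
    """LLR-domain sum-product decoding of a received BPSK+AWGN vector r."""
    m, n = H.shape
    gamma = 2.0 * r / sigma2                     # channel LLRs, gamma_v = 2 r_v / sigma^2
    V = H * gamma                                # variable-to-check messages (one row per check)
    x_hat = (gamma < 0).astype(int)
    for _ in range(n_iter):
        # check-to-variable (tanh rule); the extrinsic product excludes the target edge
        T = np.tanh(np.clip(V, -20, 20) / 2.0)
        T[H == 0] = 1.0                          # neutral elements so row products run over edges only
        row_prod = np.prod(T, axis=1, keepdims=True)
        C = H * 2.0 * np.arctanh(np.clip(row_prod / T, -0.999999, 0.999999))
        # variable-to-check: channel LLR plus all incoming check messages except the target edge
        total = gamma + C.sum(axis=0)
        V = H * (total - C)
        x_hat = (total < 0).astype(int)          # LLR < 0  =>  bit 1 (since y = (-1)^s)
        if not (H @ x_hat % 2).any():            # stop early once the syndrome is satisfied
            break
    return x_hat

# example: all-zero codeword (all +1 symbols) at Eb/N0 = 4 dB, rate 4/7
rng = np.random.default_rng(0)
sigma2 = 1.0 / (2 * (4 / 7) * 10 ** (4.0 / 10))
r = 1.0 + rng.normal(0.0, np.sqrt(sigma2), H.shape[1])
print(spa_decode(r, sigma2, H))
```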

Background

Decoding using Sum Product Algorithm

Tanner Graphs with Cycles - Good and Bad

Good codes have cycles:
• Without cycles, the code contains at least ((2r − 1)/2) n codewords with Hamming weight 2.

SPA performs poorly with cycles:
• It no longer performs MAP decoding, since the factors are not independent.
• Performance worsens for codes with many small-girth cycles, such as BCH or polar codes.
• The iterative nature increases the latency of the decoder.

Figure: SPA over a Tanner graph with cycles. Nodes v0 and v2 form a cycle with c0 and c1. The information received by node v0 contains information from v2, and in the next iteration v2 will receive its own information back from v0 via c1. There is no exact expression for the marginalization of v0 or v2.

Neural Network Decoders

Related Work and Contributions
NND Design and Operations
Hyper-Parameters


Neural Network Decoders

Related work

Sub-optimal decoding:
• Iterative decoding: Belief Propagation / SPA — R. Tanner, A recursive approach to low complexity codes, 1981; Hagenauer et al., Iterative decoding of binary block and convolutional codes, 1996.
• Linear programming relaxation — J. Feldman et al., Using linear programming to decode binary linear codes, 2005.

Data-driven decoding of structured codes:
• Functional similarity of error-correcting codes with neural networks — J. Bruck and M. Blaum, Neural networks, error-correcting codes, and polynomials over the binary n-cube, 1989.
• Feed-forward neural network decoders — El-Khamy et al., Soft decision decoding of block codes using artificial neural network, 1995.
• Hopfield network decoders — Esposito et al., A neural network for error correcting decoding of binary linear codes, 1994.
• Random neural network decoder — El-Khamy et al., Random neural network decoder for error correcting codes, 1999.
• Learning the structure of linear codes with a deep neural network — Gruber et al., On Deep Learning-Based Channel Decoding, 2017.

Neural Network Decoder (NND): an SPA-based implementation for data-aided learning — Nachmani et al., Learning to Decode Linear Codes Using Deep Learning, 2016.


Neural Network Decoders

Contributions

Elucidate the design and construction of the NND.
Analyze the various parameters affecting the training and online performance of the NND.
Introduce a new loss metric that improves online performance.
Provide deeper insight into the workings of the NND based on the learned weight distribution.
Analyze the NND's performance on different families and sizes of linear block codes.


Neural Network Decoders

NND Design and Operations

SPA over an Unrolled Tanner Graph

Figure: Unrolled Tanner graph for 2 iterations of SPA for the (7,4) Hamming code.


Neural Network Decoders

NND Design and Operations

SPA using Neural Network over edges of Tanner graph

Figure: Network graph of the SPA-NN, whose nodes are the edges of the Tanner graph.


Neural Network Decoders

NND Design and Operations

Neural Network Decoder with Learn-able Weights

Figure: Dashed lines show the connections carrying learnable weights.


Neural Network Decoders

NND Design and Operations

Neural Network Decoder - Operations

Odd layer:

X_i = tanh( (1/2) ( W_i2o^T ⊗ L + W_e2o^T ⊗ X_{i−1} ) ),

where L is the channel information and X_{i−1} is the output of the previous even layer.

Even layer:

X_i = 2 tanh^{−1}( X*_{i−1} ),

where X*_{i−1} is obtained by applying matrix transformations to the previous odd-layer output, equivalent to taking the product of the elements of X_{i−1} corresponding to W_o2e = 1 along each row.

Output layer:

L̂_i = L + W_e2x^T ⊗ X_{i−1}

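A NumPy sketch (ours) of one unweighted forward pass through these layers, i.e. with all learnable weights fixed to 1, where the NND reduces to the SPA. The hidden nodes are the Tanner-graph edges, and the four binary masks play the roles of W_i2o, W_e2o, W_o2e and W_e2x; the toy H in the usage example is only for illustration.

```python
import numpy as np

def build_edge_masks(H):
    """Binary connectivity masks for an NND whose hidden nodes are the Tanner-graph edges.
    Edge e is the pair (check j, variable i) with H[j, i] = 1."""
    m, n = H.shape
    edges = [(j, i) for j in range(m) for i in range(n) if H[j, i]]
    ne = len(edges)
    W_i2o = np.zeros((n, ne))    # channel LLR of variable i feeds every edge (., i)
    W_e2o = np.zeros((ne, ne))   # odd layer: edges sharing the variable, different check
    W_o2e = np.zeros((ne, ne))   # even layer: edges sharing the check, different variable
    W_e2x = np.zeros((ne, n))    # output layer: edges incident to variable i
    for e, (j, i) in enumerate(edges):
        W_i2o[i, e] = 1.0
        W_e2x[e, i] = 1.0
        for f, (jp, ip) in enumerate(edges):
            if ip == i and jp != j:
                W_e2o[f, e] = 1.0
            if jp == j and ip != i:
                W_o2e[f, e] = 1.0
    return W_i2o, W_e2o, W_o2e, W_e2x

def nnd_iteration(L, x_even_prev, W_i2o, W_e2o, W_o2e, W_e2x):
    """One odd/even/output pass with all learnable weights equal to 1 (SPA-equivalent)."""
    x_odd = np.tanh(0.5 * (W_i2o.T @ L + W_e2o.T @ x_even_prev))          # variable-to-check
    prods = np.array([np.prod(x_odd[W_o2e[:, e] == 1]) for e in range(len(x_odd))])
    x_even = 2.0 * np.arctanh(np.clip(prods, -0.999999, 0.999999))        # check-to-variable
    L_hat = L + W_e2x.T @ x_even                                          # output-layer LLRs
    return x_even, L_hat

# example: one pass for a toy parity-check matrix and arbitrary channel LLRs
H = np.array([[1, 1, 0, 1], [0, 1, 1, 1]])
masks = build_edge_masks(H)
L = np.array([2.3, -0.4, 1.1, 0.7])
x_even, L_hat = nnd_iteration(L, np.zeros(int(H.sum())), *masks)
```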

Neural Network Decoders

NND Design and Operations

Computational Complexity

Operation       Multiplications            Nodes
Input layer     —                          n
Odd layer       n_e (n_e + n + 1)          n_e
Even layer      n_e (2 n_e + 1)            n_e
Output layer    n_e n                      n
Total           3 n_e² + 2 n_e (n + 1)     2(n + n_e)

Table: Number of multiplications and nodes for one SPA iteration in the NND.

Figure: Comparison of graph size (bars) and total number of multiplications (line) required for NNDs of different codes built for 5 SPA iterations.

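The per-iteration totals from the table can be plugged in directly. A small helper (ours), evaluated here for the (32,16) polar code whose Tanner graph has n_e = 192 edges according to the results table later in the deck:

```python
def nnd_complexity(n, ne, iterations):
    """Multiplications and node counts, using the per-iteration totals from the table above."""
    mults = iterations * (3 * ne ** 2 + 2 * ne * (n + 1))
    nodes = iterations * 2 * (n + ne)
    return mults, nodes

print(nnd_complexity(n=32, ne=192, iterations=5))   # (32,16) polar code, 5 SPA iterations
```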

Neural Network Decoders

Hyper-Parameters

Hyper-parameters for Training the NND

Class          Parameter                         Value                                 Typical
Design         Code                              (n, k), type                          (32,16) polar
               Parity-check matrix               binary matrix                         —
               Number of SPA iterations          integer                               5
               Network architecture              FF or RNN                             RNN
Weights        Train input weights (W̃_i2o)       True or False                         False
               Train output weights (W̃_e2x)      True or False                         False
               Weights initialization            random or fixed                       fixed
Optimization   Loss function                     cross-entropy, syndrome or energy     cross-entropy
               Loss function type                single or multiple                    multiple
               Optimizer                         RMSProp                               RMSProp
               Learning rate                     float (< 1.0)                         0.001
Training       Training codeword type            0 or random                           0
               SNR training (dB)                 float or array                        [2.0]
               SNR validation (dB)               float or array                        {−3, −2, . . . , 9}
               Training batch length             integer                               120
               Validation batch length           integer                               600
               Total training epochs             integer                               2^18
               Validate after n epochs           integer                               500
               LLR clipping                      integer                               20

Table: Parameters required to set up the NND for training; the rightmost column gives a typical example of parameter settings for reference.

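In code, such a setup is conveniently kept as a single configuration dictionary. The sketch below mirrors the "Typical" column; the key names are ours, not the thesis code.

```python
nnd_config = {
    "code": "(32,16) polar",
    "spa_iterations": 5,
    "architecture": "RNN",             # weights shared across iterations
    "train_input_weights": False,      # W_i2o fixed
    "train_output_weights": False,     # W_e2x fixed
    "weight_init": "fixed",            # start from the SPA-equivalent decoder
    "loss": "cross-entropy",
    "loss_type": "multiple",           # multi-loss over all output layers
    "optimizer": "RMSProp",
    "learning_rate": 1e-3,
    "training_codewords": "all-zero",
    "snr_train_db": [2.0],
    "snr_validation_db": list(range(-3, 10)),
    "train_batch": 120,
    "validation_batch": 600,
    "total_epochs": 2 ** 18,
    "validate_every": 500,
    "llr_clip": 20,
}
```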

Neural Network Decoders

Hyper-Parameters

Common Parameters

Code C(n, k): NND design and performance are specific to the Tanner graph of the code.
LLR clipping: tanh^{−1} causes sudden explosions in LLR values; values are clipped to [−20, 20].
Weight settings:
• Selection: the even-to-odd layer weights (W_e2o) are mandatory; the other weights (W_i2o, W_e2x) can be ignored.
• Initialization: weights are initialized with fixed values, making the NND perform equivalently to the SPA.
• Quantization: 32-bit floating-point numbers are used.
Optimizer: adaptive stochastic gradient descent optimizer RMSProp.
Learning rate: varies with the size of the network and the number of learnable parameters.
Training codewords: training on "all-zero" codewords is allowed due to the symmetry property of the SPA.¹

¹ Definition 4.81 in Richardson and Urbanke, Modern Coding Theory, Cambridge University Press.

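A sketch of the training-data generation implied by these settings: all-zero codewords, BPSK-mapped to +1, passed through an AWGN channel at the chosen SNR, converted to channel LLRs and clipped. Function and argument names are ours, and the SNR-to-noise-variance conversion assumes the SNR is given as Eb/N0 in dB.

```python
import numpy as np

def training_batch(n, batch_size, snr_db, rate, rng, llr_clip=20.0):
    """All-zero-codeword batch: +1 BPSK symbols over AWGN, returned as clipped channel LLRs."""
    sigma2 = 1.0 / (2.0 * rate * 10 ** (snr_db / 10.0))    # noise variance for Eb/N0 = snr_db
    r = 1.0 + rng.normal(0.0, np.sqrt(sigma2), size=(batch_size, n))
    llr = 2.0 * r / sigma2                                 # gamma = 2 r / sigma^2
    return np.clip(llr, -llr_clip, llr_clip)

batch = training_batch(n=32, batch_size=120, snr_db=2.0, rate=0.5,
                       rng=np.random.default_rng(0))
```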

Neural Network Decoders

Hyper-Parameters

Number of SPA iterations

The NND graph is designed for a fixed number of SPA iterations (n_i).
The size of the graph and the number of learnable parameters grow with n_i.
The performance improvement with increasing n_i is not linear.
Deeper networks are harder to train.


Neural Network Decoders

Hyper-Parameters

Network Architecture

The NND is characterized by a set of operations involving learnable weights W in each iteration.

Feed-forward architecture:
• Learnable parameters are trained separately for each iteration.
• Leads to a higher degree of freedom for the model.

Recurrent architecture:
• Learnable parameters are shared across iterations.
• Leads to constraints on the learnable parameters and regularization of the model.
• The NND performs the same operations in each iteration - similar to the SPA.


Neural Network Decoders

Hyper-Parameters

Loss Function - Cross Entropy

Output-layer loss:

L_f^CE(p, y) = −(1/N) Σ_{n=1}^{N} [ y(n) log(1 − p(n)) + (1 − y(n)) log p(n) ],

where p(n) is the estimated probability of y(n) = 0 obtained from the final output layer, and y is the binary vector of the target codeword.

Multi-loss:

L_m^CE(p, y) = −(1/(N L)) Σ_{l=2,4,...,2L} Σ_{n=1}^{N} [ y(n) log(1 − p(l, n)) + (1 − y(n)) log p(l, n) ],

where p(l, n) is the network output probability of the n-th bit at the (l + 1)-th output layer.

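Both variants in NumPy form (a sketch following the slide's convention that p is the estimated probability of a 0 bit; a small epsilon guards the logarithms):

```python
import numpy as np

def ce_loss_final(p, y, eps=1e-12):
    """Final-output-layer cross-entropy; p[n] = estimated Pr(bit n = 0), y = target bits."""
    return -np.mean(y * np.log(1.0 - p + eps) + (1.0 - y) * np.log(p + eps))

def ce_loss_multi(p_layers, y, eps=1e-12):
    """Multi-loss: the same cross-entropy averaged over all L output layers.
    p_layers has shape (L, N), one row per output layer l = 2, 4, ..., 2L."""
    return np.mean([ce_loss_final(p, y, eps) for p in p_layers])
```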

Neural Network Decoders

Hyper-Parameters

Loss Function - Cross Entropy with Syndrome Check

Output-layer loss:

L_f^SC(p, y) = −(1/N) Σ_{n=1}^{N} [ y(n) log(1 − p(n)) + (1 − y(n)) log p(n) ],

where p(n) is the network output probability of the n-th bit at the first output layer at which the syndrome check is satisfied.

Multi-loss:

L_m^SC(p, y) = −(1/(M N)) Σ_{l=2,4,...,2M} Σ_{n=1}^{N} [ y(n) log(1 − p(l, n)) + (1 − y(n)) log p(l, n) ],

where p(l, n) is the network output probability of the n-th bit at the (l + 1)-th output layer. If the syndrome check is satisfied at layer 0 < k < 2L, then 2M = k; otherwise 2M = 2L.


Neural Network Decoders

Hyper-Parameters

Cross Entropy Loss - With and Without Syndrome check

Figure: BLER performance comparison for Cross-Entropy loss with and without syndrome check.


Neural Network Decoders

Hyper-Parameters

Loss Function - Energy-Based Loss

Output-layer loss:

L_f^E(p, y) = −(1/N) Σ_{n=1}^{N} (1 − 2 p(n)) (−1)^{y(n)},

where p(n) is the network output probability of the n-th bit at the final output layer.

Multi-loss:

L_m^E(p, y) = −(1/(M N)) Σ_{l=2,4,...,2M} Σ_{n=1}^{N} (1 − 2 p(l, n)) (−1)^{y(n)},

where p(l, n) is the network output probability of the n-th bit at the (l + 1)-th output layer.

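The corresponding NumPy sketch, written exactly as the slide defines it (the interpretation of p follows the slide; 1 − 2p acts as a soft bit matched against the BPSK-mapped target (−1)^y):

```python
import numpy as np

def energy_loss_final(p, y):
    """Energy-based loss at the final output layer, as defined on the slide."""
    return -np.mean((1.0 - 2.0 * p) * (-1.0) ** y)

def energy_loss_multi(p_layers, y):
    """Multi-loss version, averaged over the 2M output layers taken into account."""
    return np.mean([energy_loss_final(p, y) for p in p_layers])
```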

Neural Network Decoders

Hyper-Parameters

Cross Entropy vs Energy based Loss

Figure: Comparison of Cross entropy and Energy based loss functions for LLR output of a single bit, given the target bit y = 0.


Neural Network Decoders

Hyper-Parameters

Cross Entropy vs Energy based Loss

Figure: BER performance comparison of Energy based and Cross-entropy loss functions for (32,16) polar code.


Neural Network Decoders

Hyper-Parameters

SNR of the Training data

The NND should perform optimally during online operation, on data generated at any SNR value.
Training at extreme SNR values restricts learning of the patterns in the code structure:
• low SNR: too many errors,
• high SNR: too few errors.
Training data can be created using 1) a fixed SNR, or 2) a range of SNRs.


Neural Network Decoders

Hyper-Parameters

Comparison of SNR values for training

The Normalized Validation Score (NVS) of a training SNR ζ_t is obtained by averaging the ratio of the BER of the NND (trained at ζ_t) to the BER of the SPA over a validation data set generated at SNR values ρ_{v,s}, s = 1, . . . , S:

NVS(ζ_t) = (1/S) Σ_{s=1}^{S} BER_NND(ζ_t, ρ_{v,s}) / BER_SPA(ρ_{v,s})

Figure: Comparison of SNR values for training the (32,16) polar code.

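NVS itself is a one-liner; the sketch below (array names are ours) also shows how the best training SNR would be picked from a set of candidates.

```python
import numpy as np

def nvs(ber_nnd, ber_spa):
    """Normalized Validation Score: mean over validation SNRs of BER_NND / BER_SPA."""
    return np.mean(np.asarray(ber_nnd) / np.asarray(ber_spa))

def best_training_snr(candidate_snrs, ber_nnd_per_snr, ber_spa):
    """ber_nnd_per_snr[t] holds the validation BERs of the NND trained at candidate_snrs[t]."""
    scores = [nvs(ber_nnd_per_snr[t], ber_spa) for t in range(len(candidate_snrs))]
    return candidate_snrs[int(np.argmin(scores))]   # lower NVS = closer to (or better than) SPA
```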

Neural Network Decoders

Hyper-Parameters

Training Length and Stopping Criteria

The training length of the NND depends on the number of learnable parameters and the learning rate.
Problems with long training:
• deviation from the global optimum due to a high learning rate,
• overfitting the training data, which leads to poor online performance.
Solution: early stopping.
Stopping criterion: the NND trains and validates its performance for a specified number of epochs, then selects the NND state that gave the best performance on the validation set.

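A generic early-stopping loop matching this criterion (a sketch: `train_step` and `validate_ber` are placeholders for the thesis' TensorFlow training and validation steps, and a Keras-style get_weights/set_weights interface is assumed for the model).

```python
def train_with_early_stopping(model, train_step, validate_ber, total_epochs, validate_every):
    """Keep the model state that achieved the lowest validation BER seen so far."""
    best_ber, best_weights = float("inf"), model.get_weights()
    for epoch in range(1, total_epochs + 1):
        train_step(model)                         # train n batches (placeholder)
        if epoch % validate_every == 0:
            ber = validate_ber(model)             # estimate BER on the validation set (placeholder)
            if ber < best_ber:
                best_ber, best_weights = ber, model.get_weights()
    model.set_weights(best_weights)               # restore the best validated state
    return model, best_ber
```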

Experiments and Results

Simulation and Software Implementation
Edge Weight Analysis
Decoding Results


Experiments and Results

Experimental Setup

Simulation and Software Implementation

Programming language: Python 2.7 (+ NumPy 1.13)
Neural networks: TensorFlow 1.2 (+ Python API)
SPA decoder: open-source implementation by Radford M. Neal

Workflow:
Start: gather parameters, program configuration →
Initialize: NND TensorFlow graph, communication system →
Training/test process: start a TF session; loop (1) train n batches, (2) validate the model, until (3) the stopping criterion is met; save the trained weights.

Experiments and Results

Edge Weight Analysis

Learned Weight Analysis

Figure: Learned weight distribution over the edges of the Tanner graph of the (7,4) Hamming code.

Experiments and Results

Edge Weight Analysis

Learned Weight Analysis - Tree-Structured (7,4) Code

Parity-check matrix of a (7,4) code whose Tanner graph is cycle-free (a tree):

        1 0 0 1 0 0 0
  H =   1 1 0 0 1 0 0
        0 0 1 0 1 1 1

Figure: Learned weight distribution over edges for the tree-structured (7,4) code.

Experiments and Results

Edge Weight Analysis

Evolution of Weights in Consecutive Layers

(a) 20 iterations   (b) 5 iterations

Figure: Evolution of the weights across consecutive layers, for different numbers of SPA iterations and different NND architectures.


Experiments and Results

Decoding Results

Table: Codes evaluated for their decoding performance with the NND.

Code              Rate    d_min   n_e     NND vs SPA²
(32,16) polar     0.5     8       192     4.0 dB
(32,24) polar     0.75    —       128     2.0 dB
(128,64) polar    0.5     —       1792    3.0 dB
(63,45) BCH       0.71    3       432     1.5 dB
(96,48) LDPC      0.5     10      296     —

² Performance at SNR 6.0 dB.


Experiments and Results

Decoding Results

(32, 16) polar code


Experiments and Results

Decoding Results

(32, 16) polar code

(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.


Experiments and Results

Decoding Results

Constrained vs Fully-connected NND

(a) Constrained

(b) Fully connected

Experiments and Results

Decoding Results

(32, 24) polar code


Experiments and Results

Decoding Results

(32, 24) polar code

(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.


Experiments and Results

Decoding Results

(128, 64) polar code


Experiments and Results

Decoding Results

(128, 64) polar code

(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.


Experiments and Results

Decoding Results

(63, 45) BCH code


Experiments and Results

Decoding Results

(63, 45) BCH code

(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.


Experiments and Results

Decoding Results

(96, 48) LDPC code


Experiments and Results

Decoding Results

(96, 48) LDPC code

(a) Edge-Weight distribution heat-map. (b) Edge-Weight distribution histogram.


Conclusions and Future Work

Conclusions

The presented algorithm, the NND, implements the SPA using a neural network, benefiting from the properties of both neural networks and the SPA.
The NND shows promising results for High-Density Parity-Check (HDPC) codes such as BCH and polar codes.
The network learns to assign complementary weights to edges that form cycles, mitigating the degradation that cycles cause in SPA performance.
The selection of the various hyper-parameters affects the NND's online performance. The presented analysis helps in selecting appropriate hyper-parameters for any family or size of code.
A carefully designed loss function may improve the NND's performance, especially when the training data represent only a small subset of the real data.
The NND is incapable of reaching the ML decoding threshold due to its restricted design and operations (inherited from the SPA).
The complexity of the NND grows with the density of the parity-check matrix and the number of SPA iterations, which restricts implementing the NND for medium- to long-length HDPC codes.


Conclusions and Future Work

Future Work

Code design: codes designed to perform well with the NND.
Generalized NND: make better inferences using message passing on the broader family of systems represented by factor graphs.
Hyper-parameter optimization: Bayesian optimization of the neural-network hyper-parameters.
Intelligent receiver: a combined channel synchronizer and decoder, built as two separate neural networks trained to perform optimally in tandem.
Quantization of the NND weights to fewer bits: enabling fast physical-layer implementation.


Conclusions and Future Work

Future Work

Thank you for your time.

Questions?

