ICICS-PCM 2003 15-18 December 2003 Singapore


A Generalised Framework for Convolutional Decoding Using a Recurrent Neural Network

P.J. Secker, S.M. Berber and Z.A. Salcic
Department of Electrical and Electronic Engineering
The University of Auckland, New Zealand

Abstract

This paper introduces a model of the conventional convolutional coding system based on representing encoder outputs as n-dimensional vectors in Euclidean space. Previously it has been shown that the gradient descent algorithm can be used for bit decoding at the receiver and can be implemented using a Recurrent Neural Network (RNN). In this paper we generalise the mathematical framework to the general rate 1/n encoder. Our simulation results confirm that the RNN decoder is capable of performing very close to the Viterbi decoder, and it has been found here to work extremely well for some simple convolutional codes.

Keywords: decoding, convolutional codes, recurrent neural networks

1 Introduction

The Viterbi Algorithm is a relatively efficient solution to the convolutional decoding problem. However, the undesirable growth of its complexity with encoder constraint length has encouraged further research into decoding techniques that are more efficient in hardware and computation. A common goal has been to develop a decoder that approaches the performance of the Viterbi decoder yet does not suffer from excessive hardware complexity. Artificial Neural Networks, with their massively parallel processing structure, have attracted attention as a solution to such problems in recent years [1][2]. In the following sections an alternative representation of the convolutional decoding problem is presented which is more readily applied to a neural network implementation: if the decoding problem is to be solved by a neural network structure, it is advantageous to reformulate it in a way that best lends itself to the new processing paradigm. This work is an extension of that done in [3][4][5], where only a few specific encoder cases were considered. Here the results have been generalised for any rate 1/n convolutional encoder, which allows easy simulation of the RNN decoder corresponding to any convolutional code.

2 Convolutional Encoder Output

The rate 1/n convolutional encoder generates an n-bit symbol vector for each message bit input to the encoder. Using the discrete time index s, defined by t_s := t + sΔt where Δt is the encoder sampling interval, we can represent the general rate 1/n encoder as shown in Fig. 1.


Figure 1: Rate 1/n convolutional encoder model

At time s, the L-bit sequence B(s) = [b(s), b(s − 1), ..., b(s − (L − 1))] is used to produce the n-bit codeword γ(B(s)) = [γ_1(B(s)), γ_2(B(s)), ..., γ_n(B(s))], where L is the constraint length of the encoder. If bipolar signalling is used, i.e. the bit mapping 0 → −1, 1 → +1, then it can be shown that for the general rate 1/n encoder the outputs are

\gamma_j(B(s)) = (-1) \prod_{i=1}^{L} b(s+1-i)^{g_{j,i}} (-1)^{g_{j,i}}          (1)

where the g_{j,i} are the elements of the encoder generator matrix G, and γ_j(B(s)) ∈ {−1, +1}. However, if we use the alternative bipolar bit mapping 0 → +1, 1 → −1 (as used in CDMA systems), then eq. (1) reduces to

\gamma_j(B(s)) = \prod_{i=1}^{L} b(s+1-i)^{g_{j,i}}          (2)

and γ_j(B(s)) ∈ {+1, −1}. From Fig. 1, γ(B(s)) can be viewed as a vector in an n-dimensional Euclidean space, with each component γ_j(B(s)) being a single coordinate in the j-th dimension. A diagram of all possible γ vectors at a single time s for the two-dimensional encoder case (rate 1/2) is shown in Fig. 2. This corresponds to a single stage in the decoder trellis diagram, where only one symbol vector is actually transmitted.
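To make eq. (2) concrete, the following Python sketch (ours, not from the paper; the function and variable names are illustrative) computes the n bipolar code symbols γ_j(B(s)) from a 0/1 generator matrix G and the bipolar register contents, assuming the 0 → +1, 1 → −1 mapping.

```python
import numpy as np

def encoder_output(G, window):
    """Symbol vector gamma(B(s)) of eq. (2) for a single time step.

    G      : (n, L) generator matrix with 0/1 entries.
    window : length-L bipolar register contents [b(s), b(s-1), ..., b(s-L+1)],
             each entry +1 or -1 (bit mapping 0 -> +1, 1 -> -1).
    Returns the n bipolar code symbols gamma_j(B(s)).
    """
    G = np.asarray(G)
    window = np.asarray(window)
    # gamma_j = prod_i b(s+1-i)^{g_{j,i}}: taps with g_{j,i} = 0 contribute 1.
    return np.prod(np.where(G == 1, window, 1), axis=1)

# Rate 1/2, L = 3 encoder used as the worked example later in the paper.
G = [[1, 0, 1],
     [1, 1, 1]]
print(encoder_output(G, [+1, -1, -1]))   # -> [-1  1]
```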

Figure 2: Rate 1/2 Convolutional Encoder Outputs

3 Channel Output

Continuing our vector analysis through to the receiver, the effect of channel noise can be modelled as acting on each symbol vector component γ_j individually. Applying an independent noise source w_j to each encoder output γ_j, for j = 1, 2, ..., n, we arrive at the model shown in Fig. 3.

Figure 3: Rate 1/n convolutional encoder with channel model

Thus at the receiver we observe a noisy symbol vector R(s) = [r_1(s), r_2(s), ..., r_n(s)] for each symbol transmitted at time s = 0, 1, .... For example, for the simple two-dimensional case of a rate 1/2 encoder with L = 3 and generator matrix

G = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix},

the components of the observed vector R(s) are

r_1(s) = \gamma_1(B(s)) + w_1 = b(s)\, b(s-2) + w_1          (3)

r_2(s) = \gamma_2(B(s)) + w_2 = b(s)\, b(s-1)\, b(s-2) + w_2          (4)

At time s we therefore have a single observed vector R(s) at the receiver, represented as the sum of the component vectors of γ(B(s)) and their corresponding additive noise vectors, as shown in Fig. 4.

Figure 4: Received vectors for the rate 1/2 encoder case

The diagram in Fig. 4 again corresponds to a single stage of the trellis diagram, where a single n-bit symbol vector must be chosen out of all possible symbol vectors for that stage. The coded symbol vector that is closest in Euclidean distance to the received symbol vector is the symbol most likely to have been transmitted at that time (taking no other symbol vectors into account), and this choice also yields the smallest noise vector W(s).
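The channel model of Fig. 3 is easy to simulate. The sketch below (again ours, not code from the paper) produces the received vector R(s) of eqs. (3)-(4) by adding an independent Gaussian noise sample to each code symbol; the noise standard deviation is an arbitrary illustrative value.

```python
import numpy as np

def received_vector(G, window, noise_std, rng=None):
    """Observed vector R(s) = gamma(B(s)) + W(s), as in eqs. (3)-(4).

    Each code symbol gamma_j(B(s)) is disturbed by an independent Gaussian
    noise sample w_j (the AWGN channel model of Fig. 3).
    """
    rng = np.random.default_rng() if rng is None else rng
    G = np.asarray(G)
    gamma = np.prod(np.where(G == 1, np.asarray(window), 1), axis=1)  # eq. (2)
    return gamma + rng.normal(0.0, noise_std, size=G.shape[0])

# Rate 1/2 example of eqs. (3)-(4): r1 = b(s)b(s-2) + w1 and
# r2 = b(s)b(s-1)b(s-2) + w2 for G = [[1,0,1],[1,1,1]].
print(received_vector([[1, 0, 1], [1, 1, 1]], [+1, -1, -1], noise_std=0.5))
```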

4 Decoding Objective Function

Clearly, at time s, the decoding problem is to find the encoded symbol vector γ(B(s)) which is closest to the received vector R(s). Likewise, for a sequence of T + 1 message bits encoded with a general rate 1/n encoder, the decoder's task is to find the sequence of symbol vectors γ(B(s+k)), for k = 0, 1, ..., T, which minimises the function f(B) of eq. (5):

f(B) = \sum_{k=0}^{T} \| R(s+k) - \gamma(B(s+k)) \|^{2} = \sum_{k=0}^{T} W^{2}(s+k)          (5)

f(B) can be viewed as an error energy function representing the sum of all noise vectors W squared, and T is referred to as the depth of the decoder. The terms R(s+k) and γ(B(s+k)) are both n-dimensional vectors (where n is determined by the encoder), and the double bars denote the Euclidean distance between the two vectors. There is a dependence between consecutive R and γ vectors due to the operation of the convolutional encoder, so minimising each term of the summation independently is not necessarily the same as minimising the overall function. The Euclidean distance calculation of eq. (5) can be expanded as shown in eq. (6):

f(B) = \sum_{k=0}^{T} \left[ (r_1(s+k) - \gamma_1(B(s+k)))^{2} + (r_2(s+k) - \gamma_2(B(s+k)))^{2} + \dots + (r_n(s+k) - \gamma_n(B(s+k)))^{2} \right]          (6)

Eq. (6) can be written more compactly with a second summation over each dimension j = 1, 2, ..., n, as shown in eq. (7):

f(B) = \sum_{k=0}^{T} \sum_{j=1}^{n} \left[ r_j(s+k) - \gamma_j(B(s+k)) \right]^{2}          (7)

By incorporating the specific encoder structure, we can express f(B) in terms of the message bits b(s) used to generate the codewords. Replacing s with s + k in eq. (2) and substituting into eq. (7), we arrive at eq. (8):

f(B) = \sum_{k=0}^{T} \sum_{j=1}^{n} \left[ r_j(s+k) - \prod_{i=1}^{L} b(s+k+1-i)^{g_{j,i}} \right]^{2}          (8)

The decoding task, at time s, is to find the optimum combination of information bits B = [b(s), b(s+1), ..., b(s+T)] which minimises f(B). Once a suitable minimum is found, we can make a decision on one or more message bits from the set {b(s), b(s+1), ..., b(s+T)}. The most reliable bit decision is that of b(s), since this bit has the largest number of subsequent received symbol vectors incorporated into the objective function.
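The objective function is straightforward to evaluate once the encoder structure is fixed. The sketch below (our illustration; the windowing convention is an assumption, since the paper does not spell out how bits before b(s) are treated) computes f(B) of eq. (8) for a candidate bipolar bit sequence, taking the L − 1 bits preceding b(s) as already-decided past decisions.

```python
import numpy as np

def objective(R, b, G):
    """Error energy f(B) of eq. (8) for one decoding window.

    R : (T+1, n) array of received vectors R(s), ..., R(s+T).
    b : bipolar array [b(s-L+1), ..., b(s-1), b(s), ..., b(s+T)] of length
        T + L; the first L-1 entries are past bits assumed already decided.
    G : (n, L) generator matrix with 0/1 entries.
    """
    G = np.asarray(G)
    b = np.asarray(b, dtype=float)
    L = G.shape[1]
    total = 0.0
    for k, r in enumerate(np.asarray(R, dtype=float)):
        window = b[k:k + L][::-1]                   # [b(s+k), ..., b(s+k-L+1)]
        gamma = np.prod(np.where(G == 1, window, 1.0), axis=1)     # eq. (2)
        total += float(np.sum((r - gamma) ** 2))
    return total
```

Minimising f(B) by exhaustively testing all 2^(T+1) candidate bit patterns quickly becomes impractical as T grows; the next section instead minimises it by gradient descent.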

5 Minimisation of the Objective Function

Now that we have an expression for the objective function f(B), an optimisation algorithm must be employed to minimise it. In [3] the gradient descent algorithm was used as the optimisation technique. We have an objective function f(B) and a design vector B = [b(s), b(s+1), ..., b(s+T)], which is an array of T + 1 information bits. The gradient descent algorithm can be employed to minimise f(B) with respect to this vector, yielding an estimate of the information bits. To do this we need the gradient of the function with respect to the design vector, as shown in eq. (9):

\nabla f(B) = \left[ \frac{\partial f(B)}{\partial b(s)}, \frac{\partial f(B)}{\partial b(s+1)}, \dots, \frac{\partial f(B)}{\partial b(s+T)} \right]          (9)

A generalised expression for the gradient term \partial f(B) / \partial b(s+a) for the general rate 1/n encoder has been derived as

\frac{\partial f(B)}{\partial b(s+a)} = \sum_{k=1}^{L} \sum_{j=1}^{n} (-2)\, g_{j,k} \left[ r_j(s+a+k-1) \prod_{\substack{i=1 \\ i \neq k}}^{L} b(s+a+k-i)^{g_{j,i}} - b(s+a)^{g_{j,k}} \right]          (10)
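For concreteness, here is a direct transcription of eq. (10) into Python (our sketch; as a simplifying assumption, terms whose received vector falls outside the T + 1 symbols of the current window are skipped, a boundary detail the paper does not specify).

```python
import numpy as np

def gradient_term(R, b, G, a):
    """Gradient term  d f(B) / d b(s+a)  of eq. (10).

    R : (T+1, n) received vectors R(s), ..., R(s+T).
    b : bipolar array [b(s-L+1), ..., b(s), ..., b(s+T)]; the bit b(s+a)
        being differentiated sits at index a + L - 1.
    G : (n, L) generator matrix with 0/1 entries.
    """
    G = np.asarray(G)
    n, L = G.shape
    grad = 0.0
    for k in range(1, L + 1):
        if a + k - 1 >= len(R):              # received vector not in window
            continue
        for j in range(n):
            if G[j, k - 1] == 0:             # terms with g_{j,k} = 0 cancel
                continue
            prod = 1.0
            for i in range(1, L + 1):        # product over i != k
                if i != k and G[j, i - 1] == 1:
                    prod *= b[(a + k - i) + (L - 1)]
            grad += -2.0 * (R[a + k - 1][j] * prod - b[a + (L - 1)])
    return grad
```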

Eq. (10) can be interpreted as a summation over the entries of the encoder generator matrix G, and the terms for which g_{j,k} = 0 cancel. The gradient descent algorithm then updates the T + 1 bit values according to eq. (11) at each iteration step p:

b(s+a)^{p+1} = b(s+a)^{p} - \alpha \left. \frac{\partial f(B)}{\partial b(s+a)} \right|_{p}, \qquad a = 0, 1, \dots, T          (11)

The update of all parameters b(s+a) for a = 0, 1, ..., T represents one iteration of the gradient descent algorithm. The algorithm may iterate P times, at which point a decision can be made on one or more elements of B (the message bits). The parameters can be updated one by one (sequential update) or all at the same time (parallel update). The learning-rate factor α can be chosen to eliminate self-feedback in the RNN; in this case α is chosen as

\alpha_{zsf} = \frac{1}{2 G_P}          (12)

where the subscript zsf stands for zero self-feedback and G_P is the number of ones in the encoder generator matrix G. If a non-linear activation function f_a is employed, the parameter update function for zero self-feedback is as shown in eq. (13). In [3] it was shown that a parallel implementation of eq. (13) for all a = 0, 1, ..., T is in fact a type of single-layer RNN.

b(s+a)^{p+1} = f_a\!\left( \frac{1}{G_P} \sum_{k=1}^{L} \sum_{j=1}^{n} g_{j,k} \left[ r_j(s+a+k-1) \prod_{\substack{i=1 \\ i \neq k}}^{L} b(s+a+k-i)^{g_{j,i}} \right]^{(p)} \right)          (13)

It is well known that the gradient descent algorithm is not guaranteed to find the global minimum of the objective function and may become stuck in local minima. One technique previously employed to overcome this is to add a noise term to each parameter during the update. This is in fact a form of Simulated Annealing (SA), where the variance of the noise is decreased as the number of iterations increases. In this case the update expression becomes

b(s+a)^{p+1} = f_a\!\left( \frac{1}{G_P} \left[ \, \cdot \, \right] + w^{p} \right)          (14)

where w^{p} is an additive white Gaussian noise (AWGN) term whose variance decreases (linearly in this case) from \sigma_{w_0}^{2} at the first iteration to zero at the last iteration, and the [ · ] term is the same as in eq. (13). SA has proved to be an effective global optimisation technique because it suits a wide range of problems and places no restriction on the form of the cost function [6]. The drawbacks of this stochastic gradient technique, however, are the large number of iterations required and the increased hardware complexity [7]. An alternative is to choose a convolutional code whose objective function has few local minima. With a general expression for the gradient update term, it was possible to simulate decoding for large sets of convolutional codes, specified by their generator matrices G. These results showed that simple codes do in fact exist which do not require SA to achieve very good decoding performance.
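To make the update rule concrete, the sketch below (our illustration, not the authors' code) implements the parallel zero-self-feedback update of eq. (13) with an optional linearly decaying noise term as in eq. (14). The all-zero initialisation of the unknown bits and the treatment of the window edge are assumptions on our part; the 0.05 initial noise variance is the value quoted in the next section.

```python
import numpy as np

def rnn_decode(R, b_past, G, P=10, sa_var0=0.0, activation=np.tanh, seed=0):
    """Parallel zero-self-feedback update of eq. (13), iterated P times.

    R       : (T+1, n) received vectors of the current decoding window.
    b_past  : length L-1 array of already-decided bipolar bits
              [b(s-L+1), ..., b(s-1)].
    G       : (n, L) generator matrix with 0/1 entries.
    sa_var0 : initial variance of the annealing noise of eq. (14); it decays
              linearly to zero over the P iterations (0 disables SA).
    Returns hard decisions on [b(s), ..., b(s+T)].
    """
    rng = np.random.default_rng(seed)
    G = np.asarray(G, dtype=float)
    n, L = G.shape
    T = len(R) - 1
    GP = int(G.sum())                                   # number of ones in G
    b = np.concatenate([np.asarray(b_past, float), np.zeros(T + 1)])

    for p in range(P):
        sigma = np.sqrt(sa_var0 * (1.0 - p / max(P - 1, 1)))
        new_b = b.copy()
        for a in range(T + 1):                          # parallel update: all
            acc = 0.0                                   # neurons read the old b
            for k in range(1, L + 1):
                if a + k - 1 > T:                       # outside the window
                    continue
                for j in range(n):
                    if G[j, k - 1] == 0:                # g_{j,k} = 0 terms drop
                        continue
                    prod = 1.0
                    for i in range(1, L + 1):
                        if i != k and G[j, i - 1] == 1:
                            prod *= b[(a + k - i) + (L - 1)]
                    acc += R[a + k - 1][j] * prod
            noise = rng.normal(0.0, sigma) if sigma > 0 else 0.0
            new_b[a + L - 1] = activation(acc / GP + noise)   # eqs. (13)/(14)
        b = new_b

    return np.where(b[L - 1:] >= 0, 1.0, -1.0)          # hard bit decisions
```

A sequential update would write each new value back into b immediately instead of deferring it to the end of the sweep, and decoding a long burst would slide this window along the received sequence, deciding b(s) at each step.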

6 Simulation

Software simulations of the RNN decoding technique were performed for a large number of convolutional codes. Other configurable parameters included decoding with or without SA, fully parallel or sequential parameter update, and three different neuron activation functions: signum (hard-limiting), ramp, and sigmoidal (tanh). Results presented here are for hard-decision decoding only. In [3] multiple RNN decoders were used in parallel to improve decoder performance; because of the excessive hardware resources required by this method, our focus was to optimise the performance of a single RNN decoder. Figures 5 and 6 show the results of decoding with a single RNN decoder, with SA and without SA respectively, for the convolutional code with

G = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}.

As the graphs show, the performance with SA can approach that of the Viterbi decoder with only a single RNN, whereas without SA the performance is clearly intolerable. Simulations were based on transmitting 1000 bursts of 100 bits, with 800 network iterations per bit. The tanh activation function gave the best performance of the three and was used here. A decoding depth of T = 5L is employed in both the RNN decoder and the Viterbi decoder. Where SA was used, the initial variance of the Gaussian noise was chosen as 0.05 and decreased linearly to zero with the number of iterations.

Figure 5: RNN vs. Viterbi results for rate 1/2 code (with SA)

Figure 6: RNN vs. Viterbi results for rate 1/2 code (no SA)

Further simulations over large sets of convolutional codes showed that the RNN decoder in fact performs very well for some convolutional codes even without SA. Moreover, only 10 iterations were required per decoded bit, which dramatically improves the speed of the decoder. Some examples are the codes with generator matrices

G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad G = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},

which are rate 1/2 and rate 1/3 encoders respectively.

The performance for the latter case is shown in Fig. 7. Here SA is not used and the RNN iterates 10 times per decoded bit. The tanh activation function is used and all neurons are updated in parallel. As can be seen, for such codes the performance approaches that of Viterbi decoding even without the complexity of SA.

Figure 7: RNN vs. Viterbi results for rate 1/3 code (no SA)

7 Conclusions

We have extended previous results to simplify and generalise a convolutional decoding technique based on the minimisation of an error energy function. The generalised gradient formula used for the neuron update enables us to take the generator matrix G of any rate 1/n convolutional encoder, find the gradient descent update factor by simple substitution, and consequently map the result onto a neural network structure for decoding purposes. Simulations for many different convolutional codes revealed that a number of codes do not require the complex simulated annealing technique, yet their performance still approaches that of the Viterbi decoder.

References

[1] Cheolwoo You and Daesik Hong. Neural convolutional decoders in the satellite channel. In Proceedings of the IEEE International Conference on Neural Networks, volume 1, pages 443-448, 1995.

[2] Stephen B. Wicker and Xiao-an Wang. An artificial neural net decoder. IEEE Transactions on Communications, 44(2):165-171, February 1996.

[3] A. Hamalainen and J. Henriksson. Convolutional decoding using recurrent neural networks. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), volume 5, pages 3323-3327, 1999.

[4] A. Hamalainen and J. Henriksson. Novel use of channel information in a neural convolutional decoder. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN), volume 5, pages 337-342, 2000.

[5] A. Hamalainen and J. Henriksson. A recurrent neural decoder for convolutional codes. In Proceedings of the IEEE International Conference on Communications (ICC), volume 2, pages 1305-1309, 1999.

[6] Y. Li, J. Yao, and D. Yao. An efficient composite simulated annealing algorithm for global optimization. In IEEE International Conference on Communications, Circuits and Systems and West Sino Expositions, pages 1165-1169, 2002.

[7] P. Mendonca and L. Caloba. New simulated annealing algorithms. In IEEE International Symposium on Circuits and Systems, pages 1668-1671, Hong Kong, June 1997.
