Introduction to Tornado Codes

Andrea Soldá (Master Student, Information Theory class)
University of Salerno
[email protected]
Abstract

Tornado codes belong to the class of error-correcting codes for erasure channels and are based on the construction of randomly connected irregular bipartite graphs. Using a carefully chosen graph structure, tornado codes achieve nearly optimal efficiency. With linear-time encoding and decoding and rather simple and fast algorithms, they are well suited for encoding large amounts of data and as an alternative to the classical ARQ concept used on the Internet. This work starts with a short introduction to error-correcting codes and then explains the principal idea and construction of tornado codes. Finally, a short comparison between tornado codes and Reed-Solomon codes, the most commonly used error-correcting codes, is presented.
I. Introduction

Every message transmitted from a sender to some recipient is inevitably affected by noise within the channel the message passes through. Noise has different causes, such as thermal noise within the channel or interference with other communications, and manifests as errors within the delivered message [1]. Naturally, the aim of any communication is the reliable delivery of information, which is equivalent to minimizing the number of errors within the message. There are different ways to accomplish this goal.

i. Error Correcting Codes

Watching human conversation, one can observe that a message with a small number of errors is still understood correctly. This is possible because human language contains a lot of redundancy, enabling the recipient to reconstruct the original message. This leads to a specific solution for achieving reliable communication: the use of forward error-correcting codes.

ii. Forward Error Correction

Generally speaking, Forward Error Correction (FEC) means the addition of redundancy to information in such a way that errors can be detected or corrected after transmission [2]. Trivial error-correcting codes are the simple parity check, which adds a single redundant bit as the sum of all data bits modulo 2, and the repetition code, which repeats every character multiple times. The simple parity check can detect single-bit errors within a block, while a repetition code of length 2t + 1 can correct up to t errors: the character encountered most often is regarded as the most probable one. However, with increasing error-correcting capability, messages using the repetition code get very large. The code rate R describes the ratio between the number of original bits (data bits) and the total number of bits:

R = k/n

where k is the data length, n is the block length, and n − k is the number of parity bits. A sketch of both trivial codes is given below.
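As a minimal illustration of the two trivial codes, the following Python sketch implements the parity check and the majority-vote decoding of the repetition code (the function names are chosen for this example only):

    def parity_encode(bits):
        # Append one redundant bit: the sum of all data bits modulo 2.
        return bits + [sum(bits) % 2]

    def parity_check(block):
        # An odd number of flipped bits makes the overall sum non-zero.
        return sum(block) % 2 == 0

    def repetition_encode(bits, t):
        # Repeat every bit 2t + 1 times; up to t errors per bit are correctable.
        return [b for b in bits for _ in range(2 * t + 1)]

    def repetition_decode(block, t):
        n = 2 * t + 1
        out = []
        for i in range(0, len(block), n):
            group = block[i:i + n]
            # Majority vote: the value seen most often is the most probable one.
            out.append(1 if sum(group) > t else 0)
        return out

    data = [1, 0, 1, 1]
    sent = repetition_encode(data, t=1)   # rate R = k/n = 1/3
    sent[2] ^= 1                          # one transmission error
    assert repetition_decode(sent, t=1) == data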
iii. Channel Models
The assertion that, for the repetition code, the character encountered most often is the most probable one rests on the assumption that errors occur with equal probability, independently of the character and independently of each other. Using such an idealized channel model [3], if there are only two different kinds of characters, the channel is called a Binary Symmetric Channel (BSC). Because errors occur independently of their position, this model is also often referred to as a discrete memoryless channel.

Another important channel model is the Binary Erasure Channel (BEC). Within this model, the positions of the errors are known; such errors are therefore usually called erasures instead. Because a detected erasure can be corrected right away, the BEC offers better error-correction capability than the BSC.

iv. Noisy Channel Coding Theorem

In the beginnings of electronic data communication, when analog techniques were still dominant, it was assumed that the only way to achieve arbitrarily low transmission error rates was to use either sufficiently high signal power or a sufficient amount of redundant information.

In 1948 Claude Shannon surprised the experts with his publication "A Mathematical Theory of Communication" [4], which showed that under certain circumstances the error probability of a FEC block tends towards 0 as the block length grows [5] [6]. First, let the capacity C of a channel be defined as follows:

C = W · log2(1 + S/N) bits/sec

C depends on the bandwidth W and the signal-to-noise ratio S/N, and is directly related to the error probability p of the channel. For the BSC the capacity is given by:

C_BSC = 1 + p · log2(p) + (1 − p) · log2(1 − p)

and for the BEC by:

C_BEC = 1 − p

The central theorem in Shannon's work states that there exist FEC codes for which the error probability of a FEC block tends towards 0 with growing block length, under the condition that the code rate is lower than the capacity of the channel:

∃ FEC codes: R < C : lim_{n→∞} P(error) = 0

Conversely, if the code rate is larger than the capacity, then the error probability tends towards 1.
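The two binary capacities can be evaluated directly from the formulas above; a small sketch, with the crossover/erasure probability p as a free parameter:

    from math import log2

    def capacity_bsc(p):
        # C_BSC = 1 + p*log2(p) + (1 - p)*log2(1 - p)
        if p in (0.0, 1.0):
            return 1.0
        return 1 + p * log2(p) + (1 - p) * log2(1 - p)

    def capacity_bec(p):
        # C_BEC = 1 - p: known erasure positions waste no capacity on
        # locating the errors.
        return 1 - p

    print(capacity_bsc(0.1))  # about 0.531 bits per channel use
    print(capacity_bec(0.1))  # 0.9 bits per channel use

For the same p, the BEC always has at least the capacity of the BSC, which reflects that knowing the error positions is a strictly easier setting.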
v. Hamming and Linear Block Codes
Richard Hamming was among the first to realize the essential idea for constructing FEC codes. The Hamming distance d between two vectors X and Y is defined as the number of positions in which the two vectors differ. The bigger the Hamming distance between each pair of code words of a FEC block, seen as vectors of characters, the more errors can be detected. A code with a minimum Hamming distance dmin between each pair of code words can detect dmin − 1 bit errors and correct (dmin − 1)/2 errors [3]. During decoding, respectively error correction, each received word is assigned the nearest code word with respect to the Hamming distance, which corresponds to a minimization of the error probability. In general, this problem is NP-complete [7].
Hamming codes are a special case of linear block codes [3]. A block code is a FEC code which encodes blocks of characters instead of single characters. Linear means that every linear combination of valid code words is a valid code word as well. Because of this property, a linear code can be viewed as a subspace GF(q)^k of a vector space GF(q)^n, where GF(q) denotes a Galois field of order q, and the set of all code words can be represented by a matrix G of size (k × n). G is called the generator matrix, because all code words can be generated using it:

c = d · G = (d0, ..., dk−1) · [ Ik | P ] = (d0, ..., dk−1, pk, ..., pn−1)

where d denotes the data symbols and p the parity symbols. Decoding of a code word works with the help of the parity-check matrix H, whose rows span the null space of the code: a word is a code word if and only if it lies within this null space:

syndrome s(c) = c · H^T = 0

The complexity of linear block codes is generally O(n²) for the encoding step, while decoding is NP-complete because of the problem of maximum-likelihood decoding.
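The following sketch shows these definitions in action for the classic Hamming(7,4) code; the parity part P below is one standard choice (any P whose H has distinct non-zero columns works):

    import numpy as np

    # Generator matrix G = [I_k | P] for Hamming(7,4), arithmetic over GF(2).
    P = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1],
                  [1, 1, 1]])
    G = np.hstack([np.eye(4, dtype=int), P])
    # Parity-check matrix H = [P^T | I_{n-k}], so that G . H^T = 0 over GF(2).
    H = np.hstack([P.T, np.eye(3, dtype=int)])

    d = np.array([1, 0, 1, 1])          # data symbols
    c = d @ G % 2                       # code word (d0..d3, p4..p6)
    assert not (c @ H.T % 2).any()      # syndrome s(c) = c . H^T = 0

    c[5] ^= 1                           # flip one bit in transit
    assert (c @ H.T % 2).any()          # a non-zero syndrome exposes the error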
II. Reed-Solomon Codes

Reed-Solomon codes (RS codes) [3] [8] [9] were invented by Reed and Solomon in 1960. RS codes constitute a special case of linear block codes and have a number of outstanding properties. They are so-called maximum distance separable (MDS) codes, i.e. they have the maximum possible minimum Hamming distance dmin = n − k + 1 between each pair of code words, and can therefore detect up to n − k errors and correct up to (n − k)/2 errors. Within the BEC model a FEC block can thus be reconstructed after receiving any k symbols. For this, the efficiency of a FEC code shall be defined as:

efficiency = #data symbols / #symbols required for decoding

RS codes therefore have an efficiency of 1. In the encoding step the redundant symbols are computed using (Lagrange) interpolation over the given k data symbols. For this, the polynomial P(x) = c0 + c1·x + ... + ck−1·x^(k−1) is chosen in such a way that P(0) = d0, ..., P(k−1) = dk−1, where d0 to dk−1 denote the original k data symbols. The encoded symbols are then given by P(0), ..., P(n−1). RS codes are therefore systematic codes [9]. Decoding within the BEC model works analogously to encoding, by interpolating among k (correctly) received symbols to reconstruct the missing ones. Error correction within the BSC is substantially more complicated: it comprises computing the syndrome, locating the error positions, and finding the correct symbols, which essentially requires solving a system of linear equations. The algorithms used in practice have an encoding complexity of O(n log n) and a decoding complexity of O(n²). Because of the quadratic complexity and the costly computations (GF arithmetic), the maximum practical block length is 255. RS codes are used in a wide range of applications, including:

• Memory systems
• Wireless and mobile communications
• Satellite communications
• Digital TV
• ADSL
• CDs/DVDs
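To make the interpolation idea concrete, here is a toy sketch of RS-style erasure recovery over the prime field GF(101). The field choice is purely for readability: practical RS implementations work over GF(2^8) with faster algorithms than plain Lagrange interpolation.

    p = 101  # a small prime field, GF(101)

    def lagrange_eval(points, x):
        # Evaluate the unique polynomial through `points` at x, over GF(p).
        total = 0
        for i, (xi, yi) in enumerate(points):
            num, den = 1, 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = num * (x - xj) % p
                    den = den * (xi - xj) % p
            total = (total + yi * num * pow(den, p - 2, p)) % p
        return total

    def rs_encode(data, n):
        # Systematic: P(i) = data[i] for i < k; redundancy = P(k), ..., P(n-1).
        pts = list(enumerate(data))
        return [lagrange_eval(pts, x) for x in range(n)]

    def rs_decode(received, k):
        # Any k correctly received (position, value) pairs determine P,
        # and with it the original data symbols P(0), ..., P(k-1).
        return [lagrange_eval(received[:k], x) for x in range(k)]

    data = [42, 7, 99, 0]                             # k = 4 data symbols
    block = rs_encode(data, n=7)                      # n = 7 encoded symbols
    survivors = [(i, block[i]) for i in (1, 3, 5, 6)] # any 4 of 7 suffice
    assert rs_decode(survivors, k=4) == data          # efficiency = 1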
III. Tornado Codes

i. Gallager Codes

The FEC codes developed by Gallager in 1963 as part of his dissertation at MIT constitute the origin of Tornado codes [10] [11] [4]. These Gallager codes are based on regular bipartite graphs. On the left side there are the data bits as well as the parity bits. For each node on the right side, q of these bits are chosen at random and connected to it, with the bits constrained in such a way that their sum modulo 2 equals 0. The node degree is kept constant on each side.

[Figure: a Gallager code represented as a bipartite graph]
In this form, Gallager codes are easily representable as linear codes. The parity-check matrix H then takes the form:
H = [ D | P ] =

0 1 1 1 1 1 | 0 1 0 0
1 1 1 0 0 0 | 0 1 1 1
1 0 0 1 1 0 | 1 1 1 0
0 1 0 1 1 1 | 1 0 0 1
1 0 1 0 0 1 | 1 0 1 1
In each row there are q ones and in each column there are p ones, q being the right and p the left node degree. The total number of ones equals the number of edges in the bipartite-graph representation. Because the node degrees are small compared to the block length, the parity-check matrix is sparse; for this reason these codes are also referred to as Low-Density Parity-Check (LDPC) codes. For solving a system of linear equations with sparse matrices, specialized (faster) algorithms exist. Encoding, however, is complicated: computing the generator matrix requires solving a system of linear equations with dense matrices, so a complete matrix multiplication must be carried out. This is extremely time consuming.

Gallager codes are not optimal codes. In particular, connecting nodes randomly leads to a variable efficiency that only converges as the block length goes to infinity; Gallager codes therefore only become useful for large block lengths. Because of the large amounts of data, the complicated computations, and the invention of Reed-Solomon codes at about the same time, Gallager codes were forgotten for about 30 years before being re-discovered. The final breakthrough was reached in 1997 [12]. From this work the so-called Tornado codes emerged.
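A quick way to see the LDPC structure is to check the degrees of the example matrix above (as reconstructed here, with q = 6 and p = 3) programmatically:

    import numpy as np

    # The example parity-check matrix H = [D | P] from above.
    H = np.array([[0,1,1,1,1,1, 0,1,0,0],
                  [1,1,1,0,0,0, 0,1,1,1],
                  [1,0,0,1,1,0, 1,1,1,0],
                  [0,1,0,1,1,1, 1,0,0,1],
                  [1,0,1,0,0,1, 1,0,1,1]])

    print(H.sum(axis=1))  # ones per row    -> [6 6 6 6 6] (right degree q)
    print(H.sum(axis=0))  # ones per column -> [3 3 ... 3] (left degree p)
    print(H.sum())        # total ones = number of edges in the graph = 30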
ii. Tornado Codes

Tornado codes [12] are systematic FEC codes for large block lengths. The underlying channel model of these codes is the Binary Erasure Channel (errors are thus independently distributed). In contrast to Gallager codes, their construction is based not on regular but on irregular bipartite graphs, whose structure is extremely important.

Tornado codes can be encoded and decoded in O(n · ln(1/ε)) time, where ε is a freely chosen positive constant. The maximum reliable code rate, i.e. the rate for which the FEC block error probability converges towards 0, is then given by 1 − p/(1 − ε), with p denoting the error probability of the BEC. Via the parameter ε this value can get arbitrarily close to the capacity of the channel (i.e. the Shannon bound). Therefore, Tornado codes can be considered "nearly" optimal codes (in other words, "nearly" MDS codes), with an efficiency of about 1/(1 + ε). Because of the linear complexity and the comparably simple computations, Tornado codes are very fast, and the principal encoding and decoding algorithms are simple as well.
iii. Construction of Tornado Codes

For the construction of Tornado codes, the parity bits of the Gallager codes are moved to the right side of the bipartite graph and treated as check bits. The left side holds the n data bits, and the right side is viewed as containing β · n check bits (β < 1).

For the encoding, each check bit is computed as the sum (XOR) of all adjacent data bits.

For decoding, a missing data bit can be reconstructed if the value of a check bit and all but one of its adjacent data bits are known, by computing the sum (XOR) of the check bit with all known data bits.

The aim of the whole decoding procedure is that repeated decoding steps eventually reconstruct all missing data bits. This should work if at most (1 − ε) · β · n data bits are missing. By removing the "decoded" (used) edges within each decoding step, decoding achieves linear complexity in the number of edges, just as the encoding does. Nodes whose degree drops to 0 are treated as removed as well.

Using the principle described so far, missing data bits can be reconstructed using the check bits. But how can missing check bits be recovered, in such a way that they remain useful for reconstructing other data bits? The solution is to cascade the graphs, so that a series of codes C(B0), ..., C(Bm) over graphs B0, ..., Bm emerges. The first graph B0 contains, as before, the n data bits on its left side. The check bits of each graph Bi then serve as the data bits of the following graph Bi+1; every graph Bi therefore has β^i · n left and β^(i+1) · n right nodes. This is repeated recursively until β^(i+1) · n ≤ √n. For these last check bits an ordinary erasure-correcting code C with code rate 1 − β can be used which, with high probability, can reconstruct all missing data for a loss of up to a fraction β.
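The following Python sketch implements one layer of this scheme under stated assumptions: a fixed bipartite graph given as adjacency lists (check node to data-bit indices, here a toy graph chosen by hand), XOR encoding, and a decoder that repeatedly looks for a check node with exactly one missing neighbour. For clarity the sketch rescans all check nodes each round; a production decoder removes used edges instead, which is what yields linear time.

    def encode(data, graph):
        # graph[j] lists the data-bit indices adjacent to check bit j.
        return [sum(data[i] for i in nbrs) % 2 for nbrs in graph]

    def peel_decode(received, checks, graph):
        # received: list of data bits, with None marking erasures.
        bits = list(received)
        progress = True
        while progress and None in bits:
            progress = False
            for j, nbrs in enumerate(graph):
                missing = [i for i in nbrs if bits[i] is None]
                if len(missing) == 1:
                    # Degree-1 check node: the XOR of the check bit with all
                    # known neighbours recovers the single missing bit.
                    i = missing[0]
                    bits[i] = (checks[j] + sum(bits[k] for k in nbrs
                                               if k != i)) % 2
                    progress = True
        return bits

    # Toy graph: 6 data bits, 3 check bits (beta = 1/2).
    graph = [[0, 1, 2], [2, 3, 4], [0, 4, 5]]
    data = [1, 0, 1, 1, 0, 1]
    checks = encode(data, graph)

    lost = data.copy()
    lost[2] = lost[5] = None                 # two erasures
    assert peel_decode(lost, checks, graph) == data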
Under the condition that C can encode and decode in O(n²) time, the overall complexity remains linear, because O((√n)²) = O(n). A possible candidate for the code C is the Reed-Solomon erasure-correcting code. The whole code (B0, ..., Bm, C) is therefore a code with n data bits and

∑_{i=1}^{m+1} β^i · n + β^(m+2) · n/(1 − β) = n · β/(1 − β)
check bits, and thus has a code rate of n/(n + n · β/(1 − β)) = 1 − β. If every sub-code can be decoded with high probability at a maximum loss of a fraction β(1 − ε) of its bits, then the whole code can be decoded with high probability at a maximum loss of a fraction β(1 − ε) of the data.
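As a quick numerical check of the cascade geometry (a sketch; n and β are example values, and layer sizes are rounded for illustration):

    from math import sqrt

    n, beta = 65536, 0.5
    sizes, left = [], n
    # Each layer Bi has beta^i * n left and beta^(i+1) * n right nodes;
    # recurse until the right side would drop to sqrt(n) or below.
    while left * beta > sqrt(n):
        sizes.append(int(left * beta))
        left = left * beta
    tail = left * beta / (1 - beta)      # check part of the final code C

    total_checks = sum(sizes) + tail
    print(total_checks, n * beta / (1 - beta))  # both equal n*beta/(1-beta)
    print(n / (n + total_checks))               # code rate = 1 - beta = 0.5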
iv. Decoding Procedure
The goal now is to find suitable sub-graphs. The decoding procedure can only continue when there is a check bit on the right side of a sub-graph for which exactly one adjacent data bit is missing, i.e. there is a right node with degree 1. In this case the missing data bit gets reconstructed (it takes the current value of that check bit), all edges connected to this left node get removed (always XORing the reconstructed data bit into the adjacent check bit first), and finally the left and the right node get removed. This decoding step is repeated as long as there are right nodes with degree 1. The whole decoding procedure is successful when all nodes on the left side have been removed.

We now want to show that, under the condition of a maximum loss of a fraction β(1 − ε), and as long as there exist left nodes, there is always a right node in the graph with degree 1. Because the argument is based on probability theory, only the asymptotic behavior can be analyzed. For this we define an edge with left (right) degree i as an edge that is connected to a left (right) node with degree i. Let (λ1, ..., λm) be the vector that holds the fraction of edges with left degree 1, ..., m. Analogously, let (ρ1, ..., ρm) be the vector holding the fraction of edges with right degree 1, ..., m. Within every decoding step, 1 out of E edges gets removed:

∆t = 1/E = 1 time unit = 1 decoding step, t = 0 ... 1

Let li(t) be the fraction of edges with left degree i at time t, and analogously ri(t) the fraction of edges with right degree i at time t. The fraction of all edges e(t) at time t is then:
e(t) = ∑i li(t) = ∑i ri(t)

Further, at every decoding step a right node with degree 1 gets chosen, and the adjacent left node together with all its edges gets removed. The probability that this left node has degree exactly i is li(t)/e(t), in which case i edges of left degree i get removed. This is expressed by the following equation:

Li(t + ∆t) − Li(t) = −i · li(t)/e(t)

where Li(t) denotes the number of edges with left degree i, and therefore Li(t) = li(t) · E = li(t)/∆t. Because only the asymptotic behavior can be analyzed, we let the number of edges go to infinity, E → ∞ and therefore ∆t → 0:

lim_{∆t→0} [Li(t + ∆t) − Li(t)] = lim_{∆t→0} [li(t + ∆t) − li(t)]/∆t = dli(t)/dt = −i · li(t)/e(t)

This differential equation is easier to solve when we substitute dt/e(t) by dx/x. With the initial condition li(t = 0) = δλi,
where δ denotes the fraction of missing (i.e. still to be reconstructed) left nodes, the fraction of edges with left degree i results in:

li(x) = δλi · x^i, where x decreases from 1 towards 0

Solving the differential equations for ri(x), respectively r1(x), is more complicated. The result for r1(x) is expressed using the so-called "degree sequence functions" (polynomials):

λ(x) = ∑i λi · x^(i−1)    ρ(x) = ∑i ρi · x^(i−1)

Finally we obtain:

r1(x) = δλ(x) · (x − 1 + ρ(1 − δλ(x)))

This fraction of edges with right degree 1 must always be bigger than 0, for all x ∈ (0, 1]. Solving this inequality leads to the central condition for a successful decoding procedure:

ρ(1 − δλ(x)) > 1 − x, ∀x ∈ (0, 1]

Considering that these calculations describe the asymptotic behavior, the following can be concluded: provided that the central condition is fulfilled, the probability of successfully reconstructing all data converges towards 1 with growing block length. Conversely, if this condition is violated, then the probability of successful decoding converges towards 0. Because the inequality describes only the asymptotic behavior exactly, in practice a few left nodes mostly remain. However, if at most a fraction η of the nodes remains, for some η > 0, and λ1 = λ2 = 0, then the decoding procedure terminates successfully [12].
IV. Irregular Graphs

i. Regular Graphs

If on the graph's left side there are n nodes of degree d, and on the right side there are βn nodes, then the right nodes necessarily have degree d/β. The degree sequence functions then become λ(x) = x^(d−1) and ρ(x) = x^(d/β−1). Now, if d ≥ 3, the central condition is only fulfilled when

δ ≤ 4 · (1 − (1/(d − 1))^(1/(d/β − 1)))

The peak acceptable loss rate δ for a successful decoding is achieved with d = 3. As an example let β = 0.5: the highest acceptable loss rate is then at most 0.43, so to reconstruct a block at least 1.14 times the amount of original data must be received, far from the optimum of 1.

ii. Irregular Graphs

Let the distribution of the left degrees (the "left degree sequence") be given by the following "truncated heavy tail" distribution with parameter d:

λi = 1/(H(d) · (i − 1)), i = 2 ... d + 1

where H(d) = ∑_{i=1}^{d} 1/i is the harmonic sum up to 1/d. The average left node degree al then results in ∼ ln(d).

Let the right degrees (the "right degree sequence") follow a Poisson distribution (the power series of the distribution can approximately be viewed as a polynomial):

ρi = (e^(−α) · α^(i−1))/(i − 1)!, i ≥ 1

The average right node degree then results in ar = α · e^α/(e^α − 1), with α chosen such that ar = al/β.
With the edge degree distributions specified above, the central condition is fulfilled exactly when δ ≤ β/(1 + 1/d). The peak acceptable loss ratio is hence determined by the parameter d: the higher d is selected, the nearer the code comes to the Shannon bound, so we have an asymptotically optimal code. The choice of d must be weighed against the encoding/decoding complexity, as d also determines the number of edges via n · al ∼ n · ln(d). The complexity for encoding and decoding therefore results in O(n · ln(d)).
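The central condition can be checked numerically for any given pair of degree sequence functions. The sketch below does this with a grid search (not a formal proof) for the regular case λ(x) = x^(d−1), ρ(x) = x^(d/β−1), reproducing the loss limit of about 0.43 quoted above for d = 3, β = 0.5:

    def condition_holds(delta, lam, rho, steps=10000):
        # Central condition: rho(1 - delta*lam(x)) > 1 - x for all x in (0, 1].
        for s in range(1, steps + 1):
            x = s / steps
            if rho(1 - delta * lam(x)) <= 1 - x:
                return False
        return True

    def max_delta(lam, rho):
        # Bisect for the largest loss fraction still satisfying the condition.
        lo, hi = 0.0, 1.0
        for _ in range(40):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if condition_holds(mid, lam, rho) else (lo, mid)
        return lo

    d, beta = 3, 0.5
    lam = lambda x: x ** (d - 1)         # regular left degree d
    rho = lambda x: x ** (d / beta - 1)  # regular right degree d/beta
    print(max_delta(lam, rho))           # about 0.429, i.e. the 0.43 above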
V. Tornado vs Reed-Solomon

The following table gives a short comparison of the most important properties of Tornado and Reed-Solomon codes:
Property     | Tornado           | RS
-------------|-------------------|----------------
Efficiency   | ∼ 1/(1 + ε)       | 1
Calculation  | XOR               | GF arithmetic
Encoding     | O(n · ln(1/ε))    | O(n · log n)
Decoding     | O(n · ln(1/ε))    | O(n²)
Block length | (very) large      | ≤ 255
In practice, the running time of Tornado codes is about 100 times faster than that of standard RS code implementations.
VI. Practical Example

In this section a working example is presented. Let n be 14 and β = 1/2.

i. Tornado graph

A possible graph for a 14-bit word is shown below.

[Figure: Tornado graph with data bits d1, ..., d14 on the left and check bits c1, ..., c7 on the right]

The check bits' values are given by the following equations:

c1 = d1 = 1
c2 = d1 + d2 + d3 + d4 = 1
c3 = d2 + d3 + d4 + d5 + d6 = 1
c4 = d7 + d8 + d11 + d13 + d14 = 1
c5 = d9 + d10 + d12 + d13 + d14 = 1
c6 = d9 + d10 + d11 + d12 + d13 + d14 = 0
c7 = d5 + d6 + d7 + d8 + d9 + d10 + d11 + d12 + d13 + d14 = 0
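These equations can be verified mechanically. The data word below is an assumed assignment (the original figure does not list the data bits; this particular choice is merely one assignment consistent with all the stated check values):

    # Adjacency of each check bit, as read off the equations above
    # (1-based data-bit indices).
    graph = {1: [1], 2: [1, 2, 3, 4], 3: [2, 3, 4, 5, 6],
             4: [7, 8, 11, 13, 14], 5: [9, 10, 12, 13, 14],
             6: [9, 10, 11, 12, 13, 14],
             7: [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]}

    # Hypothetical data word (d1..d14) consistent with the stated check bits.
    d = [None, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # d[0] unused

    c = {j: sum(d[i] for i in nbrs) % 2 for j, nbrs in graph.items()}
    print(c)  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 0, 7: 0}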
ii. Reconstruction

Let us assume that during transmission the bits d2, d6 and c2 are lost. The graph for the decoding procedure is shown below.

[Figure: the Tornado graph with the erased bits d2, d6 and c2 marked]
In the first step, d6 is obtained as d6 = c3 + c7 = 1. In the second step, d2 gets reconstructed as d2 = c3 + d3 + d4 + d5 + d6 = 0. Finally, c2 is recomputed as c2 = d1 + d2 + d3 + d4 = 1.
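Continuing the sketch above (reusing graph, d and c, with d still the assumed data word), the same peeling logic recovers the three lost bits; here d6 is peeled out of c7, whose other neighbours are all known, and the recovered values agree with the steps listed above:

    known = {i: d[i] for i in range(1, 15)}
    del known[2], known[6]  # d2 and d6 are lost; c2 is lost as well

    # Step 1: all neighbours of c7 except d6 are known -> peel d6 out of c7.
    known[6] = (c[7] + sum(known[i] for i in graph[7] if i != 6)) % 2  # d6 = 1

    # Step 2: only d2 is now missing among c3's neighbours -> peel d2 out.
    known[2] = (c[3] + sum(known[i] for i in graph[3] if i != 2)) % 2  # d2 = 0

    # Step 3: re-encode the lost check bit from its now known neighbours.
    c2 = sum(known[i] for i in graph[2]) % 2                           # c2 = 1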
References

[1] S. Lin, D. J. Costello Jr., Error Control Coding: Fundamentals and Applications, Prentice Hall, 1983

[2] S. Robinson, "Coding Theory Meets Theoretical Computer Science", SIAM News, Vol. 34

[3] T. R. N. Rao, E. Fujiwara, Error-Control Coding for Computer Systems, Prentice Hall, 1989

[4] C. E. Shannon, "A Mathematical Theory of Communication", The Bell System Technical Journal, 1948

[5] D. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press (http://www.inference.phy.cam.ac.uk/mackay/itila/), 2003

[6] "Bell Labs celebrates 50 years of Information Theory", http://www.lucent.com/minds/infotheory/docs/history.pdf

[7] A. Shokrollahi, "An Introduction to Low-Density Parity-Check Codes"

[8] M. Mitzenmacher, Lecture Notes on Course CS 222

[9] "Reed-Solomon Codes", http://www.4i2i.com/reed_solomon_codes.htm

[10] V. Roca, Z. Khallouf, J. Labouré, "Design and evaluation of a Low Density Generator Matrix (LDGM) large block FEC codec", Fifth International Workshop on Networked Group Communication, 2003

[11] A. C. Snoeren, "On The Practicality of Low-Density Parity-Check Codes", http://nms.lcs.mit.edu/~snoeren/papers/area-exam.pdf, 2001

[12] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, D. A. Spielman, V. Stemann, "Practical Loss-Resilient Codes", Proceedings of the 29th ACM Symposium on Theory of Computing, 1997