A Technique to Prove the Channel Coding Theorem for a Reduced Class of Codes and its Applications

Nadav Shulman and Meir Feder
Department of Electrical Engineering - Systems
Tel-Aviv University, Tel-Aviv, 69978, ISRAEL

Abstract

The channel coding theorem for DMCs is proved using random coding. The standard proofs [1], [2] rely heavily on the total randomness of the ensemble of codes. For a more restricted ensemble of codes a specific proof is needed. This was done, for example, for linear codes and time-varying convolutional codes, see [3, pages 189 and 301], and for constant composition (constant type) codes, see [4]. In this work we show that requirements on the ensemble of codes weaker than full independence of the codewords suffice. As a result, we obtain simple proofs of the channel coding theorem for interesting restricted ensembles of codes. As an example, we can prove the channel coding theorem for ensembles of codes based on the output of an LFSR (linear feedback shift register). A particularly interesting application of our technique is a proof that fixed (time-invariant) convolutional codes attain capacity.

1 Introduction

In the channel coding theorem we analyze the average error probability over an ensemble of codes $E = \{C^a \,|\, a \in A\}$, where $A$ is the set of all relevant indices and $C^a$ is a specific code, $C^a = \{c_1^a, \ldots, c_M^a\}$ with $c_i^a = (c_{i,1}^a, \ldots, c_{i,N}^a)$, where $c_{i,j}^a$ is a symbol that the channel can accept. The rate of each code in the ensemble is $R = \frac{\log M}{N}$. If the probability of error of the $a$-th code is $P_e(C^a)$ and the probability of choosing the $a$-th code from the ensemble is $\Pr\{C^a\}$, then the average error probability over the ensemble is
\[
\bar{P}_e = \sum_{a \in A} P_e(C^a) \cdot \Pr\{C^a\}.
\]
In Gallager's proof of the channel coding theorem (see [2, pp. 135-140]) for a channel with transition probabilities $P_N(y|x)$ and rate $R$, each codeword is selected independently with distribution $Q_N(x)$, which yields the following upper bound on the average error probability (of a maximum-likelihood decoder):
\[
\bar{P}_e \le e^{\rho N R} \sum_{y} \left[ \sum_{x} Q_N(x) P_N(y|x)^{1/(1+\rho)} \right]^{1+\rho} \tag{1}
\]
for any $0 \le \rho \le 1$. If the channel is discrete and memoryless, i.e.
\[
P_N(y|x) = \prod_{n=1}^{N} P(y_n|x_n),
\]
and we choose
\[
Q_N(x) = \prod_{n=1}^{N} Q(x_n),
\]
then we have
\[
\bar{P}_e \le \exp\{-N[E_0(\rho, Q) - \rho R]\} \tag{2}
\]
where
\[
E_0(\rho, Q) = -\ln \sum_{j} \left[ \sum_{k} Q(k) P(j|k)^{1/(1+\rho)} \right]^{1+\rho}.
\]
Now, we know that the random coding exponent
\[
E_r(R, Q) = \max_{0 \le \rho \le 1} \{E_0(\rho, Q) - \rho R\}
\]
is positive for $R < I(Q;P)$; see [2, pp. 140-144]. The ensemble of codes used in this proof contains all possible codes, with the probability assignment
\[
\Pr\{C^a\} = \prod_{i=1}^{M} Q_N(c_i^a) = \prod_{i=1}^{M} \prod_{n=1}^{N} Q(c_{i,n}^a).
\]
This probability assignment implies that
\[
\Pr\{c_1, \ldots, c_M\} = \prod_{i=1}^{M} \Pr\{c_i\},
\]
where we define
\[
\Pr\{c_i = v\} = \sum_{\{a \in A \,|\, c_i^a = v\}} \Pr\{C^a\},
\]
i.e., all the codewords are totally independent and each has marginal distribution $Q_N(x)$.
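To make these quantities concrete, here is a minimal Python sketch (our illustration, not part of the original paper) that computes $E_0(\rho,Q)$, the random coding exponent $E_r(R,Q)$, and the mutual information $I(Q;P)$ for a binary symmetric channel; the crossover probability 0.1 and the grid search over $\rho$ are arbitrary choices.

```python
import numpy as np

def E0(rho, Q, P):
    """Gallager's E_0(rho, Q) in nats for a DMC with transition matrix
    P[k, j] = P(j|k) and input distribution Q[k]."""
    inner = (Q[:, None] * P ** (1.0 / (1.0 + rho))).sum(axis=0)  # sum over inputs k
    return -np.log((inner ** (1.0 + rho)).sum())                 # sum over outputs j

def Er(R, Q, P, grid=np.linspace(0.0, 1.0, 1001)):
    """Random coding exponent E_r(R,Q) = max over 0 <= rho <= 1 of E_0(rho,Q) - rho*R."""
    return max(E0(rho, Q, P) - rho * R for rho in grid)

def mutual_information(Q, P):
    """I(Q;P) in nats."""
    Py = Q @ P                                     # output distribution
    ratio = np.where(P > 0, P / Py[None, :], 1.0)  # avoid log(0) on zero entries
    return float((Q[:, None] * P * np.log(ratio)).sum())

# BSC with crossover probability 0.1 and uniform input distribution.
eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
Q = np.array([0.5, 0.5])

C = mutual_information(Q, P)
for R in [0.25 * C, 0.5 * C, 0.75 * C, 0.99 * C]:
    print(f"R = {R:.4f} nats, Er = {Er(R, Q, P):.4f}")  # Er > 0 for R < I(Q;P)
```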

2 Relaxed assumptions on the ensemble of codes

We now present weaker assumptions on the ensemble of codes that are sufficient for proving Gallager's bound on the average error probability of the ensemble:

• Going through the proof of the channel coding theorem, one can see that instead of choosing each codeword independently with the probability measure $Q_N(x)$, it is enough that the codewords be pairwise independent, with the marginal distribution of each codeword being $Q_N(x)$ (this fact was already noticed and utilized in [5] and [6]); i.e., it suffices that the ensemble $E$ satisfy, for each $i \ne j$,
\[
\Pr\{c_i, c_j\} = \Pr\{c_i\} \cdot \Pr\{c_j\}.
\]
This weaker assumption does not change the expressions in the proof, since the proof is based on a union bound over the error probabilities between pairs of codewords. Thus pairwise independence between the codewords is sufficient to achieve the upper bounds (1) and (2).

• If we do not have pairwise independence, but for each $i \ne j$
\[
\Pr\{c_i, c_j\} \le \alpha \cdot \Pr\{c_i\} \cdot \Pr\{c_j\}
\]
and
\[
\Pr\{c_i\} \le \beta \cdot Q_N(c_i)
\]
for some $\alpha, \beta \ge 1$, then we have
\[
\bar{P}_e \le \alpha^{\rho} \beta^{1+\rho} \exp\{-N[E_0(\rho, Q) - \rho R]\}.
\]
This is proved by using the fact that pairwise independence is sufficient, and that by upper bounding the codeword probabilities we still get an upper bound on the error probability.

• Given an ensemble of codes $E$, let us define a new ensemble $\tilde{E}$ that contains all the codes from $E$ and their permutations, i.e.,
\[
\tilde{E} = \{C^{a,\pi} \,|\, C^a \in E, \; \pi \in S_M\}
\]
where $C^{a,\pi} = \{c^a_{\pi(1)}, \ldots, c^a_{\pi(M)}\}$, with the probability assignment $\Pr\{C^{a,\pi}\} = \frac{1}{M!} \Pr\{C^a\}$. Obviously $P_e(C^a) = P_e(C^{a,\pi})$, so the average error probability over both ensembles is the same. For each $i \ne j$ we have
\[
\Pr\{\tilde{c}_i = v, \tilde{c}_j = u\} = \frac{1}{M(M-1)} \sum_{k \ne l} \Pr\{c_k = v, c_l = u\},
\]
and this extra randomness can be utilized to prove the average error bound for the ensemble $\tilde{E}$; if the bound is proved for $\tilde{E}$, then we get a bound on the error probability for $E$ as well. A small numerical check of this identity appears after this list.

These weaker requirements enable us to deal with ensembles of codes that have a very specific structure and are far from being totally random. In some cases we can use them to prove that a specific code achieves Gallager's bound.
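As promised above, here is a small Python check (ours; the toy ensemble of $M = 3$ dependent codewords and its probabilities are arbitrary choices) that builds $\tilde{E}$ from all $M!$ permutations and verifies the joint-distribution identity exactly.

```python
import itertools
import math

# Toy ensemble: M = 3 codewords per code, blocklength N = 2 over {0, 1}.
# Codewords within a code are deliberately dependent.
M = 3
codes = [
    (((0, 0), (0, 1), (1, 1)), 0.5),   # (codewords, Pr{C^a})
    (((1, 0), (0, 0), (0, 1)), 0.3),
    (((1, 1), (1, 0), (0, 0)), 0.2),
]

def joint(i, j, ensemble):
    """Pr{c_i = v, c_j = u} as a dictionary keyed by the pair (v, u)."""
    d = {}
    for words, p in ensemble:
        key = (words[i], words[j])
        d[key] = d.get(key, 0.0) + p
    return d

# The permuted ensemble: every permutation pi of every code,
# each carrying probability Pr{C^a} / M!.
tilde = [(tuple(words[k] for k in perm), p / math.factorial(M))
         for words, p in codes
         for perm in itertools.permutations(range(M))]

# Verify: Pr{c~_i=v, c~_j=u} = (1/(M(M-1))) * sum over k != l of Pr{c_k=v, c_l=u}.
for i, j in itertools.permutations(range(M), 2):
    lhs = joint(i, j, tilde)
    rhs = {}
    for k, l in itertools.permutations(range(M), 2):
        for key, p in joint(k, l, codes).items():
            rhs[key] = rhs.get(key, 0.0) + p / (M * (M - 1))
    assert all(abs(lhs.get(key, 0) - rhs.get(key, 0)) < 1e-12
               for key in set(lhs) | set(rhs))
print("permuted-ensemble identity verified")
```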

3 Example: Codes based on LFSR

We now show an example of an ensemble of codes that has the pairwise independence property and can attain the capacity of the BSC. Suppose $l$ and $v$ are random vectors uniformly distributed over $\{0,1\}^N$. Let $T_k : \{0,1\}^N \to \{0,1\}^N$, $k = 1, \ldots, M$, be invertible transformations (not necessarily linear), such that $T_k \oplus T_{k'}$ ($k \ne k'$) are invertible too. Consider the code with rate $R = N^{-1} \log M$ whose codewords are
\[
\{x_k = T_k(l) \oplus v, \quad k = 1, \ldots, M\}.
\]
The ensemble of such codes (each code depends on the random choice of $l$ and $v$) satisfies the property that the codewords are pairwise independent, and the marginal distribution of each codeword is $Q_N(x_k) = (\frac{1}{2})^N$. This is the distribution that achieves capacity for the BSC, and so we can apply the theorem above to deduce that at least one code from this ensemble attains the capacity. Note that for the BSC, $v$ does not change the performance of the code, so it can be omitted from the definition of the ensemble of codes.

To be even more specific, let $L$ be the linear transformation of a maximal-length Linear Feedback Shift Register (LFSR) of length $N$, and let $\{n_k\}_{k=1}^M$ be integers such that $n_k \ne n_{k'} \pmod{2^N - 1}$. Then the transformations $T_k = L^{n_k}$ (e.g., $T_k = L^k$), $k = 1, \ldots, M$, have the required properties above. Thus, out of all the codes generated by cycling an LFSR with a random initial content, there is a code which attains (for large $N$) the capacity. This is interesting, since it shows that an LFSR, which in a sense can be considered a pseudo-random number generator, can be used instead of a "true" random generator to satisfy the channel coding theorem.

4 Fixed Convolutional Codes

It is well known that time-varying convolutional codes can achieve the capacity of a discrete memoryless channel [3]. The time-varying assumption is needed in the proof to ensure pairwise independence between the codewords. Whether time-invariant convolutional codes attain capacity, and if so with what error exponent, has been an open problem. Here we show that fixed convolutional codes can achieve capacity (although the error exponent we obtain may be suboptimal).

We consider the following setting of fixed (time-invariant) convolutional codes with rate $R = b/n$ bits per symbol. At each time instant an information vector $u_t = (u_t^1, u_t^2, \ldots, u_t^b)$ of $b$ bits is pushed into a delay line (register) of length $K$ (i.e., the delay line contains $b \cdot K$ bits). Then $n \cdot q$ bits $a_{i,j}$, $i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, q\}$, which are linear combinations of the bits in the register, are calculated (these combinations define the specific convolutional code). The $n$ output symbols $\{o_i\}_{i=1}^n$ are produced using the simplest mapping from bits to channel symbols, $\mathcal{M} : \{0,1\}^q \to \{1, \ldots, J\}$, defined by a distribution $Q(k)$, $k \in \{1, \ldots, J\}$, in the following way: $o_i = \mathcal{M}(a_{i,1}, \ldots, a_{i,q})$ is the smallest value satisfying
\[
\sum_{k=1}^{o_i} Q(k) > \sum_{j=1}^{q} a_{i,j} \cdot 2^{-j}.
\]
For example, for a BSC with $Q(1) = Q(2) = 1/2$ we get $o_i = a_{i,1}$.
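A short Python rendering of this mapping (our illustration; the example distributions are arbitrary choices):

```python
import itertools

def map_bits_to_symbol(bits, Q):
    """Map q bits (a_1, ..., a_q) to the smallest symbol o in {1, ..., J}
    such that Q(1) + ... + Q(o) > sum_j a_j * 2^(-j)."""
    target = sum(a * 2.0 ** -(j + 1) for j, a in enumerate(bits))
    cum = 0.0
    for o, qk in enumerate(Q, start=1):
        cum += qk
        if cum > target:
            return o
    return len(Q)  # unreachable for a valid distribution (target < 1 <= cum)

# Uniform binary case (q = 1, Q(1) = Q(2) = 1/2): the symbol follows the bit.
print([map_bits_to_symbol([a], [0.5, 0.5]) for a in (0, 1)])   # prints [1, 2]

# A uniform 4-ary distribution with q = 2 bits of resolution (illustrative).
Q4 = [0.25, 0.25, 0.25, 0.25]
for bits in itertools.product((0, 1), repeat=2):
    print(bits, "->", map_bits_to_symbol(list(bits), Q4))
```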

We show that for a given DMC with transition probability $P(y|x)$, a distribution $Q(x)$, and any $b$ and $n$ such that $b/n < I(Q;P)$, there exists a sequence of convolutional codes of increasing $K$ such that
\[
\lim_{K \to \infty} P_{\mathrm{error}} \to 0 \quad \text{exponentially},
\]
where $P_{\mathrm{error}}$ is the probability of an error in decoding $N \cdot b$ transmitted bits, and $I(Q;P)$ is the mutual information defined by the input distribution $Q$ and the channel $P$. If we fix $q$ as well, we can achieve rates up to $I(Q^{(q)}; P)$, where $Q^{(q)}$ is a quantized version of $Q(x)$ at resolution $2^{-q}$.

In [7] it is claimed (but without a complete proof) that time-invariant convolutional codes attain capacity, and furthermore that the error exponent is the same as the exponent obtained for time-varying codes. However, the claim in [7] requires large $b$ and a more complex mapping from bits to channel symbols. In a recent paper [8], it is shown that fixed rate-$1/n$ codes have an error probability that vanishes exponentially, but this is proved only for low rates $R < C_n$, only for the BSC, and only for the special case $b = 1$. We outline below our simple proof, showing that at any rate $b/n$ below capacity there exists a fixed convolutional code whose error probability is upper bounded by an expression which vanishes exponentially with the constraint length. For simplicity we outline the proof for rates $1/n$ and for the BSC, i.e., $b = 1$ and $q = 1$. The complete proof is very similar and needs only minor changes.

Outline of the proof

We analyze the average performance of an ensemble of convolutional codes with $R = 1/n$, defined by $q \cdot n$ randomly chosen linear combinations $v_i = (v_i^1, v_i^2, \ldots, v_i^K)$, $i \in \{1, \ldots, n\}$ (requiring $qnbK = nK$ random bits), and by a random initial value of the register. If the register value at time $t$ is $b_t$, then we transmit $V \cdot b_t$, where $V$ is the matrix with the $v_i$ as rows.

Our proof analyzes a sub-optimal decoding procedure in which at each time point $t$ we decode the information symbol (bit, for $b = 1$) $u_t$ based on a future observed block of $L_t \cdot n$ symbols (in the case of the BSC the output symbols are bits as well). The value $L_t$ will be chosen, as described below, so that our relaxed assumptions, needed to get the upper bound on the error probability for block codes, are satisfied, i.e., so that there is pairwise independence between the true codeword and any codeword that can cause an error in decoding $u_t$. If an error occurs at any time point, we declare that the decoding has failed. We shall show that, on average, the error probability in decoding $u_t$ vanishes exponentially in $K$. Thus, as long as the information sequence length $N$ is short enough, $P_{\mathrm{error}}$ also vanishes exponentially.

Specifically, we first constrain $L_t < K/2$. Then we use the fact that if $A$ is a binary matrix with rank $l$ and $v$ is a random binary vector with uniform i.i.d. components, then the random vector $Av$ has $l$ uniform i.i.d. components. A lower bound $L_t \cdot n$ on the number of symbols we can use and still have pairwise independence can be calculated from the current register value. We may assume the current register value is known, since otherwise an error has already occurred and there is no need to calculate the error probability in decoding $u_t$.

Denote by $b$ the correct register value and by $d$ the register value on an incorrect path which diverges from the correct path at time $t$:
\[
b_{t+l} = (u_{t+l}, \ldots, u_t, u_{t-1}, \ldots, u_{t+l-K+1})^T
\]
\[
d_{t+l} = (w_{t+l}, \ldots, w_t, u_{t-1}, \ldots, u_{t+l-K+1})^T
\]
where $w_t \ne u_t$ (i.e., $b_{t'} = d_{t'}$ for $t' < t$ and $b_t \ne d_t$). The corresponding codewords are $(V \cdot b_t, V \cdot b_{t+1}, \ldots, V \cdot b_{t+L_t-1})$ for the correct path and $(V \cdot d_t, V \cdot d_{t+1}, \ldots, V \cdot d_{t+L_t-1})$ for the incorrect path. We want to choose $L_t$ so as to guarantee independence between the two codewords. Since $V$ is a uniformly distributed random matrix, these codewords are statistically independent if $\{b_t, \ldots, b_{t+l-1}, d_t, \ldots, d_{t+l-1}\}$ are linearly independent. It can be shown that taking $L_t = l$, where $l$ is the maximal number such that the rows of the matrix
\[
\begin{pmatrix}
u_{t-K/2} & u_{t-K/2-1} & \cdots & u_{t-K+1} \\
u_{t-K/2+1} & u_{t-K/2} & \cdots & u_{t-K+2} \\
\vdots & \vdots & \ddots & \vdots \\
u_{t-K/2+l-1} & u_{t-K/2+l-2} & \cdots & u_{t-K+l}
\end{pmatrix}
\]
are still linearly independent, ensures the linear independence above and hence the desired pairwise statistical independence. (Again, this matrix is known to the decoder because it contains only bits that have already been decoded.) A sketch of this rank computation over GF(2) is given below.
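As referenced above, here is a Python sketch (ours; the sizes and the random past bits are illustrative, and the row/column indexing follows our reading of the matrix above) that computes $L_t$ by Gaussian elimination over GF(2):

```python
import numpy as np

def gf2_rank(rows):
    """Rank over GF(2) of a list of equal-length 0/1 numpy vectors."""
    mat = [r.copy() for r in rows]
    rank, ncols = 0, len(mat[0]) if mat else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(mat)) if mat[i][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for i in range(len(mat)):
            if i != rank and mat[i][col]:
                mat[i] = (mat[i] + mat[rank]) % 2
        rank += 1
    return rank

def compute_Lt(u, t, K):
    """Largest l such that the l x (K/2) Toeplitz-like matrix with row r equal to
    (u[t-K/2+r], u[t-K/2+r-1], ..., u[t-K+1+r]) has full row rank over GF(2),
    capped at K/2 - 1 to respect the constraint L_t < K/2. All indices are < t,
    i.e. only already-decoded bits are used."""
    half = K // 2
    rows = []
    for r in range(half):
        row = np.array([u[t - half + r - c] for c in range(half)], dtype=np.uint8)
        if gf2_rank(rows + [row]) < len(rows) + 1:   # new row is linearly dependent
            break
        rows.append(row)
    return min(len(rows), half - 1)

rng = np.random.default_rng(0)      # illustrative random past bits
K, t = 16, 40
u = rng.integers(0, 2, size=t + 1)
print("L_t =", compute_Lt(u, t, K))
```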

Now we analyze the error probability, averaged over a uniform choice of the messages, i.e., under the assumption that the $u$'s are uniformly distributed. In this case we have $\Pr\{L_t = l\} \le 2^{b(l - K/2)}$. For each value of $L_t$ we face the situation where we observe a block of $L_t \cdot n$ symbols and try to decide between at most $2^{b \cdot L_t}$ randomly chosen possible inputs. The error probability in this case can be upper bounded by Gallager's exponential expression for block codes. Using this expression, and taking the expectation with respect to $L_t$, we get
\[
P_e \le \sum_{l=0}^{K/2} 2^{b(l - K/2)} \cdot 2^{bl\rho} \cdot 2^{-nlE_0(\rho, Q)}
= 2^{-b\frac{K}{2}} \sum_{l=0}^{K/2} 2^{-nl(E_0(\rho, Q) - (\rho+1)R)}
\le \frac{2^{1 - (\frac{K}{2}+1)\,nE_r(R,Q)} - 2^{-\frac{K}{2}}}{2^{-n(E_r(R,Q) - R)} - 1},
\]
where in the last step the geometric series is evaluated with $\rho$ chosen to maximize $E_0(\rho,Q) - \rho R$, so that $E_0(\rho,Q) - (\rho+1)R = E_r(R,Q) - R$ (recall $R = b/n$ and here $b = 1$). Finally,
\[
P_{\mathrm{error}} \le N \cdot P_e.
\]
For $R < I(Q;P)$ and $\log N = o(K \cdot n)$, the expression above goes to zero exponentially with $K \cdot n$.
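As a sanity check on this algebra (our addition, not in the paper), the following Python snippet evaluates the finite sum and the closed form for a BSC with crossover 0.05 at rate $R = 1/4$, both arbitrary choices; the two agree to machine precision and decay with $K$. Note that $E_0$ is taken in bits here, consistent with the base-2 exponents above.

```python
import numpy as np

def E0_bits(rho, Q, P):
    """Gallager's E_0 in bits (base-2 logarithm)."""
    inner = (Q[:, None] * P ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

eps, n, b = 0.05, 4, 1            # BSC(0.05), rate R = b/n = 1/4 bit per symbol
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
Q = np.array([0.5, 0.5])
R = b / n

rhos = np.linspace(0.0, 1.0, 2001)
vals = np.array([E0_bits(r, Q, P) - r * R for r in rhos])
rho = rhos[vals.argmax()]          # rho achieving E_r(R, Q)
Er = vals.max()
E0v = E0_bits(rho, Q, P)

for K in (16, 32, 64):
    s = sum(2.0 ** (b * (l - K / 2)) * 2.0 ** (b * l * rho) * 2.0 ** (-n * l * E0v)
            for l in range(K // 2 + 1))
    closed = ((2.0 ** (1 - (K / 2 + 1) * n * Er) - 2.0 ** (-K / 2))
              / (2.0 ** (-n * (Er - R)) - 1.0))
    print(f"K={K:3d}  sum={s:.3e}  closed form={closed:.3e}")
```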

References

[1] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, 1991.

[2] Robert G. Gallager. Information Theory and Reliable Communication. Wiley, 1968.

[3] Andrew J. Viterbi and Jim K. Omura. Principles of Digital Communication and Coding. McGraw-Hill, 1979.

[4] Imre Csiszár and János Körner. Information Theory. Academic Press, New York, 1981.

[5] R. L. Dobrushin. Asymptotic optimality of group and systematic codes for some channels. Theory of Probability and its Applications, pages 47-60, 1963. Translated from Teor. Veroyat. i ee Primen.

[6] E. M. Gabidulin. Limits for the decoding error probability when linear codes are used in memoryless channels. Problems of Information Transmission, pages 43-48, 1967. Translated from Problemy Peredachi Informatsii.

[7] K. Sh. Zigangirov. Time-invariant convolutional codes: Reliability function. In Proc. 2nd Joint Soviet-Swedish Workshop on Information Theory, Gränna, Sweden, April 1985.

[8] Gérald Séguin. A random coding bound for fixed convolutional codes of rate 1/n. IEEE Trans. on Inform. Theory, vol. IT-40, pp. 1668-1670, September 1994.
