IEICE TRANS. FUNDAMENTALS, VOL.E90–A, NO.9 SEPTEMBER 2007


PAPER

Special Section on Information Theory and Its Applications

Construction of Universal Codes Using LDPC Matrices and Their Error Exponents

Shigeki MIYAKE†a) and Mitsuru MARUYAMA††, Members

SUMMARY A universal coding scheme for information from i.i.d. sources, arbitrarily varying sources, or memoryless correlated sources is constructed using LDPC matrices and is shown to have an exponentially decaying upper bound on the decoding error probability. As a corollary, we construct a universal code for a noisy channel model that is not necessarily a BSC. Simulation results show the universality of the code under sum-product decoding, as well as a gap between the error exponent obtained by simulation and the one obtained theoretically.
key words: LDPC matrices, universal code, minimum entropy decoding, error exponent

1. Introduction

Like turbo codes [1], low density parity check (LDPC) codes [9]–[11] have been investigated as practical channel codes that could possibly attain the Shannon limit under appropriate conditions. Recent theoretical analysis of coding schemes using LDPC matrices has proceeded as much in the source coding field [2]–[4], [12], [14], [15], [19] as in the channel coding field. Using LDPC matrices, Matsunaga et al. [12] constructed a source code for binary i.i.d. sources with uniform probability distribution, and proved that the code asymptotically attains the rate-distortion limit under a fidelity criterion. Miyake et al. [14] extended their results to nonbinary alphabets. Muramatsu et al. [15] constructed a code using LDPC matrices for the Slepian-Wolf system, for which no statistical condition such as stationarity or ergodicity is assumed, and proved that the code asymptotically attains the theoretical bounds under maximum likelihood decoding. All of these results are obtained under the condition that the source statistics are known to both encoder and decoder.

In practical communication systems, however, the source statistics are not necessarily known to both encoder and decoder. In this situation, a universal code can attain optimal performance in the following sense. Universal codes come in two types: variable length coding and fixed length coding. In the former type, the coding rate can attain the compression limit, i.e., the entropy, and the analysis of redundancy, which is the difference between the coding rate and the entropy, has been studied by many researchers [16], [18]. In the latter type, a universal code attains the optimal decoding error exponent, which governs how fast the decoding error decreases, while the coding rate remains constant. The Lempel-Ziv code [20], [21] is one of the most famous variable length universal codes and has practical encoding and decoding computing times. Recently, Caire et al. proposed a variable length universal code that combines linear codes with the MDL principle [2]–[4]. Little is currently known about fixed length universal codes, especially linear universal codes, that have practical encoding and decoding computing times. While Coleman et al. recently proposed an efficient universal decoding algorithm using linear programming or expander codes, they did not discuss the efficiency of the algorithm in practical use, a subject they may currently be studying with simulation experiments [5]. Since many coding theorems are proved for fixed length codes, a fixed length universal code with practical encoding and decoding performance would have the potential to implement the codes given in coding theorems, which have so far been regarded as no more than proofs of existence.

We construct a fixed length code using LDPC matrices, adopt minimum entropy decoding [7] as the decoding scheme because it does not depend on the statistical properties of the information source, and show that the code has the universal property in the sense of fixed length coding. It is also shown that error exponents similar to those obtained by a uniform random coding technique can be obtained by a nonuniform random coding technique. Using LDPC matrices for the code construction enables us to use existing efficient decoding algorithms such as the sum-product algorithm.

Manuscript received December 20, 2006. Manuscript revised March 26, 2007. Final manuscript received May 7, 2007.
† The author is with NTT Network Innovation Laboratories, NTT Corporation, Yokosuka-shi, 239-0847 Japan.
†† The author is with NTT Network Innovation Laboratories, NTT Corporation, Musashino-shi, 180-8585 Japan.
a) E-mail: [email protected]
DOI: 10.1093/ietfec/e90–a.9.1830
Fixed length universal codes constructed from LDPC matrices provide a computationally practical encoding and decoding scheme whose error probability decreases optimally without knowledge of the statistical properties of the sources. We consider i.i.d. sources, arbitrarily varying sources (AVS), and stationary memoryless correlated sources as the classes of unknown sources. To show that a linear code can be used as a universal code, when analysing the decoding error, the evaluation of a supremum with respect to the objective class of sources has to be inserted into the expectation operation of nonuniform random coding. Since this type of evaluation is not easy, the universality of a linear code including both encoder and decoder had not previously been proved. Though the universality of minimum entropy decoding was treated by Csiszár [7], the universality of the linear encoder was not discussed. Furthermore, when we treat LDPC matrices, the probability distribution of random coding is no longer uniform, and the evaluation problem becomes harder. In this paper, the permutation group technique, which was invented by Shulman et al. [17] to prove the channel coding theorem and was applied by Muramatsu et al. [15] to prove a coding theorem for general correlated sources, is used to evaluate the supremum. In addition, the upper bound on the decoding error is shown in exponential form using the expurgated ensemble technique, which Miller et al. [13] and Erez et al. [8] adopted to evaluate the decoding error exponent of a linear channel code. As an application of the theorem obtained here, by considering the correspondence between an AVS and a discrete memoryless channel, we construct a universal channel code for a compound channel (compound DMC) [6]: a class of channels whose noise is not necessarily additive.

Copyright © 2007 The Institute of Electronics, Information and Communication Engineers

This paper is organized as follows. In Sect. 2, definitions of terms and the problem setting are given as the framework. In Sect. 3, the main theorem is presented, followed by a corollary that makes a stronger claim than the theorem under a constraint. A universal code for a compound DMC is constructed as another corollary. In Sect. 4, a proof of the theorem is given. In Sect. 5, simulation results that demonstrate the universality of the code constructed by LDPC matrices are presented, and the error exponents obtained by simulation are compared with the theoretical ones. The conclusion is in Sect. 6.

2. Preliminaries and Problem Setting

We treat our problem in the binary alphabet framework. The binary alphabet $\mathcal{Z}_2 = \{0, 1\}$ has the structure of a field, so summation and product are defined. We use the following notation.

$\mathcal{P}$: a class of information sources. Each set of i.i.d., AVS, or memoryless correlated sources is considered as $\mathcal{P}$.

Sequence from the source: a random sequence from a source belonging to $\mathcal{P}$ is denoted $Z^n \in \mathcal{Z}_2^n$. A realized sequence is represented by a small letter such as $z^n$.

LDPC matrix and $E_A$: the LDPC matrix is represented as an $n \times k$ matrix $A$, constructed by the randomized scheme described in Sect. 4. The expectation with respect to the random matrix $A$ is denoted $E_A$.

Constant weight set of LDPC matrix $A$: $S_A(l)$ is the set of $z^n \in \mathcal{Z}_2^n$ that satisfy $z^n A = 0^k$ and whose weight $w(z^n)$ equals $l$.

Encoder and coding rate: for a sequence $z^n$ from the source, the output of the encoder $\varphi_n$ is defined as
\[
\varphi_n(z^n) \stackrel{\mathrm{def}}{=} z^n A.
\]
The coding rate is $k/n$.

Fig. 1  Memoryless correlated sources.

On the other hand, if $\mathcal{P}$ is a class of memoryless correlated sources, the encoding scheme represented in Fig. 1 is considered. In this case, the output of the encoder $\varphi_n$ is defined as
\[
\varphi_n(x^n) \stackrel{\mathrm{def}}{=} x^n A.
\]

Decoder: minimum entropy decoding [7] is adopted. If $\mathcal{P}$ is a class of i.i.d. or AVS, for a given codeword $u^k \in \mathcal{Z}_2^k$, the output of the decoder $\psi_n$ is defined as
\[
\psi_n(u^k) \stackrel{\mathrm{def}}{=} \arg\min_{z^n : z^n A = u^k} H(P_{z^n}).
\]
On the other hand, if $\mathcal{P}$ is a class of memoryless correlated sources, for a given codeword $u^k \in \mathcal{Z}_2^k$ and side information $y^n \in \mathcal{Z}_2^n$, the output of the decoder $\psi_n$ is defined as
\[
\psi_n(u^k, y^n) \stackrel{\mathrm{def}}{=} \arg\min_{x^n : x^n A = u^k} H(P_{x^n y^n}),
\]
where $P_{z^n}$ represents the type of $z^n$ and $H(P)$ is the entropy of a probability distribution $P$. The type $P_{z^n}$ of $z^n$ is the probability distribution defined by
\[
P_{z^n}(a) \stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[z_i = a],
\]
where $\mathbf{1}[\cdot]$ is the indicator function that takes the value 1 if the statement in $[\,\cdot\,]$ is true, and 0 otherwise.

The class of sources $\mathcal{P}$ treated in this paper is defined as follows.

Definition 1: [A class of sources $\mathcal{P}$] For i.i.d.,
\[
\mathcal{P} \stackrel{\mathrm{def}}{=} \Bigl\{ P_{Z^n}(z^n) = \prod_{i=1}^{n} P(z_i) \Bigr\}.
\]
For AVS,
\[
\mathcal{P} \stackrel{\mathrm{def}}{=} \Bigl\{ P_{Z^n}(z^n \mid s^n) = \prod_{i=1}^{n} P(z_i \mid s_i),\; s_i \in \{0,1\}\ (1 \le i \le n) \Bigr\}.
\]
For memoryless correlated sources,
\[
\mathcal{P} \stackrel{\mathrm{def}}{=} \Bigl\{ P_{X^n Y^n}(x^n, y^n) = \prod_{i=1}^{n} P(x_i, y_i) \Bigr\}.
\]

A characteristic entropy for each class of sources is defined as
\[
H(\mathcal{P}) \stackrel{\mathrm{def}}{=}
\begin{cases}
\max_{P \in \mathcal{P}} H(Z), & \text{$\mathcal{P}$ is a class of i.i.d. sources},\\[2pt]
\max_{P \in \mathcal{P}} \max_{\bar{P} \in \mathrm{conv}(P)} H(\bar{P}), & \text{$\mathcal{P}$ is a class of AVS},\\[2pt]
\max_{P \in \mathcal{P}} H(X \mid Y), & \text{$\mathcal{P}$ is a class of memoryless correlated sources}.
\end{cases}
\]
In the above definition, $\mathrm{conv}(P)$ represents the convex hull of a given AVS $P$. It should be noted that the states of the AVS belong to a binary set in the above setting, while in a general setting the states of an AVS can take values in an uncountable set. Lemma 3, shown below, is valid under the condition of a finite state set of the AVS.

If $\mathcal{P}$ is a class of i.i.d. sources, the problem setting is described as follows. A similar description applies when $\mathcal{P}$ is a class of AVS or memoryless correlated sources.

[Problem Setting] When the coding rate is given as $R$, evaluate an upper bound on the expected decoding error probability
\[
E_A \sup_{P \in \mathcal{P}} \sum_{z^n} P(z^n)\, \mathbf{1}\bigl[ z^n \ne \psi_n(\varphi_n(z^n)) \bigr].
\]
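As a concrete illustration of the encoder $\varphi_n$ and the minimum entropy decoder $\psi_n$ defined above, the following sketch implements both by brute force for a toy block length. All function names are ours, and a practical decoder would of course not enumerate all of $\mathcal{Z}_2^n$; this is only meant to make the definitions executable.

```python
import itertools
import math
import random

def empirical_entropy(z):
    """Entropy (in bits) of the type (empirical distribution) of a binary tuple z."""
    p = sum(z) / len(z)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def encode(z, A):
    """phi_n(z^n) = z^n A over GF(2); A is an n x k 0/1 matrix (list of rows)."""
    n, k = len(A), len(A[0])
    return tuple(sum(z[i] * A[i][j] for i in range(n)) % 2 for j in range(k))

def me_decode(u, A):
    """Minimum entropy decoding: among all z^n with z^n A = u^k,
    return one whose type has the smallest entropy (brute force)."""
    best, best_h = None, float("inf")
    for z in itertools.product((0, 1), repeat=len(A)):
        if encode(z, A) == u:
            h = empirical_entropy(z)
            if h < best_h:
                best, best_h = z, h
    return best

random.seed(0)
n, k = 10, 5
A = [[random.randint(0, 1) for _ in range(k)] for _ in range(n)]
z = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0)   # a low-entropy source sequence
u = encode(z, A)                      # codeword (syndrome); coding rate k/n = 0.5
zhat = me_decode(u, A)
```

By construction `zhat` is consistent with the codeword and has empirical entropy no larger than that of `z`; decoding succeeds exactly when `z` is the unique minimum-entropy sequence in its coset.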

In this paper, we study the fixed length universal coding problem stated immediately above, i.e., the problem of constructing universal codes with the optimal error exponent under a constant coding rate. Refer to Csiszár [7] for studies of the optimal error exponent of linear codes. In the following, the base of $\ln$ is $e$ and the base of $\log$ and $\exp$ is 2.

3. Main Theorem and Corollary

3.1 Main Theorem

Theorem 1: [Error exponent of the source code constructed by LDPC matrices] If $\mathcal{P}$ is a class of i.i.d. sources,
\[
(\text{Expectation of decoding error}) \le \exp\Bigl( -n \inf_{P \in \mathcal{P}} \min_{Q} \bigl[ D(Q \| P) + |R - H(Q)|^{+} \bigr] \Bigr).
\]
If $\mathcal{P}$ is a class of AVS,
\[
(\text{Expectation of decoding error}) \le \exp\Bigl( -n \inf_{P \in \mathcal{P}} \inf_{\bar{P} \in \mathrm{conv}(P)} \min_{Q} \bigl[ D(Q \| \bar{P}) + |R - H(Q)|^{+} \bigr] \Bigr).
\]
If $\mathcal{P}$ is a class of memoryless correlated sources,
\[
(\text{Expectation of decoding error}) \le \exp\Bigl( -n \inf_{P_{XY} \in \mathcal{P}} \min_{P_{\tilde{X}\tilde{Y}}} \bigl[ D(\tilde{X}\tilde{Y} \| XY) + |R - H(\tilde{X} \mid \tilde{Y})|^{+} \bigr] \Bigr),
\]
where $|x|^{+} \stackrel{\mathrm{def}}{=} \max\{x, 0\}$.

Remark 1: When $R \le H(\mathcal{P})$, the upper bound shown in Theorem 1 is trivial. On the other hand, when $R > H(\mathcal{P})$, it is a nontrivial evaluation.

Remark 2: The upper bound shown in Theorem 1 is identical to the one obtained for the i.i.d. class or the class of memoryless correlated sources by the uniform random coding technique [7, Theorem 2], except for the outer $\inf$ operation. When the coding rate $R$ is in the interval $(H(\mathcal{P}), R_{cr})$ for some $R_{cr}$, the obtained exponent can be shown to be optimal for the same reason as discussed by Csiszár [7]. Also, for the same reason discussed there, $\mathcal{P}$ can be extended to the class of $k$-th order Markov sources without difficulty.

Remark 3: While Theorem 1 is obtained for the binary alphabet, it can be extended to a multiple alphabet by following the construction of the LDPC matrix presented by Erez et al. [8].

The error exponents obtained in Theorem 1 are suboptimal, because the $\inf_{P \in \mathcal{P}}$ operation stands in front of the optimal (in the sense of Remark 2) exponents. The next corollary shows that under some conditions on $\mathcal{P}$, a good LDPC matrix $A$ provides a code with an optimal (also in the sense of Remark 2) error exponent for each $P \in \mathcal{P}$.

Corollary 1: Let
\[
\mathcal{P} = \Bigl\{ P_Z \Bigm| P_Z \text{ is i.i.d. and } 0 < P_Z(Z=1) < \tfrac{1}{2} \Bigr\}.
\]
For $\forall \Delta > 0$,
\[
E_A \mathbf{1}\Bigl[ \bigvee_{P_Z \in \mathcal{P}} \bigl\{ P_{Z^n}(\mathrm{DecodingError}) > \exp\bigl( -n ( D(P_Z) - \Delta ) \bigr) \bigr\} \Bigr] \to 0 \quad (n \to \infty)
\]
holds, where
\[
D(P_Z) \stackrel{\mathrm{def}}{=} \min_{Q} \bigl[ D(Q \| P_Z) + |R - H(Q)|^{+} \bigr],
\]
and $\bigvee_{\alpha \in \mathcal{A}} \{ f_\alpha \}$ denotes the "OR" operation over $\alpha \in \mathcal{A}$ of the logical statements $f_\alpha$.

The proof is given in the Appendix. Following the proof, it can be seen that when $\mathcal{P}$ is a class of AVS or memoryless correlated sources, similar corollaries hold.

3.2 An Application to Channel Coding

As an application of the AVS case of Theorem 1, a universal channel code for a compound DMC [6] can be constructed as follows. We consider an ordinary two-terminal channel as shown in Fig. 2. As a convention, the random variable of the channel output is denoted by $Y^n$. It should be noted that this random variable is different from $Y^n$ in Fig. 1.

Fig. 2  Channel model.

Fig. 3  Correspondence between channel and AVS.

Fig. 4  Construction of an LDPC matrix: n = 4, k = 3, c = 3, d = 4.

The channel model described in Fig. 2 corresponds to the AVS model as shown in Fig. 3. By making the states of the AVS $s^n$ correspond to the channel inputs $x^n$, the codeword of the AVS can be made to correspond to the syndrome of the channel output $y^n$, or equivalently of the channel noise $z^n$. It can be seen that decoding $z^n$ from the syndrome in the channel model is equivalent to decoding the source message from the codeword in the AVS model. The next corollary is derived from the above correspondence.

Corollary 2: [Universal channel code for a compound DMC] Suppose the encoder and decoder do not know the channel statistics and know only that the channel belongs to the compound DMC $\mathcal{W} \stackrel{\mathrm{def}}{=} \{ W^{(\alpha)} \mid \alpha \in \Lambda \}$. If the channel coding rate $R_c$ satisfies
\[
R_c < 1 - \sup_{P \in \mathcal{P}} \sup_{\bar{P} \in \mathrm{conv}(P)} H(\bar{P}),
\]
then there exists an LDPC matrix with which encoding and decoding can asymptotically reduce the decoding error to 0 with block length $n$, where $\mathcal{P} \stackrel{\mathrm{def}}{=} \{ P^{(\alpha)} \mid \alpha \in \Lambda \}$ is the set whose element $P^{(\alpha)}$ is the AVS corresponding to the element $W^{(\alpha)}$ of the compound DMC $\mathcal{W}$.

Remark 4: Between the channel coding rate $R_c$ and the corresponding source coding rate $R$, the well known relationship $R_c = 1 - R$ holds. When the channel is a binary symmetric channel (BSC), a random coding exponent of channel coding can be obtained through the source coding error exponent of Theorem 1; this is shown without difficulty in a fashion similar to that described by Csiszár [7, Corollary 2].
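The correspondence of Fig. 3 can be made concrete with a small syndrome-decoding sketch, assuming an additive (BSC-like) noise for simplicity: codewords are taken from the null space of $A$, the receiver computes the syndrome of the channel output, which by linearity equals the syndrome of the noise, and recovers the noise by minimum entropy decoding. All names and the toy parameters are ours.

```python
import itertools
import math
import random

def mul(z, A):
    """z^n A over GF(2); A is an n x k 0/1 matrix."""
    n, k = len(A), len(A[0])
    return tuple(sum(z[i] * A[i][j] for i in range(n)) % 2 for j in range(k))

def ent(z):
    """Entropy (bits) of the type of z."""
    p = sum(z) / len(z)
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

random.seed(1)
n, k = 10, 4
A = [[random.randint(0, 1) for _ in range(k)] for _ in range(n)]

# Channel code: codewords are the null space of A (all x^n with x^n A = 0^k).
codebook = [x for x in itertools.product((0, 1), repeat=n) if mul(x, A) == (0,) * k]

x = random.choice(codebook)                        # transmitted codeword
z = tuple(1 if i == 2 else 0 for i in range(n))    # sparse channel noise
y = tuple((xi + zi) % 2 for xi, zi in zip(x, z))   # channel output

# Receiver: the syndrome of y equals the syndrome of z, by linearity.
u = mul(y, A)

# Minimum entropy decoding of the noise from its syndrome, then of the codeword.
zhat = min((c for c in itertools.product((0, 1), repeat=n) if mul(c, A) == u), key=ent)
xhat = tuple((yi + zi) % 2 for yi, zi in zip(y, zhat))
```

Decoding succeeds whenever the noise is the unique minimum-entropy sequence in its coset; in any case `xhat` is a valid codeword, since its syndrome is $u \oplus u = 0^k$.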

4. Proof of Theorem 1

We will prove Theorem 1 with the following steps.

Step 1: Construct an LDPC matrix by the bipartite graph method [8], [13], deleting matrices that have a "small weight code" or a "high weight code." The ensemble of matrices constructed by this scheme is called an "expurgated ensemble."

Step 2: Evaluate the random coding expectation of the decoding error over the ensemble constructed in Step 1. Note that a supremum operation has to be performed when we evaluate the decoding error; for this, the evaluation technique by permutation groups given by Muramatsu et al. [15] and Shulman et al. [17] is utilized.

4.1 Construction of LDPC Matrices

An example of the construction is shown in Fig. 4. The parity check matrix on the right side of Fig. 4 is given by the bipartite graph on the left side. Each of the $n$ variable nodes has $c$ sockets and each of the $k$ check nodes has $d$ sockets, with $nc = kd$. $d$ is taken to be an even number, following the comment in Remark 5. A bipartite graph is constructed by matching the $nc$ sockets of the variable nodes and the $kd$ sockets of the check nodes one-to-one at random and joining matched sockets by an edge. An LDPC matrix corresponding to the bipartite graph is made as follows: set the $(i, j)$ element of the matrix $A$ to 0 if the number of edges between the $i$-th variable node and the $j$-th check node is even, and to 1 otherwise. The expectation operation in the random coding below is performed over the randomness of the above construction and is denoted by $E_A$.

4.2 Expurgated Ensemble

In proving the coding theorem, while the random coding expectation is ordinarily performed over the ensemble constructed above, here it is taken over the expurgated ensemble. Since we delete the "bad code set" from the original ensemble, the expectation of the decoding error can be improved. The set from which the "bad code subset" is removed is called the expurgated ensemble [8], [13].
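The socket-matching construction of Sect. 4.1 can be sketched as follows; the function name and the use of a seeded generator are ours.

```python
import random

def ldpc_matrix(n, k, c, d, rng=random):
    """Sample an n x k parity matrix as in Sect. 4.1: each of the n variable
    nodes has c sockets and each of the k check nodes has d sockets (nc = kd);
    sockets are matched uniformly at random, and entry (i, j) is the parity of
    the number of edges between variable node i and check node j."""
    assert n * c == k * d
    var_sockets = [i for i in range(n) for _ in range(c)]
    chk_sockets = [j for j in range(k) for _ in range(d)]
    rng.shuffle(chk_sockets)
    A = [[0] * k for _ in range(n)]
    for i, j in zip(var_sockets, chk_sockets):
        A[i][j] ^= 1  # an even number of parallel edges gives 0, odd gives 1
    return A

rng = random.Random(0)
A = ldpc_matrix(n=4, k=3, c=3, d=4, rng=rng)  # the parameters of Fig. 4
```

Note one consequence of taking $d$ even: the edge multiplicities in each column sum to $d$, so every column of $A$ has even weight, which is exactly the symmetry exploited in Remark 5 below.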


The next lemma shows that the bad code subset which will be removed has small probability.

Lemma 1: [Erez et al. [8, Lemma 3]] The smallest weight of the code derived from an LDPC matrix $A$ is defined as $w_{\min}(A) \stackrel{\mathrm{def}}{=} \min_{z^n \ne 0^n : z^n A = 0^k} w(z^n)$. There exists a positive integer $d_0$ such that, if $d \ge d_0$, then
\[
\lim_{n \to \infty} E_A \mathbf{1}\bigl[ w_{\min}(A) < \gamma n \bigr] = 0
\]
holds, where $\gamma = 2e^{-13}/d_0$, and the cardinality of a set $\mathcal{A}$ is denoted $|\mathcal{A}|$.

Remark 5: Miller et al. [13, Theorem 3] evaluated the decoding error of an LDPC channel code constructed by the bipartite graph scheme. From comments in the proof, it is not difficult to show that if $d$ is even, then
\[
\Pr\bigl\{ z^n A = 0^k \bigm| w(z^n) = l \bigr\} = \Pr\bigl\{ z^n A = 0^k \bigm| w(z^n) = n - l \bigr\}
\]
holds. Since, by the above fact,
\[
E_A |S_A(l)| = E_A |S_A(n - l)|
\]
also holds, when we define the largest weight of the code derived from the LDPC matrix $A$ as $w_{\max}(A) \stackrel{\mathrm{def}}{=} \max_{z^n \ne 0^n : z^n A = 0^k} w(z^n)$, under the same condition as in Lemma 1,
\[
\lim_{n \to \infty} E_A \mathbf{1}\bigl[ w_{\max}(A) > (1 - \gamma) n \bigr] = 0
\]
is valid.

Remark 6: We can set $d_0$ in Lemma 1 to be an arbitrary natural number. While the rate at which $E_A \mathbf{1}[w_{\min}(A) < \gamma n]$ converges to 0 depends on $d_0$, the statement of Lemma 1 is unchanged.

The expurgated ensemble is defined as follows.

Definition 2: [Expurgated ensemble] The set of $n \times k$ matrices $A$ which satisfy $w_{\max}(A) < (1-\gamma)n$ and $w_{\min}(A) > \gamma n$ is called an expurgated ensemble. Following Lemma 1 and Remark 6, when the expectation operation in random coding $E_A(\cdot)$ is replaced by the expectation operation over the expurgated ensemble
\[
E^{\mathrm{ex}}_A(\cdot) \stackrel{\mathrm{def}}{=} \frac{E_A \bigl\{ \mathbf{1}\bigl[ w_{\max}(A) < (1-\gamma)n,\, w_{\min}(A) > \gamma n \bigr] (\cdot) \bigr\}}{E_A \mathbf{1}\bigl[ w_{\max}(A) < (1-\gamma)n,\, w_{\min}(A) > \gamma n \bigr]},
\]
for any $\delta > 0$, if we take $n$ sufficiently large, then
\[
E^{\mathrm{ex}}_A(\cdot) \le (1 + \delta)\, E_A \bigl\{ \mathbf{1}\bigl[ w_{\max}(A) < (1-\gamma)n,\, w_{\min}(A) > \gamma n \bigr] (\cdot) \bigr\}
\]
holds. Since the probability of the "bad set" expurgated from the original ensemble is very small, it should be noted that, by Markov's inequality, a randomly chosen LDPC matrix $A$ is included in the expurgated ensemble with high probability.

4.3 Evaluation of Decoding Error Probability

In this subsection, only the AVS class is considered, because the decoding error probability can be evaluated in a similar manner for the classes of i.i.d. and memoryless correlated sources. The expectation of the decoding error in random coding over the expurgated ensemble can be transformed as follows. It should be noted that, since the universality of the code $A$ is considered, a supremum operation is inserted into the expectation:
\begin{align*}
& E^{\mathrm{ex}}_A \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n)\, \mathbf{1}\bigl[ z^n \ne \psi_n(\varphi_n(z^n)) \bigr] \\
&\le E^{\mathrm{ex}}_A \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n) \sum_{\tilde{z}^n \ne z^n} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \mathbf{1}\bigl[ \tilde{z}^n A = z^n A \bigr] \\
&= E^{\mathrm{ex}}_A \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n) \sum_{l = \gamma n + 1}^{(1-\gamma)n} \sum_{\tilde{z}^n : w(\tilde{z}^n - z^n) = l} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \mathbf{1}\bigl[ \tilde{z}^n A = z^n A \bigr] \\
&\le (1+\delta)\, E_A \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n) \sum_{l = \gamma n + 1}^{(1-\gamma)n} \sum_{\tilde{z}^n : w(\tilde{z}^n - z^n) = l} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \mathbf{1}\bigl[ (\tilde{z}^n - z^n) A = 0^k \bigr] \tag{1}
\end{align*}
The equality in the above transformation is derived from the definition of the expurgated ensemble, and the last inequality from the comment immediately below Definition 2 and the linearity of the matrix $A$.

To evaluate Eq. (1), the permutation group technique [15], [17] is applied. $\Xi_n$ is the permutation group of degree $n$, and $\xi \in \Xi_n$ is a permutation of degree $n$. From the construction, if $\xi$ is a permutation of the row vectors of $A$, then $P(A) = P(\xi(A))$ holds. In a manner similar to that of Muramatsu et al. [15], since
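Before continuing the derivation, the expurgation of Definition 2 can be illustrated for a small random matrix by brute-force null-space enumeration. The function names are ours, and the threshold `gamma` below is an illustrative value, not the $\gamma = 2e^{-13}/d_0$ of Lemma 1.

```python
import itertools
import random

def mul(z, A):
    """z^n A over GF(2)."""
    n, k = len(A), len(A[0])
    return tuple(sum(z[i] * A[i][j] for i in range(n)) % 2 for j in range(k))

def weight_range(A):
    """(w_min, w_max) over nonzero codewords z^n with z^n A = 0^k
    (brute force; None if the null space is trivial)."""
    n, k = len(A), len(A[0])
    ws = [sum(z) for z in itertools.product((0, 1), repeat=n)
          if any(z) and mul(z, A) == (0,) * k]
    return (min(ws), max(ws)) if ws else None

def in_expurgated_ensemble(A, gamma):
    wr = weight_range(A)
    if wr is None:
        return True   # no nonzero codeword can violate the weight bounds
    n = len(A)
    return wr[0] > gamma * n and wr[1] < (1 - gamma) * n

random.seed(2)
n, k = 12, 6
A = [[random.randint(0, 1) for _ in range(k)] for _ in range(n)]
wr = weight_range(A)
print(wr, in_expurgated_ensemble(A, gamma=0.05))
```

For $n > k$ the null space is always nontrivial, so `wr` is a genuine weight range; Lemma 1 and Remark 5 say that for the bipartite-graph ensemble the membership test passes with probability approaching 1.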




\begin{align*}
& \sum_{\tilde{z}^n : w(\tilde{z}^n - z^n) = l} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \mathbf{1}\bigl[ (\tilde{z}^n - z^n)\, \xi(A) = 0^k \bigr] \\
&= \sum_{\tilde{z}^n : w(\tilde{z}^n - z^n) = l} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \mathbf{1}\bigl[ \xi^{-1}(\tilde{z}^n - z^n)\, A = 0^k \bigr] \\
&= \sum_{c^n \in S_A(l)} \sum_{\tilde{z}^n : w(\tilde{z}^n - z^n) = l} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \mathbf{1}\bigl[ \tilde{z}^n - z^n = \xi(c^n) \bigr]
\end{align*}
is derived, Eq. (1) is evaluated as follows, noting that $|\Xi_n| = n!$:
\begin{align*}
\text{Eq. (1)} &= (1+\delta)\, \frac{1}{n!} \sum_{\xi \in \Xi_n} E_{\xi(A)} \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n) \sum_{l = \gamma n + 1}^{(1-\gamma)n} \sum_{\tilde{z}^n : w(\tilde{z}^n - z^n) = l} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \mathbf{1}\bigl[ (\tilde{z}^n - z^n)\, \xi(A) = 0^k \bigr] \\
&= (1+\delta)\, \frac{1}{n!} \sum_{\xi \in \Xi_n} E_{\xi(A)} \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n) \sum_{l = \gamma n + 1}^{(1-\gamma)n} \sum_{c^n \in S_A(l)} \sum_{\tilde{z}^n : w(\tilde{z}^n - z^n) = l} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \mathbf{1}\bigl[ \tilde{z}^n - z^n = \xi(c^n) \bigr] \\
&= (1+\delta)\, E_A \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n) \sum_{l = \gamma n + 1}^{(1-\gamma)n} \sum_{c^n \in S_A(l)} \sum_{\tilde{z}^n : w(\tilde{z}^n - z^n) = l} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \frac{1}{n!} \sum_{\xi \in \Xi_n} \mathbf{1}\bigl[ \tilde{z}^n - z^n = \xi(c^n) \bigr] \\
&= (1+\delta)\, E_A \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n) \sum_{l = \gamma n + 1}^{(1-\gamma)n} \sum_{\tilde{z}^n : w(\tilde{z}^n - z^n) = l} \mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]\, \frac{|S_A(l)|}{\binom{n}{l}} \\
&\le (1+\delta) \Biggl[ \sum_{l = \gamma n + 1}^{(1-\gamma)n} \frac{2^k\, E_A |S_A(l)|}{\binom{n}{l}} \Biggr] \cdot \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n)\, \min\Biggl\{ \sum_{\tilde{z}^n} \frac{\mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]}{2^k},\, 1 \Biggr\} \tag{2}
\end{align*}
$P(A) = P(\xi(A))$ is used when the right hand side of the 2nd equality is transformed into the right hand side of the 3rd equality. At the last inequality, since $\tilde{z}^n = \psi_n(\varphi_n(z^n))$, the fact that the number of values $\tilde{z}^n$ can take is less than $2^k$ is used.

It should be noted that Eq. (2), obtained above, factorizes into two parts. The 1st factor is the sum of the ratios between the expectation over the LDPC ensemble and the expectation over the uniform ensemble. The 2nd factor can be regarded as the expectation of the decoding error of minimum entropy decoding over the uniform ensemble. In the following, the 1st and 2nd factors are evaluated individually. The next lemma is used for the 1st factor.

Lemma 2: [Erez et al. [8, Lemma 3, Eq. (44)]] For any $\delta > 0$, if we take $n$ sufficiently large and
\[
d > \frac{\ln(\delta R \ln 2)}{\ln(1 - 2\gamma)},
\]
then
\[
\sum_{l = \gamma n + 1}^{(1-\gamma)n} \frac{2^k\, E_A |S_A(l)|}{\binom{n}{l}} \le n(n+1)\, e^{2n\delta}
\]
holds.

The following two lemmas are used to evaluate the 2nd factor. The set of types of sequences of length $n$ is denoted $\mathcal{Q}_n$, and the set of sequences whose type is $Q \in \mathcal{Q}_n$ is denoted $T^n(Q)$.

Lemma 3: Let $P$ be an AVS. For $\forall Q \in \mathcal{Q}_n$, if the weight, i.e. the type, of the state sequence $s^n$ equals that of $\tilde{s}^n$, where the state sequences give the states of the AVS at each time, then
\[
P^n\bigl( T^n(Q) \bigm| s^n \bigr) = P^n\bigl( T^n(Q) \bigm| \tilde{s}^n \bigr)
\]
holds.

(Proof of Lemma 3) For a permutation $\xi \in \Xi_n$, an operation on a sequence $z^n$ is defined as $\xi(z^n) = z_{\xi(1)} z_{\xi(2)} \cdots z_{\xi(n)}$. If we pick $\xi$ satisfying $\xi^{-1}(s^n) = \tilde{s}^n$, since
\begin{align*}
P^n\bigl( T^n(Q) \bigm| s^n \bigr) &= \sum_{z^n \in T^n(Q)} P(z^n | s^n) \\
&= \sum_{z^n \in T^n(Q)} P\bigl( \xi(z^n) \bigm| s^n \bigr) \\
&= \sum_{z^n \in T^n(Q)} P\bigl( z^n \bigm| \xi^{-1}(s^n) \bigr) \\
&= P^n\bigl( T^n(Q) \bigm| \tilde{s}^n \bigr)
\end{align*}
hold, the lemma is proved.
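Lemma 3 can be checked by exhaustive computation for a tiny AVS. The kernel `P[s][z]` below is an arbitrary example of ours; the assertion verifies that the probability of each type class is the same for two state sequences of equal type.

```python
import itertools

# An arbitrary AVS kernel P(z | s) on binary z with binary state s: P[s][z].
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}

def prob_type_class(q_ones, s, n):
    """P^n(T^n(Q) | s^n) for the binary type Q having q_ones ones."""
    total = 0.0
    for z in itertools.product((0, 1), repeat=n):
        if sum(z) == q_ones:
            pr = 1.0
            for zi, si in zip(z, s):
                pr *= P[si][zi]
            total += pr
    return total

n = 6
s1 = (0, 0, 1, 1, 0, 1)   # two state sequences of the same type (three ones)
s2 = (1, 1, 1, 0, 0, 0)
for q in range(n + 1):
    a, b = prob_type_class(q, s1, n), prob_type_class(q, s2, n)
    assert abs(a - b) < 1e-12   # Lemma 3: equal for states of equal type
```

The equality holds because the product kernel is permutation covariant, exactly the argument in the proof above.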

Lemma 4: For $\bar{P}(z) \stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{i=1}^{n} P(z | s_i)$ and $\forall Q \in \mathcal{Q}_n$,
\[
P^n\bigl( T^n(Q) \bigm| s^n \bigr) \le (n+1)\, \bar{P}^n\bigl( T^n(Q) \bigr)
\]
holds.

(Proof of Lemma 4) Let the type of $s^n$ be $\lambda$. Then, since
\begin{align*}
\bar{P}^n\bigl( T^n(Q) \bigr) &= \sum_{z^n \in T^n(Q)} \prod_{i=1}^{n} \bar{P}(z_i) \\
&= \sum_{z^n \in T^n(Q)} \prod_{i=1}^{n} \sum_{s_i} \lambda(s_i)\, P(z_i | s_i) \\
&= \sum_{s^n} \lambda^n(s^n)\, P\bigl( T^n(Q) \bigm| s^n \bigr) \\
&\ge \sum_{s^n \in T^n(\lambda)} \lambda^n(s^n)\, P\bigl( T^n(Q) \bigm| s^n \bigr) \\
&\stackrel{\mathrm{(a)}}{=} |T^n(\lambda)|\, 2^{-nH(\lambda)}\, P\bigl( T^n(Q) \bigm| s^n \bigr) \\
&\ge (n+1)^{-1}\, P\bigl( T^n(Q) \bigm| s^n \bigr)
\end{align*}
holds, where equality (a) is derived by Lemma 3, the lemma is proved.

Using Lemma 4, the 2nd factor of Eq. (2) is evaluated as follows:
\begin{align*}
&(\text{The 2nd factor of Eq. (2)}) \\
&= \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n)\, \min\Biggl\{ \sum_{\tilde{z}^n} \frac{\mathbf{1}\bigl[ H(P_{\tilde{z}^n}) \le H(P_{z^n}) \bigr]}{2^k},\, 1 \Biggr\} \\
&= \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n)\, \min\Biggl\{ \sum_{\tilde{P} \in \mathcal{Q}_n} \sum_{\tilde{z}^n \in T^n(\tilde{P})} \frac{\mathbf{1}\bigl[ H(\tilde{P}) \le H(P_{z^n}) \bigr]}{2^k},\, 1 \Biggr\} \\
&= \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n)\, \min\Biggl\{ \sum_{\tilde{P} \in \mathcal{Q}_n} |T^n(\tilde{P})|\, \frac{\mathbf{1}\bigl[ H(\tilde{P}) \le H(P_{z^n}) \bigr]}{2^k},\, 1 \Biggr\} \\
&\stackrel{\mathrm{(a)}}{\le} \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{z^n} P(z^n | s^n)\, n^2 \exp\bigl( -n\, |R - H(P_{z^n})|^{+} \bigr) \\
&= n^2 \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{Q \in \mathcal{Q}_n} P^n\bigl( T^n(Q) \bigm| s^n \bigr) \exp\bigl( -n\, |R - H(Q)|^{+} \bigr) \\
&\stackrel{\mathrm{(b)}}{\le} n^2 \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{Q \in \mathcal{Q}_n} (n+1)\, \bar{P}^n\bigl( T^n(Q) \bigr) \exp\bigl( -n\, |R - H(Q)|^{+} \bigr) \\
&\le n^2 (n+1) \sup_{P \in \mathcal{P}} \max_{s^n} \sum_{Q \in \mathcal{Q}_n} \exp\Bigl( -n \bigl[ D(Q \| \bar{P}) + |R - H(Q)|^{+} \bigr] \Bigr) \\
&\le n^2 (n+1)^2 \exp\Bigl( -n \inf_{P \in \mathcal{P}} \inf_{\bar{P} \in \mathrm{conv}(P)} \min_{Q} \bigl[ D(Q \| \bar{P}) + |R - H(Q)|^{+} \bigr] \Bigr) \tag{3}
\end{align*}
where, at (a), the relation $R = k/n$ and the formulas of types $|T^n(\tilde{P})| \le 2^{nH(\tilde{P})}$ and $|\mathcal{Q}_n| \le n^2$ are used (see e.g. Csiszár and Körner [6]); at (b), Lemma 4 is used. The last two inequalities use $\bar{P}^n(T^n(Q)) \le \exp(-nD(Q \| \bar{P}))$, the fact that $\bar{P} \in \mathrm{conv}(P)$ for every state sequence $s^n$, and $|\mathcal{Q}_n| \le n+1$ for the binary alphabet.

By combining Lemma 2 and Eq. (3), it is shown that Eq. (2), which is the expectation of the decoding error of nonuniform random coding over the expurgated ensemble, is upper bounded by an exponential function of the block length $n$.
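The counting facts invoked at step (a) can be verified directly for a small block length; the helper names are ours.

```python
import math
from math import comb

def h2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

n = 20
# Binary types of length n are determined by the number of ones: |Q_n| = n + 1 <= n^2.
assert n + 1 <= n * n
for ones in range(n + 1):
    size = comb(n, ones)   # |T^n(Q)| for the type with `ones` ones
    assert size <= 2 ** (n * h2(ones / n)) + 1e-6
```

The bound $\binom{n}{l} \le 2^{n h(l/n)}$ is the standard type class size bound of [6].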

5. Simulation of Universal Code Constructed by LDPC Matrices

In this section we discuss simulation results that show the universal property of the code constructed by LDPC matrices in Theorem 1, and compare the error exponent obtained experimentally with the theoretical error exponent. It should be noted that we adopt sum-product decoding instead of minimum entropy decoding in the interest of computational efficiency, and that the parameter of sum-product decoding is fixed at a proper value throughout the simulation. In the simulations, sequences from an i.i.d. source are encoded into codewords by the linear code constructed from an LDPC matrix, which is generated at random; the codewords are then decoded by sum-product decoding and the decoding error is computed.

Figure 5 plots the average decoding error of the linear code constructed by a good LDPC matrix, with sum-product decoding with fixed parameters, for various sources. The sources are i.i.d., and the probability $P_Z(Z=1)$ is taken as the source parameter, represented by the horizontal axis. The average decoding error is represented by the vertical axis. The plot labelled "Universal LDPC: n=100" shows the average decoding error of the code for which the LDPC matrix and the sum-product decoding parameters are fixed during the simulation, with block length $n = 100$ and coding rate $R = 0.5$. It should be noted that the source parameter corresponding to a source entropy of 0.5 is about 0.11.
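The quoted value 0.11 is $h^{-1}(0.5)$, the inverse of the binary entropy function on $[0, 1/2]$; it is easy to recover numerically (the bisection routine is ours):

```python
import math

def h2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def h2_inv(target, tol=1e-12):
    """Inverse of the binary entropy function on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h2(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(h2_inv(0.5), 3))   # → 0.11, the source parameter quoted above
```

Bisection applies because $h$ is strictly increasing on $[0, 1/2]$, the same monotonicity used in Sect. 6 to justify $h^{-1}(H(\mathcal{P}))$ as a universal parameter.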


Fig. 5  Simulation of universal code: R = 0.5.

Fig. 6  Simulation of error exponent: R = 0.5, p = 0.01.

For the plot "Universal LDPC: n=50," the block length is $n = 50$ and the other conditions are the same as in the $n = 100$ case. Both plots show that the decoding error tends to decrease as the source parameter decreases. It can be seen that the pair of the encoder constructed by the LDPC matrix and the sum-product decoder has universality under the conditions of the simulation.

As shown in Fig. 6, an error exponent of the linear code used in Fig. 5 is obtained numerically by varying the block length. The horizontal axis is the block length and the vertical axis is the average decoding error. The plot labelled "Universal LDPC" shows the simulation results, and the plot labelled "Theoretical" is obtained by substituting $p = 0.01$ and $R = 0.5$ into the exponent formula obtained in Theorem 1 without the $\inf$ operation. The slope of "Universal LDPC" is about 0.0267, and that of "Theoretical" is about 0.2382; the exponent obtained in the simulation is about 1/10 the size of the one obtained theoretically. Although using sum-product decoding may degrade the decoding error exponent, we do not have enough data to inquire further and must leave this discussion for future work.

6. Conclusion

Universal codes for i.i.d., AVS, or memoryless correlated sources were constructed using LDPC matrices, and the expectation of the decoding error under minimum entropy decoding was upper bounded by an exponential function of the block length $n$. As a corollary, we showed that the optimal error exponent can be attained for each information source. By applying the correspondence between an AVS and the channel model, a universal code for a compound DMC was constructed as another corollary. A simple simulation showed the universality of the code constructed by LDPC matrices and sum-product decoding with a fixed parameter for a class of i.i.d. sources.

While a binary alphabet was used throughout this paper, Theorem 1 can be extended to a multiple alphabet without difficulty by utilizing the technique proposed by Erez et al. [8]. On the other hand, it is not easy to find a decoding scheme that has both universality and computational efficiency in the multiple alphabet case. In the binary case treated in this paper, since the parameter of sum-product decoding is the probability distribution of the information source and there is only one parameter, we can take $h^{-1}(H(\mathcal{P}))$ as a universal parameter; this is because there is a one-to-one correspondence between the ordering of $p$ and the magnitude of $h(p)$. In the multiple alphabet case, since there is no such obvious correspondence between the ordering of the probability distribution parameters and the magnitude of their entropy, it is not easy to find a universal and computationally efficient decoding algorithm. A construction of an efficient decoding scheme for a multiple alphabet was recently proposed by Coleman et al. [5]. The application of their decoding scheme to our universal coding scheme and the construction of an efficient universal decoding algorithm for a multiple alphabet will be the subject of our next study.

Acknowledgement

The authors thank Prof. Tomohiko Uyematsu and the participants in STW06 for their helpful discussions. The authors also thank the anonymous reviewers for their constructive comments and suggestions.

References

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo codes," Proc. 1993 Int. Conf. Commun., pp.1064–1070, 1993.
[2] G. Caire, S. Shamai, and S. Verdú, "Lossless data compression with error correction codes," Proc. 2003 IEEE Int. Symp. Inform. Theory, p.22, 2003.
[3] G. Caire, S. Shamai, and S. Verdú, "Universal data compression with LDPC codes," Proc. Third Int. Symp. Turbo Codes and Related Topics, pp.55–58, 2003.
[4] G. Caire, S. Shamai, and S. Verdú, "Noiseless data compression with low density parity check codes," DIMACS Series in Discrete Mathematics and Theoretical Computer Science: Advances in Network Information Theory, vol.66, pp.263–284, American Mathematical Society, 2004.
[5] T.P. Coleman, M. Medard, and M. Effros, "Polynomial complexity universal block decoding with exponential error probability decay," submitted to IEEE Trans. Inform. Theory.
[6] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, 1981.
[7] I. Csiszár, "Linear codes for sources and source networks: Error exponents, universal coding," IEEE Trans. Inform. Theory, vol.28, no.4, pp.585–592, 1982.
[8] U. Erez and G. Miller, "The ML decoding performance of LDPC ensembles over Z_q," IEEE Trans. Inform. Theory, vol.51, no.5, pp.1871–1879, 2005.
[9] R.G. Gallager, "Low density parity check codes," IRE Trans. Inform. Theory, vol.8, pp.21–28, 1962.
[10] R.G. Gallager, Low Density Parity Check Codes, no.21 in Research Monograph Series, MIT Press, Cambridge, MA, 1963.
[11] D.J.C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inform. Theory, vol.45, no.2, pp.399–431, 1999.
[12] Y. Matsunaga and H. Yamamoto, "A coding theorem for lossy data compression by LDPC codes," IEEE Trans. Inform. Theory, vol.49, no.9, pp.2225–2229, 2003.
[13] G. Miller and D. Burshtein, "Bounds on the maximum-likelihood decoding error probability of low-density parity-check codes," IEEE Trans. Inform. Theory, vol.47, no.7, pp.2696–2710, 2001.
[14] S. Miyake, "Lossy data compression over Z_q by LDPC code," Proc. 2006 IEEE Int. Symp. Inform. Theory, pp.813–816, 2006.
[15] J. Muramatsu, T. Uyematsu, and T. Wadayama, "Low density parity check matrices for coding of correlated sources," IEEE Trans. Inform. Theory, vol.51, no.10, pp.3645–3654, 2005.
[16] J. Rissanen, "Universal coding, information, prediction, and estimation," IEEE Trans. Inform. Theory, vol.30, no.4, pp.629–636, 1984.
[17] N. Shulman and M. Feder, "Random coding techniques for nonrandom codes," IEEE Trans. Inform. Theory, vol.45, no.6, pp.2101–2104, 1999.
[18] K. Visweswariah, S.R. Kulkarni, and S. Verdú, "Universal variable-to-fixed length source codes," IEEE Trans. Inform. Theory, vol.47, no.4, pp.1461–1472, 2001.
[19] Z. Xiong, A.D. Liveris, and S. Cheng, "Distributed source coding for sensor networks," IEEE Signal Process. Mag., vol.21, pp.80–94, 2004.
[20] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inform. Theory, vol.23, no.3, pp.337–343, 1977.
[21] J. Ziv and A. Lempel, "Compression of individual sequences by variable rate coding," IEEE Trans. Inform. Theory, vol.24, no.5, pp.530–536, 1978.

Appendix

Here, we prove Corollary 1. First, the following lemma is shown to be valid.

Lemma 5: Let $0 < \delta < 1$ be given. There exists a finite subset $\hat{\mathcal{P}}$ of $\mathcal{P}$ with the following property: for every $P_Z \in \mathcal{P}$ there exists $\hat{P}_Z \in \hat{\mathcal{P}}$ which satisfies

$$
1-\delta \le \min\left\{ \frac{P_Z(Z=0)}{\hat{P}_Z(Z=0)},\ \frac{P_Z(Z=1)}{\hat{P}_Z(Z=1)} \right\}
$$

and

$$
\max\left\{ \frac{P_Z(Z=0)}{\hat{P}_Z(Z=0)},\ \frac{P_Z(Z=1)}{\hat{P}_Z(Z=1)} \right\} \le 1+\delta.
$$

(Proof of Lemma 5) Let $\hat{\mathcal{P}} = \left\{ \hat{P}_Z^{(1)}, \hat{P}_Z^{(2)}, \ldots, \hat{P}_Z^{(K)} \right\}$ with $\hat{P}_Z^{(i)}(Z=1) = \frac{1-\delta}{2}\left(\frac{1-\delta}{1+\delta}\right)^i$. For sufficiently large $K$, by using the hypothesis of the corollary, $0 < P_Z(Z=1) < \frac{1}{2}$, it can be shown after some straightforward calculation that $\hat{\mathcal{P}}$ has the desired properties. □

It should be noted that for any event $E$, by using the properties of $\hat{\mathcal{P}}$,

$$
P_{Z^n}(E) \le (1+\delta)^n \hat{P}_{Z^n}(E) \tag{4}
$$
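Inequality (4) is stated without derivation; for a memoryless source it follows term by term from the upper ratio bound of Lemma 5. The one-line expansion (added here for completeness) is:

$$
P_{Z^n}(E) \;=\; \sum_{z^n \in E} \prod_{k=1}^{n} P_Z(z_k)
\;\le\; \sum_{z^n \in E} \prod_{k=1}^{n} (1+\delta)\,\hat{P}_Z(z_k)
\;=\; (1+\delta)^n \hat{P}_{Z^n}(E).
$$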

holds. By setting $E = \{\text{decoding error}\}$,

$$
\begin{aligned}
&E_A\left[\mathbf{1}\left[\bigcup_{P_Z \in \mathcal{P}} \left\{ P_{Z^n}(E) > \exp\left(-n\left(D(P_Z) - \Delta\right)\right) \right\}\right]\right] \\
&\stackrel{(a)}{\le} E_A\left[\mathbf{1}\left[\bigcup_{P_Z \in \mathcal{P}} \left\{ \hat{P}_{Z^n}(E)(1+\delta)^n > \exp\left(-n\left(D(\hat{P}_Z) + \log\tfrac{1}{1-\delta} - \Delta\right)\right) \right\}\right]\right] \\
&= E_A\left[\mathbf{1}\left[\bigcup_{i=1}^{K} \left\{ \hat{P}^{(i)}_{Z^n}(E)(1+\delta)^n > \exp\left(-n\left(D(\hat{P}^{(i)}_Z) + \log\tfrac{1}{1-\delta} - \Delta\right)\right) \right\}\right]\right] \\
&\le \sum_{i=1}^{K} E_A\left[\mathbf{1}\left[ \hat{P}^{(i)}_{Z^n}(E) > \exp\left(-n\left(D(\hat{P}^{(i)}_Z) + \log\tfrac{1+\delta}{1-\delta} - \Delta\right)\right) \right]\right] \\
&\stackrel{(b)}{\le} \sum_{i=1}^{K} \frac{E_A\left[\hat{P}^{(i)}_{Z^n}(E)\right]}{\exp\left(-n\left(D(\hat{P}^{(i)}_Z) + \log\tfrac{1+\delta}{1-\delta} - \Delta\right)\right)} \\
&\stackrel{(c)}{\le} \sum_{i=1}^{K} \frac{\exp\left(-n D(\hat{P}^{(i)}_Z)\right)}{\exp\left(-n\left(D(\hat{P}^{(i)}_Z) + \log\tfrac{1+\delta}{1-\delta} - \Delta\right)\right)} \\
&= K \exp\left(-n\left(\Delta - \log\tfrac{1+\delta}{1-\delta}\right)\right) \to 0 \quad (n \to \infty),
\end{aligned}
$$

where at (a) inequality (4) and $D(P_Z) \le D(\hat{P}_Z) + \log\frac{1}{1-\delta}$ are used. At (b) Markov's inequality is used, and at (c), for evaluation of the numerator, an evaluation similar to the one in the proof of Theorem 1 is used. □
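Inequality (4) can also be checked numerically. The sketch below is not from the paper; the distributions $P_Z(Z{=}1) = 0.3$, $\hat{P}_Z(Z{=}1) = 0.28$, and $\delta = 0.2$ are illustrative choices satisfying the ratio bound of Lemma 5. It verifies the per-sequence form of (4) exhaustively for a small block length $n$, and spot-checks the event form on random subsets $E \subseteq \{0,1\}^n$.

```python
import itertools
import random

def seq_prob(p1, seq):
    """I.i.d. (product) probability of a binary sequence, with P(Z=1) = p1."""
    prob = 1.0
    for z in seq:
        prob *= p1 if z == 1 else 1.0 - p1
    return prob

delta = 0.2
p, p_hat = 0.3, 0.28  # illustrative pair; ratio bound of Lemma 5 holds:
assert max(p / p_hat, (1 - p) / (1 - p_hat)) <= 1 + delta

n = 8
seqs = list(itertools.product([0, 1], repeat=n))

# Per-sequence form: P_{Z^n}(z^n) <= (1+delta)^n * Phat_{Z^n}(z^n),
# which implies (4) for every event E after summing over z^n in E.
for seq in seqs:
    assert seq_prob(p, seq) <= (1 + delta) ** n * seq_prob(p_hat, seq)

# Event form (4) on random subsets E of {0,1}^n.
rng = random.Random(0)
for _ in range(100):
    event = [s for s in seqs if rng.random() < 0.5]
    lhs = sum(seq_prob(p, s) for s in event)
    rhs = (1 + delta) ** n * sum(seq_prob(p_hat, s) for s in event)
    assert lhs <= rhs + 1e-12

print("inequality (4) verified for n =", n)
```

Only the per-sequence check is exhaustive; since summing a per-sequence inequality over any subset preserves it, the random-event loop is redundant in principle and serves only as a direct test of the form in which (4) is used.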


Shigeki Miyake received his B.E. and M.E. degrees in Physical Engineering from the University of Tokyo in 1987 and 1989, respectively. He joined NTT Laboratories in 1989. He has been engaged in research on information theory and its applications, except from 1998 to 2004, when he worked in a business division of NTT. He is currently a Research Engineer in NTT Network Innovation Laboratories. He is a member of the IEEE Information Theory Society and the SITA of Japan.

Mitsuru Maruyama received his B.E. and M.E. degrees in Electrical Engineering from the University of Electro-Communications in 1993 and 1985, respectively. He received a doctorate in computer science from the University of Electro-Communications in 1999. He joined NTT Laboratories in 1985 and has been engaged in research and development of a high-definition videotex system, video-on-demand systems, and IP-based real-time video transmission and archiving systems. He is a Group Leader in NTT Network Innovation Laboratories, where he is currently studying fast protocol-processing system architectures and multi-agent systems. He is a member of the IEEE Computer Society and Communications Society, the ACM, and the Information Processing Society of Japan.