
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 58, NO. 6, JUNE 2010


Achieving the Rate-Distortion Bound with Low-Density Generator Matrix Codes

Zhibin Sun, Mingkai Shao, Jun Chen, Member, IEEE, Kon Max Wong, Fellow, IEEE, and Xiaolin Wu, Senior Member, IEEE

Abstract: It is shown that binary low-density generator matrix codes can achieve the rate-distortion bound of discrete memoryless sources with general distortion measure via multilevel quantization. A practical encoding scheme based on the survey-propagation algorithm is proposed. The effectiveness of the proposed scheme is verified through simulation.

Index Terms: Rate-distortion bound, linear code, low-density generator matrix, message-passing algorithm.

Paper approved by Z. Xiong, the Editor for Distributed Coding and Processing of the IEEE Communications Society. Manuscript received April 17, 2009; revised September 14, 2009.
Z. Sun was with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada. He is now with Hydro One Networks Inc., Toronto, ON, Canada (e-mail: [email protected]).
M. Shao, J. Chen, K. M. Wong, and X. Wu are with the Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON, Canada (e-mail: [email protected], {junchen, wong, xwu}@ece.mcmaster.ca).
The material in this paper was presented in part at the IEEE Information Theory Workshop, Lake Tahoe, CA, USA, Sep. 2007, and the 24th Queen's Biennial Symposium on Communications, Kingston, ON, Canada, June 2008.
Digital Object Identifier 10.1109/TCOMM.2010.06.090203

I. INTRODUCTION

Achieving the rate-distortion bound of discrete memoryless sources has been one of the grand challenges in information theory. Though Shannon [1] has shown that the bound can be achieved asymptotically by using random block codes, the exponential encoding complexity makes such codes impractical. Coincidentally, another grand challenge in information theory is to design channel codes for discrete memoryless channels that have reasonable decoding complexity. To meet this challenge, research in channel coding has been almost exclusively focused on linear codes, whose linear structure can be utilized to achieve efficient encoding and decoding. The focus on linear codes is further justified by the well-known result that linear codes can achieve the capacity of symmetric discrete memoryless channels under maximum likelihood decoding [2], [3]. This focused effort has proved successful. Recent breakthroughs have shown that linear codes, e.g., Turbo codes and low-density parity check (LDPC) codes, used in conjunction with various message-passing decoding algorithms, can indeed approach the capacity of discrete memoryless channels.

Recall that in lossy source coding the main design problem is to reduce the exponential encoding complexity resulting from the search for a reconstruction sequence in an exponentially large sequence space. Similarly, the main design problem in channel coding is to reduce the exponential decoding complexity resulting from the search for a codeword in an exponentially large space. The similar nature of the two design problems implies that the techniques developed for channel coding might be useful for lossy source coding as well. Indeed, lossy source codes like tree codes [4] and trellis codes [5] all have their origins in channel coding. Note that these two types of codes are known to be capable of achieving the rate-distortion bound. However, while these codes draw their strengths from their counterparts in channel coding, they also inherit their counterparts' drawbacks. For example, saturating the rate-distortion bound with trellis codes requires taking the constraint length to infinity, which incurs exponential complexity.

In view of the recent success of LDPC codes and message-passing algorithms for channel coding, it is natural to ask whether similar approaches are also applicable to lossy source coding. Indeed, motivated by this line of thinking, it has been proposed to use low-density generator matrix (LDGM) codes (which can be viewed as the dual of LDPC codes) and various message-passing algorithms for lossy source coding. As a consequence, the following questions naturally arise.

Q1: Can LDGM codes achieve the rate-distortion bound of discrete memoryless sources with general distortion measure?
Q2: If the answer to Q1 is affirmative, can LDGM codes, used in conjunction with certain message-passing algorithms, approach the rate-distortion bound of discrete memoryless sources with general distortion measure?

For the binary erasure quantization problem, Q1 and Q2 have been successfully answered in [6]. For uniform binary sources with Hamming distortion measure, Q1 has been addressed in [7]; it is shown that LDGM codes¹ (under optimal encoding) can saturate the rate-distortion bound in this case.
Moreover, for uniform binary sources with Hamming distortion measure, good empirical answers to Q2 have been given in [8]–[11], where LDGM codes are used in conjunction with variants of message-passing algorithms for efficient encoding. Beyond the aforementioned examples, however, the answers to Q1 and Q2 are still unknown.

The purpose of this paper is to settle Q1 in the affirmative for general discrete memoryless sources and distortion measures. Specifically, we show that binary LDGM codes can achieve the rate-distortion bound via multilevel quantization. Moreover, our simulation results indicate that binary LDGM codes, used in conjunction with the survey-propagation algorithm [12]–[14], provide a promising answer to Q2.

Note that linear lossy source codes should not be confused with linear lossy source encoding. Indeed, although we shall use linear codebooks, the encoding operation (i.e., the mapping from the source sequence to the codeword index) is intrinsically nonlinear. In fact, it is known that linear encoding generally cannot achieve the rate-distortion bound [15], although it suffices for lossless data compression [16].

It should also be emphasized that research on practical lossy source coding is by no means restricted to the aforementioned approaches. For example, there has been a significant amount of work devoted to designing lossy source coding schemes based on the existing lossless data compression algorithms [17]–[19]. Moreover, several promising new directions [20]–[23] have emerged during the writing of this paper. In particular, it is shown in [21] that nonlinear sparse-graph codes can achieve the rate-distortion bound of discrete memoryless sources with general distortion measure.

The remainder of this paper is organized as follows. In Section II, we discuss the fundamental limit of linear codes in channel coding and lossy source coding. In Section III, we show that binary LDGM codes can saturate the rate-distortion bound via multilevel quantization. A practical encoding scheme based on the survey-propagation algorithm is proposed in Section IV, where the effectiveness of the proposed scheme is also verified through simulation. Finally, we close with some concluding remarks in Section V.

¹ More precisely, a hybrid LDPC-LDGM based construction is used in [7]. Nevertheless, the proof technique developed in [7] works for LDGM codes as well.

0090-6778/10$25.00 © 2010 IEEE

II. LINEAR CODES FOR CHANNEL CODING AND LOSSY SOURCE CODING

Consider the lossy source coding problem for an i.i.d. process $\{X_i\}_{i=1}^{\infty}$ with marginal probability distribution $P_X$ on $\mathcal{X}$.
Let $d(\cdot,\cdot): \mathcal{X} \times \hat{\mathcal{X}} \to [0, d_{\max}]$ be a distortion measure, where $\hat{\mathcal{X}}$ is the reconstruction alphabet. Throughout this paper, $\mathcal{X}$ and $\hat{\mathcal{X}}$ are assumed to be finite. The rate-distortion function $R(D)$ for source $X$ with distortion measure $d(\cdot,\cdot)$ is given by [24]

$$R(D) = \min_{P_{\hat{X}|X}} I(X; \hat{X}), \tag{1}$$

where the minimization is over all test channels $P_{\hat{X}|X}: \mathcal{X} \to \hat{\mathcal{X}}$ subject to the constraint $\mathbb{E}[d(X, \hat{X})] \le D$. The rate-distortion bound is known to be achievable with random block codes. However, the encoding complexity of such codes grows exponentially with the block length since there is no structure to exploit. As a consequence, it is of considerable interest to see whether structured codes, such as linear codes, suffice to achieve the rate-distortion bound.

First we give a definition of linear codes. A binary linear code $\mathcal{C}_n$ of length $n$ and rate $\frac{k}{n}$ is a $k$-dimensional linear subspace of the vector space $\mathbb{F}_2^n$, where $\mathbb{F}_2$ is the binary field with elements 0 and 1. Moreover, we can write $\mathcal{C}_n = \{c^n : c^n = u^k G,\ u^k \in \mathbb{F}_2^k\}$, where $G$, referred to as the generator matrix of $\mathcal{C}_n$, is a $k \times n$ matrix whose rows are the basis codewords of $\mathcal{C}_n$.
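As a small illustration of the definition above, the following Python sketch enumerates $\mathcal{C}_n = \{u^k G\}$ for a toy generator matrix. The matrix `G` and the helper name `codewords` are hypothetical choices made for this example only; they do not come from the paper.

```python
from itertools import product

def codewords(G):
    """Return the set {u G over F_2 : u in F_2^k} for a k x n generator matrix G."""
    k, n = len(G), len(G[0])
    code = set()
    for u in product((0, 1), repeat=k):
        # Coordinate j of the codeword is the inner product of u with column j, modulo 2.
        c = tuple(sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
        code.add(c)
    return code

# Hypothetical 2 x 4 generator matrix with linearly independent rows.
G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]
code = codewords(G)
rate = len(G) / len(G[0])  # k / n
```

With linearly independent rows, the enumeration yields $2^k$ distinct codewords, and the set is closed under coordinate-wise modulo-2 addition.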

To see whether linear codes are capable of achieving the rate-distortion bound, it is instructive to examine a similar problem in channel coding. Let $P_{V|U}: \mathcal{U} \to \mathcal{V}$ be a discrete memoryless channel. For illustrative purposes, we shall temporarily assume $\mathcal{U} = \{0, 1\}$. It is well known that the capacity (i.e., the maximum reliable communication rate) of channel $P_{V|U}$ is given by

$$C = \max_{P_U} I(U; V), \tag{2}$$

where the maximization is over all probability distributions on $\mathcal{U}$. However, the channel capacity is not always directly achievable using binary linear codes. Specifically, it has been shown [3] that the maximum reliable communication rate for binary linear codes over channel $P_{V|U}$ is given by

$$C_L \triangleq I(U; V)\big|_{P_U = P^*}, \tag{3}$$

where $P^*$ is the uniform distribution on $\mathcal{U}$. Since the role of the reconstruction variable $\hat{X}$ in lossy source coding is similar to that of the channel input $U$ in channel coding, while the role of the generic source variable $X$ is similar to that of the channel output $V$, it is natural to restrict $\hat{X}$ to be uniformly distributed if linear codes are used for lossy source coding. Specifically, for $\hat{\mathcal{X}} = \{0, 1\}$, one might conjecture that the minimum achievable rate in lossy source coding using binary linear codes is given by

$$R_L(D) \triangleq \min_{P_{\hat{X}|X}} I(X; \hat{X}), \tag{4}$$

where the minimization is over all test channels $P_{\hat{X}|X}: \mathcal{X} \to \hat{\mathcal{X}}$ subject to the constraints $\mathbb{E}[d(X, \hat{X})] \le D$ and $P_{\hat{X}}(0) = P_{\hat{X}}(1) = \frac{1}{2}$; in particular, if there is no test channel $P_{\hat{X}|X}: \mathcal{X} \to \hat{\mathcal{X}}$ satisfying all the constraints, then $R_L(D)$ is set to be equal to infinity, which essentially asserts that no binary linear code can meet the distortion constraint $D$.

This conjecture turns out to be false, as shown by the following example. Let $\mathcal{X} = \hat{\mathcal{X}} = \{0, 1\}$, $P_X(0) = p$ with $p \in (0, \frac{1}{2})$, and $d(\cdot,\cdot)$ be the Hamming distortion measure. It is easy to verify that $R_L(D) = \infty$ when $D < \frac{1}{2}(1 - 2p)$, since a uniformly distributed $\hat{X}$ must differ from $X$ with probability at least $\frac{1}{2} - p$. However, since the linear span of every binary block code is a binary linear code, it is always possible to find a binary linear code to meet the distortion constraint $D$ (even when $D < \frac{1}{2}(1 - 2p)$). Therefore, this example indicates that in lossy source coding one might not need to restrict the output of the test channel to be uniformly distributed when linear codes are used.

To better understand the subtle difference between channel coding and lossy source coding in terms of the fundamental limit of linear codes, we shall have a close look at the reasoning behind (3). First we need a few definitions. For any sequence $a^n = (a_1, a_2, \cdots, a_n)$ with $a_i \in \mathcal{A}$, $i = 1, 2, \cdots, n$, the empirical distribution $P_{a^n}$ of $a^n$ is defined as $(P_{a^n}(a))_{a \in \mathcal{A}}$ with $P_{a^n}(a) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(a_i = a)$, where $\mathbb{I}(\cdot)$ is the indicator function. We define the empirical distribution of a block code as the average of the empirical distributions of all its codewords. In particular, for a binary linear code $\mathcal{C}_n$ of length $n$, the empirical distribution $P_{c^n}$ of a codeword $c^n \in \mathcal{C}_n$ is given by $\frac{1}{n}(n - w(c^n), w(c^n))$, where $w(\cdot)$ is the weight function that counts the number of 1's in a sequence; moreover, the empirical distribution of $\mathcal{C}_n$ is given by $\frac{1}{|\mathcal{C}_n|} \sum_{c^n \in \mathcal{C}_n} P_{c^n}$. It is easy to verify that the empirical distribution of an arbitrary binary linear code is the uniform distribution on $\{0, 1\}$ unless the code contains coordinates in which every codeword is zero (such coordinates can be deleted as they carry no information). Now one can readily prove (3) by invoking Fano's inequality, the data processing inequality, and a convexity argument (cf. [25, Problem 5, pp. 112-113]).

However, in contrast to the fact that each codeword in a channel code is transmitted with the same probability, the codewords in a lossy source code may not be used with the same frequency². As a consequence, the empirical distribution of a code does not play much of a role in lossy source coding (see [26] for a more in-depth discussion); indeed, this is the reason why (4) should not be viewed as the fundamental limit of binary linear lossy source codes.

On the other hand, the rate-distortion function (1) does impose certain constraints on the empirical distributions of individual codewords of a good lossy source code. Let $(R, D)$ be an arbitrary point on the rate-distortion curve. Let $P_{\hat{X}^*|X}$ be the optimal test channel associated with this rate-distortion point and $P_{\hat{X}^*}$ be the output distribution induced by $P_X$ and $P_{\hat{X}^*|X}$. It can be shown using the result in [27] that a good lossy source code, in the sense that its rate-distortion performance is very close to the given $(R, D)$ point, must contain an exponentially non-negligible fraction of codewords whose empirical distributions are close to $P_{\hat{X}^*}$. Note that $P_{\hat{X}^*}$ is in general not a uniform distribution except for certain special cases such as the uniform binary source with Hamming distortion measure. However, it is well known that the codeword empirical distributions of many commonly used binary random linear code ensembles concentrate (on the exponential scale) around the uniform distribution (see, e.g., [28]).
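The claim that the empirical distribution of a binary linear code is uniform unless some coordinate is identically zero can be checked numerically on small examples. The sketch below is only an illustration; the two generator matrices and the function name `code_empirical_distribution` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

def code_empirical_distribution(G):
    """Average of the codeword empirical distributions, returned as (P(0), P(1)).

    The average is taken over all 2^k messages u; every codeword is hit by the
    same number of messages, so this equals the average over codewords.
    """
    k, n = len(G), len(G[0])
    ones = 0
    for u in product((0, 1), repeat=k):
        ones += sum(sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
    p1 = Fraction(ones, n * 2 ** k)
    return (1 - p1, p1)

# Every column is nonzero: each coordinate equals 1 for exactly half of the messages.
G_full = [[1, 0, 1, 1],
          [0, 1, 0, 1]]
# The last column is all-zero: that coordinate is 0 in every codeword.
G_zero_column = [[1, 0, 1, 0],
                 [0, 1, 1, 0]]

uniform_case = code_empirical_distribution(G_full)
skewed_case = code_empirical_distribution(G_zero_column)
```

In the second matrix, three of the four coordinates are balanced and one is always zero, so the fraction of 1's drops from $\frac{1}{2}$ to $\frac{3}{8}$.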
In fact, it can be shown under mild conditions that a binary linear code must have an exponentially non-negligible fraction of codewords with approximately the uniform empirical distribution. Although this does not exclude the possibility that the code also contains an exponentially non-negligible fraction of codewords with empirical distributions close to a non-uniform $P_{\hat{X}^*}$, it does indicate that such linear codes are difficult, if not impossible, to find³. As a consequence, if $P_{\hat{X}^*}$ associated with the given $(R, D)$ on the rate-distortion curve is not a uniform distribution, then it is very hard to use linear codes to directly achieve this rate-distortion point due to the mismatch between the output distribution of the optimal test channel and the dominant codeword empirical distribution of linear codes⁴.

To illustrate this point, we shall use the binary lossy source coding problem with Hamming distortion measure as an example. It can be seen from Fig. 1 that the performance of the scheme proposed in [9] (which is based on linear codes) degrades significantly when applied to a non-uniform source although it works well for the uniform source. This is because when the source is not uniform, the corresponding $P_{\hat{X}^*}$ is also not uniform. Therefore, the performance degradation is not solely due to the limitation of the encoding algorithm; it is also a consequence of the intrinsic property of linear codes.

Fig. 1. Performance degradation for non-uniform sources. (a) $P_X(0) = 0.5$. (b) $P_X(0) = 0.2$.

² The frequency with which a codeword gets used depends on the encoding function and the source distribution.
³ Since it is easy to generate linear codes for which the empirical distributions of most of the codewords concentrate around the uniform distribution, one can construct linear codes which contain an exponentially non-negligible fraction of codewords whose empirical distributions are close to a certain non-uniform distribution by adding coordinates in which every codeword is zero. However, the resulting linear codes are not very useful due to the inefficient use of the available freedom.
⁴ A related discussion of mismatched codebooks in lossy source coding can be found in [29].

III. MULTILEVEL QUANTIZATION

We shall show that (4) is actually an upper bound on the minimum achievable rate of binary linear lossy source codes. However, in order to prove the sufficiency of binary linear codes for lossy source coding, one has to deal with the following scenarios:
1) The output distribution $P_{\hat{X}^*}$ induced by $P_X$ and the optimal test channel $P_{\hat{X}^*|X}$ is non-uniform;
2) The reconstruction alphabet is non-binary.
It turns out that both scenarios can be handled by leveraging a method called multilevel quantization.

The multilevel quantization method has a natural counterpart in channel coding called multilevel coding, which was first proposed by Gallager [30]. The idea can be illustrated by a simple example. Suppose the optimal input distribution $P_U$ for a binary-input memoryless channel $P_{V|U}$ of capacity $C$ is given by $(P_U(0), P_U(1)) = (\frac{1}{4}, \frac{3}{4})$. Now construct a random variable $\tilde{U}$ that is uniformly distributed over $\mathbb{F}_2^2$. Define a deterministic mapping $U = f(\tilde{U})$ such that $U = 0$ if $\tilde{U} = (0, 0)$ and $U = 1$ otherwise. Clearly, the induced distribution of $U$ is exactly the capacity-achieving input distribution, and we have $I(\tilde{U}; V) = I(U; V) = C$. Here the essential idea is that, by incorporating a deterministic mapping, one can convert a channel with non-uniform capacity-achieving input distribution to a channel of the same capacity whose optimal input distribution is uniform. In general, we can approximate an arbitrary input distribution by a uniform distribution over a sufficiently large alphabet.

Now we proceed to adopt this idea in lossy source coding. Specifically, we shall approximate the output distribution $P_{\hat{X}^*}$ by a uniform distribution over $\mathbb{F}_2^m$ through a deterministic mapping. The rationale underlying such an approximation is the following. Loosely speaking, it can be assumed that the value of a binary linear code at any $m$ coordinates is approximately uniformly distributed over $\mathbb{F}_2^m$ (which will be justified later). Therefore, by mapping every $m$ coordinates into a reconstruction symbol through a certain function, one can convert a binary linear code into a lossy source code whose dominant codeword empirical distribution matches $P_{\hat{X}^*}$. It is clear that in order to do so, the length of the binary linear code should be $m$ times the length of the source sequence.

In the rest of this section we shall provide a rigorous treatment of this multilevel quantization method. For each positive integer $m$, we define

$$R_{L,m}(D) = \min_{P_{\tilde{X}|X}} I(X; \tilde{X}),$$

where the minimization is over all $P_{\tilde{X}|X}: \mathcal{X} \to \mathbb{F}_2^m$ such that the output distribution $P_{\tilde{X}}$ induced by $P_X$ and $P_{\tilde{X}|X}$ is uniform over $\mathbb{F}_2^m$, and there exists a function $f: \mathbb{F}_2^m \to \hat{\mathcal{X}}$ satisfying the distortion constraint $\mathbb{E}[d(X, f(\tilde{X}))] \le D$. We let $R_{L,m}(D) = \infty$ if there does not exist any $P_{\tilde{X}|X}$ satisfying the constraints. Equivalently, we can define

$$R_{L,m}(D) = \min_{P_{\hat{X}|X}} I(X; \hat{X}),$$

where the minimization is over all $P_{\hat{X}|X}: \mathcal{X} \to \hat{\mathcal{X}}$ such that the output distribution $P_{\hat{X}}$ induced by $P_X$ and $P_{\hat{X}|X}$ is in $\mathcal{P}_m$, and the distortion constraint $\mathbb{E}[d(X, \hat{X})] \le D$ is satisfied. Here $\mathcal{P}_m$ is the set of probability distributions $P_{\hat{X}}$ over $\hat{\mathcal{X}}$ satisfying $2^m P_{\hat{X}}(\hat{x}) \in \mathbb{Z}$ for all $\hat{x} \in \hat{\mathcal{X}}$.

Theorem 1: $\lim_{m \to \infty} R_{L,m}(D) = R(D)$ for $D > \sum_{x \in \mathcal{X}} P_X(x) \min_{\hat{x} \in \hat{\mathcal{X}}} d(x, \hat{x})$.
Proof: See Appendix A.

Now we proceed to show that $R_{L,m}(D)$ is achievable using binary LDGM codes. A binary LDGM code is a binary linear code associated with a sparse generator matrix (i.e., a generator matrix in which the number of 1's is significantly less than the number of 0's). We shall construct a binary random LDGM code ensemble by specifying a random generator matrix $\mathbf{G}$. Let $\mathbf{G}$ be a $k \times mn$ matrix with entries selected independently from $\mathbb{F}_2$ using a Bernoulli distribution with parameter $p_n$ (i.e., $\mathrm{Ber}(p_n)$). We shall choose $p_n$ such that $p_n \to 0$ and $n p_n \to \infty$ as $n \to \infty$. Note that for such $p_n$, the realization of $\mathbf{G}$ is a sparse generator matrix with high probability as $n \to \infty$.

Define $Y_i(u^k) = u^k \mathbf{g}_i$, $u^k \in \mathbb{F}_2^k$, where $\mathbf{g}_i$ is the $i$-th column of $\mathbf{G}$, $i = 1, 2, \cdots, mn$. Partition $\{1, 2, \cdots, mn\}$ into $n$ subsets $\mathcal{N}_1, \mathcal{N}_2, \cdots, \mathcal{N}_n$, each of size $m$. We assume the elements in $\mathcal{N}_i$, $i = 1, 2, \cdots, n$, are ordered. For each $\mathcal{N}_i$ (say, $\mathcal{N}_i = (n_1, \cdots, n_m)$), we define $\tilde{X}_i(u^k) = (Y_{n_1}(u^k), \cdots, Y_{n_m}(u^k))$. For any deterministic mapping $f: \mathbb{F}_2^m \to \hat{\mathcal{X}}$, let $\mathcal{C}(\mathbf{G}, f) = \{\hat{X}^n(u^k) : u^k \in \mathbb{F}_2^k\}$, where $\hat{X}^n(u^k) = (f(\tilde{X}_1(u^k)), f(\tilde{X}_2(u^k)), \cdots, f(\tilde{X}_n(u^k)))$. Define

$$R(n, k) = \frac{k}{n},$$
$$D(n, k, m, f) = \mathbb{E}[d_n(X^n, \mathcal{C}(\mathbf{G}, f))],$$

where

$$d_n(x^n, \mathcal{C}(\mathbf{G}, f)) = \min_{u^k \in \mathbb{F}_2^k} \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{X}_i(u^k)), \qquad x^n \in \mathcal{X}^n.$$

Note that $R(n, k)$ is not the rate of the linear code generated by $\mathbf{G}$; instead, it should be interpreted as the rate⁵ of the lossy source code $\mathcal{C}(\mathbf{G}, f)$ induced by $\mathbf{G}$ and $f(\cdot)$. Now we are ready to state the main result of this section.

Theorem 2: Let $P_{\tilde{X}|X}: \mathcal{X} \to \mathbb{F}_2^m$ be a transition probability distribution such that the output distribution $P_{\tilde{X}}$ induced by $P_X$ and $P_{\tilde{X}|X}$ is uniform over $\mathbb{F}_2^m$. Let $f(\cdot)$ be a deterministic mapping from $\mathbb{F}_2^m$ to $\hat{\mathcal{X}}$. For any $\epsilon > 0$, we have

$$R(n, k) \le I(X; \tilde{X}) + \epsilon,$$
$$D(n, k, m, f) \le \mathbb{E}[d(X, f(\tilde{X}))] + \epsilon$$

for some sufficiently large $n$ and $k$.

Remark: In view of the definition of $R_{L,m}(D)$ and Theorem 1, one can readily show that binary LDGM codes can achieve the rate-distortion bound via multilevel quantization by optimizing $(P_{\tilde{X}|X}, f(\cdot))$ and letting $m \to \infty$.
Proof: See Appendix B.

It has been shown recently [31] that for uniform binary sources with Hamming distortion measure, a necessary condition for achieving the rate-distortion bound using LDGM codes is that the generator matrix contains rows with the number of 1's growing unboundedly as $n \to \infty$ (see [32] for a weaker version of this result). Note that in our construction, the condition $n p_n \to \infty$ can be interpreted as the requirement that the number of 1's in each row of the generator matrix grows unboundedly with $n$; therefore, our result indicates that the aforementioned necessary condition is essentially sufficient.
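The construction of $\mathcal{C}(\mathbf{G}, f)$ and the quantity $d_n(x^n, \mathcal{C}(\mathbf{G}, f))$ can be made concrete with a brute-force toy example. The Python sketch below is illustrative only: the parameter values, the use of consecutive blocks of $m$ coordinates as the partition $\mathcal{N}_1, \cdots, \mathcal{N}_n$, and all function names are assumptions for this example, and exhaustive search over $u^k$ is feasible only for very small $k$.

```python
import random
from itertools import product

def random_ldgm(k, m, n, p, rng):
    """Draw a k x (m*n) generator matrix with i.i.d. Ber(p) entries."""
    return [[1 if rng.random() < p else 0 for _ in range(m * n)] for _ in range(k)]

def reconstruction(u, G, m, n, f):
    """Compute X^hat^n(u): encode u, cut the codeword into n blocks of m bits, apply f."""
    k = len(G)
    y = [sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(m * n)]
    return [f[tuple(y[l * m:(l + 1) * m])] for l in range(n)]

def min_distortion(x, G, m, f, d):
    """Brute-force d_n(x^n, C(G, f)): minimum average distortion over all 2^k messages."""
    n, k = len(x), len(G)
    return min(
        sum(d(a, b) for a, b in zip(x, reconstruction(u, G, m, n, f))) / n
        for u in product((0, 1), repeat=k)
    )

def hamming(a, b):
    return 0 if a == b else 1

# Toy parameters (assumptions for illustration only).
m, n, k = 2, 8, 4
# f sends (0,0) to 0 and everything else to 1, inducing (1/4, 3/4) from uniform F_2^2.
f = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
G = random_ldgm(k, m, n, p=0.3, rng=random.Random(0))
x = [1, 1, 0, 1, 1, 1, 0, 1]
rate = k / n  # R(n, k)
distortion = min_distortion(x, G, m, f, hamming)
```

Since the all-zero message always produces the all-zero reconstruction under this `f`, the minimum distortion can never exceed the fraction of 1's in `x`. The mapping `f` mirrors the $(\frac{1}{4}, \frac{3}{4})$ multilevel coding example at the start of this section.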

⁵ Strictly speaking, $R(n, k)$ is an upper bound on the rate of $\mathcal{C}(\mathbf{G}, f)$ since it is possible that $\hat{X}^n(u^k) = \hat{X}^n(v^k)$ for some $u^k \neq v^k$.

IV. A PRACTICAL ENCODING SCHEME

In this section, we propose a low-complexity encoding scheme based on the survey-propagation algorithm. Let $G$ be a $k \times mn$ low-density generator matrix. It is well known that one can represent $G$ by a factor graph with $k$ variable nodes and $mn$ check nodes (see Fig. 2). Label these variable nodes and check nodes by $\{V_1, V_2, \cdots, V_k\}$ and $\{C_1, C_2, \cdots, C_{mn}\}$, respectively. Let there be an edge connecting $V_i$ and $C_j$ if and only if the $(i, j)$-entry of $G$ is one. Let $\mathcal{A}_i$ be the set of indices of check nodes connected to $V_i$ and $\mathcal{B}_j$ be the set of indices of variable nodes connected to $C_j$. Each variable node $V_i$ is associated with its corresponding information bit $u_i$. The value of $C_j$ can be computed using the values of the variable nodes in $\mathcal{B}_j$ through modulo-2 addition.

Fig. 2. Factor graph of an LDGM code.

We further introduce $n$ source nodes and $n$ network nodes, denoted by $\{S_1, S_2, \cdots, S_n\}$ and $\{N_1, N_2, \cdots, N_n\}$, respectively. Each source node $S_l$ is connected with its corresponding network node $N_l$; moreover, we let $S_l$ and $N_l$ be associated with source symbol $x_l$ and its corresponding reconstruction symbol $\hat{x}_l$, respectively. We partition $\{C_1, C_2, \cdots, C_{mn}\}$ into $n$ disjoint subsets, each containing $m$ check nodes. Each subset is connected to a unique network node. Specifically, we denote the index of the network node connected with $C_j$ as $\mu(j)$ and denote the set of indices of check nodes connected to $N_l$ as $\mathcal{N}_l$. Label each element in $\mathcal{N}_l$ with a unique number in $\{1, \cdots, m\}$. Note that such a labeling induces a mapping $\nu(\cdot)$ from $\{1, 2, \cdots, mn\}$ to $\{1, \cdots, m\}$. The value of $N_l$ (i.e., $\hat{x}_l$) is determined by the check nodes in $\mathcal{N}_l$ through a mapping $f: \mathbb{F}_2^m \to \hat{\mathcal{X}}$.

Our goal is to find an efficient way to determine the values of the variable nodes so that the distortion between the resulting reconstruction sequence $\hat{x}^n$ and the given source sequence $x^n$ is made reasonably small. To this end, we propose a message-passing algorithm tailored to the aforementioned factor graph. Specifically, messages are passed among different types of nodes, with each message representing a probability distribution. They are specified as follows.

∙ From $V_i$ to $C_j$, $j \in \mathcal{A}_i$: $M^0_{V_i \to C_j}$, $M^1_{V_i \to C_j}$, and $M^*_{V_i \to C_j}$.
∙ From $C_j$ to $V_i$, $i \in \mathcal{B}_j$: $M^0_{C_j \to V_i}$, $M^1_{C_j \to V_i}$, and $M^*_{C_j \to V_i}$.
∙ From $C_j$ to $N_l$, $l = \mu(j)$: $M^0_{C_j \to N_l}$, $M^1_{C_j \to N_l}$, and $M^*_{C_j \to N_l}$.
∙ From $S_l$ to $N_l$: $M^x_{S_l \to N_l}$, $x \in \mathcal{X}$.
∙ From $N_l$ to $C_j$, $j \in \mathcal{N}_l$: $M^0_{N_l \to C_j}$ and $M^1_{N_l \to C_j}$.

Remark: The introduction of the free state $*$ is one of the main features of the survey-propagation algorithm.

Here is a description of the message-passing routine (also see Fig. 3).
1) Each check node sends messages to the adjacent variable nodes (cf. (6)) and network node (cf. (7) and (8)). The initial message is set to be $(0.5, 0.5, 0)$.
2) Each variable node computes the new messages based on those from the adjacent check nodes, then sends the messages back to the adjacent check nodes (cf. (5)). Similarly, each network node computes the new messages based on those from the adjacent check nodes and source node (cf. (9)), then sends the messages back to the adjacent check nodes (cf. (10)).
3) Each check node computes the new messages based on those from the adjacent variable nodes and network node. If the messages converge at all check nodes or the maximum number of iterations (e.g., 100) is reached, then go to 4), otherwise go to 1).
4) Each variable node (say, $V_i$) computes $(M^0_{V_i}, M^1_{V_i}, M^*_{V_i})$, where

$$M^0_{V_i} = \prod_{j \in \mathcal{A}_i} (1 - M^1_{C_j \to V_i}) - \prod_{j \in \mathcal{A}_i} (1 - M^1_{C_j \to V_i} - M^0_{C_j \to V_i}),$$
$$M^1_{V_i} = \prod_{j \in \mathcal{A}_i} (1 - M^0_{C_j \to V_i}) - \prod_{j \in \mathcal{A}_i} (1 - M^1_{C_j \to V_i} - M^0_{C_j \to V_i}),$$
$$M^*_{V_i} = \prod_{j \in \mathcal{A}_i} (1 - M^1_{C_j \to V_i} - M^0_{C_j \to V_i}).$$

Find those variable nodes whose bias value (i.e., $|M^0_{V_i} - M^1_{V_i}|$) is greater than a certain threshold and fix those variables accordingly. If there is no variable node whose bias value is greater than the threshold, then we fix the value of the variable node with the maximum bias value.
5) Remove those fixed variable nodes from the factor graph. If all the variable nodes are fixed, then go to 6), otherwise go to 1).
6) Calculate the value of each check node through modulo-2 addition of the values of its adjacent variable nodes. Determine the value of each network node using the values of its adjacent check nodes through the deterministic mapping $f(\cdot)$.

Remark: All the messages need to be normalized.

Now we proceed to apply the proposed message-passing algorithm to various sources and distortion measures to verify its effectiveness.
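The decimation in step 4) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' C implementation: the function names are hypothetical, and fixing a variable to the value with the larger of $M^0_{V_i}$ and $M^1_{V_i}$ is our reading of "fix those variables accordingly".

```python
import math

def variable_marginal(incoming):
    """Step 4) for one variable node.

    incoming: list of (M0, M1) messages from the adjacent check nodes.
    Returns the normalized triple (M0, M1, M*).
    """
    not_one = math.prod(1 - m1 for _, m1 in incoming)
    not_zero = math.prod(1 - m0 for m0, _ in incoming)
    free = math.prod(1 - m0 - m1 for m0, m1 in incoming)
    m0, m1 = not_one - free, not_zero - free
    total = m0 + m1 + free
    if total <= 0:
        raise ValueError("contradictory incoming messages")
    return (m0 / total, m1 / total, free / total)

def select_to_fix(marginals, threshold):
    """Pick the variables to decimate.

    marginals: dict mapping a variable index to its (M0, M1, M*) triple.
    Returns a dict mapping each chosen index to the bit it is fixed to
    (assumption: the bit with the larger marginal).
    """
    bias = {i: abs(m0 - m1) for i, (m0, m1, _) in marginals.items()}
    chosen = [i for i, b in bias.items() if b > threshold]
    if not chosen:
        # No bias exceeds the threshold: fall back to the most biased variable.
        chosen = [max(bias, key=bias.get)]
    return {i: 0 if marginals[i][0] >= marginals[i][1] else 1 for i in chosen}
```

For instance, a single incoming message $(0.9, 0.05)$ yields the triple $(0.9, 0.05, 0.05)$, whose bias $0.85$ would trigger decimation at a threshold of $0.8$.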

A. Non-Uniform Binary Source With Hamming Distortion Measure

Let $\mathcal{X} = \hat{\mathcal{X}} = \{0, 1\}$ and $d(\cdot,\cdot)$ be the Hamming distortion measure, i.e., $d(x, \hat{x}) = 0$ if $x = \hat{x}$ and $d(x, \hat{x}) = 1$ otherwise. Consider a binary source $X$ with $P_X(0) = 0.25$. Let $P_{\hat{X}^*}$ be the output distribution induced by $P_X$ and the optimal test channel $P_{\hat{X}^*|X}$. Note that $P_{\hat{X}^*}$ is rate-dependent. Let $m = 4$ and $\tilde{X}$ be uniformly distributed over $\mathbb{F}_2^4$. A deterministic mapping $f: \mathbb{F}_2^4 \to \hat{\mathcal{X}}$ is chosen so that $P_{f(\tilde{X})}$ is a good approximation of $P_{\hat{X}^*}$. Table I lists the optimal $P_{f(\tilde{X})}$ for different rates. Note that although the optimal $P_{f(\tilde{X})}$ for a given rate is in general unique, the deterministic mapping $f(\cdot)$ that induces this distribution is clearly not unique.

The message-passing equations are given in Fig. 3. In (9), we choose $\alpha_{x_l, x'} = 1$ if $x' = x_l$ and $\alpha_{x_l, x'} = \alpha$ otherwise. This choice is tailored to the Hamming distortion measure.
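One simple way to obtain a mapping $f$ with a prescribed dyadic output distribution is to hand out the $2^m$ binary $m$-tuples to the reconstruction symbols in lexicographic order. The Python sketch below does this for the target $(\frac{4}{16}, \frac{12}{16})$ and for a ternary target $(\frac{5}{16}, \frac{5}{16}, \frac{6}{16})$; the function names and the lexicographic assignment are assumptions for illustration, and any other assignment with the same counts induces the same distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def mapping_from_counts(m, counts):
    """Build f: F_2^m -> {0, ..., len(counts)-1} so that symbol s receives counts[s] tuples."""
    if sum(counts) != 2 ** m:
        raise ValueError("counts must sum to 2^m")
    labels = [s for s, c in enumerate(counts) for _ in range(c)]
    return dict(zip(product((0, 1), repeat=m), labels))

def induced_distribution(f, m, num_symbols):
    """Distribution of f(X~) when X~ is uniform over F_2^m."""
    hist = Counter(f.values())
    return [Fraction(hist[s], 2 ** m) for s in range(num_symbols)]

f_binary = mapping_from_counts(4, [4, 12])
f_ternary = mapping_from_counts(4, [5, 5, 6])
```

Calling `induced_distribution(f_binary, 4, 2)` recovers the target distribution exactly, which is possible precisely because the target lies in $\mathcal{P}_4$.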



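Variable node to check node:

$$\begin{aligned}
M^0_{V_i \to C_j} &= \prod_{q \in \mathcal{A}_i \setminus \{j\}} (1 - M^1_{C_q \to V_i}) - \prod_{q \in \mathcal{A}_i \setminus \{j\}} (1 - M^1_{C_q \to V_i} - M^0_{C_q \to V_i}) \\
M^1_{V_i \to C_j} &= \prod_{q \in \mathcal{A}_i \setminus \{j\}} (1 - M^0_{C_q \to V_i}) - \prod_{q \in \mathcal{A}_i \setminus \{j\}} (1 - M^1_{C_q \to V_i} - M^0_{C_q \to V_i}) \\
M^*_{V_i \to C_j} &= \prod_{q \in \mathcal{A}_i \setminus \{j\}} (1 - M^1_{C_q \to V_i} - M^0_{C_q \to V_i})
\end{aligned} \tag{5}$$

Check node to variable node:

$$\begin{aligned}
M^0_{C_j \to V_i} &= \tfrac{1}{2} (M^0_{N_{\mu(j)} \to C_j} + M^1_{N_{\mu(j)} \to C_j}) \prod_{q \in \mathcal{B}_j \setminus \{i\}} (M^0_{V_q \to C_j} + M^1_{V_q \to C_j}) \\
&\quad + \tfrac{1}{2} (M^0_{N_{\mu(j)} \to C_j} - M^1_{N_{\mu(j)} \to C_j}) \prod_{q \in \mathcal{B}_j \setminus \{i\}} (M^0_{V_q \to C_j} - M^1_{V_q \to C_j}) \\
M^1_{C_j \to V_i} &= \tfrac{1}{2} (M^0_{N_{\mu(j)} \to C_j} + M^1_{N_{\mu(j)} \to C_j}) \prod_{q \in \mathcal{B}_j \setminus \{i\}} (M^0_{V_q \to C_j} + M^1_{V_q \to C_j}) \\
&\quad - \tfrac{1}{2} (M^0_{N_{\mu(j)} \to C_j} - M^1_{N_{\mu(j)} \to C_j}) \prod_{q \in \mathcal{B}_j \setminus \{i\}} (M^0_{V_q \to C_j} - M^1_{V_q \to C_j}) \\
M^*_{C_j \to V_i} &= 1 - M^0_{C_j \to V_i} - M^1_{C_j \to V_i}
\end{aligned} \tag{6}$$

Check node to network node:

$$\begin{aligned}
M^0_{C_j \to N_l} &= \tfrac{1}{2} \prod_{q \in \mathcal{B}_j} (M^0_{V_q \to C_j} + M^1_{V_q \to C_j}) + \tfrac{1}{2} \prod_{q \in \mathcal{B}_j} (M^0_{V_q \to C_j} - M^1_{V_q \to C_j}) \\
M^1_{C_j \to N_l} &= \tfrac{1}{2} \prod_{q \in \mathcal{B}_j} (M^0_{V_q \to C_j} + M^1_{V_q \to C_j}) - \tfrac{1}{2} \prod_{q \in \mathcal{B}_j} (M^0_{V_q \to C_j} - M^1_{V_q \to C_j}) \\
M^*_{C_j \to N_l} &= 1 - M^0_{C_j \to N_l} - M^1_{C_j \to N_l}
\end{aligned} \tag{7}$$

*If all $V_q$, $q \in \mathcal{B}_j$, are fixed, then

$$\begin{aligned}
M^0_{C_j \to N_l} &= \frac{(1 - c_j) \exp(\delta) + c_j \exp(-\delta)}{\exp(\delta) + \exp(-\delta) + W_{sou}} \\
M^1_{C_j \to N_l} &= \frac{c_j \exp(\delta) + (1 - c_j) \exp(-\delta)}{\exp(\delta) + \exp(-\delta) + W_{sou}} \\
M^*_{C_j \to N_l} &= \frac{W_{sou}}{\exp(\delta) + \exp(-\delta) + W_{sou}}
\end{aligned} \tag{8}$$

Source node to network node:

$$M^x_{S_l \to N_l} = \begin{cases}
\dfrac{\alpha_{x_l, x_l} \exp(\gamma)}{\alpha_{x_l, x_l} \exp(\gamma) + \sum_{x' \in \mathcal{X}: x' \neq x_l} \alpha_{x_l, x'} \exp(-\gamma)}, & x = x_l \\[2ex]
\dfrac{\alpha_{x_l, x} \exp(-\gamma)}{\alpha_{x_l, x_l} \exp(\gamma) + \sum_{x' \in \mathcal{X}: x' \neq x_l} \alpha_{x_l, x'} \exp(-\gamma)}, & x \neq x_l
\end{cases} \tag{9}$$

Network node to check node:

$$\begin{aligned}
M^0_{N_l \to C_j} &= \sum_{(k_1, \cdots, k_m): k_{\nu(j)} = 0} \Big( M^r_{S_l \to N_l} \prod_{q \in \mathcal{N}_l \setminus \{j\}} M^{k_{\nu(q)}}_{C_q \to N_l} \Big), \quad r = f(k_1, \cdots, k_m) \\
M^1_{N_l \to C_j} &= \sum_{(k_1, \cdots, k_m): k_{\nu(j)} = 1} \Big( M^r_{S_l \to N_l} \prod_{q \in \mathcal{N}_l \setminus \{j\}} M^{k_{\nu(q)}}_{C_q \to N_l} \Big), \quad r = f(k_1, \cdots, k_m)
\end{aligned} \tag{10}$$

Fig. 3. Message-passing equations. Here $c_j$ is the modulo-2 addition of the values of the variable nodes in $\mathcal{B}_j$, while $\delta$, $W_{sou}$, $\alpha_{x_l, x'}$, and $\gamma$ are the parameters that can be adjusted.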
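One way to read the check-to-network update (7) is as a closed form for a parity sum: $M^0_{C_j \to N_l}$ collects the total weight of all 0/1 assignments of the variables in $\mathcal{B}_j$ with even parity, and $M^1_{C_j \to N_l}$ the weight of those with odd parity. The Python sketch below compares the product formula with explicit enumeration. It is an illustration under this reading, with hypothetical function names and arbitrary example messages.

```python
import math
from itertools import product

def check_to_network(var_msgs):
    """Product form of (7).

    var_msgs: list of (M0, M1) variable-to-check messages for all q in B_j.
    Returns (M0, M1, M*) for the message from C_j to its network node.
    """
    s = math.prod(m0 + m1 for m0, m1 in var_msgs)
    d = math.prod(m0 - m1 for m0, m1 in var_msgs)
    m0 = 0.5 * (s + d)
    m1 = 0.5 * (s - d)
    return m0, m1, 1 - m0 - m1

def parity_by_enumeration(var_msgs):
    """Total weight of the even-parity and odd-parity 0/1 assignments."""
    weight = [0.0, 0.0]
    for bits in product((0, 1), repeat=len(var_msgs)):
        w = math.prod(msg[b] for msg, b in zip(var_msgs, bits))
        weight[sum(bits) % 2] += w
    return tuple(weight)

# Arbitrary example messages; the mass not assigned to 0 or 1 belongs to the free state.
example = [(0.6, 0.3), (0.2, 0.7), (0.5, 0.4)]
closed_form = check_to_network(example)
enumerated = parity_by_enumeration(example)
```

The product form costs time linear in $|\mathcal{B}_j|$, whereas the enumeration is exponential, which is why the closed form is the practical choice in a message-passing implementation.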

B. Uniform Ternary Source With Hamming Distortion Measure

Let $\mathcal{X} = \hat{\mathcal{X}} = \{0, 1, 2\}$ and $d(\cdot,\cdot)$ be the Hamming distortion measure. Consider a ternary source $X$ with $P_X(0) = P_X(1) = P_X(2) = \frac{1}{3}$. Let $m = 4$. It is easy to verify that we have $P_{\hat{X}^*}(0) = P_{\hat{X}^*}(1) = P_{\hat{X}^*}(2) = \frac{1}{3}$ for all rates. Correspondingly, we choose $f: \mathbb{F}_2^4 \to \hat{\mathcal{X}}$ such that $(P_{f(\tilde{X})}(0), P_{f(\tilde{X})}(1), P_{f(\tilde{X})}(2)) = (\frac{5}{16}, \frac{5}{16}, \frac{6}{16})$ for all rates.

C. Uniform Ternary Source With ℓ1 Distortion Measure

Let $\mathcal{X} = \hat{\mathcal{X}} = \{0, 1, 2\}$ and $d(\cdot,\cdot)$ be the $\ell_1$ distortion measure, i.e., $d(x, \hat{x}) = |x - \hat{x}|$. Consider a ternary source $X$ with $P_X(0) = P_X(1) = P_X(2) = \frac{1}{3}$. Let $m = 4$. Table II lists the optimal $P_{f(\tilde{X})}$ for different rates.

The message-passing equations are given in Fig. 3. In (9), we choose $\alpha_{x_l, x'} = 1$ if $x' = x_l$, $\alpha_{x_l, x'} = \alpha$ if $|x_l - x'| = 1$, and $\alpha_{x_l, x'} = \beta$ if $|x_l - x'| = 2$. This choice is tailored to the $\ell_1$ distortion measure. Since smaller distortion is preferable, we always set $\alpha$ to be larger than $\beta$.
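To make the role of the weights $\alpha_{x_l, x'}$ concrete, the Python sketch below evaluates the source-to-network message under one reading of (9): the weight of $x = x_l$ is $\alpha_{x_l, x_l} \exp(\gamma)$, the weight of every other $x$ is $\alpha_{x_l, x} \exp(-\gamma)$, and the weights are normalized to sum to one. The numerical values of $\alpha$, $\beta$, and $\gamma$ below are arbitrary assumptions, as are the function names.

```python
import math

def alpha_hamming(x_l, x, alpha):
    """Weights tailored to the Hamming distortion measure."""
    return 1.0 if x == x_l else alpha

def alpha_l1(x_l, x, alpha, beta):
    """Weights tailored to the l1 distortion measure on {0, 1, 2}."""
    gap = abs(x_l - x)
    return 1.0 if gap == 0 else (alpha if gap == 1 else beta)

def source_to_network(x_l, alphabet, alpha_fn, gamma):
    """Normalized message over the alphabet, following the reading of (9) stated above."""
    w = {x: alpha_fn(x_l, x) * math.exp(gamma if x == x_l else -gamma) for x in alphabet}
    z = sum(w.values())
    return {x: v / z for x, v in w.items()}

# Hypothetical parameter values: alpha > beta, as required in the text.
msg = source_to_network(
    x_l=0,
    alphabet=(0, 1, 2),
    alpha_fn=lambda a, b: alpha_l1(a, b, alpha=0.8, beta=0.3),
    gamma=1.0,
)
```

With $\alpha > \beta$ and $\gamma > 0$, the message favors the source symbol itself, then the symbol at $\ell_1$ distance 1, then the symbol at distance 2, matching the stated preference for smaller distortion.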


TABLE I
THE OPTIMAL $P_{f(\tilde{X})}$ FOR DIFFERENT RATES.

R                 $(P_{f(\tilde{X})}(0), P_{f(\tilde{X})}(1))$
0.479 - 1.000     (4/16, 12/16)
0.228 - 0.478     (3/16, 13/16)
0.096 - 0.227     (2/16, 14/16)
0.000 - 0.095     (1/16, 15/16)

TABLE II
THE OPTIMAL $P_{f(\tilde{X})}$ FOR DIFFERENT RATES.

R                 $(P_{f(\tilde{X})}(0), P_{f(\tilde{X})}(1), P_{f(\tilde{X})}(2))$
0.43 - 1.58       (5/16, 6/16, 5/16)
0.22 - 0.43       (4/16, 8/16, 4/16)
0.13 - 0.22       (3/16, 10/16, 3/16)
0 - 0.13          (2/16, 12/16, 2/16)

TABLE III
THE OPTIMAL $P_{f(\tilde{X})}$ FOR DIFFERENT RATES.

Fig. 4. Non-uniform binary source with Hamming distortion measure. (Rate $R$ versus distortion $D$; curves: $R(D)$, $R_{L,4}(D)$, and simulation results for source lengths 1k and 10k.)

D. Simulation Results

Now we examine the performance of the proposed scheme for the aforementioned three cases. The algorithm is implemented in C. The degree distributions of the LDGM codes are obtained from [33]; they are optimized for the AWGN channel. The simulation results are shown in Fig. 4, Fig. 5, and Fig. 6, respectively. Source sequences of length 1000 and 10000 are tested. Since $m = 4$, the corresponding LDGM codes are of length 4000 and 40000, respectively. The damping method [11] is used in the message-passing algorithm if the messages do not converge after 30 iterations. The decimation threshold is set to 0.9 and the maximum number of iterations is set to 100. Each plotted point is obtained by averaging over 1000 source sequences. It can be seen that the resulting distortions are close to the theoretical lower bound in all three cases. Note that although the degrees of variable nodes need to grow unboundedly in $n$ to achieve the rate-distortion bound, simulation results indicate that relatively small degrees suffice for practical purposes. Moreover, it is worth mentioning that for a fixed $n$, the performance of the proposed message-passing algorithm may not improve with increasing degrees, and may even degrade.

Fig. 5. Uniform ternary source with Hamming distortion measure. (Axes: $R$ versus $D$; legend: $R(D)$, $R_{L,4}(D)$, 1k, 10k.)

V. CONCLUSION

We have shown that binary LDGM codes can achieve the rate-distortion bound of discrete memoryless sources with general distortion measure via multilevel quantization. Moreover, a practical encoding scheme based on the survey-propagation algorithm has been proposed. The simulation results validate the effectiveness of the proposed scheme. It is of considerable interest to generalize the results in the present work to multi-user scenarios such as distributed source coding and multiple description coding.

APPENDIX A
PROOF OF THEOREM 1

It is clear that $R_{L,m}(D)$ is lower-bounded by $R(D)$ for all $m$. Therefore, it suffices to show that $\limsup_{m \to \infty} R_{L,m}(D) \le R(D)$. Let $P_{\hat{X}^*|X}$ be the optimal test channel associated with the rate-distortion pair $(R(D), D)$ and $P_{\hat{X}^*}$ be the output distribution induced by $P_X$ and $P_{\hat{X}^*|X}$. We shall construct a random variable $\tilde{X}$ over $\mathbb{F}_2^m$ such that $X - \hat{X}^* - \tilde{X}$ form a Markov chain. Let $k_{\hat{x}}$ be the greatest integer satisfying $\frac{k_{\hat{x}}}{2^m} \le P_{\hat{X}^*}(\hat{x})$, $\hat{x} \in \hat{\mathcal{X}}$. Let $\mathcal{K}_{\hat{x}}$, $\hat{x} \in \hat{\mathcal{X}}$, be disjoint subsets of $\mathbb{F}_2^m$ with $|\mathcal{K}_{\hat{x}}| = k_{\hat{x}}$. Let $\mathcal{K}^* = \mathbb{F}_2^m \setminus \cup_{\hat{x} \in \hat{\mathcal{X}}} \mathcal{K}_{\hat{x}}$. Now construct the transition probability distribution $P_{\tilde{X}|\hat{X}^*}$ as follows:

$$
P_{\tilde{X}|\hat{X}^*}(\tilde{x}|\hat{x}) =
\begin{cases}
\dfrac{1}{2^m P_{\hat{X}^*}(\hat{x})}, & \tilde{x} \in \mathcal{K}_{\hat{x}} \\[2mm]
\dfrac{1}{|\mathcal{K}^*|}\left(1 - \dfrac{k_{\hat{x}}}{2^m P_{\hat{X}^*}(\hat{x})}\right), & \tilde{x} \in \mathcal{K}^* \\[2mm]
0, & \text{otherwise.}
\end{cases}
$$
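The construction above can be sketched in a few lines. The code below is only an illustration: it uses exact rational arithmetic, the ternary distribution `P_HAT` and the value `M = 4` are hypothetical, and all function names are made up here. It builds the cells $\mathcal{K}_{\hat{x}}$ and $\mathcal{K}^*$, fills in the piecewise transition probabilities, and computes the resulting marginal of $\tilde{X}$.

```python
import math
from fractions import Fraction
from itertools import product


def build_cells(p_hat, m):
    """Split F_2^m into disjoint cells K[x_hat] of size floor(2^m * P(x_hat)); the rest is K*."""
    words = list(product((0, 1), repeat=m))
    cells, start = {}, 0
    for x_hat, prob in p_hat.items():
        k = math.floor(2**m * prob)  # greatest integer k with k / 2^m <= P(x_hat)
        cells[x_hat] = words[start:start + k]
        start += k
    return cells, words[start:]  # (K_x_hat for every x_hat, K*)


def transition(p_hat, cells, k_star, m):
    """P(x_tilde | x_hat) from the piecewise definition, for x_hat with P(x_hat) > 0."""
    rows = {}
    for x_hat, prob in p_hat.items():
        if prob == 0:
            continue
        row = {w: 1 / (2**m * prob) for w in cells[x_hat]}
        slack = 1 - Fraction(len(cells[x_hat]), 2**m) / prob
        for w in k_star:
            row[w] = slack / len(k_star)
        rows[x_hat] = row
    return rows


def marginal(p_hat, rows, m):
    """Marginal of x_tilde obtained by averaging the rows over P(x_hat)."""
    return {
        w: sum(p_hat[x_hat] * row.get(w, 0) for x_hat, row in rows.items())
        for w in product((0, 1), repeat=m)
    }


# Hypothetical output distribution on a ternary reconstruction alphabet.
P_HAT = {0: Fraction(3, 10), 1: Fraction(1, 2), 2: Fraction(1, 5)}
M = 4
CELLS, K_STAR = build_cells(P_HAT, M)
ROWS = transition(P_HAT, CELLS, K_STAR, M)
MARGINAL = marginal(P_HAT, ROWS, M)

if __name__ == "__main__":
    print({x_hat: len(cell) for x_hat, cell in CELLS.items()}, len(K_STAR))
    print(set(MARGINAL.values()))
```

Because $k_{\hat{x}}/2^m \le P_{\hat{X}^*}(\hat{x}) < (k_{\hat{x}}+1)/2^m$, the fraction of words assigned to each cell differs from $P_{\hat{X}^*}(\hat{x})$ by less than $2^{-m}$, so the approximation tightens as $m$ grows.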

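Without loss of generality, here we only define $P_{\tilde{X}|\hat{X}^*}(\cdot|\hat{x})$ for $\hat{x} \in \hat{\mathcal{X}}_+$, where $\hat{\mathcal{X}}_+ = \{\hat{x} \in \hat{\mathcal{X}} : P_{\hat{X}^*}(\hat{x}) > 0\}$. It can be verified that

$$
P_{\tilde{X}}(\tilde{x}) = \sum_{\hat{x} \in \hat{\mathcal{X}}_+} P_{\hat{X}^*}(\hat{x}) P_{\tilde{X}|\hat{X}^*}(\tilde{x}|\hat{x}) = \frac{1}{2^m}
$$

if $\tilde{x} \in \mathcal{K}_{\hat{x}}$ for some $\hat{x} \in \hat{\mathcal{X}}$, and

$$
P_{\tilde{X}}(\tilde{x}) = \sum_{\hat{x} \in \hat{\mathcal{X}}_+} P_{\hat{X}^*}(\hat{x}) P_{\tilde{X}|\hat{X}^*}(\tilde{x}|\hat{x}) = \frac{1}{|\mathcal{K}^*|} \sum_{\hat{x} \in \hat{\mathcal{X}}_+} \left( P_{\hat{X}^*}(\hat{x}) - \frac{k_{\hat{x}}}{2^m} \right) = \frac{1}{2^m}
$$

if $\tilde{x} \in \mathcal{K}^*$. Therefore, the constructed $\tilde{X}$ is uniformly distributed over $\mathbb{F}_2^m$.

Define a deterministic mapping $f : \mathbb{F}_2^m \to \hat{\mathcal{X}}$ such that $f(\tilde{x}) = \hat{x}$ if $\tilde{x} \in \mathcal{K}_{\hat{x}}$ for some $\hat{x} \in \hat{\mathcal{X}}$, and $f(\tilde{x})$ is set to be an arbitrary reconstruction symbol if $\tilde{x} \in \mathcal{K}^*$. By the data processing inequality, we have

$$
R(D) = I(X; \hat{X}^*) \ge I(X; \tilde{X}) \ge I(X; f(\tilde{X})). \tag{11}
$$

Moreover, it is easy to verify that $\lim_{m \to \infty} P_{X, f(\tilde{X})}(x, \hat{x}) = P_{X, \hat{X}^*}(x, \hat{x})$ for all $x \in \mathcal{X}$ and $\hat{x} \in \hat{\mathcal{X}}$. Therefore, we have

$$
\lim_{m \to \infty} \mathbb{E}[d(X, f(\tilde{X}))] = \mathbb{E}[d(X, \hat{X}^*)] \le D.
$$

Fig. 6. Uniform ternary source with $\ell_1$ distortion measure. (Axes: $R$ versus $D$; legend: $R(D)$, $R_{L,4}(D)$, 1k, 10k.)

For each $x^n \in \mathcal{X}^n$, define

$$
\phi_n(x^n, \mathbf{G}) = \sum_{u^k \in \mathbb{F}_2^k} \mathbb{I}\left( (x^n, \tilde{X}^n(u^k)) \in \mathcal{T}^{\delta,n}_{P_{X,\tilde{X}}} \right),
$$

$$
\psi_n(x^n, \mathbf{G}) = \sum_{u^k \in \mathbb{F}_2^k} \mathbb{I}\left( \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{X}_i(u^k)) \le \mathbb{E}[d(X, f(\tilde{X}))] + \frac{\epsilon}{2} \right).
$$

Let $\delta$ be small enough so that $\frac{1}{n} \sum_{i=1}^{n} d(x_i, f(\tilde{x}_i)) \le \mathbb{E}[d(X, f(\tilde{X}))] + \frac{\epsilon}{2}$ whenever $(x^n, \tilde{x}^n) \in \mathcal{T}^{\delta,n}_{P_{X,\tilde{X}}}$. As a consequence, we have

$$
\phi_n(x^n, \mathbf{G}) \le \psi_n(x^n, \mathbf{G}) \tag{13}
$$

for all $x^n \in \mathcal{X}^n$.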

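Let $\{w_n\}$ be a sequence such that $w_n p_n \to \infty$ and $\frac{w_n}{n} \to 0$ as $n \to \infty$. Since $Y_i(u^k) = u^k \mathbf{g}_i$, $i = 1, 2, \cdots, mn$, it is easy to see that $Y_1(u^k), Y_2(u^k), \cdots, Y_{mn}(u^k)$ are independent for any fixed $u^k \in \mathbb{F}_2^k$. Moreover, in view of the fact that

$$
\Pr\{u^k \mathbf{g}_i = 0\} = \frac{1 + (1 - 2p_n)^{w(u^k)}}{2},
$$

we have

$$
\Pr\{Y_i(u^k) = 0\} \to \frac{1}{2} \tag{14}
$$

as $n \to \infty$, where the convergence in (14) is uniform for all $i \in \{1, 2, \cdots, mn\}$ and $u^k \in \mathbb{F}_2^k$ with $w(u^k) \ge w_n$. Therefore, it can be shown using [24, Lemma 13.6.2, p. 359] and a continuity argument that for any $\delta' > 0$, one can choose a sufficiently small $\delta$ such that

$$
2^{-n(I(X; f(\tilde{X})) - \delta')} \ge \mathbb{E}\left[\mathbb{I}\left((x^n, \tilde{X}^n(u^k)) \in \mathcal{T}^{\delta,n}_{P_{X,\tilde{X}}}\right)\right] \ge 2^{-n(I(X; f(\tilde{X})) + \delta')} \tag{15}
$$

for all $x^n \in \mathcal{T}^{\delta,n}_{P_X}$ and $u^k \in \mathbb{F}_2^k$ with $w(u^k) \ge w_n$ when $n$ is sufficiently large. Note that

$$
\sum_{u^k \in \mathbb{F}_2^k} \mathbb{E}\left[\mathbb{I}\left((x^n, \tilde{X}^n(u^k)) \in \mathcal{T}^{\delta,n}_{P_{X,\tilde{X}}}\right)\right]
= \sum_{u^k : w(u^k) \ge w_n} \mathbb{E}\left[\mathbb{I}\left((x^n, \tilde{X}^n(u^k)) \in \mathcal{T}^{\delta,n}_{P_{X,\tilde{X}}}\right)\right]
+ \sum_{u^k : w(u^k) < w_n} \mathbb{E}\left[\mathbb{I}\left((x^n, \tilde{X}^n(u^k)) \in \mathcal{T}^{\delta,n}_{P_{X,\tilde{X}}}\right)\right].
$$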
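The parity probability behind (14) can be checked numerically. The sketch below assumes, as the formula suggests, that the entries of $\mathbf{g}_i$ are i.i.d. Bernoulli($p_n$) and that $w(u^k)$ is the Hamming weight of $u^k$, so that $u^k \mathbf{g}_i$ is the XOR of $w(u^k)$ such bits. The function names and the value of `p` are illustrative and not taken from the paper.

```python
def parity_zero_closed_form(p: float, weight: int) -> float:
    """(1 + (1 - 2p)^w) / 2: probability that the XOR of w Bernoulli(p) bits is 0."""
    return (1.0 + (1.0 - 2.0 * p) ** weight) / 2.0


def parity_zero_recursion(p: float, weight: int) -> float:
    """Same probability, computed by adding one bit at a time."""
    q = 1.0  # the XOR of zero bits is 0 with probability 1
    for _ in range(weight):
        # Parity is 0 after this bit if it was 0 and the bit is 0,
        # or it was 1 and the bit is 1.
        q = q * (1.0 - p) + (1.0 - q) * p
    return q


if __name__ == "__main__":
    p = 0.01  # hypothetical density, not a value from the paper
    for weight in (1, 10, 100, 1000):
        print(weight,
              parity_zero_closed_form(p, weight),
              parity_zero_recursion(p, weight))
```

For a fixed small `p`, the gap to 1/2 shrinks geometrically in the weight, which is why the condition $w_n p_n \to \infty$ is enough to force the convergence for every $u^k$ with $w(u^k) \ge w_n$.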

[...] $> 0$ is arbitrary, it follows that

$$
\liminf_{n \to \infty} \frac{1}{n} \log \Pr\{\psi_n(X^n, \mathbf{G}) > 0\} \ge 0.
$$

Now invoking [7, Lemma 1], one can readily show that

$$
D(n, k, m, f) \le \mathbb{E}[d(X, f(\tilde{X}))] + \frac{\epsilon}{2} + \frac{\epsilon}{2} = \mathbb{E}[d(X, f(\tilde{X}))] + \epsilon
$$

for some sufficiently large $n$. The proof is complete.

ACKNOWLEDGMENT

The authors would like to thank Dr. Da-ke He and Dr. Ashish Jagmohan for their contribution in the early stages of this work. They also wish to thank Prof. Alexander Barg and Prof. Prakash Narayan of the University of Maryland for helpful discussions.

REFERENCES

[1] C. E. Shannon, “Coding theorems for a discrete source with a fidelity criterion,” IRE Nat. Conv. Rec., Part 4, pp. 142-163, 1959.
[2] P. Elias, “Coding for noisy channels,” IRE Conv. Rec., Part 4, pp. 37-46, 1955.
[3] E. M. Gabidulin, “Limits for the decoding error probability when linear codes are used in memoryless channel,” Probl. Inf. Transm., pp. 43-48, 1967, translated from Probl. Pered. Inform.
[4] F. Jelinek, “Tree encoding of memoryless time-discrete sources with a fidelity criterion,” IEEE Trans. Inf. Theory, vol. IT-15, no. 5, pp. 584-590, Sep. 1969.
[5] A. Viterbi and J. Omura, “Trellis encoding of memoryless discrete-time sources with a fidelity criterion,” IEEE Trans. Inf. Theory, vol. IT-20, no. 3, pp. 325-332, May 1974.
[6] E. Martinian and J. S. Yedidia, “Iterative quantization using codes on graphs,” 41st Annual Allerton Conf. Commun., Control, Comput., Monticello, IL, Oct. 2003.
[7] M. J. Wainwright and E. Martinian, “Low-density graph codes that are optimal for binning and coding with side information,” IEEE Trans. Inf. Theory, vol. 55, no. 3, pp. 1061-1079, Mar. 2009.
[8] T. Murayama, “Thouless-Anderson-Palmer approach for lossy compression,” Physical Review E, vol. 69, no. 035105, pp. 1-4, 2004.
[9] M. J. Wainwright and E. Maneva, “Lossy source coding via message-passing and decimation over generalized codewords of LDGM codes,” IEEE International Symp. Inf. Theory, Adelaide, Australia, Sep. 2005.
[10] S. Ciliberti, M. Mézard, and R. Zecchina, “Lossy data compression with random gates,” Phys. Rev. Lett., vol. 95, no. 038701, 2005.
[11] T. Filler and J. Fridrich, “Binary quantization using belief propagation with decimation over factor graphs of LDGM codes,” in Proc. Allerton Conf. Commun., Control, Comput., Monticello, IL, Sep. 2007.
[12] M. Mézard, G. Parisi, and R. Zecchina, “Analytic and algorithmic solution of random satisfiability problems,” Science, vol. 297, pp. 812-815, June 2002.
[13] E. Maneva, E. Mossel, and M. J. Wainwright, “A new look at survey propagation and its generalizations,” in Proc. 16th Annual Symp. Discrete Algorithms (SODA), pp. 1089-1098, Jan. 2005.
[14] A. Braunstein, M. Mézard, and R. Zecchina, “Survey propagation: an algorithm for satisfiability,” Random Structures Algorithms, vol. 27, no. 2, pp. 201-226, Mar. 2005.
[15] J. L. Massey, “Joint source and channel coding,” Commun. Syst. Random Process Theory, vol. 11, pp. 279-293, 1978.
[16] T. C. Ancheta, Jr., “Syndrome-source-coding and its universal generalizations,” IEEE Trans. Inf. Theory, vol. IT-22, no. 4, pp. 432-436, July 1976.
[17] E.-H. Yang and J. Kieffer, “Simple universal lossy data compression schemes derived from the Lempel-Ziv algorithm,” IEEE Trans. Inf. Theory, vol. 42, no. 1, pp. 239-245, Jan. 1996.
[18] E.-H. Yang, Z. Zhang, and T. Berger, “Fixed-slope universal lossy data compression,” IEEE Trans. Inf. Theory, vol. 43, no. 5, pp. 1465-1479, Sep. 1996.
[19] I. Kontoyiannis, “An implementable lossy version of the Lempel-Ziv algorithm—part I: optimality for memoryless sources,” IEEE Trans. Inf. Theory, vol. 45, no. 7, pp. 2293-2305, Nov. 1999.
[20] A. Gupta, S. Verdú, and T. Weissman, “Rate-distortion in near-linear time,” IEEE International Symp. Inf. Theory, Toronto, Canada, July 2008.
[21] A. Gupta and S. Verdú, “Nonlinear sparse-graph codes for lossy compression,” IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 1961-1975, May 2009.
[22] S. Jalali and T. Weissman, “Rate-distortion via Markov chain Monte Carlo,” IEEE Trans. Inf. Theory, submitted for publication. [Online]. Available: http://arxiv.org/abs/0808.4156.
[23] S. B. Korada and R. Urbanke, “Polar codes are optimal for lossy source coding,” IEEE Trans. Inf. Theory, vol. 56, no. 4, pp. 1751-1768, Apr. 2010.
[24] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[25] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[26] T. Weissman and E. Ordentlich, “The empirical distribution of rate-constrained source codes,” IEEE Trans. Inf. Theory, vol. 51, no. 11, pp. 3718-3733, Nov. 2005.
[27] A. Kanlis, S. Khudanpur, and P. Narayan, “Typicality of a good rate-distortion code” (in Russian), Probl. Pered. Inform. (Probl. Inf. Transm.), vol. 32, no. 1, pp. 96-103, 1996.
[28] A. Barg and G. D. Forney, Jr., “Random codes: minimum distances and error exponents,” IEEE Trans. Inf. Theory, vol. 48, no. 9, pp. 2568-2573, Sep. 2002.
[29] A. Dembo and I. Kontoyiannis, “Source coding, large deviations, and approximate pattern matching,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1590-1615, June 2002.
[30] R. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[31] S. Kudekar and R. Urbanke, “Lower bounds on the rate-distortion function of individual LDGM codes,” 5th International Symp. Turbo Codes Related Topics, Lausanne, Switzerland, pp. 379-384, Sep. 2008.
[32] A. G. Dimakis, M. J. Wainwright, and K. Ramchandran, “Lower bounds on the rate-distortion function of LDGM codes,” IEEE Inf. Theory Workshop, Lake Tahoe, CA, Sep. 2007.
[33] [Online]. Available: http://lthcwww.epfl.ch.research/ldpcopt

Zhibin Sun received the B.Eng. and M.A.Sc. degrees in electrical engineering from McMaster University, Hamilton, ON, Canada, in 2007 and 2009, respectively. He joined the Department of Standards and New Technology Engineering at Hydro One Networks Inc., Toronto, ON, Canada, in 2009, where he is now an Engineering Trainee.

Mingkai Shao received the B.Eng. degree in computer engineering from Huazhong University of Science and Technology, China, in 2004, and the M.Sc. degree in computer science from Wuhan University, China, in 2006. He is currently pursuing the Ph.D. degree in computer engineering at McMaster University, Hamilton, ON, Canada. His research interests include information theory and multimedia communications.

Jun Chen (S’03-M’06) received the B.E. degree with honors in communication engineering from Shanghai Jiao Tong University, Shanghai, China, in 2001 and the M.S. and Ph.D. degrees in electrical and computer engineering from Cornell University, Ithaca, NY, in 2004 and 2006, respectively. He was a Postdoctoral Research Associate in the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign, Urbana, IL, from 2005 to 2006, and a Josef Raviv Memorial Postdoctoral Fellow at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, from 2006 to 2007. He is currently an Assistant Professor of Electrical and Computer Engineering at McMaster University, Hamilton, ON, Canada. He holds the Barber-Gennum Chair in Information Technology. His research interests include information theory, wireless communications, and signal processing.


Kon Max Wong (SM’81-F’02) received his BSc(Eng), DIC, Ph.D., and DSc(Eng) degrees, all in electrical engineering, from the University of London, England, in 1969, 1972, 1974 and 1995, respectively. He started working at the Transmission Division of Plessey Telecommunications Research Ltd., England, in 1969. In October 1970 he was on leave from Plessey pursuing postgraduate studies and research at Imperial College of Science and Technology, London. In 1972, he rejoined Plessey as a research engineer and worked on digital signal processing and signal transmission. In 1976, he joined the Department of Electrical Engineering at the Technical University of Nova Scotia, Canada, and in 1981, moved to McMaster University, Hamilton, Canada, where he has been a Professor since 1985 and served as Chairman of the Department of Electrical and Computer Engineering in 1986-87, 1988-94 and 2003-08. Professor Wong was on leave as Visiting Professor at the Department of Electronic Engineering of the Chinese University of Hong Kong from 1997 to 1999. At present, he holds the Canada Research Chair in Signal Processing at McMaster University. His research interests are in signal processing and communication theory, and he has published over 250 papers in the area. Professor Wong was the recipient of the IEE Overseas Premium for the best paper in 1989, and is also the co-author of the papers that received the IEEE Signal Processing Society “Best Young Author” awards of 2006 and 2008. In 2009, he was awarded the Royal Academy of Engineering Distinguished Visiting Fellowship, and he has also been named a recipient of the Alexander von Humboldt Research Award for 2010. He is a Fellow of IEEE, a Fellow of the Institution of Electrical Engineers, a Fellow of the Royal Statistical Society, and a Fellow of the Institute of Physics. More recently, he has also been elected as Fellow of the Canadian Academy of Engineering as well as Fellow of the Royal Society of Canada. He was an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 1996 to 1998 and served as Chair of the Sensor Array and Multi-channel Signal Processing Technical Committee of the IEEE Signal Processing Society in 2002-04. Professor Wong was the recipient of a medal presented by the International Biographical Centre, Cambridge, England, for his “outstanding contributions to the research and education in signal processing” in May 2000, and was honoured with the inclusion of his biography in the two books: Outstanding People of the 20th Century and 2000 Outstanding Intellectuals of the 20th Century, published by IBC to celebrate the arrival of the new millennium.

Xiaolin Wu (M’89-SM’96) received his B.Sc. from Wuhan University, China, in 1982, and his Ph.D. from the University of Calgary, Canada, in 1988, both in computer science. Dr. Wu started his academic career in 1988, and has since been on the faculty of the University of Western Ontario, New York Polytechnic University, and currently McMaster University, where he is a professor at the Department of Electrical and Computer Engineering and holds the NSERC-DALSA industrial research chair in Digital Cinema. His research interests include multimedia signal compression, joint source-channel coding, multiple description coding, network-aware visual communication and image processing. He has published over two hundred research papers and holds two patents in these fields. Dr. Wu is an associate editor of IEEE TRANSACTIONS ON MULTIMEDIA and an associate editor of IEEE TRANSACTIONS ON IMAGE PROCESSING.