Polynomial Complexity Universal Block Decoding with Exponential Error Probability Decay

Todd P. Coleman, Member, IEEE, Muriel Médard, Senior Member, IEEE, and Michelle Effros, Senior Member, IEEE

Submitted to IEEE Transactions on Information Theory on 4/23/06

T. P. Coleman ([email protected]) is with the Coordinated Science Laboratory, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. M. Médard ([email protected]) is with the Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology. M. Effros ([email protected]) is with the Data Compression Laboratory, Department of Electrical Engineering, California Institute of Technology.
Abstract— We discuss the universal block decoding problem for memoryless systems. Our goal is to mimic the historical development of traditional codes and low-complexity decoding algorithms that exhibit vanishing error probability for all achievable rates when probability distributions are known. We discuss some properties of ‘universally good codes’ by showing how such codes lie on the Gilbert-Varshamov bound, providing an analogue of the minimum-distance criterion to guarantee universal decoding success, and discussing robust error exponent generalizations. We show that universal decoding over linear codes is NP-complete. We consider large linear code constructions built by using bipartite graphs and fixed-length universally good codes and show that the aggregate codes have universally good properties. Using such constructions, we discuss two polynomial complexity universal decoding algorithms - one based upon linear programming decoding and the other based upon iterative symbol-flipping ‘expander graph’ decoding - both of which have exponential error probability decay in the block length for all achievable rates.
I. INTRODUCTION

In the information theory literature there has been much discussion of universal coding, where encoders and decoders are constructed that operate without knowledge of the underlying probability distribution. From an ontological perspective, there has been much success: it has been shown for numerous settings [1], [2], [3], [4], [5] that there exist block encoders and decoders that can attain the same error exponent (exponential rate of decay in probability of error) as the random coding exponent corresponding to maximum-likelihood (ML) decoding. Such universal decoding algorithms have also served as subcomponents of other multiterminal communication systems, for instance statistical inference problems under data compression constraints [6], [7], [8], [9], [10]. As in the typical channel coding case, the encoding situation is not nearly as difficult as the decoding situation. Sequential decoding approaches with algebraic convolutional codes have been developed for certain universal settings, but either the error probability performance has not been quantified [11] or the decoder complexity becomes unbounded at rates above the cutoff rate [12]. Indeed, the proposed universal decoding algorithms' nonlinearities and difficulty of implementation have stymied
the construction of practical codes and decoding algorithms. This apparent intrinsic difficulty in universal decoding is epitomized in the following [13, Sec. I]: "Theoretically, one can employ universal decoding; however, in many applications, it is ruled out by complexity considerations."

We begin by making some remarks about certain advances in the traditional coding literature:

a) Linear codes have been known to be sufficient for many channel coding problems to attain all achievable rates and the random coding exponent. However, ML decoding for general linear codes has been shown [14] to be an intrinsically complex (NP-complete) problem.
b) A 'divide-and-conquer' approach has been employed by coding theorists since the beginnings of information theory to construct large linear codes from smaller good components, with polynomial complexity decoding algorithms whose performance is empirically good [15] as well as provably good [16], [17].
c) The 'divide-and-conquer' approach to error-correcting codes has been sharpened with the help of a well-formalized 'codes on graphs' perspective [18], [19] as well as recent 'expansion' results in graph theory [20] to construct linear complexity iterative decoding algorithms that achieve capacity from both an empirical [21], [22] and a theoretical [23], [24], [25] perspective. Furthermore, iterative algorithms with empirical performance guarantees have been shown [26], [27] to be deeply connected to a linear programming relaxation decoder [28], [29], [30], [31] that lends itself to theoretical analysis.

The above remarks suggest an approach to constructing practical codes and decoding algorithms for the universal setting:

a') Linear codes have been known universally to attain all achievable rates with exponential error probability decay [1], [2]. In this discussion we show that universal decoding with general linear codes is also a provably complex (NP-complete) problem.
b') We employ a 'divide-and-conquer' graph-based approach to show that large linear codes constructed from smaller 'universally good' component codes are aggregately provably good under optimal universal decoding.
c') With a codes on graphs perspective, we use expander graphs to construct linear codes and two polynomial complexity decoding algorithms, both of which exhibit exponential error probability decay. One approach is a linear programming relaxation based decoder inspired by
[28], [29], [30] and the other is a linear complexity iterative decoder inspired by the 'expander codes' framework of [23], [24], [25].

In Section II we develop models and definitions and then consider the general universal block decoding problem in point-to-point settings. Section III addresses approach a' by discussing the various properties of 'universally good' linear codes. Specifically, here we show that such codes have excellent distance properties (lying on the Gilbert-Varshamov bound [32, p. 42-43]), we provide guarantees on universal decoding success, and we develop a universal robustness property of such codes that generalizes error exponents. The properties of such codes discussed here will be necessary to establish the exponential error probability guarantees on the decoding algorithms we develop later. Section IV relates to approach a' by illustrating that universal decoding over general linear codes is NP-complete. Section V addresses approach b' by developing the codes on graphs perspective and illustrates that large linear codes built graphically upon smaller universally good linear codes result in codes that have universally good properties. Sections VI and VII address approach c' by analyzing two decoding methods for large linear codes based on expander graphs and smaller universally good codes. Section VI discusses a polynomial complexity linear programming-based decoder, while Section VII discusses a linear complexity iterative symbol-flipping decoder. Both decoding algorithms result in exponential error probability decay for all achievable rates. Section VIII extends all these approaches to multiterminal settings, including Slepian-Wolf distributed data compression and certain types of multiple-access channel coding. Finally, Section IX provides a discussion and conclusion.

II. THE GENERAL UNIVERSAL DECODING PROBLEM FOR A SINGLE SOURCE

A. Model and Definitions

Throughout this discussion we shall consider a discrete memoryless source (DMS) U over U = {0, 1, ..., Q−1}. The set of all probability mass functions (pmfs) on U is given by P(U). For a length-N sequence u = (u_1, u_2, ..., u_N) ∈ U^N, the type [33] P_u ∈ P(U) is the pmf defined by P_u(a) = (1/N) Σ_{i=1}^N 1_{{u_i = a}} for all a ∈ U. We denote by W^N the pmf induced on U^N by N independent drawings according to W. We denote by P_N(U) the subset of P(U) consisting of the possible types of sequences u ∈ U^N. For any type P ∈ P_N(U), the type class T(P) is the set of all u ∈ U^N such that P_u = P. To summarize, we have that

P(U) = { P = {P(a)}_{a∈U} : P(a) ≥ 0 ∀ a ∈ U, Σ_{a∈U} P(a) = 1 }
P_u(a) = (1/N) Σ_{i=1}^N 1_{{u_i = a}}, for u ∈ U^N   (1)
P_N(U) = { P ∈ P(U) : P = P_u for some u ∈ U^N }
T(P) = { u ∈ U^N | P_u = P }.
For a discrete random variable U with pmf W we shall denote its entropy as H(U), which is a function of W. When we wish to refer explicitly to the entropy as a function of some P ∈ P_N(U), then we shall denote this as H(P). For two random variables with conditional and marginal distributions given by P_{U|V} and P_V, we shall denote the conditional entropy H(U|V) explicitly in terms of P_{U|V} and P_V as H(P_{U|V} | P_V). We now state the following lemma [33] that will be used throughout our discussion:

Lemma 2.1:

|P_N(U)| ≤ (N + 1)^{|U|}   (2)
|T(P)| ≤ 2^{N H(P)}   (3)
W^N(u) = 2^{−N[H(P_u) + D(P_u ‖ W)]} ∀ u ∈ U^N.   (4)

Thus the number of types is polynomial in N.
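To make the method of types concrete, the following minimal Python sketch (ours, not from the paper; the function names are illustrative) computes the type P_u of a sequence and its empirical entropy H(P_u):

```python
from collections import Counter
from math import log2

def empirical_type(u, alphabet):
    """Type P_u: the empirical distribution of the sequence u."""
    counts = Counter(u)
    n = len(u)
    return {a: counts[a] / n for a in alphabet}

def entropy(P):
    """Entropy H(P) in bits, with the convention 0 log 0 = 0."""
    return -sum(p * log2(p) for p in P.values() if p > 0)

u = [0, 1, 1, 0, 1, 2, 1, 0]
P_u = empirical_type(u, alphabet=[0, 1, 2])
print(P_u)            # {0: 0.375, 1: 0.5, 2: 0.125}
print(entropy(P_u))   # H(P_u) ~ 1.41 bits
```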
B. The General Problem

In this discussion we consider code constructions for fixed block length universal coding for the two dual settings of data compression and channel coding. The compression scenario mentioned could be relevant, for instance, in a wireless sensor network where the following two points apply:
1) The time-varying nature of the environment hinders the ability to track accurately certain parameters of the probability distribution.
2) Complexity, memory, and energy constraints make a universal fixed-to-fixed length algebraic compression approach more viable than a universal fixed-to-variable length compression approach (such as Lempel-Ziv [34], [35] or Burrows-Wheeler [36]) that requires dictionaries and table-lookups.
Similarly, owing to the time-varying and multipath effects of the wireless channel, the universal channel coding scenario could be relevant where phase information cannot be tracked accurately. More specifically, we are interested in universal decoding for discrete memoryless settings, where the decoder does not have knowledge of the probability distribution to aid in decoding.

Consider a DMS U with probability distribution W ∈ P(U). Without loss of generality, we assume that U = {0, 1, ..., Q − 1} where Q = 2^t for some integer t ≥ 1. Thus we may assume that U takes on values in F_{2^t}. Our goal is to design a fixed-rate universal code that permits a decoding algorithm that is blind to W and has provably good performance. We consider the case where a linear mapping

H = [ −H_1− ; −H_2− ; ... ; −H_M− ] : U^N → U^M
is used to map u ∈ U^N to s ∈ U^M via

s = Hu   (5)

where M < N and U is memoryless with probability distribution W ∈ P(U). We denote the rate R as

R = t (M/N)   (6)
and note that this corresponds to rate in a data compression sense and not in a channel coding sense (which would correspond to t − R). Throughout the rest of this discussion we shall consider rate in a data compression sense. The decoder knows that u must be consistent with s; in other words, it must lie in the coset

Co(H, s) = {u : Hu = s},   (7)

and selects û as the 'best' coset member (in a universal sense). This encompasses two settings:
a) Fixed-to-fixed length near-lossless data compression, where u is identified as the sourceword and s is the syndrome, the output of the compression operation.
b) An additive noise channel y = x ⊕ u. By using a linear code C for x, and identifying the parity check matrix H with C as

C = {x : Hx = 0},   (8)

then we have that a sufficient statistic for decoding is Hy = Hu = s. Decoding successfully for the noise vector u is equivalent to decoding successfully for the transmitted codeword x: x̂ = û ⊕ y.

We assume that the rate R is achievable (i.e. t(M/N) > H(U)). It is well known in the information theory literature [1], [2] that in the universal setting, linear codes still suffice to attain all achievable rates and can attain the same error exponent as the random coding exponent. Note that for any u ∈ U^N, we have that P(u) = 2^{−N[D(P_u ‖ W) + H(P_u)]}. Thus an ML decoder with knowledge of W operates by selecting

û ∈ argmin_{u ∈ Co(H,s)} [ D(P_u ‖ W) + H(P_u) ].
It has been shown [2] that there exist linear codes satisfying

E_r(R, W) ≤ liminf_{N→∞} −(1/N) log P_e^{ML}(N),   where
E_r(R, W) = min_{P ∈ P(U)} D(P ‖ W) + |R − H(P)|^+.   (9)
Note that E_r(R, W) > 0 for all R > H(U). Moreover, the only dependence on the distribution W in (9) is in D(P_u ‖ W). From the law of large numbers, with very high probability P_u → W. Since D(W′ ‖ W) is non-negative and is zero if and only if W′ = W, it is indeed consistent that Csiszár's 'minimum-entropy' decoder selects as the source reconstruction the coset's entropy minimizer:

û ∈ argmin_{u ∈ Co(H,s)} H(P_u).   (10)
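As a concrete illustration of rule (10), here is a brute-force minimum-entropy decoder for a binary coset. This is our own hedged sketch, not the paper's method: it enumerates the entire space, which is exactly the exponential-time computation that the remainder of the paper seeks to avoid.

```python
import itertools
import numpy as np
from collections import Counter
from math import log2

def entropy_of_type(u):
    """Empirical entropy H(P_u) of a sequence, in bits."""
    n = len(u)
    return -sum((c / n) * log2(c / n) for c in Counter(u).values())

def min_entropy_decode(H, s):
    """Brute-force minimum-entropy decoder over GF(2):
    returns argmin_{u : Hu = s (mod 2)} H(P_u)."""
    M, N = H.shape
    best, best_h = None, float("inf")
    for u in itertools.product([0, 1], repeat=N):
        u = np.array(u)
        if np.array_equal(H @ u % 2, s):
            h = entropy_of_type(u)
            if h < best_h:
                best, best_h = u, h
    return best

# Toy example: a random 3x8 parity-check matrix and a sparse source word.
rng = np.random.default_rng(0)
H = rng.integers(0, 2, size=(3, 8))
u_true = np.array([0, 0, 1, 0, 0, 0, 0, 0])
s = (H @ u_true) % 2
u_hat = min_entropy_decode(H, s)
print(u_hat, np.array_equal(u_hat, u_true))  # correct iff no lower-entropy coset member
```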
In [2], Csiszár shows that not only do there exist linear codes whose rates can be arbitrarily close to H(U) when such a decoder is applied, but also that minimum-entropy decoding achieves the same error exponent as the optimal maximum-likelihood (ML) decoder:

liminf_{N→∞} −(1/N) log P_e^{univ}(N) ≥ E_r(R, W).
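For a Bernoulli source the exponent (9) is a one-dimensional minimization and can be evaluated numerically. The sketch below is our illustration, with a simple grid search standing in for the true minimization; it shows E_r(R, W) growing as R moves above H(U):

```python
from math import log2

def kl(p, w):
    """Binary divergence D(p || w) in bits."""
    def term(x, y):
        return 0.0 if x == 0 else x * log2(x / y)
    return term(p, w) + term(1 - p, 1 - w)

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def error_exponent(R, w, grid=10_000):
    """Grid approximation of E_r(R, W) = min_P D(P||W) + |R - H(P)|^+
    for a Bernoulli(w) source."""
    return min(
        kl(p, w) + max(0.0, R - h2(p))
        for p in (i / grid for i in range(grid + 1))
    )

w = 0.11                      # source bias: H(U) ~ 0.5 bits
for R in (0.6, 0.7, 0.9):     # compression rates above entropy
    print(R, error_exponent(R, w))
```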
Another interpretation of the universal decoding paradigm is that it is a manifestation of Occam's razor [37], "Entities should not be multiplied unnecessarily," the basis for the principle of parsimony. Since the entropy function measures the inherent uncertainty or difficulty in explaining something, selecting the lowest-entropy candidate consistent with the observations is the same as selecting the easiest-to-accept candidate consistent with the observations.

III. UNIVERSALLY GOOD LINEAR CODES

Now we address a' and discuss how linear codes suffice to universally attain all achievable rates and error exponents. Csiszár's lemma specifying good encoders [2, Sec. III] illustrates the existence of linear mappings H : U^N → U^M such that, with the definitions

N_H(P) ≜ |{u ∈ U^N : Hũ = Hu for some ũ ≠ u, P_{(u,ũ)} = P}|   (11)
κ_N ≜ |U|^2 log(N + 1) / N,   (12)

every joint type P ∈ P_N(U^2) satisfies:

a) N_H(P) ≤ 2^{−N(R − H(P) − κ_N)}   (13)
b) if H(P_{U−Ũ}) ≤ R − κ_N then N_H(P) = 0.   (14)

We shall denote such codes as universally good. Note that the bound (13) can be strengthened to:

N_H(P) ≤ 2^{−N(R − H(P) − κ_N)} = 2^{N(H(P_U) − (R − H(P_{Ũ|U} | P_U) − κ_N))}
⇒ N_H(P) ≤ 2^{N(H(P_U) − |R − H(P_{Ũ|U} | P_U) − κ_N|^+)}   (15)
          ≤ 2^{N(H(P_U) − |R − H(P_Ũ) − κ_N|^+)}   (16)
where (15) follows because, by (11), N_H(P) ≤ |T(P_U)| ≤ 2^{N H(P_U)}.

A. The Gilbert-Varshamov Distance

One important property of any linear code C with associated parity-check matrix H is its minimum distance

d_min(H) = min_{u ∈ Co(H,0)\0} w_h(u)   (17)

where w_h(·) is the Hamming weight. It is well known [32] that the larger the minimum distance of a code, the larger the number of errors it can guarantee to correct:

w_h(u) < (1/2) d_min(H) ⇒ w_h(u) < w_h(u + ũ) ∀ ũ ∈ C \ 0.   (18)

Because of the linearity of C this can be generalized to: w_h(u + ũ′) < (1/2) d_min(H) ⇒ w_h(u + ũ′) < w_h(u + ũ) for all ũ ∈ C \ ũ′.
p_Δ. Moreover, the random variables {1_{E(U_j, β, j)}}_{j∈A} are i.i.d. because G is bipartite. We then have

P(E_1) = P(N_bad(U) > n p_Δ^2) = 2^{n o(n)} P(N_bad(U) = n p_Δ^2) ≤ 2^{−n[D(p_Δ^2 ‖ p_Δ^1) − o(n)]}.

Note that even if E_1^c occurs, an error can still occur if there is another ũ ≠ u with smaller empirical entropy. This corresponds to event E_2. Since our constituent codes are universally good, characterizing E_2 follows directly from application of Lemmas 3.5 and 5.4:

P(E_2) ≤ 2^{−N E_r(R_a − 2κ_Δ, W)}.
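To get a feel for the magnitudes in this bound, the divergence D(p_Δ^2 ‖ p_Δ^1) can be evaluated directly. The values below are hypothetical stand-ins for the two probabilities, not quantities computed in the paper:

```python
from math import log2

def kl_binary(p, q):
    """D(p || q) for Bernoulli parameters, in bits."""
    def t(x, y):
        return 0.0 if x == 0 else x * log2(x / y)
    return t(p, q) + t(1 - p, 1 - q)

# Illustrative magnitudes only: the aggregate exponent D(p2 || p1) stays
# bounded away from zero as long as p2 > p1, even when both are small.
p1 = 2 ** -10   # ~ 2^{-Delta E_r(...)}: probability a node decodes badly
p2 = 0.05       # ~ alpha(phi/Delta - 2/sqrt(Delta)): tolerable bad fraction
print(kl_binary(p2, p1))   # per-node exponent in the outer bound
```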
VII. ITERATIVE DECODING METHODS WITH LINEAR COMPLEXITY

Again, as mentioned in remark c' of the introduction's approach, we utilize codes on graphs to develop another class of low-complexity decoding algorithms. This approach entails using iterative decoding to solve, with linear complexity, a provably good approximation to the decoding problem (10) when Co(H, s) is specified in a codes on graphs description COG(G, {H_j}, {s_j}). This approach is inspired by the work of Barg and Zémor [25], which addressed approximations to ML decoding over the binary symmetric channel. Here we will consider universal decoding over non-binary alphabets. In Section VIII, we will extend this approach to multiterminal settings.

Let u ∈ U^N be the true sequence that has been mapped to s ∈ U^M according to (5). Before we describe our decoding algorithm, we present two subcomponents. Define

d_ME(H, s) = argmin_{ũ ∈ Co(H,s)} H(P_ũ)   (60)
d_MD(H, s, u′) = argmin_{ũ ∈ Co(H,s)} w_h(ũ + u′)   (61)
φ = φ({H_j}) = min_{j∈V} (1/2) d_min(H_j)   (62)
λ = λ_2(G).   (63)
There might be more than one solution to the optimization problems (60) and (61), so we say that the output of the functions in such a case is any optimal solution. The proposed decoding algorithm contains an iteration counter variable i ∈ Z, a fixed point detection variable FP ∈ {0, 1}, a set variable V′ ⊂ V which is either A or B, a state variable û ∈ U^N which is the current iteration's estimate of u, and a state variable û′ ∈ U^N which is the previous iteration's estimate of u. For any 0 < α < 1 our decoder proceeds as follows:

UNIV-DEC-ITER(G, {H_j}, {s_j}):
0. Initialize: i = 0, FP = 0, and V′ = A.
1. Set û_j = d_ME(H_j, s_j) for each j ∈ V′.
2. while i ≤ log(αn(φ − λ)/Δ) / log(2 − α) and FP = 0 do
3.   Set V′ = V \ V′ and û′ = û.
4.   Set û_j = d_MD(H_j, s_j, û′_j) for each j ∈ V′.
5.   Set i = i + 1.
6.   Set FP = 1_{{û′ = û}}.
7. end while
8. return û

Let us consider the ECOG(G_{Δ,n}, {H_j}, {s_j}) setting and performing UNIV-DEC-ITER(G_{Δ,n}, {H_j}, {s_j}). Because Δ is fixed and does not vary with n, the complexity of performing (60) and (61) is a fixed constant. Furthermore, note that because the graph is bipartite and V′ is either A or B, each instance of d_ME(H_j, s_j) for j ∈ V′ can be done in parallel, and thus the overall complexity of performing step 1 is O(N). Analogously, the same holds true for d_MD(H_j, s_j, û′_j) for j ∈ V′, and so step 4 also has O(N) complexity.

A. Error Probability

Now we consider the error probability associated with this decoder. The following lemma [24, Lemma 5] will be useful in the analysis:

Lemma 7.1: Suppose φ ≥ (3/2) λ_2(G). Let A′ ⊆ A be such that

|A′| ≤ αn (φ − λ)/Δ   (64)

where α < 1. Suppose B′ ⊆ B and Y ⊂ E satisfy
1) every e ∈ Y has an endpoint in A′;
2) every j ∈ B′ satisfies |Γ(j) ∩ Y| ≥ φ.
Then |B′| ≤ (1/(2 − α)) |A′|.

This lemma states that, provided Δ is large enough so that the condition φ ≥ (3/2) λ_2(G) is met, then if at any point in the while loop of the decoding algorithm we define the 'survivor' nodes in V′ as

T_{V′}(u, û) = { j ∈ V′ s.t. w_h(u_j + û_j) ≥ φ }   (65)

then the algorithm exhibits a contraction property [25] and will converge to u in a logarithmic number of steps (which is upper-bounded by the quantity on the right-hand side of the inequality in step 2 of UNIV-DEC-ITER(G_{Δ,n}, {H_j}, {s_j})). Thus we have

Corollary 7.2: If the original number of survivor nodes T_A(u) ≜ { j ∈ A s.t. d_ME(H_j, s_j) ≠ u_j } satisfies |T_A(u)| ≤ n(φ − λ)/Δ, then UNIV-DEC-ITER(G_{Δ,n}, {H_j}, {s_j}) successfully decodes u.

That the overall decoding complexity is O(N) follows from using a circuit of size O(N log N) and depth O(log N), as discussed in [23], [25]. We may now claim exponential error probability decay:

Theorem 7.3: For sufficiently large but fixed Δ, the family of code constructions ECOG(G_{Δ,n}, {H_j}, {s_j}) exhibits exponential error probability decay in n under UNIV-DEC-ITER(G_{Δ,n}, {H_j}, {s_j}) decoding.

Proof: For large enough Δ the condition φ ≥ (3/2) λ_2(G) in Lemma 7.1 is satisfied, because when the {H_j}_{j∈V} are universally good codes, φ grows linearly in Δ and λ_2(G) = O(√Δ). Let us define the event

E(U_j, j) = { ∃ ũ ∈ Co(H_j, s_j) \ U_j s.t. H_j ũ = H_j U_j, H(P_ũ) ≤ H(P_{U_j}) }

and note that the indicator variable 1_{E(U_j, j)} has the property that

p_Δ^1 ≜ E[1_{E(U_j, j)}] = P(E(U_j, j)) ≤ 2^{−Δ E_r(R_a − κ_Δ, W)}.

Furthermore, the random variables {1_{E(U_j, j)}}_{j∈A} are i.i.d. because G is bipartite and U is memoryless. Thus, we have

N_bad(U) ≜ |T_A(u)| = Σ_{j∈A} 1_{E(U_j, j)},     E[N_bad(U)] = n p_Δ^1.

Now, by defining p_Δ^2 as

p_Δ^2 = α(φ/Δ − λ/Δ) = α(φ/Δ − 2√(Δ−1)/Δ) > α(φ/Δ − 2√Δ/Δ) = α(φ/Δ − 2/√Δ)

we have that, since φ is linear in Δ, for sufficiently large Δ, p_Δ^2 > p_Δ^1 and thus

P_e ≤ P(N_bad(U) > n p_Δ^2) = 2^{n o(n)} P(N_bad(U) = n p_Δ^2) ≤ 2^{−n[D(p_Δ^2 ‖ p_Δ^1) − o(n)]}.
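The control flow of UNIV-DEC-ITER can be summarized in a short structural sketch. This is our own Python rendering with hypothetical helper functions: the local decoders d_ME and d_MD are passed in as black boxes, since with Δ fixed each is a constant-time operation.

```python
import math

def univ_dec_iter(A, B, incident, dec_me, dec_md, phi, lam, Delta, alpha):
    """Structural sketch of UNIV-DEC-ITER (hypothetical helpers, not the
    paper's code). Symbols live on edges; each node j sees the sub-word on
    its incident edges incident[j] and re-decodes it in its local coset.

    dec_me(j)        -> {edge: symbol}: local minimum-entropy decode, as (60)
    dec_md(j, local) -> {edge: symbol}: closest local coset member, as (61)
    Assumes phi > lam so the iteration cap in step 2 is well-defined.
    """
    u_hat = {}
    side = A
    for j in side:                       # step 1: minimum-entropy init on A
        u_hat.update(dec_me(j))
    max_i = math.log(alpha * len(A) * (phi - lam) / Delta) / math.log(2 - alpha)
    i, fp = 0, False
    while i <= max_i and not fp:
        side = B if side is A else A     # step 3: alternate bipartition sides
        prev = dict(u_hat)
        for j in side:                   # step 4: nearest local coset member
            local = {e: prev[e] for e in incident[j]}
            u_hat.update(dec_md(j, local))
        i += 1
        fp = (u_hat == prev)             # step 6: fixed-point detection
    return u_hat
```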
VIII. UNIVERSAL DECODING IN MULTITERMINAL SETTINGS

In this section we consider universal decoding for a pair of discrete i.i.d. sources (U^1, U^2) ∈ U = U_1 × U_2 drawn according to a joint probability distribution W ∈ P(U). For k ∈ {1, 2} we define Q_k = |U_k| = 2^{t_k} and without loss of generality assume U_k = {0, 1, ..., Q_k − 1}. For k ∈ {1, 2} we consider the case where a linear mapping

H^k = [ −H_1^k− ; −H_2^k− ; ... ; −H_{M_k}^k− ] : U_k^N → U_k^{M_k}

is used over F_{2^{t_k}} to map u^k ∈ U_k^N to s^k ∈ U_k^{M_k} via

s^k = H^k u^k   (66)

where M_k < N. We denote the rates as

R_1 = t_1 M_1/N   (67)
R_2 = t_2 M_2/N.   (68)

The decoder knows that

u ≜ ((u_1^1, u_1^2), (u_2^1, u_2^2), ..., (u_N^1, u_N^2)) ∈ U^N

must be consistent with s ≜ (s^1, s^2); in other words, it must lie in the coset

Co(H, s) ≜ Co((H^1, H^2), (s^1, s^2)) ≜ { u : H^1 u^1 = s^1, H^2 u^2 = s^2 },   (69)

and the decoder selects û as the 'best' coset member (in a universal sense). This encompasses two settings:
a) Fixed-to-fixed length near-lossless Slepian-Wolf [45] distributed data compression, where u is identified as the sourceword and s is the syndrome, the output of the compression operation.
b) A multiple access channel where x^1 ∈ U_1^N and x^2 ∈ U_2^N are mapped to

y = ((y_1^1, y_1^2), (y_2^1, y_2^2), ..., (y_N^1, y_N^2)) ∈ U^N

according to

y^k = x^k ⊕ u^k,   k = 1, 2.
By using linear codes C^k for x^k, and identifying the parity check matrix H^k with C^k as

C^k = { x : H^k x = 0 },   (70)

then we have that a sufficient statistic [2], [46], [47] for decoding is the pair

H^k y^k = H^k u^k = s^k,   k = 1, 2.

Decoding successfully for u is equivalent to decoding successfully for the transmitted codewords x^k: x̂^k = û^k ⊕ y^k.

We assume that the rate pair (R_1, R_2) is achievable [45]:

R_1 ≥ H(U^1 | U^2)   (71a)
R_2 ≥ H(U^2 | U^1)   (71b)
R_1 + R_2 ≥ H(U^1, U^2).   (71c)
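The region (71) is easy to check numerically for a given joint pmf. Below is a small sketch of ours, with an illustrative doubly symmetric binary source; none of the numbers come from the paper:

```python
from math import log2

def entropy(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def slepian_wolf_achievable(R1, R2, joint):
    """Check the region (71) for a joint pmf given as {(u1, u2): prob}."""
    p1, p2 = {}, {}
    for (a, b), q in joint.items():      # marginals of U^1 and U^2
        p1[a] = p1.get(a, 0.0) + q
        p2[b] = p2.get(b, 0.0) + q
    H12 = entropy(joint.values())
    H1, H2 = entropy(p1.values()), entropy(p2.values())
    return (R1 >= H12 - H2 and           # R1 >= H(U^1 | U^2)
            R2 >= H12 - H1 and           # R2 >= H(U^2 | U^1)
            R1 + R2 >= H12)              # R1 + R2 >= H(U^1, U^2)

# Doubly symmetric binary source: U^2 is U^1 observed through a BSC(0.1).
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
print(slepian_wolf_achievable(0.7, 0.8, joint))  # True: inside the region
```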
As discussed in [2], linear code pairs still suffice to attain all achievable rates and, under the minimum-entropy decoding rule

û ∈ argmin_{u ∈ Co(H,s)} H(P_u),

can universally attain the random coding error exponent E_r(R_1, R_2, W) given by

R_3 ≜ R_1 + R_2   (72)
E_r^1(R, W) ≜ min_{P ∈ P_N(U)} D(P ‖ W) + |R − H(P_{U^1|U^2} | P_{U^2})|^+   (73)
E_r^2(R, W) ≜ min_{P ∈ P_N(U)} D(P ‖ W) + |R − H(P_{U^2|U^1} | P_{U^1})|^+   (74)
E_r^3(R, W) ≜ min_{P ∈ P_N(U)} D(P ‖ W) + |R − H(P)|^+   (75)
E_r(R_1, R_2, W) ≜ min_{i ∈ {1,2,3}} E_r^i(R_i, W).   (76)

Note that E_r(R_1, R_2, W) > 0 for all achievable rate pairs (R_1, R_2).

A. Universally Good Code Pairs

Csiszár's lemma specifying good encoders [2, Sec. III] illustrates the existence of pairs of linear encoders {H^k : U_k^N → U_k^{M_k}}_{k=1,2} such that, with the definitions

N_{H^1,H^2}(P) ≜ |{ u ∈ U^N : ∃ ũ ≠ u with H^1 u^1 = H^1 ũ^1, H^2 u^2 = H^2 ũ^2, P_{u^1,ũ^1,u^2,ũ^2} = P }|   (78)

P_N^i(U^2) ≜ { P = P_{u^1,ũ^1,u^2,ũ^2} ∈ P_N(U^2) :
    ũ^1 ≠ u^1, ũ^2 = u^2  if i = 1
    ũ^1 = u^1, ũ^2 ≠ u^2  if i = 2
    ũ^1 ≠ u^1, ũ^2 ≠ u^2  if i = 3 },   (79)

every joint type P = P_{u^1,ũ^1,u^2,ũ^2} ∈ P_N^i(U^2), for i ∈ {1, 2, 3} with R_3 ≜ R_1 + R_2, satisfies:

a) N_{H^1,H^2}(P) ≤ 2^{−N(R_i − H(P) − κ_N)}   (80)
b) if H(P_{U^1−Ũ^1, U^2−Ũ^2}) ≤ R_i − κ_N then N_{H^1,H^2}(P) = 0   (81)

where κ_N → 0 as N → ∞. We will denote such code pairs as universally good. Note that the bound (80) can be strengthened to:

N_{H^1,H^2}(P) ≤ 2^{−N(R_i − H(P) − κ_N)}   (82)
= 2^{N(H(P_{U^1,U^2}) − (R_i − H(P_{Ũ^1,Ũ^2|U^1,U^2} | P_{U^1,U^2}) − κ_N))}   (83)
⇒ N_{H^1,H^2}(P) ≤ 2^{N(H(P_{U^1,U^2}) − |R_i − H(P_{Ũ^1,Ũ^2|U^1,U^2} | P_{U^1,U^2}) − κ_N|^+)}   (84)
≤ 2^{N(H(P_{U^1,U^2}) − |R_1 − H(P_{Ũ^1|U^2} | P_{U^2}) − κ_N|^+)},  i = 1
≤ 2^{N(H(P_{U^1,U^2}) − |R_2 − H(P_{Ũ^2|U^1} | P_{U^1}) − κ_N|^+)},  i = 2
≤ 2^{N(H(P_{U^1,U^2}) − |R_3 − H(P_{Ũ^1,Ũ^2|U^1,U^2} | P_{U^1,U^2}) − κ_N|^+)},  i = 3   (85)

where (84) follows because, by the definition of N_{H^1,H^2}(P), N_{H^1,H^2}(P) ≤ |T(P_{U^1,U^2})| ≤ 2^{N H(P_{U^1,U^2})}, and (85) follows from (79).

1) Distance Properties of Universally Good Code Pairs: By defining the ordered pair (0, 0) to correspond to 0 in the usual Hamming weight definition, we have that for u ∈ U^N = U_1^N × U_2^N,

w_h(u) = Σ_{i=1}^N 1_{{(u_i^1, u_i^2) ≠ (0,0)}}.   (77)

We define the minimum distance here as

d_min(H^1, H^2) ≜ min_{u ∈ Co((H^1,H^2),0)\0} w_h(u).

It follows that

d_min(H^1, H^2) = min_{k ∈ {1,2}} d_min(H^k).

Note that, from (81), universally good code pairs will have d_min(H^k) lying on the Gilbert-Varshamov bound for each k ∈ {1, 2}. Thus d_min(H^1, H^2) will also grow linearly in N.
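A brute-force check of this identity for a toy binary pair (our sketch; exponential enumeration, for illustration only):

```python
import itertools
import numpy as np

def d_min(H):
    """Brute-force minimum distance of the code {x : Hx = 0 (mod 2)}."""
    N = H.shape[1]
    return min(sum(x) for x in itertools.product([0, 1], repeat=N)
               if any(x) and not (H @ np.array(x) % 2).any())

def d_min_pair(H1, H2):
    """Joint distance: weight counts positions where (x1_i, x2_i) != (0, 0)."""
    N = H1.shape[1]
    best = N
    for x1 in itertools.product([0, 1], repeat=N):
        if (H1 @ np.array(x1) % 2).any():
            continue
        for x2 in itertools.product([0, 1], repeat=N):
            if (H2 @ np.array(x2) % 2).any():
                continue
            if any(x1) or any(x2):
                best = min(best, sum(a or b for a, b in zip(x1, x2)))
    return best

H1 = np.array([[1, 1, 1, 0], [0, 1, 1, 1]])
H2 = np.array([[1, 0, 1, 1], [1, 1, 0, 1]])
print(d_min_pair(H1, H2), min(d_min(H1), d_min(H2)))  # both 2: the identity holds
```

Setting one of the two words to zero shows the joint distance can never exceed the smaller individual distance, which is why the minimum is attained.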
2) (β, E) Robust Code Pairs: As in the point-to-point case, we generalize the notion of error exponents for universally good code pairs. This will be useful in the error probability analysis for multiterminal LP decoding to be discussed later. Consider the events

E_β^univ = { ∃ (ũ^1, ũ^2) ≠ (u^1, u^2) : (ũ^1, ũ^2) ∈ Co((H^1, H^2), (s^1, s^2)), H(P_{ũ^1,ũ^2}) ≤ H(P_{u^1,u^2}) + β }

E_β^ML = { ∃ (ũ^1, ũ^2) ≠ (u^1, u^2) : (ũ^1, ũ^2) ∈ Co((H^1, H^2), (s^1, s^2)), D(P_{ũ^1,ũ^2} ‖ W) + H(P_{ũ^1,ũ^2}) ≤ D(P_{u^1,u^2} ‖ W) + H(P_{u^1,u^2}) + β }

where β ≥ 0. We say that the pair (H^1, H^2) is (β, E) universally robust if

−(1/N) log P(E_β^univ) ≥ E,

and (β, E) ML-robust if

−(1/N) log P(E_β^ML) ≥ E.

For i ∈ {1, 2, 3}, by defining the sets

T_{β,i}^univ ≜ { P ∈ P_N^i(U^2) | H(P_{Ũ^1,Ũ^2}) ≤ H(P_{U^1,U^2}) + β }
T_{β,i}^ML ≜ { P ∈ P_N^i(U^2) | D(P_{Ũ^1,Ũ^2} ‖ W) + H(P_{Ũ^1,Ũ^2}) ≤ D(P_{U^1,U^2} ‖ W) + H(P_{U^1,U^2}) + β }

we have from direct application of the analysis in Sections III-C and III-D that universally good code pairs are both (β, E) universally robust and (β, E) ML-robust for any E satisfying

0 < E < E_r(R_1, R_2, β, W) ≜ min_{i=1,2,3} E_r^i(R_i − β, W).

Note that whenever {R_i − β}_{i=1,2,3} all lie within the Slepian-Wolf region (71), then E_r(R_1, R_2, β, W) > 0.

B. Code Pairs On Graphs
We consider the codes on graphs approach applied in the previous section to multiterminal settings. Here we would like to consider code constructions that employ one graph G = (V, E) to specify the whole multiterminal system in the following way:
• Each j ∈ V is associated with a code pair (C_j^1, C_j^2) and a syndrome pair (s_j^1, s_j^2).
• Each edge e ∈ E is associated with a pair of states u_e = (u_e^1, u_e^2).
• Each local code pair (C_j^1, C_j^2) enforces the constraint that

u_j ∈ Co(H_j^k, s_j^k), k = 1, 2  ⇔  u_j ∈ Co((H_j^1, H_j^2), (s_j^1, s_j^2)).   (86)

Fig. 4. Graphical representation of Co((H^1, H^2), (s^1, s^2)).

The coset pair Co((H^1, H^2), (s^1, s^2)) can be expressed as

Co((H^1, H^2), (s^1, s^2)) = { u : u_j ∈ Co((H_j^1, H_j^2), (s_j^1, s_j^2)) ∀ j ∈ V }.   (87)

For a particular graph G = (V, E) we denote COG(G, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) in the same way as we specify Co((H^1, H^2), (s^1, s^2)) in terms of (87).

1) Parity-Check Representation: In some situations, universal decoding might need to be performed when the structure of the codes is out of the control of the designer of the decoding algorithm, and thus the expander graph construction with universally good constituent code pairs does not apply. One example of such a scenario is network coding for correlated sources [48], where nodes throughout a network locally perform random linear operations and the coefficients are handed to the decoder. In such a setting, we can consider without loss of generality that the parity-check representation (as discussed in Section V-A) for each individual source is provided, and thus we have the information G^1, G^2, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}. In this case we define G to be (G^1, G^2) and still use the terminology COG(G, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}).

2) Expander Code on Graph Pairs: From here on, unless specified otherwise, we assume that we are working with structured encoders with a single graph G that is an expander, i.e. G_{Δ,n}. For j ∈ A, we let each code pair (H_j^k)_{k=1,2} with rates (R_{a,k})_{k=1,2} be universally robust, and we assume the rates are achievable, i.e. they satisfy (71). For j ∈ B we let each code pair (H_j^k)_{k=1,2} with rates (R_{b,k})_{k=1,2} also be universally robust, where (R_{b,k} = R_k − R_{a,k})_{k=1,2}. We adhere to denoting such code pairs on graphs as ECOG(G_{Δ,n}, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}).

C. Universal Goodness of Bipartite Graph Code Pairs

Here we consider how a bipartite graph code of the form ECOG(G_{Δ,n}, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) performs under minimum-entropy decoding, as Δ grows. Let N = nΔ. For i ∈ {1, 2, 3}, let P ∈ P_N^i(U^2) correspond to the joint
type of any length-N 4-tuple (u^1, ũ^1, u^2, ũ^2) satisfying {H^k u^k = H^k ũ^k}_{k=1,2}. Define P^j ∈ P_Δ^i(U^2) to correspond to the joint type of any local length-Δ 4-tuple (u_j^1, ũ_j^1, u_j^2, ũ_j^2) satisfying {H_j^k u_j^k = H_j^k ũ_j^k}_{k=1,2}. Then by defining R_{a,3} ≜ R_{a,1} + R_{a,2} we have:

N_{H^1,H^2}(P) ≤ Σ_{{P^j}: (1/n)Σ_{j∈A} P^j = P}  Π_{j∈A} N_{H_j^1,H_j^2}(P^j)
            ≤ Σ_{{P^j}: (1/n)Σ_{j∈A} P^j = P}  Π_{j∈A} 2^{−Δ(R_{a,i} − H(P^j) − κ_Δ)}   (88)
            ≤ 2^{−N(R_{a,i} − H(P) − 2κ_Δ)}   (89)

where (88) follows from (80), and (89) follows from Lemma 5.3. Thus it follows that the pair (H^1, H^2) becomes universally good for large but fixed Δ, in terms of the rate R_i = R_{a,i} − 2κ_Δ.

1) Encoding: Encoding for the compression situation is done by applying {s_j^k = H_j^k u_j^k}_{k=1,2} for all j ∈ V. For the channel coding scenario, the encoding is the same as discussed in [23], [25].
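In code, the compression encoder is just a set of local syndrome computations. Here is a sketch of ours under an assumed data layout (edge-indexed word, per-node incidence lists, binary local checks); the names are illustrative:

```python
import numpy as np

def graph_encode(H_local, incident, u):
    """Codes-on-graphs encoding: node j emits the local syndrome of the
    sub-word carried by its incident edges. Layout is hypothetical:
    u is an array indexed by edge, incident[j] lists node j's edges,
    H_local[j] is node j's parity-check matrix over GF(2)."""
    return {j: (H_local[j] @ u[incident[j]]) % 2 for j in H_local}

# Tiny example: two nodes sharing four edges.
u = np.array([1, 0, 1, 1])
incident = {0: [0, 1], 1: [2, 3]}
H_local = {0: np.array([[1, 1]]), 1: np.array([[1, 0], [0, 1]])}
print(graph_encode(H_local, incident, u))  # {0: [1], 1: [1, 1]}
```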
D. Decoding Methods

1) LP Decoding Methods: The LP decoding algorithm for the multiterminal setting is almost completely a direct application of Section VI. We consider a codes on graphs representation. We first construct indicator variables I_{a,e} ∈ {0, 1}, for a ∈ U = U_1 × U_2 and e ∈ E, such that I_{a,e} = 1_{{u_e = a}}. Thus I specifies u ∈ U^N via

u^1 = µ^1(I),   µ^1(I)_e = Σ_{(a_1,a_2)∈U} a_1 I_{(a_1,a_2),e}
u^2 = µ^2(I),   µ^2(I)_e = Σ_{(a_1,a_2)∈U} a_2 I_{(a_1,a_2),e}.

When COG(G, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) corresponds to a single graph G, the coset pair Co((H^1, H^2), (s^1, s^2)) from (69) is well-defined and the polytope B̃(G, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) is well-defined from direct application of the definitions in Section VI-B. When COG(G, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) corresponds to G = (G^1, G^2), then the polytope B̃(G, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) is easily defined as

B̃(G, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) = { I : µ^1(I) ∈ B̃(G^1, {H_j^1}, {s_j^1}), µ^2(I) ∈ B̃(G^2, {H_j^2}, {s_j^2}) }

where B̃(G, {H_j}, {s_j}) is discussed in Section VI-B and given by (39a)-(39c). The formulation of the LP relaxation LP-PRIMAL(G, {γ_a}, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) for this setting follows directly from the definition of Co((H_j^1, H_j^2), (s_j^1, s_j^2)). We can thus consider the following universal decoding algorithm:

UNIV-DEC-LP(G, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}):
0. Set (u^1*, u^2*) = (ũ^1, ũ^2) for any (ũ^1, ũ^2) whose joint type P_{ũ^1,ũ^2} is uniform on U.
1. For each P ∈ P_Δ(U) do
2.   Execute LP-PRIMAL(G, {−log P(a)}, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}).
3.   If the optimal solution I* is integral and H(P_{µ^1(I*), µ^2(I*)}) ≤ H(P_{u^1*, u^2*}), then set (u^1*, u^2*) = (µ^1(I*), µ^2(I*)).
4. end for
5. return (u^1*, u^2*)

We shall now define

0 < δ_A ≜ min_{j∈A} d_min(H_j^1, H_j^2)/Δ
0 < δ_B ≜ min_{j∈B} d_min(H_j^1, H_j^2)/Δ
p_Δ^1 = 2^{−Δ[E_r(R_{a,1}, R_{a,2}, β, W) − ν_Δ]}
p_Δ^2 = [ δ_B / (1 + δ_B/δ_A + log(Δ)/β) − 1/√Δ ]^+
β ∈ { β̃ ∈ R^+ s.t. E_r(R_{a,1}, R_{a,2}, β̃, W) > 0 }

where ν_Δ → 0 as Δ → ∞. From Theorem 6.1 we have that for sufficiently large but fixed Δ, the error probability P_e of UNIV-DEC-LP(G_{Δ,n}, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) satisfies

P_e ≤ 2^{−n[D(p_Δ^2 ‖ p_Δ^1) − o(n)]}.

2) Iterative Decoding Methods: The iterative expander decoding algorithm for the multiterminal setting is almost completely a direct application of Section VII. Define

d_ME((H^1, H^2), (s^1, s^2)) = argmin_{ũ ∈ Co(H,s)} H(P_ũ)   (90)
d_MD((H^1, H^2), (s^1, s^2), u′) = argmin_{ũ ∈ Co(H,s)} w_h(ũ + u′)   (91)
φ = φ({(H_j^1, H_j^2)}) = min_{j∈V} (1/2) d_min(H_j^1, H_j^2)   (92)
λ = λ_2(G)   (93)

and consider for any 0 < α < 1 the algorithm:

UNIV-DEC-ITER(G, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}):
0. Initialize: i = 0, FP = 0, and V′ = A.
1. Set û_j = d_ME((H_j^1, H_j^2), (s_j^1, s_j^2)) for each j ∈ V′.
2. while i ≤ log(αn(φ − λ)/Δ) / log(2 − α) and FP = 0 do
3.   Set V′ = V \ V′ and û′ = û.
4.   Set û_j = d_MD((H_j^1, H_j^2), (s_j^1, s_j^2), û′_j) for each j ∈ V′.
5.   Set i = i + 1.
6.   Set FP = 1_{{û′ = û}}.
7. end while
8. return û

By defining

p_Δ^1 = 2^{−Δ E_r(R_{a,1}, R_{a,2}, W)}
p_Δ^2 = α(φ/Δ − 2/√Δ)
we have from Corollary 7.2 that for sufficiently large Δ the error probability P_e of UNIV-DEC-ITER(G_{Δ,n}, {(H_j^1, H_j^2)}, {(s_j^1, s_j^2)}) satisfies

P_e ≤ 2^{−n[D(p_Δ^2 ‖ p_Δ^1) − o(n)]}.
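For intuition about the local step (90), here is a brute-force joint minimum-entropy decoder for a binary coset pair. This is our illustrative sketch only; the nested enumeration is exponential and is precisely what the graph-based decoders above avoid:

```python
import itertools
import numpy as np
from collections import Counter
from math import log2

def joint_entropy(u1, u2):
    """Empirical joint entropy H(P_{u1,u2}) in bits."""
    n = len(u1)
    counts = Counter(zip(u1, u2))
    return -sum((c / n) * log2(c / n) for c in counts.values())

def pair_min_entropy_decode(H1, H2, s1, s2):
    """Search the coset pair Co((H1, H2), (s1, s2)) over GF(2) for the
    jointly lowest-entropy pair, as in (90). Illustration only."""
    N = H1.shape[1]
    best, best_h = None, float("inf")
    for u1 in itertools.product([0, 1], repeat=N):
        if not np.array_equal(H1 @ np.array(u1) % 2, s1):
            continue
        for u2 in itertools.product([0, 1], repeat=N):
            if not np.array_equal(H2 @ np.array(u2) % 2, s2):
                continue
            h = joint_entropy(u1, u2)
            if h < best_h:
                best, best_h = (u1, u2), h
    return best

# Tiny correlated example: u2 = u1, compressed by different local checks.
H1 = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]])
H2 = np.array([[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1]])
u1 = u2 = np.array([0, 1, 0, 0])
print(pair_min_entropy_decode(H1, H2, H1 @ u1 % 2, H2 @ u2 % 2))
```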
IX. DISCUSSION AND CONCLUSION

Here we have considered the problem of universal decoding in discrete memoryless systems, where linear codes suffice, from a practical viewpoint. To understand the performance-complexity boundaries of our problem, we first showed that optimal universal decoding is NP-complete. We next used a 'divide-and-conquer' approach to construct large good codes from smaller good ones and connect them with the edges of a graph with good expansion properties. The expansion properties of the graph allowed us to formulate linear programming and iterative decoding algorithms that provably work, with error probability decaying exponentially. It is our desire that these provably good algorithms and code constructions will lead to further developments in constructions and algorithms for universal coding with better performance vs. complexity tradeoffs.

We note that the error exponents for the linear programming based universal decoder and the iterative 'expander codes' based decoder are both characterized in terms of a Kullback-Leibler distance between an exponentially decaying quantity in the inner block length Δ and another quantity that decays much more slowly in Δ. The error exponent bounds derived from this analysis favor the latter approach. However, empirical comparisons between linear programming decoding and expander code decoding on the same graphical representation of a code show that linear programming performs much better [30]. Thus we believe a better error probability analysis for linear programming based decoding is possible, one that would illustrate this observation theoretically.

Although the theoretical results illustrated here discuss polynomial (and even linear) complexity decoding algorithms with guaranteed exponential error probability decay, the coefficients in the polynomials can be large enough to prevent implementation in silicon. Most likely, an approach with good empirical performance but no theoretical guarantees (such as applying iterative algorithms on bipartite graph codes with small fixed degrees and cycles) is more likely to make its way into real systems. Thus, one possible step in that direction is to consider designing iterative low-complexity algorithms that mimic the UNIV-DEC-LP(G, {H_j}, {s_j}) and UNIV-DEC-ITER(G, {H_j}, {s_j}) algorithms of Sections VI-B and VII with low complexity and good empirical performance. One step in that direction for LP decoding is a linear complexity approximate LP decoder [49] that appears to have the same threshold as belief-propagation decoding [31].

ACKNOWLEDGEMENT

The authors would like to thank Ralf Koetter for his thoughtful input on this subject.
REFERENCES

[1] V. D. Goppa, "Universal decoding for symmetric channels," Probl. Peredachi Inform., vol. 11, no. 1, pp. 15–22, 1975 (in Russian).
[2] I. Csiszár, "Linear codes for sources and source networks: Error exponents, universal coding," IEEE Transactions on Information Theory, vol. 28, no. 4, pp. 585–592, 1982.
[3] R. Blahut, Principles and Practice of Information Theory. Addison-Wesley, 1983.
[4] A. Lapidoth and J. Ziv, "Universal decoding for noisy channels: an algorithmic approach," in Proceedings of the 1997 IEEE International Symposium on Information Theory, Ulm, 1997.
[5] M. Feder and A. Lapidoth, "Universal decoding for channels with memory," IEEE Transactions on Information Theory, vol. 44, no. 5, pp. 1726–1745, 1998.
[6] S.-I. Amari and T. Han, "Statistical inference under multiterminal rate restrictions: a differential geometric approach," IEEE Transactions on Information Theory, vol. 35, no. 2, pp. 217–227, 1989.
[7] T. S. Han and S. Amari, "Parameter estimation with multiterminal data compression," IEEE Transactions on Information Theory, vol. 41, pp. 1802–1833, 1995.
[8] ——, "Statistical inference under multiterminal data compression," IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2300–2324, 1998.
[9] R. Jornsten, "Data compression and its statistical implications, with an application to the analysis of microarray images," Ph.D. dissertation, University of California, Berkeley, Berkeley, CA, December 2001.
[10] R. Jornsten and B. Yu, "Multiterminal estimation - extensions and geometric interpretation," in Proceedings of the 2002 IEEE International Symposium on Information Theory, 2002.
[11] D. Falconer, "Decision-directed decoder for a binary symmetric channel with unknown crossover probability," IEEE Transactions on Information Theory, vol. 17, no. 3, pp. 336–343, 1971.
[12] A. Lapidoth and J. Ziv, "On the decoding of convolutional codes on an unknown channel," IEEE Transactions on Information Theory, vol. 45, no. 7, pp. 2231–2332, 1999.
[13] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai (Shitz), "On information rates for mismatched decoders," IEEE Transactions on Information Theory, vol. 40, no. 6, pp. 1953–1967, 1994.
[14] E. R. Berlekamp, R. J. McEliece, and H. C. A. van Tilborg, "On the intractability of certain coding problems," IEEE Transactions on Information Theory, vol. 24, no. 3, pp. 384–386, 1978.
[15] R. Gallager, "Low-density parity-check codes," IRE Transactions on Information Theory, vol. 8, pp. 21–28, Jan. 1962.
[16] G. D. Forney, "Concatenated codes," Ph.D. dissertation, MIT, Cambridge, MA, June 1965.
[17] ——, "Generalized minimum distance decoding," IEEE Transactions on Information Theory, vol. 12, no. 2, pp. 125–131, 1966.
[18] ——, "Codes on graphs: Normal realizations," IEEE Transactions on Information Theory, pp. 101–112, 2001.
[19] F. Kschischang, B. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, 2001.
[20] A. Lubotzky, R. Phillips, and P. Sarnak, "Ramanujan graphs," Combinatorica, vol. 8, no. 3, pp. 261–277, 1988.
[21] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting codes and decoding: Turbo codes," Proc. IEEE International Communications Conference, 1993.
[22] S.-Y. Chung, G. D. Forney, T. Richardson, and R. Urbanke, "On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit," IEEE Communications Letters, vol. 5, no. 2, pp. 58–60, February 2001.
[23] M. Sipser and D. Spielman, "Expander codes," IEEE Transactions on Information Theory, vol. 42, no. 6, pp. 1710–1722, 1996.
[24] G. Zémor, "On expander codes," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 835–837, 2001.
[25] A. Barg and G. Zémor, "Error exponents of expander codes," IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1725–1729, 2002.
[26] R. Koetter and P. O. Vontobel, "Graph-covers and iterative decoding of finite length codes," Proceedings of Turbo Codes Conference, Brest, 2003.
[27] P. O. Vontobel and R. Koetter, "On the relationship between linear programming decoding and min-sum algorithm decoding," in International Symposium on Information Theory and its Applications, Parma, Italy, October 2004.
[28] J. Feldman, M. Wainwright, and D. R. Karger, "Using linear programming to decode linear codes," Proceedings of Conference on Information Sciences and Systems, The Johns Hopkins University, March 2003.
[29] J. Feldman, "Decoding error-correcting codes via linear programming," Ph.D. dissertation, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September 2003.
[30] J. Feldman and C. Stein, "LP decoding achieves capacity," in ACM-SIAM Symposium on Discrete Algorithms (SODA), January 2005.
[31] P. O. Vontobel and R. Koetter, "Bounds on the threshold of linear programming decoding," in IEEE Information Theory Workshop, March 2006.
[32] S. Wicker and S. Kim, Fundamentals of Codes, Graphs, and Iterative Decoding. Norwell, MA: Kluwer Academic Publishers, 2003.
[33] I. Csiszár, "The method of types," IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2505–2523, 1998.
[34] A. Lempel and J. Ziv, "A universal algorithm for sequential data compression," IEEE Transactions on Information Theory, pp. 337–343, 1977.
[35] ——, "Compression of individual sequences via variable-rate coding," IEEE Transactions on Information Theory, pp. 530–536, 1978.
[36] M. Effros, K. Visweswariah, S. R. Kulkarni, and S. Verdú, "Universal lossless source coding with the Burrows Wheeler transform," IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1061–1081, May 2002.
[37] W. Thorburn, "Occam's razor," Mind, vol. 24, pp. 287–288, 1915.
[38] R. Gallager, Information Theory and Reliable Communication. New York, NY: John Wiley & Sons, 1968.
[39] N. Alon and F. R. K. Chung, "Explicit construction of linear sized tolerant networks," Discrete Math, vol. 72, pp. 15–19, 1989.
[40] J. Feldman, T. Malkin, C. Stein, R. A. Servedio, and M. J. Wainwright, "LP decoding corrects a constant fraction of errors," in IEEE International Symposium on Information Theory, Chicago, IL, June 27 – July 2, 2004.
[41] D. Bertsimas and J. N. Tsitsiklis, Introduction to Linear Optimization. Belmont, MA: Athena Scientific, 1997.
[42] R. Horst and H. Tuy, Global Optimization: Deterministic Approaches, 3rd ed. Berlin, Germany: Springer Verlag, 1996.
[43] G. Forney, R. Koetter, F. Kschischang, and A. Reznik, "On the effective weights of pseudocodewords for codes defined on graphs with cycles," Codes, Systems and Graphical Models, pp. 101–112, 2001.
[44] B. J. Frey, R. Koetter, and A. Vardy, "Signal space characterization of iterative decoding," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 766–781, 2001.
[45] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471–480, 1973.
[46] M. Effros, M. Médard, T. Ho, S. Ray, D. Karger, R. Koetter, and B. Hassibi, "Linear network codes: A unified framework for source, channel, and network coding," in Advances in Network Information Theory, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, G. et al., Ed., 2003, vol. 66.
[47] S. Ray, M. Effros, M. Médard, R. Koetter, T. Ho, D. Karger, and J. Abounadi, "On separation, randomness and linearity for network codes over finite fields," MIT, Tech. Rep. LIDS-2687, March 2006.
[48] T. Ho, M. Médard, M. Effros, and R. Koetter, "Network coding for correlated sources," in Proceedings of CISS, 2004.
[49] P. O. Vontobel and R. Koetter, "Towards low-complexity linear-programming decoding," in Proc. 4th Int'l Conf. on Turbo Codes and Related Topics, April 2006.
APPENDIX

A. Proof of Lemma 3.3

Suppose we have that a parity check matrix H has d_univ = d = Nδ. From the definition of d_univ in (23) it follows that for any nonzero ũ ∈ Co(H, 0), the following holds:

w_h(ũ) ≥ d ⇔ w_h(ũ ⊕ 1) ≤ N − d   (94)

and

w_h(ũ) ≤ N − d ⇔ w_h(ũ ⊕ 1) ≥ d.   (95)

Then if

w_h(u) < (1/2) d   (96)

is satisfied, we have

1) w_h(u ⊕ ũ) ≤ w_h(u) + w_h(ũ ⊕ 1)... more precisely, w_h(u ⊕ ũ) ≤ w_h(u) + w_h(ũ) < (1/2)d + N − d = N − (1/2)d, owing to (96), (95);
2) w_h(u ⊕ ũ) = N − w_h(u ⊕ ũ ⊕ 1) ≥ N − [w_h(u) + w_h(ũ ⊕ 1)] > N − [(1/2)d + N − d] = (1/2)d, owing to (96), (94).

Likewise, if

w_h(u ⊕ 1) < (1/2)d ⇔ w_h(u) > N − (1/2)d   (97)

then, by the analogous argument applied to u ⊕ 1 and ũ ⊕ 1, we again have (1/2)d < w_h(u ⊕ ũ) < N − (1/2)d; in particular

w_h(u ⊕ ũ) > N − [(1/2)d + N − d] = (1/2)d, owing to (97), (94).

Thus in either case, because of the following properties:
i. δ ≤ 1/2 (this follows from the definition (23) of d_univ),
ii. the binary entropy function H_2(·) is monotonically increasing on [0, 1/2),
iii. the binary entropy function H_2(·) is symmetric around 1/2,
we have that H(P_{ũ⊕u}) > H(P_u). Thus if we define s = Hu then we have that u is the unique solution to

min_{û ∈ Co(H,s)} H(P_û).

The alternative statement in the lemma holds because the two statements are equivalent:
• w_h(u) < (1/2) d_univ or w_h(u ⊕ 1) < (1/2) d_univ;
• H(P_u) < H_2((1/2) δ).
This also follows from properties i.–iii. above.
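A numeric sanity check of the mechanism in Lemma 3.3 (our sketch, not the proof itself): take a small binary code whose nonzero codewords all have weight 4 with n = 8, so d_univ = 4, and a source word u with w_h(u) = 1 < d_univ/2. Then u should uniquely minimize empirical entropy in its coset.

```python
import itertools
import numpy as np
from collections import Counter
from math import log2

def h_emp(u):
    """Empirical entropy H(P_u) in bits."""
    n = len(u)
    return -sum((c / n) * log2(c / n) for c in Counter(u).values())

# Parity checks of the code spanned by 11110000 and 00111100 (all nonzero
# codewords have weight 4, and the all-ones word is not a codeword).
H = np.array([[1, 1, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 1],
              [1, 0, 1, 0, 1, 0, 0, 0]])
u = np.array([1, 0, 0, 0, 0, 0, 0, 0])
s = H @ u % 2
coset = [np.array(v) for v in itertools.product([0, 1], repeat=8)
         if np.array_equal(H @ np.array(v) % 2, s)]
others = [v for v in coset if not np.array_equal(v, u)]
print(all(h_emp(u) < h_emp(v) for v in others))  # True: unique minimizer
```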
B. Proof of Lemma 3.5

By defining the set

T_β^univ ≜ { P ∈ P_N(U^2) | H(P_Ũ) ≤ H(P_U) + β }   (98)

we have that universally good codes satisfy

P(E_β^univ) ≤ Σ_{P ∈ T_β^univ} N_H(P) 2^{−N[D(P_U ‖ W) + H(P_U)]}
≤ Σ_{P ∈ T_β^univ} 2^{−N[D(P_U ‖ W) + |R − H(P_Ũ) − κ_N|^+]}   (99)
≤ (N + 1)^{|U|^2} · max_{P ∈ T_β^univ} 2^{−N[D(P_U ‖ W) + |R − H(P_Ũ) − κ_N|^+]}   (100)
≤ max_{P ∈ T_β^univ} 2^{−N[D(P_U ‖ W) + |R − H(P_Ũ)|^+ − 2κ_N]}   (101)
≤ max_{P ∈ T_β^univ} 2^{−N[D(P_U ‖ W) + |R − H(P_U) − β|^+ − 2κ_N]}   (102)
≤ max_{P ∈ P_N(U)} 2^{−N[D(P_U ‖ W) + |R − H(P_U) − β|^+ − 2κ_N]}
= 2^{−N[E_r(R − β, W) − 2κ_N]},   (103)

where (99) follows from (16), (100) follows from (2), (101) follows from (12), (102) follows from (98), and (103) follows from (9). Thus, universally good codes of rate R are (β, E) universally robust for all β ∈ [0, R − H(U)] and E ∈ [0, E_r(R − β, W)].

C. Proof of Lemma 3.6

By defining the set

T_β^ML = { P ∈ P_N(U^2) | D(P_Ũ ‖ W) + H(P_Ũ) ≤ D(P_U ‖ W) + H(P_U) + β }   (104)

we have that universally good codes also satisfy

P(E_β^ML) ≤ max_{P ∈ T_β^ML} 2^{−N[D(P_U ‖ W) + |R − H(P_Ũ)|^+ − 2κ_N]}.   (105)

Let us consider the constraint for any P ∈ T_β^ML:

D(P_ũ ‖ W) + H(P_ũ) ≤ β + D(P_u ‖ W) + H(P_u)   (106)
⇔ D(P_ũ ‖ W) − H(P_u) − β ≤ D(P_u ‖ W) − H(P_ũ).   (107)

If H(P_u) > R − β, then

D(P_u ‖ W) + |R − β − H(P_u)|^+ = D(P_u ‖ W) ≤ D(P_u ‖ W) + |R − H(P_ũ)|^+.

If, on the other hand, H(P_u) ≤ R − β:

D(P_ũ ‖ W) + |R − β − H(P_u)|^+ = D(P_ũ ‖ W) + R − β − H(P_u)   (108)
≤ D(P_u ‖ W) + R − H(P_ũ)
≤ D(P_u ‖ W) + |R − H(P_ũ)|^+   (109)

where (108) follows from (107). Therefore

D(P_u ‖ W) + |R − H(P_ũ)|^+
= (1/2)[D(P_u ‖ W) + |R − H(P_ũ)|^+] + (1/2)[D(P_u ‖ W) + |R − H(P_ũ)|^+]
≥ (1/2)[D(P_ũ ‖ W) + |R − β − H(P_u)|^+] + (1/2)[D(P_u ‖ W) + |R − H(P_ũ)|^+]   (110)
≥ (1/2)[D(P_ũ ‖ W) + |R − β − H(P_u)|^+] + (1/2)[D(P_u ‖ W) + |R − β − H(P_ũ)|^+]
= (1/2)[D(P_u ‖ W) + |R − β − H(P_u)|^+] + (1/2)[D(P_ũ ‖ W) + |R − β − H(P_ũ)|^+]
≥ E_r(R − β, W)   (111)

where (110) follows from (109), and (111) follows from (9). Thus, it follows that

P(E_β^ML) ≤ 2^{−N[E_r(R − β, W) − 2κ_N]}

and universally good codes of rate R are (β, E) ML robust for all β ∈ [0, R − H(U)] and E ∈ [0, E_r(R − β, W)].

D. Proof of Lemma 6.1
Proof: We define Γ(T) to be the nodes in the neighborhood of T and note that, because G is bipartite, Γ(T) ⊆ B. We shall explicitly provide a dual feasible solution with cost equal to the primal cost of u. Note that the cost of u in the primal LP is

Σ_{j∈A} γ(u_j, j),

which is an upper bound on the cost of any dual solution. We now show we can achieve this cost, which means that u is indeed an optimal solution. Define

τ̂_{e,j,a} = { γ_a, j ∈ A;  0, j ∈ B }   (112)
v_j = Σ_{e∈N(j)} Σ_{a∈U} I_{u_j,a,e} τ̂_{e,j,a}   (113)
    = { γ(u_j, j), j ∈ A;  0, j ∈ B }   (114)
τ_{e,j,a} = τ̂_{e,j,a} I_{u_j,a,e} + τ̃_{e,j,a} (1 − I_{u_j,a,e})   (115)

where τ̃_{e,j,a} is yet to be specified. Note that

Σ_{j∈V} v_j = Σ_{j∈A} γ(u_j, j)

which is the cost of u in the primal. Also note that, for j ∈ V, with ũ_j = u_j ∈ Co(H_j, s_j):

Σ_{e∈N(j)} Σ_{a∈U} I_{ũ_j,a,e} τ_{e,j,a} = Σ_{e∈N(j)} Σ_{a∈U} I_{u_j,a,e} τ_{e,j,a} = Σ_{e∈N(j)} Σ_{a∈U} I_{u_j,a,e} τ̂_{e,j,a} = v_j.   (116)

Thus the dual constraint (42a) is satisfied for ũ_j = u_j ∈ Co(H_j, s_j). We have to be careful with how we set the auxiliary edge weights τ̃_{e,j,a} for the dual constraints where ũ_j ∈ Co(H_j, s_j) \ u_j. To do this, we define a direction for each edge in the graph. All edges that are not incident to T are directed toward the nodes in A. Edges incident to T are directed according to a ρ-orientation of the subgraph induced by (T ∪ Γ(T)). This is possible using Lemma 5.2, since |T ∪ Γ(T)| ≤ αn by assumption, and so |{Γ(j)}_{j∈T}| ≤ αρΔn by expansion. To satisfy the edge constraints (42b) of the dual LP, the sum of these two weights should be strictly less than γ_a. We give the assignment in detail below (also see Figure 5), where ε > 0 is a small constant we specify later:

edge location e              | τ̃_{e,j,a}, j ∈ A      | τ̃_{e,j,a}, j ∈ B
e leaving T                  | γ̄ + β δ_B/δ_A         | −γ̄ − β δ_B/δ_A
e entering T                 | −β                     | β
e incident to Γ(T) \ T       | γ_a − β − ε            | β
all other e                  | γ_a − ε                | 0

Fig. 5. Edge weight τ̃_{e,j,a} settings for each node j.

Note that, from summing up each row in the weighting assignments, all the dual constraints given by (42b) are satisfied with slack. Thus from complementary slackness [41, Sec. 4.3] u* will be the unique solution. We now show the above weight assignment also satisfies all the dual constraints given by (42a):

(i) For a node j ∈ T, there are at most ρΔ ≤ ρ′Δ incoming edges e with weight −β. The remaining (outgoing) edges have weight γ̄ + β δ_B/δ_A. Furthermore, for any ũ_j ∈ Co(H_j, s_j) \ u_j we have that w_h(u_j + ũ_j) ≥ δ_A Δ, so in the worst case we have that
1) ρ′Δ edges, in locations where ũ_j[e] ≠ u_j[e], have weight τ̃_{e,j,a} = −β;
2) (δ_A − ρ′)Δ edges, in locations where ũ_j[e] ≠ u_j[e], have weight τ̃_{e,j,a} = γ̄ + β δ_B/δ_A;
3) the remaining (1 − δ_A)Δ edges - denoted as Ẽ(j) - in locations where ũ_j[e] = u_j[e], have weight τ̂_{e,j,a} = γ_a.
Thus

Σ_{a∈U} Σ_{e∈N(j)} I_{ũ_j,a,e} τ_{e,j,a}
≥ −ρ′Δβ + (δ_A − ρ′)Δ(γ̄ + β δ_B/δ_A) + Σ_{a∈U} Σ_{e∈Ẽ(j)} I_{ũ_j,a,e} τ_{e,j,a}   (117)
= −ρ′Δβ + (δ_A − ρ′)Δ(γ̄ + β δ_B/δ_A) + Σ_{a∈U} Σ_{e∈Ẽ(j)} I_{u_j,a,e} γ_a   (118)
= −ρ′Δβ + (δ_A − ρ′)Δ(γ̄ + β δ_B/δ_A) + v_j − Σ_{a∈U} Σ_{e∈(N(j)\Ẽ(j))} I_{u_j,a,e} τ̂_{e,j,a}   (119)
≥ −ρ′Δβ + (δ_A − ρ′)Δ(γ̄ + β δ_B/δ_A) + v_j − δ_A Δ γ̄   (120)
= v_j + Δ[β δ_B − ρ′(γ̄ + β δ_B/δ_A + β)]
≥ v_j   (121)

where (117) follows from 1) and 2) above, (118) follows from 3) above, (119) follows from (113) and (112), (120) follows from (47), and (121) follows from (49).

(ii) For a node j ∈ Γ(T), there are at most ρΔ ≤ ρ′Δ incoming edges e with weight τ̃_{e,j,a} = −γ̄ − β δ_B/δ_A, and the remaining (outgoing) edges have weight τ̃_{e,j,a} = β. Furthermore, for any ũ_j ∈ Co(H_j, s_j) \ u_j we have that w_h(u_j + ũ_j) ≥ δ_B Δ, so in the worst case we have that
1) ρ′Δ edges, in locations where ũ_j[e] ≠ u_j[e], have weight τ̃_{e,j,a} = −γ̄ − β δ_B/δ_A;
2) (δ_B − ρ′)Δ edges, in locations where ũ_j[e] ≠ u_j[e], have weight τ̃_{e,j,a} = β;
3) the remaining (1 − δ_B)Δ edges, in locations where ũ_j[e] = u_j[e], have weight τ̂_{e,j,a} = 0.
Thus

Σ_{a∈U} Σ_{e∈N(j)} I_{ũ_j,a,e} τ_{e,j,a} ≥ −ρ′Δ(γ̄ + β δ_B/δ_A)   (122)
 + (δ_B − ρ′)Δβ   (123)
= Δ[δ_B β − ρ′(γ̄ + β δ_B/δ_A + β)]
≥ 0   (124)
= v_j   (125)

where (122) follows from 1) above, (123) follows from 2) and 3) above, (124) follows from (112), and (125) follows from (114).

(iii) For a node j ∈ (A \ T), we have that every incident edge e is incoming and has weight τ̃_{e,j,a} equal to either γ_a − ε or γ_a − β − ε. In the worst case, they all have weight τ̃_{e,j,a} = γ_a − β − ε. So for any ũ_j ≠ u_j ∈ Co(H_j, s_j) we have

Σ_{a∈U} Σ_{e∈N(j)} I_{ũ_j,a,e} τ_{e,j,a} ≥ Σ_{a∈U} Σ_{e∈N(j)} I_{ũ_j,a,e} γ_a − Δ(β + ε)   (126)
= γ(ũ_j, j) − Δ(β + ε)
> Δ(β + ε) + γ(u_j, j) − Δ(β + ε)   (127)
= γ(u_j, j)
= v_j   (128)

where (126) follows from (112)-(114), (127) follows for sufficiently small ε from (44), and (128) follows from (114).

(iv) For a node j ∈ (B \ Γ(T)), we have that every edge is leaving j, and thus τ̃_{e,j,a} = 0 for all a and all e ∈ N(j). Thus

Σ_{a∈U} Σ_{e∈N(j)} I_{ũ_j,a,e} τ_{e,j,a} = 0 = v_j.

So in all cases we have that

Σ_{a∈U} Σ_{e∈N(j)} I_{ũ_j,a,e} τ_{e,j,a} ≥ v_j.