Error Exponents for Joint Source-Channel Coding with Delay-Constraints

Cheng Chang
Wireless Foundations, Dept. of EECS, University of California at Berkeley. Email: [email protected]

Anant Sahai
Wireless Foundations, Dept. of EECS, University of California at Berkeley. Email: [email protected]

Abstract— Traditionally, joint source-channel coding is viewed in the block coding context, where all the source symbols are known in advance by the encoder. Here, we consider source symbols to be revealed to the encoder in real time and require that they be reconstructed at the decoder within a fixed end-to-end delay. We derive upper and lower bounds on the reliability function with delay for cases with and without channel feedback. For erasure channels with feedback, the upper bound is shown to be achievable, and the resulting scheme shows that, from a delay perspective, nearly separate source and channel coding is optimal.

I. INTRODUCTION

The block-length story for error exponents is particularly seductive when upper and lower bounds agree, as they do for both lossless source coding and for point-to-point channel coding in the high-rate regime [1], [2]. However, the block-code setting conflates a particular style of implementation with the problem statement itself. Recently, it has become clear that fixed-block codes may incur unnecessarily poor performance with respect to end-to-end delay, even when the required delay is fixed. In [3], we show that although the block channel coding reliability functions do not change with feedback in the high-rate regime, the reliability function with respect to fixed delay can in fact improve dramatically with feedback.¹ For fixed-rate lossless source coding, [5] showed similarly that the reliability function with fixed delay is much better than the reliability with fixed block-length. An example given in [5], [6] illustrates how an extremely simple and clearly suboptimal nonblock code can sometimes dramatically outperform the best possible fixed-length block code when considering the tradeoff with fixed delay. These results suggest that a more systematic examination of the tradeoff between delay and probability of error is needed in other contexts as well.

The main results in this paper are upper and lower bounds on the error exponents with delay for lossless joint source-channel coding, both with and without feedback. Without feedback, the upper bound strategy parallels the one used in [3] for channel coding alone. We also derive a lower bound (achievability) on the fixed-delay error exponents by combining variable-length source coding and sequential random channel coding. These two bounds are in general not the same.² We then study the joint source-channel coding problem with noiseless feedback, giving a "focusing" [3] type of upper bound similar to the pure channel coding result in [3]. We then give a matching lower bound on the joint source-channel reliability for the special case of binary erasure channels. The scheme is interesting because it is essentially a separation-based architecture, except that it uses a variable-rate interface between the source and channel coding layers.

¹ It had long been known that the reliability function with respect to average block-length can improve with feedback, but there was a mistaken assertion by Pinsker in [4] that the fixed-delay exponents do not improve.

² This parallels what was reported in [7] for block coding: there, separate source-channel coding does not achieve the upper bound on the joint source-channel coding error exponent.

[Fig. 1. Sequential joint source-channel coding, delay ∆ = 3: source symbols s₁, s₂, s₃, ... arrive at the encoder one per time step; the channel input x_i(s_1^i) is sent and the channel output y_i is observed at time i; the decoder emits ŝ₁(4), ŝ₂(5), ŝ₃(6), ..., i.e. the estimate ŝ_i is committed at time i + ∆.]
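To pin down the timing conventions of Figure 1, the following sketch (ours; the paper does not prescribe any implementation or programming interface) expresses a delay-∆ sequential code as a pair of causal maps:

```python
from typing import Sequence, Hashable

class SequentialEncoder:
    """Causal encoder: the i-th channel input may depend only on s_1^i
    (and, in the feedback setting of Section IV, on past channel outputs)."""
    def next_input(self, source_so_far: Sequence[Hashable]) -> Hashable:
        raise NotImplementedError

class SequentialDecoder:
    """Delay-constrained decoder: at time i + delay it must commit an
    estimate of s_i based on the channel outputs y_1^{i+delay}."""
    def __init__(self, delay: int):
        self.delay = delay
    def estimate(self, i: int, outputs_so_far: Sequence[Hashable]) -> Hashable:
        assert len(outputs_so_far) >= i + self.delay
        raise NotImplementedError
```

The key constraint is causality on both sides: unlike a block code, neither map ever sees the whole sequence at once.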

A. Review of block joint source-channel coding

The setup is shown in Figure 2. For convenience, there is a common clock for the iid source $s_1^n$, $s_i \sim Q_s$,³ from a finite alphabet $\mathcal S$, and the DMC $W$. Without loss of generality, the source distribution satisfies $Q_s(s) > 0$, $\forall s \in \mathcal S$. The DMC has transition matrix $W_{y|x}$, where the input and output alphabets are the finite sets $\mathcal X$ and $\mathcal Y$ respectively. A block joint source-channel coding system for $n$ source symbols consists of an encoder-decoder pair $(E_n, D_n)$, where

$$E_n: \mathcal S^n \to \mathcal X^n, \quad E_n(s_1^n) = x_1^n$$
$$D_n: \mathcal Y^n \to \mathcal S^n, \quad D_n(y_1^n) = \hat s_1^n$$

[Fig. 2. Joint source-channel coding with a fixed delay requirement: the source s₁, s₂, ... (s_i ∼ Q_s) enters the joint source-channel encoder, which produces channel inputs x₁, x₂, ...; the noisy channel W_{y|x} produces outputs y₁, y₂, ..., from which the joint source-channel decoder produces ŝ₁, ŝ₂, ...]

The error probability is $\Pr(s_1^n \ne \hat s_1^n) = \Pr(s_1^n \ne D_n(E_n(s_1^n)))$. The block source-channel exponent $E_{b,sc}$ is achievable if there exists a family of $\{(E_n, D_n)\}$ s.t.⁴

$$\liminf_{n\to\infty} -\frac1n \log_2 \Pr(s_1^n \ne \hat s_1^n) = E_{b,sc} \qquad (1)$$

The relevant results of [7] are summarized in the following theorem.

Theorem 1: $E_{b,sc}^{(1)} \le E_{b,sc} \le E_{b,sc}^{(2)}$, where

$$E_{b,sc}^{(1)} = \min_{H(Q_s) \le R \le \log|\mathcal S|} \{e(R) + E_r(R)\}$$
$$E_{b,sc}^{(2)} = \min_{H(Q_s) \le R \le \log|\mathcal S|} \{e(R) + E(R)\}$$

Here $e(R) = \min_{U: H(U) \ge R} D(U\|Q_s)$ is the block source coding error exponent for source $Q_s$ [2], $E(R)$ is the block channel coding error exponent for channel $W_{y|x}$, and

$$E_r(R) = \max_P \min_V \big[D(V\|W|P) + |I(P,V) - R|^+\big]$$

is the random coding error exponent for channel $W_{y|x}$. As shown in [1],

$$E(R) \le E_{sp}(R) = \max_P \min_{V: I(V,P)\le R} D(V\|W|P),$$

the sphere-packing bound, and $E(R) = E_r(R) = E_{sp}(R)$ for $R_{cr} \le R \le C$. As shown in [7], if the minimum of $e(R) + E_r(R)$ is attained at some $R \ge R_{cr}$, then $E_{b,sc} = e(R) + E_r(R)$. This error exponent is in general better than the obvious separate source-channel coding error exponent $\max_R\{\min\{e(R), E_r(R)\}\}$ [7].

B. Source coding and channel coding with delay constraint

We now review some related results from [5] on sequential lossless source coding and from [3] on sequential channel coding.

For lossless source coding, [5] derives a tight bound on the error exponent with a hard delay constraint: $\forall i$, $\liminf_{\Delta\to\infty} -\frac1\Delta \log_2 \Pr(s_i \ne \hat s_i(i+\Delta)) = E_{s,s}(R)$, where $E_{s,s}(R)$ is the sequential source coding error exponent:

$$E_{s,s}(R) = \inf_{\alpha>0} \frac1\alpha\, e((1+\alpha)R) = \inf_{\alpha>0,\, U_s:\, H(U_s)\ge(1+\alpha)R} \frac1\alpha D(U_s\|Q_s) \qquad (2)$$

where $e(R) = \inf_{U_s: H(U_s)\ge R} D(U_s\|Q_s)$ is the block source coding error exponent [2].

For channel coding without feedback but facing a hard delay constraint, [9] shows how the random block-coding error exponent $E_r(R)$ governs how the probability of bit error decays with delay for random tree codes. Formally, for all $i$ and $\Delta$, there exists a sequential channel coding system s.t.

$$\liminf_{\Delta\to\infty} -\frac1\Delta \log_2 \Pr\Big(b_i \ne \hat b_i\big(\tfrac iR + \Delta\big)\Big) = E_r(R) \qquad (3)$$

where $\hat b_i(\frac iR + \Delta)$ is the estimate of $b_i$ after $\frac iR + \Delta$ channel uses in total, i.e. $\Delta$ seconds after $b_i$ enters the encoder.

In [3], we generalize Pinsker's result from [4] to show that the Haroutunian bound [10] $E^+(R)$ serves as an upper bound on the error exponent for channel coding with delay. Formally, for any channel coding scheme over the channel $W_{y|x}$, there exists a finite function $i(\Delta)$ of $\Delta$ s.t. the error exponent with delay satisfies

$$\liminf_{\Delta\to\infty} -\frac1\Delta \log_2 \Pr\Big(b_{i(\Delta)} \ne \hat b_{i(\Delta)}\big(\tfrac{i(\Delta)}R + \Delta\big)\Big) \le E^+(R) \qquad (4)$$

where $E^+(R)$ is the Haroutunian bound:

$$E^+(R) = \inf_{V_{y|x}:\, I(P_V, V_{y|x}) \le R}\; \sup_P D(V_{y|x}\|W_{y|x}|P)$$

with $P_V$ denoting the input distribution that maximizes $I(x;y)$ for the channel law $V_{y|x}$.

³ In this paper, s denotes a random variable and s its realization.
⁴ We use bits and log₂ in this paper.
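To make these exponents concrete, the following sketch (ours, not from the paper) evaluates e(R) and E_{s,s}(R) numerically for a Bernoulli(q) source, using the fact that for a binary source the minimizer of D(U‖Q_s) subject to H(U) ≥ R is the Bernoulli(p) with h₂(p) = R lying between q and 1/2:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p*math.log2(p) - (1-p)*math.log2(1-p)

def dkl(a, b):
    """D((a,1-a) || (b,1-b)) in bits."""
    s = 0.0
    if a > 0.0: s += a * math.log2(a / b)
    if a < 1.0: s += (1 - a) * math.log2((1 - a) / (1 - b))
    return s

def e_block(R, q):
    """e(R) = min_{U: H(U) >= R} D(U||Q) for Q = Bernoulli(q), q < 1/2.
    The minimizer is the Bernoulli(p) with h2(p) = R and q <= p <= 1/2."""
    if R <= h2(q): return 0.0
    if R > 1.0:    return math.inf      # infeasible for a binary alphabet
    lo, hi = q, 0.5                     # h2 is increasing on [q, 1/2]
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if h2(mid) < R: lo = mid
        else:           hi = mid
    return dkl(0.5 * (lo + hi), q)

def e_seq(R, q):
    """E_ss(R) = inf_{alpha>0} e((1+alpha)R)/alpha, by grid search over alpha."""
    return min(e_block((1 + k/2000) * R, q) / (k/2000) for k in range(1, 8000))

q, R = 0.25, 0.9                        # Bernoulli(0.25) source, H(q) ~ 0.811 bits
print(e_block(R, q), e_seq(R, q))
```

Since e(R) is convex and increasing above H(Q_s), E_{s,s}(R) ≥ e(R): the sequential exponent dominates the block one, which is what drives the comparison at the end of Section II.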

II. UPPER BOUND ON ERROR EXPONENTS WITH DELAY

Following the feed-forward decoding argument of [3], we convert any delay-∆ sequential joint source-channel code into a block code and study its behavior when the source behaves like an alternative distribution $U_s$ and the channel behaves like an alternative law $V_{y|x}$. The resulting bound is the following.

Theorem 2: For the iid source $\sim Q_s$ and a DMC $W_{y|x}$ without feedback, the error exponent with delay is upper bounded by $E^{(2)}_{s,sc} = \inf_R\{E^+(R) + E_{s,s}(R)\}$.

Errors are unavoidable when the source entropy exceeds the information carried through the channel, i.e. when $nH(U_s) > I(x_1^{n+\Delta}; y_1^{n+\Delta})$, where $x_1^{n+\Delta}$ is the input random vector to the channel. We first examine $I(x_1^{n+\Delta}; y_1^{n+\Delta})$ under the discrete memoryless channel law¹⁰ $V_{y|x}$. By Lemma 8.9.2 in [11]:

$$I(x_1^{n+\Delta}; y_1^{n+\Delta}) \le \sum_{i=1}^{n+\Delta} I(x_i; y_i)$$

¹⁰ We write the transition probability $V_{y|x}(x,y) = v_{y|x}$.

Now assume $P_V$ is the input distribution that maximizes $I(x;y)$ given the channel law $V_{y|x}$. Then:

$$I(x_1^{n+\Delta}; y_1^{n+\Delta}) \le (n+\Delta)\, I(P_V, V_{y|x})$$

If $nH(U_s) > (n+\Delta) I(P_V, V_{y|x}) \ge I(x_1^{n+\Delta}; y_1^{n+\Delta})$, then the block coding scheme constructed in the previous section will make a block error with probability 1. Moreover, many individual symbols will also be in error often:

Lemma 4: If the source comes from $U_s$ from time 1 to $n$ and the channel law is $V_{y|x}$ from time 1 to time $n+\Delta$, such that $H(U_s) > (1+\frac\Delta n)\, I(P_V, V_{y|x})$, then there exists a $\delta > 0$ so that for $n$ large enough (with the ratio $\frac\Delta n$ fixed), the feed-forward decoder will make at least

$$\frac12\, \frac{H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})}{2\log_2|\mathcal S| - \big(H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})\big)}\, n$$

symbol errors with probability $\delta$ or above, where $\delta$ satisfies¹¹

$$h_\delta + \delta \log_2(|\mathcal S| - 1) = \frac12\Big(H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})\Big).$$

¹¹ Write $h_\delta = -\delta\log_2\delta - (1-\delta)\log_2(1-\delta)$.
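The threshold δ in Lemma 4 can be computed numerically from this Fano-type relation. The sketch below (ours, with illustrative parameters that are not from the paper) solves for it by bisection, using the fact that h_δ + δ log₂(|S|−1) is increasing in δ on (0, (|S|−1)/|S|):

```python
import math

def h2(d):
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log2(d) - (1 - d) * math.log2(1 - d)

def solve_delta(H_Us, I_V, n, Delta, S):
    """Solve h(delta) + delta*log2(S-1) = (H(Us) - ((n+Delta)/n) I)/2 by bisection.
    The left side is increasing on (0, (S-1)/S), so bisection applies."""
    target = 0.5 * (H_Us - (n + Delta) / n * I_V)
    assert target > 0, "Lemma 4 requires H(Us) > (1 + Delta/n) I(P_V, V)"
    lo, hi = 0.0, (S - 1) / S
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h2(mid) + mid * math.log2(S - 1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative numbers only: quaternary source, H(Us) = 1.9 bits,
# I(P_V, V) = 0.9 bits per channel use, Delta/n = 1.
print(solve_delta(H_Us=1.9, I_V=0.9, n=1000, Delta=1000, S=4))
```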

Proof: Lemma 3 implies:

$$\sum_{i=1}^{n} H(\tilde s_i) \ge H(\tilde s_1^n) \ge nH(U_s) - (n+\Delta)\, I(P_V, V_{y|x}) \qquad (9)$$

The average entropy per source symbol of $\tilde s$ is thus at least $H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})$. Now suppose that $H(\tilde s_i) \ge \frac12\big(H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})\big)$ for $t$ positions. Noticing that $H(\tilde s_i) \le \log_2|\mathcal S|$, we have

$$\sum_{i=1}^{n} H(\tilde s_i) \le t \log_2|\mathcal S| + (n-t)\, \frac12\Big(H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})\Big)$$

With Eqn. (9), we derive the desired result:

$$t \ge \frac{H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})}{2\log_2|\mathcal S| - \big(H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})\big)}\, n \qquad (10)$$

where $2\log_2|\mathcal S| - \big(H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})\big) \ge 2\log_2|\mathcal S| - H(U_s) \ge 2\log_2|\mathcal S| - \log_2|\mathcal S| > 0$. Now for the $t$ positions $1 \le j_1 < j_2 < \dots < j_t \le n$, the individual entropies satisfy $H(\tilde s_j) \ge \frac12\big(H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})\big)$. By the property of the binary entropy function,¹² $\Pr(\tilde s_j \ne s_0) = \Pr(s_j \ne \hat s_j) \ge \delta$. ∎

We can pick $j^* = j_t \ge t$; by the previous lemma we know that

$$j^* \ge \frac12\, \frac{H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})}{2\log_2|\mathcal S| - \big(H(U_s) - \frac{n+\Delta}{n} I(P_V, V_{y|x})\big)}\, n,$$

so if we fix $\frac\Delta n$ and let $n$ go to infinity, then $j^*$ goes to infinity as well.

At this point, Lemmas 1 and 4 together imply that even if the channel only behaves as if it came from the channel law $V_{y|x}$ from time $j^*$ to $j^*+\Delta$, and the source behaves as if it came from a distribution $U_s$ from time 1 to $j^*-1$ s.t. $H(U_s) > (1+\frac\Delta n)\, I(P_V, V_{y|x})$, then whatever the source distribution from time $n+1$ to time $n+\Delta$ is, the same minimum error probability $\delta$ still holds. So we assume the source from time $n+1$ to time $n+\Delta$ is from distribution $Q_s$.

Now define the "bad source-channel-sequence" set $E_{j^*}$ as the set of source and channel output sequence pairs for which the type I delay-$\Delta$ joint source-channel decoder makes a decoding error at $j^*$. Formally,¹³

$$E_{j^*} = \{(\vec s, \bar y) \,|\, s_{j^*} \ne D^{\Delta}_{j^*}(\bar s, \bar y)\}$$

By Lemma 4, $\Pr(E_{j^*}) \ge \delta$. Notice that $E_{j^*}$ does not depend on the distribution of the source or on the channel behavior, but only on the encoder-decoder pair.

Define $J = \min\{n, j^*+\Delta\}$ and $\bar{\bar s} = s_1^J$. Now write the typical set for distribution $U_s$:

$$A^{\epsilon_1}_{j^*}(U_s) = \Big\{\vec s \,\Big|\, \forall s\in\mathcal S: \frac{n_s(\bar{\bar s})}{J} \in (U_s(s)-\epsilon_1,\, U_s(s)+\epsilon_1)\Big\}$$

and, for each $\vec s$, the strongly typical set for channel $V_{y|x}$:

$$A^{\epsilon_1,\epsilon_2}_{j^*}(V_{y|x}|\vec s) = \Big\{\bar y \,\Big|\, \forall x\in\mathcal X: \text{either } \frac{n_x(\bar x(\vec s))}{\Delta} < \epsilon_2 \text{ or } \forall y\in\mathcal Y: \frac{n_{x,y}(\bar x(\vec s),\bar y)}{n_x(\bar x(\vec s))} \in (v_{y|x}-\epsilon_1,\, v_{y|x}+\epsilon_1)\Big\}$$

Finally, we have the joint strongly typical set for source $U_s$ and channel $V_{y|x}$:

$$A^{\epsilon_1,\epsilon_2}_{j^*}(U_s, V_{y|x}) = \{(\vec s, \bar y) \,|\, \vec s \in A^{\epsilon_1}_{j^*}(U_s) \text{ and } \bar y \in A^{\epsilon_1,\epsilon_2}_{j^*}(V_{y|x}|\vec s)\}$$

Lemma 5: $\Pr_{U_s,V_{y|x}}(E_{j^*} \cap A^{\epsilon_1,\epsilon_2}_{j^*}(U_s, V_{y|x})) \ge \frac\delta2$ for large $n$ and $\Delta$.

Proof: First, define the atypical sets $A^{\epsilon_1}_{j^*}(U_s^C) = \{(\vec s,\bar y) \,|\, \exists s\in\mathcal S: \frac{n_s(\bar{\bar s})}{J} \notin (U_s(s)-\epsilon_1, U_s(s)+\epsilon_1)\}$ and $A^{\epsilon_1,\epsilon_2}_{j^*}(\vec s, V_{y|x}^C) = \{(\vec s,\bar y) \,|\, \exists x\in\mathcal X: \frac{n_x(\bar x(\vec s))}{\Delta} \ge \epsilon_2 \text{ and } \exists y\in\mathcal Y: \frac{n_{x,y}(\bar x(\vec s),\bar y)}{n_x(\bar x(\vec s))} \notin (v_{y|x}-\epsilon_1, v_{y|x}+\epsilon_1)\}$. The total set $\{(\vec s,\bar y)\}$ can then be partitioned as

$$\{(\vec s, \bar y)\} = A^{\epsilon_1,\epsilon_2}_{j^*}(U_s, V_{y|x}) \cup \Big(A^{\epsilon_1}_{j^*}(U_s^C) \cup \big[\cup_{\vec s}\, A^{\epsilon_1,\epsilon_2}_{j^*}(\vec s, V_{y|x}^C)\big]\Big) \qquad (11)$$

Fix $\frac\Delta n$ and let $n$ go to infinity; then $J = \min\{n, j^*+\Delta\}$ and $\Delta$ go to infinity as well. By strong typicality (Lemma 13.6.1 in [11]), $\forall \epsilon_1 > 0$, if $J$ is large enough then $\Pr_{U_s,V_{y|x}}(A^{\epsilon_1}_{j^*}(U_s^C)) \le \frac\delta4$. By the same lemma, $\forall \vec s: V_{y|x}(A^{\epsilon_1,\epsilon_2}_{j^*}(\vec s, V_{y|x}^C) \,|\, \bar x(\vec s)) \le \frac\delta4$. Because the channel behavior is independent of the source, we have:

$$\Pr_{U_s,V_{y|x}}(A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x})) \stackrel{(a)}{=} 1 - \Pr_{U_s,V_{y|x}}(A^{\epsilon_1}_{j^*}(U_s^C)) - \Pr_{U_s,V_{y|x}}(\cup_{\vec s}\, A^{\epsilon_1,\epsilon_2}_{j^*}(\vec s,V_{y|x}^C))$$
$$\stackrel{(b)}{\ge} 1 - \frac\delta4 - \sum_{\vec s} U_s(\vec s)\, V_{y|x}(A^{\epsilon_1,\epsilon_2}_{j^*}(\vec s,V_{y|x}^C) \,|\, \bar x(\vec s)) \ge 1 - \frac\delta4 - \frac\delta4\sum_{\vec s} U_s(\vec s) = 1 - \frac\delta2 \qquad (12)$$

(a) is true because of (11); (b) is true because the source and the channel behavior are independent. From this we know:

$$\Pr_{U_s,V_{y|x}}(E_{j^*} \cap A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x})) \stackrel{(a)}{\ge} \Pr_{U_s,V_{y|x}}(E_{j^*}) - \big(1 - \Pr_{U_s,V_{y|x}}(A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x}))\big) \ge \delta - \frac\delta2 = \frac\delta2$$

(a) is true because $\Pr(A_1 \cap A_2) \ge \Pr(A_1) - \Pr(A_2^C) = \Pr(A_1) - (1 - \Pr(A_2))$. ∎

For a source-channel output pair $(\vec s, \bar y) \in A^{\epsilon_1,\epsilon_2}_{j^*}(U_s, V_{y|x})$, we can bound the ratio between the probability of $(\vec s, \bar y)$ under the true source $Q_s$ and true channel rule $W_{y|x}$ and its probability under source $U_s$ and channel rule $V_{y|x}$ as follows.

Lemma 6: $\forall\epsilon>0$, we can pick $\epsilon_1, \epsilon_2$ small enough s.t. for large enough $n$ (with $\frac\Delta n$ fixed): $\forall(\vec s,\bar y) \in A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x})$,

$$\frac{\Pr_{Q_s,W_{y|x}}((\vec s,\bar y))}{\Pr_{U_s,V_{y|x}}((\vec s,\bar y))} \ge 2^{-\Delta\sup_P D(V_{y|x}\|W_{y|x}|P) - j^* D(U_s\|Q_s) - (j^*+\Delta)\epsilon}$$

Proof: The source and the channel behavior are independent, so:

$$\frac{\Pr_{Q_s,W_{y|x}}((\vec s,\bar y))}{\Pr_{U_s,V_{y|x}}((\vec s,\bar y))} = \frac{Q_s(\vec s)\, W_{y|x}(\bar y|\bar x(\vec s))}{U_s(\vec s)\, V_{y|x}(\bar y|\bar x(\vec s))}$$

Since $\vec s \in A^{\epsilon_1}_{j^*}(U_s)$, from Lemma 6 in [12] we know that for $n$ large enough:

$$\frac{Q_s(\vec s)}{U_s(\vec s)} \ge 2^{-j^* D(U_s\|Q_s) - j^*\epsilon}$$

Meanwhile, $\bar y \in A^{\epsilon_1,\epsilon_2}_{j^*}(\vec s, V_{y|x})$, so from Lemma 4.4 in [3] we know that for $n$ large enough:

$$\frac{W_{y|x}(\bar y|\bar x(\vec s))}{V_{y|x}(\bar y|\bar x(\vec s))} \ge 2^{-\Delta\sup_P D(V_{y|x}\|W_{y|x}|P) - \Delta\epsilon}$$

Combining the above two inequalities, we get the desired result. ∎

We are now ready to bound the error probability of the $j^*$'th source symbol under the true source distribution $Q_s$ and channel rule $W_{y|x}$.

Lemma 7: $\forall\epsilon>0$ and large enough $n$ (with $\frac\Delta n$ fixed):

$$\Pr_{Q_s,W_{y|x}}(E_{j^*}) \ge \frac\delta2\, 2^{-\Delta\sup_P D(V_{y|x}\|W_{y|x}|P) - j^* D(U_s\|Q_s) - (j^*+\Delta)\epsilon}$$

Proof: Combining Lemmas 5 and 6:

$$\Pr_{Q_s,W_{y|x}}(E_{j^*}) \ge \Pr_{Q_s,W_{y|x}}(E_{j^*} \cap A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x})) = \Pr_{U_s,V_{y|x}}(E_{j^*} \cap A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x}))\; \frac{\Pr_{Q_s,W_{y|x}}(E_{j^*} \cap A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x}))}{\Pr_{U_s,V_{y|x}}(E_{j^*} \cap A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x}))}$$
$$\stackrel{(a)}{\ge} \frac\delta2\; \frac{\Pr_{Q_s,W_{y|x}}(E_{j^*} \cap A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x}))}{\Pr_{U_s,V_{y|x}}(E_{j^*} \cap A^{\epsilon_1,\epsilon_2}_{j^*}(U_s,V_{y|x}))} \stackrel{(b)}{\ge} \frac\delta2\, 2^{-\Delta\sup_P D(V_{y|x}\|W_{y|x}|P) - j^* D(U_s\|Q_s) - (j^*+\Delta)\epsilon}$$

(a) is true because of Lemma 5 and (b) because of Lemma 6. ∎

Now we are ready to prove Theorem 2. For $U_s$ and $V_{y|x}$, as long as $H(U_s) > \frac{n+\Delta}{n} I(P_V, V_{y|x})$, we have a constant $\delta > 0$. Letting $\epsilon$ go to 0, and $\Delta$ and $n$ go to infinity proportionally, we have:

$$-\frac1\Delta \log_2 \Pr_{Q_s,W_{y|x}}(\hat s_{j^*}(j^*+\Delta) \ne s_{j^*}) = -\frac1\Delta \log_2 \Pr_{Q_s,W_{y|x}}(E_{j^*})$$
$$\le \epsilon + \sup_P D(V_{y|x}\|W_{y|x}|P) + \frac{j^*}{\Delta} D(U_s\|Q_s) \le \epsilon + \sup_P D(V_{y|x}\|W_{y|x}|P) + \frac n\Delta D(U_s\|Q_s)$$

where $\epsilon$ can be arbitrarily small. Notice that $H(U_s) > \frac{n+\Delta}{n} I(P_V, V_{y|x})$ is equivalent to: $\exists R$ s.t. $\frac1{1+\alpha} H(U_s) > R > I(P_V, V_{y|x})$, where $\alpha = \frac\Delta n$. The upper bound on the error exponent is then the minimum of the above over all $\alpha > 0$, i.e.:

$$E^{(2)}_{s,sc} = \inf_R\; \inf_{\alpha>0,\, U_s,\, V_{y|x}:\, I(P_V,V_{y|x})<R,\, H(U_s)>(1+\alpha)R} \Big\{\sup_P D(V_{y|x}\|W_{y|x}|P) + \frac1\alpha D(U_s\|Q_s)\Big\}$$
$$= \inf_R \Big\{\inf_{V_{y|x}:\, I(P_V,V_{y|x})<R}\, \sup_P D(V_{y|x}\|W_{y|x}|P) + \inf_{\alpha>0,\, U_s:\, H(U_s)>(1+\alpha)R} \frac1\alpha D(U_s\|Q_s)\Big\}$$
$$= \inf_R\, \{E^+(R) + E_{s,s}(R)\} \qquad (13)$$

∎

¹² $s_0$ is the zero element in the finite group $\mathbb Z_{|\mathcal S|}$.
¹³ To simplify the notation, we write $\vec s = s_1^{j^*+\Delta}$, $\bar y = y_{j^*}^{j^*+\Delta}$, $\bar s = s_1^{j^*-1}$, and $\bar{\bar s} = s_1^J$.
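As a concrete evaluation of this bound (ours; the paper plots it only for the BEC in Section V), note that for a binary erasure channel the Haroutunian bound takes the closed form E⁺_BEC(R) = D(1−R‖δ) used in Section IV-B (and 0 above capacity), so E^{(2)}_{s,sc} reduces to a one-dimensional search:

```python
import math

def h2(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else -p*math.log2(p) - (1-p)*math.log2(1-p)

def dkl(a, b):
    s = 0.0
    if a > 0.0: s += a * math.log2(a / b)
    if a < 1.0: s += (1 - a) * math.log2((1 - a) / (1 - b))
    return s

def e_block(R, q):
    if R <= h2(q): return 0.0
    if R > 1.0:    return math.inf
    lo, hi = q, 0.5
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h2(mid) < R else (lo, mid)
    return dkl(0.5 * (lo + hi), q)

def e_seq(R, q):
    return min(e_block((1 + k/1000) * R, q) / (k/1000) for k in range(1, 4000))

def e_plus_bec(R, delta):
    """Haroutunian bound for the BEC: D(1-R || delta) below capacity, 0 above."""
    return 0.0 if R >= 1.0 - delta else dkl(1.0 - R, delta)

def e2_ssc(delta, q=0.25, steps=60):
    """E2_ssc = inf_R { E+(R) + E_ss(R) }; only R in (H(q), 1-delta) matters."""
    lo, hi = h2(q) + 1e-6, 1.0 - delta - 1e-6
    if lo >= hi: return 0.0              # source entropy exceeds capacity
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(e_plus_bec(R, delta) + e_seq(R, q) for R in grid)

print(e2_ssc(0.05), e2_ssc(0.10), e2_ssc(0.15))
```

With q = 0.25 this reproduces the kind of upper-bound curve shown in Figure 6(a); the exponent hits 0 once H(Q_s) exceeds the BEC capacity 1 − δ.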

Both the upper bound on the sequential error exponent and the upper bound on the block source-channel coding error exponent have the form of a minimization, over an imaginary rate R, of the sum of a source coding and a channel coding error exponent. However, the sequential source coding error exponent E_{s,s}(R) is in general strictly larger than the block source coding error exponent e(R). Thus the upper bound on the sequential joint source-channel coding error exponent is generally strictly larger than its block-coding counterpart.

III. LOWER BOUND ON ACHIEVABLE ERROR EXPONENTS WITH DELAY

[Fig. 4. Separate source-channel coding: the source s₁, s₂, ... (s_i ∼ Q_s) is compressed by an optimal sequential source encoder into bits b₁, b₂, ... at rate R; a randomized sequential channel encoder maps these to channel inputs x₁, x₂, ...; after the noisy channel W_{y|x}, an ML sequential channel decoder produces b̂₁, b̂₂, ..., from which an optimal sequential source decoder outputs ŝ₁, ŝ₂, .... Both interfaces are fixed-rate, at rate R.]

An obvious lower bound on the error exponent can be obtained by implementing an optimal sequential source code [5] and a randomized sequential channel code [13] separately. Both the source coding and the channel coding are committed to the same rate R, R ∈ (H(Q_s), C_W), where C_W is the capacity of the channel. This separate source-channel coding system works as shown in Figure 4.

Theorem 3: For the iid source $\sim Q_s$ and a discrete memoryless channel with transition probability $W_{y|x}$, the delay-reliability defined in Definition 2 is lower bounded by $E^{(1)}_{s,sc}$, where

$$E^{(1)}_{s,sc} = \max_R\, \{\min\{E_r(R),\, E_{s,s}(R)\}\}.$$

That is, there exists a separate source-channel coding system committed to rate $R^*$, where $R^*$ maximizes $\min\{E_r(R), E_{s,s}(R)\}$, s.t. for all $\epsilon > 0$ there exists $K < \infty$ s.t. for all $i$, $\Delta$:

$$\Pr(s_i \ne \hat s_i(i+\Delta)) \le K\, 2^{-\Delta(E^{(1)}_{s,sc} - \epsilon)}$$

Proof: For this coding scheme, a decoding error on $s_i$ at time $i+\Delta$ occurs only if the first channel error occurs before the last information bit describing $s_i$, or if the last information bit about $s_i$ has not yet been fed into the channel encoder at time $i+\Delta$. The union bound over these error events is as follows. For all $\epsilon > 0$, there exists $K < \infty$ s.t.

$$\Pr(s_i \ne \hat s_i(i+\Delta)) \stackrel{(a)}{\le} \sum_{k=1}^{iR} \Pr_W(b_1^k \ne \hat b_1^k(i+\Delta)) + \sum_{k=iR}^{(i+\Delta)R} \Pr_W(b_1^k \ne \hat b_1^k(i+\Delta))\, \Pr_s\big(s_i \ne \hat s_i(\tfrac kR)\big) + \Pr_s(s_i \ne \hat s_i(i+\Delta))$$
$$\stackrel{(b)}{\le} \sum_{k=1}^{iR} K_1 2^{-(i+\Delta-\frac kR)E_r(R)} + \sum_{k=iR}^{(i+\Delta)R} K_1 2^{-(i+\Delta-\frac kR)E_r(R)}\, 2^{-(\frac kR - i)(E_{s,s}(R)-\epsilon_1)} + K_1 2^{-\Delta E_{s,s}(R)}$$
$$\le K_2\, \Delta\, 2^{-\Delta(\min\{E_{s,s}(R), E_r(R)\} - \epsilon_1)} \le K\, 2^{-\Delta(\min\{E_{s,s}(R), E_r(R)\} - \epsilon)} \qquad (14)$$

where $K_1$, $K_2$ and $\epsilon_1$ are properly chosen finite real numbers. (a) is the union bound over all possible error events, using the fact that the channel coding and source coding errors are independent, as shown in Figure 4. (b) is by (2) and (3). The above coding scheme works for all $R \in (H(Q_s), C_W)$; the optimal rate $R^*$ is chosen so that the source coder and the channel coder achieve the error exponent $E^{(1)}_{s,sc}$. ∎

Because $E_r(R) \le E^+(R)$, the lower bound $E^{(1)}_{s,sc}$ is in general strictly smaller than the upper bound $E^{(2)}_{s,sc}$ in Eqn. (13). This is comparable to the situation for the obvious, non-optimal achievable error exponent for block source-channel coding.
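For the BEC with uniform inputs, E_r(R) has the standard Gallager form E_r(R) = max_{ρ∈[0,1]}[E₀(ρ) − ρR] with E₀(ρ) = −log₂(δ + (1−δ)2^{−ρ}), so Theorem 3's bound can be evaluated directly. A sketch (ours, for the Bernoulli(0.25) source of Section V):

```python
import math

def h2(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else -p*math.log2(p) - (1-p)*math.log2(1-p)

def dkl(a, b):
    s = 0.0
    if a > 0.0: s += a * math.log2(a / b)
    if a < 1.0: s += (1 - a) * math.log2((1 - a) / (1 - b))
    return s

def e_block(R, q):
    if R <= h2(q): return 0.0
    if R > 1.0:    return math.inf
    lo, hi = q, 0.5
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h2(mid) < R else (lo, mid)
    return dkl(0.5 * (lo + hi), q)

def e_seq(R, q):
    return min(e_block((1 + k/1000) * R, q) / (k/1000) for k in range(1, 4000))

def e_r_bec(R, delta):
    """Random coding exponent for the BEC with uniform inputs:
    E_r(R) = max_{rho in [0,1]} [ -log2(delta + (1-delta) 2^-rho) - rho R ]."""
    return max(-math.log2(delta + (1 - delta) * 2**(-rho)) - rho * R
               for rho in [k / 200 for k in range(201)])

def e1_ssc(delta, q=0.25, steps=60):
    """E1_ssc = max_R min{ E_r(R), E_ss(R) } over R in (H(q), capacity)."""
    lo, hi = h2(q) + 1e-6, 1.0 - delta - 1e-6
    if lo >= hi: return 0.0
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(min(e_r_bec(R, delta), e_seq(R, q)) for R in grid)

print(e1_ssc(0.05), e1_ssc(0.10))
```

The optimum sits where the decreasing E_r(R) crosses the increasing E_{s,s}(R); this is the dashed lower-bound curve of Figure 6(a).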

IV. JOINT SOURCE-CHANNEL CODING WITH FEEDBACK

In this section, we study the joint source-channel coding with feedback problem under a delay constraint. As shown in Figure 5, the output of the channel is fed back noiselessly to the joint source-channel encoder. Recall that E_{s,scf} is the error exponent with feedback and decoding delay.

[Fig. 5. Joint source-channel coding with feedback: the encoder observes s₁, s₂, ... (s_i ∼ Q_s) together with the fed-back channel outputs and produces x₁, x₂, ...; the noisy channel W_{y|x} produces y₁, y₂, ..., from which the joint source-channel decoder produces ŝ₁, ŝ₂, ...]

A. Upper bound on E_{s,scf}

Theorem 4: For the iid source $\sim Q_s$ and a discrete memoryless channel with transition probability $W_{y|x}$ and noiseless feedback,

$$E_{s,scf} \le \min_{\lambda\in[0,1],\, R} \Big\{\frac{\lambda}{1-\lambda}\, e(R) + \frac{1}{1-\lambda}\, E^+(\lambda R)\Big\}$$

Proof: Suppose a sequential joint source-channel coding with feedback error exponent $E_{s,scf}$ can be achieved, i.e. for all $\epsilon > 0$ there exists a finite $K_1$ s.t. for all $i$, $\Delta$:

$$\Pr(s_i \ne \hat s_i(i+\Delta)) \le K_1\, 2^{-\Delta(E_{s,scf}-\epsilon)} \qquad (15)$$

Thus we can give an upper bound on the block error at decoding time $n+\Delta$:

$$\Pr(s_1^n \ne \hat s_1^n(n+\Delta)) \le \sum_{i=1}^n \Pr(s_i \ne \hat s_i(n+\Delta)) \le \sum_{i=1}^n K_1\, 2^{-(n+\Delta-i)(E_{s,scf}-\epsilon)} = K\, 2^{-\Delta(E_{s,scf}-\epsilon)} \qquad (16)$$

where $K$ is finite.

On the other hand, we also have a lower bound on the block error. For a message set of size $2^{mR}$ and $m$ channel uses, the channel coding error probability for a memoryless channel with causal feedback is lower bounded by $K' 2^{-m(E^+(R)-\epsilon_m)}$, where $\epsilon_m \to 0$. This gives the following lower bound on the block error probability for joint source-channel coding with feedback, where the length of the source block is $n$ and $n+\Delta$ channel uses are allowed:

$$\Pr(s_1^n \ne \hat s_1^n) \stackrel{(a)}{\ge} \sum_{T_P^n \in \mathcal T^n} \Pr_{Q_s}(T_P^n)\, 2^{-(n+\Delta)\big(E^+\big(\frac{nH(P)}{n+\Delta}\big)+\epsilon_n^{(1)}\big)}$$
$$\stackrel{(b)}{\ge} \sum_{T_P^n \in \mathcal T^n} 2^{-n(D(P\|Q_s)+\epsilon_n^{(2)})}\, 2^{-(n+\Delta)\big(E^+\big(\frac{nH(P)}{n+\Delta}\big)+\epsilon_n^{(1)}\big)} \ge 2^{-\min_P\big\{nD(P\|Q_s)+(n+\Delta)E^+\big(\frac{nH(P)}{n+\Delta}\big)\big\} - n\epsilon_n} \qquad (17)$$

where $\mathcal T^n$ is the set of types on $\mathcal S^n$, $E^+(R) = 0$ for $R > C_W$, and $\epsilon_n^{(1)}$, $\epsilon_n^{(2)}$ and $\epsilon_n$ all converge to 0 as $n$ goes to infinity. (a) is true because a type class $T_P^n$ contains (to first order) $2^{nH(P)}$ equally likely sequences while only $n+\Delta$ channel uses with feedback are available, so the Haroutunian bound gives the lower bound on the error probability. (b) is by Theorem 12.1.4 in [11].

Combining (16) and (17), we have:

$$E_{s,scf} \le \epsilon + \frac{\log_2 K}{\Delta} + \frac{n\epsilon_n}{\Delta} + \min_P\Big\{\frac n\Delta\, D(P\|Q_s) + \frac{n+\Delta}{\Delta}\, E^+\Big(\frac{nH(P)}{n+\Delta}\Big)\Big\}$$

This is true for all $\Delta$ and $n$. Writing $\lambda = \frac{n}{n+\Delta}$ and letting $n$ go to infinity:

$$E_{s,scf} \le \min_{\lambda\in[0,1],\, P} \Big\{\frac{\lambda}{1-\lambda}\, D(P\|Q_s) + \frac{1}{1-\lambda}\, E^+(\lambda H(P))\Big\}$$

For any $\lambda$, the right-hand side is minimized by some $P$ with $H(P) \ge H(Q_s)$, because for any $P$ with $H(P) < H(Q_s)$ we have

$$\frac{\lambda}{1-\lambda}\, D(P\|Q_s) + \frac{1}{1-\lambda}\, E^+(\lambda H(P)) \stackrel{(a)}{\ge} \frac{1}{1-\lambda}\, E^+(\lambda H(Q_s)) = \frac{\lambda}{1-\lambda}\, D(Q_s\|Q_s) + \frac{1}{1-\lambda}\, E^+(\lambda H(Q_s))$$

where (a) is true because $E^+(\cdot)$ is monotonically decreasing. Thus

$$E_{s,scf} \le \min_{\lambda\in[0,1],\, P:\, H(P)\ge H(Q_s)} \Big\{\frac{\lambda D(P\|Q_s) + E^+(\lambda H(P))}{1-\lambda}\Big\} = \min_{\lambda\in[0,1],\, R\ge H(Q_s)} \Big\{\min_{P:\, H(P)=R} \frac{\lambda D(P\|Q_s) + E^+(\lambda H(P))}{1-\lambda}\Big\}$$
$$= \min_{\lambda\in[0,1],\, R\ge H(Q_s)} \Big\{\frac{\lambda}{1-\lambda}\, e(R) + \frac{1}{1-\lambda}\, E^+(\lambda R)\Big\} = \min_{\lambda\in[0,1],\, R} \Big\{\frac{\lambda}{1-\lambda}\, e(R) + \frac{1}{1-\lambda}\, E^+(\lambda R)\Big\}$$

The last equality holds because $e(R) = 0$ for $R < H(Q_s)$ and $E^+(\cdot)$ is monotonically decreasing. ∎
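This bound is also easy to evaluate numerically. The sketch below (ours) does so for the BEC, where E⁺(R) = D(1−R‖δ); by Theorem 5 below, for the BEC this value is actually achieved:

```python
import math

def h2(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else -p*math.log2(p) - (1-p)*math.log2(1-p)

def dkl(a, b):
    s = 0.0
    if a > 0.0: s += a * math.log2(a / b)
    if a < 1.0: s += (1 - a) * math.log2((1 - a) / (1 - b))
    return s

def e_block(R, q):
    if R <= h2(q): return 0.0
    if R > 1.0:    return math.inf
    lo, hi = q, 0.5
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h2(mid) < R else (lo, mid)
    return dkl(0.5 * (lo + hi), q)

def e_plus_bec(R, delta):
    return 0.0 if R >= 1.0 - delta else dkl(1.0 - R, delta)

def feedback_bound(delta, q=0.25):
    """min over lambda in (0,1) and R in (0,1] of
    [lambda e(R) + E+_BEC(lambda R)] / (1 - lambda)."""
    best = math.inf
    for i in range(1, 200):              # lambda grid
        lam = i / 200
        for j in range(1, 201):          # R grid; R > 1 makes e(R) infinite
            R = j / 200
            val = (lam * e_block(R, q) + e_plus_bec(lam * R, delta)) / (1 - lam)
            best = min(best, val)
    return best

print(feedback_bound(0.05), feedback_bound(0.10))
```

The bound stays strictly positive while H(Q_s) < 1 − δ and collapses to 0 as the erasure rate approaches the point where the source entropy meets capacity, which is the shape of Figure 6(b).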

B. E_{s,scf} for binary erasure channels (BEC)

We do not have a general scheme for joint source-channel coding with feedback for arbitrary DMCs. But for binary erasure channels, we can apply an optimal universal source code [5] followed by an optimal "repeat until received" channel code [3] for the BEC. In [3], a "focusing" bound is derived for the BEC.

Our optimal joint source-channel coding with feedback scheme for the binary erasure channel is as follows. A block-length N is chosen that is much smaller than the target end-to-end delay,¹⁴ while still being large enough. For a discrete memoryless source and large block-length N, the best possible variable-length code is given in [2] and consists of two stages: first describing the type of the block ~s_i using O(|S| log₂ N) bits, and then describing which particular realization has occurred using a variable N H(~s_i) bits, where H(~s_i) is the empirical entropy of the block. The overhead O(|S| log₂ N) is asymptotically negligible and the code is universal in nature. It is easy to verify that lim_{N→∞} E_{Q_s}(l(~s))/N = H(Q_s). This code is obviously a prefix-free code. Writing l(~s_i) for the length of the codeword for ~s_i:

$$l(\vec s_i) \le |\mathcal S| \log_2(N+1) + N H(\vec s_i) \qquad (18)$$

The binary sequences describing the source are fed to the optimal "repeat until received" channel coding system described in [3].

¹⁴ We are interested in the performance with asymptotically large delays ∆.
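A minimal sketch of the two-stage code's length computation (ours; the exact header format and integer rounding are our choices, consistent with the bound (18)):

```python
import math
from collections import Counter

def encode_length(block, alphabet):
    """Bits used by the two-stage code: type header + index within the type class.
    Header: |S| counts, each in [0, N]. Index: one of N!/prod(n_a!) sequences."""
    N = len(block)
    counts = Counter(block)
    header = len(alphabet) * math.ceil(math.log2(N + 1))
    cls = math.factorial(N)                  # size of the type class
    for a in alphabet:
        cls //= math.factorial(counts.get(a, 0))
    index = math.ceil(math.log2(cls)) if cls > 1 else 0   # <= N * H(empirical type)
    return header + index

block = "aababbaa" * 16                      # N = 128 symbols, binary alphabet
print(encode_length(block, "ab"))
```

The index term is at most N times the empirical entropy of the block, so the total length respects (18) up to rounding.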

Theorem 5: For the iid source $\sim Q_s$ and a binary erasure channel with erasure rate $\delta$ and causal noiseless feedback, the code described above achieves

$$E_{s,scf} = \min_{\lambda\in[0,1],\, R} \Big\{\frac{\lambda e(R) + E^+_{BEC}(\lambda R)}{1-\lambda}\Big\} \triangleq E^*$$

where $E^+_{BEC}(R) = D(1-R\|\delta)$ is the Haroutunian bound for the BEC.¹⁵

Before the proof, we give two lemmas bounding the probabilities of atypical channel behavior and atypical source behavior respectively.

Lemma 8 (Channel atypicality): For a binary erasure channel with erasure rate $\delta$, the probability of more than $n$ erasures in $t$ channel uses is upper bounded by $(t-n)\, 2^{-tD(\frac nt\|\delta)}$ if $\frac nt > \delta$, and upper bounded by 1 if $\frac nt \le \delta$.

Proof: The proof is a direct application of Theorem 12.1.4 in [11]. Thus for all $\epsilon > 0$ there exists $K < \infty$ s.t. the above probability is upper bounded by

$$K\, 2^{-t\big(E^+_{BEC}(1-\frac nt)-\epsilon\big)} \qquad (19)$$

∎

¹⁵ We write $D(a\|b)$ for the Kullback-Leibler divergence between the binary distributions $(a, 1-a)$ and $(b, 1-b)$.
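A quick numerical check of Lemma 8 (ours): compare the exact binomial tail against the bound.

```python
import math

def dkl(a, b):
    s = 0.0
    if a > 0.0: s += a * math.log2(a / b)
    if a < 1.0: s += (1 - a) * math.log2((1 - a) / (1 - b))
    return s

def tail_exact(t, n, delta):
    """Pr(more than n erasures in t uses), erasure probability delta."""
    return sum(math.comb(t, k) * delta**k * (1 - delta)**(t - k)
               for k in range(n + 1, t + 1))

def tail_bound(t, n, delta):
    """Lemma 8: (t - n) 2^{-t D(n/t || delta)} when n/t > delta."""
    return (t - n) * 2 ** (-t * dkl(n / t, delta)) if n / t > delta else 1.0

t, n, delta = 200, 60, 0.1
print(tail_exact(t, n, delta), tail_bound(t, n, delta))
```

The exact tail sits below the bound, and both decay at the rate D(n/t‖δ) in the exponent.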

Lemma 9 (Source atypicality): For all $\epsilon > 0$ and $N$ large enough, there exists $K < \infty$ s.t. for all $\Delta$, $n$:

$$\Pr\Big(\sum_{i=1}^n l(\vec s_i) > nNr\Big) \le K\, 2^{-nN(e(r)-\epsilon)} \qquad (20)$$
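The Cramér machinery used in the proof below can be exercised on a toy case. The following sketch (ours) idealizes the codeword length as exactly |S|log₂(N+1) + N H(type); at small N the overhead term ε_N is clearly visible, and I(Nr)/N approaches e(r) only as N grows:

```python
import math
from itertools import product

q, N = 0.25, 8                        # Bernoulli(0.25) source; toy block length

def h2(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else -p*math.log2(p) - (1-p)*math.log2(1-p)

def length(blk):
    """Idealized two-stage length |S| log2(N+1) + N H(type), real-valued."""
    return 2 * math.log2(N + 1) + N * h2(sum(blk) / N)

BLOCKS = list(product((0, 1), repeat=N))
PROBS  = [q**sum(b) * (1 - q)**(N - sum(b)) for b in BLOCKS]

def Lambda(rho):
    """Log-moment-generating function (base 2) of the codeword length."""
    return math.log2(sum(p * 2**(rho * length(b)) for p, b in zip(PROBS, BLOCKS)))

def I(x):
    """Rate function I(x) = sup_{rho >= 0} [rho x - Lambda(rho)], grid over rho."""
    return max(rho * x - Lambda(rho) for rho in [k / 100 for k in range(400)])

r = 0.95                              # a rate above H(q) ~ 0.811
print(I(N * r) / N)                   # approaches e(r) as N grows; cf. (22)-(23)
```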

Proof: We only need to show the case $r > H(Q_s)$. By Cramér's theorem [14]:

$$\Pr\Big(\sum_{i=1}^n l(\vec s_i) > nNr\Big) = \Pr\Big(\frac1n\sum_{i=1}^n l(\vec s_i) > Nr\Big) \le (n+1)^{|\mathcal S|^N}\, 2^{-n \inf_{x\ge Nr} I(x)} \qquad (21)$$

where the rate function $I(x)$ is [14]:

$$I(x) = \sup_{\rho\in\mathbb R} \Big\{\rho x - \log_2\Big(\sum_{\vec s\in\mathcal S^N} Q_s(\vec s)\, 2^{\rho l(\vec s)}\Big)\Big\}$$

Write $I(x,\rho) = \rho x - \log_2\big(\sum_{\vec s\in\mathcal S^N} Q_s(\vec s)\, 2^{\rho l(\vec s)}\big)$, so that $I(x,0) = 0$. Since $x \ge Nr > N H(Q_s)$, for large $N$:

$$\frac{\partial I(x,\rho)}{\partial\rho}\Big|_{\rho=0} = x - \sum_{\vec s\in\mathcal S^N} Q_s(\vec s)\, l(\vec s) \ge 0$$

By the Hölder inequality, for all $\rho_1$, $\rho_2$ and all $\theta\in(0,1)$:

$$\Big(\sum_i p_i 2^{\rho_1 l_i}\Big)^\theta \Big(\sum_i p_i 2^{\rho_2 l_i}\Big)^{1-\theta} \ge \sum_i \big(p_i^\theta 2^{\theta\rho_1 l_i}\big)\big(p_i^{1-\theta} 2^{(1-\theta)\rho_2 l_i}\big) = \sum_i p_i 2^{(\theta\rho_1+(1-\theta)\rho_2) l_i}$$

This shows that $\log_2\big(\sum_{\vec s} Q_s(\vec s) 2^{\rho l(\vec s)}\big)$ is a convex function of $\rho$, and thus $I(x,\rho)$ is concave in $\rho$ for fixed $x$. Hence for all $x > 0$ and all $\rho < 0$, $I(x,\rho) \le 0$, which means that the $\rho$ maximizing $I(x,\rho)$ is positive. This implies that $I(x)$ is monotonically increasing in $x$, so $\inf_{x\ge Nr} I(x) = I(Nr)$.

For $\rho \ge 0$, using the upper bound on $l(\vec s)$ in (18):

$$\log_2\Big(\sum_{\vec s\in\mathcal S^N} Q_s(\vec s)\, 2^{\rho l(\vec s)}\Big) \le \log_2\Big(\sum_{T_P^N\in\mathcal T^N} 2^{-N D(P\|Q_s)}\, 2^{\rho(|\mathcal S|\log_2(N+1) + N H(P))}\Big)$$
$$\le \log_2\Big((N+1)^{|\mathcal S|}\, 2^{-N\min_P\{D(P\|Q_s)-\rho H(P)\} + \rho|\mathcal S|\log_2(N+1)}\Big) = N\Big(-\min_P\{D(P\|Q_s) - \rho H(P)\} + \epsilon_N\Big)$$

where $\epsilon_N = (1+\rho)|\mathcal S| \frac{\log_2(N+1)}{N}$ goes to 0 as $N$ goes to infinity. Substituting into $I(Nr)$:

$$I(Nr) \ge N\Big(\sup_{\rho>0}\min_P\big\{\rho(r - H(P)) + D(P\|Q_s)\big\} - \epsilon_N\Big) \qquad (22)$$

First fix $\rho$. By a simple Lagrange multiplier argument, among distributions with fixed $H(P)$ the one minimizing $D(P\|Q_s)$ is a tilted distribution $Q_s^\alpha$. It can be verified that $\frac{\partial H(Q_s^\alpha)}{\partial\alpha} \ge 0$ and $\frac{\partial D(Q_s^\alpha\|Q_s)}{\partial\alpha} = \alpha\, \frac{\partial H(Q_s^\alpha)}{\partial\alpha}$. Thus the distribution minimizing $D(P\|Q_s) - \rho H(P)$ is $Q_s^\rho$. Using some algebra, we have

$$D(Q_s^\rho\|Q_s) - \rho H(Q_s^\rho) = -(1+\rho)\log_2 \sum_{s\in\mathcal S} Q_s(s)^{\frac1{1+\rho}}$$

Substituting this into (22):

$$I(Nr) \ge N\Big(\sup_{\rho>0}\Big\{\rho r - (1+\rho)\log_2\sum_{s\in\mathcal S} Q_s(s)^{\frac1{1+\rho}}\Big\} - \epsilon_N\Big) = N(e(r) - \epsilon_N) \qquad (23)$$

The last equality can again be proved by a simple Lagrange multiplier argument. Substituting (23) into (21) and letting $N$ be large enough, we get the desired bound in (20). ∎

Now we are ready to prove Theorem 5.

Proof: We upper bound the probability of a decoding error on $\vec s_t$ at time $(t+\Delta)N$. At time $(t+\Delta)N$, the decoder cannot decode $\vec s_t$ with zero error probability iff the binary strings describing $\vec s_t$ have not all left the buffer. Since the encoding buffer is FIFO, this means that the number of non-erasures from some time $t_1 < tN$ to $(t+\Delta)N$ is less than the number of bits in the buffer at time $t_1$ plus the number of bits entering the encoder from time $t_1$ to time $tN$. Suppose the buffer was last empty at time $tN - nN$, where $0 \le n \le t$. Given this condition, a decoding error occurs only if

$$\sum_{i=0}^{n-1} l(\vec s_{t-i}) > (n+\Delta)N - n_E(tN-nN,\, tN+\Delta N)$$

where $n_E(t_1, t_2)$ is the number of erasures from time $t_1$ to time $t_2$. Then¹⁶

$$\Pr(\vec s_t \ne \hat{\vec s}_t((t+\Delta)N)) \le \sum_{n=0}^{t} \sum_{m=0}^{n l_{max}} \Pr\Big(\sum_{i=0}^{n-1} l(\vec s_{t-i}) > m\Big)\, \Pr\big((n+\Delta)N - n_E(tN-nN, tN+\Delta N) < m\big)$$
$$\stackrel{(a)}{\le} \sum_{n=0}^{t} \sum_{m=0}^{n l_{max}} K_1\, 2^{-nN\big(e(\frac m{nN})-\epsilon_1\big)}\; K_2\, 2^{-(n+\Delta)N\big(E^+_{BEC}\big(1-\frac{(n+\Delta)N-m}{(n+\Delta)N}\big)-\epsilon_2\big)}$$
$$\stackrel{(b)}{\le} \sum_{n=0}^{t} K_3\, 2^{-N\big(\min_R\{n e(R) + (n+\Delta)E^+_{BEC}(\frac{nR}{n+\Delta})\}-\epsilon_3\big)}$$
$$\stackrel{(c)}{\le} \sum_{n=0}^{\gamma\Delta} K_3\, 2^{-\Delta N\big(\min_{R,\lambda\in[0,1]}\{\frac{\lambda e(R)+E^+_{BEC}(\lambda R)}{1-\lambda}\}-\epsilon_3\big)} + \sum_{n=\gamma\Delta}^{\infty} K_3\, 2^{-nN\big(\min_R\{e(R)+E^+_{BEC}(R)\}-\epsilon_3\big)}$$
$$\stackrel{(d)}{\le} K\, 2^{-\Delta N(E^* - \epsilon)}$$

where the $K_i$'s and $\epsilon_i$'s are properly chosen real numbers. (a) is true because the source and the memoryless channel are independent, together with Lemmas 8 and 9. In (b), we write $R = \frac m{nN}$ and take the $R$ that minimizes the error exponent. Define $\gamma = \frac{E^*}{\min_R\{e(R)+E^+_{BEC}(R)\}}$. The first term of (c) comes from setting $\lambda = \frac n{n+\Delta}$; the second term follows by noticing that $E^+_{BEC}(\cdot)$ is monotonically decreasing. (d) is by the definitions of $\gamma$ and $E^*$. ∎

¹⁶ $l_{max}$ is the longest codeword length; $l_{max} \le |\mathcal S|\log_2(N+1) + N\log_2|\mathcal S|$.
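The buffer dynamics in this proof are easy to simulate. The following Monte Carlo sketch (ours; parameters illustrative) implements the FIFO "repeat until received" scheme and estimates the delay tail:

```python
import math, random
from collections import deque

def h2(p):
    return 0.0 if p <= 0.0 or p >= 1.0 else -p*math.log2(p) - (1-p)*math.log2(1-p)

def block_bits(ones, N, S=2):
    """Codeword length per (18): |S| log2(N+1) + N H(type), rounded up."""
    return S * math.ceil(math.log2(N + 1)) + math.ceil(N * h2(ones / N))

def simulate(q=0.25, delta=0.10, N=512, num_blocks=2000, seed=1):
    """FIFO buffer drained by one bit per non-erased channel use. N must be
    large enough that the average bits per use, about H(q) plus overhead,
    stays below 1 - delta; otherwise the queue is unstable."""
    rng = random.Random(seed)
    fifo = deque()                           # [arrival_time, bits_remaining]
    delays, made, t = [], 0, 0
    while made < num_blocks or fifo:
        t += 1
        if (t - 1) % N == 0 and made < num_blocks:   # new block every N uses
            ones = sum(rng.random() < q for _ in range(N))
            fifo.append([t - 1, block_bits(ones, N)])
            made += 1
        if fifo and rng.random() >= delta:           # non-erasure: a bit arrives
            fifo[0][1] -= 1
            if fifo[0][1] == 0:
                delays.append(t - fifo.popleft()[0])
    for D in (2 * N, 4 * N, 8 * N):
        print(D, sum(d > D for d in delays) / len(delays))

simulate()
```

The empirical tail Pr(delay > ∆N) decays geometrically in ∆, consistent with the exponent E∗ of Theorem 5; the large deviations that dominate are exactly the two in Lemmas 8 and 9, a burst of erasures or a run of high-entropy source blocks.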

V. EXAMPLE: ERROR EXPONENTS FOR BECs

Consider a Bernoulli(0.25) source followed by a binary erasure channel with erasure rate δ; the entropy rate of the source is H(s) = 0.81 bits. We plot the upper and lower bounds on the joint source-channel coding error exponent with delay E_{s,sc}, together with the bounds on the block error exponent E_{b,sc}, in Figure 6(a). The feedback error exponent E_{s,scf} is plotted in Figure 6(b).

[Fig. 6. (a) Upper and lower bounds on E_{s,sc} as functions of the erasure rate δ ∈ [0, 0.2], together with the upper (dotted) and lower (dashed) bounds on the block coding error exponent E_{b,sc}. (b) The joint source-channel coding error exponent with feedback, E_{s,scf}, over the same range of δ.]

VI. CONCLUSIONS AND FUTURE WORK

We studied sequential joint source-channel coding and defined the error exponents with delay. By applying a variation of the feed-forward channel coding analysis in [3], we derived an upper bound on the error exponents with delay for lossless source-channel coding. A lower bound based on separate sequential source and channel coding was also given; in general there is a gap between the lower and upper bounds, and how to implement delay-optimal joint source-channel coding remains an open problem. For joint source-channel coding with feedback, we showed achievability only for erasure channels; results for general DMCs are unknown. Moreover, nothing is known yet about the lossy source-channel coding problem under a delay constraint.

REFERENCES

[1] R. G. Gallager, Information Theory and Reliable Communication. New York, NY: John Wiley, 1971.
[2] I. Csiszár and J. Körner, Information Theory. Budapest: Akadémiai Kiadó, 1986.
[3] A. Sahai, "Why block length and delay are not the same thing," IEEE Transactions on Information Theory, submitted. Available: http://www.eecs.berkeley.edu/~sahai/Papers/FocusingBound.pdf
[4] M. Pinsker, "Bounds of the probability and of the number of correctable errors for nonblock codes," translation from Problemy Peredachi Informatsii, vol. 3, pp. 44-55, 1967.
[5] C. Chang and A. Sahai, "The error exponent with delay for lossless source coding," IEEE Information Theory Workshop, 2006.
[6] S. Draper, C. Chang, and A. Sahai, "Sequential random binning for streaming distributed source coding," ISIT, 2005.
[7] I. Csiszár, "Joint source-channel error exponent," Problems of Control and Information Theory, vol. 9, no. 5, pp. 315-328, 1980.
[8] M. V. Burnashev, "Data transmission over a discrete channel with feedback. Random transmission time," Problems of Information Transmission, vol. 12, no. 4, pp. 10-30, 1976.
[9] G. D. Forney, "Convolutional codes II: maximum-likelihood decoding," Information and Control, vol. 25, pp. 222-266, 1974.
[10] E. A. Haroutunian, "Lower bound for error probability in channels with feedback," Problemy Peredachi Informatsii, vol. 13, no. 2, pp. 36-44, 1977.
[11] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley and Sons, 1991.
[12] C. Chang and A. Sahai, "Upper bound on error exponents with delay for lossless source coding with side-information," IEEE International Symposium on Information Theory, 2006.
[13] A. Sahai, Anytime Information Theory. PhD thesis, Massachusetts Institute of Technology, 2001.
[14] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. Springer, 1998.
