Performance Bounds for Joint Source-Channel Coding of Uniform Memoryless Sources Using a Binary Decomposition

Seyed Bahram ZAHIR AZAMI*, Olivier RIOUL* and Pierre DUHAMEL**

Départements *Communications et **Signal, École Nationale Supérieure des Télécommunications, URA CNRS 820, 46 rue Barrault, 75634 Paris cedex 13, France

Abstract

The objective of this paper is to design and evaluate the performance of a transmission system with jointly optimized source and channel coding of a uniformly distributed source to be transmitted over a binary symmetric channel (BSC). We first provide the optimal performance theoretically attainable (OPTA) according to Shannon's theorem. Then, we propose a new structure for joint source and channel coding in which the samples are first expressed in binary representation and bits of the same weight are processed separately. Finally, we determine the lower bound for the total distortion attainable with this structure and compare it to OPTA curves and to simulation results.

[Figure 1 block diagram: Source → U(m) → Encoder → X(n) → BSC → X′(n) → Decoder → U′(m)]

Figure 1: A simple transmission structure.

1 Introduction

This paper addresses the transmission of digital data over noisy channels with jointly optimized source and channel coders. The source considered in this paper has a uniform probability distribution, and the transmission is over a binary symmetric channel (BSC). Our choices for the source and channel may seem overly simplistic. We feel, however, that their study gives a better understanding of the problem of joint source/channel coding. Moreover, we shall derive building blocks that can be used in more sophisticated systems. Figure 1 illustrates our transmission system. In this configuration, the source/channel code rate is defined as the average number of coded bits per source sample: r = n/m. We seek to minimize the m.s.e. distortion D = (1/m) E{‖U − U′‖²}, under the constraint that r ≤ rd, where rd is the desired bit rate.

From Shannon theory, we know that source and channel coding can be treated separately without any loss of performance for the overall system [3]. This is, however, an asymptotic result, since it requires very long blocks and very complex coders. Our approach is to achieve relatively good results using comparatively simple coders. In this paper, we first derive upper and lower bounds on the optimal rate-distortion functions, taking channel transmission errors into account. Then, we propose a new structure in which the source is decomposed into parallel binary streams. For this structure, we find a theoretically attainable rate-distortion function, using the Lagrange multiplier method, and compare it to the optimal one. Finally, we compare these theoretical results with those obtained by a practical method based on the proposed structure.
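As a concrete instance of the definitions above (a minimal sketch under our own assumed parameters: a plain N-bit truncation quantizer and no source or channel coding at all, which the paper does not prescribe), one can quantize each uniform sample to n bits, send the bits over a BSC, and measure the rate r = n/m and the m.s.e. distortion D:

```python
import random

def transmit(samples, n_bits, p, rng):
    """Uncoded baseline for the system of figure 1: quantize each sample in [0, 1)
    to n_bits by truncation, flip each bit with probability p (BSC), reconstruct."""
    recon = []
    for u in samples:
        k = int(u * 2**n_bits)  # exact: u is a dyadic rational times a power of two
        bits = [(k >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]
        noisy = [b ^ (rng.random() < p) for b in bits]
        recon.append(sum(b * 2.0 ** -(i + 1) for i, b in enumerate(noisy)))
    return recon

rng = random.Random(0)
m = 50_000
n_bits = 8                      # rate r = n/m = 8 coded bits per source sample
samples = [rng.random() for _ in range(m)]
out = transmit(samples, n_bits, 0.0, rng)
# With p = 0 only the truncation error remains, so D is about 2**(-2*n_bits)/3.
D = sum((u - v) ** 2 for u, v in zip(samples, out)) / m
```

Raising p above zero adds the channel-induced distortion that the rest of the paper trades off against rate.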

2 Bounds for a uniform source

A memoryless source with uniform probability density function is considered. This source is to be coded and transmitted over a BSC. Note that the uniformity of the source leaves little room for source coding, apart from the dimensionality that can be exploited [2]. We now proceed to derive the optimal rate-distortion function r(D), as well as upper and lower bounds.

2.1 OPTA

(† This work is supported by a grant from CNET-France Télécom. The work of S.B. Zahir Azami is also sponsored by the CNOUS.)

The optimal performance theoretically attainable (OPTA) expresses the smallest possible distortion as a function of the bit rate, when transmitting a given source on a given channel. According to Shannon's theory [3], the OPTA curve r(D) is given by

r(D) = R(D)/C    (1)

where R(D) is the source rate-distortion function and C is the channel capacity. For our model, the BSC is parameterized by the raw bit-error probability p, on which the OPTA depends. More precisely, one has C = 1 − H2(p), where H2(x) = x log2(1/x) + (1 − x) log2(1/(1 − x)) is the binary entropy function. There is no closed-form expression for the R(D) function of a uniform source; it is derived below with the aid of Blahut's algorithm [1]. However, it is a simple matter to obtain closed-form expressions for lower and upper bounds, as shown next.

2.2 Gaussian upper bound

It is known that a theoretical upper bound on the source rate-distortion function R(D) (without channel considerations) is given by the rate-distortion function Rg(D) of a Gaussian source of the same variance σ² [5]:

R(D) ≤ Rg(D) = (1/2) log2(σ²/D).

This gives an upper bound for the OPTA curve:

r(D) ≤ rg(D) = [(1/2) log2(σ²/D)] / [1 − H2(p)]    (2)

This upper bound is plotted in figure 2 for a uniform source distributed on [−1/2, 1/2], for which σ² = 1/12.

2.3 Shannon's lower bound

The Shannon lower bound [5] on the source rate-distortion function is given by

R(D) ≥ Rs(D) = H − (1/2) log2(2πeD),

where e = 2.71828… and H denotes the differential entropy of the source, given by H = (1/2) log2(12σ²) for a uniform source. It has been observed that this bound is not attainable except at very small distortions. Shannon's theorem then gives a lower bound for the OPTA curve:

r(D) ≥ rs(D) = [(1/2) log2(12σ²/(2πeD))] / [1 − H2(p)].

This lower bound is plotted in figure 2.

2.4 Blahut's algorithm

Using Blahut's algorithm [1], we computed the source rate-distortion function R(D) numerically. From (1) this gives the OPTA curve r(D) = R(D)/(1 − H2(p)), plotted in figure 2. We observe that the OPTA curve is close to the upper bound for small values of r and comes closer to the lower bound for large values of r. Note that the distance between the lower and upper bounds is constant, about 1.5 dB.

Figure 2: Upper and lower theoretical bounds (SNR in dB versus r(D)), compared with the OPTA curve obtained by Blahut's algorithm, for a uniform source and a zero-error channel (p = 0).

3 A new joint source-channel coding structure

3.1 Bitwise decomposition

The bounds derived above are useful since one may check the performance of practical algorithms against them. On the other hand, we are searching for simple algorithms and have chosen to decompose a uniform source U into a set of N binary sources Ui which are transmitted through the same channel, as shown in figure 3. Thus, we consider the binary representation of each source sample, truncated to N bits, and then perform the compression and protection operations on each bit stream separately. An important consideration is the following. We assume that the original source U is uniformly distributed (e.g., between −1/2 and 1/2) and that natural binary coding is used, i.e., the Ui's are such that

U = Σ_{i=1}^N 2^{−i} Ui    (3)

Then it is easily seen that each bit stream Ui is a binary symmetric source (BSS), that is, the bits Ui are independent and identically distributed with Prob(Ui = 0) = Prob(Ui = 1) = 1/2.
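The decomposition and reconstruction steps can be sketched as follows (an illustration under our own conventions: the source is taken uniform on [0, 1) so that the natural binary expansion applies directly, and all names are ours):

```python
import random

def decompose(u, n):
    """Bit planes of u in [0, 1): u = sum_{i=1..n} 2**-i * bits[i-1] + truncation error."""
    k = int(u * 2**n)  # exact for u produced by random.random(), a dyadic rational
    return [(k >> (n - 1 - i)) & 1 for i in range(n)]

def reconstruct(bits):
    """Inverse of the decomposition, up to the N-bit truncation error."""
    return sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))

rng = random.Random(1)
N = 10
samples = [rng.random() for _ in range(20_000)]
planes = [decompose(u, N) for u in samples]

# Truncating to N bits loses at most 2**-N per sample.
assert all(0 <= u - reconstruct(b) < 2**-N for u, b in zip(samples, planes))

# Each bit plane behaves like a binary symmetric source: P(Ui = 1) is close to 1/2.
for i in range(N):
    freq = sum(p[i] for p in planes) / len(planes)
    assert abs(freq - 0.5) < 0.02
```

The empirical check at the end illustrates (but of course does not prove) the BSS property claimed above.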

Figure 3: Source-channel coder combination (source U → bit streams U1 (MSB), …, UN (LSB) → coders C1, …, CN with rates n1, …, nN → BSC → decoders D1, …, DN → U′1, …, U′N → reconstructed U′ delivered to the user). Each row of bits (MSB, …, LSB) is processed separately.

Of course, the output will be reconstructed by the formula U′ = Σ_{i=1}^N 2^{−i} U′i.

3.2 Remark on memoryless sources

Before deriving optimal performances, a preliminary remark is in order. It follows from information theory that for any memoryless source, the rate-distortion function may be determined as R(D) = min {I(U; U′) : E(U − U′)² ≤ D}, where the minimum is taken over the transition probabilities p(u′|u), U and U′ represent input and output random variables (not vectors), and I(U; U′) is the mutual information between U and U′. Therefore, even though the actual processing is made by blocks of length m, the calculation of R(D) is made using the definition of D for m = 1, that is, D = E(U − U′)².

3.3 Additivity of distortion

A second important consideration is that the distortion is additive. To prove this, write

D = E(U − U′)² = E( Σ_{i=1}^N 2^{−i} Ui − Σ_{i=1}^N 2^{−i} U′i )² = Σ_{i,j} 2^{−(i+j)} E(Ui − U′i)(Uj − U′j)

Since Ui and Uj are independent for i ≠ j, and considering that Ui and U′i have the same biases (that is, E(Ui) = E(U′i)), we have E(Ui − U′i)(Uj − U′j) = E(Ui − U′i) E(Uj − U′j) = 0 for all i ≠ j. Therefore, D simplifies to

D = Σ_{i=1}^N 4^{−i} E(Ui − U′i)² = Σ_{i=1}^N wi Di

where Di = E(Ui − U′i)² is the m.s.e. corresponding to bit i, which depends on the channel raw error probability p, and wi = 4^{−i} is the weighting associated with bit i. Thus, D is a linear superposition of the bit distortions Di. It is important to note that this result was obtained under the assumption that E(Ui) = E(U′i).

Since we have decomposed our source into BSS sources, the next step is to find the OPTA performance ri(Di) for each BSS Ui transmitted over a BSC. This is done in the following section.

4 OPTA for a binary source

4.1 Error probability distortion

The OPTA for a BSS Ui over a BSC is well known when the distortion εi is defined as an error probability: εi = E(wH(Ui − U′i)), where wH(x) is the Hamming weight of x, that is, wH(x) = 0 if x = 0 and wH(x) = 1 otherwise. This clearly corresponds to an m.s.e. distortion E(Ui − U′i)² if we require that Ui and U′i ∈ {0, 1}. In this case the source rate-distortion function is given by [3, 5]

Ri(εi) = 1 − H2(εi)

From (1) this gives the following OPTA curve:

ri(εi) = (1 − H2(εi)) / (1 − H2(p))    (4)

4.2 m.s.e. distortion

Even though our basic system is binary, we want the additional flexibility of allowing the output U′i to differ from {0, 1}, because this yields lower values of the m.s.e. distortion and also permits the requirement that E(Ui) = E(U′i), which was needed to obtain the additivity property in the preceding section. Consider, for example, the limit case ri = 0. Then U′i = 1/2 is the value that minimizes Di = E(Ui − U′i)², and one has E(Ui) = E(U′i). In general, the output is still reconstructed as U′ = Σ_{i=1}^N 2^{−i} U′i even though the U′i are no longer bits. With this assumption, we have to re-calculate the OPTA for an m.s.e. measure of distortion Di in place of the error probability distortion εi. But there is a simple relationship between them, which we now derive. Replace the U′i = 0 by c1 and the U′i = 1 by c2 in the output, where c1 and c2 are real-valued constants. We seek to minimize Di for a given εi with a good choice of these constants. Since

Di = (1 − εi) (1/2) {c1² + (1 − c2)²} + εi (1/2) {(1 − c1)² + c2²},

it is easily seen that c1 = 1 − c2 = εi is the choice that minimizes Di, and we keep our basic requirement that E(Ui) = E(U′i) = 1/2. It follows that

Di = εi − εi²,

which determines the value of the error probability εi ≤ 1/2 that should be used in the OPTA (4) as

εi = (1/2)(1 − √(1 − 4Di)).

The OPTA curve is, therefore, given by:

ri(Di) = [1 − H2((1 − √(1 − 4Di))/2)] / [1 − H2(p)]    (5)

In figure 4, we have plotted the corresponding source rate-distortion function Ri(Di) = 1 − H2((1 − √(1 − 4Di))/2), along with operating points obtained for the simple coders used in the simulation presented in [6].

Figure 5: Theoretical bound for the distortion Di versus the raw error probability p, for different values of ri ∈ {0, 0.7, …, 0.9999, 1, 1.0001, …, 1.1, 2, ∞}.
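Formulas (4)-(5) are straightforward to evaluate numerically; a minimal sketch (function names are ours):

```python
from math import log2, sqrt

def H2(x):
    """Binary entropy function H2(x) = x log2(1/x) + (1-x) log2(1/(1-x))."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def eps_from_D(Di):
    """Error probability eps_i <= 1/2 achieving m.s.e. Di = eps_i - eps_i**2."""
    return 0.5 * (1 - sqrt(1 - 4 * Di))

def r_opta(Di, p):
    """OPTA bit rate (5) for one BSS bit stream over a BSC with raw error probability p."""
    return (1 - H2(eps_from_D(Di))) / (1 - H2(p))

p = 0.05
# ri = 1 is achieved exactly by using the channel uncoded: Di = p - p**2, eps_i = p.
assert abs(r_opta(p - p * p, p) - 1.0) < 1e-12
# Di = 1/4 (eps_i = 1/2) requires no transmission at all: ri = 0.
assert abs(r_opta(0.25, p)) < 1e-12
```

The two assertions reproduce two of the limit cases discussed for figure 5.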

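Looking ahead to the rate allocation derived in Section 5.1, the optimality condition (6) can be solved per bit by bisection — its left-hand side decreases monotonically from +∞ as Di → 0 to 2/ln 2 at Di = 1/4, so targets at or below 2/ln 2 correspond to the corner ri = 0, Di = 1/4 — and sweeping the multiplier λ traces out the (r, D) curve. The following is an illustrative reimplementation under our own naming, not the authors' code:

```python
from math import log, log2, sqrt

def H2(x):
    """Binary entropy function."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def lhs(Di):
    """Left-hand side of condition (6); monotone decreasing on (0, 1/4)."""
    s = sqrt(1 - 4 * Di)
    return log2((1 + s) / (1 - s)) / s

def allocate(lam, p, N):
    """Per-bit (Di, ri) solving condition (6) for multiplier lam > 0, weights wi = 4**-i."""
    C = 1 - H2(p)  # BSC capacity
    Ds, rs = [], []
    for i in range(1, N + 1):
        target = lam * 4.0 ** -i * C
        if target <= 2 / log(2):
            # Condition (6) has no root here: the optimum is the corner ri = 0, Di = 1/4.
            Ds.append(0.25)
            rs.append(0.0)
            continue
        lo, hi = 1e-15, 0.25 - 1e-15  # bisection on the monotone lhs
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if lhs(mid) > target:
                lo = mid
            else:
                hi = mid
        Di = 0.5 * (lo + hi)
        eps = 0.5 * (1 - sqrt(1 - 4 * Di))
        Ds.append(Di)
        rs.append((1 - H2(eps)) / C)  # formula (5)
    return Ds, rs

def sweep_point(lam, p=0.0, N=8):
    """One point of the Lagrangian bound: rate r = sum ri, distortion D = sum wi * Di."""
    Ds, rs = allocate(lam, p, N)
    return sum(rs), sum(4.0 ** -i * Di for i, Di in zip(range(1, N + 1), Ds))
```

As expected from the weighting wi = 4^{−i}, the MSB always receives the smallest distortion, and increasing λ increases the total rate while decreasing the total distortion.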
Figure 4: Ri(Di) and the simple repetition and Hamming coders used in our simulations.

Figure 5 illustrates the behavior of the OPTA distortion Di as a function of the BSC raw error probability p, for different values of the bit rate ri. Several remarks are in order.

- For ri = 0 (no transmission at all) we have a horizontal line at Di = 1/4, which corresponds to U′i = 1/2 at the receiver, as we have already noticed.
- For ri = 1, we have Di = p − p² and εi = p. The total error probability distortion is imposed by the channel, and the optimal choice is to do no coding at all.
- For ri = ∞, we obtain the vertical line p = 1/2. This corresponds to the limit case lim_{ri→∞} Di = 0 for all p.

5 Lagrangian bound

Now that we have determined the optimal performance ri(Di) for each bit stream i, it remains to determine the optimal allocation of the bit rates ri that minimizes the total distortion D = Σi wi Di for a given rate budget r = Σi ri. This will give the OPTA r(D) for our structure of figure 3. We solve this problem by the Lagrange multiplier method.

5.1 Derivation

The problem is to minimize D = Σ_{i=1}^N wi Di subject to Σ_{i=1}^N ri = r, where ri(Di) is given by (5). An equivalent problem in the variables D1, …, DN is to minimize r = Σ_{i=1}^N ri(Di) subject to Σ_{i=1}^N wi Di = D. For convenience in the derivation we use the latter form of the problem. Let λ be the Lagrange multiplier corresponding to the constraint. Since the objective function r to be minimized is convex, and the constraint is linear in the Di's, the solution to our problem is determined by the equation

∂L/∂Di = 0 for every i,

where L is the Lagrangian functional

L = Σi ri(Di) + λ (Σi wi Di − D)

Using (5) and the formula dH2(x)/dx = log2((1 − x)/x), we obtain the following necessary and sufficient condition for optimality:

[1/√(1 − 4Di)] log2( (1 + √(1 − 4Di)) / (1 − √(1 − 4Di)) ) = λ wi (1 − H2(p))    (6)

Now, for any value of λ > 0, this condition gives the optimal values of the Di's. These in turn give the values of the ri's from (5). The Di's were computed from λ by inverting the complicated function (6) numerically. The result is, for any λ > 0, a bit rate r = Σi ri and a value of the total distortion D = Σi wi Di, which give a solution to the problem.

5.2 Results and comparison

The result (which we call the Lagrangian bound) is plotted in figure 6. Notice that the curves obtained for different values of p are just scaled versions of each other along the r axis.

Figure 6: Solid: theoretical OPTA curves (Lagrange, Blahut) and the upper bound, for p = 0.

Figure 6 also compares our Lagrangian bound to the OPTA curve obtained by Blahut's algorithm and the Gaussian upper bound (2). It is interesting to note that the Lagrangian bound is tangent to the upper bound every time R is an integer. This can be explained as follows. It turns out that the upper bound (2) is also, when R is an integer, the optimal rate-distortion function when scalar quantization is used on the source samples U. Indeed, assuming for example that the source is uniformly distributed on the interval [−1/2, 1/2], so that σ² = 1/12, it is easy to see that R = Rg(D) in (2) is equivalent to D = σ² 2^{−2R} = q²/12, where q = 2^{−R} is the quantization step. Now, by using our bitwise decomposition as in figure 3, even though bits are processed block-wise, it seems that we have lost the ability to vector quantize the source samples U. Note, however, that our structure was proposed for a joint source/channel coding problem in which one also considers the transmission errors due to a binary channel. It would be desirable to improve our structure to permit vector quantization while also taking binary transmission errors into account. This is a subject for future investigation.

In [6], we have proposed an optimization procedure based on the proposed bitwise structure, using simple binary coders. The rate-distortion optimization was performed using a variation of Shoham and Gersho's algorithm [4]. Figure 7 shows the result of this optimization and compares it to the Lagrangian bound.

Figure 7: Numerically attainable bound (with the set of coders used) and the theoretical bound obtained by Lagrange's multiplier method, for transition probability p = 0.01. Also shown is the performance curve obtained without any coding (SNR in dB versus R in bits).

6 Conclusion

In this paper we have proposed a new structure for joint source and channel coding of a uniform source to be transmitted over a binary symmetric channel, and have derived precise theoretical performance bounds for this structure.

Further possible improvements include the ability to vector quantize the source samples on their binary representation, to take other types of sources into account (including Gaussian and Laplacian), and to treat the case of multiple sources.

References

1. R.E. Blahut, Computation of channel capacity and rate-distortion functions, IEEE Transactions on Information Theory 18 (1972), no. 4, 460-473.
2. J. Makhoul, S. Roucos, and H. Gish, Vector quantization in speech coding, Proceedings of the IEEE 73 (1985), no. 11, 1551-1588.
3. R.J. McEliece, The theory of information and coding, Addison-Wesley, 1977.
4. Y. Shoham and A. Gersho, Efficient bit allocation for an arbitrary set of quantizers, IEEE Transactions on Acoustics, Speech and Signal Processing 36 (1988), no. 9, 1445-1453.
5. A.J. Viterbi and J.K. Omura, Principles of digital communication and coding, McGraw-Hill, second ed., 1985.
6. S.B. ZahirAzami, P. Duhamel, and O. Rioul, Combined source-channel coding for binary symmetric channels and uniform memoryless sources, Proceedings of the colloque GRETSI (Grenoble, France), Sep. 1997, to appear.