Combined Source/Channel (De-)Coding: Can A Priori Information Be Used Twice?

T. Hindelang(1), T. Fingscheidt(2), N. Seshadri(3), R. V. Cox(4)
AT&T Labs – Research, 180 Park Avenue, Florham Park, NJ 07932, USA

Abstract— In the digital transmission of speech, audio, image, and video signals, residual redundancy is often left after source coding due to complexity and delay constraints. This redundancy resides both within one block or frame and in the time correlation of subsequent frames. We describe an approach that improves channel and source decoding by using both kinds of correlation. Furthermore, we consider the bit-mapping and the multiplexing in coding and their effect on decoding. The mutual information is introduced as a measure of the gain achievable in decoding through a priori information. In this work on combined source and channel decoding we try to answer the following question: can a priori information that models the source parameters be used twice, first at the channel decoder and then at the source decoder? The channel decoder uses a priori information that models the bit stream generated by the source coder; this does not capture all details of the statistics at the source parameter level. By exploiting the a priori knowledge of the parameters once more at the source decoder, we show that it is possible to achieve better reconstruction than if this information were used at only one of the decoders.
I. INTRODUCTION

Speech, audio, image, and video signals are highly correlated, and source coding algorithms are used to reduce this redundancy. Source coding usually generates parameters which are quantized with a sufficient number of bits. Channel coding then adds redundancy to correct or detect errors at the receiver.

In speech coding, the speech signal is typically separated into frames which are coded individually. Due to complexity and delay constraints, the coded parameters retain redundancy both within one frame and from frame to frame (time correlation). Several publications focus on using this source redundancy for source decoding, e.g. [1-3]. In [4] and [5], methods are described that exploit the distribution of parameters as a priori information at the bit level of the channel decoder. In [6], the time correlation at the bit level was used to improve channel decoding; especially in flat fading channels it is known that time correlation improves channel decoding [1]. In [7] the combination of the two kinds of a priori information at the bit level was described; in a new approach, we combine them at the parameter level.

In this paper we attempt not only to improve channel decoding, but also to show how the end-to-end quality of parameters can be improved by the choice of the bit-mapping and the multiplexing of the source bits before channel coding. We use only equal error protection and consider only parameters quantized with three bits; the results can easily be extended to other quantization levels.

[Footnote] This work was done while all authors were with AT&T Labs – Research. 1) is now with the Inst. f. Comm. Eng. at the University of Technology, Munich, 80290, Germany, Email: [email protected]; 2) is with Siemens AG (ICP CD Mobile Phones), Munich, 81675, Germany, Email: [email protected]; 3) is with Broadcom Corp., Irvine, CA 92619, Email: [email protected]; 4) is with AT&T, Email: [email protected].
In Section II we briefly review the softbit source decoding technique [1], and in Section III we discuss source-controlled channel decoding according to [4, 8], both presented in a uniform manner. In Section IV we discuss the bit-mapping and how the bit-mapping determines the mutual information. The results (Section V) show the dependence between the mutual information and the gain obtained by decoding with a priori knowledge. Furthermore, we show the bit error rate and the parameter SNR for sources with one or, respectively, 20 correlated parameters within one frame.

II. SOURCE DECODING USING SOURCE A PRIORI KNOWLEDGE

Let us assume a speech or image signal $\tilde{s}$ is source coded. The source coder usually generates so-called codec parameters, e.g. pitch, spectral coefficients, etc. In the following we focus on a single codec parameter $\tilde{v}_n \in \mathbb{R}$ at time instant $n = 0$, as depicted in Fig. 1. The parameter is scalar quantized, $Q[\tilde{v}_0] = v_0$, and coded by the bit combination $x_0 = \{x_0(0), x_0(1), \dots, x_0(M-1)\}$ consisting of $M$ bits $x_0(m) \in \{0,1\}$, $m = 0, 1, \dots, M-1$. When all parameters belonging to a frame are coded, their bits are multiplexed into a frame of $L$ source bits. This frame is then subject to convolutional encoding, resulting in a frame of channel bits $y_0$. In the following we assume the regarded parameter $\tilde{v}_n$ is generated only once per frame; thus the parameter time instant $n = 0$ denotes the current frame, $n = -1$ the previous frame, etc.

After transmission over an equivalent channel comprising modulation, the physical channel, and soft demodulation, the received frame $\hat{y}_0$ is fed into a channel decoder. To allow combined source/channel decoding, the channel decoder is required to yield a soft output (softbits) [1, 9], e.g. in the form of $M$ decoder probabilities $P^d(x_0(m)) = P(x_0(m) \mid \hat{y}_0)$ giving a likelihood for every source bit $x_0(m)$. These probabilities are the interface to the source decoder, which uses them in the form
$$P^d(x_0) = \prod_{m=0}^{M-1} P^d(x_0(m)) \approx P(x_0 \mid \hat{y}_0). \qquad (1)$$
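To make (1) concrete, here is a minimal Python sketch, assuming $M = 3$ and toy per-bit probabilities; the function name and its inputs are our own illustration, not part of the paper.

```python
import numpy as np

M = 3                                   # bits per parameter, as in the paper

def decoder_probs(p1):
    """Eq. (1): P^d(x_0) over all 2^M bit combinations as the product of the
    per-bit decoder probabilities; p1[m] is P^d(x_0(m) = 1), a toy stand-in
    for real softbit channel-decoder output."""
    P_d = np.ones(2 ** M)
    for x0 in range(2 ** M):
        for m in range(M):
            bit = (x0 >> (M - 1 - m)) & 1        # MSB-first bit convention
            P_d[x0] *= p1[m] if bit else 1.0 - p1[m]
    return P_d

print(decoder_probs(np.array([0.9, 0.7, 0.55])))
```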
The source decoding of a parameter can be done in several ways [1]. We do not consider the simple hard decision, which is always worse than using reliability information. Often a squared-error measure such as the parameter SNR, defined as

$$\mathrm{SNR} = 10 \log_{10} \frac{E\{\tilde{v}^2\}}{E\{(\tilde{v}^{(est)} - \tilde{v})^2\}}, \qquad (2)$$

describes the transmission quality reasonably well.
[Fig. 1. The transmission system: $\tilde{s}$ enters the source coder (parameter coder, quantizer Q, MUX), which outputs $x_0$; the channel coder produces $y_0$, transmitted over the equivalent channel to give $\hat{y}_0$; the channel decoder (with DMUX) delivers $P^d(x_0(m))$ to the source decoder, whose parameter decoder uses a priori knowledge to yield $\tilde{v}_0^{(est)}$ and $\hat{s}$.]
Thus a mean-square estimation, called source decoding with no a priori information (SD/no), according to

$$\tilde{v}_0^{(est)} = \sum_{x_0} v(x_0)\, P^d(x_0) \qquad \text{SD/no} \quad (3)$$
is appropriate. Modeling the quantized parameter as a Markov chain, the parameter a priori knowledge $P(x_n)$ (0th-order Markov, modeling the parameter distribution) or alternatively $P(x_n \mid x_{n-1})$ (1st-order Markov, modeling the parameter correlation over time) can be exploited. Assuming the Markov chain to be homogeneous, the a priori knowledge can be written as $P(x_0) = P(x_n)$ and $P(x_0 \mid x_{-1}) = P(x_n \mid x_{n-1})$. Using 0th-order a priori knowledge (SD/AK0), (3) becomes

$$\tilde{v}_0^{(est)} = \sum_{x_0} v(x_0)\, C_0\, P^d(x_0)\, P(x_0) \qquad \text{SD/AK0} \quad (4)$$
with the constant $C_0$ normalizing the sum over the product of probabilities to one. Finally, the usage of 1st-order a priori knowledge (SD/AK1) in (4) leads to

$$\tilde{v}_0^{(est)} = \sum_{x_0} v(x_0)\, C_0\, P^d(x_0)\, P^p(x_0) \qquad \text{SD/AK1} \quad (5)$$
with the constant $C_0$ as used in (4). A recursive computation yields the prediction probabilities(1)

$$P^p(x_0) = \sum_{x_{-1}} P(x_0 \mid x_{-1})\, C_{-1}\, P^d(x_{-1})\, P^p(x_{-1}) \qquad (6)$$
with $C_{-1}$ being the normalization constant of the previous frame. If frame $n = 0$ is the first transmitted frame ever, the recursion (6) can be initialized with $C_{-1}\, P^d(x_{-1})\, P^p(x_{-1}) = P(x_{-1})$.

(1) Here and in the following, the term prediction probabilities denotes the probability distribution of $x_0$ or $x_0(m)$ as predicted from bit combinations in the previous frame and/or neighboring bits in the same frame. Prediction probabilities exploit the a priori knowledge, which in our terminology exclusively refers to the decoder ROM tables describing the statistics of the parameter.
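The estimation rules (3)-(5) and the recursion (6) can be sketched as follows; the reproduction levels, the uniform a priori tables, and the toy decoder output are our own assumptions for illustration.

```python
import numpy as np

K = 8                                   # 2^M bit combinations for M = 3
v = np.linspace(-2.0, 2.0, K)           # assumed reproduction levels v(x_0)
P_ak0 = np.full(K, 1.0 / K)             # assumed table P(x_0)
P_ak1 = np.full((K, K), 1.0 / K)        # assumed table P(x_0|x_{-1}), rows x_{-1}

def sd_estimate(P_d, P_apriori=None):
    """SD/no (3) without an a priori term, otherwise SD/AK0 (4) with
    P_apriori = P(x_0) or SD/AK1 (5) with P_apriori = P^p(x_0)."""
    w = P_d if P_apriori is None else P_d * P_apriori
    w = w / w.sum()                     # the normalization constant C_0
    return float(v @ w)                 # mean-square estimate of the parameter

def prediction_probs(P_d_prev, P_p_prev):
    """Recursion (6): P^p(x_0) from the previous frame's probabilities,
    renormalized by C_{-1} before the Markov transition step."""
    w = P_d_prev * P_p_prev
    w = w / w.sum()                     # the normalization constant C_{-1}
    return P_ak1.T @ w                  # sum_{x_{-1}} P(x_0|x_{-1}) w(x_{-1})

# first frame ever: initialize the recursion with
# C_{-1} P^d(x_{-1}) P^p(x_{-1}) = P(x_{-1}), as stated in the text
P_p0 = P_ak1.T @ P_ak0                  # P^p(x_0) for frame n = 0
P_d0 = np.array([0.30, 0.20, 0.15, 0.10, 0.09, 0.07, 0.05, 0.04])
print(sd_estimate(P_d0),                # SD/no  (3)
      sd_estimate(P_d0, P_ak0),         # SD/AK0 (4)
      sd_estimate(P_d0, P_p0))          # SD/AK1 (5)
P_p1 = prediction_probs(P_d0, P_p0)     # eq. (6): prediction for frame n = 1
```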
III. CHANNEL DECODING USING SOURCE A PRIORI KNOWLEDGE

Fig. 2 depicts possible realizations of the channel decoder block of Fig. 1. They differ only in the type of a priori knowledge used about the regarded parameter, and thus in the interpretation of $P^d(x_0)$. For simplicity, some (de-)multiplexing schemes which are necessary within the channel decoder itself are not shown. Furthermore, only probabilities related to the regarded parameter $x_0$ are depicted in Fig. 2.

If both switches are in the off position, conventional channel decoding is carried out, yielding (besides information on the other source bits) the soft-output bit probabilities $P(x_0(m) \mid \hat{y}_0)$, $m = 0, 1, \dots, M-1$, under the assumption that $x_0(m) = 0$ and $x_0(m) = 1$ are equiprobable. Because this assumption does not rely on any a priori knowledge about the source, we call this type of channel decoding CD/no.

A. Using Intraframe Correlation

If the upper branch in Fig. 2 is switched on, correlations among the bits $x_0$ are exploited in terms of a priori knowledge about a bit $x_0(m)$ given the $M-1$ other bits $x_{0 \setminus m}$ (intraframe correlation). In this case the main channel decoder delivers probabilities $P^d(x_0(m)) = P(x_0(m) \mid \hat{y}_0)$, depending only on the currently received frame $\hat{y}_0$. First, a preliminary decoding step is performed, yielding an estimate $P(x_0(m) \mid \hat{y}_0)$ as given in (1), assuming $x_0(m) = 0$ and $x_0(m) = 1$ to be equiprobable, i.e., no a priori knowledge is used here. Then the prediction probability estimate [4]

$$P_0^p(x_0(m) \mid \hat{y}_0) = \sum_{x_{0 \setminus m}} P(x_0(m) \mid x_{0 \setminus m})\, P(x_{0 \setminus m} \mid \hat{y}_0), \quad x_0(m) \in \{0,1\} \qquad \text{CD/AK0} \quad (7)$$

is used in the main decoding step to modify the metric.

B. Using Interframe Correlation

If the lower branch is switched on, correlation in time (interframe correlation) is exploited. The main channel decoder delivers probabilities $P^d(x_0(m)) = P(x_0(m) \mid \hat{Y}_0)$, with $\hat{Y}_0 = \{\hat{y}_0, \hat{y}_{-1}, \dots\}$ denoting the whole history of received frames. Based on $P^d(x_{-1}) = P(x_{-1} \mid \hat{Y}_{-1})$ as generated after decoding of the previous frame, the prediction probabilities [8]
$$P_1^p(x_0(m) \mid \hat{Y}_{-1}) = \sum_{x_{-1}} P(x_0(m) \mid x_{-1})\, P(x_{-1} \mid \hat{Y}_{-1}), \quad x_0(m) \in \{0,1\} \qquad \text{CD/AK1} \quad (8)$$
can be computed. They are taken into account at the respective trellis transitions (eqs. (4) ff. in [9]) when the metric for the current frame is generated within the main channel decoder.

[Fig. 2. Channel decoding using a priori knowledge: a preliminary channel decoder yields $P(x_0(m) \mid \hat{y}_0)$, from which $P_0^p(x_0(m) \mid \hat{y}_0)$ is computed via CD/AK0 (7) using $P(x_0(m) \mid x_{0 \setminus m})$ (upper branch); a frame delay feeds the previous decoder output back to compute $P_1^p(x_0(m) \mid \hat{Y}_{-1})$ via CD/AK1 (8) using $P(x_0(m) \mid x_{-1})$ (lower branch); either branch can be switched on before the main channel decoder, which delivers $P^d(x_0(m))$.]

C. Using Inter- and Intraframe Correlation

The above approaches can also be combined to exploit both the CD/AK0 and the CD/AK1 type of a priori knowledge; this means both switches in Fig. 2 are in the on position. In channel decoding, the AK0 and AK1 parts can be calculated separately if $\hat{y}_0$ and $\hat{Y}_{-1}$ are considered statistically independent, which is a quite simplifying assumption. Multiplication of both parts and application of a correction factor yields
$$P_{01}^p(x_0(m) \mid \hat{Y}_0) = \frac{P_0^p(x_0(m) \mid \hat{y}_0)\, P_1^p(x_0(m) \mid \hat{Y}_{-1})}{P(x_0(m))}, \quad x_0(m) \in \{0,1\} \qquad \text{CD/AK0+1} \quad (9)$$

which is to be used to modify the decoder metric. Note that, in contrast to the channel decoding schemes, the SD/AK1 technique already includes the distribution-related redundancy used in SD/AK0.
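For illustration, the three prediction probabilities (7)-(9) for a single bit can be sketched as below; the uniform ROM tables P_joint and P_trans stand in for the parameter statistics of footnote (1) and are purely illustrative assumptions.

```python
import numpy as np

M, K = 3, 8
P_joint = np.full(K, 1.0 / K)          # assumed ROM table P(x_0)
P_trans = np.full((K, K), 1.0 / K)     # assumed ROM table P(x_0|x_{-1}), rows x_{-1}

def bit(x0, m):
    """Value of bit m (MSB first) in bit combination x0."""
    return (x0 >> (M - 1 - m)) & 1

def cd_ak0(m, p1_prelim):
    """Eq. (7): P^p_0(x_0(m) | y^_0) from a preliminary decoding step;
    p1_prelim[mm] is the preliminary P(x_0(mm) = 1 | y^_0)."""
    p = np.zeros(2)
    for x0 in range(K):
        # P(x_0\m | y^_0): product over all bits except bit m (cf. eq. (1))
        q = 1.0
        for mm in range(M):
            if mm != m:
                q *= p1_prelim[mm] if bit(x0, mm) else 1.0 - p1_prelim[mm]
        # P(x_0(m) | x_0\m) from the joint ROM table
        same = [y for y in range(K)
                if all(bit(y, mm) == bit(x0, mm) for mm in range(M) if mm != m)]
        p[bit(x0, m)] += P_joint[x0] / sum(P_joint[y] for y in same) * q
    return p / p.sum()

def cd_ak1(m, P_prev):
    """Eq. (8): P^p_1(x_0(m) | Y^_{-1}) with P_prev = P(x_{-1} | Y^_{-1})."""
    p = np.zeros(2)
    for x_prev in range(K):
        for x0 in range(K):
            p[bit(x0, m)] += P_trans[x_prev, x0] * P_prev[x_prev]
    return p / p.sum()

def cd_ak01(m, p0, p1):
    """Eq. (9): combine both predictions, dividing out the bit distribution
    P(x_0(m)) that is contained in each of them."""
    p_m = np.array([sum(P_joint[x0] for x0 in range(K) if bit(x0, m) == b)
                    for b in (0, 1)])
    p = p0 * p1 / p_m
    return p / p.sum()

p0 = cd_ak0(0, np.array([0.8, 0.6, 0.55]))   # toy preliminary bit probabilities
p1 = cd_ak1(0, np.full(K, 1.0 / K))          # toy previous-frame distribution
print(cd_ak01(0, p0, p1))                    # combined prediction, eq. (9)
```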
IV. BIT-MAPPING AND MULTIPLEXING

In this paper we only use the Lloyd-Max quantizer. The mapping of the quantization levels to the bit combinations influences the bit error rate after decoding with a priori information and also the parameter SNR, which is an indicator of quality. In [10] the dependence of the parameter SNR on some bit-mappings is shown. In the following we extend this work by showing the amount of a priori information which can be used for channel decoding.

The mutual information is a measure of the dependence of two random variables. The introduced decoding method uses the dependence between one bit and the other bits of a quantized parameter, captured by the mutual information $I(X_0(m); X_{0 \setminus m})$, and also the dependence between the previous and the current parameter, for which we use the mutual information $I(X_0(m); X_{-1})$. Also the entropy $H(X_0(m))$ or, respectively, the so-called bit redundancy $1 - H(X_0(m))$ of each considered bit gives a hint on how much can be gained by using a priori information. The bit redundancy is included in the calculation of the a priori probabilities in both (7) and (8). Only the natural binary and the folded binary mapping are taken into account.

In Table I the bit redundancy and the mutual informations are shown. A Lloyd-Max quantizer and a time correlation of $\rho = 0.8$ are assumed. The three rows for each bit-mapping give the values for the first, second, and third bit of a parameter; for space reasons, $X_0(m)$ is abbreviated as $X$. The fifth column shows the mean square error (MSE) for each of the three bits. It is calculated as the squared distance between the error-free parameter and the parameter with an error in the considered bit, weighted by the probability that the parameter takes this value:

$$e^2(m) = \sum_{x_0} p(x_0) \left( v(x_0) - v\left(x_0^{er(m)}\right) \right)^2, \qquad (10)$$

where $x_0^{er(m)}$ denotes the parameter $x_0$ with an error at the $m$-th bit. There is a strong dependence between the mean square error and the parameter SNR.

TABLE I
MUTUAL INFORMATION, ENTROPY AND MEAN SQUARE ERROR FOR DIFFERENT BIT-MAPPINGS OF 3-BIT QUANTIZED PARAMETERS.

Bit-mapping            bit m | 1 - H(X) | I(X; X_{0\m}) | I(X; X_{-1}) | MSE
natural binary             0 | 0.0      | 0.170         | 0.373        | 5.029
(000,001,010,011,          1 | 0.0      | 0.147         | 0.049        | 1.506
 100,101,110,111)          2 | 0.0      | 0.049         | 0.0          | 0.377
folded binary              0 | 0.0      | 0.0           | 0.373        | 3.871
(011,010,001,000,          1 | 0.125    | 0.022         | 0.174        | 1.506
 100,101,110,111)          2 | 0.027    | 0.022         | 0.029        | 0.377
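The last two columns of Table I can be approximated with the following sketch of (10) and the bit redundancy; the Lloyd-Max levels for a unit-variance Gaussian are standard textbook values, and since our simple level-probability model need not coincide exactly with the authors' setup, the printed numbers should be close to, but need not exactly match, Table I.

```python
import numpy as np
from math import erf, sqrt

M, K = 3, 8
# Lloyd-Max reproduction levels for a unit-variance Gaussian source (assumed)
v = np.array([-2.152, -1.344, -0.756, -0.245, 0.245, 0.756, 1.344, 2.152])
thr = (v[:-1] + v[1:]) / 2.0                     # decision thresholds
cdf = np.array([0.0] + [0.5 * (1 + erf(t / sqrt(2))) for t in thr] + [1.0])
p_level = np.diff(cdf)                           # p(x_0) per quantizer level

# bit combination assigned to quantizer level 0..7 (Table I)
natural = [0b000, 0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111]
folded  = [0b011, 0b010, 0b001, 0b000, 0b100, 0b101, 0b110, 0b111]

def mse_per_bit(combo_of_level):
    """Eq. (10): mean square error if exactly bit m is in error."""
    level_of = {c: i for i, c in enumerate(combo_of_level)}
    e2 = np.zeros(M)
    for m in range(M):
        for lev, c in enumerate(combo_of_level):
            err_lev = level_of[c ^ (1 << (M - 1 - m))]   # flip bit m
            e2[m] += p_level[lev] * (v[lev] - v[err_lev]) ** 2
    return e2

def bit_redundancy(combo_of_level):
    """1 - H(X_0(m)), the bit redundancy of Table I."""
    red = np.zeros(M)
    for m in range(M):
        p1 = sum(p for c, p in zip(combo_of_level, p_level)
                 if (c >> (M - 1 - m)) & 1)
        h = -sum(p * np.log2(p) for p in (p1, 1 - p1) if p > 0)
        red[m] = 1.0 - h
    return red

for name, mapping in (("natural", natural), ("folded", folded)):
    print(name, mse_per_bit(mapping).round(3), bit_redundancy(mapping).round(3))
```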
The mutual information as a function of the bit-mapping indicates what kind of bit-mapping should be used. For example, if there is a very high 0th-order correlation (CD/AK0) and only the bit error rate is important, the natural binary mapping outperforms the folded binary mapping [4]. Its disadvantage is the higher MSE, which leads to worse performance in terms of the parameter SNR (Fig. 4). The MSE also gives a first indication of how unequal error protection should be applied; e.g., a bit with a four times higher MSE should have one fourth the error rate to achieve the same overall MSE. We do not go into further detail here.

In the following, the multiplexing before channel coding is considered. Since (7) and (8) assume the channel-decoded information (1) of the bits of one parameter to be independent, these bits were distributed over the trellis with a distance of at least 20 within the block. Simulations also showed that the results degrade if the bits belonging to one parameter are placed close to each other, so that the independence assumption does not hold.
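A minimal sketch of this bit spreading, assuming a block of $L = 100$ source bits as in Section V; the concrete positions are our own illustration.

```python
L_BLOCK, M, MIN_DIST = 100, 3, 20

def spread(first_pos):
    """Place the M bits of one parameter at least MIN_DIST apart in the
    frame of L_BLOCK source bits, as assumed by (7) and (8)."""
    positions = [first_pos + m * MIN_DIST for m in range(M)]
    assert positions[-1] < L_BLOCK, "parameter does not fit into the block"
    return positions

print(spread(10))   # e.g. [10, 30, 50]
```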
[Fig. 3. Bit error rate (BER, log scale) over $E_s/N_0$ (dB) of the 1st bit of a parameter quantized with 3 bits and natural binary mapping; curves for CD/no, CD/AK0, CD/AK1, and CD/AK0+1.]

[Fig. 4. Parameter SNR (dB) over $E_s/N_0$ (dB) for folded binary (upper curve) and natural binary (4 lower curves) mapping using AK0 and AK1 in SD and CD; curves for (SD/AK1, CD/no (folded)), (SD/AK1, CD/no), (SD/AK1, CD/AK0+1), (SD/no, CD/AK0+1), and (SD/no, CD/no).]
Furthermore, 20 correlated parameters were considered. Again, the bits belonging to one parameter have to be distributed over the trellis, but now there are many ways to do the multiplexing. The one which delivers good results will be discussed in Section V.
V. SIMULATION RESULTS

We perform simulations according to Fig. 1. A Gaussian parameter with correlation $\rho = 0.8$ is Lloyd-Max quantized with $M = 3$ bits. We take a block length of 100 information bits, which are encoded by a rate-1/2, constraint length 5, recursive systematic convolutional code. First, we consider one correlated parameter within one frame; the others are dummies and are not considered for evaluation. As mentioned, the three bits of this parameter are distributed over the whole frame. Fig. 3 shows that the bit error rate decreases depending on which a priori information is used. A comparison with Table I also shows that the bit redundancy and the two mutual informations are a good measure of the gain in bit error rate obtained by using a priori information. Since the natural binary mapping has no bit redundancy itself (see Table I), the gains of CD/AK0 and CD/AK1 add up in CD/AK0+1.

Fig. 4 shows the performance if a priori knowledge is used in the source decoder and/or the channel decoder. The four lower curves indicate that we gain a lot by using a priori information, independent of where it is used and of the bit-mapping. Furthermore, it turns out that the best overall performance is achieved without any usage of a priori knowledge in the channel decoder (SD/AK1, CD/no). The SNR of the best scheme with a priori knowledge used in the channel decoder (SD/AK1, CD/AK0+1) is worse at all channel qualities, although, according to Fig. 3, it reduces the bit error rate the most! The best curve shows that the folded binary mapping leads to a higher SNR, since its average MSE is lower than that of the natural binary mapping (see Table I).

These results disprove the common opinion that the reduction of the bit error rate should be the principal goal of channel decoding. Seen from a combined source/channel decoding point of view, we showed for the given system settings that it is advantageous to use the source redundancy within the source decoder alone, employing softbit source decoding [1]. We expect the reason to be that the channel decoder's a priori knowledge does not really exploit the parameter statistics, but is instead tied to the bit level. This result also shows that for a system where only one parameter is considered, the a priori information cannot be used twice.

However, looking at the bit error rate as a function of the bit position in the trellis [11], it is obvious that, due to the constraint length of the code, neighboring bits also gain in BER through the bit with a priori information. Therefore, 20 correlated parameters within one frame are considered. These are not correlated with each other, but each is correlated in time from frame to frame. In our investigations we found that the following multiplexing before convolutional coding delivers the best results: in a block of 99 bits (33 parameters), the 20 MSBs of the correlated parameters are placed in the middle of the block; to their left and right, the 20 second bits are placed first (10 on each side) and then the third bits. Finally, we put 20 bits at the beginning and 19 bits at the end of the block (13 dummy parameters), so that the results are not influenced by the defined start and end of the code.
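The resulting 99-bit arrangement can be sketched as follows; the ordering inside each group is our own illustrative choice, but the MSB positions 40..59 and the dummy positions 0..19 and 80..98 match the description above and Fig. 5.

```python
# Sketch of the multiplexing of 33 parameters * 3 bits = 99 source bits,
# following the description in the text.
n_corr = 20                                         # correlated parameters
msb    = [f"P{i}b0" for i in range(n_corr)]         # MSBs of parameters 0..19
second = [f"P{i}b1" for i in range(n_corr)]         # 2nd bits
third  = [f"P{i}b2" for i in range(n_corr)]         # 3rd bits
dummy  = [f"D{i}" for i in range(39)]               # 13 dummy parameters * 3 bits

block = (dummy[:20]                                 # 20 dummy bits at the start
         + third[:10] + second[:10]                 # left flank (2nd bits inner)
         + msb                                      # 20 MSBs in the middle
         + second[10:] + third[10:]                 # right flank
         + dummy[20:])                              # 19 dummy bits at the end
assert len(block) == 99
print(block[40:60])                                 # MSBs occupy positions 40..59
```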
Since we found that combining the SD/AK1 and the CD/AK0+1 scheme leads to worse results than SD/AK1 alone, we were looking for a way to use the a priori information in the channel decoder while keeping it invisible to the source decoder. For this purpose, the so-called extrinsic information was introduced in "Turbo" decoding [12, 13]. In the same manner we "subtract" (in the log domain) the a priori information $P_{01}^p$ of (9) which was used in channel decoding. The source decoder then gets reliability information $P^d(x_0(m))$ about each bit which depends on the received channel values and on the information from all other parameters, but not on the parameter itself. This information, together with the a priori information $P^p$ in (5), is then used in the source decoder.
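A sketch of this log-domain subtraction in the spirit of extrinsic information [12, 13]; the function names are ours, and probabilities are assumed to lie strictly between 0 and 1.

```python
import numpy as np

def llr(p1):
    """Log-likelihood ratio log(P(bit=1) / P(bit=0)) of a bit probability."""
    return np.log(p1) - np.log1p(-p1)

def subtract_apriori(p_dec, p_apriori):
    """Remove the a priori prediction P^p_01 of (9), already used inside the
    channel decoder, from the decoder output P^d(x_0(m)); the result depends
    on the channel values and the other parameters, but not on the
    statistics of the parameter itself."""
    L_ext = llr(p_dec) - llr(p_apriori)       # "subtraction" in the log domain
    return 1.0 / (1.0 + np.exp(-L_ext))       # back to a probability

# usage: decoder says P(bit=1) = 0.9, of which the a priori part was 0.7
print(subtract_apriori(0.9, 0.7))             # extrinsic P(bit=1), approx. 0.794
```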
In Fig. 5 the gain in bit error rate for decoding with CD/AK0+1 is very high, especially for the MSBs (pos. 40..59). Compared to the gain in Fig. 3, where the BER for CD/AK0+1 is reduced to approximately 1/2, here it is reduced to almost 1/4. This reflects the influence of the neighboring bits in the trellis of a convolutional code. After "subtracting" the a priori knowledge of the parameter itself (CD/AK0+1 - Ap), the BER increases compared to CD/AK0+1, but a large gain in BER remains.

[Fig. 5. Bit error rate over the bit number (0..98) for natural binary mapping after channel decoding using a priori information (AWGN channel at -3 dB); curves for CD/no, CD/AK0+1 - Ap, and CD/AK0+1.]

In Fig. 6 things behave differently. The two lower curves with CD/no are the same as in Fig. 4, but the curve with (SD/no, CD/AK0+1) now delivers better quality. That means that if we use the information only once, it should be used in the channel decoder. Had we used it twice without "subtracting" the a priori information of the bit itself, the SNR would be almost the same as for (SD/no, CD/AK0+1), so we do not show this curve here. The reason why we no longer gain in this way (or even lose with only one parameter) is the double weighting of the a priori information in the reliability calculations of the source decoder. After "subtracting", however, we get another gain of up to 1 dB in the parameter SNR, which gives the answer to the question "Can a priori information be used twice?": yes. This gain also carries over to systems where some parameters are redundant and others are not, as can be seen in Fig. 5, where the dummy bits gain as well (pos. 0..19, 80..98).

[Fig. 6. Parameter SNR (dB) over $E_s/N_0$ (dB) for natural binary bit-mapping with and without a priori information; curves for (SD/AK1, CD/AK0+1 - Ap), (SD/no, CD/AK0+1), (SD/AK1, CD/no), and (SD/no, CD/no).]

VI. CONCLUSIONS

We showed by simulation that a priori information can be used twice if it is subtracted after channel decoding and if more than one correlated parameter is considered. For the source decoder this means that the reliability information delivered by the channel decoder does not contain any a priori information about the parameter itself. The results can be extended to a complete speech transmission system, where some parameters are highly correlated and some are not. Furthermore, we showed that the bit-mapping strongly determines which kind of a priori information delivers the highest gain in channel decoding. Also the mean square error, one possible optimization criterion, changes with the bit-mapping. Therefore, the bit-mapping has to be chosen depending on the optimization criterion.

REFERENCES

[1] T. Fingscheidt and P. Vary, "Robust Speech Decoding: A Universal Approach to Bit Error Concealment," in Proc. of ICASSP'97, Munich, Germany, Apr. 1997, vol. 3, pp. 1667–1670.
[2] T. Fingscheidt and P. Vary, "Softbit Speech Decoding: A New Approach to Error Concealment," IEEE Transactions on Acoustics, Speech and Signal Processing, accepted for publication, Nov. 1999.
[3] N. Görtz, "Joint Source Channel Decoding Using Bit-Reliability Information and Source Statistics," in Proc. of IEEE International Symposium on Information Theory, MIT, Massachusetts, Aug. 1998, p. 9.
[4] T. Hindelang and A. Ruscitto, "Kanaldecodierung mit Apriori-Wissen bei nicht binären Quellensymbolen," in Proc. of 2. ITG-Fachtagung "Codierung für Quelle, Kanal und Übertragung", Aachen, Germany, Mar. 1998, pp. 163–167, VDE-Verlag.
[5] S. Heinen, A. Geiler, and P. Vary, "MAP Channel Decoding by Exploiting Multilevel Source A Priori Knowledge," in Proc. of EPMCC, Bonn, Germany, Oct. 1997, pp. 467–473.
[6] J. Hagenauer, "Source-Controlled Channel Decoding," IEEE Transactions on Communications, vol. 43, no. 9, pp. 2449–2457, Sept. 1995.
[7] L. Papke and J. Hagenauer, "Quellengesteuerte Kanaldecodierung für ADPCM und LD-CELP Sprachcodecs," in Proc. of 9th Aachen Symposium "Signaltheorie", Aachen, Germany, Mar. 1997, pp. 225–228.
[8] T. Hindelang, J. Hagenauer, and S. Heinen, "Source Controlled Channel Decoding of Quantized and Correlated Symbols," in Proc. of 3. ITG Conference "Source and Channel Coding", Munich, Germany, Jan. 2000, pp. 251–258, VDE-Verlag.
[9] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate," IEEE Transactions on Information Theory, vol. 20, pp. 284–287, Mar. 1974.
[10] N. Farvardin and V. Vaishampayan, "Optimal Quantizer Design for Noisy Channels: An Approach to Combined Source-Channel Coding," IEEE Transactions on Information Theory, vol. 33, no. 6, pp. 827–838, Nov. 1987.
[11] M. Kaindl and T. Hindelang, "Estimation of Bit Error Probabilities Using A Priori Information," in Proc. of IEEE GLOBECOM'99, Rio de Janeiro, Brazil, Dec. 1999, vol. 5, pp. 2422–2426.
[12] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes," in Proc. of IEEE Int. Conf. on Communications, Geneva, Switzerland, May 1993, pp. 1064–1070.
[13] J. Hagenauer, E. Offer, and L. Papke, "Iterative Decoding of Binary Block and Convolutional Codes," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 429–445, Mar. 1996.