Channel Coding Techniques for Adaptive Multi Rate Speech Transmission
Thomas Hindelang1, Joachim Hagenauer1, Max Schmautz2, Wen Xu2
1 Institute for Communication Engineering (LNT), Munich University of Technology (TUM), Arcisstr. 21, 80290 Munich, Germany. Email: [email protected]
Abstract: A variable channel coding scheme for Adaptive Multi Rate (AMR) speech transmission over mobile radio channels is proposed. Although it was developed for GSM (Global System for Mobile Communications), the basic concept of variable channel coding can be adapted to other digital radio systems. The new AMR concept allows almost wire-line speech quality even under poor channel conditions by dynamically splitting the gross bit rate between source (speech) and channel coding according to the channel quality. In this study we present some new aspects of AMR and show the advantage of recursive systematic convolutional (RSC) codes for mobile speech transmission. With some modification they improve not only the bit error rate but also the frame erasure rate. We show that RSC codes can be decoded with a “standard” non-systematic Viterbi decoder and a simple transformation. A new, powerful approach is employed for detecting the currently used mode (in-band signaled information): the mode bits are integrated into a long block, exploiting the fact that codes gain with their length.
2 Dept. of Mobile Phone Development, Siemens AG, Hofmannstr. 51, 81359 Munich, Germany. Email: [email protected]
I. INTRODUCTION
Due to a strongly varying transmission environment, mobile communication systems such as the current GSM suffer from a non-optimum partition between source coding and channel coding rate. For example, in GSM the source and channel coding rates remain fixed, independent of the channel quality. Under bad channel conditions (deep fading) the redundancy inserted by channel coding might be insufficient to correct transmission errors, so that the speech signal cannot be properly reconstructed, which leads to very annoying artifacts. On the other hand, for good channels the overall speech quality can be improved if more bits are spent on source coding. Current digital radio systems are designed as a compromise between channel coding powerful enough to remove most transmission errors and sufficiently good speech quality.
Now, the so-called adaptive multi rate (AMR) speech codec is standardized for GSM in ETSI (European Telecommunications Standards Institute). The AMR concept solves the source channel rate allocation problem in a more intelligent way. The ratio between source bit rate and error protecting redundancy is adapted according to the channel conditions. When the channel is bad, the source encoder operates at low bit rates with lower speech quality, allowing more bits to be used for powerful forward error correction. The highest rate of the speech encoder is used for good channels, since in this case weak error protection is sufficient.
In this paper we present some new aspects of channel coding for mobile speech transmission and a new approach for transmitting signaling information. First, a brief description of the AMR concept is given in section II. The channel coding with unequal error protection (UEP) and recursive systematic convolutional (RSC) codes is described in section III. It is extended by an approach for decreasing the frame erasure rate and by a derivation of decoding RSC codes with a standard non-systematic decoder. In section IV the generation and in-band transmission of the mode bits is outlined and compared to the standardized system with short block codes.

II. THE AMR CONCEPT
The basic concept of an AMR speech transmission system is depicted in Figure 1.
Fig. 1. Overview of the GSM AMR concept. (Block diagram: mobile station and base station, each with SP-Enc, SP-Dec, CH-Enc, CH-Dec, CH-Est, and a control unit, connected by the UL and DL channels.)
Both the base station (BS) and the mobile station (MS) mainly consist of the following functional entities:
- a speech codec with variable bit rate (SP-Enc, SP-Dec),
- a channel codec with variable error protection rate, matched to the bit rate of the speech codec (CH-Enc, CH-Dec),
- a channel estimation entity (CH-Est),
- a control unit for the rate adaptation.
In the studied system, the BS is the master and decides on the modes (rates) in both uplink (UL) and downlink (DL). The MS decodes the modes used in UL and DL and sends the estimated channel metric to the BS. A channel quality parameter is derived from the soft output generated by the equalizer; it is used to control the codec mode (rate) in UL and DL. The AMR concept for both uplink and downlink of the GSM speech transmission is outlined below in more detail.
Uplink: After initialization the mobile starts transmission with the lowest speech bit rate, ensuring a secure transmission. The mode bits (i.e., the information about the used bit rate) and the DL channel metric (information about the DL channel quality) are sent to the channel encoder and transmitted in-band.
Hence, no control channel is necessary for the rate and channel quality indication. At the receiver (i.e., the BS) appropriate channel decoding (including the detection of the in-band information) is done first, followed by speech decoding. In parallel, the UL channel quality is measured by the BS channel estimator. The measured UL channel quality and the detected DL channel quality metric are fed to the BS control unit, which determines the current DL rate (based on channel metric analysis) and the requested UL rate (based on the measured UL channel quality).
Downlink: The current DL mode as well as the requested UL rate are transmitted in-band to the MS. The MS performs channel and speech decoding according to the detected DL rate (mode). Similar to the UL, a DL channel quality measurement is done by the estimator of the MS, and the requested UL rate is decoded from the received bit stream. From the measured DL channel quality a DL channel metric is calculated by the MS control unit and then transmitted in-band to the BS. The speech encoder now operates with the new requested UL rate.
In Figure 1 the dashed-dotted lines indicate the DL signal flow and the dashed lines the UL signal flow, both in the MS and BS.

III. CHANNEL CODING OF THE SOURCE ENCODED BITS
A. Unequal error protection
The coded bits of the speech encoder are of varying importance for the speech decoder in reconstructing the original speech. Thus, the corruption of speech encoded bits due to transmission errors has different impacts on the quality of the decoded speech. Considering a CELP (code excited linear prediction) speech coder, the bits of the LPC (linear predictive coding) coefficients are usually more important than those of the fixed codebook. Figure 2 shows the bit error sensitivity of a 6.1 kbits/s CELP speech codec [1].
The SegSNR (segmental signal-to-noise ratio) [2] is plotted for the bit positions 1 ... 122, i.e., the considered bit is chosen randomly and the decoded speech is compared to the speech transmitted without errors. Furthermore, the SegSNR is limited to +20 dB per segment of 20 ms. A low SegSNR indicates a greater error sensitivity. In this example, the first 9 bits of the LPC coefficients, denoting the first vector-quantized index, are very sensitive to errors, but many other bits (SegSNR > 10 dB) are robust against errors. Therefore, unequal error protection (UEP) is advantageous: the redundancy inserted by channel coding has to be greater for the most important bits than for the less important ones. That means the information bits should be classified according to their sensitivity and then protected correspondingly. The required rates after channel encoding (e.g., 22.8 kbits/s for GSM full rate) can be obtained by puncturing. More information about UEP and puncturing can be found in [3, 4].
Fig. 3. Realization of an RSC code. (Shift-register diagram with input $u_i$ and outputs $x_{2i}$ and $x_{2i+1}$.)
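The segmental SNR measure used in Section III-A can be illustrated with a minimal Python sketch. It assumes the usual definition of SegSNR as the mean of per-segment SNR values in dB, with each segment limited to +20 dB as stated in the text. The function name `segmental_snr_db`, the 8 kHz sampling rate implied by `segment_len=160`, and the handling of silent segments are assumptions made for illustration, not details taken from the paper.

```python
import math

def segmental_snr_db(reference, decoded, segment_len=160, max_db=20.0):
    """Mean per-segment SNR in dB, with each segment limited to max_db.

    segment_len=160 samples corresponds to 20 ms at an assumed 8 kHz
    sampling rate. Skipping all-zero reference segments is an assumption.
    """
    if len(reference) != len(decoded):
        raise ValueError("signals must have the same length")
    values = []
    for start in range(0, len(reference) - segment_len + 1, segment_len):
        ref = reference[start:start + segment_len]
        dec = decoded[start:start + segment_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0.0:
            continue  # silent segment: SNR undefined, skipped in this sketch
        if noise == 0.0:
            values.append(max_db)  # error-free segment reaches the upper limit
        else:
            values.append(min(max_db, 10.0 * math.log10(signal / noise)))
    if not values:
        raise ValueError("no usable segment")
    return sum(values) / len(values)
```

To obtain a sensitivity curve like the one in Fig. 2, one would corrupt a single bit position of the encoded frame, decode, and evaluate this measure against the error-free decoded speech, repeating for each bit position.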
B. Recursive systematic convolutional codes
In this study we used recursive systematic convolutional (RSC) codes with constraint length 5 for convolutional channel coding. The use of RSC instead of non-systematic convolutional (NSC) codes has several advantages. The bit error rate (BER) for typical mobile radio channels is lower or, equivalently, the channel signal-to-noise ratio (SNR) needed at the same BER (up to $10^{-5}$) is lower for RSC codes than for NSC codes. Notice that the block error rate is the same for RSC and NSC codes. The gain in bit error rate compared to NSC codes increases with the code rate. For a convolutional code of rate 2/3 and constraint length 4 we gain approx. 0.8 dB at a bit error rate of $10^{-2}$. This gain holds only if the systematic bits are not punctured.
Another advantage of RSC codes arises from the fact that the systematic bits are transmitted over the channel and hence all information bits are available in the received bit stream before channel decoding. Thus, a priori information for each bit can be calculated if there is redundancy left in the source coded bit stream [5–7] and added to the soft input outside the Viterbi decoder. That means any Viterbi decoder can exploit a priori knowledge to improve the decoding result.
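How a priori knowledge can be added outside the decoder may be sketched as follows. The sketch assumes that soft inputs are represented as log-likelihood ratios with the sign convention L = log(P(u=0)/P(u=1)); this convention, the function names, and the per-bit probability input are illustrative assumptions and are not specified in the paper.

```python
import math

def apriori_llr(p_zero):
    """A priori log-likelihood ratio log(P(u=0)/P(u=1)) for P(u=0) = p_zero."""
    return math.log(p_zero / (1.0 - p_zero))

def add_apriori(systematic_llrs, p_zero_per_bit):
    """Add the a priori LLR of each information bit to the channel soft
    value of the corresponding systematic bit, before Viterbi decoding."""
    return [llr + apriori_llr(p)
            for llr, p in zip(systematic_llrs, p_zero_per_bit)]
```

Because only the soft input of the systematic bits is modified, the Viterbi decoder itself stays unchanged, which is the point made in the paragraph above.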
Fig. 2. Bit error sensitivity of CELP coded speech. (SegSNR in dB versus bit number; bit groups in the legend: LPC, LTP-Gain, LTP-Index, CB-Gain, CB-Index.)
Fig. 3 shows an example of the realization of an RSC code with shift registers. The RSC code uses the generator polynomials $G_0 = 1 + D^3 + D^4$ and $G_1 = 1 + D + D^3 + D^4$ as defined in GSM full rate [8]. The polynomial $G_0$ is used as the feedback polynomial. Higher rates can be achieved by puncturing.
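A minimal Python sketch of such an encoder is given below, under the assumptions that the encoder starts in the all-zero state, that the even-indexed output is the systematic bit (as noted later in Section III-D), and that termination and puncturing are left out. The function name `rsc_encode` is chosen for illustration.

```python
def rsc_encode(bits):
    """Rate-1/2 RSC encoder with feedback G0 = 1 + D^3 + D^4 and
    forward polynomial G1 = 1 + D + D^3 + D^4.

    Returns x_0, x_1, ... where x_{2i} is the systematic bit u_i and
    x_{2i+1} is the parity bit. Tail bits are not handled in this sketch.
    """
    state = [0, 0, 0, 0]  # register contents a_{i-1}, a_{i-2}, a_{i-3}, a_{i-4}
    out = []
    for u in bits:
        a = u ^ state[2] ^ state[3]                   # recursion through G0
        parity = a ^ state[0] ^ state[2] ^ state[3]   # taps of G1
        out += [u, parity]
        state = [a] + state[:3]                       # shift the register
    return out
```

For a single 1 followed by zeros, the parity sequence does not die out after a few bits, which reflects the recursive (infinite impulse response) structure of the code.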
C. Improving the frame erasure rate with RSC codes and UEP
Typically, in speech transmission the bits are assigned to classes dependent on their importance for the speech quality. For an important class some error detection techniques are used like a cyclic redundancy check or a quality estimation. The result hereof determines a frame erasure.
In some modes of the GSM speech channels, e.g., TCH/AHS 7.4 kbits/s (Traffic CHannel, Adaptive multi rate Half rate Speech), the class 1 bits are protected with a rate 2/3 convolutional code obtained by puncturing a rate 1/2 code. In the following we show by an example how to modify the code to improve both the BER and the frame erasure rate (FER).
Fig. 4. Equal error protection with NSC codes and unequal error protection with RSC codes. (Upper part, EEP with NSC codes: 140 bits plus 4 tail bits at rate 2/3. Lower part, UEP with RSC codes: 70 bits at rate 7/11, 4 bits at rate 2/3, and 66 bits plus 4 tail bits at rate 7/10.)
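The rate split of Fig. 4 can be checked with a few lines of exact arithmetic. The sketch below only uses the segment lengths and code rates given for the example; the helper name `coded_bits` and the variable names are illustrative.

```python
from fractions import Fraction

def coded_bits(n_bits, rate):
    """Number of coded bits produced by n_bits input bits at the given rate."""
    return n_bits / rate

# UEP example: (input bits including tail, code rate) per segment
segments = [(70, Fraction(7, 11)), (4, Fraction(2, 3)), (66 + 4, Fraction(7, 10))]
total_in = sum(n for n, _ in segments)
total_out = sum(coded_bits(n, r) for n, r in segments)
overall_rate = total_in / total_out

# EEP reference: 140 bits plus 4 tail bits at rate 2/3
eep_out = coded_bits(140 + 4, Fraction(2, 3))
```

Both schemes map 144 input bits (140 information bits plus 4 tail bits) to the same number of coded bits, so the UEP variant redistributes redundancy without changing the overall rate of 2/3.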
In the upper part of Fig. 4 an equal error protection (EEP) scheme with termination is shown. EEP is often applied to the whole class 1. If an RSC code with the same EEP is used, the BER can be improved, but the block error rate (at least one error within the whole block) stays the same: by using RSC codes we only change the assignment of the information bits to the paths in the trellis, not the coded bits. More information about code properties can be found in [9].
In mobile speech transmission, class 1 is divided into two subclasses, 1a and 1b. Only class 1a is considered in determining the frame erasure rate. For this reason we halve the 140 bits in the example into 70 class 1a bits and 70 class 1b bits. To achieve an overall rate of 2/3, 70 bits are coded with rate 7/11, 4 bits with rate 2/3, and 70 (66 + 4 tail) bits with rate 7/10. All rates are achieved by puncturing the non-systematic bits of a rate 1/2 RSC code. A frame erasure is declared if at least one of the 70 class 1a bits is wrong.
The upper part of Fig. 5 shows the frame erasure rate, which is improved by approx. 0.5 dB. The lower part shows that the bit error rate of class 1b is lower than the BER of the NSC code up to a value of 8 dB; for better channels it is slightly worse. The class 1a bit error rate is now more than a factor of 2 below the curve of the NSC code. Note that this gain is achieved without increasing complexity.
For the simulation results shown in Fig. 5 and for all further results we used a block fading channel, which means the coded bits are distributed over 8 time slots (exactly as in GSM full rate [8, section 3.1.3]). Inside one slot the fading amplitude is kept constant, but between the slots it is statistically independent and Rayleigh distributed. This corresponds approximately to typical mobile channels at low speed with ideal frequency hopping, but with an additional gain of 3 to 4 dB because losses in synchronization, channel estimation, and equalization vanish.
Fig. 5. Frame erasure rate (upper part) and bit error rate (lower part) of the NSC code with EEP and the RSC code explained in Fig. 4. (Both versus ES/N0 in dB. FER curves: NSC code with EEP, RSC code with UEP. BER curves: uncoded, NSC code with EEP, RSC code class 1b, RSC code class 1a.)

D. Decoding an RSC code with a “standard” NSC decoder
The RSC decoding can be realized by using the NSC decoder for the equivalent non-systematic code. Fig. 6 shows the corresponding realization of the RSC encoder: first, the sequence $u_i$ is transformed to $u'_i$ and then encoded by an NSC code. Note that the coded bit $x_{2i}$ is the systematic bit. After the NSC decoding, the preceding transformation is undone by a simple shift register with appropriate taps but without feedback. This leads to exactly the same hard decisions as decoding with an RSC decoder. However, the exact reliability information (MAP probabilities [9]) for each decoded bit can usually be generated only by an RSC decoder. The advantage of implementing RSC decoding with an NSC decoder is that existing hardware (e.g., the standard Viterbi decoder for NSC codes) can still be used to decode the RSC code, and the complexity of calculating the bits $u_i$ from $u'_i$ is negligible.
Fig. 6. Realization of the RSC code shown in Fig. 3 by a recursion and an NSC code. (Signal flow: $u_i$, recursion, $u'_i$, NSC encoder, outputs $x_{2i}$ and $x_{2i+1}$.)

E. Comparison of standard channel coding for GSM full rate and a new UEP scheme with RSC codes
The advantages shown in paragraphs III-A to III-D were the reason to modify the GSM full rate standard with a new UEP channel coding scheme and RSC codes. For the implemented UEP scheme the channel coding rates used are shown in Table I, where 3 bits are additionally reserved for mode indication or a parity check.
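The equivalence used in Section III-D (recursion followed by an NSC encoder, and a feedback-free shift register to undo the recursion after decoding) can be sketched in Python as follows. The polynomials are those of Fig. 3; the function names are illustrative, the sequences are assumed to start from the all-zero state, and the Viterbi decoder itself is omitted.

```python
def _tap(seq, i, delay):
    """Value of seq delayed by `delay` positions (zero before the start)."""
    return seq[i - delay] if i >= delay else 0

def precode(u):
    """Recursion u -> u': u'_i = u_i + u'_{i-3} + u'_{i-4} (division by G0)."""
    v = []
    for i, bit in enumerate(u):
        v.append(bit ^ _tap(v, i, 3) ^ _tap(v, i, 4))
    return v

def undo_precode(v):
    """Shift register without feedback: u_i = u'_i + u'_{i-3} + u'_{i-4}."""
    return [v[i] ^ _tap(v, i, 3) ^ _tap(v, i, 4) for i in range(len(v))]

def nsc_encode(v):
    """Rate-1/2 NSC encoder with G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4."""
    out = []
    for i in range(len(v)):
        x_even = _tap(v, i, 0) ^ _tap(v, i, 3) ^ _tap(v, i, 4)
        x_odd = _tap(v, i, 0) ^ _tap(v, i, 1) ^ _tap(v, i, 3) ^ _tap(v, i, 4)
        out += [x_even, x_odd]
    return out
```

In a receiver, a standard Viterbi decoder for the NSC code would deliver an estimate of $u'_i$, to which `undo_precode` is then applied. Since this step is a three-tap XOR per bit, its cost is small compared with the trellis search.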
are better protected than in the GSM standard, although some less important bits (e.g., bits 42 ... 92) are less protected. As a result, better subjective quality is obtained, as confirmed by our informal subjective tests. The improvement is clearly audible and objectively measurable in terms of the speech SNR. For the UEP scheme presented, the use of one parity check (with 3 bits as in GSM full rate) for error concealment might not be a good solution. Instead, one may use several parity checks, each protecting only a small number of bits. Possible solutions without parity checks were proposed in [10, 11]. A new proposal for the error concealment in the AMR codec is described in [1].
TABLE I
UEP SCHEME FOR THE SPEECH CODEC MODE WITH 13.0 KBITS/S.

bit     0 ... 18     19 ... 41     42 ... 100    101 ... 160
rate    2/5          1/2           2/3           3/4

bit     161 ... 219  220 ... 243   244 ... 262
rate    2/3          1/2           2/5
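The bit budget implied by Table I can be checked with exact arithmetic. The sketch below assumes a 20 ms frame, so that 22.8 kbits/s corresponds to 456 coded bits per frame, and it assumes that the rate 3/4 belongs to bits 101 ... 160, which is a reading of the table layout rather than something stated in the text. The variable names are illustrative.

```python
from fractions import Fraction

# (first bit, last bit, code rate); the rate 3/4 for bits 101...160 is an
# assumption based on the table layout.
table = [
    (0, 18, Fraction(2, 5)), (19, 41, Fraction(1, 2)), (42, 100, Fraction(2, 3)),
    (101, 160, Fraction(3, 4)), (161, 219, Fraction(2, 3)),
    (220, 243, Fraction(1, 2)), (244, 262, Fraction(2, 5)),
]
info_bits = sum(last - first + 1 for first, last, _ in table)
coded_bits = sum((last - first + 1) / rate for first, last, rate in table)
frame_bits = Fraction(22800 * 20, 1000)  # 22.8 kbits/s over an assumed 20 ms frame
remaining = frame_bits - coded_bits       # coded bits left, e.g. for termination
```

Under these assumptions the 263 input bits (260 speech bits of the 13.0 kbits/s mode plus the 3 reserved bits) occupy 446 of the 456 coded bits. The remaining 10 coded bits would fit, for example, 4 tail bits at rate 2/5, although the paper does not state how they are used.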
To compare the performance, the new 13.0 kbits/s channel codec with UEP is used in both recursive systematic and non-systematic form. The coded bits are punctured such that a channel bit rate of 22.8 kbits/s results [8]. Figure 7 shows the BER of the UEP/NSC and UEP/RSC schemes and of the GSM full rate standard for channel SNRs of 2 dB and 6 dB, where the channel model used is a block fading channel (see par. III-C). For better comparison with the new UEP scheme, the uncoded class 2 bits of the GSM standard are placed in the middle (bits 93 ... 170) of the frame of information bits. Note that for bad channels the NSC code yields a higher BER than the uncoded bits.
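The block fading channel model referred to above (see par. III-C) can be sketched as follows: one Rayleigh-distributed amplitude per time slot, constant within the slot and independent between slots. The mapping of coded bit k to slot k mod 8, the normalization to unit mean-square amplitude, and the function name are illustrative assumptions; the actual GSM interleaver of [8] is not reproduced here.

```python
import math
import random

def block_fading_amplitudes(n_bits, n_slots=8, rng=None):
    """Return one fading amplitude per coded bit for a block fading channel.

    Each slot gets an independent Rayleigh amplitude with mean square 1.
    Mapping bit k to slot k % n_slots is an assumption for illustration.
    """
    rng = rng or random.Random()
    slot_amp = [
        math.hypot(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)
        for _ in range(n_slots)
    ]
    return [slot_amp[k % n_slots] for k in range(n_bits)]
```

A simulation would multiply each transmitted symbol by its amplitude, add Gaussian noise according to the chosen ES/N0, and pass the result, together with the amplitudes as channel state information, to the soft-input decoder.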
IV. IN-BAND TRANSMISSION OF MODE BITS
As mentioned in section II, the rate-indicating mode bits have to be protected by a powerful code to avoid wrong rate adaptation, which may lead to annoying artifacts due to wrong speech decoding. With code termination the first and last decoded bits have a lower BER than the bits in the middle because of the known initial and final state of the decoder. Therefore, placing the very important mode bits at the very beginning of the frame of information bits and performing a joint encoding/decoding using UEP together with the speech bits ensures a very secure in-band transmission of mode bits. In contrast, employing a short block code to protect the mode bits separately is less powerful. Because of the joint coding/decoding of mode bits together with speech bits, it is necessary to use the same coding scheme for the first part of the frame for all source rates. With such a hierarchical coding scheme the complete trellis of the first part can be built without knowing the actual block size, and the mode bits can then be decoded. The length of the first part, which is common to all rates, is typically 5m, where m is the memory of the convolutional coder. This takes the constraint length of the code into consideration and hence achieves a lower BER than building the trellis only for the first few mode bits. Having determined the mode bits, the block size and the rates used are known, so that the rest of the received bit stream can be properly decoded.
Fig. 7. Performance of RSC and UEP for a block fading channel. (BER versus bit number; curves: GSM TCH FS at 2 dB and 6 dB, new UEP with NSC at 2 dB, new UEP with RSC at 2 dB and 6 dB.)
Fig. 8. Hierarchical channel coding structure with UEP for different source bit rates.
As shown in paragraph III-A, only a small number of speech encoded bits have a strong impact on the quality of the reconstructed speech in case of errors (see Fig. 2). In our UEP scheme, these bits are placed in the first part (e.g., bits 0 ... 18) and in the last part (bits 244 ... 262) of the frame, and hence
Fig. 8 shows the hierarchical coding scheme for 4 different speech rates with the corresponding channel coding rates, where 3 bits are reserved for the rate indication. In this example the length of the common part is 23 bits. Due to the joint channel coding, a plausibility check of the decoded mode bits can also be carried out by comparing the values of the metrics in conjunction with the estimated channel quality. Thus, wrong rate adaptation can normally be avoided.

The detection of mode bits in the new AMR standard and in a hierarchical scheme
The approach for in-band signaling described above is very powerful. To show its advantage we compare the AMR standard at 5.9 kbits/s [8, section 3.9.4.4, TCH/AFS5.9] with a modified hierarchical version of it. This is realized as follows: In the AMR standard, the two mode bits are separately encoded by a rate 1/4 block code. Here, they are integrated into the block of information bits. This results in 126 bits. These are encoded with a rate 1/4 convolutional code [8], resulting in 528 (520 in [8]) coded bits. Instead of puncturing the bits C(0), C(1), C(3) (see [8]), the bits C(501), C(505), C(508) are punctured. This leads to a modest increase of the BER at the end of the block of information bits but a decrease at the beginning. Finally, all puncturing positions are shifted right by 8, leaving the first 8 bits (coded mode bits) untouched. The receiver decodes the first 23 information bits and decides upon the first two bits (mode bits). Depending on this decision, decoding continues.
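The bit counts quoted for the two schemes can be cross-checked with a short calculation. The sketch assumes a gross frame of 456 bits (22.8 kbits/s over 20 ms) for both schemes and infers the number of tail bits from the quoted 126 information bits and 528 coded bits; neither value is stated explicitly in the text, and the variable names are illustrative.

```python
RATE_INV = 4        # rate 1/4: four coded bits per encoder input bit
FRAME_BITS = 456    # assumed gross frame size (22.8 kbits/s * 20 ms)

# Hierarchical scheme: 126 information bits (including the 2 mode bits)
hier_info = 126
hier_coded = 528
tail_bits = hier_coded // RATE_INV - hier_info      # inferred, not stated
hier_punctured = hier_coded - FRAME_BITS

# Standard scheme: the 2 mode bits get a separate rate 1/4 block code
std_mode_coded = 2 * RATE_INV                        # 8 coded mode bits
std_coded = (hier_info - 2 + tail_bits) * RATE_INV   # convolutionally coded bits
std_punctured = std_coded - (FRAME_BITS - std_mode_coded)
```

Under these assumptions both schemes puncture the same number of coded bits, which is consistent with the description that the hierarchical variant only moves a few puncturing positions and shifts the pattern by 8.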
As shown in Fig. 9, there is a gain of 3 to more than 4 dB if the in-band signaling information is integrated into the whole block of information bits. The minimum distance between two code words of the short block code standardized for TCH/AFS is 5. For a rate 1/3 convolutional code with constraint length 7 (5) the free distance is 14 (12), and for rate 1/4 it is 20 (15) [9]. The complexity increase is modest: e.g., if a soft output Viterbi algorithm [5] is used, merely the decision feedback has to be done twice for the first 23 bits. This feedback is much less complex than the forward trellis construction, which runs through in one pass. In this example, the hierarchical system was integrated into the standard with small modifications. Remember that this approach needs the same code and the same puncturing for the first part of a block (mode bits + approx. 20 speech bits) for all rates. Furthermore, a priori information can be exploited in the hierarchical scheme to improve the results.
Fig. 9. Mode errors in the TCH/AFS standard and in the hierarchical approach (transmission over a block fading channel). (Mode error rate versus ES/N0 in dB; curves: TCH/AFS standard, hierarchical approach; marked gain: 3.9 dB.)

V. CONCLUSIONS
We described channel coding and adaptation algorithms for GSM AMR speech transmission. The methods presented here can also be used for other mobile transmission systems such as the third generation mobile telecommunication standard. The channel coding scheme was designed to be as hierarchical as possible. The mode bits are coded together with the source (speech) bits to ensure better protection. By using RSC codes instead of conventional NSC codes, better performance can be obtained, especially in mobile communication environments. The channel coding algorithms presented in this study have been combined with the VR-CELP speech coding and the error concealment algorithms described in [1] and the channel estimation and rate adaptation techniques shown in [12] to build an AMR codec proposal for GSM speech transmission [13]. Subjective tests were carried out to evaluate this AMR codec proposal under the conditions defined at the ETSI SMG11 AMR standardization meetings. The test results showed that it met most of the qualification requirements and constraints [14].

ACKNOWLEDGMENTS
The authors are grateful to Prof. Peter Vary, Stefan Heinen and Marc Adrat, with RWTH Aachen, and Dr. Stefan Oestreich and Dr. Juergen Paulus, with Siemens AG, for many fruitful discussions. Siemens AG, Munich, Germany, has supported this work.

REFERENCES
[1] S. Heinen, M. Adrat, O. Steil, P. Vary, and W. Xu, “A 6.1 to 13.3 kbit/s Variable Rate CELP Codec (VR-CELP) for AMR Speech Coding,” in Proc. of ICASSP’99, Phoenix, Arizona, Mar. 1999.
[2] N.S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1984.
[3] J. Hagenauer, “Rate-Compatible Punctured Convolutional Codes (RCPC Codes) and their Applications,” IEEE Transactions on Communications, vol. 36, no. 4, pp. 389–400, Apr. 1988.
[4] J. Hagenauer and T. Stockhammer, “Channel coding and transmission aspects for wireless multimedia,” to appear in Proceedings of the IEEE - Special Issue on Video Transmission for Mobile Multimedia applications, 2000.
[5] J. Hagenauer, “Source-Controlled Channel Decoding,” IEEE Transactions on Communications, vol. 43, no. 9, pp. 2449–2457, Sept. 1995.
[6] A. Ruscitto and T. Hindelang, “Channel Decoding Using Residual Intra-Frame Correlation in a GSM System,” IEE Electronics Letters, vol. 33, no. 21, pp. 1754–1755, Oct. 1997.
[7] T. Hindelang, T. Fingscheidt, N. Seshadri, and R.V. Cox, “Combined Source/Channel (De)Coding: Can A Priori Information Be Used Twice?,” in Proc. of ICC’2000, New Orleans, Louisiana, June 2000.
[8] “Recommendation GSM 05.03 Channel Coding,” ETSI TC-SMG, Version 8.3.0, Release 1999.
[9] R. Johannesson and K.S. Zigangirov, Fundamentals of Convolutional Codes, IEEE Press, Inc., Piscataway, New Jersey, 1999.
[10] T. Hindelang, C. Erben, and W. Xu, “Quality Enhancement of Coded and Corrupted Speeches in GSM Using Residual Redundancy,” in Proc. of ICASSP’97, Munich, Germany, Apr. 1997, vol. 1, pp. 259–262.
[11] T. Fingscheidt and O. Scheufen, “Robust GSM Speech Decoding Using the Channel Decoder’s Soft Output,” in Proc. of EUROSPEECH’97, Rhodos, Greece, Sept. 1997, pp. 1315–1318.
[12] T. Hindelang, M. Kaindl, J. Hagenauer, M. Schmautz, and W. Xu, “Improved Channel Coding and Estimation for Adaptive Multi Rate (AMR) Speech Transmission,” in Proc. of VTC’00 Spring, Tokyo, Japan, May 2000.
[13] “ETSI SMG11: Proposal of an Adaptive Multi Rate Codec.,” AMR #10 Tdoc AMR /98, Siemens, Stockholm, Sweden, June 1998.
[14] W. Xu, S. Heinen, T. Hindelang, et al., “An Adaptive Multirate Speech Codec Proposed for the GSM,” in Proc. of 3. ITG Conference “Source and Channel Coding”, Munich, Germany, Jan. 2000, pp. 51–56, VDE-Verlag.