Performance Comparison of Source Controlled GSM AMR and SMV Vocoders
J. Makinen, P. Ojala, H. Toukomaa
Multimedia Technologies Laboratory, Nokia Research Center, Tampere, Finland
Abstract- The Adaptive Multi-Rate (AMR) speech codec offers a substantial improvement in error robustness over previous GSM speech codecs by adapting the speech and channel coding to the prevailing channel conditions. In GSM AMR, the trade-off between speech quality and average bit rate can be further improved by source signal based rate adaptation (SBRA). Together with fast power control, SBRA GSM AMR can be used as a variable rate codec whose reduced average bit rate contributes to an increase in system capacity. SBRA GSM AMR was tested against the standardized SMV variable rate speech codec. This paper also presents general descriptions of both the SBRA GSM AMR and SMV codecs.
I. INTRODUCTION
The Adaptive Multi-Rate (AMR) speech codec was developed for the Global System for Mobile Communications (GSM) in 1998 [13]. It is also the mandatory codec for 3GPP speech services. The codec is based on ACELP technology and consists of eight codec modes with fixed bit rates ranging from 4.75 kbps to 12.2 kbps. The bit rate is dynamically controlled by the GSM radio network according to the prevailing channel conditions. The codec also includes a Discontinuous Transmission (DTX) functionality, which enables simple source controlled operation by means of Voice Activity Detection (VAD) [2, 3]. The Selectable Mode Vocoder (SMV) is a variable rate speech codec standardized for the 3GPP2 cdmaOne and cdma2000 systems in 2000. SMV has four operation modes, each with a different average bit rate, which allows different trade-offs between voice quality and system capacity. SMV is based on extended CELP (eX-CELP) technology [7]. This paper gives an overview of the AMR codec extended with an optimized source signal based rate adaptation (SBRA) for GSM. The aim of the SBRA extension is to reduce the bit rate while maintaining speech quality. The SBRA extended AMR codec is benchmarked against SMV, a state-of-the-art variable rate speech codec, by means of formal subjective listening tests.
II. OUTLINE OF THE SMV SYSTEM
The SMV speech coder operates in four variable bit rate modes: Mode 0 (premium), Mode 1 (standard), Mode 2 (economy) and Mode 3 (super-economy). The average bit rate of each mode depends strongly on speech activity and content. For active speech, Modes 0, 1, 2 and 3 have target bit rates of around 8.00, 5.80, 4.50 and 3.95 kbit/s, respectively [8]. The coder consists of a full-rate codec operating at 8.5 kbps, a half-rate codec at 4.0 kbps, a quarter-rate codec at 2.0 kbps and an eighth-rate codec at 0.8 kbps. All four are used in the SMV modes, except that the quarter-rate codec is not allowed in Mode 0 [8]. The block diagram of the SMV system is presented in Fig. 1. The rate decision algorithm selects one of the four codecs based on the source signal characteristics and the selected SMV mode [8].
Fig. 1. A block diagram of the SMV codec. [Figure labels: input speech, mode, preprocessing, rate decision, frame processing, full/half rate, quarter/eighth rate, Type 0, Type 1.]
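To make the relation between the codec rates and the mode averages concrete, the sketch below computes the average bit rate of a frame mix as the usage-weighted mean of the four codec rates given above. The fractions of frames coded at each rate are hypothetical inputs, not SMV statistics, and the function name `average_bit_rate` is illustrative; the only mode rule it encodes is the one stated in the text (no quarter rate in Mode 0).

```python
"""Average bit rate of a hypothetical SMV frame-rate mix (illustrative sketch)."""

# Nominal SMV codec bit rates in kbit/s, as listed in the text.
CODEC_RATES_KBPS = {"full": 8.5, "half": 4.0, "quarter": 2.0, "eighth": 0.8}


def average_bit_rate(usage: dict[str, float], mode: int) -> float:
    """Return the usage-weighted mean bit rate in kbit/s.

    `usage` maps a codec name to the fraction of frames coded with it;
    the fractions are assumed inputs and must sum to one.
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError(f"unknown SMV mode {mode}")
    if mode == 0 and usage.get("quarter", 0.0) > 0.0:
        # The text states that the quarter-rate codec is not allowed in Mode 0.
        raise ValueError("quarter rate is not allowed in Mode 0")
    unknown = set(usage) - set(CODEC_RATES_KBPS)
    if unknown:
        raise ValueError(f"unknown codec(s): {sorted(unknown)}")
    if abs(sum(usage.values()) - 1.0) > 1e-9:
        raise ValueError("usage fractions must sum to 1")
    return sum(CODEC_RATES_KBPS[name] * share for name, share in usage.items())


if __name__ == "__main__":
    # Hypothetical active-speech mix: 30% full rate, 70% half rate.
    print(average_bit_rate({"full": 0.3, "half": 0.7}, mode=1))
```

With this hypothetical mix the result is 0.3 x 8.5 + 0.7 x 4.0 = 5.35 kbit/s; the actual SMV mix per mode is decided by its rate decision algorithm and is not modeled here.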
The bit allocation of the SMV codecs is shown in Table I. The full-rate and half-rate codecs both have two frame types (Type 0 and Type 1) with different bit allocations. The quarter-rate and eighth-rate codecs are based on spectrum- and energy-modulated random noise models, with bits allocated to the LPC and energy parameters only.
TABLE I
BIT ALLOCATION OF THE CODECS IN THE SMV CODEC
Rate     Type  LPC  Energy  Mode  Pitch  Excitation  Gains  Total/frame
Full     0     27   0       1     26     88          28     170
Full     1     25   0       1     8      120         16     170
Half     0     21   0       1     14     30          14     80
Half     1     21   0       1     7      39          12     80
Quarter  -     27   12      0     0      0           0      39
Eighth   -     11   5       0     0      0           0      16
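As a quick consistency check on Table I, the sketch below sums the per-parameter allocations of each frame type and converts the frame totals to bit rates. The 20 ms frame length is inferred from the text (170 bits at 8.5 kbps), and the dictionary layout is only an illustrative encoding of the table.

```python
"""Consistency check of the SMV bit allocation in Table I (illustrative)."""

FRAME_MS = 20  # 170 bits / 8.5 kbit/s = 20 ms

# (rate, type) -> (LPC, energy, mode, pitch, excitation, gains, total)
TABLE_I = {
    ("full", 0): (27, 0, 1, 26, 88, 28, 170),
    ("full", 1): (25, 0, 1, 8, 120, 16, 170),
    ("half", 0): (21, 0, 1, 14, 30, 14, 80),
    ("half", 1): (21, 0, 1, 7, 39, 12, 80),
    ("quarter", None): (27, 12, 0, 0, 0, 0, 39),
    ("eighth", None): (11, 5, 0, 0, 0, 0, 16),
}


def kbps(bits_per_frame: int, frame_ms: int = FRAME_MS) -> float:
    """Bits per frame divided by the frame length in ms gives kbit/s."""
    return bits_per_frame / frame_ms


def check_table() -> dict:
    """Verify that each row's fields sum to its total; return the rate per row."""
    rates = {}
    for key, (*fields, total) in TABLE_I.items():
        if sum(fields) != total:
            raise AssertionError(f"{key}: fields sum to {sum(fields)}, not {total}")
        rates[key] = kbps(total)
    return rates


if __name__ == "__main__":
    for key, rate in check_table().items():
        print(key, rate)
```

All rows add up. The quarter-rate row yields 1.95 kbit/s rather than the nominal 2.0 kbit/s, because Table I allocates 39 of the 40 bits that a 2.0 kbit/s, 20 ms frame would carry.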
III. OVERVIEW OF THE SBRA GSM AMR
The AMR speech codec uses the ACELP (Algebraic Code-Excited Linear Prediction) algorithm. The codec contains eight modes with bit rates of 12.2, 10.2, 7.95, 7.4, 6.7, 5.9, 5.15 and 4.75 kbps. The bit allocation of the AMR modes is shown in Table II. AMR uses VAD/DTX functionality to minimize the bit rate during silence periods. During DTX operation, the encoder transmits comfort noise (CN) parameters (35 bits) only once every eight frames, resulting in approximately 0.22 kbit/s (35 bits per 160 ms) [3, 5]. The frame processing functions of the AMR codec include pre-processing, LPC analysis, open-loop pitch analysis, and adaptive and algebraic codebook searches. A more detailed description can be found in [1].
TABLE II
BIT ALLOCATION FOR THE AMR CODEC MODES
Rate        LPC  LTP  Excitation  Gains  Total/frame
12.2 kbps   38   30   140         36     244
10.2 kbps   26   26   124         28     204
7.95 kbps   27   28   68          36     159
7.4 kbps    26   26   68          28     148
6.7 kbps    26   24   56          28     134
5.9 kbps    26   24   44          24     118
5.15 kbps   23   20   36          24     103
4.75 kbps   23   20   36          16     95
DTX         29   0    0           6      35 (1.75 kbps / 8)
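The rates in Table II and the 0.22 kbit/s DTX figure follow directly from the 20 ms AMR speech frame. The sketch below computes the mode rates, the DTX rate and a simplified overall average for a given speech activity. The overall-average formula is an assumption made for illustration: it treats every inactive frame as part of the steady one-in-eight comfort noise update cycle and ignores DTX hangover and other transitional frames.

```python
"""Bit-rate arithmetic for AMR modes and DTX (illustrative sketch)."""

FRAME_MS = 20  # AMR speech frame length

# Total bits per frame from Table II, keyed by nominal mode rate (kbit/s).
AMR_FRAME_BITS = {
    12.2: 244, 10.2: 204, 7.95: 159, 7.4: 148,
    6.7: 134, 5.9: 118, 5.15: 103, 4.75: 95,
}
SID_BITS = 35            # comfort noise parameters per update
SID_INTERVAL_FRAMES = 8  # one update every eighth frame


def mode_rate_kbps(bits: int) -> float:
    """Bits per 20 ms frame -> kbit/s (bits per ms equals kbit/s)."""
    return bits / FRAME_MS


def dtx_rate_kbps() -> float:
    """35 bits every 8 frames (160 ms) -> about 0.22 kbit/s."""
    return SID_BITS / (SID_INTERVAL_FRAMES * FRAME_MS)


def overall_rate_kbps(active_kbps: float, activity: float) -> float:
    """Simplified average over speech and silence.

    Assumption: active frames use `active_kbps`; all other frames are in the
    steady comfort noise cycle (no hangover or transitional frames).
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be in [0, 1]")
    return activity * active_kbps + (1.0 - activity) * dtx_rate_kbps()


if __name__ == "__main__":
    for rate, bits in AMR_FRAME_BITS.items():
        print(f"{rate:5} kbit/s mode: {mode_rate_kbps(bits):.2f} kbit/s from {bits} bits")
    print(f"DTX: {dtx_rate_kbps():.5f} kbit/s")
    print(f"4.75 kbit/s at 50% activity: {overall_rate_kbps(4.75, 0.5):.2f} kbit/s")
```

For the fixed 4.75 kbit/s mode at 50% speech activity, this simplified model gives about 2.48 kbit/s, consistent with the 2.5 kbit/s listed for SG AMR 4.75 in Table IV.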
A. GSM link adaptation
The GSM AMR speech codec mode is selected based on the radio channel in use and the prevailing error conditions. This increases spectral efficiency and improves voice quality over erroneous radio channels in GSM [4]. The GSM in-band signaling supports adaptation between up to four active codec modes. This set of active codec modes is selected from the eight possible coding modes at call set-up (and at handover). Due to signaling restrictions in GSM, the mode can change only every 40 ms, i.e., every second speech frame, and only to one of the neighboring modes in the active codec set [6, 9].
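The two GSM constraints just described (mode changes only every second 20 ms frame, and only to a neighboring mode of the active codec set) can be sketched as a small state machine. The `requested` index is a hypothetical stand-in for the network's channel-quality-driven decision, whose derivation is not modeled here, and the choice of which frame parity may carry a change is an assumption.

```python
"""GSM AMR link adaptation constraints (illustrative sketch)."""

from dataclasses import dataclass


@dataclass
class LinkAdaptation:
    active_set: list[float]  # active codec modes in kbit/s, ascending
    index: int               # index of the current mode in active_set
    frame: int = 0           # 20 ms frame counter

    def __post_init__(self) -> None:
        if not 1 <= len(self.active_set) <= 4:
            raise ValueError("an active codec set has one to four modes")
        if self.active_set != sorted(self.active_set):
            raise ValueError("active_set must be in ascending order")
        if not 0 <= self.index < len(self.active_set):
            raise ValueError("index out of range")

    def step(self, requested: int) -> float:
        """Advance one frame and return the mode used in it.

        A change is allowed only on every second frame (40 ms), and only
        one step towards the requested index (a neighboring mode).
        """
        requested = max(0, min(requested, len(self.active_set) - 1))
        if self.frame % 2 == 0 and requested != self.index:
            self.index += 1 if requested > self.index else -1
        self.frame += 1
        return self.active_set[self.index]


if __name__ == "__main__":
    la = LinkAdaptation([4.75, 5.9, 7.4, 12.2], index=3)
    # Channel degrades: the network requests the most robust mode.
    print([la.step(requested=0) for _ in range(6)])
```

In this model, stepping down from 12.2 kbit/s to 4.75 kbit/s takes three 40 ms adaptation periods, which shows why the neighbor-only rule slows the response to a sudden change in channel quality.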
B. Source controlled operation for active speech
The network capacity can be further optimized during active speech by adapting the AMR bit rate to the source signal. The AMR speech codec can be extended with the SBRA algorithm, which improves the trade-off between speech quality and average bit rate and hence increases system capacity. According to formal listening tests, the SBRA algorithm reduces the GSM AMR bit rate for active speech by 10-25% without compromising speech quality. The achieved bit rate reduction depends on the operation mode and thus on the available active codec mode set [9].
Fig. 2. Block diagram of the SBRA algorithm in the AMR framework. [Figure labels: speech input, VAD (TS 26.094), DTX (TS 26.093), frame processing functions (TS 26.090), SBRA, mode, bit rate target, active mode set.]
C. SBRA extension
A high-level block diagram of the SBRA GSM AMR system is depicted in Fig. 2. The SBRA algorithm selects the speech codec mode for every speech frame based on the information content of the current and past speech frames. The selected mode depends on the active mode set and on the configured bit rate target (the operating point). The active mode set depends on the operation mode and the call set-up. The operation mode is the mode with the highest bit rate in the active mode set, and it is chosen by link adaptation according to the prevailing channel conditions (Section III-A, GSM link adaptation). In the SBRA module, the mode selection algorithm is adjusted according to the configured bit rate target. A closed-loop controller makes the average bit rate converge towards the bit rate target. The operation range is the interval between the minimum and maximum average bit rates within which the SBRA algorithm is tuned to operate. The operation range depends on the active mode set, which in turn depends on the operation mode. Example operation ranges of SBRA GSM AMR for three active mode sets are given in Table III. The bit rate target can be tuned freely within the operation range during encoding. Therefore, the SBRA extension is useful for more flexible capacity control in conversational services.
TABLE III
OPERATION RANGES OF GSM SBRA AMR FOR THREE ACTIVE MODE SETS
Operation mode [kbps]  Active mode set [kbps]      Operation range [kbps]
12.2                   [12.2, 7.40, 5.90, 4.75]    8.00–11.5
7.40                   [7.40, 5.90, 4.75]          5.80–7.10
5.90                   [5.90, 4.75]                5.00–5.70
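The closed-loop idea can be illustrated with a simple integral controller. This is not the SBRA algorithm of [9], whose mode decision and controller are not detailed here: the per-frame `score` in [0, 1] is a hypothetical stand-in for the information content of the current and past frames, modes are picked by evenly spaced thresholds on that score, and `gain` is an assumed tuning constant. The controller shifts all thresholds by an offset that integrates the difference between the selected rate and the bit rate target.

```python
"""Closed-loop bit-rate control over an active mode set (illustrative sketch)."""

import random


class RateController:
    """Selects one mode per frame; an integral term steers the mean to target."""

    def __init__(self, active_set: list[float], target_kbps: float, gain: float = 0.05):
        self.modes = sorted(active_set)
        if not self.modes[0] <= target_kbps <= self.modes[-1]:
            raise ValueError("target must lie within the active mode set's range")
        self.target = target_kbps
        self.gain = gain
        self.offset = 0.0  # integral state: shifts all decision thresholds
        self.total = 0.0   # sum of selected rates, for the running average
        self.frames = 0

    def select(self, score: float) -> float:
        """Pick a mode for a frame whose (hypothetical) importance is `score`."""
        n = len(self.modes)
        x = min(max(score + self.offset, 0.0), 1.0)
        mode = self.modes[min(int(x * n), n - 1)]
        span = self.modes[-1] - self.modes[0]
        if span > 0:
            # Above target -> lower the offset (cheaper modes), and vice versa.
            self.offset -= self.gain * (mode - self.target) / span
        self.total += mode
        self.frames += 1
        return mode

    @property
    def average_kbps(self) -> float:
        return self.total / self.frames if self.frames else 0.0


if __name__ == "__main__":
    rng = random.Random(1)  # synthetic scores, not real speech features
    ctl = RateController([4.75, 5.9, 7.4, 12.2], target_kbps=9.0)
    for _ in range(10_000):
        ctl.select(rng.random())
    print(f"average rate: {ctl.average_kbps:.3f} kbit/s")
```

Because the offset is proportional to the cumulative deviation from the target, a large positive deviation forces the lowest mode and a large negative one forces the highest mode. The cumulative deviation therefore stays bounded, and the running average converges to any target inside the range spanned by the active mode set, roughly as 1/n.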
IV. SPEECH QUALITY EVALUATION
Four formal listening test experiments were conducted to evaluate the performance of the SBRA extended GSM AMR against the SMV codec. SMV can be regarded as a state-of-the-art variable rate speech codec, and it provides a challenging and well-known reference point for benchmarking GSM SBRA AMR. The evaluation focused only on the compression efficiency of the speech codecs over an error-free transmission channel; error robustness at different frame error rates (FER) was therefore not tested. The listening tests were performed in the listening test facilities of Nokia Research Center, which conform to ITU-T Recommendation P.800 [10, 11].
A. Adjustment of the benchmarked codecs
The bit rate targets of SBRA GSM AMR were aligned with the average bit rates of the SMV modes. Table IV shows the overall average bit rates of the SMV modes and of SBRA GSM (SG) AMR with different active mode sets. The speech activity of the listening test material was around 50%.
The AMR speech codec has an option for D/A conversion (and vice versa) in which the 16-bit samples of the synthesized signal are truncated to 13 bits, i.e., the three least significant bits (LSBs) are set to zero [1]. Because the truncation function is implemented incorrectly in the AMR standard source code [15]¹, the three LSBs are not always set to zero. This generates low-level noise during silence periods in the decoded 16-bit speech samples, which affects the overall speech quality score in a listening test. The 13-bit truncation was therefore fixed in the tested SBRA GSM AMR.
Noise suppression was added to the processing chain of the SBRA GSM AMR codec, as the SMV codec also requires it; this makes the background noise test results more comparable. The noise suppression algorithm used in the SBRA GSM AMR conditions is a proprietary algorithm loosely based on the Minimum Mean-Square Error Short-Time Spectral Amplitude (MMSE STSA) estimator [12].
TABLE IV
THE OVERALL AVERAGE BIT RATES OF THE TEST CONDITIONS
Overall average bit rate (kbps) with 50% speech activity
Condition                        No noise  Car noise
SG AMR [4.75, 5.9, 7.4, 12.2]    4.6       4.5
SG AMR [4.75, 5.9, 7.4]          3.6       3.1
SG AMR [4.75, 5.9]               2.8       2.7
SG AMR 4.75 (no SBRA)            2.5       2.5
SMV Mode 0 (premium)             4.7       4.5
SMV Mode 1 (standard)            3.8       3.5
SMV Mode 2 (economy)             3.0       2.85
SMV Mode 3 (super-economy)       2.8       2.55
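For reference, the intended 13-bit truncation described above, which sets the three LSBs of each 16-bit two's-complement sample to zero, takes a single bit mask. The sketch below only illustrates that operation; it does not reproduce the defect in the reference code [15], whose details are not given here, and the function name `truncate_to_13_bits` is hypothetical.

```python
"""Intended 13-bit truncation of 16-bit PCM samples (illustrative sketch)."""

INT16_MIN, INT16_MAX = -32768, 32767
LSB_MASK = ~0x7  # clears bits 0..2; Python's & on negative ints acts on two's complement


def truncate_to_13_bits(samples: list[int]) -> list[int]:
    """Return the samples with their three least significant bits set to zero."""
    out = []
    for s in samples:
        if not INT16_MIN <= s <= INT16_MAX:
            raise ValueError(f"sample {s} is outside the 16-bit range")
        out.append(s & LSB_MASK)
    return out


if __name__ == "__main__":
    # Small negative values move down to the next multiple of 8, e.g. -1 -> -8.
    print(truncate_to_13_bits([5, -1, 32767, -32768, 8]))  # -> [0, -8, 32760, -32768, 8]
```

A simple regression check is that every output sample is a multiple of 8; an implementation that leaves stray LSBs set fails it, which matches the symptom described above.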
B. Listening test results
Listening test experiments were conducted in clean and background noise environments. The experiment without background noise was a Mean Opinion Score (MOS) test. The background noise experiment was a Degradation Mean Opinion Score (DMOS) test conducted in the presence of car noise (SNR 15 dB). In both environments, the performance of single and tandem coding was evaluated. In addition to the conditions listed in Table IV, the experiments included Modulated Noise Reference Unit (MNRU) conditions at 6, 12, 18, 24 and 30 dB as references, as well as the uncoded case ("direct") as a high-quality reference. A nominal signal level of -26 dBov was used. The results are presented with 95% confidence intervals.
The test results for single and tandem coding in the presence of car noise are shown in Fig. 3 and Fig. 4, respectively, and the results for single and tandem coding in the clean environment in Fig. 5 and Fig. 6, respectively. The average bit rates of the tested codecs are given in Table IV. The results clearly show that the benefit of source controlled operation in GSM AMR diminishes as the average bit rate decreases. The worst case is SBRA GSM AMR in the 4.75 kbps operation mode: because 4.75 kbps is the lowest AMR mode, the active mode set cannot contain any other mode, and source controlled operation cannot be exploited. Nevertheless, a statistical comparison found no significant difference between SBRA GSM AMR and SMV in any comparable condition except one tandem condition, in which SBRA GSM AMR with the mode set [7.4, 5.9, 4.75] was statistically better than SMV Mode 1 (standard). Notably, SBRA GSM AMR appears to perform better than SMV in the tandem cases.
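The 95% confidence intervals used above can be computed, under a normal approximation, from the individual listener votes of a condition. The votes below are hypothetical, and with few votes a Student-t quantile would be more appropriate than the assumed z = 1.96.

```python
"""Mean opinion score with a 95% confidence interval (illustrative sketch)."""

import math
import statistics


def mos_with_ci(votes: list[float], z: float = 1.96) -> tuple[float, float]:
    """Return (mean score, half-width of the confidence interval).

    Uses the normal approximation: half-width = z * s / sqrt(n), where s is
    the sample standard deviation of the votes.
    """
    if len(votes) < 2:
        raise ValueError("need at least two votes")
    mean = statistics.fmean(votes)
    half_width = z * statistics.stdev(votes) / math.sqrt(len(votes))
    return mean, half_width


if __name__ == "__main__":
    votes = [4, 3, 4, 5, 3, 4, 4, 2, 4, 3, 5, 4]  # hypothetical votes on a 1-5 scale
    mean, hw = mos_with_ci(votes)
    print(f"MOS = {mean:.2f} +/- {hw:.2f}")
```

Overlapping intervals do not by themselves establish that two conditions are equivalent; a formal comparison between conditions needs a significance test rather than a visual check of interval overlap.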
In the clean environment with single coding, the two benchmarked codecs perform almost identically, even though the bit rates of the compared conditions differ slightly (Table IV). In the background noise experiments, SBRA GSM AMR performs better than the SMV codec, indicating stronger noise robustness for SBRA GSM AMR. Taken together, the listening test experiments suggest that SBRA GSM AMR has better overall performance than the SMV codec.
Fig. 3. The results for single coding in the presence of car noise. [Plot "SMV Modes vs. SBRA GSM AMR": DMOS (axis 2.5-4.5) for SG AMR [4.75, ..., 12.2], SMV Mode 0, SG AMR [4.75, ..., 7.4], SMV Mode 1, SG AMR [4.75, ..., 5.9], SMV Mode 2, SG AMR 4.75 and SMV Mode 3.]
¹ The incorrect truncation exists in 3GPP TS 26.073: "Adaptive Multi-Rate (AMR); ANSI C source code", Release 5.3.0.
Fig. 4. The results for tandem coding in the presence of car noise. [Plot "SMV Modes vs. SBRA GSM AMR": DMOS (axis 2.5-4.5) for SG AMR [4.75, ..., 12.2], SMV Mode 0, SG AMR [4.75, ..., 7.4], SMV Mode 1, SG AMR [4.75, ..., 5.9], SMV Mode 2, SG AMR 4.75 and SMV Mode 3.]
Fig. 5. The results for single coding in the clean environment. [Plot "SMV Modes vs. SBRA GSM AMR": MOS (axis 2.0-4.0) for SG AMR [4.75, ..., 12.2], SMV Mode 0, SG AMR [4.75, ..., 7.4], SMV Mode 1, SG AMR [4.75, ..., 5.9], SMV Mode 2, SG AMR 4.75 and SMV Mode 3.]
Fig. 6. The results for tandem coding in the clean environment. [Plot "SMV Modes vs. SBRA GSM AMR": MOS (axis 2.0-4.0) for SG AMR [4.75, ..., 12.2], SMV Mode 0, SG AMR [4.75, ..., 7.4], SMV Mode 1, SG AMR [4.75, ..., 5.9], SMV Mode 2, SG AMR 4.75 and SMV Mode 3.]
V. CONCLUSION
The listening tests were first conducted with a GSM AMR codec that used the incorrect truncation in D/A conversion. This degraded the performance of the AMR codec in the clean environment. These initial results were in line with the voice quality results presented in [14], where the voice quality of CDMA and GSM systems was compared in 2002 and the SMV system was reported to have better voice quality than the AMR codec. The formal listening test results presented in this paper were obtained with the corrected D/A truncation in GSM AMR. Even though the GSM AMR modes with different bit rates are not optimized for variable bit rate coding, the test results show SBRA GSM AMR to be very competitive with an existing standard variable rate speech codec. SBRA GSM AMR was statistically better in one test condition, and its overall performance appeared to exceed that of the SMV system. The performance of both codecs was almost identical in the clean environment with single coding. In tandem coding in particular, SBRA GSM AMR performs better than the SMV codec. The test results in the presence of background noise also indicate that SBRA GSM AMR has stronger noise robustness and thus better voice quality in noise.
The SBRA extension improves the trade-off between voice quality and bit rate for the GSM AMR speech codec by reducing the overall bit rate without compromising speech quality. This leads to capacity savings in GSM speech services. Even though the tested SBRA extension is optimized for the GSM system, the SBRA feature can also be exploited in other systems where the AMR codec is used (e.g., WCDMA). In addition to bit rate reduction, the SBRA extension allows more flexible optimization of the network load in conversational services.
REFERENCES
[1] 3GPP TS 26.090: "AMR Speech Codec; Transcoding functions".
[2] 3GPP TS 26.093: "AMR Speech Codec; Source Controlled Rate Operation".
[3] 3GPP TS 26.094: "AMR Speech Codec; Voice Activity Detection (VAD)".
[4] H. Holma, J. Melero, J. Vainio, T. Halonen and J. Makinen, "Performance of adaptive multi-rate (AMR) voice in GSM and WCDMA", Proc. VTC 2003-Spring, Jeju, Korea, 2003.
[5] 3GPP TS 26.092: "AMR Speech Codec; Comfort Noise Aspects".
[6] 3GPP TS 45.009: "GSM/EDGE Radio Access Network; Link Adaptation".
[7] Y. Gao, A. Benyassine, J. Thyssen, Su Huan-yu and E. Shlomot, "eXCELP: A Speech Coding Paradigm", Proc. ICASSP 2001.
[8] Y. Gao et al., "The SMV Algorithm Selected by TIA and 3GPP2 for CDMA Applications", Proc. ICASSP 2001, Salt Lake City, USA, 2001.
[9] J. Makinen and J. Vainio, "Source signal based rate adaptation for GSM AMR speech codec", Proc. ITCC 2004, Las Vegas, USA, 2004.
[10] ITU-T Recommendation P.800: "Methods for subjective determination of transmission quality".
[11] M. Kylliainen et al., "Compact high performance listening spaces", Euronoise, Naples, 2003.
[12] Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-32, pp. 1109-1121, 1984.
[13] GSM TR 06.75: "Digital Cellular Telecommunications System; Performance Characterisation of the Adaptive Multi-Rate (AMR) Speech Codec; ETSI Technical Report".
[14] R. Yallapragada and V. Kripalani, "Increments in Voice Capacity and Impact on Voice Quality with New Vocoders in GSM and CDMA Systems", Proc. ICPWC 2002, 2002.
[15] 3GPP TS 26.073: "Adaptive Multi-Rate (AMR); ANSI C source code".