IMPROVING LSF QUANTIZATION PERFORMANCE WITH SORTING

Anssi Rämö

Speech and Audio Systems Laboratory, Nokia Research Center,
P.O. Box 100, FIN-33721 Tampere, Finland
[email protected]

ABSTRACT

All speech coders require that the linear prediction (LP) filter used is stable. All current line spectral frequency (LSF) vector quantization routines may produce a quantized LSF vector that is out of order, and thus unstable. In those cases the vector is stabilized after quantization by sorting the LSF vector. This, however, means that the selected code vector may not be optimal, since the sorting was not taken into account in the quantization loop. The proposed codebook search routine orders the LSF vector inside the quantizer loop, so the optimal code vector is always found. The proposed method improves LSF quantization performance in terms of spectral distortion (SD) while maintaining the original bit allocation. It can be used with both predictive and non-predictive split (partitioned) vector quantizers or multistage vector quantizers. The method has a larger effect on performance when higher order LPC models or matrix quantizers are used, because in those cases the LSFs are closer to each other and invalid ordering is more likely. The proposed method is used in the Texas Instruments, Nokia, and Ericsson 4 kb/s ITU candidate speech coder [1].

1. INTRODUCTION

Low bit rate speech coding has advanced considerably over the years. One of the cornerstones of speech coding is LP analysis and quantization of the LP coefficients. Usually the LP coefficients are transformed into a different domain with better quantization properties; most often the LSF representation is used [2]. Several good papers describe methods and performance for transforming and quantizing the filter coefficients [3][4]. Current state-of-the-art LP coefficient quantization methods take advantage of both intra-frame correlations, with vector quantization, and inter-frame correlations, in the form of prediction or matrix quantization [5]. It has also been suggested that using two or more predictors improves performance [6]. However, the ordering property of stable LSF parameters has not been taken into account earlier. This paper presents the LSF quantization problem and how it can be improved by sorting the LSF coefficients inside the quantizer loop. Section 2 provides background for LSF based LP quantization. Section 3 presents the new quantization method in detail and compares it with the earlier method. In Section 4 the properties of the new method are discussed for various forms of vector quantization. Section 5 concludes the paper.

2. LSF REPRESENTATION AND ITS PROPERTIES

Most current speech coders include a linear prediction (LP) filter, for which an excitation signal is generated. The LP filter typically has an all-pole structure

\frac{1}{A(z)} = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2} + \ldots + a_p z^{-p}},          (1)

where a_1, a_2, \ldots, a_p are the LP coefficients. The degree p of the LP filter is usually 8-12 in the case of a narrow-band speech coder (8 kHz sampling rate). The input speech signal is processed in frames, the frame update rate being 33-100 Hz (i.e. G.723.1 and G.729). For each speech frame, the encoder determines the LP coefficients using e.g. the Levinson-Durbin algorithm. Nowadays the line spectrum frequency (LSF) representation is employed for quantization of the coefficients, because it has several good properties [5][7]. For intermediate subframes, the coefficients are linearly interpolated using the LSF representation. In order to define the LSFs, the inverse LP filter polynomial A(z) is used to construct two polynomials:

P(z) = A(z) + z^{-(p+1)} A(z^{-1}) = (1 + z^{-1}) \prod_{i = 2, 4, \ldots, p} (1 - 2 z^{-1} \cos\omega_i + z^{-2})          (2)

and

Q(z) = A(z) - z^{-(p+1)} A(z^{-1}) = (1 - z^{-1}) \prod_{i = 1, 3, \ldots, p-1} (1 - 2 z^{-1} \cos\omega_i + z^{-2})          (3)

The roots of the polynomials P(z) and Q(z) are called LSF coefficients and they have the following properties:

1. all the zeros (roots) of the polynomials are on the unit circle e^{j\omega_i} with i = 1, 2, \ldots, p

2. the zeros of Q(z) and P(z) are interlaced with each other

3. the zeros of a stable filter are always in ascending order

More specifically, the last property means that the following relationship is always satisfied:

0 = \omega_0 < \omega_1 < \omega_2 < \ldots < \omega_{p-1} < \omega_p < \omega_{p+1} = \pi          (4)

This ascending order guarantees the filter stability, which is generally required in all coding applications. Note that the first and last parameters are always 0 and \pi respectively, so only p values have to be transmitted.

In speech coders an efficient representation of the LSF coefficients is needed for storing or transmitting the spectral information. Thus the LSFs are often quantized using vector quantization (VQ) together with prediction. Scalar quantization should be avoided because of its significant performance loss [3]. Usually, the predicted values are estimated based on the previously decoded output values (AR predictor) or on the previous frames' codebook values (MA predictor) [4]. The general form of the combined ARMA prediction is as follows:

pLSF_k = mLSF + \sum_{j=1}^{m} A_j (qLSF_{k-j} - mLSF) + \sum_{i=1}^{n} B_i CB_{k-i},          (5)

where A_j and B_i are the predictor matrices, and m and n are the orders of the predictors. mLSF, pLSF_k, qLSF_k and CB_k are, respectively, the mean value vector, the predicted LSF, the quantized LSF, and the codebook vector for frame k. In practice A_j and B_i are often diagonal and their orders are m = 1...2 and n = 1...5 respectively. After the predicted value is calculated, the quantized LSF value can be obtained:

qLSF_k = pLSF_k + CB_k.          (6)
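As an illustration of (5) and (6), the following sketch (not from the paper; the function names, the diagonal-predictor assumption, and the example values are illustrative) computes the predicted and quantized LSF vectors and checks the ordering property (4):

import numpy as np

def predict_lsf(mean_lsf, past_qlsf, past_cb, ar_diag, ma_diag):
    # ARMA prediction of eq. (5) with diagonal predictor matrices A_j and B_i.
    plsf = mean_lsf.copy()
    for a, q in zip(ar_diag, past_qlsf):        # AR part over previous quantized LSFs
        plsf += a * (q - mean_lsf)
    for b, cb in zip(ma_diag, past_cb):         # MA part over previous codebook vectors
        plsf += b * cb
    return plsf

def is_in_order(lsf_hz, fs_hz=8000.0):
    # Ordering property (4), expressed in Hz on the range (0, fs/2).
    return bool(np.all(np.diff(lsf_hz) > 0.0) and lsf_hz[0] > 0.0 and lsf_hz[-1] < fs_hz / 2.0)

# Example with p = 10, AR order m = 1 and MA order n = 1 (illustrative values).
p = 10
mean_lsf = np.linspace(300.0, 3500.0, p)
plsf = predict_lsf(mean_lsf, [mean_lsf + 20.0], [np.zeros(p)],
                   [np.full(p, 0.6)], [np.full(p, 0.3)])
cb_k = np.random.default_rng(0).normal(0.0, 30.0, p)   # stand-in residual code vector CB_k
qlsf = plsf + cb_k                                      # eq. (6)
print(is_in_order(qlsf))                                # ordering must still be verified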

When using predictive quantization or constrained VQ (i.e. split or multistage), the stability of the resulting LSF vector must be checked before conversion to LP coefficients, since the resulting quantized vector can be in invalid order (not sorted). Only in the case of direct VQ (non-predictive, single-stage, non-split) can the codebook be designed so that the resulting quantized vector is always in order.

3. SORTING LSF INSIDE THE QUANTIZER LOOP

When searching for the best codebook vector, often all vectors are tried out (full search), and a perceptually motivated goodness measure is evaluated for every candidate in order to obtain a perceptually valid quantization. Optimally the weighting is based on the spectral distortion SD_i,

SD_i^2 = \frac{1}{F_s} \int_0^{F_s} \left[ 10 \log_{10}(P_i(f)) - 10 \log_{10}(\hat{P}_i(f)) \right]^2 df,          (7)

where

\hat{P}_i(f) = \frac{1}{|\hat{A}_i(e^{j 2 \pi f / F_s})|^2}, \qquad P_i(f) = \frac{1}{|A_i(e^{j 2 \pi f / F_s})|^2}          (8)

are the spectra of the speech frame with and without the quantization, respectively.
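For reference, (7) and (8) can be approximated on a discrete frequency grid as in the following sketch (names and grid size are assumptions, not the paper's implementation):

import numpy as np

def lp_power_spectrum(lp_coeffs, n_freq=512):
    # Power spectrum 1 / |A(e^{jw})|^2 of the all-pole filter in eq. (1);
    # lp_coeffs holds a_1 ... a_p.
    a = np.concatenate(([1.0], lp_coeffs))
    w = np.linspace(0.0, np.pi, n_freq, endpoint=False)
    A = np.exp(-1j * np.outer(w, np.arange(a.size))) @ a
    return 1.0 / np.abs(A) ** 2

def spectral_distortion(lp_orig, lp_quant, n_freq=512):
    # Discrete approximation of the SD integral (7), returned in dB.
    p_ref = 10.0 * np.log10(lp_power_spectrum(lp_orig, n_freq))
    p_hat = 10.0 * np.log10(lp_power_spectrum(lp_quant, n_freq))
    return np.sqrt(np.mean((p_ref - p_hat) ** 2))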

However, computing (7) for every candidate is computationally very intensive, and simpler approximations are used instead. A commonly used method is to weight the LSF error (rLSF_{ki}) with a weight W_k. For example, the following weighting can be used [8]:

W_k = \begin{cases} 3.347 - \frac{1.547}{450} d_k, & \text{for } d_k < 450 \,\mathrm{Hz} \\ 1.8 - \frac{0.8}{1050} (d_k - 450), & \text{otherwise,} \end{cases}          (9)

where d_k = LSF_{k+1} - LSF_{k-1} with LSF_0 = 0 Hz and LSF_{11} = 4000 Hz.
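A minimal sketch of this weighting for a 10-dimensional LSF vector in Hz (the vectorized form and names are assumptions; the constants follow (9)):

import numpy as np

def lsf_weights(lsf_hz):
    # Weights W_k of eq. (9) for a 10-dimensional LSF vector given in Hz.
    ext = np.concatenate(([0.0], lsf_hz, [4000.0]))   # LSF_0 = 0 Hz, LSF_11 = 4000 Hz
    d = ext[2:] - ext[:-2]                            # d_k = LSF_{k+1} - LSF_{k-1}
    return np.where(d < 450.0,
                    3.347 - (1.547 / 450.0) * d,
                    1.8 - (0.8 / 1050.0) * (d - 450.0))

# Closely spaced LSFs (small d_k) receive larger weights.
print(lsf_weights(np.array([300., 450., 900., 1300., 1800.,
                            2300., 2700., 3100., 3400., 3700.])))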

A simplified block diagram of a commonly used search procedure is shown in Figure 1, where rLSF_k and eLSF_k are the residual and error LSFs, respectively.

[Figure 1 block diagram: target LSF_k, predicted LSF pLSF_k, residual rLSF_ki, i:th residual codebook vector CB_ki, error eLSF_ki, weight W_k, estimated SD_i, and selection of the index i with minimum SD.]

Fig. 1. Commonly used residual quantization loop.

The W_k based estimated spectral distortion SD_i is calculated for all codebook instances i in frame k. The index i resulting in the minimum distortion SD_i is then selected as the quantizer output (codebook index). However, it may happen that the resulting quantized vector is out of order; in that case the decoded LSF vector must be sorted afterwards.
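A minimal sketch of this conventional search loop (illustrative only; the weighted squared LSF error stands in for the estimated SD_i):

import numpy as np

def search_conventional(target_lsf, predicted_lsf, codebook, weights):
    # Conventional residual VQ search of Figure 1; codebook is an (N, p) array
    # of residual code vectors CB_ki.
    residual = target_lsf - predicted_lsf        # rLSF_k
    errors = residual - codebook                 # eLSF_ki for every candidate i
    sd = np.sum(weights * errors ** 2, axis=1)   # weighted distortion estimate per index
    best = int(np.argmin(sd))
    qlsf = predicted_lsf + codebook[best]        # eq. (6); may still be out of order
    return best, qlsf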

If we reconfigure the quantization loop, we can form the improved search block shown in Figure 2. The difference is that all vectors which in the traditional search are not in order had a bigger SD value than they should have; thus the found vector was not necessarily the optimal one. Sorting before the selection takes into account vector combinations which normally would have produced out-of-order vectors and a large SD_i, but which after the sorting yield a lower SD_i and might even be selected for the output. In a sense this means that vectors otherwise considered outliers now get another chance to work efficiently for improving the VQ performance.

[Figure 2 block diagram: the i:th residual codebook vector CB_ki is added to the predicted LSF pLSF_k to form qLSF_ki, which is sorted to ascending order; the error eLSF_ki against the target LSF_k is weighted by W_k to give SD_i, and the index i with minimum SD is selected.]

Fig. 2. Residual quantization loop with internal sorting.
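A corresponding sketch of the search with internal sorting (again illustrative; only the candidate formation and the sorting step differ from the conventional loop):

import numpy as np

def search_with_sorting(target_lsf, predicted_lsf, codebook, weights):
    # Residual VQ search of Figure 2: every candidate qLSF_ki = pLSF_k + CB_ki
    # is sorted to ascending order before the weighted error is evaluated.
    candidates = predicted_lsf + codebook            # qLSF_ki for every index i
    candidates = np.sort(candidates, axis=1)         # sorting inside the quantizer loop
    errors = target_lsf - candidates                 # eLSF_ki
    sd = np.sum(weights * errors ** 2, axis=1)       # weighted distortion estimate per index
    best = int(np.argmin(sd))
    return best, candidates[best]                    # the output vector is always in order

Since the selected vector is already in ascending order, the decoder's usual stabilizing sort leaves it unchanged, so the encoder and decoder use identical quantized LSF vectors.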

The search speed of the proposed method is slightly slower than that of the normal solution. Also, the sorted vector is selected as the optimal index quite seldom, but whenever it is used the performance increases without extra bits.

4. PROPERTIES OF THE SORTING

In the case of a non-predictive split VQ the performance improves with the help of the sorting. Note that although the codebooks inside the splits are in order, the codebooks of neighbouring splits overlap in the LSF domain.

In the case of a multistage VQ, only sorting the last stage guarantees a performance improvement. However, sorting can also be used for some or all internal stages; in that case the decoder has to do the sorting in the same stages in order to decode correctly, and improvements are likely to be seen.

Together with prediction, a quality improvement could be expected. However, initial simulation results indicate that the improvements gained in the quantization stage are smaller than the slight worsening of the prediction performance. The likely root cause is that the sorted outlier is not in line with the speech model and thus gives a worse prediction, although it gives a smaller error on the current frame.

Sorting also works for larger matrix quantizers. Extending the scope further, it can be expected that with prediction and a larger matrix quantizer the benefits of the sorting are larger.
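To illustrate the multistage case discussed above, the following sketch (structure, names, and the two-stage split are assumptions, not the paper's configuration) applies the sorting only in the last stage of a two-stage residual quantizer:

import numpy as np

def two_stage_search_sorted_last(target_lsf, predicted_lsf, cb1, cb2, weights):
    # Two-stage residual VQ where only the last stage uses internal sorting.
    residual = target_lsf - predicted_lsf
    # Stage 1: conventional weighted search on the residual.
    i1 = int(np.argmin(np.sum(weights * (residual - cb1) ** 2, axis=1)))
    # Stage 2: full reconstructions are sorted before comparison with the target.
    candidates = np.sort(predicted_lsf + cb1[i1] + cb2, axis=1)
    i2 = int(np.argmin(np.sum(weights * (target_lsf - candidates) ** 2, axis=1)))
    return (i1, i2), candidates[i2]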

The method is especially well suited for arithmetic or random codebooks, since such codebooks are not optimized for the speech model but use less memory. With sorting, the outliers are brought back into use, reducing the quantization error.

5. CONCLUSIONS

A new LSF quantization criterion was presented in which the ascending ordering property of the LSF vectors is taken into account inside the quantizer loop. The proposed method reduces quantization error while retaining the original bit allocation.

6. REFERENCES

[1] A. McCree, J. Stachurski, T. Unno, E. Ertan, E. Paksoy, V. Viswanathan, A. Heikkinen, A. Rämö, S. Himanen, P. Blöcher, and O. Dressler, "A 4 kb/s Hybrid MELP/CELP Speech Coding Candidate for ITU Standardization," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Orlando, pp. 629-632, May 2002.

[2] F. Itakura, "Line spectrum representation of linear predictive coefficients of speech signals," J. Acoust. Soc. Amer., vol. 57, S35, Apr. 1975.

[3] K. K. Paliwal and B. S. Atal, "Efficient vector quantization of LPC parameters at 24 bits/frame," IEEE Trans. Speech Audio Processing, vol. 1, pp. 3-14, 1993.

[4] T. Eriksson, J. Linden, and J. Skoglund, "Interframe LSF quantization for noisy channels," IEEE Trans. Speech Audio Processing, vol. 7, pp. 495-509, 1999.

[5] C. S. Xydeas and C. Papanastasiou, "Split matrix quantization of LPC parameters," IEEE Trans. Speech Audio Processing, vol. 7, no. 2, pp. 113-125, March 1999.

[6] T. Eriksson, J. Linden, and J. Skoglund, "A safety-net approach for improved exploitation of speech correlations," in Proc. Int. Conf. Digital Signal Processing, Cyprus, vol. 1, pp. 96-101, 1995.

[7] N. Sugamura and N. Farvardin, "Quantizer design in LSP speech analysis and synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 398-401, 1988.

[8] 3GPP TS 26.090, "AMR speech codec; transcoding functions."