
Audio Engineering Society

Convention Paper Presented at the 122nd Convention 2007 May 5–8 Vienna, Austria The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

High Quality, Low Power QMF Bank Design for SBR, Parametric Coding, and MPEG Surround Decoders

Han-Wen Hsu 1, Hsin-Yao Tseng 1, Chi-Min Liu 1, Wen-Chieh Lee 1, and Chung-Han Yang 1

1 PSPLab, Computer Science and Information Engineering, National Chiao-Tung University, Hsin-Chu, 33050, Taiwan [email protected]

ABSTRACT

Because of its alias-free property, the complex quadrature mirror filter (QMF) bank is used in the MPEG-4 audio standard for SBR, parametric, and surround coding. The high complexity of the complex QMF bank and of the complex-valued processing in the decoder has led to low-power decoders that adopt the real QMF bank as the basic building block. However, the artifacts caused by aliasing in the real QMF bank are a major concern. This paper studies these artifacts and proposes a novel QMF bank design that achieves both low complexity and high quality. The paper also applies the novel QMF bank to develop high-quality, low-power SBR, parametric, and MPEG Surround decoders, and shows their merits in complexity and quality.

1. INTRODUCTION

In recent MPEG coding standards such as MPEG-4 HE-AAC [1] (see Figure 1 and Figure 2) and MPEG Surround [2], two processing frameworks are adopted to eliminate or reduce the aliasing problem. The first adopts the complex QMF bank. Because the negative frequency components are absent, the decimated subbands are inherently aliasing-free. However, in addition to costing almost twice as much as the cosine QMF bank, it incurs high computational complexity because the subsequent operations on the subband signals are carried out in the complex domain. This complexity limits its use in computation-sensitive or power-sensitive applications.

Hence, the second framework is based on the cosine QMF bank, which keeps the complexity reasonable by staying in the real domain, and relies on auxiliary mechanisms to eliminate or reduce aliasing. However, these auxiliary mechanisms sacrifice quality in other ways. For example, Low Power SBR (LP-SBR) imposes the constraint that the scaling gains of contiguous tonal subbands must be equal in order to maintain the cancellation of the aliasing terms. This constraint reduces the accuracy of the reconstructed high-frequency envelope. In the Low Power version of MPEG Surround [2], “real to complex” converters are cascaded after the cosine QMF bank and applied to the lowest eight subbands to reduce the aliasing terms and improve the low-frequency quality. How effective this reduction is remains an issue to consider.

H.W. Hsu et al.

HQ, LP QMF Bank Design

This paper proposes a novel QMF bank design that achieves both high quality and low complexity. The framework is referred to as HQ+LP-SBR. Its merits lie in adapting to the signal characteristics, which achieves low complexity without suffering the “consistent scaling gain” problem of Low Power SBR.

2. HQ-SBR, LP-SBR AND HQ+LP-SBR

2.1. Aliasing-Term Cancellation in Cosine and Complex QMF

In this section, the basic concept of aliasing-term cancellation in the cosine and complex QMF banks is reviewed with concise diagrams [3][4]. Figure 3 illustrates the conventional 4-channel complex and cosine QMF banks. In the cosine QMF bank, each negative frequency component is the complex conjugate of its positive frequency counterpart. This is why the signals remain real after passing through the cosine QMF bank, and also why its computational complexity is lower than that of the complex QMF bank.

Figure 1: Block diagram of HE-AAC decoder High Quality version

Figure 3: Illustration of 4-channel cosine and complex QMF banks

Figure 2 : Block diagram of HE-AAC Low Power version

As shown in Figure 4, the aliasing terms in the cosine QMF bank arise from the overlap between the desired signal and the images caused by the decimator. By design of the cosine QMF bank, the aliasing term belonging to one subband is cancelled by its adjacent subbands, provided the scaling gains remain unchanged. Figure 5, on the other hand, shows the situation after decimation and interpolation are applied to each subband of the complex QMF bank. In contrast to the cosine QMF bank, no aliasing term is introduced because the negative frequency components are absent. For details of the construction and the aliasing-free property of the complex QMF bank, see [4].

AES 122nd Convention, Vienna, Austria, 2007 May 5–8 Page 2 of 10



2.2. Framework of HQ+LP-SBR

Figure 6 illustrates the block diagram of the HQ+LP-SBR framework. According to the related work on LP-SBR [5], aliasing terms resulting from tone-like signals cause more audible degradation than those resulting from noise-like signals. Based on this assumption, if a subband signal is noise-like, its aliasing term can be ignored and the subband can be processed in the real-valued domain to save computation. If a subband signal is tone-like, its aliasing term must be avoided to preserve audio quality. The proposed framework consists of three stages. The first is the “Type Decision of Filter Bank”, which determines whether each subband signal is tonal-like or noise-like and then adaptively chooses either the complex or the real analysis/synthesis QMF bank according to the number of tonal-like subbands. The second is the “complexification” or “realification” process applied to each subband to handle the aliasing term. The third is the “Imaginary Part Addition”, which adds the contribution of the imaginary part of the complexified subbands to the decoded signal.

Figure 4: Illustration of aliasing-term in cosine QMF bank

Figure 6: Block diagram of HQ+LP-SBR

2.2.1. Type Decision of Filter Bank

Figure 5: Illustration of no aliasing-terms in complex QMF bank

This module determines the type of the essential filterbank in two steps. The first is the “subband type decision”, which classifies each subband signal as either tonal-like or noise-like. The second is the “filterbank type decision”, which selects the essential analysis/synthesis filterbank as either the cosine or the complex QMF bank.




Figure 7 shows the flowchart of the “subband type decision”. First, the magnitudes of the 32 decoded MDCT spectral lines in one subband are sorted. Then, the noise floor is estimated by

$$ N = \frac{1}{16} \sum_{k=8}^{23} X_s[k], \tag{1} $$

where $X_s$ denotes the sorted data; the first 8 and the last 8 values are discarded to remove the influence of tonal components and of zero-valued re-quantized data. The tonal component, on the other hand, is estimated by the mean of the 4 largest values,

$$ T = \frac{1}{4} \sum_{k=28}^{31} X_s[k]. \tag{2} $$

The difference between T and N in dB is then compared with a threshold (9 dB here): if the difference is larger than or equal to the threshold, the subband is marked as a tonal-like band; otherwise it is marked as a noise-like band.
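The subband type decision of (1) and (2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `classify_subband` is hypothetical, the dB difference is assumed to be 20·log10(T/N) because T and N are means of magnitudes, and the handling of a zero noise floor is an assumed convention that the paper does not specify.

```python
import math

def classify_subband(magnitudes, threshold_db=9.0):
    """Classify one subband of 32 spectral lines as 'tonal' or 'noise'."""
    if len(magnitudes) != 32:
        raise ValueError("expected 32 spectral magnitudes per subband")
    xs = sorted(abs(m) for m in magnitudes)   # ascending: X_s[0] <= ... <= X_s[31]
    noise_floor = sum(xs[8:24]) / 16.0        # eq. (1): mean over k = 8..23
    tonal = sum(xs[28:32]) / 4.0              # eq. (2): mean over k = 28..31
    if noise_floor == 0.0:
        # Assumed convention: any energy above a zero floor counts as tonal.
        return "tonal" if tonal > 0.0 else "noise"
    # Assumption: amplitude ratio, hence 20*log10.
    diff_db = 20.0 * math.log10(tonal / noise_floor)
    return "tonal" if diff_db >= threshold_db else "noise"
```

For example, a flat subband gives a 0 dB difference and is classified as noise-like, while four lines that are ten times larger than the rest give 20 dB and are classified as tonal-like.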

After the tonal/noise-like types of all subbands belonging to the AAC part have been determined, the essential type of the analysis/synthesis QMF bank is decided as illustrated in Figure 8. If more than 5 subbands are classified as tone-like, the essential analysis/synthesis QMF bank is the complex QMF bank; otherwise it is the cosine QMF bank. The threshold of 5 follows from the overall computational complexity and is discussed later.

2.2.2. Complexification and Realification

This module compensates for the deficiency of using a single type of QMF bank. If the QMF bank is cosine and some subbands are tonal-like, their aliasing terms must be handled, so the complexification process is applied to those subbands to eliminate the aliasing terms. Conversely, if the QMF bank is complex but a subband is noise-like, converting its QMF signal to the real-valued domain through the realification process lowers the complexity. Figure 9 illustrates the flowchart of this module for each subband.
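The filterbank type decision and the per-subband routing described above can be summarized as follows. This is a sketch only: the names `choose_filterbank` and `plan_subbands` and the string labels are hypothetical, and the input is assumed to be a list of "tonal"/"noise" labels, one per subband.

```python
def choose_filterbank(subband_types, max_complexified=5):
    """Pick the essential QMF bank: complex if more than 5 subbands are tonal."""
    tonal_count = sum(1 for t in subband_types if t == "tonal")
    return "complex" if tonal_count > max_complexified else "cosine"

def plan_subbands(subband_types, max_complexified=5):
    """Return the essential bank and the action applied to each subband."""
    bank = choose_filterbank(subband_types, max_complexified)
    actions = []
    for t in subband_types:
        if bank == "cosine" and t == "tonal":
            actions.append("complexify")   # remove the aliasing term of a tonal band
        elif bank == "complex" and t == "noise":
            actions.append("realify")      # drop the imaginary part to save computation
        else:
            actions.append("keep")         # the subband already matches the bank
    return bank, actions
```

With 5 tonal subbands the cosine bank is kept and those 5 subbands are complexified; with 6 or more the complex bank is used and every noise-like subband is realified.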

Figure 9 Flowchart of complexification (A) and realification (B)

Figure 7 : Flowchart of subband signal type decision

Figure 8: Flowchart of filterbank type decision

The main matrix operations in the cosine and complex analysis QMF banks are

$$ X_r[k] = \sum_{n=0}^{63} x[n] \cos\left( \frac{\pi}{2M} (k+0.5)(2n-96) \right) \tag{3} $$

and

$$ X_c[k] = \sum_{n=0}^{63} x[n] \exp\left( j \frac{\pi}{2M} (k+0.5)(2n-96) \right) \tag{4} $$

for k = 0~31. Furthermore, the main matrix operations in the cosine and complex synthesis QMF banks are

$$ v_r[n] = \sum_{k=0}^{63} X_r[k] \cos\left( \frac{\pi}{2M} (k+0.5)(2n-64) \right) \tag{5} $$

and

$$ v_c[n] = \mathrm{Re}\left\{ \sum_{k=0}^{63} X_c[k] \exp\left( j \frac{\pi}{2M} (k+0.5)(2n-64) \right) \right\} \tag{6} $$



for n = 0~127. From (3) and (4), complexification of an analysis subband amounts to additionally computing the sine summation below, and realification amounts to dropping it; that is,

$$ X_i[k] = \sum_{n=0}^{63} x[n] \sin\left( \frac{\pi}{2M} (k+0.5)(2n-96) \right). \tag{7} $$
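As an illustration of (3), (4) and (7), the sketch below checks that adding the sine summation to the cosine output reproduces the complex analysis output, and that realification is simply taking the real part. The function names are hypothetical, and 2M = 64 is an assumption made here for the 32-band analysis bank (k = 0~31); the paper does not state M explicitly.

```python
import cmath
import math

M = 32  # assumption: 2M = 64 for the 32-band analysis bank

def phase(k, n):
    """Common phase term of eqs. (3), (4) and (7)."""
    return math.pi / (2 * M) * (k + 0.5) * (2 * n - 96)

def analysis_cosine(x, k):
    """Eq. (3): real-valued (cosine) analysis output X_r[k]."""
    return sum(x[n] * math.cos(phase(k, n)) for n in range(64))

def analysis_sine(x, k):
    """Eq. (7): the extra summation X_i[k] needed for complexification."""
    return sum(x[n] * math.sin(phase(k, n)) for n in range(64))

def analysis_complex(x, k):
    """Eq. (4): complex analysis output X_c[k]."""
    return sum(x[n] * cmath.exp(1j * phase(k, n)) for n in range(64))

def complexify(x, k, x_r):
    """Turn a cosine-bank sample into the complex-bank sample."""
    return complex(x_r, analysis_sine(x, k))

def realify(x_c):
    """Turn a complex-bank sample into the cosine-bank sample."""
    return x_c.real
```

In this reading, complexification costs one extra 64-term summation per subband sample, which is the quantity whose cost is analyzed in Section 2.3.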

In the synthesis part, if the essential QMF bank is complex and some subband signals are real, an energy adjustment should be applied to those subbands to compensate for the absent imaginary part. On the other hand, from (5) and (6), the complex synthesis can be expressed as

$$ v_c[n] = v_r[n] - \sum_{k=0}^{63} X_i[k] \sin\left( \frac{\pi}{2M} (k+0.5)(2n-64) \right). \tag{8} $$

Hence, if the essential QMF bank is cosine and some subband signals are complex, the block “Imaginary Part Addition” in Figure 6 computes the sum of the products of the imaginary parts of the subband signals and the sine coefficients. The decoded time-domain signal is then obtained by subtracting the output of “Imaginary Part Addition” from the output of the original cosine QMF bank. Additionally, three adjustments are made in the process. First, the time phase parameters of the complex QMF and cosine QMF banks must be the same so that aliasing cancellation is achieved simultaneously. Second, the energy of the HF real-valued subbands has to be divided by 2, as in LP-SBR, to avoid the excessive energy problem [5]. Third, the LF part that is not subject to gain adjustment can be synthesized through the cosine QMF bank to reduce the complexity.

2.3. Complexity for Complexification and “Imaginary Part Addition”

The direct computation of the summation (7) for the imaginary part needs 64 multiplications. However, the sine coefficients have the symmetry properties

$$ S[k, 32-n] = S[k, n], \quad n = 0 \sim 31, \tag{9} $$

and

$$ S[k, 64-n] = -S[k, 32+n], \quad n = 0 \sim 31, \tag{10} $$

where $S[k, n]$ denotes the sine coefficients. There are thirty-three different values among the sine coefficients. Hence, an imaginary part needs only 33 multiplications by

$$ \sum_{n=0}^{63} x[n] S[k,n] = \sum_{n=0}^{15} \left( x[n] + x[32-n] \right) S[k,n] + x[16] S[k,16] + x[48] S[k,48] + \sum_{n=1}^{15} \left( x[32+n] - x[64-n] \right) S[k,32+n]. \tag{11} $$

On the other hand, (8) can be written in the matrix form

$$ \vec{v}_c = \hat{C} \cdot \vec{X}_r - \hat{S} \cdot \vec{X}_i. \tag{12} $$

This form can be represented as

$$ \vec{v}_c = \sum_{k=0}^{63} X_r[k] \cdot \hat{C}_k - \sum_{k=0}^{63} X_i[k] \cdot \hat{S}_k, \tag{13} $$

where $\hat{C}$ and $\hat{S}$ are the cosine and sine matrices of size 128 × 64 defined by

$$ \hat{C}[n, k] = \cos\left( \frac{\pi (k+0.5)(2n-64)}{128} \right) \tag{14} $$

and

$$ \hat{S}[n, k] = \sin\left( \frac{\pi (k+0.5)(2n-64)}{128} \right) \tag{15} $$

for k = 0~63 and n = 0~127, and $\hat{C}_k$ and $\hat{S}_k$ denote the k-th columns of $\hat{C}$ and $\hat{S}$. Therefore, an imaginary component $X_i[k]$ needs 128 multiplications for $X_i[k] \cdot \hat{S}_k$ in (13). Similarly, by the facts

$$ \hat{S}[64-n, k] = -\hat{S}[n, k], \quad n = 0 \sim 63, \tag{16} $$

and

$$ \hat{S}[128-n, k] = \hat{S}[64+n, k], \quad n = 0 \sim 63, \tag{17} $$

there are only 65 different values in $\hat{S}_k$. Hence, an imaginary component $X_i[k]$ needs only 65 multiplications for $X_i[k] \cdot \hat{S}_k$ in (13).
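The two multiplication counts can be checked with a short sketch. It folds the analysis summation with the symmetries (9) and (10), folds one synthesis column with (16) and (17), and confirms that cosine synthesis minus the sine-weighted imaginary parts equals the complex synthesis. All function names are hypothetical; the synthesis phase uses the divisor 128 from (14) and (15), while 2M = 64 on the analysis side is an assumption.

```python
import cmath
import math

def s_ana(k, n):
    """Analysis sine coefficient S[k, n] of eq. (7); 2M = 64 is assumed."""
    return math.sin(math.pi * (k + 0.5) * (2 * n - 96) / 64.0)

def syn_phase(n, k):
    """Phase shared by eqs. (14) and (15)."""
    return math.pi * (k + 0.5) * (2 * n - 64) / 128.0

def imag_part_folded(x, k):
    """X_i[k] computed with the 33 multiplications of the folded sum."""
    acc = x[16] * s_ana(k, 16) + x[48] * s_ana(k, 48)
    mults = 2
    for n in range(16):          # pairs (n, 32 - n), symmetry (9)
        acc += (x[n] + x[32 - n]) * s_ana(k, n)
        mults += 1
    for n in range(1, 16):       # pairs (32 + n, 64 - n), symmetry (10)
        acc += (x[32 + n] - x[64 - n]) * s_ana(k, 32 + n)
        mults += 1
    return acc, mults

def imag_column_folded(x_i, k):
    """The 128 products X_i[k] * S_hat[n, k] from 65 multiplications."""
    col = [0.0] * 128
    mults = 0
    for n in range(33):          # rows 0..64, symmetry (16)
        p = x_i * math.sin(syn_phase(n, k))
        mults += 1
        col[n], col[64 - n] = p, -p
    for n in range(65, 97):      # rows 65..127, symmetry (17)
        p = x_i * math.sin(syn_phase(n, k))
        mults += 1
        col[n], col[192 - n] = p, p
    return col, mults

def synth_hybrid(X):
    """Cosine synthesis of Re(X) minus the folded sine columns of Im(X)."""
    v = [sum(X[k].real * math.cos(syn_phase(n, k)) for k in range(64))
         for n in range(128)]
    for k in range(64):
        col, _ = imag_column_folded(X[k].imag, k)
        for n in range(128):
            v[n] -= col[n]
    return v

def synth_complex(X):
    """Reference: real part of the complex synthesis sum."""
    return [sum((X[k] * cmath.exp(1j * syn_phase(n, k))).real for k in range(64))
            for n in range(128)]
```

The row at n = 32 and the coefficient S[k, 48] are exactly zero, so one multiplication on each side is trivial; the counts 33 and 65 follow the paper in including them.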

From the fast algorithms in [4] and [6], the matrix operations need 80 + 192 multiplications in total in LP-SBR and 256 + 512 in HQ-SBR. Complexifying one subband sample and synthesizing it back to the time domain requires 33 + 65 multiplications. Hence, at most 5 subbands can be complexified under the constraint that the total computational complexity of the cosine QMF bank plus the complexification process must not exceed that of the complex QMF bank.
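The limit of 5 follows directly from these counts, as the small calculation below shows. The constant and function names are hypothetical; the numbers are the ones quoted above.

```python
LP_MULTS = 80 + 192      # cosine analysis + synthesis (LP-SBR)
HQ_MULTS = 256 + 512     # complex analysis + synthesis (HQ-SBR)
PER_SUBBAND = 33 + 65    # complexification + imaginary part addition, per subband

def hybrid_mults(num_complexified):
    """Multiplications of the cosine bank plus the complexified subbands."""
    return LP_MULTS + num_complexified * PER_SUBBAND

def max_complexified_subbands():
    """Largest count whose cost does not exceed the complex QMF bank."""
    return (HQ_MULTS - LP_MULTS) // PER_SUBBAND
```

With 5 complexified subbands the hybrid needs 272 + 5 × 98 = 762 multiplications, just under the 768 of HQ-SBR; a sixth subband would raise the total to 860.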

2.4. Merits of HQ+LP-SBR

From the perspective of quality, because the aliasing-free property of HQ+LP-SBR is based on the intrinsic no-aliasing attribute of the complex QMF bank, the expected audio quality approaches that of HQ-SBR. As far as computational complexity is concerned, when the essential QMF bank is complex, applying realification to noise-like subbands leads to real-valued operations in the following modules, so the total computational complexity is smaller than that of HQ-SBR. When the essential QMF bank is cosine, the criterion chosen in Section 2.3, together with the noise-like subbands that are processed in the real-valued domain in the subsequent procedures, guarantees that the total computational complexity is smaller than that of HQ-SBR.

3. EXPERIMENTS

For objective quality evaluation, the PEAQ (perceptual evaluation of audio quality) system recommended by ITU-R Task Group 10/4 is adopted. The system includes a subtle perceptual model to measure the difference between two tracks. The objective difference grade (ODG) is the output variable of the objective measurement method. ODG values range from 0 to −4, where 0 corresponds to an imperceptible impairment and −4 to an impairment judged as very annoying. An improvement of 0.1 is usually perceptually audible. PEAQ has been widely used to evaluate compression techniques because of its ability to detect perceptual differences audible to the human hearing system. The following experiments are based on this PEAQ system [7].

The twelve test tracks recommended by MPEG are shown in Table 1. These tracks include critical music items balanced among percussion, string, and wind instruments and human vocals. To verify that a decoder adopting the new approach can reach the objective sound quality of HQ-SBR, the inputs are the 12 test tracks encoded by three different codecs, NCTU-HEAAC [8], Nero, and Coding Technologies, at a bit rate of 80 kbps. The experimental results are shown in Figure 10, Figure 11 and Figure 12. The data show that the performance of HQ+LP-SBR meets our expectation: in the best case its objective sound quality equals that of HQ-SBR, and in the worst case that of LP-SBR. Generally, the ODG of HQ+LP-SBR lies between those of HQ-SBR and LP-SBR, and the average ODG of HQ+LP-SBR is almost the same as that of HQ-SBR.

Signal Description

      Track  Signal                      Mode    Time (sec)  Remark
  1   Es01   Vocal (Suzanne Vega)        Stereo  10          (c)
  2   Es02   German speech               Stereo  8           (c)
  3   Es03   English speech              Stereo  7           (c)
  4   Sc01   Trumpet solo and orchestra  Stereo  10          (d)
  5   Sc02   Orchestral piece            Stereo  12          (d)
  6   Sc03   Contemporary pop music      Stereo  11          (d)
  7   Si01   Harpsichord                 Stereo  7           (b)
  8   Si02   Castanets                   Stereo  7           (a)
  9   Si03   Pitch pipe                  Stereo  27          (b)
 10   Sm01   Bagpipes                    Stereo  11          (b)
 11   Sm02   Glockenspiel                Stereo  10          (a) (b)
 12   Sm03   Plucked strings             Stereo  13          (a) (b)

Remark:
(a) Transients: pre-echo sensitive, smearing of noise in temporal domain.
(b) Tonal/Harmonic structure: noise sensitive, roughness.
(c) Natural vocal (critical combination of tonal parts and attacks): distortion sensitive, smearing of attacks.
(d) Complex sound: stresses the Device Under Test.

Table 1: The twelve test tracks recommended by MPEG.

Figure 10: ODG of 12 testing tracks encoded by NCTUAAC (vertical axis: ODG, 0 to −2; tracks es01 to sm03; series: HQ-SBR, HQ+LP-SBR, LP-SBR)




Figure 11: ODG of 12 testing tracks encoded by Nero (vertical axis: ODG, 0 to −3; tracks es01 to sm03; series: HQ-SBR, HQ+LP-SBR, LP-SBR)

Figure 12: ODG of 12 testing tracks encoded by Coding Technologies (vertical axis: ODG, 0 to −4; tracks es01 to sm03; series: HQ-SBR, HQ+LP-SBR, LP-SBR)

On the other hand, Figure 15 ~ Figure 17 illustrate the spectrum of one frame decoded by HQ-SBR, LP-SBR and HQ+LP-SBR. Figure 18 ~ Figure 20 give another example that reflects, in the frequency domain, the same result as obtained with the ODG. In Figure 16 and Figure 19, the spectra of HQ-SBR and HQ+LP-SBR are compared to show that their energy floors are almost the same. Moreover, Figure 15 and Figure 18 show the spectra of HQ-SBR and LP-SBR together to demonstrate that the energy floor of LP-SBR is sometimes higher or lower than that of HQ-SBR, which results from the consistent scaling gain constraint that LP-SBR must satisfy. In Figure 17 and Figure 20, the spectra of LP-SBR and HQ+LP-SBR are compared to show that HQ+LP-SBR does not suffer from the consistent scaling gain problem.

4. EXTENSION TO PARAMETRIC CODING AND MPEG SURROUND

The HQ+LP-SBR can also be fitted into the architecture of MPEG-4 HE-AAC version 2. The block diagram of this combined framework is shown in Figure 13. The only difference from HQ+LP-SBR is that the lowest 3 or 4 subbands must be complex to avoid aliasing terms, because they are split further.

Figure 14 shows the block diagram of fitting HQ+LP-SBR into MPEG Surround. In Figure 14, the block “Type Decision of Filter bank” is replaced by “Subband Signals Type Decision”, and “Complexification/Realification” is replaced by “Realification”. This is because at least the lowest 8 subbands must be complex, but HQ+LP-SBR can afford to complexify only 5 subbands when the essential QMF bank is cosine. Therefore, the essential QMF bank is always complex. Realification of the noise-like subbands determined by “Subband Signals Type Decision” is applied to reduce the computational complexity when HQ+LP-SBR is combined with MPEG Surround.

5. CONCLUSION

This paper has proposed a decoder with both low complexity and high quality based on adaptation to the signal characteristics. In this decoder architecture, high quality is reached by using the intrinsic aliasing-free property of the complex QMF bank, without suffering the consistent scaling gain problem of LP-SBR, and the computational complexity is reduced by transforming noise-like subband signals into the real-valued domain. The objective measurement with the PEAQ system has shown that the quality is close to that of HQ-SBR.

6. ACKNOWLEDGEMENTS

This work was supported by National Science Council under Contract NSC95-2221-E-009-262.



7. REFERENCES

[1] ISO/IEC 14496-3:2001/Amd.3:2004, “Information technology - Coding of audiovisual objects - Part 3: Audio”.

[2] ISO/IEC 23003-1:2006/FDIS, “Information technology - MPEG audio technologies - Part 1: MPEG Surround”.

[3] P.P. Vaidyanathan, “Multirate Systems and Filter Banks”, Englewood Cliffs, NJ: Prentice-Hall, 1993.

[4] H.W. Hsu, C.M. Liu, and W.C. Lee, “Fast Complex Quadrature Mirror Filterbanks for MPEG-4 HE-AAC”, AES 121st Convention, San Francisco, CA, USA, 2006 October 5-8.

[5] O. Shimada, T. Nomura, Y. Takamizawa, M. Serizawa, N. Tanaka, M. Tsushima, T. Norimatsu, C.K. Seng, K.K. Hann and N.S. Hong, “A Low Power SBR Algorithm for the MPEG-4 Audio Standard and its DSP Implementation”, AES 116th Convention, Berlin, Germany, 2004 May 8-11.

[6] S.W. Huang, T.H. Tsai, and L.G. Chen, “Fast Decomposition of Filterbanks for the State-of-the-Art Audio Coding”, IEEE Signal Processing Letters, vol. 12, no. 10, October 2005.

[7] ITU Radiocommunication Study Group 6, “Draft Revision to Recommendation ITU-R BS.1387 - Method for objective measurements of perceived audio quality”.

[8] NCTU-HEAAC website http://psplab.csie.nctu.edu.tw/

Figure 13 Block diagram of HQ+LP-SBR plus PS

Figure 14 Block diagram of HQ+LP-SBR plus MPEG Surround




Figure 15 The spectrum of signal decoded by HQ-SBR and LP-SBR

Figure 16 The spectrum of signal decoded by HQ-SBR and HQ+LP-SBR

Figure 17 The spectrum of signal decoded by LP-SBR and HQ+LP-SBR




Figure 18 The spectrum of signal decoded by HQ-SBR and LP-SBR

Figure 19 The spectrum of signal decoded by HQ-SBR and HQ+LP-SBR

Figure 20 The spectrum of signal decoded by LP-SBR and HQ+LP-SBR