Audio Engineering Society
Convention Paper Presented at the 117th Convention 2004 October 28–31 San Francisco, CA, USA This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.
Audio Patch Method in MPEG-4 HE-AAC Decoder
Han-Wen Hsu (1), Chi-Min Liu (1), Wen-Chieh Lee (1), and … (2)
1. PSPLab, Computer Science and Information Engineering, National Chiao-Tung University, Hsin-Chu, 33050, Taiwan
[email protected]
2. InterVideo Digital Technology (Shanghai) Co., Ltd., 6F, Caohejing Software Mansion, No. 461 Hongcao Rd., Shanghai, PRC (200233)
ABSTRACT
This paper extends the previous work on AAC to HE-AAC. The audio patch method consists of two individual parts: zero band dithering and high frequency reconstruction. Zero band dithering conceals the fishy artifact in the low frequency part, which is encoded by a conventional AAC encoder. High frequency reconstruction extends the audio obtained from SBR to a full bandwidth signal. Intensive experiments have been conducted on various audio tracks to check the quality improvement and the possible risk of degrading the quality. The objective test measure used is the system recommended by ITU-R Task Group 10/4.
1. INTRODUCTION
An audio patch method that works in audio decoders without any prior information has been proposed and successfully enhances MP3 and AAC tracks [1]. The audio patch method consists of two individual parts: zero band dithering and high frequency reconstruction. This paper considers the extension of the two modules to HE-AAC decoders. Under a limited bit rate, almost all audio compression codecs sacrifice the high frequency components of the signal to obtain the best perceptual quality, and allocate all available bits to the low frequency components, which are more important for human hearing. However, as the audio bandwidth decreases, the sound becomes muffled. To ease this bandwidth tradeoff, an advanced scheme referred to as “Spectral Band Replication (SBR)” [2]-[4] has been proposed to compress high frequency content with little overhead, commonly about 1~3 kbits per second for each channel. With the SBR module handling the high frequency content, the AAC encoder can focus on compressing the low frequency part with a more sufficient bit budget. The resulting scheme is referred to as MPEG-4 High Efficiency (HE) AAC or AACplus. Figure 1 illustrates an audio spectrum of an HE-AAC frame, where SBR is applied to the range from 8 kHz to 16 kHz.
Although the SBR module can extend the bandwidth of the narrowband signal decoded by the AAC decoder, the resulting frequency range is usually still lower than 16 kHz. The maximum frequency of the SBR range is determined by two factors. One is the number of bits available for the SBR module. When the SBR range is wider, the number of bits required in the SBR data for the energy information and for the ratio of tonal to noise-like components on the time-frequency grids [2] increases. Hence, under the bit allocation policy between the conventional AAC encoder and the SBR module, the bandwidth extension range is constrained. The other factor is the spectral band duplication policy commonly used in the SBR algorithm. Usually, SBR extends the bandwidth of the signal decoded by the conventional AAC decoder to twice its width. Hence, if the original audio is harder to compress, the cut-off frequency in the AAC encoder is set lower and the final bandwidth of the decoded audio becomes limited. To enhance the audio quality, this paper extends the signal enhanced by SBR to a full bandwidth signal. On the other hand, the conventional AAC encoder still largely affects the audio quality. The zero band is a frequent artifact of an audio encoder at very low bit rates. As illustrated in Figure 1, a broken spectrum appears in the low frequency part due to many zero bands. Therefore, a zero-band dithering method to conceal the artifact is required.
Figure 1: A spectrum of an HE-AAC audio frame.
Experiments are conducted on a large set of audio tracks to verify the improved quality. Through both subjective and objective measures, the method is verified to improve the perceptual quality of HE-AAC encoded audio signals to approach the original AAC at 65% of the bit rate. In particular, the objective measurement by the perceptual evaluation of audio quality system, which is the system recommended by ITU-R Task Group 10/4 [5], shows a significant quality improvement.
2. FUNDAMENTAL CONCEPT OF SBR
The HE-AAC codec extends the conventional AAC codec with an SBR encoder. The basic principle of SBR is to reconstruct the high frequency spectral bands by replicating the low frequency spectral bands, and to rescale the spectral envelope of the reconstructed high frequency component to closely match the original signal according to the a priori information extracted by the SBR encoder, as illustrated in Figure 2. Because the conventional AAC encoder only needs to compress the low frequency part (lower than half of the original bandwidth) of the audio signal, half of the original sample rate is enough to preserve the signal information according to Nyquist's theorem. Hence, before being compressed by the conventional AAC encoder, the signal is down-sampled by a factor of two. In other words, the HE-AAC codec is a dual rate system.
Figure 2: Block diagram of the HE-AAC encoder.
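The dual rate arrangement described in this section can be illustrated with a little arithmetic. The following is a minimal sketch, not the codec's implementation: the function names and the 48 kHz input rate are assumptions chosen for illustration, and the rate reduction shown omits the low-pass filtering that a real down-sampler applies first.

```python
def dual_rate_parameters(input_rate_hz):
    """Return sample rates and representable bandwidths of a dual rate codec.

    Hypothetical helper: the core (conventional AAC) coder runs at half the
    input sample rate, so by Nyquist's theorem it can represent half the
    bandwidth of the input signal.
    """
    core_rate_hz = input_rate_hz / 2  # after down-sampling by a factor of two
    return {
        "input_rate_hz": input_rate_hz,
        "core_rate_hz": core_rate_hz,
        "input_bandwidth_hz": input_rate_hz / 2,  # Nyquist limit of the input
        "core_bandwidth_hz": core_rate_hz / 2,    # Nyquist limit of the core
    }


def keep_every_second_sample(samples):
    """Rate reduction only; a real down-sampler low-pass filters beforehand."""
    return samples[::2]


# Illustrative input rate (an assumption, not taken from the paper).
params = dual_rate_parameters(48000)
halved = keep_every_second_sample([0, 1, 2, 3, 4])
```

With these assumed numbers, the core coder only has to cover the lower half of the input bandwidth, which is the part the text assigns to the conventional AAC encoder.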
The SBR encoder is responsible for extracting the high frequency information, which includes the spectral envelope representation, the tonal-to-noise ratio, and other control parameters. To extract the information, the original full bandwidth signal is separated into 64 subbands by a complex-valued QMF (quadrature mirror filter) bank. The subband signals covered by the SBR range are then grouped by a time-frequency grid. The signal energy of each unit is encoded by the SBR encoder, which enables the SBR decoder to scale the spectral envelope of the reconstructed high bands closely to the original signal. The SBR range consisting of the high bands is separated into several envelopes, which are segmented at several time points recorded as control parameters. The choice of the time points depends on how stationary the signal content is. Also, along the frequency axis, several subbands are combined into non-uniform bands. The frequency segment points are decided by the frequency resolution table. The segmentation along the two dimensions constructs the time-frequency grid. As the segmentation becomes finer, the number of units becomes larger and more bits are required for the high frequency component. Hence, the quality of HE-AAC tracks depends largely on the choice of frequency resolution table and the number of envelopes. On the other hand, to handle the inconsistency between the tonal-to-noise ratio of the original spectral bands and that of the replicated spectral bands, noise or sinusoids with suitable energy are added. The extraction of the ratio information is also based on units defined by a time-frequency grid, with different resolutions.
Figure 3 illustrates the block diagram of the HE-AAC decoder. After being decoded by the AAC core decoder, the low frequency signal is separated into 32 subbands by an analysis QMF bank. Next, the HF generator reconstructs the high-band signals by duplicating the low-band signals, which are further processed through inverse filtering. Then the envelope adjuster module adjusts the spectral envelope in the high bands and adds additional components such as noise and tones according to the control information extracted by the SBR encoder. Finally, the subbands are synthesized into a time domain signal by a synthesis QMF bank with 64 subbands.
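The replication and envelope adjustment steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the standardized SBR algorithm: the function names, the modulo copy rule, and the data layout are hypothetical, and the sketch omits inverse filtering as well as the added noise and sinusoids. Subband samples are complex numbers indexed as `subbands[k][t]`, and each unit of the time-frequency grid is a frequency band `(k_start, k_stop)` crossed with a time range between two borders.

```python
def band_energy(subbands, k0, k1, t0, t1):
    """Mean energy of one time-frequency unit: subbands k0..k1-1, slots t0..t1-1."""
    total = sum(abs(subbands[k][t]) ** 2
                for k in range(k0, k1) for t in range(t0, t1))
    return total / ((k1 - k0) * (t1 - t0))


def replicate_and_adjust(low, num_high, bands, time_borders, target_energy):
    """Copy low subbands upward, then scale each grid unit to a target energy.

    low           -- low-band subband signals (list of lists of complex)
    num_high      -- number of high subbands to generate
    bands         -- (k_start, k_stop) pairs indexing the high subbands
    time_borders  -- envelope borders, e.g. [0, 2, 4] gives two envelopes
    target_energy -- target_energy[e][b] for envelope e and band b
    """
    num_low = len(low)
    # HF generation (assumed rule): high subband i copies low subband i mod num_low.
    high = [list(low[i % num_low]) for i in range(num_high)]
    for e in range(len(time_borders) - 1):
        t0, t1 = time_borders[e], time_borders[e + 1]
        for b, (k0, k1) in enumerate(bands):
            current = band_energy(high, k0, k1, t0, t1)
            # Energy scales with the square of the amplitude gain.
            gain = (target_energy[e][b] / current) ** 0.5 if current > 0 else 0.0
            for k in range(k0, k1):
                for t in range(t0, t1):
                    high[k][t] *= gain
    return high


# Toy data (illustrative only): 2 low subbands, 4 time slots, 2 envelopes.
low = [[1 + 0j, 1j, -1 + 0j, -1j], [2 + 0j, 2 + 0j, 2 + 0j, 2 + 0j]]
bands = [(0, 1), (1, 2)]
time_borders = [0, 2, 4]
target = [[4.0, 1.0], [0.25, 9.0]]
high = replicate_and_adjust(low, 2, bands, time_borders, target)
```

A finer grid means more entries in `target_energy`, which mirrors the statement in the text that finer segmentation requires more bits for the high frequency component.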
Figure 3: Block diagram of the HE-AAC decoder. (Blocks: AAC-SBR bitstream; bitstream demultiplexer; bitstream parser; AAC core decoder; Huffman decoding and dequantization; analysis QMF bank, 32 channels; HF generator; envelope adjuster; synthesis QMF bank, 64 channels; output PCM samples.)
3. ZERO BAND DITHERING METHOD
The dithering method proposed in [1] patches spectral nullities, as illustrated in Figure 4, to ease the annoying fishy noise. A zero band is defined as a spectral band containing zero energy in the spectrum. The method adopts random noise to dither zero bands, and exploits the quantization information to derive the amplitude range of the dithering noise. This section reviews the zero-band dithering algorithm.
Figure 4: An audio spectrum containing several spectral nullities. This is also an example of an HE-AAC track decoded only by the conventional AAC decoder without SBR.
3.1. Quantization Model in AAC
In the AAC encoder, a non-uniform quantizer is used to handle the weights of distortion effectively. Also, every quantization band has an individual quantization step size $\Delta_q$ to fit the perceptually tolerable distortion allowed by the psychoacoustic model. More specifically, the quantization model introduced in the MPEG-2/4 AAC standard [6] [7] is given as follows:
$$S[k] = \mathrm{int}\left(\frac{X[k]^{3/4}}{\Delta_q}\right), \tag{1}$$
where $X[k]$ is a frequency line, $S[k]$ is the quantization value, and the operator int(.) denotes the nearest integer operation.
3.2. Zero Bands Occurring Condition
In decoders, the encoded frequency signal $X[k]$ is inversely quantized as $\tilde{X}[k]$ by (2):
$$\tilde{X}[k] = S[k]^{4/3} \cdot \tilde{\Delta}_q, \tag{2}$$
where $\tilde{\Delta}_q$ is defined as $\Delta_q^{4/3}$. In fact, the original $X[k]$ value should be given as
$$X[k] = R[k]^{4/3} \cdot \tilde{\Delta}_q, \tag{3}$$
where $R[k]$ is a real number, and
$$S[k] = \mathrm{int}(R[k]). \tag{4}$$
From the definition of zero bands, the requantized $\tilde{X}[k]$ in zero bands must be zero. From (2), this implies that the corresponding $S[k]$ must also be zero. Hence, from
(4), $R[k]$ should be less than 1/2. Substituting this result into (3) shows that zero bands occur due to the relation X [k ]