Part 4: Audio Coding
Backgrounds1
Introduction
Psychoacoustic Effects
Filter Bank Design
Historical Reviews
MUSICAM & ASPEC
MPEG1
MPEGII & IV
AC3
NCTU/CSIE/DSP LAB Audio Processing Group
1. Introduction-- Signals Parameters
Typical values of basic parameters of three classes of acoustic signals

Acoustic Signal     Frequency Range    Sampling Rate    PCM bits per sample    PCM bit rate
Telephone Speech    300 - 3,400 Hz     8 kHz            8                      64 kb/s
Wideband Speech     50 - 7,000 Hz      16 kHz           14                     224 kb/s
Wideband Audio      10 - 20,000 Hz     48 kHz           16                     768 kb/s
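The PCM bit rates in the table follow from multiplying the sampling rate by the number of bits per sample. The sketch below reproduces the three rows; the function name `pcm_bit_rate` and the `channels` parameter are illustrative assumptions (the table itself lists single-channel rates).

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    """Return the uncompressed PCM bit rate in bits per second.

    `channels` is an assumed extra parameter; the table above
    corresponds to channels=1.
    """
    return sampling_rate_hz * bits_per_sample * channels


# Rows of the table: (sampling rate in Hz, bits per sample)
signal_classes = {
    "Telephone Speech": (8_000, 8),
    "Wideband Speech": (16_000, 14),
    "Wideband Audio": (48_000, 16),
}

for name, (fs, bits) in signal_classes.items():
    rate_kbps = pcm_bit_rate(fs, bits) / 1000
    print(f"{name}: {rate_kbps:.0f} kb/s")
```
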
1. Introduction-- Storage
CD and DAT Storage Formats

Storage Device    Sampling Rate (kHz)    Audio Rate (Mb/s)    Overhead (Mb/s)    Total Bit Rate (Mb/s)
CD                44.1                   1.41                 2.91               4.32
DAT               32.0                   1.03                 1.43               2.46
DAT               44.1                   1.41                 1.05               2.46
DAT               48                     1.54                 0.92               2.46
1. Introduction-- MOS
Mean Opinion Score (MOS) Scale
Subjective rating scale of 1-5
– 4.0: High quality, or near-transparent coding; a necessary condition for network quality
– 3.5: Speech degradation is easily detectable; communications quality
– 3.0 and below: Synthetic speech; inadequate level of naturalness and speaker recognizability
1. Introduction-- Audio
Range of audio
– 15- or 20-kHz bandwidth high-fidelity audio signal
Two ways for audio coding
– Transform coding
– Subband coding
Transform coding
– Use a frequency-domain transform to represent the signal, e.g., FFT, DCT
– Scalar quantization and entropy coding
– Perceptual masking model
1. Introduction-- Audio (cont.)
– Applications: AC-2, AC-3, ASPEC (adaptive spectral perceptual entropy coding), ISO/MPEG audio Layer III
Subband coding
– Analyze the frequency components of the signal with a filter bank
– Applications: MASCAM (masking pattern adapted subband coding and multiplexing), MUSICAM (masking pattern adapted universal subband integrated coding and multiplexing), ISO/MPEG audio Layer I, II, and III
1. Introduction (Cont.)
Psychoacoustic model
– Simulates the masking effect of the human ear
– Provides information to control the quantization level
– This information is used to keep the quantization error below the masking threshold
Just-noticeable Distortion
1. Introduction (Cont.)
ISO/MPEG audio standardization
– The first international standard in the field of high-quality digital audio compression
– Sampling rates: 32, 44.1, and 48 kHz
– Four audio modes: monophonic, dual-channel, stereo, joint stereo
– Three layers; each successive layer has increasing complexity and quality
Applications
– Radio sound program emission (digital audio broadcasting (DAB), satellite)
– Digital compact cassette (DCC)
– Television sound emission (HDTV)
– Storage
– Multimedia applications
2. Psychoacoustic
Introduction Masking Effect Critical Band
2. Psychoacoustic - Introduction
Sound Pressure
– Sounds are easily described by means of the time-varying sound pressure p(t).
– The unit of sound pressure is the pascal (Pa).
– In psychoacoustics, values of the sound pressure between $10^{-5}$ Pa and $10^{2}$ Pa are relevant.
Sound Pressure Levels
– Normally used to cope with the broad range of sound pressures.
$L = 20 \log_{10}(p / p_0)$ dB
– The reference value of the sound pressure is standardized to $p_0 = 20$ µPa.
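As a worked example of the level formula, the sketch below converts a sound pressure in pascals to a sound pressure level in dB using the reference value of 20 µPa. The function and constant names are illustrative assumptions.

```python
import math

P0_PA = 20e-6  # reference sound pressure, 20 µPa


def sound_pressure_level(p_pa):
    """Sound pressure level L = 20 log10(p / p0) in dB."""
    return 20.0 * math.log10(p_pa / P0_PA)


# The pressure range quoted as relevant in psychoacoustics
for p in (1e-5, 20e-6, 1.0, 1e2):
    print(f"p = {p:g} Pa -> L = {sound_pressure_level(p):.1f} dB")
```

Under this formula the quoted pressure range of $10^{-5}$ Pa to $10^{2}$ Pa corresponds to levels of roughly -6 dB to 134 dB.
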
2. Psychoacoustic - Introduction (c.1)
Sound Intensity and Sound Intensity Levels
$L = 20 \log_{10}(p / p_0)$ dB $= 10 \log_{10}(I / I_0)$ dB
– The reference value $I_0$ is defined as $10^{-12}$ W/m$^2$.
Noise Density
– When dealing with noise, it is advantageous to use the density of sound intensity instead of the sound intensity itself, i.e., the sound intensity within a bandwidth of 1 Hz.
– The term "noise power density", although not quite correct, is also used.
– The logarithmic correlate of the density of sound intensity is called the sound intensity density level, usually shortened to density level, $l$.
– For white noise, $l$ and $L$ are related by the equation
$L = [\, l + 10 \log_{10}(\Delta f / \text{Hz}) \,]$ dB
where $\Delta f$ represents the bandwidth of the sound.
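The relation between density level and overall level for white noise can be checked numerically. The sketch below is a minimal illustration; the function names and the example values (a density level of 40 dB over a 1000 Hz band) are assumptions chosen for the demonstration, not figures from the slides.

```python
import math


def overall_level_db(density_level_db, bandwidth_hz):
    """L = l + 10 log10(delta_f / 1 Hz) for white noise."""
    return density_level_db + 10.0 * math.log10(bandwidth_hz)


def density_level_db(overall_level, bandwidth_hz):
    """Inverse relation: l = L - 10 log10(delta_f / 1 Hz)."""
    return overall_level - 10.0 * math.log10(bandwidth_hz)


# Hypothetical example: l = 40 dB over a 1000 Hz wide band
L = overall_level_db(40.0, 1000.0)
print(f"Overall level: {L:.1f} dB")
```

Each tenfold increase in bandwidth adds 10 dB to the overall level, and a bandwidth of 1 Hz gives L equal to l.
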
2. Psychoacoustic - Introduction(c.2)
Normative Elements on Human Ear
Threshold in quiet
– The sound pressure level, as a function of frequency, at which a pure tone is just audible.
Masking
– Property of the human auditory system by which an audio signal cannot be perceived in the presence of another audio signal.
Masking threshold
– A function in frequency and time below which an audio signal cannot be perceived by the human auditory system.
Critical band
– Loosely speaking, the perception of a particular frequency, say Ω0, by the auditory system is influenced by energy in a critical band of frequencies around Ω0.
– The ear acts as a multichannel real-time analyzer with varying sensitivity and bandwidth throughout the audio range.
2. Psychoacoustic - Masking effect
Masking of pure tones by noise
Pure tones masked by broad-band noise
– The masked thresholds rise with increasing frequency. The slope of this increase is about 10 dB per decade.
– At low frequencies, the masked thresholds lie about 17 dB above the given density level.
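The two observations about broad-band noise can be combined into a rough piecewise model of the masked threshold. This is only a sketch under stated assumptions: the slope is read as 10 dB per decade, the transition between the flat and rising regions is placed at 500 Hz (a value assumed here, not given on the slide), and the function name is hypothetical.

```python
import math


def masked_threshold_db(f_hz, density_level_db, corner_hz=500.0):
    """Approximate masked threshold of a pure tone in white noise.

    Assumptions (not from the slide): flat region below `corner_hz`,
    then a rise of 10 dB per decade of frequency.
    """
    threshold = density_level_db + 17.0  # low-frequency offset from the slide
    if f_hz > corner_hz:
        threshold += 10.0 * math.log10(f_hz / corner_hz)
    return threshold


for f in (100.0, 500.0, 5000.0):
    print(f"{f:6.0f} Hz -> {masked_threshold_db(f, 30.0):.1f} dB")
```
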
2. Psychoacoustic - Masking effect (c.1)
Masking of pure tones by narrow-band noise
– The noise bandwidth is about 100 Hz for frequencies below 500 Hz and about 0.2 f above 500 Hz.
– The level of each masking noise is 60 dB; the bandwidths of the three noises are 100, 160, and 700 Hz.
2. Psychoacoustic-- Masking Effects (c.2)
Masking of pure tones by narrow-band noise (c.1)
2. Psychoacoustic - Masking effect (c.3)
Pure tones masked by low-pass or high-pass noise
– The cut-off frequencies are 0.9 kHz (low-pass) and 1.1 kHz (high-pass), respectively.
2. Psychoacoustic - Masking effect (c.4)
Masking of pure tones by pure tones
– Masking tone: 1 kHz, 80 dB.
2. Psychoacoustic - Masking effect (c.5)
Pure tones masked by pure tones
2. Psychoacoustic - Masking effect (c.6)
Pure tones masked by complex tones
2. Psychoacoustic - Masking effect (c.7)
Temporal effect
Simultaneous masking
– When two signals are present simultaneously, the weaker signal can become inaudible; this phenomenon is called simultaneous masking.
Premasking
– The test sound has to be a short burst or sound impulse, presented before the masker stimulus is switched on.
Postmasking
– The test sound is presented after the masker is switched off; quite pronounced effects then occur.
2. Psychoacoustic - Masking effect (c.8)
Premasking and postmasking have much weaker effects than simultaneous masking
2. Psychoacoustic - Critical band
Concepts
– Assume that the part of a noise that is effective in masking a test tone is the part of its spectrum lying near the tone
Fletcher band-widening experiment [1940]
[Figure: energy (vertical axis, ticks from 48 to 60) versus masker bandwidth in Hz (horizontal axis, 50 to 3200) for a 2000 Hz sinusoidal signal]
2. Psychoacoustic - Critical band (c.1)
Observations
– We can assume that the human ear performs like a set of bandpass filters
Critical Band Rate
– Psychoacoustic measure in the spectral domain which corresponds to the frequency selectivity of the human ear.
Bark
– Unit of critical band rate. A critical band is the part of the spectral domain which corresponds to a width of one Bark.
Formula
$z = 13 \tan^{-1}(0.76 f) + 3.5 \tan^{-1}\left((f / 7.5)^2\right)$, with $f$ in kHz
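The critical band rate formula is straightforward to evaluate. The sketch below implements it directly, with the frequency given in kHz as on the slide; the function name `bark` is an illustrative assumption.

```python
import math


def bark(f_khz):
    """Critical band rate z in Bark for a frequency f in kHz.

    z = 13 arctan(0.76 f) + 3.5 arctan((f / 7.5)^2)
    """
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


for f in (0.1, 0.5, 1.0, 4.0, 10.0, 20.0):
    print(f"{f:5.1f} kHz -> {bark(f):5.2f} Bark")
```

With this formula, 1 kHz maps to roughly 8.5 Bark and 20 kHz to a little under 25 Bark, so the audio range spans about 24 to 25 critical bands.
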
3. Filter Bank Design
Error Sources
Two Design Concepts
Perfect Reconstruction (PR) Systems
History Review
[Figure: M-channel filter bank. The input x(n) feeds the analysis bank H0(z), H1(z), ..., HM-1(z), giving x0(n), x1(n), ..., xM-1(n); the decimators (↓M) give v0(n), v1(n), ..., vM-1(n); the expanders (↑M) give u0(n), u1(n), ..., uM-1(n); the synthesis bank F0(z), F1(z), ..., FM-1(z) produces the output x̂(n).]
3. Filter Bank Design-- Error Sources
Two Channel Filter Bank
[Figure: two-channel filter bank. The input x(n) feeds the analysis bank H0(z), H1(z), giving x0(n), x1(n); the decimators (↓2) give v0(n), v1(n); the expanders (↑2) give y0(n), y1(n); the synthesis bank F0(z), F1(z) produces the output x̂(n).]
Error Sources
– Aliasing
– Amplitude distortion
– Phase distortion
3. Filter Bank Design-- Two Design Concepts
Two Trivial Design Approaches
Approach 1: Non-Overlapping Magnitude Responses
– Aliasing is not serious
– The signal around ω = π/2 is severely attenuated
Solutions for the non-overlapping method
– Boost the non-overlapped frequency region, but this severely amplifies noise
– Decrease the transition width, but this increases the cost of the filters
[Figure: magnitude responses of H0(z) and H1(z) versus ω, with axis ticks at 0, π/2, and π]
3. Filter Bank Design-- Two Design Concepts (c.1)
Approach 2: Overlapping Magnitude Responses
– More practical than the non-overlapping approach
– Results in aliasing
– Low filter cost
Solution for the overlapping method
– Carefully design the synthesis filter bank to cancel the aliasing
Quadrature Mirror Filter
[Figure: magnitude responses of H0(z) and H1(z), with axis ticks at 0, π/2, and π]
3. Filter Bank Design-- Perfect Reconstruction
Goal: perfect reconstruction (PR)
– No aliasing distortion
– No amplitude distortion
– No phase distortion
$\hat{x}(n) = c\,x(n - n_0), \quad c \neq 0$
[Figure: M-channel filter bank. The input x(n) feeds the analysis bank H0(z), H1(z), ..., HM-1(z), giving x0(n), x1(n), ..., xM-1(n); the decimators (↓M) give v0(n), v1(n), ..., vM-1(n); the expanders (↑M) give u0(n), u1(n), ..., uM-1(n); the synthesis bank F0(z), F1(z), ..., FM-1(z) produces the output x̂(n).]
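The perfect reconstruction condition can be illustrated with a very small two-channel example. The sketch below assumes the length-2 Haar filters (a choice made here for simplicity, not taken from the slides) and the common alias-cancelling synthesis choice F0(z) = H1(-z), F1(z) = -H0(-z). All function names are hypothetical. With these filters the output is the input delayed by one sample, i.e. c = 1 and n0 = 1.

```python
import math


def fir_filter(h, x):
    """Full linear convolution of the input x with the FIR taps h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += hk * xn
    return y


def decimate_by_2(x):
    """Keep every second sample, starting with index 0."""
    return x[::2]


def expand_by_2(v):
    """Insert a zero after every sample."""
    u = [0.0] * (2 * len(v))
    u[::2] = v
    return u


def two_channel_bank(x):
    """Analysis, decimation, expansion, and synthesis with Haar filters."""
    s = 1.0 / math.sqrt(2.0)
    h0, h1 = [s, s], [s, -s]   # analysis low-pass and high-pass (assumed Haar)
    f0, f1 = [s, s], [-s, s]   # F0(z) = H1(-z), F1(z) = -H0(-z)
    branches = []
    for h, f in ((h0, f0), (h1, f1)):
        v = decimate_by_2(fir_filter(h, x))
        branches.append(fir_filter(f, expand_by_2(v)))
    return [a + b for a, b in zip(*branches)]


x = [1.0, 2.0, 3.0, 4.0]
x_hat = two_channel_bank(x)
print(x_hat)  # the input delayed by one sample, padded with zeros
```

Longer filters with better frequency selectivity follow the same structure; only the tap values and the resulting delay n0 change.
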
3. Filter Bank Design-- A Brief History
Distortion Sources of the Reconstructed Signals
– Aliasing
– Amplitude distortion
– Phase distortion
– Coding and quantization of the subband signals
Brief History-- Two Channel Case
– Aliasing can be eliminated by a simple choice of synthesis filters: Croisier, et al., 1976
– Eliminating the other two distortions: Johnston, 1980; Jain and Crochiere, 1984; Fettweis, 1985
– Efficient structures: Galand and Nussbaumer, 1984
– Perfect reconstruction can be achieved by FIR filters: Smith and Barnwell, 1984; Mintzer, 1985
– Further optimization: Grenez, 1988
3. Filter Bank Design-- A Brief History(c.1)
A Brief History-- M-Channel Filter Banks
Pseudo-QMF
– Nussbaumer, 1981; Rothweiler, 1983; Chu, 1985; Masson and Picel, 1985; Cox, 1986
General Theory
– Ramstad, 1984; Smith and Barnwell, 1985; Vetterli, 1985; Princen and Bradley, 1986; Wackersreuther, 1986; Vaidyanathan, 1987; Nguyen and Vaidyanathan, 1988; Viscito and Allebach, 1988a
– Vetterli and Vaidyanathan show that the use of polyphase components leads to considerable simplification of the theory
Perfect Reconstruction Systems
– Vaidyanathan, 1987
– Polyphase matrices with the paraunitary property
Perfect Reconstruction Modulated Filter Banks
– Malvar, 1990; Koilpillai and Vaidyanathan, 1991, 1992; Ramstad, 1991
4. Historical Review
Standard Developments
MPEG1 Specification
ISO Performance Tests and the Weighted Factor
Test Results on MUSICAM and ASPEC
4. Historical Review-- Standard Developments
Audio coding began in the early 1970s
NICAM of the BBC
– Uses uniform or nonuniform quantization of audio samples
– 728 kb/s for a stereo audio signal of 15-kHz bandwidth
AC-2 of Dolby
– Adaptive transform coding
– Achieves 128 kb/s
4. Historical Review-- Standard Developments (c.1)
ASPEC (AT&T, D. Thomson, Fraunhofer-G, France Telecom)
– Transform coding with overlapping blocks
ATAC (Fujitsu, NEC, Sony, JVC)
– Transform coding with non-overlapping blocks
MUSICAM (IRT, Philips, CCETT, Matsushita)
– Subband coding with more than 8 bands
SB/ADPCM (BTRL, NTT)
– Subband coding with fewer than 8 bands
4. Historical Review-- MPEG1 Specification
Sampling Rate
– 32, 44.1, 48 kHz
Input Resolution
– 16 bits uniform PCM
Bit Rates & Modes
– Monophonic: 32, 64, 96, 128, 192 kbit/s
– Stereo or Bilingual: 128, 192, 256, and 384 kbit/s
Access Units
– The smallest part of the audio signal which can be decoded by itself.
– Access units should be less than 100 ms.
Decoding Delay
– Less than 80 ms at a bit rate of 2 x 128 kbit/s
4. Historical Review-- ISO Performance Tests and the Weighted Factor
No.   Performance                                     Weight Factor    Note
1     Sound quality of forward audio playback         121              S
2     Sound quality of fast forward audio playback    67               S
3     Random access                                   118              O
4     Ability to encode in real time                  55               O
5     Data capability for ancillary information       93               O
6     High quality stereo                             86               S
7     Intermediate quality audio                      96               S
8     Robustness to bit errors                        89               O
9     Encoder complexity                              59               O
10    Decoder complexity                              117              O
11    Short decoder delay                             72               O

Note: S = subjective, O = objective
4. Historical Review-- ISO Test Results on ASPEC and MUSICAM
Subjective: Items 1, 2, 6, 7 of the performance test
Objective: Items 3, 4, 5, 8, 9, 10, 11 of the performance test

Algorithm    Subjective Test    Objective Test    Total
ASPEC        3272               4557              7829
MUSICAM      2942               5408              8350

Comments
– Sound Quality: ASPEC > MUSICAM
– Implementation Complexity: MUSICAM < ASPEC
– Decoding Delay: MUSICAM < ASPEC
4. Historical Review-- MPEG1 Structures
MPEG audio
– MPEG audio is constructed in the following structure
[Figure: block diagram with boxes labeled MPEG Audio Layer II, MPEG Audio Layer III, Layer I, MUSICAM, and ASPEC]