Part 2: Audio Coding

Part 4: Audio Coding

Backgrounds1

- Introduction
- Psychoacoustic Effects
- Filter Bank Design
- Historical Reviews
- MUSICAM & ASPEC
- MPEG1
- MPEG-2 & MPEG-4
- AC-3

NCTU/CSIE/DSP LAB Audio Processing Group

1. Introduction-- Signal Parameters

- Typical values of basic parameters for three classes of acoustic signals:

Acoustic signal     Frequency range   Sampling rate   PCM bits/sample   PCM bit rate
Telephone speech    300-3,400 Hz      8 kHz           8                 64 kb/s
Wideband speech     50-7,000 Hz       16 kHz          14                224 kb/s
Wideband audio      10-20,000 Hz      48 kHz          16                768 kb/s
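The PCM bit-rate column is the sampling rate times the bits per sample for a single channel. A minimal Python sketch that reproduces it (the function and dictionary names are illustrative, not from the slides):

```python
# Uncompressed PCM bit rate = sampling rate x bits per sample x channels.
SIGNAL_CLASSES = {
    # name: (sampling rate in Hz, PCM bits per sample), values from the table
    "Telephone speech": (8_000, 8),
    "Wideband speech": (16_000, 14),
    "Wideband audio": (48_000, 16),
}


def pcm_bit_rate(fs_hz: int, bits: int, channels: int = 1) -> int:
    """Return the uncompressed PCM bit rate in bits per second."""
    return fs_hz * bits * channels


for name, (fs, bits) in SIGNAL_CLASSES.items():
    print(f"{name}: {pcm_bit_rate(fs, bits) / 1000:.0f} kb/s")
```

This prints 64, 224 and 768 kb/s, matching the table; a stereo wideband-audio signal (`channels=2`) would need 1536 kb/s.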

1. Introduction-- Storage

- Storage formats of CD and DAT:

Storage device   Sampling rate (kHz)   Audio rate (Mb/s)   Overhead (Mb/s)   Total bit rate (Mb/s)
CD               44.1                  1.41                2.91              4.32
DAT              32.0                  1.03                1.43              2.46
DAT              44.1                  1.41                1.05              2.46
DAT              48.0                  1.54                0.92              2.46
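The audio-rate column follows from 16-bit stereo PCM, and the overhead is the total minus the audio payload. A small sketch, assuming 16 bits per sample and two channels for every row (the slide does not state the format explicitly); the computed audio rates agree with the table to within about 0.01 Mb/s:

```python
def stereo_audio_rate_mbps(fs_khz: float, bits: int = 16, channels: int = 2) -> float:
    """Audio payload in Mb/s, assuming 16-bit stereo PCM by default."""
    return fs_khz * 1e3 * bits * channels / 1e6


STORAGE = {  # device: (sampling rate in kHz, total bit rate in Mb/s), from the table
    "CD": (44.1, 4.32),
    "DAT 32 kHz": (32.0, 2.46),
    "DAT 44.1 kHz": (44.1, 2.46),
    "DAT 48 kHz": (48.0, 2.46),
}

for device, (fs_khz, total) in STORAGE.items():
    audio = stereo_audio_rate_mbps(fs_khz)
    overhead = total - audio  # error correction, subcode, sync, etc.
    print(f"{device}: audio {audio:.2f} Mb/s, overhead {overhead:.2f} Mb/s "
          f"({100 * overhead / total:.0f}% of total)")
```

For the CD, roughly two thirds of the channel bit rate is overhead rather than audio.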

1. Introduction-- MOS

- Mean Opinion Score (MOS): subjective rating scale from 1 to 5
- 4.0: high quality, or near-transparent coding; a necessary condition for network quality
- 3.5: speech degradation is easily detectable; communications quality
- 3.0 and below: synthetic speech; inadequate naturalness and speaker recognizability

1. Introduction-- Audio

- Range of audio: high-fidelity audio signals with 15- or 20-kHz bandwidth
- Two ways of audio coding: transform coding and subband coding
- Transform coding
  - Uses a frequency-domain transform, e.g., FFT or DCT, to represent the signal
  - Scalar quantization and entropy coding
  - Perceptual masking model

1. Introduction-- Audio (cont.)

- Applications of transform coding: AC-2, AC-3, ASPEC (adaptive spectral perceptual entropy coding), ISO/MPEG audio Layer III
- Subband coding
  - Analyzes the frequency components of the signal with a filter bank
  - Applications: MASCAM (masking pattern adapted subband coding and multiplexing), MUSICAM (masking pattern adapted universal subband integrated coding and multiplexing), ISO/MPEG audio Layer I, II, and III
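The transform-coding path (transform, scalar quantization, inverse transform) can be sketched in a few lines. This is an illustration only: it assumes an orthonormal DCT-II, a single uniform step size for all coefficients, and omits the entropy coder and the perceptual model that a real coder would use to choose the step per band; the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return c


def transform_code_block(x: np.ndarray, step: float) -> tuple[np.ndarray, np.ndarray]:
    """Transform a block, uniformly quantize the coefficients, and reconstruct."""
    c = dct_matrix(len(x))
    coeffs = c @ x                            # forward transform (analysis)
    q = np.round(coeffs / step).astype(int)   # scalar quantization indices
    x_hat = c.T @ (q * step)                  # dequantize, then inverse transform
    return q, x_hat


fs = 48_000  # assumed sampling rate
n = 256      # assumed block length
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.sin(2 * np.pi * 5000 * t)
q, x_hat = transform_code_block(x, step=0.05)
print("nonzero coefficients:", np.count_nonzero(q), "of", n)
print("RMS error:", np.sqrt(np.mean((x - x_hat) ** 2)))
```

Because the transform is orthonormal, the time-domain error energy equals the coefficient-domain quantization error energy, which is what lets a coder shape noise per frequency band.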

1. Introduction (Cont.)

- Psychoacoustic model
  - Simulates the masking effect of the human ear
  - Provides information to control the quantization level
  - Uses this information to keep the quantization error below the masking threshold

(Figure: Just-noticeable distortion)
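How the masking threshold controls the quantization level can be sketched as a per-band bit-allocation rule. This assumes the common rule of thumb that each extra bit of a uniform quantizer lowers quantization noise by about 6.02 dB, so a band whose signal lies SMR dB above its masking threshold needs roughly SMR/6.02 bits. The function name and the 16-bit cap are hypothetical; real encoders use tabulated SNR values per allocation step.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of SNR per quantizer bit


def bits_for_band(signal_db: float, mask_db: float, max_bits: int = 16) -> int:
    """Hypothetical rule: fewest bits whose ~6.02 dB/bit SNR covers the SMR."""
    smr = signal_db - mask_db  # signal-to-mask ratio in dB
    if smr <= 0:
        return 0  # the whole band lies below the masking threshold
    return min(max_bits, math.ceil(smr / DB_PER_BIT))


for signal_db, mask_db in [(70, 40), (55, 50), (30, 45)]:
    print(signal_db, mask_db, bits_for_band(signal_db, mask_db))
```

A band that is already masked (negative SMR) gets no bits at all, which is where most of the saving of perceptual coding comes from.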

1. Introduction (Cont.)

- ISO/MPEG audio standardization
  - The first international standard in the field of high-quality digital audio compression
  - Sampling rates: 32, 44.1, and 48 kHz
  - Four audio modes: monophonic, dual-channel, stereo, and joint stereo
  - Three layers, each with increasing complexity and quality
- Applications
  - Radio sound program emission (digital audio broadcasting (DAB), satellite)
  - Digital compact cassette (DCC)
  - Television sound emission (HDTV)
  - Storage
  - Multimedia applications

2. Psychoacoustic

- Introduction
- Masking Effect
- Critical Band

2. Psychoacoustic - Introduction

- Sound pressure: sounds are easily described by means of the time-varying sound pressure p(t). The unit of sound pressure is the pascal (Pa). In psychoacoustics, sound pressures between 10^-5 Pa and 10^2 Pa are relevant.
- Sound pressure level: normally used to cope with the broad range of sound pressures.

$L = 20 \log_{10}(p / p_0)\ \mathrm{dB}$

The reference sound pressure is standardized to p0 = 20 µPa.
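The sound pressure level formula maps the relevant pressure range of 10^-5 Pa to 10^2 Pa onto roughly -6 dB to 134 dB. A minimal sketch (the function name is illustrative):

```python
import math

P0 = 20e-6  # reference sound pressure p0 = 20 µPa


def spl_db(p_pa: float) -> float:
    """Sound pressure level L = 20 log10(p / p0) in dB."""
    return 20.0 * math.log10(p_pa / P0)


for p in (1e-5, 20e-6, 1.0, 1e2):
    print(f"{p:g} Pa -> {spl_db(p):6.1f} dB")
```

A pressure equal to p0 gives 0 dB, and every factor of 10 in pressure adds 20 dB.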

2. Psychoacoustic - Introduction (c.1)

- Sound intensity and sound intensity level:

$L = 20 \log_{10}(p / p_0)\ \mathrm{dB} = 10 \log_{10}(I / I_0)\ \mathrm{dB}$

The reference intensity is defined as $I_0 = 10^{-12}\ \mathrm{W/m^2}$.

- Noise density: when dealing with noise, it is advantageous to use the density of sound intensity, i.e., the sound intensity within a bandwidth of 1 Hz. The term noise power density, although not quite correct, is also used. The logarithmic correlate of the density of sound intensity is called the sound intensity density level, usually shortened to density level, l. For white noise, l and L are related by

$L = \left[\, l + 10 \log_{10}(\Delta f / \mathrm{Hz}) \,\right]\ \mathrm{dB}$

where Δf is the bandwidth of the sound.
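Both level definitions are easy to evaluate numerically. A short sketch (function names are illustrative): for example, white noise with a density level of 30 dB spread over 1000 Hz has a total level of 30 + 10 log10(1000) = 60 dB, and doubling the bandwidth adds about 3 dB.

```python
import math

I0 = 1e-12  # reference sound intensity in W/m^2


def intensity_level_db(i_w_m2: float) -> float:
    """Sound intensity level L = 10 log10(I / I0) in dB."""
    return 10.0 * math.log10(i_w_m2 / I0)


def white_noise_level_db(density_level_db: float, bandwidth_hz: float) -> float:
    """Total level of white noise: L = l + 10 log10(Δf / Hz)."""
    return density_level_db + 10.0 * math.log10(bandwidth_hz)


print(intensity_level_db(1e-6))            # 1 µW/m^2
print(white_noise_level_db(30.0, 1000.0))  # density level 30 dB over 1 kHz
print(white_noise_level_db(30.0, 2000.0))  # same density, twice the bandwidth
```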

2. Psychoacoustic - Introduction(c.2)

- Normative elements related to the human ear
  - Threshold in quiet: the sound pressure level of a just-audible pure tone, as a function of frequency
  - Masking: the property of the human auditory system by which an audio signal cannot be perceived in the presence of another audio signal
  - Masking threshold: a function of frequency and time below which an audio signal cannot be perceived by the human auditory system
  - Critical band
    - Loosely speaking, the perception of a particular frequency, say Ω0, by the auditory system is influenced by the energy in a critical band of frequencies around Ω0.
    - The ear acts as a multichannel real-time analyzer with varying sensitivity and bandwidth throughout the audio range.
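The threshold in quiet is often approximated in coder models by Terhardt's closed-form curve, which is not given on the slide and is used here as an assumption. The sketch shows the characteristic shape: high at low frequencies, a minimum near 3 to 4 kHz, and a steep rise toward 16 kHz and above.

```python
import math


def threshold_in_quiet_db(f_hz: float) -> float:
    """Terhardt's approximation of the threshold in quiet, in dB SPL (assumed model)."""
    f = f_hz / 1000.0  # the formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


for f_hz in (100, 1000, 3300, 10000, 16000):
    print(f"{f_hz:>6} Hz: {threshold_in_quiet_db(f_hz):6.1f} dB")
```

Quantization noise that stays below this curve is inaudible even without any masker.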

2. Psychoacoustic - Masking effect

- Masking of pure tones by noise: pure tones masked by broadband noise
  - The masked thresholds rise with increasing frequency; the slope of this increase is about 10 dB per decade.
  - At low frequencies, the masked thresholds lie about 17 dB above the given density level.

2. Psychoacoustic - Masking effect (c.1)

- Masking of pure tones by narrow-band noise
  - The bandwidth is about 100 Hz below 500 Hz and about 0.2 f above 500 Hz.
  - The level of each masking noise is 60 dB, and the corresponding noise bandwidths are 100, 160, and 700 Hz, respectively.
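The bandwidth rule of thumb above (100 Hz below 500 Hz, 0.2 f above) can be compared with Zwicker's closed-form approximation of the critical bandwidth, which is not on the slide and is included here as an assumption. The closed form gives about 160 Hz at 1 kHz, while the rule of thumb gives 200 Hz.

```python
def critical_bandwidth_rule_hz(f_hz: float) -> float:
    """Rule of thumb from the slide: ~100 Hz below 500 Hz, ~0.2 f above."""
    return 100.0 if f_hz < 500.0 else 0.2 * f_hz


def critical_bandwidth_zwicker_hz(f_hz: float) -> float:
    """Zwicker's closed-form approximation (assumed); f is converted to kHz."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


for f_hz in (250, 1000, 4000):
    print(f_hz, critical_bandwidth_rule_hz(f_hz),
          round(critical_bandwidth_zwicker_hz(f_hz)))
```

Both curves agree that the critical bandwidth is roughly constant at low frequencies and grows roughly in proportion to frequency above about 500 Hz.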

2. Psychoacoustic-- Masking Effects (c.2)

- Masking of pure tones by narrow-band noise (c.1)

2. Psychoacoustic - Masking effect (c.3)

- Pure tones masked by low-pass or high-pass noise: the cut-off frequencies are 0.9 kHz and 1.1 kHz, respectively.

2. Psychoacoustic - Masking effect (c.4)

- Masking of pure tones by pure tones: masking tone at 1 kHz, 80 dB.

2. Psychoacoustic - Masking effect (c.5)

Pure tones masked by pure tones

2. Psychoacoustic - Masking effect (c.6)

Pure tones masked by complex tones

2. Psychoacoustic - Masking effect (c.7)

- Temporal effects
  - Simultaneous masking: when two signals are present simultaneously, the weaker signal can become inaudible; this phenomenon is called simultaneous masking.
  - Premasking: the test sound, which has to be a short burst or sound impulse, is presented before the masker stimulus is switched on.
  - Postmasking: the test sound is presented after the masker is switched off; quite pronounced effects occur.

2. Psychoacoustic - Masking effect (c.8)

Premasking and postmasking contribute much less than simultaneous masking.

2. Psychoacoustic - Critical band

- Concept: assume that the part of a noise that is effective in masking a test tone is the part of its spectrum lying near the tone.

- Fletcher band-widening experiment [1940]

(Figure: Fletcher's band-widening experiment; a 2000-Hz sinusoidal signal masked by noise of increasing bandwidth; vertical axis: energy; horizontal axis: masker bandwidth, 50 to 3200 Hz.)
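The classic reading of Fletcher's experiment follows from the concept above: if only the noise inside the critical band around the tone masks it, then widening a masker of constant density raises the effective masker level only until the bandwidth reaches the critical bandwidth, after which it stays flat. A minimal sketch under those simplifying assumptions; the 300-Hz critical bandwidth around 2 kHz and the 20-dB density level are hypothetical values chosen for illustration.

```python
import math


def effective_masker_level_db(density_level_db: float, bandwidth_hz: float,
                              critical_bw_hz: float) -> float:
    """Level of the noise that falls inside the critical band around the tone.

    Simplified model (assumption): only noise within one critical band masks
    the tone, so widening the masker beyond that band adds no effective power.
    """
    effective_bw = min(bandwidth_hz, critical_bw_hz)
    return density_level_db + 10.0 * math.log10(effective_bw)


CRITICAL_BW_2KHZ = 300.0  # hypothetical critical bandwidth around 2 kHz
for bw in (50, 100, 200, 400, 800, 1600, 3200):
    print(bw, round(effective_masker_level_db(20.0, bw, CRITICAL_BW_2KHZ), 1))
```

The printed levels grow by about 3 dB per doubling up to 200 Hz and are constant from 400 Hz onward, the knee that the experiment uses to estimate the critical bandwidth.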

2. Psychoacoustic - Critical band (c.1)

- Observations: we can assume that the human ear behaves like a set of bandpass filters.
- Critical band rate: a psychoacoustic measure in the spectral domain that corresponds to the frequency selectivity of the human ear.
- Bark: the unit of critical band rate; one critical band corresponds to a width of one Bark.
- Formula:

$z = 13 \arctan(0.76 f) + 3.5 \arctan\left[(f / 7.5)^2\right]$, with f in kHz and z in Bark.
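The Bark formula translates directly into code; coders use it to group spectral lines into critical bands. A minimal sketch of the slide's formula (the function name is illustrative; the input is in Hz and converted to kHz internally):

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Critical band rate z = 13 atan(0.76 f) + 3.5 atan((f / 7.5)^2), f in kHz."""
    f = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f) + 3.5 * math.atan((f / 7.5) ** 2)


for f_hz in (100, 500, 1000, 4000, 20000):
    print(f"{f_hz:>6} Hz -> {hz_to_bark(f_hz):5.2f} Bark")
```

About 8.5 Bark lie below 1 kHz and roughly 24.6 Bark cover the whole range up to 20 kHz, which is why audio coders typically work with about 24 to 25 critical bands.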

3. Filter Bank Design

- Error Sources
- Two Design Concepts
- Perfect Reconstruction (PR) Systems
- History Review

(Figure: M-channel maximally decimated filter bank. The input x(n) passes through the analysis bank H0(z), H1(z), ..., HM-1(z), producing x0(n), ..., xM-1(n); decimators (↓M) give v0(n), ..., vM-1(n); expanders (↑M) give u0(n), ..., uM-1(n); the synthesis bank F0(z), ..., FM-1(z) produces the output x̂(n).)

3. Filter Bank Design-- Error Sources

- Two-channel filter bank

(Figure: two-channel filter bank. The input x(n) passes through the analysis bank H0(z), H1(z), giving x0(n), x1(n); decimators (↓2) give v0(n), v1(n); expanders (↑2) give y0(n), y1(n); the synthesis bank F0(z), F1(z) produces the output x̂(n).)

- Error sources
  - Aliasing
  - Amplitude distortion
  - Phase distortion
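For a two-channel bank the output can be written as X̂(z) = T(z)X(z) + A(z)X(-z), where A(z) = ½[F0(z)H0(-z) + F1(z)H1(-z)] carries the aliasing and T(z) = ½[F0(z)H0(z) + F1(z)H1(z)] carries the amplitude and phase distortion. The sketch below computes both as polynomials in z^-1 for a Haar analysis pair, once with a naive synthesis bank (F_k = H_k) and once with the alias-cancelling choice F0(z) = H1(-z), F1(z) = -H0(-z). The Haar filters and function names are assumptions for illustration.

```python
import numpy as np


def modulate(h):
    """Coefficients of H(-z): negate the odd powers of z^-1."""
    h = np.asarray(h, dtype=float)
    return h * (-1.0) ** np.arange(len(h))


def distortion_and_alias(h0, h1, f0, f1):
    """Return T(z) and A(z) coefficients for a 2-channel bank.

    All four filters are assumed to have the same length, so the
    convolution results line up term by term.
    """
    t = 0.5 * (np.convolve(f0, h0) + np.convolve(f1, h1))
    a = 0.5 * (np.convolve(f0, modulate(h0)) + np.convolve(f1, modulate(h1)))
    return t, a


s = 1 / np.sqrt(2)
h0, h1 = [s, s], [s, -s]  # Haar analysis pair (low-pass, high-pass)

# Naive synthesis (F_k = H_k): aliasing survives and T(z) is not a pure delay.
t_naive, a_naive = distortion_and_alias(h0, h1, h0, h1)
# Alias-cancelling synthesis: F0(z) = H1(-z), F1(z) = -H0(-z).
t_pr, a_pr = distortion_and_alias(h0, h1, modulate(h1), -modulate(h0))
print("naive: T =", t_naive.round(3), " A =", a_naive.round(3))
print("PR:    T =", t_pr.round(3), " A =", a_pr.round(3))
```

With the naive choice, T(z) = 0.5(1 + z^-2) vanishes at ω = π/2 and A(z) is nonzero; with the alias-cancelling choice, A(z) = 0 and T(z) = z^-1, a pure one-sample delay.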

3. Filter Bank Design-- Two Design Concepts

- Two trivial design approaches
- Approach 1: non-overlapping magnitude responses
  - The aliasing effect is not serious.
  - The signal around ω = π/2 is severely attenuated.
  - Solutions for the non-overlap method:
    - Boost the non-overlapping frequency region, at the cost of severe noise amplification.
    - Decrease the transition width, at the cost of more expensive filters.

(Figure: non-overlapping magnitude responses of H0(z) and H1(z) on 0 ≤ ω ≤ π, separated around π/2.)

3. Filter Bank Design-- Two Design Concepts (c.1)

- Approach 2: overlapping magnitude responses
  - More practical than the non-overlap approach
  - Results in aliasing
  - Low filter cost
  - Solution for the overlap method: carefully design the synthesis filter bank to cancel the aliasing.

(Figure: quadrature mirror filter (QMF) magnitude responses of H0(z) and H1(z), overlapping around ω = π/2 on 0 ≤ ω ≤ π.)
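The classic QMF choice makes the alias cancellation of Approach 2 concrete: H1(z) = H0(-z) (the mirror image of H0 about π/2), F0(z) = H0(z) and F1(z) = -H1(z). The sketch uses a hypothetical 4-tap low-pass prototype chosen only for illustration and shows that the aliasing term A(z) vanishes exactly, while the distortion term T(z) = ½[H0(z)² - H0(-z)²] is not a pure delay.

```python
import numpy as np


def modulate(h):
    """Coefficients of H(-z): negate the odd powers of z^-1."""
    h = np.asarray(h, dtype=float)
    return h * (-1.0) ** np.arange(len(h))


h0 = np.array([0.1, 0.5, 0.5, 0.1])  # hypothetical low-pass prototype
h1 = modulate(h0)                    # QMF: H1(z) = H0(-z)
f0, f1 = h0, -h1                     # synthesis: F0(z) = H0(z), F1(z) = -H1(z)

t = 0.5 * (np.convolve(f0, h0) + np.convolve(f1, h1))                      # T(z)
a = 0.5 * (np.convolve(f0, modulate(h0)) + np.convolve(f1, modulate(h1)))  # A(z)
print("A(z):", a.round(3))
print("T(z):", t.round(3))
```

Here T(z) = 0.1z^-1 + 0.52z^-3 + 0.1z^-5: its symmetric coefficients give linear phase, but its magnitude is not flat, so amplitude distortion remains. This is why FIR QMF designs such as Johnston's (1980, cited in the history below) are optimized to keep |T(e^{jω})| nearly constant.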

3. Filter Bank Design-- Perfect Reconstruction

- Goal: perfect reconstruction (PR), i.e., no aliasing, no amplitude distortion, and no phase distortion:

$\hat{x}(n) = c\, x(n - n_0), \quad c \neq 0$

(Figure: M-channel filter bank: input x(n); analysis bank H0(z), ..., HM-1(z) producing x0(n), ..., xM-1(n); decimators (↓M) giving v0(n), ..., vM-1(n); expanders (↑M) giving u0(n), ..., uM-1(n); synthesis bank F0(z), ..., FM-1(z); output x̂(n).)
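The PR condition can be checked in the time domain by running a two-channel bank end to end: filter, decimate by 2, expand by 2, filter, and add the branches. The sketch assumes the Haar analysis pair with an alias-cancelling synthesis pair (F0(z) = H1(-z), F1(z) = -H0(-z)); function names and the test signal are illustrative. The output equals the input delayed by one sample, i.e. c = 1 and n0 = 1.

```python
import numpy as np


def analysis_branch(x, h):
    """Filter with H_k(z), then decimate by 2 (keep even-indexed samples)."""
    return np.convolve(x, h)[::2]


def synthesis_branch(v, f):
    """Expand by 2 (insert zeros between samples), then filter with F_k(z)."""
    u = np.zeros(2 * len(v))
    u[::2] = v
    return np.convolve(u, f)


s = 1 / np.sqrt(2)
h0, h1 = np.array([s, s]), np.array([s, -s])  # analysis bank (Haar)
f0, f1 = np.array([s, s]), np.array([-s, s])  # alias-cancelling synthesis bank

x = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0])
v0, v1 = analysis_branch(x, h0), analysis_branch(x, h1)
x_hat = synthesis_branch(v0, f0) + synthesis_branch(v1, f1)
print(x_hat.round(6))  # x delayed by one sample, padded with zeros
```

Quantizing v0 and v1 before synthesis would break this exact equality; that residual is the "coding and quantization of the subband signals" distortion source listed below.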

3. Filter Bank Design-- A Brief History

- Distortion sources of the reconstructed signal
  - Aliasing
  - Amplitude distortion
  - Phase distortion
  - Coding and quantization of the subband signals
- Brief history: two-channel case
  - Aliasing can be eliminated by a simple choice of synthesis filters: Croisier et al. [1976]
  - Eliminating the other two distortions: Johnston, 1980; Jain and Crochiere, 1984; Fettweis, 1985
  - Efficient structures: Galand and Nussbaumer, 1984
  - Perfect reconstruction can be achieved with FIR filters: Smith and Barnwell, 1984; Mintzer, 1985
  - Further optimization: [Grenez, 1988]

3. Filter Bank Design-- A Brief History(c.1)

- A brief history: M-channel filter banks
  - Pseudo-QMF: Nussbaumer, 1981; Rothweiler, 1983; Chu, 1985; Masson and Picel, 1985; Cox, 1986
  - General theory: Ramstad, 1984; Smith and Barnwell, 1985; Vetterli, 1985; Princen and Bradley, 1986; Wackersreuther, 1986; Vaidyanathan, 1987; Nguyen and Vaidyanathan, 1988; Viscito and Allebach, 1988a
    - Vetterli and Vaidyanathan showed that the use of polyphase components leads to considerable simplification of the theory.
  - Perfect reconstruction systems: Vaidyanathan, 1987
    - Polyphase matrices with the paraunitary property
  - Perfect reconstruction modulated filter banks: Malvar, 1990; Koilpillai and Vaidyanathan, 1991, 1992; Ramstad, 1991

4. Historical Review

- Standard Developments
- MPEG1 Specification
- ISO Performance Tests and the Weighted Factor
- Test Results on MUSICAM and ASPEC

4. Historical Review-- Standard Developments

- Audio coding began in the early 1970s.
- NICAM of the BBC
  - Uses uniform or nonuniform quantization of audio samples
  - 728 kb/s for a stereo audio signal of 15-kHz bandwidth
- AC-2 of Dolby
  - Adaptive transform coding
  - Achieves 128 kb/s

4. Historical Review-- Standard Developments (c.1)

- ASPEC (AT&T, D. Thomson, Fraunhofer-G, France Telecom): transform coding with overlapping blocks
- ATAC (Fujitsu, NEC, Sony, JVC): transform coding with non-overlapping blocks
- MUSICAM (IRT, Philips, CCETT, Matsushita): subband coding with more than 8 bands
- SB/ADPCM (BTRL, NTT): subband coding with fewer than 8 bands

4. Historical Review-- MPEG1 Specification

- Sampling rates: 32, 44.1, and 48 kHz
- Input resolution: 16-bit uniform PCM
- Bit rates and modes
  - Monophonic: 32, 64, 96, 128, and 192 kbit/s
  - Stereo or bilingual: 128, 192, 256, and 384 kbit/s
- Access units: the smallest part of the audio signal that can be decoded by itself; access units should be shorter than 100 ms.
- Decoding delay: less than 80 ms at a bit rate of 2 × 128 kbit/s

4. Historical Review-- ISO Performance Tests and the Weighted Factor

No.  Performance criterion                              Weight factor   Note
1    Sound quality of forward audio playback            121             S
2    Sound quality of fast-forward audio playback       67              S
3    Random access                                      118             O
4    Ability to encode in real time                     55              O
5    Data capability for ancillary information          93              O
6    High-quality stereo                                86              S
7    Intermediate-quality audio                         96              S
8    Robustness to bit errors                           89              O
9    Encoder complexity                                 59              O
10   Decoder complexity                                 117             O
11   Short decoder delay                                72              O

(S = subjective test, O = objective test)

4. Historical Review-- ISO Test Results on ASPEC and MUSICAM

- Subjective: items 1, 2, 6, 7 of the performance test
- Objective: items 3, 4, 5, 8, 9, 10, 11 of the performance test

Algorithm   Subjective test   Objective test   Total
ASPEC       3272              4557             7829
MUSICAM     2942              5408             8350

- Comments
  - Sound quality: ASPEC > MUSICAM
  - Implementation complexity: MUSICAM < ASPEC
  - Decoding delay: MUSICAM < ASPEC
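The slides give the weight factors and the summed scores but not the per-item grades, so only the bookkeeping can be reproduced: the split of the weights between subjective and objective items, and the totals. A small sketch under that limitation (names are illustrative):

```python
# Weight factors of the 11 performance criteria (items 1-11, from the table)
WEIGHTS = [121, 67, 118, 55, 93, 86, 96, 89, 59, 117, 72]
SUBJECTIVE_ITEMS = {1, 2, 6, 7}  # all other items were judged objectively

subjective_weight = sum(w for item, w in enumerate(WEIGHTS, start=1)
                        if item in SUBJECTIVE_ITEMS)
objective_weight = sum(WEIGHTS) - subjective_weight

# (subjective score, objective score) from the ISO test results
SCORES = {"ASPEC": (3272, 4557), "MUSICAM": (2942, 5408)}
totals = {name: subj + obj for name, (subj, obj) in SCORES.items()}

print(f"weight split: subjective {subjective_weight}, objective {objective_weight}")
for name, total in totals.items():
    print(f"{name}: total {total}")
print("higher total:", max(totals, key=totals.get))
```

The objective items carry 603 of the 973 weight points, which is consistent with MUSICAM leading overall despite ASPEC's higher subjective score.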

4. Historical Review-- MPEG1 Structures

- MPEG audio: the MPEG audio standard is constructed in the following structure.

(Figure: structure of MPEG audio, relating Layer I, MPEG Audio Layer II, and MPEG Audio Layer III to MUSICAM and ASPEC.)