Part 2: Audio Coding


Backgrounds1

Introduction
Psychoacoustic Effects
Filter Bank Design
Historical Reviews
MUSICAM & ASPEC
MPEG1
MPEGII & IV
AC3

NCTU/CSIE/DSP LAB Audio Processing Group

1. Introduction-- Signals Parameters

Typical values of basic parameters of three classes of acoustic signals

Acoustic Signal     Frequency Range    Sampling Rate   PCM bits per sample   PCM bit rate
Telephone Speech    300 - 3400 Hz      8 kHz           8                     64 kb/s
Wideband Speech     50 - 7,000 Hz      16 kHz          14                    224 kb/s
Wideband Audio      10 - 20,000 Hz     48 kHz          16                    768 kb/s
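The PCM bit rates in the table follow from multiplying the sampling rate by the number of bits per sample. A minimal sketch of that arithmetic is given here; the function name and the `channels` parameter are illustrative assumptions, not part of the slides.

```python
def pcm_bit_rate_kbps(sampling_rate_khz, bits_per_sample, channels=1):
    """Return the PCM bit rate in kb/s.

    kHz times bits per sample gives kb/s directly. `channels` is an
    assumption added for illustration; the table rows use one channel.
    """
    return sampling_rate_khz * bits_per_sample * channels


# (sampling rate in kHz, PCM bits per sample), taken from the table
signals = {
    "Telephone Speech": (8, 8),
    "Wideband Speech": (16, 14),
    "Wideband Audio": (48, 16),
}

for name, (fs_khz, bits) in signals.items():
    print(f"{name}: {pcm_bit_rate_kbps(fs_khz, bits)} kb/s")
```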


1. Introduction-- Storage

CD and DAT Storage Formats

Storage Device   Sampling Rate (kHz)   Audio Rate (Mb/s)   Overhead (Mb/s)   Total Bit Rate (Mb/s)
CD               44.1                  1.41                2.91              4.32
DAT              32.0                  1.03                1.43              2.46
DAT              44.1                  1.41                1.05              2.46
DAT              48                    1.54                0.92              2.46
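The audio rates in the table are consistent with two-channel, 16-bit PCM, and the total bit rate is the audio rate plus the overhead. The sketch below assumes that format (16 bits, stereo); the function names are illustrative. Under this assumption the 32 kHz row evaluates to 1.024 Mb/s, slightly below the 1.03 listed.

```python
def audio_rate_mbps(sampling_rate_khz, bits_per_sample=16, channels=2):
    """Audio bit rate in Mb/s, assuming 16-bit stereo PCM by default."""
    return sampling_rate_khz * 1e3 * bits_per_sample * channels / 1e6


def overhead_share(audio_mbps, total_mbps):
    """Fraction of the total bit rate that is not audio data."""
    return (total_mbps - audio_mbps) / total_mbps


for fs_khz in (32.0, 44.1, 48.0):
    print(f"{fs_khz} kHz: {audio_rate_mbps(fs_khz):.3f} Mb/s")

# Table values: CD carries 1.41 Mb/s of audio in a 4.32 Mb/s stream,
# DAT at 48 kHz carries 1.54 Mb/s in a 2.46 Mb/s stream.
print(f"CD overhead share:  {overhead_share(1.41, 4.32):.1%}")
print(f"DAT overhead share: {overhead_share(1.54, 2.46):.1%}")
```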


1. Introduction-- MOS

Mean Opinion Score (MOS) Scale
Subjective rating scale of 1-5

4.0
– High quality, or near-transparent coding
– A necessary condition for network quality

3.5
– Speech degradation is easily detectable
– Communications quality

3.0 and below
– Synthetic speech
– Inadequate level of naturalness and speaker recognizability


1. Introduction-- Audio

Range of audio
– High-fidelity audio signals with 15- or 20-kHz bandwidth

Two approaches to audio coding
– Transform coding
– Subband coding

Transform coding
– Use a frequency-domain transform, e.g. FFT or DCT, to extract the spectral components of the signal
– Scalar quantization and entropy coding
– Perceptual masking model


1. Introduction-- Audio (cont.)

Transform coding (cont.)
– Applications: AC-2, AC-3, ASPEC (adaptive spectral perceptual entropy coding), ISO/MPEG audio Layer III
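The first two transform-coding steps (frequency-domain transform, then scalar quantization) can be sketched on a single block as follows. This is a minimal illustration only: the orthonormal DCT-II, the uniform quantizer, the step size, and all function names are assumptions chosen for the example, and the entropy coding and perceptual masking model are left out.

```python
import math


def dct(x):
    """Orthonormal DCT-II of one block."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * acc)
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(acc)
    return out


def encode(block, step):
    """Transform the block and quantize each coefficient uniformly."""
    return [round(c / step) for c in dct(block)]


def decode(indices, step):
    """Dequantize the indices and transform back to the time domain."""
    return idct([q * step for q in indices])


block = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
step = 0.1  # hypothetical quantizer step size
reconstructed = decode(encode(block, step), step)
print([round(v, 3) for v in reconstructed])
```

A coarser step gives smaller integer indices (fewer bits after entropy coding) at the price of a larger reconstruction error; a perceptual coder would choose the step per frequency region from the masking model.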

Subband coding
– Analyze the frequency components of the signal with a filter bank
– Applications: MASCAM (masking pattern adapted subband coding and multiplexing), MUSICAM (masking pattern adapted universal subband integrated coding and multiplexing), ISO/MPEG audio Layer I, II, and III


1. Introduction (Cont.)

Psychoacoustic model
– Simulates the masking effect of the human ear
– Provides information to control the quantization level
– This information is used to keep the quantization error below the masking threshold

Just-noticeable Distortion


1. Introduction (Cont.)

ISO/MPEG audio standardization
– The first international standard in the field of high-quality digital audio compression
– Sampling rates: 32, 44.1, and 48 kHz
– Four audio modes: monophonic, dual-channel, stereo, joint stereo
– Three layers; each layer has increasing complexity and quality

Applications
– Radio sound program emission (digital audio broadcasting (DAB), satellite)
– Digital compact cassette (DCC)
– Television sound emission (HDTV)
– Storage
– Multimedia applications

2. Psychoacoustic


Introduction Masking Effect Critical Band


2. Psychoacoustic - Introduction

Sound Pressure
– Sounds are easily described by means of the time-varying sound pressure p(t).
– The unit of sound pressure is the pascal (Pa).
– In psychoacoustics, values of the sound pressure between $10^{-5}$ Pa and $10^{2}$ Pa are relevant.

Sound Pressure Level
– Normally used to cope with the broad range of sound pressures.

$L = 20 \log_{10}(p / p_0)$ dB

The reference value of the sound pressure is standardized to $p_0 = 20\ \mu\text{Pa}$.
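As a quick illustration of the level formula, the sketch below converts the two ends of the relevant pressure range into sound pressure levels. The function and constant names are illustrative assumptions.

```python
import math

P0 = 20e-6  # reference sound pressure in Pa (20 micropascals)


def sound_pressure_level(p):
    """Sound pressure level in dB for a sound pressure p in Pa."""
    return 20.0 * math.log10(p / P0)


for p in (1e-5, P0, 1e2):
    print(f"p = {p:g} Pa -> L = {sound_pressure_level(p):.2f} dB")
```

A range of seven orders of magnitude in pressure maps onto roughly 140 dB, which is why levels are preferred over raw pressures.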


2. Psychoacoustic - Introduction (c.1)

Sound Intensity and Sound Intensity Level

$L = 20 \log_{10}(p / p_0)$ dB $= 10 \log_{10}(I / I_0)$ dB

The reference value $I_0$ is defined as $10^{-12}$ W/m$^2$.

Noise Density
– When dealing with noise, it is advantageous to use density instead of sound intensity, that is, the sound intensity within a bandwidth of 1 Hz.
– The term noise power density, although not quite correct, is also used.
– The logarithmic correlate of the density of sound intensity is called the sound intensity density level, usually shortened to density level, $l$.
– For white noise, $l$ and $L$ are related by the equation

$L = [\,l + 10 \log_{10}(\Delta f / \text{Hz})\,]$ dB

where $\Delta f$ represents the bandwidth of the sound.
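The relation between density level and total level for white noise can be checked numerically. In this sketch the function name and the example values (a density level of 40 dB over a 20 kHz bandwidth) are hypothetical.

```python
import math


def level_from_density(density_level_db, bandwidth_hz):
    """Total level L in dB of white noise with density level l over bandwidth Δf."""
    return density_level_db + 10.0 * math.log10(bandwidth_hz)


# Hypothetical example: l = 40 dB over a 20 kHz wide band
print(f"{level_from_density(40.0, 20000.0):.2f} dB")

# For a 1 Hz band the level equals the density level
print(f"{level_from_density(40.0, 1.0):.2f} dB")
```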

2. Psychoacoustic - Introduction(c.2)

Normative Elements on Human Ear

Threshold in quiet
– The sound pressure level of a pure tone that is just audible, as a function of frequency.

Masking
– Property of the human auditory system by which an audio signal cannot be perceived in the presence of another audio signal.

Masking threshold
– A function of frequency and time below which an audio signal cannot be perceived by the human auditory system.

Critical band
– Loosely speaking, the perception of a particular frequency, say Ω0, by the auditory system is influenced by energy in a critical band of frequencies around Ω0.
– The ear acts as a multichannel real-time analyzer with varying sensitivity and bandwidth throughout the audio range.
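To give a feel for the shape of the threshold in quiet, the sketch below uses a widely quoted closed-form approximation attributed to Terhardt. This formula is an assumption brought in for illustration; it does not appear in the slides, and the function name is hypothetical.

```python
import math


def threshold_in_quiet_db(f_hz):
    """Approximate threshold in quiet (dB SPL) at frequency f_hz.

    Assumed approximation (Terhardt), with f in kHz:
    3.64 f^-0.8 - 6.5 exp(-0.6 (f - 3.3)^2) + 1e-3 f^4
    """
    f = f_hz / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


for f_hz in (100, 1000, 3300, 10000):
    print(f"{f_hz} Hz: {threshold_in_quiet_db(f_hz):.1f} dB SPL")
```

Under this approximation the ear is most sensitive in the region of 3 to 4 kHz, and the threshold rises steeply toward both low and high frequencies.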


2. Psychoacoustic - Masking effect

Masking of pure tones by noise
Pure tones masked by broad-band noise
– The masked thresholds rise with increasing frequency. The slope of this increase corresponds to about 10 dB per decade.
– At low frequencies, the masked thresholds lie about 17 dB above the given density level.


2. Psychoacoustic - Masking effect (c.1)

Masking of pure tones by narrow-band noise
– The critical bandwidth is about 100 Hz below 500 Hz and about 0.2 f above 500 Hz.
– The level of each masking noise is 60 dB, and the corresponding bandwidths of the noise are 100, 160, and 700 Hz, respectively.


2. Psychoacoustic-- Masking Effects (c.2)

Masking of pure tones by narrow-band noise (c.1)

Pure tones masked by low-pass or high-pass noise
– The cut-off frequencies are 0.9 kHz and 1.1 kHz, respectively.

Masking of pure tones by pure tones
– Masking tone: 1 kHz, 80 dB.

Pure tones masked by pure tones

Pure tones masked by complex tones



Temporal effect

Simultaneous masking
– When two signals are present simultaneously, the weaker signal may become inaudible; this phenomenon is called simultaneous masking.

Premasking
– The test sound has to be a short burst or sound impulse, which is presented before the masker stimulus is switched on.

Postmasking
– The test sound is presented after the masker is switched off; quite pronounced effects then occur.

Premasking and postmasking do not offer as much masking effect as simultaneous masking.


2. Psychoacoustic - Critical band

Concepts
– Assume that the part of a noise that is effective in masking a test tone is the part of its spectrum lying near the tone.

Fletcher band-widening experiment [1940]
[Figure: energy versus masker bandwidth (Hz) for a 2000 Hz sinusoidal signal]


2. Psychoacoustic - Critical band (c.1)

Observations
– We can assume that the human ear performs like a set of bandpass filters.

Critical Band Rate
– Psychoacoustic measure in the spectral domain which corresponds to the frequency selectivity of the human ear.

Bark
– Unit of critical band rate. A critical band is the part of the spectral domain which corresponds to a width of one Bark.

Formula

$z = 13 \arctan(0.76 f) + 3.5 \arctan\left((f / 7.5)^2\right)$, with $f$ in kHz
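The critical band rate formula can be evaluated directly. The sketch below maps a few frequencies to Bark; the function name is an illustrative assumption, and the input is taken in Hz and converted to kHz inside the function.

```python
import math


def hz_to_bark(f_hz):
    """Critical band rate z in Bark for a frequency given in Hz."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


for f_hz in (100, 500, 1000, 4000, 20000):
    print(f"{f_hz:>6} Hz -> {hz_to_bark(f_hz):5.2f} Bark")
```

With this formula the audio range up to 20 kHz spans a little under 25 Bark, and the mapping is close to linear at low frequencies and compressive at high frequencies.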


3. Filter Bank Design

Error Sources
Two Design Concepts
Perfect Reconstruction (PR) Systems
History Review

[Figure: M-channel filter bank. The input x(n) passes through the analysis bank H0(z), H1(z), ..., HM-1(z) to give x0(n), x1(n), ..., xM-1(n); decimators (↓M) give v0(n), v1(n), ..., vM-1(n); expanders (↑M) give u0(n), u1(n), ..., uM-1(n); the synthesis bank F0(z), F1(z), ..., FM-1(z) produces the reconstructed signal x̂(n).]

3. Filter Bank Design-- Error Sources

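Two Channel Filter Bank
[Figure: two-channel filter bank. The input x(n) passes through the analysis bank H0(z), H1(z) to give x0(n), x1(n); decimators (↓2) give v0(n), v1(n); expanders (↑2) give y0(n), y1(n); the synthesis bank F0(z), F1(z) produces the reconstructed signal x̂(n).]

Error Sources
– Aliasing
– Amplitude distortion
– Phase distortion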


3. Filter Bank Design-- Two Design Concepts

Two Trivial Design Approaches

Approach 1: Non-Overlapping Magnitude Responses
– Aliasing is not serious
– The signal around ω = π/2 is severely attenuated

Solutions for the non-overlapping method
– Boost the non-overlapped frequency region, but this results in severe amplification of noise
– Decrease the transition width, but the cost of the filters increases

[Figure: non-overlapping magnitude responses of H0(z) and H1(z); ω axis from 0 to π, with π/2 marked]

3. Filter Bank Design-- Two Design Concepts (c.1)

Approach 2: Overlapping Magnitude Responses
– More practical than the non-overlapping approach
– Results in aliasing
– Low filter cost

Solution for the overlapping method
– Careful design of the synthesis filter bank to cancel the aliasing

Quadrature Mirror Filter
[Figure: overlapping magnitude responses of H0(z) and H1(z); ω axis from 0 to π, with π/2 marked]

3. Filter Bank Design-- Perfect Reconstruction

Goal: perfect reconstruction (PR)
– No aliasing distortion
– No amplitude distortion
– No phase distortion

$\hat{x}(n) = c\,x(n - n_0), \quad c \neq 0$

[Figure: M-channel filter bank. The input x(n) passes through the analysis bank H0(z), H1(z), ..., HM-1(z) to give x0(n), x1(n), ..., xM-1(n); decimators (↓M) give v0(n), v1(n), ..., vM-1(n); expanders (↑M) give u0(n), u1(n), ..., uM-1(n); the synthesis bank F0(z), F1(z), ..., FM-1(z) produces x̂(n).]
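The perfect reconstruction condition can be illustrated with the simplest two-channel case. The sketch below assumes length-2 Haar filters with the synthesis filters chosen as F0(z) = H1(-z) and F1(z) = -H0(-z), which cancels the aliasing and leaves a pure delay (c = 1, n0 = 1). The filter choice and all function names are assumptions for illustration, not the design discussed in the slides.

```python
import math


def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def decimate(x, m=2):
    """Keep every m-th sample."""
    return x[::m]


def expand(v, m=2):
    """Insert m - 1 zeros after every sample."""
    u = [0.0] * (len(v) * m)
    u[::m] = v
    return u


S = 1.0 / math.sqrt(2.0)
H0, H1 = [S, S], [S, -S]   # analysis bank: Haar low-pass and high-pass
F0, F1 = [S, S], [-S, S]   # synthesis bank: F0(z) = H1(-z), F1(z) = -H0(-z)


def two_channel_filter_bank(x):
    """Analysis, decimation, expansion, and synthesis of x."""
    v0 = decimate(convolve(x, H0))
    v1 = decimate(convolve(x, H1))
    y0 = convolve(expand(v0), F0)
    y1 = convolve(expand(v1), F1)
    return [a + b for a, b in zip(y0, y1)]


x = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0, 2.0]
x_hat = two_channel_filter_bank(x)
print([round(v, 6) for v in x_hat])
```

The output is the input delayed by one sample, so all three error sources (aliasing, amplitude distortion, phase distortion) vanish for this filter pair. The price is the poor frequency selectivity of length-2 filters.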


3. Filter Bank Design-- A Brief History

Distortion Sources of the Reconstructed Signals
– Aliasing
– Amplitude distortion
– Phase distortion
– Coding and quantization of the subband signals

Brief History-- Two Channel Case
– Aliasing can be eliminated by a simple choice of synthesis filters: Croisier, et al. [1976]
– Eliminating the other two distortions: Johnston, 1980; Jain and Crochiere, 1984; Fettweis, 1985
– Efficient structures: Galand and Nussbaumer, 1984
– Perfect reconstruction can be achieved by FIR filters: Smith and Barnwell, 1984; Mintzer, 1985
– Further optimization: Grenez, 1988

3. Filter Bank Design-- A Brief History(c.1)

A Brief History-- M-Channel Filter Banks

Pseudo-QMF: Nussbaumer, 1981; Rothweiler, 1983; Chu, 1985; Masson and Picel, 1985; Cox, 1986.

General Theory: Ramstad, 1984; Smith and Barnwell, 1985; Vetterli, 1985; Princen and Bradley, 1986; Wackersreuther, 1986; Vaidyanathan, 1987; Nguyen and Vaidyanathan, 1988; Viscito and Allebach, 1988a.
– Vetterli and Vaidyanathan show that the use of polyphase components leads to considerable simplification of the theory.

Perfect Reconstruction Systems: Vaidyanathan, 1987
– Polyphase matrices with the paraunitary property.

Perfect Reconstruction Modulated Filter Banks
– Malvar, 1990; Koilpillai and Vaidyanathan, 1991, 1992; Ramstad, 1991.


4. Historical Review

Standard Developments
MPEG1 Specification
ISO Performance Tests and the Weighted Factor
Test Results on MUSICAM and ASPEC


4. Historical Review-- Standard Developments

Audio coding began in the early 1970s

NICAM of BBC
– Uses uniform or nonuniform quantization of audio samples
– 728 kb/s for a stereo audio signal of 15-kHz bandwidth

AC-2 of Dolby
– Adaptive transform coding
– Achieves 128 kb/s


4. Historical Review-- Standard Developments (c.1)

ASPEC (AT&T, D. Thomson, Fraunhofer-G, France Telecom)
– Transform coding with overlapping blocks

ATAC (Fujitsu, NEC, Sony, JVC)
– Transform coding with non-overlapping blocks

MUSICAM (IRT, Philips, CCETT, Matsushita)
– Subband coding with more than 8 bands

SB/ADPCM (BTRL, NTT)
– Subband coding with fewer than 8 bands


4. Historical Review-- MPEG1 Specification

Sampling Rate
– 32, 44.1, 48 kHz

Input Resolution
– 16 bits uniform PCM

Bit Rates & Modes
– Monophonic: 32, 64, 96, 128, 192 kbit/s
– Stereo or Bilingual: 128, 192, 256, and 384 kbit/s

Access Units
– The smallest part of the audio signal which can be decoded by itself.
– Access units should be less than 100 ms.

Decoding Delay
– Less than 80 ms at a bit rate of 2 x 128 kbit/s
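The listed bit rates can be compared with the uncompressed input to get a rough compression ratio. The sketch below assumes the 16-bit uniform PCM input stated above; the function names and the particular rate combinations are illustrative choices.

```python
def pcm_rate_kbps(sampling_rate_khz, bits_per_sample=16, channels=1):
    """Uncompressed PCM bit rate in kbit/s."""
    return sampling_rate_khz * bits_per_sample * channels


def compression_ratio(sampling_rate_khz, coded_rate_kbps, channels):
    """Ratio of the PCM input rate to the coded bit rate."""
    return pcm_rate_kbps(sampling_rate_khz, channels=channels) / coded_rate_kbps


# Stereo at 48 kHz coded at 2 x 128 kbit/s
print(f"{compression_ratio(48, 2 * 128, channels=2):.2f} : 1")

# Mono at 44.1 kHz coded at 64 kbit/s
print(f"{compression_ratio(44.1, 64, channels=1):.2f} : 1")
```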

4. Historical Review-- ISO Performance Tests and the Weighted Factor

No.   Performance                                   Weight Factor   Note
1     Sound quality of forward audio playback       121             S
2     Sound quality of fast forward audio playback  67              S
3     Random access                                 118             O
4     Ability to encode in real time                55              O
5     Data capability for ancillary information     93              O
6     High quality stereo                           86              S
7     Intermediate quality audio                    96              S
8     Robustness to bit errors                      89              O
9     Encoder complexity                            59              O
10    Decoder complexity                            117             O
11    Short decoder delay                           72              O

Note: S = subjective, O = objective


4. Historical Review-- ISO Test Results on ASPEC and MUSICAM

Subjective: Items 1, 2, 6, 7 of the performance test
Objective: Items 3, 4, 5, 8, 9, 10, 11 of the performance test

Algorithm   Subjective Test   Objective Test   Total
ASPEC       3272              4557             7829
MUSICAM     2942              5408             8350

Comments
– Sound Quality: ASPEC > MUSICAM
– Implementation Complexity: MUSICAM < ASPEC
– Decoding Delay: MUSICAM < ASPEC
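The totals in the results table are the sums of the subjective and objective scores, and the weight factors can be grouped by the same S/O split. The sketch below only re-adds the numbers given in the slides; the variable names are illustrative, and how individual ratings were combined with the weights is not stated in the source.

```python
# Weight factors and notes for performance items 1 to 11
WEIGHTS = [121, 67, 118, 55, 93, 86, 96, 89, 59, 117, 72]
NOTES = list("SSOOOSSOOOO")  # S = subjective, O = objective

subjective_weight = sum(w for w, n in zip(WEIGHTS, NOTES) if n == "S")
objective_weight = sum(w for w, n in zip(WEIGHTS, NOTES) if n == "O")

# Test results as listed for each algorithm
scores = {
    "ASPEC": {"subjective": 3272, "objective": 4557},
    "MUSICAM": {"subjective": 2942, "objective": 5408},
}
totals = {name: s["subjective"] + s["objective"] for name, s in scores.items()}

print("weight split (S, O):", subjective_weight, objective_weight)
for name, total in totals.items():
    print(name, total)
```

The split shows that the objective items carry more total weight than the subjective ones, which is consistent with MUSICAM ending up with the higher total despite its lower subjective score.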


4. Historical Review-- MPEG1 Structures

MPEG audio
– MPEG audio is constructed in the following structure

[Figure: structure of MPEG audio; labels: Layer I, MPEG Audio Layer II, MPEG Audio Layer III, MUSICAM, ASPEC]

