
Scalable Perceptual Metric for Evaluating Audio Quality

Rahul Vanam, Dept. of Electrical Engineering, University of Washington
Charles D. Creusere, Klipsch School of Electrical and Computer Engineering, New Mexico State University

Background
• Because modern audio compression algorithms are optimized for the human auditory system, conventional objective metrics such as the segmental signal-to-noise ratio are not effective
• This has forced researchers to rely on subjective human testing to validate and compare different algorithms, which is:
– Time-consuming
– Ill-suited to online implementation
– Often difficult to repeat

Background
• Since the late 1980s, there has been a strong push to develop objective metrics capable of quantifying subjective audio quality
• This culminated in ITU-R Recommendation BS.1387-1, called PEAQ (Perceptual Evaluation of Audio Quality)
– It contains a lower-complexity basic version and a more accurate advanced version

Background
• Problem: both the basic and advanced versions of PEAQ are designed to evaluate the quality of mildly impaired audio
• Because we are interested in scalable audio compression, we want an objective metric that is accurate over a wide range of audio impairments

Solution Framework
• In previous work, we found that an alternative metric, the Energy Equalization Approach (EEA), was far more accurate than either version of PEAQ in characterizing the quality of highly impaired audio
• In this paper, we combine EEA with PEAQ-advanced to create a fidelity-scalable metric, i.e., one that is accurate over a wide range of audio qualities

Energy Equalization Approach
• Idea: apply a truncation threshold to the original audio sequence, adjusting it until the energy of the truncated sequence equals that of the reconstructed audio sequence
– This mimics the band truncation that occurs in perceptual audio codecs

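Energy Equalization Metric
Define:
• Energy of the reconstructed audio:

$$ e_k = \sum_{i=0}^{\mathrm{total\_blocks}} \sum_{j=51}^{100} \left( \mathrm{rec\_spec}(i,j)_k \right)^2 $$

• Modified time-frequency spectrum:

$$ \mathrm{m\_spec}(i,j)_{T_{kn}} = \begin{cases} \mathrm{o\_spec}(i,j), & \text{if } \mathrm{o\_spec}(i,j) \ge T_{kn} \\ 0, & \text{if } \mathrm{o\_spec}(i,j) < T_{kn} \end{cases} $$

• Energy of the modified spectrum:

$$ e_{T_{kn}} = \sum_{i=0}^{\mathrm{total\_blocks}} \sum_{j=51}^{100} \left( \mathrm{m\_spec}(i,j)_{T_{kn}} \right)^2 $$

GOAL: Select T so that e_T = e_k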
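The threshold search described above can be sketched as a bisection on T. This is a minimal sketch under stated assumptions, not the authors' implementation: spectra are assumed to be nested Python lists of nonnegative magnitudes indexed `[block][bin]`, the bin range defaults to 51..100 as in the sums, and the helper names (`band_energy`, `modified_spectrum`, `eea_threshold`) are hypothetical. Because the truncated energy is a non-increasing step function of T, exact equality may not be attainable, so the sketch returns a threshold just above the last step whose retained energy does not exceed e_k.

```python
# Hypothetical sketch of the EEA threshold search (not the authors' code).
# Assumption: spectra are lists of blocks, each a list of nonnegative magnitudes.


def band_energy(spec, lo=51, hi=100):
    """Sum of squared values over bins lo..hi (inclusive) in every block."""
    return sum(v * v for block in spec for v in block[lo:hi + 1])


def modified_spectrum(o_spec, t):
    """m_spec: keep o_spec(i, j) where it is >= t, otherwise set it to 0."""
    return [[v if v >= t else 0.0 for v in block] for block in o_spec]


def eea_threshold(o_spec, rec_spec, lo=51, hi=100, iters=60):
    """Bisect on t so the truncated original's band energy approaches e_k."""
    e_k = band_energy(rec_spec, lo, hi)
    t_low = 0.0
    # Above the largest magnitude everything is zeroed, so the energy is 0 <= e_k.
    t_high = max(max(block[lo:hi + 1]) for block in o_spec) + 1.0
    for _ in range(iters):
        t_mid = 0.5 * (t_low + t_high)
        if band_energy(modified_spectrum(o_spec, t_mid), lo, hi) > e_k:
            t_low = t_mid   # too much energy kept: the threshold must rise
        else:
            t_high = t_mid  # energy at or below e_k: try a lower threshold
    return t_high  # invariant: the energy kept at t_high never exceeds e_k


# Toy example with a 4-bin spectrum (bins 0..3 instead of 51..100).
o_spec = [[4.0, 3.0, 2.0, 1.0]]
rec_spec = [[5.0, 0.0, 0.0, 0.0]]  # e_k = 25
T = eea_threshold(o_spec, rec_spec, lo=0, hi=3)
print(T, band_energy(modified_spectrum(o_spec, T), 0, 3))
```

In the toy example, any threshold in (2, 3] keeps the bins with magnitudes 4 and 3, whose energy (16 + 9) matches the reconstruction exactly.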

New Metric Design
• We combine the ‘T’ parameter generated by EEA with the five Model Output Variables (MOVs) that are already part of the PEAQ-advanced recommendation
– The existing MOVs quantify the distortion loudness, the changes in modulation, the linear distortion, the harmonic structure of the error, and the noise-to-mask ratio
• A simple optimal linear weighting is used to fuse the MOVs into a single value
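One way to realize the "optimal linear weighting" is an ordinary least-squares fit of weights, plus a bias term, that maps each feature vector (the five MOVs with T appended) to the subjective score. The slides do not state the fitting criterion, so least squares is an assumption here, and the names `fit_linear_weights`, `predict`, and `solve` are hypothetical. Solving the normal equations in plain Python keeps the sketch dependency-free; with real data a library solver such as `numpy.linalg.lstsq` is better conditioned.

```python
# Hypothetical sketch: least-squares weights for fusing MOVs (plus T) into one score.


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bb] for row, bb in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_weights(features, targets):
    """Least-squares weights (last entry is the bias) via the normal equations."""
    rows = [list(f) + [1.0] for f in features]  # append a bias column
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(ata, atb)


def predict(weights, feature_vector):
    """Fused objective score: weighted sum of the features plus the bias."""
    return sum(w * x for w, x in zip(weights, list(feature_vector) + [1.0]))


# Toy data with two features standing in for the six (5 MOVs + T).
features = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
targets = [2 * x1 - 0.5 * x2 + 1 for x1, x2 in features]
w = fit_linear_weights(features, targets)
print(w, predict(w, (3, 2)))
```

Because the toy targets are an exact linear function of the features, the fit recovers the generating weights [2, -0.5, 1] up to rounding.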

Subjective Test Data
• The data used to design and test the proposed objective metrics were collected using the Comparison Category Rating (CCR) approach
– 20 test subjects
– 7 different audio sequences
– Encoded at bitrates of 16 and 32 kb/s
– Using the MPEG-4 codecs AAC, BSAC, and TVQ

Comparisons: Optimal Linear Combination of PEAQ MOVs, Predictor Fit
[Figure: subjective measurement vs. objective measurement, with data points and least-squares (LS) fits for the modified advanced version, the advanced version, and EAQUAL]

Comparisons: Optimal Linear Combination of PEAQ MOVs, Holdout Case
[Figure: squared error and change in slope for each holdout case]
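The holdout plots report a squared error and a change in slope for each holdout case. Assuming (the slides do not say) that each holdout case leaves out one data point, refits on the remaining points, and compares the refit least-squares line of subjective against objective scores with the full-data line, the bookkeeping could look like this. For brevity the sketch fits a single objective score; in the full metric the multi-MOV weight fit would take the place of `ls_line`. All names and data values are hypothetical.

```python
# Hypothetical leave-one-out holdout evaluation (interpretation of the plots is assumed).


def ls_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def holdout_errors(objective, subjective):
    """For each held-out point: (squared prediction error, |slope change|)."""
    full_slope, _ = ls_line(objective, subjective)
    results = []
    for k in range(len(objective)):
        train_x = objective[:k] + objective[k + 1:]
        train_y = subjective[:k] + subjective[k + 1:]
        slope, intercept = ls_line(train_x, train_y)
        predicted = slope * objective[k] + intercept
        results.append(((predicted - subjective[k]) ** 2, abs(slope - full_slope)))
    return results


# Made-up scores, for illustration only.
objective = [0.2, 0.9, 1.4, 2.1, 2.8]
subjective = [0.3, 0.8, 1.6, 2.0, 2.9]
for case, (sq_err, d_slope) in enumerate(holdout_errors(objective, subjective), 1):
    print(f"holdout case {case}: squared error {sq_err:.4f}, slope change {d_slope:.4f}")
```

A metric that generalizes well shows small squared errors and small slope changes across all holdout cases, since no single data point then dominates the fit.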

Comparisons: Optimal Linear Combination, PEAQ MOVs plus EEA
[Figure, panel "Optimal Fit": data points and LS fits for the advanced version with the EEA MOV and a single-layer NN, the advanced version, and the advanced version with a single-layer NN. Panel "Error in Holdout Case": squared error and change in slope for each holdout case]

Comparisons: Optimal Linear Combination, Bitrate Optimized: Low/Mid Quality Audio
[Figure, panel "Optimal Fit": data points and LS fits for the advanced version with energy equalization and the advanced version with bitrate-based weight selection. Panel "Error in Holdout Case": squared error and change in slope for each holdout case]

Comparisons: Optimal Linear Combination, Bitrate Optimized: High Quality Audio
[Figure, panels "Optimal Fit" and "Error in Holdout Case" (squared error and change in slope for each holdout case)]

Note: Perceptual measurements are simulated by treating the ODG (Objective Difference Grade) values generated by PEAQ-advanced as if they were SDG (Subjective Difference Grade) values acquired through perceptual testing

Conclusions
• Combining the EEA truncation threshold with the PEAQ MOVs clearly improves the predictive performance of the metric
– The correlation coefficient increases
– The MSE of the prediction error decreases
• If bitrate information is also available, performance improves significantly further
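The two figures of merit in the conclusions can be computed as follows, assuming the correlation coefficient is Pearson's r between the metric's predictions and the measured scores. The function names and example values are illustrative only.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(predicted)
    mp, mm = sum(predicted) / n, sum(measured) / n
    spm = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    spp = sum((p - mp) ** 2 for p in predicted)
    smm = sum((m - mm) ** 2 for m in measured)
    return spm / math.sqrt(spp * smm)


def mse(predicted, measured):
    """Mean squared prediction error."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted)


# Made-up scores, for illustration only.
predicted = [0.4, 1.1, 1.7, 2.5]
measured = [0.5, 1.0, 1.9, 2.4]
print(f"r = {pearson_r(predicted, measured):.3f}, MSE = {mse(predicted, measured):.4f}")
```

A better metric moves r toward 1 and the MSE toward 0; the two can disagree, since r ignores a constant offset or scale error that the MSE penalizes.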

Future Work
• Design a more complex 3-layer neural network, similar to the one used in PEAQ, to generate the metric's output from the MOVs
• Generate additional subjective data using the more recent MUSHRA testing protocol and use it to validate the proposed metric more thoroughly