Scalable Perceptual Metric for Evaluating Audio Quality Rahul Vanam Dept. of Electrical Engineering University of Washington Charles D. Creusere Klipsch School of Electrical and Computer Engineering New Mexico State University
Background • Because modern audio compression algorithms are optimized for the human auditory system, conventional objective metrics like segmental signal-to-noise ratio are not effective • This has forced researchers to rely upon human subjective testing in order to validate and compare different algorithms – Time consuming – Ill-suited to online implementation – The results are often difficult to repeat
Background • Since the late 1980s, there has been a strong push to develop objective metrics capable of quantifying subjective audio quality • This culminated in the development of ITU-R Recommendation BS.1387-1, called PEAQ – Contains a lower-complexity basic version and a more accurate advanced version
Background • Problem: Both the basic and advanced versions of PEAQ are designed to evaluate the quality of mildly impaired audio • Because we are interested in scalable audio compression, we would like an objective metric that is accurate over a wide range of audio impairments
Solution Framework • In previous work, we found that an alternative metric, namely the Energy Equalization Approach (EEA), was far more accurate in characterizing the quality of highly impaired audio than either version of PEAQ • In this paper, we combine EEA with PEAQ-advanced to create a metric that is fidelity scalable, i.e., accurate over a wide range of audio qualities.
Energy Equalization Approach • Idea: Apply a truncation threshold to the original audio sequence, adjusting it until the energy of this sequence is the same as that of the reconstructed audio sequence – Mimics the process of band truncation that occurs in perceptual audio codecs
Energy Equalization Metric Define:
• Energy of reconstructed audio
$$e_k = \sum_{i=0}^{\text{total\_blocks}} \sum_{j=51}^{100} \left(\text{rec\_spec}(i,j)_k\right)^2$$
• Modified time-frequency spectrum
$$\text{m\_spec}(i,j)_{T_{kn}} = \begin{cases} \text{o\_spec}(i,j), & \text{if } \text{o\_spec}(i,j) \ge T_{kn} \\ 0, & \text{if } \text{o\_spec}(i,j) < T_{kn} \end{cases}$$
• Energy of modified spectrum
$$e_{T_{kn}} = \sum_{i=0}^{\text{total\_blocks}} \sum_{j=51}^{100} \left(\text{m\_spec}(i,j)_{T_{kn}}\right)^2$$
GOAL: Select T so that: $e_T = e_k$
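The threshold search defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the spectra are taken to be lists of blocks holding non-negative magnitudes, the search is a plain bisection, and the names `band_energy`, `truncate`, and `equalize_energy` are hypothetical. Because the truncated energy is a step function of T, exact equality is generally not reachable, so the sketch returns a threshold whose retained energy does not exceed the target.

```python
def band_energy(spec, lo=51, hi=100):
    """Sum of squared values over bins lo..hi (inclusive) of every block."""
    return sum(v * v for block in spec for v in block[lo:hi + 1])


def truncate(spec, threshold):
    """Zero every coefficient that falls below the threshold (m_spec)."""
    return [[v if v >= threshold else 0.0 for v in block] for block in spec]


def equalize_energy(o_spec, rec_spec, lo=51, hi=100, iterations=50):
    """Bisect on T so the truncated original approaches the energy e_k.

    Assumption: spectra hold non-negative magnitudes, so the retained
    energy can only shrink as T grows, which makes bisection valid.
    """
    target = band_energy(rec_spec, lo, hi)           # e_k
    low = 0.0
    high = max(v for block in o_spec for v in block[lo:hi + 1])
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if band_energy(truncate(o_spec, mid), lo, hi) > target:
            low = mid    # too much energy retained: raise the threshold
        else:
            high = mid   # retained energy at or below target: lower it
    return high
```

With the default arguments the function sums bins 51 through 100, as in the definitions; `lo` and `hi` are exposed only so that the sketch can be tried on small toy spectra.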
New Metric Design • We combine the ‘T’ parameter generated by EEA with the five Model Output Variables (MOVs) that are already part of the PEAQ-advanced recommendation – Existing MOVs quantify the distortion loudness, the changes in modulation, the linear distortion, the harmonic structure of the error, and the noise-to-mask ratio
• A simple optimal linear weighting is used to fuse the MOVs into a single value
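One way to read "optimal linear weighting" is an ordinary least-squares fit of the subjective scores against the six inputs (five MOVs plus T). The sketch below makes that assumption explicit; the bias term, the normal-equation solver, and the names `solve`, `fit_weights`, and `predict` are illustrative choices, not details taken from the slides.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weights(movs, scores):
    """Least-squares weights mapping MOV rows to subjective scores.

    Assumption: a constant bias is appended as the last weight.
    """
    rows = [list(r) + [1.0] for r in movs]
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(n)]
    return solve(xtx, xty)                            # normal equations


def predict(weights, mov_row):
    """Fuse one row of MOVs into a single objective value."""
    return sum(w * v for w, v in zip(weights, list(mov_row) + [1.0]))
```

The solver assumes the inputs are not collinear; with real MOV data a library least-squares routine would be the more robust choice.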
Subjective Test Data • The data used to design and test the proposed objective metrics were collected using the Comparison Category Rating (CCR) approach – 20 test subjects – 7 different audio sequences – Encoded bitrates of 16 and 32 kb/s – Using MPEG-4 codecs: AAC, BSAC, and TVQ
Comparisons Optimal Linear Combination of PEAQ MOVs: Predictor Fit
[Figure: Subjective Measurement versus Objective Measurement. Legend: data points and LS fits for the modified advanced version, the advanced version, and EAQUAL.]
Comparisons Optimal Linear Combination of PEAQ MOVs: Holdout Case
[Figure: Squared Error / Change in slope versus Holdout Case. Legend: Squared Error, Change in slope.]
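The slides do not spell out how a holdout case is evaluated. A plausible reading, assumed here, is leave-one-out validation: each case removes one data point, refits the least-squares line on the rest, and records the squared error on the removed point together with the change in slope relative to the fit on all data. The sketch below illustrates that reading for a single objective value per item; the function names are hypothetical.

```python
def ls_line(x, y):
    """Slope and intercept of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def holdout_report(objective, subjective):
    """Per-case (squared error, change in slope) under leave-one-out.

    Assumption: one data point is held out per case; the slides do not
    confirm the exact partitioning used by the authors.
    """
    full_slope, _ = ls_line(objective, subjective)
    report = []
    for k in range(len(objective)):
        x = objective[:k] + objective[k + 1:]     # training objective values
        y = subjective[:k] + subjective[k + 1:]   # training subjective scores
        slope, intercept = ls_line(x, y)
        err = (slope * objective[k] + intercept - subjective[k]) ** 2
        report.append((err, abs(slope - full_slope)))
    return report
```

A metric that generalizes well should show small values of both quantities across all holdout cases.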
Comparisons Optimal Linear Combination, PEAQ MOVs plus EEA
[Figure, panel "Optimal Fit": Subjective Measurement versus Objective Measurement. Legend: data points and LS fits for the advanced version with EEA MOV and single-layer NN, the advanced version, and the advanced version with single-layer NN.]
[Figure, panel "Error in Holdout Case": Squared Error / Change in slope versus Holdout Case. Legend: Squared Error, Change in slope.]
Comparisons Optimal Linear Combination, Bitrate Optimized: Low/Mid Quality Audio
[Figure, panel "Optimal Fit": Subjective Measurement versus Objective Measurement. Legend: data points and LS fits for the advanced version with energy equalization and for the advanced version with bitrate-based weight selection.]
[Figure, panel "Error in Holdout Case": Squared Error / Change in slope versus Holdout Case. Legend: Squared Error, Change in slope.]
Comparisons Optimal Linear Combination, Bitrate Optimized: High Quality Audio
[Figure, panel "Optimal Fit": Subjective Measurement versus Objective Measurement.]
[Figure, panel "Error in Holdout Case": Squared Error / Change in slope versus Holdout Case. Legend: Squared Error, Change in slope.]
Note: Perceptual measurements are simulated by treating the ODG values generated by PEAQ-advanced as if they were SDG values acquired through perceptual testing
Conclusions • Combining the EEA truncation threshold with the PEAQ MOVs clearly improves the predictive performance of the metric – The correlation coefficient is increased – The MSE of the prediction error is decreased
• If bitrate information is also available, performance improves significantly further
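The two figures of merit named in the conclusions can be computed as follows. This is a minimal sketch that assumes the correlation coefficient is the Pearson coefficient between objective and subjective scores; the slides do not say which variant was used, and the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def mse(predicted, measured):
    """Mean squared prediction error."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
```

A better metric pushes the correlation toward 1 and the MSE toward 0 on the same set of subjective scores.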
Future Work • Design a more complex 3-layer neural network similar to that used in PEAQ to generate the metric’s output from the MOVs • Generate additional subjective data using the more recent MUSHRA testing protocol and use it to more thoroughly validate the proposed metric