Using Speech Intelligibility Metrics to Predict the Performance of Hearing-Impaired Listeners
Karen Payton
Electrical & Computer Engineering Dept., UMass Dartmouth, N. Dartmouth, MA
Articulation Index (AI)
• Please attend this afternoon’s special session: 2PAA Session in honor of Karl Kryter
• In particular, 2PAA 11 will cover use of the AI to predict hearing-impaired intelligibility
Spring 2015 ASA Meeting, Pittsburgh, PA
Modifications to the AI
• Speech Intelligibility Index (SII)
– Includes improved calculation of reverberation effects.
– Includes a formula to predict intelligibility of impaired listeners in Annex of ANSI Standard, 1997
• Account for fluctuating backgrounds
– Short-time AI or SII – Kates, 1987; Ma et al., 2009
– Enhanced SII – Rhebergen et al., 2006
• Account for clipping
– Coherence-based SII – Kates & Arehart, 2005
• Account for binaural listening
– Binaural SII – Beutelmann & Brand, 2006
Speech Transmission Index (STI)
• Humes et al. (1986) first demonstrated that the STI could predict impaired performance by modeling elevated thresholds as internal noise
• Payton et al. (1994) replicated the Humes et al. results in reverberation, in noise plus reverberation, and with multiple speaking styles
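A minimal sketch of the internal-noise idea in the first bullet above, not the procedure used by Humes et al. or Payton et al.: the listener's threshold in each band is treated as an extra noise level that adds, in power, to the external noise before the band SNR is mapped to a transmission index. The function names, the equal band weights, the direct use of the threshold level as the internal-noise level, and the restriction to stationary noise are all assumptions made for illustration. The ±15 dB clipping range follows the usual STI convention.

```python
import math


def band_transmission_index(speech_db, noise_db, threshold_db=None):
    """Transmission index in [0, 1] for one band with stationary noise.

    threshold_db (hypothetical parameter) is the listener's threshold expressed
    as an equivalent internal-noise level in the same dB units as the signals.
    """
    noise_power = 10 ** (noise_db / 10)
    if threshold_db is not None:
        # Assumption: internal and external noise are uncorrelated, so powers add.
        noise_power += 10 ** (threshold_db / 10)
    snr_db = speech_db - 10 * math.log10(noise_power)
    # With stationary noise only, the apparent SNR equals the band SNR.
    snr_db = max(-15.0, min(15.0, snr_db))
    return (snr_db + 15.0) / 30.0


def sti_estimate(speech_db, noise_db, thresholds_db=None):
    """Equal-weight average of band transmission indices (weights are an assumption)."""
    if thresholds_db is None:
        thresholds_db = [None] * len(speech_db)
    bands = zip(speech_db, noise_db, thresholds_db)
    values = [band_transmission_index(s, n, t) for s, n, t in bands]
    return sum(values) / len(values)


# Illustrative band levels in dB; these are made-up numbers, not measured data.
speech = [60.0, 60.0, 55.0]
noise = [50.0, 55.0, 50.0]
thresholds = [40.0, 55.0, 60.0]
print(sti_estimate(speech, noise))              # normal-hearing estimate
print(sti_estimate(speech, noise, thresholds))  # lower: thresholds act as added noise
```

In this sketch an elevated threshold only matters once it approaches or exceeds the external noise level in a band, which is the intuition behind treating hearing loss as internal noise.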
2 Hearing-Impaired Listeners’ Intelligibility vs AI and STI
Curves are fit to normal-hearing listener data for the 2 speaking styles. Data points are for 2 HI listeners. Payton et al., 1994
Modifications to the STI
• Use Fourier transform of squared room impulse response to obtain MTF – Schroeder, 1981
• Adjust band weights – Humes et al., 1986
• Use speech as the probe stimulus
– Frequency-domain: Houtgast & Steeneken, 1972; Drullman et al., 1994; Payton & Braida, 1999; Payton et al., 2002
– Time-domain: Ludvigsen et al., 1990; Goldsworthy & Greenberg, 2004
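The first bullet above (MTF from the Fourier transform of the squared room impulse response) can be sketched as follows. This is an illustrative, unoptimized version using only the Python standard library: the function name, the sampling rate, and the synthetic exponentially decaying impulse response are assumptions, not material from the talk.

```python
import cmath
import math


def mtf_from_impulse_response(h, fs, mod_freqs):
    """Return m(F) for each modulation frequency F (Hz), given impulse response h sampled at fs (Hz)."""
    energy = [x * x for x in h]          # squared impulse response
    total = sum(energy)                  # its transform at F = 0, used for normalization
    mtf = []
    for f in mod_freqs:
        acc = sum(e * cmath.exp(-2j * math.pi * f * n / fs)
                  for n, e in enumerate(energy))
        mtf.append(abs(acc) / total)
    return mtf


# Synthetic example (an assumption, not data from the talk): an ideal
# exponential decay whose energy falls by 60 dB in rt60 seconds.
fs = 1000
rt60 = 1.0
decay = math.log(1e6) / rt60             # energy decay rate in 1/s
h = [math.exp(-0.5 * decay * n / fs) for n in range(3 * fs)]
mod_freqs = [0.63, 1.0, 2.0, 4.0, 8.0, 12.5]
for f, m in zip(mod_freqs, mtf_from_impulse_response(h, fs, mod_freqs)):
    print(f"{f:5.2f} Hz  m = {m:.3f}")
```

For this idealized decay the values should approach the closed form m(F) = 1 / sqrt(1 + (2πF·T/13.8)²), where T is the reverberation time, which gives a quick sanity check on the computation.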
Comparison of various speech-based STI metrics to Traditional STI
Degradations included varied noise level plus five different reverberation conditions. Goldsworthy & Greenberg, 2004
Magnitude Cross Power Spectrum Method
[Figure: Intelligibility (%) vs. Magnitude CPS sSTI]
Circles correspond to linear amplification conditions; other symbols correspond to amplitude compression conditions (simulated impairment). Payton et al., 2002
Modifications Made to Time-Domain Methods
• Compute speech-based STI over short time windows – Ludvigsen et al., 1990 (~0.05 to 19 s); Payton & Shrestha, 2013 (as short as 0.3 s)
• Experiment with number of analysis bands – Chen & Loizou, 2010
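A rough sketch of a short-time, speech-based transmission index for a single band, under stated assumptions. It follows the general envelope-regression idea (regress the degraded intensity envelope on the clean one and treat slope × mean ratio as a modulation-transfer estimate); it is not the implementation of any of the cited papers. Octave-band filtering, envelope extraction, and band weighting are omitted, and all function and parameter names are hypothetical.

```python
import math


def window_transmission_index(clean_env, degraded_env, eps=1e-6):
    """Transmission index in [0, 1] for one window of one band, or None if undefined."""
    n = len(clean_env)
    mu_x = sum(clean_env) / n
    mu_y = sum(degraded_env) / n
    var_x = sum((x - mu_x) ** 2 for x in clean_env) / n
    if var_x == 0.0 or mu_y == 0.0:
        return None  # flat or silent window: the regression is undefined
    cov_xy = sum((x - mu_x) * (y - mu_y)
                 for x, y in zip(clean_env, degraded_env)) / n
    slope = cov_xy / var_x                    # regression of degraded on clean
    m = slope * mu_x / mu_y                   # modulation-transfer estimate
    m = min(max(m, eps), 1.0 - eps)           # keep the logarithm finite
    snr_db = 10.0 * math.log10(m / (1.0 - m))
    snr_db = max(-15.0, min(15.0, snr_db))    # usual STI clipping range
    return (snr_db + 15.0) / 30.0


def short_time_ti(clean_env, degraded_env, win_len, hop):
    """Transmission index per window; win_len and hop are in envelope samples."""
    last_start = len(clean_env) - win_len
    return [window_transmission_index(clean_env[i:i + win_len],
                                      degraded_env[i:i + win_len])
            for i in range(0, last_start + 1, hop)]


# Made-up intensity envelopes: steady noise of intensity 2.5 added to the clean one.
clean = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
degraded = [x + 2.5 for x in clean]
print(short_time_ti(clean, degraded, win_len=4, hop=4))
```

Returning None for windows where the clean envelope is flat is one simple way to flag windows with no input signal; how such windows should be handled in practice is a design decision this sketch does not settle.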
Short-Time Speech-Based STI
Modifications to correct short-window anomalies
• Detect windows with no input signal and increase envelope bandwidth as a function of octave band (can generate reliable speech-based STI values from windows as short as 0.04 s) – Payton & Ferreira, 2014
Modified Envelope Regression sSTI Method: Modified ssSTI vs. Theoretical STI
[Figure: Modified ssSTI vs. Theoretical Method, with panels for 80 ms, 160 ms, and 320 ms windows]
Speech in 0 dB SSN. Payton & Ferreira, 2014
Standard Deviation of MER, 50 Hz LPF
[Figure: panels for the 250 Hz, 1 kHz, and 4 kHz bands with 80 ms, 160 ms, and 320 ms windows; x-axis: Time (s)]
LPF Cutoff for 95th Percentile of σMER ≤ 0.15
[Figure: lowpass filter cutoff (Hz) by octave band (250 Hz, 1 kHz, 4 kHz) and window length; panel "Spectral Content After Squaring"; labeled cutoffs: 110 Hz LPF, 309 Hz LPF, 884 Hz LPF; x-axis: Time (s)]
Problem: Digital Hearing Aids Include Nonlinear Processing
• Amplitude Compression
• Clipping
• Noise reduction:
– Spectral Subtraction
– Binary Masks
• Dereverberation
STI Modifications to Address Nonlinear Processing
• For spectral subtraction:
– Normalized versions of the Cross Power Spectral methods, Envelope Regression & Covariance – Goldsworthy & Greenberg, 2004
Modified Speech-based STI Metrics
Modified metrics can’t exceed 1 or when nonlinear processing occurs. Goldsworthy & Greenberg, 2004
• For amplitude compression:
– Amplitude Compression Compensation of Magnitude Cross Power Spectral (CPS) method – Payton & Chen, 2005
Modification to account for amplitude compression hearing aids
[Figure: two panels, Intelligibility (%) vs. Magnitude CPS sSTI and Intelligibility (%) vs. Modified sSTI; legend: FL, F2, F3, SL, SV]
Circles correspond to linear amplification conditions; other symbols correspond to amplitude compression conditions (simulated impairment). Payton & Chen, 2005
• It’s possible the short-time STI methods will do a better job since amplitude compression is not instantaneous but, rather, has attack and release times
Blind Metrics Not Directly Related to SII or STI
• *Speech-to-Reverberation Modulation energy Ratio (SRMR), tailored to hearing aids (SRMR-HA), Suelzle et al., 2013; also SRMR-CI, Santos et al., 2013
• *Modulation spectrum Area (ModA), Chen et al., 2013
HI Quality Ratings Compared to HASQI & *SRMR-HA
Some experimental conditions include nonlinear noise reduction. Suelzle et al., 2013
Cochlear Implant Intelligibility in Reverberant Environments vs *ModA, NCM & *SRMR
Chen et al., 2013
CI Intelligibility vs *P.563, *SRMR, *ModA & *SRMR-CI
[Figure: four panels; correlations in extracted order: r = .89, r = .93, r = .82, r = .96]
Santos et al., 2013
Invasive Metrics Not Based on AI or STI
• Short-Time Envelope Correlation Index (STECI), Kates & Arehart, 2014; modification of STOI (Taal et al., 2011) to accommodate hearing impairments
• Hearing Aid Speech Perception Index (HASPI), Kates & Arehart, 2014
NH & HI Intelligibility vs HASPI, CSII and STECI
Noise was stationary and speech-shaped. Degradations included peak clipping and center clipping. Kates & Arehart, 2014
NH & HI Intelligibility vs HASPI, CSII and STECI
Three frequency compression ratios (1.5:1, 2:1, and 3:1) and three frequency compression cutoff frequencies (1, 1.5, and 2 kHz) were tested. Kates & Arehart, 2014
NH & HI Intelligibility vs HASPI, CSII and STECI
An Ideal Binary Mask was applied to multi-talker babble noise added at several SNRs. Random mask bits were also altered to simulate noise estimation errors. Kates & Arehart, 2014
Discussion
• Why are there so many metrics?
– Some metrics were designed to capture quality judgements while others were designed to predict intelligibility
– Some metrics were modified to capture suprathreshold characteristics of impaired listeners, such as recruitment
– Typically, a metric or modification was developed to address a specific condition that was not well predicted by existing metrics
– An important class of challenging conditions is the set of processing conditions generated by digital hearing aids
• The jury is still out on what is “the best” metric to use to predict hearing-impaired listeners’ intelligibility performance over the entire range of processing conditions and possible acoustic scenarios.
Acknowledgements
• Chapter 11, Speech Enhancement, Philipos Loizou, 2013
• Objective Quality and Intelligibility Prediction for Users of Assistive Listening Devices, IEEE Signal Processing Magazine, Falk et al., 2015