Towards the use of automatic speech recognition for the fine-tuning of hearing aids
Lionel Fontan
Archean Technologies, Montauban, France
Annual Conference of the British Society of Audiology, Harrogate, UK, June 30th, 2017
Archean Technologies
• SME (≃ 15 people)
• Located in Montauban (south-west of France)
• Historical activity: emergency public address systems
• Since 2012: use of automatic speech recognition (ASR) in several application domains:
  ➡ Public address (objective measurement of speech intelligibility)
  ➡ Language learning (pronunciation training software)
  ➡ Motor speech disorders (assessment of pathological speakers)
  ➡ Hearing aids (speech audiometry)
Speech audiometry
• Speech-intelligibility tests are used to assess speech-communication abilities
  ➡ Particularly crucial for listeners with age-related hearing loss (ARHL), whose main complaint is difficulty understanding speech in noisy conditions (CHABA, 1988)
• Speech-intelligibility tests usually consist of asking the listener to repeat several lists:
  - of logatoms (e.g. CVCs or VCVs)
  - of words (e.g. Fournier, 1951)
  - of sentences (e.g. Hearing in Noise Test, HINT; Nilsson et al., 1994)
• Two main outcome measures:
  - percentage of speech units correctly identified
  - speech reception threshold (SRT: the speech level required to obtain 50% correct words of a list)
  ➡ may be used to assess the benefit provided by hearing aids (HAs)
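The two outcome measures can be illustrated with a short sketch. It assumes scores have been collected at a few fixed presentation levels and estimates the SRT by linear interpolation at the 50% point; clinical SRT procedures are often adaptive, so this is only an illustration. The function names and the example data are hypothetical and do not come from the presentation.

```python
def percent_correct(targets, responses):
    """Percentage of target items repeated correctly (item-by-item match)."""
    hits = sum(t.lower() == r.lower() for t, r in zip(targets, responses))
    return 100.0 * hits / len(targets)


def estimate_srt(levels_db, scores_pct, criterion=50.0):
    """Interpolate the level at which the score crosses `criterion`.

    `levels_db` must be sorted in ascending order. Returns None when the
    scores never reach or cross the criterion.
    """
    points = list(zip(levels_db, scores_pct))
    for (l0, s0), (l1, s1) in zip(points, points[1:]):
        if s0 == criterion:
            return float(l0)
        # A sign change means the criterion lies between the two levels.
        if (s0 - criterion) * (s1 - criterion) < 0:
            return l0 + (criterion - s0) * (l1 - l0) / (s1 - s0)
    if points and points[-1][1] == criterion:
        return float(points[-1][0])
    return None


# Hypothetical data: % correct words measured at four presentation levels.
levels = [30, 40, 50, 60]
scores = [10, 30, 70, 95]
print(estimate_srt(levels, scores))  # 45.0
```

A lower SRT means that less speech level is needed to reach 50% correct, so comparing SRTs across fitting conditions is one way to express HA benefit.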
• However, speech-intelligibility tests present several disadvantages:
  ➡ They are long and tedious: assessing one fitting condition requires several word lists
  ➡ Scores may be biased by the familiarity of listeners and audiologists/HA dispensers with the speech materials (Hustad & Cahill, 2003)
Automatic speech recognition
• The use of ASR could provide fast and objective measures of speech intelligibility:
  - ASR for a list of 30 words requires about 30 seconds
  - ASR scores are fully reproducible
• ASR can predict the speech intelligibility of utterances produced by patients with motor speech disorders (Maier et al., 2009; Pellegrini et al., 2014; Fontan et al., 2015)
[Figure: speech pathologists’ ratings of speech intelligibility vs. automatic measures for sentences spoken by 12 speakers with dysarthria (r = -.81; p [...]

[...] 15 dB at 0.5, 1, 2, and 4 kHz)
• First speech audiometric assessment
• Mini-Mental State Examination (Folstein et al., 1975) score > 26
Experiment 2: Older people with ARHL
• General procedure:
  - Right-ear pure-tone audiometric assessment (right-ear audiogram)
  - Simulation of ARHL, then automatic speech recognition: % correct words
  - Speech tests through headphones (right ear only): % correct words
  - Comparison of the two sets of scores
[Figure: right-ear audiograms of the participants and their mean; x-axis: frequency, y-axis: hearing level (dB HL)]
• Speech materials:
  - logatoms: 68 VCVs (Dodelé & Dodelé, 2000)
  - words: 60 French disyllabic nouns (Fournier, 1951)
  - sentences: 40 declarative sentences from the French version of the HINT (Vaillancourt et al., 2005)
• Speech tasks and procedure:
  - Speech was presented through headphones at 50 dB (right ear only)
  - The order of the three tests was counterbalanced across participants
  - Participants were encouraged to repeat as many words as they could
  - The percentage of correct words was computed for each test and participant
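The counterbalancing step can be sketched as follows. The slides do not say which scheme was used, so the assumption here is the simplest one: cycle through all six possible orders of the three tests so that each order is used about equally often. The function name is hypothetical.

```python
from itertools import permutations

TESTS = ("logatoms", "words", "sentences")


def counterbalanced_orders(n_participants, tests=TESTS):
    """Assign test orders by cycling through every possible permutation."""
    orders = list(permutations(tests))  # 3 tests give 6 possible orders
    return [orders[i % len(orders)] for i in range(n_participants)]


# With 12 participants, each of the 6 orders is used exactly twice.
for participant, order in enumerate(counterbalanced_orders(12), start=1):
    print(participant, order)
```

When the number of participants is not a multiple of six, some orders are used once more than others.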
• Results: human scores
[Figure: human scores (% correct words, 0 to 100) for logatoms, words, and sentences]
• Results: correlations between human scores and ASR scores

            L-lexicon   M-lexicon   S-lexicon
Logatoms    .80***      .80***      .82***
Words       .75***      .76***      .79***
Sentences   .61***      .62***      .60***

*** p < .001
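As an illustration of how such a coefficient is obtained, the sketch below computes a correlation between per-participant human scores and ASR scores. It assumes a Pearson product-moment correlation, which the slides do not state explicitly, and the score lists are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical % correct words for five participants.
human_scores = [20, 35, 50, 65, 90]
asr_scores = [15, 30, 55, 60, 85]
print(round(pearson_r(human_scores, asr_scores), 2))
```

One coefficient of this kind would be computed per speech material and per lexicon size.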
• Taking into account the effect of MMSE scores:
[Figures: human scores (RAUs) vs. automatic prediction (ASR + MMSE)]
  - Logatoms: r = .88, p < .001
  - Words: r = .86, p < .001
  - Sentences: r = .79, p < .001
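The sketch below shows one way a prediction combining ASR and MMSE scores could be built. It rests on two assumptions that the slides do not confirm: that RAUs follow the commonly used rationalized arcsine transform of Studebaker (1985), and that the two predictors are combined by ordinary least-squares regression. The function names are hypothetical.

```python
import math


def rau(correct, total):
    """Rationalized arcsine units for `correct` items out of `total`."""
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * theta - 23.0


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations.

    Raises ZeroDivisionError if the predictors are perfectly collinear.
    """
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Augmented 3x4 system [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def predict(coeffs, asr_rau, mmse):
    """Predicted human score (RAUs) from an ASR score (RAUs) and an MMSE score."""
    b0, b1, b2 = coeffs
    return b0 + b1 * asr_rau + b2 * mmse


print(round(rau(30, 60), 1))  # 50.0
```

The transform stretches scores near 0% and 100%, which makes them better suited to linear modelling than raw percentages.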
Conclusion and perspectives
• Intelligibility scores could be predicted fairly accurately by our ASR system
• The strongest correlations were observed for logatoms
  ➡ Encouraging results with regard to the long-term goal of this research: building a tool to help audiologists/HA dispensers with the fine-tuning of HAs
• Next proof-of-concept studies:
  - Applicability of the ASR system to other languages (e.g. English)
  - Prediction of speech intelligibility in noisy conditions (steady noise / babble)
  - Prediction of the misperceptions (i.e. phonemic confusions) experienced by human listeners with (simulated) ARHL
Thank you for your attention
References
CHABA (1988). Speech understanding and aging. The Journal of the Acoustical Society of America, 83(3), 859–895.
Cruickshanks, K., Wiley, T., Tweed, T., Klein, B., Klein, R., Mares-Perlman, J., & Nondahl, D. (1998). Prevalence of hearing loss in older adults in Beaver Dam, Wisconsin: The epidemiology of hearing loss study. American Journal of Epidemiology, 148(9), 879–886. http://dx.doi.org/10.1093/oxfordjournals.aje.a009713
Deléglise, P., Estève, Y., Meignier, S., & Merlin, T. (2005). The LIUM speech transcription system: a CMU Sphinx III-based system for French broadcast news. In Proceedings of Interspeech '05 (pp. 1653–1656). Lisbon, Portugal: International Speech and Communication Association.
Dodelé, L., & Dodelé, D. (2000). L’audiométrie vocale en présence de bruit et le test AVfB. Les cahiers de l’audition, 3(6), 15–22.
Estève, Y. (2009). Traitement automatique de la parole: contributions (Automatic Speech Processing: contributions). (Thesis for the Habilitation à Diriger des Recherches authorization). Le Mans, France: Université du Maine.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). Mini-mental state: a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12(3), 189–198.
Fontan, L., Ferrané, I., Farinas, J., Pinquier, J., Magnen, C., Tardieu, J., Gaillard, P., Aumont, X., & Füllgrabe, C. (in press). Automatic speech recognition predicts speech intelligibility and comprehension for listeners with simulated age-related hearing loss. Journal of Speech, Language, and Hearing Research.
Fontan, L., Ferrané, I., Farinas, J., Pinquier, J., & Aumont, X. (2016). Using phonologically weighted Levenshtein distances for the prediction of microscopic intelligibility. In Proceedings of Interspeech '16 (pp. 650–654). San Francisco, CA: International Speech and Communication Association.
Fontan, L., Gaillard, P., & Woisard, V. (2013). Comprendre et agir: Les tests pragmatiques de compréhension de la parole et EloKanz. In R. Sock, B. Vaxelaire, & C. Fauth (Eds.), La voix et la parole perturbées (pp. 131–144). Mons, Belgium: CIPA.
Fontan, L., Pellegrini, T., Olcoz, J., & Abad, A. (2015). Predicting disordered speech comprehensibility from goodness of pronunciation scores. In Sixth Workshop on Speech and Language Processing for Assistive Technologies: SLPAT 2015, Satellite workshop of INTERSPEECH '15, Dresden, Germany.
Fournier, J. (1951). Audiométrie vocale. Paris, France: Maloine.
Galliano, S., Gravier, G., & Chaubard, L. (2009). The ESTER 2 Evaluation Campaign for the Rich Transcription of French Radio Broadcasts. In Proceedings of Interspeech '09 (pp. 2583–2586). Brighton, United Kingdom: International Speech and Communication Association.
Levenshtein, V. I. (1966). Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. Soviet Physics Doklady, 10(8), 707–710.
Maier, A., Haderlein, T., Eysholdt, U., Rosanowski, F., Batliner, A., Schuster, M., & Nöth, E. (2009). PEAKS – A system for the automatic evaluation of voice and speech disorders. Speech Communication, 51(5), 425–437.
Nejime, Y., & Moore, B. C. J. (1997). Simulation of the effect of threshold elevation and loudness recruitment combined with reduced frequency selectivity on the intelligibility of speech in noise. Journal of the Acoustical Society of America, 102(1), 603–615.
Nilsson, M., Soli, S. D., & Sullivan, J. A. (1994). Development of the hearing in noise test for the measurement of speech reception thresholds in quiet and noise. Journal of the Acoustical Society of America, 95(2), 1085–1099.
Pellegrini, T., Fontan, L., Mauclair, J., Farinas, J., & Robert, M. (2014). The Goodness of Pronunciation algorithm applied to disordered speech. In Proceedings of Interspeech '14 (pp. 1463–1467). Singapore.
Seymore, K., Chen, S., Doh, S., Eskenazi, M., Gouvea, E., Raj, B. …, Thayer, E. (1998). The 1997 CMU Sphinx-3 English broadcast news transcription system. In Proceedings of the 1998 DARPA Speech Recognition Workshop (pp. 55–59). Lansdowne, VA: Morgan Kaufmann Publishers.
Vaillancourt, V., Laroche, C., Mayer, C., Basque, C., Nali, M., Eriks-Brophy, A., …, Giguère, C. (2005). Adaptation of the HINT (hearing in noise test) for adult Canadian Francophone populations. International Journal of Audiology, 44(6), 358–361.