Monitoring System for Health Care. M. Vacher J.-F. ... The end. Medical Remote Monitoring: various sensors. Position .... Keyword Recognition. HYP. Speech/ ...
Speech and Sound Analysis Michel Vacher Motivation
Speech and Sound Use in a Remote Monitoring System for Health Care
The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization
M. Vacher
The Sound Analysis System
The Speech Recognizer RAPHAEL
J.-F. Serignat S. Chaillol V. Popescu
D. Istrate
CLIPS-IMAG, Team GEOD Joseph Fourier University of Grenoble - CNRS (France)
Conclusion The end
Text, Speech and Dialogue 2006 11th to 15th of September Page 1/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Outline
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization
Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion
The Real-Time Architecture The Global System Organization The Sound Analysis System
The end
The Speech Recognizer RAPHAEL Conclusion
Page 2/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Outline
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization
Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion
The Real-Time Architecture The Global System Organization The Sound Analysis System
The end
The Speech Recognizer RAPHAEL Conclusion
Page 3/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Medical Remote Monitoring: various sensors
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture
Position
Medical
Sound
I infrared
I tensiometer
I contact
I heart rate
I microphones
The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion The end
DATA FUSION SYSTEM
LOCAL UNDER MONITORING
SOUND ANALYSIS
MEDICAL SYSTEM
EMERGENCY CALL
Medical Center
SENSORS
Extracted informations through Sound Analysis : I patient’s activity: door lock, phone, "It’s hot!". . . I patient’s physiology: cough. . . scream, glass breaking, I distress situation: "Help me!". . . Page 4/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Outline
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization
Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion
The Real-Time Architecture The Global System Organization The Sound Analysis System
The end
The Speech Recognizer RAPHAEL Conclusion
Page 5/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Speech Corpus
Michel Vacher Motivation The Medical Remote Monitoring
The "Normal-Distress" corpus : I 21 speakers
Speech and Sound Corpora
I
The Real-Time Architecture The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL
I
I I
The end
French 64 normal situation sentences I I
Conclusion
I
"Bonjour" (Good morning) "Où est le sel?" (Where is the salt)
64 distress sentences I I
Page 6/26
11 men, 10 women 20 years≤age≤65 years
"Au secours!" (Help me) "Un médecin vite!" (Doctor quickly)
I
2,646 audio speech files
I
duration: 38 mn
I
sampling rate: 16 kHz TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Sound Corpus
Michel Vacher
I Motivation
7 sound classes: I
The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL
I
I I I
normal sounds: door clapping, phone ringing, dishes... abnormal sounds: breaking glasses, falls, screams
1,577 audio sound files duration: 20 mn sampling rate: 16 kHz
Class of sound
Conclusion
% of the corpus (duration) (number)
Average of duration
The end
Dishes Door lock Door slap Glass breaking Ringing phone Scream Step sound Page 7/26
TSD 2006
6% 1% 33% 6% 40% 12% 2%
10.4% 12.7% 33.2% 5.6% 32.8% 4.6% 0.8% Michel Vacher
402 ms 36 ms 737 ms 861 ms 928 ms 1,930 ms 2,257 ms CLIPS-IMAG
Speech and Sound Analysis
Noisy Corpus
Michel Vacher
I Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture
I
Signal to noise ratio (SNR) : 0, +10, +20, +40dB Experimental HIS Noise : I I
non stationary recorded inside experimental apartment
The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion The end
Page 8/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Outline
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization
Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion
The Real-Time Architecture The Global System Organization The Sound Analysis System
The end
The Speech Recognizer RAPHAEL Conclusion
Page 9/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Global System Organization (1/2)
Michel Vacher Motivation The Medical Remote Monitoring
SENNHEISER eW500
The Real-Time Architecture
The Sound Analysis System
Sound/Speech System
Micro.
The Global System Organization
The Speech Recognizer RAPHAEL
RECEIVER
EMITTER
Speech and Sound Corpora
Data Acquisition 8 Channels
HALL Micro+Em KITCHEN
XML
RECEIVER
Conclusion
RAPHAEL
Micro+Em
The end
Speech Recognition System
RECEIVER
BATH ROOM Micro+Em
− Key Word List
RECEIVER
− Sound List
BED ROOM
Data Acquisition Card: 200 ksamples/s − 8 differential channels Sampling rate: 16 ksamples/s for each channel
Page 10/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Global System Organization (2/2)
Michel Vacher Motivation
1st application
The Medical Remote Monitoring Speech and Sound Corpora
SOUND/SPEECH SYSTEM
The Real-Time Architecture
3a Sound Classification
Keyword Recognition Message Formatting
The Global System Organization
4
The Sound Analysis System
The Speech Recognizer RAPHAEL
1
if speech WAV
SPEECH RECOGNITION SYSTEM 3b
Page 11/26
TSD 2006
HYP
Segmentation
Acquisition Driving 8 chanels − 16 kHz
2nd application
2
Speech/Sound
Event Detection 5 to 8 chanels
8 inputs
Conclusion The end
if sound
Acquisition/Detection
(RAPHAEL)
Michel Vacher
XML Output
CLIPS-IMAG
Speech and Sound Analysis
Outline
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization
Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion
The Real-Time Architecture The Global System Organization The Sound Analysis System
The end
The Speech Recognizer RAPHAEL Conclusion
Page 12/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Sound Analysis System
Michel Vacher Motivation
1st application: SOUND/SPEECH SYSTEM
The Medical Remote Monitoring Speech and Sound Corpora
3a Sound Classification
Sequencer
The Real-Time Architecture
Timer()
The Global System Organization
Keyword Recognition Message Formatting 4
The Sound Analysis System
The Speech Recognizer RAPHAEL
if sound
Acquisition/Detection
Conclusion 1
Segmentation
Acquisition Driving 8 chanels − 16 kHz
if speech WAV
The end
2nd application
SPEECH RECOGNITION SYSTEM 3b
Page 13/26
TSD 2006
HYP
Speech/Sound
Event Detection 5 to 8 chanels
8 inputs
2
(RAPHAEL)
Michel Vacher
XML Output
CLIPS-IMAG
Speech and Sound Analysis Michel Vacher
Detection + Speech/Sound Segmentation I
Motivation
Detection [1]: I
The Medical Remote Monitoring
I
Speech and Sound Corpora
The Real-Time Architecture The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL
I
Wavelet Tree Detection Equal Error Rate: (ROC curves) EER = 6.5% if SNR = 0dB, 0% if SNR ≥ +10dB
Segmentation I I I I
frame 16 ms - overlap 8 ms GMM, 24 Gaussian models 16 LFCC coupled with energy Error Segmentation Rate "cross validation protocol":
Conclusion The end
SNR
ESR gobal
Speech
Sound
+40dB +20dB +10dB 0dB
4.1% 3.9% 3.9% 14.5%
0.8% 0.8% 1.5% 21.0%
9.5% 9.1% 7.9% 3.6%
[1] M. Vacher et al., Sound Detection and Classification through Transient Models using Wavelet Coefficient Trees, EUSIPCO, 20th September 2004. Page 14/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion The end
Page 15/26
Sound Classification for Medical Remote Monitoring (1/2) a more complete sound corpus, a new class: object falls I adapted to HMM training and classification: one sound per file I 1,315 files, 25 mn Class of sound % of the corpus Average of (duration) (number) duration I
Dishes Door lock Door slap Glass breaking Object falls Ringing phone Scream Step sound TSD 2006
5.3% 27.2% 22% 12.3% 5.4% 11.7% 13% 3.7%
12.4% 12.5% 26.2% 8.2% 5.5% 19.4% 7.2% 8.4% Michel Vacher
380 ms 3,150 ms 950 ms 1,690 ms 1,150 ms 700 ms 2,110 ms 450 ms CLIPS-IMAG
Speech and Sound Analysis Michel Vacher
Sound Classification for Medical Remote Monitoring (2/2)
Motivation The Medical Remote Monitoring
I
analysis window 16 ms - overlap 8 ms
I
12 Gaussian models
I
16 LFCC with ∆ and ∆∆
I
GMM or 3 state HMM
I
"cross validation protocol"
I
Error classification rate:
Speech and Sound Corpora
The Real-Time Architecture The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion The end
Page 16/26
SNR
GMM
HMM
≥ +50dB +40dB +20dB +10dB 0dB
3.2% 10.2% 16.5% 12.6% 23.6%
2% 9% 10.8% 15.4% 30%
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
XML Output
Michel Vacher Motivation The Medical Remote Monitoring
Speech: RAPHAEL analysis is initiated
Speech and Sound Corpora
The Real-Time Architecture The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion The end
Module 2: segmentation Cuisine 1−12−2005 à 15:19:20 parole Probabilité de son=−20.2018, Probabilité de parole=−17.2258 Cuisine 1−12−2005 à 15:19:20 un docteur vite
Module 3b: RAPHAEL
sentence recognized by RAPHAEL
Page 17/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Acquiring Front Panel
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion The end
Page 18/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
RAPHAEL (1/3)
Michel Vacher Motivation
1st application: SOUND/SPEECH SYSTEM
The Medical Remote Monitoring Speech and Sound Corpora
3a Sound Classification
Sequencer
The Real-Time Architecture
Timer()
The Global System Organization
Keyword Recognition Message Formatting 4
The Sound Analysis System
The Speech Recognizer RAPHAEL
if sound
Acquisition/Detection
Conclusion 1
Segmentation
Acquisition Driving 8 chanels − 16 kHz
if speech WAV
The end
2nd application
SPEECH RECOGNITION SYSTEM 3b
Page 19/26
TSD 2006
HYP
Speech/Sound
Event Detection 5 to 8 chanels
8 inputs
2
(RAPHAEL)
Michel Vacher
XML Output
CLIPS-IMAG
Speech and Sound Analysis
RAPHAEL (2/3)
Michel Vacher Recognized Sentences
Motivation The Medical Remote Monitoring
"Il fait chaud"
Speech and Sound Corpora
The Real-Time Architecture The Global System Organization The Sound Analysis System
Speech Signal
Auditory Front End
The Speech Recognizer RAPHAEL Conclusion The end
00010110 11010110 01011010 00010110 01000110 00010110 11110110 01010111
Acoustical Module
−−−−−− −−−−−− −−−−−−
Word Matching
Phoneme matching Feature Extraction 16 ms − 50% 16 MFCC + E + ∆ + ∆∆
Lexicon (HMM) a1
a2
a3
b1
b2
b3
Dictionary
Page 20/26
TSD 2006
Sentence Matching
Language Model (3 − gram)
{chaud} {{S WB} {o WB}}
Il fait chaud
{chaude} {{S WB} o d {& WB}}
Il fait beau
0.00123
Il fait froid
0.00076
{chaudes} {{S WB} o {d WB}}
Acoustic models of phoneme
__ __ ___ _ __ __ ___ _ _ __ ___ _ ___ __ __
0.002
{chauffage} {{S WB} oO f aA {Z WB}}
Michel Vacher
Phonetic models of words
CLIPS-IMAG
Speech and Sound Analysis
RAPHAEL (3/3)
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture
I
Acoustic models: I
The Global System Organization The Sound Analysis System
I
The Speech Recognizer RAPHAEL Conclusion
I
Dictionary: 11,000 words (French) (Speech Assessment Methods Phonetic Alphabet)
The end
I
Language model: n - gram, n ∈ {1, 2, 3} I I
Page 21/26
training with large corpora Braf80, Braf100, Braf120 large number of French speakers more than 200
extraction from the WEB and "Le Monde" corpora optimization for distress sentences
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Recognition Results
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture
I
all the sentences of our corpus
I
5 speakers
The Global System Organization The Sound Analysis System
The Speech Recognizer RAPHAEL
Corpus
Recognition Error
Normal Sentences
False Alarm Sentence "Quel temps fait-il dehors" / "Quel temps fait-il de mort" Missed Alarm Sentence "Faîtes vite" / "Equilibre" "A moi" / "Un grand"
Conclusion The end
Distress Sentences
Page 22/26
TSD 2006
Michel Vacher
CLIPS-IMAG
6%
16%
Speech and Sound Analysis
Conclusion
Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization
I
Sound classification for distress detection.
I
Speech recognition allowing call for help.
I
Real-Time operation on an operating system without real-time capacity.
I
Speaker independence of the ASR system.
I
For 10dB and upper: errorless detection and segmentation error below 5%.
I
Outlook
The Sound Analysis System
The Speech Recognizer RAPHAEL Conclusion The end
I I
Page 23/26
Improvement of sound classification through HMM Development of a complete acoustical analysis system for life-sized tests.
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis Michel Vacher
Detection and Speech/Sound Segmentation in a Smart Room Environment
Motivation The Medical Remote Monitoring Speech and Sound Corpora
The Real-Time Architecture The Global System Organization The Sound Analysis System
Thank you for your attention.
The Speech Recognizer RAPHAEL Conclusion The end
Page 24/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
For Further Reading I
Michel Vacher Appendix For Further Reading Other
D. Istrate et al. Information Extraction From Sound for Medical Telemonitoring, IEEE Trans. on Information Tech. in Biomedicine Vol. 10, issue 2, pp. 264-274, 2006. M. Vacher, D. Istrate. Sound detection and classification for medical telesurveillance, BIOMED’2004 Proceedings, ACTA PRESS, pp. 395–398, 2004. D. Istrate, M. Vacher. Détection et classification des sons : application aux sons de la vie courante et à la parole, 20th GRETSI Proceedings, Vol. 1, pp. 485-488, 2005.
Page 25/26
TSD 2006
Michel Vacher
CLIPS-IMAG
Speech and Sound Analysis
Acknowledgement I
Michel Vacher Appendix For Further Reading Other
This study is a part of the DESDHIS-ACI "Technologies pour la Santé" project of the French Research Ministry. This project is a collaboration between the CLIPS laboratory, in charge of the sound analysis, and the TIMC laboratory, in charge of the medical sensor analysis and data fusion. CLIPS and TIMC are 2 laboratories of the IMAG institute. IMAG has funded the implementation of the habitat used for our studies through the RESIDE-HIS project.
Page 26/26
TSD 2006
Michel Vacher
CLIPS-IMAG