Speech and Sound Use in a Remote Monitoring ...

3 downloads 762 Views 768KB Size Report
Monitoring System for Health Care. M. Vacher J.-F. ... The end. Medical Remote Monitoring: various sensors. Position .... Keyword Recognition. HYP. Speech/ ...
Speech and Sound Analysis Michel Vacher Motivation

Speech and Sound Use in a Remote Monitoring System for Health Care

The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization

M. Vacher

The Sound Analysis System

The Speech Recognizer RAPHAEL

J.-F. Serignat S. Chaillol V. Popescu

D. Istrate

CLIPS-IMAG, Team GEOD Joseph Fourier University of Grenoble - CNRS (France)

Conclusion The end

Text, Speech and Dialogue 2006 11th to 15th of September Page 1/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Outline

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization

Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion

The Real-Time Architecture The Global System Organization The Sound Analysis System

The end

The Speech Recognizer RAPHAEL Conclusion

Page 2/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Outline

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization

Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion

The Real-Time Architecture The Global System Organization The Sound Analysis System

The end

The Speech Recognizer RAPHAEL Conclusion

Page 3/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Medical Remote Monitoring: various sensors

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture

Position

Medical

Sound

I infrared

I tensiometer

I contact

I heart rate

I microphones

The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion The end

DATA FUSION SYSTEM

LOCAL UNDER MONITORING

SOUND ANALYSIS

MEDICAL SYSTEM

EMERGENCY CALL

Medical Center

SENSORS

Extracted informations through Sound Analysis : I patient’s activity: door lock, phone, "It’s hot!". . . I patient’s physiology: cough. . . scream, glass breaking, I distress situation: "Help me!". . . Page 4/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Outline

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization

Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion

The Real-Time Architecture The Global System Organization The Sound Analysis System

The end

The Speech Recognizer RAPHAEL Conclusion

Page 5/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Speech Corpus

Michel Vacher Motivation The Medical Remote Monitoring

The "Normal-Distress" corpus : I 21 speakers

Speech and Sound Corpora

I

The Real-Time Architecture The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL

I

I I

The end

French 64 normal situation sentences I I

Conclusion

I

"Bonjour" (Good morning) "Où est le sel?" (Where is the salt)

64 distress sentences I I

Page 6/26

11 men, 10 women 20 years≤age≤65 years

"Au secours!" (Help me) "Un médecin vite!" (Doctor quickly)

I

2,646 audio speech files

I

duration: 38 mn

I

sampling rate: 16 kHz TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Sound Corpus

Michel Vacher

I Motivation

7 sound classes: I

The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL

I

I I I

normal sounds: door clapping, phone ringing, dishes... abnormal sounds: breaking glasses, falls, screams

1,577 audio sound files duration: 20 mn sampling rate: 16 kHz

Class of sound

Conclusion

% of the corpus (duration) (number)

Average of duration

The end

Dishes Door lock Door slap Glass breaking Ringing phone Scream Step sound Page 7/26

TSD 2006

6% 1% 33% 6% 40% 12% 2%

10.4% 12.7% 33.2% 5.6% 32.8% 4.6% 0.8% Michel Vacher

402 ms 36 ms 737 ms 861 ms 928 ms 1,930 ms 2,257 ms CLIPS-IMAG

Speech and Sound Analysis

Noisy Corpus

Michel Vacher

I Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture

I

Signal to noise ratio (SNR) : 0, +10, +20, +40dB Experimental HIS Noise : I I

non stationary recorded inside experimental apartment

The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion The end

Page 8/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Outline

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization

Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion

The Real-Time Architecture The Global System Organization The Sound Analysis System

The end

The Speech Recognizer RAPHAEL Conclusion

Page 9/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Global System Organization (1/2)

Michel Vacher Motivation The Medical Remote Monitoring

SENNHEISER eW500

The Real-Time Architecture

The Sound Analysis System

Sound/Speech System

Micro.

The Global System Organization

The Speech Recognizer RAPHAEL

RECEIVER

EMITTER

Speech and Sound Corpora

Data Acquisition 8 Channels

HALL Micro+Em KITCHEN

XML

RECEIVER

Conclusion

RAPHAEL

Micro+Em

The end

Speech Recognition System

RECEIVER

BATH ROOM Micro+Em

− Key Word List

RECEIVER

− Sound List

BED ROOM

Data Acquisition Card: 200 ksamples/s − 8 differential channels Sampling rate: 16 ksamples/s for each channel

Page 10/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Global System Organization (2/2)

Michel Vacher Motivation

1st application

The Medical Remote Monitoring Speech and Sound Corpora

SOUND/SPEECH SYSTEM

The Real-Time Architecture

3a Sound Classification

Keyword Recognition Message Formatting

The Global System Organization

4

The Sound Analysis System

The Speech Recognizer RAPHAEL

1

if speech WAV

SPEECH RECOGNITION SYSTEM 3b

Page 11/26

TSD 2006

HYP

Segmentation

Acquisition Driving 8 chanels − 16 kHz

2nd application

2

Speech/Sound

Event Detection 5 to 8 chanels

8 inputs

Conclusion The end

if sound

Acquisition/Detection

(RAPHAEL)

Michel Vacher

XML Output

CLIPS-IMAG

Speech and Sound Analysis

Outline

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization

Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion

The Real-Time Architecture The Global System Organization The Sound Analysis System

The end

The Speech Recognizer RAPHAEL Conclusion

Page 12/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Sound Analysis System

Michel Vacher Motivation

1st application: SOUND/SPEECH SYSTEM

The Medical Remote Monitoring Speech and Sound Corpora

3a Sound Classification

Sequencer

The Real-Time Architecture

Timer()

The Global System Organization

Keyword Recognition Message Formatting 4

The Sound Analysis System

The Speech Recognizer RAPHAEL

if sound

Acquisition/Detection

Conclusion 1

Segmentation

Acquisition Driving 8 chanels − 16 kHz

if speech WAV

The end

2nd application

SPEECH RECOGNITION SYSTEM 3b

Page 13/26

TSD 2006

HYP

Speech/Sound

Event Detection 5 to 8 chanels

8 inputs

2

(RAPHAEL)

Michel Vacher

XML Output

CLIPS-IMAG

Speech and Sound Analysis Michel Vacher

Detection + Speech/Sound Segmentation I

Motivation

Detection [1]: I

The Medical Remote Monitoring

I

Speech and Sound Corpora

The Real-Time Architecture The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL

I

Wavelet Tree Detection Equal Error Rate: (ROC curves) EER = 6.5% if SNR = 0dB, 0% if SNR ≥ +10dB

Segmentation I I I I

frame 16 ms - overlap 8 ms GMM, 24 Gaussian models 16 LFCC coupled with energy Error Segmentation Rate "cross validation protocol":

Conclusion The end

SNR

ESR gobal

Speech

Sound

+40dB +20dB +10dB 0dB

4.1% 3.9% 3.9% 14.5%

0.8% 0.8% 1.5% 21.0%

9.5% 9.1% 7.9% 3.6%

[1] M. Vacher et al., Sound Detection and Classification through Transient Models using Wavelet Coefficient Trees, EUSIPCO, 20th September 2004. Page 14/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion The end

Page 15/26

Sound Classification for Medical Remote Monitoring (1/2) a more complete sound corpus, a new class: object falls I adapted to HMM training and classification: one sound per file I 1,315 files, 25 mn Class of sound % of the corpus Average of (duration) (number) duration I

Dishes Door lock Door slap Glass breaking Object falls Ringing phone Scream Step sound TSD 2006

5.3% 27.2% 22% 12.3% 5.4% 11.7% 13% 3.7%

12.4% 12.5% 26.2% 8.2% 5.5% 19.4% 7.2% 8.4% Michel Vacher

380 ms 3,150 ms 950 ms 1,690 ms 1,150 ms 700 ms 2,110 ms 450 ms CLIPS-IMAG

Speech and Sound Analysis Michel Vacher

Sound Classification for Medical Remote Monitoring (2/2)

Motivation The Medical Remote Monitoring

I

analysis window 16 ms - overlap 8 ms

I

12 Gaussian models

I

16 LFCC with ∆ and ∆∆

I

GMM or 3 state HMM

I

"cross validation protocol"

I

Error classification rate:

Speech and Sound Corpora

The Real-Time Architecture The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion The end

Page 16/26

SNR

GMM

HMM

≥ +50dB +40dB +20dB +10dB 0dB

3.2% 10.2% 16.5% 12.6% 23.6%

2% 9% 10.8% 15.4% 30%

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

XML Output

Michel Vacher Motivation The Medical Remote Monitoring

Speech: RAPHAEL analysis is initiated

Speech and Sound Corpora

The Real-Time Architecture The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion The end

Module 2: segmentation Cuisine 1−12−2005 à 15:19:20 parole Probabilité de son=−20.2018, Probabilité de parole=−17.2258 Cuisine 1−12−2005 à 15:19:20 un docteur vite

Module 3b: RAPHAEL

sentence recognized by RAPHAEL

Page 17/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Acquiring Front Panel

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion The end

Page 18/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

RAPHAEL (1/3)

Michel Vacher Motivation

1st application: SOUND/SPEECH SYSTEM

The Medical Remote Monitoring Speech and Sound Corpora

3a Sound Classification

Sequencer

The Real-Time Architecture

Timer()

The Global System Organization

Keyword Recognition Message Formatting 4

The Sound Analysis System

The Speech Recognizer RAPHAEL

if sound

Acquisition/Detection

Conclusion 1

Segmentation

Acquisition Driving 8 chanels − 16 kHz

if speech WAV

The end

2nd application

SPEECH RECOGNITION SYSTEM 3b

Page 19/26

TSD 2006

HYP

Speech/Sound

Event Detection 5 to 8 chanels

8 inputs

2

(RAPHAEL)

Michel Vacher

XML Output

CLIPS-IMAG

Speech and Sound Analysis

RAPHAEL (2/3)

Michel Vacher Recognized Sentences

Motivation The Medical Remote Monitoring

"Il fait chaud"

Speech and Sound Corpora

The Real-Time Architecture The Global System Organization The Sound Analysis System

Speech Signal

Auditory Front End

The Speech Recognizer RAPHAEL Conclusion The end

00010110 11010110 01011010 00010110 01000110 00010110 11110110 01010111

Acoustical Module

−−−−−− −−−−−− −−−−−−

Word Matching

Phoneme matching Feature Extraction 16 ms − 50% 16 MFCC + E + ∆ + ∆∆

Lexicon (HMM) a1

a2

a3

b1

b2

b3

Dictionary

Page 20/26

TSD 2006

Sentence Matching

Language Model (3 − gram)

{chaud} {{S WB} {o WB}}

Il fait chaud

{chaude} {{S WB} o d {& WB}}

Il fait beau

0.00123

Il fait froid

0.00076

{chaudes} {{S WB} o {d WB}}

Acoustic models of phoneme

__ __ ___ _ __ __ ___ _ _ __ ___ _ ___ __ __

0.002

{chauffage} {{S WB} oO f aA {Z WB}}

Michel Vacher

Phonetic models of words

CLIPS-IMAG

Speech and Sound Analysis

RAPHAEL (3/3)

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture

I

Acoustic models: I

The Global System Organization The Sound Analysis System

I

The Speech Recognizer RAPHAEL Conclusion

I

Dictionary: 11,000 words (French) (Speech Assessment Methods Phonetic Alphabet)

The end

I

Language model: n - gram, n ∈ {1, 2, 3} I I

Page 21/26

training with large corpora Braf80, Braf100, Braf120 large number of French speakers more than 200

extraction from the WEB and "Le Monde" corpora optimization for distress sentences

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Recognition Results

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture

I

all the sentences of our corpus

I

5 speakers

The Global System Organization The Sound Analysis System

The Speech Recognizer RAPHAEL

Corpus

Recognition Error

Normal Sentences

False Alarm Sentence "Quel temps fait-il dehors" / "Quel temps fait-il de mort" Missed Alarm Sentence "Faîtes vite" / "Equilibre" "A moi" / "Un grand"

Conclusion The end

Distress Sentences

Page 22/26

TSD 2006

Michel Vacher

CLIPS-IMAG

6%

16%

Speech and Sound Analysis

Conclusion

Michel Vacher Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization

I

Sound classification for distress detection.

I

Speech recognition allowing call for help.

I

Real-Time operation on an operating system without real-time capacity.

I

Speaker independence of the ASR system.

I

For 10dB and upper: errorless detection and segmentation error below 5%.

I

Outlook

The Sound Analysis System

The Speech Recognizer RAPHAEL Conclusion The end

I I

Page 23/26

Improvement of sound classification through HMM Development of a complete acoustical analysis system for life-sized tests.

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis Michel Vacher

Detection and Speech/Sound Segmentation in a Smart Room Environment

Motivation The Medical Remote Monitoring Speech and Sound Corpora

The Real-Time Architecture The Global System Organization The Sound Analysis System

Thank you for your attention.

The Speech Recognizer RAPHAEL Conclusion The end

Page 24/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

For Further Reading I

Michel Vacher Appendix For Further Reading Other

D. Istrate et al. Information Extraction From Sound for Medical Telemonitoring, IEEE Trans. on Information Tech. in Biomedicine Vol. 10, issue 2, pp. 264-274, 2006. M. Vacher, D. Istrate. Sound detection and classification for medical telesurveillance, BIOMED’2004 Proceedings, ACTA PRESS, pp. 395–398, 2004. D. Istrate, M. Vacher. Détection et classification des sons : application aux sons de la vie courante et à la parole, 20th GRETSI Proceedings, Vol. 1, pp. 485-488, 2005.

Page 25/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Speech and Sound Analysis

Acknowledgement I

Michel Vacher Appendix For Further Reading Other

This study is a part of the DESDHIS-ACI "Technologies pour la Santé" project of the French Research Ministry. This project is a collaboration between the CLIPS laboratory, in charge of the sound analysis, and the TIMC laboratory, in charge of the medical sensor analysis and data fusion. CLIPS and TIMC are 2 laboratories of the IMAG institute. IMAG has funded the implementation of the habitat used for our studies through the RESIDE-HIS project.

Page 26/26

TSD 2006

Michel Vacher

CLIPS-IMAG

Suggest Documents