Automatic discriminative measurement of voice onset time
Computer Science Graduate Seminar, U. Chicago
Morgan Sonderegger
January 25, 2010

Introduction

- Every language’s sound system consists of phones.
  - Stops (/p/, /b/, /q/)
  - Vowels (/a/, /ɛ/, /ɨ/)
  - Fricatives (/s/, /f/, /ɕ/)
- Significant variation in phone realizations, within and across speakers, dialects, languages.
- Ex: English {p,t,k} aspirated word-initially (Pedro, taco), unaspirated in Spanish.
- To study variation, need measurements of perceptually important quantities over large phonetic corpora.
- Scientific interest: How and why does acoustic realization of phones vary?
- Practical interest: Variation a huge problem for speech recognition (SR) systems.
- Manual measurements expensive, impractical for corpora, but automatic measurements must be accurate. (Effect sizes often small.)
- Currently, automatic measurement usually not accurate enough:
  - ✓: Formants, pitch, speech rate
  - ✗: Stress, phone boundaries, nasalization, rhoticity, voice onset time (VOT)
- Potential payoff: Manual annotation ≈ 3–5 minutes/VOT example, need > 500 for a small corpus.
- Project: Automatic VOT measurement for initial voiceless stops.

Background

- English stops: /p/, /t/, /k/, /b/, /d/, /g/
- Closure, followed by burst.
- Differ along two dimensions:
  - Voicing
  - Place of articulation

             Lips    Hard palate    Soft palate
Voiced       /b/     /d/            /g/
Voiceless    /p/     /t/            /k/

Background

- The signal: Discrete-time sample y[n] of speech signal, x(t).
- The spectrum: Usually consider (squared) magnitude of short-time (discrete) Fourier transform of y[n] over windows centered at t_1, t_2, ...: “power spectrum”.
- (Blackboard interlude)
- Acoustic properties of stops:
  - Closure: Silence
  - Burst: Aperiodic, “noisy” spectrum, turbulence
  - Beginning of periodicity = end of stop.
- Humans use many cues to distinguish between stops:
  - Formant transitions to following vowel
  - Burst spectrum
  - Voice onset time
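
The short-time spectrum described on this slide can be sketched in a few lines of NumPy. This is only an illustration, not the analysis used in the talk: the function name `stft_magnitude`, the 5 ms Hamming window, and the 1 ms hop are all assumptions chosen for the example.

```python
import numpy as np

def stft_magnitude(y, sample_rate, win_ms=5.0, hop_ms=1.0):
    """Magnitude of the short-time DFT of y[n].

    Returns (times, freqs, mag), where mag[i, k] is the magnitude at
    frequency freqs[k] for the window centered at times[i] (seconds).
    Window length, hop, and window shape are illustrative assumptions.
    """
    y = np.asarray(y, dtype=float)
    win = int(round(sample_rate * win_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    window = np.hamming(win)

    # Start index of every full-length frame.
    starts = np.arange(0, len(y) - win + 1, hop)
    frames = np.stack([y[s:s + win] * window for s in starts])

    mag = np.abs(np.fft.rfft(frames, axis=1))
    times = (starts + win / 2.0) / sample_rate
    freqs = np.fft.rfftfreq(win, d=1.0 / sample_rate)
    return times, freqs, mag

# Synthetic check: a 1 kHz tone sampled at 16 kHz (not real speech).
sr = 16000
t = np.arange(sr) / sr
times, freqs, mag = stft_magnitude(np.sin(2 * np.pi * 1000.0 * t), sr)
peak_hz = freqs[np.argmax(mag, axis=1)]  # strongest frequency per frame
```

Squaring `mag` gives the power spectrum. A short window such as this one trades frequency resolution for time resolution, which matters when the events of interest (burst and voicing onsets) are abrupt.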

Voice Onset Time

- VOT = t(onset of voicing) − t(onset of burst).
- Onsets of burst and voicing abrupt, often ≤ 1 pitch period (5–10 ms).

[Figure from T. Nearey’s (Alberta) slides, Phonetics]
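
As a worked instance of the VOT definition on this slide, the following minimal sketch computes VOT from two annotated time points. The function name and the example timestamps are hypothetical, chosen only to illustrate the subtraction; they are not measurements from the talk's corpora.

```python
def voice_onset_time_ms(burst_onset_s, voicing_onset_s):
    """VOT in milliseconds: t(onset of voicing) - t(onset of burst).

    Both arguments are times in seconds from the start of the recording.
    """
    return (voicing_onset_s - burst_onset_s) * 1000.0

# Hypothetical annotation of one word-initial voiceless stop.
vot = voice_onset_time_ms(burst_onset_s=0.512, voicing_onset_s=0.574)
# vot is about 62 ms: voicing starts roughly 62 ms after the burst.
```

Since each onset can be located to within about one pitch period (5–10 ms), an automatic measurement needs errors of that order or smaller to be comparable to manual annotation.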

Voice Onset Time

[Figure from Kuhl & Miller (1978)]

- Important in human (and chinchilla) stop perception →

I

p