Text to Speech Synthesis

By Najeeb Khan, Speech Signal Processing Lab, 2013-12-6

Outline

• Speech
• Text to Speech Synthesis
• Applications
• How TTS works
• Demos of Commercial TTS systems
• Singing Voice Synthesis
• Music Score Editor for SVS

Speech

Levels of Speech
6. Pragmatic
5. Semantic
4. Syntactic
3. Morphological
2. Phonetic
1. Acoustic

Text To Speech Synthesis

• A system that takes a sequence of words as input and converts it to human speech
• Do you think TTS is a solved problem?

Applications

• For applications involving access to very large or rapidly changing databases, a TTS system driven from a text database has many advantages
  – Proof-reading of documents
  – Speaking and reading aids for the disabled
  – Speech output for intelligent machines

Functional outline of a TTS system

[Diagram: Text → Analysis → Phonetic Description → Synthesis → Speech]

[Diagram: the Analysis stage expanded into Symbols to Standard Form, Phonetic Transcription, Parsing, and Semantic Analysis]

Symbols to Standard Form

A preprocessor converts symbol strings such as $3.17B into spelled-out text.
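
A minimal sketch of such a preprocessor, assuming the only symbols to expand are dollar amounts with an optional magnitude suffix. The suffix table, the digit-by-digit reading of decimals, and all function names are illustrative choices, not taken from the slides.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Assumed reading of magnitude suffixes; not specified in the slides.
SCALES = {"K": "thousand", "M": "million", "B": "billion", "T": "trillion"}

# "$", 1-3 integer digits, optional decimal part, optional magnitude suffix.
CURRENCY = re.compile(r"\$(\d{1,3})(?:\.(\d+))?([KMBT])?\b")


def integer_to_words(n):
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    tail = " " + integer_to_words(rest) if rest else ""
    return ONES[hundreds] + " hundred" + tail


def expand_currency(match):
    """Turn one regex match such as "$3.17B" into words."""
    whole, fraction, scale = match.groups()
    words = [integer_to_words(int(whole))]
    if fraction:
        # Read decimal digits one at a time: ".17" -> "point one seven".
        words.append("point")
        words.extend(ONES[int(digit)] for digit in fraction)
    if scale:
        words.append(SCALES[scale])
    words.append("dollars")
    return " ".join(words)


def normalize(text):
    """Replace every currency expression in text with its spoken form."""
    return CURRENCY.sub(expand_currency, text)


print(normalize("The deal was worth $3.17B."))
```

A real preprocessor would also need rules for dates, abbreviations, ordinals, and so on; this sketch covers a single symbol class only.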

Phonetic Transcription

A phonetic transcription is computed for each word using a morpheme dictionary. If the word is not found in the dictionary, letter-to-sound rules are used.
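
The lookup-then-fallback logic can be sketched as follows. The morpheme dictionary, the ARPAbet-style phone symbols, the greedy longest-match segmentation, and the tiny rule table are all assumptions made for illustration; a real system would use far larger resources.

```python
# Toy morpheme dictionary (ARPAbet-style symbols); entries are illustrative.
MORPHEMES = {
    "speak": ["S", "P", "IY", "K"],
    "text": ["T", "EH", "K", "S", "T"],
    "er": ["ER"],
    "s": ["Z"],
}

# Toy letter-to-sound rules; two-letter rules are tried before one-letter ones.
LETTER_RULES = {
    "sh": ["SH"], "ch": ["CH"], "th": ["TH"],
    "a": ["AE"], "e": ["EH"], "i": ["IH"], "o": ["AA"], "u": ["AH"],
    "b": ["B"], "d": ["D"], "g": ["G"], "k": ["K"], "l": ["L"], "m": ["M"],
    "n": ["N"], "p": ["P"], "r": ["R"], "s": ["S"], "t": ["T"], "z": ["Z"],
}


def split_morphemes(word):
    """Greedy longest-match segmentation; returns None if it fails."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in MORPHEMES:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None  # no dictionary morpheme starts at position i
    return parts


def letter_to_sound(word):
    """Fallback: apply letter rules left to right, skipping unknown letters."""
    phones, i = [], 0
    while i < len(word):
        if word[i:i + 2] in LETTER_RULES:
            phones += LETTER_RULES[word[i:i + 2]]
            i += 2
        else:
            phones += LETTER_RULES.get(word[i], [])
            i += 1
    return phones


def transcribe(word):
    """Dictionary lookup first, letter-to-sound rules otherwise."""
    word = word.lower()
    parts = split_morphemes(word)
    if parts is None:
        return letter_to_sound(word)
    return [phone for part in parts for phone in MORPHEMES[part]]


print(transcribe("speakers"))  # found via morphemes: speak + er + s
print(transcribe("zip"))       # not in the dictionary: letter rules
```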

Parsing

To aid the selection of prosodic correlates, phrase-level parsing is performed. Part-of-speech (POS) tagging provides the input for the parser.
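
As a rough illustration of tagging followed by phrase-level grouping, the sketch below uses a hypothetical toy lexicon and a deliberately naive chunker that closes a noun phrase at every verb. The tag set and the chunking rule are assumptions; the slides do not describe the actual parser.

```python
# Toy part-of-speech lexicon; tags and entries are illustrative only.
POS_LEXICON = {
    "the": "DET", "a": "DET",
    "old": "ADJ", "new": "ADJ",
    "man": "NOUN", "book": "NOUN", "system": "NOUN",
    "reads": "VERB", "speaks": "VERB",
}


def tag(words):
    """Look each word up in the lexicon; unknown words default to NOUN."""
    return [(word, POS_LEXICON.get(word.lower(), "NOUN")) for word in words]


def chunk(tagged):
    """Group tagged words into flat noun phrases (NP) and verb phrases (VP)."""
    phrases = []
    noun_phrase = []
    for word, pos in tagged:
        if pos == "VERB":
            if noun_phrase:
                phrases.append(("NP", noun_phrase))
                noun_phrase = []
            phrases.append(("VP", [word]))
        else:
            noun_phrase.append(word)
    if noun_phrase:
        phrases.append(("NP", noun_phrase))
    return phrases


phrases = chunk(tag("The old man reads a book".split()))
print(phrases)
```

The phrase boundaries found this way are the places where later stages could plausibly put pauses or pitch movements.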

Semantic Analysis

Only those semantic effects that are due to particular lexical items, such as negatives, are found.

[Diagram: the Synthesis stage expanded into Timing, Fundamental Frequency, Phonetic Targets, Continuation Smoothing, Parameter Conversion, and Waveform Generation]

Timing

Prepausal lengthening, pause duration, and polysyllabic shortening are determined, along with the basic duration of each segment.
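
A rule-based duration model along these lines might look like the sketch below. Every number in it (base durations, the 0.85 shortening factor, the 1.4 lengthening factor, the 200 ms pause) is a made-up placeholder, and the input layout (a phrase as a list of words, each a list of syllables, each a list of phones) is an assumption.

```python
# Placeholder base durations in milliseconds; not measured values.
BASE_MS = {"vowel": 120.0, "consonant": 70.0}
VOWELS = {"AA", "AE", "AH", "EH", "ER", "IH", "IY", "OW", "UW"}

POLYSYLLABIC_FACTOR = 0.85  # assumed shortening inside multi-syllable words
PREPAUSAL_FACTOR = 1.4      # assumed lengthening of the phrase-final syllable


def phrase_durations(words, pause_ms=200.0):
    """Return (phone, duration_ms) pairs for one phrase, ending with a pause."""
    durations = []
    for w_idx, syllables in enumerate(words):
        polysyllabic = len(syllables) > 1
        for s_idx, syllable in enumerate(syllables):
            is_final = (w_idx == len(words) - 1
                        and s_idx == len(syllables) - 1)
            for phone in syllable:
                kind = "vowel" if phone in VOWELS else "consonant"
                duration = BASE_MS[kind]  # basic segment duration
                if polysyllabic:
                    duration *= POLYSYLLABIC_FACTOR
                if is_final:
                    duration *= PREPAUSAL_FACTOR
                durations.append((phone, duration))
    durations.append(("PAUSE", pause_ms))
    return durations


# "the speech": two monosyllabic words, the second one phrase-final.
for phone, ms in phrase_durations([[["DH", "AH"]], [["S", "P", "IY", "CH"]]]):
    print(phone, round(ms, 1))
```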

Fundamental Frequency

Pitch rises on stressed syllables, continuation rises that signal the utterance is not yet finished, and a number of segmental effects are determined.
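
One simple way to picture these rules is a per-syllable pitch target built from a baseline plus additive rises. The falling (declining) baseline, the rise sizes in Hz, and the function name are assumptions for illustration; segmental effects are left out.

```python
def f0_targets(stresses, continues, start_hz=120.0, end_hz=90.0,
               stress_rise_hz=20.0, continuation_rise_hz=15.0):
    """One F0 target per syllable.

    stresses:  list of booleans, True where the syllable is stressed.
    continues: True if more speech follows this phrase.
    All default values are illustrative placeholders.
    """
    n = len(stresses)
    targets = []
    for i, stressed in enumerate(stresses):
        # Assumed baseline: a straight line falling from start_hz to end_hz.
        position = i / (n - 1) if n > 1 else 0.0
        f0 = start_hz + (end_hz - start_hz) * position
        if stressed:
            f0 += stress_rise_hz  # pitch rise on a stressed syllable
        targets.append(f0)
    if continues and targets:
        # Continuation rise on the last syllable of a non-final phrase.
        targets[-1] += continuation_rise_hz
    return targets


print(f0_targets([True, False, False], continues=True))
```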

Phonetic Targets

Target parameters are determined for each phonetic segment using a context window of neighboring segments.

Continuation Smoothing

The target values are smoothed to obtain a full set of parameters for every frame.
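
The slides do not say which smoothing method is used. As a stand-in, the sketch below uses plain linear interpolation between sparse targets to produce one value per frame; the frame length and the (time, value) target layout are assumptions.

```python
def smooth_targets(targets, frame_ms=5.0):
    """Linearly interpolate (time_ms, value) targets onto a regular frame grid.

    targets must hold at least two points sorted by strictly increasing time.
    """
    start, end = targets[0][0], targets[-1][0]
    n_frames = int((end - start) // frame_ms) + 1
    frames = []
    k = 0  # index of the left end of the segment containing the frame
    for i in range(n_frames):
        t = start + i * frame_ms
        while k + 2 < len(targets) and t > targets[k + 1][0]:
            k += 1
        (t0, v0), (t1, v1) = targets[k], targets[k + 1]
        frames.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return frames


# Example: a first-formant track with three targets, sampled every 10 ms.
print(smooth_targets([(0, 500.0), (20, 700.0), (40, 300.0)], frame_ms=10.0))
```

In a full system the same routine would be applied to every parameter track (formants, bandwidths, amplitudes, F0), and a smoother curve than straight lines might be preferred.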

Parameter Conversion

The phonetic parameters must be converted to filter coefficients.
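
The slides do not state which parameters or filter structure are involved. Assuming the parameters are formant frequencies and bandwidths, one widely used conversion is the second-order digital resonator y[n] = A·x[n] + B·y[n-1] + C·y[n-2] found in Klatt-style formant synthesizers, sketched here. The function name and example values are illustrative.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients (A, B, C) of y[n] = A*x[n] + B*y[n-1] + C*y[n-2].

    The pole pair sits at radius exp(-pi * bandwidth * T) and angle
    2 * pi * freq * T, where T is the sampling period. A is chosen so
    that the filter has unity gain at 0 Hz.
    """
    period = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * period)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * period)
         * math.cos(2.0 * math.pi * freq_hz * period))
    a = 1.0 - b - c
    return a, b, c


# Example values only: a 500 Hz formant, 60 Hz wide, sampled at 10 kHz.
print(resonator_coefficients(500.0, 60.0, 10000.0))
```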

Waveform Generation

The synthesizer uses the coefficients to generate the speech waveform.
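
A bare-bones sketch of this last step, assuming a voiced sound modeled as an impulse train passed through a recursive (all-pole) filter. The function names and the coefficient values in the example are illustrative, not those of any particular synthesizer.

```python
def impulse_train(f0_hz, n_samples, sample_rate_hz):
    """Voiced excitation: one unit impulse per pitch period."""
    period = max(1, round(sample_rate_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def all_pole_filter(excitation, gain, feedback):
    """Compute y[n] = gain * x[n] + sum_k feedback[k] * y[n - 1 - k]."""
    history = [0.0] * len(feedback)  # history[k] holds y[n - 1 - k]
    output = []
    for x in excitation:
        y = gain * x + sum(c * past for c, past in zip(feedback, history))
        output.append(y)
        history = [y] + history[:-1]
    return output


# 100 Hz pitch at a 10 kHz sampling rate, 50 ms of signal.
excitation = impulse_train(100.0, 500, 10000)
# Illustrative second-order coefficients (a stable complex pole pair).
waveform = all_pole_filter(excitation, gain=0.1, feedback=[1.8, -0.9])
print(len(waveform), max(waveform))
```

Unvoiced sounds would use a noise excitation instead of the impulse train, and a realistic synthesizer would update the coefficients every frame.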

Synthesis Techniques

[Diagram: Synthesis Techniques split into System Model (Articulatory Synthesis) and Signal Model (Formant Synthesis, LP Synthesis, Concatenation Synthesis)]

System Model

Attempts to model the human speech production system.

Signal Model

Attempts to model the resulting speech signal.

Articulatory Synthesis

The shape of the vocal tract, as defined by the articulators, is usually converted to a transfer function.

Formant Synthesis

An excitation signal excites a digital filter defined by the formants (separation of source and vocal tract).

LP Synthesis

An excitation signal excites a digital filter defined by linear prediction coefficients (LPC), which separates the source from the vocal tract.
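
Written out under one common sign convention (the symbols below are not from the slides: e[n] is the excitation, G a gain, a_k the predictor coefficients, and p the prediction order), the LP synthesis filter is an all-pole recursion:

```latex
s[n] = \sum_{k=1}^{p} a_k \, s[n-k] + G \, e[n],
\qquad
H(z) = \frac{S(z)}{E(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}
```

Here the excitation e[n] plays the role of the source, and H(z) plays the role of the vocal tract.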

Concatenation Synthesis

Concatenates appropriate synthesis units to construct the required speech. Signal processing must be used to adjust prosody.
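
The joining step can be sketched as below, assuming each unit is a list of samples and adjacent units are blended with a short linear cross-fade. The cross-fade and the function name are illustrative choices. The prosody modification mentioned above (for example with an overlap-add technique such as PSOLA) is not shown.

```python
def concatenate_units(units, overlap):
    """Join sample lists, cross-fading `overlap` samples at each boundary.

    Every unit must contain at least `overlap` samples.
    """
    output = list(units[0])
    for unit in units[1:]:
        for i in range(overlap):
            # Weight ramps up for the incoming unit, down for the outgoing one.
            w = (i + 1) / (overlap + 1)
            output[-overlap + i] = (1.0 - w) * output[-overlap + i] + w * unit[i]
        output.extend(unit[overlap:])
    return output


# Two toy "units": a constant 1.0 segment followed by a constant 0.0 segment.
joined = concatenate_units([[1.0] * 4, [0.0] * 4], overlap=2)
print(joined)
```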

TTS Demo

• http://www.ivona.com/us/

Singing Voice Synthesis

[Diagram: Text → Analysis → Phonetic Description → Synthesis → Singing Speech]

[Diagram: Text → Analysis → Phonetic Description → Synthesis → Singing Speech, with Prosody supplied as Musical notes from a Score Editor]
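
To illustrate how musical notes can stand in for prosody, the sketch below turns a small score into per-syllable pitch and duration targets. The score layout, the function names, MIDI note numbering, and equal temperament with A4 = 440 Hz are assumptions; the slides do not specify a score format.

```python
# Semitone offsets of the natural notes within an octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a name such as "A4" or "C#5" to a MIDI note number."""
    semitone = NOTE_OFFSETS[name[0]]
    rest = name[1:]
    if rest.startswith("#"):
        semitone += 1
        rest = rest[1:]
    return 12 * (int(rest) + 1) + semitone


def midi_to_hz(midi_note):
    """Equal temperament, assuming A4 (MIDI note 69) is tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def score_to_targets(score, tempo_bpm):
    """Map (note, beats, syllable) entries to (syllable, f0_hz, seconds)."""
    seconds_per_beat = 60.0 / tempo_bpm
    return [(syllable, midi_to_hz(note_to_midi(note)), beats * seconds_per_beat)
            for note, beats, syllable in score]


score = [("A4", 1, "la"), ("C4", 2, "la")]
for syllable, f0_hz, seconds in score_to_targets(score, tempo_bpm=120):
    print(syllable, round(f0_hz, 2), seconds)
```

In this picture the score replaces the timing and fundamental-frequency rules of ordinary TTS: note pitch sets F0 and note length sets duration.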

Japanese Commercial SVS system

• Vocaloid
• ‘Supercell’ was created with Vocaloid and became a hit album/band in 2009, with more than 100,000 cumulative sales

Music Score Editor