Text To Speech Synthesis
• A system that takes a sequence of words as input and converts it to human speech
• Do you think TTS is a solved problem?
Applications
• For applications involving access to very large or rapidly changing databases, a TTS system driven from a text database has many advantages
– Proof-reading of documents
– Speaking and reading aids for the disabled
– Speech output for intelligent machines
Functional outline of a TTS system
[Diagram: Text → Analysis → Phonetic Description → Synthesis → Speech]
[Diagram: the Analysis stage expands into Symbols to Standard Form, Phonetic Transcription, Parsing, and Semantic Analysis; Text enters and a Phonetic Description leaves for Synthesis]
Symbols to Standard Form
• A preprocessor is used to convert symbol strings such as $3.17B to text
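A minimal sketch of such a preprocessor, assuming only one symbol pattern (a dollar amount with a K/M/B/T suffix) and a digit-by-digit reading of the fraction. The names `normalize`, `expand_currency` and `SCALES` are illustrative and not from the slides; a real front end would cover many more symbol classes (dates, abbreviations, ordinals).

```python
import re

# Number words for 0..19 and the tens 20..90.
ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()
# Assumed reading of the scale suffixes.
SCALES = {"K": "thousand", "M": "million", "B": "billion", "T": "trillion"}

def small_int_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens - 2] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + small_int_to_words(rest) if rest else "")

# "$", up to three integer digits, optional fraction, then a scale letter.
CURRENCY = re.compile(r"\$(\d{1,3})(?:\.(\d+))?([KMBT])\b")

def expand_currency(match: re.Match) -> str:
    whole, frac, scale = match.groups()
    words = small_int_to_words(int(whole))
    if frac:
        # Fraction digits are read one by one: .17 -> "point one seven".
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return f"{words} {SCALES[scale]} dollars"

def normalize(text: str) -> str:
    """Replace every matching currency symbol string with its spoken form."""
    return CURRENCY.sub(expand_currency, text)

print(normalize("Revenue was $3.17B last year"))
```

Text that does not match the pattern passes through unchanged, so further normalizers can be chained after this one.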
Phonetic Transcription
• A phonetic transcription is computed for each word
• A morpheme dictionary is used
• If the word is not found in the dictionary, letter-to-sound rules are used
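The lookup order above can be sketched as follows. Everything in this example is a toy assumption: the three-entry `LEXICON`, the two suffix morphs, the ARPAbet-style phone symbols, and the deliberately crude letter-to-sound table. It only illustrates the fallback chain (whole word, then stem plus suffix, then rules).

```python
# Toy morpheme dictionary (hypothetical entries, ARPAbet-style symbols).
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "text": ["T", "EH", "K", "S", "T"],
    "read": ["R", "IY", "D"],
}
# Toy suffix morphs; real systems choose S/Z etc. from the phonetic context.
SUFFIXES = {"ing": ["IH", "NG"], "s": ["S"]}

# Crude letter-to-sound rules: digraphs first, then single letters.
DIGRAPHS = {"ch": ["CH"], "sh": ["SH"], "th": ["TH"], "ee": ["IY"], "oo": ["UW"]}
LETTERS = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"],
    "x": ["K", "S"], "y": ["Y"], "z": ["Z"],
}

def letter_to_sound(word: str) -> list[str]:
    """Fallback: scan left to right, preferring two-letter rules."""
    phones, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phones += DIGRAPHS[pair]
            i += 2
        else:
            phones += LETTERS.get(word[i], [])  # unknown characters are skipped
            i += 1
    return phones

def transcribe(word: str) -> list[str]:
    """Dictionary lookup, then stem + suffix, then letter-to-sound rules."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    for suffix, suffix_phones in SUFFIXES.items():
        stem = word[:-len(suffix)]
        if word.endswith(suffix) and stem in LEXICON:
            return LEXICON[stem] + suffix_phones
    return letter_to_sound(word)

for w in ("speech", "texts", "reading", "shoot"):
    print(w, transcribe(w))
```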
Parsing
• To aid the selection of prosodic correlates, phrase-level parsing is performed
• Part-of-speech (POS) tagging provides the input for the parser
Semantic Analysis
• Only those semantic effects that are due to particular lexical items, such as negatives, are found
[Diagram: the Synthesis stage expands into Timing, Fundamental Frequency, Phonetic Targets, Continuation Smoothing, Parameter Conversion, and Waveform Generation; a Phonetic Description enters from Analysis and Speech leaves]
Timing
• The basic duration of each segment is determined, together with prepausal lengthening, pause duration, and polysyllabic shortening
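One common way to combine such effects is to scale a basic segment duration by a factor per rule. The sketch below assumes that multiplicative form; the factors (1.4 and 0.8) and the pause lengths are placeholder values chosen for illustration, not figures from the slides.

```python
from dataclasses import dataclass

# Placeholder rule strengths (assumptions, not measured values).
PREPAUSAL_FACTOR = 1.4      # lengthen segments just before a pause
POLYSYLLABIC_FACTOR = 0.8   # shorten segments in words of more than one syllable
PAUSE_MS = {",": 200.0, ".": 400.0}  # hypothetical pause lengths per punctuation mark

@dataclass
class Segment:
    phone: str
    base_ms: float            # basic (inherent) duration of the segment
    syllables_in_word: int
    prepausal: bool

def segment_duration(seg: Segment) -> float:
    """Apply the timing rules multiplicatively to the basic duration."""
    duration = seg.base_ms
    if seg.prepausal:
        duration *= PREPAUSAL_FACTOR
    if seg.syllables_in_word > 1:
        duration *= POLYSYLLABIC_FACTOR
    return duration

def pause_duration(punctuation: str) -> float:
    """Pause length after a punctuation mark; 0 ms if none is defined."""
    return PAUSE_MS.get(punctuation, 0.0)

vowel = Segment("AE", base_ms=100.0, syllables_in_word=3, prepausal=True)
print(segment_duration(vowel), pause_duration("."))
```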
Fundamental Frequency
• Pitch rises on stressed syllables, continuation rises (signalling that the utterance continues), and a number of segmental effects are determined
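A rough sketch of the first two effects, assuming one F0 target per syllable on a flat baseline. The function name, the baseline of 120 Hz and the rise sizes are illustrative assumptions; segmental effects are left out.

```python
def f0_targets(stresses, phrase_final,
               base_hz=120.0, stress_rise_hz=20.0, continuation_rise_hz=15.0):
    """Return one F0 target (Hz) per syllable.

    stresses      -- list of booleans, True for a stressed syllable
    phrase_final  -- True if this phrase ends the utterance
    All Hz values are assumed defaults for illustration only.
    """
    targets = []
    for stressed in stresses:
        f0 = base_hz
        if stressed:
            f0 += stress_rise_hz          # pitch rise on a stressed syllable
        targets.append(f0)
    if targets and not phrase_final:
        targets[-1] += continuation_rise_hz  # continuation rise: more is coming
    return targets

print(f0_targets([False, True, False], phrase_final=False))
```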
Phonetic Targets
• Target parameters are determined for each phonetic segment using a context window
Continuation Smoothing
• The target values are smoothed to give a full set of parameters for every frame
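As an illustration, the sketch below turns one target per segment into one value per frame by linear interpolation between segment midpoints. Linear interpolation and the 5 ms frame period are assumptions made for the example; the slides do not say which smoothing method is used.

```python
def smooth_targets(targets, durations_ms, frame_ms=5.0):
    """Interpolate per-segment targets into one parameter value per frame.

    Each target is placed at the midpoint of its segment; frames before the
    first midpoint or after the last one hold the nearest target.
    """
    # Time of each segment midpoint and the total utterance length.
    midpoints, t = [], 0.0
    for d in durations_ms:
        midpoints.append(t + d / 2.0)
        t += d
    n_frames = int(t // frame_ms)

    frames, j = [], 0
    for k in range(n_frames):
        now = k * frame_ms
        # Advance j so that midpoints[j] <= now < midpoints[j + 1] where possible.
        while j + 1 < len(midpoints) and now >= midpoints[j + 1]:
            j += 1
        if now <= midpoints[0]:
            frames.append(targets[0])
        elif j + 1 == len(midpoints):
            frames.append(targets[-1])
        else:
            w = (now - midpoints[j]) / (midpoints[j + 1] - midpoints[j])
            frames.append(targets[j] + w * (targets[j + 1] - targets[j]))
    return frames

# A formant target moving from 500 Hz to 700 Hz over two 20 ms segments.
print(smooth_targets([500.0, 700.0], [20.0, 20.0]))
```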
Parameter Conversion
• The phonetic parameters must be converted to filter coefficients
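For a formant-based synthesizer, one well-known conversion maps a formant frequency F and bandwidth B to the coefficients of a second-order digital resonator, y[n] = a·x[n] + b·y[n-1] + c·y[n-2]. The sketch assumes that resonator form; the slides do not specify which filter structure is used, and the function names are illustrative.

```python
import math

def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Convert a formant (frequency, bandwidth) to resonator coefficients.

    The gain term a is chosen so that the filter has unity gain at 0 Hz.
    """
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c

def resonate(samples, a, b, c):
    """Run the difference equation y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

# Example: a first formant at 500 Hz with 60 Hz bandwidth, 10 kHz sampling.
a, b, c = resonator_coefficients(500.0, 60.0, 10_000.0)
print(a, b, c)
```

Because the coefficients are recomputed from the smoothed parameters every frame, the filter follows the moving formants over time.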
Waveform Generation
• The synthesizer uses the coefficients to generate the speech waveform
Synthesis Techniques
[Diagram: taxonomy of synthesis techniques]
• System Model
– Articulatory Synthesis
• Signal Model
– Formant Synthesis
– LP Synthesis
– Concatenation Synthesis
System Model
• Attempts to model the human speech production system
Signal Model
• Attempts to model the resulting speech signal
Articulatory Synthesis
• The shape of the vocal tract, as defined by the articulators, is usually converted to a transfer function
Formant Synthesis
• Uses an excitation signal to excite a digital filter defined by formants (separation of source and vocal tract)
LP Synthesis
• Uses an excitation signal to excite a digital filter defined by LPC (linear predictive coding) coefficients (separation of source and vocal tract)
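The source-filter idea can be sketched with an impulse train as a voiced excitation and an all-pole filter s[n] = G·e[n] + Σ a_k·s[n-k]. The sign convention (predictor coefficients added) and the function names are assumptions of this example; some texts write the same filter with the opposite sign.

```python
def impulse_train(n_samples, period):
    """Voiced excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def lp_synthesize(excitation, predictor, gain=1.0):
    """All-pole synthesis filter driven by an excitation signal.

    predictor -- coefficients a_1..a_p, used as s[n] = gain*e[n] + sum a_k*s[n-k]
    """
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a_k in enumerate(predictor, start=1):
            if n - k >= 0:
                s += a_k * out[n - k]   # feedback from earlier output samples
        out.append(s)
    return out

# A single pole at 0.5 turns one impulse into a decaying response.
print(lp_synthesize([1.0, 0.0, 0.0, 0.0], [0.5]))
# One pitch period of 4 samples, second-order filter (toy coefficients).
print(lp_synthesize(impulse_train(8, 4), [1.2, -0.6]))
```

Changing the impulse period changes the pitch without touching the filter, which is the separation of source and vocal tract that the slide refers to.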
Concatenation Synthesis
• Concatenates appropriate synthesis units to construct the required speech
• Signal processing must be used for prosody
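A minimal sketch of the joining step only, assuming the units are already selected and stored as lists of samples. A short linear crossfade at each join is one simple choice made here for illustration; the prosody modification mentioned above (for example with PSOLA-style pitch and duration processing) is not shown.

```python
def crossfade_concatenate(units, overlap):
    """Join recorded units, blending `overlap` samples at each boundary.

    units   -- list of sample lists (e.g. diphones), in playback order
    overlap -- number of samples over which adjacent units are blended
    """
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)   # weight ramps towards the incoming unit
            out[start + i] = (1.0 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])        # rest of the incoming unit is copied as is
    return out

print(crossfade_concatenate([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]], overlap=1))
```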
TTS Demo
• http://www.ivona.com/us/
Singing Voice Synthesis
[Diagram: Text → Analysis → Phonetic Description → Synthesis → Singing Speech]
[Diagram: Text → Analysis → Phonetic Description → Synthesis → Singing Speech; a Score Editor supplies Musical notes that set the Prosody]
Japanese Commercial SVS system
• Vocaloid
• ‘Supercell’ was created with Vocaloid and became a hit album/band in 2009, with more than 100,000 cumulative sales