Text to Speech Synthesis

By Najeeb Khan
Speech Signal Processing Lab
2013-12-6

Outline

• Speech
• Text to Speech Synthesis
• Applications
• How TTS works
• Demos of Commercial TTS systems
• Singing Voice Synthesis
• Music Score Editor for SVS

Speech

Levels of Speech
6. Pragmatic
5. Semantic
4. Syntactic
3. Morphological
2. Phonetic
1. Acoustic

Text to Speech Synthesis

• A system that takes a sequence of words as input and converts it to human speech
• Do you think TTS is a solved problem?

Applications

• For applications involving access to very large or rapidly changing databases, a TTS system driven from a text database has many advantages:
  – Proofreading of documents
  – Speaking and reading aids for the disabled
  – Speech output for intelligent machines

Functional outline of a TTS system

Figure: Text → Analysis → Phonetic Description → Synthesis → Speech


Figure (Analysis stage): Symbols to Standard Form → Phonetic Transcription → Parsing → Semantic Analysis

Symbols to Standard Form: A preprocessor converts symbol strings such as $3.17B into ordinary words ("three point one seven billion dollars").
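A minimal sketch of this preprocessing step, assuming only amounts of the form $N or $N.NN followed by K, M or B need expanding. The regular expression, the number tables and the function names (`normalize`, `expand_currency`, `int_to_words`) are illustrative assumptions, not part of any particular system:

```python
import re

# Toy number tables (English, 0-999 only); an illustrative assumption.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = {"K": "thousand", "M": "million", "B": "billion"}

# Matches e.g. "$3.17B": 1-3 integer digits, optional decimals, scale letter.
CURRENCY = re.compile(r"\$(\d{1,3})(?:\.(\d+))?([KMB])\b")


def int_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + int_to_words(rest)


def expand_currency(match: re.Match) -> str:
    whole, frac, scale = match.groups()
    words = int_to_words(int(whole))
    if frac:
        # Decimal digits are read one at a time: ".17" -> "point one seven".
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return f"{words} {SCALES[scale]} dollars"


def normalize(text: str) -> str:
    """Replace every matching currency symbol string with words."""
    return CURRENCY.sub(expand_currency, text)


print(normalize("Revenue reached $3.17B last year."))
# Revenue reached three point one seven billion dollars last year.
```

A real normalizer needs many more patterns (dates, times, abbreviations, ordinals), usually tried in a fixed priority order.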


Phonetic Transcription: A phonetic transcription is computed for each word. A morpheme dictionary is used; if a word is not found in the dictionary, letter-to-sound rules are applied.
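The lookup-then-fallback logic can be sketched as follows. This simplified version looks up whole words rather than decomposing them into morphemes, and the lexicon entries, ARPAbet-style symbols and letter-to-sound table are hypothetical toy data; real rule sets are far larger and context sensitive.

```python
# Hypothetical whole-word lexicon (ARPAbet-style symbols, toy data).
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "the": ["DH", "AH"],
    "one": ["W", "AH", "N"],
}

# Toy letter-to-sound rules: two-letter patterns are tried before single letters.
DIGRAPHS = {"ch": ["CH"], "sh": ["SH"], "th": ["TH"], "ee": ["IY"], "oo": ["UW"]}
LETTERS = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"],
    "x": ["K", "S"], "y": ["Y"], "z": ["Z"],
}


def letter_to_sound(word: str) -> list[str]:
    """Apply the toy rules left to right, longest match first."""
    phones, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phones += DIGRAPHS[pair]
            i += 2
        else:
            phones += LETTERS.get(word[i], [])  # skip characters with no rule
            i += 1
    return phones


def transcribe(word: str) -> list[str]:
    """Dictionary lookup first; fall back to letter-to-sound rules."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return letter_to_sound(word)


print(transcribe("One"))    # ['W', 'AH', 'N']   (irregular, from the lexicon)
print(transcribe("sheep"))  # ['SH', 'IY', 'P']  (from the rules)
```

The lexicon handles irregular words such as "one", which the rules alone would mispronounce.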


Parsing: To aid the selection of prosodic correlates, phrase-level parsing is performed. Part-of-speech (POS) tagging provides the input to the parser.
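As a rough illustration of phrase-level parsing driven by POS tags, the sketch below groups determiner/adjective/noun runs into noun-phrase chunks. The tags (Penn Treebank names) are supplied by hand in place of a tagger, and the chunking rule is an assumption, far simpler than a real parser:

```python
# Tags that may continue a noun-phrase chunk (Penn Treebank tag names).
NP_TAGS = {"DT", "JJ", "NN", "NNS", "PRP$"}


def chunk_phrases(tagged):
    """Group (word, tag) pairs into chunks. Runs of NP_TAGS form an "NP"
    chunk (a determiner always opens a new one); every other token is its
    own chunk. Chunk boundaries are candidate prosodic phrase boundaries."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in NP_TAGS:
            if tag == "DT" and current:
                chunks.append(("NP", current))
                current = []
            current.append(word)
        else:
            if current:
                chunks.append(("NP", current))
                current = []
            chunks.append((tag, [word]))
    if current:
        chunks.append(("NP", current))
    return chunks


tagged = [("the", "DT"), ("old", "JJ"), ("man", "NN"), ("saw", "VBD"),
          ("a", "DT"), ("small", "JJ"), ("dog", "NN")]
print(chunk_phrases(tagged))
# [('NP', ['the', 'old', 'man']), ('VBD', ['saw']), ('NP', ['a', 'small', 'dog'])]
```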


Semantic Analysis: Only those semantic effects that are due to particular lexical items, such as negatives, are detected.
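A sketch of the kind of lexical check described above, assuming a small hypothetical list of English negatives; a later prosody stage could, for example, give the flagged words extra stress:

```python
# Hypothetical list of English lexical negatives.
NEGATIVES = {"not", "no", "never", "none", "nobody", "nothing", "neither", "nor"}


def mark_negatives(words):
    """Return the indices of words that are lexical negatives,
    including contractions ending in "n't"."""
    return [i for i, w in enumerate(words)
            if w.lower() in NEGATIVES or w.lower().endswith("n't")]


print(mark_negatives("I did not see anybody".split()))  # [2]
print(mark_negatives(["She", "doesn't", "know"]))       # [1]
```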



Figure (Synthesis stage): Timing → Fundamental Frequency → Phonetic Targets → Continuation Smoothing → Parameter Conversion → Waveform Generation


Timing: The basic duration of each segment is determined, together with prepausal lengthening, pause durations and polysyllabic shortening.
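The sketch below uses a duration rule in the style of Klatt's rules, swapped in here as an assumption since the slide gives no formulas: only the compressible part of a segment above its minimum duration is scaled, DUR = MINDUR + (INHDUR − MINDUR) × PRCNT / 100. The percentages, pause lengths and segment values are illustrative:

```python
from dataclasses import dataclass

# Illustrative scaling percentages (assumed values in the style of Klatt's rules).
PREPAUSAL_PCT = 140.0      # lengthening in the syllable before a pause
POLYSYLLABIC_PCT = 80.0    # shortening in words of more than one syllable
PAUSE_MS = {"comma": 200.0, "sentence": 500.0}  # hypothetical pause lengths


@dataclass
class Segment:
    phone: str
    inherent_ms: float              # INHDUR: duration in a neutral context
    minimum_ms: float               # MINDUR: incompressible minimum duration
    prepausal: bool = False
    in_polysyllabic_word: bool = False


def segment_duration(seg: Segment) -> float:
    """DUR = MINDUR + (INHDUR - MINDUR) * PRCNT / 100, where PRCNT is the
    product of the percentages of every rule that applies."""
    pct = 100.0
    if seg.prepausal:
        pct = pct * PREPAUSAL_PCT / 100.0
    if seg.in_polysyllabic_word:
        pct = pct * POLYSYLLABIC_PCT / 100.0
    return seg.minimum_ms + (seg.inherent_ms - seg.minimum_ms) * pct / 100.0


def pause_duration(boundary: str) -> float:
    """Look up the pause inserted at a phrase or sentence boundary."""
    return PAUSE_MS[boundary]


vowel = Segment("AE", inherent_ms=230.0, minimum_ms=80.0, prepausal=True)
print(segment_duration(vowel))  # 290.0
```

Scaling only the part above MINDUR keeps repeated shortening rules from driving a segment toward zero length.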

Fundamental Frequency: The pitch (F0) contour is determined: pitch rises on stressed syllables, continuation rises signal that the utterance continues, and a number of segmental effects are computed.
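A toy F0 generator, assuming one pitch target per syllable, a linearly declining baseline, a fixed boost on stressed syllables and an optional rise on the final syllable for continuation. All Hz values are hypothetical, and segmental effects are left out:

```python
def f0_contour(stressed, start_hz=130.0, end_hz=100.0, accent_hz=25.0,
               continuation=False, rise_hz=30.0):
    """Return one F0 target (Hz) per syllable.

    stressed: list of booleans, True for a stressed syllable.
    The baseline declines linearly from start_hz to end_hz; stressed
    syllables get accent_hz on top; if continuation is True the final
    syllable rises by rise_hz to signal that the utterance goes on."""
    n = len(stressed)
    contour = []
    for i, is_stressed in enumerate(stressed):
        frac = i / (n - 1) if n > 1 else 0.0
        f0 = start_hz + (end_hz - start_hz) * frac
        if is_stressed:
            f0 += accent_hz
        contour.append(f0)
    if continuation and contour:
        contour[-1] += rise_hz
    return contour


print(f0_contour([True, False, False]))                     # [155.0, 115.0, 100.0]
print(f0_contour([True, False, False], continuation=True))  # [155.0, 115.0, 130.0]
```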

Phonetic Targets: Target parameter values are determined for each phonetic segment using a context window, i.e. taking the neighbouring segments into account.
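A sketch of target selection with a window of one segment on each side, assuming vowels only and approximate textbook average adult male F1/F2 values. The pull toward the neighbours (`weight`) is a hypothetical stand-in for real coarticulation rules:

```python
# Approximate average adult male (F1, F2) targets in Hz for three vowels.
VOWEL_TARGETS = {"IY": (270.0, 2290.0), "AA": (730.0, 1090.0), "UW": (300.0, 870.0)}


def targets_with_context(phones, weight=0.1):
    """For each phone, start from its own (F1, F2) target and pull it a
    fraction `weight` toward the mean target of its immediate neighbours
    (a context window of one segment on each side)."""
    result = []
    for i, phone in enumerate(phones):
        f1, f2 = VOWEL_TARGETS[phone]
        window = phones[max(0, i - 1):i] + phones[i + 1:i + 2]
        if window:
            n1 = sum(VOWEL_TARGETS[p][0] for p in window) / len(window)
            n2 = sum(VOWEL_TARGETS[p][1] for p in window) / len(window)
            f1 += weight * (n1 - f1)
            f2 += weight * (n2 - f2)
        result.append((f1, f2))
    return result


print(targets_with_context(["IY", "AA"], weight=0.5))
# [(500.0, 1690.0), (500.0, 1690.0)]
```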

Continuation Smoothing: The target values are smoothed to obtain a full set of parameters for every frame.
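One simple way to get a value for every frame, assumed here rather than taken from the slides, is to place each segment's target at the segment midpoint and interpolate linearly between midpoints, holding the first and last targets at the edges:

```python
import bisect


def smooth_track(targets, durations_ms, frame_ms=5.0):
    """Place each segment's target at its midpoint and linearly interpolate
    between midpoints, returning one parameter value per frame. Before the
    first midpoint and after the last one the nearest target is held."""
    centers, t = [], 0.0
    for d in durations_ms:
        centers.append(t + d / 2.0)
        t += d
    track = []
    for k in range(int(t // frame_ms)):
        time = k * frame_ms
        j = bisect.bisect_right(centers, time)
        if j == 0:
            track.append(targets[0])
        elif j == len(centers):
            track.append(targets[-1])
        else:
            t0, t1 = centers[j - 1], centers[j]
            alpha = (time - t0) / (t1 - t0)
            track.append(targets[j - 1] + alpha * (targets[j] - targets[j - 1]))
    return track


# F1 moving from 500 Hz to 1500 Hz over two 100 ms segments, 10 ms frames.
f1_track = smooth_track([500.0, 1500.0], [100.0, 100.0], frame_ms=10.0)
print(f1_track[5], f1_track[10], f1_track[15])  # 500.0 1000.0 1500.0
```

The same function would be run once per parameter (F1, F2, F0, amplitudes, and so on).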

Parameter Conversion: The phonetic parameters must be converted to filter coefficients.
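For a formant synthesizer, one well-known conversion is Klatt's second-order digital resonator, which maps a formant frequency F and bandwidth BW at sampling period T to the coefficients of y[n] = A·x[n] + B·y[n−1] + C·y[n−2]. It is shown here as an example of parameter conversion, not as the method of the system described:

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients (A, B, C) of a Klatt-style second-order resonator
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2] for one formant."""
    T = 1.0 / fs_hz
    C = -math.exp(-2.0 * math.pi * bw_hz * T)
    B = 2.0 * math.exp(-math.pi * bw_hz * T) * math.cos(2.0 * math.pi * freq_hz * T)
    A = 1.0 - B - C  # normalizes the gain at 0 Hz to 1
    return A, B, C


# Example: a first formant at 500 Hz with a 60 Hz bandwidth, sampled at 10 kHz.
print(resonator_coeffs(500.0, 60.0, 10000.0))
```

The pole radius exp(−π·BW·T) sets the bandwidth and the pole angle 2π·F·T sets the formant frequency.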

Waveform Generation: The synthesizer uses the coefficients to generate the speech waveform.
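Putting the pieces together, the sketch below drives a cascade of second-order resonators with an impulse train to produce a vowel-like waveform. The formant frequencies, bandwidths, F0, sampling rate and duration are illustrative assumptions, and a real synthesizer would update the coefficients every frame rather than hold them fixed:

```python
import math


def resonator(x, freq_hz, bw_hz, fs_hz):
    """Filter x through one second-order resonator (Klatt-style coefficients)."""
    T = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * T)
    b = 2.0 * math.exp(-math.pi * bw_hz * T) * math.cos(2.0 * math.pi * freq_hz * T)
    a = 1.0 - b - c
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = a * sample + b * y1 + c * y2
        y.append(out)
        y1, y2 = out, y1
    return y


def synthesize_vowel(formants, f0_hz=100.0, fs_hz=10000, dur_s=0.3):
    """Impulse-train (voiced) excitation passed through a cascade of
    resonators, one per (frequency, bandwidth) pair, then peak-normalized."""
    n = int(round(fs_hz * dur_s))
    period = int(round(fs_hz / f0_hz))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs_hz)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]


# /AA/-like vowel: formant frequencies are approximate, bandwidths assumed.
samples = synthesize_vowel([(730.0, 60.0), (1090.0, 90.0), (2440.0, 120.0)])
print(len(samples))  # 3000
```

The samples could be scaled to 16-bit integers and written out with the standard `wave` module for listening.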


Synthesis Techniques

• System Model
  – Articulatory Synthesis
• Signal Model
  – Formant Synthesis
  – LP Synthesis
  – Concatenation Synthesis

System Model: Attempts to model the human speech production system.

Signal Model: Attempts to model the resulting speech signal.

Articulatory Synthesis: The shape of the vocal tract, defined by the positions of the articulators, is usually converted to a transfer function.
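Two standard acoustic-tube relations illustrate how vocal-tract shape maps to a transfer function: the reflection coefficients between adjacent sections of a lossless concatenated-tube model, and the resonances of a uniform tube closed at the glottis and open at the lips. The speed of sound (350 m/s) and the 17.5 cm tract length are assumed values, and sign conventions for the reflection coefficients differ between texts:

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction of a lossless concatenated-tube
    model, r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k), from cross-sectional areas."""
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]


def uniform_tube_formants(length_m, count, c=350.0):
    """Resonances of a uniform tube closed at one end (glottis) and open at
    the other (lips): F_k = (2k - 1) * c / (4 * L)."""
    return [(2 * k - 1) * c / (4.0 * length_m) for k in range(1, count + 1)]


print(reflection_coefficients([1.0, 1.0, 3.0]))  # [0.0, 0.5]
print(uniform_tube_formants(0.175, 3))           # approximately [500, 1500, 2500] Hz
```

The uniform tube corresponds to a neutral, schwa-like vowel; changing the area function moves the resonances away from these values.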


Formant Synthesis: An excitation signal excites a digital filter defined by formants (separating the source from the vocal tract).


LP Synthesis: An excitation signal excites a digital filter defined by linear prediction coefficients (LPC), again separating the source from the vocal tract.
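A compact sketch of LP analysis and synthesis, assuming the autocorrelation method with the Levinson-Durbin recursion and the predictor convention x̂[n] = Σ a_k x[n−k]; the synthesis filter is then the all-pole recursion y[n] = G·e[n] + Σ a_k y[n−k]:

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_p; also return the final
    prediction error energy."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection (PARCOR) coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= (1.0 - k * k)
    return a[1:], err


def lp_synthesize(coeffs, excitation, gain=1.0):
    """All-pole synthesis filter: y[n] = gain*e[n] + sum_k a_k * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k, a_k in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a_k * y[n - k]
        y.append(acc)
    return y


x = [0.9 ** n for n in range(200)]  # impulse response of a one-pole filter
coeffs, err = levinson_durbin(autocorrelation(x, 1), 1)
print(round(coeffs[0], 6))  # 0.9
y = lp_synthesize(coeffs, [1.0] + [0.0] * 199)
```

In a vocoder the excitation would be an impulse train for voiced frames and noise for unvoiced frames, with coefficients re-estimated every frame.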

Concatenation Synthesis: Appropriate synthesis units are concatenated to construct the required speech. Signal processing must be used to impose the desired prosody.
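A minimal sketch of the joining step only, assuming units are stored as lists of samples and joined with a short linear crossfade. The prosody modification mentioned above (e.g. pitch-synchronous overlap-add) is not shown:

```python
def concatenate_units(units, overlap):
    """Join units (lists of samples) end to end, crossfading `overlap`
    samples linearly at each join so that no abrupt step is introduced."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises toward 1
            out[start + i] = (1.0 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out


# Two hypothetical "units": a constant 1.0 segment followed by a 0.0 segment.
joined = concatenate_units([[1.0] * 4, [0.0] * 4], overlap=3)
print(joined)  # [1.0, 0.75, 0.5, 0.25, 0.0]
```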


TTS Demo
• http://www.ivona.com/us/

Singing Voice Synthesis


Figure: The TTS pipeline (Text → Analysis → Synthesis) with Singing in place of Speech; the prosody is supplied by a Score Editor from musical notes.
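Since the prosody of singing comes from the score, a sketch of that mapping might convert each note to an F0 target and a duration. It assumes notes are given as MIDI note numbers in equal temperament tuned to A4 = 440 Hz, with lengths in beats at a given tempo; the `(syllable, note, beats)` format is hypothetical:

```python
def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def score_to_prosody(score, bpm):
    """Map (syllable, midi_note, beats) triples to (syllable, f0_hz, seconds)."""
    seconds_per_beat = 60.0 / bpm
    return [(syl, midi_to_hz(note), beats * seconds_per_beat)
            for syl, note, beats in score]


# A hypothetical two-note phrase at 120 beats per minute.
print(score_to_prosody([("la", 69, 1.0), ("la", 81, 2.0)], bpm=120))
# [('la', 440.0, 0.5), ('la', 880.0, 1.0)]
```

These note-level targets would replace the F0 and timing stages of the speech pipeline; a singing synthesizer would normally add vibrato and note transitions on top.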

Japanese Commercial SVS System
• Vocaloid
• ‘Supercell’ was created with Vocaloid and became a hit album/band in 2009, with more than 100,000 cumulative sales

Music Score Editor
