
Design and Develop Taiwanese Syllable-based Synthesis Units Database

Yung-Ji Sher, Kao-Chi Chung, Chung-Hsien Wu*

Institute of Biomedical Engineering, Institute of Information Science*
National Cheng Kung University


ABSTRACT Speech is the most common means of human communication. To meet the need for a domestic alternative and augmentative communication (AAC) aid, this study investigates native spoken Taiwanese and develops a database of Taiwanese syllable-based synthesis units. Built on principles of phonology and digital signal processing (DSP), the database is intended to support basic research and applications in Taiwanese culture and text-to-speech (TTS) systems, for native-language education and training as well as communication-aid technology. The database consists of 522 syllables, formed from combinations of the basic vowel and consonant phonemes and recorded in a single tone. Based on this single-tone database, an unlimited-vocabulary TTS synthesis system is developed. Because Taiwanese is a tone language rather than an intonation language, DSP transformation of the single recorded tone into the other 7 tones should allow the 522 single-tone syllables to serve as synthesis units for all possible meaningful Taiwanese syllables. So far, the syllable inventory has been determined by screening syllables through the existing mapping from Chinese characters to spoken Taiwanese words. The corresponding acoustic waveforms were obtained by recording the synthesis units within syllable-based embedded sentences (SBES). Each SBES is a sequence of syllables that contains the desired syllables and is composed without regard to sentence meaning; the selection criterion is a clear boundary between syllables, so that segmentation is easy. During recording, the speed, frequency, duration, and intensity of the synthesis units were kept as consistent as possible. The result is a preliminary unlimited-vocabulary Taiwanese TTS system. This work provides a basic database for speech analysis, synthesis, and recognition in modern Taiwanese, and may contribute to native Taiwanese language education and training and to domestic AAC aids for people with speech disabilities.

Keywords: Taiwanese language, synthesis units database, phonology and DSP, text-to-speech
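The unit-lookup idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes syllables are written in a romanization with a trailing tone digit (1 to 8), that every base syllable was recorded in one reference tone (tone 1 here), and that the names `UNIT_DB`, `plan_syllable`, `plan_utterance`, and `tonal_inventory_size` are hypothetical. The DSP tone transformation itself is only flagged as a required step, not implemented.

```python
import re
from typing import List, NamedTuple

# Assumption: the tone in which every base syllable was recorded.
REFERENCE_TONE = 1

# Toy stand-in for the 522-entry database (base syllable -> waveform id).
# The entries and ids are illustrative placeholders, not real data.
UNIT_DB = {"tai": "unit_0001", "oan": "unit_0002", "gi": "unit_0003"}


class UnitPlan(NamedTuple):
    unit_id: str                 # which recorded unit to fetch
    tone: int                    # tone requested by the input text
    needs_tone_transform: bool   # True if DSP must derive the tone


# Assumed token format: lowercase letters followed by one tone digit 1-8.
_SYLLABLE = re.compile(r"^([a-z]+)([1-8])$")


def plan_syllable(token: str) -> UnitPlan:
    """Split a token such as 'tai5' into base syllable and tone,
    then look up the single-tone unit that would be transformed."""
    match = _SYLLABLE.match(token.lower())
    if match is None:
        raise ValueError(f"not a tone-numbered syllable: {token!r}")
    base, tone = match.group(1), int(match.group(2))
    if base not in UNIT_DB:
        raise KeyError(f"no recorded unit for base syllable {base!r}")
    return UnitPlan(UNIT_DB[base], tone, tone != REFERENCE_TONE)


def plan_utterance(text: str) -> List[UnitPlan]:
    """Plan one unit per whitespace-separated syllable token."""
    return [plan_syllable(token) for token in text.split()]


def tonal_inventory_size(n_base: int = 522, n_tones: int = 8) -> int:
    """Upper bound on tone-bearing syllables reachable from the base units
    (one recorded tone plus seven derived tones)."""
    return n_base * n_tones


if __name__ == "__main__":
    for plan in plan_utterance("tai5 oan1 gi2"):
        print(plan)
    print(tonal_inventory_size())
```

The point of the design is that storage grows with the number of base syllables only; every other tone is derived at synthesis time, so 522 recordings stand in for up to 522 × 8 tone-bearing syllables.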


Language use spans receptive skills (reading, listening) and expressive skills (speech, writing) [2], [3]. Voice is produced at the larynx, where airflow through the glottis drives the vocal folds.

The acoustic character of speech is described mainly by its fundamental frequency, sound intensity, and formants. The fundamental frequency is set by the laryngeal vibration of the vocal folds and is perceived as pitch, while the amplitude of that vibration relates to sound intensity. The tongue, jaw, velum, uvula, and epiglottis shape the oral cavity and nasal cavity, which determine the formants before the sound is radiated. A quasi-periodic sequence of pulses at the fundamental frequency produces vowels, and random noise produces consonants [4, 5].

Speech communication combined with digital signal processing has many applications: digital transmission and storage, speech synthesis, speaker verification and identification, speech recognition, spoken language identification and translation, aids to the handicapped, and enhancement of signal quality [4].
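The two excitation types named in the passage above (a quasi-periodic sequence of pulses for vowels, random noise for consonants) and the notions of fundamental frequency and sound intensity can be illustrated with a short Python sketch. It is a textbook-style illustration under assumptions, not the procedure used in this study: the function names, the 8000 Hz sample rate, and the 60 to 400 Hz search range are all hypothetical choices, and the pitch estimator is a plain autocorrelation peak search.

```python
import math
import random
from typing import List


def pulse_train(f0_hz: float, sample_rate: int, n_samples: int) -> List[float]:
    """Voiced excitation: one unit pulse per pitch period."""
    period = round(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def white_noise(n_samples: int, seed: int = 0) -> List[float]:
    """Unvoiced excitation: uniform random noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def estimate_f0(signal: List[float], sample_rate: int,
                f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Estimate fundamental frequency as the lag with the largest
    autocorrelation inside the assumed pitch range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def rms(signal: List[float]) -> float:
    """Root-mean-square amplitude, a simple proxy for sound intensity."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


if __name__ == "__main__":
    rate = 8000                       # assumed sample rate in Hz
    voiced = pulse_train(100.0, rate, 800)
    unvoiced = white_noise(800)
    print(estimate_f0(voiced, rate))  # pitch of the pulse train
    print(rms(voiced), rms(unvoiced))
```

Measurements like these are the kind one could use to check that recorded units stay consistent in frequency and intensity, although the source does not say which measurements were actually applied.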