
Malay Grapheme to Phoneme Tool for Automatic Speech Recognition

Tien-Ping Tan1, Bali Ranaivo-Malançon2
1 School of Computer Science, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia
2 Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor, Malaysia
[email protected], [email protected]

Abstract

This paper presents the design and performance of a Malay grapheme-to-phoneme (G2P) tool for generating the pronunciation dictionary of a Malay automatic speech recognition (ASR) system. The G2P tool is rule-based. Rules can easily be added and removed, and the tool handles English words. It also contains a morphological tool and a syllable tool, which it uses to determine the pronunciation of a word. Our evaluation shows that, with the pronunciation dictionary generated automatically by the G2P tool, our Malay ASR system achieves a word error rate (WER) of 16.5%, which is only 1.9% (absolute) higher than with a manually verified pronunciation dictionary.

1. Introduction

A grapheme-to-phoneme (G2P) tool generates the pronunciation of a given word. A grapheme is “the fundamental unit of written language”, and a phoneme is “the smallest linguistically distinctive unit of sound” [1]. G2P is an important component of many speech processing systems. In speech synthesis systems, for example, the pronunciation of unknown words, that is, words that are not in the pronunciation dictionary, can be predicted by applying G2P rules. In speech recognition systems, a G2P tool can be used to generate the pronunciation dictionary.

Malay, in its variety of forms, is widely used in Malaysia, Indonesia, Singapore, and southern Thailand. In this paper, we focus only on Malay as it is used in Malaysia. Malay is written using either the Latin alphabet (Rumi) or an adapted Arabic alphabet (Jawi). The G2P tool described in this paper handles Rumi Malay only.

This paper reports our effort to develop a Malay G2P system for ASR. Section 2 provides an overview of the difficulties in Malay pronunciation. Section 3 describes Malay phonology, while Section 4 and Section 5 discuss Malay morphology and syllable structures respectively. Section 6 presents the grapheme to phoneme rules. We then evaluate our pronunciation dictionary with an ASR system in Section 7. Conclusions are drawn in Section 8.

2. Challenges for Malay pronunciation

The official and national language of Malaysia is Malay, which has a variety of dialects depending on the region. These dialects differ in vocabulary, pronunciation and/or tone. Malaysia is a multiracial and multilingual country. Members of an ethnic group commonly use their first language among themselves, leaving Malay as an intergroup language. Many researchers have shown that the first language (L1) of a speaker influences second language (L2) acquisition [2]. Speakers with a different L1 may not speak like native speakers in terms of pronunciation or style.

Malay, like other languages, is also heavily influenced by English. Many English words have been absorbed into Malay, especially in science and technology. Although there is a standard method for converting borrowed English words into Malay, in practice the original English words are often used instead, both in writing and in conversation. Code switching is thus a very common phenomenon in Malay. Automatic speech recognition (ASR) for Malay is therefore challenging because of the different variants of Malay and because of code switching.

The first published paper on a Malay ASR system appeared in 1997 [3]. Later work addressed the recognition of isolated Malay digits [4] and the segmentation and labeling of speech utterances for Malay ASR [5]. The pronunciation dictionary is one of the components of an ASR system. A G2P tool is normally used to generate the pronunciation of a word. The pronunciation is predicted from the graphemes of the word, since most graphemes have a specific pronunciation in a particular context.

There is still no real consensus on the list of Malay speech sounds, even for standard Malay. El-Imam and Don [6] proposed 27 consonants (19 native consonants and 8 consonants that appear only in borrowed words) and 6 vowels. Ting [7] proposed 33 phonemes with 18 pure Malay consonants and 6 vowels. In the same year, the same authors reported 8 vowels [8].

El-Imam and Don [6] divided the rules for transforming letters into sounds into two sets: a set of 30 grapheme-to-phoneme rules and a set of 46 phoneme-to-phonetic rules. At the word level, the system had an error rate of 43%, attributed to the pre-processing of abbreviations, numbers, and unknown words. Ting et al. [7] used two methods to obtain grapheme-to-phoneme rules. First, they manually wrote 94 rules, which matched only 71% of pronunciations at the word level. Second, they applied a CART tree model to acquire the grapheme-to-phoneme rules automatically. The result was slightly better: 73.93% at the word level.

In Malay, the pronunciation of most words can be determined from the graphemes, although there are some exceptions. Besides the graphemes, the morphology and the syllable structure of a Malay word are also needed to determine its pronunciation. As mentioned above, English words also often appear in Malay texts. For these words, a different strategy has to be applied to generate their pronunciations.

To our knowledge, there is only one Malay pronunciation dictionary besides our own. This lexicon contains 13,550 entries; “each entry is associated with its pronunciation, syllable grouping of the phonemes and the stress level for each syllable” [9].

3. Malay phonology

There are 36 phonemes in Malay [10]. Six of them are vowels, three are diphthongs and 27 are consonants. Table 1 and Table 2 show the IPA tables for Malay consonants and vowels respectively. The three Malay diphthongs are /aj/, /aw/ and /oj/. Figure 1 shows the Malay phoneme distribution in the text.

Table 1. Malay consonants (columns: place of articulation; rows: mode of articulation)

Mode       Bi-lab.  Lab.-dent.  Dent.  Alveo.  Alveopalat.  Palat.  Vel.  Glot.
Plosive    p b                         t d                          k g   ?
Fricative           f v         θ ð    s z     ʃ                    x ɣ   h
Affricate                                      tʃ dʒ
Vibrant                                r
Lateral                                l
Nasal      m                           n                    ɲ       ŋ
Glide      w                                                j

Table 2. Malay vowels

            Front   Central   Back
Close       i                 u
Close-mid   e       ə         o
Open                a

Figure 1. Phoneme distribution of Malay (bar chart; x-axis: phoneme, in the order a b d dZ e f g h i j J k l m n N o oj/aj/aw p r s S t tS u w z ? @; y-axis: percent, 0 to 18)

4. Malay morphology

Malay is an agglutinative language: it creates new words by adding affixes to a root word. In addition, further bound morphemes can be added to the affixed word, as shown in Figure 2 [11]. Infixation is no longer used in Malay. The native affixes comprise nine prefixes, three suffixes, and seven circumfixes. There are two proclitics, four enclitics, and three particles. Most of these bound morphemes are monosyllabic.

Figure 2. Affixed word with clitic and particle [11] (diagram labels: proclitic, prefix, circumfix, infix, root, suffix, affixed word, enclitic, particle)

5. Malay syllable structures

Malay syllable structures are shown in Table 3. Most words in which two or more consonants form the onset of a syllable are borrowed from English. For example, the Malay word ‘struktur’ comes from the English word ‘structure’.

Table 3. Malay syllable structures

Syllable  Word        Description
V         i.kan       V.CVC
CV        sa.tu       CV.CV
CVC       ban.tu      CVC.CV
CCV       dwibahasa   CCV.CV.CV.CV
CCVC      prak.tik    CCVC.CVC
CCCV      stra.tegi   CCCV.CVCV
CCCVC     struk.tur   CCCVC.CVC

Figure 3 shows the distribution of Malay words in the texts by number of syllables. Most Malay words are disyllabic: disyllabic words make up nearly half of all words in the text, followed by words with three syllables.

The procedure for segmenting a word into syllables is simple (Figure 4). First, the possible syllables that can be formed are determined. During syllable segmentation, the graphemes of the word are converted to sound classes such as vowel, diphthong, fricative, affricate, plosive, nasal and glide. The word is then segmented into syllables by repeatedly taking the largest syllable that can be formed, working from right to left.

diberikannya
diberikan.nya
diberi.kan.nya
dibe.ri.kan.nya
di.be.ri.kan.nya
CV.CV.CV.CVC.CV

Figure 4. Segmenting the word ‘diberikannya’ into syllables
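The right-to-left segmentation can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: it reduces the sound classes to vowel versus consonant, ignores diphthongs, and relies on a hypothetical list of permitted onset clusters (`ONSET_CLUSTERS`) that is not taken from the paper. The function names are likewise assumptions.

```python
# Consonant digraphs that count as one grapheme (see the mapping in Table 4).
DIGRAPHS = ("ng", "ny", "sy", "sh", "kh", "gh")
VOWELS = set("aeiou")
# Hypothetical set of onset clusters found in loanwords (an assumption).
ONSET_CLUSTERS = {("s", "t", "r"), ("p", "r"), ("d", "w"), ("s", "t"), ("t", "r")}


def to_graphemes(word):
    """Split a word into graphemes, keeping digraphs together."""
    units, i = [], 0
    while i < len(word):
        size = 2 if word[i:i + 2] in DIGRAPHS else 1
        units.append(word[i:i + size])
        i += size
    return units


def syllabify(word):
    """Segment from right to left, taking the largest syllable each time."""
    units = to_graphemes(word.lower())
    syllables = []
    end = len(units)  # exclusive right edge of the part not yet segmented
    while end > 0:
        start = end - 1
        # Optional coda: one consonant directly after a vowel.
        if units[start] not in VOWELS and start > 0 and units[start - 1] in VOWELS:
            start -= 1
        if units[start] in VOWELS:
            # Optional onset: the longest permitted cluster, else one consonant.
            for size in (3, 2, 1):
                onset = tuple(units[max(start - size, 0):start])
                is_consonants = all(u not in VOWELS for u in onset)
                if len(onset) == size and is_consonants and (
                    size == 1 or onset in ONSET_CLUSTERS
                ):
                    start -= size
                    break
        syllables.append("".join(units[start:end]))
        end = start
    return ".".join(reversed(syllables))


print(syllabify("diberikannya"))
print(syllabify("struktur"))
```

Restricting multi-consonant onsets to a known set is what keeps the sketch from absorbing the ‘n’ of ‘kan’ into the following syllable ‘nya’.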

6. Grapheme to phoneme conversion rules

An efficient Malay G2P tool has to make it easy to add and remove rules, because speakers with different backgrounds may use different pronunciation rules. In addition, it needs to handle English words that appear in Malay texts. Our Malay G2P tool is rule-based. We use eight rules to generate the standard Malay pronunciations automatically. Pronunciation variants can be generated by adding or removing some of the standard Malay rules. For English words, we produce the pronunciation using Malay phonemes.

6.1. Standard Malay rules

6.1.1. General replacement rule. Every grapheme is by default mapped to a Malay phoneme (Table 4). For example, the word diberikan is converted to /d i b ə r i k a n/. Note that one phoneme, /e/, has no default grapheme: the grapheme ‘e’ maps to /ə/ by default and is pronounced /e/ only in certain words.

Figure 3. Distribution of words by number of syllables (bar chart; x-axis: number of syllables, 1 to >5; y-axis: percent)

Table 4. Grapheme to phoneme mapping rules

Grapheme  Phoneme    Grapheme  Phoneme
p         p          j         dʒ
b         b          l         l
t         t          r         r
d         d          m         m
k         k          n         n
q         k          ng        ŋ
g         g          ny        ɲ
s         s          w         w
x         s          y         j
h         h          a         a
f         f          e         ə
v         v          i         i
z         z          o         o
sy        ʃ          u         u
sh        ʃ          ai        aj
kh        x          au        aw
gh        dz         oi        oj
c         tʃ
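As an illustration of the general replacement rule, the sketch below applies a subset of the Table 4 mapping with a longest-match scan, so that digraphs such as ‘ng’ and ‘ny’ are consumed before single letters. The phonemes are written in IPA. The dictionary name `G2P`, the function name, and the decision to leave out ‘gh’ and the diphthongs ‘ai’, ‘au’, ‘oi’ are assumptions made to keep the example short.

```python
# Subset of the default grapheme-to-phoneme mapping (consonants and plain vowels).
G2P = {
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "q": "k", "g": "g",
    "s": "s", "x": "s", "h": "h", "f": "f", "v": "v", "z": "z",
    "sy": "ʃ", "sh": "ʃ", "kh": "x", "c": "tʃ", "j": "dʒ", "l": "l", "r": "r",
    "m": "m", "n": "n", "ng": "ŋ", "ny": "ɲ", "w": "w", "y": "j",
    "a": "a", "e": "ə", "i": "i", "o": "o", "u": "u",
}


def general_replacement(word):
    """Map each grapheme to its default phoneme, trying digraphs first."""
    word = word.lower()
    phonemes, i = [], 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in G2P:
                phonemes.append(G2P[chunk])
                i += size
                break
        else:
            raise ValueError(f"no default mapping for {word[i]!r}")
    return phonemes


print(" ".join(general_replacement("diberikan")))
```

The remaining rules in Section 6.1 can then be seen as corrections applied on top of this default output.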

6.1.2. Schwa rule. The grapheme ‘a’ at the end of a word is pronounced as /ə/. This rule applies to ‘old’ Malay words. For example, the word suka is pronounced as /s u k ə/. If the root word is combined with a suffix, the ‘a’ at the end of the root word is still pronounced as /ə/. For example, the word sukakan, which is formed by adding the suffix -kan to the root word suka, is pronounced as /s u k ə k a n/. The rule does not apply to words borrowed from English or to proper nouns. However, the current schwa rule cannot tell borrowed words and proper nouns apart from other words, so it is applied to all words. A workaround is possible because the number of words to which the rule applies is finite, so these words can be identified manually. Many speakers also do not use this rule at all.

6.1.3. Glottal stop insertion rule. A glottal stop /?/ is inserted between the two vowels of particular vowel grapheme sequences in a word (Table 5).

Table 5. Glottal stop insertion rules

Grapheme sequence  Word  Pronunciation
aa                 taat  /t a ? a t/
oa                 doa   /d o ? a/
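The two rules above can be sketched as small functions over a list of phonemes. This is an illustration under stated assumptions: the function names and the `root_length` parameter (the number of phonemes in the root, which a morphological analysis would supply) are hypothetical, and only the two vowel sequences listed in Table 5 are covered.

```python
# Vowel pairs that receive a glottal stop between them (Table 5).
GLOTTAL_PAIRS = {("a", "a"), ("o", "a")}


def schwa_rule(phonemes, root_length=None):
    """Turn the root-final /a/ into /ə/; the root defaults to the whole word."""
    out = list(phonemes)
    last = (root_length or len(out)) - 1
    if out and out[last] == "a":
        out[last] = "ə"
    return out


def glottal_stop_insertion(phonemes):
    """Insert /?/ between the two vowels of the listed sequences."""
    out = []
    for phoneme in phonemes:
        if out and (out[-1], phoneme) in GLOTTAL_PAIRS:
            out.append("?")
        out.append(phoneme)
    return out


print(" ".join(schwa_rule(list("suka"))))
print(" ".join(schwa_rule(list("sukakan"), root_length=4)))
print(" ".join(glottal_stop_insertion(list("taat"))))
```

Passing the root length explicitly keeps the schwa rule independent of how the suffix was detected.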

6.1.4. General glottal stop rule. The grapheme ‘k’ at the end of a syllable is converted to a glottal stop /?/. For example, tidak is pronounced as /t i d a ?/.

6.1.5. Final ‘r’ deletion rule. The grapheme ‘r’ at the end of a word is not pronounced. For example, sukar is pronounced as /s u k a/. Some speakers, however, do pronounce this final ‘r’.
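A minimal sketch of these two rules follows, assuming the word has already been syllabified and mapped to phonemes. The function names and the input representation (a list of syllables, each a list of phonemes) are assumptions for illustration.

```python
def syllable_final_k(syllables):
    """Replace a syllable-final /k/ with /?/ and return the flat phoneme list."""
    phonemes = []
    for syllable in syllables:
        if syllable and syllable[-1] == "k":
            syllable = syllable[:-1] + ["?"]
        phonemes.extend(syllable)
    return phonemes


def final_r_deletion(phonemes):
    """Drop a word-final /r/."""
    if phonemes and phonemes[-1] == "r":
        return list(phonemes[:-1])
    return list(phonemes)


print(" ".join(syllable_final_k([["t", "i"], ["d", "a", "k"]])))
print(" ".join(final_r_deletion(list("sukar"))))
```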

6.1.6. Glide insertion rule. This rule inserts a glide between the two vowels of particular vowel grapheme sequences in a word. For example, buah is pronounced as /b u w a h/, and siap is pronounced as /s i j a p/.

6.1.7. Last syllable rule. For words with more than one syllable, the graphemes ‘u’ and ‘i’ in the last syllable are converted to the phonemes /o/ and /e/ respectively when they are followed by certain graphemes (Table 6). As with the schwa rule, if a suffix is appended to the root word, the rule still applies to the last syllable of the root word.

Table 6. Last syllable rules

Target grapheme  Following grapheme   Word   Pronunciation
u                k, h, p, m, ng, r    hidup  /h i d o p/
i                k, l, t, h, r, t, k  bilik  /b i l e ?/
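The glide insertion and last syllable rules might be sketched as below. Several points are assumptions rather than details from the paper: the glide table contains only the two vowel pairs shown in the examples (‘ua’ and ‘ia’), the repeated entries in the ‘i’ row of Table 6 are merged, words are passed in already syllabified with dots, and suffixed words are not handled.

```python
# Following graphemes that trigger the change, per Table 6.
U_FOLLOWERS = ("ng", "k", "h", "p", "m", "r")
I_FOLLOWERS = ("k", "l", "t", "h", "r")
# Glide inserted between two vowels; only the pairs exemplified in the text.
GLIDES = {("u", "a"): "w", ("i", "a"): "j"}


def insert_glides(phonemes):
    """Insert /w/ or /j/ between the listed vowel pairs."""
    out = []
    for phoneme in phonemes:
        if out and (out[-1], phoneme) in GLIDES:
            out.append(GLIDES[(out[-1], phoneme)])
        out.append(phoneme)
    return out


def last_syllable_rule(syllabified):
    """Lower 'u' to 'o' and 'i' to 'e' in the last syllable of a polysyllabic word."""
    syllables = syllabified.split(".")
    if len(syllables) < 2:
        return syllabified  # the rule needs more than one syllable
    last = syllables[-1]
    for vowel, lowered, followers in (("u", "o", U_FOLLOWERS), ("i", "e", I_FOLLOWERS)):
        for follower in followers:
            if last.endswith(vowel + follower):
                last = last[:-len(vowel + follower)] + lowered + follower
                break
    syllables[-1] = last
    return ".".join(syllables)


print(" ".join(insert_glides(list("buah"))))
print(last_syllable_rule("hi.dup"))
print(last_syllable_rule("bi.lik"))
```

In a full pipeline the output for ‘bilik’ would still pass through the general glottal stop rule, which turns the final ‘k’ into /?/.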

6.1.8. Duplicate grapheme rule. Two identical adjacent graphemes are converted to a single phoneme. This rule is used mostly for proper nouns. For example, Azzam is pronounced as /a z a m/.

6.2. Variant Malay rules

We create a set of pronunciation variants simply by removing the schwa rule and the final ‘r’ deletion rule from the standard Malay rules, because some speakers do not use these rules in their speech.
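One way to picture this is a rule pipeline in which each rule has a name and can be skipped. The sketch below is an assumption about how such a pipeline could look, not a description of the actual tool; it includes only the two rules that are removed for the variants, and all names are hypothetical.

```python
def schwa(phonemes):
    """Word-final /a/ becomes /ə/."""
    if phonemes and phonemes[-1] == "a":
        return phonemes[:-1] + ["ə"]
    return list(phonemes)


def final_r_deletion(phonemes):
    """Word-final /r/ is dropped."""
    if phonemes and phonemes[-1] == "r":
        return phonemes[:-1]
    return list(phonemes)


# Ordered rule list; the schwa rule runs before final 'r' deletion so that
# 'sukar' gives /s u k a/ rather than /s u k ə/.
STANDARD_RULES = [("schwa", schwa), ("final_r_deletion", final_r_deletion)]
VARIANT_SKIPS = {"schwa", "final_r_deletion"}


def pronounce(phonemes, skip=frozenset()):
    """Apply every rule whose name is not in `skip`."""
    for name, rule in STANDARD_RULES:
        if name not in skip:
            phonemes = rule(phonemes)
    return phonemes


def pronunciations(phonemes):
    """Return the standard pronunciation plus a variant if it differs."""
    standard = pronounce(phonemes)
    variant = pronounce(phonemes, skip=VARIANT_SKIPS)
    return [standard] if variant == standard else [standard, variant]


for word in ("suka", "sukar", "bantu"):
    print(word, [" ".join(p) for p in pronunciations(list(word))])
```

Keeping the rules in a named list is what makes adding or removing a rule a one-line change.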

6.3. English words

Since English words can appear in Malay text, a different approach is used to generate their pronunciations. First, the unknown word is looked up in an English vocabulary (taken from an English pronunciation dictionary). If the word is found, its English pronunciation is converted to a Malay equivalent by mapping each English phoneme to the ‘nearest’ Malay phoneme based on perception (Table 7). The exception is some diphthongs, which are mapped to two phonemes. Studies have shown that non-native speakers tend to replace a phoneme of the target language with a phoneme of their native language. The English pronunciation dictionary used was Hub4 from CMU. A word may exist in both English and Malay, so for words found in the English pronunciation dictionary, the resulting pronunciations appear as variants.

Table 7. Correspondence of English and Malay phonemes

English   Malay      English   Malay
ɑ         a          k         k
ɒ         o          l         l
æ         e          m         m
ʌ         ə          n         n
ɔ         o          ŋ         ŋ
aʊ        aw         əʊ        ow
aɪ        aj         ɔɪ        oj
b         b          p         p
tʃ        tʃ         r         r
d         d          s         s
ð         d          ʃ         ʃ
e         e          t         t
ə, ɜ      ə          θ         t
eɪ        ej         ʊ         u
f         f          u         u
g         g          v         v
h         h          w         w
i         i          j         j
I (long)  i          z         z
                     dʒ        dʒ
                     ʒ         z
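The mapping step can be illustrated with a small lookup, assuming the English pronunciation is already available as a list of IPA phonemes (converting a dictionary's own notation to IPA is outside this sketch). The dictionary below lists only the English phonemes that Table 7 maps to a different Malay phoneme, with the symbols as read from that table; any phoneme not listed is assumed to be shared and is kept unchanged. The names `EN_TO_MALAY` and `to_malay` are hypothetical.

```python
# English phonemes whose nearest Malay phoneme differs (read from Table 7).
# Diphthongs map to two Malay phonemes.
EN_TO_MALAY = {
    "ɑ": ["a"], "ɒ": ["o"], "æ": ["e"], "ʌ": ["ə"], "ɔ": ["o"],
    "ɜ": ["ə"], "ʊ": ["u"],
    "aʊ": ["a", "w"], "aɪ": ["a", "j"], "eɪ": ["e", "j"],
    "əʊ": ["o", "w"], "ɔɪ": ["o", "j"],
    "θ": ["t"], "ð": ["d"], "ʒ": ["z"],
}


def to_malay(english_phonemes):
    """Replace each English phoneme with its nearest Malay phoneme(s)."""
    malay = []
    for phoneme in english_phonemes:
        # Assumption: phonemes missing from the table are shared by both languages.
        malay.extend(EN_TO_MALAY.get(phoneme, [phoneme]))
    return malay


print(" ".join(to_malay(["k", "eɪ", "k"])))   # English 'cake'
print(" ".join(to_malay(["m", "aʊ", "θ"])))   # English 'mouth'
```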

7. Evaluation

For the evaluation on read speech, the MASS Malay speech corpus [12] was used. The speech corpus consists of about 70 hours of read speech from 90 speakers. The audio files were divided into a training part and a testing part. The Sphinx 3 automatic speech recognition system from CMU was used for the evaluation. The language model was a trigram model created from a text corpus of 500MB. The acoustic model was a continuous HMM with 3000 states and 16 Gaussian mixtures per state. A pronunciation dictionary of about thirty thousand words was created using our G2P tool.

We tested the pronunciation dictionary created automatically with the G2P tool against a pronunciation dictionary created semi-automatically. For the semi-automatic dictionary, an expert corrected the pronunciations that were not generated correctly, especially the conversion of the grapheme ‘e’ to /e/ and cases where the schwa rule was applied incorrectly. With the automatically generated dictionary, we achieved a WER of 16.5%, while with the semi-automatically generated dictionary our current system achieved a WER of 14.6%, an 11.5% relative improvement ((16.5 - 14.6) / 16.5 ≈ 0.115).

8. Conclusion

The results show that the automatically generated pronunciation dictionary performed only slightly worse than the pronunciation dictionary that was created semi-automatically. However, they also show that there is still room for improvement. First, for the mapping of the grapheme ‘e’ to the phoneme /e/, one possible way to reduce the mismatch is to force-align the grapheme ‘e’ to either /ə/ or /e/. This approach, however, solves only part of the problem. The second improvement is to identify the words to which the schwa rule should and should not be applied. As discussed earlier, one way is to determine manually the words to which the rule applies. This would eliminate some unnecessary variants from the dictionary. Third, we should verify the English to Malay phoneme mapping to make sure that it is applied correctly. We may even improve the mapping by taking into account the context in which the English phoneme occurs. A fourth possible improvement is to determine from the original text whether a word found in the English pronunciation dictionary is really an English word or a Malay word.

9. References

[1] Wikipedia, http://en.wikipedia.org/wiki/.

[2] J. Flege, "Second Language Speech Learning Theory, Findings, and Problems," in Speech Perception and Linguistic Experience: Issues in Cross-Language Research, W. Strange, Ed.: Baltimore: York Press, 1995, pp. 233-277.
[3] A. Hussain, M. Othman and Z. A. Md. Shariff, “Recurrent backpropagation neural subnetworks for phoneme based Malay speech recognition”, ISPACS 97, Malaysia, 1997.
[4] S. A. R. Al-Haddad, S. A. Samad, A. Hussain, K. A. Ishak, “Isolated Malay Digit Recognition Using Pattern Recognition Fusion of Dynamic Time Warping and Hidden Markov Models”, American Journal of Applied Sciences, 5(6), 2008, pp. 714-720.
[5] S. A. R. Al-Haddad, S. A. Samad, A. Hussain, “Automatic Segmentation and Labeling for Malay Speech Recognition”, WSEAS Transactions on Signal Processing, 9(2), 2006, pp. 1337-1341.
[6] Y. A. El-Imam, Z. M. Don, “Text-to-speech conversion of standard Malay”, International Journal of Speech Technology 3, Kluwer Academic Publishers, 2000, pp. 129-146.
[7] H. N. Ting, J. Yunus, S. H. S. Salleh, “Classification of Malay speech sounds based on place of articulation and voicing using neural networks”, Proceedings of IEEE International Conference on Electrical Technology, vol. 1, 2001, pp. 170-173.
[8] H. N. Ting, J. Yunus and S. H. S. Salleh, “Speaker-independent Malay syllable recognition using singular and modular neural network”, Jurnal Teknologi, 35(D), 2001, pp. 65-76.
[9] K. L. Wai, H. O. Siew and R. Zainuddin, “Building a unit selection speech synthesiser for Malay language using FESTVOX and hidden Markov model toolkit (HTK)”, Chiang Mai University Journal of Natural Sciences, 6(1), 2007, pp. 149-158.
[10] Y. M. Maris, The Malay Sound System. Malaysia: Siri Teks Fajar Bakti, 1979.
[11] B. Ranaivo-Malançon, "Computational Analysis of Affixed Words in Malay Language," Internal Publication, USM, 2004.
[12] T-P. Tan, H. Lee, E. K. Tang, X. Xiong, E. S. Chng, "MASS: A Malay Language LVCSR Corpus Resource", Cocosda’09, Urumqi, 2009 (submitted).
