AUTOMATIC SPEECH RECOGNITION: HUMAN COMPUTER INTERFACE FOR KINYARWANDA LANGUAGE
Muhirwe Jackson BSc (Mak)
A Project Report Submitted in Partial Fulfilment of the Requirements for the Award of the Degree Master of Science in Computer Science of Makerere University
August 2005
DECLARATION I, Muhirwe Jackson, do hereby declare that this project report is my original work and that it has never been submitted for any award of a degree in any institution of higher learning.
Signed: .......................................................... Date: ........................................... Muhirwe Jackson, Candidate.
APPROVAL This report has been submitted for examination with my approval as supervisor.
Signed: .......................................................... Date: ........................................... Dr. Jehopio Peter, Ph.D. Supervisor.
DEDICATION To the prince of peace, my Lord and savior Jesus Christ Let it be said of me that my source of strength is Christ alone.
To my wife, Yvonne Muhirwe who has greatly encouraged and supported me during my studies.
To my children: who always bring joy to my life.
To my mum: Ms Mukankuliza Joyce Who has wonderfully supported and encouraged throughout my education: There’s no mother like you.
To my Brothers and sister I love you all
”I can do all things through Christ which strengtheneth me.” Phil 4:13 KJV
ACKNOWLEDGEMENT Success in life is never attained single handedly. I would like to express my heartfelt gratitude to my God almighty who revealed Himself to me through the Holy spirit and has since been my source of strength and wisdom. I wish to extend thanks to my supervisor, Dr. Peter Jehopio for the professional guidance that has enabled me accomplish this research. I also wish to extend my sincere thanks to the dean Faculty of computing and Information technology Dr. Baryamureeba Venansius for all the support he has provided to me both morally and financially without which this project may not have been a success. I would like to appreciate my wife, Yvonne Muhirwe for being such a wonderful, loving and understanding wife. Thanks for giving me space and time to dedicate to my studies, my success is your success. I extend my thanks and appreciation to the Rector of the Kigali Institute of Education, Mr. Mudidi Emmanuel for having faith in me and for all the support he provided to me at the beginning of the course. I extend my thanks and appreciation to the Rwandan Government through the Student Financing Agency for Rwanda (SFAR) for sponsoring me for the entire course. My sincere appreciation goes to Makerere University Faculty of computing and Information Technology staff, more especially Paul Bagenda, and Kanagwa Ben for their technical support. Lastly but not least, I acknowledge all my lecturers and all my classmates on the computer science programmes for having made my academic and social life comfortable at Makerere University.
MAY GOD BLESS YOU ABUNDANTLY
LIST OF ACRONYMS/ABBREVIATIONS

LVCSR   Large Vocabulary Continuous Speech Recognition
ASR     Automatic Speech Recognition
TTS     Text-to-Speech
IVR     Interactive Voice Response
HCI     Human Computer Interaction
I/O     Input and Output
SU      Speech Understanding
GUI     Graphical User Interface
DVI     Direct Voice Input
HMM     Hidden Markov Models
HTK     Hidden Markov Model Toolkit
BNF     Backus-Naur Form
SLF     Standard Lattice Format
MLF     Master Label Files
MFCC    Mel Frequency Cepstral Coefficients
Contents

TITLE PAGE
DECLARATION
APPROVAL
DEDICATION
ACKNOWLEDGEMENT
LIST OF ACRONYMS/ABBREVIATIONS
LIST OF FIGURES
ABSTRACT

1 INTRODUCTION
  1.1 Background to the Study
  1.2 Statement of the Problem
  1.3 Objectives of the Study
    1.3.1 General Objective
    1.3.2 Specific Objectives
  1.4 Scope
  1.5 Significance of the Study

2 Literature Review
  2.1 Current State of ASR Technology and its Implications for Design
  2.2 Types of ASR
  2.3 Speech Recognition Techniques
  2.4 Matching Techniques
  2.5 Corpora
  2.6 Problems in Designing Speech Recognition Systems
  2.7 Similar Projects Carried out

3 METHODOLOGY
  3.1 Data Preparation
    3.1.1 The Task Grammar
    3.1.2 A Pronunciation Dictionary
    3.1.3 Recording
    3.1.4 Phonetic Transcription
    3.1.5 Encoding the Data
  3.2 Parameter Estimation (Training)
    3.2.1 Training Strategies
    3.2.2 HMM Definition
    3.2.3 HMM Training
    3.2.4 Training
  3.3 Recognition
  3.4 Running the Recognizer Live

4 RESULTS
  4.1 Performance Test
  4.2 Performance Analysis
  4.3 Testing the System on Live Data

5 DISCUSSION, CONCLUSION AND RECOMMENDATIONS
  5.1 Discussion
  5.2 Conclusion
  5.3 Areas for Further Study

REFERENCES

APPENDICES
  Appendix A: Word Network
  Appendix B: Training Sentences
  Appendix C: Master Label Files
  Appendix D: Training Data
  Appendix E: HMM Definitions
  Appendix F: VarFloor1
  Appendix G: Recognition Output
  Appendix H: Testing Data
List of Figures

3.1 Components of an ASR system
3.2 Grammar for voice dialling
3.3 Process of creating a word lattice
3.4 Recording and labelling data using hslab
3.5 Training HMMs
3.6 Training isolated whole word models
3.7 HMM training process
3.8 Speech recognition process
4.1 Speech recognition results
4.2 Live data recognition results
ABSTRACT

The main purpose of the study was to develop an automatic speech recogniser for Kinyarwanda language. The products of the study include an automatic phone dialling speech corpus, a Kinyarwanda digit speech recogniser, and a recipe for building HMM speech recognisers, especially for Kinyarwanda language. Two different corpora of audio recordings of indigenous Kinyarwanda language speakers were collected, in which subjects read aloud numeric digits. One of the collected corpora contained the training data and the other the testing data. The system was implemented using the HMM toolkit HTK by training HMMs of the words making up the vocabulary on the training data. The trained system was tested on data other than the training data, and the results revealed that 94.87% of the tested data were correctly recognized. The developed system can be used by developers and researchers interested in speech recognition for Kinyarwanda language and any other related African language. The findings of the study can be generalized to cater for large vocabularies and for continuous speech recognition.
Chapter 1 INTRODUCTION

1.1 Background to the Study
Speech is one of the oldest and most natural means of information exchange between human beings. We as humans speak and listen to each other in the human-human interface. For centuries people have tried to develop machines that can understand and produce speech as humans do so naturally (Pinker, 1994 [20]; Deshmukh et al., 1999 [5]). Obviously such an interface would yield great benefits (Kandasamy, 1995) [12]. Attempts have been made to develop vocally interactive computers to realise voice/speech recognition. In this case a computer can recognize text and give out a speech output (Kandasamy, 1995) [12].

Voice/speech recognition is a field of computer science that deals with designing computer systems that recognize spoken words. It is a technology that allows a computer to identify the words that a person speaks into a microphone or telephone. Speech recognition can be defined as the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words (Zue et al., 1996 [36]; Mengjie, 2001 [17]).

Automatic speech recognition (ASR) is one of the fastest developing fields in the framework of speech science and engineering. As the new generation of computing technology, it comes as the next major innovation in man-machine interaction, after the functionality of text-to-speech (TTS), supporting interactive voice response (IVR) systems. The first attempts (during the 1950s) to develop techniques in ASR, which were based on the direct conversion of the speech signal into a sequence of phoneme-like units, failed. The
first positive results of spoken word recognition came in the 1970s, when general pattern matching techniques were introduced. As the extension of their applications was limited, the statistical approach to ASR started to be investigated in the same period. Nowadays, statistical techniques prevail in ASR applications, and common speech recognition systems can recognize thousands of words. The last decade has witnessed dramatic improvement in speech recognition technology, to the extent that high performance algorithms and systems are becoming available. In some cases, the transition from laboratory demonstration to commercial deployment has already begun (Zue et al., 1996) [36].

One reason for this evolution and improvement of ASR is that it has many applications in many aspects of our daily life, for example telephone applications, applications for the physically handicapped and illiterates, and many others in the area of computer science. Speech recognition is considered as an input as well as an output during Human Computer Interaction (HCI) design. HCI involves the design, implementation and evaluation of interactive systems in the context of the users' task and work (Dix et al., 1998) [6]. The list of applications of automatic speech recognition is long and growing; some known applications include virtual reality, multimedia searches, auto-attendants, travel information and reservation, translators, natural language understanding and many more (Scansoft, 2004 [27]; Robertson, 1998 [24]).

Speech technology is the technology of today and tomorrow, with a growing number of methods and tools for better implementation. Speech recognition has a number of practical implementations for both fun and serious work. Automatic speech recognition has an interesting and useful implementation in expert systems, a technology whereby computers can act as a substitute for a human expert. An intelligent computer that acts, responds or thinks like a human being can be equipped with an automatic speech recognition module that enables it to process spoken information. Medical diagnostic systems, for example, can diagnose a patient by asking a set of questions, the patient responding with answers, and the system responding with what might be a possible disease.
1.2 Statement of the Problem
As the use of ICT tools, especially the computer, is becoming inevitable, there are many Rwandans who are left out due to inadequate human computer interface (HCI) design considerations. A case in point is the many Rwandans who are left out due to the language barrier (Earth trends, 2003) [8]. These people can only read and write in their mother tongue, Kinyarwanda, making it impossible for them to use conventional ICT tools that are built in the two international languages, English and French, used in Rwanda. The purpose of this project was therefore to design and train a speech recognition system that could be used by application developers to develop applications that will take indigenous Kinyarwanda language speakers aboard the current information and communication technologies, to fast-track the benefits of ICT.
1.3 Objectives of the Study

1.3.1 General Objective
The general objective of the project was to develop an automatic speech recogniser for Kinyarwanda language.
1.3.2 Specific Objectives
The specific objectives of the project are:
i. To critically review literature related to ASR.
ii. To identify speech corpus elements exhibited in African languages such as Kinyarwanda language.
iii. To build a Kinyarwanda language speech corpus for a voice operated telephone system.
iv. To implement an isolated whole word speech recognizer that is capable of recognizing and responding to speech.
v. To train the above developed system in order to make it speaker independent.
vi. To validate the automatic speech recognizer developed during the study.
1.4 Scope
The project was limited to isolated whole words, trained and tested on only one (1) word sentences consisting of the numeric digits 0 to 9 that could be used in operating a voice operated telephone system. Human speech is inherently a multimodal process that involves the analysis of the uttered acoustic signal and includes higher level knowledge sources such as grammar, semantics and pragmatics (Dupont, 2000) [7]. This research focused only on the acoustic signal processing, ignoring the visual input.
1.5 Significance of the Study
The proposed research has theoretical, practical, and methodological significance:
i. The speech corpus developed will be very useful to any researcher who may wish to venture into Kinyarwanda language automatic speech recognition.
ii. By developing and training a speech recognition system in Kinyarwanda language, the semi illiterate would be able to use it in accessing IT tools. This would help bridge the digital divide, since Rwanda is a monolingual nation with a population of about 8 million (Earth trends, 2003) [8], all speaking Kinyarwanda language.
iii. Since speech technology is the technology of today and tomorrow, the results of this research will help many indigenous Kinyarwanda language speakers who are scattered all over the great lakes region to take advantage of the many benefits of ICT.
iv. The technology will find applicability in systems such as banking, telecommunications, transport, Internet portals, accessing PCs, emailing, administrative and public services, cultural centres and many others.
v. The built system will be very useful to computer manufacturers and software developers as they will have a speech recognition engine to include Kinyarwanda language in their applications.
vi. By developing and training a speech recognition system in Kinyarwanda language, this would mark the first step towards making ICT tools more usable by the blind and elderly people with seeing disabilities.
Chapter 2 Literature Review

Human computer interaction, as defined in the background, is concerned with the ways users (humans) interact with computers. Some users can interact with the computer using the traditional methods of a keyboard and mouse as the main input devices and the monitor as the main output device. For one reason or another, some users cannot interact with machines using a mouse and keyboard (Rudnicky et al., 1993) [26], hence the need for special devices. Speech recognition systems help users who in one way or another cannot use the traditional Input and Output (I/O) devices. For about four decades human beings have been dreaming of an "intelligent machine" which can master natural speech (Picheny, 2002) [19]. In its simplest form, this machine should consist of two subsystems, namely automatic speech recognition (ASR) and speech understanding (SU) (Reddy, 1976) [23]. The goal of ASR is to transcribe natural speech while that of SU is to understand the meaning of the transcription. Recognizing and understanding a spoken sentence is obviously a knowledge-intensive process, which must take into account all variable information about the speech communication process, from acoustics to semantics and pragmatics.
2.1 Current State of ASR Technology and its Implications for Design
The design of user interfaces for speech-based applications is dominated by the underlying ASR technology. More often than not, design decisions are based more on the kind of recognition
the technology can support rather than on the best dialogue for the user (Mane et al., 1996) [16]. The type of design will depend, broadly, on the answer to this question: what type of speech input can the system handle, and when can it handle it? When isolated words are all the recognizer can handle, then the success of the application will depend on the ability of designers to construct dialogues that lead the user to respond using single words. Word spotting and the ability to support more complex grammars open up additional flexibility in the design, but can make the design more difficult by allowing a more diverse set of responses from the user. Some current systems allow a limited form of natural language input, but only within a very specific domain at any particular point in the interaction. Even in these cases, the prompts must constrain the natural language within acceptable bounds. No systems allow unconstrained natural language interaction, and it is important to note that most human-human transactions over the phone do not permit unconstrained natural language either. Typically, a customer service representative will structure the conversation by asking a series of questions.

With "barge-in" (also called "cut-through") (Mane et al., 1996) [16], a caller can interrupt prompts and the system will still be able to process the speech, although recognition performance will generally be lower. This obviously has a dramatic influence on the prompt design, because when barge-in is available it is possible to write longer, more informative prompts and let experienced users barge in. Interruptions are very common in human-human conversations, and in many applications designers have found that without barge-in people often have problems. There are a variety of situations, however, in which it may not be possible to implement barge-in. In these cases, it is still usually possible to implement successful applications, but particular care must be taken in the dialogue design and error messages.

Another situation in which technology influences design involves error recovery. It is especially frustrating when a system makes the same mistake twice, but when the active vocabulary can be updated dynamically, recognizer choices that have not been confirmed can be eliminated, and the recognizer will never make the same mistake twice. Also, when more than one choice is available (this is not always the case, as some recognizers return only the top choice), then after the top choice is disconfirmed, the second choice can be presented.
2.2 Types of ASR
ASR products have existed in the marketplace since the 1970s. However, early systems were expensive hardware devices that could only recognize a few isolated words (i.e. words with pauses between them), and needed to be trained by users repeating each of the vocabulary words several times. The 1980s and 90s witnessed a substantial improvement in ASR algorithms and products, and the technology developed to the point where, in the late 1990s, software for desktop dictation became available 'off-the-shelf' for only a few tens of dollars. From a technological perspective it is possible to distinguish between two broad types of ASR: 'direct voice input' (DVI) and 'large vocabulary continuous speech recognition' (LVCSR). DVI devices are primarily aimed at voice command-and-control, whereas LVCSR systems are used for form filling or voice-based document creation. In both cases the underlying technology is more or less the same. DVI systems are typically configured for small to medium sized vocabularies (up to several thousand words) and might employ word or phrase spotting techniques. Also, DVI systems are usually required to respond immediately to a voice command. LVCSR systems involve vocabularies of perhaps hundreds of thousands of words, and are typically configured to transcribe continuous speech. Also, LVCSR need not be performed in real time; for example, at least one vendor has offered a telephone-based dictation service in which the transcribed document is e-mailed back to the user. Specific examples of applications of ASR include, but are not limited to, the following:
i. Large vocabulary dictation - for RSI sufferers and quadriplegics, and for formal document preparation in legal or medical services.
ii. Interactive voice response - for callers who do not have tone pads, for the automation of call centers, and for access to information services such as stock market quotes.
iii. Telecom assistants - for repertory dialling and personal management systems.
iv. Process and factory management - for stocktaking, measurement and quality control.
2.3 Speech Recognition Techniques
Speech recognition techniques are the following:
i. Template based matching approaches (Rabiner et al., 1979) [22]: unknown speech is compared against a set of pre-recorded words (templates) in order to find the best match. This has the advantage of using perfectly accurate word models, but it also has the disadvantage that pre-recorded templates are fixed, so variations in speech can only be modelled by using many templates per word, which eventually becomes impractical. Dynamic time warping is a typical such approach (Tolba et al., 2001) [31]. In this approach, the templates usually consist of representative sequences of feature vectors for the corresponding words. The basic idea is to align the utterance to each of the template words and then select the word or word sequence that gives the best match. For each utterance, the distances between the template and the observed feature vectors are computed using some distance measure, and these local distances are accumulated along each possible alignment path. The lowest scoring path then identifies the optimal alignment for a word, and the word template obtaining the lowest overall score is taken as the recognised word or sequence of words (a small illustrative sketch of this alignment is given after this list).

ii. Knowledge based approaches: expert knowledge about variations in speech is hand coded into a system. This has the advantage of modelling variations in speech explicitly, but unfortunately such expert knowledge is difficult to obtain and use successfully, so this approach was judged to be impractical and automatic learning procedures were sought instead.

iii. Statistical based approaches, in which variations in speech are modelled statistically using an automatic, statistical learning procedure, typically Hidden Markov Models (HMMs). This approach represents the current state of the art. The main disadvantage of statistical models is that they must make a priori modelling assumptions, which are liable to be inaccurate, handicapping the system performance. In recent years, a new approach to the challenging problem of conversational speech recognition has emerged, holding a promise to overcome some fundamental limitations of the conventional Hidden Markov Model (HMM) approach (Bridle et al., 1998 [2]; Ma and Deng, 2004 [14]).
This new approach is a radical departure from the current HMM-based statistical modeling approaches. Rather than using a large number of unstructured Gaussian mixture components to account for the tremendous variation in the observable acoustic data of highly coarticulated spontaneous speech, the new speech model that Ma and Deng (2004) [15] have developed provides a rich structure for the partially observed (hidden) dynamics in the domain of vocal-tract resonances.

iv. Learning based approaches: to overcome the disadvantages of HMMs, machine learning methods could be introduced, such as neural networks and genetic algorithms/programming. In these machine learning models, explicit rules (or other domain expert knowledge) do not need to be given; they can be learned automatically through emulation or an evolutionary process.

v. The artificial intelligence approach attempts to mechanise the recognition procedure according to the way a person applies intelligence in visualizing, analysing, and finally making a decision on the measured acoustic features. Expert systems are used widely in this approach (Mori et al., 1987) [18].
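As an aside, the alignment idea behind dynamic time warping mentioned in (i) above can be sketched in a few lines of Python with NumPy. This sketch is purely illustrative and is not part of the system built in this project (which uses HMMs); the function name and the choice of a Euclidean local distance are the writer's assumptions.

    import numpy as np

    def dtw_distance(template, utterance):
        # template, utterance: arrays of shape (frames, coefficients), e.g. MFCC vectors
        n, m = len(template), len(utterance)
        # local (frame-to-frame) Euclidean distances
        local = np.linalg.norm(template[:, None, :] - utterance[None, :, :], axis=2)
        acc = np.full((n, m), np.inf)        # accumulated distance along the best path
        acc[0, 0] = local[0, 0]
        for i in range(n):
            for j in range(m):
                if i == 0 and j == 0:
                    continue
                prev = min(
                    acc[i - 1, j] if i > 0 else np.inf,                  # advance template only
                    acc[i, j - 1] if j > 0 else np.inf,                  # advance utterance only
                    acc[i - 1, j - 1] if (i > 0 and j > 0) else np.inf,  # advance both
                )
                acc[i, j] = local[i, j] + prev
        return acc[-1, -1]

    # usage sketch: the template with the lowest accumulated distance gives the recognised word
    # scores = {word: dtw_distance(tmpl, observed) for word, tmpl in templates.items()}

The word whose template yields the lowest accumulated distance is then taken as the recognised word.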
2.4 Matching Techniques
Speech-recognition engines match a detected word to a known word using one of the following techniques (Svendsen et al., 1989) [29].
i. Whole-word matching. The engine compares the incoming digital-audio signal against a prerecorded template of the word. This technique takes much less processing than sub-word matching, but it requires that the user (or someone) prerecord every word that will be recognized - sometimes several hundred thousand words. Whole-word templates also require large amounts of storage (between 50 and 512 bytes per word) and are practical only if the recognition vocabulary is known when the application is developed.

ii. Sub-word matching. The engine looks for sub-words - usually phonemes - and then performs further pattern recognition on those. This technique takes more processing
than whole-word matching, but it requires much less storage (between 5 and 20 bytes per word). In addition, the pronunciation of the word can be guessed from English text without requiring the user to speak the word beforehand.
Svendsen et al. (1989) [29], Rabiner et al. (1981) [22], and Wilpon et al. (1988) [34] note that although research in the area of automatic speech recognition has been pursued for the last three decades, only whole-word based speech recognition systems have found practical use and become commercial successes. Though whole word models have become a success, the researchers mentioned above all agree that they still suffer from two major problems, namely co-articulation problems and the large amount of training required to build a good recognizer.
2.5 Corpora
To build any speech engine, whether a speech recognition engine or a speech synthesis engine, you need a corpus. Corpora are any collections of text and/or speech, and are used as a basis of statistical processing of natural language (Jurafsky and Martin, 2000) [10]. There are various kinds of corpora: tagged or untagged; monolingual or multilingual; balanced or specialized. For example, one of the largest and best-known corpora, the British National Corpus (Warwick, 1997) [32], consists of 100 million words of written (about 90%) and speech (about 10%) data collected from modern British English, covering a variety of styles and subjects. A speech corpus could be specialised, containing only telephone data (Cole et al., 1992) [4], names, names of places, etc. Developing a speech corpus may involve data collection and transcription (Cole et al., 1994) [3].
2.6 Problems in Designing Speech Recognition Systems
ASR has proved to be no easy task. According to Rudnicky et al. (1993) [26], the main challenge in the implementation of ASR on desktops is the current existence of mature and efficient alternatives, the keyboard and mouse. In the past years, speech researchers have found several difficulties that contrast with the optimism of the first speech technology
pioneers. Reddy (1976) [23], in his review of speech recognition by machines, says that the problems in designing ASR are due to the fact that it is related to so many other fields such as acoustics, signal processing, pattern recognition, phonetics, linguistics, psychology, neuroscience, and computer science. All these problems can be described according to the tasks to be performed:

i. Number of speakers: with more than one speaker, an ASR system must cope with the difficult problem of speech variability from one speaker to another. This is usually addressed through the use of a large speech database as training data (Huang et al., 2004) [9].

ii. Nature of the utterance: isolated word recognition imposes on the speaker the need to insert artificial pauses between successive utterances. Continuous speech recognition systems are able to cope with natural speech utterances in which words may be tied together and may at times be strongly affected by co-articulation. Spontaneous speech recognition systems allow the possibility of pauses and false starts in the utterance, the use of words not found in the lexicon, etc.

iii. Vocabulary size: in general, increasing the size of the vocabulary decreases the recognition scores.

iv. Differences between speakers due to sex, age, accent and so on.

v. Language complexity: the task of continuous speech recognisers is simplified by limiting the number of possible utterances through the imposition of syntactic and semantic constraints.

vi. Environment conditions: the sites for real applications often present adverse conditions (such as noise, distorted signals, and transmission line variability) which can drastically degrade system performance.
2.7 Similar Projects Carried out
African Speech Technology is the working title of a 3-year project promoting the development of the official languages of South Africa through language and speech technology applications
at the University of Stellenbosch. So far they have covered South African English, isiZulu, isiXhosa, Sesotho and Afrikaans (Roux et al., 2000) [25] . While African Speech Technology and other research centers are engaged in speech technology research, there is still a long way to go in automatic speech recognition of many indigenous languages in Africa. Most of what is done in automatic speech recognition worldwide revolves around the many English dialects and major languages of the northern hemisphere.
Chapter 3 METHODOLOGY

This chapter gives a full description of how the Kinyarwanda language speech recognition system was developed. The goal of the project was to build a robust whole word recognizer. That means it should be able to generalise from speaker specific properties, and its training should be more than just instance based learning. In the HMM paradigm this is supposed to be the case, but the researcher intended to put this into practice. As the time scope was limited, and to be able to focus on more specific issues than HMMs in general, the Hidden Markov Model toolkit (HTK) was used. HTK is a toolkit for building Hidden Markov Models (HMMs). HMMs can be used to model any time series and the core of HTK is similarly general-purpose. However, HTK is primarily designed for building HMM-based speech processing tools, in particular recognisers (Young et al., 2002) [35]. Secondly, to reduce the difficulties of the task, a very limited language model was used; future research can be directed to more extensive language models. In ASR systems acoustic information is sampled as a signal suitable for processing by computers and fed into a recognition process. The output of the system is a hypothesis transcription of the utterances.
Figure 3.1: Components of an ASR system
Speech recognition is a complicated task and state of the art recognition systems are very complex. For pragmatic reasons the project was restricted to the same domain as the HTK tutorial suggests, namely instructions that a telephone can perform, e.g. "Dial one two zero".

System construction approach. There are a large number of different approaches to the implementation of an ASR system, but for this project the four major processing steps suggested by HTK (Young et al., 2002) [35] were considered, namely data preparation, training, recognition/testing and analysis. For implementation purposes the following sub-processes were undertaken:

i. Building the task grammar
ii. Constructing a dictionary for the models
iii. Recording the data
iv. Creating transcription files for training data
v. Encoding the data (feature processing)
vi. (Re-)training the acoustic models
vii. Evaluating the recognisers against the test data
viii. Reporting recognition results
3.1 Data Preparation
The first stage of any recogniser development project is data preparation. Speech data is needed both for training and for testing. In the system built here, all of this speech was recorded from scratch. The training data is used during the development of the system. Test data provides the reference transcriptions against which the recogniser’s performance can be measured and a convenient way to create them is to use the task grammar as a random generator. In the case of the training data, the prompt scripts will be used in conjunction with a pronunciation dictionary to provide the initial phone level transcriptions needed to start the HMM training process. It follows from above that before the data can be recorded, a phone set must be defined, a dictionary must be constructed to cover both training and testing and a task grammar must be defined.
3.1.1 The Task Grammar
The task grammar defines constraints on what the recognizer can expect as input. As the system built provides a voice operated interface for phone dialling, it handles digit strings. For the limited scope of this project, only the digits 0 to 9, making up a toy grammar, were needed. The grammar was defined in Backus-Naur form (BNF), as follows: $variable defines a phrase as anything between the subsequent = sign and the semicolon, where | stands for a logical or. Brackets have the usual grouping function and square brackets denote optionality. The toy grammar used was:
#
# Task grammar
#
$digit = RIMWE | KABIRI | GATATU | KANE | GATANU | GATANDATU | KARINDWI | UMUNANI | ICYENDA | ZERO;
( SENT-START [$digit] SENT-END )
The above grammar can be depicted as a network as shown below.
Figure 3.2: Grammar for voice dialling
Word network

The above high-level representation of a task grammar is provided for user convenience. The HTK recogniser actually requires a word network to be defined using a low level notation called HTK Standard Lattice Format (SLF), in which each word instance and each word-to-word transition is listed explicitly. This word network can be created automatically from the grammar above using the HParse tool; thus, assuming that the file gram contains the above grammar, executing

HParse gram wdnet

creates an equivalent word network in the file wdnet (Appendix A); see the figure below.
Figure 3.3: Process of creating a word lattice
The above created lattice can now be used by another HTK tool HSGen to generate random sentences. These are the sentences that are used later for training and testing purposes.
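For illustration, the generation step can be invoked as shown below; the -n option (number of sentences to generate) and the redirection to a prompt file follow standard HTK usage and are assumptions here, since the exact command used in the project is not recorded in this report.

HSGen -n 10 wdnet dict > prompts.txt

This prints ten random sentences that are legal under the task grammar, each consisting of an optional digit bracketed by the sentence start and end markers.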
3.1.2 A Pronunciation Dictionary
The dictionary provides an association between words used in the task grammar and the acoustic models, which may be composed of sub-word (phonetic, syllabic, etc.) units. Since this project provides a voice operated interface, the dictionary could have been constructed by hand, but the researcher wanted to try a different method which could also be used to construct a dictionary for a large vocabulary ASR system. In order to train the HMM network, a pronunciation dictionary is needed. Since whole-word models are used in this project, the dictionary has a simple structure. A file called 'lexicon' was created with the following structure:

GATANDATU   gatandatu
GATANU      gatanu
GATATU      gatatu
ICYENDA     icyenda
KABIRI      kabiri
KANE        kane
KARINDWI    karindwi
RIMWE       rimwe
SENT-END    [] sil
SENT-START  [] sil
UMUNANI     umunani
ZERO        zero
A file named wdlist.txt was created containing all the words that make up the vocabulary:

GATANDATU
GATANU
GATATU
ICYENDA
KABIRI
KANE
KARINDWI
RIMWE
SENT-END
SENT-START
UMUNANI
ZERO

The dictionary was finally created by using HDMan as follows:

HDMan -m -w wdlist.txt -n models1 -l dlog dict lexicon

This creates a new dictionary called dict by searching the source dictionary lexicon to find pronunciations for each word in wdlist.txt. Here, the wdlist.txt in question needs only to be a sorted list of the words appearing in the task grammar given above. The option -l instructs HDMan to output a log file dlog which contains various statistics about the constructed dictionary; in particular, it indicates if there are any words missing. HDMan can also output a list of the words used, here called models1. Once training and test data have been recorded, an HMM will be estimated for each of these words.
3.1.3 Recording
In order to train and test the recognizer on the domain and on the voices of some selected people, 10 sentences were automatically generated from the grammar with HTK's HSGen (see Appendix B for the training and testing sentences). Speech data from six (6) different speakers, 3 males and 3 females of different age groups, was recorded. Due to the lack of access to a recording studio, the recordings were done in an office on Sundays when there are no people in the office. As the toolkit does not require phoneme duration information for the training sentences, the (differences in) timing in the pronunciation of the training sentences is not important. The toolkit learns to recognise the words by fitting the word transcriptions on the training set. These transcriptions are used for all realisations of the same sentence, even though there might be variation between speakers relative to the transcription. The speakers were given a list of sentences which they had to read aloud. After about 5
sentences they took a short break, and drank a glass of water. The training corpus, consisting of 150 sentences, was recorded and labelled using the HTK tool HSLab.
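For reference, the HTK tutorial starts a recording and labelling session with a command of the form below; the argument is only a placeholder file name, and the exact invocation used in this project is not recorded here.

HSLab noname

The tool then lets the user record a waveform, mark regions within it and assign labels to them, which are saved to a corresponding .lab file, as shown in the figure below.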
Figure 3.4: Recording and labelling data using hslab
After recording and labelling the training sentences, a test corpus was also created in the same way as the training corpus, but in this case 70 sentences were recorded for testing. The differences noted in pronunciation between speakers (and their consequences) can be categorised as articulation variation, e.g. some speakers had a rolling 'r' and others not, in, for example, 'kabiri' and 'rimwe'. Phonetic change degrades the quality of the training set, since the same phonetic transcription was used for all speakers. These phonetic change problems were addressed by using isolated whole word models and many different sentences, such that at the end of the day a speaker independent system was created. Articulation variation, on the other hand, is of course a problem for recognition, but if there
was no articulation variation the task of recognising would become an instance based learning problem.
3.1.4 Phonetic Transcription
For training, we need to tell the recognizer which files correspond to which digit. HTK uses so-called Master Label Files (MLF) to store information associated with speech. What makes things a bit confusing is the fact that there are two things an MLF can contain: words and phonemes. The HTK tutorial shows how various HTK tools can convert lists of sentences into lists of words and then lists of phonemes, the last two in an MLF. Since the objective of this project was to create an isolated word recognizer, a file called source.mlf was created associating each recorded and labelled speech file with a word:

#!MLF!#
"data/train/rimwe01.lab"
RIMWE
.
"data/train/rimwe02.lab"
RIMWE
.
etc. (see Appendix C for details)

It is assumed that rimwe01.WAV contains the utterance 'rimwe', and so on. Next, the model transcriptions must be obtained. For this, an HTK edit script called 'mkphones0.led' was created containing the following:

EX
IS sil sil
DE sp

The HTK tool HLEd was used to convert the word transcriptions into model transcriptions (models0.mlf):

HLEd -d dict -i models0.mlf mkphones0.led source.mlf
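The resulting models0.mlf should then contain one model-level transcription per utterance; a hypothetical excerpt, assuming the dictionary maps RIMWE to the whole-word model rimwe and that sil is inserted at the start and end of each utterance, would look like:

#!MLF!#
"data/train/rimwe01.lab"
sil
rimwe
sil
.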
3.1.5 Encoding the Data
The speech recognition tools cannot process speech waveforms directly. These have to be represented in a more compact and efficient way. This step is called "acoustical analysis": the signal is segmented in successive frames (whose length is typically chosen between 20 ms and 40 ms), overlapping with each other. Each frame is multiplied by a windowing function (e.g. the Hamming function), and a vector of acoustical coefficients (giving a compact representation of the spectral properties of the frame) is extracted from each windowed frame. In order to specify to HTK the nature of the audio data (format, sample rate, etc.) and the feature extraction parameters (type of feature, window length, pre-emphasis, etc.), a configuration file (config.txt) was created as follows:

# Coding parameters
SOURCEKIND = waveform
SOURCEFORMAT = HTK
SOURCERATE = 625
TARGETKIND = MFCC_0_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F

To run HCopy, a list of each source file and its corresponding output file was created. The first few lines look like:

data/train/rimwe01.SIG   data/MFC/rimwe01.MFC
data/train/rimwe02.sig   data/MFC/rimwe02.MFC
data/train/rimwe03.sig   data/mfc/rimwe03.mfc
.
.
data/train/sil10.sig     data/MFC/sil10.MFC

(see Appendix D for details), with one line for each file in the training set. This file tells HTK to extract features from each audio file in the first column and save them to the corresponding feature file in the second column. The command used is:

HCopy -T 1 -C config.txt -S hcopy.scp
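For readers unfamiliar with the framing and windowing step described above, the short Python/NumPy sketch below illustrates the idea. It is illustrative only: HCopy performs this segmentation, plus the full MFCC_0_D_A computation, internally, and the 16 kHz rate, 25 ms window and 10 ms shift simply restate the SOURCERATE, WINDOWSIZE and TARGETRATE settings of the configuration file above.

    import numpy as np

    def frame_and_window(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
        # cut the waveform into overlapping frames and apply a Hamming window;
        # a vector of acoustical coefficients is then derived from each frame
        frame_len = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
        shift = int(sample_rate * shift_ms / 1000)       # 160 samples at 16 kHz
        window = np.hamming(frame_len)
        frames = [signal[s:s + frame_len] * window
                  for s in range(0, len(signal) - frame_len + 1, shift)]
        return np.array(frames)                          # shape: (num_frames, frame_len)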
3.2 Parameter Estimation (Training)
Defining the structure and overall form of a set of HMMs is the first step towards building a recognizer. The second step is to estimate the parameters of the HMMs from examples of the data sequences that they are intended to model. This process of parameter estimation is usually called training. The topology for each of the HMMs to be trained is defined by writing a prototype definition. HTK allows HMMs to be built with any desired topology. HMM definitions can be stored externally as simple text files and hence it is possible to edit them with any convenient text editor. With the exception of the transition probabilities, all of the HMM parameters given in the prototype definition are ignored; the purpose of the prototype definition is only to specify the overall characteristics and topology of the HMM. The actual parameters will be computed later by the training tools. Sensible values for the transition probabilities must be given, but the training process is very insensitive to these. An acceptable and simple strategy for choosing these probabilities is to make all of the transitions out of any state equally likely. In principle the HMMs should be trained on a large corpus containing a wide range of word pronunciations; for this purpose 150 sentences were recorded and labelled as stated above (see the training corpus CD for the training data).
3.2.1 Training Strategies
HTK offers two different approaches to training speech data
Figure 3.5: Training HMMs

Firstly, an initial set of models must be created. If there is some speech data available for which the location of the word boundaries has been marked, then this can be used as bootstrap data. In this case, the tools HInit and HRest provide isolated word style training using the fully labeled bootstrap data. Each of the required HMMs is generated individually. HInit reads in all of the bootstrap training data and cuts out all of the examples of the required word. It then iteratively computes an initial set of parameter values using a segmental k-means procedure. On the first cycle, the training data is uniformly segmented, each model state is matched with the corresponding data segments and then means and variances are estimated. If mixture Gaussian models are being trained, then a modified form of k-means clustering is used. On the second and successive cycles, the uniform segmentation is replaced by Viterbi alignment. The initial parameter values computed by HInit are then further re-estimated by HRest. Since this project was concerned with isolated whole words, the strategy described above was used.
If there is no marked data, the tool HCompV is used instead. In this project, since all the data was labelled, HInit and HRest were used for training purposes.
Figure 3.6: Training isolated whole word models
3.2.2 HMM Definition
The first step in HMM training is to define a prototype model. The purpose of the prototype is to define a model topology on which all the other models can be based. In HTK an HMM is defined in a text description file, which in this case is:
~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
  <NumStates> 6
  <State> 2
    <Mean> 39
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    <Variance> 39
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  <State> 3
    <Mean> 39
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    <Variance> 39
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  <State> 4
    <Mean> 39
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    <Variance> 39
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  <State> 5
    <Mean> 39
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
      0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
    <Variance> 39
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  <TransP> 6
    0.0 0.5 0.5 0.0 0.0 0.0
    0.0 0.4 0.3 0.3 0.0 0.0
    0.0 0.0 0.4 0.3 0.3 0.0
    0.0 0.0 0.0 0.4 0.3 0.3
    0.0 0.0 0.0 0.0 0.5 0.5
    0.0 0.0 0.0 0.0 0.0 0.0
<EndHMM>
Models for each of the events were also constructed, see appendix E for the details.
3.2.3 HMM Training
The training described in the parameter estimation introduction can be summarized in a diagram form as below.
Figure 3.7: HMM training process
Initialisation

The HTK tool HInit was used to initialise the models as given below:

HInit -A -D -T 1 -S train.scp -M model/hmm0 -H hmmfile -l label -L label_dir nameofhmm

where:
nameofhmm is the name of the HMM to initialise (here: rimwe, kabiri, ..., or sil).
hmmfile is a description file containing the prototype of the HMM called nameofhmm (here: hmm_rimwe, hmm_kabiri, etc.).
train.scp gives the complete list of the .mfc files forming the training corpus (stored in directory data/train/mfc).
label_dir is the directory containing the label files (.lab) corresponding to the training corpus (here: data/train/lab/).
label indicates which labelled segment must be used within the training corpus (here: rimwe, kabiri, etc.).
model/hmm0 is the name of the directory (which must be created beforehand) where the resulting initialised HMM description will be output.

This procedure has to be repeated for each model (hmm_rimwe, hmm_kabiri, hmm_gatatu, etc.). The HMM file output by HInit has the same name as the input prototype, e.g.

HInit -A -D -T 1 -S train.scp -M model/hmm0 -H hmm_1.txt -l rimwe -L data/train rimwe

This process was repeated for all the models. The HTK tool HCompV was then run on the training data as follows:

HCompV -C config.txt -f 0.01 -m -S train.scp -M hmm0 proto.txt

HCompV was not used to initialise the models (that was already done with HInit); it is only used here because it outputs, along with the initialised model, a useful file called vFloors, which contains the global variance vector multiplied by a factor of 0.01 (see Appendix F). The values stored in the varFloor1 macro (the "variance floor macro") are used later during the training process as floor values for the estimated variance vectors. This results in the creation of two files, proto and vFloors, in the directory hmm0. These files were edited in the following way: an error occurs at this point which rearranges the parts of the MFCC_0_D_A parameter kind as MFCC_D_A_0; this was corrected. The first three lines of proto were then cut and pasted into vFloors, and the result was saved as macros.
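Because this initialisation has to be repeated for every model, it is convenient to script it. The loop below is a sketch only: it assumes one prototype file per word named hmm_<word>.txt, whereas the prototypes in this project were numbered files such as hmm_1.txt.

for w in rimwe kabiri gatatu kane gatanu gatandatu karindwi umunani icyenda zero sil
do
    HInit -A -D -T 1 -S train.scp -M model/hmm0 -H hmm_$w.txt -l $w -L data/train $w
done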
3.2.4 Training
The following command line was used to perform one re-estimation iteration with the HTK tool HRest, estimating the optimal values for the HMM parameters (transition probabilities, plus the mean and variance vectors of each observation function):

HRest -A -D -T 1 -S train.scp -M model/hmm1 -H vFloors -H model/hmm0/hmm_1.txt -l rimwe -L data/train rimwe

where:
train.scp gives the complete list of the .mfc files forming the training corpus (stored in directory data/train/mfc).
model/hmm1, the output directory, indicates the index of the current iteration.
vFloors is the file containing the variance floor macro obtained with HCompV.
hmm_1.txt is the description file of the HMM called rimwe. It is stored in a directory whose name indicates the index of the last iteration (here model/hmm0).
-l rimwe indicates the label to use within the training data (rimwe, kabiri, etc.).
data/train is the directory containing the label files (.lab) corresponding to the training corpus.
rimwe is the name of the HMM to train.

This procedure has to be repeated several times for each of the HMMs to train (kabiri, gatatu, kane, sil, etc.). Each time, the HRest iterations (i.e. iterations within the current re-estimation pass) are displayed on screen, indicating convergence through the change measure; as soon as this measure no longer decreases (in absolute value) from one HRest iteration to another, it is time to stop the process. In this project 3 re-estimation iterations were used, so the final word HMMs are hmm3/hmm_1, hmm3/hmm_0, hmm3/hmm_sil, etc. A file called hmmdefs.txt was then created by combining all the HMMs into one file (see Appendix E). After each iteration an error occurred which rearranged the parts of the MFCC_0_D_A parameter kind as MFCC_D_A_0; this was corrected after each iteration.
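The three re-estimation passes for one model can likewise be scripted; the sketch below assumes the rimwe model is stored in a file named hmm_1.txt, as in the command above, and that the output directories model/hmm1 to model/hmm3 have already been created.

for i in 0 1 2
do
    HRest -A -D -T 1 -S train.scp -M model/hmm$((i+1)) -H vFloors \
          -H model/hmm$i/hmm_1.txt -l rimwe -L data/train rimwe
done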
3.3 Recognition
The recognizer is now complete and its performance can be evaluated. The recognition network and dictionary have already been constructed, and test data has been recorded. Thus, all that is necessary is to run the recognizer. The recognition process can be summarized as in the figure below.
Figure 3.8: Speech recognition process

An input speech signal is first transformed into a series of "acoustical vectors" (here MFCCs) using the HTK tool HCopy, in the same way as was done with the training data; the resulting feature files (the acoustical observations) are listed in a script file test.scp. The input observation is then processed by a Viterbi algorithm, which matches it against the recogniser's Markov models using the HTK tool HVite, as follows:

HVite -A -D -T 1 -H model/hmm3/hmmdefs.txt -i recout.mlf -w wdnet dict hmmlist.txt -S test.scp

where:
hmmdefs.txt contains the definitions of the HMMs. It is possible to repeat the -H option and list the different HMM definition files, in this case -H model/hmm3/hmm_0.txt -H model/hmm3/hmm_1.txt etc., but it is more convenient (especially when there are more than 3 models) to gather all the definitions in a single file called a Master Macro File. For this project this file was obtained by copying each definition after the other into a single file, without repeating the header information (see Appendix E).
recout.mlf is the output recognition transcription file, which contains the transcription of the input (see Appendix G).
wdnet is the task network.
dict is the task dictionary.
hmmlist.txt lists the names of the models to use (rimwe, kabiri, etc.), one per line.
test.scp is the input data to be recognised.
3.4 Running the Recognizer Live
The built recogniser was tested with live input. To do this the configuration parameters were altered as given below:

# Waveform capture
SOURCERATE = 625.0
SOURCEKIND = HAUDIO
SOURCEFORMAT = HTK
ENORMALISE = F
USESILDET = T
MEASURESIL = F
OUTSILWARN = T

These indicate that the source is direct audio with a sample period of 62.5 microseconds (i.e. a 16 kHz sampling rate). The silence detector is enabled and a measurement of the background speech/silence levels is made at start-up; the final line makes sure that a warning is printed when this silence measurement is being made. Once the configuration file had been set up for direct audio input, the HTK tool HVite was again used to recognise the live input using a microphone.
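A plausible invocation for live decoding, assuming the waveform-capture settings above were saved in a separate configuration file (named config_live.txt here purely for illustration), is:

HVite -A -D -T 1 -C config_live.txt -H model/hmm3/hmmdefs.txt -w wdnet dict hmmlist.txt

With SOURCEKIND set to HAUDIO and no -S list of input files, HVite reads directly from the audio device and prints the recognised words for each detected utterance.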
Chapter 4 RESULTS

The recognition performance of an ASR system must be measured on a corpus of data different from the training corpus. A separate test corpus, with new Kinyarwanda language digit recordings, was created in the same way as the training corpus. The test corpus was made of 50 recorded and labelled utterances which were later converted into MFC files. In order to test the speaker independence of the system, some of the subjects who participated in the creation of the testing corpus had not participated in the creation of the training corpus.
4.1 Performance Test
Evaluation of the performance of the speech recognition system was done by using the HTK tool HResults. On running and testing the tool against the testing data, the following performance statistics were obtained:
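These statistics (reproduced in Figure 4.1 below) are produced by an HResults invocation of the following form; the reference transcription file name testref.mlf is an assumption here, as the exact command is not recorded in this report:

HResults -I testref.mlf hmmlist.txt recout.mlf

HResults compares the recognised transcriptions in recout.mlf against the reference word-level transcriptions of the test sentences and prints sentence-level and word-level scores.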
4.2 Performance Analysis
Figure 4.1: Speech recognition results

The first line (SENT) gives the sentence recognition rate (%Correct=92.00), and the second one (WORD) gives the word recognition rate (%Corr=94.87). The first line (SENT) should be considered here. H=46 gives the number of test sentences correctly recognized, S=4 the number of substitution errors and N=50 the total number of test sentences. These results imply that of the 50 sentences making up the testing corpus only 46 were correctly recognized, which is equivalent to 92.00%, and four (4) sentences were substituted by other sentences. The statistics given on the second line (WORD) only make full sense for more sophisticated types of recognition systems (e.g. connected word recognition tasks). Nevertheless, there were 6 deletion errors (D), 2 substitution errors (S) and 0 insertion errors (I). N=156 gives the total number of words making up the test data, and of these 148 were correctly recognized, leading to 94.87% recognition. The accuracy figure (Acc) of 94.87% is the same as the percentage correct (Corr) because the former takes account of the insertion errors, which the latter does not, and in this case the insertion errors are zero. These results indicate that the training of the system was successful and that the developed system is speaker independent.
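These figures follow directly from the standard definitions used by HResults:

%Correct  = (N - D - S) / N x 100     = (156 - 6 - 2) / 156 x 100     = 94.87%
%Accuracy = (N - D - S - I) / N x 100 = (156 - 6 - 2 - 0) / 156 x 100 = 94.87%

and, at the sentence level, %Correct = H / N x 100 = 46 / 50 x 100 = 92.00%.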
4.3
Testing the System on Live Data
To further test the system on live data, and also to test its speaker independence again, the system was run live. Four (4) different speakers who had not participated in the creation of the training corpus helped to test the system live. The subjects read the Kinyarwanda language numeric digits aloud, and Figure 4.2 below summarises the results. These results show that the system is speaker independent, with a few errors which could be reduced by training the system on more data and by including recordings from speakers from the different parts of the Great Lakes region where Kinyarwanda is spoken.
Figure 4.2: Live data recognition results
Chapter 5
DISCUSSION, CONCLUSION AND RECOMMENDATIONS
In this project, the main task was to develop an automatic speech recognizer for the Kinyarwanda language. The system is aimed at improving on the current human-computer interface by introducing a voice interface, which has proved to have many advantages over the traditional I/O methods. Users naturally know how to speak, so a voice interface is easy to use and does not necessarily require the special training that is normally needed when using the various ICT tools for the first time. The scope was limited to the numeric digits, which could be used in many systems, most especially an automatic telephone dialing system. This five-chapter report contains the introduction to the study in chapter one and a literature review on human-computer interfaces, ASR and ongoing African ASR projects in chapter two. Chapter three presented the methodology that was used to achieve the objectives, while chapter four concentrated on the performance and testing of the recognizer developed. This is the last chapter of the report, in which the discussion, conclusion and recommendations are given.
5.1
Discussion
It has been discovered that many people have a computer phobia. One reason why many people fear to use ICT tools is inadequate user interfaces, which make it difficult for new users to explore or take the first step into using these unavoidable ICT tools. A lot has been done by many researchers to improve user interfaces, and one of the improvements has been the inclusion of voice interfaces. It was noted by the researcher that most of the systems developed so far have mainly considered the major international languages. The researcher therefore found it necessary to build an ASR system which could be a starting point for many educational and commercial projects building speech recognisers for the Kinyarwanda language. In order to develop the system, the researcher first read and analysed research papers on the trends in speech recognition, and then read reviews of the current state-of-the-art speech recognisers. Before attempting to build a speech recogniser for a new language it is always advisable to start by building one for a language which has already been tested; in this case the researcher first constructed an English Yes/No recogniser, which paved the way for the new-language speech recogniser. The Cambridge University Hidden Markov Model Toolkit (HTK) was used for the implementation of the recogniser. HTK was used because it is free and has been used by many researchers all over the world. HTK supports both isolated whole-word recognition and sub-word or phone based recognition. Although research in the area of automatic speech recognition has been pursued for the last three decades, only whole-word based speech recognition systems have found practical use and become commercial successes (Rabiner et al., 1981 [22]; Wilpon et al., 1988 [34]). Two important reasons for this success are that the effects of context-dependence and co-articulation within the word are implicitly built into the word models, and that there is no need for lexical decoding. Isolated word recognition was chosen for this project because it proved much easier: the pauses between the words make it easy to detect the start and end of each word, so each word can be detected one at a time. A limited grammar and dictionary were constructed for use by the recognizer. Speech data was recorded and labelled from 6 different speakers, making up the training and testing corpora. Since the researcher had labelled training data, the HTK tools HInit and HRest were used for the initialization and training processes (a sketch of these commands is given at the end of this section). The results obtained from the system showed that the system can automatically recognize 94.87 percent of the digit words spoken by Kinyarwanda language speakers. The system was also tested on live data and it performed well. Four different speakers participated in the testing of the system on live data and the performance was very good, as seen in Figure 4.2. There were some cases where the word kane was substituted with the word karindwi. This problem was mainly observed with some specific speakers, not all.
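As a minimal sketch of the initialization and re-estimation steps referred to above, the two calls below show how a single whole-word model (rimwe) might be initialised with HInit and then re-estimated with HRest. The file names trainlist.txt, labels.mlf and proto/rimwe are assumptions for illustration only; the label RIMWE corresponds to the entries in the master label file of Appendix C.

HInit -A -D -T 1 -S trainlist.txt -I labels.mlf -M model/hmm0 -H proto/rimwe -l RIMWE rimwe
HRest -A -D -T 1 -S trainlist.txt -I labels.mlf -M model/hmm1 -H model/hmm0/rimwe -l RIMWE rimwe

The same pair of commands would be repeated for each of the other digit models (kabiri, gatatu, etc.) and for the silence model sil.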
5.2
Conclusion
The objective of this study was mainly to build a speech recognizer for the Kinyarwanda language. In order to meet this objective a limited word grammar was constructed, a dictionary was created, and data from different Kinyarwanda language speakers was recorded and then used to train the models. The system was tested using the testing corpus data and live data, and it scored 92.00% sentence recognition and 94.87% word recognition. This implies that the objective of creating a system that can recognize spoken Kinyarwanda was achieved. The Kinyarwanda language automatic speech recognition recipe accompanying this report can be used by any researcher wishing to join language processing research. The project is, however, not conclusive, as it has catered for only a voice-operated phone dialing system. While it has created a basis for research, this project can be expanded to cater for more extensive language models and larger vocabularies.
5.3
Areas for Further Study
In spite of the successes of whole-word model speech recognizers, which are also exemplified by the success of this project, they suffer from two problems:
• Co-articulation effects across word boundaries. This problem has been reasonably well solved, and connected word recognition systems with good performance have been reported in the literature (Rabiner et al., 1981 [22]; Wilpon et al., 1988 [34]).
• Amount of training data. It is extremely difficult to obtain good whole-word reference models from the limited amount of speech data available for training. This training problem becomes even worse for large vocabulary speech recognition systems.
For the above reasons, I recommend that future research be undertaken in large vocabulary Kinyarwanda language speech recognition using sub-word units (phonemes), which solve the above-mentioned problems. A sub-word based approach is a viable alternative to the whole-word based approach because the word models are built from a small inventory of sub-word units. Phoneme HMMs are generalisable (trainable) both towards larger vocabularies and towards different speakers.
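To illustrate what such a move would involve, a sub-word system replaces the whole-word dictionary with phone-level entries, and each phone then has its own HMM shared across all the words in which it appears. The lines below are a purely hypothetical sketch; the phone symbols are assumptions for illustration only and not a proposed Kinyarwanda phone set.

RIMWE   r i m w e
KABIRI  k a b i r i
GATATU  g a t a t u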
REFERENCES
1. Baum, L.E., and Petrie, T., (1966). Statistical Inference for Probabilistic Functions of Finite-State Markov Chains. Annals of Mathematical Statistics, 37:1554-1563.
2. Bridle, J., Deng, L., Picone, J., Richards, H., Ma, J., Kamm, T., Schuster, M., Pike, S., Reagan, R., (1998). An investigation of segmental hidden dynamic models of speech coarticulation for automatic speech recognition. Final Report for the 1998 Workshop on Language Engineering, Center for Language and Speech Processing at Johns Hopkins University, pp. 161.
3. Cole, R., Noel, M., Burnet, D.C., Fanty, M., Lander, T., Oshika, B., Sutton, S., (1994). Corpus development activities at the Center for Spoken Language Understanding. Proceedings of the Workshop on Human Language Technology, pages 31-36.
4. Cole, R., Roginski, K., and Fanty, M., (1992). A telephone speech database of spelled and spoken names. In ICSLP'92, volume 2, pages 891-895.
5. Deshmukh, N., Ganapathiraju, A., Picone, J., (1999). Hierarchical Search for Large Vocabulary Conversational Speech Recognition. IEEE Signal Processing Magazine, 1(5):84-107.
6. Dix, A.J., Finlay, J., Abowd, G., Beale, R., (1998). Human-Computer Interaction, 2nd edition. Prentice Hall, Englewood Cliffs, NJ, USA.
7. Dupont, S., (2000). Audio-Visual Speech Modeling for Continuous Speech Recognition. IEEE Transactions on Multimedia, 2(3):141-151.
8. Earth Trends, (2003). Population, Health, and Human Well-Being - Rwanda. Retrieved 20-01-2005 from http://earthtrends.wri.org/pdf library/country profiles/Pop cou 646.pdf.
9. Huang, C., Tao, C., and Chang, E., (2004). Accent Issues in Large Vocabulary Continuous Speech Recognition. International Journal of Speech Technology, (7):141-153.
10. Jurafsky, D., Martin, J., (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Delhi, India: Pearson Education.
11. Kagaba, S., Nsanzabaganwa, S., Mpyisi, E., (2003). Rwanda Country Position Paper, Regional Workshop on Ageing and Poverty, Dar es Salaam, Tanzania. Retrieved 20-02-2005 from http://www.un.org/esa/socdev/ageing/workshops/tz/rwanda.pdf.
12. Kandasamy, S., (1995). Speech recognition systems. SURPRISE Journal, 1(1).
13. Liu, F.H., Liang, G., Yuqing, G., and Picheny, M., (2004). Applications of Language Modeling in Speech-To-Speech Translation. International Journal of Speech Technology, (7):221-229.
14. Ma, J., Deng, L., (2004). Target-directed mixture linear dynamic models for spontaneous speech recognition. IEEE Transactions on Speech and Audio Processing, 12(1), January 2004.
15. Ma, J., Deng, L., (2004). A mixed-level switching dynamic system for continuous speech recognition. Computer Speech and Language, 18:49-65.
16. Mane, A., Boyce, S., Karis, D., Yankelovich, N., (1996). Designing the User Interface for Speech Recognition Applications. SIGCHI Bulletin, 28(4):29-34.
17. Mengjie, Z., (2001). Overview of speech recognition and related machine learning techniques. Technical report. Retrieved December 10, 2004 from http://www.mcs.vuw.ac.nz/comp/Publications/archive/CS-TR-01/CS-TR-01-15.pdf.
18. Mori, R.D., Lam, L., and Gilloux, M., (1987). Learning and plan refinement in a knowledge-based system for automatic speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(2):289-305.
19. Picheny, M., (2002). Large vocabulary speech recognition. IEEE Computer, 35(4):42-50.
20. Pinker, S., (1994). The Language Instinct. Harper Collins, New York City, New York, USA.
21. Rabiner, L.R., and Levinson, S.E., (1981). "Isolated and connected word recognition - Theory and selected applications". IEEE Transactions on Communications, COM-29, pp. 621-629.
22. Rabiner, L.R., and Wilpon, J.G., (1979). Considerations in applying clustering techniques to speaker-independent word recognition. Journal of the Acoustical Society of America, 66(3):663-673.
23. Reddy, D.R., (1976). Speech Recognition by Machine: a Review. Proceedings of the IEEE, 64(4):501-531.
24. Robertson, J., Wong, Y.T., Chung, C., and Kim, D.K., (1998). Automatic Speech Recognition for Generalised Time Based Media Retrieval and Indexing. Proceedings of the Sixth ACM International Conference on Multimedia (pp. 241-246), Bristol, England.
25. Roux, J.C., Botha, E.C., and Du Preez, J.A., (2000). Developing a Multilingual Telephone Based Information System in African Languages. Proceedings of the Second International Language Resources and Evaluation Conference, Athens, Greece: ELRA, (2):975-980.
26. Rudnicky, A.I., Lee, K.F., and Hauptmann, A.G., (1992). Survey of current speech technology. Communications of the ACM, 37(3):52-57.
27. ScanSoft, (2004). Embedded speech solutions. Retrieved January 25, 2005 from http://www.speechworks.com/.
28. Silverman, H.F., and Morgan, D.P., (1990). The application of dynamic programming to connected speech recognition. IEEE ASSP Magazine, 7(3):6-25.
29. Svendsen, T., Paliwal, K.K., Harborg, E., Husy, P.O., (1989). Proc. ICASSP'89, Glasgow.
30. Tiong, B., (1997). Speech Recognition. Retrieved December 10, 2004 from http://murray.newcastle.edu.au/users/staff/speech/home pages/tutorial sr.html.
31. Tolba, H., and O'Shaughnessy, D., (2001). Speech Recognition by Intelligent Machines. IEEE Canadian Review, (38).
32. Warwick, C., (1997). What is the BNC? [Online]. Available from: http://www.hcu.ox.ac.uk/BNC. Retrieved on 20-05-2005.
33. Webster's Dictionary, (2004). Illiterate. Retrieved September 23, 2004 from http://www.websterdictionary.org/definition/illiterate.
34. Wilpon, J.G., DeMarco, D.M., Mikkilineni, R.P., (1988). "Isolated word recognition over the DD telephone network - Results of two extensive field studies". Proc. ICASSP, pp. 55-58.
35. Young, S., Evermann, G., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., (2002). The HTK Book. Retrieved April 1, 2005 from http://htk.eng.cam.ac.uk.
36. Zue, V., Cole, R., Ward, W., (1996). Speech Recognition. In Survey of the State of the Art in Human Language Technology. Kauai, Hawaii, USA.
APPENDICES

Appendix A Word Network

VERSION=1.0
N=15 L=24
I=0 W=!NULL
I=1 W=!NULL
I=2 W=SENT-START
I=3 W=RIMWE
I=4 W=!NULL
I=5 W=KABIRI
I=6 W=GATATU
I=7 W=KANE
I=8 W=GATANU
I=9 W=GATANDATU
I=10 W=KARINDWI
I=11 W=UMUNANI
I=12 W=ICYENDA
I=13 W=ZERO
I=14 W=SENT-END
J=0 S=14 E=1
J=1 S=0 E=2
J=2 S=2 E=3
J=3 S=3 E=4
J=4 S=5 E=4
J=5 S=6 E=4
J=6 S=7 E=4
J=7 S=8 E=4
J=8 S=9 E=4
J=9 S=10 E=4
J=10 S=11 E=4
J=11 S=12 E=4
J=12 S=13 E=4
J=13 S=2 E=5
J=14 S=2 E=6
J=15 S=2 E=7
J=16 S=2 E=8
J=17 S=2 E=9
J=18 S=2 E=10
J=19 S=2 E=11
J=20 S=2 E=12
J=21 S=2 E=13
J=22 S=2 E=14
J=23 S=4 E=14
Appendix B Training Sentences 1. sil sil 2. sil gatatu sil 3. sil gatanu sil 4. sil gatanu sil 5. sil sil 6. sil karindwi sil 7. sil zero sil 8. sil umunani sil 9. sil gatanu sil 10. sil kane sil 11. sil icyenda sil 12. sil zero sil 13. sil icyenda sil 14. sil gatandatu sil 15. sil zero sil 16. sil sil 17. sil umunani sil 18. sil umunani sil 19. sil gatatu sil 20. sil gatandatu sil 21. sil karindwi sil 22. sil kane sil 23. sil karindwi sil 24. sil gatandatu sil 25. sil kane sil 26. sil gatanu sil 27. sil gatatu sil 28. sil zero sil
29. sil sil 30. sil sil 31. sil icyenda sil 32. sil kabiri sil 33. sil kabiri sil 34. sil gatanu sil 35. sil gatanu sil 36. sil icyenda sil 37. sil kabiri sil 38. sil kane sil 39. sil gatanu sil 40. sil gatanu sil 41. sil gatanu sil 42. sil icyenda sil 43. sil gatanu sil 44. sil rimwe sil 45. sil zero sil 46. sil sil 47. sil sil 48. sil kane sil 49. sil zero sil 50. sil gatandatu sil
Appendix C Master label file #!MLF!# ”data/train/rimwe01.lab” RIMWE . ”data/train/rimwe02.lab” RIMWE . ”data/train/rimwe03.lab” RIMWE . ”data/train/rimwe04.lab” RIMWE . ”data/train/rimwe05.lab” RIMWE . ”data/train/rimwe06.lab” RIMWE . ”data/train/rimwe07.lab” RIMWE . ”data/train/rimwe08.lab” RIMWE . ”data/train/rimwe09.lab” RIMWE .
”data/train/rimwe10.lab” RIMWE . ”data/train/rimwe11.lab” RIMWE . ”data/train/rimwe12.lab” RIMWE . ”data/train/rimwe13.lab” RIMWE . ”data/train/rimwe14.lab” RIMWE . ”data/train/rimwe15.lab” RIMWE . ”data/train/kabiri01.lab” KABIRI . ”data/train/kabiri01.lab” KABIRI . ”data/train/kabiri02.lab” KABIRI . ”data/train/kabiri03.lab” KABIRI . ”data/train/kabiri03.lab” KABIRI . ”data/train/kabiri04.lab” KABIRI .
”data/train/kabiri05.lab” KABIRI . ”data/train/kabiri06.lab” KABIRI . ”data/train/kabiri07.lab” KABIRI . ”data/train/kabiri08.lab” KABIRI . ”data/train/kabiri09.lab” KABIRI . ”data/train/kabiri10.lab” KABIRI . ”data/train/kabiri11.lab” KABIRI . ”data/train/kabiri12.lab” KABIRI . ”data/train/kabiri13.lab” KABIRI . ”data/train/kabiri14.lab” KABIRI . ”data/train/kabiri15.lab”
KABIRI . ”data/train/gatatu01.lab” GATATU . ”data/train/gatatu02.lab” GATATU . ”data/train/gatatu03.lab” GATATU . ”data/train/gatatu04.lab” GATATU . ”data/train/gatatu05.lab” GATATU . ”data/train/gatatu06.lab” GATATU . ”data/train/gatatu07.lab” GATATU . ”data/train/gatatu08.lab” GATATU . ”data/train/gatatu09.lab” GATATU . ”data/train/gatatu10.lab” GATATU
. ”data/train/gatatu11.lab” GATATU . ”data/train/gatatu12.lab” GATATU . ”data/train/gatatu13.lab” GATATU . ”data/train/gatatu14.lab” GATATU . ”data/train/gatatu15.lab” GATATU . ”data/train/kane01.lab” KANE . ”data/train/kane02.lab” KANE . ”data/train/kane03.lab” KANE . ”data/train/kane04.lab” KANE . ”data/train/kane05.lab” KANE .
”data/train/kane06.lab” KANE . ”data/train/kane07.lab” KANE . ”data/train/kane08.lab” KANE . ”data/train/kane09.lab” KANE . ”data/train/kane10.lab” KANE . ”data/train/kane11.lab” KANE . ”data/train/kane12.lab” KANE . ”data/train/kane13.lab” KANE . ”data/train/kane14.lab” KANE . ”data/train/kane15.lab” KANE . ”data/train/gatanu01.lab”
GATANU . ”data/train/gatanu02.lab” GATANU . ”data/train/gatanu03.lab” GATANU . ”data/train/gatanu04.lab” GATANU . ”data/train/gatanu05.lab” GATANU . ”data/train/gatanu06.lab” GATANU . ”data/train/gatanu07.lab” GATANU . ”data/train/gatanu08.lab” GATANU . ”data/train/gatanu09.lab” GATANU . ”data/train/gatanu10.lab” GATANU . ”data/train/gatanu11.lab” GATANU
. ”data/train/gatanu12.lab” GATANU . ”data/train/gatanu13.lab” GATANU . ”data/train/gatanu14.lab” GATANU . ”data/train/gatanu15.lab” GATANU . ”data/train/gatandatu01.lab” GATANDATU . ”data/train/gatandatu02.lab” GATANDATU . ”data/train/gatandatu03.lab” GATANDATU . ”data/train/gatandatu04.lab” GATANDATU . ”data/train/gatandatu05.lab” GATANDATU . ”data/train/gatandatu06.lab” GATANDATU .
”data/train/gatandatu07.lab” GATANDATU . ”data/train/gatandatu08.lab” GATANDATU . ”data/train/gatandatu09.lab” GATANDATU . ”data/train/gatandatu10.lab” GATANDATU . ”data/train/gatandatu11.lab” GATANDATU . ”data/train/gatandatu12.lab” GATANDATU . ”data/train/gatandatu13.lab” GATANDATU . ”data/train/gatandatu14.lab” GATANDATU . ”data/train/gatandatu15.lab” GATANDATU . ”data/train/karindwi01.lab” KARINDWI . ”data/train/karindwi02.lab”
KARINDWI . ”data/train/karindwi03.lab” KARINDWI . ”data/train/karindwi04.lab” KARINDWI . ”data/train/karindwi05.lab” KARINDWI . ”data/train/karindwi06.lab” KARINDWI . ”data/train/karindwi07.lab” KARINDWI . ”data/train/karindwi08.lab” KARINDWI . ”data/train/karindwi09.lab” KARINDWI . ”data/train/karindwi10.lab” KARINDWI . ”data/train/karindwi11.lab” KARINDWI . ”data/train/karindwi12.lab” KARINDWI .
”data/train/karindwi13.lab” KARINDWI . ”data/train/karindwi14.lab” KARINDWI . ”data/train/karindwi15.lab” KARINDWI . ”data/train/umunani01.lab” UMUNANI . ”data/train/umunani02.lab” UMUNANI . ”data/train/umunani03.lab” UMUNANI . ”data/train/umunani04.lab” UMUNANI . ”data/train/umunani05.lab” UMUNANI . ”data/train/umunani06.lab” UMUNANI . ”data/train/umunani07.lab” UMUNANI . ”data/train/umunani08.lab”
UMUNANI . ”data/train/umunani09.lab” UMUNANI . ”data/train/umunani10.lab” UMUNANI . ”data/train/umunani11.lab” UMUNANI . ”data/train/umunani12.lab” UMUNANI . ”data/train/umunani13.lab” UMUNANI . ”data/train/umunani14.lab” UMUNANI . ”data/train/umunani15.lab” UMUNANI . ”data/train/icyenda01.lab” ICYENDA . ”data/train/icyenda02.lab” ICYENDA . ”data/train/icyenda03.lab” ICYENDA
. ”data/train/icyenda04.lab” ICYENDA . ”data/train/icyenda05.lab” ICYENDA . ”data/train/icyenda06.lab” ICYENDA . ”data/train/icyenda07.lab” ICYENDA . ”data/train/icyenda08.lab” ICYENDA . ”data/train/icyenda09.lab” ICYENDA . ”data/train/icyenda10.lab” ICYENDA . ”data/train/icyenda11.lab” ICYENDA . ”data/train/icyenda12.lab” ICYENDA . ”data/train/icyenda13.lab” ICYENDA .
”data/train/icyenda14.lab” ICYENDA . ”data/train/icyenda15.lab” ICYENDA . ”data/train/zero01.lab” ZERO . ”data/train/zero02.lab” ZERO . ”data/train/zero03.lab”
ZERO . ”data/train/zero04.lab” ZERO . ”data/train/zero05.lab” ZERO . ”data/train/zero06.lab” ZERO . ”data/train/zero07.lab” ZERO . ”data/train/zero08.lab” ZERO .
”data/train/zero09.lab” ZERO . ”data/train/zero10.lab” ZERO . ”data/train/zero11.lab” ZERO . ”data/train/zero12.lab” ZERO . ”data/train/zero13.lab” ZERO . ”data/train/zero14.lab” ZERO . ”data/train/zero15.lab” ZERO
Appendix D Training Data data/MFC/rimwe01.MFC data/MFC/rimwe02.MFC data/MFC/rimwe03.MFC data/MFC/rimwe04.MFC data/MFC/rimwe05.MFC data/MFC/rimwe06.MFC data/MFC/rimwe07.MFC data/MFC/rimwe08.MFC data/MFC/rimwe09.MFC data/MFC/rimwe10.MFC data/MFC/rimwe11.MFC data/MFC/rimwe12.MFC data/MFC/rimwe13.MFC data/MFC/rimwe14.MFC data/MFC/rimwe15.MFC data/MFC/kabiri01.MFC data/MFC/kabiri02.MFC data/MFC/kabiri03.MFC data/MFC/kabiri04.MFC data/MFC/kabiri05.MFC data/MFC/kabiri06.MFC data/MFC/kabiri07.MFC data/MFC/kabiri08.MFC data/MFC/kabiri09.MFC data/MFC/kabiri10.MFC data/MFC/kabiri11.MFC data/MFC/kabiri12.MFC data/MFC/kabiri13.MFC
data/MFC/kabiri14.MFC data/MFC/kabiri15.MFC data/MFC/gatatu01.MFC data/MFC/gatatu02.MFC data/MFC/gatatu03.MFC data/MFC/gatatu04.MFC data/MFC/gatatu05.MFC data/MFC/gatatu06.MFC data/MFC/gatatu07.MFC data/MFC/gatatu08.MFC data/MFC/gatatu09.MFC data/MFC/gatatu10.MFC data/MFC/gatatu11.MFC data/MFC/gatatu12.MFC data/MFC/gatatu13.MFC data/MFC/gatatu14.MFC data/MFC/gatatu15.MFC data/MFC/kane01.MFC data/MFC/kane02.MFC data/MFC/kane03.MFC data/MFC/kane04.MFC data/MFC/kane05.MFC data/MFC/kane06.MFC data/MFC/kane07.MFC data/MFC/kane08.MFC data/MFC/kane09.MFC data/MFC/kane10.MFC data/MFC/kane11.MFC data/MFC/kane12.MFC data/MFC/kane13.MFC data/MFC/kane14.MFC
data/MFC/kane15.MFC data/MFC/gatanu01.MFC data/MFC/gatanu02.MFC data/MFC/gatanu03.MFC data/MFC/gatanu04.MFC data/MFC/gatanu05.MFC data/MFC/gatanu06.MFC data/MFC/gatanu07.MFC data/MFC/gatanu08.MFC data/MFC/gatanu09.MFC data/MFC/gatanu10.MFC data/MFC/gatanu11.MFC data/MFC/gatanu12.MFC data/MFC/gatanu13.MFC data/MFC/gatanu14.MFC data/MFC/gatanu15.MFC data/MFC/gatandatu01.MFC data/MFC/gatandatu02.MFC data/MFC/gatandatu03.MFC data/MFC/gatandatu04.MFC data/MFC/gatandatu05.MFC data/MFC/gatandatu06.MFC data/MFC/gatandatu07.MFC data/MFC/gatandatu08.MFC data/MFC/gatandatu09.MFC data/MFC/gatandatu10.MFC data/MFC/gatandatu11.MFC data/MFC/gatandatu12.MFC data/MFC/gatandatu13.MFC data/MFC/gatandatu14.MFC data/MFC/gatandatu15.MFC
data/MFC/karindwi01.MFC data/MFC/karindwi02.MFC data/MFC/karindwi03.MFC data/MFC/karindwi04.MFC data/MFC/karindwi05.MFC data/MFC/karindwi06.MFC data/MFC/karindwi07.MFC data/MFC/karindwi08.MFC data/MFC/karindwi09.MFC data/MFC/karindwi10.MFC data/MFC/karindwi11.MFC data/MFC/karindwi12.MFC data/MFC/karindwi13.MFC data/MFC/karindwi14.MFC data/MFC/karindwi15.MFC data/MFC/umunani01.MFC data/MFC/umunani02.MFC data/MFC/umunani03.MFC data/MFC/umunani04.MFC data/MFC/umunani05.MFC data/MFC/umunani06.MFC data/MFC/umunani07.MFC data/MFC/umunani08.MFC data/MFC/umunani09.MFC data/MFC/umunani10.MFC data/MFC/umunani11.MFC data/MFC/umunani12.MFC data/MFC/umunani13.MFC data/MFC/umunani14.MFC data/MFC/umunani15.MFC data/MFC/icyenda01.MFC
data/MFC/icyenda02.MFC data/MFC/icyenda03.MFC data/MFC/icyenda04.MFC data/MFC/icyenda05.MFC data/MFC/icyenda06.MFC data/MFC/icyenda07.MFC data/MFC/icyenda08.MFC data/MFC/icyenda09.MFC data/MFC/icyenda10.MFC data/MFC/icyenda11.MFC data/MFC/icyenda12.MFC data/MFC/icyenda13.MFC data/MFC/icyenda14.MFC data/MFC/icyenda15.MFC data/MFC/zero01.MFC data/MFC/zero02.MFC data/MFC/zero03.MFC data/MFC/zero04.MFC data/MFC/zero05.MFC data/MFC/zero06.MFC data/MFC/zero07.MFC data/MFC/zero08.MFC data/MFC/zero09.MFC data/MFC/zero10.MFC data/MFC/zero11.MFC data/MFC/zero12.MFC data/MFC/zero13.MFC data/MFC/zero14.MFC data/MFC/zero15.MFC
Appendix E Hidden Markov Model Definitions (HMMDEFS)
~o 1 39 39 ~h "zero" 6 2 39
-1.538187e+001 1.141508e+001 -3.588139e+000 -1.159882e+000 -1.452020e+000 -8.341283e+00 39
3.046115e+001 3.921619e+001 1.723766e+001 2.001421e+001 3.992482e+001 3.596347e+001 2.7 1.137821e+002 3 39
-1.491195e+000 -6.492606e+000 -1.891563e-001 -6.878118e+000 -6.327397e+000 -1.235269e+0 39
2.520783e+000 8.964164e+000 5.252084e+000 8.973154e+000 5.499793e+000 1.332134e+001 2.1 9.035600e+001 4 39
-9.309770e+000 -9.457813e+000 -2.599780e+000 -1.757934e+001 -1.275383e+001 -1.126780e+0 39
6.970012e+001 2.225276e+001 4.992588e+001 4.126175e+001 2.610523e+001 7.679757e+001 6.1 1.238130e+002 5 39
-2.297705e-001 -4.164129e-002 -1.899639e+000 -9.609221e+000 -5.382258e+000 -1.236597e+0 67
39
8.854380e+000 7.536385e+000 1.740920e+001 4.921722e+001 3.659902e+001 1.955439e+001 4.7 1.009971e+002 6 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.060647e-001 6.262358e-002 3.131179e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.430364e-001 5.696366e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.249576e-001 7.504237e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 8.498170e-001 1.501830e-001 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "rimwe" 6 2 39
-7.970719e+000 -7.500427e+000 1.132444e+001 -1.286703e+001 -7.432399e+000 -1.751952e+00 39
3.849774e+001 2.229306e+001 4.268061e+001 8.451440e+001 3.439103e+001 4.802508e+001 5.1 1.317715e+002 3 39
-2.377380e+000 -3.663290e+000 4.965676e+000 -1.033556e+001 -7.324887e+000 -1.087329e+00 39
7.865341e+000 1.011867e+001 4.059527e+000 3.692888e+001 2.392439e+001 1.843463e+001 3.2 1.058150e+002 4 39
-7.165953e+000 -6.947466e+000 6.544258e+000 -1.652563e+001 -9.213765e+000 -1.855777e+00 39
3.759945e+001 6.370345e+000 9.036909e+000 1.956501e+002 2.907838e+001 4.600018e+001 2.4
1.079340e+002 5 39
-6.314114e+000 -4.532432e+000 7.106805e+000 -7.048369e+000 -8.000018e+000 -1.071996e+00 39
2.019156e+001 3.633694e+001 1.606951e+001 1.441847e+002 5.447787e+001 4.976671e+001 2.1 1.010713e+002 6 0.000000e+000 9.333376e-001 6.666239e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.137994e-001 8.620062e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 8.917421e-001 1.082579e-001 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 8.244619e-001 1.755382e-001 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.221094e-001 7.789055e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "kabiri" 6 2 39
-7.888013e+000 -1.187933e+001 -2.681767e+000 -1.294684e+001 1.227886e+000 -9.493131e+00 39
4.862306e+001 1.635481e+001 3.838939e+001 3.666690e+001 4.893019e+001 3.454409e+001 4.1 1.329512e+002 3 39
-1.659296e+000 -7.659583e+000 8.000670e+000 -1.012939e+000 -4.288243e+000 -1.464354e+00 39
8.231427e+000 6.313235e+000 6.235238e+001 3.491607e+001 1.448567e+001 1.007993e+001 1.4 9.725203e+001 4
39
-7.019081e+000 -4.749569e+000 1.820031e+001 -9.001300e+000 -8.113852e+000 -1.175638e+00 39
9.790601e+000 2.004658e+001 1.836600e+001 3.005601e+001 2.896993e+001 3.778489e+001 1.2 1.157377e+002 5 39
-1.058715e+001 -7.019902e-001 1.127821e+001 -2.145595e+001 -6.475991e+000 -1.326184e+00 39
4.183236e+001 3.304543e+001 2.711980e+001 1.644754e+002 4.248949e+001 5.100737e+001 2.8 1.120947e+002 6 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.191111e-001 8.088891e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 8.458276e-001 1.130628e-001 4.110956e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.421099e-001 5.789007e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.339923e-001 6.600768e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~o 1 39 39 ~h "kabiri" 6 2 39
-7.888013e+000 -1.187933e+001 -2.681767e+000 -1.294684e+001 1.227886e+000 -9.493131e+00 39
4.862306e+001 1.635481e+001 3.838939e+001 3.666690e+001 4.893019e+001 3.454409e+001 4.1 1.329512e+002
3 39
-1.659296e+000 -7.659583e+000 8.000670e+000 -1.012939e+000 -4.288243e+000 -1.464354e+00 39
8.231427e+000 6.313235e+000 6.235238e+001 3.491607e+001 1.448567e+001 1.007993e+001 1.4 9.725203e+001 4 39
-7.019081e+000 -4.749569e+000 1.820031e+001 -9.001300e+000 -8.113852e+000 -1.175638e+00 39
9.790601e+000 2.004658e+001 1.836600e+001 3.005601e+001 2.896993e+001 3.778489e+001 1.2 1.157377e+002 5 39
-1.058715e+001 -7.019902e-001 1.127821e+001 -2.145595e+001 -6.475991e+000 -1.326184e+00 39
4.183236e+001 3.304543e+001 2.711980e+001 1.644754e+002 4.248949e+001 5.100737e+001 2.8 1.120947e+002 6 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.191111e-001 8.088891e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 8.458276e-001 1.130628e-001 4.110956e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.421099e-001 5.789007e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.339923e-001 6.600768e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "gatatu" 6 2 39
-7.390771e+000 -5.917681e+000 -2.103130e+000 -9.950102e+000 -3.979828e+000 -1.047798e+0 39
3.005470e+001 3.006998e+001 2.985134e+001 5.186708e+001 3.257998e+001 5.920473e+001 7.3 1.438623e+002 3 39
-4.205491e+000 -3.322559e+000 -1.800535e+000 -7.406120e+000 -5.095723e+000 -8.503020e+0 39
1.636298e+001 1.398440e+001 5.007754e+000 1.967910e+001 1.182736e+001 9.890709e+000 1.3 1.011073e+002 4 39
-1.168745e+001 -3.924672e+000 1.028068e+000 -9.808208e+000 -4.435424e-002 -5.277567e+00 39
1.160676e+001 2.277476e+001 2.329525e+001 3.608196e+001 3.057339e+001 2.757198e+001 3.2 1.214261e+002 5 39
2.013859e-001 7.833384e-001 -2.304667e+000 -1.186296e+001 -2.866407e+000 -4.851678e+000 39
1.207510e+001 8.797694e+000 1.094520e+001 3.902149e+001 1.363658e+001 2.133236e+001 3.8 1.057806e+002 6 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.553569e-001 3.017069e-002 1.447246e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.361491e-001 6.385095e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.064240e-001 9.357602e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.347656e-001 6.523441e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "kane"
6 2 39
-8.057542e+000 -1.060878e+001 -4.268703e+000 -1.086176e+001 8.497649e-002 -5.583869e+00 39
4.903220e+001 1.778465e+001 2.172715e+001 3.269134e+001 5.859225e+001 2.064451e+001 4.5 1.328887e+002 3 39
-7.654702e+000 -5.980002e+000 6.218244e+000 -1.785333e+001 -1.128413e+001 -1.349626e+00 39
3.431395e+001 3.294825e+001 7.759271e+000 5.328711e+001 1.814535e+001 2.150519e+001 2.0 1.236389e+002 4 39
-2.756640e+000 -8.790867e+000 7.811357e+000 -1.539050e+001 -1.123307e+001 -1.232298e+00 39
5.670832e-001 2.651911e+000 2.599197e+000 3.942507e+000 5.074362e+000 4.935444e+000 9.3 6.399265e+001 5 39
-6.698071e+000 -1.129157e+000 8.156453e+000 -1.402400e+001 -7.723018e+000 -5.042853e+00 39
2.265060e+001 1.672502e+001 2.089884e+001 7.828613e+001 3.141041e+001 3.859872e+001 1.6 1.026692e+002 6 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.240341e-001 7.596595e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 8.692063e-001 8.719578e-002 4.359789e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.010528e-001 9.894721e-002 0.000000e+000
0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 8.704605e-001 1.295396e-001 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "gatanu" 6 2 39
-2.673798e+000 -9.356992e+000 5.441325e-001 -1.141471e+001 -1.703487e+000 -1.510815e+00 39
1.248814e+001 1.846922e+001 1.526149e+001 4.118730e+001 2.809689e+001 2.215244e+001 7.9 1.285426e+002 3 39
-9.179394e+000 -4.328063e+000 -6.742033e-001 -7.842070e+000 -3.467945e+000 -6.189126e+0 39
2.638436e+001 2.927450e+001 3.584095e+001 3.864722e+001 2.810151e+001 2.947144e+001 3.7 1.320189e+002 4 39
-6.039360e+000 -1.091987e+001 -5.664576e+000 -1.614972e+001 -1.432853e+000 -4.402873e+0 39
4.627198e+001 1.405917e+001 2.971918e+001 1.565750e+001 3.092512e+001 4.182518e+001 4.8 1.119498e+002 5 39
-2.673770e+000 -1.938864e+000 2.444091e+000 -1.066288e+001 -5.001587e+000 -8.596553e+00 39
1.219363e+001 8.865636e+000 1.369036e+001 1.443630e+001 1.524414e+001 3.601057e+001 2.0 9.604260e+001 6
0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 8.672363e-001 1.327637e-001 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.390594e-001 6.094063e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.379871e-001 6.201285e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.489780e-001 5.102201e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "gatandatu" 6 2 39
-1.015873e+001 -7.870405e+000 -1.680081e+000 -1.051597e+001 -3.550692e+000 -7.160385e+0 39
3.556055e+001 4.019184e+001 3.388691e+001 4.676527e+001 4.065084e+001 7.829738e+001 6.0 1.409927e+002 3 39
-1.339054e+000 -6.674555e+000 1.699593e+000 -1.258614e+001 -3.930208e+000 -9.488519e+00 39
3.638019e+000 1.876286e+001 1.732436e+001 1.563778e+001 1.900627e+001 1.131656e+001 4.6 1.138005e+002 4 39
-9.892857e+000 -1.979889e+000 8.483336e-001 -7.349046e+000 -1.613775e+000 -5.763394e+00 39
8.659358e+000 8.861095e+000 1.427904e+001 1.546748e+001 3.311852e+001 2.053959e+001 4.1 1.110476e+002 5 39
2.401667e-001 -1.656038e+000 4.144118e-001 -8.085937e+000 -1.530486e+000 -4.962093e+000
39
1.279114e+001 4.562047e+000 5.450318e+000 2.033426e+001 1.427527e+001 1.360693e+001 4.9 9.493618e+001 6 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.720153e-001 2.798467e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.406485e-001 5.935153e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.238227e-001 5.078491e-002 2.539241e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.520043e-001 4.799579e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "karindwi" 6 2 39
-7.236526e+000 -1.066506e+001 2.410714e-001 -1.063585e+001 -4.352904e+000 -7.718532e+00 39
4.779219e+001 1.665519e+001 5.900079e+001 3.422823e+001 5.461636e+001 2.358897e+001 6.5 1.303562e+002 3 39
-8.145572e+000 -3.203343e+000 1.010793e+001 -1.615630e+001 -1.171961e+001 -1.120004e+00 39
5.632361e+001 1.347486e+001 5.860870e+001 5.227048e+001 2.300985e+001 2.158671e+001 6.8 1.163983e+002 4 39
-2.542266e+000 -3.334902e+000 3.259149e+000 -5.300519e+000 -4.552539e+000 -9.100178e+00 39
1.424212e+001 3.903228e+001 3.317758e+001 4.822500e+001 5.220851e+001 6.043075e+001 6.3
1.407230e+002 5 39
-7.458948e+000 -3.782425e+000 9.387439e+000 -1.012991e+001 -1.045399e+001 -7.275731e+00 39
2.919024e+001 2.421864e+001 4.132079e+001 7.579975e+001 8.122581e+001 5.412838e+001 6.2 1.183515e+002 6 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.402466e-001 5.975344e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.431054e-001 5.689462e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.348174e-001 6.518257e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.450952e-001 5.490478e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "umunani" 6 2 39
1.681069e+000 2.597972e-001 -2.102225e+000 -1.272634e+001 -7.886747e+000 -4.786170e+000 39
1.240989e+001 1.048145e+001 2.223962e+001 2.924567e+001 2.658429e+001 2.185395e+001 2.1 1.141408e+002 3 39
-9.025550e+000 -7.838059e+000 1.823010e-001 -1.517856e+001 -7.216865e+000 -1.019580e+00 39
4.965593e+001 4.002261e+001 3.900344e+001 6.009333e+001 4.482301e+001 4.609790e+001 5.3 1.324766e+002 4
39
-3.113127e+000 -6.011573e+000 -4.997765e-002 -1.301623e+001 -5.893340e+000 -9.431502e+0 39
1.965076e+000 1.852885e+001 3.407612e+001 1.203665e+001 8.759190e+000 1.875137e+001 3.3 1.023604e+002 5 39
-6.238905e+000 1.952969e+000 8.139153e+000 -1.412227e+001 -5.182961e+000 -4.409090e+000 39
1.011204e+001 5.521096e+000 2.187045e+001 6.730286e+001 2.985188e+001 3.386910e+001 3.4 9.443905e+001 6 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.228172e-001 7.718279e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.619250e-001 2.538341e-002 1.269155e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.631567e-001 3.684329e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.365148e-001 6.348520e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "icyenda" 6 2 39
-1.539500e+001 1.913122e+000 4.892755e+000 -7.411077e+000 -3.252254e+000 -4.670670e+000 39
3.937204e+001 2.366165e+001 2.901213e+001 7.555241e+001 3.239333e+001 4.215692e+001 4.1 1.331492e+002 3 39
-9.767091e+000 -7.412555e+000 -7.954957e-001 -1.257044e+001 -9.115961e+000 -1.128199e+0
39
6.789838e+001 8.131030e+000 2.133095e+001 1.963373e+001 1.534718e+001 4.011393e+001 4.2 1.051703e+002 4 39
-4.214276e+000 -3.088863e+000 5.680709e+000 -1.330309e+001 -1.153709e+001 -9.793489e+00 39
3.400375e+001 3.902774e+001 1.417232e+001 5.824862e+001 2.261618e+001 3.309546e+001 3.8 1.305070e+002 5 39
-3.845330e+000 -7.520312e+000 -4.539398e+000 -1.070767e+001 -7.937269e-001 -4.884501e+0 39
1.816542e+001 2.136757e+001 1.688432e+001 2.374783e+001 2.672420e+001 2.343285e+001 3.5 1.102168e+002 6 0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.504286e-001 4.957141e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.002472e-001 9.975278e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.299484e-001 7.005156e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.361448e-001 6.385522e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 ~h "sil" 6 2 39
-1.166347e+001 -2.349667e+000 6.230773e-001 -5.791427e+000 -3.163599e+000 -3.262644e+00 39
1.801193e+001 1.427911e+001 2.757210e+001 3.023239e+001 2.781987e+001 3.004006e+001 2.7
1.008924e+002 3 39
-8.603083e+000 2.727572e+000 3.617722e+000 1.818626e+000 3.906403e-001 3.107029e-001 -6 39
7.133804e+000 6.455761e+000 8.630907e+000 1.268900e+001 1.100004e+001 1.200856e+001 1.4 7.239511e+001 4 39
-1.287919e+001 -1.880384e+000 -2.084125e+000 -2.492788e+000 -3.290475e+000 -3.127917e+0 39
3.802010e+000 4.608351e+000 6.783229e+000 1.065659e+001 1.005600e+001 1.236252e+001 1.4 6.763563e+001 5 39
-1.074988e+001 -1.872770e+000 3.384747e-001 -3.966482e+000 -1.400925e+000 -4.761750e+00 39
1.933378e+001 4.053452e+001 2.861359e+001 4.188812e+001 3.746236e+001 3.709630e+001 5.3 1.274157e+002 6 0.000000e+000 7.034281e-001 2.965719e-001 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.318210e-001 3.570442e-002 3.247460e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.126506e-001 7.896068e-002 8.388670e-003 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 9.393034e-001 3.173170e-002 2.896489e-002 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 8.878226e-001 1.121774e-001 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
Appendix F VarFloor1
~v varFloor1 39
3.204677e-001 2.879857e-001 3.528846e-001 5.995725e-001 3.228612e-001 4.151070e-001 4.100843e001 5.447863e-001 5.071705e-001 4.509848e-001 3.897870e-001 3.592629e-001 1.131071e+000 8.854087e-003 1.427213e-002 1.615750e-002 2.087054e-002 2.114771e-002 2.467944e-002 2.856188e002 3.700293e-002 2.934534e-002 2.524008e-002 2.329455e-002 2.179735e-002 2.509273e-002 1.355496e-003 2.461951e-003 2.714343e-003 3.531215e-003 3.873707e-003 4.520955e-003 5.346298e003 6.792482e-003 5.316769e-003 4.564275e-003 4.260586e-003 3.976295e-003 3.301969e-003
Appendix G Recognition Output #!MLF!# ”data/test/rimwe t01.rec” 0 4500000 sil -3032.253662 4500000 8500000 rimwe -2883.145996 8500000 14300000 sil -3109.180908 . ”data/test/rimwe t02.rec” 0 4200000 sil -2704.837891 4200000 7800000 rimwe -2715.276611 7800000 12300000 sil -2285.158447 . ”data/test/rimwe t03.rec” 0 4600000 sil -2935.652588 4600000 8300000 rimwe -2754.431152 8300000 12300000 sil -2111.437012 . ”data/test/rimwe t04.rec” 0 6200000 sil -3782.968018 6200000 9900000 rimwe -2785.990234 9900000 12300000 sil -1326.802124 . ”data/test/rimwe t05.rec” 0 4700000 sil -2567.530518 4700000 8600000 rimwe -2805.017090 8600000 12300000 sil 1997.706299 . ”data/test/rimwe t06.rec” 0 3400000 sil -2565.149658 3400000 7400000 gatandatu -3665.021973 7400000 12300000 sil -3020.209961 . ”data/test/rimwe t07.rec” 0 9700000 sil -6393.378906 9700000 14600000 umunani -4403.274414 14600000 18300000 sil -2383.268311 . ”data/test/kabiri t01.rec” 0 3400000 sil -2131.476563 3400000 8200000 kabiri -3627.199463 8200000 19800000 sil -5838.706055 . ”data/test/kabiri t02.rec” 0 4000000 sil -2804.803223 4000000 9000000 karindwi -3969.567871 9000000 10300000 sil -1011.013245 . ”data/test/kabiri t03.rec” 0 1900000 sil -1134.861694 1900000 6000000 kabiri -3102.768799 6000000 8300000 sil -1219.186523 . ”data/test/kabiri t04.rec” 0 4100000 sil -2194.458008 4100000 8600000 kabiri -3459.557373 8600000 10300000 sil -994.580750 . ”data/test/kabiri t05.rec” 0 4700000 sil -3251.574219 4700000 9300000 kabiri -3424.394775 9300000 13800000 sil -2837.004639 . ”data/test/kabiri t06.rec” 0 11900000 sil -6528.833984 11900000 16400000 karindwi -4107.406250 16400000 18300000 sil -1274.626709 . ”data/test/kabiri t07.rec” 0 700000 sil -530.552368 700000 6000000 kabiri -4601.360840 6000000 8300000 sil -1722.909912 . ”data/test/gatatu t01.rec” 0 5200000 sil -3589.275146 5200000 10900000 gatatu -4363.254883 10900000 12300000 sil -924.230225 . ”data/test/gatatu t02.rec” 0 3500000 sil -2398.933594 3500000 9400000 gatatu -4556.339844 9400000 12300000 sil -1843.960815 . ”data/test/gatatu t03.rec” 0 3500000 sil -2317.750244 3500000 9400000 gatatu -4564.879883 9400000 12300000 sil -1503.680542 . ”data/test/gatatu t04.rec” 0 2900000 sil -1773.397217 2900000 8500000 gatatu -4343.373535 8500000 10300000 sil 1078.210449 . ”data/test/gatatu t05.rec” 0 4200000 sil -2778.085205 4200000 10200000 gatatu -4663.026367 10200000 12300000 sil -1197.939087 . ”data/test/gatatu t06.rec” 0 1000000 sil -689.505798 1000000 6000000 gatatu -4486.029785 6000000 8300000 sil -1417.254517
. ”data/test/gatatu t07.rec” 0 5300000 sil -3834.749268 5300000 11300000 gatatu -5677.827148 11300000 14300000 sil -2043.956055 . ”data/test/kane t01.rec” 0 3200000 sil -2327.186279 3200000 6600000 kane -2558.228760 6600000 9800000 sil -1841.534424 . ”data/test/kane t02.rec” 0 4500000 sil -2885.650391 4500000 8300000 kane -2810.858643 8300000 10300000 sil -1252.745239 . ”data/test/kane t03.rec” 0 2900000 sil -2006.716797 2900000 6700000 kane -2852.447510 6700000 10300000 sil -1796.379395 . ”data/test/kane t04.rec” 0 3300000 sil -2214.952148 3300000 7000000 kane -2733.631592 7000000 8300000 sil -725.258545 . ”data/test/kane t05.rec” 0 3200000 sil -1931.485962 3200000 6900000 kane -2699.544434 6900000 8300000 sil -746.918701 . ”data/test/kane t06.rec” 0 600000 sil -460.335510 600000 4300000 kane -3288.060059 4300000 9800000 sil -3577.091064 . ”data/test/kane t07.rec” 0 7700000 sil -4761.689941 7700000 11500000 kane -3384.530518 11500000 12300000 sil -540.977661 . ”data/test/gatanu t01.rec” 0 3300000 sil -2016.124268 3300000 10200000 gatanu -5082.707520 10200000 12300000 sil -1063.003662 . ”data/test/gatanu t02.rec” 0 4100000 sil -2335.946777 4100000 10400000 gatanu -4572.570313 10400000 12300000 sil -990.890747 .
”data/test/gatanu t03.rec” 0
3300000 sil -1847.578735 3300000 9100000 gatanu -4382.743652 9100000 12300000 sil -1719.962524 . ”data/test/gatanu t04.rec” 0 4600000 sil -2818.372070 4600000 10100000 gatanu -3986.628174 10100000 12300000 sil -1200.381958 . ”data/test/gatanu t05.rec” 0 4100000 sil -2331.471924 4100000 9600000 gatanu -3933.541260 9600000 12300000 sil -1493.610840 . ”data/test/gatanu t06.rec” 0 900000 sil -576.625244 900000 6600000 gatandatu -5100.191895 6600000 8300000 sil 1051.420288 . ”data/test/gatanu t07.rec” 0 6800000 sil -4709.265137 6800000 13900000 gatanu -5845.623535 13900000 14300000 sil -260.411957 . ”data/test/gatandatu t01.rec” 0 4300000 sil -3079.326416 4300000 9500000 gatandatu -4043.568359 9500000 12300000 sil -1583.143311 . ”data/test/gatandatu t02.rec” 0 3500000 sil -1958.502197 3500000 10000000 gatandatu -5172.549805 10000000 12300000 sil -1294.331665 . ”data/test/gatandatu t03.rec” 0 3400000 sil -2035.299316 3400000 10100000 gatandatu -5445.789551 10100000 10300000 sil -167.506531 . ”data/test/gatandatu t04.rec” 0 1800000 sil -1209.215454 1800000 8900000 gatandatu -5420.343750 8900000 12300000 sil -1944.851929 . ”data/test/gatandatu t05.rec” 0 5100000 sil -2798.036377 5100000 12300000 gatandatu -5575.925293 12300000 16300000 sil -2116.223145 . ”data/test/gatandatu t06.rec” 0 200000 sil -410.355652 200000 7800000 gatandatu -6987.959473 7800000 8300000 sil -381.262238 . ”data/test/gatandatu t07.rec” 0 10200000 sil -6375.278320 10200000 17300000 gatandatu -6801.508301 17300000 20300000
sil -1918.914795 . ”data/test/karindwi t01.rec” 0 3100000 sil -2471.800293 3100000 9600000 karindwi -5190.781250 9600000 10300000 sil -447.699554 . ”data/test/karindwi t02.rec” 0 4000000 sil -2909.214600 4000000 10300000 karindwi -4961.906250 10300000 14300000 sil -2391.512451 . ”data/test/karindwi t03.rec” 0 3700000 sil -2514.477051 3700000 10200000 karindwi -5164.201172 10200000 14300000 sil -2367.164551 . ”data/test/karindwi t04.rec” 0 2200000 sil -1508.051147 2200000 8500000 karindwi -4925.602539 8500000 12300000 sil -2103.998535 . ”data/test/karindwi t05.rec” 0 3100000 sil -1962.216675 3100000 9400000 karindwi -5152.186035 9400000 12300000 sil -1768.007935 . ”data/test/karindwi t06.rec” 0 1500000 sil -1065.677124 1500000 7200000 karindwi -5150.283203 7200000 9800000 sil 1699.175293 . ”data/test/karindwi t07.rec” 0 6100000 sil -4322.256348 6100000 11800000 karindwi -5017.796387 11800000 18300000 sil -4314.830078 . ”data/test/umunani t01.rec” 0 2100000 sil -1265.518921 2100000 8500000 umunani -4985.961426 8500000 10300000 sil -1060.763550 . ”data/test/umunani t02.rec” 0 2700000 sil -1435.257080 2700000 10000000 umunani -5558.234863 10000000 10300000 sil -197.015854 . ”data/test/umunani t03.rec” 0 3400000 sil -1931.043823 3400000 10400000 umunani -5685.582520 10400000 12300000 sil -1096.007324 . ”data/test/umunani t04.rec” 0 2500000 sil -1603.023804 2500000 9400000 umunani -5220.315430 9400000 11800000 sil -1374.329956 . ”data/test/umunani t05.rec” 0 2500000 sil -1402.714966 2500000 9500000 umunani -5534.970215 9500000 12300000 sil -1562.454346 . ”data/test/umunani t06.rec” 0 4400000 sil -3140.977539 4400000 12400000 umunani -7357.295898 12400000 14300000 sil -1438.931641 . ”data/test/umunani t07.rec” 0 7800000 sil -5594.510254 7800000 16700000 umunani -7102.675293 16700000 18300000 sil -982.074829 . ”data/test/icyenda t01.rec” 0 1900000 sil -1452.497437 1900000 7700000 icyenda -4527.358398 7700000 10300000 sil -1541.053589 . ”data/test/icyenda t02.rec” 0 1800000 sil -1142.076294 1800000 7300000 icyenda -4290.866211 7300000 8300000 sil -662.530518 . ”data/test/icyenda t03.rec” 0 4100000 sil -2891.251953 4100000 8300000 icyenda -3231.903564 8300000 11800000 sil -2064.225830 . ”data/test/icyenda t04.rec” 0 2100000 sil -1223.421631 2100000 8100000 icyenda -4642.086426 8100000 10300000 sil -1201.611450 . ”data/test/icyenda t05.rec” 0 3400000 sil -2206.691406 3400000 9000000 icyenda -4240.103027 9000000 12300000 sil 1832.621826 . ”data/test/icyenda t06.rec” 0 800000 sil -518.711365 800000 7700000 gatandatu -6347.489746 7700000 10300000 sil -1752.465698 . ”data/test/icyenda t07.rec” 0 12400000 sil -8788.501953 12400000 19300000 icyenda -6196.111328 19300000 24300000 sil -3499.174805
. ”data/test/zero t01.rec” 0 1600000 sil -915.991943 1600000 6400000 zero -3366.200684 6400000 8300000 sil -1075.914063 . ”data/test/zero t02.rec” 0 2900000 sil -1598.527100 2900000 7700000 zero -3288.929688 7700000 8300000 sil -425.674469 . ”data/test/zero t03.rec” 0 2500000 sil -1701.465332 2500000 7300000 zero -3282.274902 7300000 8300000 sil -527.153931 . ”data/test/zero t04.rec” 0 3500000 sil -2084.071045 3500000 8000000 zero -3044.202148 8000000 10300000 sil -1241.999268 . ”data/test/zero t05.rec” 0 2800000 sil -1659.132935 2800000 7300000 zero -3011.566162 7300000 8300000 sil -530.568054 . ”data/test/zero t06.rec” 0 7200000 sil -4286.444336 7200000 10400000 zero -2708.985840 10400000 12300000 sil 1289.067627 . ”data/test/zero t07.rec” 0 8000000 sil -5543.634766 8000000 12700000 zero -4128.586914 12700000 18300000 sil -3445.831787 .
appendix H Testing Data data/test/rimwe t01.MFC data/test/rimwe t02.MFC data/test/rimwe t03.MFC data/test/rimwe t04.MFC data/test/rimwe t05.MFC data/test/rimwe t06.MFC data/test/rimwe t07.MFC data/test/kabiri t01.MFC data/test/kabiri t02.MFC data/test/kabiri t03.MFC data/test/kabiri t04.MFC data/test/kabiri t05.MFC data/test/kabiri t06.MFC data/test/kabiri t07.MFC data/test/gatatu t01.MFC data/test/gatatu t02.MFC data/test/gatatu t03.MFC data/test/gatatu t04.MFC data/test/gatatu t05.MFC data/test/gatatu t06.MFC data/test/gatatu t07.MFC data/test/kane t01.MFC data/test/kane t02.MFC data/test/kane t03.MFC data/test/kane t04.MFC data/test/kane t05.MFC data/test/kane t06.MFC data/test/kane t07.MFC
data/test/gatanu t01.MFC data/test/gatanu t02.MFC data/test/gatanu t03.MFC data/test/gatanu t04.MFC data/test/gatanu t05.MFC data/test/gatanu t06.MFC data/test/gatanu t07.MFC data/test/gatandatu t01.MFC data/test/gatandatu t02.MFC data/test/gatandatu t03.MFC data/test/gatandatu t04.MFC data/test/gatandatu t05.MFC data/test/gatandatu t06.MFC data/test/gatandatu t07.MFC data/test/karindwi t01.MFC data/test/karindwi t02.MFC data/test/karindwi t03.MFC data/test/karindwi t04.MFC data/test/karindwi t05.MFC data/test/karindwi t06.MFC data/test/karindwi t07.MFC data/test/umunani t01.MFC data/test/umunani t02.MFC data/test/umunani t03.MFC data/test/umunani t04.MFC data/test/umunani t05.MFC data/test/umunani t06.MFC data/test/umunani t07.MFC data/test/icyenda t01.MFC data/test/icyenda t02.MFC data/test/icyenda t03.MFC
data/test/icyenda t04.MFC data/test/icyenda t05.MFC data/test/icyenda t06.MFC data/test/icyenda t07.MFC data/test/zero t01.MFC data/test/zero t02.MFC data/test/zero t03.MFC data/test/zero t04.MFC data/test/zero t05.MFC data/test/zero t06.MFC data/test/zero t07.MFC